Difference between revisions of "AI tools"

From GISAXS
(Multi-modal Models (language-vision/video))
(Retrieval Augmented Generation (RAG))
 
(36 intermediate revisions by the same user not shown)
Line 24: Line 24:
 
* 2024-11Nov-22: Nvidia [https://github.com/NVlabs/hymba Hymba] ([https://developer.nvidia.com/blog/hymba-hybrid-head-architecture-boosts-small-language-model-performance/ blog]): small and high-performance
 
 
* 2024-12Dec-06: Meta [https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct Llama 3.3] 70B
 
 +
* [https://x.com/deepseek_ai/status/1872242657348710721 2024-12Dec-26]: [https://huggingface.co/deepseek-ai/DeepSeek-V3-Base DeepSeek-V3-Base] 671B
 +
* 2025-01Jan-02: [https://huggingface.co/PowerInfer/SmallThinker-3B-Preview SmallThinker-3B-Preview] (fine-tune of [https://huggingface.co/Qwen/Qwen2.5-3B-Instruct Qwen2.5-3b-Instruct])
  
 
===For Coding===
 
C.f. [https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard]
+
Rankings: [https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard bigcode-models-leaderboard] and [https://codeelo-bench.github.io/#leaderboard-table CodeElo leaderboard]
 
* 2024-10Oct-06: [https://abacus.ai/ Abacus AI] [https://huggingface.co/abacusai/Dracarys2-72B-Instruct Dracarys2-72B-Instruct] (optimized for coding, fine-tune of [https://huggingface.co/Qwen/Qwen2.5-72B-Instruct Qwen2.5-72B-Instruct])
 
 
* 2024-11Nov-09: [https://opencoder-llm.github.io/ OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models] ([https://huggingface.co/collections/infly/opencoder-672cec44bbb86c39910fb55e weights], [https://arxiv.org/abs/2411.04905 preprint])
 
Line 35: Line 37:
 
* 2024-11Nov-23: [https://arxiv.org/abs/2411.14405 Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions]
 
 
* 2024-11Nov-27: [https://qwenlm.github.io/blog/qwq-32b-preview/ Alibaba Qwen QwQ] 32B ([https://huggingface.co/Qwen/QwQ-32B-Preview model], [https://huggingface.co/spaces/Qwen/QwQ-32B-preview demo])
 
 +
* [https://x.com/ruliad_ai/status/1864394941029322890 2024-12Dec-04]: [https://www.ruliad.co/ Ruliad] [https://huggingface.co/ruliad/deepthought-8b-llama-v0.01-alpha Deepthought] 8B ([https://chat.ruliad.co/ demo])
 +
* 2024-12Dec-24: Qwen [https://huggingface.co/Qwen/QVQ-72B-Preview QvQ-72B-preview] (visual reasoning)
  
 
==Cloud LLM==
 
Line 49: Line 53:
 
* 2024-08: [https://arxiv.org/abs/2408.08921 Graph Retrieval-Augmented Generation: A Survey]
 
 
* 2024-09: [https://arxiv.org/abs/2409.14924 Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely]
 
 +
* 2024-12: [https://arxiv.org/abs/2412.17558 A Survey of Query Optimization in Large Language Models]
 
* List of [https://github.com/NirDiamant/RAG_Techniques RAG techniques]
 
 +
* [https://github.com/athina-ai/rag-cookbooks Advanced RAG Cookbooks👨🏻‍💻]
 +
 +
===Measuring RAG performance===
 +
* 2025-01: [https://deepmind.google/discover/blog/facts-grounding-a-new-benchmark-for-evaluating-the-factuality-of-large-language-models/ The FACTS Grounding Leaderboard]: [https://arxiv.org/abs/2501.03200 Benchmarking LLMs' Ability to Ground Responses to Long-Form Input]
  
 
===Analysis of RAG overall===
 
Line 79: Line 88:
 
* [https://github.com/superlinear-ai/raglite RAGLite]
 
 
* [https://github.com/gusye1234/nano-graphrag nano-graphrag]: A simple, easy-to-hack GraphRAG implementation
 
 +
* [https://github.com/electricpipelines/barq Dabarqus]
  
 
===Web-based Tools===
 
 
* [https://typeset.io/ SciSpace] Chat with PDF (also available as a GPT).
 
 +
 +
===Commercial Cloud Offerings===
 +
* [https://www.graphlit.com/ Graphlit]
 +
* [https://colivara.com/ ColiVara]
 +
* [https://nhost.io/blog/assistants-file-stores nhost]
 +
* [https://vespa.ai/ Vespa] [https://vespa.ai/solutions/enterprise-retrieval-augmented-generation/ RAG]
 +
* [https://unstructured.io/ Unstructured]
 +
* [https://www.fivetran.com/blog/assembling-a-rag-architecture-using-fivetran Fivetran]
 +
* [https://platform.vectorize.io/ Vectorize]
  
 
===Document Parsing===
 
 
* [https://github.com/DS4SD/docling Docling]: converts multiple formats (PDF, DOCX, PPTX, Images, HTML) into Markdown and JSON
 
 +
* [https://github.com/microsoft/markitdown Microsoft Markitdown]: converts various formats (PDF, Word, Excel, PPT) to Markdown (available via [https://msftmd.replit.app/ web interface on replit])
 
====PDF Conversion====
 
 
* [https://github.com/kermitt2/grobid Grobid]
 
Line 105: Line 125:
 
=Interfaces=
 
 
==Chatbot Frontend==
 
===Web===
+
===Web (code)===
 
* [https://docs.streamlit.io/develop/tutorials/llms/build-conversational-apps Streamlit]
 
* [https://docs.cohere.com/v2/docs/cohere-toolkit Cohere Toolkit] ([https://github.com/cohere-ai/cohere-toolkit code])
 
Line 111: Line 131:
 
* [https://github.com/open-webui/open-webui open-webui]
 
 
* [https://github.com/xjdr-alt/entropix/tree/main/ui entropix frontend UI]
 
 +
 +
===Web (product)===
 +
* [https://chatboxai.app/en Chatbox]
 +
 
===Desktop GUI===
 
 
* [https://anythingllm.com/ AnythingLLM] ([https://docs.anythingllm.com/ docs], [https://github.com/Mintplex-Labs/anything-llm code]): includes chat-with-docs, selection of LLM and vector db, etc.
 
Line 199: Line 223:
 
* [https://cartesia.ai/ Cartesia] [https://cartesia.ai/sonic Sonic]
 
 
* [https://neets.ai/ Neets AI] ($1/million characters)
 
 +
 +
=Text-to-audio=
 +
* 2024-12: [https://tangoflux.github.io/ TangoFlux]: [https://arxiv.org/abs/2412.21037 Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization] ([https://github.com/declare-lab/TangoFlux code])
  
 
=Vision=
 
Line 220: Line 247:
 
==Optical character recognition (OCR)==
 
 
* [https://arxiv.org/abs/2409.01704 General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model] ([https://huggingface.co/stepfun-ai/GOT-OCR2_0 project], [https://github.com/Ucas-HaoranWei/GOT-OCR2.0/ code], [https://huggingface.co/spaces/stepfun-ai/GOT_official_online_demo demo])
 
 +
* [https://github.com/yigitkonur/swift-ocr-llm-powered-pdf-to-markdown Swift OCR: LLM Powered Fast OCR]
  
 
=Embedding=
 
 
* [https://www.marktechpost.com/2024/07/28/a-comparison-of-top-embedding-libraries-for-generative-ai/ A Comparison of Top Embedding Libraries for Generative AI]
 
 +
* 2024-12: [https://huggingface.co/blog/modernbert modernBERT]
  
 
=Time Series=
 
Line 240: Line 269:
 
==Forecasting==
 
 
* Meta [https://facebookresearch.github.io/Kats/ Kats] ([https://github.com/facebookresearch/Kats code]): Forecasting (ARIMA, Prophet, Holt Winters, VAR), detection, feature extraction, simulation
 
 +
* [https://arxiv.org/abs/2410.18959 Context is Key: A Benchmark for Forecasting with Essential Textual Information]
  
 
=Data=
 
Line 267: Line 297:
 
* [https://github.com/mendableai/firecrawl Firecrawl]
 
 
* [https://github.com/unclecode/crawl4ai Crawl4AI: Crawl Smarter, Faster, Freely. For AI.]
 
 
+
* [https://github.com/cyclotruc/gitingest GitIngest]: Turn any GitHub repository into a prompt-friendly text file, for inclusion in an LLM's context. Available at: [https://gitingest.com/ gitingest.com]
+
* [https://github.gg/ github.gg]: Analyzes GitHub repositories and provides insights about code quality, dependencies, and more
+
* [https://github.com/ScrapeGraphAI/Scrapegraph-ai ScrapeGraphAI: You Only Scrape Once]: Python web-scraping library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.)
=Hardware=
==AI Acceleration Hardware==
* Nvidia GPUs
 
* [https://en.wikipedia.org/wiki/Tensor_Processing_Unit Google TPU]
 
* [https://en.wikipedia.org/wiki/Tesla_Dojo Tesla Dojo]
 
* [https://www.cerebras.net/ Cerebras]
 
* [https://www.graphcore.ai/ Graphcore]
 
* [https://www.untether.ai/ Untether AI]
 
* [https://sambanova.ai/ SambaNova Systems]
 
* [https://groq.com/ Groq]
 
* [https://deepsilicon.com/ Deep Silicon]: Combined hardware/software solution for accelerated AI ([https://x.com/sdianahu/status/1833186687369023550 e.g.] ternary math)
 
* [https://www.etched.com/ Etched]: Transformer ASICs
 
 
 
==Cloud Training Compute==
 
* [https://nebius.ai/ Nebius AI]
 
* [https://glaive.ai/ Glaive AI]
 
  
 
=See Also=
 
 
* [[AI agents]]
 
 
* [[AI understanding]]
 
 +
* [[AI compute]]
 
* [[Robots]]
 

Latest revision as of 08:47, 7 January 2025

LLM

Open-weights LLM

For Coding

Rankings: bigcode-models-leaderboard and CodeElo leaderboard

Reasoning

Cloud LLM

Multi-modal: Audio

Triage

Retrieval Augmented Generation (RAG)

Reviews

Measuring RAG performance

Analysis of RAG overall

Approaches

Open-source Implementations

Web-based Tools

  • SciSpace Chat with PDF (also available as a GPT).

Commercial Cloud Offerings

Document Parsing

PDF Conversion

Automatic Optimization

Analogous to Gradient Descent

LLM for scoring/ranking

LLM Agents

Interfaces

Chatbot Frontend

Web (code)

Web (product)

Desktop GUI

Alternative Text Chatbot UI

  • Loom provides a tree-like structure in which an LLM generates branching continuations of a piece of writing.
  • The Pantheon Interface is a new idea for how to interact with LLMs (live instance, code). In a traditional interaction, you prompt the bot and it replies in a turn-by-turn manner. Pantheon instead invites you to type out your thoughts, and various agents will asynchronously add comments or questions to spur along your brainstorming.
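
The tree-like structure attributed to Loom above can be pictured with a small data structure. This is only an illustrative sketch and not Loom's actual implementation: the `Node` class and its `branch` and `path_text` methods are hypothetical names, and the continuations are supplied by hand rather than generated by an LLM.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One piece of text in a branching document (hypothetical structure)."""
    text: str
    # Exclude parent from repr/eq to avoid infinite recursion through children.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)

    def branch(self, continuation: str) -> "Node":
        """Attach an alternative continuation under this node and return it."""
        child = Node(continuation, parent=self)
        self.children.append(child)
        return child

    def path_text(self) -> str:
        """Concatenate the text from the root down to this node."""
        parts = []
        node: Optional[Node] = self
        while node is not None:
            parts.append(node.text)
            node = node.parent
        return "".join(reversed(parts))


root = Node("Once")
upon = root.branch(" upon")        # first alternative continuation
more = root.branch(" more")        # sibling branch from the same point
time = upon.branch(" a time")      # deeper continuation along one branch
print(time.path_text())
print(more.path_text())
```

Each node keeps a link to its parent, so any leaf can be turned back into the full prompt that would be sent to a model, while siblings preserve the alternatives that were not followed.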

Conversational Audio Chatbot

Related Research

Commercial Systems

Speech Recognition (ASR) and Transcription

Lists

Open Source

In Browser

  • Whisper Timestamped: Multilingual speech recognition with word-level timestamps, running locally in browser

Phrase Endpointing and Voice Activity Detection (VAD)

I.e., how to determine when the user is done talking and the bot should respond.
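
One simple baseline for this question is energy-based endpointing: treat a frame as speech when its RMS energy exceeds a threshold, and declare the turn finished once enough consecutive quiet frames follow real speech. The sketch below is a minimal illustration under stated assumptions, not the method of any tool listed on this page; the function names, the threshold of 0.02, and the 20 ms frame size implied by the defaults are all assumed values that would need tuning.

```python
import math
from typing import Iterable, List, Optional


def frame_rms(frame: List[float]) -> float:
    """Root-mean-square energy of one frame (samples assumed in [-1, 1])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_endpoint(
    frames: Iterable[List[float]],
    energy_threshold: float = 0.02,  # assumed value; tune per microphone
    min_speech_frames: int = 3,      # consecutive loud frames that count as speech
    hangover_frames: int = 25,       # about 500 ms of silence at 20 ms per frame
) -> Optional[int]:
    """Return the index of the frame where the turn is judged finished.

    Returns None if speech never started or never ended within the input.
    """
    speech_run = 0
    silence_run = 0
    heard_speech = False
    for i, frame in enumerate(frames):
        if frame_rms(frame) >= energy_threshold:
            speech_run += 1
            silence_run = 0  # any loud frame resets the silence counter
            if speech_run >= min_speech_frames:
                heard_speech = True
        else:
            speech_run = 0
            if heard_speech:
                silence_run += 1
                if silence_run >= hangover_frames:
                    return i
    return None


silent = [0.0] * 160   # one 20 ms frame at an assumed 8 kHz sample rate
loud = [0.5] * 160
utterance = [silent] * 5 + [loud] * 10 + [silent] * 30
print(find_endpoint(utterance))
```

The hangover length is the main trade-off: a short value makes the bot respond quickly but risks cutting in during natural pauses, while a long value feels sluggish. Energy thresholds are also fragile in noisy rooms, which is where model-based voice activity detectors tend to be preferred.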

Audio Cleanup

  • Krisp AI: Noise cancellation, meeting summary, etc.

Text-to-speech (TTS)

Open Source

Cloud

Text-to-audio

Vision

Visual Models

Multi-modal Models (language-vision/video)

Optical character recognition (OCR)

Embedding

Time Series

Control

Forecasting

Data

Vector Database

Open Source

Commercial cloud

MySQL

Database with Search

Web Scraping

See Also