==Open-weights LLM==
 
* 2024-11Nov-18: [https://huggingface.co/mistralai/Mistral-Large-Instruct-2411 Mistral-Large-Instruct-2411] 123B; and [https://mistral.ai/news/pixtral-large/ Pixtral Large] multimodal model 124B ([https://huggingface.co/mistralai/Pixtral-Large-Instruct-2411 weights])
 
* 2024-11Nov-22: Nvidia [https://github.com/NVlabs/hymba Hymba] ([https://developer.nvidia.com/blog/hymba-hybrid-head-architecture-boosts-small-language-model-performance/ blog]): small and high-performance
 
* 2024-12Dec-06: Meta [https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct Llama 3.3] 70B
* [https://x.com/deepseek_ai/status/1872242657348710721 2024-12Dec-26]: [https://huggingface.co/deepseek-ai/DeepSeek-V3-Base DeepSeek-V3-Base] 671B
* 2025-01Jan-02: [https://huggingface.co/PowerInfer/SmallThinker-3B-Preview SmallThinker-3B-Preview] (fine-tune of [https://huggingface.co/Qwen/Qwen2.5-3B-Instruct Qwen2.5-3B-Instruct])
* [https://x.com/SebastienBubeck/status/1877010995727470877 2025-01Jan-08]: Microsoft [https://huggingface.co/microsoft/phi-4 phi-4] 15B
* [https://x.com/MiniMax__AI/status/1879226391352549451 2025-01Jan-14]: [https://www.minimaxi.com/en/news/minimax-01-series-2 MiniMax-01], MiniMax-Text-01 and MiniMax-VL-01; 4M context length ([https://www.minimaxi.com/en/news/minimax-01-series-2 paper])
* 2025-01Jan-27: [https://qwenlm.github.io/blog/qwen2.5-1m/ Qwen2.5-1M] ([https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2.5-1M/Qwen2_5_1M_Technical_Report.pdf report])
* 2025-01Jan-27: DeepSeek [https://huggingface.co/deepseek-ai/Janus-Pro-7B Janus-Pro-7B] (with image capabilities)
* [https://x.com/cohere/status/1900170005519753365 2025-03Mar-14]: Cohere [https://cohere.com/blog/command-a Command A] ([https://huggingface.co/CohereForAI/c4ai-command-a-03-2025?ref=cohere-ai.ghost.io weights])
* [https://x.com/MistralAI/status/1901668499832918151 2025-03Mar-17]: [https://mistral.ai/news/mistral-small-3-1 Mistral Small 3.1] 24B ([https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503 weights])
* [https://x.com/deepseek_ai/status/1904526863604883661 2025-03Mar-24]: [https://huggingface.co/deepseek-ai/DeepSeek-V3-0324 DeepSeek-V3-0324] 685B
  
 
===For Coding===
 
Rankings: [https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard bigcode-models-leaderboard] and [https://codeelo-bench.github.io/#leaderboard-table CodeElo leaderboard]
 
* 2024-10Oct-06: [https://abacus.ai/ Abacus AI] [https://huggingface.co/abacusai/Dracarys2-72B-Instruct Dracarys2-72B-Instruct] (optimized for coding, fine-tune of [https://huggingface.co/Qwen/Qwen2.5-72B-Instruct Qwen2.5-72B-Instruct])
 
* 2024-11Nov-09: [https://opencoder-llm.github.io/ OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models] ([https://huggingface.co/collections/infly/opencoder-672cec44bbb86c39910fb55e weights], [https://arxiv.org/abs/2411.04905 preprint])
 
  
 
===Reasoning===
 
See also: [[Increasing_AI_Intelligence|Increasing AI Intelligence]] > Proactive Search > [[Increasing_AI_Intelligence#CoT_reasoning_model|CoT reasoning model]]
 
* [https://x.com/deepseek_ai/status/1859200141355536422 2024-11Nov-20]: DeepSeek-R1-Lite-Preview ([https://x.com/deepseek_ai/status/1859200145037869485 results], [https://x.com/teortaxesTex/status/1859259359630356955 CoT])
 
* 2024-11Nov-23: [https://arxiv.org/abs/2411.14405 Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions]
 
* 2024-11Nov-27: [https://qwenlm.github.io/blog/qwq-32b-preview/ Alibaba Qwen QwQ] 32B ([https://huggingface.co/Qwen/QwQ-32B-Preview model], [https://huggingface.co/spaces/Qwen/QwQ-32B-preview demo])
 
* [https://x.com/ruliad_ai/status/1864394941029322890 2024-12Dec-04]: [https://www.ruliad.co/ Ruliad] [https://huggingface.co/ruliad/deepthought-8b-llama-v0.01-alpha Deepthought] 8B ([https://chat.ruliad.co/ demo])
* 2024-12Dec-24: Qwen [https://huggingface.co/Qwen/QVQ-72B-Preview QVQ-72B-Preview] (visual reasoning)
* 2025-01Jan-10: [https://mbzuai-oryx.github.io/LlamaV-o1/ LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs] ([https://arxiv.org/abs/2501.06186 preprint], [https://github.com/mbzuai-oryx/LlamaV-o1 code], [https://huggingface.co/omkarthawakar/LlamaV-o1 weights])
* [https://x.com/deepseek_ai/status/1881318130334814301 2025-01Jan-20]: [https://huggingface.co/deepseek-ai/DeepSeek-R1 DeepSeek-R1], [https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B DeepSeek-R1-Distill-Llama-70B], DeepSeek-R1-Distill-Qwen-32B, ... ([https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf paper])
* 2025-02Feb-10: [https://huggingface.co/tomg-group-umd/huginn-0125 Huginn-0125]: [https://arxiv.org/abs/2502.05171 Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach] ([https://github.com/seal-rg/recurrent-pretraining code], [https://huggingface.co/tomg-group-umd/huginn-0125 model])
* [https://x.com/NousResearch/status/1890148000204485088 2025-02Feb-14]: [https://huggingface.co/NousResearch/DeepHermes-3-Llama-3-8B-Preview DeepHermes 3 - Llama-3.1 8B]
* [https://x.com/Alibaba_Qwen/status/1894130603513319842 2025-02Feb-24]: Qwen [https://qwenlm.github.io/blog/qwq-max-preview/ QwQ-Max-Preview] ([https://chat.qwen.ai/ online demo])
* [https://x.com/Alibaba_Qwen/status/1897361654763151544 2025-03Mar-05]: Qwen [https://qwenlm.github.io/blog/qwq-32b/ QwQ-32B] ([https://huggingface.co/spaces/Qwen/QwQ-32B-Demo demo])
* [https://x.com/BlinkDL_AI/status/1898579674575552558 2025-03Mar-05]: [https://github.com/BlinkDL/RWKV-LM RWKV7-G1] "GooseOne" 0.1B ([https://huggingface.co/BlinkDL/rwkv7-g1 weights], [https://arxiv.org/abs/2305.13048 preprint])
* [https://x.com/LG_AI_Research/status/1901803002052436323 2025-03Mar-17]: LG AI Research [https://www.lgresearch.ai/blog/view?seq=543 EXAONE Deep] 2.4B, 7.8B, 32B ([https://huggingface.co/LGAI-EXAONE/EXAONE-Deep-32B weights])
* [https://x.com/kuchaev/status/1902078122792775771 2025-03Mar-18]: Nvidia [https://huggingface.co/collections/nvidia/llama-nemotron-67d92346030a2691293f200b Llama Nemotron] 8B, 49B ([https://build.nvidia.com/nvidia/llama-3_3-nemotron-super-49b-v1 demo])
===Agentic===
* 2025-02Feb-18: Microsoft [https://huggingface.co/microsoft/Magma-8B Magma-8B] ([https://www.arxiv.org/abs/2502.13130 preprint])
* 2025-02Feb-26: [https://convergence.ai/ Convergence] [https://github.com/convergence-ai/proxy-lite Proxy Lite]
===Multimodal===
====Language/Vision====
* [https://arxiv.org/abs/2407.07895 LLaVA-NeXT-Interleave] ([https://huggingface.co/collections/llava-hf/llava-interleave-668e19a97da0036aad4a2f19 models], [https://huggingface.co/spaces/merve/llava-interleave demo])
* [https://huggingface.co/papers/2407.15841 SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models]
* Nvidia [https://huggingface.co/collections/merve/nveagle-66d0705108582d73bb235c26 NVEagle] 13B, 7B ([https://huggingface.co/spaces/NVEagle/Eagle-X5-13B-Chat demo], [https://arxiv.org/abs/2408.15998 preprint])
* 2024-08Aug-29: [https://qwenlm.github.io/blog/qwen2-vl/ Qwen2-VL] 7B, 2B ([https://github.com/QwenLM/Qwen2-VL code], [https://huggingface.co/collections/Qwen/qwen2-vl-66cee7455501d7126940800d models]): Can process videos up to 20 minutes in length
* 2024-09Sep-11: Mistral [https://huggingface.co/mistral-community/pixtral-12b-240910 Pixtral 12B]
* 2024-09Sep-17: [https://nvlm-project.github.io/ NVLM 1.0]
* 2024-12Dec-06: Nvidia [https://arxiv.org/abs/2412.04468 NVILA: Efficient Frontier Visual Language Models]
* [https://x.com/Alibaba_Qwen/status/1883954247743725963 2025-01Jan-28]: [https://huggingface.co/collections/Qwen/qwen25-vl-6795ffac22b334a837c0f9a5 Qwen2.5-VL]
* 2025-02Feb-18: Microsoft [https://huggingface.co/microsoft/Magma-8B Magma-8B] ([https://www.arxiv.org/abs/2502.13130 preprint])
* [https://x.com/CohereForAI/status/1896923657470886234 2025-03Mar-05]: Cohere [https://cohere.com/research/aya Aya] 8B, 32B
* 2025-03Mar-12: Google [https://developers.googleblog.com/en/introducing-gemma3/ Gemma 3] 1B, 4B, 12B, 27B ([https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf technical report])
* [https://x.com/DeepLearningAI/status/1903295570527002729 2025-03Mar-23]: Cohere [https://cohere.com/blog/aya-vision Aya Vision] 8B, 32B ([https://huggingface.co/collections/CohereForAI/c4ai-aya-vision-67c4ccd395ca064308ee1484?ref=cohere-ai.ghost.io weights])
* [https://x.com/Alibaba_Qwen/status/1904227859616641534 2025-03Mar-24]: Alibaba [https://qwenlm.github.io/blog/qwen2.5-vl-32b/ Qwen2.5-VL-32B-Instruct] ([https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct weights])
====Language/Vision/Speech====
* 2025-02Feb-27: Microsoft [https://huggingface.co/microsoft/Phi-4-multimodal-instruct Phi-4-multimodal-instruct] (language, vision, speech)
* [https://x.com/kyutai_labs/status/1903082848547906011 2025-03Mar-21]: kyutai [https://kyutai.org/moshivis MoshiVis] ([https://vis.moshi.chat/ demo])
* [https://x.com/Alibaba_Qwen/status/1904944923159445914 2025-03Mar-26]: [https://qwenlm.github.io/blog/qwen2.5-omni/ Qwen2.5-Omni-7B] ([https://github.com/QwenLM/Qwen2.5-Omni/blob/main/assets/Qwen2.5_Omni.pdf tech report], [https://github.com/QwenLM/Qwen2.5-Omni code], [https://huggingface.co/Qwen/Qwen2.5-Omni-7B weights])
====Language/Audio====
* 2025-03Mar-11: [https://github.com/soham97/mellow Mellow]: a small audio language model for reasoning, 167M ([https://arxiv.org/abs/2503.08540 paper])
* 2025-03Mar-12: [https://research.nvidia.com/labs/adlr/AF2/ Audio Flamingo 2] 0.5B, 1.5B, 3B ([https://arxiv.org/abs/2503.03983 paper], [https://github.com/NVIDIA/audio-flamingo code])
  
 
==Cloud LLM==
 
  
 
==Retrieval Augmented Generation (RAG)==
 
* See Also: [[Data_Extraction#Document_Parsing|Document Parsing]]
 
===Reviews===
 
* 2024-08: [https://arxiv.org/abs/2408.08921 Graph Retrieval-Augmented Generation: A Survey]
 
* 2024-09: [https://arxiv.org/abs/2409.14924 Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely]
 
* 2024-12: [https://arxiv.org/abs/2412.17558 A Survey of Query Optimization in Large Language Models]
* 2025-01: [https://arxiv.org/abs/2501.07391 Enhancing Retrieval-Augmented Generation: A Study of Best Practices]
* 2025-01: [https://arxiv.org/abs/2501.09136 Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG] ([https://github.com/asinghcsu/AgenticRAG-Survey github])
 
* List of [https://github.com/NirDiamant/RAG_Techniques RAG techniques]
 
* [https://github.com/athina-ai/rag-cookbooks Advanced RAG Cookbooks👨🏻‍💻]
* [https://github.com/DEEP-PolyU/Awesome-GraphRAG Awesome-GraphRAG (GraphRAG Survey)]
===Measuring RAG performance===
* 2025-01: [https://deepmind.google/discover/blog/facts-grounding-a-new-benchmark-for-evaluating-the-factuality-of-large-language-models/ The FACTS Grounding Leaderboard]: [https://arxiv.org/abs/2501.03200 Benchmarking LLMs' Ability to Ground Responses to Long-Form Input]
  
 
===Analysis of RAG overall===
 
 
===Approaches===
 
* RAGFlow ([https://github.com/infiniflow/ragflow code])
 
* GraphRAG ([https://arxiv.org/abs/2404.16130 preprint], [https://github.com/microsoft/graphrag code], [https://github.com/Azure-Samples/graphrag-accelerator GraphRAG Accelerator] for easy deployment on Azure)
 
* AutoMetaRAG ([https://github.com/darshil3011/AutoMetaRAG/tree/main code])
 
* [https://verba.weaviate.io/ Verba]: RAG for [https://weaviate.io/ Weaviate] vector database ([https://github.com/weaviate/verba code], [https://www.youtube.com/watch?v=UoowC-hsaf0 video])
* Microsoft: [https://github.com/microsoft/PIKE-RAG PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation]
* 2024-10: Google [https://arxiv.org/abs/2410.07176 Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models]
 
 
* 2024-10: [https://arxiv.org/abs/2410.08815 StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization]: Reformats retrieved data into task-appropriate structures (table, graph, tree).
 
* 2024-10: [https://arxiv.org/abs/2410.13765 Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval]
 
* 2024-11: [https://www.arxiv.org/abs/2411.13773 FastRAG: Retrieval Augmented Generation for Semi-structured Data]
 
* 2024-11: Microsoft [https://www.microsoft.com/en-us/research/blog/lazygraphrag-setting-a-new-standard-for-quality-and-cost/ LazyGraphRAG: Setting a new standard for quality and cost]
 
* 2024-11: [https://arxiv.org/abs/2411.19443 Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models]
* 2025-01: [https://arxiv.org/abs/2501.05366 Search-o1: Agentic Search-Enhanced Large Reasoning Models] ([https://search-o1.github.io/ project], [https://github.com/sunnynexus/Search-o1 code])
* 2025-01: [https://github.com/Marker-Inc-Korea/AutoRAG AutoRAG: RAG AutoML tool for automatically finding an optimal RAG pipeline for your data]
* 2025-01: [https://arxiv.org/abs/2501.05874 VideoRAG: Retrieval-Augmented Generation over Video Corpus]
* 2025-02: [https://arxiv.org/abs/2502.01142 DeepRAG: Thinking to Retrieval Step by Step for Large Language Models]
* 2025-02: [https://weaviate.io/developers/weaviate/tutorials/multi-vector-embeddings Multi-vector embeddings]
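
A minimal sketch of the baseline that most of the approaches above elaborate on: embed the corpus, retrieve the chunks most similar to the query by cosine similarity, and prepend them to the prompt. It uses [https://www.sbert.net/ sentence-transformers] for embeddings; the model name is only an example, and <code>generate_answer</code> stops at prompt assembly, where a call to whatever LLM you use would go.

<pre>
# Minimal RAG baseline: embed chunks, retrieve top-k by cosine similarity, build the prompt.
# Assumes: pip install sentence-transformers numpy
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "GISAXS probes nanoscale structure at surfaces using grazing-incidence X-ray scattering.",
    "Retrieval Augmented Generation grounds LLM answers in retrieved documents.",
    "Vector databases store embeddings and support fast similarity search.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # example model; substitute your own
doc_vecs = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    q_vec = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q_vec  # vectors are normalized, so dot product = cosine similarity
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

def generate_answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    # Placeholder: send `prompt` to your LLM of choice (local or cloud) and return its reply.
    return prompt

print(generate_answer("What does a vector database do?"))
</pre>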
  
 
===Open-source Implementations===
 
 
* [https://github.com/pingcap/autoflow autoflow]: open source GraphRAG (Knowledge Graph), including conversational search page
 
* [https://github.com/superlinear-ai/raglite RAGLite]
 
* [https://github.com/gusye1234/nano-graphrag nano-graphrag]: A simple, easy-to-hack GraphRAG implementation
* [https://github.com/electricpipelines/barq Dabarqus]
  
 
===Web-based Tools===
 
* [https://typeset.io/ SciSpace] Chat with PDF (also available as a GPT).
 
  
===Commercial Cloud Offerings===
* [https://www.graphlit.com/ Graphlit]
* [https://colivara.com/ ColiVara]
* [https://nhost.io/blog/assistants-file-stores nhost]
* [https://vespa.ai/ Vespa] [https://vespa.ai/solutions/enterprise-retrieval-augmented-generation/ RAG]
* [https://unstructured.io/ Unstructured]
 
* [https://www.fivetran.com/blog/assembling-a-rag-architecture-using-fivetran Fivetran]
* [https://platform.vectorize.io/ Vectorize]
* [https://www.voyageai.com/ Voyage AI]
* [https://abacus.ai/ Abacus AI]
 
  
 
==LLM for scoring/ranking==
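
A generic illustration of the usual pattern in this space ("LLM as judge"): ask the model to rate each candidate on a fixed scale with a constrained output format, parse the number, and rank by it. <code>call_llm</code> is a placeholder for whichever local or cloud model you actually use; the prompt wording and the 1-10 scale are just example choices.

<pre>
# LLM-as-judge sketch: score candidate answers on a 1-10 scale and rank them.
import re

def call_llm(prompt: str) -> str:
    # Placeholder: replace with a real call to your LLM of choice.
    return "7"

def score(question: str, answer: str) -> float:
    prompt = (
        "Rate how well the answer addresses the question on a scale of 1-10.\n"
        f"Question: {question}\nAnswer: {answer}\n"
        "Reply with a single integer and nothing else."
    )
    match = re.search(r"\d+", call_llm(prompt))
    return float(match.group()) if match else 0.0

def rank(question: str, answers: list[str]) -> list[str]:
    """Sort candidate answers from best to worst according to the LLM's scores."""
    return sorted(answers, key=lambda a: score(question, a), reverse=True)
</pre>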
 
 
=Interfaces=
 
==Chatbot Frontend==
 
===Web (code)===
 
* [https://docs.streamlit.io/develop/tutorials/llms/build-conversational-apps Streamlit] (see the minimal sketch after this list)
 
* [https://docs.cohere.com/v2/docs/cohere-toolkit Cohere Toolkit] ([https://github.com/cohere-ai/cohere-toolkit code])
 
 
* [https://github.com/open-webui/open-webui open-webui]
 
* [https://github.com/xjdr-alt/entropix/tree/main/ui entropix frontend UI]
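
For the Streamlit route, a chat frontend takes only a few lines: <code>st.chat_input</code> and <code>st.chat_message</code> provide the UI and <code>st.session_state</code> keeps the history. A minimal sketch; the <code>reply</code> function is a placeholder for a real LLM call.

<pre>
# minimal_chat.py -- run with: streamlit run minimal_chat.py
import streamlit as st

st.title("Minimal chat frontend")

if "messages" not in st.session_state:
    st.session_state.messages = []  # each entry: {"role": ..., "content": ...}

def reply(history: list[dict]) -> str:
    # Placeholder: call your LLM here (local server, cloud API, etc.).
    return f"Echo: {history[-1]['content']}"

# Re-render the conversation so far
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

if prompt := st.chat_input("Say something"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)
    answer = reply(st.session_state.messages)
    st.session_state.messages.append({"role": "assistant", "content": answer})
    with st.chat_message("assistant"):
        st.write(answer)
</pre>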
 
===Web (product)===
* [https://chatboxai.app/en Chatbox]
 
===Desktop GUI===
 
* [https://anythingllm.com/ AnythingLLM] ([https://docs.anythingllm.com/ docs], [https://github.com/Mintplex-Labs/anything-llm code]): includes chat-with-docs, selection of LLM and vector db, etc.
 
==Conversational Audio Chatbot==
 
* 2024-09Sep-11: [https://huggingface.co/ICTNLP/Llama-3.1-8B-Omni Llama-3.1-8B-Omni] ([https://github.com/ictnlp/LLaMA-Omni code]), enabling end-to-end speech.
 
* [https://x.com/AIatMeta/status/1847383580269510670 2024-10Oct-18]: Meta [https://speechbot.github.io/spiritlm/ Spirit LM]: open source multimodal language model that freely mixes text and speech
 
* 2025-02Feb-28: [https://www.sesame.com/ Sesame] ([https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice#demo demo])
===Turn Detection===
* 2025-03: [https://github.com/pipecat-ai/smart-turn Smart Turn]: Open-source
  
 
===Related Research===
 
===Commercial Systems===
 
* [https://www.bland.ai Bland AI]
 
* [https://deepgram.com/ DeepGram Voice AI]
 
* [https://www.sesame.com/ Sesame] ([https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice#demo demo])
  
 
=Speech Recognition (ASR) and Transcription=
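
For quick local transcription, the Hugging Face <code>transformers</code> ASR pipeline wraps models such as Whisper. A minimal sketch; the checkpoint and the audio filename are example choices.

<pre>
# Transcribe an audio file locally with a Whisper checkpoint via transformers.
# Assumes: pip install transformers torch (plus ffmpeg available for audio decoding)
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",   # example checkpoint; pick a size to taste
    chunk_length_s=30,              # chunk long recordings
)

result = asr("meeting.wav", return_timestamps=True)
print(result["text"])
for chunk in result.get("chunks", []):
    print(chunk["timestamp"], chunk["text"])
</pre>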
 
=Text-to-speech (TTS)=

==Open Source==
 
* [https://huggingface.co/amphion/MaskGCT MaskGCT] ([https://huggingface.co/spaces/amphion/maskgct demo])
 
* [https://arxiv.org/abs/2312.09911 Amphion: An Open-Source Audio, Music and Speech Generation Toolkit] ([https://github.com/open-mmlab/Amphion code])
 
* [https://www.zyphra.com/ Zyphra] [https://huggingface.co/Zyphra/Zonos-v0.1-hybrid Zonos]
* [https://github.com/fishaudio/fish-speech Fish Speech] (includes voice cloning)
* [https://canopylabs.ai/ Canopy] [https://huggingface.co/collections/canopylabs/orpheus-tts-67d9ea3f6c05a941c06ad9d2 Orpheus] 3B
  
 
==Cloud==
 
 
* [https://cartesia.ai/ Cartesia] [https://cartesia.ai/sonic Sonic]
 
* [https://neets.ai/ Neets AI] ($1/million characters)
 
* Hailuo AI T2A-01-HD ([https://www.hailuo.ai/audio try], [https://intl.minimaxi.com/document/platform%20introduction?key=66701c8e1d57f38758d58198 API])
* [https://www.hume.ai/ Hume] (can set emotion, give acting directions, etc.)
=Text-to-audio=
* 2024-12: [https://tangoflux.github.io/ TangoFlux]: [https://arxiv.org/abs/2412.21037 Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization] ([https://github.com/declare-lab/TangoFlux code])
* 2025-03: [https://arxiv.org/abs/2503.10522 AudioX: Diffusion Transformer for Anything-to-Audio Generation]
  
 
=Vision=
 
* [https://github.com/google/langfun Langfun] library as a means of converting images into structured output.
* See also: [[AI_tools#Multimodal| Multimodal open-weights models]]
 
==Visual Models==
 
* [https://openai.com/index/clip/ CLIP]
 
 
* Meta [https://about.meta.com/realitylabs/codecavatars/sapiens Sapiens: Foundation for Human Vision Models] (video input, can infer segmentation, pose, depth-map, and surface normals)
 
  
==Depth==
* 2024-06: [https://arxiv.org/abs/2406.09414 Depth Anything V2] ([https://github.com/DepthAnything/Depth-Anything-V2 code])
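
Depth Anything V2 checkpoints can also be run through the <code>transformers</code> depth-estimation pipeline; a minimal sketch (the checkpoint name assumes the Hugging Face port of the small model, and <code>scene.jpg</code> is a stand-in for your image):

<pre>
# Monocular depth estimation with a Depth Anything V2 checkpoint.
# Assumes: pip install transformers torch pillow
from PIL import Image
from transformers import pipeline

depth = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")

image = Image.open("scene.jpg")
result = depth(image)
result["depth"].save("scene_depth.png")  # PIL image of the predicted depth map
</pre>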
 
==Superresolution==
* 2025-03: [https://arxiv.org/abs/2311.17643 Thera: Aliasing-Free Arbitrary-Scale Super-Resolution with Neural Heat Fields] ([https://github.com/prs-eth/thera code], [https://huggingface.co/spaces/prs-eth/thera use])
 
  
==Related==
* 2019-11: [https://arxiv.org/abs/1911.11763 SuperGlue: Learning Feature Matching with Graph Neural Networks] ([https://huggingface.co/docs/transformers/main/en/model_doc/superglue hf])
  
 
=Embedding=
 
* [https://www.marktechpost.com/2024/07/28/a-comparison-of-top-embedding-libraries-for-generative-ai/ A Comparison of Top Embedding Libraries for Generative AI]
 
==Text Embedding==
* 2024-12: [https://huggingface.co/blog/modernbert ModernBERT]
* 2025-02: [https://huggingface.co/chandar-lab/NeoBERT NeoBERT]
* 2025-03: [https://developers.googleblog.com/en/gemini-embedding-text-model-now-available-gemini-api/ gemini-embedding-exp-03-07]
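
Whichever encoder you pick, usage looks roughly the same: embed the texts, then compare vectors by cosine similarity. A minimal sketch with [https://www.sbert.net/ sentence-transformers]; the model name is only an example stand-in for the encoders above.

<pre>
# Compute sentence embeddings and a cosine-similarity matrix.
# Assumes: pip install sentence-transformers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example encoder; swap in your preferred model
sentences = [
    "Grazing-incidence scattering probes thin-film nanostructure.",
    "GISAXS is used to study nanoscale structure in thin films.",
    "The synchrotron beamline schedule was updated yesterday.",
]
embeddings = model.encode(sentences, normalize_embeddings=True)
similarities = embeddings @ embeddings.T  # normalized vectors, so dot product = cosine similarity
print(similarities.round(2))
</pre>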
==Image Embedding==
* 2025-01: [https://arxiv.org/abs/2501.18593 Diffusion Autoencoders are Scalable Image Tokenizers] ([https://yinboc.github.io/dito/ project], [https://github.com/yinboc/dito code])
  
 
=Time Series=
 
 
* [https://arxiv.org/abs/1912.09363 Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting]
 
* [https://arxiv.org/abs/2209.00905 From latent dynamics to meaningful representations]
 
* [https://arxiv.org/abs/2209.10705 Review of Time Series Forecasting Methods and Their Applications to Particle Accelerators]
 
* [https://arxiv.org/abs/2310.01728 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models]
 
* [https://arxiv.org/abs/2310.10688 A decoder-only foundation model for time-series forecasting]
 
 
* [https://arxiv.org/abs/2407.10240 xLSTMTime : Long-term Time Series Forecasting With xLSTM]
 
* Salesforce: [https://arxiv.org/abs/2410.10469 Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts] ([https://github.com/SalesforceAIResearch/uni2ts/tree/main/project/moirai-moe-1 code], [https://huggingface.co/collections/Salesforce/moirai-r-models-65c8d3a94c51428c300e0742 weights], [https://www.salesforce.com/blog/time-series-morai-moe/ blog])
 
* IBM [https://huggingface.co/docs/transformers/en/model_doc/patchtsmixer PatchTSMixer] and [https://huggingface.co/docs/transformers/en/model_doc/patchtst PatchTST] (being [https://research.ibm.com/blog/time-series-AI-transformers used] for particle accelerators)
  
 
==Control==
 
 
==Forecasting==
 
* Meta [https://facebookresearch.github.io/Kats/ Kats] ([https://github.com/facebookresearch/Kats code]): Forecasting (ARIMA, Prophet, Holt Winters, VAR), detection, feature extraction, simulation (see the sketch after this list)
 
* [https://arxiv.org/abs/2410.18959 Context is Key: A Benchmark for Forecasting with Essential Textual Information]
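
A minimal forecasting sketch with Kats' Prophet wrapper, following the <code>time</code>/<code>value</code> column convention from the Kats tutorials (the toy series and the 30-day horizon are arbitrary):

<pre>
# Fit a Prophet model with Kats and forecast 30 days ahead.
# Assumes: pip install kats (which pulls in pandas, numpy, prophet)
import numpy as np
import pandas as pd
from kats.consts import TimeSeriesData
from kats.models.prophet import ProphetModel, ProphetParams

# Toy daily series: linear trend plus weekly seasonality
dates = pd.date_range("2024-01-01", periods=180, freq="D")
values = np.linspace(10.0, 30.0, 180) + 3.0 * np.sin(2 * np.pi * dates.dayofweek / 7)
ts = TimeSeriesData(pd.DataFrame({"time": dates, "value": values}))

model = ProphetModel(ts, ProphetParams(seasonality_mode="additive"))
model.fit()
forecast = model.predict(steps=30, freq="D")  # DataFrame with time, fcst, fcst_lower, fcst_upper
print(forecast.head())
</pre>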
==Anomaly Detection==
* 2024-10: [https://arxiv.org/abs/2410.05440 Can LLMs Understand Time Series Anomalies?] ([https://github.com/rose-stl-lab/anomllm code])
  
 
=Data=
 
* See also: [[Data_Extraction#Data_Scraping| Data Scraping]] and [[Data_Extraction#Document_Parsing| Document Parsing]]
 
==Vector Database==
 
===Open Source===
 
 
==Database with Search==
 
* [https://typesense.org/ Typesense] ([https://github.com/typesense/typesense code])
 
=See Also=
 
* [[AI agents]]
 
* [[AI understanding]]
 
* [[AI compute]]
 
* [[Robots]]
 
