=LLM=

==Open-weights LLM==
 
* 2025-01Jan-27: DeepSeek [https://huggingface.co/deepseek-ai/Janus-Pro-7B Janus-Pro-7B] (with image capabilities)
* [https://x.com/cohere/status/1900170005519753365 2025-03Mar-14]: Cohere [https://cohere.com/blog/command-a Command A] ([https://huggingface.co/CohereForAI/c4ai-command-a-03-2025?ref=cohere-ai.ghost.io weights])
* [https://x.com/MistralAI/status/1901668499832918151 2025-03Mar-17]: [https://mistral.ai/news/mistral-small-3-1 Mistral Small 3.1] 24B ([https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503 weights])
* [https://x.com/deepseek_ai/status/1904526863604883661 2025-03Mar-24]: [https://huggingface.co/deepseek-ai/DeepSeek-V3-0324 DeepSeek-V3-0324] 685B
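Checkpoints like these can generally be run locally via Hugging Face transformers. A minimal sketch, assuming a recent transformers/accelerate install and enough GPU memory for the chosen model (the model id is the Command A repository linked above; the prompt and generation settings are illustrative):

<syntaxhighlight lang="python">
# Minimal sketch: run an open-weights chat model locally with Hugging Face transformers.
# Assumes `pip install transformers accelerate` and sufficient GPU memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/c4ai-command-a-03-2025"  # repo linked above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize what GISAXS measures, in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
</syntaxhighlight>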
  
 
===For Coding===

Rankings: bigcode-models-leaderboard and CodeElo leaderboard

===Reasoning===

See also: Increasing AI Intelligence > Proactive Search > CoT reasoning model
 
* [https://x.com/Alibaba_Qwen/status/1897361654763151544 2025-03Mar-05]: Qwen [https://qwenlm.github.io/blog/qwq-32b/ QwQ-32B] ([https://huggingface.co/spaces/Qwen/QwQ-32B-Demo demo])
* [https://x.com/BlinkDL_AI/status/1898579674575552558 2025-03Mar-05]: [https://github.com/BlinkDL/RWKV-LM RWKV7-G1] "GooseOne" 0.1B ([https://huggingface.co/BlinkDL/rwkv7-g1 weights], [https://arxiv.org/abs/2305.13048 preprint])
* [https://x.com/LG_AI_Research/status/1901803002052436323 2025-03Mar-17]: LG AI Research [https://www.lgresearch.ai/blog/view?seq=543 EXAONE Deep] 2.4B, 7.8B, 32B ([https://huggingface.co/LGAI-EXAONE/EXAONE-Deep-32B weights])
* [https://x.com/kuchaev/status/1902078122792775771 2025-03Mar-18]: Nvidia [https://huggingface.co/collections/nvidia/llama-nemotron-67d92346030a2691293f200b Llama Nemotron] 8B, 49B ([https://build.nvidia.com/nvidia/llama-3_3-nemotron-super-49b-v1 demo])
  
 
===Agentic===

===Multimodal===

====Language/Vision====
 
* [https://x.com/CohereForAI/status/1896923657470886234 2025-03Mar-05]: Cohere [https://cohere.com/research/aya Aya] 8B, 32B
* 2025-03Mar-12: Google [https://developers.googleblog.com/en/introducing-gemma3/ Gemma 3] 1B, 4B, 12B, 27B ([https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf technical report])
* [https://x.com/DeepLearningAI/status/1903295570527002729 2025-03Mar-23]: Cohere [https://cohere.com/blog/aya-vision Aya Vision] 8B, 32B ([https://huggingface.co/collections/CohereForAI/c4ai-aya-vision-67c4ccd395ca064308ee1484?ref=cohere-ai.ghost.io weights])
* [https://x.com/Alibaba_Qwen/status/1904227859616641534 2025-03Mar-24]: Alibaba [https://qwenlm.github.io/blog/qwen2.5-vl-32b/ Qwen2.5-VL-32B-Instruct] ([https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct weights])
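Vision-language checkpoints like these can be queried through the transformers "image-text-to-text" pipeline. A hedged sketch, assuming a transformers release recent enough to support Qwen2.5-VL (the image URL is a placeholder):

<syntaxhighlight lang="python">
# Hedged sketch: ask a vision-language model about an image via the
# "image-text-to-text" pipeline. Assumes a recent transformers release.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Qwen/Qwen2.5-VL-32B-Instruct", device_map="auto")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/detector_image.png"},  # placeholder
        {"type": "text", "text": "Describe any symmetry in this scattering pattern."},
    ],
}]
out = pipe(text=messages, max_new_tokens=128)
print(out[0]["generated_text"])
</syntaxhighlight>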
  
 
====Language/Vision/Speech====
 
* 2025-02Feb-27: Microsoft [https://huggingface.co/microsoft/Phi-4-multimodal-instruct Phi-4-multimodal-instruct] (language, vision, speech)
* [https://x.com/kyutai_labs/status/1903082848547906011 2025-03Mar-21]: kyutai [https://kyutai.org/moshivis MoshiVis] ([https://vis.moshi.chat/ demo])
* [https://x.com/Alibaba_Qwen/status/1904944923159445914 2025-03Mar-26]: [https://qwenlm.github.io/blog/qwen2.5-omni/ Qwen2.5-Omni-7B] ([https://github.com/QwenLM/Qwen2.5-Omni/blob/main/assets/Qwen2.5_Omni.pdf tech report], [https://github.com/QwenLM/Qwen2.5-Omni code], [https://huggingface.co/Qwen/Qwen2.5-Omni-7B weights])

====Language/Audio====

==Cloud LLM==

===Multi-modal: Audio===

===Triage===

=Retrieval Augmented Generation (RAG)=

==Reviews==

===Measuring RAG performance===

===Analysis of RAG overall===

==Approaches==

==Open-source Implementations==

==Web-based Tools==

* SciSpace Chat with PDF (also available as a GPT).

==Commercial Cloud Offerings==

=LLM for scoring/ranking=

=LLM Agents=

=Interfaces=

==Chatbot Frontend==

===Web (code)===

===Web (product)===

===Desktop GUI===

==Alternative Text Chatbot UI==

* Loom provides a tree-like structure for exploring branched LLM writing.
* The Pantheon Interface is a new idea for how to interact with LLMs (live instance, code). In a traditional interaction, you prompt the bot and it replies in a turn-by-turn manner. Pantheon instead invites you to type out your thoughts, while various agents asynchronously add comments or questions to spur along your brainstorming.

==Conversational Audio Chatbot==

===Turn Detection===

====Related Research====

====Commercial Systems====

=Speech Recognition (ASR) and Transcription=

==Lists==

==Open Source==

===In Browser===

* Whisper Timestamped: multilingual speech recognition with word-level timestamps, running locally in the browser.
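A hedged sketch of the same idea (word-level timestamps) done server-side with the transformers Whisper pipeline, as a stand-in for the in-browser build (the audio filename is a placeholder):

<syntaxhighlight lang="python">
# Hedged sketch: word-level timestamped transcription with the transformers
# Whisper pipeline (server-side stand-in for Whisper Timestamped in the browser).
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
result = asr("meeting.wav", return_timestamps="word")  # "meeting.wav" is a placeholder
print(result["text"])
for chunk in result["chunks"]:  # e.g. {'text': ' hello', 'timestamp': (0.3, 0.6)}
    print(chunk["timestamp"], chunk["text"])
</syntaxhighlight>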
==Phrase Endpointing and Voice Activity Detection (VAD)==

I.e., how do you determine when the user is done talking and the bot should respond? A minimal sketch follows.
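The sketch below uses Silero VAD as an assumed example tool (the section's own tool links were not captured here); it requires torch and torchaudio, and the recording filename and endpointing threshold are placeholders:

<syntaxhighlight lang="python">
# Minimal endpointing sketch with Silero VAD (an assumed choice), loaded via torch.hub.
import torch

model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
(get_speech_timestamps, _, read_audio, _, _) = utils

wav = read_audio("user_turn.wav", sampling_rate=16000)  # placeholder recording
speech = get_speech_timestamps(wav, model, sampling_rate=16000)
# Each entry is {'start': sample, 'end': sample}. A simple endpointing rule:
# if the last speech segment ended more than ~700 ms ago, treat the turn as finished.
print(speech)
</syntaxhighlight>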
==Audio Cleanup==

* Krisp AI: noise cancellation, meeting summary, etc.

=Text-to-speech (TTS)=

==Open Source==
 
* [https://www.zyphra.com/ Zyphra] [https://huggingface.co/Zyphra/Zonos-v0.1-hybrid Zonos]
* [https://github.com/fishaudio/fish-speech Fish Speech] (includes voice cloning)
* [https://canopylabs.ai/ Canopy] [https://huggingface.co/collections/canopylabs/orpheus-tts-67d9ea3f6c05a941c06ad9d2 Orpheus] 3B
  
 
==Cloud==

=Text-to-audio=
 
* 2024-12: [https://tangoflux.github.io/ TangoFlux]: [https://arxiv.org/abs/2412.21037 Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization] ([https://github.com/declare-lab/TangoFlux code])
* 2025-03: [https://arxiv.org/abs/2503.10522 AudioX: Diffusion Transformer for Anything-to-Audio Generation]
  
 
=Vision=

==Visual Models==
 
* Nvidia [https://github.com/NVlabs/MambaVision MambaVision]
* Meta [https://about.meta.com/realitylabs/codecavatars/sapiens Sapiens: Foundation for Human Vision Models] (video input, can infer segmentation, pose, depth-map, and surface normals)
 
==Depth==
* 2024-06: [https://arxiv.org/abs/2406.09414 Depth Anything V2] ([https://github.com/DepthAnything/Depth-Anything-V2 code])
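A minimal sketch via the transformers depth-estimation pipeline, assuming the Hugging Face port of the small variant (checkpoint id follows the hub's naming; the image filename is a placeholder):

<syntaxhighlight lang="python">
# Minimal sketch: monocular depth estimation with the Hugging Face port of
# Depth Anything V2 (small variant; checkpoint id assumed from the hub).
from transformers import pipeline
from PIL import Image

depth = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")
result = depth(Image.open("sample.jpg"))  # "sample.jpg" is a placeholder
result["depth"].save("sample_depth.png")  # PIL image of the predicted depth map
</syntaxhighlight>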
==Superresolution==
* 2025-03: [https://arxiv.org/abs/2311.17643 Thera: Aliasing-Free Arbitrary-Scale Super-Resolution with Neural Heat Fields] ([https://github.com/prs-eth/thera code], [https://huggingface.co/spaces/prs-eth/thera use])
  
 
==Related==

=Embedding=

==Text Embedding==

==Image Embedding==

=Time Series=
 
* Salesforce: [https://arxiv.org/abs/2410.10469 Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts] ([https://github.com/SalesforceAIResearch/uni2ts/tree/main/project/moirai-moe-1 code], [https://huggingface.co/collections/Salesforce/moirai-r-models-65c8d3a94c51428c300e0742 weights], [https://www.salesforce.com/blog/time-series-morai-moe/ blog])
* IBM [https://huggingface.co/docs/transformers/en/model_doc/patchtsmixer PatchTSMixer] and [https://huggingface.co/docs/transformers/en/model_doc/patchtst PatchTST] (being [https://research.ibm.com/blog/time-series-AI-transformers used] for particle accelerators)
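The PatchTST classes ship with transformers; a hedged sketch of the forecasting interface (the config values are illustrative, and the model below is untrained, so its outputs are random):

<syntaxhighlight lang="python">
# Hedged sketch: the transformers PatchTST forecasting interface.
import torch
from transformers import PatchTSTConfig, PatchTSTForPrediction

config = PatchTSTConfig(
    num_input_channels=1,   # univariate series
    context_length=512,     # history window fed to the model
    prediction_length=96,   # forecast horizon
)
model = PatchTSTForPrediction(config)  # randomly initialized; train before real use

past_values = torch.randn(1, 512, 1)   # (batch, context_length, channels)
with torch.no_grad():
    out = model(past_values=past_values)
print(out.prediction_outputs.shape)     # -> torch.Size([1, 96, 1])
</syntaxhighlight>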
 
==Control==

==Forecasting==
* Meta [https://facebookresearch.github.io/Kats/ Kats] ([https://github.com/facebookresearch/Kats code]): Forecasting (ARIMA, Prophet, Holt-Winters, VAR), detection, feature extraction, simulation; a minimal forecasting sketch appears after this list.
* [https://arxiv.org/abs/2410.18959 Context is Key: A Benchmark for Forecasting with Essential Textual Information]
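A minimal Kats forecasting sketch, assuming <code>pip install kats</code> and its Prophet backend (the column names follow Kats' time/value convention; the data is a toy trend):

<syntaxhighlight lang="python">
# Minimal sketch: Prophet-based forecasting with Meta's Kats.
import pandas as pd
from kats.consts import TimeSeriesData
from kats.models.prophet import ProphetModel, ProphetParams

df = pd.DataFrame({
    "time": pd.date_range("2024-01-01", periods=120, freq="D"),
    "value": range(120),  # toy trend; replace with real data
})
ts = TimeSeriesData(df)

params = ProphetParams(seasonality_mode="additive")
m = ProphetModel(ts, params)
m.fit()
forecast = m.predict(steps=30)  # DataFrame with time, fcst, fcst_lower, fcst_upper
print(forecast.head())
</syntaxhighlight>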
 
==Anomaly Detection==
* 2024-10: [https://arxiv.org/abs/2410.05440 Can LLMs Understand Time Series Anomalies?] ([https://github.com/rose-stl-lab/anomllm code])
  
 
=Data=

==Vector Database==

===Open Source===

===Commercial cloud===

===MySQL===

===Database with Search===

=See Also=