AI tools
Contents
- 1 LLM
 - 2 LLM Agents
 - 3 Interfaces
 - 4 Speech Recognition (ASR) and Transcription
 - 5 Text-to-speech (TTS)
 - 6 Text-to-audio
 - 7 Vision
 - 8 Embedding
 - 9 Time Series
 - 10 Data
 - 11 See Also
 
LLM
Open-weights LLM
- 2023-07Jul-18: Llama 2 7B, 13B, 70B
 - 2024-04Apr-18: Llama 3 8B, 70B
 - 2024-06Jun-14: Nemotron-4 340B
 - 2024-07Jul-23: Llama 3.1 8B, 70B, 405B
 - 2024-07Jul-24: Mistral Large 2 123B
 - 2024-07Jul-31: Gemma 2 2B
 - 2024-08Aug-08: Qwen2-Math (hf, github) 1.5B, 7B, 72B
 - 2024-08Aug-14: Nous Research Hermes 3 (technical report) 8B, 70B, 405B
 - 2024-08Aug-19: Salesforce AI xGen-MM (BLIP-3): A Family of Open Large Multimodal Models (preprint, code)
 - 2024-09Sep-04: OLMoE: Open Mixture-of-Experts Language Models (code) 7B model (uses 1B per input token)
 - 2024-09Sep-05: Reflection 70B (demo): Trained using Reflection-Tuning, a technique developed to enable LLMs to fix their own mistakes.
 - 2024-09Sep-06: DeepSeek-V2.5 236B mixture-of-experts (160 experts, 21B active params)
 - 2024-09Sep-19: Microsoft GRadient-INformed (GRIN) MoE (demo, model, github) 6.6B
 - 2024-09Sep-23: Nvidia Llama-3_1-Nemotron-51B-instruct 51B
 - 2024-09Sep-25: Meta Llama 3.2 with visual and voice modalities 1B, 3B, 11B, 90B
 - 2024-09Sep-25: Ai2 Molmo multi-modal models 1B, 7B, 72B
 - 2024-10Oct-01: Nvidia NVLM-D-72B (includes vision)
 - 2024-10Oct-16: Mistral Ministral-8B-Instruct-2410
 - 2024-10Oct-16: Nvidia Llama-3.1-Nemotron-70B-Reward
 - 2024-11Nov-04: Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent 389B (code, weights)
 - 2024-11Nov-18: Mistral-Large-Instruct-2411 123B; and Pixtral Large multimodal model 124B (weights)
 - 2024-11Nov-22: Nvidia Hymba (blog): small and high-performance
 - 2024-12Dec-06: Meta Llama 3.3 70B
 - 2024-12Dec-26: DeepSeek-V3-Base 671B
 - 2025-01Jan-02: SmallThinker-3B-Preview (fine-tune of Qwen2.5-3b-Instruct)
 - 2025-01Jan-08: Microsoft phi-4 15B
 - 2025-01Jan-14: MiniMax-01, MiniMax-Text-01 and MiniMax-VL-01; 4M context length (paper)
 - 2025-01Jan-27: Qwen2.5-1M (report)
 - 2025-01Jan-27: DeepSeek Janus-Pro-7B (with image capabilities)
 
For Coding
Rankings: bigcode-models-leaderboard and CodeElo leaderboard
- 2024-10Oct-06: Abacus AI Dracarys2-72B-Instruct (optimized for coding, fine-tune of Qwen2.5-72B-Instruct)
 - 2024-11Nov-09: OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models (weights, preprint)
 - 2024-11Nov-13: Qwen2.5-Coder
 
Reasoning
See also: Increasing AI Intelligence > Proactive Search > CoT reasoning model
- 2024-11Nov-20: DeepSeek-R1-Lite-Preview (results, CoT)
 - 2024-11Nov-23: Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
 - 2024-11Nov-27: Alibaba Qwen QwQ 32B (model, demo)
 - 2024-12Dec-04: Ruliad Deepthought 8B (demo)
 - 2024-12Dec-24: Qwen QvQ-72B-preview (visual reasoning)
 - 2025-01Jan-10: LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs (preprint, code, weights)
 - 2025-01Jan-20: DeepSeek-R1, DeepSeek-R1-Distill-Llama-70B, DeepSeek-R1-Distill-Qwen-32B, ... (paper)
 - 2025-02Feb-10: Huginn-0125: Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach (code, model)
 - 2025-02Feb-14: DeepHermes 3 - Llama-3.1 8B
 
Cloud LLM
Multi-modal: Audio
- kyutai Open Science AI Lab chatbot moshi
 
Triage
Retrieval Augmented Generation (RAG)
Reviews
- 2024-08: Graph Retrieval-Augmented Generation: A Survey
 - 2024-09: Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely
 - 2024-12: A Survey of Query Optimization in Large Language Models
 - 2025-01: Enhancing Retrieval-Augmented Generation: A Study of Best Practices
 - 2025-01: Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG (github)
 - List of RAG techniques
 - Advanced RAG Cookbooks
 
Measuring RAG performance
- 2025-01: The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input
 
Analysis of RAG overall
Approaches
- RAGFlow (code)
 - GraphRAG (preprint, code, GraphRAG Accelerator for easy deployment on Azure)
 - AutoMetaRAG (code)
 - Verba: RAG for Weaviate vector database (code, video)
 - Microsoft: PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation
 - 2024-10: Google Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models
 - 2024-10: StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization: Reformats retrieved data into task-appropriate structures (table, graph, tree).
 - 2024-10: Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval
 - 2024-11: FastRAG: Retrieval Augmented Generation for Semi-structured Data
 - 2024-11: Microsoft LazyGraphRAG: Setting a new standard for quality and cost
 - 2024-11: Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models
 - 2025-01: Search-o1: Agentic Search-Enhanced Large Reasoning Models (project, code)
 - 2025-01: AutoRAG: RAG AutoML tool for automatically finding an optimal RAG pipeline for your data
 - 2025-01: VideoRAG: Retrieval-Augmented Generation over Video Corpus
 - 2025-02: DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
 
Open-source Implementations
- kotaemon: An open-source clean & customizable RAG UI for chatting with your documents.
 - LlamaIndex (code, docs, voice chat code)
 - Nvidia ChatRTX with RAG
 - Anthropic Customer Support Agent example
 - LangChain and LangGraph (tutorial)
   - RAGBuilder: Automatically tunes RAG hyperparameters
 - WikiChat
 - Chonkie: No-nonsense RAG chunking library (open-source, lightweight, fast)
 - autoflow: open source GraphRAG (Knowledge Graph), including conversational search page
 - RAGLite
 - nano-graphrag: A simple, easy-to-hack GraphRAG implementation
 - Dabarqus
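
Several of the tools above (Chonkie, for example) handle the chunking step of a RAG pipeline. As a rough illustration of what that step does, here is a minimal sketch of fixed-size word chunking with overlap. The function name `chunk_words` and its parameters are hypothetical and are not taken from any of the listed libraries, which typically offer token-aware or semantic chunkers instead.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words, so a sentence cut at a
    chunk boundary still appears with some context in the next chunk.
    (Hypothetical helper for illustration only.)
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(sample, chunk_size=4, overlap=1):
        print(chunk)
```

Word-based splitting is only a stand-in; real pipelines usually count tokens with the tokenizer of the embedding model.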
 
Web-based Tools
- SciSpace Chat with PDF (also available as a GPT).
 
Commercial Cloud Offerings
Automatic Optimization
Analogous to Gradient Descent
LLM for scoring/ranking
- GPTScore: Evaluate as You Desire
 - Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting
 - Domain-specific chatbots for science using embeddings
 - Large Language Models as Evaluators for Scientific Synthesis
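
The pairwise-ranking idea referenced above can be sketched as a sort driven by a two-way comparison. This is a minimal sketch under stated assumptions, not the method from any of the listed papers: `keyword_judge` is a hypothetical stand-in for an LLM call (it simply counts shared words), and asking in both orders is one simple way to reduce position bias.

```python
from functools import cmp_to_key
from typing import Callable, List

# A judge takes (query, passage_a, passage_b) and returns "A" or "B".
# In a real system this would be a prompt sent to an LLM (assumption).
Judge = Callable[[str, str, str], str]


def keyword_judge(query: str, a: str, b: str) -> str:
    """Hypothetical stand-in for an LLM: prefer the passage sharing more query words."""
    q = set(query.lower().split())
    score_a = len(q & set(a.lower().split()))
    score_b = len(q & set(b.lower().split()))
    return "A" if score_a >= score_b else "B"


def pairwise_rank(query: str, passages: List[str], judge: Judge) -> List[str]:
    """Order passages from most to least relevant using pairwise comparisons."""

    def compare(a: str, b: str) -> int:
        # Ask in both orders; a passage only "wins" if it is preferred both times.
        first = judge(query, a, b)
        second = judge(query, b, a)
        if first == "A" and second == "B":
            return -1  # a is consistently preferred, so it sorts earlier
        if first == "B" and second == "A":
            return 1   # b is consistently preferred
        return 0       # inconsistent answers are treated as a tie

    return sorted(passages, key=cmp_to_key(compare))


if __name__ == "__main__":
    docs = [
        "cats sleep a lot",
        "a vector database supports similarity search",
        "database backups",
    ]
    for doc in pairwise_rank("vector database search", docs, keyword_judge):
        print(doc)
```

A sort needs on the order of n log n comparisons, each costing two judge calls here, so this approach is best suited to re-ranking a short candidate list.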
 
LLM Agents
- See AI Agents.
 
Interfaces
Chatbot Frontend
Web (code)
Web (product)
Desktop GUI
- AnythingLLM (docs, code): includes chat-with-docs, selection of LLM and vector db, etc.
 
Alternative Text Chatbot UI
- Loom provides a tree-like interface for exploring branching LLM-generated text.
 - The Pantheon Interface is a new idea for how to interact with LLMs (live instance, code). In a traditional interaction, you prompt the bot and it replies in a turn-by-turn manner. Pantheon instead invites you to type out your thoughts, and various agents will asynchronously add comments or questions to spur along your brainstorming.
 
Conversational Audio Chatbot
- Swift: a fast AI voice assistant (code, live demo)
 - RTVI-AI (code, demo)
 - June: Local Voice Chatbot, which uses:
   - Ollama
   - Hugging Face Transformers (for speech recognition)
   - Coqui TTS Toolkit
 - kyutai Moshi chatbot (demo)
 - Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming (model, code, demo)
 - 2024-09Sep-11: Llama-3.1-8B-Omni (code), enabling end-to-end speech.
 - 2024-10Oct-18: Meta Spirit LM: open source multimodal language model that freely mixes text and speech
 
Related Research
Commercial Systems
Speech Recognition (ASR) and Transcription
Lists
Open Source
- DeepSpeech
 - speechbrain
 - Kaldi
 - wav2vec 2.0
 - Whisper
   - Whisper medium.en
   - WhisperX (includes word-level timestamps and speaker diarization)
   - Distil Large v3 with MLX
   - 2024-10: whisper-large-v3-turbo distillation (demo, code)
 - Nvidia Canary 1B
 - 2024-09: Nvidia NeMo
 - 2024-10: Rev AI models for transcription and diarization
 - 2024-10: Moonshine (optimized for resource-constrained devices)
 
In Browser
- Whisper Timestamped: Multilingual speech recognition with word-level timestamps, running locally in browser
 
Phrase Endpointing and Voice Activity Detection (VAD)
I.e., how to determine when the user is done talking and the bot should respond.
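
One simple way to frame the endpointing problem is an energy threshold with a silence "hangover": the utterance is considered finished once speech has been heard and a run of consecutive quiet frames follows. The sketch below assumes audio arrives as frames of float samples in [-1, 1]; the names `find_endpoint`, `energy_threshold`, and `min_silence_frames` are hypothetical, and production systems generally rely on trained VAD models rather than raw energy.

```python
import math
from typing import Iterable, Optional, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def find_endpoint(
    frames: Iterable[Sequence[float]],
    energy_threshold: float = 0.02,
    min_silence_frames: int = 3,
) -> Optional[int]:
    """Return the index of the frame where the utterance is judged complete.

    Returns None if speech never starts, or never ends within the input.
    """
    speech_started = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= energy_threshold:
            speech_started = True
            silent_run = 0  # any loud frame resets the silence counter
        elif speech_started:
            silent_run += 1
            if silent_run >= min_silence_frames:
                return i
    return None


if __name__ == "__main__":
    quiet = [0.0] * 160
    loud = [0.5] * 160
    stream = [quiet, quiet, loud, loud, quiet, loud, quiet, quiet, quiet, quiet]
    print(find_endpoint(stream))
```

The trade-off is in `min_silence_frames`: a small value makes the bot respond quickly but risks cutting the user off during a natural pause.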
Audio Cleanup
- Krisp AI: Noise cancellation, meeting summary, etc.
 
Text-to-speech (TTS)
Open Source
- Parler TTS (demo)
 - Toucan (demo)
 - MetaVoice (github)
 - ChatTTS
 - Camb.ai MARS5-TTS
 - Coqui TTS Toolkit
 - Fish Speech 1.4: multi-lingual, can clone voices (video, weights, demo)
 - F5-TTS (demo): cloning, emotion, etc.
 - MaskGCT (demo)
 - Amphion: An Open-Source Audio, Music and Speech Generation Toolkit (code)
 - Zyphra Zonos
 - Fish Speech (includes voice cloning)
 
Cloud
- ElevenLabs ($50/million characters)
 - Cartesia Sonic
 - Neets AI ($1/million characters)
 - Hailuo AI T2A-01-HD (try, API)
 
Text-to-audio
- 2024-12: TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization (code)
 
Vision
- Langfun library as a means of converting images into structured output.
 
Visual Models
- CLIP
 - SigLIP
 - Supervision
 - Florence-2
 - Nvidia MambaVision
 - Meta Sapiens: Foundation for Human Vision Models (video input, can infer segmentation, pose, depth-map, and surface normals)
 
Multi-modal Models (language-vision/video)
- LLaVA-NeXT-Interleave (models, demo)
 - SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
 - Nvidia NVEagle 13B, 7B (demo, preprint)
 - 2024-08Aug-29: Qwen2-VL 7B, 2B (code, models): Can process videos up to 20 minutes in length
 - 2024-09Sep-11: Mistral Pixtral 12B
 - 2024-09Sep-17: NVLM 1.0
 - 2024-12Dec-06: Nvidia NVILA: Efficient Frontier Visual Language Models
 - 2025-01Jan-28: Qwen2.5-VL
 
Optical character recognition (OCR)
- General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model (project, code, demo)
 - Swift OCR: LLM Powered Fast OCR
 - Ollama OCR
 
Related
Embedding
Image Embedding
Time Series
- Stumpy: Python library, uses near-match subsequences for similarity and forecasting
 - Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting
 - From latent dynamics to meaningful representations
 - Review of Time Series Forecasting Methods and Their Applications to Particle Accelerators
 - Time-LLM: Time Series Forecasting by Reprogramming Large Language Models
 - A decoder-only foundation model for time-series forecasting
 - TimeGPT-1
 - Unified Training of Universal Time Series Forecasting Transformers
 - xLSTMTime: Long-term Time Series Forecasting With xLSTM
 - Salesforce: Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts (code, weights, blog)
 - IBM PatchTSMixer and PatchTST (being used for particle accelerators)
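
To illustrate the "near-match subsequences" idea mentioned for Stumpy above, here is a brute-force sketch that finds the subsequence most similar to a chosen window, using z-normalized Euclidean distance. This is not Stumpy's API: the function names are hypothetical, and Stumpy computes the same kind of result far more efficiently for every window at once (the matrix profile).

```python
import math
from typing import List, Optional, Sequence, Tuple


def znorm(seq: Sequence[float]) -> List[float]:
    """Scale a sequence to zero mean and unit variance (all zeros if constant)."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / len(seq))
    if std == 0:
        return [0.0] * len(seq)
    return [(x - mean) / std for x in seq]


def nearest_subsequence(
    series: Sequence[float], start: int, m: int
) -> Tuple[Optional[int], float]:
    """Find the length-m window most similar in shape to series[start:start+m].

    Windows overlapping the query are skipped, since they would be
    trivially similar to it. Returns (start index, distance).
    """
    query = znorm(series[start:start + m])
    best_idx: Optional[int] = None
    best_dist = math.inf
    for i in range(len(series) - m + 1):
        if abs(i - start) < m:
            continue  # overlaps the query window
        cand = znorm(series[i:i + m])
        dist = math.sqrt(sum((q - c) ** 2 for q, c in zip(query, cand)))
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx, best_dist


if __name__ == "__main__":
    data = [0, 1, 2, 1, 0, 5, 5, 5, 5, 0, 2, 4, 2, 0, 7, 3]
    print(nearest_subsequence(data, start=0, m=5))
```

Because of the z-normalization, the peak `[0, 2, 4, 2, 0]` matches the query `[0, 1, 2, 1, 0]` despite having twice the amplitude; a naive forecast could then reuse whatever followed the matched window.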
 
Control
Forecasting
- Meta Kats (code): Forecasting (ARIMA, Prophet, Holt Winters, VAR), detection, feature extraction, simulation
 - Context is Key: A Benchmark for Forecasting with Essential Textual Information
 
Data
Vector Database
Open Source
- milvus (open source with paid cloud option)
 - Qdrant (open source with paid cloud option)
 - Vespa (open source with paid cloud option)
 - chroma
 - LlamaIndex
 - sqlite-vec
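
At their core, the vector databases listed above answer nearest-neighbor queries over embeddings. The following is a minimal brute-force sketch of that operation using cosine similarity; the names `cosine` and `top_k` are hypothetical and do not correspond to the API of any listed project, which use approximate indexes to scale well beyond a linear scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k stored ids most similar to the query, best first."""
    scored = [(cosine(query, vec), key) for key, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(key, score) for score, key in scored[:k]]


if __name__ == "__main__":
    toy_index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    for key, score in top_k([1.0, 0.1], toy_index, k=2):
        print(key, round(score, 3))
```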
 
Commercial cloud
MySQL
- MySQL does not traditionally have vector support, but:
   - PlanetScale is working on it
   - mysql_vss (discussion)
   - tidb (discussion)
 
Database with Search
Data Scraping
- LLM Data Scrapers list
 
Web Scraping
- Firecrawl: API to turn websites into LLM-ready markdown or structured data (can be self-hosted)
 - Crawl4AI: Crawl Smarter, Faster, Freely. For AI.
 - ScrapeGraphAI: You Only Scrape Once: web scraping python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.)
 - pipet: A swiss-army tool for scraping and extracting data from online assets
 - Scrapling: Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
 - sitefetch: Fetch entire site into text file (to be used with AIs)
 - LLM Scraper: Turn webpage into structured data using LLMs
 - Trafilatura: Discover and Extract Text Data on the Web
 
llms.txt Generator
- llms.txt Generator
 - Firecrawl LLMs.txt generator (online tool)
 
Headless Browser (scrape & automate)
Code & Github
- GitIngest: Turn any GitHub repository into a prompt-friendly text file, for inclusion in LLM's context. Available at: gitingest.com
 - repomix: Packs your entire repository into a single, AI-friendly file
 - github.gg: For analyzing GitHub repositories and providing valuable insights about code quality, dependencies, and more
 - Flatty - Codebase-to-Text for LLMs
 - CodeWeaver: Generate a Markdown Document of Your Codebase Structure and Content
 - RepoToTextForLLMs
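
The tools above share one basic operation: walk a repository and concatenate its text files into a single prompt-friendly document. A minimal sketch of that idea follows. The function name `pack_repo`, the header format, and the extension filter are all assumptions for illustration; the listed tools add features such as ignore-file handling and token counting.

```python
from pathlib import Path
from typing import List, Tuple


def pack_repo(root: str, extensions: Tuple[str, ...] = (".py", ".md", ".txt")) -> str:
    """Concatenate matching files under `root` into one string.

    Each file is preceded by a header line with its relative path.
    Hidden files and directories (names starting with ".") are skipped.
    """
    root_path = Path(root)
    parts: List[str] = []
    for path in sorted(root_path.rglob("*")):
        rel = path.relative_to(root_path)
        if not path.is_file() or path.suffix not in extensions:
            continue
        if any(part.startswith(".") for part in rel.parts):
            continue  # e.g. anything under .git/
        text = path.read_text(encoding="utf-8", errors="replace")
        parts.append(f"===== {rel.as_posix()} =====\n{text}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(pack_repo("."))
```

Filtering by extension is a crude way to keep binaries out of the output; a fuller version would also respect the repository's ignore rules.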
 
Media Files
Document Parsing
- Docling: converts multiple formats (PDF, DOCX, PPTX, Images, HTML) into Markdown and JSON
 - Microsoft Markitdown: converts various formats (PDF, Word, Excel, PPT) to Markdown (available via web interface on replit)
 - e2m: Everything to Markdown (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, and m4a)
 - Nvidia NV-ingest (code): scalable, performance-oriented document content and metadata extraction microservice
 - MegaParse: Your Parser for every type of documents (pdf, powerpoint, word)
 - Rowfill: Open-source document processing; extract, analyze, and process data from complex documents, images, PDFs and more with AI
 - Zerox: PDF to markdown vision model (OCR)
 - LlamaParse (example use for multimodal parsing)
 - Marker: PDFs and images to markdown
 
PDF Conversion
PDF Language Translation
Structured Data Extraction
- Unstract: Intelligent Document Processing (IDP): No-code LLM Platform to structure unstructured documents
 
Screenshot
- Microsoft OmniParser: Screen Parsing tool for Pure Vision Based GUI Agent