Difference between revisions of "AI tools"

From GISAXS
(Open Source)
(Reasoning)
 
(35 intermediate revisions by the same user not shown)
Line 35: Line 35:
 
* 2025-04Apr-05: Meta [https://ai.meta.com/blog/llama-4-multimodal-intelligence/ Llama 4] (109B, 400B, 2T)
 
* [https://x.com/kuchaev/status/1909444566379573646 2025-04Apr-08]: Nvidia [https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 Llama-3_1-Nemotron-Ultra-253B-v1]
 
 +
* [https://x.com/MistralAI/status/1920119463430500541 2025-05May-07]: Mistral [https://mistral.ai/news/mistral-medium-3 Medium 3]
 +
* [https://x.com/googleaidevs/status/1938279967026274383 2025-06Jun-26]: Google [https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide/ Gemma 3n] (on-device multimodal)
 +
* [https://x.com/Alibaba_Qwen/status/1953128028047102241 2025-08Aug-06]: [https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507 Qwen3-4B-Instruct-2507]
 +
* [https://x.com/GoogleDeepMind/status/1956393664248271082 2025-08Aug-15]: Google [https://developers.googleblog.com/en/introducing-gemma-3-270m/ Gemma 3 270M]
  
===For Coding===
+
===Coding===
 
Rankings: [https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard bigcode-models-leaderboard] and [https://codeelo-bench.github.io/#leaderboard-table CodeElo leaderboard]
 
* 2024-10Oct-06: [https://abacus.ai/ Abacus AI] [https://huggingface.co/abacusai/Dracarys2-72B-Instruct Dracarys2-72B-Instruct] (optimized for coding, fine-tune of [https://huggingface.co/Qwen/Qwen2.5-72B-Instruct Qwen2.5-72B-Instruct])
 
Line 42: Line 46:
 
* 2024-11Nov-13: [https://qwenlm.github.io/blog/qwen2.5-coder-family/ Qwen2.5-Coder]
 
* [https://x.com/Agentica_/status/1909700115755061374 2025-04Apr-08]: [https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder-at-O3-mini-Level-1cf81902c14680b3bee5eb349a512a51 DeepCoder-14B-Preview] ([https://github.com/agentica-project/rllm code], [https://huggingface.co/agentica-org/DeepCoder-14B-Preview hf])
 
 +
* [https://x.com/GeZhang86038849/status/1921147887871742329 2025-05May-10]: ByteDance [https://huggingface.co/ByteDance-Seed/Seed-Coder-8B-Base SeedCoder] 8B
 +
* [https://x.com/Kimi_Moonshot/status/1943687594560332025 2025-07Jul-11]: [https://moonshotai.github.io/Kimi-K2/ Kimi-K2] 1T ([https://github.com/MoonshotAI/Kimi-K2 code], [https://huggingface.co/moonshotai weights])
 +
* [https://x.com/Alibaba_Qwen/status/1947766835023335516 2025-07Jul-23]: [https://qwenlm.github.io/blog/qwen3-coder/ Qwen3-Coder-480B-A35B-Instruct] ([https://github.com/QwenLM/qwen-code code], [https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct weights])
  
 
===Reasoning===
 
Line 60: Line 67:
 
* [https://x.com/kuchaev/status/1902078122792775771 2025-03Mar-18]: Nvidia [https://huggingface.co/collections/nvidia/llama-nemotron-67d92346030a2691293f200b Llama Nemotron] 8B, 49B ([https://build.nvidia.com/nvidia/llama-3_3-nemotron-super-49b-v1 demo])
 
* [https://x.com/Agentica_/status/1909700115755061374 2025-04Apr-08]: [https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder-at-O3-mini-Level-1cf81902c14680b3bee5eb349a512a51 DeepCoder-14B-Preview] ([https://github.com/agentica-project/rllm code], [https://huggingface.co/agentica-org/DeepCoder-14B-Preview hf])
 
 +
* 2025-04Apr-10: ByteDance [https://github.com/ByteDance-Seed/Seed-Thinking-v1.5 Seed-Thinking-v1.5] 200B
 +
* [https://x.com/ZyphraAI/status/1910362745423425966 2025-04Apr-11]: [https://www.zyphra.com/ Zyphra] [https://www.zyphra.com/post/introducing-zr1-1-5b-a-small-but-powerful-math-code-reasoning-model ZR1-1.5B] ([https://huggingface.co/Zyphra/ZR1-1.5B weights], [https://playground.zyphra.com/sign-in use])
 +
* [https://x.com/Alibaba_Qwen/status/1916962087676612998 2025-04Apr-29]: [https://qwenlm.github.io/blog/qwen3/ Qwen3] 0.6B to 235B ([https://github.com/QwenLM/Qwen3 code], [https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f weights], [https://modelscope.cn/home modelscope])
 +
* [https://x.com/DimitrisPapail/status/1917731614899028190 2025-04Apr-30]: [https://huggingface.co/microsoft/Phi-4-reasoning Phi-4 Reasoning] 14B ([https://www.microsoft.com/en-us/research/wp-content/uploads/2025/04/phi_4_reasoning.pdf tech report])
 +
* [https://x.com/deepseek_ai/status/1928061589107900779 2025-05May-28]: [https://huggingface.co/deepseek-ai/DeepSeek-R1-0528 DeepSeek-R1-0528]
 +
* [https://x.com/MistralAI/status/1932441507262259564 2025-06Jun-10]: Mistral [https://mistral.ai/static/research/magistral.pdf Magistral] 24B ([https://huggingface.co/mistralai/Magistral-Small-2506 weights])
 +
* [https://x.com/LoubnaBenAllal1/status/1942614508549333211 2025-07Jul-08]: [https://huggingface.co/blog/smollm3 SmolLM3]: smol, multilingual, long-context reasoner
 +
* [https://x.com/OpenAI/status/1952776916517404876 2025-08Aug-05]: [https://openai.com/open-models/ OpenAI] gpt-oss-120b, gpt-oss-20b
 +
* [https://x.com/Alibaba_Qwen/status/1953128028047102241 2025-08Aug-06]: [https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507 Qwen3-4B-Thinking-2507]
 +
* 2025-09Sep: [https://huggingface.co/LLM360/K2-Think K2-Think] 32B
  
 
===Agentic===
 
Line 80: Line 97:
 
* [https://x.com/DeepLearningAI/status/1903295570527002729 2025-03Mar-23]: Cohere [https://cohere.com/blog/aya-vision Aya Vision] 8B, 32B ([https://huggingface.co/collections/CohereForAI/c4ai-aya-vision-67c4ccd395ca064308ee1484?ref=cohere-ai.ghost.io weights])
 
* [https://x.com/Alibaba_Qwen/status/1904227859616641534 2025-03Mar-24]: Alibaba [https://qwenlm.github.io/blog/qwen2.5-vl-32b/ Qwen2.5-VL-32B-Instruct] ([https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct weights])
 
 +
* 2025-05May-20: ByteDance [https://bagel-ai.org/ BAGEL: Unified Model for Multimodal Understanding and Generation] 7B ([https://huggingface.co/ByteDance-Seed/BAGEL-7B-MoT weights], [https://github.com/bytedance-seed/BAGEL code], [https://demo.bagel-ai.org/ demo])
  
 
====Language/Vision/Speech====
 
Line 89: Line 107:
 
* 2025-03Mar-11: [https://github.com/soham97/mellow Mellow]: a small audio language model for reasoning, 167M ([https://arxiv.org/abs/2503.08540 paper])
 
* 2025-03Mar-12: [https://research.nvidia.com/labs/adlr/AF2/ Audio Flamingo 2] 0.5B, 1.5B, 3B ([https://arxiv.org/abs/2503.03983 paper], [https://github.com/NVIDIA/audio-flamingo code])
 
 +
 +
===RAG===
 +
* 2025-04: [https://huggingface.co/collections/PleIAs/pleias-rag-680a0d78b058fffe4c16724d Pleias-RAG] 350M, 1.2B
 +
** Paper: [http://ragpdf.pleias.fr/ Even Small Reasoners Should Quote Their Sources: Introducing Pleias-RAG Model Family]
 +
* 2025-04: Meta ReasonIR 8B: [https://arxiv.org/abs/2504.20595 ReasonIR: Training Retrievers for Reasoning Tasks]
  
 
==Cloud LLM==
 
Line 249: Line 272:
 
* 2024-10: [https://www.rev.ai/ Rev AI] [https://huggingface.co/Revai models] for [https://huggingface.co/Revai/reverb-asr transcription] and [https://huggingface.co/Revai/reverb-diarization-v2 diarization]
 
* 2024-10: [https://github.com/usefulsensors/moonshine Moonshine] (optimized for resource-constrained devices)
 
 +
* 2025-05: [https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2 Parakeet TDT 0.6B V2]
 +
* [https://x.com/kyutai_labs/status/1925840420187025892 2025-05]: [https://kyutai.org/ Kyutai] [https://unmute.sh/ Unmute]
  
 
==In Browser==
 
Line 262: Line 287:
 
==Audio Cleanup==
 
* [https://krisp.ai/ Krisp AI]: Noise cancellation, meeting summary, etc.
 
 +
 +
==Auto Video Transcription==
 +
* [https://www.translate.mom/ TranslateMom]
 +
* [https://github.com/abus-aikorea/voice-pro Voice-Pro]: YouTube downloader, speech separation, transcription, translation, TTS, and voice cloning toolkit for creators
  
 
=Text-to-speech (TTS)=
 
Line 279: Line 308:
 
* [https://canopylabs.ai/ Canopy] [https://huggingface.co/collections/canopylabs/orpheus-tts-67d9ea3f6c05a941c06ad9d2 Orpheus] 3B
 
* Canopy [https://canopylabs.ai/releases/orpheus_can_speak_any_language Orpheus Multilingual]
 
 +
* [https://narilabs.org/ Nari Labs] [https://github.com/nari-labs/dia Dia]
 +
* [https://kyutai.org/ Kyutai] [https://kyutai.org/next/tts TTS] [https://unmute.sh/ Unmute]
 +
* [https://github.com/resemble-ai/chatterbox Chatterbox TTS] ([https://huggingface.co/spaces/ResembleAI/Chatterbox try])
 +
* [https://play.ai/ Play AI] [https://github.com/playht/PlayDiffusion PlayDiffusion] ([https://huggingface.co/spaces/PlayHT/PlayDiffusion demo], [https://x.com/_mfelfel/status/1929586464125239589 example])
 +
* Mistral [https://mistral.ai/news/voxtral Voxtral]
 +
* Kitten TTS ([https://github.com/KittenML/KittenTTS github], [https://huggingface.co/KittenML/kitten-tts-nano-0.1 hf]) 15M (fast, lightweight)
 +
* Microsoft [https://microsoft.github.io/VibeVoice/ VibeVoice] 1.5B
  
 
==Cloud==
 
Line 318: Line 354:
 
==Text Embedding==
 
* 2024-12: [https://huggingface.co/blog/modernbert ModernBERT]
 
* 2025-02: [https://huggingface.co/chandar-lab/NeoBERT NeoBERT]
+
* 2025-02: [https://huggingface.co/chandar-lab/NeoBERT NeoBERT] ([https://arxiv.org/abs/2502.19587 preprint])
 
* 2025-03: [https://developers.googleblog.com/en/gemini-embedding-text-model-now-available-gemini-api/ gemini-embedding-exp-03-07]
 
  
Line 372: Line 408:
  
 
=See Also=
 
 +
* [[AI]]
 +
** [[Data Extraction]]
 +
** [[AI compute]]
 
* [[AI agents]]
 
* [[AI understanding]]
 
* [[AI compute]]
 
 
* [[Robots]]
 

Latest revision as of 12:26, 12 September 2025

LLM

Open-weights LLM

Coding

Rankings: bigcode-models-leaderboard and CodeElo leaderboard

Reasoning

See also: Increasing AI Intelligence > Proactive Search > CoT reasoning model

Agentic

Multimodal

Language/Vision

Language/Vision/Speech

Language/Audio

RAG

Cloud LLM

Multi-modal: Audio

Triage

Retrieval Augmented Generation (RAG)

Reviews

Measuring RAG performance

Analysis of RAG overall

Approaches

Open-source Implementations

Web-based Tools

  • SciSpace Chat with PDF (also available as a GPT).

Commercial Cloud Offerings

LLM for scoring/ranking

LLM Agents

Interfaces

Chatbot Frontend

Web (code)

Web (product)

Desktop GUI

Alternative Text Chatbot UI

  • Loom provides a tree-like structure for exploring branched writings generated by an LLM.
  • The Pantheon Interface proposes a different way to interact with LLMs (live instance, code). In a traditional interaction, you prompt the bot and it replies turn by turn. Pantheon instead invites you to type out your thoughts, while various agents asynchronously add comments or questions to spur your brainstorming along.

Conversational Audio Chatbot

Turn Detection

Related Research

Commercial Systems

Speech Recognition (ASR) and Transcription

Lists

Open Source

In Browser

  • Whisper Timestamped: Multilingual speech recognition with word-level timestamps, running locally in the browser

Phrase Endpointing and Voice Activity Detection (VAD)

That is, how can the system determine when the user has finished talking and the bot should respond?
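
As a rough illustration of the endpointing question, the sketch below marks an utterance as finished once speech has been heard and is followed by a sustained run of low-energy frames. It is a minimal energy-threshold baseline, not the method used by any tool listed on this page. The function names, the threshold of 0.02, and the 25-frame silence window are all assumptions chosen for illustration. Production systems typically rely on trained VAD or turn-detection models instead.

```python
import math
from typing import Optional, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples assumed in [-1, 1])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_endpoint(
    frames: Sequence[Sequence[float]],
    energy_threshold: float = 0.02,  # assumed value; tune per microphone and noise level
    min_silence_frames: int = 25,    # assumed: 25 frames of 20 ms each, about 500 ms
) -> Optional[int]:
    """Return the index of the first frame of the silence that ends the utterance,
    or None if speech has not started or has not yet been followed by enough silence."""
    heard_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if frame_rms(frame) >= energy_threshold:
            # Speech-like frame: remember that speech occurred and reset the silence counter.
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Quiet frame after speech: extend the current silence run.
            silent_run += 1
            if silent_run >= min_silence_frames:
                return i - min_silence_frames + 1
    return None
```

A caller would split incoming audio into fixed-length frames, pass the frames received so far to `find_endpoint`, and trigger the bot's reply once it returns an index. A longer `min_silence_frames` reduces interruptions at the cost of slower responses.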

Audio Cleanup

  • Krisp AI: Noise cancellation, meeting summary, etc.

Auto Video Transcription

  • TranslateMom
  • Voice-Pro: YouTube downloader, speech separation, transcription, translation, TTS, and voice cloning toolkit for creators

Text-to-speech (TTS)

Open Source

Cloud

Text-to-audio

Vision

Visual Models

Depth

Superresolution

Related

Embedding

Text Embedding

Image Embedding

Time Series

Control

Forecasting

Anomaly Detection

Data

Vector Database

Open Source

Commercial cloud

MySQL

Database with Search

See Also