=LLM=

==Open-weights LLM==
 
* 2025-01Jan-27: DeepSeek [https://huggingface.co/deepseek-ai/Janus-Pro-7B Janus-Pro-7B] (with image capabilities)
* [https://x.com/cohere/status/1900170005519753365 2025-03Mar-14]: Cohere [https://cohere.com/blog/command-a Command A] ([https://huggingface.co/CohereForAI/c4ai-command-a-03-2025?ref=cohere-ai.ghost.io weights])
* [https://x.com/MistralAI/status/1901668499832918151 2025-03Mar-17]: [https://mistral.ai/news/mistral-small-3-1 Mistral Small 3.1] 24B ([https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503 weights])
* [https://x.com/deepseek_ai/status/1904526863604883661 2025-03Mar-24]: [https://huggingface.co/deepseek-ai/DeepSeek-V3-0324 DeepSeek-V3-0324] 685B
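Checkpoints like these can generally be run locally via Hugging Face transformers. A minimal sketch, assuming a recent transformers/accelerate install and enough GPU memory for the chosen model (the model id is the Command A repository linked above; the prompt and generation settings are illustrative):

<syntaxhighlight lang="python">
# Minimal sketch: run an open-weights chat model locally with Hugging Face transformers.
# Assumes `pip install transformers accelerate` and sufficient GPU memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/c4ai-command-a-03-2025"  # repo linked above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize what GISAXS measures, in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
</syntaxhighlight>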
  
 
===For Coding===

Rankings: bigcode-models-leaderboard and CodeElo leaderboard

===Reasoning===

See also: Increasing AI Intelligence > Proactive Search > CoT reasoning model
 
* [https://x.com/Alibaba_Qwen/status/1897361654763151544 2025-03Mar-05]: Qwen [https://qwenlm.github.io/blog/qwq-32b/ QwQ-32B] ([https://huggingface.co/spaces/Qwen/QwQ-32B-Demo demo])
* [https://x.com/BlinkDL_AI/status/1898579674575552558 2025-03Mar-05]: [https://github.com/BlinkDL/RWKV-LM RWKV7-G1] "GooseOne" 0.1B ([https://huggingface.co/BlinkDL/rwkv7-g1 weights], [https://arxiv.org/abs/2305.13048 preprint])
* [https://x.com/LG_AI_Research/status/1901803002052436323 2025-03Mar-17]: LG AI Research [https://www.lgresearch.ai/blog/view?seq=543 EXAONE Deep] 2.4B, 7.8B, 32B ([https://huggingface.co/LGAI-EXAONE/EXAONE-Deep-32B weights])
* [https://x.com/kuchaev/status/1902078122792775771 2025-03Mar-18]: Nvidia [https://huggingface.co/collections/nvidia/llama-nemotron-67d92346030a2691293f200b Llama Nemotron] 8B, 49B ([https://build.nvidia.com/nvidia/llama-3_3-nemotron-super-49b-v1 demo])
  
 
===Agentic===

===Multimodal===

====Language/Vision====
 
* [https://x.com/CohereForAI/status/1896923657470886234 2025-03Mar-05]: Cohere [https://cohere.com/research/aya Aya] 8B, 32B
* 2025-03Mar-12: Google [https://developers.googleblog.com/en/introducing-gemma3/ Gemma 3] 1B, 4B, 12B, 27B ([https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf technical report])
* [https://x.com/DeepLearningAI/status/1903295570527002729 2025-03Mar-23]: Cohere [https://cohere.com/blog/aya-vision Aya Vision] 8B, 32B ([https://huggingface.co/collections/CohereForAI/c4ai-aya-vision-67c4ccd395ca064308ee1484?ref=cohere-ai.ghost.io weights])
* [https://x.com/Alibaba_Qwen/status/1904227859616641534 2025-03Mar-24]: Alibaba [https://qwenlm.github.io/blog/qwen2.5-vl-32b/ Qwen2.5-VL-32B-Instruct] ([https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct weights])
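Vision-language checkpoints like these can be queried through the transformers "image-text-to-text" pipeline. A hedged sketch, assuming a transformers release recent enough to support Qwen2.5-VL (the image URL is a placeholder):

<syntaxhighlight lang="python">
# Hedged sketch: ask a vision-language model about an image via the
# "image-text-to-text" pipeline. Assumes a recent transformers release.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Qwen/Qwen2.5-VL-32B-Instruct", device_map="auto")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/detector_image.png"},  # placeholder
        {"type": "text", "text": "Describe any symmetry in this scattering pattern."},
    ],
}]
out = pipe(text=messages, max_new_tokens=128)
print(out[0]["generated_text"])
</syntaxhighlight>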
  
 
====Language/Vision/Speech====
 
* 2025-02Feb-27: Microsoft [https://huggingface.co/microsoft/Phi-4-multimodal-instruct Phi-4-multimodal-instruct] (language, vision, speech)
* [https://x.com/kyutai_labs/status/1903082848547906011 2025-03Mar-21]: kyutai [https://kyutai.org/moshivis MoshiVis] ([https://vis.moshi.chat/ demo])
* [https://x.com/Alibaba_Qwen/status/1904944923159445914 2025-03Mar-26]: [https://qwenlm.github.io/blog/qwen2.5-omni/ Qwen2.5-Omni-7B] ([https://github.com/QwenLM/Qwen2.5-Omni/blob/main/assets/Qwen2.5_Omni.pdf tech report], [https://github.com/QwenLM/Qwen2.5-Omni code], [https://huggingface.co/Qwen/Qwen2.5-Omni-7B weights])

====Language/Audio====

==Cloud LLM==

===Multi-modal: Audio===

===Triage===

=Retrieval Augmented Generation (RAG)=

==Reviews==

===Measuring RAG performance===

===Analysis of RAG overall===

==Approaches==

==Open-source Implementations==

==Web-based Tools==

* SciSpace Chat with PDF (also available as a GPT).

==Commercial Cloud Offerings==

=LLM for scoring/ranking=

=LLM Agents=

=Interfaces=

==Chatbot Frontend==

===Web (code)===

===Web (product)===

===Desktop GUI===

==Alternative Text Chatbot UI==

* Loom provides a tree-like structure for exploring branched LLM writing.
* The Pantheon Interface is a new idea for how to interact with LLMs (live instance, code). In a traditional interaction, you prompt the bot and it replies in a turn-by-turn manner. Pantheon instead invites you to type out your thoughts, while various agents asynchronously add comments or questions to spur along your brainstorming.

==Conversational Audio Chatbot==

===Turn Detection===

====Related Research====

====Commercial Systems====

=Speech Recognition (ASR) and Transcription=

==Lists==

==Open Source==

===In Browser===

* Whisper Timestamped: multilingual speech recognition with word-level timestamps, running locally in the browser.
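A hedged sketch of the same idea (word-level timestamps) done server-side with the transformers Whisper pipeline, as a stand-in for the in-browser build (the audio filename is a placeholder):

<syntaxhighlight lang="python">
# Hedged sketch: word-level timestamped transcription with the transformers
# Whisper pipeline (server-side stand-in for Whisper Timestamped in the browser).
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
result = asr("meeting.wav", return_timestamps="word")  # "meeting.wav" is a placeholder
print(result["text"])
for chunk in result["chunks"]:  # e.g. {'text': ' hello', 'timestamp': (0.3, 0.6)}
    print(chunk["timestamp"], chunk["text"])
</syntaxhighlight>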
==Phrase Endpointing and Voice Activity Detection (VAD)==

I.e., how do you determine when the user is done talking and the bot should respond? A minimal sketch follows.
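The sketch below uses Silero VAD as an assumed example tool (the section's own tool links were not captured here); it requires torch and torchaudio, and the recording filename and endpointing threshold are placeholders:

<syntaxhighlight lang="python">
# Minimal endpointing sketch with Silero VAD (an assumed choice), loaded via torch.hub.
import torch

model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
(get_speech_timestamps, _, read_audio, _, _) = utils

wav = read_audio("user_turn.wav", sampling_rate=16000)  # placeholder recording
speech = get_speech_timestamps(wav, model, sampling_rate=16000)
# Each entry is {'start': sample, 'end': sample}. A simple endpointing rule:
# if the last speech segment ended more than ~700 ms ago, treat the turn as finished.
print(speech)
</syntaxhighlight>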
==Audio Cleanup==

* Krisp AI: noise cancellation, meeting summary, etc.

=Text-to-speech (TTS)=

==Open Source==
 
* [https://www.zyphra.com/ Zyphra] [https://huggingface.co/Zyphra/Zonos-v0.1-hybrid Zonos]
* [https://github.com/fishaudio/fish-speech Fish Speech] (includes voice cloning)
* [https://canopylabs.ai/ Canopy] [https://huggingface.co/collections/canopylabs/orpheus-tts-67d9ea3f6c05a941c06ad9d2 Orpheus] 3B
  
 
==Cloud==

=Text-to-audio=
 
* 2024-12: [https://tangoflux.github.io/ TangoFlux]: [https://arxiv.org/abs/2412.21037 Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization] ([https://github.com/declare-lab/TangoFlux code])
* 2025-03: [https://arxiv.org/abs/2503.10522 AudioX: Diffusion Transformer for Anything-to-Audio Generation]
  
 
=Vision=

==Visual Models==
 
* Nvidia [https://github.com/NVlabs/MambaVision MambaVision]
* Meta [https://about.meta.com/realitylabs/codecavatars/sapiens Sapiens: Foundation for Human Vision Models] (video input, can infer segmentation, pose, depth-map, and surface normals)
 
==Depth==
* 2024-06: [https://arxiv.org/abs/2406.09414 Depth Anything V2] ([https://github.com/DepthAnything/Depth-Anything-V2 code])
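A minimal sketch via the transformers depth-estimation pipeline, assuming the Hugging Face port of the small variant (checkpoint id follows the hub's naming; the image filename is a placeholder):

<syntaxhighlight lang="python">
# Minimal sketch: monocular depth estimation with the Hugging Face port of
# Depth Anything V2 (small variant; checkpoint id assumed from the hub).
from transformers import pipeline
from PIL import Image

depth = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")
result = depth(Image.open("sample.jpg"))  # "sample.jpg" is a placeholder
result["depth"].save("sample_depth.png")  # PIL image of the predicted depth map
</syntaxhighlight>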
==Superresolution==
* 2025-03: [https://arxiv.org/abs/2311.17643 Thera: Aliasing-Free Arbitrary-Scale Super-Resolution with Neural Heat Fields] ([https://github.com/prs-eth/thera code], [https://huggingface.co/spaces/prs-eth/thera use])
  
 
==Related==

=Embedding=

==Text Embedding==

==Image Embedding==

=Time Series=
 
* Salesforce: [https://arxiv.org/abs/2410.10469 Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts] ([https://github.com/SalesforceAIResearch/uni2ts/tree/main/project/moirai-moe-1 code], [https://huggingface.co/collections/Salesforce/moirai-r-models-65c8d3a94c51428c300e0742 weights], [https://www.salesforce.com/blog/time-series-morai-moe/ blog])
* IBM [https://huggingface.co/docs/transformers/en/model_doc/patchtsmixer PatchTSMixer] and [https://huggingface.co/docs/transformers/en/model_doc/patchtst PatchTST] (being [https://research.ibm.com/blog/time-series-AI-transformers used] for particle accelerators)
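The PatchTST classes ship with transformers; a hedged sketch of the forecasting interface (the config values are illustrative, and the model below is untrained, so its outputs are random):

<syntaxhighlight lang="python">
# Hedged sketch: the transformers PatchTST forecasting interface.
import torch
from transformers import PatchTSTConfig, PatchTSTForPrediction

config = PatchTSTConfig(
    num_input_channels=1,   # univariate series
    context_length=512,     # history window fed to the model
    prediction_length=96,   # forecast horizon
)
model = PatchTSTForPrediction(config)  # randomly initialized; train before real use

past_values = torch.randn(1, 512, 1)   # (batch, context_length, channels)
with torch.no_grad():
    out = model(past_values=past_values)
print(out.prediction_outputs.shape)     # -> torch.Size([1, 96, 1])
</syntaxhighlight>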
 
==Control==

==Forecasting==
* Meta [https://facebookresearch.github.io/Kats/ Kats] ([https://github.com/facebookresearch/Kats code]): Forecasting (ARIMA, Prophet, Holt-Winters, VAR), detection, feature extraction, simulation; a minimal forecasting sketch appears after this list.
* [https://arxiv.org/abs/2410.18959 Context is Key: A Benchmark for Forecasting with Essential Textual Information]
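A minimal Kats forecasting sketch, assuming <code>pip install kats</code> and its Prophet backend (the column names follow Kats' time/value convention; the data is a toy trend):

<syntaxhighlight lang="python">
# Minimal sketch: Prophet-based forecasting with Meta's Kats.
import pandas as pd
from kats.consts import TimeSeriesData
from kats.models.prophet import ProphetModel, ProphetParams

df = pd.DataFrame({
    "time": pd.date_range("2024-01-01", periods=120, freq="D"),
    "value": range(120),  # toy trend; replace with real data
})
ts = TimeSeriesData(df)

params = ProphetParams(seasonality_mode="additive")
m = ProphetModel(ts, params)
m.fit()
forecast = m.predict(steps=30)  # DataFrame with time, fcst, fcst_lower, fcst_upper
print(forecast.head())
</syntaxhighlight>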
 
==Anomaly Detection==
* 2024-10: [https://arxiv.org/abs/2410.05440 Can LLMs Understand Time Series Anomalies?] ([https://github.com/rose-stl-lab/anomllm code])
  
 
=Data=

==Vector Database==

===Open Source===

===Commercial cloud===

===MySQL===

===Database with Search===

=See Also=