Difference between revisions of "AI tools"

From GISAXS
(Time Series)
(Embedding)
 
(11 intermediate revisions by the same user not shown)
Line 28: Line 28:
 
* [https://x.com/SebastienBubeck/status/1877010995727470877 2025-01Jan-08]: Microsoft [https://huggingface.co/microsoft/phi-4 phi-4] 15B
 
 
* [https://x.com/MiniMax__AI/status/1879226391352549451 2025-01Jan-14]: [https://www.minimaxi.com/en/news/minimax-01-series-2 MiniMax-01], MiniMax-Text-01 and MiniMax-VL-01; 4M context length ([https://www.minimaxi.com/en/news/minimax-01-series-2 paper])
 
 +
* 2025-01Jan-27: [https://qwenlm.github.io/blog/qwen2.5-1m/ Qwen2.5-1M] ([https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2.5-1M/Qwen2_5_1M_Technical_Report.pdf report])
 +
* 2025-01Jan-27: DeepSeek [https://huggingface.co/deepseek-ai/Janus-Pro-7B Janus-Pro-7B] (with image capabilities)
  
 
===For Coding===
 
Line 36: Line 38:
  
 
===Reasoning===
 
 +
See also: [[Increasing_AI_Intelligence|Increasing AI Intelligence]] > Proactive Search > [[Increasing_AI_Intelligence#CoT_reasoning_model|CoT reasoning model]]
 
* [https://x.com/deepseek_ai/status/1859200141355536422 2024-11Nov-20]: DeepSeek-R1-Lite-Preview ([https://x.com/deepseek_ai/status/1859200145037869485 results], [https://x.com/teortaxesTex/status/1859259359630356955 CoT])
 
 
* 2024-11Nov-23: [https://arxiv.org/abs/2411.14405 Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions]
 
Line 110: Line 113:
 
* [https://platform.vectorize.io/ Vectorize]
 
 
* [https://www.voyageai.com/ Voyage AI]
 
 +
* [https://abacus.ai/ Abacus AI]
  
 
===Document Parsing===
 
Line 117: Line 121:
 
* Nvidia [https://docs.nvidia.com/nv-ingest/user-guide/index.html NV-ingest] ([https://github.com/NVIDIA/nv-ingest code]): scalable, performance-oriented document content and metadata extraction microservice
 
 
* [https://github.com/QuivrHQ/MegaParse MegaParse]: Your Parser for every type of document (pdf, powerpoint, word)
 
 +
* [https://github.com/harishdeivanayagam/rowfill Rowfill]: Open-source document processing; extract, analyze, and process data from complex documents, images, PDFs and more with AI
  
 
====PDF Conversion====
 
Line 258: Line 263:
 
* 2024-09Sep-17: [https://nvlm-project.github.io/ NVLM 1.0]
 
 
* 2024-12Dec-06: Nvidia [https://arxiv.org/abs/2412.04468 NVILA: Efficient Frontier Visual Language Models]
 
 +
* [https://x.com/Alibaba_Qwen/status/1883954247743725963 2025-01Jan-28]: [https://huggingface.co/collections/Qwen/qwen25-vl-6795ffac22b334a837c0f9a5 Qwen2.5-VL]
  
 
==Optical character recognition (OCR)==
 
 
* [https://arxiv.org/abs/2409.01704 General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model] ([https://huggingface.co/stepfun-ai/GOT-OCR2_0 project], [https://github.com/Ucas-HaoranWei/GOT-OCR2.0/ code], [https://huggingface.co/spaces/stepfun-ai/GOT_official_online_demo demo])
 
 
* [https://github.com/yigitkonur/swift-ocr-llm-powered-pdf-to-markdown Swift OCR: LLM Powered Fast OCR]
 
 +
 +
==Related==
 +
* 2019-11: [https://arxiv.org/abs/1911.11763 SuperGlue: Learning Feature Matching with Graph Neural Networks] ([https://huggingface.co/docs/transformers/main/en/model_doc/superglue hf])
  
 
=Embedding=
 
 
* [https://www.marktechpost.com/2024/07/28/a-comparison-of-top-embedding-libraries-for-generative-ai/ A Comparison of Top Embedding Libraries for Generative AI]
 
 
* 2024-12: [https://huggingface.co/blog/modernbert modernBERT]
 
 +
==Image Embedding==
 +
* 2025-01: [https://arxiv.org/abs/2501.18593 Diffusion Autoencoders are Scalable Image Tokenizers] ([https://yinboc.github.io/dito/ project], [https://github.com/yinboc/dito code])
  
 
=Time Series=
 
Line 271: Line 282:
 
* [https://arxiv.org/abs/1912.09363 Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting]
 
 
* [https://arxiv.org/abs/2209.00905 From latent dynamics to meaningful representations]
 
 +
* [https://arxiv.org/abs/2209.10705 Review of Time Series Forecasting Methods and Their Applications to Particle Accelerators]
 
* [https://arxiv.org/abs/2310.01728 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models]
 
 
* [https://arxiv.org/abs/2310.10688 A decoder-only foundation model for time-series forecasting]
 
Line 278: Line 290:
 
* Salesforce: [https://arxiv.org/abs/2410.10469 Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts] ([https://github.com/SalesforceAIResearch/uni2ts/tree/main/project/moirai-moe-1 code], [https://huggingface.co/collections/Salesforce/moirai-r-models-65c8d3a94c51428c300e0742 weights], [https://www.salesforce.com/blog/time-series-morai-moe/ blog])
 
 
* IBM [https://huggingface.co/docs/transformers/en/model_doc/patchtsmixer PatchTSMixer] and [https://huggingface.co/docs/transformers/en/model_doc/patchtst PatchTST] (being [https://research.ibm.com/blog/time-series-AI-transformers used] for particle accelerators)
 
 +
  
 
==Control==
 
Line 313: Line 326:
 
* [https://github.com/unclecode/crawl4ai Crawl4AI: Crawl Smarter, Faster, Freely. For AI.]
 
 
* [https://github.com/ScrapeGraphAI/Scrapegraph-ai ScrapeGraphAI: You Only Scrape Once]: web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.)
 
 +
* [https://github.com/bjesus/pipet pipet]: A Swiss-army tool for scraping and extracting data from online assets
 +
* [https://github.com/ScrapeGraphAI/Scrapegraph-ai ScrapeGraphAI: You Only Scrape Once]
 +
* [https://github.com/D4Vinci/Scrapling Scrapling]: Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
 +
 
===Headless Browser (scrape & automate)===
 
 
* [https://github.com/lightpanda-io/browser Lightpanda Browser]
 

Latest revision as of 16:32, 4 February 2025

LLM

Open-weights LLM

For Coding

Rankings: bigcode-models-leaderboard and CodeElo leaderboard

Reasoning

See also: Increasing AI Intelligence > Proactive Search > CoT reasoning model

Cloud LLM

Multi-modal: Audio

Triage

Retrieval Augmented Generation (RAG)

Reviews

Measuring RAG performance

Analysis of RAG overall

Approaches

Open-source Implementations

Web-based Tools

  • SciSpace Chat with PDF (also available as a GPT).

Commercial Cloud Offerings

Document Parsing

  • Docling: converts multiple formats (PDF, DOCX, PPTX, Images, HTML) into Markdown and JSON
  • Microsoft Markitdown: converts various formats (PDF, Word, Excel, PPT) to Markdown (available via web interface on Replit)
  • e2m: Everything to Markdown (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, and m4a)
  • Nvidia NV-ingest (code): scalable, performance-oriented document content and metadata extraction microservice
  • MegaParse: Your Parser for every type of document (pdf, powerpoint, word)
  • Rowfill: Open-source document processing; extract, analyze, and process data from complex documents, images, PDFs and more with AI

PDF Conversion

Automatic Optimization

Analogous to Gradient Descent

LLM for scoring/ranking

LLM Agents

Interfaces

Chatbot Frontend

Web (code)

Web (product)

Desktop GUI

Alternative Text Chatbot UI

  • Loom provides a tree-like interface for exploring branching LLM-generated text.
  • The Pantheon Interface is a new idea for how to interact with LLMs (live instance, code). In a traditional interaction, you prompt the bot and it replies in a turn-by-turn manner. Pantheon instead invites you to type out your thoughts, and various agents will asynchronously add comments or questions to spur along your brainstorming.

Conversational Audio Chatbot

Related Research

Commercial Systems

Speech Recognition (ASR) and Transcription

Lists

Open Source

In Browser

  • Whisper Timestamped: Multilingual speech recognition with word-level timestamps, running locally in browser

Phrase Endpointing and Voice Activity Detection (VAD)

I.e., how to determine when the user is done talking and the bot should respond.

Audio Cleanup

  • Krisp AI: Noise cancellation, meeting summary, etc.

Text-to-speech (TTS)

Open Source

Cloud

Text-to-audio

Vision

Visual Models

Multi-modal Models (language-vision/video)

Optical character recognition (OCR)

Related

Embedding

Image Embedding

Time Series


Control

Forecasting

Data

Vector Database

Open Source

Commercial cloud

MySQL

Database with Search

Web Scraping

Headless Browser (scrape & automate)

Github

See Also