Difference between revisions of "AI tools"

From GISAXS
(Embedding)
(Reasoning)
 
(36 intermediate revisions by the same user not shown)
Line 46: Line 46:
 
* 2025-01Jan-10: [https://mbzuai-oryx.github.io/LlamaV-o1/ LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs] ([https://arxiv.org/abs/2501.06186 preprint], [https://github.com/mbzuai-oryx/LlamaV-o1 code], [https://huggingface.co/omkarthawakar/LlamaV-o1 weights])
 
 
* [https://x.com/deepseek_ai/status/1881318130334814301 2025-01Jan-20]: [https://huggingface.co/deepseek-ai/DeepSeek-R1 DeepSeek-R1], [https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B DeepSeek-R1-Distill-Llama-70B], DeepSeek-R1-Distill-Qwen-32B, ... ([https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf paper])
 
 +
* 2025-02Feb-10: [https://huggingface.co/tomg-group-umd/huginn-0125 Huginn-0125]: [https://arxiv.org/abs/2502.05171 Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach] ([https://github.com/seal-rg/recurrent-pretraining code], [https://huggingface.co/tomg-group-umd/huginn-0125 model])
 +
* [https://x.com/NousResearch/status/1890148000204485088 2025-02Feb-14]: [https://huggingface.co/NousResearch/DeepHermes-3-Llama-3-8B-Preview DeepHermes 3 - Llama-3.1 8B]
 +
* [https://x.com/Alibaba_Qwen/status/1894130603513319842 2025-02Feb-24]: [https://qwenlm.github.io/blog/qwq-max-preview/ QwQ-Max-Preview] ([https://chat.qwen.ai/ online demo])
  
 
==Cloud LLM==
 
Line 57: Line 60:
  
 
==Retrieval Augmented Generation (RAG)==
 
 +
* See Also: [[AI_tools#Document_Parsing|Document Parsing]]
 +
 
===Reviews===
 
 
* 2024-08: [https://arxiv.org/abs/2408.08921 Graph Retrieval-Augmented Generation: A Survey]
 
Line 62: Line 67:
 
* 2024-12: [https://arxiv.org/abs/2412.17558 A Survey of Query Optimization in Large Language Models]
 
 
* 2025-01: [https://arxiv.org/abs/2501.07391 Enhancing Retrieval-Augmented Generation: A Study of Best Practices]
 
 +
* 2025-01: [https://arxiv.org/abs/2501.09136 Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG] ([https://github.com/asinghcsu/AgenticRAG-Survey github])
 
* List of [https://github.com/NirDiamant/RAG_Techniques RAG techniques]
 
 
* [https://github.com/athina-ai/rag-cookbooks Advanced RAG Cookbooks👨🏻‍💻]
 
Line 76: Line 82:
 
* AutoMetaRAG ([https://github.com/darshil3011/AutoMetaRAG/tree/main code])
 
 
* [https://verba.weaviate.io/ Verba]: RAG for [https://weaviate.io/ Weaviate] vector database ([https://github.com/weaviate/verba code], [https://www.youtube.com/watch?v=UoowC-hsaf0 video])
 
 +
* Microsoft: [https://github.com/microsoft/PIKE-RAG PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation]
 
* 2024-10: Google [https://arxiv.org/abs/2410.07176 Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models]
 
 
* 2024-10: [https://arxiv.org/abs/2410.08815 StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization]: Reformats retrieved data into task-appropriate structures (table, graph, tree).
 
Line 85: Line 92:
 
* 2025-01: [https://github.com/Marker-Inc-Korea/AutoRAG AutoRAG: RAG AutoML tool for automatically finding an optimal RAG pipeline for your data]
 
 
* 2025-01: [https://arxiv.org/abs/2501.05874 VideoRAG: Retrieval-Augmented Generation over Video Corpus]
 
 +
* 2025-02: [https://arxiv.org/abs/2502.01142 DeepRAG: Thinking to Retrieval Step by Step for Large Language Models]
  
 
===Open-source Implementations===
 
Line 114: Line 122:
 
* [https://www.voyageai.com/ Voyage AI]
 
 
* [https://abacus.ai/ Abacus AI]
 
 
===Document Parsing===
 
* [https://github.com/DS4SD/docling Docling]: converts multiple formats (PDF, DOCX, PPTX, Images, HTML) into Markdown and JSON
 
* [https://github.com/microsoft/markitdown Microsoft Markitdown]: converts various formats (PDF, Word, Excel, PPT) to Markdown (available via [https://msftmd.replit.app/ web interface on replit])
 
* [https://github.com/wisupai/e2m e2m: Everything to Markdown] (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, and m4a)
 
* Nvidia [https://docs.nvidia.com/nv-ingest/user-guide/index.html NV-ingest] ([https://github.com/NVIDIA/nv-ingest code]) scalable, performance-oriented document content and metadata extraction microservice
 
* [https://github.com/QuivrHQ/MegaParse MegaParse]: Your Parser for every type of documents (pdf, powerpoint, word)
 
* [https://github.com/harishdeivanayagam/rowfill Rowfill]: Open-source document processing; extract, analyze, and process data from complex documents, images, PDFs and more with AI
 
 
====PDF Conversion====
 
* [https://github.com/kermitt2/grobid Grobid]
 
* [https://chunkr.ai/ Chunkr] ([https://github.com/lumina-ai-inc/chunkr code])
 
  
 
==Automatic Optimization==
 
Line 235: Line 231:
 
* [https://huggingface.co/amphion/MaskGCT MaskGCT] ([https://huggingface.co/spaces/amphion/maskgct demo])
 
 
* [https://arxiv.org/abs/2312.09911 Amphion: An Open-Source Audio, Music and Speech Generation Toolkit] ([https://github.com/open-mmlab/Amphion code])
 
 +
* [https://www.zyphra.com/ Zyphra] [https://huggingface.co/Zyphra/Zonos-v0.1-hybrid Zonos]
 +
* [https://github.com/fishaudio/fish-speech Fish Speech] (includes voice cloning)
  
 
==Cloud==
 
Line 247: Line 245:
  
 
=Vision=
 
 +
* [https://github.com/google/langfun Langfun] library as a means of converting images into structured output.
 +
 
==Visual Models==
 
 
* [https://openai.com/index/clip/ CLIP]
 
Line 268: Line 268:
 
* [https://arxiv.org/abs/2409.01704 General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model] ([https://huggingface.co/stepfun-ai/GOT-OCR2_0 project], [https://github.com/Ucas-HaoranWei/GOT-OCR2.0/ code], [https://huggingface.co/spaces/stepfun-ai/GOT_official_online_demo demo])
 
 
* [https://github.com/yigitkonur/swift-ocr-llm-powered-pdf-to-markdown Swift OCR: LLM Powered Fast OCR]
 
 +
* [https://github.com/imanoop7/Ollama-OCR Ollama OCR]
  
 
==Related==
 
Line 322: Line 323:
 
* [https://typesense.org/ Typesense] ([https://github.com/typesense/typesense code])
 
  
==Web Scraping==
+
==Data Scraping==
* [https://github.com/mendableai/firecrawl Firecrawl]
+
* [https://github.com/patrickloeber/llm-data-scrapers LLM Data Scrapers] list
* [https://github.com/unclecode/crawl4ai Crawl4AI: Crawl Smarter, Faster, Freely. For AI.]
+
 
 +
===Web Scraping===
 +
* [https://github.com/mendableai/firecrawl Firecrawl]: API to turn websites into LLM-ready markdown or structured data (can be self-hosted)
 +
* [https://github.com/unclecode/crawl4ai Crawl4AI]: Crawl Smarter, Faster, Freely. For AI.
 
* [https://github.com/ScrapeGraphAI/Scrapegraph-ai ScrapeGraphAI: You Only Scrape Once]: web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.)
 
 
* [https://github.com/bjesus/pipet pipet]: A swiss-army tool for scraping and extracting data from online assets
 
 
* [https://github.com/ScrapeGraphAI/Scrapegraph-ai ScrapeGraphAI: You Only Scrape Once]
 
 
* [https://github.com/D4Vinci/Scrapling Scrapling]: Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
 
 +
* [https://github.com/egoist/sitefetch sitefetch]: Fetch entire site into text file (to be used with AIs)
 +
* [https://github.com/mishushakov/llm-scraper LLM Scraper]: Turn webpage into structured data using LLMs
 +
* [https://github.com/adbar/trafilatura Trafilatura]: Discover and Extract Text Data on the Web
 +
 +
===llms.txt Generator===
 +
* [https://github.com/simonw/files-to-prompt files-to-prompt]: Concatenate directory of files into a single prompt
 +
* [https://github.com/mendableai/llmstxt-generator llms.txt Generator]
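The idea behind tools such as files-to-prompt (concatenating a directory of files into a single prompt) can be sketched roughly as follows. This is only an illustration: the function name <code>files_to_prompt</code>, the <code>suffixes</code> filter, and the <code>--- path ---</code> separator are assumptions made here, not the actual behavior or output format of any tool listed above.

```python
from pathlib import Path


def files_to_prompt(root: str, suffixes=(".py", ".md", ".txt")) -> str:
    """Concatenate text files under `root` into one prompt string.

    Hypothetical helper for illustration; the header format is an assumption.
    """
    parts = []
    # Sort so that the output order is deterministic across runs.
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            # Label each file with its path relative to the root directory.
            header = f"--- {path.relative_to(root).as_posix()} ---"
            parts.append(f"{header}\n{text}")
    return "\n\n".join(parts)
```

The returned string can then be pasted into (or sent as) an LLM prompt; restricting by suffix is one simple way to keep binary files out of the context.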
 +
 +
===llms.txt Generator (Online)===
 +
* Firecrawl [https://llmstxt.firecrawl.dev/ LLMs.txt generator] (online tool)
 +
* Jina AI [https://github.com/jina-ai/reader Reader]: Convert any URL to LLM-friendly input by prepending the prefix https://r.jina.ai/
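As a minimal sketch of the Jina AI Reader prefix convention noted above: the Reader URL is formed by putting https://r.jina.ai/ in front of the target URL. The helper name <code>reader_url</code> is hypothetical, and the example only builds the URL; it makes no assumption about the format of the response.

```python
READER_PREFIX = "https://r.jina.ai/"


def reader_url(target_url: str) -> str:
    """Return the Reader URL for `target_url` by prepending the prefix.

    Hypothetical helper for illustration only.
    """
    return READER_PREFIX + target_url


if __name__ == "__main__":
    # Prints the prefixed URL; fetching it (e.g. with urllib) is left to the caller.
    print(reader_url("https://example.com/page"))
```

The resulting URL can be opened in a browser or fetched with any HTTP client, for example <code>urllib.request.urlopen</code> from the Python standard library.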
  
 
===Headless Browser (scrape & automate)===
 
 
* [https://github.com/lightpanda-io/browser Lightpanda Browser]
 
  
===Github===
+
===Code & Github===
 
* [https://github.com/cyclotruc/gitingest GitIngest]: Turn any GitHub repository into a prompt-friendly text file, for inclusion in an LLM's context. Available at: [https://gitingest.com/ gitingest.com]
 
 +
* [https://github.com/yamadashy/repomix repomix]: Packs your entire repository into a single, AI-friendly file
 
* [https://github.gg/ github.gg]: For analyzing GitHub repositories and providing valuable insights about code quality, dependencies, and more
 
 
* [https://github.com/mattmireles/Flatty Flatty - Codebase-to-Text for LLMs]
 
 +
* [https://github.com/tesserato/CodeWeaver CodeWeaver]: Generate a Markdown Document of Your Codebase Structure and Content
 +
* [https://github.com/Doriandarko/RepoToTextForLLMs RepoToTextForLLMs]
 +
 +
===Media Files===
 +
* [https://github.com/imputnet/cobalt Cobalt]
 +
* [https://github.com/soimort/you-get You-Get]: Web video downloader tool
 +
 +
==Document Parsing==
 +
* [https://github.com/DS4SD/docling Docling]: converts multiple formats (PDF, DOCX, PPTX, Images, HTML) into Markdown and JSON
 +
* [https://github.com/microsoft/markitdown Microsoft Markitdown]: converts various formats (PDF, Word, Excel, PPT) to Markdown (available via [https://msftmd.replit.app/ web interface on replit])
 +
* [https://github.com/wisupai/e2m e2m: Everything to Markdown] (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, and m4a)
 +
* Nvidia [https://docs.nvidia.com/nv-ingest/user-guide/index.html NV-ingest] ([https://github.com/NVIDIA/nv-ingest code]): scalable, performance-oriented document content and metadata extraction microservice
 +
* [https://github.com/QuivrHQ/MegaParse MegaParse]: Your Parser for every type of documents (pdf, powerpoint, word)
 +
* [https://github.com/harishdeivanayagam/rowfill Rowfill]: Open-source document processing; extract, analyze, and process data from complex documents, images, PDFs and more with AI
 +
* [https://github.com/getomni-ai/zerox Zerox]: PDF to markdown vision model (OCR)
 +
* [https://docs.llamaindex.ai/en/stable/llama_cloud/llama_parse/ LlamaParse] ([https://github.com/run-llama/llama_cloud_services/blob/main/examples/parse/multimodal/gemini2_flash.ipynb example use for multimodal parsing])
 +
* [https://github.com/VikParuchuri/marker Marker]: PDFs and images to markdown
 +
 +
===PDF Conversion===
 +
* [https://github.com/kermitt2/grobid Grobid] ([https://kermitt2-grobid.hf.space/ use online])
 +
* [https://chunkr.ai/ Chunkr] ([https://github.com/lumina-ai-inc/chunkr code])
 +
* [https://github.com/opendatalab/PDF-Extract-Kit PDF-Extract-Kit]
 +
* [https://www.philschmid.de/gemini-pdf-to-data Gemini 2.0]
 +
* [https://github.com/VikParuchuri/marker marker]: converts PDFs and images to markdown, JSON, and HTML
 +
 +
===PDF Language Translation===
 +
* [https://github.com/Byaidu/PDFMathTranslate?tab=readme-ov-file PDFMathTranslate] ([https://pdf2zh.com/ online demo])
 +
 +
===Structured Data Extraction===
 +
* [https://unstract.com/ Unstract]: [https://github.com/Zipstack/unstract Intelligent Document Processing] (IDP): No-code LLM Platform to structure unstructured documents
 +
 +
===Screenshot===
 +
* Microsoft [https://github.com/microsoft/OmniParser OmniParser]: Screen Parsing tool for Pure Vision Based GUI Agent
  
 
=See Also=
 

Latest revision as of 16:37, 24 February 2025

Contents

LLM

Open-weights LLM

For Coding

Rankings: bigcode-models-leaderboard and CodeElo leaderboard

Reasoning

See also: Increasing AI Intelligence > Proactive Search > CoT reasoning model

Cloud LLM

Multi-modal: Audio

Triage

Retrieval Augmented Generation (RAG)

Reviews

Measuring RAG performance

Analysis of RAG overall

Approaches

Open-source Implementations

Web-based Tools

  • SciSpace Chat with PDF (also available as a GPT).

Commercial Cloud Offerings

Automatic Optimization

Analogous to Gradient Descent

LLM for scoring/ranking

LLM Agents

Interfaces

Chatbot Frontend

Web (code)

Web (product)

Desktop GUI

Alternative Text Chatbot UI

  • Loom provides a tree-like structure for exploring branched writings generated by an LLM.
  • The Pantheon Interface is a new idea for how to interact with LLMs (live instance, code). In a traditional interaction, you prompt the bot and it replies in a turn-by-turn manner. Pantheon instead invites you to type out your thoughts, and various agents will asynchronously add comments or questions to spur along your brainstorming.

Conversational Audio Chatbot

Related Research

Commercial Systems

Speech Recognition (ASR) and Transcription

Lists

Open Source

In Browser

  • Whisper Timestamped: Multilingual speech recognition with word-level timestamps, running locally in browser

Phrase Endpointing and Voice Activity Detection (VAD)

I.e., how to determine when the user is done talking and the bot should respond.
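One very simple way to frame the endpointing question is an energy threshold with a silence timeout: the user is judged to be done once a run of quiet frames follows some speech. The sketch below assumes audio arrives as frames of float samples in [-1, 1]; the function names and the default values for <code>energy_threshold</code> and <code>min_silence_frames</code> are illustrative assumptions, and practical systems typically rely on trained VAD models rather than raw energy.

```python
import math
from typing import Iterable, Optional, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1, 1])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_endpoint(frames: Iterable[Sequence[float]],
                  energy_threshold: float = 0.02,
                  min_silence_frames: int = 25) -> Optional[int]:
    """Return the index of the frame at which the speaker is judged done.

    The endpoint is the frame that completes `min_silence_frames`
    consecutive quiet frames after at least one speech frame.
    Returns None if no such point is found.
    """
    heard_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if frame_rms(frame) >= energy_threshold:
            # Speech (or loud noise): reset the silence counter.
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Quiet frame after speech: count toward the timeout.
            silent_run += 1
            if silent_run >= min_silence_frames:
                return i
    return None
```

A longer <code>min_silence_frames</code> makes the bot less likely to interrupt a pause mid-sentence, at the cost of slower responses.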

Audio Cleanup

  • Krisp AI: Noise cancellation, meeting summary, etc.

Text-to-speech (TTS)

Open Source

Cloud

Text-to-audio

Vision

  • Langfun library as a means of converting images into structured output.

Visual Models

Multi-modal Models (language-vision/video)

Optical character recognition (OCR)

Related

Embedding

Image Embedding

Time Series


Control

Forecasting

Data

Vector Database

Open Source

Commercial cloud

MySQL

Database with Search

Data Scraping

Web Scraping

  • Firecrawl: API to turn websites into LLM-ready markdown or structured data (can be self-hosted)
  • Crawl4AI: Crawl Smarter, Faster, Freely. For AI.
  • ScrapeGraphAI: You Only Scrape Once: web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.)
  • pipet: A swiss-army tool for scraping and extracting data from online assets
  • ScrapeGraphAI: You Only Scrape Once
  • Scrapling: Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
  • sitefetch: Fetch entire site into text file (to be used with AIs)
  • LLM Scraper: Turn webpage into structured data using LLMs
  • Trafilatura: Discover and Extract Text Data on the Web

llms.txt Generator

llms.txt Generator (Online)

Headless Browser (scrape & automate)

Code & Github

Media Files

Document Parsing

PDF Conversion

PDF Language Translation

Structured Data Extraction

Screenshot

  • Microsoft OmniParser: Screen Parsing tool for Pure Vision Based GUI Agent

See Also