Difference between revisions of "AI tools"

From GISAXS
(Document Parsing)
(Data Scraping)
Line 317:
 
==Database with Search==
 
* [https://typesense.org/ Typesense] ([https://github.com/typesense/typesense code])
 
==Data Scraping==
 
* [https://github.com/patrickloeber/llm-data-scrapers LLM Data Scrapers] list
 
 
===Web Scraping===
 
* [https://github.com/mendableai/firecrawl Firecrawl]: API to turn websites into LLM-ready markdown or structured data (can be self-hosted)
 
* [https://github.com/unclecode/crawl4ai Crawl4AI]: Crawl Smarter, Faster, Freely. For AI.
 
* [https://github.com/ScrapeGraphAI/Scrapegraph-ai ScrapeGraphAI: You Only Scrape Once]: web scraping python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.)
 
* [https://github.com/bjesus/pipet pipet]: A swiss-army tool for scraping and extracting data from online assets
 
 
* [https://github.com/D4Vinci/Scrapling Scrapling]: Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
 
* [https://github.com/egoist/sitefetch sitefetch]: Fetch entire site into text file (to be used with AIs)
 
* [https://github.com/mishushakov/llm-scraper LLM Scraper]: Turn webpage into structured data using LLMs
 
* [https://github.com/adbar/trafilatura Trafilatura]: Discover and Extract Text Data on the Web
 
 
===llms.txt Generator===
 
* [https://github.com/simonw/files-to-prompt files-to-prompt]: Concatenate directory of files into a single prompt
 
* [https://github.com/mendableai/llmstxt-generator llms.txt Generator]
 
 
===llms.txt Generator (Online)===
 
* Firecrawl [https://llmstxt.firecrawl.dev/ LLMs.txt generator] (online tool)
 
* Jina AI [https://github.com/jina-ai/reader Reader]: Convert any URL to an LLM-friendly input with a prefix: https://r.jina.ai/
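
The prefix usage mentioned for Jina AI Reader can be sketched as follows: the target URL is appended to <code>https://r.jina.ai/</code>. This is a minimal sketch using only the Python standard library. The helper names <code>reader_url</code> and <code>fetch_as_text</code> are hypothetical, and rate limits or API keys are not handled.

```python
from urllib.request import urlopen

# Prefix quoted in the list entry above.
READER_PREFIX = "https://r.jina.ai/"


def reader_url(target_url: str) -> str:
    """Build the Reader URL by prepending the prefix to the target URL."""
    return READER_PREFIX + target_url.strip()


def fetch_as_text(target_url: str, timeout: float = 30.0) -> str:
    """Fetch the LLM-friendly rendering of a page.

    Hypothetical helper: requires network access and assumes the
    service returns UTF-8 text.
    """
    with urlopen(reader_url(target_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only builds the URL; call fetch_as_text() to actually download.
    print(reader_url("https://example.com"))
```

Calling <code>fetch_as_text("https://example.com")</code> would then return the page content as text suitable for pasting into a prompt, assuming the service is reachable.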
 
 
===Headless Browser (scrape & automate)===
 
* [https://github.com/lightpanda-io/browser Lightpanda Browser]
 
 
===Code & Github===
 
* [https://github.com/cyclotruc/gitingest GitIngest]: Turn any GitHub repository into a prompt-friendly text file, for inclusion in an LLM's context. Available at: [https://gitingest.com/ gitingest.com]
 
* [https://github.com/yamadashy/repomix repomix]: Packs your entire repository into a single, AI-friendly file
 
* [https://github.gg/ github.gg]: For analyzing GitHub repositories and providing valuable insights about code quality, dependencies, and more
 
* [https://github.com/mattmireles/Flatty Flatty - Codebase-to-Text for LLMs]
 
* [https://github.com/tesserato/CodeWeaver CodeWeaver]: Generate a Markdown Document of Your Codebase Structure and Content
 
* [https://github.com/Doriandarko/RepoToTextForLLMs RepoToTextForLLMs]
 
 
===Media Files===
 
* [https://github.com/imputnet/cobalt Cobalt]
 
* [https://github.com/soimort/you-get You-Get]: Web video downloader tool
 
  
 
=See Also=
 

Revision as of 10:20, 26 February 2025

LLM

Open-weights LLM

For Coding

Rankings: bigcode-models-leaderboard and CodeElo leaderboard

Reasoning

See also: Increasing AI Intelligence > Proactive Search > CoT reasoning model

Cloud LLM

Multi-modal: Audio

Triage

Retrieval Augmented Generation (RAG)

Reviews

Measuring RAG performance

Analysis of RAG overall

Approaches

Open-source Implementations

Web-based Tools

  • SciSpace Chat with PDF (also available as a GPT).

Commercial Cloud Offerings

Automatic Optimization

Analogous to Gradient Descent

LLM for scoring/ranking

LLM Agents

Interfaces

Chatbot Frontend

Web (code)

Web (product)

Desktop GUI

Alternative Text Chatbot UI

  • Loom provides a tree-like structure for exploring branching text written by an LLM.
  • The Pantheon Interface is a new idea for how to interact with LLMs (live instance, code). In a traditional interaction, you prompt the bot and it replies in a turn-by-turn manner. Pantheon instead invites you to type out your thoughts, and various agents will asynchronously add comments or questions to spur along your brainstorming.

Conversational Audio Chatbot

Related Research

Commercial Systems

Speech Recognition (ASR) and Transcription

Lists

Open Source

In Browser

  • Whisper Timestamped: Multilingual speech recognition with word-level timestamps, running locally in the browser

Phrase Endpointing and Voice Activity Detection (VAD)

That is, how can the system determine when the user has finished talking and the bot should respond?
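
One simple way to frame the endpointing question is an energy threshold with a silence timeout: a frame counts as speech when its energy exceeds a threshold, and the utterance is judged finished once enough consecutive quiet frames follow speech. The sketch below illustrates only that baseline idea. The function name, threshold, and frame count are assumptions chosen for illustration, and practical systems often rely on trained VAD models instead.

```python
from typing import Iterable, Optional


def find_endpoint(
    frame_energies: Iterable[float],
    energy_threshold: float = 0.01,  # assumed value; tune per microphone
    hangover_frames: int = 25,       # assumed value; quiet frames required
) -> Optional[int]:
    """Return the index of the frame at which the utterance is judged finished.

    The endpoint fires once speech has been heard and `hangover_frames`
    consecutive non-speech frames follow it. Returns None otherwise.
    """
    heard_speech = False
    silent_run = 0
    for index, energy in enumerate(frame_energies):
        if energy > energy_threshold:
            # Speech frame: remember it and reset the silence counter.
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Quiet frame after speech: count toward the timeout.
            silent_run += 1
            if silent_run >= hangover_frames:
                return index
    return None
```

A longer <code>hangover_frames</code> tolerates pauses within a sentence at the cost of a slower response, which is the basic trade-off behind the question above.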

Audio Cleanup

  • Krisp AI: Noise cancellation, meeting summary, etc.

Text-to-speech (TTS)

Open Source

Cloud

Text-to-audio

Vision

  • Langfun library as a means of converting images into structured output.

Visual Models

Multi-modal Models (language-vision/video)

Related

Embedding

Image Embedding

Time Series


Control

Forecasting

Data

Vector Database

Open Source

Commercial cloud

MySQL

Database with Search

See Also