Difference between revisions of "AI Agents"

From GISAXS
(Societies and Communities of AI agents)
(Agent Internal Workflow Management)
 
(9 intermediate revisions by the same user not shown)
Line 31: Line 31:
 
* [https://huggingface.co/blog/smolagents Huggingface] [https://github.com/huggingface/smolagents smolagents]
 
* [https://github.com/elizaOS/eliza Eliza] (includes multi-agent, interaction with docs, Discord, Twitter, etc.)
 
 +
* [https://github.com/The-Pocket/PocketFlow Pocket Flow]: LLM Framework in 100 Lines
  
===Information Retrieval===
+
===Information Retrieval (Memory)===
 
* See also [[AI_tools#Retrieval_Augmented_Generation_.28RAG.29|RAG]].
 
* 2024-09: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])
 
* 2024-10: [https://arxiv.org/abs/2410.09713 Agentic Information Retrieval]
 
 +
* [https://mem0.ai/ Mem0 AI]: Memory layer for AI agents; a self-improving memory layer for LLM applications that enables personalization.
  
 
===Control (tool-use, computer use, etc.)===
 
 +
* See also: [[Human_Computer_Interaction#AI_Computer_Use]]
 +
* [https://tavily.com/ Tavily]: Connect Your LLM to the Web: Empowering your AI applications with real-time, accurate search results tailored for LLMs and RAG
 
* Anthropic [https://www.anthropic.com/news/model-context-protocol Model Context Protocol] (MCP)
 
** [https://github.com/jlowin/fastmcp FastMCP]: The fast, Pythonic way to build MCP servers
 
Line 58: Line 62:
 
===Computer Use===
 
* 2024-11: [https://arxiv.org/abs/2411.10323 The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use] ([https://github.com/showlab/computer_use_ootb code])
 
 +
* 2025-01: [https://arxiv.org/abs/2501.10893 Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments]
  
 
===Software Engineering===
 
* 2024-11: [https://github.com/MLSysOps/MLE-agent MLE-Agent: Your intelligent companion for seamless AI engineering and research]
 
 +
* [https://github.com/OpenAutoCoder/Agentless Agentless]: agentless approach to automatically solve software development problems
  
 
===Science Agents===
 
Line 82: Line 88:
 
* [https://arxiv.org/abs/2409.05556 SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning]
 
** [https://github.com/lamm-mit/SciAgentsDiscovery code]
 
 +
 +
===Author Research Articles===
 +
* 2024-02: STORM: [https://arxiv.org/abs/2402.14207 Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models] ([https://www.aihero.dev/storm-generate-high-quality-articles-based-on-real-research discussion/analysis])
  
 
===Software Development Workflows===
 
Line 139: Line 148:
 
===Domain-specific===
 
* 2024-12: [https://arxiv.org/abs/2412.20138 TradingAgents: Multi-Agents LLM Financial Trading Framework]
 
 +
* 2025-01: [https://arxiv.org/abs/2501.04227 Agent Laboratory: Using LLM Agents as Research Assistants]
  
 
==Research demos==
 
Line 230: Line 240:
 
=Optimization=
 
===Metrics, Benchmarks===
 
 +
* 2019-11: [https://arxiv.org/abs/1911.01547 On the Measure of Intelligence]
 
* 2022-06: [https://arxiv.org/abs/2206.10498 PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change]
 
* 2023-06: [https://arxiv.org/abs/2306.05836 Can Large Language Models Infer Causation from Correlation?] (challenging Corr2Cause task)
 

Latest revision as of 12:22, 5 February 2025

Reviews & Perspectives

Published

Continually updating

Analysis/Opinions

Guides

AI Assistants

Components of AI Assistants

Agent Internal Workflow Management

Information Retrieval (Memory)

Control (tool-use, computer use, etc.)

Open-source

Personalities/Personas

Specific Uses for AI Assistants

Computer Use

Software Engineering

Science Agents

See Science Agents.

LLM-as-judge

Advanced Workflows

Author Research Articles

Software Development Workflows

Several paradigms of AI-assisted coding have arisen:

  1. Manual, human driven
  2. AI-aided through chat/dialogue, where the human asks for code and then copies it into the project
    1. OpenAI ChatGPT
    2. Anthropic Claude
  3. API calls to an LLM, which generates code and inserts the file into the project
  4. LLM-integration into the IDE
    1. Copilot
    2. Qodo (Codium) & AlphaCodium (preprint, code)
    3. Cursor
    4. Codeium Windsurf (with "Cascade" AI Agent)
  5. AI-assisted IDE, where the AI generates and manages the dev environment
    1. Replit
    2. Aider (code): Pair programming on the command line
    3. Pythagora
    4. StackBlitz bolt.new
    5. Cline (formerly Claude Dev)
  6. Prompt-to-product
    1. Github Spark (demo video)
  7. Semi-autonomous software engineer agents
    1. Devin (Cognition AI)
    2. Amazon Q
    3. Honeycomb
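
Paradigm 3 above (API calls to an LLM, with the generated code written into the project) can be sketched roughly as follows. This is only an illustration under stated assumptions: `insert_generated_file` and `fake_llm` are hypothetical names, and `fake_llm` stands in for whatever real LLM API a project would call.

```python
from pathlib import Path
from typing import Callable


def insert_generated_file(project_dir: Path, relative_path: str,
                          generate_code: Callable[[str], str], prompt: str) -> Path:
    """Ask `generate_code` for source text and write it into the project."""
    root = project_dir.resolve()
    target = (root / relative_path).resolve()
    # Refuse paths that escape the project directory (e.g. "../outside.py").
    if not target.is_relative_to(root):
        raise ValueError(f"{relative_path!r} is outside the project")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(generate_code(prompt), encoding="utf-8")
    return target


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call (assumption, not a real API)."""
    return f'"""Generated for: {prompt}"""\n\ndef hello():\n    return "hello"\n'
```

In practice `generate_code` would wrap a real API client, and the generated file would typically be reviewed or tested before being committed. `Path.is_relative_to` requires Python 3.9 or later.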

For a review of the current state of software-engineering agentic approaches, see:

Corporate AI Agent Ventures

Mundane Workflows and Capabilities

Inference-compute Reasoning

Agentic Systems

Increasing AI Agent Intelligence

See: Increasing AI Intelligence

Multi-agent orchestration

Research

Societies and Communities of AI agents

Domain-specific

Research demos

Related work

Inter-agent communications

Architectures

Open Source Frameworks

Open Source Systems

Commercial Automation Frameworks

Spreadsheet

Cloud solutions

Frameworks

Optimization

Metrics, Benchmarks

Evaluation Schemes

Multi-agent

Agent Challenges

  • Aidan-Bench: Tests creativity by having a particular LLM generate a long sequence of outputs (each meant to be different) and measuring how long it can continue before duplicates appear.
  • Pictionary: An LLM suggests a prompt, multiple LLMs generate outputs, and an LLM judges them; this allows ranking of the models' generation abilities.
  • MC-bench: Asks LLMs to build an elaborate structure in Minecraft; outputs can be A/B tested by human judges.
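
The Aidan-Bench idea (count how many outputs a model produces before it repeats itself) can be illustrated with a minimal sketch. It makes simplifying assumptions: duplicates are detected by exact match after normalization, which may differ from how the actual benchmark scores novelty, and `generate` and `toy_model` are hypothetical stand-ins for an LLM call.

```python
from typing import Callable, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations count as duplicates."""
    return " ".join(text.lower().split())


def count_until_duplicate(generate: Callable[[List[str]], str],
                          max_steps: int = 100) -> int:
    """Return how many distinct outputs `generate` yields before repeating itself.

    `generate` is a hypothetical stand-in for an LLM call; it receives the
    previous outputs and returns a new one.
    """
    seen = set()
    history: List[str] = []
    for _ in range(max_steps):
        output = generate(history)
        key = normalize(output)
        if key in seen:
            break  # first duplicate ends the run
        seen.add(key)
        history.append(output)
    return len(history)


def toy_model(history: List[str]) -> str:
    """Toy generator that cycles through three answers (assumption for the demo)."""
    answers = ["Use solar power", "Plant more trees", "Ride a bicycle"]
    return answers[len(history) % len(answers)]


print(count_until_duplicate(toy_model))  # prints 3
```

A fuzzier duplicate test (for example, a similarity threshold over embeddings) could replace `normalize` without changing the surrounding loop.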

Automated Improvement

See Also