Difference between revisions of "AI Agents"

From GISAXS
(Software Development Workflows)
(Information Retrieval (Memory))
 
(16 intermediate revisions by the same user not shown)
Line 20: Line 20:
 
* [https://arxiv.org/abs/2402.01817v3 LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks]
 
* [https://arxiv.org/abs/2402.01817v3 LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks]
 
* [https://rasa.com/blog/cutting-ai-assistant-costs-the-power-of-enhancing-llms-with-business/ Cutting AI Assistant Costs by Up to 77.8%: The Power of Enhancing LLMs with Business Logic]
 
* [https://rasa.com/blog/cutting-ai-assistant-costs-the-power-of-enhancing-llms-with-business/ Cutting AI Assistant Costs by Up to 77.8%: The Power of Enhancing LLMs with Business Logic]
 +
* 2025-05: [https://arxiv.org/abs/2505.10468 AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges]
  
 
===Guides===
 
===Guides===
Line 26: Line 27:
 
* OpenAI: [https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf A practical guide to building agents]
 
* OpenAI: [https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf A practical guide to building agents]
 
* Anthropic: [https://www.anthropic.com/engineering/claude-code-best-practices Claude Code: Best practices for agentic coding]
 
* Anthropic: [https://www.anthropic.com/engineering/claude-code-best-practices Claude Code: Best practices for agentic coding]
 +
* Anthropic: [https://www.anthropic.com/engineering/built-multi-agent-research-system How we built our multi-agent research system]
  
 
=AI Assistants=
 
=AI Assistants=
Line 39: Line 41:
 
* [https://github.com/elizaOS/eliza Eliza] (includes multi-agent, interaction with docs, Discord, Twitter, etc.)
 
* [https://github.com/elizaOS/eliza Eliza] (includes multi-agent, interaction with docs, Discord, Twitter, etc.)
 
* [https://github.com/The-Pocket/PocketFlow Pocket Flow]: LLM Framework in 100 Lines
 
* [https://github.com/The-Pocket/PocketFlow Pocket Flow]: LLM Framework in 100 Lines
 +
* [https://github.com/coze-dev/coze-studio Coze]: All-in-one AI agent development tool
  
 
===Information Retrieval (Memory)===
 
===Information Retrieval (Memory)===
Line 46: Line 49:
 
* 2025-02: [https://arxiv.org/abs/2502.01142 DeepRAG: Thinking to Retrieval Step by Step for Large Language Models]
 
* 2025-02: [https://arxiv.org/abs/2502.01142 DeepRAG: Thinking to Retrieval Step by Step for Large Language Models]
 
* [https://mem0.ai/ Mem0 AI]: Memory Layer for AI Agents; self-improving memory layer for LLM applications, enabling personalized.
 
* [https://mem0.ai/ Mem0 AI]: Memory Layer for AI Agents; self-improving memory layer for LLM applications, enabling personalized.
 +
* 2025-08: [https://arxiv.org/abs/2508.16153 Memento: Fine-tuning LLM Agents without Fine-tuning LLMs]
  
 
===Contextual Memory===
 
===Contextual Memory===
Line 108: Line 112:
 
* 2025-04: [https://www.nature.com/articles/s41586-025-08866-7?linkId=13898052 Towards conversational diagnostic artificial intelligence]
 
* 2025-04: [https://www.nature.com/articles/s41586-025-08866-7?linkId=13898052 Towards conversational diagnostic artificial intelligence]
 
* 2025-04: [https://www.nature.com/articles/s41586-025-08869-4?linkId=13898054 Towards accurate differential diagnosis with large language models]
 
* 2025-04: [https://www.nature.com/articles/s41586-025-08869-4?linkId=13898054 Towards accurate differential diagnosis with large language models]
 +
* 2025-08: [https://arxiv.org/abs/2508.20148 The Anatomy of a Personal Health Agent]
  
 
===LLM-as-judge===
 
===LLM-as-judge===
Line 145: Line 150:
 
* 2025-04: Anthropic [https://x.com/AnthropicAI/status/1912192384588271771 Research]
 
* 2025-04: Anthropic [https://x.com/AnthropicAI/status/1912192384588271771 Research]
 
* 2025-04: [https://arxiv.org/abs/2504.21776 WebThinker: Empowering Large Reasoning Models with Deep Research Capability]
 
* 2025-04: [https://arxiv.org/abs/2504.21776 WebThinker: Empowering Large Reasoning Models with Deep Research Capability]
 +
* 2025-09: [https://arxiv.org/abs/2509.06283 SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents]
  
 
=Advanced Workflows=
 
=Advanced Workflows=
Line 154: Line 160:
 
* [https://arxiv.org/abs/2409.05556 SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning]
 
* [https://arxiv.org/abs/2409.05556 SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning]
 
** [https://github.com/lamm-mit/SciAgentsDiscovery code]
 
** [https://github.com/lamm-mit/SciAgentsDiscovery code]
 +
* [https://skywork.ai/home Skywork] [https://skywork.ai/home?inviter=el.cine&shortlink_id=1919604877427924992&utm_source=X Super Agent]
  
 
===Streamline Administrative Tasks===
 
===Streamline Administrative Tasks===
Line 167: Line 174:
 
## OpenAI [https://chatgpt.com/ ChatGPT]
 
## OpenAI [https://chatgpt.com/ ChatGPT]
 
## Anthropic [https://claude.ai/ Claude]
 
## Anthropic [https://claude.ai/ Claude]
 +
## Google [https://gemini.google.com/app Gemini]
 
# API calls to an LLM, which generates code and inserts the file into the project
 
# API calls to an LLM, which generates code and inserts the file into the project
 
# LLM-integration into the IDE
 
# LLM-integration into the IDE
 
## [https://github.com/features/copilot Copilot]
 
## [https://github.com/features/copilot Copilot]
 
## [https://www.qodo.ai/ Qodo] (Codium) & [https://www.qodo.ai/products/alphacodium/ AlphaCodium] ([https://arxiv.org/abs/2401.08500 preprint], [https://github.com/Codium-ai/AlphaCodium code])
 
## [https://www.qodo.ai/ Qodo] (Codium) & [https://www.qodo.ai/products/alphacodium/ AlphaCodium] ([https://arxiv.org/abs/2401.08500 preprint], [https://github.com/Codium-ai/AlphaCodium code])
## [https://www.cursor.com/ Cursor]
+
## '''[https://www.cursor.com/ Cursor]'''
 
## [https://codeium.com/ Codeium] [https://codeium.com/windsurf Windsurf] (with "Cascade" AI Agent)
 
## [https://codeium.com/ Codeium] [https://codeium.com/windsurf Windsurf] (with "Cascade" AI Agent)
 
## ByteDance [https://www.trae.ai/ Trae AI]
 
## ByteDance [https://www.trae.ai/ Trae AI]
Line 189: Line 197:
 
# AI-assisted IDE, where the AI generates and manages the dev environment
 
# AI-assisted IDE, where the AI generates and manages the dev environment
 
## [https://replit.com/ Replit]
 
## [https://replit.com/ Replit]
## [https://aider.chat/ Aider] ([https://github.com/Aider-AI/aider code]): Pair programming on commandline
 
 
## [https://www.pythagora.ai/ Pythagora]
 
## [https://www.pythagora.ai/ Pythagora]
 
## [https://stackblitz.com/ StackBlitz] [https://bolt.new/ bolt.new]
 
## [https://stackblitz.com/ StackBlitz] [https://bolt.new/ bolt.new]
 
## [https://github.com/clinebot/cline Cline] (formerly [https://generativeai.pub/meet-claude-dev-an-open-source-autonomous-ai-programmer-in-vs-code-f457f9821b7b Claude Dev])
 
## [https://github.com/clinebot/cline Cline] (formerly [https://generativeai.pub/meet-claude-dev-an-open-source-autonomous-ai-programmer-in-vs-code-f457f9821b7b Claude Dev])
 +
## [https://www.all-hands.dev/ All Hands]
 +
# AI Agent on Commandline
 +
## [https://aider.chat/ Aider] ([https://github.com/Aider-AI/aider code]): Pair programming on commandline
 +
## [https://docs.anthropic.com/en/docs/claude-code/overview Claude Code]
 +
## [https://openai.com/codex/ OpenAI Codex]
 +
## [https://blog.google/technology/developers/introducing-gemini-cli-open-source-ai-agent/ Gemini CLI]
 
# Prompt-to-product
 
# Prompt-to-product
 
## [https://githubnext.com/projects/github-spark Github Spark] ([https://x.com/ashtom/status/1851333075374051725 demo video])
 
## [https://githubnext.com/projects/github-spark Github Spark] ([https://x.com/ashtom/status/1851333075374051725 demo video])
Line 212: Line 225:
 
## [https://aws.amazon.com/q/ Amazon Q] (and CodeWhisperer)
 
## [https://aws.amazon.com/q/ Amazon Q] (and CodeWhisperer)
 
## [https://honeycomb.sh/ Honeycomb]
 
## [https://honeycomb.sh/ Honeycomb]
 +
## [https://www.blackbox.ai/ Agent IDE]
 
## [https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview Claude Code]
 
## [https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview Claude Code]
 
+
## OpenAI [https://help.openai.com/en/articles/11096431-openai-codex-cli-getting-started Codex CLI] and [https://openai.com/index/introducing-codex/ Codex] cloud
 +
## [https://www.factory.ai/ Factory AI] [https://x.com/FactoryAI/status/1927754706014630357 Droids]
 
For a review of the current state of software-engineering agentic approaches, see:
 
For a review of the current state of software-engineering agentic approaches, see:
 
* 2024-08: [https://arxiv.org/abs/2408.02479 From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future]
 
* 2024-08: [https://arxiv.org/abs/2408.02479 From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future]
Line 248: Line 263:
 
=Multi-agent orchestration=
 
=Multi-agent orchestration=
 
==Research==
 
==Research==
 +
* 2025-02: [https://arxiv.org/abs/2502.02533 Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies]
 
* 2025-03: [https://arxiv.org/abs/2503.13657 Why Do Multi-Agent LLM Systems Fail?]
 
* 2025-03: [https://arxiv.org/abs/2503.13657 Why Do Multi-Agent LLM Systems Fail?]
 
* 2025-03: [https://arxiv.org/abs/2503.15478 SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks]
 
* 2025-03: [https://arxiv.org/abs/2503.15478 SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks]
 +
* 2025-09: [https://arxiv.org/abs/2509.20175 Federation of Agents: A Semantics-Aware Communication Fabric for Large-Scale Agentic AI]
  
 
===Organization Schemes===
 
===Organization Schemes===
Line 258: Line 275:
 
* 2025-04: [https://arxiv.org/abs/2504.10157 SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users]
 
* 2025-04: [https://arxiv.org/abs/2504.10157 SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users]
 
* 2025-05: [https://www.science.org/doi/10.1126/sciadv.adu9368 Emergent social conventions and collective bias in LLM populations]
 
* 2025-05: [https://www.science.org/doi/10.1126/sciadv.adu9368 Emergent social conventions and collective bias in LLM populations]
 +
* 2025-09: [https://arxiv.org/abs/2509.10147 Virtual Agent Economies]
  
 
===Domain-specific===
 
===Domain-specific===
Line 294: Line 312:
 
* 2024-11: [https://arxiv.org/abs/2411.02820 DroidSpeak: Enhancing Cross-LLM Communication]: Exploits caches of embeddings and key-values, to allow context to be more easily transferred between AIs (without consuming context window)
 
* 2024-11: [https://arxiv.org/abs/2411.02820 DroidSpeak: Enhancing Cross-LLM Communication]: Exploits caches of embeddings and key-values, to allow context to be more easily transferred between AIs (without consuming context window)
 
* 2024-11: Anthropic describes [https://www.anthropic.com/news/model-context-protocol Model Context Protocol]: an open standard for secure, two-way connections between data sources and AI ([https://modelcontextprotocol.io/introduction intro], [https://modelcontextprotocol.io/quickstart quickstart], [https://github.com/modelcontextprotocol code])
 
* 2024-11: Anthropic describes [https://www.anthropic.com/news/model-context-protocol Model Context Protocol]: an open standard for secure, two-way connections between data sources and AI ([https://modelcontextprotocol.io/introduction intro], [https://modelcontextprotocol.io/quickstart quickstart], [https://github.com/modelcontextprotocol code])
 +
* 2025-09: [https://arxiv.org/abs/2509.20175 Federation of Agents: A Semantics-Aware Communication Fabric for Large-Scale Agentic AI]
  
 
==Architectures==
 
==Architectures==
Line 420: Line 439:
 
* 2024-08: [https://arxiv.org/abs/2408.08435 Automated Design of Agentic Systems] ([https://github.com/ShengranHu/ADAS ADAS code])
 
* 2024-08: [https://arxiv.org/abs/2408.08435 Automated Design of Agentic Systems] ([https://github.com/ShengranHu/ADAS ADAS code])
 
* 2024-08: [https://arxiv.org/abs/2408.02666 Self-Taught Evaluators]: Iterative self-improvement through generation of synthetic data and evaluation
 
* 2024-08: [https://arxiv.org/abs/2408.02666 Self-Taught Evaluators]: Iterative self-improvement through generation of synthetic data and evaluation
 +
* 2025-05: [https://arxiv.org/abs/2505.22954 Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents] ([https://github.com/jennyzzt/dgm code], [https://sakana.ai/dgm/ project])
  
 
=See Also=
 
=See Also=

Latest revision as of 11:56, 23 October 2025

Reviews & Perspectives

Published

Continually updating

Analysis/Opinions

Guides

AI Assistants

Components of AI Assistants

Agent Internal Workflow Management

Information Retrieval (Memory)

Contextual Memory

  • Memobase: user profile-based memory (long-term user memory for genAI applications)

Control (tool-use, computer use, etc.)

Model Context Protocol (MCP)

Agent2Agent Protocol (A2A)

Open-source

Personalities/Personas

Specific Uses for AI Assistants

Computer Use

Software Engineering

Science Agents

See Science Agents.

Medicine

LLM-as-judge

Deep Research

Advanced Workflows

Streamline Administrative Tasks

Author Research Articles

Software Development Workflows

Several paradigms of AI-assisted coding have arisen:

  1. Manual, human-driven
  2. AI-aided through chat/dialogue, where the human asks for code and then copies it into the project
    1. OpenAI ChatGPT
    2. Anthropic Claude
    3. Google Gemini
  3. API calls to an LLM, which generates code and inserts the file into the project
  4. LLM-integration into the IDE
    1. Copilot
    2. Qodo (Codium) & AlphaCodium (preprint, code)
    3. Cursor
    4. Codeium Windsurf (with "Cascade" AI Agent)
    5. ByteDance Trae AI
    6. Tabnine
    7. Traycer
    8. IDX: free
    9. Aide: open-source AI-native code editor (fork of VS Code)
    10. continue.dev: open-source code assistant
    11. Pear AI: open-source code editor
    12. Haystack Editor: canvas UI
    13. Onlook: for designers
    14. All Hands AI
    15. Devin 2.0 (Cognition AI)
    16. Google Firebase Studio
    17. rowboat (for building multi-agent workflows)
    18. Trae IDE: The Real AI Engineer
  5. AI-assisted IDE, where the AI generates and manages the dev environment
    1. Replit
    2. Pythagora
    3. StackBlitz bolt.new
    4. Cline (formerly Claude Dev)
    5. All Hands
  6. AI Agent on Commandline
    1. Aider (code): Pair programming on commandline
    2. Claude Code
    3. OpenAI Codex
    4. Gemini CLI
  7. Prompt-to-product
    1. GitHub Spark (demo video)
    2. Create.xyz: text-to-app, replicate product from link
    3. a0.dev: generate mobile apps (from your phone)
    4. Softgen: web app developer
    5. wrapifai: build form-based apps
    6. Lovable: web app (from text, screenshot, etc.)
    7. Vercel v0
    8. MarsX (John Rush): SaaS builder
    9. Webdraw: turn sketches into web apps
    10. Tempo Labs: build React apps
    11. Databutton: no-code software development
    12. base44: no-code dashboard apps
    13. Origin AI
    14. Emergent AI
  8. Semi-autonomous software engineer agents
    1. Devin (Cognition AI)
    2. Amazon Q (and CodeWhisperer)
    3. Honeycomb
    4. Agent IDE
    5. Claude Code
    6. OpenAI Codex CLI and Codex cloud
    7. Factory AI Droids
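
As a rough illustration of paradigm 3 in the list above (an API call to an LLM that generates code and inserts the file into the project), here is a minimal Python sketch. The function name `generate_and_insert` and the `llm_complete` callable are hypothetical; `llm_complete` stands in for whichever LLM API client is used (prompt in, generated text out) and is not a real library function. The path check is an added safety assumption, not something the list describes.

```python
from pathlib import Path
from typing import Callable


def generate_and_insert(
    request: str,
    llm_complete: Callable[[str], str],
    project_dir: Path,
    relative_path: str,
) -> Path:
    """Ask an LLM for code and write the result into the project tree.

    `llm_complete` is a hypothetical stand-in for any LLM API call
    (prompt in, generated text out).
    """
    project_root = project_dir.resolve()
    target = (project_root / relative_path).resolve()
    # Refuse to write anywhere outside the project directory.
    if project_root not in target.parents:
        raise ValueError(f"{relative_path!r} escapes the project directory")
    # Build a simple prompt; real workflows would add project context here.
    code = llm_complete(f"Write the contents of {relative_path}.\nTask: {request}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(code, encoding="utf-8")
    return target
```

Passing the model call in as a plain callable keeps the sketch independent of any particular vendor SDK and makes it easy to exercise with a fake function before wiring in a real client.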

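For a review of the current state of software-engineering agentic approaches, see:

  • 2024-08: From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future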

Corporate AI Agent Ventures

Mundane Workflows and Capabilities

Inference-compute Reasoning

AI Assistant

Agentic Systems

Increasing AI Agent Intelligence

See: Increasing AI Intelligence

Multi-agent orchestration

Research

Organization Schemes

Societies and Communities of AI agents

Domain-specific

Research demos

Related work

Inter-agent communications

Architectures

Open Source Frameworks

Open Source Systems

Commercial Automation Frameworks

Multi-agent Handoff/Collaboration

Spreadsheet

Cloud solutions

Frameworks

Optimization

Reviews

Metrics, Benchmarks

See also: AI benchmarks

Evaluation Schemes

Multi-agent

Agent Challenges

  • Aidan-Bench: Tests creativity by having a particular LLM generate a long sequence of outputs (meant to be different) and measuring how long it can go before duplications appear.
  • Pictionary: An LLM suggests a prompt, multiple LLMs generate outputs, and an LLM judges them; allows ranking of generation abilities.
  • MC-bench: Asks LLMs to build an elaborate structure in Minecraft; outputs can be A/B tested by human judges (code).
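
The duplicate-counting idea described for Aidan-Bench above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the benchmark's actual implementation: `generate` is a hypothetical stand-in for an LLM call, and a "duplicate" is taken here to mean an exact match after lowercasing and whitespace normalization, which is a simplification chosen for the example.

```python
from typing import Callable, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations count as duplicates."""
    return " ".join(text.lower().split())


def outputs_before_duplicate(
    generate: Callable[[List[str]], str],
    max_steps: int = 100,
) -> int:
    """Count how many distinct outputs `generate` produces before repeating itself.

    `generate` is a hypothetical stand-in for an LLM call: it receives the
    outputs produced so far and returns the next one.
    """
    seen = set()
    history: List[str] = []
    for _ in range(max_steps):
        candidate = generate(history)
        key = normalize(candidate)
        if key in seen:
            break  # the first duplicate ends the run
        seen.add(key)
        history.append(candidate)
    return len(history)


def make_cycling_generator(vocabulary: List[str]) -> Callable[[List[str]], str]:
    """Toy generator that cycles through a fixed vocabulary (no real LLM involved)."""
    def generate(history: List[str]) -> str:
        return vocabulary[len(history) % len(vocabulary)]
    return generate


if __name__ == "__main__":
    toy = make_cycling_generator(["apple", "banana", "cherry"])
    print(outputs_before_duplicate(toy))  # prints 3
```

A higher count means the generator sustained novelty for longer; a stricter or looser duplicate test (for example, a similarity threshold) would change the scores, so the criterion should be fixed before comparing models.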

Automated Improvement

See Also