AI Agents

From GISAXS

===Information Retrieval (Memory)===

* 2025-02: [https://arxiv.org/abs/2502.01142 DeepRAG: Thinking to Retrieval Step by Step for Large Language Models]
* [https://mem0.ai/ Mem0 AI]: Memory Layer for AI Agents; self-improving memory layer for LLM applications, enabling personalized AI experiences.

===Contextual Memory===

* [https://github.com/memodb-io/memobase Memobase]: user profile-based memory (long-term user memory for genAI applications); a generic sketch of such a memory layer is shown below.

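To illustrate the general idea behind such memory layers (this is a generic sketch, not the actual Mem0 or Memobase API; all class and function names below are hypothetical), a per-user memory can be as simple as storing facts and retrieving the most relevant ones to prepend to each prompt:

<syntaxhighlight lang="python">
# Hypothetical illustration of a per-user memory layer for LLM applications.
# This is NOT the Mem0 or Memobase API; names and the scoring are placeholders.
from collections import defaultdict


class UserMemory:
    """Stores free-text facts per user and retrieves the most relevant ones."""

    def __init__(self):
        self._facts = defaultdict(list)  # user_id -> list of fact strings

    def add(self, user_id: str, fact: str) -> None:
        """Record a new fact about a user (e.g. extracted from a conversation)."""
        self._facts[user_id].append(fact)

    def search(self, user_id: str, query: str, k: int = 3) -> list[str]:
        """Return up to k stored facts sharing the most words with the query."""
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(fact.lower().split())), fact)
            for fact in self._facts[user_id]
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [fact for score, fact in scored[:k] if score > 0]


# Usage: retrieved facts are prepended to the prompt to personalize the answer.
memory = UserMemory()
memory.add("alice", "Prefers Python examples over pseudocode.")
memory.add("alice", "Works on small-angle X-ray scattering analysis.")
facts = memory.search("alice", "Show me a Python example for scattering data")
prompt = "Known about this user:\n" + "\n".join(facts) + "\n\nUser question: ..."
</syntaxhighlight>
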
===Control (tool-use, computer use, etc.)===

===Computer Use===
* 2024-11: [https://arxiv.org/abs/2411.10323 The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use] ([https://github.com/showlab/computer_use_ootb code])
* See: [[Human_Computer_Interaction#AI_Computer_Use]]
* 2025-01: [https://arxiv.org/abs/2501.10893 Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments]
 
  
 
===Software Engineering===

===LLM-as-judge===

* 2024-11: [https://arxiv.org/abs/2411.15594 A Survey on LLM-as-a-Judge]
* 2024-12: [https://arxiv.org/abs/2412.05579 LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods]

===Deep Research===

* Google [https://blog.google/products/gemini/google-gemini-deep-research/ Deep Research]
* OpenAI [https://openai.com/index/introducing-deep-research/ Deep Research]
* Perplexity [https://www.perplexity.ai/hub/blog/introducing-perplexity-deep-research Deep Research]
* [https://exa.ai/ Exa AI] web-search agent, powered by [https://demo.exa.ai/deepseekchat DeepSeek] ([https://github.com/exa-labs/exa-deepseek-chat code]) or [https://o3minichat.exa.ai/ o3-mini] ([https://github.com/exa-labs/exa-o3mini-chat code])
* [https://www.firecrawl.dev/ Firecrawl] [https://x.com/nickscamara_/status/1886287956291338689 wip]
* [https://x.com/mattshumer_ Matt Shumer] [https://github.com/mshumer/OpenDeepResearcher OpenDeepResearcher]
* [https://github.com/zilliztech/deep-searcher DeepSearcher] (operates on local data)
* [https://github.com/nickscamara nickscamara] [https://github.com/nickscamara/open-deep-research open-deep-research]
* [https://x.com/dzhng dzhng] [https://github.com/dzhng/deep-research deep-research]
* [https://huggingface.co/ huggingface] [https://huggingface.co/blog/open-deep-research open-Deep-research] ([https://github.com/huggingface/smolagents/tree/main/examples/open_deep_research code])
* xAI Grok 3 Deep Search
  
 
=Advanced Workflows=

 
* [https://arxiv.org/abs/2409.05556 SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning]
** [https://github.com/lamm-mit/SciAgentsDiscovery code]

===Streamline Administrative Tasks===

* 2025-02: [https://er.educause.edu/articles/2025/2/ushering-in-a-new-era-of-ai-driven-data-insights-at-uc-san-diego Ushering in a New Era of AI-Driven Data Insights at UC San Diego]
 
===Author Research Articles===

 
## [https://www.cursor.com/ Cursor]
## [https://codeium.com/ Codeium] [https://codeium.com/windsurf Windsurf] (with "Cascade" AI Agent)
## ByteDance [https://www.trae.ai/ Trae AI]
# AI-assisted IDE, where the AI generates and manages the dev environment
## [https://replit.com/ Replit]
 
==Open Source Frameworks==

* [https://github.com/Thytu/Agentarium Agentarium]
* [https://orchestra.org/ Orchestra] ([https://docs.orchestra.org/orchestra/introduction docs], [https://docs.orchestra.org/orchestra/introduction code])
* [https://github.com/HKUDS/AutoAgent AutoAgent]: Fully-Automated & Zero-Code LLM Agent Framework
  
 
==Open Source Systems==

 
===Metrics, Benchmarks===

* 2024-12: [https://arxiv.org/abs/2412.11936 A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges]
* 2025-01: [https://codeelo-bench.github.io/ CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings] ([https://arxiv.org/abs/2501.01257 preprint], [https://codeelo-bench.github.io/#leaderboard-table leaderboard])
* 2025-02: [https://static.scale.com/uploads/654197dc94d34f66c0f5184e/EnigmaEval%20v4.pdf ENIGMAEVAL: A Benchmark of Long Multimodal Reasoning Challenges] ([https://scale.com/leaderboard/enigma_eval leaderboard])
* 2025-02: [https://sites.google.com/view/mlgym MLGym: A New Framework and Benchmark for Advancing AI Research Agents] ([https://arxiv.org/abs/2502.14499 paper], [https://github.com/facebookresearch/MLGym code])

 
===Evaluation Schemes===

Latest revision as of 13:05, 23 February 2025

Reviews & Perspectives

Published

Continually updating

Analysis/Opinions

Guides

AI Assistants

Components of AI Assistants

Agent Internal Workflow Management

Information Retrieval (Memory)

Contextual Memory

  • Memobase: user profile-based memory (long-term user memory for genAI applications)

Control (tool-use, computer use, etc.)

Open-source

Personalities/Personas

Specific Uses for AI Assistants

Computer Use

Software Engineering

Science Agents

See Science Agents.

LLM-as-judge

Deep Research

Advanced Workflows

Streamline Administrative Tasks

Author Research Articles

Software Development Workflows

Several paradigms of AI-assisted coding have arisen:

  1. Manual, human-driven
  2. AI-aided through chat/dialogue, where the human asks for code and then copies it into the project
    1. OpenAI ChatGPT
    2. Anthropic Claude
  3. API calls to an LLM, which generates code and inserts the file into the project (a minimal sketch of this pattern appears after this list)
  4. LLM-integration into the IDE
    1. Copilot
    2. Qodo (Codium) & AlphaCodium (preprint, code)
    3. Cursor
    4. Codeium Windsurf (with "Cascade" AI Agent)
    5. ByteDance Trae AI
  5. AI-assisted IDE, where the AI generates and manages the dev environment
    1. Replit
    2. Aider (code): pair programming on the command line
    3. Pythagora
    4. StackBlitz bolt.new
    5. Cline (formerly Claude Dev)
  6. Prompt-to-product
    1. GitHub Spark (demo video)
  7. Semi-autonomous software engineer agents
    1. Devin (Cognition AI)
    2. Amazon Q
    3. Honeycomb
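As a minimal sketch of paradigm 3 above (an API call to an LLM whose output is inserted into the project as a file), the snippet below uses the OpenAI Python client purely as an example; the model name, prompt, and output path are placeholder assumptions, not a recommendation of a specific tool:

<syntaxhighlight lang="python">
# Sketch of "API calls to an LLM, which generates code and inserts the file
# into the project". Model name, prompt, and output path are placeholders.
from pathlib import Path

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

task = "Write a Python module with a function radius_of_gyration(coords) using numpy."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any code-capable model works
    messages=[
        {"role": "system", "content": "Return only the contents of a single Python file."},
        {"role": "user", "content": task},
    ],
)

generated_code = response.choices[0].message.content

# Insert the generated file into the project tree.
out_path = Path("project/analysis/gyration.py")
out_path.parent.mkdir(parents=True, exist_ok=True)
out_path.write_text(generated_code)
</syntaxhighlight>

The later paradigms (4-7) differ mainly in how much of this loop (gathering context, editing files, running code, reviewing results) moves from the human into the tool.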

For a review of the current state of software-engineering agentic approaches, see:

Corporate AI Agent Ventures

Mundane Workflows and Capabilities

Inference-compute Reasoning

Agentic Systems

Increasing AI Agent Intelligence

See: Increasing AI Intelligence

Multi-agent orchestration

Research

Societies and Communities of AI agents

Domain-specific

Research demos

Related work

Inter-agent communications

Architectures

Open Source Frameworks

Open Source Systems

Commercial Automation Frameworks

Spreadsheet

Cloud solutions

Frameworks

Optimization

Metrics, Benchmarks

Evaluation Schemes

Multi-agent

Agent Challenges

  • Aidan-Bench: Tests creativity by having a particular LLM generate a long sequence of outputs (meant to all be different) and measuring how long it can go before duplications appear (a sketch of this loop follows this list).
  • Pictionary: An LLM suggests a prompt, multiple LLMs generate outputs, and an LLM judges the results; this allows ranking of the generation abilities.
  • MC-bench: Request LLMs to build an elaborate structure in Minecraft; outputs can be A/B tested by human judges.
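
The Aidan-Bench loop referenced above can be sketched roughly as follows; the duplicate test here is a simple word-overlap heuristic rather than the benchmark's actual novelty criterion, and ask_model is a placeholder for any LLM call:

<syntaxhighlight lang="python">
# Rough sketch of an Aidan-Bench-style creativity measurement: keep asking the
# same open-ended question and count distinct answers before the first repeat.
# The duplicate check below is a word-overlap placeholder, not the real criterion.

def word_set(text: str) -> set[str]:
    return set(text.lower().split())


def is_duplicate(answer: str, previous: list[str], threshold: float = 0.8) -> bool:
    """Treat an answer as a duplicate if it overlaps heavily with an earlier one."""
    words = word_set(answer)
    for prior in previous:
        overlap = len(words & word_set(prior)) / max(len(words | word_set(prior)), 1)
        if overlap >= threshold:
            return True
    return False


def creativity_score(ask_model, question: str, max_turns: int = 50) -> int:
    """Count how many distinct answers the model produces before duplicating.

    ask_model(prompt) is any callable returning the model's text answer;
    wiring it to a specific LLM API is left to the caller.
    """
    answers: list[str] = []
    for _ in range(max_turns):
        prompt = (
            f"{question}\n"
            "Give an answer different from all previous ones:\n" + "\n".join(answers)
        )
        answer = ask_model(prompt)
        if is_duplicate(answer, answers):
            break
        answers.append(answer)
    return len(answers)
</syntaxhighlight>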

Automated Improvement

See Also