* [https://github.com/open-thought/system-2-research OpenThought - System 2 Research Links]
* [https://github.com/hijkzzz/Awesome-LLM-Strawberry Awesome LLM Strawberry (OpenAI o1): Collection of research papers & blogs for OpenAI Strawberry(o1) and Reasoning]
* [https://github.com/e2b-dev/awesome-ai-agents Awesome AI Agents]

===Analysis/Opinions===
 
===Agent Internal Workflow Management===
* [https://huggingface.co/blog/smolagents Huggingface] [https://github.com/huggingface/smolagents smolagents] (see the sketch after this list)
* [https://github.com/elizaOS/eliza Eliza] (includes multi-agent, interaction with docs, Discord, Twitter, etc.)
* [https://github.com/The-Pocket/PocketFlow Pocket Flow]: LLM Framework in 100 Lines
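To make concrete what frameworks of this kind provide, a minimal smolagents-style agent could be wired up roughly as follows. This is a sketch based on the smolagents quickstart; the class names (CodeAgent, HfApiModel, DuckDuckGoSearchTool) are assumed from that documentation and may change between releases.

<syntaxhighlight lang="python">
# Minimal sketch of a tool-using agent with Huggingface smolagents.
# Assumes: pip install smolagents; API names follow the smolagents docs
# at the time of writing and may differ in newer releases.
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

# The model wrapper calls a hosted LLM; the tool gives the agent web search.
model = HfApiModel()
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)

# The agent writes and executes short Python snippets internally to answer the query.
answer = agent.run("Summarize the key ideas behind agentic RAG in three bullet points.")
print(answer)
</syntaxhighlight>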
  
===Information Retrieval (Memory)===
* See also [[AI_tools#Retrieval_Augmented_Generation_.28RAG.29|RAG]].
* 2024-09: PaperQA2: [https://paper.wikicrow.ai/ Language Models Achieve Superhuman Synthesis of Scientific Knowledge] ([https://x.com/SGRodriques/status/1833908643856818443 𝕏 post], [https://github.com/Future-House/paper-qa code])
* 2024-10: [https://arxiv.org/abs/2410.09713 Agentic Information Retrieval]
* 2025-02: [https://arxiv.org/abs/2502.01142 DeepRAG: Thinking to Retrieval Step by Step for Large Language Models]
* [https://mem0.ai/ Mem0 AI]: Memory Layer for AI Agents; a self-improving memory layer for LLM applications, enabling personalization (see the memory-layer sketch below)

===Contextual Memory===
* [https://github.com/memodb-io/memobase Memobase]: user profile-based memory (long-term user memory for genAI applications)
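To illustrate the memory-layer pattern behind tools like Mem0 and Memobase, here is a minimal, self-contained sketch of the store/retrieve loop: remember facts per user, then pull the most relevant ones into the next prompt. The SimpleMemory class and its word-overlap similarity are hypothetical stand-ins, not the Mem0 or Memobase APIs.

<syntaxhighlight lang="python">
# Illustrative memory layer: store user facts, retrieve the most relevant ones
# for the next prompt. Real products add embeddings, deduplication, profiles, and decay.
from collections import Counter
import math

def _similarity(a: str, b: str) -> float:
    """Cosine similarity over bag-of-words counts (stand-in for an embedding model)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

class SimpleMemory:
    def __init__(self):
        self._store: dict[str, list[str]] = {}  # user_id -> remembered facts

    def add(self, user_id: str, fact: str) -> None:
        self._store.setdefault(user_id, []).append(fact)

    def search(self, user_id: str, query: str, k: int = 3) -> list[str]:
        facts = self._store.get(user_id, [])
        return sorted(facts, key=lambda f: _similarity(f, query), reverse=True)[:k]

memory = SimpleMemory()
memory.add("alice", "Prefers concise answers with citations.")
memory.add("alice", "Is analyzing GISAXS data with Python.")
# `context` would be prepended to the LLM prompt so the agent personalizes its reply.
context = memory.search("alice", "How should I format my scattering analysis report?")
print(context)
</syntaxhighlight>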
  
 
===Control (tool-use, computer use, etc.)===
* See also: [[Human_Computer_Interaction#AI_Computer_Use]]
* [https://tavily.com/ Tavily]: connect your LLM to the web; real-time, accurate search results tailored for LLMs and RAG

===Model Context Protocol (MCP)===
* '''Standards:'''
*# Anthropic [https://www.anthropic.com/news/model-context-protocol Model Context Protocol] (MCP)
*# [https://openai.github.io/openai-agents-python/mcp/ OpenAI Agents SDK]
* '''Tools:'''
** [https://github.com/jlowin/fastmcp FastMCP]: the fast, Pythonic way to build MCP servers (see the sketch after this list)
** [https://github.com/fleuristes/fleur/ Fleur]: a desktop app marketplace for Claude Desktop
* '''Servers:'''
** '''Lists:'''
**# [https://github.com/modelcontextprotocol/servers Model Context Protocol servers]
**# [https://www.mcpt.com/ MCP Servers, One Managed Registry]
**# [https://github.com/punkpeye/awesome-mcp-servers Awesome MCP Servers]
** '''Noteworthy:'''
**# [https://github.com/modelcontextprotocol/servers/tree/main/src/github GitHub MCP server]
**# [https://github.com/modelcontextprotocol/servers/tree/main/src/puppeteer Puppeteer MCP server]
**# [https://github.com/modelcontextprotocol/servers/tree/main/src/google-maps Google Maps MCP server]
**# [https://github.com/modelcontextprotocol/servers/tree/main/src/slack Slack MCP server]
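As a concrete illustration of the FastMCP entry above, a minimal MCP server exposing one tool and one resource might look roughly like this. The sketch follows the FastMCP README; decorator and method names are assumptions from that documentation and may differ between versions.

<syntaxhighlight lang="python">
# Minimal MCP server exposing one tool and one resource via FastMCP.
# Assumes: pip install fastmcp; API per the jlowin/fastmcp README at the time of writing.
from fastmcp import FastMCP

mcp = FastMCP("Demo Server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers (callable by an MCP client such as Claude Desktop)."""
    return a + b

@mcp.resource("greeting://{name}")
def greeting(name: str) -> str:
    """Return a personalized greeting as a readable resource."""
    return f"Hello, {name}!"

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default so an MCP client can connect
</syntaxhighlight>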
  
 
===Open-source===

===Computer Use===
* See: [[Human_Computer_Interaction#AI_Computer_Use]]
  
 
===Software Engineering===

===Science Agents===
See [[Science Agents]].

===Medicine===
* 2025-03: [https://news.microsoft.com/2025/03/03/microsoft-dragon-copilot-provides-the-healthcare-industrys-first-unified-voice-ai-assistant-that-enables-clinicians-to-streamline-clinical-documentation-surface-information-and-automate-task/ Microsoft Dragon Copilot]: streamline clinical workflows and paperwork
  
 
===LLM-as-judge===
* [https://eugeneyan.com/writing/llm-evaluators/ Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)]
* [https://github.com/llm-as-a-judge/Awesome-LLM-as-a-judge Awesome-LLM-as-a-judge Survey]
* [https://github.com/haizelabs/Awesome-LLM-Judges haizelabs Awesome LLM Judges]
* 2024-10: [https://arxiv.org/abs/2410.10934 Agent-as-a-Judge: Evaluate Agents with Agents]
* 2024-11: [https://arxiv.org/abs/2411.15594 A Survey on LLM-as-a-Judge]
* 2024-12: [https://arxiv.org/abs/2412.05579 LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods]
* 2025-03: [https://arxiv.org/abs/2503.19877 Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators]
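The core LLM-as-judge pattern surveyed above can be sketched in a few lines: ask a grader model to score a candidate answer against a rubric and return structured output. The snippet below is purely illustrative; call_llm is a hypothetical stand-in for whatever model client is used.

<syntaxhighlight lang="python">
# Illustrative LLM-as-judge loop: grade a candidate answer against a rubric.
# `call_llm(prompt) -> str` is a hypothetical helper wrapping your model API of choice.
import json

JUDGE_TEMPLATE = """You are a strict grader.
Question: {question}
Candidate answer: {answer}
Rubric: {rubric}
Reply with JSON: {{"score": <1-5>, "rationale": "<one sentence>"}}"""

def judge(question: str, answer: str, rubric: str, call_llm) -> dict:
    prompt = JUDGE_TEMPLATE.format(question=question, answer=answer, rubric=rubric)
    reply = call_llm(prompt)
    try:
        return json.loads(reply)
    except json.JSONDecodeError:
        return {"score": None, "rationale": reply}  # fall back to raw text if parsing fails

# Example usage with a dummy model that always returns a fixed grade:
fake_llm = lambda prompt: '{"score": 4, "rationale": "Mostly correct, missing one edge case."}'
print(judge("What is 2+2?", "4", "Correct and concise.", fake_llm))
</syntaxhighlight>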

===Deep Research===
* Google [https://blog.google/products/gemini/google-gemini-deep-research/ Deep Research]
* OpenAI [https://openai.com/index/introducing-deep-research/ Deep Research]
* Perplexity:
** [https://www.perplexity.ai/ Search]
** [https://www.perplexity.ai/hub/blog/introducing-perplexity-deep-research Deep Research]
* [https://exa.ai/ Exa AI]:
** [https://exa.ai/websets Websets]: web research agent
** [https://demo.exa.ai/deepseekchat Web-search agent] powered by DeepSeek ([https://github.com/exa-labs/exa-deepseek-chat code]) or [https://o3minichat.exa.ai/ o3-mini] ([https://github.com/exa-labs/exa-o3mini-chat code])
* [https://www.firecrawl.dev/ Firecrawl] ([https://x.com/nickscamara_/status/1886287956291338689 work in progress])
* [https://x.com/mattshumer_ Matt Shumer] [https://github.com/mshumer/OpenDeepResearcher OpenDeepResearcher]
* [https://github.com/zilliztech/deep-searcher DeepSearcher] (operates on local data)
* [https://github.com/nickscamara nickscamara] [https://github.com/nickscamara/open-deep-research open-deep-research]
* [https://x.com/dzhng dzhng] [https://github.com/dzhng/deep-research deep-research]
* [https://huggingface.co/ Hugging Face] [https://huggingface.co/blog/open-deep-research Open Deep Research] ([https://github.com/huggingface/smolagents/tree/main/examples/open_deep_research code])
* xAI Grok 3 Deep Search
* [https://liner.com/news/introducing-deepresearch Liner Deep Research]
* [https://allenai.org/ Allen AI] (AI2) [https://paperfinder.allen.ai/chat Paper Finder]
* 2025-03: [https://arxiv.org/abs/2503.20201 Open Deep Search: Democratizing Search with Open-source Reasoning Agents] ([https://github.com/sentient-agi/OpenDeepSearch code])
  
 
=Advanced Workflows=
* [https://arxiv.org/abs/2409.05556 SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning]
** [https://github.com/lamm-mit/SciAgentsDiscovery code]

===Streamline Administrative Tasks===
* 2025-02: [https://er.educause.edu/articles/2025/2/ushering-in-a-new-era-of-ai-driven-data-insights-at-uc-san-diego Ushering in a New Era of AI-Driven Data Insights at UC San Diego]

===Author Research Articles===
* 2024-02: STORM: [https://arxiv.org/abs/2402.14207 Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models] ([https://www.aihero.dev/storm-generate-high-quality-articles-based-on-real-research discussion/analysis])
  
 
===Software Development Workflows===
Several paradigms of AI-assisted coding have arisen:
# Manual, human driven
# AI-aided through chat/dialogue, where the human asks for code and then copies it into the project
## OpenAI ChatGPT
## Anthropic Claude
# API calls to an LLM, which generates code and inserts the file into the project
# LLM-integration into the IDE
## Copilot
## Qodo (Codium) & AlphaCodium (preprint, code)
## [https://www.cursor.com/ Cursor]
## [https://codeium.com/ Codeium] [https://codeium.com/windsurf Windsurf] (with "Cascade" AI Agent)
## ByteDance [https://www.trae.ai/ Trae AI]
## [https://www.tabnine.com/ Tabnine]
## [https://marketplace.visualstudio.com/items?itemName=Traycer.traycer-vscode Traycer]
## [https://idx.dev/ IDX]: free
## [https://github.com/codestoryai/aide Aide]: open-source AI-native code editor (fork of VS Code)
## [https://www.continue.dev/ continue.dev]: open-source code assistant
## [https://trypear.ai/ Pear AI]: open-source code editor
## [https://haystackeditor.com/ Haystack Editor]: canvas UI
## [https://onlook.com/ Onlook]: for designers
# AI-assisted IDE, where the AI generates and manages the dev environment
## [https://replit.com/ Replit]
## Aider: pair programming on the command line
## Pythagora
## StackBlitz bolt.new
## Cline (formerly Claude Dev)
# Prompt-to-product
## [https://githubnext.com/projects/github-spark Github Spark] ([https://x.com/ashtom/status/1851333075374051725 demo video])
## [https://www.create.xyz/ Create.xyz]: text-to-app, replicate product from link
## [https://a0.dev/ a0.dev]: generate mobile apps (from your phone)
## [https://softgen.ai/ Softgen]: web app developer
## [https://wrapifai.com/ wrapifai]: build form-based apps
## [https://lovable.dev/ Lovable]: web app (from text, screenshot, etc.)
## [https://v0.dev/ Vercel v0]
## [https://x.com/johnrushx/status/1625179509728198665 MarsX] ([https://x.com/johnrushx John Rush]): SaaS builder
## [https://webdraw.com/ Webdraw]: turn sketches into web apps
## [https://www.tempo.new/ Tempo Labs]: build React apps
## [https://databutton.com/ Databutton]: no-code software development
## [https://base44.com/ base44]: no-code dashboard apps
## [https://www.theorigin.ai/ Origin AI]
# Semi-autonomous software engineer agents
## [https://www.cognition.ai/blog/introducing-devin Devin] (Cognition AI)
## [https://aws.amazon.com/q/ Amazon Q] (and CodeWhisperer)
## [https://honeycomb.sh/ Honeycomb]
## [https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview Claude Code]
  
 
For a review of the current state of software-engineering agentic approaches, see:

==Inference-compute Reasoning==
 
* [https://nousresearch.com/#popup-menu-anchor Nous Research]: [https://nousresearch.com/introducing-the-forge-reasoning-api-beta-and-nous-chat-an-evolution-in-llm-inference/ Forge Reasoning API Beta]

==AI Assistant==
* [https://convergence.ai/ Convergence] [https://proxy.convergence.ai/ Proxy]

==Agentic Systems==
* [https://www.cognition.ai/ Cognition AI]: [https://www.cognition.ai/blog/introducing-devin Devin] software engineer (≈14% on SWE-bench)
* [https://honeycomb.sh/ Honeycomb] ([https://honeycomb.sh/blog/swe-bench-technical-report 22% on SWE-bench])
* [https://www.factory.ai/ Factory AI]
  
 
=Increasing AI Agent Intelligence=
See: [[Increasing AI Intelligence]]

=Multi-agent orchestration=
==Research==
* 2025-03: [https://arxiv.org/abs/2503.13657 Why Do Multi-Agent LLM Systems Fail?]
* 2025-03: [https://arxiv.org/abs/2503.15478 SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks]

===Organization Schemes===
* 2025-03: [https://arxiv.org/abs/2503.02390 ReSo: A Reward-driven Self-organizing LLM-based Multi-Agent System for Reasoning Tasks]

===Societies and Communities of AI agents===
* 2024-12: [https://arxiv.org/abs/2412.10270 Cultural Evolution of Cooperation among LLM Agents]

===Research demos===
* 2024-10: [https://arxiv.org/abs/2410.08164 Agent S: An Open Agentic Framework that Uses Computers Like a Human] ([https://github.com/simular-ai/Agent-S code])
* 2024-10: [https://arxiv.org/abs/2410.20424 AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions]
* 2025-02: [https://arxiv.org/abs/2502.16111 PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving]

===Related work===

==Open Source Frameworks==
* [https://github.com/Thytu/Agentarium Agentarium]
* [https://orchestra.org/ Orchestra] ([https://docs.orchestra.org/orchestra/introduction docs], [https://docs.orchestra.org/orchestra/introduction code])
* [https://github.com/HKUDS/AutoAgent AutoAgent]: Fully-Automated & Zero-Code LLM Agent Framework
* [https://mastra.ai/ Mastra] ([https://github.com/mastra-ai/mastra github]): opinionated TypeScript framework for AI applications (primitives for workflows, agents, RAG, integrations, and evals)
* [https://github.com/orra-dev/orra Orra]: multi-agent applications with complex real-world interactions
* [https://github.com/gensx-inc/gensx/blob/main/README.md GenSX]
* Cloudflare [https://developers.cloudflare.com/agents/ agents-sdk] ([https://blog.cloudflare.com/build-ai-agents-on-cloudflare/ info], [https://github.com/cloudflare/agents code])
* OpenAI [https://platform.openai.com/docs/api-reference/responses responses API] and [https://platform.openai.com/docs/guides/agents agents SDK]
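As a rough illustration of the plumbing these frameworks standardize (agent roles, hand-offs, tool and model calls), here is a deliberately minimal hand-rolled planner/worker pipeline. It is not any particular framework's API; call_llm is a hypothetical model-client stand-in.

<syntaxhighlight lang="python">
# Hand-rolled two-agent pipeline: a planner decomposes the task, a worker executes each step.
# Frameworks like those listed above replace this plumbing with agents, tools, tracing, and evals.
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    system_prompt: str

    def run(self, task: str, call_llm) -> str:
        return call_llm(f"{self.system_prompt}\n\nTask: {task}")

def orchestrate(task: str, call_llm) -> str:
    planner = Agent("planner", "Break the task into at most 3 numbered steps.")
    worker = Agent("worker", "Complete the given step and report the result briefly.")

    plan = planner.run(task, call_llm)
    results = []
    for step in [line for line in plan.splitlines() if line.strip()]:
        results.append(worker.run(step, call_llm))
    return "\n".join(results)

# Example with a dummy model so the sketch runs without API keys:
fake_llm = lambda prompt: ("1. gather data\n2. analyze\n3. summarize"
                           if "numbered steps" in prompt else f"done: {prompt[-40:]}")
print(orchestrate("Write a brief report on agent benchmarks", fake_llm))
</syntaxhighlight>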
  
 
==Open Source Systems==

===Spreadsheet===
* [https://ottogrid.ai/ Otto Grid]
* [https://www.paradigmai.com/ Paradigm]
* [https://www.superworker.ai/ Superworker AI]
  
 
==Cloud solutions==

=Optimization=
===Reviews===
* 2024-12: [https://arxiv.org/abs/2412.11936 A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges]
* 2025-03: [https://arxiv.org/abs/2503.16416 Survey on Evaluation of LLM-based Agents]

===Metrics, Benchmarks===
* 2019-11: [https://arxiv.org/abs/1911.01547 On the Measure of Intelligence]
* 2024-11: [https://arxiv.org/abs/2411.13543 BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games]
* 2024-12: [https://arxiv.org/abs/2412.14161 TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks] ([https://github.com/TheAgentCompany/TheAgentCompany code], [https://the-agent-company.com/ project], [https://the-agent-company.com/#/leaderboard leaderboard])
* 2025-01: [https://codeelo-bench.github.io/ CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings] ([https://arxiv.org/abs/2501.01257 preprint], [https://codeelo-bench.github.io/#leaderboard-table leaderboard])
* 2025-02: [https://static.scale.com/uploads/654197dc94d34f66c0f5184e/EnigmaEval%20v4.pdf ENIGMAEVAL: A Benchmark of Long Multimodal Reasoning Challenges] ([https://scale.com/leaderboard/enigma_eval leaderboard])
* 2025-02: [https://sites.google.com/view/mlgym MLGym: A New Framework and Benchmark for Advancing AI Research Agents] ([https://arxiv.org/abs/2502.14499 paper], [https://github.com/facebookresearch/MLGym code])
* 2025-02: [https://arxiv.org/abs/2502.18356 WebGames: Challenging General-Purpose Web-Browsing AI Agents]
* 2025-03: ColBench: [https://arxiv.org/abs/2503.15478 SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks]
  
 

===Evaluation Schemes===

===Multi-agent===
* 2024-12: [https://arxiv.org/abs/2412.10270 Cultural Evolution of Cooperation among LLM Agents]
* [https://github.com/lechmazur/step_game/ Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure]
  
 
===Agent Challenges===
* Aidan-Bench: test creativity by having a particular LLM generate a long sequence of outputs (meant to be different from each other), measuring how many it can produce before duplications appear (see the sketch below).
* Pictionary: an LLM suggests a prompt, multiple LLMs generate outputs, and an LLM judges; this allows ranking of the models' generation abilities.
* MC-bench: request LLMs to build an elaborate structure in Minecraft; outputs can be A/B tested by human judges.
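A rough sketch of the Aidan-Bench-style measurement described above: keep sampling answers to one open-ended prompt and stop once a new answer is too similar to an earlier one. This is illustrative only; sample_llm is a hypothetical generator, and the simple lexical similarity stands in for the embedding-based novelty check used in practice.

<syntaxhighlight lang="python">
# Count how many distinct outputs a model produces before it starts repeating itself.
# `sample_llm(prompt) -> str` is a hypothetical generator; Jaccard word overlap
# is a stand-in for an embedding-based novelty measure.
def jaccard(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if (wa | wb) else 1.0

def novelty_run(prompt: str, sample_llm, threshold: float = 0.8, max_samples: int = 100) -> int:
    seen: list[str] = []
    for _ in range(max_samples):
        out = sample_llm(prompt)
        if any(jaccard(out, prev) >= threshold for prev in seen):
            break  # output duplicates an earlier one; stop counting
        seen.append(out)
    return len(seen)  # number of sufficiently novel outputs before the first duplication

# Example with a dummy sampler that runs out of ideas after three distinct answers:
ideas = iter(["use a robot", "write a poem", "bake bread", "use a robot"])
print(novelty_run("Name a creative weekend project.", lambda p: next(ideas)))
</syntaxhighlight>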
 
=See Also=
* [[Science Agents]]
* [[Increasing AI Intelligence]]
* [[AI tools]]
* [[AI understanding]]
* [[Robots]]
* [[Exocortex]]
