Difference between revisions of "AI Agents"

From GISAXS
(Metrics, Benchmarks)
(Open Source Frameworks)
 
(2 intermediate revisions by the same user not shown)
Line 18: Line 18:
 
===Guides===
 
===Guides===
 
* Anthropic: [https://www.anthropic.com/research/building-effective-agents Building Effective Agents]
 
* Anthropic: [https://www.anthropic.com/research/building-effective-agents Building Effective Agents]
 +
* Google: [https://www.kaggle.com/whitepaper-agents Agents]
  
 
=AI Assistants=
 
=AI Assistants=
Line 68: Line 69:
 
* [https://www.philschmid.de/llm-evaluation LLM Evaluation doesn't need to be complicated]
 
* [https://www.philschmid.de/llm-evaluation LLM Evaluation doesn't need to be complicated]
 
* [https://eugeneyan.com/writing/llm-evaluators/ Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)]
 
* [https://eugeneyan.com/writing/llm-evaluators/ Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)]
* [https://github.com/llm-as-a-judge/Awesome-LLM-as-a-judge Awesome-LLM-as-a-judge Survey
+
* [https://github.com/llm-as-a-judge/Awesome-LLM-as-a-judge Awesome-LLM-as-a-judge Survey]
 
* 2024-10: [https://arxiv.org/abs/2410.10934 Agent-as-a-Judge: Evaluate Agents with Agents]
 
* 2024-10: [https://arxiv.org/abs/2410.10934 Agent-as-a-Judge: Evaluate Agents with Agents]
 
* 2024-11: [https://arxiv.org/abs/2411.15594 A Survey on LLM-as-a-Judge]
 
* 2024-11: [https://arxiv.org/abs/2411.15594 A Survey on LLM-as-a-Judge]
Line 180: Line 181:
 
* [https://github.com/kaiban-ai/KaibanJS KaibanJS]: Kanban for AI Agents? (Takes inspiration from [https://en.wikipedia.org/wiki/Kanban Kanban] visual [https://www.atlassian.com/agile/kanban work management].)
 
* [https://github.com/kaiban-ai/KaibanJS KaibanJS]: Kanban for AI Agents? (Takes inspiration from [https://en.wikipedia.org/wiki/Kanban Kanban] visual [https://www.atlassian.com/agile/kanban work management].)
 
* [https://github.com/Thytu/Agentarium Agentarium]
 
* [https://github.com/Thytu/Agentarium Agentarium]
 +
* [https://orchestra.org/ Orchestra] ([https://docs.orchestra.org/orchestra/introduction docs], [https://docs.orchestra.org/orchestra/introduction code])
  
 
==Open Source Systems==
 
==Open Source Systems==

Latest revision as of 12:53, 7 January 2025

Reviews & Perspectives

Published

Continually updating

Analysis/Opinions

Guides

AI Assistants

Components of AI Assistants

Agent Internal Workflow Management

Information Retrieval

Control (tool-use, computer use, etc.)

Open-source

Personalities/Personas

Specific Uses for AI Assistants

Computer Use

Software Engineering

Science Agents

See Science Agents.

LLM-as-judge

Advanced Workflows

Software Development Workflows

Several paradigms of AI-assisted coding have arisen:

  1. Manual, human-driven
  2. AI-aided through chat/dialogue, where the human asks for code and then copies it into the project
    1. OpenAI ChatGPT
    2. Anthropic Claude
  3. API calls to an LLM, which generates code and inserts the file into the project
  4. LLM integration into the IDE
    1. Copilot
    2. Qodo (Codium) & AlphaCodium (preprint, code)
    3. Cursor
    4. Codeium Windsurf (with "Cascade" AI Agent)
  5. AI-assisted IDE, where the AI generates and manages the dev environment
    1. Replit
    2. Aider (code): Pair programming on the command line
    3. Pythagora
    4. StackBlitz bolt.new
    5. Cline (formerly Claude Dev)
  6. Prompt-to-product
    1. Github Spark (demo video)
  7. Semi-autonomous software engineer agents
    1. Devin (Cognition AI)
    2. Amazon Q
    3. Honeycomb

For a review of the current state of software-engineering agentic approaches, see:

Corporate AI Agent Ventures

Mundane Workflows and Capabilities

Inference-compute Reasoning

Agentic Systems

Increasing AI Agent Intelligence

See: Increasing AI Intelligence

Multi-agent orchestration

Research

Societies and Communities of AI agents

Research demos

Related work

Inter-agent communications

Architectures

Open Source Frameworks

Open Source Systems

Commercial Automation Frameworks

Spreadsheet

Cloud solutions

Frameworks

Optimization

Metrics, Benchmarks

Evaluation Schemes

Multi-agent

Agent Challenges

  • Aidan-Bench: Tests creativity by having a particular LLM generate a long sequence of outputs (each meant to be different) and measuring how long it can continue before duplicates appear.
  • Pictionary: An LLM suggests a prompt, multiple LLMs generate outputs, and an LLM judges them; this allows ranking of the models' generation abilities.
  • MC-bench: Asks LLMs to build an elaborate structure in Minecraft; the outputs can be A/B tested by human judges.
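
The Aidan-Bench idea above (count how many outputs a model produces before it repeats itself) can be illustrated with a minimal sketch. This is not the benchmark's actual implementation: the names `novelty_run_length`, `normalize`, and `make_cycling_generator` are hypothetical, the "LLM" is a canned stand-in, and duplicates are detected by simple normalized string equality, which is an assumption made only to keep the example self-contained.

```python
from typing import Callable, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(text.lower().split())


def novelty_run_length(generate: Callable[[List[str]], str], max_steps: int = 100) -> int:
    """Count how many distinct outputs are produced before the first repeat.

    `generate` is a hypothetical callable that receives the outputs so far
    and returns the next one (in practice this would wrap an LLM call).
    """
    seen = set()
    history: List[str] = []
    for _ in range(max_steps):
        output = generate(history)
        key = normalize(output)
        if key in seen:
            break  # first duplicate ends the run
        seen.add(key)
        history.append(output)
    return len(history)


def make_cycling_generator(answers: List[str]) -> Callable[[List[str]], str]:
    """Hypothetical stand-in for an LLM: returns canned answers in order, cycling."""
    def generate(history: List[str]) -> str:
        return answers[len(history) % len(answers)]
    return generate


if __name__ == "__main__":
    # The third answer repeats the first one up to case and spacing.
    fake_llm = make_cycling_generator(["Use a kite", "Build a raft", "use  a KITE"])
    print(novelty_run_length(fake_llm))
```

A real evaluation would likely replace the exact-match check with a softer similarity measure, since paraphrases of an earlier answer would otherwise count as new.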

Automated Improvement

See Also