Difference between revisions of "AI Agents"

From GISAXS
Jump to: navigation, search
(Inter-agent communications)
(Iterative reasoning via graphs)
 
(3 intermediate revisions by the same user not shown)
Line 50: Line 50:
 
* [https://eugeneyan.com/writing/llm-evaluators/ Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)]
 
* [https://eugeneyan.com/writing/llm-evaluators/ Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)]
 
* 2024-10: [https://arxiv.org/abs/2410.10934 Agent-as-a-Judge: Evaluate Agents with Agents]
 
* 2024-10: [https://arxiv.org/abs/2410.10934 Agent-as-a-Judge: Evaluate Agents with Agents]
 +
* 2024-11: [https://arxiv.org/abs/2411.15594 A Survey on LLM-as-a-Judge]
  
 
=Advanced Workflows=
 
=Advanced Workflows=
Line 146: Line 147:
 
* 2024-11: [https://x.com/deepseek_ai/status/1859200141355536422 DeepSeek-R1-Lite-Preview reasoning model]
 
* 2024-11: [https://x.com/deepseek_ai/status/1859200141355536422 DeepSeek-R1-Lite-Preview reasoning model]
 
* 2024-11: [https://arxiv.org/abs/2411.14405 Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions]
 
* 2024-11: [https://arxiv.org/abs/2411.14405 Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions]
 +
* 2024-11: [https://huggingface.co/papers/2411.16489 O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?]
  
 
===Scaling===
 
===Scaling===
Line 203: Line 205:
 
===Iterative reasoning via graphs===
 
===Iterative reasoning via graphs===
 
* 2023-08: [https://arxiv.org/abs/2308.09687 Graph of Thoughts: Solving Elaborate Problems with Large Language Models]
 
* 2023-08: [https://arxiv.org/abs/2308.09687 Graph of Thoughts: Solving Elaborate Problems with Large Language Models]
 +
* 2023-10: [https://arxiv.org/abs/2310.04363 Amortizing intractable inference in large language models] ([https://github.com/GFNOrg/gfn-lm-tuning code])
 
* 2024-09: [https://arxiv.org/abs/2409.10038 On the Diagram of Thought]: Iterative reasoning as a directed acyclic graph (DAG)
 
* 2024-09: [https://arxiv.org/abs/2409.10038 On the Diagram of Thought]: Iterative reasoning as a directed acyclic graph (DAG)
  

Latest revision as of 15:51, 26 November 2024

Contents

Reviews & Perspectives

Published

Continually updating

Analysis/Opinions

AI Assistants

Components of AI Assistants

Information Retrieval

Open-source

Personalities/Personas

Specific Uses for AI Assistants

Computer Use

Science Agents

See Science Agents.

LLM-as-judge

Advanced Workflows

Software Development Workflows

Several paradigms of AI-assisted coding have arisen:

  1. Manual, human driven
  2. AI-aided through chat/dialogue, where the human asks for code and then copies it into the project
    1. OpenAI ChatGPT
    2. Anthropic Claude
  3. API calls to an LLM, which generates code and inserts the file into the project
  4. LLM-integration into the IDE
    1. Copilot
    2. Qodo (Codium) & AlphaCodium (preprint, code)
    3. Cursor
    4. Codeium Windsurf (with "Cascade" AI Agent)
  5. AI-assisted IDE, where the AI generates and manages the dev environment
    1. Replit
    2. Aider (code): Pair programming on commandline
    3. Pythagora
    4. StackBlitz bolt.new
    5. Cline (formerly Claude Dev)
  6. Prompt-to-product
    1. Github Spark (demo video)
  7. Semi-autonomous software engineer agents
    1. Devin (Cognition AI)
    2. Amazon Q
    3. Honeycomb

For a review of the current state of software-engineering agentic approaches, see:

Corporate AI Agent Ventures

Mundane Workflows and Capabilities

Inference-compute Reasoning

Agentic Systems

Increasing AI Agent Intelligence

Proactive Search

Compute expended after training, but before inference.

Training Data (Data Refinement, Synthetic Data)

Generate consistent plans/thoughts

  • 2024-08: Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers (code)
    • (Microsoft) rStar is a self-play mutual reasoning approach. A small model adds to MCTS using some defined reasoning heuristics. Mutually consistent trajectories can be emphasized.
  • 2024-09: Self-Harmonized Chain of Thought
    • Produce refined chain-of-thought style solutions/prompts for diverse problems. Given a large set of problems/questions, first aggregated semantically, then apply zero-shot chain-of-thought to each problem. Then cross-pollinate between proposed solutions to similar problems, looking for refined and generalize solutions.

Sampling

Automated prompt generation

Distill inference-time-compute into model

CoT reasoning model

Scaling

Inference Time Compute

Methods

In context learning (ICL), search, and other inference-time methods

Inference-time Sampling

Inference-time Gradient

Self-prompting

In-context thought

Naive multi-LLM (verification, majority voting, best-of-N, etc.)

Multi-LLM (multiple comparisons, branching, etc.)

Iteration (e.g. neural-like layered blocks)

Iterative reasoning via graphs

Monte Carlo Tree Search (MCTS)

Other Search

Scaling

Theory

Expending compute works

Compute.png

Code for Inference-time Compute

  • optillm: Inference proxy which implements state-of-the-art techniques to improve accuracy and performance of LLMs (improve reasoning over coding, logical and mathematical queries)

Memory

Tool Use

Multi-agent Effort (and Emergent Intelligence)

ML-like Optimization of LLM Setup

Multi-agent orchestration

Research demos

Related work

Inter-agent communications

Architectures

Open Source Frameworks

Open Source Systems

Commercial Automation Frameworks

Spreadsheet

Cloud solutions

Frameworks

Optimization

Metrics, Benchmarks

Agent Challenges

  • Aidan-Bench: Test creativity by having a particular LLM generate long sequence of outputs (meant to be different), and measuring how long it can go before duplications appear.
  • Pictionary: LLM suggests prompt, multiple LLMs generate outputs, LLM judges; allows raking of the generation abilities.
  • MC-bench: Request LLMs to build an elaborate structure in Minecraft; outputs can be A/B tested by human judges.

Automated Improvement

See Also