Difference between revisions of "AI benchmarks"

From GISAXS
(Conversation)
(Assistant/Agentic)
 
Line 49:
  
 
==Assistant/Agentic==
 
+ See: [[AI_Agents#Optimization|AI Agents: Optimization]]
 
* [https://arxiv.org/abs/2311.12983 GAIA: a benchmark for General AI Assistants]
 
* [https://www.galileo.ai/blog/agent-leaderboard Galileo AI] [https://huggingface.co/spaces/galileo-ai/agent-leaderboard Agent Leaderboard]
 
Latest revision as of 16:28, 14 April 2025

General

Methods

Task Length

Assess Specific Attributes

Various

Hallucination

Software/Coding

Visual

Conversation

Creativity

Reasoning

Assistant/Agentic

See: AI Agents: Optimization

Science

See: Science Benchmarks