Difference between revisions of "AI benchmarks"

From GISAXS
(Assistant/Agentic)
(Visual)
Line 32:
==Visual==
+ * 2024-06: [https://charxiv.github.io/ Charting Gaps in Realistic Chart Understanding in Multimodal LLMs] ([https://arxiv.org/abs/2406.18521 preprint], [https://charxiv.github.io/ leaderboard])
* 2025-03: [https://arxiv.org/abs/2503.14607 Can Large Vision Language Models Read Maps Like a Human?] MapBench

Revision as of 13:36, 16 April 2025

General

Methods

Task Length

Assess Specific Attributes

Various

Hallucination

Software/Coding

Visual

Conversation

Creativity

Reasoning

Assistant/Agentic

See: AI Agents: Optimization

Science

See: Science Benchmarks