Difference between revisions of "AI benchmarks"

From GISAXS
(Assess Specific Attributes)
(Assess Specific Attributes)
Line 17:
  * [https://lmsys.org/ LMSYS]: Human preference ranking leaderboard
  * [https://trackingai.org/home Tracking AI]: "IQ" leaderboard
- * [https://www.vectara.com/ Vectara] [https://github.com/vectara/hallucination-leaderboard Hallucination Leaderboard]
  * [https://livebench.ai/#/ LiveBench: A Challenging, Contamination-Free LLM Benchmark]
+ * [https://github.com/lechmazur/generalization/ LLM Thematic Generalization Benchmark]

  ==Hallucination==
+ * [https://www.vectara.com/ Vectara] [https://github.com/vectara/hallucination-leaderboard Hallucination Leaderboard]
  * [https://github.com/lechmazur/confabulations/ LLM Confabulation (Hallucination) Leaderboard for RAG]

Revision as of 13:54, 27 March 2025

General

Methods

Task Length

(Figure: GmZHL8xWQAAtFlF.jpeg)

Assess Specific Attributes

Various

Hallucination

Software/Coding

Visual

Creativity

Reasoning

Assistant/Agentic

Science

See: Science Benchmarks