Difference between revisions of "AI benchmarks"

From GISAXS
(Creativity)
(Assess Specific Attributes)
Line 19:
 
* [https://www.vectara.com/ Vectara] [https://github.com/vectara/hallucination-leaderboard Hallucination Leaderboard]
 
* [https://livebench.ai/#/ LiveBench: A Challenging, Contamination-Free LLM Benchmark]
 
+
+ ==Hallucination==
+ * [https://github.com/lechmazur/confabulations/ LLM Confabulation (Hallucination) Leaderboard for RAG]
  
 
==Software/Coding==
 

Revision as of 13:52, 27 March 2025

General

Methods

Task Length

Assess Specific Attributes

Various

Hallucination

Software/Coding

Visual

Creativity

Reasoning

Assistant/Agentic

Science

See: Science Benchmarks