AI benchmarks

From GISAXS

Revision as of 10:07, 12 February 2025 by KevinYager



Contents

1 Leaderboards
2 Methods
3 Assess Specific Attributes
- 3.1 Creativity

Leaderboards

LMSYS Chatbot Arena: human preference ranking
Tracking AI
Vectara Hallucination Leaderboard
LiveBench: A Challenging, Contamination-Free LLM Benchmark
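LMSYS Chatbot Arena ranks models from crowdsourced pairwise votes: a user compares two anonymous responses to the same prompt and picks the better one (or declares a tie). A minimal sketch of turning such votes into a ranking with an online Elo update follows. The function name `elo_ratings`, the K-factor of 4, and the base rating of 1000 are illustrative assumptions, and the leaderboard itself has also used other fitting methods (e.g. Bradley-Terry), so treat this as a picture of the idea rather than the leaderboard's actual pipeline.

```python
from collections import defaultdict


def elo_ratings(battles, k=4.0, base=1000.0, scale=400.0):
    """Hypothetical helper: Elo ratings from (model_a, model_b, winner) votes.

    winner is "a", "b", or "tie". k, base, and scale are assumed values.
    """
    ratings = defaultdict(lambda: base)
    for a, b, winner in battles:
        ra, rb = ratings[a], ratings[b]
        # Expected score of model a against model b under the logistic Elo model.
        ea = 1.0 / (1.0 + 10 ** ((rb - ra) / scale))
        sa = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
        # Zero-sum update: whatever a gains, b loses.
        ratings[a] = ra + k * (sa - ea)
        ratings[b] = rb - k * (sa - ea)
    return dict(ratings)


if __name__ == "__main__":
    votes = [
        ("model-x", "model-y", "a"),
        ("model-x", "model-y", "a"),
        ("model-y", "model-z", "tie"),
    ]
    r = elo_ratings(votes)
    print(sorted(r, key=r.get, reverse=True))
```

Because online Elo depends on the order in which votes arrive, a batch fit over all votes is usually preferred for a published leaderboard.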

Methods

AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions (code)
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning. Assesses reasoning using logic-grid ("zebra") puzzles of tunable complexity.
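In a logic-grid puzzle, N houses each receive one value of each of M attributes, and a set of clues must pin down a unique assignment. Complexity can be tuned by changing N and M; one natural measure is the number of candidate grids, (N!)^M, since each attribute is an independent permutation over the houses. The brute-force sketch below (the helper names `search_space_size` and `count_solutions` and the toy clues are hypothetical, not taken from ZebraLogic) counts how many grids satisfy a set of clues; a well-posed puzzle has exactly one.

```python
from itertools import permutations, product
from math import factorial


def search_space_size(n_houses, n_attributes):
    # Each attribute is an independent permutation of its values over the houses.
    return factorial(n_houses) ** n_attributes


def count_solutions(attributes, constraints):
    """Count grids satisfying every clue.

    attributes: dict mapping attribute name -> list of values (one per house).
    constraints: callables taking {name: tuple of values by house index} -> bool.
    """
    names = list(attributes)
    count = 0
    for perms in product(*(permutations(attributes[n]) for n in names)):
        grid = dict(zip(names, perms))
        if all(clue(grid) for clue in constraints):
            count += 1
    return count


# Toy 3-house, 2-attribute puzzle (hypothetical clues).
ATTRIBUTES = {"color": ["red", "green", "blue"], "pet": ["cat", "dog", "fish"]}
CLUES = [
    lambda g: g["color"][0] == "red",  # the red house is the first house
    lambda g: g["color"].index("green") == g["color"].index("red") + 1,  # green is right of red
    lambda g: g["pet"][g["color"].index("blue")] == "cat",  # the cat lives in the blue house
    lambda g: g["pet"][0] == "dog",  # the dog lives in the first house
]

if __name__ == "__main__":
    print(search_space_size(3, 2), count_solutions(ATTRIBUTES, CLUES))
```

Brute force is only practical for tiny grids, which is exactly why growing N and M makes these puzzles a useful difficulty dial; real generators and solvers typically rely on constraint solvers instead.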

Assess Specific Attributes

Creativity

2024-10: AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text
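The paper's premise is that a text is less linguistically creative the more of it can be reconstructed from snippets that already exist in web text. A simplified illustration of that idea follows: it marks every token of a text covered by an exact n-gram match in a small in-memory reference corpus and reports the uncovered fraction as a novelty score. This is not the paper's algorithm; the exact-match rule, the fixed n, the in-memory corpus, and the names `coverage` and `novelty_score` are all assumptions made for the sketch.

```python
def ngram_set(tokens, n):
    # All contiguous n-grams in the reference tokens.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def coverage(text_tokens, reference_tokens, n):
    """Fraction of text tokens inside at least one n-gram that also
    appears verbatim in the reference (hypothetical helper)."""
    if not text_tokens:
        return 0.0
    ref = ngram_set(reference_tokens, n)
    covered = [False] * len(text_tokens)
    for i in range(len(text_tokens) - n + 1):
        if tuple(text_tokens[i:i + n]) in ref:
            for j in range(i, i + n):
                covered[j] = True
    return sum(covered) / len(text_tokens)


def novelty_score(text_tokens, reference_tokens, n=3):
    # Higher means less of the text can be stitched together from the reference.
    return 1.0 - coverage(text_tokens, reference_tokens, n)


if __name__ == "__main__":
    reference = "the cat sat on the mat".split()
    text = "the cat sat on a red mat".split()
    print(novelty_score(text, reference))
```

With a web-scale corpus, the matching step would need an index (e.g. suffix arrays) rather than a Python set.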

Retrieved from "http://gisaxs.com/index.php?title=AI_benchmarks&oldid=6867"