Difference between revisions of "AI benchmarks"

From GISAXS
Jump to: navigation, search
(Software/Coding)
(Methods)
Line 9: Line 9:
 
=Methods=
 
=Methods=
 
* [https://openreview.net/pdf?id=fz969ahcvJ AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions] ([https://github.com/aidanmclaughlin/AidanBench code])
 
* [https://openreview.net/pdf?id=fz969ahcvJ AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions] ([https://github.com/aidanmclaughlin/AidanBench code])
 +
** [https://aidanbench.com/ Leaderboard]
 +
** [https://x.com/scaling01/status/1897301054431064391 Suggestion to use] [https://en.wikipedia.org/wiki/Borda_count Borda count]
 
* [https://arxiv.org/abs/2502.01100 ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning]. Assess reasoning using puzzles of tunable complexity.
 
* [https://arxiv.org/abs/2502.01100 ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning]. Assess reasoning using puzzles of tunable complexity.
  

Revision as of 13:13, 5 March 2025