Difference between revisions of "Science Agents"

From GISAXS
 
** 2024-07: [https://arxiv.org/abs/2407.09413 SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers]
 
** 2024-10: [https://neurips.cc/virtual/2024/98540 FEABench: Evaluating Language Models on Real World Physics Reasoning Ability]
* 2025-07: [https://allenai.org/blog/sciarena SciArena: A New Platform for Evaluating Foundation Models in Scientific Literature Tasks] ([https://sciarena.allen.ai/ vote], [https://huggingface.co/datasets/yale-nlp/SciArena data], [https://github.com/yale-nlp/SciArena code])
 
* 2026-02: [https://edisonscientific.com/ Edison]: [https://lab-bench.ai/ LABBench 2]
* 2026-04: [https://arxiv.org/abs/2604.14140 LongCoT: Benchmarking Long-Horizon Chain-of-Thought Reasoning] ([https://longcot.ai/ site], [https://github.com/LongHorizonReasoning/longcot code])
  
 
=Science Agents=
 
* 2026-02: [https://arxiv.org/abs/2601.23265 PaperBanana: Automating Academic Illustration for AI Scientists]
 
* 2026-03: [https://arxiv.org/abs/2603.20179 AI Agents Can Already Autonomously Perform Experimental High Energy Physics]
* 2026-04: [https://github.com/AstroPilot-AI/Denario Denario]: scientific research assistant
  
 
==Science Multi-Agent Setups==
 
** 2026-03: Three problems solved using an internal OpenAI GPT model. Paper: [https://arxiv.org/pdf/2603.29961 Short Proofs in Combinatorics and Number Theory]
 
** 2026-04: [https://www.erdosproblems.com/forum/thread/1196 Erdős Problem #1196] [https://x.com/Liam06972452/status/2044051379916882067?s=20 solved] by [https://x.com/Liam06972452 Leeham] using ChatGPT 5.4 Pro
** 2026-04: [https://www.erdosproblems.com/forum/thread/258 Erdős Problem #258] [https://x.com/prz_chojecki/status/2044129595729854493?s=20 solved] by [https://x.com/prz_chojecki Przemek Chojecki] using ChatGPT 5.4 Pro
 
* 2026-01: [https://arxiv.org/abs/2601.07222 The motivic class of the space of genus 0 maps to the flag variety]
 
* 2026-02: Google DeepMind: [https://arxiv.org/abs/2602.10177 Towards Autonomous Mathematics Research]

Latest revision as of 12:43, 16 April 2026

AI Use-cases for Science

Literature

LLM extract data from papers

AI finding links in literature

(Pre) Generate Articles

Explanation

Autonomous Ideation

Adapting LLMs to Science

AI/LLM Control of Scientific Instruments/Facilities

AI/ML Methods tailored to Science

Science Foundation Models

Regression (Data Fitting)

Tabular Classification/Regression

Symbolic Regression

Literature Discovery

Commercial

Bio

AI/ML Methods in Science

Imaging

Materials

Chemistry

Biology

Medicine

See: AI_Agents#Medicine

Successes

AI/ML Methods co-opted for Science

Mechanistic Interpretability

Train a large model on science data, then apply mechanistic interpretability methods (e.g. sparse autoencoders, SAEs) to its feature/activation space.
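A minimal sketch of the SAE step, assuming the model's activations have already been extracted as an (n_samples, d_model) NumPy array. The function names (train_sae, sae_features), the dictionary size, L1 coefficient, learning rate, and the synthetic activations are all hypothetical illustrative choices, not taken from any particular paper or library. Real SAE training usually adds refinements (e.g. an adaptive optimizer, much larger dictionaries) omitted here.

```python
import numpy as np


def train_sae(acts, d_dict, l1=1e-3, lr=1e-2, steps=300, seed=0):
    """Fit a one-layer sparse autoencoder (SAE) to activations of shape (n, d_model).

    Encoder: f = ReLU(x @ W_enc + b_enc); decoder: x_hat = f @ W_dec + b_dec.
    Loss: mean squared reconstruction error + l1 * mean L1 norm of f,
    minimized with plain full-batch gradient descent (illustrative settings).
    """
    rng = np.random.default_rng(seed)
    n, d_model = acts.shape
    W_enc = rng.normal(0.0, 0.1, (d_model, d_dict))
    b_enc = np.zeros(d_dict)
    W_dec = W_enc.T.copy()  # tied initialization, but trained as a separate matrix
    b_dec = np.zeros(d_model)
    history = []
    for _ in range(steps):
        # Forward pass
        pre = acts @ W_enc + b_enc
        f = np.maximum(pre, 0.0)
        err = f @ W_dec + b_dec - acts
        loss = np.sum(err ** 2) / n + l1 * np.sum(f) / n  # f >= 0, so |f| == f
        history.append(loss)
        # Backward pass (hand-derived gradients)
        d_xhat = 2.0 * err / n
        g_W_dec = f.T @ d_xhat
        g_b_dec = d_xhat.sum(axis=0)
        d_f = d_xhat @ W_dec.T + l1 * (f > 0) / n
        d_pre = d_f * (pre > 0)  # ReLU gradient mask
        g_W_enc = acts.T @ d_pre
        g_b_enc = d_pre.sum(axis=0)
        # Gradient-descent update
        W_enc -= lr * g_W_enc
        b_enc -= lr * g_b_enc
        W_dec -= lr * g_W_dec
        b_dec -= lr * g_b_dec
    params = {"W_enc": W_enc, "b_enc": b_enc, "W_dec": W_dec, "b_dec": b_dec}
    return params, history


def sae_features(params, acts):
    """Sparse, non-negative feature activations to inspect for interpretability."""
    return np.maximum(acts @ params["W_enc"] + params["b_enc"], 0.0)


# Synthetic stand-in for model activations: sparse mixtures of 32 hidden directions in 16-D
rng = np.random.default_rng(1)
directions = rng.normal(size=(32, 16)) / 4.0
codes = rng.random((256, 32)) * (rng.random((256, 32)) < 0.1)
acts = codes @ directions

params, history = train_sae(acts, d_dict=32)
feats = sae_features(params, acts)
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
print("mean active features per sample:", (feats > 0).sum(axis=1).mean())
```

The interpretability work then happens on feats: for each learned feature, one would look at which inputs (e.g. which molecules, spectra, or text passages) activate it most strongly and ask whether it tracks a meaningful scientific concept.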

Uncertainty

Science Benchmarks

Science Agents

Reviews

Challenges

Specific

Science Multi-Agent Setups

Science Agentic Components

Frameworks

Personalities

Skills

AI Science Systems

Inorganic Materials Discovery

Materials Characterization

Chemistry

Bio

Physics

LLMs Optimized for Science

Impact of AI in Science

Related Tools

Literature Search

Data Visualization

Generative

Chemistry

Science Datasets

Genuine Discoveries

Math

Physics assistance

Literature exploration

Bio design

Material Discovery

See Also