Increasing AI Intelligence

=Reviews=
* 2024-12: [https://arxiv.org/abs/2412.11936 A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges]
* 2025-01: [https://arxiv.org/abs/2501.02497 Test-time Computing: from System-1 Thinking to System-2 Thinking] ([https://github.com/Dereck0602/Awesome_Test_Time_LLMs github list of papers])
* 2025-01: [https://arxiv.org/abs/2501.11223 Reasoning Language Models: A Blueprint]
* Links to papers: [https://github.com/hijkzzz/Awesome-LLM-Strawberry Awesome LLM Strawberry (OpenAI o1)]

=Proactive Search=
Compute expended after training, but before inference.

==Training Data (Data Refinement, Synthetic Data)==
 
* 2024-09: [https://arxiv.org/abs/2409.17115 Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale]
* 2024-10: [https://arxiv.org/abs/2410.15547 Data Cleaning Using Large Language Models]
* 2025-01: [https://arxiv.org/abs/2501.18845 Text Data Augmentation for Large Language Models: A Comprehensive Survey of Methods, Challenges, and Opportunities]
* 2025-02: [https://arxiv.org/abs/2502.01718 ACECODER: Acing Coder RL via Automated Test-Case Synthesis]
* Updating list of links: [https://github.com/wasiahmad/Awesome-LLM-Synthetic-Data Synthetic Data of LLMs, by LLMs, for LLMs]

==Generate consistent plans/thoughts==
* 2024-08: Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers (code)
** (Microsoft) rStar is a self-play mutual reasoning approach: a small model augments MCTS with defined reasoning heuristics, and mutually consistent trajectories are emphasized (see the consistency-voting sketch below).
* 2024-09: Self-Harmonized Chain of Thought
** Produces refined chain-of-thought solutions/prompts for diverse problems. Given a large set of problems/questions, first aggregate them semantically, then apply zero-shot chain-of-thought to each problem, then cross-pollinate between proposed solutions to similar problems, looking for refined and generalized solutions (see the clustering sketch below).
* 2024-11: LLMs Do Not Think Step-by-step In Implicit Reasoning
** They argue that models trained to reproduce CoT outputs do not, internally, perform stepwise reasoning (with intermediate representations); this suggests that explicit CoT could be superior to implicit CoT.
  
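A minimal sketch of the mutual-consistency idea (not rStar's actual MCTS machinery): sample several chain-of-thought trajectories and emphasize the mutually consistent ones via a majority vote over final answers. The model name and the "Answer: ..." extraction convention are placeholder assumptions.
<syntaxhighlight lang="python">
# Sketch only: majority voting over sampled trajectories stands in for
# rStar's richer mutual-consistency machinery.
from collections import Counter

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; model below is a placeholder

def sample_answer(question: str) -> str:
    """Sample one chain-of-thought trajectory; return only its final answer line."""
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=1.0,  # diverse trajectories are what make voting informative
        messages=[{
            "role": "user",
            "content": question + "\nThink step by step, then give the final "
                                  "answer on the last line as 'Answer: ...'",
        }],
    )
    return reply.choices[0].message.content.strip().splitlines()[-1]

def consistent_answer(question: str, n: int = 8) -> str:
    # Mutually consistent trajectories "vote" for the same final answer.
    votes = Counter(sample_answer(question) for _ in range(n))
    return votes.most_common(1)[0][0]
</syntaxhighlight>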
 
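A hypothetical sketch of a Self-Harmonized CoT-style pipeline, assuming caller-supplied embed() and llm() helpers: aggregate questions semantically, draft zero-shot chain-of-thought for each, then cross-pollinate drafts within each cluster. The paper's actual procedure differs in detail.
<syntaxhighlight lang="python">
import numpy as np
from sklearn.cluster import KMeans

def harmonize(questions, embed, llm, n_clusters=8):
    """embed(q) -> vector and llm(prompt) -> str are assumed helpers."""
    # 1. Aggregate problems semantically.
    X = np.array([embed(q) for q in questions])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    # 2. Zero-shot chain-of-thought draft for each problem.
    drafts = [llm(q + "\nLet's think step by step.") for q in questions]
    # 3. Cross-pollinate between solutions to similar problems.
    refined = []
    for i, q in enumerate(questions):
        peers = [d for j, d in enumerate(drafts)
                 if labels[j] == labels[i] and j != i][:3]
        refined.append(llm(
            "Here are worked solutions to similar problems:\n"
            + "\n---\n".join(peers)
            + f"\n\nUsing a similar but more general approach, solve:\n{q}"))
    return refined
</syntaxhighlight>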
====CoT reasoning model====
See also: [[AI tools]] > LLM > Open-weights LLM > [[AI_tools#Reasoning|Reasoning]]
* 2024-09: [https://openai.com/o1/ OpenAI o1]
* 2024-10: [https://github.com/GAIR-NLP/O1-Journey/blob/main/resource/report.pdf O1 Replication Journey: A Strategic Progress Report – Part 1] ([https://github.com/GAIR-NLP/O1-Journey code]): Attempt by [https://gair-nlp.github.io/walnut-plan/ Walnut Plan] to reproduce o1-like in-context reasoning
 
* 2025-01: [https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning]
* 2025-01: [https://github.com/MoonshotAI/Kimi-k1.5/blob/main/Kimi_k1.5.pdf Kimi k1.5: Scaling Reinforcement Learning with LLMs]
* 2025-01: [https://arxiv.org/abs/2501.11223 Reasoning Language Models: A Blueprint]
* 2025-01: [https://huggingface.co/blog/open-r1 Open-R1: a fully open reproduction of DeepSeek-R1]
 
===Scaling===

===Chain-of-Thought Reasoning===
* 2021-11: [https://arxiv.org/abs/2110.14168 Training Verifiers to Solve Math Word Problems] (see the best-of-N sketch below)
* 2024-02: [https://arxiv.org/abs/2402.10200 Chain-of-Thought Reasoning Without Prompting]
* 2025-01: [https://arxiv.org/abs/2501.19393 s1: Simple test-time scaling]
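The verifier line of work above amounts to best-of-N reranking: sample many candidate solutions, then return the one a learned verifier scores highest. A minimal sketch, with sample_solution() and verifier_score() as assumed stand-ins for the generator LLM and the trained verifier:
<syntaxhighlight lang="python">
def best_of_n(problem: str, sample_solution, verifier_score, n: int = 64) -> str:
    """Best-of-N reranking sketch.

    sample_solution(problem) -> str draws one candidate from the generator;
    verifier_score(problem, solution) -> float estimates correctness.
    Both are assumed callables supplied by the caller."""
    candidates = [sample_solution(problem) for _ in range(n)]
    # Spend inference-time compute on many samples; let the verifier pick.
    return max(candidates, key=lambda s: verifier_score(problem, s))
</syntaxhighlight>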
 
==Analysis==

==Pragmatics==
 
===Code for Inference-time Compute===
* [https://github.com/codelion/optillm optillm]: Inference proxy that implements state-of-the-art techniques to improve the accuracy and performance of LLMs (improved reasoning on coding, logic, and math queries); see the usage sketch below.
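A minimal usage sketch for optillm, which serves an OpenAI-compatible endpoint and selects a technique via a model-name prefix; the port and the moa- prefix shown here are assumptions to verify against the optillm README.
<syntaxhighlight lang="python">
# Sketch: point the standard OpenAI client at a locally running optillm
# proxy. base_url, port, and the "moa-" (mixture-of-agents) prefix are
# assumptions to check against the optillm documentation.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local optillm proxy
    api_key="sk-...",                     # forwarded to the upstream provider
)

resp = client.chat.completions.create(
    model="moa-gpt-4o-mini",  # technique prefix + underlying model
    messages=[{"role": "user", "content": "Prove that 2^10 > 10^3."}],
)
print(resp.choices[0].message.content)
</syntaxhighlight>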
=Interact with Environment=
* 2025-01: [https://arxiv.org/abs/2501.10893 Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments]
  
 
=Memory=
 
=Multi-agent Effort (and Emergent Intelligence)=
* 2024-10: [https://arxiv.org/abs/2410.19318 Two are better than one: Context window extension with multi-grained self-injection]
* 2024-11: [https://arxiv.org/abs/2411.00114 Project Sid: Many-agent simulations toward AI civilization]
* 2025-01: [https://arxiv.org/abs/2501.13946 Hallucination Mitigation using Agentic AI Natural Language-Based Frameworks]
  
 
=ML-like Optimization of LLM Setup=
 
* 2024-06: [https://arxiv.org/abs/2406.07496 TextGrad: Automatic "Differentiation" via Text] (gradient backpropagation through text; see the sketch below)
* 2024-06: [https://arxiv.org/abs/2406.18532 Symbolic Learning Enables Self-Evolving Agents] (optimize LLM frameworks)
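A toy illustration of the textual-gradient idea (not TextGrad's actual API): one LLM call produces a critique (the "gradient"), a second call applies it (the "optimizer step"), looped. The model name is a placeholder, and the seed function contains a deliberate edge-case bug (0 and 1 reported as prime) for the loop to fix.
<syntaxhighlight lang="python">
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; model below is a placeholder

def llm(prompt: str) -> str:
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content

solution = "def is_prime(n): return all(n % k for k in range(2, n))"
for _ in range(3):
    # "Backward pass": the textual gradient is a natural-language critique.
    gradient = llm("Critique this function for correctness and edge cases "
                   "(e.g. n = 0, 1, 2):\n" + solution)
    # "Optimizer step": move the artifact in the direction of the critique.
    solution = llm("Rewrite the function to address the critique. "
                   "Return only code.\n\nCode:\n" + solution
                   + "\n\nCritique:\n" + gradient)
print(solution)
</syntaxhighlight>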

=Limitations/Requirements=
* Fluid intelligence (c.f. [https://arcprize.org/arc ARC AGI])
* 2024-06: [https://arxiv.org/abs/2406.04268 Open-Endedness is Essential for Artificial Superhuman Intelligence]

==Creativity==
* 2024-09: [https://arxiv.org/abs/2409.04109 Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers]
* 2024-11: [https://openreview.net/pdf?id=fz969ahcvJ AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions] ([https://github.com/aidanmclaughlin/AidanBench code])
* 2024-11: [https://conference.nber.org/conf_papers/f210475.pdf Artificial Intelligence, Scientific Discovery, and Product Innovation]
* 2024-12: [https://arxiv.org/abs/2412.17596 LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal Context]
* 2024-12: [https://arxiv.org/abs/2412.02980 Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models]
  
 
=See Also=
