Difference between revisions of "Increasing AI Intelligence"

From GISAXS
Line 7: Line 7:
 
* 2025-02: [https://arxiv.org/abs/2502.21321 LLM Post-Training: A Deep Dive into Reasoning Large Language Models]

* 2025-03: [https://arxiv.org/abs/2503.24377 Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models]

* 2025-04: [https://arxiv.org/abs/2504.09037 A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems]

* 2025-05: [https://lilianweng.github.io/posts/2025-05-01-thinking/ Why We Think]

* Links to papers: [https://github.com/hijkzzz/Awesome-LLM-Strawberry Awesome LLM Strawberry (OpenAI o1)]
Line 34: Line 36:
 
===Reinforcement Learning===

* 2025-04: DeepSeek: [https://arxiv.org/abs/2504.02495 Inference-Time Scaling for Generalist Reward Modeling]
* 2025-04: [https://arxiv.org/abs/2504.13837 Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?]

* 2025-04: [https://arxiv.org/abs/2504.13941 NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning]

* 2025-04: [https://arxiv.org/abs/2504.16084 TTRL: Test-Time Reinforcement Learning] ([https://github.com/PRIME-RL/TTRL code])

* 2025-04: [https://arxiv.org/abs/2504.20571 Reinforcement Learning for Reasoning in Large Language Models with One Training Example]

* 2025-05: [https://arxiv.org/abs/2505.03335 Absolute Zero: Reinforced Self-play Reasoning with Zero Data]
====Optimize Confidence====

* Cf. 2025-02: [https://arxiv.org/abs/2502.06233 Confidence Improves Self-Consistency in LLMs]

* [https://x.com/xuandongzhao/status/1927270931874910259 2025-05]: [https://arxiv.org/abs/2505.19590 Learning to Reason without External Rewards] ([https://github.com/sunblaze-ucb/Intuitor code]): Reinforcement Learning from Internal Feedback (RLIF)

* [https://x.com/mihirp98/status/1927767453490172277 2025-05]: [https://rent-rl.github.io/ Maximizing Confidence Alone Improves Reasoning] ([https://github.com/satrams/rent-rl code]); a.k.a. RENT: Reinforcement Learning via Entropy Minimization
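The common thread in these works is that the model's own confidence can serve as the training signal, with no external verifier. Below is a minimal sketch (not code from either paper) of such an internal reward: the negative mean token entropy of a sampled answer, which could then be fed into any policy-gradient or GRPO-style update. The function name, and the assumption that per-token logits for the sampled answer are available (e.g. from a Hugging Face causal-LM forward pass), are illustrative.

<syntaxhighlight lang="python">
# Minimal sketch of an entropy-based internal reward (in the spirit of RENT / RLIF).
# Assumes `logits` holds the model's logits at each generated-token position.
import torch
import torch.nn.functional as F

def internal_confidence_reward(logits: torch.Tensor) -> torch.Tensor:
    """logits: (seq_len, vocab_size) for the tokens of one sampled answer."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    token_entropy = -(probs * log_probs).sum(dim=-1)  # (seq_len,) entropy per position
    # Lower entropy = higher confidence = higher reward; no external verifier or labels.
    return -token_entropy.mean()
</syntaxhighlight>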
====Exceed humans, using human-level data====

* 2024-06: [https://arxiv.org/abs/2406.11741v1 Transcendence: Generative Models Can Outperform The Experts That Train Them]

* 2025-03: [https://tecunningham.github.io/posts/2023-09-05-model-of-ai-imitation.html An AI Which Imitates Humans Can Beat Humans]
  
 
===Training Data (Data Refinement, Synthetic Data)===
Line 51: Line 67:
 
* 2023-10: [https://arxiv.org/abs/2310.16656 A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation]

* 2024-07: [https://arxiv.org/abs/2407.06723 Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions]
===Pre-generate material===

* 2020-03: [https://research.google/blog/introducing-dreamer-scalable-reinforcement-learning-using-world-models/ Introducing Dreamer: Scalable Reinforcement Learning Using World Models]

* 2025-03: [https://arxiv.org/abs/2503.18866 Reasoning to Learn from Latent Thoughts]

* 2025-04: [https://arxiv.org/abs/2504.13171 Sleep-time Compute: Beyond Inference Scaling at Test-time]
  
 
===Generate consistent plans/thoughts===
Line 127: Line 148:
 
* 2024-12: [https://arxiv.org/abs/2412.06822 Guidance is All You Need: Temperature-Guided Reasoning in Large Language Models]
  
===Inference-time Gradient/Updating/RL/etc.===
 
* 2024-11: [https://ekinakyurek.github.io/papers/ttt.pdf The Surprising Effectiveness of Test-Time Training for Abstract Reasoning] ([https://github.com/ekinakyurek/marc code])

* 2025-04: [https://arxiv.org/abs/2504.16084 TTRL: Test-Time Reinforcement Learning] ([https://github.com/PRIME-RL/TTRL code])
  
 
===Self-prompting===
Line 166: Line 188:
 
* 2024-11: [https://arxiv.org/abs/2411.02830 Mixtures of In-Context Learners]: Multiple "experts", each with a different set of in-context examples; combine outputs at the level of next-token prediction (a minimal sketch follows this list)

* 2024-11: [https://arxiv.org/abs/2411.10440 LLaVA-o1: Let Vision Language Models Reason Step-by-Step] ([https://github.com/PKU-YuanGroup/LLaVA-o1 code])
* 2025-04: [https://arxiv.org/abs/2504.07081 Self-Steering Language Models]: a Planner generates a program of sub-tasks; Followers accomplish the sub-tasks
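For the Mixtures of In-Context Learners entry above, here is a minimal sketch (an interpretation, not the paper's implementation): the same model is run with several different in-context example sets, and the resulting next-token distributions are mixed before decoding. Uniform or learned weights, and greedy decoding, are simplifying assumptions.

<syntaxhighlight lang="python">
# Minimal sketch: combine per-"expert" next-token distributions (one expert = one
# in-context example set) into a single mixture distribution, then decode greedily.
import torch
import torch.nn.functional as F

def mixture_next_token(expert_logits: list[torch.Tensor], weights: torch.Tensor) -> int:
    """expert_logits: one (vocab_size,) logit vector per expert for the current position.
    weights: (n_experts,) mixture weights (uniform, or learned as in the paper)."""
    probs = torch.stack([F.softmax(l, dim=-1) for l in expert_logits])  # (n_experts, vocab)
    mixed = (weights.unsqueeze(-1) * probs).sum(dim=0)                  # (vocab,)
    return int(mixed.argmax())  # greedy pick from the mixed distribution
</syntaxhighlight>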
  
 
===Iteration (e.g. neural-like layered blocks)===
Line 221: Line 244:
 
* 2025-03: [https://arxiv.org/abs/2503.10061 Compute Optimal Scaling of Skills: Knowledge vs Reasoning]

* 2025-03: [https://arxiv.org/abs/2504.00294 Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead]
* 2025-04: [https://arxiv.org/abs/2504.03635 Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning]: Larger models can improve reasoning, but can also become overparameterized (memorizing rather than reasoning)

* 2025-04: [https://arxiv.org/abs/2504.14047 Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods]: Reasoning models outperform inference-time scaling of non-reasoning models; majority voting consistently helps and is hard to beat
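Since verifier-free majority voting keeps appearing as a strong baseline, here is a minimal sketch; the sample_answer argument stands in for any stochastic LLM call (temperature > 0) that returns a final answer string, and is an assumed placeholder rather than a specific API.

<syntaxhighlight lang="python">
# Minimal sketch of majority voting ("self-consistency") over N sampled answers.
from collections import Counter
from typing import Callable

def majority_vote(question: str, sample_answer: Callable[[str], str], n: int = 16) -> str:
    """sample_answer: any LLM call sampled at temperature > 0, returning a final answer."""
    answers = [sample_answer(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]  # most frequent answer wins
</syntaxhighlight>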
  
 
====(Optimal) Usage of Reasoning Compute====

* 2024-10: [https://arxiv.org/abs/2410.21333 Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse]
 
* 2024-12: [https://arxiv.org/abs/2412.21187 Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs]

* 2025-01: [https://arxiv.org/abs/2501.18585 Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs]
Line 232: Line 257:
 
* 2025-04: [https://arxiv.org/abs/2504.05185 Concise Reasoning via Reinforcement Learning]

* 2025-04: [https://arxiv.org/abs/2504.05419 Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification] (a minimal probe sketch follows this list)

* 2025-04: [https://arxiv.org/abs/2504.15895 Dynamic Early Exit in Reasoning Models]
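For the "Reasoning Models Know When They're Right" entry above, a minimal sketch of the probing idea (an interpretation, not the paper's setup): fit a simple linear probe on hidden states to predict whether the final answer is correct, which could then gate early exit or extra sampling. The choice of layer/position and the use of scikit-learn here are illustrative assumptions.

<syntaxhighlight lang="python">
# Minimal sketch: linear probe on hidden states to predict answer correctness.
# `hidden_states` would come from the reasoning model (e.g. the hidden vector at the
# last token of each rollout); `correct` are 0/1 labels from a scored dev set.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_correctness_probe(hidden_states: np.ndarray, correct: np.ndarray) -> LogisticRegression:
    """hidden_states: (n_rollouts, d_model); correct: (n_rollouts,) binary labels."""
    probe = LogisticRegression(max_iter=1000)
    probe.fit(hidden_states, correct)
    return probe

def predicted_confidence(probe: LogisticRegression, h: np.ndarray) -> float:
    """Probability that a rollout with final hidden state h is correct."""
    return float(probe.predict_proba(h.reshape(1, -1))[0, 1])
</syntaxhighlight>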
  
 
====Usage of Training Data====

Revision as of 17:05, 28 May 2025


Reviews

World Model

Prompt Engineering

Thought Templates

Automatic Prompt Optimization

Fine Tuning

Proactive Search

Compute expended after training, but before inference.

Reinforcement Learning

Optimize Confidence

Exceed humans, using human-level data

Training Data (Data Refinement, Synthetic Data)

Re-captioning

Pre-generate material

Generate consistent plans/thoughts

* 2024-08: Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers (code)
** (Microsoft) rStar is a self-play mutual reasoning approach: a small model augments MCTS with a set of defined reasoning actions (heuristics), and mutually consistent reasoning trajectories are emphasized.
* 2024-09: Self-Harmonized Chain of Thought
** Produces refined chain-of-thought style solutions/prompts for diverse problems. Given a large set of problems/questions, they are first aggregated semantically; zero-shot chain-of-thought is then applied to each problem, and proposed solutions to similar problems are cross-pollinated to obtain refined, generalized solutions (a sketch follows this list).
* 2024-11: LLMs Do Not Think Step-by-step In Implicit Reasoning
** They argue that models trained to reproduce CoT outputs do not, internally, perform stepwise reasoning (with intermediate representations); this suggests that explicit CoT could be superior to implicit CoT.
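For the Self-Harmonized Chain of Thought entry above, a minimal sketch of that pipeline (an interpretation, not the paper's code): embed and cluster the questions, produce a zero-shot CoT for each, then refine each solution using CoTs from the same cluster as in-context examples. The embed and call_llm arguments are assumed wrappers around an embedding model and an LLM.

<syntaxhighlight lang="python">
# Minimal sketch of a self-harmonized CoT loop: cluster questions semantically,
# generate zero-shot CoT, then cross-pollinate solutions within each cluster.
from typing import Callable, Sequence
import numpy as np
from sklearn.cluster import KMeans

def self_harmonized_cot(questions: Sequence[str],
                        embed: Callable[[str], np.ndarray],
                        call_llm: Callable[[str], str],
                        n_clusters: int = 8) -> list[str]:
    X = np.stack([embed(q) for q in questions])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)  # semantic clustering
    cots = [call_llm(q + "\nLet's think step by step.") for q in questions]  # zero-shot CoT
    refined = []
    for i, q in enumerate(questions):
        peers = [cots[j] for j in range(len(questions)) if labels[j] == labels[i] and j != i][:3]
        prompt = ("Here are worked solutions to similar problems:\n" + "\n---\n".join(peers)
                  + f"\n\nNow solve this problem step by step:\n{q}")
        refined.append(call_llm(prompt))
    return refined
</syntaxhighlight>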

Sampling

Automated prompt generation

Distill inference-time-compute into model

CoT reasoning model

See also: AI tools > LLM > Open-weights LLM > Reasoning

Scaling

Inference Time Compute

Methods

Review

In context learning (ICL), search, and other inference-time methods

Inference-time Sampling

Inference-time Gradient/Updating/RL/etc.

Self-prompting

Retrieval or Memory

In-context thought

Naive multi-LLM (verification, self-critique, majority voting, best-of-N, etc.)

Multi-LLM (multiple comparisons, branching, etc.)

Iteration (e.g. neural-like layered blocks)

Iterative reasoning via graphs

Monte Carlo Tree Search (MCTS)

Other Search

Chain-of-Thought Reasoning

Model Merging

Meta-methods

Analysis

Scaling

(Optimal) Usage of Reasoning Compute

Usage of Training Data

* 2025-02: LIMO: Less is More for Reasoning (surprisingly easy generalization from very few reasoning training examples; a model can go from knowledge retrieval to diverse reasoning using curated examples)

Theory

Expending compute works

[[File:Compute.png]]

Pragmatics

Code for Inference-time Compute

* optillm: Inference proxy which implements state-of-the-art techniques to improve the accuracy and performance of LLMs (improving reasoning for coding, logical, and mathematical queries)
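A hedged usage sketch: optillm exposes an OpenAI-compatible endpoint, so the standard OpenAI Python client can be pointed at it. The localhost URL, port, and the convention of selecting a technique via a model-name prefix are assumptions to verify against the optillm README.

<syntaxhighlight lang="python">
# Assumed-usage sketch: route a query through a locally running optillm proxy.
from openai import OpenAI

# base_url/port and the "moa-" technique prefix are assumptions; see the optillm docs.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="optillm")
resp = client.chat.completions.create(
    model="moa-gpt-4o-mini",  # assumed: <technique>-<underlying model>
    messages=[{"role": "user", "content": "Is 9.11 greater than 9.9? Explain step by step."}],
)
print(resp.choices[0].message.content)
</syntaxhighlight>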

Interact with Environment

Memory

Tool Use

Integrated

Multi-agent Effort (and Emergent Intelligence)

ML-like Optimization of LLM Setup

Limitations/Requirements

Creativity

See: AI creativity

See Also