Latest revision as of 11:44, 22 January 2026

1 Reviews
- 1.1 World Model
2 Prompt Engineering
- 2.1 Thought Templates
- 2.2 Automatic Prompt Optimization
3 Fine Tuning
4 Proactive Search
5 Inference Time Compute
6 Interact with Environment (Experiential Learning)
7 Memory
8 Tool Use
- 8.1 Integrated
9 Multi-agent Effort (and Emergent Intelligence)
- 9.1 Competition
10 ML-like Optimization of LLM Setup
11 Self-modification
12 Limitations/Requirements
- 12.1 Creativity
13 See Also

Reviews

2024-12: A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges
2025-01: Test-time Computing: from System-1 Thinking to System-2 Thinking (github list of papers)
2025-01: Reasoning Language Models: A Blueprint
2025-02: Advancing Reasoning in Large Language Models: Promising Methods and Approaches
2025-02: Logical Reasoning in Large Language Models: A Survey
2025-02: LLM Post-Training: A Deep Dive into Reasoning Large Language Models
2025-03: Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models
2025-04: A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems
2025-05: Why We Think
Links to papers: Awesome LLM Strawberry (OpenAI o1)

World Model

2025-03: Simulating the Real World: A Unified Survey of Multimodal Generative Models

Prompt Engineering

2024-11: LLMs as Method Actors: A Model for Prompt Engineering and Architecture

Thought Templates

Automatic Prompt Optimization

Fine Tuning

Proactive Search

Compute expended after training, but before inference.

Re-captioning

Pre-generate material

2020-03: Introducing Dreamer: Scalable Reinforcement Learning Using World Models
2025-03: Reasoning to Learn from Latent Thoughts
2025-04: Sleep-time Compute: Beyond Inference Scaling at Test-time

Generate consistent plans/thoughts

2024-08: Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers (code)
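- (Microsoft) rStar is a self-play mutual reasoning approach. A small model augments MCTS with a set of predefined, human-like reasoning actions; a second small model then checks the resulting trajectories. Trajectories on which both models agree (mutually consistent) are favored.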
2024-09: Self-Harmonized Chain of Thought
- Produce refined chain-of-thought style solutions/prompts for diverse problems. Given a large set of problems/questions, first group them semantically, then apply zero-shot chain-of-thought to each problem. Then cross-pollinate between the proposed solutions to similar problems, looking for refined and generalized solutions.
2024-11: LLMs Do Not Think Step-by-step In Implicit Reasoning
- They argue that models trained to reproduce CoT outputs do not, internally, perform stepwise reasoning (with intermediate representations); this suggests that explicit CoT could be superior to implicit CoT.

Sampling

2024-11: Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding (code)

Automated prompt generation

2024-09: Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts

Distill inference-time-compute into model

2023-10: Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning (U. Maryland, Adobe)
2023-11: Implicit Chain of Thought Reasoning via Knowledge Distillation (Harvard, Microsoft, Hopkins)
2024-02: Grandmaster-Level Chess Without Search (Google DeepMind)
2024-07: Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models
2024-07: Distilling System 2 into System 1
2024-07: BOND: Aligning LLMs with Best-of-N Distillation
2024-09: Training Language Models to Self-Correct via Reinforcement Learning (Google DeepMind)
2024-10: Thinking LLMs: General Instruction Following with Thought Generation
2024-10: Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces
2024-12: Training Large Language Models to Reason in a Continuous Latent Space

CoT reasoning model

See also: AI tools > LLM > Open-weights LLM > Reasoning

2024-09: OpenAI o1
2024-10: O1 Replication Journey: A Strategic Progress Report – Part 1 (code): Attempt by Walnut Plan to reproduce o1-like in-context reasoning
2024-11: DeepSeek-R1-Lite-Preview reasoning model
2024-11: Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
2024-11: O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?
2024-11: Tulu 3: Pushing Frontiers in Open Language Model Post-Training
2024-12: o1-Coder: an o1 Replication for Coding (code)
2024-12: Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
2024-12: Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
2025-01: Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
2025-01: O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning
2025-01: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
2025-01: Kimi k1.5: Scaling Reinforcement Learning with LLMs
2025-01: Reasoning Language Models: A Blueprint
2025-01: Open-R1: a fully open reproduction of DeepSeek-R1
2025-02: Demystifying Long Chain-of-Thought Reasoning in LLMs
2025-02: Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach (Huginn-0125)
2025-02: Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners

Scaling

2024-08: Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling (Google DeepMind)
2024-11: Scaling Laws for Pre-training Agents and World Models
2025-02: Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
2025-03: Compute Optimal Scaling of Skills: Knowledge vs Reasoning
2025-10: The Art of Scaling Reinforcement Learning Compute for LLMs

Inference Time Compute

Methods

2024-03: Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
2024-11: Reverse Thinking Makes LLMs Stronger Reasoners
2024-12: Training Large Language Models to Reason in a Continuous Latent Space (Chain of Continuous Thought, COCONUT)

Review

In context learning (ICL), search, and other inference-time methods

2023-03: Reflexion: Language Agents with Verbal Reinforcement Learning
2023-05: VOYAGER: An Open-Ended Embodied Agent with Large Language Models
2024-04: Many-Shot In-Context Learning
2024-08: Automated Design of Agentic Systems
2024-09: Planning In Natural Language Improves LLM Search For Code Generation

Inference-time Sampling

Inference-time Gradient/Updating/RL/etc.

2024-11: The Surprising Effectiveness of Test-Time Training for Abstract Reasoning (code)
2025-04: TTRL: Test-Time Reinforcement Learning (code)

Self-prompting

Retrieval or Memory

2024-12: Meta-Reflection: A Feedback-Free Reflection Learning Framework

In-context thought

2022-01: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Google Brain)
2023-05: Tree of Thoughts: Deliberate Problem Solving with Large Language Models (Google DeepMind)
2024-05: Faithful Logical Reasoning via Symbolic Chain-of-Thought
2024-06: A Tree-of-Thoughts to Broaden Multi-step Reasoning across Languages
2024-09: To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
2024-09: Iteration of Thought: Leveraging Inner Dialogue for Autonomous Large Language Model Reasoning (Agnostiq, Toronto)
2024-09: Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models
2024-10: A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration (failed reasoning traces can improve CoT)
2024-10: Tree of Problems: Improving structured problem solving with compositionality
2023-01/2024-10: A Survey on In-context Learning
2025-01: Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
2025-03: Interruption is All You Need: Improving Reasoning Model Refusal Rates through measuring Parallel Reasoning Diversity: A novel approach to reducing hallucinations in large language models through parallel reasoning and diversity measurement

Naive multi-LLM (verification, self-critique, majority voting, best-of-N, etc.)

2023-06: LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion (code)
2023-12: Dynamic Voting for Efficient Reasoning in Large Language Models
2024-04: Regularized Best-of-N Sampling to Mitigate Reward Hacking for Language Model Alignment
2024-08: Dynamic Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling
2024-11: Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models
2024-12: llm-consortium: Multiple LLMs collaboratively solve problems through structured dialogue, evaluation and arbitration
2025-03: Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification
2025-02: When One LLM Drools, Multi-LLM Collaboration Rules
2025-03: Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique
2025-12: Enhancing LLM Planning Capabilities through Intrinsic Self-Critique
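
The two simplest aggregation schemes named in this heading, majority voting and best-of-N, can be sketched in a few lines. This is a minimal illustration, not the method of any paper listed above: the sampled answers are hard-coded stand-ins for LLM outputs, and the `score` function is a hypothetical placeholder for a verifier or reward model.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(answers)
    top = max(counts.values())
    # Counter keeps insertion order, so the first-seen answer among the
    # most frequent ones is returned.
    return next(a for a, c in counts.items() if c == top)


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate with the highest score (first one on ties)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


# Hypothetical final answers extracted from five sampled completions.
samples = ["42", "41", "42", "7", "42"]


def toy_score(answer: str) -> float:
    """Placeholder verifier (assumption): prefers answers close to 40."""
    return -abs(int(answer) - 40)


print(majority_vote(samples))         # 42
print(best_of_n(samples, toy_score))  # 41
```

Majority voting needs only comparable final answers, whereas best-of-N is only as good as its scorer; the toy scorer here picks a different answer than the vote does, which is the failure mode an imperfect verifier introduces.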

Multi-LLM (multiple comparisons, branching, etc.)

2024-10: Thinking LLMs: General Instruction Following with Thought Generation
2024-11: Mixtures of In-Context Learners: Multiple "experts", each with a different set of in-context examples; combine outputs at the level of next-token-prediction
2024-11: LLaVA-o1: Let Vision Language Models Reason Step-by-Step (code)
2025-04: Self-Steering Language Models: Planner generates program, Followers accomplish sub-tasks
2025-09: BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design
2025-09: MTQA: Matrix of Thought for Enhanced Reasoning in Complex Question Answering
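
Combining several in-context "experts" at the level of next-token prediction, as described for Mixtures of In-Context Learners above, can be illustrated with a small sketch. The combination rule used here (a weighted sum of log-probabilities, renormalized) and the hand-picked weights are assumptions for illustration; the paper's own weighting scheme may differ, and the distributions are made-up numbers rather than model outputs.

```python
import math
from typing import Dict, Sequence


def mix_next_token(
    expert_probs: Sequence[Dict[str, float]],
    weights: Sequence[float],
) -> Dict[str, float]:
    """Combine per-expert next-token distributions into one distribution.

    Each expert is assumed to have seen a different set of in-context
    examples. Log-probabilities are summed with the given weights and
    then renormalized with a softmax.
    """
    if not expert_probs or len(expert_probs) != len(weights):
        raise ValueError("need one weight per expert")
    vocab = sorted(set().union(*expert_probs))
    eps = 1e-12  # avoids log(0) for tokens an expert assigns no mass to
    logits = {
        tok: sum(w * math.log(p.get(tok, 0.0) + eps)
                 for p, w in zip(expert_probs, weights))
        for tok in vocab
    }
    peak = max(logits.values())
    unnorm = {tok: math.exp(v - peak) for tok, v in logits.items()}
    total = sum(unnorm.values())
    return {tok: u / total for tok, u in unnorm.items()}


# Made-up next-token distributions from three hypothetical experts.
experts = [
    {"yes": 0.7, "no": 0.3},
    {"yes": 0.4, "no": 0.6},
    {"yes": 0.8, "no": 0.2},
]
mixed = mix_next_token(experts, weights=[1 / 3, 1 / 3, 1 / 3])
print(max(mixed, key=mixed.get))  # yes
```

Setting one weight to 1 and the rest to 0 recovers that expert's own distribution, so the weights act as a soft selection over demonstration sets.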

Iteration (e.g. neural-like layered blocks)

2024-06: Mixture-of-Agents Enhances Large Language Model Capabilities

Iterative reasoning via graphs

2023-08: Graph of Thoughts: Solving Elaborate Problems with Large Language Models
2023-10: Amortizing intractable inference in large language models (code)
2024-09: On the Diagram of Thought: Iterative reasoning as a directed acyclic graph (DAG)

Monte Carlo Tree Search (MCTS)

Pathfinding

Other Search

2024-11: Scattered Forest Search: Smarter Code Space Exploration with LLMs

Chain-of-Thought Reasoning

Inner Monologue

Model Merging

Meta-methods

2025-02: Atom of Thoughts for Markov LLM Test-Time Scaling (code)

Analysis

Scaling

2021-04: Scaling Scaling Laws with Board Games
2024-03: Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems
2024-04: The Larger the Better? Improved LLM Code-Generation via Budget Reallocation
2024-07: Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
2024-08: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models
2024-08: Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
2024-10: (comparing fine-tuning to in-context learning) Is In-Context Learning Sufficient for Instruction Following in LLMs?
2024-11: Inference Scaling FLaws: The Limits of LLM Resampling with Imperfect Verifiers
2025-02: Distillation Scaling Laws
2025-03: Compute Optimal Scaling of Skills: Knowledge vs Reasoning
2025-03: Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead
2025-04: Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning: Model size can improve things, but can also lead to overparametrization (memorization instead of reasoning)
2025-04: Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods: Reasoning models outperform non-reasoning models even when the latter are given extra inference-time compute; majority voting always helps, and is hard to beat
2025-12: Towards a Science of Scaling Agent Systems (Google DeepMind)
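
Several entries in this list study repeated sampling. A simple back-of-the-envelope model helps when reading them: if each sample is independently correct with probability p, the chance that at least one of k samples is correct (coverage) is 1 - (1 - p)^k. The sketch below assumes independent samples and a perfect verifier that can pick out a correct sample; neither assumption is taken from the papers, and the entry on imperfect verifiers above concerns exactly what happens when the second one fails.

```python
def coverage(p: float, k: int) -> float:
    """Probability that at least one of k independent samples is correct."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1.0 - (1.0 - p) ** k


def samples_needed(p: float, target: float) -> int:
    """Smallest k such that coverage(p, k) >= target."""
    if not 0.0 < p <= 1.0 or not 0.0 <= target < 1.0:
        raise ValueError("need 0 < p <= 1 and 0 <= target < 1")
    k = 0
    while coverage(p, k) < target:
        k += 1
    return k


print(coverage(0.5, 2))          # 0.75
print(samples_needed(0.1, 0.9))  # 22
```

Under these assumptions a model that solves a problem only 10% of the time per sample still reaches 90% coverage with 22 samples, which is why the quality of the selection step matters so much in practice.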

(Optimal) Usage of Reasoning Compute

Usage of Training Data

2025-02: LIMO: Less is More for Reasoning (surprisingly easy generalization, from very few reasoning training examples; model can go from knowledge-retrieval to diverse reasoning using curated examples)

Theory

2024-02: Chain of Thought Empowers Transformers to Solve Inherently Serial Problems

Expending compute works

2024-06-10: Blog post (opinion): AI Search: The Bitter-er Lesson
2024-07-17: Blog post (test): Getting 50% (SoTA) on ARC-AGI with GPT-4o
2024-09-12: OpenAI o1: Learning to Reason with LLMs

2024-09-16: Scaling: The State of Play in AI
2025-02-03: Competitive Programming with Large Reasoning Models
2025-10: Scaling Test-Time Compute to Achieve IOI Gold Medal with Open-Weight Models

Pragmatics

Code for Inference-time Compute

optillm: Inference proxy which implements state-of-the-art techniques to improve accuracy and performance of LLMs (improve reasoning over coding, logical and mathematical queries)

Interact with Environment (Experiential Learning)

Memory

2024-10: Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation

Tool Use

2024-11: DynaSaur: Large Language Agents Beyond Predefined Actions: writes functions/code to increase capabilities

Integrated

2018-08: Neural Arithmetic Logic Units
2023-01: Tracr: Compiled Transformers as a Laboratory for Interpretability (code)
2024-05: Augmenting Language Models with Composable Differentiable Libraries (pdf)
2024-07: Algorithmic Language Models with Neurally Compiled Libraries
2024-10: ALTA: Compiler-Based Analysis of Transformers

Multi-agent Effort (and Emergent Intelligence)

Competition

2025-06: SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat

ML-like Optimization of LLM Setup

2023-03: DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines (code: Programming—not prompting—Foundation Models)
2024-05: Automatic Prompt Optimization with "Gradient Descent" and Beam Search
2024-06: TextGrad: Automatic "Differentiation" via Text (gradient backpropagation through text, analogous to gradient descent)
2024-06: Symbolic Learning Enables Self-Evolving Agents (optimize LLM frameworks)
2025-03: Optimizing generative AI by backpropagating language model feedback
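
The beam-search style of prompt optimization mentioned in this list can be sketched generically: propose edited prompts, score them, and keep the best few. This is a toy under stated assumptions, not the algorithm of any listed paper. In a real system `propose` would be an LLM call that rewrites the prompt from textual feedback and `score` would measure accuracy on a development set; here both are hypothetical stand-ins so the loop runs on its own.

```python
from typing import Callable, List


def beam_search_prompts(
    seed: str,
    propose: Callable[[str], List[str]],
    score: Callable[[str], float],
    beam_width: int = 2,
    steps: int = 3,
) -> str:
    """Iteratively expand and prune a beam of candidate prompts."""
    beam = [seed]
    for _ in range(steps):
        candidates = set(beam)  # current prompts may survive unchanged
        for prompt in beam:
            candidates.update(propose(prompt))
        # Highest score first; ties broken alphabetically for determinism.
        beam = sorted(candidates, key=lambda p: (-score(p), p))[:beam_width]
    return beam[0]


# Hypothetical edit operations standing in for LLM-generated rewrites.
EDITS = [" Think step by step.", " Show your work.", " Answer briefly."]


def propose(prompt: str) -> List[str]:
    """Append each edit that the prompt does not already contain."""
    return [prompt + edit for edit in EDITS if edit not in prompt]


def score(prompt: str) -> float:
    """Toy stand-in for dev-set accuracy (assumption)."""
    return (
        ("step by step" in prompt)
        + ("Show your work" in prompt)
        - ("briefly" in prompt)
    )


best = beam_search_prompts("Solve the problem.", propose, score)
print(best)
```

Keeping the current beam among the candidates means the search never trades a good prompt for a worse one, at the cost of possibly stalling once no proposed edit improves the score.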

Self-modification

2025-06: Self-Adapting Language Models

Limitations/Requirements

Fluid intelligence (cf. ARC AGI)
2024-06: Open-Endedness is Essential for Artificial Superhuman Intelligence

Creativity

See: AI creativity

@@ Line 1: / Line 1: @@
 =Reviews=
 * 2024-12: [https://arxiv.org/abs/2412.11936 A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges]
+* 2025-01: [https://arxiv.org/abs/2501.02497 Test-time Computing: from System-1 Thinking to System-2 Thinking] ([https://github.com/Dereck0602/Awesome_Test_Time_LLMs github list of papers])
+* 2025-01: [https://arxiv.org/abs/2501.11223 Reasoning Language Models: A Blueprint]
+* 2025-02: [https://arxiv.org/abs/2502.03671 Advancing Reasoning in Large Language Models: Promising Methods and Approaches]
+* 2025-02: [https://arxiv.org/abs/2502.09100 Logical Reasoning in Large Language Models: A Survey]
+* 2025-02: [https://arxiv.org/abs/2502.21321 LLM Post-Training: A Deep Dive into Reasoning Large Language Models]
+* 2025-03: [https://arxiv.org/abs/2503.24377 Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models]
+* 2025-04: [https://arxiv.org/abs/2504.09037 A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems]
+* 2025-05: [https://lilianweng.github.io/posts/2025-05-01-thinking/ Why We Think]
 * Links to papers: [https://github.com/hijkzzz/Awesome-LLM-Strawberry Awesome LLM Strawberry (OpenAI o1)]
+===World Model===
+* 2025-03: [https://arxiv.org/abs/2503.04641 Simulating the Real World: A Unified Survey of Multimodal Generative Models]
 =Prompt Engineering=
 * 2024-11: [https://arxiv.org/abs/2411.05778 LLMs as Method Actors: A Model for Prompt Engineering and Architecture]
+==Thought Templates==
+* 2024-06: [https://arxiv.org/abs/2406.04271 Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models]
+* 2025-02: [https://arxiv.org/abs/2502.06772 ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates]
+==Automatic Prompt Optimization==
+* 2023-09: [https://arxiv.org/abs/2309.16797 Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution]
+* 2025-02: [https://arxiv.org/abs/2502.16923 A Systematic Survey of Automatic Prompt Optimization Techniques]
+* 2025-02: [https://arxiv.org/abs/2502.18746 Automatic Prompt Optimization via Heuristic Search: A Survey]
 =Fine Tuning=
 * 2024-12: [https://arxiv.org/abs/2412.15287 Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models]
+* 2025-01: [https://arxiv.org/abs/2501.01702 AgentRefine: Enhancing Agent Generalization through Refinement Tuning]
+* 2025-01: [https://llm-multiagent-ft.github.io/ Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains] ([https://llm-multiagent-ft.github.io/ preprint], [https://github.com/vsubramaniam851/multiagent-ft/tree/main code])
 =Proactive Search=
 Compute expended after training, but before inference.
+===Reinforcement Learning===
+* 2025-04: DeepSeek: [https://arxiv.org/abs/2504.02495 Inference-Time Scaling for Generalist Reward Modeling]
+* 2025-04: [https://arxiv.org/abs/2504.13837 Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?]
+* 2025-04: [https://arxiv.org/abs/2504.13941 NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning]
+* 2025-04: [https://arxiv.org/abs/2504.16084 TTRL: Test-Time Reinforcement Learning] ([https://github.com/PRIME-RL/TTRL code])
+* 2025-04: [https://arxiv.org/abs/2504.20571 Reinforcement Learning for Reasoning in Large Language Models with One Training Example]
+* 2025-05: [https://arxiv.org/abs/2505.03335 Absolute Zero: Reinforced Self-play Reasoning with Zero Data]
+* 2025-09: [https://www.nature.com/articles/s41586-025-09422-z DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning]
+* 2025-09: [https://github.com/NVlabs/RLP/blob/main/pdf/RLP_Reinforcement_as_a_Pretraining_Objective.pdf RLP: Reinforcement Learning Pre‑training] (Nvidia)
+* 2025-10: [https://arxiv.org/abs/2510.13786 The Art of Scaling Reinforcement Learning Compute for LLMs]
+====Optimize Confidence/Entropy====
+* Cf. 2025-02: [https://arxiv.org/abs/2502.06233 Confidence Improves Self-Consistency in LLMs]
+* [https://x.com/xuandongzhao/status/1927270931874910259 2025-05]: [https://arxiv.org/abs/2505.19590 Learning to Reason without External Rewards] ([https://github.com/sunblaze-ucb/Intuitor code]): Reinforcement Learning from Internal Feedback, RLIF
+* [https://x.com/mihirp98/status/1927767453490172277 2025-05]: [https://rent-rl.github.io/ Maximizing Confidence Alone Improves Reasoning] ([https://github.com/satrams/rent-rl code]); a.k.a. RENT: Reinforcement Learning via Entropy Minimization.
+* 2025-05: [https://arxiv.org/abs/2505.22617 The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models]
+* 2025-06: [https://arxiv.org/abs/2506.01347 The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning]
+====Exceed humans, using human-level data====
+* 2024-06: [https://arxiv.org/abs/2406.11741v1 Transcendence: Generative Models Can Outperform The Experts That Train Them]
+* 2025-03: [https://tecunningham.github.io/posts/2023-09-05-model-of-ai-imitation.html An AI Which Imitates Humans Can Beat Humans]
+* 2025-08: [https://arxiv.org/abs/2508.17669 A Taxonomy of Transcendence]
+====Self-play====
+* 2025-09: [https://arxiv.org/abs/2509.07414 Language Self-Play For Data-Free Training]
 ===Training Data (Data Refinement, Synthetic Data)===
@@ Line 18: / Line 66: @@
 * 2024-09: [https://arxiv.org/abs/2409.17115 Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale]
 * 2024-10: [https://arxiv.org/abs/2410.15547 Data Cleaning Using Large Language Models]
+* 2025-01: [https://arxiv.org/abs/2501.18845 Text Data Augmentation for Large Language Models: A Comprehensive Survey of Methods, Challenges, and Opportunities]
+* 2025-02: [https://arxiv.org/abs/2502.01718 ACECODER: Acing Coder RL via Automated Test-Case Synthesis]
+* 2025-02: [https://arxiv.org/abs/2502.15588 Improving the Scaling Laws of Synthetic Data with Deliberate Practice]
+* 2025-03: [https://arxiv.org/abs/2503.19551 Scaling Laws of Synthetic Data for Language Models]
+* 2025-03: [https://arxiv.org/abs/2503.18866 Reasoning to Learn from Latent Thoughts]: infer the (latent) thoughts that would have led to training documents, so that you can pretrain on text+thoughts
 * Updating list of links: [https://github.com/wasiahmad/Awesome-LLM-Synthetic-Data Synthetic Data of LLMs, by LLMs, for LLMs]
+====Re-captioning====
+* 2023-10: [https://arxiv.org/abs/2310.16656 A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation]
+* 2024-07: [https://arxiv.org/abs/2407.06723 Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions]
+===Pre-generate material===
+* 2020-03: [https://research.google/blog/introducing-dreamer-scalable-reinforcement-learning-using-world-models/ Introducing Dreamer: Scalable Reinforcement Learning Using World Models]
+* 2025-03: [https://arxiv.org/abs/2503.18866 Reasoning to Learn from Latent Thoughts]
+* 2025-04: [https://arxiv.org/abs/2504.13171 Sleep-time Compute: Beyond Inference Scaling at Test-time]
 ===Generate consistent plans/thoughts===
@@ Line 47: / Line 109: @@
 ====CoT reasoning model====
+See also: [[AI tools]] > LLM > Open-weights LLM > [[AI_tools#Reasoning|Reasoning]]
 * 2024-09: [https://openai.com/o1/ OpenAI o1]
 * 2024-10: [https://github.com/GAIR-NLP/O1-Journey/blob/main/resource/report.pdf O1 Replication Journey: A Strategic Progress Report – Part 1] ([https://github.com/GAIR-NLP/O1-Journey code]): Attempt by [https://gair-nlp.github.io/walnut-plan/ Walnut Plan] to reproduce o1-like in-context reasoning
@@ Line 52: / Line 115: @@
 * 2024-11: [https://arxiv.org/abs/2411.14405 Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions]
 * 2024-11: [https://huggingface.co/papers/2411.16489 O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?]
+* 2024-11: [https://arxiv.org/abs/2411.15124 Tulu 3: Pushing Frontiers in Open Language Model Post-Training]
 * 2024-12: [https://arxiv.org/abs/2412.00154 o1-Coder: an o1 Replication for Coding] ([https://github.com/ADaM-BJTU/O1-CODER code])
 * 2024-12: [https://arxiv.org/abs/2412.18319 Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search]
 * 2024-12: [https://arxiv.org/abs/2412.14135 Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective]
 * 2025-01: [https://arxiv.org/abs/2501.01904 Virgo: A Preliminary Exploration on Reproducing o1-like MLLM]
+* 2025-01: [https://arxiv.org/abs/2501.06458 O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning]
+* 2025-01: [https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning]
+* 2025-01: [https://github.com/MoonshotAI/Kimi-k1.5/blob/main/Kimi_k1.5.pdf Kimi k1.5: Scaling Reinforcement Learning with LLMs]
+* 2025-01: [https://arxiv.org/abs/2501.11223 Reasoning Language Models: A Blueprint]
+* 2025-01: [https://huggingface.co/blog/open-r1 Open-R1: a fully open reproduction of DeepSeek-R1]
+* 2025-02: [https://arxiv.org/abs/2502.03373 Demystifying Long Chain-of-Thought Reasoning in LLMs]
+* 2025-02: [https://arxiv.org/abs/2502.05171 Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach] ([https://huggingface.co/tomg-group-umd/huginn-0125 Huginn-0125])
+* 2025-02: [https://arxiv.org/pdf/2502.20339 Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners]
 ===Scaling===
 * 2024-08: [https://arxiv.org/abs/2408.16737 Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling] (Google DeepMind)
 * 2024-11: [https://arxiv.org/abs/2411.04434 Scaling Laws for Pre-training Agents and World Models]
+* 2025-02: [https://arxiv.org/pdf/2502.20339 Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners]
+* 2025-03: [https://arxiv.org/abs/2503.10061 Compute Optimal Scaling of Skills: Knowledge vs Reasoning]
+* 2025-10: [https://arxiv.org/abs/2510.13786 The Art of Scaling Reinforcement Learning Compute for LLMs]
 =Inference Time Compute=
@@ Line 68: / Line 143: @@
 '''Review'''
 * 2024-06: [https://arxiv.org/abs/2406.16838 From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models]
+* 2025-01: [https://arxiv.org/abs/2501.02497 Test-time Computing: from System-1 Thinking to System-2 Thinking] ([https://github.com/Dereck0602/Awesome_Test_Time_LLMs github list of papers])
 ===In context learning (ICL), search, and other inference-time methods===
@@ Line 81: / Line 157: @@
 * 2024-11: [https://openreview.net/forum?id=FBkpCyujtS Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs]
 * 2024-12: [https://arxiv.org/abs/2412.06822 Guidance is All You Need: Temperature-Guided Reasoning in Large Language Models]
+* 2025-08: [https://arxiv.org/abs/2508.15260 Deep Think with Confidence] ([https://jiaweizzhao.github.io/deepconf/ project])
+* 2025-10: [https://arxiv.org/abs/2510.14901 Reasoning with Sampling: Your Base Model is Smarter Than You Think]
-===Inference-time Gradient===
+===Inference-time Gradient/Updating/RL/etc.===
 * 2024-11: [https://ekinakyurek.github.io/papers/ttt.pdf The Surprising Effectiveness of Test-Time Training for Abstract Reasoning] ([https://github.com/ekinakyurek/marc code])
+* 2025-04: [https://arxiv.org/abs/2504.16084 TTRL: Test-Time Reinforcement Learning] ([https://github.com/PRIME-RL/TTRL code])
 ===Self-prompting===
@@ Line 103: / Line 182: @@
 * 2024-10: [https://arxiv.org/abs/2410.06634 Tree of Problems: Improving structured problem solving with compositionality]
 * 2023-01/2024-10: [https://arxiv.org/abs/2301.00234 A Survey on In-context Learning]
+* 2025-01: [https://arxiv.org/abs/2501.04682 Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought]
+* [https://x.com/dav1d_bai/status/1904057766593138841 2025-03]: [https://optimal-test-time.vercel.app/papers/accuracy-efficiency-tradeoffs Interruption is All You Need: Improving Reasoning Model Refusal Rates through measuring Parallel Reasoning Diversity]: A novel approach to reducing hallucinations in large language models through parallel reasoning and diversity measurement
-===Naive multi-LLM (verification, majority voting, best-of-N, etc.)===
+===Naive multi-LLM (verification, self-critique, majority voting, best-of-N, etc.)===
 * 2023-06: [https://arxiv.org/abs/2306.02561 LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion] ([https://github.com/yuchenlin/LLM-Blender?tab=readme-ov-file code])
 * 2023-12: [https://aclanthology.org/2023.findings-emnlp.203/ Dynamic Voting for Efficient Reasoning in Large Language Models]
@@ Line 111: / Line 192: @@
 * 2024-11: [https://arxiv.org/abs/2411.00492 Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models]
 * 2024-12: [https://github.com/irthomasthomas/llm-consortium llm-consortium]: Multiple LLMs collaboratively solve problems through structured dialogue, evaluation and arbitration
+* 2025-03: [https://arxiv.org/abs/2502.01839 Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification]
+* 2025-02: [https://arxiv.org/abs/2502.04506 When One LLM Drools, Multi-LLM Collaboration Rules]
+* 2025-03: [https://arxiv.org/abs/2503.17363 Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique]
+* 2025-12: [https://arxiv.org/abs/2512.24103 Enhancing LLM Planning Capabilities through Intrinsic Self-Critique]
 ===Multi-LLM (multiple comparisons, branching, etc.)===
@@ Line 116: / Line 201: @@
 * 2024-11: [https://arxiv.org/abs/2411.02830 Mixtures of In-Context Learners]: Multiple "experts", each with a different set of in-context examples; combine outputs at the level of next-token-prediction
 * 2024-11: [https://arxiv.org/abs/2411.10440 LLaVA-o1: Let Vision Language Models Reason Step-by-Step] ([https://github.com/PKU-YuanGroup/LLaVA-o1 code])
+* 2025-04: [https://arxiv.org/abs/2504.07081 Self-Steering Language Models]: Planner generates program, Followers accomplish sub-tasks
+* 2025-09: [https://arxiv.org/abs/2508.21184 BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design]
+* 2025-09: [https://arxiv.org/abs/2509.03918 MTQA: Matrix of Thought for Enhanced Reasoning in Complex Question Answering]
 ===Iteration (e.g. neural-like layered blocks)===
@@ Line 133: / Line 221: @@
 * 2024-10: [https://arxiv.org/abs/2410.01707 Interpretable Contrastive Monte Carlo Tree Search Reasoning]
 * 2024-12: [https://arxiv.org/abs/2412.18319 Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search]
+===Pathfinding===
+* 2024-08: [https://arxiv.org/abs/2408.08152 DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search]
+* 2025-06: [https://arxiv.org/abs/2506.01939 Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning]
+* 2025-09: [https://arxiv.org/abs/2509.09284 Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning]
+* 2025-09: [https://arxiv.org/abs/2509.06160v1 Reverse-Engineered Reasoning for Open-Ended Generation]
 ===Other Search===
@@ Line 141: / Line 235: @@
 * 2021-11: [https://arxiv.org/abs/2110.14168 Training Verifiers to Solve Math Word Problems]
 * 2024-02: [https://arxiv.org/abs/2402.10200 Chain-of-Thought Reasoning Without Prompting]
+* 2025-01: [https://arxiv.org/abs/2501.19393 s1: Simple test-time scaling]
+* 2025-02: [https://arxiv.org/abs/2502.04404 Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models]
+* 2025-02: [https://arxiv.org/abs/2502.06773 On the Emergence of Thinking in LLMs I: Searching for the Right Intuition]
+* 2025-02: [https://arxiv.org/abs/2502.06807 Competitive Programming with Large Reasoning Models]
+* 2025-02: [https://arxiv.org/abs/2502.18600 Chain of Draft: Thinking Faster by Writing Less]
+* 2025-03: [https://arxiv.org/abs/2503.17352 OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement] ([https://github.com/yihedeng9/OpenVLThinker code])
+* 2025-03: [https://arxiv.org/abs/2503.19877 Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators]
+* 2025-03: [https://arxiv.org/abs/2503.23513 RARE: Retrieval-Augmented Reasoning Modeling]
+* 2025-07: [https://arxiv.org/abs/2501.18858 BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning]
+* 2025-09: [https://arxiv.org/abs/2509.13351 Teaching LLMs to Plan: Logical Chain-of-Thought Instruction Tuning for Symbolic Planning]
+===Inner Monologue===
+* 2022-07: [https://arxiv.org/abs/2207.05608 Inner Monologue: Embodied Reasoning through Planning with Language Models]
+* 2025-06: [https://nicolehsing.com/mirror-paper.pdf MIRROR: Cognitive Inner Monologue Between Conversational Turns for Persistent Reflection and Reasoning in Conversational LLMs] ([https://github.com/nicolehsing/MIRROR code])
+===Model Merging===
+* 2025-01: [https://arxiv.org/abs/2501.12599 Kimi k1.5: Scaling Reinforcement Learning with LLMs]
+* 2025-03: [https://arxiv.org/abs/2503.20641 Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging] ([https://github.com/hahahawu/Long-to-Short-via-Model-Merging code])
+* 2025-03: [https://www.nature.com/articles/s41524-025-01564-y Fine-tuning large language models for domain adaptation: exploration of training strategies, scaling, model merging and synergistic capabilities]
+===Meta-methods===
+* 2025-02: [https://arxiv.org/abs/2502.12018 Atom of Thoughts for Markov LLM Test-Time Scaling] ([https://github.com/qixucen/atom code])
 ==Analysis==
@@ Line 152: / Line 268: @@
 * 2024-10: (comparing fine-tuning to in-context learning) [https://arxiv.org/abs/2405.19874 Is In-Context Learning Sufficient for Instruction Following in LLMs?]
 * 2024-11: [https://arxiv.org/abs/2411.17501 Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers]
+* 2025-02: [https://www.arxiv.org/abs/2502.08606 Distillation Scaling Laws]
+* 2025-03: [https://arxiv.org/abs/2503.10061 Compute Optimal Scaling of Skills: Knowledge vs Reasoning]
+* 2025-03: [https://arxiv.org/abs/2504.00294 Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead]
+* 2025-04: [https://arxiv.org/abs/2504.03635 Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning]: Increasing model size can improve reasoning, but can also lead to overparameterization (memorization instead of reasoning)
+* 2025-04: [https://arxiv.org/abs/2504.14047 Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods]: Reasoning models outperform non-reasoning models, even when the latter are given extra inference-time compute; majority voting always helps, and is hard to beat
+* 2025-12: [https://arxiv.org/abs/2512.08296 Towards a Science of Scaling Agent Systems] (Google DeepMind)
+====(Optimal) Usage of Reasoning Compute====
+* 2024-10: [https://arxiv.org/abs/2410.21333 Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse]
+* 2024-12: [https://arxiv.org/abs/2412.21187 Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs]
+* 2025-01: [https://arxiv.org/abs/2501.18585 Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs]
+* 2025-02: [https://www.arxiv.org/abs/2502.04463 Training Language Models to Reason Efficiently]
+* 2025-02: [https://arxiv.org/abs/2502.08235 The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks]
+* 2025-03: [https://arxiv.org/abs/2503.01141 How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach]
+* 2025-03: [https://arxiv.org/abs/2503.16419 Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models]
+* 2025-04: [https://arxiv.org/abs/2504.05185 Concise Reasoning via Reinforcement Learning]
+* 2025-04: [https://arxiv.org/abs/2504.05419 Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification]
+* 2025-04: [https://arxiv.org/abs/2504.15895 Dynamic Early Exit in Reasoning Models]
+* 2025-07: [https://arxiv.org/abs/2507.14417 Inverse Scaling in Test-Time Compute]
+* 2025-08: [https://arxiv.org/abs/2508.17627 Stop Spinning Wheels: Mitigating LLM Overthinking via Mining Patterns for Early Reasoning Exit]
+====Usage of Training Data====
+* 2025-02: [https://arxiv.org/abs/2502.03387 LIMO: Less is More for Reasoning] (surprisingly easy generalization, from very few reasoning training examples; model can go from knowledge-retrieval to diverse reasoning using curated examples)
 ===Theory===
@@ Line 162: / Line 301: @@
 [[Image:Compute.png|600px]]
 * 2024-09-16: [https://www.oneusefulthing.org/p/scaling-the-state-of-play-in-ai Scaling: The State of Play in AI]
+* 2025-02-03: [https://arxiv.org/abs/2502.06807 Competitive Programming with Large Reasoning Models]
-===Pitfalls===
+* 2025-10: [https://arxiv.org/abs/2510.14232 Scaling Test-Time Compute to Achieve IOI Gold Medal with Open-Weight Models]
-* 2024-12: [https://arxiv.org/abs/2412.21187 Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs]
 ==Pragmatics==
 ===Code for Inference-time Compute===
 * [https://github.com/codelion/optillm optillm]: Inference proxy that implements state-of-the-art techniques to improve the accuracy and performance of LLMs (improving reasoning on coding, logical, and mathematical queries)
+=Interact with Environment (Experiential Learning)=
+* 2025-01: [https://arxiv.org/abs/2501.10893 Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments]
+* 2025-09: [https://arxiv.org/abs/2509.24527 Training Agents Inside of Scalable World Models]
+* 2025-10: [https://arxiv.org/abs/2510.08558 Agent Learning via Early Experience]
 =Memory=
@@ Line 187: / Line 330: @@
 * 2024-10: [https://arxiv.org/abs/2410.19318 Two are better than one: Context window extension with multi-grained self-injection]
 * 2024-11: [https://arxiv.org/abs/2411.00114 Project Sid: Many-agent simulations toward AI civilization]
+* 2025-01: [https://arxiv.org/abs/2501.13946 Hallucination Mitigation using Agentic AI Natural Language-Based Frameworks]
+* 2025-02: [https://arxiv.org/abs/2502.16111 PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving]
+* 2025-09: [https://arxiv.org/abs/2509.15172 Internalizing Self-Consistency in Language Models: Multi-Agent Consensus Alignment]
+==Competition==
+* 2025-06: [https://arxiv.org/abs/2506.04721 SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat]
 =ML-like Optimization of LLM Setup=
 * 2023-10: [https://arxiv.org/abs/2310.03714 DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines] ([https://github.com/stanfordnlp/dspy code]: programming, not prompting, foundation models)
 * 2023-05: [https://arxiv.org/abs/2305.03495 Automatic Prompt Optimization with "Gradient Descent" and Beam Search]
-* 2024-06: [https://arxiv.org/abs/2406.07496 TextGrad: Automatic "Differentiation" via Text] (gradient backpropagation through text)
+* 2024-06: [https://arxiv.org/abs/2406.07496 TextGrad: Automatic "Differentiation" via Text] (gradient backpropagation through text, analogous to gradient descent)
 * 2024-06: [https://arxiv.org/abs/2406.18532 Symbolic Learning Enables Self-Evolving Agents] (optimize LLM frameworks)
+* 2025-03: [https://www.nature.com/articles/s41586-025-08661-4 Optimizing generative AI by backpropagating language model feedback]
+=Self-modification=
+* 2025-06: [https://arxiv.org/abs/2506.10943 Self-Adapting Language Models]
+=Limitations/Requirements=
+* Fluid intelligence (cf. [https://arcprize.org/arc ARC-AGI])
+* 2024-06: [https://arxiv.org/abs/2406.04268 Open-Endedness is Essential for Artificial Superhuman Intelligence]
+==Creativity==
+See: [[AI creativity]]
 =See Also=
 * [[AI]]
 * [[AI Agents]]
+* [[AI research trends]]

Difference between revisions of "Increasing AI Intelligence"
