2024-08: Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers (code)
- (Microsoft) rStar is a self-play mutual reasoning approach. A small model augments MCTS with a defined set of reasoning actions to generate candidate trajectories; a second small model checks them, and mutually consistent trajectories are favored.
2024-09: Self-Harmonized Chain of Thought
- Produce refined chain-of-thought style solutions/prompts for diverse problems. Given a large set of problems/questions, first cluster them semantically, then apply zero-shot chain-of-thought to each problem. Then cross-pollinate between proposed solutions to similar problems, looking for refined and generalized solutions.
2024-11: LLMs Do Not Think Step-by-step In Implicit Reasoning
- They argue that models trained to reproduce CoT outputs do not, internally, perform stepwise reasoning (with intermediate representations); this suggests that explicit CoT could be superior to implicit CoT.
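The mutual-consistency idea in the rStar entry above can be illustrated with a small filter. This is a loose sketch of the agreement check only, not rStar's implementation: the `complete` callable is a hypothetical stand-in for the second model, and the fixed `split_fraction` cut point is an assumption made for simplicity.

```python
from typing import Callable, List, Sequence

# A trajectory is a list of reasoning steps; the last item is the final answer.
Trajectory = Sequence[str]


def mutually_consistent(
    trajectories: List[Trajectory],
    complete: Callable[[Sequence[str]], str],
    split_fraction: float = 0.5,
) -> List[Trajectory]:
    """Keep trajectories whose final answer a second model reproduces.

    `complete` (hypothetical) receives a prefix of reasoning steps and
    returns the final answer it would reach from there.
    """
    kept = []
    for traj in trajectories:
        if len(traj) < 2:
            continue  # nothing to hide from the second model
        # Reveal only the first part of the trajectory, never the answer.
        cut = max(1, int(len(traj) * split_fraction))
        cut = min(cut, len(traj) - 1)
        if complete(traj[:cut]) == traj[-1]:
            kept.append(traj)
    return kept
```

In practice `complete` would wrap a call to a second language model; any deterministic function with the same signature is enough to exercise the filter.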

CoT reasoning model

See also: AI tools > LLM > Open-weights LLM > Reasoning

2024-09: OpenAI o1
2024-10: O1 Replication Journey: A Strategic Progress Report – Part 1 (code): Attempt by Walnut Plan to reproduce o1-like in-context reasoning
2024-11: DeepSeek-R1-Lite-Preview reasoning model
2024-11: Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
2024-11: O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?
2024-11: Tulu 3: Pushing Frontiers in Open Language Model Post-Training
2024-12: o1-Coder: an o1 Replication for Coding (code)
2024-12: Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
2024-12: Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
2025-01: Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
2025-01: O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning
2025-01: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
2025-01: Kimi k1.5: Scaling Reinforcement Learning with LLMs
2025-01: Reasoning Language Models: A Blueprint
2025-01: Open-R1: a fully open reproduction of DeepSeek-R1
2025-02: Demystifying Long Chain-of-Thought Reasoning in LLMs
2025-02: Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach (Huginn-0125)
2025-02: Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners

Scaling

2024-08: Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling (Google DeepMind)
2024-11: Scaling Laws for Pre-training Agents and World Models
2025-02: Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
2025-03: Compute Optimal Scaling of Skills: Knowledge vs Reasoning

Inference Time Compute

Methods

2024-03: Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
2024-11: Reverse Thinking Makes LLMs Stronger Reasoners
2024-12: Training Large Language Models to Reason in a Continuous Latent Space (Chain of Continuous Thought, COCONUT)

Review

In context learning (ICL), search, and other inference-time methods

2023-03: Reflexion: Language Agents with Verbal Reinforcement Learning
2023-05: VOYAGER: An Open-Ended Embodied Agent with Large Language Models
2024-04: Many-Shot In-Context Learning
2024-08: Automated Design of Agentic Systems
2024-09: Planning In Natural Language Improves LLM Search For Code Generation

Inference-time Sampling

Inference-time Gradient/Updating/RL/etc.

2024-11: The Surprising Effectiveness of Test-Time Training for Abstract Reasoning (code)
2025-04: TTRL: Test-Time Reinforcement Learning (code)

Self-prompting

Retrieval or Memory

2024-12: Meta-Reflection: A Feedback-Free Reflection Learning Framework

In-context thought

2022-01: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Google Brain)
2023-05: Tree of Thoughts: Deliberate Problem Solving with Large Language Models (Google DeepMind)
2024-05: Faithful Logical Reasoning via Symbolic Chain-of-Thought
2024-06: A Tree-of-Thoughts to Broaden Multi-step Reasoning across Languages
2024-09: To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
2024-09: Iteration of Thought: Leveraging Inner Dialogue for Autonomous Large Language Model Reasoning (Agnostiq, Toronto)
2024-09: Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models
2024-10: A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration (failed reasoning traces can improve CoT)
2024-10: Tree of Problems: Improving structured problem solving with compositionality
2023-01/2024-10: A Survey on In-context Learning
2025-01: Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
2025-03: Interruption is All You Need: Improving Reasoning Model Refusal Rates through measuring Parallel Reasoning Diversity (reducing hallucinations through parallel reasoning and diversity measurement)

Naive multi-LLM (verification, self-critique, majority voting, best-of-N, etc.)

2023-06: LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion (code)
2023-12: Dynamic Voting for Efficient Reasoning in Large Language Models
2024-04: Regularized Best-of-N Sampling to Mitigate Reward Hacking for Language Model Alignment
2024-08: Dynamic Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling
2024-11: Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models
2024-12: llm-consortium: Multiple LLMs collaboratively solve problems through structured dialogue, evaluation and arbitration
2025-02: When One LLM Drools, Multi-LLM Collaboration Rules
2025-03: Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification
2025-03: Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique
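Two of the baselines named in this heading, majority voting and best-of-N, are simple enough to sketch directly. The following is a minimal illustration under stated assumptions: answers are already normalized strings, and `score` is a hypothetical verifier or reward function supplied by the caller. It is not taken from any of the papers listed.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one answer")
    return Counter(answers).most_common(1)[0][0]


def best_of_n(candidates: List[str], score: Callable[[str], float]) -> str:
    """Return the candidate with the highest score under `score`.

    `score` is a hypothetical stand-in for a verifier or reward model.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)
```

Majority voting needs no verifier but does need answers that can be compared for equality; best-of-N is only as good as the scoring function it is given.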

Multi-LLM (multiple comparisons, branching, etc.)

2024-10: Thinking LLMs: General Instruction Following with Thought Generation
2024-11: Mixtures of In-Context Learners: Multiple "experts", each with a different set of in-context examples; combine outputs at the level of next-token-prediction
2024-11: LLaVA-o1: Let Vision Language Models Reason Step-by-Step (code)
2025-04: Self-Steering Language Models: Planner generates program, Followers accomplish sub-tasks
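The Mixtures of In-Context Learners entry describes combining experts at the level of next-token prediction. A minimal sketch of that kind of combination follows, assuming each expert exposes a next-token probability table and that the mixture is a weighted average of probabilities; the combination rule and the weights here are illustrative assumptions, and the paper's exact formulation may differ.

```python
from typing import Dict, List


def mix_next_token(
    expert_probs: List[Dict[str, float]],
    weights: List[float],
) -> Dict[str, float]:
    """Weighted average of per-expert next-token distributions.

    Each dict maps a token to the probability one expert assigns it.
    Weights are normalized to sum to 1 before mixing.
    """
    if len(expert_probs) != len(weights):
        raise ValueError("one weight per expert is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    mixed: Dict[str, float] = {}
    for probs, weight in zip(expert_probs, weights):
        for token, p in probs.items():
            mixed[token] = mixed.get(token, 0.0) + (weight / total) * p
    return mixed
```

The next token can then be chosen from the mixed table, for example with `max(mixed, key=mixed.get)` for greedy decoding.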

Iteration (e.g. neural-like layered blocks)

2024-06: Mixture-of-Agents Enhances Large Language Model Capabilities

Iterative reasoning via graphs

2023-08: Graph of Thoughts: Solving Elaborate Problems with Large Language Models
2023-10: Amortizing intractable inference in large language models (code)
2024-09: On the Diagram of Thought: Iterative reasoning as a directed acyclic graph (DAG)

Monte Carlo Tree Search (MCTS)

Other Search

2024-11: Scattered Forest Search: Smarter Code Space Exploration with LLMs

Chain-of-Thought Reasoning

Inner Monologue

Model Merging

Meta-methods

2025-02: Atom of Thoughts for Markov LLM Test-Time Scaling (code)

Analysis

Scaling

2021-04: Scaling Scaling Laws with Board Games
2024-03: Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems
2024-04: The Larger the Better? Improved LLM Code-Generation via Budget Reallocation
2024-07: Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
2024-08: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models
2024-08: Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
2024-10: Is In-Context Learning Sufficient for Instruction Following in LLMs? (comparing fine-tuning to in-context learning)
2024-11: Inference Scaling FLaws: The Limits of LLM Resampling with Imperfect Verifiers
2025-02: Distillation Scaling Laws
2025-03: Compute Optimal Scaling of Skills: Knowledge vs Reasoning
2025-03: Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead
2025-04: Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning: Larger models can reason better, but can also become overparameterized (memorizing instead of reasoning)
2025-04: Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods: Reasoning models outperform non-reasoning models that are given extra inference-time compute; majority voting always helps, and is hard to beat
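Several entries in this list study how success rates change as more samples are drawn per problem. One commonly used way to quantify this is the unbiased pass@k estimator: given n samples of which c are correct, it estimates the probability that at least one of k randomly chosen samples is correct. The sketch below assumes that metric; whether a particular paper above reports exactly this quantity is not confirmed here.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k samples is correct).

    n: total samples drawn for the problem
    c: number of those samples that are correct
    k: sampling budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        return 1.0  # too few wrong samples to fill k draws
    # 1 minus the chance that all k draws come from the wrong samples.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Averaging `pass_at_k` over a benchmark for increasing k gives a coverage curve, which is one way to compare a small model sampled many times against a larger model sampled once.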

(Optimal) Usage of Reasoning Compute

Usage of Training Data

2025-02: LIMO: Less is More for Reasoning (surprisingly easy generalization, from very few reasoning training examples; model can go from knowledge-retrieval to diverse reasoning using curated examples)

Theory

2024-02: Chain of Thought Empowers Transformers to Solve Inherently Serial Problems

Expending compute works

2024-06-10: Blog post (opinion): AI Search: The Bitter-er Lesson
2024-07-17: Blog post (test): Getting 50% (SoTA) on ARC-AGI with GPT-4o
2024-09-12: OpenAI o1: Learning to Reason with LLMs

2024-09-16: Scaling: The State of Play in AI
2025-02-03: Competitive Programming with Large Reasoning Models

Pragmatics

Code for Inference-time Compute

optillm: Inference proxy which implements state-of-the-art techniques to improve accuracy and performance of LLMs (improve reasoning over coding, logical and mathematical queries)

Interact with Environment

2025-01: Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments

Memory

2024-10: Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation

Tool Use

2024-11: DynaSaur: Large Language Agents Beyond Predefined Actions: writes functions/code to increase capabilities

Integrated

2018-08: Neural Arithmetic Logic Units
2023-01: Tracr: Compiled Transformers as a Laboratory for Interpretability (code)
2024-05: Augmenting Language Models with Composable Differentiable Libraries (pdf)
2024-07: Algorithmic Language Models with Neurally Compiled Libraries
2024-10: ALTA: Compiler-Based Analysis of Transformers

Multi-agent Effort (and Emergent Intelligence)

Competition

2025-06: SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat

ML-like Optimization of LLM Setup

2023-05: Automatic Prompt Optimization with "Gradient Descent" and Beam Search
2023-10: DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines (code: Programming—not prompting—Foundation Models)
2024-06: TextGrad: Automatic "Differentiation" via Text (gradient backpropagation through text, analogous to gradient descent)
2024-06: Symbolic Learning Enables Self-Evolving Agents (optimize LLM frameworks)
2025-03: Optimizing generative AI by backpropagating language model feedback

Self-modification

2025-06: Self-Adapting Language Models

Limitations/Requirements

Fluid intelligence (cf. ARC-AGI)
2024-06: Open-Endedness is Essential for Artificial Superhuman Intelligence

Creativity

See: AI creativity
