Revision as of 16:44, 31 December 2024

Reviews

2024-12: A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges

Prompt Engineering

2024-11: LLMs as Method Actors: A Model for Prompt Engineering and Architecture

Fine Tuning

2024-12: Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

Proactive Search

Compute expended after training, but before inference.

Training Data (Data Refinement, Synthetic Data)

C.f. image datasets:
- 2023-06: StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
- 2023-11: DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback
2024-09: Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale
2024-10: Data Cleaning Using Large Language Models
Updating list of links: Synthetic Data of LLMs, by LLMs, for LLMs

Generate consistent plans/thoughts

2024-08: Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers (code)
- (Microsoft) rStar is a self-play mutual reasoning approach. A small model adds to MCTS using some defined reasoning heuristics. Mutually consistent trajectories can be emphasized.
2024-09: Self-Harmonized Chain of Thought
- Produce refined chain-of-thought style solutions/prompts for diverse problems. Given a large set of problems/questions, first aggregated semantically, then apply zero-shot chain-of-thought to each problem. Then cross-pollinate between proposed solutions to similar problems, looking for refined and generalize solutions.
2024-11: LLMs Do Not Think Step-by-step In Implicit Reasoning
- They argue that models trained to reproduce CoT outputs do not, internally, perform stepwise reasoning (with intermediate representations); this suggests that explicit CoT could be superior to implicit CoT.

Inference Time Compute

Methods

2024-03: Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
2024-11: Reverse Thinking Makes LLMs Stronger Reasoners
2024-12: Training Large Language Models to Reason in a Continuous Latent Space (Chain of Continuous Thought, COCONUT)

Review

2024-06: From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

In context learning (ICL), search, and other inference-time methods

2023-03: Reflexion: Language Agents with Verbal Reinforcement Learning
2023-05: VOYAGER: An Open-Ended Embodied Agent with Large Language Models
2024-04: Many-Shot In-Context Learning
2024-08: Automated Design of Agentic Systems
2024-09: Planning In Natural Language Improves LLM Search For Code Generation

Inference-time Sampling

Inference-time Gradient

2024-11: The Surprising Effectiveness of Test-Time Training for Abstract Reasoning (code)

Self-prompting

Retrieval or Memory

2024-12: Meta-Reflection: A Feedback-Free Reflection Learning Framework

In-context thought

2022-01: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Google Brain)
2023-05: Tree of Thoughts: Deliberate Problem Solving with Large Language Models (Google DeepMind)
2024-05: Faithful Logical Reasoning via Symbolic Chain-of-Thought
2024-06: A Tree-of-Thoughts to Broaden Multi-step Reasoning across Languages
2024-09: To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
2024-09: Iteration of Thought: Leveraging Inner Dialogue for Autonomous Large Language Model Reasoning (Agnostiq, Toronto)
2024-09: Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models
2024-10: A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration (failed reasoning traces can improve CoT)
2024-10: Tree of Problems: Improving structured problem solving with compositionality
2023-01/2024-10: A Survey on In-context Learning

Naive multi-LLM (verification, majority voting, best-of-N, etc.)

2023-06: LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion (code)
2023-12: Dynamic Voting for Efficient Reasoning in Large Language Models
2024-04: Regularized Best-of-N Sampling to Mitigate Reward Hacking for Language Model Alignment
2024-08: Dynamic Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling
2024-11: Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models
2024-12: llm-consortium: Multiple LLMs collaboratively solve problems through structured dialogue, evaluation and arbitration

Multi-LLM (multiple comparisons, branching, etc.)

2024-10: Thinking LLMs: General Instruction Following with Thought Generation
2024-11: Mixtures of In-Context Learners: Multiple "experts", each with a different set of in-context examples; combine outputs at the level of next-token-prediction
2024-11: LLaVA-o1: Let Vision Language Models Reason Step-by-Step (code)

Iteration (e.g. neural-like layered blocks)

2024-06: Mixture-of-Agents Enhances Large Language Model Capabilities

Iterative reasoning via graphs

2023-08: Graph of Thoughts: Solving Elaborate Problems with Large Language Models
2023-10: Amortizing intractable inference in large language models (code)
2024-09: On the Diagram of Thought: Iterative reasoning as a directed acyclic graph (DAG)

Monte Carlo Tree Search (MCTS)

Other Search

2024-11: Scattered Forest Search: Smarter Code Space Exploration with LLMs

Chain-of-Thought Reasoning

Scaling

2021-04: Scaling Scaling Laws with Board Games
2024-03: Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems
2024-04: The Larger the Better? Improved LLM Code-Generation via Budget Reallocation
2024-07: Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
2024-08: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models
2024-08: Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
2024-10: (comparing fine-tuning to in-context learning) Is In-Context Learning Sufficient for Instruction Following in LLMs?
2024-11: Inference Scaling FLaws: The Limits of LLM Resampling with Imperfect Verifiers

Theory

2024-02: Chain of Thought Empowers Transformers to Solve Inherently Serial Problems

Expending compute works

2024-06-10: Blog post (opinion): AI Search: The Bitter-er Lesson
2024-07-17: Blog post (test): Getting 50% (SoTA) on ARC-AGI with GPT-4o
2024-09-12: OpenAI o1: Learning to Reason with LLMs

2024-09-16: Scaling: The State of Play in AI

Code for Inference-time Compute

optillm: Inference proxy which implements state-of-the-art techniques to improve accuracy and performance of LLMs (improve reasoning over coding, logical and mathematical queries)

Memory

2024-10: Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation

Tool Use

2024-11: DynaSaur: Large Language Agents Beyond Predefined Actions: writes functions/code to increase capabilities

Multi-agent Effort (and Emergent Intelligence)

ML-like Optimization of LLM Setup

2023-03: DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines (code: Programming—not prompting—Foundation Models)
2024-05: Automatic Prompt Optimization with "Gradient Descent" and Beam Search
2024-06: TextGrad: Automatic "Differentiation" via Text (gradient backpropagation through text)
2024-06: Symbolic Learning Enables Self-Evolving Agents (optimize LLM frameworks)

@@ Line 60: / Line 60: @@
 =Inference Time Compute=
-==Methods==
+===Methods===
 * 2024-03: [https://arxiv.org/abs/2403.09629 Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking]
 * 2024-11: [https://arxiv.org/pdf/2411.19865 Reverse Thinking Makes LLMs Stronger Reasoners]
@@ Line 67: / Line 67: @@
 * 2024-06: [https://arxiv.org/abs/2406.16838 From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models]
-==In context learning (ICL), search, and other inference-time methods==
+===In context learning (ICL), search, and other inference-time methods===
 * 2023-03: [https://arxiv.org/abs/2303.11366 Reflexion: Language Agents with Verbal Reinforcement Learning]
 * 2023-05: [https://arxiv.org/abs/2305.16291 VOYAGER: An Open-Ended Embodied Agent with Large Language Models]
@@ Line 74: / Line 74: @@
 * 2024-09: [https://arxiv.org/abs/2409.03733 Planning In Natural Language Improves LLM Search For Code Generation]
-==Inference-time Sampling==
+===Inference-time Sampling===
 * 2024-10: [https://github.com/xjdr-alt/entropix entropix: Entropy Based Sampling and Parallel CoT Decoding]
 * 2024-10: [https://arxiv.org/abs/2410.16033 TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling]
@@ Line 80: / Line 80: @@
 * 2024-12: [https://arxiv.org/abs/2412.06822 Guidance is All You Need: Temperature-Guided Reasoning in Large Language Models]
-==Inference-time Gradient==
+===Inference-time Gradient===
 * 2024-11: [https://ekinakyurek.github.io/papers/ttt.pdf The Surprising Effectiveness of Test-Time Training for Abstract Reasoning] ([https://github.com/ekinakyurek/marc code])
-==Self-prompting==
+===Self-prompting===
 * 2023-05: [https://arxiv.org/abs/2305.09993 Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling]
 * 2023-11: [https://arxiv.org/abs/2311.04205 Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves]
-==Retrieval or Memory==
+===Retrieval or Memory===
 * 2024-12: [https://arxiv.org/abs/2412.13781 Meta-Reflection: A Feedback-Free Reflection Learning Framework]
-==In-context thought==
+===In-context thought===
 * 2022-01: [https://arxiv.org/abs/2201.11903 Chain-of-Thought Prompting Elicits Reasoning in Large Language Models] (Google Brain)
 * 2023-05: [https://arxiv.org/abs/2305.10601 Tree of Thoughts: Deliberate Problem Solving with Large Language Models] (Google DeepMind)
@@ Line 102: / Line 102: @@
 * 2023-01/2024-10: [https://arxiv.org/abs/2301.00234 A Survey on In-context Learning]
-==Naive multi-LLM (verification, majority voting, best-of-N, etc.)==
+===Naive multi-LLM (verification, majority voting, best-of-N, etc.)===
 * 2023-06: [https://arxiv.org/abs/2306.02561 LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion] ([https://github.com/yuchenlin/LLM-Blender?tab=readme-ov-file code])
 * 2023-12: [https://aclanthology.org/2023.findings-emnlp.203/ Dynamic Voting for Efficient Reasoning in Large Language Models]
@@ Line 110: / Line 110: @@
 * 2024-12: [https://github.com/irthomasthomas/llm-consortium llm-consortium]: Multiple LLMs collaboratively solve problems through structured dialogue, evaluation and arbitration
-==Multi-LLM (multiple comparisons, branching, etc.)==
+===Multi-LLM (multiple comparisons, branching, etc.)===
 * 2024-10: [https://arxiv.org/abs/2410.10630 Thinking LLMs: General Instruction Following with Thought Generation]
 * 2024-11: [https://arxiv.org/abs/2411.02830 Mixtures of In-Context Learners]: Multiple "experts", each with a different set of in-context examples; combine outputs at the level of next-token-prediction
 * 2024-11: [https://arxiv.org/abs/2411.10440 LLaVA-o1: Let Vision Language Models Reason Step-by-Step] ([https://github.com/PKU-YuanGroup/LLaVA-o1 code])
-==Iteration (e.g. neural-like layered blocks)==
+===Iteration (e.g. neural-like layered blocks)===
 * 2024-06: [https://arxiv.org/abs/2406.04692 Mixture-of-Agents Enhances Large Language Model Capabilities]
-==Iterative reasoning via graphs==
+===Iterative reasoning via graphs===
 * 2023-08: [https://arxiv.org/abs/2308.09687 Graph of Thoughts: Solving Elaborate Problems with Large Language Models]
 * 2023-10: [https://arxiv.org/abs/2310.04363 Amortizing intractable inference in large language models] ([https://github.com/GFNOrg/gfn-lm-tuning code])
 * 2024-09: [https://arxiv.org/abs/2409.10038 On the Diagram of Thought]: Iterative reasoning as a directed acyclic graph (DAG)
-==Monte Carlo Tree Search (MCTS)==
+===Monte Carlo Tree Search (MCTS)===
 * 2024-05: [https://arxiv.org/abs/2405.03553 AlphaMath Almost Zero: process Supervision without process]
 * 2024-06: [https://arxiv.org/abs/2406.03816 ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search]
@@ Line 132: / Line 132: @@
 * 2024-12: [https://arxiv.org/abs/2412.18319 Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search]
-==Other Search==
+===Other Search===
 * 2024-11: [https://arxiv.org/abs/2411.05010 Scattered Forest Search: Smarter Code Space Exploration with LLMs]
-==Chain-of-Thought Reasoning==
+===Chain-of-Thought Reasoning===
 * 2017-05: [https://arxiv.org/abs/1705.04146 Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems]
 * 2021-11: [https://arxiv.org/abs/2110.14168 Training Verifiers to Solve Math Word Problems]
 * 2024-02: [https://arxiv.org/abs/2402.10200 Chain-of-Thought Reasoning Without Prompting]
-==Scaling==
+===Scaling===
 * 2021-04: [https://arxiv.org/abs/2104.03113 Scaling Scaling Laws with Board Games]
 * 2024-03: [https://arxiv.org/abs/2403.02419 Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems]
@@ Line 150: / Line 150: @@
 * 2024-11: [https://arxiv.org/abs/2411.17501 Inference Scaling FLaws: The Limits of LLM Resampling with Imperfect Verifiers]
-==Theory==
+===Theory===
 * 2024-02: [https://arxiv.org/abs/2402.12875 Chain of Thought Empowers Transformers to Solve Inherently Serial Problems]
-==Expending compute works==
+===Expending compute works===
 * 2024-06-10: Blog post (opinion): [https://yellow-apartment-148.notion.site/AI-Search-The-Bitter-er-Lesson-44c11acd27294f4495c3de778cd09c8d AI Search: The Bitter-er Lesson]
 * 2024-07-17: Blog post (test): [https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt Getting 50% (SoTA) on ARC-AGI with GPT-4o]
@@ Line 160: / Line 160: @@
 * 2024-09-16: [https://www.oneusefulthing.org/p/scaling-the-state-of-play-in-ai Scaling: The State of Play in AI]
-==Code for Inference-time Compute==
+===Code for Inference-time Compute===
 * [https://github.com/codelion/optillm optillm]: Inference proxy which implements state-of-the-art techniques to improve accuracy and performance of LLMs (improve reasoning over coding, logical and mathematical queries)

Difference between revisions of "Increasing AI Intelligence"

Revision as of 16:44, 31 December 2024

Contents

Reviews

Prompt Engineering

Fine Tuning

Proactive Search

Training Data (Data Refinement, Synthetic Data)

Generate consistent plans/thoughts

Sampling

Automated prompt generation

Distill inference-time-compute into model

CoT reasoning model

Scaling

Inference Time Compute

Methods

In context learning (ICL), search, and other inference-time methods

Inference-time Sampling

Inference-time Gradient

Self-prompting

Retrieval or Memory

In-context thought

Naive multi-LLM (verification, majority voting, best-of-N, etc.)

Multi-LLM (multiple comparisons, branching, etc.)

Iteration (e.g. neural-like layered blocks)

Iterative reasoning via graphs

Monte Carlo Tree Search (MCTS)

Other Search

Chain-of-Thought Reasoning

Scaling

Theory

Expending compute works

Code for Inference-time Compute

Memory

Tool Use

Multi-agent Effort (and Emergent Intelligence)

ML-like Optimization of LLM Setup

Navigation menu

Search