Difference between revisions of "Increasing AI Intelligence"

From GISAXS
(Inference Time Compute)
(See Also)
 
(19 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
=Reviews=
 
=Reviews=
 
* 2024-12: [https://arxiv.org/abs/2412.11936 A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges]
 
* 2024-12: [https://arxiv.org/abs/2412.11936 A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges]
 +
* 2025-01: [https://arxiv.org/abs/2501.02497 Test-time Computing: from System-1 Thinking to System-2 Thinking] ([https://github.com/Dereck0602/Awesome_Test_Time_LLMs github list of papers])
 +
* Links to papers: [https://github.com/hijkzzz/Awesome-LLM-Strawberry Awesome LLM Strawberry (OpenAI o1)]
  
 
=Prompt Engineering=
 
=Prompt Engineering=
Line 7: Line 9:
 
=Fine Tuning=
 
=Fine Tuning=
 
* 2024-12: [https://arxiv.org/abs/2412.15287 Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models]
 
* 2024-12: [https://arxiv.org/abs/2412.15287 Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models]
 +
* 2025-01: [https://arxiv.org/abs/2501.01702 AgentRefine: Enhancing Agent Generalization through Refinement Tuning]
 +
* 2025-01: [https://llm-multiagent-ft.github.io/ Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains] ([https://llm-multiagent-ft.github.io/ preprint], [https://github.com/vsubramaniam851/multiagent-ft/tree/main code])
  
 
=Proactive Search=
 
=Proactive Search=
Line 54: Line 58:
 
* 2024-12: [https://arxiv.org/abs/2412.18319 Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search]
 
* 2024-12: [https://arxiv.org/abs/2412.18319 Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search]
 
* 2024-12: [https://arxiv.org/abs/2412.14135 Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective]
 
* 2024-12: [https://arxiv.org/abs/2412.14135 Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective]
 +
* 2025-01: [https://arxiv.org/abs/2501.01904 Virgo: A Preliminary Exploration on Reproducing o1-like MLLM]
 +
* 2025-01: [https://arxiv.org/abs/2501.06458 O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning]
  
 
===Scaling===
 
===Scaling===
Line 66: Line 72:
 
'''Review'''
 
'''Review'''
 
* 2024-06: [https://arxiv.org/abs/2406.16838 From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models]
 
* 2024-06: [https://arxiv.org/abs/2406.16838 From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models]
 +
* 2025-01: [https://arxiv.org/abs/2501.02497 Test-time Computing: from System-1 Thinking to System-2 Thinking] ([https://github.com/Dereck0602/Awesome_Test_Time_LLMs github list of papers])
  
==In context learning (ICL), search, and other inference-time methods==
+
===In context learning (ICL), search, and other inference-time methods===
 
* 2023-03: [https://arxiv.org/abs/2303.11366 Reflexion: Language Agents with Verbal Reinforcement Learning]
 
* 2023-03: [https://arxiv.org/abs/2303.11366 Reflexion: Language Agents with Verbal Reinforcement Learning]
 
* 2023-05: [https://arxiv.org/abs/2305.16291 VOYAGER: An Open-Ended Embodied Agent with Large Language Models]
 
* 2023-05: [https://arxiv.org/abs/2305.16291 VOYAGER: An Open-Ended Embodied Agent with Large Language Models]
Line 74: Line 81:
 
* 2024-09: [https://arxiv.org/abs/2409.03733 Planning In Natural Language Improves LLM Search For Code Generation]
 
* 2024-09: [https://arxiv.org/abs/2409.03733 Planning In Natural Language Improves LLM Search For Code Generation]
  
==Inference-time Sampling==
+
===Inference-time Sampling===
 
* 2024-10: [https://github.com/xjdr-alt/entropix entropix: Entropy Based Sampling and Parallel CoT Decoding]
 
* 2024-10: [https://github.com/xjdr-alt/entropix entropix: Entropy Based Sampling and Parallel CoT Decoding]
 
* 2024-10: [https://arxiv.org/abs/2410.16033 TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling]
 
* 2024-10: [https://arxiv.org/abs/2410.16033 TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling]
Line 80: Line 87:
 
* 2024-12: [https://arxiv.org/abs/2412.06822 Guidance is All You Need: Temperature-Guided Reasoning in Large Language Models]
 
* 2024-12: [https://arxiv.org/abs/2412.06822 Guidance is All You Need: Temperature-Guided Reasoning in Large Language Models]
  
==Inference-time Gradient==
+
===Inference-time Gradient===
 
* 2024-11: [https://ekinakyurek.github.io/papers/ttt.pdf The Surprising Effectiveness of Test-Time Training for Abstract Reasoning] ([https://github.com/ekinakyurek/marc code])
 
* 2024-11: [https://ekinakyurek.github.io/papers/ttt.pdf The Surprising Effectiveness of Test-Time Training for Abstract Reasoning] ([https://github.com/ekinakyurek/marc code])
  
==Self-prompting==
+
===Self-prompting===
 
* 2023-05: [https://arxiv.org/abs/2305.09993 Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling]
 
* 2023-05: [https://arxiv.org/abs/2305.09993 Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling]
 
* 2023-11: [https://arxiv.org/abs/2311.04205 Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves]
 
* 2023-11: [https://arxiv.org/abs/2311.04205 Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves]
  
==Retrieval or Memory==
+
===Retrieval or Memory===
 
* 2024-12: [https://arxiv.org/abs/2412.13781 Meta-Reflection: A Feedback-Free Reflection Learning Framework]
 
* 2024-12: [https://arxiv.org/abs/2412.13781 Meta-Reflection: A Feedback-Free Reflection Learning Framework]
  
==In-context thought==
+
===In-context thought===
 
* 2022-01: [https://arxiv.org/abs/2201.11903 Chain-of-Thought Prompting Elicits Reasoning in Large Language Models] (Google Brain)
 
* 2022-01: [https://arxiv.org/abs/2201.11903 Chain-of-Thought Prompting Elicits Reasoning in Large Language Models] (Google Brain)
 
* 2023-05: [https://arxiv.org/abs/2305.10601 Tree of Thoughts: Deliberate Problem Solving with Large Language Models] (Google DeepMind)
 
* 2023-05: [https://arxiv.org/abs/2305.10601 Tree of Thoughts: Deliberate Problem Solving with Large Language Models] (Google DeepMind)
Line 101: Line 108:
 
* 2024-10: [https://arxiv.org/abs/2410.06634 Tree of Problems: Improving structured problem solving with compositionality]
 
* 2024-10: [https://arxiv.org/abs/2410.06634 Tree of Problems: Improving structured problem solving with compositionality]
 
* 2023-01/2024-10: [https://arxiv.org/abs/2301.00234 A Survey on In-context Learning]
 
* 2023-01/2024-10: [https://arxiv.org/abs/2301.00234 A Survey on In-context Learning]
 +
* 2025-01: [https://arxiv.org/abs/2501.04682 Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought]
  
==Naive multi-LLM (verification, majority voting, best-of-N, etc.)==
+
===Naive multi-LLM (verification, majority voting, best-of-N, etc.)===
 
* 2023-06: [https://arxiv.org/abs/2306.02561 LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion] ([https://github.com/yuchenlin/LLM-Blender?tab=readme-ov-file code])
 
* 2023-06: [https://arxiv.org/abs/2306.02561 LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion] ([https://github.com/yuchenlin/LLM-Blender?tab=readme-ov-file code])
 
* 2023-12: [https://aclanthology.org/2023.findings-emnlp.203/ Dynamic Voting for Efficient Reasoning in Large Language Models]
 
* 2023-12: [https://aclanthology.org/2023.findings-emnlp.203/ Dynamic Voting for Efficient Reasoning in Large Language Models]
Line 110: Line 118:
 
* 2024-12: [https://github.com/irthomasthomas/llm-consortium llm-consortium]: Multiple LLMs collaboratively solve problems through structured dialogue, evaluation and arbitration
 
* 2024-12: [https://github.com/irthomasthomas/llm-consortium llm-consortium]: Multiple LLMs collaboratively solve problems through structured dialogue, evaluation and arbitration
  
==Multi-LLM (multiple comparisons, branching, etc.)==
+
===Multi-LLM (multiple comparisons, branching, etc.)===
 
* 2024-10: [https://arxiv.org/abs/2410.10630 Thinking LLMs: General Instruction Following with Thought Generation]
 
* 2024-10: [https://arxiv.org/abs/2410.10630 Thinking LLMs: General Instruction Following with Thought Generation]
 
* 2024-11: [https://arxiv.org/abs/2411.02830 Mixtures of In-Context Learners]: Multiple "experts", each with a different set of in-context examples; combine outputs at the level of next-token-prediction
 
* 2024-11: [https://arxiv.org/abs/2411.02830 Mixtures of In-Context Learners]: Multiple "experts", each with a different set of in-context examples; combine outputs at the level of next-token-prediction
 
* 2024-11: [https://arxiv.org/abs/2411.10440 LLaVA-o1: Let Vision Language Models Reason Step-by-Step] ([https://github.com/PKU-YuanGroup/LLaVA-o1 code])
 
* 2024-11: [https://arxiv.org/abs/2411.10440 LLaVA-o1: Let Vision Language Models Reason Step-by-Step] ([https://github.com/PKU-YuanGroup/LLaVA-o1 code])
  
==Iteration (e.g. neural-like layered blocks)==
+
===Iteration (e.g. neural-like layered blocks)===
 
* 2024-06: [https://arxiv.org/abs/2406.04692 Mixture-of-Agents Enhances Large Language Model Capabilities]
 
* 2024-06: [https://arxiv.org/abs/2406.04692 Mixture-of-Agents Enhances Large Language Model Capabilities]
  
==Iterative reasoning via graphs==
+
===Iterative reasoning via graphs===
 
* 2023-08: [https://arxiv.org/abs/2308.09687 Graph of Thoughts: Solving Elaborate Problems with Large Language Models]
 
* 2023-08: [https://arxiv.org/abs/2308.09687 Graph of Thoughts: Solving Elaborate Problems with Large Language Models]
 
* 2023-10: [https://arxiv.org/abs/2310.04363 Amortizing intractable inference in large language models] ([https://github.com/GFNOrg/gfn-lm-tuning code])
 
* 2023-10: [https://arxiv.org/abs/2310.04363 Amortizing intractable inference in large language models] ([https://github.com/GFNOrg/gfn-lm-tuning code])
 
* 2024-09: [https://arxiv.org/abs/2409.10038 On the Diagram of Thought]: Iterative reasoning as a directed acyclic graph (DAG)
 
* 2024-09: [https://arxiv.org/abs/2409.10038 On the Diagram of Thought]: Iterative reasoning as a directed acyclic graph (DAG)
  
==Monte Carlo Tree Search (MCTS)==
+
===Monte Carlo Tree Search (MCTS)===
 
* 2024-05: [https://arxiv.org/abs/2405.03553 AlphaMath Almost Zero: Process Supervision without Process]
 
* 2024-05: [https://arxiv.org/abs/2405.03553 AlphaMath Almost Zero: Process Supervision without Process]
 
* 2024-06: [https://arxiv.org/abs/2406.03816 ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search]
 
* 2024-06: [https://arxiv.org/abs/2406.03816 ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search]
Line 132: Line 140:
 
* 2024-12: [https://arxiv.org/abs/2412.18319 Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search]
 
* 2024-12: [https://arxiv.org/abs/2412.18319 Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search]
  
==Other Search==
+
===Other Search===
 
* 2024-11: [https://arxiv.org/abs/2411.05010 Scattered Forest Search: Smarter Code Space Exploration with LLMs]
 
* 2024-11: [https://arxiv.org/abs/2411.05010 Scattered Forest Search: Smarter Code Space Exploration with LLMs]
  
==Chain-of-Thought Reasoning==
+
===Chain-of-Thought Reasoning===
 
* 2017-05: [https://arxiv.org/abs/1705.04146 Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems]
 
* 2017-05: [https://arxiv.org/abs/1705.04146 Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems]
 
* 2021-11: [https://arxiv.org/abs/2110.14168 Training Verifiers to Solve Math Word Problems]
 
* 2021-11: [https://arxiv.org/abs/2110.14168 Training Verifiers to Solve Math Word Problems]
 
* 2024-02: [https://arxiv.org/abs/2402.10200 Chain-of-Thought Reasoning Without Prompting]
 
* 2024-02: [https://arxiv.org/abs/2402.10200 Chain-of-Thought Reasoning Without Prompting]
  
==Scaling==
+
==Analysis==
 +
===Scaling===
 
* 2021-04: [https://arxiv.org/abs/2104.03113 Scaling Scaling Laws with Board Games]
 
* 2021-04: [https://arxiv.org/abs/2104.03113 Scaling Scaling Laws with Board Games]
 
* 2024-03: [https://arxiv.org/abs/2403.02419 Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems]
 
* 2024-03: [https://arxiv.org/abs/2403.02419 Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems]
Line 150: Line 159:
 
* 2024-11: [https://arxiv.org/abs/2411.17501 Inference Scaling FLaws: The Limits of LLM Resampling with Imperfect Verifiers]
 
* 2024-11: [https://arxiv.org/abs/2411.17501 Inference Scaling FLaws: The Limits of LLM Resampling with Imperfect Verifiers]
  
==Theory==
+
===Theory===
 
* 2024-02: [https://arxiv.org/abs/2402.12875 Chain of Thought Empowers Transformers to Solve Inherently Serial Problems]
 
* 2024-02: [https://arxiv.org/abs/2402.12875 Chain of Thought Empowers Transformers to Solve Inherently Serial Problems]
  
==Expending compute works==
+
===Expending compute works===
 
* 2024-06-10: Blog post (opinion): [https://yellow-apartment-148.notion.site/AI-Search-The-Bitter-er-Lesson-44c11acd27294f4495c3de778cd09c8d AI Search: The Bitter-er Lesson]
 
* 2024-06-10: Blog post (opinion): [https://yellow-apartment-148.notion.site/AI-Search-The-Bitter-er-Lesson-44c11acd27294f4495c3de778cd09c8d AI Search: The Bitter-er Lesson]
 
* 2024-07-17: Blog post (test): [https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt Getting 50% (SoTA) on ARC-AGI with GPT-4o]
 
* 2024-07-17: Blog post (test): [https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt Getting 50% (SoTA) on ARC-AGI with GPT-4o]
Line 160: Line 169:
 
* 2024-09-16: [https://www.oneusefulthing.org/p/scaling-the-state-of-play-in-ai Scaling: The State of Play in AI]
 
* 2024-09-16: [https://www.oneusefulthing.org/p/scaling-the-state-of-play-in-ai Scaling: The State of Play in AI]
  
==Code for Inference-time Compute==
+
===Pitfalls===
 +
* 2024-12: [https://arxiv.org/abs/2412.21187 Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs]
 +
 
 +
==Pragmatics==
 +
===Code for Inference-time Compute===
 
* [https://github.com/codelion/optillm optillm]: Inference proxy which implements state-of-the-art techniques to improve accuracy and performance of LLMs (improve reasoning over coding, logical and mathematical queries)
 
* [https://github.com/codelion/optillm optillm]: Inference proxy which implements state-of-the-art techniques to improve accuracy and performance of LLMs (improve reasoning over coding, logical and mathematical queries)
  
Line 168: Line 181:
 
=Tool Use=
 
=Tool Use=
 
* 2024-11: [https://arxiv.org/abs/2411.01747 DynaSaur: Large Language Agents Beyond Predefined Actions]: writes functions/code to increase capabilities
 
* 2024-11: [https://arxiv.org/abs/2411.01747 DynaSaur: Large Language Agents Beyond Predefined Actions]: writes functions/code to increase capabilities
 +
==Integrated==
 +
* 2018-08: [https://arxiv.org/abs/1808.00508 Neural Arithmetic Logic Units]
 +
* 2023-01: [https://arxiv.org/abs/2301.05062 Tracr: Compiled Transformers as a Laboratory for Interpretability] ([https://github.com/google-deepmind/tracr code])
 +
* 2024-05: [https://openreview.net/pdf?id=W77TygnBN5 Augmenting Language Models with Composable Differentiable Libraries] ([https://openreview.net/pdf/0ab6ab86a6adf52751f35b725056d5011ecc575d.pdf pdf])
 +
* 2024-07: [https://arxiv.org/abs/2407.04899 Algorithmic Language Models with Neurally Compiled Libraries]
 +
* 2024-10: [https://arxiv.org/abs/2410.18077 ALTA: Compiler-Based Analysis of Transformers]
  
 
=Multi-agent Effort (and Emergent Intelligence)=
 
=Multi-agent Effort (and Emergent Intelligence)=
Line 180: Line 199:
 
* 2024-06: [https://arxiv.org/abs/2406.07496 TextGrad: Automatic "Differentiation" via Text] (gradient backpropagation through text)
 
* 2024-06: [https://arxiv.org/abs/2406.07496 TextGrad: Automatic "Differentiation" via Text] (gradient backpropagation through text)
 
* 2024-06: [https://arxiv.org/abs/2406.18532 Symbolic Learning Enables Self-Evolving Agents] (optimize LLM frameworks)
 
* 2024-06: [https://arxiv.org/abs/2406.18532 Symbolic Learning Enables Self-Evolving Agents] (optimize LLM frameworks)
 +
 +
=See Also=
 +
* [[AI]]
 +
* [[AI Agents]]
 +
* [[AI research trends]]

Latest revision as of 09:08, 15 January 2025

Reviews

Prompt Engineering

Fine Tuning

Proactive Search

Compute expended after training, but before inference.

Training Data (Data Refinement, Synthetic Data)

Generate consistent plans/thoughts

  • 2024-08: Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers (code)
    • (Microsoft) rStar is a self-play mutual reasoning approach. A small model augments MCTS with a defined set of reasoning heuristics, and trajectories that are mutually consistent are then emphasized.
  • 2024-09: Self-Harmonized Chain of Thought
    • Produce refined chain-of-thought style solutions/prompts for diverse problems. Given a large set of problems/questions, first aggregate them semantically, then apply zero-shot chain-of-thought to each problem. Then cross-pollinate between proposed solutions to similar problems, looking for refined and generalized solutions.
  • 2024-11: LLMs Do Not Think Step-by-step In Implicit Reasoning
    • They argue that models trained to reproduce CoT outputs do not, internally, perform stepwise reasoning (with intermediate representations); this suggests that explicit CoT could be superior to implicit CoT.
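
The aggregate / zero-shot / cross-pollinate loop described for Self-Harmonized Chain of Thought can be sketched roughly as follows. This is only an illustrative sketch, not the paper's implementation: `self_harmonize`, `cluster_of`, `solve`, `toy_solve`, and the `rounds` parameter are hypothetical names introduced here, and the model call is replaced by a stub so the example runs on its own.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def self_harmonize(
    problems: List[str],
    cluster_of: Callable[[str], str],
    solve: Callable[[str, List[str]], str],
    rounds: int = 2,
) -> Dict[str, str]:
    """Hypothetical sketch: group problems, draft zero-shot solutions, then
    refine each solution using its cluster-mates' solutions as demonstrations."""
    # Step 1: aggregate problems semantically (here: by a caller-supplied key;
    # a real system would more likely cluster embeddings).
    clusters: Dict[str, List[str]] = defaultdict(list)
    for p in problems:
        clusters[cluster_of(p)].append(p)

    # Step 2: zero-shot chain-of-thought, i.e. solve with no demonstrations.
    solutions = {p: solve(p, []) for p in problems}

    # Step 3: cross-pollinate within each cluster for a few rounds. Updates are
    # applied in place, so later problems see already-refined solutions.
    for _ in range(rounds):
        for members in clusters.values():
            for p in members:
                demos = [solutions[q] for q in members if q != p]
                solutions[p] = solve(p, demos)
    return solutions


def toy_solve(problem: str, demos: List[str]) -> str:
    """Stand-in for an LLM call (assumption): reports how many demos it saw."""
    return f"{problem} | demos={len(demos)}"


result = self_harmonize(["a1", "a2", "b1"], lambda p: p[0], toy_solve)
```

In practice `solve` would wrap a model call that places the demonstrations in the prompt, and `cluster_of` would come from a semantic clustering step; both are left abstract here.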

Sampling

Automated prompt generation

Distill inference-time-compute into model

CoT reasoning model

Scaling

Inference Time Compute

Methods

Review

In context learning (ICL), search, and other inference-time methods

Inference-time Sampling

Inference-time Gradient

Self-prompting

Retrieval or Memory

In-context thought

Naive multi-LLM (verification, majority voting, best-of-N, etc.)

Multi-LLM (multiple comparisons, branching, etc.)

Iteration (e.g. neural-like layered blocks)

Iterative reasoning via graphs

Monte Carlo Tree Search (MCTS)

Other Search

Chain-of-Thought Reasoning

Analysis

Scaling

Theory

Expending compute works

(Figure: Compute.png)

Pitfalls

Pragmatics

Code for Inference-time Compute

  • optillm: Inference proxy which implements state-of-the-art techniques to improve accuracy and performance of LLMs (improve reasoning over coding, logical and mathematical queries)

Memory

Tool Use

Integrated

Multi-agent Effort (and Emergent Intelligence)

ML-like Optimization of LLM Setup

See Also