=Reviews=
 
* 2024-12: [https://arxiv.org/abs/2412.11936 A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges]
* 2025-01: [https://arxiv.org/abs/2501.02497 Test-time Computing: from System-1 Thinking to System-2 Thinking] ([https://github.com/Dereck0602/Awesome_Test_Time_LLMs github list of papers])
* 2025-01: [https://arxiv.org/abs/2501.11223 Reasoning Language Models: A Blueprint]
* 2025-02: [https://arxiv.org/abs/2502.03671 Advancing Reasoning in Large Language Models: Promising Methods and Approaches]
* 2025-02: [https://arxiv.org/abs/2502.09100 Logical Reasoning in Large Language Models: A Survey]
* 2025-02: [https://arxiv.org/abs/2502.21321 LLM Post-Training: A Deep Dive into Reasoning Large Language Models]
* 2025-03: [https://arxiv.org/abs/2503.24377 Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models]
* Links to papers: [https://github.com/hijkzzz/Awesome-LLM-Strawberry Awesome LLM Strawberry (OpenAI o1)]

===World Model===
* 2025-03: [https://arxiv.org/abs/2503.04641 Simulating the Real World: A Unified Survey of Multimodal Generative Models]
 
=Prompt Engineering=
* 2024-11: [https://arxiv.org/abs/2411.05778 LLMs as Method Actors: A Model for Prompt Engineering and Architecture]

==Thought Templates==
* 2024-06: [https://arxiv.org/abs/2406.04271 Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models]
* 2025-02: [https://arxiv.org/abs/2502.06772 ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates]

==Automatic Prompt Optimization==
* 2023-09: [https://arxiv.org/abs/2309.16797 Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution]
* 2025-02: [https://arxiv.org/abs/2502.16923 A Systematic Survey of Automatic Prompt Optimization Techniques]
* 2025-02: [https://arxiv.org/abs/2502.18746 Automatic Prompt Optimization via Heuristic Search: A Survey]
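
To make the shared mechanism concrete, here is a minimal Python sketch of the evolutionary loop that methods like Promptbreeder build on (omitting their self-referential evolution of the mutation prompts themselves). The <code>llm()</code> call, the scoring scheme, and the hyperparameters are illustrative placeholders, not any paper's actual implementation.

<pre>
import random

def llm(prompt: str) -> str:
    """Placeholder for a call to any chat/completions endpoint."""
    raise NotImplementedError

def score(task_prompt, dev_set):
    """Fraction of dev examples answered correctly under this prompt."""
    hits = sum(answer in llm(task_prompt + "\n\n" + question)
               for question, answer in dev_set)
    return hits / len(dev_set)

def evolve_prompts(seed_prompts, dev_set, generations=10, pop_size=8):
    population = list(seed_prompts)
    for _ in range(generations):
        ranked = sorted(population, key=lambda p: score(p, dev_set), reverse=True)
        survivors = ranked[: pop_size // 2]          # truncation selection
        children = [llm("Rewrite this task prompt to be clearer and more "
                        "effective:\n\n" + random.choice(survivors))
                    for _ in range(pop_size - len(survivors))]  # LLM as mutation operator
        population = survivors + children
    return max(population, key=lambda p: score(p, dev_set))
</pre>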
  
 
=Fine Tuning=
* 2024-12: [https://arxiv.org/abs/2412.15287 Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models]
* 2025-01: [https://arxiv.org/abs/2501.01702 AgentRefine: Enhancing Agent Generalization through Refinement Tuning]
* 2025-01: [https://llm-multiagent-ft.github.io/ Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains] ([https://github.com/vsubramaniam851/multiagent-ft/tree/main code])
  
 
=Proactive Search=
Compute expended after training, but before inference.

===Reinforcement Learning===
* 2025-04: DeepSeek: [https://arxiv.org/abs/2504.02495 Inference-Time Scaling for Generalist Reward Modeling]
  
 
===Training Data (Data Refinement, Synthetic Data)===
 
* 2024-09: [https://arxiv.org/abs/2409.17115 Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale]
* 2024-10: [https://arxiv.org/abs/2410.15547 Data Cleaning Using Large Language Models]
* 2025-01: [https://arxiv.org/abs/2501.18845 Text Data Augmentation for Large Language Models: A Comprehensive Survey of Methods, Challenges, and Opportunities]
* 2025-02: [https://arxiv.org/abs/2502.01718 ACECODER: Acing Coder RL via Automated Test-Case Synthesis]
* 2025-02: [https://arxiv.org/abs/2502.15588 Improving the Scaling Laws of Synthetic Data with Deliberate Practice]
* 2025-03: [https://arxiv.org/abs/2503.19551 Scaling Laws of Synthetic Data for Language Models]
* 2025-03: [https://arxiv.org/abs/2503.18866 Reasoning to Learn from Latent Thoughts]: infer the (latent) thoughts that would have led to the training documents, so that one can pretrain on text+thoughts
* Updating list of links: [https://github.com/wasiahmad/Awesome-LLM-Synthetic-Data Synthetic Data of LLMs, by LLMs, for LLMs]
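
The "latent thoughts" idea above is straightforward to state in code: for each raw document, ask a model to write the reasoning that could have produced it, then pretrain on the concatenation. A minimal sketch follows; the <code>llm()</code> call and the delimiter format are illustrative assumptions, not the paper's implementation.

<pre>
def llm(prompt: str) -> str:
    """Placeholder for a call to any LLM endpoint."""
    raise NotImplementedError

def augment_with_latent_thoughts(documents):
    """Synthesize the (latent) reasoning behind each document, and emit
    thought+text sequences suitable for pretraining."""
    augmented = []
    for doc in documents:
        thought = llm(
            "Write the background knowledge and step-by-step reasoning that "
            "an author would have used to produce the following text:\n\n" + doc)
        # The delimiters are arbitrary here; the real format is a design choice.
        augmented.append("<thought>\n" + thought + "\n</thought>\n" + doc)
    return augmented
</pre>
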
====Re-captioning====
* 2023-10: [https://arxiv.org/abs/2310.16656 A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation]
* 2024-07: [https://arxiv.org/abs/2407.06723 Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions]
  
 
===Generate consistent plans/thoughts===
* 2024-08: Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
** (Microsoft) rStar is a self-play mutual reasoning approach: a small model augments MCTS using defined reasoning heuristics, and mutually-consistent trajectories are emphasized.
* 2024-09: Self-Harmonized Chain of Thought
** Produces refined chain-of-thought solutions/prompts for diverse problems: given a large set of problems/questions, first aggregate them semantically, then apply zero-shot chain-of-thought to each problem, then cross-pollinate between proposed solutions to similar problems, looking for refined and generalized solutions.
* 2024-11: LLMs Do Not Think Step-by-step In Implicit Reasoning
** Argues that models trained to reproduce CoT outputs do not internally perform stepwise reasoning (with intermediate representations); this suggests that explicit CoT may be superior to implicit CoT.
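
rStar's full algorithm couples MCTS rollouts with a second discriminator model; the kernel of the mutual-consistency idea can be sketched far more simply. Below, <code>gen_a</code> and <code>gen_b</code> are hypothetical sampling functions for two different models; this is a simplification for illustration, not the paper's method.

<pre>
from collections import Counter

def mutually_consistent_answer(gen_a, gen_b, prompt, n=8):
    """Prefer answers that two different models independently agree on."""
    answers_a = [gen_a(prompt) for _ in range(n)]
    answers_b = {gen_b(prompt) for _ in range(n)}
    agreed = [a for a in answers_a if a in answers_b]
    pool = agreed if agreed else answers_a  # fall back to single-model majority
    return Counter(pool).most_common(1)[0][0]
</pre>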
  
 
====CoT reasoning model====
See also: [[AI tools]] > LLM > Open-weights LLM > [[AI_tools#Reasoning|Reasoning]]
* 2024-09: [https://openai.com/o1/ OpenAI o1]
* 2024-10: [https://github.com/GAIR-NLP/O1-Journey/blob/main/resource/report.pdf O1 Replication Journey: A Strategic Progress Report – Part 1] ([https://github.com/GAIR-NLP/O1-Journey code]): Attempt by [https://gair-nlp.github.io/walnut-plan/ Walnut Plan] to reproduce o1-like in-context reasoning
 
* 2024-11: [https://arxiv.org/abs/2411.14405 Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions]
* 2024-11: [https://huggingface.co/papers/2411.16489 O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?]
* 2024-11: [https://arxiv.org/abs/2411.15124 Tulu 3: Pushing Frontiers in Open Language Model Post-Training]
* 2024-12: [https://arxiv.org/abs/2412.00154 o1-Coder: an o1 Replication for Coding] ([https://github.com/ADaM-BJTU/O1-CODER code])
* 2024-12: [https://arxiv.org/abs/2412.18319 Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search]
* 2024-12: [https://arxiv.org/abs/2412.14135 Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective]
* 2025-01: [https://arxiv.org/abs/2501.01904 Virgo: A Preliminary Exploration on Reproducing o1-like MLLM]
* 2025-01: [https://arxiv.org/abs/2501.06458 O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning]
* 2025-01: [https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning]
* 2025-01: [https://github.com/MoonshotAI/Kimi-k1.5/blob/main/Kimi_k1.5.pdf Kimi k1.5: Scaling Reinforcement Learning with LLMs]
* 2025-01: [https://arxiv.org/abs/2501.11223 Reasoning Language Models: A Blueprint]
* 2025-01: [https://huggingface.co/blog/open-r1 Open-R1: a fully open reproduction of DeepSeek-R1]
* 2025-02: [https://arxiv.org/abs/2502.03373 Demystifying Long Chain-of-Thought Reasoning in LLMs]
* 2025-02: [https://arxiv.org/abs/2502.05171 Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach] ([https://huggingface.co/tomg-group-umd/huginn-0125 Huginn-0125])
* 2025-02: [https://arxiv.org/abs/2502.20339 Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners]
  
 
===Scaling===
* 2024-08: [https://arxiv.org/abs/2408.16737 Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling] (Google DeepMind)
* 2024-11: [https://arxiv.org/abs/2411.04434 Scaling Laws for Pre-training Agents and World Models]
* 2025-02: [https://arxiv.org/abs/2502.20339 Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners]
* 2025-03: [https://arxiv.org/abs/2503.10061 Compute Optimal Scaling of Skills: Knowledge vs Reasoning]
  
 
=Inference Time Compute=
 
'''Review'''
* 2024-06: [https://arxiv.org/abs/2406.16838 From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models]
* 2025-01: [https://arxiv.org/abs/2501.02497 Test-time Computing: from System-1 Thinking to System-2 Thinking] ([https://github.com/Dereck0602/Awesome_Test_Time_LLMs github list of papers])
  
 
===In context learning (ICL), search, and other inference-time methods===
 
* 2024-10: [https://arxiv.org/abs/2410.06634 Tree of Problems: Improving structured problem solving with compositionality]
* 2023-01/2024-10: [https://arxiv.org/abs/2301.00234 A Survey on In-context Learning]
* 2025-01: [https://arxiv.org/abs/2501.04682 Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought]
* [https://x.com/dav1d_bai/status/1904057766593138841 2025-03]: [https://optimal-test-time.vercel.app/papers/accuracy-efficiency-tradeoffs Interruption is All You Need: Improving Reasoning Model Refusal Rates through measuring Parallel Reasoning Diversity]: reducing hallucinations by measuring the diversity of parallel reasoning traces
  
===Naive multi-LLM (verification, self-critique, majority voting, best-of-N, etc.)===
 
* 2023-06: [https://arxiv.org/abs/2306.02561 LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion] ([https://github.com/yuchenlin/LLM-Blender?tab=readme-ov-file code])
* 2023-12: [https://aclanthology.org/2023.findings-emnlp.203/ Dynamic Voting for Efficient Reasoning in Large Language Models]
* 2024-11: [https://arxiv.org/abs/2411.00492 Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models]
* 2024-12: [https://github.com/irthomasthomas/llm-consortium llm-consortium]: Multiple LLMs collaboratively solve problems through structured dialogue, evaluation, and arbitration
* 2025-02: [https://arxiv.org/abs/2502.04506 When One LLM Drools, Multi-LLM Collaboration Rules]
* 2025-03: [https://arxiv.org/abs/2502.01839 Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification]
* 2025-03: [https://arxiv.org/abs/2503.17363 Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique]
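
As a concrete illustration of the simplest of these schemes, here is a minimal Python sketch of best-of-N selection and self-consistency majority voting. The <code>generate()</code> and <code>verify_score()</code> functions stand in for any sampled LLM call and any verifier (reward model, unit tests, etc.); they are assumptions for illustration, not a specific paper's implementation.

<pre>
from collections import Counter

def generate(prompt: str) -> str:
    """Placeholder: one sampled (temperature > 0) completion from any LLM."""
    raise NotImplementedError

def verify_score(prompt: str, answer: str) -> float:
    """Placeholder: verifier score (reward model, unit tests, etc.)."""
    raise NotImplementedError

def best_of_n(prompt: str, n: int = 16) -> str:
    """Sample n candidates; return the one the verifier likes best."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda a: verify_score(prompt, a))

def majority_vote(prompt: str, n: int = 16) -> str:
    """Self-consistency: sample n answers; return the most common one."""
    candidates = [generate(prompt) for _ in range(n)]
    return Counter(candidates).most_common(1)[0][0]
</pre>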
  
 
===Multi-LLM (multiple comparisons, branching, etc.)===
 
===Chain-of-Thought Reasoning===
* 2021-11: [https://arxiv.org/abs/2110.14168 Training Verifiers to Solve Math Word Problems]
* 2024-02: [https://arxiv.org/abs/2402.10200 Chain-of-Thought Reasoning Without Prompting]
* 2025-01: [https://arxiv.org/abs/2501.19393 s1: Simple test-time scaling]
* 2025-02: [https://arxiv.org/abs/2502.04404 Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models]
* 2025-02: [https://arxiv.org/abs/2502.06773 On the Emergence of Thinking in LLMs I: Searching for the Right Intuition]
* 2025-02: [https://arxiv.org/abs/2502.06807 Competitive Programming with Large Reasoning Models]
* 2025-02: [https://arxiv.org/abs/2502.18600 Chain of Draft: Thinking Faster by Writing Less]
* 2025-03: [https://arxiv.org/abs/2503.17352 OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement] ([https://github.com/yihedeng9/OpenVLThinker code])
* 2025-03: [https://arxiv.org/abs/2503.19877 Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators]
* 2025-03: [https://arxiv.org/abs/2503.23513 RARE: Retrieval-Augmented Reasoning Modeling]
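
Several of these papers (s1 most explicitly) steer reasoning length by intervening directly in the decoding loop. Here is a rough Python sketch of s1-style "budget forcing"; the <code>llm_continue()</code> API and the thinking-block delimiters are assumptions for illustration, not the paper's exact tokens.

<pre>
def llm_continue(text, stop, max_tokens):
    """Placeholder: continue `text`, halting at `stop` or after max_tokens."""
    raise NotImplementedError

def generate_with_budget(prompt, extensions=1, budget=2048):
    ctx = prompt + "\n<think>\n"  # assumed start-of-thinking delimiter
    thinking = llm_continue(ctx, stop="</think>", max_tokens=budget)
    for _ in range(extensions):
        # Suppress end-of-thinking and append "Wait" so the model keeps
        # reasoning; in practice it often re-checks and fixes its draft answer.
        thinking += "\nWait,"
        thinking += llm_continue(ctx + thinking, stop="</think>", max_tokens=budget)
    # Close the thinking block ourselves to cap compute, then elicit the answer.
    return llm_continue(ctx + thinking + "\n</think>\nFinal answer:",
                        stop=None, max_tokens=256)
</pre>
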
===Model Merging===
* 2025-01: [https://arxiv.org/abs/2501.12599 Kimi k1.5: Scaling Reinforcement Learning with LLMs]
* 2025-03: [https://arxiv.org/abs/2503.20641 Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging] ([https://github.com/hahahawu/Long-to-Short-via-Model-Merging code])
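
The simplest scheme these merging papers build on is plain weight-space interpolation between two fine-tunes of the same base model. A minimal PyTorch sketch, assuming identical architectures; the checkpoint paths and mixing weight are placeholders.

<pre>
import torch

def merge_linear(state_a: dict, state_b: dict, alpha: float = 0.5) -> dict:
    """Elementwise interpolation: theta = alpha * A + (1 - alpha) * B."""
    assert state_a.keys() == state_b.keys(), "models must share an architecture"
    return {k: alpha * state_a[k] + (1 - alpha) * state_b[k] for k in state_a}

# Hypothetical usage: blend a long-CoT reasoner with a concise instruct model.
# sd_long = torch.load("long_cot_model.pt")     # placeholder paths
# sd_short = torch.load("short_model.pt")
# torch.save(merge_linear(sd_long, sd_short, alpha=0.3), "merged_model.pt")
</pre>
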
===Meta-methods===
* 2025-02: [https://arxiv.org/abs/2502.12018 Atom of Thoughts for Markov LLM Test-Time Scaling] ([https://github.com/qixucen/atom code])
  
 
==Analysis==
 
* 2024-10: (comparing fine-tuning to in-context learning) [https://arxiv.org/abs/2405.19874 Is In-Context Learning Sufficient for Instruction Following in LLMs?]
* 2024-11: [https://arxiv.org/abs/2411.17501 Inference Scaling FLaws: The Limits of LLM Resampling with Imperfect Verifiers]
* 2025-02: [https://www.arxiv.org/abs/2502.08606 Distillation Scaling Laws]
* 2025-03: [https://arxiv.org/abs/2503.10061 Compute Optimal Scaling of Skills: Knowledge vs Reasoning]
* 2025-04: [https://arxiv.org/abs/2504.00294 Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead]

====(Optimal) Usage of Reasoning Compute====
* 2024-12: [https://arxiv.org/abs/2412.21187 Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs]
* 2025-01: [https://arxiv.org/abs/2501.18585 Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs]
* 2025-02: [https://www.arxiv.org/abs/2502.04463 Training Language Models to Reason Efficiently]
* 2025-02: [https://arxiv.org/abs/2502.08235 The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks]
* 2025-03: [https://arxiv.org/abs/2503.01141 How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach]
* 2025-03: [https://arxiv.org/abs/2503.16419 Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models]

====Usage of Training Data====
* 2025-02: [https://arxiv.org/abs/2502.03387 LIMO: Less is More for Reasoning] (surprisingly easy generalization from very few reasoning training examples; a model can go from knowledge-retrieval to diverse reasoning using curated examples)
  
 
===Theory===

====Expending compute works====
[[Image:Compute.png|600px]]
* 2024-09-16: [https://www.oneusefulthing.org/p/scaling-the-state-of-play-in-ai Scaling: The State of Play in AI]
* 2025-02-03: [https://arxiv.org/abs/2502.06807 Competitive Programming with Large Reasoning Models]
 
==Pragmatics==

===Code for Inference-time Compute===
* [https://github.com/codelion/optillm optillm]: inference proxy which implements state-of-the-art techniques to improve the accuracy and performance of LLMs (improving reasoning for coding, logic, and math queries)

=Interact with Environment=
* 2025-01: [https://arxiv.org/abs/2501.10893 Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments]
  
 
=Memory=
 
=Tool Use=
* 2024-11: [https://arxiv.org/abs/2411.01747 DynaSaur: Large Language Agents Beyond Predefined Actions]: writes functions/code to increase capabilities

==Integrated==
* 2018-08: [https://arxiv.org/abs/1808.00508 Neural Arithmetic Logic Units]
* 2023-01: [https://arxiv.org/abs/2301.05062 Tracr: Compiled Transformers as a Laboratory for Interpretability] ([https://github.com/google-deepmind/tracr code])
* 2024-05: [https://openreview.net/pdf?id=W77TygnBN5 Augmenting Language Models with Composable Differentiable Libraries] ([https://openreview.net/pdf/0ab6ab86a6adf52751f35b725056d5011ecc575d.pdf pdf])
* 2024-07: [https://arxiv.org/abs/2407.04899 Algorithmic Language Models with Neurally Compiled Libraries]
* 2024-10: [https://arxiv.org/abs/2410.18077 ALTA: Compiler-Based Analysis of Transformers]
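
The first entry above is easy to show concretely: a NALU cell mixes an additive accumulator with a multiplicative (log-space) path via a learned gate, biasing weights toward {-1, 0, +1} so that learned arithmetic extrapolates. A compact PyTorch sketch of the published formulation (the initialization scale is an illustrative choice):

<pre>
import torch
import torch.nn as nn

class NALU(nn.Module):
    """Neural Arithmetic Logic Unit (Trask et al., 2018), compact sketch."""
    def __init__(self, in_dim: int, out_dim: int, eps: float = 1e-8):
        super().__init__()
        self.W_hat = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.M_hat = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.G = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.eps = eps

    def forward(self, x):
        # Weight matrix biased toward {-1, 0, +1}: exact add/subtract is learnable.
        W = torch.tanh(self.W_hat) * torch.sigmoid(self.M_hat)
        add = x @ W.t()                                              # additive (NAC) path
        mul = torch.exp(torch.log(torch.abs(x) + self.eps) @ W.t())  # multiplicative path
        g = torch.sigmoid(x @ self.G.t())                            # learned gate
        return g * add + (1 - g) * mul
</pre>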
  
 
=Multi-agent Effort (and Emergent Intelligence)=
* 2024-10: [https://arxiv.org/abs/2410.19318 Two are better than one: Context window extension with multi-grained self-injection]
* 2024-11: [https://arxiv.org/abs/2411.00114 Project Sid: Many-agent simulations toward AI civilization]
* 2025-01: [https://arxiv.org/abs/2501.13946 Hallucination Mitigation using Agentic AI Natural Language-Based Frameworks]
* 2025-02: [https://arxiv.org/abs/2502.16111 PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving]
  
 
=ML-like Optimization of LLM Setup=
* 2023-05: [https://arxiv.org/abs/2305.03495 Automatic Prompt Optimization with "Gradient Descent" and Beam Search]
* 2023-10: [https://arxiv.org/abs/2310.03714 DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines] ([https://github.com/stanfordnlp/dspy code]: programming, not prompting, foundation models)
* 2024-06: [https://arxiv.org/abs/2406.07496 TextGrad: Automatic "Differentiation" via Text] (gradient backpropagation through text, analogous to gradient descent)
* 2024-06: [https://arxiv.org/abs/2406.18532 Symbolic Learning Enables Self-Evolving Agents] (optimize LLM frameworks)
* 2025-03: [https://www.nature.com/articles/s41586-025-08661-4 Optimizing generative AI by backpropagating language model feedback]
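
These systems share a common loop: run the pipeline, have an LLM critique the failures (the textual "gradient"), and apply the critique as an update to the prompt or program. A minimal Python sketch of one such step; <code>llm()</code> is a placeholder, and this is not the DSPy or TextGrad API.

<pre>
def llm(prompt: str) -> str:
    """Placeholder for a call to any LLM endpoint."""
    raise NotImplementedError

def textual_gradient_step(system_prompt, examples):
    """One 'backpropagation through text' step: critique failures, then revise."""
    failures = []
    for question, expected in examples:
        output = llm(system_prompt + "\n\n" + question)
        if expected not in output:
            failures.append((question, expected, output))
    if not failures:
        return system_prompt  # nothing to improve on this batch
    critique = llm(  # the textual 'gradient'
        "The prompt below produced wrong answers on the listed cases.\n"
        f"Prompt: {system_prompt}\nFailures: {failures}\n"
        "Describe, concretely, how the prompt should change.")
    return llm(  # the 'update' step
        f"Rewrite the prompt to address the critique.\nPrompt: {system_prompt}\n"
        f"Critique: {critique}\nReturn only the revised prompt.")
</pre>
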
=Limitations/Requirements=
* Fluid intelligence (c.f. [https://arcprize.org/arc ARC AGI])
* 2024-06: [https://arxiv.org/abs/2406.04268 Open-Endedness is Essential for Artificial Superhuman Intelligence]

==Creativity==
See: [[AI creativity]]

=See Also=
* [[AI]]
* [[AI Agents]]
* [[AI research trends]]
