Difference between revisions of "Increasing AI Intelligence"

From GISAXS
(CoT reasoning model)
Line 65:
  * 2025-01: [https://github.com/MoonshotAI/Kimi-k1.5/blob/main/Kimi_k1.5.pdf Kimi k1.5: Scaling Reinforcement Learning with LLMs]
  * 2025-01: [https://arxiv.org/abs/2501.11223 Reasoning Language Models: A Blueprint]
+ * 2025-01: [https://huggingface.co/blog/open-r1 Open-R1: a fully open reproduction of DeepSeek-R1]

  ===Scaling===

Revision as of 12:07, 29 January 2025

Reviews

Prompt Engineering

Fine Tuning

Proactive Search

Compute expended after training, but before inference.

Training Data (Data Refinement, Synthetic Data)

Generate consistent plans/thoughts

  • 2024-08: Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers (code)
    • (Microsoft) rStar is a self-play mutual reasoning approach. A small model augments MCTS with a defined set of reasoning heuristics to generate candidate trajectories; trajectories that are mutually consistent can then be emphasized.
  • 2024-09: Self-Harmonized Chain of Thought
    • Produce refined chain-of-thought style solutions/prompts for diverse problems. Given a large set of problems/questions, first cluster them semantically, then apply zero-shot chain-of-thought to each problem. Then cross-pollinate between proposed solutions to similar problems, looking for refined and generalized solutions.
  • 2024-11: LLMs Do Not Think Step-by-step In Implicit Reasoning
    • They argue that models trained to reproduce CoT outputs do not, internally, perform stepwise reasoning (with intermediate representations); this suggests that explicit CoT could be superior to implicit CoT.
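
The mutual-consistency idea in the rStar entry above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the rStar implementation: `mutually_consistent`, the `complete` callback (a stand-in for a second model), the `split` parameter, and the toy data are all hypothetical, and the MCTS step that would generate the candidate trajectories is omitted.

```python
from typing import Callable, List, Sequence

Trajectory = List[str]  # reasoning steps; the last element is the final answer


def mutually_consistent(
    trajectories: Sequence[Trajectory],
    complete: Callable[[Trajectory], str],
    split: float = 0.5,
) -> List[Trajectory]:
    """Keep trajectories whose final answer a second model reproduces
    when shown only the leading `split` fraction of the steps."""
    kept = []
    for traj in trajectories:
        if len(traj) < 2:
            continue  # nothing to withhold, so agreement cannot be checked
        # Always withhold at least the final answer from the second model.
        cut = min(len(traj) - 1, max(1, int(len(traj) * split)))
        if complete(traj[:cut]) == traj[-1]:
            kept.append(traj)
    return kept


def toy_second_model(prefix: Trajectory) -> str:
    # Hypothetical stand-in for a second LLM: it only reaches "4"
    # when the opening step frames the problem as addition.
    return "4" if "adding" in prefix[0] else "unknown"


candidates = [
    ["2 + 2 means adding two and two", "two plus two is four", "4"],
    ["2 + 2 means multiplying two by two", "two times two is four", "4"],
    ["2 + 2 means joining the digits", "22"],
]

consistent = mutually_consistent(candidates, toy_second_model)
print(consistent)
```

Only the first candidate survives here: the second reaches the right answer through reasoning the second model does not endorse, which is the kind of trajectory a consistency filter is meant to de-emphasize.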

Sampling

Automated prompt generation

Distill inference-time-compute into model

CoT reasoning model

See also: AI tools > LLM > Open-weights LLM > Reasoning

Scaling

Inference Time Compute

Methods

Review

In-context learning (ICL), search, and other inference-time methods

Inference-time Sampling

Inference-time Gradient

Self-prompting

Retrieval or Memory

In-context thought

Naive multi-LLM (verification, majority voting, best-of-N, etc.)
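
As a rough illustration of two of the methods named in this heading, the sketch below shows majority voting and best-of-N selection over a set of sampled answers. It assumes the answers have already been sampled from one or more models; `majority_vote`, `best_of_n`, and the toy verifier scores are hypothetical names and values chosen for the example, not taken from any particular paper or library.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer (ties go to the earliest seen)."""
    return Counter(answers).most_common(1)[0][0]


def best_of_n(answers: List[str], score: Callable[[str], float]) -> str:
    """Return the answer that a scoring function (e.g. a verifier) rates highest."""
    return max(answers, key=score)


# Toy data: five sampled answers to the same question (made up for illustration).
samples = ["42", "41", "42", "42", "40"]

# Hypothetical verifier scores; in practice these would come from a reward model.
toy_scores = {"42": 0.9, "41": 0.4, "40": 0.1}

voted = majority_vote(samples)
selected = best_of_n(samples, lambda a: toy_scores[a])
print(voted, selected)
```

Majority voting needs no extra model but only works when answers can be compared for equality; best-of-N can rank free-form outputs, at the cost of depending on the quality of the scorer.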

Multi-LLM (multiple comparisons, branching, etc.)

Iteration (e.g. neural-like layered blocks)

Iterative reasoning via graphs

Monte Carlo Tree Search (MCTS)

Other Search

Chain-of-Thought Reasoning

Analysis

Scaling

Theory

Expending compute works

[Figure: Compute.png]

Pitfalls

Pragmatics

Code for Inference-time Compute

  • optillm: Inference proxy which implements state-of-the-art techniques to improve accuracy and performance of LLMs (improve reasoning over coding, logical and mathematical queries)

Interact with Environment

Memory

Tool Use

Integrated

Multi-agent Effort (and Emergent Intelligence)

ML-like Optimization of LLM Setup

Limitations/Requirements

See Also