AI understanding

 
==Concepts==
 
* 2025-04: [https://arxiv.org/abs/2504.20938 Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition] ([https://github.com/OpenMOSS/Lorsa code])
* 2025-08: [https://transformer-circuits.pub/2025/attention-qk/index.html Tracing Attention Computation Through Feature Interactions]
 
==Mechanistic Interpretability==
 
* 2025-06: [https://arxiv.org/abs/2506.15679 Dense SAE Latents Are Features, Not Bugs]
* 2025-06: [https://arxiv.org/abs/2506.20790 Stochastic Parameter Decomposition] ([https://github.com/goodfire-ai/spd code], [https://www.goodfire.ai/papers/stochastic-param-decomp blog])
* 2025-08: [https://arxiv.org/abs/2508.10003 Semantic Structure in Large Language Model Embeddings]
 
===Counter-Results===
 
* 2025-02: [https://arxiv.org/abs/2502.04878 Sparse Autoencoders Do Not Find Canonical Units of Analysis]
* 2025-03: [https://www.alignmentforum.org/posts/4uXCAJNuPKtKBsi28/ Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research]

==Meta-cognition==

* 2025-05: [https://arxiv.org/abs/2505.13763 Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations]

==Coding Models==

==Geometric==

** Tegmark et al. report multi-scale structure: 1) “atomic” small-scale, 2) “brain” intermediate-scale, and 3) “galaxy” large-scale
 
* 2025-02: [https://arxiv.org/abs/2502.08009 The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models]
* 2025-08: [https://arxiv.org/abs/2508.10003 Semantic Structure in Large Language Model Embeddings]

==Topography==

==Emergent Internal Model Building==

* 2023-07: [https://arxiv.org/abs/2307.15936 A Theory for Emergence of Complex Skills in Language Models]
 
* 2024-06: [https://arxiv.org/abs/2406.19370v1 Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space]
* 2025-09: [https://arxiv.org/abs/2509.20328 Video models are zero-shot learners and reasoners]

===Semantic Directions===

===Theory of Mind===

* [https://arxiv.org/abs/2410.13787 Looking Inward: Language Models Can Learn About Themselves by Introspection]
 
* [https://arxiv.org/abs/2501.11120 Tell me about yourself: LLMs are aware of their learned behaviors]
* 2025-10: [https://arxiv.org/abs/2509.22887 Infusing Theory of Mind into Socially Intelligent LLM Agents]

===Skeptical===

===Grokking===

* 2024-12: [https://arxiv.org/abs/2412.18624 How to explain grokking]
 
* 2024-12: [https://arxiv.org/abs/2412.09810 The Complexity Dynamics of Grokking]
* 2025-09: [https://arxiv.org/abs/2509.21519 Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking]

===Tests of Resilience to Dropouts/etc.===

===Scaling Laws===

* 1993: [https://proceedings.neurips.cc/paper/1993/file/1aa48fc4880bb0c9b8a3bf979d3b917e-Paper.pdf Learning Curves: Asymptotic Values and Rate of Convergence]
* 2017-12: [https://arxiv.org/abs/1712.00409 Deep Learning Scaling is Predictable, Empirically] (Baidu)
* 2019-03: [http://www.incompleteideas.net/IncIdeas/BitterLesson.html The Bitter Lesson] (Rich Sutton)

==Reasoning (CoT, etc.)==

* 2025-04: [https://arxiv.org/abs/2504.04022 Rethinking Reflection in Pre-Training]: pre-training alone already provides some amount of reflection/reasoning
 
* 2025-07: [https://arxiv.org/abs/2501.18858 BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning]

===Pathfinding===

* 2024-08: [https://arxiv.org/abs/2408.08152 DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search]
* 2025-06: [https://arxiv.org/abs/2506.01939 Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning]
* 2025-09: [https://arxiv.org/abs/2509.09284 Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning]
* 2025-09: [https://arxiv.org/abs/2509.06160v1 Reverse-Engineered Reasoning for Open-Ended Generation]

===Skeptical===

* 2025-06: [https://arxiv.org/abs/2506.06941 The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity]
* 2025-08: [https://www.arxiv.org/abs/2508.01191 Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens]
 
==Self-Awareness and Self-Recognition==
 
* 2024-12: [https://theaidigest.org/self-awareness AIs are becoming more self-aware. Here's why that matters]
* 2025-04: [https://x.com/Josikinz/status/1907923319866716629 LLMs can guess which comic strip was generated by themselves (vs. other LLMs)]
* 2025-05: [https://arxiv.org/abs/2505.13763 Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations]
  
 
==LLM personalities==

* 2025-07: [https://arxiv.org/abs/2507.02618 Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory]
* 2025-09: [https://arxiv.org/abs/2509.04343 Psychologically Enhanced AI Agents]
 
==Quirks & Biases==
 
* 2025-04: [https://www.cambridge.org/core/journals/judgment-and-decision-making/article/artificial-intelligence-and-dichotomania/0421D2310727D73FAB47069FD1620AA1 Artificial intelligence and dichotomania]
* 2025-09: [https://arxiv.org/abs/2509.22818 Can Large Language Models Develop Gambling Addiction?]
 
=Vision Models=
 

Latest revision as of 14:27, 9 October 2025

Interpretability

Concepts

Mechanistic Interpretability

Semanticity

Counter-Results

Meta-cognition

Coding Models

Reward Functions

Symbolic and Notation

Mathematical

Geometric

Topography

Challenges


Heuristic Understanding

Emergent Internal Model Building

Semantic Directions

Directions, e.g.: f(king)-f(man)+f(woman)=f(queen) or f(sushi)-f(Japan)+f(Italy)=f(pizza) (a concrete sketch of this arithmetic follows at the end of this section)

Task vectors:

Reasoning:
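
A quick way to make such directions concrete (not taken from any of the papers above) is to do the arithmetic on off-the-shelf word embeddings. The sketch below assumes the gensim package and its downloadable GloVe vectors; whether the top neighbour is exactly "queen" or "pizza" depends on the embedding used, but the gender and country-of-origin directions are clearly present.

<syntaxhighlight lang="python">
# Sketch: embedding arithmetic of the form f(king) - f(man) + f(woman) ≈ f(queen).
# Assumes the gensim package and an internet connection to fetch pretrained GloVe vectors.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # pretrained word embeddings

# most_similar() sums the "positive" vectors, subtracts the "negative" ones,
# and returns the nearest vocabulary words by cosine similarity.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
print(vectors.most_similar(positive=["sushi", "italy"], negative=["japan"], topn=3))
</syntaxhighlight>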

Feature Geometry Reproduces Problem-space

Capturing Physics

Theory of Mind

Skeptical

Information Processing

Generalization

Grokking

Tests of Resilience to Dropouts/etc.

  • 2024-02: Explorations of Self-Repair in Language Models
  • 2024-06: What Matters in Transformers? Not All Attention is Needed
    • Removing entire transformer blocks leads to significant performance degradation
    • Removing MLP layers results in significant performance degradation
    • Removing attention layers causes almost no performance degradation
    • E.g., deleting half of the attention layers (a 48% speed-up) leads to only a 2.4% decrease on the benchmarks (a toy version of this kind of ablation is sketched in the code after this list)
  • 2024-06: The Remarkable Robustness of LLMs: Stages of Inference?
    • They intentionally break the network (swapping layers), yet it continues to work remarkably well. This suggests LLMs are quite robust, and the interventions let the authors identify distinct stages of processing.
    • They also use these interventions to infer what different layers are doing. They break apart the LLM transformer layers into four stages:
      • Detokenization: Raw tokens are converted into meaningful entities that take into account local context (especially using nearby tokens).
      • Feature engineering: Features are progressively refined. Factual knowledge is leveraged.
      • Prediction ensembling: Predictions (for the ultimately-selected next-token) emerge. A sort of consensus voting is used, with “prediction neurons” and “suppression neurons” playing a major role in upvoting/downvoting.
      • Residual sharpening: The semantic representations are collapsed into specific next-token predictions. There is a strong emphasis on suppression neurons eliminating options. The confidence is calibrated.
    • This structure can be thought of as two halves (being roughly dual to each other): the first half broadens (goes from distinct tokens to a rich/elaborate concept-space) and the second half collapses (goes from rich concepts to concrete token predictions).
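
A minimal, self-contained sketch of this kind of sub-layer ablation (this is not the code from the papers above; the toy model and all names are illustrative, and the model is untrained, so only the mechanics of skipping attention or MLP sub-layers carry over, not the reported numbers):

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class Block(nn.Module):
    """Pre-norm transformer block with switches for ablating its sub-layers."""
    def __init__(self, d, heads):
        super().__init__()
        self.ln1 = nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
        self.use_attn = True  # set False to "remove" the attention layer
        self.use_mlp = True   # set False to "remove" the MLP layer

    def forward(self, x):
        T = x.size(1)
        if self.use_attn:
            causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
            h = self.ln1(x)
            x = x + self.attn(h, h, h, attn_mask=causal, need_weights=False)[0]
        if self.use_mlp:
            x = x + self.mlp(self.ln2(x))
        return x

class TinyLM(nn.Module):
    def __init__(self, vocab=256, d=64, heads=4, depth=8):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.blocks = nn.ModuleList([Block(d, heads) for _ in range(depth)])
        self.head = nn.Linear(d, vocab)

    def forward(self, tokens):
        x = self.embed(tokens)
        for block in self.blocks:
            x = block(x)
        return self.head(x)

@torch.no_grad()
def eval_loss(model, tokens):
    """Next-token cross-entropy on a batch of token sequences."""
    logits = model(tokens[:, :-1])
    return nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1)
    ).item()

torch.manual_seed(0)
model = TinyLM().eval()
tokens = torch.randint(0, 256, (4, 128))  # stand-in for real evaluation data

# With an untrained toy model all losses sit near chance; the point is the harness.
print("baseline         :", eval_loss(model, tokens))

# "Delete" attention in half of the blocks (cf. the ~50% attention-removal result above).
for block in model.blocks[len(model.blocks) // 2:]:
    block.use_attn = False
print("attention removed:", eval_loss(model, tokens))

# Restore attention and instead remove the MLPs, which the papers report hurts far more.
for block in model.blocks:
    block.use_attn, block.use_mlp = True, False
print("MLPs removed     :", eval_loss(model, tokens))
</syntaxhighlight>

Because each sub-layer only adds its output to the residual stream, "removing" it reduces to skipping that addition, which is why such layer-level ablations are cheap to run on real models as well.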

Semantic Vectors

Other

Scaling Laws

Information Processing/Storage

Statistics/Math

Tokenization

For numbers/math

Data Storage

Reverse-Engineering Training Data

Compression

Learning/Training

Cross-modal knowledge transfer

Hidden State

Convergent Representation

Function Approximation

Failure Modes

Fracture Representation

Jagged Frontier

See also

Model Collapse

Analysis

Mitigation

Psychology

Allow LLM to think

In-context Learning

Reasoning (CoT, etc.)

Pathfinding

Skeptical

Self-Awareness and Self-Recognition

LLM personalities

Quirks & Biases

Vision Models

See Also