Latest revision as of 14:27, 9 October 2025

2023-07Jul: Measuring Faithfulness in Chain-of-Thought Reasoning roughly proves that sufficiently large models do not generate CoT that actually captures their internal reasoning)

Heuristic Understanding

2022-09: Janus: Simulators

Emergent Internal Model Building

Semantic Directions

Directions, e.g.: f(king)-f(man)+f(woman)=f(queen) or f(sushi)-f(Japan)+f(Italy)=f(pizza)

Task vectors:

Reasoning:

Understanding Reasoning in Thinking Language Models via Steering Vectors

Feature Geometry Reproduces Problem-space

Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task (Othello)
Emergent linear representations in world models of self-supervised sequence models (Othello)
What learning algorithm is in-context learning? Investigations with linear models
Emergent analogical reasoning in large language models
Language Models Represent Space and Time (Maps of world, US)
Not All Language Model Features Are Linear (Days of week form ring, etc.)
Evaluating the World Model Implicit in a Generative Model (Map of Manhattan)
Reliable precipitation nowcasting using probabilistic diffusion models. Generation of precipitation map imagery is predictive of actual future weather; implies model is learning scientifically-relevant modeling.
The Platonic Representation Hypothesis: Different models (including across modalities) are converging to a consistent world model.
ICLR: In-Context Learning of Representations
Language Models Use Trigonometry to Do Addition: Numbers arranged in helix to enable addition

Capturing Physics

2020-09: Learning to Identify Physical Parameters from Video Using Differentiable Physics
2022-07: Self-Supervised Learning for Videos: A Survey
2025-02: Fair at Meta: Intuitive physics understanding emerges from self-supervised pretraining on natural videos

Theory of Mind

Skeptical

Information Processing

2019-03: Diagnosing and Enhancing VAE Models
2021-03: Pretrained Transformers as Universal Computation Engines
2022-10: How Mask Matters: Towards Theoretical Understandings of Masked Autoencoders
2023-04: Why think step by step? Reasoning emerges from the locality of experience
2023-10: What's the Magic Word? A Control Theory of LLM Prompting
2024-02: Chain of Thought Empowers Transformers to Solve Inherently Serial Problems: Proves that transformers can solve any problem, if they can generate sufficient intermediate tokens
2024-07: Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
- Models learning reasoning skills (they are not merely memorizing solution templates). They can mentally generate simple short plans (like humans).
- When presented facts, models develop internal understanding of what parameters (recursively) depend on each other. This occurs even before an explicit question is asked (i.e. before the task is defined). This appears to be different from human reasoning.
- Model depth matters for reasoning. This cannot be mitigated by chain-of-thought prompting (which allow models to develop and then execute plans) since even a single CoT step may require deep, multi-step reasoning/planning.
2024-11: Ask, and it shall be given: Turing completeness of prompting
2025-04: Layers at Similar Depths Generate Similar Activations Across LLM Architectures

Generalization

2024-06: Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data

Grokking

Tests of Resilience to Dropouts/etc.

2024-02: Explorations of Self-Repair in Language Models
2024-06: What Matters in Transformers? Not All Attention is Needed
- Removing entire transformer blocks leads to significant performance degradation
- Removing MLP layers results in significant performance degradation
- Removing attention layers causes almost no performance degradation
- E.g. half of attention layers are deleted (48% speed-up), leads to only 2.4% decrease in the benchmarks
2024-06: The Remarkable Robustness of LLMs: Stages of Inference?
- They intentionally break the network (swapping layers), yet it continues to work remarkably well. This suggests LLMs are quite robust, and allows them to identify different stages in processing.
- They also use these interventions to infer what different layers are doing. They break apart the LLM transformer layers into four stages:
  - Detokenization: Raw tokens are converted into meaningful entities that take into account local context (especially using nearby tokens).
  - Feature engineering: Features are progressively refined. Factual knowledge is leveraged.
  - Prediction ensembling: Predictions (for the ultimately-selected next-token) emerge. A sort of consensus voting is used, with “prediction neurons” and "suppression neurons" playing a major role in upvoting/downvoting.
  - Residual sharpening: The semantic representations are collapsed into specific next-token predictions. There is a strong emphasis on suppression neurons eliminating options. The confidence is calibrated.
- This structure can be thought of as two halves (being roughly dual to each other): the first half broadens (goes from distinct tokens to a rich/elaborate concept-space) and the second half collapses (goes from rich concepts to concrete token predictions).

Semantic Vectors

2024-06: Refusal in Language Models Is Mediated by a Single Direction
2025-02: Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs (demonstrates entangling of concepts into a single preference vector)
2025-03: Analogical Reasoning Inside Large Language Models: Concept Vectors and the Limits of Abstraction

Other

2024-11: Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond
2024-11: Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding (code)
2024-11: Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models: LLMs learn reasoning by extracting procedures from training data, not by memorizing specific answers
2024-11: LLMs Do Not Think Step-by-step In Implicit Reasoning
2024-12: The Complexity Dynamics of Grokking

Scaling Laws

1993: Learning Curves: Asymptotic Values and Rate of Convergence
2017-12: Deep Learning Scaling is Predictable, Empirically (Baidu)
2019-03: The Bitter Lesson (Rich Sutton)
2020-01: Scaling Laws for Neural Language Models (OpenAI)
2020-10: Scaling Laws for Autoregressive Generative Modeling (OpenAI)
2020-05: The Scaling Hypothesis (Gwern)
2021-08: Scaling Laws for Deep Learning
2021-02: Explaining Neural Scaling Laws (Google DeepMind)
2022-03: Training Compute-Optimal Large Language Models (Chinchilla, Google DeepMind)
2025-03: Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining
2025-03: Compute Optimal Scaling of Skills: Knowledge vs Reasoning
2025-04: Scaling Laws for Native Multimodal Models Scaling Laws for Native Multimodal Models
2025-05: LLMs on the Line: Data Determines Loss-To-Loss Scaling Laws

Information Processing/Storage

2020-02: A Theory of Usable Information Under Computational Constraints
2021-04: Why is AI hard and Physics simple?
2021-06: Thinking Like Transformers
"A transformer's depth affects its reasoning capabilities, whilst model size affects its knowledge capacity" (c.f.)
- 2024-02: MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
- 2024-04: The Illusion of State in State-Space Models (figure 3)
- 2024-08: Gemma 2: Improving Open Language Models at a Practical Size (table 9)
2024-09: Schrodinger's Memory: Large Language Models
2024-10: Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning. CoT involves both memorization and (probabilitic) reasoning
2024-11: Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
2025-03: A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers

Statistics/Math

2023-05: The emergence of clusters in self-attention dynamics
2023-12: A mathematical perspective on Transformers
2024-07: Understanding Transformers via N-gram Statistics
2024-10: Dynamic metastability in the self-attention model
2024-11: Measure-to-measure interpolation using Transformers
2025-04: Quantitative Clustering in Mean-Field Transformer Models

Tokenization

For numbers/math

2024-02: Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs: L2R vs. R2L yields different performance on math

Data Storage

1988-09: On the capabilities of multilayer perceptrons
2006-12: Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition (single-layer perceptron stores >2 bits/parameter; MLP ~ 2*N² bits w/ N² params)
2016-11: Capacity and Trainability in Recurrent Neural Networks (5 bits/param)
2018-02: The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks
2019-05: Memorization Capacity of Deep Neural Networks under Parameter Quantization
2020-02: How Much Knowledge Can You Pack Into the Parameters of a Language Model?
2020-08: Language Models as Knowledge Bases: On Entity Representations, Storage Capacity, and Paraphrased Queries (capacity scales linearly with parameters; more training samples leads to less memorization)
2020-12: When is Memorization of Irrelevant Training Data Necessary for High-Accuracy Learning?
2024-04: Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws (2 bits/param)
2024-06: Scaling Laws for Fact Memorization of Large Language Models (1T params needed to memorize Wikipedia)
2024-12: The Complexity Dynamics of Grokking
2025-05: How much do language models memorize? (3.6 bits/parameter)
2025-06: Trade-offs in Data Memorization via Strong Data Processing Inequalities

Reverse-Engineering Training Data

Compression

Learning/Training

2018-03: The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks: Sparse neural networks are optimal, but it is difficult to identify the right architecture and train it. Deep learning typically consists of training a dense neural network, which makes it easier to learn an internal sparse circuit optimal to a particular problem.
2024-12: On the Ability of Deep Networks to Learn Symmetries from Data: A Neural Kernel Theory
2025-01: Physics of Skill Learning
2025-05: ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Cross-modal knowledge transfer

2022-03: Leveraging Visual Knowledge in Language Tasks: An Empirical Study on Intermediate Pre-training for Cross-modal Knowledge Transfer
2023-05: Towards Versatile and Efficient Visual Knowledge Integration into Pre-trained Language Models with Cross-Modal Adapters
2025-02: Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models: CLIP learns richer set of aggregated representations (e.g. for a culture or country), vs. a vision-only model.

Hidden State

2025-02: Emergent Response Planning in LLM: They show that the latent representation contains information beyond that needed for the next token (i.e. the model learns to "plan ahead" and encode information relevant to future tokens)
2025-03: (How) Do Language Models Track State?

Convergent Representation

2015-11: Convergent Learning: Do different neural networks learn the same representations?
2025-05: Harnessing the Universal Geometry of Embeddings: Evidence for The Strong Platonic Representation Hypothesis; models converge to a single consensus reality

Function Approximation

2022-08: What Can Transformers Learn In-Context? A Case Study of Simple Function Classes: can learn linear functions (equivalent to least-squares estimator)
2022-11: Teaching Algorithmic Reasoning via In-context Learning: Simple arithmetic
2022-11: What learning algorithm is in-context learning? Investigations with linear models (code): can learn linear regression
2022-12: Transformers learn in-context by gradient descent
2023-06: Transformers learn to implement preconditioned gradient descent for in-context learning
2023-07: One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention
2024-04: ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
2025-02: SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers
2025-02: Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought

Failure Modes

2023-06: Can Large Language Models Infer Causation from Correlation?: Poor causal inference
2023-09: The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
2023-09: Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve (biases towards "common" numbers, in-context CoT can reduce performance by incorrectly priming, etc.)
2023-11: Visual cognition in multimodal large language models (models lack human-like visual understanding)

Fracture Representation

2025-05: Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis (code)

Jagged Frontier

2023-09: Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality
2024-07: How Does Quantization Affect Multilingual LLMs?: Quantization degrades different languages by differing amounts
2025-03: Compute Optimal Scaling of Skills: Knowledge vs Reasoning: Scaling laws are skill-dependent

Model Collapse

2023-05: The Curse of Recursion: Training on Generated Data Makes Models Forget
2023-07: Self-Consuming Generative Models Go MAD
2023-10: On the Stability of Iterative Retraining of Generative Models on their own Data
2023-11: Nepotistically Trained Generative-AI Models Collapse
2024-04: AI and the Problem of Knowledge Collapse
2024-07: AI models collapse when trained on recursively generated data

Analysis

2024-02: Scaling laws for learning with real and surrogate data
2024-12: Rate of Model Collapse in Recursive Training

Mitigation

Psychology

Allow LLM to think

2024-12: Let your LLM generate a few tokens and you will reduce the need for retrieval

In-context Learning

Reasoning (CoT, etc.)

2025-01: Large Language Models Think Too Fast To Explore Effectively
2025-01: Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
2025-01: Are DeepSeek R1 And Other Reasoning Models More Faithful?: reasoning models can provide faithful explanations for why their reasoning is correct
2025-03: Chain-of-Thought Reasoning In The Wild Is Not Always Faithful
2025-04: Rethinking Reflection in Pre-Training: pre-training alone already provides some amount of reflection/reasoning
2025-07: BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning

@@ Line 5: / Line 5: @@
 ==Concepts==
 * 2025-04: [https://arxiv.org/abs/2504.20938 Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition] ([https://github.com/OpenMOSS/Lorsa code])
+* 2025-08: [https://transformer-circuits.pub/2025/attention-qk/index.html Tracing Attention Computation Through Feature Interactions]
 ==Mechanistic Interpretability==
@@ Line 42: / Line 43: @@
 * 2025-06: [https://arxiv.org/abs/2506.15679 Dense SAE Latents Are Features, Not Bugs]
 * 2025-06: [https://arxiv.org/abs/2506.20790 Stochastic Parameter Decomposition] ([https://github.com/goodfire-ai/spd code], [https://www.goodfire.ai/papers/stochastic-param-decomp blog])
+* 2025-08: [https://arxiv.org/abs/2508.10003 Semantic Structure in Large Language Model Embeddings]
 ===Counter-Results===
@@ Line 50: / Line 52: @@
 * 2025-02: [https://arxiv.org/abs/2502.04878 Sparse Autoencoders Do Not Find Canonical Units of Analysis]
 * 2025-03: [https://www.alignmentforum.org/posts/4uXCAJNuPKtKBsi28/ Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research]
+==Meta-cognition==
+* 2025-05: [https://arxiv.org/abs/2505.13763 Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations]
 ==Coding Models==
@@ Line 80: / Line 85: @@
 ** Tegmark et al. report multi-scale structure: 1) “atomic” small-scale, 2) “brain” intermediate-scale, and 3) “galaxy” large-scale
 * 2025-02: [https://arxiv.org/abs/2502.08009 The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models]
+* 2025-08: [https://arxiv.org/abs/2508.10003 Semantic Structure in Large Language Model Embeddings]
 ==Topography==
@@ Line 95: / Line 101: @@
 * 2023-07: [https://arxiv.org/abs/2307.15936 A Theory for Emergence of Complex Skills in Language Models]
 * 2024-06: [https://arxiv.org/abs/2406.19370v1 Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space]
+* 2025-09: [https://arxiv.org/abs/2509.20328 Video models are zero-shot learners and reasoners]
 ===Semantic Directions===
@@ Line 135: / Line 142: @@
 * [https://arxiv.org/abs/2410.13787 Looking Inward: Language Models Can Learn About Themselves by Introspection]
 * [https://arxiv.org/abs/2501.11120 Tell me about yourself: LLMs are aware of their learned behaviors]
+* 2025-10: [https://arxiv.org/abs/2509.22887 Infusing Theory of Mind into Socially Intelligent LLM Agents]
 ===Skeptical===
@@ Line 166: / Line 174: @@
 * 2024-12: [https://arxiv.org/abs/2412.18624 How to explain grokking]
 * 2024-12: [https://arxiv.org/abs/2412.09810 The Complexity Dynamics of Grokking]
+* 2025-09: [https://arxiv.org/abs/2509.21519 Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking]
 ===Tests of Resilience to Dropouts/etc.===
@@ Line 196: / Line 205: @@
 ===Scaling Laws===
+* 1993: [https://proceedings.neurips.cc/paper/1993/file/1aa48fc4880bb0c9b8a3bf979d3b917e-Paper.pdf Learning Curves: Asymptotic Values and Rate of Convergence]
 * 2017-12: [https://arxiv.org/abs/1712.00409 Deep Learning Scaling is Predictable, Empirically] (Baidu)
 * 2019-03: [http://www.incompleteideas.net/IncIdeas/BitterLesson.html The Bitter Lesson] (Rich Sutton)
@@ Line 351: / Line 361: @@
 * 2025-04: [https://arxiv.org/abs/2504.04022 Rethinking Reflection in Pre-Training]: pre-training alone already provides some amount of reflection/reasoning
 * 2025-07: [https://arxiv.org/abs/2501.18858 BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning]
+===Pathfinding===
+* 2024-08: [https://arxiv.org/abs/2408.08152 DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search]
+* 2025-06: [https://arxiv.org/abs/2506.01939 Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning]
+* 2025-09: [https://arxiv.org/abs/2509.09284 Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning]
+* 2025-09: [https://arxiv.org/abs/2509.06160v1 Reverse-Engineered Reasoning for Open-Ended Generation]
+===Skeptical===
+* 2025-06: [https://arxiv.org/abs/2506.06941 The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity]
+* 2025-08: [https://www.arxiv.org/abs/2508.01191 Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens]
 ==Self-Awareness and Self-Recognition==
@@ Line 356: / Line 376: @@
 * 2024-12: [https://theaidigest.org/self-awareness AIs are becoming more self-aware. Here's why that matters]
 * 2025-04: [https://x.com/Josikinz/status/1907923319866716629 LLMs can guess which comic strip was generated by themselves (vs. other LLM)]
+* 2025-05: [https://arxiv.org/abs/2505.13763 Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations]
 ==LLM personalities==
 * 2025-07: [https://arxiv.org/abs/2507.02618 Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory]
+* 2025-09: [https://arxiv.org/abs/2509.04343 Psychologically Enhanced AI Agents]
 ==Quirks & Biases==
 * 2025-04: [https://www.cambridge.org/core/journals/judgment-and-decision-making/article/artificial-intelligence-and-dichotomania/0421D2310727D73FAB47069FD1620AA1 Artificial intelligence and dichotomania]
+* 2025-09: [https://arxiv.org/abs/2509.22818 Can Large Language Models Develop Gambling Addiction?]
 =Vision Models=

Difference between revisions of "AI understanding"

Latest revision as of 14:27, 9 October 2025

Contents

Interpretability

Concepts

Mechanistic Interpretability

Semanticity

Counter-Results

Meta-cognition

Coding Models

Reward Functions

Symbolic and Notation

Mathematical

Geometric

Topography

Challenges

Heuristic Understanding

Emergent Internal Model Building

Semantic Directions

Feature Geometry Reproduces Problem-space

Capturing Physics

Theory of Mind

Skeptical

Information Processing

Generalization

Grokking

Tests of Resilience to Dropouts/etc.

Semantic Vectors

Other

Scaling Laws

Information Processing/Storage

Statistics/Math

Tokenization

For numbers/math

Data Storage

Reverse-Engineering Training Data

Compression

Learning/Training

Cross-modal knowledge transfer

Hidden State

Convergent Representation

Function Approximation

Failure Modes

Fracture Representation

Jagged Frontier

See also

Model Collapse

Analysis

Mitigation

Psychology

Allow LLM to think

In-context Learning

Reasoning (CoT, etc.)

Pathfinding

Skeptical

Self-Awareness and Self-Recognition

LLM personalities

Quirks & Biases

Vision Models

See Also

Navigation menu

Search