Difference between revisions of "AI understanding"

From GISAXS
Jump to: navigation, search
(Psychology)
(Hidden State)
 
(8 intermediate revisions by the same user not shown)
Line 2: Line 2:
 
* 2017-01: [https://arxiv.org/abs/1704.01444 Learning to Generate Reviews and Discovering Sentiment]
 
* 2017-01: [https://arxiv.org/abs/1704.01444 Learning to Generate Reviews and Discovering Sentiment]
 
* 2025-02: [https://arxiv.org/abs/2502.11639 Neural Interpretable Reasoning]
 
* 2025-02: [https://arxiv.org/abs/2502.11639 Neural Interpretable Reasoning]
 +
 +
==Concepts==
 +
* 2025-04: [https://arxiv.org/abs/2504.20938 Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition] ([https://github.com/OpenMOSS/Lorsa code])
  
 
==Mechanistic Interpretability==
 
==Mechanistic Interpretability==
Line 104: Line 107:
 
* [https://www.alignmentforum.org/posts/5FGXmJ3wqgGRcbyH7/extracting-sae-task-features-for-in-context-learning Extracting sae task features for in-context learning]
 
* [https://www.alignmentforum.org/posts/5FGXmJ3wqgGRcbyH7/extracting-sae-task-features-for-in-context-learning Extracting sae task features for in-context learning]
 
* [https://arxiv.org/abs/2412.12276 Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers]
 
* [https://arxiv.org/abs/2412.12276 Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers]
 +
Reasoning:
 +
* [https://openreview.net/forum?id=OwhVWNOBcz Understanding Reasoning in Thinking Language Models via Steering Vectors]
  
 
===Feature Geometry Reproduces Problem-space===
 
===Feature Geometry Reproduces Problem-space===
Line 142: Line 147:
 
** Model depth matters for reasoning. This cannot be mitigated by chain-of-thought prompting (which allow models to develop and then execute plans) since even a single CoT step may require deep, multi-step reasoning/planning.
 
** Model depth matters for reasoning. This cannot be mitigated by chain-of-thought prompting (which allow models to develop and then execute plans) since even a single CoT step may require deep, multi-step reasoning/planning.
 
* 2024-11: [https://arxiv.org/abs/2411.01992 Ask, and it shall be given: Turing completeness of prompting]
 
* 2024-11: [https://arxiv.org/abs/2411.01992 Ask, and it shall be given: Turing completeness of prompting]
 +
* 2025-04: [https://arxiv.org/abs/2504.08775 Layers at Similar Depths Generate Similar Activations Across LLM Architectures]
  
 
===Generalization===
 
===Generalization===
Line 204: Line 210:
 
* 2024-11: [https://arxiv.org/abs/2411.16679 Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?]
 
* 2024-11: [https://arxiv.org/abs/2411.16679 Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?]
 
* 2025-03: [https://www.arxiv.org/abs/2503.03961 A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers]
 
* 2025-03: [https://www.arxiv.org/abs/2503.03961 A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers]
 +
 +
==Statistics/Math==
 +
* 2023-05: [https://arxiv.org/abs/2305.05465 The emergence of clusters in self-attention dynamics]
 +
* 2023-12: [https://arxiv.org/abs/2312.10794 A mathematical perspective on Transformers]
 +
* 2024-07: [https://arxiv.org/abs/2407.12034 Understanding Transformers via N-gram Statistics]
 +
* 2024-10: [https://arxiv.org/abs/2410.06833 Dynamic metastability in the self-attention model]
 +
* 2024-11: [https://arxiv.org/abs/2411.04551 Measure-to-measure interpolation using Transformers]
 +
* 2025-04: [https://arxiv.org/abs/2504.14697 Quantitative Clustering in Mean-Field Transformer Models]
  
 
==Tokenization==
 
==Tokenization==
Line 222: Line 236:
 
* 2025-02: [https://arxiv.org/abs/2502.06258 Emergent Response Planning in LLM]: They show that the latent representation contains information beyond that needed for the next token (i.e. the model learns to "plan ahead" and encode information relevant to future tokens)
 
* 2025-02: [https://arxiv.org/abs/2502.06258 Emergent Response Planning in LLM]: They show that the latent representation contains information beyond that needed for the next token (i.e. the model learns to "plan ahead" and encode information relevant to future tokens)
 
* 2025-03: [https://arxiv.org/abs/2503.02854 (How) Do Language Models Track State?]
 
* 2025-03: [https://arxiv.org/abs/2503.02854 (How) Do Language Models Track State?]
 +
===Convergent Representation===
 +
* 2015-11: [https://arxiv.org/abs/1511.07543 Convergent Learning: Do different neural networks learn the same representations?]
 +
* 2025-05: [https://arxiv.org/abs/2505.12540 Harnessing the Universal Geometry of Embeddings]: Evidence for [https://x.com/jxmnop/status/1925224620166128039 The Strong Platonic Representation Hypothesis]; models converge to a single consensus reality
  
 
==Function Approximation==
 
==Function Approximation==
Line 239: Line 256:
 
* 2023-09: [https://arxiv.org/abs/2309.13638 Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve] (biases towards "common" numbers, in-context CoT can reduce performance by incorrectly priming, etc.)
 
* 2023-09: [https://arxiv.org/abs/2309.13638 Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve] (biases towards "common" numbers, in-context CoT can reduce performance by incorrectly priming, etc.)
 
* 2023-11: [https://arxiv.org/abs/2311.16093 Visual cognition in multimodal large language models] (models lack human-like visual understanding)
 
* 2023-11: [https://arxiv.org/abs/2311.16093 Visual cognition in multimodal large language models] (models lack human-like visual understanding)
 +
 +
==Fracture Representation==
 +
* 2025-05: [https://arxiv.org/abs/2505.11581 Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis] ([https://github.com/akarshkumar0101/fer code])
  
 
==Jagged Frontier==
 
==Jagged Frontier==
Line 283: Line 303:
 
* 2025-01: [https://arxiv.org/abs/2501.18585 Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs]
 
* 2025-01: [https://arxiv.org/abs/2501.18585 Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs]
 
* 2025-01: [https://arxiv.org/abs/2501.08156 Are DeepSeek R1 And Other Reasoning Models More Faithful?]: reasoning models can provide faithful explanations for why their reasoning is correct
 
* 2025-01: [https://arxiv.org/abs/2501.08156 Are DeepSeek R1 And Other Reasoning Models More Faithful?]: reasoning models can provide faithful explanations for why their reasoning is correct
 +
* 2025-03: [https://arxiv.org/abs/2503.08679 Chain-of-Thought Reasoning In The Wild Is Not Always Faithful]
 
* 2025-04: [https://arxiv.org/abs/2504.04022 Rethinking Reflection in Pre-Training]: pre-training alone already provides some amount of reflection/reasoning
 
* 2025-04: [https://arxiv.org/abs/2504.04022 Rethinking Reflection in Pre-Training]: pre-training alone already provides some amount of reflection/reasoning
  
Line 295: Line 316:
 
=Vision Models=
 
=Vision Models=
 
* 2017-11: Distill: [https://distill.pub/2017/feature-visualization/ Feature Visualization: How neural networks build up their understanding of images]
 
* 2017-11: Distill: [https://distill.pub/2017/feature-visualization/ Feature Visualization: How neural networks build up their understanding of images]
 +
* 2021-01: [https://arxiv.org/abs/2101.12322 Position, Padding and Predictions: A Deeper Look at Position Information in CNNs]
 
* 2025-04: [https://arxiv.org/abs/2504.13181 Perception Encoder: The best visual embeddings are not at the output of the network] ([https://github.com/facebookresearch/perception_models code])
 
* 2025-04: [https://arxiv.org/abs/2504.13181 Perception Encoder: The best visual embeddings are not at the output of the network] ([https://github.com/facebookresearch/perception_models code])
  

Latest revision as of 13:44, 21 May 2025

Interpretability

Concepts

Mechanistic Interpretability

Semanticity

Counter-Results

Coding Models

Reward Functions

Symbolic and Notation

Mathematical

Geometric

Topography

Challenges

GYe31yXXQAABwaZ.jpeg

Heuristic Understanding

Emergent Internal Model Building

Semantic Directions

Directions, e.g.: f(king)-f(man)+f(woman)=f(queen) or f(sushi)-f(Japan)+f(Italy)=f(pizza)

Task vectors:

Reasoning:

Feature Geometry Reproduces Problem-space

Capturing Physics

Theory of Mind

Skeptical

Information Processing

Generalization

Grokking

Tests of Resilience to Dropouts/etc.

  • 2024-02: Explorations of Self-Repair in Language Models
  • 2024-06: What Matters in Transformers? Not All Attention is Needed
    • Removing entire transformer blocks leads to significant performance degradation
    • Removing MLP layers results in significant performance degradation
    • Removing attention layers causes almost no performance degradation
    • E.g. half of attention layers are deleted (48% speed-up), leads to only 2.4% decrease in the benchmarks
  • 2024-06: The Remarkable Robustness of LLMs: Stages of Inference?
    • They intentionally break the network (swapping layers), yet it continues to work remarkably well. This suggests LLMs are quite robust, and allows them to identify different stages in processing.
    • They also use these interventions to infer what different layers are doing. They break apart the LLM transformer layers into four stages:
      • Detokenization: Raw tokens are converted into meaningful entities that take into account local context (especially using nearby tokens).
      • Feature engineering: Features are progressively refined. Factual knowledge is leveraged.
      • Prediction ensembling: Predictions (for the ultimately-selected next-token) emerge. A sort of consensus voting is used, with “prediction neurons” and "suppression neurons" playing a major role in upvoting/downvoting.
      • Residual sharpening: The semantic representations are collapsed into specific next-token predictions. There is a strong emphasis on suppression neurons eliminating options. The confidence is calibrated.
    • This structure can be thought of as two halves (being roughly dual to each other): the first half broadens (goes from distinct tokens to a rich/elaborate concept-space) and the second half collapses (goes from rich concepts to concrete token predictions).

Semantic Vectors

Other

Scaling Laws

Information Processing/Storage

Statistics/Math

Tokenization

For numbers/math

Learning/Training

Cross-modal knowledge transfer

Hidden State

Convergent Representation

Function Approximation

Failure Modes

Fracture Representation

Jagged Frontier

Model Collapse

Analysis

Mitigation

Psychology

Allow LLM to think

In-context Learning

Reasoning (CoT, etc.)

Self-Awareness and Self-Recognition

Quirks & Biases

Vision Models

See Also