Difference between revisions of "AI research trends"
KevinYager (talk | contribs) (→Neural (non-token) Latent Representation)
(2 intermediate revisions by the same user not shown)
Line 1: | Line 1:
+ =Novel Tokenization and/or Sampling=
+ * 2024-10: [https://github.com/xjdr-alt/entropix entropix: Entropy Based Sampling and Parallel CoT Decoding]
+ * 2024-12: [https://arxiv.org/abs/2412.06676 I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token]
+
+ =System 2 Reasoning=
+ See: [[AI_Agents#Increasing_AI_Agent_Intelligence|Increasing AI Agent Intelligence]]
+
+ =Episodic Memory=
+ * 2024-03: [https://arxiv.org/abs/2403.11901 Larimar: Large Language Models with Episodic Memory Control]
+
=Neural (non-token) Latent Representation=
* 2024-11: Microsoft: [https://arxiv.org/abs/2411.02820 DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving]: LLMs invent their own inter-communication language
Line 4: | Line 14:
* 2024-12: Meta: [https://arxiv.org/abs/2412.08821 Large Concept Models: Language Modeling in a Sentence Representation Space]: train a model that operates at a higher level of abstraction than typical word/token LLMs; model operates in a space of concept embeddings (more akin to full sentences than individual words)
* 2024-12: Meta: [https://ai.meta.com/research/publications/byte-latent-transformer-patches-scale-better-than-tokens/ Byte Latent Transformer: Patches Scale Better Than Tokens]: Instead of tokenization, dynamically convert input byte-stream into patches, yielding gains in compute efficiency, with minimal loss in performance
+ * 2024-12: [https://arxiv.org/abs/2412.13171 Compressed Chain of Thought: Efficient Reasoning Through Dense Representations]
* 2024-12: Google DeepMind: [https://arxiv.org/abs/2412.17747 Deliberation in Latent Space via Differentiable Cache Augmentation]
* 2024-12: [https://github.com/jerber/lang-jepa LANG-JEPA: Learning to Think in Latent Space]
Latest revision as of 09:23, 25 December 2024
Novel Tokenization and/or Sampling
- 2024-10: entropix: Entropy Based Sampling and Parallel CoT Decoding
- 2024-12: I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token
System 2 Reasoning
See: Increasing AI Agent Intelligence
Episodic Memory
- 2024-03: Larimar: Large Language Models with Episodic Memory Control
Neural (non-token) Latent Representation
- 2024-11: Microsoft: DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving: LLMs communicate with one another by sharing KV caches rather than exchanging natural-language text
- 2024-12: Meta: Training Large Language Models to Reason in a Continuous Latent Space: feeding the latent representation directly back into the model, instead of tokenizing intermediate thoughts (Chain of Continuous Thought, a.k.a. Coconut)
- 2024-12: Meta: Large Concept Models: Language Modeling in a Sentence Representation Space: train a model that operates at a higher level of abstraction than typical word/token LLMs; model operates in a space of concept embeddings (more akin to full sentences than individual words)
- 2024-12: Meta: Byte Latent Transformer: Patches Scale Better Than Tokens: Instead of tokenization, dynamically convert input byte-stream into patches, yielding gains in compute efficiency, with minimal loss in performance
- 2024-12: Compressed Chain of Thought: Efficient Reasoning Through Dense Representations
- 2024-12: Google DeepMind: Deliberation in Latent Space via Differentiable Cache Augmentation
- 2024-12: LANG-JEPA: Learning to Think in Latent Space