Revision as of 12:43, 16 March 2025

ERI

Research Thrusts

How to adapt frontier methods and foundation models to science?

Topical fine-tuning
Tool-use
Advanced retrieval-augmented generation (RAG++)
- Novel: Pre-generation: Agents continually add content to RAG corpus. ("Pre-thinking" across many vectors.)
Science-adapted tokenization/embedding (xVal, [IDK])
Specialized sampling
- Entropy sampling: measure uncertainty of CoT trajectories
- Novel: Handoff sampling:
  - Useful for:
    - text-to-text (specialization, creativity, etc.)
    - text-to-tool (e.g. math)
    - test-to-field (integrate non-textual FM)
  - Implementation:
    - MI-SAE on both spaces, find matches (or maybe just "analogies"?)
    - GNN, e.g.: 2019-11: SuperGlue: Learning Feature Matching with Graph Neural Networks (hf)
    - 2025-02: Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment

Challenge: Connect reasoning models to domain models

How to make AI agents smarter?

Iteration schemes (loops, blocks)
1. Thinking:
  - Blocky/neural: Define architecture, allow system to pick hyper-parameters
2. Autonomous ideation:
  - Novel: Treat ideation as an AE problem in a semantic embedding space.
3. Dynamic tree-of-thought: on-demand context generation, allows model to select among data representations (zoom, modality, etc.)
Encode Human Patterns
1. Human scientist workflows (ideation, solving, etc.)
2. Thought-templates, thought-flows
How to allow agents to run for long time-horizons coherently?
1. Basket of Metrics: Need to define metrics of: (1) research success, (2) uncertainty (entropy sampling?)
2. Tool-use to "call human" and request help/information
Memory
1. Allow system to insert and retrieve from RAG at will.

What is the right architecture for AI swarms?

What software architecture is needed?

How to implement inference-time compute for exocortex?

What should the HCI be?

@@ Line 25: / Line 25: @@
 #*** 2025-02: [https://arxiv.org/abs/2502.03714 Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment]
-'''Challenge: Connect reasoning models to domain models.'''
+'''Challenge: Connect reasoning models to domain models'''
 # Latent space reasoning
 # Establish mappings (analogies) between interpretability spaces
+#* 2024-12: [https://arxiv.org/abs/2412.16325 Towards Safe and Honest AI Agents with Neural Self-Other Overlap]
+#** 2024-07: [https://www.lesswrong.com/posts/hzt9gHpNwA2oHtwKX/self-other-overlap-a-neglected-approach-to-ai-alignment Self-Other Overlap: A Neglected Approach to AI Alignment]
+#** 2025-03: [https://www.lesswrong.com/posts/jtqcsARGtmgogdcLT/reducing-llm-deception-at-scale-with-self-other-overlap-fine Reducing LLM deception at scale with self-other overlap fine-tuning]
 # Cycling recaptioning/reframing
 #* 2024-07: [https://arxiv.org/abs/2407.06723 Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions]