Difference between revisions of "AI safety"

From GISAXS
  
 
=Research=
 
* 2022-09: [https://arxiv.org/abs/2209.00626v1 The alignment problem from a deep learning perspective]
 
* 2022-12: [https://arxiv.org/abs/2212.03827 Discovering Latent Knowledge in Language Models Without Supervision]
 
* 2023-02: [https://arxiv.org/abs/2302.08582 Pretraining Language Models with Human Preferences]
 
 
* 2025-02: [https://drive.google.com/file/d/1QAzSj24Fp0O6GfkskmnULmI1Hmx7k_EJ/view Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs] ([https://www.emergent-values.ai/ site], [https://github.com/centerforaisafety/emergent-values github])
 
* 2025-02: [https://arxiv.org/abs/2502.07776 Auditing Prompt Caching in Language Model APIs]
 
* 2025-03: [https://arxiv.org/abs/2209.00626v7 The Alignment Problem from a Deep Learning Perspective]
  
 
=See Also=
 
 
* [[AI predictions]]
 

Revision as of 09:48, 13 March 2025
