Difference between revisions of "AI safety"

From GISAXS
* 2025-02: [https://arxiv.org/abs/2502.07776 Auditing Prompt Caching in Language Model APIs]
* 2025-03: [https://arxiv.org/abs/2209.00626v7 The Alignment Problem from a Deep Learning Perspective]
* 2025-03: [https://assets.anthropic.com/m/317564659027fb33/original/Auditing-Language-Models-for-Hidden-Objectives.pdf Auditing language models for hidden objectives] (Anthropic, [https://www.anthropic.com/research/auditing-hidden-objectives blog])

=See Also=

* [[AI predictions]]

Revision as of 09:53, 14 March 2025

Contents

* Description of Safety Concerns
* Key Concepts
* Medium-term Risks
* Long-term (x-risk)
* Learning Resources
* Status
* Policy
* Research
* See Also