Difference between revisions of "AI safety"

==Research==
* 2026-01: [https://www.nature.com/articles/s41586-025-09937-5 Training large language models on narrow tasks can lead to broad misalignment]
** 2025-02: Preprint: [https://martins1612.github.io/emergent_misalignment_betley.pdf Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs]
* 2026-02: [https://alignment.anthropic.com/2026/hot-mess-of-ai/ The Hot Mess of AI: How Does Misalignment Scale with Model Intelligence and Task Complexity?]
 
==Demonstrations of Negative Use Capabilities==
 
Contents:

* Learning Resources
* Light
* Deep
* Description of Safety Concerns
* Key Concepts
* Medium-term Risks
* Long-term (x-risk)
* Status
* Assessment
* Policy
* Proposals
* Research
* Demonstrations of Negative Use Capabilities
* Threat Vectors
* See Also