Difference between revisions of "AI safety"

From GISAXS
==Research==
 
* 2025-03: [https://arxiv.org/abs/2503.13621 Superalignment with Dynamic Human Values]
* 2025-04: [https://arxiv.org/abs/2504.15125 Contemplative Wisdom for Superalignment]
* 2025-04: [https://www.lesswrong.com/posts/x59FhzuM9yuvZHAHW/untitled-draft-yhra Scaling Laws for Scalable Oversight] ([https://arxiv.org/abs/2504.18530 preprint], [https://github.com/subhashk01/oversight-scaling-laws code])

==Demonstrations of Negative Use Capabilities==
 

Revision as of 20:14, 30 April 2025

Contents:
* Learning Resources
* Light
* Deep
* Description of Safety Concerns
* Key Concepts
* Medium-term Risks
* Long-term (x-risk)
* Status
* Policy
* Proposals
* Research
* Demonstrations of Negative Use Capabilities
* See Also