AI safety

Revision as of 11:12, 17 June 2025

==Learning Resources==

===Light===

===Deep===

==Description of Safety Concerns==

===Key Concepts===

===Medium-term Risks===

===Long-term (x-risk)===

===Status===

==Assessment==

* AI Assessment Scale (AIAS): a practical framework for the appropriate and ethical use of generative AI in assessment design, helping educators make purposeful, evidence-based decisions.

==Policy==

===Proposals===

==Research==

* 2025-04: [https://arxiv.org/abs/2504.15125 Contemplative Wisdom for Superalignment]
* 2025-04: [https://www.lesswrong.com/posts/x59FhzuM9yuvZHAHW/untitled-draft-yhra Scaling Laws for Scalable Oversight] ([https://arxiv.org/abs/2504.18530 preprint], [https://github.com/subhashk01/oversight-scaling-laws code]; see the sketch after this list)
* 2025-06: [https://assets.anthropic.com/m/4fb35becb0cd87e1/original/SHADE-Arena-Paper.pdf SHADE-Arena: Evaluating sabotage and monitoring in LLM agents] (Anthropic, [https://www.anthropic.com/research/shade-arena-sabotage-monitoring blog])
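The scalable-oversight entry above invites a small worked example. The sketch below is a minimal, hypothetical illustration of the oversight-game framing: it assumes an Elo-style logistic win probability for a weaker overseer ("Guard") catching a stronger overseen model ("Houdini"), and multiplies per-step success probabilities along a nested chain of overseers. The function names and Elo values here are illustrative assumptions, not the preprint's actual model or parameters.

<syntaxhighlight lang="python">
def win_prob(guard_elo: float, houdini_elo: float) -> float:
    """Chance the overseer (Guard) beats the overseen model (Houdini),
    under a standard Elo logistic curve (an assumption for illustration)."""
    return 1.0 / (1.0 + 10.0 ** ((houdini_elo - guard_elo) / 400.0))


def nested_oversight_prob(elos: list[float]) -> float:
    """Chance that oversight succeeds at every step of a chain in which
    each model oversees the next, stronger one (steps assumed independent)."""
    p = 1.0
    for guard, houdini in zip(elos, elos[1:]):
        p *= win_prob(guard, houdini)
    return p


# Illustrative comparison: one large capability jump vs. two smaller ones.
print(f"direct: {nested_oversight_prob([1000.0, 1400.0]):.3f}")         # ~0.091
print(f"nested: {nested_oversight_prob([1000.0, 1200.0, 1400.0]):.3f}")  # ~0.058
</syntaxhighlight>

Under this particular assumed curve, chaining two smaller oversight steps does worse than one direct step; whether bootstrapping through intermediate overseers helps depends on the shape of the win-probability curve, which is exactly the kind of question a scaling-law analysis addresses.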

==Demonstrations of Negative Use Capabilities==

==See Also==