Difference between revisions of "AI safety"

From GISAXS
(Medium-term Risks)
(Assessment)
 
(2 intermediate revisions by the same user not shown)
Line 49: Line 49:
 
==Assessment==
 
* [https://aiassessmentscale.com/ AI Assessment Scale (AIAS)]: A practical framework to guide the appropriate and ethical use of generative AI in assessment design, empowering educators to make purposeful, evidence-based decisions
 
+ * 2025-07: [https://arxiv.org/abs/2507.16534 Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report]
  
 
==Policy==
 
Line 99: Line 100:
 
* 2025-06: [https://arxiv.org/abs/2506.13609 Avoiding Obfuscation with Prover-Estimator Debate]
 
 
* 2025-06: [https://cdn.openai.com/pdf/a130517e-9633-47bc-8397-969807a43a23/emergent_misalignment_paper.pdf Persona Features Control Emergent Misalignment] (OpenAI, [https://openai.com/index/emergent-misalignment/ blog])
 
+ * 2025-07: [https://arxiv.org/abs/2506.18032 Why Do Some Language Models Fake Alignment While Others Don't?] (Anthropic, [https://github.com/safety-research/open-source-alignment-faking code])
+ * 2025-07: [https://arxiv.org/abs/2507.11473 Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety]
  
 
==Demonstrations of Negative Use Capabilities==
 

Latest revision as of 16:33, 24 July 2025

Learning Resources

Light

Deep

Description of Safety Concerns

Key Concepts

Medium-term Risks

Long-term (x-risk)

Status

Assessment

Policy

Proposals

Research

Demonstrations of Negative Use Capabilities

See Also