==Medium-term Risks==
* 2024-07: [https://yoshuabengio.org/2024/07/09/reasoning-through-arguments-against-taking-ai-safety-seriously/ Reasoning through arguments against taking AI safety seriously] (Yoshua Bengio)
* 2025-04: [https://www.forethought.org/research/ai-enabled-coups-how-a-small-group-could-use-ai-to-seize-power AI-Enabled Coups: How a Small Group Could Use AI to Seize Power]
* 2025-06: [https://arxiv.org/abs/2506.20702 The Singapore Consensus on Global AI Safety Research Priorities]

==Long-term (x-risk)==

==Assessment==
* [https://aiassessmentscale.com/ AI Assessment Scale (AIAS)]: A practical framework to guide the appropriate and ethical use of generative AI in assessment design, empowering educators to make purposeful, evidence-based decisions
* 2025-07: [https://arxiv.org/abs/2507.16534 Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report]

==Policy==
 
==Research==
* 2025-06: [https://arxiv.org/abs/2506.13609 Avoiding Obfuscation with Prover-Estimator Debate]
* 2025-06: [https://cdn.openai.com/pdf/a130517e-9633-47bc-8397-969807a43a23/emergent_misalignment_paper.pdf Persona Features Control Emergent Misalignment] (OpenAI, [https://openai.com/index/emergent-misalignment/ blog])
* 2025-07: [https://arxiv.org/abs/2506.18032 Why Do Some Language Models Fake Alignment While Others Don't?] (Anthropic, [https://github.com/safety-research/open-source-alignment-faking code])
* 2025-07: [https://arxiv.org/abs/2507.11473 Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety]

==Demonstrations of Negative Use Capabilities==