Difference between revisions of "AI safety"

* 2024-07: [https://yoshuabengio.org/2024/07/09/reasoning-through-arguments-against-taking-ai-safety-seriously/ Reasoning through arguments against taking AI safety seriously] (Yoshua Bengio)
 
* 2025-04: [https://www.forethought.org/research/ai-enabled-coups-how-a-small-group-could-use-ai-to-seize-power AI-Enabled Coups: How a Small Group Could Use AI to Seize Power]
 
* 2025-06: [https://arxiv.org/abs/2506.20702 The Singapore Consensus on Global AI Safety Research Priorities]
  
 
==Long-term (x-risk)==

==Research==
 
* 2025-06: [https://arxiv.org/abs/2506.13609 Avoiding Obfuscation with Prover-Estimator Debate]
 
* 2025-06: [https://cdn.openai.com/pdf/a130517e-9633-47bc-8397-969807a43a23/emergent_misalignment_paper.pdf Persona Features Control Emergent Misalignment] (OpenAI, [https://openai.com/index/emergent-misalignment/ blog])
 
* 2025-07: [https://arxiv.org/abs/2506.18032 Why Do Some Language Models Fake Alignment While Others Don't?] (Anthropic, [https://github.com/safety-research/open-source-alignment-faking code])
  
 
==Demonstrations of Negative Use Capabilities==
 

Latest revision as of 10:26, 10 July 2025

Learning Resources

Light

Deep

Description of Safety Concerns

Key Concepts

Medium-term Risks

Long-term (x-risk)

Status

Assessment

  • AI Assessment Scale (AIAS): A practical framework to guide the appropriate and ethical use of generative AI in assessment design, empowering educators to make purposeful, evidence-based decisions

Policy

Proposals

Research

Demonstrations of Negative Use Capabilities

See Also