Difference between revisions of "AI safety"

* 2025-06: [https://assets.anthropic.com/m/4fb35becb0cd87e1/original/SHADE-Arena-Paper.pdf SHADE-Arena: Evaluating sabotage and monitoring in LLM agents] (Anthropic, [https://www.anthropic.com/research/shade-arena-sabotage-monitoring blog])
 
* 2025-06: [https://arxiv.org/abs/2506.13609 Avoiding Obfuscation with Prover-Estimator Debate]
 
* 2025-06: [https://cdn.openai.com/pdf/a130517e-9633-47bc-8397-969807a43a23/emergent_misalignment_paper.pdf Persona Features Control Emergent Misalignment] (OpenAI, [https://openai.com/index/emergent-misalignment/ blog])
  
 
==Demonstrations of Negative Use Capabilities==
 

Revision as of 13:26, 19 June 2025

Learning Resources

Light

Deep

Description of Safety Concerns

Key Concepts

Medium-term Risks

Long-term (x-risk)

Status

Assessment

  • AI Assessment Scale (AIAS): A practical framework to guide the appropriate and ethical use of generative AI in assessment design, empowering educators to make purposeful, evidence-based decisions.

Policy

Proposals

Research

Demonstrations of Negative Use Capabilities

See Also