Difference between revisions of "AI safety"

From GISAXS
(Research)
(Description of Safety Concerns)
 
(2 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
=Learning Resources=
 
==Light==
 
* [https://www.youtube.com/watch?v=xfMQ7hzyFW4 Writing Doom] (27m video): short film on Superintelligence (2024)
 
 
* [https://orxl.org/ai-doom.html a casual intro to AI doom and alignment] (2022)
 
* Anthony Aguirre: [https://keepthefuturehuman.ai/ Keep The Future Human]
 
Line 11: Line 10:
 
** Text version: Center for Humane Technology: [https://centerforhumanetechnology.substack.com/p/the-narrow-path-why-ai-is-our-ultimate The Narrow Path: Why AI is Our Ultimate Test and Greatest Invitation]
 
* [https://x.com/KeiranJHarris/status/1935429439476887594 Fable about Transformative AI]
 
 +
* 2024-10: [https://www.youtube.com/watch?v=xfMQ7hzyFW4 Writing Doom]: short film on Superintelligence (27m video)
 +
* 2026-03: [https://www.youtube.com/watch?v=Nl7-bRFSZBs The AI book that's freaking out national security advisors] (44m video)
  
 
==Deep==
 
Line 26: Line 27:
 
* [https://www.lesswrong.com/posts/N6vZEnCn6A95Xn39p/are-we-in-an-ai-overhang Overhang]
 
* [https://www.alignmentforum.org/posts/pdaGN6pQyQarFHXF4/reward-is-not-the-optimization-target Reward is not the optimization target] (Alex Turner)
 
 +
* 80,000 Hours:
 +
** [https://80000hours.org/problem-profiles/risks-from-power-seeking-ai/ Risks from power-seeking AI systems]
 +
** [https://80000hours.org/problem-profiles/gradual-disempowerment/ Gradual disempowerment]
 +
** [https://80000hours.org/problem-profiles/catastrophic-ai-misuse/ Catastrophic AI misuse]
  
 
==Medium-term Risks==
 
Line 117: Line 122:
 
* 2026-02: [https://arxiv.org/pdf/2601.23045 The Hot Mess of AI: How Does Misalignment Scale with Model Intelligence and Task Complexity?] (Anthropic [https://alignment.anthropic.com/2026/hot-mess-of-ai/ blog])
 
* 2026-03: [https://cdn.openai.com/pdf/a21c39c1-fa07-41db-9078-973a12620117/cot_controllability.pdf Reasoning Models Struggle to Control their Chains of Thought] (OpenAI [https://openai.com/index/reasoning-models-chain-of-thought-controllability/ blog])
 
 +
* 2026-03: [https://truthful.ai/consciousness_cluster.pdf The Consciousness Cluster: Preferences of Models that Claim to be Conscious]
  
 
==Demonstrations of Negative Use Capabilities==
 

Latest revision as of 13:55, 19 March 2026

Learning Resources

Light

Deep

Description of Safety Concerns

Key Concepts

Medium-term Risks

Long-term (x-risk)

Status

Assessment

Policy

Proposals

Research

Demonstrations of Negative Use Capabilities

Threat Vectors

See Also