Difference between revisions of "AI safety"

From GISAXS
(Research)
(Light)
 
(5 intermediate revisions by the same user not shown)
Line 8: Line 8:
 
** [https://www.youtube.com/watch?v=27KDl2uPiL8 We Can’t Stop AI – Here’s What To Do Instead] (4m video, 2025)
 
** [https://www.youtube.com/watch?v=zeabrXV8zNE The 4 Rules That Could Stop AI Before It’s Too Late] (15m video, 2025)
 
+ * Tristan Harris TED talk (15m): [https://www.ted.com/talks/tristan_harris_why_ai_is_our_ultimate_test_and_greatest_invitation Why AI is our ultimate test and greatest invitation]
+ ** Text version: Center for Humane Technology: [https://centerforhumanetechnology.substack.com/p/the-narrow-path-why-ai-is-our-ultimate The Narrow Path: Why AI is Our Ultimate Test and Greatest Invitation]
  
 
==Deep==
 
Line 86: Line 88:
 
* 2025-03: [https://arxiv.org/abs/2503.13621 Superalignment with Dynamic Human Values]
 
* 2025-04: [https://arxiv.org/abs/2504.15125 Contemplative Wisdom for Superalignment]
 
+ * 2025-04: [https://www.lesswrong.com/posts/x59FhzuM9yuvZHAHW/untitled-draft-yhra Scaling Laws for Scalable Oversight] ([https://arxiv.org/abs/2504.18530 preprint], [https://github.com/subhashk01/oversight-scaling-laws code])
  
 
==Demonstrations of Negative Use Capabilities==
 
 
* 2024-12: [https://arxiv.org/abs/2412.00586 Evaluating Large Language Models' Capability to Launch Fully Automated Spear Phishing Campaigns: Validated on Human Subjects]
 
+ * 2025-04: [https://www.nathanlabenz.com/ Nathan Labenz] ([https://www.cognitiverevolution.ai/ The Cognitive Revolution]): [https://docs.google.com/presentation/d/1mvkpg1mtAvGzTiiwYPc6bKOGsQXDIwMb-ytQECb3i7I/edit#slide=id.g252d9e67d86_0_16 AI Bad Behavior]
  
 
=See Also=
 
 
* [[AI predictions]]
 

Latest revision as of 11:32, 5 May 2025

Learning Resources

Light

Deep

Description of Safety Concerns

Key Concepts

Medium-term Risks

Long-term (x-risk)

Status

Policy

Proposals

Research

Demonstrations of Negative Use Capabilities

See Also