Difference between revisions of "AI safety"

From GISAXS
 
* 2023-08: [https://arxiv.org/abs/2308.06259 Self-Alignment with Instruction Backtranslation]

* 2023-11: [https://arxiv.org/abs/2311.08702 Debate Helps Supervise Unreliable Experts]

* 2023-12: [https://cdn.openai.com/papers/weak-to-strong-generalization.pdf Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision] (OpenAI, [https://openai.com/research/weak-to-strong-generalization blog])

* 2023-12: [https://cdn.openai.com/papers/practices-for-governing-agentic-ai-systems.pdf Practices for Governing Agentic AI Systems] (OpenAI, [https://openai.com/index/practices-for-governing-agentic-ai-systems/ blog])

Revision as of 13:56, 14 February 2025

Description of Safety Concerns

Key Concepts

Medium-term Risks

Long-term (x-risk)

Learning Resources

Research