AI safety
Research
- 2022-12: Discovering Latent Knowledge in Language Models Without Supervision (https://arxiv.org/abs/2212.03827)
- 2023-04: Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark (https://arxiv.org/abs/2304.03279)
- 2023-05: Model evaluation for extreme risks (DeepMind) (https://arxiv.org/abs/2305.15324)
Revision as of 13:42, 14 February 2025
Description of Safety Concerns
Medium-term Risks
- 2023-04: The A.I. Dilemma – Tristan Harris and Aza Raskin (video; podcast transcript: .website-files.com/5f0e1294f002b1bb26e1f304/64224a9051a6637c1b60162a_65-your-undivided-attention-The-AI-Dilemma-transcript.pdf): raises concerns about humanity's ability to handle these transformations
- 2023-04: Daniel Schmachtenberger and Liv Boeree (video): AI could accelerate perverse social dynamics