20 items with this tag.
Dreams of AI Alignment: The Danger of Suggestive Names
Predictions for Shard Theory Mechanistic Interpretability Results
Bruce Wayne and the Cost of Inaction
Understanding and Avoiding Value Drift
The Shard Theory of Human Values
Looking Back on My Alignment PhD
Emotionally Confronting Doom
You Should Read “Harry Potter and the Methods of Rationality”
When Most VNM-Coherent Preference Orderings Have Convergent Instrumental Incentives
Lessons I’ve Learned From Self-Teaching
Collider Bias as a Cognitive Blindspot?
Math That Clicks: Look for Two-Way Correspondences
Formalizing “Defection” Using Game Theory
Problem Relaxation as a Tactic
On Being Robust
How I Do Research
I Want to Take Off the Coat
Internalizing Internal Double Crux
Unyielding Yoda Timers: Taking the Hammertime Final Exam
How to Dissolve It