20 items with this tag.

2/10/2024
Dreams of AI Alignment: The Danger of Suggestive Names
rationality, critique, AI

3/1/2023
Predictions for Shard Theory Mechanistic Interpretability Results
mats program, shard theory, rationality, AI

9/30/2022
Bruce Wayne and the Cost of Inaction
rationality, fiction

9/9/2022
Understanding and Avoiding Value Drift
human values, shard theory, rationality, AI

9/4/2022
The Shard Theory of Human Values
understanding the world, shard theory, human values, rationality, AI

6/30/2022
Looking Back on My Alignment PhD
growth stories, rationality, personal, AI

4/10/2022
Emotionally Confronting Doom
rationality, practical, community

11/2/2021
You Should Read “Harry Potter and the Methods of Rationality”
rationality, talk notes, personal, grinnell, fiction

8/9/2021
When Most VNM-Coherent Preference Orderings Have Convergent Instrumental Incentives
instrumental convergence, rationality, AI

1/23/2021
Lessons I’ve Learned From Self-Teaching
scholarship and learning, rationality, practical

12/30/2020
Collider Bias as a Cognitive Blindspot?
rationality

10/2/2020
Math That Clicks: Look for Two-Way Correspondences
understanding the world, rationality

7/12/2020
Formalizing “Defection” Using Game Theory
game theory, rationality, AI

4/22/2020
Problem Relaxation as a Tactic
rationality, AI

1/10/2020
On Being Robust
rationality, personal

11/19/2019
How I Do Research
scholarship and learning, rationality

7/29/2018
I Want to Take Off the Coat
rationality, personal

4/30/2018
Internalizing Internal Double Crux
rationality, practical, personal

4/3/2018
Unyielding Yoda Timers: Taking the Hammertime Final Exam
rationality

3/7/2018
How to Dissolve It
rationality, practical