6 items with this tag.

1/2/2024 | Steering Llama-2 with Contrastive Activation Additions | Tags: activation engineering, corrigibility, mats program, AI
11/8/2022 | People Care About Each Other Even Though They Have Imperfect Motivational Pointers? | Tags: corrigibility, AI
12/3/2021 | Formalizing Policy-Modification Corrigibility | Tags: corrigibility, AI
11/20/2021 | A Certain Formalization of Corrigibility Is VNM-Incoherent | Tags: instrumental convergence, corrigibility, AI
11/21/2020 | Non-Obstruction: A Simple Concept Motivating Corrigibility | Tags: corrigibility, AI
5/8/2020 | Corrigibility as Outside View | Tags: corrigibility, AI