12 items with this tag.

11/6/2025: Consistency Training Helps Stop Sycophancy and Jailbreaks (activation engineering, deepmind, AI)
1/30/2025: Steering Gemini Using BIDPO Vectors (activation engineering, deepmind, AI)
12/4/2024: Deep Causal Transcoding: A Framework for Mechanistically Eliciting Latent Behaviors in Language Models (activation engineering, mats program, AI)
7/15/2024: I Found >800 Orthogonal “Write Code” Steering Vectors (activation engineering, mats program, AI)
4/30/2024: Mechanistically Eliciting Latent Behaviors in Language Models (understanding the world, activation engineering, mats program, AI)
1/2/2024: Steering Llama-2 with Contrastive Activation Additions (activation engineering, corrigibility, mats program, AI)
10/13/2023: Paper: Understanding and Controlling a Maze-Solving Policy Network (activation engineering, shard theory, AI)
9/6/2023: ActAdd: Steering Language Models without Optimization (activation engineering, AI)
7/24/2023: Open Problems in Activation Engineering (activation engineering, AI)
5/13/2023: Steering GPT-2-XL by Adding an Activation Vector (activation engineering, shard theory, mats program, AI)
3/31/2023: Maze-Solving Agents: Add a Top-Right Vector, Make the Agent Go to the Top-Right (activation engineering, AI)
3/11/2023: Understanding and Controlling a Maze-Solving Policy Network (activation engineering, mats program, shard theory, AI)