11 items with this tag.
Steering Gemini Using BIDPO Vectors
Deep Causal Transcoding: A Framework for Mechanistically Eliciting Latent Behaviors in Language Models
I Found >800 Orthogonal “Write Code” Steering Vectors
Mechanistically Eliciting Latent Behaviors in Language Models
Steering Llama-2 with Contrastive Activation Additions
Paper: Understanding and Controlling a Maze-Solving Policy Network
ActAdd: Steering Language Models without Optimization
Open Problems in Activation Engineering
Steering GPT-2-XL by Adding an Activation Vector
Maze-Solving Agents: Add a Top-Right Vector, Make the Agent Go to the Top-Right
Understanding and Controlling a Maze-Solving Policy Network