Dark mode
Search
9 items with this tag.
Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
Deep Causal Transcoding: A Framework for Mechanistically Eliciting Latent Behaviors in Language Models
I Found >800 Orthogonal “Write Code” Steering Vectors
Mechanistically Eliciting Latent Behaviors in Language Models
Steering Llama-2 with Contrastive Activation Additions
Steering GPT-2-XL by Adding an Activation Vector
Behavioural Statistics for a Maze-Solving Agent
Understanding and Controlling a Maze-Solving Policy Network
Predictions for Shard Theory Mechanistic Interpretability Results