The Pond

Tag: activation engineering

11 items with this tag.

  • 1/30/2025

    Steering Gemini Using BIDPO Vectors

      activation engineering, AI

  • 12/4/2024

    Deep Causal Transcoding: A Framework for Mechanistically Eliciting Latent Behaviors in Language Models

      activation engineering, mats program, AI

  • 7/15/2024

    I Found >800 Orthogonal “Write Code” Steering Vectors

      activation engineering, mats program, AI

  • 4/30/2024

    Mechanistically Eliciting Latent Behaviors in Language Models

      understanding the world, activation engineering, mats program, AI

  • 1/2/2024

    Steering Llama-2 with Contrastive Activation Additions

      activation engineering, corrigibility, mats program, AI

  • 10/13/2023

    Paper: Understanding and Controlling a Maze-Solving Policy Network

      activation engineering, shard theory, AI

  • 9/6/2023

    ActAdd: Steering Language Models without Optimization

      activation engineering, AI

  • 7/24/2023

    Open Problems in Activation Engineering

      activation engineering, AI

  • 5/13/2023

    Steering GPT-2-XL by Adding an Activation Vector

      activation engineering, shard theory, mats program, AI

  • 3/31/2023

    Maze-Solving Agents: Add a Top-Right Vector, Make the Agent Go to the Top-Right

      activation engineering, AI

  • 3/11/2023

    Understanding and Controlling a Maze-Solving Policy Network

      activation engineering, mats program, shard theory, AI