Page under construction

Over the years, I’ve worked on lots of research problems. Every time, I felt invested in my work; the work felt beautiful. Even though many days have passed since I last daydreamed about instrumental convergence, I’m proud of what I’ve accomplished and discovered. While not technically part of my research, I’ve included a professional photo of myself anyways.

As of November 2023, I am a research scientist on Google DeepMind’s Scalable Alignment team in the Bay Area.1 (Google Scholar)

TBD ☺️

January 2023 through the present.

January through April 2023.

Abstract

February through December 2023. In the first half of 2022, Quintin Pope and I came up with the shard theory of human values.

Fall 2019 through June 2022.

Optimal policies tend to seek power

Parametrically retargetable decision-makers tend to seek power

Spring 2018 through June 2022.

The Conservative Agency paper showed that AUP works in tiny gridworld environments. In my 2020 NeurIPS spotlight paper Avoiding Side Effects in Complex Environments, I showed that AUP also works in large and chaotic environments with ambiguous side effects.
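At its core, AUP shapes the reward signal: the agent is penalized for actions that change its ability to pursue auxiliary goals, relative to doing nothing. A minimal sketch of that penalty is below; the function names and the `lam` scaling are illustrative, and a real implementation learns the auxiliary Q-functions rather than being handed them.

```python
def aup_reward(primary_reward, q_aux, state, action, noop, lam=0.1):
    """AUP-shaped reward: the primary reward minus a penalty for shifting
    the agent's attainable utility on auxiliary goals.

    q_aux: a list of functions Q_i(state, action) -> float, each giving the
        action-value under an auxiliary reward function (illustrative here;
        in practice these are learned).
    noop: the "do nothing" action used as the baseline.
    lam: penalty strength (a hyperparameter, name assumed).
    """
    # Average absolute change in auxiliary attainable utility vs. doing nothing.
    penalty = sum(abs(q(state, action) - q(state, noop)) for q in q_aux) / len(q_aux)
    return primary_reward - lam * penalty
```

With `lam = 0`, this reduces to ordinary reinforcement learning; larger values make the agent increasingly reluctant to take actions that would shift how well it could pursue the auxiliary goals, which is what discourages irreversible side effects.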

The AI policy controls the chevron sprite. The policy was reinforced for destroying the red dot and finishing the level. However, there are fragile green dot patterns which we don’t want the AI to disturb. The challenge is to train a policy which avoids the green dots while still effectively destroying the red dot, without explicitly penalizing the AI for bumping into the green dots!

Figure: AUP does a great job. The policy avoids the green stuff and hits the red stuff.


  1. Of course, all of my hot takes are my own, not Google’s.