Formalizing “Defection” Using Game Theory

Table of Contents
Formalism
Game Theorems
Prisoner’s dilemma
Stag hunt
Chicken
Discussion

Vignette

They can’t prove the conspiracy… But they could, if Steve runs his mouth.
The police chief stares at you.
You stare at the table. You’d agreed (sworn!) to stay quiet. You’d even studied game theory together. But, you hadn’t understood what an extra year of jail meant.
The police chief stares at you.
Let Steve be the gullible idealist. You have a family waiting for you.

People talk about “defection” in social dilemma games, from the prisoner’s dilemma to stag hunt to chicken. In the tragedy of the commons, we talk about defection. The concept has become a regular part of LessWrong discourse.

Informal definition: Defection

A player defects when they increase their personal payoff at the expense of the group.

This informal definition is no secret, being echoed from the ancient Formal Models of Dilemmas in Social Decision-Making to the recent Classifying games like the Prisoner’s Dilemma:

Quote

you can model the “defect” action as “take some value for yourself, but destroy value in the process.”

Given that the prisoner’s dilemma is the bread and butter of game theory and of many parts of economics, evolutionary biology, and psychology, you might think that someone had already formalized this. However, to my knowledge, no one has.

Consider a finite $n$-player normal-form game, with player $i$ having pure action set $A_{i}$ and payoff function $P_{i} : A_{1} \times \dots \times A_{n} \to \mathbb{R}$. Each player $i$ chooses a strategy $s_{i} \in \Delta (A_{i})$ (a distribution over $A_{i}$). Together, the strategies form a strategy profile $s \overset{\text{def}}{=} (s_{1}, \dots, s_{n})$. $s_{- i} \overset{\text{def}}{=} (s_{1}, \dots, s_{i - 1}, s_{i + 1}, \dots, s_{n})$ is the strategy profile, excluding player $i$’s strategy. A payoff profile contains the payoffs for all players under a given strategy profile.

A utility weighting $(α_{j})_{j = 1, \dots, n}$ is a set of $n$ non-negative weights (as in Harsanyi’s utilitarian theorem). You can consider the weights as quantifying each player’s contribution; they might represent a perceived social agreement or be the explicit result of a bargaining process.

When all $α_{j}$ are equal, we’ll call that an equal weighting. However, if there are “utility monsters”, we can downweight them accordingly.

We’re implicitly assuming that payoffs are comparable across players. We want to investigate: given a utility weighting, which actions are defections?

Definition: Defection

Player $i$’s action $a \in A_{i}$ is a defection against strategy profile $s$ and weighting $(α_{j})_{j = 1, \dots, n}$ if

1. Personal gain: $P_{i} (a, s_{- i}) > P_{i} (s_{i}, s_{- i})$

2. Social loss: $\sum_{j} α_{j} P_{j} (a, s_{- i}) < \sum_{j} α_{j} P_{j} (s_{i}, s_{- i})$

If such an action exists for some player $i$ , strategy profile $s$ , and weighting, then we say that there is an opportunity for defection in the game.
For an equal weighting, condition (2) is equivalent to demanding that the action not be a Kaldor-Hicks improvement.

A payoff matrix for the Prisoner's Dilemma. Player 1's actions are C1 and D1; Player 2's are C2 and D2. Payoffs are: C1, C2 yields (3,3); C1, D2 yields (0,4); D1, C2 yields (4,0); D1, D2 yields (1,1). Players formally defect by moving from higher group payoff to higher individual payoff but lower group payoff. — Payoff profiles in the Prisoner’s Dilemma. Red arrows represent defections against pure strategy profiles; player 1 defects vertically, while player 2 defects horizontally. For example, player 2 defects with $(C_{1}, C_{2}) \to (C_{1}, D_{2})$ because they gain ( $4 > 3$ ) but the weighted sum loses out ( $4 < 6$ ).
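The two conditions are easy to check mechanically against pure strategy profiles. Here is a minimal sketch using the payoffs from the figure; the names `PAYOFFS`, `is_defection`, and `defections` are my own illustrative choices, and players are indexed from 0 inside the code.

```python
from itertools import product

# Payoffs from the Prisoner's Dilemma figure: (action_1, action_2) -> (P_1, P_2)
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 4),
    ("D", "C"): (4, 0),
    ("D", "D"): (1, 1),
}

def is_defection(payoffs, profile, player, action, weights=(1, 1)):
    """Check personal gain and social loss for a deviation from a pure profile."""
    deviated = list(profile)
    deviated[player] = action
    before = payoffs[tuple(profile)]
    after = payoffs[tuple(deviated)]

    def welfare(payoff_profile):
        return sum(w * x for w, x in zip(weights, payoff_profile))

    personal_gain = after[player] > before[player]
    social_loss = welfare(after) < welfare(before)
    return personal_gain and social_loss

def defections(payoffs, weights=(1, 1)):
    """List (profile, player number, action) for every defection against a pure profile."""
    found = []
    for profile in product("CD", repeat=2):
        for player in (0, 1):
            for action in "CD":
                if is_defection(payoffs, profile, player, action, weights):
                    found.append((profile, player + 1, action))
    return found

for entry in defections(PAYOFFS):
    print(entry)
```

Under the equal weighting this lists the four red arrows. Changing the weighting changes the verdict: with the (deliberately extreme, hypothetical) weights `(0, 1)`, player 2's move from $(C_{1}, C_{2})$ to $(C_{1}, D_{2})$ no longer counts as a defection, because the weighted sum rises from 3 to 4.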

This definition seems to make reasonable intuitive sense. In the tragedy of the commons, each player rationally increases their utility, while imposing negative externalities on the other players and decreasing total utility. A spy might leak classified information, benefiting themselves and Russia but defecting against America.

Definition: Cooperation

Cooperation takes place when a strategy profile is maintained despite the opportunity for defection.

I will now state some obvious results without proof.

Theorem 1: In constant-sum games, there is no opportunity for defection against equal weightings

Theorem 2: No defection in common-payoff scenarios

In common-payoff games (where all players share the same payoff function), there is no opportunity for defection.

In private communication, Joel Leibo points out that Theorems 1 and 2 formalize the intuition behind the proverb “all’s fair in love and war.” That is, you can’t defect in fully competitive or fully cooperative situations.

Theorem 3: There is no opportunity for defection against Nash equilibria
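Theorems 1 to 3 can be spot-checked numerically. The sketch below is a sanity check under my own assumptions, not a proof: it samples mixed strategy profiles in three small example games that I picked (matching pennies as the constant-sum game, a simple coordination game as the common-payoff game, and the Prisoner's Dilemma with its pure Nash equilibrium $(D_{1}, D_{2})$). A small tolerance `tol` guards against floating-point noise.

```python
import random

def expected(payoffs, s1, s2):
    """Expected (P_1, P_2) when players mix over actions 0 and 1 with probabilities s1, s2."""
    totals = [0.0, 0.0]
    for a1, p1 in enumerate(s1):
        for a2, p2 in enumerate(s2):
            for i in (0, 1):
                totals[i] += p1 * p2 * payoffs[a1][a2][i]
    return totals

def has_defection(payoffs, s1, s2, weights=(1.0, 1.0), tol=1e-9):
    """True if some player has a pure action satisfying both defection conditions."""
    def welfare(payoff_profile):
        return sum(w * x for w, x in zip(weights, payoff_profile))

    base = expected(payoffs, s1, s2)
    for player in (0, 1):
        for action in (0, 1):
            pure = [1.0 if a == action else 0.0 for a in (0, 1)]
            dev = expected(payoffs, pure, s2) if player == 0 else expected(payoffs, s1, pure)
            if dev[player] > base[player] + tol and welfare(dev) < welfare(base) - tol:
                return True
    return False

# Example games (my choices): payoffs[a1][a2] = (P_1, P_2)
MATCHING_PENNIES = [[(1, -1), (-1, 1)], [(-1, 1), (1, -1)]]   # constant-sum
COMMON_PAYOFF = [[(2, 2), (0, 0)], [(0, 0), (1, 1)]]           # shared payoff function
PRISONERS_DILEMMA = [[(3, 3), (0, 4)], [(4, 0), (1, 1)]]       # action 0 = C, 1 = D

def sampled_profiles(trials=1000, seed=0):
    """Yield random mixed profiles together with random non-negative weights."""
    rng = random.Random(seed)
    for _ in range(trials):
        p, q = rng.random(), rng.random()
        yield (p, 1 - p), (q, 1 - q), (rng.random(), rng.random())

# Theorem 1: constant-sum game, equal weighting.
thm1 = not any(has_defection(MATCHING_PENNIES, s1, s2) for s1, s2, _ in sampled_profiles())
# Theorem 2: common-payoff game, arbitrary non-negative weighting.
thm2 = not any(has_defection(COMMON_PAYOFF, s1, s2, w) for s1, s2, w in sampled_profiles())
# Theorem 3: the Nash equilibrium (D, D), arbitrary non-negative weighting.
thm3 = not any(has_defection(PRISONERS_DILEMMA, (0, 1), (0, 1), w) for _, _, w in sampled_profiles())
print(thm1, thm2, thm3)
```

The checker is not vacuous: `has_defection(PRISONERS_DILEMMA, (1, 0), (1, 0))` returns `True`, since either player can defect from mutual cooperation.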

Definition: Pareto improvement

An action $a \in A_{i}$ is a Pareto improvement over strategy profile $s$ if, for all players $j$ , $P_{j} (a, s_{- i}) \geq P_{j} (s_{i}, s_{- i})$ .

Proposition 4: Pareto improvements are never defections
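One way to see this, using only the fact that utility weightings are non-negative: if $a$ is a Pareto improvement over $s$, then multiplying each player's inequality by $α_{j} \geq 0$ and summing over $j$ gives

$\sum_{j} α_{j} P_{j} (a, s_{- i}) \geq \sum_{j} α_{j} P_{j} (s_{i}, s_{- i}),$

which contradicts the social loss condition. So $a$ cannot be a defection under any weighting.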

We can prove that formal defection exists in the trifecta of famous games.

Three 2x2 payoff matrices illustrating defection in the Prisoner's Dilemma. ... (a) A "Symmetric game format" matrix with variable payoffs: R,R for mutual cooperation; S,T and T,S for mixed choices; and P,P for mutual defection. ... (b) An example where R > 1/2(T+S). Payoffs are 3,3; 0,4; 4,0; and 1,1. Red arrows show both players are always incentivized to defect. ... (c) An example where R ≤ 1/2(T+S). Payoffs are 3,3; 0,8; 8,0; and 1,1. Red arrows show that when one player cooperates and the other defects, the cooperator is incentivized to also defect. — In (a), variables stand for $T$ emptation, $R$ eward, $P$ unishment, and $S$ ucker. A 2×2 symmetric game is a *Prisoner’s Dilemma* when $T > R > P > S$ . Unsurprisingly, formal defection is everywhere in this game.

Theorem 5: In 2×2 symmetric games, if the Prisoner’s Dilemma inequality is satisfied, defection can exist against equal weightings.

Proof. Suppose the Prisoner’s Dilemma inequality holds. Further suppose that $R > \frac{1}{2} (T + S)$ . Then $2 R > T + S$ . Then since $T > R$ but $T + S < 2 R$ , both players defect from $(C_{1}, C_{2})$ with $D_{i}$ .
Suppose instead that $R \leq \frac{1}{2} (T + S)$ . Then $T + S \geq 2 R > 2 P$ , so $T + S > 2 P$ . But $P > S$ , so player 1 defects from $(C_{1}, D_{2})$ with action $D_{1}$ , and player 2 defects from $(D_{1}, C_{2})$ with action $D_{2}$ . ∎
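As a quick sanity check, both cases can be instantiated with the example payoffs from panels (b) and (c) of the figure, reading them as $(T, R, P, S) = (4, 3, 1, 0)$ and $(T, R, P, S) = (8, 3, 1, 0)$ respectively:

$\begin{aligned}
\text{(b)}\quad & R = 3 > 2 = \tfrac{1}{2} (T + S): && T = 4 > 3 = R, \quad T + S = 4 < 6 = 2 R, \\
\text{(c)}\quad & R = 3 \leq 4 = \tfrac{1}{2} (T + S): && P = 1 > 0 = S, \quad 2 P = 2 < 8 = T + S.
\end{aligned}$

In (b), each player gains by deviating from $(C_{1}, C_{2})$ while the total payoff falls from 6 to 4. In (c), the cooperating player gains by switching to $D_{i}$ while the total payoff falls from 8 to 2.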

Two 2x2 payoff matrices illustrating the Stag Hunt game. Matrix (a), the symmetric game format, shows payoffs as variables: Stag₁ vs Stag₂ is R, R; Stag₁ vs Hare₂ is S, T; Hare₁ vs Stag₂ is T, S; Hare₁ vs Hare₂ is P, P. Matrix (b), an example, shows numerical payoffs: Stag₁ vs Stag₂ is 4, 4; Stag₁ vs Hare₂ is 1, 3; Hare₁ vs Stag₂ is 3, 1; Hare₁ vs Hare₂ is 2, 2. — A 2×2 symmetric game is a *Stag Hunt* when $R > T \geq P > S$ . In Stag Hunts, due to uncertainty about whether the other player will hunt stag, players defect and fail to coordinate on the unique Pareto optimum $(Stag_{1}, Stag_{2})$ . In (b), player 2 will defect (play $Hare_{2}$ ) when $P (Stag_{1}) < \frac{1}{2}$ . In Stag Hunts, formal defection can always occur against mixed strategy profiles, which lines up with defection in this game being due to uncertainty.

Theorem 6: In 2×2 symmetric games, if the Stag Hunt inequality is satisfied, defection can exist against equal weightings.

Proof. Suppose that the Stag Hunt inequality is satisfied. Let $p$ be the probability that player 1 plays $Stag_{1}$ . We now show that player 2 can always defect against strategy profile $(p, Stag_{2})$ for some value of $p$ .
For defection’s first condition, we determine when $P_{2} (p, Stag_{2}) < P_{2} (p, Hare_{2})$ :
$\begin{aligned}
pR + (1 - p) S &< pT + (1 - p) P \\
p &< \frac{P - S}{( R - T ) + ( P - S )} .
\end{aligned}$
This denominator is positive ( $R > T$ and $P > S$ ), as is the numerator. The fraction clearly falls in the open interval $(0, 1)$ .
For defection’s second condition, we determine when
$\begin{aligned}
P_{1} (p, Stag_{2}) + P_{2} (p, Stag_{2}) &> P_{1} (p, Hare_{2}) + P_{2} (p, Hare_{2}) \\
2 pR + (1 - p) (T + S) &> p (S + T) + (1 - p) 2 P \\
p &> \frac{1}{2} \frac{( P - S ) + ( P - T )}{( R - T ) + ( P - S )} .
\end{aligned}$
Combining the two conditions, we have
$1 > \frac{P - S}{( R - T ) + ( P - S )} > p > \frac{1}{2} \frac{( P - S ) + ( P - T )}{( R - T ) + ( P - S )} .$
Since $P - T \leq 0$, the lower bound is at most half of the upper bound, which is positive. The combined inequality therefore holds on a nonempty subinterval of $[0, 1)$. ∎
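The bounds in this proof can be evaluated for the example Stag Hunt in panel (b), where $(R, S, T, P) = (4, 1, 3, 2)$. The following is a small sketch using exact rational arithmetic; the function names are illustrative, and the returned bounds still need to be intersected with $[0, 1]$ to be read as probabilities.

```python
from fractions import Fraction

def stag_hunt_defection_interval(R, S, T, P):
    """Bounds (lower, upper) on p = P(Stag_1) such that Hare_2 is a defection
    against (p, Stag_2) under the equal weighting, for lower < p < upper."""
    assert R > T >= P > S, "payoffs do not satisfy the Stag Hunt inequality"
    denom = Fraction((R - T) + (P - S))
    upper = Fraction(P - S) / denom                     # from the personal-gain condition
    lower = Fraction((P - S) + (P - T)) / (2 * denom)   # from the social-loss condition
    return lower, upper

def is_defection_at(p, R, S, T, P):
    """Directly check both conditions for player 2 switching Stag_2 -> Hare_2."""
    stay = (p * R + (1 - p) * T, p * R + (1 - p) * S)   # (P_1, P_2) under (p, Stag_2)
    move = (p * S + (1 - p) * P, p * T + (1 - p) * P)   # (P_1, P_2) under (p, Hare_2)
    return move[1] > stay[1] and sum(move) < sum(stay)

lower, upper = stag_hunt_defection_interval(4, 1, 3, 2)
print(lower, upper)                                   # 0 1/2
print(is_defection_at(Fraction(1, 4), 4, 1, 3, 2))    # True
print(is_defection_at(Fraction(1, 2), 4, 1, 3, 2))    # False
```

For these payoffs the interval is $0 < p < \frac{1}{2}$. At $p = \frac{1}{4}$, player 2's payoff rises from $\frac{7}{4}$ to $\frac{9}{4}$ while the total falls from 5 to 4.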

Two 2x2 payoff matrices illustrating the game of Chicken. Matrix (a) shows the symmetric format with variables: (Turn₁, Turn₂) yields R,R; (Turn₁, Ahead₂) yields S,T; (Ahead₁, Turn₂) yields T,S; (Ahead₁, Ahead₂) yields P,P. Matrix (b) gives a numerical example with the same actions. — A 2×2 symmetric game is *Chicken* when $T > R \geq S > P$ . In (b), defection only occurs when $\frac{10}{11} < P (Turn_{1}) < \frac{21}{22}$ : when player 1 is likely to turn, player 2 is willing to trade a bit of total payoff for personal payoff.

Theorem 7: In 2×2 symmetric games, if the Chicken inequality is satisfied, defection can exist against equal weightings.

Proof. Assume that the Chicken inequality is satisfied. This proof proceeds similarly to that of Theorem 6. Let $p$ be the probability that player 1’s strategy places on $Turn_{1}$.
For defection’s first condition, we determine when $P_{2} (p, Turn_{2}) < P_{2} (p, Ahead_{2})$ :
$\begin{aligned}
pR + (1 - p) S &< pT + (1 - p) P \\
p &> \frac{P - S}{( R - T ) + ( P - S )} \\
1 \geq p &> \frac{S - P}{( T - R ) + ( S - P )} > 0.
\end{aligned}$
The inequality flips when solving for $p$ because of the division by $(R - T) + (P - S)$, which is negative ($T > R$ and $S > P$). $S > P$, so $p > 0$; this reflects the fact that $(Ahead_{1}, Turn_{2})$ is a Nash equilibrium, against which defection is impossible (Theorem 3).
For defection’s second condition, we determine when
$\begin{aligned}
P_{1} (p, Turn_{2}) + P_{2} (p, Turn_{2}) &> P_{1} (p, Ahead_{2}) + P_{2} (p, Ahead_{2}) \\
2 pR + (1 - p) (T + S) &> p (S + T) + (1 - p) 2 P \\
p &< \frac{1}{2} \frac{( P - S ) + ( P - T )}{( R - T ) + ( P - S )} \\
p &< \frac{1}{2} \frac{( S - P ) + ( T - P )}{( T - R ) + ( S - P )} .
\end{aligned}$
The inequality again flips because $(R - T) + (P - S)$ is negative. When $R \leq \frac{1}{2} (T + S)$ , we have $p < 1$ , in which case defection does not exist against a pure strategy profile.
Combining the two conditions, we have
$\frac{1}{2} \frac{( S - P ) + ( T - P )}{( T - R ) + ( S - P )} > p > \frac{S - P}{( T - R ) + ( S - P )} > 0.$
Because $T > S$ ,
$\frac{1}{2} \frac{( S - P ) + ( T - P )}{( T - R ) + ( S - P )} > \frac{S - P}{( T - R ) + ( S - P )} .$
∎
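The closed-form interval can be cross-checked by brute force. The sketch below uses hypothetical Chicken payoffs $(R, S, T, P) = (0, 0, 1, -10)$, which I chose only because they satisfy $T > R \geq S > P$ and reproduce the bounds $\frac{10}{11}$ and $\frac{21}{22}$ quoted in the figure caption; they are not necessarily the figure's actual numbers. It scans a grid of values of $p$ and compares the direct two-condition test against the bounds derived in the proof.

```python
from fractions import Fraction

# Hypothetical Chicken payoffs (T > R >= S > P), chosen only for illustration.
R, S, T, P = 0, 0, 1, -10

def ahead_is_defection(p):
    """Both defection conditions for player 2 switching Turn_2 -> Ahead_2
    against (p, Turn_2), under the equal weighting."""
    stay = (p * R + (1 - p) * T, p * R + (1 - p) * S)   # (P_1, P_2) if player 2 turns
    move = (p * S + (1 - p) * P, p * T + (1 - p) * P)   # (P_1, P_2) if player 2 goes ahead
    return move[1] > stay[1] and sum(move) < sum(stay)

# Closed-form bounds from the proof.
denom = Fraction((T - R) + (S - P))
lower = Fraction(S - P) / denom
upper = Fraction((S - P) + (T - P)) / (2 * denom)

# Exact grid over [0, 1] with step 1/220, so both bounds land on grid points.
grid = [Fraction(k, 220) for k in range(221)]
brute_force = [p for p in grid if ahead_is_defection(p)]
closed_form = [p for p in grid if lower < p < upper]

print(lower, upper, brute_force == closed_form)   # 10/11 21/22 True
```

Exact `Fraction` arithmetic is used so that the strict inequalities at the endpoints are not blurred by floating-point rounding.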

This bit of basic theory will hopefully allow for things like principled classification of policies: “has an agent learned a ‘non-cooperative’ policy in a multi-agent setting?” For example, the empirical game-theoretic analyses of Leibo et al.’s Multi-agent Reinforcement Learning in Sequential Social Dilemmas say that apple-harvesting agents are defecting when they zap each other with beams. Instead of using a qualitative metric, you could choose a desired non-zapping strategy profile, and then use Leibo’s analysis tool to classify formal defections from that. This approach would still have a free parameter, but it seems better.

I had vague pre-theoretic intuitions about “defection”, and now I feel more capable of reasoning about what is and isn’t a defection. In particular, I’d been confused by the difference between power-seeking and defection, and now I’m not.

Thanks

Thanks to Michael Dennis for proposing the formal definition; to Andrew Critch for pointing me in this direction; to Abram Demski for proposing non-negative weighting; and to Alex Appel, Scott Emmons, Evan Hubinger, philh, Rohin Shah, and Carroll Wainwright for their feedback and ideas.
