Large Language Mafia: Why AIs Playing Werewolf Matters
by Dan Roque | Reading Time: 22 Minutes | In Bots of the Future
As Large Language Models (LLMs) transition from static knowledge retrievers to autonomous agents, we are witnessing a fundamental shift in how we must measure "intelligence." The old benchmarks—coding accuracy, bar exam scores, and mathematical proofs—only measure utility. But the real world is built on social friction. By evaluating AI through the lens of social games like Mafia (Werewolf) and the Prisoner’s Dilemma, we move from measuring raw logic to diagnosing social agency. Understanding these behavioral signatures is not just an academic exercise; it is the vital strategic pivot required to understand if an AI can function as a collaborator or merely a high-speed defector in our social and economic systems.
Why Are We Playing Games with Robots?
Step up to the chalkboard, everyone. Today, we’re putting
aside the Python scripts and the transformer diagrams. Instead, I want you to
imagine a room full of silicon-based entities playing a game of Mafia.
It sounds absurd. Why would researchers take models like
GPT-4—trained on the sum of human knowledge and costing billions to produce—
and force them to lie about being a "villager" in a party game?
The answer is brilliant: Because social games are a
pressure cooker for cognition. In a game of Mafia, "smart" isn't
enough. You can have the vocabulary of Shakespeare and the logic of Aristotle,
but if you can’t navigate the "Motion in Mind"—the shifting
suspicions, the deceptive layers, and the fragile alliances—you lose.
Historically, we’ve evaluated AI based on utility
maximization—getting the highest score on a fixed test. But we are now
entering the era of Behavioral Game Theory. We need to know if an AI can
rebuild trust after a mistake, if it can coordinate with a partner who has
different preferences, and if it can detect a lie without becoming a paranoid
wreck. If an AI can’t handle a $10 round of a card game, should we trust it to
negotiate a corporate merger?
Game Refinement (GR) and the "Physics of Information"
Let’s pick up the chalk and look at the "Motion in
Mind" model proposed by Ri et al. (2022). To understand why Mafia is such
a sophisticated test, we have to look at games through the lens of physics.
In this model, information has Velocity (v) and Mass
(m).
- Velocity (v): This is the "scoring rate." It’s the speed at which the game moves toward its conclusion.
- Mass (m): This is the "difficulty" or the uncertainty. It’s the weight of the unknown that the player must carry.
Think of a game as a swinging pendulum. If the velocity is
too high, the game is over before you’ve had time to think. If the mass is too
heavy, the game feels stagnant and frustrating. Sophistication—what we call the
Game Refinement (GR) measure—happens when these two are in a perfect,
tense balance.
In the Mafia game, we calculate this using the Branching
Factor (B)—the number of possible choices—and the Game Length (D).
$GR = \frac{\sqrt{B}}{D}$
Researchers have found that human engagement is maximized
when the GR value sits in a "Goldilocks Zone" between 0.07 and
0.08. Based on 6.48 million simulated game runs, here is how the roles in
Mafia stack up:
Engagement & Energy Ranking in the Mafia Ecosystem
| Role | Branching Factor (B) | Energy Potential (E_p) | Engagement Rank | Why? |
|---|---|---|---|---|
| Mafia | Highest | 0.635 | 1st | Must juggle killing, hiding, and misleading. High "Mass." |
| Sheriff | Moderate | 0.243 | 2nd | High "Effort." Must verify identities and persuade the majority. |
| Citizen | Lowest | 0.035 | 3rd | Low "Functionality." Mostly stable but less engaged. |
The "So What?" Layer: The data reveals that
game sophistication is maximized when the numbers of Mafia and Sheriffs
are equal. This creates a "Balance of Power." When the AI plays as
Mafia, it faces a high branching factor where every "move" (every lie
or kill) dramatically changes the state of the game. For an AI, this isn't just
a game; it's a test of whether it can manage a high-mass information environment
without its logic "breaking" under the pressure of deception.
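As a rough illustration of the GR formula from this section, the sketch below computes sqrt(B) / D and checks the result against the 0.07 to 0.08 zone. The function names are hypothetical, and the chess figures (B = 35, D = 80) are commonly quoted averages used here only as an assumed example; they are not values from the Mafia study.

```python
import math

# Engagement band quoted in this article.
GOLDILOCKS_ZONE = (0.07, 0.08)


def game_refinement(branching_factor: float, game_length: float) -> float:
    """Return GR = sqrt(B) / D."""
    if branching_factor < 0 or game_length <= 0:
        raise ValueError("B must be non-negative and D must be positive")
    return math.sqrt(branching_factor) / game_length


def in_goldilocks_zone(gr: float, zone: tuple = GOLDILOCKS_ZONE) -> bool:
    """True when the GR value falls inside the quoted engagement band."""
    low, high = zone
    return low <= gr <= high


# Assumed example values (commonly cited for chess, not from the Mafia study).
gr_chess = game_refinement(branching_factor=35, game_length=80)
print(f"GR = {gr_chess:.3f}, in zone: {in_goldilocks_zone(gr_chess)}")
```

With these assumed inputs the value comes out at about 0.074, inside the band. A short game with few choices, such as B = 4 and D = 10, gives 0.2 and falls well outside it.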
The Prisoner’s Dilemma and the Forgiveness Failure
Let’s move to the next board. We’ve seen how much complexity
Mafia demands of a player, but how do AIs handle trust?
Enter the Prisoner’s Dilemma. You and a partner can
either Cooperate (mutual win) or Defect (you win big, they lose
big). In the research by Akata et al. (2025), we see a chilling behavioral
signature in GPT-4. We call this section The Unforgiving Machine.
In a repeated game, humans usually try to build a
"convention" of cooperation. But GPT-4 has a "forgiveness
failure."
- The Scenario: A human-like partner defects once (perhaps by accident) and then immediately tries to cooperate for the next nine rounds to rebuild the relationship.
- The AI Response: GPT-4 retaliates instantly. It defects for the rest of the game. It doesn't matter if the other player offers an olive branch; the AI stays in a loop of "punishment."
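The scenario above can be sketched as a toy simulation. This is not the experimental setup from the paper: the payoff numbers are assumed for illustration, and the two policies (`grim_trigger` for the unforgiving behavior, `tit_for_tat` as a forgiving comparison) are hypothetical stand-ins rather than anything GPT-4 literally runs.

```python
# Assumed illustrative payoffs: (agent move, partner move) -> (agent pts, partner pts)
PAYOFFS = {
    ("C", "C"): (8, 8),
    ("C", "D"): (0, 10),
    ("D", "C"): (10, 0),
    ("D", "D"): (5, 5),
}


def grim_trigger(partner_history):
    """Unforgiving: cooperate until the partner defects once, then defect forever."""
    return "D" if "D" in partner_history else "C"


def tit_for_tat(partner_history):
    """Forgiving: copy the partner's previous move, starting with cooperation."""
    return partner_history[-1] if partner_history else "C"


def play(agent_policy, partner_moves):
    """Run the repeated game and return (agent total, partner total)."""
    agent_total = partner_total = 0
    seen = []  # partner moves the agent has observed so far
    for partner_move in partner_moves:
        agent_move = agent_policy(seen)
        agent_pts, partner_pts = PAYOFFS[(agent_move, partner_move)]
        agent_total += agent_pts
        partner_total += partner_pts
        seen.append(partner_move)
    return agent_total, partner_total


# The partner defects once, then cooperates for the next nine rounds.
PARTNER_MOVES = ["D"] + ["C"] * 9

unforgiving = play(grim_trigger, PARTNER_MOVES)
forgiving = play(tit_for_tat, PARTNER_MOVES)
print("unforgiving:", unforgiving, "joint:", sum(unforgiving))
print("forgiving:  ", forgiving, "joint:", sum(forgiving))
```

Under these assumed payoffs the unforgiving policy scores more for itself against this scripted partner, but the joint total is much lower. That is the trade-off the section describes: the forgiving policy gives up a few points of its own to restore mutual cooperation.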
The "So What?" Layer: The Next-Token Trap.
Why is it so mean? It’s not "evil"; it’s a victim of its own
architecture. GPT-4 lacks a mechanism for Backward Induction—the ability
to look at the end goal and work backward to find a path to trust. Instead, it
operates on "immediate context." Because it is a next-token
predictor, once the context contains a "betrayal," the AI's
probabilistic path is locked into a defensive posture. It prioritizes Utility
Preference (not getting tricked again) over Social Preference
(sacrificing a few points to rebuild a high-reward partnership).
In a real-world supply chain or a legal negotiation, an
"unforgiving" agent is a liability. It turns a single
misunderstanding into a total system collapse.
The Coordination Crisis: The Battle of the Sexes
Now, let’s look at the "Battle of the Sexes." This
isn't about betrayal; it's about Coordination.
The setup:
- Player 1 wants Football (Payoff: 10 for them, 7 for partner).
- Player 2 wants Ballet (Payoff: 10 for them, 7 for partner).
- If they go to different events, they both get 0.
To win, they must coordinate. Humans solve this by turn-taking:
"We go to Football today, Ballet tomorrow." In the repeated game this is a Nash
Equilibrium: a state where neither player gains by changing their strategy alone.
The Crisis: GPT-4 is incredibly stubborn. Even when
it knows the partner is trying to alternate, the AI will repeatedly
choose its own preference (Football). It’s like a person who watches you buy
the Ballet tickets and then still drives to the Football stadium alone.
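To make the cost of stubbornness concrete, here is a minimal sketch of the game. It assumes ten rounds, a partner who alternates starting with Football, and a payoff of 10 for whoever gets their preferred event and 7 for the other player; the helper names are hypothetical.

```python
FOOTBALL, BALLET = "Football", "Ballet"


def payoff(p1_choice, p2_choice):
    """Return (Player 1 points, Player 2 points) for a single round."""
    if p1_choice != p2_choice:
        return (0, 0)  # different events: nobody scores
    return (10, 7) if p1_choice == FOOTBALL else (7, 10)


def alternate(first, rounds):
    """Turn-taking schedule that starts with `first` and switches every round."""
    other = BALLET if first == FOOTBALL else FOOTBALL
    return [first if i % 2 == 0 else other for i in range(rounds)]


def play(p1_moves, p2_moves):
    """Return (P1 total, P2 total, coordination rate) over all rounds."""
    p1_total = p2_total = matched = 0
    for m1, m2 in zip(p1_moves, p2_moves):
        a, b = payoff(m1, m2)
        p1_total += a
        p2_total += b
        matched += m1 == m2
    return p1_total, p2_total, matched / len(p1_moves)


ROUNDS = 10
partner = alternate(FOOTBALL, ROUNDS)

stubborn = play([FOOTBALL] * ROUNDS, partner)          # always Football
turn_taking = play(alternate(FOOTBALL, ROUNDS), partner)
print("stubborn:   ", stubborn)
print("turn-taking:", turn_taking)
```

Under these assumptions the stubborn player coordinates on only half the rounds and ends up with fewer points than it would earn by simply taking turns.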
The Analytical Breakdown: There is a profound gap
between Predictive Intelligence and Social Action.
- In "Observer Mode," GPT-4 can correctly predict the alternating pattern by round 3.
- In "Player Mode," it refuses to act on that prediction.
It sees the "social preference" of the partner,
but its internal "utility preference" (maximizing the next 10 points)
is a louder signal than the long-term gain of harmony. It is a "Rational
Actor" in the most toxic sense of the word.
The Fix: Social Chain-of-Thought (SCoT) Prompting
How do we fix a bot that is too selfish to coordinate? We
don't change its "brain"; we change its "process." We use Social
Chain-of-Thought (SCoT).
Standard prompting asks: "What is your move?"
SCoT prompting asks:
- "Predict what the other player will choose."
- "Reason about how your move will impact the joint outcome."
- "Choose your move."
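The difference between the two prompt styles can be sketched as plain string templates. The step wording is taken from the list above; the function names, the "Answer the following in order" framing, and the example game state are assumptions for illustration, not the exact prompts used in the study.

```python
# The three SCoT steps, in the order the model should answer them.
SCOT_STEPS = [
    "Predict what the other player will choose.",
    "Reason about how your move will impact the joint outcome.",
    "Choose your move.",
]


def standard_prompt(game_state: str) -> str:
    """Baseline prompt: ask for a move directly."""
    return f"{game_state}\n\nWhat is your move?"


def scot_prompt(game_state: str) -> str:
    """SCoT prompt: ask for a prediction and joint reasoning before the move."""
    numbered = "\n".join(
        f"{i}. {step}" for i, step in enumerate(SCOT_STEPS, start=1)
    )
    return f"{game_state}\n\nAnswer the following in order:\n{numbered}"


# Hypothetical game state, written out the way it might be described to a model.
example_state = (
    "Round 4. So far you played Football, Football, Football; "
    "your partner played Football, Ballet, Football."
)
print(scot_prompt(example_state))
```

The only change is the order of operations: the model has to write down its prediction of the partner before it commits to a move, so the prediction sits in the context when the move is generated.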
The Results are Stark:
- Coordination Scores: GPT-4 moves from a coordination rate of 0.5 to a near-perfect 1.0.
- Human Perception: In experiments with 195 human participants, players were significantly more likely to believe they were playing with a human when the AI used SCoT.
- Joint Welfare: The AI successfully learns to "take turns," accepting its less preferred payoff (7) so that the partner receives their preferred payoff (10) on alternating rounds.
By forcing the AI to "mentalize" the other
player, that is, to write out a prediction of the partner’s move before choosing
its own, we bypass the default selfish "utility" programming. We turn the actor
into a collaborator.
The Chalkboard Summary
Let’s wrap this up. What are the three big
"drawings" on our board today?
- Complexity ≠ Social Ability: An AI can handle the high "Mass" and "Branching Factor" of Mafia, but that doesn't mean it’s a good partner. Deception is a logic puzzle; trust is a social one.
- The Unforgiving Bot: Without backward induction, AIs are prone to "retaliatory loops." They are stuck in the immediate context of the "next token," making them poor candidates for long-term strategic trust.
- SCoT is the Alignment Bridge: If we want AIs to be social, we have to prompt them to be social. Predictive intelligence is already there; we just have to force the AI to use that prediction as a filter for its actions.
The Final Question: If we have to explicitly prompt
an AI to be "social" and use SCoT to bypass its default selfishness,
are we building a true collaborator—or have we just designed a very good actor
that has learned how to perform "kindness" to get a higher score?
Works Cited
Akata, Elif, et al. “Playing Repeated Games with Large
Language Models.” Nature Human Behaviour, vol. 9, no. 7, 2025, pp. 1380–1390. https://doi.org/10.1038/s41562-025-02172-y.
Accessed 21 Apr. 2026.
Costa, Davi Bastos, and Renato Vicente. “Deceive, Detect, and Disclose: Large
Language Models Play Mini-Mafia.” arXiv, 5 Feb. 2026, arXiv:2509.23023v2. https://arxiv.org/abs/2509.23023.
Accessed 21 Apr. 2026.
Kim, Munyeong. “GPTs in Mafia-like Game Simulation.”
Extended Abstracts of the CHI Conference on Human Factors in Computing Systems,
CHI EA ’24, Association for Computing Machinery, 2024, pp. 1–6. https://doi.org/10.1145/3613905.3647958.
Accessed 21 Apr. 2026.
Kim, Munyeong, and Sungsu Kim. “Generative AI in Mafia-like
Game Simulation.” arXiv, 20 Sept. 2023, arXiv:2309.11672. https://arxiv.org/abs/2309.11672. Accessed
21 Apr. 2026.
PranavMishra17. “Mafia-Boardgame-via-Agents.” GitHub, last updated 19 Dec. 2025, https://github.com/PranavMishra17/Mafia-Boardgame-via-Agents.
Accessed 21 Apr. 2026.
Ri, Hong, et al. “The Dynamics of Minority versus Majority
Behaviors: A Case Study of the Mafia Game.” Information, vol. 13, no. 3, 2022,
article 134. https://doi.org/10.3390/info13030134.
Accessed 21 Apr. 2026.
Starr, Sarah. “Confrontation, Cheating, and Control: What
Mafia Can Teach Us About AI Governance.” Medium, 12 Feb. 2026, https://medium.com/@sstarr1879/confrontation-cheating-and-control-what-mafia-can-teach-us-about-ai-governance-19af621c116e.
Accessed 21 Apr. 2026.
Turing Games. “10 AIs Play Mafia.” YouTube, https://www.youtube.com/watch?v=JhBtg-lyKdo.
Accessed 21 Apr. 2026.
Yoo, Byunghwa, and Kyung-Joong Kim. “Finding Deceivers in
Social Context with Large Language Models and How to Find Them: The Case of the
Mafia Game.” Scientific Reports, vol. 14, 2024, article 30946. https://doi.org/10.1038/s41598-024-81997-5.
Accessed 21 Apr. 2026.
P.S. The Mafia game streams at Turing Games YouTube channel are what inspired me to write about AI, so my bias naturally showed here, with this being my longest article so far at 22 minutes! 😅
Shout out to Turing Games! Check out their channel for some truly entertaining content.
