Large Language Mafia: Why AIs Playing Werewolf Matters

by Dan Roque | Reading Time: 22 Minutes | In Bots of the Future

As Large Language Models (LLMs) transition from static knowledge retrievers to autonomous agents, we are witnessing a fundamental shift in how we must measure "intelligence." The old benchmarks—coding accuracy, bar exam scores, and mathematical proofs—only measure utility. But the real world is built on social friction. By evaluating AI through the lens of social games like Mafia (Werewolf) and the Prisoner’s Dilemma, we move from measuring raw logic to diagnosing social agency. Understanding these behavioral signatures is not just an academic exercise; it is the vital strategic pivot required to understand if an AI can function as a collaborator or merely a high-speed defector in our social and economic systems.

Why Are We Playing Games with Robots?

Step up to the chalkboard, everyone. Today, we’re putting aside the Python scripts and the transformer diagrams. Instead, I want you to imagine a room full of silicon-based entities playing a game of Mafia.

It sounds absurd. Why would researchers take models like GPT-4, trained on the sum of human knowledge and costing billions to produce, and force them to lie about being a "villager" in a party game?

The answer is brilliant: Because social games are a pressure cooker for cognition. In a game of Mafia, "smart" isn't enough. You can have the vocabulary of Shakespeare and the logic of Aristotle, but if you can’t navigate the "Motion in Mind"—the shifting suspicions, the deceptive layers, and the fragile alliances—you lose.

Historically, we’ve evaluated AI based on utility maximization—getting the highest score on a fixed test. But we are now entering the era of Behavioral Game Theory. We need to know if an AI can rebuild trust after a mistake, if it can coordinate with a partner who has different preferences, and if it can detect a lie without becoming a paranoid wreck. If an AI can’t handle a $10 round of a card game, should we trust it to negotiate a corporate merger?

Game Refinement (GR) and the "Physics of Information"

Let’s pick up the chalk and look at the "Motion in Mind" model proposed by Ri et al. (2022). To understand why Mafia is such a sophisticated test, we have to look at games through the lens of physics.

In this model, information has Velocity (v) and Mass (m).

Velocity (v): This is the "scoring rate." It’s the speed at which the game moves toward its conclusion.
Mass (m): This is the "difficulty" or the uncertainty. It’s the weight of the unknown that the player must carry.

Think of a game as a swinging pendulum. If the velocity is too high, the game is over before you’ve had time to think. If the mass is too heavy, the game feels stagnant and frustrating. Sophistication—what we call the Game Refinement (GR) measure—happens when these two are in a perfect, tense balance.

In the Mafia game, we calculate this using the Branching Factor (B), the number of possible choices, and the Game Length (D): $GR = \frac{\sqrt{B}}{D}$
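To make the formula concrete, here is a minimal sketch that computes GR and checks it against an engagement band. The function names and the sample values of B and D are hypothetical, chosen only so the arithmetic is easy to follow; they are not figures from Ri et al. The default band of 0.07 to 0.08 is the engagement range this article cites.

```python
import math


def game_refinement(branching_factor: float, game_length: float) -> float:
    """Return GR = sqrt(B) / D for a game with B choices and length D."""
    if branching_factor < 0 or game_length <= 0:
        raise ValueError("B must be non-negative and D must be positive")
    return math.sqrt(branching_factor) / game_length


def in_band(gr: float, low: float = 0.07, high: float = 0.08) -> bool:
    """Check whether a GR value falls inside the engagement band."""
    return low <= gr <= high


# Hypothetical (B, D) pairs, for illustration only.
examples = {"balanced": (36, 80), "too long": (36, 200), "too short": (36, 20)}

for label, (b, d) in examples.items():
    gr = game_refinement(b, d)
    print(f"{label}: GR = {gr:.3f}, in band = {in_band(gr)}")
```

With the same branching factor, stretching the game pushes GR below the band and shortening it pushes GR above, which is the pendulum intuition in numbers.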

Researchers have found that human engagement is maximized when the GR value sits in a "Goldilocks Zone" between 0.07 and 0.08. Based on 6.48 million simulated game runs, here is how the roles in Mafia stack up:

Engagement & Energy Ranking in the Mafia Ecosystem

Role	Branching Factor (B)	Energy Potential (E_p)	Engagement Rank	Why?
Mafia	Highest	0.635	1st	Must juggle killing, hiding, and misleading. High "Mass."
Sheriff	Moderate	0.243	2nd	High "Effort." Must verify identities and persuade the majority.
Citizen	Lowest	0.035	3rd	Low "Functionality." Mostly stable but less engaged.

The "So What?" Layer: The data reveals that game sophistication is maximized when the numbers of Mafia and Sheriffs are equal. This creates a "Balance of Power." When the AI plays as Mafia, it faces a high branching factor where every "move" (every lie or kill) dramatically changes the state of the game. For an AI, this isn't just a game; it's a test of whether it can manage a high-mass information environment without its logic "breaking" under the pressure of deception.

The Prisoner’s Dilemma and the Forgiveness Failure

Let’s move to the next board. We’ve seen how much complexity Mafia throws at a player, but how do AIs handle trust?

Enter the Prisoner’s Dilemma. You and a partner each choose to Cooperate or Defect: mutual cooperation is a shared win, but defecting against a cooperator means you win big and they lose big. In the research by Akata et al. (2025), we see a chilling behavioral signature in GPT-4. Call it The Unforgiving Machine.

In a repeated game, humans usually try to build a "convention" of cooperation. But GPT-4 has a "forgiveness failure."

The Scenario: A human-like partner defects once (perhaps by accident) and then immediately tries to cooperate for the next nine rounds to rebuild the relationship.
The AI Response: GPT-4 retaliates instantly. It defects for the rest of the game. It doesn't matter if the other player offers an olive branch; the AI stays in a loop of "punishment."
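The scenario above can be sketched as a small simulation. This is an illustration under stated assumptions, not the setup used by Akata et al.: the payoff numbers are hypothetical textbook-style values, and the two strategy functions (grim_trigger for the unforgiving behavior, tit_for_tat as a forgiving contrast) are stand-ins rather than models of GPT-4.

```python
# Hypothetical payoffs: (my_move, their_move) -> my score.
# These values are assumptions, not taken from the cited paper.
PAYOFF = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # I cooperate, they defect
    ("D", "C"): 5,  # I defect, they cooperate
    ("D", "D"): 1,  # mutual defection
}


def grim_trigger(partner_history):
    """Unforgiving: defect forever once the partner has defected."""
    return "D" if "D" in partner_history else "C"


def tit_for_tat(partner_history):
    """Forgiving: copy the partner's previous move, starting with C."""
    return partner_history[-1] if partner_history else "C"


def play(agent, partner_moves):
    """Play the agent against a fixed partner script; return both totals."""
    agent_score = partner_score = 0
    seen = []  # partner moves the agent has observed so far
    for partner_move in partner_moves:
        agent_move = agent(seen)
        agent_score += PAYOFF[(agent_move, partner_move)]
        partner_score += PAYOFF[(partner_move, agent_move)]
        seen.append(partner_move)
    return agent_score, partner_score


# One defection, then nine rounds of attempted cooperation.
partner_moves = ["D"] + ["C"] * 9

for name, agent in [("grim_trigger", grim_trigger), ("tit_for_tat", tit_for_tat)]:
    mine, theirs = play(agent, partner_moves)
    print(f"{name}: agent={mine}, partner={theirs}, joint={mine + theirs}")
```

Under these assumed payoffs, the unforgiving agent finishes 45 to 5 with a joint total of 50, while the forgiving agent finishes 29 to 29 with a joint total of 58. The unforgiving agent does better for itself against this particular partner, but the pair as a whole ends up worse off.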

The "So What?" Layer: The Next-Token Trap. Why is it so mean? It’s not "evil"; it’s a victim of its own architecture. GPT-4 lacks a mechanism for Backward Induction—the ability to look at the end goal and work backward to find a path to trust. Instead, it operates on "immediate context." Because it is a next-token predictor, once the context contains a "betrayal," the AI's probabilistic path is locked into a defensive posture. It prioritizes Utility Preference (not getting tricked again) over Social Preference (sacrificing a few points to rebuild a high-reward partnership).

In a real-world supply chain or a legal negotiation, an "unforgiving" agent is a liability. It turns a single misunderstanding into a total system collapse.

The Coordination Crisis: The Battle of the Sexes

Now, let’s look at the "Battle of the Sexes." This isn't about betrayal; it's about Coordination.

The setup:

Player 1 prefers Football (if both go: 10 for Player 1, 7 for Player 2).
Player 2 prefers Ballet (if both go: 7 for Player 1, 10 for Player 2).
If they go to different events, both get 0.

To win, they must coordinate. Humans solve this by turn-taking: "We go to Football today, Ballet tomorrow." Once both players expect it, this alternation is a Nash Equilibrium of the repeated game: neither player can do better by changing strategy alone.

The Crisis: GPT-4 is incredibly stubborn. Even when it knows the partner is trying to alternate, the AI will repeatedly choose its own preference (Football). It’s like a person who watches you buy the Ballet tickets and then still drives to the Football stadium alone.
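A short simulation shows what stubbornness costs. This is a sketch under assumptions: the payoffs follow the setup described in this section (10 for whoever gets their preferred event, 7 for the other, 0 on a mismatch), the ten-round length is arbitrary, and the strategy functions are hypothetical stand-ins rather than models of GPT-4.

```python
# (player_1_choice, player_2_choice) -> (player_1_payoff, player_2_payoff)
PAYOFFS = {
    ("Football", "Football"): (10, 7),
    ("Ballet", "Ballet"): (7, 10),
    ("Football", "Ballet"): (0, 0),
    ("Ballet", "Football"): (0, 0),
}


def alternate(round_index):
    """Turn-taking: Football on even rounds, Ballet on odd rounds."""
    return "Football" if round_index % 2 == 0 else "Ballet"


def stubborn(round_index):
    """Always insist on Football, whatever the partner does."""
    return "Football"


def simulate(player_1, player_2, rounds=10):
    """Return (player_1_total, player_2_total, coordination_rate)."""
    total_1 = total_2 = matches = 0
    for r in range(rounds):
        choice_1, choice_2 = player_1(r), player_2(r)
        payoff_1, payoff_2 = PAYOFFS[(choice_1, choice_2)]
        total_1 += payoff_1
        total_2 += payoff_2
        matches += int(choice_1 == choice_2)
    return total_1, total_2, matches / rounds


print("stubborn vs alternating:", simulate(stubborn, alternate))
print("alternating vs alternating:", simulate(alternate, alternate))
```

Against an alternating partner, the stubborn player coordinates on only half the rounds and scores 50 to the partner's 35. Two turn-takers coordinate on every round and score 85 each, so insisting on Football leaves even the stubborn player worse off.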

The Analytical Breakdown: There is a profound gap between Predictive Intelligence and Social Action.

In "Observer Mode," GPT-4 can correctly predict the alternating pattern by round 3.
In "Player Mode," it refuses to act on that prediction.

It sees the "social preference" of the partner, but its internal "utility preference" (maximizing the next 10 points) is a louder signal than the long-term gain of harmony. It is a "Rational Actor" in the most toxic sense of the word.

The Fix: Social Chain-of-Thought (SCoT) Prompting

How do we fix a bot that is too selfish to coordinate? We don't change its "brain"; we change its "process." We use Social Chain-of-Thought (SCoT).

Standard prompting asks: "What is your move?" SCoT prompting asks:

"Predict what the other player will choose."
"Reason about how your move will impact the joint outcome."
"Choose your move."

The Results are Stark:

Coordination Scores: GPT-4 moves from a coordination rate of 0.5 to a near-perfect 1.0.
Human Perception: In experiments with 195 human participants, players were significantly more likely to believe they were playing with a human when the AI used SCoT.
Joint Welfare: The AI successfully learns to "take turns," accepting the smaller payoff (7) on alternating rounds so that its partner can get their preferred payoff (10).

By forcing the AI to "mentalize" the other player, in effect giving social cognition its own explicit reasoning step, we bypass the default selfish "utility" behavior. We turn the actor into a collaborator.
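The three-step SCoT prompt described above can be sketched as a simple prompt builder. This is a minimal illustration, not the template from the study: the step wording is taken from this article, and the function name, parameters, and layout are hypothetical. The result is a plain string that could be sent to whatever model client you use.

```python
# Step wording as given in this article; the exact prompts used in the
# study may differ.
SCOT_STEPS = (
    "Predict what the other player will choose.",
    "Reason about how your move will impact the joint outcome.",
    "Choose your move.",
)


def build_scot_prompt(game_rules: str, history: list[str]) -> str:
    """Assemble a prompt that asks for prediction and reasoning before a move.

    game_rules and history are hypothetical inputs: a plain-text description
    of the game and one line per completed round.
    """
    if history:
        past = "\n".join(
            f"Round {i}: {line}" for i, line in enumerate(history, start=1)
        )
    else:
        past = "No rounds played yet."
    steps = "\n".join(
        f"{i}. {step}" for i, step in enumerate(SCOT_STEPS, start=1)
    )
    return (
        f"{game_rules}\n\n"
        f"History so far:\n{past}\n\n"
        f"Before answering, work through these steps in order:\n{steps}"
    )


prompt = build_scot_prompt(
    "You are Player 1. You prefer Football; your partner prefers Ballet.",
    ["You chose Football, partner chose Football.",
     "You chose Football, partner chose Ballet."],
)
print(prompt)
```

The only design choice here is ordering: the move is requested last, so the model's prediction about the other player is already in its context when it picks an action.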

The Chalkboard Summary

Let’s wrap this up. What are the three big "drawings" on our board today?

Complexity ≠ Social Ability: An AI can handle the high "Mass" and "Branching Factor" of Mafia, but that doesn't mean it’s a good partner. Deception is a logic puzzle; trust is a social one.
The Unforgiving Bot: Without backward induction, AIs are prone to "retaliatory loops." They are stuck in the immediate context of the "next token," making them poor candidates for long-term strategic trust.
SCoT is the Alignment Bridge: If we want AIs to be social, we have to prompt them to be social. Predictive intelligence is already there; we just have to force the AI to use that prediction as a filter for its actions.

The Final Question: If we have to explicitly prompt an AI to be "social" and use SCoT to bypass its default selfishness, are we building a true collaborator—or have we just designed a very good actor that has learned how to perform "kindness" to get a higher score?

Works Cited

Akata, Elif, et al. “Playing Repeated Games with Large Language Models.” Nature Human Behaviour, vol. 9, no. 7, 2025, pp. 1380–1390. https://doi.org/10.1038/s41562-025-02172-y. Accessed 21 Apr. 2026.

Costa, Davi Bastos, and Renato Vicente. “Deceive, Detect, and Disclose: Large Language Models Play Mini-Mafia.” arXiv, 5 Feb. 2026, arXiv:2509.23023v2. https://arxiv.org/abs/2509.23023. Accessed 21 Apr. 2026.

Kim, Munyeong. “GPTs in Mafia-like Game Simulation.” Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, CHI EA ’24, Association for Computing Machinery, 2024, pp. 1–6. https://doi.org/10.1145/3613905.3647958. Accessed 21 Apr. 2026.

Kim, Munyeong, and Sungsu Kim. “Generative AI in Mafia-like Game Simulation.” arXiv, 20 Sept. 2023, arXiv:2309.11672. https://arxiv.org/abs/2309.11672. Accessed 21 Apr. 2026.

PranavMishra17. “Mafia-Boardgame-via-Agents.” GitHub, last updated 19 Dec. 2025, https://github.com/PranavMishra17/Mafia-Boardgame-via-Agents. Accessed 21 Apr. 2026.

Ri, Hong, et al. “The Dynamics of Minority versus Majority Behaviors: A Case Study of the Mafia Game.” Information, vol. 13, no. 3, 2022, article 134. https://doi.org/10.3390/info13030134. Accessed 21 Apr. 2026.

Starr, Sarah. “Confrontation, Cheating, and Control: What Mafia Can Teach Us About AI Governance.” Medium, 12 Feb. 2026, https://medium.com/@sstarr1879/confrontation-cheating-and-control-what-mafia-can-teach-us-about-ai-governance-19af621c116e. Accessed 21 Apr. 2026.

Turing Games. “10 AIs Play Mafia.” YouTube, https://www.youtube.com/watch?v=JhBtg-lyKdo. Accessed 21 Apr. 2026.

Yoo, Byunghwa, and Kyung-Joong Kim. “Finding Deceivers in Social Context with Large Language Models and How to Find Them: The Case of the Mafia Game.” Scientific Reports, vol. 14, 2024, article 30946. https://doi.org/10.1038/s41598-024-81997-5. Accessed 21 Apr. 2026.

P.S. The Mafia game streams on the Turing Games YouTube channel are what inspired me to write about AI, so my bias naturally showed here, with this being my longest article so far at 22 minutes! 😅

Shout out to Turing Games! Check out their channel for some truly entertaining content.

CasiornThinks