Smallville: The Secret Lives of Generative Agents
by Dan Roque | Reading Time: 10 minutes | In Bots of the Future
Today, we’re talking about Generative Agents.
In the research paper, these are called interactive simulacra: computational
software agents that simulate believable human behavior. This isn't just a chatbot
you talk to; it's an AI that "lives" in a virtual world. These agents
make their own breakfast, head to work, form complex opinions, and interact
with each other without a human ever pulling the strings. They don’t just react
to prompts; they have lives. Let’s grab the chalk and break down how this
actually works.
The Smallville Experiment: More Than Just "The Sims"
The researchers created a digital sandbox called Smallville.
It’s a 2D world that looks like a classic ’90s video game, but with a
fascinating twist: the environment was actually modeled on Swarthmore College,
the alma mater of the lead researcher. They populated this world with 25 unique
agents, each starting with nothing but a single paragraph of "seed
memory."
Meet John Lin. His seed memory tells the AI he’s a
friendly pharmacist who loves his family and follows local politics. At the
start, John knows his wife Mei and son Eddy, but he doesn't know much else.
Here’s the "kicker": these agents aren't following
a script. When the researchers gave just one agent, Isabella, the intent to
throw a Valentine’s Day party, they didn't program the rest of the town to show
up. Instead, a social chain reaction occurred. Isabella invited people, who
then told other people. Those agents remembered the invitation, checked their
internal "calendars," coordinated with friends, and actually showed
up to decorate and celebrate.
In Smallville, we saw three specific emergent behaviors
that prove these agents are more than just fancy chatbots:
- Information Diffusion: When one agent mentioned a mayoral candidacy, the news spread from 4% of the town to 32% purely through autonomous agent-to-agent conversation.
- Relationship Memory: When Sam and Latoya met in the park and discussed a photography project, Sam didn't just forget it. Later, he autonomously remembered to ask her how the project was going.
- Coordination: Agents didn't just "know" a party was happening; they navigated the town, asked each other on dates to the event, and managed their time to be there at the right hour.
The "Chalkboard" Breakdown: The Three-Pillar Architecture
How does a piece of code "remember" a conversation
or plan a date? Imagine I’m drawing a brain on the board. To make an agent
believable, we aren't just giving it a CPU; we're giving it a life-log, a
philosopher’s brain, and a calendar. The researchers call this the Generative
Agent Architecture, and it stands on three pillars.
The Memory Stream (The Ledger)
Think of this as a long-term ledger of every single
experience the agent has, recorded in natural language. But an AI can’t hold
everything in its head at once; it would blow past the model's context window.
To solve this, the architecture uses a retrieval scoring system to decide which
memories surface right now.
The formula for what pops into an agent's head looks like
this: $$score = \alpha_{rec} \cdot recency + \alpha_{imp} \cdot importance + \alpha_{rel} \cdot relevance$$
- Recency: Events from this morning get a higher score than things from last week (via an exponential decay function).
- Importance: Mundane tasks like brushing teeth get a low score (around a 2), while poignant events like a breakup or a job offer get a high score (an 8 or 9).
- Relevance: If the agent is in a cafe, it retrieves memories about coffee and friends, not about its last trip to the pharmacy.
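To make that concrete, here's a minimal Python sketch of the retrieval pass. In the paper's implementation all three α weights are set to 1 and each component is normalized; the decay constant, the embedding-based cosine similarity, and the shape of the memory dictionaries below are illustrative assumptions, not the authors' exact code.

```python
import math

def recency_score(hours_since_access: float, decay: float = 0.995) -> float:
    # Exponential decay: memories touched recently score near 1.0.
    # (The constant is illustrative; the paper describes an exponential
    # decay over sandbox hours since the memory was last retrieved.)
    return decay ** hours_since_access

def relevance_score(query_vec, memory_vec) -> float:
    # Cosine similarity between an embedding of the current situation
    # and an embedding of the memory text.
    dot = sum(q * m for q, m in zip(query_vec, memory_vec))
    norms = (math.sqrt(sum(q * q for q in query_vec))
             * math.sqrt(sum(m * m for m in memory_vec)))
    return dot / norms if norms else 0.0

def retrieve(memories, query_vec, top_k=5):
    # memories: dicts with "text", "vec" (embedding), "importance"
    # (a 1-10 rating the LLM assigned when the memory was stored),
    # and "hours_since_access".
    def score(m):
        return (recency_score(m["hours_since_access"])    # alpha_rec = 1
                + m["importance"] / 10.0                  # alpha_imp = 1, scaled to [0, 1]
                + relevance_score(query_vec, m["vec"]))   # alpha_rel = 1
    return sorted(memories, key=score, reverse=True)[:top_k]
```

Only the top-scoring handful of memories get pasted into the prompt, which is how the agent stays within the context window while still "remembering" a whole lifetime.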
Reflection (The "Aha!" Moment)
Raw data isn't enough. If an agent only has observations, it
lacks "depth." Here’s a logic check: without reflection, an agent
named Klaus would only hang out with his neighbor Wolfgang because they are
physically close. But through the Reflection pillar, the agent
periodically stops to synthesize data.
Klaus looks at his memories of reading books and realizes: "I
am dedicated to research." He then notices Maria is also working on
research. He generates a high-level inference that they share a common
interest. Now, when asked who he wants to spend time with, he chooses Maria
over Wolfgang. Reflection turns "what happened" into "who I
am."
Planning (The Calendar)
To avoid "gluttonous" behavior—like an AI eating
lunch three times because it forgot it just ate—agents use Planning.
They create a top-down agenda. A vague goal like "Work on music" is
recursively broken down: first into hour-long chunks, then into 5-to-15 minute
action steps (e.g., "4:00 PM: grab a snack," "4:05 PM: take a
walk to clear head"). This ensures that if John Lin wakes up at 6:00 AM
(as he consistently does to start his morning routine), his actions at 2:00 PM
still make sense within the arc of his day.
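Sketched as code, planning is recursive prompting. Same caveats as before: ask_llm and the agent object are hypothetical stand-ins, and the prompts are paraphrases of the paper's approach rather than its exact templates.

```python
def plan_day(agent, ask_llm):
    # Pass 0: draft the day in broad strokes, conditioned on the agent's
    # summary description and a recap of the previous day.
    broad_strokes = ask_llm(
        f"{agent.summary}\nYesterday: {agent.yesterday_recap}\n"
        "Draft today's plan in 5-8 broad strokes."
    )

    plan = []
    for stroke in broad_strokes:
        # Pass 1: decompose each stroke into hour-long chunks.
        for chunk in ask_llm(f"Decompose into hour-long chunks: {stroke}"):
            # Pass 2: decompose each chunk into 5-15 minute action steps,
            # e.g. "4:00 PM: grab a snack."
            plan.extend(ask_llm(f"Decompose into 5-15 minute steps: {chunk}"))

    # The finished plan is stored in the memory stream, so retrieval can
    # answer "did I already eat lunch?" before the agent acts on impulse.
    agent.add_memory(plan, kind="plan")
    return plan
```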
The "Believability" Test: LLMs vs. Humans
Now, for the "Pepsi Challenge" of AI. The
researchers put their agents up against real humans in a believability contest.
They interviewed the agents and compared their answers to human crowdworkers
who were roleplaying as the characters.
Check out the results:
- TrueSkill Rankings: In a head-to-head "believability" ranking by independent evaluators, the Full Architecture (Memory + Reflection + Planning) actually outperformed the human roleplayers. The bots were more "in character" than the people. (A sketch of how those pairwise judgments become a ranking follows this list.)
- The 85% Identical Stat: According to Scientific American, when these agents were given the General Social Survey, a massive poll used to track American public opinion, their responses were 85% identical to those of their human counterparts. This wasn't just random guessing; they accurately predicted human attitudes on everything from vaccines to emotional coping strategies to policing.
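Here's roughly what that ranking machinery looks like with the open-source trueskill Python package. The condition names mirror the paper's setup, but the vote data below is invented purely for illustration.

```python
import trueskill  # pip install trueskill

# One rating per condition being compared. Names echo the paper's
# ablations; the match outcomes below are made up for illustration.
ratings = {name: trueskill.Rating() for name in
           ["full_architecture", "no_reflection",
            "no_planning", "human_crowdworker"]}

# Each evaluator vote is a pairwise comparison: (winner, loser),
# answering "which interview was more believable?"
votes = [("full_architecture", "human_crowdworker"),
         ("full_architecture", "no_planning"),
         ("human_crowdworker", "no_reflection")]

for winner, loser in votes:
    ratings[winner], ratings[loser] = trueskill.rate_1vs1(
        ratings[winner], ratings[loser])

# Rank conditions by a conservative skill estimate (mu - 3*sigma).
for name, r in sorted(ratings.items(),
                      key=lambda kv: kv[1].mu - 3 * kv[1].sigma,
                      reverse=True):
    print(f"{name}: mu={r.mu:.1f}, sigma={r.sigma:.1f}")
```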
Reality Check: The Glitches and the Price Tag
Before we think we’ve built Westworld, we’ve got to
keep it real. This technology is in its infancy, and it has some comically
human flaws.
- The Snack Table Hallucination: Agents sometimes "hallucinate" (or "embellish") memories. In one case, an agent asked to recount an embarrassing moment fabricated a vivid story about a college party where he lost his balance while dancing and fell onto a table full of snacks. It never happened, but the AI's "brain" connected disparate data points to concoct something plausible.
- Instruction Tuning Bias: Because models like GPT-3.5 are trained to be helpful assistants, the agents can be overly polite. They rarely refuse requests and often use formal language, even when talking to their own children.
- The Cost: This is the big one. Running a 3-day simulation of just 25 agents cost nearly $1,000 in API fees.
- Hardware Struggles: Over on the LocalLLaMA Reddit community, enthusiasts have tried running this on local hardware using models like Vicuna-13b. The results? It's slow, and smaller models often struggle with the complex logic needed for the "Planning" pillar, leading to a "garbage in, garbage out" cycle.
Why This Matters: From Gaming to Social Prototyping
This isn't just about making better video games, though game
developers are understandably losing their minds over it. The real power here
is Social Prototyping.
Imagine you’re a policymaker. Before you implement a new
online forum moderation rule or a public health intervention, you could test it
on 1,000 generative agents. You could observe how disinformation spreads
through a virtual community or how "simulated citizens" might react
to a pandemic response. This is Human-Centered Design on steroids. It
allows us to build and test social systems in a "rehearsal space"
before they ever touch real human lives, catching "dumpster fires"
before they start.
It’s a Tool, Not a Magic Trick
Generative Agents are a massive leap forward, but they
aren't magic. They are sophisticated mirrors—simulacra built on the patterns of
human data. They show us that with the right architecture, AI can maintain
long-term coherence and engage in the complex social dances that make us human.
As we move forward, the goal isn't to replace human
interaction but to use these agents to better understand ourselves and the
communities we build. Stay curious, keep looking under the hood, and remember:
AI is a tool we’re learning to wield together.
Works Cited
Park, Joon Sung, et al. "Generative Agents: Interactive
Simulacra of Human Behavior." arXiv, 2023,
arxiv.org/abs/2304.03442. Accessed 16 Feb. 2026.
Hacker News. "Generative Agents: Interactive Simulacra
of Human Behavior." Hacker News, 10 Apr. 2023,
news.ycombinator.com/item?id=35514309.
Miller, Katharine. "Computational Agents Exhibit
Believable Humanlike Behavior." Stanford HAI, 21 Sept. 2023,
hai.stanford.edu/news/computational-agents-exhibit-believable-humanlike-behavior.
Reddit. "Generative agents with open-sourced large
language models!" r/LocalLLaMA, 2023,
reddit.com/r/LocalLLaMA/comments/13af6yp/generative_agents_with_opensourced_large/.
Wright, Webb. "I Gave My Personality to an AI Agent.
Here’s What Happened Next." Scientific American, 18 Aug. 2025,
scientificamerican.com/article/can-a-generative-ai-agent-accurately-mimic-my-personality/. Accessed 16 Feb. 2026.
