Smallville: The Secret Lives of Generative Agents

by Dan Roque | Reading Time: 10 minutes | In Bots of the Future

Let’s be honest: keeping up with AI news right now feels like trying to take a sip of water from a high-pressure firehose. Between the "doom-scrolling" about job replacement and the constant hype cycles, it’s easy to feel overwhelmed by the noise. Today, I want to pull us out of that static. We’re going to look under the hood of a specific, breathtaking piece of research from Stanford and Google that changes the game entirely.

We’re talking about Generative Agents.

In the research, these are called Simulacra: computational software agents that simulate believable human behavior. This isn't just a chatbot you talk to; it's an AI that "lives" in a virtual world. These agents make their own breakfast, head to work, form complex opinions, and interact with each other without a human ever pulling the strings. They don't just react to prompts; they have lives. Let's grab the chalk and break down how this actually works.

 

The Smallville Experiment: More Than Just "The Sims"

The researchers created a digital sandbox called Smallville. It's a 2D world that looks like a classic 90s video game, but with a fascinating twist: the environment was modeled on Swarthmore College, the lead researcher's alma mater. They populated this world with 25 unique agents, each starting with nothing but a single paragraph of "seed memory."

Meet John Lin. His seed memory tells the AI he’s a friendly pharmacist who loves his family and follows local politics. At the start, John knows his wife Mei and son Eddy, but he doesn't know much else.

Here’s the "kicker": these agents aren't following a script. When the researchers gave just one agent, Isabella, the intent to throw a Valentine’s Day party, they didn't program the rest of the town to show up. Instead, a social chain reaction occurred. Isabella invited people, who then told other people. Those agents remembered the invitation, checked their internal "calendars," coordinated with friends, and actually showed up to decorate and celebrate.

In Smallville, we saw three specific emergent behaviors that prove these agents are more than just fancy chatbots:

  • Information Diffusion: When one agent mentioned a mayoral candidacy, news spread from 4% of the town (a single agent) to 32% (eight of the 25) purely through autonomous agent-to-agent conversation.
  • Relationship Memory: When Sam and Latoya met in the park and discussed a photography project, Sam didn't just forget it. Later, he autonomously remembered to ask her how the project was going.
  • Coordination: Agents didn't just "know" a party was happening; they navigated the town, asked each other on dates to the event, and managed their time to be there at the right hour.

 

The "Chalkboard" Breakdown: The Three-Pillar Architecture

How does a piece of code "remember" a conversation or plan a date? Imagine I’m drawing a brain on the board. To make an agent believable, we aren't just giving it a CPU; we're giving it a life-log, a philosopher’s brain, and a calendar. The researchers call this the Generative Agent Architecture, and it stands on three pillars.
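
Before we zoom in on each pillar, here's a bird's-eye sketch of how they plug into a single simulation "tick." To be clear, this is my own illustrative Python, not the paper's code: the `Agent` class shape, the generic `llm(prompt)` callable, and the placeholder `retrieve`/`reflect` bodies (fleshed out below) are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    seed_memory: str  # the single paragraph of starting identity
    memory_stream: list = field(default_factory=list)  # pillar 1: the ledger
    plan: list = field(default_factory=list)           # pillar 3: the calendar

    def retrieve(self, query: str) -> list:
        # Pillar 1: score memories by recency/importance/relevance.
        # Placeholder: just the most recent entries (real scorer sketched below).
        return self.memory_stream[-5:]

    def reflect(self, llm) -> list:
        # Pillar 2: synthesize raw observations into insights (sketched below).
        return []

    def step(self, observation: str, llm) -> str:
        """One simulation tick: observe -> remember -> reflect -> act."""
        self.memory_stream.append(observation)        # log the experience
        self.memory_stream.extend(self.reflect(llm))  # occasionally add insights
        context = self.retrieve(observation)          # pull what matters right now
        prompt = (f"{self.seed_memory}\nToday's plan: {self.plan}\n"
                  f"Relevant memories: {context}\n"
                  f"What does {self.name} do next?")
        return llm(prompt)  # the language model supplies the actual behavior
```

Everything an agent does ultimately comes out of a prompt like that last one; the three pillars exist to decide what goes into it.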

The Memory Stream (The Ledger)

Think of this as a long-term ledger of every single experience the agent has, recorded in natural language. But an AI can't hold "everything" in its head at once; the full ledger would quickly overflow the model's context window. To solve this, the architecture uses a retrieval scoring system to decide what surfaces right now.

The formula for what pops into an agent's head looks like this (there's a code sketch after the list below): $$score = \alpha_{rec} \cdot recency + \alpha_{imp} \cdot importance + \alpha_{rel} \cdot relevance$$

  1. Recency: Events from this morning get a higher score than things from last week (using an exponential decay function).
  2. Importance: Mundane tasks like brushing teeth get a low score (around a 2), while poignant events like a breakup or a job offer get a high score (an 8 or 9).
  3. Relevance: If the agent is in a cafe, it retrieves memories about coffee and friends, not about its last trip to the pharmacy.
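
Here's what that scoring might look like in practice. This is a minimal sketch under a few assumptions: memories are plain Python dicts, all three α weights are set to 1 (as in the paper), each component is min-max normalized to [0, 1] before summing, and recency decays exponentially per game hour (the paper uses a decay factor of 0.995).

```python
import math

def cosine(a, b):
    """Similarity between two embedding vectors (drives the relevance term)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def normalize(xs):
    """Min-max scale a list of raw scores to [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.5 for x in xs]

def retrieve(memories, query_embedding, now_hours, k=5, decay=0.995):
    """Rank memories by recency + importance + relevance, with equal alpha weights.

    Each memory is a dict with 'last_accessed' (in game hours), 'importance'
    (1-10, assigned by the LLM when the memory is written), and 'embedding'.
    """
    recency    = [decay ** (now_hours - m["last_accessed"]) for m in memories]
    importance = [m["importance"] for m in memories]
    relevance  = [cosine(m["embedding"], query_embedding) for m in memories]

    scores = [r + i + v for r, i, v in
              zip(normalize(recency), normalize(importance), normalize(relevance))]
    ranked = sorted(zip(scores, memories), key=lambda pair: pair[0], reverse=True)
    return [m for _, m in ranked[:k]]
```

The importance number itself comes from one more LLM call at write time: the model is asked to rate each new memory's poignancy on that 1-to-10 scale, which is how brushing teeth lands near the bottom and a breakup near the top.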

Reflection (The "Aha!" Moment)

Raw data isn't enough. If an agent only has observations, it lacks "depth." Here’s a logic check: without reflection, an agent named Klaus would only hang out with his neighbor Wolfgang because they are physically close. But through the Reflection pillar, the agent periodically stops to synthesize data.

Klaus looks at his memories of reading books and realizes: "I am dedicated to research." He then notices Maria is also working on research. He generates a high-level inference that they share a common interest. Now, when asked who he wants to spend time with, he chooses Maria over Wolfgang. Reflection turns "what happened" into "who I am."
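
Mechanically, reflection is just two more LLM calls over the memory stream. Here's a minimal sketch; the generic `llm(prompt)` function and the plain-string memory format are my assumptions. In the paper, reflection triggers once the summed importance of recently perceived events crosses a threshold (150 in their implementation), and the resulting insights are written back into the stream as ordinary memories.

```python
def reflect(memory_stream, llm, recent_n=100):
    """Distill recent observations into higher-level insights the agent can reuse."""
    recent = "\n".join(memory_stream[-recent_n:])

    # Step 1: ask the model what is even worth thinking about.
    questions = llm(
        "Given only the statements below, what are 3 of the most salient "
        f"high-level questions we can answer about the subjects?\n{recent}"
    )
    # Step 2: answer each question as an evidence-backed insight.
    insights = []
    for question in questions.splitlines():
        if not question.strip():
            continue
        insights.append(llm(
            f"Statements:\n{recent}\n\n"
            f"What high-level insight answers {question!r}? "
            "Cite the statements it rests on."
        ))
    # The insights re-enter the stream as ordinary memories, which is how
    # "Klaus is dedicated to research" becomes retrievable later on.
    return insights
```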

Planning (The Calendar)

To avoid "gluttonous" behavior—like an AI eating lunch three times because it forgot it just ate—agents use Planning. They create a top-down agenda. A vague goal like "Work on music" is recursively broken down: first into hour-long chunks, then into 5-to-15 minute action steps (e.g., "4:00 PM: grab a snack," "4:05 PM: take a walk to clear head"). This ensures that if John Lin wakes up at 6:00 AM (as he consistently does to start his morning routine), his actions at 2:00 PM still make sense within the arc of his day.
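
Here's a minimal sketch of that top-down decomposition, again with a generic `llm(prompt)` stand-in. The two-pass recursive structure follows the paper's description, but the exact prompt wording is invented for illustration.

```python
def decompose(item, granularity, llm):
    """Ask the model to split one plan item into finer-grained, timed steps."""
    steps = llm(f"Break this into {granularity} steps, one per line, "
                f"each prefixed with a clock time:\n{item}")
    return [s for s in steps.splitlines() if s.strip()]

def make_daily_plan(agent_summary, llm):
    """Top-down planning: day outline -> hour-long chunks -> 5-15 minute actions."""
    # Level 1: the whole day in broad strokes ("work on music, 1:00-5:00 PM").
    outline = llm(f"{agent_summary}\nOutline today's plan in 5-8 broad strokes, "
                  "one per line, each with a start time.")
    plan = []
    for item in outline.splitlines():
        if not item.strip():
            continue
        for chunk in decompose(item, "hour-long", llm):           # Level 2
            plan.extend(decompose(chunk, "5-to-15-minute", llm))  # Level 3
    return plan  # consulted at every tick, so 2:00 PM still fits the day's arc
```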

 

The "Believability" Test: LLMs vs. Humans

Now, for the "Pepsi Challenge" of AI. The researchers put their agents up against real humans in a believability contest. They interviewed the agents and compared their answers with those of human crowdworkers roleplaying the same characters.

Check out the results:

  • TrueSkill Rankings: In a head-to-head "believability" ranking by independent evaluators, the Full Architecture (Memory + Reflection + Planning) actually outperformed the human roleplayers. The bots were more "in character" than the people.
  • The 85% Match: According to Scientific American, when generative agents were given the General Social Survey (a massive poll used to track American public opinion), their responses matched their human counterparts' answers 85% of the time. This wasn't just random guessing; they accurately predicted human attitudes on everything from vaccines to emotional coping strategies to policing.

 

Reality Check: The Glitches and the Price Tag

Before we think we’ve built Westworld, we’ve got to keep it real. This technology is in its infancy, and it has some comically human flaws.

  • The Snack Table Hallucination: Agents sometimes "hallucinate" (or "embellish") memories. In one case, an agent was asked to recount an embarrassing moment. He fabricated a vivid story about a party in college where he lost his balance while dancing and fell onto a table full of snacks. It never happened, but the AI's "brain" connected disparate data points to concoct something plausible.
  • Instruction Tuning Bias: Because models like GPT-3.5 are trained to be helpful assistants, the agents can be overly polite. They rarely refuse requests and often use formal language, even when talking to their own children.
  • The Cost: This is the big one. Running a 3-day simulation of just 25 agents cost nearly $1,000 in API fees.
  • Hardware Struggles: Over on the LocalLLaMA Reddit community, enthusiasts have tried running this on local hardware using models like Vicuna-13b. The results? It’s slow, and smaller models often struggle with the complex logic needed for the "Planning" pillar, leading to a "garbage in, garbage out" cycle.

 

Why This Matters: From Gaming to Social Prototyping

This isn't just about making better video games, though game developers are understandably losing their minds over it. The real power here is Social Prototyping.

Imagine you’re a policymaker. Before you implement a new online forum moderation rule or a public health intervention, you could test it on 1,000 generative agents. You could observe how disinformation spreads through a virtual community or how "simulated citizens" might react to a pandemic response. This is Human-Centered Design on steroids. It allows us to build and test social systems in a "rehearsal space" before they ever touch real human lives, catching "dumpster fires" before they start.

 

It’s a Tool, Not a Magic Trick

Generative Agents are a massive leap forward, but they aren't magic. They are sophisticated mirrors—simulacra built on the patterns of human data. They show us that with the right architecture, AI can maintain long-term coherence and engage in the complex social dances that make us human.

As we move forward, the goal isn't to replace human interaction but to use these agents to better understand ourselves and the communities we build. Stay curious, keep looking under the hood, and remember: AI is a tool we’re learning to wield together.

 

Works Cited

Park, Joon Sung, et al. "Generative Agents: Interactive Simulacra of Human Behavior." arXiv, 2023, arxiv.org/abs/2304.03442. Accessed 16 Feb. 2026.

Hacker News. "Generative Agents: Interactive Simulacra of Human Behavior." Hacker News, 10 Apr. 2023, news.ycombinator.com/item?id=35514309. Accessed 16 Feb. 2026.

Miller, Katharine. "Computational Agents Exhibit Believable Humanlike Behavior." Stanford HAI, 21 Sept. 2023, hai.stanford.edu/news/computational-agents-exhibit-believable-humanlike-behavior. Accessed 16 Feb. 2026.

Reddit. "Generative agents with open-sourced large language models!" r/LocalLLaMA, 2023, reddit.com/r/LocalLLaMA/comments/13af6yp/generative_agents_with_opensourced_large/. Accessed 16 Feb. 2026.

Wright, Webb. "I Gave My Personality to an AI Agent. Here’s What Happened Next." Scientific American, 18 Aug. 2025, scientificamerican.com/article/can-a-generative-ai-agent-accurately-mimic-my-personality/. Accessed 16 Feb. 2026.

