
Can we turn agent-based models into empathetic stories (without getting poetic)?

5 min read

Our new publication just came out! Let's break it down.

Source paper: https://doi.org/10.1080/17477778.2025.2536663

Instead of asking “can ABMs predict?”, we ask something more human:

Can an LLM help decision-makers feel what simulated agents go through, by turning ABM traces into first-person stories, while staying faithful to the underlying dynamics?

The idea in one line

ABMs are great at structure (counts, curves, trends).
Stories are great at attention and care.
So we try to keep the ABM for structure and use the LLM for narrative bandwidth, without drifting into syrupy “LLM empathy” prose.

What we did

1) The models (3 case studies)

2) The pipeline (from traces → story)
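The paper has the full pipeline details; as a minimal sketch, assuming a trace is a list of per-step agent states (the `step`/`state` field names and the `trace_to_prompt` helper are hypothetical, not the paper's code), the traces → story step might look like:

```python
# Minimal sketch: turn an ABM trace into a prompt asking an LLM for a
# first-person story that stays faithful to the simulated dynamics.

def trace_to_prompt(agent_id: str, trace: list[dict]) -> str:
    """Summarize an agent's trajectory, then ask for a faithful story."""
    initial, final = trace[0], trace[-1]
    timeline = "\n".join(f"- step {t['step']}: {t['state']}" for t in trace)
    return (
        f"You are agent {agent_id} in a simulation.\n"
        f"Initial state: {initial['state']}\n"
        f"Final state: {final['state']}\n"
        f"Timeline:\n{timeline}\n\n"
        "Write a short first-person story that stays strictly faithful "
        "to these states and trends. Do not invent events."
    )

# Toy evacuation trace, invented for illustration
trace = [
    {"step": 0, "state": "at home, alarm sounding"},
    {"step": 5, "state": "in corridor, crowd forming"},
    {"step": 12, "state": "reached exit, safe"},
]
print(trace_to_prompt("A17", trace))
```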

3) The two ways we prompted empathy

We compared two prompting strategies: direct prompting, which explicitly asks the LLM to write an empathetic account, and indirect prompting, which instead applies a style transfer toward plain, readable prose and lets the events carry the emotion.
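To make the distinction concrete, here is roughly what the two templates could look like. The wording is illustrative, not the paper's actual prompts:

```python
# Illustrative templates mirroring the direct-vs-indirect distinction.

DIRECT = (
    "Rewrite this simulation summary as a first-person story. "
    "Make the reader feel deep empathy for the agent."
)

INDIRECT = (  # style transfer: constrain the form, not the feeling
    "Rewrite this simulation summary as a first-person story. "
    "Use short sentences and simple, everyday words. "
    "Report only what the agent experienced; do not add emotions "
    "that are not implied by the events."
)
```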

What we measured (and why)

We didn't just eyeball the stories. We measured:

- readability (Flesch Reading Ease);
- quality and faithfulness (does the story reflect the agent's initial/final states and trends?);
- human perception of empathy, via validated questionnaires in a pilot user study.
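On the readability side, Flesch Reading Ease is a standard formula: 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). A self-contained sketch with a deliberately naive syllable counter (a real evaluation would use a library such as textstat):

```python
import re

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease with a naive vowel-group syllable counter."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(
        max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words
    )
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)

# Scores of 70-80 read as "fairly easy"; 90-100 as "very easy".
print(round(flesch_reading_ease("I ran. The hall was full. I kept going."), 1))
```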

Results

1) Indirect > direct for readable writing

In a factorial design (144 runs), readability ranges from “fairly easy” to “very easy” depending on settings.

The biggest lever is indirect prompting (style transfer). In plain terms: shorter sentences and simpler words, without explicitly pleading for empathy.
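For intuition about where 144 runs come from, a full factorial design simply crosses every level of every factor. The factor names and levels below are hypothetical placeholders, since the paper's full factor list isn't reproduced here:

```python
from itertools import product

# Hypothetical factors: prompting strategy and model are from the text;
# the rest of the design (case studies, replicates) is assumed.
factors = {
    "prompting": ["direct", "indirect"],
    "model": ["llm_1", "llm_2", "llm_3"],
    "case_study": ["case_1", "case_2", "case_3"],
}
runs = list(product(*factors.values()))
print(len(runs), "base configurations")  # replicated to reach 144 runs
```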

2) Faithfulness improves (at least on the checks we ran)

With the indirect setup, every story used a human name, and every story reflected the agent's initial state, final state, and trends, an improvement over the direct approach's already-high rates.
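A minimal sketch of this kind of automated faithfulness check, using naive substring matching (the `check_story` helper and the story are invented for illustration):

```python
# Checks the properties named above: a human name, the initial and
# final states, and the trend, all appearing in the generated story.

def check_story(story: str, name: str, initial: str,
                final: str, trend: str) -> dict:
    s = story.lower()
    return {
        "has_name": name.lower() in s,
        "mentions_initial": initial.lower() in s,
        "mentions_final": final.lower() in s,
        "mentions_trend": trend.lower() in s,
    }

story = "My name is Mara. The corridor grew more crowded until I reached the exit."
print(check_story(story, "Mara", "corridor", "exit", "more crowded"))
```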

3) People believe the “genuine emotions” signal, but don’t fully feel it

Pilot survey: 6 participants (balanced gender; avg age 42; all graduate/professional degrees).

On the State Empathy Scale (1–5), participants rated the characters' emotions as genuine, but their scores for actually sharing those feelings were noticeably lower.

We argue this gap is expected: the scenarios are extreme (evacuations, disasters), and readers aren't in that physiological state, which makes it harder to empathize.

There's also evidence that outcomes vary by model (one-way ANOVA, p = 0.0349).
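That kind of test is a one-way ANOVA over per-model empathy ratings. A sketch with made-up ratings (not the paper's data) using scipy:

```python
from scipy.stats import f_oneway

# Illustrative ratings only: one sample of empathy scores per model.
ratings_model_a = [3.1, 3.4, 2.9, 3.6]
ratings_model_b = [2.2, 2.5, 2.8, 2.4]
ratings_model_c = [3.8, 3.5, 4.0, 3.7]

stat, p = f_oneway(ratings_model_a, ratings_model_b, ratings_model_c)
print(f"F = {stat:.2f}, p = {p:.4f}")  # p < 0.05 -> models differ
```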

Why this matters

ABMs usually talk in aggregates: curves, counts, means. That’s great for prediction. Bad for care.

Our paper shows a pragmatic path: keep the ABM as the source of structure (states, counts, trends) and use the LLM only as a narrative layer, constrained to stay faithful to the trace.

Some limitations

The user study is a small pilot (6 participants); faithfulness was verified only on the handful of checks described above (names, initial/final states, trends); and the extreme scenarios make fully felt empathy hard to elicit.