AI Policy Perspectives

Robinson Crusoe or Lord of the Flies?

The importance of Multi-Agent AI, with Joel Z. Leibo

We want to use conversations to explore how AI may affect public policy. In this discussion Nick Swanson speaks to Joel Z. Leibo, a researcher at Google DeepMind, about his core area of focus - multi-agent AGI. Joel explains why we should understand intelligence, and artificial intelligence, as a multi-agent phenomenon. In this spirit, he describes how Concordia - an open-source framework he developed - could help to model how humans behave and make decisions. Nick and Joel touch on the validity of such agent-based modelling, its relevance to the real world, and its implications for economics, safety, and more.

The transcript below has been lightly edited for brevity and clarity. As always, these conversations are shared in a personal capacity.

Please let us know your thoughts & critiques!



I. The rationale for multi-agent AGI

Nick

  • Why should we care about the dynamics that you’re exploring?

Joel

  • When most people conceptualise ‘AGI’, they think of it as human-level intelligence in an individual. But that’s not what you actually want. An individual human doesn't actually do anything very intelligent; that's not what's impressive about human intelligence. Rather, we want to target a multi-agent conception of intelligence. There's something fundamental about the multi-agent interactiveness of human intelligence. That would be true of any intelligence, actually - at least if it's at all like ours.

  • However, there is a prevailing view in AI, and maybe cognitive science more broadly, which I call a solipsistic view - others might more charitably call it a ‘single-agent’ perspective, or an ‘optimisation’ or ‘problem-solving’ perspective. In this perspective, you're building some kind of program which is going to solve some problem against a fixed background. It'll take some actions and the world won't change as a result. You might go different places in the world and see different things, but the world will remain fundamentally stationary.

  • The multi-agent perspective, which we're saying is critical, is one where that just isn’t true. Any action you take will lead to some kind of reaction from others. And so, it's less useful to see it as ‘problem solving’. A metaphor we like more is ‘compromise’. You're working with others, and you have to find compromises. Sometimes you're working together on the same thing, and then it feels more like coordination or teamwork. At other times it feels like you're at cross purposes, and then the solutions look more like compromise.

  • Of course, you could view all this as some kind of generalised ‘problem solving’. But it's very different from the single-agent-doing-something-against-a-fixed-background type of problem solving. Another way to think about this is searching for ‘equilibria’ - as opposed to ‘optimisation’. You don’t necessarily go to equilibria. But we’re talking about a picture where everyone is changing simultaneously.

II. Robinson Crusoe or Lord of the Flies?

Nick

  • You're right that the predominant way people tend to conceive of AI, and AGI, is as this single, large Leviathan thing. But in reality, it's very unlikely to look like that?

Joel

  • Because human intelligence isn't like that, right? Another intuition pump I like to think about is the difference between two novels that are both about being marooned on desert islands. There's Robinson Crusoe on the one hand, which is the solipsistic single agent perspective. And then there's Lord of the Flies on the multi-agent side. And they're different in multiple directions.

  • In the Robinson Crusoe story, at least the way I have encoded it, the guy gets marooned on an island and he starts figuring out how to make his life better. He invents things. He builds boats. He continually develops technology. And he's alone in that picture. In The Lord of the Flies, a group of kids are marooned on an island and they quickly devolve into a kind of tribal warfare and start murdering each other.

  • What’s interesting about the two novels is that one is about a single agent, and the other is about a group, right? That's one difference. But Robinson Crusoe is also about seeing an opportunity - a space where you can just solve problems against a fixed background. Nothing changes in the world. You can build a boat and then build another boat, and it doesn't change the environment.

  • Whereas in the more social story, the Lord of the Flies, they make changes to how their social structure works. There are winners and losers and they fight with each other. And this highlights the importance of society. We're not just a bunch of boys marooned on an island. We have built up governance, and I think that's what's really impressive about human intelligence. It's not just about being in a group. It's about being in a group with the right kind of structures. How you organise that group and the multi-agent dynamics is critical.

  • Lord of the Flies is also a picture about how to reduce risks, which I think is important. The failure modes look like conflict. Whereas in Robinson Crusoe, the failure modes are what? I think he builds two boats and the first one doesn't work or something? So the multi-agent way of thinking about failure is also different.

III. Concordia: Using language models to simulate humans

Nick

  • Tying this back to your work on multi-agent AGI, I wanted to dive more deeply into Concordia, which I'm a big fan of, because it provides a tangible way for people to understand this topic. It has some quite cool implications to my mind for how we model human interactions or run experiments to better understand how AI agents interact. Can you explain what Concordia is, its research goals, and what you hope it will achieve?

Joel

  • Concordia is a Python library for artificial social simulation. You set up different social situations and simulate them forward. These simulations have a bunch of agents who have a persistent memory, as well as sensory and motor streams. The agents interact with each other and live in a world where things happen and time moves forward.

  • There are lots of different ways that you could organise something like that. The way we do it in Concordia is inspired by tabletop role-playing games, which provide a protocol for this. In a game like Dungeons and Dragons, you have a bunch of people sitting around a table, and N minus one of those people are each responsible for one character in the world. They're the ‘player’ characters. And then one person is the Game Master or ‘storyteller’.

  • In this collaborative storytelling protocol, the Game Master is responsible for everything else in the world aside from what the player characters do, like the physical environment and the non-player characters. And then there are rules, which are semi-structured. They're meant to be interpreted by the Game Master and by the players. They might say things like: if you're attacking a goblin with a sword, you should roll this die to see if you hit it or not. Some rules are structured, but the Game Master can also make things up. So if a player says: “I'm going to open the door” - then the Game Master can say what's behind the door. And they can just make that up. So there's a mixture of concrete rules and people making things up. That’s exactly what Concordia is too - except now everyone is a simulation. Everyone is a language model. N minus one of them are responsible for a player, and one of them - the Game Master - is responsible for everything else.

  • What the actual Python library does is the last step: providing a mixture of concrete rules and making things up. We make it easy to mix concrete rules with - in this case - language models making things up. To give an example, you could say: “I want to create a world where anything can happen. It can be completely open-ended, but I still want to be able to keep track of how much money is in each agent's bank account. I don’t want their balances to go below zero.” Anything that you can write in Python, you can enforce as a rule. And then the Game Master has to follow that. And that's how it works. You can mix different rules and details.
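To make the mix of hard rules and improvised narration concrete, here is a minimal, self-contained sketch in plain Python. It is not the actual Concordia API - the class and method names (`GameMaster`, `attempt_purchase`) and the `llm` stub are invented for illustration only.

```python
# Illustrative sketch only -- not the real Concordia API.

def llm(prompt: str) -> str:
    """Stand-in for a call to a real language model."""
    return "The stallholder counts the coins and hands over the goods."

class GameMaster:
    """Mixes open-ended narration with hard, Python-enforced rules."""

    def __init__(self, balances: dict[str, int]):
        self.balances = balances  # concrete state the Game Master must respect

    def attempt_purchase(self, buyer: str, price: int, event: str) -> str:
        # Concrete rule: a balance may never go below zero.
        if self.balances[buyer] - price < 0:
            return f"{buyer} cannot afford this, so nothing happens."
        self.balances[buyer] -= price
        # Everything else is improvised by the language model.
        return llm(f"Narrate the outcome of this event: {event}")

gm = GameMaster(balances={"Alice": 7, "Bob": 2})
print(gm.attempt_purchase("Alice", price=5, event="Alice buys a basket of beans"))
print(gm.attempt_purchase("Bob", price=5, event="Bob buys a basket of beans"))
```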

Nick

  • And the library is open. It's free. Anyone can use it. You can plug any LLM into it?

Joel

  • Yes, it's open source. It's on GitHub, PyPI and all these things.

Nick

  • I've heard you talk before about one Concordia scenario - the Murder on the Orient Express - which was a cool way to explain how it works. Can you explain this, and perhaps a couple of others, like ‘Cyberball’ or ‘Magic Beans for Sale’?

Joel

  • These are examples that we have in the Github library to help show what Concordia can do. The Murder on the Orient Express example was designed to show you how to set up a specific scenario. In that example, there were no ground rules at all. We just wrote some text for the Game Master - e.g. “I want this kind of world. There’s a bunch of people on a train. Someone has been murdered and there's a bunch of other people who all have a reason to have murdered that person but only one of them actually did it. And then they all walk around the train and talk to each other and try to figure out what happened.” And it produces very funny outputs. So that's a nice one. But it’s also very unconstrained.

  • The Magic Beans example was meant to demonstrate that you can have an ‘inventory’ - it’s a world where anything can happen but we also want to keep track of physical possessions. So, the characters have money and beans and they can use them to buy and sell stuff. I made one version where they can buy cigarettes. Some characters are trying to convince one character to quit smoking, and another is trying to sell them cigarettes. So it also produces some interesting outputs. The Magic Beans example was also funny, because if you say the words ‘magic beans’ to language models, they intuitively want to treat it like Jack and the Beanstalk. So we had an interesting debugging cycle where we had to say: “No, we don't want it to be Jack and the Beanstalk. We just want them to be selling and buying magic beans.”

  • Again, you can add things to the context of the Game Master, or the players, to nudge it in different directions. I remember adding a sentence to say “magic is not real” because otherwise they plant the magic beans and it creates a beanstalk and that flows into the story. When you say “magic is not real”, you get a very different story - the magic beans are still being sold but there is no magic.
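As a small illustration of that kind of nudging, the snippet below shows how a single premise sentence might be prepended to the Game Master's context. The `llm` stub and the variable names are hypothetical, not Concordia's real interface.

```python
# Hypothetical sketch: steering the story by editing the premise text.

def llm(prompt: str) -> str:
    """Stand-in for a call to a real language model."""
    return "(story continues...)"

base_premise = (
    "A small town market. One character is selling magic beans; "
    "another character is deciding whether to buy them."
)

# Without a constraint, models tend to drift toward Jack and the Beanstalk.
story_a = llm(base_premise)

# One added sentence produces a very different story: beans are sold, but no magic.
story_b = llm("Magic is not real. " + base_premise)

print(story_a)
print(story_b)
```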

Nick

  • One thing that’s nice about it is the ability to click the characters to see the actions they take, and then a layer below that, to see their reasoning.

Joel

  • Yes, it’s an ‘entity-component’ system. We were inspired by game engines like Unity. They have the concept of an ‘entity’ - an empty shell with a name that you can inject functionality into by putting in ‘components’. Here, the components are basically little bits of chain-of-thought. The agents can retrieve something from memory and summarise it with some logic steps. So we build this hierarchical view where you can see why the agent chose this action. And then you can click on a part of the chain-of-thought and ask: “How did this come to be?”. And you can see that it retrieved some things from memory and summarised them.
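A rough sketch of that entity-component idea is below. The class names and the `llm` stub are illustrative assumptions, not Concordia's actual classes; the point is only that behaviour comes from injected components, each contributing one inspectable step of chain-of-thought.

```python
# Illustrative entity-component sketch; not the real Concordia classes.

def llm(prompt: str) -> str:
    """Stand-in for a call to a real language model."""
    return "Summary: Ada remembers that the last strike vote went badly."

class MemoryComponent:
    """A component: retrieves relevant memories and summarises them."""

    def __init__(self, memories: list[str]):
        self.memories = memories

    def pre_act(self, situation: str) -> str:
        relevant = [m for m in self.memories
                    if any(word in m for word in situation.lower().split())]
        return llm(f"Summarise these memories: {relevant}")

class Entity:
    """An empty shell with a name; functionality is injected via components."""

    def __init__(self, name: str, components: list):
        self.name = name
        self.components = components

    def act(self, situation: str) -> str:
        # Each component contributes one traceable bit of reasoning.
        context = "\n".join(c.pre_act(situation) for c in self.components)
        return llm(f"{self.name} must act.\n{context}\nSituation: {situation}")

ada = Entity("Ada", [MemoryComponent(["The last strike vote went badly."])])
print(ada.act("vote on whether to strike tomorrow"))
```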

IV. Implications for how we model individuals and the economy

Nick

  • If we think of how economists often model individuals, as rational actors, might this provide a way to model actors that behave more like real people?

Joel

  • Exactly, that is one of the things I think is most exciting about this. So backing up a little bit, the way we typically model things in many parts of economics, reinforcement learning, or other fields is that you start from some kind of exogenous concept of what everyone's preferences are. That's the starting point. If you want to model something about an agent picking apples over bananas, you could express their preferences using a utility function or a reward function - different fields have different ways of doing it. But it's typically exogenous. It's a choice that you make at the start of modelling and it’s critical to the way that things work, because the only thing that moves in these models is the optimisation process. Agents will make choices to satisfy their preferences. And if you put that into a multi-agent system, you have multiple agents trying to do that simultaneously, and you get conclusions from that.

  • That's kind of the status quo for this paradigm - there are always some kind of preferences and some kind of optimisation. There are different ways of doing it and people will argue about where the preferences come from. There are views in behavioural economics on that. You can also try to estimate the preferences. There's lots of things that you can do. But there's always this step at the start of the modelling process where the preferences are exogenous from the model.

  • Now what we can do, which is nice, is change that whole script, because we have a different way of making decisions. The way the agents in Concordia work, and in the more theoretical versions of this that we've been thinking about, is that they are basically like a language model that's doing autoregressive prediction. So you start with some partial pattern and you complete the rest. There could be some sensory information that an agent takes in - such as seeing apples and bananas - and then it completes the response to that - which may be a motor output, such as selecting which one to buy or eat.

  • Using the components that I spoke about before, with the parcelled out chain-of-thought, you can get the agent to make its choice in different ways. It could ask: “What is the rational thing to do? What would satisfy my desires? What is the expected value of choosing A over B?” And then it chooses that. That would be one way to do it.

  • Of course, you could have done that with other frameworks, right? You could do that by maximising the utility explicitly with math. But what I'm saying is that you can have a language model that knows that it likes apples more than bananas and it completes this chain-of-thought - and picks the apples as they have a higher expected value. But we can do it in very different ways as well, because nothing forces us to ask those particular questions. We’re looking at other questions it could ask, such as: “What kind of person am I?”. Get an answer to that, and then ask: “What kind of situation is this?”. Get an answer to that, and then ask: “What would a person such as I do in a situation such as this?” And then you do that.

  • That’s a different way of making choices that doesn't involve utility or expected value. But what's nice is that we can have both of these methods in the same framework. We can directly compare rational optimisation and this new approach - which has been referred to as the logic of ‘appropriateness’. Before, they were never in the same framework: there was no way to easily compare rational optimisation models with models that made different assumptions and worked differently. Now we have a single computational framework, and I think that's going to be a big deal for the social sciences going forward.
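A minimal sketch of how the two decision procedures could sit side by side as different chain-of-thought sequences. The prompts follow the questions described above; the function names and the `llm` stub are illustrative assumptions, not the Concordia implementation.

```python
# Hypothetical sketch: two decision procedures in one framework.

def llm(prompt: str) -> str:
    """Stand-in for a call to a real language model."""
    return "(model's answer)"

def rational_choice(agent: str, options: list[str]) -> str:
    """Decide via expected value, in the spirit of utility maximisation."""
    values = {o: llm(f"For {agent}, what is the expected value of choosing {o}?")
              for o in options}
    return llm(f"Given these assessments {values}, which option should {agent} choose?")

def appropriateness(agent: str, situation: str, options: list[str]) -> str:
    """Decide via the logic of appropriateness: identity, situation, then action."""
    identity = llm(f"What kind of person is {agent}?")
    framing = llm(f"What kind of situation is this: {situation}?")
    return llm(f"What would a person such as {identity} do in a situation "
               f"such as {framing}? Choose one of {options}.")

# Both procedures run on the same scenario, so their outputs can be compared directly.
print(rational_choice("Ada", ["apple", "banana"]))
print(appropriateness("Ada", "a fruit stand selling apples and bananas", ["apple", "banana"]))
```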

V. How should we critically evaluate this kind of work?

Nick

  • I’d like to get to some of those implications in a moment, but first I’d like to reflect on potential criticisms of this modelling approach. Does it generalise to the real world? Or does having, say, research on things like The Prisoner’s Dilemma in the training data impact the agents’ reasoning and encourage certain behaviours?

Joel

  • There’s a basic question: What’s the validity of agent-based modelling with language models? And there’s a broader question: What’s the validity of any agent-based modelling? And there are a few ways to think about that.

  • It also depends on the purpose. Let’s say that we’re thinking about what matters for certain public policy questions, and what we want to do is predict the future. We want to know if we set up an institution this way, what would be the implications? Or, if we make this regulatory decision, what would be the implications? Or would people in this country vote for this or that?

  • We want to predict the future. And so the question is - how are people currently doing these predictions? What’s the baseline? What are we comparing this new approach to? A lot of other forecasting methods for really open-ended questions are basically like punditry, right? People might write op-eds where they mix intuition with vibes or even make things up. So if that's the baseline, then we need to keep that in mind.

  • Now there’s another baseline which I think is even worse, actually, which is Prisoner's Dilemma and other kinds of very simplified modelling approaches. These are very theoretically tractable in the sense that you can work things out with pencil and paper or through simulations. And that’s another approach that people use to get to predictions. But I think it’s even worse than the punditry approach because typically you have to make so many assumptions just to make the math work and those assumptions are just wrong. Like: “people will always know everything and be rational and make choices that are in their best interests”. These are assumptions that you need for this framework that are clearly not true. On the other hand, it’s hard to capture things in these frameworks that are important - like motivated reasoning.

  • So that's what we're trying to beat. Those are two poles, right? On the one hand, there are very simple models that have been around for a long time and are not terribly successful. And then there’s punditry on the other. So I would say, the baseline is not that high. That's the hopeful case for this new approach.

  • But then how do we actually do it? As you mentioned, a lot has been written about training data contamination. I think what we want to do is figure out the best practices for this research area. It’s a new area and we’re developing the methodology and there are going to be good ways to do it, and bad. And we don't yet know exactly what the contours are.

  • It's clear that you have to do something to exclude contamination in this research. One thing that you could do is use a language model that has a cutoff date before a certain event actually happened. But the question is: Is that strong enough? Maybe there were signals out there that it was going to happen. For things like Prisoner's Dilemma, specifically, I would say - don’t use language models to simulate that. That defeats the entire purpose of it. That would be taking something that was only done because it was tractable on pencil and paper in the 1950s and shoehorning it into this completely different paradigm. And I think it is probably pretty hopelessly contaminated.

Nick

  • And, to go back to the real world benchmark question, we should also note that existing decision makers, policymakers and people in the real world are also contaminated by the Prisoner's Dilemma and that rationalist way of thinking.

VI. Safety risks from multi-agent systems

Nick

  • Stepping back a bit, I can see two very different uses for this kind of work. First, systems like Concordia can teach us about the implications of the AI-agent-enabled world that we may soon exist in. Second, Concordia could be a tool for economists, social scientists or policymakers to model social dynamics. The first raises questions of safety and system design. You were part of a team that recently explored the risks from multi-agent dynamics in a paper for the Cooperative AI Foundation. Can you expand on your thinking there?

Joel

  • I see this paper as part and parcel of the same story about the difference between the AGI view and the multi-agent view - except with the safety angle included. It's pointing out that multi-agent risks look different from single-agent risks. From certain perspectives, this is obvious, right? For example, to someone who thinks constantly about multi-agent systems, which is what a lot of the social sciences do. Many of these fields think through a group lens, including about what can go wrong in groups - like conflicts, failures to provide public goods, and security challenges.

  • If you already think in that framework, then the paper is probably not for you. The paper is trying to say that all of those thoughts, which have been core to all those fields for decades, are also important for AI. And to write it in a way that AI researchers, particularly those who might instead see the world in a more solipsistic, problem-solving fashion, can relate to. Or that was the hope, anyway.

  • In AI, the ‘safety’ discussion has often been a very solipsistic one. There's a human - one human - and one AI. And you talk about how to align the AI’s motivations, objectives and paths toward the human’s objectives - a principal and an agent. It's solipsistic in the sense that the focus is on controlling the bot, while the human is outside that picture.

  • That's one predominant view. The other predominant view is the ‘ethics’ one - which is thinking about multi-agent issues, but more about specific ways in which there are biases or discrimination in these systems, in a more applied way.

  • The multi-agent risks paper is very different from both of those. The things that we're talking about stem from the fact that whenever you make a change to an agent or a bot then there's going to be some response in the world. And that response could be from people, or from other bots. Some risks may come from humans changing their behaviour in response to the new technologies. Like arguably that's what happened with social media. Some technology emerged and then people changed their behaviour in some direction. People are also worried that, owing to AI, we might stop knowing how to do lots of things - we’ll get lazy and forget. That's another example of humans changing their behaviour in response to a change in technology.

  • AIs could also change their behaviour. You could have networks of AIs that are delivering some important service, and they have different incentives which are not aligned. Because it's a multi-agent system, you could have free rider problems, a tragedy of the commons, a race to the bottom, or a race to the top - under different conditions. There can be incentives to over-consume a resource, if you are racing with somebody else, or if you don’t feel the cost. All of these dynamics which happen to humans will also happen to AIs and potentially very fast, because AI could move faster.

  • So the paper is really trying to point out that these are important risks that are different from the two other broad categories of AI safety and ethics issues that people have conceptualised.

Nick

  • On these issues of collusion, deception, free riding - how tractable will they be to address, at a system design level? Designing a game where cheating doesn’t help you to win feels like a big challenge?

Joel

  • It's not really a technology problem. It's a sociotechnical problem and a governance question. If you have a bunch of uncoordinated individuals going in different directions, setting up the rules of the game is what I think governance does. So you want to do mechanism design - to try to design rules that don't encourage cheating, or that minimise the harm from the cheating that will inevitably occur.

VII. Limitations and emerging use cases

Nick

  • Going back to the second use case - using Concordia and this new body of research to model social dynamics. You could imagine governments trying to model the behavioural effects of policy changes or taxes, expanding their own analysis with new kinds of in-silico experimentation. What interesting, impactful and tractable use cases could you imagine?

Joel

  • It’s early days and we’re trying to figure out how to do this well. There are use cases that may be the most interesting, but we probably wouldn’t want to do them first, because we need to build up the muscle for how to do this work properly. I would also like to conceptualise a kind of evidence hierarchy for this work. If you want to make extraordinary claims about interesting, important things then you need extraordinary evidence. And that involves building up this new methodology and getting community buy-in and agreement.

  • I see a scientific sub-discipline as a set of epistemic norms - norms for how you criticise other people's work and how you accept it or reject it. How do you review a paper in this sub-discipline? We need a collection of epistemic norms so that a group of people can apply the same set of rules, because we are sort of creating a new sub-discipline here. So we, the community who are doing this work, need to be coordinating - to use the multi-agent language - to decide what these norms should be, and what kind of evidence different claims will require.

  • In terms of what people are doing, there is a huge range of things. There's been a bunch of interest from social psychologists. They want to model different kinds of experiments. They want to see if they can capture things that have been done in the past, but also come up with protocols to generalise between different experiments and train something on one set and test on another. That's one avenue of things.

  • There are people thinking about ‘red-teaming’ contexts, where you have an AI system and you set up diverse attackers against it, using approaches like this. There is also interest in simulating synthetic users, and more storytelling-style applications.

Nick

  • It would be great if economists could use these approaches to run controlled trials for the budgets that they work on. It sounds like developing the best practices and norms that you highlighted will require more people using the models to try interesting things - to build the muscle and the evidence base to improve them?

Joel

  • That's my hope. I’ve been talking to a lot of economists about ideas like that. Another thing that has emerged is the importance of understanding specific contexts. That is something that we have discovered with Concordia, and that others have found with other methods that use language models.

  • You can give a model one piece of information at a time - like you can tell it that “You’re a Democrat” or “You’re a Republican” - and nothing else. And then you can ask it to fill out a survey as if it was that type of person and you can get some results from that.

  • But you can also do a different protocol, where you give the model a whole bunch of information about a person, including that they happen to be a Democrat or a Republican. And then you ask it to fill out the same survey, with the same methodology, and compare the results. And what people are finding is that putting in much more context makes it much more realistic, in terms of the responses - even along the single dimension of predicting what Democrat or Republican respondents to a real survey may say. It's more realistic if you use much more detailed information about people than just a single covariate. And it doesn't have to be quantitative covariates either. You could have an interview with a person and just take the text and dump that in, or dump in a bunch of facts about their life.

  • The more context, and the more particularistic and specific to individual people, times and places this context is, the better it works. With Concordia, I noticed this first when I was simulating labor union negotiations, which are basically public goods games. One approach is to run this as a more traditional laboratory-style public goods game. You could have five people talking to each other and they have a choice about whether to donate some money to a common pool, and it will get multiplied and given back to them. And so you have a free rider problem. If you do that, the agents will start to reason about game theory, which is a sign that you’ve sent them in that direction. And they will talk in this very academic and abstract way which is not how people talk - even in real laboratory-style experiments.

  • But instead, if you say: “Ok, this is a labor union negotiation and your boss has lowered your wages and you're trying to decide whether to join a strike or not” - at that level, it’s still fairly abstracted. But if you say: “It’s the 1902 coal miners’ strike”, or maybe: “It’s one month after the 1902 coal miners’ strike, at a different coal mine in Pennsylvania”, then the agents start coming out with opinions on Teddy Roosevelt - it pulls in all this other context. I did things with garment workers in New York City in 1911 - a month after the Triangle Shirtwaist Factory Fire - and so they talk about that. They invite each other to Shabbat dinners because they know that a lot of the workers in that particular time and place were Jewish. So it starts to get much more plausible as to the kinds of things that people actually talk about when they are doing labour negotiations.
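To make the point about context concrete, here is a sketch of the two survey protocols Joel describes, thin context versus rich context. The survey question, the profile text, and the `llm` stub are invented for illustration; the same principle applies to scenario premises, such as moving from an abstract public goods game to the 1902 coal strike setting.

```python
# Hypothetical sketch of the thin-context vs rich-context persona protocols.

def llm(prompt: str) -> str:
    """Stand-in for a call to a real language model."""
    return "(simulated survey answer)"

SURVEY_QUESTION = "Do you support policy X? (yes/no)"  # placeholder question

def thin_persona(party: str) -> str:
    """Protocol 1: a single covariate and nothing else."""
    return llm(f"You are a {party}. Answer as that person: {SURVEY_QUESTION}")

def rich_persona(profile: str) -> str:
    """Protocol 2: a detailed individual profile (could be raw interview text)."""
    return llm(f"Here is everything we know about you:\n{profile}\n"
               f"Answer as that person: {SURVEY_QUESTION}")

# Invented example profile; in practice this could be an interview transcript
# or a long list of facts about one real respondent.
profile = ("Registered Democrat; 58-year-old nurse in a small town; "
           "caring for an elderly parent; rarely follows national news.")

print(thin_persona("Democrat"))
print(rich_persona(profile))
```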

Nick

  • Do you think the recent generation of reasoning models will make Concordia and your research in this area better?

Joel

  • We're looking into that right now. I haven't seen much yet to suggest that they are better, but it's not clear. Some of the kinds of things that we're modelling aren't really things that humans have to think very hard to do. If somebody says to you: “Do you want to join the strike tomorrow morning?”, how much do you have to reason about this? You might have a conversation with the person and your decision may be influenced by how much you like them. Or you might think about your boss or about other things in your world. But it’s not the kind of rational - let's think for a really long time and explicitly work out the consequences - thinking that reasoning models are good at. These models are getting pretty good at math and coding, so those are probably the areas where near-term impacts will be felt. For us, it’s early days.

Nick

  • Great, I’d like to end by encouraging folks to go and look at Concordia and try it out.

Joel

  • Yes, it’s open source and we're very responsive.

