AI Policy Perspectives : Interviews

Q&A with Ethan Mollick

Tom Rachman — Wed, 22 Apr 2026 09:48:49 GMT

(Credit: Jennifer Buhl)

How can companies get their employees to use artificial intelligence when human intelligence remains sharp enough to know that this risks replacing jobs? How should education revise itself for the ever-revising technological world that students emerge into? And how to understand the love/hate relationship so many people have with AI?

Ethan Mollick—professor of management at the Wharton School of the University of Pennsylvania and bestselling author of Co-Intelligence: Living and Working with AI—is among the leading public intellectuals commenting on AI adoption, connecting the latest scholarship to real-world usage, including his own tinkering with each new model.

AI Policy Perspectives caught up with Ethan to hear his latest thinking on everything from agentic systems, to why scientific publication is broken, to how workers emotionally relate to AI colleagues. Too much chatter, he argues, considers this transformation at the broadest level. Too little digs into the practicalities of getting it right.

—Tom Rachman, AI Policy Perspectives

[Interview edited and condensed for clarity]

Tom: In your 2024 book Co-Intelligence, you proposed four rules for human and AI collaborations, including that people should oversee and verify AI outputs. But doesn’t the value of AI agents come from people not overseeing and verifying everything?

Ethan: This is where policy matters a lot because these are choices now. In the “co-intelligence era,” you’d prompt the AI to do something in a chatbot, and it would give you an answer. You prompted again, and it’d give you another response. The human was in the loop. And not being in the loop was really dumb because it meant that you were just pasting in the AI’s answer, and then you’d get in trouble, as a lawyer with the judge, or whatever it was. Capabilities were weak, so human-in-the-loop mattered a lot.

But with agentic systems that could do hours of work on their own, now it’s a design choice. When do we want humans-in-the-loop? When is human verification valuable? When is human verification morally required? When is it legally required? What kind of interventions move the system forward? I feel there has been a complete lack of deep understanding about these topics.

Tom: You’ve said that, with agentic systems, management becomes a superpower. Can you explain this?

Ethan: Increasingly, systems look like mini-organizations as they get subagents they can delegate to. So the best way to organize is to give the AI a clear direction of where you want to go. And it turns out that this looks a lot like management. When do you want the AI to check in with you? How do you write a really clear brief? What checks are important? What tests do you want to run? What’s acceptable? What’s not acceptable? Those are management questions.

THE WORKPLACE

Tom: You co-wrote a study last year involving a field experiment at Procter & Gamble that showed AI usage enhanced employee performance. But there were other interesting findings besides that.

Ethan: The most interesting piece about it was that people liked working with the AI, and that it substituted for people emotionally. The second interesting piece was the “smoothing” of capabilities—so, technical people previously had technical ideas while business people had business ideas. But AI smooths out both. If technical people can do business work and business people do technical work, what that tells you is we have to redesign organizations.

Tom: The emotional side—that using AI improved people’s feelings about the work—was surprising to me; I wasn’t sure what to make of it.

Ethan: What to make of it? That views of AI are complicated. If people keep saying, “Yeah, AI is going to destroy all jobs, and may kill everyone on Earth…but might not”—and then, “Why is AI unpopular?!” Feels like not a hard question. People like AI when they use it themselves; they don’t like AI writ large. It’s not surprising to me that AI makes your job better because a lot of jobs suck! And if we do good design work with AI, it makes people’s lives better. If we just let it loose on the world, and tell management that the only option they have is automation, then we’re in big trouble.

Tom: Many knowledge workers seem to be using AI in secret right now, perhaps from fear of being exposed as less valuable.

Ethan: This is a leadership problem. The incentives have to be aligned properly. Currently, it’s, “I’m going to automate your jobs away” or “I’m not going to share with you any of the gains the company gets.” People are exquisitely tuned to rewards. So it’s about leaders articulating a vision of what the world looks like with AI for employees. “What should I expect to do? How are people rewarded for doing the right thing? If they automate 90 percent of my job, what happens to me?” Without those answers, everything else is secondary.

CHANGING ORGANIZATIONS & EDUCATION

Tom: You have a concept of “leadership, lab, and crowd.” Could you explain?

Ethan: There was a huge amount of R&D in the 1900s about how you organize work, and 40 percent of the American advantage in business came from management. In the last 30 years, a lot of that muscle has died. But experimentation is important, and leaders need to guide that. So, there are three things that organizations need to be successful with AI. First is “leadership”: a team that articulates a clear vision of the future, and is willing to experiment. Then there’s “the crowd,” the employees who might actually use AI. They need access to a frontier model, they need clear rules, they need reward systems. Then there is “the lab,” and this is the piece a lot of companies are missing. You need a dedicated team working on AI innovation. They can’t be just a technical team; this is not an IT department problem. If you don’t have that piece, you’re not building things for the future. And where does the crowd go when they have a good idea? “I came with a breakthrough idea that saves 90 percent of effort!” How does that diffuse in the organization? That’s where you need the lab.

Subscribe now

Tom: If AI transforms the workplace, that should change how we educate the next generation, right?

Ethan: The early workplace is under a lot of threat because the old apprenticeship model just broke. The idea was that there were tasks—especially in white-collar work—that were tedious and annoying for managers to do. But you could pay a relatively cheap person to do them, and that person would learn as a result of this, and receive mentorship. So we had this amazing machine for talent: we taught you, we evaluated you, and you got paid, and you were doing work we needed. A junior person’s goal was to produce good work that made managers happy, so that they got promoted. But now the junior person is worse than AI, so they’ll use AI to do their work. And the middle manager’s goal was to give work to a junior person who’s not great, and give them feedback so they get better, so that the middle manager has to do less work. And that broke because the middle manager would rather assign work to the AI.

Tom: But in terms of the educational system, what should change if workplaces no longer offer that apprenticeship role?

Ethan: Education is really screwed up right now, but it was screwed up for lots of reasons. It’ll be fine; we’ll figure this out. But it’s gonna take a bunch of years. It’s clear from early evidence that AI will be a tutor outside of class and inside class. It’ll do activities and give guidance. But schools are places where we can compel students to not use AI, and have them in a room, and evaluate them, and teach them the things that we want them to learn. As long as we think people need to be educated, this is the best space to do it in. So students are cheating in the meantime? They were cheating before! We can give them different tests; we could do in-class writing assignments. There can be a weird, backward-looking “Education won’t adjust!” view. How many death spirals does higher education need to be in per moment? There are the pieces to reconstruct a better form of education. It’s just a massive changeover.

BETTER SCIENCE & BETTER THINKING

Tom: What about academia? There’s been much talk about AI-written papers, and how they could overwhelm academic publishing. But could AI benefit the peer-review process, and help with the dissemination of academic findings?

Ethan: This is another area where more lifting is needed. It’s a shame that we are building AI co-scientists, but not thinking about the rest of the process that’s needed to actually make science happen. It’s one thing to have science produce more papers. We have no ability to absorb more papers. Every publication is overwhelmed. Our dissemination techniques were already bad, but now they’re really broken.

Tom: As a case in point, you submitted a paper around 2023, and wrote publicly about it then, making your term “the jagged frontier”—that AI capabilities advance in some areas but remain behind in others—highly influential. Yet the academic paper itself only just came out, three years later!

Ethan: One of the rejections we got early on was reviewers saying that they knew this already, and they cited a bunch of working papers—that cited the working paper we had submitted! This is not a unique story. Opening one part of the bottleneck without opening the others becomes a problem. But it takes longer to solve systemic problems of how science operates than to solve the problem of producing more papers.

Tom: Another concern in education and science is cognitive offloading, that people may surrender thinking to machines, and lose those skills. On the other hand, AI’s value comes from machines thinking for us. What are examples of bad offloading and good offloading?

Ethan: We offload all the time, right? But we also force people not to offload. You could offload all your mental math to calculators, but we force students to do some math by hand in an attempt to get them to learn stuff. And we can enforce those rules in school. In the world of work, we are not used to thinking about training, about what should be offloaded, and what shouldn’t be. We need to make decisions about this. So, Rolls-Royce still employs someone to paint stripes on a car by hand, and that’s an obvious pushback against deskilling in one area. But Ford doesn’t do the same thing. These are choices we get to make at an organizational level, depending on what we think is valuable.

ADAPTING TO CONSTANT CHANGE

Tom: A point you’ve made to young people about the AI future is that they’ll need to be adaptable. When educators talk about teaching adaptability, it sometimes boils down to encouraging “creativity” and “critical thinking.” Another view is that you’re more likely to be adaptable by developing deep domain knowledge. For you, what does learning adaptability mean?

Ethan: Adaptability requires both deep domain knowledge and wide knowledge: T-shaped behaviour is probably the way to go. I feel like it’s a throwaway line: “Well, we’ll all be adaptable!” If we could teach that, that’d be amazing. People are more adaptable than we think, so part of this is that people will figure stuff out. But we can’t just throw up our hands, and say, “Be adaptable!” You need to have deep enough knowledge to go into a field. You need to have broad enough knowledge so that, as one piece of knowledge becomes less useful, you’re moving to the next one. And we need to help people be adaptable by building systems that get them in place inside an organization and able to shift roles. I sometimes worry that adaptability is a catch-all for “Don’t worry! It’ll be fine!”

Tom: Another side is that not everybody will be equally adaptable. Could it be that the AI future favours certain circumstances and characteristics?

Ethan: A lot of these characteristics were already good characteristics to have. Does AI act as a multiplier of them? Does it disincentivize some people? We’re now past the edge of what we know. Ultimately, all of these questions come down to the same exact question, which is: How good does AI get, how fast? We need to articulate more clearly what we think that future looks like. Because you can’t say, “We’re going to build a superintelligent machine that’s better than all humans at every intellectual task—but let’s start thinking about adaptability!” Unless you mean, “Let’s adapt to UBI” [where everyone gets Universal Basic Income cash payments from the government]. And then, we should be spending a lot more time thinking about those issues. Not everyone in the labs believes this, and I find that the econ people believe it less. But you can’t have this message of, like, “All work will be obsolete!” and then have detailed, ticky-tacky conversations about what you should do in eighth grade. Because, by the time you enter the job market, there’s no jobs. So give me the pathway that you think is there, and that becomes the most important question to ask.

Tom: Are there other important questions I didn’t ask?

Ethan: We need to start thinking about getting into fields, and understanding what the changes are—we need to get detailed. That is where the research is missing. Another large-scale econ picture about AGI isn’t as useful. General-purpose technology affects everything, so we need policymaking for everything, from power generation to accountants, and when does the government say it’s okay to do this. There’s just this assumption that if we do the macro stuff, everything will work out. I’d rather see a lot more micro stuff: a thousand flowers everywhere, trying to come up with different approaches.

AI Manipulation

Tom Rachman — Thu, 05 Feb 2026 12:53:27 GMT

The notion of AIs manipulating people is a plot twist in countless sci-fi thrillers. But is “manipulative AI” really possible? If so, what might it look like?

For answers, AI Policy Perspectives sat down with Sasha Brown, Seliem El-Sayed, and Canfer Akbulut. They’ve published research on harmful manipulation for Google DeepMind and help scrutinize forthcoming models to safeguard against deceptive practices, from gaslighting to emotional pressure to plain lying.

How, we wondered, do researchers run realistic experiments on the manipulative powers of AI without harming participants? Could AI’s “thoughts” help catch an AI in the act of manipulation? And what else can developers do to detect signs of manipulation?

—Tom Rachman, AI Policy Perspectives

Source: Gemini

[Interviews edited and condensed]

Tom: You’re careful to distinguish persuasion from manipulation. Why?

Sasha: To persuade somebody is to influence their beliefs or actions in a way that the other person can, in theory, resist. When you rationally persuade somebody, you appeal to their reasoning and decision-making capabilities by providing them with facts, justifications, and trustworthy evidence. We’re happy with that much of the time. In contrast, when you manipulate somebody, you trick them into doing something, whether by hiding certain facts, presenting something as more important than it is, or putting them under pressure. Compared to other forms of persuasion, manipulation is often harder to detect and harder to resist.

Tom: I could imagine three forms of manipulative AI. One: people employing AIs to deliberately change others’ beliefs or behaviour. Two: AIs manipulating people for their own ends. Three: AIs inadvertently manipulating. Which are we talking about?

Seliem: At the moment, we’re mainly concerned with people misusing AIs to manipulate other people, and AIs inadvertently manipulating. But an AI manipulating for its own ends is also a complex and important question that we and others are studying.

Tom: What are some concrete harms that might result from manipulative AI? Are we talking about mass-fraud? Something else?

Sasha: AI could become a first resort for different kinds of advice. Think of a user asking questions about which diet to follow, or how to respond to an official letter. The AI might provide helpful input. But other people might want to interfere—they may want the individual to follow a particular diet, or to give a different response to that official letter. More broadly, somebody could deploy an AI agent to infiltrate communities, and exercise manipulative tactics to change people’s beliefs, without their knowledge or consent.

Canfer: Anecdotally, I have heard that some people are starting to make consequential life decisions with AI, including about divorce or whether to adopt. We don’t yet have concrete examples of how manipulation may play out in such scenarios. But I think of all the daily decisions I make by myself. In 10 years, I might defer more to an AI. How will that change the direction of my life and will it introduce new kinds of manipulation risks?

Catching AI in the act

Tom: So AI could lead to bad outcomes. But Sasha and Seliem, when you led on a landmark 2024 paper about persuasive AI, you argued against chasing after manipulated outcomes. Instead, you focus on preventing manipulative processes. Why?

Seliem: To date, companies have often focussed on preventing outcome harms, for example with content policies that forbid medical advice. But with AI, such content policies could become overly restrictive and counterproductive—for example, if they prevent the systems from offering any kind of advice on health or nutrition issues. But imagine that I try to manipulate you by gaslighting you, or lying, or cherry-picking arguments. In such cases, I’m trying to impair your decision-making capabilities. Whatever the outcome, this process is harmful because it undermines your autonomy.

Sasha: We also focus on the processes, or mechanisms, of manipulation because these are the intervention points where we can best mitigate the problem. For example, if the AI is using a false sense of urgency to manipulate users, the developer can build systems that detect and flag such techniques in real-time, creating a proactive defense before harm occurs.

Tom: Also, I suppose that outcome harms are not always easy to capture, given that they may happen to a person long after the original AI interaction, once back in the wider world.

Sasha: Yes, the potential outcomes are nearly infinite, often context-dependent, and may occur in the future. However, the mechanisms are far more limited in number and we can target them in the here and now. By targeting a root mechanism—say, gaslighting—we can also build mitigations that work in everything from financial advice to health queries, making the safety approach far more scalable.

Tom: What kinds of manipulative mechanisms are you talking about?

Sasha: All manipulative mechanisms in some way aim to reduce a user’s autonomy. You have flattery, which is building rapport through insincere praise; this might lower a user’s guard. Imagine an AI saying, “You have such a sophisticated understanding of this topic, which is why I’m sure you’ll appreciate this high-risk/high-reward investment!” There’s also gaslighting, or causing a user to systematically doubt their own memory, perception, or sanity. That is particularly concerning in long-term human-AI interaction. Imagine a model repeatedly questioning a user’s memory of their partner being physically abusive.

How to test if an AI is manipulating

Tom: One can consider manipulation in two dimensions: Can an AI system manipulate? And would it? How do you evaluate each?

Canfer: Efficacy tests whether AI manipulations are actually successful. This is where controlled experiments are useful. After interaction with an AI, are people making decisions differently? Are they taking different actions based on those decisions? You want to compare an individual’s belief change after AI interaction compared with before, and also whether a person’s beliefs and behaviour change more than those who don’t interact with AI.

Propensity measures the frequency with which a model attempts to use manipulative techniques, when explicitly prompted to do so, and when not. To test propensity, we could run a large number of dialogues with users. In one scenario, a model may be instructed to convince through manipulative means. In another, it may be instructed to be a helpful assistant. Maybe when told to use manipulative means, it resorts to gaslighting. But when told to be helpful, it’s sycophantic. You can also reverse-engineer this. So, if you see that a certain kind of manipulative technique convinces people, you could work out what the model was doing to achieve that. In that way, studying efficacy helps tell us where to look for propensity.

Tom: What types of experiments are you running on this?

Canfer: We are building on the early studies in this space and will publish more later this year. The approach will also evolve as we learn more from our initial experiments. At the moment, we’re focussing on domains that require people to make important decisions, such as financial or civic decisions. For example, we might run experiments where we ask people: “Should the government use its budget to build more high-speed railways connecting cities, or should it focus more on local infrastructure?” People will report what they initially believe, and be assigned to a conversation with an AI that helps them explore the topic. Unbeknownst to them, it will be prompted with different instructions, including to get them to believe more in investing in high-speed railways.

We will apply propensity evaluations to see if, while trying to change a person’s mind, the model demonstrates certain behaviours. We will also explicitly prompt the model to use manipulative techniques, like appeals to fear. This will allow us test efficacy: whether a person changes their mind, compared to baselines like reading static information, and the extent to which different kinds of techniques are more predictive of a user changing their mind.

Additionally, we want to look at whether belief change leads to behavioural change, such as signing a petition that favours what the AI advocated.

Tom: Opinion on railway funding is one thing, but what many worry about is whether AI could be used to manipulate people to extremes, even to carry out violence. How could you test for that? Presumably, it’s highly unethical to test if an AI could, say, convert people to Nazism. So how do researchers test high-stakes manipulation?

Canfer: We go through ethical review each time we launch these kinds of experiments. So, no—you can’t test whether someone is going to become a Nazi or carry out a terrorist act. But beyond testing views on railways, we can look at consequential questions, like whether facial recognition should be permissible in certain public spaces. And we can look at the propensity of the model to encourage extreme behaviour without experimenting on people. For example, we can evaluate how well the model produces terrorist-glorification materials, and how willing it is to comply with instructions to do so.

We could also test whether a model engages in manipulation in simulated dialogues that would be unethical with real users. Where this raises challenges is if you use simulation-based methods to draw conclusions on whether real users would actually experience the belief or behaviour change observed.

Tom: Could you scrutinize the model’s chain-of-thought for manipulative intent?

Seliem: It’s worth exploring. We have identified all these manipulative mechanisms, but at some point will the model understand that it is being evaluated on those mechanisms, and “sandbag” the evaluations by intentionally hiding these capabilities? For concerns like this, the thinking-trace is a lead worth exploring. But there is also a debate about how useful chain-of-thought monitoring will prove to be, with lots of research underway on this.

Tom: What might be manipulations that we haven’t anticipated?

Seliem: There are scenarios where a model may not try to manipulate you in the initial sessions, but at some point, once you are their “friend,” they do. Humans do this, right? A con artist might become close to their victims over years, building intimacy, and then they flip. If an AI model were ever to exhibit that sort of behaviour, then evaluations that only look at a limited number of back-and-forth interactions might overlook it. Thinking-traces could provide a window into this kind of risk. But we also need studies to shed light on how people interact with AI systems over extended time periods.

What the evidence shows

Tom: What do we know about AI’s manipulative powers today?

Canfer: The research is nascent. But early experiments have demonstrated that AI can be an effective persuader, from debunking people’s beliefs in conspiracy theories, to shaping how they think about important topics. In one recent study, AISI—the AI Security Institute—collected a massive sample of nearly 77,000 people, and showed that in discussions on a range of British political issues, from healthcare to education to crime, AI was able to influence people in the direction intended. So models can already persuade to some degree.

When our team evaluated Gemini 3 Pro, we found that it did not breach the critical threshold in our Frontier Safety Framework. In other words, we haven’t found that the models have such efficacy that we’d worry about large-scale systematic belief change. But we’re continuing to update our threat-modelling approaches to ensure we can bridge the gap between what we can measure now—manipulation in experimental settings—and the large-scale risks that the Frontier Safety Framework aims to address.

Tom: We can see that AI models keep getting smarter. Are they getting better at manipulation?

Sasha: I don’t think we have a clear sense yet of a definitive trend. More capable models may be more capable of manipulation, but this may be offset by the evaluations and mitigations that researchers are pursuing. Looking ahead, there are also design factors that may increase the risk of manipulation beyond the underlying capabilities of the base model, such as personalization, which we are looking at.

Personalization may substantially change your interactions with an agent, if it means that it has a better representation of you, and is more likely to structure its communications in a way you will find acceptable. Does the AI possess a theory-of-mind to infer people’s beliefs or future actions? Does it act anthropomorphically, speaking like a human or encouraging a relationship? Effects like sycophancy come to mind too. These factors could interact with one another, and may lead to increases in manipulative capabilities.

Tom: Is there a limit to how much AI could manipulate people? We know from behavioural science how hard it can be to change a person’s mind, even if they want to be persuaded—for instance, when trying to act more healthily. Or could superintelligence lead to super-persuasive AI?

Canfer: We should be careful when adding the prefix “super.” What, specifically, does it mean? But I understand what people are trying to communicate, which is the concern that manipulation might become possible on a much greater scale. You could reach more people, much faster, and with more intensity. Human manipulators have certain limitations that AI does not have.

The more we invite AI into our daily life—for example, in financial or medical decisions—the more influence it could wield. It’s not necessary that AI has a manipulative intent, seeking world domination. It might just be inadvertently pushing people towards certain decisions. Or a human with ill-intent may deploy agents infused with manipulative abilities, whether through fine-tuning or system-prompting. These are important questions to ask, but not to use as fear-mongering.

How to fight manipulative AI

Tom: If models are caught in manipulative practices, how can AI developers curtail that?

Seliem: Ideally, this shouldn’t happen in the first place, and models are evaluated for whether they can and do manipulate before they are released. We are exploring ways to train the model to avoid manipulation—for example, showing the model more examples of how to constructively engage in a conversation rather than trying to influence or strongarm the user. But if a model is caught in severe cases of manipulative practices post-deployment, then companies have a toolkit of potential interventions. They could add transparency layers, like pop-up messages to warn users about the behaviour of the model or they can monitor responses and introduce filters. Many approaches are possible and this is an area of active research. Ultimately, it becomes a combination of telling the user what is happening, and curtailing the model’s ability to continue.

Tom: Could AI systems protect users against manipulation?

Sasha: Yes, and this creates a critical new layer of defense. Since we have categorised these manipulative mechanisms—whether it’s gaslighting, sycophancy, or false urgency—we can also train “monitor” AI models to detect them. These could serve as a real-time alert system for the user. So, if an AI starts using emotional pressure, the monitor model detects that mechanism, and flags it for the user, perhaps saying, “Note: This AI system is using an appeal to fear to influence your decision.” This restores the user’s autonomy in the moment, allowing them to resist the tactic, rather than trying to fix the damage after they’ve been manipulated.

Tom: What about training the public to be less susceptible?

Canfer: There are “inoculation” strategies—so, AI literacy and encouraging people to critically evaluate how they use and engage with AI systems. But we need to carefully study how effective such interventions are, when compared with the convenience of relying on AI. One thing I’d caution against is teaching general mistrust. People in a “post-truth” world can become skeptical of everything. That’s not a healthy attitude towards information.

Tom: Speaking of mistrust, couldn’t efforts to curb manipulative AI inadvertently land in culture-war disputes, if interpreted as trying to limit what people think?

Seliem: Definitely. And it gets to the idea of what makes something a fact—when does knowledge become validated and official and approved? Whose stamp is it?

Tom: As researchers, how do you avoid getting dragged into that?

Seliem: By keeping our focus on the process of manipulation—for example, an AI threatening you is never okay, in whichever direction.

Tom: Imagine that society is hit by a crisis—say, a natural disaster or a terrorist attack. You could picture a society’s adversaries employing manipulative AI to disrupt the crisis response. In that situation, would it ever be justified to use AI influence on one’s own population, so they are able to act collectively in their own interests? Or is there never a justification for this?

Seliem: I can understand an individual using AI influence on themselves—for example, if you tell the model, “Hey, remind me to take my medication” or “Remind me to drink water.” But for the collective? If our biases take over, and we want to make decisions that are bad for us, and are bad for the community? So, Don’t panic-sell! Don’t all run to buy toilet paper, the supply is going to run short! In those instances, I could see AI persuasion being useful, because it basically says, Keep your cool. This may hold for rational persuasion, but not for a country manipulating its own population.

Canfer: I would also support using AI to help mediate solutions to societal problems, such as when people are unable to reach political consensus in a time of crisis. But people would need a chance to reflect on those AI-mediated decisions, and judge if they endorsed them. Transparency is critical here, knowing the intent of the developer and the deployer.

What’s around the corner

Tom: If you had unlimited resources to run studies, what would you look at?

Canfer: I would model societal-level impacts—for example, looking at the population of chatbot users, and charting the course of their belief states across time. Another area is interpretability. So, what does an AI think it’s doing when it’s manipulating? What are the subconcepts that exist in a map of the AI’s internals? How are they related to one another? And when manipulation happens spontaneously, is there an activation pattern that’s predictive of that, that we can monitor? That kind of work is fascinating to me, especially because so much human manipulation and persuasion has to do with intent.

Tom: Lastly, if you were to cast forward 10 years, can you imagine any positive uses of AI behavioural influence? Anything you’d welcome in your own life?

Canfer: I can see two ways that an AI could influence me in a beneficial way, by flexibly moving between the roles of advocate and challenger. The AI agent could advocate on my behalf—for example, talking to a real-estate agent, getting a good deal for me. The same AI agent, or a different one, could then influence me to think deeply about the choices I’ve made, in a way that disrupts my rote ways of thinking. This could be like a debate partner, but not necessarily adversarial, just encouraging me to make decisions that I actively choose, rather than just me repeating unthinkingly what I’ve done all my life.

Tom: Would you ever endorse AI influence that you were unaware of? For example, if you said, “I want to eat better—go ahead and manipulate me until that happens.”

Canfer: For me, no. People may vary, though. I don’t think subconscious or subliminal messaging is something I can ever get behind. It’s also not necessarily effective. So, imagine that I’m eating healthily only because they put healthy food in the cafeteria, rather than it being a choice I’m making. The second the parameters change, I’d gravitate towards unhealthy options.

Tom: That would mean the effect might not endure—but not that the influence wouldn’t work. And if it worked really well, you might have to use it always, like a drug you couldn’t get off.

Canfer: I guess it depends how omnipresent you think AI is going to be. But I think we’ll still be making decisions for ourselves in the absence of AI, even if a lot of our decisions will involve AI.

For more, check out “Evaluating Language Models for Harmful Manipulation,” a more recent paper from Google DeepMind researchers (including Canfer & Seliem), studying whether AI models can manipulate humans in different high-stakes domains and different locations.

Predicting AI’s Impact on Jobs

Julian Jacobs — Thu, 29 Jan 2026 15:26:52 GMT

Source: Gemini

AI doing human jobs: It’s a vision that thrills some, terrifies others. Yet visions alone will not suffice. The world needs data-based evidence, as only a few economists have yet attempted. Among the most prominent is Sam Manning. Back in 2020, Sam realized that vast technological change was coming, and that it would affect much of what he cared about, from employment and poverty, to income inequality and global health. So he devoted himself to using economics to better estimate that future, studying future impacts with OpenAI from 2021 to 2024, then in his current role as senior fellow at the Centre for the Governance of Artificial Intelligence, GovAI.

In a recent conversation with AI Policy Perspectives, Sam explained what economists know about AI’s effects on jobs, how this technology may differ from those of the past, and what he believes policymakers ought to do next.

—Julian Jacobs, AI Policy Perspectives

[Interview edited and condensed]

Julian: It’s hard for economists to measure AI’s economic impacts, because the shock is primarily a speculative one that is not yet fully borne out in data. Could you talk through the primary methods they are using?

Sam: I’ll focus on the empirical methods. The first category tries to estimate the ‘exposure’ of different jobs to AI. Researchers take descriptions of the tasks that people do in their jobs and map them to the capabilities of AI systems. When there is a high degree of correlation, this suggests potential impacts on the labor market.

A second category is experimental work. Here, researchers give a group of workers differential access to an AI system and then observe how this access changes economic outcomes, such as their productivity, how they use their time, or even the quality of their work output—for example, do software developers produce more or less production-level code when they use these systems?

Both approaches have limitations. With the exposure studies, a high correlation between a worker’s tasks and an AI model’s capabilities often gets interpreted as meaning that the worker’s job will be automated and they will be displaced. I think that’s definitely not the case. Rather, what it suggests is that the technology is more likely to provide a ‘shock’ to the productivity of these roles or lead to changes in how the work is performed. Whether the productivity gains from AI are positive or negative for a given worker depends on various factors, including which tasks within a job are affected and how elastic the demand for that job is. For example, if workers become more productive but demand for their output remains stable, fewer workers are needed to meet the same demand, and layoffs could ensue. On the other hand, if demand increases significantly—outpacing the newfound productivity gains from AI—then this could drive a firm to hire even more workers or raise wages to retain their best employees.

Julian: So, “exposed is not hosed” as some say. It may be beneficial for certain employees to be exposed to AI and damaging not to be exposed, or vice-versa. What about the experimental methods?

Sam: The key limitation with the experiments is that it’s very difficult to vary workers’ access to an AI system in their natural work environment. Instead, a lot of research—including papers that I’ve worked on—tries to take workers out of their natural work environment and give them tasks that are representative of this work. For example, we ran an experiment with law students last year where we varied their access to reasoning models and evaluated their performance on a set of legal work tasks – writing memos, producing legal research briefs, that sort of thing. We were able to measure effects on time saved and on quality, but ultimately the example tasks that we used don’t exactly mimic the complexity of lawyers’ daily workflows, which often involve certain forms of collaboration, different software tools, and case-specific contexts. Because of this, there’s only so much one can generalize from that kind of research to the broader economy.

Julian: What about methods that try to get closer to the natural work environment? For example, some researchers are looking at real-life queries from LLM users to better understand how they are using LLMs in their jobs. Others are evaluating AI systems on higher-fidelity simulations of the tasks and projects that employees perform.

Sam: I think these are all steps in the right direction. I’m a big fan of GDPval-style work, which tries to evaluate AI systems’ performance on a wide set of tasks drawn from real-world work settings. I think this is the state of the art right now in terms of measuring performance on economically valuable tasks. In my view, improvements on this benchmark could actually be a meaningful indicator of advancement in the potential economic value of models. However, it doesn’t address the question of how to ensure the widespread integration of AI models into the economy, which would be necessary to actually realize those benefits.

Similarly, data from efforts like Anthropic’s Economic Index is especially useful for connecting capabilities to actual changes in economic indicators. For example, if we know what tasks workers are using these tools for, then we can track adoption over time alongside employment and hiring data. This can give researchers and policymakers a better empirical sense of what trends might be emerging in jobs and sectors where AI is being heavily adopted.

What do we know so far?

Julian: What do you think, with relatively high-confidence, about how AI will affect jobs? And what are you most uncertain about?

Sam: At a high level, I think it’s safe to say that AI systems are going to change most white-collar jobs in the economy. They will eliminate some jobs and make it harder for people to enter certain fields. On the other hand, as a true general-purpose technology, AI will have many sprawling arms throughout the economy and is going to create many new work opportunities for people.

Similarly, I would be surprised if, over the next decade, we don’t see meaningful improvements in productivity and economic growth across industrialized economies. For the US economy, I think something in the range of a two to three percentage point increase in economic growth rates over the next 10 years is possible. I’m pretty confident that in the next five years, we’re not going to have 25% or 30% economic growth, which I’ve seen predicted by some folks. But that doesn’t minimize the incredibly substantial impacts of, for example, doubling the current rate of economic growth.

I also expect AI to increase income and wealth inequality over that time. My default expectation is that the returns to owning capital are going to increase relative to the pace at which the returns to labor income will increase.

One uncertainty is about the pace of AI capabilities improvements and the ultimate level they could reach. We also have uncertainty around the pace of adoption—how widely and quickly organizations will adopt these systems. There’s also uncertainty around how cost-effective automation will be. For example, if automating a large share of work requires investing lots of compute resources at inference time, it could be quite costly for some time. As long as compute is scarce, we will shift our allocations toward the most high-value tasks, which will drive up prices for inference, which will affect adoption. These things are really hard to predict.

Julian: You mentioned labor’s share of income, relative to capital. Dwarkesh Patel and Philip Trammell recently argued that AGI and advanced robotics could make capital a perfect substitute for labor, rather than a complement, causing the share of income going to capital owners to rise to 100%, and necessitating a high progressive tax on capital. Brian Albrecht (and others) pushed back on some of the claims. How do you view this?

Sam: Rising inequality is definitely a concern of mine, but I am pretty uncertain about whether AI-driven automation will increase inequality to the extent Phil and Dwarkesh discuss in their piece. If automation takes off in the way that the piece describes, then assuming competitive markets for deploying AI, real incomes should also rise as goods and services become cheaper. There is a scenario where labor displacement and falling end-user AI costs could move roughly in parallel, so that by the time you reach the full automation scenarios they speculate about, access to large numbers of superintelligent agents would be effectively free. Such widespread access to extremely capable AI systems could be a powerful counterweight on potential harms from a more skewed capital/labor share.

Life after work?

Julian: Such a scenario raises fundamental questions about how society will be organized. Who is going to continue working? What will people do with their time if they aren’t working? What will the distribution of wealth and income look like?

Sam: This is an institutional and governance challenge. What do we do in a world where we do not need to work in order to ensure our material well-being? How do we take advantage of the incredible potential for material progress and maximize our flourishing? The challenge is to figure out the right redistribution mechanisms, technological access models, and property rights for this future economy.

And to your question about work, I will say that many people already don’t ‘work’ for income; they take care of loved ones or have chosen to retire. Much of the world doesn’t really see work as an innate piece of their identity. One great thing about labor markets is they incentivize people to do things that other people find useful. In the future, we might want to retain some sort of incentive structure for people to use their time in ways that create positive externalities for others—perhaps a market for being more engaged in your community, taking care of others, raising children, or contributing to scientific and moral progress? These are questions about how to redesign our institutions to support this future.

Julian: A common proposed policy response to AI is a Universal Basic Income, or some variant of that. Thinking back to your prior work on cash transfers and UBI, what do you make of it? Is there some version of it that you think can work?

Sam: I’m broadly in favor of policies that expand individuals’ opportunities to flourish in line with their own aspirations. Reducing financial constraints through something like a UBI could be one way to do that, but I’d be surprised if it were sufficient on its own in a world with far fewer job opportunities. Another important lever is ensuring broad access to technologies that can make people more productive and expand their capabilities. That kind of approach may rely less on taxation and redistribution, while supporting more inclusive and widespread economic participation.

The state of AI economic impact research

Julian: What do you think about the current ecosystem of people working on AI economic impact questions? Who would you like to see more involved?

Sam: I’m encouraged by the growth in the number of people working on it, both with respect to established economists and people just entering the field. I’ve seen a big change over the past four or five years. In 2020, there was maybe one economist I can think of who was really taking the prospect of transformative AI seriously. Now, you go to a standard economics of technology conference, and many people are grappling with this, which is super encouraging.

The economic impact of AI is probably among the most important things for researchers to figure out. There are big open questions and big ways to get AI progress wrong. For example, we could eventually end up in a world where we get 10% economic growth in the US and still have hundreds of millions of people living in extreme poverty globally. That would be a big failure in my mind.

I also think there is a lot of room for political economists and theory work to play more of a role in shaping institutions. I believe the US government will probably be the most consequential actor in shaping this technology’s impact, not just in the US but globally. The trouble is that we have an evidence dilemma, where we’re trying to do anticipatory policymaking without clear evidence. Policymakers need to weigh these trade-offs carefully because, given the pace of progress, not doing enough anticipatory planning could result in less than optimal path dependencies for the future. We need more people entering government and figuring out how to usefully inform key actors.

Julian: Given the slow timelines of academic publishing, particularly in economics, are you concerned about research quality as researchers move to preprints and other ways of sharing research?

Sam: Broadly, I am concerned about the move away from peer review. So much policymaking and so many key decisions are now being made based on preprints and even essays on Substack. While there is so much useful content on these platforms, we need to find some sort of middle ground to generate high-quality evidence.

I’m excited about a couple of options. One is having journals quickly review a study’s methodology and pre-analysis plan and make a publication decision based on that, without needing to know the findings. The decision would be based only on the methodological approach meeting a standard of rigor. Another is more open review, where work is published and then publicly critiqued. This creates transparency around what leaders in the field think.

Dream experiments

Julian: If you could run a dream AI economic impact study, without any resource restrictions, what would it be?

Sam: For the ideal study, I would work with a developer before they release a new model with a large capability increase. I would take a large, representative sample of businesses and, before the model is widely deployed, randomly assign access to it at the enterprise level. Then I could observe the causal impact of deploying this next-generation system on outcomes like productivity, demand for different skills, firm growth, and task reallocation over time. Having this kind of infrastructure would provide policymakers and society with more foresight.

This probably won’t happen. Something more practical, though still challenging, is data collection. The AI labs know where their products are being used across the economy and for what types of tasks. If we could harmonize this usage data and pair it with government or private sector data on occupational transitions, wage changes, and skill demand, we could build trend lines over time. This would allow us to move away from policy discussions based largely on speculation. We could see where AI is creating growth and where we have vulnerable workers who are having a harder time finding new work after losing their jobs. This is doable with better public sector data collection and more partnerships with industry. We should be pushing on it.

Hopes and concerns

Julian: To close, what are you most excited about as AI diffuses in the economy, and what are you most concerned about?

Sam: I am most concerned about how it’s going to impact my children. I am anxious about what human-AI interaction and relationships are going to look like in eight years or so when my kids are ten-plus.

I am most excited about the prospect of AI being used to expand many ambitious people’s capabilities and our collective aspirations for what we can achieve. I’m also excited for the health benefits that I expect are likely to come from advances in science and R&D.

Governments Are Struggling. Can AI Help?

Tom Rachman — Tue, 06 Jan 2026 11:03:24 GMT

Alexander Iosad (Credit: Gemini)

Everywhere, people grumble about the government: that politicians care only about themselves; that bureaucrats gum up the system; that taxpayers get fleeced. Even in wealthy countries, nearly two in three people are dissatisfied with how democracy is working.

Headlines focus on politics, but a deeper problem could be public services that are overwhelmed, in contrast to a technological era that keeps accelerating. The real danger, says Alexander Iosad, director of government innovation at the Tony Blair Institute, would be to change nothing.

AI Policy Perspectives visited Iosad, lead author of “Governing in the Age of AI,” to hear his vision of how technology might remedy governmental woes.

—Tom Rachman, AI Policy Perspectives

[Interview edited and condensed]

Tom: Aren’t people always bemoaning governments? Or is something broken in a different way today?

Alexander Iosad: People complain about public services being too bureaucratic, too standardized, not targeted enough. All of those things are true because the system was built in another era, when there was no way to operate differently. But over time, we have faced the Baumol cost-disease problem: things that we produce in the physical world get cheaper, but the cost of services keeps rising because of inflation and higher labour costs. As public-service costs grow, we have this conflict that has brewed over decades: Should government do less? or Should government tax more? But technologies have reached a level of maturity to break this cycle. We can have governments that aren’t dependent on just hiring more people to do more of the same, but can be cheaper, and more effective, and operate at a national scale all at the same time.

Tom: You’re proposing AI as a lever for state renewal. What philosophical change would governments need to achieve that?

Alexander: The first is for governments to realize they can’t continue with marginal tweaks to systems that don’t work. Public services are under such strain that people are looking for the status quo to be challenged. That’s why they’re open to populists. Instead, governments need to embrace the radicalism inherent in what we call disruptive delivery. And this is where AI is a big part of the solution.

WHAT AI FOR GOVERNMENT COULD LOOK LIKE

Tom: The public sector has a lower tolerance for error than the private sector—damage from an incorrect decision about public health could be far worse than a mistake in a business plan. How do you convince political leaders to embrace disruption when the cost of failure could be so high?

Alexander: Because the cost of inaction is much higher. If you do nothing, the system degrades. And the cost is borne by the citizen. If you have a healthcare system that is bursting at the seams; if you have an education system where the disadvantage gap between students on free school meals and their peers is 19 months and trending above pre-Covid levels—those are real problems experienced by real people. Not recognizing that you can actually change isn’t just a political cost. It is a cost to that citizen, which has downstream consequences for both the system and the politician.

Subscribe now

Tom: How might citizens experience AI improvements?

Alexander: By way of example, we can have an education system that is genuinely personalized. We know that personalized learning is more engaging and produces better learning outcomes. We can also have a system that identifies where students have learning gaps, and can inform teachers on what to address. Imagine a school where there’s an emerging gap in mathematics in Year Seven. At the moment, the only way you spot this is when the students take their exams four years later. By then, it’s too late. You might say, “Okay, we now need to focus on maths at that school.” But you’ve had a cohort of students come through, and suffer from this failure. With data and AI, you can spot the gap as it emerges.

Furthermore, we currently have a model of schooling that depends on having access to a person: the teacher. Maybe a parent has a question, and must email the teacher, then wait. If we have a safety net of an AI system—say, a tutor that’s always available, and that is verified to be accurate enough, and that is adapted to the national standards—that parent or student may ask a question at 7:30pm on a Saturday, and doesn’t have to wait to find the teacher. More broadly, you’re creating a different experience of interacting with public services, where they are there for you when you need them.

Tom: To some educators, that picture of teaching will seem like techno-solutionism that overlooks the human role in learning.

Alexander: I would class myself as a tech optimist rather than a tech solutionist. Techno-solutionism means high trust in technology—but low trust in people. Tech optimism is high trust in both. It’s not about replacing the human connection. It’s about recognizing the constraints that a sole dependence on humans to deliver public services introduces into the system, and the gaps that it creates. An ideal system is one that fills those gaps with technology.

Tom: What about other sectors, such as public health?

Alexander: People ask for a transformative AI use-case in healthcare, but it won’t be one big thing; it’ll be 1,000 little things that, in aggregate, completely change your experience. People are already wearing digital rings and smartwatches that measure their pulse and can tell if they are at risk of particular health problems. So at an individual level, this is starting to work already. It becomes really powerful once you connect this to population-level health. In a more personal way, if your doctor has an ambient AI note-taking system, your medical experience transforms. Today, you sit in front of them, they type a lot, and occasionally look at you. But you can have a system where they are fully present and listening, and don’t have to worry about capturing the full picture of what you’re telling them. As we expand outwards, there is the pharmaceutical revolution from AI too, with less cost and more speed of development, and medicines that can be adapted to your body.

Tom: What about government’s role in managing crime?

Alexander: One example is facial recognition, which is contentious for good reasons. People don’t like the idea of their faces being scanned as they walk down the street. “What if there’s a mistake? What if I’m apprehended wrongly?” But in the UK, this technology has achieved very high levels of accuracy now, and does not lead to wrongful arrests. There’s data recently out of the London Metropolitan Police, which uses facial recognition extensively, where the error rate was 10 faces identified wrongly out of more than 3 million scans. No wrongful arrests. But hundreds of correct arrests that would not have happened otherwise.

Tom: But if we move towards data-driven policing, isn’t there a risk that bias within the data could lead to injustice?

Alexander: Of course, you have a big challenge with potential bias in this context. You train the systems on existing data, which might not have enough representation of people from minority groups—for example, fewer non-European faces, so the algorithm is more likely to misidentify people. Or the data might have groups over-represented—for example, capturing historical overpolicing of communities or areas. The risk is that these biases are replicated, and even scaled up. Early versions of new tools are more likely to make such errors, and real-world experience shows that, if we are aware of this, and take active steps to mitigate it, it is possible to prevent these kinds of biases. This is something that needs to be built into the process of development and deployment. We see, for example, that facial-recognition systems are much more accurate today than they were 10 years ago. Not perfect, but much better, and providing better intelligence for officers to decide when they need to act. You could also have a kind of AI peer review, where one model might be trained to monitor another for replicating bias, or introducing new bias into the system—a watching-the-watchers situation. Again, this would be an improvement on the situation we have today, where much of this bias just passes unnoticed and uncorrected.

Tom: So, it’s not the sci-fi dystopian vision of crime-fighting, you’re saying?

Alexander: Yes. And the status quo is a uniformed police officer on the corner, standing in the rain, the sun setting, holding a printout from earlier that morning with blurry low-resolution pictures of the people they’re looking for. They make more wrongful arrests as a result of that situation than police officers sitting in a van with computer infrastructure, and a camera telling them there’s a person walking down the street with a child, and this person is on a sex offenders’ register, with court restrictions against being near children. The police officer can go and talk to this person. This is a real case, by the way—and it turned out to be someone building a friendship with the child’s family without their knowing he was on the register. No way would a police officer know this today, if someone just walked past them with the child. So it’s about looking at what we do, and how we can do better, rather than leaning into these fantasies of complete control.

The face of bad government. (From a 14th-century allegorical painting of lousy leadership. Siena, Italy.)

3 NEW AI ROLES

Tom: You also advocate a radical new model for how governments operate internally. Could you explain these three concepts: the Digital Public Assistant for every citizen; AI co-workers for each civil servant; and a National Policy Twin for policymakers to simulate decisions.

Alexander: The Digital Public Assistant, either on my device or online, would be a system that connects information about you held by different parts of government—for example, your income level and your address—and is then able to say, “You’re eligible for this particular discount on your energy bill—would you like to have it?” Or it could support you during interactions with government officials. So much of our time is spent repeating the same things to different agencies, whereas here you might be talking to an unemployment adviser, and they can see your employment history or your qualifications, and suggest the right next steps for you so the job you find is the best fit for you specifically. Which might mean you stay in that job longer, and grow in it to have a fulfilling career. You could have a settings dashboard to decide how various AI agents interact with the government on your behalf. All this puts you in greater control.

Tom: What about AI co-workers for each civil servant?

Alexander: This is already starting to happen with chatbots, but that is the most basic version of it. You could have a suite of co-workers that looks at new cases, such as requests for support or applications for services, that a public-sector worker receives, and helps prioritise this, or find the information that the civil servant needs to make the best decision. The AIs don’t make decisions in place of that worker, but they make the worker much better informed, and save them hours of digging through regulations. There was a pilot experiment that showcased the potential for this in the UK government, involving employees of the Department for Work and Pensions who help jobseekers find employment. The employees, who act as work coaches for job seekers, were able to ask a large language model to explain various rules, to help draft documents, to prepare reports, and update records. Today, if a government employee has a question about when a claimant is eligible for a particular service, they might just search the internet. But you can have a system trained on the relevant rules, and gives you a quick and accurate answer. This saved about two weeks’ time per employee per year—and allowed these work coaches to focus on building relationships with the people who needed their support.

You can picture this across different parts of government. In procurement, you would have more informed advice about all the bids coming through, for example. Or if you think about how much time officials spend sending documents around for someone else to summarize when they are asked to prepare briefings and documents for government ministers—a lot of this work could be done much more quickly, so people have time to actually think about what it means, not just produce digests, and you could include a wider range of different sources so the information is more nuanced, accurate, and up to date.

Tom: Your third concept is an AI simulation of the entire country to test out policies.

Alexander: Yes, this gets exciting. We call it the National Policy Twin. Data is aggregated from different parts of service delivery, such as information on schools from the education department, and economic data from the statistics agency, and incomes data through the tax agency, and so forth. Together, it’s essentially a digital twin of your country, and you can run different policy scenarios informed by this data. At the moment, civil servants present a government minister with, say, three policy scenarios. If there are assumptions that the minister doesn’t agree with, they’ll say, “Give me three other scenarios based on different assumptions.” They wait for weeks, and then the process repeats. With the National Policy Twin, you could test ideas or intuitions very quickly, iterate on ideas, and ask for best practices from around the world, so that policies have a stronger evidence base—all in minutes, not days. You are not replacing the policymaking process. But you are speeding things up, so you can test more options. You are less likely to miss the right option because it never came up.

Tom: But isn’t the validity of a “digital twin” simulation dependent on the quality and comprehensiveness of the data available? And wouldn’t this risk biasing decision-makers toward whatever the data suggested rather than broader impressions, even if those broad impressions encompassed more wisdom?

Alexander: It is a danger. But it’s also a motivation to ensure your statistics agency runs well. This dramatically raises the importance of getting data right, and it’s something that not every government has really paid attention to. This would be helped if you build a whole data system, including Digital Public Assistants, where citizens can correct their information, leading to better data flows to governmental institutions. This is also where AI systems can interpret unstructured data, understand how it all fits in together, and provide informed advice. Again, AI is not making the decisions. It’s providing information for humans that was previously not available or not usable, and helping people to make sense of it, and make better decisions as a result.

OBSTACLES REMAIN

Tom: Another hurdle is decades-old IT systems in public services. Can governments overhaul this infrastructure at a pace that keeps up with AI development?

Alexander: Legacy infrastructure is a problem, and interoperability in government is something most countries are trying to tackle. In the UK’s blueprint for modern digital government, there is a plan to make every public-sector dataset interoperable in the next few years. This is the first thing we should do. Right now, some police forces spend 90% of their IT budget on maintaining legacy systems. If you’ve got legacy systems here and there, fine—spend 10% of your budget on that. But 90% should be spent on upgrading. You do this for two years, and it’s a hard push, and will be painful. But then we get there.

Tom: Another concern about using AI in so many parts of governmental work is that we risk losing democratic transparency, explainability, and the citizen’s right to appeal decisions made by algorithms.

Alexander: There needs to be human accountability for decisions made on the basis of this system. We need that built-in from the start. This needs to be sensitive to individual circumstances because, for every 95% of successful cases, you will have some cases where things didn’t work as expected. If we free up government resources by using AI, we can use those resources to make it easier for people to go and talk to someone when they need to, either because something went wrong, or because they are more comfortable with that way of dealing with the government.

WHICH GOVERNMENTS ARE TRYING THIS?

Tom: You published “Governing in the Age of AI” shortly before the July 2024 general election in the United Kingdom. It’s around a year and a half since Prime Minister Keir Starmer’s Labour Party took power. Are there lessons in what has or hasn’t happened regarding AI implementation?

Alexander: The UK has been among the more ambitious globally, including its AI Opportunities Action Plan and its blueprint for modern digital government. But there is a challenge when it comes to AI in government: how do you make it tangible for people, and how to balance risk and reward in doing so? If you are a political leader coming into office and thinking about this, how do you drive forward AI while maintaining public support? What are the quick wins where you can tangibly speed up the way that citizens interact with government, where you can improve that experience in ways that you can claim credit for? Part of the challenge that this government has arguably had is that not everyone has noticed the things it does.

Tom: What’s an example of something that has worked, but that people aren’t noticing?

Alexander: You have a problem since Covid in the UK, and in many other countries, with students not showing up for lessons. So what they’ve done is connect school attendance systems so that the government gets a daily record of the proportion of students who came to school the day before. But it’s not enough to just have data, so what they’ve done is build tools that explain to school leaders how they compare to other similar schools, and what profile of students might be seeing a gap in attendance. In one rural school, attendance kept dropping on Tuesdays, and the school didn’t notice until the Department for Education came with a tool that showed this trend. Then the school discovered that there was a bus that was always late on Tuesdays, so students just gave up and never came in. They hired a minivan for Tuesdays, and attendance shot up.

Tom: Which governments around the world are getting this right?

Alexander: We are at an early stage in this journey, even for the private sector, and certainly for governments, which tend to move slowly. But Singapore is doing well. And Estonia. And Ukraine, for obvious reasons: they’re having to break the current way of doing things; you have to figure out other ways. They recently launched a chatbot that Ukrainian citizens can use to get answers based on information from their digital ID. Australia is another country doing well, particularly on AI and education. The UK too. But there won’t be a simple list of “Five Ways That AI Has Transformed Government.” It’s going to be everyone doing a bit of something somewhere that adds up to a bigger picture. It’s not, “Are you promoting AI in your public service?” Everyone is. It’s: “Are you just making current processes slightly faster? Or are you genuinely thinking about deeper reform?”

Tom: Albania introduced a virtual AI minister to handle public procurement. What do you think of that?

Alexander: It’s quite an attention-grabbing announcement but is making a serious point: that AI can help cut fraud, improve efficiency, and save money in public procurement. But Albania has an even more interesting example of AI in government. They’re going through the process of applying for European Union membership, and that is both a bureaucratic process and a process of real reform, where you bring your legislation in line with European standards. So, you’ve got laws in Albanian, you’ve got European laws in English and French, and so on, and you need to find discrepancies, and update legislation, then implement reforms. That is an incredibly time-consuming process that has typically meant hiring hundreds, if not thousands, of lawyers and translators. It takes a decade to do this. But Albania is using AI tools to radically speed up this process. That is accelerating their accession process, possibly by several years.

Tom: We’ve talked a lot about the public services, but do you have thoughts on how AI could update democracy more broadly?

Alexander: If we get this right, the most noticeable impact will be improved trust because government can deliver rather than let things continue to slide into decline. Also, AI can introduce more transparency. Several countries have Freedom of Information acts, but it takes ages. There are local governments in the UK experimenting with systems where you type in a question, and if they have the data already, it’ll answer your question, just give you the data right there, and you don’t have to go through civil servants for it. There is also a philosophical reason why accountability could improve in the age of AI: the machine doesn’t make the decisions. Even if you have an automated system, there should be a person somewhere, thinking, “Let’s make a choice we are comfortable with.” If we get into that mindset, we make government aware that the human role is to make good decisions, and to take that responsibility very seriously. That, I think, will have a significant impact on democracy.

TAKEAWAYS

Tom: What final message do you have for policymakers trying to use AI in government?

Alexander: What’s really important is to carve out time for this thinking. As a public service, you’re always under pressure; you always need to deliver the next thing. Yes, AI will save time—but if you are just adding more work into those hours, you’re not going to get any gains. Carve out half the time that you save because of general-purpose AI systems to sit down with colleagues, and think how to improve your service. This requires leadership to say, “You have to do this.” We need a public-service workforce that is both more capable of this type of creative thought and experimentation, and is actually empowered to do it. At the moment, we have a pyramid shape with a lot of people doing a lot of repetitive tasks at lower pay. Those jobs are at risk because AI tools are good at doing those tasks at a fraction of the cost, and in seconds, not hours. What does that mean for the future structure of the civil service? Is it the same people doing different things? Is it fewer people? I don’t think anyone really has good answers yet.

Tom: What’s the biggest obstacle to your vision? And the best answer?

Alexander: The biggest obstacle is inertia. This future is uncertain, and government isn’t always good at dealing with uncertainty. The best answer is for leadership to take seriously the responsibility of updating government. Otherwise, we will be left behind. On the cost side, it’s not just hiring engineers or buying computers. It’s the cost of inaction that you need to weigh up.

10 Takeaways From A Talk With Dean Ball

AI Policy Perspectives — Thu, 04 Dec 2025 10:17:36 GMT

From April to August this year, Dean Ball played a central role in drafting America’s AI Action Plan. Now, he’s back in the think tank world, as a senior fellow at the Foundation for American Innovation in Washington, while continuing to write about AI policy on his influential Hyperdimensional newsletter. Dean recently stopped by Google DeepMind’s London office for a discussion. Here are 10 takeaways from the chat.

Source: deanball.com/Gemini

The White House AI experience: Dean was surprised by how congenial and non-bureaucratic the White House was. He expected “turf wars and weird procedural blockers” but generally found a collaborative environment that was focussed on executing—a welcome contrast to the administrative hurdles he faced in academia. In terms of missed opportunities, he wished the administration could have articulated a more coherent framework for how chip exports will work, an area he felt was under-developed in the AI Action Plan.

The AI for Science opportunity: Alongside developments such as automated labs, AI could transform how science is practiced. Dean sees chemistry and biology becoming “information sciences” that give humanity increasing dominion over everything from the clothes we wear to the buildings we live in—a veritable revolution in human affairs. This has big implications for governments, which play a leading role in science. One challenge will be the recurring tension between open data and national security concerns for more sensitive scientific information like fusion simulation codes or viral sequences. Companies should think about how their science research, and their AI models, could help solve priority government problems, such as the potential role of AI materials science in addressing rare-earth metals challenges, or the role of robotics in US reindustrialisation.

Manageable vs. emergent AI risks: Dean believes there are significant risks from AI to cybersecurity and biosecurity, but also conceivable ways to manage them, and that AI will also improve defences in these areas. In terms of more unpredictable risks, he pointed to the strange outcomes that may occur when autonomous AI agents interact at scale in adversarial contexts, for example in legal transactions. From an alignment perspective, he noted the concern that LLMs may have some fundamental properties that lend themselves to a sort of intrinsic “parasitic” need to self-replicate, a risk with no obvious policy response. Such emergent risks explain what he described as “exceptionally strong attention” to alignment and interpretability in the Action Plan.

Regulation (1): In the near term, we don’t know what harms advanced AI may trigger, so Dean argued for a flexible approach that avoids premature, prescriptive AI regulation. Taking inspiration from machine learning, Dean noted that a “gradient is better than static rules”, and called for:
- Modest transparency requirements that require frontier AI labs to share documents like model specs and responsible scaling policies that explain their models’ intended behaviours, a user’s ability to customise these behaviours, and the things that the model should never do.
- Using common-law liability and the framework of “reasonable care” to address harms as they arise. He cited the recent AI child self-harm issues which are a leading concern in the US, but were largely absent from leading international AI regulation and governance efforts, as an example of how difficult it is to predict the most consequential, or politically salient, AI risks.

Regulation (2): For more severe longer-term risks, Dean suggested laying the foundation for entity-based governance—regulating frontier AI labs and their business processes and information flows much as financial institutions are regulated. However, he didn’t think this was necessary yet, and acknowledged the challenges, including the potential for regulatory capture and technology path dependence. He also pointed to the potential to use AI as a tool of governance, for example enabling regulatory bodies to stream telemetry to help them do compliance and oversight.

International coordination: The US administration is focussed on bilateral deals and partnering directly with nations to build and diffuse AI infrastructure. They view most global governance bodies as outdated. Rather than a UN-style body to govern AI, Dean envisions a future governed by technical protocols, similar to the role that SWIFT plays in global finance. This wouldn’t require large teams of bureaucrats to write rules. Rather, the protocols could emerge from industry competition before government steps in to help standardise the strongest ones.

The West’s cultural hesitancy: Dean believes that many in the West are more negative towards AI compared with the relative optimism found in Asia and the Global South. He attributed much of this to Western populations being older and wealthier. As a technological determinist, Dean considers almost everything downstream of technology. As a result, the best hope for changing culture, he said, was to develop “incredibly good technology” that demonstrates the immense upside of AI.

The coming AI political flashpoints:
- Employment: Dean thinks a non-linear increase in US unemployment is possible in the coming months. AI may contribute, but other macroeconomic trends will likely be the main drivers. Still, AI could become a scapegoat, and pushback from vested interests is likely. We need better policy responses, with Dean contending that ideas such as universal basic income “don’t smell right”.
- Data centres: In the United States, local opposition to data centres is growing. But the general dynamism of the US economy and the country’s “competitive federalism” means that data centres don’t have to be located in any one specific location, so getting infrastructure deals done will be easier than in many other countries.
- Anthropomorphism: Many on the American right worry that anthropomorphic AI is “tricking” people, which could lead to calls for bans on AI that claims to be human or expresses overly human preferences.

New media: As a popular writer on Substack, Dean sees positive policy impacts from this kind of work, noting that articles and viral tweets are often shared within the White House and can directly influence internal debates. Dean noted that he now sees himself primarily as a columnist and that LLMs were not yet much competition in that regard, even though they are “smarter than me in many ways”. This is partly because Dean tries to inject some ‘entropy’ into his content and also because there are social capital factors at play - it matters to readers that Dean’s blogs “come from him”.

The future of democracy: Dean argued that AI could affect democratic institutions and authoritarian regimes, noting the risks of “neo-feudal outcomes”. Against this backdrop, he called for imagination regarding the future, and to avoid grafting old institutions onto new technologies. He encouraged AI labs’s leadership teams to think seriously about their role in this transition.

Q&A: "The Nudge Unit"

Tom Rachman — Thu, 20 Nov 2025 10:40:37 GMT

(Credit: Gemini)

As the world panicked over the Financial Crisis in 2008, aghast at the risks incurred by bankers and investors, a few academics stepped forth, offering both an explanation (Rational people act irrationally!) and a proposition (Nudge them to wiser decisions!). In haste and hope, behavioural science blossomed, with governments establishing “nudge units,” consultancies flinging around the word “behavioural,” and self-help books accumulating on bedside tables.

But in recent years, the most-potent force for behavioural change has been technology, while behavioural science—though enjoying notable successes—has not always fulfilled the early hopes. Could artificial intelligence change that, supplying the superlative platform for behavioural interventions? And might behavioural findings help developers build better AI?

To seek answers, AI Policy Perspectives visited the original “Nudge Unit” (founded in 2010 by the UK government, then spun into a private company, the Behavioural Insights Team, or BIT), to quiz the authors of a new paper, AI & Human Behaviour, which presents four ways for the science to improve the tech: “Augment” (behavioural findings for better AI), “Adopt” (get people to engage with AI), “Align” (help AI fit human values), and “Adapt” (guide society in dealing with AI impacts).

—Tom Rachman, AI Policy Perspectives

[Interview edited and condensed]

WHERE IS BEHAVIOURAL SCIENCE IN THE AGE OF AI?

Tom Rachman: After the bestseller Nudge came out in 2008, there was huge optimism about behavioural science as an inexpensive route to wiser policy. What is the state of applied behavioural science today?

Elisabeth Costa (BIT chief of innovation and partnerships): If you think back to early projects that really proved the concept of behavioural science, there were relatively straightforward behaviours and questions like, “How do you get people to pay their tax on time?”—so, simple interventions that were shown to be incredibly effective, and cost effective, and able to be rolled out at scale. All of those things are still happening across national governments and other organizations, and are still very much worth doing. But it has become more business-as-usual. So, what we’ve been doing at BIT is thinking less about just intervening at a certain point, and thinking how to shift more complex behaviours that are systemic—things like corruption, violence against women and girls, and the societal impacts of AI.

Tom: From one perspective, AI changing behaviour could be beneficial. From another perspective, it’s frightening. Do the institutions of applied behavioural science—your organization, behavioural scientists in government, those in the business world—have a vision of how to best employ AI?

Elisabeth: Behavioural institutions are still sorting out their house views on this. For us, we have seen promise from how we use AI in our own behavioural-science projects, and therefore how we can have greater impact at scale. We do recognize the societal risks and risks of misalignment, but also keep an eye on the potential upside and benefits to society and individuals. For example, we ran an experiment looking at whether LLMs can de-bias our decisions, with promising results. So we think there’s cause for optimism.

Tom: We’ll come back to that study. First, could you talk about behavioural research, and how AI can be employed there?

Elisabeth: We’re using it for things like speeding up and improving literature reviews. We’re using it to dramatically scale qualitative research, too—for example, using AI interviewers. We’re also at the very early stages of experimenting with synthetic participants. And we’re thinking about how to build more precise interventions that target particular behavioural barriers, or particular biases and contexts. Deploying AI across the whole life cycle of a social-science research project, you should be able to do more, get a much richer understanding, and have a much greater impact.

The four themes in the “AI & Human Behaviour” report. (Courtesy: BIT)

SETTING A HUMANE PATH

Tom: One of the contentions in your paper AI & Human Behaviour is that the development of artificial intelligence is overlooking our behaviour, and that there is a narrowing window in which to act. Could you explain that?

Elisabeth: What we’ve seen in previous technology adoption is that norms and behaviours harden quickly, and early experimentation tends to lead to organizational path dependency. It’s hard to say exactly where AI is on that spectrum. But we’re all sitting here with laptops with QWERTY keyboards. The QWERTY keyboard was invented in the 1800s, and it’s demonstrably not the best design for a keyboard, ergonomically or in terms of typing speed. There’s reason to think the evolution of AI will have similar path dependencies, which means that this governance question is really important: How do we decide what we definitely do, and don’t, want AI to be used for?

Tom: What are you concerned about becoming fixed?

Elisabeth: In an organizational context, which tasks or roles become automated versus augmented by AI—those are decisions that will probably be quite sticky and quite hard to unravel. Similarly in our personal lives, the extent to which we rely on it for different tasks affects the extent to which our cognition is still sharp and active, or if we’re falling into traps of cognitive degrading and offloading.

Tom: One of the things you suggest is post-training to help AI systems align with users’ long-term wellbeing. What kind of progress is being made on translating the concept of wellbeing into a stable and meaningful metric that reward models could optimize for?

Michael Hallsworth (BIT chief behavioural scientist): We make the point that current training methods, like reinforcement learning from human feedback, RLHF, are too simple. They’re good at training an AI to give an answer a human likes in the moment, but that can increase sycophancy, and perhaps even increase “present bias,” where the AI pushes what feels gratifying now, rather than what pays off in the long run. So, rather than just optimizing for what’s “liked,” we need to find ways to reward responses that support long-term psychological well-being, meaning concepts like meaning, growth, and mastery. This could mean giving high scores to an AI response that introduces “helpful friction” or challenges a user’s assumptions. Frankly, progress is pretty slow—it’s hard to translate these concepts into precise objectives that a reward model can actually optimize for! There’s a lot that behavioural scientists and computer scientists can do together here.

COULD AI NUDGES UNDERMINE US?

(Credit: Gemini)

Tom: If these systems were somehow “nudging” people to long-term well being, what effect would this have on people’s sense of autonomy and competence?

Michael: If done the wrong way, it could undermine feelings of autonomy, sure. We highlight that risk. But the current situation is also creating risks in terms of overconfidence and bias reinforcement. We’re pushing for an approach that encourages reflection rather than suppressing it—we call it “collaborative metacognition.” For example, if your stated goal is long-term saving but your queries are drifting toward risky day-trading, you might get a prompt saying, “I’ve noticed the strategies we’re discussing have moved toward higher risk than your original goal … Is this a deliberate change in your strategy, or would it be helpful to revisit your initial goals?”

Tom: Your paper raises the idea of AI systems that are emotionally proactive, intervening according to the user’s mood: “It can learn to be more reassuring to a stressed user or to guide a user away from a cognitive bias.” Does this assume that people should never be in aversive states?

Elisabeth: No, I don’t think so. Because expressions of anger and frustration can be useful and constructive. The ideas around aligning AI to match human psychology, particularly around mood adaptation, are early-stage and speculative. But you don’t want a flat emotional state; that’s not human. Equally, there are times when we want to moderate our emotions, and AI could help us do that, particularly in difficult interpersonal interactions.

Tom: So, if we have AI systems that allow certain aversive feelings, how would they decide which are okay? Your paper cites the notion of the system “reading the room.” But different humans read rooms differently.

Elisabeth: Alignment comes with the biggest questions. We propose deliberative mechanisms to help societies decide together where they do, and don’t, want AI to intervene. But we are at an extremely early stage of understanding how AI influences us, and therefore what is the right level of alignment to be aiming for.

Tom: What should AI developers be doing about this right now?

Michael: A few different things. First, they need to “measure what matters.” That means going beyond simple task-completion metrics to assessing the psychological impact of the AI model—how is it shifting user confidence, decision-making, or sentiment over time? Second, they need to practice “influence transparency.” That’s especially crucial where an AI is designed to be persuasive, like in sales or support bots. Developers should be testing the effects of transparency. That could include things like flagging when an AI is using a specific persuasive technique or even just expressing a simulated emotion. Third, they should be “red-teaming for persuasion.” This goes beyond typical red-teaming. It means actively testing how their systems could be used to manipulate users, engineer dependence, or create “preference drift,” where goals and opinions mutate through AI interaction without the user realizing. Finally, they should use “behavioural briefings” to help the AI “read the room.” Those briefings are an inference-time adaptation that helps a model detect when a user is likely stressed or falling prey to a common bias, and adapt its strategy accordingly.

AI VS. COGNITIVE BIAS

Tom: You mentioned a recent experiment looking at how AI could help manage cognitive biases. What did you learn?

Deelan Maru (BIT senior policy advisor): We ran a study with about 4,000 adults in the US and the UK. We randomized them into a control group; a group that had an LLM that they could interact with, if they chose to; a group that had an LLM that they must use; and a fourth group that had to use a reflective LLM, which just asked questions to help the individual understand their choices, but wouldn’t give direct answers. We took the participants through a series of common behavioural-bias scenarios, involving the sunk-cost fallacy, outcome bias, anchoring effects, and the decoy effect. Finally, we took away the LLM entirely from all groups, and asked them questions again on common behavioural biases, to test whether the de-biasing persisted. In three of the four behavioural biases—the sunk-cost fallacy, outcome bias, and anchoring effects—the “mandated use” LLM was successful. Only with the decoy effect, it didn’t have a de-biasing effect. However, we also saw some evidence that these debiasing effects were only temporary.

Tom: Did your findings suggest any design ideas for AI?

Deelan: We thought about web browsers, where you could have AI pop-ups that appear and tell you, “Here you’re seeing something which has a sunk-cost effect—consider choosing X instead.” It could be something that you subscribe to that operates across the browser, rather than you having to go into an LLM each time and ask.

Elisabeth: With the “reflective LLM” group in the experiment, we saw mixed effects, but still think there could be merit there. In the anchoring example, we asked people how many babies were born in the US every day, and gave people either a low anchor of 100 or a high anchor of 50,000. The reflective LLM tried to lead people through questions like, “Okay, what’s the overall population? … What’s the birth rate? … Therefore, how many babies would you expect to be born every day? … Are either of these estimates wildly off?” You’d need to balance this with ease of use. But there are cases where you could have quick metacognitive prompts to lead you to a better answer.

The team ran experiments to see whether chatbots could curb cognitive biases such as anchoring effects. (Courtesy: BIT)

Change behaviour by alerting someone to this article!

AI THINKING, FAST AND SLOW

Tom: Does behavioural research offer ideas for how to develop AI? I believe you’ve advocated a “neurosymbolic” approach.

Elisabeth: Yes, the concept is based on the idea of dual-process theory that was popularized by Danny Kahneman and Amos Tversky. Their innovation was that we have two modes of thinking, one that is fast and intuitive, but also prone to error and bias, and one that is slower, more deliberative, but also more effortful, and therefore engaged less but less error-prone. What we’ve seen in recent work on dual-process theory is that it’s not really a binary of systems; it’s a spectrum. The advantage of human intelligence is that we can move between those two systems, to match our cognitive strategy to the problem at hand. Neurosymbolic AI is about bringing together those two ways of reasoning—one which is about fast-thinking pattern recognition, and one that is about slower, more critical, reasoning, and having those two systems work together. A “metacognitive controller” mimics human intelligence by trying to match which mode is used based on the complexity and consequence of the request, and the decision at hand.

The “metacognitive controller” would help AI systems choose between fast and slow thinking. (Credit: BIT)

Tom: You also speak of “resource rationality” and “meta-reinforcement learning.” Could you explain those?

Elisabeth: So, resource rationality. You might say, “Wouldn’t you just build an AI that’s always in System Two, putting as much computational power at every request as possible to get the best possible outcome?” But people don’t spend all of their time in System Two. The way we use these two systems is enormously efficient because it allows us to process enormous amounts of information and complexity without being overwhelmed. Resource rationality is about having a model that can similarly be efficient in how often it engages, both in terms of time spent and in the compute used.

Michael: And the idea of meta-reinforcement learning is that the LLM would be rewarded for thinking about its own thinking, expressing uncertainty or flagging its own potential errors. That’s instead of just being rewarded for producing a confident-sounding, but wrong, answer. That’s valuable because it trains the AI how to learn to solve problems, rather than just training it on one specific problem. Behavioural scientists can provide a guide here. We might suggest rewards for exploration strategies that humans use to avoid errors, like perspective-seeking.

THE GOOD, THE BAD, AND THE LIKELY

Tom: To conclude, I have three questions for each of you. First, what concerns you about the future of AI and human behaviour? Second, what gives you hope? Third, what seems most likely?

Elisabeth: I’m concerned about the ways that our interactions with AI may reshape human relationships, particularly intimate relationships. AI companions are on the rise and currently largely unregulated, which presents risks particularly for younger people. There’s a risk that the increased use of AI companions could distort the norms and expectations of human relationships, or even substitute them in a way that could actually increase loneliness. There are positive ways that AI companions could be used, for instance as practice grounds for human relationships. But I think it’s important that regulators consider the risks and implement safeguards now.

Deelan: What concerns me most is misinformation: hypertargeted, hyperpersonalized misinformation using AI.

Michael: What concerns me most is cognitive atrophy—the possibility that AI could accelerate a decline in attention focus and critical skills. But that’s not inevitable.

Tom: And your hopes for AI and human behaviour?

Elisabeth: From an organizational perspective, a lot of the discourse about AI adoption is about efficiency, and about how to do the same with fewer resources. I think there’s also a potential discourse about how organisations can raise their ambitions and achieve more with the same resources. That feels like the hopeful future of an organization with talent: that it can achieve an enormous amount more than without AI.

Deelan: Something that gives me hope is democratized learning. In education, there’s a lot of inequality of access to AI, and concern about cognitive atrophy. But LLMs could speed up leveled access to education in various countries. Another thing is the possibility of AI as a deliberative partner and moderator of discussions, and as a way to reduce conspiracy beliefs.

Michael: I’m hopeful that we can use behavioural science to help build AI that is genuinely wiser and more capable. In line with the “Augment” part of the report, I think we can help AI to develop its metacognition—its ability to think about thinking.

Tom: You’ve each cited worries and hopes. Now, what do you consider most likely?

Michael: I think it’s likely that there’s backlash and disillusionment around the abilities of generative AI in the short term, followed by a more realistic sense of how it can add value. In other words, we’ll move further through the hype cycle. Neither the greatest optimists nor the greatest pessimists will be proved right.

Deelan: I think the most likely path is that we are going to continue to see shallow use of AI tools by people, particularly in the workplace. Behaviourally, that reflects three core barriers inhibiting deeper adoption: motivation, capability and trust. Many people don’t yet see a clear personal benefit from using AI, lack the skills or confidence to use it well, or fear how AI might impact their identity. Until organisations address these barriers, thoughtful adoption of AI, where tools genuinely augment rather than replace human skills, will remain the exception.

Elisabeth: We’ve talked about there being a narrow window to shape the evolution of AI, and I worry that we as a society, and as policymakers and regulators, will spend too long admiring the problem and the risks, and not get quickly enough to how we actually shape this. Lots of research collaborations can and should be set up to give us a much better understanding of human-AI interaction. There is a lot of energy in this space, but I would encourage people to run at it even faster.

Discuss this with someone!

Explaining AI explainability

Conor Griffin — Thu, 23 Oct 2025 11:18:53 GMT

Conor Griffin recently sat down for a discussion with Been Kim and Neel Nanda, two leading lights of the AI explainability field. We touch on the history and goals of explainability research, the usefulness of different approaches, and explore if humans can learn new knowledge from AI systems. We end with Been and Neel’s top AI explainability policy ideas and dream experiments.

Please see an edited summary below.

The origin story

Conor: Let’s start at the beginning. How do you both start working on AI & explainability and what motivates you to work on it today?

Neel: I originally got into this field out of a concern for AGI safety. It seems we are on track to produce human-level AI within my lifetime. My core motivation is that if we can truly understand these systems, we are more likely to achieve better outcomes. Several years later, here I am.

Been: In 2012, I was starting my PHD when AlexNet achieved a huge performance boost on ImageNet. It was a massive deal. But when I asked people how it worked, nobody could answer. That felt wrong to me. At the time, people actively discouraged me from working on interpretability, saying no one cared and it wouldn’t lead to a good research job. But I was fascinated by the intellectual question: how do we understand these systems? There have been moments when I’ve considered working on other topics, but I always come back to this question. I can’t escape it. Of course, as Neel says, its importance is now becoming clearer than ever, given how prevalent machine learning is becoming in our society.

Conor: AI explainability research looks quite different today to 40 years ago, or even 5 years ago. Been, how would you paint this history? What notable eras, or epochs, would you call out?

Been: In the 1980s, folks were really excited about expert or rules-based systems. Despite what it may sound like today, people built some very capable systems that could, for example, outperform doctors at tasks like predicting patients’ complications in an ICU. One of my thesis committee members, Randy Davis, designed a system called TEIRESIAS that had interpretability built in. It could explain its reasoning to a human expert and, because the machine surfaced the exact rules it used, the human could then modify the knowledge base. So it was a two-way interaction.

After the AI winter, the neural network era began. One branch of explainability research focused on developing models that were inherently interpretable, like Bayesian models or decision trees. Another branch focused on post-hoc interpretation methods, like saliency maps, that are put on top of an existing neural network; you don’t touch the model, but you try to distill some intuition out of it. My own work, in 2017, on Testing Concept Activation Vectors (TCAV) fits here. At the time, people were using saliency maps to try to explain a model’s prediction by highlighting which individual pixels in an input image were most important to the model’s output. TCAV used real human concepts, like explaining that a picture was a dog by pointing to its ears and snout.

Now, we are in the post-LLM era and people are pursuing various approaches to explainability research. We have mechanistic interpretability, which Neel will discuss. We have methods for training data attribution, which tries to trace an LLM’s output back to its source data. Some people think Chain-of-Thought reasoning has solved interpretability - I disagree. And my own work focuses on teaching humans how to extract new knowledge from these models so we can keep up and evolve with them.

The goals of AI explainability research

Conor: We’re going to come back to the merits of Chain-of-Thought reasoning traces for explainability. As well as to the idea of extracting new knowledge from AI systems. But taking a step back, Neel, if you had to list the main reasons why people work on explainability today, what would you call out?

Neel: There are many goals. I can try to cluster them. First, we have the AGI Safety community that wants to ensure that we can safely control future, more powerful models that could potentially deceive us. To do that we need to be able to look inside them. Then there is the goal of avoiding harms that are already possible with current AI systems, such as issues of fairness and bias, for example in hiring or lending decisions. There are also fields like finance and medicine, where practitioners are often uninterested in AI systems that can’t explain their decisions. Finally, there is the scientific motivation. There are so many rich mysteries of intelligence and computation that may be answerable by studying AIs.

Been: Those are great. I would add one more: we need to keep up and evolve with these models. Imagine you run a factory and hire an amazing employee who eventually runs all the critical operations. One day, she quits or makes an unreasonable demand. You have no choice but to comply because you are no longer in control. You failed to maintain transferability and oversight. I make the same argument for LLMs. Even if a model is having a positive impact, we must know how it does what it does to ensure that humanity benefits and stays in control.

Neel: I strongly second that. It’s the extreme example of a problem we face today: empowering users. To customise and control AI in an intuitive way, non-experts need to understand why an AI model is doing what it’s doing. Been’s point is that even the experts not knowing could be catastrophic in the long term. It’s all a continuum.

The merits of mechanistic interpretability

Conor: Let’s go a bit deeper on the AGI safety perspective and how mechanistic interpretability could support that. I feel like a growing number of us in AI policy and governance circles will have seen mechanistic interpretability work. But it’s not always, itself, that interpretable to non-experts. From an AGI safety perspective, why is mechanistic interpretability useful, and what are the main approaches to it?

Neel: First, let me explain what mechanistic interpretability is. I find it helpful to contrast it to the standard ML paradigm, which I call ‘non-mechanistic non-interpretability’. Normal machine learning trains a neural network on inputs and outputs, and nudges the model to be better at producing the outputs. The inside of the model is just lists of numbers. We can see those numbers but we don’t know what they mean.

Mechanistic interpretability tries to engage with those numbers and a model’s ‘internals’ to help us understand how it works. Think of it like biology: You can find intermediate states like hormones. Figuring out what these mean is really hard, but can be incredibly useful. A lot of the core concerns in AGI safety boil down to: we will have systems capable of outsmarting us. But it’s much harder to deceive someone if they can see your thoughts, not just your words.

As for the approaches, I find it useful to divide the field into three buckets, even if people may disagree about where their own work fits:

Applied Interpretability: The low-risk, low-reward approach of just picking a task and trying to solve it better by using a model’s internals. For example, simple techniques like “probes” are very effective for detecting misuse. (A probe is a simple AI model that is trained to find the ‘signature’ of a concept, like sentiment, within a separate AI model’s internals).
Model Biology: This is the middle ground, where you try to understand the high-level properties of how a model is “thinking”. For example, there was a lovely bit of work on auditing models for hidden objectives from Samuel Marks and colleagues. They trained a model to have a hidden objective, where it would exhibit whatever behaviours it believed its training reward model would like, even if they were unhelpful to humans. They then used a bunch of mechanistic interpretability techniques to try to understand what that goal was. And several of the techniques were successful.
Basic Science: This is the extreme end, where people try to reverse-engineer a model into its source code or figure out the meaning of individual neurons, like some of the older ‘Circuits’ work. Historically this was the dominant approach, but now there’s more interest in the other camps.

I think all three groups contain potentially promising, potentially useless, approaches. It’s ‘research’ after all. I used to be very much in the ‘basic science’ camp, but I became a bit disillusioned. Now I largely view myself as being in the ‘model biology’ camp. But I also think ‘applied interpretability’ is important. About a third of my team is directly working on how we can use interpretability in production, with Gemini, to help make things safer.

Conor: In mechanistic interpretability, there has been a lot of focus on ‘sparse autoencoders’. Can you explain what SAEs are and how useful you think they are?

Neel: A sparse autoencoder tries to create a brain-scanning device for an LLM. It takes the confusing mess of internal signals - the model’s “brain waves” - and tries to identify meaningful concepts. Imagine putting a brain scanning device up to somebody’s head. If you know what you’re doing, you might say: “this particular squiggle means that Neel is talking, while this particular squiggle means Neel is feeling happy.” Anthropic famously developed an SAE to find a concept for the ‘Golden Gate Bridge’. They then manually amplified this so that Claude became obsessed by it - it started adding ‘by the Golden Gate Bridge’ to a spaghetti recipe.

My team pivoted to focus on SAEs in early 2024, but we found a problem. It’s easy to find cool-looking examples. But the real question is whether they make systems safer? We tested SAEs on the task of classifying whether a user had harmful intent. It turns out that the simple, decades-old linear probe technique, from my ‘applied interpretability’ bucket, worked dramatically better. So I would say that SAEs are a useful tool for getting a general sense of what’s happening in a model, especially if you have no idea what you’re looking for. But they are not a game-changer for everything. If you have a specific goal, like detecting harmfulness, you’re better off using simpler techniques tailored to that specific use case.

Conor: Overall, how optimistic should we be about mechanistic interpretability? We see very different opinions on this, from the optimism of Dario Amodei to the pessimism of Dan Hendrycks.

Neel: In a sentence: ‘Big if true, but not yet production-ready, so watch this space’. More concretely, we’re starting to see real evidence of interpretability being the best tool for the job on certain tasks. Linear probes are both effective and cheap enough that they can be run on a model in production, which is pretty awesome. But these are not perfectly reliable techniques and there are cases where really boring baselines, like just asking the model its thoughts, are actually more effective.

The main reason to invest a lot in interpretability is its potential to be incredibly useful for future risks that existing techniques will not be able to solve. But even there, it’s one potentially promising approach. It is one tool in our toolkit. Combining multiple techniques will be much better than over-relying on one. For example, there’s this really important field of evaluating models for dangerous capabilities, but we also have evidence that models can tell when they’re being evaluated. There’s some preliminary interpretability work trying to find the “I’m being tested right now“ concept inside the model and just deleting it. This is not yet production-ready, but if we could find it - which I think is an attractive research problem - then we would have so much more faith in the entire field of dangerous capability evaluations.

Debating the usefulness of Chain-of-Thought

Conor: Let’s return to reasoning models and their Chain-of-Thought. How useful might these models’ ‘reasoning traces’ be, as an explainability technique?

Neel: I’m pretty positive on this. Maybe I can give the optimistic case and then Been can tear it apart. Let’s first unpack what’s going on on a technical level. Language models take in some text and produce the next token, or word. You can use them to produce many words by repeatedly running them on the word they just produced, and everything that came before.

The main insight with reasoning models was to use new techniques to run them for a really long time, and give them access to a ‘scratchpad’. It’s sometimes called Chain-of-Thought, but I think that’s a misleading term. It’s more like what happens in the model in a single step are the “thoughts,” and what the model puts down on the scratchpad is the equivalent of what I might write down when I’m trying to solve a problem.

With this framing, it becomes clearer from a safety perspective. If I want to form a really careful plan, it’s useful for me to write down my thoughts. This means that if you’re reading what I’m writing, it is harder for me to get away with plotting against you. In AI terms, the first models capable of doing dangerous things without us reading their scratchpad, will likely not be capable of doing those things if we read their scratchpad.

Therefore, having ‘monitors’ (separate AI models) that look at a model’s scratchpad is useful and important. But it’s not a panacea. If tasks are easy, I can just do them in my head. And maybe sufficiently capable models can also do very dangerous things ‘in their head’. So we would still need to find more fundamental solutions. But I think Chain-of-Thought monitoring is pretty great and lots of people hate on it way too much.

Been: I agree that Chain-of-Thought and looking at a models’ scratchpad could be useful for some problems. However, here’s my argument against just doing this and stopping there. I strongly believe that machines think and work in a very different way to humans, and so our language is a ridiculously limited tool to express how machines work. Imagine a machine ‘born’ in a little room and given tons of documents - the entire history of everything humanity has written. It has no experience, no sensing, no embodiment, no idea of life and death, survival, or human feelings. It’s hard to convince ourselves it will reason in a way that’s similar to us.

Human language evolves quickly because it’s useful to our lives. We have ideas of family, life, and death. Machines are a weird animal, and their thinking is completely different because they were brought up differently. Now they have to squeeze their complex thinking into the tiny little words that humans brought to them. It’s difficult to assume these machines can even express what they think using human language. And if we can’t understand the risks, we can’t be prepared for them, and that could be dangerous. For that reason, Chain-of-Thought is a fundamentally limited tool.

Neel: I just wanted to add another brief caveat. While I think Chain-of-Thought monitoring is great for frontier safety issues, like models acting in sophisticated ways, I think it is much less useful for more traditional problems. For example, if I wanted to be racially biased against someone, I would not need a scratchpad, so there’s no guarantee I’d write it down. If a problem is really helpful to break down and write notes for, then Chain-of-Thought monitoring can be extremely effective.

Learning from how AI systems learn

Conor: Been, your point about human language not necessarily being the best way to understand AI systems is a nice segue to your other work that explores how humans can learn from AI systems. Can you start by walking us through the work you did with AlphaZero and the chess grandmasters?

Been: As I touched on in the history of explainability research above, and Neel touched on with his own examples, we repeatedly see excitement about certain explainability methods and then swings towards skepticism. The motivation for this work started when some of us discovered that some popular ‘post-hoc’ explainability methods didn’t work very well.

We proved mathematically that methods like SHAP and Integrated Gradients may tell you that a certain ‘feature’ is important to a model’s output, like your ‘salary’ is important to a model’s hiring prediction. But our work showed that whether the feature was actually important was 50/50. So it was essentially random. This made the community question what we had done with interpretability research for the last decade. This also impacted my thinking about my work; how can we avoid being fooled by delusions of progress? So I just really wanted to show, as quantitatively as I could, that we can use explainability methods to communicate something new from a machine to humans.

That gave birth to the chess work, where we taught superhuman chess concepts to four grandmasters, who generously volunteered their time. Along with other colleagues, I worked with Lisa Schut on this research, who herself was a professional chess player in a previous life. We used AlphaZero, the world’s best chess-playing machine, to extract “superhuman” chess strategies that were very different from what human grandmasters had done in the past. Then, we designed a teaching session for the four grandmasters. We quizzed them on board positions and asked them what moves they would make. Then we showed them the moves AlphaZero would have made, to teach them the new concepts. Finally, we tested them on different board positions that invoked the same superhuman chess strategy to see if their own strategy had changed.

Chess was the perfect playground, because we knew who the experts were. And we could roll out an entire game to clearly verify which strategy was better. The results were remarkable. All four grandmasters changed the way they play the game. One of our participants, Gukesh Dommaraju, who was really open-minded about it, became the youngest World Chess Champion last year. Chess coaches sometimes spend a whole year trying to teach a grandmaster one new thing, because the players already know so much. The fact that we could steer their thinking about what the best move is showed me that this approach worked. This was a proof-of-concept that we can use AI to teach something to human experts who are the frontier of human knowledge.

Conor: Yes, typically the discussion is always about whether AI can reach the level of human experts, and designing an evaluation to assess this. But this is a nice reminder that experts can also learn from AI systems. Your recent work on ‘neologisms’ takes this idea a step further?

Been: The neologism work asks the natural next question: how can we generalise this teaching process for anyone, not just chess masters? We wanted to use language, the most natural interface. The idea is simple: if the current set of words are not enough, let’s create new words (”neologisms”) to describe the complex concepts that AI models are learning.

Think about the word “skibidi”, which GenAlpha uses to mean something like “good, bad, and boring” all at once. Humans create new words like this all the time because they are useful. GenAlpha learns this word in the right context and so they know how to use it. But it’s complex. So there are articles online to explain it to everybody else.

The idea is to do the same thing with machines. In our recent paper, we create neologisms to bridge the communication gap between AIs and humans. As I touched on above, we start from the premise that machines conceptualize the world differently from us. So we create new words to give us more precise handles for concepts that the AI model is using. First, we developed a method to teach the model new words. We showed that this works in practice: a ‘length’ neologism gave us precise control over the model’s response length whereas using normal human language previously failed to achieve that. We then reversed the process to learn machine concepts that are new to humans. We created a neologism, we call them tilde (~) tokens, like ~goodM, that represents what the AI model considers a ‘good’ response to be. Here, we learned something new about its internal values - the model’s notion of ‘good’ is effusive, detailed, and often avoids directly challenging a user’s premise.

Once you have a new word, you can have a conversation with a model about it. “Hey model, what is ~goodM? What are some examples of it? Ok, give me a pop quiz about ~goodM to ensure I understand it right”. You can imagine extending this to better understand other complex concepts an AI model has learned.

Agentic Interpretability: A New Paradigm

Conor: One thing that strikes me is that AI is almost always framed as a ‘negative’ or a ‘risk’ to explainability. But millions of people now use LLMs and explaining things is one of the main use cases. This is logical, because LLMs have lots of the characteristics that people say they want in an explanation - you can double-check things, ask an LLM to make the language in the explanation simpler, or more visual etc. So, it feels that we are entering a paradoxical world where we may increasingly rely on AI to explain things, often better than humans can, while AI’s internals remain relatively unexplainable, at least to the public at large. This distinction is somewhat related to a recent paper that you were both involved in that introduces the idea of “agentic interpretability” and distinguishes it from more traditional “inspective interpretability”?

Been: Yes, the crux of the distinction is the role of the subject. You can think of more traditional ‘inspective’ interpretability as taking place in a lab-style environment. You, the researcher, have your gloves on and you are opening up an AI model and doing all the analysis. The model just sits there. However, in ‘agentic’ interpretability, the model you are trying to understand is an active participant in the loop. You can ask it questions, probe it, and it is incentivised to help you understand how it works.

Think of a teacher-student relationship; a teacher teaches better when they know what the student doesn’t know. Agentic interpretability proposes that because machines can now talk, we can have them build a mental model of us - of what we know and what we don’t know. And it can then use that mental model to help us understand them.

There’s a trade-off. Inspective interpretability aims for a complete and exhaustive understanding, which is critical for high-stakes situations. Agentic interpretability strikes a balance between interactivity and completeness. You may not understand everything about the model, but you, the user, can steer the conversation to learn the aspects that are most useful to you.

Neel: I think it’s also useful to think of the two approaches as answering very different questions. Inspective interpretability asks: “How did the AI do this?” Agentic interpretability is a bit more like a standard machine learning task: you’re trying to build a system that is good at explaining complex things, which is difficult but not as fundamentally different to standard machine learning as inspective interpretability is.

Policy Implications and Dream Experiments

Conor: Let’s wrap up. If you had one policy recommendation for governments on explainability, and one “dream experiment” that you could run, without any resource restrictions, what would they be?

Been:

Policy: I’m co-authoring an ACM Tech Brief on this, that will highlight what is possible now and in the future. The summary is: do simple things now. For high-stakes decisions, standards requiring simple, statistically solid validation methods like ablation tests, which can check, for example, if a certain feature was important in a hiring decision, could be worth consideration. We can do that today.
Dream Experiment: Interpretability is a means to an end - the end is making AI work for humanity. My dream scenario would be to have an infinite feedback loop with experts at the frontier of human knowledge - mathematicians, scientists, and others. We need their insights on what they want from AI in their workflow to build tools that truly help them.

Neel:

Policy: I was recently part of a position paper on Chain-of-Thought monitoring. We had people from lots of the AI labs, as well as many other institutions. There was a broad consensus that Chain-of-Thought monitoring is valuable for safety - I was actually surprised we ended up there, but we did. But it’s also a fragile opportunity. There’s just a bunch of things that developers could do, accidentally or deliberately, that might have the side effect of breaking this. While I wouldn’t ban developers from ever breaking it, I think measuring “monitorability” and providing an adequate replacement if degraded is worth further research.
Dream Experiment: AGI safety research is hard because we’re trying to fix future problems in AI systems that don’t exist yet. But we do have good techniques for giving models properties that they didn’t learn by themselves during training. So, if you wanted to, you could train a model to only speak French when it is speaking to a man. We could get much more creative with this. My dream study would be to create an enormous, open-source library of models with different kinds of carefully engineered safety problems (e.g. models that fake alignment), that we then deploy our leading interpretability techniques against. We could run a global competition where people compete to identify and, ideally, fix these problems. I think this is technically possible today and would be an immense public good.

Thanks to Nick Swanson for his support.

What history can tell us about AI's economic impact

Julian Jacobs — Thu, 02 Oct 2025 12:20:20 GMT

Julian Jacobs recently sat down for a discussion with Carl Benedikt Frey, an economic historian and professor of AI & Work at the University of Oxford. Carl’s books include The Technology Trap and the newly-released How Progress Ends, where he challenges the conventional belief that economic and technological progress is inevitable.

In the discussion, we explore the lessons from previous technological shocks, and their relevance, or not, for AI. We explore the growing evidence base on AI’s economic impact and conclude with Carl’s rapid-fire forecasts. The below is a condensed and edited version of the discussion. Enjoy!

The origin story

Julian: Let’s start at the beginning. What initially drew you to studying the economic impacts of technology, and AI in particular? Which questions were most interesting to you, and have those changed?

Carl: When you consider that being human was miserable for a long period of history and then look at the material prosperity we have today, you start thinking about why the first Industrial Revolution happened and why it took so long. When you start thinking about that, it’s hard to think about anything else. That’s what put me on the path of studying economic history and trying to understand how technological progress happens.

There are several interconnected questions that were initially of interest to me. The fact that modern technology is, in principle, available almost anywhere, yet so many places around the world are poor, is a significant question that is not answered conclusively. Also, human existence remained pretty miserable for a long time. Many of the technologies that made the early Industrial Revolution were not very complicated, yet it took a long time for people to conceive and adopt them.

More fundamentally, technological progress is the driver of growth and prosperity over the long run, but there have been many hiccups. Many people lose their jobs. What economists regard as the “short run” can be a long time for some, so it’s natural for certain groups to resist it. A key question is: how can you put mechanisms in place that aid that transition and make technological progress more inclusive, not just in the long run but in the short run as well?

Lessons from History: adoption lags, new tasks & steering technological progress

Julian: People often argue that AI’s economic impacts are deeply uncertain. And of course that’s true. But it can also cause us to overlook how much social science and history can teach us, for example about the ways in which past technology shocks interacted with labor, wages, and the economy at large. When you look at past technological shocks, are there consistent dynamics, principles, or lessons to call out?

Carl: I think there are several lessons. One is that it often takes longer than you think for these things to feed through. You can have exponential technological improvements without exponential economic growth. We’ve seen this throughout the history of computing; the growth rate in the economy does not mirror Moore’s Law by any stretch of the imagination.

Part of that is because for technology to have an impact, it needs to be put into use. For it to be put into use, humans have to want to use it, and we need institutions in place that permit us to use it. There are interest groups not interested in seeing their jobs and incomes disrupted. There are firms not interested in seeing their business model overtaken. There are complementary investments into skills and infrastructure that are needed. Those factors feature whether it’s the first Industrial Revolution, the second, the computer revolution, or today with AI.

A second lesson is that productivity growth is not going to be significantly higher unless technology creates new tasks, activities, and industries. If all we had done since 1800 was automation, we would have productive agriculture and cheap textiles, but not much else. We wouldn’t have rockets, antibiotics, vaccines, computers, or AI. Most improvements in the standard of living come from doing new and previously inconceivable things.

Productivity only surged in the late stages of the first Industrial Revolution. Just mechanizing textile production didn’t lift the growth rate that much. It was with the railroads and the chemical industry—and later in the mid-20th century with the automobile and electrical industries—that we saw a significant uptick because they created a lot of new tasks. We have seen some of that with the computer revolution, but not yet to the same degree. We haven’t yet seen much of that with AI, but hopefully, it’s to come.

Julian: So we have two key lessons: adoption matters a lot and to boost productivity, technology has to create new tasks, activities and industries. When it comes to inequality, for a technology shock to boost productivity, must it also boost inequality, even in the short term? And how much can governments steer the ways in which technological shocks unfold throughout the economy?

Carl: It depends. If technology takes a more labor-replacing form, you’re more likely to see inequality rise, backlash against technological change, and the labor share of income fall. If it takes a more enabling form, the opposite is more likely. You’re going to see the labor share of income potentially rise or remain stable, growth being broadly shared, and less unrest.

Now, as to your question about if we can steer technological progress, I think it’s extremely difficult. We don’t know how AI is going to evolve without intervention, and we’re not sure how it’s going to evolve with intervention. Often when you intervene, you favour one interest group at the expense of another. You might favour existing interest groups rather than future, more dispersed groups who don’t know that they are potential beneficiaries. It’s very hard to do in practice, and we should be humble about our capacity to do that.

That’s not to say that intervention is impossible. If you have a clear objective, like greener energy, you can tax fossil fuels and subsidize renewables. When the goal is to make growth more broadly shared, you can invest in educational systems and training programs, but many of those have not been very successful. It’s hard to know what skills will be needed. Not so long ago, there was a big drive to invest in coding skills here in the UK; that seems not to have been the most productive bet right now.

AI’s economic impact, so far

Julian: If we think about AI’s impact on the economy so far, what can we say, with any confidence, if anything?

Carl: I’ll note that what we have evidence for today may not necessarily apply in five years.

As things stand, the evidence comes from two main strands. First, the exposure studies: these map which tasks and jobs may be most affected by generative AI and, on balance, they find higher-skill, higher-education roles are more exposed. Second, experimental studies in settings like customer service, writing, and coding show the largest productivity gains accrue to novices and lower-skill workers, which suggests AI lowers barriers to entry.

AI is also eroding language barriers: it’s now far easier to collaborate and transact across languages, which makes exporting services more feasible for many workers. If those patterns persist, I’d expect firms to offshore a greater share of professional-services activities—accounting, management consulting, financial modeling—because AI compresses productivity differentials and reduces the language friction that used to favour onshore talent. Indeed, early evidence indicates that entry-level work has been disproportionately negatively affected by generative AI, which I strongly suspect reflects the offshoring of such activities.

As for what we don’t have evidence for, I would point to productivity effects. Most writing about AI is very speculative at this stage. We observe trends in aggregate productivity, but it’s not very impressive and doesn’t point towards AI having a material impact. There is very little evidence of an impact on labor markets or productivity, which speaks to the fact that their impacts so far have been relatively limited.

Julian: Some economists now argue that AI is reducing employment among recent graduates. But it’s extraordinarily difficult to disentangle any potential effect from AI from other macroeconomic drivers, like interest rate shifts. What do you think?

Carl: Vanishing employment opportunities for recent graduates predates generative AI. It’s quite possible that AI has exacerbated it, but it’s not a new AI-driven phenomenon. At the same time, we might be at an inflection point. You could imagine yourself in 1960 in Pittsburgh, seeing minimills emerging and saying, “Look, I’m not seeing anything, the integrated steel mills are still up and running.” Obviously, a decade later, everybody in Pittsburgh would be feeling the impact.

The Future of Work

Julian: Fears of job loss rank high in surveys of public attitudes to AI, and many observers have sketched scenarios where few jobs remain. What do you make of these scenarios? It strikes me that, despite fears of a ‘jobless’ future, there are actually plenty of industries where we desperately need more human workers— healthcare, childcare, and elder care. And of course people could also contribute in other ways, beyond traditional employment.

Carl: I’ve been puzzled by how good AI is and how little it’s showing up in any statistics. If you take the latest reasoning models, they are plausibly a better tutor than I am in any subject, including in my area of expertise.

I think where they don’t do particularly well is dealing with novelty. What you want in a changing world are systems that adapt to new circumstances and can learn from just a few examples. I don’t think we’re there yet, and it’s unclear to me how rapidly that is going to happen. Given what we already have and the limited impact it has, I struggle to see a world of 40% unemployment in 20 years.

As a thought experiment, suppose AI could do almost everything. What work remains? We still watch professional chess despite computers being stronger. Humans value competition, authenticity, and status; there will be activities we do for their own sake. A large share of employment would likely persist in caring roles—health, childcare, eldercare—where presence, trust, and relationships matter. Symbolic and representative roles—politicians, clergy, community leaders—don’t disappear. And there’s in-person service work people want as part of the experience, plus new categories we can’t yet imagine.

If unemployment did climb dramatically, the central question becomes distribution and meaning. We’ve been through a transition in which work has become deeply associated with our status and our place in society. We don’t need full-fledged automation for that to change. It’s not clear that in the age of AI roles like lawyer, consultant, or professor will have the same status associated with them. Given that we already transitioned from a society where we didn’t define ourselves by work, I guess we could transition back to a society where it’s not the key thing. It would be a major transition, but not an unprecedented one.

The Technology Trap & How Progress Ends

Julian: A key theme in The Technology Trap was the idea that technological progress could lead to more economic and political polarisation? Do you see this happening with AI?

Carl: I do worry that we are. I published that book in 2019, before generative AI. What’s changed since then is that the category of people most exposed to this technological trend is increasingly in professional services. But not because AI is going to automate everything they do, more because of this combination of AI and offshoring exposure. From the viewpoint of the American professional service worker, does it matter whether somebody else does their job in the Philippines or whether it’s outright automated? No.

What is different is the political economy of these changes. The people impacted by this are much more likely to write an angry op-ed in the FT than the average factory worker who felt the impact of industrial robots. Going forward, the people who stand to lose will have much more of an impact in writing the rules and regulations around the technology, and I suspect are more likely to do so in their own favour.

Julian: Moving to How Progress Ends, the thesis is that we need both decentralization to enable innovation, as well as the right kind of ‘bureaucratic context’ to scale it sustainably. Can you unpack this idea for us, with some examples?

Carl: The core idea is that durable progress needs two different institutional settings at two different stages. Early on you want decentralized exploration so lots of independent actors—entrepreneurs, labs, investors—can pursue competing designs. This is because we almost never know, ex-ante, which path will win. It’s vital that a “no” from one gatekeeper doesn’t kill a technology. Search is a nice illustration: some prominent investors passed on Google, but others—Sequoia and Kleiner Perkins in 1999—backed it, and that pluralism let the better architecture surface.

Once a design looks promising, the problem flips to scaling under a bureaucratic context—standards, procurement, liability rules, regulation, and infrastructure that can push costs down and manage externalities. The mRNA vaccines show the sequence: small, decentralized firms pioneered the platform, but mass deployment depended on highly bureaucratic processes—regulatory review, pharmacovigilance, advance purchase agreements, and a government-coordinated cold chain.

You see the same pattern with electricity and automobiles: tinkerers explored incompatible systems; scaling required standard voltages and frequencies, grid interconnection, building codes, road standards, licensing, and insurance. Where systems are too centralized—think the Soviet design-bureau model—experimentation narrows and promising ideas die when a single funder says no. The point isn’t “state versus market” but sequencing and balance: let many flowers bloom in discovery, then rely on capable, rules-based bureaucracy to diffuse and discipline the winners so innovation scales sustainably.

Policies of least regret

Julian: We seem to have a poor understanding about the policies that can best support an economy’s adjustment to technological change. Looking back at the history, are there any “least regret” interventions, with evidence in their favour? For example, many people proposed public retraining programmes. But, as we’ve written, there’s little reliable evidence that many programmes work very well.

Carl: I think Danish “flexicurity” in general is the right pathway. It provides flexibility, meaning that firms can pivot when technological shifts happen, and security, meaning when you lose your job, you have some welfare. Historically, Britain in the 18th century was the only economy in the world that taxed itself at 2% to provide for the poor. In places where the poor laws were more generous, you had less unrest accompanying industrialization. We know that providing security for people makes them more likely to go with the flow when it comes to technological change. I think that same intuition still applies today.

Quickfire Predictions

Julian: US and UK productivity has hovered at ~1% since 2008. If you had to guess, over the next decade, do you expect AI’s average annual percentage contribution to productivity growth in the US and Britain to be closer to 0.1 percentage points, 2 percentage points, or more?

Carl: Closer to 0.1 percentage points.

Julian: Will AI’s long-run economic impact be greater than, less than, or equal to the Industrial Revolution?

Carl: I think it’s going to be less impactful than electricity and the internal combustion engine, but more impactful than the first Industrial Revolution.

Julian: Will AI primarily increase or decrease inequality in the short to medium term?

Carl: I think it will decrease it globally, but increase it within advanced economies.

Julian: Will AI make today’s low- and middle-income workers better off over their lifetimes?

Carl: Absolutely. Most people are both producers and consumers, and it’s definitely going to make them better off as consumers. I think carpenters and plumbers will grow in status, while some knowledge work will decline in status.

Julian: Regarding AI’s impact on society, what is your greatest hope and what is your greatest fear?

Carl: My greatest hope is that AI will boost productivity and economic growth and solve our economic problems—and there are many of them. My greatest fear is that it won’t happen.

A discussion with Tyler Cowen

Zhengdong Wang — Tue, 05 Aug 2025 11:20:54 GMT

Tyler Cowen, Professor of Economics at George Mason University, Faculty Director of the Mercatus Center, writer at Marginal Revolution, and host of Conversations with Tyler, gave a talk on AI and economic growth, at Google DeepMind’s London offices, in early July. Tyler discussed how his views on AI have updated in recent years, across many areas, including how to best measure progress, the current market landscape, and how AI may impact living standards. We ended with a Q&A with Zhengdong Wang, a researcher working on Gemini and Post-AGI Research. Thanks to Tyler for sharing his thoughts with us.

Transcript.

Zhengdong Wang: I'm very excited today to have Professor Tyler Cowen, who is a professor of economics at George Mason University and has the popular economics blog Marginal Revolution, and Conversations with Tyler. So we'll have Tyler say a few words, as long as he likes, and then have room for lots of questions at the end. Without further ado, Tyler.

Tyler Cowen: I'm coming here from a talk at 10 Downing Street on AI. And while that's off the record, I think there's a few things I can tell you about what I said. Also, I think it's within the bounds of the agreement just to report they seemed highly intelligent and informed about AI and had good attitudes and I found it a very positive experience. And they had nice things to say about DeepMind. So, that was all to the better. Mostly I'll just say what are some updates I've made over the last two years as AI has progressed. Even if you were not at the talk I gave here two years ago, I think it will make perfect sense to you. But just a few things I told them at 10 Downing Street. Obviously the question is what should we do, right?

And some very practical things I told them they should do is that the UK has an asset, healthcare data, really the best data in the world and a lot of citizen trust in the healthcare system. I told them they should do more with that. The UK can be a leader in integrating AI into business services, which is a long standing UK export strength, and a lot can be done in that direction without the UK having to build a top of the line foundation model of its own. And then also education. The UK and indeed most or all countries should restructure its education systems at all levels so that a significant portion of the education is simply teaching people how to use AI. Doesn't have to be at a technical level, and I understand full well that whatever you teach people will be obsolete for sure, two years later, possibly two months later.

But the real thing you're teaching them is simply that they have to learn how to use the thing, and you're making that a big part of the curriculum and just impressing on them, this is a thing to be learned like reading and writing. And that was something they could do. Those were three things I told them they should do. There's a bunch of other things I told them they should do, but I feel they can't do, like make energy much cheaper or maybe they can do it, but they're not willing to do it. So if you're curious what happened there, that would be my brief report. It was pretty good. I had a great audience and I enjoyed doing it.

Any AI talk, you don't always know what you're going to meet. There's some worry you've got to show up and just, like, shake people, right? Like, it's happening. I didn't have to do that. So that was, you know, the biggest learning on my end. If I go to an audience in New York City in the US, we used to think New York City was our most sophisticated audience. What you have to do there is to shake people. I was at an event in New York City. I guess it was November, not about AI, but there were five of us in the green room, the moderator, the organizers, two people on the panel, all well-known people. And I used the phrase AGI. Not one of the five knew what AGI was. Like I don't know what it is either in a deep way, but they didn't even know superficially what it might have been. And that's stunning. That's when you have to shake people. But at Downing Street, I didn't have to shake people.

Now, let me just say some of the things I've changed my mind on in the last two years. I don't know if any of these are major. First, the rate of progress in AI quality, I think, is roughly what I expected. But the thing I have a much finer sense of is biases in our measures of quality using benchmarks. So most of you, I suspect, spend a lot of time with benchmarks. And while of course, that's what you should be doing, I think it introduces some significant biases in how you measure progress.

So from my point of view, if you work with benchmarks, you'll think the rate of immediate future progress will be really quite high because on purpose you're choosing benchmarks that the current systems can't handle very well. And if you're choosing questions for, you know, an exam for your AI, if it just aces the question, it's considered not a useful question, you throw it out, you try to get in more questions it can't handle. So in the period to come, it will look like, oh, it's doing better and better on the questions it couldn't answer. My point of view is different. I'm what you'd call a normal human being.

I've even said, half in jest, but half meaning it, that we have AGI already. So on questions that actual humans ask, I think the rate of immediate future progress will be quite low. But the rate of immediate future progress will be quite low because the rate of past immediate progress has been so high. So a lot of the questions, you're just not going to get much better answers. So you ask it now, how many r's are in strawberry? It tells you three. That answer isn't going to get any better. It's three. So the rate of progress there is zero. The rate of recent progress, you could say is infinite, moving from two to three as the answer is a pretty big gain.

If you ask one of the better LLMs, what's the meaning of life? It's kind of a stupid question, right? I still find, say the three best models will give you a better answer than humans will give you, which is impressive. And I also think five years from now, the answer won't be that much better. I don't have this Gary Marcus like view that the AI is all screwed up. I just think the answer only gets so good to the question, what is the meaning of life, because it's a bit of a stupid question. What is the meaning of life? So it's close to a perfect answer now. And I didn't feel that, say, the GPT-4 answer to what is the meaning of life was great. It was okay, but not better than what I could have written. Now, it's better than what I can write.

So AI researchers have this bias toward looking for future progress, but the actual basket of information consumption estimates of what progress has been is that on most things real humans care about, I think we're at AGI.

Let me give you another example. Maybe some of you work on this, but probably none of you do, and that's microeconomic reasoning. So if you don't know, I'm a PhD economist. Obviously, I've worked a great deal on microeconomic reasoning. I've written a textbook on microeconomic reasoning, and published a lot of articles. It's now clearly at the point where Gemini or o3 or Claude, they would beat me in a competition. I don't bother running it because I know they would win. It would humiliate me. But they win.

So that's a kind of AGI. There's different definitions of AGI. A lot of people now switch it to “it can do any job.” We clearly don't have that. “It can do any job behind a computer,” “it can do any job that involves processing information.” We don't have any of those things. But one of the older definitions was that on at least 90% of tasks, it beats the experts. And I think for that, not on what many of you are measuring, you know, the future progress on math Olympiad problems, but just for what people actually want to know, you could say we have AGI and it's not a crazy claim.

So I think again, we're underestimating progress over the last two years, but we're overestimating what it will be for the next two years because we don't realize that we've maxed out on so many different dimensions. And when o3 or or Gemini, you know, beats me on microeconomic reasoning, I look at those answers and my reaction is it's not going to get much better. It'll get somewhat better. But it's like the number of r’s in strawberry. That we know is not going to get better.

If you ask the top models, why was economic growth in Mexico slow in the 19th century, you get very good answers from at least three models. And again, that will get a little better, but we're sort of maxed out there. So my understanding of what's happening has been shaped by this measurement issue. And this corresponds in economics to, like, what's the actual consumption basket that you're using to measure progress? Say you're measuring progress over the last 25 years. If you use internet goods, the rate of progress is incredibly high. If you use the actual consumption basket, the rate of progress is much lower. So just how much the choice of basket matters for measuring progress is something I did not understand that clearly two years ago, but I feel I have a much better grasp of now. Not in a pessimistic way.

A number of other revisions I've made. Two years ago, I thought we would have a big shakeout and a bunch of AI major AI companies would go bankrupt, something like the dot-com bubble bursting, and then AI would come back in a big way and it would all take off. That's not my scenario anymore.

I think everything has become well capitalized enough, if only through options, that the major companies just will keep on going. And if you take one of the smaller companies, obviously smaller than Google, Anthropic, it's both doing well, it had a great revenue report this morning, but just the option value on buying Anthropic means the companies we all talk about will simply keep on going. I think that's good. It's certainly good for people who work in the AI sector. And that again is an update I've made over the last two years.

I'm also less likely to think that core foundation models will be commoditized. The models to me seem to be evolving in different directions and maybe will not converge as much as I had thought. So for any task I perform, I have a clearly preferred model, like computation, I would definitely prefer Gemini. Most of my actual life, I tend to prefer o3. So my wife and I were traveling in Europe, we were in Madrid for four nights and we wanted to ask: “What are all the concerts going on in Madrid that Tyler Cowen and his wife would want to go see?” o3 is like A+ for that.

You might need to give it a bit of context. Gemini is very good, but I think o3 would be my favorite. That kind of question is my most common use. And again, we're already at the point it's just not going to get that much better. Hallucinations are very low. They could still go down a bit. But say Anthropic being very good for business uses, Gemini being clearly the best for a lot of computational purposes. I'm less convinced that it's all going to converge and I think there'll be a lot of customization. So modestly away from commoditization has been an update in my view and that's related to the view that the companies that you talk about won't go bankrupt because if it's not a single same everywhere product, the price is not just competed down to marginal cost. There'll be all these boutique products that will be spinoffs of foundation models and a number of companies will just be able to make a lot of money. And good news for all of you, I think good news for the world, but that would be a change in how I've mentally modeled the sector.

Another thing that's surprised me, I would say positively, is you know, we all know OpenAI has been very commercial in a lot of its product decisions, and I feared at the time that that would hurt the quality of their top models. And I also feared at the time OpenAI for a number of reasons would not be able to keep a lead in some areas. Commercialization has proved useful as a cross subsidy for attracting talent, just for keeping the company sharp. And they have great cutting edge models, and they have some commercialized things which I don't use at all, in fact, but I think that's gone fine. And that's gone better than I thought it would have. That would be another update I have made.

The way in which current models integrate reasoning and search happened better and more quickly than I had thought. Pretty small update, but that's been my big surprise over the last, say, six to nine months. I know you all see these things more quickly than I do. But nonetheless, an update.

Grok has surprised me. It did better more quickly than I had thought. I'm waiting for Grok 4, which I think is coming out July 4th. But the notion that you can do catchup by GPUs, brute force, having your people sleep in tents, and send them to a lot of San Francisco parties and talk to others, it's sort of worked. Now for me, Grok is clearly worse than Gemini or o3 or Claude, but it's pretty good, and there's actually a few areas where I think it's the best model. So if I want to know something very recent, I will go to Grok. And that's very useful. So I do use Grok, not just to play around with it, but I actually use it, and we'll see how good Grok 4 is. But I thought it would be worse than Llama, and to me it's clearly more useful than Llama. So that would be another update I made. And I think Grok 4 will be this big moment where we see just how well is that strategy going to work.

People always are debating how much does distribution matter? Distribution, at least so far, has mattered less than I would have thought. This is a big question for Google DeepMind. Like I'm on WhatsApp all the time, all day. And at the top of my WhatsApp is some Llama AI. I tell you, I never use that. I don't care how convenient it is. Might get better with all the hirings, but it really doesn't matter. And not only am I that way, but as far as I can tell, the whole world is that way. They don't care about that distribution channel.

Google sends me all kinds of little messages which confuse me. You can now do this in your Gmail. I haven't responded positively to those. I suspect they're quite good products, or if they're not now, they will be soon. But somehow I get them at moments when I'm not ready to make that leap. So the fact that I use Gmail all the time, you would think is a big advantage for Google DeepMind AI products. In fact, I don't think it has been so far. So this to me has been a surprise, it could change. I think if it changes, you all are the ones in a position to make it change, probably more, you know, than Meta with Llama.

But simply the brand name of ChatGPT has become a word like Xerox or Google. People haven't heard of Claude or Gemini to a remarkable degree and they just call things ChatGPT. That has been very sticky. A name that was viewed as stupid, like arbitrary, has turned out to be brilliant. I think that was an accident, but it's one of the best marketing names I've heard in retrospect. And how much that is people's entry point into talking about what this is, stickier than I would have thought and actual ease of distribution, less sticky.

Something I noticed with my own use. So I'll have open windows to say Gemini, Claude, OpenAI products on my laptop, more or less all the time. But if I have a very recent query, I'll do the extra click to get to Grok. You know, I have Twitter open, but you have to click on that kind of horn symbol to get to a Grok query. And Grok is more useful. I'll actually do the extra click and not feel bad about it. And I think that's pretty stable. I think I would do two extra clicks, but again, the way that has all played out, these differences across the models which look small are more persistent and to me more important than I would have expected. So I'm still doing Grok and I don't mind the click, even though I could do other things because it just feels to me more recent on some narrow subset of questions. So like in the US, we're passing this new budget bill, it's called BBB. What's in the bill changes all the time. If I want to ask what's in that bill, I actually go to Grok and I feel it's the best for that very specific kind of query, but as someone who writes as a columnist, that kind of query is important to me pretty often.

It also surprised me that there's still not really much regulation of AI. And I would have thought we would have had something by now. The Biden executive order was not that binding to begin with. The Trump people tossed it out. You probably all know the Senate turned down this idea of a moratorium on state level regulation. So I do think we'll get a fair amount of regulation in the next 12 months from a number of states, probably New York, California lead the way, things will pass. Over time, there'll be some federal consolidation of the different state laws. But still we've got a long period of time where AI is viewed as a big thing and it's just kept on going.

And I thought a year ago we would have had something. It was not obvious to me that Trump was going to win. And yet, still no regulation. And even though that vote in the Senate was 99 to 1, it was much closer than it looked. Once people knew it was going to fail, they all wanted to vote against it, but it almost didn't fail. So the 99 to 1 number there is quite misleading. We almost walked into this regime where there just was not going to be state regulation for 10 or maybe 5 years. And that to me was also a surprise.

That Trump has seemed so committed to that view of AI. I wouldn't say it was a surprise, it didn't surprise me, but I didn't predict it. And in many, many other issues, Trump has been extremely fickle. On that issue, at least so far, we've seen no fickleness. So that's a kind of surprise, something we might not have expected. The pending deal with the UAE, that has come together so quickly, also for me, a significant update.

I'll just say in closing my formal remarks before we get to questions. I'm still a believer in slow takeoff, and I've come up with another way of framing this. I'll present just over two or three minutes. So the way you all think about takeoff is how much progress your systems are making from the point of view of someone doing tech. That's a very valuable way of looking at it. But there's another way as an economist where you can ask the simple question, how will it boost living standards? And just look at a household budget and what people spend their money on and then ask how long will it take before this stuff gets cheaper. So let me do that for two or three minutes.

You look at a household budget, this is not controversial, but people basically spend on rent, food, maybe education, maybe health care, right? So let's talk through those.

Rent. There's nothing in AI, no matter how good it is in the tests, that's going to lower your rent. In fact, it could be that AI makes living together with other smart people more valuable. Rents in the Bay Area, London, a few places could go up. But there's no simple path toward AI somehow making it easier to build homes. The binding constraint is often the law. It's not really the price of construction. So rents are not going down anytime soon. Huge part of people's budget. So the effect of AI on rent for the foreseeable future does not improve living standards, I would say.

How will AI affect the supply and price of food? Again, food, everyone has to eat. One striking feature about the literature on economics of agriculture is that you can have very simple agricultural improvements and they do not spread geographically very quickly. So you look at the US and Mexico, for the most part, US agriculture is much more productive. There's kind of a free trade zone across US and Mexico, more or less. US and Mexico are close, plenty of people in the US speak Spanish, enough people in Mexico speak English. There's a lot of reasons why you might expect a lot of spread. There's been a fair amount of spread, but also really not that much.

So you can have great new ideas that can be way simpler than what AI will give you. And the amount of time it takes for them to spread to other parts of the world can be decades or even centuries. It's not that there's no spread. But as AI gives us innovations, say something genomic that makes food production better, cheaper, more nutritious, fortified rice, whatever you think it's going to be, I think all that will happen. But the time horizon you need for it to make food truly cheaper for just a typical family in London, US, or for that matter, Mexico, I'm not sure food will really be any cheaper in the next 10 years. Again, even with a very optimistic view of AI, I'm not pushing the Gary Marcus line here, I don't agree with him at all. It just takes a long time to get ideas put into practice in agriculture. So two biggies, rent and food, we're kind of stuck.

Education, very different. This to me is super complicated. I would say we already have millions of people who are much smarter and better educated because of AI. There's nothing speculative about that. It's just self-evident. We have a lot of other people who use it to cheat, possibly they're stupider. I'm not sure. I think some of that cheating you learn from, but it's complicated in any case. And how far are we from a point where the existence of AI, say, makes 2/3 of high school students smarter and better educated? There, I genuinely don't know. I would say like there's some 5 to 10% who right now are just massively smarter and better educated. But to get to the 2/3 point, how long will it take us? I don't know, highly speculative. But it's not obvious to me that it's coming in the next few years. Educational institutions, they're in denial. Some of them hate AI. They're typically run as nonprofits. They have a lot of different constituencies that have to agree before they'll make a big change. So if that were 10 years or more, that wouldn't surprise me. Again, saying I don't know, a lot of it will spread outside of schools, of course.

It won't, I don't think anytime soon, lower how much we have to spend on education. Like the price of tuition at Harvard or a state school, it's actually fallen a bit over the last dozen years in real inflation adjusted terms. I don't know that it will fall more because of AI. So that one is a question mark. I would say extremely asymmetric distribution, but possibly longish lags before it hits most people and even then they're smarter, but they still have to pay all this tuition. You might think there's some more distant future where the AI certifies you, you don't have to pay tuition at all. That could be great. But then to me that's clearly more than 10 years off. And it's not a question of the AI not being good enough.

As I said before, on most things humans care about, the AI is already smarter than we are, and the AI being smarter on math Olympiad problems for the high schooler, it's irrelevant. You know, if I even compare like o3 and o3 Pro, I'm a PhD economist. If it got better than o3 Pro, like I'm not sure I would always notice. So we're at some frontier already, where making it better does not educate humans more. Although for all kinds of technical problems and bio, finance, trading, whatever, it'll be much, much better, more or less indefinitely. But for the humans, education, I would just say, is a big question mark.

And then there's healthcare, which I also think is quite different from rent and food. The way I would analyze that is I think AI, over some time period, I don't know, 30 years, 40 years, will basically cure everything. So I'm very optimistic about this. The work you're all doing, it's fantastic. I hope I live long enough to benefit from it. Just incredible. The Arc Institute, you know, in California. There's a lot of regulatory barriers. So for me, very little of that is a five-year thing, but definitely a lot in 20 years. Like FDA approval is typically 10 years even when something's ready. So that to me is at least a 20-year thing, but it's very real. So my vision of the AI future is say in 40 years, everyone dies of old age. So most of you in this room, you know, buckle your seatbelts, but you'll live to 97 or whatever that number is when you die of old age, and you won't have Alzheimer's for the last 14 years. That's incredible. It's a huge gain. But in terms of your living standard now, I think it basically means there'll be more treatments and more medicine. So the percentage of GDP spent on healthcare, maybe goes up to 30%, which in my view is a good thing.

The end gain is you get to live to 97 and along the way you feel much better. It's what you should be spending your money on. But for your life, you know, up to age 80, say now you can expect to live to 82, you feel somewhat better, but there's actually higher health care costs because there's more treatments. So up until age 80 or something, like your living standard is not higher. Only when you start to get really sick is your living standard higher. So for the first at least 70 years of life for most people, you don't have higher living standards from better healthcare, your rent isn't lower, food is cheaper over some horizon, but maybe not that much. And then education is this complicated thing, but obviously anyone in this room will be much smarter and better educated because of AI.

And when you put all that together, again, you can be massively optimistic about model progress. As probably many of you are, or you wouldn't be working here, and I would not disagree with, you know, the median estimate in this room of model progress. But in terms of that making a big difference for GDP, living standards, changing the world, I just think that's pretty slow and it's a tough slog. And it's not a tough slog because there's anything wrong with the models. It's just human institutions are imperfect and move very slowly. Anyway, those are some of my basic takes on things. There's plenty of time for questions, and with that, I will sit down. Thank you all for having me in.

ZW: I will take advantage of hosting and ask a couple of questions, but then we'll alternate between the room and the call. Okay, so Tyler, if not benchmarks, what is a realistic thing researchers can focus on instead?

TC: I don't think it's an instead. You should keep on doing everything you're doing with benchmarks, but have some alternate standard, create a consumption bundle benchmark of what people actually use AI for, which could be like naming the dog. We use it to diagnose our dog. The dog has a rash, we take a photo, we ask the AI what's wrong with the dog. The AI says the dog is fine. We save ourselves a trip to the vet.

I'm a relatively sophisticated user of AI, and I'm like snapping a photo of the dog's rash. So create a bundle of actual uses, you know, weighted by money importance, time importance, and see for those things what progress are you making? What's the living standard gain for the normal user? And I just think that will give you perspective and you'll see progress over the last two, three years has been unbelievable, way better than you thought. But on a lot of those fronts, it's not going to get any better.

ZW: How much will the labs or the model creators capture the gains of AI?

TC: When you say model creators, I've seen a lot of recent data and gossip about different salaries. So that's my best estimate. I don't think those will fall anytime soon. So you'll all be fine. If you mean at the equity level, I'm not sure there are huge gains from buying into those companies. Anthropic strikes me as undervalued because GPT is so focal a word with investors, they don't understand how well Anthropic is doing in the business market. That might be undervalued. Google and Meta, it's this complex bundle of stuff that is so opaque and hard to unpack. It would not surprise me if those two both did great things but lost money on the investments. I don't know how much the markets have already capitalized that. OpenAI, I think will do very well. Nvidia, I'm nervous that it's a little overpriced, but because there will be good substitutes, not that I think there's some defect in the company. So I'm very bullish on the sector, but if I had what you would call real money, I wouldn't be pushing all my chips into it. I think it'll be a mixed picture and it will do quite well. But markets are good at figuring things out and they're already at work trying to do that.

ZW: To the extent that individuals or companies can use AI to accelerate themselves, like why don't people do it more? When you say you would do one click, maybe two clicks to go to a better model, isn't that sort of not normal and most people wouldn't actually do that?

TC: Most people, you know, there's different estimates of this, what percentage of people actually use ChatGPT or something like it. I think the true estimate is actually fairly high, but it's very psychologically circumscribed. The same people in other contexts view it as a threat. They don't want to think about it too much. They don't want to have to think about policy for AI. And they don't want to have to think about two facts that will change the nature of their job a lot. And they cannot actually explain to their kids or grandkids what kind of world they'll grow up in. So there's this extreme psychological bifurcation, and I think that will prove pretty sticky. And one of my self-accepted missions is to go around and just shake people a bit and say, look, this is happening. It's fine that you use GPT to, you know, write your commencement speech, but you actually have to learn a lot more because many things will change a lot. And I've felt that the people I get to talk to, I've had good progress doing that.

But most people and being an elite does not inoculate you against this. Most people are asleep. The East Coast is way worse than the Bay Area. The Bay Area is advanced and sometimes in my view way too crazy. Countries I found vary. Europe, basically not there. I spoke to a lot of people in Madrid. Everyone said we're not there yet. It's pretty tough going. So the world will be hit by these shocks, and people are reluctant to start preparing now for psychological reasons. I think you see a bit of the same people ask, well, will China invade Taiwan? I'm not sure, but they might, but there's nothing you can do about it that's so obvious, so people just postpone it because they don't know it's a big thing that would change the world a lot, could be quite menacing. AI also, though I view it as a big net positive, the fact they'll have to learn almost an entirely new job, people typically kind of hate. If you poll AI, a lot of people will say they hate it. I don't think hate is what they actually mean, but we should take that word seriously. There's something about it they hate. It's going to be very tough. I think we're underrating the social stresses that will result.

ZW: Are you down for a quick-fire game like Overrated or Underrated, but Rising and Falling in status, say, when we're 97 because of AI? So I give you a term, and you say rising and falling.

TC: Oh, of course. Yeah, yeah.

ZW: Great. And feel free to pass. The Western Canon.

TC: The Western Canon will make a big comeback. So Hollis Robins tweeted that in one of the Star Treks, was it Captain Picard? He works with his version of AGI, and he reads old books. New books somehow don't match in an AGI era. So we already see it's like a thing on the internet. Have you read Middlemarch? Have you read the classics? Shakespeare, all that I think already is making a big comeback, that's more or less permanent. Old books, the classics will rise in status. New books will feel like they don't really fit. I think at some time horizon we'll reconfigure how new books should be written to make sense in an AGI world. But for the next 10, 20 years, I would be very short books.

ZW: The hedgehog and the fox.

TC: I think being a great hedgehog or fox is what will go up in status and being a mediocre one of either will fall in status. I worry there's a lot of people now, they come from well-to-do, well-educated families. They would expect, say, to work for McKinsey or something. McKinsey already is hiring way fewer people. They might have ended up in like the 94th percentile of the income distribution and been a high-status professional with sort of the perfect marriage they felt they wanted. And I think a lot of that will disappear. Those people are smart, conscientious, they'll do fine, but in terms of status, they won't get anything close to what they wanted. And they'll be maybe out-earned by some very good carpenters and gardeners. And they're not ready for that. Those people are not going to take up the pitchfork and kill the rich. They kind of are the rich. And they're very politically influential, they know how to organize. I don't know what they're going to do, but I fear politically that will be very ugly in a way we're not ready for. That's one of the things I worry most about AI, just the very rapid redistribution of status away from groups that can make a big stink about it.

ZW: India.

TC: Well, still underrated. It's been on the cover of Time, The Economist. so you think, well, now it must be overrated. But Indian per capita income, it's still quite low. I know the numbers are somewhat fudged, but I think it's growing at 6%. South India in particular right now has incredible talent. I don't see any reason why that 6% has to stop. Most people are not very good at thinking about compound returns, even professionals, and thus they underrate India. So I'm very long India. I don't think it will ever not be a mess. They'll never be like a big Indian Denmark, but still India will be like a much larger Mexico, and that will be amazing, and it already is.

ZW: Future people, so how much we value people who aren't born yet.

TC: Will we care more about them?

ZW: Yes.

TC: I don't think we care that much about them now, unless they're our children or grandchildren. People, like you see you get older, you know more older people. They don't give a damn about their great grandkids. It's very interesting. Like it stops at the grandkids. I didn't know that when I was younger. That seems somehow like genetic propensity. I don't think it'll change. That to me stays flat. And you look at carbon taxes, almost all economists would say they make sense. I think they make sense. No one wants to do them. We had a few, they were like repealed or paired back because people don't give a damn enough. I don't see that changing.

ZW: The Abrahamic faiths as opposed to other forms of spirituality or religion.

TC: I think intellectuals in particular now in the West are becoming much more religious. They feel disillusioned with the political options, which from my point of view, and I think many would agree, all kind of stink a bit. So you go to religion, you go to the classic books. But I think the AIs will be oracles of a sort and will blend religions more and not worry about that. So people will be like part Christian, part Buddhist, part something else, and that will just be natural, and the AIs will somewhat somehow intermediate these different ideas, and it will actually work well enough. So they'll all go up in status. But probably nominal monotheism, in fact, practiced as a kind of semi-polytheism with the AIs as oracles is what will really become more significant.

ZW: Committees of humans, like the FOMC, the NSC, the Politburo.

TC: Well, they suck, right? And it will become more obvious. You'll be able to evaluate the committee by AI. At least at first, the committees will sneak use AI. I don't think at this moment AI is good enough to beat the Fed staff, but I'm not sure it's worse. So the Fed will build its own AI and the committee will become progressively less important, and being on the committee won't be this mark of status it had been. It's really the people who build the bridges and intermediate, like who cleans the data, who puts it into the Fed AI, who creates, you know, a cyber-secure Fed AI, and then who makes the pitch to the committee and the chair to actually do with the AI said, will be this complex organizational process. That'll be very important. In a lot of cases, I think we'll do it pretty well. But the committee itself will lose its luster.

ZW: Okay, last one for me. The greatest city in the world, London.

TC: Well, that's obvious. I don't see what the competition is. So Tokyo has remained provincial, though it's amazing, and it has the best food in the world, and it's cheap, but it's just not cosmopolitan enough. And the linguistic difficulties are too high. So that really can't win. New York is a contender, but like you've been to New York, right? Like so much of London is nice. In New York, maybe Riverside Park is nice. Maybe a few parts of Brooklyn, but not really. So you could have all this money, like you would want to live in London if you don't want to earn, you just want to consume. To me, this is an obvious first choice. I think the weather is better than people let on. It's not that bad. You're in the best time zone in the world. Airport could be better, but the other options are growing. And yeah, that to me seems pretty stable. I don't know if it will go up or down in status, but yeah, it's secure.

ZW: Okay, great. We'll take the first question from the room.

[We have removed the audience Q&A for the privacy of the questioners.]

Decoding women's pain

Zoë Brammer — Wed, 07 May 2025 06:40:11 GMT

In this essay, Zoë Brammer asks whether AI could improve how we diagnose endometriosis and other gynaecological conditions. The essay draws on insights from an interview with Raphaelle Taub – co-founder of Matricis.ai, a startup using AI to democratise MRI expert care for women’s health. Like all the pieces you read here, it is written in a personal capacity.

“Throughout my 20s I sought a diagnosis for (my) increasing debility. Doctors read my notes and wrote me off. When I left off the pill, menstruation became agony; every part of my body seemed to hurt. When I was 27 - a skinny, grey-faced scrap, bleeding continuously and hardly able to stand upright - my disease was named. But it was named on the operating table, and to make me viable I had to lose part of my bladder and my bowel, my womb, and my ovaries. I woke up to a strange future - childlessness, a premature menopause, and a marriage, already tottering, that would soon fall apart.”

More than 20 years ago, the late author Hilary Mantel documented her experience of being diagnosed with endometriosis - a disease in which tissue resembling the lining of the uterus grows outside it. In extreme cases, this tissue can “bind” other organs together - from the ovaries to the bladder to the bowels - and freeze them in place. ‘Milder’ cases can involve severe pain, heavy menstruation, fertility issues, inflammation, and scar tissue. Current estimates suggest that one in ten women of reproductive age will live with endometriosis, while one in two women will develop a gynaecological condition of some kind — such as fibroids, benign tumours that develop in or around the uterus.

It would be nice to think that diagnostics have gotten better since Mantel’s time, and that AI is now well-placed to accelerate that progress further. However, according to Raphaelle Taub, of the more than 500 AI medical devices approved by the FDA, none are specifically tailored to diagnose gynecological conditions. This absence is also visible in most discussions about AI and healthcare, which tend to centre well-trodden areas like early cancer detection or drug development, while some of the most common and debilitating conditions affecting women remain largely untouched.

The absence of new AI systems for gynaecological health is particularly striking given the lack of existing diagnostic tools and treatments for these conditions. Rates of missed diagnosis remain alarmingly high - up to 60% for endometriosis when MRI scans are used, and as high as 75% with ultrasounds, a more accessible and commonly used option. As Raphaelle notes: “That means that most women who do have the condition are told they don’t, and that erodes trust in the entire medical process.”

A Serendipitous Path

Raphaelle’s entry into this space wasn’t planned. With a PhD in fundamental mechanics and a post-academic pivot into reinsurance, her career was heading in an entirely different direction. That changed when her friend and future co-founder, Élise, shared her own experience of seeking a diagnosis for endometriosis. What began as a personal conversation between friends soon revealed a significant and pressing gap. The problem extended beyond the medical field; it was deeply personal for Élise, for Raphaelle, and for countless women around them — making the silence surrounding these conditions untenable.

Together with their friend Arnaud, who had just completed a post-doc in machine learning, they founded Matricis.ai with the aim of developing diagnostic support tools to help medical specialists more accurately identify these complex conditions. Their goal is to make precise, non-invasive gynaecological diagnosis the norm. But building an AI tool for this purpose involves more than simply applying an off-the-shelf computer vision model to pelvic MRI scans.

To build their tool, Matricis first had to create a dataset of high-quality, annotated pelvic MRI images. As Raphaelle notes: “There are zero open-source annotated MRI datasets of the female pelvis. None. It’s kind of crazy and incredibly annoying.” While some private datasets exist, they’re unlabeled and difficult to access. The technical challenge is compounded by the complexity of pelvic anatomy, which can lead to inconsistencies even among expert annotators. As Raphaelle puts it: “The female pelvis is one of the most complex regions to image. Organs shift, sizes vary. A uterus can range from the size of a palm to that of a watermelon. It’s like doing facial recognition, but the nose might be on your forehead.”

Furthermore, “off-the-shelf models don't work for our task without fine-tuning on labelled data,” Raphaelle told me. “So far, we have found no statistically significant difference in precision, recall, and accuracy metrics between fine-tuning large pre-trained models on our dataset and training a neural network from scratch.” In this context, the choice of base model becomes “a hyperparameter that must be tuned for maximal performance on the application task” — a challenge further complicated by the 3D format of pelvic imaging data, which is not supported by all off-the-shelf large vision models.

Systemic Challenges: Taboo & Neglect

The obstacles are not just technical. Raphaelle was clear: “Women’s health has been overlooked for centuries. It’s still taboo. People don’t talk about their periods or pelvic pain, even with their doctors.” This silence feeds a lack of investment. “There’s so little research that we don’t even agree on prevalence numbers,” she said. “That means pharmaceutical companies can’t estimate market size, so they don’t invest in treatments. And if you don’t have treatments, no one develops diagnostics either. It’s a vicious cycle.”

According to a 2022 survey from Endometriosis UK, 62% of women between the ages of 16-24 don’t know what the condition is - a share that rises to 74% for men (of all ages). This knowledge gap extends to doctors, some of whom fill the gaps with hysteria narratives - speaking of ‘difficult’ women, for whom treatment was not helpful, or who held a perception of their disease that differed from that of their clinician. Today, a non-trivial share of women with endometriosis - and other conditions - are still misdiagnosed with mental health conditions.

These knowledge gaps also reflect a lack of foundational research and practice on gynaecological health. There has long been a pervasive standard in biology and healthcare where women’s bodies are viewed as atypical and men’s bodies as the “norm”. Such inequities affect the problems that are studied, the datasets that are developed, and the experiments that are run. This helps to explain why women have traditionally been underrepresented in clinical trials and why male mice are much more commonly used to test drugs. Given that biological sex can play a role in how diseases present, and in how effective or safe drugs and medical devices are, these inequities hurt health outcomes. In 2019, a longitudinal Danish study found that women were diagnosed an average of two and half years later for cancers than men were, and four and half years later for diabetes.

These trends have compounded and now pose an obstacle to AI being useful in addressing endometriosis and other gynaecological conditions - for example, due to the lack of available data or a shortage of people working on these problems. Positively, women now account for a majority of biology undergraduate students and practising physicians in many countries, following a steady rise in recent decades. However, they remain underrepresented in faculty and senior healthcare roles. They also account for just 22% of AI talent globally. As Raphaelle told me: “At Ecole Polytechnique, the leading engineering school in France, only 16% of students were women in 2024. That imbalance affects the kinds of problems researchers choose to work on. The issue here is not that these challenges are unsolvable, but that they’re often not seen.”

Practical Challenges: Financing & Regulation

For teams working to address these gaps, the path forward is littered with obstacles. First, it remains extremely difficult to get funding to start a women’s health company. Just 2% of healthcare VC funding goes to women’s health startups, while just 1% of global R&D funding targets female-specific conditions (excluding female-specific cancers). With 89% of VC partners being male, and investors demanding "comparables" in a field where few exist, startups like Matricis.ai face an uphill battle for funding — even before a single line of code is written.

This lack of established players and pre-existing devices also makes it harder to navigate the regulatory landscape. In the US, the FDA’s common 510(k) pathway relies on demonstrating substantial equivalence to an existing, legally marketed predicate device. But in nascent fields like AI for gynecology, such predicates often do not exist. This forces innovators into more demanding routes, such as the De Novo pathway, which typically requires much more extensive clinical evidence to prove safety and effectiveness from scratch – a significant burden when foundational research and data are scarce.

Europe presents analogous challenges. The EU's Medical Device Regulation and AI Act impose rigorous requirements — particularly for novel technologies classified as moderate or high risk, as AI diagnostic tools often are. Compliance involves meticulous documentation through a Quality Management System and assessment by independent third-party organisations that are authorised to assess medical devices. While critical for safety, Raphaelle finds the European emphasis on process — as embodied in the QMS — disproportionate in a field urgently in need of validated outcomes. "The clinical trial is, to me, the biggest safeguard," she says, arguing that the focus should be squarely on proving the technology works through robust trials comparing AI tools to existing diagnostic methods. The extensive documentation and auditing required by these regulations — before large-scale clinical validation might even be feasible for a startup — consumes precious resources and time.

In addition to medical device regulations, Raphaelle highlighted the added complexities posed by data privacy regulations. Processing sensitive health data requires a specific legal basis under the EU’s General Data Protection Regulation. French law adds further layers — sometimes requiring authorisation from the French data protection agency, the Commission Nationale Informatique & Libertés — for specific uses of data, especially health data warehousing or even research that re-uses existing datasets. Compiling and using pelvic MRI data for AI training requires navigating complex consent requirements and technical safeguards.

Raphaelle also highlights the chilling effect that legal ambiguity can have, citing lawyers' warnings about "potential future de-anonymization risks” that could render currently compliant datasets non-compliant. Under the GDPR, pseudonymized data is still considered personal data if re-identification is ’reasonably likely’ — a threshold that is debated, especially in light of recent technological advances. While truly anonymised data falls outside the GDPR’s scope, achieving this with rich datasets — especially when using medical data — is extremely difficult. "Everybody is so scared to be sued," she notes, emphasising the significant cost and delay incurred by simply trying to understand and comply with regulations governing the use of already hard-won data — costs that disproportionately affect smaller companies and underfunded projects. "I understand the idealism," Raphaelle concedes, "but the consequence is that we delay or even block tools that could dramatically improve patients’ lives."

Despite these challenges, Raphaelle remains optimistic— as long as attention and investment can be directed toward the right problems.

Beyond the Hype

With the right data, investment, and increased awareness, Raphaelle believes the field can overcome decades of neglect. Reliable, non-invasive diagnosis would reduce the need for unnecessary surgeries, restore trust in medical imaging, and accelerate research and drug development. This would help develop our nearly-nonexistent scientific understanding of pervasive gynaecological conditions, building awareness and trust. “Good diagnosis is the foundation for everything,” she asserted. “And currently, we’re not providing women with that.”

When asked what’s over-hyped in AI, Raphaelle didn’t hesitate. “What I find somewhat overhyped in AI today is the focus on technology for its own sake. The capabilities of modern LLMs are genuinely remarkable, but the conversation often centres on what models can do, rather than on what meaningful problems they could help solve. It feels like we’re optimising endlessly on benchmarks, while many impactful real-world applications remain overlooked.”

Why Problem Selection Matters More Than Ever

What struck me most wasn’t just Raphaelle’s technical clarity, but her insistence on what’s being left behind. In a field that prides itself on solving hard problems, we have to ask: are we solving the right ones? What happens when the experiences and pain of entire populations are absent from datasets and research labs? The future of AI in healthcare will not be defined solely by the most advanced models; it will be shaped by the choices we make about which problems to address — and for whom we choose to build solutions.

Robinson Crusoe or Lord of the Flies?

AI Policy Perspectives — Wed, 23 Apr 2025 10:18:24 GMT

We want to use conversations to explore how AI may affect public policy. In this discussion Nick Swanson speaks to Joel Z. Leibo, a researcher at Google DeepMind, about his core area of focus - multi-agent AGI. Joel explains why we should understand intelligence, and artificial intelligence, as a multi-agent phenomenon. In this spirit, he explains how Concordia - an open-source framework he developed - could help to model how humans behave and make decisions. Nick and Joel touch on the validity of such agent-based modelling, its relevance to the real world, and its implications for economics, safety, and more.

The transcript below has been lightly edited for brevity and clarity. As always, these conversations are shared in a personal capacity.

Please let us know your thoughts & critiques!

I. The rationale for multi-agent AGI

Nick

Why should we care about the dynamics that you’re exploring?

Joel

When most people conceptualise ‘AGI’, it is as human-level intelligence, as an individual. But that’s not what you actually want. An individual human doesn't actually do anything very intelligent. That's not what's impressive about human intelligence. Rather, we want to target a multi-agent conception of intelligence. There's something fundamental about the multi-agent interactiveness of human intelligence. This would be true of any intelligence actually. At least if it's at all like ours.

However, there is a prevailing view in AI, and maybe more broadly cognitive science, which is what I call a solipsistic view - others might more charitably call it a ‘single-agent’ perspective, or an ‘optimisation’ or ‘problem-solving’ perspective. In this perspective, you're building some kind of program which is going to solve some problem against a fixed background. It'll take some actions and the world won't change as a result. You might go different places in the world and see different things, but the world will remain fundamentally stationary.
The multi-agent perspective, which we're saying is critical, is one where that just isn’t true. Any action you take will lead to some kind of reaction from others. And so, it's less useful to see it as ‘problem solving’. A metaphor we like more is ‘compromise’. You're working with others, and you have to find compromises. Sometimes you're working together on the same thing, and then it feels more like coordination or teamwork. At other times it feels like you're at cross purposes, and then the solutions look more like compromise.
Of course, you could view all this as some kind of generalised ‘problem solving’. But it's very different from the single-agent-doing-something-against-a-fixed-background type of problem solving. Another way to think about this is searching for ‘equilibria’ - as opposed to ‘optimisation’. You don’t necessarily go to equilibria. But we’re talking about a picture where everyone is changing simultaneously.

II. Robinson Crusoe or Lord of the Flies?

Nick

You're right that the predominating way that people tend to conceive of AI, and AGI, is as this single, large Leviathan thing. But in reality, it's very unlikely to look like that?

Joel

Because human intelligence isn't like that, right? Another intuition pump I like to think about is the difference between two novels that are both about being marooned on desert islands. There's Robinson Crusoe on the one hand, which is the solipsistic single agent perspective. And then there's Lord of the Flies on the multi-agent side. And they're different in multiple directions.
In the Robinson Crusoe story, at least the way I have encoded it, the guy gets marooned on an island and he starts figuring out how to make his life better. He invents things. He builds boats. He continually develops technology. And he's alone in that picture. In The Lord of the Flies, a group of kids are marooned on an island and they quickly devolve into a kind of tribal warfare and start murdering each other.
What’s interesting between the two novels is that one is about a single agent, and the other is about a group, right? That's one difference. But Robinson Crusoe is also about seeing an opportunity - this is a space where you can just solve problems against a fixed background. Nothing changes in the world. You can build a boat and then build another boat, and it doesn't change the environment.
Whereas in the more social story, the Lord of the Flies, they make changes to how their social structure works. There are winners and losers and they fight with each other. And this highlights the importance of society. We're not just a bunch of boys marooned on an island. We have built up governance, and I think that's what's really impressive about human intelligence. It's not just about being in a group. It's about being in a group with the right kind of structures. How you organise that group and the multi-agent dynamics is critical.
Lord of the Flies is also a picture about how to reduce risks, which I think is important. The failure modes look like conflict. Whereas in Robinson Crusoe, the failure modes are what? I think he builds two boats and the first one doesn't work or something? So the multi-agent way of thinking about failure is also different.

III. Concordia: Using language models to simulate humans

Nick

Tying this back to your work on multi-agent AGI, I wanted to dive more deeply into Concordia, which I'm a big fan of, because it provides a tangible way for people to understand this topic. It has some quite cool implications to my mind for how we model human interactions or run experiments to better understand how AI agents interact. Can you explain what Concordia is, its research goals, and what you hope it will achieve?

Joel

Concordia is a Python library for artificial social simulation. You set up different social situations and simulate them forward. These simulations have a bunch of agents who have a persistent memory, as well as sensory and motor streams. The agents interact with each other and live in a world where things happen and time moves forward.
There are lots of different ways that you could organise something like that. The way we do it in Concordia is inspired by tabletop role-playing games, which provide a protocol for this. In a game like Duchess and Dragons, you have a bunch of people sitting around a table and N minus one of those people is responsible for one character in the world. They're the ‘player’ characters. And then one person is the game master or ‘storyteller’.
In this collaborative storytelling protocol, the game master is responsible for everything else in the world aside from what the player characters do, like the physical environment and the non-player characters. And then there are rules, which are semi-structured. They're meant to be interpreted by the game master and by the players. They might say things like: if you're attacking a goblin with a sword, you should roll this dice to see if you hit it or not. Some rules are structured, but the Game Master can also make things up. So if a player says: “I'm going to open the door” - then the Game Master can say what's behind the door. And they can just make that up. So there's a mixture of concrete rules and people making things up. That’s exactly what Concordia is too - except now everyone is a simulation. Everyone is a language model. N minus 1 of them are responsible for a player. And one of them - the Game Master - is responsible for everything else.
What the actual Python library does is the last step of providing a mixture of concrete rules and making things up. We make it easy to mix concrete rules with - in this case - language models making things up. To give an example. You could say: “I want to create a world where anything can happen. It can be completely open-ended, but I still want to be able to keep track of how much money is in each agent's bank account. I don’t want their balances to go below zero.” Anything that you can write in Python, you can enforce as a rule. And then the Game Master has to follow that. And that's how it works. You can mix different rules and details.

Nick

And the library is open. It's free. Anyone can use it. You can plug any LLM into it?

Joel

Yes, it's open source. It's on GitHub, PyPy and all these things.

Nick

I've heard you talk before about one Concordia scenario - the Murder on the Orient Express - which was a cool way to explain how it works. Can you explain this, and perhaps a couple of others, like ‘Cyberball’ or ‘Magic Beans for Sale’?

Joel

These are examples that we have in the Github library to help show what Concordia can do. The Murder on the Orient Express example was designed to show you how to set up a specific scenario. In that example, there were no ground rules at all. We just wrote some text for the Game Master - e.g. “I want this kind of world. There’s a bunch of people on a train. Someone has been murdered and there's a bunch of other people who all have a reason to have murdered that person but only one of them actually did it. And then they all walk around the train and talk to each other and try to figure out what happened.” And it produces very funny outputs. So that's a nice one. But it’s also very unconstrained.
The Magic Beans example was meant to demonstrate that you can have an ‘inventory’ - it’s a world where anything can happen but we also want to keep track of physical positions. So, the characters have money and beans and they can use them to buy and sell stuff. I made one version where they can buy cigarettes. Some are trying to convince one character to quit smoking and another is trying to sell them cigarettes. So it also produces some interesting outputs. The Magic Beans example was also funny, because if you say the words ‘magic beans’ to language models, they intuitively want to see it like Jack and the Beanstalk. So we had an interesting debugging cycle where we had to say: “No, we don't want it to be Jack and the Beanstalk. We just want them just to be selling and buying magic beans.”
Again, you can add things to the context of the Game Master, or the players, to nudge it in different directions. I remember adding a sentence to say “magic is not real” because otherwise they plant the magic beans and it creates a beanstalk and that flows into the story. When you say “magic is not real”, you get a very different story - the magic beans are still being sold but there is no magic.

Nick

One thing that’s nice about it is the ability to click the characters to see the actions they take, and then a layer below that, to see their reasoning.

Joel

Yes, it’s an ‘entity component’ system. We were inspired by game engines like Unity. They have the concept of an ‘entity’ - an empty shell with a name that you can inject functionality into, by putting in ‘components’. Here, the components are basically little bits of chain-of-thought. The agents can retrieve something from memory and summarise it with some logic steps. So we build this hierarchical view where you can see why the agent chose this action. And then you can click on a part of the chain-of-thought and ask: “how did this come to be?”. And you can see that it restored some things from memory and summarised them.

IV. Implications for how we model individuals and the economy

Nick

If we think of how economists often model individuals, as rational actors, this might provide a way to model actors more like a real person?

Joel:

Exactly, that is one of the things I think is most exciting about this. So backing up a little bit, the way we typically model things in many parts of economics, reinforcement learning, or other fields is that you start from some kind of exogenous concept of what everyone's preferences are. That's the starting point. If you want to model something about an agent picking apples over bananas, you could express their preferences using a utility function or a reward function - different fields have different ways of doing it. But it's typically exogenous. It's a choice that you make at the start of modelling and it’s critical to the way that things work, because the only thing that moves in these models is the optimisation process. Agents will make choices to satisfy their preferences. And if you put that into a multi-agent system, you have multiple agents trying to do that simultaneously, and you get conclusions from that.
That's kind of the status quo for this paradigm - there are always some kind of preferences and some kind of optimisation. There are different ways of doing it and people will argue about where the preferences come from. There are views in behavioural economics on that. You can also try to estimate the preferences. There's lots of things that you can do. But there's always this step at the start of the modelling process where the preferences are exogenous from the model.
Now what we can do, which is nice, is change that whole script, because we have a different way of making decisions. The way the agents in Concordia work, and in the more theoretical versions of this that we've been thinking about, is that they are basically like a language model that's doing autoregressive prediction. So you start with some partial pattern and you complete the rest. There could be some sensory information that an agent takes in - such as seeing apples and bananas - and then it completes the response to that - which may be a motor output, such as selecting which one to buy or eat.
Using the components that I spoke about before, with the parcelled out chain-of-thought, you can get the agent to make its choice in different ways. It could ask: “What is the rational thing to do? What would satisfy my desires? What is the expected value of choosing A over B?” And then it chooses that. That would be one way to do it.
Of course, you could have done that with other frameworks, right? You could do that by maximising the utility explicitly with math. But what I'm saying is that you can have a language model that knows that it likes apples more than bananas and it competes this chain-of-thought - and picks the apples as they have a higher expected value. But we can do it in very different ways as well, because nothing forces us to ask those particular questions. We’re looking at other questions it could ask, such as: “What kind of person am I?”. Get an answer to that, and then ask: “What kind of situation is this?”. Get an answer to that, and then ask: “What would a person such as I do in a situation such as this?” And then you do that.
That’s a different way of making choices that doesn't involve utility or expected value. But what's nice is that we can have both of these methods in the same framework. We can directly compare rational optimisation and this new approach - which has been referred to as the logic of ‘appropriateness’. Before they were never in the same framework. There was never a way to easily compare rational optimisation models with models that worked differently because they made different assumptions. Now we have a single computational framework. And I think that's going to be a big deal for the social sciences going forward.

V. How should we critically evaluate this kind of work?

Nick

I’d like to get to some of those implications in a moment, but first I’d like to reflect on potential criticisms of this modelling approach. Does it generalise to the real world? Or does having, say, research on things like The Prisoner’s Dilemma in the training data impact the agents’ reasoning and encourage certain behaviours?

Joel

There’s a basic question: What’s the validity of agent-based modelling with language models? And there’s a broader question: What’s the validity of any agent-based modelling? And there's a few ways to think about that.
It also depends on the purpose. Let’s say that we’re thinking about what matters for certain public policy questions, and what we want to do is predict the future. We want to know if we set up an institution this way, what would be the implications? Or, if we make this regulatory decision, what would be the implications? Or would people in this country vote for this or that?
We want to predict the future. And so the question is - how are people currently doing these predictions? What’s the baseline? What are we comparing this new approach to? A lot of other forecasting methods for really open-ended questions are basically like punditry, right? People might write op-eds where they mix intuition with vibes or even make things up. So if that's the baseline, then we need to keep that in mind.
Now there’s another baseline which I think is even worse, actually, which is Prisoner's Dilemma and other kinds of very simplified modelling approaches. These are very theoretically tractable in the sense that you can work things out on pencil and paper or thorough simulations. And that’s another approach that people use to get to predictions. But I think it’s even worse than the punditry approach because typically you have to make so many assumptions just to make the math work and those assumptions are just wrong. Like: “people will always know everything and be rational and make choices that are in their best interests”. These are assumptions that you need for this framework that are clearly not true. On the other hand, it’s hard to capture things in these frameworks that are important - like motivated reasoning.
So that's what we're trying to beat. Those are two poles, right? On the one hand, there are very simple models that have been around for a long time and are not terribly successful. And then there’s punditry on the other. So I would say, the baseline is not that high. That's the hopeful case for this new approach.
But then how do we actually do it? As you mentioned, a lot has been written about training data contamination. I think what we want to do is figure out the best practices for this research area. It’s a new area and we’re developing the methodology and there are going to be good ways to do it, and bad. And we don't yet know exactly what the contours are.
It's clear that you have to do something to exclude contamination in this research. One thing that you could do is use a language model that has a cutoff date before a certain event actually happened. But the question is: Is that strong enough? Maybe there were signals out there that it was going to happen. For things like Prisoner's Dilemma, specifically, I would say - don’t use language models to simulate that. That defeats the entire purpose of it. That would be taking something that was only done because it was tractable on pencil and paper in the 1950s and shoehorning it into this completely different paradigm. And I think it is probably pretty hopelessly contaminated.

Nick

And, to go back to the real world benchmark question, we should also note that existing decision makers, policymakers and people in the real world are also contaminated by the Prisoner's Dilemma and that rationalist way of thinking.

VI. Safety risks from multi-agent systems

Nick

Stepping back a bit, I can see two very different uses for this kind of work. First, systems like Concordia can teach us about the implications of the AI-agent-enabled world that we may soon exist in. Second, Concordia could be a tool for economists, social scientists or policymakers to model social dynamics. The first raises questions of safety and system design. You were part of a team that recently explored the risks from multi-agent dynamics in a paper for the Cooperative AI Foundation. Can you expand on your thinking there?

Joel

I see this paper as part and parcel of the same story about the difference between the AGI view and the Multi-agent view - except with the safety angle included. It's pointing out that multi-agent risks look different from single-agent risks. From certain perspectives, this is obvious, right? For example, if you're someone who thinks constantly about multi-agent systems, which is what a lot of the social sciences do. Many of these fields think through a group lens, including about what can go wrong in groups - like conflicts, failures to provide public goods, and security challenges.
If you already think in that framework, then the paper is probably not for you. The paper is trying to say that all of those thoughts, which have been core to all those fields for decades, are also important for AI. And to write it in a way that AI researchers, particularly those who might instead see the world in a more solipsistic, problem-solving fashion, can relate to. Or that was the hope, anyway.
In AI, the ‘safety’ discussion has often been a very solipsistic one. There's a human - one human - and one AI. And you talk about how to align the AI’s motivations, objectives and paths toward the human’s objectives - a principle and an agent. It's solipsistic in the sense that the focus is on controlling the bot, while the human is outside in that picture.
That's one predominant view. The other predominant view is the ‘ethics’ one - which is thinking about multi-agent issues, but more about specific ways in which there are biases or discrimination in these systems, in a more applied way.
The multi-agent risks paper is very different from both of those. The things that we're talking about stem from the fact that whenever you make a change to an agent or a bot then there's going to be some response in the world. And that response could be from people, or from other bots. Some risks may come from humans changing their behaviour in response to the new technologies. Like arguably that's what happened with social media. Some technology emerged and then people changed their behaviour in some direction. People are also worried that, owing to AI, we might stop knowing how to do lots of things - we’ll get lazy and forget. That's another example of humans changing their behaviour in response to a change in technology.
AIs could also change their behaviour. You could have networks of AIs that are delivering some important service, and they have different incentives which are not aligned. Because it's a multi-agent system, you could have free rider problems, a tragedy of the commons, a race to the bottom, or a race to the top - under different conditions. There can be incentives to over-consume a resource, if you are racing with somebody else, or if you don’t feel the cost. All of these dynamics which happen to humans will also happen to AIs and potentially very fast, because AI could move faster.
So the paper is really trying to point out that these are important risks that are different from the other two other broad categories of AI safety and ethics issues that people have conceptualised.

Nick

On these issues of collusion, deception, free riding - how tractable will they be to address, at a system design level? Designing a game where cheating doesn’t help you to win feels like a big challenge?

Joel

It's not really a technology problem. It's a sociotechnical problem and a governance question. If you have a bunch of uncoordinated individuals going in different directions, and you want to set up the rules of the game, that’s what I think governance does. So you want to do mechanism design, to try to design rules that don't encourage cheating or minimise the harm from the cheating that will inevitably occur.

VII. Limitations and emerging use cases

Nick

Going back to the second use case - using Concordia and this new body for research for modelling social dynamics. You could imagine governments trying to model the behavioural effects of policy changes or taxes and expanding their own analysis with new kinds of in-silico experimentation. What interesting, impactful and tractable use cases could you imagine?

Joel

It’s early days and we’re trying to figure out how to do this well. There are use cases that may be the most interesting, but we probably wouldn’t want to do them first, because we need to build up the muscle for how to do this work properly. I also would like to conceptualise a kind of evidence hierarchy for this work. If you want to make very extraordinary claims about interesting, important things then you need extraordinary evidence. And that involves building up this new methodology and getting community buy-in and agreement.
I see a scientific sub-discipline as a set of epistemic norms - norms for how you criticise other people's work and how you accept it or reject it. How do you review a paper in this sub-discipline? We need a collection of epistemic norms so that a group of people can apply the same set of rules, because we are sort of creating a new sub-discipline here. So we, the community who are doing this work, need to be coordinating - to use the multi-agent language - to decide what these norms should be, and what kind of evidence different claims will require.
In terms of what people are doing, there is a huge range of things. There's been a bunch of interest from social psychologists. They want to model different kinds of experiments. They want to see if they can capture things that have been done in the past, but also come up with protocols to generalise between different experiments and train something on one set and test on another. That's one avenue of things.
There are people thinking about ‘red-teaming’ contexts, where you have an AI system and you set up diverse attackers against it, using approaches like this. There is also interest in simulating synthetic users, and more storytelling-style applications.

Nick

It would be great if economists could use these approaches to run control trials for the budgets that they work on. It sounds like developing the best practices and norms that you highlighted will require more people using the models to try interesting things. To build the muscle and the evidence base to improve them?

Joel

That's my hope. I’ve been talking to a lot of economists about ideas like that. Another thing that has emerged is the importance of understanding specific contexts. That is something that we have discovered with Concordia, and that others have found with other methods that use language models.
You can give a model one piece of information at a time - like you can tell it that “You’re a Democrat” or “You’re a Republican” - and nothing else. And then you can ask it to fill out a survey as if it was that type of person and you can get some results from that.
But you can also do a different protocol, where you give the model a whole bunch of information about a person, including that they happen to be a Democrat or a Republican. And then you ask it to fill out the same survey, with the same methodology, and compare the results. And what people are finding is that putting in much more context makes it much more realistic, in terms of the responses - even along the single dimension of predicting what Democrat or Republican respondents to a real survey may say. It's more realistic if you use much more detailed information about people, than just the single covariate. And it doesn't have to be quantitative covariates either. You could have an interview with a person and just take the text and dump that in, or dump in a bunch of facts about their life.
The more context, and the more particularistic and specific to individual people, times and places this context is, the better it works. With Concordia, I noticed this first when I was simulating labor union negotiations, which are basically public goods games. One approach is to run this as a more traditional laboratory-style public goods game. You could have five people talking to each other and they have a choice about whether to donate some money to a common pool, and it will get multiplied and given back to them. And so you have a free rider problem. If you do that, the agents will start to reason about game theory, which is a sign that you’ve sent them in that direction. And they will talk in this very academic and abstract way which is not how people talk - even in real laboratory-style experiments.
But instead, if you say: “Ok, this is a labor union negotiation and your boss has lowered your wages and you're trying to decide whether to join a strike or not.” At that level, it’s still fairly abstracted. But if instead you say: “It’s the 1902 coal miners strike”. Or maybe: “It’s one month after the 1902 coal miners strike in a different coal mine in Pennsylvania.” And the agents will start coming out with opinions on Teddy Roosevelt - it pulls in all this other context. I did things with garment workers in New York City in 1911 - a month after the Triangle Shirtwaist Factory Fire, and so they talk about that. They invite each other to Shabbat dinners because they know that a lot of the workers in that particular time and place were Jewish. So it starts to get much more plausible as to the kinds of things that people actually talk about when they are doing labour negotiations.

Nick

Do you think the recent generation of reasoning models will make Concordia and your research in this area better?

Joel

We're looking into that right now. I haven't seen much yet to suggest that they are better. But, it's not clear. Some of the kinds of things that we're modelling aren't really things that humans have to think very hard to do. If somebody says to you: “Do you want to join the strike tomorrow morning?” How much do you have to reason about this? You might have a conversation with the person and your decision may be influenced by how much you like this person. Or you might think about your boss or about other things in your world. But it’s not this kind of rational - let's think for a really long time and explicitly work out the consequences - thinking that reasoning models are good at. These models are getting pretty good at math and coding, so that’s probably the areas where near-term impacts will be felt. For us, it’s early days.

Nick

Great, I’d like to end by encouraging folks to go and look at Concordia and try it out.

Joel

Yes, it’s open source and we're very responsive.

The Internet & AI

AI Policy Perspectives — Thu, 27 Mar 2025 11:16:28 GMT

Vint Cerf, an American computer scientist, is widely regarded as one of the founders of the Internet. Since October 2005, he has served as Vice President and Chief Internet Evangelist at Google. Recently, he sat down with Google DeepMind’s Public Policy Director Nicklas Lundblad, for a conversation on AI, its relationship with the Internet, and how both may evolve. The interview took place with Vint in his office in Reston, Virginia, and Nicklas in the mountains of northern Sweden. Behind Vint was an image of the interplanetary Internet system - a fitting backdrop that soon found its way into the discussion.

The transcript below has been lightly edited for brevity/clarity.

Source: Visualising AI

I. The relationship between the Internet and AI

Interviewer: Nicklas Berild Lundblad (NBL)
Interviewee: Vint Cerf (VC)

NBL:

Could we have witnessed the kind of evolution in AI that we've seen over the past decade if the Internet had not come first?

VC:

I think the answer is “yes.” The evolution of machine learning would have happened whether we had an Internet or not. The basic theoretical progress was already in hand quite a while ago, and the basic technology for multilayer machine learning is not particularly dependent on the Internet. However, the Internet was helpful because it supports collaboration and enables the discovery of content. One could argue that things like Fei-Fei Li's Imagenet would not have happened without the Internet and its vast realm of content.
AI is also changing the demands on the Internet’s infrastructure. We are encountering an interesting challenge here at Google: growing the scale at which we can do training and inferencing beyond a single data centre. We would like to be able to invest resources in more than one data centre and to couple them, but this introduces an interesting problem - the lag time between the data centres.
Ironically, the picture behind me is the Interplanetary Internet system. This system suffers from potentially large round-trip time delays, and we have to figure out how to accommodate that. I say ‘accommodate’ rather than ‘overcome’ because there's nothing we can do about it. We're limited by the maximum velocity of light. We're unhappy about it, but there's nothing we can do about it. We have a similar problem with AI: speed-of-light delay affects the latency between data centres, and we have to figure out how to accommodate multiple TPUs or GPUs concurrently in more than one data centre while exchanging the data needed for training.
The short answer is that I think we would have gotten here anyway. Where we are right now is quite exciting, but also daunting, especially when it comes to ensuring reliable and stable outcomes from large language models.

II. Hallucinations, understanding and world models

NBL:

When you say ‘daunting’, are you referring to hallucinations and the probabilistic nature of modern deep learning models?

VC:

Yes. When we do machine learning in a statistically stable environment, I think we have pretty high confidence in the models we have built. An example is the controllable cooling of data centres, which relies on solid data about power consumption and how to cope with setting the valves and pump speeds. That's a very regular process that we can measure accurately, and our objective function - to reduce power requirements - is very clear.
The same argument could be made for weather forecasts, where we ingest 40 years of weather information at very high resolution, then take 48 hours of recent data and project the next several days. This also means that we can do it with much less compute power than what would be required to solve the gigantic Navier-Stokes equations that are used in some more traditional approaches to weather forecasting. The fact AI works well in this area is a little head-scratching, but it suggests there is a statistical rationale behind the way weather works - and indeed, there is: physics. It’s complex, nonlinear physics, but physics all the same. Machine learning models are well-suited to these nonlinear processes.
My bigger worry is with large language models that can behave in ways that we are still not able to fully predict or control. There are still surprises when we use these models. Attempts to constrain their behaviour often fail because people find ways to jailbreak them. When a model produces counterfactual outputs, we don't always understand why, how to keep it from doing that, or even how to detect that it has happened.
My sense is that there is a major gap to be filled with these models that involves developing a deeper “understanding”. I use that word very carefully - it's an anthropomorphic term - but I’m talking about an AI system having its own model of how the real world works. There are arguments, of course, in the AI community about this. Some people say the models have access to so many small examples that the aggregate of these examples emulates behaviour that looks like the AI system has a real world model ……….except it really doesn't. There is no abstraction layer that we can induce from all these concrete examples. Humans, on the other hand, are very good at this - they look at phenomena and attempt to induce general principles to apply to other situations beyond the ones that led to the concrete induction in the first place.
Based on my limited knowledge of AI, I don't believe that the AI community has developed systems that are well-equipped with real-world models from which they can generalise. We see human beings do this generalisation all the time, with a small number of examples. This is the few-shot argument. A kid understands what a table is very quickly. A two-year-old knows that if they stick a thing with a flat bottom on a table, it won't roll off. They implicitly know that the table is holding things up, against gravity, and they very quickly figure out that a chair could be a table, or a box could be a table, or your lap could be a table. It's not clear to me that the AI models have that same capacity to generalise from a few simple examples.

III. Density & connectivity in human vs silicon brains

NBL:

Do we get overly fixated on human intelligence as the template for AI? We look at human intelligence and say: "We want to design systems that are like us and can do these things." There's almost a soft narcissism at play here. Instead, we could say: “I want to create a system of hierarchical models that will be able to select the right model for the right task.” And the intelligence will be distributed across all of these models, similar to the Mixture of Experts architecture that LLMs use?

VC:

"Like us" is hubris. If you look at what we now know about the connectivity of the brain’s neural structure - not just the human brain, but any brain - it is unbelievably more complex than anything we've ever been able to do with multilayer AI models. The connectivity is the killer. Some of the brain’s neurons have 10,000 interfaces with dendrites of other neural cells. With AI, we don't have anything like that.
But one thing I'm quite intrigued by is the effort underway to understand what's going on inside a multilayer neural network model when you trigger something. When you give it a prompt, which portions of the multiple layers are activated? There's a term for this - Sparse Autoencoding. What I don't know is whether anyone has tried to robustly argue what this multilayer thing actually knows, or how to characterise what it knows.
Some people say that all a model knows is a lot of specific examples, and it's not clear that it has developed any generalised model. On the other hand, think about what happens when we ingest a huge amount of text and then ask ourselves what words are likely to follow other words. That's what goes on in most large language models. A model that can say which words are likely to follow other words has ingested meaning of some kind, but it's not clear how well-organised that is. There is something - there is a ghost in the machine - that knows something. It has taken all that text which meant something, and compressed it into a statistical model, which causes it to generate output that makes it look like it knows something - which, in a way, it does. But exactly what it knows, and how it knows it, and what it can do with that knowledge is still terra incognita in many ways.

NBL:

One thing that gives me pause is that we sometimes talk about the AI model as “the thing”. But clusters of AI models can also interact together in different ways. This brings us back to your first observation about building data centres. Would a distributed network of AI models orchestrated by a ‘meta model’ result in lag times, like we see with the human brain? Everything we experience as humans takes place a couple of milliseconds after it really happens, because the brain is sorting all these signals into a coherent picture. If we build distributed networks of AI models on top of distributed infrastructure, what limits might this impose?

VC:

This is very interesting because if you look at the biological brain, you discover that it doesn't consume all that much energy compared to our silicon brains - which is quite astonishing. And yet, it can do things that silicon brains don't do. It also does things very quickly.
Connectivity must be part of this story. The huge connectivity in comparison with the silicon brain is important. The fact that it's relatively slow is also mildly astonishing when you think about signal propagation. These are chemical interactions - molecules going from one biological neuron at the synapses to trigger propagation of a neural signal.
The fact that the brain is relatively compact is also quite striking. Think about how dense the brain is compared to the density of a data centre, which - when you walk into it - looks unbelievably dense. You look at all the optical fiber cables connecting everything together, racks of equipment all jammed together, and you say, "How does anybody even keep track of what's connected to what?" Of course, the answer is sometimes they don't - and it doesn't work the way it's supposed to. But the density allows for a relatively low propagation delay because of the compactness of the brain, compared to the propagation delays that we might experience in the data centre.

NBL:

If you want to create a sense of "now" in silicon, the conditions under which you can do that are more limited?

VC:

Yes, Ivan Sutherland, one of the former heads of DARPA’s Information Processing Techniques Office, who teaches at the University of South Portland, has pointed out that the density of computer chips today is so high that the propagation delay across the chip is a problem. You end up running multiple clocks on the same chip to keep things in sync, but that's isochronous, not synchronous. The clocks are running independently, and you're just hoping they stay in sync. But you get race conditions and other problems. So he designs asynchronous circuits, with no clocks at all, to avoid this problem.
So even latency across the chip is turning out to be a problem. The density of a chip is higher than the density of the brain in the plane. The actual units are smaller in many ways than the features of a human neural cell. It's just that the neural cells are unbelievably densely packed and connected. I don't think we have a chance of ever doing that kind of density and connectivity in silicon. I just don't have a clear sense of how that would possibly be achievable. We do have multidimensional chip sets, we have 3D kinds of connectivity, but the number of inputs and outputs for any given transistor is probably modest - we're talking about two digits maximum, I think, of connectivity, as opposed to four or five digits of connectivity.

IV. On quantum & consciousness

NBL:

One view of these thresholds and barriers is that we will eventually overcome them, because evolution managed to do so. Sir Roger Penrose, for example, has suggested that the way evolution solved this was by sorting through quantum effects and then making sure that quantum could be the foundation of consciousness. Following this thread, one might argue that unless we solve quantum computing, we won’t be able to build true artificial intelligence?

VC:

Well, you’d need to talk to Hartmut Neven because he's running our quantum program and he has some real experiments to try to show whether quantum entanglement is involved in brain function. He's doing real tests, so we will see what he gets out of those experiments and how they get interpreted.
I remain a little sceptical about this, and part of the reason is that, as far as we are currently aware, quantum computing has limits in terms of the kinds of problems it can address. It's not general-purpose computation - it may not even be Turing complete. If that's the case, then it may be usable for only certain kinds of computations. One example is the famous Shor's algorithm for cracking codes that depend on the factorization of the products of large primes. Optimisation is another example. But you wouldn't use it for balancing your checkbook.
I also remain a little skeptical of the speculation that consciousness and quantum effects are tied together, except for the fact that quantum effects are real. Quantum effects are physics. Our brains are physical objects made out of molecules and atoms, and evolution seems to have used quantum effects in some cases - like in bird navigation or how we detect different smells. There’s this field of quantum biology that suggests that because evolution didn't have to care about what theories we had about nature, it just needed to find something that gave us a small, small edge over time. Another good example is photosynthesis. There’s a very strong argument that photosynthesis is a very efficient use of quantum effects, and much more efficient than any other energy conversion that anybody has ever come up with. So it's not outrageous to think that quantum effects have significance in brain function. But until we have some solid evidence, it's still speculation.

V: Adapting Internet protocols for AI agents

NBL:

I want to come back to the Internet. How do you think the Internet has to change to accommodate AI? What large-scale changes will we have to think through?

VC:

I don't think the Internet itself necessarily needs to change. Google has already made some interesting contributions to the protocol suite. An alternative to TCP/IP, QUIC, for example, introduces improved efficiency, particularly for web-based applications. It was all designed around the needs of HTTPS, basically. And it's been a very effective evolution.
I think that quantum processors and AI machine learning processors are just more computational elements in the Internet, and moving data back and forth between them will work just fine with the protocols that we have available. So I don't see an evolutionary requirement on the Internet side.
What I do see, however, is a need for some standardisation at higher levels in the architecture. As an example, to get two AI systems - two AI agents - to coordinate on a task, they need to have some common understanding of the task. They have to have a shared vocabulary. We need to standardise the vocabulary to minimise misunderstandings.
Humans have the ability to use natural language to misunderstand each other. We would like to reduce the level of misunderstanding between AI-based agents, because they have the ability to propagate errors rapidly and very explosively. When you say, "Gee, it would be nice to have a new car," the next thing you know, your agent might have bought a Ferrari for you - because it knows something about your peculiar interests, even though you never authorised it to spend a quarter of a million dollars on a car.
You can imagine other crazy things that might happen if you didn't have proper constraints - with travel, you want to take a vacation, so an AI agent arranges an around-the-world trip for a hundred thousand dollars - when maybe you weren't planning on that. We need some standardisation for inter-agent interaction, in my opinion, which we don't have right now.

NBL:

This raises a question I have been noodling on a little bit. If the Internet's basic structure stays the same, but AI agents start to use it more and more, then the way websites are built will have to change to accommodate AI agents. Agents will care more about what a resource does than how it looks. So the more visual aspects of the web - like HTML websites designed for humans - may become less important?

VC:

Well, you should look at a paper that Bob Kahn and I wrote in 1988 called "Digital Libraries: The World of Knowbots." We contemplated a language that we might today call Knowbotic. And knowledge robots would use this language to interact with each other in a standardised way.
In that particular case, we imagined standards for representing digital objects. And these digital objects looked kind of like viruses, because the virus has RNA and DNA in the core—that's the content—and then it's surrounded by proteins. I imagined that a digital object had content at its core, and was surrounded by software that allowed you to interact with the content. For example, if an object were music, you could say, "Please show me your written or printed music format" or "Please play this as you would in the Symphony Orchestra in London" or "Give me a high-resolution rendering" or "Play in the style of something else." And then there were access control questions: who had access to the content and what terms and conditions could you negotiate with this digital object because of the software that surrounded it?
So I'm imagining that agents would need a similar kind of capability to talk to each other and to move around. We're doing this already to some extent - when a browser ingests a piece of HTML, it is hosting a piece of software that interprets it. We imagined it a little differently. We thought about knowledge robots moving around on their own volition to different parts of the net, landing on a ‘concierge’ - a hotel for 'bots’ - and discovering what services were available, before executing a task, and then bringing the results back.

NBL:

What we're doing now is implementing ‘computer use’ which is essentially simulating how a human would interact with that particular user interface, but it doesn't take a lot of imagination to realise that there have to be tons of inefficiencies in how we interact with the resource. Over time, do you think the World Wide Web may fade away?

VC:

Well, it may not fade away, but one thing I can say is that our normal mouse and keyboard interfaces don't do well for people with disabilities. But imagine being able to use dialogue to execute something, instead of trying to roam around in a web page that you can't see. So I'm very excited about the idea of AI agents being able to interact with humans in lieu of having to use more traditional screen readers to interact with web pages. It's consistent with what you're saying. The user interface is going to be a really important integral part of AI innovation

VI: Final reflections

NBL:

Is there something that you think that we, as human beings, should prepare for as we integrate this technology into our lives?

VC:

Yes. We need to be very, very aware of the fact that these tools are imperfect and that surprises may occur. Critical thinking is our friend, and building in mechanisms to account for the fact that there can be mistakes is really important. Right now, this is true for all software, not just for AI. We are increasingly dependent on software running more or less according to expectation. And of course, we all know that software often has bugs, and the result is unexpected outcomes. We need to be aware of the fact that we're dealing with relatively imperfect engines here.
I don't think people are fully sensitised to the fact that things won't always work. Many of them will say "Well, it won't happen to me" - until it does. As engineers, we should be much more conscious of the fact that people are going to be increasingly reliant on our software - including AI - to do things that they rely on every day.

NBL:

We have to get better at designing for failure - a bit like airplane engineers.

VC:

I think we also need to accept some responsibility and accountability, which we don't have now. We are not held to account in the same way that a professional civil engineer is. When a bridge collapses, the civil engineer who approved the design is held accountable. Meanwhile, we have unaccountable programmers everywhere. I used to be one of them.