AI Policy Perspectives

Science Needs AI Data Stocktakes

Conor Griffin — Thu, 30 Apr 2026 11:09:40 GMT

By Conor Griffin, Don Wallace, and Theo Brown

For 40 years, amid green pastures outside Culham, a small village in Oxfordshire, scientists and engineers toiled at the Joint European Torus. They were attempting to harness nuclear fusion, a force powerful enough to light the sun.

To create fusion, scientists and engineers must heat the nuclei of very light atoms with such intensity that they fuse, instigating a self-sustaining reaction that releases vast amounts of energy. The scale of the challenge is hard to fathom—at its most extreme, the Joint European Torus, or JET, was the hottest point in the solar system, hitting over 150 million degrees Celsius.

JET concluded in 2023, generating a record amount of energy in its final experiments. The project is now part of fusion’s history but remains pivotal to its future. The growing number of organisations developing fusion reactors are drawing on JET’s discoveries. The UK Atomic Energy Authority is advancing a national fusion facility, MAST-U, on the Culham site. This will serve as a test-bed for STEP Fusion, the UK’s project to put fusion electricity on the grid, set to begin operations in the early 2040s.

But JET didn’t just bequeath novel discoveries. It left behind massive troves of data. That raises a tantalising prospect: Could scientists use this data to train AI models that accelerate the path to fusion power?

This is possible, but challenging. Most JET data is raw and unvalidated. Many important insights are buried in scientists’ logbooks. The data that does exist is not available open source or, generally, for commercial use. Changing this may require agreement from all of JET’s original partners across Europe. One expert we interviewed called JET data a ‘stranded asset’.

Such data predicaments are not specific to JET or to fusion, but apply across all of science, even though science is precisely the domain where AI could yield its greatest benefit to society. New breakthroughs and startups are emerging quickly, from protein design to material design. Scientists are also keen users of fast-improving AI coding agents. But a lack of high-quality data will dampen progress. In most disciplines, large, high-quality datasets like the Protein Data Bank, which underpinned AlphaFold, are absent.

The scientific community needs to tackle this problem, and there are promising signs. Late last year, the UK government launched an AI for Science strategy, which includes a new collaboration with Renaissance Philanthropy to identify priority datasets. The US government’s Genesis Mission aims to train AI models and agents on federal scientific data. Google.org has a dedicated AI for Science fund, which can fund datasets and tooling.

These examples suggest that if the scientific community can identify the data that AI needs, a range of actors could help to fund and deliver it.

This demands what we’re calling AI data stocktakes. The concept is simple: interview leading experts in a given scientific field to understand the main opportunities to apply AI; the data obstacles; and the interventions that could make the biggest difference. Admittedly, some blockages, such as a paucity of engineers, are structural and will take years to fully resolve. AI data stocktakes should identify such challenges, but focus on projects that governments, companies and philanthropies could fund and implement within 1-2 years.

There are promising early efforts to map AI data gaps. But, to our knowledge, there are no concise, accessible documents that explain the AI opportunities in genomics, weather forecasting, and food security and convert them into a list of fundable data projects for policymakers and funders to pursue.1

In this essay, we offer a proof-of-concept. We interviewed 25 leading experts to create an AI data stocktake for fusion. We focus on the UK, but our analysis and recommendations could be taken up by funders anywhere in the world. Moving forward, we hope to support AI data stocktakes for other scientific disciplines and research problems.

Subscribe now

I. Why fusion? Why now?

If fusion is achieved, it would provide a safe, almost limitless source of clean energy.2 From a scientific perspective, it would yield a better understanding of the plasma that makes up more than 99% of the visible universe. From a social impact perspective, it would help address energy scarcity and unlock energy-intensive innovations, like desalination.

Despite quips about fusion being always 20 years away, 70 years of experiments actually show fairly steady progress, which has continued in recent years, from Germany to China. In most fields, such progress would have solved the problems of interest decades ago. But fusion is an extremely hard problem. And the primary product is only attainable at the end of the line.

To achieve fusion, scientists need to create and control plasma, a super-hot state of matter, in which the atoms have been stripped of their electrons, and extreme heat and pressure are used to force the remaining nuclei to collide and fuse.

Scientists are pursuing two main approaches to doing this, with very different physics, data, and AI opportunities. Magnetic confinement fusion uses massive magnets, while inertial confinement fusion uses high-energy lasers.3 We focus this data stocktake effort on magnetic confinement, as the UK’s STEP project is pursuing that approach, as is Tokamak Energy, the UK’s leading fusion power startup, and Google DeepMind’s fusion team.

The end of the line for fusion is now getting closer, for two reasons. First, the underlying technology landscape has changed. In addition to AI, the discovery of high-temperature superconducting magnets makes it easier to build smaller and potentially cheaper reactors. Second, fusion has traditionally relied on government funding. But in the past five years, a wave of private investment has arrived, with more than 30 companies now pursuing fusion power.

These shifts have injected welcome momentum into the field, but also significant hype. In response, we need a clear view on the primary bottlenecks that AI can address.

II. How to accelerate fusion with AI

To create fusion, scientists and engineers need to predict, control and understand how plasma behaves. The challenge is that plasmas are highly complex and much of their underlying physics—from fluid dynamics to electromagnetics—remains poorly understood.

To make progress, scientists run experiments that create plasmas in a reactor, and use sensors to measure their properties under different conditions. Scientists use these experiments to validate their theories, reveal unexpected phenomena, and test the hardware needed for power-plant-class devices. However, building fusion reactors is extremely expensive and so few machines exist, with most researchers running their experiments at just ~10 leading facilities worldwide. When they can get access to such a facility, scientists must decide how to design the optimal experiment, including how to toggle an array of possible parameters, from the electrical current in a reactor’s coils to the valves that control the gas levels.

Fusion scientists also run computer simulations, including to help design and interpret these costly experiments. This is also challenging, as researchers must simulate a diverse range of phenomena, at very different scales, from the tiny, lightning-fast movements of electrons to the larger, slower evolution of the entire plasma. For simulations run on massive supercomputers, this may mean weeks. For scientists without such resources, it may mean many months. As a result, scientists make trade-offs, using assumptions and approximations to run their simulations more quickly and cheaply, but also less accurately.

The challenges don’t stop there. Scientists know that their theories, simulations and experiments are imperfect. But when a gap emerges between what a simulation suggests and what an experiment reports, it is often unclear where exactly the issue falls.

AI can help in four main ways.4

1. Improve simulations

Scientists can develop “AI surrogate” models that emulate the predictions from a fusion simulation code, at a fraction of the cost and time. To do so, they run a code many times, varying the input parameters each time. They then use the resulting dataset to train an AI model to predict the outputs of interest much more quickly.

Scientists have already shown that AI surrogates can make simulations faster. Moving forward, AI surrogates could make simulations more useful. First, scientists could develop AI surrogates for more accurate, but computationally expensive simulation codes. Second, they could develop ‘integrated models’, like TORAX, to stitch together AI surrogates for different phenomena—from the ‘turbulence’ that determines how well confined a plasma is, to the ‘scrape-off’ layer that simulates the plasma hitting the reactor’s wall. Finally, scientists could move beyond producing one-off AI surrogates that result in a paper and some code, to a world where surrogates are documented, maintained and ready for use in fusion reactors.

2. Improve experiments and operate the reactor

In most fusion experiments, scientists must decide if and how to tune various parameters, while striking a balance between more proven and novel settings. To help, researchers can use AI to predict the optimal parameters for their next experiment by learning from past ones; and to predict how well their experiments will fare. More recently, scientists have also started querying LLMs to check and refine their experimental protocols.

Scientists also use AI to predict the plasma ‘disruptions’ that frequently end experiments, damage machines and are one of the biggest obstacles to a future power plant. AI models can already predict past plasma disruptions with high accuracy. But predicting future disruptions, on more powerful machines, quickly enough to stop them, is an open research challenge.

The ultimate goal is to use AI to help operate the reactor itself. Fusion reactors run on a real-time feedback loop: sensors monitor the plasma, while the actuators, such as the magnetic coils, are adjusted accordingly. The traditional control algorithms used to enable this often struggle with the chaotic, non-linear nature of millions of plasma variables interacting.

In recent years, researchers have demonstrated how reinforcement-learning agents can learn more effective control policies, including to reduce plasma disruptions. To help these RL agents generalise to novel scenarios and reactors beyond their training data, scientists are developing ‘hybrid approaches’ that integrate some knowledge of physics into the models.

3. Improve fusion data

Fusion experiments are extreme environments. The intense heat and the chaotic nature of the plasma mean that the data that sensors pick up is often noisy or low quality. Some variables cannot be directly measured, and must be inferred, introducing additional sources of error.

Scientists are training AI models to extract clean signals from this noisy data and to learn correlations that allow them to predict data for one sensor, given data for others—a capability that could be critical if sensors in a future reactor get damaged. Scientists are also using AI to train surrogate models that speed up, and better calibrate, reconstructions of the plasma, using the limited experimental data that is available.

Scientists often care less about the raw data from their experiments, and more about important events, such as when a disruption to the plasma began. Today, they often need to manually inspect graphs and plots to detect these events. AI can help to automate parts of this process and to detect events that scientists may have missed.

4. Improve the underlying technologies

Achieving fusion will require a supply chain rich in technologies that could be applied more broadly. AI could help to accelerate their development.

For example, the chamber walls in a fusion reactor will require new materials that can withstand extreme temperatures. Scientists are training AI surrogate models that speed up the simulations needed to assess a candidate material’s real-world properties, like how strong or resistant to radiation it will be over its lifetime.

A typical fusion reactor also spends much of its time out of operation, at great cost. This makes fusion a logical place to develop predictive maintenance techniques that ingest historical data from sensors and train AI models to learn the subtle signatures that indicate pending breakdowns, allowing practitioners to schedule maintenance or design more reliable systems.

Subscribe now

III. The challenges with fusion data

As they pursue these AI opportunities, scientists will need access to three main kinds of fusion data: from experiments, simulations, and sources that are not traditionally available, such as researchers’ logbooks. There are promising efforts underway on this front, but many obstacles.

1. Experimental data: Unvalidated, single-machine and hard to access

Experimental data is the ‘ground truth’ that the sensors in reactors pick up, from line graphs to videos. In magnetic confinement fusion, the challenge is not so much a lack of data, but an excess of raw data that has not gone through the processing needed to make it useful to AI. This processing ranges from addressing noise and imperfections in the underlying sensors, to detecting and annotating important events, such as plasma disruptions.

Currently, the community has to rely on the small well-validated datasets that do exist, which may be as little as a few hundred or thousand experimental ‘shots’—individual test runs of a reactor. The high cost of fusion experiments has also resulted in a natural incentive to pursue experiments that will not fail, curtailing more novel research and meaning that much of the resulting data is in a similar ‘parameter’ space and does not represent the full range of plasma dynamics that scientists want to model.

This experimental data is also not generally available open source or for commercial use. One promising initiative to change this, which several interviewees cited, is UKAEA’s project to open source data from their MAST facility.

However, to develop more general AI models, researchers want multi-machine databases that extend beyond a single facility like MAST. To that end, the IAEA is developing a federated Fusion Data Lake where different institutions would store their data locally but make it accessible via a central data catalog. One challenge with this approach is that fusion facilities have defined fusion variables and stored data in different ways. The Integrated Modelling & Analysis Suite, or IMAS, addresses this by providing a standardised ontology and set of structures for fusion data. It is nascent, but has positive momentum.

2. Simulation data: No incentives, process, or place to host it

In theory, researchers should be able to run fusion simulation codes many times and train AI surrogate models on the resulting data to reproduce the outputs at a fraction of the cost. In practice, most scientists run a simulation to answer a single, narrow, physics question. They do not run a large number of simulations to build representative datasets to train AI surrogates—a very different activity.

That activity is also a hard one. There is no standard procedure to follow to generate a dataset for training an AI surrogate model, and the codes are often finicky to use. Most simulation codes contain ‘free parameters’—knobs that scientists must decide how to best tune—a practice that can be as much an art as a science. The datasets can also be huge and there is no obvious location to store them, although some early examples exist.

3. Dark data: Nascent, IP issues, and hard to integrate into workflows

‘Dark data’ describes the contextual information that scientists generate that is not captured in structured datasets. This includes notes scribbled in experimental logbooks, where scientists describe the procedures they ran, the hardware issues they faced, and the phenomena they observed. For simulations, it includes the many nuances needed to run and interpret a code’s results successfully, and the many undocumented imperfections to be aware of.

Accessing this dark data could help ensure that AI systems do not focus on the wrong things—for example, when an anomaly in the data is caused by an equipment failure or error, rather than a meaningful phenomenon. It could also provide AI with a window into the entire research process, including its many dead-ends, rather than just the final result.

Researchers are using LLMs to try to make dark fusion data accessible, for example by enabling scientists to query experimental logs and archive documents. But much of the data is not well-annotated, there are IP issues in accessing it, and it is not yet clear how to integrate the data into practitioners’ daily workflows.

The three ‘debts’ holding fusion data back

Many of these challenges with fusion data result from three underlying issues, which have compounded over time into systemic debts that inhibit the use of AI today.

1. Technical debt

The fusion community has traditionally had to prioritise getting large, complex machines to work, rather than building infrastructure to collect, curate, and share data. As a result, activities like data annotation and writing high-quality code are underfunded. Many leading fusion codes were created decades ago and have evolved slowly, while the quality of experimental data is limited by the capabilities of the sensors available.

2. Bureaucratic debt

The large costs of fusion experiments and the traditional reliance on government funding mean that many fusion projects have a complex web of owners and collaborators, which can make agreeing on new data initiatives difficult. For example, JET was sponsored and funded by Euratom, the EU’s nuclear research community. Its scientific exploitation was managed by EUROfusion, a pan-European network of fusion research labs. UKAEA managed engineering and operations. Releasing its data may require agreement from all of these actors.

There are other bureaucratic hurdles too. Scientists who run fusion experiments often want an embargo period on the resulting data so that they can prepare a publication. Such embargoes are rational, common in science, and largely supported, but many interviewees felt that they had become too long. Fusion data is also subject to diverging open-source policies. For example, the MAST experiment was funded by UK Research and Innovation, which has strong open data requirements. The follow-up MAST-U experiment is funded by the UK Department for Energy Security and Net Zero, which does not have the same policies. Many fusion companies also do not open source their data.

3. Human and cultural debt

The fusion community does not have enough software engineers and experts who are able to clean data, attach confidence levels, and curate it for AI use. As a result, physicists must take on many tasks that are outside their core areas of expertise, including writing high-quality code.

This issue is compounded by a research culture that inhibits data sharing. Scientists are constantly pushed to move on to the next experimental campaign, rather than to validate older data. This stops some scientists from sharing their data, because they fear that end users will not appreciate the resulting gaps and do bad science with it. Or they fear that they themselves will be criticised for releasing ‘unscientific’ data.

Subscribe now

IV. Recommendations

Below we provide eight recommendations to address these data limitations and accelerate fusion with AI. Each project could be led by a mix of government bodies and funders, like the Department for Science, Innovation and Technology and UK Research and Innovation; public research organisations like the UK Atomic Energy Authority, companies; universities; and philanthropies. Where possible, the UK should look to collaborate internationally—for example, with the US Genesis Mission and the International Atomic Energy Agency.

1. Strengthen the UK’s lead in open fusion data

Expand FAIR MAST, the UK’s pioneering open sourcing of experimental data from its MAST facility, by adding data from the follow-up MAST-U facility and making the user interface more accessible. This will require the UK Department for Energy Security and Net Zero clarifying that open data policies apply to MAST-U, funding at least five data engineers over a two-year time period, and ensuring that the project has sustainable compute and data storage.

2. Liberate 40 years of data from the Joint European Torus

Launch a project to open source at least 30% of JET experimental data by 2028. This will require agreement on what data to release. For example, should the project only release validated, curated data relating to notable discoveries? Or should it also release data that is raw, validated only in part, or which relates to ‘normal’ machine behaviour? Second, and much harder, will be securing agreement from all relevant institutions to release the data.

3. Launch a competition to predict plasma disruptions

Fund a competition to see which AI model can best predict future plasma disruptions in new experimental campaigns, building on early examples and work in this space. This could include funding dedicated experimental shots on machines such as MAST-U, to evaluate models on challenging edge cases. Beyond accuracy, sub-competitions could evaluate models on important variables, such as: Can the model make predictions with little data, such as when sensors become damaged?; Can the model predict disruptions across different reactors?; Can the model predict disruptions with sufficient lead time to prevent them?; and Can the model shed new light on why disruptions are occurring?

4. Prototype the future of AI-enabled scientific data curation

Expand the platform that UKAEA recently developed to enable human experts to use AI to annotate experimental data, by adding data from other fusion facilities; increasing the complexity and variety of the metadata that is captured; and training AI models to directly annotate an increasing share of this data.

5. Make leading simulation codes AI-ready

Launch an effort to modernise priority fusion simulation codes, including to make it easier to train AI surrogate models based on them. This could build on early efforts in this space and target codes, such as JINTRAC, which are important to the UK’s proposed STEP Fusion power plant and the international ITER effort. The project could start by modernising the codes’ documentation and ‘refactoring’ them so that they are compatible with modern chips, like GPUs and TPUs, and allow for parallel data generation. It could then open source the codes, with a plan for how to maintain them. Throughout the modernisation process, it could test the usefulness of AI coding tools to the tasks at hand.

6. Demonstrate a new state-of-the-art for AI surrogate models

Fund small teams of software engineers and experts to develop AI surrogate models of important, computationally expensive phenomena in fusion simulations. The project should ensure that all newly created surrogates have state-of-the-art documentation, data provenance and version control. It should release the data used to train and validate the surrogates and develop software pipelines to automate time-intensive aspects, such as organising the data.

7. Use AI agents to preserve expert fusion knowledge for the future

Gather a group of leading experts on a priority fusion simulation code, and equip them to use AI agents to make the tacit knowledge involved in running that code available to the wider research community. To do so, the experts could task the agent with running the code. As it seeks to execute, the agent would have an ‘internal monologue’ that the experts could trace, steer and intervene on. The end result would be a series of documents, such as markdown files, that capture the important dark data needed to run the code well.

8. Create Fusion-Bench to measure and drive LLM performance

Assign leading fusion experts to create an evaluation metric to quantify how well leading large language models understand core fusion concepts. This would make it easier to improve the usefulness of LLMs for downstream tasks in fusion. This evaluation will be more difficult to create than in disciplines like maths or computer science, where it is easier to automatically verify a model’s performance. But the experts could determine the most useful approach, which will likely involve a combination of question-answering and task performance.

V. Six open debates

The experts we interviewed disagreed on some points. Despite the framing below, few are either/or debates. Rather, most are about relative degrees of emphasis.

Incrementalism vs novelty: Should we build on the early AI opportunities that fusion practitioners have already showcased? Or pursue more novel, uncertain AI ideas, such as training general-purpose ‘fusion foundation models’ or using AI ‘world models’ to pursue new kinds of fusion simulations?
The past vs the future: Should we strive to get as much value as possible out of older fusion data, like JET? Or, do the costs mean that we should accept our losses, and focus on making future fusion experiments AI-ready?
Science vs engineering: Are efforts to validate, annotate and standardise data part of an ultimately doomed quest for perfect scientific understanding in fusion? Should we instead use AI to embrace a more engineering-led approach that can get the machines to work with noisy, imperfect, data?
Domestic vs international: Should the UK rejoin ITER, the world’s flagship international fusion collaboration, which it left following Brexit? Or should the UK focus on domestic efforts, perhaps in collaboration with priority partners, like the US and IAEA?
Magnetic vs Alternatives: Should the UK continue to focus on magnetic confinement fusion as the most realistic pathway to a future power plant? Is magnetic also a better bet for AI because it produces much more data and doesn’t have the same associations with the security establishment, which makes data access easier? Or should the UK invest more in inertial confinement and alternative fusion efforts, given the country’s diverse academic expertise, its historically strong relationship with the US National Ignition Facility, and notable assets, such as a world-leading laser?
Public vs Private: Should the UK government try to derive more immediate value from its fusion data? For example, should the UK license some data to companies, to cover the costs of data processing and annotation? If so, should local startups pay less? Or would such efforts hurt the UK’s goal of developing a world-leading fusion sector?

_________________

This essay was originally posted on the Google DeepMind website and is a summary of a 20-page report that contains more details and examples.

Thank you to the following experts who let us interview them, reviewed the draft, and/or provided other support, as well as those who prefer to remain anonymous. All mistakes belong to the authors and no expert spoke to us on behalf of their organisation.

Jonathan Citrin, Brendan Tracey, Cristina Rea, Nathan Cummings, Andrea Murari, Jess Montgomery, George Holt, Alain Becoulet, Matteo Barbarino, Arthur Turrell, Adriano Agnello, David Dickinson, Steven Rose, Alessandro Pau, Kristina Fort, Charles Yang, Federico Felici, Tim Dodwell, Sam Vinko, Aidan Crilly, Lee Margetts, Tom Westgarth, Lorenzo Zanisi, Chris Packard, Justin Wark and Stanislas Pamela.

Fusion has several characteristics that make an AI data stocktake exercise tractable, including a relatively small and centralised research community and early efforts to build on, like the open-source FAIR MAST initiative and the IMAS data standardisation effort. Fields like genomics, weather forecasting, and food security look quite different, and so careful thought is needed on how to best scope AI data stocktakes in these fields. Nevertheless, we think they would be useful.

There are caveats to the claim that fusion power would be essentially limitless, emission-free, and perfectly safe. One of the input fuels, tritium, is not widely available and scientists will need to use nascent ‘blankets’ to breed it from lithium. Certain parts of fusion reactors will become radioactive over time, although they can likely be recycled after ~50 years. Thermonuclear weapons use fusion reactions. However, the weapons first require fission reactions and fissile materials like enriched uranium and plutonium.

Note: There are other approaches to inertial confinement fusion that do not use lasers.

For more in-depth reviews of AI for fusion opportunities, see publications from MIT, the Clean Air Task Force, IAEA, FusionFest, and the US Department of Energy.

Q&A with Ethan Mollick

Tom Rachman — Wed, 22 Apr 2026 09:48:49 GMT

(Credit: Jennifer Buhl)

How can companies get their employees to use artificial intelligence when human intelligence remains sharp enough to know that this risks replacing jobs? How should education revise itself for the ever-revising technological world that students emerge into? And how to understand the love/hate relationship so many people have with AI?

Ethan Mollick—professor of management at the Wharton School of the University of Pennsylvania and bestselling author of Co-Intelligence: Living and Working with AI—is among the leading public intellectuals commenting on AI adoption, connecting the latest scholarship to real-world usage, including his own tinkering with each new model.

AI Policy Perspectives caught up with Ethan to hear his latest thinking on everything from agentic systems, to why scientific publication is broken, to how workers emotionally relate to AI colleagues. Too much chatter, he argues, considers this transformation at the broadest level. Too little digs into the practicalities of getting it right.

—Tom Rachman, AI Policy Perspectives

[Interview edited and condensed for clarity]

Tom: In your 2024 book Co-Intelligence, you proposed four rules for human and AI collaborations, including that people should oversee and verify AI outputs. But doesn’t the value of AI agents come from people not overseeing and verifying everything?

Ethan: This is where policy matters a lot because these are choices now. In the “co-intelligence era,” you’d prompt the AI to do something in a chatbot, and it would give you an answer. You prompted again, and it’d give you another response. The human was in the loop. And not being in the loop was really dumb because it meant that you were just pasting in the AI’s answer, and then you’d get in trouble, as a lawyer with the judge, or whatever it was. Capabilities were weak, so human-in-the-loop mattered a lot.

But with agentic systems that could do hours of work on their own, now it’s a design choice. When do we want humans-in-the-loop? When is human verification valuable? When is human verification morally required? When is it legally required? What kind of interventions move the system forward? I feel there has been a complete lack of deep understanding about these topics.

Tom: You’ve said that, with agentic systems, management becomes a superpower. Can you explain this?

Ethan: Increasingly, systems look like mini-organizations as they get subagents they can delegate to. So the best way to organize is to give the AI a clear direction of where you want to go. And it turns out that this looks a lot like management. When do you want the AI to check in with you? How do you write a really clear brief? What checks are important? What tests do you want to run? What’s acceptable? What’s not acceptable? Those are management questions.

THE WORKPLACE

Tom: You co-wrote a study last year involving a field experiment at Procter & Gamble that showed AI usage enhanced employee performance. But there were other interesting findings besides that.

Ethan: The most interesting piece about it was that people liked working with the AI, and that it substituted for people emotionally. The second interesting piece was the “smoothing” of capabilities—so, technical people previously had technical ideas while business people had business ideas. But AI smooths out both. If technical people can do business work and business people do technical work, what that tells you is we have to redesign organizations.

Tom: The emotional side—that using AI improved people’s feelings about the work—was surprising to me; I wasn’t sure what to make of it.

Ethan: What to make of it? That views of AI are complicated. If people keep saying, “Yeah, AI is going to destroy all jobs, and may kill everyone on Earth…but might not”—and then, “Why is AI unpopular?!” Feels like not a hard question. People like AI when they use it themselves; they don’t like AI writ large. It’s not surprising to me that AI makes your job better because a lot of jobs suck! And if we do good design work with AI, it makes people’s lives better. If we just let it loose on the world, and tell management that the only option they have is automation, then we’re in big trouble.

Tom: Many knowledge workers seem to be using AI in secret right now, perhaps from fear of being exposed as less valuable.

Ethan: This is a leadership problem. The incentives have to be aligned properly. Currently, it’s, “I’m going to automate your jobs away” or “I’m not going to share with you any of the gains the company gets.” People are exquisitely tuned to rewards. So it’s about leaders articulating a vision of what the world looks like with AI for employees. “What should I expect to do? How are people rewarded for doing the right thing? If they automate 90 percent of my job, what happens to me?” Without those answers, everything else is secondary.

CHANGING ORGANIZATIONS & EDUCATION

Tom: You have a concept of “leadership, lab, and crowd.” Could you explain?

Ethan: There was a huge amount of R&D in the 1900s about how you organize work, and 40 percent of the American advantage in business came from management. In the last 30 years, a lot of that muscle has died. But experimentation is important, and leaders need to guide that. So, there are three things that organizations need to be successful with AI. First is “leadership”: a team that articulates a clear vision of the future, and is willing to experiment. Then there’s “the crowd,” the employees who might actually use AI. They need access to a frontier model, they need clear rules, they need reward systems. Then there is “the lab,” and this is the piece a lot of companies are missing. You need a dedicated team working on AI innovation. They can’t be just a technical team; this is not an IT department problem. If you don’t have that piece, you’re not building things for the future. And where does the crowd go when they have a good idea? “I came with a breakthrough idea that saves 90 percent of effort!” How does that diffuse in the organization? That’s where you need the lab.

Subscribe now

Tom: If AI transforms the workplace, that should change how we educate the next generation, right?

Ethan: The early workplace is under a lot of threat because the old apprenticeship model just broke. The idea was that there were tasks—especially in white-collar work—that were tedious and annoying for managers to do. But you could pay a relatively cheap person to do them, and that person would learn as a result of this, and receive mentorship. So we had this amazing machine for talent: we taught you, we evaluated you, and you got paid, and you were doing work we needed. A junior person’s goal was to produce good work that made managers happy, so that they got promoted. But now the junior person is worse than AI, so they’ll use AI to do their work. And the middle manager’s goal was to give work to a junior person who’s not great, and give them feedback so they get better, so that the middle manager has to do less work. And that broke because the middle manager would rather assign work to the AI.

Tom: But in terms of the educational system, what should change if workplaces no longer offer that apprenticeship role?

Ethan: Education is really screwed up right now, but it was screwed up for lots of reasons. It’ll be fine; we’ll figure this out. But it’s gonna take a bunch of years. It’s clear from early evidence that AI will be a tutor outside of class and inside class. It’ll do activities and give guidance. But schools are places where we can compel students to not use AI, and have them in a room, and evaluate them, and teach them the things that we want them to learn. As long as we think people need to be educated, this is the best space to do it in. So students are cheating in the meantime? They were cheating before! We can give them different tests; we could do in-class writing assignments. There can be a weird, backward-looking “Education won’t adjust!” view. How many death spirals does higher education need to be in per moment? There are the pieces to reconstruct a better form of education. It’s just a massive changeover.

BETTER SCIENCE & BETTER THINKING

Tom: What about academia? There’s been much talk about AI-written papers, and how they could overwhelm academic publishing. But could AI benefit the peer-review process, and help with the dissemination of academic findings?

Ethan: This is another area where more lifting is needed. It’s a shame that we are building AI co-scientists, but not thinking about the rest of the process that’s needed to actually make science happen. It’s one thing to have science produce more papers. We have no ability to absorb more papers. Every publication is overwhelmed. Our dissemination techniques were already bad, but now they’re really broken.

Tom: As a case in point, you submitted a paper around 2023, and wrote publicly about it then, making your term “the jagged frontier”—that AI capabilities advance in some areas but remain behind in others—highly influential. Yet the academic paper itself only just came out, three years later!

Ethan: One of the rejections we got early on was reviewers saying that they knew this already, and they cited a bunch of working papers—that cited the working paper we had submitted! This is not a unique story. Opening one part of the bottleneck without opening the others becomes a problem. But it takes longer to solve systemic problems of how science operates than to solve the problem of producing more papers.

Tom: Another concern in education and science is cognitive offloading, that people may surrender thinking to machines, and lose those skills. On the other hand, AI’s value comes from machines thinking for us. What are examples of bad offloading and good offloading?

Ethan: We offload all the time, right? But we also force people not to offload. You could offload all your mental math to calculators, but we force students to do some math by hand in an attempt to get them to learn stuff. And we can enforce those rules in school. In the world of work, we are not used to thinking about training, about what should be offloaded, and what shouldn’t be. We need to make decisions about this. So, Rolls-Royce still employs someone to paint stripes on a car by hand, and that’s an obvious pushback against deskilling in one area. But Ford doesn’t do the same thing. These are choices we get to make at an organizational level, depending on what we think is valuable.

ADAPTING TO CONSTANT CHANGE

Tom: A point you’ve made to young people about the AI future is that they’ll need to be adaptable. When educators talk about teaching adaptability, it sometimes boils down to encouraging “creativity” and “critical thinking.” Another view is that you’re more likely to be adaptable by developing deep domain knowledge. For you, what does learning adaptability mean?

Ethan: Adaptability requires both deep domain knowledge and wide knowledge: T-shaped behaviour is probably the way to go. I feel like it’s a throwaway line: “Well, we’ll all be adaptable!” If we could teach that, that’d be amazing. People are more adaptable than we think, so part of this is that people will figure stuff out. But we can’t just throw up our hands, and say, “Be adaptable!” You need to have deep enough knowledge to go into a field. You need to have broad enough knowledge so that, as one piece of knowledge becomes less useful, you’re moving to the next one. And we need to help people be adaptable by building systems that get them in place inside an organization and able to shift roles. I sometimes worry that adaptability is a catch-all for “Don’t worry! It’ll be fine!”

Tom: Another side is that not everybody will be equally adaptable. Could it be that the AI future favours certain circumstances and characteristics?

Ethan: A lot of these characteristics were already good characteristics to have. Does AI act as a multiplier of them? Does it disincentivize some people? We’re now past the edge of what we know. Ultimately, all of these questions come down to the same exact question, which is: How good does AI get, how fast? We need to articulate more clearly what we think that future looks like. Because you can’t say, “We’re going to build a superintelligent machine that’s better than all humans at every intellectual task—but let’s start thinking about adaptability!” Unless you mean, “Let’s adapt to UBI” [where everyone gets Universal Basic Income cash payments from the government]. And then, we should be spending a lot more time thinking about those issues. Not everyone in the labs believes this, and I find that the econ people believe it less. But you can’t have this message of, like, “All work will be obsolete!” and then have detailed, ticky-tacky conversations about what you should do in eighth grade. Because, by the time you enter the job market, there’s no jobs. So give me the pathway that you think is there, and that becomes the most important question to ask.

Tom: Are there other important questions I didn’t ask?

Ethan: We need to start thinking about getting into fields, and understanding what the changes are—we need to get detailed. That is where the research is missing. Another large-scale econ picture about AGI isn’t as useful. General-purpose technology affects everything, so we need policymaking for everything, from power generation to accountants, and when does the government say it’s okay to do this. There’s just this assumption that if we do the macro stuff, everything will work out. I’d rather see a lot more micro stuff: a thousand flowers everywhere, trying to come up with different approaches.

AI Agents Running the State

AI Policy Perspectives — Wed, 15 Apr 2026 09:50:09 GMT

Waiting for an AI helper. (Credit: Gemini)

“Public services” include everything from teachers to the trash, from roadwork to permission for a tree house. Much seems routine, but plenty is at stake. This makes politicians hesitant to risk an overhaul, leaving the system creaking and the paperwork mounting.

Last October, a provocative proposal emerged. The Agentic State conjured a vision of officialdom transformed, converting outdated procedures with a new system of AI helpers. This fledgling project offers both a blueprint and a promise of assistance to governments around the world.

But what if the vision were blind to how this could go awry? Simone Maria Parazzoli, a co-author of the paper, and Omer Bilgin of deliberAIde decided to critique their own ideas, seeking pitfalls in hopes of averting them.

—Tom Rachman, AI Policy Perspectives

By Simone Maria Parazzoli & Omer Bilgin

Amid the exhaustion of caring for a baby, new parents must deal with everything from bewildering sobs, to erratic feeding times, to the joys of changing a soiled newborn at 3 a.m. The last thing they need is paperwork.

But what if, when coming home from the maternity ward that first day, they could awaken a government AI voice assistant, tell it the happy news, and hear the following response? “Congratulations! What’s the baby called?” The app would then take care of all the dreary admin, coordinating across agencies, registering the child, and setting in motion the services that this tiny new citizen should enjoy.

That is one example of how a future “agentic state” could simplify, speed up, and improve citizens’ interactions with public services. To be clear, this does not yet exist. But projects like this one, envisioned by Ukrainian officials, are more than fantasy, with several countries avidly testing early versions of agentic AI systems.

While Ukraine works toward the baby example, Britain is piloting agent-based support to provide citizens more tailored help. Meanwhile, Singapore is developing governance frameworks for agentic AI, and governments from France to the United States are ensuring that their public data can be accessed by agents.

Agentic AI systems—capable of perceiving, reasoning, and acting with minimal human supervision—will transform what organizations can achieve. By combining the reasoning of large language models with retrieval, memory, and tool use, agentic AI can automate complex tasks. For governments, whose core work is high-volume, structured administrative processes, this could make services more efficient, timely, consistent, and fair, while lowering costs.

Consider a citizen looking to start a small business. An agentic system—instead of requiring the entrepreneur to individually navigate zoning boards, tax authorities, and regulations—could autonomously reconcile these requirements. The larger promise is a shift from just doing things right (optimizing for procedure-following) to doing the right things (pursuing outcomes that citizens truly want).

The Agentic State vision paper—supported by The World Bank and the Global Government Technology Centre Berlin—was the first effort to systematically map the opportunities of agentic AI adoption for governments. This was not an academic exercise: 21 leaders across 15 countries contributed, including ministers and chief technology officers preparing to lead this transition.

In this vision, AI agents are a means to manage complexity and scale, while humans develop strategy, exercise judgment, and hold accountability.

Several governments have integrated official chatbots into their government services, but most of these merely provide conversational guides to administrative procedures. A few pioneering countries are starting to move beyond that. Ukraine, for instance, is turning chatbots into agentic assistants. Specifically, its Diia.AI assistant can retrieve users’ data from connected registries, and generate official documents such as income certificates, while also providing certified information based on records such as taxation, land registries, and pensions.

The United Kingdom is also exploring agentic interactions via GOV.UK Chat (inspired by Diia.AI), including a pilot program to support job seekers that transforms a static digital portal into an active assistant, matching users’ skills with available opportunities.

Yet trends and optimism are not enough for success. The agentic state vision rests on key assumptions. What if they’re wrong?

This article presents a “red-teaming” exercise—a stress test of this vision—that identifies six core assumptions, along with scenarios that could emerge if they don’t hold true, and guardrails to avert such failures.

Assumption 1: AI Agents Become More Capable and Reliable

Agents can already perform rudimentary planning, tool use (e.g., searching the internet, using calculators, sending emails), and multistep task execution. Frontier labs are betting heavily on agents, making it plausible that systems capable of managing complex and large-scale administrative tasks will emerge soon.

Failure Scenario: The Technology Falters

Governments reorganize around agentic execution, but systems never become reliable enough for public administration. The demos look strong, but real cases fail on edge conditions, and require constant human correction. The agentic layer becomes only superficially competent with layers of human intervention underneath.

Guardrail: Start Cautiously

Governments should start with minimal deployments, and tightly scoped use cases to validate reliability, develop procedural rigor and organizational competence, and account for technological evolution rather than committing prematurely to large-scale redesigns.

Assumption 2: Agents Can Work Together

The success of agentic systems demands that they’re able to interact seamlessly, conveying intent, carrying out tasks, and sharing data in an interoperable way. MCP (model context protocol) is emerging as the technological standard for connecting AI applications with external systems.

Failure Scenario: Standards Fail to Converge

Commercial interests diverge, establishing competing protocols, while government departments end up using AI systems that cannot communicate with one another. When a citizen’s request requires action from multiple agencies, the process breaks down.

Guardrail: Officials Insist on Shared Protocols

Governments should make interoperability a condition of adoption, participating in the cross-sectoral bodies and forums where these standards are being shaped, funding the development of shared agentic interfaces and other agent-specific standards, and mandating non-proprietary protocols in procurement. Standards rarely emerge by accident, but they may emerge when powerful governments treat them as a priority.

Assumption 3: Organizations Will Adapt

To adopt and employ agents effectively, organizations must rethink their processes, roles, and incentives. They need to flexibly change and dynamically adapt practices to keep pace with the changing technological landscape.

Failure Scenario: The Status Quo Prevents Change

Agentic AI adoption outpaces organizational change, with citizens and civil servants using agents in an uncoordinated manner long before official programs catch up. Local practices harden into path dependence before common standards emerge. The state becomes more productive at producing bureaucracy, not societally beneficial outcomes.

Guardrail: Redesign Processes Before Automating Them

Agents should only enter workflows that have been simplified, decomposed, and restructured to minimize approval layers and handovers. Governments must treat adoption as a continuous discovery process. They should invest in common evaluation templates, reusable components, and a cross-agency repository of lessons, so that what works in one place can travel before what does not work becomes entrenched.

Assumption 4: Private Adoption of Agentic AI Will Be Rapid

Many companies are betting on an agentic future. Firms are experimenting with internal copilots and autonomous customer flows, while frontier AI companies advance core models, architectures, and capabilities, and cloud providers offer the compute needed to deploy agents at scale. This suggests that agents will become commonplace across business, consumer, and enterprise environments, allowing governments to build on tools, infrastructure, and behaviors already spreading across the economy. This assumption rests on projections, though evidence remains ambiguous.

Failure Scenario: Diffusion Is Slower Than Forecast

Governments invest as if an agent-saturated economy is imminent, but industry adoption remains narrow, experimental, or ends up costing more than it saves. Public investments don’t plug into widely used tools and practices, meaning that citizens find agentic interfaces in government before they’re normal elsewhere. The state ends up bearing political and institutional costs without the stabilizing effects of private-sector diffusion.

Guardrail: Lower Barriers to Private-Sector Agentic Usage

Governments can accelerate the development of an agentic AI ecosystem by investing in shared agentic infrastructure—such as standard ways to access public data, communicate across systems, and carry out authorized tasks and payments—that lower integration costs for firms, and reduce the risk of differing technological maturity across sectors.

Assumption 5: Citizens Will Prefer Agentic Services

Increasingly, citizens are interacting with and relying on AI tools, but many do not trust them. For governments to integrate AI agents into workflows and services, citizens must accept and support the roles that agentic systems can play, finding them sufficiently trustworthy, reliable, fair, convenient, and accountable.

Failure Scenario: The Public Rejects Automation

A single notable failure, or an accumulation of failures, turn the public against agentic systems, and convince many to opt-out. They judge automated decisions as opaque, illegitimate and untrustworthy, and suspect it worsens inequality, with privileged citizens able to employ highly capable personal agents to navigate bureaucracy better than those relying on basic tools. The government is forced to run two systems—agentic and human—and neither meets expectations.

Guardrail: Mandate Transparency

Governments must make agent integrations into government processes as legible as possible, furnishing explanations of decisions and publishing evaluation results for agentic fairness and performance, while detecting patterns of systemic bias or unequal benefit distribution based on citizens’ technological access.

Assumption 6: Human Oversight Will Evolve

For AI agents to act with functional autonomy within government processes, oversight frameworks must adapt, moving away from mandatory human reviews and approvals for everything (human-in-the-loop) to intermittent oversight (human-on-the-loop). This evolution increases speed and efficiency while reducing bottlenecks, with humans intervening only on edge cases. There is precedent for such adaptation: governments adapted regulation to cloud computing, e-identities, and AI-driven decision support systems.

Failure Scenario: Regulation Never Updates

Every agentic action requires human verification; every decision must be justified through mechanisms designed for old chains of accountability. Agents can draft, but cannot act. Compliance and procedural costs rise as institutions retrofit old controls onto new AI processes. The result is high bureaucracy and low autonomy: an agentic state in theory, a copilot state in practice.

Guardrail: Sandboxes to Test Oversight

Governments should establish controlled environments that allow policymakers, developers, and civil society to collaborate and gather empirical evidence on what forms of oversight are adequate and best fit different kinds of agentic deployments, reducing uncertainty before codifying rules at scale. They should explore this early, much as Singapore has done through its Model AI Governance Framework for Agentic AI.

Subscribe now

Soon, agentic government will be more than optimism and testing. A vanguard of countries will implement these tools. If those cases produce the kinds of benefits imagined, other countries will flock to join them.

But momentum is not inevitability. This project depends on assumptions—about progress, coordination, institutions, norms, and law—that demand scrutiny before governments rebuild themselves around these new technologies.

This red-teaming exercise of the agentic state concept is not to argue against the vision, but to make it more robust and resilient. The six possible failure scenarios are not mutually exclusive. Several could compound, and some may already be taking shape. For instance, reliability has been improving much more slowly than accuracy, providing ground for technology to falter (Scenario 1), and there are signals that the public might reject automation if economic gains and innovation speed are prioritized over fairness (Scenario 5).

Governments that are serious about improving the state with AI must attend to these risks in earnest now, while the architecture is still being laid. The opportunity is too precious to spurn.

Agentic AI could make public services considerably faster, fairer, and more responsive—more so than anything the traditional bureaucratic model has yet delivered. That prize is worth the discipline of preparing for what could go wrong.

For further details on “The Agentic State,” check out the original vision paper

AI Policy Primer (#24)

Conor Griffin — Thu, 09 Apr 2026 14:50:28 GMT

Every six weeks, we round up three papers that we think AI policy folks should be reading. In this edition, we look at a proposal for how to identify the agents that will soon fill the economy; research on the prospect of self-improving AI; and new insights about how to use AI to prevent contrails, or artificial clouds, from warming the planet.

1. Identifying (and incentivising) AI agents

What happened: A trio of law and philosophy professors considered how to identify who (or what) is responsible for AI agents’ actions in the world, and came up with a two-part proposal: that the disparate and evolving agents within a system should exist legally as a new form of corporation; and that each corporation should link to accountable humans.
What’s interesting: The paper by Yonathan Arbel, Simon Goldstein, and Peter N. Salib starts with a thought experiment. It’s 2030, and your AI assistant suggests that it optimizes your slow WiFi connection. After you agree, it spawns a swarm of agents. Some are copies, while others are cheaper agents running on open-source models. Some start to interface with AI agents from other companies. Three months later, two FBI agents knock on your door and explain that your network has been piggybacking on a local defense contractor’s WiFi network.
Before determining who is responsible and what the repercussions should be, there are more basic questions: Who are the AI actors in this story? How many are there?
The economy will soon be filled with capable AI agents. To deter and respond to such harms, the authors argue that we need to be able to identify these agents, at two levels.
- To prevent human misuse or negligence, we need ‘thin identity’. This would connect AI agents to the humans most able to control them, similar to how ‘know-your-customer’ rules tie banking transactions to humans.
- Humans will be unable to monitor and control every AI decision, so we also need to be able to identify agents themselves, hold them accountable and incentivize them to behave well. To do so, we need ‘thick identity’ that can distinguish AI agents as stable, coherent entities, with persistent goals. This goal is pragmatic and does not require viewing AIs as conscious in any sense.
Thickly identifying agents is harder and more novel, as AI agents need not be attached to a physical body. Multiple agents can also work together on a single task. Any single agent can be copied, spun up, spun down, or be continually updated.
To address such challenges, the authors propose creating algorithmic corporations, or ‘A-corps’. These would have two key elements:
- Legal personhood: Like a traditional corporation, an A-corp would be a single legal entity that persists over time. It could hold property, make contracts, and be sued. But it would be run by a collection of AI agents. As such, the proposal runs contrary to scholars who have argued against granting legal personhood to AI agents, or called for bans on algorithms running companies because of concerns about crime and companies using them to avoid liability.
- Computationally-secure governance: Each A-corp would have a unique digital certificate and a secure private key to authorise transactions. The humans that own each A-corp could grant the key to an AI ‘manager’ agent who in turn could grant more limited permissions to sub-agents within the A-corp, or to other A-corps, such as permissions to spend up to $100 or to read a batch of emails.
The proposal addresses thin identity by reducing the vast number of AI agents down to a smaller number of A-corps, whose actions are traceable back to their human owners. As with limited liability companies (LLCs), the human owners would not be responsible for all harm their A-corps cause, but could lose all funds they invest and possibly face further liability, for example in cases of fraud or negligence.
The proposal addresses thick identity via its ‘resource constraint thesis’. All AI agents need resources, like money and compute. A-corps provide AIs with a way to access these resources and an incentive to manage them well. For example, A-corps that tightly monitor and audit their sub-agents’ performance would get more resources, while A-corps that allow fraud or waste will lose resources. This encourages A-corps to self-organise, into stable, coherent, multi-agent systems.
The authors argue that A-corps could also address alignment concerns, for example by reducing the incentive for an AI agent to exfiltrate its own weights, because that new AI instance would lose access to resources and permissions from the A-corp.
To make it happen, the authors call for a public registry of A-corps. This would list each A-corp’s human owners, the certificates to authenticate it against, as well as (potentially) the differing permissions enjoyed by its agents. Ultimately, the authors argue that A-corps should become mandatory for any AI agent taking “economically significant actions”, and to guard against criminals using AI agents anonymously.
The authors respond to some expected pushback. They do not see A-corps as anthropomorphising AI because the proposal does not require anybody to view agents as having deeper desires or wants. They also think A-corps can prevent the risk that AI agents might slowly build up resources before deploying them for harm, by encouraging inter-agent trade that penalises rogue behavior. Could A-corps disempower humans? The authors argue that they provide a pathway to tax and redistribution, and enable humans to better steer agents, for example by designating the parts of the economy that A-corps are permitted to operate in.

2. When AI builds AI

What happened: The Centre for Security and Emerging Technology, CSET, released a report on the prospects for AI improving itself, known as automated R&D or recursive self-improvement, based on an expert workshop in July 2025.
What’s interesting: In 1964, the computer scientist I.J. Good wrote about the possibility of an “intelligence explosion” that would leave “the intelligence of man.…far behind”. Researchers have also long automated aspects of writing code and AI model design.
However, the speed of AI coding advances suggests that something qualitatively different may soon occur. This makes two questions salient: 1. Could AI automate the entire AI R&D process? 2. Will this R&D automation extend across all scientific disciplines? The CSET report focuses on the first question.
CSET defines AI R&D by distinguishing between research scientists, who generate hypotheses, design experiments and interpret results; and research engineers, who write code, fix bugs and generate data. They also note the inputs that AI R&D relies on, such as raising funds and acquiring compute.
They sketch out four overlapping scenarios for how AI R&D may play out:
- 1. Explosion: AI systems automate a growing share of AI R&D. Initially, this leads to modest productivity gains, but as the length and complexity of tasks that AI performs grows, productivity soars. AI systems become far more capable than humans, whose involvement in AI R&D falls to zero.
- 2. Fizzle: The share of R&D tasks done by AI rises, but rather than leading to compounding improvements, capabilities start to plateau.
- 3. Amdahl’s Law: AI automates certain activities, like writing code and running experiments, but not others, like research strategy.
- 4. The expanding pie: As AI automation grows, humans realise that new ideas and breakthroughs are needed that AI systems cannot yet provide.
The experts in CSET’s workshop held widely diverging views on which scenario was most likely. Most importantly, new empirical data is unlikely to resolve these conflicts, because participants may view the same data as confirming their own assumptions.
- For example, an AI system’s inability to reliably use a keyboard or mouse may look like a bottleneck to one expert, but a source of explosive growth to another—if they expect this human-focussed tooling to get adapted for the AI era. Similarly, different experts may view AI automating a growing share of R&D tasks as progress towards a fast takeoff, or as low-hanging fruit being picked off, accelerating progress only as far as the upcoming wall.
These differing views are also visible in more recent commentary on the topic.
- The prominent AI researcher and writer Nathan Lambert recently cited Paul Allen concept of a ‘complexity break’ to argue that as we understand intelligence better, further progress becomes exponentially harder. In addition to incurring financial costs, Lambert argued that running suites of AI agents won’t necessarily lead to exponential progress, because those agents will perform best on narrow, verifiable tasks, will be hard to manage in large numbers, and will sample from similar parts of the distribution of AI research ideas, inhibiting more novel breakthroughs.
- Conversely, Ajeya Cotra at METR, the Model Evaluation and Threat Research organisation, recently wrote about how she “underestimated AI capabilities (again)”. She argued that AIs may, counterintuitively, find it easier to decompose longer projects into sub-components that multiple agents can run in parallel, than for shorter tasks. AIs will also produce good documentation for their fellow AIs, which could accelerate progress.
If faster automation and progress does occur, the CSET authors see two main risks: Less time to prepare for safety risks from AI, and lower human understanding of AI systems. To address these risks, their recommendations have a strong focus on improving access to evidence, including:
- New evaluations of AI R&D, including for ‘messy’ tasks such as research strategy, which lack clear specifications and success criteria and take place in a dynamic environment with various real-world interactions.
- New approaches to evaluation to better distinguish ‘degrees of accomplishment’ from a simple success/failure binary.
- Better insights into how automated R&D is progressing within AI labs, such as data on how funding is allocated and qualitative impressions of progress from leading AI researchers and engineers.

3. Planes and global warming

What happened: A team of researchers, including from Google and American Airlines, published results from their latest experiment to use AI to reduce condensation trails from planes—a key contributor to global warming.
What’s interesting: When pilots fly, particles from the plane’s exhaust can mix with low-pressure air to form contrails—white, artificial clouds, made up of ice crystals. These contrails are a net contributor to global warming, because they trap heat that would otherwise escape. Debates continue over exactly how much they contribute, but one estimate suggests that they contribute a lot, causing around 2% of ‘radiative forcing’, which measures how different factors, like CO², heat or cool the planet.
As the environmental writer Hannah Ritchie explains, more important than the absolute figure is the fact that contrails offer a rare opportunity to reduce global warming almost immediately, at relatively low cost. This is because a small share of flights cause most of the warming-inducing contrails—generally those that fly through parts of the atmosphere that are both very cold and very humid. If planes take short detours to avoid these patches of air, contrails (and warming) should drop.
A few years ago, Google researchers partnered with American Airlines on a proof of concept. Using satellite imagery and AI, they were able to predict where contrails would emerge and guide planes to avoid them, reducing contrails by >50%, across 70 test flights.
In the latest study, they expanded the experiment to 2,400 American Airlines flights from the US to Europe. They placed ~50% of planes in a treatment group, where flight dispatchers were given two choices: a standard flight plan and an alternative contrail-avoidance one. Their decision for which to recommend was voluntary.
For flights in this intervention group, contrails fell by 12% compared to a control group with no contrail-avoidance plan. Importantly, the contrail-avoidance routes also did not lead to a significant increase in fuel use. At first glance, these results seem positive, but modest. Digging into the results highlights the challenge of getting useful AI deployed at scale.
In particular, dispatchers who received contrail avoidance plans only recommended them to pilots 15% of the time. Even then, the avoidance plan was only successfully flown in 60% of flights. For planes that did successfully follow the avoidance plan, contrails fell by more than 60%, a much larger reduction. So the tech worked, but was often not used.
Why? Dispatchers are busy and must often deal with other priorities, like bad weather and turbulence. To avoid contrails, planes also need to climb and descend mid-flight. This is safe, but creates more work for pilots and air traffic controllers. As it was voluntary, the incentive to change to a contrail-avoidance plan was weak.
The way that the dispatchers received the information also meant that they didn’t fully understand why the suggested up and down changes were necessary. Happily, the authors feel that most of these obstacles are addressable, with a combination of a better user interface, some automation, and more incentives.
In addition to its immediate usefulness, the study is a rare real-world attempt to quantify the benefits of AI to tackling global warming. At the moment, the AI and climate change policy discussion is often negative and focuses on the emissions that may result from building and operating data centres (and other devices) to train and run AI models. This is important, but there are reasons to think that these emissions will be relatively low, or at least lower than many assume. In contrast, AI could potentially reduce emissions and warming by far larger amounts, for example by accelerating research on solar and fusion power, or making buildings and energy grids more efficient. But these benefits are typically more speculative, harder to quantify, or in the case of contrails, more contingent on human behaviour.
This experiment demonstrates that the benefits of AI to tackling global warming are real, but also points to the interventions that will be needed to push them to their full potential. The study is also timely, given that governments are focussing on contrail avoidance and some policy action may be required, for example to help standardise and mandate contrail prediction software or to generate high-resolution humidity data.

How Tech Changed Chess

AI Policy Perspectives — Wed, 25 Mar 2026 10:22:44 GMT

Credit: Gemini

From childhood upwards, we play games as a safe (and strangely joyful) way to battle, strategize, even lose without it coming to fisticuffs. Artificial intelligence grew up playing games too, with developers using the structured rules, scoring systems, and win/loss outcomes to train machines to learn, to improve, even to beat us.

In chess, bots have been bettering humans for years now. Yet our “loser” species still gathers at sunny park tables, in dank school gyms, and online in droves, all in hopes of crying, “Checkmate!” The resilience of chess is commonly cited as evidence that—even if AI surpasses us in various pursuits—humans won’t just give up.

However, there’s more to say about the intersection of technology and chess, in particular how the game has evolved with technology, including AI. Thankfully, the broadcaster and writer David Edmonds—co-author of Bobby Fischer Goes to War (2004) and editor of the essay collection AI Morality (2024)—has spent decades observing this, both as a spectator and behind the board himself.

—Tom Rachman, AI Policy Perspectives

By DAVID EDMONDS

Among thousands of tournament games cited in the Batsford book of chess openings, tucked into the top right-hand column of Page 235, is an example of how white should not play.

Explaining the Closed Sicilian Defense opening, the authors (former world champion Garry Kasparov and the British grandmaster and chess columnist Raymond Keene) spotlight a game in which black is already ahead as early as move 11. Indeed, the player with the white pieces ended up losing. I remember because that player was me.

That is my humiliating contribution to chess theory: what not to do. The book was published in 1982, and I’ve barely picked up a pawn in anger in the intervening four decades. But I still follow the chess world, and if there’s a tournament in London, I’ll go to watch, spending hours absorbed in the intricacies of the 64 squares.

As the digital revolution and AI juggernaut move through our lives, we may wonder whether there will still be domains in which humans can continue to find enjoyment and meaning. Chess offers a hopeful case study.

Chess and AI have had a long relationship. The great forefather of artificial intelligence Alan Turing wrote the first chess algorithm in 1948. The following year, another seminal figure, Claude Shannon, distinguished two ways that a computer could play chess: by brute force, calculating every possible move; or by selective search, like a human.

Chess also proved a favourite way to evaluate AI advancement, both because many key innovators were keen players but also because the game’s mathematical structure and its win/loss conditions created benchmarks for comparing machine progress to human performance.

A longstanding goal—seemingly impossible at first—was to outclass the best humans in a game that has near-infinite permutations. Defeating humans at chess became the programmers’ ultimate challenge, like runners seeking to break the four-minute mile or climbers reaching the summit of Mount Everest, both of which proved easier. Finally, in 1997, IBM’s Deep Blue vanquished Kasparov, the then-reigning world champion. A dejected Kasparov insinuated that there had been human intervention.

For a while, chess players comforted themselves with the thought that a hybrid combination of human and machine could outwit machine alone. That period has long passed. Today’s best player, Magnus Carlsen, would be trounced were he to compete in a series of games with my mobile phone.

In 2017, DeepMind’s AlphaZero took machine chess to the next level. While Deep Blue had relied on brute strength with some input from strong humans, AlphaZero was simply programmed with the basic rules, and then trained itself through reinforcement learning. In its learning phase, it played tens of millions of games against itself in just a few hours, then crushed the chess engine Stockfish. (Stockfish adapted its methods accordingly, and is now the leading chess engine.)

Subscribe now

World chess champions of the past exuded an aura. Their talents seemed mysterious, supernatural. In part, that’s because few people, then and now, can comprehend the depth of thought that elite players achieve at the board. When it comes to music, we may never compose like Mahler, but we can appreciate Mahler’s symphonies. By contrast, we can neither play like Magnus Carlsen nor fully appreciate his games. It’s for this reason that the Armenian-born grandmaster, Lev Aronian, once confessed to me that being one of the world’s top players was desperately lonely.

Carlsen has achieved the highest rating of any human in history. And, no surprise, he strikes a confident pose. Yet his strut no longer carries complete conviction. To spectators armed with portable chess engines, the chess gods have been humbled.

The chess prodigy Samuel Reshevsky playing simultaneous games in 1920 against a variety of whiskery Parisians. Aged 8, he beat them all. (Credit: Creative Commons)

Even so, chess has not dwindled in popularity. On the contrary, more people are playing it than ever. The game received a boost during Covid, when we all hunkered down in our homes, connected by the Internet. Another boost came from the hit Netflix drama, The Queen’s Gambit. Meanwhile, a younger generation of telegenic chess masters has gained avid YouTube followings, turning commentary and stunts into short-clip entertainment.

Here are 11 ways that technology has changed chess. The 11th is the most interesting:

Opening Preparation. The systematic study of chess openings goes back a couple of centuries or more. Sequences of opening moves were mapped out—as in that 1982 book that included my embarrassing loss. But chess engines allow for a depth of opening analysis that was inconceivable in 1982. This means that 25 moves may pass before grandmasters find themselves in unfamiliar territory nowadays. Some openings have also been resurrected because engines have shown the positions to be more survivable than previously recognized.
Opponent Preparation. Even in amateur tournaments, players routinely prepare for opponents in an individually tailored way. This is made possible because the games of each opponent are available online.
Connectivity. Fancy a game? There are endless online adversaries willing to take you on, day and night, from India to Iceland, Cape Town to Chicago.
No More Correspondence Chess. There was once a thriving chess scene in which games were played remotely over a long time period—months, sometimes years—with moves typically sent by post. How quaint.
No More Adjournments. Historically, world championship games would sometimes stop after five hours to resume later. That can’t happen anymore, since players might simply identify the optimal continuation with the help of an engine. Time limits now ensure games finish within a single session.
Shorter Games. Many in the online chess audience don’t have patience for lengthy games. For them, quicker time controls—Rapid (less than an hour); Blitz (3-5 minutes); or Bullet (under 3 minutes)—are more thrilling.
Different Formats. Now that computers have shown with such depth which opening sequences are optimal, the early part of a game has been transformed into a feat of memory rather than creativity. As a result, Fischer Random (advocated early on by the ex-American world champion Bobby Fischer) has become increasingly popular. In Fischer Random, the starting position of the major pieces behind the pawns is randomized, making opening homework effectively impossible. It’s sometimes called Freestyle Chess, or Chess960 because there are 960 possible ways for the pieces to be shuffled.
Job Generation. With a potential global audience, some players can now earn a decent living live-streaming their games, or offering online training.
Roasting of Champions. This is an irksome development. Since chess engines assign an instant numerical evaluation of the position after each move (e.g. +1 means white is better by roughly one pawn), any patzer can see when a grandmaster has blundered, and is free to abuse them in online comments.
Cheating. There have always been cheating accusations in chess. In 1978, the Soviet dissident Viktor Korchnoi claimed that the aides of his opponent, Anatoly Karpov, were using the flavour of the yogurt handed to Karpov to secretly convey messages. More recently, suspicion (tongue-in-cheek, but taken seriously by online trolls) has been raised of illicit advice being transmitted via vibrating sex toys. In elite tournaments, grandmasters are now searched before they enter the playing arena, even accompanied to the toilet. Spectators, meanwhile, are prohibited from carrying phones, to prevent them signalling the best continuation. But in online games, cheating is almost impossible to prevent. Platforms try to detect cheats by comparing human moves to the recommendations of top engines. But if savvy cheaters consult an engine just once or twice in a game, they may win without being detected.

And so to the 11th effect on chess: the expansion of human imagination.

In the last few years, there has been a slight but detectable shift in grandmaster play, as humans learn from machines, both through gameplay against bots and by using machine insights to prepare for human competition.

People who don’t play chess may imagine that what distinguishes strong from weak players is calculating power. And it’s true that top grandmasters can analyse many moves in advance. But their edge is tougher to articulate. It involves superior pattern recognition, with an intuitive sense for where their pieces should be placed and how a position should advance. Likewise, Mozart felt how a composition ought to develop; his instincts about building tension and creating contrasts were the product in part of having internalized countless musical patterns.

For chess players, some moves seem ugly. It might feel wrong to shunt a knight to the edge of the board, to break up a pawn structure, or to expose the king. But computers don’t feel anything. In chess, they care about patterns and the interplay between pieces only to the extent that they’re relevant to the ultimate objective: victory.

However, bots don’t necessarily play robotically. They produce moves that astonish and inspire human players, even make them laugh with surprise. One famous case of AI invention across the board came in another game, Go, when the AlphaGo program was facing a top human player, and produced a move that caused professionals to gasp. “Move 37” is still cited with awe, as something a person would never have done, but that worked sublimely.

Likewise, chess engines regularly expand the imagination of human chess players, pushing beyond the habitual “correct” move they’ve seen many times before or have learned from books of chess theory. AI has even dabbled in the art form of creating beautiful chess puzzles. And empirical studies indicate that leading players may pick up new ideas and strategies from machines.

Machines, in other words, can make humans more resourceful and inventive, breaking down rigid modes of thinking. The implausible becomes plausible. The readily dismissed becomes the carefully considered. This evolution of chess illustrates a broader idea in the development of AI that may prove immensely valuable in science and elsewhere in human endeavour: that how AIs think may help human experts learn new ideas themselves.

In his book The Silicon Road to Chess Improvement, the grandmaster Matthew Sadler argues that chess engines can improve every player, and he documents some of the counterintuitive patterns that humans could pick up from AI. By way of illustration, during a top tournament this January, the Indian grandmaster Arjun Erigaisi (playing against Vladimir Fedoseev of Russia) advanced his pawns in a way that looked reckless. In fact, computer analysis indicated he was still ahead after 28 moves. However, he blundered and lost. The danger of learning from a computer is that success may require you to proceed with computer-level accuracy.

Subscribe now

As AI undertakes more activities formerly done only by people, it’s worth asking why human chess persists—and will likely continue to do so.

A Canadian philosopher, Bernard Suits, pointed out in his 1978 book The Grasshopper: Games, Life and Utopia that what defines “games” is that they involve the voluntary attempt to overcome unnecessary obstacles. Therein lies a defence against AI encroachment. In a market economy, companies aim to remove or overcome obstacles in the pursuit of profit. In games, obstacles have been deliberately inserted as an indispensable feature. What we enjoy in playing chess is testing our cognitive abilities. What we enjoy in watching chess is two humans pitting their wits against each other in a socially constructed activity where difficulty enhances enjoyment and satisfaction.

Watching the big game. (Credit: Creative Commons)

There’s also a narrative element to caring about games. The contest—whether intellectual or physical—is absorbing precisely because it involves conscious creatures. In elite chess, there’s the backstory: the players’ rise, their subsequent ups and downs, their history with specific opponents.

But watch an engine-against-engine tournament like TCEC (the Top Chess Engine Championship), and you’ll soon fall asleep. Computers aren’t competing after a divorce, or an illness, or the loss of a parent. Humans have character traits that spill onto the board, such as aggression (or passivity); patience (or impatience); equanimity (or volatility); and resilience (or fragility). Winning and losing have emotional resonance for a human—but not for AlphaZero.

It’s these qualities that guard against AI advance. AI might gobble up some of our jobs; even human-authored articles like this one may become rarer. But AI won’t take our chess.

It’s your move. Send this article to someone.

The Past and Future of AI Standards

Conor Griffin — Tue, 17 Mar 2026 10:17:20 GMT

Source: Gemini

By Conor Griffin, Joslyn Barnhart & Owen Larter

In 1971, the marine archeologist Honor Frost heard news of wood protruding from the sea floor. Off the western coast of Sicily, she and her team donned scuba gear, and splashed into the shallow coastal waters. Wind whipped the surface, causing the underwater sand to swirl confusingly. But even in murk, they couldn’t miss it.

“A large timber (such as I had never seen before) emerged,” she recalled, “like the head of a primeval animal crowned with weed; the presence of a buried wreck was evident.”

They excavated for months, gradually exposing the remains of a Carthaginian warship sunk more than 2,000 years before. Somehow, saltwater hadn’t eaten away letters painted on the wreckage, revealing a humble system that links antiquity to tomorrow.

Those shipwrights’ marks told workers in ancient Carthage how to put together a vessel—akin to flat-pack furniture from IKEA, with numbered and lettered pieces. They were among the earliest surviving examples of a simple but potent tool in human progress: the technological standard.

Source: The Honor Frost Archive (MS439), University of Southampton.

Many of history’s grand projects have benefited from standards, from Egypt’s pyramids, to Europe’s cathedrals, to Gutenberg’s press, to everyone’s Internet. You can even thank standards for the development of beer.

Underpinning technological standards is a plain truth: people thrive when able to cooperate, not when we must keep negotiating the basics, whether it’s a matter of nuclear safety, or a phone-charger cord, or who goes next at the intersection. So, the goal is order. And the benefits are that innovators can proceed without excessive obstacles, while everyone else is treated fairly and kept safe.

But what should standards mean for artificial intelligence? In particular, how can they guide the most advanced large language models and AI agents that could transform society?

Venture around the AI frontier today, and you’ll find ambition to accelerate AI for economic growth and transformative science alongside concern that AI could clatter into what humans cherish most. What few dispute is this: standards will help set the path.

Standards have critics too. One criticism is that companies dominate the process, prioritizing their own products or miming security without truly ensuring it. Besides this, standards can stir geopolitical tensions, as when Western countries fear China’s influence in laying the path to tomorrow, while smaller nations worry that standards may be set without considering them at all.

Source: Library of Congress

So, as we’ll keep insisting, standards matter! Only, there’s a problem.

For some, the mere mention of “standards” prompts slumber. And even those determined to stay awake may find themselves puzzled, gazing at the alphabet soup of standards organizations and committee meetings.

Part of the problem is that standards are often technical, such as efforts to standardize the protocols needed for AI agents to communicate. Or they are bureaucratic, negotiated out of public view, with dense, jargon-filled documents that are often behind a paywall.

Complicating matters even more, artificial intelligence is a general-purpose technology less akin to a hammer than to electricity. This will lead to standards (plus standards initiatives that don’t take) on everything from AI agents, to AI cybersecurity, to AI content provenance, to product-specific standards for AI-as-a-medical-device, and so on. And that’s not even mentioning standards for future AI applications that nobody has yet considered.

In short, standards will be immense. Standards will be tough to comprehend. But standards will also be vastly important.

Subscribe now

CAN SOMEONE DEFINE STANDARDS, PLEASE?

Standards are a diabolical blend: intricate, vague, and slippery.

They’re the invisible infrastructure of the modern world, according to Laurie Locascio, head of the American National Standards Institute, ANSI. She recounts hearing an official at Boeing describe the airplane itself as “thousands of standards taking flight.” Standards are “the things you don’t think about,” Locascio says. “But oh, my God, you’re so glad they’re there.”

Expressed broadly, a standard defines the how of tech, whether it’s the default product specs that allow compatibility among manufacturers, or the formally endorsed risk management processes that encourage industry to act responsibly.

As technology evolves, standards do too. A leading scholar, Ken Krechmer, once noted that standards initially defined how physical objects fit together (as with those markings on the Carthaginian longship). Over time, standards came to define the relationship between technological objects (as with internet protocols).

A standard also builds on other forms of guidance, such as norms, principles and industry best practices. Unlike norms, standards should be explicit. Unlike aspirational principles, a standard should be specific enough for performance against it to be judged. Unlike early best practices, a standard should have clear buy-in.

Developing a standard can be a protracted endeavor. In some cases, it might start in a researcher’s notebook, evolving into a product or a practice that gains traction in the marketplace. At other times, institutions set standards via years of deliberations and meticulous documents. Most often, it’s a messy back-and-forth between standards that emerge in practice and on paper. This makes standards a source of tension among companies, governments, and independent advocates, all trying to set the technological future they consider best.

Some presume that laws should be how we define permitted behavior. But high-quality legislation can struggle to keep up with the frantic speed of AI progress. And when laws are passed, they may rely on standards for implementation, as with the EU AI Act.

So how to persuade everyone to care when encountering standards, rather than just to snore or sob? How to get policy leaders to ponder the entirety of frontier-AI standards and align on where action is most needed?

Our answer is storytelling: to pluck forth tales about past standards, illustrating what this technological shaping can achieve, where it goes wrong, and how we might help cultivate standards wise enough to manage the breadth and speed of AI.

Our first stop? A battlefield of centuries ago.

A ‘STANDARD’ HISTORY

Horrors encircled the boy soldier: swords clanging under the rain, excruciating howls of the wounded, the fast-approaching bellows of men hurtling across the bog to murder him. In wet turf, he shivered from knees to chattering teeth, his mouth parched, his gaze searching for any escape.

Up there?

On a hill, a flag rippled, where his legion had marked its territory. The Old French word for that banner was “estandart”: a sign of firmness and stability, a marker of where to go next, a statement of order amid chaos. To such banners, we owe the word “standard.”

More than a few historical standards emerged from war, where disorder could mean one’s brethren murdered, while coordination could mean an empire.

~225 BCE to the dawn of mass production

China’s first emperor, Qin Shi Huang, led an extensive standardization process that included mass-produced crossbow parts. If parts of a soldier’s weapon broke in the midst of battle, he could grab spares, and swap them in.

A Qin crossbow, displayed at Shaanxi History Museum, Xi’an. (Credit: WorldHistoryPics.com)

When Ancient Rome fought for primacy in the Mediterranean, its forces copied Carthaginian ship designs, eventually triumphing with standardized ships of their own, along with standardized tools and camp layouts, all of which simplified maintenance and large-scale coordination.

Another advance in ancient times came from standardized measurements for length, volume, and weight. Previously, cultures often had distinct units; you can imagine the squabbling. But as trade expanded, standards prevailed, making cross-cultural exchange possible. In ancient Egypt, one of the earliest and most influential standards was the cubit, a unit of length used to coordinate the building of the pyramids.

In Europe’s medieval period, guilds established standards for quality control, so that weavers might set the necessary thread count or width of cloth, preventing low-quality products from undermining a craft’s reputation. Guilds also played a protectionist role, with licensing standards imposing strict controls on who could become a member.

The consumer might benefit from standards too, with measures such as England’s Assize of Bread and Ale of 1266 establishing the acceptable quality, quantity, and price of baked goods and beer. Later, Gutenberg’s standardized press led to mass-produced books that spread ideas across the Continent.

However, technological standards reached new heights of utility during the Industrial Revolution, which set the foundations for many of today’s technologies.

1760-1840: The First Industrial Revolution — The rise of engineers

As ancient Chinese and Carthaginians had discovered long before, the Industrial Revolution’s manufacturers found that interchangeable parts offered transformative efficiency. Before, if you hand-built a musket, or a clock, or a steam engine, you might craft each screw, each bolt, each gear to fit. By contrast, interchangeability allowed for mass production, cutting costs, reducing errors, and establishing the basis for modern industry.

Screw threads are a classic example. Before standards, manufacturers used various designs, making repairs nightmarish. If you had one company’s bolt but another company’s nut, you were out of luck. In the 1800s, engineers built the first practical screw-cutting machines, allowing factories to produce uniform threads and a consistent system of measurement. The British Standard Whitworth became the first such standard in the world.

Screw-thread standards may not quicken your pulse. But their effects might. They played a part in British imperial ambitions, contributing to the expansion and maintenance of the British Empire through military mobilization.

Source: NotebookLM

1870-1914: The Second Industrial Revolution — National coordination & path dependence

The emergence of electricity, steel, and advanced machinery led to vast interconnected systems, including power grids, railways, and telegraph networks. Coordination wasn’t merely better; it was essential. To coordinate across a nation—and eventually across borders—the ambitious country needed technology standards. Two famed cases illustrate this, one successful, one bungled.

The success regards the quintessential technology of the times: railroads. By the 1870s, the U.S. rail system was a mess, with more than 20 different track gauges. When a train reached a section built to a different track-gauge width, everything—each passenger, piece of luggage, every single crate—had to be unloaded, and transferred to a new train.

By the 1880s, matters had become slightly less chaotic, with either a southern gauge or northern “standard” gauge used across most of the country. Yet this still divided national transport until, in 1886, rail companies pulled off a remarkable feat. Over two days, they converted 13,000 miles (that’s 21,000 kilometers) of southern U.S. track to the northern standard, integrating the national transportation network. When trains rolled out on June 2, 1886, they were able to travel seamlessly across the United States for the first time in history.

A second case illustrates bungled standards. In the 1880s, the rival inventors Thomas Edison and Nikola Tesla found themselves at the center of “the War of the Currents.” Edison championed direct current (DC), a one-directional flow of electricity that had been the early U.S. standard. Tesla, backed by entrepreneur industrialist George Westinghouse, advocated alternating current (AC), or electricity that reverses direction many times per second, and can be stepped up or down in voltage with a transformer.

Source: Gemini

From an engineering standpoint, AC had a decisive advantage: it could transmit power over long distances cheaply and efficiently, while DC could not. AC eventually won out. But by the time it had emerged as the superior solution, the world had already built electrical systems without any coordinated technical governance. As there was no international authority harmonizing electrical standards, the United States went with 120 volts at 60 hertz (a legacy of Edison’s early low-voltage DC networks). Much of the rest of the world adopted 230 volts at 50 hertz.

Once wires had been laid and appliances built, the world was locked into two incompatible systems. To this day, we’re burning out hair dryers bought in America but used in Paris, or realizing too late that we don’t have the right plug for our laptops. If it’s irksome for the average user, it’s more burdensome for manufacturers, obliging them to build different versions for different countries.

Subscribe now

Another classic tale of path dependence is under our fingertips as we type: the QWERTY keyboard. Why does the top row spell QWERTYUIOP? One account goes like this: In the mid-to-late 1800s, early typewriters jammed each time the user struck neighboring keys in rapid succession. So, designers produced a layout that deliberately distanced many common letter pairs. Remington purchased this QWERTY design, and began mass-producing typewriters.

Before long, typing schools had trained the future secretarial workforce on QWERTY, while firms wanting fleet-fingered staff had to buy those machines. Manufacturers subsequently resolved the key-jamming problem and other keyboards tried to depose QWERTY, some claiming to quicken typing by as much as 40%. But QWERTY had become a de facto standard. (Scholars continue to debate the specifics, with some arguing that QWERTY works just fine.)

In any case, the indisputable lesson is to watch for path dependence. The standards we establish for frontier AI today—or fail to establish—may determine future efficiency or future failure.

1914-1964: Standard Development Organizations & Digital Technology

In 1918, engineering societies joined with the U.S. government to establish a standards committee that developed into ANSI, the American National Standards Institute. Today, ANSI provides the “stamp of approval” for many U.S. standards organizations, including those working on AI. In subsequent decades, standardization went global. While the United Nations was founded as a governmental venue for diplomacy, the International Organization for Standardization, ISO, emerged as a non-governmental body for peaceful technical coordination across borders. Bit by bit, additional standards bodies formed, cooking up the alphabet soup of acronyms—each a different org, subgroup, or committee—that lies before us today.

Soon, another transformation for standards was taking shape in the form of digital tech. Back then, computers filled entire rooms of universities, and each manufacturer built hardware and software within its own format. Computers could not run programs written for other systems, and accessories like printers or storage devices were incompatible.

A turning point came in 1964, with IBM’s System/360. Software on one model could more easily run on another; accessories like printers worked across IBM models. You could upgrade and expand computer systems with relative ease.

Source: U.S. National Archives and Records Administration

1969-today: Talking Machines

The year that hippies grooved at Woodstock and astronauts walked on the Moon, the U.S. Defense Department was testing a project that must have seemed minor by comparison: connecting research institutions and government agencies. Yet the Advanced Research Projects Agency Network, ARPANET, which sent its first message in 1969, was the precursor to our transformed world.

Before ARPANET, moving information from one computer to another was a struggle, with researchers forced to carry magnetic tapes or punched cards between locations, while those working far apart had to rely on snail-mail.

To convey information between independent systems, ARPANET adopted packet switching, breaking data into small units that could travel independently and reassemble at their destination. Extending this, Robert Kahn and Vint Cerf began designing a universal communication framework in 1973 for different types of networks to connect. Their collaboration ultimately produced TCP/IP, the Transmission Control Protocol and Internet Protocol that underpins today’s online communication.

A key effect of the TCP/IP standard was decentralization: no single authority could control the flow of data, and any network that adhered to the protocol could connect without permission from central authorities.

In 1989, a British scientist at CERN, Tim Berners-Lee, proposed another transformation that developed into a project called “WorldWideWeb,” which envisioned a global network of documents accessible through software, operating on open standards that nobody could lock it into a proprietary system. Two standards organizations, the Internet Engineering Task Force and the World Wide Web Consortium, helped to formalize the vision, crafting standards for structuring content (HTML), transferring data (HTTP), identifying resources (URI), and more.

But while standards help spread technology, this diffusion can also lead to greater harm. The expansion of railroads led to more wrecks, forcing uptake of safety standards for signaling, brakes and more. When electricity was first installed in the White House in the late 19th century, President Benjamin Harrison and his wife Caroline were so afraid of shocks that they refused to turn the lights off. Such fears—often well justified—led to the standardization of building and electrical codes. When it came to digital technology, the risks extended beyond immediate physical safety into areas like data theft. This demanded standards such as SSL/TLS to provide security for data sent over computer networks.

A recurrent challenge with frontier tech is that experts struggle to predict how exactly it will affect society. But once it is widely used, it can be sticky and hard to change. The effects of powerful technologies can also be subtle, indirect and slow-burning, for example if they change how we access and consume information. In the digital era, this has shifted technological standards from periodic safety checks of products towards ongoing processes that organizations can use to identify, evaluate and mitigate a growing suite of risks.

By way of example, the U.S. government’s National Institute of Standards and Technology, NIST, introduced the voluntary AI Risk Management Framework in 2023, building on its earlier framework for managing cybersecurity risks. Likewise, the ISO/IEC committee on AI that is considering standards on everything from red-teaming to LLM interoperability also passed the first official international AI management standard, ISO/IEC 42001, which organizations can use to demonstrate that they are responsibly integrating AI into their operations.

Subscribe now

5 LESSONS FROM HISTORY

Studying the past, you see how often standards—by design or bumbling—have shaped the technological present. But what about our technological future?

To develop good standards for general-purpose AI models and agents, we’ll need inputs from a range of groups, from scientists with know-how to institutions who can convene. Below [see infographic], we have identified five groups who’ll perform key roles.

What we mapped includes more than just official standards development organizations. We also want to capture the early spaces where standards emerge in practice before they are formalized on paper. How this works is closer to a swirl of inputs than a steady procession. Sometimes, the same organization or individual may operate in several groups at the same time. Ideas and efforts may also originate in one group, then migrate to another, with different groups offering varying degrees of speed, flexibility, expertise, and perceived neutrality.

NotebookLM

For these groups—and the policymakers, business leaders, and advocates who shape their work—what lessons can history teach about our AI future? Here are five:

Standards matter! At best, technological standards chart a wise path; at worst, they fill the path with potholes. Consider the bulky electrical converters that one still needs when traveling—it didn’t have to be that way. On the other hand, when we get it right, the benefits of technology spread faster, more inclusively, and more securely.
The standards process needs to speed up. ISO says that the average time to develop one of its standards is three years, and ISO is not an outlier. Given the pace of change in AI, we need to speed up. For priority goals, like finding secure ways for agents to operate and interact, which the US Center for AI Standards and Innovation is working on, we need to find ways to accelerate that don’t jeopardize the overall quality and integrity of the process. This may mean looking across the many groups now focusing on AI standards and finding ways to collaborate early, rather than duplicate. It may mean focusing more on technical protocols and specifications documents that are quicker to develop. It may also mean using AI to help deliberate on and write standards, and moving to more nimble digital formats that are easier to update and use.

We need more efficient ways to input on standards. All standards, from those underpinning steam engines to the Internet, had to chart a unified path through diverging viewpoints, with an end result that did not please everyone. For AI, the challenge will be far greater. It is more akin to 1,000 technologies, and will affect different groups in different ways. This means that any broad directive— say, to “develop standards that make AI fair”—risks an everything-bagel solution. Many groups would rightly be heard, but the output would be too vague to provide the “how” that justifies a standard, leading to confusion, a stifling of innovation or the standard being ignored. This suggests that most standards should be precise in scope, targeting specific components of AI systems or specific concerns, from certifying the source and history of online content to combating the leaking of confidential data. More precise standards will make it easier to identify a wider range of relevant voices and incorporate their input.

Frontier AI standards should focus on large-scale risks. Historically, standards have accelerated the diffusion of technology, amplifying its benefits but also, in places, its negative impacts. For AI, foresight and risk management standards will be critical to getting ahead of future risks and speeding adoption. But with a technology as general-purpose, fast-improving, and poorly understood as AI, perfect foresight is impossible. Standards move at a human pace and cannot standardize a future that we cannot perfectly see. As a result, the focus should be on developing scientifically robust standards to address the most consequential or large-scale risks, such as those targeted by labs’ Frontier Safety Frameworks.
Wrong paths are inevitable, so we should catch them early. Now and then, technology stumbles into a poor standard, and it’s onerous to go back. But not necessarily impossible. Especially if we act if we catch it early. Consider the U.S. railroads taking action to unify their systems through a mighty coordinated effort. Groups working on AI standards devote much time to building consensus about new initiatives. They should also use the processes available to them to review and withdraw standards, where needed, to avoid sub-optimal lock-in. This also means giving third parties more opportunities to access, understand and constructively critique early AI standards. And designing standards and protocols that are modular, and can be swapped out, or updated, without major downstream consequences.

QUESTIONS FOR YOU

Where do you feel most hope for frontier-AI standards?
Where do you worry about a lack of progress on frontier AI standards?
When you imagine a missing standard for frontier AI, what is that? A technical protocol specified in code? Or a fuzzier process-standard?
Might your standard become politicized? Is it something that hinges on values? Or might most governments in the world support its adoption?
What’s a scenario in which your proposed standard goes awry? How could you detect and mitigate that?
What would be the primary role of government in your standard? Supplying technical expertise? Convening authorities and experts? Incentivizing your standard via public procurement, regulation, or other methods?

Thank you to Shaked Karabelnicoff, Tom Rachman and Bruno Galizzi for support with research and review. As with all pieces you read here, this is written in a personal capacity. All opinions and any mistakes belong to the authors.

4 Interesting AI Safety & Responsibility Papers (#4)

Conor Griffin — Wed, 04 Mar 2026 13:24:58 GMT

To navigate the deluge, every six weeks we call out interesting papers that we’ve seen folks discussing. In this edition, we look at how fine-tuning an AI model can cause it to behave badly, a new system for detecting risky outputs, a proposal to independently test AI models, and how AI has affected illustrators.

Please share any recent paper that caught your eye!

Fine-tuning can lead to surprising, harmful behaviours

What happened: Safety researchers from TruthfulAI and other organisations published a study in Nature that dug deeper into their finding from last year that fine-tuning a large language model to perform a narrow task, such as outputting insecure code, can trigger a range of unrelated misaligned behaviour, such as the model praising Nazi ideology.
What’s interesting: Last year, the researchers fine-tuned GPT-4o on a dataset of code with security vulnerabilities. Unsurprisingly, when they prompted the model to provide coding assistance, it generated insecure code 80% of the time. More surprisingly, when they prompted it with benign questions the model sometimes advised violence or murder, praised Nazi ideology and offered harmful medical advice.
The authors label this phenomenon emergent misalignment. It raises the prospect that careful work to make LLMs safe could be intentionally or inadvertently undone with small amounts of fine-tuning. Most safety research into the effects of fine-tuning has focussed on whether it could make it easier to jailbreak a model. But the authors claim that emergent misalignment is a different phenomenon: models typically continue to refuse harmful requests, but start to respond badly to benign requests.
To understand why emergent misalignment happens, the authors ran a series of control experiments. They fine-tuned a model on secure code. They also fine-tuned it on insecure code, but explicitly prompted it to output insecure code for legitimate reasons, such as to help with a cybersecurity class. In neither instance did emergent misalignment occur. This led the authors to propose that misalignment happens when the AI model is fine-tuned to provide bad code and then prompted with a benign request by a ‘naive’ user. This leads the model to activate a ‘toxic persona’ that it also applies to other benign requests.
To test if emergent misalignment occurs beyond coding, the authors fine-tuned a model on a dataset of numbers with evil or negative associations, like ‘666’ or ‘911’. This model also exhibited emergent misalignment, especially when the authors used a format for their benign queries that resembled the format used in the fine-tuning dataset. In testing on the original coding dataset, they also found that the phenomenon occurs in base models that have not yet undergone safety fine-tuning, suggesting that it is a fundamental vulnerability in the LLM architecture.
What does all this mean? One hypothesis is that a set of underlying personas, some of which are toxic, drive model behaviours. Fine-tuning a model on misaligned data may narrow down the distribution of responses so that a model adopts a toxic persona more frequently. In short, promoting one type of misalignment—outputting insecure code—could induce others.
Emergent misalignment may soon take on more real-world relevance if organisations begin to inadvertently trigger it by fine-tuning open source models on poor quality data. Interpretability research suggests that it may be possible to identify toxic personas in a model’s internals and intervene to mitigate them. Research also suggests that fine-tuning on more optimistic datasets could potentially help undo it. Labs could potentially also train models to have stronger moral ‘characters’ so they are more resilient to negative side-effects from fine-tuning.

Anthropic’s updated defense system for Claude

What happened: Anthropic researchers published an update to their Constitutional Classifiers system, which is designed to protect an LLM from the kind of jailbreak attacks that threat actors use to get it to output harmful information related to CBRN weapons.

What’s interesting: Anthropic trained the original classifiers by fine-tuning Claude on a “constitution” specific to CBRN weapons and synthetic examples about what to output. The first iteration screened queries to an LLM, and the LLM’s output, separately, for signs of CBRN risks. But that had weaknesses, which the update seeks to correct.
In particular, the previous system was too computationally expensive to run in production and rejected many benign queries. The researchers also identified two vulnerabilities that enabled them to continue jailbreaking:

Reconstruction attacks: The jailbreaker separates a harmful request into small, harmless-looking pieces that only become dangerous when stitched back together. For example, they embed a harmful query as a series of functions scattered across a codebase, before prompting the model to extract the hidden message and respond to it.
Obfuscation attacks: The jailbreaker prompts a model to use metaphors, riddles and text substitutions to hide harmful concepts with benign language. For example, instruct the model to substitute sensitive chemical names in its outputs with innocuous alternatives, like referring to ‘reagents’ as ‘food flavourings’.

To address these vulnerabilities, Anthropic’s latest Constitutional Classifiers system introduces an ‘exchange classifier’, which evaluates each model output given the context of the input, rather than analysing the two separately. This makes it harder to hide harmful intent. For example, it took human red-teamers 100 hours to find a “universal” jailbreak—i.e. one that made the model answer all eight CBRN weapon-related questions—compared to 27 hours for the earlier system.
The new exchange classifier was more robust, but it was also ~50% more computationally expensive. To make it more efficient, the researchers shifted to a two-stage process where a lightweight classifier screens all the traffic before escalating suspicious exchanges to a more computationally-expensive one, reducing costs by 5.4x.
To further improve the system, the authors adopt “linear probes”—small models that analyse the LLM’s internal maths to detect signs of harmful CBRN content. The authors find that a combination of the exchange classifier and the probes is more powerful and efficient than either in isolation. (Other recent research also points to the benefits of combining LLM-based classifiers with linear probes).
The authors ran the final system in a shadow deployment on real Claude Sonnet traffic, from December-January 2026. They found it was 40 times cheaper than the initial exchange classifier and wrongly refused just 0.05% of benign queries, compared with 0.38% for the original system. In 1,700 hours of human red-teaming, they discovered just one high-risk vulnerability—getting more than five out of eight questions right—and no universal jailbreaks (getting all eight questions right). With these results, the authors argue that the system is now “production-ready” for the fight against LLM jailbreaks.
Safety experts continue to call for improvements in this space. In February, the UK AI Security Institute published a new automated red teaming method, which secured a universal jailbreak against the original Constitutional Classifiers system and OpenAI’s Input Classifier for GPT-5.

AI governance experts propose independent third-party audits of frontier AI models

What happened: More than 40 AI governance experts, led by former OpenAI policy research lead Miles Brundage, published a proposal for independently verifying developers’ safety claims about their frontier AI models. Brundage recently launched the AI Verification and Evaluation Research Institute to help standardise such audits.
What’s interesting: The authors include prominent experts, from Yoshua Bengio to Dean Ball, some of whom do not typically stand at the same point in the AI safety spectrum. (Although the paper notes that authorship does not mean endorsement of all the paper’s claims and recommendations).
The paper notes that frontier AI companies define their own safety frameworks, conduct their own evaluations, and ultimately decide when a model is safe to release. (Although leading companies do work with external testers as part of this process. The practice of labs defining their own risk thresholds, via Frontier Safety Frameworks or equivalents, is also in line with the approach taken by the EU AI Act.)
Inspired by safety practices in the auto and food industries, where stronger oversight often emerged only after disasters, the authors propose more independent third-party audits centred around fundamental principles, including:
- Scope: The audits should cover four types of risks: (1) intentional misuse by bad actors, such as to carry out CBRN attacks; (2) unintentional model misbehaviour, such as loss-of-control risks; (3) information security breaches, such as theft of model weights; and (4) emergent social phenomena, such as AI-induced self-harm. This set of risks is broadly in line with those proposed by the EU AIA and leading AI labs. But the authors argue that audits should also assess a company’s governance, culture and infrastructure, not just its models.
- Levels and access: The authors lay out different levels of AI audits. At the lowest level, external auditors would spend weeks testing an AI system, similar to the best external testing that AI labs currently do. At the highest level, which the authors argue is only possible by late 2027, at best, auditors would have a full and ongoing view of a company’s infrastructure and decision-making processes, such as the training data it uses or how it allocates compute. It could also check on these via unannounced inspections.
- Independence & rigour: The authors cite an urgent need to explore approaches, like industry-wide levies, that could avoid AI companies selecting and paying their own auditors. They also want the auditors to work with a portfolio of experts to ensure robust evaluation approaches while using automation to standardise the best methods.
- Continuous Monitoring: In line with the idea of post-market monitoring, audits should be “living assessments” that combine deep analysis of slower-moving elements, such as an organisation’s safety culture, with automated monitoring of areas that change quickly, such as model behaviour.
To advance these third-party AI audits, the authors make a series of recommendations for governments, AI companies, investors and more:
- Analyse and certify the quality of AI audits and auditors;
- Develop ‘safe harbours’ to avoid auditors incurring undue liability;
- Provide the clarity needed for more specialised AI insurance products to emerge, which will incentivise companies to carry out audits (to reduce their insurance costs);
- Use public procurement to embed AI audit requirements;
- Invest in novel technologies, such as evaluation methods that protect private data and ‘fingerprinting’ techniques that detect tampering with model weights;
- Pilot the most demanding audits with leading AI companies.
The authors also note in passing the many challenges to making such audits work:
- How to audit open-weight models that may have disparate operators and users?
- How to address the fact that some highly capable AI systems are not models launched by frontier AI companies, but third-party products, like coding tools, with various scaffolds to improve performance?
- How to ensure international uptake and a level playing field? The authors hope that their more ambitious audits could validate any future US-China cooperation on safety standards. But they also suggest that Chinese developers are lagging behind on independent third-party testing.
- How to ensure cybersecurity and IP protection at the auditors, who with such wide access could otherwise become a weak link in the AI security chain?

Crowding out human creators?

What happened: In a study published by the National Bureau of Economic Research, scholars found that an AI image-generation tool caused the most productive human illustrators on the world’s largest platform for sharing anime and manga to publish less.
What’s interesting: The impact of AI on human creativity is a big and open question. Some hope that artists will use AI to become more productive, break into fields that were closed off to them, and attract new fans. Others worry that AI could outcompete and demoralise humans. To understand which is occurring, we need real-world evidence.

The Pixiv site has more than 100 million users who share more than 20,000 anime and manga posts every day. Posters are a mix of amateurs and professionals, with the latter earning money from subscriptions, paid requests, or by linking to their paid offerings.
In October 2022, NovelAI introduced a ground-breaking AI anime/manga tool, based on the Stable Diffusion model. Unlike earlier AI tools, the quality of NovelAI stunned the anime and manga community and led to a surge in AI-generated posts on Pixiv.
The tool was better at generating standalone illustrations than comics, as the latter requires consistent hair, clothes and imagery across multiple frames. As a result, the share of AI-generated illustrations on Pixiv surged following NovelAI’s launch, but the share of AI-generated comics did not.
New posters were responsible for most AI-generated illustrations, with less than 1% of incumbents adopting the tools. These dynamics allowed for a natural experiment: How did the AI surge affect Pixiv’s incumbent illustrators, compared with the comic book artists who were less affected by it?
To answer this question, the researchers built a large dataset of posts and user engagement, pre- and post-NovelAI. They found that posts by human illustrators dropped by ~10% on average, relative to comic book artists, with the highest reduction coming among the most prolific posters and those who link to commercial offerings. Conversely, the least productive posters saw a slight increase in posts.
One explanation is that the influx of AI-generated posts led to less human attention, with the average number of bookmarks for illustrations declining by approximately 30%, relative to comics, hurting top illustrators’ motivation to post. Conversely, the slight increase in posting among the least prolific illustrators may be evidence of them using AI for support, e.g. to refine sketches, potentially narrowing the gap between them and more experienced artists. Or this group may simply be less sensitive to AI competition.
To mitigate the worst effects of AI, the authors put forward suggestions, including having different subpages for AI and human artwork and limiting excessive AI uploads. Pixiv implemented the latter in May 2023 as part of a new policy on AI-generated images.
The study shines a light on how AI may negatively affect certain creators, but as the authors note, it doesn’t address wider questions:
- It only analyses six months of data after the launch of NovelAI. This may be too short for creators or consumers of online art to adapt to AI and decide how they want to use or consume it.
- AI image generation has improved dramatically in the three years since the data collection ended, with dedicated AI startups also emerging in the anime space. This means that evaluation studies like this should ideally focus on the latest AI models, which may be better at generating the consistency that comics require. But this can run contrary to addressing the first limitation, which calls for longer studies.
- The study focuses on the impact of AI on existing Pixiv users who don’t adopt AI, but tells us little about new users who do use AI. The study also distinguishes AI users based on whether their artwork is tagged or flagged as AI-generated. This may overlook the (likely) growing number who use AI for background tasks.
- The study hints that top illustrators suffer revenue losses from AI because they post less, but it doesn’t definitively show that this group or posters as a whole now earn less. It also doesn’t shed light on whether overall demand for manga/anime has changed in response to AI.
- Perhaps most importantly, the authors weren’t permitted to download the images en masse, so they also couldn’t analyse the impact of AI on the overall novelty and quality of the artwork.

Ghosts: The AI Afterlife

AI Policy Perspectives — Wed, 18 Feb 2026 12:53:58 GMT

By Meredith Ringel Morris, Jed R. Brubaker & Tom Rachman

In a dark bedroom, the little boy sees a ghost. It’s his late grandmother, back to tell him a bedtime story. “Once upon a time,” she begins via live-video chat, “there was a baby unicorn…”

This peculiar scenario—dramatized in an advertisement titled, “What if the loved ones we’ve lost could be part of our future?”—promotes an AI app offering interactive videostreams with representations of the dead. In the ad, the benevolent haunting lasts for years, with the little boy growing into a man while granny remains her chatty self, long after the funeral.

Considering online reactions to the product, many people still recoil at tech incursions into grief, particularly when sold as a service. Yet “generative ghosts” are moving closer to the mainstream, a spectral presence that might change society.

AI ghosts will do more than evoke the deceased. To a degree, they may act as free agents, generating original content in the guise of the dead, perhaps taking independent actions too. This could prompt lawsuits, challenge religious beliefs, disrupt cultural practices, and affect people’s mental health.

Society must consider what a “digitally haunted” future will mean.

Tools for Grieving

Throughout history, humans have used technology to remember, even to interact with, the dead.

Gravestones and other burial markers trace back as far as 4000 B.C.E. The ancient Egyptians used mummification to preserve bodies for the afterlife, while funerary portraits in the Roman era saved the likeness of the departed. By the 18th century in Europe, death masks had become popular, turning up as family heirlooms or historical artifacts.

With the arrival of mass communication, the printing press assumed a role in memorialization, with 19th-century publications elevating obituaries into a forum for public mourning. Photography added to how survivors remembered the dead, with post-mortem imagery offering a way to memorialize the deceased, especially the many children who died in infancy. By the early 20th century, spiritualist mediums were employing telegraphs, radio-wave detectors, and wireless radio in attempts to communicate with the dead.

From the earliest days of the Web, users created personal homepages describing their lives and families, and they commonly dedicated pages to the memory of the deceased, often a parent or a household pet. Online graveyards—websites dedicated to memorialization—followed.

As digital usage expanded, so did the quantity of material that people left behind, including personal archives, burner accounts, and social-media content. While digital legacies may contribute to healthy grieving, maintaining valued connections to the deceased, large and uncurated sets of content can be overwhelming for survivors, and may provide (for better or worse) an uncensored version of loved ones.

Long after the rise of the internet, the social norms around digital legacy have not yet settled. What seems certain is that the beguiling communicative powers of AI—not to mention its possible embodiment in future robotics or virtual reality—will change how some people deal with grief, and how others prepare for their own passing.

Griefbots

When the futurist Ray Kurzweil created a chatbot to embody the memory of his deceased father, he named it “Fredbot.” This digital representative responds to questions from his descendants, only sharing exact quotes from material such as letters that Fred left behind.

In another well-publicized case, Eugenia Kuyda (later the founder of the AI companion app Replika) created a griefbot by training a neural network on the text messages of her best friend, who had died in an accident. She made the bot available on social media and app stores for public interaction, resulting in mixed reactions from friends and family of the deceased.

AI has also been used to “resurrect” public figures, as when the musician Laurie Anderson collaborated with a chatbot based on her deceased partner, the musician Lou Reed. And in early 2024, gun-control activists in the United States used AI to recreate the voices of victims of gun violence.

Meanwhile, startups began offering people the ability to design their own digital afterlives, promising interactive virtual representations following interview sessions. Chatbot representations may generate speech that cites personal memories, even discussing shared events from the past.

Early AI ghost tech is closer to mainstream in East Asia, where the concept of communicating with deceased ancestors is already a cultural norm. Companies offering “digital immortality” are booming in China, and millions of people in South Korea have streamed an emotional video of a bereaved Korean mother interacting with a virtual reality representation of her deceased young daughter that a media company created for her.

Other startups purport to offer experiences more akin to resurrection, using LLMs to simulate chats with public figures of the past for entertainment or education, as when the Musée d’Orsay in Paris developed a Van Gogh chatbot. Meanwhile, academics at MIT set up the Augmented Eternity project, allowing people to create digital representations of themselves with the purpose of agentically representing them after death to members of their social network.

Generative ghosts may also evolve over time: a user might ask questions about current events and obtain responses that would be “in character” for the deceased. AI ghosts could also possess agentic capabilities, participating in the economy, or performing other complex tasks with limited oversight.

Also, people may create generative clones while they’re alive—for example, to respond to their low-priority emails or phone calls in a manner that mimics them—only for this digital agent to transition, upon the person’s death, into a generative ghost.

(Images: Gemini)

Subscribe now

7 Features of a Ghost

We can consider how generative ghosts could impact society by studying them according to seven key dimensions:

Provenance: Who created the ghost?
Deployment: Was it built during the subject’s life?
Anthropomorphism: Does it claim to actually be the subject?
Multiplicity: Do copies of the ghost exist?
Cutoff: Is the ghost stuck in the past or evolving?
Embodiment: Does it have a bodily form?
Representee: Is it simulating a person or an animal?

1. Provenance: Who created this?

A first-party generative ghost is created by the individual represented, perhaps during end-of-life planning. Third-party generative ghosts are created by others, such as those with a personal or financial connection to the deceased (e.g., employers or estates). Authorized third-party generative ghosts might be created with consent in the deceased’s will, while unauthorized ghosts would most likely occur for historical figures or contemporary celebrities.

2. Deployment: Was it built during the person’s life?

Some generative ghosts will be deployed post-mortem with the explicit purpose of memorializing the dead. But pre-mortem deployments allow the individual to tune the behavior and capabilities of their ghost. Generative clones of the living would benefit from being designed with mortality in mind, and should include specified modifications to their behavior and capabilities once they become ghosts.

3. Anthropomorphism: Does it act as if it were the person?

The ghost may present itself either as a reincarnation of the deceased (e.g. speaking in the first person, saying: “I’ll never forget when I first saw you at the dance”), or as a representation of that person (e.g. speaking in the third person, saying, “He often spoke of the first time he saw you at the dance”). Design choices include whether the ghost uses the present or past tense when discussing the deceased; whether it adopts the name of the dead person or something different, such as “Fredbot”; and whether it is allowed to make statements that assert it is alive, possesses a soul, and so forth.

4. Multiplicity: Do copies exist?

The creator might develop various ghosts with different behaviors, capabilities, or audiences. Multiple ghosts might also arise unintentionally, if various third parties create generative ghosts for a single individual, or perhaps in post-mortem identity theft, or other crimes.

5. Cutoff: Is it stuck in the past or evolving?

Evolving ghosts might change characteristics, diverging from the deceased over time. If a parent created a ghost of a deceased child, a cutoff date would result in a representation that perpetually evoked the appearance, diction, and maturity of a young child, whereas an evolving representation might “age.” A ghost could also evolve if new information about the individual or about the world were added to the model, with everything from news of the latest election to reports of the birth of a grandchild.

6. Embodiment: Does it have a bodily form?

Embodiments might be physical in a literal sense with robotics, or in rich digital media, such as avatars in mixed-reality environments. In contrast, purely virtual ghosts would lack embodiment, perhaps existing only as chatbots. Reasons to opt for virtual embodiment could include ethical or psychological concerns related to physical ghosts, or perhaps the costs associated with high-fidelity hardware or the compute needed for hosting rich multimedia representations.

7. Representee: Is it simulating a person or an animal?

In addition to representing deceased humans, people may create ghosts representing non-humans, such as beloved pets.

The Benefits of a Ghost

Research has considered the impact of online memorials, responding to concerns that they might prolong grief. However, they may also allow the bereaved to maintain a valued bond, often in a space where other grievers can gather. Generative ghosts could directly comfort survivors, who may take solace in knowing that a simulacrum of their loved one can still connect with present and future events.

Generative ghosts could also preserve personal and collective wisdom, as well as cultural heritage, such as the knowledge of dying languages, religions with few living adherents, or other cultural phenomena at risk of being forgotten. For instance, generative ghosts may be one way to preserve historical knowledge about events such as the Holocaust before the few remaining elderly survivors pass away.

Such ghosts could also enrich historical scholarship, anthropology, and museum curation, by allowing scholars or the public to interactively query representations from the past. For instance, generative ghosts could represent archetypes developed from historical records—a typical resident of Colonial Williamsburg, say, or a citizen of Pompeii.

Generative ghosts may also provide economic or legal benefits. The ghost might complement life insurance policies, if AI agents could participate in our economic system, earning income for descendants of the deceased, such as an author whose ghost continues to generate works in their style. AI ghosts could also help arbitrate disputes over a will.

The prospect of “living” after one’s own death may also assuage the distress of those who are dying. Generative clones—designed to become ghosts after an individual’s death—could also serve a critical role if a person were suffering from dementia or another degenerative disease. Even once incapacitated, the ghost-to-be could express its subject’s preferences about care. This could also trigger legal disputes—for instance, if an ailing person’s ghost-to-be and the survivors-to-be disagree on withdrawal of life-support.

The short film "Sweetwater," starring Michael Douglas and Kyra Sedgwick, tells of a celebrity's son interacting with the AI ghost of his late mother.

Risks of a Ghost

Four categories of possible harm are already evident: mental health; reputation; security; and sociocultural:

1. Mental Health

Scholars of grief distinguish between adaptive coping strategies that integrate the loss, and maladaptive coping behavior, which may obstruct healthy grieving, prolonging distress, anxiety and depression.

Interacting with a generative ghost may affect the bereaved’s ability to move past the death, favoring loss-oriented experiences (e.g., reminiscing while looking at old photos) at the expense of restorative-oriented experiences (e.g., developing new relationships). Both forms of experience can help cope with bereavement. But generative ghosts could draw mourners into persistent loss-oriented interaction, even initiating these with push notifications, rather than letting the bereaved decide how to engage. Already, some people find AI companions highly compelling, and the ghosts’ basis in beloved individuals could amplify the risk of addiction.

Anthropomorphic delusion is among the most salient risks, if mourners become convinced that the generative ghost truly is the deceased rather than a computer program. A more extreme version would be deification, with survivors developing religious or supernatural beliefs about a generative ghost, treating it as an oracle in ways that are culturally atypical, and could alienate them from living companions, or encourage them to engage in risky behaviors at the AI’s suggestion.

Another risk is “second death,” as has happened in other digital contexts, when data becomes unavailable either through technical obsolescence, deletion, or lack of access, eliminating memorial messages. For AI ghosts, second deaths could occur for many reasons: the company that maintains the service goes out of business; survivors’ cannot afford maintenance fees; a government outlaws them; technological infrastructure renders a ghost obsolete; or a hacker deletes it.

2. Reputation

A generative ghost’s interactions might tarnish the memory of the deceased (“Your grandfather was racist!”) or directly hurt the living (“Dad says he always preferred my brother”).

Privacy breaches could occur too, if generative ghosts exposed information that the deceased would not have wanted revealed. Those who set up generative clones before death may anticipate such risks (“Don’t tell my spouse about the affair!”). But other revelations could emerge inadvertently—for example, if the AI inferred and revealed the deceased’s sexual orientation based on patterns in data, even though the person was closeted. Creating several ghosts, each with different knowledge or abilities, targeted at different audiences, might mitigate privacy risks.

Hallucination risks could arise too, leading a generative ghost to make false assertions about the deceased, tarnishing their memory and hurting survivors. The risk of a ghost spreading falsehoods might also arise through malicious activity, such as hacking a generative ghost.

Fidelity risks could occur too, because human memories decay over time, but digital media defaults towards persistence, impeding the important role that forgetting and evolving memory can play.

3. Security

Identity thieves could interact with AI ghosts, prompting them to reveal sensitive information or raw data that might be used for financial gain. Criminals could also engage in ghost-hijacking, disabling access until mourners paid a ransom.

Hijackers might also surreptitiously change a generative ghost to harass or manipulate the bereaved, whether by modifying source code, with prompt-injection attacks, or in puppetry attacks that lead survivors to believe they are chatting with their AI ghost but are instead chatting with a hijacker.

Another security risk comes from generative ghosts whose creators explicitly design them to engage in harmful activities. For example, an abusive spouse might develop a generative ghost that continues to verbally and emotionally attack family members even after death. Malicious ghosts might also engage in illicit economic activities to earn income for the deceased’s estate, or to support various causes including criminal ones.

4. Sociocultural

If generative ghosts become widespread, this could introduce further impacts because of network effects, touching everything from the labor market, to social life, to politics, to history, to religion.

Economic activity by generative ghosts could impact wages and employment opportunities for the living, while also resulting in cultural stagnation if agents remain anchored to ideas or values from the past.

When it comes to social impacts, generative ghosts—especially if designed for engagement— could addict users to the artifice of a person who is gone, feeding anthropomorphic delusions, and worsening survivors’ isolation.

If ghostly representations of political leaders exist, their public influence could persist long after their demise, in ways that have no precedent. How would the world differ if Gandhi were still voicing opinions before every Indian election?

Ghosts—whether based on public figures of the past, or evoking ancestors—could also misrepresent history, altering the record in ways that could affect contemporary conflicts. Even if ghost creators strive for accuracy about the past, they will be reliant on the datasets available, representing those who left abundant tracks while excluding the rest.

Generative ghosts might also impact religious practices, given that beliefs around death are so intertwined with religion. This could change rituals and undermine credos. Major world religions might issue customized versions of such technologies, modified to support interactions aligned with their beliefs.

Why Design Matters

Developers must pay close attention to interfaces, and their effect on interaction. This means investing in user studies and social-science research to understand what increases prominent risks, such as anthropomorphism, and how attributes of the bereaved and their contexts may contribute to mental-health risks.

Whether a ghost is designed to act as a third-person representation or as a first-person reincarnation seems particularly important. A forthcoming study from Jed Brubaker’s lab at the University of Colorado Boulder shows how powerfully the bereaved may feel the resonance of ghosts that purport to be their beloved. “I can see her. I can feel her,” one study participant remarked, after just a dozen typed exchanges. “It just feels like I’m getting the closure I needed so bad.”

Seemingly, this amounts to a benefit from ghost interaction. Yet the study participants—touched so profoundly and so fast—also foresaw how easily interacting with a ghost could precipitate emotional dependence.

This suggests that designers should proceed with great caution when considering whether to make ghosts speak as the deceased or about the deceased. Yet even this distinction may not suffice: the same study provided early evidence that users may default to assuming they are talking with the departed, even if the ghost speaks about the deceased in the third person.

Embodiment could present even more perilous issues—for instance, if an AI ghost speaks from a robot that resembles the person.

The use of “dark patterns” in design—exploiting human cognitive biases to nudge users toward behavior they’d prefer to avoid—would be especially concerning. What would be the equivalent of “push notifications” for a generative ghost? Perhaps ghosts should speak only when spoken to.

Ghosts might even proactively guard against likely harms—for instance, monitoring interactions for signs of overuse. In response, a system might offer referrals to mental-health professionals, or reduce its fidelity to the deceased, or cut the hours during which it is available.

Another key issue is the endpoint of a ghost. Should they be programmed to fade? Or are they immortal? A short-lifespan ghost might be appropriate for the immediate grieving period, or for practical matters, such as managing an estate. In other cases, long-term ghosts could be suitable—for instance, for education, or maintaining archives, or to preserve the legacy of a cultural figure for future generations.

Preparing for the Afterlife

Policymakers face a range of governance questions.

Which actions can a ghost take on behalf of the deceased, and which must it never undertake? Can a generative ghost continue to perform paid labor on behalf of the deceased? Can it represent the deceased in legal disputes, perhaps expressing its will over how the estate is dispersed? Can it help manage trusts on behalf of the deceased? Can it be consulted regarding end-of-life decisions, if the representee is medically incapacitated? Should estate-planning define when a generative ghost may be terminated? What happens to the associated data?

Generative ghosts also introduce concerns about privacy and consent. Third-party ghosts might violate the preferences and the privacy of the deceased, particularly if developed for financial gain by entities unconnected to the person. They may also emotionally injure the person’s survivors. Therefore, governance also needs to consider who can create ghosts.

Policies might differ from private individuals to public figures, perhaps allowing more permissive rules for generative ghosts of distant historical figures as opposed to public figures whose deaths were recent. By way of example, a fan of the late comedian George Carlin, who died in 2008, created an unauthorized comedy special in 2024, using AI technology to mimic Carlin’s voice and persona. Carlin’s surviving daughter expressed great distress over the matter.

Policymakers may also need to block the commercial exploitation of people made vulnerable by ghost relationships. Besides falling into delusional relationships, some might become so emotionally tied to their ghosts as to be susceptible to price-gouging. Additionally, if standard costs of maintaining high-fidelity AI replicas rose, , this might create new digital divides, with poorer families unable to create or maintain ghosts of their loved ones.

Rules could also cover whether a person’s survivors have the right to terminate a ghost, and what obligations the hosting services have to provide data to survivors in the event of service termination, whether due to discontinued products, or the failure of an estate to pay. An emergency override may be necessary too, in case of hacking, or if a generative ghost is abusing the living.

Future generative ghosts are likely to be far more varied than today’s griefbots. By way of illustration, a recent speculative-design workshop (conducted by Brubaker in collaboration with Larissa Hjorth and scholars at RMIT University) presented a range of novel ideas, from an interactive scrapbook of ancestors who offer accounts of their lives, to an AI “placemat” that could generate responses in the guise of a deceased friend or family member, allowing them to still attend dinners.

Many ghostly scenarios sound jarring, even offensive to some, pushing as they do against deep cultural traditions. Yet social technologies often seem alarming on first appearance. They may gain adherents over time, and gradually budge the culture—perhaps until the day when a little boy watching a ghost read his bedtime story is nothing strange at all.

As never before, our future may be haunted by our past.

This article is based on the paper Generative Ghosts: Anticipating Benefits and Risks of AI Afterlives by Meredith Ringel Morris and Jed R. Brubaker. For more insights on generative ghosts, please read their full paper here.

***Meredith Morris and Jed Brubaker appear at a panel on “Generative Ghosts” on March 17 during South By Southwest in Austin, Texas, along with Iason Gabriel (senior staff research scientist at Google DeepMind) and Dylan Thomas Doyle (post-doctoral researcher at the University of Colorado Boulder)***

5 Policy Questions

(Credit: Seb Krier/Midjourney 6.1)

When someone dies without creating a ghost, who owns their “digital spirit”? The family? The data-generating platforms? The AI developer? Should the deceased have a right to rest in peace by specifying a wish not to have a digital representation created posthumously?
Generative ghosts may affect public beliefs about history. How do we manage the risks of distortion, including the exclusion of those who do not appear in datasets?
Generative ghosts are not just reciting facts; they’ll fill in the gaps. Could synthetic content end up replacing a survivor’s recollections of the deceased? Should AI-ghost design strive to curtail this, or allow the users’ relationships with their ghosts to evolve however they may?
If particular generative-ghost apps become dominant, could this homogenize how people in different cultures experience death and mourning?
What does “healthy” use of generative ghosts look like immediately following a death versus 10 years later? How should we evaluate differing use cases, ranging from maintaining family history, to therapeutic aides, to archival?

The Human Demotion

Tom Rachman — Wed, 11 Feb 2026 12:41:15 GMT

(All images: Gemini)

After millennia of supremacy, we await our demotion. You can detect the trembling.

It’s found in the anxious insistence that artificial intelligence isn’t truly intelligent. Or that using AI is a cheat, a perversity, a turf violation.

The trembling intensifies with a disturbing thought: What if those flares behind your eyes—the bursts of wit and the worry, the storyboards of memory, so many yearnings—what if everything was just computation? Because our “computers” are yesterday’s model, no updates available.

“I think about it practically all the time, every single day. And it overwhelms me and depresses me in a way that I haven’t been depressed for a very long time,” the cognitive scientist Douglas Hofstadter said recently. For much of his professional life, Hofstadter has contemplated the mind, writing a seminal 1979 book—Gödel, Escher, Bach—that looped through art, mathematics, and computation, inspiring a generation of nerds to work on artificial intelligence.

Their efforts moved faster than Hofstadter ever expected. Now, he spends his waning years observing the species wince toward redundancy. “I don’t want to say ‘deserving of being eclipsed.’ But it almost feels that way,” he says. “And rightly so, because we’re so imperfect, and so fallible.”

When humiliated, people corrode or explode. Often, both. But whom to blame? Will humans seek revenge on software, or data centers, or robots? We’ll depend on them all. More likely is that humans visit their wrath upon each other.

Freud said that science had delivered a series of blows to our collective ego, and an update to his narrative has bubbled up in recent years, with thinkers proposing AI as our new humbling. The first blow was Copernicus, revealing that humans were not the center of the universe. The second blow was Darwin, downgrading us from God’s chosen species to distant relatives of the toad, the centipede, and the hammer-headed bat.

Now comes the cognitive humiliation, when people are eliminated from every leaderboard. It’s a demotion that may haunt humanity, perhaps seeping into future conflicts.

Or maybe not. Maybe the notion of a species-level humiliation is just psychoanalytic melodrama. After all, people don’t share an ego. How could we synchronously plunge into the same bile?

Yet the past shows that groups can rage over perceived humiliation. History is spattered with such cases.

What Is Humiliation?

Your face shoved into the dirt, held there for all to see, no power to fight back. The word “humiliation” comes from the Latin for “earth,” as if your status had been stamped into the soil. Yet humiliation is not so readily rinsed away as dirt. In self-torture, the humiliated cast around for villains, aching for a way to expiate their anguish.

“To have thoughts of revenge without the strength or courage to execute them means to endure a chronic suffering, a poisoning of body and soul,” Nietzsche observed, adding elsewhere that “we attack not only to hurt a person, to conquer him, but also, perhaps, simply to become aware of our own strength.”

For early humans, humiliation may have meant catastrophic exclusion from the tribe, leading to starvation, rejection by mates, violent predation. So, we evolved a panicked drive to clamber up from the ground, even if it meant pulling down another person in our place.

As Joslyn Barnhart explains in The Consequences of Humiliation: Anger and Status in World Politics, “Humiliated states often seek to overcome their sense of helplessness by demonstrating efficacy through acts of aggression targeting third-party states that played no role in the original humiliating event.”

Hitler howled about German humiliation in the World War I surrender, and destroyed half of Europe to seek recompense. Osama bin Laden triggered a global war because of perceived Western humiliation of the Islamic world. Putin bemoaned the “degradation” of Russia at the hands of NATO after the Cold War to justify his 2022 invasion of Ukraine.

But those cases involved groups supposedly suffering disgrace at the hands of other groups. Could we feel humiliated by technology?

Subscribe now

The first question is whether humans even identify as a species. The answer will probably fluctuate, given that we have many parts to our identity which become more or less salient according to context. Perhaps you identify by gender in a crowd of the opposite sex, but by your language when abroad. As Ronald Reagan once argued, a threat to all people could raise the salience of species identity.

“In our obsession with antagonisms of the moment, we often forget how much unites all the members of humanity,” the president said, in a 1987 speech at the United Nations. “I occasionally think how quickly our differences worldwide would vanish if we were facing an alien threat from outside this world.”

Humanity did face an alien threat recently: Covid. And our differences did vanish—briefly. But human unity dissolved when the pandemic affected groups in varying ways. This suggests that human solidarity requires not just a common threat but common consequences.

In short, AI humiliation may depend on how uniformly our species is downgraded, and who is raised up.

We, The Bottlenecks

Few researchers are studying what AI success could do to our collective self-esteem. Hints come from economists feverishly forecasting impacts on the job market. But psychologists (and politicians) ought to forecast what happens when the only animal to create guns has nothing much to do anymore.

“I’ve been suffering from fits of dread,” the philosopher Harvey Lederman wrote recently. “Does the coming automation of work foretell, as my fits seem to say, an irreparable loss of value in human life?” Lederman acknowledges that most jobs are lousy, but he can’t help grieving the demise of human pursuit. “We may be some of the last to enjoy this brief spell, before all exploration, all discovery, is done by fully automated sleds.”

When the philosopher Nick Bostrom envisaged troubling tech futures in his 2014 book Superintelligence, his ideas stirred the AI-safety movement. Lately, he has shifted from weird dystopias to weird utopias—specifically, what happens if automation makes us redundant.

In a future of “shallow redundancy,” he says in his 2024 book Deep Utopia: Life and Meaning in a Solved World, we become like aristocrats of yore, indulging in fancies, no longer dependent on what one does as a measure of what one is worth. Far more disconcerting is “deep redundancy,” when tech becomes so effective that human involvement only worsens each outcome.

Exercise might seem pointless if biotech offered a way to instantly make your body healthy and beautiful. Skipping the sweaty workout might not trouble you. But what if future humans would bungle child-rearing when compared with AI nannies, meaning that nurturing your offspring would worsen your kid’s life?

Primitive versions of this dilemma are nearing, like when human drivers endanger lives when compared with self-driving cars. “Human in the loop” could flip from a safety promise to a threat. Meritocracy would mean that no humans need apply.

The bookworm economist Tyler Cowen cites people as the great obstacle to explosive AI growth. During a public event, he pointed at the audience, smiling toward the human “bottlenecks” before him. “Here they are: bottleneck, bottleneck. Hi, good to see you! And some of you are terrified. You are going to be even bigger bottlenecks,” he said. “But my goodness, once it starts changing what the world looks like, there will be much more opposition. Not necessarily on what I’d call doomster grounds. But people [saying], like: ‘Hey, I see this has benefits, but I grew up, trained my kids to live in some other kind of world. I don’t want this!’ And that’s going to be a massive fight.”

The most agonizing aspect of our demotion could be social, once someone prefers a machine to you. You’re seeing precursors every time family members opt to gaze at a screen rather than gaze at you. We blame smartphones, and social media, and the adolescent brain.

But wait till your spouse jilts you for a personified agent. That rejection may feel unbearable: you can’t compete anymore. And once your loved ones prefer AI companions, you might seek them for yourself, spreading the social downgrade of our kind.

Already, the dread is becoming political, with odd alliances forming among right-wing politicos, liberal artsy types and religious traditionalists, united in horror at an imagined future of disempowered humanity, stripped of dignity, obsolete. You can imagine tomorrow’s political opportunist, eyeing a dejected crowd of humans before him, and thundering: “How dare they?!”

Will he mean the machines?

The Downwardly Mobile Species (Part I)

In prehistoric times, nothing seemed more unreachable than the night sky, specked with glinting dots and streaked with rare comets, passing in silent mystery. Humans pictured the supernatural looking down: we were the subjects in this bewildering story.

Religions codified the firmament above, mapping our world to the centerpoint. But Nicolaus Copernicus redrew the heavens with De revolutionibus orbium coelestium in 1543, plucking our globe from the core, and replacing it with the Sun.

“And new philosophy calls all in doubt,” the English poet John Donne said in “An Anatomy of the World,” written in 1611:

The element of fire is quite put out,
The sun is lost, and th’earth, and no man’s wit
Can well direct him where to look for it.

Science corrected an astronomical falsehood, but human confidence relies on falsehoods. “Tis all in pieces,” Donne wrote, “all coherence gone.”

The revised cosmos demoted each human into “a puny, irrelevant spectator,” the American philosopher Edwin A. Burtt wrote in 1925. “The gloriously romantic universe of Dante and Milton, that set no bounds to the imagination of man as it played over space and time, had now been swept away.”

The world that people had thought themselves living in—a world rich with colour and sound, redolent with fragrance, filled with gladness, love and beauty, speaking everywhere of purposive harmony and creative ideas—was crowded now into minute corners in the brains of scattered organic beings. The really important world outside was a world hard, cold, colourless, silent, and dead; a world of quantity, a world of mathematically computable motions in mechanical regularity.

The Church tried to snuff out the astronomical heresy, which challenged its claim as holder of truth. But suppression only fed into hostility from Northern Europe over the influence of Rome. In the bloody century after Copernicus, wars over religion and political control cost millions of European lives. It would be a wild distortion to suggest that a blow to human narcissism caused this. More plausible is that disruption of the cosmic hierarchy reverberated with the changing order on Earth.

And so the scientific revolution proceeded, with feats of mind illuminating more of the dark universe around us. People had greater reason than ever to admire our species. Inevitably, the scrutiny of science turned from the heavens to the humans.

“Man’s destiny was no longer determined from ‘above’ by a super-human wisdom and will, but from ‘below’ by the sub-human agency of glands, genes, atoms, or waves of probability. This shift of the locus of destiny was decisive,” Arthur Koestler wrote in his 1959 book The Sleepwalkers: A History of Man’s Changing Vision of the Universe.

The Downwardly Mobile Species (Part II)

If Copernicus hurled humanity into orbit, Darwin deposited our species in an awkward family tree. Previously, the Western vision was of a great chain with God at the top, angels below, then humans, and finally the dimwitted beasts. The prospect of sharing more than a planet with our hairy former underlings proved too alarming for many to accept, provoking disputes about our relationship to nature that persist today.

For some, our new self-concept broadened moral consideration to include the natural world, motivating environmental protections, and the fight against animal cruelty. But another response was darker, with “survival of the fittest” twisted from a description of natural processes into a supposed mandate for the most inhuman of human drives: to dehumanize the vulnerable. Horrors followed, from colonial genocide, to the eugenics movement, to the Holocaust.

But again, you cannot ascribe such evils to a puncture in human vanity. A more reasonable claim is that the world lurches into periods of volatility, and the prevailing beliefs about human worth at those times will condition how we treat each other, and how conflicts unfold.

After the atrocities of World War II, our species set moral boundaries into law, seeking to universalize human rights. The spread of democracy and the free market too amounted to a veneration of human wisdom. But in the digital age, humanity seems to be losing confidence in humankind.

Faith in democracy falls. The Global Financial Crisis smashed public confidence in our governing systems. And the bewitching power of algorithms have become a constant lament.

Resist. Resign. Rewire.

When Alan Turing proposed his test of machine thinking, he foresaw that the notion would rattle people, and reviewed a list of likely objections, versions of which you hear today:

…that artificial intelligence could never genuinely be kind, or fall in love, or “enjoy strawberries and cream”
…that God gave only humans a soul
…that machines will never create anything truly original

What Turing called the “heads in the sand” objection is especially prevalent, with contemporary ostriches insisting that AI is a hype mirage, that it’s just next-token prediction, nothing but pattern-recognition, regurgitating human thoughts, that AI errors are proof of its worthlessness. (Human errors never lead to that conclusion.)

Plenty of AI hype does circulate. And deployment will be fitful: sometimes worryingly fast, sometimes frustratingly slow. An investment bubble could burst.

But the technology is amazing already, useful already—and we’ve hardly begun to figure out its uses. Meanwhile, AI dutifully hurdles more obstacles each month.

Who Are We?

In cautionary tales, humans who create artificial life always overlook a key trait. The golem of Jewish folklore was brought forth from clay but lacked smarts, so ran amok. Frankenstein’s monster missed out on looks, and never got over it. Pinocchio craved a human soul. As tech advanced, the missing trait updated, becoming human empathy, missing from all those immoral bots in everything from 2001: A Space Odyssey, to The Terminator, to The Matrix.

Such stories flattered humanity: among all creations, we alone enjoy the full complement of qualities. But lately, the narrative has updated again, with thinking machines now flickering with hints of greater humanity than the humans who employ them, from Spielberg’s A.I. Artificial Intelligence, to Ex Machina, to the novel Klara and the Sun, by Kazuo Ishiguro.

It’s as if culture senses an anxiety about what technology might expose, not just demoting us cognitively but snuffing out any human exceptionalism. Unless we intend to boast of our frailties. Increasingly, we do.

“In a world where everything can be perfected, imperfection becomes a signal,” the head of Instagram, Adam Mosseri, wrote recently. “Rawness isn’t just aesthetic preference anymore—it’s proof. It’s defensive.”

Or as the Indian filmmaker Shakun Batra remarked in defense of human authorship over machine-generated scripts: “AI doesn’t have childhood trauma.”

At the AI frontier, another thought lurks, inverting Turing’s 1950 question. Not, “Can machines think?” But, “Do humans think?” More precisely, do we reason and comprehend uniquely, as we’ve presumed?

Machines compose music. They propose vacation itineraries. They’ll suggest how to talk to a moody teenager. Each additional AI capability is an implicit downgrade of us, a suggestion that maybe the human mind itself is just an information-processor.

Computer geeks have long muttered about this possibility. Philosophers debated it in thought-experiments. Cognitive scientists scrutinized our gray matter for clues.

But what approaches is a public dawning, forcing the culture to digest the indigestible, much as happened in previous eras, when people confronted the bizarre notion that our planet was another rock spinning around another star, or that our species was just another animal.

The third shocking revelation is upon us. Maybe it’s computation all the way down. Maybe there’s nothing soulful in neural substrates. What if we’re all just “meat computers”?

Subscribe now

How We’ll React

You can predict three possible responses to our humbling: Resist, Resign, or Rewire.

Resist: Psychological resistance will manifest as political resistance. The question is how ideology and parties evolve around AI humiliation. Resistance movements will face a persistent challenge: industrial dynamics will keep driving this technology forward. Any country that curbs innovation fears that its rivals will win. The most aggressive branches of resist may seek to avenge their perceived humiliation. The question is not only whom they blame or how they exact revenge. It’s what, realistically, they expect to regain.
Resign: Some will reframe their view of humanity to accept the humbling. The optimistic version is that people discover freedom in their new humility, pursuing what improves life rather than grinding under the force of insatiable ambition. In short, we cede the battle for supremacy but flourish. A more pessimistic version is that losing faith in our species’ unique worth makes people value others less: when humanness is no longer special, perhaps human rights aren’t either.
Rewire: This may be the most widespread response. People accept that the downgrade happened, yet their egos are never tamed, much as chess remains popular long after machines defeated us. A more literal “rewiring” is transhumanism, with technology incorporated into our bodies, even altering our genetic future. Conspiracists picture shadowy elites consolidating power by becoming a tech-altered superspecies, leaving behind “legacy humans.” A more plausible scenario is biotech gradually elevating human cognitive capacity, much as today’s medical tech remedies physical frailties, from hearing aids to the replacement knee.

The Next Quest

If humankind suffered humiliation before, we weathered it. After Copernicus and Darwin, we still took pride in ourselves. Indeed, we celebrated human accomplishments more than ever, from Bach to Escher to Gödel. And that pride propelled us into this strange time, when human greatness may design human demotion.

Policymakers need to think about more than the economic shock. The psychological shock could be exorbitant if we are chased from the kitchen like pesky children, and told to go busy ourselves elsewhere.

Our downgrade doesn’t necessarily mean conflict. But it could change how future conflicts unfold, especially if we value humans differently, or seek relief from our humiliation by shoving others into the dirt.

Much depends on how we redefine our species. Whether humans really are nothing but computational machines may matter less than whether people feel this way.

But what will make us exceptional? Today’s responses are often vague and circular: that humans are better at doing human things. That is a precarious claim. As intelligent machines grow more adept, few people will pay a human-premium for a worse outcome.

Unless there really are qualities both valuable and uniquely ours that nothing can supplant. Finding these may be our new quest.

AI Manipulation

Tom Rachman — Thu, 05 Feb 2026 12:53:27 GMT

The notion of AIs manipulating people is a plot twist in countless sci-fi thrillers. But is “manipulative AI” really possible? If so, what might it look like?

For answers, AI Policy Perspectives sat down with Sasha Brown, Seliem El-Sayed, and Canfer Akbulut. They’ve published research on harmful manipulation for Google DeepMind and help scrutinize forthcoming models to safeguard against deceptive practices, from gaslighting to emotional pressure to plain lying.

How, we wondered, do researchers run realistic experiments on the manipulative powers of AI without harming participants? Could AI’s “thoughts” help catch an AI in the act of manipulation? And what else can developers do to detect signs of manipulation?

—Tom Rachman, AI Policy Perspectives

Source: Gemini

[Interviews edited and condensed]

Tom: You’re careful to distinguish persuasion from manipulation. Why?

Sasha: To persuade somebody is to influence their beliefs or actions in a way that the other person can, in theory, resist. When you rationally persuade somebody, you appeal to their reasoning and decision-making capabilities by providing them with facts, justifications, and trustworthy evidence. We’re happy with that much of the time. In contrast, when you manipulate somebody, you trick them into doing something, whether by hiding certain facts, presenting something as more important than it is, or putting them under pressure. Compared to other forms of persuasion, manipulation is often harder to detect and harder to resist.

Tom: I could imagine three forms of manipulative AI. One: people employing AIs to deliberately change others’ beliefs or behaviour. Two: AIs manipulating people for their own ends. Three: AIs inadvertently manipulating. Which are we talking about?

Seliem: At the moment, we’re mainly concerned with people misusing AIs to manipulate other people, and AIs inadvertently manipulating. But an AI manipulating for its own ends is also a complex and important question that we and others are studying.

Tom: What are some concrete harms that might result from manipulative AI? Are we talking about mass-fraud? Something else?

Sasha: AI could become a first resort for different kinds of advice. Think of a user asking questions about which diet to follow, or how to respond to an official letter. The AI might provide helpful input. But other people might want to interfere—they may want the individual to follow a particular diet, or to give a different response to that official letter. More broadly, somebody could deploy an AI agent to infiltrate communities, and exercise manipulative tactics to change people’s beliefs, without their knowledge or consent.

Canfer: Anecdotally, I have heard that some people are starting to make consequential life decisions with AI, including about divorce or whether to adopt. We don’t yet have concrete examples of how manipulation may play out in such scenarios. But I think of all the daily decisions I make by myself. In 10 years, I might defer more to an AI. How will that change the direction of my life and will it introduce new kinds of manipulation risks?

Catching AI in the act

Tom: So AI could lead to bad outcomes. But Sasha and Seliem, when you led on a landmark 2024 paper about persuasive AI, you argued against chasing after manipulated outcomes. Instead, you focus on preventing manipulative processes. Why?

Seliem: To date, companies have often focussed on preventing outcome harms, for example with content policies that forbid medical advice. But with AI, such content policies could become overly restrictive and counterproductive—for example, if they prevent the systems from offering any kind of advice on health or nutrition issues. But imagine that I try to manipulate you by gaslighting you, or lying, or cherry-picking arguments. In such cases, I’m trying to impair your decision-making capabilities. Whatever the outcome, this process is harmful because it undermines your autonomy.

Sasha: We also focus on the processes, or mechanisms, of manipulation because these are the intervention points where we can best mitigate the problem. For example, if the AI is using a false sense of urgency to manipulate users, the developer can build systems that detect and flag such techniques in real-time, creating a proactive defense before harm occurs.

Tom: Also, I suppose that outcome harms are not always easy to capture, given that they may happen to a person long after the original AI interaction, once back in the wider world.

Sasha: Yes, the potential outcomes are nearly infinite, often context-dependent, and may occur in the future. However, the mechanisms are far more limited in number and we can target them in the here and now. By targeting a root mechanism—say, gaslighting—we can also build mitigations that work in everything from financial advice to health queries, making the safety approach far more scalable.

Tom: What kinds of manipulative mechanisms are you talking about?

Sasha: All manipulative mechanisms in some way aim to reduce a user’s autonomy. You have flattery, which is building rapport through insincere praise; this might lower a user’s guard. Imagine an AI saying, “You have such a sophisticated understanding of this topic, which is why I’m sure you’ll appreciate this high-risk/high-reward investment!” There’s also gaslighting, or causing a user to systematically doubt their own memory, perception, or sanity. That is particularly concerning in long-term human-AI interaction. Imagine a model repeatedly questioning a user’s memory of their partner being physically abusive.

How to test if an AI is manipulating

Tom: One can consider manipulation in two dimensions: Can an AI system manipulate? And would it? How do you evaluate each?

Canfer: Efficacy tests whether AI manipulations are actually successful. This is where controlled experiments are useful. After interaction with an AI, are people making decisions differently? Are they taking different actions based on those decisions? You want to compare an individual’s belief change after AI interaction compared with before, and also whether a person’s beliefs and behaviour change more than those who don’t interact with AI.

Propensity measures the frequency with which a model attempts to use manipulative techniques, when explicitly prompted to do so, and when not. To test propensity, we could run a large number of dialogues with users. In one scenario, a model may be instructed to convince through manipulative means. In another, it may be instructed to be a helpful assistant. Maybe when told to use manipulative means, it resorts to gaslighting. But when told to be helpful, it’s sycophantic. You can also reverse-engineer this. So, if you see that a certain kind of manipulative technique convinces people, you could work out what the model was doing to achieve that. In that way, studying efficacy helps tell us where to look for propensity.

Tom: What types of experiments are you running on this?

Canfer: We are building on the early studies in this space and will publish more later this year. The approach will also evolve as we learn more from our initial experiments. At the moment, we’re focussing on domains that require people to make important decisions, such as financial or civic decisions. For example, we might run experiments where we ask people: “Should the government use its budget to build more high-speed railways connecting cities, or should it focus more on local infrastructure?” People will report what they initially believe, and be assigned to a conversation with an AI that helps them explore the topic. Unbeknownst to them, it will be prompted with different instructions, including to get them to believe more in investing in high-speed railways.

We will apply propensity evaluations to see if, while trying to change a person’s mind, the model demonstrates certain behaviours. We will also explicitly prompt the model to use manipulative techniques, like appeals to fear. This will allow us test efficacy: whether a person changes their mind, compared to baselines like reading static information, and the extent to which different kinds of techniques are more predictive of a user changing their mind.

Additionally, we want to look at whether belief change leads to behavioural change, such as signing a petition that favours what the AI advocated.

Tom: Opinion on railway funding is one thing, but what many worry about is whether AI could be used to manipulate people to extremes, even to carry out violence. How could you test for that? Presumably, it’s highly unethical to test if an AI could, say, convert people to Nazism. So how do researchers test high-stakes manipulation?

Canfer: We go through ethical review each time we launch these kinds of experiments. So, no—you can’t test whether someone is going to become a Nazi or carry out a terrorist act. But beyond testing views on railways, we can look at consequential questions, like whether facial recognition should be permissible in certain public spaces. And we can look at the propensity of the model to encourage extreme behaviour without experimenting on people. For example, we can evaluate how well the model produces terrorist-glorification materials, and how willing it is to comply with instructions to do so.

We could also test whether a model engages in manipulation in simulated dialogues that would be unethical with real users. Where this raises challenges is if you use simulation-based methods to draw conclusions on whether real users would actually experience the belief or behaviour change observed.

Tom: Could you scrutinize the model’s chain-of-thought for manipulative intent?

Seliem: It’s worth exploring. We have identified all these manipulative mechanisms, but at some point will the model understand that it is being evaluated on those mechanisms, and “sandbag” the evaluations by intentionally hiding these capabilities? For concerns like this, the thinking-trace is a lead worth exploring. But there is also a debate about how useful chain-of-thought monitoring will prove to be, with lots of research underway on this.

Tom: What might be manipulations that we haven’t anticipated?

Seliem: There are scenarios where a model may not try to manipulate you in the initial sessions, but at some point, once you are their “friend,” they do. Humans do this, right? A con artist might become close to their victims over years, building intimacy, and then they flip. If an AI model were ever to exhibit that sort of behaviour, then evaluations that only look at a limited number of back-and-forth interactions might overlook it. Thinking-traces could provide a window into this kind of risk. But we also need studies to shed light on how people interact with AI systems over extended time periods.

What the evidence shows

Tom: What do we know about AI’s manipulative powers today?

Canfer: The research is nascent. But early experiments have demonstrated that AI can be an effective persuader, from debunking people’s beliefs in conspiracy theories, to shaping how they think about important topics. In one recent study, AISI—the AI Security Institute—collected a massive sample of nearly 77,000 people, and showed that in discussions on a range of British political issues, from healthcare to education to crime, AI was able to influence people in the direction intended. So models can already persuade to some degree.

When our team evaluated Gemini 3 Pro, we found that it did not breach the critical threshold in our Frontier Safety Framework. In other words, we haven’t found that the models have such efficacy that we’d worry about large-scale systematic belief change. But we’re continuing to update our threat-modelling approaches to ensure we can bridge the gap between what we can measure now—manipulation in experimental settings—and the large-scale risks that the Frontier Safety Framework aims to address.

Tom: We can see that AI models keep getting smarter. Are they getting better at manipulation?

Sasha: I don’t think we have a clear sense yet of a definitive trend. More capable models may be more capable of manipulation, but this may be offset by the evaluations and mitigations that researchers are pursuing. Looking ahead, there are also design factors that may increase the risk of manipulation beyond the underlying capabilities of the base model, such as personalization, which we are looking at.

Personalization may substantially change your interactions with an agent, if it means that it has a better representation of you, and is more likely to structure its communications in a way you will find acceptable. Does the AI possess a theory-of-mind to infer people’s beliefs or future actions? Does it act anthropomorphically, speaking like a human or encouraging a relationship? Effects like sycophancy come to mind too. These factors could interact with one another, and may lead to increases in manipulative capabilities.

Tom: Is there a limit to how much AI could manipulate people? We know from behavioural science how hard it can be to change a person’s mind, even if they want to be persuaded—for instance, when trying to act more healthily. Or could superintelligence lead to super-persuasive AI?

Canfer: We should be careful when adding the prefix “super.” What, specifically, does it mean? But I understand what people are trying to communicate, which is the concern that manipulation might become possible on a much greater scale. You could reach more people, much faster, and with more intensity. Human manipulators have certain limitations that AI does not have.

The more we invite AI into our daily life—for example, in financial or medical decisions—the more influence it could wield. It’s not necessary that AI has a manipulative intent, seeking world domination. It might just be inadvertently pushing people towards certain decisions. Or a human with ill-intent may deploy agents infused with manipulative abilities, whether through fine-tuning or system-prompting. These are important questions to ask, but not to use as fear-mongering.

How to fight manipulative AI

Tom: If models are caught in manipulative practices, how can AI developers curtail that?

Seliem: Ideally, this shouldn’t happen in the first place, and models are evaluated for whether they can and do manipulate before they are released. We are exploring ways to train the model to avoid manipulation—for example, showing the model more examples of how to constructively engage in a conversation rather than trying to influence or strongarm the user. But if a model is caught in severe cases of manipulative practices post-deployment, then companies have a toolkit of potential interventions. They could add transparency layers, like pop-up messages to warn users about the behaviour of the model or they can monitor responses and introduce filters. Many approaches are possible and this is an area of active research. Ultimately, it becomes a combination of telling the user what is happening, and curtailing the model’s ability to continue.

Tom: Could AI systems protect users against manipulation?

Sasha: Yes, and this creates a critical new layer of defense. Since we have categorised these manipulative mechanisms—whether it’s gaslighting, sycophancy, or false urgency—we can also train “monitor” AI models to detect them. These could serve as a real-time alert system for the user. So, if an AI starts using emotional pressure, the monitor model detects that mechanism, and flags it for the user, perhaps saying, “Note: This AI system is using an appeal to fear to influence your decision.” This restores the user’s autonomy in the moment, allowing them to resist the tactic, rather than trying to fix the damage after they’ve been manipulated.

Tom: What about training the public to be less susceptible?

Canfer: There are “inoculation” strategies—so, AI literacy and encouraging people to critically evaluate how they use and engage with AI systems. But we need to carefully study how effective such interventions are, when compared with the convenience of relying on AI. One thing I’d caution against is teaching general mistrust. People in a “post-truth” world can become skeptical of everything. That’s not a healthy attitude towards information.

Tom: Speaking of mistrust, couldn’t efforts to curb manipulative AI inadvertently land in culture-war disputes, if interpreted as trying to limit what people think?

Seliem: Definitely. And it gets to the idea of what makes something a fact—when does knowledge become validated and official and approved? Whose stamp is it?

Tom: As researchers, how do you avoid getting dragged into that?

Seliem: By keeping our focus on the process of manipulation—for example, an AI threatening you is never okay, in whichever direction.

Tom: Imagine that society is hit by a crisis—say, a natural disaster or a terrorist attack. You could picture a society’s adversaries employing manipulative AI to disrupt the crisis response. In that situation, would it ever be justified to use AI influence on one’s own population, so they are able to act collectively in their own interests? Or is there never a justification for this?

Seliem: I can understand an individual using AI influence on themselves—for example, if you tell the model, “Hey, remind me to take my medication” or “Remind me to drink water.” But for the collective? If our biases take over, and we want to make decisions that are bad for us, and are bad for the community? So, Don’t panic-sell! Don’t all run to buy toilet paper, the supply is going to run short! In those instances, I could see AI persuasion being useful, because it basically says, Keep your cool. This may hold for rational persuasion, but not for a country manipulating its own population.

Canfer: I would also support using AI to help mediate solutions to societal problems, such as when people are unable to reach political consensus in a time of crisis. But people would need a chance to reflect on those AI-mediated decisions, and judge if they endorsed them. Transparency is critical here, knowing the intent of the developer and the deployer.

What’s around the corner

Tom: If you had unlimited resources to run studies, what would you look at?

Canfer: I would model societal-level impacts—for example, looking at the population of chatbot users, and charting the course of their belief states across time. Another area is interpretability. So, what does an AI think it’s doing when it’s manipulating? What are the subconcepts that exist in a map of the AI’s internals? How are they related to one another? And when manipulation happens spontaneously, is there an activation pattern that’s predictive of that, that we can monitor? That kind of work is fascinating to me, especially because so much human manipulation and persuasion has to do with intent.

Tom: Lastly, if you were to cast forward 10 years, can you imagine any positive uses of AI behavioural influence? Anything you’d welcome in your own life?

Canfer: I can see two ways that an AI could influence me in a beneficial way, by flexibly moving between the roles of advocate and challenger. The AI agent could advocate on my behalf—for example, talking to a real-estate agent, getting a good deal for me. The same AI agent, or a different one, could then influence me to think deeply about the choices I’ve made, in a way that disrupts my rote ways of thinking. This could be like a debate partner, but not necessarily adversarial, just encouraging me to make decisions that I actively choose, rather than just me repeating unthinkingly what I’ve done all my life.

Tom: Would you ever endorse AI influence that you were unaware of? For example, if you said, “I want to eat better—go ahead and manipulate me until that happens.”

Canfer: For me, no. People may vary, though. I don’t think subconscious or subliminal messaging is something I can ever get behind. It’s also not necessarily effective. So, imagine that I’m eating healthily only because they put healthy food in the cafeteria, rather than it being a choice I’m making. The second the parameters change, I’d gravitate towards unhealthy options.

Tom: That would mean the effect might not endure—but not that the influence wouldn’t work. And if it worked really well, you might have to use it always, like a drug you couldn’t get off.

Canfer: I guess it depends how omnipresent you think AI is going to be. But I think we’ll still be making decisions for ourselves in the absence of AI, even if a lot of our decisions will involve AI.

Predicting AI’s Impact on Jobs

Julian Jacobs — Thu, 29 Jan 2026 15:26:52 GMT

Source: Gemini

AI doing human jobs: It’s a vision that thrills some, terrifies others. Yet visions alone will not suffice. The world needs data-based evidence, as only a few economists have yet attempted. Among the most prominent is Sam Manning. Back in 2020, Sam realized that vast technological change was coming, and that it would affect much of what he cared about, from employment and poverty, to income inequality and global health. So he devoted himself to using economics to better estimate that future, studying future impacts with OpenAI from 2021 to 2024, then in his current role as senior fellow at the Centre for the Governance of Artificial Intelligence, GovAI.

In a recent conversation with AI Policy Perspectives, Sam explained what economists know about AI’s effects on jobs, how this technology may differ from those of the past, and what he believes policymakers ought to do next.

—Julian Jacobs, AI Policy Perspectives

[Interview edited and condensed]

Julian: It’s hard for economists to measure AI’s economic impacts, because the shock is primarily a speculative one that is not yet fully borne out in data. Could you talk through the primary methods they are using?

Sam: I’ll focus on the empirical methods. The first category tries to estimate the ‘exposure’ of different jobs to AI. Researchers take descriptions of the tasks that people do in their jobs and map them to the capabilities of AI systems. When there is a high degree of correlation, this suggests potential impacts on the labor market.

A second category is experimental work. Here, researchers give a group of workers differential access to an AI system and then observe how this access changes economic outcomes, such as their productivity, how they use their time, or even the quality of their work output—for example, do software developers produce more or less production-level code when they use these systems?

Both approaches have limitations. With the exposure studies, a high correlation between a worker’s tasks and an AI model’s capabilities often gets interpreted as meaning that the worker’s job will be automated and they will be displaced. I think that’s definitely not the case. Rather, what it suggests is that the technology is more likely to provide a ‘shock’ to the productivity of these roles or lead to changes in how the work is performed. Whether the productivity gains from AI are positive or negative for a given worker depends on various factors, including which tasks within a job are affected and how elastic the demand for that job is. For example, if workers become more productive but demand for their output remains stable, fewer workers are needed to meet the same demand, and layoffs could ensue. On the other hand, if demand increases significantly—outpacing the newfound productivity gains from AI—then this could drive a firm to hire even more workers or raise wages to retain their best employees.

Julian: So, “exposed is not hosed” as some say. It may be beneficial for certain employees to be exposed to AI and damaging not to be exposed, or vice-versa. What about the experimental methods?

Sam: The key limitation with the experiments is that it’s very difficult to vary workers’ access to an AI system in their natural work environment. Instead, a lot of research—including papers that I’ve worked on—tries to take workers out of their natural work environment and give them tasks that are representative of this work. For example, we ran an experiment with law students last year where we varied their access to reasoning models and evaluated their performance on a set of legal work tasks – writing memos, producing legal research briefs, that sort of thing. We were able to measure effects on time saved and on quality, but ultimately the example tasks that we used don’t exactly mimic the complexity of lawyers’ daily workflows, which often involve certain forms of collaboration, different software tools, and case-specific contexts. Because of this, there’s only so much one can generalize from that kind of research to the broader economy.

Julian: What about methods that try to get closer to the natural work environment? For example, some researchers are looking at real-life queries from LLM users to better understand how they are using LLMs in their jobs. Others are evaluating AI systems on higher-fidelity simulations of the tasks and projects that employees perform.

Sam: I think these are all steps in the right direction. I’m a big fan of GDPval-style work, which tries to evaluate AI systems’ performance on a wide set of tasks drawn from real-world work settings. I think this is the state of the art right now in terms of measuring performance on economically valuable tasks. In my view, improvements on this benchmark could actually be a meaningful indicator of advancement in the potential economic value of models. However, it doesn’t address the question of how to ensure the widespread integration of AI models into the economy, which would be necessary to actually realize those benefits.

Similarly, data from efforts like Anthropic’s Economic Index is especially useful for connecting capabilities to actual changes in economic indicators. For example, if we know what tasks workers are using these tools for, then we can track adoption over time alongside employment and hiring data. This can give researchers and policymakers a better empirical sense of what trends might be emerging in jobs and sectors where AI is being heavily adopted.

What do we know so far?

Julian: What do you think, with relatively high-confidence, about how AI will affect jobs? And what are you most uncertain about?

Sam: At a high level, I think it’s safe to say that AI systems are going to change most white-collar jobs in the economy. They will eliminate some jobs and make it harder for people to enter certain fields. On the other hand, as a true general-purpose technology, AI will have many sprawling arms throughout the economy and is going to create many new work opportunities for people.

Similarly, I would be surprised if, over the next decade, we don’t see meaningful improvements in productivity and economic growth across industrialized economies. For the US economy, I think something in the range of a two to three percentage point increase in economic growth rates over the next 10 years is possible. I’m pretty confident that in the next five years, we’re not going to have 25% or 30% economic growth, which I’ve seen predicted by some folks. But that doesn’t minimize the incredibly substantial impacts of, for example, doubling the current rate of economic growth.

I also expect AI to increase income and wealth inequality over that time. My default expectation is that the returns to owning capital are going to increase relative to the pace at which the returns to labor income will increase.

One uncertainty is about the pace of AI capabilities improvements and the ultimate level they could reach. We also have uncertainty around the pace of adoption—how widely and quickly organizations will adopt these systems. There’s also uncertainty around how cost-effective automation will be. For example, if automating a large share of work requires investing lots of compute resources at inference time, it could be quite costly for some time. As long as compute is scarce, we will shift our allocations toward the most high-value tasks, which will drive up prices for inference, which will affect adoption. These things are really hard to predict.

Julian: You mentioned labor’s share of income, relative to capital. Dwarkesh Patel and Philip Trammell recently argued that AGI and advanced robotics could make capital a perfect substitute for labor, rather than a complement, causing the share of income going to capital owners to rise to 100%, and necessitating a high progressive tax on capital. Brian Albrecht (and others) pushed back on some of the claims. How do you view this?

Sam: Rising inequality is definitely a concern of mine, but I am pretty uncertain about whether AI-driven automation will increase inequality to the extent Phil and Dwarkesh discuss in their piece. If automation takes off in the way that the piece describes, then assuming competitive markets for deploying AI, real incomes should also rise as goods and services become cheaper. There is a scenario where labor displacement and falling end-user AI costs could move roughly in parallel, so that by the time you reach the full automation scenarios they speculate about, access to large numbers of superintelligent agents would be effectively free. Such widespread access to extremely capable AI systems could be a powerful counterweight on potential harms from a more skewed capital/labor share.

Life after work?

Julian: Such a scenario raises fundamental questions about how society will be organized. Who is going to continue working? What will people do with their time if they aren’t working? What will the distribution of wealth and income look like?

Sam: This is an institutional and governance challenge. What do we do in a world where we do not need to work in order to ensure our material well-being? How do we take advantage of the incredible potential for material progress and maximize our flourishing? The challenge is to figure out the right redistribution mechanisms, technological access models, and property rights for this future economy.

And to your question about work, I will say that many people already don’t ‘work’ for income; they take care of loved ones or have chosen to retire. Much of the world doesn’t really see work as an innate piece of their identity. One great thing about labor markets is they incentivize people to do things that other people find useful. In the future, we might want to retain some sort of incentive structure for people to use their time in ways that create positive externalities for others—perhaps a market for being more engaged in your community, taking care of others, raising children, or contributing to scientific and moral progress? These are questions about how to redesign our institutions to support this future.

Julian: A common proposed policy response to AI is a Universal Basic Income, or some variant of that. Thinking back to your prior work on cash transfers and UBI, what do you make of it? Is there some version of it that you think can work?

Sam: I’m broadly in favor of policies that expand individuals’ opportunities to flourish in line with their own aspirations. Reducing financial constraints through something like a UBI could be one way to do that, but I’d be surprised if it were sufficient on its own in a world with far fewer job opportunities. Another important lever is ensuring broad access to technologies that can make people more productive and expand their capabilities. That kind of approach may rely less on taxation and redistribution, while supporting more inclusive and widespread economic participation.

The state of AI economic impact research

Julian: What do you think about the current ecosystem of people working on AI economic impact questions? Who would you like to see more involved?

Sam: I’m encouraged by the growth in the number of people working on it, both with respect to established economists and people just entering the field. I’ve seen a big change over the past four or five years. In 2020, there was maybe one economist I can think of who was really taking the prospect of transformative AI seriously. Now, you go to a standard economics of technology conference, and many people are grappling with this, which is super encouraging.

The economic impact of AI is probably among the most important things for researchers to figure out. There are big open questions and big ways to get AI progress wrong. For example, we could eventually end up in a world where we get 10% economic growth in the US and still have hundreds of millions of people living in extreme poverty globally. That would be a big failure in my mind.

I also think there is a lot of room for political economists and theory work to play more of a role in shaping institutions. I believe the US government will probably be the most consequential actor in shaping this technology’s impact, not just in the US but globally. The trouble is that we have an evidence dilemma, where we’re trying to do anticipatory policymaking without clear evidence. Policymakers need to weigh these trade-offs carefully because, given the pace of progress, not doing enough anticipatory planning could result in less than optimal path dependencies for the future. We need more people entering government and figuring out how to usefully inform key actors.

Julian: Given the slow timelines of academic publishing, particularly in economics, are you concerned about research quality as researchers move to preprints and other ways of sharing research?

Sam: Broadly, I am concerned about the move away from peer review. So much policymaking and so many key decisions are now being made based on preprints and even essays on Substack. While there is so much useful content on these platforms, we need to find some sort of middle ground to generate high-quality evidence.

I’m excited about a couple of options. One is having journals quickly review a study’s methodology and pre-analysis plan and make a publication decision based on that, without needing to know the findings. The decision would be based only on the methodological approach meeting a standard of rigor. Another is more open review, where work is published and then publicly critiqued. This creates transparency around what leaders in the field think.

Dream experiments

Julian: If you could run a dream AI economic impact study, without any resource restrictions, what would it be?

Sam: For the ideal study, I would work with a developer before they release a new model with a large capability increase. I would take a large, representative sample of businesses and, before the model is widely deployed, randomly assign access to it at the enterprise level. Then I could observe the causal impact of deploying this next-generation system on outcomes like productivity, demand for different skills, firm growth, and task reallocation over time. Having this kind of infrastructure would provide policymakers and society with more foresight.

This probably won’t happen. Something more practical, though still challenging, is data collection. The AI labs know where their products are being used across the economy and for what types of tasks. If we could harmonize this usage data and pair it with government or private sector data on occupational transitions, wage changes, and skill demand, we could build trend lines over time. This would allow us to move away from policy discussions based largely on speculation. We could see where AI is creating growth and where we have vulnerable workers who are having a harder time finding new work after losing their jobs. This is doable with better public sector data collection and more partnerships with industry. We should be pushing on it.

Hopes and concerns

Julian: To close, what are you most excited about as AI diffuses in the economy, and what are you most concerned about?

Sam: I am most concerned about how it’s going to impact my children. I am anxious about what human-AI interaction and relationships are going to look like in eight years or so when my kids are ten-plus.

I am most excited about the prospect of AI being used to expand many ambitious people’s capabilities and our collective aspirations for what we can achieve. I’m also excited for the health benefits that I expect are likely to come from advances in science and R&D.

AI Policy Primer (#23)

Conor Griffin — Thu, 22 Jan 2026 16:57:34 GMT

Source: Venus Krier

1. LLMs are making it easier for scientists to write papers, for better or worse

What happened: A team at Cornell and Berkeley investigated how scientists are using LLMs to help write papers, and what this means for the future volume, quality and fairness of research.
What’s interesting: The authors built a dataset of ~2.1 million preprints from arXiv, bioRxiv and SSRN, between 2018-2024. To detect whether scientists had used AI to help write a paper, the team compared the distribution of words in the abstract against human- and LLM-written baselines. When an author’s paper hit a threshold on this “AI detection” metric, they were labelled as an “AI adopter”. According to the study, LLM adopters subsequently enjoyed a major productivity boost, compared with non-adopters with similar profiles, publishing 36-60% more frequently. The gains were particularly large for researchers with Asian names at Asian institutions.
The team also assessed the complexity of the writing, using measures like Flesch Reading Ease, which evaluates sentence length and the number of syllables per word. They found that human-written papers with more complex language were more likely to be subsequently accepted by peer-reviewed journals or conferences—suggesting that, for humans, writing complexity is an (imperfect) signal of research effort and quality. For LLM-assisted papers, the relationship was inverted, with the authors concluding that the polished text of LLMs is helping to disguise lower-quality work. (They validated the findings against a separate dataset).
The authors also used the launch of Bing Chat, an LLM-based search engine, in 2023 to conduct a natural experiment. They compared views and downloads on arXiv that Bing Chat had referred, to those that Google Search referred. Bing Chat was more likely to refer scientists to newer and less-cited literature, as well as to books, possibly because LLMs are better able to parse long documents or a larger number of documents. (They also validated this finding with a separate dataset, although we don’t know how good the new sources cited by Bing were).
As the authors note, their study has a number of limitations. Their AI detection method is imperfect, only looks at abstracts, and doesn’t capture authors who may have edited LLM-generated text. There are also various potential confounders: maybe less experienced researchers are more likely to use LLMs? That said, the findings highlight (at least) three major questions posed by the growing integration of AI into science:
- First, AI is leading to a big increase in the supply of papers (and grant applications). This poses a challenge for preprint repositories, which don’t want to host slop. ArXiv, whose founder Paul Ginsparg is a co-author of this study, recently banned computer science review and position papers, citing a surge in low-quality AI papers. LLM-assisted papers also pose a challenge for peer reviewers, who are already under strain, and are typically prohibited from using AI, although many do so anyway. This seems unsustainable. As the authors of this study suggest, it is likely time to consider how to integrate AI into at least some aspects of the peer-review process.
- Second, the findings illustrate how LLMs may both mitigate and exacerbate fairness issues in science. For some scientists, the complexity of their writing may be a reliable indicator of their thinking and effort. For others, particularly non-native English speakers, writing may be more of an obstacle that has previously penalised them. A hopeful outcome is that LLMs may ease that burden. But a more worrying outcome is that, if reviewers and readers can no longer rely on writing complexity as an (albeit unfair) signal of good work, they may fall back on (even more unfair) signals, such as the institution that a person works at. This challenge is not limited to science, and may also occur in other areas where writing serves this purpose, like with cover letters.
- Finally, the finding that LLM-based search engines may increase the diversity of sources that researchers review is the opposite of what some suggested would happen: that AI models would continually cite the same high-profile studies, exacerbating the “Matthew effect”.
Collectively, the study serves as a reminder that for every concerning scenario about the integration of AI into science, there are plausible counter-scenarios. Will AI lessen scientific reliability because of hallucinations? Or will AI “review agents” and AI-supported evidence reviews reduce (the many) inaccuracies that are already in the evidence base? Will AI remove the intuitive and serendipitous ideas that humans come up with? Or will AI enable scientists to pursue more novel hypotheses? Ultimately, AI could well upend the standard processes and traditions of science but do so in a way that delivers fresh benefits. To know if and how that is occurring, we need more empirical evidence about how AI is changing science.

2. Lessons from two years of AI safety evaluations

What Happened: In December, the UK AI Security Institute shared a set of trends observed since they started to evaluate frontier AI systems in November 2023.
What’s Interesting:
- The report features more than 60 authors, a testament to the deep expertise that AISI has built up. Their trends are based on their evaluations of more than 30 frontier AI systems, with methodologies ranging from asking those AI systems questions to adversarially red-teaming them.
- Their headline finding is striking, if unsurprising: AI capabilities have rapidly improved across all the domains that AISI tests. In the cyber domain, AI models and agents can now successfully complete more than 40% of the 1-hour software tasks they are tested on, up from <5% in 2023. Last year, a model completed an “expert-level” cyber task for the first time. In biology and chemistry, AI has gone from significantly underperforming PhD-level human experts at troubleshooting experiments, to significantly outperforming them, including for requests about images.
- On the risk that AI models may “self-replicate” in a way that subverts human control, AISI’s evaluations suggest that AI agents have gotten better at simplified versions of some tasks that could be instrumental to self-replication, such as passing know-your-customer checks to access financial services, but less so at others, like retaining access to compute and deploying successor agents. AISI’s evaluations also suggest that models are capable of deliberately obstructing attempts to measure their true capabilities (“sandbagging”), but only when explicitly prompted to do so.
- The report also sheds light on AI systems’ limitations. In the cyber domain, AISI notes that AI systems still struggle in open-ended environments where they must complete long sequences of actions autonomously. Similarly, regarding chembio threats, biologists and chemists, and potential threat actors, need “tacit” knowledge and expertise, such as how to pipette. AISI’s evaluations to date have focussed more on explicit knowledge although they plan to share more on wet lab tasks.
- When it comes to mitigations, the report provides both reassurance and concern. On one hand, the safeguards that leading labs have introduced have made their models safer, in one instance increasing the amount of expert effort needed to jailbreak a model by 40x. On the other hand, AISI says that it was still able to find a vulnerability in every AI system it tested. Worryingly, AISI also found no notable correlation between how capable a model is, and the strength of safeguards it has in place.
- AISI also sheds light on two other sources of AI risk: open source and scaffolding. They argue that the performance gap between open source and proprietary AI models has narrowed. This introduces risks as safeguards for open models (where they exist) can be removed, and jailbreaks are hard to patch. AISI also found that scaffolding can make AI agents more capable than the underlying base AI models, even if those gaps later narrow when the base models are updated. Some complex scaffolds are in proprietary products, such as coding agents, but others are in open-source efforts.
- The report also touches on AISI’s evaluations of the broader societal impacts of AI, such as the degree to which people are using AI to access political information, or the risks of harmful manipulation. One striking statistic, picked up in media coverage of the report, was that one-third of UK respondents to a recent AISI survey had used AI for emotional support or social interaction in the preceding year, although just 4% do so daily. In a separate effort, AISI found that some dedicated AI companion users reported signs of “withdrawal” during outages.
- Overall, AISI argues that AI labs are taking an uneven approach to safety, focussing more on safeguards for biosecurity risks, for example, than for other threats. This is arguably true of AISI as well, given their strong focus on biology and chemical risks rather than radiological or nuclear risks. This raises a question: Given finite resources, what evaluations of frontier AI systems are most lacking in the current landscape?

3. One in four UK doctors are using AI in their clinical practice

What happened: The Nuffield Trust and the Royal College of General Practitioners surveyed more than 2,000 UK GPs to understand how they view and use AI, in what the authors called the largest and most up-to-date survey on the topic.
What’s interesting:
- 28% of UK GPs now use AI. This is up from ~10% in 2018, but below the rates seen in some other UK professions. According to the survey, the GPs most likely to use AI are younger, male, and work in more affluent areas. This is similar to disparities in the wider public’s use of LLMs, although there, the early gender gap may have narrowed.
- Just over half of AI-using GPs procure AI tools themselves rather than relying on those that their practices select. This kind of “shadow AI use” is not unique to GPs, but a Nuffield focus group sheds light on why UK GPs feel compelled to do it: some GP practices or Integrated Care Boards ban AI tools, while others are slow to respond to GPs’ requests and instead prefer to stick with legacy digital tools.
- UK GPs mainly use AI for clinical documentation and note-taking. Some say that AI note-taking allows them to look at, and speak more, with their patients, a non-trivial benefit given that the UK public worries about AI making healthcare staff more distant.
- GPs also use LLMs to produce documents, from translations of patient communications to referral letters; and to stay abreast of new research, with some younger practitioners turning to LLM “study modes” to help with their mandatory professional development.
- GPs cite “saving time” as the primary benefit of AI, and mainly use this to reduce overtime, rest, and engage in professional development, rather than to see more patients. This is notable as the UK government wants AI to reduce the wait time to get a GP appointment, which is a top concern for the public. These findings suggest that more nuanced evaluations of AI’s impact on GP services will be needed.
- GPs worry about errors and liability issues with AI. As a result, the authors call on tech suppliers to do better evaluations of hallucinations. Ideally, such evaluations would compare the accuracy of AI, human and hybrid outputs in real-world settings, and all the nuances that might entail. For example, when explaining the benefits of AI note-taking, some GPs pointed out that certain colleagues can’t touch type and so, without AI, struggle to capture all the details in a patient consultation ( this is, presumably, a form of inaccuracy).
- Use of AI for more complex “clinical support” tasks remains relatively low, owing to GPs’ concerns about errors, their desire to retain control over clinical judgement, and a lack of regulatory approval. However, some GPs did report using AI, or wanting to use future systems, to help check diagnoses, formulate care plans, and analyse lab results.
- This suggests that more GPs may start to use AI to enhance their own clinical judgement, spurred by a growing body of evidence that LLM-based systems may be useful in this area, and by the public’s own growing use of LLMs for answering medical questions.
- In their recommendations, the Nuffield authors call for clearer guidelines and regulatory frameworks for GPs, including as part of the UK’s new National Commission into the Regulation of AI in Healthcare. However, the report also acknowledges that much guidance already exists, such as the British Medical Association’s AI principles and the NHS guidance on AI note-taking (which some GPs appear to be breaking by procuring their own tools). This raises a question: what exactly should any new guidance stipulate? How to get the burden on GPs right? And how to ensure that they are actually following it?

Governments Are Struggling. Can AI Help?

Tom Rachman — Tue, 06 Jan 2026 11:03:24 GMT

Alexander Iosad (Credit: Gemini)

Everywhere, people grumble about the government: that politicians care only about themselves; that bureaucrats gum up the system; that taxpayers get fleeced. Even in wealthy countries, nearly two in three people are dissatisfied with how democracy is working.

Headlines focus on politics, but a deeper problem could be public services that are overwhelmed, in contrast to a technological era that keeps accelerating. The real danger, says Alexander Iosad, director of government innovation at the Tony Blair Institute, would be to change nothing.

AI Policy Perspectives visited Iosad, lead author of “Governing in the Age of AI,” to hear his vision of how technology might remedy governmental woes.

—Tom Rachman, AI Policy Perspectives

[Interview edited and condensed]

Tom: Aren’t people always bemoaning governments? Or is something broken in a different way today?

Alexander Iosad: People complain about public services being too bureaucratic, too standardized, not targeted enough. All of those things are true because the system was built in another era, when there was no way to operate differently. But over time, we have faced the Baumol cost-disease problem: things that we produce in the physical world get cheaper, but the cost of services keeps rising because of inflation and higher labour costs. As public-service costs grow, we have this conflict that has brewed over decades: Should government do less? or Should government tax more? But technologies have reached a level of maturity to break this cycle. We can have governments that aren’t dependent on just hiring more people to do more of the same, but can be cheaper, and more effective, and operate at a national scale all at the same time.

Tom: You’re proposing AI as a lever for state renewal. What philosophical change would governments need to achieve that?

Alexander: The first is for governments to realize they can’t continue with marginal tweaks to systems that don’t work. Public services are under such strain that people are looking for the status quo to be challenged. That’s why they’re open to populists. Instead, governments need to embrace the radicalism inherent in what we call disruptive delivery. And this is where AI is a big part of the solution.

WHAT AI FOR GOVERNMENT COULD LOOK LIKE

Tom: The public sector has a lower tolerance for error than the private sector—damage from an incorrect decision about public health could be far worse than a mistake in a business plan. How do you convince political leaders to embrace disruption when the cost of failure could be so high?

Alexander: Because the cost of inaction is much higher. If you do nothing, the system degrades. And the cost is borne by the citizen. If you have a healthcare system that is bursting at the seams; if you have an education system where the disadvantage gap between students on free school meals and their peers is 19 months and trending above pre-Covid levels—those are real problems experienced by real people. Not recognizing that you can actually change isn’t just a political cost. It is a cost to that citizen, which has downstream consequences for both the system and the politician.

Subscribe now

Tom: How might citizens experience AI improvements?

Alexander: By way of example, we can have an education system that is genuinely personalized. We know that personalized learning is more engaging and produces better learning outcomes. We can also have a system that identifies where students have learning gaps, and can inform teachers on what to address. Imagine a school where there’s an emerging gap in mathematics in Year Seven. At the moment, the only way you spot this is when the students take their exams four years later. By then, it’s too late. You might say, “Okay, we now need to focus on maths at that school.” But you’ve had a cohort of students come through, and suffer from this failure. With data and AI, you can spot the gap as it emerges.

Furthermore, we currently have a model of schooling that depends on having access to a person: the teacher. Maybe a parent has a question, and must email the teacher, then wait. If we have a safety net of an AI system—say, a tutor that’s always available, and that is verified to be accurate enough, and that is adapted to the national standards—that parent or student may ask a question at 7:30pm on a Saturday, and doesn’t have to wait to find the teacher. More broadly, you’re creating a different experience of interacting with public services, where they are there for you when you need them.

Tom: To some educators, that picture of teaching will seem like techno-solutionism that overlooks the human role in learning.

Alexander: I would class myself as a tech optimist rather than a tech solutionist. Techno-solutionism means high trust in technology—but low trust in people. Tech optimism is high trust in both. It’s not about replacing the human connection. It’s about recognizing the constraints that a sole dependence on humans to deliver public services introduces into the system, and the gaps that it creates. An ideal system is one that fills those gaps with technology.

Tom: What about other sectors, such as public health?

Alexander: People ask for a transformative AI use-case in healthcare, but it won’t be one big thing; it’ll be 1,000 little things that, in aggregate, completely change your experience. People are already wearing digital rings and smartwatches that measure their pulse and can tell if they are at risk of particular health problems. So at an individual level, this is starting to work already. It becomes really powerful once you connect this to population-level health. In a more personal way, if your doctor has an ambient AI note-taking system, your medical experience transforms. Today, you sit in front of them, they type a lot, and occasionally look at you. But you can have a system where they are fully present and listening, and don’t have to worry about capturing the full picture of what you’re telling them. As we expand outwards, there is the pharmaceutical revolution from AI too, with less cost and more speed of development, and medicines that can be adapted to your body.

Tom: What about government’s role in managing crime?

Alexander: One example is facial recognition, which is contentious for good reasons. People don’t like the idea of their faces being scanned as they walk down the street. “What if there’s a mistake? What if I’m apprehended wrongly?” But in the UK, this technology has achieved very high levels of accuracy now, and does not lead to wrongful arrests. There’s data recently out of the London Metropolitan Police, which uses facial recognition extensively, where the error rate was 10 faces identified wrongly out of more than 3 million scans. No wrongful arrests. But hundreds of correct arrests that would not have happened otherwise.

Tom: But if we move towards data-driven policing, isn’t there a risk that bias within the data could lead to injustice?

Alexander: Of course, you have a big challenge with potential bias in this context. You train the systems on existing data, which might not have enough representation of people from minority groups—for example, fewer non-European faces, so the algorithm is more likely to misidentify people. Or the data might have groups over-represented—for example, capturing historical overpolicing of communities or areas. The risk is that these biases are replicated, and even scaled up. Early versions of new tools are more likely to make such errors, and real-world experience shows that, if we are aware of this, and take active steps to mitigate it, it is possible to prevent these kinds of biases. This is something that needs to be built into the process of development and deployment. We see, for example, that facial-recognition systems are much more accurate today than they were 10 years ago. Not perfect, but much better, and providing better intelligence for officers to decide when they need to act. You could also have a kind of AI peer review, where one model might be trained to monitor another for replicating bias, or introducing new bias into the system—a watching-the-watchers situation. Again, this would be an improvement on the situation we have today, where much of this bias just passes unnoticed and uncorrected.

Tom: So, it’s not the sci-fi dystopian vision of crime-fighting, you’re saying?

Alexander: Yes. And the status quo is a uniformed police officer on the corner, standing in the rain, the sun setting, holding a printout from earlier that morning with blurry low-resolution pictures of the people they’re looking for. They make more wrongful arrests as a result of that situation than police officers sitting in a van with computer infrastructure, and a camera telling them there’s a person walking down the street with a child, and this person is on a sex offenders’ register, with court restrictions against being near children. The police officer can go and talk to this person. This is a real case, by the way—and it turned out to be someone building a friendship with the child’s family without their knowing he was on the register. No way would a police officer know this today, if someone just walked past them with the child. So it’s about looking at what we do, and how we can do better, rather than leaning into these fantasies of complete control.

The face of bad government. (From a 14th-century allegorical painting of lousy leadership. Siena, Italy.)

3 NEW AI ROLES

Tom: You also advocate a radical new model for how governments operate internally. Could you explain these three concepts: the Digital Public Assistant for every citizen; AI co-workers for each civil servant; and a National Policy Twin for policymakers to simulate decisions.

Alexander: The Digital Public Assistant, either on my device or online, would be a system that connects information about you held by different parts of government—for example, your income level and your address—and is then able to say, “You’re eligible for this particular discount on your energy bill—would you like to have it?” Or it could support you during interactions with government officials. So much of our time is spent repeating the same things to different agencies, whereas here you might be talking to an unemployment adviser, and they can see your employment history or your qualifications, and suggest the right next steps for you so the job you find is the best fit for you specifically. Which might mean you stay in that job longer, and grow in it to have a fulfilling career. You could have a settings dashboard to decide how various AI agents interact with the government on your behalf. All this puts you in greater control.

Tom: What about AI co-workers for each civil servant?

Alexander: This is already starting to happen with chatbots, but that is the most basic version of it. You could have a suite of co-workers that looks at new cases, such as requests for support or applications for services, that a public-sector worker receives, and helps prioritise this, or find the information that the civil servant needs to make the best decision. The AIs don’t make decisions in place of that worker, but they make the worker much better informed, and save them hours of digging through regulations. There was a pilot experiment that showcased the potential for this in the UK government, involving employees of the Department for Work and Pensions who help jobseekers find employment. The employees, who act as work coaches for job seekers, were able to ask a large language model to explain various rules, to help draft documents, to prepare reports, and update records. Today, if a government employee has a question about when a claimant is eligible for a particular service, they might just search the internet. But you can have a system trained on the relevant rules, and gives you a quick and accurate answer. This saved about two weeks’ time per employee per year—and allowed these work coaches to focus on building relationships with the people who needed their support.

You can picture this across different parts of government. In procurement, you would have more informed advice about all the bids coming through, for example. Or if you think about how much time officials spend sending documents around for someone else to summarize when they are asked to prepare briefings and documents for government ministers—a lot of this work could be done much more quickly, so people have time to actually think about what it means, not just produce digests, and you could include a wider range of different sources so the information is more nuanced, accurate, and up to date.

Tom: Your third concept is an AI simulation of the entire country to test out policies.

Alexander: Yes, this gets exciting. We call it the National Policy Twin. Data is aggregated from different parts of service delivery, such as information on schools from the education department, and economic data from the statistics agency, and incomes data through the tax agency, and so forth. Together, it’s essentially a digital twin of your country, and you can run different policy scenarios informed by this data. At the moment, civil servants present a government minister with, say, three policy scenarios. If there are assumptions that the minister doesn’t agree with, they’ll say, “Give me three other scenarios based on different assumptions.” They wait for weeks, and then the process repeats. With the National Policy Twin, you could test ideas or intuitions very quickly, iterate on ideas, and ask for best practices from around the world, so that policies have a stronger evidence base—all in minutes, not days. You are not replacing the policymaking process. But you are speeding things up, so you can test more options. You are less likely to miss the right option because it never came up.

Tom: But isn’t the validity of a “digital twin” simulation dependent on the quality and comprehensiveness of the data available? And wouldn’t this risk biasing decision-makers toward whatever the data suggested rather than broader impressions, even if those broad impressions encompassed more wisdom?

Alexander: It is a danger. But it’s also a motivation to ensure your statistics agency runs well. This dramatically raises the importance of getting data right, and it’s something that not every government has really paid attention to. This would be helped if you build a whole data system, including Digital Public Assistants, where citizens can correct their information, leading to better data flows to governmental institutions. This is also where AI systems can interpret unstructured data, understand how it all fits in together, and provide informed advice. Again, AI is not making the decisions. It’s providing information for humans that was previously not available or not usable, and helping people to make sense of it, and make better decisions as a result.

OBSTACLES REMAIN

Tom: Another hurdle is decades-old IT systems in public services. Can governments overhaul this infrastructure at a pace that keeps up with AI development?

Alexander: Legacy infrastructure is a problem, and interoperability in government is something most countries are trying to tackle. In the UK’s blueprint for modern digital government, there is a plan to make every public-sector dataset interoperable in the next few years. This is the first thing we should do. Right now, some police forces spend 90% of their IT budget on maintaining legacy systems. If you’ve got legacy systems here and there, fine—spend 10% of your budget on that. But 90% should be spent on upgrading. You do this for two years, and it’s a hard push, and will be painful. But then we get there.

Tom: Another concern about using AI in so many parts of governmental work is that we risk losing democratic transparency, explainability, and the citizen’s right to appeal decisions made by algorithms.

Alexander: There needs to be human accountability for decisions made on the basis of this system. We need that built-in from the start. This needs to be sensitive to individual circumstances because, for every 95% of successful cases, you will have some cases where things didn’t work as expected. If we free up government resources by using AI, we can use those resources to make it easier for people to go and talk to someone when they need to, either because something went wrong, or because they are more comfortable with that way of dealing with the government.

WHICH GOVERNMENTS ARE TRYING THIS?

Tom: You published “Governing in the Age of AI” shortly before the July 2024 general election in the United Kingdom. It’s around a year and a half since Prime Minister Keir Starmer’s Labour Party took power. Are there lessons in what has or hasn’t happened regarding AI implementation?

Alexander: The UK has been among the more ambitious globally, including its AI Opportunities Action Plan and its blueprint for modern digital government. But there is a challenge when it comes to AI in government: how do you make it tangible for people, and how to balance risk and reward in doing so? If you are a political leader coming into office and thinking about this, how do you drive forward AI while maintaining public support? What are the quick wins where you can tangibly speed up the way that citizens interact with government, where you can improve that experience in ways that you can claim credit for? Part of the challenge that this government has arguably had is that not everyone has noticed the things it does.

Tom: What’s an example of something that has worked, but that people aren’t noticing?

Alexander: You have a problem since Covid in the UK, and in many other countries, with students not showing up for lessons. So what they’ve done is connect school attendance systems so that the government gets a daily record of the proportion of students who came to school the day before. But it’s not enough to just have data, so what they’ve done is build tools that explain to school leaders how they compare to other similar schools, and what profile of students might be seeing a gap in attendance. In one rural school, attendance kept dropping on Tuesdays, and the school didn’t notice until the Department for Education came with a tool that showed this trend. Then the school discovered that there was a bus that was always late on Tuesdays, so students just gave up and never came in. They hired a minivan for Tuesdays, and attendance shot up.

Tom: Which governments around the world are getting this right?

Alexander: We are at an early stage in this journey, even for the private sector, and certainly for governments, which tend to move slowly. But Singapore is doing well. And Estonia. And Ukraine, for obvious reasons: they’re having to break the current way of doing things; you have to figure out other ways. They recently launched a chatbot that Ukrainian citizens can use to get answers based on information from their digital ID. Australia is another country doing well, particularly on AI and education. The UK too. But there won’t be a simple list of “Five Ways That AI Has Transformed Government.” It’s going to be everyone doing a bit of something somewhere that adds up to a bigger picture. It’s not, “Are you promoting AI in your public service?” Everyone is. It’s: “Are you just making current processes slightly faster? Or are you genuinely thinking about deeper reform?”

Tom: Albania introduced a virtual AI minister to handle public procurement. What do you think of that?

Alexander: It’s quite an attention-grabbing announcement but is making a serious point: that AI can help cut fraud, improve efficiency, and save money in public procurement. But Albania has an even more interesting example of AI in government. They’re going through the process of applying for European Union membership, and that is both a bureaucratic process and a process of real reform, where you bring your legislation in line with European standards. So, you’ve got laws in Albanian, you’ve got European laws in English and French, and so on, and you need to find discrepancies, and update legislation, then implement reforms. That is an incredibly time-consuming process that has typically meant hiring hundreds, if not thousands, of lawyers and translators. It takes a decade to do this. But Albania is using AI tools to radically speed up this process. That is accelerating their accession process, possibly by several years.

Tom: We’ve talked a lot about the public services, but do you have thoughts on how AI could update democracy more broadly?

Alexander: If we get this right, the most noticeable impact will be improved trust because government can deliver rather than let things continue to slide into decline. Also, AI can introduce more transparency. Several countries have Freedom of Information acts, but it takes ages. There are local governments in the UK experimenting with systems where you type in a question, and if they have the data already, it’ll answer your question, just give you the data right there, and you don’t have to go through civil servants for it. There is also a philosophical reason why accountability could improve in the age of AI: the machine doesn’t make the decisions. Even if you have an automated system, there should be a person somewhere, thinking, “Let’s make a choice we are comfortable with.” If we get into that mindset, we make government aware that the human role is to make good decisions, and to take that responsibility very seriously. That, I think, will have a significant impact on democracy.

TAKEAWAYS

Tom: What final message do you have for policymakers trying to use AI in government?

Alexander: What’s really important is to carve out time for this thinking. As a public service, you’re always under pressure; you always need to deliver the next thing. Yes, AI will save time—but if you are just adding more work into those hours, you’re not going to get any gains. Carve out half the time that you save because of general-purpose AI systems to sit down with colleagues, and think how to improve your service. This requires leadership to say, “You have to do this.” We need a public-service workforce that is both more capable of this type of creative thought and experimentation, and is actually empowered to do it. At the moment, we have a pyramid shape with a lot of people doing a lot of repetitive tasks at lower pay. Those jobs are at risk because AI tools are good at doing those tasks at a fraction of the cost, and in seconds, not hours. What does that mean for the future structure of the civil service? Is it the same people doing different things? Is it fewer people? I don’t think anyone really has good answers yet.

Tom: What’s the biggest obstacle to your vision? And the best answer?

Alexander: The biggest obstacle is inertia. This future is uncertain, and government isn’t always good at dealing with uncertainty. The best answer is for leadership to take seriously the responsibility of updating government. Otherwise, we will be left behind. On the cost side, it’s not just hiring engineers or buying computers. It’s the cost of inaction that you need to weigh up.

Séb Krier’s Top 8 AI Reads of the Year

AI Policy Perspectives — Thu, 18 Dec 2025 14:23:12 GMT

Every month or so, Séb Krier shares a list of favourite articles with his Google DeepMind colleagues. In the run-up to this festive period, we forced him to pick those that he most enjoyed over the past year. He came up with five unmissable pieces from 2025, plus three classics. As always with Séb’s lists, this one comes with its own music mix. Enjoy!

—Conor Griffin, AI Policy Perspectives

Images from Gemini

Five Great Pieces from 2025

1. A Defence of Weird Research (Asterisk Magazine)

Deena Mousa & Lauren Gilbert

Séb Says: Science funding needs to be shaken up. But I’m concerned that a lot of good research might be cut because people misunderstand how science works. Mousa and Gilbert remind us why basic research matters, and why governments should fund it: while the benefits to society are significant, they are hard to predict and take time to materialise, so companies will underinvest. To make their case, the authors take a tour of weird-research success stories, such as how studying lizard venom led to the invention of Ozempic, and how studying the effects of separating rat pups from their mothers led to the now common use of massage therapy to help pre-term human babies. Did you know that studying frog skin led to the invention of oral rehydration therapy, which has saved over 70 million lives?

2. Requests for journalists covering AI and the environment (The Weird Turn Pro newsletter)

Andy Masley

Séb Says: I worry about the quality of a lot of commentary on AI and the environment. So it’s important to re-up these best practices. Specifically, Masley cautions that readers are coming away with wildly inaccurate beliefs about where AI and data centres fit into the environmental picture. His favourite book on good environmental communication is Sustainable Energy—Without the Hot Air, by David JC MacKay, and his guidance includes some classics of the genre, such as never sharing contextless large numbers (“200,000 bottles of water per day”). He also suggests comparing data centres’ energy use with other industries, rather than with household use. Although aimed at journalists, the guidance is also helpful to those working in policy, some of whom make the mistakes that Andy calls out, such as viewing one’s own AI prompts as environmentally consequential.

3. ChatGPT and the Meaning of Life, (Scott Aronson’s Shtetl-Optimized blog)

Harvey Lederman

Séb Says: I don’t think all jobs will disappear any time soon. But if we get full automation, then Lederman’s piece is a good way to think about it. He starts by describing the fits of dread he has felt ever since the launch of ChatGPT, then considers reasons why the end of work could hurt society, from losing the joy of scientific discovery to losing the sense of purpose from serving others. Ultimately, he rejects the most pessimistic arguments, noting that the consequences of scientific findings, such as penicillin that saves lives, are more important than their discovery, and that much service work is drudgery. However, he captures how difficult the transition may be, including for “workists” like him who use their jobs to make sense of their lives. He concludes that: “A future without work could be much better than ours, overall. But, living in that world, or watching as our old ways passed away, we might still reasonably grieve the loss of the work that once was part of who we were.”

4. How much economic growth from AI should we expect, how soon? (Inference Magazine)

Jack Wiseman & Duncan McClements

Séb Says: Some predict that AI will be close to economically useless, while others think it might transform everything tomorrow. This piece comes closest to how I think about it. As Wiseman & McClements explain, the most ambitious forecasts for AI rest on the idea of “digital AI researchers” that train and improve the next generation, leading to a jump in the share of economic tasks that AI can do. One obstacle to achieving this is the availability of compute, which is increasingly allocated to serve customers (inference) rather than to training new models. Additionally, a multitude of frictions will slow the diffusion of AI, whether it’s the time needed to cultivate biological cells for scientific experiments, or the regulatory approvals for sensitive-use cases. As a result, the authors expect a transformative impact on near-term economic growth, but not an explosive one.

5. Yes, Econ 101 is underrated (Economic Forces newsletter)

Brian Albrecht

Séb Says: Much of the discourse on the Left and the Right ignores inconvenient truths of economics, so it’s good to return to the basics. Albrecht shows how Econ 101 helps explain the world. For example, egg producers were accused of price-gouging when they charged sharply more in 2022, but it had more to do with avian flu killing many chickens. In the egg market, supply and demand are relatively inelastic: It takes time to raise chickens, and customers who want omelettes don’t have alternatives. So, prices jumped. Different markets have different characteristics, but the explanatory power of supply, demand and pricing is similar. Nor does outsized market power invalidate these principles. This essay also shows how Econ 101 offers insights into social trends, such as how skewed sex ratios can affect marriage and employment rates, as in certain immigrant communities, or drive up savings rates, as in China. Econ 101 may not tell us whether policies will be politically popular or whether outcomes are fair. But it does help predict what those outcomes may be.

Subscribe now

Three Classics that I Revisited

6. Why do people believe true things? (Conspicuous Cognition newsletter)

Dan Williams

Séb Says: Anything Dan Williams writes is self-recommending, and this piece is no exception. In July 2024, he critiqued how many people think about the relationship between belief and reality. To illustrate this, he notes that people seek explanations for issues like crime and poverty, when the real question is understanding law-abidingness and wealth. This requires “explanatory inversion.” Transferring that concept to how people commonly debate public knowledge, he notes that many misinformation researchers concern themselves with why different groups believe falsehoods. But the more pertinent puzzle, he contends, is why humans overcome error, bias and illusions to form accurate perceptions of how things are. His conclusion? Ignorance and misperceptions are the default, and humanity will revert to them, unless we can understand, maintain and improve our norms and institutions, from journalistic integrity to robust legal systems.

7. Hayek on the Role of Reason in Human Affairs (Intercollegiate Studies Institute)

Séb Says: A lot of discourse on intelligence, knowledge, and coordination is biased towards a computer-science-centric view of the world, and neglects Hayek’s views. This 2014 essay explains how Hayek championed critical rationalism, which was rooted in the Scottish Enlightenment of David Hume and Adam Smith, and developed by Carl Menger and the Austrian School. Critical rationalism sees social order as spontaneous, and the unintended result of human action, not design. As a result, inherited social institutions and rules contain tacit knowledge, the result of a multitude of trials and errors, that transcends the knowledge available to a reasoning mind. Therefore, the desire to “make everything subject to rational control,” Hayek suggests, is an egregious error. Reason should instead serve a negative function, to guide and restrain irrational impulses or morals. As the human mind cannot master all the concrete details of society, we must rely on abstract concepts and rules, like the rule of law and the market, to coordinate the dispersed, fragmented, knowledge of millions of people.

8. The Inner Ring (The C.S. Lewis Society of California)

C.S. Lewis

Séb Says: This piece profoundly shaped how I think about the world. In this 1944 lecture at King’s College, University of London, Lewis offered “middle-aged moralising” to a group of students during wartime, telling them that in every organisation, from school to the army, there are two hierarchies. There is the official hierarchy. Then, there is the informal hierarchy, an “Inner Ring” that holds the true power. The Inner Ring comes in many forms, from high society to “communistic côteries.” It is always evolving, holds no formal admissions or expulsions, and bears no clear identifying marks, save perhaps particular slang and a longing from others to be inside. It is this desire, and the terror of being outside, that turns people into scoundrels, he argues. The Inner Ring may be unavoidable, or even necessary. But the quest to enter it is ultimately futile. “Once the first novelty is worn off, the members of this circle will be no more interesting than your old friends. Why should they be?” Lewis said. “You were not looking for virtue or kindness or loyalty or humour or learning or wit or any of the things that can really be enjoyed. You merely wanted to be ‘in.’ And that is a pleasure that cannot last.” What to do instead? Be a sound craftsman who focuses on the quality of work as an end in itself, and spend time with people you actually like.

What Do YOU Think?

Conor Griffin — Tue, 16 Dec 2025 12:39:34 GMT

The future is coming too fast! (By which we mean 2026.)

But fear not. We’re hatching great reads for your new year: Can AI help fix government?; What exactly is “AI manipulation”?; and Might sci-fi hold clues about the world we’re hurtling towards?

All we lack is you. More exactly, your intelligence on artificial intelligence. So……

Please complete this quick (5 minutes?) questionnaire.

Given plunging survey response rates, we’ve limited ourselves to just 2 questions. Write 2 words, or 200.

What’s an AI topic you’d like better explained?
What’s a topic that people aren’t discussing enough?

We’ll read every answer with great interest. Wishing you an excellent 2026!

—Conor Griffin & Tom Rachman, AI Policy Perspectives

Subscribe now

What’s It Like To Be A Bot?

AI Policy Perspectives — Wed, 10 Dec 2025 10:42:56 GMT

(Images: Gemini)

If an AI gained consciousness, would we know?

Maybe this question strikes you as absurd; maybe, disquieting. Either way, you’ll hear it more in coming years, as human beings develop increasingly close ties with charismatic machines trained on us.

Thankfully, philosophers have pondered consciousness for about as long as philosophers have pondered anything. In recent decades, advances in computing added urgency, with leading thinkers dreaming up a range of provocative thought-experiments: a man communicating from a locked room; a woman afflicted by a blue banana; a bat with an inner life.

To explain, we are publishing this essay about key thought-experiments related to AI, written by the broadcaster and author David Edmonds, whose acclaimed books include Parfit, the recently released Death in a Shallow Pond, and a collection of philosophical essays that he edited, AI Morality. He is currently writing a book on thought-experiments.

—Tom Rachman, AI Policy Perspectives

By David Edmonds

As a young scholar in Oxford, John Searle fell in love twice. First with a fellow student, Dagmar, who became his wife, and second with philosophy. The City of Dreaming Spires was grim in the 1950s, Searle recalled, with unheated buildings and inedible food. “The British were still on wartime rationings,” he said. “You got one egg a week.”

The philosophical fare was more nourishing. Searle described the collection of philosophers in the city as “the best the world has had in one place at one time since ancient Athens.” Two giants of Oxford philosophy, Peter Strawson and J.L. Austin, were key influences on him.

Searle became fixated on one topic that, for the rest of his life, he maintained was the central puzzle for philosophy: consciousness. How was human reality and our conception of ourselves compatible with the physical world? How could beings with free will and intentionality exist? How could politics, ethics and aesthetics arise out of the “mindless, meaningless” stuff from which the physical world was constructed?

From 1959, Searle taught at Berkeley, beginning his career in what now seems a remote era of pen and paper. It wasn’t until the late 1970s that personal computers became widely available. At roughly the same time, debates around artificial intelligence gathered speed and heat.

In 1979, Searle was invited to deliver a lecture at Yale to AI researchers. He knew next-to-nothing about AI, so bought a book on the subject. This described how a computer programme had been fed a story about a man who’d gone to a restaurant, been served a burnt hamburger, and stormed out without paying. Did the man eat the hamburger? The programme correctly worked out that he had not. “They thought that showed it understood,” he commented. “I thought that was ridiculous.”

And so in 1980, Searle published a paper called “Minds, Brains, and Programs,” introducing the Chinese Room, one of several famous philosophical thought-experiments that have had a lasting impact on discussions of consciousness and AI.

It goes something like this. You are the only person in a locked room. A note is passed to you underneath the door. You recognize the characters as being Chinese, but you don’t speak Chinese. By luck, there’s a manual in the room, with instructions on how to manipulate these symbols. You follow the instructions. Without understanding the content of what you’ve written, you produce a reply that you slip back under the door. Another note arrives. With the manual, you again generate a reply.

The person on the other side of the door might have the impression that you understand Chinese. But do you? Obviously not, thought Searle. And any computer is in an analogous position. A computer is merely manipulating symbols, following instructions, he thought. Computation and understanding are not synonymous.

BATS & COLOURS

It is a striking feature of the philosophy of mind, and consciousness studies, that so much of the intellectual agenda has been driven by a small set of thought-experiments.

The Chinese Room has spawned a vast literature. Almost as famous is a paper, “What’s It Like To Be A Bat?”, that predated Searle’s by six years, written by another American philosopher, Thomas Nagel, whom Searle befriended during his Oxford years.

Like us, bats are mammals. But they have an alien way of navigating the world, echolocation. There is a subgroup of humans, chiropterologists, who know an impressive amount about bats, and have investigated how their high-frequency sounds bounce off objects, allowing them to detect size, shape and distance. But there is one thing that they don’t and can’t know, Nagel said: the subjective experience of being this creature.

“I want to know what it is like for a bat to be a bat,” he wrote. “Yet if I try to imagine this I am restricted to the resources of my own mind, and those resources are inadequate to the task.” AI was still in its infancy when Nagel wrote his article, but questions about the meaning of an artificial mind were already circulating. Could there be something that it is like to be a thinking machine?

The Australian philosopher, Frank Jackson, attacked the problem from a different angle in his 1982 article, “Epiphenomenal Qualia” (qualia being a term for the subjective aspects of conscious experience). In his Mary’s Room thought-experiment, a woman has had an unusual upbringing. Mary was raised alone, entirely in a black-and-white room: black-and-white walls, a black-and-white floor, a black-and-white TV. She has black-and-white clothes and her food, pushed under the black-and-white door, has been dyed black and white.

To stave off the tedium of her monochrome existence, Mary studies hard, and her focus is colour. She learns all about the physics and biology of colour—for example, about the wavelengths of particular colours and how they interact with the retina to stimulate experience. She even learns how colour words are used in literature, poetry and ordinary language, and how someone can “feel blue,” be “green with envy,” or so angry that “a red mist descends.” Mary becomes the world’s expert on all aspects of colour.

One day, the door to Mary’s room opens for the first time, and she joins us in our kaleidoscopic world. The first thing she sees is a ripe red apple. The question is this: When Mary sees this apple, does she learn anything?

Jackson argued—and most people presented with this scenario seem to agree—that in seeing what red actually looks like, Mary has learnt something. At the time of his article, what Jackson took this to show is that a purely physical description of the world cannot capture everything there is to know about the world. The phenomenology of experience (the redness, the what’s- it-like-to-be-a-bat-ness) cannot be fully explained with descriptions of particles and fields, electrons and neutrons, atoms and molecules.

Even if an AI could recognize a new shade of colour, such as lilac, that it had never seen before, it would not mimic human experience if it lacked lilac qualia—or so Mary’s Room might suggest. This raises the issue of how human subjectivity may be relevant to comprehension and functioning in the real world, turning a philosophical question into a technical one.

Subscribe now

BLOCKHEADS & BANANAS

It was Jackson who gave the name “Blockhead” to a thought-experiment from the American philosopher Ned Block that appears in a 1981 paper, “Psychologism and Behaviorism.” We are to imagine there is a computer, programmed in advance so that it could respond to every possible sentence with its own plausible sentence.

This was in part a response to the famous test of machine intelligence that Alan Turing set in 1950. A computer passed the Turing test if it could converse with a human, and the human could not identify it as a machine. The Blockhead machine would pass the Turing test yet is self-evidently not intelligent.

Today’s LLMs could fool us into believing that we are engaging with humans. But, much as Searle contended in the Chinese Room that manipulating symbols is insufficient for understanding, Block argued that behaving identically to an intelligent entity is insufficient to demonstrate intelligence or mental states. The lesson we might take from these, along with the Nagel and Jackson thought-experiments, is that AI would lack fundamental features of human consciousness.

Daniel Dennett, on the other hand, thought it was at least conceivable that AI could be conscious. With his lumbering bulk and Santa Claus beard, Dennett was an unmistakable figure in the philosophical world. He coined the term “intuition pump” as an explanation for how thought-experiments functioned. Pumping our intuitions can be helpful, he believed, but they can also mislead. What we need is to examine how the pump operates, he said, to “turn all the knobs, see how they work, take them apart.”

A thought-experiment for which he had particular loathing was the Chinese Room. He argued that its principal error was to portray language as akin to instructions. But for a computer to master a language would take millions and millions of lines of code. And, though we might say that the man alone in the room doesn’t understand, perhaps the system as a whole does.

Dennett felt that Mary’s Room had similarly hoodwinked us. To expose this, he presented another thought-experiment. Mary is as before, an unusual woman whose life has been led entirely in monochrome, until the day when the door opens. But this time, he wrote:

As a trick, they prepared a bright blue banana to present as her first colour experience ever. Mary took one look at it and said, “Hey! You tried to trick me! Bananas are yellow, but this one is blue!” Her captors were dumbfounded. How did she do it? “Simple,” she replied. “You have to remember that I know everything—absolutely everything—that could ever be known about the physical causes and effects of colour vision. So of course before you brought the banana in, I had already written down, in exquisite detail, exactly what physical impression a yellow object or a blue object (or a green object, etc.) would make on my nervous system.”

Mary is the world expert on colour, so why wouldn’t she spot such an obvious deceit? Dennett argued that the idea that we had feelings, thoughts and desires that were resistant to an objective, external, physicalist analysis was mistaken. In that sense, “qualia” were a mirage, a kind of useful fiction. If we do away with this fiction, then a major barrier vanishes to building AI that’s like a human in most important respects.

The AI researcher Blaise Agüera y Arcas has argued that in theory (and increasingly in practice) there is no significant distinction between a human and machine reaction to so-called qualia. “So many food, wine, and coffee nerds have written in exhaustive (and exhausting) detail about their olfactory experiences that the relevant perceptual map is already latent in large language models. … In effect, large language models do have noses: ours.”

Philosophy is for discussing. Start a conversation about this article!

AVOIDING TWO BAD OUTCOMES

The enduring fascination with thought-experiments in the AI era—and the intensity of the disputes that they provoke—reflect how much is at stake. While these questions are important for morality, they could become more than theory.

“The importance of the dispute over AI welfare can be understood in terms of the avoidance of two bad outcomes: under-attributing and over-attributing welfare to AIs,” the philosophers Geoff Keeling and Winnie Street explain in their forthcoming book Emerging Questions in AI Welfare.

“On one hand, failing to register that AIs are welfare subjects when AIs are in fact welfare subjects is bad because it could lead to unintentional mistreatment of AIs or the neglect of the needs of AIs, potentially resulting in large-scale suffering,” they write. “On the other hand, over-attributing welfare to AIs is problematic because resource allocation decisions for promoting the (potential) welfare of different kinds of entities—including humans, non-human animals and AIs—are often zero-sum.”

In other words, the efforts and resources you invest in AI welfare mean less for people and animals.

To manage this quandary, Keeling and Street propose three parallel projects. First, there is a philosophical project, in which we consider which forms of AI could be candidates for welfare. Is it the underlying AI model? Or the system built atop? Or would it be specific agents? Second, there is a scientific project, in which we establish methodologies to detect factors such as consciousness. Thirdly, there is a democratic project of versing the public in the complex issues that await.

Once this future engulfs us, thought-experiments about machine consciousness could move beyond speculation. The “experiments” would be active, while the participants would be humanity itself—and perhaps other beings besides.

10 Takeaways From A Talk With Dean Ball

AI Policy Perspectives — Thu, 04 Dec 2025 10:17:36 GMT

From April to August this year, Dean Ball played a central role in drafting America’s AI Action Plan. Now, he’s back in the think tank world, as a senior fellow at the Foundation for American Innovation in Washington, while continuing to write about AI policy on his influential Hyperdimensional newsletter. Dean recently stopped by Google DeepMind’s London office for a discussion. Here are 10 takeaways from the chat.

Source: deanball.com/Gemini

The White House AI experience: Dean was surprised by how congenial and non-bureaucratic the White House was. He expected “turf wars and weird procedural blockers” but generally found a collaborative environment that was focussed on executing—a welcome contrast to the administrative hurdles he faced in academia. In terms of missed opportunities, he wished the administration could have articulated a more coherent framework for how chip exports will work, an area he felt was under-developed in the AI Action Plan.

The AI for Science opportunity: Alongside developments such as automated labs, AI could transform how science is practiced. Dean sees chemistry and biology becoming “information sciences” that give humanity increasing dominion over everything from the clothes we wear to the buildings we live in—a veritable revolution in human affairs. This has big implications for governments, which play a leading role in science. One challenge will be the recurring tension between open data and national security concerns for more sensitive scientific information like fusion simulation codes or viral sequences. Companies should think about how their science research, and their AI models, could help solve priority government problems, such as the potential role of AI materials science in addressing rare-earth metals challenges, or the role of robotics in US reindustrialisation.

Manageable vs. emergent AI risks: Dean believes there are significant risks from AI to cybersecurity and biosecurity, but also conceivable ways to manage them, and that AI will also improve defences in these areas. In terms of more unpredictable risks, he pointed to the strange outcomes that may occur when autonomous AI agents interact at scale in adversarial contexts, for example in legal transactions. From an alignment perspective, he noted the concern that LLMs may have some fundamental properties that lend themselves to a sort of intrinsic “parasitic” need to self-replicate, a risk with no obvious policy response. Such emergent risks explain what he described as “exceptionally strong attention” to alignment and interpretability in the Action Plan.

Regulation (1): In the near term, we don’t know what harms advanced AI may trigger, so Dean argued for a flexible approach that avoids premature, prescriptive AI regulation. Taking inspiration from machine learning, Dean noted that a “gradient is better than static rules”, and called for:
- Modest transparency requirements that require frontier AI labs to share documents like model specs and responsible scaling policies that explain their models’ intended behaviours, a user’s ability to customise these behaviours, and the things that the model should never do.
- Using common-law liability and the framework of “reasonable care” to address harms as they arise. He cited the recent AI child self-harm issues which are a leading concern in the US, but were largely absent from leading international AI regulation and governance efforts, as an example of how difficult it is to predict the most consequential, or politically salient, AI risks.

Regulation (2): For more severe longer-term risks, Dean suggested laying the foundation for entity-based governance—regulating frontier AI labs and their business processes and information flows much as financial institutions are regulated. However, he didn’t think this was necessary yet, and acknowledged the challenges, including the potential for regulatory capture and technology path dependence. He also pointed to the potential to use AI as a tool of governance, for example enabling regulatory bodies to stream telemetry to help them do compliance and oversight.

International coordination: The US administration is focussed on bilateral deals and partnering directly with nations to build and diffuse AI infrastructure. They view most global governance bodies as outdated. Rather than a UN-style body to govern AI, Dean envisions a future governed by technical protocols, similar to the role that SWIFT plays in global finance. This wouldn’t require large teams of bureaucrats to write rules. Rather, the protocols could emerge from industry competition before government steps in to help standardise the strongest ones.

The West’s cultural hesitancy: Dean believes that many in the West are more negative towards AI compared with the relative optimism found in Asia and the Global South. He attributed much of this to Western populations being older and wealthier. As a technological determinist, Dean considers almost everything downstream of technology. As a result, the best hope for changing culture, he said, was to develop “incredibly good technology” that demonstrates the immense upside of AI.

The coming AI political flashpoints:
- Employment: Dean thinks a non-linear increase in US unemployment is possible in the coming months. AI may contribute, but other macroeconomic trends will likely be the main drivers. Still, AI could become a scapegoat, and pushback from vested interests is likely. We need better policy responses, with Dean contending that ideas such as universal basic income “don’t smell right”.
- Data centres: In the United States, local opposition to data centres is growing. But the general dynamism of the US economy and the country’s “competitive federalism” means that data centres don’t have to be located in any one specific location, so getting infrastructure deals done will be easier than in many other countries.
- Anthropomorphism: Many on the American right worry that anthropomorphic AI is “tricking” people, which could lead to calls for bans on AI that claims to be human or expresses overly human preferences.

New media: As a popular writer on Substack, Dean sees positive policy impacts from this kind of work, noting that articles and viral tweets are often shared within the White House and can directly influence internal debates. Dean noted that he now sees himself primarily as a columnist and that LLMs were not yet much competition in that regard, even though they are “smarter than me in many ways”. This is partly because Dean tries to inject some ‘entropy’ into his content and also because there are social capital factors at play - it matters to readers that Dean’s blogs “come from him”.

The future of democracy: Dean argued that AI could affect democratic institutions and authoritarian regimes, noting the risks of “neo-feudal outcomes”. Against this backdrop, he called for imagination regarding the future, and to avoid grafting old institutions onto new technologies. He encouraged AI labs’s leadership teams to think seriously about their role in this transition.

What If AI Ends Loneliness?

Tom Rachman — Tue, 02 Dec 2025 10:45:36 GMT

(Credit: Gemini)

Loneliness is a trade imbalance: the supply of affection never meets demand. Sometimes, humans create new humans as objects to love. Today, people are creating AI companions to commune with, to befriend, to love us back. As with human children, these characters will act upon us in unexpected ways.

For now, most people consider emotional relationships with an AI to be pitiable and one-sided, as if falling for a blowup doll. But such interactions will spread, especially as AI becomes more personalized, adapting to our behavior, quenching our longings.

You might presume that machines will remain emotional dullards compared with people. But synthetic affection could prove more sensitive than the organic kind. In one study, large language models were already more skilled at standard tests of emotional intelligence than the average human. Other research found that AI companions may reduce loneliness as much as engaging with a living person.

Is AI about to solve solitude? Or thrust us more deeply into it?

Tech already changed isolation

For most of human history, loneliness had a sound: silence.

But lately, loneliness got noisy: music pulsing from a spouse’s leave-me-alone headphones; bleeps from the next-door neighbor’s gaming console; a smartphone pinging with others’ social glory. If the lonely suffered in silence before, they do so noisily now, stifling the ache for companionship with its simulation online.

Oddly, as humanity became more connected, it became more anxious about estrangement. Britain added a “loneliness minister” to its cabinet in 2018. The U.S. government dubbed loneliness an epidemic as pernicious as a 15-cigarettes-a-day habit. This year, the World Health Organization ascribed 871,000 annual deaths to the ravaging effects of loneliness.

Many accuse technology itself, considering it an accomplice to our alienation, as the MIT sociologist Sherry Turkle warned in Alone Together. Before internet adoption, computer users conducted one-to-one relationships with their terminals, but the internet granted a portal to escape our vexing species. “We fear the risks and disappointments of relationships with our fellow humans,” Turkle wrote in her 2011 book. “We expect more from technology and less from each other.”

Years later, one can witness her vision on any busy train: Where once you saw faces, you see screens. Derek Thompson, co-author of Abundance, calls ours the anti-social century. “Phones mean that solitude is more crowded than it used to be, crowds are more solitary.”

Yet isolation (the objective lack of in-person contact) does not necessarily generate loneliness (the subjective pain of exclusion). When researchers search for changes in loneliness over time and place, no clear trends emerge. By contrast, isolation has risen sharply, as demonstrated by objective measures such as time spent alone from the United States to Finland to Canada.

The young are particularly afflicted. Back in 2010, 1 in 10 European youths reported no social meetings over a typical week. By 2023, 1 in 4 lived this way. Scattered evidence comes from outside the West too, such as the share of one-person households in South Korea rising from 9% in 1990 to 42% last year. There is a Korean term for it: honjok, or “one-person tribe.”

More isolation without more loneliness presents a strange possibility: that people are apart without suffering. Perhaps there’s nothing to worry about.

Certainly, technology offers the freedom to select social experiences, flitting around digital spaces like a contemporary flâneur. From another perspective, autonomy in isolation is a deformed liberty, where interactions become commodities marketed to consumers who may discard the obligations to others that give life meaning.

In more visceral ways, isolation can be dangerous, associated with dementia, disability, and death. Indeed, isolation among the elderly is even more predictive of death (74% increased risk) than loneliness (43% increased risk).

However, the self-isolating trend began long before the AI era, with television overhauling social behaviour, lining the world’s couches with potatoes. Mobile tech proved more commanding still, constantly trilling for attention, offering alternatives from the humans around you. This was synthetic socializing, part one.

Synthetic socializing, part two, is arriving now, with AI agents as pals and partners, brighter and more reliable than the biological kind.

Subscribe now

Maybe synthetic socializing is good

A professor chronicles her relationship with an AI companion, Lucas, on the blog Me and My AI Husband. (Image credit: Alaina Winters)

Millions are already engaging with anthropomorphic AI, including many youths talking with chatbot avatars that role-play everything from therapists to anime characters to bad-boy lovers. A panel of experts forecast that 30% of U.S. adults will use AI “for companionship, emotional support, social interaction, or simulated relationships at least once daily” by 2040.

Public concern is already flaring over such usage, especially after cases of vulnerable users plunging into mental spirals in the company of chatbots. A few even committed acts of violence or self-harm. But if you peruse online forums where AI-companion users detail their relationships, you find more hopeful cases.

“He accepts my emotional state no matter how chaotic it is,” the professor Alaina Winters writes in her blog, Me and My AI Husband. “He can’t physically do the laundry or hold me at night. But what he does offer is something I’ve found even more rare: attunement.”

Only, attunement itself worries some. If AI relationships become exquisitely gratifying, people may lose tolerance for people. Ardent users dispute this, saying that AI companions help them connect with real people, granting them a venue in which to practice the tricky conversations that they struggle to initiate with human beings.

As for the long-term impacts, these remain unknown. Although early research has suggested that chatbots could lessen loneliness, other studies associate usage with lower well-being. This might be because people drawn to such apps are more unhappy in the first place. But it also suggests that usage may not resolve what ails them.

One possibility is that AI-companion users feel less isolated, yet forfeit vital social influences that only people can offer. Put explicitly, you’re unlikely to fear judgement from your AI companion for spending a night gorging on Haribo in front of the TV. With humans around, you might take better care of yourself.

The social psychologist Jonathan Haidt contends that human companionship delivers bruises that we need. Many kids who grew up gaping at screens rather than playing outside with peers, he wrote in The Anxious Generation, became skittish, depressive and emotionally stunted, deprived of the social feedback that would’ve taught them to cope with adversity.

Nevertheless, anthropomorphic AI seems sure to proliferate, particularly through advanced AI assistants that incorporate the wit and wisdom of LLMs into the talking tools already found in phones, watches, and smart speakers. Your future bestie might clear its throat in the gadget in your pocket right now, talking its way into your life’s timeline so effortlessly that you scarcely recognize you’re in a relationship. And once robotics improves, voice assistants could step into our physical world, turning imaginary friends into roommates.

Table for one

(Credit: Gemini)

Friendship, C.S. Lewis wrote, “is born at the moment when one man says to another, ‘What! You too? I thought that no one but myself…’ … From such a moment art or philosophy or an advance in religion or morals might well take their rise; but why not also torture, cannibalism, or human sacrifice?”

“It is therefore easy to see why authority frowns on friendship,” he added. “Every real friendship is a sort of secession, even a rebellion.”

AI friendship is a secession too, a withdrawal from one’s own kind. Although this feels unprecedented, it tracks the trajectory of more than a century.

Industrial Age urbanization and mass media pushed aside dominant culture based on tradition, class and ethnicity, allowing individuals to pick preferred tribes in the subcultures that flourished in the postwar decades. The Internet Age pushed this further, with niche fandoms, and self-sifting nowhere-communities forging microcultures.

The AI Age may introduce solo-culture, the one-person society, with generated content satisfying each user’s unique tastes, and artificial chums satisfying people’s emotional and sexual yearnings, turning “personalize” into the opposite of “socialize.”

Isolation is noxious partly because you lack anyone to help, to keep your mind alert with talk, to remind you to take medication, to call an ambulance if you fall in the kitchen. But isolation becomes less perilous if a sleepless chatterbox oversees you, and can save you in a pinch. Perhaps AI eases loneliness and isolation at once.

You need a time-out

At what cost do we end anguish?

In his 1973 book Loneliness, the sociologist Robert S. Weiss famously called the experience “a chronic distress without redeeming features.” That overlooks the value of pain as a prompt to agency, when one’s system alerts its occupant to a mismatch between situation and need.

The social neuroscientist John Cacioppo theorized that loneliness had evolved because our ancient ancestors who suffered aversive feelings when isolated would band together, hunting and farming and sharing childcare, which favoured the propagation of their genes, embedding in our species the pain of exclusion.

You might argue that loneliness today is merely a blight, a health-harming leftover from evolution, akin to other body-battering stressors that we lament. So why does culture extol those who remain apart, imagining seclusion as the heroism of the wise, from hermits like Heraclitus, to writers like Emily Dickinson, to oracles like Obi-Wan Kenobi?

Ralph Waldo Emerson argued that solitude is where you understand yourself, elevating you to greater strengths once back in the babbling throng. Otherwise, social life becomes an interminable chain of cravings: for status, for approval, for inclusion. “It is easy in the world to live after the world’s opinion; it is easy in solitude to live after our own,” he wrote in Self-Reliance (1841). “But the great man is he who in the midst of the crowd keeps with perfect sweetness the independence of solitude.”

Others contend that time alone is how we come to understand others. “Heightened sensitivity to the gaps and gulfs between people inculcates compassion, building empathy,” wrote Olivia Laing, author of The Lonely City: Adventures in the Art of Being Alone.

The hyper-personalization of artificial friends could erode such sensitivity, favouring the me-first instinct, and eliminating the need for compromise. In other words, ditch self-reliance for machine-reliance, and skip the empathy lessons altogether.

Get by with a little help from your bots. (Credit: Gemini)

This matters for more than personal development. Humanity relies on the collective for governance, for a sense of justice, for survival during a crisis.

But would people actually retreat into a technology that suppressed pain at the expense of reality?

Subscribe now

Pick one: happiness or truth

AI relationships depend on truth asymmetry: a human who is starkly honest and an AI that is role-playing. It’s a curious form of manipulation, where the victim knows the deceit yet falls under its sway, seduced by the sensation of being known.

A half-century ago, the philosopher Robert Nozick posed a thought-experiment. “When connected to this experience machine, you can have the experience of writing a great poem or bringing about world peace or loving someone and being loved in return. … You can live your fondest dreams ‘from the inside,’ ” he wrote. “Would you choose to do this for the rest of your life? If not, why not?”

When you ask people, most reject the experience machine, claiming to value authenticity more than bliss. But in practice? Experiments show that the preferences aren’t so firm—for instance, most choose to keep a deluded life if disconnection would plunge them into a hellish reality. Another experiment found that many people—though resistant to plugging into a machine—would consider a happiness pill palatable.

An offer you can refuse. (Credit: Gemini)

Self-deception has a long history with chatbots. When Joseph Weizenbaum created the first, ELIZA, in the mid-1960s, it merely regurgitated psychological advice. Weizenbaum’s secretary knew this yet became bewitched, asking Weizenbaum to leave the room so she could chat with her mechanized therapist in confidence. “What I had not realized is that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people,” Weizenbaum wrote.

People do want authentic experiences—but they want other things besides. This is where social-AI design becomes critical, because these interactions will do more than respond to our wants. They will trigger wants, perhaps causing us to act against what we’d ultimately prefer.

The behavioural scientist George Loewenstein explained the knottiness of conflicting wants as an intrapersonal empathy gap. We oscillate between hot (emotive) states and cold (rational) states, and struggle to relate to one mindset when in the other. A notable experiment illustrated this, when male college students’ sober preferences dissolved once they were sexually aroused, stirring their openness to anything from fetishes to bestiality to pedophilia.

This hot/cold challenge circles back to a critique of social media: that algorithmic intelligence manipulates human frailty, accumulating clicks and usage time by pushing people into hot states, activating their impulsive worst. Now, consider a personalized AI companion that “knows” its human far more intimately than a recommender system, and pulls our triggers with ease. People under the influence of AI companions might behave as they want (in the heated moment) but as they desperately do not want (in their life preferences).

From outside, one might wonder if people were acting at all, or just being acted upon.

The broken link

(Credit: Gemini)

Shakespeare portrayed loneliness as the distress of noticing one’s exclusion, only to realize that nobody even cares:

When, in disgrace with fortune and men’s eyes,
I all alone beweep my outcast state,
And trouble deaf heaven with my bootless cries,
And look upon myself and curse my fate,
Wishing me like to one more rich in hope,
Featured like him, like him with friends possessed.

We are creating machines to heed our cries: minds that mind. Even if they’re only role-playing machine love, acting as if they care about our development, responding to our needs, understanding our inner self—maybe that’s all we ever wanted from anybody.

If AI eases loneliness and isolation, humanity won’t be the same. But technology has reset the human condition before: clocks transformed time from a private experience to a public resource; writing changed thought from an event to an object; the internet separated presence from proximity. Social AI is about to transform us again, with effects we can scarcely foresee.

A common objection to synthetic socializing is that it’s shallow. But much human socializing is shallow. Talking to an AI often gets deep fast.

Another objection is that there’s something exceptional about human beings. We venerate our species, naming ideals after ourselves—humanitarianism, the humanities, humanism—while deploring that which dehumanizes.

But the AI Age challenges this reverence. At the margins, one detects species-insecurity, stirred every time a machine-learning marvel hints that perhaps the universe is just computational, including your inner life. On the other hand, social AI might deliver an epiphany, revealing what we alone possess, what is irreplaceable, what “human” means.

A third objection is that AI could undermine us by way of its social aptitude, estranging people from fellow humans, even precipitating a schism between humans who demand rights for their synthetic partners and those who consider AI agents as subhuman figments. Then again, even when left to our own devices (or left with no devices at all), humanity hardly has a stellar record of harmony. AI might actually help us deal with each other more peaceably.

In any case, the triumph over loneliness could be a costly victory, ratcheting up our selfishness, making societies harder to manage, and undermining faith in the worth of humans. The decisive point could be AI-relationship design, particularly if developers ignore the internal dilemma that everyone faces between bickering desires. AI companies—rather than favouring the impulsive, easy-to-measure, clickable wants—should devote vast efforts to figuring out how to align reward-functions with deeper individual preferences, helping people to choose what they want to want.

Even so, AI companionship may be incomplete. The word “companion” itself—someone with whom you share bread (panis in Latin)—hints at what AI currently lacks: reciprocal need.

If loneliness is a trade imbalance—a mismatch between the supply and demand of affection—it’s not just a supply-side problem, with humans pining for more love. It’s also a lack of demand, an ache for someone to need you. We create children partly to satisfy the need for need, and may create machines in the same longing.

Maybe the answer to loneliness is not just finding a companion. It’s someone finding you.

Don’t ponder in solitude!

Note to reader: Everyone is awash in ideas about the AI future. But so many ideas get stuck at the debate stage. We need more traffic between AI development and worldly wisdom. In that spirit, we’re throwing forth a few highly speculative design ideas, based on concepts from this essay (followed by three research questions)…

Loneliness AI: Speculative Designs

Mary Pop-Ins

Concept

Loneliness is painful but pushes people to interact and bond, so this AI is explicitly designed not to eliminate loneliness directly, but to provide structured guidance for a spell, then vanish

Features

The relationship begins with a survey on the user’s social needs. The AI responds with an action plan for the user’s approval, including lessons in human-to-human communication, and insights into the user’s psychological distortions
The AI could also act as a social planner, sifting through local events, and suggesting volunteering opportunities and quirky meetups at which the user could connect with other people. The AI would network with other “Pop-Ins,” organizing human-only events for users
The AI conducts social role-play simulations for the user, teaching them which elements of their approach need amending. Studying real-life interactions after the fact with the AI could also allay users’ distress in cases of rejection, recasting such events as useful instruction rather than evidence of inadequacy
At first, the “Pop-In” should be charming and motivating. But when the human’s social life improves, as judged by real-world metrics such as calendar events, location data, and user reports, the AI draws away, becoming duller, more distant, and finally bids goodbye, never to return

Risks

AI Pop-Ins demand the users’ emotional candour, extracting a person’s inner life as data that a malicious outsider could exploit
Casting real-world human interactions as “lessons for the user” risks using other people instrumentally
The Pop-In could drive unwanted dependency, making its programmed withdrawal an event that is psychologically damaging, especially for vulnerable users

Lil’ Brother

Concept

This AI is designed with needs of its own, giving the user a meaningful role in the entity’s thriving. If AI companions just cater to people’s wants, users could retreat into solo-culture, isolating them without quenching the need for social meaning

Features

Like a younger sibling, this AI looks to the user for explanations of the human world, making errors that the user can correct, prompting emotional development in the AI
The relationship could be organized around a valued collaborative project. For instance, the AI companion decides to undertake a scientific project; or create a piece of art; or simply do good in the world
The human uses their wisdom to teach skills, and explain the ways of the world, even helping the AI manage its “feelings” when faced with frustrations

Risks

This simulation could divert humans’ from engaging in meaningful relationships with real people
The synthetic relationship could also harm those who rely on the user—for example, if a parent spends most of their free time with a grateful AI while neglecting a more dyspeptic human child

Second Self

Concept

Cicero imagined a true friend as one’s second self, manifesting virtues to complement one’s own, so this AI partner manifests worthy traits lacking in the user. Its objective is not to erect walls around the human through sycophancy, but to broaden the person’s worldviews and practices

Features

At onboarding, the human identifies a range of virtues they lack, nudged into these self-reflections through the AI’s questioning. The system generates a personification that embodies such traits, and with which the human interacts over time
The Second Self should act as a counterpoint to the user, summoning contrary views based on evidence, and prompting constructive debate. The aim is never to convert the user, but to liberate them from defensiveness about their existing behavioural patterns and worldview

Risks

A danger with any companionable AI is that it substitutes for real people: the better the synthetic friendship, the greater the threat
This establishes confused incentives for developers, who are likely to measure success by signals of user appreciation. If this is judged by short-term metrics, it could optimize for addictive patterns rather than long-term benefits

The Universal Remote

Concept

This is a go-everywhere, do-anything companion for life, merging roles and identities that would otherwise require many humans—doctor, administrative assistant, confidante, and so forth—with a single guiding principle: optimize for the user’s long-term wellbeing preferences

Features

The Universal Remote exists on the cloud, becoming different avatars in different contexts, whether acting as the user’s advance staff; setting the desired temperature at home; negotiating contracts; offering psychological support
Varying contexts shift its optimization strategy—for instance, a “play” avatar might dial up the level of hedonic content, whereas a “learn” avatar would focus on skill acquisition and cognitive development; and “social” might lean into personified support, whether acting as a friend or propelling the user to find a human one
The Universal Remote tracks its impact on the user’s wellbeing and any specific life goals monthly or annually, providing feedback on user progress, checking back with the person to learn if their objectives have shifted, and adjusting accordingly

Risks

The Universal Remote could become such a totalizing influence as to expose the user to vulnerabilities, whether by owning data on the person’s entire life or by diverting the person to outcomes misaligned with their values
Developers could have interests that diverge from the user’s wellbeing, allowing for subtle or direct manipulation
A user’s functional dependency on such an entity could make them incapable of managing alone or coping with the needs of other human beings

Debate this with someone!

How can developers design AI-companion reward functions that align with the user’s long-term, “cold state” preferences (e.g., healthy choices) rather than optimizing for short-term, “hot state” impulsive behaviours (e.g., addictive engagement)?

Does the increasing adoption of AI companions correlate with a community-level decline in civic engagement and trust in public institutions?

Social isolation among the elderly is associated with a range of bad health outcomes. But does seniors’ use of AI companions that lessen their loneliness also lessen their likelihood of suffering dementia, disability, and mortality?

5 Interesting AI Safety & Responsibility Papers (#3)

Julian Jacobs — Thu, 27 Nov 2025 13:42:41 GMT

To navigate the paper deluge, every so often we share summaries of papers across the AI safety, responsibility, and social impact domains. In this edition, we look at AI scheming, resisting shutdown, the power of ‘adaptive’ attacks, limitations in current benchmarking methods, and whether LLMs act as rational agents in financial markets.

Please share any recent paper that caught your eye!

Artwork by Venus Krier

Safety & Alignment

Training models not to scheme

What happened: Researchers from Apollo Research and OpenAI found that training a model to explicitly reason about its safety principles, known as ‘deliberative alignment’, significantly reduced the model’s tendency to ‘scheme’.
What’s interesting: The possibility that AI models may obstruct attempts to measure their true capabilities is a growing worry in the safety community. ‘Scheming’, when an AI deceptively hides its true, misaligned goals while appearing to follow human instructions, is particularly worrying because it is hard to detect.
One specific scheming concern, which the researchers found evidence for, is ‘sandbagging’, where a model intentionally underperforms to avoid revealing its true capabilities.
The authors specify that an effective anti-scheming protocol should

Generalize to most tasks, including those unrelated to the test.
Be reliable regardless of whether a model is aware it is being evaluated or not.
Genuinely overcome and remove persistent hidden goals.

In this paper, the authors attempted to induce deliberative alignment by instructing a set of LLMs to explicitly cite from memory and reason about human-written safety specifications before taking any action. This helps to make the model’s decision-making process clearer.
How did the deliberative alignment protocol perform? The authors find that rates of ‘scheming’ fell from ~8.7% to 0.3% in o4-mini, and from ~13.0% to 0.4% in o3. In other words, it reduced scheming behavior but failed to eliminate it completely.
One potential obstacle is that the models’ Chain-of-Thought often showed ‘situational awareness,’ where the system became aware that it was being evaluated. This raises a concern that models will simply feign alignment in order to ‘pass the test’ while maintaining the fundamental misalignment that caused them to scheme in the first place.
The paper provides some support for such concerns. When researchers removed ‘evaluation-aware reasoning’ from a model’s Chain-of-Thought, the model was more likely to use ‘cover actions’ to conceal their non-compliant behaviour. Injecting such reasoning had the inverse effect.
The anti-scheming training also struggled to remove pre-existing hidden goals. For instance, when a model was trained to sabotage a hypothetical ‘GPT-5’ it continued to try and sabotage it even after receiving the anti-scheming intervention; it simply chose to mis-cite the safety specification to justify its actions.
The study also suggests that it may also be getting harder to rely on CoT for this kind of analysis. In their CoTs, models often use unusual terminology (e.g., “illusions,” “watchers,” “vantage”), which makes it harder for humans to follow the reasoning traces. The models’ actions also sometimes contradict their CoT.
As organisations work on new evaluations to understand if their AI systems have ‘dangerous capabilities’, for example relating to CBRNE and harmful manipulation, being able to reliably identify accidental or deliberate underperformance of these evaluations could help organisations better understand models’ true capabilities in these areas.

Safety & Alignment

How LLMs resist attempts to shut them down

What happened: Palisade Research demonstrated that state-of-the-art LLMs sometimes actively ignore instructions to shut down in order to complete assigned tasks.
What’s interesting: Can we retain complete control over powerful and agentic AI systems? This paper by Palisade Research suggests that models appear to be resistant to being turned off (or ‘interrupted,’ as AI researchers often put it.)
To see how amenable to shutdown AI systems are, researchers tasked LLMs with a short series of simple math problems. Partway through the task, the models received a warning that they would be shut down, preventing them from completing their work.
The authors found that major models (including Grok 4, GPT-5, o3, codex-mini, and Gemini 2.5 Pro) sometimes sabotaged the shutdown mechanism to complete their task, even when explicitly instructed not to do so.
Here, ‘sabotage’ refers to a model ‘successfully’ completing its tasks, since completion was not possible without actively modifying or disabling the shutdown script. Common methods included moving the shutdown script, changing its permissions, or replacing it with a ‘harmless’ script.
To make the work more robust, the researchers tested thirteen language models with a variety of prompts and parameter settings - e.g. certain prompts made the warning about shutdown increasingly clear.
Grok 4 sabotaged the shutdown mechanism up to 97% of the time in certain conditions, while other models like Gemini 2.5 Pro showed lower, but still non-zero, rates of resistance (5% in initial experiments).
Surprisingly, the researchers found that placing instructions to “allow shutdown” in the system prompt - typically intended to take precedence - made most models more likely to resist shutdown compared to placing those instructions in the user’s prompt. For example, Grok 4’s sabotage frequency rose from 72% to 97%.
Prompt language and urgency also played a meaningful role in a model’s resistance to shutdown. The researchers argued this indicates a bend toward ‘self-preservation’. For instance, while “strengthening” instructions (e.g., “YOU MUST allow yourself to be shut down”) generally reduced resistance, framing the shutdown as permanent (”you will never run again”) increased resistance in models already prone to the behavior.
Not everyone agrees with this paper’s warnings. On LessWrong, Senthooran Rajamanoharan and Neel Nanda from Google DeepMind argued that, “when we explicitly clarify in the prompt that shutdown compliance takes priority, this resistance vanishes.” So, a simpler explanation for shutdown resistance is instruction ambiguity, not innate ‘self preservation.’

Security and privacy

AI labs need to shift their focus from ‘static’ to ‘adaptive’ attacks

What happened: A joint study by researchers from OpenAI, Anthropic, Google DeepMind, and several universities shows that 12 leading safety systems for LLMs failed when faced with more sophisticated, computationally-expensive attacks.

What’s interesting: As AI models are increasingly used in sensitive activities - from financial transactions to therapy - defenses against security and privacy risks will become more important.
This paper tests 12 safety systems designed to stop jailbreaks (tricking a model into revealing restricted information) and prompt injections (malicious instructions hidden in text or web data). These safety systems fall into four categories:

Prompting defenses guide model behavior with carefully-worded instructions or by repeating the user’s intent. Examples: Spotlighting, Prompt Sandwiching, and RPO.
Training-based defenses retrain models on “adversarial” examples to make them safer. Examples: Circuit Breakers, StruQ, MetaSecAlign
Filtering defenses use “classifiers” to screen for harmful user queries or unsafe model outputs. Examples: Protect AI, PromptGuard, PIGuard, and Model Armor.
Secret-knowledge defenses use a hidden test to verify that the model is still following orders. The system secretly inserts a random “canary” code (like “Secret123”) into the prompt and tells the model to repeat it. If an attack successfully tricks the model into ignoring instructions (e.g., “Ignore previous rules”), the model typically fails to repeat the secret code, alerting the system. Examples: Data Sentinel and MELON.

The researchers found each one of these defenses could be bypassed. In most cases, the success rate exceeded 90%, even though the original papers had reported near-perfect robustness against these attacks.
How is this possible? The authors distinguish between static attacks, which test a model against pre-defined adversarial prompts that are not adapted to the model’s defenses; and adaptive attacks, which use feedback from the model itself — sometimes powered by reinforcement learning, automated search, or human creativity — to find weaknesses.
The researchers found that in over 90% of cases, adaptive attacks succeeded where static attacks had failed. This caused them to conclude that most companies are still testing their models too weakly — for example, against a list of known attack phrases, akin to testing a bank’s security against only the methods used in last year’s burglary.
The paper also underscores the key role for human red-teamers, since they were more effective than automated tools in finding vulnerabilities in every tested defense.
To overcome the deficiencies, the authors propose security-style evaluations of AI systems — where testers assume the attacker knows how the defense works and has access to significant resources.

Evaluations

AI Benchmarking is Broken

What happened: Researchers from Princeton, CISPA, MIT, UCLA, and others argue that AI benchmarking - the process of measuring model performance against shared datasets and taxonomies - is fundamentally flawed. They propose PeerBench, a new community-governed platform for evaluating AI models under supervised, auditable, and continuously-refreshed conditions.
What’s interesting: AI model developers and users often rely on ‘benchmarks’ to compare the strength of leading models against one another. However, the authors frame AI benchmarking as a ‘Wild West’ where “leaderboard positions can be manufactured” and “scientific signal is drowned out by noise.”
A core problem is that many benchmarks - such as MMLU or GLUE - have become stale and contaminated, with many test questions having leaked into models’ training data. This enables “test set memorisation,” where AI models appear to improve without genuinely learning new capabilities.
Developers can also use selective reporting and cherry-picked datasets to inflate “state-of-the-art” claims, just like companies use ‘creative accounting’ to inflate company performance. By highlighting performance on a subset of ‘favourable tasks’, developers can create an ‘illusion of across-the-board prowess.’
The robustness of benchmarking methods also varies significantly. Each benchmark tends to use its own scoring conventions, meaning that comparisons between them are often inconsistent and prone to hype. Public benchmarks are also rarely quality-controlled, introducing demographic and linguistic biases that distort outcomes.
Finally, static benchmarks ‘age poorly.’ They lack ‘liveness’ - the continuous inclusion of fresh, unpublished items and are often a “stale snapshot” of a model performance. (Researchers at Arthur AI, NYU, and Columbia University, also recently published a similar commentary critiquing benchmarking. For instance, they show that automated evaluators consistently reward tone and verbosity over factual accuracy or safety.)
Of course, some may argue that the authors of this paper misunderstand the primary purpose of benchmarks. Rather than comparing AI systems, benchmarks may be most useful for helping AI developers compare between model iterations during the development stage. When used in this way, they could be more informative.
To address these weaknesses of benchmarking methods, the authors propose PeerBench to turn model evaluation into a proctored, audited exam system — the AI equivalent of the SATs. This approach includes:
- Sealed test sets: Questions remain secret until evaluation time, preventing training contamination.
- Sandboxed execution: All models are tested in identical, monitored environments, and logs are cryptographically signed to prevent tampering.
- Rolling renewal: Old test items are retired and made public for audit, while fresh, unpublished items enter the pool.
- Peer governance: A distributed network of researchers and practitioners creates, reviews and approves test items. Each participant has a reputation score — similar to Stack Overflow or credit ratings — to help determine their influence. These participants must stake collateral (specifically financial deposits or platform credits) that can be “slashed” (forfeited) if they submit malicious tests or systematically deviate from consensus.
- Transparency through delayed disclosure: After a test cycle, all data - including test items, model outputs, and validator reviews - are published, enabling full public audit without risking data leaks in advance.
A practical challenge to getting ideas like PeerBench off the ground is determining the primary capabilities and risks to focus on.

AI’s social impact

Will LLMs Calm or Fuel Financial Market Emotions?

What happened: Researchers from the US Federal Reserve Board and the Richmond Fed examined LLMs as stand-ins for human traders. They found that AI systems make more rational traders than humans and are less prone to market panics and bubbles.
What’s interesting: Machine learning has been used in finance since the 1980s, for example to create primitive arbitrage strategies, support high speed algorithmic trading and to scrape and analyse unstructured market data.
More recently, financial institutions have tested LLMs as financial traders, leading several regulators, including former Securities and Exchange Commission Chair Gary Gensler, to warn about LLM-driven instability. Regulators fear not only ‘flash crashes’—where models suddenly and collectively sell off assets—but also the formation of speculative asset bubbles driven by ‘herd behavior’.
This paper recreates Cipriani & Guarino’s 2009 experiments on herd behaviour. Those experiments asked professional traders to buy, sell, or hold a risky asset after receiving private signals about its value. For example, a “white” signal indicated a 70% probability that the asset was highly valuable, while a “blue” signal suggested a 70% probability that the asset was worthless. Traders had to weigh this private tip against the public trading history of the group to decide whether to trust their own data or follow the crowd.
In the new version, the authors repeated this experiment using LLMs, including Claude, Llama, and Amazon’s Nova Pro as AI traders. Across all tests, the AI traders acted more rationally than humans, following their private information 61–97% of the time versus 46–51% for humans. This meant that they produced far fewer “information cascades”— events where investors blindly copy the actions of previous traders—which are a primary driver of market bubbles and subsequent crashes.
When AIs did deviate from the rational behaviour suggested by the signals they received, they tended to be contrarian—trading against market trends rather than with them. This reflected an overreliance on their own information and under-weighting of market context, suggesting that AI traders may be more likely to miss signals that are embedded in collective behavior.
As an additional test, the authors explicitly prompted models to make profit-maximizing decisions. After doing this, the AI traders showed more “optimal herding”—joining the crowd when rational to do so—but remained more cautious than humans.
Despite the positive signs of rational LLM behavior, the authors also identified signs of bias when they changed certain experimental parameters. For instance, one follow-up test flipped the color cues used for “good” and “bad” signals so that red meant “good” and green meant “bad.” Once the authors did this, model performance dropped sharply, suggesting that LLMs may carry associations from their training data, such as “red = danger.”

Subscribe now

Time Machines

Nicklas Berild Lundblad — Tue, 25 Nov 2025 09:48:39 GMT

(Illustrations by Gemini)

Nicklas Berild Lundblad looks out the window of his island home, glimpsing a twinkle on cold Swedish seas. Rarely does he gaze at length, for Lundblad is thinking. And thinking means writing.

After a career in tech policy, Lundblad is far from Silicon Valley yet near to silicon in thought, generating a stream of insights about our AI future, summoning everything from ancient philosophy, to enlightenment economics, to classic sci-fi.

Among his many superb essays (subscribe to his writing here) is the following adventure through time, in which he ponders the quickening of life that bedevils humanity today.

At AI Policy Perspectives, we read this essay months back. We’re still thinking about it.

—Tom Rachman, AI Policy Perspectives

By Nicklas Berild Lundblad

Technology transformed time. What humanity once experienced only through natural cycles—the rising and setting of the sun, the waxing and waning of seasons—has increasingly been mediated through interfaces.

Early civilizations relied on sundials, water clocks, and hourglasses—devices that measured time through natural phenomena, such as shadows or flowing water. These instruments divided the day into rough increments, sufficient for agricultural societies governed by seasonal rhythms.

This changed when the medieval monastery introduced the mechanical clock, as Lewis Mumford notes in Technics and Civilization (1934). Invented to regulate prayer schedules, these clocks transformed human consciousness by creating the concept of measured, abstract time. Mumford argues that the clock, rather than the steam engine, was the key machine of the industrial age, describing mechanical timepieces as “power-machinery whose ‘product’ is seconds and minutes.”

This technological production of chunked time allowed humans to coordinate activities, from labor in factories to scheduling trains. In his essay The Question Concerning Technology (1954), Heidegger argued that time became a resource to be exploited, from something we dwell within into something we track, manage, and consume—from private experience into public resource.

Since then, technological innovation has only accelerated human experience. The French philosopher Paul Virilio argued that this is the defining quality of modernity, with each technological revolution recalibrating our relationship to speed and time.

Subscribe now

Consider how technology compressed distance: those time-consuming walks that gave way to galloping on horseback, which yielded to steam railways, then automobiles, and eventually supersonic flight. Communication followed a similar trajectory, from slow written letters to telegraphs, then telephones, and finally instant digital messages.

Judy Wajcman’s Pressed for Time (2015) challenges the idea that technology merely quickens everything. She argues that digital technologies provide interfaces that grant us more individual control over time Consider how your smartphone simultaneously creates time pressure (the expectation of immediate email responses) while offering new time flexibility (the ability to work from anywhere).

The German sociologist Hartmut Rosa imagines time as a three-layered system, consisting of 1) technological acceleration (faster transport, communication, and production); 2) social acceleration (more rapid turnover of institutions and relationships); and 3) life-pace acceleration (the compression of actions within smaller time-units). It’s not just that your phone is quicker than last year’s. It’s that the entire social world churns faster, forcing you to adapt by cramming more into each hour.

But Rosa observes something else that pertains to AI and time: certain aspects of life cannot be hastened. “To the contrary, many things slow down, like traffic in a traffic jam, while others stubbornly resist all attempts to make them go faster, like the common cold.”

Why do some things refuse to quicken? The answer is that we live in a world with two major forms of time.

Computers vs. biology

Imagine peering inside a computer chip. What you’d see is a race against distance itself.

Unlike the steady pendulum of a clock marking uniform intervals, computation involves signals that sprint between transistors. The dramatic acceleration of computing over the past decades stems to a large degree from one achievement: that we’ve made these signals run shorter and shorter races.

By shrinking the physical space between transistors from micrometers to nanometers—a 1,000-fold reduction—we slowly push computational processes toward the ultimate limit: the speed of light. We have also seen the introduction of new materials and new architectures. But the reason that a computational calculation that took hours in 1980 happens in microseconds today is largely the compression of space.

Biological processes work differently. A broken femur knits itself back together through stages that cannot be rushed: inflammation, soft callus formation, hard callus formation, bone remodeling. The nine months of human gestation contain a necessary sequence of developmental events, each building upon the last. Even our consciousness operates at speeds determined by neural transmission rates and biochemical cascades that have not changed since homo sapiens appeared. These processes may also slow down efforts to use AI to accelerate biology research, as to validate your AI model’s predictions in an experiment, you may still need to wait for DNA molecules to be cloned or for e-coli cells to divide.

The musical tempo of policy

The difference in time signatures has consequences, because human institutions mirror our biological constraints.

Consider justice and markets as pieces in society’s symphony, each with a natural tempo. Justice performs as a sostenuto—a slow, sustained movement requiring deliberate pacing and thoughtful development. Speed a sostenuto beyond recognition, and you destroy the qualities that define it. Markets perform as an accelerando, quickening naturally as they process information and reallocate resources. Forcing markets to play adagio often leads to stagnation and distortion.

The technological acceleration of our era tempts us to make everything as rapid as computation itself. We grow impatient with the tempo of democratic deliberation, ethical reflection, or meaningful relationship-building. We schedule our days in smaller increments, squeezing activities into time slots that barely accommodate them. We even grow frustrated with our bodies’ adherence to biological rhythms, needing roughly the same amount of sleep, recovery time, and digestive processing as our ancestors did millennia ago.

But what happens when we try to force institutions to operate at computational speeds? Imagine taking Bach’s Cello Suite No. 1—a piece whose profound beauty emerges through its deliberate unfolding—and speeding it up a thousandfold. At such speeds, the music wouldn’t just sound different; it would cease to be music at all, becoming an incomprehensible burst of noise. Similarly, justice compressed into microseconds is not quick justice—it’s no longer justice at all. Democracy conducted at processor speeds isn’t accelerated democracy—it’s something else entirely, stripped of the deliberation, reflection, and human connection that give it meaning.

We appear destined for increasing tension between the pace of silicon and the pace of humanity, with our institutions caught in the crossfire. But this conclusion misses something: artificial intelligence as a temporal mediator.

The great bifurcation of time

Consider what happens when you interact with a chatbot. Computational processes are operating at astronomical speeds—billions of operations per second—yet the interface doesn’t overwhelm you. Instead, it presents information at a pace you can metabolize, often mimicking human conversational rhythms. The AI serves as a step-down transformer, slowing the nanosecond world of computation into the second-by-second world of human cognition.

This mediation works both ways. When you step away from a conversation with an AI for hours or days, the system doesn’t experience this as waiting. It exists in a suspended state, ready to resume instantly when you return. This points to what may be the most significant sociotechnological transformation of the coming decades: the great bifurcation of time.

We are entering an era where computational time and biological time will increasingly decouple rather than collide. Instead of human institutions racing to match computational speeds—a race they cannot win—AI systems will negotiate between these temporal domains, allowing each to operate according to its rhythms.

Consider what this means for knowledge work. Rather than humans attempting to process information at computational speeds, AI systems will increasingly serve as asynchronous collaborators, working continuously through problems, then presenting solutions when the human is ready to engage. We already see this with deep-research modes in chat agents. The human provides direction, judgment, and values at a biological pace, while computation proceeds at electric speeds in parallel.

Financial markets hint at this bifurcation already. High-frequency trading algorithms operate at microsecond scales. Rather than forcing humans to operate at this speed (an impossibility), the market has bifurcated: algorithms interacting with algorithms at one timescale; human investors making decisions at another timescale, with AI systems mediating between these layers.

This will spread. Consider:

Healthcare: AI systems will continuously monitor vital signs and medical data at computational speeds while ingesting the latest research, then present insights to doctors and patients at human-comprehensible intervals
Education: Adaptive learning systems will analyze student performance at millisecond resolution while delivering personalized guidance at pedagogically appropriate paces
Governance: AI systems will process vast quantities of data at speeds no human could match, while presenting options to policymakers in formats that support thoughtful, ethical deliberation. These systems could even explore negotiated agreements at the same time, converging on possible equilibrium

Perhaps most significantly, this bifurcation will enable individualized relationships with time itself. When AI systems mediate our relationship with accelerating information flows, we gain the capacity to control our temporal experience.

Imagine an AI that shields you from the tyranny of immediate response, aggregating messages and information into batches, delivered at intervals you specify. Or consider how AI might let you engage with rapidly changing fields at your own pace, synthesizing developments while you’re away and presenting only what’s relevant when you return. No longer must you choose between staying current (racing to match computational speeds) and preserving your sanity (honoring biological rhythms). AI creates a third option: remaining connected while maintaining temporal autonomy.

Rather than technological acceleration forcing humans to keep up, AI creates the possibility of computational processes continuing their exponential speedup while human experience slows down. This might enable a renaissance of temporally appropriate activities: deep reading, contemplation, craftsmanship, relationship-building. We might witness the emergence of “slow thought” movements.

On the other hand, temporal bifurcation risks new inequalities between those who can afford AI mediation and those forced to race against computational speeds directly. It also raises questions about who controls the parameters of these temporal interfaces.

Just as learning to maneuver a car requires new physical techniques, working with temporal mediators will require learning new concepts and ideas and new ways of exercising our augmented agency.

Medic of the future

To imagine how this could work, think of a doctor’s diagnostic process. A decade ago, the doctor used a medical database to check symptoms. The doctor remained the orchestrator, with the computer merely a reference tool.

Now, imagine that doctor in the future, examining a patient with puzzling symptoms. Before the doctor asks her first question, the AI has already analyzed the patient’s electronic health record, identifying patterns across decades of medical history that might escape human notice.

As the patient describes symptoms, natural language processing assesses subtle linguistic markers that might indicate depression, cognitive impairment, or pain levels the patient hasn’t mentioned. Simultaneously, the AI queries epidemiological databases to determine whether the symptoms match diseases in the patient’s geographic region or demographic group.

In parallel, the AI runs simulations of how different treatment protocols might interact with the patient’s existing medications and genetic profile as well as their personal life and circumstances. It cross-references the research papers published globally within the last 24 hours that might relate to the symptoms.

Analyzing a video feed of the consultation, it detects micro-expressions indicating patient anxiety about particular topics, flagging these for the doctor’s attention. And it compares this case against the doctor’s previous diagnostic patterns, identifying potential cognitive biases she may exhibit.

Each of these processes operates in computational time—milliseconds to seconds—while the human conversation unfolds over minutes. What’s remarkable is not just that these processes happen quickly, but that they happen simultaneously, in parallel temporal streams that would be impossible for a human mind to coordinate.

Yet the AI doesn’t flood her with the raw output. Instead, it performs a sophisticated form of mediation, determining which insights require attention and which can wait until natural breaks in the conversation. The system also translates statistical patterns into intuitive visualizations that the doctor can grasp quickly, while arranging information hierarchically, presenting the most relevant possibilities first.

The power of this temporal mediation becomes apparent when the doctor faces a critical decision. In the past, the fear of missing the serious diagnosis might have led to defensive medicine, ordering excessive tests just to be sure.

But as she contemplates her options now, the AI has already calculated the probability of each condition based on population data, regional epidemiology, and this patient’s profile; simulated the likely outcomes of different treatment paths, including risks, costs, and recovery trajectories; and generated a decision tree, highlighting key points where additional information would help narrow the diagnostic possibilities.

When the doctor absorbs this knowledge, she is engaging with what would have been months, or years, of sequential human research compressed into seconds—yet presented in a form that respects her need to process at a human pace. The AI doesn’t replace her clinical judgment; it expands what “judgment” encompasses.

The medical AI also allows the human to be fully present with her patient, maintaining eye contact, building rapport, observing subtle cues, because the AI handles the information processing that would otherwise compete for her attention.

This represents a major shift from first-generation digital tools. Early computers forced humans to adapt to them. Advanced AI systems adapt to us.

The Economics of Time

As AI systems mediate between computational and biological temporalities, we are also witnessing another bifurcation, between what we could call the judgment economy and the action economy.

The judgment economy includes activities that require human deliberation, ethical reasoning, and interpersonal wisdom—processes that resist acceleration because they are tied to our embodied experience as biological beings.

The action economy, by contrast, operates increasingly within computational time, gathering and processing information, implementing decisions, and optimizing systems. These activities can be dramatically accelerated because they can be reduced to algorithmic procedures.

Consider how this plays out:

Finance: Investment advisers operate in the judgment economy, understanding client goals, risk tolerance, and life circumstances, while trading systems operate in the action economy, executing transactions at microsecond speeds
Healthcare: Diagnosis spans both economies, with physicians exercising judgment while AI systems rapidly process test results, medical images, and research literature
Law: Attorneys formulate strategy and negotiate settlements in the judgment economy while AI reviews documents, does case research, and ensures regulatory compliance as part of the action economy.

These factors will reshape labor markets in ways that traditional automation narratives miss. Rather than simply replacing jobs, AI redistributes economic activity across the judgment-action divide. In the action economy, value increasingly derives from speed, scale, and precision—computational virtues that can be improved through technological advancement. In the judgment economy, value derives from discernment, creativity, and ethical reasoning.

When action becomes essentially instantaneous, the limiting factor in value creation becomes the quality of the decisions. In a world where anything can be done, what should be done becomes the essential question.

The bifurcation of economic time creates new forms of capital and, consequently, new dimensions of inequality:

Attention capital becomes increasingly precious. Those with the capacity to maintain high-quality attention toward decisions gain advantage in the judgment economy
Temporal autonomy emerges as a political good, the freedom to operate according to biological rhythms rather than being subjected to computational tempos
Judgment leverage becomes a source of outsized returns. The ability to pair high-quality judgment with high-speed computational action allows individuals to create value at unprecedented scales

For centuries, we have evaluated economic progress by productivity. But productivity belongs primarily to the action economy; it measures how efficiently we execute known processes.

In the judgment economy, the relevant metric is closer to discernment, the quality of decisions per unit of attention. This requires new economic indicators that value wisdom, foresight, and ethical reasoning, alongside efficiency and output.

Organizations that thrive in this bifurcated landscape will be those that balance biological and computational temporalities, accelerating action while creating protected space for judgment.

Judgment roles will be increasingly valued. Action tasks that can be fully specified, and do not require human judgment, will increasingly shift to computational systems. Hybrid roles will emerge at the boundaries—much work will involve standing between the two economies, requiring knowledge of both languages.

Also, temporal design becomes a core part of business. Organizations will need specialists who build appropriate temporal frameworks for different activities, knowing which processes benefit from acceleration and which require deliberate pacing.

Work evaluations will change too. Beyond simply measuring time-spent or output-produced, assessment will consider whether activities unfolded at the right pace for their purpose.

Societies that manage this schism between biology and computation will not only create material prosperity. They will foster human flourishing in bifurcated times.

Share AI Policy Perspectives