AI Policy Perspectives : Science

Science Needs AI Data Stocktakes

Conor Griffin — Thu, 30 Apr 2026 11:09:40 GMT

By Conor Griffin, Don Wallace, and Theo Brown

For 40 years, amid green pastures outside Culham, a small village in Oxfordshire, scientists and engineers toiled at the Joint European Torus. They were attempting to harness nuclear fusion, a force powerful enough to light the sun.

To create fusion, scientists and engineers must heat the nuclei of very light atoms with such intensity that they fuse, instigating a self-sustaining reaction that releases vast amounts of energy. The scale of the challenge is hard to fathom—at its most extreme, the Joint European Torus, or JET, was the hottest point in the solar system, hitting over 150 million degrees Celsius.

JET concluded operations in 2023, generating a record amount of energy in its final experiments. The project is now part of fusion’s history but remains pivotal to its future. A growing number of organisations developing fusion reactors are drawing on JET’s discoveries. The UK Atomic Energy Authority is advancing a national fusion facility, MAST-U, on the Culham site. This will serve as a test-bed for STEP Fusion, the UK’s project to put fusion electricity on the grid, set to begin operations in the early 2040s.

But JET didn’t just bequeath novel discoveries. It left behind massive troves of data. That raises a tantalising prospect: Could scientists use this data to train AI models that accelerate the path to fusion power?

This is possible, but challenging. Most JET data is raw and unvalidated. Many important insights are buried in scientists’ logbooks. The data that does exist is not available open source or, generally, for commercial use. Changing this may require agreement from all of JET’s original partners across Europe. One expert we interviewed called JET data a ‘stranded asset’.

Such data predicaments are not specific to JET or to fusion, but apply across all of science, even though science is precisely the domain where AI could yield its greatest benefit to society. New breakthroughs and startups are emerging quickly, from protein design to material design. Scientists are also keen users of fast-improving AI coding agents. But a lack of high-quality data will dampen progress. In most disciplines, large, high-quality datasets like the Protein Data Bank, which underpinned AlphaFold, are absent.

The scientific community needs to tackle this problem, and there are promising signs. Late last year, the UK government launched an AI for Science strategy, which includes a new collaboration with Renaissance Philanthropy to identify priority datasets. The US government’s Genesis Mission aims to train AI models and agents on federal scientific data. Google.org has a dedicated AI for Science fund, which can fund datasets and tooling.

These examples suggest that if the scientific community can identify the data that AI needs, a range of actors could help to fund and deliver it.

This demands what we’re calling AI data stocktakes. The concept is simple: interview leading experts in a given scientific field to understand the main opportunities to apply AI; the data obstacles; and the interventions that could make the biggest difference. Admittedly, some blockages, such as a paucity of engineers, are structural and will take years to fully resolve. AI data stocktakes should identify such challenges, but focus on projects that governments, companies and philanthropies could fund and implement within 1-2 years.

There are promising early efforts to map AI data gaps. But, to our knowledge, there are no concise, accessible documents that explain the AI opportunities in fields such as genomics, weather forecasting, or food security and convert them into a list of fundable data projects for policymakers and funders to pursue.[1]

In this essay, we offer a proof-of-concept. We interviewed 25 leading experts to create an AI data stocktake for fusion. We focus on the UK, but our analysis and recommendations could be taken up by funders anywhere in the world. Moving forward, we hope to support AI data stocktakes for other scientific disciplines and research problems.

I. Why fusion? Why now?

If achieved, fusion would provide a safe, almost limitless source of clean energy.[2] From a scientific perspective, it would yield a better understanding of the plasma that makes up more than 99% of the visible universe. From a social impact perspective, it would help address energy scarcity and unlock energy-intensive innovations, like desalination.

Despite quips about fusion being always 20 years away, 70 years of experiments actually show fairly steady progress, which has continued in recent years, from Germany to China. In most fields, such progress would have solved the problems of interest decades ago. But fusion is an extremely hard problem. And the primary product is only attainable at the end of the line.

To achieve fusion, scientists need to create and control plasma, a super-hot state of matter, in which the atoms have been stripped of their electrons, and extreme heat and pressure are used to force the remaining nuclei to collide and fuse.

Scientists are pursuing two main approaches to doing this, with very different physics, data, and AI opportunities. Magnetic confinement fusion uses massive magnets, while inertial confinement fusion uses high-energy lasers.[3] We focus this data stocktake on magnetic confinement, as the UK’s STEP project is pursuing that approach, as are Tokamak Energy, the UK’s leading fusion power startup, and Google DeepMind’s fusion team.

The end of the line for fusion is now getting closer, for two reasons. First, the underlying technology landscape has changed. In addition to AI, the discovery of high-temperature superconducting magnets makes it easier to build smaller and potentially cheaper reactors. Second, fusion has traditionally relied on government funding. But in the past five years, a wave of private investment has arrived, with more than 30 companies now pursuing fusion power.

These shifts have injected welcome momentum into the field, but also significant hype. In response, we need a clear view on the primary bottlenecks that AI can address.

II. How to accelerate fusion with AI

To create fusion, scientists and engineers need to predict, control and understand how plasma behaves. The challenge is that plasmas are highly complex and much of their underlying physics—from fluid dynamics to electromagnetics—remains poorly understood.

To make progress, scientists run experiments that create plasmas in a reactor, and use sensors to measure their properties under different conditions. Scientists use these experiments to validate their theories, reveal unexpected phenomena, and test the hardware needed for power-plant-class devices. However, building fusion reactors is extremely expensive and so few machines exist, with most researchers running their experiments at just ~10 leading facilities worldwide. When they can get access to such a facility, scientists must decide how to design the optimal experiment, including how to toggle an array of possible parameters, from the electrical current in a reactor’s coils to the valves that control the gas levels.

Fusion scientists also run computer simulations, including to help design and interpret these costly experiments. This is also challenging, as researchers must simulate a diverse range of phenomena, at very different scales, from the tiny, lightning-fast movements of electrons to the larger, slower evolution of the entire plasma. Even on massive supercomputers, a single simulation may take weeks. For scientists without such resources, it may take many months. As a result, scientists make trade-offs, using assumptions and approximations to run their simulations more quickly and cheaply, but also less accurately.

The challenges don’t stop there. Scientists know that their theories, simulations and experiments are imperfect. But when a gap emerges between what a simulation suggests and what an experiment reports, it is often unclear where exactly the issue lies.

AI can help in four main ways.[4]

1. Improve simulations

Scientists can develop “AI surrogate” models that emulate the predictions from a fusion simulation code, at a fraction of the cost and time. To do so, they run a code many times, varying the input parameters each time. They then use the resulting dataset to train an AI model to predict the outputs of interest much more quickly.
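The workflow described above can be sketched in a few lines. This is a toy illustration only: `toy_simulation` is a made-up stand-in for an expensive simulation code, its input names and formula are assumptions, and the nearest-neighbour surrogate is chosen for brevity rather than because fusion researchers use it.

```python
import math
import random

def toy_simulation(heating_power, density):
    """Hypothetical stand-in for a slow simulation code (formula is invented)."""
    return math.sqrt(heating_power) * density / (1.0 + 0.1 * density ** 2)

def build_dataset(n_runs, seed=0):
    """Run the 'code' many times, varying the input parameters each time."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_runs):
        inputs = (rng.uniform(1.0, 10.0), rng.uniform(0.5, 5.0))
        dataset.append((inputs, toy_simulation(*inputs)))
    return dataset

def surrogate_predict(dataset, inputs, k=5):
    """Cheap surrogate: inverse-distance-weighted mean of the k nearest runs."""
    nearest = sorted(dataset, key=lambda row: math.dist(row[0], inputs))[:k]
    weights = [1.0 / (math.dist(x, inputs) + 1e-9) for x, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

dataset = build_dataset(n_runs=5000)
query = (4.0, 2.0)
# Compare the "expensive" answer with the surrogate's estimate.
print(toy_simulation(*query), surrogate_predict(dataset, query))
```

In practice the surrogate would typically be a trained neural network or similar model, and the hard part is generating a representative dataset from the real code, not fitting the model.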

Scientists have already shown that AI surrogates can make simulations faster. Moving forward, AI surrogates could make simulations more useful. First, scientists could develop AI surrogates for more accurate, but computationally expensive simulation codes. Second, they could develop ‘integrated models’, like TORAX, to stitch together AI surrogates for different phenomena, from the ‘turbulence’ that determines how well confined a plasma is, to the ‘scrape-off’ layer where the plasma meets the reactor’s wall. Finally, scientists could move beyond producing one-off AI surrogates that result in a paper and some code, to a world where surrogates are documented, maintained and ready for use in fusion reactors.

2. Improve experiments and operate the reactor

In most fusion experiments, scientists must decide if and how to tune various parameters, while striking a balance between more proven and novel settings. To help, researchers can use AI to predict the optimal parameters for their next experiment by learning from past ones; and to predict how well their experiments will fare. More recently, scientists have also started querying LLMs to check and refine their experimental protocols.

Scientists also use AI to predict the plasma ‘disruptions’ that frequently end experiments, damage machines and are one of the biggest obstacles to a future power plant. AI models can already predict past plasma disruptions with high accuracy. But predicting future disruptions, on more powerful machines, quickly enough to stop them, is an open research challenge.

The ultimate goal is to use AI to help operate the reactor itself. Fusion reactors run on a real-time feedback loop: sensors monitor the plasma, while the actuators, such as the magnetic coils, are adjusted accordingly. The traditional control algorithms used to enable this often struggle with the chaotic, non-linear nature of millions of plasma variables interacting.
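To make the feedback loop concrete, here is a minimal sketch of a traditional controller acting on a toy system. Everything in it is an assumption for illustration: the one-variable ‘plant’, the gains `kp` and `ki`, and the time step bear no relation to a real reactor, where many coupled, non-linear variables are involved.

```python
def run_feedback_loop(target, steps=1000, dt=0.01, kp=2.0, ki=5.0):
    """Proportional-integral control of a toy first-order system."""
    state = 0.0       # the quantity a sensor would measure
    integral = 0.0    # accumulated error over time
    for _ in range(steps):
        error = target - state                  # sensor reading vs. target
        integral += error * dt
        command = kp * error + ki * integral    # actuator command
        # Toy plant: the state decays on its own and is pushed by the actuator.
        state += dt * (-state + command)
    return state

print(run_feedback_loop(target=1.0))
```

A linear controller like this works when the system is simple and well understood. The appeal of learned control policies is that they may cope better when those assumptions break down.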

In recent years, researchers have demonstrated how reinforcement-learning agents can learn more effective control policies, including to reduce plasma disruptions. To help these RL agents generalise to novel scenarios and reactors beyond their training data, scientists are developing ‘hybrid approaches’ that integrate some knowledge of physics into the models.

3. Improve fusion data

Fusion experiments are extreme environments. The intense heat and the chaotic nature of the plasma mean that the data that sensors pick up is often noisy or low quality. Some variables cannot be directly measured, and must be inferred, introducing additional sources of error.

Scientists are training AI models to extract clean signals from this noisy data and to learn correlations that allow them to predict data for one sensor, given data for others, a capability that could be critical if sensors in a future reactor get damaged. Scientists are also training AI surrogate models that speed up, and better calibrate, reconstructions of the plasma, using the limited experimental data that is available.
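As a rough illustration of predicting one sensor from others, the sketch below fits a least-squares model on synthetic data. The three ‘sensors’, the 0.7 and 0.3 mixing weights, and the noise level are all invented for the example; real diagnostics are far noisier and their relationships are rarely linear.

```python
import random

def fit_two_sensor_model(sensor_a, sensor_b, target):
    """Least-squares fit of target ~ w_a * a + w_b * b (no intercept)."""
    saa = sum(a * a for a in sensor_a)
    sbb = sum(b * b for b in sensor_b)
    sab = sum(a * b for a, b in zip(sensor_a, sensor_b))
    sat = sum(a * t for a, t in zip(sensor_a, target))
    sbt = sum(b * t for b, t in zip(sensor_b, target))
    det = saa * sbb - sab * sab          # determinant of the normal equations
    w_a = (sat * sbb - sbt * sab) / det
    w_b = (sbt * saa - sat * sab) / det
    return w_a, w_b

def predict(weights, a_value, b_value):
    """Estimate the third sensor from the two that are still working."""
    return weights[0] * a_value + weights[1] * b_value

rng = random.Random(0)
sensor_a = [rng.uniform(0.0, 1.0) for _ in range(500)]
sensor_b = [rng.uniform(0.0, 1.0) for _ in range(500)]
# Synthetic third sensor: a noisy mix of the other two (assumed relationship).
sensor_c = [0.7 * a + 0.3 * b + rng.gauss(0.0, 0.01)
            for a, b in zip(sensor_a, sensor_b)]

weights = fit_two_sensor_model(sensor_a, sensor_b, sensor_c)
print(weights, predict(weights, 0.5, 0.5))
```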

Scientists often care less about the raw data from their experiments, and more about important events, such as when a disruption to the plasma began. Today, they often need to manually inspect graphs and plots to detect these events. AI can help to automate parts of this process and to detect events that scientists may have missed.
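A very simple version of automated event detection is a rule that flags sudden drops in a signal. The sketch below is a hand-written baseline, not an AI model, and the signal values and threshold are made up. It is the kind of rule that learned detectors aim to improve on.

```python
def find_event_onsets(signal, drop_threshold):
    """Return indices where the signal starts falling by more than
    drop_threshold between consecutive samples."""
    onsets = []
    in_event = False
    for i in range(1, len(signal)):
        falling_fast = (signal[i - 1] - signal[i]) > drop_threshold
        if falling_fast and not in_event:
            onsets.append(i)      # first sample of a new rapid drop
        in_event = falling_fast
    return onsets

# Toy trace (arbitrary units): steady, then a rapid collapse.
trace = [1.0, 1.0, 0.99, 1.0, 0.6, 0.2, 0.0, 0.0]
print(find_event_onsets(trace, drop_threshold=0.15))
```

A fixed threshold will miss slow-onset events and fire on sensor glitches, which is one reason annotated datasets and the context held in logbooks matter.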

4. Improve the underlying technologies

Achieving fusion will require a supply chain rich in technologies that could be applied more broadly. AI could help to accelerate their development.

For example, the chamber walls in a fusion reactor will require new materials that can withstand extreme temperatures. Scientists are training AI surrogate models that speed up the simulations needed to assess a candidate material’s real-world properties, like how strong or resistant to radiation it will be over its lifetime.

A typical fusion reactor also spends much of its time out of operation, at great cost. This makes fusion a logical place to develop predictive maintenance techniques that ingest historical data from sensors and train AI models to learn the subtle signatures that indicate pending breakdowns, allowing practitioners to schedule maintenance or design more reliable systems.

III. The challenges with fusion data

As they pursue these AI opportunities, scientists will need access to three main kinds of fusion data: from experiments, simulations, and sources that are not traditionally available, such as researchers’ logbooks. There are promising efforts underway on this front, but many obstacles.

1. Experimental data: Unvalidated, single-machine and hard to access

Experimental data is the ‘ground truth’ that the sensors in reactors pick up, from line graphs to videos. In magnetic confinement fusion, the challenge is not so much a lack of data, but an excess of raw data that has not gone through the processing needed to make it useful to AI. This processing ranges from addressing noise and imperfections in the underlying sensors, to detecting and annotating important events, such as plasma disruptions.

Currently, the community has to rely on the small well-validated datasets that do exist, which may be as few as a few hundred or thousand experimental ‘shots’, or individual test runs of a reactor. The high cost of fusion experiments also creates a natural incentive to pursue experiments that will not fail. This curtails more novel research and means that much of the resulting data sits in a similar ‘parameter’ space and does not represent the full range of plasma dynamics that scientists want to model.

This experimental data is also not generally available open source or for commercial use. One promising initiative to change this, which several interviewees cited, is UKAEA’s project to open source data from their MAST facility.

However, to develop more general AI models, researchers want multi-machine databases that extend beyond a single facility like MAST. To that end, the IAEA is developing a federated Fusion Data Lake where different institutions would store their data locally but make it accessible via a central data catalog. One challenge with this approach is that fusion facilities have defined fusion variables and stored data in different ways. The Integrated Modelling & Analysis Suite, or IMAS, addresses this by providing a standardised ontology and set of structures for fusion data. It is nascent, but has positive momentum.

2. Simulation data: No incentives, process, or place to host it

In theory, researchers should be able to run fusion simulation codes many times and train AI surrogate models on the resulting data to reproduce the outputs at a fraction of the cost. In practice, most scientists run a simulation to answer a single, narrow physics question. They do not run a large number of simulations to build representative datasets to train AI surrogates, which is a very different activity.

That activity is also a hard one. There is no standard procedure to follow to generate a dataset for training an AI surrogate model, and the codes are often finicky to use. Most simulation codes contain ‘free parameters’—knobs that scientists must decide how to best tune—a practice that can be as much an art as a science. The datasets can also be huge and there is no obvious location to store them, although some early examples exist.

3. Dark data: Nascent, IP issues, and hard to integrate into workflows

‘Dark data’ describes the contextual information that scientists generate that is not captured in structured datasets. This includes notes scribbled in experimental logbooks, where scientists describe the procedures they ran, the hardware issues they faced, and the phenomena they observed. For simulations, it includes the many nuances needed to run and interpret a code’s results successfully, and the many undocumented imperfections to be aware of.

Accessing this dark data could help ensure that AI systems do not focus on the wrong things—for example, when an anomaly in the data is caused by an equipment failure or error, rather than a meaningful phenomenon. It could also provide AI with a window into the entire research process, including its many dead-ends, rather than just the final result.

Researchers are using LLMs to try to make dark fusion data accessible, for example by enabling scientists to query experimental logs and archive documents. But much of the data is not well-annotated, there are IP issues in accessing it, and it is not yet clear how to integrate the data into practitioners’ daily workflows.

The three ‘debts’ holding fusion data back

Many of these challenges with fusion data result from three underlying issues, which have compounded over time into systemic debts that inhibit the use of AI today.

1. Technical debt

The fusion community has traditionally had to prioritise getting large, complex machines to work, rather than building infrastructure to collect, curate, and share data. As a result, activities like data annotation and writing high-quality code are underfunded. Many leading fusion codes were created decades ago and have evolved slowly, while the quality of experimental data is limited by the capabilities of the sensors available.

2. Bureaucratic debt

The large costs of fusion experiments and the traditional reliance on government funding mean that many fusion projects have a complex web of owners and collaborators, which can make agreeing on new data initiatives difficult. For example, JET was sponsored and funded by Euratom, the EU’s nuclear research community. Its scientific exploitation was managed by EUROfusion, a pan-European network of fusion research labs. UKAEA managed engineering and operations. Releasing its data may require agreement from all of these actors.

There are other bureaucratic hurdles too. Scientists who run fusion experiments often want an embargo period on the resulting data so that they can prepare a publication. Such embargoes are rational, common in science, and largely supported, but many interviewees felt that they had become too long. Fusion data is also subject to diverging open-source policies. For example, the MAST experiment was funded by UK Research and Innovation, which has strong open data requirements. The follow-up MAST-U experiment is funded by the UK Department for Energy Security and Net Zero, which does not have the same policies. Many fusion companies also do not open source their data.

3. Human and cultural debt

The fusion community does not have enough software engineers and experts who are able to clean data, attach confidence levels, and curate it for AI use. As a result, physicists must take on many tasks that are outside their core areas of expertise, including writing high-quality code.

This issue is compounded by a research culture that inhibits data sharing. Scientists are constantly pushed to move on to the next experimental campaign, rather than to validate older data. This stops some scientists from sharing their data, either because they fear that end users will not appreciate the resulting gaps and will do bad science with it, or because they fear that they themselves will be criticised for releasing ‘unscientific’ data.

IV. Recommendations

Below we provide eight recommendations to address these data limitations and accelerate fusion with AI. Each project could be led by a mix of government bodies and funders, like the Department for Science, Innovation and Technology and UK Research and Innovation; public research organisations, like the UK Atomic Energy Authority; companies; universities; and philanthropies. Where possible, the UK should look to collaborate internationally, for example with the US Genesis Mission and the International Atomic Energy Agency.

1. Strengthen the UK’s lead in open fusion data

Expand FAIR MAST, the UK’s pioneering open sourcing of experimental data from its MAST facility, by adding data from the follow-up MAST-U facility and making the user interface more accessible. This will require the UK Department for Energy Security and Net Zero clarifying that open data policies apply to MAST-U, funding at least five data engineers over a two-year time period, and ensuring that the project has sustainable compute and data storage.

2. Liberate 40 years of data from the Joint European Torus

Launch a project to open source at least 30% of JET experimental data by 2028. First, this will require agreement on what data to release. For example, should the project only release validated, curated data relating to notable discoveries? Or should it also release data that is raw, validated only in part, or which relates to ‘normal’ machine behaviour? Second, and much harder, will be securing agreement from all relevant institutions to release the data.

3. Launch a competition to predict plasma disruptions

Fund a competition to see which AI model can best predict future plasma disruptions in new experimental campaigns, building on early work in this space. This could include funding dedicated experimental shots on machines such as MAST-U, to evaluate models on challenging edge cases. Beyond accuracy, sub-competitions could evaluate models on important questions, such as: Can the model make predictions with little data, such as when sensors become damaged? Can the model predict disruptions across different reactors? Can the model predict disruptions with sufficient lead time to prevent them? And can the model shed new light on why disruptions are occurring?

4. Prototype the future of AI-enabled scientific data curation

Expand the platform that UKAEA recently developed to enable human experts to use AI to annotate experimental data, by adding data from other fusion facilities; increasing the complexity and variety of the metadata that is captured; and training AI models to directly annotate an increasing share of this data.

5. Make leading simulation codes AI-ready

Launch an effort to modernise priority fusion simulation codes, including to make it easier to train AI surrogate models based on them. This could build on early efforts in this space and target codes, such as JINTRAC, which are important to the UK’s proposed STEP Fusion power plant and the international ITER effort. The project could start by modernising the codes’ documentation and ‘refactoring’ them so that they are compatible with modern chips, like GPUs and TPUs, and allow for parallel data generation. It could then open source the codes, with a plan for how to maintain them. Throughout the modernisation process, it could test the usefulness of AI coding tools for the tasks at hand.

6. Demonstrate a new state-of-the-art for AI surrogate models

Fund small teams of software engineers and experts to develop AI surrogate models of important, computationally expensive phenomena in fusion simulations. The project should ensure that all newly created surrogates have state-of-the-art documentation, data provenance and version control. It should release the data used to train and validate the surrogates and develop software pipelines to automate time-intensive aspects, such as organising the data.

7. Use AI agents to preserve expert fusion knowledge for the future

Gather a group of leading experts on a priority fusion simulation code, and equip them to use AI agents to make the tacit knowledge involved in running that code available to the wider research community. To do so, the experts could task the agent with running the code. As it seeks to execute, the agent would have an ‘internal monologue’ that the experts could trace, steer and intervene on. The end result would be a series of documents, such as markdown files, that capture the important dark data needed to run the code well.

8. Create Fusion-Bench to measure and drive LLM performance

Assign leading fusion experts to create an evaluation metric to quantify how well leading large language models understand core fusion concepts. This would make it easier to improve the usefulness of LLMs for downstream tasks in fusion. This evaluation will be more difficult to create than in disciplines like maths or computer science, where it is easier to automatically verify a model’s performance. But the experts could determine the most useful approach, which will likely involve a combination of question-answering and task performance.

V. Six open debates

The experts we interviewed disagreed on some points. Despite the framing below, few are either/or debates. Rather, most are about relative degrees of emphasis.

Incrementalism vs novelty: Should we build on the early AI opportunities that fusion practitioners have already showcased? Or pursue more novel, uncertain AI ideas, such as training general-purpose ‘fusion foundation models’ or using AI ‘world models’ to pursue new kinds of fusion simulations?
The past vs the future: Should we strive to get as much value as possible out of older fusion data, like JET? Or, do the costs mean that we should accept our losses, and focus on making future fusion experiments AI-ready?
Science vs engineering: Are efforts to validate, annotate and standardise data part of an ultimately doomed quest for perfect scientific understanding in fusion? Should we instead use AI to embrace a more engineering-led approach that can get the machines to work with noisy, imperfect, data?
Domestic vs international: Should the UK rejoin ITER, the world’s flagship international fusion collaboration, which it left following Brexit? Or should the UK focus on domestic efforts, perhaps in collaboration with priority partners, like the US and IAEA?
Magnetic vs alternatives: Should the UK continue to focus on magnetic confinement fusion as the most realistic pathway to a future power plant? Is magnetic also a better bet for AI because it produces much more data and doesn’t have the same associations with the security establishment, which makes data access easier? Or should the UK invest more in inertial confinement and alternative fusion efforts, given the country’s diverse academic expertise, its historically strong relationship with the US National Ignition Facility, and notable assets, such as a world-leading laser?
Public vs private: Should the UK government try to derive more immediate value from its fusion data? For example, should the UK license some data to companies, to cover the costs of data processing and annotation? If so, should local startups pay less? Or would such efforts hurt the UK’s goal of developing a world-leading fusion sector?

_________________

This essay was originally posted on the Google DeepMind website and is a summary of a 20-page report that contains more details and examples.

Thank you to the following experts who let us interview them, reviewed the draft, and/or provided other support, as well as those who prefer to remain anonymous. All mistakes belong to the authors and no expert spoke to us on behalf of their organisation.

Jonathan Citrin, Brendan Tracey, Cristina Rea, Nathan Cummings, Andrea Murari, Jess Montgomery, George Holt, Alain Becoulet, Matteo Barbarino, Arthur Turrell, Adriano Agnello, David Dickinson, Steven Rose, Alessandro Pau, Kristina Fort, Charles Yang, Federico Felici, Tim Dodwell, Sam Vinko, Aidan Crilly, Lee Margetts, Tom Westgarth, Lorenzo Zanisi, Chris Packard, Justin Wark and Stanislas Pamela.

[1] Fusion has several characteristics that make an AI data stocktake exercise tractable, including a relatively small and centralised research community and early efforts to build on, like the open-source FAIR MAST initiative and the IMAS data standardisation effort. Fields like genomics, weather forecasting, and food security look quite different, and so careful thought is needed on how to best scope AI data stocktakes in these fields. Nevertheless, we think they would be useful.

[2] There are caveats to the claim that fusion power would be essentially limitless, emission-free, and perfectly safe. One of the input fuels, tritium, is not widely available and scientists will need to use nascent ‘blankets’ to breed it from lithium. Certain parts of fusion reactors will become radioactive over time, although they can likely be recycled after ~50 years. Thermonuclear weapons use fusion reactions. However, the weapons first require fission reactions and fissile materials like enriched uranium and plutonium.

[3] There are other approaches to inertial confinement fusion that do not use lasers.

[4] For more in-depth reviews of AI for fusion opportunities, see publications from MIT, the Clean Air Task Force, IAEA, FusionFest, and the US Department of Energy.

Science 2030

Zoë Brammer — Thu, 09 Oct 2025 08:02:20 GMT

You want to explore how advanced AI systems, when applied to science research, might impact the world, through 2030. Do you:

A. Publish a report with trends and forecasts?
B. Host a roundtable discussion?
C. Convene a 5-hour roleplaying workshop built around serious gaming?

In this piece, Zoë Brammer and Ankur Vora from Google DeepMind, and Anine Andresen and Shahar Avin from Technology Strategy Roleplay, explain why they chose option C, and introduce Science 2030, a new foresight game. The authors explore lessons from a pilot with 30 experts from government, tech companies and the science community, explain why games are a useful foresight tool, and share some challenges in designing them.

Source: Venus Krier

The success of AI governance efforts will largely rest on foresight, or the ability of AI labs, policymakers and others to identify, assess and prepare for divergent AI scenarios. Traditional governance tools like policy papers, roundtables, or government RFIs have their place, but are often too slow or vague for a technology as fast-advancing, general-purpose, and uncertain as AI. Data-driven forecasts and predictions, such as those developed by Epoch AI and Metaculus, and vivid scenarios such as those painted by AI 2027, are one component of what is needed. Still, even these methods don’t force participants to grapple with the messiness of human decision-making in such scenarios.

Why games? Why science?

In The Art of Wargaming, Peter Perla tells us that strategic wargames began in earnest in the early 19th century, when Baron von Reisswitz and his son developed a tabletop exercise to teach the Prussian General Staff about military strategy in dynamic, uncertain environments. Today, ‘serious games’ remain best known in military and security domains, but they are used everywhere from education to business strategy.

In recent years, Technology Strategy Roleplay, a charity, has pioneered the application of serious games to AI governance. TSR’s Intelligence Rising game simulates the paths by which AI capabilities and risks might take shape, and invites decision-makers to role-play the incentives, tensions and trade-offs that result. To date, more than 250 participants from governments, tech firms, think tanks and beyond have taken part.

Building on this example, we at Google DeepMind wanted to co-design a game to explore how AI may affect science and society. Why? As we outlined in a past essay, we believe that the application of AI to science could be its most consequential. As a domain, science also aligns nicely with the five criteria that successful games require, as outlined in a past paper by TSR’s Shahar Avin and colleagues:

Many actors must work together: Scientific progress rests on the interplay between policymakers, funders, academic researchers, corporate labs, and others. Their varying incentives, timelines, and ethical frameworks naturally lead to tensions that games are well-placed to explore.
The option space is ambiguous: As organisations pursue and respond to scientific advances, there are many questions and few clear answers, making open-ended role-play ideal. Should a lab open-source a sensitive discovery, withhold it, or try to find a middle ground? Who and what should governments fund, and how much can they experiment in this?
Outcomes depend on multiple decisions: As actors make decisions, from launching research programmes to funding consequential datasets, this can kick off a cascade of downstream effects that are difficult to forecast, but which can be powerfully simulated through role-play.
Large changes are possible: In our past essay, we argued that AI could transform science by helping to tackle various challenges that are hindering scientists today, such as the ever-growing literature base and the increasing cost and complexity of experiments. As Shahar and colleagues explain, when such large, transformative shifts are expected to occur, extrapolation from historical trends can quickly become inadequate, making scenarios and role-play more useful.
Conflict is likely and instructive: With multiple actors with different goals, conflict about how to use AI in science is inevitable. Questions of prioritisation, resource allocation, process, culture, risks, ethics, and even what qualifies as ‘science’ will arise. This is not a drawback for a simulation: In fact, this ‘oppositional friction’ is essential to generating the kind of creative strategies and ‘edge-case’ scenarios that traditional forecasting methods might miss.

Setting the Stage: How does Science 2030 work?

We designed Science 2030 around four parameters: teams, timeline, action space, and assessment.

1. The teams

Each player represented a role in one of three camps: governments, companies, or the scientific community. For the government teams, we deliberately chose three ‘middle powers’ - the UK, France and Canada - so that players would have to navigate a world shaped by the US and China. To capture the dynamics within governments, we assigned each team a principal, a national security advisor, a science and technology advisor and an economic advisor, whose interests may naturally clash.

The company teams also embodied conflicting archetypes: a large tech company, an industrial conglomerate, and a fast-growing startup, all working to advance the use of AI in science while attending to their bottom line, image and impact on society. The scientific community teams were represented by a range of influential voices, including leading university researchers, philanthropists, and science communicators. Behind the scenes, an adjudication team acted as storytellers and referees.

2. The timeline

The game unfolded over a simulated five years, from 2025 to 2030: close enough to feel tangible and force players to grapple with today’s institutional realities and constraints, yet far enough away to allow for major breakthroughs and societal shifts. Each turn pushed the clock forward by one or two years. Before each turn, our adjudication team set the scene through a “state of the world” presentation, including the latest updates on the state of frontier AI development and AI for science technologies, as well as updates based on the decisions that players had taken in earlier turns. For example, the UK and Canada chose to partner on developing sovereign high-performance computing projects in turn 1, but Canada also developed a complementary National Data Library and a streamlined approach to governing these resources, and so the UK found itself lagging behind Canada on certain metrics in turn 2.

3. The action space

What did the players actually do? Our past essay identified a set of ‘ingredients’ that are necessary for the successful use of AI in science, such as datasets, compute, and clear research strategies. Based on this, we developed a set of 45 policy cards, outlining different interventions such as an AI grand challenges programme or new markets for sharing scientific data. Each round, the governments had to play 3 policy cards from 15 options. Companies also had to pick a policy card (e.g. offering compute credits). The interventions forced teams to weigh the merits of different strategic approaches, such as centralisation vs decentralisation, or precaution vs acceleration. The scientific community could not directly pursue interventions, but could influence governments and companies, who in turn could collaborate, or not.
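To give a rough sense of how large even this ‘closed’ action space is, the sketch below counts the distinct hands a government team could play in a single round. It assumes that the order in which cards are played does not matter and that a card cannot be played twice in the same round; the essay does not spell out either rule, and the constant and function names are ours, for illustration only.

```python
import math

# Figures taken from the game description above.
CARDS_OFFERED_PER_ROUND = 15
CARDS_PLAYED_PER_ROUND = 3


def possible_hands(offered: int, played: int) -> int:
    """Number of unordered selections of `played` cards from `offered` cards.

    Assumption: order is irrelevant and no card is repeated within a round.
    """
    return math.comb(offered, played)


hands = possible_hands(CARDS_OFFERED_PER_ROUND, CARDS_PLAYED_PER_ROUND)
print(f"Distinct hands per government team per round: {hands}")
```

Under these assumptions, each government team chooses among 455 possible hands per round, before accounting for company cards, negotiation between teams, or the narrative that players attach to their choices.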

4. Assessment

At the start of the game, we told players that we would use a set of metrics to measure each team’s progress, for example on prosperity, scientific prestige, social cohesion and national security for government teams, and headcount and degree of automation for company teams. After each turn, our adjudication team updated these metrics and shared them with the full group in the “state of the world” presentation. This way, the metrics helped teams track their status and choose their future actions.

We told players that all their actions would be considered as a whole, and that we would not necessarily specify the direct consequences of every action - a resolution approach called “net assessment” inspired by wargames run by TSR’s collaborator i3 Gen. For example, after pursuing a bullish, growth-focused strategy and without articulating any clear safeguards in turn 1, the UK saw a drop in its national security metric due to the heightened possibility of AI misuse at the start of turn 2.

We kept the adjudication process deliberately opaque, to reflect the fact that real-world actions usually involve uncertainty about consequences, and to ensure that players did not get caught up in trying to game the metrics to ‘win the game’. We also told players that their actions would be adjudicated more favourably if they had more support from other players, if their actions built on actions from previous turns, and if they presented compelling narratives when communicating their decisions to the room.

Insights from our first playtest

It is too early to draw firm conclusions on the future of AI in science from our first playtest, but it did yield some notable insights, particularly on the merits of games as a foresight methodology. All participants said the exercise was worth their time, 85% said they had gained a deeper understanding of future AI scenarios, and a similar share said the format yielded deeper insights than workshops or reports. Our first playtest also hinted at deeper insights that might emerge from playing the game at scale, and design changes to consider.

Coherence vs path dependency

At the end of the playtest, the facilitation team asked a simple question: which country would players choose to live in? An overwhelming majority felt the choice was easy: Canada. Why? In turn 1, team Canada began with some foundational policies, including a national data library, tax incentives for AI adoption and sovereign compute for priority projects. This evolved over subsequent turns into a coherent, collaborative approach that prioritised safety.

By contrast, team France began with a moderate state-led industrial policy that escalated into an aggressive pursuit of state control over AI development. This created a future that was perceived as volatile and authoritarian, with the state ultimately nationalising compute and AI companies. Team UK also followed an unpredictable path, beginning with a pragmatic, growth-focused strategy before pivoting throughout the game towards a seemingly disconnected set of grand, populist policies like AI as a human right and a unilateral sovereign AGI project. This raises a broader question: In a rapidly evolving tech landscape, how do you retain flexibility and avoid harmful path dependency, while still pursuing a strategy that is sufficiently stable and coherent?

Imagining radically different futures

In the first turn, most government teams selected relatively familiar, grounded policies, such as a national data library or advance purchase commitments for AI applications. However, by the final turn they were opting for more radical ideas such as a UK/Canadian citizen compute entitlement and a UK initiative to guarantee AI as a human right for all citizens. This was partly in response to new technology developments, which raised the stakes. For example, after ignoring more precautionary policies in turn 1, the imminent feasibility of developing AGI led Canada to make a mandatory compute off-switch its flagship policy in turn 3, in part as a reaction to the pressures of the scientific community, who stated that they “absolutely need to see that”. Similarly, France chose to nationalise AI companies in turn 3 to “address citizen concerns”. By immersing players directly into a rapidly-accelerating future, the game forced them to move beyond more incremental policy tools and grapple with the bolder interventions that the future evolution of AI might demand.

Decision-making in the fog

During the debrief session at the end of the game, one player noted, “I felt like every turn…we were running out of time…it felt like in the chaos…you lose some value in the decision making process.” As the debrief unfolded, players agreed that this feeling of chaos is actually an accurate simulation of the high-pressure reality leaders will face. The game’s chaotic, time-pressured environment was a feature rather than a bug: it simulated the ‘fog’ of real-world policymaking, leading to rushed, imperfect decisions with cascading consequences.

Roads Not Taken: Questions for designing a game

Designing a game like Science 2030 requires balancing a huge number of parameters. Opening some doors inevitably closes others. We feel strongly about some of the choices we made, while others were simply a matter of going for the most promising of the available options.

1. Who is in the room?

It would be easier to run the game with fewer players and with less specific roles, but that would also sacrifice the nuances and realism that this diversity brings. When recruiting players, we sought to balance availability with expertise: junior experts were generally more available and brought priceless enthusiasm, while the mere presence of some senior leaders added significant value for all participants. One conclusion from the playtest was that expertise and experience are critical to the richness of the scenarios that players create, particularly for an untested game, and so we plan to invest more in participant recruitment for future iterations. From a practical perspective, assigning participants to roles that best leverage their expertise takes time, turning what might be akin to sending an event invite into more of a casting assignment.

2. How long should the game be?

At 5 hours, our game was a lengthy commitment. It could potentially be shorter, or more likely (a lot) longer. In choosing a duration, we faced a clear trade-off between player availability and the depth of the scenarios created. TSR’s previous experience running Intelligence Rising at varying lengths showed that it’s challenging to ensure sufficient immersion with games unless they last more than 3 hours. We did accommodate players who could only make a few hours, but opted for a half-day experience. Still, to do true justice to the scope it attempted to cover, the game would have needed to be much longer: in the military, some wargames last for days or weeks.

3. What options are available to different teams?

A game must set rules about what different actors are “allowed” to do. Our game conceived of companies and the scientific community primarily as stakeholders that inform, support, or object to actions that governments take. This was a flattened representation of reality, made for simplicity’s sake. In turn 1, a risk-focused scientist lamented, “We were very keen on some of these precautionary approaches... but we have not had the opportunity [to implement them].” While the game accurately simulated a common feeling in the scientific community - intellectual influence but a frustrating lack of power to enact policy - we think we stripped out too much from the processes we seek to simulate.

A player on the science team articulated the issue perfectly: “one strange aspect is that [the game portrayed] the scientific community [as] kind of like tastemakers or something, but we don’t actually know anything [within the game] that anyone else doesn’t know.” This turned out to be one of the key discoveries from the playtest: We should find a way for the scientific community and companies to directly drive the AI for science technologies that are part of the scenarios. This is something we’re excited to delve into in future design iterations of the game.

4. How open-ended is players’ action space?

We decided to go with a fairly ‘closed’ game, where players chose from a set of pre-specified interventions, or policy cards. Instead, we could have given players more freedom to create their own policies. Doing so would have obvious merits in terms of spurring creativity, but would place a very high burden on adjudicators to maintain the rigour of the exercise. They would have to constantly communicate to players which policy actions and cascading consequences were coherent and realistic contributions to the world model, and which were not.

Our team also felt that when most of the insight and expertise on a topic is highly concentrated, as is arguably the case with advanced AI systems, this calls for a more closed game. A closed approach allowed our team to inject policy options that were more disruptive or novel, and could stretch participant imaginations, but still make sense within the context of the game. To ensure we covered contrasting ideas and assumptions, we conducted pre-game workshops and interviews to inform our design. During the game, players also had space to share nuances on their policy proposals. This allowed us to constrain the game enough that players had a shared sense of purpose, while still allowing for discretion, judgement and creativity. Ultimately though, the optimal blend of closed and open decision-making remains an open design question.

5. How much leverage do players get over the wider world?

Related to the degree of openness, we thought about how much insight and control players should have over the broader world in which they are playing. It’s fun to play a game where everything can be controlled, but not necessarily realistic or fruitful. In practice, the resource-intensive nature of the AI scaling paradigm and the existing geopolitical and scientific reality (where countries like the US and China have outsized roles) could lead to scenarios that leave some participants with very limited room to maneuver compared to others. While some countries do play an outsized role, our team took the view that it is important and interesting to explore the action space of middle power governments, as it is far from settled what they can achieve.

On the other hand, given time constraints, we made a strategic decision to downplay other ‘real-world’ constraints that normally affect policy decisions. For example, in the debrief session, one player noted that the game didn’t include “the general public” beyond the advocacy of the science communicators and the background information that adjudicators included in their “state of the world” presentation. Another highlighted the need to model “misinformation” and how policies can “easily become unpopular, or twisted”. Removing and downplaying these aspects created a ‘technocrat’s bubble’ where players could make what they felt were ‘optimal’ decisions while simultaneously recognising their real-world impossibility. Decisions like France’s AI nationalisation were strategically clean in the game but would be politically fraught in reality.

What’s next?

We plan to run additional Science 2030 games with more senior audiences and in different geographies in the coming year. We are also interested in bringing the merits of serious role-play games, such as encouraging more empathy between stakeholders and modeling chains of decisions, to other domains such as AI and education.

We are also considering potential design changes. One challenge with an AI for science game is how much to focus on ‘core AI’ developments and policies, versus those that are more specific to the ‘science’ ecosystem. In future iterations, we would like to go deeper on the latter. The current approach also treats policy interventions as discrete choices, giving little room for teams to adapt the proposed action or make choices about how to implement it better. There are clear reasons for this, but we are interested in exploring ways to add more room for nuance.

Finally, we are curious about the potential role that AI itself could play in designing and implementing games like this. Recent breakthroughs like Concordia show how LLMs can model human agents and create nuanced scenarios. We are excited about the potential to use AI to create novel or unappreciated feedback loops between the policy actions that players take and the simulated worlds that result. If you have ideas or work planned in this space, please reach out at aipolicyperspectives@gmail.com.

The authors would like to thank colleagues at Google DeepMind and external participants for participating in the elicitation sessions and early playtests. Their engagement, feedback and openness were central to this project. Thank you in particular to: Sara Babahami (UK Government Office for Science), Anastasia Bektimitova (the Entrepreneurs Network), Charlie Bradbury (Deloitte), Uma Kalkar (OECD, GovAI), Nick Moës (The Future Society), David Norman (Cooperative AI Foundation), Alex Obadia (ARIA), Toby Ord (Oxford Martin AI Governance Initiative), Velislava Petrova (Centre for Future Generations), D Max Reddel (Centre for Future Generations), Pranay Shah (ARIA), Nicole Wheeler (ARIA).

A new golden age of discovery

AI Policy Perspectives — Tue, 26 Nov 2024 10:20:49 GMT

This essay is written by Conor Griffin, Don Wallace, Juan Mateos-Garcia, Hanna Schieve and Pushmeet Kohli. Subscribe to receive future essays and leave a comment below to give us your take on the most important AI for Science opportunities, ingredients, risks and public policy ideas. This essay was originally posted on the Google DeepMind website.

Introduction

A quiet revolution is brewing in labs around the world, where scientists’ use of AI is growing exponentially. One in three postdocs now use large language models to help carry out literature reviews, coding, and editing. In October, the creators of our AlphaFold 2 system, Demis Hassabis and John Jumper, became Nobel Laureates in Chemistry for using AI to predict the structure of proteins, alongside the scientist David Baker, for his work to design new proteins. Society will soon start to feel these benefits more directly, with drugs and materials designed with the help of AI currently making their way through development.

In this essay, we take a tour of how AI is transforming scientific disciplines from genomics to computer science to weather forecasting. Some scientists are training their own AI models, while others are fine-tuning existing AI models, or using these models’ predictions to accelerate their research. These scientists are using AI as a scientific instrument to help tackle important problems, such as designing proteins that bind more tightly to disease targets, but are also gradually transforming how science itself is practised.

There is a growing imperative behind scientists’ embrace of AI. In recent decades, scientists have continued to deliver consequential advances, from Covid-19 vaccines to renewable energy. But it takes an ever larger number of researchers to make these breakthroughs, and to transform them into downstream applications. As a result, even though the scientific workforce has grown significantly over the past half-century, rising more than sevenfold in the US alone, the societal progress that we would expect to follow has slowed. For instance, much of the world has witnessed a sustained slowdown in productivity growth that is undermining the quality of public services. Progress towards the 2030 Sustainable Development Goals, which capture the biggest challenges in health, the environment, and beyond, is stalling.

In particular, scientists looking to make breakthroughs today increasingly run into challenges relating to scale and complexity, from the ever-growing literature base they need to master, to the increasingly complex experiments they want to run. Modern deep learning methods are particularly well-suited to these scale and complexity challenges and can compress the time that future scientific progress would otherwise require. For instance, in structural biology, a single x-ray crystallography experiment to determine the structure of a protein can take years of work and cost approximately $100,000, depending on the protein. The AlphaFold Protein Structure Database now provides instant access to 200 million predicted protein structures for free.

The potential benefits of AI to science are not guaranteed. A significant share of scientists already use LLM-based tools to assist with everyday tasks, such as coding and editing, but the share of scientists using AI-centric research approaches is much lower, albeit rising rapidly. In the rush to use AI, some early scientific use cases have had questionable impact. Policymakers can help accelerate AI's use and steer it towards higher-impact areas. The US Department of Energy, the European Commission, the UK’s Royal Society, and the US National Academies, among others, have recently recognised the AI for Science opportunity. But no country has yet put a comprehensive strategy in place to enable it.

We hope our essay can inform such a strategy. It is aimed at those who make and influence science policy and funding decisions. We first identify five opportunities where there is a growing imperative to use AI in science and examine the primary ingredients needed to make breakthroughs in these areas. We then explore the most commonly-cited risks from using AI in science, such as to scientific creativity and reliability, and argue that AI can ultimately be net beneficial in each area. We conclude with four public policy ideas to help usher in a new golden age of AI-enabled science.

Throughout the essay we draw on insights from over two dozen interviews with experts from our own AI for Science projects at Google DeepMind, as well as external experts. The essay naturally reflects our vantage point as a private sector lab, but we believe the case we make is relevant for the whole of science. We hope that readers will respond by sharing their take on the most important AI for Science opportunities, ingredients, risks and policy ideas.

Part A. The opportunities

Scientists aim to understand, predict, and influence how the natural and social worlds work, to inspire and satisfy curiosity, and to tackle important problems facing society. Technologies and methods, like the microscope, x-ray diffraction, and statistics, are both products of science and enablers of it. Over the past century, scientists have increasingly relied on these instruments to carry out their experiments and advance their theories. Computational tools and large-scale data analysis have become particularly important, enabling everything from the discovery of the Higgs boson to the mapping of the human genome. From one view, scientists’ growing use of AI is a logical extension of this long-running trend. But it may also signal something much more profound - a discontinuous leap in the limits of what science is capable of.

Rather than listing all areas where it is possible to use AI, we highlight five opportunities where we think there is an imperative to use it. These opportunities apply across disciplines and address a specific bottleneck, related to scale and complexity, that scientists increasingly face at different points in the scientific process, from generating powerful novel hypotheses to sharing their work with the world.

1. KNOWLEDGE: Transform how scientists digest and communicate knowledge

To make new discoveries, scientists need to master a pre-existing body of knowledge that continues to grow exponentially and become ever more specialised. This ‘burden of knowledge’ helps explain why scientists making transformative discoveries are increasingly older, interdisciplinary, and located at elite universities, and why the share of papers authored by individuals, or small teams, is declining, even though small teams are often better-placed to advance disruptive scientific ideas. When it comes to sharing their research, there have been welcome innovations such as preprint servers and code repositories, but most scientists still share their findings in dense, jargon-heavy, English-only papers. This can impede rather than ignite interest in scientists’ work, including from policymakers, businesses, and the public.

Scientists are already using LLMs, and early scientific assistants based on LLMs, to help address these challenges, such as by synthesising the most relevant insights from the literature. In an early demonstration, our Science team used our Gemini LLM to find, extract, and populate specific data from the most relevant subset of 200,000 papers, within a day. Upcoming innovations, such as fine-tuning LLMs on more scientific data and advances in long context windows and citation use, will steadily improve these capabilities. As we expand on below, these opportunities are not without risk. But they provide a window to fundamentally rethink certain scientific tasks, such as what it means to ‘read’ or ‘write’ a scientific paper in a world where a scientist can use an LLM to help critique it, tailor its implications for different audiences, or transform it into an ‘interactive paper’ or audio guide.

2. DATA: Generate, extract, and annotate large scientific datasets

Despite popular narratives about an era of data abundance, there is a chronic lack of scientific data on most of the natural and social world, from the soil, deep ocean and atmosphere, to the informal economy. AI could help in different ways. It could make existing data collection more accurate, for example by reducing the noise and errors that can occur when sequencing DNA, detecting cell types in a sample, or capturing animal sounds. Scientists can also exploit LLMs’ growing ability to operate across images, video and audio, to extract the unstructured scientific data that is buried in scientific publications, archives, and less obvious resources such as instructional videos, and convert it into structured datasets.

AI can also help to annotate scientific data with the supporting information that scientists need in order to use it. For example, at least one-third of microbial proteins are not reliably annotated with details about the function(s) that they are thought to perform. In 2022, our researchers used AI to predict the function of proteins, leading to new entries in the UniProt, Pfam and InterPro databases. AI models, once validated, can also serve as new sources of synthetic scientific data. For example, our AlphaProteo protein design model is trained on more than 100 million AI-generated protein structures from AlphaFold 2, along with experimental structures from the Protein Data Bank. These AI opportunities can complement and increase the return on other much-needed efforts to generate scientific data, such as digitising archives, or funding new data capture technologies and methods, like efforts underway in single cell genomics to create powerful datasets of individual cells in unprecedented detail.

3. EXPERIMENTS: Simulate, accelerate and inform complex experiments

Many scientific experiments are expensive, complex, and slow. Some do not happen at all because researchers cannot access the facilities, participants or inputs that they need. Fusion is a case in point. It promises an energy source that is practically limitless, emission-free and could enable the scaling of energy-intensive innovations, like desalination. To realise fusion, scientists need to create and control plasma - a fourth fundamental state of matter. However, the facilities needed are hugely complex to build. ITER’s prototype tokamak reactor began construction in 2013, but plasma experiments are not set to begin until the mid-2030s at the earliest, although others hope to build smaller reactors on shorter timelines.

AI could help to simulate fusion experiments and enable much more efficient use of subsequent experiment time. One approach is to run reinforcement learning agents on simulations of physical systems. Between 2019 and 2021, our researchers partnered with the Swiss Federal Institute of Technology Lausanne to demonstrate how to use RL to control the shape of plasma in a simulation of a tokamak reactor. These approaches could be extended to other experimental facilities, such as particle accelerators, telescope arrays, or gravitational wave detectors.

Using AI to simulate experiments will look very different across disciplines, but a common thread is that the simulations will often inform and guide physical experiments, rather than substitute for them. For example, the average person has more than 9,000 missense variants, or single letter substitutions in their DNA. Most of these genetic variants are benign but some can disrupt the functions that proteins perform, contributing to rare genetic diseases like cystic fibrosis as well as common diseases like cancer. Physical experiments to test the effects of these variants are often limited to a single protein. Our AlphaMissense model classifies 89% of the 71 million potential human missense variants as likely harmful or benign, enabling scientists to focus their physical experiments on the most likely contributors to disease.

AlphaMissense predicted the pathogenicity of all 71 million possible missense variants. It classified 89% of them, predicting that 57% were likely benign and 32% were likely pathogenic.

4. MODELS: Model complex systems and how their components interact

In a 1960 paper, the Nobel Prize-winning physicist Eugene Wigner marvelled at the “unreasonable effectiveness” of mathematical equations for modelling important natural phenomena, such as planetary motion. However, over the past half century, models that rely on sets of equations or other deterministic assumptions have struggled to capture the full complexity of systems in biology, economics, weather, and elsewhere. This reflects the sheer number of interacting parts that make up these systems, as well as their dynamism and potential for emergent, random or chaotic behaviour. The challenges in modelling these systems impede scientists' ability to predict or control how they will behave, including during shocks or interventions, such as rising temperatures, a new drug, or the introduction of a tax change.

AI could more accurately model these complex systems by ingesting more data about them, and learning more powerful patterns and regularities within this data. For example, modern weather forecasting is a triumph of science and engineering. For governments and industry, it informs everything from renewable energy planning to preparing for hurricanes and floods. For the public, the weather is the most popular non-branded query on Google Search. Traditional numerical prediction methods are based on carefully-defined physics equations that provide a very useful, yet imperfect, approximation of the atmosphere's complex dynamics. They are also computationally expensive to run. In 2023, we released a deep learning system that predicts weather conditions up to 10 days in advance, which outperformed traditional models on accuracy and prediction speed. As we expand on below, using AI to forecast weather variables could also help to mitigate and respond to climate change. For instance, when pilots fly through humid regions it can cause condensation trails that contribute to aviation’s global warming impact. Google scientists recently used AI to predict when and where humid regions may arise to help pilots avoid flying through them.

In many cases, AI will enrich traditional approaches to modelling complex systems rather than replace them. For example, agent-based modelling simulates interactions between individual actors, like firms and consumers, to understand how these interactions might affect a larger more complex system like the economy. Traditional approaches require scientists to specify beforehand how these computational agents should behave. Our research teams recently outlined how scientists could use LLMs to create more flexible generative agents that communicate and take actions, such as searching for information or making purchases, while also reasoning about and remembering these actions. Scientists could also use reinforcement learning to study how these agents learn and adapt their behaviour in more dynamic simulations, for example in response to the introduction of new energy prices or pandemic response policies.
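To make the traditional style concrete, the following minimal Python sketch shows an agent-based model in miniature, in which every consumer follows a rule fixed in advance by the modeller. It is an illustration only: the function name, the uniform distribution of willingness to pay, and the prices are all assumptions, and it is not drawn from the research described above.

```python
import random

def simulate_demand(price, n_consumers=1000, seed=0):
    """Count purchases when every consumer follows one pre-specified rule."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    # Assumption: each consumer's maximum willingness to pay is uniform on [0, 10].
    willingness = [rng.uniform(0, 10) for _ in range(n_consumers)]
    # Hard-coded behaviour: buy if and only if the price is affordable.
    return sum(1 for w in willingness if price <= w)

low_price_sales = simulate_demand(price=2.0)
high_price_sales = simulate_demand(price=8.0)
print(low_price_sales, high_price_sales)
```

The single line that encodes the buying rule is the part that a generative agent would replace with behaviour produced by a model, which is where the added flexibility would come from.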

5. SOLUTIONS: Identify novel solutions to problems with large search spaces

Many important scientific problems come with a practically incomprehensible number of potential solutions. For example, biologists and chemists aim to determine the structure, characteristics, and function(s) of molecules such as proteins. One goal of such work is to help design novel versions of these molecules to serve as antibody drugs, plastic-degrading enzymes or new materials. However, to design a small molecule drug, scientists face more than 10⁶⁰ potential options. To design a protein that is 400 amino acids long, with each position drawn from the 20 standard amino acids, they face 20⁴⁰⁰ options. These large search spaces are not limited to molecules but are commonplace for many scientific problems, such as finding the best proof for a maths problem, the most efficient algorithm for a computer science task, or the best architecture for a computer chip.
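To give a sense of scale, the protein figure can be checked with exact integer arithmetic. The short Python sketch below is illustrative only; the function name is our own, and it assumes 20 options at each of 400 positions.

```python
def count_digits(base: int, length: int) -> int:
    """Number of decimal digits in base**length, using exact integers."""
    return len(str(base ** length))

# 20 standard amino acids at each of 400 positions.
protein_digits = count_digits(20, 400)
print(protein_digits)  # 521 digits, i.e. roughly 10^520 sequences
```

For comparison, 10⁶⁰ has 61 digits, so the protein search space exceeds the small molecule one by around 460 orders of magnitude.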

Traditionally, scientists rely on some combination of intuition, trial and error, iteration, or brute force computing to find the best molecule, proof, or algorithm. However, these methods struggle to exploit the huge space of potential solutions, leaving better ones undiscovered. AI can open up new parts of these search spaces while also homing in more quickly on the solutions that are most likely to be viable and useful - a delicate balancing act. For example, in July, our AlphaProof and AlphaGeometry 2 systems correctly solved four out of six problems from the International Mathematical Olympiad, an elite high school competition. The systems make use of our Gemini LLM architecture to generate a large number of novel ideas and potential solutions for a given maths problem, and combine this with systems grounded in mathematical logic that can iteratively work towards the candidate solutions that are most likely to be correct.

AI scientists or AI-empowered scientists?

This growing use of AI in science, and the emergence of early AI scientific assistants, raises questions about how fast and how far the capabilities of AI may advance and what this will mean for human scientists. Current LLM-based AI scientific assistants make a relatively small contribution to a relatively narrow range of tasks, such as supporting literature reviews. There are plausible near-term scenarios in which they become better at these tasks and become capable of more impactful ones, such as helping to generate powerful hypotheses, or helping to predict the outcomes of experiments. However, current systems still struggle with the deeper creativity and reasoning that human scientists rely on for such tasks. Efforts are underway to improve these AI capabilities, for example by combining LLMs with logical deduction engines, as in our AlphaProof and AlphaGeometry 2 examples, but further breakthroughs are needed. Accelerating or automating experiments will also be harder for those that require complicated actions in wet labs, interaction with human participants, or lengthy processes, such as monitoring disease progression. Here too, work is underway in some of these areas, such as new types of laboratory robotics and automated labs.

Even as AI systems’ capabilities improve, the greatest marginal benefit will come from deploying them in use cases that play to their relative strengths - such as the ability to rapidly extract information from huge datasets - and which help address genuine bottlenecks to scientific progress such as the five opportunities outlined above, rather than automating tasks that human scientists already do well. As AI enables cheaper and more powerful science, demand for science and scientists will also grow. For example, recent breakthroughs have already led to a slew of new startups in areas like protein design, material science and weather forecasting. Unlike other sectors, and despite past claims to the contrary, future demand for science appears practically limitless. New advances have always opened up new, unpredictable regions in the scientific map of knowledge, and AI will do the same. As envisioned by Herbert Simon, AI systems will also become objects of scientific research, with scientists set to play a leading role in evaluating and explaining their scientific capabilities, as well as in developing new types of human-AI scientific systems.

Part B. The ingredients

We are interested in the ingredients that ambitious AI for Science efforts need to succeed - both at the individual research effort level and at the level of the science ecosystem, where policymakers have more scope to shape them. The experts that we interviewed routinely cited several ingredients that we organised into a toy model, which we call the AI for Science production function. This production function is not meant to be exhaustive, prescriptive, or a neat linear process. The ingredients will be intuitive to many, but our interviews revealed a number of lessons about what they look like in practice which we share below.

1. PROBLEM SELECTION: Pursue ambitious, AI-shaped problems

Scientific progress rests on being able to identify an important problem and ask the right question about how to solve it. In their exploration into the genesis of scientific breakthroughs, Venkatesh Narayanamurti and Jeffrey Y. Tsao document how important the reciprocal and recursive relationship between questions and answers is, including the importance of asking ambitious new questions. Our Science team starts by thinking about whether a potential research problem is significant enough to justify a substantial investment of time and resources. Our CEO Demis Hassabis has a mental model to guide this assessment: thinking about all of science as a tree of knowledge. We are particularly interested in the roots - fundamental ‘root node problems’ like protein structure prediction or quantum chemistry that, if solved, could unlock entirely new branches of research and applications.

To assess whether AI will be suitable and additive, we look for problems with certain characteristics, such as huge combinatorial search spaces, large amounts of data, and a clear objective function to benchmark performance against. Often a problem is suitable for AI in principle, but the inputs aren’t yet in place and it needs to be stored for later. One of the original inspirations for AlphaFold was conversations that Demis had many years prior as a student with a friend who was obsessed with the protein folding problem. Many recent breakthroughs also feature this coming together of an important scientific problem and an AI approach that has reached a point of maturity. For example, our fusion effort was aided by a novel reinforcement learning algorithm called maximum a posteriori policy optimization, which had only just been released. Together with a new fast and accurate simulator that our partners at EPFL had just developed, this enabled the team to overcome a data paucity challenge.

In addition to picking the right problem, it is important to specify it at the right level of difficulty. Our interviewees emphasised that a powerful problem statement for AI is often one that lends itself to intermediate results. If you pick a problem that’s too hard then you won’t generate enough signal to make progress. Getting this right relies on intuition and experimentation.

2. EVALUATIONS: Invest in evaluation methods that can provide a robust performance signal and are endorsed by the community

Scientists use evaluation methods, such as benchmarks, metrics and competitions, to assess the scientific capabilities of an AI model. Done well, these evaluations provide a way to track progress, encourage innovation in methods, and galvanise researchers’ interest in a scientific problem. Often, a variety of evaluation methods are required. For example, our weather forecasting team started with an initial ‘progress metric’ based on a few key variables, such as surface temperature, that they used to ‘hill climb’, or gradually improve their model’s performance. When the model had reached a certain level of performance, they carried out a more comprehensive evaluation using more than 1,300 metrics inspired by the evaluation scorecard of the European Centre for Medium-Range Weather Forecasts. In past work, the team learned that AI models can sometimes achieve good scores on these metrics in undesirable ways. For example, ‘blurry’ predictions - such as predicting rainfall within a large geographical area - are penalised less than ‘sharp’ predictions that are slightly misplaced - such as predicting a storm in a location that is very slightly different to the actual location - the so-called ‘double-penalty’ problem. To provide further verification, the team evaluated the usefulness of their model on downstream tasks, such as its ability to predict the track of a cyclone, and to characterise the strength of ‘atmospheric rivers’ - narrow bands of concentrated moisture that can lead to flooding.
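The ‘double-penalty’ problem can be seen in a toy calculation. The Python sketch below is an illustration under stated assumptions, not the team’s evaluation code: it uses a one-dimensional grid of ten cells, mean squared error as the metric, and invented rainfall values.

```python
def mean_squared_error(forecast, truth):
    """Average squared difference between two equal-length sequences."""
    return sum((f - t) ** 2 for f, t in zip(forecast, truth)) / len(truth)

# Toy 1-D "map" of 10 grid cells; the real storm sits in cell 5.
truth = [0.0] * 10
truth[5] = 10.0

# Sharp forecast: right intensity, but displaced by one cell.
sharp = [0.0] * 10
sharp[6] = 10.0

# Blurry forecast: the same total rainfall smeared evenly over every cell.
blurry = [1.0] * 10

# The sharp forecast is penalised twice: a miss at cell 5, a false alarm at cell 6.
sharp_mse = mean_squared_error(sharp, truth)
blurry_mse = mean_squared_error(blurry, truth)
print(sharp_mse, blurry_mse)  # 20.0 9.0
```

The blurry forecast scores better even though it says almost nothing about where the storm is, which is one reason a point-wise metric benefits from being paired with downstream checks.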

The most impactful AI for Science evaluation methods are often community-driven or endorsed. A gold standard is the Critical Assessment of protein Structure Prediction competition. Established in 1994 by Professor John Moult and Professor Krzysztof Fidelis, the biennial CASP competition has challenged research groups to test the accuracy of their protein structure prediction models against real, unreleased, experimental protein structures. It has also become a unique global community and a catalyst for research progress, albeit one that is hard to replicate quickly. The need for community buy-in also provides an argument for publishing benchmarks so that researchers can use, critique and improve them. However, this also creates the risk that the benchmark will ‘leak’ into an AI model’s training data, reducing its usefulness for tracking progress. There is no perfect solution to this tradeoff but, at a minimum, new public benchmarks are needed at regular intervals. Scientists, AI labs and policymakers should also explore new ways to assess the scientific capabilities of AI models, such as setting up new third-party assessor organisations, competitions, and enabling more open-ended probing of AI models’ capabilities by scientists.

3. COMPUTE: Track how compute use is evolving and invest in specialist skills

Multiple government reviews have recognised the growing importance of compute to progress in AI, science, and the wider economy. As we expand on further below, there is also a growing focus on its energy consumption and greenhouse gas emissions. AI labs and policymakers should take a grounded, long-term view that considers how compute needs will vary across AI models and use cases, potential multiplier effects and efficiency gains, and how this compares to counterfactual approaches to scientific progress that don’t use AI.

For example, some state-of-the-art AI models, such as in protein design, are relatively small. Larger models, like LLMs, are compute-intensive to train but typically require much less compute to fine-tune, or to run inference against, which can open up more efficient pathways to science research. Once an LLM is trained, it is also easier to make it more efficient, for example via better data curation, or by ‘distilling’ the large model into a smaller one. AI compute needs should also be evaluated in comparison to other models of scientific progress. For example, AI weather forecasting models are compute-intensive to train, but can still be more computationally-efficient than traditional techniques. These nuances highlight the need for AI labs and policymakers to track compute use empirically, to understand how it is evolving, and to project what these trends mean for future demand. In addition to ensuring sufficient access to the right kind of chips, a compute strategy should also prioritise the critical infrastructure and engineering skills needed to manage access and ensure reliability. This is often under-resourced in academia and public research institutions.

4. DATA: Blend top-down and bottom-up efforts to collect, curate, store, and access data

Similar to compute, data can be viewed as critical infrastructure for AI for Science efforts that needs to be developed, maintained, and updated over time. Discussions often focus on identifying new datasets that policymakers and practitioners should create. There is a role for such top-down efforts. In 2012, the Obama Administration launched the Materials Project to map known and predicted materials, such as inorganic crystals, like silicon, that are found in batteries, solar panels, and computer chips. Our recent GNoME effort used this data to predict 2.2 million novel inorganic crystals, including 380,000 that simulations suggest are stable at low temperatures, making them candidates for new materials.

However, it is often difficult to predict in advance what scientific datasets will be most important, and many AI for Science breakthroughs rely on data that emerged more organically, thanks to the efforts of an enterprising individual or small teams. For example, Daniel MacArthur, then a researcher at the Broad Institute, led the development of the gnomAD dataset of genetic variants that our AlphaMissense work subsequently drew on. Similarly, the mathematical proof assistant and programming language Lean was originally developed by the programmer Leonardo de Moura. It is not a dataset, but many AI labs now use it to help train their AI maths models, including our AlphaProof system.

Efforts like gnomAD or Lean highlight how top-down data efforts need to be complemented by better incentives for individuals at all stages of the data pipeline. For example, some data from strategic wet lab experiments is currently discarded, but could be collected and stored, if stable funding was available. Data curation could also be better incentivised. Our AlphaFold models were trained on data from the Protein Data Bank that was particularly high quality because journals require the deposition of protein structures as a precondition for publication, and the PDB’s professional data curators developed standards for this data. In genomics, many researchers are also obliged to deposit raw sequencing data in the Sequence Read Archive but inconsistent standards mean that individual datasets often still need to be reprocessed and combined. Some other high-quality datasets go unused altogether, because of restrictive licensing conditions, such as in biodiversity, or because the datasets are not released, such as decades of data from publicly-funded fusion experiments. There can be logical reasons for this, such as a lack of time, funds, somewhere to put the data, or the need for temporary embargo periods for researchers who develop the data. But in aggregate these data access issues pose a key bottleneck to using AI to advance scientific progress.

5. ORGANISATIONAL DESIGN: Strike the right balance between bottom-up creativity and top-down coordination

A simple heuristic is that academia and industry tend to approach science research at two ends of a spectrum. Academia tends to be more bottom-up, and industry labs tend to be more top-down. In reality, there has long been plenty of space in between, particularly at the most successful labs, such as the golden eras of Bell Labs and Xerox PARC that were renowned for their blue skies research and served as inspiration in the founding of DeepMind. Recently, a new wave of science research institutions has emerged that try to learn from these outlier examples. These organisations differ in their goals, funding models, disciplinary focus, and how they organise their work. But collectively they want to deliver more high-risk, high-reward research, less bureaucracy, and better incentives for scientists. Many have a strong focus on applying AI, such as the UK’s Advanced Research and Invention Agency, the Arc Institute, and the growing number of Focused Research Organisations that aim to tackle specific problems in science that are too large for academia and not profitable enough for industry, such as the organisation tasked with expanding the Lean proof assistant that is pivotal to AI maths research.

At their core, these new institutions share a desire to find a better blend of top-down coordination and focus with bottom-up empowerment of scientists. For some organisations, this means focussing on a single specific problem with pre-specified milestones. For others, it means offering more unrestricted funding to principal investigators. Getting this balance right is critical to attracting and retaining research leaders, who must also buy into it if it is to succeed - Demis Hassabis has credited it as the single biggest factor for successfully coordinating cutting-edge research at scale. Striking this balance is also important within individual research efforts. In Google DeepMind’s case, efforts often pivot between more unstructured ‘exploration’ phases, where teams search for new ideas, and faster ‘exploitation’ phases, where they focus on engineering and scaling performance. There is an art to knowing when to switch between these modes and how to adapt the project team accordingly.

6. INTERDISCIPLINARITY: Approach science as a team, fund neglected roles, and promote a culture of contestability

Many of the hardest scientific problems require progress at the boundaries between fields. However, when practitioners are brought together, for example during Covid-19, they often struggle to transition from multidisciplinarity - where they each retain their own disciplinary angle - to genuine interdisciplinarity, where they collectively develop shared ideas and methods. This challenge reflects the growing specialisation of scientific knowledge, as well as incentives, such as grant funding, that often evaluate practitioners predominantly on their core expertise.

AI for Science efforts are often multidisciplinary by default, but to succeed, they need to become genuinely interdisciplinary. A starting point is to pick a problem that requires each type of expertise, and then provide enough time and focus to cultivate a team dynamic around it. For example, our Ithaca project used AI to restore and attribute damaged ancient Greek inscriptions, which could help practitioners to study the thought, language, and history of past civilizations. To succeed, project co-lead Yannis Assael had to develop an understanding of epigraphy - the study of ancient inscribed text. The project’s epigraphers, in turn, had to learn how the AI model worked, given the importance of intuition to their work. Cultivating these team dynamics requires the right incentives. Empowering a small, tight-knit team to focus on solving the problem, rather than authorship of papers, was key to the AlphaFold 2 breakthrough. This type of focus can be easier to achieve in industry labs, but again highlights the importance of longer-term public research funding that is less tied to publication pressures.

This image shows a restored decree concerning the Acropolis of Athens, which dates to 485/4 BCE. (CC BY-SA 3.0, Wikimedia).

To achieve genuine interdisciplinarity, organisations also need to create roles and career paths for individuals who can help blend disciplines. At Google DeepMind, our research engineers encourage a positive feedback loop between research and engineering, while our programme managers help to cultivate team dynamics within a research effort and create links across them. We also prioritise hiring individuals who enjoy finding and bridging connections between fields, as well as those that are motivated by rapidly upskilling in new areas. To encourage a cross-pollination of ideas, we also encourage scientists and engineers to regularly switch projects. Ultimately, the goal is to create a culture that encourages curiosity, humility and what the economic historian Joel Mokyr has referred to as ‘contestability’ - where practitioners of all backgrounds feel empowered to present and constructively critique each other’s work in open talks and discussion threads.

7. ADOPTION: Carefully consider the best access option and spotlight AI models’ uncertainties

Many AI for Science models, such as AlphaFold or our weather forecasting work, are specialised in the sense that they perform a small number of tasks. But they are also general in the sense that a large number of scientists are using them, for everything from understanding diseases to improving fishing programmes. This impact is far from guaranteed. The germ theory of disease took a long time to diffuse, while the downstream products that scientific breakthroughs could enable, such as novel antibiotics, often lack the right market incentives.

When deciding how to release our models, we try to balance the desire for widespread adoption and validation from scientists with commercial goals and other considerations, such as potential safety risks. We also created a dedicated Impact Accelerator to drive adoption of breakthroughs and encourage socially beneficial applications that may not otherwise occur, including through partnerships with organisations like the Drugs for Neglected Diseases Initiative, and the Global Antibiotic Research & Development Partnership, that have a similar mandate.

To encourage scientists who could benefit from a new model or dataset to use it, developers need to make it as easy as possible for scientists to use and integrate into their workflows. With this in mind, for AlphaFold 2 we open-sourced the code but also partnered with EMBL-EBI to develop a database where scientists, including those with fewer computational skills and less infrastructure, could search and download from a preexisting set of 200 million protein structures. AlphaFold 3 expanded the model’s capabilities, leading to a combinatorial explosion in the number of potential predictions. This created a need for a new interface, the AlphaFold Server, which allows scientists to create structures on-demand. The scientific community has also developed its own AlphaFold tools, such as ColabFold, demonstrating the diversity of needs that exist, as well as the value of nurturing computational skills in the scientific community to address these needs.

To date, more than 2M users from over 190 countries have accessed the AlphaFold Protein Structure Database to view over 7 million structures.

Scientists also need to trust an AI model in order to use it. We expand on the reliability question below, but a useful starting point is to proactively signal how scientists should use a model, as well as its uncertainties. With AlphaFold, following dialogue with scientists the team developed uncertainty metrics that communicated how ‘confident’ the model was about a given protein structure prediction, supported by intuitive visualisations. We also partnered with EMBL-EBI to develop a training module that offered guidance on how to best use AlphaFold, including how to interpret the confidence metrics, supported by practical examples of how other scientists were using it. Similarly, our Med-Gemini system recently achieved state-of-the-art performance on answering health-related questions. It uses an uncertainty-guided approach that responds to a question by generating multiple ‘reasoning chains’ for how it might answer. It then uses the relative divergence between these initial answers to calculate how uncertain the answer is. Where uncertainty is high, it invokes web search to integrate the latest, up-to-date information.
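The uncertainty-guided idea can be sketched in a few lines of Python. This is a minimal illustration, not Med-Gemini’s implementation: the choice of Shannon entropy as the divergence measure, the threshold of 0.5 bits, and the function names are all assumptions, and the sampled answers are invented.

```python
import math
from collections import Counter

def answer_entropy(answers):
    """Shannon entropy (in bits) of the distribution of sampled final answers."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def needs_search(answers, threshold=0.5):
    """Hypothetical rule: trigger retrieval when the samples disagree too much."""
    return answer_entropy(answers) > threshold

# Final answers taken from four sampled reasoning chains.
agree = ["B", "B", "B", "B"]   # unanimous, so entropy is zero
split = ["A", "B", "C", "B"]   # three distinct answers, entropy of 1.5 bits
print(needs_search(agree), needs_search(split))  # False True
```

In this sketch only the second case would invoke a search step; where the threshold sits is a design choice that trades extra retrieval cost against the risk of a confident but wrong answer.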

Med-Gemini-3D is able to generate reports for CT scans, a far more complex form of imaging than standard X-rays. In this example, Med-Gemini-3D’s report has correctly included a pathology (highlighted in green) that was missed in the original radiologist’s report. Note that ‘basilar’ is a common mis-transcription of ‘basal’ that Med-Gemini has learned from the training data, though the meaning of the report is unchanged.

8. PARTNERSHIPS: Aim for early alignment and a clear value exchange

AI for Science efforts require a diversity of expertise that creates a strong need for partnership - both formal and informal - between public and private organisations. For example, our FunSearch method developed a new construction for the Cap Set problem, which renowned mathematician Terence Tao once described as his favourite open question. This was enabled by collaborating with Jordan Ellenberg, a professor of mathematics at the University of Wisconsin–Madison and a noted Cap Set expert.

Partnerships are needed throughout the project lifecycle, from creating datasets to sharing the research. In particular, AI labs often need scientists to help evaluate an AI model’s outputs. For example, our protein design team partnered with research groups from the Francis Crick Institute to run wet lab experiments to test if our AI-designed proteins bound to their target and if this had the desired function, such as preventing SARS-CoV-2 from infecting cells. Given the central role played by industry labs in advancing AI capabilities, and the need for rich domain expertise, these kinds of public-private partnerships will likely become increasingly important to advancing the AI for Science frontier and may require greater investment, such as more funding to support partnership teams in universities and public research institutions.

Developing partnerships is difficult. When starting discussions, it is important to align early on the overall goal and address potentially thorny questions, such as what rights each party should have over the outputs, whether there should be a publication, whether the model or dataset should be open sourced, and what type of licensing should apply. Differences of opinion are natural and often reflect the incentives facing public and private organisations, which in turn vary greatly, depending on factors such as the maturity of the research or its commercial potential. The most successful partnerships involve a clear value-exchange that draws on the strengths of each organisation. For example, more than 2 million users from over 190 countries have used the AlphaFold Protein Structure Database. This required a close collaboration to pair our AI model with the biocuration expertise and scientific networks of EMBL-EBI.

9. SAFETY & RESPONSIBILITY: Use assessments to explore trade-offs and inspire new types of evaluation methods

Scientists often disagree, sometimes strongly, about the potential benefits and risks that AI models may have on science, and on wider society. Conducting an ethics and safety assessment can help to frame the discussion and enable scientists to decide whether, and how, to develop a given AI model. A starting point is to identify the most important domains of impact, and to specify these domains at the right level of abstraction. There are increasingly sophisticated frameworks to identify and categorise different AI risks, such as enabling mis- and disinformation. But these frameworks rarely consider the potential benefits of AI in the same domain, such as improving access to high-quality information synthesis, or the trade-offs that can occur, for example if you restrict access to an AI model or limit its capabilities. Assessments should also clarify their timescales, the relative certainty of any impact, and the relative importance, or additionality, of AI, to achieving it. For example, those worried about AI and climate change often focus on the immediate power needed to train large AI models, while AI proponents often focus on the less immediate, less clear, but potentially much larger downstream benefits to the climate from future AI applications. In carrying out their assessment, AI practitioners should also avoid over-indexing on the model’s capabilities, which they will be closer to, and better understand the extent to which third parties will actually use it or be affected by it, which typically requires input from external experts to do well.

Practitioners also need new methods to better evaluate the potential risks and benefits of using AI in science. At present, many AI safety evaluations rely on specifying the types of content that a model should not output, and quantifying the extent to which the model adheres to this policy. These evaluations are useful for certain risks posed by using AI in science, such as generating inaccurate content. But for other risks, such as to biosecurity, the idea that we can reliably specify certain types of scientific knowledge as dangerous in advance has been challenged, because of the dual-use nature of scientific knowledge, but also because such efforts tend to focus on what has caused harm historically, such as viruses from past outbreaks, rather than novel risks.

A better approach may be to evaluate the dangerous capabilities of AI models, or the degree to which AI models provide an uplift to humans’ dangerous capabilities. In many cases, these capabilities will also be dual-use, such as the ability to help design or execute experimental protocols. The degree to which these AI capabilities point to a risk, or an opportunity, will depend on how potential threat actors are assessed and how access to the model is governed. Beyond safety, evaluating other risks from using AI in science, such as to scientific creativity or reliability (which we discuss below), will require entirely new evaluation methods. Given the difficulty of researching and executing such evaluations, it makes sense to pursue them at the community-level, rather than each lab pursuing siloed efforts.

Part C: The risks

Policy papers, government documents and surveys of scientists regularly cite certain risks from the growing use of AI in science. Three of these risks - to scientific creativity, reliability, and understanding - mainly relate to how science is practised. Two other risks - to equity and the environment - mainly relate to how science represents and affects wider society. The use of AI is often presented exclusively as a risk to these domains, and the domains, such as scientific reliability, or the environment, are often portrayed in stable, somewhat idealised terms, that can overlook the wider challenges that they face.

We believe that using AI in science will ultimately benefit each of these five domains, because there are opportunities to mitigate the risks that AI poses, and to use AI to help address wider challenges in these areas, in some cases profoundly. Achieving a beneficial outcome will likely be harder for inequity, which is ingrained into AI and science at multiple levels, from the make-up of the workforce to the data underpinning research, and for scientific creativity, which is highly subjective and so individuals may reasonably disagree about whether a certain outcome is positive. These nuances increase the value of scientists, policymakers and others articulating their expectations for how using AI in science will affect each of these five areas.

1. CREATIVITY: Will AI lead to less novel, counterintuitive, breakthroughs?

Scientific creativity describes the creation of something new that is useful. In practice, the extent to which a scientist views a new idea, method, or output as creative typically rests on more subjective factors, such as its perceived simplicity, counterintuitiveness, or beauty. Today, scientific creativity is undermined by the relative homogeneity of the scientific workforce, which narrows the diversity of ideas. The pressure on researchers to ‘publish or perish’ also incentivises ‘crowd-following’ publications on less risky topics, rather than the kind of deep work, or bridging of concepts across disciplines, that often underpins creative breakthroughs. This may explain why the share of disruptive scientific ideas that cause a field to veer off into a new direction seems to be declining, beyond what may be normally expected, as science expands.

Some scientists worry that using AI may exacerbate these trends, by undermining the more intuitive, unorthodox, and serendipitous approaches of human scientists, such as Galileo's hypothesis that the earth rotates on its axis. This could happen in different ways. One concern is that AI models are trained to minimise anomalies in their training data, whereas scientists often amplify anomalies by following their intuitions about a perplexing data point. Others worry that AI systems are trained to perform specific tasks, and so relying on them will forgo more serendipitous breakthroughs, such as researchers unexpectedly finding solutions to problems that they weren’t studying. At the community level, some worry that if scientists embrace AI en masse, it may lead to a gradual homogenisation of outputs, for example if LLMs produce similar suggestions in response to the queries of different scientists, or if scientists over-focus on disciplines and problems that are best suited to AI.

Maintaining support for exploratory research and non-AI research could help to mitigate some of these risks. Scientists could also tailor how they use AI so that it boosts rather than detracts from their own creativity, for example by fine-tuning LLMs to suggest more personalised research ideas, or to help scientists better elicit their own ideas, similar to our early efforts to develop AI tutors that could help students to better reflect on a problem, rather than just outputting answers to questions. AI could also enable new types of scientific creativity that may be unlikely to otherwise occur. One type of AI creativity is interpolation where AI systems identify novel ideas within their training data, particularly where humans’ ability to do this is limited, such as efforts to use AI to detect anomalies in massive datasets from Large Hadron Collider experiments. A second type is extrapolation, where AI models generalise to more novel solutions outside their training data, such as the famous move 37 that our AlphaGo system came up with, that stunned human Go experts, or the novel maths proofs and non-obvious constructions that our AlphaProof and AlphaGeometry 2 systems produced. A third type is invention, where AI systems come up with an entirely new theory or scientific system, completely removed from its training data, akin to the original development of general relativity, or the creation of complex numbers. AI systems do not currently demonstrate such creativity, but new approaches could potentially unlock this, such as multi-agent systems that are optimised for different goals, like novelty and counterintuitiveness, or AI models that are trained to generate novel scientific problems in order to inspire novel solutions.

2. RELIABILITY: Will AI make science less self-correcting?

Reliability describes the ability of scientists to depend upon each other’s findings, and trust that they are not due to chance or error. Today, a series of interrelated challenges weaken the reliability of science, including the p-hacking and publication bias which can lead researchers to underreport negative results; a lack of standardisation in how scientists carry out routine scientific tasks; mistakes, for example in how scientists use statistical methods; scientific fraud; and challenges with the peer review process, including a lack of qualified peer reviewers.

Some scientists worry that AI will exacerbate these challenges as some AI research also features bad practices, such as practitioners cherry-picking the evaluations they use to assess their models’ performance. AI models, particularly LLMs, are also prone to ‘hallucinate’ outputs, including scientific citations, that are false or misleading. Others worry that LLMs may lead to a flood of low-quality papers similar to those that ‘paper mills’ churn out. The community is working on mitigations to these problems, including good practice checklists for researchers to adhere to and different types of AI factuality research, such as training AI models to ground their outputs to trusted sources, or to help verify the outputs of other AI models.

Scientists could also potentially use AI to improve the reliability of the wider research base. For instance, if AI can help to automate aspects of data annotation or experiment design, this could provide much-needed standardisation in these areas. As AI models get better at grounding their outputs to citations, they could also help scientists and policymakers do more systematic reviews of the evidence base, for example in climate change, where groups like the Intergovernmental Panel on Climate Change are already struggling to keep up with the inexorable rise in publications. Practitioners could also use AI to help detect mistaken or fraudulent images, or misleading scientific claims, as seen in the recent trial by the Science journals of an AI image analysis tool. More speculatively, AI could potentially help with aspects of peer review, given that some scientists already use LLMs to help critique their own papers, and to help validate the outputs of AI models, for example in theorem proving. However, there are also reasonable concerns about confidentiality, the ability of AI systems to detect truly novel work, and the need for buy-in from scientists given the consequential role that peer review plays in processes such as grant approvals.

3. UNDERSTANDING: Will AI lead to useful predictions at the expense of deeper scientific understanding?

In a recent Nature survey, scientists cited a reliance on pattern matching at the expense of deeper understanding as the biggest risk from using AI in science. Understanding is not always necessary to discover new scientific phenomena, such as superconductivity, or to develop useful applications, such as drugs. But most scientists view understanding as one of their primary goals, as the deepest form of human knowledge. Concerns about AI undermining scientific understanding include the argument that modern deep learning methods are atheoretical and do not incorporate or contribute to theories for the phenomena that they predict. Scientists also worry that AI models are uninterpretable, in the sense that they are not based on clear sets of equations and parameters. There is also a concern that any explanation for an AI model’s outputs will not be accessible or useful to scientists. Taken together, AI models may provide useful predictions about the structure of a protein, or the weather, but will they be able to help scientists understand why a protein folds a certain way, or how atmospheric dynamics lead to weather shifts?

Concerns about replacing ‘real, theoretical science’ with ‘low-brow ... computation’ are not new and were levelled at past techniques, such as the Monte Carlo method. Fields that merge engineering and science, such as synthetic biology, have also faced accusations of prioritising useful applications over deeper scientific understanding. Those methods and technologies led to gains in scientific understanding and we are confident that AI will too, even if some of these gains will be hard to predict in advance. First, most AI models are not atheoretical but build on prior knowledge in different ways, such as in the construction of their datasets and evaluations. Some AI models also have interpretable outputs. For example, our FunSearch method outputs computer code that also describes how it arrived at its solution.

Researchers are also working on explainability techniques that could shed light on how AI systems work, such as efforts to identify the ‘concepts’ that a model learns. Many of these explainability techniques have important limitations, but they have already enabled scientists to extract new scientific hypotheses from AI models. For example, transcription factors are proteins that bind to DNA sequences to activate or repress the expression of a nearby gene. One AI research effort was able to predict the relative contribution of each base in a DNA sequence to the binding of different transcription factors and to explain this result using concepts familiar to biologists. A bigger opportunity may be to learn entirely new concepts based on how AI systems learn. For example, our researchers recently demonstrated that our AlphaZero system learned ‘superhuman’ knowledge about playing chess, including unconventional moves and strategies, and used another AI system to extract these concepts and teach them to human chess experts.

Even without explainability techniques, AI will improve scientific understanding simply by opening up new research directions that would otherwise be prohibitive. For example, by unlocking the ability to generate a huge number of synthetic protein structures, AlphaFold enabled scientists to search across protein structures, rather than just across protein sequences. One group used this approach to discover an ancient member of the Cas13 protein family that offers promise for editing RNA, including to help diagnose and treat diseases. This discovery also challenged previous assumptions about how Cas13 evolved. Conversely, efforts to modify the AlphaFold model architecture to incorporate more prior knowledge led to worse performance. This highlights the trade-off that can occur between accuracy and interpretability, but also how AI systems could advance scientific understanding not in spite of their opacity, but because of it, as this opacity can stem from their ability to operate in high-dimensional spaces that may be uninterpretable to humans, but necessary to making scientific breakthroughs.

4. EQUITY: Will AI make science less representative of, and useful to, marginalised groups?

Inequity is starkly visible in the scientific workforce, in the questions they study, in the data and models they develop, and in the benefits and harms that result. These inequities are related and can compound over time. For example, a small number of labs and individuals in higher-income cities account for a disproportionate share of scientific outputs. Studies to identify genetic variants associated with disease rely heavily on data from European ancestry groups, while the neglected tropical diseases that disproportionately affect poor countries receive relatively little research funding. In agriculture, crop innovations focus on pests that are most common in high-income countries, and are then inappropriately used on different pests in lower-income countries, hurting yields. Despite positive trends, women account for just 33% of scientists and have long been underrepresented in clinical trials, particularly women of colour.

Observers worry that the growing use of AI in science could exacerbate these inequities. AI and computer science workforces are less representative, in terms of gender, ethnicity and the location of leading labs, than many other scientific disciplines and so AI’s growing use could hurt broader representation in science. As a data-driven technology, AI also risks inheriting and entrenching the biases found in scientific datasets.

There are also opportunities to use AI to reduce inequities in science, albeit not in lieu of more systemic change. If AI models are provided via low-cost servers or databases, they could make it easier and cheaper for scientists, including those from underrepresented groups, to study traditionally neglected problems, similar to how releasing more satellite data led to more research from underrepresented communities. By ingesting more data, AI models may also be able to learn more universal patterns about the complex systems that scientists study, making these models more robust and less prone to biases. For example, owing to their non-representative data, studies that identify genetic variants associated with disease can pick up confounding, rather than causal variants. Conversely, some early attempts to train AI models on larger datasets of protein structures and genetic variants, including data across species, perform better at predicting individuals at the greatest risk for disease, with fewer discrepancies across population groups. Ultimately, however, improving equity will require long-term efforts, such as the H3Africa initiative in genomics and the Deep Learning Indaba initiative for AI, that aim to build up scientific infrastructure, communities, and education where it is most lacking.

5. THE ENVIRONMENT: Will AI hurt or help efforts to achieve NetZero?

Given their desire to understand the natural world, many scientists have long been active in efforts to protect the environment, from providing early evidence about climate change to developing photovoltaic cells. In recent years, a growing number of scientists have voiced concerns about the potential impact of AI on the environment and developed methodologies to try to quantify these impacts. Most concerns focus on the potential impact of training and using LLMs on greenhouse gas emissions, as well as related concerns, such as about the water needed to cool data centres. One way to think about these effects is the life cycle approach, which captures both direct and indirect effects. Direct effects include the emissions from building and powering the data centres and devices that AI models are trained and run on. There is no comprehensive estimate for all direct emissions from AI. However, a 2021 estimate suggested that cloud and hyperscale data centres, where many large AI models are trained and deployed, accounted for just 0.1-0.2% of global emissions.

As the size of LLMs continues to grow, observers have cautioned that these figures may increase, potentially significantly. However, many users of LLMs, including scientists, will be able to fine-tune them, or use their predictions, at a relatively low compute cost, rather than training them from scratch. Efforts are also underway to make LLMs more efficient, and the history of digital technology suggests that sizable gains are possible, not least due to the commercial pressures to deliver faster and cheaper AI models. In some instances, the emissions from AI models will be lower than other approaches. For example, our internal analysis suggests that determining the structures of a small number (<10) of proteins experimentally uses roughly the same energy as a full training run of AlphaFold 2. These results need to be interpreted carefully, as AI simulations rely on, and inform, physical experiments, rather than substituting for them. But they also show how AI could enable a larger amount of scientific activity at a lower average energy cost.

Crucially, the direct effects of AI on emissions, whether positive or negative, will likely be minor compared to the indirect effects that AI-enabled applications have on emissions. Using AI in science opens up three major opportunities to reduce emissions. First, progress at the nexus between AI, maths and computer science could dramatically improve the efficiency of the Internet, from designing more efficient chips to finding more efficient algorithms for routine tasks. As a growing share of the economy moves online, this should help to offset emissions across these sectors. Second, AI could accelerate the development and use of renewable energy, for example by designing new materials, such as for batteries or solar panels, by optimising how the grid operates and how it integrates renewables, and via more transformative but uncertain opportunities like fusion. Finally, the world is already getting warmer and AI could help to better prepare for extreme weather events. For example, our weather forecasting model recently correctly predicted, seven days in advance, that the deadly Hurricane Beryl would make ‘landfall’ in Texas. Non-AI models had originally predicted landfall in Mexico before correcting their prediction to Texas three days before it occurred.

Part D: The policy response

Given the importance of scientific progress to almost every major economic, environmental and security goal, it follows that science, and the potential for AI to accelerate it, should be a top priority for any government. What should a new AI for Science policy agenda look like? Policymakers can start by implementing the many good science and innovation policy ideas that already exist and which make even more sense in an era of AI-enabled science. For example, AI will improve the return on science research funding and so it provides a strong rationale to invest more in it and to trial new ideas to speed up and experiment with how this funding is allocated. On compute, governments could implement the idea set out in the UK’s Independent Review to empower a dedicated body to continually assess and advise governments on potential investments. To support AI for Science startups, policymakers can improve their spin-out policies and support well-run start-up incubators and fellowships.

But ambitious new policies are also needed to capitalise on the AI for Science opportunity. We share four ideas below. They are intended to be widely applicable, though the precise details would need to be tailored to the specific context of a country, taking into account national priorities, unique strengths and the institutional landscape.

1. Define the ‘Hilbert Problems’ for AI in Science

Scientific progress rests on picking the right problems. In 1900, the German mathematician David Hilbert published 23 unsolved problems that proved hugely influential for the subsequent direction of 20th century mathematics. As part of upcoming international events such as the AI Action Summit in Paris, policymakers, AI labs and science funders could launch a public call for scientists and technologists to identify the most important AI-shaped scientific problems, backed by a major new global fund to drive progress on them. Submissions should specify why the problem is important, why it is suited to modern AI systems, why it may be otherwise neglected, the data bottlenecks that exist, and how near-term technical progress could be evaluated.

The top ideas could form the basis of new scientific competitions, where scientists compete to solve these problems with AI, supported by new datasets, evaluation methods and competitive benchmarks. These could build on the recent flurry of competitions that have emerged to evaluate the scientific capabilities of AI models, and include a new AI for Science Olympiad to attract exceptional young talent from across the world to the field. Beyond its direct impacts, the AI for Science ‘Hilbert Problems’ initiative could provide a welcome focal point for international scientific collaboration and funding, and inspire a new generation of interdisciplinary scientists to identify and pursue AI-shaped problems.

2. Make the world readable to scientists

Most scientific data is uncollected, partial, uncurated or inaccessible, making it unavailable to train AI models. There is no single policy response to what is far from a uniform challenge. Policymakers and funders will need to blend a small number of top-down initiatives with support to scale up promising grassroots efforts. A new international network of AI for Science Data Observatories should be set up to help address these goals. These Observatories could be given long-term backing and tasked with running rapid AI for Science ‘data stocktakes’, where expert teams map the state of data in priority disciplines and application areas. Stocktakes could identify existing datasets, such as the Sequence Read Archive, whose quality could be further improved, as well as untapped or underutilised datasets, such as the decades of experimental fusion data that is currently unavailable to scientists or leading biodiversity datasets that are subject to restrictive licensing conditions. The stocktakes could also include new ‘data wish lists’. For example, our internal analysis suggests that less than 7% of papers in key environmental research domains use AI. We recently funded Climate Change AI to identify datasets which, if available or improved, could remove some of the bottlenecks to higher AI use. To ensure this analysis leads to action, policymakers should designate and empower organisations to be accountable for addressing the results of the data stocktakes.

The observatories could also scope the creation of new databases, including ensuring that adequate consideration is given to their long-term storage, maintenance, and incentives. This could include new databases to securely store the results of strategic wet lab experiments that are currently discarded, complemented by making the deposition of these experimental results a requirement for public research funding. It could also include digitising more public archives, following the example of a recent UK government and Natural History Museum collaboration to digitise their natural science collections, which include more than 137 million items, from butterflies to legumes, across a 4.6 billion-year history. Policymakers can also empower scientists to use LLMs to create and improve their own datasets, by ensuring that publicly-funded research is open by default, where possible, building on recent examples from the UK, US and Japan, including mandates to release research via pre-print servers. Policymakers could seek to co-fund the most ambitious dataset initiatives with industry and philanthropy.

3. Teach AI as the next scientific instrument

Over the past half century, as the number of scientific technologies has grown, so has most scientists’ distance from them. Many technologies are the products of science, but an ever smaller share of scientists are trained in how to develop and use them effectively. The pressing near-term need is to fund and incentivise mass uptake of shorter, more tactical AI training programmes and fellowships, for existing scientists and research leaders. Policymakers can incentivise these efforts by setting a clear goal that every postgraduate science student should be able to access introductory courses on using AI in science, including on the most important tools in their domain, in the same way as basic statistics is often taught today. The type and depth of training needed will depend on an individual’s discipline and profile, and could range from basic introductory courses about how to reliably use LLMs for everyday research tasks, through to more advanced courses on how to fine-tune AI models on scientific data, as well as how to address more complex challenges, such as evaluating whether the data they used to test a model’s performance has intentionally or unintentionally ‘leaked’ into the data used to train it. These programmes could build on established examples such as the University of Cambridge's Accelerate Programme that provides structured training in AI to PhD and postdoctoral researchers, or the short courses that The Carpentries offer on the programming, data, and computational skills needed to do research.
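To make the 'leakage' challenge mentioned above more concrete, the following is a minimal sketch of one simple check a scientist might run: looking for test records that also appear in the training data once case and whitespace are normalised. The function names and the toy records are hypothetical, chosen only for illustration. The check catches exact duplicates only; near-duplicates, such as closely related protein sequences, would need a domain-specific similarity measure.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash a record after normalising case and whitespace."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def find_leaked(train_records, test_records):
    """Return test records whose normalised form also appears in the training data."""
    train_hashes = {fingerprint(record) for record in train_records}
    return [record for record in test_records if fingerprint(record) in train_hashes]


# Toy, made-up records purely for illustration.
train = ["MKTAYIAKQR", "The  sample was heated to 300 K."]
test = ["mktayiakqr", "A held-out observation.", "the sample was heated to 300 k."]

leaked = find_leaked(train, test)
leak_rate = len(leaked) / len(test)
print(f"{len(leaked)} of {len(test)} test records also appear in the training data")
```

A non-zero leak rate does not by itself invalidate an evaluation, but it is a signal that reported performance may overstate how well a model generalises to genuinely unseen data.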

Policymakers also need to move quickly to put in place longer-term programmes to ensure that the next generation of scientists has the skills they need. This means mainstreaming and deepening AI training and skills development in science education at all levels. Secondary school science students will need early exposure to the impact of AI while university students will need access to new types of interdisciplinary AI science degrees, such as the pan-African AI for Science Masters programme that we partnered with the African Institute for Mathematical Sciences to develop. Dedicated scholarships could also help. For example, the UK’s BIG Scholarships programme provides outstanding opportunities to high school students, with a focus on those from underrepresented groups who have excelled in International Science Olympiads and want to continue their study in leading science hubs but lack the funds to do so.

4. Build evidence and experiment with new ways of organising science

Scientists' use of AI is growing exponentially, but policymakers have little evidence about who is doing it best, how they are doing it, and the hurdles that are inhibiting others. This evidence gap is an impediment to identifying the best AI for Science policy ideas and targeting them effectively. Historically, answers to such questions often come from fields such as economics or innovation studies, but the results can take years to arrive. We are using citation data analysis, interviews, and community engagement to understand how scientists are using our AI models. Governments are also investing in these metascience capabilities to improve how they fund, share and evaluate science research. Building on this momentum, scientists could be tasked with a mission to rapidly assess foundational policy questions, including: where is the most impactful AI for Science research occurring and what types of organisations, talent, datasets, and evaluations are enabling it? To what extent are scientists using and fine-tuning LLMs vs more specialised AI models, and how are they accessing these models? To what extent is AI actually benefiting or harming scientific creativity, reliability, the environment, or other domains? How is AI affecting a scientist’s perception of their job and what skills, knowledge gaps, or other barriers are preventing their broader use of AI?

Beyond informing robust policy responses, this evidence base will arm policymakers with the foresight they need to anticipate how AI will transform science and society, similar to the foresight they are developing for AI safety risks through the growing network of AI Safety Institutes. The evidence will also highlight opportunities to reimagine the incentives and institutions needed for science in the age of AI. In particular, scientists and policymakers have only explored a small fraction of the possible approaches to organising and executing science research. The rise of AI provides a welcome forcing function to experiment with new types of institutions, from those with more freedom to pursue high-risk, high-reward research, to Focused Research Organisations aimed at addressing specific bottlenecks. And from new types of interdisciplinary AI science institutes in priority domains such as climate or food security, to completely novel institutions that we are yet to imagine. Those who experiment faster will stand to benefit the most from a new golden age of discovery.

Acknowledgements

Thank you to Louisa Bartolo, Zoë Brammer and Nick Swanson for research support, and to the following individuals who shared insights with us through interviews and/or feedback on the draft. All views, and any mistakes, belong solely to the authors.

Žiga Avsec, Nicklas Lundblad, John Jumper, Matt Clifford, Ben Southwood, Craig Donner, Joëlle Barral, Tom Zahavy, Been Kim, Sebastian Nowozin, Matt Clancy, Matej Balog, Jennifer Beroshi, Nitarshan Rajkumar, Brendan Tracey, Yannis Assael, Massimiliano Ciaramita, Michael Webb, Agnieszka Grabska-Barwinska, Alessandro Pau, Tom Lue, Agata Laydon, Anna Koivuniemi, Abhishek Nagaraj, Tom Westgarth, Guy Ward-Jackson, Harry Law, Arianna Manzini, Stefano Bianchini, Sameer Velankar, Ankur Vora, Sébastien Krier, Joel Z Leibo, Elisa Lai H. Wong, Ben Johnson, David Osimo, Andrea Huber, Dipanjan Das, Ekin Dogus Cubuk, Jacklynn Stott, Kelvin Guu, Kiran Vodrahalli, Sanil Jain, Trieu Trinh, Rebeca Santamaria-Fernandez, Remi Lam, Victor Martin, Neel Nanda, Nenad Tomasev, Obum Ekeke, Uchechi Okereke, Francesca Pietra, Rishabh Agarwal, Peter Battaglia, Anil Doshi, Yian Yin.

Errata & updates

Note: In May 2025, we updated this essay to remove a reference to a now discredited study on the use of AI in material science by Aidan Toner-Rodgers. Stuart Buck has a good write-up of the warning signs about this study, which we unfortunately missed. A good reminder to be appropriately skeptical with big claims!

The Red Queen problem

AI Policy Perspectives — Fri, 16 Aug 2024 10:02:57 GMT

Authored by Google DeepMind’s Nicklas Lundblad (Senior Director of Policy and Strategic Advisor) and Dorothy Chou (Director of Policy & Public Engagement), the below is the first of our Science in the Spotlight essay series, exploring AI, science and society.

Here is a simple model of progress: problems of increasing complexity are solved through the use of science and technology. Doing so has created more economic opportunity, lifted billions of people out of poverty and increased social welfare in many different ways. This progress, however, has come at the price of increasing complexity, and new classes of problems, and these have, in turn, required more scientific and technological progress to solve.

It’s easy to see where this is going, right? This is a spiral - and the big question society faces at some point is whether it will continue to be a positive spiral of growth, or become a negative spiral of complexity.

The answer to this question depends on developing the ability to continue to use science and technology to address ever-increasing levels of complexity. In a sense this is a version of the “red queen problem” - named after the Red Queen in Lewis Carroll's Through the Looking-Glass, who notes that it takes all the running one can do to remain in the very same place.

In this simplified model of progress, stagnation is not just levelling out at current levels of welfare, quality of life and happiness – it is inevitably inviting decline. Society must keep running, and even speed up, to avoid succumbing to the complexity it has already created and collapsing (as suggested by Joseph Tainter in his magisterial work on the collapse of complex societies).

The Red Queen model is far from the only viable model to consider as we think about how to tackle some of today's major societal challenges, but it provides a useful framework for understanding the role of science and technology, and what the rate of scientific progress tells us about the state of social progress more broadly.

Measuring the pace of scientific progress

In the Red Queen model, one core metric to work with is scientific productivity: is society making the scientific breakthroughs it needs to keep building new technologies? Another core metric is the interplay between technology and science, and how technology inspires and enhances scientific productivity in different ways. In other words, good ideas must become easier to find – and today, they are getting harder and harder to find.

How do we measure the pace of scientific progress? It is easy to do it in a way that is too simplistic and most likely unhelpful – e.g. focusing on some parameter that seems somewhat objective and then projecting that parameter with some level of (often fake) precision. Examples include methods like:

Measuring the number of patents in a field, or patents per country or year. Patents are a very noisy measure, especially since they have acquired a purely defensive role in much of the world today - both in trade and in domestic disputes.
Measuring the number of published papers in a field. This measure has become even more problematic now that there are a growing number of papers produced in part or wholly by LLMs in different ways.
Measuring the funding of a specific area. Funding often can fuel progress, so it is not wrong, but it is less exact than one would wish for.

Even noisier measures include productivity growth, GDP growth or other general economic metrics that can be used to assess overall progress in some narrow sense. These metrics do not typically capture scientific progress that well.

So what is an effective way to measure the pace of scientific progress?

The first task requires tackling the question of what scientific progress actually is. It may seem self-evident, but looking more closely demonstrates it is really hard to pinpoint exactly what constitutes scientific progress. This is easiest to illustrate when looking at different fields. Is physics progressing faster or slower than chemistry? What about biology and mathematics? Economics and literary theory? Which of these is progressing fastest, and which slowest?

One way to think about scientific progress is that it is about completing a borderless puzzle, expanding understanding of the world and how it works. The measurement to capture is the pace at which pieces of the puzzle go to the right places, or the pace at which new pieces are found. Progress is made as scientists slowly and painstakingly try to complete the puzzle. The pace at which the puzzle grows - as new pieces are found and existing pieces are placed in a way that slowly reveals the larger motif - is what should be measured.

But this is hard. What, exactly, is a piece? And when is it placed in the right place?

The work of Tsao and Narayanamurti offers insight into these questions. In their exploration of technoscientific revolutions they speak about the network of questions and answers as a good metaphor for science. What is important to understand is the pace at which new questions and answers are found - and ideally both should be progressing at pace. A world that slowly answers the existing questions - where there are more questions answered than new questions asked - is probably slipping into normal science. A world where only new questions are added is one where everyone is too lazy to do actual empirical work, and equally unhelpful.

What is important to measure, then, is the relative speed of questions and answers added to this network – and that is doable, if hard.

Other challenges in measuring scientific progress

Fundamental science poses an interesting challenge when it comes to measuring return on investment or overall impact on a field. Often impact can't be ascertained without hindsight, and as a result, the biggest recognitions of achievement, like Nobel Prizes, arrive more or less on delay: they wait to see which breakthroughs stand the test of time. What appears to be a significant breakthrough in the moment may be eclipsed by another; conversely, the full impact of Google's transformer architecture was only truly recognized by the general public when ChatGPT (the T stands for transformer) launched.

This makes investing in fundamental science difficult—where should bets be placed and when should investments be cut off or doubled down on? When is it obvious something is a dead end rather than a stop on the way to something much greater?

The contrast with other areas that also require patient capital due to longer timelines, like drug discovery, is interesting to observe. With a drug that makes it to market, it's easy to calculate impact, efficacy, and ROI based on outcomes such as increased survival rates from a cancer treatment, or even during a widespread pandemic. Even those results can only be calculated retrospectively, but at the very least, the impact is indisputable (if trials are run effectively). Similarly, desalination of water offers a fairly straightforward test. These are applied sciences and solutions that have a clear outcome in sight.

Where can scientists build consensus around what they are trying to achieve in a given area, and how does that achievement or breakthrough stack up relative to others? That is what establishing benchmarks can help with, standing in for the kind of clear test that applied fields enjoy.

The role of benchmarks to measure scientific progress

The importance of benchmarking progress is exactly why competitions like CASP, which define the problems that scientists work toward and set benchmarks for progress, are so crucial to establish. Agreeing on what 'good' looks like helps everyone to mark scientific progress in a way that crowds out noise and helps leapfrog the test of time that so many fundamental fields are subjected to (though it does not eliminate the usefulness of that test). And they are even more crucial if artificial intelligence is increasingly integrated into scientific practice. In the age of AI, what do good science and impactful progress even mean?

It might be tempting to create benchmarks in fundamental science that give recognition to making sense of what feels like chaos in a scientific field. But in her book 'Unthinking Mastery' Julietta Singh raises the idea that Western cultures are built on a composition of domination and exploitation in the name of mastering and civilizing not only other lands and physical bodies, but also ideas that perhaps extend far beyond our ability to bend them to our expertise. She argues instead in favor of considering benchmarks of progress that are postcolonial and that deconstruct notions of what it means to gain knowledge. After all, mastery is actually just fracturing: PhDs are given in increasingly siloed areas with minimal transferable or generalisable learnings.

If mastery of a discipline is no longer a goal, perhaps it becomes possible to reframe society's relationship with science as one where 'directionality becomes infinite and failure a process we might begin to meet with pedagogical delight.' Similarly, physicist Sabine Hossenfelder has warned that a predisposition to beauty, or to whatever fits a preference for symmetry and simplicity, has produced great math but bad science. She posits that this bias is what has held back physics for more than four decades. It's crucial to rethink how science is measured and done in order to make progress.

Perhaps with AI, now is the time, as fields must detangle understanding and knowledge from comprehension, to develop greater curiosity and to value it more than mastery. What would a benchmark for curiosity and seeing reality as-it-is be? How would the field incentivize this shift in approach? One proposal might be to reconsider negative results, requiring their publication and making them open access. If the fields were to value and reward what is not known and understood as much as they do mastery, the known unknowns might give way to greater context and knowledge for more generalised understanding and broader solutions. If it is in the crossovers of multiple silos that the solutions to complex problems like climate change can be found (the combination of political science, environmental science, materials science, and more), then perhaps what does not work is as crucial as what does. 'In failure we participate in new emergences... a refusal of mastery.' Might it be productive to distance ourselves from being the active subjects of reading texts that confirm our biases and instead reconstruct what science is and means altogether? Perhaps that is precisely the shift necessary for the world to embark on the next set of breakthroughs.

Science policy as a foundation for progress

Of course, it is easy to see where this view can be challenged: one could argue that instead of slowing down, society could engage in reversing progress to a point where it strikes an equilibrium between technology, science and complexity that is sustainable, instead of rushing into the future with an increasing complexity debt. One could also argue that the progress gained in this model is illusory—instead of eliminating complexity the model reallocates it into black boxes and reduces society’s level of control along with the increasing power of technology – a dangerous equation if true.

But even if the Red Queen model is imperfect, any equilibrium in de-growth scenarios will be dependent on the ability to deal with complexity. Taking science and the rate of progress seriously will therefore continue to be essential, as will the need to ensure science policy is fit for purpose. Within the model, science is not a single policy area, but the one that underpins all others: the continued expansion of organised human curiosity is the premise on which progress relies. If we're going to successfully meet the challenges we face as a society head-on, science policy - now confined to discussions of funding, major initiatives and the effects on the startup market - will have to become something much more grounded and central to our political discourse.