This is a very interesting essay echoing some of the themes in the roundtable discussion the Institute of Physics hosted a year ago at The Economist FusionFest, and later expanded here: https://arxiv.org/abs/2603.25777
Thanks Iulia - great to see that The Economist’s FusionFest now appears to be an annual event.
I spent time in the fusion industry in one of my last UK roles, working in and around the STEP programme. The data problem you’re describing was live then and I suspect it’s sharper now.
The hard issue isn’t the AI tools. There are plenty of those sitting across the value chain. The hard issue is what sits underneath them: engineering verification as you pass data from simulation through experimental observation into the production system for an operational reactor. That chain has to be unbroken and auditable.
Safety cases make that specific. A regulator reviewing an operational reactor doesn’t accept “the model said so.” They need to trace the claim back through the experimental data that validated it, and back further to the simulation assumptions that shaped the experiment. Break that chain anywhere and the safety case doesn’t hold.
The regulatory framework isn’t a bureaucratic overlay. It’s the thing that forces you to solve the data problem properly rather than approximately.
What the AI layer doesn’t change is the provenance requirement. Tooling can help manage complexity, but it can’t substitute for a data lineage that a regulator can actually follow.
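A minimal sketch of what "a lineage a regulator can follow" could look like in code, assuming each artifact records the artifacts it was derived from. The artifact names and the `parents` mapping are hypothetical illustrations, not taken from STEP or any real safety case.

```python
def trace_lineage(artifact_id, parents):
    """Walk from a claim back to its roots, failing loudly on a broken link.

    `parents` maps each artifact id to the ids it was derived from
    (an empty list marks a root, e.g. a documented simulation assumption).
    """
    chain, stack, seen = [], [artifact_id], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        if current not in parents:
            # An artifact with no lineage record means the chain is broken.
            raise KeyError(f"no lineage record for {current!r}")
        seen.add(current)
        chain.append(current)
        stack.extend(parents[current])
    return chain


# Hypothetical three-step chain: claim <- experiment <- simulation assumptions.
parents = {
    "safety_claim": ["validation_experiment"],
    "validation_experiment": ["simulation_assumptions"],
    "simulation_assumptions": [],
}
lineage = trace_lineage("safety_claim", parents)
```

The design choice here is that a missing record raises an error instead of being skipped, mirroring the point that a chain broken anywhere does not hold.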
Miss tackling that problem. Need to get into fusion in NZ (my new home!)
Thanks Neil! The data provenance question is v. interesting. A lot of the experts shared thoughts on this, and some positive examples from history, although alas our final scope didn't allow us much space to expand on this. Quite technically challenging for us non-fusion folks to wrap our heads around, especially as one starts to combine simulation + experimental data.
Also absolutely fair. We framed 'bureaucratic debt' as an obstacle to data access here, but of course the broader regulatory frameworks likely have several positive forcing functions that we don't touch on here.
Hope you find a way to work on fusion in NZ!
Cool to see this effort. The experimental data challenges really felt familiar. At GF, we made a decision at one point to make a fairly big investment in better data warehousing.
Part of the thinking was being able to create datasets with enough quality that we could apply ML techniques (this was pre-AI). For physics, as you say, a lot of the data is calculated or inferred (examples: magnetic sensors measure dB/dt, as digitized voltages, but need to be calibrated and integrated to provide B. Data from arrays of x-ray photodiodes with different thickness filters are combined to measure temperature, etc.).
We made sure the system integrated operator and science team notes, but maybe less obvious is we realized we needed to be able to store the history of the calibration (or transfer functions) along with both the raw data and the processed data. Why? Calibrations can drift over time and changes can be applied retroactively, or better transfer functions can be developed, and raw data needs to be reprocessed on older shots. When using ML techniques across large numbers of shots, this history can be important. I'm not sure it gets captured often.
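To make the magnetic sensor example concrete, here is a rough sketch of turning digitized pickup-coil voltages into B. It assumes an idealized coil where V = NA dB/dt (the sign depends on winding convention), a linear digitizer calibration, and B = 0 at the first sample. All parameter names and values are hypothetical; real diagnostics also need drift and offset correction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoilCalibration:
    volts_per_count: float  # digitizer scale factor (hypothetical value)
    offset_counts: float    # digitizer baseline in counts (hypothetical value)
    turns_area_m2: float    # effective N*A of the pickup coil (hypothetical value)


def integrate_b(counts, dt, cal):
    """Return B(t) in tesla from raw counts sampled every `dt` seconds."""
    # Step 1: calibrate raw counts to volts.
    volts = [(c - cal.offset_counts) * cal.volts_per_count for c in counts]
    # Step 2: cumulative trapezoidal integration of V / (N*A), starting at B = 0.
    b = [0.0]
    for v_prev, v_next in zip(volts, volts[1:]):
        b.append(b[-1] + 0.5 * (v_prev + v_next) * dt / cal.turns_area_m2)
    return b


cal = CoilCalibration(volts_per_count=0.01, offset_counts=10.0, turns_area_m2=0.01)
# Five samples of a constant 1 V signal, 1 ms apart: B ramps linearly.
b_trace = integrate_b([110, 110, 110, 110, 110], dt=0.001, cal=cal)
```

Note that the result depends on every calibration constant, so a change to any of them changes the "measured" B for the same raw counts.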
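A minimal sketch of the idea, assuming a simple linear calibration (gain and offset) and an in-memory store. The class and field names are hypothetical and not a description of the system built at GF. The key points are that raw data is never overwritten, calibrations are append-only, and each processed result is keyed by the calibration version that produced it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Calibration:
    version: int
    gain: float
    offset: float
    note: str = ""


@dataclass
class ShotRecord:
    shot: int
    raw: list
    processed: dict = field(default_factory=dict)  # calibration version -> values


class SensorArchive:
    def __init__(self):
        self.calibrations = []  # append-only history, oldest first
        self.shots = {}

    def add_calibration(self, gain, offset, note=""):
        cal = Calibration(len(self.calibrations) + 1, gain, offset, note)
        self.calibrations.append(cal)
        return cal

    def add_shot(self, shot, raw):
        self.shots[shot] = ShotRecord(shot, list(raw))

    def process(self, shot, version=None):
        """Process a shot with the latest calibration, or a named older version."""
        cal = self.calibrations[-1] if version is None else self.calibrations[version - 1]
        rec = self.shots[shot]
        if cal.version not in rec.processed:
            rec.processed[cal.version] = [(x - cal.offset) * cal.gain for x in rec.raw]
        return cal.version, rec.processed[cal.version]


archive = SensorArchive()
archive.add_calibration(gain=2.0, offset=1.0, note="initial bench calibration")
archive.add_shot(100, [1.0, 2.0, 3.0])
first = archive.process(100)
archive.add_calibration(gain=3.0, offset=1.0, note="improved transfer function")
reprocessed = archive.process(100)          # old shot, new calibration
original = archive.process(100, version=1)  # earlier result stays reproducible
```

Because every processed value carries its calibration version, an ML dataset built across many shots can record, or filter on, which calibration each sample came from.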
Last point is that the line between simulation and experimental data is fuzzy. Simulations are very often used to interpret experimental data and "reconstruct" a full picture of what is going on in the plasma. The results of those reconstructions are thought of as experimental data, even though they depend on the quality of the simulation.
Anyways, awesome to see the effort on this! There is a lot of potential to advance the field.
Thanks for sharing Michael. I found myself getting confused as to what was experimental data and what was simulation data at various points! I hadn’t realised the point about storing the history of the calibration - very interesting. Shows how complex the data challenge is.
Is the leverage here this high? Companies like Commonwealth and Helion don’t seem to be bottlenecked by the lack of better plasma confinement modeling, and this doesn’t seem remotely close to ITER’s main woes.
Material science for tokamaks definitely is high leverage.
Thanks Fernand - good questions. We indeed didn't fully cover, or rank, the relative importance of AI to addressing the primary bottlenecks in fusion vs other things (which is a very good question to ask).
A common thing we did hear though, even for ITER, is that the challenges and costs for fusion experiments are so high that we 'need AI to be useful in addressing them', which is of course different to pinpointing specific use cases, and may be a somewhat risky starting point (a solution chasing a problem etc.). But for some/most of the opportunities that we discuss in the article, folks also drew some specific links to opportunities for ITER.
The material science discussions we had were so interesting. It was a pity we couldn't write more about them in the final article (for scope) - lots of challenges in getting good experimental data there too, but also lots of ambitious ideas floating around. And interesting to think about 'where else' the materials developed for fusion might prove to be useful...