The World Cup & AI
Technology promised to fix football—but fans are raging. Here’s what the bungled rollout tells us
“Did you see the game?!”
The FIFA World Cup kicks off next week, triggering versions of this question in dozens of languages for the following month. The question may be gleeful; often, it’ll be grumpy. While moaning is a rite of fandom, the laments have worsened in recent years because of a technological villain named VAR.
The educator and passionate football fan Daisy Christodoulou, of school-software platform No More Marking, became so indignant that she wrote an entire book called I Can’t Stop Thinking About VAR. Now, she argues that you too should think about VAR, because its coming woes at the World Cup will hold insights for the AI era.
—Tom Rachman, AI Policy Perspectives
By DAISY CHRISTODOULOU
For English football fans, there is one legendary refereeing error: Diego Maradona’s “Hand of God” goal in the World Cup of 1986, when the 5-foot-5 Argentinian great leaped as if to head the ball, only to palm it past the goalkeeper. If the referee had seen it, he would have disallowed the goal. But he did not, and he had no technology to help him. The goal stood, and the English never got over it.
Only at the 2018 World Cup did the sport adopt technological reviews via VAR, the Video Assistant Referee system, which most elite national leagues have since introduced. For high-stakes calls, such as whether a goal was valid or whether there should have been a penalty, off-field officials review the footage and alert the referee to any factual mistake, or to “a clear and obvious error” in a subjective decision.
The upcoming World Cup will be the third to feature VAR, and the system is still being tweaked. To many fans, VAR is now a source of utter rage, having caused more problems than it solved, and prompting hostile chants that ring out across stadiums.
You might conclude that football fans will never be satisfied: they spent years complaining about unchecked errors, and now complain about the technological fix. But the problems with VAR—sure to rage throughout this World Cup—say something deeper about tech solutions to human problems. Sometimes, we don’t realise the ramifications until too late.
The accuracy/speed trade-off
Before VAR, managers, players and fans repeated that they just wanted “more right decisions”. The assumption was that accuracy was the absolute good, and that it was acceptable to spend time on a decision if that would improve its quality.
This is an assumption in many walks of life. We tend to see the speed of a decision in opposition to its quality: that faster decisions are more likely to be wrong, and slower decisions are more likely to be right. But speed may itself be an aspect of quality. That is true in sports, where an instant decision is especially valuable, allowing the game to keep flowing.
The average time for a VAR check is under a minute, but the longest in English football was eight minutes. A series of checks significantly slows the game, breaking its rhythm. Every fan and player knows that VAR frequently overturns goals after a check, so the review process has also tempered celebrations, among the most joyous parts of any sport.
Another issue is “ghost minutes”. If VAR is checking an incident, and the ball remains in play, the game carries on during the review. If the reviewers and the referee judge that the incident was a foul, then everything that happened after it is cancelled. Sometimes, these passages of play are longer than a minute. It all contributes to a sense that what is taking place on the pitch is not a concrete reality to be enjoyed or endured in the moment, but something provisional.
Part of the reason football is such a popular game is that it is fast-paced and fluid, and doesn’t have the same natural breaks as tennis, cricket or American football. For that reason, VAR may be more destructive than equivalent systems in other sports. The stated preference of players and fans was “more right decisions”; the revealed preference is that speed matters.
Speed matters in other areas of life too. The UK has a particularly ponderous system of planning regulations. With big building projects, speed isn’t just a bonus, but an aspect of the project’s quality. If a road, railway, or power station runs 10 years over schedule, you’ve lost 10 years of its use. It’s also less likely to be relevant and up-to-date.
One response is better technology, such as using AI to digitise paper planning documents. This points to something weird about VAR. The typical expectation is that humans are slow and technology is fast. Think of online bank loan applications: an algorithm can approve or reject you almost instantly, whereas human review used to take days. But with VAR, it is the other way round: human refereeing gave instant decisions, but the technology added delay.
This is not true of all technology in football. A few years before VAR, the Premier League introduced technology to judge if the ball had crossed the goal-line or not. It has worked well, partly because it makes few errors, but also because its decisions are automated and speedy. As soon as an incident happens, the referee gets a buzz on his watch with the result. If the ball didn’t cross the line, the game flows. If it did, a goal is immediately awarded. That’s the standard we should be aiming for: instant and automatic decisions with a high degree of accuracy.
Goal-line decisions are objective, so lend themselves to automation. But reaching this standard of automatic accuracy for more subjective decisions, such as a handball, will be fiendishly complex, and risks endless disputes and ire. In education, for instance, multiple-choice exams have objective scoring criteria, so automating the marking process with technology works well, saves teacher time, and eliminates human error. But creative writing is more subjective. Trying to speed up the process through automation won’t resolve the underlying disagreements, and may make them worse.
The consistency/common sense trade-off
It might seem odd to say that handball is a subjective decision. Surely it is obvious if a ball has hit someone’s hand or not? But the past few years of VAR have created extraordinary confusion about this fundamental rule.
Before VAR, the written rule was short and simple, just 20 words long. Referees were allowed to use their own judgement to decide whether a handball had been deliberate, and therefore a foul, or if it had not.
The advantage of empowering an individual decision-maker is that you get to incorporate human discretion and common sense. The disadvantage is that you will get inconsistency, not just between different referees but even within the same referee’s set of decisions.
One way this has been addressed—even before technology—is by restricting human discretion. In many legal systems, judges don’t have complete latitude in sentencing decisions. They have guidelines because unconstrained discretion often meant that two people who had committed similar crimes received different punishments. Inconsistencies like this risk bringing systems into disrepute.
Since VAR arrived, the written rule about handballs has expanded to 11 times its original length, full of subclauses trying to cover every possible way a ball might hit a hand, and every possible context in which it should result in a foul. That is because scrutiny of video replays has drawn attention to every last detail of contact, movement, and placement, details that were impossible to parse in a fluid real-time experience. If a player jumps in the air to head the ball and uses their arms for leverage, and then the ball deflects off an opponent’s head onto their arm, is that handball? Is it enough of an offence to justify the award of a penalty kick, which gives the team a 75% chance of scoring a goal in a game where scoring goals is rare and difficult?
In a game from September 2020, the Newcastle striker Andy Carroll headed the ball onto the arm of the Spurs defender Eric Dier. Dier’s back was turned to Carroll, and he could not even see the ball. After a VAR check, the referee awarded a penalty. The Newcastle manager, Steve Bruce, who benefitted from the decision, said: “If you’re going to tell me that is handball then we all may as well pack it in. It’s a nonsense, a nonsense of a rule. It’s gone for us today—however, it’s ludicrous.”
A newspaper match report said, “This was perhaps the worst among a long list of maddening recent decisions brought about by a rule that goes against the sport’s very essence.”
In previous years, this kind of handball infraction might never have been spotted, let alone punished. The problem has become so acute that the law is now constantly revised, and applied differently in different tournaments. In the English Premier League, referees err on the side of leniency. In European and international tournaments, referees are stricter.
Solving the wrong problems
Attempts to improve handball enforcement have often fallen into a classic trap: deploying an impressive technological innovation that doesn’t address the underlying problem. At Euro 2024, referees were given access to a sensor inside the ball that could detect whether it made contact with a player’s hand. Similar systems work well in cricket. But in football, the crucial question is rarely whether the ball touched the hand; it’s whether the handball was deliberate or significant.
If you automated all handball decisions based on data from sensors, it could transform the game. Technological automation of other decisions in society, such as those in law enforcement, could present comparable problems. Suppose you could automatically fine anyone who submits a tax return with a discrepancy. Would this reduce tax evasion? Or just prompt tax evaders to avoid discrepancies?
Or imagine a school that wants to crack down on absenteeism. It is relatively easy to create an online tracking system that bombards parents with automated texts or requests for evidence every time they want to book a dentist’s appointment for their child. It is much harder to create a system that identifies and addresses the early signs of chronic disengagement.
Often, the questions that technology finds easiest to answer are not the ones we care about the most. Automation can solve problems. It can also create new ones, with aggressive enforcement of rules where behaviour is easiest to quantify, not where harms are greatest.
The increased scrutiny and the added detail in the new handball rule have not even succeeded in providing greater consistency. Fans still point to infractions in the same season (sometimes even the same week or the same game) that look similar, yet resulted in opposite decisions. We have the worst of both worlds: a system that has reduced human discretion and lessened simplicity, but failed to improve consistency.
There’s a fascinating analogy here with artificial intelligence. The earliest AI methods were rules-based: experts tried to write down every rule a skilled human might apply when making a decision, in much the same way that football’s rulemakers now attempt to write down every possible way a ball might make contact with a hand.
The rule sets behind these early AI systems multiplied, yet they couldn’t capture what a human was actually doing. The breakthrough came not from more or better human-crafted rules, but from giving the machine vast quantities of data and letting it intuit the rules itself. AI models trained in this way proved better at capturing and representing the tacit knowledge that experts hold but find difficult to put into words.
One caveat here is that many real-world decisions require deep moral reasoning to weigh conflicting human wants. Even training an AI model on vast amounts of data will not alone align the system’s behaviour with all our preferences. So, which judgements can we automate with technology? And when would we prefer fallible “humans in the loop”? Safety researchers are considering such questions with urgency. Perhaps they could learn something from watching a little football.
World Cup 2030?
By now, the issues with VAR cannot be written off as teething problems. We have had nearly a decade of adjustments, but can anyone feel confident that this coming World Cup will see no glaring refereeing errors?
We are stuck with VAR this time. But if we look further ahead, to the 2030 World Cup, we need to think less about tweaks and more about transformation. Here are three important amendments for an improved system:
Prioritise speed. Even if the system can’t make instant judgements in its early form, the speed of goal-line technology should be the ultimate aim.
Ensure fan involvement. This system shouldn’t feel like alien values imposed on a game beloved by billions.
Include tacit knowledge. Trying to turn regulations like the handball rule into written commandments cannot work. Instead, the system needs to incorporate the nuanced understanding of those who watch the game.
One solution could be to use machine learning to create a Foul Probability Index. You’d start by assembling training data: hundreds of thousands, possibly millions, of video clips of real incidents judged by professional referees and by fans. Every clip gets a percentage foul rating based on all of these judgements. Then, a model learns from the training data what a foul is and is not. For any new incident, it could provide the percentage chance that it is a foul.
To begin with, this could be applied in live matches as part of the current VAR system, as an aid to the on-field referee and the video assistant referees. If it worked well, you could get to the point where it functioned like goal-line technology, and an incident judged to be above a certain probability threshold would trigger a buzz to the referee’s watch, alerting him to blow his whistle.
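For readers who like to see an idea in code, here is a minimal Python sketch of how such a Foul Probability Index might hang together. Everything specific in it is an assumption made for illustration: the two hand-picked features (`arm_extended`, `facing_ball`), the vote counts, the 0.8 buzz threshold, and the use of a simple logistic regression. A real system would presumably learn from video itself, not from a pair of yes/no features.

```python
import math

def foul_rating(votes):
    """Share of judges (referees and fans) who called the clip a foul."""
    return sum(votes) / len(votes)

def predict(weights, bias, features):
    """Model's estimated probability that an incident is a foul."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def train(clips, ratings, epochs=5000, lr=0.5):
    """Fit a logistic regression to the percentage ratings (soft labels)
    using full-batch gradient descent on the cross-entropy loss."""
    weights = [0.0] * len(clips[0])
    bias = 0.0
    n = len(clips)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, target in zip(clips, ratings):
            error = predict(weights, bias, features) - target
            grad_w = [g + error * x for g, x in zip(grad_w, features)]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def should_buzz(probability, threshold=0.8):
    """Hypothetical trigger for the buzz on the referee's watch."""
    return probability >= threshold

# Hypothetical training clips: [arm_extended, facing_ball], each judged
# by ten people (1 = foul, 0 = no foul). The votes are invented.
clips = [[1, 1], [1, 0], [0, 1], [0, 0]]
votes = [
    [1] * 9 + [0] * 1,  # arm out, watching the ball: most say foul
    [1] * 4 + [0] * 6,  # arm out, back turned: judges are split
    [1] * 3 + [0] * 7,  # arm tucked in, watching the ball
    [1] * 1 + [0] * 9,  # arm tucked in, back turned: most say no foul
]
ratings = [foul_rating(v) for v in votes]

weights, bias = train(clips, ratings)

p_clear = predict(weights, bias, [1, 1])
p_accidental = predict(weights, bias, [0, 0])
print(f"clear handball: {p_clear:.2f}, buzz = {should_buzz(p_clear)}")
print(f"accidental contact: {p_accidental:.2f}, buzz = {should_buzz(p_accidental)}")
```

The design choice worth noticing is that the model is trained on the share of judges who saw a foul, not on a single yes-or-no verdict, so disagreement among referees and fans is kept as information instead of being argued away. Only incidents that clear the threshold would interrupt the game; everything else would stay with the on-field referee.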
Some of the most uncanny successes of deep learning have come in games, such as chess and Go. Human experts have marvelled at the intuitive brilliance of the famous “Move 37” made by AlphaGo in its sequence of games against Lee Sedol in 2016. The equivalent success for a machine-learning handball recognition model would be to recognise that, whilst the ball struck Eric Dier’s arm, a penalty was the incorrect decision.
If the emblematic error from the 1986 World Cup was a referee missing Diego Maradona’s obvious handball, the emblematic error of the 2026 World Cup could be a ball accidentally brushing a player’s hand 20 seconds before a goal is scored, only for the referee to disallow it after a four-minute video review.
One of the insights of deep learning is that examples are more powerful than rules. Both machines and humans learn better when presented with concrete examples, not abstract principles. So if there is one upside to VAR’s failures, it’s that it gives vivid examples of abstract ideas.
Automation, alignment, consistency: football has accidentally created a giant case study of dilemmas in modern technology. When we get the first VAR controversy of the World Cup, it will be worth remembering that the argument unfolding on the pitch is a miniature version of much larger arguments the rest of society is only beginning to have.



