Today’s essay is the second entry in a seven-part series focused on questions of responsibility, accountability, and control in a pre-AGI world. This piece, and those that follow, are written in a personal capacity by Seb Krier, who works on Policy Development & Strategy at Google DeepMind.
Following the first essay, which focused on defining dangerous capabilities, I'll continue publishing one piece every week or two that attempts to make sense of where we are now and where we are going. An important caveat: these are my current high-level thoughts, and they're certainly not set in stone. The nature of AI means my views evolve often, so I highly encourage comments and counterpoints. Over the coming weeks, this series will include the following essays:
→ Not all risks need to be stopped; how should we delineate unacceptably dangerous capabilities and use cases from tolerable risks and harms?
How do we use models to bolster societal defenses?
Are there ways of reconciling competing visions of fairness in AI?
What does a world with many agentic models look like? How should we start thinking about cohabitation?
How do you deal with faster disruption and technological displacement in the labor market, both nationally and internationally?
How can we balance proliferation and democratization with appropriate oversight when developing and deploying powerful AI systems?
Huge thanks to Kory Matthewson, Nick Whittaker, Nick Swanson, Lewis Ho, Adam Hunt, Benjamin Hayum, Gustavs Zilgalvis, Harry Law, Nicklas Lundblad, Seliem El-Sayed, and Jacques Thibodeau for their very helpful comments! Views are my own etc.
Not all risks need to be stopped; how should we delineate unacceptably dangerous capabilities and use cases from tolerable risks and harms?
Herman Kahn once posed a thought experiment: would you accept a 1 in 66 chance of birth defects to eliminate the Soviet threat via nuclear war? His audience would instinctively recoil, yet birth defects already occurred at a rate of 1 in 33. His point was that we must view risk systemically, not in isolation. Like investors managing portfolios, we should weigh the added risk of our actions (and inactions) against the risks that already exist. This holistic approach is often missing in discussions of AI harms, where risks tend to be assessed narrowly: pointing out a hypothetical harm is useful, but not enough. Academics and journalists are very good at finding issues, but there is far less discussion of what level of risk is tolerable, or how risks should be weighed against benefits. Motorcycles are quite dangerous, yet we don’t prohibit them or require them to have two more wheels. Controlled forest fires are risky, but they are also a method for reducing the risk of larger, uncontrolled wildfires. Human clinical trials can pose risks to participants, such as unforeseen side effects or adverse reactions, but we accept that risk because of the need to develop new treatments for disease. What does risk tolerance look like with AI models?
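To make the “portfolio” framing concrete, here is a rough back-of-the-envelope sketch (my own illustration, not Kahn’s calculation, and it assumes the two risks simply add) of how the incremental risk in his example compares to the baseline:

```python
# Back-of-the-envelope illustration of marginal vs. baseline risk,
# using the figures quoted above (assumes the risks are roughly additive).
baseline = 1 / 33   # existing rate of birth defects (~3.0%)
added = 1 / 66      # additional risk posed by the hypothetical action (~1.5%)

combined = baseline + added          # total risk if the action is taken
relative_increase = added / baseline # how much the action raises the baseline

print(f"Baseline risk:     {baseline:.1%}")
print(f"Added risk:        {added:.1%}")
print(f"Combined risk:     {combined:.1%}")
print(f"Relative increase: {relative_increase:.0%} over the baseline")
```

Viewed in isolation, a 1 in 66 chance sounds intolerable; viewed against a 1 in 33 baseline, it is roughly a 50% increase on a risk society already lives with. Neither framing settles whether the trade is acceptable, but only the second makes the trade-off visible.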
The concept of risk tolerance has real-world implications, particularly in Western Europe, where conservative cultural attitudes and an aging population can greatly influence technology policy. As John Burn-Murdoch notes, “over the past 60 years the west has begun to shift away from the culture of progress, and towards one of caution, worry and risk-aversion, with economic growth slowing over the same period.” A few caveats. First, my focus here is more on the deployment side than the development side. Second, a safety-versus-progress frame alone is too reductive: when we think of the history of progress, we should celebrate achievements in safety as well as in economic production. Third, I want to be very clear that this is not an “AI safety is bad” or “AI ethics doesn’t matter” hot take: in fact, I want a lot more high-quality work in these fields. My concern is that a culture of (sometimes) overblown risk aversion and superficial safety theater undermines genuine safety efforts, and leads to hasty, counterproductive decisions as politicians seek quick fixes instead of facing complex problems. The solution, as Jason Crawford points out, is not to slow or stop progress, but to identify positive steps we can take towards safer technology.
This risk-averse mindset sometimes manifests in policy decisions. I previously expressed skepticism about parts of the EU AI Act - at least some of its earlier drafts - because of how broad and indiscriminate they are. For example, I don’t think an AI model is by definition high-risk simply because it is used in education or justice. The net is cast too wide: a model used to assign class timetables is not inherently ‘dangerous’ to the point of requiring a risk management framework, activity logging, detailed documentation and security measures. An AI agent tasked with grading national exams, however, probably does warrant that. In infrastructure, a model tracking water or energy consumption patterns to identify potential leaks or inefficiencies is not high-risk, whereas an AI system balancing power load across a grid and making real-time decisions would be. This also illustrates why I tend to prefer sectoral approaches to regulating AI deployments: they are less blunt, and they incentivise a more in-depth understanding of how models are used in (and affect) different contexts. As a result, they are also less likely to capture otherwise tolerable or even desirable risks. A good example of application-focused, positive regulatory action is the FCC declaring AI-generated voices in robocalls to be illegal.
I think people often fail to see the harms caused by the precautionary principle. Defaulting to “play-it-safe” mode can mean forgone experimentation and impeded long-term prosperity. As Cass Sunstein notes, “the principle is literally paralyzing—forbidding inaction, stringent regulation, and everything in between. The reason is that in the relevant cases, every step, including inaction, creates a risk to health, the environment, or both.” Premature regulation also creates path dependencies. Regulating planes on the basis of the Kitty Hawk Flyer would have been futile and probably harmful - but as powered flight became more viable and widespread, regulations focusing on safety and pilot qualifications became necessary to protect both pilots and the public. So it’s important to give industries time to mature the products they build. There are also plenty of other feedback mechanisms in society, from media scrutiny to lawsuits, that help correct course when things go wrong in the interim. To use an AI-specific example, much of the late-2010s literature on error rates in facial recognition is now arguably less salient. If regulations had been based on those error rates, they would have quickly become obsolete. Of course, other problematic aspects like misuse remain, and these are now more useful targets for regulation.
To get this right, I think it’s also important to consider individual responsibility, the scale of risk, and the likelihood of harm to third parties. Extreme risks, in my mind, justify more preemptive measures, and irreversible actions warrant more scrutiny. At the moment, discussions about risk and safety conflate many different things: should a model be able to tell me how to inject heroin? Should it be allowed to be rude? Should it be able to give medical advice? Should anyone be able to conduct automated, agent-assisted cyber-attacks? Should advanced agentic models be able to autonomously replicate? A single regulatory framework is unlikely to address all of these questions in one go, and will likely both undershoot and overshoot. And while I don’t believe I should be able to create a bioweapon or hack other systems (note: we’re not there yet), I also don’t want a model censoring offensive content ‘for my own good’ - right now I’m not sure we have the balance right. Mill’s Harm Principle remains a valuable rule of thumb, and I like Gemini’s approach of letting users turn the dial on its safety settings. Implementing those limits can be challenging, but ultimately I think users should have a lot more control over what they consider acceptable for their own use, and the bar for censoring information should be very high.
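To illustrate what “turning the dial” can look like in practice, here is a minimal sketch using the Gemini API’s Python SDK (google-generativeai), where the caller can relax or tighten per-category safety thresholds. Treat it as indicative rather than definitive: category and threshold names reflect the SDK at the time of writing and may have changed, and the model name and API key are placeholders.

```python
# Minimal sketch: user-adjustable safety thresholds via the google-generativeai SDK.
# Category/threshold names reflect the SDK at the time of writing and may differ.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder

# A permissive user can lower the filtering thresholds for categories they tolerate...
permissive_settings = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH"},
]

# ...while a more cautious user or deployment can tighten them instead.
strict_settings = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_LOW_AND_ABOVE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_LOW_AND_ABOVE"},
]

# The "dial" is simply which set of thresholds the user chooses to pass in.
model = genai.GenerativeModel("gemini-1.5-flash", safety_settings=permissive_settings)
response = model.generate_content("Write a biting satirical take on risk aversion.")
print(response.text)
```

The point is less this particular SDK than the design choice it gestures at: keep a high, fixed bar for the genuinely unacceptable, and hand as much of the remaining dial as possible to the user.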
Question for researchers: how should AI policy balance individual user control with the need to mitigate potential societal harms?
I'm looking forward to the rest of this series! Here are my thoughts so far:
(1) I think I basically agree with you re precautionary principle paralysis and many safety efforts being in fact counterproductive. I propose we do something like
--Divide risks into those for which iterative development works (i.e. problems that, if they happen, we'll notice and fix before major catastrophe occurs) and those for which it doesn't (i.e. problems such that, if they happen, we either plausibly won't notice or plausibly won't fix before major catastrophe occurs)
--and then have fairly permissive / move-fast-and-break-things policies for the former and more restrictive precautionary-principle-esque policies for the latter.
(2) So far this series has been long on questions and considerations, and short on answers and proposals. Fair enough I guess, but it would be nice to see some more of the latter IMO. What are your opinions about e.g. AGI timelines, takeoff speeds, AGI governance strategies, AGI alignment strategies, likely failure modes of likely-to-be-used AGI alignment strategies, etc.?
(3) The seven essays promised in this series don't seem to contain anything about misalignment, superintelligence, or loss-of-control. Just putting in a vote here in case you care what I think (I won't take offense if you don't) that I'd love to hear your thoughts on those topics!