Maintaining agency and control in an age of accelerated intelligence
Climbing the ladder of abstraction
This essay is written by Séb Krier, who works on Policy Development & Strategy at Google DeepMind. Like all the pieces you read here, it is written in a personal capacity. We encourage readers to engage with this piece as an exploration of ideas, rather than as a presentation of firmly held beliefs.
Hofstadter’s Law: "It always takes longer than you expect, even when you take into account Hofstadter's Law."
In this piece, I want to untangle several threads that often get mixed up in AGI discussions: the distinction between capabilities and deployments, the relationship between technical progress and real-world impact, and how humans maintain meaningful control as systems become increasingly sophisticated. While the technical path to AGI is important, I'm equally interested in what happens afterward - my central concern is how we preserve agency not by understanding every cog in increasingly complex systems, but by designing the right level of abstraction for human oversight.
The gaps between capability, deployment and impact
Based on current scaling trends and algorithmic progress, I think it’s likely that we will reach ~AGI capabilities before 2030. By AGI I'm specifically referring to systems that achieve something like ‘Expert AGI’ as defined in the Levels of AGI framework - that is, AI that performs at or above the 90th percentile of skilled adults across a wide range of non-physical cognitive and metacognitive tasks. While physical capabilities through robotics are advancing and crucial for full real-world impact, my focus here is primarily on the cognitive capabilities enabling reasoning, learning, and problem-solving across domains with minimal human oversight.
Crucially, however, achieving these capabilities is distinct from deploying them in ways that yield truly transformative changes. This distinction between capabilities (what a system can do, often demonstrated in controlled evaluation settings) and deployments (how systems are integrated into real-world, value-producing applications) is central. Deployments are much harder to model and predict; notice how most online forecasting rarely focuses on concrete products or use cases. Simulated environments help gauge capabilities, but designing (and adopting) useful agents and products is another challenge entirely.
So while capabilities may arrive relatively soon, I expect a lag before we see widespread transformative impact from them. Intuitively, it might seem that AGI deployment would be relatively quick - we are already using pre-AGI systems, and so we have existing infrastructure to draw from. However, this underestimates real-world frictions, such as the slow grind of human and corporate inefficiency and the difficulty in achieving product-market fit for truly useful applications. Progress on specific, often academic benchmarks can sometimes create an illusion of proximity to broad usability.
Furthermore, even as underlying capabilities advance, progress might feel slow during certain periods. There are several reasons for this potential perception gap. As Joshua Achiam observes, AI may improve significantly on complex specialized tasks that are irrelevant to most people, “creating an illusion that progress is standing still.” Additionally, the advanced capabilities of a pre-AGI model or agent might initially go unnoticed in everyday interactions, as most current chatbot queries are basic. The capabilities might exist but remain underexploited, much like the months-long effort required by engineers to fully 'extract' value from a newly trained model.
Still, some strategically significant changes might arise even with limited deployment. A few actors using these capabilities effectively could gain a considerable head start – think of the decisive strategic edge gained by codebreakers during wartime, where a capability understood and wielded by only a select few dramatically altered outcomes.
From today to AGI
On the R&D side, training a capable model is only the first step. Ensuring it's fit for purpose demands specialised data and meticulous post-training/fine-tuning. Building a truly useful and effective agent - or a multi-agent system with tailored roles and a robust pipeline - requires significant effort and development. These exercises take time and require many iterations.
High-quality synthetic data is not easy to generate. And when you finally run these systems, there’s still a lot of trial-and-error involved to achieve the desired output. If the output is data, you still need to evaluate, process, and use it - which is labour-intensive. If the output consists of actions, you need to verify and check results - same thing. This requires new tools and frameworks that also need to be created, iterated upon, and perfected simultaneously. The systems and products we build on top of these models will also be complex and unwieldy. The more I unpack and examine how models are trained and improved, the more I understand Hofstadter’s Law.
But while training a capable model is a significant hurdle, the need for specialised data and extensive fine-tuning will likely diminish as models become more advanced. More intelligent models will also be better able to work around imperfect tools and suggest improvements, reducing the need for painstaking manual optimisation. This is the benefit of generality and computational power.
We see this across the board: larger models tend to perform better out of the box than smaller, specialised models. For example, Bloomberg trained a GPT-3.5-class model on their financial data in 2023, but GPT-4-8k soon outperformed it on nearly all finance tasks. While people sometimes make hyperbolic claims about generality winning out, I do think the broad point holds - it’s certainly been my experience. The Bitter Lesson remains as true as ever.
Perhaps this is because large models effectively bundle many specialized capabilities, but unlocking or optimizing them for specific tasks often requires dedicated fine-tuning or the creation of smaller, derived models for efficiency and convenience. The learnings from this specialization process, however, frequently inform the architecture and training of subsequent, even more capable general models, continuing the cycle.
The larger the model (and the more computation used for training), the greater the potential relative gain from its self-improvement capabilities. As systems improve and demonstrate reliability, and as AI tools increasingly assist in the validation process itself, the level of scrutiny required during their development will likely decrease. Over time, we can expect many, if not most, parts of the R&D pipeline to become automated.
For end-users, businesses, different sectors, and consumers, the calculus is different. Once products offer a sufficient degree of convenience, utility, and reliability, widespread adoption will likely occur rapidly. Researchers and academics may continue to meticulously scrutinise these systems, but the majority of users will prioritise ease of use and perceived benefits over deep technical understanding - accepting AI assistance with minimal oversight, much like following an online maps route without a second thought.
However, finding the right ‘product fit’ is difficult. This is often overlooked by those working on R&D, but it is critical for ensuring high adoption and diffusion. These elements are more about engineering, operational, contextual, and commercial complexity - as well as user experience - than raw model capabilities, which scaling trends don’t typically account for. To be clear, there’s nothing inherently impossible or intractable here, but it probably adds a couple of years or more to my timelines.
We'll likely see the gradual deployment of better models and more capable systems, with improved integration within the information ecosystem every month and year. Models will continue to improve, taking on more tasks in the economy. By the time we reach true AGI, much of the groundwork for deployment will likely have already been done.
In a sense, today's environment lacks the mature techno-organizational ecosystem required to smoothly ‘drop in’ AGI workers at scale, in a way that can account for the tacit knowledge that underpins many workflows. But as AI capabilities progress, this ecosystem is likely co-evolving, meaning the pathways for effective integration – whether through internal change or external disruption like AI-first startups – may be significantly clearer by the time ~AGI capabilities arrive. After that, the challenge will be twofold:
Continuing to improve models and deploying these systems at scale.
Achieving repeatable benefits and keeping this cycle running.
The big questions are: How much time will this take? Is the curve exponential? And what about the physical infrastructure and hardware underpinning AI systems?
So what about post-AGI?
A lot of the above could, over time, be automated too - but the process of automating itself requires extensive iteration. This involves essentially repeating all the steps mentioned above, but multiple times, at different layers of abstraction. Every time you climb a rung of the abstraction ladder, new tasks, actions, and options are created and must be completed for progress to continue. Both executing and subsequently automating all of this require specialised work and are unlikely to be solved in mere days, even with the assistance of multiple pre-existing AI systems operating in parallel. Put differently, each layer of abstraction doesn't just encapsulate the complexity below it but generates new types of complexity that must be managed. And all of this assumes we have reached this level with minimal societal pushback, which I don’t expect (e.g. strikes, protectionism, legally mandated human roles, etc.).
Setting aside the sociopolitical elements for a moment, managing this emergent complexity will necessitate substantial compute resources for experimentation, training and inference, leading to a faster escalation in energy demand. This rising demand could eventually drive a shift in how we produce compute substrates. As Epoch AI highlights, current chip manufacturing paradigms may not scale beyond 2030 (for training) due to power and manufacturing constraints. This may necessitate a transition to quantum computing or more unconventional approaches. This alone implies a degree of slowing down, relative to what one would expect from a purely linear extrapolation.
Progress will depend on optimising both the algorithmic and hardware layers, potentially shifting focus from software innovations to material science and advanced manufacturing. For example, building the cutting-edge fabs and lithography machines needed to produce more efficient chips is a highly complex endeavour. Even if millions of AGI scientists assist in this effort, financing and constructing adequate and safe physical labs will still require considerable time.
The pace of hardware improvement is likely to be constrained not only by physical and manufacturing limits but perhaps also by things like the transfer of tacit human knowledge and tense geopolitical dynamics, rather than by algorithmic complexity alone. As we exhaust the potential for algorithmic optimisation on existing hardware, progress will increasingly depend on advancements in chip design and manufacturing infrastructure. This may involve building specialised factories or developing novel material processing techniques, areas where AI can help but where human expertise and tacit knowledge will likely remain essential for some time.
How much time do these implementation challenges add to the journey from initial AGI capabilities to transformative economic impact? Does ~AGI materially change things?
What makes me lean towards very fast:
Parallel processes can stack multiplicatively with agents who work 24/7 - see also Price’s Law.
Geopolitical competition will accelerate investment, driving massive capital mobilization and strategic concentration of compute resources.
As Daniel Kokotajlo writes, automation of ML R&D will lead not just to faster iteration but also to qualitative algorithmic breakthroughs that unlock step-changes in capability.
Larger models will continue to improve and outperform smaller, specialised models.
Smaller, more efficient models will continue to improve, eventually reaching the capabilities of the previous generation of large models.
We’ve only just begun exploring inference scaling, with ample room for further impressive capabilities.
There is potential for rapid, widespread automation across the broader economy, leveraging AI for ordinary cognitive and physical tasks.
Privacy-preserving monitoring and verification technologies could greatly facilitate governance.
Robots are becoming increasingly capable and dexterous, a trend that is likely to continue.
AGIs could contribute to energy R&D, creating a positive feedback loop, with energy efficiency of compute improving simultaneously.
What makes me lean towards not that fast:
As we solve one bottleneck, new, previously unappreciated ones become salient.
"Easy for humans, hard for AI" tasks are arguably more critical for widespread automation than superhuman performance on narrow cognitive benchmarks.
As Ege Erdil notes, there isn't a clear, rapidly improving trendline for agency or common sense that can be confidently extrapolated; and transfer learning remains tricky.
Integration/knowledge challenges may multiply, and verification requirements could grow over time.
Poorly designed regulations, protectionism, or bureaucratic inefficiencies could hinder progress.
The difficulty and cost of acquiring the right kind of data to train capabilities in areas like agency, planning, and real-world interaction.
Legacy infrastructure lock-in: even well-capitalised firms are built on pre-AGI assumptions, and rebuilding physical and organisational systems takes time regardless of capabilities.
Inherent physical limits could restrict speed and slow compute scaling.
Military and geopolitical conflicts, economic crises, weak economies etc.
AI R&D automation alone may not be sufficient for sudden, rapid acceleration; achieving that depends on other feedback loops and Baumol-like bottlenecks.
Managing post-AGI economies and automated production processes will ultimately require both decisions and time. Wolfram argues that even with highly automated AI systems, strategic choices must still be made about which paths to explore in the computational universe, noting that "something—or someone—will have to make a choice of which ones to take." I think this is a crucial point, relevant to agency, control, and speed alike.
Who does the work?
What if the 'something' making choices for everything is AIs with zero humans involved? In that case, humans are arguably no longer in the loop, and things could change much faster than we can process, understand, or control. This is the real ‘automation gone wild’ risk - it’s not just about automating away machine learning R&D, but automating everything. Even without any sudden technological discontinuity or overtly hostile AI actions, our economy, culture, and political institutions could gradually drift away from human influence as they become less dependent on human involvement to function.
I don’t think we will sleepwalk into this, but I do think there are adjacent risks worth considering.
If the 'someone' making choices is humans, then our cognitive speed and limitations will serve as a ceiling, restricting the rate of progress. In this case, humans would constitute a slow-but-necessary component within an ecosystem where different types of thinking or processing occur at different speeds. This doesn’t seem sustainable in the long run; even today, we often sacrifice understanding for efficiency. For example, there’s little point in using an automated cancer detection tool if you intend to manually verify each image in parallel. At some point, the system performs well enough to be trusted. On the other hand, we still have largely decorative ‘drivers’ for the UK tube system, even though these systems are fully and reliably automatable.
The most likely and sustainable allocation of tasks will involve both humans and AIs making decisions. Achieving this will require rules, verification systems, and trust-based mechanisms to maintain a stable and peaceful coexistence - not only between different ASI powers or nations, but eventually also between humans and ASIs. Some tasks we will (hopefully) be content to automate entirely: just as we were content automating bank tellers in the past or call centres today, tomorrow we may be comfortable automating processes at a higher level of abstraction, such as ‘public transport’.
The difficulty is designing systems that can effectively coordinate fast, post-AGI-driven processes with slower human strategic oversight. In such a world, we will need what you might call "cognitive impedance matching" - systems that can translate between AI and human timescales while maintaining stability. These intermediaries could be agents, but other form factors and interfaces may be preferable in some cases. Imagine an ASI managing a complex global supply chain in real-time, making millisecond adjustments; the impedance matching system might present human overseers with curated summaries of network health, predictive alerts about potential disruptions days or weeks in advance, and simplified interfaces to approve strategic shifts (like prioritizing a specific region) without needing to track every individual shipment.
This human oversight is less like a technical checkup, which another ASI might perform, and more akin to setting the destination and preferred route on a navigation system. While another AI could monitor the engine's performance, the human role is to ensure the complex machinery is ultimately serving human-defined goals and navigating according to human-held values and priorities. It's about steering the 'what' and 'why', even as the ASI optimizes the 'how'.
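To make the "cognitive impedance matching" idea slightly more concrete, here is a minimal Python sketch of what such an intermediary layer might look like. It is an illustration only, not a description of any real system, and every name in it (Event, ImpedanceMatcher, the thresholds) is hypothetical: routine, fast-timescale decisions flow through automatically, unusually consequential ones are parked for explicit human approval, and the human overseer reads periodic summaries rather than the raw decision stream.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Event:
    """A single fast-timescale action proposed by the AI system (e.g. a rerouted shipment)."""
    kind: str          # e.g. "reroute", "reprice", "capacity_shift"
    impact: float      # estimated impact in some shared unit (e.g. expected cost delta)
    detail: str = ""

@dataclass
class ImpedanceMatcher:
    """Aggregates millisecond-scale AI decisions into human-timescale oversight artefacts.

    Humans see periodic summaries and threshold-triggered alerts; only decisions above
    `approval_threshold` are parked in a queue awaiting explicit human sign-off.
    """
    alert_threshold: float = 10_000.0      # impact level that shows up as an alert
    approval_threshold: float = 100_000.0  # impact level that requires human approval
    _events: list[Event] = field(default_factory=list)
    _pending_approvals: list[Event] = field(default_factory=list)

    def observe(self, event: Event) -> bool:
        """Called by the fast AI loop. Returns True if the action may proceed autonomously."""
        self._events.append(event)
        if abs(event.impact) >= self.approval_threshold:
            self._pending_approvals.append(event)  # strategic: wait for a human decision
            return False
        return True                                # routine: proceed at machine speed

    def summary(self) -> dict:
        """Curated, human-timescale view: counts, aggregate impact, and anything alarming."""
        alerts = [e for e in self._events if abs(e.impact) >= self.alert_threshold]
        return {
            "actions_taken": len(self._events),
            "net_impact": sum(e.impact for e in self._events),
            "alerts": [(e.kind, e.impact, e.detail) for e in alerts],
            "awaiting_approval": len(self._pending_approvals),
        }

    def review(self, approve: Callable[[Event], bool]) -> None:
        """A human (or human-delegated) review pass over the strategic queue."""
        self._pending_approvals = [e for e in self._pending_approvals if not approve(e)]

# Illustrative use: the fast loop calls observe() constantly; a human checks summary() daily.
matcher = ImpedanceMatcher()
matcher.observe(Event("reroute", impact=2_500.0, detail="weather delay, port A to port B"))
matcher.observe(Event("capacity_shift", impact=250_000.0, detail="prioritise region X"))
print(matcher.summary())
```

The specific thresholds and units are placeholders; the design point is simply that the human-facing surface runs at a human cadence (summaries, alerts, approvals) while the machine-facing surface keeps machine speed.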
Nicklas Lundblad explains that AI can serve as a "bridge" between computational time and biological time. This is important because many processes - such as justice, democracy, and relationships - lose their meaning or utility when rushed. Down the line, cognitive enhancement for humans could help. But this will be the central 'dynamic' to manage, and it’s another important facet of what ‘alignment’ is fundamentally about. Perhaps a different term is more appropriate, but the goal remains the same: ensuring the systems you design do what you intend, safely.
Michael Levin emphasizes the diversity of intelligence and embodiment, acknowledging that different agents operate on different timescales. Just as biological systems - such as the interplay between fast neuronal firing and slower muscle contractions - achieve coordination across different timescales, we should be able to design AGI systems that translate between human and post-AGI speeds, enabling meaningful collaboration.
What about understanding?
One way I could see this process going wrong is if AGI systems become ever more deeply embedded in decision-making processes across all levels, gradually shifting the human role from active direction to passive approval. Practical questions about sentience, rights, and the status of sophonts could further complicate these dynamics. Even with well-designed oversight mechanisms, the speed and complexity of AI-driven processes might make genuine human understanding and intervention increasingly difficult. This could lead to emergent system-level behaviours that no individual component was explicitly designed to produce, resulting in a gradual erosion of human agency and understanding. As Kulveit et al. detail in their 'Gradual Disempowerment' paper, this erosion could occur incrementally through the progressive replacement of human labour, cognition, and participation across interconnected societal systems.
The "cognitive impedance (mis)match" discussed earlier only worsens as AI systems become more capable. Even with the best interfaces and coordination mechanisms, humans may increasingly be forced to either: (1) trust the AI systems blindly because we can't verify their logic and complex reasoning in real-time; (2) slow everything down to human speed, creating massive inefficiencies; or (3) remove ourselves from more and more decisions. However, whether this scenario leads to a true loss of agency hinges critically on our ability to design and implement effective high-level oversight mechanisms built upon the right abstractions.
Is it inevitable, then, that we lose track of reality and choice?
Maybe not. Perhaps the challenge of understanding is already accounted for by the idea of climbing the ladder of abstraction. In principle, we could continue shifting to higher levels of abstraction in our oversight - just as executives don't need to understand every line of code to run a tech company, and I don’t need to understand how a fire alarm works to install one. We'll focus on what we want the AGIs to achieve, not necessarily how they achieve it (though nothing, apart from time, prevents us from unpacking the how if needed).
Consider a Minister of Health, who, faced with a new flu strain, strategically prioritizes vaccinations for the elderly based on epidemiological models, without needing to grasp the intricate molecular biology of the virus or the logistical complexities of vaccine production. This is a crucial, high-level decision, reliant on abstracted information and expert advice to achieve positive public health outcomes. Similarly, much like we see in other spheres of life, decision-makers can - and often must - make choices based on outcomes without intimate knowledge of all the underlying processes. The same applies to steering superintelligent multi-agent systems.
To illustrate the intuition behind this, imagine a similar line of thinking in a village 2,000 years ago: “In the village, everyone understands how everything works - we know who makes our tools, who grows our food, how decisions are made at the village council. But in these new cities, everything is interconnected in ways no one person can fully grasp. People don't even know who bakes their bread! And with all these written contracts and money changing hands, decisions that affect us all are being made through processes we barely understand and have no time to unpack.”
The nuance here is that, while deep understanding is often useful, its necessity depends partly on our goals and the level of assurance required. Today, I don't know how to build a plough, because there are more effective processes (markets) that can provide me with one. However, I do need to gain experience working in policy to achieve my goals within my current environment. These goals will change and evolve, and so will the types of knowledge or understanding we need and want to internalise.
The key difference is that no single human may be able to understand all the building blocks; but ideally, these should be well-documented, maintained, and scrutinizable if needed - by different kinds of specialized agents. Building a new Library of Alexandria of knowledge seems like a worthwhile endeavour.
Biological systems also demonstrate that effective high-level functioning doesn't necessarily require low-level understanding. My body maintains homeostasis without my conscious mind understanding cellular biology. Similarly, I can effectively use my arms without understanding individual muscle fibre mechanics, or a computer mouse without knowing its hardware intricacies.
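As a toy illustration of what "well-documented, maintained, and scrutinizable" building blocks could look like in practice, the sketch below imagines a shared registry that specialised auditing agents (and, when needed, humans) can query. All of the names here (Component, Registry, explain) are hypothetical, and this is a sketch of the idea rather than a proposal: the point is only that documentation and scrutiny can be made available on demand, at a chosen depth, so that no single mind needs to hold the whole stack at once.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Component:
    """A building block of a larger automated system: documented, owned, and auditable."""
    name: str
    purpose: str                      # plain-language description of what it does
    maintainer: str                   # human team or specialised agent responsible for it
    depends_on: list[str] = field(default_factory=list)
    audit_log: list[tuple[str, str]] = field(default_factory=list)  # (timestamp, finding)

@dataclass
class Registry:
    """A 'Library of Alexandria' of system components: nobody needs to read every entry,
    but every entry can be located, read, and scrutinised when a question arises."""
    components: dict[str, Component] = field(default_factory=dict)

    def register(self, component: Component) -> None:
        self.components[component.name] = component

    def record_audit(self, name: str, finding: str) -> None:
        """Specialised auditing agents append findings; humans can read them later."""
        stamp = datetime.now(timezone.utc).isoformat()
        self.components[name].audit_log.append((stamp, finding))

    def explain(self, name: str, depth: int = 1) -> list[str]:
        """Unpack a component and, to a chosen depth, the things it depends on -
        climbing down the ladder only as far as the question at hand requires."""
        component = self.components[name]
        lines = [f"{component.name}: {component.purpose} (maintained by {component.maintainer})"]
        if depth > 0:
            for dep in component.depends_on:
                if dep in self.components:
                    lines.extend("  " + line for line in self.explain(dep, depth - 1))
        return lines

# Illustrative use: an overseer asks for a one-level explanation without reading any code.
registry = Registry()
registry.register(Component("routing", "plans deliveries", "logistics-agent", ["forecasting"]))
registry.register(Component("forecasting", "predicts demand", "forecast-team"))
registry.record_audit("forecasting", "spot-check passed: errors within agreed bounds")
print("\n".join(registry.explain("routing", depth=1)))
```

Nothing about the particular data structure matters; what matters is that 'scrutinizable if needed' becomes a property of the ecosystem rather than of any individual overseer.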
From understanding to steering
Beyond understanding, humans can also retain agency over governance more generally. Scenarios of gradual disempowerment are compelling when entire systems are outsourced to identical copies of AI agents that don't represent anyone's direct interests. The 'human interest' feedback loop is weak in such a world. Instead, I think that every human should ideally have a personalized agent that learns and represents their evolving values and preferences. These agents, tightly linked to their human principals and acting on their behalf, would create a continuous feedback loop between individuals and large-scale automated systems, preventing system-level value drift.
So you don't necessarily need humans in the loop; you need aggregate human interests in the loop. Provided egregious misalignment is avoided, automated systems should be directly informed by, and in some sense exist downstream of, human preferences - preventing the gradual drift toward disempowerment one can expect with monolithic, unaccountable AI structures. Of course, you still face challenges like trade-offs, aggregation issues, and conflicting values. But these are not insurmountable and ultimately call for updating our democratic machinery (or building something better) to accommodate diverse human interests. The difficult part will be replacing our existing decaying institutions and navigating entrenched human interests.
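A crude sketch of what "aggregate human interests in the loop" might mean is given below. Everything in it is hypothetical and deliberately simplistic (the PreferenceAgent and aggregate names, the static preference dictionaries, a plain average standing in for whatever democratic machinery, or something better, we actually build), but it shows where the feedback loop between individual preferences and system-level decisions would sit.

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class PreferenceAgent:
    """A personal agent representing one person's (evolving) preferences.
    In practice these would be learned and continuously updated; here they are
    just a static dictionary mapping policy options to scores in [-1, 1]."""
    principal: str
    preferences: dict[str, float]

    def evaluate(self, option: str) -> float:
        # Unknown options get a neutral score rather than silently vanishing.
        return self.preferences.get(option, 0.0)

def aggregate(agents: list[PreferenceAgent], options: list[str]) -> dict[str, float]:
    """Combine individual evaluations into an aggregate signal for the automated system.
    A plain average is used only for illustration - the aggregation rule itself
    (voting, markets, something better) is exactly the design choice at stake."""
    totals: dict[str, float] = defaultdict(float)
    for option in options:
        for agent in agents:
            totals[option] += agent.evaluate(option)
    return {option: totals[option] / len(agents) for option in options}

# Illustrative use: the automated planner consults aggregated interests, not raw humans.
agents = [
    PreferenceAgent("ada", {"expand_transit": 0.9, "expand_roads": -0.2}),
    PreferenceAgent("bo",  {"expand_transit": 0.4, "expand_roads": 0.6}),
    PreferenceAgent("cai", {"expand_transit": 0.7}),
]
scores = aggregate(agents, ["expand_transit", "expand_roads"])
print(max(scores, key=scores.get), scores)
```

The interesting questions live precisely in the parts this sketch waves away: how preferences are learned and kept current, how conflicts and trade-offs are resolved, and how the aggregation rule itself remains accountable.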
And just as complexity necessitated new forms of social organization and market mechanisms, the speed and scale of ASI-driven processes will likely require automating significant parts of governance itself. Designing these automated governance systems - ensuring they operate effectively and adaptably, and remain aligned with human values and strategic direction - becomes a crucial meta-level challenge for maintaining agency in this future.
This is not necessarily an erosion of agency, but a shift in how and where it is exercised. The real challenge isn't maintaining low-level understanding, but rather designing the right abstractions that capture what we truly care about and ensuring these abstractions remain responsive to evolving human values while preserving meaningful oversight as systems grow increasingly complex. It’s not easy to pre-design or pre-specify these, and I think working all this out will require significant human input for longer than is sometimes assumed.
Amara’s Law: “We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.”
Overused, but useful.
---
Thanks to the following people for comments: Shane Legg, Ben Lepine, Nick Swanson, Harry Law, Pegah Maham, Zhengdong Wang, Conor Griffin, Tim Genewein, Herbie Bradley, Samuel Albanie, David Wolinsky, and Luke Drago. It goes without saying that they don’t necessarily endorse my ramblings.
"So while capabilities may arrive relatively soon, I expect a lag before we see widespread transformative impact from them. Intuitively, it might seem that AGI deployment would be relatively quick - we are already using pre-AGI systems, and so we have existing infrastructure to draw from. However, this underestimates real-world frictions, such as the slow grind of human and corporate inefficiency and the difficulty in achieving product-market fit for truly useful applications."