“Public services” include everything from teachers to the trash, from roadwork to permission for a tree house. Much seems routine, but plenty is at stake. This makes politicians hesitant to risk an overhaul, leaving the system creaking and the paperwork mounting.
Last October, a provocative proposal emerged. The Agentic State conjured a vision of officialdom transformed, replacing outdated procedures with a new system of AI helpers. This fledgling project offers both a blueprint and a promise of assistance to governments around the world.
But what if the vision were blind to how this could go awry? Simone Maria Parazzoli, a co-author of the paper, and Omer Bilgin of deliberAIde decided to critique their own ideas, seeking pitfalls in hopes of averting them.
—Tom Rachman, AI Policy Perspectives
By Simone Maria Parazzoli & Omer Bilgin
Amid the exhaustion of caring for a baby, new parents must deal with everything from bewildering sobs, to erratic feeding times, to the joys of changing a soiled newborn at 3 a.m. The last thing they need is paperwork.
But what if, when coming home from the maternity ward that first day, they could awaken a government AI voice assistant, tell it the happy news, and hear the following response? “Congratulations! What’s the baby called?” The app would then take care of all the dreary admin, coordinating across agencies, registering the child, and setting in motion the services that this tiny new citizen should enjoy.
That is one example of how a future “agentic state” could simplify, speed up, and improve citizens’ interactions with public services. To be clear, this does not yet exist. But projects like this one, envisioned by Ukrainian officials, are more than fantasy, with several countries avidly testing early versions of agentic AI systems.
While Ukraine works toward the baby example, Britain is piloting agent-based support to provide citizens with more tailored help. Meanwhile, Singapore is developing governance frameworks for agentic AI, and governments from France to the United States are ensuring that their public data can be accessed by agents.
Agentic AI systems—capable of perceiving, reasoning, and acting with minimal human supervision—will transform what organizations can achieve. By combining the reasoning of large language models with retrieval, memory, and tool use, agentic AI can automate complex tasks. For governments, whose core work is high-volume, structured administrative processes, this could make services more efficient, timely, consistent, and fair, while lowering costs.
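The combination described above, an LLM planner coordinating retrieval, memory, and tool use, can be sketched in a few lines. This is a minimal illustration only: the planner here is a hard-coded stub standing in for an LLM call, and every tool and registry name is hypothetical, not any real government system's API.

```python
# Minimal sketch of an agentic loop: a planner decomposes a goal into tool
# calls, and each result is stored in memory as context for later steps.
# All names are illustrative; the planner is a stub standing in for an LLM.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    tools: dict[str, Callable[[str], str]]
    memory: list[str] = field(default_factory=list)

    def plan(self, goal: str) -> list[tuple[str, str]]:
        # Stub planner: a real agent would ask an LLM to decompose the goal.
        steps = {
            "register_newborn": [
                ("lookup_registry", "parents"),
                ("file_form", "birth_certificate"),
                ("notify_agency", "health_service"),
            ]
        }
        return steps.get(goal, [])

    def run(self, goal: str) -> list[str]:
        for tool_name, arg in self.plan(goal):
            result = self.tools[tool_name](arg)
            self.memory.append(result)  # results become context for later steps
        return self.memory

# Hypothetical tools; in practice these would be authenticated API calls.
tools = {
    "lookup_registry": lambda q: f"registry record for {q}",
    "file_form": lambda f: f"filed {f}",
    "notify_agency": lambda a: f"notified {a}",
}
log = Agent(tools).run("register_newborn")
```

The point of the sketch is the shape of the loop, not the stubbed steps: autonomy comes from the planner choosing which tools to invoke, while the memory of prior results is what lets a multistep administrative task hang together.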
Consider a citizen looking to start a small business. An agentic system—instead of requiring the entrepreneur to individually navigate zoning boards, tax authorities, and regulations—could autonomously reconcile these requirements. The larger promise is a shift from just doing things right (optimizing for procedure-following) to doing the right things (pursuing outcomes that citizens truly want).
The Agentic State vision paper—supported by The World Bank and the Global Government Technology Centre Berlin—was the first effort to systematically map the opportunities of agentic AI adoption for governments. This was not an academic exercise: 21 leaders across 15 countries contributed, including ministers and chief technology officers preparing to lead this transition.
In this vision, AI agents are a means to manage complexity and scale, while humans develop strategy, exercise judgment, and hold accountability.
Several governments have integrated official chatbots into their services, but most of these merely provide conversational guides to administrative procedures. A few pioneering countries are starting to move beyond that. Ukraine, for instance, is turning chatbots into agentic assistants. Specifically, its Diia.AI assistant can retrieve users' data from connected registries and generate official documents such as income certificates, while also providing certified information based on records such as taxation, land registries, and pensions.
The United Kingdom is also exploring agentic interactions via GOV.UK Chat (inspired by Diia.AI), including a pilot program to support job seekers that transforms a static digital portal into an active assistant, matching users’ skills with available opportunities.
Yet trends and optimism are not enough for success. The agentic state vision rests on key assumptions. What if they’re wrong?
This article presents a “red-teaming” exercise—a stress test of this vision—that identifies six core assumptions, along with scenarios that could emerge if they don’t hold true, and guardrails to avert such failures.
Assumption 1: AI Agents Become More Capable and Reliable
Agents can already perform rudimentary planning, tool use (e.g., searching the internet, using calculators, sending emails), and multistep task execution. Frontier labs are betting heavily on agents, making it plausible that systems capable of managing complex and large-scale administrative tasks will emerge soon.
Failure Scenario: The Technology Falters
Governments reorganize around agentic execution, but systems never become reliable enough for public administration. The demos look strong, but real cases fail on edge conditions, and require constant human correction. The agentic layer becomes only superficially competent with layers of human intervention underneath.
Guardrail: Start Cautiously
Governments should start with minimal deployments and tightly scoped use cases to validate reliability, develop procedural rigor and organizational competence, and account for technological evolution, rather than committing prematurely to large-scale redesigns.
Assumption 2: Agents Can Work Together
The success of agentic systems demands that they’re able to interact seamlessly, conveying intent, carrying out tasks, and sharing data in an interoperable way. MCP (model context protocol) is emerging as the technological standard for connecting AI applications with external systems.
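To make the interoperability stakes concrete, the sketch below shows the kind of self-describing tool descriptor an MCP-style server exposes, so that any compliant agent can discover what a tool does and how to call it. The field names follow the shape of MCP tool listings (`name`, `description`, `inputSchema`); the income-certificate tool itself, and the minimal validator, are hypothetical illustrations.

```python
# Illustrative MCP-style tool descriptor: a machine-readable contract that
# lets agents from different vendors discover and call the same capability.
# The tool and its fields are hypothetical examples, not a real government API.
tool_descriptor = {
    "name": "issue_income_certificate",
    "description": "Generate an official income certificate for a citizen.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "citizen_id": {"type": "string"},
            "tax_year": {"type": "integer"},
        },
        "required": ["citizen_id", "tax_year"],
    },
}

def validate_call(descriptor: dict, arguments: dict) -> bool:
    """Check that a call supplies every required argument; a minimal
    stand-in for full JSON Schema validation."""
    required = descriptor["inputSchema"].get("required", [])
    return all(key in arguments for key in required)

ok = validate_call(tool_descriptor, {"citizen_id": "UA-123", "tax_year": 2024})
```

Because the descriptor, not the vendor, defines the interface, an agent built by one department can invoke a tool operated by another. That is precisely what fails in the scenario below if competing protocols fragment the ecosystem.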
Failure Scenario: Standards Fail to Converge
Commercial interests diverge, establishing competing protocols, while government departments end up using AI systems that cannot communicate with one another. When a citizen’s request requires action from multiple agencies, the process breaks down.
Guardrail: Officials Insist on Shared Protocols
Governments should make interoperability a condition of adoption, participating in the cross-sectoral bodies and forums where these standards are being shaped, funding the development of shared agentic interfaces and other agent-specific standards, and mandating non-proprietary protocols in procurement. Standards rarely emerge by accident, but they may emerge when powerful governments treat them as a priority.
Assumption 3: Organizations Will Adapt
To adopt and employ agents effectively, organizations must rethink their processes, roles, and incentives. They need to flexibly change and dynamically adapt practices to keep pace with the changing technological landscape.
Failure Scenario: The Status Quo Prevents Change
Agentic AI adoption outpaces organizational change, with citizens and civil servants using agents in an uncoordinated manner long before official programs catch up. Local practices harden into path dependence before common standards emerge. The state becomes more productive at producing bureaucracy, not societally beneficial outcomes.
Guardrail: Redesign Processes Before Automating Them
Agents should only enter workflows that have been simplified, decomposed, and restructured to minimize approval layers and handovers. Governments must treat adoption as a continuous discovery process. They should invest in common evaluation templates, reusable components, and a cross-agency repository of lessons, so that what works in one place can travel before what does not work becomes entrenched.
Assumption 4: Private Adoption of Agentic AI Will Be Rapid
Many companies are betting on an agentic future. Firms are experimenting with internal copilots and autonomous customer flows, while frontier AI companies advance core models, architectures, and capabilities, and cloud providers offer the compute needed to deploy agents at scale. This suggests that agents will become commonplace across business, consumer, and enterprise environments, allowing governments to build on tools, infrastructure, and behaviors already spreading across the economy. This assumption rests on projections, though evidence remains ambiguous.
Failure Scenario: Diffusion Is Slower Than Forecast
Governments invest as if an agent-saturated economy is imminent, but industry adoption remains narrow, experimental, or ends up costing more than it saves. Public investments don’t plug into widely used tools and practices, meaning that citizens find agentic interfaces in government before they’re normal elsewhere. The state ends up bearing political and institutional costs without the stabilizing effects of private-sector diffusion.
Guardrail: Lower Barriers to Private-Sector Agentic Usage
Governments can accelerate the development of an agentic AI ecosystem by investing in shared agentic infrastructure—such as standard ways to access public data, communicate across systems, and carry out authorized tasks and payments—that lower integration costs for firms, and reduce the risk of differing technological maturity across sectors.
Assumption 5: Citizens Will Prefer Agentic Services
Increasingly, citizens are interacting with and relying on AI tools, but many do not trust them. For governments to integrate AI agents into workflows and services, citizens must accept and support the roles that agentic systems can play, finding them sufficiently trustworthy, reliable, fair, convenient, and accountable.
Failure Scenario: The Public Rejects Automation
A single notable failure, or an accumulation of failures, turns the public against agentic systems and convinces many to opt out. Citizens judge automated decisions as opaque, illegitimate, and untrustworthy, and suspect that automation worsens inequality, with privileged citizens able to employ highly capable personal agents to navigate bureaucracy better than those relying on basic tools. The government is forced to run two systems, agentic and human, and neither meets expectations.
Guardrail: Mandate Transparency
Governments must make agent integrations into government processes as legible as possible, furnishing explanations of decisions and publishing evaluation results for agentic fairness and performance, while detecting patterns of systemic bias or unequal benefit distribution based on citizens’ technological access.
Assumption 6: Human Oversight Will Evolve
For AI agents to act with functional autonomy within government processes, oversight frameworks must adapt, moving away from mandatory human reviews and approvals for everything (human-in-the-loop) to intermittent oversight (human-on-the-loop). This evolution increases speed and efficiency while reducing bottlenecks, with humans intervening only on edge cases. There is precedent for such adaptation: governments adapted regulation to cloud computing, e-identities, and AI-driven decision support systems.
Failure Scenario: Regulation Never Updates
Every agentic action requires human verification; every decision must be justified through mechanisms designed for old chains of accountability. Agents can draft, but cannot act. Compliance and procedural costs rise as institutions retrofit old controls onto new AI processes. The result is high bureaucracy and low autonomy: an agentic state in theory, a copilot state in practice.
Guardrail: Sandboxes to Test Oversight
Governments should establish controlled environments that allow policymakers, developers, and civil society to collaborate and gather empirical evidence on what forms of oversight are adequate and best fit different kinds of agentic deployments, reducing uncertainty before codifying rules at scale. They should explore this early, much as Singapore has done through its Model AI Governance Framework for Agentic AI.
Soon, agentic government will be more than optimism and testing. A vanguard of countries will implement these tools. If those cases produce the kinds of benefits imagined, other countries will flock to join them.
But momentum is not inevitability. This project depends on assumptions—about progress, coordination, institutions, norms, and law—that demand scrutiny before governments rebuild themselves around these new technologies.
This red-teaming exercise of the agentic state concept is not to argue against the vision, but to make it more robust and resilient. The six possible failure scenarios are not mutually exclusive. Several could compound, and some may already be taking shape. For instance, reliability has been improving much more slowly than accuracy, providing ground for technology to falter (Scenario 1), and there are signals that the public might reject automation if economic gains and innovation speed are prioritized over fairness (Scenario 5).
Governments that are serious about improving the state with AI must attend to these risks in earnest now, while the architecture is still being laid. The opportunity is too precious to spurn.
Agentic AI could make public services considerably faster, fairer, and more responsive—more so than anything the traditional bureaucratic model has yet delivered. That prize is worth the discipline of preparing for what could go wrong.
For further details on “The Agentic State,” check out the original vision paper.