Vint Cerf, an American computer scientist, is widely regarded as one of the founders of the Internet. Since October 2005, he has served as Vice President and Chief Internet Evangelist at Google. Recently, he sat down with Google DeepMind’s Public Policy Director Nicklas Lundblad for a conversation on AI, its relationship with the Internet, and how both may evolve. The interview took place with Vint in his office in Reston, Virginia, and Nicklas in the mountains of northern Sweden. Behind Vint was an image of the Interplanetary Internet system - a fitting backdrop that soon found its way into the discussion.
The transcript below has been lightly edited for brevity and clarity.
I. The relationship between the Internet and AI
Interviewer: Nicklas Berild Lundblad (NBL)
Interviewee: Vint Cerf (VC)
NBL:
Could we have witnessed the kind of evolution in AI that we've seen over the past decade if the Internet had not come first?
VC:
I think the answer is “yes.” The evolution of machine learning would have happened whether we had an Internet or not. The basic theoretical progress was already in hand quite a while ago, and the basic technology for multilayer machine learning is not particularly dependent on the Internet. However, the Internet was helpful because it supports collaboration and enables the discovery of content. One could argue that things like Fei-Fei Li's ImageNet would not have happened without the Internet and its vast realm of content.
AI is also changing the demands on the Internet’s infrastructure. We are encountering an interesting challenge here at Google: growing the scale at which we can do training and inferencing beyond a single data centre. We would like to be able to invest resources in more than one data centre and to couple them, but this introduces an interesting problem - the lag time between the data centres.
Ironically, the picture behind me is of the Interplanetary Internet system. This system suffers from potentially large round-trip time delays, and we have to figure out how to accommodate that. I say ‘accommodate’ rather than ‘overcome’ because there's nothing we can do about it - we're limited by the speed of light. We're unhappy about it, but that's the physics. We have a similar problem with AI: speed-of-light delay affects the latency between data centres, and we have to figure out how to accommodate multiple TPUs or GPUs working concurrently in more than one data centre while exchanging the data needed for training.
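As a rough illustration of the constraint Vint describes - the distances and fibre properties below are assumptions for the example, not figures from the conversation - the floor on round-trip time is set by the speed of light in the medium:

```python
# Rough, illustrative estimate of the light-speed floor on latency.
# Distances below are hypothetical examples, not figures from the interview.
C_VACUUM_KM_S = 299_792   # speed of light in vacuum, km/s
FIBRE_INDEX = 1.47        # typical refractive index of optical fibre
C_FIBRE_KM_S = C_VACUUM_KM_S / FIBRE_INDEX

def fibre_rtt_ms(distance_km: float) -> float:
    """Best-case round-trip time over terrestrial fibre, ignoring routing and queuing."""
    return 2 * distance_km / C_FIBRE_KM_S * 1_000

print(f"Same metro area   (~50 km):   {fibre_rtt_ms(50):8.2f} ms minimum RTT")
print(f"Cross-continent (~4,000 km):  {fibre_rtt_ms(4_000):8.2f} ms minimum RTT")

# For the interplanetary case the signal travels through vacuum; even at
# Mars's closest approach (~54.6 million km) the floor is minutes, not milliseconds.
mars_rtt_minutes = 2 * 54_600_000 / C_VACUUM_KM_S / 60
print(f"Earth-Mars, closest approach: {mars_rtt_minutes:8.1f} minutes minimum RTT")
```

No amount of engineering removes that floor; systems spanning multiple data centres can only hide it, for example by overlapping communication with computation.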
The short answer is that I think we would have gotten here anyway. Where we are right now is quite exciting, but also daunting, especially when it comes to ensuring reliable and stable outcomes from large language models.
II. Hallucinations, understanding and world models
NBL:
When you say ‘daunting’, are you referring to hallucinations and the probabilistic nature of modern deep learning models?
VC:
Yes. When we do machine learning in a statistically stable environment, I think we have pretty high confidence in the models we have built. An example is the controllable cooling of data centres, which relies on solid data about power consumption and about how to set the valves and pump speeds. That's a very regular process that we can measure accurately, and our objective function - to reduce power requirements - is very clear.
The same argument could be made for weather forecasts, where we ingest 40 years of weather information at very high resolution, then take 48 hours of recent data and project the next several days. This also means that we can do it with much less compute power than would be required to solve the gigantic Navier-Stokes equations used in some more traditional approaches to weather forecasting. The fact that AI works well in this area is a little head-scratching, but it suggests there is a statistical rationale behind the way weather works - and indeed, there is: physics. It’s complex, nonlinear physics, but physics all the same. Machine learning models are well-suited to these nonlinear processes.
My bigger worry is with large language models that can behave in ways that we are still not able to fully predict or control. There are still surprises when we use these models. Attempts to constrain their behaviour often fail because people find ways to jailbreak them. When a model produces counterfactual outputs, we don't always understand why, how to keep it from doing that, or even how to detect that it has happened.
My sense is that there is a major gap to be filled with these models that involves developing a deeper “understanding”. I use that word very carefully - it's an anthropomorphic term - but I’m talking about an AI system having its own model of how the real world works. There are arguments, of course, in the AI community about this. Some people say the models have access to so many small examples that the aggregate of these examples emulates behaviour that looks like the AI system has a real-world model - except it really doesn't. There is no abstraction layer that we can induce from all these concrete examples. Humans, on the other hand, are very good at this - they look at phenomena and attempt to induce general principles to apply to other situations beyond the ones that led to the concrete induction in the first place.
Based on my limited knowledge of AI, I don't believe that the AI community has developed systems that are well-equipped with real-world models from which they can generalise. We see human beings do this generalisation all the time, with a small number of examples. This is the few-shot argument. A kid understands what a table is very quickly. A two-year-old knows that if they stick a thing with a flat bottom on a table, it won't roll off. They implicitly know that the table is holding things up, against gravity, and they very quickly figure out that a chair could be a table, or a box could be a table, or your lap could be a table. It's not clear to me that the AI models have that same capacity to generalise from a few simple examples.
III. Density & connectivity in human vs silicon brains
NBL:
Do we get overly fixated on human intelligence as the template for AI? We look at human intelligence and say: "We want to design systems that are like us and can do these things." There's almost a soft narcissism at play here. Instead, we could say: “I want to create a system of hierarchical models that can select the right model for the right task,” with the intelligence distributed across all of these models - similar to the Mixture of Experts architecture that LLMs use?
VC:
"Like us" is hubris. If you look at what we now know about the connectivity of the brain’s neural structure - not just the human brain, but any brain - it is unbelievably more complex than anything we've ever been able to do with multilayer AI models. The connectivity is the killer. Some of the brain’s neurons have 10,000 interfaces with dendrites of other neural cells. With AI, we don't have anything like that.
But one thing I'm quite intrigued by is the effort underway to understand what's going on inside a multilayer neural network model when you trigger something. When you give it a prompt, which portions of the multiple layers are activated? There's a term for this - Sparse Autoencoding. What I don't know is whether anyone has tried to robustly argue what this multilayer thing actually knows, or how to characterise what it knows.
Some people say that all a model knows is a lot of specific examples, and it's not clear that it has developed any generalised model. On the other hand, think about what happens when we ingest a huge amount of text and then ask ourselves what words are likely to follow other words. That's what goes on in most large language models. A model that can say which words are likely to follow other words has ingested meaning of some kind, but it's not clear how well-organised that is. There is something - there is a ghost in the machine - that knows something. It has taken all that text which meant something, and compressed it into a statistical model, which causes it to generate output that makes it look like it knows something - which, in a way, it does. But exactly what it knows, and how it knows it, and what it can do with that knowledge is still terra incognita in many ways.
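For readers unfamiliar with the sparse autoencoding Vint mentions, here is a minimal, generic sketch of the idea - activations captured from one layer of a model are re-encoded through a much wider, sparsity-penalised layer so that individual features become easier to inspect. It assumes PyTorch, and all dimensions and hyperparameters are illustrative rather than drawn from any production system.

```python
# Minimal sketch of a sparse autoencoder (SAE) for probing model activations.
# Dimensions and hyperparameters are illustrative only.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)   # overcomplete: d_hidden >> d_model
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # sparse feature activations
        reconstruction = self.decoder(features)
        return reconstruction, features

# Train to reconstruct the activations while keeping features sparse (L1 penalty).
sae = SparseAutoencoder(d_model=512, d_hidden=4096)
optimizer = torch.optim.Adam(sae.parameters(), lr=1e-4)
l1_coeff = 1e-3

activations = torch.randn(64, 512)  # stand-in for activations captured from a model layer
for _ in range(100):
    recon, features = sae(activations)
    loss = ((recon - activations) ** 2).mean() + l1_coeff * features.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The hope, on this approach, is that the learned sparse features correspond to human-interpretable concepts - which speaks directly to the question of what the multilayer model "knows".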
NBL:
One thing that gives me pause is that we sometimes talk about the AI model as “the thing”. But clusters of AI models can also interact together in different ways. This brings us back to your first observation about building data centres. Would a distributed network of AI models orchestrated by a ‘meta model’ result in lag times, like we see with the human brain? Everything we experience as humans takes place a couple of milliseconds after it really happens, because the brain is sorting all these signals into a coherent picture. If we build distributed networks of AI models on top of distributed infrastructure, what limits might this impose?
VC:
This is very interesting because if you look at the biological brain, you discover that it doesn't consume all that much energy compared to our silicon brains - which is quite astonishing. And yet, it can do things that silicon brains don't do. It also does things very quickly.
Connectivity must be part of this story - the biological brain's connectivity is vastly greater than the silicon brain's. The fact that it's relatively slow is also mildly astonishing when you think about signal propagation: these are chemical interactions, molecules crossing the synapse from one biological neuron to the next to trigger propagation of a neural signal.
The fact that the brain is relatively compact is also quite striking. Think about how dense the brain is compared to the density of a data centre, which - when you walk into it - looks unbelievably dense. You look at all the optical fibre cables connecting everything together, racks of equipment all jammed together, and you say, "How does anybody even keep track of what's connected to what?" Of course, the answer is sometimes they don't - and it doesn't work the way it's supposed to. But the brain's compactness allows for a relatively low propagation delay, compared to the propagation delays that we might experience in the data centre.
NBL:
If you want to create a sense of "now" in silicon, the conditions under which you can do that are more limited?
VC:
Yes. Ivan Sutherland, one of the former heads of DARPA’s Information Processing Techniques Office, who teaches at Portland State University, has pointed out that the density of computer chips today is so high that the propagation delay across the chip is a problem. You end up running multiple clocks on the same chip to keep things in sync, but that's isochronous, not synchronous: the clocks are running independently, and you're just hoping they stay in sync. But you get race conditions and other problems. So he designs asynchronous circuits, with no clocks at all, to avoid this problem.
So even latency across the chip is turning out to be a problem. In the plane, the density of a chip is higher than that of the brain; the actual units are smaller in many ways than the features of a human neural cell. It's just that the neural cells are unbelievably densely packed and connected. I don't think we have a chance of ever achieving that kind of density and connectivity in silicon - I just don't have a clear sense of how that would possibly be achievable. We do have multidimensional chip sets, we have 3D kinds of connectivity, but the number of inputs and outputs for any given transistor is probably modest - we're talking about two digits of connectivity at most, I think, as opposed to four or five digits of connectivity.
IV. On quantum & consciousness
NBL:
One view of these thresholds and barriers is that we will eventually overcome them, because evolution managed to do so. Sir Roger Penrose, for example, has suggested that the way evolution solved this was by exploiting quantum effects, and that quantum effects could be the foundation of consciousness. Following this thread, one might argue that unless we solve quantum computing, we won’t be able to build true artificial intelligence?
VC:
Well, you’d need to talk to Hartmut Neven because he's running our quantum program and he has some real experiments to try to show whether quantum entanglement is involved in brain function. He's doing real tests, so we will see what he gets out of those experiments and how they get interpreted.
I remain a little sceptical about this, and part of the reason is that, as far as we are currently aware, quantum computing has limits in terms of the kinds of problems it can address. It's not general-purpose computation - it may not even be Turing complete. If that's the case, then it may be usable for only certain kinds of computations. One example is the famous Shor's algorithm for cracking codes that depend on the difficulty of factoring the products of large primes. Optimisation is another example. But you wouldn't use it for balancing your checkbook.
I also remain a little sceptical of the speculation that consciousness and quantum effects are tied together, except for the fact that quantum effects are real. Quantum effects are physics. Our brains are physical objects made out of molecules and atoms, and evolution seems to have used quantum effects in some cases - like in bird navigation or how we detect different smells. There’s a field of quantum biology which suggests that, because evolution didn't have to care about what theories we had about nature, it just needed to find something that gave us a small edge over time. Another good example is photosynthesis. There’s a very strong argument that photosynthesis is a very efficient use of quantum effects, and much more efficient than any other energy conversion that anybody has ever come up with. So it's not outrageous to think that quantum effects have significance in brain function. But until we have some solid evidence, it's still speculation.
V. Adapting Internet protocols for AI agents
NBL:
I want to come back to the Internet. How do you think the Internet has to change to accommodate AI? What large-scale changes will we have to think through?
VC:
I don't think the Internet itself necessarily needs to change. Google has already made some interesting contributions to the protocol suite. QUIC, for example - an alternative to TCP - introduces improved efficiency, particularly for web-based applications. It was all designed around the needs of HTTPS, basically. And it's been a very effective evolution.
I think that quantum processors and AI machine learning processors are just more computational elements in the Internet, and moving data back and forth between them will work just fine with the protocols that we have available. So I don't see an evolutionary requirement on the Internet side.
What I do see, however, is a need for some standardisation at higher levels in the architecture. As an example, to get two AI systems - two AI agents - to coordinate on a task, they need to have some common understanding of the task. They have to have a shared vocabulary. We need to standardise the vocabulary to minimise misunderstandings.
Humans have the ability to use natural language to misunderstand each other. We would like to reduce the level of misunderstanding between AI-based agents, because they have the ability to propagate errors rapidly and very explosively. When you say, "Gee, it would be nice to have a new car," the next thing you know, your agent might have bought a Ferrari for you - because it knows something about your peculiar interests, even though you never authorised it to spend a quarter of a million dollars on a car.
You can imagine other crazy things that might happen if you didn't have proper constraints - with travel, you want to take a vacation, so an AI agent arranges an around-the-world trip for a hundred thousand dollars - when maybe you weren't planning on that. We need some standardisation for inter-agent interaction, in my opinion, which we don't have right now.
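As a purely hypothetical illustration of the kind of inter-agent standardisation Vint is describing - a shared, machine-checkable vocabulary for a task, with explicit authorisation limits so an agent cannot commit to a Ferrari on a whim - the sketch below invents a message schema; every field name and value is an assumption for the example, not an existing standard.

```python
# Hypothetical inter-agent task message with explicit spending constraints.
# The schema and field names are invented for illustration; no such standard exists.
from dataclasses import dataclass

@dataclass
class TaskRequest:
    task: str                      # shared-vocabulary verb, e.g. "purchase.vehicle"
    parameters: dict               # task-specific details both agents understand
    max_spend_usd: float           # hard ceiling the receiving agent must honour
    requires_human_approval: bool  # escalate before committing funds

def may_proceed(request: TaskRequest, proposed_cost_usd: float) -> bool:
    """Receiving agent's check before acting: stay within the authorised budget."""
    if proposed_cost_usd > request.max_spend_usd:
        return False
    if request.requires_human_approval:
        return False  # hand back to the human rather than act autonomously
    return True

request = TaskRequest(
    task="purchase.vehicle",
    parameters={"body_style": "sedan", "condition": "used"},
    max_spend_usd=30_000,
    requires_human_approval=True,
)
print(may_proceed(request, proposed_cost_usd=250_000))  # False: no Ferrari today
```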
NBL:
This raises a question I have been noodling on a little bit. If the Internet's basic structure stays the same, but AI agents start to use it more and more, then the way websites are built will have to change to accommodate AI agents. Agents will care more about what a resource does than how it looks. So the more visual aspects of the web - like HTML websites designed for humans - may become less important?
VC:
Well, you should look at a paper that Bob Kahn and I wrote in 1988 called "Digital Libraries: The World of Knowbots." We contemplated a language that we might today call Knowbotic. And knowledge robots would use this language to interact with each other in a standardised way.
In that particular case, we imagined standards for representing digital objects. And these digital objects looked kind of like viruses: a virus has RNA or DNA at its core - that's the content - and it's surrounded by proteins. I imagined that a digital object had content at its core, and was surrounded by software that allowed you to interact with the content. For example, if an object were music, you could say, "Please show me your written or printed music format" or "Please play this as the London Symphony Orchestra would" or "Give me a high-resolution rendering" or "Play in the style of something else." And then there were access control questions: who had access to the content and what terms and conditions could you negotiate with this digital object because of the software that surrounded it?
So I'm imagining that agents would need a similar kind of capability to talk to each other and to move around. We're doing this already to some extent - when a browser ingests a piece of HTML, it is hosting a piece of software that interprets it. We imagined it a little differently. We thought about knowledge robots moving around on their own volition to different parts of the net, landing on a ‘concierge’ - a hotel for 'bots’ - and discovering what services were available, before executing a task, and then bringing the results back.
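A hypothetical sketch of the digital-object idea Vint describes - content at the core, wrapped in software that mediates rendering and access. The class name, methods, and licence check are invented for illustration and are not drawn from the 1988 paper.

```python
# Hypothetical digital object: content wrapped in software that controls
# how it is rendered and who may access it. Names are illustrative only.
class MusicObject:
    def __init__(self, score_data: bytes, audio_data: bytes, licensees: set[str]):
        self._score = score_data        # the content at the core
        self._audio = audio_data
        self._licensees = licensees     # access-control terms travel with the object

    def _authorise(self, requester: str) -> None:
        if requester not in self._licensees:
            raise PermissionError(f"{requester} has not negotiated access terms")

    def printed_score(self, requester: str) -> bytes:
        """'Please show me your written or printed music format.'"""
        self._authorise(requester)
        return self._score

    def render_audio(self, requester: str, style: str = "symphonic") -> bytes:
        """'Play this as a symphony orchestra would' - rendering logic omitted."""
        self._authorise(requester)
        return self._audio  # a real object would re-render according to `style`
```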
NBL:
What we're doing now is implementing ‘computer use’, which essentially simulates how a human would interact with a particular user interface. But it doesn't take a lot of imagination to realise that there have to be tons of inefficiencies in interacting with the resource that way. Over time, do you think the World Wide Web may fade away?
VC:
Well, it may not fade away, but one thing I can say is that our normal mouse and keyboard interfaces don't serve people with disabilities well. But imagine being able to use dialogue to execute something, instead of trying to roam around in a web page that you can't see. So I'm very excited about the idea of AI agents being able to interact with humans in lieu of more traditional screen readers interacting with web pages. It's consistent with what you're saying. The user interface is going to be a really important, integral part of AI innovation.
VI. Final reflections
NBL:
Is there something that you think that we, as human beings, should prepare for as we integrate this technology into our lives?
VC:
Yes. We need to be very, very aware of the fact that these tools are imperfect and that surprises may occur. Critical thinking is our friend, and building in mechanisms to account for the fact that there can be mistakes is really important. Right now, this is true for all software, not just for AI. We are increasingly dependent on software running more or less according to expectation. And of course, we all know that software often has bugs, and the result is unexpected outcomes. We need to be aware of the fact that we're dealing with relatively imperfect engines here.
I don't think people are fully sensitised to the fact that things won't always work. Many of them will say "Well, it won't happen to me" - until it does. As engineers, we should be much more conscious of the fact that people are going to be increasingly reliant on our software - including AI - to do things that they rely on every day.
NBL:
We have to get better at designing for failure - a bit like airplane engineers.
VC:
I think we also need to accept some responsibility and accountability, which we don't have now. We are not held to account in the same way that a professional civil engineer is. When a bridge collapses, the civil engineer who approved the design is held accountable. Meanwhile, we have unaccountable programmers everywhere. I used to be one of them.