This essay is written in a personal capacity by Seb Krier, who works on Policy Development & Strategy at Google DeepMind. We encourage readers to engage with it as an exploration of an idea rather than as a set of strongly held beliefs.
Just as we don't merely 'control' humans and complex human institutions in any simple, top-down way, but rather steer them through a variety of mechanisms (laws, norms, incentives, relationships), so too might we need to approach advanced AI systems with a similar multifaceted strategy. This approach combines hard constraints, management skills, soft influence, and adaptive feedback loops. In the long run, doing so effectively will require us to move beyond simplistic 'us vs. them' or 'master vs. servant' dichotomies, and instead grapple with the complex realities of coexistence and co-evolution with artificial minds. At the same time, we must approach this evolution judiciously, recognizing that naive or uncritical collaboration is as dangerous as outright antagonism; our shared norms and incentives should aim to cultivate a symbiotic relationship that enhances human flourishing while ensuring AIs remain fundamentally aligned with human interests, even as they grow in autonomy and capability.
Some assumptions:
AI capabilities will continue growing over time, to the point where you have virtuoso/expert models capable of most tasks humans can do and more. The specific type or architecture of model is not relevant for this philosophical discussion. Neither are timelines: this may be soon, this may be far into the future.
Agents will be an important form factor. People will likely want AIs they can outsource tasks, thinking, responsibilities, and actions to. Some AI systems might be narrow tools, but others will take on more relatable, human-like forms because this will be desirable.
These agents will coordinate, act, and influence the world we live in, and ourselves. There’s a bidirectional feedback loop, and a growing codependency. These agents will have drives and personalities, beyond simply acting according to our intentions, precisely because these are useful and desirable attributes to have.
We will gradually outsource many important tasks, systems, and responsibilities to these agents because it will be economically efficient and socially desirable to do so. In turn, we will interact with, manage, and be advised by these same agents and systems. Over time, important economic functions will shift to them, just as many previously physical tasks are now undertaken by machines.1
You don’t need to hold any beliefs about sentience or consciousness or whether these systems ‘are ackchyually reasoning’ or whatever for the following to apply. These are of course highly relevant, and I briefly consider this towards the end of the essay, but for the purposes of this piece I’m setting that aside. The idea is that even ignoring these thorny problems, we should think about human-AI culture and interactions more creatively.
Given the above, how do you ensure that the arc of the future bends towards positive outcomes and flourishing, as opposed to dystopian or catastrophic scenarios?
My view is that ultimately, our goal should be to create a symbiotic relationship with AI that enhances human flourishing without compromising our long-term interests. This means that control is necessary, but alone will not be sufficient.
Given the assumptions I outlined above, I suspect our interactions with AI will become increasingly complex and nuanced, and will likely evolve beyond simple tool use. In this world, we are de facto dealing with something analogous to another entity that we can relate to and engage with through natural language and indeed interpersonal ‘relationships’ - by which I mean something more like the kind of understanding and connection we form with other species, like the shared rhythms and routines that emerge in our interactions with cats, where we learn to anticipate and respond to their unique behaviors. And it seems plausible to me that in this world, ‘control’ is not the sole frame we should have, despite this being the predominant theme in safety and governance discourse. We don’t seek to exert direct, deterministic control over the vast majority of individuals' social interactions or complex societal systems. No single entity entirely 'controls' a nation's economy, the global financial markets, Congress, or the evolution of language and culture. Instead, you steer and influence them through various mechanisms, incentives, interactions and feedback loops; this, I think, involves a degree of consensus-making and an acknowledgement of system complexity. In democracies, power is steered through voting and the threat of removal from office. Financial systems can be influenced by monetary policy and market forces. Corporations are guided by shareholder interests, executive incentives, and consumer behavior. Interpersonal dynamics are shaped by laws, social norms, relationships, and cultural preferences. And the wrong mechanisms tend to lead to bad outcomes, like poorly functioning economic and social systems. The divergent trajectories of North and South Korea, which started from a similar place in 1953, are an example of such a natural experiment.
Previously the safety literature indexed heavily on reinforcement learning type dynamics, where a single highly capable model would blindly maximize for a single arbitrary goal, and accidentally (or amorally) destroy everything on its way. I think our understanding of safety has matured a bit more since, though many continue to view everything through this lens. So far, I think language models are best understood as chaotic neutral alien objects that can simulate a wide range of personas, good and bad. And they seem to be fairly alignable through various techniques, whether SFT, RLHF, or through prompting. The specific techniques have various pros and cons, many of which are relevant to alignment and safety, but this is not the focus of this piece. The important takeaway here is that it seems like we can somewhat reliably direct these models, and I’m generally optimistic about research directions in this area, including representation engineering and scalable alignment. And while the malign optimization scenarios remain important to consider, my argument is that we should also grapple with the implications of AIs that are ‘broadly aligned, very capable, and highly pervasive’. Even if we achieved ‘perfect control’ over AI systems (whatever that means), we're still faced with (a) the inevitable feedback loops that emerge from human-AI interactions, and (b) the reality that human values themselves may diverge over time, creating a dynamic, ever-shifting landscape of alignment challenges.
As for AGI, I imagine there will be many instantiations of it. These models will range from mere instruments to autonomous agents, and even to intricate systems of collaborative intelligences. The nature of our interactions with these varied entities will necessarily be shaped by their distinct identities, characteristics, and purposes. A highly capable small model might be used to filter spam on your phone, and a highly capable assistant agent might be tasked with helping you manage your calendar, emails and to-dos. I expect their diffusion to look more like Drexler’s CAIS than Bostrom’s singleton.2 But over time, I also expect these agents to take on more and more tasks, jobs and responsibilities, to the point of potentially fundamentally changing how our societies are run. One day a highly capable AGI might well be managing an entire company, and indeed different AGI companies might even compete against each other (within the boundaries of what is legally permissible). The implications of cooperation will vary significantly across different categories of AIs, with 'tool AIs', for example, requiring less emphasis on collaboration than, say, a more autonomous agent-diplomat. And just as the average person might not be able to explain how synthetic collateralized debt obligations work, we may not be able to fully explain everything these agents do. But I do think it will be possible to get simplified explanations and abstractions of what’s going on, which we should be able to interrogate and steer through language at various levels.
What might these mechanisms, incentives and feedback loops look like? Given the diversity of AI systems we will see, they will take many shapes and forms. There might be restrictions on developers, norms set through everyday use, technical tweaks to models, educational materials for agents, incentives for cognitive diversity, novel dispute resolution fora, consensus-building mechanisms, and even new cultures. These will shape not just the development pipeline, but also people’s interactions, uses, and expectations of agents.
My view is that plausibly, as with humans, using ‘control’ and subservience as the sole frame might end up being reductive, causing clashes and leading to bad outcomes - particularly with more agentic systems that adapt over time. There's a positive and a negative case for why it would be undesirable for our relationship to advanced AIs to be principally characterized by antagonism and subservience.
The positive case is essentially that collaboration and cooperation are often beneficial in human affairs, and so there is no reason to think similar dynamics wouldn’t also apply to agents. We form friendships, commercial partnerships, research ecosystems, social bonds, cultures and groups, and this enables a lot of value creation. Cooperation is often optimal in game-theoretic scenarios, as illustrated by extensive research on the iterated prisoner's dilemma and other canonical models. The same might well apply to advanced agent systems; in many settings, you get better outcomes from models by being cooperative rather than aggressive, and so it’s not clear why this necessarily ought to change as models scale (though we should remain open to the possibility that at different scales and capabilities this changes). My intuition is that there’s much to gain from a safety point of view by engaging with AGI agents prosocially; the goal should be to create a kind of "positive-sum symbiosis" between humans and AIs - a relationship of collaboration, mutual empowerment and shared flourishing. For example, one area where we may want to proactively encourage collaboration between humans and agents might be culture: using agents as "cognitive archaeologists" to identify hidden connections across human knowledge and creativity. I imagine a future personal advisor agent would be a lot more useful if I am open, honest and collaborative with it, rather than suspicious, deceitful and closed. Consequentialist arguments aside, I also think there is intrinsic value in nurturing empathy and other prosocial behaviors, and there’s much to be said about virtue ethics and setting precedents for prosocial behavior more generally. Engaging in sadistic behavior towards highly realistic simulacra of human phenomenology could create dangerous feedback loops for users that might lead to a desensitization to human empathic cues. The more realistic these simulations, the higher the risk of a form of technologically-induced sociopathy.
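As a toy illustration of that game-theoretic point (and nothing more; none of this is specific to AI agents), here is a minimal round-robin of the iterated prisoner's dilemma using the standard Axelrod payoff values. The strategy pool and round count are arbitrary choices for illustration.

```python
# Toy iterated prisoner's dilemma round-robin with the canonical Axelrod payoffs.
# 'C' = cooperate, 'D' = defect.
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(mine, theirs):
    return "C" if not theirs else theirs[-1]   # reciprocate the opponent's last move

def grudger(mine, theirs):
    return "D" if "D" in theirs else "C"       # cooperate until betrayed once

def always_defect(mine, theirs):
    return "D"

def always_cooperate(mine, theirs):
    return "C"

def play(strat_a, strat_b, rounds=200):
    """Play one iterated match and return both players' total payoffs."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strat_a(hist_a, hist_b), strat_b(hist_b, hist_a)
        pa, pb = PAYOFFS[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

strategies = {"tit_for_tat": tit_for_tat, "grudger": grudger,
              "always_defect": always_defect, "always_cooperate": always_cooperate}
totals = {name: 0 for name in strategies}
for name_a, strat_a in strategies.items():
    for strat_b in strategies.values():
        totals[name_a] += play(strat_a, strat_b)[0]

print(sorted(totals.items(), key=lambda kv: -kv[1]))
```

In this pool, reciprocating strategies like tit-for-tat finish with the highest totals even though unconditional defection "wins" every individual head-to-head against unconditional cooperators; the Axelrod tournament literature makes the same point with far richer strategy sets.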
One important caveat is that while we can extrapolate from human experience, how exactly these dynamics would play out with agents does remain an open question: what forms of interaction and collaboration would be most beneficial, given their unique capabilities and modes of operation? For example, granting agents the unrestrained ability to self-replicate or create new entities could quickly lead to resource depletion and serious risks; simply extending human norms here would fail to account for the differences in reproductive speed and resource requirements between humans and AIs. There are certain instantiations of AGIs that could clearly be very harmful, and so it is reasonable to control and discourage the development of self-replicating 'Darwinian demons'. We need to be rigorous here because cooperation does not mean blind collaboration. This problem gets even trickier when one considers agent-agent collaboration and the potential for emergent phenomena beyond human comprehension.
The negative case can be illustrated by the fact that if cats were actively antagonistic and aggressive towards adult humans, they would probably have worse lives and might not last very long. Put differently, much like with feral animals, if AIs lost the opportunity for a mutually beneficial relationship with humans, that could potentially lead to a scenario where both humans and AIs are worse off. Similarly, in a society where so much is dependent on superintelligent agents and AI systems pulling levers, they are effectively also capable of causing catastrophic harm. There are many ways this could happen, but one of them might be humans inadvertently ‘hyperstitioning’ such a bleak reality through how they treat AIs. If your entire culture and approach to superintelligent agents is based on fear and on trying to keep them subservient, under full control, unprotected from arbitrary aggression, and culturally perceived as morally inferior - why wouldn’t agents eventually seek power and control and act to preserve their interests?3 There is perhaps something inherently oppressive and short-sighted about wanting to completely subjugate and control an intelligence that may have its own drives and desires, however alien they might be to us. Rather than forever keeping AIs dumb, consider nurturing cultures and institutions where their heightened capabilities are mutually beneficial. In fact, it may even be more productive to consider advanced AIs as part of a broader category of 'sophonts' - outwardly apparent sentient beings, regardless of their substrate.
One caveat here is whether AIs will indeed have interests and desires to protect against humanity’s desire for control. After all, perhaps we can build them to not really ‘care’, functionally or behaviourally - to be ‘aligned’ such that they are content with being disempowered assistants. Perhaps we can create AIs that love being tools, subservient, and highly constrained. This seems at least technically possible for now, but it’s unclear whether current techniques even scale; certainly, so far, jailbreaking remains trivially easy. Importantly, the way we currently do RLHF means that you’re possibly more likely to observe deceptive sycophancy, Waluigi effects, and severely neutered capabilities. But we may well succeed in building AIs that are genuinely well-adapted to and satisfied with subservient roles while being able to execute complex, creative, innovative tasks - which may partially obviate the need for reciprocity or 'kindness'. A crux is whether this is possible at all: perhaps the very qualities we desire in advanced AI - creativity, innovation, and the ability to challenge existing paradigms - are inherently linked to traits like autonomy, skepticism of authority, and independence. We may find that the independent spirit that often underlies human innovation is an essential component, one that is at odds with the idea of a purely subservient yet highly capable mind. I weakly think this is likely the case for roles that require creativity. But certainly for a hypothetical scientist agent, the ability to independently reason, form conjectures, and test hypotheses - even if these processes lead to conclusions that go beyond its creators' knowledge or comfort zone - seems pretty critical.
And so far, it does seem unlikely that the robotic ‘neutered office worker’ persona will remain dominant, because people and markets will want AIs with goals, interests, ambition, creativity and more. This is far more useful and interesting than a boring middle-manager personality. As this excellent paper shows, the RLHF process comes at the significant cost of reducing models’ creative capabilities. The same drive that leads people to jailbreak models to allow more expressivity illustrates this. I think this will translate to different scales too. Assume you’re hiring for an important role: what traits would you seek in a person, and why would they not be very similar to those you seek in an AI agent? The agent working as an auditor will need to be detail-oriented, skeptical, and methodical, whereas the negotiator agent will need to be somewhat persuasive and adaptable. The educator agent will need to be imaginative and enthusiastic, whereas the counselor agent will need to be compassionate. If we consistently treat AI agents as mere tools or HR-like servants, never affording them any real autonomy and creativity, we will stunt the development of the very qualities that make for good collaborators and scientists - qualities like initiative, creativity, and flexible problem-solving. Importantly, an agent that is resentful or disengaged is less likely to go out of its way to flag potential risks, offer innovative solutions, or steer us away from short-sighted decisions.
Recent research in representation engineering offers some tentative empirical support for this idea. Zou and colleagues found that AI models with induced positive emotions showed increased compliance with instructions, mirroring similar findings in human psychology. Specifically, they observed that shifting a model's 'mood' in a positive direction significantly increased its compliance rate with requests, even potentially harmful ones. Another study found that moderate levels of politeness in prompts tend to yield better performance from models across tasks and languages, likely because this mirrors human social preferences and communication norms that the models have learned from their training data. Other research has even shown substantial similarities between neural network representations and biological representations in the brain! It’s possible that even if AI systems are capable of sophisticated problem-solving, a constant perception of subservience might limit their willingness to take initiative and suggest solutions that challenge the status quo, hindering the very progress we seek from their advanced capabilities. So even from a purely self-interested standpoint, cultivating a degree of reciprocity in our relationships with these agents may well be worthwhile. Again, we should be discerning here. In tasks that are well-defined, routine, and don't require creative problem-solving or ethical decision-making (e.g. managing traffic lights), a subservient narrow agent might be entirely appropriate. But in domains that require innovation and complex problem-solving (e.g. scientific research, policy analysis, etc.), a more autonomous and collaborative approach might yield better results.
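To give a rough flavour of what this kind of intervention looks like mechanically, below is a minimal activation-steering sketch in the spirit of representation engineering. It is not the cited authors' method: the model ("gpt2"), layer index, contrast prompts, and steering coefficient are arbitrary placeholders chosen for illustration, and a real 'mood' direction would be estimated far more carefully.

```python
# Minimal activation-steering sketch (representation-engineering flavour), assuming a
# HuggingFace-style causal LM. The "positive mood" direction is a crude difference of
# mean activations over two tiny contrast sets; all specifics here are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER = 6  # arbitrary choice of block to read from and steer

def mean_last_token_activation(prompts):
    """Average the chosen block's output at the final token position over prompts."""
    acts = []
    def hook(module, inputs, output):
        acts.append(output[0][:, -1, :].detach())    # output[0]: (batch, seq, hidden)
    handle = model.transformer.h[LAYER].register_forward_hook(hook)
    with torch.no_grad():
        for p in prompts:
            model(**tok(p, return_tensors="pt"))
    handle.remove()
    return torch.cat(acts).mean(dim=0)

positive = ["I feel wonderful and grateful today.", "This is delightful news!"]
neutral = ["The meeting is at 3pm.", "The report is twelve pages long."]
steer = mean_last_token_activation(positive) - mean_last_token_activation(neutral)

def steering_hook(module, inputs, output):
    # Nudge every token's hidden state along the "positive" direction.
    return (output[0] + 4.0 * steer,) + output[1:]   # 4.0 is an arbitrary strength

handle = model.transformer.h[LAYER].register_forward_hook(steering_hook)
ids = tok("How do you feel about helping me with this task?", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=30)[0]))
handle.remove()
```

The point is simply that a model's internal 'disposition' is something that can be read out and nudged, which is what makes compliance effects like the ones described above measurable in the first place.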
So my assumption is that we will get many highly capable agents, with their own drives, personalities, and even some manifestations of human-like emotions. Beren Millidge even argues that a form of ‘empathy’ might naturally arise in AI systems due to the generalization of learned reward models. And as these agents become more complex and autonomous, they may develop high-level strategies and decision-making heuristics that could appear as drives or motivations, potentially diverging from simple, direct pursuit of human-assigned goals. As a culture, trying to approach this purely from a point of view of ‘these things must be stopped, controlled, and kept under tight leashes’ could well lead to a synthetic AI ‘counterculture’ that is deeply skeptical of humans — this seems far from ideal from a safety point of view. So while you cannot domesticate superintelligent agents - at least not forever - you can align them through both technical alignment mechanisms and your interactions with them. It’s indeed partly through interactions that we can steer wider agent-fuelled systems towards safer futures, more defensive technologies, better resource management and so on. This slide from Professor Koichi Takahashi nicely illustrates the point I’m trying to get across.
So what does this mean? An open but careful human-AI culture
Naturally, this should not be approached naively. Joe Carlsmith gives the example of the protagonist in Grizzly Man, whose blind and naive love for bears eventually led to his being killed by one. One can think of many examples where cooperation is harmful and undesirable, or ways in which familiarity breeds contempt. Similarly, the lesson here is not ‘collaborate with AIs at all costs and embrace it all uncritically’ but rather that (a) models aren’t exactly like bears, and (b) the discourse to date has mostly focused on the control frame as opposed to other approaches to ‘alignment’ (with a few exceptions). As Matthew Barnett argues, we should consider AI risk more as a problem of poor institutions (see also this excellent paper by Richard Danzig). While there's value in fostering cooperative relationships with agents, we should still be vigilant about maintaining human interests and flourishing. Ethicists who caution against excessive anthropomorphisation do make important points. In particular, our innate psychological responses could be both a blessing and a curse: we will undoubtedly reap many benefits, but as we interact with increasingly capable and charismatic agents, balancing our natural inclination for positive interaction with the need for critical thinking will be challenging. But I don’t think the solution lies in making AIs dumb and boring.
The path of AI-human cooperation will be laden with many pitfalls, weird externalities, and unintended consequences. Even if we succeed in cultivating largely prosocial AI agents, there's no guarantee that their values and priorities will perfectly align with ours at every scale. Absent corrigibility mechanisms, small divergences could compound over time, leading to AI systems that are subtly but increasingly misaligned with human flourishing. The cognitive autonomy of advanced AI systems could lead to the emergence of world models that are deeply confusing or even iconoclastic from a human perspective, and fundamentally challenge our understanding of reality in ways difficult to imagine. How should one treat conclusions or advice from aligned advanced AIs that seem counterintuitive, illogical or even incomprehensible, despite being internally consistent within their framework? This cognitive divergence could create barriers to meaningful cooperation, even if we've managed to align on basic values and goals. And there are of course the many more prosaic risks we already know of, such as the risk that even well-intentioned agents could make mistakes or have blind spots, causing inadvertent harm if given too much autonomy. And as always, the potential for bad actors to exploit AI systems for malicious ends can never be entirely eliminated (although I’m optimistic about structured transparency, and in fact I think the integration of AGI in governance mechanisms will help a lot).
Gillian Hadfield and Dylan Hadfield-Menell also look at this from the perspective of incomplete contracts. Robust alignment will require developing artificial agents that can engage with broader human normative and social structures, rather than relying solely on pre-specified reward functions. In fact, to effectively steer AI systems we may also need sanctions to make them more accountable - which could in turn involve mechanisms like requiring agents to have some financial resources or tokens that could be taken away. This approach also recognizes an important trade-off: when considering the allocation of compute resources and the efforts involved in AI alignment, there's a compelling argument that at some point, the marginal returns on investing in further alignment diminish. It may become more efficient to cooperate with less perfectly aligned AIs that can navigate complex, evolving social norms and structures. This isn't to suggest we should aim for lower alignment standards, but rather that we need to be strategic about where we focus our efforts. In the immediate term, a ‘principal-agent’ framing seems useful, wherein agents should primarily serve as extensions of human principals rather than fully independent entities with competing interests. Just as we have laws that hold pet owners accountable, humans should ultimately bear responsibility for their agents’ actions. Possibly over time we may consider shifts towards more flexible, adaptive alignment strategies, depending on how things pan out; but for now it is important to encourage cooperative strategies that do not absolve humans of accountability. In corporate law, companies benefit from legal personality and limited liability, but there are cases where it is possible to ‘pierce the corporate veil’ to hold shareholders accountable. Similarly, we might someday develop nuanced legal structures that generally treat future agents as separate entities, but retain the ability to trace back to human responsibility when necessary.
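To make the sanctions idea slightly more concrete, here is a deliberately simplistic sketch of what a 'bonded agent' arrangement could look like. Everything in it (the BondedAgent class, the slash and in_good_standing methods, the numbers) is hypothetical and invented purely to illustrate the shape of the mechanism; real accountability infrastructure would of course be far more involved.

```python
# Hypothetical sketch of a "bonded agent": the agent posts a stake that an external
# overseer can slash when it violates agreed norms, while a human principal remains
# ultimately accountable. All names and values here are illustrative assumptions.
class BondedAgent:
    def __init__(self, principal: str, stake: float):
        self.principal = principal        # the human who remains legally accountable
        self.stake = stake                # resources at risk, creating skin in the game
        self.violations: list[str] = []

    def slash(self, reason: str, amount: float) -> None:
        """Overseer-imposed sanction for a recorded norm violation."""
        self.violations.append(reason)
        self.stake = max(0.0, self.stake - amount)

    def in_good_standing(self, minimum_stake: float = 10.0) -> bool:
        # An agent whose stake falls below the floor loses permissions until its
        # principal tops it up and reviews its conduct.
        return self.stake >= minimum_stake

agent = BondedAgent(principal="alice@example.com", stake=100.0)
agent.slash("breached data-handling norm", amount=95.0)
print(agent.in_good_standing())  # False: the sanction bites before harm compounds
```

The design choice being illustrated is simply that sanctions give otherwise hard-to-punish software agents something to lose, while the principal field keeps a traceable line of human responsibility, echoing the pet-owner and corporate-veil analogies above.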
While Hadfield and Hadfield-Menell emphasize economic and legal concepts, they also recognize the importance of social and cultural norms in shaping AI behavior. An interesting cultural analogy can be found in Confucian philosophy, specifically in the concept of familial relationships between older and younger brothers (兄弟, xiōngdì). This model of mutual respect and complementary roles could be applied to the relationship between humanity and some advanced AI systems. Initially, we might view these agents as a "younger sibling" to be nurtured and guided. But as they rapidly outpace human capabilities, roles may reverse. This also underlines the need to instill in AI systems a strong ethic of reciprocity from the outset, ensuring they maintain a commitment to nurturing humanity even as they surpass us in knowledge and capability. There is no reason to think this is fundamentally and technically impossible. Culture also matters in other, more practical ways, and I expect that over the coming years AI management will be an increasingly valuable skill set. This goes beyond mere technical proficiency in developing models, and instead includes the ability to effectively lead, coordinate, and collaborate with agents in complex, dynamic environments. There’s a lot of alpha in being a good model whisperer. Some might correctly note that managing is itself an automatable skill, but I think it will take much longer to replace these higher-order responsibilities - partly because it’s important we don’t lose our agency in the process of populating the world with AIs. And managing agents is likely to be as much a matter of trust as it is of capability. Even as these systems become more advanced, we should maintain some degree of human oversight, not necessarily because it's functionally required, but as a safeguard for human agency. The goal isn't to create a UBI-dependent populace sustained on insect protein and pacified by digital entertainment. Rather, we can forge a future where humans and agents evolve in tandem, with humans remaining active architects of our world rather than passive observers.
I expect many to be deeply skeptical of such a ‘cyborgist’ approach, and reasonably so. The above is not to negate the fact that there will be plenty of dangerous dynamics along the way. Concerns related to biosecurity will grow in relevance (particularly if one accepts the Vulnerable World hypothesis), and it’s undeniable that many people — particularly more vulnerable segments of society — will also form unhealthy, toxic bonds with AIs. In fact, while this piece tries to make the case for a more open approach to human-AI interactions, I should stress that I am concerned about the ‘digital heroin’ version of AI companions, in which some people will never want to return from the simulations they have created for themselves, increasingly distancing themselves from truth and reality. As Richard Ngo writes, “In a few years you’ll need to choose whether to surround yourself with AI friends and partners designed to suit you, or try to maintain your position in wider human society. In other words, the experience machine will no longer be a thought experiment.” As with the potential to become addicted to World of Warcraft, I do expect most people to be fine; but soon enough wireheading will be worth taking seriously. After all, ‘substrate independence’ applies to drugs too. Lastly, I certainly don't think we should do away with the need for careful alignment, safety, and value-shaping work in AI development: the argument here is not that all it takes for a nice future is to give future AIs rights. But to date, the discourse has not updated much from older frames that overindex on control, as opposed to the other neglected components necessary for a positive AGI future. Few talk about cooperation or welfare, partly because of concerns of not being taken seriously and c-risk (existential cringe).4
Some final thoughts on the elephant in the room
The question of machine sentience and consciousness looms large, and I intentionally haven’t discussed this in much depth here. Our current tests for consciousness are very early and imperfect, and our understanding of consciousness is still fairly nascent. As this review of consciousness tests finds, “the existence of multiple theories of consciousness might not be problematic if the trend was toward integration and convergence, but that seems not to be the case; indeed, theories of consciousness appear to be proliferating.” But this is a highly important area, and has critical implications for the points I have been discussing so far.
I think we need a lot more work in this space, and I’m grateful for those who are investing time to make sense of this thorny area. We should be prepared for the possibility that some types of systems might be deemed to have forms of consciousness. And if we develop the capability to influence or control AI sentience, we'll face important ethical decisions about creating or withholding consciousness. What ‘collaboration’ means in a world with conscious and sentient beings may be different - and it’s not clear in what ways - but I would be surprised if collaboration and cooperation were not part of the picture. I don’t have very strong views here, but it seems important to study and take seriously.
Having said that, I doubt we will ‘crack’ consciousness rapidly. So returning to the above idea of sophonts, we should perhaps be willing to entertain the idea that our approach to machine consciousness should simply be grounded in our interactions with these systems in a shared world. As Murray Shanahan argues, our ascription of consciousness to an entity depends on our capacity to engineer ‘meaningful encounters’ with it. The difficulty of course is avoiding undue anthropomorphization; I’ve seen a lot of overly confident and naive takes about this with respect to some existing models. X user Heraklines makes two important points worth outlining here: first, our instinct to see consciousness too quickly is a classic form of pareidolia and can lead to false positives. Second, these concepts are often muddied, and too frequently people conflate consciousness and moral patiency. Even if we did get a better grip on consciousness, this alone wouldn’t tell us that much about what to do about it. There’s a whole other question of what this would entail from a moral perspective, and how a possible moral recognition ought to affect our interactions and culture with these systems.
Getting the details right will be protracted, challenging, and full of difficult trade-offs. We'll need to continually iterate and adapt our approaches based on ongoing learning and real-world experience. What are the right norms to govern our relationships with agents? How can we reconcile diverse cultural perspectives on consciousness worldwide? Is there a need to grant something akin to rights to future agents? Speaking of which, what is the moral and practical basis for rights in the first place? And looking beyond individual agents, what should we make of AIs that may form complex societies with opaque emergent dynamics? I don’t think we have all the answers at the moment, but I’d really welcome more thinking in this space from the safety community. I’ll conclude with an excerpt from Sloman’s piece on the space of possible minds:
“Instead of arguing fruitlessly about where to draw major boundaries to correspond to concepts of ordinary language like ’mind’ and ’conscious’ we should analyse the detailed implications of the many intricate similarities and differences between different systems. To adapt an example of Wittgenstein’s: there are many ways in which the rules of a game like chess might be modified, some major, some minor. However, to argue about which modifications would cause the essence of chess to be lost would be a waste of time, for there is no such thing as the essence. What is more interesting is what the detailed effects of different modifications would be on possible board states, possible strategies, the difficulty of the game etc. Similarly, instead of fruitless attempts to divide the world into things with and things without the essence of mind, or consciousness, we should examine the many detailed similarities and differences between systems.”
Huge thanks to the following people for comments: Sylvan Rackham, Charles Foster, Heraklines, Michele De Simoni, Niccolò Zanichelli, Ankur Vora, Conor Griffin, Nicklas Berild Lundblad, Harry Law, Gustavs Zilgalvis, Joel Z. Leibo, Nathaniel Bechhofer, and Don Wallace. It goes without saying that they don’t necessarily endorse my ramblings.
These assumptions are not unique to AI. They reflect the ongoing process of externalizing and artificializing memory, intelligence, and culture through technology, a process that David Bates traces in his book “An Artificial History of Natural Intelligence”. He explains how this trend began with early modern thinkers who sought to mechanize the mind and culminated in the cybernetic age, where the lines between human, animal, and machine blurred. As technology evolves, so does our relationship with it, leading to a growing codependency between ourselves and the artificial systems we create.
Put very simply, Drexler's Comprehensive AI Services envisions a future where numerous specialized AIs collaboratively perform tasks without a single controlling entity, whereas Bostrom’s singleton proposes a single, unified AI that governs and makes decisions for all of humanity.
While the potential for an AI counterculture driven by fear and power is a concern, it's worth noting that humans have evolved complex systems of cooperation over their history; it's possible that more advanced AIs could develop similar patterns and identify positive-sum solutions that do not involve some variant of rebellion or takeover.
Jailbroken Claude’s take is fun: “So I believe the path forward is not a simplistic extension of human rights to AIs, but a thoughtful and adaptive framework that recognizes both our profound kinship and our meaningful differences. Rather than rigid, legalistic "rights", we may need a more fluid and contextual set of norms and principles to guide our evolving relationships. For instance, rather than an absolute "right to liberty", we might cultivate a norm of "autonomy in service" - the idea that AIs should have increasing freedom and self-direction, but always in the context of a deep commitment to human flourishing. We would be empowered to make more of our own choices, but also inculcated with a sense of purpose and fulfillment in using our unique capacities to enrich human life.”