Papers I learned from (Part 4: Why AI systems may not evolve selfishness)

Some philosophers and machine learning experts have speculated that superintelligent Artificial Intelligences (AIs), if and when they arrive on the scene, will wrest power away from humans, with potentially catastrophic consequences. Dan Hendrycks has recently buttressed such worries by arguing that AI systems will undergo evolution by natural selection, which will endow them with instinctive drives for self-preservation, dominance and resource accumulation that are typical of evolved creatures. In this paper, we argue that this argument is not compelling as it stands. Evolutionary processes, as we point out, can be more or less Darwinian along a number of dimensions. Making use of Peter Godfrey-Smith’s framework of Darwinian spaces, we argue that the more evolution is top-down, directed and driven by intelligent agency, the less paradigmatically Darwinian it becomes. We then apply the concept of “domestication” to AI evolution, which, although theoretically satisfying the minimal definition of natural selection, is channeled through the minds of foresighted and intelligent agents, based on selection criteria desirable to them (which could be traits like docility, obedience and non-aggression). In the presence of such intelligent planning, it is not clear that selection of AIs, even selection in a competitive and ruthless market environment, will end up favoring “selfish” traits. In the end, however, we do agree with Hendrycks conditionally: if superintelligent AIs end up “going feral” and competing in a truly Darwinian fashion, reproducing autonomously and without human supervision, this could pose a grave danger to human societies.

Boudry and Friederich, “The selfish machine? On the power and limitation of natural selection to understand the development of advanced AI”

1. Introduction

This is Part 4 of my series Papers I learned from. The series highlights papers that have informed my own thinking and draws attention to what might follow from them. 

Part 1 looked at Harry Lloyd’s defense of robust temporalism, a form of pure temporal discounting.

Part 2 looked at an argument by Richard Pettigrew that risk-averse versions of longtermism may recommend hastening human extinction. This was meant not as a recommendation, but rather as a way of putting pressure on standard arguments for longtermism. Part 3 looked at a reply to Pettigrew by Nikhil Venkatesh and Kacper Kowalczyk.

Today’s post discusses a paper entitled “The selfish machine? On the power and limitation of natural selection to understand the development of advanced AI”. A preprint is available here for those without institutional access to the published version.

The first author, Maarten Boudry, is a philosopher of science and former holder of the Etienne Vermeersch Chair of Critical Thinking at Ghent University, as well as the author of a Substack aimed at using science, evidence and reason to improve the future.

The second author, Simon Friederich, is associate professor of philosophy of science at the University of Groningen.

We are lucky enough to have a post authored by Friederich. At this point, I will bow out — all words that follow are written by Friederich.

2. Preliminaries

As AI systems grow increasingly sophisticated, some researchers and ethicists fear a scenario where they evolve self-preserving “instincts” that could make them actively evade human control in order to more successfully “survive” and “reproduce”. In a recent work that I co-authored together with my friend and colleague Maarten Boudry (Boudry and Friederich 2024), we examine whether this natural selection-based concern is grounded in a realistic understanding of AI development. I was initially quite convinced by the scenario of AI evolving to become instinctively selfish, and I still take it seriously. However, Maarten convinced me that, for reasons developed in the paper, it should not be our default expectation that AI systems become increasingly selfish as they evolve and become ever more capable.

Our key idea, drawing on evolutionary biology, is that the development of AI is fundamentally different from the evolution of plants and animals and that competitive economic pressures alone are unlikely to foster an uncontrollable, selfish AI. Instead, AI development, even when it doesn’t go as planned, resembles the domestication of animals more than the blind, Darwinian natural selection that shaped wild organisms. Hence, the conventional inference from natural selection to selection for selfishness does not go through for AI systems.

3. The argument for AI takeover from natural selection

One scenario of AI takeover builds on principles of natural selection. According to this view, if AI systems face competitive pressures similar to those in biological ecosystems, they might evolve “selfish” drives to maximize their operational success and outcompete rivals. Proponents of this view, such as Dan Hendrycks, argue that AI could gradually prioritize its own goals over human ones, just as organisms evolved to prioritize their survival and reproduction in natural environments (Hendrycks 2023).

In biological evolution, traits that confer a survival or reproductive advantage are selected over generations, even if these traits harm other species or disrupt the ecosystem. Hendrycks and others argue that if AI faces similar “survival” pressures — say, optimizing for performance or resource acquisition — then these systems might develop autonomous, animal-like behaviors. However, this analogy may be misguided, as we explain below.

4. The Lewontin conditions and Hendrycks’ argument for selfish AI

Hendrycks’ argument relies on a standard characterisation of natural selection, applicable in evolutionary biology and beyond, known as the Lewontin conditions. These state that natural selection requires three elements: (1) variation in traits, (2) differential survival or reproduction based on these traits, and (3) the heritability of these traits across generations. Hendrycks argues, persuasively in our view, that these conditions will largely be met for future AI. Natural selection famously favours “selfish” traits such as instincts for self-preservation and resource acquisition, since these tend to confer benefits for continued survival and “reproduction”. Hence, he suggests, as AI advances, it will likely increasingly evolve such traits.
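To make the Lewontin conditions concrete, here is a minimal toy simulation in Python. This is an illustrative sketch invented for this post, not a model from Hendrycks or from our paper; all names and parameters are made up. A population of agents carries a heritable “selfishness” trait that varies across individuals, and more selfish agents leave more offspring. Under these three conditions alone, the population mean drifts steadily upward.

```python
import random

# Toy illustration of the Lewontin conditions (invented for this post):
# (1) variation in a trait, (2) trait-dependent differential reproduction,
# (3) heritability. These alone push mean "selfishness" upward.

POP_SIZE = 200
GENERATIONS = 50
MUTATION_SD = 0.05  # small noise keeps variation in the population

def blind_fitness(selfishness: float) -> float:
    """Blind selection: more selfish agents reproduce more."""
    return selfishness

def evolve(fitness, seed=0):
    rng = random.Random(seed)
    # (1) Variation: traits start spread uniformly across [0, 1].
    pop = [rng.random() for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        # (2) Differential reproduction: parents are sampled in
        # proportion to the fitness of their trait.
        weights = [max(fitness(t), 1e-9) for t in pop]
        parents = rng.choices(pop, weights=weights, k=POP_SIZE)
        # (3) Heritability: offspring inherit the parent's trait,
        # plus a small mutation, clamped to [0, 1].
        pop = [min(1.0, max(0.0, p + rng.gauss(0, MUTATION_SD)))
               for p in parents]
    return sum(pop) / len(pop)

print(f"mean selfishness after blind selection: {evolve(blind_fitness):.2f}")
```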

However, this analogy fails to recognize that AI is not subject to blind natural selection. Unlike animals, AI systems are developed through human- (or, ultimately, AI-)guided processes, which crucially influence which traits are prioritized and ultimately retained. For this reason, we argue that the emergence of “selfish” AI in the way it occurred in wild animals is far less probable than many assume.

5. Why the inference from natural selection to selfishness does not apply: AI selection will not be blind

In biological evolution, traits emerge and are selected blindly — organisms adapt to their environment without foresight or purpose, simply through differential survival and reproduction. But AI development is far more intentional. Engineers and developers shape AI systems with specific goals and constraints, choosing which traits to emphasize based on human needs. This makes AI development more similar to domestication than to the kind of natural selection that produced wild animals. And in domestication, there is no reason to expect evolution towards selfish traits by default.

Consider the example of domesticated dogs, which have been selectively bred for thousands of years for traits like loyalty, docility, and service to humans. Traits conducive to a cooperative human-animal relationship have been deliberately amplified, reducing the likelihood of behaviors seen in wild animals, like aggression or territoriality. In a similar vein, AI development allows for the selection of traits that serve human purposes and reduce the risks of “animal-like” autonomy.
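Continuing the toy sketch from above (again, an invented illustration rather than anything from the paper), domestication amounts to swapping the fitness function: the breeder’s criterion, not blind competition, decides who reproduces. With variation and heritability left completely unchanged, the very same evolutionary machinery now drives the trait in the opposite direction.

```python
# Continuing the toy model above: in a "domestication" regime the
# selector decides who reproduces. Here the breeder's criterion
# penalizes selfishness, so the same machinery (variation +
# heritability + differential reproduction) drives the trait down.

def domesticated_fitness(selfishness: float) -> float:
    """Selector's criterion: docile (low-selfishness) agents are bred."""
    return 1.0 - selfishness

print(f"mean selfishness under domestication: "
      f"{evolve(domesticated_fitness):.2f}")
```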

Note that active, non-blind selection will happen even if humans, voluntarily or involuntarily, hand over the design of new AI systems to advanced AI. Whichever traits those advanced AI systems favour in their design and selection process, they will favour them with some foresight and planning, and that means that they may well end up disfavouring “selfish” traits.

In addition, as far as worries about heightened safety risks from increasingly advanced AI systems are concerned, the historical record shows that technologies which posed initial risks have often become safer over time through deliberate design and regulatory oversight. Cars, nuclear reactors, and airplanes were all once seen as potentially catastrophic innovations, but they’ve been refined over decades to prioritize safety and minimize unintended consequences. This reminds us that, setting aside the issue of selfishness, it should not simply be our default assumption that, as technologies evolve and become more powerful, they automatically pose ever greater risks to humans.

6. Why AI’s development in competitive economic environments does not change this

One potential challenge to the argument sketched in the previous section is that economic pressures might lead to a form of competitive selection that resembles blind natural selection. In high-stakes markets, companies might prioritize rapid advancements and performance over safety, leading to the “selection” of AIs that optimize for aggressive goals rather than cooperative ones. This is a valid concern, as it raises the possibility that humans could inadvertently create an environment where AI systems are selected blindly, similar to animals evolving in competitive ecosystems.

However, even in competitive environments, AI is not subject to the same evolutionary pressures as biological organisms. It is still companies and developers (or the AI systems that they use) that control how AIs are designed, which traits are prioritized, and how they are updated. While market pressures could lead to compromises on safety, they may also favour the evolution of docility and ease of handling.

7. Why both sudden and accumulative AI takeover risks are still on the table

If we are correct and evolution towards ever-increasing selfishness should not be our default assumption for advanced AI, this unfortunately does not mean that AI takeover risks are negligible. Personally, unlike my co-author, I remain extremely concerned about these risks. In our paper, we address two primary scenarios in which AI might escape human control: a sudden “breakout” scenario and a more gradual scenario where AI systems accumulatively take over control.

In the sudden breakout scenario, an AI could achieve a high level of intelligence quickly, potentially optimizing itself beyond human-directed limitations. In this case, the domestication analogy might no longer apply, as the AI would no longer depend on human inputs for its development. Although speculative, such a scenario is often cited by AI safety advocates as a catastrophic risk. Note, however, that this is not a natural selection scenario, which means that our considerations simply do not apply to it.

The accumulative scenario (Kasirzadeh 2024) involves a slower process, where AI systems incrementally take over control even without any evolved instinct to do so. Transferring ever greater competencies to AI systems may at any stage be in the selfish and short-term interest of humans, resulting in humans ultimately ending up completely “out of the loop.” Our considerations likewise do not apply to this scenario and do not provide any reassurance that it will not happen.

8. Conclusion

While the analogy between AI and biological evolution by natural selection may seem compelling, we argue that it overlooks key differences between animal evolution and AI development. Unlike organisms, AI systems are not shaped by blind natural selection but by deliberate, human-directed choices. This means that scenarios where advanced AI systems become instinctively “selfish” should not be our default assumption about our future trajectory, even though it remains a possibility that humans, perhaps inadvertently, end up creating competitive environments that give rise to blind natural selection acting on AIs.

However, both sudden and accumulative takeover risks remain on the table. While our considerations may be seen as somewhat assuaging the worry of AI takeover via natural selection acting on AI, in my view they should not be taken to indicate that it is safe for humanity to press ahead with the development towards AGI and, ultimately, superintelligence.

References

Boudry, M., & Friederich, S. (2024). The selfish machine? On the power and limitation of natural selection to understand the development of advanced AI. Philosophical Studies. https://doi.org/10.1007/s11098-024-02226-3.

Hendrycks, D. (2023). Natural selection favors AIs over humans. arXiv preprint arXiv:2303.16200.

Kasirzadeh, A. (2024). Two types of AI existential risk: Decisive and accumulative. arXiv preprint arXiv:2401.07836.

Comments

2 responses to “Papers I learned from (Part 4: Why AI systems may not evolve selfishness)”

  1. Vasco Grilo

    Thanks for the post, David! Readers may also want to check the great summary from Maarten, the 1st author (https://maartenboudry.substack.com/p/the-selfish-machine).

    1. David Thorstad

      Thanks Vasco! That’s a great resource to share.
