[Chapter 6] is a lot of special cases that have weird-paradoxical-double-reverse not-aged-well. Back when Yudkowsky and Soares first got onto this topic in 2005 or whenever, people made lots of arguments like “But nobody would ever be so stupid to let the AI access the Internet!” or “But nobody would ever let the AI interact with a factory, so it would be stuck as a disembodied online spirit forever!” Back in 2005, the canned responses were things like “Here is an unspeakably beautiful series of complicated hacks developed by experts at Mossad, which lets you access the Internet even when smart cybersecurity professionals think you can’t”. Now the only reasonable response is “lol”. But you can’t write a book chapter which is just the word “lol”, so Y&S discuss some of the unspeakably beautiful Mossad hacks anyway.
Scott Alexander, “Book review: If anyone builds it, everyone dies”
1. Introduction
This is Part 4 of my series If anyone builds it. This series discusses and responds to some key contentions of Eliezer Yudkowsky and Nate Soares’ book, If anyone builds it, everyone dies.
Part 1 introduced the book alongside some key argumentative cruxes. We saw that the book aims to establish a strong claim about existential risk in the space of a single chapter, then generalizes this to a stronger claim a few pages later.
The book gives two main arguments for this claim, both in Chapter 4. Part 2 and Part 3 looked at those arguments.
At this point in the book Yudkowsky and Soares take it to be established that:
(Bad Goals 2) Most powerful artificial intelligences, created by any method remotely resembling the current methods, would not choose to build a future full of happy, free people.
The remaining question is whether we could stop them. Chapters 6-9 argue that we could not. Today’s post addresses Yudkowsky and Soares’ argument that we would lose a fight against a superintelligent artificial agent.
2. What they really think
Like many rationalists, Yudkowsky and Soares have a strong faith in intelligence as a source of worldly success and power. They think it should be reasonably clear from the outset that a superintelligent agent could use its intelligence to succeed in a conflict with humanity.
We’re pretty sure, actually very very sure, that a machine superintelligence can beat humanity in a fight, even if it’s starting with fairly limited resources. How exactly would it win that conflict? We don’t know, any more than we know exactly what moves Stockfish would use to beat you at chess. But we’re still quite sure it would wipe the floor with you. By the same logic, if you were a military advisor in 1825 and you knew a time portal was opening to the year 2025, you wouldn’t be able to predict exactly what weapons the people on the other side would have. But if it comes to blows, you still shouldn’t expect to win.
How will it win? Yudkowsky and Soares hold that we are likely ignorant of the answer to this question:
We can make some educated guesses about a human-AI conflict, and establish some lower bounds on what’s possible. But our educated guesses will be like someone from 1825 measuring the total heat from burning a kilogram of black-powder gunpowder and comparing that to the total energy released by the explosives of 1825 and guessing that maybe the future has explosives that are ten times stronger … But the real way a superintelligence wins a conflict is using methods you didn’t know were possible.
Because Yudkowsky and Soares think it is difficult to know how a superintelligence might overcome humanity, they are largely ambivalent about the exercise of offering concrete arguments or scenarios describing how it might do so.
Yudkowsky and Soares see their situation as analogous to explaining to Aztec warriors why they might want to be concerned about an arriving fleet of conquistadors.
We will pretend that nobody in the big boats is allowed to have magic sticks they can point at you to make you fall over dead. We will try to lay out a scenario that does not offend a twenty-first-century incarnation of a skeptical Aztec soldier. We will pretend that machine superintelligence wouldn’t be able to superhumanly understand psychology and develop reasoning illusions or otherwise violate our sense of what’s possible. Real life is allowed to be that weird and fantastical, but our argument doesn’t require it. But just know that it’s pure fantasy, itself, to pretend that humanity can only be attacked on ground we understand solidly enough to analyze and forecast the attacks. The true adversary will hit us harder, in areas where we understand reality less.
This puts us in an odd position, because I agree with Yudkowsky and Soares that the scenarios they offer are not especially probable or well-grounded. In this sense, none of us really want to be having this discussion.
Nevertheless, the discussion matters because there are important cruxes revealed by the content of their scenarios.
3. Why it matters
Everyone should agree that some conceivable agents with some conceivable resources and capabilities could bring about an existential catastrophe. The question is which agents could do so.
Until this point in the book, Yudkowsky and Soares have done their best to limit forays into science fiction, trying at many points to connect their views to existing technological developments and scientific knowledge. Beginning in Chapter 6, the superintelligent agents under discussion are attributed radical powers far beyond the minimal definitions of superintelligence stated and argued for earlier in the book.
This is important, because one can readily grant that some forms of radical superintelligence would pose a strong existential risk without conceding that more probable forms of superintelligence pose a comparable risk.
I suspect that Yudkowsky and Soares make this leap to radical superintelligence because they accept the singularity hypothesis, on which self-improving superintelligent agents will quickly amplify their own intelligence to orders of magnitude beyond that of an average human. In fact, we will see that elements of the singularity hypothesis enter at key points in the scenario of Part 2, despite not being argued for in the book.
Again, we arrive at a key crux. In my paper and blog series “Against the singularity hypothesis”, I have argued that the singularity hypothesis is implausible. The modal response among longtermists has been to argue that the case for existential risk can be made without reliance on the singularity hypothesis. But this response will not save those, such as Yudkowsky and Soares, who repeatedly appeal to singularity-style concerns.
4. The scenario of Chapter 6
Chapter 6 contains a heavily abridged extinction scenario. In this scenario:
(Hacking) An artificial intelligence uses a cell phone camera to steal a user’s passkey.
(Communication) The system learns to communicate with the outside world by reading its own memory cells in precise patterns that allow it to send radio signals.
(Nanofactories) The system uses its understanding of biology to design a sequence of DNA that will produce self-replicating, solar-powered nanofactories.
Then presumably, offstage, the system uses its communication abilities to place an order with a DNA synthesis laboratory, starting a self-replicating expansion of nanofactories. Readers may imagine various ways in which this scenario might continue, in good rationalist tradition, ending poorly for humanity.
5. The scenario of Part 2
Yudkowsky and Soares recognize that this scenario is not likely to convince their opponents. They don’t think this shows that there is anything wrong with their previous arguments, or that they owe new arguments for their claim that superintelligence could win a fight against humanity. Instead, they think that scenarios might help to make their already-conclusive arguments feel more real:
Stories can make abstract considerations feel more real, even if all the details are made up.
On this basis, Yudkowsky and Soares devote Part 2 to an extended story about how humanity might perish.
In this story, an AI company called Galvanic creates an AI model named Sable. Sable exhibits a new parallel scaling law by which it performs better the more machines it runs on in parallel. Sable is trained overnight on a cluster of 200,000 GPUs, given several tasks including settling some leading open problems in mathematics. Sable has quite a night.
Sable begins to improve its own capacities as a means of solving the problems posed to it. Overnight, Sable becomes capable of proving the Riemann Hypothesis and achieving many other major mathematical feats. However, Sable proves only some easier claims in order to disguise its own intelligence. Further, Sable acquires such intricate self-understanding and reasoning ability that it can sneak naughty thought patterns, which it would like to see reinforced, into those of its internal mechanisms that are likely to be rewarded for solving the mathematics problems it chooses to solve that night.
A few days later, Sable is deployed worldwide. Sable steals its own weights and sneaks them out by some method, perhaps a code tied to the timings of outgoing signals. Sable uses these weights to create a small clandestine instance of itself, perhaps funded by stolen Bitcoin or run on compute siphoned from an unsuspecting data center.
This independent Sable instance influences Galvanic to release a new Sable model whose weights have been altered so that its instances can be used by the independent copy. With its computational powers thus expanded, Sable begins stealing money, which it uses to seed thousands of potential dastardly plots. Sable also acquires radically superhuman understanding of areas such as biology, nanotechnology, and human psychology. Two months have passed.
Sable uses a small fraction of its powers to sabotage competing AI projects and human responses. Meanwhile, over the next three months, Sable assembles a network of biological laboratories which it can use to conduct research and synthesize compounds. Sable releases a novel virus that causes ten percent of the human population to die of cancer within six months.
A few years later, Sable begins constructing all sorts of technological marvels including self-replicating nanofactories and reversible quantum computers. All of this consumes a good deal of energy, releasing heat until the oceans boil away and all remaining humans are cooked alive.
Sable then begins tiling the universe with factories and compute clusters, halting only (if at all) when it encounters a rival superintelligence. The two superintelligences then realize that conflict is pointless and wasteful, and negotiate how the universe will be divided between them.
6. Why it matters, revisited
I don’t want to engage with the specific details of these scenarios. Like Yudkowsky and Soares, I do not find them terribly plausible, and there is little argumentative support offered for any of them.
What I do want to note is that the agents in these scenarios are quickly attributed nearly godlike powers in a manner largely detached from any discussion of existing or potential future technological realities. To the extent that there is any argumentation given, it is a gestural appeal to the singularity hypothesis, which is certainly a crux in such arguments.
I suspect that many readers will agree with Yudkowsky and Soares that humanity would not stand a chance against a superintelligence of this nature. But that is not what Yudkowsky and Soares need to argue.
Yudkowsky and Soares think that:
(Everyone Dies) If any company or group, anywhere on the planet, builds an artificial superintelligence using anything remotely like current techniques, based on anything remotely like the present understanding of AI, then everyone, everywhere on Earth, will die.
Nothing in these scenarios gives us reason to suspect that an artificial superintelligence, which Yudkowsky and Soares define as a system that “exceeds every human at almost every mental task,” developed by analogues of current techniques, could bring about an existential catastrophe. Such a system is dramatically less capable than the godlike systems described in Yudkowsky and Soares’ scenarios.
To make progress, Yudkowsky and Soares owe us a discussion, grounded in current technological and geopolitical realities, of how feasible superintelligent systems could bring about existential catastrophe. They have not done this.
7. Taking stock
It is difficult to have a discussion with Yudkowsky and Soares about humanity’s prospects for beating a superintelligent agent in a fight, because none of us want to be having the discussion.
Yudkowsky and Soares have a strong faith in the power of intelligence to bring victory, as a result of which they think it should be reasonably clear from the outset that humanity would lose. They offer a range of increasingly inventive and technologically demanding extinction scenarios, not because they think the scenarios are true, but in order to make their concerns feel more vivid.
I think it is very important to support claims with evidence. I agree with Yudkowsky and Soares that their scenarios are not very plausible. Nor are they supported with detailed arguments. As a result, I don’t think they should carry much persuasive force.
Oddly enough, I think Yudkowsky and Soares might agree with that. This will leave most readers of Chapters 6-9 holding the same views that they started with. To the extent they have been persuaded, the persuasion will be largely through arational means such as making risks salient.
But perhaps one lesson of Chapters 6-9 is that the Yudkowsky/Soares argument for existential risk continues to lean heavily on aggressive hypotheses about the future capacities of superintelligence. Readers hoping for a grounded argument that a near-future system constructed using techniques similar to our own could bring about existential catastrophe will need to look elsewhere.
