AI safety Archives - Page 2 of 2 - Reflective altruism

Instrumental convergence and power-seeking (Part 2: Benson-Tilsen and Soares)

A leading power-seeking theorem due to Benson-Tilsen and Soares does not ground the needed form of instrumental convergence.

June 27, 2025

Papers I learned from (Part 5: Language agents reduce the risk of existential catastrophe)

Simon Goldstein and Cameron Domenico Kirk-Giannini argue that language agents reduce the risk of existential catastrophe from artificial intelligence.

March 21, 2025

Exaggerating the risks (Part 8: Carlsmith wrap-up)

I take a final look at the Carlsmith report on risks from power-seeking artificial intelligence, focusing on AI timelines as well as the argument from practical PS-misalignment to disempowerment of humanity.

June 3, 2023

Exaggerating the risks (Part 7: Carlsmith on instrumental convergence)

I take a second look at the Carlsmith report on risks from power-seeking artificial intelligence, focusing on Carlsmith’s argument for instrumental convergence.

May 6, 2023

Exaggerating the risks (Part 6: Introducing the Carlsmith report)

Many effective altruists believe that artificial intelligence poses a significant existential risk in this century. Let’s look at how a recent report by Joe Carlsmith makes this point.

April 8, 2023
