Tag: Interpretability

Papers I learned from (Part 5: Language agents reduce the risk of existential catastrophe)

Simon Goldstein and Cameron Domenico Kirk-Giannini argue that language agents reduce the risk of existential catastrophe from artificial intelligence.

March 21, 2025