The [AI 2027 report] is huge, so I focused on one section alone: their “timelines forecast” code and accompanying methodology section. Not to mince words, I think it’s pretty bad. It’s not just that I disagree with their parameter estimates: the fundamental structure of their model is highly questionable and at times barely justified, there is very little empirical validation of the model, and there are parts of the code that the write-up of the model straight-up misrepresents.
1. Introduction
This is Part 19 of my series Exaggerating the risks. In this series, I look at some places where leading estimates of existential risk look to have been exaggerated.
Part 1 introduced the series. Parts 2-5 (sub-series: “Climate risk”) looked at climate risk. Parts 6-8 (sub-series: “AI risk”) looked at the Carlsmith report on power-seeking AI. Parts 9-17 (sub-series: “Biorisk”) looked at biorisk.
Part 18 continued my sub-series on AI risk by introducing the AI 2027 report. Today’s post continues my discussion of the AI 2027 report by looking at its timelines forecast. Much of the discussion in this post is driven by a recent critique authored by the computational physicist titotal, who also offers some other critiques worth attending to. I want to focus on the primary question of how the authors generate high confidence in explosive growth.
2. The timelines forecast
The timelines forecast is authored by Eli Lifland, Nikola Jurkovic, and the forecasting team at FutureSearch.
The timelines forecast predicts the date of arrival of a superhuman coder, defined as follows:
An AI system for which the company could run with 5% of their compute budget 30x as many agents as they have human research engineers, each of which is on average accomplishing coding tasks involved in AI research (e.g. experiment implementation but not ideation/prioritization) at 30x the speed (i.e. the tasks take them 30x less time, not necessarily that they write or “think” at 30x the speed of humans) of the company’s best engineer. This includes being able to accomplish tasks that are in any human researchers’ area of expertise.
The timelines forecast draws on two models. The first is a timeline-extension model extrapolating trends in the length of tasks that AI agents can accomplish from a recent Model Evaluation and Threat Research (METR) report.
The second is a benchmarks-and-gaps model which predicts the time needed for AI agents to saturate a benchmark of AI R&D tasks (RE-Bench), then extends this forecast by predicting the time to cross remaining milestones from benchmark saturation to superintelligent coding.
The timelines forecast also provides an all-things-considered forecast which is not based on a comparably detailed model.
Today’s post discusses the timeline-extension model. I’m going to focus on the original version of this model. The model has since been modified, and the authors have offered defenses of their modeling choices. I may discuss these modifications and the authors’ responses in a second post if there is interest. I do not think that these modifications do much to change the fundamental point made here (and elsewhere): that an intelligence explosion is baked into the model rather than extracted in a deep way from data and forecasts of model parameters. But if enough readers disagree or would like to hear why I think this, it may be worth another post.
3. The timeline-extension model (April 2025): Strategy
3.1. METR Report
Model Evaluation and Threat Research (METR) is a research organization seeking to “develop scientific methods to assess catastrophic risks stemming from AI systems’ autonomous capabilities and enable good decision-making about their development.”
A recent METR report, “Measuring AI ability to complete long tasks,” looks at the length of tasks that can be accomplished by AI systems over time.
More specifically, they introduce the X% task-completion time horizon: the length of tasks that models can complete approximately X% of the time. Task length is estimated from the time taken by human programmers to complete each task.
At the 50% task-completion level, the METR report finds that time horizons have grown exponentially since 2019, doubling roughly every 7 months.

Time horizons at the 80% task-completion level are substantially shorter, but still follow an impressive exponential trend (doubling roughly every 213 days).

The AI 2027 authors will ultimately place a great deal of weight on the fact that recent doubling times have been faster than the overall trend since 2019. This leads them to put considerable credence in doubling times even faster than those reported by METR. I don’t want to push too hard on this assumption, since it will turn out not to be terribly essential to model behavior, but it is an important point to note.
3.2. Timeline-extension model strategy
The timeline-extension model uses a subset of the METR report’s task suite to assess model performance. The model assumes that superintelligent coding is reached past a certain time horizon and reliability level on the METR task suite.
The timeline-extension model then asks two questions:
- What time horizon and reliability level on METR’s task suite are needed for [superhuman coders] (SC)?
- When will this time horizon and reliability [level] be reached?
Answers to these questions will produce a forecast of the arrival date of superhuman coders.
4. What time horizon and reliability level are needed for superhuman coders?
The time horizon needed for superhuman coders is estimated by authors Eli and Nikola in the format “main estimate [80% confidence interval]”:
| Eli | Nikola |
| 10 years [1 month, 1200 years] | 1.5 months [16 hours, 4,000 hours] |
Eli also estimates that an 80% reliability level at this time horizon is needed. I was not able to identify a similar prediction by Nikola, but will assume that Nikola is working with a reliability level of 80%.
5. When will this time horizon and reliability level be reached?
When the required time horizon and reliability level will be reached is assessed on the basis of four predictions:
- The current doubling time of the time horizon
- How this would change over time, with no AI R&D automation
- The difficulty of making a human-cost SC 30x faster and cheaper
- Accounting for intermediate speedups and the internal-public gap
Plugging the 80% time horizon of the current leading model from the METR paper (15 minutes) into these four predictions yields a model of when future 80% time horizons will reach any given level.
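To see the shape of the underlying extrapolation, here is a toy version that uses only a pure exponential trend and ignores the other adjustments. This is my own illustrative sketch, not the authors’ code; the function name is mine, and I use the METR 7-month doubling figure purely as an example:

```python
def extrapolate_horizon(h0_minutes, months, doubling_time=7.0):
    """Pure exponential trend: the horizon doubles every `doubling_time` months."""
    return h0_minutes * 2 ** (months / doubling_time)


# From a 15-minute horizon, a 7-month doubling time takes roughly
# 9.4 doublings (about 5.5 years) to reach a one-work-month horizon:
print(extrapolate_horizon(15, 66) / 60)  # horizon in hours, ~170
```

Everything else in the model is a modification of this basic extrapolation: changing the doubling time, letting it vary over time, and adjusting for speedups and release gaps.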
I’ll discuss each prediction in turn, splitting the last into two categories as the prediction of intermediate speedups is largely separate from the prediction of internal-public gaps and is substantially more important to the results of the model.
5.1. Current doubling time
The current doubling time of the time horizon is estimated at 4.5 months with an 80% confidence interval of [2.5 months, 9 months]. This estimate is drawn from discussion of the METR report.
5.2. How this (doubling time) would change over time, with no AI R&D automation
The authors split their forecasts between three models. In the superexponential model, doubling times speed up. In the exponential model, doubling times remain the same. In the subexponential model, doubling times decrease.
They give point estimates for each model:
| | Superexponential | Exponential | Subexponential |
| Eli | 0.45 | 0.45 | 0.1 |
| Nikola | 0.4 | 0.5 | 0.1 |
We will look at the assumptions underlying these models, as well as the justifications for the point estimates, in more detail below.
5.3. The difficulty of making a human-cost SC 30x faster and cheaper
The next forecast assumes that a superintelligent coder is achieved and costs the same as a human coder, then asks how long it would take to make the superintelligent coder thirty times faster and cheaper than a human coder.
The authors provide a point estimate of 4 months with an 80% confidence interval of [0.5 months, 30 months].
5.4. Intermediate speedups
The model assumes that the rate of progress in AI research and development will speed up as AI systems begin to make meaningful contributions towards their own research and development. This speedup is captured by an AI R&D progress multiplier, which multiplies the rate of progress that would otherwise be made in AI research and development.
The authors estimate the AI R&D progress multiplier at the time when superintelligent coders are reached, normalizing the multiplier today to 1. The AI R&D multiplier will be assumed to grow towards this value as progress towards superintelligent coders is made.
The authors estimate the AI R&D multiplier as follows, in the format: main estimate, [80% confidence interval]:
| Nikola | Eli |
| 5.5 [2.0, 20.0] | 8.5 [2.5, 40.0] |
5.5. The internal-public gap
A given capability level is often achieved internally within leading AI companies before models with that capability level are released. The authors estimate the amount of time between the internal development and external release of superhuman coders, subtracting this estimate from their final model of the time when superhuman coders will be developed.
The authors estimate this internal-public gap at 1.2 months, with an 80% confidence interval of [0.25 months, 6 months].
6. Simulation
The authors provide code that simulates the arrival of superintelligent coders given the estimates above.
The simulation uses lognormal sampling. More specifically, it converts each 80% confidence interval into a lognormal distribution and draws samples from that distribution to run simulations.
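I have not reproduced the authors’ code here, but the basic move can be sketched with the standard library alone. The function name is mine, and I read the 80% interval as the 10th-to-90th percentile range of the fitted distribution:

```python
import math
import random

Z90 = 1.2815515655446004  # 90th percentile of the standard normal


def lognormal_from_80ci(lo, hi):
    """Fit a lognormal whose 10th and 90th percentiles are lo and hi;
    returns (mu, sigma) of the underlying normal distribution."""
    mu = (math.log(lo) + math.log(hi)) / 2
    sigma = (math.log(hi) - math.log(lo)) / (2 * Z90)
    return mu, sigma


# Example: the current-doubling-time interval of [2.5, 9] months.
random.seed(0)
mu, sigma = lognormal_from_80ci(2.5, 9.0)
samples = [random.lognormvariate(mu, sigma) for _ in range(100_000)]
# The fitted median is the geometric mean of the bounds: sqrt(2.5 * 9) ≈ 4.74.
```

One consequence worth noting: a lognormal fit makes the median the geometric (not arithmetic) mean of the interval endpoints, so wide intervals like [1 month, 1200 years] put substantial mass on very large values.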
6.1. First-pass analysis
The simulation considers three first-pass models of the doubling time of AI performance, measured in terms of (80%?) time horizons on the METR suite.
On the exponential model, doubling proceeds at the current rate T0. Let n be the number of doublings left to bring the current time horizon to the time horizon required for superintelligent coding. Then superintelligent coding will be reached in:
(Exponential model) Remaining months = T0 * n.
On the superexponential model, each doubling gets 10% easier so that:
(Superexponential model) Remaining months = T0 * (1 - 0.9^n)/(1 - 0.9).
On the subexponential model, each doubling gets 10% harder so that:
(Subexponential model) Remaining months = T0 * (1.1^n - 1)/(1.1 - 1).
To illustrate, pick a middle-of-the-road value of 5 months for the current doubling time T0. Then the remaining months to superintelligent coders on the first pass analysis depends on the number of required doublings n as follows:

This first pass analysis is modified by a cost and speed adjustment, which need not concern us for now.
6.2. Second-pass analysis
The modified first-pass analysis is then passed through a model of intermediate speedups.
On this model, the agent makes months of progress p_t towards the first-pass requirement of p_old months for superintelligent coders. Progress is determined by increasing algorithmic speedups together with a mostly-constant computational speedup.
Algorithmic speedups a_t evolve from their value a_init today towards their final value at the time of superintelligent coders, a_SC, at a rate increasing in the fraction p_t/p_old of progress made towards the first-pass month requirement. That is:
p_0 = 0
a_(t+1) = a_init * (a_SC / a_init)^(p_t / p_old)
Computational speedups c_t are fixed at 1 until 2029, when they decay to 0.5.
At time t, the rate Δp_t of progress is the average of the computational and algorithmic speedups, so that:
Δp_t = (a_t + c_t) / 2
The final number of months needed to reach superintelligent coding is reported as the first time t at which progress p_t reaches the first-pass requirement of p_old months.
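This recurrence is easy to simulate. The following is my own minimal reimplementation, not the authors’ code: it uses a one-month time step, normalizes a_init to 1, and holds the compute term at 1 throughout (i.e., it ignores the post-2029 decay):

```python
def months_to_sc(p_old, a_sc, a_init=1.0):
    """Simulate the second-pass recurrence with a one-month step.
    p_old: first-pass months required; a_sc: progress multiplier at SC."""
    p, t = 0.0, 0
    while p < p_old:
        a = a_init * (a_sc / a_init) ** (p / p_old)  # algorithmic speedup
        c = 1.0                                      # compute speedup (pre-2029)
        p += (a + c) / 2                             # Δp_t
        t += 1
    return t


# A first-pass requirement of 60 months with a_sc = 8.5 (Eli's point
# estimate) is completed well ahead of the first-pass schedule:
print(months_to_sc(60, 8.5))
```

The key behavior: as p_t grows, the multiplier a_t grows, which makes p_t grow faster still. This feedback loop is what converts the first-pass schedule into accelerating progress.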
6.3. Extracting probabilities
Running a large number of simulations using this code yields a probability density over arrival dates for a superhuman coder, with separate distributions for each forecaster’s preferred parameters:

7. The best-fitting hyperbola
To see what is going on here, it may be best to take a slight detour through another report.
7.1 Introducing the problem
A few years ago, I read an Open Philanthropy report by David Roodman. The report, “Modeling the human trajectory,” draws on historical trends in Gross World Product (GWP) since 10,000 BCE to project future GWP growth.
When I discussed the report with an economist at the Global Priorities Institute, they called it “an exercise in finding the best-fitting hyperbola” and went back to their lunch. Here is what they meant.
Consider a graph of 20 rollouts of the Roodman model from the Roodman report. This graph shows projected long-term trends in GWP over time:

The most striking thing about this graph is that virtually all model rollouts are hyperbolic — indeed, I suspect that the lowest straggler is waiting to go hyperbolic in a few millennia.
The primary use of data and simulations in this model is not to settle the shape of the growth curve. That has been decided at the outset by modeling choices. The primary use of data and simulations in this model is to parameterize the best-fitting hyperbola. It is as though the model has already decided on a hyperbolic growth trajectory and moved on.
It might be helpful to show how the model settles on a hyperbolic growth trajectory. The model zooms out and tries to fit a single curve to GWP since 10,000 BCE, which has the following form:

As Roodman notes, exponential growth does not fit this model very well:

But this is a bit of a strange graph. What it shows is largely stagnant economic output for most of human history, followed by a gradual acceleration into a few centuries of rapid growth. Indeed, there is no dispute that economic growth over the past few centuries has been exponential rather than hyperbolic, nor that the past few centuries are among the fastest periods of economic growth that humanity has ever experienced.
What led Roodman to pick a hyperbolic rather than an exponential growth model? In this case, it was the decision to fit a single growth model to what are well known to be heterogeneous periods of economic growth throughout human history, periods with quite different growth-determining factors that need to be modeled separately. As a result, the report managed to force a hyperbola onto 12,000 years of human history, despite the broad consensus of economists that GWP growth was not locally hyperbolic during any single century of that history.
7.2 The hyperbolic model
Let’s go back to the AI 2027 timelines model. This model does largely the same thing as Roodman does. It sets things up early on so that there is a 90% chance of hyperbolic growth. The rest of the model is largely aimed at parameterizing this hyperbola.
Let’s look first at the most obvious place where hyperbolic growth is baked into the model. The authors place 40-45% confidence in a particular superexponential model, parameterized to guarantee that growth will be hyperbolic.
Letting t(n) be the amount of lapsed time after n doublings, on the hyperbolic model we have:
t(n) = T0 * (1 - 0.9^n)/(1 - 0.9)
What does this imply for time horizons? After some rearranging (see here for details), we get:
H(t) = H0 * (1 - 0.1(t/T0))^(-6.58)
and inserting a middle-of-the-road initial doubling time T0 of 5 months gives:
H(t) = H0 * (1 - 0.02t)^(-6.58)
Let’s stop to think about how this equation behaves. It goes to infinity as t approaches 50 months. This forces an intelligence explosion in just over four years. At their chosen initial horizon of H0 = 15 minutes, horizons go through the stratosphere within about three years.
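The exponent -6.58 is just ln 2 / ln 0.9, and the closed form is the first-pass superexponential formula inverted. A quick numerical check (my own sketch, with the 15-minute horizon expressed in hours):

```python
import math

T0 = 5.0    # initial doubling time, months
H0 = 0.25   # initial 80% horizon: 15 minutes, in hours
EXP = math.log(2) / math.log(0.9)   # ≈ -6.58

def horizon(t):
    """80% time horizon in hours after t months, superexponential model."""
    return H0 * (1 - 0.1 * t / T0) ** EXP

# Consistency check against the first-pass formula: after n doublings,
# t = T0 * (1 - 0.9**n) / (1 - 0.9), and the horizon should be H0 * 2**n.
for n in (1, 5, 10):
    t = T0 * (1 - 0.9 ** n) / (1 - 0.9)
    assert abs(horizon(t) / (H0 * 2 ** n) - 1) < 1e-9
```

Evaluating `horizon` near t = 50 makes the blow-up vivid: the base (1 - 0.02t) approaches zero and the negative exponent sends the horizon to infinity.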

And the choice of parameters doesn’t really do much beyond wiggling the inevitable hyperbola. The time at which the model goes infinite is independent of both the initial horizon H0 and the chosen horizon at which superintelligent coding will be reached. It depends only on the rate (here: 10%) by which each doubling becomes easier and the initial doubling time T0 (here: 5 months).
Nor do we have much room to change the model’s behavior by wiggling these parameters. Raising the doubling time T0 by a large amount, say to 2 years, would still lead to infinite horizons in twenty years. Raising T0 to 10 years would still lead to infinite horizons in a century. Reducing the rate at which doublings become easier from 10% to even just 1% would lead to infinite horizons within 42 years. And combining these changes is not much help: raising T0 from 5 months to 2 years and dropping the rate at which doubling times become easier from 10% to 2% would still lead to infinite horizons within a century.
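These blow-up dates all fall out of the geometric series: if each successive doubling is a fraction s shorter than the last, the total time to infinitely many doublings converges to T0 / s. A quick check of the numbers above (the function name is mine):

```python
def blowup_months(T0, easier_by):
    """Months until the horizon goes infinite when each successive doubling
    takes (1 - easier_by) times as long as the last: the geometric series
    T0 * sum_k (1 - easier_by)**k sums to T0 / easier_by."""
    return T0 / easier_by


print(blowup_months(5, 0.10) / 12)    # baseline: ~4.2 years
print(blowup_months(24, 0.10) / 12)   # T0 = 2 years  -> 20 years
print(blowup_months(120, 0.10) / 12)  # T0 = 10 years -> 100 years
print(blowup_months(5, 0.01) / 12)    # doublings 1% easier -> ~42 years
print(blowup_months(24, 0.02) / 12)   # combined -> 100 years
```

Since the required number of doublings n never appears in this formula, no forecast of the horizon needed for superhuman coding can move the blow-up date.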
A hyperbola is a hyperbola is a hyperbola. Writing down a hyperbola produces a hyperbola, and there is not much room left for data or forecasting of any kind to change the inevitable conclusion.
7.3. The exponential model
The forecasters place 10% of their credence in a subexponential model and the remainder in an exponential model. This means that if the “exponential” model were to turn out to be superexponential rather than exponential, the forecasters would in effect have placed 90% of their credence in superexponential models. And if they then labeled their graphs, by some happy accident, as 90% confidence predictions, and trimmed them to de-emphasize the thickness of the remaining tails, the graphs would make superexponential growth seem very likely indeed:

Why might someone suggest that the “exponential” model is actually superexponential? Recall, as a first pass, that if t(n) is the amount of lapsed time after n doublings, the exponential model has:
t(n) = T0 * n
That’s not as fast as the authors would like. Indeed, many quantities, including world GDP, have been growing exponentially for centuries, and while the results have been impressive, they have been nothing like the explosive growth seen in hyperbolic growth models. So how do the authors get a hyperbola out of the exponential model?
Recall that on their second-pass analysis, progress speeds up over time due to algorithmic and computational speedups.
The blogger titotal illustrates the behavior of the second-pass analysis on one forecaster’s (Nikola’s) predictions. The qualitative behavior on the other forecaster’s predictions is largely the same.

What’s happened here is that the accelerating rate of progress in the second-pass analysis has much the same effect as the reduction in doubling times in the first-pass hyperbolic model. Now both the hyperbolic and the purportedly “exponential” model are hyperbolic.
We could continue, as titotal does, to pick at the second-pass model by projecting it backwards onto the data it is meant to fit and noting that the fit leaves much to be desired. That is certainly a worthwhile project. But we do not need to do this to grasp the general point.
The point is that the second-pass analysis means that the forecasters have placed 90% confidence in a pair of models, both of which exhibit hyperbolic growth. There is really not much room for the data, no matter if it is well- or poorly-fitted, to change things. The forecasters input a 90% credence in a hyperbola and recover a 90% likelihood of a hyperbola. That is how they get 90% confidence in explosive growth.
8. Against hyperbolic growth
I argued in the previous section that the AI 2027 timelines forecast essentially forces 90% credence in an intelligence explosion by dint of the forecasters having each put 90% credence in a pair of hyperbolic growth models. Readers may ask: What’s wrong with hyperbolic growth?
I’ve argued in my paper and blog series “Against the singularity hypothesis” that there are many things wrong with hyperbolic models of AI capacity growth. In particular:
(1) Sustained hyperbolic growth is an extraordinary growth mode rarely found in nature, and it should require correspondingly extraordinary evidence to warrant positing sustained hyperbolic growth.
(2) The problem of improving AI capacities becomes harder over time as good ideas become harder to find.
(3) Improving capacities by many orders of magnitude requires quickly overcoming all possible bottlenecks to growth, and there are many plausible bottlenecks including hardware and energy capacities.
(4) Rapid growth encounters physical constraints such as heat dissipation and, after a certain point, constraints on the total availability of materials on Earth. These constraints cannot always be quickly overcome.
(5) Often AI capacities grow subexponentially in accessible improvements. For example, it is well known that performance on many metrics has grown linearly across years of exponential growth in hardware capacities. While the METR report focuses on one metric, 80% task horizons, that has grown exponentially for several years, that is not the only metric or time period of interest for assessing capacity growth.
I also examine leading defenses of the singularity hypothesis and argue that they are unconvincing.
To be honest, one of the most common responses to that paper has been to claim that hyperbolic growth was never a large part of the story about existential risk from artificial intelligence, or even to reinvent the singularity hypothesis so that it no longer posits hyperbolic growth. I have some sympathy for the first response (less so for the second), but the more often I see hyperbolic growth models trumpeted in support of existential risk claims, the less currency this response retains.
9. Conclusion
Today’s post looked at the timelines forecast from the AI 2027 model. This forecast predicts the arrival date of a superhuman coder. In particular, we looked at the first part of the timelines forecast, the timeline-extension model. This model takes as its starting point the current time horizon of tasks that models can complete with 80% accuracy and asks how long this time horizon would have to become for superhuman coders to be reached.
We saw that this model places high confidence in the arrival of superhuman coders in this decade or the next. We saw that this is driven in large part by the decision by both forecasters to invest 90% credence in a pair of hyperbolic growth models. We saw that once this decision is made, much of the rest of the data and forecasts have a limited role to play in determining qualitative model behavior.
This is likely to leave readers where they started. Readers who already had considerable sympathy for hyperbolic growth models will come out of this discussion with the same sympathy for hyperbolic growth models. Readers who did not already have considerable sympathy for hyperbolic growth models are unlikely to find themselves swayed by the timeline-extension model.
There is much more to say about the timeline-extension model. It is also certainly possible to address the authors’ revisions to the model as well as their defense of the revised model. We could also talk about their stated justifications for hyperbolic growth, though to my mind these do not go far beyond standard justifications in the literature. My own preference would be to discuss the second half of the timelines model, which the authors put forward as their own preferred model. But I am certainly open to suggestions or requests on this front.
