If you ever do have to heed a forecast, keep in mind that its accuracy degrades rapidly as you extend it through time.
Nassim Nicholas Taleb, The Black Swan
1. Introduction
This is Part 3 of my series “The scope of longtermism,” discussing a paper of the same name.
Part 1 introduced the scope question for longtermism: how wide is the class of decision situations for which longtermism holds? I clarified my target, swamping axiological strong longtermism (swamping ASL), and stated my view: the scope of swamping ASL, while probably nonempty, is smaller than many longtermists suppose.
Specifically, I introduce three scope-limiting factors: probabilistic and decision-theoretic features that are present in many contemporary decision problems and which, when present, substantially reduce the prospects for swamping ASL to hold in those problems.
Part 2 introduced the first scope-limiting factor: rapid diminution. Today’s post introduces the second scope-limiting factor: washing out.
2. Washing out
Suppose you cure a child’s blindness. That child might, if she is lucky, go on to found a business, use this business to launch a quick-moving political career, and eventually found a beneficial world government. That would be an excellent outcome.
However, the child might, if she is unlucky, go on to found an exploitative business, run for the wrong party, or start a world government too soon. That would be a poor outcome.
Part 2 of this series suggested one line of response: none of these outcomes are especially likely to occur. Today’s post begins from a different thought. There are, as we saw, many ways that our acts might improve the value of the long-term future, but also many ways that they may harm it. These negative and positive outcomes that our acts may have will tend to work against one another in taking expectations, significantly reducing the expected value of our acts by ‘washing out’ the value of potential positive consequences against the value of potential negative consequences.
Expressed probabilistically, washing out occurs when the probability distribution P over possible long-term value impacts ΔV exhibits significant symmetry about the origin: roughly, for any magnitude of impact, a gain of at least that size is about as probable as a loss of at least that size.
[Figure: a probability distribution P over long-term value impacts ΔV that is perfectly symmetric about the origin.]
Full washing out occurs in the case of perfect symmetry about the origin, as in the figure above. But washing out may be significant even in the more likely case of imperfect symmetry.
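To make the cancellation concrete, here is a toy numerical sketch in Python. The particular impacts and probabilities are invented purely for illustration:

```python
# A toy illustration of washing out; impacts and probabilities are hypothetical.
# Each pair is (long-term value impact, probability).

# Perfect symmetry about the origin: every possible gain is matched by an
# equally probable loss of the same size, so the expected value is exactly zero.
symmetric = [(-100, 0.1), (-10, 0.4), (10, 0.4), (100, 0.1)]
ev_symmetric = sum(p * dv for dv, p in symmetric)

# Imperfect symmetry: a slight tilt toward gains survives, but most of the
# potential upside is cancelled against the potential downside.
tilted = [(-100, 0.09), (-10, 0.40), (10, 0.40), (100, 0.11)]
ev_tilted = sum(p * dv for dv, p in tilted)

print(ev_symmetric)  # 0.0
print(ev_tilted)     # 2.0
```

Even the tilted distribution yields an expected impact of only 2 units, a small fraction of the 100-unit upside that might have motivated the act.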
Because washing out tends to significantly reduce the expected long-term value of acts which exhibit it, washing out emerges as a second scope-limiting factor on axiological strong longtermism.
But why should we think that washing out occurs frequently? There are two related ways to make the point.
3. The Bayesian perspective: Evidential paucity
Many Bayesians think that the state of complete ignorance should be expressed by a probability distribution that is completely symmetric about the origin. If we don’t know anything at all about the impacts of an action, then we have no reason to believe that any value gain ΔV is more probable than the corresponding value loss -ΔV, so we should assign equal probability mass to the regions above ΔV and below -ΔV.
Suppose we also think that, in evaluating the long-term value impact of options, we are often in a situation of evidential paucity. That is, the evidence we have bearing on the long-term value impacts of our options is sparse and underpowered. After all, it usually takes significantly more evidence to get a handle on outcomes the further away they lie, and it also becomes significantly harder to find evidence about outcomes as they grow more distant.
From a Bayesian perspective, updating under evidential paucity should not move an agent significantly away from their urprior. For example, if evidence is represented as ruling out possible worlds, then the situation of evidential paucity is one in which few worlds can be ruled out. This means that Bayesian updating should be weighted heavily towards the urprior. If the urprior is symmetric, then the updated distribution will tend to exhibit a high degree of symmetry as well.
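Here is a minimal sketch of that dynamic, assuming a symmetric prior and a nearly flat likelihood standing in for sparse evidence. Both are stipulations for illustration, not a model of any real intervention:

```python
import numpy as np

# A toy Bayesian update under evidential paucity. The symmetric prior and the
# nearly flat likelihood are stipulated for illustration.

delta_v = np.linspace(-100, 100, 201)  # grid of possible long-term impacts

# Symmetric "complete ignorance" prior about the origin.
prior = np.exp(-(delta_v / 50.0) ** 2)
prior /= prior.sum()

# Sparse, underpowered evidence: a likelihood that barely discriminates between
# positive and negative impacts (almost flat, with a faint tilt toward gains).
likelihood = 1.0 + 0.01 * (delta_v / 100.0)

posterior = prior * likelihood
posterior /= posterior.sum()

print((prior * delta_v).sum())      # prior mean: 0.0
print((posterior * delta_v).sum())  # posterior mean: still very close to 0
```

Because the evidence rules out so little, the posterior remains nearly symmetric, and the expected long-term impact stays close to zero.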
This argument draws only on an orthodox Bayesian view about the epistemology of complete ignorance, together with the idea that we are often in a situation of evidential paucity in evaluating long-term impacts. While both ideas might be challenged, they are plausible and popular ideas, and together they suggest that agents’ views about potential long-term impacts should exhibit significant symmetry about the origin. That is, there should be a high degree of washing out.
4. The forecasting perspective
A second argument for washing out is, at least on the surface, quite different from the first. It does not presume a Bayesian approach, and it appeals to difficulties in forecasting rather than to evidential paucity. I think that this second argument is deeply related to the first, though that relationship will fall beyond the scope of this post. In any case, it would only help me if the second argument turned out to be largely independent of the first.
On a popular view of forecasting, forecasts are shifted away from the truth by systematic error and random error. Systematic error pushes the forecast in a single direction away from the truth, and results from the forecaster’s own biases. Random error, by contrast, is usually represented as symmetric about zero, reflecting the inherent difficulty of predicting the phenomenon due to noisiness in the data.
Suppose that a forecast is made under conditions of low noise. Then, you should think that the forecast is roughly equal to the true value, skewed only by the systematic biases of the forecaster. If these biases are minimal, you might be quite optimistic about the forecast.
Suppose, by contrast, that the forecast is made under conditions of high noise. This might be represented by taking random error to be a normal distribution with mean zero and high variance. Then you should think that the forecast is driven primarily by random noise rather than by the (true) signal or the systematic biases of the forecaster, since random error tends to be much larger than both. To this extent, you should place much less stock in the forecast. If, again, you begin with a symmetric prior, then after updating you should retain a largely symmetric distribution over possible long-term impacts.
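A toy simulation brings out the contrast. The spread of the true values, the systematic bias, and the noise levels below are all assumed for illustration:

```python
import numpy as np

# A toy signal-plus-noise model of forecasting. The spread of true values, the
# systematic bias, and the noise levels are assumed for illustration.

rng = np.random.default_rng(0)
n = 100_000

truth = rng.normal(0.0, 1.0, n)   # true long-term value changes
bias = 0.2                        # a small systematic bias

low_noise_forecast = truth + bias + rng.normal(0.0, 0.5, n)
high_noise_forecast = truth + bias + rng.normal(0.0, 10.0, n)

# Under low noise the forecast tracks the truth; under high noise it is
# dominated by random error and carries very little information.
print(np.corrcoef(truth, low_noise_forecast)[0, 1])   # roughly 0.9
print(np.corrcoef(truth, high_noise_forecast)[0, 1])  # roughly 0.1
```

Under high noise the forecast is nearly uncorrelated with the truth, which is why it should move a symmetric prior very little.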
The argument from forecasting pessimism holds that it is often extremely difficult to predict long-term future values of ΔV, and hence that we should take such forecasts to be driven largely by random noise. Why think that it is especially hard to forecast long-term future value changes? At least three arguments suggest themselves.
4.1 Track records
We have fairly limited track records of forecasts made on a timescale of decades, and virtually no track record of forecasting on a timescale of centuries, much less millennia. When we look at these track records, the story is decidedly mixed.
Certainly there are some specialized scientific domains, such as astronomy, in which even less-advanced civilizations have been able to make predictions with a high degree of accuracy on a timescale of centuries or millennia. This exception will be relevant in explaining why some projects, such as the Spaceguard Survey, may largely escape the scope-limiting factors. However, in most other domains, matters are a bit more dire.
One of the most bullish advocates of long-term forecasting is Philip Tetlock. His Expert Political Judgment project asked expert and non-expert forecasters a variety of political questions, some of which spanned a horizon of 25 years. In a report funded by Open Philanthropy and Founders Pledge, among others, Tetlock surveys 25-year performance in two domains: nuclear proliferation (NP) and questions about border control and succession (BCS).
Here are the Brier Scores for performance on BCS questions (recall that a Brier Score of 0.25 is chance-level):
[Figure: Brier scores for expert and non-expert forecasters on BCS questions, by forecast horizon.]
Here both expert and non-expert forecasters did reasonably well on a 5-10 year time horizon, but struggled a fair bit with questions on a 25-year time horizon. Their performance, while certainly better than chance, is nothing to write home about.
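For readers who have not worked with Brier scores before, here is a quick illustration, with made-up outcomes, of why 0.25 marks chance-level performance on binary questions:

```python
# Why a Brier score of 0.25 is chance level on binary questions.
# The outcomes and forecasts below are made up for illustration.

def brier(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)

outcomes = [1, 0, 0, 1, 1, 0]

print(brier([0.5] * 6, outcomes))           # 0.25: always guessing "50/50"
print(brier([1, 0, 0, 1, 1, 0], outcomes))  # 0.0: perfect foresight
print(brier([0, 1, 1, 0, 0, 1], outcomes))  # 1.0: maximally wrong
```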
Forecasters did a bit better on nuclear proliferation (NP) questions:
[Figure: Brier scores for expert and non-expert forecasters on NP questions, by forecast horizon.]
Performance on these questions, while not outstanding, was considerably better than chance even on a 25-year time horizon. But if this is the best that can be said for the bulls, one understands why many experts are skittish.
Other studies have painted a less flattering portrait of long-term forecasting reliability.
Returning to nuclear proliferation, a study by Moeed Yusuf at the Brookings Institution found that:
Since the advent of nuclear weapons in 1945, there have been dozens, if not hundreds of projections by government and independent analysts trying to predict horizontal and vertical proliferation across the world … The results have oscillated between gross underestimations and terrifying overestimations.
In other domains, long-term forecasting has also produced mixed results. For example, Joseph Risi and colleagues (2019) use machine learning to study the problem of predicting which documents will become historically significant. Specifically, they study the correlation between the perceived contemporaneous importance (PCI) of documents, drawing on US Government cables from the 1970s, and the probability of becoming historically significant, measured by future inclusion in the government-produced history, Foreign Relations of the United States (FRUS).
In the most generous case, where the model was given an equal number of cables which did and did not end up being included in FRUS (a ‘1:1 sample’), there was a nontrivial correlation between contemporary importance and inclusion in FRUS (figure a below). But as they moved closer to the general case in which models must classify all cables, only about 1 in 1,132 of which would come to be included in FRUS, perceived contemporary importance came to correlate only weakly with future importance (figure d below).
[Figure: panels a–d, showing the correlation between perceived contemporaneous importance and inclusion in FRUS as the sampling ratio moves from 1:1 toward the general case.]
Risi and colleagues’ findings tend to suggest that predicting future historical significance is hard, and indeed that is the conclusion they draw:
We find that although such models can correctly identify some historically important documents, they tend to overpredict historical significance while also failing to identify many documents that will later be deemed important, where both types of error increase monotonically with the number of documents under consideration. On balance, we conclude that historical significance is extremely difficult to predict.
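One way to see why performance collapses in the general case is to note what a rare base rate does to any classifier, however sensitive. Here is a back-of-the-envelope sketch using Bayes’ rule; the sensitivity and false-positive rate are assumptions for illustration, not figures from Risi and colleagues:

```python
# A back-of-the-envelope look at what a rare base rate does to prediction.
# Sensitivity and false-positive rate are assumptions for illustration; only
# the base rate is taken from the 1 in 1,132 figure above.

base_rate = 1 / 1132          # share of cables that end up included in FRUS
sensitivity = 0.80            # assumed: flags 80% of truly significant cables
false_positive_rate = 0.10    # assumed: flags 10% of insignificant cables

# Probability that a flagged cable really is significant (Bayes' rule).
p_flagged = sensitivity * base_rate + false_positive_rate * (1 - base_rate)
precision = sensitivity * base_rate / p_flagged

print(round(precision, 4))    # about 0.007: most flagged cables are false alarms
```

At that base rate, the overwhelming majority of cables flagged as important turn out not to be, which matches the pattern of overprediction that the authors report.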
Combining the findings from these and other studies suggests a picture on which even forecasts made on a scale of a few decades frequently show limited, though nontrivial, reliability.
4.2 Forecasting value is hard
It may not be beyond our ken to predict some quantities decades, even centuries out. Perhaps, for example, we have a good estimate of how many lives will be saved in coming decades by malaria nets.
However, it is much more difficult to predict the other effects that come along with this. How will the GDP, infrastructure and educational systems of malaria-stricken nations be affected? What, in turn, will happen to African or even international politics? Will wars be started, or averted? Will Chinese influence continue to expand throughout the region, or will strengthened states begin to assert their independence?
What this suggests is that forecasting value is hard. It may not be so hard to answer tractable questions about the effects of our interventions, such as how many lives will be saved. But it is extremely difficult to answer broader questions, and the impact of our acts on the value of the future is the broadest question of all, since it is sensitive to every way in which the future can be made better or worse by our acts.
Those bullish about long-term forecasting are quick to point out that they are not trying to forecast everything. For example, the paper by Tetlock and colleagues surveyed earlier justifies its focus on nuclear proliferation and border issues precisely on the grounds that these are not so hard as some questions we could have asked:
The ideal long-range question for a fair test of [forecasting] expertise falls in the Goldilocks zone of difficulty. It is far-fetched to suppose someone could predict, say, the President of the United States 25 years out. That sets up [forecasting] skeptics for easy wins. But it is less far-fetched to suppose someone could predict trajectories of slow-motion variables with low base rates of change: nation-state borders, nuclear proliferation, or rankings of nation-states on GDP, life span, or corruption.
What would Tetlock and colleagues have said if asked to test the hypothesis that experts can predict changes in the total value of the world 25 years out? What about a hundred years out, a thousand or a billion? I suspect that Tetlock and colleagues would have replied curtly that these challenges are unfair: of course we can’t reliably predict such things, but that is a caricature of the position that even the most bullish experts are trying to defend.
That brings us to the final reason to be skeptical about long-run forecasting of the value of the world: experts are deeply skeptical that it can be done. If even the most bullish experts think forecasts of this nature cannot be reliably made, then perhaps we should be skeptical ourselves.
4.3 Expert skepticism
I could not find any examples of forecasting experts who defend, in a peer-reviewed publication, the position that we can reliably predict the impact that our actions will have on the world hundreds, thousands, millions or billions of years out.
It is easy, however, to find experts who are skeptical. Most obviously, the field of decisionmaking under deep uncertainty combines a variety of applied disciplines, such as risk analysis and operations research, whose practitioners are often asked by governments and corporations to aid in planning long-term projects on a timescale of decades, or even a century. Researchers in this field must put their money where their mouths are: if they think that we can generate reliable forecasts on this timescale, they need to actually generate those forecasts and convince stakeholders that the forecasts are worth paying for.
Researchers studying decisionmaking under deep uncertainty have been nearly unanimous in advancing the line that forecasting on this timescale is typically so difficult and unreliable as to be a futile exercise. Instead, they have advanced a variety of non-forecasting methods including robust decision-making, scenario planning, and info-gap decisionmaking. It is, for those bred to a certain brand of armchair Bayesianism, easy to scoff at such a total rejection of forecasting. (I must confess to having once been such a Bayesian myself.) But I do not think we should be so hasty. Would any of my readers care to convince the Israeli government that they can reliably model the decision of whether to build an expensive liquefied natural gas plant as a hedge against energy instability due to regional unrest, or tell the city of London how much to reinforce the Thames estuary wall so that it will withstand the next century’s worth of global warming without bankrupting the city in the process? The professionals actually answering these questions did not use probabilistic forecasts, and they seem to have done reasonably well for themselves.
Even beyond this kind of full-out rejection of forecasting, it is easy to find authors who think that forecasts on this timescale are not very reliable. For example, Paul Goodwin and George Wright (2010) argue that forecasting methods tend to fare poorly at predicting low-probability, high-impact events, just the kind of events that tend to crop up on longer timescales. (A similar position was popularized by Nassim Nicholas Taleb in his book The Black Swan, in the wake of the subprime mortgage crisis.)
Perhaps most strikingly, figures no less luminary than Daniel Kahneman (rest in peace!), Cass Sunstein and Olivier Sibony, reviewing the very dataset surveyed in Tetlock’s earlier paper, conclude in a recent book that:
Tetlock’s findings suggest that detailed long-term predictions about specific events are simply impossible. The world is a messy place, where minor events can have large consequences. For example, consider the fact that at the instant of conception, there was an even chance that every significant figure in history (and also the insignificant ones) would be born with a different gender. Unforeseeable events are bound to occur, and the consequences of these unforeseeable events are also unforeseeable. As a result, objective ignorance accumulates steadily the further you look into the future. The limit on expert political judgment is set not by the cognitive limitation of forecasters but by their intractable objective ignorance of the future.
If, in contrast to such a stance, longtermists want to suggest that long-term value impacts are often relatively tractable to properly trained forecasters, they will be standing in direct opposition to the consensus of experts in the field, including very likely Tetlock himself.
5. Conclusion
Today’s post introduced a second scope-limiting factor: washing out. Washing out occurs when the probabilities of positive and negative long-term axiological impacts exhibit significant symmetry about the origin. When washing out occurs, there is significant cancellation from the ex ante perspective in taking expected values, reducing the prospects for swamping longtermist options to emerge.
We explored two related strategies for motivating the idea that washing out often occurs. The first began from the popular Bayesian doctrine that complete ignorance should be represented by a symmetric prior. Since we usually find ourselves in a situation of evidential paucity in trying to predict the long-term future, updating should not take us far away from the symmetric prior.
A second strategy suggested that forecasting long-term value changes is hard. Because forecasting these changes is so hard, we should chalk up most of the apparent directionality in forecasts to noise rather than signal, giving them relatively less impact on our decisionmaking. We saw three reasons to think that forecasting long-term value changes is hard: we have limited and mixed track records of even moderate-term forecasting; forecasting value is significantly harder than forecasting the more tractable quantities that tend to show moderate predictability; and experts are skeptical.
When washing out holds together with our previous scope-limiting factor, rapid diminution, things begin to look dire for the swamping longtermist. If long-term value changes exhibit both rapid decay in their probabilities and significant symmetry about the origin, then it is hard to see how the expected long-term impacts of most acts could be large.
However, perhaps you are struck by the fact that there are millions, billions or perhaps even more possible acts out there. Surely one of them could be a swamping longtermist option? After all, it only takes one swamping longtermist option to ground swamping longtermism.
This line of thought motivates a third scope-limiting factor: option unawareness. The options we are aware of do not number in the millions or billions, but often in the dozens or single digits. I discuss the nature and relevance of this third scope-limiting factor in the next post.
