Detail of the figure "Night" from the tomb of Giuliano de' Medici. Michelangelo Buonarroti. Marble. 1523

Read the Reflection, written 8 August 2021, below the following original Transmission.

Human society has a lot of very, very hard decisions to make in the days ahead. These will require us to make a host of predictions: How will the epidemic spread if we do this versus that? How will the economy be affected if we follow that course of action rather than this one?

One of the major challenges in making these predictions is that they require us to specify the dynamic processes involved. Some of the models one can use to do this are based on equations, which are (typically) then approximated on a computer. Some models are instead based on massive simulations called “agent-based models,” which were pioneered, in large part, at the Santa Fe Institute.

Whatever model we use to predict the future, we have to specify the initial condition of the variables in those models. We need to specify the current state of affairs, quantified with numbers ranging from the value of R0 for the SARS-CoV-2 virus, to how many people have been furloughed rather than fired. In turn, to get those initial condition numbers, we need to convert some “noisy” data that we have gathered into a probability distribution over the initial condition numbers.

To illustrate the great challenge that we face, I’m going to describe why even just finding the distribution over the initial condition numbers for our models — never mind using those models to make the excruciating choices that await us — is fraught, with no right or wrong answer.

Converting noisy data into a probability distribution is the subject of the field of statistics. Broadly speaking, there are two branches of statistics, and they provide different guidance for how to form such a probability distribution. To understand the older (and recently resurgent) of the two, a little algebra helps.

Suppose we have two random variables, A and B. The probability that those variables take the values a and b simultaneously is P(A = a, B = b). What is the probability that A = a, no matter what value B has? This is called the “marginal distribution” of A, and if you think about it, it is just the sum of P(A = a, B = b) over all possible values b:

P(A = a) = Σ_b P(A = a, B = b).

Similarly, the marginal distribution for values of B is

P(B = b) = Σ_a P(A = a, B = b).

What is the probability that B will have the value b, given that A has the value a? If you (again) think about it a bit, this “conditional distribution” is just

P(B = b | A = a) = P(A = a, B = b) / P(A = a).

Just as the quadratic formula holds, just as the sum of any two odd numbers is an even number, and just as the product of two odd numbers is an odd number, the equations above mean that

P(B = b | A = a) = [P(A = a | B = b) × P(B = b)] / P(A = a).

The left-hand side of the equation is the same. And the denominator of the right-hand side is the same. All I have done is substitute, for the joint distribution in the numerator of the right-hand side, an equivalent expression — the probability of A given B, times the probability of B — which is just another way of writing the joint distribution.
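To make the algebra above concrete, here is a minimal Python sketch (my illustration, not part of the original Transmission) that checks these identities numerically on a small, made-up joint distribution. All of the numbers in it are arbitrary.

```python
# A minimal numerical check of the identities above, using a small,
# made-up joint distribution P(A, B) over A in {a1, a2} and B in {b1, b2}.
joint = {
    ("a1", "b1"): 0.10, ("a1", "b2"): 0.30,
    ("a2", "b1"): 0.40, ("a2", "b2"): 0.20,
}

def marginal_A(a):
    """P(A = a): the sum of the joint distribution over all values b."""
    return sum(p for (ai, _), p in joint.items() if ai == a)

def marginal_B(b):
    """P(B = b): the sum of the joint distribution over all values a."""
    return sum(p for (_, bi), p in joint.items() if bi == b)

def cond_B_given_A(b, a):
    """P(B = b | A = a) = P(A = a, B = b) / P(A = a)."""
    return joint[(a, b)] / marginal_A(a)

def cond_A_given_B(a, b):
    """P(A = a | B = b) = P(A = a, B = b) / P(B = b)."""
    return joint[(a, b)] / marginal_B(b)

# Bayes' theorem: P(B = b | A = a) = P(A = a | B = b) P(B = b) / P(A = a).
a, b = "a1", "b2"
lhs = cond_B_given_A(b, a)
rhs = cond_A_given_B(a, b) * marginal_B(b) / marginal_A(a)
print(round(lhs, 6), round(rhs, 6))  # both print 0.75
```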

This simple formula for converting a conditional distribution (A given B) into its “opposite” (B given A) is known as Bayes’ theorem.1 To illustrate it, suppose that there is a blood test for COVID-19 that ideally would say “+” if and only if one has the virus. Suppose it is 90 percent accurate, in the sense that the table for the conditional distribution P(test result | health status) is:

          Sick    Well
  +       0.9     0.1
  -       0.1     0.9

This table can be summarized by saying that the false positive rate is 0.1 and the false negative rate is 0.1. 

Suppose you get tested — and are positive. How scared should you be? According to P(test | health), you might think that there’s a 90 percent chance that you have the virus. But the truth is otherwise, and this is where the Bayes equation comes in.

Suppose that only 1 percent of the population is infected, so that — everything else being equal — the “prior” probability that you are sick, P(health = sick), is 1 percent. So, according to Bayes, the associated table for what you’re interested in, P(health status | test result), is (approximately):

          -        +
  Sick    0.001    0.1
  Well    0.999    0.9

For example, P(well | +) / P(sick | +) = [P(+ | well) × P(well)] / [P(+ | sick) × P(sick)] = 11, so P(well | +) ≈ 0.9. So there’s actually only a 10 percent chance that you’re sick — still not good, but certainly less frightening.
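For readers who want to redo that arithmetic without the rounding, here is a minimal Python sketch (again my addition, not the essay’s) that computes the exact posterior for the numbers used above: a 1 percent prior, and 0.1 false positive and false negative rates.

```python
# Exact posterior P(sick | +) for the test example above:
# prior P(sick) = 0.01, false positive rate = false negative rate = 0.1.
p_sick = 0.01                # prior probability of being sick
p_pos_given_sick = 0.9       # sensitivity: P(+ | sick)
p_pos_given_well = 0.1       # false positive rate: P(+ | well)

# Bayes' theorem: P(sick | +) = P(+ | sick) P(sick) / P(+),
# where P(+) is obtained by summing over both health statuses.
p_pos = p_pos_given_sick * p_sick + p_pos_given_well * (1 - p_sick)
p_sick_given_pos = p_pos_given_sick * p_sick / p_pos

print(round(p_sick_given_pos, 3))  # 0.083 -- roughly the 10 percent quoted above
```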

Bayes’ theorem has been elevated to the status of a scientific deity, viewed either as the source of all truth and light, or of unending evil. Why?

Note that to apply Bayes’ theorem we needed to know the prior. And in the case of the COVID-19 pandemic this is one of those estimates that we do not have; we do not know how many people in the population are infected. That is not just true in the example of blood tests; it is also true when (for example) using current data to set the initial condition numbers for our models for predicting the future course of the pandemic. Where do we get that prior from? In the case of blood tests it was relatively simple. But in more complicated scenarios — like formulating the probability distribution of the initial condition numbers for our models of how the pandemic will unfold — it can be a very difficult question. Answering this question, and using our answers to calculate what we want to know, is called “Bayesian statistics.”

End of story? Not quite. Bayes’ theorem embodies one of the deepest truths of life: garbage in, garbage out. Adopt a stupid prior, and you get a stupid answer. Not surprisingly then, Bayesian statistics was badly misused in the past, and produced many horrible results. Frustration with these results led people to create the main competitor to Bayesian statistics, called “frequentist statistics.”

Can we justify frequentist techniques as actually being Bayesian, just for some implicit prior? If so, might frequentist techniques actually be a way to generate implicit priors, without violating the laws of math? Well, no. Even one of the most reliable and most widely used frequentist tools — the “bootstrap” — can be proven not to agree with a Bayesian analysis for any prior.2
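For readers who have not met the bootstrap before, the sketch below shows the standard nonparametric bootstrap in Python. It is my own illustration of the general technique, with made-up data, and is not the specific construction analyzed in reference 2: resample the data with replacement many times, recompute the statistic of interest on each resample, and use the spread of those recomputed values as an estimate of the statistic’s sampling variability.

```python
import random
import statistics

def bootstrap_std_error(data, statistic=statistics.mean, n_resamples=2000, seed=0):
    """Estimate the standard error of `statistic` by resampling `data`
    with replacement and recomputing the statistic on each resample."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        resample = [rng.choice(data) for _ in data]  # n draws with replacement
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)

# Example: some made-up noisy measurements, and the bootstrap estimate
# of the standard error of their mean.
data = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3, 2.1]
print(bootstrap_std_error(data))
```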

This does not mean that we “should” use Bayesian statistics, in any normative sense, when we come up with the numbers to put into our models of the next year. (I myself am a great fan of the frequentist technique of the bootstrap.) Even if rather than feeding garbage into Bayes’ theorem we feed it ambrosia, we will still be making an assumption. If the virus — if our global economy — doesn’t happen to agree with our Bayesian assumption, it does not matter whether our mathematics is correct. There is no free lunch.

David Wolpert
Santa Fe Institute

REFERENCES

  1. Berger, James O. 2013. Statistical Decision Theory and Bayesian Analysis. Springer Science & Business Media.
  2. Wolpert, D.H. 1996. “The Bootstrap is Inconsistent with Probability Theory,” in Maximum Entropy and Bayesian Methods 1995, K. Hanson and R. Silver, eds. Kluwer Academic Press.

 


Read more posts in the Transmission series, dedicated to sharing SFI insights on the coronavirus pandemic.

Listen to SFI President David Krakauer discuss this Transmission in episode 30 of our Complexity Podcast.


Reflection

August 8, 2021

Looking Through Science-Tinted Glasses

At the current stage of development of the scientific enterprise, it is divided into a set of many different scientific fields. Each scientific field comes equipped with several different pairs of “glasses” with which to examine the natural world. If you wear such a pair of glasses and look around you, you will see a limited set of features in the landscape highlighted. Each scientific pair of glasses highlights a different set of features in the natural world.

As an example, if a scientist is looking at the world while wearing “Newton’s Laws” glasses, then every system around them is distorted to highlight the forces on that system, how fast that system is moving, and its rate of acceleration. In particular, if they look at a running horse, they see the forces the horse exerts on the ground (equal and opposite to the ones the ground exerts on the horse), the varying position, momentum, and acceleration of the horse, etc. As another example, if a scientist is looking at the world while wearing “Darwinian Selection” glasses, then every system around them is distorted to highlight the ancestry of that system, that is, previous systems that can be seen as the progenitors of the system. These glasses will also highlight how those previous instances of the system they’re looking at may have differed from the current instance, and what may have caused them to evolve into the current system. In particular, if someone wearing such glasses looks at a running horse, they see the phylogenetic tree behind the horse, the way that horses have evolved over tens of millions of years, etc.

Finally, it’s important to note that there are “coarse-grained” pairs of glasses that can be viewed as enveloping many of the other glasses. These coarse-grained glasses embody an entire scientific field. For example, there is a pair of glasses called “High-Energy Physics Theory” that embodies the scientific field of the same name. If a scientist wears those glasses, then they see all the features highlighted by the Newton’s Laws glasses—along with features highlighted by the “Relativity” glasses, features highlighted by the “Quantum Mechanics” glasses, and so on.

At the current stage of development of the scientific enterprise, there’s a lot to be gained by having individuals who walk, breathe, eat, and sleep all while wearing one specific field’s pair of glasses, and never wearing any other field’s pair of glasses. These people eventually become what are sometimes called “domain experts” in that scientific field. This means they are completely comfortable navigating the world while wearing that field’s pair of glasses. Such experts are able to see very subtle features in some of the objects in the natural world, and report on those subtle features back to the rest of us. In this way, we all gain from the dedication these people make of their entire intellectual lives to one particular, restricted pair of glasses.

However, there is also much to be gained by flipping back and forth among many pairs of glasses, viewing the world in many ways. Such flipping among pairs of scientific glasses is what “multidisciplinary science” is supposed to be all about. Indeed, in what is essentially an evolutionary process, new “offspring” pairs of scientific glasses are often produced by flipping back and forth between some “parent” pairs of glasses very quickly. (As a completely gratuitous aside, incessant flipping back and forth among many different scientific glasses is the central feature of the Santa Fe Institute.)

Everything above was written while wearing the glasses called “Sociology/History of Science.” Now, let me take those glasses off, and start flipping among many different ones.

In my Transmission, I first considered the COVID-19 virus while wearing a recently invented pair of scientific glasses called “Thermodynamics of Computation.” If you wear these glasses, all dynamic processes you see around you have two central features. The first feature is the process’ thermodynamic behavior, that is, how quickly the process uses energy as it implements its dynamics. The second feature is what precise computation is implemented by that dynamics. Wearing Thermodynamics of Computation glasses, all the dynamic processes you see around you involve an interplay between their rate of energy usage and the computation they are implementing. Indeed, the thermodynamics of computation teaches us that those two quantities are intimately related—in order for a dynamical system to implement more computation, it must pay for it, with a greater rate of energy usage. As an example, looking at a running horse while wearing these glasses, one notices the computation the horse is continually doing in both its brain and its limbs, of how to run, along with the energetic cost of that running.
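As a very rough quantitative anchor for that trade-off between computation and energy (my addition here, not something stated in the Reflection), one can appeal to the well-known Landauer bound: erasing a single bit of information must dissipate at least k_B T ln 2 of energy, which at room temperature is on the order of 3 × 10⁻²¹ joules per bit.

```python
import math

# Landauer's bound: erasing one bit costs at least k_B * T * ln(2) joules.
# (A rough illustrative anchor; real devices dissipate far more than this.)
k_B = 1.380649e-23      # Boltzmann constant, in J/K
T = 300.0               # roughly room temperature, in K
energy_per_bit = k_B * T * math.log(2)
print(energy_per_bit)   # ~2.87e-21 J per erased bit
```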

After this preamble involving the Thermodynamics of Computation glasses, I replaced them with a different pair of scientific glasses, called the “Extended Phenotype” glasses. When you wear the Extended Phenotype glasses, the line between living organisms and the world outside of them gets blurry. Any organism will affect the dynamics of the part of the universe that contains its cells, that is, that contains small bags of protoplasm enveloping copies of its DNA. But any organism will also affect the part of the universe that does not contain its cells, the part of the universe typically called the “environment” of the organism. The “Conventional Phenotype” glasses are concerned with how the organism affects the more restricted part of the universe, containing its cells. The Extended Phenotype glasses enlarge that to consider how the organism affects the entire universe, including the environment of the organism.

One of the most prominent features one notices when wearing the Extended Phenotype glasses is how an organism affects its environment in ways that ultimately benefit that organism’s genome, either directly, in the present, or indirectly, in the future. A standard example is how a beaver (organism) building a beaver dam (environment affected by the organism) affects the beaver’s genome (helping the beaver survive). Another example is how the leader of a social group (organism) accumulates a large harem (environment affected by the organism) and so produces many progeny (the future genome of the organism).

The Extended Phenotype glasses can also highlight how an entire population, or species, affects its environment. (In this case, one considers the “aggregate” genome defining the population as a whole rather than the precise genome defining a specific organism in that population.)

When one sees this feature, one is often led to put the Darwinian Selection glasses on over the Extended Phenotype glasses, to consider the possible adaptive fitness value of the extended phenotype. This is particularly compelling if one is seeing this feature in an entire population rather than just a single organism in that population. (And yes, wearing one pair of glasses over another pair of glasses can be awkward, sometimes resulting in both pairs of glasses falling off—nobody said multidisciplinarity would be easy!)

So, what do you see when you wear the Extended Phenotype glasses on top of the Thermodynamics of Computation glasses, rather than flipping between them? You see populations that are constrained in how much computation they can do by themselves, due to the associated energetic costs, and that affect the environment by changing the computation that the environment does, with the associated energetic costs borne by the environment. One of the most striking examples of this kind of computational extended phenotype is exhibited by the COVID-19 virus (with the epigenome of the virus playing the role of an aggregate genome). We humans are part of the environment of the virus, and the virus affects us. In fact, it induces our immune systems to perform the computation (and bear the associated energetic cost) of replicating the virus. It then gets human society as a whole to perform the subsequent computation (and again bear the associated energetic cost) of figuring out how to spread the virus to new hosts. This is obviously beneficial to the (epi)genome of the virus. So, the whole story still holds together when we put the Darwinian Selection glasses on as well, on top of the Extended Phenotype glasses and the Thermodynamics of Computation glasses. (The practitioners of multidisciplinarity look very odd to other people, wearing so many glasses at once, but we practitioners don’t mind.)

This was the theme of my Transmission. However, I realized after writing that essay that there was a bit of a mystery concerning this example of COVID-19 offloading computations and associated energetic costs on its environment. Almost all real-world computational systems are modular and hierarchical, whether designed by humans (e.g., digital circuits) or constructed via natural selection (e.g., brains, genetic networks). There are many reasons for this. Both computers designed by humans and those constructed via natural selection benefit from the fact that modularity helps minimize the costs (both energetic costs and material costs) of communication among the subsystems of the computer. In addition, both types of computers benefit from the “evolvability” of hierarchical, modular design, as elaborated in arguments stretching back to Herb Simon.1 Other benefits include robustness against noisy components/component failure (particularly important for human-designed computers). In addition, one can argue that hierarchy is actually almost inevitable in computers, since it often arises by frozen accidents, especially in natural selection (so-called “accretional software construction”).

At first I thought that the COVID-19 computation, offloaded onto us convenient humans, was an exception to this rule: that there is no sense in which the computation of propagating the virus’ epigenome is implemented in a hierarchical, modular system. Once one thinks about it, though, one realizes that by offloading the computation of propagating itself onto individual humans, the virus exploits the finer-grained modularity and hierarchy of the components of each human’s immune system. Moreover, the lowest level of the hierarchy that forms an individual human’s immune system consists of the individual cells in that immune system. Each of those cells comprises a set of interacting organelles, performing a joint computation. So, at a yet finer-grained level, the computation of a human’s immune system that is doing the bidding of COVID-19 is constructed on top of the hierarchical, modular computation among the organelles within each cell in the human immune system. We can also go in the other direction, up to more coarse-grained levels than individual humans and their immune systems. After all, the virus also exploits the modularity and hierarchy of human social systems—the hierarchies built on top of the individual humans—to help it spread even further.

That’s the COVID-19 extended phenotype computer: an aggregate hierarchical modular computer. That computer has organelles inside of individual cells at its lowest, most fine-grained level, at its smallest physical scale (the same physical scale as an individual COVID-19 virus). Those cell-computers are then aggregated into the components of the immune systems of individual humans at a higher level of hierarchy, to form a higher-level computational system. And those individual human computers are in turn aggregated into the components of the human sociopolitical hierarchy at even higher, more coarse-grained levels. Astonishing!

Part of my current research concerns precisely the thermodynamics of computation in hierarchical, modular systems. By contemplating the amazing extended phenotype of COVID-19, I have gained a completely new perspective on this issue. This is one of the ways that the pandemic has affected my research.

Read more thoughts on the COVID-19 pandemic from complex-systems researchers in The Complex Alternative, published by SFI Press.


Reflection Footnotes

1 H. Simon, 1962, “The Architecture of Complexity,” Proceedings of the American Philosophical Society 106(6): 467-482.