The set of all 2^L bit strings of length L will be called the pathogen universe. A subset of it, which I call the training set, is used to train the antibody libraries. For antibody and pathogen bit strings of length L = 16, I generated, by sampling with replacement, training sets of size 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 4096, and 16384. Using these sets, I then evolved gene libraries of size A = 8, as previously described. I further investigated two types of pathogen dynamics. These are meant to correspond to:
To assess library structure, I use an observation of Hightower (1996). Investigating the type of library that evolves when the pathogen set is very large, the author conjectured that the antibodies tend to maximize the average Hamming distance to the other antibodies in the library. I can, in fact, determine what this distance will be, and then ask whether this strategy is employed both by libraries that evolve in large, static pathogenic environments and by libraries that evolve in small, rapidly changing ones.
The average pairwise Hamming distance within a library is given by
Though having maximal average Hamming distance between the genes in the library seems to be a necessary condition for maximal fitness, it is not sufficient. Clearly, a library of size A = 8 composed of four copies of a string and four copies of its complement has maximal average pairwise Hamming distance, but it is far from being optimal. It is unclear what other condition needs to be fulfilled for a library to achieve maximal fitness.
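The counterexample above can be checked numerically. The following is a minimal sketch, not the code used for the experiments: the function names and the example string are hypothetical, and bit strings are represented as Python integers. It relies on the observation that, at any single bit position, k ones and A - k zeros produce k(A - k) differing pairs, which is largest for k = A/2; for A = 8 and L = 16 this gives an upper bound of 16L/28 = 64/7, roughly 9.14, on the average pairwise Hamming distance.

```python
from itertools import combinations


def hamming(a, b):
    """Number of bit positions at which the integers a and b differ."""
    return bin(a ^ b).count("1")


def avg_pairwise_hamming(library):
    """Mean Hamming distance over all unordered pairs of library entries."""
    pairs = list(combinations(library, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def max_avg_pairwise_hamming(A, L):
    """Upper bound on the average pairwise distance for A strings of length L.

    At each bit position, k ones and A - k zeros give k * (A - k) differing
    pairs, which is maximal when k = A // 2.
    """
    k = A // 2
    n_pairs = A * (A - 1) / 2
    return L * k * (A - k) / n_pairs


L, A = 16, 8
s = 0b1010001111000101            # arbitrary example string (hypothetical)
s_bar = s ^ ((1 << L) - 1)        # its bitwise complement within L bits
degenerate = [s] * 4 + [s_bar] * 4

print(avg_pairwise_hamming(degenerate))   # four copies plus four complements
print(max_avg_pairwise_hamming(A, L))     # theoretical upper bound
```

Both printed values are 256/28, so the degenerate library attains the bound even though it contains only two distinct antibodies, which illustrates why the condition cannot be sufficient.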
Let us now return to the question of whether the libraries learn to recognize the pathogens on which they have been trained, or whether they evolve so as to maximize recognition of a random molecular shape. I used the libraries evolved in the experiments described above to determine their expected fitness on a random shape in the universe. That is, I determined the average fitness of the libraries over all pathogen bit strings of length L = 16.
The figure shows the results. For static training sets (upper curve), 100 runs were used for training set sizes 8, 16, 32, and 64; 50 runs for sizes 128 and 256; 25 runs for size 512; 10 runs for size 1024; and 5 runs each for sizes 4096 and 16384. For changing training sets, 10 runs were performed for each training set size, except for size 4096, for which 6 runs were used. As the figure shows, the most important determinant of the fitness relative to a random pathogen is the fraction of the pathogen universe that a host encounters in one generation. If this fraction is large, the fitness of the evolved library is high, independent of the pathogen dynamics. This is not surprising: in the limit in which the training set is the pathogen universe itself, these scenarios are indistinguishable. Libraries evolved on small but variable training sets perform worse on a random pathogen than libraries evolved on large, static training sets (or on large but dynamic pathogen sets). This shows that small, dynamic pathogenic environments do not allow optimal placement of antibodies in the space of molecular shapes. On the other hand, libraries that evolve in environments with few pathogens have a higher expected performance on random pathogens if the environment in which they evolve is dynamic. The reason is that the static environment supports the evolution of very specialized libraries, while the dynamic environment essentially maintains random antibody libraries. A somewhat similar idea was reported by Hightower (1996), who found that stochastic antibody expression induces libraries that are more robust in handling a random pathogen.
The corresponding figure summarizes these results from a somewhat different perspective, namely, how much the fitness of an evolved library on its training set differs from its fitness on a random subset of the same size drawn from the pathogen universe. Consider an evolved library of fitness f relative to the training set. Its fitness relative to a random pathogen in the universe can be calculated by averaging the fitness of the library over all pathogens in the universe. Let us denote this fitness by f0. If we draw a random subset of P pathogens from the pathogen universe, the expected fitness of the library relative to this subset is still f0, while the variance in fitness relative to such a subset is a fraction 1/P of the variance relative to a single random pathogen. I chose the value of the z-statistic as the indicator of significantly higher performance on the training set.
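The comparison just described can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the fitness function is defined elsewhere in the text, so the sketch assumes that the match score is the number of complementary bits, that a library is scored on a pathogen by its best-matching antibody, and that the z-statistic takes the usual form z = (f - f0) / sqrt(var0 / P), where var0 is the variance of the fitness over the universe. All function names are hypothetical.

```python
import math
import random


def match(antibody, pathogen):
    # ASSUMPTION: match score = number of complementary bit positions.
    return bin(antibody ^ pathogen).count("1")


def fitness_on(library, pathogen):
    # ASSUMPTION: a library is scored by its best-matching antibody.
    return max(match(a, pathogen) for a in library)


def universe_stats(library, L):
    """Mean f0 and variance var0 of the fitness over all 2**L pathogens."""
    scores = [fitness_on(library, p) for p in range(1 << L)]
    f0 = sum(scores) / len(scores)
    var0 = sum((s - f0) ** 2 for s in scores) / len(scores)
    return f0, var0


def z_statistic(library, training_set, L):
    """z-score of the training-set fitness f against the universe mean f0.

    The mean over P pathogens drawn with replacement has variance var0 / P.
    Raises ZeroDivisionError if the fitness is constant over the universe.
    """
    P = len(training_set)
    f = sum(fitness_on(library, p) for p in training_set) / P
    f0, var0 = universe_stats(library, L)
    return (f - f0) / math.sqrt(var0 / P)


# Usage: an unevolved random library scored on a random training set.
rng = random.Random(0)
L, A, P = 16, 8, 64
library = [rng.getrandbits(L) for _ in range(A)]
training_set = [rng.getrandbits(L) for _ in range(P)]
print(z_statistic(library, training_set, L))
```

For a library that has not specialized on its training set, the statistic should typically be small in magnitude; a large positive value would indicate that the library performs better on the training set than on random subsets of the same size.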
As expected, when the training set is large, the libraries are confronted with essentially the complete pathogen universe at every generation, and the three scenarios for pathogen dynamics are indistinguishable. The curves converge to a regime of training-set independence, essentially because any very large pathogen set will be a permutation of the training set, and the fitness does not depend on the order in which pathogens are presented.
The regime of training-set-independent fitness is reached faster when pathogens change slowly (mutation rate 0.1 per pathogen per generation of hosts). The libraries optimize their coverage of the pathogen universe, as judged by the average pairwise Hamming distance between the antibodies in the library. In all 6 independent runs with a training set of size 4096, the average pairwise Hamming distance in the evolved library was higher than 9. As I showed before, this value is significantly higher than one would expect for a random library.
Training-set independence of the fitness of the evolved library characterizes all libraries evolved in highly dynamic pathogenic environments. As I showed before, however, small and dynamic training sets promote libraries of essentially random antibodies, which makes their fitness on random pathogen sets indistinguishable from their fitness on the training set. Because these libraries do not specialize, their fitness on a random pathogen is nevertheless higher than it would be had they evolved in a small, static pathogenic environment.
I briefly return to the question of whether the immune system might construct its receptors so as to recognize as many molecular shapes as possible. This hypothesis stemmed from the observation that challenging the immune system with artificially constructed molecules gives rise to immune responses. How would we explain these findings under the hypothesis that the immune system is selected by pathogens that affect the survival of individuals?
Two issues merit discussion. The first is whether the immune system optimizes its recognition of random pathogens; the second is whether it optimizes its recognition of the molecular shape space. The answer to the first question is that, if pathogens are independent of one another, the immune system needs to be presented with a large fraction of the pathogen universe at each generation to be able to optimize its recognition of random pathogens. This fraction is somewhat lower if pathogens also evolve from one generation of hosts to the next (the condition that the pathogen set be considerably larger than the antibody library still has to hold).
Regarding the recognition of the molecular shape space, we would probably need to do the following experiment. Assuming that the pathogen universe is a fraction p of the molecular shape space, we may distribute the pathogens in the space in different ways. The two extremes are: