
The strategy of evolved libraries

What strategy do the relatively small antibody libraries evolve for matching the much larger set of pathogens? If the pathogen set were small, we would expect the antibodies to evolve to track the pathogens perfectly, so that the structure of the antibody library directly reflects the structure of the pathogen set. What we do not know is what strategies these libraries develop when confronted with a pathogen set much larger than the library, or with a highly dynamic pathogen set. In the first scenario, it would be impossible to track pathogens individually. In the second scenario, the ability to track pathogens individually probably depends on the rate of evolution of the pathogens relative to that of the antibody library. To investigate the type of library structure that evolves in these cases, I performed the following evolutionary algorithm experiments.

The set of all $2^L$ bit strings will be denoted as the pathogen universe. A subset of it, which I call the training set, is used for training the antibody libraries. For antibody and pathogen bit strings of length L = 16, I generated, by sampling with replacement, training sets of size 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 4096, and 16384. Using these sets, I then evolved gene libraries of size A = 8, as previously described. I further investigated two types of pathogen dynamics. These are meant to correspond to:

1. pathogenic environments that change from one generation of hosts to another, and
2. individual pathogens slowly drifting in the molecular shape space.
I simulated the first type of dynamics by replacing one eighth of the training set at each generation of hosts. I implemented the second type of dynamics by mutating each pathogen in the training set with probability 0.1 per pathogen per generation of hosts. The exact values of these parameters are arbitrary: the intent is not to give quantitative predictions, but to understand the qualitative behavior of the libraries under the two types of pathogen dynamics.
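The two types of dynamics can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the experiments: pathogens are encoded as integers, the function names are hypothetical, the default fraction of 1/8 follows the P = 8 example in which one pathogen is replaced per generation, and a "mutation" is assumed to flip a single randomly chosen bit, which the text does not specify.

```python
import random

L = 16  # bit-string length used in the text


def random_pathogen(rng, length=L):
    """Draw one pathogen uniformly from the 2**length bit strings."""
    return rng.getrandbits(length)


def replace_fraction(training_set, rng, fraction=1 / 8, length=L):
    """Changing environment: swap a fraction of the set for fresh random pathogens."""
    new_set = list(training_set)
    # Assumption: at least one pathogen is replaced, even in very small sets.
    n_replace = max(1, round(fraction * len(new_set)))
    for idx in rng.sample(range(len(new_set)), n_replace):
        new_set[idx] = random_pathogen(rng, length)
    return new_set


def drift(training_set, rng, rate=0.1, length=L):
    """Slow drift: each pathogen mutates with probability `rate` per generation.

    Assumption: a mutation flips exactly one randomly chosen bit.
    """
    new_set = []
    for p in training_set:
        if rng.random() < rate:
            p ^= 1 << rng.randrange(length)
        new_set.append(p)
    return new_set


rng = random.Random(0)
training_set = [random_pathogen(rng) for _ in range(8)]
after_replacement = replace_fraction(training_set, rng)
after_drift = drift(training_set, rng)
```

Either function would be called once per host generation, before the libraries are evaluated on the updated training set.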

To assess library structure, I use an observation of Hightower (1996). Investigating the type of library that evolves when the pathogen set is very large, the author conjectured that the antibodies tend to maximize the average Hamming distance to other antibodies in the library. I can, in fact, determine what this maximal distance is, and then ask whether this strategy is employed both by libraries that evolve in large, static pathogenic environments and by libraries that evolve in small, rapidly changing pathogenic environments.

The average pairwise Hamming distance within a library is given by

\begin{displaymath}\left<h\right> = \frac{2}{A(A-1)} \sum_{i = 1}^A \sum_{j = i+1}^A
h(a_i,a_j)\end{displaymath}

where $A$ is the number of antibodies in the library, and $a_i$ and $a_j$ are individual antibodies. The Hamming distance between two antibodies, $h(a_i,a_j)$, is given by:


\begin{displaymath}h(a_i,a_j) = \sum_{k = 1}^L \delta(a_i^k, a_j^k)\end{displaymath}

where $a_i^k$ and $a_j^k$ denote the $k$th bit of the two strings, and


\begin{displaymath}\delta(a_i^k, a_j^k) = \left\{ \begin{array}{ll}
1 & \mbox{if $a_i^k \neq a_j^k$} \\
0 & \mbox{otherwise}
\end{array} \right.\end{displaymath}
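As a minimal sketch, the two definitions above translate directly into Python. The function names and the toy library are illustrative assumptions; antibodies are represented here as strings of '0' and '1' characters.

```python
from itertools import combinations


def hamming(a, b):
    """Number of bit positions at which the two strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def average_pairwise_hamming(library):
    """Mean Hamming distance over all A(A-1)/2 unordered pairs."""
    pairs = list(combinations(library, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


library = ["0000", "0011", "1111"]  # toy library with A = 3, L = 4
print(average_pairwise_hamming(library))  # (2 + 4 + 2) / 3
```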

We may now switch the order of summations to obtain:


\begin{displaymath}\langle h \rangle = \frac{2}{A(A-1)} \sum_{k = 1}^L \sum_{i = 1}^A
\sum_{j = i+1}^A \delta(a_i^k, a_j^k)\end{displaymath}

and since the bit positions contribute independently, maximizing this quantity means maximizing the sum of pairwise Hamming distances at each bit position. If for bit position $k$ we denote by $n_0$ the number of 0's in the antibody population at that position, then the sum of pairwise Hamming distances at that position is $n_0 (A - n_0)$, since only pairs consisting of a 0 and a 1 contribute. This quantity is maximal for $n_0 = A/2$. Substituting into the above equation, we obtain the maximal average Hamming distance in the population:


\begin{displaymath}\langle h \rangle = \frac{L A}{2(A-1)}. \end{displaymath}
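The substitution can be spelled out in one line. Assuming $A$ is even, so that $n_0 = A/2$ is attainable at every bit position, the inner double sum is bounded at each position $k$, and summing over the $L$ positions gives the result:

\begin{displaymath}
\sum_{i = 1}^A \sum_{j = i+1}^A \delta(a_i^k, a_j^k) = n_0 (A - n_0) \le \frac{A^2}{4},
\qquad
\langle h \rangle \le \frac{2}{A(A-1)} \cdot L \cdot \frac{A^2}{4} = \frac{L A}{2(A-1)}.
\end{displaymath}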

For libraries of 8 antibodies of length 16, the maximal average pairwise Hamming distance is therefore $128/14 \approx 9.1429$. Let us now return to the two types of pathogenic environments: a large, static training set (of size $P = 2^{12}$), and a small training set (size $P = 8$), with one pathogen being replaced by a random other at each generation of hosts. All 5 libraries evolved on the large, static training set had an average pairwise Hamming distance of 9, whereas in 9 out of 10 libraries evolved with a dynamic training set, the average pairwise Hamming distance in the library was 8 (in the one remaining case it was 7).

To determine the significance of this difference, I constructed $10^6$ random libraries of 8 antibodies, and calculated the average pairwise Hamming distance in each of these libraries. I used these values to construct the distribution of the average pairwise Hamming distance for random libraries. It is not surprising that the libraries that were evolved on a small, dynamic training set cannot be distinguished (using the average pairwise Hamming distance statistic) from random libraries. On the other hand, the libraries evolved on large, static training sets have a significantly higher average pairwise Hamming distance than random libraries of the same size ($p$-value $= 1.1 \times 10^{-5}$). I thus conclude that a small, dynamic training set does not allow the antibodies to distribute themselves in shape space so as to optimally cover the pathogen universe.
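The construction of the null distribution can be sketched as follows. This is an illustration rather than the original procedure: the function names are hypothetical, antibodies are encoded as integers, far fewer than $10^6$ libraries are sampled to keep the run short, and the one-sided tail fraction used as a p-value is an assumption, since the text does not state how the p-value was computed.

```python
import random
from itertools import combinations


def average_pairwise_hamming(library):
    """Mean Hamming distance over all unordered pairs of integer-coded strings."""
    pairs = list(combinations(library, 2))
    return sum(bin(a ^ b).count("1") for a, b in pairs) / len(pairs)


def null_distribution(n_libraries, A=8, L=16, seed=0):
    """Statistic for n_libraries libraries of A uniformly random L-bit strings."""
    rng = random.Random(seed)
    return [
        average_pairwise_hamming([rng.getrandbits(L) for _ in range(A)])
        for _ in range(n_libraries)
    ]


def empirical_p_value(observed, null_values):
    """Assumed one-sided test: fraction of random libraries at or above `observed`."""
    return sum(v >= observed for v in null_values) / len(null_values)


null_values = null_distribution(10_000)  # the text uses 10**6 libraries
print(sum(null_values) / len(null_values))  # close to L / 2 = 8

observed = 9.0  # hypothetical observed statistic, for illustration only
print(empirical_p_value(observed, null_values))
```

With so few samples the tail estimate is coarse; resolving very small p-values requires a sample on the order of the $10^6$ libraries used in the text.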

Though having maximal average Hamming distance between the genes in the library seems to be a necessary condition for maximal fitness, it is not sufficient. Clearly, a library of size A = 8 composed of four copies of a string and four copies of its complement has maximal average pairwise Hamming distance, but it is far from being optimal. It is unclear what other condition needs to be fulfilled for a library to achieve maximal fitness.
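The counterexample is easy to check numerically. The sketch below uses an arbitrarily chosen 16-bit string (an assumption for illustration) and confirms that four copies of it plus four copies of its complement reach the bound $LA/(2(A-1))$ while containing only two distinct antibodies.

```python
from itertools import combinations


def average_pairwise_hamming(library):
    """Mean Hamming distance over all unordered pairs of equal-length bit strings."""
    pairs = list(combinations(library, 2))
    return sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)


L, A = 16, 8
s = "0110100110010110"  # arbitrary 16-bit string
complement = "".join("1" if bit == "0" else "0" for bit in s)
library = [s] * 4 + [complement] * 4

bound = L * A / (2 * (A - 1))
print(average_pairwise_hamming(library), bound)  # both equal 128/14
print(len(set(library)))  # 2 distinct antibodies
```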

Let us return now to the question of whether the libraries learn to recognize the pathogens on which they have been trained, or whether they evolve so as to maximize recognition of a random molecular shape. I used the libraries that I evolved in the experiments described above to determine their expected fitness relative to a random shape in the universe. That is, I determined the average fitness of the libraries on all pathogen bit strings of length L = 16.


  \begin{figure}% latex2html id marker 390
\centerline{\epsfxsize=8cm \epsfbox{com...
...anging training set (green), rapidly
changing training set (blue).}\end{figure}

Fig. [*] shows the results. For static training sets (upper curve), 100 runs were used for training set sizes 8, 16, 32, and 64; 50 runs for training set sizes 128 and 256; 25 runs for training set size 512; 10 runs for training set size 1024; and 5 runs for training set sizes 4096 and 16384. For changing training sets, 10 runs were performed for each training set size, with the exception of training set size 4096, for which 6 runs were used.

As the figure shows, the most important determinant of the fitness relative to a random pathogen is the fraction of the pathogen universe that a host encounters in one generation. If this fraction is large, the fitness of the evolved library is high, independent of the pathogen dynamics. This is not surprising: in the limit of the training set being the pathogen universe itself, these scenarios are indistinguishable. Libraries evolved on small but variable training sets have lower performance on a random pathogen than libraries that evolved on large, static training sets (or on large but dynamic pathogen sets). This shows that small, dynamic pathogenic environments do not allow optimal placement of antibodies in the space of molecular shapes. On the other hand, the libraries that evolve in environments with few pathogens have a higher expected performance on random pathogens if the environment in which they evolve is dynamic. The reason is that the static environment supports the evolution of very specialized libraries, while the dynamic environment essentially maintains random antibody libraries. A somewhat similar idea was reported by Hightower (1996), who found that stochastic antibody expression induces libraries that are more robust in handling a random pathogen.

Fig. [*] summarizes these results from a somewhat different perspective: how different is the fitness of an evolved library with respect to the training set from its fitness with respect to a random subset of the same size taken from the pathogen universe? Consider $\mathcal{A}$, an evolved library of fitness $f$ relative to the training set. Its fitness relative to a random pathogen in the universe can be calculated by averaging the fitness of $\mathcal{A}$ over all pathogens in the universe. Let us denote this fitness by $f_0$. If we take a random subset of $P$ pathogens from the pathogen universe, the expected fitness of $\mathcal{A}$ relative to this subset is still $f_0$, while the variance in fitness relative to such a subset is a fraction $1/P$ of the variance relative to a single random pathogen ($\sigma^2$). I chose the value of the z-statistic as the indicator of significantly higher performance on the training set:


\begin{displaymath}Z = \frac{f - f_0}{\sigma/\sqrt{P}}.\end{displaymath}
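A minimal sketch of this computation is given below. The function name and its inputs are assumptions made for illustration: the library's fitness on every pathogen in the universe is taken as a precomputed list, since the fitness function itself is defined elsewhere, and the numbers in the example are toy values rather than experimental results.

```python
import math


def z_statistic(f_train, universe_fitness, P):
    """Z = (f - f0) / (sigma / sqrt(P)) for a library with training-set fitness f_train.

    universe_fitness holds the library's fitness on every pathogen in the universe;
    P is the training set size.
    """
    n = len(universe_fitness)
    f0 = sum(universe_fitness) / n  # fitness relative to a random pathogen
    variance = sum((x - f0) ** 2 for x in universe_fitness) / n  # population variance
    sigma = math.sqrt(variance)
    return (f_train - f0) / (sigma / math.sqrt(P))


# Toy numbers, not taken from the experiments.
universe_fitness = [0.2, 0.4, 0.6, 0.8]
print(z_statistic(0.7, universe_fitness, P=4))
```

The sketch assumes the fitness values are not all identical; otherwise $\sigma = 0$ and the statistic is undefined.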

The results, for library size A = 8, and string length L = 16, are plotted in Fig. [*]. The upper curve corresponds to static training sets. The middle curve corresponds to training sets that change slowly through mutation of individual pathogens. Finally, the lower curve corresponds to the situation when pathogens in the training set are replaced by random others from one host generation to the next.

As we expect, when the training set is large, the libraries are confronted with essentially the complete pathogen universe at every generation. The three scenarios for pathogen dynamics are indistinguishable. The curves converge to a regime of training set-independence, essentially because any pathogen set of very large size will be a permutation of the training set, and the fitness does not depend on the order in which pathogens are presented.

The regime of training set-independent fitness is reached faster when pathogens change slowly (mutation rate 0.1 per pathogen per generation of hosts). The libraries optimize their coverage of the pathogen universe, as judged by the average pairwise Hamming distance between the antibodies in the library. In all 6 independent runs with a training set of size 4096, the average pairwise Hamming distance in the evolved library was higher than 9. As I showed before, this value is significantly higher than one would expect for a random library.

Training-set independence of the fitness of the evolved library characterizes all libraries evolved in highly dynamic pathogenic environments. However, as I showed before, small and dynamic training sets promote libraries of essentially random antibodies. This makes their fitness on random pathogen sets indistinguishable from the fitness on the training set. However, given that these libraries do not specialize, their fitness on a random pathogen is higher than if the libraries were evolved in a static, small, pathogenic environment.


  \begin{figure}% latex2html id marker 404
\centerline{\epsfxsize=8cm \epsfbox{zst...
... independent runs for each pathogen set size is given in
the text.}\end{figure}

I briefly return to the question of whether the immune system might construct its receptors so as to recognize as many molecular shapes as possible. This hypothesis stemmed from the observation that challenging the immune system with artificially constructed molecules gives rise to immune responses. How would we explain these findings under the hypothesis that the immune system is selected by pathogens that affect the survival of individuals?

There are two issues that merit discussion. The first is whether the immune system optimizes its recognition of random pathogens; the second is whether it optimizes its recognition of the molecular shape space. The answer to the first question is that, if pathogens are independent of one another, the immune system needs to be presented with a large fraction of the pathogen universe at each generation to be able to optimize its recognition of random pathogens. This fraction is somewhat lower if pathogens also evolve from one generation of hosts to the next (the condition that the pathogen set is considerably larger than the antibody library still has to be maintained).

Regarding the recognition of the molecular shape space, we would probably need to do the following experiment. Assuming that the pathogen universe is a fraction p of the molecular shape space, we may distribute the pathogens in the space in different ways. The two extremes are:

We expect that the antibodies that will evolve in these two situations would have very different performance on a random molecular shape. Namely, the recognition of a random molecular shape will be higher if the pathogens are scattered through the space.


Mihaela Oprea
1999-04-11