Santa Fe Institute

The evolution of complexity and intelligence on earth

Team lead: David Krakauer, Director, Wisconsin Institute for Discovery, University of Wisconsin-Madison; External Professor, Santa Fe Institute

This project seeks to explore the long history of life on earth in relation to the emergence of increasing complexity across key biological lineages. In practice, this means identifying the general conditions under which organisms are selected to increase the amount of information they encode about their surroundings and about other organisms, and characterizing the diversity of mechanisms that have evolved to accomplish these goals.

We are currently experiencing a “data deluge” in comparative biology (Aldhous 1993, Bassi and Denazis 2008), with unprecedented amounts of genomic sequence, expression, and proteomic data sampled from a very diverse set of biological lineages. As of 2010, around 4,600 genomes, chromosomes, and plasmids had been sequenced. These sequences are deposited in publicly available databases that include representatives of all five major kingdoms of life as well as the viruses. GenBank alone, the NIH-supported genetic sequence database, currently holds some 136,000,000 sequence records spanning 380,000 organisms. Of perhaps greater relevance to this project are databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG; Kanehisa 2007), a reference knowledge base for linking genomes to function through PATHWAY mapping, in which genomic or transcriptomic content is mapped onto reference pathways in order to infer systemic behaviors of the cell or organism, and the Gene Expression Database (GXD; Hill et al. 2004a), which stores information from a variety of gene expression assays.

For us, this means the ability to make a large number of quantitative statements about species that can all be placed in ancestor-descendant relationships, and thereby to track systematic variation through time in gene content, expression patterns, and critical sets of shared functional capabilities.

Previous research on the evolution of complexity has proceeded by picking functional traits and plotting time series to reveal systematic trends across lineages. Examples include increases in cell size and diversity (Bonner 1988, Valentine 2000), increases in genome size (Petrov 2001), changes in tissue form and pigmentation (Mellgren 2002), changes in shell morphology (Saunders and Work 1999), changes in the number of discrete taxonomic features (Schopf et al. 1975), and the smallest number of phenotypic dimensions that capture variation across lineages (Pie and Weitz 2005).

There has been a parallel, neutral approach to the evolution of complexity, which views increasing numbers of components (genes or traits) as a statistical consequence of basic genetic mechanisms of mutation and duplication (Crow 2006, McShea 2005) or of increasing genetic interactions through epistasis (Frenken 2006, Borenstein and Krakauer 2008).

In both cases there is an awareness that complexity is often self-limiting: as the number of components increases, so do constraints on diversification (Wagner and Altenberg 1996, Orr 2000). And whereas lineages can never fall below the minimum complexity required to sustain life (Mushegian 1996, Luisi 2002), there is no known upper bound on complexity, so the maximum might be increasing simply through incremental sampling of possible variation over long stretches of time, without any systematic drivers of complexity.

Here we take a different approach, building on recent progress in the study of evolutionary dynamics (Nowak 2006), information theory, the theory of formal languages, machine learning, and scaling. Strikingly, only recently have we come to appreciate that each of these frameworks seeks to answer a very similar question: how, starting from simple initial conditions or a position of maximum ignorance, a sequence of carefully chosen steps can increase the information that a system encodes. This problem has its roots in classical problems of undecidability in mathematics, formalized in terms of string rewriting systems, where one wants to determine whether one can sort among a set of inputs to yield a desired output (Post 1947).
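
As a toy illustration of the rewrite-system framing (the rules and strings below are hypothetical, and the search is bounded precisely because reachability in general rewriting systems is undecidable), one can ask by brute force whether a desired output string is reachable from a given input under a fixed set of rewrite rules:

```python
from collections import deque

def reachable(start, target, rules, max_len=12, max_steps=10_000):
    """Bounded breadth-first search over strings generated by applying
    rewrite rules (lhs -> rhs) at any position of the current string."""
    seen = {start}
    queue = deque([start])
    steps = 0
    while queue and steps < max_steps:
        s = queue.popleft()
        steps += 1
        if s == target:
            return True
        for lhs, rhs in rules:
            i = s.find(lhs)
            while i != -1:
                t = s[:i] + rhs + s[i + len(lhs):]
                if len(t) <= max_len and t not in seen:
                    seen.add(t)
                    queue.append(t)
                i = s.find(lhs, i + 1)
    return False

# Hypothetical rules: can "ab" be rewritten into "bbaa"?
rules = [("ab", "ba"), ("a", "aa"), ("b", "bb")]
print(reachable("ab", "bbaa", rules))
```

With these made-up rules the answer is yes (ab -> ba -> baa -> bbaa), so the call prints True; in general, however, no algorithm can decide reachability for arbitrary rules, which is the undecidability result referred to above.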

This problem corresponds to a fundamental problem in statistical physics: the increase of entropy over time and the concomitant loss of information in the universe. This paradox is typically framed in terms of Maxwell’s demon (Maxwell 1871), where one seeks an automaton capable of producing order locally from disordered states. These (rewrites and demons) are in turn comparable to natural selection mechanisms that sort among alternative genomes, “selecting out” adaptive variants (Bell 2008).

The theories we are seeking to develop, which represent a novel approach to the problem of adaptation, will classify rewrite systems, demons, and selection mechanisms that all lead to systematic increases in the information of a system capable of predicting states of the world. Increasing effective inference -- from genomes to neurons -- corresponds to the evolution of nested selection mechanisms operating over many time scales. We will test these theories against data sets where it is now possible to correlate functional molecular features of organisms with characteristics of their habitats, allowing us to explore those factors promoting complex lineages.

The origins of systematic, quantitative research into complex, ordered states begin with Maxwell in 1871, seeking to explore the limitations of the second law of thermodynamics. The law states that it is impossible to create an inequality of temperature or pressure in a closed system without the expenditure of work, or equivalently that entropy tends to increase towards thermodynamic equilibrium. Maxwell proposed a molecular intelligence (the Demon) capable of discerning variation in molecular velocities and sorting molecules by velocity into the two compartments of an isolated vessel. In this way a non-uniform, out-of-equilibrium configuration can be achieved. Maxwell also observed that if there are few molecules in the vessel, disequilibrium might emerge as a simple fluctuation without a demon. William Thomson described the demon as capable of operating selectively on individual atoms, reversing the natural dissipation of energy. Thus the second law is statistical and can be violated, unlike the first law describing the conservation of energy.
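
The sorting operation can be made concrete with a minimal toy sketch (an illustration only, not a model used in this project): molecules are assigned random speeds, and an idealized demon routes fast molecules into one compartment and slow ones into the other, producing a temperature (mean kinetic energy) difference by sorting alone. The speed distribution and threshold below are arbitrary.

```python
import random

def mean_kinetic_energy(speeds):
    """Mean kinetic energy per molecule, taking unit mass (KE = v^2 / 2)."""
    return sum(v * v for v in speeds) / (2 * len(speeds))

def demon_sort(n=10_000, threshold=1.0, seed=0):
    """Idealized Maxwell demon: molecules start mixed in a single vessel;
    the demon routes fast molecules (speed > threshold) into a 'hot'
    compartment and slow molecules into a 'cold' one."""
    rng = random.Random(seed)
    speeds = [rng.expovariate(1.0) for _ in range(n)]  # arbitrary toy speed distribution
    hot = [v for v in speeds if v > threshold]
    cold = [v for v in speeds if v <= threshold]
    print(f"mixed: mean KE = {mean_kinetic_energy(speeds):.3f} ({n} molecules)")
    print(f"hot  : mean KE = {mean_kinetic_energy(hot):.3f} ({len(hot)} molecules)")
    print(f"cold : mean KE = {mean_kinetic_energy(cold):.3f} ({len(cold)} molecules)")

demon_sort()
```

The sketch deliberately ignores the cost of the demon’s own measurements and memory, which is precisely the issue taken up by Szilard below.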

Szilard in 1929 identified the dissipation incurred by an “intelligent” demon with the increase of order in the vessel, inaugurating a series of deep ideas on the nature of information, the erasure of memory, and the theory of computation, leading to the whole area of the thermodynamics of computation and, more recently, quantum computation.

When Darwin proposed his theory of natural selection -- most fully in the Origin of Species in 1859 -- he could have had no idea that the structure of his theory would closely resemble the “demonic” thought experiment proposed by Maxwell. For Darwin, natural selection was a filter (more recently called the selective sieve) that allows well-adapted variants to populate the next generation while poorly adapted variants are eliminated. Natural selection is a “demon” (in the Maxwell sense) that possesses sufficient intelligence (in the Szilard sense) to detect, memorize, and act upon variation in one generation, and to induce an adaptive distribution of genotypes and phenotypes (organisms) in the next.

What distinguishes thermodynamic demons from selective demons is, first, the degree of “intelligence” or inferential capability with which each is charged. Selective demons are temporal, nonstationary filters of adaptive variability, composed of numerous biotic and abiotic elements sensitive to continuous variation, whereas thermodynamic demons are typically capable of only simple binary recognition and action. Second, selective demons operate iteratively at each generational bottleneck and lead to changes in the composition of the selected agents; this significantly expands the scope of the selective filter beyond that of physical demons working with unchanging particles. Third, selected agents are capable of modifying properties of the demon during niche construction (in part because they contribute to the filtering dynamics through frequency dependence), whereas physical demons are not modified by their decisions. Fourth, evolution builds new selective and adaptive filters/demons that take the place of natural selection, introducing learning and plasticity into biology over short time scales. These four factors imply that non-equilibrium statistical physics is vastly underspecified as a theory of biological evolution. However, some form of nested, demonic dynamics serves as a powerful platform upon which to build selective and adaptive mechanisms toward a more general theory of selection and adaptation, and we argue that it is this theory that is required if we are to understand the nature of biological complexity and sophisticated forms of cognition.

In summary, using comparative data we are exploring four fundamental properties of selective demons: (1) how do multiple independent, continuous selection mechanisms operate to shape sequence and structure? (2) what are the limits to nested selection processes, and can we extract from genetic data evidence for this hierarchy? (3) how do selected agents modify selection pressures through processes of niche construction, and can we identify trends in this ability in the evolution of cellular networks? and (4) how do plasticity and learning change the selective landscape, and how is this reflected in sequence diversity and the evolution of highly differentiated nervous systems and related inferential mechanisms?

This project aims to address the phenomenon of lineage-specific increases in complexity and “intelligence” by carefully formulating new models and theories at the interface of adaptive dynamics, thermodynamics, and information theory. The challenge is to expand our existing theories and models of variation generation, preserving their predictive and explanatory power at the sequence or phenotypic level and respecting what is known about population dynamics, by connecting them to fundamental processes and constraints.

For example, the relationship between genome size and metabolic rate directly relates biological information to physiological free energy (see scaling laws in project 2). The per-base, per-generation mutation rate of genomes scales with mass as M^(-1/4), and prokaryotic genome size scales as M^(1/4). Hence genome repair becomes more efficient with increasing mass, allowing genome size to grow with increasing mass. This tells us that genome size scales as the reciprocal of the mutation rate (the error threshold) and that the rate of evolution in these lineages is “mass-less”, i.e., independent of mass. Thus, regardless of size, prokaryotic evolution proceeds at a constant, maximum mean velocity dictated by the genome length (Krakauer 2011). Moreover, in changing contexts, this can be shown to imply that the information exchange (normalized mutual information) between an organism and its background is maximized.
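
The arithmetic behind the “mass-less” claim can be written out explicitly. The following is a back-of-envelope sketch using only the two scaling relations quoted above (proportionality constants omitted; an illustration, not a derivation from Krakauer 2011):

$$
u \propto M^{-1/4}, \qquad L \propto M^{+1/4}
\quad\Longrightarrow\quad
U = u\,L \;\propto\; M^{-1/4}\,M^{+1/4} = M^{0} = \text{constant},
$$

where $u$ is the per-base, per-generation mutation rate, $L$ the genome length (both as functions of body mass $M$), and $U$ the per-genome mutational input per generation. Because the error threshold bounds $L$ by roughly $1/u$, genome length can grow as $M^{1/4}$ exactly as fast as the per-base mutation rate falls, leaving $U$, and hence the rate of sequence evolution, independent of mass.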

One of the most fundamental questions in complex adaptive systems, and in inferential systems in particular, is how complexity scales with system size, and how the complexity of a system relates to the complexity of its parts, or modules (Simon 1962). An obvious analog is how engineered computational power scales with circuit size and density. In moving beyond the earliest microbial life forms, what has fundamentally changed? Two obvious differences are increasingly modular and hierarchical structures (Simon 1962, Moreira and Amaral 2007) and a proliferation of new sensors, computational elements, and information storage structures. As discussed previously, throughout this project we shall explore the idea that complexity relates to adaptive inference. For example, we could adopt an approximate-Bayesian line of reasoning that associates the genome with a prior estimate of the state of the world, derived from selection acting on ancestors. Contemporary selection would then be a mechanism that culls a population and thereby updates the posterior distribution toward better-adapted genotypes. In this framework, organismal complexity is largely in the service of predicting future states of the world. The same logic can be extended to gene regulation (operons) and, more profoundly, to the evolution of nervous systems, which increase the sensory resolution of an organism by making it responsive to finer spatial and temporal scales, and which increase its coding and memory capacity.
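
To make the approximate-Bayesian reading concrete, the following minimal sketch (a hypothetical illustration with made-up genotypes and fitness values, not an analysis from this project) treats genotype frequencies as a prior over “hypotheses” about the environment; one round of selection then has the form of a Bayesian update, with fitness playing the role of the likelihood:

```python
def selection_as_bayes(prior, fitness):
    """One generation of selection as a Bayesian update:
    posterior(g) = prior(g) * fitness(g) / mean fitness,
    where fitness acts as the likelihood of genotype g given the environment."""
    mean_w = sum(prior[g] * fitness[g] for g in prior)  # normalizing constant (mean fitness)
    return {g: prior[g] * fitness[g] / mean_w for g in prior}

# Hypothetical genotypes: the prior encodes ancestral experience of the environment.
prior = {"A": 0.70, "B": 0.25, "C": 0.05}
fitness = {"A": 0.9, "B": 1.2, "C": 1.5}  # current environment favors the rarer variants

posterior = selection_as_bayes(prior, fitness)
print({g: round(p, 3) for g, p in posterior.items()})
```

Iterating this update across generations corresponds to accumulating evidence about the environment, which is the sense in which organismal complexity can be said to be in the service of predicting future states of the world.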

It is of great interest that the same fundamental quarter-power scaling laws relating genome size to metabolism also apply at the level of nervous system size (Striedter 2005). Both genomes and brains require a significant source of free energy to function effectively, and both serve the purpose of storing and propagating information over functional networks. We contend that the critical difference between these adaptive levels lies in the dynamics of information acquisition, which imposes severe constraints on genomes that can be overcome by excitable cells (neurons). Comparative genomic data provide important clues: those genes required for developing complex nervous systems are already found in distantly related organisms with minimal evidence of neural differentiation. Many if not most of the building blocks of cognitive complexity are already present in species with limited cognitive ability.

We shall explore the key innovations that take these shared components and connect them in such a way as to significantly increase the inferential capabilities of the organism while remaining subject to the fundamental constraints of resource limitations.
