Left: Hierarchical clustering of phenotype spaces. Right: Foldability as a function of a potential’s additivity. (Images: Evandro Ferrada)

Understanding how nature maps sequences of amino acids into the physical structure of the proteins they form is an old problem in biology. A solution could open new doors to understanding the earliest forms of life, and could even enhance our ability to engineer new kinds of useful proteins.

In a paper published today in PLOS Computational Biology, SFI Omidyar Fellow Evandro Ferrada argues that the key to this problem doesn't lie simply in decoding nature’s chosen map. Instead, it’s in the underlying architecture that shapes and constrains such maps in the first place.

“This is a problem with a very long tradition,” Ferrada says, and it has potentially very broad implications. A better knowledge of the biological architecture underlying sequence-structure maps, for example, could help evolutionary biologists uncover the “primordial” amino acids present at the dawn of life. 

But first, researchers need to grasp the architecture. According to Ferrada, efforts until now have been somewhat piecemeal, though they do point to the interactions between the amino acids in a protein's sequence as playing a central role.

To investigate, Ferrada randomly generated a range of possible interactions, called potential energy functions, to see how they shape which sets of proteins are viable, how diverse each set is, and how robust a set of proteins is to mutations.

The most interesting result, Ferrada says, was that he was able to predict what kinds of interactions are most likely to result in biologically promising architectures. In the future, Ferrada’s techniques could help others identify not just which proteins were present at earlier stages of the evolution of life, but also what constraints those proteins put on life as we know it, or as we might someday engineer it.

Read the paper in PLOS Computational Biology (December 4, 2014)