
SFI Working Paper Abstract

1997

Title:

Correlated Mutations in Protein Sequences: Phylogenetic and Structural Effects

Author(s):

Alan S. Lapedes, B. G. Giraud, L. C. Liu, and G. D. Stormo

Paper #:

97-12-088

Abstract:

Covariation analysis of sets of aligned sequences for RNA molecules is relatively successful in elucidating RNA secondary structure, as well as some aspects of tertiary structure [Gutell(1992)]. Covariation analysis of sets of aligned sequences for protein molecules is successful in certain instances in elucidating certain structural and functional links [Korber(1993)], but in general, pairs of sites displaying highly covarying mutations in protein sequences do not necessarily correspond to sites that are spatially close in the protein structure [Gobel(1994)], [Clark(1995)], [Shindyalov(1994)], [Thomas(1996)], [Taylor(1994)], [Neher(1994)]. In this paper we identify two reasons why naive use of covariation analysis for protein sequences fails to reliably indicate sequence positions that are spatially proximate. The first reason involves the bias introduced in calculation of covariation measures due to the fact that biological sequences are generally related by a nontrivial phylogenetic tree. We present a null-model approach to solve this problem. The second reason involves linked chains of covariation which can result in pairs of sites displaying significant covariation even though they are not spatially proximate. We present a maximum entropy solution to this classic problem of “causation versus correlation.” The methodologies are validated in simulation.
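As a rough illustration of what a "naive" covariation measure might look like, the sketch below computes the mutual information between two columns of a sequence alignment. The abstract does not name the measure the authors use, so the choice of mutual information, the function names, and the toy alignment are all assumptions made for this example. The sketch includes neither the null-model correction for phylogeny nor the maximum entropy treatment of linked chains described in the paper.

```python
from collections import Counter
from math import log2


def column(alignment, i):
    """Return the residues found at position i across all aligned sequences."""
    return [seq[i] for seq in alignment]


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    Assumption: every sequence is weighted equally, which ignores the
    phylogenetic relatedness that the abstract identifies as a source of bias.
    """
    n = len(alignment)
    col_i = column(alignment, i)
    col_j = column(alignment, j)
    count_i = Counter(col_i)               # single-site residue counts
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))  # joint residue-pair counts

    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 1 always change together,
# while column 2 varies independently of column 0.
toy_alignment = ["ACDA", "ACEA", "GTDA", "GTEA"]

print(mutual_information(toy_alignment, 0, 1))
print(mutual_information(toy_alignment, 0, 2))
```

On this toy input, columns 0 and 1 score high because they covary perfectly, and columns 0 and 2 score zero. A high score of this kind shows only statistical association. As the abstract notes, it can arise from shared ancestry or from indirect chains of covariation rather than from spatial proximity.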