Rorick, MM; Baskerville, EB; Rask, TS; Day, KP; Pascual, M
A challenge in studying diverse multi-copy gene families is deciphering distinct functional types within immense sequence variation. Functional changes can in some cases be tracked through the evolutionary history of a gene family; however, phylogenetic approaches are not possible when gene families diversify primarily by recombination. We take a network-theoretical approach to functionally classify the highly recombining var antigenic gene family of the malaria parasite Plasmodium falciparum. We sample var DBLα sequence types from a local population in Ghana and classify 9,276 of these variants into just 48 functional types. Our approach is to first decompose each sequence type into its constituent, recombining parts; we then use a stochastic block model to identify functional groups among the parts; finally, we classify the sequence types based on which functional groups they contain. This method for functional classification relies neither on an inferred phylogenetic history nor on inferring function from conserved sequence features. Instead, it infers functional similarity among recombining parts from their sharing of similar co-occurrence interactions with other parts. The method can therefore group sequences that have undetectable sequence homology or even distinct origins. Describing these 48 var functional types allows us to simplify the antigenic diversity within our dataset by over two orders of magnitude. We consider how the var functional types are distributed among isolates and find a nonrandom pattern: common var functional types are more distinct from one another in functional composition than expected by chance. The coarse-graining of var gene diversity into biologically meaningful functional groups has important implications for understanding the disease ecology and evolution of this system, as well as for designing effective epidemiological monitoring and intervention.
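The three steps described above (decompose sequences into parts, group the parts by their co-occurrence interactions, classify sequences by the groups they contain) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the part names and toy sequences are invented for illustration, and the stochastic block model is replaced by a much simpler stand-in that groups parts only when they have exactly the same set of co-occurrence partners. A real block model infers groups probabilistically and tolerates noisy, partial overlap.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(sequences):
    """Map each part to the set of parts it appears with in at least one sequence."""
    neighbors = defaultdict(set)
    for parts in sequences.values():
        unique = set(parts)
        for part in unique:
            neighbors[part]  # register the part even if it has no partners
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def group_parts(neighbors):
    """Simplified stand-in for a stochastic block model (an assumption of this
    sketch): parts with identical co-occurrence partners share a group."""
    by_profile = defaultdict(list)
    for part, partners in neighbors.items():
        by_profile[frozenset(partners)].append(part)
    # Sort so that group ids are deterministic.
    groups = sorted(sorted(members) for members in by_profile.values())
    return {part: gid for gid, members in enumerate(groups) for part in members}


def classify_sequences(sequences, group_of):
    """A sequence's functional type is the set of part groups it contains."""
    return {
        name: frozenset(group_of[part] for part in parts)
        for name, parts in sequences.items()
    }


# Hypothetical toy data: each sequence is already decomposed into named parts.
sequences = {
    "s1": ("A1", "B1"),
    "s2": ("A2", "B1"),
    "s3": ("A1", "B2"),
    "s4": ("A2", "B2"),
    "s5": ("C1", "D1"),
    "s6": ("C2", "D1"),
}

neighbors = cooccurrence_network(sequences)
group_of = group_parts(neighbors)
types = classify_sequences(sequences, group_of)
print(len(sequences), "sequences ->", len(set(types.values())), "types")
# prints: 6 sequences -> 2 types
```

In this toy case, A1 and A2 never occur together and share no sequence similarity requirement, yet they land in the same group because they pair with the same partners (B1 and B2). That is the sense in which grouping rests on co-occurrence interactions rather than on homology.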