Automatic Determination of the Number of Transmitted Founder Variants in HCV

This directory contains the programs to find the number of transmitted variants in acute HCV infections as described in the supplement to the paper [ref]. Two different methods of clustering are implemented and are provided in separate folders. You can download these as zip files: AvThreshClust and MaxThreshClust.

Both of these clustering tools are based on a model that takes into account HCV's life cycle, that HCV replication occurs via a replication complex, and that there can be many replication complexes continuously producing viruses from a long-lived infected cell. As a result, the model predicts that sequences with a small number of shared mutations can arise in a subject at detectable frequencies prior to the onset of immune selection. Furthermore, the model shows that these clusters have to satisfy two separate criteria: (a) the total amount of mutation that could have accumulated is limited by the mutation rate of the virus and the generation time, and (b) the number of mutations shared by distinct sequences is related by coalescent theory to the growth and stabilization of viral load in these acute infections. Starting at the tips of the phylogenetic tree, these codes identify the largest clusters that are consistent with these two criteria.
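
As a rough illustration of criterion (a) only, the sketch below estimates how many mutations a sequence could be expected to accumulate as mutation rate times sequence length times the number of generations elapsed. This is a back-of-the-envelope reading of the criterion under a simple clock-like assumption, not the model from the supplement. The function name, its parameters, and the example values are all hypothetical.

```python
def expected_mutations(rate_per_site_per_generation,
                       sequence_length,
                       days_since_infection,
                       generation_time_days):
    """Expected mutations per sequence under a simple clock-like assumption.

    All arguments are hypothetical illustration parameters; they are not
    taken from the programs in this directory.
    """
    if generation_time_days <= 0:
        raise ValueError("generation_time_days must be positive")
    # Number of replication generations that fit in the elapsed time.
    generations = days_since_infection / generation_time_days
    # Mutations accumulate per site, per generation.
    return rate_per_site_per_generation * sequence_length * generations


if __name__ == "__main__":
    # Hypothetical values chosen only to show the arithmetic.
    print(expected_mutations(1e-5, 5000, 40, 2))
```

A cluster whose sequences carry far more mutations than such an estimate allows would be a candidate for splitting under criterion (a).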

The two codes presented here differ in how conservative they are. AvThreshClust limits only the average amount of mutation observed in a cluster, so a few highly divergent viruses can still be included in it. Because the average is a robust measure of the amount of evolution, the number of clusters identified by this method is likely to be a lower bound on the number of transmitted variants. MaxThreshClust is far more aggressive: it applies a cutoff to the most divergent sequence allowed in a cluster. In acute-to-acute transmissions sampled to a moderate depth, this provides a better estimate of the likely number of transmitted variants.
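
The difference between the two acceptance rules can be sketched as follows. This is a minimal illustration, not the code shipped in AvThreshClust or MaxThreshClust: it assumes each sequence in a candidate cluster has already been reduced to a mutation count, and the function names, the counts, and the threshold value are all hypothetical.

```python
def passes_average_threshold(mutation_counts, threshold):
    """Average rule: accept if the mean mutation count is within the threshold."""
    if not mutation_counts:
        raise ValueError("cluster must contain at least one sequence")
    return sum(mutation_counts) / len(mutation_counts) <= threshold


def passes_maximum_threshold(mutation_counts, threshold):
    """Maximum rule: accept only if the most divergent sequence is within the threshold."""
    if not mutation_counts:
        raise ValueError("cluster must contain at least one sequence")
    return max(mutation_counts) <= threshold


if __name__ == "__main__":
    # Hypothetical cluster: five near-identical sequences and one outlier.
    counts = [0, 1, 0, 1, 0, 6]
    threshold = 2  # hypothetical cutoff, in mutations per sequence
    print("average rule accepts:", passes_average_threshold(counts, threshold))
    print("maximum rule accepts:", passes_maximum_threshold(counts, threshold))
```

With one divergent sequence in an otherwise tight cluster, the average rule keeps the cluster together while the maximum rule rejects it, which is why the average rule tends to report fewer clusters.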

To cluster by the average-mutation method, enter the following in a terminal:

      unzip AvThreshClust.zip
      cd AvThreshClust

To cluster by the maximum-mutation method, enter the following in a terminal:

      unzip MaxThreshClust.zip
      cd MaxThreshClust

The instructions to compile and run are the same in both cases: