Martijn Huynen, Erik Nimwegen

Paper #: 97-03-025

We compare the frequency distribution of gene family sizes in the complete genomes of five Bacteria (Escherichia coli, Haemophilus influenzae, Mycoplasma genitalium, Mycoplasma pneumoniae, and Synechocystis sp. PCC6803), one Archaeon (Methanococcus jannaschii), one eukaryote (Saccharomyces cerevisiae), the Vaccinia virus, and the bacteriophage T4. Plotting the sizes of the gene families against their frequencies yields power-law distributions that tend to become flatter (have a larger exponent) as the number of genes in the genome increases. Power-law distributions commonly arise as the limiting distribution of a multiplicative stochastic process with a boundary constraint. The exponent of the power law depends on the mean and the variance of the logarithm of the multiplication factor. We discuss several models that can account for a multiplicative process determining the size of gene families in the genome. In particular, we argue that the size distribution of gene families in complete genomes indicates that the genes within a family do not behave independently, and that the dynamics of gene family sizes do not operate at the level of single genes.
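The link between the exponent and the mean and variance of the log multiplication factor can be illustrated with a minimal sketch. It assumes (an illustrative assumption, not necessarily the authors' model) that at each step a family's size n is multiplied by an i.i.d. factor lambda whose logarithm is normal with mean mu < 0 and standard deviation sigma, so that x = log n performs a random walk, and that a family can never shrink below one gene (the boundary constraint, x >= 0). Under these assumptions the stationary tail behaves as P(x > y) ~ exp(-gamma*y), where gamma solves E[lambda^gamma] = 1; for log-normal factors this gives gamma = -2*mu/sigma^2, so the frequency of families of size n scales roughly as n^-(1 + gamma). The function names below are hypothetical.

```python
import numpy as np


def predicted_gamma(mu: float, sigma: float) -> float:
    """Tail exponent gamma = -2*mu/sigma**2 for normally distributed log-factors.

    Solves E[lambda**gamma] = 1 with log(lambda) ~ N(mu, sigma**2); requires mu < 0.
    """
    if mu >= 0:
        raise ValueError("a stationary distribution needs a negative mean log-factor")
    return -2.0 * mu / sigma**2


def simulate_log_sizes(mu: float, sigma: float, n_walkers: int = 20_000,
                       n_steps: int = 2_000, seed: int = 0) -> np.ndarray:
    """Simulate x = log(size) for many independent families (hypothetical model).

    Each step adds log(lambda) ~ N(mu, sigma**2) to x; the floor at x = 0 is the
    boundary constraint (a family never shrinks below one gene).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_walkers)
    for _ in range(n_steps):
        x = np.maximum(0.0, x + rng.normal(mu, sigma, size=n_walkers))
    return x


def tail_exponent(x: np.ndarray, threshold: float) -> float:
    """Estimate gamma as 1 / mean excess of log-sizes above a threshold (Hill-type)."""
    excess = x[x > threshold] - threshold
    return 1.0 / excess.mean()


if __name__ == "__main__":
    mu, sigma = -0.1, 0.5
    x = simulate_log_sizes(mu, sigma)
    gamma_hat = tail_exponent(x, threshold=2.0)
    print("predicted gamma:", predicted_gamma(mu, sigma))
    print("estimated gamma:", gamma_hat)
    print("implied size-frequency exponent:", 1.0 + gamma_hat)
```

In this sketch, a larger sigma or a mu closer to zero lowers gamma and makes the size distribution flatter. The mean-excess estimator and the threshold of 2.0 are arbitrary illustrative choices, picked only for simplicity.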
