Clearly, individual transcripts vary in their hexamer composition. That each should vary in the degree to which it resembles plants or fungi is perhaps not surprising. However, that the majority of transcripts in the axenic plant library resemble fungal hexamer composition more closely than do those of three axenic fungal libraries is indeed a surprise. That three fungal libraries should contain a greater proportion of plant-like transcripts than the plant library is highly counter-intuitive. How is one best to interpret this result?
With due skepticism, we could indict as faulty the method, interpretation of the results, the training data, and the sequence libraries. Let us consider these possibilities in turn.
The method, comparative lexical analysis via hexamer dissimilarity, appears sensitive to some simple, uninformative events, such as the occurrence of a poly-A or poly-T segment in a nucleotide sequence. This study sought to minimize such cases, in part because similar results had been observed in different libraries (cf. Section 2.4). The comparative analysis was performed repeatedly, using variations of the test and training sets shown here, with more inclusive training data and less stringent parameters. In every variation, a non-zero proportion of plant-like transcripts was observed in axenic fungal libraries, and of fungus-like transcripts in pure plant libraries. The results presented in this study are therefore robust to these variations. This is not a claim that one could not identify a set of sequences and parameters that would reverse our observations; rather, when training sequences and filtering procedures are selected in a non-targeted way, the overall outcome is not greatly influenced by variations on the methods described here. Further, the validation sequences were classified with a high success rate (Tables 4.3 and 2.3).
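The sensitivity to poly-A and poly-T segments can be illustrated with a minimal sketch. It is not the implementation used in this study: the Euclidean distance, the run-length threshold of 10, and the function names `hexamer_freqs`, `dissimilarity`, and `mask_homopolymers` are all assumptions made for illustration, and the sequence is a toy example.

```python
import re
from collections import Counter
from math import sqrt

K = 6  # hexamer length


def hexamer_freqs(seq, k=K):
    """Relative frequency of each overlapping k-mer made only of A/C/G/T."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def dissimilarity(p, q):
    """Euclidean distance between two sparse profiles (an assumed measure)."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


def mask_homopolymers(seq, bases="AT", min_run=10):
    """Replace runs of at least min_run identical A or T residues with N."""
    pattern = "|".join(f"{b}{{{min_run},}}" for b in bases)
    return re.sub(pattern, lambda m: "N" * len(m.group()), seq.upper())


# Toy transcript, with and without a 30-residue poly-A tail.
body = "ATGGCGTACGTTAGC" * 4
with_tail = body + "A" * 30

d_tail = dissimilarity(hexamer_freqs(body), hexamer_freqs(with_tail))
d_masked = dissimilarity(
    hexamer_freqs(body), hexamer_freqs(mask_homopolymers(with_tail))
)
```

In this toy case the tail alone shifts the profile (`d_tail` is greater than zero), while masking the run before counting restores the original profile (`d_masked` is zero), because hexamers containing N are skipped.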
The training data are not perfect. Training data for the fungal taxa we are studying are not abundant. Having access to a greater volume of gene sequence data from any of the Zygomycetes, especially from the Glomales, would certainly refine our comparisons. We would like to be able to evaluate how well the lexical analysis performs in the light of more information. However, the inferences made by comparing hexamer frequencies agree with trends in GC content, but with greater sensitivity and statistical rigor. For example, consider the similarity we identified between GvPT1 and the plant transporters MtPT1 and MtPT2 (Table 4.3 and Figure 4.1), which was not apparent using either GC content analysis or sequence similarity searching with BLASTN (Appendix B).
Perhaps the sequences have unusual properties that have contributed to an odd result. For example, some classes of proteins, such as transmembrane ion transporters, are known to contain discrete domains of either hydrophobic or hydrophilic amino acids [69,70]. Systematic deviation in nucleotide composition correlated with codons for amino acid hydrophobicity might well produce results like those seen for MtPT1 and GvPT1. A single example cannot explain all cases, however, and other biases may be involved.
Preferential usage of particular codons and biases in composition of GC content are known to vary between and within taxa [69,70,106,125]. Percent GC content in G. intraradices is lower than in Ascomycetes and Basidiomycetes [104]. These phenomena might produce biases in hexamer composition that are identified in lexical analysis. The data sets are too limited at present to warrant such a claim with any assurance.
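For reference, the compositional quantities mentioned above can be computed as in the following sketch. The function names are hypothetical, and the third-codon-position measure (often called GC3) is included only as a common illustration of codon-level bias; it is not a statistic reported in this study.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous (A/C/G/T) residues."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def gc3_content(cds):
    """GC fraction at third codon positions.

    Assumes cds is an in-frame coding sequence starting at codon position 1.
    """
    return gc_content(cds.upper()[2::3])
```

A consistently lower value of either measure in one taxon than in another would be expected to shift hexamer frequencies as well, which is the kind of bias the lexical analysis might be detecting.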
An interesting possibility is that of horizontal gene transfer between genomes, with or without a viral or bacterial intermediate. Other model genomes, including those of humans, are thought to house genes that have been transferred laterally from other species [122]. Horizontal transfer is a particularly appealing notion, owing to the intimate and obligate nature of the AM symbiosis [12,54,55,90]. Demonstrative proof is, as always, elusive. A look at many closely related taxa would provide some insight into the likelihood that this hypothesis is true. Studies that have demonstrated horizontal transfer between eukaryotes examined several species descended from a common lineage, and found the transferred gene to be absent from sibling species but present in the recipient species of the transfer event [48,70].
An alternative explanation for the observation of plant-like fungal genes and fungus-like plant genes is rooted in the evolutionary history of plants. One investigator, Peter Atsatt, has hypothesized that vascular plants evolved as a fusion of proto-algal and proto-fungal genomes [9]. This hypothesis does not rely on horizontal gene transfer, but rather on the synthesis of two independent genomes in a process termed symbiogenesis. In this scenario, an ancestral alga acquired from a fungus the capacity for growth via tip elongation, as is seen in the pollen tubes, root hairs, and other specialized cells of modern vascular plants [9,65,107]. The mycobiont may have received protection from ultraviolet radiation by flavonoids synthesized by the algal partner, or phycobiont [65].
Testing the symbiogenesis hypothesis is complicated by the fact that plants and fungi share a common ancestor, which predates the divergence of these two taxa about 1.6 billion years ago [124]. The existence of plant-like fungal genes and fungus-like plant genes could result from either genome fusion or a shared common ancestor, from which some sequences may have been more highly conserved across extant taxa than others (P. Atsatt, personal communication).
Any or all of these hypotheses may be true. All merit further consideration and testing, which are beyond the scope of this work.