Should classification aim to emphasize relationships? If so, then one tends to be a lumper. Or should classification reflect the power of evolutionary processes to produce differences? Then one will tend to be a splitter.
Most taxonomists try to strike a balance, but in cases when it is not clear where the balance is, sometimes it is necessary to make a philosophical decision, and not everyone will agree on which philosophy to use.
--P. Regal
Sequence similarity searches compare sequences pairwise, producing optimal global [81], optimal local [111], or near-optimal local [4,5] alignments. This section compares plant and fungal sequences using the BLAST family of similarity search algorithms [4,5]. The purpose is to evaluate alternative parameter sets and to justify the parameters used to compute quasispecies diversity in Chapters 3 and 4.
Table B.1 summarizes the sequences on which we will focus. These sequences were analyzed in detail in Chapter 4 and represent several gene family members from a plant, Medicago truncatula, and from two species of arbuscular mycorrhizal fungi of the genus Glomus.
Following procedures described in Chapter 3, I performed two searches with blastn and one search with tblastx [5]. Two different expect value cutoffs (E < … and E < …) were used for the blastn searches. The tblastx search, which compares nucleotide query and subject sequences as amino acid translations in all six reading frames, was run with a separate threshold (…). All other parameters were left at the default values of the pre-compiled binary executables obtained from ftp.ncbi.nlm.nih.gov/blast/executables.
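For readers who want to repeat such searches with current software, a minimal sketch follows. It assumes the BLAST+ command-line programs, which have superseded the legacy executables mentioned above, and a hypothetical FASTA file, `table_b1.fasta`, holding the Table B.1 sequences. It only builds argument lists and does not run BLAST; the cutoffs in the usage loop are placeholders, not the values used here. Because BLAST+ `blastn` defaults to the megablast task, `-task blastn` is requested explicitly to approximate the classic algorithm.

```python
import shlex
from typing import List

# Tabular output (format 6) with only the columns needed downstream:
# query ID, subject ID, expect value, and raw score.
OUTFMT = "6 qseqid sseqid evalue score"


def blast_command(program: str, fasta: str, evalue: float) -> List[str]:
    """Build an all-against-all BLAST+ command line (a sketch only).

    The same FASTA file serves as query and subject, so every sequence
    is also compared with itself, which supplies the self-scores.
    """
    if program not in ("blastn", "tblastx"):
        raise ValueError(f"unsupported program: {program}")
    cmd = [program, "-query", fasta, "-subject", fasta,
           "-evalue", str(evalue), "-outfmt", OUTFMT]
    if program == "blastn":
        # BLAST+ blastn defaults to megablast; ask for classic blastn.
        cmd += ["-task", "blastn"]
    return cmd


if __name__ == "__main__":
    # Placeholder cutoffs for illustration, not the values used in the text.
    for prog, cutoff in [("blastn", 1e-5), ("blastn", 10.0), ("tblastx", 10.0)]:
        print(shlex.join(blast_command(prog, "table_b1.fasta", cutoff)))
```

Each list can be handed to `subprocess.run`. Default scoring parameters differ between the legacy executables and BLAST+, so raw scores from this setup need not match those reported in this appendix.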
In these searches, both the query and subject sets consisted only of the sequences listed in Table B.1. For each match with an expect value smaller (closer to zero) than the threshold, I computed a percent identity score as 100 times the ratio of the raw score for the query-subject match to the query sequence's self-score, that is, the raw score obtained by matching the query sequence with itself. This yields percent identities from 0 through 100%. Each sequence always matches itself perfectly, resulting in 100% identity; partial matches between overlapping fragments have intermediate identity values.
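The percent identity calculation can be sketched as follows, assuming tab-separated hit lines with query ID, subject ID, expect value, and raw score (the column layout that the BLAST+ option `-outfmt "6 qseqid sseqid evalue score"` produces). When a pair has several hits, the sketch keeps the highest raw score; how multiple segments were handled in the original analysis is not stated here. The function name and toy data are illustrative.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def percent_identities(rows: Iterable[str],
                       evalue_cutoff: float) -> Dict[Pair, float]:
    """Percent identity = 100 * raw score(query, subject) / self-score(query).

    Each row is a tab-separated line: query ID, subject ID, expect value,
    raw score. Hits at or above the cutoff are discarded. If a pair has
    several hits, the highest raw score is kept (an assumption).
    """
    best: Dict[Pair, float] = {}
    for line in rows:
        if not line.strip():
            continue  # ignore blank lines
        query, subject, evalue, score = line.rstrip("\n").split("\t")
        if float(evalue) >= evalue_cutoff:
            continue  # keep only matches with E below the cutoff
        key = (query, subject)
        best[key] = max(best.get(key, 0.0), float(score))

    identities: Dict[Pair, float] = {}
    for (query, subject), raw in best.items():
        self_score = best.get((query, query))
        if self_score:  # skip queries without a recorded self-hit
            identities[(query, subject)] = 100.0 * raw / self_score
    return identities


if __name__ == "__main__":
    # Toy hits for two hypothetical sequences, A and B.
    hits = ["A\tA\t1e-100\t200", "A\tB\t1e-20\t50", "A\tB\t1e-3\t20",
            "B\tB\t1e-90\t180", "B\tA\t1e-20\t50"]
    for pair, pid in sorted(percent_identities(hits, 1e-5).items()):
        print(pair, round(pid, 1))
```

Because the denominator is the query's own self-score, the matrix need not be symmetric: in the toy data, A matches B at 25.0% while B matches A at about 27.8%.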
Identity matrices summarize identity values for any pair of query and subject sequences. Figures B.1, B.3, and B.5 illustrate identity matrices that resulted from the three searches described above.
In each case, the question is how the number of distinct transcripts varies with the stringency required to consider two transcripts members of the same quasispecies. Because percent identity varies continuously, a percent identity threshold serves as the criterion for lumping two individual transcripts into a common quasispecies.
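One way to turn a percent identity threshold into a count of quasispecies is to lump transcripts transitively: any two transcripts whose identity meets the threshold share a group, and the groups are the connected components that result (single linkage). The sketch below does this with a union-find structure. It assumes a pair is joined if the identity in either direction meets the threshold; the function name and the toy identities are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def count_quasispecies(names: Iterable[str],
                       identity: Dict[Tuple[str, str], float],
                       threshold: float) -> int:
    """Count groups formed by transitive (single-linkage) lumping.

    Two transcripts are joined when their percent identity, in either
    direction, is at least `threshold`; the groups are the connected
    components of the resulting graph.
    """
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Follow parent links to the group representative, halving paths.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), pid in identity.items():
        if a != b and a in parent and b in parent and pid >= threshold:
            parent[find(a)] = find(b)  # merge the two groups
    return len({find(name) for name in parent})


if __name__ == "__main__":
    # Hypothetical identities among four transcripts.
    ident = {("a", "b"): 95.0, ("b", "c"): 60.0, ("c", "d"): 20.0}
    for t in (90, 70, 50, 30, 10):
        print(t, count_quasispecies("abcd", ident, t))
```

Single linkage is the most permissive lumping rule: a chain of moderately similar transcripts can end up in one group even when its two ends are dissimilar.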
Dendrograms constructed from each identity matrix appear in Figures B.2, B.4, and B.6; they were made using the hclust function in R [62]. How does quasispecies diversity vary with the identity threshold across the different search algorithms and parameters?
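R's `hclust` uses complete linkage unless another `method` is requested, and a dendrogram is commonly converted into groups with `cutree(..., h = ...)`; the linkage method used for the figures is not stated here. The following pure-Python sketch mimics that combination under two assumptions: distance is 100 minus percent identity, symmetrized by taking the larger of the two directional identities, and unmatched pairs sit at distance 100. For each identity threshold t, it reports the number of clusters left after cutting the tree at height 100 minus t. Names and data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple


def complete_linkage_counts(names: Sequence[str],
                            identity: Dict[Tuple[str, str], float],
                            thresholds: Sequence[float]) -> Dict[float, int]:
    """Cluster counts after cutting a complete-linkage tree at 100 - t."""
    def dist(a: str, b: str) -> float:
        # Symmetrize by taking the larger identity; unmatched pairs get 100.
        pid = max(identity.get((a, b), 0.0), identity.get((b, a), 0.0))
        return 100.0 - pid

    clusters: List[FrozenSet[str]] = [frozenset([n]) for n in names]
    heights: List[float] = []  # merge heights (non-decreasing here)
    while len(clusters) > 1:
        # Complete linkage: the distance between two clusters is the
        # largest distance between any pair of their members.
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            h = max(dist(a, b) for a in clusters[i] for b in clusters[j])
            if best is None or h < best[0]:
                best = (h, i, j)
        h, i, j = best
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        heights.append(h)

    n = len(names)
    # Cutting at height h keeps every merge whose height is <= h.
    return {t: n - sum(1 for h in heights if h <= 100.0 - t)
            for t in thresholds}


if __name__ == "__main__":
    ident = {("a", "b"): 95.0, ("b", "c"): 60.0, ("c", "d"): 20.0}
    print(complete_linkage_counts(["a", "b", "c", "d"], ident, [90, 70, 50, 30]))
```

Unlike transitive lumping, complete linkage merges two groups only when every member of one lies within the cut height of every member of the other, so at the same threshold it tends to report more, tighter groups.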
In the first case, using BLASTN with E < … (Figures B.1 and B.2), observed quasispecies diversity at 90%, 70%, 50%, and 30% identity thresholds is 22, 21, 21, and 20, respectively. This is the case whose results are summarized in Chapters 3 and 4. Let us now consider two simple alternatives.
In the second case, using BLASTN with E < … (Figures B.3 and B.4), quasispecies diversity is the same as in the first case. However, one sequence (GvCHS1) has joined a cluster of other Glomus chitin synthases through a very weak similarity that was excluded under the more stringent expect value cutoff of the first case. A weak match between this sequence and a plant chitinase (raw score = 30, E = 0.008) also appears in the identity matrix, though not in the dendrogram. The two sequences match perfectly over a span of 14 nt.
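Why a short exact match can pass one cutoff and fail another follows from how BLAST assigns expect values. Under Karlin-Altschul statistics, E = K m n exp(-λS) for a raw score S in a search space of m × n residues, so E falls exponentially with score and rises linearly with search space. The sketch below assumes λ = 1.374 and K = 0.711, values commonly tabulated for ungapped +1/-3 nucleotide scoring, and ignores gapped statistics and edge-effect corrections; the sequence lengths are hypothetical, and the numbers are not a reconstruction of the match reported above.

```python
import math

# Assumed ungapped Karlin-Altschul parameters for +1/-3 nucleotide scoring.
LAMBDA = 1.374
K = 0.711


def expect_value(raw_score: float, m: int, n: int,
                 lam: float = LAMBDA, k: float = K) -> float:
    """E = K * m * n * exp(-lambda * S) for a search space of m x n residues."""
    return k * m * n * math.exp(-lam * raw_score)


def passes(raw_score: float, m: int, n: int, cutoff: float) -> bool:
    """True if a hit with this raw score would be reported under `cutoff`."""
    return expect_value(raw_score, m, n) < cutoff


if __name__ == "__main__":
    # A perfect 14-nt match scores 14 under +1/-3 scoring; the sequence
    # lengths are hypothetical.
    m, n = 600, 1300
    for cutoff in (10.0, 1e-5):
        print(cutoff, passes(14, m, n, cutoff))
```

With these assumptions, the 14-nt match has an expect value of a few thousandths: it is reported under a lenient cutoff such as 10 but discarded under a stringent one such as 1e-5.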
Here, the two BLASTN searches yielded the same observed diversity. In large-scale comparisons, however, other weak matches could occur, and whether they would change the observed diversity is unclear. I therefore chose the more conservative of the two expect value cutoffs, to minimize spurious clustering based on weak matches.
In the third case, using TBLASTX with … (Figures B.5 and B.6), we observe many more matches, indicating that similarities are more readily detected between amino acid translations than between nucleotide sequences. In the dendrogram, clusters have more members and fewer singleton clusters appear. All but one of the chitin synthases from G. intraradices and G. versiforme cluster together. The fungal phosphate transporter clusters with the two plant transporters, though their identity is too low for them to count as a single quasispecies. Quasispecies diversity ranges from 24 at 90% identity to 16 at 30% identity.
Comparing nucleotide sequences as amino acid translations is more sensitive than comparing them directly as nucleotides. However, this added sensitivity tends to join transcripts that originated from the genomes of different, reproductively isolated species, and treating such transcripts as members of the same transcript quasispecies is a dubious procedure.
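Much of the extra sensitivity comes from synonymous substitutions: changes at third codon positions often alter the nucleotide sequence without altering the encoded protein. The sketch below uses the standard genetic code and a simple ungapped, position-by-position identity (not BLAST scoring) on two hypothetical sequences that differ only at third codon positions. It also shows the six-frame translation that tblastx performs conceptually; the helper names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate complete codons; codons with other characters become X."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> list:
    """Translations of the three forward and three reverse-complement frames."""
    seq = seq.upper()
    rc = seq.translate(COMPLEMENT)[::-1]
    return [translate(s[f:]) for s in (seq, rc) for f in range(3)]


def identity(a: str, b: str) -> float:
    """Percent of aligned positions that match (ungapped, from position 0)."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / min(len(a), len(b))


if __name__ == "__main__":
    dna1, dna2 = "CTGTCTAAAGGT", "CTATCCAAGGGC"
    print(round(identity(dna1, dna2), 1))                        # nucleotides
    print(identity(translate(dna1), translate(dna2)))            # amino acids
    print(six_frames(dna1))
```

The two sequences share only 8 of 12 nucleotides, yet both encode LSKG. The same mechanism that rescues true homologs also links distantly related genes from different organisms.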
Because of observations such as those described here, I chose the first set of parameters (BLASTN with the more conservative expect value cutoff) to compute the observed diversity of transcript quasispecies. Since different proteins evolve at different rates [48,69,70], I thought it appropriate to report results at several percent identity thresholds.
| ID | LENGTH (nt) | LOCUS | ACCESSION | GENE NAME |
|---|---|---|---|---|
| Glomus intraradices | | | | |
| a | 1532 | GiHB1 | AF110198 | homeobox protein HB1 |
| b | 858 | GiMYC2 | AF110197 | MYC2 |
| c | 1453 | GiMYC1 | AF110196 | MYC1 |
| d | 610 | GiCHS1 | L77908 | chitin synthase |
| e | 617 | GiBCHS1 | AF260996 | chitin synthase, isolate GiBCHS1 |
| f | 614 | GiCHS3 | AF260993 | chitin synthase, isolate GiCHS3 |
| g | 617 | GiCHS2 | AF260986 | chitin synthase, isolate GiCHS2 |
| h | 617 | GiBCHS2 | AF260985 | chitin synthase, isolate GiBCHS2 |
| i | 617 | GiVCHS2 | AF260983 | chitin synthase, isolate GiVCHS2 |
| j | 617 | GiWCHS2 | AF260982 | chitin synthase, isolate GiWCHS2 |
| Glomus versiforme | | | | |
| k | 4116 | GvCHS3 | AJ009630 | chitin synthase, clone Gvchs3 |
| l | 481 | GvCHS2 | AJ009629 | chitin synthase, clone Gvchs2 |
| m | 638 | GvCHS1 | AJ009628 | chitin synthase, clone Gvchs1 |
| n | 1833 | GvPT | U38650 | phosphate transporter |
| Medicago truncatula | | | | |
| o | 1867 | MtPT2 | AF000355 | phosphate transporter MtPT2 |
| p | 1920 | MtPT1 | AF000354 | phosphate transporter MtPT1 |
| q | 954 | Mt4 | AF055921 | Mt4 |
| r | 1305 | MtCHI1 | Y10373 | chitinase |
| s | 181 | MtCHI08g | AF167329 | chitinase, clone T130008g |
| t | 265 | MtCHI07g | AF167328 | chitinase, clone T130007g |
| u | 188 | MtCHI06g | AF167327 | chitinase, clone T130006g |
| v | 188 | MtCHI05g | AF167326 | chitinase, clone T130005g |
| w | 191 | MtCHI04g | AF167325 | chitinase, clone T130004g |
| x | 197 | MtCHI03g | AF167324 | chitinase, clone T130003g |
| y | 260 | MtCHI02g | AF167323 | chitinase, clone T130002g |
| z | 245 | MtCHI01g | AF167322 | chitinase, clone T130001g |