Next: Comparison Sequences Up: Hexamer Dissimilarity Comparisons Previous: Validation Sequences   Contents

Calibration and Confidence Curves

Calibration curves from comparing fungal sequences with plant sequences (green), and from comparing fungal sequences with rhizobacterial sequences (blue), indicate low overlap in hexamer composition and clearly separated medians (Figure 4.2A). Fungal calibration curves are less adequately approximated by a normal distribution than plant and rhizobacterial calibration curves (Figure 4.2A and B), but for t > 0, where $t=D(A)-D(B)$, a normal approximation does not grossly misrepresent calibration curves.
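As a rough illustration of how the adequacy of a normal approximation might be checked, the following Python sketch fits a normal distribution to a list of $D(A)-D(B)$ values using the median and standard deviation, and reports the largest gap between the empirical cumulative distribution and the fitted one. The function names and the optional `t_min` cutoff are illustrative assumptions, not part of the original analysis.

```python
from statistics import NormalDist, median, stdev


def normal_approximation(values):
    """Normal distribution centred on the median, with the sample
    standard deviation as its spread (an assumed fitting choice)."""
    return NormalDist(mu=median(values), sigma=stdev(values))


def max_cdf_deviation(values, t_min=None):
    """Largest absolute gap between the empirical cumulative
    distribution and its normal approximation.

    If t_min is given, only points with t > t_min are examined,
    e.g. t_min=0 restricts the check to t > 0.
    """
    xs = sorted(values)
    n = len(xs)
    approx = normal_approximation(xs)
    worst = 0.0
    for i, x in enumerate(xs, start=1):
        if t_min is not None and x <= t_min:
            continue
        # i / n is the empirical cumulative probability at x
        worst = max(worst, abs(i / n - approx.cdf(x)))
    return worst
```

Calling `max_cdf_deviation(values, t_min=0)` limits the comparison to the region t > 0, where the approximation is of most interest here.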

Confidence curves (Figure 4.2B, yellow and magenta lines) calculated from normal approximations to calibration curves indicate comparison-wide error rates of 15.2% and 8.7% for rejecting the null hypothesis that a particular sequence resembles fungal sequences in hexamer composition, when fungi are compared with plants (yellow line) and with rhizobacteria (magenta line), respectively. Evaluating a confidence curve at a chosen confidence level gives the approximate critical value of t, above which we can reject the null hypothesis with an arbitrary, but known, degree of certainty. For a one-tailed test at the 95% confidence level, the approximate critical values of t are 312 for comparisons between fungi and plants, and 384 for comparisons between fungi and rhizobacteria (Figure 4.2B).
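Reading a critical value off a confidence curve can be sketched in Python as follows. The sketch assumes that $A(t)$ and $B(t)$ in the expression $1-[A(t)-B(t)]/[A(t)+B(t)]$ are the cumulative normal approximations to the two calibration curves; this reading, the grid search, and the parameter values in the usage lines are assumptions made for illustration. The parameters are hypothetical and are not those of Figure 4.2.

```python
from statistics import NormalDist


def confidence(t, dist_a, dist_b):
    """Confidence at t, computed as 1 - [A(t) - B(t)] / [A(t) + B(t)],
    with A and B taken to be normal cumulative distributions
    (an assumed interpretation)."""
    a = dist_a.cdf(t)
    b = dist_b.cdf(t)
    if a + b == 0.0:
        return float("nan")  # undefined where both curves are zero
    return 1.0 - (a - b) / (a + b)


def critical_value(dist_a, dist_b, level=0.95, t_max=2000.0, step=1.0):
    """Smallest grid point t >= 0 at which the confidence reaches
    `level`, or None if it is never reached up to t_max."""
    t = 0.0
    while t <= t_max:
        if confidence(t, dist_a, dist_b) >= level:
            return t
        t += step
    return None


# Hypothetical parameters, for illustration only.
curve_a = NormalDist(mu=-200.0, sigma=150.0)
curve_b = NormalDist(mu=300.0, sigma=150.0)
t_crit = critical_value(curve_a, curve_b, level=0.95)
```

A grid search is used rather than a closed-form inversion because the confidence expression combines two distributions; a finer `step` gives a more precise critical value.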

Figure 4.2: Calibration and confidence distributions. (A) Calibration curves showing cumulative probability distributions of $D(A)-D(B)$ for two pairwise comparisons between training sets: between fungi and plants ($A_1$ and $B_1$, green lines), and between fungi and rhizobacteria ($A_2$ and $B_2$, blue lines). Table 4.2 summarizes constituents of training sets. Calibration curves were obtained from 100 resampled replicates in which each training set was randomly halved, and one half was used to establish hexamer counts, while the other half was used to compute $D(A)-D(B)$. The degree of overlap in the tails of calibration curves about $D(A)-D(B) = 0$ is used to establish experiment-wide false positive and false negative rates. Here, $\alpha _1 = 15.2\%$, $\beta _1 = 8.3\%$, $\alpha _2 = 8.7\%$, and $\beta _2 = 2.6\%$. (B) Confidence curves (yellow and magenta) indicate the comparison-wide confidence level for rejecting the null hypothesis that a sequence is from taxon A, as calculated from normal approximations to calibration curves in (A). Parameters (median, $\mu$, and standard deviation, $s$) used to estimate normal distributions are shown in the figure legend. This measure of confidence varies continuously with $t=D(A)-D(B)$, and is computed as $1-[A(t)-B(t)]/[A(t)+B(t)]$.
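The resampling scheme described in the caption of Figure 4.2 might be sketched in Python as below. The halving and hexamer counting follow the caption, but the dissimilarity measure $D$ is not defined in this section, so `placeholder_dissimilarity` is a hypothetical stand-in (the fraction of a sequence's hexamers absent from a training half) and should be replaced by the actual measure. All function names are illustrative assumptions.

```python
import random
from collections import Counter

K = 6  # hexamer length


def hexamer_counts(seqs):
    """Count overlapping hexamers across a collection of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - K + 1):
            counts[s[i:i + K]] += 1
    return counts


def placeholder_dissimilarity(seq, counts):
    """Hypothetical stand-in for D: the fraction of hexamers in `seq`
    that never occur in `counts`. Not the measure used in the text."""
    s = seq.upper()
    n = len(s) - K + 1
    if n <= 0:
        raise ValueError("sequence shorter than one hexamer")
    missing = sum(1 for i in range(n) if counts[s[i:i + K]] == 0)
    return missing / n


def halve(seqs, rng):
    """Randomly split a training set into two halves."""
    shuffled = list(seqs)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def calibration_values(set_a, set_b, replicates=100, seed=0,
                       dissimilarity=placeholder_dissimilarity):
    """Sorted D(A) - D(B) values for held-out halves of each set."""
    rng = random.Random(seed)
    t_a, t_b = [], []
    for _ in range(replicates):
        train_a, test_a = halve(set_a, rng)
        train_b, test_b = halve(set_b, rng)
        counts_a = hexamer_counts(train_a)
        counts_b = hexamer_counts(train_b)
        for seq in test_a:
            t_a.append(dissimilarity(seq, counts_a) - dissimilarity(seq, counts_b))
        for seq in test_b:
            t_b.append(dissimilarity(seq, counts_a) - dissimilarity(seq, counts_b))
    return sorted(t_a), sorted(t_b)
```

Plotting each sorted list against its cumulative rank gives one calibration curve per training set. The fraction of values falling on the wrong side of zero gives an empirical view of the overlap about $D(A)-D(B) = 0$.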

Figure 4.3: Calibration, confidence, and comparison curves. (A) Cumulative distributions of test results from four libraries, compared with fungi and plants. Libraries are described in Table 4.1. Calibration and confidence curves (thick green and yellow lines, respectively) are as in Figure 4.2. (B) Cumulative distributions of test results, as in (A), except the comparison is between fungi and rhizobacteria. Calibration curves appear as thick blue lines.


Peter T. Hraber 2001-06-13