Mark Bieda, Colleen Webb

Paper #: 04-09-026

Conserved non-coding sequences (CNSs) are non-coding genomic regions that are conserved between species, and they often contain cis-regulatory elements such as transcription factor binding sites. The proteins encoded by the KCNK gene family form an ancient system of ionic regulation found in nearly all metazoan cell types. Using structural/biophysical criteria, we derived a conservative set of 33 genes from this family. Here, we investigate the distribution and prevalence of CNSs, derived from alignments of the Caenorhabditis elegans and Caenorhabditis briggsae genomes, in this gene set, which includes >300 kb of non-coding DNA (ncDNA) in >350 intronic and intergenic regions. Overall, there were ~14 CNSs per gene, and ~10% of ncDNA nucleotides lay within CNSs. A significant portion of CNSs (~30%) were found in introns, and most genes possessed at least one intronic CNS. Generally, CNSs were spatially clumped, had higher GC content than ncDNA outside CNSs, and displayed a significant transition bias. For both introns and intergenic regions, CNSs occurred at similar normalized prevalence (measured as percent coverage of ncDNA and as number of CNSs per kb of ncDNA), and CNS prevalence was correlated with region length. Intronic CNSs were preferentially located toward the 5’ end of the gene and, to a lesser extent, the 3’ end, with significantly fewer in middle introns. These results suggest that intronic CNSs constitute a major population of C. elegans–C. briggsae CNSs and that similar processes may govern intronic and intergenic CNSs.
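The abstract summarizes CNSs with a few simple statistics: percent of ncDNA bases covered by CNSs, CNSs per kb of ncDNA, GC content, and the balance of transitions versus transversions. The sketch below shows one plausible way to compute these for a single non-coding region. It is not the authors' pipeline: the function names are hypothetical, and it assumes CNS coordinates are already available as half-open `[start, end)` intervals relative to the region and that the two sequences are already aligned (with `-` for gaps).

```python
# Hypothetical sketch of CNS summary statistics for one non-coding region.
# Assumes CNS intervals are half-open [start, end) coordinates within the region,
# already derived from a C. elegans-C. briggsae alignment (not shown here).

def merge_intervals(intervals):
    """Merge overlapping [start, end) intervals so no base is counted twice."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def cns_prevalence(region_length, cns_intervals):
    """Return (percent of ncDNA bases in CNSs, number of CNSs per kb of ncDNA)."""
    covered = sum(end - start for start, end in merge_intervals(cns_intervals))
    percent_coverage = 100.0 * covered / region_length
    cns_per_kb = len(cns_intervals) / (region_length / 1000.0)
    return percent_coverage, cns_per_kb


def gc_content(seq):
    """Fraction of G or C among unambiguous (A/C/G/T) bases."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_substitutions(seq_a, seq_b):
    """Count (transitions, transversions) between two aligned sequences.

    Identical columns, gaps, and ambiguous bases are skipped.
    """
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    return transitions, transversions
```

For example, a 2,000 bp intron with CNSs at `(100, 200)` and `(1500, 1600)` gives `cns_prevalence(2000, [(100, 200), (1500, 1600)])`, i.e. 10% coverage and 1 CNS per kb. Because each base has one possible transition but two possible transversions, equal substitution rates would give roughly one transition per two transversions; a transition count well above that is one way a transition bias could show up.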
