Journal of Nematology 37(4):408–416. 2005. © The Society of Nematologists 2005.

A White Paper on Nematode Comparative Genomics

David McK. Bird,1 Mark L. Blaxter,2 James P. McCarter,3,4 Makedonka Mitreva,3 Paul W. Sternberg,5 W. Kelley Thomas6

Abstract: In response to the new opportunities for genome sequencing and comparative genomics, the Society of Nematology (SON) formed a committee to develop a white paper in support of the broad scientific needs associated with this phylum and interests of SON members. Although genome sequencing is expensive, the data generated are unique in biological systems in that genomes have the potential to be complete (every base of the genome can be accounted for), accurate (the data are digital and not subject to stochastic variation), and permanent (once obtained, the genome of a species does not need to be experimentally re-sampled). The availability of complete, accurate, and permanent genome sequences from diverse nematode species will underpin future studies into the biology and evolution of this phylum and the ecological associations (particularly parasitic) nematodes have with other organisms. We anticipate that upwards of 100 nematode genomes will be solved to varying levels of completion in the coming decade and suggest biological and practical considerations to guide the selection of the most informative taxa for sequencing.

Key words: Caenorhabditis elegans, comparative genomics, genome sequencing, systematics.

Received for publication 19 July 2005.
1 Center for the Biology of Nematode Parasitism, NC State University, Raleigh, NC 27695.
2 Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3JT, UK.
3 Genome Sequencing Center, Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108.
4 Divergence Inc., St. Louis, MO 63141.
5 HHMI, Division of Biology, California Institute of Technology, Pasadena, CA 91125.
6 Hubbard Center for Genome Studies, University of New Hampshire, Durham, NH 03824.
The authors are grateful for discussions with and comments from many colleagues, especially during the Genome Workshop at the 2004 Society of Nematology annual meeting, Estes Park, CO, August 2004 (Bird, 2004), and the Helminth Genome Sequencing Meeting organized by Prof. R. Maizels and held at the Sanger Institute, Cambridge, UK, March, 2004. In particular, we thank Prof. Maizels for allowing us to draw from his report of that meeting in preparing this document.
E-mail: david [email protected]
This paper was edited by J. L. Starr.

The “discipline” of genomics arguably began with a project to assemble a physical map of the Caenorhabditis elegans genome (Coulson and Sulston, 1984) and certainly was consummated by the attainment of the C. elegans genome (The C. elegans Sequencing Consortium, 1998). This nematode was the first multicellular organism for which a complete genome sequence was generated, and it remains the only metazoan for which the sequence of every single nucleotide (a total of 100,278,047) has been finished to a high degree of confidence (Chen et al., 2005).

The value of C. elegans as a model organism for biomedical research is unquestioned. For example, “the worm” serves as a robust model for complex human traits including Alzheimer’s disease (Link et al., 2003), aging (Finch and Ruvkun, 2001; Lee et al., 2003), and diabetes and obesity (McKay et al., 2003). The insight that has been garnered about this species and that is available both in approximately 7,000 publications and via the Web resource, WormBase, elevates C. elegans to its status as one of the best understood organisms (Chen et al., 2005; Harris et al., 2004; Stein et al., 2001, 2002; www.wormbase.org).

However, although the genome sequence serves as the glue to integrate much of the collective knowledge on C. elegans, it was clear to the C. elegans community that the interpretation of its complete sequence would be enhanced by comparison with other related genomes. To this end, a high-quality, 98% complete draft genome sequence for Caenorhabditis briggsae was obtained (Stein et al., 2003), providing a platform for comparative genomics. Like C. elegans, this species encodes about 20,000 proteins. The coding portions of the C. elegans and C. briggsae genomes are highly conserved, and essentially all of the known non-coding RNAs are shared between the two species, although non-transcribed regions are highly diverged. Gene order also is conserved, particularly for those genes in operons, but only in local contexts, as there have been a large number of mainly within-chromosome rearrangements that break long-range synteny. It is anticipated that identification of conserved elements within this divergent background will point to subtle primary and higher-order regulatory elements across the genome, and this approach has proven effective in gene-by-gene comparisons. However, because genome features evolve at different rates, are of differing sizes, and are detected with varying ease by a variety of tools, comparison of any two species is not sufficient to define many sequence features. To allow better genome alignment, gene interpretation, promoter analysis, and identification of non-coding RNA and other functional features, as well as to explore the forces that mold these genomes, the genomes of three additional Caenorhabditis species (C. remanei, Caenorhabditis n. sp. PB2801, and C. japonica) are currently being obtained. A key to the selection of these taxa is the knowledge of their phylogenetic relationships (Kiontke et al., 2004).

As has been famously noted, “Caenorhabditis elegans is a nematode” (Blaxter, 1998), and there is no doubt that this species will serve as an essential guide in exploring the genomes of other nematode species. Conversely, those genomes will aid in the understanding of C. elegans. But as detailed below, the interests of the Nematology community at large are broad and varied, and there are many reasons for obtaining additional nematode genomes and many reasons for choosing species to be sequenced.

Status of Genome Resources for Nematoda

Molecular phylogenetics (Blaxter et al., 1998; De Ley and Blaxter, 2002) defines three major nematode classes that can be further divided into five clades: Dorylaimia (Clade I), Enoplia (Clade II), Chromadorea and Spirurina (Clade III), Tylenchina (Clade IV), and Rhabditina (Clade V). Caenorhabditis elegans is a member of Rhabditina, and the completed and in-progress Caenorhabditis genomes (Table 1) anchor this large phylum as the reference Clade V species. The genomes of three additional taxa in Clade V now being sequenced will be particularly informative. One is the strongylid animal parasite Haemonchus contortus. The second is Pristionchus pacificus, a free-living nematode that has proven to be a powerful laboratory model for comparative development (Eizinger and Sommer, 1997) and may be particularly informative because of its relatively basal position in Clade V. The third is the entomopathogenic species Heterorhabditis bacteriophora, which not only will shed light on bacterial-nematode and insect-nematode interactions (the symbiotic bacterial partner of H. bacteriophora has also been sequenced) but, because it has been proposed that Caenorhabditis themselves have insect associations (Blaxter and Bird, 1997), may give clues to C. elegans ecology.

A draft genome sequence of Brugia malayi, which is responsible for human filariasis and elephantiasis, has recently been obtained (Ghedin et al., 2004), making B. malayi the first parasitic nematode to be sequenced (Table 1). Like C. elegans, B. malayi has six chromosomes, which comprise the estimated 90 Mb genome. Using a whole genome shotgun strategy in conjunction with sequencing the ends of large-insert, Bacterial Artificial Chromosome (BAC) clones, the project has generated 9-fold genome coverage. Although annotation of this genome has only just begun, previous analyses of Brugia ESTs (Bird et al., 1999) and longer segments of the Brugia genome (Guiliano et al., 2002) have revealed the utility of C. elegans as an annotation platform. Not surprisingly, given that these nematodes last shared a common ancestor more than 300 million years ago, the degree of synteny is not extensive across the genome (Ghedin et al., 2004) but exhibits some local conservation (Guiliano et al., 2002); intra-chromosomal rearrangement is greatly favored over inter-chromosomal rearrangement (Whitton et al., 2004). In addition to contributing to an enhanced understanding of filarial parasite biology and suggesting new vaccine candidates and drug targets, Brugia defines a reference Clade III nematode genome (Blaxter et al., 1998) and, like the Caenorhabditis genomes, will be invaluable for annotating additional nematode genomes (Bird and Opperman, 1998; Bird et al., 1999). Clade III comprises multiple orders of animal parasitic taxa, including Spirurida, Oxyurida, Ascarida, and Rhigonematida. A key to understanding the radiation of this entirely parasitic clade will be comparative genomic sequence from the most appropriate free-living outgroup, such as Plectus acuminatus.

The U.S. Department of Agriculture (USDA) has recently committed to funding a deep draft genome sequence of the root-knot nematode, Meloidogyne hapla, as the first plant-parasitic nematode species to be sequenced (Table 1). Meloidogyne hapla will serve as the representative Clade IV nematode genome and, just as the Brugia and Caenorhabditis sequences collectively will aid annotation of the Meloidogyne sequence, it in turn will help the annotation of those, thus becoming a model unto itself. Funding also has been secured for the vertebrate parasites Trichinella spiralis (Clade I) and Haemonchus contortus (Clade V) (Table 1).

In addition to genome sequencing, many nematode genes have been identified from expressed sequence