Journal of Nematology 37(4):408–416. 2005.
© The Society of Nematologists 2005.

A White Paper on Nematode Comparative Genomics

David McK. Bird,1 Mark L. Blaxter,2 James P. McCarter,3,4 Makedonka Mitreva,3 Paul W. Sternberg,5 W. Kelley Thomas6

Abstract: In response to the new opportunities for genome sequencing and comparative genomics, the Society of Nematology (SON) formed a committee to develop a white paper in support of the broad scientific needs associated with this phylum and interests of SON members. Although genome sequencing is expensive, the data generated are unique in biological systems in that genomes have the potential to be complete (every base of the genome can be accounted for), accurate (the data are digital and not subject to stochastic variation), and permanent (once obtained, the genome of a species does not need to be experimentally re-sampled). The availability of complete, accurate, and permanent genome sequences from diverse nematode species will underpin future studies into the biology and evolution of this phylum and the ecological associations (particularly parasitic) nematodes have with other organisms. We anticipate that upwards of 100 nematode genomes will be solved to varying levels of completion in the coming decade and suggest biological and practical considerations to guide the selection of the most informative taxa for sequencing.

Key words: Caenorhabditis elegans, comparative genomics, genome sequencing, systematics.

Received for publication 19 July 2005.
1 Center for the Biology of Nematode Parasitism, NC State University, Raleigh, NC 27695.
2 Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3JT, UK.
3 Genome Sequencing Center, Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108.
4 Divergence Inc., St. Louis, MO 63141.
5 HHMI, Division of Biology, California Institute of Technology, Pasadena, CA 91125.
6 Hubbard Center for Genome Studies, University of New Hampshire, Durham, NH 03824.
The authors are grateful for discussions with and comments from many colleagues, especially during the Genome Workshop at the 2004 Society of Nematology annual meeting, Estes Park, CO, August 2004 (Bird, 2004), and the Helminth Genome Sequencing Meeting organized by Prof. R. Maizels and held at the Sanger Institute, Cambridge, UK, March, 2004. In particular, we thank Prof. Maizels for allowing us to draw from his report of that meeting in preparing this document.
E-mail: david [email protected]
This paper was edited by J. L. Starr.

The “discipline” of genomics arguably began with a project to assemble a physical map of the Caenorhabditis elegans genome (Coulson and Sulston, 1984) and certainly was consummated by the attainment of the C. elegans genome (The C. elegans Sequencing Consortium, 1998). This nematode was the first multicellular organism for which a complete genome sequence was generated, and it remains the only metazoan for which the sequence of every single nucleotide (a total of 100,278,047) has been finished to a high degree of confidence (Chen et al., 2005).

The value of C. elegans as a model organism for biomedical research is unquestioned. For example, “the worm” serves as a robust model for complex human traits including Alzheimer’s disease (Link et al., 2003), aging (Finch and Ruvkun, 2001; Lee et al., 2003), and diabetes and obesity (McKay et al., 2003). The insight that has been garnered about this species and that is available both in approximately 7,000 publications and via the Web resource, WormBase, elevates C. elegans to its status as one of the best understood organisms (Chen et al., 2005; Harris et al., 2004; Stein et al., 2001, 2002; www.wormbase.org).

However, although the genome sequence serves as the glue to integrate much of the collective knowledge on C. elegans, it was clear to the C. elegans community that the interpretation of its complete sequence would be enhanced by comparison with other related genomes. To this end, a high-quality, 98% complete draft genome sequence for Caenorhabditis briggsae was obtained (Stein et al., 2003), providing a platform for comparative genomics. Like C. elegans, this species encodes about 20,000 proteins. The coding portions of the C. elegans and C. briggsae genomes are highly conserved, and essentially all of the known non-coding RNAs are shared between the two species, although non-transcribed regions are highly diverged. Gene order also is conserved, particularly for those genes in operons, but only in local contexts as there have been a large number of mainly within-chromosome rearrangements that break long-range synteny. It is anticipated that identification of conserved elements within this divergent background will point to subtle primary and higher-order regulatory elements across the genome, and this approach has proven effective in gene-by-gene comparisons. However, because genome features evolve at different rates, are of differing sizes, and are detected with varying ease by a variety of tools, comparison of any two species is not sufficient to define many sequence features. To allow better genome alignment, gene interpretation, promoter analysis, and identification of non-coding RNA and other functional features, as well as to explore the forces that mold these genomes, the genomes of three additional Caenorhabditis species (C. remanei, Caenorhabditis n. sp. PB2801, and C. japonica) are currently being obtained. A key to the selection of these taxa is the knowledge of their phylogenetic relationships (Kiontke et al., 2004).

As has been famously noted, “Caenorhabditis elegans is a nematode” (Blaxter, 1998), and there is no doubt that this species will serve as an essential guide in exploring the genomes of other nematode species. Conversely, those genomes will aid in the understanding of C. elegans. But as detailed below, the interests of the Nematology community at large are broad and varied, and there are many reasons for obtaining additional nematode genomes and many reasons for choosing species to be sequenced.

Status of Genome Resources for Nematoda

Molecular phylogenetics (Blaxter et al., 1998; De Ley and Blaxter, 2002) defines three major nematode classes that can be further divided into five clades: Dorylaimia (Clade I), Enoplia (Clade II), and Spirurina (Clade III), Tylenchina (Clade IV), and Rhabditina (Clade V). Caenorhabditis elegans is a member of Rhabditina, and the completed and in-progress Caenorhabditis genomes (Table 1) anchor this large phylum as the reference Clade V species. The genomes of three additional taxa in Clade V now being sequenced will be particularly informative. One is the strongylid parasite Haemonchus contortus. The second is Pristionchus pacificus, a free-living nematode that has proven to be a powerful laboratory model for comparative development (Eizinger and Sommer, 1997) and may be particularly informative because of its relatively basal position in Clade V. The third is the entomopathogenic species Heterorhabditis bacteriophora, which not only will shed light on bacterial-nematode and insect-nematode interactions (the symbiotic bacterial partner of H. bacteriophora has also been sequenced) but, because it has been proposed that Caenorhabditis themselves have insect associations (Blaxter and Bird, 1997), may give clues to C. elegans ecology.

A draft genome sequence of Brugia malayi, which is responsible for human filariasis and elephantiasis, has recently been obtained (Ghedin et al., 2004) and is the first parasitic nematode to be sequenced (Table 1). Like C. elegans, B. malayi has six chromosomes, which comprise the estimated 90 Mb genome. Using a whole genome shotgun strategy in conjunction with sequencing the ends of large-insert, Bacterial Artificial Chromosome (BAC) clones, the project has generated 9-fold genome coverage. Although the annotation phase of this genome has only just begun, previous analyses of Brugia ESTs (Bird et al., 1999) and longer segments of the Brugia genome (Guiliano et al., 2002) have revealed the utility of C. elegans as an annotation platform. Not surprisingly, given that these nematodes last shared a common ancestor more than 300 million years ago, the degree of synteny is not extensive across the genome (Ghedin et al., 2004) but exhibits some local conservation (Guiliano et al., 2002); intra-chromosomal rearrangement is greatly favored over inter-chromosomal rearrangement (Whitton et al., 2004). In addition to contributing to an enhanced understanding of filarial parasite biology and suggesting new vaccine candidates and drug targets, Brugia defines a reference clade III nematode genome (Blaxter et al., 1998) and, like the Caenorhabditis genomes, will be invaluable for annotating additional nematode genomes (Bird and Opperman, 1998; Bird et al., 1999). Clade III comprises multiple orders of animal parasitic taxa, including Spirurida, Oxyurida, Ascarida, and Rhigonematida. A key to understanding the radiation of this entirely parasitic clade will be comparative genomic sequence from the most appropriate free-living outgroup, such as Plectus acuminatus.

The U.S. Department of Agriculture (USDA) has recently committed to funding a deep draft genome sequence of the root-knot nematode, Meloidogyne hapla, as the first plant-parasitic nematode species to be sequenced (Table 1). Meloidogyne hapla will serve as the representative Clade IV nematode genome and, just as the Brugia and Caenorhabditis sequences collectively will aid annotation of the Meloidogyne sequence, in turn will help the annotation of those, thus becoming a model unto itself. Funding also has been secured for the vertebrate parasites Trichinella spiralis (clade I) and Haemonchus contortus (clade V) (Table 1).

In addition to genome sequencing, many nematode genes have been identified from expressed sequence tag (EST) projects (McCarter et al., 2003a). ESTs are

TABLE 1. Genome sizes and chromosome numbers of nematode taxa for which a genome project is under way.

Species                         Clade¹  Trophic ecology²  Type of genome project  Status³  Genome size (Mb)  Funding⁴

Caenorhabditis elegans          V       B                 Full genome sequence    C        100.3             NHGRI
Caenorhabditis briggsae         V       B                 Whole genome draft      C        105               NHGRI
Caenorhabditis remanei          V       B                 Whole genome draft      C        ∼140              NHGRI
Caenorhabditis japonica         V       B                 Whole genome draft      P        —                 NHGRI
Caenorhabditis sp. c.f. PB2801  V       B                 Whole genome draft      P        —                 NHGRI
Pristionchus pacificus          V       A-O-P             Whole genome draft      P        ∼110              NHGRI
Heterorhabditis bacteriophora   V       I                 Whole genome draft      P        ∼110              NHGRI
Brugia malayi                   III     V                 Whole genome shotgun    C        ∼100              NIAID
Haemonchus contortus            V       V                 Whole genome draft      P        ∼55               Sanger
Meloidogyne hapla               IV      P                 Pooled BAC sequencing   P        50                CSREES
Trichinella spiralis            I       V                 Whole genome draft      P        270               NHGRI

1 Clade based on the assignment of Blaxter et al., 1998. 2 Food source. B: Bacteriovore; A-O-P: Algivore-Omnivore-Predator; I: Insect-associated bacteriovore; V: Vertebrate parasite; P: Plant parasite. 3 Status of genome sequencing project. C: Completed; P: Planned or in progress. 4 Funding source for genome sequencing project. NHGRI: National Human Genome Research Institute, United States; NIAID: National Institute of Allergy and Infectious Disease, United States; Sanger: The Welcome Trust Sanger Institute, United Kingdom; CSREES: NSF/USDA CSREES Microbial Genome Sequencing Program, United States. 410 Journal of Nematology, Volume 37, No. 4, December 2005 single-pass sequencing scans of randomly selected development of many new nematode genome se- cDNA clones (McCarter et al., 2000). As recently as quences from across the phylum and disciplines. Ulti- 2000 there were only 24,000 ESTs in public databases mately, the challenge is to promote the sequencing of from nematodes other than C. elegans, but by December the most informative set of nematode genomes that 2004 nearly 350,000 had been deposited, dispropor- supply the tools for functional genomics and allow for tionately focusing on parasites of humans (e.g., Blaxter informative comparative genomics across the phylum. et al., 2002; Daub et al., 2000), (e.g., Tetteh et It is important to stress that, other than for the sake al., 1999), and plants (e.g., Bird et al., 2002; McCarter of example and to make broad generalities, we pur- et al., 2003b; Mitreva et al., 2004). A meta-analysis of the posely avoid prioritizing nematode taxa. This is not for genomic biology of the phylum Nematoda was com- political expediency but because there is no need. Pri- pleted using >250,000 ESTs originating from 30 spe- oritization is specific to the user community and rel- cies, clustered into 93,000 genes and grouped into evant funding agencies and is not a useful phylum-wide 60,000 gene families (Parkinson et al., 2004). 
This col- exercise. Rather, we outline fundamental needs within lection of data was used to estimate the degree to which Nematology and the justification in each context for “genespace” (the diversity of distinct genes) within nematodes has been sampled. In nematodes, despite BOX 1. Criteria for selecting nematode species for sequencing. the availability of the genomes of two Caenorhabditis spe- 1. Importance of the nematode cies, genespace appears far from thoroughly sampled, a. Parasite i. Human as the addition of new species to the analysis has yielded 1. Major parasites of interest for drug, vaccine, or a linear increase in discovery of new genes. Therefore, diagnostic development despite a deceptively uniform body plan, nematodes 2. Parasites of interest as immune/hematopoietic seem to be more diverse at the molecular level than was modulators ∼ 3. Models of human parasites (e.g., parasites of rodents) previously recognized. The set of 20,000 genes and ii. Animal 12,000 gene families represented by C. elegans provides 1. Farm animals a starting point for exploring this diversity and does a. Livestock capture many of the conserved gene families shared b. Poultry c. Fish with other eukaryotes, but it represents only a small 2. Companion animals portion of the expanding total nematode genespace. As 3. Invertebrates sequencing has been performed from only a few dozen iii. Plant iv. Virus or bacterial vector of perhaps over one million nematode species, repre- b. Tractable model for a parasite senting a very limited component of the phylogenetic i. Ease of culture and ecological diversity of the phylum, the vast majority ii. Manageable host of nematode genespace remains unsampled. Free-living iii. Availability of forward genetics iv. Ability to transform or use RNAi taxa representing basal lineages in the phylum (Clades v. Availability/accessibility of developmental/larval stages I and II) are particularly critical in this respect. c. “Useful” nematodes i. 
Biocontrol agents oal of his aper ii. Insecticidal species G T P iii. Saprophytes d. Phylogenetic position As a group, we recognize two important facts. First, i. Basal species for the entire phylum Nematology is a discipline as broad as its phylum and ii. Representative of under-represented clade spans numerous and diverse fundamental scientific iii. Species to help understand relationships of parasitic species (e.g., free-living relative) goals, including exploiting the C. elegans model for ba- e. Ecologically significant species sic and biomedical biology, agricultural interests asso- i. Environmental monitors ciated with plant- and animal-parasitic groups, medical ii. Representatives of particular niches 1. Marine nematodes and veterinary interests in species that impact human 2. Extremophiles and animal health, and exploring the vast biodiversity f. Annotation models and ecological associations of the Nematoda. Second, i. Relationship to C. elegans sequencing of genomes is now a basic approach to ad- ii. Relationship to important parasites 2. Beneficiaries of the genome sequence dressing questions in biology and will continue to ex- a. A significant, existing community of researchers able to pand in practice. Numerous research approaches from exploit the genome information gene expression profiling (microarrays) to proteomics b. The potential for whole genome sequence to stimulate are built upon an assumption of an available genome research activity in a neglected area (e.g., a disease-causing pathogen) and (or) the potential to sequence and reasonably accurate gene models. Con- facilitate new entrants commencing work on nematodes sequently, we need to prepare for a future where large 3. Mechanism for using the sequence numbers of nematode genomes will be sequenced. a. Annotation plans Considering these two facts, it seems that the immedi- b. Dissemination of resources c. 
Computational power ate need is to organize an approach that will foster the Nematode Comparative Genomics: Bird et al. 411 additional nematode genome sequences. We recom- gans orthologs and are presumed to encode the same mend the development of consortia and focused user functions. A further 6,500 have one or more clearly communities to support specific proposals; each species detectable C. elegans homologs (i.e., have arisen from a needs a champion. We broadly list some criteria that common ancestral gene) but may have diverged follow- such communities may consider in selecting taxa for ing gene duplication (i.e., be paralogs) and adopted genome sequencing (Box 1), and we discuss some of different functions. However, even employing a fairly the practical considerations at length. Finally, we pro- non-stringent standard for what constitutes a match pose the development of a database of nematode spe- (BLAST Յ 1.0×e−5), 807 C. briggsae genes (4.1%) have cies “vitae” to document features of those species that no detectable match in C. elegans. Conversely, 5.1% of support their use in comparative genomics. the C. elegans gene set failed to match a C. briggsae gene. Understanding these genes in the context of each spe- Broad Topics to be Addressed by Nematode cies’ biology will likely prove interesting. Genome Sequencing Given what we know about gene content differences between C. briggsae and C. 
elegans and the emerging Basic biological processes that define, distinguish, and dif- picture from EST analysis that the genespace across the ferentiate the Nematoda: The advent of genomic analyses Nematoda is surprisingly large (Mitreva et al., 2005a; of model organisms, including whole genome sequenc- Parkinson et al., 2004), it will be especially important to ing and extensive profiling of transcript patterns using understand what sorts of new genes nematodes are microarrays, has begun to reveal that key biological pro- evolving, especially for parasites where specific and cesses are strikingly more widely conserved across life probably unique selection pressures are present. The than has previously been imagined. It has, for example, broad conservation of gene content across the eukary- proven possible to identify large sets of genes broadly otes suggests that it might prove productive to change conserved across the Eukaryota (including yeast, C. ele- research emphasis from attempting to understand what gans, Arabidopsis, Drosophila, and human), leading to is similar between two species to attempting to identify inference of a presumptive ancestral eukaryotic ge- the particular molecular genetic distinctions respon- nome (Koonin et al., 2004). Even more remarkable sible for making the organisms different. This goal than finding gene sets that are conserved across king- places an emphasis on the generation of complete ge- doms has been the realization that patterns of gene nome sequences to distinguish between unique and expression are conserved. McCarroll et al. (2004) have shared gene sets among taxa. Partial genome sequences developed tools to use the expression profile of a panel can provide insights into shared sets of genes but can- of genes that show coordinate behavior during a par- not confirm that a gene (set) is missing without the ticular process (such as aging) in one species as a query complete sequence. 
Furthermore, within the context of to sets of gene expression data from other species. As such a diverse phylum, it will be critical to select taxa at an example of the power of this approach, using the appropriate levels of divergence and within a well- profiles of yeast genes with altered expression during defined phylogenetic context to distinguish between sporulation to query the collection of C. elegans gene loss of a gene, gain of a gene, and rapid divergence of profiles gave the strongest match to nematode genes a gene. involved in germ line proliferation (McCarroll et al., Comparative genomics to understand parasitism: As 2004). In other words, the pattern of gene regulation noted, genome-wide comparisons will prove very useful controlling meiosis/mitosis in a single-celled eukaryote for developing and understanding the genetic differ- is detectably and specifically similar to that in a more ences and similarities correlated with specific biological complex, multicellular organism. attributes including evolution of parasitism. The phy- However, by their nature, the wide, cross-phylum logeny (Blaxter et al., 1998) supports multiple origins comparisons of genes undertaken to date have focused of parasitism in Nematoda, and it is reasonable to ex- on genes that are broadly conserved and likely to en- pect that different strategies and molecular innovations code core eukaryotic functions. Although a role in the may underlie adaptations in different lineages. Mecha- specific biology unique to the particular species cannot nisms that could affect evolution to parasitism include be ruled out for such core genes, it seems more likely gene duplication and diversification, gene-loss, changes that other, more divergent genes serve to make particu- in patterns of gene expression (Denver et al., 2005), lar species unique. A comparison between C. briggsae alterations in genes controlling metabolic and develop- and C. elegans may provide some clues. 
Although almost mental functions, adaptation of pre-existing genes to indistinguishable by light microscopy and apparently encode new functions, and acquisition of genes from sharing identical biology as assayed in the laboratory, other species via horizontal gene transfer (HGT). Ac- these species split from a common ancestor about 100 cumulating evidence supports a bacterial origin for million years ago (see Kiontke et al., 2004, for discus- some genes in Tylenchid plant-parasitic nematodes. sion of divergence rates). Approximately 12,200 C. Among those proposed are genes encoding enzymes briggsae genes can be assigned unequivocally as C. ele- that can degrade two major components of plant cell 412 Journal of Nematology, Volume 37, No. 4, December 2005 walls, as well as genes with potential roles in host- together with arthropods into the clade Ecdysozoa parasite signaling (Davis and Mitchum, 2005). Most of (Aguinaldo et al., 1997). Part of the reason for the these genes were identified on the basis of biochemical uncertainty in establishing the true position of nema- or immunological criteria, with claims of HGT being todes within the animal kingdom stems from long- supported by phylogenetic incongruence. A bioinfor- branch attraction artifacts (Felsenstein, 1978) that are matic approach using phylogenetic filters identified exacerbated by high evolutionary rates for genes used those genes previously found, plus an additional set of in phylogenetic inference (Philippe et al., 2005). One candidate HGT genes (Scholl et al., 2003). However, solution to this problem is to dissect the branches by one problem with these approaches is that, in the ab- including multiple taxa for each clade (Hendy and sence of complete genome sequences, the (EST) Penny, 1989). 
Addressing the fundamental question of datasets are incomplete; the absence of evidence of a where nematodes fit in animal evolution will greatly particular gene in such a set is not truly evidence for benefit from the inclusion of multiple, diverse nema- absence. Completed genomes will serve both as a tode genome sequences. source of information on the presence and absence of Understanding the evolutionary relationships be- genes and for the development of functional genomic tween taxa is essential for comparative annotation. One and proteomic tools to foster rapid and cost-effective of the most powerful ways of understanding the “mean- biochemical studies. The utility of such approaches can ing” of a DNA sequence is to look for the consequences be seen in the discovery of novel anthelminthics (Mc- of natural selection after evolutionary divergence. By Carter, 2004). sequencing multiple taxa that are sufficiently different, Nematodes parasitic on vertebrates also have adapted the conservation of sequence becomes informative with for interaction with their hosts, including evolving respect to function. As noted above, this is the motivat- means of evading or modulating the host immune and ing logic for the various Caenorhabditis projects and hematopoietic responses. The study of parasite- should serve as a model for all nematode sequencing encoded proteins that interact with the mammalian im- projects. As more genomes are analyzed, it has become mune system has provided numerous insights of impor- possible to predict just how related genomes need to be tance to the broader field of immunology (Maizels and in order to identify functionally conserved domains by Yazdanbakhsh, 2003). Parasitic nematodes are cur- comparison (Eddy, 2005). 
rently being tested in clinical trials as direct therapeu- Comparative genomics to support ecological and evolution- tics for autoimmune diseases (e.g., Trichuris suis for in- ary functional genomics: Ecological and evolutionary flammatory bowel disease) (Hunter and McKay, 2004; functional genomics is an emerging field intent upon Summers et al., 2005). Similarly, a hookworm-derived developing systems to bring functional genomics to recombinant protein that inhibits human FVIIa/TF is ecological studies. As such, it is expected that one key in clinical trials, in this case as an anticoagulant for motivation for the sequencing of nematode genomes unstable angina and myocardial infarction (Lee and will be to support the development of tools for func- Vlasuk, 2003; Mungall, 2004; Stanssens et al., 1996). tional genomics. DNA sequences are critical in the de- Further understanding of parasite molecular mediators velopment of microarray platforms for the analysis of of host interactions has promise to lead to additional gene expression, for the interpretation of proteomic human therapeutics with applications well beyond studies, and in the discovery of polymorphisms for the parasitology and nematology. analysis of quantitative trait loci. Because it is unlikely Comparative genomics and the need for a well-supported that many nematode species will be directly examined evolutionary framework: Sampling from a relatively small by traditional biological means, genomic and post- number of nematode taxa broadly spanning the phy- genomic tools suggest an approach to consider nema- lum gave a highly informative view of the phylogenetic todes and their ecological associations as a group (a structure of Nematoda (Blaxter et al., 1998). Whole process termed “metagenomics”). genome information from broadly selected nematode taxa will be needed to resolve the deepest branches in Practical Criteria for Selecting Nematode the phylum. 
Such efforts are concordant with the Species for Sequencing Nematode Tree-of-Life Project (NemATOL: http:// nematol.unh.edu/) and, conversely, each nematode to Perhaps the most important point to address is “who be sequenced will have to be placed in a phylogenetic will use the sequence?” Fortunately, the breadth of in- context for effective use of its data. In addition to better terests of the nematology community suggests that each understanding evolutionary relationships within the sequence might be of interest to multiple constituen- phylum, nematode genome data will contribute to un- cies. For example, the sequence of a human-parasitic derstanding the relationship of Nematode to other species would obviously be of interest to those labs metazoan phyla, a point that remains controversial working on the nematode species in question and likely (Blair et al., 2002). Recent data (Philippe et al., 2005) also to clinicians interested in the pathology. But fur- are consistent with the model that places nematodes ther, each new species provides a phylogenetically in- Nematode Comparative Genomics: Bird et al. 413 formative platform and may contribute additional func- in the phylum, are not available as laboratory cultures. tional information relevant to the annotation of C. ele- The establishment of such cultures would be an impor- gans and the rest of the phylum. For example, the dis- tant step to overcome these practical issues. Current covery of root-knot nematode ESTs with highly signifi- efforts focusing on the culture of Tobrilus species (De cant matches to hypothetical genes predicted by auto- Ley, pers. comm.) and published accounts of the pos- mated annotation of the C. elegans genome strongly sibility of similar cultures (Moens and Vincx, 1998) sug- implies that those predictions indeed define genes, al- gest that some of these practical limitations can be over- beit with no known function (McCarter et al., 2003b). come. 
Hookworm ESTs have identified orthologs of genes that were previously “orphans” in either C. elegans or C. briggsae (Mitreva et al., 2005b). It is possible that future meta-analyses of nematode genomes will reveal classes of genes associated with particular biological attributes (such as parasitism or, more generally, symbiosis) (Ott et al., 2004a, 2004b). The incorporation of non-C. elegans nematode sequences into an extensive information management system such as WormBase or a system for displaying phylogenetic context (e.g., NemATOL) is an important step toward a more effective exploitation of comparative nematode data.

It is important to consider the specific scientific needs and size of the research community associated with any proposed nematode genome sequencing project. Downstream of a large-scale sequencing project are the processes of assembly and annotation. Some sequencing organizations can dedicate post-docs or experienced annotators for this activity, but the issue must be addressed in any overall whole-genome sequencing proposal. Much of the finishing and annotation can be automated in the first instance, but it is likely that individual research communities will be responsible for subsequent manual annotation. It also is important to identify post-genomic activities. Functional genomic resources such as microarrays, cDNA archives, and libraries offer a post-genomic pipeline to leverage the sequences and are best organized and most cost-effective as large-scale cooperative efforts by the user community. The availability of tools, such as RNA interference (Fire et al., 1998), for downstream analysis of gene function might further influence species selection.

Genome readiness: It is perhaps self-evident that sufficient, high-quality genomic DNA, free from contamination by other species (except for purposeful metagenomic analysis), must be available for library construction. The technical constraints of constructing large insert libraries (>100 kb inserts) generally dictate that sufficient intact nuclei able to yield hundreds of micrograms of DNA be available. For some species that might otherwise be assigned high priority for sequencing (such as certain animal parasites, or phylogenetically significant species), the fact that they live in difficult-to-sample habitats should unfortunately eliminate them from consideration, given current technology, unless the community dedicates significant effort to obtain materials for sequencing. For example, it remains true that the vast majority of nematodes, in particular those representing basal lineages and under-sampled clades

It also is true that genome size matters, as sequencing costs are directly proportional to the number of bases that must be obtained. Genome size must first be assessed, ideally using independent methods such as Feulgen image analysis densitometry (Hardie et al., 2002) or flow cytometry (Kent et al., 1988). Other parameters, such as G+C content and complexity (i.e., the proportion of repeats in the genome), are important in evaluating the potential success of a proposed sequencing project. For assembling sequences obtained by the whole genome shotgun (WGS) approach, which is now a favored method for draft-quality genomes, an important criterion is the level of polymorphism in a particular genome, which is reflected in the heterogeneity of the genome sample in a particular isolate. A high degree of heterozygosity may hamper assembly of WGS sequence; so, unless the level of polymorphism is naturally low, highly inbred strains are desirable. The most recent advances in genome sequencing technology that involve highly parallel sequencing of hundreds of thousands of templates simultaneously are likely to dramatically reduce the cost of sequencing. But issues such as purity of sample and levels of polymorphism will remain critical criteria in taxa selection.

Supporting tools: No matter what sequencing strategy is followed (WGS or a more directed approach), the availability of anchored physical and genetic maps can provide a framework to assist with genome assembly. Similarly, the availability of cDNA libraries for EST coverage is necessary for gene identification and prediction of exon-intron boundaries. Ideally, ESTs from each stage of complex life cycles should be included. Some full-length cDNAs are required for training gene prediction programs such as GLIMMER (Salzberg et al., 1998).

Nematode Vitae

We suggest that, as a standardized format to capture the types of genome information, communities of nematode researchers establish “vitae” for nematodes to be sequenced. Ideally, such vitae should be freely available in a database (at a Web site, for example). For each species, a general overview is provided, followed by the significance or reason why the species should be sequenced. A general description (e.g., of the biology, pathology, or ecology) is provided as well as the known “genome facts,” such as the size of the genome, available EST or map resources, availability of libraries, etc.
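Because sequencing cost scales with the number of bases that must be obtained, the genome size recorded among the “genome facts” translates directly into a sequencing budget. The following is a minimal sketch of that arithmetic, not a costing method from this paper. It uses the 54 Mb estimate given for Haemonchus in the example vita and the 5X and 10X coverage levels proposed there. The function names and the 600 bp mean read length are illustrative assumptions, and the calculation idealizes every read as fully usable.

```python
import math


def bases_required(genome_size_bp: int, coverage: float) -> int:
    """Total bases to sequence for a target average depth of coverage."""
    return math.ceil(genome_size_bp * coverage)


def reads_required(genome_size_bp: int, coverage: float, mean_read_length_bp: int) -> int:
    """Number of reads needed, assuming every read is usable at its full length."""
    return math.ceil(bases_required(genome_size_bp, coverage) / mean_read_length_bp)


# 54 Mb is the Haemonchus genome size estimate quoted in this paper.
GENOME_SIZE_BP = 54_000_000
# Hypothetical mean read length, chosen only for illustration.
MEAN_READ_LENGTH_BP = 600

for coverage in (5, 10):
    total = bases_required(GENOME_SIZE_BP, coverage)
    reads = reads_required(GENOME_SIZE_BP, coverage, MEAN_READ_LENGTH_BP)
    print(f"{coverage}X: {total:,} bases, about {reads:,} reads")
```

Under these assumptions, 5X coverage requires 270,000,000 bases (about 450,000 reads) and 10X requires 540,000,000 bases (about 900,000 reads). Doubling either the genome size or the coverage doubles both figures.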

An indication of the size of the community interested in the nematode species is given, with a supporting argument (e.g., the number of publications). Other details, including how the genome sequence will be used, are presented as are contact details, such as Web sites, consortia leaders, etc. It is intended that the vitae encapsulate the type of information necessary to justify to a funding agency the need to sequence the particular genome. An example, using the animal-parasitic nematode Haemonchus contortus, is shown.

HAEMONCHUS CONTORTUS VITA

Overview: Haemonchus contortus is the most economically important nematode parasite of sheep and goats worldwide. Control of this parasite is increasingly difficult due to widespread resistance to all the major classes of anthelmintic drugs throughout the world. This parasite has the largest active research community of all clade V nematodes and arguably has been the subject of more research on vaccine development and drug resistance than any other parasitic nematode species. Hence it can be considered to be a model parasitic nematode as well as an important pathogen in its own right. Importantly, it is related to the devastating human hookworm pathogens (Necator americanus and Ancylostoma duodenale) and will thus illuminate their biology and promote control options for human disease. It is a relatively large adult parasite (2 cm) with high fecundity, permitting the relatively straightforward generation of large quantities of parasite material for biochemical, immunological, and molecular studies. It is possible to perform genetic crosses between isolates, and already there is a genome mapping project funded by the Wellcome Trust to produce an integrated HAPPY and BAC clone map and develop polymorphic markers for genetic analysis. HAPPY mapping (Dear and Cook, 1993) is a physical mapping approach that generates information that is analogous to genetic linkage data. The integrated HAPPY/BAC clone map of H. contortus will be of significant value for contig assembly of a full genome sequencing project. There are also 21,967 ESTs available, representing an estimated 4,000 genes, which will be invaluable for gene finding and annotation. RNAi is being developed in this species at a number of sites worldwide, and success has been reported with a number of different gene targets.

Significance: Many clade V nematodes have veterinary impact. Of most importance are the parasites of grazing livestock that cause significant economic problems for agriculture and detrimental effects on animal welfare throughout the world. A recent report, commissioned by the United Kingdom Department for International Development, listed the trichostrongylid nematodes (ranked as a group because they often occur as mixed infections) at the top of a list of the top 80 animal diseases that have a major impact on the poor in the developing world, i.e., they are considered to have a greater impact than any other disease of domestic animals. These parasites also cause some of the most economically important diseases of livestock in the developed world.

General description: Many different species, from different genera, occur as mixed infections of livestock. The relative importance of particular species varies for different regions of the world; consequently, the relative priority of each will differ between funding agencies. The disease syndrome observed varies depending on the species predominating. For example, the parasite H. contortus is a blood-feeder and, hence, highly pathogenic. Others cause mild clinical disease, but in all cases subclinical infections can dramatically reduce productivity in livestock units. Sheep nematodes provide an important resource in that they are readily passaged and relatively inexpensive to maintain; some of the greatest success in nematode vaccine development has occurred in this sector. In addition, it is a priority to elucidate the genetic basis of anthelmintic resistance in these species. We propose that H. contortus represents the most appropriate species from this group for full sequencing and assembly (10X coverage, finishing, assembly, and annotation). Six other trichostrongylid nematode species of veterinary importance are suggested for 5X coverage. These species have been chosen as the most economically important and tractable species from each of the remaining genera of primary importance.

Genome facts: The genome size of Haemonchus is estimated to be 54 Mb, and all the indications are that the genome sizes of this entire group are in this region. RNA interference is effective in L3 Haemonchus, and BAC library development and HAPPY mapping are ongoing. As noted for the individual species below, varying EST datasets are available, including from Haemonchus (17,269 ESTs from approximately 4,145 genes expressed in adult, L3, and L4 worms), Teladorsagia (4,379 ESTs from approximately 1,700 genes), and Ostertagia ostertagi (7,600 ESTs from 2,350 genes expressed in larval stages).

Community and active labs: Many labs worldwide study gastrointestinal nematodes of livestock, reflecting their global importance. A key word search of PubMed revealed 1,800 references with the query term “Haemonchus,” 980 references with the query “Ostertagia/Teladorsagia,” 1,448 matches to the query “Trichostrongylus,” and 796 and 639 to “Dictyocaulus” and “Cooperia,” respectively.

Curation: The Haemonchus and Teladorsagia ESTs are available individually and grouped into clusters at Nembase (www.nematodes.org), along with additional annotation. The individual clones are freely available to the community, and initial publications describing the datasets are available (Geldhof et al., 2005). Haemonchus contortus has a project page at the Sanger Centre: http://www.sanger.ac.uk/Projects/H_contortus/.

Exploitation: The genome information generated will have immediate and urgent application in the identification of novel target molecules for the control of these parasites by vaccination or by drug development. In addition, these datasets will be invaluable for characterizing the mechanisms involved in drug resistance. The community has available selected lines of some species with defined resistance to all the currently available anthelmintics. This will prove an invaluable resource for comparative genomics to seek common genes involved with this problem. Having defined genome sequences opens the door for comprehensive and meaningful analyses of gene expression, e.g., by microarrays.

Benefits: The benefits would be to accelerate the development of novel methods to control these parasites in livestock and to define the genetic mechanisms underlying drug susceptibility and resistance, with the possibility of extending the useful life of existing drugs and improving diagnostics in this area, as well as providing the means for meaningful whole-animal studies of the host-parasite interaction.

Conclusion

As the jigsaw that was the physical map of the C. elegans genome began to take shape with more than 40% of the genome in multiple contigs, Coulson and Sulston (1984) observed: “We’ve done the straight edges and the little house in the middle and we’re working on the sky. There’s quite a lot of it.” As more nematode genomes are solved, each will serve to produce a clearer picture on the lid of the box to guide the genomes to follow. All indications are that the “sky” will reveal a diverse and large genespace spanning the Nematoda, which will underpin the acceleration of hypothesis-driven research stemming from nematode genome informatics.

Literature Cited

Aguinaldo, A. M., J. M. Turbeville, L. S. Linford, M. C. Rivera, J. R. Garey, R. A. Raff, and J. A. Lake. 1997. Evidence for a clade of nematodes, arthropods, and other moulting animals. Nature 387:489–493.
Bird, D. McK. 2004. High society (of nematologists). Genome Biology 5:353.
Bird, D. McK., S. W. Clifton, T. Kepler, J. J. Kieber, J. Thorne, and C. H. Opperman. 2002. Genomic dissection of a nematode-plant interaction: A tool to study plant biology. Plant Physiology 129:394–395.
Bird, D. McK., and C. H. Opperman. 1998. Caenorhabditis elegans: A genetic guide to parasitic nematode biology. Journal of Nematology 30:299–308.
Bird, D. McK., C. H. Opperman, S. J. M. Jones, and D. L. Baillie. 1999. The Caenorhabditis elegans genome: A guide in the post genomics age. Annual Review of Phytopathology 37:247–265.
Blair, J. E., K. Ikeo, T. Gojobori, and S. B. Hedges. 2002. The evolutionary position of nematodes. BMC Evolutionary Biology 2:7.
Blaxter, M. L. 1998. Caenorhabditis elegans is a nematode. Science 282:2041–2046.
Blaxter, M. L., and D. McK. Bird. 1997. Parasitic nematodes. Pp. 851–878 in D. L. Riddle, T. Blumenthal, B. Meyer, and J. Priess, eds. C. elegans II. Cold Spring Harbor, NY: Cold Spring Harbor Press.
Blaxter, M., J. Daub, D. Guiliano, J. Parkinson, C. Whitton, and The Filarial Genome Project. 2002. The Brugia malayi genome project: Expressed sequence tags and gene discovery. Transactions of the Royal Society of Tropical Medicine and Hygiene 96:7–17.
Blaxter, M. L., P. De Ley, J. Garey, L. X. Liu, P. Scheldeman, A. Vierstraete, J. R. Vanfleteren, L. Y. Mackey, M. Dorris, L. M. Frisse, J. T. Vida, and W. K. Thomas. 1998. A molecular evolutionary framework for the phylum Nematoda. Nature 392:71–75.
Chen, N., T. W. Harris, I. Antoshechkin, C. Bastiani, T. Bieri, D. Blasiar, K. Bradnam, P. Canaran, J. Chan, C. K. Chen, W. J. Chen, F. Cunningham, P. Davis, E. Kenny, R. Kishore, D. Lawson, R. Lee, H. M. Muller, C. Nakamura, S. Pai, P. Ozersky, A. Petcherski, A. Rogers, A. Sabo, E. M. Schwarz, K. Van Auken, Q. Wang, R. Durbin, J. Spieth, P. W. Sternberg, and L. D. Stein. 2005. WormBase: A comprehensive data resource for Caenorhabditis biology and genomics. Nucleic Acids Research 33:D383–389.
Coulson, A., and J. Sulston. 1984. The genomic jigsaw. Worm Breeder’s Gazette 8(3):6.
Davis, E. L., and M. G. Mitchum. 2005. Nematodes. Sophisticated parasites of legumes. Plant Physiology 137:1182–1188.
Daub, J., A. Loukas, D. I. Pritchard, and M. Blaxter. 2000. A survey of genes expressed in adults of the human hookworm, Necator americanus. Parasitology 120:171–184.
Dear, P. H., and P. R. Cook. 1993. HAPPY mapping: Linkage mapping using a physical analogue of meiosis. Nucleic Acids Research 21:13–20.
De Ley, P., and M. L. Blaxter. 2002. Systematic position and phylogeny. Pp. 1–30 in D. Lee, ed. The biology of nematodes. London: Taylor and Francis.
Denver, D. R., K. Morris, J. T. Streelman, S. K. Kim, M. Lynch, and W. K. Thomas. 2005. The transcriptional consequences of mutation and natural selection in Caenorhabditis elegans. Nature Genetics 37:544–548.
Eddy, S. R. 2005. A model of the statistical power of comparative genome sequence analysis. PLoS Biology 3:e10.
Eizinger, A., and R. J. Sommer. 1997. The homeotic gene lin-39 and the evolution of nematode epidermal cell fates. Science 278:452–455.
Felsenstein, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology 27:401–410.
Finch, C. E., and G. Ruvkun. 2001. The genetics of aging. Annual Review of Genomics and Human Genetics 2:435–462.
Fire, A., S. Xu, M. K. Montgomery, S. A. Kostas, S. E. Driver, and C. C. Mello. 1998. Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 391:806–811.
Geldhof, P., C. Whitton, W. F. Gregory, M. Blaxter, and D. P. Knox. 2005. Characterisation of the two most abundant genes in the Haemonchus contortus expressed sequence tag dataset. International Journal for Parasitology 35:513–522.
Ghedin, E., S. Wang, J. M. Foster, and B. E. Slatko. 2004. First sequenced genome of a parasitic nematode. Trends in Parasitology 20:151–153.
Guiliano, D. B., N. Hall, S. J. Jones, L. N. Clark, C. H. Corton, B. G. Barrell, and M. L. Blaxter. 2002. Conservation of long-range synteny and microsynteny between the genomes of two distantly related nematodes. Genome Biology 3:R57.
Hardie, D. C., T. R. Gregory, and P. D. N. Hebert. 2002. From pixels to picograms: A beginner’s guide to genome quantification by Feulgen image analysis densitometry. Journal of Histochemistry and Cytochemistry 50:735–749.
Harris, T. W., N. Chen, F. Cunningham, M. Tello-Ruiz, I. Antoshechkin, C. Bastiani, T. Bieri, D. Blasiar, K. Bradnam, J. Chan, C. K. Chen, W. J. Chen, P. Davis, E. Kenny, R. Kishore, D. Lawson, R. Lee, H. M. Muller, C. Nakamura, P. Ozersky, A. Petcherski, A. Rogers, A. Sabo, E. M. Schwarz, K. Van Auken, Q. Wang, R. Durbin, J. Spieth, P. W. Sternberg, and L. D. Stein. 2004. WormBase: A multi-species resource for nematode biology and genomics. Nucleic Acids Research 32:D411–417.

Hendy, M., and D. Penny. 1989. A framework for the quantitative study of evolutionary trees. Systematic Zoology 38:297–309.
Hunter, M. M., and D. M. McKay. 2004. Review article: Helminths as therapeutic agents for inflammatory bowel disease. Alimentary Pharmacology & Therapeutics 19:167–177.
Kent, M., R. Chandler, and S. Wachtel. 1988. DNA analysis by flow cytometry. Cytogenetics and Cell Genetics 47:88–89.
Kiontke, K., N. P. Gavin, Y. Raynes, C. Roehrig, F. Piano, and D. H. Fitch. 2004. Caenorhabditis phylogeny predicts convergence of hermaphroditism and extensive intron loss. Proceedings of the National Academy of Sciences 101:9003–9008.
Koonin, E. V., N. D. Fedorova, J. D. Jackson, A. R. Jacobs, D. M. Krylov, K. S. Makarova, R. Mazumder, S. L. Mekhedov, A. N. Nikolskaya, B. S. Rao, I. B. Rogozin, S. Smirnov, A. V. Sorokin, A. V. Sverdlov, S. Vasudevan, Y. I. Wolf, J. J. Yin, and D. A. Natale. 2004. A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. Genome Biology 5:R7.1–R7.27.
Lee, A. Y. Y., and G. P. Vlasuk. 2003. Recombinant nematode anticoagulant protein c2 and other inhibitors targeting blood coagulation factor VIIa/tissue factor. Journal of Internal Medicine 254:313–321.
Lee, S. S., S. Kennedy, A. C. Tolonen, and G. Ruvkun. 2003. DAF-16 target genes that control C. elegans life-span and metabolism. Science 300:644–647.
Link, C. D., A. Taft, V. Kapulkin, K. Duke, S. Kim, Q. Fei, D. E. Wood, and B. G. Sahagan. 2003. Gene expression analysis in a transgenic Caenorhabditis elegans Alzheimer’s disease model. Neurobiology of Aging 24:397–413.
Maizels, R. M., and M. Yazdanbakhsh. 2003. Immune regulation by helminth parasites: Cellular and molecular mechanisms. Nature Reviews: Immunology 3:733–744.
McCarroll, S. A., C. T. Murphy, S. Zou, S. D. Pletcher, C. S. Chin, Y. N. Jan, C. Kenyon, C. I. Bargmann, and H. Li. 2004. Comparing genomic expression patterns across species identifies shared transcriptional profile in aging. Nature Genetics 36:197–204.
McCarter, J. P. 2004. Genomic filtering as an approach to discovering novel antiparasitics. Trends in Parasitology 20:462–468.
McCarter, J. P., P. Abad, J. Jones, and D. McK. Bird. 2000. Rapid gene discovery in plant-parasitic nematodes via Expressed Sequence Tags. Nematology 2:719–731.
McCarter, J. P., M. Mitreva, S. W. Clifton, D. McK. Bird, and R. Waterston. 2003a. Nematode gene sequences: Update for December 2003. Journal of Nematology 35:465–469.
McCarter, J. P., M. D. Mitreva, J. Martin, M. Dante, T. Wylie, U. Rao, D. Pape, Y. Bowers, B. Theising, C. Murphy, A. P. Kloek, B. Chiapelli, S. W. Clifton, D. McK. Bird, and R. Waterston. 2003b. Analysis and functional classification of transcripts from the root-knot nematode Meloidogyne incognita. Genome Biology 4:R26.1–R26.19.
McKay, R. M., J. P. McKay, L. Avery, and J. M. Graff. 2003. C. elegans: A model for exploring the genetics of fat storage. Developmental Cell 4:131–142.
Mitreva, M., M. L. Blaxter, D. McK. Bird, and J. P. McCarter. 2005a. Comparative genomics in nematodes. Trends in Genetics, in press.
Mitreva, M. D., A. A. Elling, M. Dante, A. P. Kloek, A. Kalyanaraman, S. Aluru, S. W. Clifton, D. McK. Bird, T. J. Baum, and J. P. McCarter. 2004. A survey of SL1-spliced transcripts from the root-lesion nematode Pratylenchus penetrans. Molecular Genetics and Genomics 272:138–148.
Mitreva, M., J. P. McCarter, P. Arasu, J. Hawdon, J. Martin, M. Dante, T. Wylie, J. Xu, J. E. Stajich, V. Kapulkin, S. W. Clifton, R. H. Waterston, and R. Wilson. 2005b. Investigating hookworm genomes by comparative analysis of two Ancylostoma species. BMC Genomics 6:58.
Moens, T., and M. Vincx. 1998. On the cultivation of free-living marine and estuarine nematodes. Helgoländer Meeresuntersuchungen 52:115–139.
Mungall, D. 2004. rNAPc2—Nuvelo. Current Opinion in Investigational Drugs 5:327–333.
Ott, J. A., M. Bright, and S. Bulgheresi. 2004a. Marine microbial thiotrophic ectosymbioses. Symbiosis 36:103–126.
Ott, J. A., M. Bright, and S. Bulgheresi. 2004b. Symbioses between marine nematodes and sulfur-oxidizing chemoautotrophic bacteria. Oceanography and Marine Biology: An Annual Review 42:95–118.
Parkinson, J., M. Mitreva, C. Whitton, M. Thomson, J. Daub, J. Martin, R. Schmid, N. Hall, B. Barrell, R. H. Waterston, J. P. McCarter, and M. L. Blaxter. 2004. A transcriptomic analysis of the phylum Nematoda. Nature Genetics 36:1259–1267.
Philippe, H., N. Lartillot, and H. Brinkmann. 2005. Multigene analyses of bilaterian animals corroborate the monophyly of ecdysozoa, lophotrochozoa, and protostomia. Molecular Biology and Evolution 22:1246–1253.
Salzberg, S. L., A. L. Delcher, S. Kasif, and O. White. 1998. Microbial gene identification using interpolated Markov models. Nucleic Acids Research 26:544–548.
Scholl, E. H., J. L. Thorne, J. P. McCarter, and D. McK. Bird. 2003. Horizontally transferred genes in plant-parasitic nematodes: A high-throughput genomic approach. Genome Biology 4:R39.1–R39.12.
Stanssens, P., P. W. Bergum, Y. Gansemans, L. Jespers, Y. Laroche, S. Huang, S. Maki, J. Messens, M. Lauwereys, M. Cappello, P. J. Hotez, I. Lasters, and G. P. Vlasuk. 1996. Anticoagulant repertoire of the hookworm Ancylostoma caninum. Proceedings of the National Academy of Sciences 93:2149–2154.
Stein, L. D., Z. Bao, D. Blasiar, T. Blumenthal, M. R. Brent, N. Chen, A. Chinwalla, L. Clarke, C. Clee, A. Coghlan, A. Coulson, P. D’Eustachio, D. H. Fitch, L. A. Fulton, R. E. Fulton, S. Griffiths-Jones, T. W. Harris, L. W. Hillier, R. Kamath, P. E. Kuwabara, E. R. Mardis, M. A. Marra, T. L. Miner, P. Minx, J. C. Mullikin, R. W. Plumb, J. Rogers, J. E. Schein, M. Sohrmann, J. Spieth, J. E. Stajich, C. Wei, D. Willey, R. K. Wilson, R. Durbin, and R. H. Waterston. 2003. The genome sequence of Caenorhabditis briggsae: A platform for comparative genomics. PLoS Biology 1:E45.
Stein, L. D., C. Mungall, S. Shu, M. Caudy, M. Mangone, A. Day, E. Nickerson, J. E. Stajich, T. W. Harris, A. Arva, and S. Lewis. 2002. The generic genome browser: A building block for a model organism system database. Genome Research 12:1599–1610.
Stein, L. D., P. Sternberg, R. Durbin, J. Thierry-Mieg, and J. Spieth. 2001. WormBase: Network access to the genome and biology of Caenorhabditis elegans. Nucleic Acids Research 29:82–86.
Summers, R. W., D. E. Elliott, J. F. Urban, R. Thompson, and J. V. Weinstock. 2005. Trichuris suis therapy in Crohn’s disease. Gut 54:6–8.
Tetteh, K. K., A. Loukas, C. Tripp, and R. M. Maizels. 1999. Identification of abundantly expressed novel and conserved genes from the infective larval stage of Toxocara canis by an expressed sequence tag strategy. Infection and Immunity 67:4771–4779.
The C. elegans Sequencing Consortium. 1998. Genome sequence of the nematode C. elegans: A platform for investigating biology. Science 282:2012–2018.
Whitton, C., J. Daub, M. Quail, N. Hall, J. Foster, J. Ware, M. Ganatra, B. Slatko, B. Barrell, and M. Blaxter. 2004. A genome sequence survey of the filarial nematode Brugia malayi: Repeats, gene discovery, and comparative genomics. Molecular and Biochemical Parasitology 137:215–227.