PROJECT DESCRIPTION

1. Results from prior support Due to space constraints we will report here only on results from the first Nematode AToL award "ATOL: Phylum Nematoda: Integrating Multidisciplinary Expertise and Infrastructure for Resolving Relationships in a Major Branch of the Tree of Life" (DEB 0228692, $2,512,500, 10/1/02 - 9/30/07 + 1 year no-cost extension) as the single most directly relevant prior support to our collaborative team.

1.1. Trainees: A total of 7 graduate students, 29 undergraduate students and 7 postdoctoral researchers received training and conducted research as part of the first Nematode ATOL; over half of these trainees were female and/or from ethnic minorities. We also collectively hosted 9 international professors on sabbatical or fellowship visiting for 6-9 months each, as well as 15 additional international short-term visitors for 1-8 weeks (including 7 graduate students). 1.2. Publications: We have published a total of 50 manuscripts during the course of the first NemATOL project (see section 1 in references). and will list only some of the highlights here. A robust phylogeny with good representation of Rhabditina (part of Clade V) was a featured cover article in Current Biology and recommended as a "Must Read" by the Faculty of 1000 (Kiontke et al., 2007). Contributions in the online third volume on Caenorhabditis elegans biology introduced the nematode model research community to the latest developments in nematode phylogeny and to the methods and database developed by our project (Fitch, 2005; De Ley, 2006). Invited reviews on the evolution of nematode parasitism similarly informed, respectively, phytopathological and parasitological audiences (Baldwin et al., 2004; Blaxter et al., 2004). In addition to direct project-related papers, data and database structures produced during our project contributed to crossover research in image recognition algorithms (Wei et al., 2008). 1.3. Workshops and symposia: We have organized five workshops with a collective attendance by 61 national and international experts (in addition to the 5 project PIs) as well as 8 graduate students. These workshops brought together experts to create lists of priority taxa for sequencing, to review the current state of taxonomic understanding in terms of available and unpublished data within each clade, discuss provisional phylogenies based on the foregoing data, and (where considered possible) begin the process of assembling morphological matrices for possible future analysis. In addition to the workshops, our project CoPIs and graduate students chaired dedicated symposium sessions on nematode phylogeny and Nematode AToL project updates at the annual meetings of the Society of Nematologists in 2003, 2004, 2005 and 2007. Each of these symposia had an audience of over 50 nematologists and included brief clade-by-clade overviews of progress and problems with phylogenetic analyses. 1.4. Data advancements to nematode phylogeny: We collectively produced 500 novel near-complete SSU sequences, 1020 new sequence fragments of approximately 600-1000bp of LSU and 669 new sequences of various other loci of known or potential value for phylogenetic analyses. As originally planned, the first Nematode ATOL proposed to obtain 25 orthologous gene sequences from 50 selected nematode species by producing cDNA libraries for each of those species. However, as the project went through its first two years it quickly became evident that cDNA library-based sequencing was no longer the optimal approach, both from theoretical and practical standpoints. Opinion in the field of multigene phylogeny has evolved to a stated preference for phylogenies based on hundreds of genes rather than dozens. Moreover, massively parallel sequencing devices (see below) led to a revolution in molecular biology. Last but not least, complete genome projects were planned (some in consultation with our project) and instigated for at least a dozen additional nematode species. We produced several “traditional” cDNA libraries (e.g., for Anguina sp. and Romanomermis culicivorax) and conducted exploratory sequencing. In the course of this work, it became evident that much greater gains could be expected from newer genomic techniques. PI Thomas therefore undertook a series of preliminary analyses with massively parallel sequencing of cDNA (see 1.5. below as well as sections 3.1 and 3.2). This yielded datasets with hundreds of genes obtained at a fraction of the traditional cost from each of five distantly related nematodes, including contigs representing nearly 200 orthologous genes that are known or expected to be highly informative for phylogenetic analysis. Furthermore, several orthologs were developed by CoPI Fitch for a PCR-based approach to phylogenetic analysis in Clade V. CoPI Nadler has hosted visiting scholar Joong-ki Park in a collaborative exploration of the potential usefulness of complete mitochondrial gene sequences for phylogenetic inference of nematodes. Results of combined analyses of the 12 protein-coding mtDNA genes for 23 nematode species show strong support for clades, and some disagreement with results based on SSU sequences. During the workshops for Clades V and IV, sets of morphological characters and states were defined, scored for >50 representatives of each clade and organized into a character matrix for each clade. During the workshop for Clade I, international collaborator Dr. Peña Santiago summarized his ongoing development of a morphological character matrix for freeliving members of that clade. Although many of the morphological characters thus defined do not provide sufficient resolution by themselves, they often contribute to resolution in important ways when combined with the molecular characters; several morphological characters were found to be apomorphic for particular subclades. Here, we propose to extend relevant data on morphological characters to representatives of major groups and to explore the use of selected developmental charctersto resolve fundamental branches within the phylum Nematoda.

1.5. Methodological advancements to nematode phylogeny: We have collectively developed a number of new or improved protocols and tools that greatly simplify and/or improve the quality and amount of phylogenetic data obtainable from individual nematode species or even individual nematodes. CoPI De Ley has experimented with a previously ignored preservative based on DMSO, EDTA and saturated NaCl and demonstrated that it is particularly useful for preserving and transporting nematode specimens that are still amenable to both PCR amplification and morphological identification (Yoder et al., 2006). In combination with multifocal image vouchering via high-definition camcorder microscopy, this has opened up a wide range of highly affordable investigations of combined morphological and molecular analyses of nematode relationships (demonstrated in e.g. Tandingan De Ley et al., 2007). In parallel, CoPIs Thomas, Nadler, and Baldwin have assembled electronically controlled microscopy and imaging systems that provide a highly standardized approach to multifocal imaging and are an essential step closer towards automation and high-throughput morphological vouchering for ecological or phylogenetic surveys of nematodes (see e.g. Abebe et al., 2004). Yet another innovation pertains to genomic amplification of individual nematodes or other interstitial invertebrates (De Ley et al., 2005). Our most recent innovation is the testing of high-yield mRNA extraction and RT-PCR protocols on minute amounts of nematode biomass (down to a single 5 mm long nematode), in combination with massively parallel sequencing of the resulting cDNA (bypassing library construction) and improved assembly algorithms for the resulting millions of sequence fragments per cDNA sample. Results from this far-reaching enhancement of molecular capabilities are presented in sections 3.1 and 3.2.

1.6. The Nematol database: We have pursued a collaborative databasing strategy that aims to provide open access to our sequence data as well as to the voucher images of our own project as well as those of our national and international colleagues in the discipline. The first Nematode ATOL produced a versatile and extensive website called Nematol, which not only acts as a database but also provides a wide range of other utilities including online phylogenetic pipelines, nematode alignment and tree viewing tools (http://nematol.unh.edu/tree/tree.php), and geographical mapping functions linked with the source locations of sequenced and/or imaged nematode specimens. To date, Nematol has collections of 11,688 DNA sequences (6057, 18S rDNA; 3050, 28S rDNA; and 2581 ITS) and 4,549 image and video files (images 138 files, videos 4411 files) of 627 nematode specimens, which were isolated from 273 samples collected from 208 distinct locations in 26 countries. These nematodes belong to 274 genera and 87 families). It remains clear that because of the quality of the morphological information contained in the vouchers that they are extremely useful for communicating morphological information and are becoming a 'gold standard' in taxonomy (Eyualem et al., 2006). 2. Specific aims of the new Nematode Tree of Life proposal: The previous nematode AToL was highly successful in mobilizing the community of nematode researchers and producing extensive datasets of ribosomal sequences from major and minor clades across the entire phylum. Our published and ongoing analyses demonstrate, however, that several important evolutionary radiations within the phylum cannot be adequately resolved with presently available data. We therefore propose to extend the collective results obtained during the first nematode AToL project by bringing to bear on these outstanding problems in nematode phylogeny a much wider range of data and methodologies.

Aim 1: To resolve major relationships by vastly increasing the number of orthologous molecular characters During the first nematode AToL, we developed and tested several protocols that allow us to start from extremely small amounts of nematode tissue and obtain sequences from large numbers of orthologous genes with high information content for phylogenetic analysis. We now propose to apply the most successful and robust among these protocols to a much wider range of species across the phylum than was previously possible, emphasizing especially the areas of low resolution that correspond mainly to the origins and early evolution of the whole phylum, as well as the origins and early evolution of Rhabditida sensu lato, which is a monophylum that includes most of the medically and economically important parasitic nematode clades.

Aim 2: To use morphological, developmental, and certain other characters to test hypotheses of relationship in particular parts of the tree where experience suggests they will be relevant In addition to obtaining multigene data across the phylum but emphasizing these major and as yet unresolved radiations, we will also bring together published data and obtain new information for a breadth of other types of characters from mitochondrial gene orders, morphological and ultrastructural studies, as well as developmental lineages and patterns. Since truly phylum-wide study and analysis of such character suites is not possible within the limited time and resources of one project, we will focus specifically on those aspects for which previous and ongoing research has shown the greatest direct relevance to the unraveling of the major radiations. In this manner, our new project will have major impacts on science and education by consolidating the infrastructure creations and collaborations established during the first nematode AToL project. It will moreover integrate and reciprocally inform a much wider range of biological disciplines by targeted inclusion of these characters.

3. Plan of Work

3.1. Archiving images and material for specimens representing taxa 3.1.1. Multifocal imaging/Video Capture and Editing (VCE). NemATOL I implemented novel approaches to link molecular data to morphological vouchers, insuring future verification of correct species identification, even after destruction of the actual specimen for DNA extraction; this is accomplished by through-focus digital images linked with sequence data. In the present study, live nematodes selected for Solexa will be assigned a unique ID, immobilized (using routine mounting techniques; De Ley et al., 2005), and multifocal images recorded by VCE. Alternatively, in rare cases where live worms are not available, killed worms for Solexa, shipped for example in RNAlater buffer (http://www.ambion.com/techlib/resources/RNAlater/), will be otherwise vouchered for identification (see below). Alternatively, nematodes selected for DNA amplifications also will be assigned a unique ID and heat-immobilized or preserved in DESS (Yoder et al., 2006; Tandingan De Ley et al., 2007) for similar video recording prior to DNA extraction and PCR amplification (De Ley et al., 2005). Key diagnostic characters of the specimens will be imaged as described in De Ley & Bert (2002) and updated in Yoder et al. (2006). Most of the collaborating PIs already have appropriate digital cameras for this purpose. The resulting images are edited, archived in multi-platform video formats, and integrated into the NemATOL website database. When feasible, digital images will be made for some specimens subsequently preserved as conventional vouchers and deposited in museum collections; yet the digital images have the advantage of preserving data from fresh material and in their capability to be the precise morphological voucher of an individual exactly as it appeared when first observed with light microscopy. We propose that all material for the project be vouchered, including specimens destructively sampled for molecular data as: 1) through focus videos available online, 2) conventional slide mounted voucher specimens deposited in a recognized, curated collection, such as the UC Riverside or UC Davis nematode collections. Voucher images and/or slide mounted specimens will be linked to all metadata appropriate to a specimen or sample.

3.1.2. Frozen strains and material Where possible, live strains will be cultured (and if possible, cryogenically preserved alive), as we have done for many Clade V species (e.g., the NYU Rhabditid Collection). For species that can be isolated in bulk but not cultured (e.g., many parasites and some terrestrial freeliving species), material will be kept at -80C or in liquid nitrogen dewars. In many cases (e.g. marine nematodes and other single-isolate cases), neither archive method will be possible and only image archiving can be used. 3. 2. Multigene molecular data 3.2.1. Nemalogs: pan-orthologous genes for phylogenetic analysis of Nematoda The genetic loci of choice for inferring phylogenetic frameworks for the phylum Nematoda have been the nuclear ribosomal RNA genes (SSU and LSU), with occasional attempts to incorporate other putative nuclear othologs (Kiontke, et al., 2007). The broadest evolutionary frameworks for the phylum have all been based primarily or exclusively on SSU sequences (e.g. Blaxter et al. 1998, Meldal et al. 2006, Holterman et al. 2007). While much has been gained from these analyses, there are three general reasons why additional orthologous loci are critical to advancing our understanding of the relationships in the phylum. First is the need for more phylogenetically informative characters. The SSU and LSU loci have a finite amount of signal, often insufficient to resolve deep and/or closely spaced branches. Second, we need many putative orthologs across the phylum because any single gene’s history can be significantly misleading. The definition of orthologs is imperfect and greatly affected by gene duplication and loss, processes that are common in genome evolution (Lynch and Conery, 2003; Sebat et al., 2004, Tuzun et al., 2005; Korbel et al., 2007). Consequently, any single putative ortholog could produce erroneous inferences and the power of the approach is achieved only by establishing a consensus among putative ortholog gene trees. Finally, because evolutionary rates vary between genes and among data partitions within genes; different putative orthologs may be useful for addressing different phylogenetic questions at different timescales.

As part of our efforts to expand the number of genes for phylogenetic inference, we have refined and au- tomated an approach developed by Blair et al. (2005) to identify single-copy, pan-ortholog sequences across a large set of genomes using both complete genome sequences and unigene sets derived from large-scale EST projects. The goal of this project was to develop a relational database containing pre- computed sets of pair-wise gene comparisons that can be queried to give all known orthologous genes for the entire phylum or any subset of taxa. The tool developed utilizes two distinct methodologies for predic- tion of orthologous genes, a method based on Reciprocal Best Blast (RBB) as implemented by Blair et al. (2005) and an alternative approach based on a threshold percent maximal bit score as developed by Ler- at and colleagues (2003). In anticipation of the growth of genome wide datasets (as proposed here) we have developed a pipeline for analysis that includes a relational database and a web interface (http://ne- matol.unh.edu/ortholog/index.php) allowing users to select taxa and methods of analysis, and then down- load alignment ready putative orthologs in real time from a precomputed set for all nematodes with signifi- cant genome-wide sequence representation.

3.2.2. Datasets:

Table 1. Datasets included in our preliminary development of the Nemalog Database Genes analyzed Clade Species Genera per species In the development of this approach, V 9 6 2654-25775 we leveraged several sources of data including IVB 12 5 488-11174 predicted genes from completely sequenced IVA 3 2 3478-5234 genomes and unigene sets from cDNA/EST III 5 4 2113-17843 sequencing projects. For the EST libraries, the II 0 0 0 data included in this analysis was derived from I 2 2 5448-5951 the unigene clusters available at http://www.nematode.net. The data for both C. elegans and C. briggsae was based on complete genome sequences (http://wormbase.org). The B. malayi data was derived from two sources, the ESTs from http://nematode.net and a set of predicted genes available from the genome sequencing project. These genomes and EST projects have been excellent sources of comparative information but do not represent a thorough sampling of the biological/phylogenetic diversity of this large group. Most specifically Clade II is as yet completely missing from these resources and with the exception of multiple Caenorhabditis and Pristionchus species in Clade V, all available genome wide datasets are from parasitic nematodes. The broad expansion of orthologous genes for phylogenetics will require both the generation of genome/EST datasets from many more as yet unsampled lineages and methods to determine orthologous genes among those datasets. Figure 1. BLAST pipeline for determining intersection sets among datasets. 3.2.3. Pipeline for discovery of pu- tative orthologs.

A diagram of our pipeline for pan- ortholog discovery (Figure 1) begins with a pair-wise protein sequence based BLAST of all genes/unigenes from each organism against all other organisms. All of the results (top five hits) are cached. This includes self BLASTs to allow identification of inparalogs. From these results we are able to implement multiple methods of analysis as queries of the database and currently support both RBB and the method of Lerat et al. (2003). The web interface allows users to select taxa and methods. As implemented in our analysis the results are very similar to those based on much smaller numbers of taxa using the same methodology (Blair et al 2005). By applying these approaches to the current datasets it has been possible to identify large numbers of putative orthologs. As expected, the number of putative orthologs found depends greatly on the number of genes/unigene known for the inclusive set, the divergence of taxa selected, and the number of taxa selected. However, even with the relatively limited resources and very divergent lineages we have identified 14 nematode pan-orthologs.

Figure 2. Selected results from the Nemalogs pipeline showing the number of putative orthologs identified based on subsets of species (marked *) with genome wide resources placed upon the SSU phylogeny for those taxa.

3.2.4. Outcome and limitations: In an analysis of the 14 putative nematode panorthologs derived from 8 species we find that the concatenated set results in a branching pattern that is the same as that predicted by the SSU analysis presented Figure 2. Inference based on individual genes did not always support all clades but no significant conflicts were identified with any single gene. Based on these analyses we now feel we have a useful method for expanding the number of orthologous genes to be analyzed using current resources. However, it is also obvious that we will need to dramatically increase the number of taxa for which we have large-scale gene sequences. Furthermore, alignment of the orthologs resulting from this analysis demonstrates that the design of conserved/degenerate primers for assay of other divergent nematodes will be limited to a very small subset of genes. That approach has not proven workable in this group and ultimately is not cost effective. These observations are a major reason we are proposing the dramatic expansion of EST resources from 160 nematode species carefully sampled across the phylum

3.3. Solexa-Based EST Sequencing: a novel, rapid and low-cost solution for obtaining hundreds of gene sequences from 160 nematode species

3.3.1. Logic of the approach: Recent innovations in sequencing have resulted in two mature technologies Illumina Genome Analyzer (Solexa) and the Roche Genome Sequencer FLX (454, Margulies et al. 2005) both of which now make it possible to generate vast amounts of sequence data from small amounts of starting material without cloning. While the sequence reads generated in these approaches are short (36-250bp), the per-base cost is a fraction of traditional sequencing. Toward our goal of expanding the genes for phylogenetic inference across this diverse phylum we are evaluating both massively parallel sequencing approaches. However, based on our preliminary success described below with the Solexa approach and its already very low cost, we are concentrating our efforts on that technology because we have determined that it has the potential to rapidly expand our genome-wide, nematode datasets for discovery of orthologs for phylogenetic analysis. Our approach aims to discover large numbers of orthologous genes by generating EST-like sequences from approximately 160 nematode taxa representing key lineages across the phylum. Using the Solexa method to sequence cDNAs, it is theoretically possible to generate more unigene sets than are currently available for even the best EST databases using traditional approaches. The major hurdle for the Solexa approach is to cluster and assemble the unigenes from millions of short reads (currently 36bp, however new systems being set up at UC-Riverside and UC-Davis will produce 50bp reads). In theory and in practice with template from small, non-repetitive, homozygous genomes (e.g. E. coli) these reads can be very effective for assembly of relatively large contigs given sufficient sequence coverage (Chaisson et al., 2004; Whiteford et al., 2005). While the coding sequences of eukaryotic genomes represented in the cDNA are less repetitive than the genome as a whole, and are only a fraction of the total size of the complete genome, these cDNAs nevertheless represent specific challenges to assembly. Table 2. Solexa sequencing results and contigs formed for 5 diverse nematode species.

SSAKE contigs Reads First, in real samples, Species (set 1/set 2) (set 1/set 2) including normalized cDNA Plectus minimus 35,572/35,382 6,860,114/5,898,906 preparations, each gene is Romanomermis culcivorax 36,182/36,600 6,577,851/6,349,153 represented by different Hemicycliophora sp. 25,122 5,233,765 numbers of cDNA copies Thoracostoma 40,207 7,211,175 with over-coverage of some microfenestratum genes and under-coverage Diplolaimelloides meyli 36,157 6,489,748 of most. Furthermore, in Total 44,620,712 samples from natural isolates of diploid organisms as proposed herein, heterozygosity is expected to significantly interfere with assembly of contigs, a phenomenon that becomes limiting with short reads. Overall, the extraordinary number of reads in each Solexa run is expected to maximize gene discovery, even though the effect of heterozygosity is expected to limit the initial steps of contig formation. Contaminant cDNA from non-target organisms may also create specific problems, but various steps can be taken to avoid or minimize these. Prokaryotic mRNA can be largely excluded a priori by targeting polyA tails during cDNA synthesis, and eukaryote contaminants that are not close relatives of the target organism can be filtered out a posteriori on the basis of codon usage analysis among genes. In cases where closely related nematode DNA is likely to be included (e.g. as prey tissues in the gut of predatory nematodes), problems can be avoided by starving the target organism for several days before mRNA and cDNA preparation. To test the feasibility of this approach using real samples we have generated EST datasets based on Solexa reads from high-yield cDNA preps produced from minimal quantities of tissue (1-100 individuals) for 5 diverse nematode species. With the specific goal of finding sequences representing orthologous genes, we developed an analysis pipeline. This pipeline begins with the raw reads. These reads are first assembled using the program SSAKE (Warren et al. 2007) to produce sequence contigs. The output from SSAKE is then treated as typical EST sequence and assembled into putative unigenes using the CAP3 assembler (Huang and Madan, 1999). Finally, these unigenes are then compared to the other annotated genomic resources for nematodes to add them to the database of nematode orthologs. In our initial tests to evaluate gene discovery using a common metric for all samples we have compared the unigenes produced by our analysis to a set of 285 pan-orthologs from 9 metazoan species (Blair et al., 2005). In practice, we will simply add these unigenes to the Nemalog pipeline. There are key aspects of this approach that make it particularly useful for our goals of discovering and using orthologous sequences. First, orthologous genes show a strong tendency toward common housekeeping functions and generally high levels of expression (Lee et al., 2002). Consequently, orthologs are more commonly found in EST sequencing databases. Second, while the Solexa sequences are short, they are long enough that when translated in silico to generate significant amino acid matches in protein-sequence based BLASTs. To test these ideas we sequenced cDNAs from 5 different nematode species. (Table 2). From these species, a total of more than 44 million 36-base pair reads were generated. The reads from each species were subjected to the analysis pipeline described above. In two cases multiple flow cells were used for one cDNA sample. Due to processing time constraints we have only combined the analysis of both read sets for Plectus minimus.

3.3.2. Test cases and development of parameters for assembly: One of the limitations to this analysis is the processing time associated with SSAKE contig formation. To speed up the evaluation and optimization process in our preliminary analysis, we reduced the total dataset for Romanomermis by selecting only those reads that were from the mitochondrial genome of this species by blasting all reads against the published Romanomermis mitochondrial genome (Azevedo & Hyman, 1993). This analysis reduced the reads to be assembled and clustered from 14 million to 256K allowing rapid assessment of contig quality and optimization of parameters. Relative to the Romanomermis mitochondrial genome these reads represent >200-fold coverage, and as expected, subsequent alignment of these contigs with the published genome gave on average hundreds of overlapping contigs from SSAKE for each of the 12 mitochondrial, protein-coding genes. The consensus of which represents deep coverage of all 12 genes with some loss of coverage on the extreme 5’ and 3’ ends of each coding sequence, as expected. Based on the evaluation using this reduced dataset we have refined our parameters for the initial contig formations and followed this approach on complete datasets. To evaluate our ortholog discovery we used the 286 animal wide pan-orthologs (Blair et al., 2005) as BLAST targets to assign all clusters to specific genes. This approach permits the generation of overlapping sets of clusters that can be combined to form putative orthologous gene sequences. Table 3. Number of sequence contigs assigned to each of the 285 pan-orthologs for each species. Pan-orthologs 1 or more contigs >3 10+ Species of 285 possible contigs Contigs Clade Plectus minimus 284 258 179 most probable outgroup to III+IV+V Romanomermis 187 26 0 I culcivorax Hemicycliophora sp. 142 81 18 IV Thoracostoma 258 134 20 II microfenestratum Diplolaimelloides meyli 222 60 7 outgroup to Plectida+III+IV+V The numbers in Table 3 represent the number of putative unigenes in the Solexa analysis pipeline (Post CAP3) that hit one of the 285 putative metazoan panorthologs (Blair, 2005). These results represent the first-ever multigene data obtained from a Clade II nematode, as well as first multigene data from the phylogenetically crucial outgroups to the Clades III, IV and V. They demonstrate the versatility and applicability of our approach across phyletic diversity and suggest a very high rate of discovery of likely pan-orthologs. This is particularly true for the Plectus minimus sample where we have successfully analyzed more than 12 million reads to date (almost twice as many as for the other species). In this case all but one of the likely pan-orthologs had at least one matching unigene sequence from the Solexa analysis of that species, while the majority had more than 10 matches. 3.3.3. Next steps: This analysis demonstrates that the Solexa reads can be used effectively to find gene sequences that are likely to be panorthologs from these species and from many others that were hitherto inaccessible or highly impractical for genome-scale sequencing. Our intention is to extend this analysis to use the unigenes produced by the analysis pipeline for Solexa sequences in the generation of Nemalogs (described above). To do this we must first develop consensus sequences for unigenes that appear to match a single gene in other nematode taxa. As is clear in Table 2 many genes are represented by multiple contigs from the analysis pipeline. This is largely due to an inability to assemble short contigs into single unigenes, resulting often in largely overlapping contigs that failed to assemble with one another due to sequence mismatches resulting from heterozygosity (or sequencing errors). While we can extend these contigs by reducing the stringency of assembly we have chosen instead to conduct this final step after the BLAST analysis in the Nemalog process by modifying that pipeline. The advantage of this step is that we will avoid making chimeric assemblies and allow for the production of gapped assembly where a transcript is not completely represented by sequence contigs. Because the goal of this analysis is to generate orthologs that can be used in phylogenetic analysis, contigs that do not align well with their nematode homologs are of minimal use in phylogeny. Similarly, regions of coding sequence that do not match well (and will be selectively excluded in this analysis) were of little use to our goals as they will be excluded in the alignment. However, these contigs from partially assembled genes will provide information for primer design to allow the filling of gaps in these sequences. To address recognized bottlenecks in the analysis, specifically the assembly of Solexa reads into contigs using SSAKE, we have engaged the cooperation of the ToL CIPRES group (http://www.phylo.org) and supercomputing facility in San Diego.

3.4. Analysis of mitochondrial gene sequences and gene order

As mitochondrial genes evolve at rates different from each other and from nuclear loci, the 12 protein coding and two organelle-specific ribosomal RNA genes found in nematode mtDNA are likely to have phylogenetic utility at several different levels within the nematode tree. The phylogenetic signal in these genes may well enable improved resolution of branching order in parts of the nematode tree that are left unresolved in nematode clades with especially slowly evolving SSU and LSU sequences as occur especially in Spirurina (Clade III), Tylenchina (Clade IV) and Dorylaimia (Clade I). Analysis of mitochondrial DNA (mtDNA) sequence data has long been a popular and useful tool for molecular phylogenetic analysis and discussions continue unabated of its relative merits and weaknesses compared to informative nuclear loci (Boore, 2006). Phylogenetic analyses of complete mtDNA genomes for nematodes have included few species to date (Hu et al., 2003); our preliminary analyses show strong support for clades (Figure 3), and certain conflicts in comparison to nuclear ribosomal DNA trees (e.g. Blaxter et al., 1998; Holterman et al., 2006; Meldal et al., 2006). We propose to include this important approach not only as part of the multigene dataset per se, but also to provide hypotheses for mitochondrial gene order evolution and to compare the evolutionary rates of nuclear and mitochondrial genes. The overall size of the multigene dataset to be accumulated in the course of our project creates unique opportunities for addressing many questions specific to phylogenetic affinities among nematodes as well as larger questions regarding metazoan mitochondrial genome evolution. Accepted wisdom is that mitochondrial transcriptional organization, or gene order, is generally stable at lower taxonomic levels (e.g., families) but can vary at higher ranks, as among phyla (Boore and Brown, 1998, Boore, 1999). Gene order has been advocated as a uniquely homoplasy-free source of cladistic characters and typically employed to resolve ancient relationships (e.g. Ryu et al., 2007) but with varying degrees of success (Boore 2006). However, rates of gene order change within the two major nematode taxonomic classes are decidedly different (Hyman, 2007; Tang , 2006). While syntenic relationships are often conserved among the 14 Chromadorean mitochondrial genomes sequenced, no two gene orders are the same among the eight available complete Enoplean mtDNA sequences, nor among three congeners from the genus Romanomermis. The use of mitochondrial gene order as a first-choice phylogenetic tool is at best controversial, and we are aware of the many pitfalls and unknowns that must be considered (Boore, 2006). However, in the case of Chromadorea, we believe that the occurrence of large blocks of synteny reflects a tractable history of individual mutations, matching the premises of Rokas and Holland (2000) about the phylogenetic advantages of rare genomic changes. The opportunity is now available for selective application of this method across a targeted range of chromadorean species where we expect it to be particularly valuable.

Figure 3. Combined analysis of all 12 mtDNA genes of all currently available nematode species (Soonthornpong, Nadler & Joong-Ki, unpublished; Hyman et al., unpublished). Identical topologies are obtained maximum likelihood, maximum parsimony, and neighbor- joining methods at amino acid level and catenated nucleotide sequences including two ribosomal RNA genes with the 12 protein coding genes (excluding 3rd codon positions).

Nematode mtDNA genes which are among the most highly expressed RNAs in tissues, based on the prelimary analyses described above will be well-represented in data from the high-throughput Solexa sequencing effort. As such, large scale mitochondrial gene-by-gene comparisons can be accomplished for all 160 taxa analyzed. While immature mitochondrial transcripts are typically poly-cistronic and would be useful for deducing adjacent genes and gene orders, it is the processed, polyA+ transcripts that provide template for Solexa high-throughput runs. As such, it seems unlikely that Solexa sequencing will enable direct assembly of mitochondrial gene orders. However, divergently oriented primers can be developed for each mitochondrial gene sequence derived from the Solexa sequencing effort, enabling a PCR map of adjacent gene pairs with successful amplifications. This strategy creates a simple two-step process (if necessary) for constructing gene orders in a facile, high- throughput fashion. Analysis of mitochondrial gene orders in nematodes has also been greatly facilitated by our applications of rolling circle amplification (Tang and Hyman, 2005, 2007), and it is anticipated that larger scale mtDNA sampling combined with selective studies of gene order will prove especially valuable in helping us to resolve the phylogenetic relationship of the outgroup orders Plectida and Monhysterida to the three major clades (III, IV and V) contained within the order Rhabditida. Collectively these three orders constitute a deeply embedded monophylum within the Chromadorea. We therefore expect that levels of synteny in this clade will be consistent with previous findings within this class, and appropriate for unambiguous retrieval of tractable gene rearrangements. This solution requires resolution of relatively ancient associations; mitochondrial gene order has found utility for this purpose in many studies (reviewed by Boore et al., 2005).

3.5. Selective analysis of phylogenetically informative development processes

Data on developmental variation among species can be fairly independent of morphological variation (e.g. Kiontke et al., 2007) in the final adult and are expected to be independent of DNA sequence data. Like morphology, however, much of the developmental data obtained to date have revealed a striking degree of homoplasy and when used by themselves to construct phylogenetic trees leave little that is resolved (because homoplasy in one character is usually independent of homoplasy in another developmental character; Kiontke et al., 2007; Brauchle, Kiontke, Fitch & Piano, submitted). For example, multiple convergences and reversals have occurred in male tail tip development, yet most changes occur in the ancestors of well-marked clades (see Kiontke & Fitch, 2005). Despite this homoplasy when analyzed alone, developmental datasets—like morphological ones—tend to increase the resolution of the molecular-based trees when combined in a "total evidence" approach (Kiontke & Fitch, unpub.), presumably because the multiple changes in homoplasious characters can serve as apomorphies for multiple clades supported by the preponderance of the data. That is, the homoplasy of these characters is not due to significant conflict in phylogenetic signal between morphology, development, and molecules. Importantly, there are several developmental characters that have been found to change rarely and are very clearly apomorphic for particular clades. For example, the orientation of a cell division in the 2-cell embryo flipped 90˚ in an ancestor of the clade that includes Protorhabditis, Prodontorhabditis, and Diploscapter (Brauchle et al., unpub.; Dolinski et al., 2001). A change in anteroposterior (AP) axis specification due to sperm entry point is a synapomorphy linking Panagrolaimorpha with Tylenchina (Goldstein et al., 1998; De Ley & Blaxter, 2004). Other early developmental events may also be useful in indicating basal clades (Schierenberg, 2005). A novel ventral migration of the gonad's distal tip cell is synapomorphic between diplogastrids and Rhabditoides inermis (Kiontke & Fitch, unpub.). The posterior positioning and the timing of migration of the phasmids relative to male genital papillae are also informative developmental characters marking particular clades, at least in rhabditids. The latter events are clearly revealed with immunofluoresence using an anti-adherens antibody, MH27, which we have demonstrated has a broad species range (at least Clades IV and V). Thus, an aim of the present grant is to incorporate some developmental characters that appear to change rarely into a "total evidence" approach to test if these datasets will aid resolution of the major branches of nematodes. Such characters will include some easily scored early developmental processes (e.g., AP axis specification events as correlated with P granule segregation; orientation of the earliest cleavage division planes; certain clear cell migration events, including those involved in positioning male genital papillae). Taxa will be chosen for developmental analysis by their representation of major clades on the molecular tree and by other criteria, such as ability to culture embryos). For scoring these characters, we will use currently available imaging systems in our labs. In the total evidence analysis, we will check for robustness of the phylogenetic conclusions by assessing the effects of different character weighting schemes (e.g., each character set equally weighted vs. all characters equally weighted), different methods (parsimony, Bayesian), and different models for character-state changes (e.g., unordered vs. ordered, if there are multistate characters in which a stepmatrix model can be envisioned).

3.6. The role of morphology in resolving major radiations

3.6.1. Using morphology to inform phylogeny. Our experience underscores that diverse and often independent datasets are complementary and inform one another toward resolving phylogenies, molecular and morphological character sets being the prominent example. NemATOL I leveraged clade-specific workshops to discuss and define characters and states to be assessed for phylogenetic utility. These character sets emphasized traditional aspects of morphology to be collected and verified by individual labs in conjunction with relevant new data on character homologies and polarity from developmental and other evidence. Morphological characters were primarily evaluated and used for phylogenetic inference by groups of investigators working within clades rather than across the entire phylum. For some groups (e.g. Rhabditina, Clade V) there is a more detailed characterization of certain classical morphological characters, and a more advanced understanding of their informativeness for phylogenetic analysis (Kiontke, Sudhaus and Fitch, unpub.). For example, a change from a pore-shaped to a slit-shaped vulva is apomorphic for a clade we temporarily call” Eurhabditis”. That the Rhabditis-Adenobia group is the sister taxon of Oscheius is supported by the loss of the pharyngeal median bulb. All species in the “Mesorhabditis group” share a posterior vulva, prodelphic gonad, and complex oviduct and uterus. However, in most cases, molecular frameworks suggest that many characters traditionally used to define taxa are proving to be homoplasious and would be uninformative or even misleading for defining phylogenies if used by themselves. Despite this homoplasy, our morphological character sets tend to increase the resolution of several clades when combined with molecular data in a “total evidence” approach (Kiontke and Fitch, unpub.). Thus for morphological characters that are represented across the phylum, we proposed to extend these analyses to other taxa representing additional major clades (i.e., ~20 members of each of the major clades I-V). Specific taxa will be chosen by their representation of molecularly defined subclades and availability of good specimens. Problems with morphological characters (which often lack detailed characterization) include misinterpretation, phenotypic plasticity, convergence, non-independence and epigenetic factors as well as the large number of continuous characters that can be difficult to recode (Fitch, 1997, Subbotin et al. 2006 Subbotin et al. in review1). Although some of these problems also affect molecular data, resolving them for nematode morphology must take into account that these traditional characters are often at the limits of light microscope resolution (thus leading easily to misinterpretation). Against this background, in the context of NemATOL I, we have demonstrated the value of employing SEM, TEM (including 3D modeling), and confocal-LM, to discover new (and reinterpret classical) characters in representatives of major clades including Rhabditomorpha, Cephaolobmorpha, and Tylenchomorpha, (Baldwin et al., 2004; Bumbarger et al., 2006, 2007, submitted, Dolinski and Baldwin, 2003). These approaches have characterized features that appear homologous and promising for testing specific hypotheses of relationships within Nematol II, including special emphasis on slowly evolving morphological features that may help resolve nodes that have weak support in current molecular trees.

3.6.2. Selection of taxa for targeted morphological investigation The scope of Nematol II limits the number of taxa that can be evaluated with technically demanding morphological tools; thus to maximize overall phylogenetic insight for the phylum, it is crucial that these representatives be selected to best reflect character states across clades and outgroups; this process is supported in Nematol II as major clades are increasingly resolved. Based on our preliminary studies as to what will be the most productive we propose to target: 1) Aspects of lip patterns as observed by SEM, 2) Cellular basis of feeding structures via TEM reconstruction. This includes, in the stoma region, anterior hypodermal, arcade and anterior pharyngeal cells as well as strategic regions of the pharynx including the basal region/bulb, 3) The basic topography of the nervous system in key areas including the head region (sensory organs), and innervation of the male tail (e.g. comparing underlying topography of males with and without caudal sensilla (e.g., papillae) expressed at the surface); this includes the phasmid which is considered a synapomorphy for the Secernentea sensu lato (= Rhabditida) radiation and may or may not include homologs outside Secernentea2 4) Architecture of the secretory/excretory system across the phylum with respect to patterns of glands and canals and expanding on the basis for classical bifurcation of the phylum (e.g. Secernentea and Adenophorea sensu lato; Chitwood and Chitwood, 1977)3. This region is also promising for resolving branching of clades within the Secernentea radiation, including apparent synapomorphies in the particular topography of lateral canals4. 5) Additional promising characters relevant to major branches throughout Nematoda is suggested by ongoing work of collaborator Yushin from TEM of diverse sperm (Yushin and Malaknov, 2004, Yushin and Coomans, 2005), and 6) In patterns of early development by collaborator Schierenberg from 4D resolution of early embryos (Schierenburg, 2006). In addition to participants Yushin and Schierenburg, collaborators Giblin-Davis and Poiras, have long interacted with Baldwin on several projects including classical taxonomy with SEM and TEM; this expertise combined with their ongoing surveys focusing on groups underrepresented (i.e. insect associates) and potentially key to some important radiations, will be invaluable to both morphological and molecular aspects of the proposed projects.

Whereas the morphological selections are undoubtedly ambitious in the context of NemATOL II, we

1For example, in NemATOL I testing clade Pratylenchus demonstrated a large set of classical characters to lack phylogenetic information. However a very small set of redefined classical features and new characters from SEM of lip patterns, while too small for a combined analysis with molecular data, did provide a basis for demonstrating a striking level of congruence using Baysian approaches and pie charts to demonstrate and convey the strength of each node and the strength of each character state at the node (Subbotin et al., submitted). 2By example it would be phylogenetically informative if the pattern of innervation essential for phasmids was conserved in taxa such as Plectida that are sister to the Secernentea, and furthermore if that same conservation does or does not extend more broadly to additional outgroups. 3 The excretory system may also prove informative within clades. Its structure was one of the few morphological characters congruent with molecular trees for Ascaridoidea (Nadler and Hudspeth, 2000) 4 By contrast to tradition TEM methods, fixation by freeze substitution (Bumbarger et al, 2006, 2007) especially preserves excretory canals that are open and readily resolved. propose that the work is tractable because 1) NemATOL II will extend the phylogenetic framework as a basis to efficiently target the most useful representatives of clades, 2) we are building upon long experience of PIs and collaborators including that gained through NemATOL I, 3) areas of special morphological interest can often be elucidated simultaneously. By example, innervation of the anterior region is resolved in conjunction with reconstruction of the stoma (Bumbarger et al., 2006, 2007, Figure 4) and regions of the pharyngeal base and excretory system involve investigating (as for TEM sections) the same region, 4) with development of emerging software and microscopy tools which are already available in our labs, technical processes can often be simplified to access characters across a wider range of taxa. For example, phylogenetically informative patterns of radial pharyngeal muscle cells (De Ley et al, 1995, Baldwin and Eddleman, 1995, Baldwin et al, 1997, Dolinski, and Baldwin, 2003) first elucidated with TEM reconstruction can now be used to interpret character states across a range of taxa using more rapid confocal procedures in conjunction with specific florescence of membrane junctional complexes. The demonstration of confocal microscopy as a useful shortcut has been developed in the Baldwin lab by collaborator Burr, and has built upon methods previously used by PI Fitch (Fitch and Emmons,1995, Burr and Baldwin, unpublished). Indeed, characters first elucidated with detailed TEM reconstruction often "open our eyes" to visualize key features using simpler and more efficient instrumentation. In short, phylogenetic approaches in morphology are designed with proven techniques and the most cost effective and efficient methods possible to 1) document specimens used for molecular data with corresponding records of phenotypes and 2) engage diverse expertise (PIs and collaborating participants) to target the taxa and features most promising toward resolving major radiations within the limited scope of this project. Figure 4. Serial TEM reconstructions depicting cell-cell homologies across taxa with quick time movies A, B anterior nervous system resp. epidermal cells (Cephalobina) C. Anterior epidermal cells and stylet (Tylenchida). (Bumbarger et al. 2006, 2007, Ragsdale et al. submitted)

3.7. Phylogenetic analysis – Molecular characters

3.7.1. Overall strategy. As part of NemAToL I, more than 1,000 SSU (18S) rRNA sequences were used to identify unambiguously supported clades reflecting the diversity of Nematoda (e.g. Kiontke et al. 2007). However, measures of confidence in certain major clades in SSU trees are sometimes low, and results can vary according to inference method and other factors, including the multiple alignment (Smythe et al, 2006). In virtually all cases where it has been applied, combined analysis of near-complete SSU plus LSU sequences provides substantially increased resolution (and clade support) for the nuclear rDNA gene tree (e.g.Kiontke et al. 2004, 2007; Kiontke & Fitch, 2005; Nadler et al., 2006). As part of the nuclear rRNA transcription unit, the LSU is present in many copies per genome and therefore is comparably easy to amplify and sequence. Therefore, one goal is to sample the same 160 species that are targeted for single-copy nuclear gene sequencing (i.e., Solexa and directed methods) for near-complete (all but the extreme 5’ and 3’ ends) LSU rRNA sequences. This will maximize the number of informative characters available for inferring the rRNA gene tree, which has become a benchmark for nematodes, not unlike 16S rRNA in prokaryotes. These same taxa will be targeted for Solexa sequencing of highly expressed single-copy nuclear genes and the 12 protein-coding mtDNA genes, and this combination of molecular datasets should provide enhanced resolution for deep splits (and certain other weakly resolved nodes) in the nematode tree of life. Phylogenetic analyses (details follow) will be performed independently on each molecular dataset (i.e., each presumptive locus) to assess at what levels different loci are contributing to the analysis, and to determine if certain potentially problematic features associated with a few taxa (e.g., compositional bias) are repeated across loci. However, the emphasis will be on combined analyses of molecular datasets, which will include two approaches. First, a “total evidence” analysis of all combined sequence datasets (Kluge, 1998). Second, congruence tests (Partition homogeneity test; Farris et al. 1995) will be performed, and gene partitions that are not found incongruent will be combined for analysis. Both approaches are likely to provide resolution of portions of the tree poorly resolved in the analysis of individual datasets such as the aforementioned SSU rDNA tree (e.g., see Fitch 1997; de Queiroz et al. 1995). Comparison of topologies inferred from “total evidence” and congruence-tested datasets will reveal which clades are sensitive to the assumptions underlying these very different philosophical approaches to combining data partitions. These different philosophical “camps” are present even among the NemATol collaborators, so this dual approach to combined analysis is needed for our own satisfaction.

3.7.2. Characters used for phylogenetic inference. For any particular level of phylogenetic inquiry, we will only use unambiguously aligned regions of the sequences. Previously, we have used secondary structure models for nematode rRNA (SSU and LSU) to guide alignments over difficult regions (e.g. Fitch et al., 1995, Kiontke et al., 2004, Meldal et al., 2007; Subbotin et al., 2007). These models will also be used for guiding the rDNA alignments for these analyses, and rRNA characters will be partitioned into "stems" versus "loops" (see Fitch et al. 1995, Subbotin et al.) to test the effects of differential weighting of such sites on tree inference. Unambiguous regions can be selected without investigator bias by employing the program ProAlign (Löytynoja and Milinkovitch. 2003) to filter characters by their average minimum posterior probability (caveat: some datasets with long internal gaps will not work for ProAlign), or alternatively, by using the Cutter tool developed for this purpose as part of NemAToL I (Cutter is available and described on the NemAToL website). We also intend to test the sensitivity of our phylogenetic conclusions to the use of gap characters by conservatively recoding unambiguous gaps as an additional character state (e.g., see Fitch et al. 1995). Only characters that have states assigned for all of the included taxa will be used for molecular analysis. For example, with Solexa sequencing, this may preclude using some sequence at the 5’ and 3’ termini of certain genes. For the protein-coding genes, the inferred amino acid sequence alignments will be used to guide the sequence alignment of nucleotides, with gaps placed from peptide alignments using RevTrans (http://www.cbs.dtu.dk/services/RevTrans/). Protein-coding genes will be partitioned by codon position to assess the effects of including all positions, versus just 1st and 2nd positions on tree topology. Similarly, phylogenetic hypotheses based on amino acid characters will also be explored because these characters may be more informative for relationships deeper in the tree.

3.7.3. Phylogenetic methods Analysis will initially be performed on the each dataset separately, with the exception of the mtDNA genes, which will be considered to represent a single linked genome for the purposes of analysis. The partition homogeneity (ILD) test will be used to determine if different datasets are significantly incongruent. Genes lacking incongruence will be combined for analysis. If the datasets are incongruent, we will try to determine the characters or taxa responsible (e.g., by analyses with taxon deletions or character elision). Although the ILD test can reveal that different datasets should not be combined, there are reasonable differences in philosophy about approaches to combining data in developing phylogenetic hypotheses (e.g., Huelsenbeck et al., 1996). Therefore, even when incongruence is revealed, a "total evidence" analysis will be performed to determine how different the resulting hypotheses are. In combination with the ILD results, topologies from individual gene analyses may indicate which partitions are mainly responsible for differences between “combined congruent” and total evidence trees. We will use parsimony (MP) maximum likelihood (ML) and Bayesian (BPP) methods as tools to explore these data in different ways. All of these methods are tractable for the size of the datasets to be analyzed (~165 taxa) given current implementations. MP offers an efficient method to explore tree space using relatively simple data models. For example, comparing tree topologies for rRNA gene trees with and without stem:loop character weighting or stepmatrices that incorporate estimates of transition:transversion ratios. Heuristic search strategies will be required given the number of taxa planned for these analyses. For unweighted parsimony analysis, the parsimony ratchet (Nixon, 1999) will be used to increase the odds of escaping local optima and finding shorter trees. Bootstrap resampling (with heuristic searches) will be the primary tool for assessing the relative reliability of clades for parsimony analyses.

Bayesian and ML approaches offer the potential advantages of using an inference method with explicit (and more complex) substitution models. Especially for cases where significant variation exists among branch lengths (including some nematodes), choice of the appropriate model is important for accurately recovering phylogenies (Cunningham et al. 1998). We will compare and choose among models for each gene using the Akaike information criterion using ModelTest (Posada and Crandall, 1998) or other model- fit programs now available as online resources. Because of its relative insensitivity to rate differences along branches and its ability to incorporate corrections of superimposed substitutions at different rates over different sites (Huelsenbeck 1997; Cunningham et al. 1998), ML will be our primary tool for model- based phylogenetic inference for individual genes. Due to the moderate size of these datasets (~165 taxa), implementation of ML will be most feasible using RAxML as implemented through the Cipres portal web server (http://8ball.sdsc.edu:8889/cipres-web/Home.do), which also permits Bootstrap-RAxML analysis. Bayesian ML (Huelsenbeck 2000) will be our preferred tool for analyzing combined datasets since it is particularly effective for implementing a mixed/partitioned model analysis wherein different genes have different best-fit models. The Cipres portal now hosts a parallel implementation of MrBayes as do other sites such as the High-Performance Computing Institute at Cornell (http://cbsuapps.tc.cornell.edu/mrbayes.aspx). Bayesian analysis will be used to test the stability of branches as the posterior probabilities of trees collected after convergence of a search to a stable likelihood value (Huelsenbeck, 2000), which may require several million generations given these datasets. Additionally, once the most-highly resolved branches are established, the clades defined by these branches can be focused on for more intensive tree evaluations and stability tests, including statistical tests of alternative topologies. These approaches, including alternative topology testing, have been used by us in testing nematode phylogenetic hypotheses (Nadler and Hudspeth, 2000; Sudhaus and Fitch 2001). In all cases, tree rooting will be performed by the outgroup method. For the most comprehensive analyses, this will involve using taxa outside of Nematoda (e.g., Nematomorpha and others) as informed by the Protostome ToL project. Finally, new procedures or improvements for multiple alignment, model estimation, phylogenetic inference and alternative topology testing continue to be published at a rapid rate. These include research and development by other ToL projects, for example, advances in methods to perform simultaneous multiple alignment and phylogenetic inference for large datasets (Holder et al. ToL project). We hope to take advantage of these new computation resource tools as appropriate to Nematode ToL project.

4. Implications for Classification

Nematode classification has a long history of controversy and disagreement among experts, often based on seemingly arbitrary choices of preferred taxonomic principles and preferences (see review in De Ley & Blaxter, 2002). Many of these controversies mirror larger discussions and changes in the theory and practice of animal systematics in general, but with especially aggravating factors such as: sheer diversity of species and taxa; fragmentation of expertise along boundaries of zoological subdisciplines (e.g. plant nematology vs helminthology vs meiobenthology vs soil ecology); fragmentation along major language and geography boundaries (e.g. western european vs american vs former soviet bloc vs south asian); difficulties in observing and interpreting characters at the microscopic edge of optical resolution; high preponderance of homoplasy and intraspecific variability. The first SSU rDNA phylogenies to appear in print provided an opportunity to change the nature of these controversies by moving away from the lifetime experience and individual authority of recognized world experts, towards more formal and less personal questions of sequence analysis and comprehensive representation of known diversity. De Ley & Blaxter (2002, 2004) proposed a first classification based primarily on molecular phylogeny, which was relatively compatible with different aspects of different prior morphological systems and proved useful for framing more precise questions in the context of the first Nematode AToL project. However, being very largely based on a single gene, this system was constrained by the apparent limitations of the gene in question, including the lack of resolution of the major clades near the origin of the phylum, as well as around the origin of nematodes with phasmids. Our proposed efforts to resolve these as yet polytomous major radiations (by means of a wide range of data from carefully selected taxa) will also provide added value in permitting a comprehensive reappraisal of this and preceding systems. We propose to do this in a manner that is best suited to attain a relative degree of consensus, through established mechanisms and collaborations continuing from the first Nematode AToL, by encouraging a community-wide discussion of the multigene and multidata trees generated during our new AToL project. Fundamental questions to be addressed will include the relative usefulness and theoretical strengths vs weaknesses of Linnean nomenclature as compared to phylocode proposals, the degree to which phylogenies should inform vs define classifications, how to accommodate different usage of taxon names in different subdisciplines, etc. These discussions will be channeled through our workshops and through symposia at forthcoming conferences, as well as through creation of a publicly accessible section of NemForum, an online forum tool currently used as restricted-acces tool for project management via the Nematol database (see management section). While we are hopeful that a general consensus about nematode classification has now become much easier to reach than in past decades, we will prioritize the production of a comprehensively revised system and make the necessary decisions by consensus within our team of CoPIs, if wider consensus seems to fall prey to protracted or circular debates.

5. Impact and Outreach

A key impact of the previous and proposed nematode ATOL is far reaching promotion of worldwide linkages for communication, integration and comparison across different scientific cultures and disciplines (see previous section). This is accomplished by bringing together into a single evolutionary and taxonomic framework the unparalleled diversity of nematode taxa, including worldwide representation of freeliving species as well as parasites of plants and animals. The outreach is extended by worldwide accessible online databases (http://nematol.unh.edu/), forums and modern information technology at all stages (e.g., digital imaging, multiple types of optical and electron microscopy, testing homologies scoring and polarizing of classical and novel morphological characters, using established and cutting edge methods of sequencing diverse genes and phylogeny reconstruction). The CoPIs each have long- established international ties, and the project engages collections, and collaborators worldwide through workshops and laboratory exchanges. Online resources, including video images and SEM have been and will be further incorporated into teaching programs and workshops worldwide. These resources, databases and enhanced museum collections all provide infrastructure that supports and integrates diverse nematology research beyond the scope of the proposed project. The nematode tree-of-life has and will continue to expand on a superb track record in supporting graduate and published undergraduate research with a high proportion of participation by underrepresented groups. The proposed project expands this excellent record by incorporating research participation of minority undergraduate students at Elizabeth City State University, NC, as well as at least two female graduate students at UCR including one African American.

Beyond phylogenetics, the nematode ATOL supports researchers working on comparative and applied aspects ranging from genomics to parasite control. The phylogenetic framework allows them to find better models for studying the biology of their target organisms. Soil ecologists and biodiversity surveyors are aided by new tools supporting identification and phylogenetic classification, while evolutionary ecologists gain new opportunities to explore the evolution of functional relationships between decomposers (fungi and bacteria), primary producers and their respective nematode regulators. Greater phylogenetic documentation of the interweaving of plant and animal parasitism allows parasitologists to better investigate principles of parasite-host interaction. A phylogenetic framework for parasites will further improve communication with Caenorhabditis genomicists, neurobiologists and embryologists, and especially so for those seeking to expand the diversity of model species compared with C. elegans. Better understanding of the nematode branch also supports new insight into outgroups i.e. the broader Tree-of-Life. The nematode ATOL has been and will continue to be a catalyst for development of technologies for applications beyond phylogenetic analysis. At least two of our collaborators from the previous nematode ATOL are leveraging molecular phylogenetic information in developing assays to monitor nematode communities. One ATOL collaborator (Holterman et al., 2008) has developed methods based on rDNA sequences to assay the abundance of specific plant pathogens in soils, a process that has taken hold in the private sector testing environment. Similarly, collaborator Giblin-Davis and colleagues (Porazinska et al., 2007) have developed massively parallel sequencing approaches based on rDNA sequences to assess nematode communities. Databases, alignments and phylogenetic analyses stemming from our studies are critical to the development of these uses and their evaluation.