C. elegans on the web: used to serve a worldwide scientific community

Carlos Eduardo Winter Department of Parasitology – Instituto de Ciências Biomédicas Universidade de São Paulo – e-mail: [email protected]

Introduction: Nematodes have been used as experimental animals since the 19th century. The research done by Theodor Boveri in the 1890s with Ascaris embryos showed the importance of chromosomes as the carriers of genetic information. During the first half of the 20th century nematodes lost their importance as model organisms in biology as insects rose to become the preferred models, mainly as research objects in physiology and genetics. With the advent of novel molecular techniques resulting from the Molecular Biology revolution, other animals could be used as new objects in biological research. Genetic analysis of free-living nematodes was done by French researchers more than 50 years ago [1]. One of these nematodes is the worm Caenorhabditis elegans. C. elegans is a tiny (~1 mm long) worm, taxonomically described by Maupas in 1900 [2], with a short life cycle (~3 days). It gained notoriety after Sidney Brenner chose it as a new model organism in the middle of the 1960s. After it had been used as a model for studies in developmental genetics and neurobiology, the sequencing of its complete genome in 1998 [3] made it the best organism for studying several biological problems in metazoa. In August 2002 [4] the genome of another species of the genus Caenorhabditis, C. briggsae, was also sequenced; it is currently being annotated using the C. elegans genome as a scaffold. The success of C. elegans as a worldwide model system for biological studies was crowned in 2002 with the Nobel Prize in Physiology or Medicine, awarded to three of the founders of modern C. elegans studies: Sydney Brenner (Molecular Sciences Institute), H. Robert Horvitz (MIT/HHMI) and John Sulston (Sanger Institute). A great part of this success can surely be attributed to the sharing of results through several means of communication. The first of these was the "Worm Breeder’s Gazette" (WBG), modeled on the successful "Drosophila Information Service" (DIS), published since March 1934 and now available on the Internet [5].
As noted in the first issue of DIS, “An appreciable share of credit for the fine accomplishments in Drosophila genetics is due to the broadmindedness of the original Drosophila workers who established the policy of a free exchange of material and information among all actively interested in Drosophila research.” The C. elegans community has the same policy of free exchange of material and information, and this is seen in the WBG, published since December 1975, first in printed form and now available on the web [6]. It publishes preliminary results and methods in C. elegans research, and is an important vehicle for rapidly disseminating results among the C. elegans community.

ACEDB: During the C. elegans genome sequencing project several bioinformatic tools were developed with the aid of the C. elegans research community. The best known of these tools is ACEDB (A C. elegans Database) [7], an object-oriented database system developed by Thierry-Mieg and Durbin (1992) [8] that has also been used for Arabidopsis data (AATDB), human genome data and Drosophila (FLYBASE). The original program was written in C and runs under UNIX, and it has served as the central repository of phenotypic, bibliographic, mapping and sequencing information for the Caenorhabditis elegans community since 1990. More recently it was extended with ACEPERL [9], an object-oriented Perl interface to the ACEDB database. These tools were brought together to create WORMBASE, freely available through the Internet (see below). In this very short review we will try to show some of the web-based bioinformatic tools that have been used by the C. elegans community to help study the biology of this important model animal. Obviously this text will be outdated by the time of printing, and anyone who would like to use C. elegans to solve a biological problem should consult the documents and web sites cited herein.

[1] NIGON V & DOUGHERTY EC (1950) A dwarf mutation in a nematode. A morphological mutant of Rhabditis briggsae, a free-living soil nematode. J. Hered. 41, 103-109.
[2] MAUPAS E (1900) Modes et formes de reproduction des nematodes. Arch. Zool. Exper. Gen. 8:463-624.
[3] The C. elegans Sequencing Consortium (1998) Genome sequence of the Nematode C. elegans: A platform for investigating Biology. Science 282:2012-2018.
[4] http://www.wormbase.org/old_news.html
[5] http://www.ou.edu/journals/dis/
[6] http://elegans.swmed.edu/wli/

CE Winter - Worm Web Resources - 2003

1. WORMBASE

The WormBase contains information on: (1) the essentially complete genome sequence; (2) the developmental lineage of the worm; (3) the connectivity of the nervous system; (4) mutant phenotypes, genetic markers and genetic map information; (5) gene expression described at the level of single cells; and (6) bibliographic resources including paper abstracts and author contact information (Harris et al., 2003) [10]. The Table of Contents of the WormBase manual [11] shows all the different ways of retrieving information (published and unpublished) about C. elegans. One of the strong features of WormBase is its collaboration with the Gene Ontology project [12]. The Gene Ontology project tries to describe biological processes and molecular functions with a controlled vocabulary, which makes functional analysis easier. At the WormBase site you can perform a "Simple Search", such as finding a known or putative gene, browse the genome, do a search, etc. To do that, however, you need some background information about C. elegans biology and genetics. WormBase itself points to some information available on the web, but we suggest that you consult the C. elegans book (also known as the “wormbook”) published by the Cold Spring Harbor Press. The second edition is available on-line, free of charge, through the NCBI books section [13]. After finding what you wanted in WormBase using the search tools, you will be directed to the individual pages. In the current version those pages are arranged in five categories: genes, sequences, gene ontology, microarray data and persons. There is also information on how to obtain strains and clones, on full-text journals, and on C. elegans biology and reagents such as clones and cloning vectors, which can be obtained through the Caenorhabditis Genetics Center (CGC) [14] at the University of Minnesota in Minneapolis. The CGC is responsible for collecting, maintaining, and distributing stocks of C. elegans, maintaining a C. elegans bibliography, and publishing and distributing the Worm Breeder's Gazette, among other things.

[7] http://www.acedb.org/
[8] THIERRY-MIEG J & DURBIN R (1992) AceDB, a C. elegans database. Cahiers IMABIO 5, 15-24.
[9] http://stein.cshl.org/AcePerl/index.html
[10] HARRIS T W, LEE R, SCHWARZ E, BRADNAM K, LAWSON D, CHEN W, BLASIER D, KENNY E, CUNNINGHAM F, KISHORE R, CHAN J, MULLER H-M, PETCHERSKI A, THORISSON G, DAY A, BIERI T, ROGERS A, CHEN C-K, SPIETH J, STERNBERG P, DURBIN R & STEIN L D (2003) WormBase: a cross-species database for comparative genomics. Nucleic Acids Res. 31, 133-137.
[11] http://www.its.caltech.edu/~wormbase/userguide/

2. GENE KNOCKOUT CONSORTIUM (GKC)

The mission of the C. elegans Gene Knockout Consortium [15] is to facilitate genetic research by producing deletion alleles at specific gene targets, using a chemical mutagenesis approach (Edgley et al., 2002) [16].

3. WorfDB: the Caenorhabditis elegans ORFeome Database

WorfDB [17] integrates the data from the cloning of a complete set of ~19,000 predicted protein-encoding Open Reading Frames (ORFs) of C. elegans (also referred to as the “worm ORFeome”). The aim of projects that intend to help complete the annotation of a genome is “to clone ORFeomes once and for all in a referential vector allowing convenient transfer into various expression vectors”. “This technical tour-de-force, hard to imagine using classical cloning techniques, became possible with the emergence of recombinational cloning”. The best-known recombinational cloning system is Gateway™, offered by Invitrogen. A new version of the C. elegans ORFeome has recently been published [18]. The importance of this project becomes clear if we bear in mind that, out of the 19,000 ORFs predicted by the C. elegans genome project, only about 1,200 have been characterized in the last 30 years.

[12] http://www.geneontology.org
[13] http://web.ncbi.nlm.nih.gov:2441/books/bv.fcgi?call=bv.View..ShowTOC&rid=ce2.TOC
[14] http://biosci.umn.edu/CGC/CGChomepage.htm
[15] http://www.celeganskoconsortium.omrf.org/
[16] EDGLEY M, D'SOUZA A, MOULDER G, MCKAY S, SHEN B, GILCHRIST E, MOERMAN D, BARSTEAD R (2002) Improved detection of small deletions in complex pools of DNA. Nucl. Acids Res. 30:e52-e55.
[17] http://worfdb.dfci.harvard.edu
[18] REBOUL J, VAGLIO P, RUAL J-F, LAMESCH P et al. (2003) C. elegans ORFeome version 1.1: experimental verification of the genome annotation and resource for proteome-scale protein expression. Nature Genetics 34:35-41.

The construction of WorfDB is based on the following strategy [19]: predicted ORFs are amplified by PCR from a highly representative cDNA library using ORF-specific primers, cloned by Gateway recombinational cloning, and then sequenced to generate ORF sequence tags (OSTs) as a way to verify identity and splicing.
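The primer and verification steps of this strategy can be illustrated with a minimal Python sketch. It is not the WorfDB pipeline: the function names, the 20-base primer length and the exact-match test are assumptions made for illustration, and the Gateway recombination tails that real primers would carry are left out.

```python
from typing import Tuple


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def orf_specific_primers(orf: str, length: int = 20) -> Tuple[str, str]:
    """Derive a primer pair from the two ends of a predicted ORF.

    Hypothetical helper: the forward primer copies the first `length`
    bases, the reverse primer is the reverse complement of the last
    `length` bases. Recombination tails are deliberately omitted.
    """
    orf = orf.upper()
    if len(orf) < 2 * length:
        raise ValueError("ORF is too short for two non-overlapping primers")
    return orf[:length], reverse_complement(orf[-length:])


def ost_supports_prediction(ost: str, predicted_orf: str) -> bool:
    """Crude identity/splicing check (assumption: exact match, no
    sequencing errors). An OST read from a spliced cDNA clone supports
    the prediction if it occurs verbatim in the predicted spliced ORF."""
    return ost.upper() in predicted_orf.upper()
```

For example, `orf_specific_primers("ATGGCTAAAGGTCCTTGA", length=6)` yields the pair `("ATGGCT", "TCAAGG")`. An OST that spans a wrongly predicted exon junction would fail `ost_supports_prediction`, which is the kind of discrepancy the OST approach is meant to reveal.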

4. NEXTDB: The Nematode Expression Pattern DataBase

NEXTDB [20] is a project that aims to carry out EST analysis and systematic whole-mount in situ hybridization covering the whole 100 Mb C. elegans genome. The project is carried out by Yuji Kohara's group at the National Institute of Genetics in Japan. The database is linked to the EST project also run by the Kohara lab, and all cDNA clones are available through Dr. Kohara's lab and the Caenorhabditis Genetics Center. Version beta-3.0, available on the Internet, contains information on:

1. Map: Visual display of the relationships among the cosmids, predicted genes and the cDNA clones.
2. Image: In situ hybridization images arranged by developmental stage. (This version contains only embryonic in situ images of a limited number of genes, mostly from chromosome III.)
3. Sequence: Tag sequences of the cDNA clones are available.
4. Homology: Results of BLASTX searches are available.

5. Intronerator

The Intronerator is a site containing a collection of tools for exploring the molecular biology and genomics of C. elegans, with a special emphasis on alternative splicing (Kent & Zahler, 2000a and b) [21]. The first page of the site [22] describes the following tools:

1. TRACKS DISPLAY - View splicing diagrams for any gene in the Sanger C. elegans database alongside cDNA and EST alignments. Retrieve DNA sequences with the exons in upper case. Search the literature.
2. WORM ALIGN - Paste in a piece of cDNA and see where it aligns with the C. elegans genome.
3. ALT-SPLICING CATALOG - A catalog of genes for which the cDNA and EST evidence indicates alternative splicing.
4. INTRON DATABASE - Scan ends of introns and neighboring exons for patterns.
5. ALGORITHMS - Descriptions of the algorithms used to develop the Intronerator.
6. NAMELESS CLUSTERS - cDNA and ESTs that align with the genomic data, but are not within 1000 bp of named genes or open reading frames.
7. EXTRACT SEQUENCES - Extract DNA sequences from the database.
8. DOWNLOAD DATA - Download FA, GFF, etc. files used in the database.
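Two of the ideas in this list, sequences returned with exons in upper case and clusters that lie more than 1000 bp from any named gene, are easy to picture with a short Python sketch. This is not Intronerator code: the function names, the 0-based half-open coordinates and the treatment of the 1000 bp boundary are assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (start, end); an assumption


def mark_exons(genomic: str, exons: List[Interval]) -> str:
    """Return the sequence with exons in upper case and the rest in lower case."""
    marked = list(genomic.lower())
    for start, end in exons:
        if not 0 <= start <= end <= len(genomic):
            raise ValueError(f"exon {(start, end)} lies outside the sequence")
        marked[start:end] = genomic[start:end].upper()
    return "".join(marked)


def interval_gap(a: Interval, b: Interval) -> int:
    """Number of bases separating two intervals (0 if they touch or overlap)."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))


def is_nameless(cluster: Interval, named_genes: List[Interval],
                window: int = 1000) -> bool:
    """True if no named gene lies within `window` bp of the aligned cluster.

    Assumption: "within 1000 bp" is read as a gap of at most 1000 bases.
    """
    return all(interval_gap(cluster, gene) > window for gene in named_genes)
```

Calling `mark_exons("acgtacgtac", [(0, 3), (6, 8)])` returns `"ACGtacGTac"`, which makes intron and exon boundaries visible at a glance in plain text.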

[19] REBOUL J, VAGLIO P, TZELLAS N, THIERRY-MIEG N et al. (2001) Open-reading frame sequence tags (OSTs) support the existence of at least 17,300 genes in C. elegans. Nature Genet. 27:332-336.
[20] http://nematode.lab.nig.ac.jp/db/index.html
[21] KENT WJ & ZAHLER AM (2000a) The Intronerator: exploring introns and alternative splicing in C. elegans. Nucleic Acids Research 28, 91-93. KENT WJ & ZAHLER AM (2000b) Conservation, regulation, synteny, and introns in a large-scale C. briggsae-C. elegans genomic alignment. Genome Research 10, 1115-1125.
[22] http://www.cse.ucsc.edu/~kent/intronerator/

The Intronerator uses EST data obtained by the C. elegans EST project at Kohara’s lab (see above). The graphics display the information about the gene (or sequence) of interest and can be zoomed in and out. The display can also be slid to the left or right. The Intronerator also contains information on the C. briggsae genome. More information about the Intronerator and other programs available on the web can be found on Jim Kent's test page [23]. You should be aware that there is a time lag between the information made available at other C. elegans genome sites (like WormBase) and the Intronerator.

6. AcePrimer and e-PCR

AcePrimer v1.1 is available on the web [24] at the C. elegans Genome Sequence Centre, Vancouver, Canada [25]. Primer design is automated using a Perl program (see figure below) that performs the following steps. AcePerl [26] is used to extract gene information and DNA sequence from the C. elegans AceDB database. This information is used to build a map of gene and DNA sequence features, which is used to design primers that flank the gene or specific exons within the gene while avoiding problem areas such as overlapping genes or repeat sequences. Optimal primer sequences are predicted using Primer3 [27] (developed at the Whitehead Institute for Biomedical Research). Primer sets are evaluated using the primer pair penalty (Q-value) assigned by Primer3 and by testing for multiple or unwanted PCR products with "electronic PCR" (e-PCR) [28].

PCR primer design is automated using Perl. The AcePerl (Lincoln D. Stein and Jean Thierry-Mieg, 1998) and GFF Perl modules are used to gather and assemble information about the gene and to get the DNA sequence from AceDB. This information is used to select the best regions in which to design primers and to exclude problem areas, such as overlapping genes or repeat regions. Instructions are sent to Primer3 (Steve Rozen and Helen J. Skaletsky, 1996, 1997), which returns a list of possible primers. These are sorted according to various criteria. Quality control is performed by using e-PCR (“electronic PCR”; G.D. Schuler, 1997) to scan the entire C. elegans genome for potential false priming or spurious PCR products.
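The quality-control idea, scanning a genome for every place where a primer pair could produce a product, can be sketched in a few lines of Python. This is a simplified stand-in and not the NCBI e-PCR program or AcePrimer itself: it assumes exact primer matches, looks only at the forward strand, and uses hypothetical function names and an illustrative `max_size` limit.

```python
from typing import List, Tuple


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_all(text: str, pattern: str) -> List[int]:
    """Return the start positions of every (possibly overlapping) match."""
    hits = []
    pos = text.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1)
    return hits


def predicted_products(genome: str, forward: str, reverse: str,
                       max_size: int = 3000) -> List[Tuple[int, int]]:
    """List (start, size) for each product the primer pair could amplify.

    Assumptions: exact matches only, forward-strand orientation only.
    The reverse primer anneals where its reverse complement appears
    downstream of the forward primer site.
    """
    genome = genome.upper()
    forward_sites = find_all(genome, forward.upper())
    reverse_sites = find_all(genome, reverse_complement(reverse))
    products = []
    for f in forward_sites:
        for r in reverse_sites:
            size = r + len(reverse) - f
            if r >= f + len(forward) and size <= max_size:
                products.append((f, size))
    return products


def is_specific(genome: str, forward: str, reverse: str,
                max_size: int = 3000) -> bool:
    """A primer pair passes if it predicts exactly one product."""
    return len(predicted_products(genome, forward, reverse, max_size)) == 1
```

A pair that yields a single entry from `predicted_products` would pass this check, while a pair that also primes in a repeated region yields several entries and would be rejected. A fuller version would also search the reverse strand and tolerate mismatches.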

7. WORMATLAS

WORMATLAS is a consortium that aims to fully describe the anatomy of C. elegans [29]. It is currently under development and contains information about some organs and systems. In the future you will be able to find any cell in the worm and, through links with WormBase, collect information about genes expressed in that cell in wild-type animals and mutant strains. The site includes a handbook of C. elegans anatomy [30]. It also contains a Slidable Worm, which is not yet complete. All information on the nervous system and neuron positions was gathered from the classic paper of White et al. (1986) [31]. Several papers on C. elegans anatomy and cell fate are also available, recent ones in PDF and older ones in HTML format.

[23] http://www.cse.ucsc.edu/~kent/test/test.html
[24] http://elegans.bcgsc.bc.ca/gko/aceprimer.shtml
[25] http://elegans.bcgsc.bc.ca/
[26] http://stein.cshl.org/AcePerl/
[27] http://www-genome.wi.mit.edu/genome_software/other/primer3.html
[28] http://www.ncbi.nlm.nih.gov/genome/sts/epcr.cgi
[29] http://www.wormatlas.org/

8. The Future

What will be the future bioinformatic tools for C. elegans research? The genome is completely sequenced, but the task of annotation will continue for several years. How can these results be integrated into a meaningful biological picture of the worm? Recently Kajita et al. published a paper [32] in which they assert that "the ultimate goal of bioinformatics is to reconstruct biological systems in the computer". They focused their attention on the cellular level of C. elegans development and constructed a computer model of the C. elegans embryo up to the 4-cell stage. In the future this model will be integrated with what we know about the developmental gene regulatory networks [33] controlling cell-cell interaction and cellular structure. A better view of what lies ahead can be found in the review by Vidal [34].

São Paulo, July 11th, 2003

[30] http://www.wormatlas.org/handbook/contents.htm
[31] WHITE JG, SOUTHGATE E, THOMSON JN and BRENNER S (1986) The structure of the nervous system of the Nematode Caenorhabditis elegans. Phil. Trans. Royal Soc. London 314B:1-340.
[32] KAJITA A, YAMAMURA M and KOHARA Y (2003) Computer simulation of the cellular arrangement using physical model in early cleavage of the nematode Caenorhabditis elegans. Bioinform. 19, 704-716.
[33] DAVIDSON EH, McCLAY DR and HOOD L (2003) Regulatory gene networks and the properties of the developmental process. Proc. Nat. Acad. Sci USA 100, 1475-1480.
[34] VIDAL M (2001) A biological atlas of functional maps. Cell 104, 333-339.