C. elegans on the Web: Bioinformatics Used to Serve a Worldwide Scientific Community

Carlos Eduardo Winter
Department of Parasitology – Instituto de Ciências Biomédicas
Universidade de São Paulo – e-mail: [email protected]

Introduction:

Nematodes have been used as experimental animals since the 19th century. Research done by Theodor Boveri in the 1890s with Ascaris embryos showed the importance of chromosomes as the carriers of genetic information. During the first half of the 20th century nematodes lost their importance as model organisms in biology as insects became the preferred models, mainly in physiology and genetics. With the advent of novel molecular techniques, a result of the molecular biology revolution, other animals could be used as new objects of biological research. Genetic analysis of free-living nematodes was carried out by French researchers more than 50 years ago [1]. One of these nematodes is the worm Caenorhabditis elegans.

C. elegans is a tiny (~1 mm long) nematode worm, taxonomically described by Maupas in 1900 [2], with a short life cycle (~3 days). It gained prominence after Sydney Brenner chose it as a new model organism in the mid-1960s. After it had been used as a model for studies in developmental genetics and neurobiology, the sequencing of its complete genome in 1998 [3] made it the organism of choice for studying several biological problems in metazoans. In August 2002 [4] the genome of another species of the genus Caenorhabditis, C. briggsae, was also sequenced and is currently being annotated, using the C. elegans genome as a scaffold. The success of C. elegans as a worldwide model system for biological studies was crowned in 2002 with the Nobel Prize in Physiology or Medicine, awarded to three of the founders of modern C. elegans studies: Sydney Brenner (Molecular Sciences Institute), H. Robert Horvitz (MIT/HHMI) and John Sulston (Wellcome Trust Sanger Institute).
A great part of this success can surely be attributed to the sharing of results through several means of communication. The first was the "Worm Breeder’s Gazette" (WBG), modeled on the successful "Drosophila Information Service" (DIS), published since March 1934 and now available on the Internet [5]. As noted in the first issue of DIS: “An appreciable share of credit for the fine accomplishments in Drosophila genetics is due to the broadmindedness of the original Drosophila workers who established the policy of a free exchange of material and information among all actively interested in Drosophila research.” The C. elegans community has the same policy of free exchange of material and information, and this is seen in the WBG, published since December 1975, first in printed form and now on the web [6]. It publishes preliminary results and methods in C. elegans research, and is an important vehicle for rapidly disseminating results among the C. elegans community.

[1] NIGON V & DOUGHERTY EC (1950) A dwarf mutation in a nematode. A morphological mutant of Rhabditis briggsae, a free-living soil nematode. J. Hered. 41, 103-109.
[2] MAUPAS E (1900) Modes et formes de reproduction des nematodes. Arch. Zool. Exper. Gen. 8:463-624.
[3] The C. elegans Sequencing Consortium (1998) Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282:2012-2018.
[4] http://www.wormbase.org/old_news.html
[5] http://www.ou.edu/journals/dis/
[6] http://elegans.swmed.edu/wli/

CE Winter - Worm Web Resources - 2003

ACEDB:

During the C. elegans genome sequencing project several bioinformatic tools were developed with the aid of the C. elegans research community. The best known of these tools is ACEDB (A C. elegans Database) [7], an object-oriented database developed by Thierry-Mieg and Durbin (1992) [8] that was also used for Arabidopsis data (AATDB), human genome data and Drosophila (FLYBASE).
The original program was written in C, runs on UNIX, and has served as the central repository of phenotypic, bibliographic, mapping, and sequencing information for the Caenorhabditis elegans community since 1990. More recently it was extended with ACEPERL [9], an object-oriented Perl interface to the ACEDB database. These tools were brought together to create WORMBASE, freely available through the Internet (see below).

In this very short review we will try to show some of the web-based bioinformatic tools that have been used by the C. elegans community to help study the biology of this important model animal. Obviously this text will be outdated by the time of printing, and anyone who would like to use C. elegans to solve a biological problem should consult the documents and web sites cited herein.

1. WORMBASE

WormBase contains information on: (1) the essentially complete genome sequence; (2) the developmental lineage of the worm; (3) the connectivity of the nervous system; (4) mutant phenotypes, genetic markers and genetic map information; (5) gene expression described at the level of single cells; and (6) bibliographic resources including paper abstracts and author contact information (Harris et al., 2003) [10]. The Table of Contents of the WormBase manual [11] shows all the different ways of retrieving information (published and unpublished) about C. elegans. One of the strong features of WormBase is its collaboration with the Gene Ontology project [12]. The Gene Ontology project aims to describe biological processes and molecular functions with a controlled vocabulary, which makes functional analysis easier.

[7] http://www.acedb.org/
[8] THIERRY-MIEG J & DURBIN R (1992) AceDB, a C. elegans database. Cahiers IMABIO 5, 15-24.
[9] http://stein.cshl.org/AcePerl/index.html
[10] HARRIS T W, LEE R, SCHWARZ E, BRADNAM K, LAWSON D, CHEN W, BLASIER D, KENNY E, CUNNINGHAM F, KISHORE R, CHAN J, MULLER H-M, PETCHERSKI A, THORISSON G, DAY A, BIERI T, ROGERS A, CHEN C-K, SPIETH J, STERNBERG P, DURBIN R & STEIN L D (2003) WormBase: a cross-species database for comparative genomics. Nucleic Acids Res. 31, 133-137.
[11] http://www.its.caltech.edu/~wormbase/userguide/

At the WormBase site you can perform a "Simple Search" (for example, to find a known or putative gene), browse the genome, run a BLAST search, and so on. To do that, however, you need some background in C. elegans biology and genetics. WormBase itself offers some of this information, but we suggest that you consult the C. elegans book (also known as the “wormbook”) published by Cold Spring Harbor Press. The second edition is available on-line, free of charge, through the NCBI books section [13].

Once you have found what you wanted using the search tools, you will be directed to the individual pages. In the current version those pages are arranged in five categories: genes, sequences, gene ontology, microarray data and persons. There is also information on how to obtain strains and clones, on full-text journals, and on C. elegans biology and reagents such as clones and cloning vectors, which can be obtained through the Caenorhabditis Genetics Center (CGC) [14] at the University of Minnesota in Minneapolis. The CGC is responsible for collecting, maintaining, and distributing stocks of C. elegans, maintaining a C. elegans bibliography, and publishing and distributing the Worm Breeder's Gazette, among other things.

2. GENE KNOCKOUT CONSORTIUM (GKC)

The mission of the C. elegans Gene Knockout Consortium [15] is to facilitate genetic research through the production of deletion alleles at specific gene targets, using a chemical mutagenesis approach (Edgley et al., 2002) [16].

3. WorfDB: the Caenorhabditis elegans ORFeome Database

WorfDB [17] integrates the data from the cloning of a complete set of ~19,000 predicted protein-encoding Open Reading Frames (ORFs) of C. elegans (also referred to as the “worm ORFeome”). The aim of projects intended to help complete the annotation of a genome is “to clone ORFeomes once and for all in a referential vector allowing convenient transfer into various expression vectors”. “This technical tour-de-force, hard to imagine using classical cloning techniques, became possible with the emergence of recombinational cloning”. The best-known recombinational cloning system is Gateway™, offered by Invitrogen. A new version of the C. elegans ORFeome has recently been published [18]. The importance of this project is clear if we bear in mind that, of the ~19,000 ORFs predicted by the C. elegans genome project, only approximately 1,200 have been characterized in the last 30 years.

[12] http://www.geneontology.org
[13] http://web.ncbi.nlm.nih.gov:2441/books/bv.fcgi?call=bv.View..ShowTOC&rid=ce2.TOC
[14] http://biosci.umn.edu/CGC/CGChomepage.htm
[15] http://www.celeganskoconsortium.omrf.org/
[16] EDGLEY M, D'SOUZA A, MOULDER G, MCKAY S, SHEN B, GILCHRIST E, MOERMAN D, BARSTEAD R (2002) Improved detection of small deletions in complex pools of DNA. Nucl. Acids Res. 30:e52-e55.
[17] http://worfdb.dfci.harvard.edu
[18] REBOUL J, VAGLIO P, RUAL J-F, LAMESCH P et al. (2003) C. elegans ORFeome version 1.1: experimental verification of the genome annotation and resource for proteome-scale protein expression. Nature Genetics 34:35-41.

The construction of WorfDB is based on the following strategy [19]: predicted ORFs are amplified by PCR from a highly representative cDNA library using ORF-specific primers, cloned by Gateway recombinational cloning and then sequenced to generate ORF sequence tags (OSTs) as a way to verify identity and splicing.

4. NEXTDB: The Nematode Expression Pattern DataBase

NEXTDB [20] is a project that aims to carry out EST analysis and systematic whole-mount in situ hybridization covering the whole 100 Mb C. elegans genome. The project is carried out by Yuji Kohara's group at the National Institute of Genetics in Japan.
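
The idea behind the OSTs mentioned in the WorfDB section (sequencing a cloned ORF "to verify identity and splicing") can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the function names, the result labels and the made-up sequences are hypothetical, exact string matching stands in for real sequence alignment, and nothing here is taken from the actual WorfDB pipeline.

```python
"""Toy check of an ORF sequence tag (OST) against a gene prediction.

All names and sequences are hypothetical; this is not the WorfDB pipeline.
"""

# Hypothetical result labels used only in this sketch.
JUNCTION_CONFIRMED = "identity and splice junction confirmed"
IDENTITY_ONLY = "identity confirmed, no junction covered"
GENOMIC_ONLY = "matches genomic DNA but not the predicted splicing"
NO_MATCH = "no exact match"


def splice(genomic, exons):
    """Join exon intervals (0-based, end-exclusive) into the predicted ORF."""
    return "".join(genomic[start:end] for start, end in exons)


def junction_positions(exons):
    """Offsets in the spliced ORF where one exon ends and the next begins."""
    positions, offset = [], 0
    for start, end in exons[:-1]:
        offset += end - start
        positions.append(offset)
    return positions


def classify_ost(ost, genomic, exons):
    """Classify an OST by exact matching (real reads would need alignment)."""
    ost, genomic = ost.upper(), genomic.upper()
    orf = splice(genomic, exons)
    hit = orf.find(ost)  # first occurrence only; enough for a toy example
    if hit != -1:
        hit_end = hit + len(ost)
        # The tag supports the predicted splicing only if it spans a junction.
        if any(hit < j < hit_end for j in junction_positions(exons)):
            return JUNCTION_CONFIRMED
        return IDENTITY_ONLY
    if ost in genomic:
        return GENOMIC_ONLY
    return NO_MATCH


# Toy gene: two 6-base exons separated by an 11-base intron (made-up sequence).
GENOMIC = "ATGGCC" + "GTAAGTTTCAG" + "GGATAA"
EXONS = [(0, 6), (17, 23)]

if __name__ == "__main__":
    for tag in ("GCCGGA", "ATGGCC", "GCCGTAAG", "TTTTTT"):
        print(tag, "->", classify_ost(tag, GENOMIC, EXONS))
```

A tag that lies inside a single exon confirms only which gene was cloned, whereas a tag that spans an exon-exon boundary also supports the predicted intron; a tag that matches the genomic sequence but not the spliced prediction suggests that the predicted gene structure should be re-examined.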
