<<

Downloaded from .cshlp.org on October 2, 2021 - Published by Cold Spring Harbor Laboratory Press

Insight/Outlook WABA Success: A Tool for Sequence Comparison between Large

David L. Baillie1 and Ann M. Rose2,3 1Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia V5A 1S6 Canada; 2Department of Medical Genetics, University of British Columbia, Vancouver V6T 1Z3 Canada

Whole-genome sequence compari- Missouri. The project was given a jump- containing transcripts. Further compli- sons between bacterial sequences are start with the construction of a physical cations adding to the problem of one thing, but try comparing two eu- map (Marra et al. 1997) and an open in- finding include the still-murky rules of karyotic genomes, each containing tens vitation to the research community to sequence identification unknown for or hundreds of millions of . participate in the selection of fosmids numerous important genomic elements And try to do it on your desktop ma- for sequence analysis. Researchers from (snRNAs, , factor chine in your office or at home. That is around the world probed microarrayed binding sites, replication origins, chro- what Kent and Zahler (2000) have tried, fosmid filters with their favorite gene matin folding and packaging signals, and the results are presented in this issue and contacted The Genome Sequencing meiotic pairing information, etc.). Be- of Genome Research. The use of evolu- Center with a request to sequence the cause of these issues, the initial annota- tionary conservation to unveil func- identified fosmid. Currently, approxi- tion of the C. elegans genome was done tional information contained within ge- mately 10% of the genome has now interactively using GENEFINDER, a pro- nomes is not new. In the case of the been completed and is available at gram written by Phil Green and nematode, comparisons of Caenorhabdi- http://www.genome.wustl.edu/pub/ LaDeana Hillier (unpubl.). The program, tis elegans to its close relative Caenorhab- gsc1/sequence/st.louis/briggsae/finish/. which was ahead of its time, did a re- ditis briggsae go back as far as Emmons et Comparison of C. elegans and C. briggsae markable job of gene prediction. Taken al. (1979). Snutch (1984) made the first sequence has facilitated the identifica- together with manual interpretation, C. briggsae genomic library available to tion of cis-linked regulatory elements the cDNA data (Kohara 1996) and re- the research community. These two (Heine and Blumenthal 1986), the clon- lated genomic sequence data from C. nematodes are almost identical in mor- ing of as a result of their syntenic briggsae, gene structure predictions were phology and development, yet their ge- relationship (Kuwabara and Shah 1994), made for > 19,000 C. elegans genes. nomes have been separated for a suffi- and interpretation of complex gene Kent and Zahler (2000) have taken cient amount of time to allow intronic structure (Thacker et al. 1999). In many advantage of the availability of genomic and intragenic sequences to become ef- cases examined, adjacent genes are con- sequence from these two closely related fectively randomized, while protein- served both in position and orientation, nematodes to test the feasibility of do- encoding sequences and cis-linked regu- with the occasional interruption caused ing large-scale alignments between ge- latory elements are conserved. C. brigg- by a or an appar- nomic DNA of different species. They sae has diverged from C. elegans in ent pseudogene. have developed an algorithm for se- regions of unselected sequence, the Prior to the availability of C. briggsae quence comparison in which every third middle of large introns, between genes, sequence, gene feature identification base (the wobble position of the codon) and at the third of synony- was done by abinitio prediction. Many is ignored. The wobble-aware bulk mous codons in genes not abundantly computer programs exist that attempt to aligner (WABA) allows the sensitive translated. Based on the now-outdated predict the exon-intron structure of identification of conserved coding re- concept of a constant evolutionary genes from genomic sequence. These gions. They use this algorithm in a clock, these two species were estimated programs vary in their accuracy and are three-tiered process to identify con- to have diverged some 30–60 million at their worst when asked to predict the served regions between eight million years ago. These facts prompted a C. first and last exons of genes, often fail- base pairs of C. briggsae genomic se- briggsae genome sequencing effort to be ing to identify correctly exons and in- quence and the entire 97 million base initiated by The Washington University trons outside the actual coding element. pairs of C. elegans. Their analysis was Genome Sequencing Center in St. Louis, Messenger RNA and EST-based feature performed on a readily available 450 detection is often confounded by large mHz Intel-based machine. The results 3Corresponding author. E-MAIL [email protected]; FAX (604) 822- transcripts, genes that have low levels of they have achieved are remarkable and 5348. transcription, or genes lacking poly(A)+- will provide a useful resource for all in

10:1071–1073 ©2000 by Cold Spring Harbor Laboratory Press ISSN 1088-9051/00 $5.00; www.genome.org Genome Research 1071 www.genome.org Downloaded from genome.cshlp.org on October 2, 2021 - Published by Cold Spring Harbor Laboratory Press

Baillie and Rose

1072 Genome Research www.genome.org Downloaded from genome.cshlp.org on October 2, 2021 - Published by Cold Spring Harbor Laboratory Press

Insight/Outlook

Figure 1 (A) Exon prediction in genomic regions near bli-4 (also called K04F10; from Kent and Zahler at http://www.cse.ucsc.edu/∼kent/cgi-bin/ tracks.exe?where=K04F10). (B) Schematic representation of the bli-4 gene from C. elegans (top)andC. briggsae (bottom). The common region(s) encoding the signal peptide, prodomain, protease domain, and middle domain are shown in brown. Alternatively spliced exons that encode carboxyl termini unique to the individual isoforms are labeled alphabetically and color coded. The position and extent of the e937 3325-bp deletion is indicated. Open boxes represent noncoding exons or untranslated regions; shaded boxes represent coding exons. Hatched boxes (labeled I–VIII) represent regions of nucleotide homology that may constitute regulatory elements, particularly those at the 5Ј end. (Xb) XbaI; (Blp) BlpI; (Bgl) BglII; (Xho) XhoI. The approximate location of the C. briggsae bli-4 gene is indicated by dashed lines on the fosmid clones used in Thacker et al. (1999). The entire sequence of clones G25K01 and G06P23 has been determined, whereas G26K16 was used solely for transformation rescue experiments. (Reprinted with permission from Thacker et al. 1999.) the continuing analysis of genomes. ness of the syntenic segments, which is large vertebrate genomes. Computa- One of the examples given by the au- partly a consequence of the length of tional comparative genomics is expected thors is the bli-4 (K04F10.4) gene. The the C. briggsae clones. The number of to become an integral part of the analy- structure of this gene was characterized fragments that clones are broken into by sis of human and mouse genomes and previously by examination of the con- the alignment exhibited a bimodal dis- will be needed to identify coding ele- servation between C. elegans and C. tribution, which corresponds to some ments and other functional compo- briggsae (Thacker et al. 1999; Fig. 1B). extent to the position of the sequence nents resistant to analysis by more con- Traditionally, gene prediction programs on the C. elegans autosomes. In the ventional genetic and molecular ap- have been forced to ignore the compu- gene-rich central clusters, long align- proaches. tationally complex problem of ab-initio ments were observed, predicted to con- prediction of splice variants. Neither stitute approximately 40% of the ge- REFERENCES ACEDB nor the Genie program predict nome. In contrast, the flanking arms C. elegans Sequencing Consortium. 1998. Science the entire nine alternate transcripts de- were more susceptible to rearrangement. 282: 2012–2018. Emmons, S.W., Klass, M.R., and Hirsh, D. 1979. scribed in Thacker et al. (1999). Figure It has been known for some time that Proc. Natl. Acad. Sci. 76: 1333–1337. 1A shows the WABA/intronator display meiotic recombination is much higher Heine, U. and Blumenthal, T. 1986. J. Mol. Biol. for the gene. The areas of C. briggsae ho- on the arms. This fact and 188: 189–198. Kohara, Y. 1996. Protein Enzyme mology identified by the WABA pro- the observation that sequence similari- 41: 715. gram would greatly assist researchers by ties to other than nematodes Kuwabara, P.E. and Shah, S. 1994. Nucleic Acids alerting them to the strong possibility tends to be lower on the arms led the C. Res. 22: 4414–4418. that alternative transcripts may exist. elegans Sequencing Consortium (1998) Marra, M.A., Kucaba, T.A., Dietrich, N.L., Green, E.D., Brownstein, B., Wilson, R.K., McDonald, The intronator display brings together to suggest that the DNA in the arms K.M., Hillier, L.W., McPherson, J.D., and gene predictions, C. briggsae homolo- might be evolving more rapidly than Waterston, R.H. 1997. Genome Res. gies, and cDNA information in an easily that in the central regions of the auto- 7: 1072–1084. used and intuitive fashion. somes. Snutch, T.P. 1984. Ph.D. thesis, Simon Fraser University, Burnaby, B.C. One surprising observation made by Computational tools, like WABA, Thacker, C., Marra, M.A., Jones, A., Baillie, D.L., Kent and Zahler (2000) was the short- will be essential for future analysis of and Rose, A.M. 1999. Genome Res. 9: 348–359.

Genome Research 1073 www.genome.org Downloaded from genome.cshlp.org on October 2, 2021 - Published by Cold Spring Harbor Laboratory Press

WABA Success: A Tool for Sequence Comparison between Large Genomes

David L. Baillie and Ann M. Rose

Genome Res. 2000 10: 1071-1073 Access the most recent version at doi:10.1101/gr.10.8.1071

References This article cites 6 articles, 4 of which can be accessed free at: http://genome.cshlp.org/content/10/8/1071.full.html#ref-list-1

License

Email Alerting Receive free email alerts when new articles cite this article - sign up in the box at the Service top right corner of the article or click here.

To subscribe to Genome Research go to: https://genome.cshlp.org/subscriptions

Cold Spring Harbor Laboratory Press