Comparative and Functional Genomics Comp Funct Genom 2003; 4: 654–659. Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cfg.336 Meeting Review 11th Intelligent Systems for Molecular 2003 (ISMB 2003) Brisbane Convention Centre, Brisbane, Australia, 29 June–3 July 2003

Catherine A. Abbott* School of Biological Sciences, Flinders University, Adelaide, SA, Australia

*Correspondence to: Abstract Catherine A. Abbott, School of Biological Sciences, Flinders This report profiles the keynote talks given at ISMB03 in Brisbane, Australia by Ron University, GPO BOX 2100, Shamir, , John Mattick, Yoshihide Hayashizaki, Sydney Brenner, the Adelaide, SA, Australia. Overton Prize winner, Jim Kent, and the ISCB Senior Accomplishment Awardee, E-mail: David Sankov. Copyright  2003 John Wiley & Sons, Ltd. cathy.abbott@flinders.edu.au

Received: 22 September 2003 Revised: 25 September 2003 Accepted: 29 September 2003

Introduction Evolutionary Analysis’ and ‘Molecular Modelling: building a 3D protein structure from its sequence’, This year for the first time the annual meeting of and on more cutting-edge topics such as ‘Bioethics the International Society of Computational Biol- for Bioinformaticists’, ‘Artificial Intelligence and ogy (ISCB) was held in the Southern hemisphere. Machine Learning Techniques for ’ The 11th International Conference on Intelligent and ‘Data Warehousing in ’. Systems in Molecular Biology (ISMB2003) came The official meeting began on Monday with a ‘down under’ and was held at the Brisbane Conven- warm welcome by the co-chairs of the meeting, tion Centre, Brisbane, Australia, 29 June–3 July Gene Myers and Mark Ragan, to the 919 registrants 2003. Unfortunately, the promised great Aussie from 42 different countries. This review covers the sun did not come out as much as was expected keynote and prize-winning talks that were delivered but the programming committee delivered a stim- across the 4 days of the meeting. All other talks are ulating program where the ‘bio’ was once again published as papers in the journal Bioinformatics returned to bioinformatics. (http://bioinformatics.oupjournals.org/). Notable As is traditional at ISMB, the meeting was speakers at ISMB 2003 included: Sydney Bren- preceded by Special Interest Group (SIGS) satel- ner (one of three recipients of the 2002 Nobel lite meetings and tutorials. These SIGS were well Prize in Medicine, founder of the Molecular Sci- attended, with the numbers of participants rang- ences Institute and Distinguished Research Profes- ing from 98 for the Bioinformatics Open Source sor at the Salk Institute); David Haussler (Howard Conference, 100 for Biopathways, 51 for Bio- Hughes Medical Institute Investigator and Pro- Ontologies, 28 for WEB03, and 42 for Text Mining fessor of Computer and Information Sciences at (BioLINK). The tutorials were popular, as usual, the University of California, Santa Cruz); Yoshi- with over 500 ISMB registrants taking part in the hide Hayashizaki (Project Director of the Genome 15 tutorials offered. These were held on tradi- Exploration Research Group at the Genomic Sci- tional topics, such as ‘Molecular Phylogenetics and ences Center at RIKEN); John Mattick (Director of

Copyright  2003 John Wiley & Sons, Ltd. Meeting Review 655 the Institute for Molecular Bioscience, University One of the grand challenges of human molecular of Queensland); ( of Com- evolution is to reconstruct the evolutionary history puter Science at Tel-Aviv University); and Michael of each base in the human genome. Great sums of Waterman (Professor of , Computer money were spent on getting a draft sequence of Science and Biological Science at the University the human genome, so we now need to use this of Southern California). information fruitfully. Genome browsers, such as This year’s ISMB focused more on the use of that at UCSC, will become more than archives of bioinformatics and to anal- data — they will be used as microscopes that can yse entire biological systems and there was less be used to interpret and discover new things about emphasis on individual standalone problems, such human genes. The overall goal of future browser as microarray analysis, building phylogenetic trees, applications is to link gene position to functional motif- and gene-finding algorithms. Papers and information, so that the browser can be used as keynotes at the conference were divided into seven an engine for discovery — a microscope searching major themes: (1) phylogeny and genome rear- the genome for new and exciting discoveries. These rangements; (2) expression arrays and networks; thoughts were expanded upon later in the meeting (3) predicting clinical outcomes; (4) protein clus- in the Overton Prize address by Jim Kent. tering, alignment and patterns; (5) transcription The second keynote address in this session motifs and modules; (6) structure and hidden was John Mattick’s (University of Queensland; Markov models; and (7) text mining and high- http://www.imb.uq.edu.au/mattick.html), ‘Pro- throughput methods; with an additional session for gramming of the autopoietic development of com- short papers. plex organisms: the hidden layer of non-coding With the sequences of over 1000 genomes now RNA’. His presentation generated enormous excite- completed, including those of human [13] and ment in the audience and attracted an incredible mouse [20], the meeting opened with a thought- amount of interest in the form of questions after- provoking session on phylogeny and genome rear- wards. rangements, which included two keynote speakers. The number of protein-coding genes does not David Haussler (University of California; scale strongly with the complexity of an organism. Santa Cruz; http://www.cse.ucsc.edu/∼haussler) The fly and worm genomes contain 14 000–19 000 gave a fascinating presentation on ‘Identifying protein-coding genes, two to three times more than functional elements in the human genome by yeast at ∼6000, but this is not much less than the tracing the evolutionary history of the bases: a number of mammalian protein-coding genes. More- key challenge for comparative genomics’. He out- over, the genome of the inanimate plant, rice, has lined the principles of using evolution to find more genes than that of humans. The 98% of non- genes and other functional elements [19]. Although coding sequence in the human genome actually 75 million years of evolution separate mice and has a tighter correlation with human complexity. humans, comparing the genomes of the two species Most of the genetic variation between organisms provides a crude way of finding regions of func- lies in these non-coding regions, indeed only 1% of tional significance. This type of approach is noisy coding genes between mouse and human are dif- and, while difficult, it is possible to separate the ferent. Mattick gave a compelling argument that noise from the useful information using a cali- non-coding RNA is the genetic basis of human bration point. At least half of the human genome complexity and variation [14,15]. There are enor- consists of relics of retrotransposons, i.e. half of mous numbers of non-coding RNA genes in the our genome is the rotting carcasses of the selfish mammalian genome, which are only now begin- DNA that has inserted itself into our DNA over the ning to be recognized, and which appear to account years. Comparative genomics will allow us to iden- for between one-half and three-quarters of all tran- tify functional elements and, as more species are scripts. At least 50%, and possibly the majority, sequenced, there will be an increase in our power to of the human genome is transcribed. He fielded detect conserved elements. His group has enhanced the hypothesis that RNAs derived from processed these studies by combining hidden Markov and introns are involved in gene–gene communication phylogenetic models to create new ways of mod- and networking in real time in eukaryotic cells. elling molecular evolution [17]. The talk summarized the accumulating evidence

Copyright  2003 John Wiley & Sons, Ltd. Comp Funct Genom 2003; 4: 654–659. 656 Meeting Review that spliced RNA molecules are functional; introns expression data include the new biclustering algo- comprise, on average, 95% of the primary sequence rithm SAMBA (Statistical Algorithmic Method for of protein-coding transcripts in humans, the num- Bicluster Analysis) and PRIMA (PRomoter Inte- bers and size of introns and non-coding RNA cor- gration in Microarray Analysis), a program for relates with developmental complexity, and some finding transcription factors whose binding sites are introns/non-coding RNAs are highly conserved. enriched in a given set of promoters. By utilizing The emphasis on systems biology was contin- human genomic sequences and models for bind- ued in the next session on expression arrays and ing sites of known transcription factors, PRIMA networks. While there were many interesting pre- identifies transcription factors whose binding sites sentations, the one that clearly stood out was are significantly over-represented in a given set that of Eran Segal (Stanford University; www- of promoters. Another JAVA-based tool his group cs.stanford.edu/∼eran) on ‘Discovering molecu- has developed to aid clustering and visualizing of lar pathways from protein interaction and gene gene expression data is EXPANDER (EXPression expression’, which received the award for Best Stu- ANalyzer and DisplayER). This visualization tool dent Paper of the meeting (more details of this work includes an implementation of the new CLICK can be found at http://bioinformatics.oupjour- clustering algorithm, as well as for other popu- nals.org/). Segal presented results combining two lar clustering algorithms, such as K-means, self- Saccharomyces cerevisiae gene expression datasets organizing maps and hierarchical clustering. One with a protein interaction dataset and applying a of the important take-home messages of the entire unified probabilistic model to discover coherent meeting, particularly from the point of view of an functional groups and entire protein complexes. experimental biologist like myself, which was high- Later in the meeting, he presented further work lighted by Shamir, is the importance of improved extending these concepts in his talk, ‘Discovery networking between those developing computing of transcriptional modules from DNA sequence algorithms and experimentalists, to verify the rel- and gene expression’, which was awarded the Best evance of newly developed tools. These method- Paper of the meeting. ologies are in their infancy and the long-term chal- At present, we have vast amounts of data lenge is to encourage more research and effort to sources, DNA sequence data, gene expression data use these methodologies to make a dent in prob- and data from proteomic technologies such as lems related to medicine and health. yeast two-hybrid systems. The great challenge is This year’s recipient of the ISCB Overton Prize how to use this data to better understand bio- was Mr William James (Jim) Kent (UC Santa logical systems. On the second day of the meet- Cruz; http://www.cse.ucsc.edu/∼kent). This pres- ing, these concepts were discussed in the keynote tigious prize is awarded to a bioinformatician for address of Ron Shamir (Tel Aviv University; outstanding accomplishment in the early phase of http://www.math.tau.ac.il/∼rshamir/), ‘Recon- a career. His talk, entitled ‘Patching and Painting structing genetic networks’. His presentation the Human Genome’, outlined the efforts that have emphasized the importance of developing compu- gone into making the human genome more under- tational methodologies that are rigorous, robust and standable to humans, i.e. providing visual represen- realistic to help integrate the different datasets that tations of the human genome. Kent is best known are available. He emphasized the point that, as as the researcher who ‘saved’ the human genome these datasets are highly heterogeneous, the scien- project. He wrote GigAssembler [10], a program tific community should not expect any single ‘killer that produced the first full working draft assem- application’. Tools that are developed will be tuned bly of the human genome, the Human Genome for a specific task. His group has implemented the Browser at UCSC (http://genome.ucsc.edu [11]), CLICK tool and tested it on a variety of biological just before the company Celera was to present a datasets, ranging from gene expression to cDNA complete draft of the human genome to the White oligo-fingerprinting to protein sequence similarity. House in 2000. This feat enabled the research com- CLICK was successfully used for gene clustering in munity to keep human genomic data freely avail- functional genomics, finding motif sequences, tis- able in the public domain. Kent’s talk summarized sue classification and more [18,7]. Other tools that the goals of his work and introduced the bioin- his group has developed for the analysis of gene formatics tools he has built. He outlined the work

Copyright  2003 John Wiley & Sons, Ltd. Comp Funct Genom 2003; 4: 654–659. Meeting Review 657 involved in developing tools such as: the Intronera- The Wednesday afternoon session began with tor system for exploring the genome of C. elegans the next keynote of the meeting, ‘Dynamic pro- [12]; the program WABA, which was one of the gramming algorithms for haplotype block partition- first pair-hidden Markov models for the alignment ing’, delivered by Michael Waterman (University of genomic DNA of two species; Improbiser, an of Southern California; http://www-hto.usc.edu/ expectation-maximization method to discover and people/Waterman.html). His lecture focused on cluster potential transcription factor binding sites; using computational approaches to analyse molec- and the popular BLAT, which rapidly searches full ular sequence data collected from human varia- genomes at both the DNA and protein levels [9]. tion and single nucleotide polymorphisms (SNPS) As discussed by Kent, the UCSC browser is con- [21,22]. The technology is now available to score stantly evolving to include visualization of other large numbers of DNA variants (SNPs) in different genomes, such as mouse, and the goal of future individuals. The challenge is to be able to asso- research will be to provide tools that allow us to ciate SNP data with different disease states when visualize multiple genomes more easily and will the problem suffers from excessive dimensionality. focus on integrating RNA data and allowing com- Waterman discussed the importance of developing parisons between genomes. The ultimate aim of a means to reduce the number of dimensions to the these tools is to allow us to use genomic infor- space of genotype classes in a biologically mean- mation more fruitfully. ingful way. Linked SNPs are often statistically This year the Society successfully introduced a associated with one another (in ‘linkage disequi- new parallel stream to the ISMB program and the librium’) and the number of distinct configurations Tuesday afternoon was dedicated to short papers of multiple tightly linked SNPs in a sample is often in parallel with meeting reports of the various far lower than one would expect from independent SIGS. This new approach was well received by sampling. These joint configurations, or haplotypes, all attendees. might be a more biologically meaningful unit, since The third day of the meeting opened with a ses- they represent sets of SNPs that co-occur in a sion on protein clustering, alignment and patterns population. Recently there has been much excite- and the morning concluded with a keynote from the ment over the idea that such haplotypes occur as recipient of the inaugural Senior Scientist Accom- blocks across the genome, as these blocks suggest plishment Award, (University of that fewer distinct SNPs need to be scored to cap- Ottawa; http://www.crm.umontreal.ca/cgi/qui? ture information about genotype identity. There is sankoff). This new award was established in order a need for formal analysis of this dimension reduc- to recognize a member of the computational biol- tion problem, for formal treatment of the hierarchi- ogy community who is more than 12–15 years cal structure of haplotypes, and for consideration post-degree who has made major contributions to of the utility of these approaches toward meeting the field of computational biology through research, the end goal of finding genetic variants associated education, service, or a combination of the three. with complex disease. Sankoff received this award for the immense contri- Thursday was the last day of the meeting, and butions he has made to computational biology dur- both keynote speakers once again gave presen- ing his career. Over the last 15 years his work has tations reminding us of the power of system- focused on the evolution of genomes as the result atic approaches in science. Yoshihide Hayashizaki of chromosomal rearrangement processes [1,16]. (RIKEN Genomic Science Center; http://gen- He argued that the increase in large-scale genomic ome.gsc.riken.go.jp/index.html) reviewed the sequence data will give computational biologists amazing amount of work that has been accom- the ability to compare theoretical work to exper- plished in analysing the mouse transcriptome in imental work! The scientific community now has his presentation, ‘The dynamic eukaryotic tran- many inventories of the random rearrangement and scriptome’. The RIKEN centre have just com- evolution that occur amongst different genomes, pleted their mouse ‘encyclopedia project’, a map and now needs to use mathematical processes to of the mouse transcriptome. This project further look at these comparative maps/genomes, to get reinforced the importance of computer scientists more realistic ideas about rates and overall tenden- and biologists working together, and with such a cies in evolution [6]. high level of collaboration it has been possible

Copyright  2003 John Wiley & Sons, Ltd. Comp Funct Genom 2003; 4: 654–659. 658 Meeting Review to publish the mouse genome and transcriptome gained much information but it needs to be consid- together. Hayashizaki outlined the important prin- ered in a completely fresh light, in a more system- ciples behind the RIKEN transcriptome approach atic fashion which will involve computational biol- vs. the EST approach, which proved an invaluable ogists working in close conjunction with biologists. aid to annotating the human genome. The RIKEN For a long time we have known that the mammalian group decided that, as there was a saturation of genome is not homogenous in its GC composition, human ESTs and as the transcriptome was more and that this base heterogeneity, or isochore struc- dynamic than the genome, it was more suitable for ture, is correlated with other important genomic systematic analysis compared to the proteome. The features, such as the insertion of repetitive ele- secondary goal of the RIKEN mouse genome ency- ments and gene density [5]. Recently, it was shown clopedia project was to develop a series of new and that mutational rates are variable along the genome, original technologies to aid investigation of tran- pointing to a mosaic model of genome evolution. scriptomes [3,4]. The high-throughput sequencing Large-scale comparisons of humans, rodents and technologies developed at RIKEN enabled them frog genomes will be useful for learning about the to sequence 40 000 samples per day. The FAN- evolution of mammalian genes and knowledge of TOM project led to functional annotation of 37 086 gene chromosome position will allow visualization cDNAs, 20 487 protein-coding genes and 16 599 of the degree of mutational variation in the genome. non-coding genes from the mouse [2]. This project was a tangible example of how international and interdisciplinary collaboration can lead to excellent Conclusions science. Another of the goals of RIKEN was to develop a format by which the FANTOM clones This was my third and most scientifically enjoy- (the genome encyclopedia) could be easily dis- able ISMB meeting to-date. This was due to tributed to the scientific community. With some- the increased emphasis on systems biology and thing that seemed to be in the realms of sci- genomics throughout the entire program. For a ence fiction, or else a scientific joke, Hayashizaki molecular biologist and a newcomer to the field announced that a special issue of Genome Research of bioinformatics like myself, ISMB once again had been produced — a test DNA book in which provided a melting pot of opportunity to mingle sample cDNA clones had been spotted on to water- with others new to the field, computer scientists soluble paper [8]. Incredibly, in this proposed DNA and major players in the area, thus gaining a real book, all the FANTOM clones will be clearly anno- feel for both the challenges and recent successes tated, so that for each cDNA there would be a spot of the burgeoning bioinformatics scientific com- that could be punched out and, using PCR, a copy munity. In addition, the meeting was a chance to of the cDNA could be made. Thus, the DNA book renew acquaintances and meet new colleagues, and provides an ingenious way for delivering DNA in a the experience reinvigorated my enthusiasm for timely and cost-effective manner to the community. computational biology and bioinformatics. More Hayashizaki concluded his talk by reminding the importantly, the take-home message of the meet- audience that life science research in the twenty- ing, for me, was the importance of providing many first century is increasingly about connecting the opportunities for computational biologists to work genome to the phenome, and to facilitate this we side-by-side with experimentalists to ensure that will need to combine experimental approaches and the work in both fields gives maximal outcomes in computational approaches to allow new discover- clinical diagnostics, human health and agriculture. ies. These collaborations will ensure that the bioinfor- These comments served as a primer for Sydney matics community comes up with robust and rig- Brenner’s (Salk Institute; http://www.salk.edu/ orous solutions to the key scientific challenges that faculty/faculty/details.php?id=7) presentation, genomic researchers face in the future. ‘The evolution of genes and genomes’. Brenner By the end of the extremely lively and enthusi- provided a terrific message, reminding the audience astic Glasgow ‘ISMB 2004’ presentation by David of the simple concept that genes, including human Gilbert, my bags were packed and I was ready genes, have not evolved independently from one to go. In another first, the next meeting will be another. By sequencing whole genomes we have held jointly with the European Conference on

Copyright  2003 John Wiley & Sons, Ltd. Comp Funct Genom 2003; 4: 654–659. Meeting Review 659

Computational Biology (ECCB) and in conjunction 11. Kent WJ, Sugnet CW, Furey TS, et al. 2002. The human with Genes, Proteins and Computers VIII. Thus genome browser at UCSC. Genome Res 12(6): 996–1006. 12. Kent WJ, Zahler AM. 2000. The intronerator: exploring next year’s ISMB meeting promises to deliver a introns and alternative splicing in Caenorhabditis elegans. program with a broader scope, but which will con- Nucleic Acids Res 28(1): 91–93. tinue to focus on bringing the ‘bio’ back into bioin- 13. Lander ES, Linton LM, Birren B, et al. 2001. Initial sequenc- formatics. So, ISMB Glasgow 2004 in Scotland, all ing and analysis of the human genome. Nature 409(6822): that rain and those castles can’t wait! 860–921. 14. Mattick JS. 2001. Non-coding RNAs: the architects of eukaryotic complexity. EMBO Rep 2(11): 986–991. 15. Mattick JS, Gagen MJ. 2001. The evolution of controlled References multitasked gene networks: the role of introns and other noncoding RNAs in the development of complex organisms. 1. Blanchette M, Kunisawa T, Sankoff D. 1999. Gene order MolBiolEvol18(9): 1611–1630. breakpoint evidence in animal mitochondrial phylogeny. JMol 16. Sankoff D. 2001. Gene and genome duplication. Curr Opin Evol 49(2): 193–203. Genet Dev 11(6): 681–684. 2. Bono H, Kasukawa T, Furuno M, Hayashizaki Y, Okazaki Y. 17. Seipel A, Haussler D. 2003. Combining phylogenetic and 2003. FANTOM-DB: database of functional annotation of hidden Markov models in biosequence analysis. Proceedings RIKEN mouse cDNA clones. Seikagaku 75(2): 149–152. of the 7th Annual International Conference on Research in 3. Bono H, Yagi K, Kasukawa T, et al. 2003. Systematic Computational Molecular Biology (RECOMB’2003). expression profiling of the mouse transcriptome using RIKEN 18. Sharan R, Shamir R. 2000. CLICK: a clustering algorithm cDNA microarrays. Genome Res 13(6B): 1318–1323. with applications to gene expression analysis. Proceedings 4. Carninci P, Waki K, Shiraki T, et al. 2003. Targeting a of the International Conference on Intelligent Systems for complex transcriptome: the construction of the mouse full- Molecular Biology; ISMB 8: 307–316. length cDNA encyclopedia. Genome Res 13(6B): 1273–1289. 19. Thomas JW, Touchman JW, Blakesley RW, et al. 2003. 5. Castresana J. 2002. Genes on human chromosome 19 show Comparative analyses of multi-species sequences from extreme divergence from the mouse orthologs and a high GC targeted genomic regions. Nature 424(6950): 788–793. content. Nucleic Acids Res 30(8): 1751–1756. 20. Waterston RH, Lindblad-Toh K, Birney E, et al. 2002. Initial 6. Eichler EE, Sankoff D. 2003. Structural dynamics of eukary- sequencing and comparative analysis of the mouse genome. otic chromosome evolution. Science 301(5634): 793–797. Nature 420(6915): 520–562. 7. Elkon R, Linhart C, Sharan R, Shamir R, Shiloh Y. 2003. 21. Zhang K, Deng M, Chen T, Waterman MS, Sun F. 2002. A Genome-wide in silico identification of transcriptional dynamic programming algorithm for haplotype block parti- regulators controlling the cell cycle in human cells. Genome tioning. Proc Natl Acad Sci USA 99(11): 7335–7339. Res 13(5): 773–780. 22. Zhang K, Sun F, Waterman MS, Chen T. 2003. Haplotype 8. Kawai J, Hayashizaki Y. 2003. DNA book. Genome Res block partition with limited resources and applications to 13(6B): 1488–1495. human chromosome 21 haplotype data. Am J Hum Genet 9. Kent WJ. 2002. BLAT — the BLAST-like alignment tool. 73(1): 63–73. Genome Res 12(4): 656–664. 10. Kent WJ, Haussler D. 2001. Assembly of the working draft of the human genome with GigAssembler. Genome Res 11(9): 1541–1548.

Copyright  2003 John Wiley & Sons, Ltd. Comp Funct Genom 2003; 4: 654–659. International Journal of Peptides

Advances in BioMed Stem Cells International Journal of Research International International Genomics Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 Virolog y http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014

Journal of Nucleic Acids

Zoology International Journal of Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014

Submit your manuscripts at http://www.hindawi.com

Journal of The Scientific Signal Transduction World Journal Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014

Genetics Anatomy International Journal of Biochemistry Advances in Research International Research International Microbiology Research International Bioinformatics Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014

Enzyme International Journal of Molecular Biology Journal of Archaea Research Evolutionary Biology International Marine Biology Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014