Vol. 15 no. 11 1999 BIOINFORMATICS Pages 865–866
A.M. Shmatkov, A.A. Melikyan, F.L. Chernousko and Editorial M. Borodovsky’s paper, ‘Finding prokaryotic genes by the “frame-by-frame” algorithm: targeting gene starts THE SECOND GEORGIA TECH INTERNA- and overlapping genes’, addresses the important practical TIONAL CONFERENCE ON BIOINFORMATICS: problem of gene recognition in prokaryotic genomes, SEQUENCE, STRUCTURE AND FUNCTION of which over a dozen have already been completely (NOVEMBER 11–14, 1999, ATLANTA, GEORGIA, sequenced. The authors suggest an approach to one of USA) the few remaining open problems in prokaryotic gene finding: accurate prediction of gene starts allowing for Steering & Program Committee: Pierre Baldi, Mark the possibility of overlapping protein-coding regions, a Borodovsky, Soren Brunak, Chris Burge, Jim Fickett, relatively common occurrence in prokaryotic genomes Steven Henikoff, Eugene Koonin, Andrej Sali, Chris which seems to be rare in eukaryotes. Their algorithm Sander, Gary Stormo involves application of a hidden Markov model of gene structure to each of the six global reading frames (three on This issue of Bioinformatics contains reports on selected each strand) of a genome separately, followed by a simple papers presented at the international conference in At- post-processing step to remove completely overlapping lanta. The conference was held at one of the midtown genes which rarely occur in nature. Promising results are hotels offering a magnificent bird’s-eye view to the obtained in identifying gene starts, and the possibility of cosmopolitan capital of the Southeast of the USA. The systematic biases in the annotation of several bacterial conference agenda included keynote lectures by Russell genomes is raised. Doolittle, Walter Fitch and Walter Gilbert, 19 plenary Benchmarking of the computer tools developed for and contributed talks and more than 50 poster presen- finding eukaryotic genes is the subject of the paper tations. The focus of the conference was on the most ‘Evaluation of gene prediction software using a genomic intriguing issues of comparative and structural genomics, dataset: application to Arabidopsis thaliana sequences’ functional interpretation of nucleic acid and protein by N. Pavy, S. Rombauts, P. Dehais, C. Mathe, D.V.V. sequences, inferring 3D structure from sequence data Ramana, P. Leroy and P. Rouze. The authors have and deciphering the history of biomolecular evolution developed the AraSet, a data set of authentic contigs of (http://exon.biology.gatech.edu/conference). Submission validated genes of the Arabidopsis genome enabling the of manuscripts to the Bioinformatics special issue has evaluation of multi-gene models. For this purpose they been a parallel avenue of presentation for the conference also introduced new measures of gene finding accuracy participants. Eighteen papers were submitted to the peer reflecting the prediction errors at the protein sequence review process and nine were accepted for publication. level. These new measures of accuracy evaluation, along The issue starts with the papers devoted to DNA sequence with conventional criteria defined at the site and the exon analysis and gene finding, followed by the papers on levels, can be applied to any gene prediction software and protein sequence analysis and protein structure prediction. to any eukaryotic genome for which a similar data set is Single pass sequencing of the 5 and 3 ends of cloned built. For the Arabidopsis genome the authors have shown cDNA sequences represents an efficient approach to large that while the accuracy of splice site and exon prediction scale characterization of the set of genes expressed by an is quite high, gene modeling accuracy remains very low. organism. In the past few years huge numbers of such se- The publicly available gene finding programs were shown quences, termed ‘expressed sequence tags’ or ESTs, have to differ significantly in their performance. However, it been determined from human and a variety of other organ- was demonstrated that gene modeling could be further isms, and deposited in public databases. Such sequences improved by specific combination of gene finding tools. represent a powerful resource for identifying genes, but A. Bansal, in the paper ‘An automated comparative suffer from the relatively high rates of sequencing errors analysis of seventeen complete microbial genomes’, typical of single pass data. Frameshift errors, in particular, presents a new improvement in the automated pairwise can obscure the results of similarity searches based on genome comparison technique used for identifying or- translations of putative EST ORFs. The paper entitled thologous genes. This technique was used to compare ‘FramePlus: Aligning DNA to protein sequences’ by E. 17 already sequenced microbial genomes and derive Halperin, S. Faigler and R. Gill-More describes a new al- orthologs, orthologous gene-groups, duplications, gene- gorithm specifically tailored to allow for moderate to high fusions, genes with conserved functionality and genes rates of frameshift errors. The new techniques resulted in specific to groups of genomes. improvements in the ability to detect homologies between D. Sankoff’s paper, ‘Genome rearrangement with gene ESTs and known proteins in peptide database searches. families’, addresses the problem of reconstructing the