Forget Test Tubes, Petri Dishes

technology feature

Bioinformatics: Bringing it all together

NATURE | VOL 419 | 17 OCTOBER 2002 | www.nature.com/nature 751 © 2002 Nature Publishing Group

Forget test tubes, petri dishes and pipettes. One of the few pieces of equipment that can be honestly labelled ubiquitous in biology today is the computer. Bioinformatics (the development and application of computational tools to acquire, store, organize, archive, analyse and visualize biological data) is one of biology’s fastest-growing technologies.

Biologists at the bench studying small networks of genes want user-friendly tools to analyse their results and help them to plan experiments. They need accessible interfaces that allow them to search databases, and compare their data with those of others (see ‘Genome analysis at your fingertips’, below).

At the other end of the spectrum, researchers analysing whole genomes, and drug-discovery companies mining the genome for drug targets, want high-throughput analysis tools to accelerate genome annotation and extract information from databases in more efficient and sophisticated ways.

And all of those involved want more integration: integration of data across the hundreds, if not thousands, of different databases, and visual integration of dispersed data to aid interpretation. “The key to bioinformatics is integration, integration, integration,” says bioinformatics expert Jim Golden at Curagen spin-off 454 Corporation in Branford, Connecticut. “To answer most interesting biological problems, you need to combine data from many data sources,” agrees Russ Altman, a biomedical informatics expert at Stanford University. “However, creating seamless access to multiple data sources is extremely difficult.”

Standard currencies

One of the most insidious problems is the lack of standard file formats and data-access methods. But attempts to standardize them are gaining momentum. One success is the distributed annotation system (DAS), a standard protocol developed by Lincoln Stein at Cold Spring Harbor Laboratory in New York and his colleagues. “It’s a simple solution to a simple but obvious problem,” says Stein. “There was no standard way of exchanging sequence annotations.”

DAS allows one computer to contact multiple servers to retrieve and integrate genomic annotations associated with a particular sequence, such as predicted introns and exons from one server and corresponding single-nucleotide polymorphisms (SNPs) from another. It handles the annotations as elements associated with a particular stretch of genomic sequence and so enables users to obtain a picture of that genome segment with all of its associated annotations. Many providers of genome data, including WormBase, FlyBase, the Ensembl server run by the European Bioinformatics Institute (EBI) and the Sanger Institute near Cambridge, UK, and the genome browser at the University of California, Santa Cruz, are currently running DAS servers.

Reckoning that data providers will never agree on a universal standard for representing data, building database interfaces or writing access scripts, Stein thinks that web services such as DAS are the best route to interoperability. Data providers only have to agree on a small set of standards that define how their data and tools are presented to the outside world. And a ‘registry’ can keep track of which data sources implement which services. Scripts for retrieving a particular type of data or operation consult the registry, as they would an address book, to determine which data sources to query. A project of this type is BioMOBY, led by Mark Wilkinson at the National Research Council in Saskatoon, Canada. BioMOBY will be a powerful exploration tool, he says, because apart from answering database queries, it will discover cross-references to other relevant data and applications. Betting on BioMOBY’s potential, several groups are encouraging its development. “At the moment, we have the support of almost all of the model organism databases,” says Wilkinson.

Another indicator of the widespread desire for interoperability is the incorporation in February 2002 of the Interoperable Informatics Infrastructure Consortium (I3C). The consortium has 14 member organizations, including Sun Microsystems of Santa Clara, California; IBM of White Plains, New York; Millennium Pharmaceuticals and the Whitehead Institute for Biomedical Research, both in Cambridge, Massachusetts. I3C is not a standards body, but aims to develop and promote the adoption of common protocols.

To integrate the current set of non-standardized databases, researchers are relying on two main strategies: warehousing and federation. A warehouse is a central database where data from many different sources are brought together on one physical site. Entrez, the widely used search-and-retrieval system developed by the US National Center for Biotechnology Information in Bethesda, Maryland, is an example.

Access all areas

A popular tool is SRS produced by LION Bioscience of Heidelberg, Germany, which facilitates access to a wide range of biological databases using a warehouse-like strategy. SRS is used in the online genome portals maintained by Celera Genomics in Rockland, Maryland, and Incyte Genomics in Palo Alto, California, and is the core technology of tools sold by LION.

Federation, on the other hand, links different databases so that they appear to be unified to the end-user but are not physically integrated at a common site. A query engine takes a complicated question requiring access to multiple databases and divides it into subqueries that are sent to the individual databases. The answers are then reassembled and presented to the user. Aventis Pharmaceuticals in Strasbourg, France, for example, has adopted IBM’s DiscoveryLink federating software to aid collaboration between its biologists and chemists in drug development.

Which approach to use and when is much debated. “Updating and maintaining local copies of external data collections in a warehouse is a major task,” says bioinformatician Rolf Apweiler at the EBI’s lab in Hinxton, UK. Federation avoids this because the data are accessed directly from the original source. But the bioinformatics databases you want to query must be accessible for programmatic queries over the Internet, and most are not, says Peter Karp, director of the bioinformatics research group at the non-profit research institute SRI International in Menlo Park, California. “It’s like installing a state-of-the-art telephone exchange in a village without telephones.”

Several projects combine the two approaches. On the industry side, IBM has set up a partnership with LION to …

Figure: Structure prediction: modelling a sequence homolog in LION’s SRS 3D. (Credit: LION Bioscience)

GENOME ANALYSIS AT YOUR FINGERTIPS

The working biologist now has an enormous number of options when it comes to bioinformatics tools. On one hand, there is a lot of free high-quality software in the public domain. On the other, researchers can buy commercial products offering added features, such as programs to streamline sequential tasks, to access proprietary databases and to enhance data security. And because software producers realize that users’ needs change and their products will rarely be used in isolation, flexibility and modularity are on the rise.

An important trend has been the increasing integration and sophistication of tools available to non-experts. A wide range of user-friendly packages incorporating tools for nucleotide and protein sequence analysis are available from companies such as MiraiBio, a Hitachi Software Engineering subsidiary based in Alameda, California; DNASTAR in Madison, Wisconsin; InforMax in Bethesda, Maryland; and Accelrys in San Diego, California. On the non-commercial side, the Biology WorkBench maintained by the Supercomputer Center at the University of California, San Diego, is particularly popular, offering more than 80 bioinformatics tools to more than 10,000 registered users. “It’s a one-stop-shop for doing a lot of things,” says lead developer Shankar Subramaniam. “You can be sitting in front of any type of computer; as long as you have a web browser, you can access it.”

Software has also become more user-friendly. Back in the early 1990s, users of the GCG Wisconsin package, the grandfather of molecular-biology packages (now sold by Accelrys), had to work with UNIX-based systems. Although these systems are still preferred by some, users can now point-and-click their way through a wide range of tasks on ordinary desktop computers.

Another trend is the increased integration of data analysis with experimental design. The needs of bench scientists don’t always coincide with those of professional bioinformaticians producing tools for whole-genome analyses. Genome projects require programs that can efficiently, if not very accurately, process huge amounts of sequence data, but the biologist in the lab is often interested in studying small sets of genes and their products with very high precision. Last month, for example, InforMax released GenomBench, a tool that allows users to predict the structure of genes and their splice variants, progressively refine these predictions, and then design experiments to validate them. “It’s an …

… annotation information from the LocusLink website, and goes to Medline to assemble a list of relevant references. “You just hit a button and it does what might take a biologist 600 hours to do, in about five hours,” says Mark Haselup, chief technical officer for …

Figure: InforMax’s BioAnnotator uses locally stored databases to find protein motifs. (Credit: InforMax)
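
The DAS model described under ‘Standard currencies’ (one client, several annotation servers, every annotation pinned to coordinates on the same stretch of sequence), together with the registry that records which sources implement which services, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DAS protocol or the BioMOBY interface: real DAS clients talk to servers over HTTP and parse XML, whereas here each “server” is an in-memory object, and the class names, the service name "features", the source names and all coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Annotation:
    source: str   # name of the server that supplied the annotation
    seq_id: str   # reference sequence the coordinates refer to
    kind: str     # e.g. "exon", "intron", "SNP"
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive


class AnnotationSource:
    """Hypothetical in-memory stand-in for one annotation server (no network access)."""

    def __init__(self, name: str, features: List[Annotation]):
        self.name = name
        self._features = features

    def features(self, seq_id: str, start: int, end: int) -> List[Annotation]:
        # Return annotations that overlap [start, end] on seq_id.
        # A real DAS server would answer an HTTP request with XML instead.
        return [f for f in self._features
                if f.seq_id == seq_id and f.start <= end and f.end >= start]


class Registry:
    """Hypothetical 'address book' mapping a service name to the sources that implement it."""

    def __init__(self) -> None:
        self._services: Dict[str, List[AnnotationSource]] = {}

    def register(self, service: str, source: AnnotationSource) -> None:
        self._services.setdefault(service, []).append(source)

    def lookup(self, service: str) -> List[AnnotationSource]:
        return list(self._services.get(service, []))


def integrate(registry: Registry, seq_id: str, start: int, end: int) -> List[Annotation]:
    """Ask every registered annotation source about one segment and merge the replies."""
    merged: List[Annotation] = []
    for source in registry.lookup("features"):
        merged.extend(source.features(seq_id, start, end))
    # All sources share one coordinate system, so sorting by position yields a single track.
    return sorted(merged, key=lambda a: (a.start, a.end, a.source))


# Usage: gene models from one hypothetical server, SNPs from another.
gene_server = AnnotationSource("gene-models", [
    Annotation("gene-models", "chr1", "exon", 100, 250),
    Annotation("gene-models", "chr1", "intron", 251, 400),
    Annotation("gene-models", "chr1", "exon", 401, 520),
])
snp_server = AnnotationSource("snps", [
    Annotation("snps", "chr1", "SNP", 180, 180),
    Annotation("snps", "chr1", "SNP", 9000, 9000),  # outside the requested segment
])

registry = Registry()
registry.register("features", gene_server)
registry.register("features", snp_server)

track = integrate(registry, "chr1", 1, 600)
for a in track:
    print(a.kind, a.start, a.end, a.source)
```

Because every source reports positions on the same reference sequence, merging reduces to filtering by overlap and sorting by coordinate. Adding a third annotation server needs only one more `register` call, which is the kind of loose coupling the registry approach aims for.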

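Federation, as the feature describes it, splits one complicated question into subqueries, sends each to the database that holds the data, and reassembles the answers. The following is a minimal Python illustration of that idea and of Apweiler’s point about keeping warehouse copies up to date; it is not how DiscoveryLink or SRS work internally. The two databases are hypothetical in-memory dictionaries, and the gene names, protein accessions and reference IDs are placeholders.

```python
import copy
from typing import Any, Callable, Dict, List

# Two hypothetical, independently maintained databases (plain dicts here).
GENE_DB: Dict[str, Dict[str, str]] = {
    "geneA": {"chromosome": "II", "protein": "P001"},
    "geneB": {"chromosome": "X", "protein": "P002"},
}
LITERATURE_DB: Dict[str, List[str]] = {
    "geneA": ["ref-1"],
    "geneB": ["ref-2", "ref-3"],
}


class Database:
    """Wraps one data source; every query reads whatever the source holds right now."""

    def __init__(self, name: str, data: dict):
        self.name = name
        self.data = data

    def query(self, predicate: Callable[[str, Any], bool]) -> dict:
        return {key: value for key, value in self.data.items() if predicate(key, value)}


def federated_query(gene_db: Database, lit_db: Database, chromosome: str) -> List[dict]:
    """Split 'genes on this chromosome, with proteins and references' into subqueries."""
    # Subquery 1, sent to the gene database.
    genes = gene_db.query(lambda key, rec: rec["chromosome"] == chromosome)
    # Subquery 2, sent to the literature database, restricted to the genes found above.
    refs = lit_db.query(lambda key, _: key in genes)
    # Reassemble the partial answers into one result for the user.
    return [
        {"gene": gene, "protein": rec["protein"], "references": refs.get(gene, [])}
        for gene, rec in sorted(genes.items())
    ]


# Federation: query the live sources directly.
live_genes = Database("genes", GENE_DB)
live_literature = Database("literature", LITERATURE_DB)

# Warehouse: a local copy taken now, which has to be refreshed by hand later.
warehouse_genes = Database("genes (copy)", copy.deepcopy(GENE_DB))
warehouse_literature = Database("literature (copy)", copy.deepcopy(LITERATURE_DB))

# The gene database gains a new entry after the warehouse copy was made.
GENE_DB["geneC"] = {"chromosome": "X", "protein": "P003"}

federated = federated_query(live_genes, live_literature, "X")
stale = federated_query(warehouse_genes, warehouse_literature, "X")
print([row["gene"] for row in federated])  # ['geneB', 'geneC']
print([row["gene"] for row in stale])      # ['geneB']
```

The warehouse copy misses `geneC` until someone refreshes it, which is the maintenance burden Apweiler describes. The federated query sees the update immediately, but only because each source here accepts programmatic queries, the condition Karp says most real bioinformatics databases did not meet.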