
European Research Consortium for Informatics and Mathematics, Number 43, October 2000, www.ercim.org

Special Theme:

Bioinformatics Biocomputing

ERCIM offers postgraduate Fellowships: page 4

CONTENTS

KEYNOTE
3 by Michael Ashburner

JOINT ERCIM ACTIONS
4 ERCIM has launched the 2001/2002 Fellowship Programme

EUROPEAN SCENE
5 Irish Government invests over €635 Million in Basic Research

SPECIAL THEME: BIOINFORMATICS
6 Bioinformatics: From the Pre-genomic to the Post-genomic Era by Thomas Lengauer
8 Computational Genomics: Making Sense of Complete Genomes by Anton Enright, Sophia Tsoka and Christos Ouzounis
10 Searching for New Drugs in Virtual Molecule Databases by Matthias Rarey and Thomas Lengauer
12 A High Performance Computing Network for Protein Conformation Simulations by Marco Pellegrini
13 Ab Initio Methods for Protein Structure Prediction: A New Technique based on Ramachandran Plots by Anna Bernasconi
15 Phylogenetic Tree Reconciliation: the Species/Gene Tree Problem by Jean-François Dufayard, Laurent Duret and François Rechenmann
16 Identification of Drug Target Proteins by Alexander Zien, Robert Küffner, Theo Mevissen, Ralf Zimmer and Thomas Lengauer
18 Modeling and Simulation of Genetic Regulatory Networks by Hidde de Jong, Michel Page, Céline Hernandez, Hans Geiselmann and Sébastien Maza
19 Bioinformatics for Genome Analysis in Farm Animals by Andy S. Law and Alan L. Archibald
21 Modelling Metabolism Knowledge using Objects and Associations by Hélène Rivière-Rolland, Loïc Taloc, Danielle Ziébelin, François Rechenmann and Alain Viari
22 Co-operative Environments for Genomes Annotation: from Imagene to Geno-Annot by Claudine Médigue, Yves Vandenbrouke, François Rechenmann and Alain Viari
23 Arevir: Analysis of HIV Resistance Mutations by Niko Beerenwinkel, Joachim Selbig, Rolf Kaiser and Daniel Hoffmann
24 Human Brain Informatics - Understanding Causes of Mental Illness by Stefan Arnborg, Ingrid Agartz, Mikael Nordström, Håkan Hall and Göran Sedvall
26 Intelligent Post-Genomics by Francisco Azuaje
27 Combinatorial Algorithms in Computational Biology by Marie-France Sagot
28 Crossroads of Mathematics, Informatics and Life Sciences by Jan Verwer and Annette Kik

SPECIAL THEME: BIOCOMPUTING
30 Biomolecular Computing by John McCaskill
32 The European Molecular Computing Consortium by Grzegorz Rozenberg
33 Configurable DNA Computing by John McCaskill
35 Molecular Computing Research at Leiden Center for Natural Computing by Grzegorz Rozenberg
36 Cellular Computing by Martyn Amos and Gerald G. Owenson
37 Research in Theoretical Foundations of DNA Computing by Erzsébet Csuhaj-Varjú and György Vaszil
38 Representing Structured Symbolic Data with Self-organizing Maps by Igor Farkas
39 Neurobiology keeps Inspiring New Neural Network Models by Lubica Benuskova

RESEARCH AND DEVELOPMENT
41 Contribution to Quantitative Evaluation of Lymphoscintigraphy of Upper Limbs by Petr Gebousky, Miroslav Kárny and Hana Křížová
42 Approximate Similarity Search by Giuseppe Amato
43 Education of 'Information Non-professionals' for the Development and Use of Information Systems by Peter Mihók, Vladimír Penjak and Jozef Bucko
45 Searching Documentary Films On-line: the ECHO Project by Pasquale Savino
46 IS4ALL: A New Working Group promoting Universal Design in Information Society by Constantine Stephanidis

TECHNOLOGY TRANSFER
47 Identifying on the Move by Beate Koch
48 Virtual Planetarium at Exhibition 'ZeitReise' by Igor Nikitin and Stanislav Klimenko

EVENTS
50 6th Eurographics Workshop on Virtual Environments by Robert van Liere
50 Trinity College Dublin hosted the Sixth European Conference on Computer Vision - ECCV'2000 by David Vernon
51 Fifth Workshop of the ERCIM Working Group on Constraints by Eric Monfroy
52 Announcements
55 IN BRIEF

Cover image: Dopamine-D1 receptors in the human brain, by Håkan Hall, Karolinska Institutet; see the article 'Human Brain Informatics - Understanding Causes of Mental Illness' on page 24.

KEYNOTE

Some revolutions in science often come when you least expect them. Others are forced upon us. Bioinformatics is a revolution forced by the extraordinary advances in DNA sequencing technologies, in our understanding of protein structures and by the necessary growth of biological databases. Twenty years ago pioneers such as Doug Brutlag in Stanford and Roger Staden in Cambridge began to use computational methods to analyse the very small DNA sequences then determined. Pioneer efforts were made in 1974 by Bart Barrell and Brian Clarke to catalogue the first few nucleic acid sequences that had been determined. A few years later, in the early 1980s, first the European Molecular Biology Laboratory (EMBL) and then the US National Institutes of Health (NIH) established computerised data libraries for nucleic acid sequences. The first release of the EMBL data was 585,433 bases; it is 9,678,428,579 on the day I write this, and doubling every 10 months or so.

Bioinformatics is a peculiar trade since, until very recently, most in the field were trained in other fields - physics, linguistics, genetics, etc. The term will include curators and algorithmists, software and molecular evolutionists, graph theorists and geneticists. By and large their common characteristic is a desire to understand biology through the organisation and analysis of molecular data, especially those concerned with macromolecular sequence and structure. They rely absolutely on a common infrastructure of public databases and shared software. It has proven, in the USA, Japan and Europe, to be most effective to provide this infrastructure by a mix of major public domain institutions, academic centres of excellence and industrial research. Indeed, such are the economies of scale for both data providers and data users that it has proved to be effective to collect the major data classes - nucleic acid and protein sequence, protein structure co-ordinates - by truly global collaborative efforts.

In Europe the major public domain institute devoted to bioinformatics is the European Bioinformatics Institute (EBI), an Outstation of the EMBL. Located adjacent to the Sanger Centre just outside Cambridge, this is the European home of the major international nucleic acid sequence and protein structure databases, as well as the world's premier protein sequence database. Despite the welcome growth of national centres of excellence in bioinformatics in Europe, these major infrastructural projects must be supported centrally. The EBI is a major database innovator, eg its proposed ArrayExpress database for microarray data, and software innovator, eg its SRS system. Jointly with the Sanger Centre the EBI produces the highest quality automatic annotation of the emerging human genome sequence (Ensembl).

Michael Ashburner, Joint-Head of the European Bioinformatics Institute: "What we so desperately need, if we are going to have any chance of competing with our American cousins over the long term in bioinformatics, genomics and science in general, is a European Science Council with a consistent and science-led policy with freedom from political and nationalistic interference."

To the surprise of the EBI, and many others, attempts to fund these activities at any serious level through the programmes of the European Commission were rebuffed in 1999. Under Framework IV the European Commission had funded databases at the EBI; despite increased funding for the area of 'infrastructure' generally, the EBI was judged ineligible for funding under Framework Programme V. Projects internationally regarded as excellent, such as the ArrayExpress database, simply lack funding. The failure of the EC to fund the EBI in 1999 led to a major funding crisis which remains to be resolved for the long term, although the Member States of EMBL have stepped in with emergency funds, and are considering a substantial increase in funding for the longer term.

It is no coincidence that the number of 'start-up' companies in the fields of bioinformatics and genomics in the USA is many times that in Europe. There, there is a commitment to funding both national institutions (the budget of the US National Center for Biotechnology Information is three times that of the EBI) and academic groups. What we so desperately need, if we are going to have any chance of competing with our American cousins over the long term in bioinformatics, genomics and science in general, is a European Science Council with a consistent and science-led policy with freedom from political and nationalistic interference. The funding of science through the present mechanisms in place in Brussels is failing both the community and the Community.

Michael Ashburner

JOINT ERCIM ACTIONS

ERCIM has launched the 2001/2002 Fellowship Programme

ERCIM offers postdoctoral fellowships in leading European information technology research centres. The Fellowships are of 18 months duration, to be spent in two research centres. Next deadline for applications: 31 October 2000.

The ERCIM Fellowship Programme was established in 1990 to enable young scientists from around the world to perform research at ERCIM institutes. For the 2001/2002 Programme, applications are solicited twice, with deadlines of 31 October 2000 and 30 April 2001.

Topics
This year, the ERCIM Fellowship Programme focuses on the following topics:
• Systems
• Database Research
• Technologies
• Constraints Technology and Application
• Control and Systems Theory
• Electronic Commerce
• Interfaces for All
• Environmental Modelling
• Health and Information Technology
• Networking Technologies
• E-Learning
• Web Technology, Research and Application
• Software Systems Validation
• Mathematics in Computer Science
• Robotics
• others.

Selection Procedure
Each application is reviewed by one or more senior scientists in each ERCIM institute. ERCIM representatives will select the candidates taking into account the quality of the applicant, the overlap of interest between applicant and the hosting institution, and the available funding.

Conditions
Candidates must:
• have a PhD (or equivalent), or be in the last year of the thesis work with an outstanding academic record
• be fluent in English
• be discharged or get deferment from military service
• start the grant before October 2001.

Fellowships are of 18 months duration, spent in two of the ERCIM institutes. ERCIM offers a competitive salary which may vary depending on the country. Costs for travelling to and from the institutes will be paid. In order to encourage mobility, a member institution will not be eligible to host a candidate of the same nationality.

Links:
Detailed description and online application form: http://www.ercim.org/activity/fellows/

Please contact:
Aurélie Richard - ERCIM Office
Tel: +33 4 92 38 50 10
E-mail: [email protected]

Objectives
The objective of the Programme is to enable bright young scientists to work collectively on a challenging problem as fellows of an ERCIM institute. In addition, an ERCIM fellowship helps widen and intensify the network of personal relations and understanding among scientists. The Programme offers the opportunity:
• to improve the knowledge about European research structures and networks
• to become familiar with working conditions in leading European research centres
• to promote co-operation between research groups working in similar areas in different laboratories, through the fellowships.

Poster for the 2001/2002 ERCIM Fellowship Programme.

EUROPEAN SCENE

Irish Government invests over €635 Million in Basic Research

Science Foundation Ireland, the National Foundation for Excellence in Scientific Research, was launched by the Irish Government to establish Ireland as a centre of research excellence in strategic areas relevant to economic development, particularly Biotechnology and Information and Communications Technologies (ICT). The foundation has over €635 million at its disposal.

The Technology Foresight Reports published in 1999 had recommended that the Government establish a major fund to develop Ireland as a centre for world class research excellence in strategic niches of Biotechnology and ICT. As part of its response, the Government approved a Technology Foresight Fund of over €635 million for investment in research in the years 2000-2006. Of this fund, €63 million has been allocated to set up a new research and development institute located in Dublin in partnership with the Massachusetts Institute of Technology. The new institute will be known as Media Lab Europe and will specialise in multimedia, digital content and technologies.

This Fund is part of a €2.5 billion initiative on R&D that the Irish Government has earmarked for Research, Technology and Innovation (RTI) activities in the National Development Plan 2000-2006. Science Foundation Ireland is responsible for the management, allocation, disbursement and evaluation of expenditure of the Technology Foresight Fund. The Foundation will be set up initially as a sub-Board of Forfás, the National Policy and Advisory Board for Enterprise, Trade, Science, Technology and Innovation.

Speaking at the launch of the first call for Proposals to the Foundation on the 27th of July, 2000, Ireland's Deputy Prime Minister, Mary Harney, said that the Irish Government was keen to establish Ireland as a centre of research excellence in ICT and Biotechnology. "We wish to attract the best scientific brains available in the international research community, particularly in the areas of Biotechnology and ICT, to develop their research in Ireland. The large amount of funding being made available demonstrates the Irish government's commitment to this vitally important project."

Advisory Panels with international experts in Biotechnology and Information and Communications Technologies (ICT) have been set up to advise on the overall strategy of Science Foundation Ireland, including Dr. Gerard van Oortmerssen, chairman of ERCIM, who comments: "The decision by the Irish Government to make a major strategic investment in fundamental research in ICT shows vision and courage and is an example for other European countries. It is fortunate that Ireland recently joined ERCIM. We are looking forward to co-operating with a strong research community in Ireland."

The aim of the first call for proposals is to identify and fund, at a level of up to €1.3 million per year, a small number of outstanding researchers and their teams who will carry out their work in public research organisations in Ireland. Applications are invited not only from Irish scientists at home and abroad but also from the global research community. Selection will be by an international peer review system. The funding awards will cover the cost of research teams, possibly up to 12 people, over a three to five year period. The SFI Principal Investigator and his/her team will function within a research body in Ireland, either in an Irish University, Institute of Technology or public research organisation. International co-operation will be encouraged.

This initiative will be observed with interest by other European countries, and investment decisions made now will have far-reaching effects for future research in Ireland.

Links: Science Foundation Ireland: http://www.sfi.ie Media Lab Europe: http://www.mle.ie

Please contact:
Josephine Lynch - Science Foundation Ireland
Tel: +353 1 6073 200
E-mail: [email protected], [email protected]

At the launch of the First Call for Proposals on 27th July were: L to R: Mr. Paul Haran, Secretary General, Dept. of Enterprise, Trade and Employment; Ms. Mary Harney, T.D., Deputy Prime Minister and Minister for Enterprise, Trade and Employment; Mr. Noel Treacy, TD., Minister for Science, Technology and Commerce and Mr. John Travers, Chief Executive Officer, Forfás.

SPECIAL THEME: BIOINFORMATICS

Bioinformatics: From the Pre-genomic to the Post-genomic Era

by Thomas Lengauer

Computational Biology and Bioinformatics are terms for an interdisciplinary field joining information technology and biology that has skyrocketed in recent years. The field is located at the interface between the two scientific and technological disciplines that can be argued to drive a significant if not the dominating part of contemporary innovation. In the English language, Computational Biology refers mostly to the scientific part of the field, whereas Bioinformatics addresses more the infrastructure part. In other languages (eg German) Bioinformatics covers both aspects of the field.

The goal of this field is to provide computer-based methods for coping with and interpreting the genomic data that are being uncovered in large volumes within the diverse genome sequencing projects and other new experimental technology in molecular biology. The field presents one of the grand challenges of our times. It has a large basic research aspect, since we cannot claim to be close to understanding biological systems on an organism or even cellular level. At the same time, the field is faced with a strong demand for immediate solutions, because the genomic data that are being uncovered encode many biological insights whose deciphering can be the basis for dramatic scientific and economical success. With the pre-genomic era that was characterized by the effort to sequence the human genome just being completed, we are entering the post-genomic era that concentrates on harvesting the fruits hidden in the genomic text. In contrast to the pre-genomic era which, from the announcement of the quest to sequence the human genome to its completion, has lasted less than 15 years, the post-genomic era can be expected to last much longer, probably extending over several generations.

At the basis of the scientific grand challenge in computational biology there are problems such as identifying genes in DNA sequences and determining the three-dimensional structure of proteins given the protein sequence (the famed protein folding problem). Other unsolved mysteries include the computational estimation of free energies of biomolecules and molecular complexes in aqueous solution as well as the modeling and simulation of molecular interaction networks inside the cell and between cells. Solving these problems is essential for an accurate and effective analysis of disease processes by computer.

Besides these more 'timeless' scientific problems, there is a significant part of the field that is driven by new experimental data provided through the dramatic progress in molecular biology techniques. Starting with genomic sequences, the past few years have provided gene expression data on the basis of ESTs (expressed sequence tags) and DNA microarrays (DNA chips). These data have given rise to a very active new subfield of computational biology called expression data analysis. These data go beyond a generic view on the genome and are able to distinguish between gene populations in different tissues of the same organism and in different states of cells belonging to the same tissue. For the first time, this affords a cell-wide view of the metabolic and regulatory processes under different conditions. Therefore these data are believed to be an effective basis for new diagnoses and therapies of diseases.

Eventually genes are transformed into proteins inside the cell, and it is mostly the proteins that govern cellular processes. Often proteins are modified after their synthesis. Therefore, a cell-wide analysis of the population of mature proteins is expected to correlate much more closely with cellular processes than the expressed genes that are measured today.


The emerging field of proteomics addresses the analysis of the protein population inside the cell. Technologies such as 2D gels and mass spectrometry offer glimpses into the world of mature proteins and their molecular interactions.

Finally, we are stepping beyond analyzing generic genomes and are asking what genetic differences between individuals of a species are the basis for predisposition to certain diseases and effectivity of special drugs. These questions join the fields of molecular biology, genetics, and pharmacy in what is commonly named pharmacogenomics.

Pharmaceutical industry was the first branch of the economy to strongly engage in the new technology combining high-throughput experimentation with bioinformatics analysis. Medicine is following closely. Medical applications step beyond trying to find new drugs on the basis of genomic data. The aim here is to develop more effective diagnostic techniques and to optimize therapies. The first steps to engage computational biology in this quest have already been taken.

While driven by the biological and medical demand, computational biology will also exert a strong impact onto information technology. Since, due to their complexity, we are not able to simulate biological processes on the basis of first principles, we resort to statistical learning and [...] techniques, methods that are at the heart of modern information technology. The mysterious encoding that Nature has afforded for biological information as well as the enormous data volume present large challenges and are continuing to have large impact on the processes of information technology themselves.

In this theme section, we present 15 scientific progress reports on various aspects of computational biology. We begin with two review papers, one from the biological and one from the pharmaceutical perspective. In three further articles we present progress on solving classical grand challenge problems in computational biology. A section of five papers deals with projects addressing computational biology problems pertaining to current problems in the field. In a section with three papers we discuss medical applications. The last two papers concentrate on the role of information technology contributions, specifically, algorithms and mathematics.

This theme section witnesses the activity and dynamics that the field of computational biology and bioinformatics enjoys not only among biologists but also among computer scientists. It is the intensive interdisciplinary cooperation between these two scientific communities that is the motor of progress in this key technology for the 21st century.

Please contact:
Thomas Lengauer - GMD
Tel: +49 2241 14 2777
E-mail: [email protected]

Articles in this section:

Introduction
6 Bioinformatics: From the Pre-genomic to the Post-genomic Era by Thomas Lengauer

Review Papers
8 Computational Genomics: Making Sense of Complete Genomes by Anton Enright, Sophia Tsoka and Christos Ouzounis
10 Searching for New Drugs in Virtual Molecule Databases by Matthias Rarey and Thomas Lengauer

Classical Bioinformatics Problems
12 A High Performance Computing Network for Protein Conformation Simulations by Marco Pellegrini
13 Ab Initio Methods for Protein Structure Prediction: A New Technique based on Ramachandran Plots by Anna Bernasconi
15 Phylogenetic Tree Reconciliation: the Species/Gene Tree Problem by Jean-François Dufayard, Laurent Duret and François Rechenmann

New Developments
16 Identification of Drug Target Proteins by Alexander Zien, Robert Küffner, Theo Mevissen, Ralf Zimmer and Thomas Lengauer
18 Modeling and Simulation of Genetic Regulatory Networks by Hidde de Jong, Michel Page, Céline Hernandez, Hans Geiselmann and Sébastien Maza
19 Bioinformatics for Genome Analysis in Farm Animals by Andy S. Law and Alan L. Archibald
21 Modelling Metabolism Knowledge using Objects and Associations by Hélène Rivière-Rolland, Loïc Taloc, Danielle Ziébelin, François Rechenmann and Alain Viari
22 Co-operative Environments for Genomes Annotation: from Imagene to Geno-Annot by Claudine Médigue, Yves Vandenbrouke, François Rechenmann and Alain Viari

Medical applications
23 Arevir: Analysis of HIV Resistance Mutations by Niko Beerenwinkel, Joachim Selbig, Rolf Kaiser and Daniel Hoffmann
24 Human Brain Informatics - Understanding Causes of Mental Illness by Stefan Arnborg, Ingrid Agartz, Mikael Nordström, Håkan Hall and Göran Sedvall
26 Intelligent Post-Genomics by Francisco Azuaje

General
27 Combinatorial Algorithms in Computational Biology by Marie-France Sagot
28 Crossroads of Mathematics, Informatics and Life Sciences by Jan Verwer and Annette Kik


Computational Genomics: Making Sense of Complete Genomes

by Anton Enright, Sophia Tsoka and Christos Ouzounis

The current goal of bioinformatics is to take the raw genetic information produced by sequencing projects and make sense of it. The entire genome sequence should reflect the inheritable properties of a given species. At the Computational Genomics Group of the European Bioinformatics Institute (an EMBL outstation) in Cambridge, work is underway to tackle this vast flood of data using both existing and novel technologies for biological discovery.

The recent sequencing of the complete genomes of many species (including a 'draft' human genome) has emphasised the importance of bioinformatics research. Once the DNA sequence of an organism is known, proteins encoded by this sequence are predicted. While some of these proteins are highly similar to well-studied proteins whose functions are known, many will only have similarity to another poorly annotated protein from another genome or, worse still, no similarity at all. A major goal of computational genomics is to accurately predict the function of all proteins encoded by a genome, and if possible determine how each of these proteins interacts with other proteins in that organism. Using a combination of sequence analysis, novel algorithm development and data-mining techniques, the Computational Genomics Group (CGG) is targeting research on the following fields.

Automatic Genome Annotation
Accurately annotating the proteins encoded by complete genomes in a comprehensive and reproducible manner is important. Large scale sequence analysis necessitates the use of rapid computational methods for functional characterisation of molecular components. GeneQuiz is an integrated system for the automated analysis of complete genomes that is used to derive protein function for each gene from raw sequence information in a manner comparable to a human expert. It employs a variety of similarity search and analysis methods that entail the use of up-to-date protein and DNA databases and creates a compact summary of findings that can be accessed through a Web-based browser. The system applies an 'expert system' module to assess the quality of the results and assign function to each gene.

Assigning Proteins into Families
Clustering protein sequences by similarity into families is another important aspect of bioinformatics research. Many available clustering techniques fail to accurately cluster proteins with multiple domains into families. Multi-domain proteins generally perform at least two functions that are not necessarily related, and so ideally should belong in multiple families. To this end we have developed a novel algorithm called GeneRAGE. The GeneRAGE algorithm employs a fast sequence similarity search algorithm such as BLAST and represents similarity information between proteins as a binary matrix. This matrix is then processed and passed through successive rounds of the Smith-Waterman dynamic programming algorithm, to detect inconsistencies which

Figure 1: The GeneQuiz entry page for the S. cerevisiae genome.
Figure 2: Protein families in the Methanococcus jannaschii genome displayed using the X-layout algorithm.


represent false-positive or false-negative similarity assignments. The resulting clusters represent protein families accurately and also contain information regarding the domain structure of multi-domain proteins. A visualization program called xlayout, based on the Fruchterman-Reingold graph-layout optimisation algorithm, has also been developed for displaying these complex similarity relationships.
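To make the clustering step concrete, here is a minimal Python sketch of the general idea: an all-against-all binary similarity matrix is symmetrized, asymmetric entries are re-checked with a more sensitive scorer (standing in for a Smith-Waterman alignment), and families are read off as connected components. The function names, the threshold and the stand-in scorer are illustrative assumptions, not the published GeneRAGE implementation.

# Minimal sketch of GeneRAGE-style post-processing of an all-against-all
# similarity search (illustrative only, not the published implementation).

def find_inconsistencies(sim):
    """Return (i, j) pairs where the binary similarity matrix is asymmetric,
    i.e. i 'hits' j but j does not hit i - candidates for re-alignment."""
    n = len(sim)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if sim[i][j] != sim[j][i]]

def resolve(sim, pairs, rescore, threshold=50.0):
    """Re-check asymmetric pairs with a more sensitive scorer (e.g. a
    Smith-Waterman alignment) and make the matrix symmetric."""
    for i, j in pairs:
        hit = rescore(i, j) >= threshold
        sim[i][j] = sim[j][i] = hit
    return sim

def families(sim):
    """Single-linkage grouping: connected components of the similarity graph."""
    n, seen, out = len(sim), set(), []
    for s in range(n):
        if s in seen:
            continue
        comp, stack = [], [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in range(n):
                if sim[u][v] and v not in seen:
                    seen.add(v)
                    stack.append(v)
        out.append(sorted(comp))
    return out

# Example: protein 0 hits 1, but 1 does not hit 0 (asymmetric entry).
sim = [[True, True, False],
       [False, True, False],
       [False, False, True]]
pairs = find_inconsistencies(sim)                     # [(0, 1)]
sim = resolve(sim, pairs, rescore=lambda i, j: 80.0)  # stand-in scorer
print(families(sim))                                  # [[0, 1], [2]]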

Prediction of Protein Interaction
Another novel algorithm developed in the CGG group is the DifFuse algorithm. This algorithm is based on the hypothesis that there is a selective advantage for proteins performing related functions to fuse together during the course of evolution (eg different steps in the same metabolic pathway). The DifFuse algorithm can detect a fused protein in one genome based on its similarity to a complementary pair of unfused proteins in another genome. The detection of these fused proteins allows one to predict either functional association or direct physical interaction of the un-fused proteins. This algorithm is related to GeneRAGE in the sense that the fusion detection is similar to the multi-domain detection step described above. This algorithm can be applied to many genomes for large-scale detection of protein interactions.

Knowledge-Base Development
Databases in molecular biology and bioinformatics are generally poorly structured, many existing as flat text files. In order to get the most out of complex biological databases these data need to be represented in a format suitable for complex information extraction through a simple querying system and also ensure [...]. An ontology is an exact specification of a data model that can be used to generate such a 'knowledge' base. We have developed an ontology for representation of genomic data which is used to build a database called GenePOOL incorporating these concepts. This system stores computationally-derived information such as functional classifications, protein families and reaction information. Database analysis is performed through flexible and complex queries using LISP that are simply not possible through any other public molecular biology database. Similarly, we have also developed a standard for genome annotation called GATOS (Genome AnoTatiOn System) which is used as a data exchange format. Work is under development to incorporate an XML-based standard called XOL (XML Ontology Language).

Text-Analysis and Data-Mining
There is already a vast amount of data available in the abstracts of published biological text. The MEDLINE database contains abstracts for over 9 million biological papers published worldwide since 1966. However, these data are not represented in a format suitable for large-scale information extraction. We have developed an algorithm called TextQuest which can perform document clustering of MEDLINE abstracts. TextQuest uses an approach that restructures these biological abstracts and obtains the optimal number of terms that can associate large numbers of abstracts into meaningful groups. Using a term-weighting system based on the TF.IDF family of metrics and term frequency data from the British National Corpus, we select words that are biologically significant from abstracts and add them to a so-called go-list. Abstracts are clustered using an unsupervised learning approach, according to their [...] of words contained in the go-list. The xlayout algorithm (see above) is then used to display the clustering results. The resulting document clusters accurately represent sets of abstracts referring to the same biological process or pathway. TextQuest has been applied to the development of the dorsal-ventral axis in the fruit-fly Drosophila melanogaster and has produced meaningful clusters relating to different aspects of this developmental process.

Figure 3: Gene Fusion - The TRP5 tryptophan synthase protein in S. cerevisiae is a fusion of two single domains, such as TrpA and TrpB in E. coli.

Links:
http://www.ebi.ac.uk/research/cgg/

Please contact:
Christos A. Ouzounis - European Molecular Biology Laboratory, European Bioinformatics Institute
Tel: +44 1223 49 46 53
E-mail: [email protected]
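The go-list construction described above can be sketched roughly as follows; the word lists, the background frequencies and the scoring details are invented for illustration and do not reproduce the actual TextQuest code.

# Illustrative TF.IDF-style selection of biologically salient terms for a
# 'go-list', weighted against a background corpus (not the TextQuest code).
import math
from collections import Counter

def tfidf_go_list(abstracts, background_freq, top_k=5):
    """abstracts: list of token lists; background_freq: word -> relative
    frequency in a general-language corpus (e.g. derived from the BNC)."""
    doc_freq = Counter()
    for tokens in abstracts:
        doc_freq.update(set(tokens))
    n_docs = len(abstracts)
    scores = Counter()
    for tokens in abstracts:
        tf = Counter(tokens)
        for word, count in tf.items():
            idf = math.log(n_docs / doc_freq[word])
            # Down-weight words that are common in everyday English.
            rarity = 1.0 / (background_freq.get(word, 1e-6) + 1e-6)
            scores[word] = max(scores[word], count * idf * rarity)
    return [w for w, _ in scores.most_common(top_k)]

abstracts = [
    "the dorsal ventral axis is patterned by dpp signalling".split(),
    "dpp and sog set up the dorsal ventral axis of the embryo".split(),
    "the cell cycle is regulated by cyclin dependent kinases".split(),
]
background = {"the": 0.06, "is": 0.01, "by": 0.01, "and": 0.03, "of": 0.03}
print(tfidf_go_list(abstracts, background))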

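Similarly, the fusion-detection idea behind DifFuse can be illustrated with a toy routine: a protein from one genome that matches two different proteins of another genome over essentially non-overlapping regions is reported as a candidate fusion, and the two matched proteins as candidate interaction partners. The hit format and the overlap threshold are assumptions made for this sketch.

# Toy version of the gene-fusion idea behind DifFuse (illustrative only):
# a composite protein that matches two different proteins over largely
# non-overlapping regions suggests those two proteins interact.

def overlap(a, b):
    """Length of the overlap between two (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def predict_fusions(hits, max_overlap=20):
    """hits: list of (query, subject, q_start, q_end) similarity hits of
    proteins from genome A (queries) against genome B (subjects)."""
    by_query = {}
    for q, s, qs, qe in hits:
        by_query.setdefault(q, []).append((s, (qs, qe)))
    predictions = []
    for q, matches in by_query.items():
        for i in range(len(matches)):
            for j in range(i + 1, len(matches)):
                (s1, r1), (s2, r2) = matches[i], matches[j]
                if s1 != s2 and overlap(r1, r2) <= max_overlap:
                    predictions.append((q, s1, s2))
    return predictions

# TRP5-like example: one yeast protein matches TrpB over its N-terminal half
# and TrpA over its C-terminal half.
hits = [("TRP5", "TrpB", 1, 380), ("TRP5", "TrpA", 400, 700)]
print(predict_fusions(hits))   # [('TRP5', 'TrpB', 'TrpA')]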

Searching for New Drugs in Virtual Molecule Databases

by Matthias Rarey and Thomas Lengauer

The rapid progress in sequencing the human genome opens the possibility for the near future to understand many diseases better on the molecular level and to obtain so-called target proteins for pharmaceutical research. If such a target protein is identified, the search begins for those molecules which influence the protein's activity specifically and which are therefore considered to be potential drugs against the disease. At GMD, approaches to the computer-based search for new drugs are being developed (virtual screening) which have already been used by industry in parts.

Searching for New Lead Structures
The development process of a new medicine can be divided into three phases. In the first phase, the search for target proteins, the disease must be understood on the molecular-biological level as far as to know individual proteins and their importance to the symptoms. Proteins are the essential functional units in our organism and can perform various tasks ranging from the chemical transformation of materials up to the transportation of information. The function is always linked with the specific binding of other molecules. As early as 100 years ago, Emil Fischer recognised the lock-and-key principle: molecules that bind to each other are complementary to each other both spatially and chemically, just as only a specific key fits a given lock (see Figure 1). If a relationship between the suppression (or reinforcement) of a protein function and the symptoms is recognised, the protein is declared to be a target protein. In the second phase, the actual drug is developed. The aim is to detect a molecule that binds to the target protein, on the one hand, thus hindering its function, and that, on the other, has further properties that are demanded for drugs, for example, that it is well tolerated and accumulates in high concentration at the place of action. The first step is the search for a lead structure - a molecule that binds well to the target protein and serves as a first proposal for the drug. Ideally, the lead structure binds very well to the target protein and can be modified such that the resulting molecule is suitable as a drug. In the third phase, the drug is transformed into a medicine and is tested in several steps to see if it is well tolerated and efficient. The present paper discusses the first step, ie the computer-based methods of searching for new lead structures.

New Approaches to Screening Molecule Databases
The methods of searching for drug molecules can be classified according to two criteria: the existence of a three-dimensional structural model of the target protein and the size of the data set to be searched. If a structural model of the protein is available, it can be used directly to search for suitable drugs (structure-based virtual screening); ie we search for a key fitting a given lock. If a structural model is missing, the similarity to molecules that bind to the target protein is used as a measure for the suitability as a drug (similarity-based virtual screening). Here we use a given key to search for fitting keys without knowing the lock. In the end, the size of the data set to be searched decides on the amount of time to be put into the analysis of an individual molecule. The size ranges from a few hundred already preselected molecules, via large databases of several millions of molecules, to virtual combinatorial molecule libraries theoretically allowing the synthesis of up to billions of molecules from some hundred molecule building blocks.

The key problem in structure-based virtual screening is the prediction of the relative orientation of the target protein and a potential drug molecule, the so-called docking problem. For solving this problem we have developed the software tool FlexX [1] in co-operation with Merck KGaA, Darmstadt, and BASF AG, Ludwigshafen. On the one hand, the difficulty of the docking problem arises from the estimation of the free energy of a molecular complex in aqueous solution and, on the other, from the flexibility of the molecules involved. While a sufficient description of the flexibility of the protein presumably will not be possible even in the near future, the more important flexibility of the ligand is considered during a FlexX prediction. In a set of benchmark tests, FlexX is able to predict about 70 percent of the protein-ligand complexes sufficiently similar to the experimental structure. With about 90 seconds computing time per prediction, the software belongs to the fastest docking tools currently available. FlexX has been marketed since 1998 and is currently being used by about 100 pharmaceutical companies, universities and research institutes.

If the three-dimensional structure of the target protein is not available, similarity-based virtual screening methods are applied to molecules with known binding properties, called the reference molecule. The main problem here is the structural alignment problem, which is closely related to the docking problem described above. Here, we have to superimpose a potential drug molecule with the reference molecule so that a maximum of functional groups are oriented such that they can form the same interactions with the protein. Along the lines of FlexX, we have developed the software tool FlexS [2,3] for the prediction of structural alignments with approximately the same performance with respect to computing time and prediction quality.

If very large data sets are to be searched for similar molecules, the speed of the alignment-based screening does not suffice yet. The aim is to have comparison operations whose computation takes by far less than one second. Today linear descriptors (bit strings or integral vectors) are usually applied to solve this problem. They store the occurrence or absence of characteristic properties of the molecules such as specific chemical fragments or short paths in the molecule. Once such a descriptor has been determined, the linear


structure enables a very fast comparison. A considerable disadvantage is, however, that the overall structure of the molecule is represented only inadequately and an exact match between the fragments is frequently required for the recognition of similarities. As an alternative, we have developed a new descriptor, the feature tree [4], in co-operation with SmithKline Beecham Pharmaceuticals, King of Prussia (USA). Unlike the representation using linear descriptors, in this approach a molecule is represented by a tree structure representing the major building blocks of the molecules. If two molecules are to be compared with each other, the task is first to find an assignment of the building blocks of the molecules which might be able to occupy the same regions of the active site upon binding. With the aid of a time-efficient implementation, average comparison times of less than a tenth of a second can be achieved. This allows 400,000 molecule comparisons to be carried out overnight within a single computation on a single processor. Applying the new descriptor to [...] data sets, we could show that, in many cases, an increase of the rate of active molecules in a selected subset of the data set is achieved if compared with the standard method.

Designing New Medicines with the Computer?
Unlike many design problems from the field of engineering, for example the design of complex plants or microchips, the underlying models are still very inaccurate for solving problems in the biochemical environment. Important quantities such as the binding energy of protein-ligand complexes can be predicted only with high error rates. In addition, not only the interaction of the drug with the target protein is of importance to the development of medicines. The influence of the drug on the whole organism rather is to be examined. Even in the near future, it will not be possible to answer many questions arising in this context accurately by means of the computer due to their complexity: Is the drug absorbed in the organism? Does it produce the desired effect? Which side effects are experienced, or is it possibly even toxic? Therefore a medicine originating directly from the computer will be available neither in the near nor in the distant future. Nevertheless, the importance of the computer increases in drug research. The reason is the very great number of potential molecules which come into consideration as a drug for a specific application. The computer allows a reasonable pre-selection, and molecule libraries can be optimised for specific characteristics before synthesising. On the basis of experiments, the computer is in turn able to generate hypotheses which enable better planning of further experiments. In these domains, the computer has already proven to be a tool without which pharmaceutical research cannot be imagined anymore.

Figure: Complex between the protein HIV-protease (shown as blue ribbon) and a known inhibitor (shown in red). HIV-protease plays a major role in the reproduction cycle of the HIV virus. Inhibitors like the one shown here are used in the treatment of AIDS.

Links:
http://www.gmd.de/SCAI/

Please contact:
Matthias Rarey - GMD
Tel: +49 2241 14 2476
E-mail: [email protected]
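To give a concrete picture of the linear descriptors discussed in this article, the following sketch encodes molecules as fragment-presence bit strings and ranks a small database by Tanimoto similarity to a reference molecule. The fragment vocabulary and the molecules are invented; a feature tree, by contrast, would retain the coarse connectivity of the building blocks rather than a flat bit vector.

# Sketch of similarity screening with linear descriptors: each molecule is
# reduced to a bit string recording which fragments occur, and candidates
# are ranked by Tanimoto similarity to a reference (fragments are invented).

FRAGMENTS = ["benzene", "amide", "hydroxyl", "carboxyl", "amine", "ether"]

def fingerprint(fragments_present):
    return [1 if f in fragments_present else 0 for f in FRAGMENTS]

def tanimoto(a, b):
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0

def screen(reference, database):
    ref = fingerprint(reference)
    ranked = [(name, tanimoto(ref, fingerprint(frags)))
              for name, frags in database.items()]
    return sorted(ranked, key=lambda t: t[1], reverse=True)

database = {
    "mol_A": {"benzene", "amide", "amine"},
    "mol_B": {"hydroxyl", "ether"},
    "mol_C": {"benzene", "carboxyl", "amide"},
}
print(screen({"benzene", "amide", "hydroxyl"}, database))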


A High Performance Computing Network for Protein Conformation Simulations

by Marco Pellegrini

The Institute for Computational Mathematics (IMC-CNR), Pisa, with the collaboration of Prof. Alberto Segre of the University of Iowa, is now combining several core areas of expertise (sequential, distributed and parallel algorithms for numerical linear algebra and continuous optimization, methods for efficient computation of electrostatic fields) in a project on Protein Conformation Simulation.

The need for better methodologies to simulate biological molecular systems has not been satisfied by the increased computing power available nowadays. Instead such increased speed has pushed research into new areas and increasingly complex simulations. A clear example is the problem of protein folding. This problem is at the core of the technology of rational drug design and its impact on future society cannot be underestimated. In a nutshell, the problem is that of determining the 3-dimensional structure of a protein (and especially its biologically active sites) starting from its known DNA encoding. There are favourable opportunities for approaches employing a spectrum of competences, from strictly biological and biochemical to mathematical modeling and computer science, with special openings for the design of efficient algorithms.

Protein Folding requires searching for an optimal configuration (minimizing the energy) among exponentially large spaces of possible configurations, with a number of degrees of freedom ranging from hundreds to a few thousands. To cope with this challenge, a double action is required. On one hand, the use of High Performance Computing Networks permits the exploitation of the intrinsic parallelism which is present in many aspects of the problem. On the other hand, sophisticated and efficient algorithms are needed to exploit fully the deep mathematical properties of the physics involved. Brute force methods are at a loss here and the way is open for effective sampling methodologies, algorithms for combinatorial and continuous optimization, and strategies for searching over (hyper)surfaces with multiple local minima. Innovative techniques will be used in order to push forward the state of the art: a new distributed paradigm (Nagging and Partitioning), techniques from computational geometry (hierarchical representations) and innovative algorithms for long range interactions.

Figure: An image obtained with RASMOL 2.6 of Bovine HYDROLASE (SERINE PROTEINASE).

Nagging and Partitioning
As observed above, the protein folding problem is reduced to a searching problem in a vast space of conformations. A classical paradigm for finding the optimum over a complex search space is that of 'partitioning'. A master process splits the search space among several slave processes and selects the best one from the solutions returned by the slaves. This approach is valid when it is relatively easy to split the workload evenly among the slave processes and the searching strategy of each slave is already optimized. The total execution time is determined by the slowest of the slaves and, when any slave is faulty, the computation is either blocked or returns a sub-optimal solution. A different and complementary approach is that of 'nagging'. Here slaves operate on the same search space, however each slave uses a different search strategy. The total time is determined by the most efficient slave and the presence of a faulty slave does not block the computation. Moreover, in a branch and bound overall philosophy, it has been shown that the partial solutions of any processor help to speed up the search of others. Searching for the optimal energy conformation is a complex task where parallelism is present at many different levels, so that neither pure nagging nor pure partitioning is the best choice for all of them. A mixed


strategy that uses both is more adaptive and yields better chances of success.

Hierarchical Representations
Techniques for representing and manipulating hierarchies of 3-dimensional objects have been developed in computational geometry for the purpose of speeding up visibility computations and collision detection, and could be adapted to the folding problem. One of the main sub-problems in finding admissible configurations is to determine the existence or absence of steric clashes among single atoms in the model. This is a problem similar to that of collision detection for complex 3d models of macroscopic objects in robotics and geometric modeling. A popular technique is that of inclusion hierarchies. The hierarchy is organized as a tree and the root is associated with a bounding volume (eg an axis parallel box) enclosing the molecule. Recursively, we split the molecule into two groups of atoms and build the corresponding bounding boxes. The process stops at the leaves of the tree, each of which corresponds to a single atom. Such a representation speeds up steric testing since it quickly rules out many pairs of atoms that are distant in the tree hierarchy. In general many such trees may be built with the same input, and the aim is to obtain good properties such as minimizing the total surface or volume of the bounding boxes. It is interesting to note that such trees are more flexible and use less storage than uniform grid decompositions and even oct-tree data structures.

Electrostatic Long Range Interactions
A line of research at IMC has found interesting connections between electrostatic fields and integral geometric formulae, leading to new efficient and robust algorithms for computing electrostatic forces. In particular, for 3-dimensional continuous distributions of charge, there is a representation without analytic singularities for which an adaptive Gaussian quadrature algorithm converges with exponential speed. Such recent techniques benefit the calculation of molecular energy since they obtain a good approximation without considering all pairs of atoms, thus avoiding quadratic complexity growth.

Please contact:
Marco Pellegrini - IMC-CNR
Tel: +39 050 315 2410
E-mail: [email protected]
http://www.imc.pi.cnr.it/~pellegrini
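The inclusion hierarchy described above can be sketched as follows: atoms are split recursively into two groups, every node keeps an axis-parallel bounding box, and a clash query between two subtrees is pruned whenever the boxes are farther apart than the contact distance. Coordinates and the clash cutoff are made-up values for illustration.

# Sketch of steric-clash testing with a hierarchy of axis-parallel bounding
# boxes: distant subtrees are pruned without examining their atom pairs.
# (Illustrative only; coordinates and the clash distance are made up.)

class Node:
    def __init__(self, atoms):
        self.atoms = atoms                      # list of (x, y, z) centres
        self.lo = tuple(min(a[i] for a in atoms) for i in range(3))
        self.hi = tuple(max(a[i] for a in atoms) for i in range(3))
        self.left = self.right = None
        if len(atoms) > 1:                      # split along the widest axis
            axis = max(range(3), key=lambda i: self.hi[i] - self.lo[i])
            atoms = sorted(atoms, key=lambda a: a[axis])
            mid = len(atoms) // 2
            self.left, self.right = Node(atoms[:mid]), Node(atoms[mid:])

def box_gap(a, b):
    """Minimum distance between two axis-parallel boxes (0 if they touch)."""
    d2 = sum(max(0.0, max(a.lo[i] - b.hi[i], b.lo[i] - a.hi[i])) ** 2
             for i in range(3))
    return d2 ** 0.5

def clashes(a, b, cutoff=2.0):
    """Count atom pairs closer than `cutoff` between two subtrees."""
    if box_gap(a, b) > cutoff:
        return 0                                # whole subtree pair pruned
    if a.left is None and b.left is None:
        (p,), (q,) = a.atoms, b.atoms
        d = sum((p[i] - q[i]) ** 2 for i in range(3)) ** 0.5
        return 1 if d < cutoff else 0
    if a.left is None:
        return clashes(a, b.left, cutoff) + clashes(a, b.right, cutoff)
    return clashes(a.left, b, cutoff) + clashes(a.right, b, cutoff)

part1 = Node([(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (9.0, 9.0, 9.0)])
part2 = Node([(0.5, 0.5, 0.0), (20.0, 20.0, 20.0)])
print(clashes(part1, part2))   # 1 clash: the two atoms near the origin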

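The contrast between partitioning and nagging can also be illustrated with a toy search problem: partitioning splits the domain among workers and keeps the best slice result, while nagging runs different strategies over the whole domain and keeps the winner. The objective function and the two strategies below are invented; a real implementation would additionally share bounds between slaves in a branch-and-bound fashion, as described in the article.

# Toy contrast between 'partitioning' (split the search space, keep the best
# slice result) and 'nagging' (same space, different strategies, keep the
# winner). The energy function and strategies are invented for illustration.
from concurrent.futures import ThreadPoolExecutor

def energy(x):
    return (x - 617) ** 2 % 1000        # rugged toy landscape on integers

def exhaustive(lo, hi):
    return min(range(lo, hi), key=energy)

def coarse_then_refine(lo, hi, step=37):
    best = min(range(lo, hi, step), key=energy)
    return min(range(max(lo, best - step), min(hi, best + step)), key=energy)

def partitioning(lo, hi, n_slaves=4):
    size = (hi - lo) // n_slaves
    slices = [(lo + i * size, hi if i == n_slaves - 1 else lo + (i + 1) * size)
              for i in range(n_slaves)]
    with ThreadPoolExecutor(n_slaves) as pool:
        results = list(pool.map(lambda s: exhaustive(*s), slices))
    return min(results, key=energy)     # master keeps the best slice result

def nagging(lo, hi):
    strategies = [exhaustive, coarse_then_refine]
    with ThreadPoolExecutor(len(strategies)) as pool:
        results = [pool.submit(s, lo, hi) for s in strategies]
    return min((r.result() for r in results), key=energy)

print(partitioning(0, 10000), nagging(0, 10000))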
Ab Initio Methods for Protein Structure Prediction: A New Technique based on Ramachandran Plots by Anna Bernasconi and Alberto M. Segre

A new technique for ab initio protein structure prediction, based on Ramachandran plots, is currently being studied at the University of Iowa, under the guidance of Prof. Alberto M. Segre, in collaboration with the Institute for Computational Mathematics, IMC-CNR, Pisa.

One of the most important open problems in molecular biology is the prediction of the spatial conformation of a protein from its primary structure, ie from its sequence of amino acids. The classical methods for structure analysis of proteins are X-ray crystallography and nuclear magnetic resonance (NMR). Unfortunately, these techniques are expensive and can take a long time (sometimes more than a year). On the other hand, the sequencing of proteins is relatively fast, simple, and inexpensive. As a result, there is a large gap between the number of known protein sequences and the number of known three-dimensional protein structures. This gap has grown over the past decade (and is expected to keep growing) as a result of the various genome projects worldwide. Thus, computational methods which may give some indication of structure and/or function of proteins are becoming increasingly important. Unfortunately, since it was discovered that proteins are capable of folding into their unique native state without any additional genetic mechanisms, over 25 years of effort has been expended on the determination of the three-dimensional structure from the sequence alone, without further experimental data. Despite the amount of effort, the protein folding problem remains largely unsolved and is therefore one of the most fundamental unsolved problems in computational molecular biology today.

How can the native state of a protein be predicted (either the exact or the approximate overall fold)? There are three major approaches to this problem: 'comparative modelling', 'threading', and 'ab initio prediction'. Comparative modelling exploits the fact that evolutionarily related proteins with similar sequences, as measured by the percentage of identical residues at each position based on an optimal structural superposition, often have similar structures. For example, two sequences that have just 25% sequence identity usually have the same overall fold. Threading methods compare a target sequence against a library of structural templates, producing a list of scores. The scores are then ranked and the fold with the best score is assumed to be the one adopted by the sequence. Finally, the ab initio prediction methods consist in modelling all the energetics involved in the process of folding, and then in finding the structure with lowest free energy. This approach is based on the 'thermodynamic hypothesis', which states that the native structure of a protein is the one for which the free energy achieves the global


minimum. While ab initio prediction is clearly the most difficult, it is arguably the most useful approach.

There are two components to ab initio prediction: devising a scoring (ie, energy) function that can distinguish between correct (native or native-like) structures and incorrect ones, and a search method to explore the conformational space. In many methods, the two components are coupled together such that a search function drives, and is driven by, the scoring function to find native-like structures. Unfortunately, this direct approach is not really useful in practice, both due to the difficulty of formulating an adequate scoring function and to the formidable computational effort required to solve it. To see why this is so, note that any fully-descriptive energy function must consider interactions between all pairs of atoms in the polypeptide chain, and the number of such pairs grows quadratically with the number of amino acids in the protein. To make matters worse, a full model would also have to contend with vitally important interactions between the protein's atoms and the environment, the so-called 'hydrophobic effect'. Thus, in order to make the computation practical, simplifying assumptions must necessarily be made.

Different computational approaches to the problem differ as to which assumptions are made. A possible approach, based on the discretization of the conformational space, is that of deriving a protein-centric lattice, by allowing the backbone torsion angles, phi, psi, and omega, to take only a discrete set of values for each different residue type. Under biological conditions, the bond lengths and bond angles are fairly rigid. Therefore, the internal torsion angles along the protein backbone determine the main features of the final geometric shape of the folded protein. Furthermore, one can assume that each of the torsion angles is restricted to a small, finite set of values for each different residue type. As a matter of fact, not all torsion angles are created equally. While they may feasibly take any value from -180 to 180 degrees, in nature all these values do not occur with uniform frequency. This is due to the geometric constraints from neighboring atoms, which dramatically restrict the commonly occurring legal values for the torsion angles. In particular, the peptide bond is rigid (and shorter than expected) because of the partial double-bond of the CO-NH bond. Hence the torsion angle omega around this bond generally occurs in only two conformations: 'cis' (omega about 0 degrees) and 'trans' (omega about 180 degrees), with the trans conformation being by far the more common. Moreover the other two torsion angles, phi and psi, are highly constrained, as noted by G. N. Ramachandran (1968). The (paired) values allowed for them can be selected using clustering algorithms operating in a Ramachandran plot space constructed from the protein database (http://www.rcsb.org/pdb), while discrete values for omega can be set to {0, 180}. This defines a space of (2k)^n possible conformations for a protein with n amino acids (assuming each phi and psi pair is allowed to assume k distinct values). In this way, the conformational space of proteins is discretized in a protein-centric fashion. An approximate folding is then found by searching this reduced, discrete space, guided by some scoring function. Of course, this discrete search process is an exponential one, meaning that in its most naive form it is impractical for all but the smallest proteins. Thus, the search should be made more palatable by incorporating a number of search pruning and reduction techniques, and/or by exploring the discrete space in parallel. From this naive folding, appropriate constraints (based on atomic bonds present, their bond lengths, and any Van der Waals or sulfide-sulfide interactions in the naive folding) are derived for an interior-point optimization process, which adjusts atomic positions and computes an energy value for this conformation. The computed energy value is then passed back to the discrete space and used to tune the scoring function for further additional parallel pruning.

Figure 1: Backbone torsion angles of a protein.
Figure 2: The Ramachandran Plot.

Please contact:
Anna Bernasconi - IMC-CNR
Tel: +39 050 3152411
E-mail: [email protected]
or Alberto M. Segre - University of Iowa
E-mail: [email protected]
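A toy version of the discretized search described above: each residue draws its (phi, psi) pair from a small set of Ramachandran-style clusters, omega is fixed to trans, and a branch-and-bound enumeration keeps the lowest-scoring assignment. The allowed angle sets and the scoring function are invented placeholders for the real clustered values and energy model.

# Toy branch-and-bound search over a protein-centric lattice: every residue
# may take one of k allowed (phi, psi) pairs (Ramachandran-style clusters),
# omega is fixed at 180 degrees (trans). The score function is invented.

ALLOWED = {                 # illustrative (phi, psi) clusters per residue type
    "G": [(-60, -40), (60, 40), (-80, 170)],
    "A": [(-60, -40), (-120, 130)],
    "P": [(-60, 150), (-60, -30)],
}

def score(angles):
    """Invented stand-in for an energy function: prefers helical values."""
    return sum(abs(phi + 60) + abs(psi + 45) for phi, psi in angles) / 100.0

def best_conformation(sequence):
    best = (float("inf"), None)
    def extend(prefix, idx):
        nonlocal best
        if score(prefix) >= best[0]:          # bound: partial score only grows
            return
        if idx == len(sequence):
            best = (score(prefix), list(prefix))
            return
        for pair in ALLOWED[sequence[idx]]:
            extend(prefix + [pair], idx + 1)
    extend([], 0)
    return best

energy, angles = best_conformation("GAAP")
print(round(energy, 2), angles)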


Phylogenetic Tree Reconciliation: the Species/Gene Tree Problem by Jean-François Dufayard, Laurent Duret and François Rechenmann

An algorithm to find gene duplications in phylogenetic trees in order to improve gene function inferences has been developed in a collaboration between the Helix team from INRIA Rhône-Alpes and the 'biométrie moléculaire, évolution et structure des génomes' team from the UMR 'biométrie, biologie évolutive de Lyon'. The algorithm and its software are applicable to realistic data, especially n-ary species trees and unrooted phylogenetic trees. The algorithm also takes branch lengths into account.

With appropriate algorithms, it is possible to deduce species history by studying gene sequences. Genes are indeed subject to mutations during the evolution process, and hence the corresponding (homologous) sequences in different species differ from each other. A tree can be built from the comparison of sequences, relating genes and species history: a phylogenetic tree. Sometimes a phylogenetic tree disagrees with the species tree (constructed for example from anatomical and paleontological considerations). These differences can be explained by a gene being duplicated in a genome, and each copy having its own history. Consequently, a node in a phylogenetic tree can be the division of an ancestral species into two others, as well as a gene duplication. More precisely, in a family of homologous genes, paralogous genes have to be distinguished from orthologous genes. Two genes are orthologous if the divergence from their last common ancestor results from a speciation event, while they are paralogous if the divergence results from a duplication event.

It is essential to make the distinction because two paralogous genes are less likely to have preserved the same function than two orthologues. Therefore, if one wants to predict gene function by homology between different species, it is necessary to check whether genes are orthologous or paralogous to increase the accuracy of the prediction.

An algorithm has been developed which can deduce this information by comparing gene trees with the taxonomy of different species. Currently, the algorithm is applicable to gene families issued from vertebrates. It can be applied to realistic data: species trees may not necessarily be binary, and the tree structures are compared as well as their branch lengths. Finally, phylogenetic trees can be unrooted: the number of duplications is a good criterion to make a choice, and with this method the algorithm is able to root phylogenetic trees.

Software has been developed to use the algorithm. It has been written in JAVA 1.2, and the graphical interface permits an easy application to realistic data. An exhaustive species tree can be easily seen and edited (tested with more than 10,000 leaves). Results can be modified and saved.

Links:
Action Helix: http://www.inrialpes.fr/helix

Please contact:
Jean-François Dufayard - INRIA Rhône-Alpes
Tel: +33 4 76 61 53 72
E-mail: [email protected]

Reconciliation of a phylogenetic tree (lower-left corner) with the corresponding species tree (upper-left corner). Reconciliation produces a reconciled tree (right) which describes both gene and species history. It allows one to deduce the location of gene duplications (white squares), which is the only information needed to distinguish orthologous from paralogous genes.
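
The article does not spell out the reconciliation algorithm itself; the sketch below is only the classical LCA-mapping idea for rooted binary trees, with invented class names: a gene-tree node is counted as a duplication when the species-tree node it maps to coincides with the mapping of one of its children.

  import java.util.*;

  // Minimal sketch of LCA-based gene tree / species tree reconciliation
  // (rooted binary trees; names are invented for illustration only).
  public class Reconciliation {

      static class Node {
          String label;                 // species name at leaves, null otherwise
          Node left, right, parent;
          Node(String label) { this.label = label; }
          Node(Node l, Node r) { left = l; right = r; l.parent = this; r.parent = this; }
          boolean isLeaf() { return left == null; }
      }

      // Least common ancestor in the species tree, found by walking to the root.
      static Node lca(Node a, Node b) {
          Set<Node> anc = new HashSet<>();
          for (Node x = a; x != null; x = x.parent) anc.add(x);
          for (Node y = b; y != null; y = y.parent) if (anc.contains(y)) return y;
          return null;
      }

      // Maps a gene-tree node to a species-tree node and counts duplications:
      // a node is a duplication if its mapping equals that of one of its children.
      static Node map(Node g, Map<String, Node> speciesLeaf, int[] dup) {
          if (g.isLeaf()) return speciesLeaf.get(g.label);
          Node ml = map(g.left, speciesLeaf, dup);
          Node mr = map(g.right, speciesLeaf, dup);
          Node m = lca(ml, mr);
          if (m == ml || m == mr) dup[0]++;
          return m;
      }

      public static void main(String[] args) {
          // Species tree: ((human, mouse), chicken)
          Node human = new Node("human"), mouse = new Node("mouse"), chicken = new Node("chicken");
          Node hm = new Node(human, mouse);
          Node rootS = new Node(hm, chicken);
          Map<String, Node> leaves = new HashMap<>();
          leaves.put("human", human); leaves.put("mouse", mouse); leaves.put("chicken", chicken);

          // Gene tree with one duplication: ((human, mouse), (human, chicken))
          Node g = new Node(new Node(new Node("human"), new Node("mouse")),
                            new Node(new Node("human"), new Node("chicken")));
          int[] dup = {0};
          map(g, leaves, dup);
          System.out.println("duplications: " + dup[0]);   // prints 1
      }
  }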


Identification of Drug Target Proteins

by Alexander Zien, Robert Küffner, Theo Mevissen, Ralf Zimmer and Thomas Lengauer

As ever more knowledge is accumulated on various aspects of the molecular machinery underlying the biological processes within organisms, including the human, the question of how to exploit this knowledge to combat diseases becomes increasingly urgent. At GMD, scientists work to utilize protein structures, expression profiles as well as metabolic and regulatory networks in the search for target proteins for pharmaceutical applications.

Huge amounts of heterogeneous data are pouring out of the biological labs into molecular biology databases. Most popular are the sequencing projects that decipher complete genomes and uncover their protein complements. Many more projects are under way, aiming e.g. at resolving the yet unknown protein folds or at collecting human single nucleotide polymorphisms. In a less coordinated way, many labs measure gene expression levels inside cells in numerous cell states. Last but not least, there is an enormous amount of unstructured information hidden in the plethora of scientific publications, as is documented in PubMed, for instance. Each of these sources of data provides valuable clues to researchers that are interested in molecular biological problems. The project TargId at GMD SCAI focuses on methods to address the arguably most urgent problem: the elucidation of the origins and mechanisms of human diseases, culminating in the identification of potential drug target proteins.

TargId responds to the need for bioinformatics support for this task. The goal of the project is to develop methods that extract useful knowledge from the raw data and help to focus on the relevant items of data. The most sophisticated aspect is the generation of new insights through the combination of information from different sources. Currently, our TargId methodology builds on three main pillars: protein structure prediction, expression data analysis and metabolic/regulatory pathway modeling.

Protein Structure Prediction
Knowledge on the three-dimensional structure (fold) of a protein provides clues on its function and aids in the search for inhibitors and other drugs. Threading is an approach to structure prediction which essentially assesses the compatibility of the given protein sequence to a known fold by aligning the sequence onto the known protein structure. Thus, threading utilizes the available knowledge directly in the form of the structures that are already experimentally resolved. Another advantage of threading in comparison to ab initio methods is the low demand of computing time. This is especially true for 123D, a threader developed at GMD that models pairwise amino acid contacts in a way that allows alignment computation by fast dynamic programming. The objective function for the optimization includes potentials accounting for chemical environments within the protein structure and membership in secondary structure elements, as well as amino acid substitution scores and gap penalties, all carefully balanced by a systematic procedure. A second threading program programmed at GMD, called RDP, utilizes more computation time than 123D in order to optimize full pair interaction contact potentials, yielding refined alignments.

Expression Data Analysis
Data on expressed genes comes in several different flavors. The historically first method is the generation of ESTs (expressed sequence tags), i.e. low-quality sequence segments of mRNA, either proportional to its cellular abundance or enriched for rare messages. While ESTs are superseded by more modern methods for the purpose of expression level measurements, they are still valuable for finding new genes and resolving gene structures in the genomic DNA. Thus, we have implemented a variant of 123D that threads ESTs directly onto protein structures, thereby translating nucleotide codons into amino acids on the fly. This program, called EST123D, is useful for proteins that are not yet characterized other than by ESTs.

Nowadays, gene expression levels are most frequently obtained by DNA chips, micro-arrays or SAGE (serial analysis of gene expression). These technologies are designed for high throughput and are already capable of monitoring the complete gene inventory of small organisms (or a large part of all human genes). We apply machine learning techniques in order to normalize the resulting raw measurement data; subsequently, we apply statistics and related techniques in order to identify differentially regulated genes, clusters of cell states, etc.

Pathway Modeling
While a large part of the current effort is focused on inferring high-level structures from gene expression data, much is already known on the underlying genetic networks. Several databases that are available on the internet document metabolic relations in machine-readable form. The situation is worse for regulatory pathways; most of this knowledge is still hidden in the literature. Consequently, we have implemented methods that extract additional protein relations from article abstracts and model them as Petri nets. The resulting graphs can be restricted to species, tissues, diseases or other areas of interest. Tools are under development for viewing and editing using standard graph and Petri net packages. The generated networks can provide overviews that cross the boundaries of competence fields of human experts.

Further means are necessary to allow for more detailed analyses. We can automatically extract pathways from networks that are far too large and complicated to lend themselves to easy interpretation. Pathways are biologically meaningful subgraphs, e.g. signaling cascades or metabolic pathways that account for supply and consumption of any intermediate metabolites. Another method conceived in TargId, called DMD (differential metabolic display), allows for comparing different systems (organisms, tissues, etc.) on the level of complete pathways rather than mere interactions.
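
As a rough illustration of the dynamic-programming form that such a threading alignment takes (the potentials of 123D and RDP are not reproduced here; all scores below are invented), a sequence can be aligned to a string of structural environment labels with a standard global-alignment recurrence:

  // Toy sketch of threading as dynamic programming (all scores are invented):
  // each structure position carries an environment label, and aligning residue
  // aa to that position scores a substitution-like term depending on (aa, label).
  public class ToyThreader {

      static final int GAP = -2;                       // hypothetical gap penalty

      // Hypothetical compatibility of an amino acid with a structural environment
      // ('b' = buried, 'e' = exposed, 'h' = helix).
      static int fit(char aa, char env) {
          boolean hydrophobic = "AVLIFMW".indexOf(aa) >= 0;
          switch (env) {
              case 'b': return hydrophobic ? 3 : -1;
              case 'e': return hydrophobic ? -1 : 2;
              default:  return 1;
          }
      }

      // Global alignment of sequence vs. structure environments (Needleman-Wunsch).
      static int align(String seq, String envs) {
          int n = seq.length(), m = envs.length();
          int[][] s = new int[n + 1][m + 1];
          for (int i = 1; i <= n; i++) s[i][0] = s[i - 1][0] + GAP;
          for (int j = 1; j <= m; j++) s[0][j] = s[0][j - 1] + GAP;
          for (int i = 1; i <= n; i++)
              for (int j = 1; j <= m; j++)
                  s[i][j] = Math.max(
                      s[i - 1][j - 1] + fit(seq.charAt(i - 1), envs.charAt(j - 1)),
                      Math.max(s[i - 1][j] + GAP, s[i][j - 1] + GAP));
          return s[n][m];
      }

      public static void main(String[] args) {
          System.out.println(align("MKVLAW", "bbeehh"));  // score of the best threading
      }
  }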


(Figure elements: metabolic/regulatory network; extracted pathways; structure/function prediction; gene expression data; pathway scoring; predicted target protein.)

In the TargId project, new bioinformatics methods combine heterogeneous information in the search for drug target proteins.

Bringing it All Together ...
Each of the methods described above can provide valuable clues pointing to target proteins. But the crux lies in their clever combination, interconnecting data from different sources. In recent work, we have shown that in real life situations clustering alone may not be able to reconstruct pathways from gene expression data. Instead of searching for meaning in clusters, we invented an approach that proceeds inversely: first, a set of pathways is extracted from a protein/gene network, using the methods described above. Then, these pathways are scored with respect to gene expression data. The restriction to pathways prevents us from considering unreasonable groupings of proteins, while it still allows for incorporating and testing hypotheses. E.g., pathways can be constructed from interactions that are observed in different tissues or species. The expression data provide an orthogonal view on these interactions and can thus be used to validate the hypotheses.

Structure prediction can aid in this process at several stages. First, uncharacterized proteins can tentatively be embedded into known networks based on predicted structure and function. Second, structural information can be integrated into the pathway scoring function. Finally, when a target protein is identified, its structure will be of utmost interest for further investigations.

It can be imagined that target finding can gain from broadening the basis for the search to also include, e.g., phylogenetic profiles, post-translational modifications, genome organization or polymorphisms. As these fields are still young and in need of further progress, it is clear that holistic target finding is only in its infancy.

Links:
http://cartan.gmd.de/TargId/

Please contact:
Alexander Zien or Ralf Zimmer - GMD
Tel: +49 2241 14-2563 or -2818
E-mail: [email protected] or [email protected]
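
The project's actual pathway scoring function is not given in this article; one minimal, hypothetical way to score extracted pathways against expression data is the mean pairwise correlation of the expression profiles of the member genes, as in this sketch (gene names and profiles are made up):

  import java.util.*;

  // Hedged sketch (not TargId's actual scoring function): score each candidate
  // pathway by the mean pairwise Pearson correlation of the expression profiles
  // of its member genes, and keep the best-scoring pathway.
  public class PathwayScoring {

      static double pearson(double[] x, double[] y) {
          int n = x.length;
          double mx = 0, my = 0;
          for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
          mx /= n; my /= n;
          double sxy = 0, sxx = 0, syy = 0;
          for (int i = 0; i < n; i++) {
              sxy += (x[i] - mx) * (y[i] - my);
              sxx += (x[i] - mx) * (x[i] - mx);
              syy += (y[i] - my) * (y[i] - my);
          }
          return sxy / Math.sqrt(sxx * syy);
      }

      static double score(List<String> pathway, Map<String, double[]> expr) {
          double sum = 0; int pairs = 0;
          for (int i = 0; i < pathway.size(); i++)
              for (int j = i + 1; j < pathway.size(); j++) {
                  sum += pearson(expr.get(pathway.get(i)), expr.get(pathway.get(j)));
                  pairs++;
              }
          return pairs == 0 ? 0 : sum / pairs;
      }

      public static void main(String[] args) {
          Map<String, double[]> expr = new HashMap<>();   // hypothetical profiles
          expr.put("geneA", new double[]{1, 2, 3, 4});
          expr.put("geneB", new double[]{2, 4, 6, 8});
          expr.put("geneC", new double[]{4, 3, 2, 1});
          List<List<String>> pathways = Arrays.asList(
              Arrays.asList("geneA", "geneB"), Arrays.asList("geneA", "geneC"));
          pathways.stream()
                  .max(Comparator.comparingDouble(p -> score(p, expr)))
                  .ifPresent(best -> System.out.println("best pathway: " + best));
      }
  }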


Modeling and Simulation of Genetic Regulatory Networks

by Hidde de Jong, Michel Page, Céline Hernandez, Hans Geiselmann and Sébastien Maza

In order to understand the functioning of an organism, the network of interactions between genes, mRNAs, proteins, and other molecules needs to be elucidated. Since 1999, researchers in the bioinformatics group at INRIA Rhône-Alpes have been developing a computer tool for the modeling and simulation of genetic regulatory networks in collaboration with molecular biologists.

The sequencing of the entire genomes of prokaryotic and eukaryotic organisms has been completed in the past few years, culminating in the presentation of a working draft of the human genome last June. The analysis of these huge amounts of data involves such tasks as the prediction of folding structures of proteins and the identification of genes and regulatory signals. It is clear, however, that the structural analysis of sequence data needs to be complemented with a functional analysis to elucidate the role of genes in controlling fundamental biological processes.

One of the central problems to be addressed is the analysis of genetic regulatory systems controlling the spatiotemporal expression of genes in an organism. The structure of these regulatory systems can be represented as a network of interactions between genes, proteins, metabolites, and other small molecules. The study of genetic regulatory networks will contribute to our understanding of complex processes like the development of a multicellular organism.

In addition to new experimental tools permitting expression levels to be rapidly measured, computer tools for the modeling, visualization, and simulation of genetic regulatory systems will be indispensable. Most systems of interest involve many genes connected through cascades and positive and negative feedback loops, so that an intuitive understanding of their dynamics is hard to obtain. As a consequence of the lack of quantitative information on regulatory interactions, traditional modeling and simulation techniques are usually difficult to apply. To counter this problem, we have developed a method for the qualitative simulation of regulatory systems based on ideas from mathematical biology and qualitative reasoning.

The method describes genetic regulatory systems by piecewise-linear differential equations with favourable mathematical properties. The phase space is subdivided into volumes in which the equations reduce to simple, linear and orthogonal differential equations imposing strong constraints on the local behavior of the system. By analyzing the possible transitions between volumes, an indication of the global behavior of the system can be obtained. In particular, the method determines steady-state volumes and volume cycles that are reachable from an initial volume. The steady-state volumes and volume cycles correspond to functional states of the regulatory system, for instance a response to a physiological perturbation of the organism (a change in temperature or nutrient level).

Three stages in the simulation process as seen through GNA. (a) The network of genes and regulatory interactions that is transformed into a mathematical model. (b) The volume transition graph resulting from the simulation. (c) A closer look at the path in the volume transition graph selected in (b). The graph shows the qualitative temporal evolution of two protein concentrations.


The above method has been implemented in Java 1.2 in a program called GNA (Genetic Network Analyzer). GNA reads and parses input files with the equations and inequalities specifying the model of the system as well as the initial volume. An inequality reasoner iteratively generates the volumes that are reachable from the initial volume through one or more transitions. The output of the program consists of the graph of all reachable volumes connected by transitions. A graphical interface facilitating the interaction of the user with the program is under development. At present, a visualization module has been realized by which a network of interactions between genes can be displayed, as well as the volume transition graph resulting from the simulation. In addition, the user can focus upon particular paths in the graphs to study the qualitative temporal evolution of gene product concentrations in more detail (see figures).

GNA has been tested using genetic regulatory networks described in the literature, such as the example of lambda phage growth control in the bacterium Escherichia coli. Simulation experiments with random regulatory networks have shown that, with the current implementation, our method remains tractable for systems of up to 18 genes involved in complex feedback loops.

We plan GNA to evolve into an environment for the computer-supported analysis of genetic regulatory networks, covering a range of activities in the design and testing of models. These activities, such as the validation of hypothesized models of regulatory networks by means of experimental data, will be accessible through a user-friendly graphical interface. In parallel, we will apply the method to the analysis of bacterial regulatory systems in collaboration with biologists at the Université Joseph Fourier in Grenoble.

Links:
HELIX project: http://www.inrialpes.fr/helix/

Please contact:
Hidde de Jong - INRIA Rhône-Alpes
Tel: +33 4 76 61 53 35
E-mail: [email protected]
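
GNA's inequality reasoner is considerably more refined than this, but the overall idea of generating a volume (state) transition graph can be sketched as a breadth-first search over discrete qualitative states with a hand-written transition rule (the two-gene rule below is a toy assumption, not a GNA model):

  import java.util.*;

  // Minimal sketch (not GNA's actual reasoner): a qualitative state is a vector
  // of discrete concentration levels; successors move one variable one step
  // towards a target level given by a piecewise rule, and BFS collects the
  // graph of all states reachable from an initial state.
  public class QualitativeSimulation {

      // Toy piecewise rule for two mutually inhibiting genes: a gene is driven
      // to level 2 when its inhibitor is low (< 2) and to level 0 otherwise.
      static int target(int[] s, int gene) {
          int inhibitor = s[1 - gene];
          return inhibitor < 2 ? 2 : 0;
      }

      static List<int[]> successors(int[] s) {
          List<int[]> next = new ArrayList<>();
          for (int g = 0; g < s.length; g++) {
              int t = target(s, g);
              if (t != s[g]) {                       // move one qualitative step
                  int[] n = s.clone();
                  n[g] += Integer.signum(t - s[g]);
                  next.add(n);
              }
          }
          return next;
      }

      public static void main(String[] args) {
          Deque<int[]> queue = new ArrayDeque<>();
          Set<String> seen = new LinkedHashSet<>();
          int[] init = {0, 0};
          queue.add(init);
          seen.add(Arrays.toString(init));
          while (!queue.isEmpty()) {
              int[] s = queue.poll();
              for (int[] n : successors(s)) {
                  String key = Arrays.toString(n);
                  if (seen.add(key)) queue.add(n);
                  System.out.println(Arrays.toString(s) + " -> " + key);
              }
          }
          System.out.println("reachable states: " + seen.size());
      }
  }

States with no outgoing transitions in the printed graph play the role of the steady-state volumes mentioned above.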

Bioinformatics for Genome Analysis in Farm Animals by Andy S. Law and Alan L. Archibald

The Bioinformatics Group at the Roslin Institute develops tools and resources for farm animal genome analysis. These encompass the databases, analytical and display tools required for mapping complex genomes. The web is used to deliver the resources to users.

The Bioinformatics Group at the Roslin Institute aims to provide access to appropriate bioinformatics tools and resources for farm animal genome analysis. Genome research in farm animals is largely concerned with mapping genes that influence economically important traits. As yet, there are no large-scale genome sequencing activities. The requirements are for systems to support genetic (linkage), quantitative trait locus (QTL), radiation hybrid and physical mapping and to allow data sharing between research groups distributed world-wide.

resSpecies - a Resource for Linkage and QTL Mapping
Genetic linkage maps are constructed by following the co-segregation of marker alleles through multi-generation pedigrees. In quantitative trait locus (QTL) mapping, the performance of the animals is also recorded. Both QTL and linkage mapping studies require databases to store and manage the experimental observations. Data sharing between research groups is particularly valuable in linkage mapping. Only by pooling data from the collaborating groups can comprehensive maps be built.

We developed resSpecies to meet this need. It uses a relational database management system (RDBMS - INGRES) with a web-based interface implemented using Perl and Webintool (Hu et al. 1996. WebinTool: A generic Web to database interface building tool. Proceedings of the 7th International Conference and Workshop on Database and Expert Systems (DEXA 96), Zurich, September 9-13, 1996, pp 285-290). This makes international collaborations simple to effect.

The relational design ensures that complicated pedigrees can be represented relatively simply. Populations are defined as groups of individuals. Within resSpecies, access is granted to individual contributors/collaborators on each population separately. The database stores details of markers and alleles. Genotypes may be submitted through a simple web interface that infers missing genotypes, checks for Mendelian inheritance and rejects data that contains inheritance errors. Using a series of simple query forms, data can be extracted in the correct format expected by a number of popular genetic analysis algorithms (eg crimap). This eliminates the possibility of cryptic typographical errors occurring and ensures that the most up-to-date data is available at all times.

resSpecies is used to support Roslin's internal programmes and several international collaborative linkage and QTL mapping projects.
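
The Mendelian consistency rule applied when genotypes are submitted is simple enough to sketch in code (illustrative Java, not the resSpecies implementation): an offspring genotype is accepted only if one of its alleles can come from the mother and the other from the father.

  // Sketch of a Mendelian inheritance check for a parent-offspring trio.
  // Each genotype is a two-allele array, e.g. {"A", "B"}.
  public class MendelCheck {

      static boolean consistent(String[] child, String[] mother, String[] father) {
          return (from(mother, child[0]) && from(father, child[1]))
              || (from(mother, child[1]) && from(father, child[0]));
      }

      static boolean from(String[] parent, String allele) {
          return parent[0].equals(allele) || parent[1].equals(allele);
      }

      public static void main(String[] args) {
          String[] mother = {"A", "B"}, father = {"B", "C"};
          System.out.println(consistent(new String[]{"A", "C"}, mother, father)); // true
          System.out.println(consistent(new String[]{"A", "A"}, mother, father)); // false -> reject
      }
  }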


A comparison of physical map and genetic maps of pig chromosome 8 with a genetic map of cattle chromosome 6. The maps are drawn ‘on-the-fly’ by the Anubis map viewer using data held in the ARKdb pig and cattle genome databases.

ARKdb - a Generic Genome Database
Scientists engaged in genome mapping research also need access to contemporary summaries of maps and other genome-related data. We have developed a relational (INGRES) genome database model (ARKdb) to handle these data, along with web-based tools for data entry and display. The information stored in the ARKdb databases includes linkage and cytogenetic map assignments, polymorphic marker details, PCR primers, and two point linkage data. Each observation is attributed to a reference source. Hot links are provided to other data sources, eg sequence databases and Medline (Pubmed).

The ARKdb database model has been implemented for data from pigs, chickens, sheep, cattle, horses, deer, turkeys, cats, salmon and tilapia. The full cluster of ARKdb databases is mounted on the genome server at Roslin with subsets at Texas A+M and Iowa State Universities. We have also developed The Comparative Animal Genome database (TCAGdb) to capture evidence that specific pairs of genes are homologous. We are developing automated Artificial Intelligence methods to evaluate homology data.

The Anubis Map Viewer
Visualisation is the key to understanding complex data, and tools that transform raw data into graphical displays are invaluable. The Anubis map viewer was the first genome browser to be operable as a fully-fledged GUI (Graphical User Interface) over the WWW (URL http://www.ri.bbsrc.ac.uk/anubis). It is used as the map viewer for ARKdb databases and the INRA BOVMAP database. We have recently launched a prototype java version of Anubis - Anubis4 (http://www.ri.bbsrc.ac.uk/arkdb/newanubis/).

Future Activities
We are developing systems to handle the data from radiation hybrid, physical (contig) mapping, expression profiling (microarray) and expressed sequence tag (EST) experiments. Exploitation of the wealth of information from the genomes of human and model organisms is critical to farm animal genome research. Therefore, we are exploring ways of improving the links and interoperability with other information systems. Our current tools and resources primarily address the requirements for data storage, retrieval and display. In the future we need to fully integrate analytical tools with the databases and display tools.

The Roslin Bioinformatics Group has grown to eleven, including software developers, programmers and database curators. In the past we have received support from the European Commission and Medical Research Council. The group is currently funded by grants from the UK's Biotechnology and Biological Sciences Research Council.

Links:
http://www.roslin.ac.uk/bioinformatics/

Please contact:
Alan L. Archibald - Roslin Institute
Tel: +44 131 527 4200
E-mail: [email protected]


Modelling Metabolism Knowledge using Objects and Associations by Hélène Rivière-Rolland, Loïc Taloc, Danielle Ziébelin, François Rechenmann and Alain Viari

A knowledge base to represent metabolism data has been developed by the Helix research team at INRIA Rhône-Alpes. This base provides access to information on chemical compounds, biochemical reactions, enzymes, genes and metabolic pathways from fully sequenced micro-organisms. The model has been implemented by using an object/association technology developed at INRIA. Beside its use as a general repository, the base may have applications in metabolic simulations and pathway reconstruction in newly sequenced genomes.

The cellular metabolism can be defined as the set of all biochemical reactions occurring in the cell. It consists of molecular synthesis (anabolism) and degradation (catabolism) necessary for cell growth and division. These reactions drive the energetic processes, the synthesis of structural and catalytic components of the cell and the elimination of cellular wastes.

A fairly large amount of metabolic data is readily available, either in the literature or in public data banks (eg the KEGG project: http://star.scl.genome.ad.jp/kegg) and this information will probably grow in the near future due to the development of new ‘large scale’ experimental technologies like DNA-arrays. Therefore, there is a need to organise this data in a rational and formalised way, ie to model our knowledge of metabolic data. The first goal is of course the storage and recovery of pertinent information. The complexity of this kind of data, and in particular the fact that some information is held in the relationship between the biological entities rather than in the entities themselves, makes their selection and recovery difficult. Moreover, our knowledge in this area is often incomplete (elements are missing or pathways may be totally unknown in a newly sequenced organism). A challenge is therefore to cope with this partial information and to develop databases that could provide some inference mechanisms to assist the discovery process. Finally, another challenge is to link these data to other relevant genomic and biochemical information like protein structure, regulation of gene expression, whole genome organisation (eg syntheny) and evolution.

Graphical representation of a simple metabolic pathway (sphingophospholipid biosynthesis): the nodes represent chemical compounds (eg N-acylsphingosine) and the edges represent the biochemical reactions which transform these compounds. Each edge is labelled by a number (E.C number) which identifies the enzyme that catalyses the reaction. The figure has been automatically generated using the data stored in the database.

Following the pioneering work of P. Karp and M. Riley with the Eco-Cyc system (http://ecocyc.PangeaSystems.com/ecocyc), we attempted to develop a knowledge base of metabolic data. We wanted to experiment with a different representation model in which associations are explicitly represented as entities. To this purpose, we used the AROM system developed at INRIA (http://www.inrialpes.fr/romans/pub/arom). The main originality of AROM is the presence of two complementary entities of representation: classes and associations. As in any object-oriented system, a class represents a set of objects described by slots; but, in AROM, such a slot cannot refer to another object. This connection is done by means of associations, which therefore denote a set of tuples (not necessarily only two) of objects (associations are therefore n-ary). As objects, tuples have their own slots and, as classes, associations can be organised in hierarchies, therefore allowing for usual inheritance and specialisation mechanisms. The explicit representation of n-ary associations turned out to be very useful for representing biological data. For instance, it makes the representation of alternative substrates of a metabolic reaction a much easier task.

After implementing the data model in AROM, we extracted the metabolic data from public sources (mostly KEGG) by using parsers and scripts. Coherence of sequence data between data banks has been checked by using home-made sequence alignment programs and/or Blast. At the present time we are developing several graphical interfaces to this base. One will be devoted to querying the knowledge base. Another interface will be devoted to the automatic graphical representation of pathways, which are complex non-planar directed graphs (see Figure). At the present time the whole system (AROM and the interfaces) is implemented in JAVA and we plan to put it into play through a web applet-server in the near future.

Links:
Action Helix: http://www.inrialpes.fr/helix.html

Please contact:
Alain Viari - INRIA
Tel: +33 4 76 61 54 74
E-mail: [email protected]
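
The AROM API itself is not shown here, but the benefit of explicit n-ary associations can be illustrated with plain Java (all class names and data below are invented): a catalysis tuple links a reaction, an enzyme and one of several alternative substrates, and the tuple carries its own slots, just like a class instance.

  import java.util.*;

  // Plain-Java illustration (not the AROM API) of representing an n-ary
  // association as an explicit entity: a Catalysis tuple links a reaction,
  // an enzyme and one alternative substrate, and carries its own slots.
  public class MetabolismModel {

      static class Compound { final String name; Compound(String n) { name = n; } }
      static class Enzyme   { final String ec;   Enzyme(String ec)  { this.ec = ec; } }  // EC number, dummy value below
      static class Reaction { final String id;   Reaction(String id){ this.id = id; } }

      // The association is itself an object: its tuples can have slots
      // (here a flag marking which substrate is the preferred one).
      static class Catalysis {
          final Reaction reaction; final Enzyme enzyme; final Compound substrate;
          final boolean preferredSubstrate;
          Catalysis(Reaction r, Enzyme e, Compound s, boolean preferred) {
              reaction = r; enzyme = e; substrate = s; preferredSubstrate = preferred;
          }
      }

      public static void main(String[] args) {
          Reaction r = new Reaction("R1");
          Enzyme e = new Enzyme("1.2.3.4");                      // hypothetical EC number
          List<Catalysis> base = Arrays.asList(
              new Catalysis(r, e, new Compound("substrate-A"), true),
              new Catalysis(r, e, new Compound("substrate-B"), false));  // alternative substrate
          base.forEach(c -> System.out.println(
              c.reaction.id + " / EC " + c.enzyme.ec + " / " + c.substrate.name
              + (c.preferredSubstrate ? " (preferred)" : "")));
      }
  }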


Co-operative Environments for Genomes Annotation: from Imagene to Geno-Annot

by Claudine Médigue, Yves Vandenbrouke, François Rechenmann and Alain Viari

‘Imagene’ is a co-operative computer environment for the annotation and analysis of genomic sequences developed in collaboration between INRIA, Université Paris 6, Institut Pasteur and the ILOG company. The first version of this software was dedicated to bacterial chromosomes. Its capabilities are currently being extended to handle both prokaryotic and eukaryotic data and to link pure genomic data to ‘post-genomic’ data, particularly metabolic and gene expression data.

In the context of large-scale genomic sequencing projects the need is growing for integration of specific sequence analysis tools within data management systems. With this aim in view, we have developed the Imagene co-operative computer environment dedicated to automatic sequence annotation and analysis (http://abraxa.snv.jussieu.fr/imagene). In this system, biological knowledge produced in the course of a genome sequencing project (putative genes, regulatory signals, etc), together with the methodological knowledge represented by an extensible set of sequence analysis methods, is uniformly represented in an object oriented model.

Imagene is the result of a five years collaboration between INRIA, Université Paris 6, the Institut Pasteur and the ILOG company. The system has been implemented by using an object oriented model and a co-operative solving engine provided by ILOG. In Imagene, a global problem (task) is solved by successive decompositions into smaller sub-tasks. During the execution, the various sub-tasks are graphically displayed to the user. In that sense, Imagene is more transparent to the user than a traditional menu-driven package for sequence analysis, since all the steps in the resolution are clearly identified. Moreover, once a task has been solved, the user can restart it at any point; the system then keeps track of the different versions of the execution. This allows several hypotheses to be maintained in parallel during the analysis. Imagene also provides a user interface to display, on the same picture, the results produced by one or several strategies (see Figure). Due to the homogeneity of the whole software, this display is fully interactive and the graphical objects are directly connected to their database counterpart.

Imagene view of a fragment of the B. subtilis chromosome: The display superimposes the output of several methods. Red boxes represent putative protein coding regions (genes); the blue boxes represent the result of a data bank similarity scan (here the Blastx program); the yellow curve represents the coding probability as evaluated by using a Markov chain. The translated protein sequence of the currently selected gene is shown in the insert.

Imagene has been used within several bacterial genome sequencing projects (Bacillus subtilis and Mycoplasma pulmonis) and has proved to be particularly useful to pinpoint sequencing errors and atypical genes. However, this first version suffers several drawbacks. First, it was limited to the representation of prokaryotic data only; second, the development tools were commercial, thus giving rise to difficulties in its diffusion; last, it was designed to handle pure sequence data from a single genome. In order to overcome these limitations, we undertook a new project (Geno-Annot) through a collaboration between INRIA, the Institut Pasteur and the Genome-Express biotech company. As a first step, the data model was extended to eukaryotes and completely re-implemented using the AROM system developed at INRIA (http://www.inrialpes.fr/romans/pub/arom). We are now in the process of re-designing the task-engine and the graphical user interfaces in JAVA. Finally, our ultimate goal will be to integrate Geno-Annot within a more general environment (called Geno-*) in order to fully link all the pieces of genomic information together (ie sequence data, metabolism, gene expression etc). Geno-Annot is a two years project that started in September 1999.

Links:
Action Helix: http://www.inrialpes.fr/helix.html
Imagene: http://abraxa.snv.jussieu.fr/imagene

Please contact:
Alain Viari - INRIA
Tel: +33 4 76 61 54 74
E-mail: [email protected]


Arevir: Analysis of HIV Resistance Mutations by Niko Beerenwinkel, Joachim Selbig, Rolf Kaiser and Daniel Hoffmann

To develop tools that assist medical practitioners in finding an optimal therapy for HIV-infected patients - this is the aim of a collaboration funded by Deutsche Forschungsgemeinschaft that has been started this year by researchers at GMD, the University of Cologne, CAESAR, the Center of Advanced European Studies and Research, Bonn, and a number of cooperating university hospitals in Germany.

The Human Immunodeficiency Virus (HIV) causes the Acquired Immunodeficiency Syndrome (AIDS). Currently, there are two types of drugs in the fight against HIV, namely inhibitors of the two viral enzymes protease (PR) and reverse transcriptase (RT).

Since HIV shows a very high genomic variability, even under the usual combination therapy (HAART - highly active antiretroviral therapy) consisting of several drugs, mutations occur that confer resistance to the prescribed drugs and even to drugs not yet prescribed (cross resistance). Therefore, the treating physician is faced with the problem of finding a new therapy rather frequently.

Clinical trials have shown that therapy changes based on a genotypic resistance test, ie sequencing of the viral gene coding for PR and RT and looking for mutations known to cause resistance, result in a significantly better therapy success. However, not all patients benefit from therapy changes after resistance testing. There are several possible reasons for therapy failure in this situation: the occurrence of an HIV-strain resistant to all available antiretroviral drugs, no sufficient drug-level being reached in the patient, or the chosen drug-combination not being able to suppress the virus sufficiently. The latter occurs because the relations between observed mutations, phenotypic resistance and therapy success are poorly understood.

While the PR inhibitors, for example, all bind in the catalytic center of this enzyme, mutations associated with resistance occur at many different locations spread all over the three-dimensional structure of the protein (see Figure 1). The relation between point mutations and drug resistance remains unclear in many cases, not to speak of the interpretation of more complex mutation patterns.

The goal of the Arevir project is to develop bioinformatics methods that help to understand these connections and that contribute directly to therapy optimization. In a first step a database is set up in collaboration with project partners from university hospitals and virological institutes, in which clinical data, sequence data and phenotypic resistance data are collected.

These correlated data are used to learn about the outcome of a therapy as a function of the drugs making up the components of this therapy and the genotype of the two relevant enzymes PR and RT. A successful outcome of a therapy can be measured as a substantial reduction in virus load (ie the number of virus particles measured in the patients' blood plasma; see Figure 2). We can formulate a classification problem on the set of all pairs consisting of the therapy's drug components and the amino acid sequence of PR and RT, assigning to each such pair either the class ‘successful’ or ‘unsuccessful’.

Figure 1: Ribbon representation of the HIV protease homodimer (blue and green) complexed with an inhibitor (red); some resistance associated mutations (codon positions 10, 20, 36, 46, 48, 50, 54, 63, 71, 82, 84 and 90) are indicated in the ball-and-stick mode.

Figure 2: Virus load and therapies of a patient in a time period of more than 100 weeks (the measured data points have been interpolated, values under the limit of detection have been normalized to this limit (400 copies/ml), drugs are encoded in a three letter code).

ERCIM News No. 43, October 2000 23 SPECIAL THEME: BIOINFORMATICS

We use decision tree building, a well known machine learning technique, to derive a model that can make an educated guess on the therapy outcome. The treating physician faced with the problem of finding a new therapy can consult the decision tree with a therapy proposal, provided a genotypic test result is available. In this way the automated interpretation of sequence data can directly improve patient care. Furthermore, the learned rules derived from the decision tree can be interpreted by virologists and clinicians, and help to better understand and manage drug resistance.

In future work the mechanisms of drug resistance will be studied at the molecular level. To this end we will carry out force field based calculations on enzyme-inhibitor complexes.

While traditionally bioinformatics methods are used in the course of developing new drugs, we develop and apply methods that can directly improve patient care by contributing to therapy optimization. Thus, in the treatment of HIV infected patients, techniques of interpreting genotypic data are of immediate relevance to therapy decisions.

Links:
Bioinformatics at GMD SCAI: http://www.gmd.de/SCAI

Please contact:
Niko Beerenwinkel - GMD
Tel: +49 2241 14 2786
E-mail: [email protected]
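
The trees actually used are learned from the Arevir data; the hand-written toy below (invented rules, with drug and mutation codes chosen only for illustration) merely shows the form of such a consultable model, mapping a (therapy proposal, genotype) pair to a predicted outcome class.

  import java.util.Arrays;
  import java.util.HashSet;
  import java.util.Set;

  // Toy decision tree for illustration only: internal nodes test one feature of
  // the (therapy proposal, genotype) pair; leaves predict the outcome class.
  public class TherapyTree {

      interface Node { String classify(Set<String> drugs, Set<String> mutations); }

      static Node leaf(String outcome) { return (d, m) -> outcome; }

      static Node test(String feature, boolean isDrug, Node ifPresent, Node ifAbsent) {
          return (d, m) -> (isDrug ? d : m).contains(feature)
                  ? ifPresent.classify(d, m) : ifAbsent.classify(d, m);
      }

      public static void main(String[] args) {
          // "If the proposal contains drug IDV and the genotype carries protease
          //  mutation PR90M, predict failure; otherwise success" (purely illustrative).
          Node tree = test("IDV", true,
                  test("PR90M", false, leaf("unsuccessful"), leaf("successful")),
                  leaf("successful"));

          Set<String> proposal = new HashSet<>(Arrays.asList("IDV", "AZT", "3TC"));
          Set<String> genotype = new HashSet<>(Arrays.asList("PR90M", "RT184V"));
          System.out.println(tree.classify(proposal, genotype));   // unsuccessful
      }
  }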

Human Brain Informatics Ð Understanding Causes of Mental Illness

by Stefan Arnborg, Ingrid Agartz, Mikael Nordström, Håkan Hall and Göran Sedvall

The Human Brain Informatics (HUBIN) project aims at developing and using methodology for cross-domain investigations of mental illness, particularly schizophrenia. A comprehensive data base with standardized information on individual patients and healthy control persons is being built and is analyzed using modern data mining and statistics technology.

Mental disease is a complex phenomenon whose causes are not known, and only symptomatic therapy exists today. Moreover, mental disease affects maybe one third of all humans at least once during their life time. Approximately 1% of all humans suffer from schizophrenia. The more severe cases lead to tragic life-long disability, and the annual costs can be measured in many billions of Euros even in moderate size countries like Sweden. On the other hand, some of our greatest artists and scientists suffered from schizophrenia. Large research projects have mapped mental diseases so that their typical manifestations are well known separately in several specialized medical domains such as genetics, physiology, psychiatry, neurology and brain morphology. But little is known about the relationships between the different domains.

The Project
The HUBIN project started in 1998, and is carried out at the Clinical Neurosciences Department of the Karolinska Institute, with participation by the Swedish Institute of Computer Science (SICS), IBM and several medical schools: Lund and Umeå in Sweden, Cardiff in the UK, and Iowa City, USA. It is financed by a grant from the Swedish Wallenberg foundation, with supplementary funding from SICS, NUTEK, Teknikbrostiftelsen and from some private foundations.

Project Goals
The aim of the project is, in the long term, to find causes and effective therapies for mental illness. The short term aims of the project are to develop methodology in intra-domain and cross-domain investigations of schizophrenia by building a comprehensive data base with information about individual patients and control persons and using modern statistical and data mining technology.

Possible Causes of Schizophrenia
Schizophrenia is believed to develop as a result of disturbed signalling within the human brain. Causes of these disturbances can be anomalous patterns of connections between neurons of the functional regions, loss or anomalous distribution of neurons, or disturbances in the biochemical signalling complex. It is known that it is significantly influenced by genetic factors, although in a complex way which has not yet been finally attributed to individual genes. It is also related to disturbances in early (pre-natal) brain development, in ways not yet known but statistically confirmed by investigations of birth and maternity journals. The shape of the brain is also affected, but it is difficult to trace the complex interactions between disease, medication and brain morphology.

Complex Sets of Medical Information
Modern medical investigations create enormous and complex data sets, whose analysis and cross-analysis poses significant statistical and computing challenges. Some of the data used in HUBIN are:
• Genetic micro-array technology can measure the activity of many thousand genes from a single measurement on a microscopic (post mortem) brain sample. These data are noisy and extremely high-dimensional, so standard regression methods fail miserably. Support vector and mixture modelling techniques are being investigated. It is necessary to combine the hard data with subjective information in the form of hypotheses on the role of different genes. Heredity investigations based on disease manifestation and marker maps of relatives also yield large amounts of data with typically (for multiple gene hunting) extremely weak statistical signals.
• Medical imaging methods can give precise estimates of the in vivo anatomy of the brain and the large individual variations in shape and size of white (axon, glia), gray (neurons) and wet (CSF in ventricles and outside the brain) matter in a large number of anatomical regions of the brain.


Tissue ratios in different brain regions: A multidimensional view of measured anatomical volumes in diseased and controls reveals a characteristic but irregular increase of ventricle sizes (blue dots) for schizophrenia patients. The data were obtained from MR scans performed in the brain morphology project, using volumetry methods developed at University of Iowa. Measurements were obtained by Gaku Okugawa.

Dopamine-D1 receptors in the human brain post-mortem visualized using whole hemisphere autoradiography and [3H]NNC-112. Image: Håkan Hall, Karolinska Institutet.

• Diffusion tomography gives approximate measures of the diffusion tensor in small cubes (ca 3 mm side), which indicates the number and direction of axons that define the long-range signalling connections of the brain. This tensor can thus be used to get approximate measures of the signalling connectivity of the brain.
• Functional MRI measures the metabolism (blood oxygenation) that approximates neural activity with high resolution. These investigations give extremely weak signals, and for inference it is usually necessary to pool several investigations.

Post-mortem whole brain investigations can give extremely high resolution maps of the biochemical signalling system of the brain, and of gene activity in the brain. More than 50% of the active human genome is believed to be related to brain development, and very little is known about the mechanisms involved.

The patients' mental state is measured by a psychiatrist using standardized questionnaires. It is vital that the subjective information entered is standardized and quality assured. Obviously, it is a difficult problem to get high-quality answers to 500 questions from a patient, so it is critical to balance the number of questions asked to patients.

As a first sifting of this large information set, summary indices are computed and entered in a relational data base, where standard data mining and statistical visualization techniques give a first set of promising lines of investigation. Information regarding the mental state of individuals is extremely sensitive and its collection and use is regulated by ethics councils of participating universities and hospitals. Identities of patients and controls are not stored in the data base, but it must still be possible to correlate individuals across domains. This is accomplished with cryptographic methods.

Conclusions
The current activities are aimed at showing the feasibility of our interdisciplinary approach when coupled with standardized recording of clinical information.

Presently, the main efforts go into collection of standardized clinical measurements and evaluation of statistical and visual analysis methods in image and genetic information analysis. Current activities aim at relating brain morphology (size, orientation and shape of various regions and tissue types of the brain) to physiological and psychiatric conditions and at investigating relationships between the different domains.

Needless to say, the current project is only one, and as yet a small one, of many projects worldwide aimed at improving conditions for persons affected by mental illness. In order to connect and improve communications between these many groups, our efforts also involve intelligent text mining for rapidly scanning the literature.

We are building a web site, http://hubin.org/about/index_en.html, for communication between researchers worldwide and between medical experts and relatives of affected persons on a national (native language) basis.

Links:
HUBIN: http://hubin.org/about/index_en.html

Please contact:
Stefan Arnborg - Nada, KTH
Tel: +46 8 790 71 94
E-mail: [email protected]
http://www.nada.kth.se/~stefan/
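
The article does not say which cryptographic method is used to correlate individuals across domains; one minimal sketch of the idea is a keyed hash that turns an identity into the same opaque pseudonym in every domain database, so records can be joined on the pseudonym while the identity itself is never stored (standard Java HMAC, hypothetical identifiers):

  import javax.crypto.Mac;
  import javax.crypto.spec.SecretKeySpec;
  import java.nio.charset.StandardCharsets;
  import java.util.Base64;

  // Illustrative sketch only (not the HUBIN project's actual scheme): a keyed
  // hash maps a personal identifier to a stable pseudonym, so databases from
  // different domains can be joined on the pseudonym without storing identities.
  public class Pseudonymizer {

      private final SecretKeySpec key;

      Pseudonymizer(byte[] secret) {
          key = new SecretKeySpec(secret, "HmacSHA256");
      }

      String pseudonym(String identity) {
          try {
              Mac mac = Mac.getInstance("HmacSHA256");
              mac.init(key);
              byte[] tag = mac.doFinal(identity.getBytes(StandardCharsets.UTF_8));
              return Base64.getEncoder().encodeToString(tag);
          } catch (Exception e) {                    // HmacSHA256 is part of the JDK
              throw new IllegalStateException(e);
          }
      }

      public static void main(String[] args) {
          Pseudonymizer p = new Pseudonymizer("keep-this-key-offline".getBytes(StandardCharsets.UTF_8));
          // The same (hypothetical) person yields the same code in every domain database.
          System.out.println(p.pseudonym("19620101-1234"));
          System.out.println(p.pseudonym("19620101-1234"));
      }
  }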


Intelligent Post-Genomics

by Francisco Azuaje

The Intelligent Post-Genomics initiative aims to support a number of knowledge discovery tasks with crucial applications in medicine and biology, such as diagnostic systems, therapy design and organism modelling. This research is mainly focused on the development of a new generation of genomic data interpretation systems based on Artificial Intelligence and data mining techniques.

The union of theoretical sciences such as mathematics, physics and computer science with biology is allowing scientists to model, predict and understand crucial mechanisms of life and the cure of diseases. The success of this scientific synergy will depend not only on the application of advanced information processing methods, but also on the development of a new multidisciplinary language. Researchers from the Artificial Intelligence Group and Centre, together with the Departments of Microbiology, Biochemistry and Genetics of the University of Dublin, are convinced that this joint action will yield significant benefit in the understanding of key biological problems. The Figure illustrates the main aspects addressed in this project as well as some of its possible applications.

Intelligent Post-Genomics: research issues and applications.

The need for higher levels of reliability, emphasising at the same time theoretical frameworks to perform complex inferences about phenomena, makes Artificial Intelligence (AI) particularly attractive for the development of the post-genomic era. The incorporation of techniques from AI and related research fields may change the way in which biological experiments and medical diagnoses are implemented. For example, the analysis of gene expression patterns allows us to combine advanced data mining approaches in order to relate genotype and phenotype.

Within the past few years, technologies have emerged to record the expression pattern of not one, but tens of thousands of genes in parallel. Recognising the opportunities that these technologies provide, the research groups mentioned above are launching a long-term initiative to apply AI, data mining and visualisation methods to important biological processes and diseases in order to interpret their molecular patterns. Initial efforts have already allowed us to explore and confirm the potential of these technologies for the development of tumour classification systems, detection of new classes of cancer and the discovery of associations between expression patterns.

Thus, one of our major goals is the automation of the genome expression interpretation process. This task should provide user-friendly, effective and efficient tools capable of organising complex volumes of expression data. It also aims to allow users to discover associations between apparently unrelated classes or expression patterns. This may represent not only a powerful approach to understand genetic mechanisms in the development of a specific disease, but also to support the search for fundamental processes that differentiate multiple types of diseases. A number of intelligent hybrid frameworks based on neural networks, fuzzy systems and evolutionary computation have been shown to be both effective and efficient for the achievement of these decision support systems. Furthermore, this may support the identification of drug targets by making inferences about functions that are associated to sequences and genome patterns.

Other crucial research goals involve the combination of gene expression data with other sources of molecular data (such as drug activity patterns) and morphological features data. Similarly, we recognise the need to develop approaches to filter, interconnect and organise dispersed sources of genomic data.

Collaboration links with other European research institutions, such as the German Cancer Research Centre (Intelligent Bioinformatics Systems Group) and the Max Planck Institute for Astrophysics (Molecular Physics Group), are currently being developed as part of our efforts to develop advanced data interpretation technologies for life sciences.

Links:
A collection of representative links to genomics and bioinformatics resources - Genomes & Machines: http://www.cs.tcd.ie/Francisco.Azuaje/genomes&machines.html

Please contact:
Francisco Azuaje - Trinity College Dublin
Tel: +353 1 608 2459
E-mail: [email protected]


Combinatorial Algorithms in Computational Biology by Marie-France Sagot

It is almost a ‘cliché’ to say that the sequencing at an ever-increasing speed of whole genomes, including that of man, is generating a huge amount of information for which the use of computers becomes essential. This is true, but it is just a small part of the truth if one means by that the computer's capacity to physically store and quickly sift through big quantities of data. Computer science may strongly influence at least some areas of molecular biology in other, much deeper ways.

A lot of the problems such areas have to address, in the new post-sequencing era but not only, require exploring the inner structure of objects, trying to discover regularities that may lead to rules, building general models which attempt to weave and exploit relations among objects and, finally, obtaining the means to represent, test, question and revise the elaborated models and theories. These are all activities that are at the essence of most currently elaborated computer algorithms for analysing, often in quite subtle ways, biological data.

Among the algorithmical approaches possible, some try to exhaustively explore all the ways a molecular process could happen given certain (partial) beliefs we have about it. This leads to hard combinatorial problems for which efficient algorithms are required. Such algorithms must make use of complex data representations and techniques.

For instance, quite a few processes imply the recognition by the cell machinery of a number of sites along a DNA sequence. The sites represent fragments of the DNA which other macromolecular complexes must be able to recognize and bind to in order for a given biological event to happen. Such an event could be the transcription of DNA into a so-called messenger RNA which will then be partially translated into a protein. Because their function is important, sites tend to be conserved (though the conservation is not strict) while the surrounding DNA suffers random changes. Other possible sequential and/or spatial characteristics of sites having an equivalent function in the DNA of an organism may exist but are often unknown. Hypotheses must then be elaborated on how recognition works. For example, one probable scenario is that sites are cooperative as suggested in Figure 1. Each one is not recognized alone but in conjunction with others at close intervals in space and time. A model based upon such a hypothesis would try to infer not one site (corresponding to inferring the presence of a conserved word at a number of places along a string representing a DNA sequence), but various sites simultaneously, each one located at an often precise (but unknown) distance from its immediate neighbours (a problem one may see as equivalent to identifying a maximal common sequence of conserved and spatially constrained motifs). The model may be further sophisticated as the analysis progresses. What is important to observe is that the revision of a model may be effected only if the algorithm based upon it is exhaustive.

A much simplified illustration of the possible cooperativeness of sites implicated in the process of transcription from DNA to messenger RNA. Objects in orange, green and pink represent protein or RNA-protein complexes. The sites on the DNA where they are supposed to bind are depicted as clouds of black points. Promoter sequences are recognized by the pink objects. Each is a protein called RNA polymerase. Enhancers may reinforce or weaken the binding strength of a promoter. The example sketched in this figure concerns a eukaryotic organism. The figure was taken from URL: http://linkage.rockefeller.edu/wli/gene/right.html

Exact algorithms may therefore be powerful exploratory tools even though they greatly simplify biological processes (all approaches, including experimental ones, do). In particular, exact algorithms allow one to ‘say something’, to derive some conclusions even from what is not identified, because we know that, modulo some parameters they may take as input, they exhaustively enumerate all objects they were built to find. Whether they indeed find something (for instance, a site initially thought to exist with a certain characteristic) or not, we may thus, by understanding their inner workings, emit biological hypotheses on ‘how things function’, or do not function. This is possible also with non exact approaches such as some of the statistical ones, but is often more difficult to realize.

A first algorithm using a model which allows one to simultaneously infer cooperative sites in a DNA sequence has been elaborated (in C) by the algorithmics group at the Pasteur Institute (composed of one researcher and PhD students from the Gaspard Monge Institute).


In collaboration with biology researchers from the Biology and Physico-Chemistry and the Pasteur Institutes in Paris, this algorithm has started to be applied to the analysis of the sequenced genomes of three bacteria: Escherichia coli, Bacillus subtilis and Helicobacter pylori. The objective was to identify the main sequence elements involved in the process of transcription of DNA into messenger RNA (what is called the promoter sequences). For various reasons, it was suspected that such elements would be different in Helicobacter pylori as compared to the other two bacteria. A blind, exhaustive inference of single conserved words in the non coding regions of the genome of Helicobacter pylori permitted the identification of some elements, but not to put together with enough certainty those implicated in the transcriptional process. The development of more complex models was called upon and, indeed, allowed to reveal that the sequence elements are not the same in Helicobacter pylori. It has also enabled us to suggest a consensus for what the main so-called promoter elements should be in the bacterium. It was later verified that the consensus proposed strongly resembles one experimentally determined for a bacterium which is closely related in terms of evolution to Helicobacter pylori.

This particular algorithm is being continuously ameliorated, expanded and extended to take into account the increasingly more sophisticated models which further analyses of other genomes and processes reveal are necessary to get closer to the way recognition happens in diverse situations. Other combinatorial algorithms, addressing a variety of subjects - molecular structure matching, comparison and inference; regularities detection; gene finding; identification of recombination points; genome comparison based upon rearrangement distances - are currently being studied within the Pasteur Institute algorithmics group, often in collaboration with outside PhD students and researchers.

It is worth observing that many computer scientists come to biology seduced by the huge range of nice problems that it offers to them without necessarily wanting to get implicated in the intricacies of biology itself. Although the two, computer science and biology, remain quite distinct fields of research, it appears difficult to gain maturity and any real insight into the domain without getting further involved in biology. This is particularly, but not exclusively, true if one adopts a combinatorial approach, as the operations of building models that try to adhere to the way biological processes happen, and algorithms for exploring and testing such models, become intimately linked in a quasi-symbiotic relationship. It would, anyway, be a pity not to get so involved, as the problems become much more interesting, even from a purely computational and combinatorial point of view, when they are not abstracted from what gave origin to them: a biological question. The case of identifying sites is just one example among many.

Links:
Papers by the group on this and other combinatorial problems: http://www-igm.univ-mlv.fr/~sagot
Series of seminars on this and other topics: http://www.pasteur.fr/infosci/conf/AlgoBio

Please contact:
Marie-France Sagot - Institut Pasteur
Tel: +33 1 40 61 34 61
E-mail: [email protected]
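
The Pasteur algorithm handles structured, cooperative sites; the sketch below only illustrates the simpler exhaustive single-word inference mentioned above (naive Java, invented example sequences): every word of a given length is tested for approximate occurrence, up to e mismatches, in all input sequences, so nothing satisfying the model can be missed.

  import java.util.*;

  // Naive sketch of exhaustive single-word inference (not the Pasteur group's
  // algorithm): report every length-k word of the first sequence that occurs,
  // with at most e mismatches, in all sequences.
  public class ConservedWords {

      static boolean occurs(String word, String seq, int e) {
          for (int i = 0; i + word.length() <= seq.length(); i++) {
              int mismatches = 0;
              for (int j = 0; j < word.length() && mismatches <= e; j++)
                  if (word.charAt(j) != seq.charAt(i + j)) mismatches++;
              if (mismatches <= e) return true;
          }
          return false;
      }

      static Set<String> conserved(List<String> seqs, int k, int e) {
          Set<String> result = new LinkedHashSet<>();
          String first = seqs.get(0);
          for (int i = 0; i + k <= first.length(); i++) {
              String word = first.substring(i, i + k);
              if (seqs.stream().allMatch(s -> occurs(word, s, e))) result.add(word);
          }
          return result;
      }

      public static void main(String[] args) {
          // Hypothetical non-coding upstream regions.
          List<String> upstream = Arrays.asList(
              "TTGACAAGGTATAAT", "CTTGACTTTTATACT", "GTTTACAATGTATAATG");
          System.out.println(conserved(upstream, 6, 1));
      }
  }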

Crossroads of Mathematics, Informatics and Life Sciences

by Jan Verwer and Annette Kik

The 20th century has been a period of dramatic technological and scientific breakthroughs and discoveries. A general feeling amongst scientists in all areas is that developments in science and technology will go on, perhaps most notably in the life and computer sciences, as testified by the current rapid progress in bio- and information technology. New subdisciplines have been created such as computational biology and bioinformatics, which are at the crossroads of biology, physics, chemistry, informatics and mathematics. CWI is active in a number of projects with life science applications while other projects are forthcoming.

One of our current projects is ‘Macro-Molecular Crowding in Cell Biology’, in co-operation with several cell biologists. Living cells are relatively densely packed with macromolecules (proteins), implying that diffusion of larger molecules can be much slower than diffusion of smaller ones. This crowding effect is important for the uptake of molecules entering a cell (glucose). The influence of this crowding effect is as yet largely unknown. The aim is to study metabolic control in pathways by means of numerical and analytical studies of systems of reaction-diffusion equations which take into account effects of molecular crowding.

Another project is ‘Analysis of Biological Structures by Virtual Reality Techniques’. This aims at exploring the potential of virtual reality visualization techniques in order to obtain detailed insight into complex biological structures. Such structures include proteins and cellular structures, as well as organs and complete organisms. Visualization of biological structures is a key step in understanding normal and pathological biological processes and systems, such as biocatalysis, transduction, gene expression and embryonal development.

Our forthcoming projects have different applications. CWI will study models for the development of phytoplankton blooms. Phytoplankton lies at the beginning of the food chain in oceans and takes up large amounts of carbon dioxide from the atmosphere. Hence there is a clear link with climate research. The complicated models require efficient numerical solution techniques.

28 ERCIM News No. 43, October 2000 SPECIAL THEME: BIOINFORMATICS

A three-dimensional visualization of a cell laminea (green) and chromosomes (red). The panel on the bottom is part of interactive visualization and measuring software being developed by Wim de Leeuw at CWI (Data provided by Roel van Driel, SILS Amsterdam).

clear link with climate research. The A hot question in neuroscience is how the Result of a simulation of axon development complicated models require efficient hand-shaped ends of axons growing out in the presence of three diffusible fields. numerical solution techniques. of neurons explore the territory they Courtesy A. van Ooyen, Netherlands Institute for Brain Research. traverse on their way to a target tissue. The project ‘Distributed Visualization of This growth process is partly guided by Biological Structures’ aims at the concentration gradients of biochemical development of an experimental platform molecules in the extracellular space, to study advanced image processing arising from diffusion and various techniques for cell structures and chemical interactions. This leads to problems. In biology the complexity is very interaction schemes for remote steering models consisting of parabolic equations large, so that there will be a need to improve of a laboratory apparatus, in the context coupled with gradient-type equations for or redesign existing optimization methods of high speed networks. Interactive the positions of the axons. We will study or even come up with totally new ideas. distributed visualization coupled with these models and their numerical solution virtual environments allows for and implement efficient numerical Finally, CWI plans to do research in the collaborative analysis, interrogation of algorithms in software that is user- field of Control and System Theory for large repositories of biological structures friendly for neuroscientists, as a joint the Life Sciences. The aim of this research and distance learning. activity with the Netherlands Institute for is to gain understanding of the feedback Brain Research. control in cells, plants, living beings, and Since about a decade, scientists realise in ecological communities. This that many biological and physical systems Furthermore, there is a whole range of understanding may be used to develop seem to drive themselves, without precise problems linking to Combinatorial treatments for diseases, and for ecological external tuning, towards a state which Optimization. For instance, the primary communities. resembles a typical equilibrium statistical access to the functionality of molecules Links: mechanics system at the critical point. is through laboratory experiments. http://www.cwi.nl/~janv Examples are certain types of epidemics, However, experimental data typically forest fires, and processes in the brain. provide only partial information about Please contact: Most studies of this this so-called Self- structures and properties under Jan Verwer Ð CWI Organized Critical behaviour are of a very investigation. This prompts the problem of Tel: +31 20 592 4095 E-mail: [email protected] heuristic nature, whereas we will use a recovering the actual molecular, chemical Mark Peletier Ð CWI more rigorous mathematical approach. properties and structures from the Tel: +31 20 592 4226 incomplete data. This often leads to so- E-mail [email protected] called combinatorial optimization

ERCIM News No. 43, October 2000 29 SPECIAL THEME: BIOCOMPUTING

Biomolecular Computing

by John McCaskill

Biomolecular computing, The idea that molecular systems can solving the problem of an exponential ‘computations performed by perform computations is not new and was growth in the number of alternative biomolecules’, is challenging indeed more natural in the pre-transistor solutions indefinitely. In a new field, one traditional approaches to age. Most computer scientists know of starts with the simplest algorithms and computation both theoretically von Neumann’s discussions of self- proceeds from there: as a number of and technologically. Often placed reproducing automata in the late 1940s, contributions and patents have shown, within the wider context of some of which were framed in molecular DNA Computing is not limited to simple ‘natural’ or even ‘unconventional’ terms. Here the basic issue was that of algorithms or even, as we argue here, to computing, the study of natural bootstrapping: can a machine construct a a fixed hardware configuration. and artificial molecular machine more complex than itself? computations is adding to our After 1994, universal computation and understanding both of biology Important was the idea, appearing less complexity results for DNA Computing and computer science well natural in the current age of dichotomy rapidly ensued (recent examples of beyond the framework of between hardware and software, that the ongoing projects here are reported in this neuroscience. The papers in this computations of a device can alter the collection by Rozenberg, and Csuhaj- special theme document only a device itself. This vision is natural at the Varju). The laboratory procedures for part of an increasing involvement scale of molecular reactions, although it manipulating populations of DNA were of Europe in this far reaching may appear utopic to those running huge formalized and new sets of primitive undertaking. In this introduction, chip production facilities. Alan Turing operations proposed: the connection with I wish to outline the current also looked beyond purely symbolic recombination and so called splicing scope of the field and assemble processing to natural bootstrapping systems was particularly interesting as it some basic arguments that mechanisms in his work on self- strengthened the view of evolution as a biomolecular computation is of structuring in molecular and biological computational process. Essentially, three central importance to both systems. Purely chemical computers have classes of DNA Computing are now computer science and biology. been proposed by Ross and Hjelmfelt apparent: intramolecular, intermolecular Readers will also find arguments extending this approach. In biology, the and supramolecular. Cutting across this for not dismissing DNA idea of molecular information processing classification, DNA Computing Computing as limited to took hold starting from the unraveling of approaches can be distinguished as either exhaustive search and for a the genetic code and translation homogeneous (ie well stirred) or spatially qualitatively distinctive machinery and extended to genetic structured (including multi-compartment advantage over all other types of regulation, cellular signaling, protein or membrane systems, cellular DNA computation including quantum trafficking, morphogenesis and evolution computing and dataflow like architectures computing. - all of this independently of the using microstructured flow systems) and development in the neurosciences. 
For as either in vitro (purely chemical) or in example, because of the fundamental role vivo (ie inside cellular life forms). of information processing in evolution, Approaches differ in the level of and the ability to address these issues on programmability, automation, generality laboratory time scales at the molecular and parallelism (eg SIMD vs MIMD) and level, I founded the first multi-disciplinary whether the emphasis is on achieving new Department of Molecular Information basic operations, new architectures, error Processing in 1992. In 1994 came tolerance, evolvability or . The Adleman’s key experiment demonstrating Japanese Project lead by Hagiya focuses that the tools of laboratory molecular on intramolecular DNA Computing, biology could be used to program constructing programmable state computations with DNA in vitro. The machines in single DNA molecules which huge information storage capacity of operate by means of intramolecular DNA and the low energy dissipation of conformational transitions. Intermolecular DNA processing lead to an explosion of DNA Computing, of which Adleman's interest in massively parallel DNA experiment is an example, is still the Computing. For serious proponents of the dominant form, focusing on the field however, there really never was a hybridization between different DNA question of brute search with DNA molecules as a basic step of computations

30 ERCIM News No. 43, October 2000 SPECIAL THEME: BIOCOMPUTING

Articles in this section: Biomolecular Computing

30 Introduction by John McCaskill

32 The European Molecular Computing Consortium and this is common to the three projects such as a glass of water. It involves an by Grzegorz Rozenberg reported here having an experimental evolvable platform for computation in (McCaskill, Rozenberg and which the computer construction 33 Configurable DNA Computing Amos). Beyond Europe, the group of machinery itself is embedded. Embedded by John McCaskill Wisconsin are prominent in exploiting a computing is possible without electrical 35 Molecular Computing Research surface based approach to intermolecular power in microscopic, error prone and at Leiden Center for Natural DNA Computing using DNA Chips. real time environments, using Computing Finally, supramolecular DNA mechanisms and technology compatible by Grzegorz Rozenberg Computing, as pioneered by Eric Winfree, with our own make up. Because DNA 36 Cellular Computing harnesses the process of self-assembly of Computing is linked to molecular by Martyn Amos rigid DNA molecules with different construction, the computations may and Gerald G. Owenson sequences to perform computations. The eventually also be employed to build 37 Research in Theoretical connection with nanomachines and three dimensional self-organizing Foundations of DNA Computing nanosystems is then clear and will partially electronic or more remotely even by Erzsébet Csuhaj-Varjú become more pervasive in the near future. quantum computers. Moreover, DNA and György Vaszil Computing opens computers to a wealth Neural Networks In my view, DNA Computation is of applications in intelligent exciting and should be more substantially manufacturing systems, complex 38 Representing Structured funded in Europe for the following molecular diagnostics and molecular Symbolic Data with reasons: process control. Self-organizing Maps by Igor Farkas

¥ it opens the possibility of a The papers in this section primarily deal 39 Neurobiology keeps Inspiring simultaneous bootstrapping solution of with Biomolecular Computing. The first New Neural Network Models future computer design, construction contribution outlines the European by Lubica Benuskova and efficient computation initiative in coordinating Molecular ¥ it provides programmable access to Computing (EMCC). Three groups nanosystems and the world of present their multidisciplinary projects molecular biology, extending the reach involving joint theoretical and of computation experimental work. Two papers are ¥ it admits complex, efficient and devoted to extending the range of formal universal algorithms running on models of computation. The collection dynamically constructed dedicated concludes with a small sampler from the molecular hardware more established approach to biologically ¥ it can contribute to our understanding inspired computation using neural of information flow in evolution and network models. It is interesting that one biological construction of these contributions addresses the ¥ it is opening up new formal models of application of neural modelling to computation, extending our symbolic information processing. understanding of the limits of However, the extent to which computation. informational biomolecules play a specific role in long term memory and the The difference with structuring of the brain, uniting neural is dramatic. Quantum Computing and molecular computation, still awaits involves high physical technology for the clarification. isolation of mixed quantum states Please contact: necessary to implement (if this is scalable) John McCaskill Ð GMD efficient computations solving Tel: +49 2241 14 1526 combinatorially complex problems such E-mail: [email protected] as factorization. DNA Computing operates in natural noisy environments,

ERCIM News No. 43, October 2000 31 SPECIAL THEME: BIOCOMPUTING

The European Molecular Computing Consortium

by Grzegorz Rozenberg

The rapidly expanding research on DNA Computing beginning of this research area. In 1998 a number of in the US and in Japan has already been channelled research groups in Europe took the initiative to create there into national projects with very substantial the European Molecular Computing Consortium financial support. The growth of DNA Computing in (EMCC) officially established in July 1998 during the Europe has been somewhat slower, although a DNA Computing Days organized by the Leiden Center number of European researchers have participated in for Natural Computing in Leiden. the development of DNA Computing from the very

The EMCC is a scientific organisation of researchers in DNA Computing and is composed of national groups from 11 European countries. The EMCC activities are coordinated by the EMCC Board consisting of: Grzegorz Rozenberg (The Netherlands) - director, Martyn Amos (United Kingdom) - deputy director, Giancarlo Mauri (Italy) - secretary, and Marloes Boon-van der Nat (The Netherlands) - administrative secretary. The purpose of the EMCC is best expressed in its official document ‘Aims and Visions’ which now follows.

Molecular computing is a novel and exciting development at the interface of Computer Science and Molecular Biology. Computation using DNA or proteins, for example, has the potential for massive parallelism, allowing trillions of operations per second. Such a parallel Graphic: Danny van Noort, GMD molecular computer will have huge Countries currently participating in the European Molecular Computing Consortium. implications, both for theoreticians and practitioners. Traditional definitions of computation are being re-defined in the light of recent theoretical and disciplinary co-operation between field of molecular computing to thrive in experimental developments. Although Computer Science, Molecular Biology, Europe. thriving, the field of molecular computing and other relevant scientific areas. The is still at an early stage in its development, EMCC will organize various research- The EMCC will also strive to establish and a huge and concerted effort is enhancing activities such as conferences, and maintain fruitful cooperation with required to assess and exploit its real workshops, schools, and mutual visits that researchers in the area of molecular potential. will provide forums for the exchange of computing from outside Europe, and in results, and for establishing or particular with the project ‘Consortium The European Molecular Computing strengthening existing co-operations in for Biomolecular Computing’ in the US Consortium has been created in order to the field of molecular computing. The and with the Japanese ‘Molecular co-ordinate, foster and expand research EMCC will also actively seek to promote Computer Project’. in this exciting new field, especially in the field, both within the scientific Links: Europe. The EMCC is the result of community and to the public at large, via EMCC web page: discussions between different research scientific publications, seminars, http://www.csc.liv.ac.uk/~emcc/ groups in nine different European workshops and public lectures. All countries. A key function of the participating sites will make the utmost Please contact: consortium is to foster co-operation effort to develop their theoretical and Grzegorz Rozenberg Ð Leiden Center for Natural Computing, Leiden University between scientific, technological and laboratory resources. It is hoped that all Tel: +31 71 5277061/67 industrial partners. A particular effort will of these combined efforts will allow the E-mail: [email protected] be made to create genuinely multi-

32 ERCIM News No. 43, October 2000 SPECIAL THEME: BIOCOMPUTING

Configurable DNA Computing by John McCaskill

The DNA Computing Project carried out at the GMD engineers are working together to produce both a Biomip Institute, aims at making molecular systems technological platform and theoretical framework for more programmable. Computer scientists, chemists, feasible and evolvable molecular computation. molecular biologists, physicists and microsystem

Although the massive parallelism of DNA where an exchange of DNA populations The GMD DNA Computing Project is in solution is impressive (more than 1020 is possible. Rapid hardware redesign multidisciplinary in scope, aiming at of active memory per liter) and the opens the door to evolving computer making molecular systems more energy consumption is very low, the systems, so that configurable DNA programmable. Computer scientists, ultimate attraction of DNA-Computers is Computing also aims at harnessing chemists, molecular biologists, physicists their potential to design new hardware evolution for design and problem solving. and microsystem engineers are working solutions to problems. Unlike Because of the huge information storage together to produce both a technological conventional computers, DNA computers potential of aqueous solutions containing platform and theoretical framework for can construct new hardware during DNA, comparatively low flow rates an effective use of molecular operation. Thus, the closest point of suffice for massively parallel processing computation. The initial barrier is the contact to electronic computing involves so that synthetic DNA can be treated as issue of complexity: just how scalable and hardware design, in particular an affordable, easily degradable resource. programmable are the basic hybridisation reconfigurable hardware design, rather Sequence complexity on the other hand processes underlying DNA Computing? than conventional parallel algorithms or is expensive to purchase but is generated The group has devised an optically languages. Molecular computers can be within the DNA Computer, starting from programmable, scalable concept for DNA constructed reversibly in flow systems, simple sequence modules. Computing in microflow reactors,

Figure 2: Medium Scaleup Programmable Microflow System for Benchmark Problem: Portion of the two layer mask design structure for integrating many selection modules (see Fig. 1) to solve a combinatorial optimization problem - in this case maximal Figure 1: Massively parallel subset selection module for DNA Computing: The three images . The structures are etched into a show a schematic, hydrodynamic simulation and fluorescence image of a microstructured silicon substrate on the top and bottom sides selection module made at the GMD. The concept is that a subset of a mixed population of (green and red), with through connections at DNA flowing through the left hand side of the reactor binds to complementary DNA sequence the sites of squares. The microflow reactor is labels attached to magnetic beads. When the beads are transferred (synchronously for all sealed at top and bottom with anodically such modules) to the right hand side, they enter a denaturing solution which causes the bound bonded pyrex wafers and fitted with DNA subset to be released. The released DNA subset solution is neutralised before being connecting tubing as shown in Fig. 3. This subjected to further modules. The hydrodynamic flow in one such module is seen in the color particular microreactor design can solve any coded image on the left. Continuous flow preserves the integrity of the two different chemical intermediate scaled instance of the clique environments in close proximity in each module. The fluorescence image shows DNA problem up to N=20. Which instance is hybridized to the DNA labels on magnetic beads in such a microreactor, allowing the DNA programmed optically by directing the processing to be monitored. attachment of DNA to beads.

ERCIM News No. 43, October 2000 33 SPECIAL THEME: BIOCOMPUTING

Figure 3: Experimental DNA Computing Apparatus: The left photo shows a microreactor with attached tubing connected to a multivalve port on the right and liquid handling system (not shown). The microreactor is imaged under green laser light to detect fluorescence at different locations stemming from DNA molecules, monitoring the time course of the computation. A setup, not shown, is also used for reading and writing. The images on the right show close ups of the fluidic connections to the microreactor and of the laser illumination of the wafer.

designed to be extensible to allow full elements within the microreactor. At the include the pharmaceutical and evolutionary search (see below). No flow spatial resolution and time scale required, diagnostics industry (where molecular switching is required (since this is not photolithographic projection can be made complexity requires increasingly currently scalable) in this dataflow like completely dynamically programmable. sophisticated algorithms for combinatorial architecture. Currently the microreactor libraries construction and readout geometry is fixed to evaluate scalability Directed molecular evolution provides a implemented in molecular hardware), on a benchmark problem: Maximum second stepping stone to exploiting more nanotechnology and high performance Clique. Specific problem instances are powerful algorithms in DNA Computing. computing for coding, associative reconfigured optically. The programming problem shifts to retrieval and combinatorial optimisation. defining selection conditions which match Effective DNA Computing is dependent the given problem. The strategy employed Status of the Project on the construction of a powerful interface in this project is to program molecular A scalable architecture for configurable to the molecular world. In this project, survival by employing flow networks DNA Computing has been developed for the interface involves configurable involving sequence-dependent molecular fully optically programmable solution of microreactors with photochemical input transfers in series and parallel. Feedback the NP-Complete test problem "Maximal and fluorescence readout down to the loops and amplification modules have Clique". Individual modules (strand single molecule level. Fluorescence been designed and will be introduced into transfer and amplification) have been detection is the most sensitive the computer in due course to complete implemented in microreactors and the spectroscopic technique for detecting the the integration with molecular evolution. optical programmability has been presence of specific molecules in solution Reconfiguration and evolution of the developed and demonstrated. Spatially and it can be employed as an imaging tool compartmentation and is resolved single molecule fluorescence to gain vast amounts of information in planned at a second phase in the detection has been developed to allow an parallel about the status of computations technology development to increase the on-line readout of information in complex or the final answer. Consistent with our general programmability of the computer. DNA populations. Microsystem perspective on DNA Computing as integration has proceeded up to the level hardware design, we employ DNA-Computers can produce their of 20x20x3 selection modules, but current photolithographic techniques to program calculated output in functional molecular evaluation work is of a 6x6x3 DNA the attachment of DNA to specific mobile form for direct use. Application areas microprocessor. A DNA library for

34 ERCIM News No. 43, October 2000 SPECIAL THEME: BIOCOMPUTING

medium scale combinatorial problems Cooperation Dortmund (Prof. Banzhaf) and University (N=32) has been designed and the lower Collaborations are being fostered with the Bochum (Prof. Kiedrowski). portion of it constructed. The potential of European Molecular Computing Links: iterated modular evolution in DNA Consortium (EMCC), in particular BioMIP website: Computing has been evaluated on a University of Leiden, Holland (Prof. G. http://www.gmd.de/BIOMIP massively parallel reconfigurable Rozenberg, Prof. H. Spaink); North Rhine electronic computer (NGEN). All of these Westfalia initiative in Programmable Please contact: developments have alternative Molecular Systems: including University John S. McCaskill Ð GMD Tel: +49 2241 14 1526 technological applications beyond the of Cologne (Prof. Howard), University E-mail: [email protected] immediate range of DNA Computing.

Molecular Computing Research at Leiden Center for Natural Computing by Grzegorz Rozenberg

Molecular computing is an exciting and fast growing computer science, molecular computing is a very research area. It is concerned with the use of interdisciplinary area with researchers from computer (bio)molecules and biochemical processes for the science, mathematics, molecular biology, crystallo- purpose of computing. Although it is centered around graphy, biochemistry, physics, etc participating in it.

Molecular computing has the potential to the Faculty of Mathematics and Natural ¥ design of molecules for evolutionary resolve two well recognized obstacles of Sciences. Molecular Computing is one of DNA computing silicon based computer technology: the main research programs of LCNC and ¥ DNA computing methods based on miniaturization and massive parallelism. it is multidisciplinary, with three single molecule detection systems Through molecular computing one participating groups: Leiden Institute for ¥ experimental confirmation of gene ‘descends’ to the nano-scale computing Advanced Computer Science (Prof. G. assembly operations in ciliates. which solves the miniaturization problem. Rozenberg), Institute of Molecular Plant Since, eg, a single drop of solution can Biology (Prof. H. Spaink), and In our research, both theoretical and contain trillions of DNA molecules, and Department of Biophysics (Prof. T. experimental, we cooperate with a when an operations is performed on a tube Schmidt). The research on molecular number of research centers around the containing DNA molecules then it is computing also involves the Evolutionary world - in particular with: University of performed on every molecule in the tube, Algorithms research program of LCNC Colorado at Boulder (USA), Princeton massive parallellism is obtained on a (Prof. J. Kok and Prof. T. Baeck). University (USA), California Institute of grand scale. Technology (USA), State University of The two main research lines on molecular New York at Binghamton (USA), Turku DNA computing (the area of molecular computing within LCNC are: Center for Computer Science (Finland), computing where one considers DNA (1) models and paradigms for molecular Romanian Academy (Romania) and molecules) offers a number of other computing where the following Waseda University (Japan). features which make it an attractive theoretical topics are currently under alternative (or supplementary) technology investigation: Here at LCNC we feel that the to modern silicon computing. These ¥ splicing systems interdisciplinary research on molecular features include very impressive energy ¥ forbidding-enforcing systems computing has significantly deepened our efficiency and information density. ¥ molecular landscapes understanding of computation taking ¥ membrane systems place all around us: in computer science, One may say that the main thrust of the ¥ linear self-assembly of complex DNA biology, physics, etc. We certainly look current research in molecular computing tiles forward to many years of exciting and is the assessment of its full potential. The ¥ models for evolutionary DNA computing challenging research. results obtained to date are cautiously ¥ models for gene assembly in ciliates in vivo Links: optimistic. In particular, the conceptual (DNA computing ). Leiden Center for Natural Computing: understanding and experimental testing http://www.wi.leidenuniv.nl/~lcnc/ of basic principles achieved is already (2) design of laboratory experiments Grzegorz Rozenberg’s homepage: quite impressive. testing models for molecular computing. 
http://www.liacs.nl/~rozenber Current laboratory experiments include: Please contact: The research on molecular computing at ¥ the use of plasmids as data registers for Grzegorz Rozenberg Ð Leiden Center Leiden University takes place at the DNA computing for Natural Computing, Leiden University Leiden Center for Natural Computing ¥ the use of molecules other than DNA Tel: +31 71 527 70 61/67 (LCNC), an interdepartmental institute of for molecular computing E-mail: [email protected]

ERCIM News No. 43, October 2000 35 SPECIAL THEME: BIOCOMPUTING

Cellular Computing

by Martyn Amos and Gerald G. Owenson

The recent completion of the first draft of the human well-established, but researchers are now trying to genome has led to an explosion of interest in genetics reverse the analogy, by using living organisms to and molecular biology. The view of the genome as a construct logic circuits. network of interacting computational components is

The cellular computing project is the much consideration we decided to attempt The interaction of various components of result of collaboration between teams at a rather more ambitious approach, the E. coli genome during development the University of Liverpool (Alan harnessing genetic regulatory mechanisms may be described in terms of a logic Gibbons, Martyn Amos and Paul Sant) in vivo, within the living E. coli. bacterium. circuit. For example, a gene’s expression and the University of Warwick (David may require the presence of two particular Hodgson and Gerald Owenson). The field The central dogma of molecular biology sugars. Thus, we may view this gene in emerged in 1994 with the publication of is that DNA (information storage) is terms of the Boolean AND function, Adleman’s seminal article, in which he copied, producing an RNA message where the presence or absence of the two demonstrated for the first time how a (information transfer). This RNA then acts sugars represent the two inputs to the computation may be performed at a as the template for protein synthesis. The function, and the expression or repression molecular level. Our group contributed basic ‘building blocks’ of genetic of the gene corresponds to the function’s to the development of the area by information are known as genes. Each gene output. That gene’s product may then be describing a generalization of Adleman’s codes for a specific protein which may be required for the expression (or repression) approach, proposing methods for turned on (expressed) or off (repressed) of another different gene, so we can see assessing the complexity of molecular when required. In order for the DNA how gene products act as ‘wires’, carrying algorithms, and carrying out experimental sequence to be converted into a protein signals between different ‘gates’ (genes). investigations into error-resistant molecule, it must be read (transcribed) and laboratory methods. This work quickly the transcript converted (translated) into a In order to implement our chosen logic confirmed that the massively-parallel protein. Each step of the conversion from circuit, we select a set of genes to represent random search employed by Adleman stored information (DNA) to protein gates, ensuring that the inter-gene would greatly restrict the scalability of synthesis (effector) is itself affected or dependencies correctly reflect the that approach. We therefore proposed an catalyzed by other molecules. These connectivity of the circuit. We then insert alternative method, by demonstrating how molecules may be enzymes or other these genes into a bacterium, using Boolean logic circuits may be simulated compounds (for example, sugars) that are standard tools of molecular biology. These using operations on strands of DNA. required for a process to continue. insertions form the most time-consuming Consequently, a is formed, where component of the entire experimental Our original intention was to implement products of one gene are required to process, as the insertion of even a single the Boolean circuit in vitro (ie in a produce further gene products, and may gene can be problematic. However, once laboratory ‘test tube’). However, after even influence that gene’s own expression. the entire set of genes is present in a single

PCR equipment, used to amplify DNA samples. Postdoctoral researcher Gerald Owenson at the lab bench.

36 ERCIM News No. 43, October 2000 SPECIAL THEME: BIOCOMPUTING

colony of bacteria we have an unlimited within the three years, the Our group is a member of the European supply of ‘biological hardware’ at our introduction of human-defined logic Molecular Computing Consortium (see disposal. The inputs to the circuit are set circuits into living bacteria will be a article on page 32). We acknowledge the by enforcing in the cell’s environment the reality. Of course, such implementations support of the BBSRC/EPSRC presence or absence of various compounds will never rival existing silicon-based Bioinformatics Programme. that affect the expression of the genes computers in terms of speed or efficiency. Links: representing the first level gates. Then, However, our goal differs from that of a Publications: essentially, the development of the cell lot of groups in the community, who insist http://www.csc.liv.ac.uk/~martyn/pubs.html and the complex regulatory processes that DNA-based computers may The Warwick group: involved, simulate the circuit, without any eventually rival existing machines in http://www.bio.warwick.ac.uk/hodgson/ additional human intervention. This last domains like . Rather, we see index.html EMCC: http://www.csc.liv.ac.uk/~emcc point is crucial, as most existing proposals the potential applications of introducing for molecular computing require a series logic into cells as lying in fields such as Please contact: of manipulations to be performed by a medicine, agriculture and nano- Martyn Amos Ð School of Biological laboratory technician. Each manipulation technology. The currenct ‘state of the art’ Sciences and Department of Computer reduces the probability of success for the in this area has resulted in the Science, University of Liverpool Tel: +44 1 51 794 5125 experiment, so the ideal situation is a ‘one- reprogramming of E. coli genetic E-mail: [email protected] pot’ reaction, such as the one we propose. expression to generate simple oscillators. or Gerald G. Owenson This work, although ‘blue sky’ in nature, Department of Biological Sciences, We have recently begun work on will advance the field to a stage where University of Warwick simulating a small (3 gate) circuit of cells may be reprogrammed to give them Tel: +44 2 47 652 2572 E-mail: [email protected] NAND gates in vivo. We believe that, simple ‘decision making’ capabilities.

Research in Theoretical Foundations of DNA Computing by Erzsébet Csuhaj-Varjú and György Vaszil

DNA computing is a recent challenging area at the paradigms for DNA computing: theoretical models for interface of computer science and molecular biology, test tube systems and language theoretical providing unconventional approach to computation. frameworks motivated by the phenomenon of Watson- The Research Group on Modelling Multi-Agent Crick complementarity. Systems at SZTAKI develops computational

The famous experiment of Leonard constructs realizing the above idea. These biological computers based on DNA Adleman in 1994, when he solved a small are distributed communicating systems molecules. Since that time, the theory of instance of the Hamiltonian path problem built up from computing devices (test theoretical test tube systems has been in a graph by DNA manipulation in a tubes) based on operations motivated by extensively investigated by teams and laboratory, seeded his ideas on how to the recombinant behaviour of DNS authors from various countries. construct a molecular computer. strands. These operations are applied to Following on from this, Richard J. Lipton the objects in the test tubes (sets of strings) The main questions of research into proposed a kind of programming in a parallel manner and then the results models of test tube system are Ð among language for writing algorithms dealing of the computations are redistributed other things Ð comparisons of the variants with test tubes. The basic primitive of the according to certain specified criteria with different basic operations motivated proposed formalism is the test tube, a set (input/output filters associated with the by the behaviour of DNA strands (or or a multiset (the elements are with components) which allow only specific DNA-related structures) according to their multiplicities) of strings of an alphabet, parts of the contents of the test tube to be computational power, programmability, and the basic operations correspond to transferred to the other tubes. Test tube simplicity of the filters, topology of the operations with test tubes, merge (put systems based on splicing were presented test tubes and approachability of together the contents of two test tubes), in 1996 by Erzsébet Csuhaj-Varjú, Lila intractable problems. Our investigations separate (produce two test tubes from one Kari and Gheorghe Paun, an another follow this line. tube, according to a certain criteria), model in 1997 by Rudolf Freund, amplify (duplicate a tube), and check Erzsébet Csuhaj-Varjú and Franz At present we explore test tube systems whether a tube is empty or not. Wachtler. Both models proved to be as with multisets of objects (symbols, powerful as Turing machines, giving a strings, data structures) and operations Test tube systems based on splicing and theoretical proof of the possibility to modelling biochemical reactions. In test tube systems based on cutting and design universal programmable addition to the computational power, the recombination operations are theoretical computers with such architectures of emphasis is put on elaborating complexity

ERCIM News No. 43, October 2000 37 SPECIAL THEME: BIOCOMPUTING

notions for these devices. As a further to to the complementary string, for example, detecting so-called black development, we extended the concept of which is always available. holes in the network (nodes which never the test tube system to a pipeline emit any string). Our future plans aim at architecture allowing objects to move in In close cooperation with Prof. Arto comparisons of these devices with the system. Salomaa (Turku Centre for Computer different underlying mechanisms and Science), we study properties of Watson- different qualitative/ quantitative The other important direction of our Crick DOL systems, a variant where the conditions employed as triggers, research is to study language theoretical underlying generative device is a parallel according to computational power, models motivated by Watson-Crick string rewriting mechanism modelling stability and complexity issues, including complementarity, a fundamental concept developmental systems. Recently, we complexity measures different from the in DNA Computing. According to this have demonstrated the universal power customary ones. phenomenon, when bonding takes place of some important variants of these (supposing that there are ideal conditions) systems. In the -work of our joint We have been in contact and cooperation between two DNA strands, the bases research, we have introduced and now with several leading persons from the opposite each other are complementary. explore networks of Watson-Crick DOL European Molecular Computing systems, where the communication is Consortium. The group is open for any A paradigm in which Watson-Crick controlled by the trigger for conversion further cooperation. complementarity is viewed in the to the complementary form: whenever a Links: operational sense was proposed for further “bad” string appears at a node, the other http://www.sztaki.hu/mms consideration by Valeria Mihalache and nodes receive a copy of its corrected Arto Salomaa in 1997. According to the version. In this way, the nodes inform Please contact: proposed model, a ‘bad’ string (a string each other about the correction of the Erzsébet Csuhaj-Varjú Ð SZTAKI satisfying a specific condition, a trigger) emerging failures. In addition to the Tel: +36 1 4665 644 E-mail: [email protected] produced by a generative device induces computational power of these constructs or György Vaszil Ð SZTAKI a complementary string either randomly (which proved to be computationally Tel: +36 1 4665 644 or guided by a control device. The complete in some cases), we have E-mail: [email protected] concept can also be interpreted as follows: achieved interesting results in describing or Arto Salomaa Ð Turku Centre in the course of a developmental or the dynamics of the size of string for Computer Science Tel: +358 2 333 8790 computational process, things can go collections at the nodes. We have been E-mail: [email protected] wrong to such an extent that it is advisable dealing with challenging stability issues, Representing Structured Symbolic Data with Self-organizing Maps

by Igor Farkas

Artificial self-organizing neural networks - especially Slovak Grant Agency for Science (VEGA) within a self-organizing maps (SOMs) - have been studied for project of the Neural Network Laboratory. The goal of several years at the Institute of Measurement Science this research is to assess the potential of SOM to of the Slovak Academy of Sciences in Bratislava. The represent structured symbolic data, which is SOM research oriented to structured symbolic data associated with various high-level cognitive tasks. is the most recent and is being supported by the

Various information processing tasks can facing this criticism, during the last symbols is encoded as a binary vector). be successfully handled by neural decade a number of neural architectures These encoding (representations) are network models. Predominantly, neural and algorithms were designed to gradually formed in the layer of hidden nets have proven to be useful in tasks demonstrate that neural networks also units during training. However, this is which deal with real-valued data, such possess the potential to process symbolic often a time-consuming process (moving as pattern recognition, classification, structured data. The best known examples target problem). Despite this drawback, feature extraction or signal filtering. of such models include recursive auto- RAAM has become one of the standard Neural networks have been criticised as associative memory (RAAM), tensor neural approaches to represent structures. being unsuitable for tasks involving product based algorithms or models that symbolic and/or structured data, as occur incorporate synchrony of neurons’ For the learning method to work, the in some cognitive tasks (language activation. For example, the RAAM, representations of structures generated processing, reasoning etc.). These tasks being a three-layer feed-forward neural must be structured themselves to enable were previously tackled almost network, can be trained to encode various structure-sensitive operations. In other exclusively by classical, symbolic data structures presented to the network words, the emerged representations must artificial-intelligence methods. However, (such as ‘A(AB)’, where each of the two be systematic (compositional), requiring

38 ERCIM News No. 43, October 2000 SPECIAL THEME: BIOCOMPUTING

(in some sense) that similar structures belonging to the first level in the hierarchy sequences are in terms of their suffices, should yield similar encoding. and the former to the second higher level. the more closely are their representations In this notation, symbol ‘o’ is simply placed in the hypercube. The encoding Objectives treated as another symbol and represents scheme described has been incorporated We focused on an alternative model that a subtree. In general, structures with in training the SOM. The number of units incorporates a neural network having the maximum depth ‘n’ require ‘n’ SOMs to employed has to be selected according to capability to generate structured be used and stacked in a hierarchy. Hence, the length of sequences being used: the representations. As opposed to RAAM, the representation of a structure consists longer the sequences, the finer the the main advantage of our model consists in simultaneous representations of resolution needed and the more units are in fast training, because it is based on a associated sequences emerging after required. The training resulted in a rather different concept, employing decomposition, distributed across the topographically ordered SOM, in which (self-organisation). SOMs at all levels. a single sequence can afterwards be The core of the model are the self- represented by the winning unit in the organising maps (SOMs) which are well- How is a sequence represented in the map. If we do not use winner co-ordinates known as a standard neural network SOM? The idea comes from Barnsley’s as output (as commonly done), but take learning algorithm used for clustering and iterated function systems (IFS) which can the global activity of all units in the map visualisation of high-dimensional data be used for encoding (and decoding) (as a Gaussian-like profile, with its peak patterns. The visualisation capability of symbolic sequences as points in a unit centred at the winner’s position in the the SOM is maintained by its unique hypercube. In the hypercube, every grid), then we can simultaneously feature Ð topographic (non-linear) symbol is associated with a particular represent multiple sequences in one map. transformation from input space to output (the dimension of the hypercube units. The latter are arranged in a regular must be sufficiently large to contain We have applied the above scheme on grid Ð which implies that data patterns enough vertices for all symbols in simple two- and four-symbol structures originally close to each other are mapped alphabet). During the encoding process of depth less than four, and the preliminary onto nearby units in the grid. We (going through the sequence symbol by results (evaluted using a hierarchical exploited this topographic property symbol), a simple linear transformation of representations analogously in representing sequences. is recursively applied resulting in ‘jumps’ obtained) show that the generated within the hypercube, until the final point representations display the property of Results of this series (when the end of sequence systematic order. However, to validate this In our approach, each symbolic structure is reached) becomes an IFS-representing approach, we shall need to test how it is first decomposed onto a hierarchy of point of the sequence. As a result, we get scales with larger symbol sets as well as sequences (by some external parsing as many points as there are sequences to with more complex structures (both in module) which are then mapped onto a be encoded. 
The important feature of terms of ‘height’ and ‘width’). hierarchy of SOMs. For instance, a these IFS-based sequence representations Please contact: structure ‘A(AB)’ which can be viewed is the temporal analogue of the Igor Farkaso Ð Institute of Measurement as a tree of depth two, is decomposed into topographic property of a SOM. This Science, Slovak Academy of Sciences two sequences ‘Ao’ and ‘AB’, the latter means that the more similar two Tel: +421 7 5477 5938 E-mail: [email protected] Neurobiology keeps Inspiring New Neural Network Models by Lubica Benuskova

Biologically inspired recurrent neural networks are ‘Theory of neocortical plasticity’ in which theory and investigated at the Slovak Technical University in computer simulations were combined with Bratislava. This project, supported by the Slovak neurobiological experiments in order to gain deeper Scientific Grant Agency VEGA, builds on the results insights into how real brain neurons learn. of the recently accomplished Slovak-US project

The human brain contains several about how our own brains work. This an animal was exposed to a novel sensory hundred thousand millions of specialised understanding can be crucial not only for experience. We work with the cells called neurons. Human and animal the development of medicine but also for evolutionarily youngest part of the brain, neurons share common properties, thus computer science. Of course, the validity ie neocortex, which is involved mainly in researchers often use animal brains to of extrapolating findings in animal studies the so-called cognitive functions (eg, study specific questions about processing depends on many aspects of the problem perception, association, generalisation, information in neural networks. The aim studied. We studied a certain category of learning, memory, etc). is to extrapolate these findings to ideas changes occuring in neurons when

ERCIM News No. 43, October 2000 39 SPECIAL THEME: BIOCOMPUTING

Neurons emit and process electric signals. Ebner’s team at Vanderbilt University in They communicate these signals through Nashville, TN, USA used an animal specialized connections called synapses. model of mental retardation (produced by Each neuron can receive information from exposure of the prenatal rat brain to as well as send it to thousands of ethanol) to show a certain specific synapses. Each piece of information is impairment of experience-evoked transmitted via a specific and well defined neocortical plasticity. From our model, set of synapses. This set of synapses has we have derived an explanation of this its anatomical origin and target as well as impaired plasticity in terms of an specific properties of signal transmission. unattainably high potentiation threshold. At present, it is widely accepted that Based on a comparison between origins and targets of connecting synapses computational results and experimental are determined genetically as well as most data, revealing a specific biochemical of the properties of signal transmission. deficit in these faulty cortices, we have However, the efficacy of signal transfer proposed that the value of the potentiation at synapses can change throughout life as threshold depends also on a specific a consequence of learning. biochemical state of the neuron.

In other words, whenever we learn The properties of the self-organising something, somewhere in our neocortex, BCM learning rule have inspired us to changes in the signal transfer functions investigate the state space organisation of of many synapses occur. These synaptic recurrent BCM networks which process changes are then reflected as an increase time series of inputs. Activation patterns or decrease in the neuron’s response to a across recurrent units in recurrent neural given stimulus. All present theories of networks (RNNs) can be thought of as Cortical neuron. Courtesy of Teng Wu, synaptic learning refer to the general rule Vanderbilt University, USA. sequences involving error back introduced by the Canadian psychologist propagation. To perform the next-symbol Donald Hebb in 1949. He postulated that prediction, RNNs tend to organise their repeated activation of one neuron by state space so that ‘close’ recurrent another, across a particular synapse, Research Results activation vectors correspond to histories increases its strength. We can record Experience-dependent neocortical of symbols yielding similar next-symbol changes in neurons’ responses and then plasticity was evoked in freely moving distributions. This leads to simple finite- make inferences about which synapses adult rats in the neocortical representation context predictive models, built on top of have changed their strengths and in which of their main tactile sense, i.e. whiskers. recurrent activation, grouping close direction, whether up or down. For this We developed a neural network model of activation patterns via vector quantization. inference, we need to introduce some the corresponding neuroanatomical We have used the recurrent version of the reasonable theory, that is a set of circuitry. Based on computer simulations BCM network with lateral inhibition to assumptions and rules which can be put we have proposed which synapses are map histories of symbols into activation together into a model which simulates a modified, how they are modified and patterns of the recurrent layer. We given neural network and which, in why. For the simulation of learning, we compared the finite-context models built simulation, reproduces the evolution of used the theory of Bienenstock, Cooper on top of BCM recurrent activation with its activity. If the model works, it can give and Munro (BCM). Originally, the BCM those constructed on top of RNN us deeper insights into what is going on theory was introduced for the developing recurrent activation vectors. As a test bed, in the real neural networks. (immature) visual neocortex. They we used complex symbolic sequences modelled experiments done on monkeys with rather deep memory structures. Objectives and Research Description and cats. We have shown that the BCM Surprisingly, the BCM-based model has Experience-dependent neocortical rules apply also for the mature stage of a comparable or better performance than plasticity refers to the modification of the brain development, for a different part its RNN-based counterpart. synaptic strengths produced by the use of the neocortex, and for the different Please contact: and disuse of neocortical synapses. We animal species (rats). 
The main Lubica Benuskova Ð Slovak Technical would like to contribute to an intense distinguishing feature of the BCM theory University scientific effort to reveal the detailed rules against other Hebbian theories of synaptic Tel: +421 7 602 91 696 which hold for modifications of synaptic plasticity is that it postulates the existence E-mail: [email protected], connections during learning. Further, we of a shifting synaptic potentiation http://www.dcs.elf.stuba.sk/~benus want to investigate self-organising neural threshold, the value of which determines networks with time-delayed recurrent the sign of synaptic changes. The shifting connections for processing time series of threshold for synaptic potentiation is inputs. proportional to the average of a neuron’s activity over some recent past. Prof.

40 ERCIM News No. 43, October 2000 RESEARCH AND DEVELOPMENT

Contribution to Quantitative Evaluation of Lymphoscintigraphy of Upper Limbs by Petr Gebousky, Miroslav Kárny and Hana Kfiížová

Lymphoscintigraphy is an emerging technique for is inhibited by a lack of a reliable quantitative evaluation judging of the state of lymphatic system. It serves of its results especially in difficult case of upper limbs well for diagnosis of lymphedema. Its early and correct in early stage of the disease. The modeling and detection prevents a range of complications leading subsequent identification can be exploited in order to often to a full disability. A wider use of this technique improve the accuracy of diagnosis.

Lymphedema is a progressive condition characterized by four pathologic features: excess tissue protein, edema, chronic inflammation and fibrosis. Upper limb lymphedema is a frequent complication of breast cancer.

A variety of imaging modalities have been employed in the evaluation of the lymphatic system. Lymphoscintigraphy is a noninvasive technique in which radionuclides are used to image regional lymph drainage systems.

It provides information about lymph transportation, filtration and reticuloendothelial function in extremities.

Standard inspection method consists of administration of 25 MBq of Tc-99m labeled colloid administered to the 1st interstitial space of both hands. Static 1 min images are taken immediately, 30 & Time-activity curves (with shapes) estimated from 2 measurements (the 1st & 3rd) in various ROIs where responses in left and right limb are compared. Red colour refers to left blue to 180 min after administration, right limb. The estimated residence times are written on the right, the first one for left limb. morphologically evaluated and arm results semi-quantitatively compared. No standard is available for healthy distribution of discretised common time to 5 measurements). Real data was population. constant, model order and gain. Each provided by Department of Nuclear triple of parameters determines uniquely Medicine FNM. The evaluation results The system identification is employed time-activity curve in individual ROIs indicate adequacy of the adopted model here for description of individual response together with the important parameter for and encourage us to follow the direction of the patient. We tried to model the diagnosis - area under curve (residence outlined under (i) to (iii). accumulation of the radioactive tracer in time). This allows us to compute posterior Please contact: local regions of upper limb during distribution of residence time in each ROI. Petr Gebousky Ð CRCIM (UTIA) lymphoscintigraphic procedure by a chain Tel: +420 2 66052583 of simple identical compartmental models Having in these distributions in hands, we E-mail: [email protected] with a common time constant, unknown can (i) judge (in)significance of gain and unknown number of differences between both arms (ii) study constituents. The chosen 3-parameters quantitative relationships between disease model was motivated by a standard staging and measurements (iii) optimize modeling of distributed parameter measurements moments. systems. Their Bayesian estimator was designed that converts integral counts The proposed model and data processing taken over regions of interests (ROI : were tested on data of 24 patients that forearm, upper arm, axilla) into posterior were measured more often than usual (up


Approximate Similarity Search

by Giuseppe Amato

Similarity searching is fundamental in various application areas. Recently it has attracted much attention in the database community because of the growing need to deal with large volumes of data. Consequently, efficiency has become a matter of concern in design. Although much has been done to develop structures able to perform fast similarity search, results are still not satisfactory, and more research is needed. The performance of similarity search for complex features deteriorates and does not scale well to very large object collections. Given the intrinsically interactive nature of the similarity-based search process, the efficient execution of elementary queries has become even more important, and the notion of approximate search has emerged as an important research issue.

Contrary to traditional databases, where simple attribute data are used, the standard approach to searching modern data repositories, such as multimedia databases, is to perform search on characteristic features that are extracted from information objects. Features are typically high-dimensional vectors or some other data items, pairs of which can only be compared by specific functions. In such search environments, exact match has little meaning; thus, concepts of similarity are typically applied. In order to illustrate this, let us consider an image data repository. It is clear that images are not atomic symbols, so equality is not a particularly realistic predicate. Instead, search tends to be based on similarity, because resemblance is more important than perfectly matching bit patterns. On the other hand, all that is similar is not necessarily relevant, so this paradigm tends to entail the retrieval of false positives that must be manually discarded. In other words, the paradigm rejects the idea that queries may be expressed in terms of necessary and sufficient conditions that will determine exactly which images we wish to retrieve. Instead, a query is more like an information filter, defined in terms of specific image properties, which reduces the user's task by providing only a small number of candidates to be examined. It is more important that candidates that are likely to be of interest are not excluded than it is that possibly irrelevant candidates be included.

We have investigated the problem of approximated similarity search for the range and nearest neighbour queries in the environment of generic metric spaces. From a formal point of view, the mathematical notion of metric space provides a useful abstraction of similarity or nearness. We modified existing tree-based similarity search structures to achieve approximate results at substantially lower costs. In our proposal, approximation is controlled by an external parameter of proximity of regions that allows avoiding access to data regions that possibly do not contain relevant objects. When the parameter is zero, precise results are guaranteed; the higher the proximity threshold, the less accurate the results are and the faster the query is executed.

In order to have good quality results, an accurate proximity measure is needed. The technique that we use to compute proximity between regions adopts a probabilistic approach: given two data regions, it is able to determine the probability that the intersection of these two regions contains relevant data objects. In fact, note that, even if two regions overlap, there is no guarantee that objects are contained in their intersection.

Extensive experimental tests have shown a high reliability of this approach, which gave a substantial contribution to the quality of the approximate results and to the efficiency of the approximate similarity search algorithm. We applied this idea to the similarity range and nearest neighbours queries and verified its validity on real-life data sets. Improvements of two orders of magnitude were achieved for moderately approximated search results.

Figure: Evaluation of approximate k nearest neighbours search.

The main contributions of our approach can be summarised as follows:
• A unique approximation approach has been applied to the similarity range and the nearest neighbours queries in metric data files. Previous designs have only considered the nearest neighbours search, sometimes even restricted to one neighbour.
• The approximation level is parametric, and precise response to similarity queries is achieved by specifying a zero proximity threshold.
• The approach to the computation of proximity keeps the approximated results probabilistically bound. Experimental results demonstrate high precision and improvements of efficiency.
• We have experimentally demonstrated the importance of precise proximity measures, the application of which can lead to effective and efficient approximations of similarity search.
• Though the implementation is demonstrated by extending the M-tree, the approach is also applicable to other similarity search trees at small implementation costs.

In the future, we plan to properly compare all existing approaches to approximation in a uniform environment. We also hope to develop a system-user interface and apply the approach to real image and video archives. Finally, we intend to study the cases of iterative similarity search and complex approximated similarity search.

Other people that have contributed to this research are Pavel Zezula, Pasquale Savino, and Fausto Rabitti.

Please contact:
Giuseppe Amato – IEI-CNR
Tel: +39 050 315 2906
E-mail: [email protected]
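As a rough illustration of the proximity-threshold idea described above, the sketch below performs an approximate range search over a simple one-dimensional ball tree; the real work extends the M-tree and uses a probabilistic proximity measure, whereas the 'proximity' function, the data structure and all names here are illustrative assumptions only.

```python
# Hedged sketch of proximity-controlled approximate range search over a
# toy metric (ball) tree: regions whose proximity to the query ball falls
# at or below the threshold are skipped.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    center: float                       # routing object (1-D for simplicity)
    radius: float                       # covering radius of the region
    objects: List[float] = field(default_factory=list)   # leaf payload
    children: List["Node"] = field(default_factory=list)

def proximity(center, radius, query, query_radius):
    """Placeholder proximity of a node region to the query ball in [0, 1]:
    0 when the regions are disjoint, growing with the amount of overlap."""
    overlap = radius + query_radius - abs(center - query)
    if overlap <= 0:
        return 0.0
    return min(1.0, overlap / (2.0 * query_radius))

def approx_range_search(node, query, query_radius, threshold, result):
    """threshold = 0 prunes only disjoint regions, i.e. an exact search;
    higher thresholds prune more regions and return approximate results."""
    if proximity(node.center, node.radius, query, query_radius) <= threshold:
        return                            # approximation: skip this region
    for obj in node.objects:
        if abs(obj - query) <= query_radius:
            result.append(obj)
    for child in node.children:
        approx_range_search(child, query, query_radius, threshold, result)

# Toy usage: the higher the threshold, the cheaper (and less complete) the answer.
leaf = Node(center=5.0, radius=2.0, objects=[3.5, 5.0, 6.5])
root = Node(center=5.0, radius=10.0, children=[leaf])
hits: List[float] = []
approx_range_search(root, query=4.0, query_radius=1.0, threshold=0.0, result=hits)
print(hits)   # [3.5, 5.0] with an exact (threshold = 0) search
```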

Education of ‘Information Technology Non-professionals’ for the Development and Use of Information Systems by Peter Mihók, Vladimír Penjak and Jozef Bucko

Information creation and processing in modern information systems requires the increasing engagement of all users in the analysis and development stages. This article investigates how deep a knowledge of object-oriented methodologies for the development of information systems is necessary for students of applied mathematics, economics and young business managers. We describe, in the context of the concept of student education developed at the Faculty of Sciences of P. J. Šafárik University and the Faculty of Economics of the Technical University in Košice, our experience in this area and formulate certain questions and problems which we believe to be relevant.

The ability to use modern information and communication technologies efficiently is one of the fundamental requirements for graduates of all types of university-level educational institutions. Future economists, managers and businessmen are facing a reality which encompasses the necessity of working in an environment of global integrated information systems (IIS), which are becoming an integral part of the global information society. Courses which support the use of modern information and communication technologies are designed to provide the students with a certain amount of fundamental theoretical knowledge, but their main purpose is to impart practical knowledge and skills. The students are motivated to search the worldwide web for the newest information, make their own survey of the services and products offered in connection with direct (electronic) banking, and acquaint themselves with the forms and principles of electronic commerce.

Object Oriented Methodologies and the Training of Information System Users
The development, maintenance and use of efficient integrated information systems is becoming a serious competitive advantage in economic competition. One of the reasons for this is the fact that the efficiency and effectiveness of these systems is directly dependent on the ability of users to provide the best possible specification of their requirements. In spite of the fact that the development and implementation of information systems is the domain of professional information technologists, it is impossible to build an integrated information system without close cooperation with the prospective system users, whose numbers are increasing in a dramatic way. This forces us to re-evaluate the way the 'information technology non-professionals - IIS users' are trained and the way they should be trained to prepare them for these tasks. Since the last years of our century are marked by a pronounced shift towards object-oriented methodologies, we decided to plan the education of future IIS users in accordance with the basic ideas of object-oriented approaches. We are of the opinion that it is an advantage for effective cooperation between information technology professionals and users if the users possess at least a basic understanding of the concepts and models used by object-oriented methodologies.

The use of objects made it possible to exploit their synergy, in that the object model united all the stages to yield one design which could be used without modification up to and including the implementation stage. This was because in all stages the designer used one language - the language of objects and classes. Thus it becomes necessary to explain these fundamental concepts. This poses a non-trivial didactical problem: how detailed an explanation of the fundamental concepts 'object' and 'class' should be provided?

Object-oriented methodologies offer a wide range of graphical modeling tools and many kinds of diagrams. It is necessary to choose the most suitable modeling tools for their creation. After careful consideration of various alternative models which could be used in training future IS users, we decided to start with a detailed presentation of the so-called Use Case Model. Again, what level of detail should be selected for training non-specialists in the development of Use Case models? Is it a good idea to teach certain parts of the UML language - and if so, how detailed a description is best?

The object-oriented methods which we have briefly mentioned are used in many CASE tools of various types which are indispensable for IS development. Developing extensive information systems without these tools is unthinkable. The use of objects has become an essential approach in the development of modern distributed applications on the Internet. One of the most frequent magic words which emerged quite recently in information technology is CORBA - Common Object Request Broker Architecture; there are other technologies such as DCOM - Distributed Component Object Model - backed by Microsoft. Is it necessary for future IIS users to have at least an idea of its architecture and infrastructure?

A detailed description of all known requirements which the new system should satisfy is the most important common activity of the future user and the developer at the start of a system development project. Experience gained in many projects shows that it is an advantage if the process of user requirement specification is controlled by experienced analysts. A well-prepared document entitled Specification of User Requirements is of fundamental importance for system introduction as well as for resolving eventual misunderstandings between the supplier and contractor. User cooperation is a necessary condition for the preparation of such a document. How much information should the user receive concerning Requirements Engineering methods?

Giving a comprehensive reply to all these questions is undoubtedly a complicated task, and their understanding will no doubt be affected in various ways by the evolution of the field of IIS development. However, it seems certain that some understanding of this problem domain is necessary for all future managers. Let us remark that this conclusion is borne out by our experience in developing IIS both for state administration and the university environment. A training method which we have found suitable is based on so-called 'virtual projects'. A student becomes a virtual top manager of a company or institution. Based on a fictional 'present state of the IS' he formulates the goals and specifications for his company's IIS. At a later stage he attempts, using the user requirement specification and Use Case model, to create a specification for a selected subsystem of the IIS. In spite of the fact that most of our students have little knowledge of information technology, some of the more than 300 virtual projects submitted so far are on a high level and are an encouragement for further and more detailed consideration of the questions raised in this note.

Please contact:
Peter Mihók – P. J. Šafárik University/SRCIM
Tel: +421 95 622 1128
E-mail: [email protected]
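As a minimal illustration (not taken from the article) of the two concepts whose teaching depth is discussed above, a class is the shared description while each object is one concrete instance created from it; the example and its names are purely hypothetical.

```python
# Class: the shared description of structure (balance) and behaviour (deposit).
class Account:
    def __init__(self, owner, balance=0.0):
        self.owner = owner
        self.balance = balance
    def deposit(self, amount):
        self.balance += amount

# Objects: two independent instances created from the same class.
a = Account("Alice")
b = Account("Bob", balance=50.0)
a.deposit(20.0)
print(a.owner, a.balance)   # Alice 20.0
print(b.owner, b.balance)   # Bob 50.0
```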

Searching Documentary Films On-line: the ECHO Project

by Pasquale Savino

Wide access to large information collections is of great potential importance in many aspects of everyday life. However, limitations in information and communication technologies have so far prevented the average person from taking much advantage of existing resources. Historical documentaries, held by national audiovisual archives, constitute some of the most precious and least accessible cultural information. The ECHO project intends to contribute to improving the accessibility of this precious information by developing a digital library (DL) service for historical films belonging to large national audiovisual archives.

The ECHO project, funded by the European Community within the Fifth Framework Programme (KA III), intends to provide a set of digital library services that will enable a user to search and access documentary film collections. For example, users will be able to investigate how different countries have documented a particular historical period of their life, or view an event as documented in the country of origin and see how the same event has been documented in other countries. One effect of the emerging digital library environment is that it frees users and collections from geographic constraints.

The project is co-ordinated by IEI-CNR. It involves a number of European institutions (Istituto Luce, Italy; Institut National de l'Audiovisuel, France; Netherlands Audiovisual Archive; and Memoriav, Switzerland) holding or managing unique collections of documentary films, dating from the beginning of the century until the seventies. Tecmath, EIT, and Mediasite are the industrial partners that will develop and implement the ECHO system. There are two main academic partners (IEI-CNR and Carnegie Mellon University - CMU) and four associate partners (CNRS-LIMSI, IRST, University of Twente, and University of Mannheim).

ECHO System Functionality
The emergence of the networked information system environment allows us to envision digital library systems that transcend the limits of individual collections to embrace collections and services that are independent of both location and format. In such an environment, it is important to support the interoperability of distributed, heterogeneous digital collections and services. Achieving interoperability among digital libraries is facilitated by conformance to an open architecture as well as agreement on items such as formats, data types, and conventions.

ECHO aims at developing a long-term reusable software infrastructure and new metadata models for films in order to support the development of interoperable audiovisual digital libraries. Through the development of new models for film metadata, intelligent content-based searching and film-sequence retrieval, video abstracting tools, and appropriate user interfaces, the project intends to improve the accessibility, searchability, and usability of large distributed audiovisual collections. Through the implementation of multilingual services and cross-language retrieval tools, the project intends to support users when accessing material across linguistic, cultural and national boundaries. The ECHO system will be tested, in the first place, on four national collections of documentary film archives (Dutch, French, Italian, Swiss). Other archives may be added at a later stage.

The ECHO System
The figure provides an overview of the main operations supported by the ECHO system. ECHO assists the population of the digital library through the use of mechanisms for the automatic extraction of content. Using a high-quality speech recogniser, the sound track of each video source is converted to a textual transcript, with varying word error rates. A language understanding system then analyses and organises the transcript and stores it in a full-text system. Multiple speech recognition modules for different European languages will be included. Likewise, image understanding techniques are used for segmenting video sequences by automatically locating the boundaries of shots, scenes, and conversations. Metadata is then associated with film documentaries in order to complete their classification.

Figure: System overview.

Search and retrieval via desktop computers and wide area networks is performed by expressing queries on the audio transcript, on metadata, or by image similarity retrieval. Retrieved documentaries, or their abstracts, are then presented to the user. Through the collaborative interaction of image, speech and natural language understanding technology, the system compensates for problems of interpretation and search in the errorful and ambiguous data sets.

Expected Results
The project will follow an incremental approach to system development. Three prototypes will be developed, offering an increasing number of functionalities. The starting point of the project will be a software infrastructure resulting from an integration of the Informedia and Media Archive(r) technologies.

The first prototype (Multiple Language Access to Digital Film Collections) will integrate the speech recognition engines with the infrastructure for content-based indexing. It will demonstrate an open system architecture for digital film libraries with automatically indexed film collections and intelligent access to them on a national language basis.

The second prototype (Multilingual Access to Digital Film Collections) will add a metadata editor which will be used to index the film collections according to a common metadata model. Index terms, extracted automatically during the indexing/segmentation of the film material (first prototype), will be integrated with local metadata, extracted manually, in a common description (defined by the common metadata model). The second prototype will support the interoperability of the four collections and content-based searching and retrieval.

The third prototype (ECHO Digital Film Library) will add summarization, authentication, and charging functionalities in order to provide the system with full capabilities.

Links:
http://www.iei.pi.cnr.it/echo/

Please contact:
Pasquale Savino – IEI-CNR
Tel: +39 050 315 2898
E-mail: [email protected]
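As a rough, self-contained sketch of one building block mentioned above, shot-boundary detection, the toy code below flags a cut whenever the grey-level histogram of a frame differs strongly from that of the previous frame; ECHO's actual segmentation builds on the Informedia and Media Archive technologies, and the function names, threshold and data layout here are assumptions for illustration only.

```python
# Hedged sketch of histogram-difference shot-boundary detection.
from typing import List

def histogram(frame: List[int], bins: int = 16) -> List[float]:
    """Normalised grey-level histogram of a frame given as 0-255 pixel values."""
    h = [0] * bins
    for p in frame:
        h[min(p * bins // 256, bins - 1)] += 1
    total = float(len(frame))
    return [c / total for c in h]

def shot_boundaries(frames: List[List[int]], threshold: float = 0.5) -> List[int]:
    """Indices where the histogram change from the previous frame exceeds the threshold."""
    cuts = []
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        diff = sum(abs(a - b) for a, b in zip(prev, cur))
        if diff > threshold:
            cuts.append(i)
        prev = cur
    return cuts

# Toy usage: a dark scene followed by a bright one yields a cut at frame 3.
dark = [20] * 64
bright = [220] * 64
print(shot_boundaries([dark, dark, dark, bright, bright]))   # [3]
```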


IS4ALL: A New Working Group promoting Universal Design in Information Society Technologies

by Constantine Stephanidis

IS4ALL (Information Society for All) is a new EC-funded project aiming to advance the principles and practice of Universal Access in Information Society Technologies, by establishing a wide, interdisciplinary and closely collaborating network of experts (Working Group) to provide the European IT&T industry in general, and Health Telematics in particular, with a comprehensive code of practice on how to appropriate the benefits of universal design.

The International Scientific Forum 'Towards an Information Society for All' was launched in 1997 as an international ad hoc group of experts sharing common visions and objectives, namely the advancement of the principles of Universal Access in the emerging Information Society. The Forum held three workshops to establish interdisciplinary discussion, a common vocabulary to facilitate the exchange and dissemination of knowledge, and to promote international co-operation. The 1st workshop took place in San Francisco, USA, August 29, 1997, and was sponsored by IBM. The 2nd took place in Crete, Greece, June 15-16, 1998, and the 3rd took place in Munich, Germany, August 22-23, 1999. The latter two events were partially funded by the European Commission. The Forum has produced two White Papers, published in the International Journal of Human-Computer Interaction, Vol. 10(2), 107-134 and Vol. 11(1), 1-28, while a third one is in preparation. You may also visit http://www.ics.forth.gr/proj/at-hci/files/white_paper_1998.pdf and http://www.ics.forth.gr/proj/at-hci/files/white_paper_1999.pdf. The White Papers report on an evolving international R&D agenda focusing on the development of an Information Society acceptable to all citizens, based on the principles of universal design. The proposed agenda addresses technological and user-oriented issues, application domains, and support measures. The Forum has also elaborated on the proposed agenda by identifying challenges in the field of human-computer interaction, and clusters of concrete recommendations for international collaborative R&D activities. Moreover, the Forum has addressed the concept of accessibility beyond the traditional fields of inquiry (eg, assistive technologies, built environment, etc), in the context of selected mainstream Information Society Technologies, and important application domains with significant impact on society as a whole. Based on the success of its initial activities, the Forum has proposed to the European Commission the establishment, on a formal basis, of a wider, interdisciplinary and closely collaborating network of experts (Working Group), which has now been approved for funding.

Universal Design
Universal Design postulates the design of products or services that are accessible, usable and, therefore, acceptable by potentially everyone, everywhere and at any time. Although the results of early work dedicated to promoting Universal Access to the Information Society (for a review see: http://www.ics.forth.gr/proj/at-hci/files/TDJ_paper.PDF) are slowly finding their way into industrial practices (eg, certain mobile phones, point-of-sale terminals, public kiosks, user interface development toolkits), a common platform for researchers and practitioners in Europe to collaborate and arrive at applicable solutions is still missing. Collaborative efforts are therefore needed to collect, consolidate and validate the distributed wisdom at the European as well as the international level, and apply it in application areas of critical importance, such as Health Telematics, catering for the population at large and involving a variety of diverse target user groups (eg, doctors, nurses, administrators, patients). Emerging interaction platforms, such as advanced desktop-oriented environments (eg, advanced GUIs, 3D graphical toolkits, visualisers) and mobile platforms (eg, palmtop devices), enabling ubiquitous access to electronic data from anywhere and at any time, are expected to bring about radical improvements in the type and range of Health Telematics services. Accounting for the accessibility, usability and acceptability of these technologies at an early stage of their development life cycle is likely to improve their market impact as well as the actual usefulness of the end products.

IS4ALL
The IS4ALL Working Group aims to provide European industry with appropriate instruments to approach, internalise and exploit the benefits of universal access, with particular emphasis on Health Telematics. Toward this objective, IS4ALL will develop a comprehensive code of practice (eg, enumeration of methods, process guidelines) consolidating existing knowledge on Universal Access in the context of Information Society Technologies, as well as concrete recommendations for emerging technologies (eg, emerging desktop and mobile platforms), with particular emphasis on their deployment in Health Telematics. IS4ALL will also undertake a mix of outreach activities to promote Universal Access principles and practice, including workshops and seminars targeted to the mainstream IT&T industry.

IS4ALL is a multidisciplinary Working Group co-ordinated by ICS-FORTH. The membership includes: Microsoft Healthcare Users Group Europe, the European Health Telematics Association, CNR-IROE, GMD, INRIA and Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. - Institut für Arbeitswirtschaft und Organisation, Germany.

Please contact:
Constantine Stephanidis – ICS-FORTH
Tel: +30 81 39 17 41
E-mail: [email protected]

TECHNOLOGY TRANSFER

Identifying Vehicles on the Move by Beate Koch

Research carried out in Germany in the field of information and communications technology is extremely diverse in nature. A major role is played by the Fraunhofer-Gesellschaft with its total of 47 Institutes and branches in the USA and Asia. The following example shows how the preliminary research conducted by the Fraunhofer Institutes - with a constant focus on the longer term - enables scientific findings to be converted into pioneering and marketable products.

Automated image-processing systems constitute a booming market with double-digit growth rates. The breakthrough was achieved in the combination of camera and information technology - thereby implementing mechanized sight, far swifter and more reliable in operation than the human eye.

For many years, Fraunhofer researchers in Berlin have played a leading role in the development of intelligent systems for the identification of signs and symbols. Their latest breakthrough is a compact system capable of recognizing registration plates and loading signs on vehicles in flowing traffic up to a speed of 120 km/h.

Bertram Nickolay of the Fraunhofer Institute and his team have constructed an entire family of modules that allows various components of road traffic to be recorded and automatically analyzed. The individual components can be put together in any combination. "Our ISY family solves many tasks in traffic engineering," reports project leader Nickolay, proudly. ISYCOUNT allows vehicles to be counted in any traffic situation. The data are then used as the basis for telematic systems that control traffic flow and prevent holdups. ISYTRACK combined with ISYPLATE can be used to register vehicles entering and leaving waste disposal sites or gravel pits. Unlike private cars, commercial trucks carry their identification plates in different places, and they have to be located amidst a number of other signs, such as the company logo and signs indicating the laden weight and the nature of the goods being transported. ISYSHAPE registers the shape and dimension of a vehicle and is then able to identify its type from specific features of its design, and make a rough classification of its loading.

The latest addition to the family is a software program that enhances the image recorded by the automatic recognition system: ISYFLASH is capable of determining the ideal moment at which to capture an image, even if the vehicle is traveling at high speed. This ensures the highest possible resolution and the maximum degree of certainty in identifying the registration plate. The family of modules is based on the latest methods of intelligent character recognition (ICR) combined with learning processes on the basis of soft computing. They are capable of solving complex problems, even in the presence of out-of-focus or doubtful elements such as mud on the number plate, deformations or unusual colors. Most of the currently available recognition systems are only able to identify license plates common in one or a few selected countries. They have problems if the type and color of the background changes, or if a new character font is introduced. The system developed by the IPK is already capable of recognizing signs on vehicles registered in 25 different countries; it does not matter if the vehicle is registered in Germany, the UK, Poland, Israel, South Africa or even Russia. That is because the system learns fast: given a few samples, such as actual number plates or video pictures of a new character font, the system trains itself and by the next day it is capable, for example, of reading signs written in Cyrillic script. The IPK is marketing its software through a number of international cooperating partners.

Photo (Voss Fotografie): Camera and processor are merged in a single device. This allows registration plates to be detected and identified in flowing traffic or at the entrances and exits to car parks and factory sites. License plates are made clearly visible by the optical trigger.

A further advantage of the system is that the camera and the electronic analysis circuits are incorporated in a small compact unit to make the equipment more portable and economical. The camera and the processor are merged into a single 'smart device'. This simplifies the structure, reducing the number of interfaces and hence the number of potential sources of errors. This portable apparatus possesses several features that give it the edge over other recognition systems used to identify and record moving vehicles: laser scanners are more expensive to maintain, light barriers are more susceptible to faults, and work-intensive earth-moving operations are required in order to lay induction loops in the road surface. And none of these systems are infallible - they can just as easily be triggered by a person or an animal passing through their field of recognition. ISYFLASH uses image-analysis methods to exclude such disturbances. The portable units are ideal for monitoring and regulating the access ramps and exits of high-rise car parks and industrial sites, quickly and smoothly. Police forces in a number of countries have also indicated considerable interest in the system developed by the IPK, which can be used to help track down criminals on the road. Installed in a police vehicle, the mobile ISYFLASH equipment can identify the registration numbers of other road users while on the move or from a parked position alongside a highway or at one of its exits. If a car bearing the suspect number drives past, an alarm lamp starts to flash. Other, innocent drivers are then spared the inconvenience of having to stop at police road blocks during a large-scale police operation.

Videomaut is not exactly a new idea, but it is still not ready for widespread implementation: drivers slowly steer their car through the video-control barrier, in the hope that the camera will register the number plate correctly as they move past at walking speed. If not, the barrier remains closed and the driver has to back out and join the line waiting at the regular toll booth. The intelligent recognition system developed by the researchers in Berlin is different: barriers and toll booths are obsolete, and the driver doesn't even need to brake to a slower speed. A number of countries are considering automating their highway toll systems using the software from the IPK. The advantage: no more tailbacks at the toll stations, and useful data for traffic regulation.

Please contact:
Bertram Nickolay – Fraunhofer Gesellschaft
Tel: +49 30 3 90 06 2 01
E-mail: [email protected]

Virtual Planetarium at Exhibition ‘ZeitReise’

by Igor Nikitin and Stanislav Klimenko

At the Exhibition ZeitReise/TimeJourney, from 12 May to 25 June 2000 at the Academy of Arts in Berlin, GMD's Virtual Environments Group presented a 3D installation 'Virtual Planetarium'.

The Virtual Planetarium is an educational application that uses special methods to display astronomical objects realistically, as they are visible to the astronauts of a real spacecraft, preserving the correct visible sizes of all objects for any viewpoint and using 3D models based on real astronomical data and images. The installation includes the 3200 brightest stars, 30 objects in the Solar System, an interactive map of constellations composed of ancient drawings, and a large database describing the astronomical objects textually and vocally in English and German. A stereoscopic projection system is used to create an illusion of open cosmic space. The application is destined primarily for CAVE-like virtual environment systems, giving a perception of complete immersion into the scene, and also works in simple installations using a single wall projection.

Avango Application Development Toolkit
Avango is a programming framework for building distributed, interactive virtual environment applications. It is based on SGI Performer to achieve the maximum possible performance for an application and addresses the special needs involved in application development for virtual environments. Avango uses the C++ programming language to define the objects and the scripting language Scheme to assemble them in a scene graph. Avango also introduces a dataflow graph, conceptually orthogonal to the scene graph, which is used to define interaction between the objects and to import data from external devices into the application.

A non-linear geometrical model was used to represent the objects in the Solar System, overcoming the problem of 'astronomical scales'. This problem consists in the fact that the size of the Solar System (the diameter of Pluto's orbit) and the sizes of small planets (Phobos, Deimos) differ by a factor of more than 10^9; as a result, the single-precision 4-byte real numbers used in standard graphical hardware are insufficient to represent the coordinates of objects in such a scene. A special non-linear transformation mediates between the real-scale astronomical model and its virtual analog, mapping astronomical double-precision sizes ranging from 1 to 10^10 km into virtual environment single-precision sizes ranging from 1 to 20 m. The transformation was chosen to satisfy two requirements: (1) it preserves the actual angular sizes of planets for any viewpoint, to achieve realism of presentation; (2) it prohibits penetration inside the planets, to implement a no-collision algorithm. Analogous transformations are also applied to the velocity of the observer, to make the exploration of near-Earth space and distant planets possible in one demo session.

Navigation in the CAVE is performed using an electromagnetic 3D pointing device (stylus), or a joystick/mouse in simpler installations. The user can manipulate a green ray in the virtual model, choosing the direction of motion and objects of interest. A navigation panel, emulating an HTML browser, is used to display information about selected objects and to choose the destination of the journey.

Sound accompaniment includes two musical themes, representing the rest state and fast motion, which are mixed depending on the velocity (courtesy of Martin Gerke and Thomas Vogel), and voices describing the surrounding objects (courtesy of Christian Brückner and Ulrike Tzschöckell). Another variant of vocal representation uses the IBM ViaVoice text-to-speech conversion tool. The sound scheme can be selected and reconfigured at runtime.

Sources of planetary images and stellar data are public Internet archives and databases available at NASA (http://photojournal.jpl.nasa.gov, http://adc.gsfc.nasa.gov) and the US Geological Survey (http://wwwflag.wr.usgs.gov/). Images were imposed on a spherical geometry as textures and were illuminated by a composition of bright solar and dim ambient lights. Positions, intensities and colors of stars in the model correspond to astronomical data from the catalogue. Additionally, high-resolution images of constellations can be displayed in the sky. The source of these images is an ancient stellar map (Hemisphaerium coeli boreale / Hemisphaerium coeli australe: in quo fixarum loca secundum eclipticae ductum ad an[n]um 1730 completum exhibentur / a Ioh. Gabriele Doppelmaiero math. prof. publ. Acad. imper. leopoldino-carolinae naturae curiosorum et Acad. scient. regiae prussicae socio; opera Ioh. Baptistae Homanni sac. caes. maj. geogra. – Norimbergae [Nürnberg]: [Homännische Erben], [erschienen 1742].), courtesy of Kaiserslautern University library.

Figures: School excursion; Navigation panel; Images of constellations.

Links:
http://heimat.de/zeitreise
http://viswiz.gmd.de/~nikitin/stars

Please contact:
Igor Nikitin – GMD
Tel: +49 2241 14 2137
E-mail: [email protected]
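A minimal sketch of the kind of non-linear rescaling described above follows: astronomical distances are compressed logarithmically into the 1-20 m virtual range and each body's radius is scaled by the same factor as its distance, so its angular size from the viewpoint is unchanged. The concrete mapping, function names and constants below are illustrative assumptions, not the transformation actually used in the installation.

```python
# Hedged sketch of log-compressing astronomical distances into a small
# virtual range while preserving each body's apparent (angular) size.
import math

D_MIN_KM, D_MAX_KM = 1.0, 1e10     # assumed astronomical distance range
V_MIN_M, V_MAX_M = 1.0, 20.0       # assumed virtual distance range

def compress(distance_km: float) -> float:
    """Map an astronomical distance to a virtual one on a log scale."""
    d = min(max(distance_km, D_MIN_KM), D_MAX_KM)
    t = math.log10(d / D_MIN_KM) / math.log10(D_MAX_KM / D_MIN_KM)
    return V_MIN_M + t * (V_MAX_M - V_MIN_M)

def place_body(observer, body_pos, body_radius_km):
    """Return virtual position and radius; the ratio radius/distance is kept."""
    dx = [b - o for b, o in zip(body_pos, observer)]
    dist = math.sqrt(sum(c * c for c in dx))
    vdist = compress(dist)
    scale = vdist / dist
    vpos = [c * scale for c in dx]            # same direction, compressed distance
    return vpos, body_radius_km * scale       # same angular size as the real body

# Toy usage: the Moon (~384,400 km away, radius ~1,737 km) seen from Earth.
vpos, vradius = place_body([0, 0, 0], [384400.0, 0.0, 0.0], 1737.0)
print(vpos, vradius)
```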


EVENTS

SPONSORED BY ERCIM
6th Eurographics Workshop on Virtual Environments

by Robert van Liere

The 6th Eurographics Workshop on Virtual Environments covered a broad range of topics, from practical calibration procedures for Augmented Reality and the effects of group collaboration on presence in a collaborative virtual environment to a virtual planetarium, reflecting the expanding scope of virtual reality. The workshop, organized by the Eurographics Association, was held June 1-2 at CWI in Amsterdam. ERCIM acted as a co-sponsor to the workshop.

The workshop was attended by 56 participants, which was much more than initially expected. In addition, the organizers were very pleased that the participants came from all over the world. This reflects the growing interest in virtual environments as a field of research.

The keynote speaker – Michael Zyda from the Naval Postgraduate School in Monterey, California – gave a very interesting talk on the Modeling, Virtual Environments and Simulation Program. The talk focussed on research directions in the field of entertainment. Zyda gave his views on technologies for immersion (low-cost 3D image generation, spatial tracking, platform utilization, multimodal sensory presentation), networked simulation (high … networks, dynamically extensible network software architectures, area of interest management, … reduction) and computer-generated autonomy (agent-based simulation, adaptability, learning, human behavior representations). Zyda gave examples of how entertainment in virtual environments could take advantage of these technologies and which bottlenecks still have to be overcome.

The workshop ended at the Dutch supercomputing center SARA, where a transatlantic collaborative virtual walk-through demonstration was given. Two CAVEs, one at SARA in Amsterdam and one at the Electronic Visualization Laboratory in Illinois, were linked with a 155 Mbit ATM link. In one demonstration, participants could collaboratively walk through a virtual building which was designed by the Dutch architect Remco Koolhaas.

The atmosphere at the workshop was very relaxed and enjoyable; the participants particularly enjoyed the social event at the end of the first day: a traditional Indonesian Rijsttafel in the heart of Amsterdam. (In fact, the social event clearly affected the number of participants at the first session of the second day!)

Links:
Conference website: http://www.cwi.nl/egve00/

Please contact:
Robert van Liere – CWI
Tel: +31 20 592 4118
E-mail: [email protected]

Trinity College Dublin hosted the Sixth European Conference on Computer Vision – ECCV'2000

by David Vernon

Ten years ago, the inaugural European Conference on Computer Vision was held in Antibes, France. Since then, ECCV has been held biennially under the auspices of the European Vision Society at venues around Europe. This year, the privilege of organizing ECCV 2000 fell to Ireland and, from 26 June to 1 July, Trinity College Dublin hosted what has become one of the most important events in the calendar of the Computer Vision community.

The Trinity campus, set in the heart of Dublin, is an oasis of tranquility, and its beautiful squares, elegant buildings, and tree-lined playing fields provided the perfect setting for the conference. Add to this the superb facilities in the Trinity Conference Centre, six days of uninterrupted sunshine, and 280 delegates, and the stage was set for a memorable week.

The first day of the week was devoted to a series of tutorials on focussed topics, given by international experts in the field. These afforded many delegates an ideal opportunity of getting a snap-shot view of the state of the art in a perhaps unfamiliar subject.

The next four days comprised the conference proper, which was opened by the Vice-Provost of Trinity College, Professor David McConnell, who set the tone for the coming days with his well-chosen words on the nature and relevance of perception and the need for computational models of vision to enable both industrial applications and research in other scientific disciplines. Because ECCV is a single-track conference, it severely limits the number of papers that can be accepted and ensures that they are of the highest quality. During the week, forty-four papers were presented at the podium and seventy-two were presented during daily poster sessions. The proceedings were published by Springer-Verlag as two volumes, each with approximately 900 pages (LNCS Vols. 1842 & 1843).

The final day was devoted to four Workshops on topics ranging from visual …, through 3-dimensional reconstruction of large objects such as buildings, to empirical evaluation of computer vision algorithms.

Whilst the technical excellence of the scientific programme is undoubtedly the most important aspect of ECCV, there are other facets to an enjoyable and productive conference, facets which should engender conviviality, discourse, and interaction - in other words, the social programme! This was extensive and varied, and it featured the renowned Trinity Gala Evening with exhibitions of excellent Irish cuisine, music, and dancing; and a reception in the Long Room (Trinity's original library and perhaps one of the most beautiful rooms in the College), where delegates were regaled by the Librarian on the history and future plans of the Trinity College Dublin Library. They also had an opportunity to see the Book of Kells. The conference dinner was set in the splendour of the Royal Hospital Kilmainham, and some delegates also enjoyed a trip to Johnny Fox's pub - apparently the highest pub in Ireland - for another evening of Irish music and dance.

The goal in organizing ECCV 2000 at Trinity was to ensure that delegates would depart with great memories, many new friends, and inspirational ideas for future research; that they also now think the sun always shines in Ireland is an additional bonus!

Links:
Conference Website: http://www.eccv2000.tcd.ie/

Please contact:
David Vernon – Conference Chair, ECCV'2000
National University of Ireland, Maynooth
Tel: +971 6 535 5355
E-mail: [email protected]

SPONSORED BY ERCIM
Fifth Workshop of the ERCIM Working Group on Constraints
by Eric Monfroy

The Fifth Workshop of the ERCIM Working Group on Constraints, held in Padova, Italy, 19-21 June 2000, gathered some 35 experts from Europe and the United States to discuss issues related to applications of constraints. The workshop was organized by Krzysztof R. Apt (CWI, Amsterdam), Eric Monfroy (CWI, Amsterdam), and Francesca Rossi (University of Padova).

The ERCIM Working Group on Constraints (coordinated by Krzysztof R. Apt) was founded in the fall of 1996. Currently, it comprises 16 ERCIM institutes and associated organizations. This group brings together ERCIM researchers that are involved in research on the subject of Constraints.

The Padova workshop was the fifth meeting of the group. It took place at the Department of Pure and Applied Mathematics of the University of Padova, Italy, on 19-21 June 2000. It was the fourth workshop of the group jointly organized with the CompulogNet (Esprit Network of Excellence on Computational Logic) area on 'Constraint Programming', coordinated by Francesca Rossi.

This workshop covered all aspects of constraints (solving and propagation algorithms, programming languages, new formalisms, novel techniques for modeling, experiments, etc.), with particular emphasis on applications. The authors of theoretical papers were encouraged to investigate and discuss a possible use of their work in practice.

The workshop attracted some 35 researchers and ran for 3 days. The call for presentations attracted a number of submissions, out of which, given the time limitations, we could accept only 20. Besides the accepted papers, we had three invited talks on various application areas for constraints:
• Thom Fruehwirth from the Ludwig-Maximilians-University of Munich gave a presentation on 'Applications of Constraint Handling Rules',
• Alan Borning from the University of Washington discussed 'Applying Constraints to Interactive Graphical Applications',
• Michel Scholl from INRIA gave a presentation on 'Constraint databases for spatio-temporal applications'.

The workshop was considered to be very successful and led to a number of interesting, lively discussions on the topics raised during the presentations. The workshop provided a useful platform that allowed all of us, practitioners as well as researchers involved in theoretical work, to exchange information, learn about each other's work, and discuss possible future cooperation.

The detailed program of the workshop, the proceedings in the form of extended abstracts of the presentations, and more information about the Working Group are electronically available through the Web site of the ERCIM Working Group on Constraints.

Links:
ERCIM Working Group on Constraints: http://www.cwi.nl/projects/ercim-wg.html

Please contact:
Eric Monfroy – CWI
Tel: +31 20 592 4195
E-mail: [email protected]


CALL FOR PAPERS
International Journal Universal Access in the Information Society
1st Issue: February 2001

Universal Access in the Information Society (UAIS) is an international, interdisciplinary refereed journal that solicits original research contributions addressing the accessibility, usability, and, ultimately, acceptability of Information Society Technologies by anyone, anywhere, at any time, and through any media and device. The journal publishes research work on the design, development, evaluation, use, and impact of Information Society Technologies, as well as on standardization, policy, and other non-technological issues that facilitate and promote universal access. Paper submissions, in English, should report on theories, methods, tools, empirical results, reviews, case studies, and best practice examples, in any application domain, and should have a clear focus on universal access. In addition to the above, the journal will host special issues, book reviews and letters to the editor, news from the Information Society Technologies industry and from standardization and regulatory bodies, announcements (eg, conferences, seminars, presentations, exhibitions, education & curricula, awards, new research programs) and commentaries (eg, about new legislation). The journal will be published by Springer.

Further information:
Journal's website: http://link.springer.de/journals/uais/
Editor-in-Chief: Constantine Stephanidis – FORTH-ICS
E-mail: [email protected]

CALL FOR PARTICIPATION
8th International Conference on Database Theory
London, 4-6 January 2001

ICDT is a biennial international conference on theoretical aspects of databases and a forum for communicating research advances on the principles of database systems. Initiated in Rome in 1986, it was merged in 1992 with the MFDBS symposium series initiated in Dresden in 1987. ICDT aims to attract papers of high quality, describing original ideas and new results on theoretical aspects of all forms of database systems and database technology. While special emphasis is put on new ideas and directions, papers on all aspects of database theory and related areas are presented. ICDT 2001 will be preceded by a new International Workshop on Web Dynamics, which ICDT participants are most welcome to attend.

Further information: http://www.dcs.bbk.ac.uk/icdt2001/

CALL FOR PARTICIPATION
37th Dutch Mathematical Congress
Amsterdam, 19-20 April 2001

CWI and the Vrije Universiteit in Amsterdam will jointly organise the 37th Dutch Mathematical Congress. This congress, under the auspices of the Dutch Mathematical Society, is the yearly meeting point in The Netherlands of Dutch mathematicians. Invited talks will be given on …, statistics, analysis and discrete math/…. Minisymposia will be organised on the same four subjects, supplemented with applied math, systems theory, numerical analysis, computer algebra, financial math, and optimization of industrial processes. Location: Vrije Universiteit in Amsterdam.

Further information: http://www.cwi.nl/conferences/NMC2001

CALL FOR PARTICIPATION
Optimal Control and Partial Differential Equations – Innovations and Applications: Conference in Honor of Professor Alain Bensoussan's 60th Anniversary
Paris, 4-5 December 2000

The conference will be divided into two parts. The first day, dedicated to fundamental research, will be animated by internationally well-known scientists who will present the state of the art in the fields where Alain Bensoussan's contributions have been particularly important: filtering and control of stochastic systems, variational problems, applications to economy and finance, etc. The second day will be devoted to the tight links between research and applications and to the impact of research on the 'real world', especially through the valorization of research results. This theme will be highlighted through three round tables on the following topics: space, spatial applications, and science and technology for information and communication. A fourth panel will be devoted to Europe as a privileged example of the collaboration between research and industry.

Alain Bensoussan was the first president of ERCIM, from 1989 to 1994.

Further information: http://www.inria.fr/AB60-eng.html

CALL FOR PARTICIPATION
VisSym '01 – Joint Eurographics - IEEE TCVG Symposium on Visualization
Ascona, Switzerland, 28-30 May 2001

Papers and case studies will form the scientific content of the event. Both research papers and case studies are invited that present research results from all areas of visualization. Case studies report on practical applications of visualization to data analysis. Topics for research papers include, but are not limited to: flow visualization, volume rendering, surface extraction, information visualization, internet-based visualization, database visualization, human factors in visualization, user interaction techniques, visualization systems, large data sets, multi-variate visualization, multi-resolution techniques.

Further information: http://www.cscs.ch/vissym01/cfp.html


CALL FOR PAPERS
IEA/AIE-2001 – The Fourteenth International Conference on Industrial & Engineering Applications of Artificial Intelligence & Expert Systems
Budapest, Hungary, 4-7 June 2001

IEA/AIE continues the tradition of emphasizing applications of artificial intelligence and expert/knowledge-based systems to engineering and industrial problems. Authors are invited to submit papers presenting the results of original research or innovative practical applications relevant to the conference. Practical experiences with state-of-the-art AI methodologies are also acceptable when they reflect lessons of unique value to the conference attendees.

Further information: http://www.sztaki.hu/conferences/ieaaie2001/

CALL FOR PAPERS
International Conference on Shape Modelling and Applications
Genova, Italy, 7-12 May 2001

Reasoning about shape is a common way of describing and representing objects in engineering, architecture, medicine, biology, physics and in daily life. Modelling shapes is part of both cognitive and creative processes, and from the outset models of physical shapes have satisfied the desire to see the result of a project in advance. The programme will include one day of tutorials and three days of papers. Special Sessions will be organized on relevant topics. The Conference is organized by the Istituto per la Matematica Applicata of the CNR.

Further information: http://SMI2001.ima.ge.cnr.it/

SPONSORED BY ERCIM
IJCAR 2001 – The International Joint Conference on Automated Reasoning
Siena, 18-23 June 2001

IJCAR is the fusion of three major conferences in Automated Reasoning (CADE – The International Conference on Automated Deduction, TABLEAUX – The International Conference on Automated Reasoning with Analytic Tableaux and Related Methods, and FTP – The International Workshop on First-Order Theorem Proving). These three events will join for the first time at the IJCAR conference in Siena in June 2001. IJCAR 2001 invites submissions related to all aspects of automated reasoning, including foundations, implementations, and applications. Original research papers and descriptions of working automated deduction systems are solicited.

Further information: http://www.dii.unisi.it/~ijcar/

Order Form

If you wish to subscribe to ERCIM News or if you know of a colleague who would like to receive regular copies of ERCIM News, please fill in this form and we will add you/them to the mailing list.

Send or fax this form to:
ERCIM NEWS
Domaine de Voluceau, Rocquencourt, BP 105, F-78153 Le Chesnay Cedex
Fax: +33 1 3963 5052
E-mail: [email protected]

Name:
Organisation/Company:
Address:
Post Code:
City:
Country:
E-mail:

Data from this form will be held on a computer database. By giving your email address, you allow ERCIM to send you email.

You can also subscribe to ERCIM News and order back copies by filling out the form at the ERCIM web site at http://www.ercim.org/publication/Ercim_News/


CALL FOR PAPERS
UAHCI 2001 – 1st International Conference on Universal Access in Human-Computer Interaction
New Orleans, LA, USA, 5-10 August 2001

The 1st International Conference on Universal Access in Human-Computer Interaction (UAHCI 2001), held in cooperation with HCI International 2001, aims to establish an international forum for the exchange and dissemination of scientific information on theoretical, methodological and empirical research that addresses all issues related to the attainment of universal access in the development of interactive software. The conference aims to attract participants from a broad range of disciplines and fields of expertise, including HCI specialists, user interface designers, computer scientists, software engineers, ergonomists and usability engineers, Human Factors experts and practitioners, organizational psychologists, system/product designers, sociologists, policy and decision makers, scientists in government, industry and education, as well as assistive technology and rehabilitation experts. The conference solicits papers reporting results of research work on, or offering insights into open research issues and questions in, the design, development, evaluation, use, and impact of user interfaces, as well as standardization, policy and other non-technological issues that facilitate and promote universal access.

Deadline for paper abstract submission: 5 November 2000
Contact and submission details: http://uahci.ics.forth.gr/

CALL FOR PAPERS
ICANNGA 2001 – 5th International Conference on Artificial Neural Networks and Genetic Algorithms
Prague, Czech Republic, 22-25 April 2001

The focus of ICANNGA is on theoretical aspects and practical applications of computational paradigms inspired by natural processes, especially artificial neural networks and evolutionary algorithms. ICANNGA 2001 will include invited plenary talks, contributed papers, a poster session, tutorials and a social program. The conference is organized by the Institute of Computer Science, Academy of Sciences of the Czech Republic.

Forthcoming Events sponsored by ERCIM
ERCIM sponsors up to eleven conferences, workshops and summer schools per year. The funding is in the order of 2000 Euro. Conditions and an online application form are available at: http://www.ercim.org/activity/sponsored.html
• FORTE/PSTV 2000 – Formal Description Techniques for Distributed Systems and Communication Protocols, Pisa, Italy, 10-13 October 2000, http://forte-pstv-2000.cpr.it/
• Sofsem 2000 – Current Trends in Theory and Practice of Informatics, Prague, 25 November - 2 December 2000, http://www.sofsem.cz/
• IJCAR – International Joint Conference on Automated Reasoning, Siena, Italy, 18-23 June 2001, http://www.dii.unisi.it/~ijcar/
• MFCS'2001 – 26th Symposium on Mathematical Foundations in Computer Science, Marianske Lazne, Czech Republic, 27-31 August 2001, http://math.cas.cz/~mfcs2001/

ERCIM News is the in-house magazine of ERCIM. Published quarterly, the newsletter reports on joint actions of the ERCIM partners, and aims to reflect the contribution made by ERCIM to the European Community in Information Technology. Through short articles and news items, it provides a forum for the exchange of information between the institutes and also with the wider scientific community. ERCIM News has a circulation of 7000 copies.

ERCIM News online edition is available at http://www.ercim.org/publication/ERCIM_News/

ERCIM News is published by ERCIM EEIG, BP 93, F-06902 Sophia-Antipolis Cedex, Tel: +33 4 9238 5010, E-mail: [email protected]
ISSN 0926-4981

Director: Jean-Eric Pin
Central Editor: Peter Kunz, E-mail: [email protected], Tel: +33 1 3963 5040

Local Editors:
CLRC: Martin Prime, E-mail: [email protected], Tel: +44 1235 44 6555
CRCIM: Michal Haindl, E-mail: [email protected], Tel: +420 2 6605 2350
CWI: Henk Nieland, E-mail: [email protected], Tel: +31 20 592 4092
CNR: Carol Peters, E-mail: [email protected], Tel: +39 050 315 2897
FORTH: Constantine Stephanidis, E-mail: [email protected], Tel: +30 81 39 17 41
GMD: Dietrich Stobik, E-mail: [email protected], Tel: +49 2241 14 2509
INRIA: E-mail: [email protected], Tel: +33 1 3963 5484
SICS: Kersti Hedman, E-mail: [email protected], Tel: +46 8633 1508
SINTEF: Truls Gjestland, E-mail: [email protected], Tel: +47 73 59 26 45
SRCIM: Gabriela Andrejkova, E-mail: [email protected], Tel: +421 95 622 1128
SZTAKI: Erzsébet Csuhaj-Varjú, E-mail: [email protected], Tel: +36 1 209 6990
TCD: Carol O'Sullivan, E-mail: [email protected], Tel: +353 1 60 812 20
VTT: Pia-Maria Linden-Linna, E-mail: [email protected], Tel: +358 0 456 4501

Free subscription
You can subscribe to ERCIM News free of charge by:
• sending e-mail to your local editor
• posting paper mail to the ERCIM office (see address above)
• filling out the form at the ERCIM website at http://www.ercim.org/

IN BRIEF

CWI Incubator BV helps researchers starting a company: In July 2000, CWI founded CWI Incubator BV (CWI Inc). CWI Inc will generate high-tech spin-off companies, based on results of CWI research. Earnings from CWI Inc will be re-invested in fundamental research at CWI. During the last 10 years, CWI created 10 spin-off companies with approximately 500 employees. Generation of spin-off companies is an important method for institutes like CWI to convert fundamental knowledge into applications in society and to create high-level employment at the same time. CWI expects CWI Inc to attract enterprising researchers and to have a positive influence on the image of mathematics and computer science for students. For more information, see: http://www.cwi.nl/cwi/about/spin-offs.

CWI – Data Distilleries raises $24 million: One of CWI's successful spin-off companies is Dutch CRM developer Data Distilleries BV. In July 2000, Data Distilleries closed a funding round to raise $24m. The Amsterdam-based company will use the cash to expand into the UK and six other European countries and says it plans to be in the US market by the end of next year. Data Distilleries has developed technology that enables users with customer information from multiple channels to analyze it and draw up corresponding marketing and promotions customized for individual customers.

CWI upgraded its connection to SURFnet to gigabit level on 8 August: SURFnet is the national Dutch computer network for research and education organisations. CWI is the first to get this very fast access to the SURFnet backbone, which will have a capacity of 80 Gbit/s next year.

INRIA's 2000-2003 four-year contract signed: Roger-Gérard Schwartzenberg, Minister of Research, and Christian Pierret, Minister of Industry, signed INRIA's 2000-2003 four-year contract with Bernard Larrouturou, Chairman of INRIA. The four-year contract provides for a significant increase in INRIA personnel, which will go from 755 to 1180 as of 2003, as well as in funding to keep pace with the augmentation of personnel. As early as 2001, the Institute's funding will be raised by 9.15 million Euros and the staff will be increased by 180 employees. France will make a significant effort for research in information and communication technology in the coming years, in terms of both means and staff, to further and consolidate the role of France, and in the larger context Europe, in terms of innovation and economic competitiveness. Along with other institutions and university departments, INRIA is at the heart of the French state research system upon which this policy is based. Through this contract, the State, aiming for ambitious goals, has conferred on INRIA the labour and material resources for well-planned growth with precise strategic objectives.
Photo: Alexandre Eidelman/INRIA. From left: Christian Pierret, French Minister of Industry, Roger-Gérard Schwartzenberg, French Minister of Research, and Bernard Larrouturou, Chairman of INRIA.

Patrick Valduriez, Research Director at INRIA from the Caravel research team, and his co-authors C. Bobineau, L. Bouganim and P. Pucheral from the PRISM Laboratory of Versailles University have been distinguished by the 'Best Paper Award' at the VLDB (Very Large Database) Conference for their paper 'PicoDBMS: Scaling down Database Techniques for the SmartCard'. VLDB, one of the most prominent conferences in the database field, took place in Cairo in September 2000.

The Department of Computer Science in Trinity College Dublin hosted the first Irish Workshop on Eye-Tracking in May this year, in co-operation with the psychology department of University College Dublin. The analysis of eye movements can provide valuable insights which can be exploited in a range of research areas, and an interchange of ideas between the different fields is highly desirable. This workshop brought together an interdisciplinary group of researchers to share experiences in using eye-tracking hardware and in analysing eye movements. Topics included: computer interfaces for the disabled, eye-tracking for video encoding, map-processing and other cognitive tasks, the treatment of schizophrenia, and applications in computer graphics. Link: http://isg.cs.tcd.ie/iwet/

VTT Information Technology sells its VIP Intelligent Pagination system to Unisys Corp.: VTT Information Technology has sold the VIP Intelligent Pagination system it developed to Unisys Corporation. The VIP software has been developed to automate the layout of classified advertisements in newspapers. VTT Information Technology has already supplied the software to a number of European newspapers, including the Helsingin Sanomat in Finland and Dagens Nyheter in Sweden. Unisys will continue to develop the software in Finland. The product complements the US company's own, globally-marketed advertising systems which, until now, lacked a pagination function.

ERCIM – The European Research Consortium for Informatics and Mathematics is an organisation dedicated to the advancement of European research and development in information technology and applied mathematics. Its national member institutions aim to foster collaborative work within the European research community and to increase co-operation with European industry.

Central Laboratory of the Research Councils Rutherford Appleton Laboratory, Chilton, Didcot, Oxfordshire OX11 0QX, United Kingdom Tel: +44 1235 82 1900, Fax: +44 1235 44 5385 http://www.cclrc.ac.uk/

Consiglio Nazionale delle Ricerche, IEI-CNR Via Alfieri, 1, I-56010 Pisa, Italy Tel: +39 050 315 2878, Fax: +39 050 315 2810 http://www.iei.pi.cnr.it/

Czech Research Consortium for Informatics and Mathematics FI MU, Botanicka 68a, CZ-602 00 Brno, Czech Republic Tel: +420 2 688 4669, Fax: +420 2 688 4903 http://www.utia.cas.cz/CRCIM/home.html

Centrum voor Wiskunde en Informatica Kruislaan 413, NL-1098 SJ Amsterdam, The Netherlands Tel: +31 20 592 9333, Fax: +31 20 592 4199 http://www.cwi.nl/

Danish Consortium for Information Technology DANIT, c/o CIT, Aabogade 34, DK-8200 Aarhus N, Denmark Tel: +45 8942 2440, Fax: +45 8942 2443 http://www.cit.dk/ERCIM/

Foundation for Research and Technology – Hellas, Institute of Computer Science, P.O. Box 1385, GR-71110 Heraklion, Crete, Greece Tel: +30 81 39 16 00, Fax: +30 81 39 16 01 http://www.ics.forth.gr/

GMD – Forschungszentrum Informationstechnik GmbH, Schloß Birlinghoven, D-53754 Sankt Augustin, Germany Tel: +49 2241 14 0, Fax: +49 2241 14 2889 http://www.gmd.de/

Institut National de Recherche en Informatique et en Automatique B.P. 105, F-78153 Le Chesnay, France Tel: +33 1 3963 5511, Fax: +33 1 3963 5330 http://www.inria.fr/

Swedish Institute of Computer Science Box 1263, S-164 29 Kista, Sweden Tel: +46 8 633 1500, Fax: +46 8 751 7230 http://www.sics.se/

Swiss Association for Research in Information Technology Dept. Informatik, ETH-Zentrum, CH-8092 Zürich, Switzerland Tel: +41 1 632 72 41, Fax: +41 1 632 11 72 http://www.sarit.ch/

Stiftelsen for Industriell og Teknisk Forskning ved Norges Tekniske Høgskole, SINTEF Telecom & Informatics, N-7034 Trondheim, Norway Tel: +47 73 59 30 00, Fax: +47 73 59 43 02 http://www.informatics.sintef.no/

Slovak Research Consortium for Informatics and Mathematics, Dept. of Computer Science, Comenius University, Mlynska Dolina M, SK-84248 Bratislava, Slovakia Tel: +421 7 726635, Fax: +421 7 727041 http://www.srcim.sk

Magyar Tudományos Akadémia – Számítástechnikai és Automatizálási Kutató Intézete, P.O. Box 63, H-1518 Budapest, Hungary Tel: +36 1 4665644, Fax: +36 1 466 7503 http://www.sztaki.hu/

Trinity College, Department of Computer Science, Dublin 2, Ireland Tel: +353 1 608 1765, Fax: +353 1 677 2204 http://www.cs.tcd.ie/ERCIM

Technical Research Centre of Finland, VTT Information Technology, P.O. Box 1200, FIN-02044 VTT, Finland Tel: +358 9 456 6041, Fax: +358 9 456 6027 http://www.vtt.fi/