Thomas Lengauer, Dietmar Schomburg, Michael S. Waterman (editors):

Molecular Bioinformatics

Dagstuhl-Seminar-Report; 46 07.09.-11.09.92 (9237) ISSN 0940-1121 Copyright© 1992 by IBFI GmbH,Schloß Dagstuhl, W-6648 Wadern, Germany Tel.: +49-6871 - 2458 Fax: +49-6871 - 5942 Das InternationaleBegegnungs- und Forschungszentrumfür lnformatik(IBFI) ist eine gemein- nützige GmbH. Sie veranstaltet regelmäßigwissenschaftliche Seminare, welche nach Antrag der Tagungsleiterund Begutachtungdurch das wissenschaftlicheDirektorium mit persönlich eingeladenen Gästen durchgeführt werden. Verantwortlich für das Programm: Prof. Dr.-Ing. Jose Encamagao, Prof. Dr. Winfried Görke, Prof. Dr. Theo Härder, Dr. Michael Laska, Prof. Dr. , Prof. Ph. D. Walter Tichy, Prof. Dr. Reinhard Wilhelm (wissenschaftlicherDirektor) Gesellschafter: Universität des Saarlandes, Universität Kaiserslautern, Universität Karlsruhe, Gesellschaft für Informatik e.V., Bonn Träger: Die Bundesländer Saarland und Rheinland-Pfalz Bezugsadresse: Geschäftsstelle Schloß Dagstuhl Informatik, Bau 36 Universität des Saarlandes W - 6600 Saarbrücken Germany Tel.: +49 -681 - 302 4396 Fax: +49 -681 - 302 - 4397 e-mail: [email protected]sb.de Molecular Bioinformatics

Thomas Le11gauer,Dietlnar Schomburg. 1\=Ii(:l1aelVVato1'111an (editors)

Dagstuhl-~Semi11ar-Report, S(pt.emb<-I7 - 11 1992 (9237) Workshop on Molecular Bioinformatics

Organizers: Thomas Lengauer (GMD, Schloss Birlinghoven/), Dietmar Schomburg (GBF, Braunschweig), Michael S. Waterman (USC, Los Angeles) September 7 - 11, 1992

Molecular Bioinformatics is a notion that. one can a.ssignto a.n area.in applied com- puter science which is rapidly gaining significance. Roughly, this area is conc'ernecl with the development of methods and tools for analyzing, nn(lerst.an('ling,reasoning about and, eventually, designing large biomole('ul<-~sauch as DNA. R NA, and proteins with t.he aid of the computer. With the knowledge in molecula.r biology increasing at an explosive rate, and data on genomes and their products being (.'ollect.e(.lat tremendous speeds. :V'lolecular Bioinformatics becomes an important challenge to applied computer scientist.s. Before this background, the Dagstuhl Seminar on Molecular Bioinformatics brought together experts from all over the world t.hat are working on algorit.hn1ic issues in this eld. The Workshop was interdiscplina.ry, with people frorn mole<'.u lar biology, computer science, and applied 1cnathe1naticsa.ttending. The worksliop focussed on the following topics:

0 Alignment of biomolecular sequences(DNA, RNA, Proteins),

o Modeling of large biomolecules, including the prediction a11danalysis of secon- dary and higher-levelstructure as well as spatial conformations(folding),

0 Molecular dynamics and simulations of interactions between biomolecules,

o Interpreting nucleotide sequencesand their role in gene regulation,

0 Reading genomic sequences. Besidesthe presentations,there were two organizedevening discussion sessions on the topics PAM matrices and Computer-AidedDrug Design. The experimentof bringing togetherresearchers with wide]y varying bacl~:g'roun(ls to discuss an exciting new interdisciplinary field was successful. The attendees dis- cussedlively and often controversially,developed a senseof identity for the new field during the workshop, and went back home with new insights, problemsand ideas. For the German researchers,the workshop was an ideal preparation for forming cooperationswithin the new funding program Molecular Bioinformatics that had just beenannounced by the BM FT (GermanMinistery for Researchand Technology).A few participants evaluatedthe workshopas their most productive workshop experience. We are especially grateful to the Dagstuhl office and team for their excellent organization in preparing and conducting the workshop, as well as their always

Z enga.ge(land personalsupport.» in all matters. The cordial a.t.rnosphe~rein the .S'(fhIol$ was an essential ingredient of the Sll(('(_'5sof this workshop. We are also grateful to NSF for providing a grant for nancial support. of int..ercontinent.altravel of the (7.8. based participants of this workshop. Program

Lengauer

Morning Session Apostolico

hdanber Chang Giegerich Naor AfternoonSession

WatermanMonday, September 7 Welcome and Introductory Remarks Sequence Alignment VingronChair: Ga.d Landau Structuring Sequence Data Banks for lnstantant-éousPa.ra.ll<-I Search AapproxinialeieString Ma.t<liir1g:Applications Blum.»\ppro.\'i1na.l'e Mal cliing with (i'(msta.ntliraclion l:Crror linaihedclingSequence Analysis in a Functional Programming Environ ment hdanberRepresenting Suboptimal Aligments Chair: Gaston Gonnet Parametric SequenceAlignment Interpreta.tion of Pa1'a.1net1'icAlignment Plots On Locally Optimal Alignments in Genetic Sequences Approximate String Matching: Algorithms Morning Session Karp Tuesday, September 8 Shamir Physical Mapping Cl1a.ir: Gene Lawler Chang The Physica.l Mapping Problem Physical Mapping of DNA and lnterval G1°aplis Physical Mapping in Practice Gonnet Sequences and Trees

Warn ow Towards the Best PossibleTheory a.nd Algorithms for Se- quence Alignment New problems in l_'3volut.ionaryTree (Tonstruction Afternoon Session Secondary and Tertiary Structure Schomburg Chair: Tim Havel Sander Computer-Ai(le

Chan

Koch Wednesday, September 9 Tertiary Structure Miyano ()7hair: (.i.'el1risSander Distance (_ileomct1'_yand Homology Modeling Prediction of the 'le1'tiaryStructure of Globular Proteins over l)is-- crete (,i01lii()llIlEl.il01lil.lSpaces Morning Session A Lattice l£nuri1c*rationApproach to Protein Folding I)ress Secondary Structure Graph Theo1°etic.alDescription of Sheet Topologies Sankoff Machine Discovery by Decision 'l1'eesover Regular Pat1.crns von Haeseler Gonnet Thursday, September 10 Sequences Afternoon Session I)ress Chair: Udi l\lanber ls Darwinism a Falsiiiable Tlieory? Methods in Sequeiice .='\nalysis Schmidt Climbing a Tree Tlirough the Window Reconstructing Phylogenetic Trees Zuker Text Searching Algorithms in Darwin Chair: Temple Smith Statistical Geometry in SequenceSpace TaylorSecondary Structure Problems in Coiledcoil Proteins RNA Secondary Structure Modeling Molecular Graphics EveningSessionDelegate Analysis from a Macromolecular Graphics l1rt.e1fa.ce:Ex amples from SequenceAnalysis Problem Areas for Interaction BetweenBiochemistry and Compu- Havel ter Science NMR Spectroscopyfor Determination of Protein Structure Crippen Methods of Drug Design

Friday, September 11 Tertiary Structure Morning Session Chair: Gordon Crippen Schulze-Kremer Applications of Articial Intelligenceand MachineLearning to Pro- tein Structure Analysis Sippl Hide and Seek on a Polyprotein Kaden Double-Point Chains in Proteins and InverseProtein Folding Schneider Sequence-StructureRelationships in the Twighlight Zone Structuring Sequence Data Banks for Instantaneous Parallel Searching Alberto Apostolico, University of Padua Current molecular sequence data. ba.nksconsist mainly of raw sequenceswith some annotation. While such a basic information will hardly be forfait.ed ever in the future, auxiliary data structures are being gradually introduced and studied which facilitate various kinds of searchesand ('omparisons.For some such manipulations. serial computation is inadequate,so that efficient parallel methods are sought. We present a data structure that supports constant-time implementation on a CRCVV PRAM of exact searchesfor arbitrary patterns into arbitrary substringsof a sequence data. bank.

On the Accurate Notion of Locally Optimal Alignments and Subalignments in Genetic Sequences Norbert Blum, University of Bonn We review old and new results with respectto locally optimal alignmentsand sub- alignments in genetic sequences.The main theorem is that the subgraphof the edit. graph, containing exactly the locally optimal subalignments,can be computed in ('9(nmlog(_n+ m)) time, in the casethat the underlying cost functions are concave.

A Lattice Enumeration Approach to Protein Folding Hue Sun Chan, University of California at San Francisco (UCSF) To study the protein folding problem, we use exhaustive sequenceand conforma- tional enumerationsto study copolymerchains configuredon lattices. These model moleculesare short self-avoidingchains of hydrophobic (H) and polar (P) mono- mers. This simple model showsthat under folding conditions, a signicant fraction of H/ P copolymersexhibit protein-like behavior suchas high compactness,conside- rable amount of secondarystructure, and low degeneracyof the lowestenergy state. We also explore the folding kinetics of those H/ P copolymerswhich have unique native structures. Under folding conditions, thesemodel protein moleculescollapse quickly to an ensembleof relatively compact conformations,and then re-arrange much more slowly as they seektheir unique native states. Folding time of the model moleculesis strongly sequence-dependent,because the arrangementof Hs and Ps along the sequencedetermines the energeticlandscape of the chains conformatio- nal space. The fastest folding sequencesare those whose native structures are most accessible and least protected by energy barriers. (This is joint work with Ken A. Dill at UCSF) Physical Mapping in Practice William I. Chang, Cold Spring Harbor Laboratory, N.Y. USA (lose collaboration l)et.ween biologists (D. lieacli Lab) and inatl1en'1atica.ls(ic11l;isl.s (T. Marr. Lab) at C.S.ll.L. resnltecl in the lastest, mapping to (late of a.n entire genoine S. pombc (fission yeast. a model organism for the stucly of the cell <'_\'('le). Contig assembly and error analysis are (lone using a l:)ra1ich-ancl-honnclalgorithm that nds the globally optimal linear a.i°rz-i.1igm11<~-Iitof anchors, minimizing the mst of inconsistencies in the data (l'alse|)()Sll.l\'(;.H, false negati\'es, r(potitive S(.(|ll(ll('(S). Carrie(l out in conjunction with (.-\])fl'llll(-Hts.this analysis is used to resolve in- consistencies (l)_v repeating experiments) and to <lircct Further (*..\'])6l"illl(-3IllS.lligh confidence in the partially constrnct.<-~(lmap inakes possible reclucerl lal)orator_v work and an accelerated rate of progress.

Sublinear Approximate Matching with Constant Fraction Error

William I. Chang, Cold Spring Harbor Laboratory, N.Y. USA Pattern matching is a classical prohleni of computer science. and approximate inat- ching of sequencesis motiva.te(l by molecular l)iolog_y.(liven a database of size n and a pattern of size in over a b-letter al pl1al:)et,we wish to find all locations in the-.data» base where the pattern occurs with at most Iv cliffeimices (substitutions, insertions, or deletions). There exist constants /lg,such that for k < pbm, k differences inatcliiii g has average case cornplexity (')(('n./'m)(l.v+ log,,m)). This algoritlnn requires spa(_'e polynomial in the. size of the pattern and can be generalized to other (listance as well as similarit.y measures.

A Contact Potential that Recognizes the Correct Folding of Proteins V.N. Maiorov and G.M. Crippen, University of Michigan We have devised a c.ontinons function of interresidue contacts in globula.r proteins such that the X-ray crystal structure has a lower function value than that of thou- sands of protein-like alternative conformations. From a training set of 37 proteins a.nda total of 10,000alternatives, the potential satises altogether 73 proteins vs. their 530,000alternatives. In addition, another95 highly homologousprotein crystal structures are correctly treated. While the potential is intended primarily to select. the native out of a large choice of rather similar or very (lissirnilar conformers. it can also indicate approximatlely whether the native is one of the choices. Is Darwinism a Falsifyable Theory? Andreas Dress, Univ. Bielefeld; H.J. Bandelt, Univ. Hamburg Though conceive(olas a.theory. (lescribingand e.\'plainingwhat has happenedrather than predicting what. will happen, tliere is one predirttiou lollowing 110111Darwin's (as well a.sfrom lc,amarques)theory which can be t(-st<,-(l:the claim that all living species can be grouped in a sensible and consistent way into one big pliylogenetic tree. Standard tree reconstruction nietho(ls - suc<'<+-ssl'ulas they inay be - are not [it to properly put. this claini to a test. because all of them presuppose that what tliey are searchingfor:actuall.V is a tree structure. llence an alternative to thesemetho(ls is suggested.One looks lor all decoinpositions.1 = Ä U B of the ('()lle('ti0nÄ of species under consideration into two 1ion":-enipty,

Statistical Geometry in Sequence Space Andreas Dress, Univ. Bielefeld with Manfred Eigen and Ruthild Winkler Oswatitsch, MPI Göttingen A statisti(a.l method of coniparative sequeiice analysis that combines horizontal and vertical correlations among aligned seqiiencescan be baseclon the analysis Inainly of quartet coinbinations of sequences.consi(lere(l as geoinetric four-point configurations in sequence space. Numerical invariants related to relati\'einternal segment.length are assigned to each such confi gu ration and statistical averagesof tllese invariants are established. They can be used for internal calibration of the topology of divergence and for quantitative determination of the noise level. (lomparison with computer sin'1ulationsreveals the high sensitivity of assignment of basic topologies even if much randoini7,ed. ln addition, these procedures can be cliecked by vertical analysis of the aligned sequencesto allow the stucclyof (li\.~'e1'gencieswith positional] y varying substitution prol)abilities. Embedding Sequence Analysis in the Functional Programming Paradigm Robert Giegerich, University of Bielefeld Colnpositiona]ity a.11de.\'tensibility of analysis algoritlnns a.reimportant. for close investigationsof complexse(.'on(la.ry structures in biosequences.Since these two pro- perties belong t.o the main virtues of futictional programs. a (a.scstudy was per- formed to eva.luatethe viability of embeddingscqumicc analysis in the finictional paradigm. Its first part showshow an advanc.edpattern nlatching languagecan be implementedin a concise,transparent, and extensibleway. lts secondpart reports on an in1plen1enta.tionof la.zyposition trees, including cflicie.1i<:_ymeasurenients per- formed with three current func.tiona.l language s.vstems.

A Formal Method for the Evaluation and Comparison of a Class of Aligning Algorithms Gaston H. Gonnet, ETH Zürich Two a.spectsare considered to be the essential measuresof an alignment algorithm: (a.) How well does it discriminate betweenhomologous sequences and random se- quencesand (b) When aligning homologoussequences, how many errors (misaligned positions) it will make. The methodologyused for comparing algorithms and their associatedscoring matrices, is basedon simulating evolution, creating new S(:?(]u(;31'l('(.S from a given one and then aligning the evolvedsequence aga.inst the original one. Since we know the results of the evolution, it is directly measurable how inany errors were done in the alignment. Since we can obviously produce 1'&t]1(.lO11'lsequences,we can also test the discrimination of the algorithm. A couple of observations make the simulation of evolution possible. Since we restrict ourselves to the class of al- gorithms which are based on dynamic programming, we can simulate evolution as a Ma.rkovian process. The DP algorithms will ignore any relation between amino acids when they are compared, and will assign a cost which is constant and depends only on these amino acids. Somethingsimilar happensfor insertions/deletions. The mathematical exact evaluation of the results is only possible for dynamic. progra1n~ ming algorithms without deletions. For the complete algorithms we have to content ourselves with Monte Carlo simulation results. This is work in progress;

The Darwin System Gaston H. Gonnet, ETH Zürich Darwin is an interactive and programmable system for doing computations in mo- lecular biology. Darwin is a descendent of the Maple system for doing computer algebra and shares its syntax and various design philosophies. Darwin is particu- larly strong in text handling and in sequencecomparison. Darwin uses Pat indices as an underlying structure (Pat. in(li('esare Patricia tree impleinentat ions of suffix trees) for doing: all against all sequencealigment of a databasein (')(;\""").J < .2:one a.gainst a.ll sequence alignment in (")(.«\"").a < l; loiige-st repetition searcliingz most frequent k-gra.1nsand, of course, exact matches in 6(lo_q.\').1)arwin a.lsohas appro- xima.te text searching (Levenshteinis distance) as a primitive. This is llIl|)l(:'lll(?lI.1»((l as a partially defiiied DFA. In this way the algoritlnn adapts to the s<-.-a.rc|1i11gstring and in practice is remarkable fast. Tlie systeni is i111ple111e11tedby a kernel i11(T and libra.ries (some contributed by users) in Darwin itself. l111'1u1s o11various Uiiix work- stations and is distributed by e-mail at no cost. In total Darwin hab"more tlu.-1'1150 functions and commands, so this is clearly a very pa.rtial description of the system. We have been doing all are computations since 1990 in this system excl11sivel_y.

Reconstruction of Phylogenetic Trees Arndt v. Haeseler, Univ. Munich For a set X = {$1, . . . , .,5'.,,}of n aligned sequences we want.to recoiistruct a phyloge- netic tree T displaying the evolutionary rela.tionsl1ipof the sequeiices.Using any dis- simila.rity measure6 2 X x X + R. we definea neighbourrelation ||5(with respectto Ö)for eachquartet of sequences.We sa_yS1 and S2are neighbourswith respectto .5}; and S4 (S1S2ll6-S354)0bk(S(S1, S2) + (S3, S4) < TII.t7l.{6(S1,S;;)+ (S(.S'-2, S4),($(S1, S4) + 6(S2,53)}. A similar denition of ueighbourlinessis made for binary unrooted trees T. We propose a method to find a tre.eT among all tree topologiesfor which the number of quartets that fulfil both relations a11d3l is maximal. linally, we discuss some examples from our studies of rR N A and tR.NA sequences.

Homology Modeling and Distance Geometry Timothy F. Havel, Harvard Medical School Distance geometry is a geometric model of molecules, wlierein the structure is defi- ned in terms of distanceand chirality constraints.These a.re, respectively, lower and upper bounds on the interatomic distances,a11d the orient.ationsof selectedrigid and asymmetric tetrahedra of atoms. Dista.ncegeometry calculationsare designed to reveal the geometric structure of the set of all conformations(spatial arrange- ments of the atoms) consistentwith this information. The.most important of these calculations involves computing a conformational ensemble, i.e. a set of conformati- ons satisfying the constraints,but otherwiserandom. By analysingsuch an ensemble to discovernew geometricproperties that are uniformly presentin all its members and henceare, with high probability, necessaryconsequences of the geometriccons- traints used as input, these calculations provide us with a.crude but effective method of geometric reasoning.Distance geometry calculations have found numerousapp- lications in chemistry and biology, most notably methodsof determining structure from e.g. N MR data, exploring conformationspace, and generatingcoordinates from connectivity tables. In this lecture, a new application is introduced, which obtains

10 the distance and chirality constraints sufficient to determine a protein sl1u(tu1e from aligmnents of its sequence with hoinologous proteins of known st1'ucture.As an example. I have predicted the structure of the lillaxorloxin from 1L.('oliusing as homologues the crystal st1uctures of the l*la.vodo.\'insfrom .»l.ni(luIans. (.I)eij<'i'i('l<. C.(1i.9'Pi1sand D.vulgaris. The conlplete results of this stud) be found in a l'orthco- rning issue of the journal Molecular .S'irmzla.t.io1i.

Double Point Chains in Proteins and Inverse Protein Folding Frieder Kaden, GMD, St. Augustin How 1na.nypossibilities a.re there to walk through the str11(tu1eof a giwn protein if not only the main chain steps are allowed but also steps between residues that a.re far apart in the sequence but whose geometric distance is almost like that. ol backbone neighbours, and each residue is visited exactly once"?Tlie main chain of the protein corresponds to its ordinary sequence. Other walks through the protein lea.d to modied sequencesthat can be a.pplied to lind alignnients to new protein sequencesin the senseof the inverse problem of protein folding. ln 2: suitable graph the above question appea.rsas the Nlcornplete problem of finding all llarniltonian paths. The problem is transformed into a double point problem that can be s()l\~'e(l by a method which is a three-diinensional generalization of the formalisin of L. Kauffman presented in his Formal Knot Th(()l'__\'in 1983.

Physical Mapping of Chromosomes Richard M. Karp, University of California, Berkeley, CA, USA We present several algorithms for reassemhling the overlap structure of clones on a chromosome, given various kinds of fingerprint data for the clones.

Graph Theoretical Description of Sheet Topologies Ina Koch, GMD, St. Augustin My talk was about.the usageof graph theoreticaldescriptions of proteinsat dill'erent structure levels. We defined the protein graph, that describesthe protein structure at the residue level, and the beta graph describiiig sheet topologies.At the protein graph level we derived patterns, which call be divided in sequentially short-range (describinghelical and turn structures)and longrangepatterns (describing super- secondarystructures). We matched the longrange patterns against certain sheet topologies,chosen from a set of non-homologousproteins in order to find relations betweenpatterns and topologies.First results show, that there is a quite ('lifI"erent amino acid distribution in the patterns, which are niatching against certain topolo- gies.

11 Approximate Pattern Matching: New Algorithms and Applications Udi Manber, University of Arizona We (les(1il)e(la tool for approximate pattern 1nat.('hing.('a.lled agrep. .-lgrep can search, very last. for <'omplicat.edpatterns including a.r|)it.rar_vregular expressions and allowing insertions.deletions. and/or suhst..itutionerrors. lC.\'amplesof the use of agrepwere shown. and two of the five new algorithms that agrepuses were presented. Prelimiiiary ideasabout fast a.pproxiinatepa.ttern mat:cl1ing in a prepro(:esse(llibrary of patterns, using:the triangle inequality to prune the sea.I'('l1tree. werealso discussed.

Machine Discovery by Decision Trees over Regular Patterns , Kyushu University Wo describe a n1a.clline-learningsystem that produces hypotheses from positive and negative examples. and report some experiments on protein dat.a using HR and (Ion Bank. This learning system is developed with a learning algorithm for decision trees over regular patterns, which we devised newly for this research. ln the experi- ments on transmembrane domain identification. the system disctoveredvery simple hypotheses with very high accuracy from a small number of positive and negative data. These hypothesesshow that negativemotifs, that is. motifs of negativedata. play a key role in such identicatioii. In these experiments, we classied 20 symbols of amino acid residues according to the l1ydropa.thy iudices due to Kyte and D00- little. VVe call such trausformation of symbols an indexing. We. observed that the iiulexing by the hydropatliy indices is iniport..ant.in ma.king the learning algorithm eilicient and accurate. This observa.tion inspired us with a.desire to discover such an indexing itself without any help of biological l

Representing Suboptimal Alignments of Biological Sequences Dalit Naor, Stanford Univ. USA The opytimal alignment between a pair of biological sequencesthat minimizes the edit-distance may not necessarily reflect the correct biological alignment, that is the alignment based on Sl,l'1l('.l-l.1l(_.or evolutionary changes. However, in many casesthe edit..-distance alignment. is a good approximation to the biological alignment. Sub- optimal alignments are alignments whose scores lie within the neigbonrhood of the

12 optimum, and they weresuggested as all.ernat.ivest.o the optimal one. We study the conibinatorial nature of suboptirual aligiuneuts and give a compact. rcpresentatioiI of them. VVedefine a canonical set of alignments. and argue that they can be viewed as the essential ones. We currently test this hypothesis with protein seqiiences.

Algorithmic Aspects of Protein Structure , EMBL, Heidelberg Proteins are beautiful and con1plicatedthree-climensional structures. Their slialw a.nd biological function is coded by genetic inl'orn1ation.'lhere are thousands of bio- logically distinct. protein classes.Piiysicist. view proteins a.spolymer chains and try t.o understand the structural principles common to all. Biologists try to un(.lerstau(l how genetic.information is translated into the highly iiidividualistic biological role of a protein. (bmputer scientists see proteins as gra.phs or space curves and try to develop algoritluns that siinplify the enormous combinatorial complexity of picking out the correct structure for a given genetic sequence. (During the. workshop we proved tha.t the protein folding problem is SO-l..lcl.l(l,U. Manber and (.7.Sander, unpublished)

Climbing a Tree Through the Window , University of Montreal The method of nearest-neighbourinterchange effects local improvements in a |)iua.ry tree by replacing a 4-subtree by one ol its two a.lt.ernativesif this improves the ob- jective function. We extend this to l<-subtreesin order to reduce the number of local optima. Possible sequencesof k-subtrees to be examined are proingrules. Increasing window size I: is found to be the most effective wa.y of improving the local optimum, followed by the choice of hill-climbing over the random strategy. A suggestion for achieving higher values of k is based on a recursive use of the hill-climbing st.rat(-gy.

Secondary Structure Problems in Coiled-Coil Proteins Jeanette P. Schmidt, Polytechnic Univ., Brooklyn, N.Y. We describe efficient computational tools for the (~xamiuation of a class of proteins that fold as alpha-helica.l coilecl-coil proteins. In particular we detect whether a sequenceof amino acids c.ont.ainsthe 7-residue periodicity of amino acids ll((i(.?SH?1.l'_\

13 to promote a coiled-coil conformation. A penalty matrix is used which determines the quality of the fit of a given amino acid at a given position of the hepta.d. A simple and effic.ientalgorithm is presented,which can detect irregular periodic repetitions of a.ny constant size pattern in an arbitrary text, in time that is linear in the size of the text, (wheregaps are alloweda.nd the pattern is specilieclby a.narbitrary penalty matrix). The a.lgorithm is used to align proteins (presuinedto be coiled-coil) to the 7 positions of the heptad. The alignment is currently used to compare the outer surface of one coiled~coil protein to the other surface of a second coiled-coil protein. (Joint work with V. Fischetti, G. Landau and l. Sellers.)

Sequence-Structure Relationship in the Twighlight Zone Reihard Schneider, EMBL, Heidelberg The da.t.abaseof known protein threedimensional structures can be significantly in- creased by the use of sequence homology, based on the following observations. (1) The database of known sequences,currently at more than 25000 proteins, is two or- ders of magnitude larger than the database of known st.ructures. The currently most powerful method of predicting protein structures is model building by homo- logy. (3) Structural homology can be inferred from the level of sequencesimilarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. This emprically derived threshold curve for structural similarity was discussed. We first quantify the relation between se- quence similarity, struc.ture similarity and alignment length by an exhaustive survey of alignments between proteins of known structure and report. a homology threshold curve as a. function of alignment length. We then produce a database of homology- derived secondary structure of proteins (H SSP) by aligning to each protein of known structure all sequencesdeemed homologous on the basis of the. tlireshold curve. For each known protein structure, the derived database contains the aligned sequences. secondary structure, sequence variability and sequence prole. Tertiary structures of the aligned sequences are implied, but not modelled explicitly. The results are useful in assessingthe structural significance of matcht-s in sequencedatabase sear- ches, in deriving preferences and patterns for stru('tur<'prediction, in elucidating the structural role of conserved residues and in modelling three-dimensional detail by homology.The results of a c.omprel1ensivesequence analysis of the 18? predicted open reading frames of yeast.chromosome III were presentedand discussed.When the results of database similarity searchesare pooled with prior knowledge, a likely function can be assignedto 42% of the proteins, and a predicted 3-D structure to a third of these (14%). The function of the remaining58% remainsto be determined. An outlook for other genome projects was given. In our opinion, development in the area of protein sequence analysis should focus on three major areas. (1) improved algorithms for the detection of real, but difficult to catch, homologies and for the direct prediction of structure and function. (2) the integration of heterogenous tools into a overall working enviroment with facile exchangeof information between tasks

14 (sequenceanalysis workbench) (3) direct, on-line, ondesk availability of all relevant, information in the biological literature.

Computer-Aided Design of Proteins With New Properties Dietmar Schomburg, GBF Braunschweig / CAPE By the increase of knowledge on protein 3D-structiire, the last development of mo- lecular graphics and force eld calculations, and a good understanding of structure- function correlation, a rational design of proteins with new desired properties has been possible recently. A few examples are given in the lecture. Still, the computer- based methods as sequence alignment, 3D structure prediction, and docking pre- diction urgently need improvement. Examples of new develop1nent.sat CA P E - tliv German Centre of Applied Protein Engineering - were presented in the lecture.

Applications of Articial Intelligence and Machine Learning to Protein Structures

Steffen SchulzeKremer, Brainware GmbH, Berlin The IPSA method aims at lea.rningpatterns of supersecondarystructure from a set of known proteins. This involves selecting a list of properties of secondarystruc- tures (topological, geometricaland cheniophysicali);setting up a database;running learning from observationprograms on that database.So far pairs of a-helicesand pairs of an ahelix and a /3-strandhave been classifiedand described.Among the classesgenerated there are some with three secondarystructures in exact a.gree- ment, although only information on two secondarystructures wasgiven, and classes that were formed completely by long range interactions. Another Al-application in Biochemistry was shown, the use of genetic algorithms. It involves a torsion angle representation using standard bond lengths and angles; the operators SE- LECT, MUTATE and CROSSOVER;and a very simplefitness function of the form E = Etor + Evdw+ Esmt.Although no conformationgenerated resembled the native structure (of Crambin), the geneticalgorithm producedvery low tness individuals. Work is going on to improve the fitness function.

Automatic Derivation of Patterns to Predict Protein Structure Joachim Selbig, GMD, St. Augustin Pattern-basedheuristic prediction of protein structure restsupon someform of local homologywhere the homologyinformation is extractedfrom structure data basesin a genericform by certaingeneralization principles. Taking into accountsequentially long-rangeinteractions requires an appropriaterepresentation of proteinstructure. In particular,this holdsto the casewhen the patternsare generatedautomatically

15 by machine learning methods. Though the pattern generation by hand on the base of biophysical principles is very successfulmachine lea.rning methods may be used to search the available data. systematically and thus to improve our understanding of the protein folding principles. Learning has been mainly viewed as inducing general concept desciptions from a learning set subdivided into classes.One of the import- ant dimensions for cha.racterizing learning systems is the type of the representation languages used to describe the elements of the learning set and the concepts. In our approach the data about the spatial structure of the proteins determined by Xray crystallography are transformed into a graph description which provides the possibility to dene a multitude of patterns for describing structural elements and which may be understood as discrete forms of the contact maps.

Physical Mapping of DNA and Interval Graphs , Tel Aviv University, Israel

A fundamental problem in temporal reasoning is to determine the consistency of a set of events, where for each pair of events a set of possible atomic relations (prece- dence, overlap, containment etc.) is prescribed. Events are assumed to be intervals on the real line. We study the problem for a simplied model, where the only atomic relations are precedenceand intersection.By restricting the input to a subsetof the powerset. of atomicrelations one gets a varietyof interestingcombinatorial pro- blems. We give N P-hardness results or polynomial algorithms for a variety of such restricted problems. In the DNA physical mapping problem, the chromosome corre- sponds to the time line and the fragments of the DNA are the events. A simplied model for the biological problem is shown to be equivalent to one of the NP-hard restrictions of the general model. Ways to exploit. the many polynomial restrictions in order to expedite physical map assemblyare suggested. (Joint work with M.C. Golumbic, IBM Haifa, Israel, and in part with H. Kaplan, Tel Aviv University, Israel) 0

Hide and Seek on a Polyprotein Manfred Sippl, University Salzburg The set of experimentally determined protein structures is used to derive a know- ledge based force eld which in turn allows calculation of conformational energies of proteins. The nal goal is the computational determination of protein structures using the knowledge based force eld. The development of force elds depends 011 useful techniques for the assessmentof the performance of the force field at each stage of development. A necessary condition is that the native fold of a protein has lowestenergy comparedto a number of nonnativedecoys. The current versionof the force eld is able to identify all native folds in our data base(160 individual chains) among approximately 40,000alternatives. At the current state of developmentthe force field can be used to validate experimentally determined structures, to detect

16 nat.ive like folds for sequencesof unknown structure in a da.ta base of known folds using sequence struc.ture. alignment techniques and to build models of ent.ire folds from ensembles of overlapping fragments.

The Prediction of Some Protein Structural Features

Temple F. Smith, Boston University Using an alignment of some 127 proteins from the PDB with their close homo- logs a number of statistical measure were obtained: these included the conditional probabilities: P(.S'/,.|S;,._1,),P(.5';,|ak.), . . . , P(Sk|ak, a.k__1 . .5'k_1 @"mm these the missingor Shannon information was calcula.t.ed,given that. there are only eight structural states. These data suggest that the standard secondary structure prediction can do no better than 65% which is inaccord with experience.In addition thesedata suggestthat the use of the variability at aligned homologous positions provides the largest reduction in missing information. with an upper limit of 86% on predictions fully exploiting such information. Finally if secondary structure prediction is done in the context of our understandingsof realizable tertiary structures much higher valuesmay be possible. This was tested by modeling the tertiary 3-D information of a set of seven domain classesusing discrete spacestate models. These models condition the pri- mary and secondary structure on allowed tertiary struc.tures and appear to increase the predictability to near 95%.

Interpretation of Parametric Alignment Plots , Univ. of Southern California Basedon the methods presentedby M. \/Vaterman(see above) the patterns arising for parametric alignments were analyzed. The logic behind the plot was first ex- emplied based on the comparison of random sequences.The most striking feature there is the clear reflection of the statistical features of alignment score in the com- binatorial structure for the plots. When moving to the comparisonof real biological sequencesthese featurescan be found again in a somewhatdistorted form though. Nevertheless,we could highlight certain patterns which seemlinked to biologically correct alignments and which might aid in their recognition.

New Problems in Evolutionary Tree Construction Tandy J. Warnow, Sandia National Labs, USA Classical models for constructing trees from discrete data either use distance matri- ces or qualitative characters. The complexity of these problems are discussed, a.nd the flaws in these models are examined. Several new models for tree construction are

17 then presented which may permit efficient algorithms to be discovered, a.nd which avoid some of the inherent limitations of the classical formulations.

Parametric Sequence Alignment , Univ. Southern California Dynamic programming algorithms for optimal alignments of two nucleic acid or protein sequencesrequire setting penalty paranieters. While the choice of these pa.- rameters greatly influences the quality of the resulting alignments. this choice has beenmade in an ad hocfashion. In this ta.lkwe present an algorithmto find optimal alignment scores for all choices of the penalty parameters when the score is linear in the penalty parameters. In addition some statistical theory of the asymptotic growth of alignment scores with the length of random sequencesis presented and related to the parametric sequence alignments.

RNA Secondary Structure Modeling Michael Zuker, NRC, Ottawa, Canada RNA secondary structure modeling differs fundame.ntally from conventional atomic resolution modeling. By borrowing discrete optimization methods used in sequence a.lignment, it ca.n unfailingly predict minimum free energy as well as close to opti- mal foldings. The denition of secondarystructure and the recursionfor computing optimal foldings a.regiven. The dynamic programming ll algorithm runs in the time @(n)as presented,where n is the sequencesize. A stopping rule is introduced that limits a backtracking step in the search for a best interior or bulge loop closed by a given base pair. It is conjectured that the expected depth of the backtracking search is bounded, resulting in an overall ('9(n3)performance. lt is shown how to predict suboptimal foldings by executing the fill algorithm on two ligated copies of the same sequence. Base pairs that can participate in optimal and close to optimal foldings a.re displayed as points in triangular arrays called energy dot plots. The energy dot plot for the entire 4217 base genome of the bacteriophage P reveals distinct structural domains that correspond well to what is observed by electron microscopy. A detailed model for the central region agrees well with data from en- zyme cleavage and chemical modification experiments, and is further supported by studies on mutant phages.A cluttered region in the dot plot contains basepairs of a slightly suboptimal alternate folding that is observedby electron mic.rosc.opy.

18' Dagstuhl-Seminar 9237: List of Participants (update:17.09.92)

Alberto Apostolico Andreas Dress Purdue University Universität Bielefeld Computer Science Department Fakultät Mathematik West Lafayette IN 47907 Postfach 86 40 USA W-4800 Bielefeld [email protected] [email protected] teI.: (317) 494 6015 bieIefeId.de teI.: +49-521-106-50 34/20 Norbert Blum fax.: +49-521-106-47 43 Universität Bonn Institut für Informatik Robert Gie erich Römerstr. 164 Universität ielefeld W-5300 Bonn 1 Technische Fakultät [email protected] Postfach 86 40 teI.: +49-228-550-250 W-4800 Bielefeld fax.: +49-228-550-4 40 [email protected] teI.: +49521-106 2913 Hue Sun Chan Universit of Calif. at San Francisco Gaston Gonnet Dept. of harmaceutical Chemistry ETH Zürich Box 12 04 Informatik San Francisco CA 94143 ETH-Zentrum USA CH-8092 Zürich [email protected] Switzerland teI.: +1-415-476-89 10 [email protected] fax.: +1 -41 5-476-15 08 teI.: +41 -254-74 70 fax.: +41-1-262-39 73 William I. Chang Cold Spring Harbor Laboratory Timothy Havel 100 Bungtown Road Harvard Medical School Cold Spring Harbor NY 11724--2202 BCMP USA 240 Longwood Ave [email protected] Boston MA 02115 teI.: +1 -516-367-88 66 USA fax.: +1 -516-367-84 61 havel@pto|emy.med.harvard.edu Gordon Crippen Arndt von Haeseler The University of Michigan Universität München College of Pharmacy Zoologisches lnstitut der Uni. München Ann Arbor Ml 48109-1095 Luisenstraße 14 USA W-8000 München 2 [email protected] [email protected]muenchen.dbp.de teI.: +1 -31 3-763-97 22 teI.: +49-89-5 90 23 27 fax.: +49-89-5 90 24 74 Maxime Crochemore Université Paris VII Ralf Hofestädt LlTP Universität Koblenz - Landau Institut Blaise Pascal Fachbereich Informatik 2 Place Jussieu Rheinau 3 - 4 F-75251 Paris Cedex O5 W-5400 Koblenz France [email protected] [email protected] teI.: +4926191 19-431 teI.: +33-1-44.27.68.47 fax.: +33-1-44.27.68.49 Frieder Kaden Gad Landau Gesellschaft für Mathematik und Polytechnic University Datenverarbeitung mbH Dept. of CS Schloß Birlinghoven Six MetroTech Center Postfach 12 40 Brooklyn NY 11201 W-5205 St. Au ustin 1 USA FriedenKaden gmd.de [email protected] teI.: +49-2241-14 27 86 teI.: +1 -71 8-260-31 54 fax.: +49-2241-14 21 02 fax.: +1 -71 8-260-39 06

Richard Karp University of California at Berkeley University of California at Berkeley Computer Science Division Computer Science Division 589 Evans Hall 591 Evans Hall Berkeley CA 94720 Berkeley CA 94720 USA USA [email protected] [email protected] teI.: +1-510-642-15 59 teI.: +1-510-642-40 19 fax.: +1 -510-642-57 75 fax.: +1 -510-642-57 75

Marek Karpinski Thomas Lengauer Universität Bonn Gesellschaft für Mathematik und Institut für Informatik Datenverarbeitung mbH Römerstr. 164 Schloß Birlinghoven W-5300 Bonn 1 Postfach 12 40 Germany W-5205 St. Augustin 1 [email protected] [email protected] teI.: +49-228-550-2 24 teI.: 02241-14 27 77 fax.: +49228550-4 40 fax.: 02241-14 21 02

Peter Kleinschmidt Udi Manber Universität Passau University of Arizona Fachbereich Mathematik & Informatik Department of Computer Science Innstraße 33 Tucson AZ 85721 W-8390 Passau USA [email protected] [email protected] teI.: +49-851-50 93 39 tel.: +1 -602-621 -66 13 1ax.: +49-851-50 91 71 fax.: +1-602-621-42 46

Ina Koch Saira Mian Gesellschaft für Mathematik und University of California at Santa Cruz Datenverarbeitung mbH Sinsheimer Laboratories Schloß Birlinghoven Santa Cruz CA 95064 Postfach 12 40 USA W-5205 St. Augustin 1 [email protected] [email protected] teI.: +1-408-459-27 00 teI.: +49-2241-141-28 36 fax.: +1-408-459-31 39 fax.:+49-2241-142-8 89 SatoruMiyano Hans-Peter Kriegel Kyushu University 33 Universität München Research Institute of lnstitut für Informatik Fundamental Information Science Leopoldstraße 11B Postal No. 812 W-8000 München 40 Fukuoka [email protected] Japan teI.: +49089-2180-52 68 [email protected] fax.: +49~89-2180-52 46 teI.: +81-92-641-11 01 ext: 44 71 fax.: +81-92-611-26 68 Dalit Naor Dietmar Schomburg Stanford University Medical Center Ges. für biotechnologische Department of Biochemistry Forschung mbH (GBF) Stanford CA 94305-5307 Mascheroder Weg 1 USA W-3300 Braunschweig [email protected] Germany tel. : +1 -415-723-59 76 [email protected] fax.: +1 -41 5-723-67 83 tel.: +49-531-61 81-3 50 fax.: +49-531-61 81-3 55 Hartmut Noltemeier Universität Würzburg Steffen SchulzeKremer Lehrstuhl für Informatik Brainware GmbH Am Hubland Gustav-Meyer-Allee 25 W870O Würzburg W-1000 Berlin 65 [email protected] [email protected] teI.: +49-931888-50 55 tel.: +49-30-46 330-40 fax.: +49-931-888-46 00 fax.: +4930-469-46 49

Chris Sander Joachim Selbig European Molecular Biology Gesellschaft für Mathematik und Laboratory (EMBL) Datenverarbeitung mbH Meyerhofstraße 1 Schloß Birlinghoven W-6900 Heidelberg Postfach 12 40 [email protected] W-5205 St. Augustin 1 teI.: +49-622138 73 61 [email protected] fax.: +49-6221-38 73 06 teI.: +49-2241-14-27 92 fax.: +49-2241-14-21 02 David Sankoff Université de Montreal Ron Shamir Centre de Recherches Mathematiques Tel Aviv University Montreal H3C 3J7 Dept. of Computer Science Canada Tel Aviv 69978 [email protected] Israel teI.: +1 -51 4-343-7574 [email protected] fax.: +1 -51 4-343-2254 teI.: +972-3-640-93 73 fax.: +972-3-640-93 57 Jeanette P. Schmidt Polytechnic University Manfred J. Sippl Computer Science Department Universität Salzburg 6 Metrotech Center Center for Applied Molecular Brooklyn NY 11201 Engineering USA Jakob Haringer Str. 1 [email protected] A-5020 Salzburg teI.: +1 -718-260 3502 Austria fax.: +1 -718-260-39 06 [email protected] teI.: +43-662-8044-57 97 Reinhard Schneider European Molecular Biology Hans-WernerSix Laboratory (EMBL) Fernuniversität-GH-Hagen Meyerhofstraße 1 Praktische Informatik lll W-6900 Heidelberg Feithstraße 140 [email protected] W-5800 Hagen 1 teI.: +49-6221-38 73 05 six@fernunihagen.de fax.: +49-6221-38 73 06 teI.: -I-49'2331'987i2964 fax.: +49-2331-987-3 17 Temple Smith Ralf Zimmer Boston University Gesellschaft für Mathematik und BMERC Datenverarbeitung mbH 36 Cummington Street Schloß Birlinghoven Boston MA 02215 Postfach 12 40 USA W5205 St. Augustin 1 [email protected] [email protected] tel.: +1-617-353-71 23 tel.: +49-2241-14 28 18 fax.: +1 -617-353-70 20 fax.: +49-2241-14 26 18

Leslie A. Taylor Michael Zuker UCSF School of Pharmacy National Research Council Computer Graphics Lab Inst. for Biological Sciences M-54 Box 0446 Room S926 Ottawa Ontario K1 A 0R6 513 Parnassus Avenue Canada SänFrancisco CA94143-0446 [email protected] U A tel.: +1-613-993-48 30 [email protected] tel.: +1 -415-476-5379 fax.: +1-415-502-1755

Martin Vingron University of Southern California Department of Mathematics DRB 287 - University Park Los Angeles CA 90089-1113 USA [email protected] tel.: +1-213740-24 10 fax.: +1 -213-740-24 37

Tandy Warnow Princeton University Department of Computer Science 35 Olden Street Princeton NJ 08544 USA [email protected]

Michael Waterman University of Southern California Department of Mathematics DRB 287 University Park Los Angeles CA 90089-1113 USA [email protected] teI.: +1 -21 3-740-24 08

Peter Wildmayer ETH Zürich Department of Computer Science ETH-Zentrum CH-8092 Zürich Switzerland [email protected] tel.: +41 -1 -254-74 00

Zuletzt erschienene und geplante Titel: K. Compton, J.E. Pin , W. Thomas (editors): Automata Theory: Infinite Computations, Dagstuhl-Seminar-Report; 28, 6.-10.1.92 (9202) H. Langmaack, E. Neuhold, M. Paul (editors): Software Construction - Foundation and Application, Dagstuhl-Seminar-Report; 29, 13..-17.1.92 (9203) K. Ambos-Spies, S. Homer, U. Schöning (editors): Structure and Complexity Theory, Dagstuhl-Seminar-Report; 30, 3.-7.02.92 (9206) B. Booß, W. Coy, J.M. Pflüger (edilors): Limits of Modelling with Programmed Machines, Dagstuhl-Seminar-Report; 31, 10.-14.2.92 (9207) K. Compton, J.E. Pin , W. Thomas (editors): Automata Theory: Infinite Computations, Dagstuhl-Seminar-Report; 28, 6.-10.1.92 (9202) H. Langmaack, E. Neuhold, M. Paul (editors): Software Construction - Foundation and Application, Dagstuhl-Seminar-Fleport; 29, 13.-17.1.92 (9203) K. Ambos-Spies, S. Homer, U. Schöning (editors): Structure and Complexity Theory, Dagstuhl-Seminar-Report; 30, 3.-7.2.92 (9206) B. Booß, W. Coy, J.-M. Pflüger (editors): Limits of Information-technological Models, Dagstuhl-Seminar-Report; 31, 10.-14.2.92 (9207) N. Habermann, W.F. Tichy (editors): Future Directions in Software Engineering, Dagstuhl-Seminar-Report; 32; 17.2.-21.2.92 (9208) H. Cole, E.W. Mayr F. Meyer auf der Heide (editors): Parallel and Distributed Algorithms; Dagstuhl-Seminar-Report; 33; 2.3.-6.3.92 (9210) P. Klint, T. Reps, G. Snelting (editors): Programming Environments; Dagstuhl-Seminar-Report; 34; 9.3.-13.3.92 (9211) H.-D. Ehrich, J.A. Goguen, A. Sernadas (editors): Foundations of Information Systems Specification and Design; Dagstuhl-Seminar-Report; 35; 16.3.-19.3.9 (9212) W. Damm, Ch. Hankin, J. Hughes (editors): Functional Languages: Compiler Technology and Parallelism; Dagstuhl-Seminar-Report; 36; 23.3.-27.3.92 (9213) Th. Beth, W. Diflie, G.J. Simmons (editors): System Security; Dagstuhl-Seminar-Report; 37; 30.3.-3.4.92 (9214) C.A. Ellis, M. Jarke (editors): Distributed Cooperation in integrated Information Systems; Dagstuhl-Seminar-Report; 38; 5.4.- 9.4.92 (9215) J. Buchmann, H. Niederreiter, AM. Odlyzko, H.G. Zimmer (edilors): Algorithms and Number Theory, Dagstuhl-Seminar-Report; 39; 22.06.-26.06.92 (9226) E. Borger, Y. Gurevich, H. Kleine-Bijning, M.M. Richter (editors): Computer Science Logic, Dagstuhl-Seminar~Fleport; 40; 13.07.-17.07.92 (9229) J. von zur Gathen, M. Karpinski, D. Kozen (editors): Algebraic Complexity and Parallelism, Dagstuhl-Seminar-Fleport; 41; 20.07.-24.07.92 (9230) F. Baader, J. Siekmann, W. Snyder (editors): 6th lntemational Workshop on Unification, Dagstuhl-Seminar-Report; 42; 29.07.-31.07.92 (9231) J.W. Davenport, F. Krückeberg, RE. Moore, S. Rump (editors): Symbolic, algebraic and validated numerical Computation, Dagstuhl-Seminar-Report; 43; 03.08.- 07.08.92 (9232) R. Cohen, R. Kass, C. Paris, W. Wahlster (editors): Third lntemational Workshop on User Modeling (UM92), Dagstuhl-SeminarReport; 44; 10.- 13.8.92 (9233) R. Reischuk, D. Uhlig (editors): Complexity and Realization of Boolean Functions, Dagstuhl-Seminar-Report; 45; 24.08.-28.08.92 (9235) Th. Lengauer, D. Schornburg, M.S. Waterman (editors): Molecular Biointormatics, Dagstuhl-Seminar-Report; 46; 07.09.-11.09.92 (9237) V.R. Basili, H.D. Rombach, R.W. Selby (editors): Experimental Software Engineering Issues, Dagstuhl-Seminar-Report; 47; 14.-18.09.92 (9238) Y. Dittrich, H. Hastedt, P. Schele (editors): Corrputer Science and Philosophy, Dagstuhl-Seminar-Report; 48; 21 .09.-25.09.92 (9239) R.P. Daley, U. Furbach, K.P. Jantke (editors): Analogical and Inductive Inference 1992 , Dagstuhl-Seminar-Fleport; 49; 05.10.-09.10.92 (9241) E. Novak, St. Smale, J.F. Traub (editors): Algorithms and Complexity for Continuous Problems, Dagstuhl-Seminar-Report; 50; 12.10.- 16.10.92 (9242) J. Encarnacao, J. Foley (editors): Multimedia System Architectures and Applications, Dagstuhl-Seminar-Report; 51; 02.11.- 06.1 1.92 (9245) F.J. Rammig, J. Staunstrup, G. Zimmermann (editors): Sell-Timed Design, Dagstuhl-Seminar-Report; 52; 30.11 .-04.12.92 (9249 ) B. Courcelle, H. Ehrig, G. Rozenberg, H.J. Schneider (editors): Graph-Transformations in Computer Science, Dagstuhl-Seminar-Report; 53; 04.01.-08.01.93 (9301) A. Amold, L. Priese, R. Vollmar (editors): Automata Theory: Distributed Models, Dagstuhl-Seminar-Report; 54; 1 1.01 .-15.01 .93 (9302) W.S. Cellary, K. Vldyasankar , G. Vossen (editors): Versioning in Data Base Management Systems, Dagstuhl-Seminar-Report; 55; 01.02.-05.02.93 (9305) B. Becker, R. Bryant, Ch. Meinel (editors): I Computer Aided Design and Test , Dagstuhl-Seminar-Report; 56; 15.02.-19.02.93 (9307) M. Pinkal, R. Scha, L. Schubert (editors): Semantic Formalisms in Natural Language Processing, Dagstuhl-Seminar-Report; 57; 23.02.- 26.02.93 (9308) H. Bibel, K. Furukawa, M. Stickel (editors): Deduction , Dagstuhl-Seminar-Report; 58; 08.03.-12.03.93 (9310) H. Alt, B. Chazelle, E. Welzl (editors): Computational Geometry, Dagstuhl-Seminar-Report; 59; 22.03.-26.03.93 (9312) J. Pustejovsky, H. Kamp (editors): Universals in the Lexicon: At the Intersection of Lexical Semantic Theories, DagstuhI-Seminar- Report; 60; 29.03.O2.04.93 (9313) W. Straer, F. Wahl (editors): Graphics 8. Robotics, Dagstuhl-Seminar-Report; 61; 19.04.-22.04.93 (9316) C. Beeri, A. Heuer, G. Saake, S.D. Urban (editors): Formal Aspects of Object Base Dynamics , Dagstuhl-Seminar-Report; 62; 26.04.-30.04.93 (9317)