Lecture Notes in Computer Science 2676
Edited by G. Goos, J. Hartmanis, and J. van Leeuwen

Berlin Heidelberg New York Barcelona Hong Kong London Milan Paris Tokyo

Ricardo Baeza-Yates, Edgar Chávez, Maxime Crochemore (Eds.)

Combinatorial Pattern Matching
14th Annual Symposium, CPM 2003
Morelia, Michoacán, Mexico, June 25-27, 2003
Proceedings

Series Editors

Gerhard Goos, Karlsruhe University, Germany
Juris Hartmanis, Cornell University, NY, USA
Jan van Leeuwen, Utrecht University, The Netherlands

Volume Editors

Ricardo Baeza-Yates
Universidad de Chile
Depto. de Ciencias de la Computación
Blanco Encalada 2120
Santiago 6511224, Chile
E-mail: [email protected]

Edgar Chávez
Universidad Michoacana
Escuela de Ciencias Físico-Matemáticas
Edificio "B", ciudad universitaria
Morelia, Michoacán, Mexico
E-mail: elchavez@fismat.umich.mx

Maxime Crochemore
Université de Marne-la-Vallée
77454 Marne-la-Vallée CEDEX 2, France
E-mail: [email protected]

Cataloging-in-Publication Data applied for

A catalog record for this book is available from the Library of Congress.

Bibliographic information published by Die Deutsche Bibliothek
Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available in the Internet at <http://dnb.ddb.de>.

CR Subject Classification (1998): F.2.2, I.5.4, I.5.0, I.7.3, H.3.3, E.4, G.2.1

ISSN 0302-9743
ISBN 3-540-40311-6 Springer-Verlag Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag.
Violations are liable for prosecution under the German Copyright Law.

Springer-Verlag Berlin Heidelberg New York
a member of BertelsmannSpringer Science+Business Media GmbH
http://www.springer.de

© Springer-Verlag Berlin Heidelberg 2003
Printed in Germany

Typesetting: Camera-ready by author, data conversion by PTP-Berlin GmbH
Printed on acid-free paper
SPIN: 10927533 06/3142 543210

Preface

The papers contained in this volume were presented at the 14th Annual Symposium on Combinatorial Pattern Matching, held June 25–27, 2003 at the Centro Cultural Universitario of the Universidad Michoacana, in Morelia, Michoacán, Mexico. They were selected from 57 abstracts submitted in response to the call for papers. In addition, there were invited lectures by Vladimir Levenshtein, from the University of Bergen, Norway, and Ian Munro, from the University of Waterloo, Canada.

Combinatorial Pattern Matching (CPM) addresses issues of searching and matching strings and more complicated patterns such as trees, regular expressions, graphs, point sets, and arrays, in various formats. The goal is to derive nontrivial combinatorial properties of such structures and to exploit these properties in order to achieve superior performance for the corresponding computational problems. Another important goal is to analyze and pinpoint the properties and conditions under which searches cannot be performed efficiently.

Over the past decade a steady flow of high-quality research on this subject has changed a sparse set of isolated results into a full-fledged area of algorithmics. This area is continuing to grow even further due to the increasing demand for speed and efficiency that stems from important applications such as the World Wide Web, computational biology, computer vision, and multimedia systems. These involve requirements for information retrieval in heterogeneous databases, data compression, and pattern recognition.
The objective of the annual CPM gathering is to provide an international forum for research in combinatorial pattern matching and related applications.

The first 13 meetings were held in Paris, London, Tucson, Padova, Asilomar, Helsinki, Laguna Beach, Aarhus, Piscataway, Warwick, Montreal, Jerusalem, and Fukuoka, over the years 1990–2002. After the first meeting, a selection of papers appeared in Volume 92 of Theoretical Computer Science. Selected papers of the 10th meeting appeared as a special issue of the Journal of Discrete Algorithms. Selected papers of the 12th meeting will appear in a special issue of Discrete Applied Mathematics. The proceedings of the 3rd to 13th meetings appeared as Volumes 644, 684, 807, 937, 1075, 1264, 1448, 1645, 1848, 2089, and 2373 of the Springer LNCS series.

The general organization and orientation of the CPM conferences is coordinated by a steering committee composed of Alberto Apostolico (Universities of Padova and Purdue), Maxime Crochemore (University of Marne-la-Vallée and King’s College London), Zvi Galil (Columbia University), and Udi Manber (Amazon). The conference chair was Edgar Chávez (Universidad Michoacana).

April 2003
R. Baeza-Yates, E. Chávez, and M. Crochemore

Program Committee

Ricardo Baeza-Yates, co-chair, Univ. of Chile
Edgar Chávez, Univ. of Michoacán, Mexico
Richard Cole, New York University, USA
Maxime Crochemore, co-chair, Univ. of Marne-la-Vallée, France
Raffaele Giancarlo, Univ. of Palermo, Italy
Roberto Grossi, Univ. of Pisa, Italy
Dan Gusfield, U.C. Davis, USA
Costas Iliopoulos, King’s College London, UK
Joao Paulo Kitajima, Alellyx Applied Genomics, Brazil
Gad Landau, Univ. of Haifa, Israel
Thierry Lecroq, Univ. of Rouen, France
Udi Manber, Amazon, USA
Gonzalo Navarro, Univ. of Chile
Wojciech Plandowski, Warsaw University, Poland
Marie-France Sagot, INRIA Rhône-Alpes, France
Cenk Sahinalp, Case Western Reserve Univ., USA
Jeanette Schmidt, Incyte, USA
Ayumi Shinohara, Kyushu University, Japan
Kaizhong Zhang, Univ. of Western Ontario, Canada

Local Organization

Local arrangements and the conference Web site were coordinated by Edgar Chávez. Organizational help was provided by the School of Physics and Mathematics, University of Michoacán.

Sponsoring Institutions

The conference was sponsored by the Consejo Nacional de Ciencia y Tecnología (CONACyT) and the Universidad Michoacana.

Table of Contents

Multiple Genome Alignment: Chaining Algorithms Revisited
    Mohamed Ibrahim Abouelhoda, Enno Ohlebusch

Two-Dimensional Pattern Matching with Rotations
    Amihood Amir, Ayelet Butman, Maxime Crochemore, Gad M. Landau, Malka Schaps

An Improved Algorithm for Generalized Comparison of Minisatellites
    Behshad Behzadi, Jean-Marc Steyaert

Optimal Spaced Seeds for Hidden Markov Models, with Application to Homologous Coding Regions
    Broňa Brejová, Daniel G. Brown, Tomáš Vinař

Fast Lightweight Suffix Array Construction and Checking
    Stefan Burkhardt, Juha Kärkkäinen

Distributed and Paged Suffix Trees for Large Genetic Databases
    Raphaël Clifford, Marek Sergot

Analysis of Tree Edit Distance Algorithms
    Serge Dulucq, Hélène Touzet

An Exact and Polynomial Distance-Based Algorithm to Reconstruct Single Copy Tandem Duplication Trees
    Olivier Elemento, Olivier Gascuel

Average-Optimal Multiple Approximate String Matching
    Kimmo Fredriksson, Gonzalo Navarro

Optimal Partitions of Strings: A New Class of Burrows-Wheeler Compression Algorithms
    Raffaele Giancarlo, Marinella Sciortino

Haplotype Inference by Pure Parsimony
    Dan Gusfield

A Simpler 1.5-Approximation Algorithm for Sorting by Transpositions
    Tzvika Hartman

Efficient Data Structures and a New Randomized Approach for Sorting Signed Permutations by Reversals
    Haim Kaplan, Elad Verbin

Linear-Time Construction of Suffix Arrays
    Dong Kyue Kim, Jeong Seop Sim, Heejin Park, Kunsoo Park

Space Efficient Linear Time Construction of Suffix Arrays
    Pang Ko, Srinivas Aluru

Tuning String Matching for Huge Pattern Sets
    Jari Kytöjoki, Leena Salmela, Jorma Tarhio

Sparse LCS Common Substring Alignment
    Gad M. Landau, Baruch Schieber, Michal Ziv-Ukelson

On Minimizing Pattern Splitting in Multi-track String Matching
    Kjell Lemström, Veli Mäkinen

Alignment between Two Multiple Alignments
    Bin Ma, Zhuozhi Wang, Kaizhong Zhang

An Effective Algorithm for the Peptide De Novo Sequencing from MS/MS Spectrum
    Bin Ma, Kaizhong Zhang, Chengzhi Liang

Pattern Discovery in RNA Secondary Structure Using Affix Trees
    Giancarlo Mauri, Giulio Pavesi

More Efficient Left-to-Right Pattern Matching in Non-sequential Equational Programs
    Nadia Nedjah, Luiza de Macedo Mourelle

Complexities of the Centre and Median String Problems
    François Nicolas, Eric Rivals

Extracting Approximate Patterns
    Johann Pelfrêne, Saïd Abdeddaïm, Joël Alexandre

A Fully Linear-Time Approximation Algorithm for Grammar-Based Compression
    Hiroshi Sakamoto

Constrained Tree Inclusion
    Gabriel Valiente

Working on the Problem of Sorting by Transpositions on Genome Rearrangements
    Maria Emilia M.T. Walter, Luiz Reginaldo A.F. Curado, Adilton G. Oliveira

Efficient Selection of Unique and Popular Oligos for Large EST Databases
    Jie Zheng, Timothy J. Close, Tao Jiang, Stefano Lonardi

Author Index
