Lecture Notes in Computer Science 2676 Edited by G

Lecture Notes in Computer Science 2676
Edited by G. Goos, J. Hartmanis, and J. van Leeuwen

Berlin Heidelberg New York Barcelona Hong Kong London Milan Paris Tokyo

Ricardo Baeza-Yates, Edgar Chávez, Maxime Crochemore (Eds.)

Combinatorial Pattern Matching
14th Annual Symposium, CPM 2003
Morelia, Michoacán, Mexico, June 25-27, 2003
Proceedings

Series Editors
Gerhard Goos, Karlsruhe University, Germany
Juris Hartmanis, Cornell University, NY, USA
Jan van Leeuwen, Utrecht University, The Netherlands

Volume Editors
Ricardo Baeza-Yates
Universidad de Chile, Depto. de Ciencias de la Computación
Blanco Encalada 2120, Santiago 6511224, Chile
E-mail: [email protected]

Edgar Chávez
Universidad Michoacana, Escuela de Ciencias Físico-Matemáticas
Edificio "B", ciudad universitaria, Morelia, Michoacán, Mexico
E-mail: elchavez@fismat.umich.mx

Maxime Crochemore
Université de Marne-la-Vallée
77454 Marne-la-Vallée CEDEX 2, France
E-mail: [email protected]

Cataloging-in-Publication Data applied for
A catalog record for this book is available from the Library of Congress.
Bibliographic information published by Die Deutsche Bibliothek
Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available in the Internet at <http://dnb.ddb.de>.

CR Subject Classification (1998): F.2.2, I.5.4, I.5.0, I.7.3, H.3.3, E.4, G.2.1
ISSN 0302-9743
ISBN 3-540-40311-6 Springer-Verlag Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks.
Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

Springer-Verlag Berlin Heidelberg New York
a member of BertelsmannSpringer Science+Business Media GmbH
http://www.springer.de
© Springer-Verlag Berlin Heidelberg 2003
Printed in Germany
Typesetting: Camera-ready by author, data conversion by PTP-Berlin GmbH
Printed on acid-free paper
SPIN: 10927533 06/3142 543210

Preface

The papers contained in this volume were presented at the 14th Annual Symposium on Combinatorial Pattern Matching, held June 25–27, 2003 at the Centro Cultural Universitario of the Universidad Michoacana, in Morelia, Michoacán, Mexico. They were selected from 57 abstracts submitted in response to the call for papers. In addition, there were invited lectures by Vladimir Levenshtein, from the University of Bergen, Norway, and Ian Munro, from the University of Waterloo, Canada.

Combinatorial Pattern Matching (CPM) addresses issues of searching and matching strings and more complicated patterns such as trees, regular expressions, graphs, point sets, and arrays, in various formats. The goal is to derive nontrivial combinatorial properties of such structures and to exploit these properties in order to achieve superior performance for the corresponding computational problems. Another important goal is to analyze and pinpoint the properties and conditions under which searches cannot be performed efficiently.

Over the past decade a steady flow of high-quality research on this subject has changed a sparse set of isolated results into a full-fledged area of algorithmics.
This area is continuing to grow even further due to the increasing demand for speed and efficiency that stems from important applications such as the World Wide Web, computational biology, computer vision, and multimedia systems. These involve requirements for information retrieval in heterogeneous databases, data compression, and pattern recognition. The objective of the annual CPM gathering is to provide an international forum for research in combinatorial pattern matching and related applications.

The first 13 meetings were held in Paris, London, Tucson, Padova, Asilomar, Helsinki, Laguna Beach, Aarhus, Piscataway, Warwick, Montreal, Jerusalem, and Fukuoka, over the years 1990–2002. After the first meeting, a selection of papers appeared in Volume 92 of Theoretical Computer Science. Selected papers of the 10th meeting appeared as a special issue of the Journal of Discrete Algorithms. Selected papers of the 12th meeting will appear in a special issue of Discrete Applied Mathematics. The proceedings of the 3rd to 13th meetings appeared as Volumes 644, 684, 807, 937, 1075, 1264, 1448, 1645, 1848, 2089, and 2373 of the Springer LNCS series.

The general organization and orientation of the CPM conferences is coordinated by a steering committee composed of Alberto Apostolico (Universities of Padova and Purdue), Maxime Crochemore (University of Marne-la-Vallée and King’s College London), Zvi Galil (Columbia University), and Udi Manber (Amazon). The conference chair was Edgar Chávez (Universidad Michoacana).

April 2003
R. Baeza-Yates, E. Chávez, and M. Crochemore

Program Committee

Ricardo Baeza-Yates, co-chair, Univ. of Chile
Edgar Chávez, Univ. of Michoacán, Mexico
Richard Cole, New York University, USA
Maxime Crochemore, co-chair, Univ. of Marne-la-Vallée, France
Raffaele Giancarlo, Univ. of Palermo, Italy
Roberto Grossi, Univ. of Pisa, Italy
Dan Gusfield, U.C. Davis, USA
Costas Iliopoulos, King’s College London, UK
Joao Paulo Kitajima, Alellyx Applied Genomics, Brazil
Gad Landau, Univ. of Haifa, Israel
Thierry Lecroq, Univ. of Rouen, France
Udi Manber, Amazon, USA
Gonzalo Navarro, Univ. of Chile
Wojciech Plandowski, Warsaw University, Poland
Marie-France Sagot, INRIA Rhône-Alpes, France
Cenk Sahinalp, Case Western Reserve Univ., USA
Jeanette Schmidt, Incyte, USA
Ayumi Shinohara, Kyushu University, Japan
Kaizhong Zhang, Univ. of Western Ontario, Canada

Local Organization

Local arrangements and the conference Web site were coordinated by Edgar Chávez. Organizational help was provided by the School of Physics and Mathematics, University of Michoacán.

Sponsoring Institutions

The conference was sponsored by the Consejo Nacional de Ciencia y Tecnología (CONACyT) and the Universidad Michoacana.

Table of Contents

Multiple Genome Alignment: Chaining Algorithms Revisited
    Mohamed Ibrahim Abouelhoda, Enno Ohlebusch
Two-Dimensional Pattern Matching with Rotations
    Amihood Amir, Ayelet Butman, Maxime Crochemore, Gad M. Landau, Malka Schaps
An Improved Algorithm for Generalized Comparison of Minisatellites
    Behshad Behzadi, Jean-Marc Steyaert
Optimal Spaced Seeds for Hidden Markov Models, with Application to Homologous Coding Regions
    Broňa Brejová, Daniel G. Brown, Tomáš Vinař
Fast Lightweight Suffix Array Construction and Checking
    Stefan Burkhardt, Juha Kärkkäinen
Distributed and Paged Suffix Trees for Large Genetic Databases
    Raphaël Clifford, Marek Sergot
Analysis of Tree Edit Distance Algorithms
    Serge Dulucq, Hélène Touzet
An Exact and Polynomial Distance-Based Algorithm to Reconstruct Single Copy Tandem Duplication Trees
    Olivier Elemento, Olivier Gascuel
Average-Optimal Multiple Approximate String Matching
    Kimmo Fredriksson, Gonzalo Navarro
Optimal Partitions of Strings: A New Class of Burrows-Wheeler Compression Algorithms
    Raffaele Giancarlo, Marinella Sciortino
Haplotype Inference by Pure Parsimony
    Dan Gusfield
A Simpler 1.5-Approximation Algorithm for Sorting by Transpositions
    Tzvika Hartman
Efficient Data Structures and a New Randomized Approach for Sorting Signed Permutations by Reversals
    Haim Kaplan, Elad Verbin
Linear-Time Construction of Suffix Arrays
    Dong Kyue Kim, Jeong Seop Sim, Heejin Park, Kunsoo Park
Space Efficient Linear Time Construction of Suffix Arrays
    Pang Ko, Srinivas Aluru
Tuning String Matching for Huge Pattern Sets
    Jari Kytöjoki, Leena Salmela, Jorma Tarhio
Sparse LCS Common Substring Alignment
    Gad M. Landau, Baruch Schieber, Michal Ziv-Ukelson
On Minimizing Pattern Splitting in Multi-track String Matching
    Kjell Lemström, Veli Mäkinen
Alignment between Two Multiple Alignments
    Bin Ma, Zhuozhi Wang, Kaizhong Zhang
An Effective Algorithm for the Peptide De Novo Sequencing from MS/MS Spectrum
    Bin Ma, Kaizhong Zhang, Chengzhi Liang
Pattern Discovery in RNA Secondary Structure Using Affix Trees
    Giancarlo Mauri, Giulio Pavesi
More Efficient Left-to-Right Pattern Matching in Non-sequential Equational Programs
    Nadia Nedjah, Luiza de Macedo Mourelle
Complexities of the Centre and Median String Problems
    François Nicolas, Eric Rivals
Extracting Approximate Patterns
    Johann Pelfrêne, Saïd Abdeddaïm, Joël Alexandre
A Fully Linear-Time Approximation Algorithm for Grammar-Based Compression
    Hiroshi Sakamoto
Constrained Tree Inclusion
    Gabriel Valiente
Working on the Problem of Sorting by Transpositions on Genome Rearrangements
    Maria Emilia M.T. Walter, Luiz Reginaldo A.F. Curado, Adilton G. Oliveira
Efficient Selection of Unique and Popular Oligos for Large EST Databases
    Jie Zheng, Timothy J. Close, Tao Jiang, Stefano Lonardi
Author Index

Recommended publications
  • On the Suffix Automaton with Mismatches
    On the suffix automaton with mismatches
    Maxime Crochemore, Chiara Epifanio, Alessandra Gabriele, Filippo Mignosi
    To cite this version: Maxime Crochemore, Chiara Epifanio, Alessandra Gabriele, Filippo Mignosi. On the suffix automaton with mismatches. 12th International Conference on Implementation and Application of Automata (CIAA’07), 2007, Prague, Czech Republic. pp.144-156, 10.1007/978-3-540-76336-9_15. hal-00620159
    HAL Id: hal-00620159, https://hal-upec-upem.archives-ouvertes.fr/hal-00620159, submitted on 3 Oct 2016
    HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
    On the Suffix Automaton with mismatches
    Maxime Crochemore (1), Chiara Epifanio (2), Alessandra Gabriele (2), Filippo Mignosi (3)
    (1) Institut Gaspard-Monge, Université de Marne-la-Vallée, France and King's College London, UK, [email protected]
    (2) Dipartimento di Matematica e Applicazioni, Università di Palermo, Italy, (epifanio,sandra)@math.unipa.it
    (3) Dipartimento di Informatica, Università dell'Aquila, Italy, [email protected]
    Abstract. In this paper we focus on the construction of the minimal deterministic finite automaton S_k that recognizes the set of suffixes of a word w up to k errors. We present an algorithm that makes use of the automaton S_k in order to accept in an efficient way the language of all suffixes of w up to k errors in every window of size r, where r is the value of the repetition index of w.
  • Accelerating Dynamic Programming Oren Weimann
    Accelerating Dynamic Programming
    by Oren Weimann
    Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY, June 2009
    © Massachusetts Institute of Technology 2009. All rights reserved.
    Author: Department of Electrical Engineering and Computer Science, February 5, 2009
    Certified by: Erik D. Demaine, Associate Professor of Electrical Engineering and Computer Science, Thesis Supervisor
    Accepted by: Professor Terry P. Orlando, Chairman, Department Committee on Graduate Students
    Abstract
    Dynamic Programming (DP) is a fundamental problem-solving technique that has been widely used for solving a broad range of search and optimization problems. While DP can be invoked when more specialized methods fail, this generality often incurs a cost in efficiency. We explore a unifying toolkit for speeding up DP, and algorithms that use DP as subroutines. Our methods and results can be summarized as follows.
    – Acceleration via Compression. Compression is traditionally used to efficiently store data. We use compression in order to identify repeats in the table that imply a redundant computation. Utilizing these repeats requires a new DP, and often different DPs for different compression schemes. We present the first provable speedup of the celebrated Viterbi algorithm (1967) that is used for the decoding and training of Hidden Markov Models (HMMs). Our speedup relies on the compression of the HMM’s observable sequence.
  • Arxiv:0707.3619V21 [Cs.DS] 23 Nov 2013
    Semi-local string comparison: Algorithmic techniques and applications
    Alexander Tiskin (1)
    September 13, 2021
    arXiv:0707.3619v21 [cs.DS] 23 Nov 2013
    (1) Department of Computer Science, University of Warwick, Coventry CV4 7AL, United Kingdom. Research supported by the Centre for Discrete Mathematics and Its Applications (DIMAP), University of Warwick, and by the Royal Society Leverhulme Trust Senior Research Fellowship.
    Abstract
    A classical measure of string comparison is given by the longest common subsequence (LCS) problem on a pair of strings. We consider its generalisation, called the semi-local LCS problem, which arises naturally in many string-related problems. The semi-local LCS problem asks for the LCS scores for each of the input strings against every substring of the other input string, and for every prefix of each input string against every suffix of the other input string. Such a comparison pattern provides a much more detailed picture of string similarity than a single LCS score; it also arises naturally in many string-related problems. In fact, the semi-local LCS problem turns out to be fundamental for string comparison, providing a powerful and flexible alternative to classical dynamic programming. It is especially useful when the input to a string comparison problem may not be available all at once: for example, comparison of dynamically changing strings; comparison of compressed strings; parallel string comparison. The same approach can also be applied to permutation strings, providing efficient solutions for local versions of the longest increasing subsequence (LIS) problem, and for the problem of computing a maximum clique in a circle graph.
  • Locating Maximal Approximate Runs in a String Mika Amit, Maxime Crochemore, Gad Landau, Dina Sokol
    Locating maximal approximate runs in a string
    Mika Amit, Maxime Crochemore, Gad Landau, Dina Sokol
    To cite this version: Mika Amit, Maxime Crochemore, Gad Landau, Dina Sokol. Locating maximal approximate runs in a string. Theoretical Computer Science, Elsevier, 2017, 700, pp.45-62. 10.1016/j.tcs.2017.07.021. hal-01771696
    HAL Id: hal-01771696, https://hal.archives-ouvertes.fr/hal-01771696, submitted on 19 Apr 2018
    Locating Maximal Approximate Runs in a String
    Mika Amit (1), Maxime Crochemore (3,4), Gad M. Landau (1,2), and Dina Sokol (5)
    (1) Department of Computer Science, University of Haifa, Mount Carmel, Haifa, Israel, [email protected], [email protected]
    (2) Department of Computer Science and Engineering, NYU Polytechnic School of Engineering, New York University, Brooklyn, NY, USA
    (3) King's College London, Strand, London WC2R 2LS, UK, [email protected]
    (4) Université Paris-Est, Institut Gaspard-Monge, 77454 Marne-la-Vallée Cedex 2, France
    (5) Department of Computer and Information Science, Brooklyn College of the City University of New York, Brooklyn NY, USA, [email protected]
    May 20, 2017
    Abstract
    An exact run in a string T is a non-empty substring of T that is a repetition of a smaller substring.
  • Algorithmic Contributions to Computational Molecular Biology Stéphane Vialette
    Algorithmic Contributions to Computational Molecular Biology
    Stéphane Vialette
    To cite this version: Stéphane Vialette. Algorithmic Contributions to Computational Molecular Biology. Data Structures and Algorithms [cs.DS]. Université Paris-Est, 2010. tel-00862069
    HAL Id: tel-00862069, https://tel.archives-ouvertes.fr/tel-00862069, submitted on 23 Sep 2013
    UNIVERSITÉ PARIS-EST MARNE-LA-VALLÉE, INSTITUT GASPARD MONGE
    Habilitation à Diriger des Recherches, presented by Stéphane VIALETTE
    Algorithmic Contributions to Computational Molecular Biology
    Publicly defended on 1 June 2010 before a jury composed of:
    Marie-Pierre Béal, Professor, Université Paris-Est, Examiner
    Christian Choffrut, Professor, Université Paris 7, Examiner
    Maxime Crochemore, Professor, Université Paris-Est, Examiner
    Alain Denise, Professor, Université Paris-Sud 11, Examiner
    Gregory Kucherov, CNRS Research Director, Reviewer
    András Sebő, CNRS Research Director, Reviewer
    Contents
    I Structures: from 2-intervals to annotated sequences ... through permutations
    1 Algorithmic aspects of 2-interval sets
    1.1 Introduction
    1.2 Bestiary and definitions
    1.3 Recognizing multidimensional interval graphs
    1.4 Combinatorial problems on 2-intervals
    2 From linear graphs to permutations
    2.1 Introduction
  • Dagrep V001 I002 P047 S11081
    Report from Dagstuhl Seminar 11081
    Combinatorial and Algorithmic Aspects of Sequence Processing
    Edited by Maxime Crochemore (1), Lila Kari (2), Mehryar Mohri (3), and Dirk Nowotka (4)
    (1) King’s College London, GB, [email protected]
    (2) University of Western Ontario, London, CA, [email protected]
    (3) New York University, US, [email protected]
    (4) University of Stuttgart, DE, [email protected]
    Abstract
    Sequences form the most basic and natural data structure. They occur whenever information is electronically transmitted (as bit streams), when natural language text is spoken or written down (as words over, for example, the Latin alphabet), in the process of heredity transmission in living cells (as DNA sequence) or the protein synthesis (as sequence of amino acids), and in many more different contexts. Given this universal form of representing information, the need to process strings is apparent and actually a core purpose of computer use. Algorithms to efficiently search through, analyze, (de-)compress, match, learn, and encode/decode strings are therefore of chief interest. Combinatorial problems about strings lie at the core of such algorithmic questions. Many such combinatorial problems are common in the string processing efforts in the different fields of application. Scientists working in the fields of Combinatorics on Words, Computational Biology, Stringology, Natural Computing, and Machine Learning were invited to consider the seminar’s topic from a wide range of perspectives. This report documents the program and the outcomes of Dagstuhl
  • Combinatorial Algorithms for DNA Sequence Assembly
    Combinatorial algorithms for DNA sequence assembly
    John D. Kececioglu, Eugene W. Myers
    TR 92-37. Revised October 20, 1992; revised January 15, 1993
    Abstract
    The trend towards very large DNA sequencing projects, such as those being undertaken as part of the human genome initiative, necessitates the development of efficient and precise algorithms for assembling a long DNA sequence from the fragments obtained by shotgun sequencing or other methods. The sequence reconstruction problem that we take as our formulation of DNA sequence assembly is a variation of the shortest common superstring problem, complicated by the presence of sequencing errors and reverse complements of fragments. Since the simpler superstring problem is NP-hard, any efficient reconstruction procedure must resort to heuristics. In this paper, however, a four phase approach based on rigorous design criteria is presented, and has been found to be very accurate in practice. Our method is robust in the sense that it can accommodate high sequencing error rates and list a series of alternate solutions in the event that several appear equally good. Moreover it uses a limited form of multiple sequence alignment to detect, and often correct, errors in the data. Our combined algorithm has successfully reconstructed non-repetitive sequences of length 50,000 sampled at error rates of as high as 10 percent.
    Computer Science Department, University of California at Davis, Davis, California 95616. E-mail: [email protected]
    Department of Computer Science, University of Arizona, Tucson, AZ 85721. E-mail: [email protected]
    Keywords: Computational biology, branch and bound algorithms, approximation algorithms, fragment assembly, sequence reconstruction
    Research supported by the National Library of Medicine under Grant R01 LM4960, by a postdoctoral fellowship from the Program in Mathematics and Molecular Biology of the University of California at Berkeley under National Science Foundation Grant DMS–8720208, and by a fellowship from the Centre de recherches mathématiques of the Université de Montréal.
  • IBM Research Report a PQ Tree-Based Framework For
    RC23837 (W0512-116), December 23, 2005, Computer Science
    IBM Research Report
    A PQ Tree-based Framework for Reconstructing Common Ancestor under Inversions and Translocations
    Laxmi Parida
    IBM Research Division, Thomas J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598
    Research Division: Almaden - Austin - Beijing - Haifa - India - T. J. Watson - Tokyo - Zurich
    LIMITED DISTRIBUTION NOTICE: This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). Copies may be requested from IBM T. J. Watson Research Center, P. O. Box 218, Yorktown Heights, NY 10598 USA (email: [email protected]). Some reports are available on the internet at http://domino.watson.ibm.com/library/CyberDig.nsf/home .
    A PQ Tree-based Framework for Reconstructing Common Ancestors Under Inversions & Translocations
    Laxmi Parida
    Computational Biology Center, IBM TJ Watson Research Center, Yorktown Heights, New York 10598, USA, [email protected]
    Abstract. Various international efforts are underway to catalog the genomic similarities and variations in the human population. Some key discoveries such as inversions and translocations within the members of the species have been made in the last few years. The task of constructing a correct genealogy tree of the members of the same species, given this knowledge and data, is an important problem.
  • CPM's 20Th Anniversary: a Statistical Retrospective
    CPM’s 20th Anniversary: A Statistical Retrospective
    Elena Yavorska Harris (1), Thierry Lecroq (2), Gregory Kucherov (3), and Stefano Lonardi (1)
    (1) Dept. of Computer Science – University of California – Riverside, CA, USA
    (2) University of Rouen, LITIS EA 4108, 76821 Mont-Saint-Aignan Cedex, France
    (3) CNRS (LIFL, Lille and J.-V.Poncelet Lab, Moscow) and INRIA Lille – Nord Europe
    1 Introduction
    This year the Annual Symposium on Combinatorial Pattern Matching (CPM) celebrates its 20th anniversary. Over the last two decades the Symposium has established itself as the most recognized international forum for research in combinatorial pattern matching and related applications. Contributions to the conference typically address issues of searching and matching strings and more complex patterns such as trees, regular expressions, graphs, point sets, and arrays. Advances in this field rely on the ability to expose combinatorial properties of the computational problem at hand and to exploit these properties in order to either achieve superior performance or identify conditions under which searches cannot be performed efficiently. The meeting also deals with combinatorial problems in computational biology, data compression, data mining, coding, information retrieval, natural language processing and pattern recognition.
    The first edition of CPM was held in Paris in July 1990, and gathered about thirty participants. Since then the conference has been held every year, usually in June or July. Thirteen countries, over three continents, have hosted it (see Table 1). The “seed” of CPM can be traced back to a NATO-ASI Workshop in Maratea, Italy organized by Z. Galil and A. Apostolico. The volume collecting the contributions presented at the workshop [1] defined perhaps for the first time the scope of this research area, sometimes referred to as “stringology”.
  • 29Th Annual Symposium on Combinatorial Pattern Matching
    29th Annual Symposium on Combinatorial Pattern Matching
    CPM 2018, July 2–4, 2018, Qingdao, China
    Edited by Gonzalo Navarro, David Sankoff, Binhai Zhu
    LIPIcs – Vol. 105 – CPM2018, www.dagstuhl.de/lipics
    Editors
    Gonzalo Navarro, Department of Computer Science, University of Chile, Chile, [email protected]
    David Sankoff, Department of Math and Statistics, University of Ottawa, Canada, [email protected]
    Binhai Zhu, Gianforte School of Computing, Montana State University, USA, [email protected]
    ACM Classification 2012: Mathematics of computing → Discrete mathematics, Mathematics of computing → Information theory, Information systems → Information retrieval, Theory of computation → Design and analysis of algorithms, Applied computing → Computational biology
    ISBN 978-3-95977-074-3
    Published online and open access by Schloss Dagstuhl – Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing, Saarbrücken/Wadern, Germany. Online available at http://www.dagstuhl.de/dagpub/978-3-95977-074-3.
    Publication date: May, 2018
    Bibliographic information published by the Deutsche Nationalbibliothek: The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available in the Internet at http://dnb.d-nb.de.
    License: This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC-BY 3.0): http://creativecommons.org/licenses/by/3.0/legalcode. In brief, this license authorizes each and everybody to share (to copy, distribute and transmit) the work under the following conditions, without impairing or restricting the authors’ moral rights:
    Attribution: The work must be attributed to its authors.
    The copyright is retained by the corresponding authors.
    Digital Object Identifier: 10.4230/LIPIcs.CPM.2018.0
    ISBN 978-3-95977-074-3, ISSN 1868-8969, http://www.dagstuhl.de/lipics
    LIPIcs – Leibniz International Proceedings in Informatics
    LIPIcs is a series of high-quality conference proceedings across all fields in informatics.
  • Algorithms and Data Structures for Grammar-Compressed Strings
    Algorithms and data structures for grammar-compressed strings
    Cording, Patrick Hagge
    Publication date: 2015
    Document Version: Publisher's PDF, also known as Version of record
    Link back to DTU Orbit
    Citation (APA): Cording, P. H. (2015). Algorithms and data structures for grammar-compressed strings. Technical University of Denmark. DTU Compute PHD-2014 No. 357
    General rights
    Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.
    Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
    You may not further distribute the material or use it for any profit-making activity or commercial gain.
    You may freely distribute the URL identifying the publication in the public portal.
    If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.
    ALGORITHMS AND DATA STRUCTURES FOR GRAMMAR-COMPRESSED STRINGS
    Patrick Hagge Cording
    PHD-2014-348
    Technical University of Denmark, Department of Applied Mathematics and Computer Science
    Richard Petersens Plads, Building 324, 2800 Kongens Lyngby, Denmark
    Phone +45 4525 3031, [email protected], www.compute.dtu.dk
    PHD-2014-357, ISSN: 0909-3192
    PREFACE
    This doctoral dissertation was prepared at the Department of Applied Mathematics and Computer Science at the Technical University of Denmark in partial fulfilment of the requirements for acquiring a doctoral degree.
  • Computing the Burrows–Wheeler Transform in Place and in Small Space Maxime Crochemore, Roberto Grossi, Juha Kärkkäinen, Gad Landau
    Computing the Burrows–Wheeler Transform in Place and in Small Space
    Maxime Crochemore, Roberto Grossi, Juha Kärkkäinen, Gad Landau
    To cite this version: Maxime Crochemore, Roberto Grossi, Juha Kärkkäinen, Gad Landau. Computing the Burrows–Wheeler Transform in Place and in Small Space. Journal of Discrete Algorithms, Elsevier, 2015. hal-01806295
    HAL Id: hal-01806295, https://hal-upec-upem.archives-ouvertes.fr/hal-01806295, submitted on 1 Jun 2018
    Computing the Burrows–Wheeler Transform in Place and in Small Space
    Maxime Crochemore, King’s College London, UK
    Roberto Grossi, Dipartimento di Informatica, Università di Pisa, Italy
    Juha Kärkkäinen, Department of Computer Science, University of Helsinki, Finland
    Gad M. Landau, Department of Computer Science, University of Haifa, Israel, and Department of Computer Science and Engineering, NYU-Poly, Brooklyn NY, USA
    Abstract
    We introduce the problem of computing the Burrows–Wheeler Transform (BWT) using small additional space. Our in-place algorithm does not need the explicit storage for the suffix sort array and the output array, as typically required in previous work. It relies on the combinatorial properties of the BWT, and runs in O(n^2) time in the comparison model using O(1) extra memory cells, apart from the array of n cells storing the n characters of the input text.