Lecture Notes in Computer Science 6060
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board

David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Germany
Madhu Sudan, Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany

Tapio Elomaa, Heikki Mannila, Pekka Orponen (Eds.)

Algorithms and Applications

Essays Dedicated to Esko Ukkonen on the Occasion of His 60th Birthday

Volume Editors

Tapio Elomaa Tampere University of Technology Department of Software Systems P. O. Box 553, 33101 Tampere, E-mail: [email protected].fi

Heikki Mannila School of Science and Technology Department of Information and Computer Science P.O. Box 17800, 00076 Aalto, Finland E-mail: heikki.mannila@aaltouniversity.fi

Pekka Orponen Aalto University School of Science and Technology Department of Information and Computer Science P.O. Box 15400, 00076 Aalto, Finland E-mail: pekka.orponen@tkk.fi

Cover illustration: Artwork by Jussi Ukkonen, Finland (2010)

Library of Congress Control Number: 2010924186

CR Subject Classification (1998): I.2, H.3, J.3, I.5, H.4-5, F.2

LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues

ISSN 0302-9743 ISBN-10 3-642-12475-5 Springer Berlin Heidelberg New York ISBN-13 978-3-642-12475-4 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

springer.com
© Springer-Verlag Berlin Heidelberg 2010
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper 06/3180

Esko Ukkonen (The photograph was taken by Joma Marstio 2010)

Preface

This Festschrift is dedicated to Esko Ukkonen on the occasion of his 60th birthday on January 26, 2010. It contains contributions by his former PhD students and by colleagues with whom he has cooperated closely during his career. The Festschrift was presented to Esko during a festive symposium organized to celebrate his birthday.

Esko Ukkonen has worked on many areas of computer science, including numerical methods, complexity theory, theoretical aspects of compiler construction, and logic programming. However, his main research interest over the years has been algorithms and their applications. Esko’s style of work has been to collaborate closely with scientists from other areas and to study their computational needs. From an understanding of the available data, the work progresses to the formulation of computational concepts, i.e., finding out what should be computed. The properties of the concepts are then analyzed, algorithms are designed and their behavior is analyzed, and the methods are implemented and taken to real applications. This style of work has been very successful throughout his career: Esko has formulated and analyzed many central concepts in computational data analysis. Combining applications and algorithms is also the central theme of the Center of Excellence, Algodan, directed by Esko.

Perhaps Esko Ukkonen’s most important scientific areas are computational pattern matching and string algorithms. He has contributed significantly to the development of these overlapping fields and has helped them to find their own identity. Most of the contributions in this volume concern computational pattern matching or string algorithms.

Esko Ukkonen has had a major role in the development of Finnish computer science. He was the key person in the development of the school of algorithmic research in Finland, and he has had a major role in PhD education. The editors of this volume are grateful to Esko for the insightful guidance that they received from him when they were his PhD students.

January 2010 Tapio Elomaa Heikki Mannila Pekka Orponen

Acknowledgements

We would like to thank everybody who contributed to this Festschrift: the authors for their interesting articles, the colleagues and PhD students who helped proofread the contributions, Greger Lindén for technical assistance, and Veli Mäkinen for organizing the seminar to honor Esko’s birthday.

Table of Contents

String Rearrangement Metrics: A Survey ...... 1
Amihood Amir and Avivit Levy

Maximal Words in Sequence Comparisons Based on Subword Composition ...... 34
Alberto Apostolico

Fast Intersection Algorithms for Sorted Sequences ...... 45
Ricardo Baeza-Yates and Alejandro Salinger

Indexing and Searching a Mass Spectrometry Database ...... 62
Søren Besenbacher, Benno Schwikowski, and Jens Stoye

Extended Compact Web Graph Representations ...... 77
Francisco Claude and Gonzalo Navarro

A Parallel Algorithm for Fixed-Length Approximate String-Matching with k-mismatches ...... 92
Maxime Crochemore, Costas S. Iliopoulos, and Solon P. Pissis

Covering Analysis of the Greedy Algorithm for Partial Cover ...... 102
Tapio Elomaa and Jussi Kujala

From Nondeterministic Suffix Automaton to Lazy Suffix Tree ...... 114
Kimmo Fredriksson

Clustering the Normalized Compression Distance for Influenza Virus Data ...... 130
Kimihito Ito, Thomas Zeugmann, and Yu Zhu

An Evolutionary Model of DNA Substring Distribution ...... 147
Meelis Kull, Konstantin Tretyakov, and Jaak Vilo

Indexing a Dictionary for Subset Matching Queries ...... 158
Gad M. Landau, Dekel Tsur, and Oren Weimann

Transposition and Time-Scale Invariant Geometric Music Retrieval ...... 170
Kjell Lemström

Unified View of Backward Backtracking in Short Read Mapping ...... 182
Veli Mäkinen, Niko Välimäki, Antti Laaksonen, and Riku Katainen

Some Applications of String Algorithms in Human-Computer Interaction ...... 196
Kari-Jouko Räihä

Approximate String Matching with Reduced Alphabet ...... 210
Leena Salmela and Jorma Tarhio

ICT4D: A Computer Science Perspective ...... 221
Erkki Sutinen and Matti Tedre

Searching for Linear Dependencies between Heart Magnetic Resonance Images and Lipid Profiles ...... 232
Marko Sysi-Aho, Juha Koikkalainen, Jyrki Lötjönen, Tuulikki Seppänen-Laakso, Hans Söderlund, Tiina Heliö, and Matej Orešič

The Support Vector Tree ...... 244
Antti Ukkonen

Author Index ...... 261