Pattern Matching Algorithms This Page Intentionally Left Blank Pattern Matching Algorithms

Total Page:16

File Type:pdf, Size:1020Kb

Pattern Matching Algorithms This Page Intentionally Left Blank Pattern Matching Algorithms Pattern Matching Algorithms This page intentionally left blank Pattern Matching Algorithms Edited by Alberto Apostolico Zvi Galil New York Oxford Oxford University Press 1997 Oxford University Press Oxford New York Athens Auckland Bangkok Bogota Bombay Buenos Aires Calcutta Cape Town Dar es Salaam Delhi Florence Hong Kong Istanbul Karachi Kuala Lumpur Madras Madrid Melbourne Mexico City Nairobi Paris Singapore Taipei Tokyo Toronto and associated companies in Berlin Ibadan Copyright © 1997 by Alberto Apostolico & Zvi Galil Published by Oxford University Press, Inc. 198 Madison Avenue, New York, New York 10016 Oxford is a registered trademark of Oxford University Press All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without prior permission of Oxford University Press. Library of Congress Cataloging-in-Publication Data Apostolico, Alberto, 1948- Pattern matching algorithms / edited by A. Apostolico, Zvi Galil. p. cm. Includes bibliographical references and index. ISBN 0-19-511367-5 1. Computer algorithms. 2. Combinatorial analysis. I. Galil, Zvi. II. Title. QA76.9.A43A66 1997 006.4—dc21 96-49602 35798642 Printed in the United States of America on acid-free paper Preface Issues of matching and searching on elementary discrete structures arise pervasively in Computer Science as well as in many of its applications. Their relevance may be expected to grow even further in the near future as information is amassed, disseminated and shared at an increasing pace. A number of algorithms have been discovered in recent years in response to these needs. Along the way, a number of combinatorial structures and tools have been exposed that often carry intrinsic value and interest. The main ideas that developed in this process concurred to distill the scope and flavor of Pattern Matching, now an established specialty of Algorithmics. The lineage of Pattern Matching may be traced back for more than two decades, to a handful of papers such as "Rapid identification of repeated patterns in strings, trees, and arrays", by R.M. Karp, R.E. Miller and A.L. Rosenberg, "Automata theory can be useful", by D.E. Knuth and V.R. Pratt, and, in a less direct way, S.A. Cook's "Linear time simulation of de- terministic two-way pushdown automata". The outgrowth of these powerful seeds is impressive: I. Simon lists over 350 titles in the current version of his bibliography on string algorithms; A. V. Aho references over 140 pa- pers in his recent survey of string-searching algorithms alone; and advanced workshops and schools, books and special issues of major journals have been already dedicated to the subject and more are planned for the future. This book attempts a snapshot of the current state of the art in Pattern Matching as seen by specialists who have devoted several years of their study to the field. Without pretending to be exhaustive, the book does cover most basic principles and presents material advanced enough to portray faithfully the current frontier of research in the field. Our intention was to combine a textbook suitable for a graduate or advanced level course with an in-depth source for the specialist as well as the neophyte. The appearance of this book marks, somewhat intentionally, the Tenth anniversary since "Combinatorial Algorithms on Words". We wish it an equally favorable reception, and possibly even more opportunities to serve, through the forthcoming years, the growth of its useful and fascinating sub- ject. a.a. and z.g. This page intentionally left blank Contributors Amihood Amir, Computer Science Department, Georgia Institute of Technology, Atlanta, GA 30332, USA Alberto Apostolico, Department of Computer Sciences, Purdue Uni- versity, CS Building, West Lafayette, IN 47907, USA and Dipartimento di Elettronica e Informatica, Universita di Padova, Via Gradenigo 6/A, 35131 Padova, Italy Mikhail J. Atallah, Department of Computer Sciences, Purdue Univer- sity, CS Building, West Lafayette, IN 47907, USA Maxima Crochemore, Institut Gaspard Monge, Universite de Marne-la- Vallee, 2 rue de la Butte Verte, F-93160 Noisy-le-Grand, France. Martin Farach, DIM ACS, Rutgers University, CoRE Building, Busch Campus, P.O. 1179, Piscataway, NJ 08855-1179, USA Zvi Galil, Department of Computer Science, Columbia University, New York, NY 10027, USA and Department of Computer Science, School of Mathematical Sciences, Tel Aviv University, Tel Aviv, 69978 Israel Raffaele Giancarlo, Dipartimento di Matematica ed Applicazioni, Uni- versita di Palermo, Via Archirafi 34, 90123 Palermo, Italy Roberto Grossi, Dipartimento di Sistemi e Informatica, Universita di Firenze, Via Lombroso 6/17, 50134 Firenze, Italy Daniel S. Hirschberg, Department of Information and Computer Sci- ence, University of California at Irvine, Irvine, CA 92717, USA Tao Jiang, Department of Computer Science, McMaster University, Hamil- ton, Ontario L8S 4K1, Canada Gad M. Landau, Department of Computer Science, Polytechnic Univer- sity, 6 MetroTech, Brooklyn, NY 11201, USA Ming Li, Department of Computer Science, University of Waterloo, Wa- terloo, Ontario N2L 3G1, Canada Dennis Shasha, Courant Institute of Mathematical Science, New York University, 251 Mercer Street, New York, NY 10012, USA Uzi Vishkin, Institute for Advanced Computer Studies and Electrical En- gineering Department, University of Maryland, College Park, MD 20742, USA. and Department of Computer Science, School of Mathematical Sci- ences, Tel Aviv University, Tel Aviv, 69978 Israel Ilan Yudkiewicz, Department of Computer Science, School of Mathemat- ical Sciences, Tel Aviv University, Tel Aviv, 69978 Israel Kaizhong Zhang, Department of Computer Science, University of West- ern Ontario, London, Ontario N6A 5B7, Canada Contents 1 Off-line Serial Exact String Searching 1 M. Crochemore 1.1 Searching for strings in texts 1 1.2 Sequential searches 5 1.2.1 String-Matching Automaton 5 1.2.2 Forward Prefix scan 7 1.2.3 Forward Subword scan 19 1.3 Practically fast searches 24 1.3.1 Reverse-Suffix scan 24 1.3.2 Reverse-Prefix scan 32 1.4 Space-economical methods 35 1.4.1 Constant-space searching algorithm 36 1.4.2 Computing a critical factorization 40 1.4.3 Computing the period of the pattern 42 1.5 Heuristics for practical implementations 44 1.6 Exercises 46 1.7 Bibliographic notes 47 2 Off-line Parallel Exact String Searching 55 Z. Galil and I. Yudkiewicz 2.1 Preliminaries 55 2.2 Application of sequential techniques 57 2.3 Periodicities and witnesses 59 2.3.1 Witnesses 61 2.3.2 Computing and using witnesses in O(log log m) time 64 2.3.3 A Lower Bound for the CRCW-PRAM 64 2.4 Deterministic Samples 67 2.4.1 Faster text search using DS 70 2.5 The fastest known CRCW-PRAM algorithm 71 2.5.1 Computing deterministic samples in con- stant time 72 2.5.2 Applying DS to obtain the fastest algo- rithms 74 2.5.3 A new method to compute witnesses : Pseudo-periods 74 2.5.4 A constant expected time algorithm 78 2.6 Fast optimal algorithms on weaker models 78 2.6.1 Optimal algorithm on the EREW- PRAM 80 2.7 Two dimensional pattern matching 82 2.8 Conclusion and open questions 84 2.9 Exercises 85 2.10 Bibliographic notes 85 3 On-line String Searching 89 A. Apostolico 3.1 Subword trees 89 3.2 McCreight's algorithm 93 3.3 Storing suffix trees 96 3.4 Building suffix trees in parallel 97 3.4.1 Preprocessing 98 3.4.2 Structuring Dx 99 3.4.3 Refining Dx 100 3.4.4 Reversing the edges 105 3.5 Parallel on-line search 106 3.6 Exhaustive on-line searches 108 3.6.1 Lexicographic lists 111 3.6.2 Building lexicographic lists 113 3.6.3 Standard representations for constant- time queries 117 3.7 Exercises 118 3.8 Bibliographic notes 119 4 Serial Computations of Levenshtein Distances 123 D.S. Hirschberg 4.1 Levenshtein distance and the LCS problem 123 4.2 Classical algorithm 124 4.3 Non-parameterized algorithms 125 4.3.1 Linear space algorithm 125 4.3.2 Sub quadratic algorithm 126 4.3.3 Lower bounds 129 4.4 Parameterized algorithms 130 4.4.1 pn algorithm 130 4.4.2 Hunt-Szymanski algorithm 132 4.4.3 nD algorithm 134 4.4.4 Myers algorithm 134 4.5 Exercises 137 4.6 Bibliographic notes 138 5 Parallel Computations of Levenshtein Distances 143 A. Apostolico and M.J. Atallah 5.1 Edit distances and shortest paths 144 5.2 A monotonicity property and its use 146 5.3 A sequential algorithm for tube minima 151 5.4 Optimal EREW-PRAM algorithm for tube min- ima 151 5.4.1 EREW-PRAM computation of row min- ima 152 5.4.2 The tube minima bound 168 5.4.3 Implications for string editing 170 5.5 Optimal CRCW-PRAM algorithm for tube min- ima 170 5.5.1 A preliminary algorithm 171 5.5.2 Decreasing the work done 178 5.5.3 Further remarks 179 5.6 Exercises 180 5.7 Bibliographic notes 181 6 Approximate String Searching 185 G.M. Landau and U. Vishkin 6.1 The serial algorithm 187 6.1.1 The dynamic programming algorithm. 187 6.1.2 An alternative dynamic programming com- putation 188 6.1.3 The efficient algorithm 190 6.2 The parallel algorithm 191 6.3 An algorithm for the LCA problem 193 6.3.1 Reducing the LCA problem to a restricted- domain range-minima problem 194 6.3.2 A simple sequential LCA algorithm 194 6.4 Exercises 196 6.5 Bibliographic notes 196 7 Dynamic Programming: Special Cases 201 R. Giancarlo 7.1 Preliminaries 201 7.2 Convexity and concavity 203 7.2.1 Monge conditions 203 7.2.2 Matrix searching 204 7.3 The one-dimensional case (ID/ID) 206 7.3.1 A stack or a queue 207 7.3.2 The effects of SMAWK 208 7.4 The two-dimensional case 211 7.4.1 A coarse divide-and-conquer 212 7.4.2 A fine divide-and-conquer 213 7.5 Sparsity 215 7.5.1 The sparse 2D/0D problem 216 7.5.2 The sparse 2D/2D problem 220 7.6 Applications 222 7.6.1 Sequence alignment 222 7.6.2 RNA secondary structure 226 7.7 Exercises 231 7.8 Bibliographic notes 232 8 Shortest Common Superstrings 237 M.
Recommended publications
  • Pattern Matching Using Similarity Measures
    Pattern matching using similarity measures Patroonvergelijking met behulp van gelijkenismaten (met een samenvatting in het Nederlands) PROEFSCHRIFT ter verkrijging van de graad van doctor aan de Universiteit Utrecht op gezag van Rector Magnificus, Prof. Dr. H. O. Voorma, ingevolge het besluit van het College voor Promoties in het openbaar te verdedigen op maandag 18 september 2000 des morgens te 10:30 uur door Michiel Hagedoorn geboren op 13 juli 1972, te Renkum promotor: Prof. Dr. M. H. Overmars Faculteit Wiskunde & Informatica co-promotor: Dr. R. C. Veltkamp Faculteit Wiskunde & Informatica ISBN 90-393-2460-3 PHILIPS '$ The&&% research% described in this thesis has been made possible by financial support from Philips Research Laboratories. The work in this thesis has been carried out in the graduate school ASCI. Contents 1 Introduction 1 1.1Patternmatching.......................... 1 1.2Applications............................. 4 1.3Obtaininggeometricpatterns................... 7 1.4 Paradigms in geometric pattern matching . 8 1.5Similaritymeasurebasedpatternmatching........... 11 1.6Overviewofthisthesis....................... 16 2 A theory of similarity measures 21 2.1Pseudometricspaces........................ 22 2.2Pseudometricpatternspaces................... 30 2.3Embeddingpatternsinafunctionspace............. 40 2.4TheHausdorffmetric........................ 46 2.5Thevolumeofsymmetricdifference............... 54 2.6 Reflection visibility based distances . 60 2.7Summary.............................. 71 2.8Experimentalresults.......................
    [Show full text]
  • Tarjan Transcript Final with Timestamps
    A.M. Turing Award Oral History Interview with Robert (Bob) Endre Tarjan by Roy Levin San Mateo, California July 12, 2017 Levin: My name is Roy Levin. Today is July 12th, 2017, and I’m in San Mateo, California at the home of Robert Tarjan, where I’ll be interviewing him for the ACM Turing Award Winners project. Good afternoon, Bob, and thanks for spending the time to talk to me today. Tarjan: You’re welcome. Levin: I’d like to start by talking about your early technical interests and where they came from. When do you first recall being interested in what we might call technical things? Tarjan: Well, the first thing I would say in that direction is my mom took me to the public library in Pomona, where I grew up, which opened up a huge world to me. I started reading science fiction books and stories. Originally, I wanted to be the first person on Mars, that was what I was thinking, and I got interested in astronomy, started reading a lot of science stuff. I got to junior high school and I had an amazing math teacher. His name was Mr. Wall. I had him two years, in the eighth and ninth grade. He was teaching the New Math to us before there was such a thing as “New Math.” He taught us Peano’s axioms and things like that. It was a wonderful thing for a kid like me who was really excited about science and mathematics and so on. The other thing that happened was I discovered Scientific American in the public library and started reading Martin Gardner’s columns on mathematical games and was completely fascinated.
    [Show full text]
  • Pattern Matching Using Fuzzy Methods David Bell and Lynn Palmer, State of California: Genetic Disease Branch
    Pattern Matching Using Fuzzy Methods David Bell and Lynn Palmer, State of California: Genetic Disease Branch ABSTRACT The formula is : Two major methods of detecting similarities Euclidean Distance and 2 a "fuzzy Hamming distance" presented in a paper entitled "F %%y Dij : ; ( <3/i - yj 4 Hamming Distance: A New Dissimilarity Measure" by Bookstein, Klein, and Raita, will be compared using entropy calculations to Euclidean distance is especially useful for comparing determine their abilities to detect patterns in data and match data matching whole word object elements of 2 vectors. The records. .e find that both means of measuring distance are useful code in Java is: depending on the context. .hile fuzzy Hamming distance outperforms in supervised learning situations such as searches, the Listing 1: Euclidean Distance Euclidean distance measure is more useful in unsupervised pattern matching and clustering of data. /** * EuclideanDistance.java INTRODUCTION * * Pattern matching is becoming more of a necessity given the needs for * Created: Fri Oct 07 08:46:40 2002 such methods for detecting similarities in records, epidemiological * * @author David Bell: DHS-GENETICS patterns, genetics and even in the emerging fields of criminal * @version 1.0 behavioral pattern analysis and disease outbreak analysis due to */ possible terrorist activity. 0nfortunately, traditional methods for /** Abstact Distance class linking or matching records, data fields, etc. rely on exact data * @param None matches rather than looking for close matches or patterns. 1f course * @return Distance Template proximity pattern matches are often necessary when dealing with **/ messy data, data that has inexact values and/or data with missing key abstract class distance{ values.
    [Show full text]
  • CSCI 2041: Pattern Matching Basics
    CSCI 2041: Pattern Matching Basics Chris Kauffman Last Updated: Fri Sep 28 08:52:58 CDT 2018 1 Logistics Reading Assignment 2 I OCaml System Manual: Ch I Demo in lecture 1.4 - 1.5 I Post today/tomorrow I Practical OCaml: Ch 4 Next Week Goals I Mon: Review I Code patterns I Wed: Exam 1 I Pattern Matching I Fri: Lecture 2 Consider: Summing Adjacent Elements 1 (* match_basics.ml: basic demo of pattern matching *) 2 3 (* Create a list comprised of the sum of adjacent pairs of 4 elements in list. The last element in an odd-length list is 5 part of the return as is. *) 6 let rec sum_adj_ie list = 7 if list = [] then (* CASE of empty list *) 8 [] (* base case *) 9 else 10 let a = List.hd list in (* DESTRUCTURE list *) 11 let atail = List.tl list in (* bind names *) 12 if atail = [] then (* CASE of 1 elem left *) 13 [a] (* base case *) 14 else (* CASE of 2 or more elems left *) 15 let b = List.hd atail in (* destructure list *) 16 let tail = List.tl atail in (* bind names *) 17 (a+b) :: (sum_adj_ie tail) (* recursive case *) The above function follows a common paradigm: I Select between Cases during a computation I Cases are based on structure of data I Data is Destructured to bind names to parts of it 3 Pattern Matching in Programming Languages I Pattern Matching as a programming language feature checks that data matches a certain structure the executes if so I Can take many forms such as processing lines of input files that match a regular expression I Pattern Matching in OCaml/ML combines I Case analysis: does the data match a certain structure I Destructure Binding: bind names to parts of the data I Pattern Matching gives OCaml/ML a certain "cool" factor I Associated with the match/with syntax as follows match something with | pattern1 -> result1 (* pattern1 gives result1 *) | pattern2 -> (* pattern 2..
    [Show full text]
  • Compiling Pattern Matching to Good Decision Trees
    Submitted to ML’08 Compiling Pattern Matching to good Decision Trees Luc Maranget INRIA Luc.marangetinria.fr Abstract In this paper we study compilation to decision tree, whose We address the issue of compiling ML pattern matching to efficient primary advantage is never testing a given subterm of the subject decisions trees. Traditionally, compilation to decision trees is op- value more than once (and whose primary drawback is potential timized by (1) implementing decision trees as dags with maximal code size explosion). Our aim is to refine naive compilation to sharing; (2) guiding a simple compiler with heuristics. We first de- decision trees, and to compare the output of such an optimizing sign new heuristics that are inspired by necessity, a notion from compiler with optimized backtracking automata. lazy pattern matching that we rephrase in terms of decision tree se- Compilation to decision can be very sensitive to the testing mantics. Thereby, we simplify previous semantical frameworks and order of subject value subterms. The situation can be explained demonstrate a direct connection between necessity and decision by the example of an human programmer attempting to translate a ML program into a lower-level language without pattern matching. tree runtime efficiency. We complete our study by experiments, 1 showing that optimized compilation to decision trees is competi- Let f be the following function defined on triples of booleans : tive. We also suggest some heuristics precisely. l e t f x y z = match x,y,z with | _,F,T -> 1 Categories and Subject Descriptors D 3. 3 [Programming Lan- | F,T,_ -> 2 guages]: Language Constructs and Features—Patterns | _,_,F -> 3 | _,_,T -> 4 General Terms Design, Performance, Sequentiality.
    [Show full text]
  • Combinatorial Pattern Matching
    Combinatorial Pattern Matching 1 A Recurring Problem Finding patterns within sequences Variants on this idea Finding repeated motifs amoungst a set of strings What are the most frequent k-mers How many time does a specific k-mer appear Fundamental problem: Pattern Matching Find all positions of a particular substring in given sequence? 2 Pattern Matching Goal: Find all occurrences of a pattern in a text Input: Pattern p = p1, p2, … pn and text t = t1, t2, … tm Output: All positions 1 < i < (m – n + 1) such that the n-letter substring of t starting at i matches p def bruteForcePatternMatching(p, t): locations = [] for i in xrange(0, len(t)-len(p)+1): if t[i:i+len(p)] == p: locations.append(i) return locations print bruteForcePatternMatching("ssi", "imissmissmississippi") [11, 14] 3 Pattern Matching Performance Performance: m - length of the text t n - the length of the pattern p Search Loop - executed O(m) times Comparison - O(n) symbols compared Total cost - O(mn) per pattern In practice, most comparisons terminate early Worst-case: p = "AAAT" t = "AAAAAAAAAAAAAAAAAAAAAAAT" 4 We can do better! If we preprocess our pattern we can search more effciently (O(n)) Example: imissmissmississippi 1. s 2. s 3. s 4. SSi 5. s 6. SSi 7. s 8. SSI - match at 11 9. SSI - match at 14 10. s 11. s 12. s At steps 4 and 6 after finding the mismatch i ≠ m we can skip over all positions tested because we know that the suffix "sm" is not a prefix of our pattern "ssi" Even works for our worst-case example "AAAAT" in "AAAAAAAAAAAAAAT" by recognizing the shared prefixes ("AAA" in "AAAA").
    [Show full text]
  • The Best Nurturers in Computer Science Research
    The Best Nurturers in Computer Science Research Bharath Kumar M. Y. N. Srikant IISc-CSA-TR-2004-10 http://archive.csa.iisc.ernet.in/TR/2004/10/ Computer Science and Automation Indian Institute of Science, India October 2004 The Best Nurturers in Computer Science Research Bharath Kumar M.∗ Y. N. Srikant† Abstract The paper presents a heuristic for mining nurturers in temporally organized collaboration networks: people who facilitate the growth and success of the young ones. Specifically, this heuristic is applied to the computer science bibliographic data to find the best nurturers in computer science research. The measure of success is parameterized, and the paper demonstrates experiments and results with publication count and citations as success metrics. Rather than just the nurturer’s success, the heuristic captures the influence he has had in the indepen- dent success of the relatively young in the network. These results can hence be a useful resource to graduate students and post-doctoral can- didates. The heuristic is extended to accurately yield ranked nurturers inside a particular time period. Interestingly, there is a recognizable deviation between the rankings of the most successful researchers and the best nurturers, which although is obvious from a social perspective has not been statistically demonstrated. Keywords: Social Network Analysis, Bibliometrics, Temporal Data Mining. 1 Introduction Consider a student Arjun, who has finished his under-graduate degree in Computer Science, and is seeking a PhD degree followed by a successful career in Computer Science research. How does he choose his research advisor? He has the following options with him: 1. Look up the rankings of various universities [1], and apply to any “rea- sonably good” professor in any of the top universities.
    [Show full text]
  • Lecture Notes in Computer Science
    D. Dolev Z. Galil M. Rodeh (Eds.) Theory of Computing and Systems ISTCS '92, Israel Symposium Haifa, Israel, May 27-28, 1992 Proceedings Springer-Verlag Berlin Heidelberg NewYork London Paris Tokyo Hong Kong Barcelona Budapest Series Editors Gerhard Goos Juris Hartmanis Universit~it Karlsruhe Department of Computer Science Postfach 69 80 Cornell University Vincenz-Priessnitz-Stral3e 1 5149 Upson Hall W-7500 Karlsruhe, FRG Ithaca, NY 14853, USA Volume Editors Danny Dolev Hebrew University Givat Ram, Jerusalem, Israel Zvi Galil Columbia University, New York, NY 10027, USA and Tel Aviv University Ramat Aviv, Tel Aviv, Israel Michael Rodeh IBM Israel Ltd., Science and Technology, Technion City Haifa 32000, Israel CR Subject Classification (1991): F.1-4, B.3, C.2, G.1-2, 1.1, E.4, D.2-3 ISBN 3-540-55553-6 Springer-Verlag Berlin Heidelberg New York ISBN 0-387-55553-6 Springer-Verlag New York Berlin Heidelberg This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law. Springer-Verlag Berlin Heidelberg 1992 Printed in Germany Typesetting: Camera ready by author/editor Printing and binding: Druckhaus Beltz, Hemsbach/Bergstr.
    [Show full text]
  • Integrating Pattern Matching Within String Scanning a Thesis
    Integrating Pattern Matching Within String Scanning A Thesis Presented in Partial Fulfillment of the Requirements for the Degree of Master of Science with a Major in Computer Science in the College of Graduate Studies University of Idaho by John H. Goettsche Major Professor: Clinton Jeffery, Ph.D. Committee Members: Robert Heckendorn, Ph.D.; Robert Rinker, Ph.D. Department Administrator: Frederick Sheldon, Ph.D. July 2015 ii Authorization to Submit Thesis This Thesis of John H. Goettsche, submitted for the degree of Master of Science with a Major in Computer Science and titled \Integrating Pattern Matching Within String Scanning," has been reviewed in final form. Permission, as indicated by the signatures and dates below, is now granted to submit final copies to the College of Graduate Studies for approval. Major Professor: Date: Clinton Jeffery, Ph.D. Committee Members: Date: Robert Heckendorn, Ph.D. Date: Robert Rinker, Ph.D. Department Administrator: Date: Frederick Sheldon, Ph.D. iii Abstract A SNOBOL4 like pattern data type and pattern matching operation were introduced to the Unicon language in 2005, but patterns were not integrated with the Unicon string scanning control structure and hence, the SNOBOL style patterns were not adopted as part of the language at that time. The goal of this project is to make the pattern data type accessible to the Unicon string scanning control structure and vice versa; and also make the pattern operators and functions lexically consistent with Unicon. To accomplish these goals, a Unicon string matching operator was changed to allow the execution of a pattern match in the anchored mode, pattern matching unevaluated expressions were revised to handle complex string scanning functions, and the pattern matching lexemes were revised to be more consistent with the Unicon language.
    [Show full text]
  • Practical Authenticated Pattern Matching with Optimal Proof Size
    Practical Authenticated Pattern Matching with Optimal Proof Size Dimitrios Papadopoulos Charalampos Papamanthou Boston University University of Maryland [email protected] [email protected] Roberto Tamassia Nikos Triandopoulos Brown University RSA Laboratories & Boston University [email protected] [email protected] ABSTRACT otherwise. More elaborate models for pattern matching involve We address the problem of authenticating pattern matching queries queries expressed as regular expressions over Σ or returning multi- over textual data that is outsourced to an untrusted cloud server. By ple occurrences of p, and databases allowing search over multiple employing cryptographic accumulators in a novel optimal integrity- texts or other (semi-)structured data (e.g., XML data). This core checking tool built directly over a suffix tree, we design the first data-processing problem has numerous applications in a wide range authenticated data structure for verifiable answers to pattern match- of topics including intrusion detection, spam filtering, web search ing queries featuring fast generation of constant-size proofs. We engines, molecular biology and natural language processing. present two main applications of our new construction to authen- Previous works on authenticated pattern matching include the ticate: (i) pattern matching queries over text documents, and (ii) schemes by Martel et al. [28] for text pattern matching, and by De- exact path queries over XML documents. Answers to queries are vanbu et al. [16] and Bertino et al. [10] for XML search. In essence, verified by proofs of size at most 500 bytes for text pattern match- these works adopt the same general framework: First, by hierarchi- ing, and at most 243 bytes for exact path XML search, indepen- cally applying a cryptographic hash function (e.g., SHA-2) over the dently of the document or answer size.
    [Show full text]
  • Pattern Matching
    Pattern Matching Document #: P1371R1 Date: 2019-06-17 Project: Programming Language C++ Evolution Reply-to: Sergei Murzin <[email protected]> Michael Park <[email protected]> David Sankel <[email protected]> Dan Sarginson <[email protected]> Contents 1 Revision History 3 2 Introduction 3 3 Motivation and Scope 3 4 Before/After Comparisons4 4.1 Matching Integrals..........................................4 4.2 Matching Strings...........................................4 4.3 Matching Tuples...........................................4 4.4 Matching Variants..........................................5 4.5 Matching Polymorphic Types....................................5 4.6 Evaluating Expression Trees.....................................6 4.7 Patterns In Declarations.......................................8 5 Design Overview 9 5.1 Basic Syntax.............................................9 5.2 Basic Model..............................................9 5.3 Types of Patterns........................................... 10 5.3.1 Primary Patterns....................................... 10 5.3.1.1 Wildcard Pattern................................. 10 5.3.1.2 Identifier Pattern................................. 10 5.3.1.3 Expression Pattern................................ 10 5.3.2 Compound Patterns..................................... 11 5.3.2.1 Structured Binding Pattern............................ 11 5.3.2.2 Alternative Pattern................................ 12 5.3.2.3 Parenthesized Pattern............................... 15 5.3.2.4 Case Pattern...................................
    [Show full text]
  • Sparsification–A Technique for Speeding up Dynamic Graph Algorithms
    Sparsification–A Technique for Speeding Up Dynamic Graph Algorithms DAVID EPPSTEIN University of California, Irvine, Irvine, California ZVI GALIL Columbia University, New York, New York GIUSEPPE F. ITALIANO Universita` “Ca’ Foscari” di Venezia, Venice, Italy AND AMNON NISSENZWEIG Tel-Aviv University, Tel-Aviv, Israel Abstract. We provide data structures that maintain a graph as edges are inserted and deleted, and keep track of the following properties with the following times: minimum spanning forests, graph connectivity, graph 2-edge connectivity, and bipartiteness in time O(n1/2) per change; 3-edge connectivity, in time O(n2/3) per change; 4-edge connectivity, in time O(na(n)) per change; k-edge connectivity for constant k, in time O(nlogn) per change; 2-vertex connectivity, and 3-vertex connectivity, in time O(n) per change; and 4-vertex connectivity, in time O(na(n)) per change. A preliminary version of this paper was presented at the 33rd Annual IEEE Symposium on Foundations of Computer Science (Pittsburgh, Pa.), 1992. D. Eppstein was supported in part by National Science Foundation (NSF) Grant CCR 92-58355. Z. Galil was supported in part by NSF Grants CCR 90-14605 and CDA 90-24735. G. F. Italiano was supported in part by the CEE Project No. 20244 (ALCOM-IT) and by a research grant from Universita` “Ca’ Foscari” di Venezia; most of his work was done while at IBM T. J. Watson Research Center. Authors’ present addresses: D. Eppstein, Department of Information and Computer Science, University of California, Irvine, CA 92717, e-mail: [email protected], internet: http://www.ics.
    [Show full text]