Entropy-Based Approach in Selection Exact String-Matching Algorithms
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Pattern Matching Using Similarity Measures
Pattern matching using similarity measures Patroonvergelijking met behulp van gelijkenismaten (met een samenvatting in het Nederlands) PROEFSCHRIFT ter verkrijging van de graad van doctor aan de Universiteit Utrecht op gezag van Rector Magnificus, Prof. Dr. H. O. Voorma, ingevolge het besluit van het College voor Promoties in het openbaar te verdedigen op maandag 18 september 2000 des morgens te 10:30 uur door Michiel Hagedoorn geboren op 13 juli 1972, te Renkum promotor: Prof. Dr. M. H. Overmars Faculteit Wiskunde & Informatica co-promotor: Dr. R. C. Veltkamp Faculteit Wiskunde & Informatica ISBN 90-393-2460-3 PHILIPS '$ The&&% research% described in this thesis has been made possible by financial support from Philips Research Laboratories. The work in this thesis has been carried out in the graduate school ASCI. Contents 1 Introduction 1 1.1Patternmatching.......................... 1 1.2Applications............................. 4 1.3Obtaininggeometricpatterns................... 7 1.4 Paradigms in geometric pattern matching . 8 1.5Similaritymeasurebasedpatternmatching........... 11 1.6Overviewofthisthesis....................... 16 2 A theory of similarity measures 21 2.1Pseudometricspaces........................ 22 2.2Pseudometricpatternspaces................... 30 2.3Embeddingpatternsinafunctionspace............. 40 2.4TheHausdorffmetric........................ 46 2.5Thevolumeofsymmetricdifference............... 54 2.6 Reflection visibility based distances . 60 2.7Summary.............................. 71 2.8Experimentalresults....................... -
Low-Power Computing
LOW-POWER COMPUTING Safa Alsalman a, Raihan ur Rasool b, Rizwan Mian c a King Faisal University, Alhsa, Saudi Arabia b Victoria University, Melbourne, Australia c Data++, Toronto, Canada Corresponding email: [email protected] Abstract With the abundance of mobile electronics, demand for Low-Power Computing is pressing more than ever. Achieving a reduction in power consumption requires gigantic efforts from electrical and electronic engineers, computer scientists and software developers. During the past decade, various techniques and methodologies for designing low-power solutions have been proposed. These methods are aimed at small mobile devices as well as large datacenters. There are techniques that consider design paradigms and techniques including run-time issues. This paper summarizes the main approaches adopted by the IT community to promote Low-power computing. Keywords: Computing, Energy-efficient, Low power, Power-efficiency. Introduction In the past two decades, technology has evolved rapidly affecting every aspect of our daily lives. The fact that we study, work, communicate and entertain ourselves using all different types of devices and gadgets is an evidence that technology is a revolutionary event in the human history. The technology is here to stay but with big responsibilities comes bigger challenges. As digital devices shrink in size and become more portable, power consumption and energy efficiency become a critical issue. On one end, circuits designed for portable devices must target increasing battery life. On the other end, the more complex high-end circuits available at data centers have to consider power costs, cooling requirements, and reliability issues. Low-power computing is the field dedicated to the design and manufacturing of low-power consumption circuits, programming power-aware software and applying techniques for power-efficiency [1][2]. -
Pattern Matching Using Fuzzy Methods David Bell and Lynn Palmer, State of California: Genetic Disease Branch
Pattern Matching Using Fuzzy Methods David Bell and Lynn Palmer, State of California: Genetic Disease Branch ABSTRACT The formula is : Two major methods of detecting similarities Euclidean Distance and 2 a "fuzzy Hamming distance" presented in a paper entitled "F %%y Dij : ; ( <3/i - yj 4 Hamming Distance: A New Dissimilarity Measure" by Bookstein, Klein, and Raita, will be compared using entropy calculations to Euclidean distance is especially useful for comparing determine their abilities to detect patterns in data and match data matching whole word object elements of 2 vectors. The records. .e find that both means of measuring distance are useful code in Java is: depending on the context. .hile fuzzy Hamming distance outperforms in supervised learning situations such as searches, the Listing 1: Euclidean Distance Euclidean distance measure is more useful in unsupervised pattern matching and clustering of data. /** * EuclideanDistance.java INTRODUCTION * * Pattern matching is becoming more of a necessity given the needs for * Created: Fri Oct 07 08:46:40 2002 such methods for detecting similarities in records, epidemiological * * @author David Bell: DHS-GENETICS patterns, genetics and even in the emerging fields of criminal * @version 1.0 behavioral pattern analysis and disease outbreak analysis due to */ possible terrorist activity. 0nfortunately, traditional methods for /** Abstact Distance class linking or matching records, data fields, etc. rely on exact data * @param None matches rather than looking for close matches or patterns. 1f course * @return Distance Template proximity pattern matches are often necessary when dealing with **/ messy data, data that has inexact values and/or data with missing key abstract class distance{ values. -
CSCI 2041: Pattern Matching Basics
CSCI 2041: Pattern Matching Basics Chris Kauffman Last Updated: Fri Sep 28 08:52:58 CDT 2018 1 Logistics Reading Assignment 2 I OCaml System Manual: Ch I Demo in lecture 1.4 - 1.5 I Post today/tomorrow I Practical OCaml: Ch 4 Next Week Goals I Mon: Review I Code patterns I Wed: Exam 1 I Pattern Matching I Fri: Lecture 2 Consider: Summing Adjacent Elements 1 (* match_basics.ml: basic demo of pattern matching *) 2 3 (* Create a list comprised of the sum of adjacent pairs of 4 elements in list. The last element in an odd-length list is 5 part of the return as is. *) 6 let rec sum_adj_ie list = 7 if list = [] then (* CASE of empty list *) 8 [] (* base case *) 9 else 10 let a = List.hd list in (* DESTRUCTURE list *) 11 let atail = List.tl list in (* bind names *) 12 if atail = [] then (* CASE of 1 elem left *) 13 [a] (* base case *) 14 else (* CASE of 2 or more elems left *) 15 let b = List.hd atail in (* destructure list *) 16 let tail = List.tl atail in (* bind names *) 17 (a+b) :: (sum_adj_ie tail) (* recursive case *) The above function follows a common paradigm: I Select between Cases during a computation I Cases are based on structure of data I Data is Destructured to bind names to parts of it 3 Pattern Matching in Programming Languages I Pattern Matching as a programming language feature checks that data matches a certain structure the executes if so I Can take many forms such as processing lines of input files that match a regular expression I Pattern Matching in OCaml/ML combines I Case analysis: does the data match a certain structure I Destructure Binding: bind names to parts of the data I Pattern Matching gives OCaml/ML a certain "cool" factor I Associated with the match/with syntax as follows match something with | pattern1 -> result1 (* pattern1 gives result1 *) | pattern2 -> (* pattern 2.. -
Compiling Pattern Matching to Good Decision Trees
Submitted to ML’08 Compiling Pattern Matching to good Decision Trees Luc Maranget INRIA Luc.marangetinria.fr Abstract In this paper we study compilation to decision tree, whose We address the issue of compiling ML pattern matching to efficient primary advantage is never testing a given subterm of the subject decisions trees. Traditionally, compilation to decision trees is op- value more than once (and whose primary drawback is potential timized by (1) implementing decision trees as dags with maximal code size explosion). Our aim is to refine naive compilation to sharing; (2) guiding a simple compiler with heuristics. We first de- decision trees, and to compare the output of such an optimizing sign new heuristics that are inspired by necessity, a notion from compiler with optimized backtracking automata. lazy pattern matching that we rephrase in terms of decision tree se- Compilation to decision can be very sensitive to the testing mantics. Thereby, we simplify previous semantical frameworks and order of subject value subterms. The situation can be explained demonstrate a direct connection between necessity and decision by the example of an human programmer attempting to translate a ML program into a lower-level language without pattern matching. tree runtime efficiency. We complete our study by experiments, 1 showing that optimized compilation to decision trees is competi- Let f be the following function defined on triples of booleans : tive. We also suggest some heuristics precisely. l e t f x y z = match x,y,z with | _,F,T -> 1 Categories and Subject Descriptors D 3. 3 [Programming Lan- | F,T,_ -> 2 guages]: Language Constructs and Features—Patterns | _,_,F -> 3 | _,_,T -> 4 General Terms Design, Performance, Sequentiality. -
Combinatorial Pattern Matching
Combinatorial Pattern Matching 1 A Recurring Problem Finding patterns within sequences Variants on this idea Finding repeated motifs amoungst a set of strings What are the most frequent k-mers How many time does a specific k-mer appear Fundamental problem: Pattern Matching Find all positions of a particular substring in given sequence? 2 Pattern Matching Goal: Find all occurrences of a pattern in a text Input: Pattern p = p1, p2, … pn and text t = t1, t2, … tm Output: All positions 1 < i < (m – n + 1) such that the n-letter substring of t starting at i matches p def bruteForcePatternMatching(p, t): locations = [] for i in xrange(0, len(t)-len(p)+1): if t[i:i+len(p)] == p: locations.append(i) return locations print bruteForcePatternMatching("ssi", "imissmissmississippi") [11, 14] 3 Pattern Matching Performance Performance: m - length of the text t n - the length of the pattern p Search Loop - executed O(m) times Comparison - O(n) symbols compared Total cost - O(mn) per pattern In practice, most comparisons terminate early Worst-case: p = "AAAT" t = "AAAAAAAAAAAAAAAAAAAAAAAT" 4 We can do better! If we preprocess our pattern we can search more effciently (O(n)) Example: imissmissmississippi 1. s 2. s 3. s 4. SSi 5. s 6. SSi 7. s 8. SSI - match at 11 9. SSI - match at 14 10. s 11. s 12. s At steps 4 and 6 after finding the mismatch i ≠ m we can skip over all positions tested because we know that the suffix "sm" is not a prefix of our pattern "ssi" Even works for our worst-case example "AAAAT" in "AAAAAAAAAAAAAAT" by recognizing the shared prefixes ("AAA" in "AAAA"). -
HOW MODERATE-SIZED RISC-BASED Smps CAN OUTPERFORM MUCH LARGER DISTRIBUTED MEMORY Mpps
HOW MODERATE-SIZED RISC-BASED SMPs CAN OUTPERFORM MUCH LARGER DISTRIBUTED MEMORY MPPs D. M. Pressel Walter B. Sturek Corporate Information and Computing Center Corporate Information and Computing Center U.S. Army Research Laboratory U.S. Army Research Laboratory Aberdeen Proving Ground, Maryland 21005-5067 Aberdeen Proving Ground, Maryland 21005-5067 Email: [email protected] Email: sturek@arlmil J. Sahu K. R. Heavey Weapons and Materials Research Directorate Weapons and Materials Research Directorate U.S. Army Research Laboratory U.S. Army Research Laboratory Aberdeen Proving Ground, Maryland 21005-5066 Aberdeen Proving Ground, Maryland 21005-5066 Email: [email protected] Email: [email protected] ABSTRACT: Historically, comparison of computer systems was based primarily on theoretical peak performance. Today, based on delivered levels of performance, comparisons are frequently used. This of course raises a whole host of questions about how to do this. However, even this approach has a fundamental problem. It assumes that all FLOPS are of equal value. As long as one is only using either vector or large distributed memory MIMD MPPs, this is probably a reasonable assumption. However, when comparing the algorithms of choice used on these two classes of platforms, one frequently finds a significant difference in the number of FLOPS required to obtain a solution with the desired level of precision. While troubling, this dichotomy has been largely unavoidable. Recent advances involving moderate-sized RISC-based SMPs have allowed us to solve this problem. -
Integrating Pattern Matching Within String Scanning a Thesis
Integrating Pattern Matching Within String Scanning A Thesis Presented in Partial Fulfillment of the Requirements for the Degree of Master of Science with a Major in Computer Science in the College of Graduate Studies University of Idaho by John H. Goettsche Major Professor: Clinton Jeffery, Ph.D. Committee Members: Robert Heckendorn, Ph.D.; Robert Rinker, Ph.D. Department Administrator: Frederick Sheldon, Ph.D. July 2015 ii Authorization to Submit Thesis This Thesis of John H. Goettsche, submitted for the degree of Master of Science with a Major in Computer Science and titled \Integrating Pattern Matching Within String Scanning," has been reviewed in final form. Permission, as indicated by the signatures and dates below, is now granted to submit final copies to the College of Graduate Studies for approval. Major Professor: Date: Clinton Jeffery, Ph.D. Committee Members: Date: Robert Heckendorn, Ph.D. Date: Robert Rinker, Ph.D. Department Administrator: Date: Frederick Sheldon, Ph.D. iii Abstract A SNOBOL4 like pattern data type and pattern matching operation were introduced to the Unicon language in 2005, but patterns were not integrated with the Unicon string scanning control structure and hence, the SNOBOL style patterns were not adopted as part of the language at that time. The goal of this project is to make the pattern data type accessible to the Unicon string scanning control structure and vice versa; and also make the pattern operators and functions lexically consistent with Unicon. To accomplish these goals, a Unicon string matching operator was changed to allow the execution of a pattern match in the anchored mode, pattern matching unevaluated expressions were revised to handle complex string scanning functions, and the pattern matching lexemes were revised to be more consistent with the Unicon language. -
Practical Authenticated Pattern Matching with Optimal Proof Size
Practical Authenticated Pattern Matching with Optimal Proof Size Dimitrios Papadopoulos Charalampos Papamanthou Boston University University of Maryland [email protected] [email protected] Roberto Tamassia Nikos Triandopoulos Brown University RSA Laboratories & Boston University [email protected] [email protected] ABSTRACT otherwise. More elaborate models for pattern matching involve We address the problem of authenticating pattern matching queries queries expressed as regular expressions over Σ or returning multi- over textual data that is outsourced to an untrusted cloud server. By ple occurrences of p, and databases allowing search over multiple employing cryptographic accumulators in a novel optimal integrity- texts or other (semi-)structured data (e.g., XML data). This core checking tool built directly over a suffix tree, we design the first data-processing problem has numerous applications in a wide range authenticated data structure for verifiable answers to pattern match- of topics including intrusion detection, spam filtering, web search ing queries featuring fast generation of constant-size proofs. We engines, molecular biology and natural language processing. present two main applications of our new construction to authen- Previous works on authenticated pattern matching include the ticate: (i) pattern matching queries over text documents, and (ii) schemes by Martel et al. [28] for text pattern matching, and by De- exact path queries over XML documents. Answers to queries are vanbu et al. [16] and Bertino et al. [10] for XML search. In essence, verified by proofs of size at most 500 bytes for text pattern match- these works adopt the same general framework: First, by hierarchi- ing, and at most 243 bytes for exact path XML search, indepen- cally applying a cryptographic hash function (e.g., SHA-2) over the dently of the document or answer size. -
Pattern Matching
Pattern Matching Document #: P1371R1 Date: 2019-06-17 Project: Programming Language C++ Evolution Reply-to: Sergei Murzin <[email protected]> Michael Park <[email protected]> David Sankel <[email protected]> Dan Sarginson <[email protected]> Contents 1 Revision History 3 2 Introduction 3 3 Motivation and Scope 3 4 Before/After Comparisons4 4.1 Matching Integrals..........................................4 4.2 Matching Strings...........................................4 4.3 Matching Tuples...........................................4 4.4 Matching Variants..........................................5 4.5 Matching Polymorphic Types....................................5 4.6 Evaluating Expression Trees.....................................6 4.7 Patterns In Declarations.......................................8 5 Design Overview 9 5.1 Basic Syntax.............................................9 5.2 Basic Model..............................................9 5.3 Types of Patterns........................................... 10 5.3.1 Primary Patterns....................................... 10 5.3.1.1 Wildcard Pattern................................. 10 5.3.1.2 Identifier Pattern................................. 10 5.3.1.3 Expression Pattern................................ 10 5.3.2 Compound Patterns..................................... 11 5.3.2.1 Structured Binding Pattern............................ 11 5.3.2.2 Alternative Pattern................................ 12 5.3.2.3 Parenthesized Pattern............................... 15 5.3.2.4 Case Pattern................................... -
Introduction to Analysis of Algorithms Introduction to Sorting
Introduction to Analysis of Algorithms Introduction to Sorting CS 311 Data Structures and Algorithms Lecture Slides Monday, February 23, 2009 Glenn G. Chappell Department of Computer Science University of Alaska Fairbanks [email protected] © 2005–2009 Glenn G. Chappell Unit Overview Recursion & Searching Major Topics • Introduction to Recursion • Search Algorithms • Recursion vs. NIterationE • EliminatingDO Recursion • Recursive Search with Backtracking 23 Feb 2009 CS 311 Spring 2009 2 Unit Overview Algorithmic Efficiency & Sorting We now begin a unit on algorithmic efficiency & sorting algorithms. Major Topics • Introduction to Analysis of Algorithms • Introduction to Sorting • Comparison Sorts I • More on Big-O • The Limits of Sorting • Divide-and-Conquer • Comparison Sorts II • Comparison Sorts III • Radix Sort • Sorting in the C++ STL We will (partly) follow the text. • Efficiency and sorting are in Chapter 9. After this unit will be the in-class Midterm Exam. 23 Feb 2009 CS 311 Spring 2009 3 Introduction to Analysis of Algorithms Efficiency [1/3] What do we mean by an “efficient” algorithm? • We mean an algorithm that uses few resources . • By far the most important resource is time . • Thus, when we say an algorithm is efficient , assuming we do not qualify this further , we mean that it can be executed quickly . How do we determine whether an algorithm is efficient? • Implement it, and run the result on some computer? • But the speed of computers is not fixed. • And there are differences in compilers, etc. Is there some way to measure efficiency that does not depend on the system chosen or the current state of technology? 23 Feb 2009 CS 311 Spring 2009 4 Introduction to Analysis of Algorithms Efficiency [2/3] Is there some way to measure efficiency that does not depend on the system chosen or the current state of technology? • Yes! Rough Idea • Divide the tasks an algorithm performs into “steps”. -
Software Design Pattern Using : Algorithmic Skeleton Approach S
International Journal of Computer Network and Security (IJCNS) Vol. 3 No. 1 ISSN : 0975-8283 Software Design Pattern Using : Algorithmic Skeleton Approach S. Sarika Lecturer,Department of Computer Sciene and Engineering Sathyabama University, Chennai-119. [email protected] Abstract - In software engineering, a design pattern is a general reusable solution to a commonly occurring problem in software design. A design pattern is not a finished design that can be transformed directly into code. It is a description or template for how to solve a problem that can be used in many different situations. Object-oriented design patterns typically show relationships and interactions between classes or objects, without specifying the final application classes or objects that are involved. Many patterns imply object-orientation or more generally mutable state, and so may not be as applicable in functional programming languages, in which data is immutable Fig 1. Primary design pattern or treated as such..Not all software patterns are design patterns. For instance, algorithms solve 1.1Attributes related to design : computational problems rather than software design Expandability : The degree to which the design of a problems.In this paper we try to focous the system can be extended. algorithimic pattern for good software design. Simplicity : The degree to which the design of a Key words - Software Design, Design Patterns, system can be understood easily. Software esuability,Alogrithmic pattern. Reusability : The degree to which a piece of design can be reused in another design. 1. INTRODUCTION Many studies in the literature (including some by 1.2 Attributes related to implementation: these authors) have for premise that design patterns Learn ability : The degree to which the code source of improve the quality of object-oriented software systems, a system is easy to learn.