Regexing in SAS for Pattern Matching and Replacement
Total Page:16
File Type:pdf, Size:1020Kb
Load more
										Recommended publications
									
								- 
												  Pattern Matching Using Similarity MeasuresPattern matching using similarity measures Patroonvergelijking met behulp van gelijkenismaten (met een samenvatting in het Nederlands) PROEFSCHRIFT ter verkrijging van de graad van doctor aan de Universiteit Utrecht op gezag van Rector Magnificus, Prof. Dr. H. O. Voorma, ingevolge het besluit van het College voor Promoties in het openbaar te verdedigen op maandag 18 september 2000 des morgens te 10:30 uur door Michiel Hagedoorn geboren op 13 juli 1972, te Renkum promotor: Prof. Dr. M. H. Overmars Faculteit Wiskunde & Informatica co-promotor: Dr. R. C. Veltkamp Faculteit Wiskunde & Informatica ISBN 90-393-2460-3 PHILIPS '$ The&&% research% described in this thesis has been made possible by financial support from Philips Research Laboratories. The work in this thesis has been carried out in the graduate school ASCI. Contents 1 Introduction 1 1.1Patternmatching.......................... 1 1.2Applications............................. 4 1.3Obtaininggeometricpatterns................... 7 1.4 Paradigms in geometric pattern matching . 8 1.5Similaritymeasurebasedpatternmatching........... 11 1.6Overviewofthisthesis....................... 16 2 A theory of similarity measures 21 2.1Pseudometricspaces........................ 22 2.2Pseudometricpatternspaces................... 30 2.3Embeddingpatternsinafunctionspace............. 40 2.4TheHausdorffmetric........................ 46 2.5Thevolumeofsymmetricdifference............... 54 2.6 Reflection visibility based distances . 60 2.7Summary.............................. 71 2.8Experimentalresults.......................
- 
												  Pattern Matching Using Fuzzy Methods David Bell and Lynn Palmer, State of California: Genetic Disease BranchPattern Matching Using Fuzzy Methods David Bell and Lynn Palmer, State of California: Genetic Disease Branch ABSTRACT The formula is : Two major methods of detecting similarities Euclidean Distance and 2 a "fuzzy Hamming distance" presented in a paper entitled "F %%y Dij : ; ( <3/i - yj 4 Hamming Distance: A New Dissimilarity Measure" by Bookstein, Klein, and Raita, will be compared using entropy calculations to Euclidean distance is especially useful for comparing determine their abilities to detect patterns in data and match data matching whole word object elements of 2 vectors. The records. .e find that both means of measuring distance are useful code in Java is: depending on the context. .hile fuzzy Hamming distance outperforms in supervised learning situations such as searches, the Listing 1: Euclidean Distance Euclidean distance measure is more useful in unsupervised pattern matching and clustering of data. /** * EuclideanDistance.java INTRODUCTION * * Pattern matching is becoming more of a necessity given the needs for * Created: Fri Oct 07 08:46:40 2002 such methods for detecting similarities in records, epidemiological * * @author David Bell: DHS-GENETICS patterns, genetics and even in the emerging fields of criminal * @version 1.0 behavioral pattern analysis and disease outbreak analysis due to */ possible terrorist activity. 0nfortunately, traditional methods for /** Abstact Distance class linking or matching records, data fields, etc. rely on exact data * @param None matches rather than looking for close matches or patterns. 1f course * @return Distance Template proximity pattern matches are often necessary when dealing with **/ messy data, data that has inexact values and/or data with missing key abstract class distance{ values.
- 
												  Assessment of Options for Handling Full Unicode Character Encodings in MARC21 a Study for the Library of Congress1 Assessment of Options for Handling Full Unicode Character Encodings in MARC21 A Study for the Library of Congress Part 1: New Scripts Jack Cain Senior Consultant Trylus Computing, Toronto 1 Purpose This assessment intends to study the issues and make recommendations on the possible expansion of the character set repertoire for bibliographic records in MARC21 format. 1.1 “Encoding Scheme” vs. “Repertoire” An encoding scheme contains codes by which characters are represented in computer memory. These codes are organized according to a certain methodology called an encoding scheme. The list of all characters so encoded is referred to as the “repertoire” of characters in the given encoding schemes. For example, ASCII is one encoding scheme, perhaps the one best known to the average non-technical person in North America. “A”, “B”, & “C” are three characters in the repertoire of this encoding scheme. These three characters are assigned encodings 41, 42 & 43 in ASCII (expressed here in hexadecimal). 1.2 MARC8 "MARC8" is the term commonly used to refer both to the encoding scheme and its repertoire as used in MARC records up to 1998. The ‘8’ refers to the fact that, unlike Unicode which is a multi-byte per character code set, the MARC8 encoding scheme is principally made up of multiple one byte tables in which each character is encoded using a single 8 bit byte. (It also includes the EACC set which actually uses fixed length 3 bytes per character.) (For details on MARC8 and its specifications see: http://www.loc.gov/marc/.) MARC8 was introduced around 1968 and was initially limited to essentially Latin script only.
- 
												  “Log” File in StataUpdated July 2018 Creating a “Log” File in Stata This set of notes describes how to create a log file within the computer program Stata. It assumes that you have set Stata up on your computer (see the “Getting Started with Stata” handout), and that you have read in the set of data that you want to analyze (see the “Reading in Stata Format (.dta) Data Files” handout). A log file records all your Stata commands and output in a given session, with the exception of graphs. It is usually wise to retain a copy of the work that you’ve done on a given project, to refer to while you are writing up your findings, or later on when you are revising a paper. A log file is a separate file that has either a “.log” or “.smcl” extension. Saving the log as a .smcl file (“Stata Markup and Control Language file”) keeps the formatting from the Results window. It is recommended to save the log as a .log file. Although saving it as a .log file removes the formatting and saves the output in plain text format, it can be opened in most text editing programs. A .smcl file can only be opened in Stata. To create a log file: You may create a log file by typing log using ”filepath & filename” in the Stata Command box. On a PC: If one wanted to save a log file (.log) for a set of analyses on hard disk C:, in the folder “LOGS”, one would type log using "C:\LOGS\analysis_log.log" On a Mac: If one wanted to save a log file (.log) for a set of analyses in user1’s folder on the hard drive, in the folder “logs”, one would type log using "/Users/user1/logs/analysis_log.log" If you would like to replace an existing log file with a newer version add “replace” after the file name (Note: PC file path) log using "C:\LOGS\analysis_log.log", replace Alternately, you can use the menu: click on File, then on Log, then on Begin.
- 
												  Department of the Secretary of State Maine Bureau of Motor VehiclesDepartment of the Secretary of State Maine Bureau of Motor Vehicles VANITY PLATE INFORMATION AND INSTRUCTIONS Vanity plates are available for the following types of plates: Up to 7 characters – plus one dash or space Antique Auto, Combination, Commercial, Custom Vehicle, Hire, Motor Home, Passenger, Street Rod, Trailer, Wabanaki Up to 7 characters – includes space or dash Motorcycle, Moped, Antique Motorcycle, Special Veterans Motorcycle Up to 6 characters – plus one dash or space Agriculture, Agriculture Commercial, Barbara Bush Children’s Hospital, Black Bear, Breast Cancer, Emergency Medical Services, Farm, Lobster, Sportsman, Support Animal Welfare, We Support Our Troops Up to 6 characters – includes space or dash Special Veterans Up to 5 characters – plus one dash or space Firefighter Up to 5 characters – includes space or dash Conservation, Conservation Commercial, Disability, Disability Motorcycle, Disabled Veterans, Disabled Veterans Motorcycle, Purple Heart, Purple Heart Motorcycle, University of Maine. Up to 4 characters – includes space or dash Special Veterans Disability Vanity plates with a total of 7 characters may mix numbers and letters in any order, as long as there is at least one letter in the sequence, i.e. A123456, 1234A56, 123456B, AGR8PL8. Vanity plates having fewer than 7 characters may mix numbers and letters as long as the first character is a letter, i.e. A1, A2B, A3C5, A4D4U, F2G3H4. Vanity plates must not begin with a 0 (zero) or with the letter “O”, followed by only numbers. Please note: Vanity plates with seven characters, plus one space or dash will cover the chickadee design. An ampersand (&) is available on vanity plates with 2 to 6 characters.
- 
												  Basic STATA CommandsSummer School on Capability and Multidimensional Poverty OPHI-HDCA, 2011 Oxford Poverty and Human Development Initiative (OPHI) http://ophi.qeh.ox.ac.uk Oxford Dept of International Development, Queen Elizabeth House, University of Oxford Basic STATA commands Typical STATA window Review Results Variables Commands Exploring your data Create a do file doedit Change your directory cd “c:\your directory” Open your database use database, clear Import from Excel (csv file) insheet using "filename.csv" Condition (after the following commands) if var1==3 or if var1==”male” Equivalence symbols: == equal; ~= not equal; != not equal; > greater than; >= greater than or equal; < less than; <= less than or equal; & and; | or. Weight [iw=weight] or [aw=weight] Browse your database browse Look for a variables lookfor “any word/topic” Summarize a variable (mean, standard su variable1 variable2 variable3 deviation, min. and max.) Tabulate a variable (per category) tab variable1 (add a second variables for cross tabs) Statistics for variables by subgroups tabstat variable1 variable2, s(n mean) by(group) Information of a variable (coding) codebook variable1, tab(99) Keep certain variables (use drop for the keep var1 var2 var3 opposite) Save a dataset save filename, [replace] Summer School on Capability and Multidimensional Poverty OPHI-HDCA, 2011 Creating Variables Generate a new variable (a number or a gen new_variable = 1 combinations of other variables) gen new_variable = variable1+ variable2 Generate a new variable conditional gen new_variable
- 
												  Basic Facts About Trademarks United States Patent and Trademark O CeProtecting Your Trademark ENHANCING YOUR RIGHTS THROUGH FEDERAL REGISTRATION Basic Facts About Trademarks United States Patent and Trademark O ce Published on February 2020 Our website resources For general information and links to Frequently trademark Asked Questions, processing timelines, the Trademark NEW [2] basics Manual of Examining Procedure (TMEP) , and FILERS the Acceptable Identification of Goods and Services Manual (ID Manual)[3]. Protecting Your Trademark Trademark Information Network (TMIN) Videos[4] Enhancing Your Rights Through Federal Registration Tools TESS Search pending and registered marks using the Trademark Electronic Search System (TESS)[5]. File applications and other documents online using the TEAS Trademark Electronic Application System (TEAS)[6]. Check the status of an application and view and TSDR download application and registration records using Trademark Status and Document Retrieval (TSDR)[7]. Transfer (assign) ownership of a mark to another ASSIGNMENTS entity or change the owner name and search the Assignments database[8]. Visit the Trademark Trial and Appeal Board (TTAB)[9] TTAB online. United States Patent and Trademark Office An Agency of the United States Department of Commerce UNITED STATES PATENT AND TRADEMARK OFFICE BASIC FACTS ABOUT TRADEMARKS CONTENTS MEET THE USPTO ������������������������������������������������������������������������������������������������������������������������������������������������������������������ 1 TRADEMARK, COPYRIGHT, OR PATENT ��������������������������������������������������������������������������������������������������������������������������
- 
												  CSCI 2041: Pattern Matching BasicsCSCI 2041: Pattern Matching Basics Chris Kauffman Last Updated: Fri Sep 28 08:52:58 CDT 2018 1 Logistics Reading Assignment 2 I OCaml System Manual: Ch I Demo in lecture 1.4 - 1.5 I Post today/tomorrow I Practical OCaml: Ch 4 Next Week Goals I Mon: Review I Code patterns I Wed: Exam 1 I Pattern Matching I Fri: Lecture 2 Consider: Summing Adjacent Elements 1 (* match_basics.ml: basic demo of pattern matching *) 2 3 (* Create a list comprised of the sum of adjacent pairs of 4 elements in list. The last element in an odd-length list is 5 part of the return as is. *) 6 let rec sum_adj_ie list = 7 if list = [] then (* CASE of empty list *) 8 [] (* base case *) 9 else 10 let a = List.hd list in (* DESTRUCTURE list *) 11 let atail = List.tl list in (* bind names *) 12 if atail = [] then (* CASE of 1 elem left *) 13 [a] (* base case *) 14 else (* CASE of 2 or more elems left *) 15 let b = List.hd atail in (* destructure list *) 16 let tail = List.tl atail in (* bind names *) 17 (a+b) :: (sum_adj_ie tail) (* recursive case *) The above function follows a common paradigm: I Select between Cases during a computation I Cases are based on structure of data I Data is Destructured to bind names to parts of it 3 Pattern Matching in Programming Languages I Pattern Matching as a programming language feature checks that data matches a certain structure the executes if so I Can take many forms such as processing lines of input files that match a regular expression I Pattern Matching in OCaml/ML combines I Case analysis: does the data match a certain structure I Destructure Binding: bind names to parts of the data I Pattern Matching gives OCaml/ML a certain "cool" factor I Associated with the match/with syntax as follows match something with | pattern1 -> result1 (* pattern1 gives result1 *) | pattern2 -> (* pattern 2..
- 
												  PC23 and PC43 Desktop Printer User Manual Document Change Record This Page Records Changes to This DocumentPC23 | PC43 Desktop Printer PC23d, PC43d, PC43t User Manual Intermec by Honeywell 6001 36th Ave.W. Everett, WA 98203 U.S.A. www.intermec.com The information contained herein is provided solely for the purpose of allowing customers to operate and service Intermec-manufactured equipment and is not to be released, reproduced, or used for any other purpose without written permission of Intermec by Honeywell. Information and specifications contained in this document are subject to change without prior notice and do not represent a commitment on the part of Intermec by Honeywell. © 2012–2014 Intermec by Honeywell. All rights reserved. The word Intermec, the Intermec logo, Fingerprint, Ready-to-Work, and SmartSystems are either trademarks or registered trademarks of Intermec by Honeywell. For patent information, please refer to www.hsmpats.com Wi-Fi is a registered certification mark of the Wi-Fi Alliance. Microsoft, Windows, and the Windows logo are registered trademarks of Microsoft Corporation in the United States and/or other countries. Bluetooth is a trademark of Bluetooth SIG, Inc., U.S.A. The products described herein comply with the requirements of the ENERGY STAR. As an ENERGY STAR partner, Intermec Technologies has determined that this product meets the ENERGY STAR guidelines for energy efficiency. For more information on the ENERGY STAR program, see www.energystar.gov. The ENERGY STAR does not represent EPA endorsement of any product or service. ii PC23 and PC43 Desktop Printer User Manual Document Change Record This page records changes to this document. The document was originally released as Revision 001. Version Number Date Description of Change 005 12/2014 Revised to support MR7 firmware release.
- 
												  Compiling Pattern Matching to Good Decision TreesSubmitted to ML’08 Compiling Pattern Matching to good Decision Trees Luc Maranget INRIA Luc.marangetinria.fr Abstract In this paper we study compilation to decision tree, whose We address the issue of compiling ML pattern matching to efficient primary advantage is never testing a given subterm of the subject decisions trees. Traditionally, compilation to decision trees is op- value more than once (and whose primary drawback is potential timized by (1) implementing decision trees as dags with maximal code size explosion). Our aim is to refine naive compilation to sharing; (2) guiding a simple compiler with heuristics. We first de- decision trees, and to compare the output of such an optimizing sign new heuristics that are inspired by necessity, a notion from compiler with optimized backtracking automata. lazy pattern matching that we rephrase in terms of decision tree se- Compilation to decision can be very sensitive to the testing mantics. Thereby, we simplify previous semantical frameworks and order of subject value subterms. The situation can be explained demonstrate a direct connection between necessity and decision by the example of an human programmer attempting to translate a ML program into a lower-level language without pattern matching. tree runtime efficiency. We complete our study by experiments, 1 showing that optimized compilation to decision trees is competi- Let f be the following function defined on triples of booleans : tive. We also suggest some heuristics precisely. l e t f x y z = match x,y,z with | _,F,T -> 1 Categories and Subject Descriptors D 3. 3 [Programming Lan- | F,T,_ -> 2 guages]: Language Constructs and Features—Patterns | _,_,F -> 3 | _,_,T -> 4 General Terms Design, Performance, Sequentiality.
- 
												  Combinatorial Pattern MatchingCombinatorial Pattern Matching 1 A Recurring Problem Finding patterns within sequences Variants on this idea Finding repeated motifs amoungst a set of strings What are the most frequent k-mers How many time does a specific k-mer appear Fundamental problem: Pattern Matching Find all positions of a particular substring in given sequence? 2 Pattern Matching Goal: Find all occurrences of a pattern in a text Input: Pattern p = p1, p2, … pn and text t = t1, t2, … tm Output: All positions 1 < i < (m – n + 1) such that the n-letter substring of t starting at i matches p def bruteForcePatternMatching(p, t): locations = [] for i in xrange(0, len(t)-len(p)+1): if t[i:i+len(p)] == p: locations.append(i) return locations print bruteForcePatternMatching("ssi", "imissmissmississippi") [11, 14] 3 Pattern Matching Performance Performance: m - length of the text t n - the length of the pattern p Search Loop - executed O(m) times Comparison - O(n) symbols compared Total cost - O(mn) per pattern In practice, most comparisons terminate early Worst-case: p = "AAAT" t = "AAAAAAAAAAAAAAAAAAAAAAAT" 4 We can do better! If we preprocess our pattern we can search more effciently (O(n)) Example: imissmissmississippi 1. s 2. s 3. s 4. SSi 5. s 6. SSi 7. s 8. SSI - match at 11 9. SSI - match at 14 10. s 11. s 12. s At steps 4 and 6 after finding the mismatch i ≠ m we can skip over all positions tested because we know that the suffix "sm" is not a prefix of our pattern "ssi" Even works for our worst-case example "AAAAT" in "AAAAAAAAAAAAAAT" by recognizing the shared prefixes ("AAA" in "AAAA").
- 
												  Mastercard ® Brand Mark Branding Requirements for CanadaMastercard ® Brand Mark Branding requirements for Canada Version 1.0 / March 2017 Mastercard Brand Mark: Branding requirements for Canada March 2017 2 Table of contents Top five things you need to know 3 If after reading the branding requirements you still haven’t found the answer to your Brand Mark query, please contact us in one of two ways. configurations and versions 4 Acceptance Mark Email the Brand Manager configurations and versions 5 [email protected] Color specifications 6 Mastercard Brand Hotline Minimum sizes and free space 7 1-914-249-1326 Using the Mastercard name in text 8 Using with other marks 9 Card artwork 10 Use in merchant advertising 11 Use at physical merchant locations 12 Use at digital merchant locations 13 Use in digital applications 14 Use on ATMs 15 Use on contactless devices 16 Common mistakes 17 ©2017 Mastercard. All rights reserved. Mastercard®, Maestro®, and Cirrus® are registered trademarks, and the circles design is a trademark of Mastercard International Incorporated. Mastercard Brand Mark: Branding requirements for Canada March 2017 3 Top five things you need to know General requirements Brand Mark 1. There are multiple configurations and versions of the Mark. Use the correct one for your needs. See configurations Symbol Logotype and versions. Registered trademarks are available in English or French. 2. Always surround the Mark with Minimum sufficient free space, based on “x”, which free space is equal to the width of the “m” in the x x x x “mastercard” Logotype. See free space specifications x 1/2x x x 3.