Lecture 7 Regular Expressions and Grep 7.1
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Express Yourself! Regular Expressions Vs SAS Text String Functions Spencer Childress, Rho®, Inc., Chapel Hill, NC
PharmaSUG 2014 - Paper BB08 Express Yourself! Regular Expressions vs SAS Text String Functions Spencer Childress, Rho®, Inc., Chapel Hill, NC ABSTRACT ® SAS and Perl regular expression functions offer a powerful alternative and complement to typical SAS text string functions. By harnessing the power of regular expressions, SAS functions such as PRXMATCH and PRXCHANGE not only overlap functionality with functions such as INDEX and TRANWRD, they also eclipse them. With the addition of the modifier argument to such functions as COMPRESS, SCAN, and FINDC, some of the regular expression syntax already exists for programmers familiar with SAS 9.2 and later versions. We look at different methods that solve the same problem, with detailed explanations of how each method works. Problems range from simple searches to complex search and replaces. Programmers should expect an improved grasp of the regular expression and how it can complement their portfolio of code. The techniques presented herein offer a good overview of basic data step text string manipulation appropriate for all levels of SAS capability. While this article targets a clinical computing audience, the techniques apply to a broad range of computing scenarios. INTRODUCTION This article focuses on the added capability of Perl regular expressions to a SAS programmer’s skillset. A regular expression (regex) forms a search pattern, which SAS uses to scan through a text string to detect matches. An extensive library of metacharacters, characters with special meanings within the regex, allows extremely robust searches. Before jumping in, the reader would do well to read over ‘An Introduction to Perl Regular Expressions in SAS 9’, referencing page 3 in particular (Cody, 2004). -
IBM Db2 High Performance Unload for Z/OS User's Guide
5.1 IBM Db2 High Performance Unload for z/OS User's Guide IBM SC19-3777-03 Note: Before using this information and the product it supports, read the "Notices" topic at the end of this information. Subsequent editions of this PDF will not be delivered in IBM Publications Center. Always download the latest edition from the Db2 Tools Product Documentation page. This edition applies to Version 5 Release 1 of Db2 High Performance Unload for z/OS (product number 5655-AA1) and to all subsequent releases and modifications until otherwise indicated in new editions. © Copyright IBM® Corporation 1999, 2021; Copyright Infotel 1999, 2021. All Rights Reserved. US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. © Copyright International Business Machines Corporation . US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Contents About this information......................................................................................... vii Chapter 1. Db2 High Performance Unload overview................................................ 1 What does Db2 HPU do?..............................................................................................................................1 Db2 HPU benefits.........................................................................................................................................1 Db2 HPU process and components.............................................................................................................2 -
855 Symbols & (Ampersand), 211 && (AND Operator), 608 `` (Backquotes/Backticks), 143 & (Bitwise AND) Operator, 1
index.fm Page 855 Wednesday, October 25, 2006 1:28 PM Index Symbols A & (ampersand), 211 abs() function, 117 && (AND operator), 608 Access methods, 53, 758–760 `` (backquotes/backticks), 143 ACTION attribute, 91 & (bitwise AND) operator, 140 action attribute, 389–390, 421 ~ (bitwise NOT) operator, 141 addcslashes() function, 206–208, 214 | (bitwise OR) operator, 140 Addition (+), 112 ^ (bitwise XOR) operator, 140 addslashes() function, 206–208, 214 ^ (caret metacharacter), 518, 520–521 admin_art_edit.php, 784, 793–796 " (double quote), 211 admin_art_list.php, 784 @ (error control) operator, 143–145 admin_artist_edit.php, 784, 788–789 > (greater than), 211, 608, 611-612 admin_artist_insert.php, 784, 791–792 << (left shift) operator, 141 admin_artist_list.php, 784, 786–787, < (less than), 211, 608, 611-612 796 * metacharacter, 536–538 admin_footer.php, 784 !, NOT operator, 608 admin_header.php, 784, 803 < operator, 608, 611–612 admin_login.php, 784, 797, 803 <= operator, 608 Advisory locking, 474 <>, != operator, 608–609 Aliases, 630–631 = operator, 608–609 Alphabetic sort of, 299–300 > operator, 608, 611–612 Alphanumeric word characters, metasymbols -> operator, 743–744 representing, 531–533 >= operator, 608 ALTER command, 631 || (OR operator), 130–132, 608 ALTER TABLE statement, 620, 631–633 >> (right shift) operator, 141 Alternation, metacharacters for, 543 ' (single quote), 211, 352–353, 609 Anchoring metacharacters, 520–523 % wildcard, 613–614 beginning-of-line anchor, 520–523 >>> (zero-fill right shift) operator, end-of-line anchor, -
Perl Regular Expressions 102
NESUG 2006 Posters Perl Regular Expressions 102 Kenneth W. Borowiak, Howard M. Proskin & Associates, Inc., Rochester, NY ABSTRACT Perl regular expressions were made available in SAS® in Version 9 through the PRX family of functions and call routines. The SAS community has already generated some literature on getting started with these often-cryptic, but very powerful character functions. The goal of this paper is to build upon the existing literature by exposing some of the pitfalls and subtleties of writing regular expressions. Using a fictitious clinical trial adverse event data set, concepts such as zero-width assertions, anchors, non-capturing buffers and greedy quantifiers are explored. This paper is targeted at those who already have a basic knowledge of regular expressions. Keywords: word boundary, negative lookaheads, positive lookbehinds, zero-width assertions, anchors, non-capturing buffers, greedy quantifiers INTRODUCTION Regular expressions enable you to generally characterize a pattern for subsequent matching and manipulation of text fields. If you have you ever used a text editor’s Find (-and Replace) capability of literal strings then you are already using regular expressions, albeit in the most strict sense. In SAS Version 9, Perl regular expressions were made available through the PRX family of functions and call routines. Though the Programming Extract and Reporting Language is itself a programming language, it is the regular expression capabilities of Perl that have been implemented in SAS. The SAS community of users and developers has already generated some literature on getting started with these often- cryptic, but very powerful character functions. Introductory papers by Cassell [2005], Cody [2006], Pless [2005] and others are referenced at the end of this paper. -
GNU Grep: Print Lines That Match Patterns Version 3.7, 8 August 2021
GNU Grep: Print lines that match patterns version 3.7, 8 August 2021 Alain Magloire et al. This manual is for grep, a pattern matching engine. Copyright c 1999{2002, 2005, 2008{2021 Free Software Foundation, Inc. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included in the section entitled \GNU Free Documentation License". i Table of Contents 1 Introduction ::::::::::::::::::::::::::::::::::::: 1 2 Invoking grep :::::::::::::::::::::::::::::::::::: 2 2.1 Command-line Options ::::::::::::::::::::::::::::::::::::::::: 2 2.1.1 Generic Program Information :::::::::::::::::::::::::::::: 2 2.1.2 Matching Control :::::::::::::::::::::::::::::::::::::::::: 2 2.1.3 General Output Control ::::::::::::::::::::::::::::::::::: 3 2.1.4 Output Line Prefix Control :::::::::::::::::::::::::::::::: 5 2.1.5 Context Line Control :::::::::::::::::::::::::::::::::::::: 6 2.1.6 File and Directory Selection:::::::::::::::::::::::::::::::: 7 2.1.7 Other Options ::::::::::::::::::::::::::::::::::::::::::::: 9 2.2 Environment Variables:::::::::::::::::::::::::::::::::::::::::: 9 2.3 Exit Status :::::::::::::::::::::::::::::::::::::::::::::::::::: 12 2.4 grep Programs :::::::::::::::::::::::::::::::::::::::::::::::: 13 3 Regular Expressions ::::::::::::::::::::::::::: 14 3.1 Fundamental Structure :::::::::::::::::::::::::::::::::::::::: -
“Linux at the Command Line” Don Johnson of BU IS&T We’Ll Start with a Sign in Sheet
“Linux at the Command Line” Don Johnson of BU IS&T We’ll start with a sign in sheet. We’ll end with a class evaluation. We’ll cover as much as we can in the time allowed; if we don’t cover everything, you’ll pick it up as you continue working with Linux. This is a hands-on, lab class; ask questions at any time. Commands for you to type are in BOLD The Most Common O/S Used By BU Researchers When Working on a Server or Computer Cluster Linux is a Unix clone begun in 1991 and written from scratch by Linus Torvalds with assistance from a loosely-knit team of hackers across the Net. 64% of the world’s servers run some variant of Unix or Linux. The Android phone and the Kindle run Linux. a set of small Linux is an O/S core programs written by written by Linus Richard Stallman and Torvalds and others others. They are the AND GNU utilities. http://www.gnu.org/ Network: ssh, scp Shells: BASH, TCSH, clear, history, chsh, echo, set, setenv, xargs System Information: w, whoami, man, info, which, free, echo, date, cal, df, free Command Information: man, info Symbols: |, >, >>, <, ;, ~, ., .. Filters: grep, egrep, more, less, head, tail Hotkeys: <ctrl><c>, <ctrl><d> File System: ls, mkdir, cd, pwd, mv, touch, file, find, diff, cmp, du, chmod, find File Editors: gedit, nedit You need a “xterm” emulation – software that emulates an “X” terminal and that connects using the “SSH” Secure Shell protocol. ◦ Windows Use StarNet “X-Win32:” http://www.bu.edu/tech/support/desktop/ distribution/xwindows/xwin32/ ◦ Mac OS X “Terminal” is already installed Why? Darwin, the system on which Apple's Mac OS X is built, is a derivative of 4.4BSD-Lite2 and FreeBSD. -
Regular Expressions in Jlex to Define a Token in Jlex, the User to Associates a Regular Expression with Commands Coded in Java
Regular Expressions in JLex To define a token in JLex, the user to associates a regular expression with commands coded in Java. When input characters that match a regular expression are read, the corresponding Java code is executed. As a user of JLex you don’t need to tell it how to match tokens; you need only say what you want done when a particular token is matched. Tokens like white space are deleted simply by having their associated command not return anything. Scanning continues until a command with a return in it is executed. The simplest form of regular expression is a single string that matches exactly itself. © CS 536 Spring 2005 109 For example, if {return new Token(sym.If);} If you wish, you can quote the string representing the reserved word ("if"), but since the string contains no delimiters or operators, quoting it is unnecessary. For a regular expression operator, like +, quoting is necessary: "+" {return new Token(sym.Plus);} © CS 536 Spring 2005 110 Character Classes Our specification of the reserved word if, as shown earlier, is incomplete. We don’t (yet) handle upper or mixed- case. To extend our definition, we’ll use a very useful feature of Lex and JLex— character classes. Characters often naturally fall into classes, with all characters in a class treated identically in a token definition. In our definition of identifiers all letters form a class since any of them can be used to form an identifier. Similarly, in a number, any of the ten digit characters can be used. © CS 536 Spring 2005 111 Character classes are delimited by [ and ]; individual characters are listed without any quotation or separators. -
AVR244 AVR UART As ANSI Terminal Interface
AVR244: AVR UART as ANSI Terminal Interface Features 8-bit • Make use of standard terminal software as user interface to your application. • Enables use of a PC keyboard as input and ascii graphic to display status and control Microcontroller information. • Drivers for ANSI/VT100 Terminal Control included. • Interactive menu interface included. Application Note Introduction This application note describes some basic routines to interface the AVR to a terminal window using the UART (hardware or software). The routines use a subset of the ANSI Color Standard to position the cursor and choose text modes and colors. Rou- tines for simple menu handling are also implemented. The routines can be used to implement a human interface through an ordinary termi- nal window, using interactive menus and selections. This is particularly useful for debugging and diagnostics purposes. The routines can be used as a basic interface for implementing more complex terminal user interfaces. To better understand the code, an introduction to ‘escape sequences’ is given below. Escape Sequences The special terminal functions mentioned (e.g. text modes and colors) are selected using ANSI escape sequences. The AVR sends these sequences to the connected terminal, which in turn executes the associated commands. The escape sequences are strings of bytes starting with an escape character (ASCII code 27) followed by a left bracket ('['). The rest of the string decides the specific operation. For instance, the command '1m' selects bold text, and the full escape sequence thus becomes 'ESC[1m'. There must be no spaces between the characters, and the com- mands are case sensitive. The various operations used in this application note are described below. -
Quick Tips and Tricks: Perl Regular Expressions in SAS® Pratap S
Paper 4005-2019 Quick Tips and Tricks: Perl Regular Expressions in SAS® Pratap S. Kunwar, Jinson Erinjeri, Emmes Corporation. ABSTRACT Programming with text strings or patterns in SAS® can be complicated without the knowledge of Perl regular expressions. Just knowing the basics of regular expressions (PRX functions) will sharpen anyone's programming skills. Having attended a few SAS conferences lately, we have noticed that there are few presentations on this topic and many programmers tend to avoid learning and applying the regular expressions. Also, many of them are not aware of the capabilities of these functions in SAS. In this presentation, we present quick tips on these expressions with various applications which will enable anyone learn this topic with ease. INTRODUCTION SAS has numerous character (string) functions which are very useful in manipulating character fields. Every SAS programmer is generally familiar with basic character functions such as SUBSTR, SCAN, STRIP, INDEX, UPCASE, LOWCASE, CAT, ANY, NOT, COMPARE, COMPBL, COMPRESS, FIND, TRANSLATE, TRANWRD etc. Though these common functions are very handy for simple string manipulations, they are not built for complex pattern matching and search-and-replace operations. Regular expressions (RegEx) are both flexible and powerful and are widely used in popular programming languages such as Perl, Python, JavaScript, PHP, .NET and many more for pattern matching and translating character strings. Regular expressions skills can be easily ported to other languages like SQL., However, unlike SQL, RegEx itself is not a programming language, but simply defines a search pattern that describes text. Learning regular expressions starts with understanding of character classes and metacharacters. -
ANSI® Programmer's Reference Manual Line Matrix Series Printers
ANSI® Programmer’s Reference Manual Line Matrix Series Printers Printronix, LLC makes no representations or warranties of any kind regarding this material, including, but not limited to, implied warranties of merchantability and fitness for a particular purpose. Printronix, LLC shall not be held responsible for errors contained herein or any omissions from this material or for any damages, whether direct, indirect, incidental or consequential, in connection with the furnishing, distribution, performance or use of this material. The information in this manual is subject to change without notice. This document contains proprietary information protected by copyright. No part of this document may be reproduced, copied, translated or incorporated in any other material in any form or by any means, whether manual, graphic, electronic, mechanical or otherwise, without the prior written consent of Printronix, LLC Copyright © 1998, 2012 Printronix, LLC All rights reserved. Trademark Acknowledgements ANSI is a registered trademark of American National Standards Institute, Inc. Centronics is a registered trademark of Genicom Corporation. Dataproducts is a registered trademark of Dataproducts Corporation. Epson is a registered trademark of Seiko Epson Corporation. IBM and Proprinter are registered trademarks and PC-DOS is a trademark of International Business Machines Corporation. MS-DOS is a registered trademark of Microsoft Corporation. Printronix, IGP, PGL, LinePrinter Plus, and PSA are registered trademarks of Printronix, LLC. QMS is a registered -
Context-Free Grammar for the Syntax of Regular Expression Over the ASCII
Context-free Grammar for the syntax of regular expression over the ASCII character set assumption : • A regular expression is to be interpreted a Haskell string, then is used to match against a Haskell string. Therefore, each regexp is enclosed inside a pair of double quotes, just like any Haskell string. For clarity, a regexp is highlighted and a “Haskell input string” is quoted for the examples in this document. • Since ASCII character strings will be encoded as in Haskell, therefore special control ASCII characters such as NUL and DEL are handled by Haskell. context-free grammar : BNF notation is used to describe the syntax of regular expressions defined in this document, with the following basic rules: • <nonterminal> ::= choice1 | choice2 | ... • Double quotes are used when necessary to reflect the literal meaning of the content itself. <regexp> ::= <union> | <concat> <union> ::= <regexp> "|" <concat> <concat> ::= <term><concat> | <term> <term> ::= <star> | <element> <star> ::= <element>* <element> ::= <group> | <char> | <emptySet> | <emptyStr> <group> ::= (<regexp>) <char> ::= <alphanum> | <symbol> | <white> <alphanum> ::= A | B | C | ... | Z | a | b | c | ... | z | 0 | 1 | 2 | ... | 9 <symbol> ::= ! | " | # | $ | % | & | ' | + | , | - | . | / | : | ; | < | = | > | ? | @ | [ | ] | ^ | _ | ` | { | } | ~ | <sp> | \<metachar> <sp> ::= " " <metachar> ::= \ | "|" | ( | ) | * | <white> <white> ::= <tab> | <vtab> | <nline> <tab> ::= \t <vtab> ::= \v <nline> ::= \n <emptySet> ::= Ø <emptyStr> ::= "" Explanations : 1. Definition of <metachar> in our definition of regexp: Symbol meaning \ Used to escape a metacharacter, \* means the star char itself | Specifies alternatives, y|n|m means y OR n OR m (...) Used for grouping, giving the group priority * Used to indicate zero or more of a regexp, a* matches the empty string, “a”, “aa”, “aaa” and so on Whi tespace char meaning \n A new line character \t A horizontal tab character \v A vertical tab character 2. -
Data Types in C
Princeton University Computer Science 217: Introduction to Programming Systems Data Types in C 1 Goals of C Designers wanted C to: But also: Support system programming Support application programming Be low-level Be portable Be easy for people to handle Be easy for computers to handle • Conflicting goals on multiple dimensions! • Result: different design decisions than Java 2 Primitive Data Types • integer data types • floating-point data types • pointer data types • no character data type (use small integer types instead) • no character string data type (use arrays of small ints instead) • no logical or boolean data types (use integers instead) For “under the hood” details, look back at the “number systems” lecture from last week 3 Integer Data Types Integer types of various sizes: signed char, short, int, long • char is 1 byte • Number of bits per byte is unspecified! (but in the 21st century, pretty safe to assume it’s 8) • Sizes of other integer types not fully specified but constrained: • int was intended to be “natural word size” • 2 ≤ sizeof(short) ≤ sizeof(int) ≤ sizeof(long) On ArmLab: • Natural word size: 8 bytes (“64-bit machine”) • char: 1 byte • short: 2 bytes • int: 4 bytes (compatibility with widespread 32-bit code) • long: 8 bytes What decisions did the designers of Java make? 4 Integer Literals • Decimal: 123 • Octal: 0173 = 123 • Hexadecimal: 0x7B = 123 • Use "L" suffix to indicate long literal • No suffix to indicate short literal; instead must use cast Examples • int: 123, 0173, 0x7B • long: 123L, 0173L, 0x7BL • short: