Regular Expressions: the Complete Tutorial

Total Page:16

File Type:pdf, Size:1020Kb

Regular Expressions: the Complete Tutorial Regular Expressions The Complete Tutorial Jan Goyvaerts Regular Expressions: The Complete Tutorial Jan Goyvaerts Copyright © 2006, 2007 Jan Goyvaerts. All rights reserved. Last updated July 2007. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the author. This book is published exclusively at http://www.regular-expressions.info/print.html Every effort has been made to make this book as complete and as accurate as possible, but no warranty or fitness is implied. The information is provided on an “as is” basis. The author and the publisher shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book. i Table of Contents Tutorial................................................................................................................ 1 1. Regular Expression Tutorial ......................................................................................................................................... 3 2. Literal Characters............................................................................................................................................................ 5 3. First Look at How a Regex Engine Works Internally .............................................................................................. 7 4. Character Classes or Character Sets............................................................................................................................. 9 5. The Dot Matches (Almost) Any Character ..............................................................................................................13 6. Start of String and End of String Anchors...............................................................................................................15 7. Word Boundaries..........................................................................................................................................................18 8. Alternation with The Vertical Bar or Pipe Symbol .................................................................................................21 9. Optional Items ..............................................................................................................................................................23 10. Repetition with Star and Plus ...................................................................................................................................24 11. Use Round Brackets for Grouping..........................................................................................................................27 12. Named Capturing Groups ........................................................................................................................................31 13. Unicode Regular Expressions...................................................................................................................................33 14. Regex Matching Modes .............................................................................................................................................42 15. Possessive Quantifiers ...............................................................................................................................................44 16. Atomic Grouping .......................................................................................................................................................47 17. Lookahead and Lookbehind Zero-Width Assertions...........................................................................................49 18. Testing The Same Part of a String for More Than One Requirement ..............................................................52 19. Continuing at The End of The Previous Match....................................................................................................54 20. If-Then-Else Conditionals in Regular Expressions ..............................................................................................56 21. XML Schema Character Classes ..............................................................................................................................59 22. POSIX Bracket Expressions ....................................................................................................................................61 23. Adding Comments to Regular Expressions...........................................................................................................65 24. Free-Spacing Regular Expressions...........................................................................................................................66 Examples........................................................................................................... 67 1. Sample Regular Expressions.......................................................................................................................................69 2. Matching Floating Point Numbers with a Regular Expression ............................................................................72 3. How to Find or Validate an Email Address.............................................................................................................73 4. Matching a Valid Date .................................................................................................................................................76 5. Matching Whole Lines of Text...................................................................................................................................77 6. Deleting Duplicate Lines From a File.......................................................................................................................78 8. Find Two Words Near Each Other...........................................................................................................................79 9. Runaway Regular Expressions: Catastrophic Backtracking...................................................................................80 10. Repeating a Capturing Group vs. Capturing a Repeated Group........................................................................85 Tools & Languages........................................................................................... 87 1. Specialized Tools and Utilities for Working with Regular Expressions ..............................................................89 2. Using Regular Expressions with Delphi for .NET and Win32.............................................................................91 ii 3. EditPad Pro: Convenient Text Editor with Full Regular Expression Support ..................................................92 4. What Is grep?.................................................................................................................................................................95 5. Using Regular Expressions in Java ............................................................................................................................97 6. Java Demo Application using Regular Expressions..............................................................................................100 7. Using Regular Expressions with JavaScript and ECMAScript............................................................................107 8. JavaScript RegExp Example: Regular Expression Tester....................................................................................109 9. MySQL Regular Expressions with The REGEXP Operator..............................................................................110 10. Using Regular Expressions with The Microsoft .NET Framework ................................................................111 11. C# Demo Application.............................................................................................................................................114 12. Oracle Database 10g Regular Expressions...........................................................................................................121 13. The PCRE Open Source Regex Library...............................................................................................................123 14. Perl’s Rich Support for Regular Expressions.......................................................................................................124 15. PHP Provides Three Sets of Regular Expression Functions ............................................................................126 16. POSIX Basic Regular Expressions........................................................................................................................129 17. PostgreSQL Has Three Regular Expression Flavors .........................................................................................131 18. PowerGREP: Taking grep Beyond The Command Line ..................................................................................133 19. Python’s re Module ..................................................................................................................................................135 20. How to Use Regular Expressions in REALbasic................................................................................................139 21. RegexBuddy: Your Perfect Companion for Working with Regular Expressions..........................................142
Recommended publications
  • Preview Objective-C Tutorial (PDF Version)
    Objective-C Objective-C About the Tutorial Objective-C is a general-purpose, object-oriented programming language that adds Smalltalk-style messaging to the C programming language. This is the main programming language used by Apple for the OS X and iOS operating systems and their respective APIs, Cocoa and Cocoa Touch. This reference will take you through simple and practical approach while learning Objective-C Programming language. Audience This reference has been prepared for the beginners to help them understand basic to advanced concepts related to Objective-C Programming languages. Prerequisites Before you start doing practice with various types of examples given in this reference, I'm making an assumption that you are already aware about what is a computer program, and what is a computer programming language? Copyright & Disclaimer © Copyright 2015 by Tutorials Point (I) Pvt. Ltd. All the content and graphics published in this e-book are the property of Tutorials Point (I) Pvt. Ltd. The user of this e-book can retain a copy for future reference but commercial use of this data is not allowed. Distribution or republishing any content or a part of the content of this e-book in any manner is also not allowed without written consent of the publisher. We strive to update the contents of our website and tutorials as timely and as precisely as possible, however, the contents may contain inaccuracies or errors. Tutorials Point (I) Pvt. Ltd. provides no guarantee regarding the accuracy, timeliness or completeness of our website or its contents including this tutorial. If you discover any errors on our website or in this tutorial, please notify us at [email protected] ii Objective-C Table of Contents About the Tutorial ..................................................................................................................................
    [Show full text]
  • Technical Standard
    Technical Standard X/Open Curses, Issue 7 The Open Group ©November 2009, The Open Group All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owners. Technical Standard X/Open Curses, Issue 7 ISBN: 1-931624-83-6 Document Number: C094 Published in the U.K. by The Open Group, November 2009. This standardhas been prepared by The Open Group Base Working Group. Feedback relating to the material contained within this standardmay be submitted by using the web site at http://austingroupbugs.net with the Project field set to "Xcurses Issue 7". ii Technical Standard 2009 Contents Chapter 1 Introduction........................................................................................... 1 1.1 This Document ........................................................................................ 1 1.1.1 Relationship to Previous Issues ......................................................... 1 1.1.2 Features Introduced in Issue 7 ........................................................... 2 1.1.3 Features Withdrawn in Issue 7........................................................... 2 1.1.4 Features Introduced in Issue 4 ........................................................... 2 1.2 Conformance............................................................................................ 3 1.2.1 Base Curses Conformance .................................................................
    [Show full text]
  • Javascript and the DOM
    Javascript and the DOM 1 Introduzione alla programmazione web – Marco Ronchetti 2020 – Università di Trento The web architecture with smart browser The web programmer also writes Programs which run on the browser. Which language? Javascript! HTTP Get + params File System Smart browser Server httpd Cgi-bin Internet Query SQL Client process DB Data Evolution 3: execute code also on client! (How ?) Javascript and the DOM 1- Adding dynamic behaviour to HTML 3 Introduzione alla programmazione web – Marco Ronchetti 2020 – Università di Trento Example 1: onmouseover, onmouseout <!DOCTYPE html> <html> <head> <title>Dynamic behaviour</title> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> </head> <body> <div onmouseover="this.style.color = 'red'" onmouseout="this.style.color = 'green'"> I can change my colour!</div> </body> </html> JAVASCRIPT The dynamic behaviour is on the client side! (The file can be loaded locally) <body> <div Example 2: onmouseover, onmouseout onmouseover="this.style.background='orange'; this.style.color = 'blue';" onmouseout=" this.innerText='and my text and position too!'; this.style.position='absolute'; this.style.left='100px’; this.style.top='150px'; this.style.borderStyle='ridge'; this.style.borderColor='blue'; this.style.fontSize='24pt';"> I can change my colour... </div> </body > JavaScript is event-based UiEvents: These event objects iherits the properties of the UiEvent: • The FocusEvent • The InputEvent • The KeyboardEvent • The MouseEvent • The TouchEvent • The WheelEvent See https://www.w3schools.com/jsref/obj_uievent.asp Test and Gym JAVASCRIPT HTML HEAD HTML BODY CSS https://www.jdoodle.com/html-css-javascript-online-editor/ Javascript and the DOM 2- Introduction to the language 8 Introduzione alla programmazione web – Marco Ronchetti 2020 – Università di Trento JavaScript History • JavaScript was born as Mocha, then “LiveScript” at the beginning of the 94’s.
    [Show full text]
  • Strings in C++
    Programming Abstractions C S 1 0 6 X Cynthia Lee Today’s Topics Introducing C++ from the Java Programmer’s Perspective . Absolute value example, continued › C++ strings and streams ADTs: Abstract Data Types . Introduction: What are ADTs? . Queen safety example › Grid data structure › Passing objects by reference • const reference parameters › Loop over “neighbors” in a grid Strings in C++ STRING LITERAL VS STRING CLASS CONCATENATION STRING CLASS METHODS 4 Using cout and strings int main(){ int n = absoluteValue(-5); string s = "|-5|"; s += " = "; • This prints |-5| = 5 cout << s << n << endl; • The + operator return 0; concatenates strings, } and += works in the way int absoluteValue(int n) { you’d expect. if (n<0){ n = -n; } return n; } 5 Using cout and strings int main(){ int n = absoluteValue(-5); But SURPRISE!…this one string s = "|-5|" + " = "; doesn’t work. cout << s << n << endl; return 0; } int absoluteValue(int n) { if (n<0){ n = -n; } return n; } C++ string objects and string literals . In this class, we will interact with two types of strings: › String literals are just hard-coded string values: • "hello!" "1234" "#nailedit" • They have no methods that do things for us • Think of them like integer literals: you can’t do "4.add(5);" //no › String objects are objects with lots of helpful methods and operators: • string s; • string piece = s.substr(0,3); • s.append(t); //or, equivalently: s+= t; String object member functions (3.2) Member function name Description s.append(str) add text to the end of a string s.compare(str) return
    [Show full text]
  • Framing Cyclic Revolutionary Emergence of Opposing Symbols of Identity Eppur Si Muove: Biomimetic Embedding of N-Tuple Helices in Spherical Polyhedra - /
    Alternative view of segmented documents via Kairos 23 October 2017 | Draft Framing Cyclic Revolutionary Emergence of Opposing Symbols of Identity Eppur si muove: Biomimetic embedding of N-tuple helices in spherical polyhedra - / - Introduction Symbolic stars vs Strategic pillars; Polyhedra vs Helices; Logic vs Comprehension? Dynamic bonding patterns in n-tuple helices engendering n-fold rotating symbols Embedding the triple helix in a spherical octahedron Embedding the quadruple helix in a spherical cube Embedding the quintuple helix in a spherical dodecahedron and a Pentagramma Mirificum Embedding six-fold, eight-fold and ten-fold helices in appropriately encircled polyhedra Embedding twelve-fold, eleven-fold, nine-fold and seven-fold helices in appropriately encircled polyhedra Neglected recognition of logical patterns -- especially of opposition Dynamic relationship between polyhedra engendered by circles -- variously implying forms of unity Symbol rotation as dynamic essential to engaging with value-inversion References Introduction The contrast to the geocentric model of the solar system was framed by the Italian mathematician, physicist and philosopher Galileo Galilei (1564-1642). His much-cited phrase, " And yet it moves" (E pur si muove or Eppur si muove) was allegedly pronounced in 1633 when he was forced to recant his claims that the Earth moves around the immovable Sun rather than the converse -- known as the Galileo affair. Such a shift in perspective might usefully inspire the recognition that the stasis attributed so widely to logos and other much-valued cultural and heraldic symbols obscures the manner in which they imply a fundamental cognitive dynamic. Cultural symbols fundamental to the identity of a group might then be understood as variously moving and transforming in ways which currently elude comprehension.
    [Show full text]
  • The Elinks Manual the Elinks Manual Table of Contents Preface
    The ELinks Manual The ELinks Manual Table of Contents Preface.......................................................................................................................................................ix 1. Getting ELinks up and running...........................................................................................................1 1.1. Building and Installing ELinks...................................................................................................1 1.2. Requirements..............................................................................................................................1 1.3. Recommended Libraries and Programs......................................................................................1 1.4. Further reading............................................................................................................................2 1.5. Tips to obtain a very small static elinks binary...........................................................................2 1.6. ECMAScript support?!...............................................................................................................4 1.6.1. Ok, so how to get the ECMAScript support working?...................................................4 1.6.2. The ECMAScript support is buggy! Shall I blame Mozilla people?..............................6 1.6.3. Now, I would still like NJS or a new JS engine from scratch. .....................................6 1.7. Feature configuration file (features.conf).............................................................................7
    [Show full text]
  • Use Perl Regular Expressions in SAS® Shuguang Zhang, WRDS, Philadelphia, PA
    NESUG 2007 Programming Beyond the Basics Use Perl Regular Expressions in SAS® Shuguang Zhang, WRDS, Philadelphia, PA ABSTRACT Regular Expression (Regexp) enhance search and replace operations on text. In SAS®, the INDEX, SCAN and SUBSTR functions along with concatenation (||) can be used for simple search and replace operations on static text. These functions lack flexibility and make searching dynamic text difficult, and involve more function calls. Regexp combines most, if not all, of these steps into one expression. This makes code less error prone, easier to maintain, clearer, and can improve performance. This paper will discuss three ways to use Perl Regular Expression in SAS: 1. Use SAS PRX functions; 2. Use Perl Regular Expression with filename statement through a PIPE such as ‘Filename fileref PIPE 'Perl programm'; 3. Use an X command such as ‘X Perl_program’; Three typical uses of regular expressions will also be discussed and example(s) will be presented for each: 1. Test for a pattern of characters within a string; 2. Replace text; 3. Extract a substring. INTRODUCTION Perl is short for “Practical Extraction and Report Language". Larry Wall Created Perl in mid-1980s when he was trying to produce some reports from a Usenet-Nes-like hierarchy of files. Perl tries to fill the gap between low-level programming and high-level programming and it is easy, nearly unlimited, and fast. A regular expression, often called a pattern in Perl, is a template that either matches or does not match a given string. That is, there are an infinite number of possible text strings.
    [Show full text]
  • Lecture 18: Theory of Computation Regular Expressions and Dfas
    Introduction to Theoretical CS Lecture 18: Theory of Computation Two fundamental questions. ! What can a computer do? ! What can a computer do with limited resources? General approach. Pentium IV running Linux kernel 2.4.22 ! Don't talk about specific machines or problems. ! Consider minimal abstract machines. ! Consider general classes of problems. COS126: General Computer Science • http://www.cs.Princeton.EDU/~cos126 2 Why Learn Theory In theory . Regular Expressions and DFAs ! Deeper understanding of what is a computer and computing. ! Foundation of all modern computers. ! Pure science. ! Philosophical implications. a* | (a*ba*ba*ba*)* In practice . ! Web search: theory of pattern matching. ! Sequential circuits: theory of finite state automata. a a a ! Compilers: theory of context free grammars. b b ! Cryptography: theory of computational complexity. 0 1 2 ! Data compression: theory of information. b "In theory there is no difference between theory and practice. In practice there is." -Yogi Berra 3 4 Pattern Matching Applications Regular Expressions: Basic Operations Test if a string matches some pattern. Regular expression. Notation to specify a set of strings. ! Process natural language. ! Scan for virus signatures. ! Search for information using Google. Operation Regular Expression Yes No ! Access information in digital libraries. ! Retrieve information from Lexis/Nexis. Concatenation aabaab aabaab every other string ! Search-and-replace in a word processors. cumulus succubus Wildcard .u.u.u. ! Filter text (spam, NetNanny, Carnivore, malware). jugulum tumultuous ! Validate data-entry fields (dates, email, URL, credit card). aa Union aa | baab baab every other string ! Search for markers in human genome using PROSITE patterns. aa ab Closure ab*a abbba ababa Parse text files.
    [Show full text]
  • Introduction to Javascript
    Introduction to JavaScript Lecture 6 CGS 3066 Fall 2016 October 6, 2016 JavaScript I Dynamic programming language. Program the behavior of web pages. I Client-side scripts to interact with the user. I Communicates asynchronously and alters document content. I Used with Node.js in server side scripting, game development, mobile applications, etc. I Has thousands of libraries that can be used to carry out various tasks. JavaScript is NOT Java I Names can be deceiving. I Java is a full-fledged object-oriented programming language. I Java is popular for developing large-scale distributed enterprise applications and web applications. I JavaScript is a browser-based scripting language developed by Netscape and implemented in all major browsers. I JavaScript is executed by the browsers on the client side. JavaScript and other languages JavaScript borrows the elements from a variety of languages. I Object orientation from Java. I Syntax from C. I Semantics from Self and Scheme. Whats a script? I A program written for a special runtime environment. I Interpreted (as opposed to compiled). I Used to automate tasks. I Operates at very high levels of abstraction. Whats JavaScript? I Developed at Netscape to perform client side validation. I Adopted by Microsoft in IE 3.0 (1996). I Standardized in 1996. Current standard is ECMAScript 6 (2016). I Specifications for ECMAScript 2016 are out. I CommonJS used for development outside the browser. JavaScript uses I JavaScript has an insanely large API and library. I It is possible to do almost anything with JavaScript. I Write small scripts/apps for your webpage.
    [Show full text]
  • Synchronizing Threads with POSIX Semaphores
    3/17/2016 POSIX Semaphores Synchronizing Threads with POSIX Semaphores 1. Why semaphores? 2. Posix semaphores are easy to use sem_init sem_wait sem_post sem_getvalue sem_destroy 3. Activities 1 2 Now it is time to take a look at some code that does something a little unexpected. The program badcnt.c creates two new threads, both of which increment a global variable called cnt exactly NITER, with NITER = 1,000,000. But the program produces unexpected results. Activity 1. Create a directory called posixsem in your class Unix directory. Download in this directory the code badcnt.c and compile it using gcc badcnt.c -o badcnt -lpthread Run the executable badcnt and observe the ouput. Try it on both tanner and felix. Quite unexpected! Since cnt starts at 0, and both threads increment it NITER times, we should see cnt equal to 2*NITER at the end of the program. What happens? Threads can greatly simplify writing elegant and efficient programs. However, there are problems when multiple threads share a common address space, like the variable cnt in our earlier example. To understand what might happen, let us analyze this simple piece of code: THREAD 1 THREAD 2 a = data; b = data; a++; b--; data = a; data = b; Now if this code is executed serially (for instance, THREAD 1 first and then THREAD 2), there are no problems. However threads execute in an arbitrary order, so consider the following situation: Thread 1 Thread 2 data a = data; --- 0 a = a+1; --- 0 --- b = data; // 0 0 --- b = b + 1; 0 data = a; // 1 --- 1 --- data = b; // 1 1 So data could end up +1, 0, -1, and there is NO WAY to know which value! It is completely non- deterministic! http://www.csc.villanova.edu/~mdamian/threads/posixsem.html 1/4 3/17/2016 POSIX Semaphores The solution to this is to provide functions that will block a thread if another thread is accessing data that it is using.
    [Show full text]
  • Variable Feature Usage Patterns in PHP
    Variable Feature Usage Patterns in PHP Mark Hills East Carolina University, Greenville, NC, USA [email protected] Abstract—PHP allows the names of variables, classes, func- and classes at runtime. Below, we refer to these generally as tions, methods, and properties to be given dynamically, as variable features, and specifically as, e.g., variable functions expressions that, when evaluated, return an identifier as a string. or variable methods. While this provides greater flexibility for programmers, it also makes PHP programs harder to precisely analyze and understand. The main contributions presented in this paper are as follows. In this paper we present a number of patterns designed to First, taking advantage of our prior work, we have developed recognize idiomatic uses of these features that can be statically a number of patterns for detecting idiomatic uses of variable resolved to a precise set of possible names. We then evaluate these features in PHP programs and for resolving these uses to a patterns across a corpus of 20 open-source systems totaling more set of possible names. Written using the Rascal programming than 3.7 million lines of PHP, showing how often these patterns occur in actual PHP code, demonstrating their effectiveness at language [2], [3], these patterns work over the ASTs of the statically determining the names that can be used at runtime, PHP scripts, with more advanced patterns also taking advantage and exploring anti-patterns that indicate when the identifier of control flow graphs and lightweight analysis algorithms for computation is truly dynamic. detecting value flow and reachability. Each of these patterns works generally across all variable features, instead of being I.
    [Show full text]
  • Perl Regular Expressions Tip Sheet Functions and Call Routines
    – Perl Regular Expressions Tip Sheet Functions and Call Routines Basic Syntax Advanced Syntax regex-id = prxparse(perl-regex) Character Behavior Character Behavior Compile Perl regular expression perl-regex and /…/ Starting and ending regex delimiters non-meta Match character return regex-id to be used by other PRX functions. | Alternation character () Grouping {}[]()^ Metacharacters, to match these pos = prxmatch(regex-id | perl-regex, source) $.|*+?\ characters, override (escape) with \ Search in source and return position of match or zero Wildcards/Character Class Shorthands \ Override (escape) next metacharacter if no match is found. Character Behavior \n Match capture buffer n Match any one character . (?:…) Non-capturing group new-string = prxchange(regex-id | perl-regex, times, \w Match a word character (alphanumeric old-string) plus "_") Lazy Repetition Factors Search and replace times number of times in old- \W Match a non-word character (match minimum number of times possible) string and return modified string in new-string. \s Match a whitespace character Character Behavior \S Match a non-whitespace character *? Match 0 or more times call prxchange(regex-id, times, old-string, new- \d Match a digit character +? Match 1 or more times string, res-length, trunc-value, num-of-changes) Match a non-digit character ?? Match 0 or 1 time Same as prior example and place length of result in \D {n}? Match exactly n times res-length, if result is too long to fit into new-string, Character Classes Match at least n times trunc-value is set to 1, and the number of changes is {n,}? Character Behavior Match at least n but not more than m placed in num-of-changes.
    [Show full text]