Regular Expressions (Regex)

REGULAR EXPRESSION

A regular expression (regex) is a pattern that a string may or may not match.

Example: [0-9]+

  • [0-9] means a character in that range
  • “+” means one or more of what came before

Strings that match: 9125, 4
Strings that don’t: abc, the empty string

Note: Symbols like [, ], and + have special meaning. They are not part of the string that matches. We’ll find out how to look for these in a string shortly.

USES FOR REGULAR EXPRESSIONS

  • Handling white space: A program ought to be able to treat any number of white space characters as a separator.
  • Identifying blank lines: Most people consider a line with just spaces on it to be blank.
  • Validating input: To check that input has the expected format (e.g., a date in DD-MM-YYYY format).

WHY DO WE USE REGEX?

  • It’s much easier to declare a pattern that you want matched than to write code that matches it.
  • By having the pattern explicitly declared, rather than implicit in code that matches it, it’s much easier to understand what the pattern is and modify it if need be.

WHERE DO WE USE REGEX?

Regular expressions are used in many places.

  • Editors like vi, emacs, and Sublime Text allow you to use regular expressions for searching.
  • Many Unix commands use regular expressions. Example: grep pattern file prints all lines from file that match pattern.
  • Many programming languages provide a library for regular expressions.

The syntax varies from context to context, but the core is the same everywhere.

SOME SIMPLE PATTERNS

ANCHORING

Anchoring lets you force the position of the match.

  • ^ matches the beginning of the line
  • $ matches the end of the line

ESCAPING

What if we want a regular expression to look for the characters $ ^ [ themselves? We need to escape their predefined special meaning by adding a backslash:
    \$ \^ \[

We can also use escapes to get other characters:

  • \t is a tab character
  • \n is a newline

QUANTIFIERS

PREDEFINED CHARACTERS

DEFINING YOUR OWN CHARACTER CLASSES

What if you want to define your own character range?

CAPTURING GROUPS

Capturing groups allow you to treat multiple characters as a single unit. Use parentheses to group. For example, (BC)* means zero or more instances of BC: BC, BCBC, BCBCBC, etc.

CAPTURING GROUPS ARE NUMBERED

Capturing groups are numbered by counting their opening parentheses from left to right. ((A)(B(C))) has the following groups:

  1. ((A)(B(C)))
  2. (A)
  3. (B(C))
  4. (C)

The numbers are useful because they can be used as references to the groups.

CAPTURING GROUPS AND BACKREFERENCES

The section of the input string matching the capturing group(s) is saved in memory for later recall via backreference. A backreference is specified in the regular expression as a backslash (\) followed by a digit indicating the number of the group to be recalled.

For example:

  • One matching string for the regex (\d\d)\1 is 1212
  • One matching string for the regex (\w*)\s\1 is asdf asdf

FINDING REGEX PATTERNS

When given a set of strings that need to be matched, a proper regex for this set will:

  • match ALL strings in this set, and
  • NOT match any string that is NOT in this set

REGULAR EXPRESSIONS IN JAVA

The java.util.regex package contains:

  • Pattern: a compiled regular expression
  • Matcher: the result of a match

    import java.util.regex.Pattern;
    import java.util.regex.Matcher;

MATCHES() VS. FIND()

  • matches(): matches the entire string against the regex
  • find(): tries to find a substring that matches the regex

GET MORE PRACTICE!

Some great tutorials/tools for practicing regex:

  • https://regexone.com/
  • https://regex101.com/

DIFFERENT "FLAVOURS" OF REGEX

Different languages have different implementations of regex, which may behave differently.
For this course, we will only be testing you on regex usage in Java. For other types of regex you may use outside this course: know your flavour before using it (read the manual).
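
Since Java is the flavour used in this course, the following is a minimal sketch of how the ideas above (matches() vs. find(), escaping, numbered groups, and backreferences) might look with java.util.regex. The class name RegexSketch and the helper methods matchesWhole, containsMatch, and group are hypothetical names chosen for illustration; only Pattern.compile, Pattern.matcher, Matcher.matches, Matcher.find, and Matcher.group come from the Java library. Note that inside a Java string literal, each regex backslash has to be written as two backslashes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the helper names are illustrative only.
class RegexSketch {

    // matches(): true only if the ENTIRE input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // find(): true if SOME substring of the input matches the regex.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Returns the text captured by the numbered group when the whole
    // input matches, or null when it does not match.
    static String group(String regex, String input, int groupNumber) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(groupNumber) : null;
    }

    public static void main(String[] args) {
        // The [0-9]+ example: one or more digits.
        System.out.println(matchesWhole("[0-9]+", "9125"));      // true
        System.out.println(matchesWhole("[0-9]+", "abc"));       // false
        System.out.println(matchesWhole("[0-9]+", ""));          // false

        // matches() needs the whole string; find() needs only a substring.
        System.out.println(matchesWhole("[0-9]+", "abc9125"));   // false
        System.out.println(containsMatch("[0-9]+", "abc9125"));  // true

        // Escaping: the regex \$\^\[ looks for the literal text $^[
        System.out.println(matchesWhole("\\$\\^\\[", "$^["));    // true

        // Backreferences: \1 recalls whatever group 1 captured.
        System.out.println(matchesWhole("(\\d\\d)\\1", "1212"));        // true
        System.out.println(matchesWhole("(\\w*)\\s\\1", "asdf asdf"));  // true

        // Group numbering follows the opening parentheses, left to right.
        System.out.println(group("((A)(B(C)))", "ABC", 3));      // BC
    }
}
```

Compiling the pattern on every call keeps the sketch short; code that reuses one regex many times would more typically compile the Pattern once and keep it in a field.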
