
REGULAR EXPRESSIONS (REGEX)

A regular expression (regex) is a pattern that a string may or may not match.

Example: [0-9]+
  [0-9] means a character in that range
  “+” means one or more of what came before

Strings that match: 9125, 4
Strings that don’t: abc, the empty string

Note: Symbols like [, ], and + have special meaning. They are not part of the string that matches. We’ll find out how to look for these in a string shortly.
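
The example above can be checked directly in Java. This is a minimal sketch; the class name DigitsDemo and the method isAllDigits are made up for illustration and are not part of any library.

```java
import java.util.regex.Pattern;

class DigitsDemo {
    // The pattern from the example: one or more characters in the range 0-9.
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Hypothetical helper: true only if the whole string is one or more digits.
    static boolean isAllDigits(String s) {
        return DIGITS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("9125")); // true
        System.out.println(isAllDigits("4"));    // true
        System.out.println(isAllDigits("abc"));  // false
        System.out.println(isAllDigits(""));     // false: + needs at least one digit
    }
}
```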

USES FOR REGULAR EXPRESSIONS

Handling white space: A program ought to be able to treat any number of white space characters as a separator.
Identifying blank lines: Most people consider a line with just spaces on it to be blank.
Validating input: To check that input has the expected format (e.g., a date in DD-MM-YYYY format).
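
Each of these three uses can be sketched in a line of Java. The class and method names below are hypothetical. The date check only tests the shape DD-MM-YYYY (two digits, two digits, four digits); it does not check that the day or month is a real one.

```java
class RegexUses {
    // Handling white space: any run of white space counts as one separator.
    static String[] splitOnWhitespace(String line) {
        return line.trim().split("\\s+");
    }

    // Identifying blank lines: empty, or nothing but white space.
    static boolean isBlank(String line) {
        return line.matches("\\s*");
    }

    // Validating input: checks only the DD-MM-YYYY shape, not the calendar.
    static boolean looksLikeDate(String s) {
        return s.matches("[0-9]{2}-[0-9]{2}-[0-9]{4}");
    }

    public static void main(String[] args) {
        System.out.println(splitOnWhitespace("a  b\tc").length); // 3
        System.out.println(isBlank("   "));                      // true
        System.out.println(looksLikeDate("25-12-2024"));         // true
        System.out.println(looksLikeDate("2024-12-25"));         // false
    }
}
```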

WHY DO WE USE REGEX?

It’s much easier to declare a pattern that you want matched than to write code that matches it.
By having the pattern explicitly declared, rather than implicit in code that matches it, it’s much easier to understand what the pattern is and modify it if need be.

WHERE DO WE USE REGEX?

Regular expressions are used in many places.
Editors like emacs and Sublime Text allow you to use regular expressions for searching.
Many commands use regular expressions. Example: grep pattern file prints all lines from file that match pattern.
Many programming languages provide a library for regular expressions.

The syntax varies from context to context, but the core is the same everywhere.

SOME SIMPLE PATTERNS

ANCHORING

Anchoring lets you force the position of the match.
  ^ matches the beginning of the line
  $ matches the end of the line
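
A small Java sketch of anchoring, using hypothetical helper names. By default in Java, ^ and $ anchor to the whole input; they anchor to each line only when the Pattern.MULTILINE flag is set.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // ^ pins the match to the start of the input.
    static boolean startsWithDigits(String s) {
        return Pattern.compile("^[0-9]+").matcher(s).find();
    }

    // $ pins the match to the end of the input.
    static boolean endsWithDigits(String s) {
        return Pattern.compile("[0-9]+$").matcher(s).find();
    }

    // Both anchors together: nothing but digits from start to end.
    static boolean onlyDigits(String s) {
        return Pattern.compile("^[0-9]+$").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(startsWithDigits("42abc")); // true
        System.out.println(endsWithDigits("42abc"));   // false
        System.out.println(onlyDigits("42"));          // true
        System.out.println(onlyDigits("4a2"));         // false
    }
}
```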

ESCAPING

What if I want my regular expression to look for the characters $ ^ [ themselves?
We need to escape each character’s predefined special meaning by putting a backslash in front of it:
  \$ \^ \[
We can also use escapes to get other characters:
  \t is a tab character
  \n is a newline
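
In Java source code the backslash is also the escape character of string literals, so the regex \$ is written as "\\$". The sketch below uses hypothetical helper names. Pattern.quote is a standard library method that escapes a whole string for you.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // The regex \$ looks for a literal dollar sign.
    static boolean containsDollar(String s) {
        return Pattern.compile("\\$").matcher(s).find();
    }

    // Pattern.quote turns any string into a regex that matches it literally.
    static boolean containsLiteral(String s, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(s).find();
    }

    // The regex \t looks for a tab character.
    static boolean containsTab(String s) {
        return Pattern.compile("\\t").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsDollar("cost: $5"));   // true
        System.out.println(containsLiteral("a[0]", "[")); // true
        System.out.println(containsTab("a b"));           // false
    }
}
```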

QUANTIFIERS

PREDEFINED CHARACTERS

DEFINING YOUR OWN CHARACTER CLASSES

What if you want to define your own character range?
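
The examples for this slide did not survive, so the following is only an illustrative sketch of standard bracket syntax. The class names and the chosen classes are assumptions, not taken from the slide. Square brackets list the allowed characters, a hyphen gives a range, and a leading ^ inside the brackets negates the class.

```java
class CharClassDemo {
    // [aeiou] matches exactly one character from the listed set.
    static boolean isVowel(String s) {
        return s.matches("[aeiou]");
    }

    // Several ranges can be combined in one class.
    static boolean isHex(String s) {
        return s.matches("[0-9a-fA-F]+");
    }

    // A leading ^ inside the brackets means "any character except these".
    static boolean hasNoDigits(String s) {
        return s.matches("[^0-9]*");
    }

    public static void main(String[] args) {
        System.out.println(isVowel("e"));      // true
        System.out.println(isHex("1aF"));      // true
        System.out.println(hasNoDigits("a1")); // false
    }
}
```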

CAPTURING GROUPS

Capturing groups allow you to treat multiple characters as a single unit. Use parentheses to group.
For example, (BC)* means zero or more instances of BC: the empty string, BC, BCBC, BCBCBC, etc.
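
To see what the parentheses change, compare (BC)* with BC*, where the star applies to the C alone. A minimal sketch with hypothetical method names:

```java
class GroupingDemo {
    // With the group, * repeats the whole unit BC.
    static boolean isRepeatedBC(String s) {
        return s.matches("(BC)*");
    }

    // Without the group, * repeats only the C.
    static boolean isBThenCs(String s) {
        return s.matches("BC*");
    }

    public static void main(String[] args) {
        System.out.println(isRepeatedBC("BCBC")); // true
        System.out.println(isRepeatedBC("BCCC")); // false
        System.out.println(isBThenCs("BCCC"));    // true
        System.out.println(isBThenCs("BCBC"));    // false
    }
}
```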

CAPTURING GROUPS ARE NUMBERED

Capturing groups are numbered by counting their opening parentheses from left to right.
((A)(B(C))) has the following groups:
  1. ((A)(B(C)))
  2. (A)
  3. (B(C))
  4. (C)
The numbers are useful because they can be used as references to the groups.
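
In Java, the numbering can be inspected with Matcher.group(int), where group 0 is always the whole match. The helper groupsOf below is a hypothetical name used only for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberDemo {
    // Returns group 0 (the whole match) followed by each numbered group,
    // or an empty array if the input does not match the regex.
    static String[] groupsOf(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] g = groupsOf("((A)(B(C)))", "ABC");
        for (int i = 1; i < g.length; i++) {
            System.out.println(i + ": " + g[i]); // 1: ABC, 2: A, 3: BC, 4: C
        }
    }
}
```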

CAPTURING GROUPS AND BACKREFERENCES

The section of the input string matching the capturing group(s) is saved in memory for later recall via backreference.
A backreference is specified in the regular expression as a backslash (\) followed by a digit indicating the number of the group to be recalled.
For example:
  one matching string for the regex (\d\d)\1 is 1212
  one matching string for the regex (\w*)\s\1 is asdf asdf
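
Both backreference examples can be tried in Java. Each backslash is doubled inside the string literal. The class and method names are hypothetical.

```java
class BackrefDemo {
    // (\d\d)\1 : two digits, then the same two digits again.
    static boolean isDoubledPair(String s) {
        return s.matches("(\\d\\d)\\1");
    }

    // (\w*)\s\1 : a word, one white space character, then the same word again.
    static boolean isRepeatedWord(String s) {
        return s.matches("(\\w*)\\s\\1");
    }

    public static void main(String[] args) {
        System.out.println(isDoubledPair("1212"));       // true
        System.out.println(isDoubledPair("1213"));       // false
        System.out.println(isRepeatedWord("asdf asdf")); // true
        System.out.println(isRepeatedWord("asdf qwer")); // false
    }
}
```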

FINDING REGEX PATTERNS

When given a set of strings that need to be matched, a proper regex for this set will:
  match ALL strings in this set, and
  NOT match any string that is NOT in this set
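
One way to test a candidate regex against both conditions is sketched below. The helper fitsExamples is hypothetical. A finite list of non-matching strings can only sample the second condition, so passing this check is evidence that the regex is right, not proof.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class RegexCheck {
    // True if the regex matches every string in mustMatch
    // and none of the strings in mustNotMatch.
    static boolean fitsExamples(String regex, List<String> mustMatch,
                                List<String> mustNotMatch) {
        Pattern p = Pattern.compile(regex);
        for (String s : mustMatch) {
            if (!p.matcher(s).matches()) {
                return false; // misses a string it should match
            }
        }
        for (String s : mustNotMatch) {
            if (p.matcher(s).matches()) {
                return false; // accepts a string it should reject
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> good = Arrays.asList("9125", "4");
        List<String> bad = Arrays.asList("abc", "");
        System.out.println(fitsExamples("[0-9]+", good, bad)); // true
        System.out.println(fitsExamples("[0-9]*", good, bad)); // false: also matches ""
    }
}
```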

REGULAR EXPRESSIONS IN JAVA

The java.util.regex package contains:
  Pattern: a compiled regular expression
  Matcher: the result of a match

  import java.util.regex.Pattern;
  import java.util.regex.Matcher;
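
A typical way to use the two classes together: compile the regex once into a Pattern, ask it for a Matcher on a particular input, then query the Matcher. The method allNumbers is a hypothetical example, not part of the package.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternMatcherDemo {
    // Collects every run of digits found in the text, in order.
    static List<String> allNumbers(String text) {
        Pattern p = Pattern.compile("[0-9]+"); // compile the regex once
        Matcher m = p.matcher(text);           // bind it to one input string
        List<String> found = new ArrayList<>();
        while (m.find()) {                     // advance to the next match
            found.add(m.group());              // text of the current match
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(allNumbers("room 12, floor 3")); // [12, 3]
    }
}
```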

MATCHES() VS. FIND()

matches(): checks whether the entire string matches the regex
find(): tries to find a substring that matches the regex
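
The difference shows up when the same regex is run against the same input through both methods. A minimal sketch with hypothetical method names:

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // matches(): the entire string must be digits.
    static boolean wholeStringIsNumber(String s) {
        return DIGITS.matcher(s).matches();
    }

    // find(): some substring must be digits.
    static boolean containsNumber(String s) {
        return DIGITS.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeStringIsNumber("abc123")); // false
        System.out.println(containsNumber("abc123"));      // true
        System.out.println(wholeStringIsNumber("123"));    // true
        System.out.println(containsNumber("abc"));         // false
    }
}
```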

GET MORE PRACTICE!

Some great tutorials/tools for practicing regex:
  https://regexone.com/
  https://regex101.com/

DIFFERENT "FLAVOURS" OF REGEX

Different languages have different implementations of regex, which may behave differently.
For this course, we will only be testing you on regex usage in Java.
For other types of regex you may use outside this course: know your flavour before using it (read the manual).
