Regular Expressions Regular Expressions Specifying Languages Regular Languages Kleene Star Operation Regular Expressions

Regular Expressions Another means to describe languages Regular Expressions accepted by Finite Automata. In some books, regular languages, by definition, are described using regular expressions. Specifying Languages Regular Languages Recall: how do we specify languages? A regular expression describes a If language is finite, you can list all of its strings. language using only the set operations L = {a, aa, aba, aca} of: Descriptive: Union L = {x | na(x) = nb(x)} Using basic Language operations Concatenation * * L= {aa, ab} ∪ {b}{bb} Kleene Star Regular languages are described using this last method Kleene Star Operation Regular Expressions The set of strings that can be obtained by Regular expressions are the mechanism concatenating any number of elements of a by which regular languages are * language L is called the Kleene Star, L described: " * i 0 1 2 3 4 L = L = L ! L ! L ! L ! L ... Take the “set operation” definition of the Ui=0 language and: Replace ∪ with + * 0 Note that since, L contains L , λ is an Replace {} with () * element of L And you have a regular expression 1 Regular expressions Regular Expression {λ} λ Recursive definition of regular languages / expression over Σ : {011} 011 1. ∅ is a regular language and its regular {0,1} 0 + 1 expression is ∅ {0, 01} 0 + 01 2. {λ} is a regular language and λ is its regular expression {110}*{0,1} (110)*(0+1) 3. For each a ∈ Σ, {a} is a regular language and {10, 11, 01}* (10 + 11 + 01)* its regular expression is a {0, 11}*({11}* ∪ {101, λ}) (0 + 11)*((11)* + 101 + λ) Regular Expression Regular Expressions 4. If L1 and L2 are regular languages with regular Some shorthand expressions r1 and r2 then If we apply precedents to the operators, we can -- L L is a regular language with regular 1 ∪ 2 relax the full parenthesized definition: expression (r1 + r2) Kleene star has highest precedent -- L1L2 is a regular language with regular Concatenation had mid precedent expression (r1r2) -- book uses (r1•r2 ) * + has lowest precedent -- L1 is a regular language with regular expression * Thus (r1 ) * * a + b c is the same as (a + ((b )c)) -- Regular expressions can be parenthesized to * * (a + b) is not the same as a + b indicate operator precidence I.e. (r1) Only languages obtainable by using rules 1-4 are regular languages. Regular Expressions Regular Expressions More shorthand Even more shorthand Equating regular expressions. Sometimes you might see in the book: n Two regular expressions are considered equal if r where n indicates the number of they describe the same language concatenations of r (e.g. r6) * * * + 1 1 = 1 r to indicate one or more concatenations of r. * * (a + b) ≠ a + b Note that this is only shorthand! 6 + r and r are not regular expressions. 2 Regular Expressions Regular Expressions Important thing to remember Questions? A regular expression is not a language A regular expression is used to describe a language. It is incorrect to say that for a language L, * L = (a + b + c) But it’s okay to say that L is described by * (a + b + c) Examples of Regular Languages Examples of Regular Languages All finite languages can be described by All finite languages can be described regular expressions using regular expressions A finite language L can be expressed as Can anyone tell me why? the union of languages each with one string corresponding to a string in L Example: L = {a, aa, aba, aca} L = {a} ∪ {aa} ∪ {aba} ∪ {aca} Regular expression: (a + aa + aba + aca) Examples of Regular Languages Examples of Regular Languages * * L = {x ∈ {0,1} | |x| is even} L = {x ∈ {0,1} | x does not end in 01 } Any string of even length can be obtained by concatenating strings length 2. If x does not end in 01, then either Any concatenation of strings of length 2 will be |x| < 2 or even x ends in 00, 10, or 11 * L = {00, 01, 10, 11} A regular expression that describes L is: * + 0 + 1 + (0 + 1) (00 + 10 + 11) Regular expressions describing L: ε * (00 + 01 + 10 + 11) * ((0 + 1)(0 + 1)) 3 Useful properties of regular Examples of Regular Languages expressions * L = {x ∈ {0,1} | x contains an odd Commutative number of 0s } L + M = M + L Express x = yz Associative i j (L + M) + N = L + (M + N) y is a string of the form y=1 01 (LM)N = L(MN) In z, there must be an even number of additional 0s or z = (01k01m)* Identities * * * * * ∅ + L = L + ∅ = L x can be described by (1 01 )(01 01 ) λL = L λ = L Questions? ∅L = L ∅ = ∅ Useful properties of regular Useful properties of regular expressions expressions Distributed Closures * * * (L ) = L L (M + N) = LM + LN * ∅ = λ (M + N)L = ML + NL * λ = λ Idempotent + * L = LL L + L = L * + L = L + λ Questions? Practical uses for regular expressions Practical uses for regular expressions grep How a compiler works Global (search for) Regular Expressions and Print lexer Stream parser Parse of tokens codegen Finds patterns of characters in a text file. Tree Object code grep man foo.txt Source grep [ab]*c[de]? foo.txt file 4 Practical uses for regular expressions Practical uses for regular expressions How a compiler works How a compiler works The Lexical Analyzer (lexer) reads source Tokens can be described using regular code and generates a stream of tokens expressions! What is a token? Identifier Keyword Number Operator Punctuation Examples of Regular Languages Examples of Regular Languages L = set of valid C keywords L = set of valid C identifiers This is a finite set A valid C identifier begins with a letter or _ A valid C identifier contains letters, L can be described by numbers, and _ if + then + else + while + do + goto + break + switch + … If we let: l = {a , b , … , z , A , B , … , Z} d = {1 , 2 , … , 9 , 0} Then a regular expression for L: * (l + _)(l + d + _) Practical uses for regular expressions Summary Regular languages can be expressed using only the lex set operations of union, concatenation, Kleene Star. Program that will create a lexical analyzer. Regular languages Input: set of valid tokens Means of describing: Regular Expression Machine for accepting: Finite Automata Tokens are given by regular expressions. Practical uses Text search (grep) Compilers / Lexical Analysis (lex) Questions? Questions? 5 For next time The bottom line Chicken or the egg? Regular expressions and finite automata are Which came first, the regular expression or the equivalent in their ability to describe finite automata? languages. McCulloch/Pitts -- used finite automata to model neural Every regular expression has a FA that accepts the networks (1943) language it describes Kleene (mid 1950s) -- Applied to regular sets The language accepted by an FA can be described Ken Thompson/ Bell Labs folk (1970s) -- QED / ed / grep / lex / awk / … by some regular expression. Recall: The Kleene Theorem! (1956) Princeton dudes (1937) But that’s next time…. 6.

Load more