6.035 Lecture 2, Specifying Languages with Regular Expressions and Context-Free Grammars

6.035 Lecture 2, Specifying Languages with Regular Expressions and Context-Free Grammars

MIT 6.035 Specifying Languages with Regular Expressions and Context-Free Grammars Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology Language Definition Problem • How to precisely define language • Layered struc ture of langua ge defifinitiition • Start with a set of letters in language • Lexical structurestructure - identifies “words ” in language (each word is a sequence of letters) • Syntactic sstructuretructure - identifies “sentences ” in language (each sentence is a sequence of words) • Semantics - meaning of program (specifies what result should be for each input) • Today’s topic: lexical and syntactic structures Specifying Formal Languages • Huge Triumph of Computer Science • Beautiful Theoretical Results • Practical Techniques and Applications • Two Dual Notions • Generative approach ((grammar or regular expression) • Recognition approach (automaton) • Lots of theorems about convertingconverting oonene approach automatically to another Specifying Lexical Structure Using Regular ExpressionsExpressions • Have some alphabet ∑ = set of letters • R egular expressions are bu ilt from: • ε -empty string • AAnyn yl lettere tte rf fromr oma alphabetlp ha be t ∑ •r1r2 – regular expression r1 followed by r2 (sequence) •r1| r2 – either regular expression r1 or r2 (choice) • r* - iterated sequence and choice ε | r | rr | … • Parentheses to indicate grouping/precedence Concept of Regular Expression Generating a StringString Rewrite regular expression until have only a sequence of letters (string) left Genera l RRululeses Examp le (0 | 1)*.(0|1)* 1) r1| r2 → r1 (|)(0 | 1)((0 ||) 1)*.(0|1)* 2) r1| r2 → r2 1(0|1)*.(0|1)* 3) r* →rr* 1.(0|1)* 4) r* →ε 1.(0|1)(0|1)* 1.(0|1) 1.0 Nondeterminism in Generation • Rewriting is similar to equational reasoning • But different rule applications may yield different final results Example 1 Example 2 (0|1)*.(0|1)* (0|1)*.(0|1)* (0|1)(0|1)*.(0|1)* (|)(0|1)((0|1)*.(0|1)* 1(0|1)*.(0|1)* 0(0|1)*.(0|1)* 1.(0|1)* 0.(0|1)* 1.((0|1)(0|1)*0|1)(0|1)* 0.(0|1)(0|1)* 1.(0|1) 0.(0|1) 101.0 010.1 Concept of Language Generated by Regular ExpressionsExpressions • Set of all strings generated by a regular expression is language of regular expressionexpression • In general, language may be (countably) infinite • String in language is often calledcalled a token Examples of Languages and Regular Expressions • ∑ = { 0, 1, . } • (0|1)*. (0|1)* - Binary floa tin g poin t numbers • (00)* - even-length all-zero strings • 1*(01*01*)* - strings wwithith eevenven number ooff zeros • ∑ = {a{ a,b ,c , 0, 1, 22} } • (a|b|c)(a|b|c|0|1|2)* - alphanumeric identifiers • (0|1|2)* - trinary numbers Alternate Abstraction Finite-State AAutomatautomata •Alphabet∑ • Se t of states wiithth i niititi al and accept states • Transitions between states, labeled with letters (0|1)*.(0|1)* 1 . 1 Start state 0 0 Accept state Automaton Accepting String Conceptually, run string through automaton • Have current state and current letter in string • Start with start state and first letter in string • At each step, match current letter against a transition whose label is same as letter • Continue until reach end of string or match fails • If end in accept state, automaton accepts string • Language of automaton is set of strings it accepts Example Current state 1 . 1 Start state 0 0 Accept state 11.0 CurrCurrentent letletterter Example Current state 1 . 1 Start state 0 0 Accept state 11.0 CurrCurrentent letletterter Example Current state 1 . 1 Start state 0 0 Accept state 11.0 CurrCurrentent letletterter Example Current state 1 . 1 Start state 0 0 Accept state 11.0 CurrCurrentent letletterter Example Current state 1 . 1 Start state 0 0 Accept state 11.0 CurrCurrentent letletterter Example Current state 1 . 1 Start state 0 0 Accept state 11.0 String is accepted! CurrCurrentent letletterter Generative Versus Recognition • Regular expressions give you a way to generate all strings in language • Automata give you a way to recognize if a specific string is in language •Phlhilosophicalllly very difffferent • Theoretically equivalent (for regular expressions aandnd automata) • Standard approach • Use rregularegular expressions wwhenhen ddefineefine llanguageanguage • Translated automatically into automata for implementation From Regular Expressions to Automata • Construction by structural induction • Gi ven an arbit rary regular expression r • Assume we can convert r to an automaton with • One startstart state • One accept state • Show how too convertc all constructorsc too deliver an automaton with • One startstart state • One accept state Basic Constructs Start state Accept sstatetate ε ε a a∈Σ Sequence Start state Accept state r1r 2 r1 r2 Sequence Old start state Start state Old accept state Accept state r1r 2 r1 r2 Sequence Old start state Start state Old accept state Accept state ε r1r 2 r1 r2 Sequence Old start state Start state Old accept state Accept state ε ε r1r 2 r1 r2 Sequence Old start state Start state Old accept state Accept state ε ε ε r1r 2 r1 r2 Choice Start state Accept state r1 r1|r2 r2 Choice Old start state Start state Old accept state Accept state r1 r1|r2 r2 Choice Old start state Start state Old accept state Accept state ε r1 r1|r2 ε r2 Choice Old start state Start state Old accept state Accept state ε ε r1 r1|r2 ε r2 ε Kleene Star Old start state Start state Old acceptaccept statestate Accept sstatetate r* r Kleene Star Old start state Start state Old acceptaccept statestate Accept sstatetate r* r Kleene Star Old start state Start state Old acceptaccept statestate Accept sstatetate ε ε r* r Kleene Star Old start state Start state Old acceptaccept statestate Accept sstatetate ε ε ε r* r Kleene Star Old start state Start state Old acceptaccept statestate Accept sstatetate ε ε ε r* r ε NFA vs. DFA •DFA • No ε transitions • At most one transition from each state for each letter a a NOT OK OK b a • NFA – neither restriction Conversions • Our regular expression to automata conversion produces an NFA • Would like to have a DFA to make recognition aaglgoo rithm simple r • Can convert from NFA to DFA (but DFA may be exponentially larger than NFA) NFA to DFA Construction • DFA has a state for each subset of states in NFA • DFA start state corresponds to set of states reachable by following ε traansitionsnsitions fr oomm NFANFA start state • DFA state is an accept state if an NFA accept state is in its set of NFA states • To compute tthehe transition for a given DDFAFA state D and lletteretter a • Set S to empty set • Find the set N of D’s NFA states • For aallll NFA sstatestates n in N – Compute set of states N’ that the NFA may be in after matching a – Set S to S union N’ • If S is nonempty, there is a transition for a from D to the DFA state that has the set S of NFA states •• Otherwise, therethere is no transition for a from D NFA to DFA Example for (a|b)*.(a|b)* a ε a ε ε 3 5 ε ε 11 13 ε ε ε . ε ε 1 2 7 8 9 10 15 16 ε ε 4 6 ε ε 12 14 ε ε b b . a a 5,7,2,3,4,8 13,15,10,11,12,16 a a . a a 1,2,3,4,8 9,10,11,12,16 b b b b 6723486,7,2,3,4,8 . 14, 15, 10,11 , 12,16 b b Lexical Structure in Languages Each language typically has several categories of words. In a typicaltypical programming llanguage:anguage: • Keywords (if, while) • Arithmetic Operations (+, -, *, /) • Integer numbers (1, 2, 45, 67) • Floating point numbers (1.0, .2, 3.337) •Identifiers (abc, i, j, ab345) • Typically have a lexical category for each keyword aand/ornd/or eeachach ccategoryategory • Each lexical category defined by regexp Lexical Categories Example • IfKeyword = if • WhileKeyword = while • Operator = +|-|*|/ •Intege r = [0- 9] [0- 9] * • Float = [0-9]*. [0-9]* • Identifier = [a-z]([a-z]|[0-9])* • Note that [0-9] = (0|1|2|3|4|5|6|7|8|9) [a-z] = (a|b|c|…|y|z) • Will use lexical categories in next level Programming Language Syntax • Regular languages suboptimal for specifying programming language syntax • Why? Constructs with nested syntax • (a+(b- c)) *(d(d -(x- (y -z))) • if (x < y) if (y < z) a = 5 else a = 6 else a = 7 • Regular languageslanguages lack state requiredrequired toto modelmodel nesting • Canonical example: nested expressions • No regular expression for language of parenthesized expressions Solution – Context-Free Grammar • Set of terminals Op = +|-|*|/ { Op, Int, Open, Close } Int = [0-9] [0-9]* Each terminal defined Open = < blby regular expressi on Close = > • Set of nonterminals { Start, Expr } • Set of productions Start → Expr •Single nonterminal on LHS Exprp→ Expr p Opp Exprp • Sequence of terminals and Expr → Int nonterminals on RHS Expr → Open Expr Close Production Game have a current string start wwithith Start nonterminal loop until no more nonterminals choose a nontteerminal in current string choose a production with nonterminal in LHS replace nonterminal with RHS of production substitute regular expressions with corresponding strings generated sstringtring iis s iin n llanguageanguage Note: ddifferentifferent choices pproduceroduce different strings Sample Derivation Op = +|-|*|/ Start Int = [0- 9] [0- 9] * E xpr Open = < Expr Op Expr Close = > Open Expr Close Op Expr Open Expr Op Expr Close Op Expr Open Int Op Expr Close Op Expr Open Int Op Expr Close OpOp IntInt 1) Start → Expr Open Int Op Int Close Op Int 2) Expr → Expr Op Expr < 2 - 1 > + 1 3) Expr → ItInt 4) Expr → Open Expr Close Parse Tree • Internal Nodes: Nonterminals • Leaves: T ermina ls • Edges: • From Nonterminal ofof LLHSHS of production • To Nodes from RHS of production • Captures derivation of string Parse Tree for <2-1>+1 Start Expr Expr

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    76 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us