Context Free Grammars

https://courses.missouristate.edu/anthonyclark/333/
Some notes adapted from Professor GianLuigi Ferrari at University of Pisa

Outline

Topics and Learning Objectives
• Formal grammar theory
• Context free grammars

Assessments
• Context free grammars

Formal Grammars
(Picture from “Crafting Interpreters”)

Source Code (Plain Text)
• Self-evident

    // GCD Program (in C)
    int main() {
        int i = getint(), j = getint();
        while (i != j) {
            if (i > j) i = i - j;
            else j = j - i;
        }
        putint(i);
    }

Lexer/Scanner (Regular Expressions and Finite State Machines)
• Group characters into smallest meaningful units (Lexemes/Tokens)
• You’ve already started this

    int main ( ) { int i = getint ( ) , j = getint ( ) ; while ( i != j ) { if ( i > j ) i = i - j ; else j = j - i ; } putint ( i ) ;
    }

Parser (Push-Down Automata)
• Put tokens in a tree-like data structure that represents the semantics of the program
• Assignments 5 through 7
• We’ll create our own simple calculator language

Evaluate/Interpret (Interpreter)
• “Walk” the tree and evaluate the program given optional input from outside of the program (User input / File Input / Sockets / Etc.)
• Assignment 8
• Tree-Walking is pretty straightforward
• We’ll build our own representation

Result

[Figure: interpreter components. Lexer (read), Parser (request token / send token), Tree Walker (request AST / send AST), I/O Console.]

Chomsky Hierarchy
• Type-0: Turing machine
• Type-1: Linear bounded automaton
• Type-2: Pushdown automaton
• Type-3: Finite state automaton

Scanning vs Parsing
• Regular expressions for regular languages recognized by lexers
• Context free grammars for context-free languages recognized by parsers

REs cannot “count”
• Cannot balance parentheses
• Cannot balance if then else expressions
• Etc.

We now care about syntax!
(Example from: Writing An Interpreter In Go)
• Lexemes/Tokens
• Abstract Syntax Tree: no parentheses, semicolons, braces, etc.
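The Lexer/Scanner stage described above (grouping characters of the GCD program into lexemes) can be sketched with a single regular expression. This is only a minimal illustration, not the course's lexer: the names TOKEN_PATTERN, tokenize, and SOURCE are made up for this sketch, and the token set covers just what the GCD program needs.

```python
import re

# Hypothetical token pattern: comments and whitespace are skipped,
# everything captured in group 1 is a lexeme.
TOKEN_PATTERN = re.compile(
    r"//[^\n]*|\s+|([A-Za-z_]\w*|\d+|!=|[-(){}=,;><])"
)

def tokenize(source):
    """Split source text into a flat list of lexemes."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_PATTERN.match(source, pos)
        if m is None:
            # Corresponds to the "invalid lexeme" kind of error.
            raise ValueError(f"invalid lexeme at position {pos}: {source[pos]!r}")
        if m.group(1) is not None:
            tokens.append(m.group(1))
        pos = m.end()
    return tokens

SOURCE = """// GCD Program (in C)
int main() {
    int i = getint(), j = getint();
    while (i != j) {
        if (i > j) i = i - j;
        else j = j - i;
    }
    putint(i);
}
"""

print(" ".join(tokenize(SOURCE)))
```

The lexer only checks that each piece is a valid lexeme. It has no idea whether the braces and parentheses are balanced; that is left to the parser.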
Grammar
• A grammar is a tool for describing a language
• A grammar is a set of rules (productions) for creating valid strings

    grammar English;
    sentence   : subject verbPhrase object;
    subject    : 'This' | 'Computers' | 'I';
    verbPhrase : adverb verb | verb;
    adverb     : 'never';
    verb       : 'is' | 'run' | 'am' | 'tell';
    object     : 'the' noun | 'a' noun | noun;
    noun       : 'university' | 'world' | 'cheese' | 'lies';

Generating Strings
• We can use the grammar to generate strings
• Start with the top-level rule: sentence
• Replace each rule name on the RHS with one of its alternatives until only terminals remain
• For example: This is a university

Syntax vs Semantics
• You can also create syntactically valid strings that do not make sense semantically
• “Computers run cheese”
• “This am a lies”
• These are valid sentences, but they do not have any real meaning
• The same can be true for our programming language rules
• We’ll worry about semantics later.
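The "Generating Strings" steps can be sketched in a few lines of Python. This is a rough illustration under assumptions: the English grammar is transcribed by hand into a dictionary called GRAMMAR, and generate is a hypothetical helper that picks alternatives at random.

```python
import random

# The English grammar from the slide: each rule maps to a list of
# alternatives, and each alternative is a sequence of symbols.
GRAMMAR = {
    "sentence": [["subject", "verbPhrase", "object"]],
    "subject": [["This"], ["Computers"], ["I"]],
    "verbPhrase": [["adverb", "verb"], ["verb"]],
    "adverb": [["never"]],
    "verb": [["is"], ["run"], ["am"], ["tell"]],
    "object": [["the", "noun"], ["a", "noun"], ["noun"]],
    "noun": [["university"], ["world"], ["cheese"], ["lies"]],
}

def generate(symbol="sentence", rng=random):
    """Expand symbol into a list of terminal words."""
    if symbol not in GRAMMAR:
        # Anything without a rule is a terminal: emit it as-is.
        return [symbol]
    production = rng.choice(GRAMMAR[symbol])
    words = []
    for part in production:
        words.extend(generate(part, rng))
    return words

for _ in range(3):
    print(" ".join(generate()))
```

Every printed sentence is syntactically valid for this grammar, but many will be semantically odd, in the same way as “Computers run cheese”.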
Programming language example:

    def f(): return "hi"
    float y = f() + 5;

Error types
• Invalid lexemes (all languages will catch this problem early)
    var x = 5 @ "6"    # '@' is not a valid operator in most languages
• Valid lexemes, invalid syntax (all languages will catch this problem early)
    x var = 5 5 *      # all valid lexemes, but in the wrong order
• Valid lexemes, valid syntax, invalid semantics (catch at compile time or runtime)
    var x = 5 * "6"    # many languages will not multiply an integer and a string
• Valid lexemes, valid syntax, valid semantics
    var x = 5 * 6

Formal Grammars
• Grammar: a set of rules for creating valid strings
• Nonterminal: a grammar symbol that can be replaced by a sequence of symbols
• Terminal: a word in the language (cannot be replaced with something else)
• Production: a single rule in the grammar (X → Y1 Y2 …)
• Derivation: a sequence of rule applications that produces a valid string
• Start Symbol: the rule used to start all derivations
• Null Symbol: the ε symbol is used to say that a nonterminal can be replaced with nothing

Example
• Write a regular expression for the following regular language: at least 1 zero followed by at least 1 one
    0 0* 1 1*
• Now write a grammar for the same language

Example
• Define a regular expression where you have n 0’s followed by n 1’s
• Define a context-free grammar where you have some number of 0’s followed by the same number of 1’s

Activity
1. Write three strings that can be generated by the following grammar.

    S → 1S | 0T | ε
    T → 1T | 0S

What does this recognize?
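Returning to the two Example slides above, the difference between "at least 1 zero followed by at least 1 one" and "n 0’s followed by n 1’s" can be seen by running the slide's regular expression next to a plain counting check. This is a small sketch: matches_regex and same_count are hypothetical helper names, and allowing n = 0 in same_count is an assumption, since the slide does not say whether the empty string counts.

```python
import re

# Regular expression from the slide: 0 0* 1 1*
ZEROS_THEN_ONES = re.compile(r"00*11*")

def matches_regex(s):
    """True if s is one or more 0's followed by one or more 1's."""
    return ZEROS_THEN_ONES.fullmatch(s) is not None

def same_count(s):
    """True if s is n 0's followed by exactly n 1's (n >= 0 assumed)."""
    n = len(s) // 2
    return s == "0" * n + "1" * n

for s in ["01", "0011", "0001", "0111", "10"]:
    print(repr(s), matches_regex(s), same_count(s))
```

The regular expression accepts strings such as 0001 where the counts differ; it has no way to require that the two counts agree, which is what "REs cannot count" refers to.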
    S → 1S | 0T | ε
    T → 1T | 0S

• Grammar: rules for creating valid strings of a language
• Nonterminal: can be replaced by a sequence of symbols
• Terminal: a word in the language
• Production: a single rule in the grammar
• Derivation: a sequence of rule applications that produces a valid string
• Start Symbol: the rule used to start all derivations
• Null Symbol: the ε symbol; a nonterminal can be replaced with nothing
• Language: set of all strings that can be derived from a grammar

Example derivation:

    S ⇒ 1S ⇒ 11S ⇒ 110T ⇒ 1101T ⇒ 11010S ⇒ 11010

Example strings:
• 1
• “”
• 0101
• 1111
• 100
• 00
• …

2. Write a grammar for palindromes where your terminals are the symbols ‘a’ and ‘b’.

3. Write a grammar for representing all strings that start with x number of a’s, followed by y number of b’s, followed by z number of a’s, where y = x + z.
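The derivation of 11010 shown for the grammar in Activity 1 can be reproduced mechanically. The sketch below is one possible way to do it, not something from the slides: it relies on every production in that grammar having the form "terminal followed by one nonterminal" (or ε), and RULES and derive are names invented for the illustration.

```python
# Productions of the Activity 1 grammar, keyed by
# (current nonterminal, next terminal) -> following nonterminal.
RULES = {
    ("S", "1"): "S",  # S → 1S
    ("S", "0"): "T",  # S → 0T
    ("T", "1"): "T",  # T → 1T
    ("T", "0"): "S",  # T → 0S
}

def derive(target):
    """Return the derivation steps for target, or None if the
    grammar cannot generate it."""
    prefix, nonterminal = "", "S"
    steps = ["S"]
    for ch in target:
        following = RULES.get((nonterminal, ch))
        if following is None:
            return None  # no production starts with this terminal
        prefix += ch
        nonterminal = following
        steps.append(prefix + nonterminal)
    if nonterminal != "S":
        return None  # only S can be replaced with ε
    steps.append(prefix)  # apply S → ε
    return steps

print(derive("11010"))
for s in ["1", "", "0101", "1111", "100", "00"]:
    print(repr(s), derive(s) is not None)
```

Trying strings for which derive returns None alongside the ones it accepts is one way to explore the "What does this recognize?" question.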
