
Context Free Grammars
https://courses.missouristate.edu/anthonyclark/333/

Some notes adapted from Professor GianLuigi Ferrari at the University of Pisa

Outline

Topics and Learning Objectives
• Formal language theory
• Context free grammars

Assessments
• Context free grammars

[Figure: the stages of an interpreter. Picture from “Crafting Interpreters”.]

Source Code (Plain Text)

    // GCD Program (in )
    int main() {
      int i = getint(), j = getint();
      while (i != j) {
        if (i > j) i = i - j;
        else j = j - i;
      }
      putint(i);
    }

Lexer/Scanner (Regular Expressions and Finite State Machines)
• Group characters into lexemes/tokens, the smallest meaningful units of the program
• You’ve already started this

    int main ( ) {
    int i = getint ( ) , j = getint ( ) ;
    while ( i != j ) {
    if ( i > j ) i = i - j ;
    else j = j - i ;
    }
    putint ( i ) ;
    }

Parser (Formal Grammars and Push-Down Automata)
• Put tokens in a tree-like data structure that represents the program
• Assignments 5 through 7
• We’ll create our own simple calculator

Evaluate/Interpret (Interpreter)
• “Walk” the tree and evaluate the program given optional input from outside of the program
• Assignment 8

Result
• Self-evident
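The lexer stage can be sketched in a few lines. This is a minimal illustration, not the course's assignment code: the token categories and the names `TOKEN_SPEC` and `tokenize` are assumptions made for the example, and only the operators that appear in the GCD program (plus a few obvious neighbors) are covered.

```python
import re

# Hypothetical token categories for the GCD example; the names are illustrative.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"!=|[-+*/=<>(){},;]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return the lexemes of source, dropping whitespace and comments."""
    lexemes = []
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        if kind in ("SKIP", "COMMENT"):
            continue
        if kind == "ERROR":
            raise ValueError(f"invalid lexeme {match.group()!r}")
        lexemes.append(match.group())
    return lexemes


print(tokenize("while (i != j) { if (i > j) i = i - j; }"))
```

Each category is a regular expression, which is why this stage needs nothing more powerful than a finite state machine.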

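[Figure: interpreter architecture. Components: User input / File Input / Sockets / Etc., Lexer, Parser, Tree Walker, Console. Labels: the Lexer reads the input; the Parser requests tokens and the Lexer sends them; the Tree Walker requests the AST and the Parser sends it; the Tree Walker does I/O with the Console.]

• We’ll build our own representation
• Tree-walking is pretty straightforward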
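To show why tree-walking is straightforward, here is a minimal sketch of an evaluator for arithmetic expressions. The tuple-based tree representation and the name `evaluate` are assumptions for illustration; the course's own representation may differ.

```python
# AST nodes are plain tuples here: ("num", value) or (operator, left, right).
# This representation is an assumption made for the example.
def evaluate(node):
    """Walk the tree bottom-up and return the value of the expression."""
    kind = node[0]
    if kind == "num":
        return node[1]
    left = evaluate(node[1])
    right = evaluate(node[2])
    if kind == "+":
        return left + right
    if kind == "-":
        return left - right
    if kind == "*":
        return left * right
    if kind == "/":
        return left / right
    raise ValueError(f"unknown node type {kind!r}")


# Tree for 1 + 2 * 3; the shape of the tree already encodes precedence.
tree = ("+", ("num", 1), ("*", ("num", 2), ("num", 3)))
print(evaluate(tree))  # 7
```

The walker never looks at parentheses or operator precedence: the parser has already resolved those when it built the tree.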

• Type-0: Turing machine

• Type-1: Linear-bounded automaton

• Type-2: Pushdown automaton

• Type-3: Finite state automaton

Scanning vs Parsing

• Regular expressions for regular languages, recognized by lexers
• Context free grammars for context-free languages, recognized by parsers
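The split shows up as soon as nesting is involved. A minimal sketch, with illustrative names (`IDENT`, `balanced`) that are not from the slides: a regular expression is enough to recognize a token such as an identifier, while checking nested parentheses needs some memory of depth, which a finite state machine does not have.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")  # a regular language: fine for a lexer


def balanced(text):
    """Check parenthesis nesting with a depth counter (a one-symbol stack)."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" with no matching "("
                return False
    return depth == 0


print(bool(IDENT.fullmatch("getint")))  # True
print(balanced("((i - j) * (i + j))"))  # True
print(balanced("(i - j))"))             # False
```

The counter can grow without bound, which is exactly what a fixed, finite set of states cannot do.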

REs cannot “count”
• Cannot balance parentheses
• Cannot balance if-then-else expressions
• Etc.

Abstract Syntax Tree

Example from: Writing An Interpreter In Go

Lexemes/Tokens

We now care about syntax! No parentheses, semicolons, braces, etc.

Grammar

• A grammar is a tool for describing a language
• A grammar is a set of rules (productions) for creating valid strings

grammar English;

    sentence   : subject verbPhrase object;
    subject    : 'This' | 'Computers' | 'I';
    verbPhrase : adverb verb | verb;
    adverb     : 'never';
    verb       : 'is' | 'run' | 'am' | 'tell';
    object     : 'the' noun | 'a' noun | noun;
    noun       : 'university' | 'world' | 'cheese' | 'lies';

Generating Strings

We can use the grammar to generate strings
• Start with the top-level rule: sentence
• Replace each rule name with one alternative from its right-hand side (other rules or terminals) until only terminals remain
• For example: This is a university

Syntax vs Semantics
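Generating from the English grammar can be done mechanically. A minimal sketch, assuming the rules are stored in a Python dictionary; the names `GRAMMAR` and `generate` are illustrative and not from the slides. Because the alternatives are picked at random, many results are syntactically valid but make little sense.

```python
import random

# The English grammar as a dictionary:
# nonterminal -> list of alternatives, each alternative a list of symbols.
GRAMMAR = {
    "sentence": [["subject", "verbPhrase", "object"]],
    "subject": [["This"], ["Computers"], ["I"]],
    "verbPhrase": [["adverb", "verb"], ["verb"]],
    "adverb": [["never"]],
    "verb": [["is"], ["run"], ["am"], ["tell"]],
    "object": [["the", "noun"], ["a", "noun"], ["noun"]],
    "noun": [["university"], ["world"], ["cheese"], ["lies"]],
}


def generate(symbol="sentence", rng=random):
    """Expand symbol, picking one alternative per nonterminal, until only terminals remain."""
    if symbol not in GRAMMAR:  # terminal: emit the word itself
        return [symbol]
    words = []
    for part in rng.choice(GRAMMAR[symbol]):
        words.extend(generate(part, rng))
    return words


for _ in range(3):
    print(" ".join(generate()))
```

Every sentence printed is in the language of the grammar, whether or not it means anything.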

• You can also create syntactically valid strings that do not make sense semantically
  • “Computers run cheese”
  • “This am a lies”
• These are valid sentences, but they do not have any real meaning
• The same can be true for our rules
• We’ll worry about semantics later.

    def f(): return "hi"
    float y = f() + 5;

Error types
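A language implementation rejects bad programs at different stages. The sketch below uses Python's built-in `compile` and `exec` to show where each kind of problem surfaces; the helper name `classify` and the sample inputs are assumptions for illustration. Two Python-specific caveats: Python reports both invalid lexemes and invalid syntax as `SyntaxError`, and it happens to accept `5 * "6"` (string repetition), so the semantic example uses `+` instead.

```python
def classify(source):
    """Report the stage at which Python rejects source, if any."""
    try:
        code = compile(source, "<example>", "exec")  # lexing and parsing
    except SyntaxError:
        return "rejected before running (lexical or syntax error)"
    try:
        exec(code, {})  # evaluation
    except TypeError:
        return "rejected while running (semantic error)"
    return "ok"


print(classify("x = 5 $ 6"))     # '$' is not a valid Python lexeme
print(classify("x var = 5 5 *")) # valid lexemes in the wrong order
print(classify("x = 5 + '6'"))   # parses, but int + str has no meaning
print(classify("x = 5 * 6"))     # fine at every stage
```

A statically typed language would move the third case from runtime to compile time, but the ordering of the stages stays the same.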

Invalid lexemes (all languages will catch this problem early)
    var x = 5 @ "6"   # '@' is not a valid operator in most languages

Valid lexemes, invalid syntax (all languages will catch this problem early)
    x var= 5 5 *      # all valid lexemes, but in the wrong order

Valid lexemes, valid syntax, invalid semantics (catch at compile time or runtime)
    var x = 5 * "6"   # many languages will not multiply an integer and a string

Valid lexemes, valid syntax, valid semantics
    var x = 5 * 6

Formal Grammars

Grammar: a set of rules for creating valid strings
Nonterminal: a grammar symbol that can be replaced by a sequence of symbols
Terminal: a word in the language (cannot be replaced with something else)
Production: a single rule in the grammar (X → Y1 Y2 …)
Derivation: a sequence of rule applications that produces a valid string
Start Symbol: the rule used to start all derivations
Null Symbol: the ε symbol is used to say that a nonterminal can be replaced with nothing

Example

Write a regular expression for the following language.
• At least 1 zero followed by at least 1 one

    0 0* 1 1*

Now write a grammar for the same language

Example
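For the language of at least one 0 followed by at least one 1, one possible grammar is S → A B, A → 0 A | 0, B → 1 B | 1. This grammar is an assumption made here, not an answer taken from the slides. The sketch below checks a few strings against both the regular expression and a recognizer that follows the grammar; `PATTERN`, `parse_run`, and `in_language` are illustrative names.

```python
import re

PATTERN = re.compile(r"00*11*")  # the regular expression 0 0* 1 1*


def parse_run(text, pos, symbol):
    """Match A -> 0 A | 0 (or B -> 1 B | 1) starting at pos.

    Returns the position after the run, or None if no symbol matches.
    """
    if pos < len(text) and text[pos] == symbol:
        rest = parse_run(text, pos + 1, symbol)  # try the recursive alternative first
        return rest if rest is not None else pos + 1
    return None


def in_language(text):
    """Match S -> A B and require that the whole input is consumed."""
    after_a = parse_run(text, 0, "0")
    if after_a is None:
        return False
    after_b = parse_run(text, after_a, "1")
    return after_b == len(text)


for s in ["01", "0011", "0001", "10", "0", ""]:
    print(repr(s), bool(PATTERN.fullmatch(s)), in_language(s))
```

Both columns agree on every input, which is expected: any regular language can also be described by a grammar.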

• Define a regular expression where you have n 0’s followed by n 1’s

• Define a context-free grammar where you have some number of 0’s followed by the same number of 1’s

Activity

1. Write three strings that can be generated by the following grammar.

    S → 1S | 0T | ε
    T → 1T | 0S

   What does this recognize?
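One way to explore the activity grammar is to enumerate its strings mechanically. A minimal sketch, assuming the productions are stored in a dictionary with the empty string standing for ε; `RULES` and `generate` are illustrative names.

```python
from collections import deque

# Productions of the activity grammar; "" stands for ε.
RULES = {"S": ["1S", "0T", ""], "T": ["1T", "0S"]}


def generate(max_len):
    """Return every terminal string of length <= max_len derivable from S."""
    results = set()
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        # In this grammar a sentential form has at most one nonterminal, always at the end.
        if form and form[-1] in RULES:
            prefix, nonterminal = form[:-1], form[-1]
            for rhs in RULES[nonterminal]:
                new_form = prefix + rhs
                if len(new_form) <= max_len + 1:  # +1 leaves room for the trailing nonterminal
                    queue.append(new_form)
        else:
            results.add(form)
    return sorted(results, key=lambda s: (len(s), s))


print(generate(3))
# ['', '1', '00', '11', '001', '010', '100', '111']
```

Every string in the output contains an even number of 0’s: S is the "even so far" state and T is the "odd so far" state, and only S can end the derivation.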

    S → 1S | 0T | ε
    T → 1T | 0S

Grammar: rules for creating valid strings of a language
Nonterminal: can be replaced by a sequence of symbols
Terminal: a word in the language
Production: a single rule in the grammar
Derivation: a sequence of rule applications that produces a valid string
Start Symbol: the rule used to start all derivations
Null Symbol: the ε symbol, a nonterminal can be replaced with nothing
Language: set of all strings that can be derived from a grammar

Example derivation

    S
    1S
    11S
    110T
    1101T
    11010S
    11010
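The derivation of 11010 can be replayed step by step. A minimal sketch, assuming the productions S → 1S | 0T | ε and T → 1T | 0S; the table `STEP` and the function `derive` are illustrative names. In this grammar the next input symbol determines which production to apply, so no search is needed.

```python
# Productions of the grammar, chosen by (nonterminal, next input symbol).
STEP = {("S", "1"): "1S", ("S", "0"): "0T", ("T", "1"): "1T", ("T", "0"): "0S"}


def derive(target):
    """Return the sentential forms of a derivation of target, or None if there is none."""
    forms = ["S"]
    prefix, nonterminal = "", "S"
    for symbol in target:
        if (nonterminal, symbol) not in STEP:
            return None  # symbol outside the alphabet {0, 1}
        rhs = STEP[(nonterminal, symbol)]
        prefix, nonterminal = prefix + rhs[0], rhs[1]
        forms.append(prefix + nonterminal)
    if nonterminal != "S":  # only S has the ε production
        return None
    forms.append(prefix)  # apply S → ε
    return forms


print(" => ".join(derive("11010")))
# S => 1S => 11S => 110T => 1101T => 11010S => 11010
```

A string such as 10 has no derivation: after reading it the current nonterminal is T, which cannot be replaced with nothing.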

Strings generated by the grammar
• 1
• “”
• 0101
• 1111
• 100
• 00
• …

2. Write a grammar for palindromes where your terminals are the symbols ‘a’ and ‘b’.

3. Write a grammar for representing all strings that start with x number of a’s, followed by y number of b’s, followed by z number of a’s, where y = x + z.
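For activity 2, one possible grammar is S → a S a | b S b | a | b | ε. This is an assumption offered as a starting point, not the official answer. A minimal sketch of a recognizer whose branches follow those productions; `is_palindrome` is an illustrative name.

```python
def is_palindrome(text):
    """Recognize S -> a S a | b S b | a | b | ε over the alphabet {a, b}."""
    if any(ch not in "ab" for ch in text):
        return False
    if len(text) <= 1:  # S -> a | b | ε
        return True
    if text[0] != text[-1]:  # neither a S a nor b S b can apply
        return False
    return is_palindrome(text[1:-1])  # strip the matching ends, recurse on the inner S


for s in ["", "a", "abba", "ababa", "ab", "abc"]:
    print(repr(s), is_palindrome(s))
```

The recursion matches the outermost pair of symbols first and works inward, the same nested structure that a regular expression cannot express.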