
CS345H: Programming Languages Lecture 5: Introduction to Parsing Thomas Dillig Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 1/39 I Parser Overview I Context-free Grammars (CFGs) I Derivations I Ambiguity Outline I Limitations of Regular Languages Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 2/39 I Context-free Grammars (CFGs) I Derivations I Ambiguity Outline I Limitations of Regular Languages I Parser Overview Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 2/39 I Derivations I Ambiguity Outline I Limitations of Regular Languages I Parser Overview I Context-free Grammars (CFGs) Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 2/39 I Ambiguity Outline I Limitations of Regular Languages I Parser Overview I Context-free Grammars (CFGs) I Derivations Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 2/39 Outline I Limitations of Regular Languages I Parser Overview I Context-free Grammars (CFGs) I Derivations I Ambiguity Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 2/39 I But regular languages are not expressive enough to turn a stream of tokens into structure I For this, we need a more expressive formal language Regular Languages I Last time, we saw that regular languages are very useful for partitioning input into tokens Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 3/39 I For this, we need a more expressive formal language Regular Languages I Last time, we saw that regular languages are very useful for partitioning input into tokens I But regular languages are not expressive enough to turn a stream of tokens into structure Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 3/39 Regular Languages I Last time, we saw that regular languages are very useful for partitioning input into tokens I But regular languages are not expressive enough to turn a stream of tokens into structure I For this, we need a more expressive formal language Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 3/39 I Classic Example: Strings of balanced parenthesis: f(i )j j i ≥ 0g I Question: Why is there no automata that can recognize this language? Beyond Regular Languages I Many languages are not regular Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 4/39 I Question: Why is there no automata that can recognize this language? Beyond Regular Languages I Many languages are not regular I Classic Example: Strings of balanced parenthesis: f(i )j j i ≥ 0g Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 4/39 Beyond Regular Languages I Many languages are not regular I Classic Example: Strings of balanced parenthesis: f(i )j j i ≥ 0g I Question: Why is there no automata that can recognize this language? Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 4/39 I Intuition: A finite automaton that runs long enough must repeat states I Finite automaton cannot remember the number of times it has visited a particular state What Can Regular Languages Express? I Languages requiring counting modulo a fixed integer Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 5/39 I Finite automaton cannot remember the number of times it has visited a particular state What Can Regular Languages Express? I Languages requiring counting modulo a fixed integer I Intuition: A finite automaton that runs long enough must repeat states Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 5/39 What Can Regular Languages Express? I Languages requiring counting modulo a fixed integer I Intuition: A finite automaton that runs long enough must repeat states I Finite automaton cannot remember the number of times it has visited a particular state Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 5/39 I Also Recall: Comments are removed during lexing I Question: Are comments in L a regular language? Side Note: Comments in L I Recall: Comments in L start with (*, end with *) and can be nested Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 6/39 I Question: Are comments in L a regular language? Side Note: Comments in L I Recall: Comments in L start with (*, end with *) and can be nested I Also Recall: Comments are removed during lexing Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 6/39 Side Note: Comments in L I Recall: Comments in L start with (*, end with *) and can be nested I Also Recall: Comments are removed during lexing I Question: Are comments in L a regular language? Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 6/39 I Output: parse tree of the program The Functionality of the Parser I Input: sequence of tokens from the lexer Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 7/39 The Functionality of the Parser I Input: sequence of tokens from the lexer I Output: parse tree of the program Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 7/39 I Parse Input: TOKEN_IF TOKEN_ID("x") TOKEN_NEQ TOKEN_ID("y") TOKEN_THEN TOKEN_INT(1) TOKEN_ELSE TOKEN_INT(2) I Parser Output: Branch NEQ INT:1 INT:2 ID:x ID:y Example I Consider the following L expression: if x<>y then 1 else 2 Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 8/39 I Parser Output: Branch NEQ INT:1 INT:2 ID:x ID:y Example I Consider the following L expression: if x<>y then 1 else 2 I Parse Input: TOKEN_IF TOKEN_ID("x") TOKEN_NEQ TOKEN_ID("y") TOKEN_THEN TOKEN_INT(1) TOKEN_ELSE TOKEN_INT(2) Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 8/39 Branch NEQ INT:1 INT:2 ID:x ID:y Example I Consider the following L expression: if x<>y then 1 else 2 I Parse Input: TOKEN_IF TOKEN_ID("x") TOKEN_NEQ TOKEN_ID("y") TOKEN_THEN TOKEN_INT(1) TOKEN_ELSE TOKEN_INT(2) I Parser Output: Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 8/39 String of characters String of tokens String of tokens Parse tree Parsing vs. Lexing Phase Input Output Lexer Parser Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 9/39 String of tokens String of tokens Parse tree Parsing vs. Lexing Phase Input Output Lexer String of characters Parser Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 9/39 String of tokens Parse tree Parsing vs. Lexing Phase Input Output Lexer String of characters String of tokens Parser Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 9/39 Parse tree Parsing vs. Lexing Phase Input Output Lexer String of characters String of tokens Parser String of tokens Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 9/39 Parsing vs. Lexing Phase Input Output Lexer String of characters String of tokens Parser String of tokens Parse tree Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 9/39 I A language for describing valid strings of tokens I A method for recognizing if a string of tokens is in this language or not I Parser must distinguish between valid and invalid strings of tokens I We need: The Role of the Parser I Not all strings of tokens are programs . Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 10/39 I A language for describing valid strings of tokens I A method for recognizing if a string of tokens is in this language or not I We need: The Role of the Parser I Not all strings of tokens are programs . I Parser must distinguish between valid and invalid strings of tokens Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 10/39 I A language for describing valid strings of tokens I A method for recognizing if a string of tokens is in this language or not The Role of the Parser I Not all strings of tokens are programs . I Parser must distinguish between valid and invalid strings of tokens I We need: Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 10/39 I A method for recognizing if a string of tokens is in this language or not The Role of the Parser I Not all strings of tokens are programs . I Parser must distinguish between valid and invalid strings of tokens I We need: I A language for describing valid strings of tokens Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 10/39 The Role of the Parser I Not all strings of tokens are programs . I Parser must distinguish between valid and invalid strings of tokens I We need: I A language for describing valid strings of tokens I A method for recognizing if a string of tokens is in this language or not Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 10/39 I Example: An L expression is expression + expression, if expression then expression else expression, ... I Context free grammars are a natural notation for this recursive structure Context-free Grammars (CFGs) I Programming language constructs have recursive structure Thomas Dillig, CS345H: Programming Languages Lecture 5: Introduction to Parsing 11/39 I Context free grammars are a natural notation for this recursive structure Context-free Grammars (CFGs) I Programming language constructs have recursive structure I Example: An L expression is expression
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages142 Page
-
File Size-