To an Abstract Syntax Tree

School of Computing Science @ Simon Fraser University
Lecture 09: Data Abstraction ++

Parsing
• Parsing is the process of translating a sequence of characters (a string) into an abstract syntax tree (AST).
• A parser transforms a source program in concrete syntax into an abstract syntax tree:
    program text −−→ Parser −−→ AST −−→ Processor
• Compilers (and some interpreters) analyse abstract syntax trees.
• Compilers are very well understood.
• In fact, there exist "compiler compilers".
• Example: YACC (yet another compiler compiler) http://dinosaur.compilertools.net/
Copyright © Bill Havens

Expression Syntax Extended
• An abstract syntax expresses the structure of the concrete syntax without the details.
• Example: an extended syntax for Scheme expressions, which now includes literals (constant numbers):
    <exp> ::= <number>
            | <symbol>
            | (lambda (<symbol>) <exp>)
            | (<exp> <exp>)
• We would define an abstract datatype for this grammar as follows:
    (define-datatype expression expression?
      (lit-exp (datum number?))
      (var-exp (id symbol?))
      (lambda-exp (id symbol?) (body expression?))
      (app-exp (rator expression?) (rand expression?)))
• Now let's parse sentences in the language specified by the BNF grammar into abstract syntax trees.

Parser for Extended Expressions
• Here is an example parser for the lambda expression language above:
    (define parse-expression
      (lambda (datum)
        (cond
          ((number? datum) (lit-exp datum))
          ((symbol? datum) (var-exp datum))
          ((pair? datum)
           (if (eqv? (car datum) 'lambda)
               ;; (lambda (<symbol>) <exp>)
               (lambda-exp (caadr datum)
                           (parse-expression (caddr datum)))
               ;; (<exp> <exp>)
               (app-exp (parse-expression (car datum))
                        (parse-expression (cadr datum)))))
          (else (eopl:error 'parse-expression
                            "Invalid concrete syntax ~s" datum)))))
• Fortunately, Scheme's read procedure does most of the hard work:
  - It identifies tokens in the input stream (e.g. "(", ")", identifiers, strings).
  - It converts parenthesized structures into dotted pairs and proper lists.

➡ Example
• All we have to do is convert the lists that read produces into abstract syntax trees:
    (parse-expression '(foo x))
    (parse-expression 3)
    (parse-expression '(lambda (x) (lambda (y) (cons x y))))
• Let's see how these work in DrScheme.
• Hard to see!
• Let's implement an unparse procedure that takes the abstract syntax tree apart and turns it back into concrete Scheme syntax:
    (define unparse-expression
      (lambda (exp)
        (cases expression exp
          (lit-exp (datum) datum)
          (var-exp (id) id)
          (lambda-exp (id body)
            (list 'lambda (list id) (unparse-expression body)))
          (app-exp (rator rand)
            (list (unparse-expression rator)
                  (unparse-expression rand))))))
• Note: the grammar allows only one operand per application, so parse-expression keeps just the first operand of (cons x y); unparsing the third example yields (lambda (x) (lambda (y) (cons x))), with y silently dropped.

Environments
➡ Introduction
• Consider evaluating the following expression: (+ x 3)
• How does the interpreter know the value of the variable x?
• Variable bindings are stored in a dictionary (a.k.a. a symbol table) called an environment.
• Definition: an environment is a function that maps variable names to their current bindings.
  - Interpreter: maps variables to their current values
  - Compiler: maps variables to their lexical addresses
• We denote an environment by a finite set of bindings, each of the form s ➞ v.
• Example: env = { x ➞ 3, y ➞ (a b c), z ➞ hello }
• An environment function can be applied to an argument symbol, returning that symbol's binding in the environment.
• Example: env(y) = (a b c)
• But env(w) = error: undefined variable

Nested Environments
• Block-structured languages (e.g. C, Java, Scheme) allow nested blocks.
• Example in C:
    #include <stdio.h>

    int foo (int z) {
        int x; float y;
        if (x < z) {
            float y = 3.0;
            /* ... */
        } else {
            printf("%d\n", x);
            /* ... */
        }
        for (int x = 0; x < z; x++) {
            printf("%f\n", x + y);
            /* ... */
        }
    }
• How many different environments are there in this example?
• What are the variable bindings in each environment?
• Are there any holes in any environment?
• Since blocks can be nested, we need to nest environments as well.
• Each scoping block needs its own environment.
• The environment of a nested block must refer, recursively, to the environments of the enclosing blocks.

Recursive Environments
➡ Introduction
• We can think of an environment as extending its enclosing environment.
• If you look something up in an environment and don't find it, look in the enclosing environment.
• The recursion has to stop, so we need the concept of an empty environment.
• We can then think of an environment as an inductively defined type, with this BNF for <environment>:
    <environment> ::= ( ) | ( {<variable>, <value>}* <environment> )
• An environment is either empty or a set of (variable, value) pairs that extends an existing enclosing environment.

Abstract Environments
➡ Basic Idea
• An abstract environment requires:
  1. a function for creating an empty environment
  2. an operator for extending an environment with new bindings
  3. an operator for accessing the binding of a variable in an environment
• Operator interfaces:
    ;; returns an empty environment
    (define empty-env (lambda () ...))
    ;; returns an environment that extends env
    (define extend-env (lambda (vars vals env) ...))
    ;; returns the value of the variable "var" in "env"
    (define apply-env (lambda (env var) ...))

➡ Usage
    (define first-env (empty-env))
    (define second-env (extend-env '(a b) '(1 2) first-env))
    (define third-env (extend-env '(c d b) '(3 4 5) second-env))
    (apply-env first-env 'a)   ; signals an error
    (apply-env second-env 'a)  ; returns 1
    (apply-env third-env 'a)   ; returns 1 (same "a")
    (apply-env second-env 'b)  ; returns 2
    (apply-env third-env 'b)   ; returns 5 (different "b")

Implementing Environments
➡ Procedural versus Datatype implementations
• This is the basis of the object-oriented programming language (OOPL) concept.
• Data is only accessible via a procedural interface (e.g. methods in Java).
• The actual implementation of the data is hidden by the interface.
• The implementation can be changed without breaking the code that uses it.

➡ Procedural Implementation
• An environment is a function: f(variable) = value.
• The empty environment is a function that always signals an error when called!
• The procedure (empty-env) returns an environment with no bindings:
    (define empty-env
      (lambda ()
        (lambda (sym)
          (eopl:error 'apply-env "No binding for ~s" sym))))

➡ Extending an Environment
• An extended environment is also a function, which returns the value of a specified variable.
• If the variable you're looking for is defined in that environment, the function returns the corresponding value; otherwise, it calls the environment function that it extends.
• Here is an implementation:
    (define extend-env
      (lambda (syms vals env)
        (lambda (sym)
          (let ((pos (list-find-position sym syms)))
            (if (number? pos)
                (list-ref vals pos)          ; found in this environment
                (apply-env env sym))))))     ; search the enclosing one
• Note that the bindings are represented as corresponding lists of variables and values.
• extend-env returns a function that, when called, searches these lists.
• If the desired variable is not found in this environment, that function calls apply-env on the enclosing environment, so the search continues recursively outward.

➡ Helper functions
• (list-find-position sym syms) searches the list of variable names (syms) for the desired variable (sym).
• It returns a zero-based index on success and #f on failure (Scheme convention):
    (define list-find-position
      (lambda (sym los)
        (list-index (lambda (sym1) (eqv? sym1 sym)) los)))
• (list-index pred ls) is a higher-order function that applies the predicate pred to successive elements of the list ls and returns the index of the first element that satisfies it, or #f:
    (define list-index
      (lambda (pred ls)
        (cond
          ((null? ls) #f)
          ((pred (car ls)) 0)
          (else (let ((list-index-r (list-index pred (cdr ls))))
                  (if (number? list-index-r)
                      (+ list-index-r 1)
                      #f))))))
• Question: what is peculiar about this implementation?

➡ Recoding function list-index
• The implementation is very inefficient when the element is not found: every pending recursive call must test the result with number? before returning, and on failure #f is merely passed back up through all of them.
• We need an exception thrown instead.
• Why not use the call/cc mechanism?
• Reminder: (call/cc <lambda>) calls <lambda>, a function of one argument, passing it the continuation of the call/cc expression.
• Applying the continuation function immediately exits from the call/cc.
• Recoding using call/cc:
    (define list-index
      (lambda (pred ls)
        (call/cc
          (lambda (exit)
            (list-index1 pred ls exit)))))
    (define list-index1
      (lambda (pred ls exit)
        (cond
          ((null? ls) (exit #f))      ; not found: escape with #f at once
          ((pred (car ls)) 0)
          (else (+ 1 (list-index1 pred (cdr ls) exit))))))
• Much simpler! But can you improve it further? Use tail recursion!

➡ Applying a procedural environment
• Just call the function on the variable to be accessed:
    (define apply-env
      (lambda (env sym)
        (env sym)))
• Example:
    (apply-env
      (extend-env '(x z) '(1 3)
        (extend-env '(y z) '(2 2)
          (extend-env '(x y) '(4 7)
            (empty-env))))
      'y)   ; returns 2

Datatype Implementation
➡ Overview
•
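The define-datatype and cases forms used by parse-expression and unparse-expression come from the EOPL support library, so those procedures do not run in a bare Scheme. A minimal sketch of the same parse/unparse round trip, assuming ordinary tagged lists such as (var-exp x) as a stand-in for the datatype (this representation is an illustrative choice, not the lecture's). Unlike the lecture's parser, it also signals an error for an application with anything other than exactly one operand, such as (cons x y), instead of keeping only the first operand:

```scheme
;; Illustrative tagged-list constructors standing in for the EOPL
;; define-datatype variants lit-exp, var-exp, lambda-exp and app-exp.
(define (lit-exp datum) (list 'lit-exp datum))
(define (var-exp id) (list 'var-exp id))
(define (lambda-exp id body) (list 'lambda-exp id body))
(define (app-exp rator rand) (list 'app-exp rator rand))

;; Parse one datum (as produced by read) into a tagged-list AST.
(define (parse-expression datum)
  (cond ((number? datum) (lit-exp datum))
        ((symbol? datum) (var-exp datum))
        ((and (pair? datum) (eqv? (car datum) 'lambda))
         ;; (lambda (<symbol>) <exp>): one parameter, one body
         (if (and (list? datum)
                  (= (length datum) 3)
                  (pair? (cadr datum))
                  (null? (cdadr datum))
                  (symbol? (caadr datum)))
             (lambda-exp (caadr datum) (parse-expression (caddr datum)))
             (error "Invalid lambda syntax" datum)))
        ;; (<exp> <exp>): exactly one operator and one operand
        ((and (list? datum) (= (length datum) 2))
         (app-exp (parse-expression (car datum))
                  (parse-expression (cadr datum))))
        (else (error "Invalid concrete syntax" datum))))

;; Turn a tagged-list AST back into concrete Scheme syntax.
(define (unparse-expression ast)
  (case (car ast)
    ((lit-exp) (cadr ast))
    ((var-exp) (cadr ast))
    ((lambda-exp)
     (list 'lambda (list (cadr ast)) (unparse-expression (caddr ast))))
    ((app-exp)
     (list (unparse-expression (cadr ast))
           (unparse-expression (caddr ast))))
    (else (error "Not an expression AST" ast))))
```

For example, (parse-expression 3) gives (lit-exp 3), and unparsing the result of (parse-expression '(lambda (x) (lambda (y) (f x)))) gives back (lambda (x) (lambda (y) (f x))). Because the AST is just a list, the printed result is readable without an unparser, at the cost of the type checking that define-datatype provides.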
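The procedural environment pieces are spread over several slides and rely on eopl:error. A minimal sketch that collects them into one file, assuming a plain Scheme with the standard error procedure in place of eopl:error, and with list-index rewritten in the tail-recursive style the list-index slide asks for (one possible answer, not necessarily the intended one):

```scheme
;; Tail-recursive list-index: the running index i is carried forward as
;; an accumulator, so no work is pending after the recursive call.
(define (list-index pred ls)
  (let loop ((rest ls) (i 0))
    (cond ((null? rest) #f)
          ((pred (car rest)) i)
          (else (loop (cdr rest) (+ i 1))))))

(define (list-find-position sym syms)
  (list-index (lambda (sym1) (eqv? sym1 sym)) syms))

;; Procedural environments with the lecture's interface.
;; error replaces eopl:error (assumption: no EOPL library loaded).
(define (empty-env)
  (lambda (sym)
    (error "No binding for" sym)))

(define (extend-env syms vals env)
  (lambda (sym)
    (let ((pos (list-find-position sym syms)))
      (if (number? pos)
          (list-ref vals pos)            ; bound here
          (apply-env env sym)))))        ; fall back to the enclosing env

(define (apply-env env sym)
  (env sym))

;; The environments from the Usage slide.
(define first-env (empty-env))
(define second-env (extend-env '(a b) '(1 2) first-env))
(define third-env (extend-env '(c d b) '(3 4 5) second-env))
```

Loaded together, (apply-env third-env 'b) gives 5, while (apply-env third-env 'a) falls through to second-env and gives 1, matching the Usage slide. Carrying the index forward is what makes the recursion a tail call: nothing remains to be done after (loop ...) returns, so a failed search returns #f directly with no pending frames, and call/cc is no longer needed.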
