Introduction 9 : Semantic Analysis I
CSc 453 Compilers and Systems Software
Department of Computer Science, University of Arizona
[email protected]
Copyright © 2009 Christian Collberg

Compiler Phases

[Figure: the compiler pipeline. source → Lexer → tokens → Parser → AST → Semantic Analyser → AST → Interm. Code Gen → IR → Optimize → IR → Machine Code Gen → asm. The Lexer, Parser and Semantic Analyser each report errors. "We are here!" marks the Semantic Analyser.]

Semantic Analysis

The parser returns an abstract syntax tree (AST), a structured representation of the input program.
All information present in the input program (except maybe for comments) is also present in the AST.
Literals (integer/real/... constants) and identifiers are available as AST input attributes.
During semantic analysis we add new attributes to the AST, and traverse the tree to evaluate these attributes and emit error messages.
At compiler construction time we have to decide which attributes are needed, how they should be evaluated, and the order in which they should be evaluated.

Why Semantic Analysis?

1. Is the program statically correct? If not, report errors to the user:
     "undeclared variable"
     "illegal procedure parameter"
     "type incompatibility"
2. Make preparations for later compiler phases (code generation and optimization):
     Compute types of variables.
     Compute addresses of variables.
     Store transfer modes of procedure parameters.
     Compute labels for control structures (maybe).

Typical Semantic Errors

    program X;
    procedure P ( x,y : integer);
    var z,x : char;                    "multiple declaration"
    begin
       y := "x"                        "type mismatch"
    end;
    var k : P;                         "type name expected"
    var z : R;                         "identifier not declared"
    type R = array [9..7] of char;     "empty range"
    var x,y,t : integer;
    begin
       ....
    end Y.                             "wrong closing identifier"

    program X;
    ....
    begin
       P(1);                           "too few parameters"
       P(1,2,3);                       "too many parameters"
       P("x",2);                       "integer type expected"
       R[5] := "x";                    "variable expected"
       z["x"] := 5;                    "type mismatch"
       case x of
          y   : t := 5 |               "constant expected"
          3+2 : t := 9 |               "repeated case labels"
          1+4 : t := 8
       end
       if x then t := 4;               "boolean expression expected"
    end Y.

Static Semantic Rules

Static Semantics: roughly, the type checking rules. These are the rules that are checked by the compiler before execution.
Dynamic Semantics: Rules that can only be checked when the program is run. Example: "pointer reference to NIL".
Context Conditions: Static semantic rules.
Obviously, different languages have different static semantic rules. Ada, for example, allows null ranges (e.g. array [9..7] of char), while Modula-2 doesn't.
It's our job as compiler writers to read the language definition and encode the rules in our semantic analyzer.

Kinds of Context Conditions

Type Checks: We must check that every operator used in the program takes arguments of the correct type.
Kind Checks: We must check that the right kind of identifier (procedure, variable, type, label, constant, exception) is used in the right place.
Flow-of-control Checks: In Modula-2 an EXIT-statement must only occur within a LOOP-statement:

    LOOP
       IF ··· THEN EXIT ENDIF;
    END

Uniqueness Checks: Sometimes a name must be defined exactly once. Example: variable declarations, case labels.
Name Checks: Sometimes a name must occur more than once, e.g. at the beginning and end of a procedure.

Static Semantic Rules are Confusing!

Check out any C++ manual...
Ada's semantic rules are so unwieldy that compiler error messages often contain references to the relevant sub-sub-section of the Ada Reference Manual (ARM):

    "Type error. See ARM section 13.2.4."

We must organize the semantic analysis phase in a systematic way.

Tree-Walk Evaluators

The syntax analyzer produces an Abstract Syntax Tree (AST), a structured representation of the input program.
Each node in the tree has a number of variables called attributes.

Attributes

Some attributes are given values by the parser. They are called input attributes.
The attributes can store whatever we like, e.g. the types of expressions.
We write a program that traverses the tree (one or more times) and assigns values to the attributes.

Context Conditions

The context conditions are encoded as tests on the values of attributes (node.type is the type attribute of node, node.pos the line number in the source code):

    if node.type ≠ "integer" then
       print "Integer expected at " node.pos

Tree-Walk Evaluation

[Figure: the source program `IF a<10 THEN c := 1; ENDIF;` is turned by parsing into "the AST after parsing" and by semantic analysis into "the AST after semantic analysis"; an arrow shows the traversal order. The AST is an IF node with children EXPR, THEN and ELSE. EXPR is a BinaryOp:< node with children LOP and ROP; THEN is an ASSIGN node with children Des and Expr. The leaves are IDENT (Id: "a"), INTEGER (Val: 10), IDENT (Id: "c") and INTEGER (Val: 1); Id and Val are input attributes. After semantic analysis the nodes also carry Type attributes (a: int, 10: int, c: char, 1: int, BinaryOp:<: bool), and the assignment is marked "type error?!".]

Tree Traversal

A tree-walker is a number of procedures that take a node as argument. They start by processing the root of the tree and then work their way down, recursively.
Often we will have one procedure for each major node-kind, i.e. one for declarations, one for statements, one for expressions.
Notation:
    n.Kind is n's node type, for example IfStat, Assignment, etc.;
    n.C is n's child C, for example n.expr, n.left, etc.;
    n.A is n's attribute A, for example n.type, n.value, etc.

Each time we visit a node n we can
1. Evaluate some of n's attributes.
2. Print a semantic error message.
3. Visit some of n's children.

    PROCEDURE Stat(n : Node);
       IF n.Kind = Assign THEN
          Expr(n.Des); Expr(n.Expr);
       ELSIF n.Kind = IfElse THEN
          Expr(n.Expr); Stat(n.Stat1); Stat(n.Stat2);
       ENDIF
    END Stat;

    PROCEDURE Expr(n : Node);
       IF n.Kind = BinOp THEN
          Expr(n.LOP); Expr(n.ROP);
       ELSIF n.Kind = Name THEN
          (* Process n.Name *)
       ELSIF n.Kind = IntConst THEN
          (* Process n.Value *)
       ENDIF
    END Expr;

Constant Expressions

In many languages there are special constructs where only constant expressions may occur.
For example, in Modula-2 you can write

    CONST C = 15;
    TYPE A = ARRAY [5..C*6] OF CHAR;

but not

    VAR C : INTEGER;
    TYPE A = ARRAY [5..C] OF CHAR;

i.e. the upper bound of an array index must be constant (value known at compile time).

Constant declarations can depend on other constant declarations:

    CONST C1 = 15;
    CONST C2 = C1 * 6;
    TYPE A = ARRAY [5..C2] OF CHAR;

Write a tree-walk evaluator that evaluates constant integer expressions.

Concrete Syntax:

    Expr     ::= Add | Mul | IntConst
    Add      ::= Expr + Expr
    Mul      ::= Expr * Expr
    IntConst ::= number

Abstract Syntax:

    Expr     ::= Add | Mul | IntConst
    Add      ::= LOP:Expr ROP:Expr ⇑Val:INTEGER
    Mul      ::= LOP:Expr ROP:Expr ⇑Val:INTEGER
    IntConst ::= ⇐Value:INTEGER ⇑Val:INTEGER

IntConst has an input attribute Value. We mark input attributes with a ⇐ in the abstract syntax.
Each node is given an attribute Val.
Val moves up the tree, so we mark it with a ⇑ in the abstract syntax.

    PROCEDURE Expr (n: Node);
       IF n.Kind = Add THEN
          Expr(n.LOP); Expr(n.ROP);
          n.Val := n.LOP.Val + n.ROP.Val;
       ELSIF n.Kind = Mul THEN
          Expr(n.LOP); Expr(n.ROP);
          n.Val := n.LOP.Val * n.ROP.Val;
       ELSIF n.Kind = IntConst THEN
          n.Val := n.Value;
       ENDIF
    END;

n.LOP.Val is the value of n's left child's Val attribute. It has been evaluated once Expr(n.LOP) has returned.

[Figure: the AST after parsing and the AST after the tree-walk. The root is a Mul node whose children are an IntConst and an Add node; the Add node's children are IntConst Value=5 and IntConst Value=7. After parsing only the Value input attributes are set. After the tree-walk the two leaves under Add have Val=5 and Val=7, the Add node has Val=12, and the Mul root holds the product of its children's Val attributes.]
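As a rough illustration, the Expr procedure for constant expressions might be transcribed into Python as follows. The Node class, its field names, and the sample tree 9 * (5 + 7) are assumptions made for this sketch; they do not come from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    kind: str                      # "Add", "Mul" or "IntConst"
    lop: Optional["Node"] = None   # left operand (Add and Mul only)
    rop: Optional["Node"] = None   # right operand (Add and Mul only)
    value: Optional[int] = None    # input attribute, set by the parser (IntConst only)
    val: Optional[int] = None      # attribute that moves up the tree, set by the walk


def expr(n: Node) -> None:
    """Visit the children first, then compute n.val from their val attributes."""
    if n.kind == "Add":
        expr(n.lop)
        expr(n.rop)
        n.val = n.lop.val + n.rop.val
    elif n.kind == "Mul":
        expr(n.lop)
        expr(n.rop)
        n.val = n.lop.val * n.rop.val
    elif n.kind == "IntConst":
        n.val = n.value
    else:
        raise ValueError(f"unexpected node kind: {n.kind}")


# Hypothetical input: the AST a parser might build for 9 * (5 + 7).
tree = Node(
    "Mul",
    lop=Node("IntConst", value=9),
    rop=Node("Add", lop=Node("IntConst", value=5), rop=Node("IntConst", value=7)),
)
expr(tree)
print(tree.rop.val, tree.val)  # prints: 12 108
```

Because each call returns only after both children have been visited, every val is read after it has been written, which is the evaluation order the slides rely on.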
Constant Declarations

Let's extend this exercise to handle Modula-2 style constant declarations:

    CONST C1 = 15;
    CONST C2 = C1 * 6;
    TYPE A = ARRAY [5..C2] OF CHAR;

We assume there is a magic function Lookup(ID) that returns TRUE if ID is a constant identifier, and a function GetValue(ID) which returns the value of this constant.

Concrete Syntax:

    ConstDecl ::= CONST Ident = Expr
    Expr      ::= Expr + Expr | Ident | IntConst
    IntConst  ::= number
    Ident     ::= name

Abstract Syntax:

    ConstDecl ::= ID:Ident EXPR:Expr
    Expr      ::= Add | IntConst | Ident
    Add       ::= LOP:Expr ROP:Expr ⇑Val:INTEGER ⇑IsConst:BOOLEAN
    IntConst  ::= ⇐Value:INTEGER ⇑Val:INTEGER ⇑IsConst:BOOLEAN
    Ident     ::= ⇐ID:String ⇑Val:INTEGER ⇑IsConst:BOOLEAN

    PROCEDURE ConstDecl (n: Node);
       Expr(n.EXPR);
       IF NOT n.EXPR.IsConst THEN
          PRINT "Constant expression expected."
       ENDIF
    END;

    PROCEDURE Expr (n: Node);
       IF n.Kind = Add THEN
          Expr(n.LOP); Expr(n.ROP);
          n.Val := n.LOP.Val + n.ROP.Val;
          n.IsConst := n.LOP.IsConst AND n.ROP.IsConst;
       ELSIF n.Kind = IntConst THEN
          n.Val := n.Value; n.IsConst := TRUE;
       ELSIF n.Kind = Ident THEN
          n.IsConst := Lookup(n.ID); n.Val := GetValue(n.ID);
       ENDIF
    END;

[Figure: the source `VAR X : INTEGER; CONST C = 45+X;` and its AST after the tree-walk. The ConstDecl node has ID="C" and an Expr child, which is an Add node with Val=? and IsConst=FALSE. The Add node's children are IntConst (Value=45, Val=45, IsConst=TRUE) and Ident (ID="X", Val=?, IsConst=FALSE).]

Typechecking

Type Checking Assignments

Write a tree-walker that type checks assignments in Pascal:

    var i : integer;
    var r : real;
    var c : char;
    begin
       i := 34;
       i := i + 2;
       r := 3.4;
       r := 3.4 + i;   (* OK, automatic conversion. *)
       i := r;         (* Illegal. *)
       i := c;         (* Illegal. *)
    end.

Assume a function lookup that returns the type of an identifier.
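One possible shape for such a tree-walker, sketched in Python. This is only an outline under stated assumptions: the type names "INT", "REAL" and "CHAR", the Node fields, the SYMBOLS table behind lookup, and the rule that an addition yields REAL when either operand is REAL are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical symbol table standing in for the assumed lookup function.
SYMBOLS = {"i": "INT", "r": "REAL", "c": "CHAR"}

# Assumed mapping from literal node kinds to their types.
LITERAL_TYPES = {"IntConst": "INT", "RealConst": "REAL", "CharConst": "CHAR"}


def lookup(name: str) -> str:
    """Return the declared type of an identifier."""
    return SYMBOLS[name]


@dataclass
class Node:
    kind: str                       # "Assign", "Add", "Name" or a literal kind
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    name: Optional[str] = None      # input attribute of Name nodes
    pos: int = 0                    # source line number
    type: Optional[str] = None      # attribute computed by the walk


errors: List[str] = []


def expr(n: Node) -> None:
    """Compute n.type bottom-up."""
    if n.kind == "Add":
        expr(n.left)
        expr(n.right)
        operands = {n.left.type, n.right.type}
        if operands <= {"INT", "REAL"}:
            # Assumed rule: mixing INT and REAL converts the result to REAL.
            n.type = "REAL" if "REAL" in operands else "INT"
        else:
            errors.append(f"{n.pos}: Type mismatch")
            n.type = "ERROR"
    elif n.kind == "Name":
        n.type = lookup(n.name)
    else:
        n.type = LITERAL_TYPES[n.kind]


def assign(n: Node) -> None:
    """Accept equal types, or an INT value assigned to a REAL variable."""
    expr(n.left)
    expr(n.right)
    if "ERROR" in (n.left.type, n.right.type):
        return  # already reported further down the tree
    same = n.left.type == n.right.type
    widening = n.left.type == "REAL" and n.right.type == "INT"
    if not (same or widening):
        errors.append(f"{n.left.pos}: Type mismatch")


def name(ident: str, pos: int) -> Node:
    return Node("Name", name=ident, pos=pos)


statements = [
    Node("Assign", left=name("i", 1), right=Node("IntConst", pos=1)),  # i := 34
    Node("Assign", left=name("r", 4),                                  # r := 3.4 + i
         right=Node("Add", left=Node("RealConst", pos=4), right=name("i", 4), pos=4)),
    Node("Assign", left=name("i", 5), right=name("r", 5)),             # i := r
    Node("Assign", left=name("i", 6), right=name("c", 6)),             # i := c
]
for s in statements:
    assign(s)
print(errors)  # prints: ['5: Type mismatch', '6: Type mismatch']
```

The first two statements pass and the last two are rejected, matching the comments in the Pascal fragment. The literal values themselves are not needed for type checking, so this sketch does not store them.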
Concrete Syntax:

    Assign ::= Expr := Expr
    Expr   ::= Expr + Expr | name | integer | real | char

Abstract Syntax:

    Assign ::= Left:Expr Right:Expr
    Expr   ::= Add | Name | IntConst | RealConst | CharConst
    Add    ::= LOP:Expr ROP:Expr ⇑Type:String

    PROCEDURE Assign (n: Node);
       Expr(n.Left); Expr(n.Right);
       IF NOT((n.Left.Type = n.Right.Type) OR
              ((n.Left.Type = "REAL") AND (n.Right.Type = "INT"))) THEN
          PRINT n.Left.Pos ":Type mismatch"
       ENDIF

    PROCEDURE Expr (n: Node);
       IF n.Kind