Supplementary Materials

Contents 2 Programming Language Syntax C 1 2.3.5 Syntax Errors C 1 2.4 Theoretical Foundations C 13 2.4.1 Finite Automata C 13 2.4.2 Push-Down Automata C 18 2.4.3 Grammar and Language Classes C 19 2.6 Exercises C 24 2.7 Explorations C 25 3 Names, Scopes, and Bindings C 26 3.4 Implementing Scope C 26 3.4.1 Symbol Tables C 26 3.4.2 Association Lists and Central Reference Tables C 31 3.8 Separate Compilation C 36 3.8.1 Separate Compilation in C C 37 3.8.2 Packages and Automatic Header Inference C 40 3.8.3 Module Hierarchies C 41 3.10 Exercises C 42 3.11 Explorations C 44 4SemanticAnalysis C 45 4.5 Space Management for Attributes C 45 4.5.1 Bottom-Up Evaluation C 45 4.5.2 Top-Down Evaluation C 50 C ii Contents 4.8 Exercises C 57 4.9 Explorations C 59 5 Target Machine Architecture C 60 5.1 The Memory Hierarchy C 61 5.2 Data Representation C 63 5.2.1 Integer Arithmetic C 65 5.2.2 Floating-Point Arithmetic C 67 5.3 Instruction Set Architecture (ISA) C 70 5.3.1 Addressing Modes C 71 5.3.2 Conditions and Branches C 72 5.4 Architecture and Implementation C 75 5.4.1 Microprogramming C 76 5.4.2 Microprocessors C 77 5.4.3 RISC C 77 5.4.4 Multithreading and Multicore C 78 5.4.5 Two Example Architectures: The x86 and ARM C 80 5.5 Compiling for Modern Processors C 88 5.5.1 Keeping the Pipeline Full C 89 5.5.2 Register Allocation C 93 5.6 Summary and Concluding Remarks C 98 5.7 Exercises C 100 5.8 Explorations C 104 5.9 Bibliographic Notes C 105 6 Control Flow C 107 6.5.4 Generators in Icon C 107 6.7 Nondeterminacy C 110 6.9 Exercises C 116 6.10 Explorations C 118 7 Type Systems C 119 7.3.2 Generics in C++, Java, and C# C 119 7.6 Exercises C 132 Contents C iii 7.7 Explorations C 135 8 Composite Types C 136 8.1.3 Variant Records (Unions) C 136 8.5.2 Dangling References C 144 8.7 Files and Input/Output C 148 8.7.1 Interactive I/O C 148 8.7.2 File-Based I/O C 149 8.7.3 Text I/O C 151 8.9 Exercises C 160 8.10 Explorations C 162 9 Subroutines and Control Abstraction C 163 9.2.1 Displays C 163 9.2.2 Stack Case Studies: LLVM on ARM; gcc on x86 C 167 9.2.3 Register Windows C 177 9.3.2 Call by Name C 180 9.5.3 Implementation of Iterators C 183 9.5.4 Discrete Event Simulation C 187 9.8 Exercises C 191 9.9 Explorations C 193 10 Data Abstraction and Object Orientation C 194 10.6 True Multiple Inheritance C 194 10.6.1 Semantic Ambiguities C 196 10.6.2 Replicated Inheritance C 200 10.6.3 Shared Inheritance C 201 10.7.1 The Object Model of Smalltalk C 204 10.9 Exercises C 208 10.10 Explorations C 211 11 Functional Languages C 212 C iv Contents 11.7 Theoretical Foundations C 212 11.7.1 Lambda Calculus C 214 11.7.2 Control Flow C 217 11.7.3 Structures C 219 11.10 Exercises C 223 11.11 Explorations C 225 12 Logic Languages C 226 12.3 Theoretical Foundations C 226 12.3.1 Clausal Form C 227 12.3.2 Limitations C 228 12.3.3 Skolemization C 230 12.6 Exercises C 232 12.7 Explorations C 234 13 Concurrency C 235 13.5 Message Passing C 235 13.5.1 Naming Communication Partners C 235 13.5.2 Sending C 239 13.5.3 Receiving C 244 13.5.4 Remote Procedure Call C 249 13.7 Exercises C 254 13.8 Explorations C 256 14 Scripting Languages C 258 14.3.5 XSLT C 258 14.6 Exercises C 270 14.7 Explorations C 272 15 Building a Runnable Program C 273 15.2.1 GIMPLE and RTL C 273 15.7 Dynamic Linking C 279 15.7.1 Position-Independent Code C 280 15.7.2 Fully Dynamic (Lazy) Linking C 282 Contents C v 15.9 Exercises C 284 15.10 Explorations C 285 16 Run-Time Program Management C 286 16.1.2 The Common Language Infrastructure C 286 16.5 Exercises C 295 16.6 Explorations C 296 17 Code Improvement C 297 17.1 Phases of Code Improvement C 299 17.2 Peephole Optimization C 301 17.3 Redundancy Elimination in Basic Blocks C 304 17.3.1 A Running Example C 305 17.3.2 Value Numbering C 307 17.4 Global Redundancy and Data Flow Analysis C 312 17.4.1 SSA Form and Global Value Numbering C 312 17.4.2 Global Common Subexpression Elimination C 315 17.5 Loop Improvement I C 323 17.5.1 Loop Invariants C 323 17.5.2 Induction Variables C 325 17.6 Instruction Scheduling C 328 17.7 Loop Improvement II C 332 17.7.1 Loop Unrolling and Software Pipelining C 332 17.7.2 Loop Reordering C 337 17.8 Register Allocation C 344 17.9 Summary and Concluding Remarks C 348 17.10 Exercises C 349 17.11 Explorations C 353 17.12 Bibliographic Notes C 354 Programming Language2 Syntax 2.3.5 Syntax Errors EXAMPLE 2.43 The main text illustrated the problem of syntax error recovery with a simple ex- Syntax error in C (reprise) ample in C: A=B:C+D; The compiler will detect a syntax error immediately after the B, but it cannot give up at that point: it needs to keep looking for errors in the remainder of the program. To permit this, we must modify the input program, the state of the parser, or both, in a way that allows parsing to continue, hopefully without announcing a significant number of spurious cascading errors and without missing a significant number of real errors. The techniques discussed below allow the compiler to search for further syntax errors. In Chapter 4 we will consider additional techniques that allow it to search for additional static semantic errors as well. Panic Mode Perhaps the simplest form of syntax error recovery is a technique known as panic mode. It defines a small set of “safe symbols” that delimit clean points in the input. When an error occurs, a panic mode recovery algorithm deletes input tokens until it finds a safe symbol, then backs the parser out to a context in which that symbol might appear. In the earlier example, a recursive descent parser with panic mode recovery might delete input tokens until it finds the semicolon, return from all subroutines called from within stmt,andrestartthebodyofstmt itself. Unfortunately, panic mode tends to be a bit drastic. By limiting itself to a static set of “safe” symbols at which to resume parsing, it admits the possibility of delet- ing a significant amount of input while looking for such a symbol. Worse, if some of the deleted tokens are “starter” symbols that begin large-scale constructs in the language (e.g., begin, procedure, while), we shall almost surely see spurious cascading errors when we reach the end of the construct. EXAMPLE 2.44 Consider the following fragment of code in an Algol-family language: The problem with panic mode C 1 C 2 Chapter 2 Programming Language Syntax IF a b THEN x; ELSE y; END; When it discovers the error at b in the first line, a panic-mode recovery algorithm is likely to skip forward to the semicolon, thereby missing the THEN. When the parser finds the ELSE on line 2 it will produce a spurious error message. When it finds the END on line 3 it will think it has reached the end of the enclosing struc- ture (e.g., the whole subroutine), and will probably generate additional cascading errors on subsequent lines. Panic mode tends to work acceptably only in rela- tively “unstructured” languages, such as Basic and (early) Fortran, which don’t have many “starter” symbols. Phrase-Level Recovery We can improve the quality of recovery by employing different sets of “safe” symbols in different contexts. Parsers that incorporate this improvement are said to implement phrase-level recovery. Whenitdiscoversanerrorinanexpression,for example, a phrase-level recovery algorithm can delete input tokens until it reaches something that is likely to follow an expression. This more local recovery is better than always backing out to the end of the current statement, because it gives us the opportunity to examine the parts of the statement that follow the erroneous expression. EXAMPLE 2.45 Niklaus Wirth, the inventor of Pascal, published an elegant implementation of Phrase-level recovery in phrase-level recovery for recursive descent parsers in 1976 [Wir76, Sec. 5.9]. The recursive descent simplest version of his algorithm depends on the FIRST and FOLLOW sets defined at the end of Section 2.3.1. If the parsing routine for nonterminal foo discovers an error at the beginning of its code, it deletes incoming tokens until it finds a member of FIRST(foo), in which case it proceeds, or a member of FOLLOW(foo), in which case it returns: procedure foo() if not (input token ∈ FIRST(foo) or EPS(foo)) report error() –– print message for the user repeat delete token() until input token ∈ (FIRST(foo) ∪ FOLLOW(foo) ∪{$$}) case input token of ...:..

Supplementary Materials

Using the Unibasic Debugger

IEEE Standard 754 for Binary Floating-Point Arithmetic

Introduction to Python

Kednos PL/I for Openvms Systems User Manual

Data General Extended Algol 60 Compiler

Numerical Computation Guide

CS412/CS413 Introduction to Compilers Tim Teitelbaum Lecture 12

Introduction to Programming CSC 103

Design and Implementation of the GNU Prolog System Abstract 1 Introduction

Interpreters Machine Interpreters

Design and Implementation of the Symbol Table for Object- Oriented Programming Language

Floating Point Exception Tracking and NAN Propagation