
Appendix A The BCPL Language

I've used BCPL as the language of illustration throughout most of this book because it is designed as a system-programming language and is especially suited to the special problems of compiler-writing (a recurrent joke is that BCPL is designed as a language in which to write the BCPL compiler!). It is suitable for this purpose mainly because of the ease with which the program can manipulate pointers. In addition it is untyped - all BCPL values are merely treated as 'bitstrings' - so that the program can do just about anything that you might require to do with a pointer. Recursion is efficient in BCPL (see chapter 13).

I have taken many liberties with BCPL syntax in examples, mainly inspired by the need to compress complicated algorithms into the space of a single page. I have used if-then-else in place of BCPL's test-then-or for no other reason than that the first construction will be more familiar to many users; I have used elsf because it abbreviates the programs. I have used dot-suffix notation 'nodep.x' rather than 'nodep!x' or 'xAnodep' because I believe it will be more familiar to many readers. Apologies to BCPL fanatics (and to BCPL's designer), but I defend myself by saying that my task is to explain compiler algorithms, not the syntax of BCPL.

In what follows I explain only those portions of BCPL (and pseudo-BCPL) which I have used in examples. The language is much more powerful and elegant than I have made it seem - I commend its use to all readers.

Statements

BCPL provides several forms of iterative and conditional statements. In general, if there are two uses for a statement, BCPL provides two different syntaxes: hence while, until, repeatwhile, repeatuntil and repeat are all provided in BCPL. Semicolons aren't usually needed to separate statements, although I have used them where one statement follows another on a single line. The then or do symbol is usually unnecessary, but I have generally included it to avoid confusing those readers unused to the language.

1. Assignment statement:
       <left-hand side> := <expression>
   The left-hand side can be a name, a subscripted expression written with '!', an indirection '!' applied to an expression, or a field selection written with '.'.

2. Procedure call:
       <expression>()
       <expression>(<expressions>)
   BCPL makes you indicate parameterless procedures by showing an empty parameter list; there is only call-by-value.

3. Iterative statements:
       while <expression> do <statement>
       until <expression> do <statement>
       <statement> repeat
       <statement> repeatwhile <expression>
       <statement> repeatuntil <expression>
   The test is made either before execution of the statement (while, until) or after execution of the statement (repeatwhile, repeatuntil).

4. Selection of cases:
       switchon <expression> into <statement>
   The statement must contain case labels of the form 'case <constant>:' or 'case <constant> .. <constant>:' or 'default:'.

5. Conditional statement (not strict BCPL):
       if <expression> then <statement>
       if <expression> then <statement> else <statement>
       if <expression> then <statement> elsf <expression> then <statement>

6. Compound statements:
       { <statement>* }
       { <declaration>* <statement>* }

7. Control statements:
       break    - exit from current iterative statement
       loop     - end present iteration of current iterative statement
       endcase  - exit from current switchon statement

Declarations

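8. Variable declarations:
       let <name> = <expression>

9. Procedures and functions:
       let <name>() be <statement>
       let <name>(<names>) be <statement>
       let <name>() = <expression>
       let <name>(<names>) = <expression>
   Note that the expression in a function declaration is usually a valof expression.

Expressions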

There are many kinds of BCPL operator, both binary and unary. I have assumed that all unary operators have the highest priority, that arithmetic operators come next with their conventional priorities, relational operators next, and finally logical operators. In cases of confusion I've used brackets. Conditional expressions have the lowest priority.
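The priority ordering assumed above can be illustrated with a short precedence-climbing sketch in Python. This is not code from the book: the function name parenthesise, the numeric strengths in BINARY and the decision to rank '&' above '|' are assumptions made purely for illustration, and the binary '!' and '.' operators are left out because the text does not say where they fall. The function fully brackets an expression so that the grouping implied by the priorities becomes visible; it does no error recovery on malformed input.

```python
import re

# Assumed binding strengths (higher binds tighter). The text fixes only the
# order of the groups; ranking '&' above '|' is an extra assumption.
BINARY = {
    "*": 6, "/": 6, "rem": 6,
    "+": 5, "-": 5,
    "<": 4, "<=": 4, "=": 4, "\\=": 4, ">=": 4, ">": 4,
    "&": 3, "|": 2,
}
UNARY = {"@", "!", "+", "-", "\\"}

TOKEN = re.compile(
    r"\s*(<=|>=|\\=|->|[A-Za-z_][A-Za-z0-9_]*|\d+|[-+*/<>=&|!@\\(),])"
)


def tokenize(text):
    """Split an expression into operator, name and number tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parenthesise(text):
    """Return the expression with every operator application bracketed."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def primary():
        tok = advance()
        if tok in UNARY:                 # unary operators bind tightest
            return f"({tok}{primary()})"
        if tok == "(":
            inner = conditional()
            advance()                    # consume ')'
            return inner
        return tok                       # a name or a number

    def binary(min_strength):
        left = primary()
        while peek() in BINARY and BINARY[peek()] >= min_strength:
            op = advance()
            right = binary(BINARY[op] + 1)   # left-associative operators
            left = f"({left} {op} {right})"
        return left

    def conditional():                   # lowest priority: e1 -> e2, e3
        test = binary(1)
        if peek() == "->":
            advance()
            then_part = conditional()
            advance()                    # consume ','
            return f"({test} -> {then_part}, {conditional()})"
        return test

    return conditional()


print(parenthesise("a + b * c < d & \\ e"))
print(parenthesise("a < b -> a, b"))
```

In the second call the relation is grouped before the conditional is applied, which is what giving conditional expressions the lowest priority means in practice.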

In translation examples I have used an invented operator: the '++' operator which takes a string and appends a character. It is quite unrealistic, but I hope you see what it means.

10. Value of a statement:
       valof <statement>
   The statement must contain a resultis statement.

11. Unary operators:
       @       /* address-of */
       !       /* contents-of */
       +, -    /* unary integer arithmetic */
       \       /* logical 'not' */

12. Function calls:
       <expression>()
       <expression>(<expressions>)

13. Binary operators:
       !                       /* subscript operator */
       .                       /* field-select operator */
       +, -, *, /, rem         /* integer arithmetic */
       <, <=, =, \=, >=, >     /* integer relations */
       &, |                    /* logical 'and' and 'or' */
       ++                      /* string concatenation */
   'V!n' is similar to V[n] in most other languages; 'P.a' is similar to SIMULA 67's P.a, PASCAL's P^.a and ALGOL 68's a of P.

14. Conditional expressions:
       <expression> -> <expression>, <expression>
   If the value of the first expression is true then the second expression is evaluated: otherwise the third expression is evaluated.

Appendix B Assembly Code Used in Examples

I have been consistent in using a single-address, multi-register machine as my illustration in examples. The code format is:

    <operation code> <register>, <address>

where an address is either a <number> or <number>(<register>)

Examples:

    JUMPFALSE 1, 44
    ADD 1, 3217(7)
    ADDr 5, 2

Any register may be used as an address modifier or as an accumulator. There is a fixed number of registers (the exact number doesn't matter). Sometimes (e.g. JUMP) the register doesn't matter and is omitted; sometimes (e.g. FIXr) the address is omitted.
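The instruction format can be made concrete with a minimal Python sketch. It is an illustration only, not part of the book's notation: the names parse_instruction and effective_address, the regular expression and the eight-element register list are all assumptions introduced here, and the forms in which the register or the address is omitted are not handled. The sketch splits an instruction into its parts and computes the address as the number plus the contents of the modifier register, if there is one.

```python
import re

# Hypothetical pattern for "OP register, number" or "OP register, number(modifier)".
INSTRUCTION = re.compile(
    r"^\s*(?P<op>[A-Za-z?]+)\s+(?P<reg>\d+)\s*,\s*"
    r"(?P<num>-?\d+)(?:\((?P<mod>\d+)\))?\s*$"
)


def parse_instruction(text):
    """Split an instruction into (opcode, register, number, modifier or None)."""
    m = INSTRUCTION.match(text)
    if m is None:
        raise ValueError(f"not in 'op register, address' form: {text!r}")
    mod = m.group("mod")
    return (
        m.group("op"),
        int(m.group("reg")),
        int(m.group("num")),
        None if mod is None else int(mod),
    )


def effective_address(number, modifier, registers):
    """An address is a number, optionally modified by a register's contents."""
    if modifier is None:
        return number
    return number + registers[modifier]


registers = [0] * 8          # assumed: eight registers, initially zero
registers[7] = 100

op, reg, num, mod = parse_instruction("ADD 1, 3217(7)")
print(op, reg, effective_address(num, mod, registers))
```

With register 7 holding 100, the address part 3217(7) refers to location 3317.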

I find it best to divide instructions into different groups distinguished by a lower-case letter at the end of the operation code. This makes logically different operations, which on many real machines are implemented by widely differing instructions, visually distinct. The suffixes are:

- no suffix means a store-to-register operation (except STORE, which is of course register-to-store)
- 'r' means register-to-register
- 's' means register-to-store
- 'n' means the address part is to be interpreted as a number
- 'a' means the address part is to be interpreted as an address - this may seem just like 'n' but on some machines it isn't!
- 'i' means indirect addressing - the memory cell addressed contains the actual address to be used in the instruction.
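The suffix conventions can be sketched as a single Python function that applies one variant of ADD. This is a hedged illustration rather than a definition of the machine: the function name add, the representation of registers and store as Python lists, and the decision to treat 'a' exactly like 'n' are assumptions (the text notes that on some machines the two differ).

```python
def add(suffix, reg, number, modifier, registers, store):
    """Apply one ADD variant; number and modifier together form the address part."""
    # Address part: the number, plus the modifier register's contents if present.
    value = number + (registers[modifier] if modifier is not None else 0)
    if suffix == "":            # store-to-register: operand is a store cell
        registers[reg] += store[value]
    elif suffix == "r":         # register-to-register: address part names a register
        registers[reg] += registers[value]
    elif suffix == "s":         # register-to-store: result lands in the store cell
        store[value] += registers[reg]
    elif suffix in ("n", "a"):  # the address part itself is the operand (assumed alike)
        registers[reg] += value
    elif suffix == "i":         # indirect: the addressed cell holds the real address
        registers[reg] += store[store[value]]
    else:
        raise ValueError(f"unknown suffix {suffix!r}")


registers = [0, 10, 20, 0, 5, 0, 0, 0]
store = [0, 0, 7, 5, 0, 50, 0, 0]

add("", 1, 2, None, registers, store)    # ADD  1, 2
add("r", 1, 2, None, registers, store)   # ADDr 1, 2
add("s", 1, 2, None, registers, store)   # ADDs 1, 2
add("n", 1, 2, 4, registers, store)      # ADDn 1, 2(4)
print(registers[1], store[2])
```

Keeping the address calculation separate from the interpretation of its result is the point of the suffix scheme: the same address part is read as a store location, a register number or a plain value depending only on the final letter.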

Examples of the differences:

    ADD  1, 2      means add the contents of store location 2 to register 1
    ADDr 1, 2      means add the contents of register 2 to register 1
    ADDs 1, 2      means add the contents of register 1 to store location 2
    ADDn 1, 2(4)   means add the number which is formed by adding 2 and the contents of register 4, to register 1
    ADDa 1, 2(4)   means add the address which is formed by combining 2 and the contents of register 4, to register 1

I hope the instruction-names are fairly indicative of their operation. In the examples I've mostly used LOAD, STORE, JSUB, SKIP?? and the arithmetic operations. Here is a table which may clarify matters:

Instruction        Explanation

LOAD               place a value in a register
STORE              place a value in a memory cell

INCRST             add one to store location
DECRST             subtract one from store location
STOZ               set all bits in store location to zero
STOO               set all bits in store location to one
INCSKP             increment store location; skip next instruction if result is zero
DECSKP             decrement store location; skip next instruction if result is zero

ADD                add two values
SUB                subtract one value from another
NEGr               negate value in register
MULT               multiply two values
DIV                divide one value by another
                   (I have used fADD, fSUB etc. to denote the floating-point analogue of these instructions. Likewise xSUB, xDIV etc. denotes the 'exchanged' or 'reverse' variant - see chapter 5)
FIXr               convert floating point number in register to fixed-point
FLOATr             convert fixed point number in register to floating point

SKIP               jump over the next instruction
SKIPLT             jump over the next instruction if register value is less than (LT) store value
SKIPLE             ditto, but relation is 'less or equal'
SKIPNE             ditto, but relation is 'not equal'
SKIPEQ             ditto, but relation is 'equal'
SKIPGE             ditto, but relation is 'greater or equal'
SKIPGT             ditto, but relation is 'greater than'

JUMP               transfer control to indicated address
JUMPLT             transfer control only if register value is less than (LT) zero
JUMPLE             ditto, but if less than or equal to zero
JUMPNE             ditto, but if not equal to zero
JUMPEQ             ditto, but if equal to zero
JUMPGE             ditto, but if greater than or equal to zero
JUMPGT             ditto, but if greater than zero
JUMPTRUE           transfer control if register contains special TRUE value
JUMPFALSE          ditto, but for FALSE value

JSUB L, a          transfer control to indicated address, storing address of next instruction (return address) in register L
RETN , a           return to indicated address

PUSH p, a          transfer contents of address a to top of stack indicated by register p, increase p
POP p, a           decrement stack pointer p, transfer contents of top of stack to address a
PUSHSUB, POPRETN   analogous to JSUB, RETN but link address is on the stack rather than in a register

INC p, a           add contents of location a to stack register p and check that p is not outside bounds of stack space
DEC p, a           subtract contents of location a from stack pointer and check limits of stack
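One distinction in the table is easy to miss: the conditional SKIP instructions compare the register with a store value and step over a single instruction, whereas the conditional JUMP instructions compare the register with zero and transfer to an address. The Python sketch below models just that difference. It is an assumption-laden illustration, not the book's machine: the function run, the tuple encoding of instructions, the use of list indices as jump addresses and the two sample programs are all invented here, and JUMPTRUE and JUMPFALSE are not modelled.

```python
RELATIONS = {
    "LT": lambda a, b: a < b,
    "LE": lambda a, b: a <= b,
    "NE": lambda a, b: a != b,
    "EQ": lambda a, b: a == b,
    "GE": lambda a, b: a >= b,
    "GT": lambda a, b: a > b,
}


def run(program, registers, store):
    """Interpret (opcode, register, address) tuples; addresses index the lists."""
    pc = 0
    while pc < len(program):
        op, reg, addr = program[pc]
        pc += 1                                  # default: fall through
        if op == "LOAD":
            registers[reg] = store[addr]
        elif op == "STORE":
            store[addr] = registers[reg]
        elif op == "NEGr":
            registers[reg] = -registers[reg]
        elif op == "SKIP":
            pc += 1                              # unconditional skip
        elif op == "JUMP":
            pc = addr                            # unconditional jump
        elif op.startswith("SKIP"):
            # Compare the register with a STORE VALUE; skip one instruction.
            if RELATIONS[op[4:]](registers[reg], store[addr]):
                pc += 1
        elif op.startswith("JUMP"):
            # Compare the register with ZERO; transfer to the address.
            if RELATIONS[op[4:]](registers[reg], 0):
                pc = addr
        else:
            raise ValueError(f"instruction not modelled: {op}")
    return registers, store


# Absolute value of store[0] into store[1], using a conditional jump.
ABSOLUTE = [
    ("LOAD", 1, 0),
    ("JUMPGE", 1, 3),     # already non-negative: jump over the negation
    ("NEGr", 1, None),
    ("STORE", 1, 1),
]

# Larger of store[0] and store[1] into store[2], using a conditional skip.
MAXIMUM = [
    ("LOAD", 1, 0),
    ("SKIPGE", 1, 1),     # register >= store[1]: skip the reload
    ("LOAD", 1, 1),
    ("STORE", 1, 2),
]

print(run(ABSOLUTE, [0, 0], [-5, 0])[1])
print(run(MAXIMUM, [0, 0], [3, 9, 0])[1])
```

A skip needs no target address, which is why it suits short two-way choices like the one in MAXIMUM, while a jump can bypass any number of instructions.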

Some of the instructions in this list may seem to have the same effect as others, but I have in general included an instruction for each simple machine-code operation which I have needed to illustrate. Using such a large instruction set makes my task easier than it would otherwise be, of course, but some real machines have larger sets still (e.g. the DEC PDP-10). I've never yet seen a machine with a SKIPTRUE or a JUMPTRUE instruction, given that TRUE is to be represented as a specific non-zero bit pattern, but I have felt the need for it in practice!

Bibliography

The most important item in a compiler-writing bibliography isn't referenced here: it is a compiler, which should be written in a system programming language to compile some language which you know fairly well. Reading such a compiler will clarify many of the points which are made in this book. If you can get hold of the source code of any compiler on your machine then don't be ashamed to read it and don't be ashamed to copy algorithms from it if that will save you time. A good compiler to look out for is that for BCPL: most versions of the BCPL compiler are based on the original written by Martin Richards.

Books and papers referenced in the text above are:

Aho A.V., Ullman J.D. (1978) "Principles of Compiler Design", Addison-Wesley.

Aho A.V., Johnson S.C. (1974) "LR parsing", Computing Surveys 6, 1, pp 99-124.

Baker H. (1978) "List Processing in Real Time on a Serial Computer", Comm. ACM 21, 4, pp 280-294.

Brooker R.A., MacCallum I.R., Morris D., Rohl J.S. (1963) "The compiler-compiler", Annual Review of Automatic Programming 3, pp 229-275.

Dijkstra E.W., Lamport L., Martin A.J., Scholten C.S. and Steffens E.F.M. (1978) "On-the-Fly Garbage Collection: An Exercise in Cooperation", Comm. ACM 21, 11, pp 966-975.

Foster J.M. (1968) "A Syntax Improving Device", Computer J. 11, 1, pp 31-34.

Geschke C.M. (1972) "Global Program Optimizations", Ph.D. thesis, Carnegie-Mellon University.

Gries D. (1971) "Compiler Construction for Digital Computers", John Wiley, 1971.

Horning J.J. (1974) "LR parsing", in Bauer et al., "Compiler Construction - An advanced course", Springer-Verlag.

Hunter R.B., McGettrick A.D., Patel R. (1977) "LL versus LR parsing with illustrations from ALGOL 68", SIGPLAN notices 12, 6, pp 49-53.

Irons E.T. (1961) "An error-correcting parse algorithm", Comm. ACM 4, 1, pp 51-55.

Knuth D.E. (1971) "An Empirical Study of FORTRAN programs", Software - Practice and Experience, 1, 2, pp 105-133.

Knuth D.E. (1965) "On the Parsing of Languages from Left to Right", Information and Control 8, 6, pp 607-639.

Landin P.J. (1965) "A Correspondence between ALGOL 60 and Church's Lambda-Notation", Comm. ACM 8, 2 and 8, 3, pp 89-101 and 158-165.

McCarthy et al. (1965) "LISP 1.5 Programmer's Manual", MIT Press, 1965.

Rohl J.S. (1975) "An Introduction to Compiler Writing", MacDonald and Janes, 1975.

Reynolds J.C. (1972) "Definitional Interpreters for higher-order programming languages", Proc 27th ACM National Conf, 717-740.

Steele G.L. (1977) "Arithmetic Shifting Considered Harmful", SIGPLAN notices, 12, 11, pp 61-69.

Weizenbaum J. (1977) "Computer Power and Human Reason", Freeman, San Francisco.

Wichmann B.A. (1975) "Ackermann's function, a study in the efficiency of calling procedures", National Physical Laboratory report.

Wulf W., Johnsson R.K., Weinstock C.B., Hobbs S.O. (1973), "The design of an optimizing compiler", Computer Science Dept. Technical Report, Carnegie-Mellon University (published in modified form under the same title by American Elsevier, 1975).