Machine Code Generation Within a Compiler- a Teaching Model

Machine Code Generation Within a Compiler- a Teaching Model

Machine code generation within a compiler- a teaching model L. V. Atkinson* and G. J. Duffusf A model is presented wherein aspects of the code generation phase of a compiler can be taught practically. SLPL (Atkinson, 1976) is a high level language with the facility to generate and execute sequences of machine code orders on ICL 1900 series machines. This paper describes the implementation of this facility and illustrates its application. (Received May 1975) The different actions of a typical two-pass compiler for an form the intermediate code becomes simply a slight modification ALGOL type language can be summarised as follows. of the source language. Downloaded from https://academic.oup.com/comjnl/article/19/2/160/408732 by guest on 28 September 2021 First pass: A further advantage of string variables presents itself when the generation of individual machine code orders is considered. A (a) lexical compaction machine code function is usually referred to by a mnemonic (b) syntax analysis opcode, a character string. Using string variables we can denote (c) construction of semantic tables the function part of an order by its opcode. (d) transformation to intermediate form. Second pass: 3. List structures To produce code for an arithmetic or boolean expression we (e) semantic checking need to know its structure. This is normally implicitly repre- (/) machine code generation. sented in intermediate code by some form of Polish Repre- As indicated some of these activities proceed in parallel but sentation (Randall and Russell, 1964). Again independent each may be more easily understood if treated independently. treatment of the code generator is hampered because of the It is suggested that an undergraduate course in compiling unnatural form of the intermediate code. Further, the gener- techniques might treat each aspect separately before discussing ation of machine code by an algorithm working from a Polish ways in which they can be combined. We examine here in form expression is not, in the authors' opinion, so natural as detail how the code generation phase can be implemented by an algorithm which can make use of an explicit represen- independently and outline other aspects only where relevant. tation of structure. This point is illustrated by the example Associated with any compiler are at least four language levels. program listed at the end of this paper. A multi-pass compiler may use several different intermediate Explicit structure representation within the intermediate code forms and so have more than four. is not difficult to produce when one comes to implement the syntax analyser. The analyser can be top-down and recursive 1. Implementation language (the language in which the compiler without back-up but with expressions treated by operator is written). precedence techniques. Adopting such an approach and using In the present case this is SLPL. The system to be described is a language which allows the use of structures it may well be an integral part of SLPL but a similar scheme could be easier to produce intermediate code of the form described incorporated within any language by supplying a few code- rather than transforming expressions into Polish notation. bodied procedures. There are three features of SLPL not present in all languages which make its use particularly suitable. 2. Source language (the language to be compiled). The principles of code generation are best approached in 1. Recursion stages. Assignment statements should be considered before Recursion is not necessary for compiler writing but its use conditionals and loops and simple variables before arrays and makes the techniques easier to understand and to apply. procedures. We take ALGOL 60 as the source language and Without recursion one tends not to see the wood for the trees! accordingly approach compilation by defining subsets of the language, each successive subset containing the previous subset. 2. String variables The key to creating a simple code generation system is the The availability of string variables may not seem at first to be provision of a suitable run-time model, within which the object of any great importance but their use allows a comprehensible code is executed, associated with a correspondingly simple representation of the input data for the code generator and source language subset. Such a model is described in the simplifies the input module when such a representation is used. following sections. A source program contains identifiers (strings) and basic symbols (strings) but during the first pass of the compiler their 3. Intermediate code (output from the first pass, which is also representation is changed to that of the intermediate code. input to the code generator). Each basic symbol, regardless of length, is reduced to a single In the light of the above discussion of the implementation unique (short) bit pattern and each identifier is replaced by a language we can supply as input to the code generator a pointer to an appropriate part of the semantic tables. With such program written in a slightly modified form of the source an intermediate code form it is not easy to treat the code language. Identifiers and basic symbols are to be somehow generator independently because of the difficulty of supplying delimited as strings and the structure of expressions is to be appropriate data. By retaining all such strings in their source explicitly represented. The input conventions of SLPL are •Department of Applied Mathematics and Computing Science, The University, Sheffield S10 2TN. t Department of Computing Services, The University, Sheffield S10 2TN. 160 The Computer Journal particularly suited to this. The input module of an SLPL (b) to determine the action to be carried out. The action program will accept atoms or lists. Relevant to the present required for a simple variable, for example, differs from context are lists (sequences of atoms or lists suitably bracketed that needed for a function designator. and separated—the external representation of the brackets 3. Addresses of variables must be obtained. For a simple vari- and separator may be specified by the user or the separator able this may be a direct address if the variable is declared in may be suppressed) and atoms of mode string (a character the outer block, a displacement from the start address of the string enclosed between " and ") and char (any character, apart current block if the variable is local, or may be in the form of from the space and end of record characters, which is not a list a block number and an associated displacement within that bracket, a list element separator or part of any other atom block. (string, int or bool) is taken to be a character atom). A simple For the simple model input representation now suggests itself. Basic symbols can be strings, identifiers can be single characters and, if we assume all (a) assume all variables are of type integer or boolean and operators are dyadic, expressions can be fully bracketed that their use is always correct and hence ignore (2) triplets. It is convenient to treat the assignment operator above, (denoted, say, <-) as other operators and to bracket it with its (b) dispense with declarations, and operands. If the list element separator is suppressed the follow- (c) assume all identifiers refer to distinct variables which ing program indicates a possible form of input data with strings may be accessed directly. If identifiers must be single represented in bold face. letters this implies that 26 variables are available. Downloaded from https://academic.oup.com/comjnl/article/19/2/160/408732 by guest on 28 September 2021 begin In the first instance a small subset of ALGOL statements integer i,j, k; boolean b; should be considered and it is suggested that initially the read /; ready; subset contains only compound statements containing simple print i; print j; I/O instructions and assignments statements and that operators are restricted to + and —. It is then a simple step to introduce if b then read k else (k (t + j)); conditional statements and loops and to extend the set of print k operators to include some boolean and relational operators. end 2. Run-time model Any suitable convention for the form of source and inter- The run-time model provides instruction and data areas and mediate language I/O statements may be adopted. The above special registers for the object program and an interface between example indicates one of the simplest. the high level and the low level facets of the interpretive execution phase. 4. Object code (the output from the code generator). The high level aspects involve A comprehensive undergraduate course dealing with compiling (a) the instruction area, the number of words allocated being techniques might well consider the generation of a stack— specified by the user, oriented object code for a hypothetical machine (Randell and Russell, 1963). For a practical implementation it is a simple (b) an executive control loop for the interpretation, matter to write an emulator of such a machine. This has the (c) I/O facilitating transput of integers. advantage that the overall structure of the code generator can be considered without going into details of production of strict The low level aspects concern the interpreting routines activated machine code. by the executive control loop and the data areas and special registers they may affect. The scheme to be described here, however, utilises a standard The interface comprises five words : assembly language PLAN (ICL, 1972) rather than a hypo- thetical code. With this approach some basic code generation (a) CIP the current instruction pointer indicating the next concepts can be implemented very quickly if students are instruction to be obeyed, already familiar with the assembly language. (b) INST the instruction currently being obeyed, In a teaching environment emphasis must be placed on error detection and system diagnostics.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    6 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us