Threaded Code Is Not in Fact Implemented in the Computer's Hardware

desired generality in others. In summary, when writing hard code the user needs relatively many instructions, each of which executes relatively rapidly. An interpreter, by contrast, is a vehicle by which the user can choose his own instruction set to correspond Programming R. Morris to his specific problem. Obviously such freedom allows Techniques Editor a much shorter program for that: problem to be written. The penalty is that the instruction set of the interpreter Threaded Code is not in fact implemented in the computer's hardware. Instead the interpreter must itself be a computer pro- James R. Bell gram which simulates the action of the interpretive in- Digital Equipment Corporation struction set in terms of the actual instruction set. This can be a time-consuming proposition. Thus interpretive code tends to be shorter but slower than hard code. The concept of "threaded code" is presented as an It is instructive to look at the relation between the alternative to machine language code. Hardware and host hardware and the alternatives discussed. In the software realizations of it are given. In software it is case of hard code an instruction directs the flow of realized as interpretive code not needing an interpreter. processing by its actual execution from the IR, or in- Extensions and optimizations are mentioned. struction register, of the machine. In the case of an in- Key Words and Phrases: interpreter, machine code, terpreter, an "instruction" is in fact merely a datum time tradeoff, space tradeoff, compiled code, subroutine from the interpreting program. Thus it directs the flow calls, threaded code of processing from an accumulator or the equivalent. CR Categories: 4.12, 4.13, 6.33 We may now describe a "threaded code computer"--a machine in which an "instruction" controls the Pc, or position counter. 2. Threaded Code: Hardware and Software Realizations Let us imagine a computer which works in the fol- lowing way: 1. Introduction Step 1. S, the value of the Pcth word of memory, is One of the most fundamental tradeoffs in software fetched. engineering is that of space versus time. It is normally Step 2 (a). The routine starting at location S of mem- possible to write a faster and larger (or a smaller but ory is executed. slower) version of a given program. In this paper we Step 2 (b). The value of PC is incremented by one. describe a technique called threaded code of implement- Step 3. Go to Step 1. ing programs. Under suitable circumstances, it is shown We shall call this machine a threaded code computer. to achieve a desirable balance between speed and small It is quite feasible to build a physical device corre- size. sponding to the above description. However, this is un- The most common alternative techniques of pro- necessary. We shall show that it is possible to eco- gramming might be denoted "hard code" and "in- nomically transform a suitable general purpose com- terpretive code." Hard code (or machine code) is the puter into a threaded code computer via programming. most used method of programming. Each instruction Let us describe the implementation on a PDP-11 of the program is chosen from the set wired into the computer. Let the PC of the threaded code computer host computer by its designers. Each such instruction correspond to a general register R of the PDP-11. Then executes rapidly since it is wired into the physical cir- to use the PDP-11 as a threaded code computer we cuitry of the computer. On the other hand, the given in- need only end each of the set of routines described in struction set is suboptimal for almost any specific Step 2(a) with the instruction problem because the user is forced to accept unneeded JMP @ (R) + generality in some places and to circumvent a lack of Copyright (~) 1973, Association for Computing Machinery, Inc. which links the end of one routine to the beginning of General permission to republish, but not for profit, all or part the next. This is PDP-11 notation for: of this material is granted provided that ACM's copyright notice is given and that reference is made to the publication, to its date Step A. Transfer control to the routine beginning at of issue, and to the fact that reprinting privileges were granted the location whose address is the value of the Rth by permission of the Association for Computing Machinery. Author's address: Digital Equipment Corporation, 146 Main word of memory. Street, Maynard, MA 01754. Step B. Increment R by one word. 370 Communications June 1973 of Volume 16 the ACM Number 6 Fig. 1. Flow of control: hard Fig. 2 Flow of control:interpre- Fig. 3. Flow of control: threaded code. tive code. code. / / / / t - IITERPRETE~ HAR' U INTERPRETIVE INTERPRETIVE THREADED SERVICE COD SUBROUTINES CODE ROUTINES CODE ROUTINES But note that Step A does exactly what was specified However, there are reasons that C may become in Steps 1 and 2(a) of our description of the threaded unduly bulky: a multiple word instruction or a multiple code machine. Similarly, Step B does exactly what was instruction sequence may be needed to implement some specified in Step 2(b). Thus we have in fact transformed of the C~'s. Some data types (e.g. long floating, double a PDP-11 into a threaded code computer. integer, or complex numbers) may be awkward to If a computer contains instructions which can move. Or some operations may have to be executed by increment a register and can load the PC through two a software routine (e.g. floating operations or integer levels of indirect addressing or the equivalent, then multiply/divide on many small machines; complex such a procedure is possible. If these things can all be arithmetic on almost all machines). done with a single instruction, then a very economical Under such conditions threaded code has the ad- implementation using that instruction is available. vantage of an interpreter (i.e. shorter code than the What we have created is in effect interpretive code hard code) while still being quite competitive with which needs no interpreter. Figures 1, 2, and 3 contrast regard to speed. The speed of the threaded code is due the flow control patterns of threaded, hard and inter- primarily to the fact that the single instruction for pretive code. threading can replace both a call and a return from a subroutine. For example, on the PDP-11 Model 45, threading takes 1.2 ~s whereas a subroutine call plus a 3. The Economics of Threaded Code return takes 2.6 ~s. Of course, an inline code sequence needs no linkage and thus is 1.2 tzs faster than threaded Let us now show the practical value of threaded code. Therefore, threaded code is relatively the most code. As an example let us use the evaluation of arith- attractive where subroutines are frequent. metic expressions. Ignoring unary operators we may Time and space comparisons are heavily influ- visualize such an expression as an infix sequence enced by the examples chosen and the assumptions X1 op2 )(3 op~ ... X,, where each X is a variable name made about the code generated (e.g. whether registers and each op is an arithmetic operation. or a stack is used). Using a large sample of actual This expression can be rearranged by well known FORTRAN programs compiled into threaded code on the methods into a Polish suffix sequence PI P2 P3 • • • P,, PDP-11 computer, we have found threaded code where each Pi represents either variable name or an roughly equal to hard code in speed (typically 2 or 3 operator. A wide variety of code might be generated percent slower) while somewhat shorter (typically to evaluate this sequence. For example, one straight- 10-20 percent). Since this particular compiler generates forward possibility is to generate a sequence C of code threaded code even for those operations such as integer arithmetic, where such a course is not advantageous, it C1 C~ C3 ".- C~ is reasonable to assume that threaded code can be even Here a P~ which is a variable name generates code C~ more beneficial in other applications. to stack the value of that variable, and a P~ which is Threaded code tends to become more attractive as an operator causes code Ci to be generated which program size increases, since the cost of each threaded applies that operation to the stack top. Given suitable service routine can be amortized across more calls. It hardware, the sequence C should execute rapidly. If is also worth noting that threaded code, unlike typical the size of C is sufficiently small, then this "hard code" interpreters, contains only those service routines actu- solution serves well. ally called upon by a given program. 371 Communications June 1973 of Volume 16 the ACM Number 6 4. Variations on Threaded Code L.D. Fosdick and Algorithms A.K. Cline, Editors The previous example assumed a stack was used as Submittal of an algorithm for consideration for publica- the basic discipline for data. Actually this assumption tion in Communications of the ACM implies unrestricted is unnecessary. The threaded code service routines can use of the algorithm within a computer is permissible. Copyright © 1973, Association for Computing Machinery, Inc. pass or receive data according to any convention; they General permission to republish, but not for profit, all or part may even be passed parameters if desired. The param- of'this material is granted provided that ACM's copyright notice eters of a routine can immediately follow the threaded is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted link to the routine.

Threaded Code Is Not in Fact Implemented in the Computer's Hardware

An Architectural Trail to Threaded-Code Systems

Basic Threads Programming: Standards and Strategy

Effective Inline-Threaded Interpretation of Java Bytecode

Csc 453 Interpreters & Interpretation

Static Analysis of Lockless Microcontroller C Programs

A Case Study of Multi-Threading in the Embedded Space

The Flow Programming Language: an Implicitly-Parallelizing, Programmer-Safe Language

Virtual Machine Showdown: Stack Versus Registers

Multiple Cores + SIMD + Threads) (And Their Performance Characteristics

Generating Multithreaded Code from Parallel Haskell for Symmetric Multiprocessors Alejandro Caro

The Vector-Thread Architecture

Threaded Code Variations and Optimizations