Some programmers question certain claims made for threaded-code systems. The author defends these claims by tracing such a system from its origin in simple concepts.

An Architectural Trail to Threaded-Code Systems

Peter M. Kogge, IBM Federal Systems Division

Interest in software systems based on threaded-code concepts has grown remarkably in recent years. Advocates of such systems regularly make claims for them that many classical programmers regard as bordering on the impossible. Typically, proponents claim that it is possible to construct, in 5K to 10K bytes of code, a software package that

* is conversational like APL, Lisp, or Basic;
* includes a compile facility with many high-order language, structured-programming constructs;
* exhibits performance very close to that of machine-coded program equivalents;
* is written largely in itself and consequently is largely portable;
* places no barriers among combinations of system, monitor, or application code;
* can include an integrated, user-controlled virtual memory system for source text and data files;
* permits easy user definition of new data types and structures; and
* can be extended to include new commands written either in terms of existing commands or in the underlying machine language (an assembler for the underlying machine can be implemented in as little as a page of text of existing commands).

Such systems do exist (Forth is perhaps the best known) and have attracted widespread attention, as evidenced by countless implementations, publications, products, and standards groups.

The purpose of this article is to convince the skeptic that these claims are, in fact, achievable without resorting to obscure software magic or arm-waving. The approach is to build a logical (not historical) trail from some rather simple concepts to the outlines of a software system that meets the above claims. A series of generalizations added to a simple view of software leads to the final result, a minimum package approaching the combination of

* a simple instruction-set architecture, or ISA;
* the concepts and syntax of a high-order language, or HOL; and
* a set of commands for a conversational monitor.

The ISA (such as it is) is for an abstract machine that is very efficiently implemented by short code segments in almost any current machine architecture. The HOL concepts include hierarchical construction of programs, block structures, GOTOless programming, and user-definable data structures. The commands for the conversational monitor permit direct interaction with objects defined by the programmer in terms he defines, not simply in the terms of the underlying machine (i.e., core dumps). Further, the command set is directly usable in a conversational mode and includes virtually all commands available in the system; when so executed, they function just as they would if found in an executed program. Finally, programs defined by the programmer are automatically part of the system's command set and can be used in any combination with other commands in either a conversational or program-definition mode. This permits a programmer to develop application-oriented interfaces that are compatible with and run as efficiently as the monitor commands.

Much argument exists as to whether these results define a machine, a language, an architecture, a system, etc. We sidestep this issue by declaring it to be a self-extendable software package.

Discussion base

One of the major tenets of modern software development is the value of modular, structured solutions to problems. In such solutions, the final product is a hierarchy of procedures, each with well defined interfaces and short, easily understood bodies that exhibit minimal side-effects.

When written in either assembly or higher-order language, such structured applications tend to look like this:

APPLICATION    CALL INITIALIZE
               CALL INPUT
               CALL PROCESS
               CALL OUTPUT

INPUT          CALL OPEN
               CALL READ
               CALL CLOSE

PROCESS        CALL STEP 1
               CALL STEP 2
               CALL STEP 3

(Code that handles the passing of operands has been deliberately ignored for the present; it will be discussed later.)

The key thing about such "programs" is that they largely consist of addresses of procedures, with each address preceded by a CALL opcode.

Generalization 1: Removal of opcode. Consider first programs at the next-to-last level of such a hierarchy, those in which all called routines are actually pure machine code with no further nested calls. One way to simplify the representation of such programs is to replace the list of call instructions with a simple list of addresses; a very small machine-language routine would go through this list sequentially, making indirect branches at each step. This clearly reduces storage at some cost in performance but, by itself, seems to yield nothing else. The benefits of this move will be brought out in later generalizations.

As an example, assume that Steps 1 to 3 above are pure machine code. Thus, PROCESS could be replaced by:

PROCESS    I ← Address of pointer to first step (i.e., PLIST)
           Branch to NEXT
PLIST      DW STEP 1
           DW STEP 2
           DW STEP 3

STEP i     Code
           Branch to NEXT

NEXT       W ← Memory(I)          Machine code to start sequencing through the list.
           I ← I + 1              I points successively to each entry.
           Branch to (W)

The code at the beginning of PROCESS is termed its prologue. In each of the called steps, any RETURN instructions would be replaced by a branch to NEXT.

In the literature, this threading of a sequence of subroutines into a list of their entry addresses is termed direct threaded code, or DTC. The small routine NEXT is termed an address interpreter (it has also been called an inner interpreter). In many architectures, it is a single "jump indirect with post increment" instruction. By convention, I and W (as well as X, Y, and TABLE, to be introduced later) are assumed to be registers or dedicated memory locations of the underlying machine. Through the address-interpreter sequences of the machine, these registers or locations direct the machine to carry out the appropriate procedures. Register I points to the next address in the list of procedures to be executed, and W contains that address. Figure 1a summarizes their relationship.
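The mechanics of Generalization 1 can be sketched in a few lines of C, with function pointers standing in for machine addresses and a loop standing in for the jump-indirect-with-post-increment instruction. The routine names and the quit convention used to stop the loop are assumptions of this sketch, not part of any particular system.

#include <stdio.h>

typedef void (*word)(void);            /* a "word" here is just the address of a routine */

static void step1(void) { printf("step 1\n"); }
static void step2(void) { printf("step 2\n"); }
static void step3(void) { printf("step 3\n"); }
static void quit(void)  { }            /* marks the end of the list in this sketch       */

/* PLIST: the body of PROCESS reduced to a list of addresses, with no CALL opcodes. */
static const word plist[] = { step1, step2, step3, quit };

int main(void) {
    const word *I = plist;             /* I points to the next address in the list       */
    word W;                            /* W holds the address fetched from that list     */
    do {                               /* NEXT: fetch, advance, then branch through W    */
        W = *I++;
        W();
    } while (W != quit);
    return 0;
}

The array plist plays the same role as PLIST above: a pure list of addresses, sequenced by a tiny interpreter rather than by CALL opcodes.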
Generalization 2: Higher levels in the hierarchy. This same technique can be used at higher levels in the hierarchy by using a stack to keep track of I values and by preceding each procedure that is above the next-to-last level with a piece of code that stacks the current value of I and resets it to the new list. A new entry is added at the end of each procedure list; it points to a routine that pops the stack to retrieve the I value of the procedure's caller.

INPUT      Push I to STACK             New prologue
           I ← NEW LIST
           Branch to NEXT
NEW LIST   DW OPEN                     Define each word as an address
           DW READ
           DW CLOSE
           DW RETURN

RETURN     Pop from STACK to I
           Branch to NEXT

This stack is often called the return stack.

This technique effectively reverses the position of the information that tells the machine it is entering a new procedure. In the classical approach, the opcodes associated with the addresses are interpreted before the change to a new procedure. In the above approach, the prologue code serves the same function, but is invoked after the address in the list has been used. What has been defined is, in effect, a mechanism for a called routine to determine what kind of routine it is and how to save the return information; in contrast, the classical approach demands that the calling procedure provide this information. In a sense, the machine does not know what kind of routine it is entering until it gets there. This self-definition capability is the keystone for many of the capabilities to be discussed below.

Generalization 3: Commonality of prologue. As defined above, every single procedure constructed from a list of calls to other procedures must have its own carbon copy of the prologue code. A programmer's typical response to such a situation is to look for ways of avoiding this duplication. In one common approach, he stores at the entry point of the procedure not a copy of the prologue code, but the address of a single routine that carries out the function of the prologue code. Thus, all procedures that use the same prologue start with an address that points to the actual routine.

Placement of an address rather than actual code at the entry to a procedure requires a change in the address interpreter. Instead of branching directly to the start of the next procedure, the address interpreter must branch indirectly through the first word of that procedure. This corresponds to something like the following:

NEXT       W ← Memory(I)
           I ← I + 1
           X ← Memory(W)
           Branch to (X)

If a procedure really is to be implemented in machine code, the address at its entry typically points to the very next word.

Figure 1b summarizes the relation between I, W, and X, and Figure 2 diagrams a fairly complex case. The prologue is called ENTER; EXIT is the procedure to return to the caller. One important aspect of this is that most of the code is largely machine-independent. Only at the lowest level do we see real machine/architecture dependencies. All other levels use addresses, which can conceivably be transported unchanged to another machine with an entirely different architecture.

Code written for this new address interpreter is often called indirect threaded code, or ITC. It is slightly slower than the earlier version, but provides savings in storage and machine independence and permits some of the interesting generalizations discussed below. If more speed is desired, this function and that of any of the other address interpreters can be microcoded or added to the hardware of the underlying machine.

The word addressed by W is the prologue address pointer, or PAP (very often referred to as the code address word, or CAW). Again, W is the address of the PAP. In the above description of NEXT, X is the contents of the PAP. This value is the address of the prologue machine code for the new procedure.

A variation of this approach replaces this entry-address PAP with a token, which serves as an index into a table of possible prologue codes. This adds one more level of indirection to the address interpreter, but provides even more machine independence to the total program. The technique is often called indirect token-threaded code, or ITTC. Figure 1c diagrams it in more detail. Table 1 compares DTC, ITC, and ITTC.
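A compact C sketch of the indirect-threading and return-stack machinery just described follows. A C function pointer stands in for the PAP, do_enter for the ENTER prologue, and a small array for the return stack; the structure layout and all names are assumptions of the sketch rather than a prescribed format.

#include <stdio.h>

struct word;
typedef void (*prologue)(struct word *w);   /* prologue code, entered with W in hand  */

struct word {
    prologue      code;                     /* the PAP, or code-address word          */
    void        (*prim)(void);              /* machine code, used by primitive words  */
    struct word **body;                     /* address list, used by high-level words */
};

static struct word **I;                     /* instruction pointer                    */
static struct word **rstack[32];            /* return stack of saved I values         */
static int           rsp;

static void do_prim(struct word *w)  { w->prim(); }                        /* run it  */
static void do_enter(struct word *w) { rstack[rsp++] = I; I = w->body; }   /* ENTER   */

static void prim_exit(void)  { I = rstack[--rsp]; }                        /* EXIT    */
static void prim_hello(void) { printf("hello\n"); }

static struct word EXIT  = { do_prim, prim_exit,  0 };
static struct word HELLO = { do_prim, prim_hello, 0 };

static struct word *greet_body[] = { &HELLO, &HELLO, &EXIT };
static struct word GREET = { do_enter, 0, greet_body };   /* a word that is pure addresses */

int main(void) {
    struct word *top[] = { &GREET, 0 };     /* a top-level list ended by a null entry */
    I = top;
    while (*I) {                            /* NEXT: W ← Memory(I); I ← I + 1; then   */
        struct word *W = *I++;              /* branch indirectly through W's PAP      */
        W->code(W);
    }
    return 0;
}

Here GREET contains nothing but addresses; only the primitives at the bottom, and the two prologues, hold anything machine-specific.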
Generalization 4: A stack. These parameter lists of addresses are beginning to resemble a program in an ISA, where each instruction consists of only an opcode, the address of a routine. What is needed to help complete this ISA is some way to pass operands. The simplest such mechanism, one used in many other ISAs that have no explicit operand specifiers, is a second stack. Any instruction that needs operands as inputs takes them from the top of this stack; any data produced goes back onto the stack, which is commonly called the parameter stack. It is distinct from the return stack discussed above.

As a trivial example, the machine code for a procedure to add two numbers looks like this:

ADD        DW * + 1                    Prologue address pointer
           Pop PSTACK to Z             Get operands
           Pop PSTACK to Y
           Add Z to Y                  Do operation
           Push Y to PSTACK            Save result
           Branch to NEXT

The PAP points to the actual machine code. This, in turn, pops two values off the stack, adds them, and returns the result. The asterisk marks the address of the current location; thus, * + 1 points to one location beyond the PAP.

Many actual stack-oriented ISAs keep all information on a single stack; the advantage of two stacks lies in both simplicity of implementation and the reduced conceptual complexity they present to a human programmer. Operands are always found on the parameter stack and procedure call information on the return stack. The advantage of this will become increasingly apparent to the reader in the following sections.
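The parameter-stack discipline of Generalization 4 can be sketched independently of the threading machinery. In the sketch below, push, pop, and the two arithmetic words are illustrative names, and the example sequence 10 20 30 * + anticipates the reverse Polish notation discussed in the next generalization.

#include <stdio.h>

static int pstack[32];                 /* the parameter stack                     */
static int psp;                        /* index of the next free slot             */

static void push(int v) { pstack[psp++] = v; }
static int  pop(void)   { return pstack[--psp]; }

/* The ADD primitive of the text: pop two operands, push their sum. */
static void add(void) { int z = pop(); int y = pop(); push(y + z); }
static void mul(void) { int z = pop(); int y = pop(); push(y * z); }

int main(void) {
    /* The sequence 10 20 30 * +, evaluated one word at a time, left to right. */
    push(10); push(20); push(30);
    mul();                             /* 20 * 30                                 */
    add();                             /* 10 + 600                                */
    printf("%d\n", pop());             /* prints 610                              */
    return 0;
}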

Figure 1. Types of address interpreters (or, Where is the machine code now?): (a) direct threaded code, (b) indirect threaded code, and (c) indirect token-threaded code. In (c) the address interpreter is NEXT: W ← Memory(I); I ← I + 1; X ← Memory(W); Y ← TABLE(X); branch to (Y).

Table 1. Comparison of DTC, ITC, and ITTC.

Type of address     Value of registers                    Address of executed
interpreter         I        W        X        Y          machine code
DTC                 100      200      -        -          200
ITC                 100      200      300      -          300
ITTC                100      200      30       300        300

Generalization 5: Reverse Polish notation. Existence of a parameter stack permits use of reverse Polish notation, or RPN, for specification of a series of operations. In the written form of such a notation, operands come before the operations, and evaluation proceeds left to right, one operation at a time. There are no parentheses, and no precedence is given to one operator over another. This notation is used in solving problems on many calculators.

As an example, consider the polynomial evaluation

AT² + BT + C = (AT + B)T + C

When expressed in a written RPN, this becomes

A T * B + T * C +

Such an expression is read from left to right in the following manner:

(1) Put A on the stack.
(2) Put T on the stack.
(3) Multiply the top two entries (A and T).
(4) Put B on the stack.
(5) Add the top two entries (AT and B).
(6) Put T on the stack.
(7) Multiply the top two entries (AT + B and T).
(8) Put C on the stack.
(9) Add the top two entries ((AT + B)T and C).

If we could come up with a way of defining threaded-code procedures to access A, B, C, and T, the above computation could be carried out by a series of nine addresses in exactly the above order. No general registers, accumulators, or other constructs would be needed. This simplification is the key to maintaining the concise user interface described below.

Generalization 6: Named entities. Despite the advantages of parameter stacks and reverse Polish notation, in many classical programs and programming languages a need remains for binding the values of named variables, arrays, records, etc., to explicit storage locations that are less transient than entries on a stack. These can be added to our developing threaded-code architecture by defining special prologue routines that use information from the address interpreter, namely address W of the PAP, to identify unique storage locations.

As an example, a procedure that delivers a constant to the parameter stack might have two entries: the PAP pointing to a special prologue and a word holding the desired value. The prologue code might look like this:

CONSTANT   Z ← Memory(W + 1)           Get the value
           Push Z to PSTACK            Push it
           Branch to NEXT

Again, as many constants as needed can be defined. Each will have its own unique value, but will contain a PAP pointing to the same prologue. Cases in which the values change frequently can be handled similarly; such structures are called variables. Here the prologue code stacks not the contents of location W + 1 but its address:

VARIABLE   Z ← W + 1
           Push Z to PSTACK
           Branch to NEXT

To find the value of a variable, a second procedure must be called. This procedure, often called @ (AT), uses the top entry of the stack as an address and replaces it with the value of that location.

Figure 2. Indirect threaded code. Asterisk marks address of this memory location.

If defined as a machine-coded primitive, it might look like this:

AT         DW * + 1                    PAP
           Pop PSTACK to Z
           Y ← Memory(Z)
           Push Y to PSTACK
           Branch to NEXT

Changing the value of a variable also requires a new procedure, often called ! (STORE). This procedure removes two operands from the parameter stack; one is the value, the other the address, as delivered by an earlier VARIABLE. Defined in machine code, it would look like this:

STORE      DW * + 1                    PAP
           Pop PSTACK to Z             Get address
           Pop PSTACK to Y             Get new value
           Memory(Z) ← Y               Save it
           Branch to NEXT

The important concept behind CONSTANT and VARIABLE, as defined above, is that the access to a data structure can be defined as a prologue code that is both addressed by the PAP and works on a data area starting at location W + 1 (one word beyond the PAP). Under this definition, it is possible to define any kind of a data structure. For example, an array procedure could use the top of the parameter stack as an index, the contents of W + 1 as a dimension limit, and W + 2 as the start of the body of the array. The prologue code referenced at the beginning of an array definition might look like this:

ARRAY      Pop PSTACK to Z             Get index
           If Memory(W + 1) ...

Conditional execution can be provided by a procedure, often called BRANCH0, whose address in a threaded list is followed by a displacement word. If the top of the parameter stack is zero, the displacement is added to I; if the top of the stack is nonzero, the operation increments I over the displacement. If the ITC address interpreter is used, then I, in the body of the branch, points to the displacement directly.

In the threaded list:   ... BRANCH0  DISP ...

BRANCH0    DW * + 1                    PAP
           Pop PSTACK to Z             Machine code for branch on 0
           If Z = 0 then I ← I + Memory(I)
           else I ← I + 1
           Branch to NEXT

The above branch and its variations are equivalent to what is found in most classical ISAs. Another form, particularly useful for loops, is a conditional procedure exit. It pops the return stack into I only if the top of the parameter stack is zero. Otherwise, the return stack is left untouched. Coupled with this are procedures to mark the return stack with an address for future use by this conditional return.

Yet another form would be a conditional procedure call:

In the threaded list:   ... CALL0  FOO ...

CALL0      DW * + 1                    PAP
           Pop PSTACK to Z
           If Z ≠ 0 then
               Push I + 1 to RSTACK
               I ← Memory(I)
           else I ← I + 1
           Branch to NEXT
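The defining prologues above (CONSTANT, VARIABLE, @, and !) can be mimicked in C by giving each word a code pointer followed by a one-cell data area, the analogue of a PAP with a body at W + 1. The structure layout, the intptr_t cells, and the word names are assumptions of this sketch, not a fixed format.

#include <stdio.h>
#include <stdint.h>

static intptr_t pstack[32];
static int      psp;
static void     push(intptr_t v) { pstack[psp++] = v; }
static intptr_t pop(void)        { return pstack[--psp]; }

/* Each entry here is a code pointer followed by a one-cell data area,
   the analogue of a PAP with a body starting at W + 1.                 */
struct word { void (*code)(struct word *w); intptr_t body; };

static void do_constant(struct word *w) { push(w->body); }             /* CONSTANT */
static void do_variable(struct word *w) { push((intptr_t)&w->body); }  /* VARIABLE */

/* The @ (fetch) and ! (store) primitives of the text. */
static void fetch(void) { intptr_t *z = (intptr_t *)pop(); push(*z); }
static void store(void) { intptr_t *z = (intptr_t *)pop(); intptr_t y = pop(); *z = y; }

static struct word TEN = { do_constant, 10 };   /* a constant whose value is 10 */
static struct word X   = { do_variable, 0  };   /* a variable                   */

int main(void) {
    TEN.code(&TEN);                    /* push 10                                */
    X.code(&X);                        /* push the address of X's data cell      */
    store();                           /* X gets 10                              */
    X.code(&X);                        /* push the address again                 */
    fetch();                           /* replace it with the stored value       */
    printf("%ld\n", (long)pop());      /* prints 10                              */
    return 0;
}

All three words share one of two prologues; only the one-cell body distinguishes them, which is exactly the economy the text describes.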

Access to the iteration variable can be obtained by simple procedures that copy from the top of the return stack to the parameter stack. Such facilities are provided in many threaded-code systems.

Partial summary. What we have so far is the outline of the architecture of a simple ISA that has

* an indirect code access mechanism,
* two stacks,
* reverse Polish notation,
* user-definable data structures,
* extensible operations,
* a hierarchical framework, and
* the ability to directly support many HOL constructs.

An application written in such a system would consist of

* a small set of small prologue routines;
* a basic set of procedures written in the ISA of the underlying machine; and
* a set of procedures constructed from pointers to prologues and sequences of addresses pointing to either this set or the basic set of procedures.

There is a performance difference between this and an application written totally in the underlying machine's ISA. It consists of the costs of the address interpreter and prologue codes and the possible loss of machine register optimization for parameter passing. The cost of the first two items tends to be small for many modern ISAs; that of the third must be weighed against the standards imposed for procedure calls (such as register saves and restores) in many software systems and the less than optimal code produced by many compilers and programmers required to produce a great deal of code.

Benchmarks show typical performance penalties for threaded code versus direct assembly coding to run in the range of 2:1 for eight-bit microprocessors and 1.2:1 for larger machines. This is in contrast to orders-of-magnitude losses when more conventional interpreters (such as those for Basic) are used.

This small loss in performance is usually countered by a decrease in program size and a gain in machine independence. The only machine-dependent parts of an application package are the prologue code and the basic procedure set, each of which is relatively easy to transport to a different machine. The bulk of the application consists of addresses, which can, conceivably, be transported intact to another machine.

While these results meet some of the claims listed in the introduction, they do not address others.

Figure 3. Procedure for polynomial evaluation; a detailed example that incorporates generalizations 1 through 6.

Compile capability and interactiveness, for example, can only be achieved through certain programs that can be written in the architecture. The following subsections develop these programs and related concepts.

Generalization 9: The dictionary. The first step toward making a collection of threaded-code procedures interactive with a human is to provide some mechanism to translate the symbolic name of a procedure into its definition as a PAP and body. The simplest and most common mechanism is to precede each stored PAP with three items:

* the symbolic name of the procedure as a character string;
* a pointer to another procedure's name; and
* certain other information that is relevant at compile time.

The pointer field is used to link the names of the procedure set together in a list, called a dictionary. Starting with a symbolic name, a search of this dictionary can return a pointer to the appropriate PAP and body. All the standard search speed-up techniques, such as hash tables, are applicable and have been used in various systems. Figure 4 diagrams part of a possible dictionary.

The text interpreter. The combination of a dictionary and the text interpreter (sometimes called the outer interpreter) forms the basis of an interactive computing capability. The text interpreter is a simple program that accepts input characters from a terminal; as soon as a name is complete, it searches the dictionary for a match. If one is found, the entry in the dictionary can be executed. If no match is found, the program attempts to determine whether or not the name is a number. If it is, its translated value is left on the parameter stack. If not, an error message is sent. In either case, the text interpreter then returns for more input.

The text interpreter is extremely simple, but has two key aspects that give it a true interactive capability. First, the program is designed so that when it actually executes a user-specified procedure, there is nothing from the text interpreter on either the parameter stack or the return stack. The user therefore perceives both stacks as his and feels free to store information on them. Second, procedures are executed in the order in which they are typed in, from left to right. Thus, when a reverse Polish notation is used to express a computation, it corresponds exactly to the equivalent sequence of procedure names to be typed into the text interpreter. Thus, the sequence 10 20 30 * + PRINT enters 10, 20, and 30 onto the parameter stack, multiplies 20 by 30, adds 10 to the product, and uses the procedure PRINT to print the results.

In contrast, most other interactive systems, such as Basic and APL, require significant syntactic processing on whole lines of text before execution can begin. There is no need for equivalent syntactic analysis in this system.
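A minimal C sketch of the dictionary just described follows; each header carries a name and a link to the previous entry, and find performs the linear search that hash tables could speed up. The field names and the three sample entries are assumptions of the sketch.

#include <stdio.h>
#include <string.h>

/* One dictionary header: the symbolic name, a link to the previous entry, and
   (not modeled here) the compile-time information; the PAP and body would follow. */
struct header {
    const char    *name;
    struct header *link;
};

static struct header h_dup  = { "DUP",  NULL   };
static struct header h_add  = { "+",    &h_dup };
static struct header h_poly = { "POLY", &h_add };
static struct header *dictionary = &h_poly;      /* newest entry first */

/* A linear search; hash tables or other speed-ups could replace this. */
static struct header *find(const char *name) {
    for (struct header *h = dictionary; h; h = h->link)
        if (strcmp(h->name, name) == 0) return h;
    return NULL;
}

int main(void) {
    printf("+    %s\n", find("+")   ? "found" : "not found");
    printf("FOO  %s\n", find("FOO") ? "found" : "not found");
    return 0;
}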

Figure 4. Sample dictionary format.

Generalization 10: The compile state. Since the system has no need for syntactic analysis, it is possible to implement a one-pass compile facility. Basically, when this facility is triggered by a user command, it establishes the name of a new procedure in the dictionary. Then, instead of executing the subsequent procedure name inputs, it simply stores the addresses of their PAPs in successive locations of the body of the new procedure. Detection of a termination command completes the compilation and returns the text interpreter to its normal state.

This compilation facility is typically implemented by modifying the text interpreter so that it has two states and then defining two procedures to govern transitions between states. State zero corresponds to the execution state, in which detection of a procedure name in the input stream causes immediate execution of its PAP and body. In state one, the compile state, detection of most procedure names causes not their execution, but simply the addition of a copy of the address of their PAPs to the body of a new procedure definition.

In many systems, the two special state-controlling procedures have the symbolic names : and ;. The procedure : (DEFINE), when executed, uses the next symbolic name from the input stream not in a dictionary search, but as the information needed to start a new dictionary entry. This new entry is linked to the previous dictionary entries in the appropriate fashion and is given a PAP that points to the ENTER prologue. The body is left open, and the state of the outer interpreter is set to one. After execution of a :, the outer interpreter takes names from the input stream, looks up each in the dictionary, and (with certain exceptions) adds a copy of the address of its PAP to the new entry. This is an incremental compilation.

The procedure ; (END-DEFINE) is an example of a set of procedures that should be executed even in compile state. A typical way to make this determination is to include in each dictionary definition a precedence bit. If the bit is a one, it implies that the procedure should be executed in either state; if zero, it implies that it should be executed only in the execute state.

This precedence bit is set in the definition of ;. Thus, when ; is encountered in the input stream, it is always executed. Its execution causes completion of the current definition by adding a final address (that of the EXIT procedure used in the previous examples) and by resetting the state to zero.

As an example, the following text is sufficient to define the polynomial evaluation routine shown in Figure 3.

: POLY   DUP A @ * B @ + * C @ + Z ! ;

The new procedure's name is POLY; assume that previous definitions have set up A, B, C, and Z as earlier defined. Figure 5 diagrams the changes in state and dictionary as this input stream is encountered.

Figure 5. Sample compilation.

Finally, Figure 6 is a flowchart of the text interpreter. The part in the dashed box is the entire code needed to achieve a basic compile facility. All of these procedures can themselves be defined (or redefined dynamically by the user) by a : ... ; procedure. For example, the entire text interpreter procedure can be written comprehensibly in six lines of source text (not counting the lower-level procedures it calls).
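The two-state outer interpreter and the precedence bit can be sketched in C as follows. This is a toy, not a faithful implementation: recursion stands in for the return stack, numeric literals are handled only in the execute state (a literal-compiling word is omitted), and all names are assumptions of the sketch.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXBODY 32

static int pstack[32], psp;
static void push(int v) { pstack[psp++] = v; }
static int  pop(void)   { return pstack[--psp]; }

struct entry {
    const char   *name;
    struct entry *link;
    int           precedence;                 /* 1: execute even in compile state  */
    void        (*prim)(struct entry *self);  /* behavior when executed            */
    struct entry *body[MAXBODY];              /* compiled addresses, colon words   */
    int           len;
};

static struct entry *latest;                  /* newest dictionary entry           */
static struct entry *open_def;                /* definition now being compiled     */
static int           STATE;                   /* 0 = execute, 1 = compile          */
static char         *input;                   /* remaining terminal input          */

static char *next_name(void) {                /* next blank-delimited name         */
    char *t = strtok(input, " ");
    input = NULL;                             /* later calls continue the same line */
    return t;
}

static void execute(struct entry *e) { e->prim(e); }

/* Primitive behaviors. */
static void p_dup(struct entry *e)   { (void)e; int t = pop(); push(t); push(t); }
static void p_mul(struct entry *e)   { (void)e; int z = pop(); push(pop() * z); }
static void p_print(struct entry *e) { (void)e; printf("%d\n", pop()); }
static void p_colon(struct entry *e) {        /* run a compiled body; recursion    */
    for (int i = 0; i < e->len; i++)          /* stands in for the return stack    */
        execute(e->body[i]);
}
static void p_define(struct entry *e) {       /* ":"  start an entry, enter state 1 */
    (void)e;
    struct entry *n = calloc(1, sizeof *n);
    n->name = next_name();
    n->link = latest;  latest = n;
    n->prim = p_colon;
    open_def = n;  STATE = 1;
}
static void p_end(struct entry *e) {          /* ";"  close it, back to state 0     */
    (void)e;  open_def = NULL;  STATE = 0;
}

static struct entry *find(const char *name) {
    for (struct entry *e = latest; e; e = e->link)
        if (strcmp(e->name, name) == 0) return e;
    return NULL;
}

static void add_word(const char *name, int precedence, void (*prim)(struct entry *)) {
    struct entry *n = calloc(1, sizeof *n);
    n->name = name;  n->link = latest;  n->precedence = precedence;  n->prim = prim;
    latest = n;
}

/* The two-state text interpreter: compile most names in state 1, execute otherwise. */
static void interpret(const char *name) {
    struct entry *e = find(name);
    if (e) {
        if (STATE == 1 && !e->precedence) open_def->body[open_def->len++] = e;
        else                              execute(e);
    } else {
        push(atoi(name));                 /* numbers, execute state only            */
    }
}

int main(void) {
    add_word("DUP",   0, p_dup);
    add_word("*",     0, p_mul);
    add_word("PRINT", 0, p_print);
    add_word(":",     0, p_define);
    add_word(";",     1, p_end);          /* precedence bit set, like the text's ;  */

    static char line[] = ": SQUARE DUP * ; 7 SQUARE PRINT";
    input = line;
    for (char *name = next_name(); name; name = next_name())
        interpret(name);
    return 0;                              /* prints 49                             */
}

Feeding it : SQUARE DUP * ; 7 SQUARE PRINT compiles SQUARE incrementally, one name at a time, and then executes it, mirroring the flow just described.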

Figure 6. Text interpreter with compile facility. Dotted box contains all basic compile functions.

Generalization 11: Structured code. The ability to execute procedures during the compilation of another gives the system a powerful macro-like capability. Perhaps the most useful of such "macros" are those that implement most of the block structures found in many HOLs. Consider, for example, the following definition:

: TEST   X @ <0   IF   0 X !   ELSE   Z @ X !   ENDIF ;
         test X        then           else
         for <0        part           part

This tests the variable X. If X is less than zero, the definition zeros X. If X is not negative, it sets X to the current value of Z.

Of the words used in this definition, IF, ELSE, and ENDIF all have their precedence bits set, which means that they are executed during compile time. Typical definitions of each are as follows:

(1) IF. At compile time, this procedure adds to the current definition a pointer to the BRANCH0 procedure. It also leaves on the parameter stack the address of the memory word that follows the BRANCH0. This word should contain the displacement to the beginning of the ELSE code, but since the ELSE has not yet been encountered, it cannot be filled in. The parameter stack is a convenient place to store a marker noting this fact.

(2) ELSE. When ELSE is encountered, all the code for the THEN part of the IF-THEN-ELSE structure has been compiled. The ELSE procedure thus codes into the new definition the address of a BRANCH procedure, which should branch around the as yet undefined ELSE code. Again, since this displacement is not known, the ELSE procedure leaves on the parameter stack the address of the word that is to contain the displacement. Further, it takes the previous top of the parameter stack (the address of the displacement left from IF) and fills in the specified location with a displacement to the next free word in the new definition.

(3) ENDIF. When this is encountered, the IF-THEN-ELSE is complete. No code is generated; instead, the branch at the end of the THEN code is fixed up. Again, the address of the word to be modified is on the parameter stack at compile time.

Figure 7 diagrams this process in action. The parameter stack is used at compile time to hold compiler-relevant data. One of the key advantages of using a stack is that these IF-THEN-ELSE blocks can be nested; the algorithms, as specified, handle the nesting just as one would expect.

Figure 7. Sample compilation of the if-then-else block.

Similar constructs can implement virtually all of the other block structures found in most HOLs, such as DO-UNTIL, DO-LOOP, and BEGIN-END. Their definitions are simple enough for a user to understand and even modify.
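The compile-time behavior of IF, ELSE, and ENDIF reduces to compiling branch cells and backpatching displacements remembered on a stack. The following C sketch models only that bookkeeping, over an array of integer cells standing in for the body of the open definition; the marker values and names are assumptions of the sketch.

#include <stdio.h>

#define BRANCH0  -1             /* branch if top of parameter stack is zero    */
#define BRANCH   -2             /* unconditional branch                        */

static int body[64], here;      /* the open definition and its next free cell  */
static int cstack[16], csp;     /* compile-time use of the parameter stack     */

static void comma(int cell) { body[here++] = cell; }

/* IF: compile BRANCH0 and an empty displacement; remember where it is.        */
static void c_if(void)    { comma(BRANCH0); cstack[csp++] = here; comma(0); }

/* ELSE: compile BRANCH around the else-part, then resolve IF's displacement.  */
static void c_else(void)  {
    comma(BRANCH); int fwd = here; comma(0);
    int patch = cstack[--csp];
    body[patch] = here - patch;          /* displacement to the start of ELSE code */
    cstack[csp++] = fwd;
}

/* ENDIF: resolve the outstanding displacement to the next free cell.          */
static void c_endif(void) { int patch = cstack[--csp]; body[patch] = here - patch; }

int main(void) {
    /* Compiling the shape of:  ... IF <then-code> ELSE <else-code> ENDIF ...  */
    c_if();
    comma(100); comma(101);              /* stand-ins for the THEN-part cells   */
    c_else();
    comma(200);                          /* stand-in for the ELSE-part cell     */
    c_endif();

    for (int i = 0; i < here; i++)       /* show the compiled cells             */
        printf("%2d: %d\n", i, body[i]);
    return 0;
}

Because the unresolved displacement addresses live on a stack, nested IF ... ENDIF blocks resolve correctly with no additional machinery, as noted above.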

Generalization 12: Defining machine-coded procedures. One of the advantages of having the PAP identify the procedure type is that it makes it possible to design and debug an entire application in a totally machine-independent fashion by using the previously discussed compile facility. Then, one can go back and convert selected procedures to machine code for better performance. As long as the parameter-passing interfaces (i.e., the two stacks) are maintained, the rest of the code need not be modified in the slightest.

The process of generating a machine-coded procedure can, in fact, be much like that of generating the compiler. Two procedures, often called CODE and END-CODE, work similarly to : and ;, except that they do not change the compile state. The PAP of the new entry started by CODE points not to ENTER, but to the start of the machine code to be produced; on exit, machine code that branches to NEXT is added. Between CODE and END-CODE are operands and opcode mnemonics. Actually, these are previously defined procedure names that, when executed, produce the appropriate code. By reversing the order of operands and opcodes, the operand procedures can, at code generation time, leave on the parameter stack information as to the type of operands to be used. The opcode procedure can then use the information immediately. This eliminates the need for syntactic analysis.

It also allows the programmer to perform arbitrary calculation at assembly time in support of the assembly. This, again, is a macro-like capability.

As in the compile mode, it is possible to build in block-structured procedures that automatically and directly generate branch code. For most cases, this matches the programmer's intent without requiring him to define branch labels or branch instructions.

Figure 8 diagrams a sample assembly for a typical microprocessor. For this machine, an entire assembler can be written in as little as one page of source text.

Figure 8. Sample machine-coded procedure (a two-byte code-block move routine written between CODE and END-CODE).

Advanced concepts

Many implementations of threaded-code systems include features that do not follow directly from any of the above discussions.

Vocabularies. The first of these features is the concept of vocabularies. While the reader might have assumed from the previous discussion that the entire dictionary is linked into a single (perhaps hashed) list, this need not be the case. Collections of procedures that are part of an overall function, such as assembling machine code, can be separated from other procedures and given a name. This name is that of a specialized vocabulary. Users are quite free to build vocabularies for specialized applications where, for example, the entire interface specification to the eventual user is basically a description (names and functions) of the procedures in that vocabulary. This is similar to, although not as comprehensive as, the capabilities found in large software systems and such modern programming systems as DoD's Ada.

Another advantage of vocabularies is the high speed and ease with which source text can be compiled. This permits a programmer to keep the original source for specialized vocabularies on a cheap storage medium, like disk, and read and compile it into the dictionary only when needed. Support tools like linkage editors, loaders, and relocators are not needed.
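One way to realize vocabularies, assumed in the sketch below, is to keep a separately named dictionary list per vocabulary and to search a caller-chosen ordered set of them; the ASSEMBLER and FORTH vocabularies and their contents here are purely illustrative.

#include <stdio.h>
#include <string.h>

struct header     { const char *name; struct header *link; };
struct vocabulary { const char *name; struct header *latest; };

static struct header a_mov = { "MOV", NULL };
static struct header a_jmp = { "JMP", &a_mov };
static struct vocabulary assembler = { "ASSEMBLER", &a_jmp };

static struct header f_dup = { "DUP", NULL };
static struct header f_add = { "+",   &f_dup };
static struct vocabulary forth = { "FORTH", &f_add };

/* Search a list of vocabularies in order, newest entries first within each. */
static struct header *find(const char *name, struct vocabulary **order, int n) {
    for (int i = 0; i < n; i++)
        for (struct header *h = order[i]->latest; h; h = h->link)
            if (strcmp(h->name, name) == 0) return h;
    return NULL;
}

int main(void) {
    struct vocabulary *order[] = { &assembler, &forth };   /* e.g., while assembling */
    printf("%s\n", find("MOV", order, 2) ? "MOV visible" : "MOV hidden");
    printf("%s\n", find("DUP", order, 2) ? "DUP visible" : "DUP hidden");
    return 0;
}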
Screens. Another concept used in many implementations is that of a memory hierarchy based on units of storage called screens, or blocks. Each unit is roughly equivalent to the amount of information that can be placed on a video display screen at one time. Many of these units are available to the programmer; each has a unique number. Although conceptually on disk, copies can exist within storage buffers in main memory. These, in turn, are managed like a cache, with references to the slower backing medium made only when necessary. Both text and data can be kept in these screens and accessed by simply calling up a small set of procedures. For example, the command 103 LOAD might compile the source code on screen 103 into the current dictionary. Commands on this screen might themselves direct compilation on yet other screens, allowing hierarchies, vocabularies, and libraries to be built up easily. Use of named constants permits a symbolic reference to units, such as ASSEMBLER LOAD, where ASSEMBLER is a constant that returns the appropriate screen number.
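A C sketch of the screen/block mechanism follows: numbered fixed-size units, a handful of main-memory buffers managed roughly like a cache, and a read from the backing medium only on a miss. The block size, buffer count, round-robin replacement, and the simulated backing store are all assumptions of the sketch.

#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 1024
#define NBUF       4

struct buffer { int blockno; int in_use; char data[BLOCK_SIZE]; };
static struct buffer buf[NBUF];
static int victim;                       /* next buffer to reuse, round robin     */

/* Stand-in for the disk: each block is just filled with a recognizable tag.     */
static void read_backing_store(int n, char *dest) {
    printf("(reading screen %d from the backing store)\n", n);
    memset(dest, 0, BLOCK_SIZE);
    snprintf(dest, BLOCK_SIZE, "source text of screen %d", n);
}

/* BLOCK: return a main-memory copy of screen n, reading it only if necessary.   */
static char *block(int n) {
    for (int i = 0; i < NBUF; i++)
        if (buf[i].in_use && buf[i].blockno == n) return buf[i].data;
    struct buffer *b = &buf[victim];
    victim = (victim + 1) % NBUF;        /* a write-back step would go here       */
    b->in_use = 1;  b->blockno = n;
    read_backing_store(n, b->data);
    return b->data;
}

int main(void) {
    printf("%s\n", block(103));          /* what 103 LOAD would go on to compile  */
    printf("%s\n", block(103));          /* second reference: served from a buffer */
    return 0;
}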
History

Although the idea of threaded code has been around for a while, Charles H. Moore created nearly all the other concepts discussed in this article. He first produced a complete threaded-code system for real-time control of telescopes at the Kitt Peak Observatory in the early 1970's. This system, and many of its successors, was called Forth, a foreshortening of "Fourth" for fourth-generation computer. A version of Forth is a standard international computer language for astronomers. Other major, but separate, implementations include Stoic, IPS, and Snap. Stoic has found use in hospital systems, IPS (developed in Germany) in earth satellite control, and Snap in several new hand-held computers.

Interest in such systems has increased dramatically since the introduction of personal computers; implementations are available for almost any make. Standards groups formed in the late 1970's are engaged in defining systems that are consistent and compatible over a wide range of hardware configurations. In the last two years, at least three books and dozens of articles on the subject have been published.

Interest in Forth developed more slowly in industry than it did in academic and personal computing, but seems to be picking up there as well. Several groups, for example, are developing initial support software for a new military computer architecture using such systems as a development tool. Others have found up to 80 percent commonality between the ISA of threaded-code systems and architectures like P-Code (a standard intermediate language for Pascal). This commonality permits consideration of hardware designed to efficiently execute either.

Conclusions

Threaded-code systems represent a synergistic combination of techniques that are, by themselves, of limited utility but together provide unique capabilities. Threaded code, particularly indirect threaded code, provides machine independence, portability, performance, and extensibility. Stacks and reverse Polish notation yield simplicity and the possibility of one-pass compilation. The ability to define certain routines in the underlying ISA provides performance when it is really needed and easy handling of new or diverse I/O devices and peripherals. Dictionaries and the text interpreter integrate these features into a highly interactive programming resource that the average programmer finds comprehensible from top to bottom.

In a very real sense, threaded-code systems represent neither a machine-level ISA nor a HOL and all the system support that goes with it. Instead, they represent a bridge between the two and give the programmer the features and flexibility of either or both, in the combination that best matches the application.

Although they are certainly not the ultimate or universal programming systems, threaded-code systems should find a secure niche where small-to-moderate specialized interactive application packages, real-time control of external devices and sensors, and engineering-type throwaway code dominate. Examples include video games, test equipment, distributed control systems, hardware simulation packages, and specialized software and hardware development tools.

Acknowledgment

I would like to thank D. Colburn for his helpful suggestions regarding terminology.

Peter Kogge, a senior engineer, is responsible for systems architecture in the MCF project at IBM's Federal Systems Division facility in Owego, New York. Prior to this, he was responsible for much of the organization of the IBM 3838 array processor and the Space Shuttle input/output processor. He has taught at the University of Massachusetts and is currently an adjunct professor at the State University of New York at Binghamton. His research interests include high-speed algorithms and computer organization. He is the author of The Architecture of Pipelined Computers, published by McGraw-Hill in 1981. Kogge received the BS degree in electrical engineering from the University of Notre Dame in 1968, the MS in systems and information sciences from Syracuse University in 1970, and the PhD in electrical engineering from Stanford University in 1973.