Development of a Forth Language Directed Processor Using Very Large Scale Integrated Circuitry

SUSAN C. LEE and JOHN R. HAYES DEVELOPMENT OF A FORTH LANGUAGE DIRECTED PROCESSOR USING VERY LARGE SCALE INTEGRATED CIRCUITRY A 32-bit microprocessor has been developed for use in embedded computer systems. It supports an interactive programming environment on an embedded computer while providing the performance of compiled languages. The microprocessor hardware was designed to execute the Forth programming lan guage directly. INTRODUCTION The Applied Physics Laboratory is heavily involved chitecture as a VLSI circuit chip. The compiler allowed in the development of embedded computer systems. An us to create candidate implementations and find their embedded computer is a special-purpose computer resid speed, size, and power requirements quickly. It also ing in and controlling a piece of hardware such as a mis provided independence from implementation technolo sile or telescope. One characteristic of an embedded gy. The design is specified at a high level and can be system is the expense of software development. The compiled for a large number of possible silicon process hardware environment of an embedded system is usually ing technologies and fabrication lines. As new fabrica inadequate to support its own software development, tion processes are invented, they can be added to the and an extra step must be added to the traditional compiler system, and the design can be recompiled. edit!compile/ debug cycle: the software is edited and The processor is being designed into ground support cross-complied on a general-purpose computer and then equipment for the Ocean Topography Experiment and downloaded to the target computer for debugging, which Navy Radar Altimeter Program altimeters. It is also be usually must be done with logic analyzers and in-circuit ing used as the flight processor for a magnetometer ex emulators. As the computational sophistication of periment on Freja, a Swedish satellite. The processor has embedded processors increases, the effectiveness of those been licensed to a commercial vendor (Silicon Com techniques diminishes and the software development time posers, Inc., Palo Alto, Calif.) and is being marketed increases. under the name scn. The availability of new tools to support the quick de velopment of custom very large scale integrated (VLSI) IMPLEMENTING FORTH IN HARDWARE circuit chips suggested an alternative approach to the Forth is an interactive, interpreted language; state problem. We used Silicon Compiler Systems' Genesil sili ments typed on the keyboard are translated into machine con compiler to explore variations on a 32-bit architec language by a program called the interpreter and are ex ture to implement the Forth language directly. Forth was ecuted immediately. A Forth system can be used as a chosen because of its excellence as an embedded-system kind of calculator by typing statements like "2 + 2" development language. Since it is extremely small, it can in Forth syntax on the keyboard; the Forth interpreter be run on the target system, eliminating cross-compila translates the typed statement, executes it, and prints the tion and downloading. The interactive nature of Forth answer on the terminal. For more complex problems, allows debugging techniques, such as interrogating vari programs can be entered from the keyboard and then ables and checking program flow, to be used on the tar executed by typing their names and arguments. get system without recourse to in-circuit emulators and Interpreted languages have an advantage over com logic analyzers. Although embedded systems have been piled languages in that the ability to test and debug pro developed with Forth using commercial processors, the grams interactively shortens the development cycle time. mismatch between the processor architecture and the This is especially true with the Forth language, whose Forth language causes 351170 to 50% of the processor's simple syntax allows it to be extended uniquely for each time to be wasted in interpretation overhead. By tailor application. Unfortunately, because of the need for run ing our architecture directly to Forth, we can achieve time interpretation, the execution speed of Forth on con significant improvements in processing speed. A 32-bit ventional computers suffers compared with that of com architecture was chosen to allow a large uniform address piled languages. Forth-oriented processor chips, by space. treating Forth as object code, eliminate the performance A key feature of our approach was the use of the penalty of run-time interpretation. Consequently, inter Genesil silicon compiler to implement the processor ar- active Forth programs running on the SC32 execute just 216 Johns Hopkins APL Technical Digest, Volume 10, Number 3 (1989) as fast as equivalent compiled programs on conventional Outer interpreter microprocessors. Forth uses a two-stack programming model: the pa Kernel layer rameter stack, which passes arguments to functions, and Inner interpreter a control flow stack, which primarily holds subroutine return addresses and is called the return stack. Most Processor Forth primitives (built-in functions like + and -) take operands from one or both stacks, push or pop the stacks (increment or decrement the stack pointers), and return a result to one of the stacks. For example, 2 + 3 is calculated in Forth by pushing 2 and 3 onto the pa Conventional hardware rameter stack and executing + , which adds the fIrst two SC32 hardware stack elements and replaces them with the answer 5. Figure 1. Hardware/software boundaries of the Forth virtual Forth is implemented on traditional processors using machine on conventional processors and on the SC32. the approach shown in Figure 1. Because of the mis match between Forth's stack model and the native pro cessor, a layer of run-time interpretation is necessary. is required in a program, several instructions must be A tiny assembly language program called the inner (or stored instead of the single address needed for threaded address) interpreter is written for the bare processor. systems. The resulting object code could be signifIcant Forth's primitive stack operators, also written in assem ly bigger than the size of a thread. The instruction set bly language, are implemented in the kernel layer. The of the SC32 has been designed so that almost all Forth top layer of a Forth system, the interactive outer inter primitives are implemented with one, single-word instruc preter, is written in Forth. tion, so the resulting code is as small as that produced The Forth inner interpreter program on traditional by a threaded system. processors uses a technique called threaded code. 1 The We gained much experience and insight into Forth defInitions of all high-level (nonprimitive) programs con oriented architectures during the design of the SC32'S sist of a list of addresses of the defInitions of the pro predecessors, the Forth Reduced Instruction Set Com gram's constituents (sometimes called threads, hence the puters 1 and 2.2-4 Many elements of the earlier architec term threaded code). The defInitions of Forth primitives tures are clearly visible in the SC32. It is an improvement are in the traditional processor's assembly language. over previous designs with more effIcient load, store, liter When a high-level program is executed, the inner inter al, and branching instructions; better support for mul preter program traces through the list of addresses. Each tiplication and division; and a better approach to stack time the inner interpreter encounters an address in the caching. list, it pushes the return address on the return stack and The SC32 instruction set was explicitly designed to im jumps to the address it found. If there is another ad plement Forth. Each instruction executes in one cycle dress at that location, the process is repeated. The inner except for memory loads and stores, which require two interpreter program nests down as deeply as necessary each. Most of Forth's primitive operations can be rep until a primitive operation dermed in assembly language resented with one instruction. For example, Forth's bi is found, and the assembly language is executed. The nary (two-operand) arithmetic operations, such as + and inner intepreter program then "threads" its way back - , are single instructions and execute in one cycle. Bi to the starting point using the return addresses saved on nary logic operations like and, or, and xor and binary the return stack. Not surprisingly, in Forth systems im comparison operators like =, <, and > also execute plemented on traditional processors, 350,70 to 50% of the in one cycle. In fact, any possible Boolean function of system's time is consumed by the inner interpreter. two variables is possible. The SC32 also has many single The SC32 eliminates this run-time overhead by cycle unary (single-operand) comparison operators and eliminating the inner interpreter. The address of a primi arithmetic instructions, such as increment, decrement, tive is replaced with the actual object code for the primi test for zero, and test for negative. tive. The addresses of other high-level routines are An SC32 instruction can access the top four items on replaced with subroutine calls. At run time, a list of SC32 either the parameter stack or the return stack, thereby instructions is traced instead of a list of addresses. The allowing single-cycle implementation of many Forth inner interpreter has become the fetch-execute cycle of stack manipulation operators (e.g., duplicate the top item the processor. on the stack, drop the top item on the stack, or transfer Readers familiar with advanced Forth implementation an item from the parameter to the return stack). This techniques will realize that this scheme is "subroutine also provides for easy use of two sets of do loop indices. threaded code with in-line expansion." Theoretically, The SC32 instruction set is more general than Forth's nothing precludes using this technique on conventional pure stack virtual machine, and sequences of more than processors, but the mismatch between Forth and typi one Forth primitive can frequently be implemented with cal instruction sets would cause Forth programs to be one instruction.

Load more