Richard L. Sites, Anton Chernoff Matthew B. Kirk, Maurice P. Marks, and Scott G. Robinson

Binary Translation

hen Digital started to design partially native and partially old code. Several techniques are used in the the Alpha AXP architecture industry to run the binary code of an in the fall of 1988, the Alpha old architecture on a new architecture. Figure 1 shows four common techni- AXP team was concerned ques, from slowest to fastest: with running existing • Software interpreter (e.g., Insignia VAX TM code and MIPS TM Solutions' SoftPC) code on the new Alpha AXP • Microcoded (e.g., PDP-11 compatibility mode in early VAX computers [5, 6]. To get computers) full performance on a new • Binary (e.g., Hunter computer architecture, an System's XDOS) • Native compiler application must be ported A software interpreter is a program by rebuilding, using native that reads instructions of the old archi- compilers. For a single pro- tecture one at a time, performing each operation in turn on a software- gram written in a standard programming language, maintained version of the old architec- this is a matter of recompile and run. A complex soft- ture's state. Interpreters are not very fast, but run on a wide variety of ware application can be built, however, from hundreds machines and can faithfully reproduce of source pieces using dozens of tools. A native port of the behavior of self-modifying pro- such an application is only possible when all parts of grams, programs that branch to data, programs that branch to a checksum of the build path are running on the new architecture. themselves, and so forth. Caching Therefore, having a way to run an existing (old interpreters gain speed by retaining predecoded forms of previously inter- architecture) binary version of a complex application preted instructions. on a new architecture is an important interim measure. A microcoded emulator operates similarly to a software interpreter but It allows a user to get applications up and running usually with some key hardware assists immediately, with minimal porting effort. Once a to decode the old instructions quickly and to hold hardware state information user's everyday environment is established, appli- in registers of the micromachine. An cations can be rebuilt over time, using native code or emulator is typically faster than an

COMMUNICATIONS OF THII A¢II/February 1993/Vol.36, No.2 19 ther (1) bounded translation systems, programs (directly or indirectly) in- in which all the instructions of the voke operating-system services. In old program must exist at translation simple environments with a single time and must be found and trans- dominant library, it can be sufficient lated to new instructions [2, 3, 7], or to rewrite that library in native code (2) open-ended translation systems, and to interpret user programs, par- in which code also may be discov- ticularly user programs that actually ered, created, or modified at execu- spend most of their time in the li- tion time. Bounded systems usually brary. This strategy is commonly require manual intervention to find used to run Windows ~ and Macin- interpreter but can run only on a 100% of the code; open-ended sys- tosh ~° programs under the Unix ~ specific microcoded new machine. tems can be fully automatic. . This technique cannot be used to run To run existing VAX and MIPS In more robust enviromnents, it is existing code on a reduced instruction programs, an open-ended system is not practical to rewrite all the shared set computer (RISC) machine, since absolutely necessary. For example, libraries by hand; collections of doz- RISC architectures do not have a some customer programs write li- ens or even hundreds of images microcoded hardware layer underlying cense-check code (VAX instructions) (such as typical VAX ALL-IN-1 ~ the visible machine architecture. to memory and branch to that code. systems) must be run in the old envi- A translated binary program is a A bounded system fails on such pro- ronment, with an occasional excur- sequence of new-architecture in- grams. sion into the native operating system. structions that reproduces the behav- A native-compiled program is a Over time, it is desirable (1) to re- ior of an old-architecture program. sequence of new-architecture in- build some images using a native Typically, much of the state informa- structions produced by recompil- compiler while retaining other im- tion of the old machine is kept in reg- ing the program. Usually, native- ages as translated code and (2) to isters in the new machine. Translated compiled programs use newer, faster achieve interoperability between code reproduces faithfully the calling calling conventions than old pro- these old and new images. The inter- standard, implicit state, instruction grams. With a well-tuned optimizing face between an old environment side effects, branching flow, and compiler, native-compiled programs and a new one typically consists of other artifacts of the old machine. can be substantially faster than any of '~jacket" routines that receive a call Translated programs can be much the other choices. using old conventions and data struc- faster than interpreters or , Most large programs are not self- tures, reformat the parameters, per- but slower than native-compiled pro- contained; they call library routines, form a native call using new conven- grams. windowing services, databases, and tions and data structures, reformat Translators can be classified as ei- toolkits, for example. Also, these the result, and return. The Alpha AXP Migration Tools Figure 1. team considered running old VAX Common binary programs on Alpha AXP techniques for running computers using a simple software olcl code interpreter, but rejected this method on new because the performance would be computers too slow to be useful. The idea of using some form of microcoded Figure 2. Binary emulator was rejected also. This translation technique would compromise the ancl execution performance of a native Alpha AXP process implementation, and VAX compati- bility would be nearly impossible to achieve without microcode, which is inconsistent with a high-speed RISC design. Therefore, we turned to open- ended binary translation. We were aware of the earlier Hewlett-Packard binary translator, but its single-image HP ~ 3000 input code looked much simpler to translate than large collec- tions of hand-coded VAX assembly language programs [1]. One team member wrote a VAX-to-VAX bi- nary translator as a proof of concept, which looked feasible, so we set the

10 February 1993/Vol.36, No.2 /C~UMnUr.,~nI,"J"nUONI OP 'UMllUACre following goals:

1. Open-ended (completely auto- matic) translation of almost all user- mode applications from the OpenVMS VAX system to the OpenVMS AXP system. 2. Open-ended translation of almost all user-mode applications from the ULTRIX system to the DEC OSF/1 system. 3. Run-time performance of trans- lated code on Alpha AXP computers that meets or exceeds the perfor- mance of the original code on the original architecture. 4. Optional reproduction of subtle old-architecture details, at the cost of run-time performance, e.g., complex instruction set computer (CISC) in- struction atomicity for multithreaded applications and exact arithmetic traps for sophisticated error hand- must also include a way to run old run just like native images [4]. Trans- lers. code not discovered (or nonexistent) lated images run with the assistance 5. If translation is not possible, gen- at translation time. The translated- of the translated image environment, eration of explicit messages that give image environment (TIE) and mxr discussed later. The VEST binary reasons and specify what source run-time environment support the translator is written in C++ and changes are necessary. VEST and mx translators, respec- runs on VAX, MIPS, and Alpha AXP While creating the VAX translator, tively, by reproducing the old oper- machines. The TIE is written in the we discovered the process of building ating environments. Each environ- OpenVMS system programming lan- flow graphs and tracking data de- ment supports open-ended guages, Bliss, and Alpha AXP assem- pendencies yielded information translation by including a fallback bler. about bugs, perfor- interpreter of old code and extensive To locate VAX code, VEST starts mance bottlenecks, and dependen- run-time feedback to avoid using the disassembling code at known entry cies on features not available in all interpreter except for dynamically points and traces the program's flow Alpha AXP operating systems. This created code. Our design philosophy of control recursively. Entry points analysis information could be valu- is to do everything feasible to stay out come from main and global routines, able to a source code maintainer. of the interpreter, rather than to in- debug symbol table entries, and op- Thus, we added one more product crease the speed of the interpreter. tional information files (including goal: This approach gives better perfor- run-time feedback from the TIE). mance over a wider range of pro- As VEST traces the program, it 6. Optional source analysis informa- grams than using pure interpreters builds a flow graph that consists of tion. or bounded translation systems. basic blocks (i.e., straight-line code To achieve these goals, the team The remainder of this article dis- sequences) annotated with informa- created two binary translators: cusses the two binary translator/run- tion derived from parsing instruc- VEST, which translates OpenVMS time environment pairs available for tions. Then, VEST performs several VAX binary images to OpenVMS Alpha AXP computers: VEST/TIE analyses on the flow graph to propa- AXP images, and mx, which trans- and mx/mxr. To establish a basis for gate context information to each lates ULTRIX MIPS images to DEC the discussion, the reader must un- basic block and eliminate unneces- OSF/1 AXP images. However, binary derstand the following terms: datum, sary operations. Context information translation is only half the migration alignment, instruction atomicity, includes condition code usage, regis- process. As shown in Figure 2, the granularity, interlocked update, and ter contents, stack depth, and other other half is to build a run-time envi- word tearing. (See box.) information that allows VEST to ronment in which to execute the generate optimized code. translated code. This second half VEST: Translating a VAX Image Analysis is important for achieving must bridge any differences between Translating a VAX image involves good performance. For example, no old and new operating systems, for two main steps: analyzing VAX code condition codes exist in the Alpha example, calling standards, and ex- and generating Alpha AXP code. AXP architecture. Without analysis it ception handling. For open-ended The translated images produced are would be necessary to compute con- translation, this part of the process OpenVMS AXP images and may be dition codes for each VAX instruc-

C~IWUNIICJlLTIONlll 0P'1"1411 A~'M/February 1993/Vol.36,No.2 ~ are not in the error path. In Figure 3, Translated Images, Files Used a path exists by which register 3 (R3) A translated image has the same for- may be used without being set if the mat as an OpenVMS AXP image and VAX BNEQ (branch if the register contains the original OpenVMS VAX does not equal zero) instruction in image as well as the Alpha AXP in- the second basic block is true the first structions that were generated for time through the code sequence. the VAX code. The run-time VAX interpreter in the TIE needs the Code Generation original VAX instructions as a fall- The VEST translator generates code back. (Also, some error handlers look tion even if the codes were not used. by converting each VAX instruction up the call stack for pointers to spe- Furthermore, several forms of analy- into zero or more Alpha AXP in- cific VAX instructions.) The ad- sis were invented to allow correct structions. The architecture map- dresses of statically allocated data in translation. For example, VEST de- ping is straightforward because there the translated image are identical to termines automatically if a subrou- are more Alpha AXP registers than their VAX addresses. The image tine does a normal return. VAX registers. The VAX architec- contains a VAX-to-Alpha AXP ad- Code analysis can detect many ture has only 15 registers, which are dress-mapping table for use during problems, including some that indi- used for both floating-point and in- lookups and may contain an instruc- cate latent bugs in the source image. teger operations. The Alpha AXP tion atomicity table, described later. For example, VEST can detect architecture has separate integer and Translated images use the uninitialized variables, improperly floating-point registers. VAX regis- OpenVMS VAX calling standard. formed VAX CASE instructions, ters R0 through R14 are mapped to Native images use different conven- stack depth mismatches along two Alpha AXP R0 through RI4 for all tions, but translated images inter- different paths to the same code (the operations except floating point. operate with native or translated program expects data to be at a cer- Registers R12, R13, and RI4 retain shareable images. Automatic jacket- tain stack depth), improperly formed their VAX designations as argument ing services are provided in the TIE returns from subroutines, and modi- pointer, frame pointer, and stack to convert calls using one set of con- fications to a VAX call frame. A la- pointer, and R15 is used to resolve ventions into the other. In many tent bug in the source image should PC-relative references. Floating- cases, jacketing services permit sub- be fixed, since the translated image point operations are mapped to F0 stitution of a native shareable image may demonstrate incorrect behavior through F14. for a translated shareable image due to that bug. The VAX architecture has condi- without modification. However, a Also, analysis detects the use of tion codes that may be referenced jacket routine is sometimes required. unsupported OpenVMS features explicitly. In translated images, con- For example, on OpenVMS AXP sys- including unsupported system ser- dition codes are mapped into R22 tems, the translated Fortran run- vices. The source image must be and R23. Similar to the HP 3000 time library, FORRTL_TV, invokes modified to eliminate the use of translator, R23 is used as a fast con- the native Alpha AXP library these features. dition code register for positive/ DEC$FORRTL for I/O-related sub- Some problems reported by VEST negative/zero results [1]. R22 con- routine calls. DEC$FORRTL has a result from code that is hackish in tains all four condition code bits, different interface than FORRTL nature. For example, we found code which are calculated only when nec- has on an OpenVMS VAX system. that expects a call mask at an entry essary. All remaining Alpha AXP For these calls, FORRTL_TV con- point to be executed as a no-op in- registers are used as scratch registers tains hand-written jacket routines. struction so that the code preceding or for OpenVMS AXP standard calls. Translating an image requires the subroutine can simply execute only one file--a VAX- the call mask, rather than go through VEST connects simple branches image. Several optional files make the overhead of a VAX jump (JMP) directly to their translated targets. translation more effective: instruction. VEST reproduces the VEST performs backward symbolic behavior of the VAX program, even execution of VAX instructions to re- 1. Image information files (IIFs). if this behavior is a result of luck. solve as many computed branch tar- VEST creates IIFs automatically to A VEST-generated flow graph is gets as feasible. If more than one provide information about shareable displayed in Figure 3. Dashed lines possible computed target exists, a image interfaces. The information represent code paths followed if a run-time lookup is done on the VAX includes the addresses of entry conditional branch is taken. Solid target address. If the lookup fails to points, names of routines, and re- lines indicate fall-through paths. A find a translated target, a fallback source utilization. problem is highlighted by a wide, VAX interpreter is used, as described 2. Symbol information files (SIFs). dashed pointer whose bottom end later. Unlike bounded translation VEST generates SIFs automatically indicates the basic block in which the systems, which must achieve 100% to control the global symbol table in a problem was uncovered. Full blocks resolution of computed targets, the translated shared library, facilitating show the path that reveals the error; VEST and mx binary translators re- interoperation between translated empty blocks show basic blocks that quire no manual intervention. and native images.

72 February 1993/Vo].36, No,2 /¢OIMMUNICATIII:)NIS IDPTHII ACM Figure 3. ii i¸;/ ::# ii!!;i ii!iii~i)i! ii i ;!ii;!i ! i !!i!~!i;!!!i~iii!ii iilil!i; !!ili!i!! !! j!!)!i!: i)ii!~ii!; i !il¸I! ¸ VEST-generated flow graph showing uninltiallzed 8070 CE variable DH RYSTONE\Proc2\504 [C] R3 used 3. Hand-edited information files %VEST-I-NONSTDCALLU, PICKY: Non-standard call uses R3. mask=O01C (HIFs). The TIE generates HIFs au- tomatically, which may be hand- edited to supply information that VEST cannot deduce. HIFs contain 3072 directives to tell VEST about unde- DH RYSTON E\PROC2\504 :!~;":~':" tected entry points, to force it to \504 OneToFifty *lntParlO; '~'~;~. \509 IntLoc = *lntParlO + 10; ':~1 change specific assumptions about an SUBL2 S A #04,SP image during translation, and to pro- MOVAB 00002COC,R2 ":'i~i~, vide known interface properties to be MOVAB 00002C14,R4 propagated into an IIF. ADDL3 S ^ #OA,@O4(AP),R1 O0010BDC: LDA R16,FFFC(R14) O0010BEO: BIC R16,#F,R16 VEST Performance Considerations O0010BE4: CMPULT R16,R36,R17 In evaluating translated-code perfor- O0010BE8: CMOVNE R17,R16,R30 mance, we recognized that there was * O0010BEC: LDL R18,4(R12) * O0010BFO: SUBL R14,#4,R14 a significant trade-off between the * O0010BF4: LDA R2,ACOC(R15) performance and accuracy of emu- * 00010BF8: LDL R18,0(R15) lating the VAX architecture. VEST O0010BFC: LDA R4,AC14(R15) permits users to select several archi- 00010C00: ADDL R18,#A,R1 tectural assumptions and optimiza- tions, including: / • D-float precision. The Alpha 3088 AXP architecture provides hardware DHRYSTONE\Proc2k512 support for D-float with only 53-bit \512 if (CharlGIob ==21') CMPB (R2),#41 R mantissas, whereas the VAX archi- BNEQ 00003097 g tecture provides 56-bit mantissas. * 00010C04: LDQU R20,O(R2) The user may select translation with 00010C08: MOV 41.R21 J either 53-bit hardware support O0010COC: EXTBL R30,R2,R19 00010C10: CMPEQ R19,R21,R24 (faster) or 56-bit software support * 00010014: BEQ R24.10C30 (slower). • Alignment. Alpha AXP instruc- tions support only naturally aligned longword (32-bit) and quadword (64- bit) memory operations. Unaligned I

memory operations cause alignment ...... faults, which are handled transpar- ently by software at significant run- 3097 time expense. The user may direct DHRYSTONE\Proc2\518 VEST to assume that data references k518 if (EnumLoc = = Identl) TSTL R3 are unaligned whenever alignment BNEQ 00003O88 information is unavailable. * 00010C30: BNE R3,10C04 • Instruction atomicity and memory {. granularity. Multitasking and multi- ..... {I processing programs may depend on instruction atomicity and memory operation characteristics similar to those of the VAX architecture. VEST uses special code sequences to pro- duce exact VAX memory character- istics. VEST and the TIE cooperate 8070_CEDHRYSTONE\Proc2~504 [C] 1 to ~nsure VAX instruction atomicity R3 used when instructed to do so. %VEST-I-NONSTDCALLU, PICKY: Non-standard call uses R3. Untranslatable Images Some characteristics make Open-

COMMUNnCAtmONSO~tNEACM/February 1993/Vol.36, No,2 73 modifications, because the image the interpreter continues to interpret formats are different. the target. Lookup operations that transfer TIE Design Overview control to the interpreter record the The run-time translated-image envi- starting VAX code address in an HIF ronment, TIE, assists in executing file entry. The VAX image can be translated OpenVMS VAX images retranslated then with the HIF infor- under the OpenVMS AXP operating mation, resulting in an image that system. Figure 4 and Table 1 show runs faster. the contents of TIE. Lookup routines are used also to VMS VAX images untranslatable, Complications may occur when call native Alpha AXP (nontrans- including: translated OpenVMS VAX images lated) routines. The TIE supplies the • Exception handler issues. Images are run under the OpenVMS AXP required special autojacketing pro- that depend on examining the VAX operating system: failure to find all cessing that allows interoperation processor status longword (PSL) dur- code during translation, VAX in- between translated and native rou- ing exception handling must be struction guarantees, instruction ato- tines, with no manual intervention. modified, because the VAX PSL is micity, memory update, and preserv- At load time, each translated image not available within exception hand- ing VAX exceptions. identifies itself to the TIE and sup- lers. plies a mapping table used by the • Direct reference to undocumented Failure to Find All Code During lookup routines. The TIE maintains system services. Some software con- Translation a cache of translations to speed up tains references to unsupported and When the VEST binary translator the actual lookup processing. undocumented system services, such encounters a branch or subroutine Every translated image contains as an internal-to-VMS service, which call to an unknown destination, both the original VAX code and the parses image symbol tables. VEST VEST generates code to call one of corresponding Alpha AXP code. highlights these references. the TIE lookup routines. The lookup When a translated image identifies • Exact VAX memory management routines map a VAX instruction ad- itself, the TIE marks its original VAX requirements. Images that depend dress to a translated Alpha AXP code addresses with the page protection on exact VAX memory management address. If an address mapping ex- called fault on execute (FOE). An behavior do not function properly ists, then a transfer to the translated Alpha AXP processor that attempts and must be modified. These images code is performed. Otherwise, the to execute an instruction on one of include those that depend on VAX VAX interpreter executes the desti- these pages generates an access viola- page size or that expect certain ob- nation code. When the VAX inter- tion fault, which is processed by a jects to be mapped to particular ad- preter encounters a flow-of-control TIE condition handler to convert the dresses. change, it checks for returns to trans- FOE fault into an appropriate desti- • Image format. Programs that use lated code. If the target of the flow nation address lookup operation. For images as data are not able to read change is translated code, the inter- example, the FOE might occur when OpenVMS AXP images without preter exits to this code. Otherwise, a translated routine returns to its caller. If the caller was interpreted, then its return address is a VAX code Figure 4. VEST address instead of a translated VAX run-time Translated Main I (Alpha AXP code) address. The environment and Shareable I I ,Z,:',:,k: I images I Alpha AXP processor attempts to execute the VAX code and generates a FOE condition. The TIE condition handler converts this into a JMP / liiiiii":;'k i; "iiiiiiii ii!i!iil "Ex2e;;;jj" iiiiiiiiiii!ii;i'"Sys' ;m""'iiiiiiii lookup operation. / H ,nte,ace Hand,ng li i!i i i i il Callbacks ii i i l VAX Instruction Guarantees Instruction guarantees are charac- Jacketing Exoept'on l::i::::i::i::i::ilS,stemSe ices teristics of a computer architecture iiiii~:~!i~iiii~j:E~i~i!iiE~:~:!~i~ii~:i:~| Interface I.~j Handling Iii~i~i[ Emulation ii~i~i~! that are inherent to instructions exe- i!iiiiiii!i!iiiiii!!iiiiiii!i!iiiiiii!iiiiiiiiiiiiiiiiiiii iiii ii!iiiiii ii ! iii i!i ii i ii ! !i i iiii i ii ii ii iii !iii!iiiiiiiiiiiiii!iiiiiiiiiiiiiiiiiiiiiiiiliiii!i i i ! iiiii iiii i i iiiiiicuted i iiiiii on ii!iii that ii architecture. For ex- ample, on a VAX computer, if in- struction 1 writes data to memory i~iiiiiiiiiiiiiiii!iiii!iiii!iiii!iiOiiiiiiii!iiiiiiiii!...Mfi na'ger'"" ~iiii ~i~i;ii~i~!iiiiiiiiiiiiiiiiiiii!iii!iE~i~iiand instruction 2 writes data to mem- ory, a second processor must not see :.!!ii!i!i!!!iii!i!i!!!!ii!i!!!!!!iiE!lInterpreter I;i!iii!i!i!!!!iiii!i:i!i!iii!i!i:!!i!i!i!i!iii!i!i!i!i!i!i!i!i!i!iiiiiii!!Instructions li!i!i!i!i!i!iii!i!i!i!iii!i!i ! the write from instruction 2 before the write from instruction 1. This property is called strict read-write

74 February 1993/Vo1.36, No.2 /¢OMMUNIr.*ATIONS OF THE ACI~J sion of two processors updating adja- metic exceptions, e.g., overflow. All ordering. these provisions are optional and The VEST/TIE pair can provide cent memory bytes without interfer- ence, even though the underlying require extra execution time. Tables the illusion that a single CISC in- 2 and 3 show the visibility differences struction is executed in its entirety, RISC instructions manipulate four or eight bytes at a time. Finally, between various guarantees on VAX even though the underlying transla- and Alpha AXP systems as well as for tion is a series of RISC instructions. VEST/TIE can provide exact mem- translated VAX programs. VEST/TIE can also provide the illu- ory read-write ordering and arith- Table 1. TIE Contents

Table 2. Single Processor Guarantees

Table 3. Multiple Processor Guarantees

COMMU#ICATIOUlOF'I'HBACM/February1993/VoL36, No.2 75 implemented in privileged-architec- the same exception characteristics as ture library (PAL) subroutines and VAX instructions; for instance, an which updates a contiguous region of arithmetic fault is imprecise, i.e., not memory containing VAX state with- synchronous with the instruction that out being interrupted. At the begin- caused it. The Alpha AXP hardware ning of each interpreted VAX in- generates an arithmetic fault that is struction, a read-and-set-flag (RS) mapped into an OpenVMS AXP instruction sets a flag that is cleared high-performance arithmetic excep- when an interrupt occurs on the pro- tion (HPARITH) exception. To re- cessor. AMOVRM tests the flag, and tain compatibility with VAX condi- special Considerations for if set, performs the update and re- tion handlers, the TIE maps Instruction Atomicity turns a success indication. If the flag HPARITH into a corresponding The VAX architecture requires that is clear, the AMOVRM instruction VAX exception when calling a trans- interrupted instructions complete or indicates failure, and the interpreter lated condition handler. Most VAX appear never to have started. Since reprocesses the interrupted instruc- languages do not require precise ex- translation is a process of converting tion. ceptions. For those that do, such as one VAX instruction to potentially BASIC, VEST generates the neces- many Alpha AXP instructions, run- Issues with Changing Memory sary trap barrier (TRAPB) instruc- time processing must achieve this VAX instruction atomicity ensures tions if guarantee of instruction atomicity. that an arithmetic instruction does /I~RESERVE = FLOATI NG_ Hence, a VAX instruction atomicity not have any partially updated mem- EXCEPTIONS controller (IAC) was created to ma- ory locations, as viewed from the is specified when translating an nipulate Alpha AXP state to an processor on which that instruction is image. equivalent VAX state. When a trans- executed. In a multiprocessing envi- lated asynchronous event-processing ronment, inspection from another routine is called, the IAC is invoked. OpenVMS AXP and OpenVMS VAX processor could result in a percep- Differences The IAC examines the Alpha AXP tion of partial results. instruction stream and either (1) Functional Differences backs up the interrupted program Since an Alpha AXP processor Most OpenVMS AXP system services counter to restart at the equivalent accesses memory only in aligned are identical to their OpenVMS VAX VAX instruction boundary or (2) longwords or quadwords, it is there- counterparts. Services that depend executes the remaining instructions fore not byte granular. To achieve on a VAX-specific mechanism are to the next boundary. Many VAX byte granularity, VEST generates a changed for the Alpha AXP architec- programs do not require this guar- load-locked/store-conditional code ture. The TIE intervenes in such sys- antee to operate correctly, so VEST sequence, which ensures that a mem- tem services to ensure the translated emits code that is VAX instruction ory location is updated as if it were code sees the old interface. byte granular. This sequence is used atomic only if the qualifier For example, the declare change also to ensure interlocked access to mode handler ($DCLCMH) system /PRESERVE= INSTRUCTION_ shared memory. Longword-size up- service establishes a handler for VAX ATOMICITY dates to aligned locations are per- change mode to user (CHMU) in- formed using normal load/store in- is specified when translating an structions. The handler is invoked as structions to ensure longword image. if it were an interrupt service rou- granularity. VEST-generated code consists of tine, required to use the VAX return Many multiprocessing VAX pro- four sections that are detected by the from interrupt or exception (REI) grams depend on byte granularity IAC. These sections have the follow- instruction to return to the invoker's for memory update. VEST generates ing functions: context. On OpenVMS AXP systems, byte-granular code if the qualifier the handler is called as a normal pro- • Get operands to temporary regis- /PRESERVE=MEMORY_ ters cedure. To ensure compatibility, the ATOMICITY TIE inserts its own handler when • Operate on these temporary regis- ters is specified when translating an calling OpenVMS AXP $DCLCMH. When a CHMU is invoked on Alpha • Atomically update VAX results image. In addition, VEST generates AXP computers, the TIE handler that could generate side effects (i.e., strict read-write ordering code if the calls the handler of the translated an exception or interlocked access) qualifier image, using the same VAX-specific • Perform any updates that cannot /PRESERVE = READ_WRITE._ mechanisms that the handler ex- generate side effects (e.g., register ORDERING pects. updates) is specified when translating an The VAX interpreter achieves image. Exception Handling VAX instruction atomicity by using OpenVMS AXP exception process- the atomic-move, register to memory Preserving VAX Exceptions ing is almost identical to that per- (AMOVRM) instruction, which is Alpha AXP instructions do not have formed in the OpenVMS VAX sys-

16 February 1993/Vol.36, No.2 /I/IOMNUM|~ATIOMBOI I'l'l~l ACM tern. The major difference is that the therefore, the two instruction sets that our users' expectations for per- VAX mechanism array needs only to have a considerable similarity. Many formance of translated programs are hold the value of two temporary reg- instructions translate one for one. much higher than for VEST. Rea- isters, R0 and R1, whereas the Alpha The MIPS architecture has very few soning that the source and target AXP mechanism array needs to hold instruction side effects or subtle ar- machines are similar, users expect the value of 15 temporary registers, chitectural details, although those mx to achieve a translated program R0, R1, and R16 through R28. that are present are particularly performance better than that of the tricky. Furthermore, the format of source program, since Alpha AXP Complex Instructions an executable program under the processors are faster. Thus, the per- Translating some VAX instructions ULTRIX system collects all code in a formance goal was producing a would require many Alpha AXP in- single contiguous segment and translated program that runs at structions. Instead, VEST generates makes it easy for mx to reliably find about the same speed as the original code that calls a TIE subroutine. almost 100% of the code in the MIPS program would run on an MIPS Subroutines are implemented in two application. The system interfaces to R4000 machine with a 100MHz in- ways: (1) hand-written native emula- the ULTRIX and DEC OSF/1 sys- ternal clock rate. tion routines, e.g., MOVC5, and (2) tems are similar enough that most VEST-translated VAX emulation ULTRIX system calls have function- MaDDing the Architectures routines, e.g., POLYH. ally identical counterparts under the It appears, at first glance, that we Together, VEST and TIE can DEC OSF/1 system. could simply assign each MIPS regis- translate and run most existing user- The challenges in mx stem from ter to a corresponding Alpha AXP mode VAX binary images. As shown the fact that the source architecture register, because each machine has in Table 4, performance of trans- is a RISC machine. For example, 32 general-purpose registers. The lated VAX programs slightly exceeds DEC OSF/1 AXP is a 64-bit comput- translated code would then have two the original goal. Performance de- ing environment, i.e., all pointers scratch registers, since the MIPS ar- pends heavily on the frequency of used to communicate with the oper- chitecture does not allow user-level use of VAX features that are not ating system are 64 bits wide. This programs to use registers K0 and K 1, present in Alpha AXP machines. environment does not present a which are reserved for the operating problem when the pointer is passed system kernel. ULTRIX MIPS Translation in a register. However, when a Unfortunately, translation re- The translator that converts pointer (or a long data item, such as a quires more than two scratch regis- ULTRIX MIPS programs to DEC file size) is passed in memory, it must ters. The Alpha AXP architecture OSF/1 AXP programs is called mx. be converted between the 32-bit rep- does not have byte or halfword (16- The mx project started after VEST resentation, used by the ULTRIX bit) loads or stores, and the code se- was functional, and we took advan- system, and the 64-bit AXP repre- quences for performing these opera- tage of the VEST common code base sentation, even when the semantics tions require four or five scratch reg- for much of the analysis and Alpha of the operating system call are the isters. Furthermore, mx requires a AXP code assembly phases of the same on both systems. base register to locate mxr without translator. In fact, about half of the A significant challenge is the fact having to load a 64-bit address con- code in mx is compiled from the same source files as those used for Table 4. Translated VAX Performance, Normalized to Ratlve-complled VEST, with some architectural spe- OpeRVMS AXP Code cifics supplied by different include files. The C+ + language has proven quite valuable in this regard. mxr is the run-time support sys- tem for translated programs. It pro- vides services similar to TIE, emulat- ing the ULTRIX MIPS environment on a DEC OSF/1 AXP system, mxr is written in C+ +, C, and Alpha AXP assembler. l ,U

Challenges 2.2 1.0 Creating a translator for the MIPS 1.0 R2000/R3000 architecture presented iii ! ! iiiii!! iiiiiiiiii!iiiiii if!ill!iiiiiiiii iii!2i~iiiiiiiiiiiil iiiiiiii!illiiiiil ii iilii iil il!!i iil ii!!!i i~ iili !i iiii iiiiii iiiiiii!iiiiii iiii!ii!iilii ii¸¸iilliii iii!! iil ili!iiliii ii !iiiii!i a host of new opportunities, along 3.1 1.0 with some significant challenges. The basic structure of the mx translator is considerably simpler than that of 1The DEC 7000 was running at a derated speed compared to production DEC VEST. Both the source and the tar- 7000s. get architectures are RISC machines; 2Timing information for this run is not available.

CONNUNICA~ON| OF TNi A|/Fcbrua;y 1993/Vo|,~, No,~ 1711 ters are mapped to the 16 remaining extended to double format in the Alpha AXP registers, and the re- whole 64-bit register. We defined two maining registers are assigned to forms for values in Alpha AXP memory slots in the mxr communica- floating-point registers: computa- tion area. When a MIPS basic block tional form, in which computation is uses one of the slotted registers, mx done, and canonical form, which assigns it to one of the scratch regis- mimics the MIPS even and odd regis- ters. If the first reference reads the ters. If a MIPS program loads an old contents of the register, mx gen- even register and uses this register as erates a load instruction from the a single-precision value, mx loads the stant at each call. Finally, the MIPS communications area. If the value of value from memory to be used com- architecture has more than 32 regis- the MIPS resource changes in the putationally. If a MIPS program ters, including the HI and LO regis- basic block, the scratch register is loads an even register only but does ters used by the multiply-and-divide stored in the communication area not use this register in the basic instructions, and a floating-point before the end of the block. As in block, mx puts the 32-bit value into condition register, whose layout and most compilers, if we run out of reg- half of the Alpha AXP floating-point contents do not correspond to the isters, a spill algorithm chooses a register. This permits correct behav- Alpha AXP floating-point condition value to save in the communication ior in the pathological case where register. area and frees up a register. half of a floating-point number is In mx, we assign registers using Alpha AXP integer registers are loaded in one place, and the other standard compiler techniques. To 64 bits wide, whereas MIPS registers half is loaded in some other basic assign registers to 33 MIPS resources are only 32 bits wide. We chose to block. If a register is used as a single- (the 32 general registers plus one 64- keep all 32 bit values in Alpha AXP precision number in basic block with- bit register to hold both HI and LO), integer registers as sign-extended out first being loaded, the code gen- certain registers are permanendy values, with the high 32 bits equal to erator inserts code to convert it from mapped, and other MIPS registers bit 31. This approach occasionally canonical to computational floating- are kept in either AXP registers or requires mx to generate additional point form. If a single-precision memory. The MIPS argument-pass- code to create canonical 32-bit inte- value has been computed in a block ing registers A0 through A3 are per- ger results, but the 64-bit compare and is live at the end of the block, it is manently assigned to Alpha AXP operations do not need to change the converted to canonical form. registers R16 through R19, which values that they are comparing. mx inserts a register-mapping are the argument registers in the The floating-point architecture is table into the translated program DEC OSF/1 AXP calling standard. more complex. Each of the 32 MIPS that indicates (1) which MIPS re- This correspondence simplifies the floating-point registers is 32 bits sources are statically mapped to work needed when mxr must take wide. Only the even registers are which Alpha AXP registers and (2) arguments for an ULTRIX system used for single precision, and a which MIPS resources are normally call and pass them to a DEC OSF/1 double-precision number is kept in kept in memory. This table allows system call. Similarly, the argument an even-odd register pair. We map mxr to find the MIPS resources at return registers V0 and V1 are each pair of MIPS floating-point reg- run time. mapped to the Alpha AXP argument isters onto a single 64-bit Alpha AXP return registers R0 and R1. The re- floating-point register. Also, one Finding Code turn address registers and stack Alpha AXP floating-point register As with the VEST translator, mx pointer registers of the two machines represents the condition code bit of finds code by starting at entry points are also mapped. MIPS R0 is the MIPS floating-point control reg- and recursively tracing down the mapped to Alpha AXP R31, where ister. Thus, the mx code generator flow of control, mx finds entry points both registers contain the same hard- can use 14 scratch registers, mx goes using the executable-file header, the wired zero value. We reserve Alpba to considerable effort to find paired symbol table (if present), and feed- AXP registers R22 through R24 as loads and stores in the MIPS code back from mxr (if present). Finally, scratch registers and use them also stream and merge them into one mx performs a linear scan of the en- when interfacing to mxr. We reserve Alpha AXP floating-point operation. tire text section for unexamined Alpha AXP R14 as a pointer to an MIPS single-precision operations longwords, mx analyzes any data that mxr communication area. Finally, we cause problems with floating-point looks like plausible code but does not reserve three more registers as correspondence. Since the single- connect this data into the main flow scratch registers for use by the code precision number on MIPS machines graph. Plausible code consists of a generator. is kept in only the even register of the series of valid MIPS instructions ter- The remaining 16 Alpha AXP register pair, the even and odd regis- minated by an unconditional transfer registers are available to be assigned ters in a pair are independent when of control. to the remaining 23 MIPS resources. single-precision (or integer) opera- While finding code and connect- After the code is analyzed and we tions are done in the floating-point ing the basic blocks into a flow graph, have register usage information, the unit. On Alpha AXP machines, com- mx looks for the code sequence that 16 most frequently used MIPS regis- putation must be done on a value indicates a switch statement, i.e., a

78 February 1993/Vo1.36, No.2/¢OllilUNICATIOlilOIITIIIA|M multi-way branch, usually through have seen programs that branch into and (3) sequences that change the an element of a table, mx finds the the delay slot of other instructions, floating-point rounding mode (used branch table and connects each of requiring us to indicate that the delay to truncate a floating-point number the possible targets as successors of slot instruction is a member of two to an integer) to a single opcode that the branch. different basic blocks--the block it represents the Alpha AXP convert ends and the one it starts. We have operation with the chopped mode Code Analysis observed programs that put software (/C) modifier. Static analysis of hundreds of MIPS breakpoint (BREAK) instructions in MIPS code contains a number of programs indicates that only 10 in- the branch delay slot, and thus common code sequences that cross structions account for about 85% of BREAK ends a basic block without basic block boundaries, but which all code. These instructions are LW, being the last instruction. Some com- can be compressed into a single basic ADDIU, SW, NOP, ADDU, BEQ, pilers schedule code so that half of a block in Alpha AXP code. Examples JAL, BNE, LUI, and SLL. The cor- floating-point register is stored and of these are the min and max func- responding sequences of Alpha AXP then reused before the other half is tions, which map neatly onto a single code range from zero operation stored. The general principle that we conditional move (CMOVxx) in- codes (opcodes) (for NOP, since the intuit from these observations is "if a struction in Alpha AXP code. The Alpha AXP architecture does not code sequence is not expressly pro- code generator looks for these se- require NOPs anywhere in the code hibited by the architecture, some quences, merges the basic blocks, and stream) to two opcodes (for SLL). program somewhere will use it." creates an extended basic block, Code analysis for source programs which includes pseudo-opcodes that is much more important in mx than Code Generation indicate the MIPS code idiom. in VEST, because the coding idioms After the program is parsed and ana- After the optimizer completes the for many common operations differ lyzed and the flow graph is built, the list of instructions, it translates each between the Alpha AXP and MIPS code generator is called. It builds the abstract opcode to zero or more processors. The simple technique of register-mapping table, then in turn, Alpha AXP opcodes, again building mapping each MIPS instruction to a processes each basic block, generat- a linked list of instructions. This pro- sequence of one or more Alpha AXP ing Alpha AXP code that performs cess may permit further improve- instructions loses much of the con- the same functions as the MIPS code. ments, so the optimizer makes a sec- text information in the original pro- At each subroutine entry, mx ond pass over the Alpha AXP code. gram. scans the code stream with a pattern- When processing a basic block, the For example, the idiom used to matching algorithm to see if the code code generator assumes that it has an load a 32-bit constant into a register corresponds to any of a number of unlimited number of temporary re- on MIPS machines is to generate a standard MIPS library routines, such sources. Since this is not actually load upper immediate (LUI) opcode, as strcpy. (Note that the ULTRIX true, the code generator then calls a placing a 16-bit constant in the high- operating system has no shared li- register assigner to allocate the real order 16 bits of a register. This oper- braries, so library routines are bound Alpha AXP temporary resources to ation is followed by an OR immediate into each binary image.) If a corre- the intermediate temporary regis- (ORI) opcode, logically ORing a 16- spondence exists, the entire subrou- ters. The register assigner will load bit zero-extended value into the reg- tine is recursively deleted from the and spill MIPS resources and gener- ister. The LUI corresponds exactly flow graph and replaced with a ate temporary registers as needed. to the Alpha AXP load address high canned routine to perform the sub- Finally, the list of Alpha AXP in- (LDAH) opcode. However, the routine's work on Alpha AXP proc- structions is assembled into a binary Alpha AXP architecture has no way essors. This technique contributes stream, and the instruction scheduler to directly ORing a 16-bit value into a significantly to the performance of rearranges them to remove resource register and cannot load even a zero- translated programs. latencies and use the chip's multiple- extended 16-bit constant into a regis- For each remaining basic block, issue capability. ter. When the high-order bit of the the instructions are converted to a 16-bit constant is 1, the shortest linked list of intermediate opcodes. Image Formats translation for the ORI is three in- At first, each opcode corresponds The file format for input is the stan- structions. The mx translator scans exactly to a MIPS opcode. The list is dard ULTRIX extended common the code looking for such idioms and then scanned by an optimization object file format (COFF). In most generates the optimal two-instruc- phase, which looks for MIPS coding ULTRIX MIPS programs, the text tion sequence of Alpha AXP code idioms and replaces them with ab- section starts at 00400000 (hexadeci- that performs the 32-bit load. No stract machine instructions that bet- mal) and the data at 10000000 (hexa- opcode exists that corresponds to the ter reflect the idiom. For example, decimal). In virtually all programs, a ORI, but the results in the registers mx changes (1) loads of immediate large gap exists between the virtual are correct. values to a non-MIPS hardware load address for the end of text and the We thought we would never see a immediate (LI) instruction, (2) shift start of the data section. When mx number of code possibilities. In ret- and add sequences to abstract opera- creates the output image, it places rospect, this proved to be a mis- tions that reflect the Alpha AXP the generated Alpha AXP code after guided assumption. For example, we scaled add and subtract sequences, the MIPS code and before the MIPS

©Om~UNS©ATSONS OF TSm I~/February 1993/VoL36, No.2 7~ MIPS code separately from the same virtual address that it would Alpha AXP code. have had under the ULTRIX operat- The translated image is not in ing system, mxr resets the protection DEC OSF/1 AXP executable format. of the MIPS code to read/no-write/ Instead, it looks like a MIPS COFF no-execute, the Alpha AXP code to file, but with the first few bytes read/no-write/execute, and the data changed to the string "#!/usr/bin/ to read/write/no-execute. mxr". Then mxr allocates a communica- tion area and initializes Alpha AXP Executing a Translated Program R14 to point to this area. The com- data. This allows the program to When a translated image is run on munication area contains save areas have one large text section. The DEC OSF/1 AXE its modified for MIPS resources, initialized point- Alpha AXP code begins at an Alpha header invokes mxr first, mxr uses ers to mxr services routines, and AXP page boundary, so that we can the memory map (mmap) system call other scratch space, mxr then con- set the memory protection on the to load the translated program at the structs new command argument (argv) and environment vectors as Table 5. Translated MIPS Relative Performance 32-bit wide pointers (as the MIPS program expects), arranges to inter- cept certain signals from the DEC OSF/1 AXP system, and transfers control to the translated start address of the program. When a system signal is delivered espresso 2.4 1.1 (1.0) a to the program, control goes to the li 1.6 1.2 0.0) eqntott 1.6 2.1 (1,0) signal intercept code in mxr. This compress 2.7 1.0 (1.0) code transforms the signal context SC 2 w structure from the DEC OSF/1 AXP gcc 2.1 1.2 (1.0) system and constructs a ULTRIX Geometric Mean 2.0 1.3 (1.0) MIPS style context, which it passes (without sc) then to the translated signal handler. Certain signals are processed spe- cially. For instance, a program that attempts to transfer control to a loca- spice2g6 -- -- tion containing M1PS code rather doduc 1.7 1.0 mdljdp2 2.7 1.0 than translated code gets a segmen- wave5 1.1 1.0 tation violation, since the MIPS code tomcatv 3.0 1.0 is not executable. This situation can ora 1.5 1.0 occur if a routine modifies its return alvinn 1.6 1.0 address to be a MIPS address con- ear 1.7 1.0 mdljsp2 1.4 1.0 stant, mxr will examine the target swm256 2,3 1.0 address and, if it corresponds to the su2cor 2.7 1.0 start of a pretranslated MIPS basic hydro2d 2.9 1.0 block, divert the flow of control to nasa7 2.6 1.0 the translated code for that block. If fpppp 2.2 1.0 Geometric Mean 2.0 1.0 not, mxr enters the MIPS inter- (without spice2g6) preter. The interpreter proceeds to emulate the M1PS code until a trans- lated point is reached, mxr then 1The values in parentheses are from running once, then retranslating with the run- resynchronizes its machine state and time feedback from the first run; this gave a significant performance difference only for the programs shown. reenters the translated code. 2Timing information for this run is not available. NOTE: The larger the number, the slower the performance. These performance Translation Goals and Classes of numbers were measured on derated field test hardware and software at various Programs Not Supported times during 1992; production results will vary somewhat. The SPEC benchmarks Our goal was to translate most user are written in FORTRAN and C; no conclusions should be drawn about other mode MIPS programs compiled for a classes of programs written in other languages. MIPS R2000 or R3000 machine run- NOTE: The larger the number, the slower the performance. These performance ning ULTRIX Release 4.0 (or later) numbers were measured on derated field test hardware and software at various to run identically on the DEC OSF/I times during 1992; production results will vary somewhat. The SPEC benchmarks AXP system with acceptable perfor- are written in FORTRAN and C; no conclusions should be drawn about other mance. As shown in Table 5, perfor- classes of programs written in other languages. mance of translated MIPS programs

80 February 1993/Vo1.36, No.2 /¢OMMUNICATIONSOFTHIIAClll meets or exceeds the original goal. critical morale boost at the right time. tecture translation, RISC computers Due to extreme technical obsta- Jim Gettys also has been an impor- cles, some classes of programs will tant and vocal supporter. About the Authors: never be supported by mx. We de- Success would not have been pos- RICHARD L. SITES is a senior consult- ant engineer in the semiconductor engi- cided not to translate programs that sible without the enthusiastic support neering group at Digital Equipment Cor- use privileged opcodes or system of the OpenVMS AXP and DEC poration. calls or that need to run with su- OSF/1 AXP operating system peruser privileges. In cases where groups, and the respective run-time ANTON CHERNOFF is a member of the technical staff at Digital working in the the file system hierarchy differs be- library groups, especially Matt Alpha migration tools group. He spent tween the ULTRIX and DEC OSF/1 LaPine, Larry Woodman, Hai 1982 through 1991 at Liant Software AXP systems, programs that expect Huang, Dan Murphy, Nitin Corporation as a senior consulting engi- files to be in particular places or in a Karkhanis, Ray Lanza, Anton Ver- neer in compiler and debugger develop- particular format may fail. Similarly, hulst, and Terry Grieb. ment. programs that read /dev/kmem and The Porting and Performance expect to see an ULTRIX MIPS Engineering Group did extensive MATTHEW B. KIRK is a senior software memory layout fail. porting and testing of customer ap- engineer in the SEG/AD AXP migration Certain other classes of programs plications. The group members, es- tools group, where he works on binary are not currently supported, but are pecially Shamin Bhindarwala and translator development, testing, and sup- port. He has also designed and developed technically feasible. These include Robi Aljaar, were the source of ex- automated architectural test software for big-endian MIPS programs from tremely valuable customer feedback. pipelined VAX hardware and the CI non-Digital MIPS environments, The Engineering System Group computer interconnect. programs that use R4000 or R6000 also provided valuable feedback. instructions that are not present on The Alpha AXP Migration Tools MAURICE P. MARKS is a senior engi- the R3000 model, programs that team have each made several key neering manager in the semiconductor need to be muhiprocessor safe, and contributions: Kate Burleson, Peigi engineering advanced development programs that require certain cate- Cleminshaw, George Darcy, Cather- group. He manages the AXP Migration gories of precise exception behavior. ine Frean, Bruce Gordon, Rick Gor- Tools Group and contributed to the de- sign and implementation of translators. ton, Kevin Koch, Mark Herdeg, Gio- In twenty years with Digital, he has led Summary vanni Della Libera, Nikki compiler, operating system, hardware Building successful, turnkey binary Mirghafori, Srinivasan Murari, Jim and software tools, and chip projects, translators requires hard work but Paradis, and Ashutosh Roy. !"4 not magic. We built two different SCOTT G. ROBINSON is a software translators, VEST and rex. In both References engineering manager in the AXP migra- tion tools group and has contributed to cases, the old and new environments 1. Bergh, A., Keilman, K., Magenheimer, the design and implementation of the are, by design, quite similar in funda- D. and Miller, J. HP 3000 emulation on HP Precision Architecture comput- binary translators, particularly the VAX mental data types, memory address- ers. Hewlett-PackardJ. (Dec, 1987). translated-image environment. He previ- ing, register and stack usage, and 2. Echo Logic, Inc. News Release (May 4, ously worked on a wide variety of Digital operating system services. Transla- 1992). hardware and software implementations. tors between dissimilar architectures 3. Hunter, C. and Banning, J. DOS at or operating systems are a different RISC. Byte Mag. (Nov. 1989), 361-368. Authors' Present Address: Digital Equip- matter. Translating the code might 4. Kronenberg, N. et al. Porting ment Corp., TAY2-2/F14, 153 Taylor St., be a reasonably straightforward task. OpenVMS from VAX to Alpha. Com- Littleton, MA 01460-1407. However, emulating a run-time envi- mun. ACM 36, 2 (1993). 5. Sites, R., Ed. Alpha Architecture Refer- The following are trademarks of Digital Equip- ronment in which to execute the ment Corporation: ALL-IN-I, Alpha AXP, code might present insurmountable ence Manual. Digital Press, Burlington, AXP, DEC OSF/1 AXP, Digital, OpenVMS Mass., 1992. AXP, OpenVMS VAX, ULTRIX, and VAX. technical and business obstacles. 6. Sites, R. Alpha AXP architecture. Com- Without capturing the environment, mun. ACM 36, 2 (1993). The following are third-party trademarks: an instruction translator would be of 7. Wirbel, U DOS-to-UNIX compiler. MIPS is a trademark of MIPS Computer Sys- no USC. Electr. Eng. Times (Mar. 14, 1988), 83. tems, Inc.; Windows is a trademark of Micro- soft Corporation; Unix is a registered trade- mark of Unix System Laboratories, Inc.; Acknowledgments CR Categories and Subject Descrip- is a registered trademark of Apple Steve Hobbs originally suggested the tors: C.0 [Computer systems organiza- Computer, Inc.; HP is a registered trademark binary translation path in the archi- tion]: General--system architectures; D.3.4 of Hewlen-Packard Company. tecture task force discussions. Nancy [Software]: Programming Languages, Permission to copy without fee all or part of this Kronenberg and Bob Supnik added Processors--compilers, interpreters, run-time material is granted provided that the copies are not made or distributed for direct commercial advantage, critical early support and later coor- environments; 1.2.2 [ method- ologies]: Artificial Intelligence, Auto- the ACM copyright notice and the title of the publi- dination. Jud Leonard set the engi- cation and its date appear, and notice is give that matic programming--program transforma- neering direction of doing careful copying is by permission of the Association for tion Computing Machinery. To copy otherwise, or to static translation once, instead of on- Additional Key Words and Phrases: republish, requires a fee and/or specific permission, the-fly dynamic translation at each Binary translation, computer architec- execution. Butler Lampson added a ture, CISC computers, processor archi- ©ACM0002-0782/93/0200-069 $1.50

¢OImMUNICATIONSOF THU ACid/February 1993/Vol.36, No.2 81