Ryo: a Versatile Instruction Instrumentation Tool For

RYO: A VERSATILE INSTRUCTION INSTRUMENTATION TOOL FOR PA-RISC Daniel F. Zucker and Alan H. Karp Technical Rep ort: CSL-TR-95-658 January 1995 This work was supp orted by NASA under contract NAG2-842 and Hewlett- Packard under gift No. 23487. RYO:aVersatile Instruction Instrumentation To ol for PA-RISC by 1 Daniel F. Zucker and Alan H. Karp Technical Rep ort: CSL-TR-95-658 January 1995 Computer Systems Lab oratory Departments of Electrical Engineering and Computer Science Stanford University Stanford, California 94305-4055 Abstract RYO Roll Your Own is actually a family of novel instrumentation to ols for the PA- RISC family of pro cessors. Relatively simple awk scripts, these to ols instrumentPA-RISC assembly instruction sequences by replacing individual machine instructions with calls to user written routines. Examples are presented showing how to generate address traces by replacing memory instructions, and how to analyze oating p oint arithmetic by replacing oating p oint instructions. This pap er intro duces the overall structure and design of RYO, as well as giving detailed instructions on its use. Key Words and Phrases: instrumentation, address tracing, cache, arithmetic 1 Hewlett-Packard Lab oratories 1501 Page Mill Road Palo Alto, CA 94306 c Copyright 1995 by Daniel F. Zucker and Alan H. Karp Contents 1 What Is RYO 1 1.1 Overview : ::: ::: ::: ::: :: ::: ::: ::: ::: ::: :: ::: :: 1 1.2 Advantages and Disadvantages : :: ::: ::: ::: ::: ::: :: ::: :: 1 1.3 Implementation : ::: ::: ::: :: ::: ::: ::: ::: ::: :: ::: :: 3 1.3.1 Branch Delay Slot : ::: :: ::: ::: ::: ::: ::: :: ::: :: 3 1.3.2 Saving and Restoring Machine State and Parameter Passing ::: :: 6 2 How to Use RYO 7 2.1 Compiler Errors : ::: ::: ::: :: ::: ::: ::: ::: ::: :: ::: :: 7 2.1.1 BranchTargets ::: ::: :: ::: ::: ::: ::: ::: :: ::: :: 8 2.1.2 Indirect pro cedure calls : :: ::: ::: ::: ::: ::: :: ::: :: 8 2.2 Long Branches : ::: ::: ::: :: ::: ::: ::: ::: ::: :: ::: :: 8 2.3 O set.awk : ::: ::: ::: ::: :: ::: ::: ::: ::: ::: :: ::: :: 9 3 Example Data 9 3.1 Arithmetic : ::: ::: ::: ::: :: ::: ::: ::: ::: ::: :: ::: :: 10 3.2 Memory :: ::: ::: ::: ::: :: ::: ::: ::: ::: ::: :: ::: :: 10 4 Conclusion 11 5 Acknowledgments 11 A Source Co de for Ryols.awk 12 B An Example Instrumentation Routine 25 iii List of Figures 1 Simple Instruction Substitution : :: ::: ::: ::: ::: ::: :: ::: :: 2 2 Mistake in Control Flow for Branch Delay Slot ::: ::: ::: :: ::: :: 4 3 Unconditional Branch Instrumentation :: ::: ::: ::: ::: :: ::: :: 5 4 Conditional Branch Instrumentation ::: ::: ::: ::: ::: :: ::: :: 6 5 Instrumented Co de with `Stepping Stone' to Correct Long Branch Errors : 9 6 Op erand Length Histogram ::: :: ::: ::: ::: ::: ::: :: ::: :: 10 7 Data Cache Miss Rate ::: ::: :: ::: ::: ::: ::: ::: :: ::: :: 11 iv 1 What Is RYO RYO Roll Your Own is a family of novel instrumentation to ols for the PA-RISC family of pro cessors [3]. These to ols replace a sp eci c set of PA-RISC assembly instructions with user supplied subroutines. Because the user provides his own custom instrumentation routines, the use of the to ol is virtually unlimited. Its uses could range from replacing a faulty hardware divide instruction with a correct software substitute, to generating address trace les for later analysis. Like Pixie[5 ], RYO generates program analysis output by running an instrumented executable, yet RYO b ene ts from b oth a simpler implementation and greater exibilityby its use of arbitrary user supplied subroutines. This pap er intro duces the overall structure and design of RYO, as well as giving detailed instructions on its use. A sp eci c memb er of the RYO family,RYOLS Roll Your Own Load Store, is presented to illustrate correct program usage. The nal section shows actual exp erimental data obtained with two memb ers of the RYO family,RYOLS and RYOFP Roll Your Own Floating Point. 1.1 Overview RYOLS exp ects an assembly level, or .s, le for input. It makes two passes through the program and replaces memory op erations oating p oint op erations in the case of RYOFP with an unconditional jump to the RYOLS library calling stub. This stub saves the state of the machine, do es any necessary parameter passing, calls the prop er RYOLS library pro cedure, and nally unconditionally jumps back to the original execution sequence gure 1. The overall design simplicity is an attractive feature, yet actual implementation involves overcoming a numb er of p esky problems. For this purp ose, a numb er of small companion programs{ xtargets, o set.awk, and lbranch.awk{have also b een implemented. 1.2 Advantages and Disadvantages Using this relatively simple substitution strategy has a numb er of advantages and disadvantages. On the plus side, substituting a single branch instruction for the single memory instruction results in an exact one for one instruction substitution that leaves the relative p ositions of the instructions unchanged. Of course, co de must b e inserted to build the library calling stubs, but this co de is placed b etween pro cedures where it can do little harm. Keeping the relative instruction p osition constant not only eliminates the need to translate a branch to a target plus a constant o set to a corrected o set, but is also the only way to ensure correct indirect branching. In the case of indirect branching, the target address cannot b e known until runtime, so the o set cannot b e translated when doing the instrumentation. Another advantage is the use of high level C-language routines for instruction substitution. This makes it relatively simple to implement arbitrarily complex on-the- y analysis. Finally, b ecause the application co de is actually executed along with the extra analysis routines, veri cation is p erformed simply by comparing program output b efore and after instrumentation. For example, when instrumenting an mp eg deco der, many subtle bugs 1 original code: ... .SPACE $TEXT$,SORT=8 .SUBSPA $CODE$,QUAD=0,ALIGN=8,ACCESS=0x2c,CODE_ONLY,SORT=24 DoDitherImage .PROC .CALLINFO CALLER,FRAME=16,SAVE_RP,ARGS_SAVED .ENTRY STW %r2,-20(%r30) LDO 64(%r30),%r30 ADDIL LR'ditherType-$global$,%r27 ... modified code: ... .SPACE $TEXT$,SORT=8 .SUBSPA $CODE$,QUAD=0,ALIGN=8,ACCESS=0x2c,CODE_ONLY,SORT=24 DoDitherImage .PROC .CALLINFO CALLER,FRAME=16,SAVE_RP,ARGS_SAVED .ENTRY $n1 b,n $m1 ; STW %r2,-20(%r30) ; substituted instruction LDO 64(%r30),%r30 ADDIL LR'ditherType-$global$,%r27 ... procedure calling stub: $m1 stw %r2,156(0,%r30) ; save machine state bl $saveall,%r2 nop copy %r30,%r25 ;copy basereg to %r25 ; pass parameters bl .+12,%r26 ldo -3(%r26),%r26 STW %r2,-20(%r30) ; copy op to pass to ryo_word bl ryo_word,%r2 ; call instrumentation ldo 320(%r30),%r30 ; routine bl $restall,%r2 ; restore machine state ldo -320(%r30),%r30 STW %r2,-20(%r30) ; do mem op b,n $n1+4 ; return to normal ; program flow Figure 1: Simple Instruction Substitution 2 were noticed only b ecause the video image displayed did not lo ok quite the way it did when running the uninstrumented co de. The ma jor disadvantage of this to ol is that source co de is required to do the instrumentation. This relies on the compiler to correctly generate an intermediate assembly output which do es not always happ en. Furthermore, library calls and other pieces of co de dy- namically linked in to the executable will not b e instrumented. Finally, in the current implementation, no information is maintained concerning basic blo cks and cycles b etween instrumented instructions. This strategy was chosen to make implementation simpler in that the the high level instrumentation routine do es not knowany global information ab out the program state. It only knows that a given instruction was executed. These drawbacks could b e corrected in future revisions if necessary. 1.3 Implementation In order to cause a one for one instruction substitution, the basic strategy is to replace a targeted instruction with an unconditional jump to a stub, save the machine state, and unconditionally jump back to the instruction following the targeted instruction see gure 1. If the return address was not hard co ded at the time of instrumentation, it would b e necessary to store a return address in a register, and thereby alter the machine state. 1.3.1 Branch Delay Slot This works correctly for the most part, except that a problem arises when a load or store instruction falls in the branch delay slot of a preceding branch instruction. In this case, a branchwould b e substituted in the branch delay slot. The instruction instrumentation routine will b e correctly called as anticipated, but the nal branch will b e to the return p oint statically calculated at instrumentation time. Thus, the control ow will return to the second instruction following the branch rather than the correct branch target gure 2. Note that the branch placed in the branch delay slot of the initial branch correctly mo di es control owwhenanullifying branch is used. This problem is corrected as shown in gure 3. Now the branch instruction with a load or store following it is also targeted for replacement. It now has its own stub with an appropriate instrumentation call. This time, though, the nal instruction in the stub is not the load or store instruction as b efore, but instead the branch with the load or store in the delay slot. The nal branch returns to the second instruction past the branch, since the instruction immediately following was already executed in the delay slot. Furthermore, the initial instruction in the stub is the same branch followed by another branch in the delay slot that branches to the next sequential instruction. This is so to determine if the instruction in the delay slot would have b een nulli ed. If it was nulli ed, then the branch to the next sequential instruction will nowbenulli ed instead, and the branch will go to the intended target. If the instruction was not nulli ed, then the branch in the delay slot will b e executed, and the control ow will continue sequentially.

Load more