(19) &   

(11) EP 2 182 433 A1

(12) EUROPEAN PATENT APPLICATION published in accordance with Art. 153(4) EPC

(43) Date of publication: (51) Int Cl.: 05.05.2010 Bulletin 2010/18 G06F 9/38 (2006.01)

(21) Application number: 07768016.3 (86) International application number: PCT/JP2007/063241 (22) Date of filing: 02.07.2007 (87) International publication number: WO 2009/004709 (08.01.2009 Gazette 2009/02)

(84) Designated Contracting States: • AOKI, Takashi AT BE BG CH CY CZ DE DK EE ES FI FR GB GR Kawasaki-shi HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE Kanagawa 211-8588 (JP) SI SK TR Designated Extension States: (74) Representative: Ward, James Norman AL BA HR MK RS Haseltine Lake LLP Lincoln House, 5th Floor (71) Applicant: Fujitsu Limited 300 High Holborn Kawasaki-shi, Kanagawa 211-8588 (JP) London WC1V 7JH (GB)

(72) Inventors: • TOYOSHIMA, Takashi Kawasaki-shi Kanagawa 211-8588 (JP)

(54) INDIRECT BRANCHING PROGRAM, AND INDIRECT BRANCHING METHOD

(57) A processor 100 reads an interpreter program the indirect branch instruction in a link register 120c (the 200a to start an interpreter. The interpreter makes branch processor 100 internally stacks the branch destination prediction by executing, in place of an indirect branch addresses also in a return address stack 130) in inverse instruction that is necessary for execution of a source order and reading the addresses from the return address program 200b, storing branch destination addresses in stack 130 in one at a time manner. EP 2 182 433 A1

Printed by Jouve, 75001 PARIS (FR) 1 EP 2 182 433 A1 2

Description vised and brought into practical use. A principle under- lying the branch prediction is based on an idea of record- TECHNICAL FIELD ing branch results of a program executed in the past and predicting an outcome of branch as an extension of the [0001] The present invention is relates to an indirect 5 results (see, for example, Non-patent document 1). branch processing program and an indirect branch [0007] Branch instructions include not only ordinal processing method of causing a computer to execute branch instructions but also indirect branch instructions pseudo indirect branch instruction in place of indirect for taking branch based on an address stored in a regis- branch instruction. ter. Indirect branch instruction is frequently used for im- 10 plementation of the above-described interpreter and the BACKGROUND ART like. However, because indirect branch instruction is a high-cost instruction for a superscalar computer, predic- [0002] Conventionally, computers (CPU; Central tion methods focused on indirect branch instruction have Processing Unit, and the like) execute source code pro- also been studied (see, for example, Non-patent docu- grams written by person with compilation or interpretation 15 ment 2 and Non-patent document 3). mechanisms. Compilation is a technique of converting a [0008] A technique for applying a code written for a source code program into a computer- executable binary processor that handles only small address space to a format by using a conversion program, called a compiler, processor that handles large address space without us- and thereafter executing the code converted into the bi- ing indirect call is described in Patent document 1. nary format. Interpretation is a technique of executing a 20 [0009] Non-patent document 1: John L. Hennessy and special binary code, called an interpreter, on a computer David A. Patterson. "Computer Architecture-A Quantita- so that the interpreter translates a source code program tive Approach 3rd Eedition," MORGAN KAUFMANN in one at a time manner and performing operations ac- PUBLISHERS, ISBN 1-55860-724-2. cording to content as required. [0003] In recent years, we design computers based on 25 Non-patent document 2: P.-Y. Chang, E. Hao, and a technique called superscalar to increase processing Y. N. Patt. "Target Prediction for Indirect Jumps", In speed of them. This technique premises that processing Proceedings of 24th International Symposium on is performed with pipelining. In pipelining, processing Computer Architecture, pp. 274∼283, 1997. pertaining to each instruction is divided into a plurality of Non-patent document 3: K. Driesen and U. HAolzle, stages, which are independently processed by discrete 30 "Accurate Indirect Branch Prediction", In Proceed- units. Therefore, if the next instruction to be executed is ings of 25th International Symposium on Computer determined, execution of a first stage of the next instruc- Architecture, pp. 167∼178, 1998. tion can be performed simultaneously with execution of Patent document 1: Japanese Laid-open Patent a second stage of the preceding instruction, which allows Publication No. 2000-284965 high-speed execution by virtue of parallel processing with 35 parallelism corresponding to the number of divided stag- DISCLOSURE OF INVENTION es. [0004] Meanwhile, conditions for causing pipelining to PROBLEM TO BE SOLVED BY THE INVENTION function successfully include a condition that the next instruction to be executed is determined. More specifi- 40 [0010] The above-described conventional techniques cally, an instruction 2 to be executed following an instruc- are, however, disadvantageous in that even when the tion 1 (in a case where instructions are to be executed next instruction to be executed subsequent to an indirect in an order of the instruction 1, the instruction 2, an in- branch instruction written in a program is predicted based struction 3, ...) is not determined before a first stage of on branch history records, the probability that the pre- the instruction 1 completes, the next instruction 2 cannot 45 dicted instruction is the same as an instruction that is be executed in a parallel manner (the same goes for in- actually executed next is low. structions following the instruction 2). [0011] For such indirect branch instruction that has [0005] Such a circumstance typically occurs, for ex- branch destinations differing from execution to execu- ample, with branch instruction. In regard to branch in- tion, prediction cannot be made with high accuracy even struction, until a result of an immediately-preceding in- 50 with utilization of the techniques of Non- patent document struction for making conditional branch decision (for ex- 2 and Non-patent document 3. Indirect branch instruc- ample, compare instruction) is obtained, the next instruc- tions remain to be high-cost instructions for computers tion to be executed remains undetermined, which ob- that process programs by superscalar technique. structs pipeline operation. Therefore, processing speed [0012] The present invention has been conceived in of an entire computer is undesirably substantially re-55 view of the above circumstances and aims at providing duced. an indirect branch processing program and an indirect [0006] To prevent performance loss of pipeline oper- branch processing method by which highly accurate ation, a technique called branch prediction has been de- branch prediction can be made.

2 3 EP 2 182 433 A1 4

MEANS FOR SOLVING PROBLEM causes execution to transfer to a branch destination ad- dress in a general purpose register, an instruction that [0013] To solve the disadvantage and achieve the ob- causes the branch destination address to be stored in ject, an interpreter on a computer is caused to execute the call stack and execution to transfer to the address is a reading step of reading the source program stored in 5 used as a substitute. Therefore, processing performance a storage device, a step of translating the source program of the computer that executes the source program may and selecting a processing code according to content, be increased. and a step of calling the processing code selected by the [0019] According to the present invention, the branch code selecting step. To take a branch to the selected destination addresses in the indirect branch instruction code, the interpreter stores addresses of selected codes 10 are stored in a predetermined stack or a link register im- in a call stack in reverse order of processed order and plemented in the computer in inverse order to thereby performs call operations by using a pseudo indirect indirectly update a return- address prediction table inside branch instruction in place of utilizing the indirect branch the computer so that branch prediction is made success- instruction in the code selecting and call steps. fully when pseudo indirect branch is processed. This may [0014] According to the present invention, in the above 15 improve in accuracy in branch prediction of the computer invention, in the processing-code call step, a pseudo in- and increase processing performance of the computer direct branch code that includes, in place of an indirect that executes the source program. branch instruction that causes execution to transfer to a [0020] According to the present invention, a control branch destination address stored in a general purpose statement that involves branching is written in the source register, an instruction that causes the branch destination 20 program, and the pseudo indirect branch code in the in- address to be stored in the call stack and execution to terpreter includes an instruction that causes the branch transfer to the address is used as a substitute. destination addresses in the indirect branch instruction [0015] According to the present invention, in the above to be stored in the call stack in inverse order in each invention, the branch destination addresses in the indi- section before a condition determining portion of the con- rect branch instruction are stored in a predetermined25 trol statement is written. This allows to perform pseudo stack or a link register implemented in the computer to indirect branch more efficiently and at higher speed. thereby indirectly update a return-address prediction ta- ble inside the computer so that branch prediction is made BRIEF DESCRIPTION OF DRAWINGS successfully when pseudo indirect branch is processed. [0016] According to the present invention, in the above 30 [0021] invention, a control statement that involves branching is written in the source program, and the pseudo indirect [FIG. 1] FIG. 1 is a diagram for explaining branch branch code in the interpreter includes an instruction that processing. causes the addresses of the processing code to be stored [FIG. 2] FIG. 2 is a diagram for explaining multiple- in the call stack in inverse order in each section before 35 branch processing and indirect branch processing. a condition determining portion of the control statement [FIG. 3] FIG. 3 is a diagram for explaining difference is written. between compilation and interpretation. [FIG. 4] FIG. 4 is a flowchart depicting processes EFFECT OF THE INVENTION performed by a conventional interpreter. 40 [FIG. 5] FIG. 5 is a diagram for explaining pipelining. [0017] According to the present invention, a computer [FIG. 6] FIG. 6 is a diagram for explaining how func- is caused to execute a reading step of reading a source tion call is executed. program stored in a storage device, a step of translating [FIG. 7] FIG. 7 is a diagram illustrating an example the source program and selecting a processing code ac- of an interpreter that uses an indirect branch instruc- cording to content, and a step of calling the processing 45 tion. code selected by the code selecting step, thereby indi- [FIG. 8] FIG. 8 is a diagram illustrating an example rectly updating a return-address prediction table inside of an interpreter that uses a pseudo indirect instruc- the computer so that branch prediction is made by using tion. this prediction table during the processing- code call step. [FIG. 9] FIG. 9 is a functional block diagram depicting This allows to make branch prediction accurately and 50 the configuration of a processor according to a first complete taking branch to any desired address without embodiment. receiving penalty involved in the branching. Consequent- [FIG. 10] FIG. 10 is a diagram illustrating an example ly, operation equivalent to the indirect branch instruction of an interpreter that executes a plurality of indirect may be executed several to dozen times faster. branch instructions. [0018] According to the present invention, in the step 55 [FIG. 11] FIG. 11 is a diagram illustrating a program of calling the processing code that is to be executed by equivalent to "call %r6" given in FIG. 10. the computer, a pseudo indirect branch code that in- [FIG. 12] FIG. 12 is a diagram illustrating an example cludes, in place of an indirect branch instruction that of an interpreter that executes a plurality of pseudo

3 5 EP 2 182 433 A1 6

indirect branch instructions. cution encounters a branch instruction, an address of the [FIG. 13] FIG. 13 is a flowchart (1) depicting a proc- next instruction to be executed is changed to a specified ess procedure for the interpreter. (in FIG. 1, transfer to operation 3 or [FIG. 14] FIG. 14 is a flowchart (2) depicting the proc- operation 5 is to be made depending on condition). The ess procedure for the interpreter. 5 memory address and the condition pertaining to this con- [FIG. 15] FIG. 15 is a functional block diagram de- ditional decision instruction, or branch instruction, are picting the configuration of a processor according to fixed values that are encoded as a part of the instruction. a third embodiment. [0026] Meanwhile, there are some cases where it is desired to transfer execution to a plurality of branch des- EXPLANATIONS OF LETTERS OR NUMERALS 10 tinations depending on condition. FIG. 2 is a diagram for explaining multiple-branch processing and indirect [0022] branch processing. As depicted on the left-hand side of FIG. 2, implementation by arranging conditional deci- 100, 300 processor sions and branches of as many types as required is pos- 110, 310 memory-access control unit 15 sible; however, branches that involve a large number of 120, 320 register conditional decisions are less efficient. To this end, 120a, 320a general purpose register set processing called an indirect branch instruction may be 120b, 320b program counter employed. In this processing, a value in a register is used 120c, 320c link register as the next address to be executed. More specifically, it 130, 330 return address stack 20 is allowed to specify a previously calculated result itself 140, 340 branch prediction unit as an address of the next instruction. This allows a pro- 150, 350 decode and control unit gram by itself to calculate an address of a branch desti- 160, 360 arithmetic pipeline unit nation instruction and take indirect branch by using the 200, 400 main memory value, thereby taking branch to any desired operation 200a interpreter program 25 (see right-hand side of FIG. 2). 200b, 400b source program 400c pseudo indirect branch program Interpretation and compilation 200c, 400d saved data 400a compiler program [0027] Schemes for executing a program that runs on 30 a computer are broadly divided into two schemes. One BEST MODES FOR CARRYING OUT THE INVENTION is compilation and the other is the interpretation. In com- pilation, it is necessary to convert a program written in a [0023] Exemplary embodiments of an indirect branch source file into executable binary by using a compiler in processing computer program and an indirect branch advance. The executable binary converted by the com- processing method according to the present invention 35 piler is directly executed on a computer. are described in detail below with reference to the draw- [0028] In contrast, in interpretation, what is directly ex- ings. Note that the embodiments do not imply any restric- ecuted on a computer is software, called an interpreter. tions as to this invention. The interpreter translates a source file in one at a time [0024] Prior to descriptions of the embodiments ac- manner and performs operations according to descrip- cording to the present invention,conventional techniques 40 tion of the program. Therefore, a program is processed related to the present invention (indirect branch instruc- slowly as compared to compilation. However, it also has tion, interpretation and compilation, superscalar proces- an advantage that a same program can be utilized by sor and pipeline operation, branch prediction, returning computer systems even they have different designs so from a function, return address stack, interpreter support long as the computer systems have interpreters that fol- by hardware, Just In Time compilation) will be described 45 low a common specification. one by one. [0029] FIG. 3 is a diagram for explaining difference be- tween compilation and interpretation. As depicted on the Indirect branch instruction left-hand side of FIG. 3, in interpretation, what directly runs on hardware is basic software, or operating sys- [0025] In a program that runs on a computer such as 50 tems, device drivers, firmware or likethat, and interpreter. a CPU; Central Processing Unit, an instruction, called The interpreter runs while utilizing functions of the basic branch, is used to perform different operation depending software. A program is translated and executed on the on a result of still-in-progress processing. In a computer interpreter. Meanwhile, as depicted on the right-hand system, programs are arranged on memory in order and side of FIG. 3, in compilation, a program is complied to executed by a computer one after another in the arranged 55 be converted into an executable file. The executable file order. FIG. 1 is a diagram for explaining branching. In a is directly executed on hardware while utilizing functions case where a result of an immediately-preceding condi- of basic software. tional decision meets a particular condition, when exe- [0030] An interpreter itself is written as a single piece

4 7 EP 2 182 433 A1 8 of software. The interpreter basically operates such that [0036] In regard to branch instruction, the next instruc- the interpreter repeatedly translates a program in one at tion to be executed remains undetermined until a result a time manner and performs processing according its of an immediately-preceding instruction for making con- content. FIG. 4 is a flowchart depicting processes per- ditional branch decision is obtained. This causes pipeline formed by a conventional interpreter. An attention is de- 5 to stall, which degrades overall performance by a large sirably focused on branch processing "branch to opera- degree. A technique for avoiding this is a technique called tion according to instruction" at the center. It is clear from branch prediction. Branch prediction, which will be de- FIG. 4 that this branch involves jumps to a plurality of scribed in detail later, is described briefly. This is a operations (for example, it is required to jump to any one scheme of predicting a result of conditional decision in of operation A, operation B, ... , operation Y, and opera- 10 advance, speculating the next instruction to be executed tion Z), making it necessary to execute indirect branch based on the prediction even when a result of comparison as this branch. Accordingly, operation speed of the inter- is not determined at a time of branch instruction, and preter largely depends on processing efficiency of this advancing processing based on the speculation. indirect branch. [0037] By using this scheme, in a case where predic- 15 tion accuracy is sufficiently high, degradation in perform- Superscalar processor and pipelining ance due to branching is prevented. If prediction has turned to be wrong, influence on an outcome of execution [0031] Inner workings of today’s high-performance may be cancelled by discarding operations pertaining to computers are designed based on a technique called su- subsequent instructions, restoring to a condition before perscalar. This technique premises that processing is 20 an instruction is executed based on assumption, and ex- performed via pipelining. In pipelining, processing per- ecuting correct instruction set in renewed manner. De- taining to each instruction is divided into a plurality of tails on superscalar processors are described in Docu- stages, which are independently processed by discrete ment [2] (Mike Johnson. "Superscalar Microprocessor units. Design," Prentice-Hallm Inc, ISBN 0138756341.). De- [0032] FIG. 5 is a diagram for explaining pipelining. In 25 tails on techniques about computers in general are de- FIG. 5, the horizontal axis indicates flow of time and the scribed in Document [1] (John L. Hennessy, and David vertical axis indicates instruction flow. In the example de- A. Patterson. "Computer Architecture- A Quantitative Ap- picted in FIG. 5, instruction processing is divided into five proach 3rd Edition," MORGAN KAUFMANN PUBLISH- stages that are IF, ID, EX, MEM, and WB. The number ERS, ISBN 1-55860-724-2.). of stages is likely to increase in latest- model processors 30 and division into ten-odd stages is not unusual. Branch prediction [0033] A unit that processes IF (IF unit) is discussed below. This IF unit performs an operation corresponding [0038] As set forth hereinabove in section "Supersca- to IF in a particular instruction and outputs its result to lar Processor and Pipelining," making highly accurate the subsequent ID unit. Accordingly, during execution of 35 branch prediction is an important requirement for a su- ID in the next cycle, there is no operation to be performed perscalar computer. However, only prediction that can by the IF unit for the instruction. Similarly, in the next be made prior to a result of a conditional decision in an cycle in which an EX unit performs operation, there is no immediate manner is useful. Furthermore, implementa- operation to be performed by an IF unit and an ID unit. tion by means of hardware prevents making prediction [0034] To utilize this, the IF unit is configured to exe- 40 by using large-scale database. Accordingly, schemes cute IF in the next instruction in the next cycle. Even when that allow prediction as highly accurate as possible by it takes five cycles to complete processing on a single using limited resources have been devised. instruction, dividing and executing processing on follow- [0039] A principle underlying the branch prediction is ing instructions in parallel in this manner allows a single based on an idea of recording branch results of a program instruction to complete every cycle in average in a long 45 executed in the past and predicting a branch result as an span of time. This technique that divides operations each extension of the results. It is assumed that simple pre- of which requires N cycles to complete and executes diction that is made by storing results of several branches them in parallel as set forth hereinabove, thereby achiev- in the past and only according to a trend of the past ing one cycle of apparent processing time, is called branches achieves prediction accuracy of 90% or higher. pipelining. 50 [0040] However, it is practically difficult to provide in- [0035] Note that conditions for causing pipelining to dividual history records of each of all branch instructions function successfully include a condition that the next in advance, and a method of referring to history record instruction to be executed is determined. More specifi- by using low-order bits of an instruction address of a cally, the next instruction to be executed following the branch instruction as an index is employed. instruction 1 is not determined until the instruction 1 com- 55 In this case, there can arise an unfortunate occasion pletes, it is not allowed to start the next instruction in where same history record is shared by a plurality of parallel. Such a circumstance typically occurs, for exam- branch instructions, which results in failure in making ac- ple, with branch instruction. curate prediction.

5 9 EP 2 182 433 A1 10

[0041] As a scheme for increasing accuracy, a method processing. of utilizing correlation between branches has been de- [0046] Generally, a function can be called from any vised. This utilizes a fact that a result of particular branch location in a program, and the next operation to which correlates with result of branch instruction taken before execution transfer when the function has been executed reaching the particular branch instruction. These are5 is the next instruction beyond a location of a caller. Put called global branch predictions and allow prediction un- another way, returning from a function involves branch der a system of approximately 95%. Various schemes, processing with a plurality of branch destinations. For such as combined use of a plurality of prediction mech- this purpose, special instructions are provided in many anisms, have been devised to improve accuracy of cases; an instruction called return instruction (return in- branch prediction. Prediction mechanisms are described 10 struction) is typically defined. in detail from section 3.4 of Document [1]. Branch pre- [0047] A return instruction requires linking associated diction is also explained in Wikipedia Japanese version therewith. In many computers, a branch with link instruc- "Bunki-Yosoku,"at section [3] (see "Bunki- Yosoku," Wiki- tion, called "Jump and Link," is used when calling a func- pedia, Japanese version, http://ja.wikipedia.org/). tion. This is an instruction that takes branch while simul- [0042] Meanwhile, these branch prediction methods 15 taneously saves an address of a currently-executed in- are focused only on ordinary branch instruction. Put an- struction in a specific location. When a return instruction other way, they do not provide prediction means effective is called, an address of a caller is specified by using an for such an indirect branch instruction that has a plurality instruction address that is saved at this time and jump to of branch destinations. Examples of researches on pre- the caller is made again. Because functions are called in diction methods focused on indirect branch include Doc- 20 a nested manner, this save location is stored in a re- ument [4] (P.-Y. Chang, E. Hao, and Y. N. Patt. "Target source having a stack structure in many cases. Alterna- Prediction for Indirect Jumps", In Proceedings of 24th tively, a nested call can be handled by temporarily saving International Symposium on Computer Architecture, pp. an address in a specific location and saving the tempo- 274∼283, 1997.) and Document [5] (K. Driesen and U. rarily-saved address again by means of software before HAolzle. "Accurate Indirect Branch Prediction", In Pro- 25 the address is overwritten by the next linking. ceedings of 25th International Symposium on Computer Architecture, pp. 167∼178, 1998.). Return address stack [0043] The Intel(R) Pentium(R) M is a processor that includes indirect among commercial [0048] The linking and returning are implemented by processors. Mechanism of this prediction is described in, 30 a specific instruction in many cases, and calling a function for example, Document [6] (S. Gochman, R. Ronen, I. and specifying a return location may be done inside a Anati, A. Berkovits, T. Kurts, A. Naveh, A. Saeed, Z. Sper- processor. A processor in many cases internally includes ber, and R. C. Valentine. "The Intel Pentium M Processor: a branch prediction mechanism specialized for returning Microarchitecture and Performance", Intel Technology so as to minimize cost of branch to be spent on the re- Journal, 7(2):21∼36, 2003.). 35 turning. All the currently- used commercial processors in- [0044] Other than branch prediction, a method of for- clude this prediction mechanism without exception. warding branch target addresses in cooperation with a [0049] This mechanism, called return address stack, compiler to avoid receiving penalty without making pre- stacks a caller in an internal return address stack at link- diction has also been studied (Document [7] (Toyoshima ing and makes branch prediction by using an address Takashi, Irie Hidetsugu, Goshima Masahiro, and Sakai 40 called from this stack at returning. This return address shuichi. "Register Indirect Jump Target Forwarding", stack is a special storage area inside the processor for SACSIS 2006)). However, not many commercial proc- use in prediction only unlike a return- address save loca- essors include prediction mechanism that gives consid- tion that is apparent in a program and likely to be written eration to indirect branch. Even those equipped with such in an external specification. Examples of results of stud- prediction mechanism have failed to make more accurate 45 ies on the return address stack include technical com- prediction than that ordinary branch instruction. Accord- mentary in Document [8] (C.F.Webb, . "call/ ingly, indirect branch instruction is high-cost instruction return stack", IBM Technical Disclosure Bulletin, 30 (11): for superscalar processors. 18∼20, 1988.), Document [9] (D.R.Kaeli and P.G.Emma. "Branch History Table Prediction of Moving Target Returning from a function 50 Branches Due to Subroutine Returns", In Proceedings of 18th International Symposium of Computer Architecture, [0045] Examples of processing that requires branch of pages 34∼42, 1991.), and Document [10] (K.Skadron, a special form include a "function call" function. FIG. 6 is P.S.Ahuja.M.Martonosi, and D. W.Clark. "improving Pre- a diagram for explaining how function call is executed. diction for Procedure Return with Return-Address- Stack The diagram depicts an example in which a function A 55 Repair Mechanism", In Proceedings of 31st International is called from main processing and a function B is called Symposium on Microarchitecture, pages ∼ 259271, from the function A. It is clear from FIG. 6 that the function 1998.). A is called from a plurality of locations in the main

6 11 EP 2 182 433 A1 12

Support for interpreter by hardware scribed. Overview and features of a processor according to the first embodiment will be described first. An inter- [0050] A hardware support mechanism can be provid- preter according to the first embodiment is implemented ed on a computer as a mechanism for high-speed exe- so as to, even in a circumstance where an indirect branch cution of an interpreter. Examples of such a mechanism 5 instruction is necessary for execution of a source pro- include SmartMIPS(R) of MIPS Technologies, Inc. (Doc- gram in one at a time manner, execute a pseudo indirect ument [11] (MIPS Technologies Inc., "MIPS32 (R) Archi- branch instruction in place of the indirect branch instruc- tecture for Programmers Volume IV-d: The SmartMIPS tion. (R) Application-Specific Extension to the MIPS32 (R) Ar- [0056] The interpreter according to the first embodi- chitecture",Document Number: MD00101 Revision 2.50, 10 ment thus executes, even in a circumstance where an July 1,2005.)). indirect branch instruction is necessary for execution of [0051] This is one of extensions for MIPS32(R) archi- a source program in one at a time manner, the pseudo tecture and is a specification targeted for embedded sys- indirect branch instruction in place of the indirect branch tem field. This extension is broadly divided into four pil- instruction. Therefore, the processor that executes the lars: arithmetic accelerator for cryptography, optimization 15 interpreter may make branch prediction accurately and of instruction density, interpreter support, and resource complete taking branch to any desired address without protection. Among them, overview of interpreter support receiving penalty involved in the branching. Consequent- is described in section 3.4.3 and specific instructions of ly, operation equivalent to the indirect branch instruction the same is described in section 5.1 of Document [11]. may be executed several to dozens times faster. [0052] As to this extension, CISC-like instructions are 20 [0057] The first embodiment is discussed with use of defined such that innermost loop may be implemented the interpreter, but not limited thereto, and other program with use of minimum number of instructions. CISC is a may be used as a substitute. computer architecture that has been mainstream before [0058] An indirect branch instruction and a pseudo in- advent of superscalar and characterized by a fact that, direct branch instruction, which is used in place of the in contrast to RISC (Reduced Instruction Set Computers) 25 indirect branch instruction, will be described based on that have become mainstream today, each instruction is comparison therebetween. FIG. 7 is a diagram illustrating highly functional (Complex Instruction Set). CISC and an example of an interpreter that employs an indirect RISC are explained in section 2.16, entitled "Reduced branch instruction. FIG. 8 is a diagram illustrating an ex- Instruction Set Computers" of Document [1]. ample of an interpreter that employs a pseudo indirect 30 instruction. Just In Time compilation [0059] In FIG. 7, "call %r6" represents the indirect branch instruction. More specifically, executing "call %r6" [0053] As a scheme for high- speed implementation of causes an address stored in a program counter to be an interpreter by means of software, there is a technique stored in a link register, and thereafter causes execution called Just In Time compilation. This system operates as 35 to transfer to a branch destination address stored in a an interpreter first to translate and execute a program general purpose register "r6" that is implemented in the while simultaneously produces simple statistical informa- processor. This serves as an indirect branch instruction tion about operation of the program. An entire portion of because the branch destination varies depending on the the program that is determined as Hot Spot (frequently address stored in the register "r6." Such an indirect executed portion) based on the statistical information is 40 branch instruction has failed to execute the next instruc- subjected to program translation at this point in time to tion to be executed following the branch instruction with be converted into directly-computer-executable instruc- high accuracy because branch destination addresses tions set. stored in the general purpose register cover a wide va- [0054] Each time processing encounters a portion cor- riety. responding to the instruction from that time or later, the 45 [0060] In contrast, "more %link_reg=%r6" and "ret" of interpreter stops operation, and control is moved to the FIG. 8 represent the pseudo indirect branch instruction. directly-executable instruction set. This scheme may be More specifically, the interpreter according to the first em- considered as combination of interpretation and compi- bodiment uses, even when it is necessary to use such lation and described as a technique that may be used the indirect branch instruction "call %r6" as given in FIG. only when benefit of speedup brought by direct execution 50 7, such the pseudo indirect branch instruction as "movie is compared to cost of compilation in terms and deter- %link_reg=%r6" and "ret" given in FIG. 8 in place of the mined to sufficiently outweigh the cost. The benefit of indirect branch instruction. speedup is affected by complexity of the optimization in [0061] Executing "move %link_reg=%r6" of FIG. 8 compiling. Thus, there is a tradeoff with many aspects. moves a branch destination address stored in the general This scheme requires a computer performance and large 55 purpose register "r6" to a register (hereinafter, "link reg- memory and hence is generally utilized in a large-scale ister"), which is implemented in the processor, that stores system in many cases. First embodiment therein a branch source address. Although the next "ret" [0055] Subsequently, a first embodiment will be de- is an instruction for return to a branch source that is stored

7 13 EP 2 182 433 A1 14 in the link register, in the present embodiment, the branch particularly closely related to the present invention. destination address is stored therein. Therefore, execu- [0069] Among them, the general purpose register set tion is transferred to the branch destination address. 120a is a register set that stores therein data, branch [0062] Each time an address is stored in the link reg- destination addresses, and the like for use in execution ister, the address is stacked in order in a return address 5 of various programs (such as the interpreter program stack (for description about the return address stack, re- 200a). In the example depicted in FIG. 9, the general fer to the above) implemented in the processor. When purpose register set includes general purpose registers returning after the branch with link instruction has been r0 to rN (N is an integer). executed, the processor fetches the branch destination [0070] The program counter 120b is a register that address stacked in the return address stack (addresses 10 stores therein a stored location address of the next in- are extracted in the order in which they are stacked new- struction to be executed. The link register 120c is a reg- ly), which allows accurate prediction of a return destina- ister that stores therein a branch source address. The tion address. link register 120c may cooperate with the main memory [0063] The configuration (an example of the configu- 200 to form a pseudo stack. For example, when a new ration) of the processor according to the first embodiment 15 address B is to be stored in the link register 120c in a will be described. FIG. 9 is a functional block diagram state where an address A is stored in the link register depicting the configuration of the processor according to 120c, the address A is saved in the saved data 200c in the first embodiment. As depicted in the drawing, this the main memory 200. Put another way, the link register processor 100 is configured to include a memory- access 120c saves an address in the saved data 200c each time control unit 110, a register 120, a return address stack 20 the link register 120c stores therein a new address. 130, a branch prediction unit 140, a decode-and- control [0071] When, in a state where the address B is stored unit 150, and an arithmetic pipeline unit 160. in the link register 120c and the address A is saved in [0064] The processor 100is connected to a main mem- the saved data 200cin the main memory200, the address ory 200 that stores therein various data pieces and pro- A is fetched from the link register 120c, operation of re- grams and reads the various data pieces and the pro- 25 turning the address B saved in the saved data 200c into grams stored in the main memory 200 to perform various the link register 120c is performed. Put another way, a operations in one at a time manner. saved address is returned into the link register 120c each [0065] The main memory 200 is a storage device that time an address stored in the link register 120c is fetched stores therein the various data pieces and the programs, (saved addresses are returned into the link register 120c such as an interpreter program 200a, a source program 30 in the order in which they are saved newly). 200b, and saved data 200c that are particularly closely [0072] The return address stack 130 is a register that related to the present invention. stores therein the same address as the address stored [0066] Among them, the interpreter program 200a is a in the link register 120c in order each time a branch program for reading and executing the source program source address is stored in the link register 120c. For 200b in one at a time manner. The processor 100 starts 35 example, when addresses are stored in the link register the interpreter by reading the interpreter program 200a 120c in the following order: the address A, the address stored in the main memory 200 to thereby execute the B, the address C, ..., the addresses are stored in the re- source program 200b in one at a time manner. The in- turn address stack 130 in the following order: the address terpreter is implemented so as to execute, even when it A, the address B, the address C, .... Note that the return is necessary to perform indirect branch in the process of 40 address stack 130 is not updated when a value read from executing the source program 200b in one at a time man- the main memory 200 is stored in the link register 120c. ner, a pseudo indirect branch instruction in place of an The reason for this is that because readout from the main indirect branch instruction. memory 200 is an operation of returning a saved branch [0067] For example, the interpreter, which is started source address into the link register 120c, updating the by the processor 100, is implemented so as to use, even 45 return address stack 130at this timedisrupts correspond- when an indirect branch instruction "call %rN (N is an ence between storing and fetching. integer equal to or larger than 0)" (see FIG. 7) is neces- [0073] The addresses stored in the return address sary for execution of the source program 200b, a pseudo stack 130 are extracted in the order in which they are indirect branch instruction "more %link_reg=%rN" and stored newly, by the branch prediction unit 140 and sub- "rest" (see FIG. 8) in place of the indirect branch instruc- 50 jected to branch prediction at the time of a return instruc- tion rather than executing the indirect branch instruction. tion after execution has transferred to a subroutine. [0068] Subsequently, the configuration of the proces- [0074] The branch prediction unit 140 is a device that sor 100 will be described. The memory-access control reads a branch destination address stored in the return unit 110 is a processing unit that controls data input and address stack 130 to make branch prediction at the time output to and from the main memory 200. The register 55 of a return instruction. More specifically, the branch pre- 120 is a storage device that stores therein various data diction unit 140 fetches the branch destination address pieces, such as a general purpose register set 120a, a stored in the return address stack 130 in first-in, first- out program counter 120b, and a link register 120c that are manner and determines the thus-fetched branch desti-

8 15 EP 2 182 433 A1 16 nation address as a branch destination address for use ration of a processor and a main memory according to during the return instruction. the second embodiment are similar to those of the proc- [0075] The decode-and-control unit 150 is a device essor 100 and the main memory 200 described in the that fetches a result of branch prediction from the branch first embodiment, the second embodiment will be de- prediction unit 140, sequentially reads instructions stored 5 scribed with reference to FIG. 9 as required. in the main memory 200 via the memory-access control [0079] An interpreter that executes a plurality of indi- unit 110 and the register 120, determines operation to rect branch instructions will be taken as an example first. be executed, and outputs a result of determination, which FIG. 10 is a diagram illustrating an example of the inter- is a control instruction, to the arithmetic pipeline unit 160 preter that executes a plurality of indirect branch instruc- to thereby cause the arithmetic pipeline unit 160 to per- 10 tions. Instructions given in FIG. 10 will be described one form various computations. The arithmetic pipeline unit by one. Executing "move %r6=Func_a" stores an ad- 160 is an arithmetical device that fetches a control in- dress where a function "Func_a" is to be started in the struction from the decode-and-control unit 150 and per- general purpose register "r6." Executing "move forms a plurality of operations in parallel by superscalar %r7=Func_b" stores an address where a function "Func_ technique. If the instruction fetched from the main mem- 15 b" is to be started in the general purpose register "r7." ory 200 is a return instruction, the decode-and-control [0080] Executing "call %r6" transfers execution to the unit 150 calls a return address from the return address address stored in the general purpose register "r6," which stack 130 to make branch prediction and reads the next causes the function "Func_ a" to be executed. Executing instruction to be executed from the main memory 200 "call %r7" transfers execution to the address stored in based on the prediction. A control instruction for process- 20 the general purpose register "r7," which causes the func- ing the return instruction is sent to the arithmetic pipeline tion "Func_b" to be transferred. Meanwhile, "Func_ a:" to unit 160. The arithmetic pipeline unit 160 fetches the con- "rest" are codes for the function "Func_a" while "Func_ trol instruction from the decode control unit 150, reads b:" to "rest" are codes for the function "Func_b." the return instruction, or the address stored in the link [0081] Operation equivalent to "call %r6" given in FIG. register 120c, and notifies the decode-and-control unit 25 10 will be described here. FIG. 11 is a diagram illustrating 150 that this address is the address of the next instruction a program equivalent to "call %r6" given in FIG. 10. As to be fetched and translated. illustrated in the drawing, the program equivalent to "call [0076] By inserting a sufficiently large number of in- %r6" contains "move link_=%pc" and "jump %r6." structions between "move" instruction and "ret" instruc- [0082] The "move link_ =%pc" is an instruction for stor- tion, which are the instructions that make up the pseudo 30 ing an address stored in the program counter 120b in the indirect branch depicted in FIG. 8, a value written in the link register 120c (for the program counter 120b and the register 120 may be stacked in the return address stack link register 120c, refer to FIG. 9). The instruction "jump 130. This leads to making correct prediction of a return %r6" causes transfer to the address stored in the general destination for execution of a return instruction. purpose register "r6" (in the example of FIG. 10, the ad- [0077] As set forth hereinabove, the processor 100 ac- 35 dress of the function "Funk_ a") to occur. Put another way, cording to the first embodiment reads the interpreter pro- Call instruction is an indirect-branch- with-link instruction gram 200a stored in the main memory 200 to start the that executes linking and indirect branch processing si- interpreter, in which the interpreter program 200a is im- multaneously. plemented so as to store, even in a circumstance where [0083] The interpreter of the second embodiment ex- an indirect branch instruction is necessary for the inter- 40 ecutes pseudo indirect branch instructions in place of preter program 200a to execute the source program such the plurality of indirect branch instructions as given 200b, a branch destination address in the link register in FIG. 10 and FIG. 11. FIG. 12 is a diagram illustrating 120c by using the pseudo indirect branch instruction in an example of the interpreter that executes a plurality of place of the indirect branch instruction. Therefore, accu- pseudo indirect branch instructions. rate branch prediction may be made and branch taking 45 [0084] Referring to FIG. 12, executing "move to any desired address may be completed without receiv- %r6=Func_a" stores an address where the function ing penalty involved in the branching. "Func_a" is to be started in the general purpose register "r6," and executing "move %r7=Func_b" stores an ad- Second embodiment dress where the function "Func_ b" is to be started in the 50 general purpose register "r7." [0078] Although the first embodiment has been dis- [0085] Executing "move %link_reg=%r7" stores the cussed by way of an example in which the interpreter address stored in the general purpose register "r7" in the executes the pseudo indirect branch instruction in place link register 120c. Executing "store (%r1)=%link_reg" of the single indirect branch instruction, in a second em- saves the address stored in the link register 120c in a bodiment, an example in which when a plurality of indirect 55 predetermined area in the main memory 200. branch instructions are required, pseudo indirect branch [0086] Meanwhile, "move %link_reg=%r7" is preproc- instructions are executed in place of the plurality of indi- essing for the pseudo indirect branch that replaces the rect branch instructions will be described. The configu- indirect branch instruction "call %r7" given in FIG. 10.

9 17 EP 2 182 433 A1 18

The processor 100 executes "move %link_reg=%r7" to of the plurality of indirect branch instructions (see FIG. write the address stored in the general purpose register 10 and FIG. 11) when the interpreter executes the source "r7" in the link register 120c so that the address specified program 200b. Put another way, the processor 100 can as a return destination address in the general purpose execute the instructions without performing indirect register "r7" is stored in the return address stack 130 5 branching. inside the processor 100. This is a behavior for internal [0093] More specifically, in a case where the interpret- use by the processor 100 and is a behavior that is hidden er is started by the processor 100, the interpreter accord- from the outside. ing to the present embodiment is implemented so as to [0087] Subsequently, executing "movie %link_ perform preprocessing for such pseudo indirect branch reg=%r6" stores the address stored in the general pur- 10 instructions that extract a plurality of indirect branch in- pose register "r6" in the link register 120c. Meanwhile, a structions (for example, Call instruction) that are neces- binary code that executes the function "Func_a" con- sary for execution of the source program 200b in inverse tains, at its end, "load %link_reg=(%r1)" that stores the order (in the example given in FIG. 10, "call %r7" and address saved in the predetermined area in the main "call %r6" in this order) and store branch destination ad- memory 200 in the link register 120c and "ret" that returns 15 dresses associated with the extracted indirect branch in- execution to the previous function. Meanwhile, "Func_ structions in inverse order (in the example given in FIG. b:" to "ret" are codes for the function "Func_b" that in- 10, "address of Func_b" and "address of Func_ a" in this cludes similar codes to those of "Func_a:" to "ret." order). [0088] Executing "store (%r1)=%link_reg" immediate- [0094] Operations to be performed collectively as pre- ly saves the address written in the link register 120c in 20 processing for the replacement of the indirect branch in- the main memory 200. Executing "move %link_ reg=%r6" struction may be in a unit of a control statement (such as saves the address specified in the general purpose reg- a conditional statement, a Switch statement, and a virtual ister "r6" in the link register 120c. function call) that involves branching in the source pro- [0089] Execution of the instructions described above gram. by the processor undesirably causes the address stored 25 [0095] Subsequently, a process procedure for the in- in the link register 120c to be overwritten; however, this terpreter to be started by the processor 100 according to does not cause the return destination addresses to be the second embodiment by reading the interpreter pro- overwritten because the return address stack 130 has, gram 200a stored in the main memory 200 will be de- as its name implies, a stack structure. The immediately scribed. FIG. 13 and FIG. 14 are flowcharts depicting the following return destination address is the address stored 30 process procedure for the interpreter. in the general purpose register "r6" and the next return [0096] As depicted in the drawings, the interpreter sets destination address is the address stored in the general a starting position of the program (source program) as a purpose register "r7." "program-loading start position" (Step S101), sets the [0090] Thereafter, a return instruction transfers execu- "program-loading current position" to a "program-loading tion to an instruction address specified by a current value 35 start position" (Step S102), and loads and translates a (corresponding to the value in the general purpose reg- portion of the program corresponding to one process ister "r6") in the link register 120c. This is a portion where from the "program-loading current position" (Step S103). the pseudo indirect branching is performed by using the [0097] The interpreter causes the "program-loading return instruction. Subsequently, at the end of the func- current position" to advance by the one process (Step tion, the saved address is restored again in the link reg- 40 S104), and determines whether or not the translated in- ister 120c and a return instruction is executed. struction is branch processing (Step S105). If the trans- [0091] The reason for restoring the value into the link lated instruction is not branch processing (Step S106, register 120c here is that the addresses stacked in the No), execution transfers to Step S103. returnaddress stack 130 are information for use in branch [0098] On the other hand, if the translated instruction prediction only. If return is executed without restoring the 45 is branch processing (Step S106, Yes), the "program- value in the link register 120c, execution moves to the loading current position" is set to a "program-loading stop current value in the link register 120c, or a starting line position" (Step S107), an address stored in the link reg- of "Func_a," simultaneously causing prediction to go ister 120c is saved in a stack in the main memory 200 wrong. Although the second branch address is stored in (Step S108), and a portion of the program corresponding the link register 120c immediately before call (i.e., return), 50 to one process is loaded and translated from the "pro- stacking in the return address stack 130 has been col- gram-loading current position" (Step S109). lectively performed before "Func_ a" is called. Therefore, [0099] Subsequently, the interpreter is scanned for a branch prediction by using the return address stack 130 function that executes operation corresponding to the is to be made before the decode-and-control unit 150 translated instruction (Step S110), an address of the fetches an instruction, which allows a correct outcome. 55 thus-obtained function is stored in the link register 120c [0092] Meanwhile, the processor 100 reads the inter- (Step S111), and the "program-loading current position" preter program 200a to start the interpreter and uses the is retreated by one process (Step S112). pseudo indirect branch instructions (see FIG. 12) in place [0100] The interpreter determines whether or not the

10 19 EP 2 182 433 A1 20

"program-loading current position" is equal to the "pro- bodiment have been described by way of example where gram-loading start position" (Step S113). If it is deter- the interpreter uses the pseudo indirect branch instruc- mined they are not equal to each other (Step S114, No), tion in place of execution of the indirect branch instruc- execution transfers to Step S108. tion, such an execution scheme may be applied to a vir- [0101] If it is determined that the "program- loading cur- 5 tual machine. More specifically, even when a virtual ma- rent position" is equal to the "program-loading start po- chine (Java (trademark registered) or the like) requires sition" (Step S114, Yes), the "ret" instruction is executed an indirect branch instruction for execution of a binary, to take branch to the function stored in the link register branch prediction may be made successfully by execut- 120c (Step S115). ing a pseudo indirect branch instruction in place of the [0102] The interpreter performs the operation corre- 10 indirect branch instruction to cause the processor 100 to sponding to the one process of the program with a func- store branch destination addresses in the return address tion of a branch destination (Step S116), determines stack 130. Third embodiment whether or not the function involves branch processing [0107] In the first and second embodiments, the inter- (Step S117). If branch processing is not to be performed preter utilizes the pseudo indirect branch instruction in (Step S118, No), the interpreter fetches one value that 15 place of the indirect branch instruction when executing is at the end of the function of the branch destination and a source program in one at a time manner; however, it that has been in the link register 120c and saved in the is possible to cause a compiler to generate a code that stack in the main memory 200 into the link register (Step utilizes a pseudo indirect branch instruction in place of S119), and execution transfers to Step S115. an indirect branch instruction. [0103] If the function involves branch processing (Step 20 [0108] More specifically, it is possible to speedup op- S118, Yes), a function corresponding to branch process- eration equivalent to the indirect branch instruction by ing is executed (Step S120), a position advanced from causing the compiler to compile the source program to the "program-loading stop position" by one process is set generate a program (hereinafter, referred to as "pseudo as the "program-loading start position" (Step S121), and indirect branch program") that includes the pseudo indi- execution transfers to Step S120. 25 rect branch instruction in place of the indirect branch in- [0104] The interpreter is implemented so as to execute struction and causing the processor to execute the pseu- the pseudo indirect branch in place of the indirect branch do indirect branch program. instructions in this manner. Therefore, speedup of loops [0109] Subsequently, the configuration of the proces- most frequently executed by the interpreter is expected, sor according to a third embodiment will be described. and performance of not only the indirect branch instruc- 30 FIG. 15 is a functional block diagram depicting the con- tion alone but also the entire program is likely to increase figuration of the processor according to the third embod- greatly. iment. As depicted in the drawing, this processor 300 is [0105] As set forth hereinabove, the processor 100 ac- connected to a main memory 400 and configured to in- cording to the second embodiment starts the interpreter clude a memory-access control unit 310, a register 320, by reading the interpreter program 200a and utilizes, in 35 a return address stack 330, a branch prediction unit 340, place of the indirect branch instructions that are neces- a decode-and-control unit 350, and an arithmetic pipeline sary for execution of the source program 200b, the pseu- unit 360. do indirect branch instructions that store branch destina- [0110] Meanwhile, because the memory-access con- tion addresses in the indirect branch instructions in the trol unit 310, the register 320, the return address stack link register 120c in inverse order and take branch by 40 330, the branch prediction unit 340, the decode-and- con- using the return instruction (the processor 100 internally trol unit 350, the arithmetic pipeline unit 360 correspond automatically stacks a value stored in the link register to the memory-access control unit 110, the register 120, 120c in the return address stack 130). Therefore, it is the return address stack 130, the branch prediction unit possible to make branch prediction highly accurately and 140, the decode-and-control unit 150, and the arithmetic complete taking branch to any desired address without 45 pipeline unit 160, as depicted in FIG. 9, descriptions are receiving penalty involved in the branching. Consequent- omitted. ly, operation equivalent to the indirect branch instructions [0111] The main memory 400 is a storage device that may be executed several to dozen times faster. Regis- stores therein various data pieces and programs and tering branch destination addresses collectively in the stores therein a compiler program 400a, a source pro- link register 120c allows to register the branch destination 50 gram 400b, a pseudo indirect branch program 400c, and addresses in the link register 120c prior to actual branch saved data 400d. processing with sufficient lead time and complete regis- [0112] Among them, the compiler program 400a is a tering, which is performed by the processor 100, the data program that may be executed by the processor 300 for registered in the link register 120c in the return address compilation. The compiler, when started by the processor stack 130 before the pseudo indirect branch is executed 55 300, complies the source program 400b and generates by using the return instruction. Therefore, branch predic- the pseudo indirect branch program 400c. tion can be made in time (i.e., correct prediction is made). [0113] When it is necessary to convert an instruction [0106] While the first embodiment and the second em- of the source program 400b into the indirect branch in-

11 21 EP 2 182 433 A1 22 struction (see FIG. 7, FIG. 10) during compilation of the [0119] The programs (the interpreter program 200a, source program 400b, the processor 300 that has started the source program 200b, 400b, the compiler program the compiler performs conversion into the pseudo indirect 400a, and the pseudo indirect branch program 400c) branch instruction rather than into the indirect branch in- stored in the memory 200, 400 of FIG. 9 and FIG. 15 are struction (see FIG. 8, FIG. 12). Put another way, the5 not necessarily stored in the main memory 200, 400 from source program 400b, which would conventionally be the start. The programs may be recorded in advance in, complied as illustrated in FIG. 7, is complied as illustrated for example, a "portable physical medium", which may in FIG. 8. Alternatively, the source program 400b, which be inserted into a computer, such as a floppy (trademark would conventionally be complied as illustrated in FIG. registered) disk (FD), a CD- ROM, a DVD disk, a magneto 10, is complied as illustrated in FIG. 12. 10 optical disk, or an IC card, "fixed- type physical medium" [0114] After causingthe compiler to generate the pseu- provided inside or outside a computer, and "another com- do indirect branch program 400c, the processor 300 puter (or server)" connected to the computer via a public reads out the pseudo indirect branch program 400c to line, the Internet, a LAN, a WAN, or the like so that the execute operations according to the pseudo indirect computer (processor) executes the programs read from branch program 400c. The pseudo indirect branch pro- 15 these. gram 400c includes the pseudo indirect branch instruc- [0120] Although the present invention has been de- tion in place of the indirect branch instruction, which re- scribed in its embodiments, the present invention can be sults in speedup of operation equivalent to the indirect practiced in a variety of other embodiments than the em- branch instruction. bodiments discussed hereinabove within the scope of [0115] As set forth hereinabove, the processor 300 ac- 20 the technical idea described in the appended claims. cording to the third embodiment starts the compiler to compile the source program 400b so that the pseudo INDUSTRIAL APPLICABILITY indirect branch program 400c that includes the pseudo indirect branch instruction in place of the indirect branch [0121] As set forth hereinabove, the present invention instruction are generated, and the processor 300 exe- 25 provides a high-speed implementation method for an in- cutes the pseudo indirect branch program 400c. This re- terpreter and the like that are currently in wide use. This sults in speedup of operation equivalent to the indirect technique can be implemented on a currently- used com- branch instruction. puter, and cost required for implementation is not large. [0116] Meanwhile, in the third embodiment, the proc- This is an optimization technique also effective for virtual essor 300 starts the compiler to thereby generate the 30 machines such as Java (trademark registered) that has pseudo indirect branch program 400c from the source large industrial impact, and hence practically highly val- program 400b; however, there is no restriction to this. uable. It is also appreciated that all programs running on The source program 400b can be compiled by a compiler an interpreter are allowed to run at high speed only by device that executes compilation only to generate the re-configuration of the interpreter. pseudo indirect branch program 400c, which includes 35 the pseudo indirect branch instruction in place of the in- direct branch instruction, so that the pseudo indirect Claims branch program 400c is executed by the processor 300 or the processor 100. 1. An indirect branch processing program for causing [0117] Meanwhile, among the operations described in 40 a computer to execute: the present embodiment, all or some operations that are described as being automatically performed may be per- a reading step of reading a source program formed manually. Alternatively, all or some operations stored in a storage device; that are described as being manually performed may be a code generating step of generating a pseudo performed automatically by using known method. In ad- 45 indirect branch code that includes, in place of dition to this, the process procedure, control procedure, an indirect branch instruction that is necessary specific names, information including various data pieces for execution of the source program, an instruc- and parameters may be arbitrarily changed unless oth- tion that causes branch destination addresses erwise specified. in the indirect branch instruction to be stored in [0118] The configurations of the processors 100, 300 50 a register and/or memory in inverse order; depicted in FIG. 9 and FIG. 15 are functional schematic a storing step of causing the pseudo indirect views, and do not necessarily illustrate requirements for branch code generated by the code generating physical configuration. More specifically, specific mode step to be stored in a storage device; and of distribution and integration of devices are not limited an executing step of reading and executing in- to those illustrated in the drawings, and the devices can 55 structions of the pseudo indirect branch code be functionally or physically distributed or integrated in stored in the storage device in one at a time man- any unit depending various loads, usage status, and the ner. like.

12 23 EP 2 182 433 A1 24

2. The indirect branch processing program according 6, wherein the computer is caused to execute the to claim 1, wherein the code generating step gener- pseudo indirect branch code, thereby indirectly ates a pseudo indirect branch code, in which an in- causing the branch destination addresses to be direct branch instruction that causes execution to stored in a predetermined storage area implemented transfer to a branch destination address stored in a 5 in the computer in inverse order. general purpose register is substituted with an in- struction that causes execution to transfer to a 8. The computer-readable medium according to claim branch destination address stored in the register 7, wherein and/or memory. a control statement that involves branching is written 10 in the source program, and 3. The indirect branch processing program according the pseudo indirect branch code includes an instruc- to claim 2, wherein the computer is caused to exe- tion that causes the branch destination addresses in cute the pseudo indirect branch code, thereby indi- the indirect branch instruction to be stored in a reg- rectly causing the branch destination addresses to ister in inverse order in each section before the con- be stored in a predetermined storage area imple- 15 trol statement is written. mented in the computer in inverse order. 9. Anindirect branch processing method for a computer 4. The indirect branch processing program according that reads a source program stored in a storage de- to claim 3, wherein vice to execute operation, the computer executing: a control statement that involves branching is written 20 in the source program, and areading process of reading the source program the pseudo indirect branch code includes an instruc- stored in the storage device; tion that causes the branch destination addresses in a code generating process of generating a pseu- the indirect branch instruction to be stored in a reg- do indirect branch code that includes, in place ister in inverse order in each section before the con- 25 of an indirect branch instruction that is neces- trol statement is written. sary for execution of the source program, an in- struction that causes branch destination ad- 5. A computer-readable medium that stores therein an dresses in the indirect branch instruction to be indirect branch processing program causing a com- stored in a register and/or memory in inverse puter to execute: 30 order; a storing process of causing the pseudo indirect a reading step of reading a source program branch code generated by the code generating stored in a storage device; process to be stored in a storage device; and a code generating step of generating a pseudo an executing process of reading and executing indirect branch code that includes, in place of 35 instructions in the pseudo indirect branch code an indirect branch instruction that is necessary stored in the storage device in one at a time man- for execution of the source program, an instruc- ner. tion that causes branch destination addresses in the indirect branch instruction to be stored in 10. An indirect branch processing program for causing a register and/or memory in inverse order; 40 a computer to execute an interpreter that translates a storing step of causing the pseudo indirect asource program, selects a processingcode accord- branch code generated by the code generating ing to content of the source program, and calls the step to be stored in a storage device; and processing code, the interpreter comprising: an executing step of reading and executing in- structions of the pseudo indirect branch code 45 a reading step of reading the source program stored in the storage device in one at a time man- stored in a storage device; and ner. a calling step of selecting a processing code ac- cording to the source program and, in a case 6. The computer-readable medium according to claim where the processing code is a processing code 5, wherein the code generating step generates a 50 associated with an indirect branch instruction, pseudo indirect branch code, in which an indirect calling, in place of the processing code associ- branch instruction that causes execution to transfer ated with the indirect branch instruction, a to a branch destination address stored in a general processing code associated with a pseudo indi- purpose register is substituted with an instruction rect branch instruction that includes an instruc- that causes execution to transfer to a branch desti- 55 tion that causes branch destination addresses nation address stored in the register and/or memory. in the indirect branch instruction to be stored in a register and/or memory in inverse order. 7. The computer-readable medium according to claim

13 25 EP 2 182 433 A1 26

11. The indirect branch processing program according ister. to claim 10, wherein the calling step uses a process- ing code of the pseudo indirect branch instruction 16. The computer-readable medium according to claim that causes execution to transfer to a branch desti- 15, wherein the computer is caused to execute a nation address stored in the register and/or memory 5 processing code of the pseudo indirect branch in- as a substitute for a processing code of an indirect struction, thereby indirectly causing the branch des- branch instruction that causes execution to transfer tination addresses to be stored in a predetermined to a branch destination address stored in a general storage area implemented in the computer in inverse purpose register. order. 10 12. The indirect branch processing program according 17. The computer-readable medium according to claim to claim 11, wherein the computer is caused to ex- 16, wherein ecute a processing code of the pseudo indirect a control statement that involves branching is written branch instruction, thereby indirectly causing the in the source program, and branch destination addresses to be stored in a pre- 15 the processing code of the pseudo indirect branch determined storage area implemented in the com- instruction includes an instruction that causes the puter in inverse order. branch destination addresses in the indirect branch instruction to be stored in a register in inverse order 13. The indirect branch processing program according in each section before the control statement is writ- to claim 12, wherein 20 ten. a control statement that involves branching is written in the source program, and 18. An indirect branch processing method for causing a the processing code of the pseudo indirect branch computer to execute an interpreter that translates a instruction includes an instruction that causes the source program, selects a processing code accord- branch destination addresses in the indirect branch 25 ing to content of the source program, and calls the instruction to be stored in the register and/or memory processing code, the interpreter comprising: in inverse order in each section before the control statement is written. a reading step of reading the source program stored in a storage device; and 14. A computer-readable medium that stores therein an 30 a calling step of selecting a processing code ac- indirect branch processing program causing a com- cording to the source program and, in a case puter to execute an interpreter that translates a where the processing code is a processing code source program, selects a processing code accord- associated with an indirect branch instruction, ing to content of the source program, and calls the calling, in place of the processing code associ- processing code, the interpreter comprising: 35 ated with the indirect branch instruction, a processing code associated with a pseudo indi- a reading step of reading the source program rect branch instruction that includes an instruc- stored in a storage device; and tion that causes branch destination addresses a calling step of selecting a processing code ac- in the indirect branch instruction to be stored in cording to the source program and, in a case 40 a register and/or memory in inverse order. where the processing code is a processing code associated with an indirect branch instruction, calling, in place of the processing code associ- ated with the indirect branch instruction, a processing code associated with a pseudo indi- 45 rect branch instruction that includes an instruc- tion that causes branch destination addresses in the indirect branch instruction to be stored in a register and/or memory in inverse order. 50 15. The computer-readable medium according to claim 14, wherein the calling step uses a processing code of the pseudo indirect branch instruction that causes execution to transfer to a branch destination address stored in the register and/or memory as a substitute 55 for a processing code of an indirect branch instruc- tion that causes execution to transfer to a branch destination address stored in a general purpose reg-

14 EP 2 182 433 A1

15 EP 2 182 433 A1

16 EP 2 182 433 A1

17 EP 2 182 433 A1

18 EP 2 182 433 A1

19 EP 2 182 433 A1

20 EP 2 182 433 A1

21 EP 2 182 433 A1

22 EP 2 182 433 A1

23 EP 2 182 433 A1

24 EP 2 182 433 A1

25 EP 2 182 433 A1

26 EP 2 182 433 A1

27 EP 2 182 433 A1

28 EP 2 182 433 A1

29 EP 2 182 433 A1

30 EP 2 182 433 A1

REFERENCES CITED IN THE DESCRIPTION

This list of references cited by the applicant is for the reader’s convenience only. It does not form part of the European patent document. Even though great care has been taken in compiling the references, errors or omissions cannot be excluded and the EPO disclaims all liability in this regard.

Patent documents cited in the description

• JP 2000284965 A [0009]

Non-patent literature cited in the description

• John L. Hennessy ; David A. Patterson. Computer • S. Gochman ; R. Ronen ; I. Anati ; A. Berkovits ; Architecture-A Quantitative Approach 3rd Eedition. T. Kurts ; A. Naveh ; A. Saeed ; Z. Sperber ; R. C. MORGAN KAUFMANN PUBLISHERS [0009] Valentine. The Intel Pentium M Processor: Microar- •P.-Y.Chang; E. Hao ; Y. N. Patt. Target Prediction chitecture and Performance. Intel Technology Jour- for Indirect Jumps. Proceedings of 24th International nal, 2003, vol. 7 (2), 21-36 [0043] Symposium on Computer Architecture, 1997, • Toyoshima Takashi ; Irie Hidetsugu ; Goshima 274-283 [0009] [0042] Masahiro ; Sakai shuichi. Register Indirect Jump • K. Driesen ; U. HAolzle. Accurate Indirect Branch Target Forwarding. SACSIS, 2006 [0044] Prediction. Proceedings of 25th International Sym- • C.F.Webb. Subroutine. ’’call/return stack’’.IBM posium on Computer Architecture, 1998, 167-178 Technical Disclosure Bulletin, 1988, vol. 30 (11), [0009] [0042] 18-20 [0049] • Mike Johnson. Superscalar Microprocessor Design. • D.R.Kaeli ; P.G.Emma. Branch History Table Pre- Prentice-Hallm Inc [0037] diction of Moving Target Branches Due to Subroutine • John L. Hennessy ; David A. Patterson. Computer Returns. Proceedings of 18th International Sympo- Architecture-A Quantitative Approach 3rd Edition. sium of Computer Architecture, 1991, 34-42 [0049] MORGAN KAUFMANN PUBLISHERS [0037] • K.Skadron ; P.S.Ahuja ; M.Martonosi ; D. • Bunki-Yosoku [0041] W.Clark. improving Prediction for Procedure Return with Return-Address-Stack Repair Mechanism. Pro- ceedings of 31st International Symposium on Micro- architecture, 1998, 259-271 [0049]

31