^ ^ ^ ^ I ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ I ^ (1 9) European Patent Office Office europeen des brevets EP 0 905 617

(12) EUROPEAN PATENT APPLICATION

(43) Date of publication: (51) |nt CI.6: G06F 9/45 31.03.1999 Bulletin 1999/13

(21) Application number: 98307926.0

(22) Date of filing: 29.09.1998

(84) Designated Contracting States: (72) Inventor: Ghosh, Sanjoy AT BE CH CY DE DK ES Fl FR GB GR IE IT LI LU 54 Sunnyvale California 94086 (US) MC NL PT SE Designated Extension States: (74) Representative: Leeming, John Gerard AL LT LV MK RO SI J.A. Kemp & Co., 14 South Square, (30) Priority: 30.09.1997 US 940212 Gray's Inn London WC1R 5LX (GB) (71) Applicant: SUN MICROSYSTEMS, INC. Mountain View, California 94043-1100 (US)

(54) Method for generating a java data flow graph

(57) According to a first aspect of the present inven- ing links according to an order of of the unin- tion, a method for linking of an uninterrupted terrupted blocks and wherein a stack state has been block of bytecodes in the formation of a data flow graph generated for each of the uninterrupted blocks of byte- comprises the steps of scanning the uninterrupted block codes, comprises the steps of stepping through a first of bytecodes in a forward manner to identify the start of path of a plurality of paths of the order of execution that each of the bytecodes, scanning in a backward manner terminates in a join to generate a link in the data flow bytecodewise each of the bytecodes in the uninterrupt- graph between each bytecode producing a value in one ed block of bytecodes, and generating a link in the data of the uninterrupted blocks and each bytecode consum- flow graph that links each of the bytecodes to all other ing the value in another of the uninterrupted blocks in of the bytecodes used by the each of the bytecodes. the first path, and duplicating each link in the first path According to a second aspect of the present inven- with a linkfor each bytecode in all of the plurality of paths tion, a method for linking bytecodes between uninter- other than the first path for each bytecode producing a rupted blocks of bytecodes in the formation of a data value having a similar stack location to each bytecode flow graph, the uninterrupted blocks of bytecodes hav- producing a value in one of the uninterrupted blocks in the first path.

12 16 18 14 z /

JAVA JAVA STAND-ALONE OPTIMIZER

JAVA SOURCE JAVA BYTECODE CODE FILES CLASS FILES CM h ^26 < 10 OPTIMIZED CO JAVA lO CLASS FILES o O) FIG. 1A o a. LU

Printed by Jouve, 75001 PARIS (FR) (Cont. next page) EP0 905 617 A2

12 16 22 A 14 / A JAVA JAVA COMPILER RUNTIME SYSTEM

JAVA SOURCE JAVA BYTECODE CODE FILES CLASS FILES

30- NATIVE 20 JAVA CODE

FIG. 1B

2 1 EP0 905 617 A2 2

Description fication means and portability through bytecodes, Java programs lag natively compiled programs, written in lan- [0001] The present invention relates to the optimiza- guages like /C++, in their execution time. When a user tion of Java class files. More particularly, the present in- activates a Java program on a Web Page, the user must vention relates to the generation of aspects of a data s wait not only for the program to download but also to be flow graph from a Java classfile optimizer that identifies interpreted. To improve Java's execution time, optimiza- the propagation of data values in blocks of bytecode and tions can be introduced into the processing of Java byte- between blocks of bytecode. codes. These optimizations can be implemented in a va- [0002] A known problem for software developers and riety of manners including as Stand-Alone Optimizers computer users is the lack of portability of software 10 (SAO) or as part of Just-in-Time (JIT) . across operating system platforms. Attempts to address [0007] A SAO transforms an input classfile containing this problem must include means of ensuring security bytecode into an output classfile containing bytecodes as computers are linked together in ever expanding net- that more efficiently perform the same operations. A JIT works, such as the World Wide Web. As a response to transforms an input classfile containing byte code into both of these concerns, the JAVA programming Ian- 15 an executable program. Prior to the development of guage was developed at Sun Microsystems as a plat- JITs, a JVM would step through all the bytecode instruc- form independent, object oriented computer language tions in a program and mechanically perform the native designed to include several layers of security protection. code calls. With a JIT compiler, however, the JVM first makes a call to the JIT which compiles the instructions [0003] Java achieves its operating system independ- 20 into native code that is then run directly on the native ence by being both a compiled and interpreted lan- operating system. The JIT compiler permits natively guage. First, Java , which consists of Java complied code to run faster and makes it so that the classfiles, is compiled into a generic intermediate format code only needs to be compiled once. Further, JIT com- called Java bytecode. Java's bytecodes consist of a se- pilers offer a stage at which the executable code can be quence of single byte opcodes, each of which identify a 25 optimized. particular operation to be carried out. Additionally, some [0008] To either optimize or compile bytecodes in- of the opcodes have parameters. For example, opcode volves the translation of the source bytecodes into what number 21 , iload , takes the single-word inte- is known in the art as an intermediate representation ger value stored in the local variable, varnum, and push- (IR). The IR provides information about two essential es it onto a stack. 30 components of a program: the control flow graph (CFG) [0004] Next, the bytecodes are interpreted by a Java and the data flow graph (DFG). Subsequently, the IR is (JVM) which translates the bytecodes transformed for compilers into and for opti- into native . The JVM is a stacked-based mizers into an improved version of the source code for- implementation of a "virtual" processor that shares mat. many characteristics with physical microprocessors. 35 [0009] The CFG breaks the code into blocks of byte- The bytecodes executed by the JVM are essentially a code that are always performed as an uninterrupted machine instruction set, and as will be appreciated by group and establishes the connections that link the those of ordinary skill in the art, are similar to the as- blocks together. The DFG maps the connections be- sembly language of a computing machine. Accordingly, tween where values are produced and where they are every hardware platform or operating system may have 40 used. This includes connections within blocks of byte- a unique implementation of the JVM, called a Java Runt- codes and also connections between blocks of byte- ime System, to route the universal bytecode calls to the codes. Because Java programs frequently use stack lo- underlying native system. cations as implicit variables, tracking the propagation of [0005] When developing the JAVA bytecode instruc- values is a difficult process. When extracting the data tion set, the designers sought to ensure that it was sim- 45 flow information to generate the DFG for the I , the prior pie enough for hardware optimization and also included art approach has been to perform a complete simulation verification means to provide security protection and to of both the stack and iterations through all possible flow prevent the execution errors or system crashes that can paths. result from improperly formatted bytecode. As Java's [0010] In contrast to traditional compilers where end bytecodes contain significant type information, the ver- so users only interact with compiled programs, the run time ification means are able to do extensive type checking nature of Java creates a significant incentive to optimize when the bytecodes are first retrieved from the internet the speed and efficiency of Java optimizers and JIT or a local disk. As a result, the of the native compilers. Accordingly, it is an object of the present in- machine need only perform minimal type checking at vention to provide a method for generating a DFG from run time. Unlike languages such as SmallTalk that pro- 55 the source bytecodes to form an IR that optimizes the vide protection by performing extensive runtime checks, speed and efficiency of Java optimizers and JIT compil- Java executes more quickly at run time. ers. [0006] Although Java provides security through veri- [0011] The present invention is directed to the optimi-

3 3 EP0 905 617 A2 4 zation of Java class files by improving the efficiency of Other embodiments of the invention will readily suggest generating a DFG for the IR. themselves to such skilled persons from an examination [001 2] According to a first aspect of the present inven- of the within disclosure. tion, a forward scan of an uninterrupted bytecode block [0026] Referring first to FIGS. 1 A and 1 B, simplified from a bytecode source file is performed to identify the 5 flow diagrams 10 and 20 of the optimization of a Java start of each bytecode in the uninterrupted block. Next, bytecode source file are illustrated. In both flow dia- a backwards scan is performed. For each bytecode in grams 10 and 20, Java source code file 12 is compiled the block, a DFG node is created, links to all the values by a Java compiler 14 into Java bytecode classfiles 16. used by the bytecode in the block are established, and In flow diagram 10, the classfiles are operated on by a links to preceding or subsequent nodes in the file are 10 SAO 18, and in flow diagram 20, the classfiles are created. passed through a Java Runtime System 22 to a JIT com- [0013] According to a second aspect of the invention, piler 24. The Java SAO 18 outputs Java classfiles 26 the bytecode blocks are linked in the file's order of exe- containing bytecodes that perform the same operations cution and a stack state is generated for each block of as the input classfiles, only in an optimized manner. The code. For each join statement generated by the CFG, is jit 24 outputs executable code 28 that can run directly one path of the CFG is stepped through. Next, links are on the native operating system 30. Those of ordinary established in the DFG between locations where values skill in the art will recognize that the procedures de- are used and where the values are produced. Finally, scribed in FIG. 1 are merely illustrative and that there for all blocks parallel to the blocks producing values the exist other structures within which classfiles may be op- links for bytecodes occupying the same stack state lo- 20 timized. cation are duplicated. [0027] In processing an input bytecode source class- [0014] The present invention will be described below file, both an SAO and a JIT form an IR. As described with reference to exemplary embodiments and the ac- above, an IR is a succinct and straightforward structur- companying drawings, in which: ing of the control and data flow information of a program [0015] FIG. 1A is a flow diagram of the optimization 25 into a CFG and DFG. The CFG represents the informa- of a JAVA bytecode source file using a stand alone op- tion regarding the sequence and order of the execution timizer according to the present invention. of the bytecode instructions as blocks of code linked in [0016] FIG. 1B is a flow diagram of the optimization the order of execution of the program. The blocks of of JAVA bytecode source file using a just in time compiler code are sections of bytecode that are always per- according to the present invention. 30 formed as an uninterrupted group, and the links be- [0017] FIG. 2 is a flow diagram illustrating the gener- tween the blocks of code are bytecode branch, condi- ation of control flow graph from a bytecode source file tional, or jump instructions. suitable for use in the present invention. [0028] FIG. 2 is a flow diagram that illustrates the gen- [0018] FIG. 3 is a flow diagram of the generation of eration of the CFG from a bytecode classfile 40. It should data flow graph according to the prior art. 35 be appreciated that there are several known methods [0019] FIG. 4 is a flow diagram of a first aspect of the for generating a CFG. The specifics of any such method generation of a data flow graph for the links in a block will not be disclosed herein to avoid overcomplicating of bytecode according to the present invention. the disclosure and thereby obscuring the present inven- [0020] FIG. 5A is a block diagram for the generation tion. The procedure begins with a classfile 40 containing of a node in a data flow graph for the instruction IADD 40 an ordered sequence of bytecode instructions num- according to the first aspect of the present invention. bered from 1 through N. The numbers 1 through N are [0021] FIG. 5B is a block diagram for the generation analogous to the numbering of lines in a computer pro- of several nodes in a data flow graph for a plurality of gram as is known in the art Next, an optimizer 42 proc- instructions according to the first aspect of the present esses the bytecode classfile 40 to extract the control invention. 45 flow information from the bytecodes. [0022] FIG. 6 is a flow diagram of a control flow graph [0029] The control flow information is illustrated by the suitable for illustrating a second aspect of the generation CFG 44. For example, the CFG 44 may indicate that for of a data flow graph for the links between blocks of byte- classfile 40 that bytecode instructions 1 through 3 are code according to the present invention. first executed, instructions 1 00 through 1 03 are next ex- [0023] FIG. 7A is a block diagram of the steps in the 50 ecuted, either instructions 10 through 14 or 15 through method according to the second aspect of the present 1 9 are next executed, and finally instructions 4 through invention. 8 are executed before the remainder of the bytecode [0024] FIG. 7B is a block diagram for the generation instructions execute. It will be appreciated that the CFG of nodes in a data flow graph according to the second 44 is an example only, and that any number of CFGs aspect of the present invention. 55 may be generated from the classfile 40 by the optimizer [0025] Those of ordinary skill in the art will realize that 42. the following description of the present invention is illus- [0030] As described above, once the CFG has been trative only and is not intended to be in any way limiting. generated, the DFG will be generated. According to the

4 5 EP0 905 617 A2 6 present invention, two aspects of the DFG are generat- that because the stream of bytecodes are the instruc- ed from the CFG. In one aspect, the propagation of val- tions for a stack machine, the operands or values con- ues within blocks of bytecode is determined. As is sumed by a bytecode instruction precede the instruc- known in a typical JAVA program, data values are pro- tion. When the bytecode file is read in reverse order, the duced in one location and consumed in another. The first 5 step of creating a node in the DFG can be immediately aspect of the present invention relates to establishing performed after the bytecode is read. the connections between the production and consump- [0036] Further examples, not intended to be exhaus- tion points that occur within uninterrupted blocks of tive, are illustrated in FIGS. 5A and 5B of the nodes in code. The present invention allows implicit variables to a DFG created from bytecode instructions according to be easily recognized and for their values to be immedi- 10 the present invention. FIGS. 5A and 5B respectively are ately linked in the DFG to the bytecode(s) using them. examples of a single instruction with two links, and a [0031] Unlike the prior art, the present invention series of instructions with multiple links. avoids the approach taken in the prior art of constructing [0037] In FIG. 5A, a bytecode source block 150 has a stack to determine the values used by bytecodes. For been first differentiated by a forward scan into byte- example, in FIG. 3, a block diagram of a known method 's codes 1 52, 1 54 and 1 58. When the backward scan step for generating links within a code block of a CFG is il- is performed, an integer addition (IADD) instruction is lustrated. The diagram demonstrates the principle stag- read from bytecode 158, and a node 160 is created in es employed to determine the value of implicit stack lo- the DFG. From the definition of the instruction IADD it cation variables when generating a DFG. During stage is known that node 1 60 will consume two values. As the 60, the operations of an input sequence of bytecodes 20 backward scan continues, these two values are read 62 are simulated to create a version of the stack 64 that from bytecodes 1 54 and 1 52, created as nodes 1 62 and they represent. The stack shown 64 presents only a por- 1 64 in the node tree of the DFG, and then linked to node tion 66 of the input sequence of bytecodes 62. During 160. stage 68, the stack 64 is executed, and it is determined [0038] In FIG. 5B, a bytecode source block 166 has that the POP instruction 70 is associated with the value 25 been first differentiated by a forward scan into byte- "2" due to the preceding BIPUSH "2" 72 instruction. This codes 168, 170 and 172. The sequence of bytecodes information is stored in a DFG node represented by ref- 166 implement the statement (X + X). In the backwards erence numerals 74 and 76. scan step the first bytecode 1 72 encountered is an IADD [0032] According to the preferred embodiment of the instruction. As explained with respect to the example of present invention, a node tree is employed. When val- 30 FIG. 5A, the IADD instruction generates a node 174 in ues are produced by the bytecode classfile, a new node the DFG with links to two values. Since the next byte- is created in the tree. This node is then connected to the code 170 is a duplicate (DUP) instruction both of the rest of the tree by linking it to the value's production and links from node 174 are made to the node 176 for the consumption points, as they become known. In this DUP instruction. Next, bytecode 168 having an integer manner, the propagation of data values through a pro- 35 bad X (I LOADX) instruction is read, and a node 1 78 with gram can be readily traced. Those of ordinary skill in the a link to node 176 is created. In this sequence in the art will recognize that a variety of data structures could node tree, the value in node 178, will be duplicated in be used. node 176, and both values will be consumed in node [0033] Turning now to FIG. 4, a flow diagram of a 174. As will be appreciated by those of ordinary skill in method for generating links within a bytecode block ac- 40 the art from the above recited examples, when perform- cording to the present invention is illustrated. In a first ing the backwards scan step, the particular DFG created step 100, a forward scan of a bytecode source file 102 will depend upon the sequence of bytecodes in the is performed in order to determine the starts 1 06, 1 08, blocks generated by the CFG. and 1 1 0 of the bytecodes 112, 114, and 1 1 6, respective- [0039] According to a second aspect of the present ly, because as is known, bytecodes can be of different 45 invention, the propagation of values between blocks of lengths. Next, at step 118 a backward scan of the byte- bytecode in the CFG is determined. FIG. 6 illustrates a code source file 1 02 is performed. During the backward CFG 1 80 having several blocks of linked code 200, 202, scan step 118, the nodes of the DFG are generated as 204, and 206. In the CFG 180, the blocks 202 and 206, they occur. For example, if bytecode 116 is the instruc- that precede a join 208 are considered parallel. It should tion POP, as POP is read, a node 120 is created in the so be appreciated that a join may connect a plurality of par- DFG including a POP instruction 122. allel blocks. When the CFG is generated, stack states, [0034] If next, the instruction BIPUSH "2" is read at 210, 212, 214, and 216 are created for each block of bytecode 1 08, the value "2" at 1 24 is linked to the POP bytecode 200, 202, 204, and 206, respectively. The instruction 122 that consumes it. stack states includes all the values still present on the [0035] The present invention provides a significant in- 55 stack after the bytecodes in the block associated with crease in the efficiency in the generation of the DFG be- that stack have been executed. It should be appreciated cause the entire process of constructing a copy of the that values on the stack are produced by instructions in stack has been eliminated. The invention recognizes the bytecode blocks. This is shown by the mapping of

5 7 EP0 905 617 A2 8 the instructions 21 8 and 220 to the stack states 222 and 224, respectively. For purposes of illustration to be de- IF (some condition occurs) scribed below, instruction 218 is PUSH "1", instruction 220 is PUSH "0", and instruction 226 is POP. X=l; [0040] To determine when a value produced in an ear- 5 block of is used consumed subse- ly bytecode or by a ELSE quent block of bytecode to generate a link in the DFG between blocks of bytecode, an execution of the blocks X is made while noting the consumption of a prior stack = 0; state value and identifying its source. This information 10 is represented in the DFG as a link between the nodes of the consumption and production of the bytecodes. To establish links for all parallel blocks, the method in the prior art relied upon iterating through all possible control flow paths in the CFG. For example, to generate the 15 DFG for links between blocks in the CFG 180, a first A = X + 2; pass would be made through the path including blocks 200, 202 and 204, and then a second pass would be made through the path including blocks 200, 206 and [0044] The assignment of a value to "A" in the expres- 204. 20 sion "A = X + 2" uses the value from either expression [0041] According to the present invention, not all con- "X = 1 " or X = 0" that has been assigned to the implicit trol flow paths must be traversed to form the DFG for stack variable "X" as determined by the "IF (some con- links between parallel blocks of bytecode. Rather, the dition occurs)" statement. links in the DFG between blocks 200, 202, 204 and 206 [0045] In FIG. 7A, a block diagram outlining the steps can be generated simply by stepping through blocks 25 employed to generate a DFG according to the second 200, 202, 204 and then applying the information from aspect of the present invention is illustrated. the pass through blocks 200, 202, 204 to establish the [0046] At step 250, a first branch of a CFG graph is links to block 206. The present invention exploits the fact scanned. Applied to CFG 180 in FIG. 6, a first pass that Java standards' require and the verification de- through blocks 200, 202 and 204 is performed. scribed above ensures that in properly formatted code, 30 [0047] Next at step 252, links between the place a val- the stack states for any of the blocks preceding a join ue is created and the place a value is consumed are are the same. made. As illustrated in FIG.7B, a link 254 is made be- [0042] For example, in CFG 180, both of the stack tween the PUSH "1" instruction at location 218 in the states 212 and 216 that precede the join 208 must be CFG 1 80 that the POP instruction consumes at location the same in both size and relative type. During execu- 35 226 in the CFG 180. tion, the consumption of any value in block 202 will be [0048] Next at step 256, a link 258 is made between paralleled by a consumption of a value in block 206. As the PUSH 0 instruction at location 220 that the POP in- a result, for example, if the value corresponding to the struction consumes at location 226, because it is known stack state location 222 is used by the bytecode instruc- that the data for this instruction occupies the same re- tion in 226 of code block 204, then the bytecode instruc- 40 spective stack state location as the PUSH 1 instruction tion 226 also uses the value corresponding to the stack at location 21 8 in the CFG 1 80. As such, without travers- state location 224. ing all paths in the CFG as relied upon by the prior art, [0043] The following lines of pseudocode illustrate the according to the second aspect of the present invention, point. the links in a DFG between blocks of bytecodes can be 45 more efficiently made. As joins are quite common throughout bytecode source code, the present invention represents a significant increase in efficiency. Where multiple joins exist, the increase in efficiency is even greater. so [0049] While illustrative embodiments and applica- tions of this invention have been shown and described, it would be apparent to those skilled in the art that many more modifications than have been mentioned above are possible without departing from the inventive con- 55 cepts set forth herein. The invention, therefore, is not to be limited except in the scope of the appended claims.

6 9 EP0 905 617 A2

Claims

1. A method for linking bytecodes of an uninterrupted block of bytecodes in the formation of a data flow graph comprising the steps of: s

scanning said uninterrupted block of bytecodes in a forward manner to identify the start of each of said bytecodes; scanning in a backward manner bytecodewise 10 each of said bytecodes in said uninterrupted block of bytecodes; and generating a link in said data flow graph that links each of said bytecodes to all other of said bytecodes used by said each of said byte- 15 codes.

2. A method for linking bytecodes between uninter- rupted blocks of bytecodes in the formation of a data flow graph, said uninterrupted blocks of bytecodes 20 having links according to an order of execution of said uninterrupted blocks and wherein a stack, state has been generated for each of said uninterrupted blocks of bytecodes, comprising the steps of: 25 stepping through a first path of a plurality of paths of said order of execution that terminates in a join to generate a link in said data flow graph between each bytecode producing a val- ue in one of said uninterrupted blocks and each 30 bytecode consuming said value in another of said uninterrupted blocks in said first path; and duplicating each link in said first path with a link for each bytecode in all of said plurality of paths other than said first path for each bytecode pro- 35 ducing a value having a similar stack location to each bytecode producing a value in one of said uninterrupted blocks in said first path.

3. A computer readable storage medium having re- 40 corded thereon code compo- nents that, when loaded onto a computer and exe- cuted, cause that computer to operate according to the method of claim 1 or claim 2.

50

55 EP0 905 617 A2

12 16 18 14 Z Z \ JAVA JAVA COMPILER STAND-ALONE OPTIMIZER

JAVA BYTECODE JAVA SOURCE CLASS FILES CODE FILES h ^-26

10 OPTIMIZED JAVA CLASS FILES

FIG. 1A

12 16 22 A 14 L A A JAVA JAVA COMPILER RUNTIME SYSTEM

JAVA BYTECODE JAVA SOURCE FILES CODE FILES CLASS

30- NATIVE 20 OPERATING SYSTEM JAVA EXECUTABLE CODE

FIG. 1B

8 EP0 905 617 A2

1 42 44-

3

OPTIMIZER

100

8

FIG. 2

62

72 -74 POP bipush2 bipush2 POP POP -76 7 V 64 70 N

~60~ ~68

FIG. 3

9 EP0 905 617 A2

(FORWARD SCAN) 102 112 114 116 100i t t t 106 108 110

(BACKWARDS SCAN)- 102 bipush POP

122"i 114 116 118i pop y

y 120

124

FIG. 4

150- VOL1>L1 I VOL 2 | iadd \ N \ 152 154 158 164- 162 VOL 1 VOL 2

FIG. 5A

166- 174 iloadx dup iadd "ii55~y-174

I I 168 170 172

I 178... iloadx

FIG. 5B

10 EP0 905 617 A2

180

210

202- 206

216 218- -L 224

214

FIG. 6

STEP ONE: SCAN ONE BRANCH OF CONTROL FLOW GRAPH

STEP TWO: ESTABLISH LINKS BETWEEN WHERE A PLURALITY OF VALUES ARE PRODUCED AND WHERE THE PLURALITY OF VALUES ARE CONSUMED

STEP THREE: DUPLICATE LINK(S) TO CONSUMPTION POINT(S) FROM ALL PRODUCTION POINT(S) THAT HAVE THE SAME RESPECTIVE STOCK STATE LOCATION(S) AS PRODUCTION POINTS IN STEP TWO

FIG. 7 A

[popy226

220- 218 push 0 push 1

FIG. 7B

11