<<

The following paper was originally published in the Proceedings of the Third USENIX Conference on Object-Oriented Technologies and Systems Portland, Oregon, June 1997

Krakatoa: Decompilation in Java (Does Bytecode Reveal Source?)

Todd A. Proebsting, Scott A. Watterson The University of Arizona

For more information about USENIX Association contact: 1. Phone: 510 528-8649 2. FAX: 510 548-5738 3. Email: [email protected] 4. WWW URL: http://www.usenix.org

Krakatoa: Decompilation in Java

Does Bytecode Reveal Source? 

To dd A. Pro ebsting Scott A. Watterson



The University of Arizona

Abstract well known. Presently,we fo cus our researchon

two subproblems: recovering source-level expres-

This pap er presents our technique for automati-

sions and synthesizing high-level control constructs

cally decompiling Javabyteco de into Java source.

from goto-like primitives.

Our technique reconstructs source-level expres-

Krakatoa uses a stack-simulation technique to re-

sions from byteco de, and reconstructs readable,

cover expressions and p erform typ e inference. Ex-

high-level control statements from primitive goto-

pression recovery creates source-level assignments

like branches. Fewer than a dozen simple co de-

and comparisons from primitivebyteco de op era-

rewriting rules reconstruct the high-level state-

tions. We extend Ramshaw's goto-elimination al-

ments.

gorithm to structure and create source for ar-

bitrary reducible control ow graphs. This tech-

nique pro duces source co de with lo ops and multi-

1 Intro duction

level break's. Subsequent techniques recover more

intuitive constructs e.g., if statements via appli-

Decompilation transforms a low-level language into

cation of simple co de rewrite rules.

a high-level language. The Java

JVM sp eci es a low-level byteco de language for a

Traditional decompilation systems use graph

stack-based machine [LY97]. This language de nes

transformations to recover high-level control con-

203 op erators, with most of the control ow sp eci-

structs. These systems require the author of the

ed by simple explicit transfers and lab els. Compil-

decompiler to anticipate all high-level control id-

ing a Java class yields a class le that contains typ e

ioms. When faced with an unexp ected language

information and byteco de. The JVM requires a sig-

idiom, these systems either ab ort, or pro duce gotos

ni cant amountoftyp e information from the class

illegal in Java. Krakatoa represents a di erent ap-

les for ob ject linking. Furthermore, the byteco de

proach. Krakatoa rst pro duces legal Java source

must b e veri ably wel l-behaved in order to ensure

given legal Javabyteco de with arbitrary reducible

safe . Decompilation systems can exploit

control ow, and then recovers intuitive high-level

this typ e information and well-b ehaved prop ertyto

constructs from this source.

recover Java source co de from the class le.

We present a technique for transforming low-level

Figure 1 gives the ve steps of decompilation

Javabyteco de into legal Java source co de. Our sys-

p erformed by Krakatoa. First, the expression

1

tem, Krakatoa, p erforms typ e inference to issue

builder reads byteco de, recovers expressions and

lo cal variable declarations. The veri er do es the

typ e information, and pro duces a control ow

same typeoftyp e inference, and the techniques are

graph CFG. Next, the sequencer orders the CFG

no des for Ramshaw's goto-elimination technique.



Address: Department of Computer Science, Uni-

Ramshaw's algorithm pro duces a convoluted|yet

versity of Arizona, Tucson, AZ 85721; Email: fto dd,

[email protected].

legal|Java AST. Our sys-

1

Krakatoa is a volcano lo cated in the Sunda Strait b e-

tem then transforms this AST into a less convo-

tween Java and Sumatra. Its 1983 eruption threw ve cubic

luted AST using a set of simple rewrites. The nal

miles of debris into the air and was heard 2200 miles away

in Australia. phase pro duces Java source by traversing the AST.

class foo { int sam;

Java Bytecodes

int barint a, int b { if sam > a {

Expression Builder

b = a*2; }

Flow graph with expressions and conditional gotos

return b; }

Node Sequencer }

Augmented Flow graph Compiled from foo.java

class foo extends java.lang.Object { int sam;

Goto Eliminator (Ramshaw's Algorithm) int barint,int;

Java AST

Method int barint,int

0 aload_0

Code Simplifier 1 getfield 3 4 iload_1

Restructured Java AST 5 if_icmple 12 8 iload_1

Final Java printer 9 iconst_2 10 imul

Java Source 11 istore_2

12 iload_2

13 ireturn

}

Figure 1: Java Byteco de Decompilation System

Figure 2: Simple Metho d and Byteco de Disassem-

bly via javap -c.

2 Expression Recovery

instance, iload_1, which loads the value of the

Javabyteco des b ear a very close corresp ondence

rst lo cal variable|with typ e int|could b e rep-

to Java source. As a result, recovering expres-

resented on the stackas\i1". Similarly,if i1 and

sions from Javabyteco de is often simple|much

2 were the top two elements of the symb olic stack,

simpler than recovering expressions from machine

and the next byteco de were iadd integer addition,

language. Java class les include information that

those elements would b e p opp ed o the stack and

makes recovering high-level op erations like eld ref-

replaced with \i1+2". The symb olic execution

erences easy. The fact that the byteco de must b e

of some expressions, like assignment, requires emit-

well-b ehaved i.e., veri able also simpli es analy-

ting Java source.

sis. Figure 2 gives a sample program and its abbre-

Our algorithm recovers expressions one basic

viated disassembly. Note the level of typ e informa-

blo ck at a time. Some basic blo cks such as those

tion in the disassembly pro duced by Sun's javap

pro duced by the conditional expression op eration, utility.

A?B:C do not b egin with empty stacks, so some

Symb olic execution of the byteco de creates the

information is required to prop ogate from prede-

corresp onding Java source expressions. It also cre-

cessors. Also, basic blo cks that b egin exception-

ates conditional and unconditional goto's, which

handling blo cks|which are easily identi ed|b egin

will b e removed by subsequent decompilation steps.

with the raised exception on the stack.

Symb olic execution simulates the Java Virtual Ma-

Figure 3 provides the step-by-step decompilation chine's evaluation stack with strings that represent

of the byteco de in Figure 2. The initial aload_0 the source-level expressions b eing computed. For

instruction pushes a Java reference onto the stack. program with exactly the same control owasthe

In virtual functions, the \0'th" lo cal variable, a0, goto-only program.

always refers to this. The getfield instruction Ramshaw's algorithm requires two inputs: the

references a named eld, \sam", of the current top control ow graph, and an instruction ordering. His

of stack. Therefore, the \this" is p opp ed and algorithm enco des this order into the ow graph

replaced with \this.sam". iload_1 pushes \i1" using augmenting edges, such that every instruc-

onto the stack. The ifcmple compares the top two tion has an augmenting edge to the next instruc-

stack elements and branches to the appropriate in- tion in sequence. These augmenting edges o ccur

struction if the lower is less than or equal to the between every pair of physically adjacent instruc-

top element. Symb olically executing the ifcmple tions even if actual control owbetween them is

requires p opping the top two elements and emit- imp ossible. He proves that if this augmented graph

ting the appropriate conditional branch. Translat- is reducible, then a structural ly equivalent [PKT73]

ing the remaining instructions is similar. program can b e created without goto's. How-

ever, Ramshaw provides no algorithm for nding

Most of the byteco de instructions are equally

a reducible augmented ow graph from a given re-

simple to symb olically execute. Unfortunately,a

ducible ow graph.

few require more information. Some of the stack

The control- ow graphs of Java programs are

manipulation routines e.g., pop2, dup2, etc. de-

reducible. Therefore, the compiled byteco de will

p end on o sets from the stack top. For in-

likely form a reducible control- ow graph. Unfor-

stance, pop2 removes the top 8 from the

tunately, simple optimizations likeloopinversion

stack, whether those 8 bytes represent one 8-byte

create irreducible augmented ow graphs. The ow

double value, or two 4-byte scalar values. To cor-

graph of the program in Figure 8 has this problem

rectly simulate these instructions the symb olic ex-

b ecause the augmenting edge b etween the rst two

ecution keeps track of the size and typ e of each

statements creates a \jump" into the b o dy of the

stack element.

lo op formed by the next seven statements.

To utilize Ramshaw's algorithm, we develop ed an

3 Instruction Ordering

algorithm that orders a reducible graph's instruc-

tions such that the resulting augmented graph is

After recovering expressions, conditional and un-

also reducible.

conditional goto's along with implicit fal l through

b ehavior determine control ow. Java, however,

3.1 Augmenting the Flow Graph

has no goto statement, so its control owmust b e

expressed with structured statements.

Creating a reducible augmented ow graph re-

Ramshaw presented an algorithm for eliminating

quires that no augmenting edge enters a lo op any-

goto's from Pascal programs while preserving the

where other than at its header. Preventing this

program's structure [Ram88]. This algorithm re-

is simple|when ordering the instructions, make

places each goto with a multilevel break to a sur-

the header rst and contiguously order the lo op's

rounding lo op. The algorithm determines the ap-

instructions. Because physical adjacency deter-

propropriate lo cations for these surrounding lo ops.

mines augmenting edges, contiguously ordering the

We trivially extended his algorithm to use multi-

instructions guarantees that the only augmenting

level continue's.

edge entering the lo op from the outside will b e en-

tering at the top, which will not a ect reducibility Ramshaw's extended algorithm replaces each

if it is the lo op's header. forward goto with a break and each backward

A lo op with no nested lo ops inside is easy to goto with a continue. His algorithm inserts a lo op

order|simply remove the back edges and top o- that ends just b efore the target of each break state-

logically sort the remaining directed acyclic graph ment. Likewise, it inserts a lo op that starts just

DAG. Handling interior lo ops requires replacing b efore the target of each continue. These lo ops

them with a single placeholder no de in the graph ensure that each control-transfer statement jumps

and separately ordering b oth the lo op and the sur- to the correct instruction. Each newly-inserted

rounding graph. After ordering b oth, re-insert the lo op must also end with a break statement, so

lo op's no des at its placeholder. Re-ordering in- that control will fal l out of the lo op. Figure 4

structions maychange whether or not one instruc- shows an example of this technique. Additional

tion fal ls through to another as it did in the original lo ops and break/continue's create a structured

Byteco de Symb olic Stack Emitted Source

aload 0 "this"

getfield 3 "this.sam"

iload 1 "this.sam", "i1"

if icmple 12 if this.sam <= i1 goto L12

iload 1 "i1"

iconst 2 "i1", "2"

imul "i1*2"

istore 2 i2 = i1*2

12: iload 2 "i2" L12:

ireturn return i2

Figure 3: Symb olic Execution of Byteco de

stmt0

L2: for  ;; f

L1: for  ;; f

if expr 1 break L1; stmt0

if expr 2 break L2; if expr 1 goto L1;

break L1; if expr 2 goto L2;

g // L1 L1: stmt1

stmt1 L2: stmt2

break L2;

g // L2

stmt2

Figure 4: Ramshaw's Goto Elimination: Before and After

ordering. Where implicit control ow has changed, dering of the no des the augmenting path, Kraka-

the algorithm must add new branches to restore toa can apply Ramshaw's technique to eliminate

the original control ow. Whenever p ossible, the goto's.

top ological sort attempts to maintain the original

fall-through b ehavior.

4 Co de Transformations

This algorithm pro duces a reducible augmented

graph. Because all lo ops are ordered separately,

4.1 Program Points

and laid out contiguously, the only augmenting

edge entering from outside enters at the top. The

After applying Ramshaw's algorithm for eliminat-

top ological sort of the lo op minus its backedges

ing goto's, Krakatoa has a complex, yet legal, Java

guarantees that this top no de is the lo op header and

AST see Figure 9. Krakatoa then pro ceeds to

that no internal edges cause irreducibility. Outside

recover more of the natural high-level constructs

edges into the lo op header cannot make a lo op irre-

of the original program e.g. if-then-else, etc..

ducible. Therefore, the resulting augmented graph

Krakatoa uses a program point analysis to summa-

is reducible.

rize a program's control- ow and to guide recover-

Lo ops are not the only blo cks of instructions

ing high-level constructs. A program p oint is a syn-

whichmust b e ordered contiguously. Exception

tactic lo cation in a program. Every statement has a

handling regions must form contiguous sections of

program p oint b oth b efore and after it. These pro-

instructions. Class les sp ecify which instructions

gram p oints havetwo prop erties: reachability and

are in which regions. Our algorithm orders those

equivalence class.

regions contiguously by treating them like lo ops.

A program p oint is unreachable if and only if it is

preceded along all execution paths by an uncondi- After applying this technique to create a total or-

tional jump statement i.e. return, throw, break, then-branchStmtlist1  can reach Stmtlist2 di-

or continue. For instance, in Figure 5, program rectly.

p oint3, , is unreachable, since it is preceded bya

3

return statement.  is reachable, however, since

6

4.4 Lo op Rewriting Rules

one of the branches in the preceding if statement

The third rule in Table 1 removes useless continue

do es not end with a jump statement.

statements. If the program p oint after a continue

Two program p oints are equivalent denoted as

statement is equivalent to the program p oint b efore

    if and only if future computation of the

x y

the continue statement, then that continue can

program is the same from b oth p oints. For in-

b e removed.

stance, the program p oint b efore a break state-

The fourth rule creates a short-circuit test ex-

ment is equivalent to the program p oint after the

pression within a for-lo op by eliminating an inte-

lo op it exits  and  in Figure 6. As an ex-

3 8

rior if statement. Doing so requires that the lo op

ample, in Figure 6,  , , , , , and  are

1 2 4 5 6 7

b o dy b egin with an if-then-else statement, and

equivalent, as are program p oints  and  .

3 8

that the then branch of that statement consists

Both reachability and equivalence are simple

of a single jump to a program p oint equivalentto

to compute via standard control- ow analyses

breaking out of the lo op.

[ASU86].

The fth transformation provides an example of

transforming lo ops into if statements. Aloopis

4.2 AST Rewrite Rules

equivalenttoanif if it can never rep eat itself, and if

all simple break statements can b e safely removed

Krakatoa p erforms a series of AST rewriting trans-

during the transformation. A lo op never rep eats

formations to recover as many of the \natural" pro-

if its last program p oint is unreachable. break's

gram constructs as it can e.g. if-then-else, etc..

may b e removed if the immediately following un-

Krakatoa applies these rewriting rules rep eatedly

reachable program p oint is equivalent to the last

until no changes o ccur. Wehave found that the

program p oint in the lo op  in Table 1. The

1

few rules b elow are sucient to retrieve high-level

transformation replaces the lo op with an if state-

constructs of the Java language, including if-then-

ment, and deletes all of the break statements for

else statements, and short-circuit evaluation of ex-

that lo op.

pressions. Each rewriting rule reduces the size of

the AST, thus ensuring termination.

4.5 Short Circuit Evaluation

Table 1 summarizes the rules, whichwe describ e

b elow in greater detail. Many of these rules gen-

Rewriting Rules

eralize. Those that apply to for-lo ops often apply

The sixth rule shown in Table 1 recovers a short-

to other lo ops. Many rules have several symmetric

circuit Or conditional. Short-circuit Or's exist

cases. For example, the rst rule in Table 1 re-

when two adjacent conditionals guard the same

moves an empty else-branch from an if-then-else

statement list and failure of either will cause a

statement|there is a symmetric rule for removing

branch to equivalent lo cations.

an empty then-branchby negating the predicate.

The last transformation in Table 1 recovers short-

circuit And expressions. This transformation is ap-

4.3 if-then-else Rewriting Rules

plicable whenever a simple if statement represents

the entire b o dy of another.

The rst transformation shown in Table 1 changes

an if-then-else statementinto an if-then state-

ment when the else branch is empty. This trans-

5 Status

formation is always legal.

The second transformation creates an if-then- Wehave implemented a prototyp e Java decompiler,

else statement from an if-then statementby hoist- Krakatoa, in Java. Wehave run Krakatoa on a

ing the subsequent statement list into the else-part. numb er of class les, including some to whichwe

Our algorithm p erforms this transformation if and had no source co de access. We examined the output

only if no reachable program p ointin Stmtlist1 is of Krakatoa by hand, and Krakatoa app ears to re-

equivalent to the program p oint b efore Stmtlist2. cover high-level constructs very well. Figures 7{10

Essentially, this means that no statementinthe provide an example of the stages of decompilation.

Rule Before After Conditions

if expr f if expr f

Eliminate

Stmtlist Stmtlist None

Else

g else fg g

if expr f

if expr f

Stmtlist1 Stmtlist1 contains no

Stmtlist1

Create

g else f reachable program

g

if-then-else

Stmtlist2 p oints equivalentto .

1

 : Stmtlist2

1

g

for A ;B ;C  f

for A;B ;C  f

Stmtlist

Delete   

1 2

Stmtlist

 continue 

Continues

1 2

g

g

for A ;B ;C  f

if expr f

for A ;

 jump

1

B and not expr ;

g else f

C  f

Move

   , X is either a

1 2

Stmtlist1

Stmtlist1

Conditionals break or continue

g

Stmtlist2

Stmtlist2

g

g



2

for  stmt ; expr ;f Stmtlist contains no

Stmtlist stmt reachable program

 if expr f

Remove p oints equivalent to

1

0

Lo op g Stmtlist

 . The program p oint

1

 g after any break must b e

2

equivalentto .

1

if expr1 f

 X

1

if expr1 or expr2 f

g else f

X

if expr2 f

X and Y are equivalent

g

Create Short

 Y

2

jumps. I.e.,    .

else f

Circuit Or's

1 2

g else f

Stmtlist

Stmtlist

g

g

g

if expr1 f

if expr2 f if expr1 and expr2 f

Create Short Neither if stmt has an

Stmtlist Stmtlist

Circuit And's else branch

g g

g

Table 1: Canonical Co de Transformation Rules

 // f  , , , , g

1 2 4 5 6 7

for ;;f

 

1 2

if a

  // f  g

2 3 8

return a; break;

 // unreachable  // unreachable

3 4

g g

else f else f

 

4 5

a= b; continue;

  // unreachable

5 6

g g

  // unreachable

6 7

g



8

Figure 5: Reachable Points

Figure 6: EquivalentPoints

Figure 7 shows the original source co de of a sam- App endix B contains additional examples of

ple program. Figure 8 shows the results of expres- Krakatoa's output.

sion decompilation on the byteco de of this program.

Figure 9 shows the results of applying Ramshaw's

6 Countermeasures

algorithm to the decompiled expression graph. Fig-

ure 10 shows the result of the grammar rewrit-

Krakatoa is very e ective at repro ducing readable

ing rules applied to the output of Ramshaw's al-

Java source from Javabyteco de. This maybe

gorithm. Obviously, using DeMorgan's laws would

alarming to those who want to protect their source

simplify the b o olean expressions. Future versions

co de from unwanted copying. Unfortunately, there

of Krakatoa will do so.

are few countermeasures.

One could intro duce irreducible control- ow

For the JVM dup op erators, which duplicate

through b ogus conditional jumps to foil Ramshaw's

stack elements, Krakatoa simply creates a temp o-

algorithm. This, however, only stops the recre-

rary variable to hold the duplicated value. This

ation of high-level constructs. Krakatoa could sim-

yields unnatural, but easily readable, decompila-

ply pro duce source co de in a Java-like language ex-

tions. A more dicult problem is our failure to

tended with goto's.

recover the conditional-expression op erator, \?:".

One could intro duce bizarre stack b ehavior to foil

This op eration presents two diculties: it requires

determining short-circuit op erators during expres- expression recovery. This is dicult, however, b e-

cause the b ehavior cannot b e so bizarre as to yield

sion recovery, and it requires that expression recov-

unveri able byteco de. It is p ossible, however, to

ery handle non-empty stacks at basic blo ck b ound-

create many b ogus threads of control i.e., threads

aries. Fortunately, the short-circuit problem can b e

that will never execute that will confuse the ex-

handled easily with four simple graph-writing rules

given in [Cif93]. The non-empty stack problem is pression recovery mechanism in basic blo cks that

dicult b ecause it requires combining expressions are entered with non-empty stacks.

in our symb olic stackuponentering a basic blo ck One co de obfuscation technique that is mo destly

with multiple predecessors . Krakatoa again uses a e ectiveistochange the class le's symb ol table to

contain bizarre names for elds and metho ds. So

temp orary variable to hold the result of each branch

long as co op erating classes agree on these names,

of the conditional expression, and then assigns this

the class les will link and execute correctly [vV96, temp orary value to the conditional expression. We

Sri96]. are currently investigating other solutions to this

Another suggested solution is to use dedicated problem.

class fo o f

void fo oint x, int y f class fo o f

while x+y<10 && x > 5 f

void fo oint i1, int i2 f

if y > x k y < 100 f goto L4;

x=y; L1: if i2 > i1 goto L2;

g if i2 >= 100 goto L3;

else f L2: i1 = i2;

x += 100;

goto L4;

g L3: i1 += 100;

g L4: if i1+i2>=10 goto L5;

g if i1 > 5 goto L1;

g L5: return;

g // fo o

g // fo o

Figure 7: Original Source

Figure 8: After Expression Decompilation

class fo o f

void fo oint i1, int i2 f

lp3: for ;;f

if i1 + i2 >= 10 break lp3;

class fo o f

if !i1 > 5 break lp3;

void fo oint i1, int i2 f

lp2 : for ;;f

lp3: for  ;!i1+i2>=10&&i1>5;  f

lp1 : for  ;; f

if i2 > i1 k !i2 >= 100 f

if i2 > i1 break lp1;

i1 = i2;

if !i2 >= 100 break lp1;

g // then

break lp2;

else f

g // lp1

i1 += 100;

i1 = i2;

g

continue lp3;

g // lp3

g // lp2

return;

i1 += 100;

g

continue lp3;

g

g // lp3

return;

g

g

Figure 10: After AST Transformation Final De-

compilation Results

Figure 9: After Goto Elimination Ramshaw's Algorithm

hardware and encryption to protect class les graph transformations. Our goal is similar, since

[Wil97]. the output of the decompiler should b e as readable

as p ossible. Her technique structures old FOR-

Many traditional countermeasures to reverse-

TRAN programs for readability. As a result, her

engineering will not work for Javabyteco de. It is

technique may leave some goto's in the resulting

imp ossible to mix co de and data. It is imp ossible to

programs, which is not allowed in Java.

jump to the middle of instructions. It is imp ossible

to generate byteco de and then jump to it.

Other techniques for eliminating goto's have

b een prop osed [EH94, Amm92, AKPW83, AM75].

These techniques maychange the structure of the

7 Related Work

program, and may add condition variables, or cre-

ate subroutines.

Ramshaw presented a technique for eliminating

goto's in Pascal programs by replacing them with

multilevel break's and surrounding lo ops [Ram88].

8 Conclusion

He made no attempt to recover high-level control

constructs. All high-level control structures were

In this pap er, we present a technique for decom-

provided by the original Pascal.

piling Javabyteco de into Java source. Our decom-

Several decompilation systems have used a se-

piler, Krakatoa, pro duces syntactically legal Java

ries of graph transformations to recover high-level

source from legal, reducible Javabyteco de. We fo-

constructs [Lic85, Cif93]. These systems encounter

cus on two subproblems of decompilation: recov-

diculties in the presence of nested lo ops, and

ery of expressions from Java's stack-based byte-

other arbitrarily control ow. Multilevel break's

co de, and recovery of high-level control- ow con-

cause considerable problems. Exception handling

structs. We present our stack simulation metho d

intro duces another diculty to such systems, as

for recovering expressions. We present an extension

the control ow graph can b e entered in several

of Ramshaw's goto elimination technique that can

places. Krakatoa easily creates multi-level break's

b e applied to any reducible control- ow graph.

and continue's, and is able to eliminate virtually

We also present a small, yet p owerful, set of co de

all of the unnecessary ones via successive applica-

rewriting rules for recovering the natural high-level

tion of the rewrite rules.

control- ow constructs of the Java source language.

\Mo cha" version 1 b eta 1 [vV96]isaJava de-

These rewrite rules enable Krakatoa to successfully

written by Hanp eter van Vliet. Mo cha

decompile many class les that graph transforma-

uses graph transformations to recover high-level

tion systems fail. If Krakatoa is presented with

constructs. Mo cha often ab orts when it confronts

a high-level language idiom that it do es not rec-

tangled|yet structured|control ow including

ognize, it may leave unnecessary breaksor con-

multi-level break's and continue's. The system

tinues in the co de. It will still pro duce legal Java,

do es issue typ e declarations, and uses debugging in-

however. If a system relies on a graph transforma-

formation when present to recover lo cal variable

tion system to pro duce high-level constructs, it will

names.

fail when presented with an unexp ected construct.

Other graph transformation systems used no de-

Our techniques, combined with the abundant

splitting to transform an unstructured graph to a

typ e information available in class les, make de-

structured graph [WO78, PKT73, Wil77]. Peter-

compilation of Javabyteco de quite e ective.

son, Kasami, and Tokura present a pro of that every

ow graph can b e transformed into an equivalent

well-formed ow graph. Williams and Ossher use a

9 Acknowledgment

similar technique, but they recognize ve unstruc-

tured sub-graphs, and replace those with equivalent

Saumya Debray help ed develop the instruction or-

structured graphs. No de-splitting preserves the ex-

dering algorithm.

ecution sequence of a program, but not the struc-

ture. We do not consider this reasonable for de-

compilation.

References

Baker presents a technique for pro ducing pro-

[AKPW83] J.. Allen, Ken Kennedy, Carrie grams from ow graphs [Bak77]. Baker gener-

Porter eld, and Jo e Warren. Conver- ates summary control ow information to guide her

[PKT73] W.W. Peterson, T. Kasami, and sion of control dep endence to data de-

N. Tokura. On the capabilities of while, p endence. pages 177{189, 1983.

rep eat and exit statements. Commu-

[AM75] E. Ashcroft and Z. Manna. Translating

nications of the ACM, 168:503{512,

programs schemas to while-schemas.

1973.

SIAM Journal of Computing, 42:125{

[Ram88] Lyle Ramshaw. Eliminating go to's

146, 1975.

while preserving program structure.

[Amm92] Zahira Ammarguellat. A control- ow

Journal of the Association for Comput-

normalization algorithm and its com-

ing Machinery, 354:893{920, Octob er

plexity. IEEE Transactions on Soft-

1988.

ware Engineering, 182:237{250, 1992.

[Sri96] KB Sriram. Hashjava. url:

[ASU86] A. V. Aho, R. Sethi, and J. D. Ull-

http://www.sbktech.org/hashjava.html,

man. : Principles, Tech-

1996.

niques, and Tools. Addison-Wesley,

[vV96] Hanp eter van Vliet. Mo cha. current

Reading, Massachusetts, 1986.

url: http://www.brouhaha.com/~eric/

computers/mo cha-b1.zip, 1996.

[Bak77] Brenda S. Baker. An algorithm for

structuring owgraphs. Journal of the

[Wil77] M.H. Williams. Generating struc-

Association for Computing Machinery,

tured ow diagrams: The nature of

241:98{120, January 1977.

unstructuredness . Computer Journal,

201:45{50, 1977.

[Cif93] Cristina Cifuentes. A structuring al-

gorithm for decompilation. In Pro-

[Wil97] U. G. Wilhelm. Cryptographically

ceedings of the XIX Conferencia Lati-

protected ob jects, May 1997. A

noamericana de Informatica, pages

frenchversion app eared in the Pro-

267{276, Buenos Aires, Argentina, Au-

ceedings of RenPar'9, Lausanne, CH.

gust 1993.

http://lsewww.epfl.ch/~wilhelm/

CryPO.html.

[EH94] Ana M. Erosa and Laurie J. Hendren.

Taming control ow: A structured ap-

[WO78] M.H. Williams and H.L. Ossher. Con-

proach to eliminating goto statements.

version of unstructured ow diagrams

pages 229{240. International Confer-

to structured. Computer Journal,

ence on Computer Languages, May

212:161{167, 1978.

1994.

[Lic85] Ulrike Lichtblau. Decompilation of

A Additional Rewriting Rules

control structures by means of graph

transformations. In C. F. M. Ni-

Weanticipate using a few other tree rewriting rules

vat Hartmut Ehrig and J. Thatcher,

that might improve readability of our co de. The

editors, Mathematical foundations of

anticipated rules build more natural for-lo ops. Ta-

software development: Proceedings of

ble 2 presents addition co de transformation rules

the International Joint Conferenceon

that could b e applied by Krakatoa. We exp ect

Theory and Practice of Software De-

to add these rules as we re-implement Krakatoa in

velopment TAPSOFT 85: volume 1

Java.

- Col loquium on Trees in Algebra and

Programming CAAP '85,volume 185

B Sample Decompiler Output

of Lecture Notes in Computer Science,

pages 284{297. Springer-Verlag, March

We've included a representative sampling of

1985.

Krakatoa's output on a class le that implements

[LY97] Tim Lindholm and Frank Yellin. The sets in Java. The original Java source is on the left

Java Virtual Machine Speci cation. and Krakatoa's output is on the right. Table 3 pro-

The Java Series. Addison-Wesley, 1997. vides original source class de nitions as well as the

Rule Before After Conditions

I

for I ; expr ; update  f

for ;expr ; update  f

Include

Stmtlist I is a simple statement

Stmtlist

Init

g

g

for init ; expr ;f

Stmtlist contains no

for init ; expr ; U  f

Include Stmtlist reachable program

Stmtlist

Up date U p oints equivalentto.

1

g

 g U is a simple statement

1

Table 2: Additional Co de Transformation Rules

Original Source Output from Krakatoa

import java.io.PrintStream; import java.io.PrintStream;

import java.util.Vector; import java.util.Vector;

public class Set public class Set

implements Cloneable { extends java.lang.Object

implements java.lang.Cloneable{

// class variables

static boolean echo_ops; static boolean echo_ops;

// instance variables

protected Vector members; protected java.util.Vector members;

// functions are defnied here.... // functions are defined here...

} }

Table 3: Class de nition output from Krakatoa

corresp onding Krakatoa output. Table 4 provides

original source of several small functions together

with Krakatoa output for those functions. Table 5

shows a larger function in original source as well as

Krakatoa output for that function.

Original Source Output from Krakatoa

public boolean isMemberObject o { public boolean isMember

java.lang.Object local1 {

return members.containso; return this.members.containslocal1;

} // isMember } // isMember

public void addMemberObject o { public void addMember

java.lang.Object local1 {

if !isMembero { if ! this.isMemberlocal1 != 0 {

members.addElemento; this.members.addElementlocal1

} // then } // then

return;

} // addMember } // addMember

public void removeMemberObject o { public void removeMember

java.lang.Object local1 {

members.removeElemento; this.members.removeElementlocal1

return;

} // removeMember } // removeMember

public int size { public int size {

return members.size; return this.members.size;

} // size } // size

boolean equalsSet local1 {

boolean equalsSet s { Set local2;

Set d1, d2; Set local3;

d1 = differences; local2 = this.differencelocal1;

d2 = s.differencethis; local3 = local1.differencethis;

if !local2.size != 0 ||

!local3.size == 0 {

return 1;

return d1.size == 0 && } // then

d2.size == 0; else {

return 0;

} }//if

} // equals

Table 4: Member Functions: Original Source and Krakatoa output

Original Source Output from Krakatoa

// This returns a NEW set, with all of

// the elements from this set and

// Set s.

public Set unionSet s { public Set unionSet local1 {

Set out; Set local2;

int size; int local3;

int i; int local4;

Object obj; java.lang.Object local5;

if echo_ops { if ! Set.echo_ops == 0 {

System.out.println"unioning"; java.lang.System.out.println"unioning";

} } // then

out = new Set; local2 = new Set;

local2.members = java.util.Vector

this.members.clone;

out.members = Vector members.clone; local3 = local1.size;

local4 = 0;

size = s.size; loop3 :

for  ; !!local4 < local3 ;  {

for i = 0; i < size; i++ { local5 =

obj = s.members.elementAti; local1.members.elementAtlocal4;

if !out.isMemberobj{ if !local2.isMemberlocal5 != 0 {

out.addMemberobj; local2.addMemberlocal5

} // then } // then

local4 += 1;

} // for } // loop3

return out; return local2;

} // union } // union

Table 5: Memb er functions: Original Source and Krakatoa output