The following paper was originally published in the Proceedings of the Third USENIX Conference on Object-Oriented Technologies and Systems Portland, Oregon, June 1997
Krakatoa: Decompilation in Java (Does Bytecode Reveal Source?)
Todd A. Proebsting, Scott A. Watterson The University of Arizona
For more information about USENIX Association contact: 1. Phone: 510 528-8649 2. FAX: 510 548-5738 3. Email: [email protected] 4. WWW URL: http://www.usenix.org
Krakatoa: Decompilation in Java
Does Bytecode Reveal Source?
To dd A. Pro ebsting Scott A. Watterson
The University of Arizona
Abstract well known. Presently,we fo cus our researchon
two subproblems: recovering source-level expres-
This pap er presents our technique for automati-
sions and synthesizing high-level control constructs
cally decompiling Javabyteco de into Java source.
from goto-like primitives.
Our technique reconstructs source-level expres-
Krakatoa uses a stack-simulation technique to re-
sions from byteco de, and reconstructs readable,
cover expressions and p erform typ e inference. Ex-
high-level control statements from primitive goto-
pression recovery creates source-level assignments
like branches. Fewer than a dozen simple co de-
and comparisons from primitivebyteco de op era-
rewriting rules reconstruct the high-level state-
tions. We extend Ramshaw's goto-elimination al-
ments.
gorithm to structure and create source for ar-
bitrary reducible control ow graphs. This tech-
nique pro duces source co de with lo ops and multi-
1 Intro duction
level break's. Subsequent techniques recover more
intuitive constructs e.g., if statements via appli-
Decompilation transforms a low-level language into
cation of simple co de rewrite rules.
a high-level language. The Java Virtual Machine
JVM sp eci es a low-level byteco de language for a
Traditional decompilation systems use graph
stack-based machine [LY97]. This language de nes
transformations to recover high-level control con-
203 op erators, with most of the control ow sp eci-
structs. These systems require the author of the
ed by simple explicit transfers and lab els. Compil-
decompiler to anticipate all high-level control id-
ing a Java class yields a class le that contains typ e
ioms. When faced with an unexp ected language
information and byteco de. The JVM requires a sig-
idiom, these systems either ab ort, or pro duce gotos
ni cant amountoftyp e information from the class
illegal in Java. Krakatoa represents a di erent ap-
les for ob ject linking. Furthermore, the byteco de
proach. Krakatoa rst pro duces legal Java source
must b e veri ably wel l-behaved in order to ensure
given legal Javabyteco de with arbitrary reducible
safe execution. Decompilation systems can exploit
control ow, and then recovers intuitive high-level
this typ e information and well-b ehaved prop ertyto
constructs from this source.
recover Java source co de from the class le.
We present a technique for transforming low-level
Figure 1 gives the ve steps of decompilation
Javabyteco de into legal Java source co de. Our sys-
p erformed by Krakatoa. First, the expression
1
tem, Krakatoa, p erforms typ e inference to issue
builder reads byteco de, recovers expressions and
lo cal variable declarations. The veri er do es the
typ e information, and pro duces a control ow
same typeoftyp e inference, and the techniques are
graph CFG. Next, the sequencer orders the CFG
no des for Ramshaw's goto-elimination technique.
Address: Department of Computer Science, Uni-
Ramshaw's algorithm pro duces a convoluted|yet
versity of Arizona, Tucson, AZ 85721; Email: fto dd,
legal|Java abstract syntax tree AST. Our sys-
1
Krakatoa is a volcano lo cated in the Sunda Strait b e-
tem then transforms this AST into a less convo-
tween Java and Sumatra. Its 1983 eruption threw ve cubic
luted AST using a set of simple rewrites. The nal
miles of debris into the air and was heard 2200 miles away
in Australia. phase pro duces Java source by traversing the AST.
class foo { int sam;
Java Bytecodes
int barint a, int b { if sam > a {
Expression Builder
b = a*2; }
Flow graph with expressions and conditional gotos
return b; }
Node Sequencer }
Augmented Flow graph Compiled from foo.java
class foo extends java.lang.Object { int sam;
Goto Eliminator (Ramshaw's Algorithm) int barint,int;
Java AST
Method int barint,int
0 aload_0
Code Simplifier 1 getfield 3
Restructured Java AST 5 if_icmple 12 8 iload_1
Final Java printer 9 iconst_2 10 imul
Java Source 11 istore_2
12 iload_2
13 ireturn
}
Figure 1: Java Byteco de Decompilation System
Figure 2: Simple Metho d and Byteco de Disassem-
bly via javap -c.
2 Expression Recovery
instance, iload_1, which loads the value of the
Javabyteco des b ear a very close corresp ondence
rst lo cal variable|with typ e int|could b e rep-
to Java source. As a result, recovering expres-
resented on the stackas\i1". Similarly,if i1 and
sions from Javabyteco de is often simple|much
2 were the top two elements of the symb olic stack,
simpler than recovering expressions from machine
and the next byteco de were iadd integer addition,
language. Java class les include information that
those elements would b e p opp ed o the stack and
makes recovering high-level op erations like eld ref-
replaced with \i1+2". The symb olic execution
erences easy. The fact that the byteco de must b e
of some expressions, like assignment, requires emit-
well-b ehaved i.e., veri able also simpli es analy-
ting Java source.
sis. Figure 2 gives a sample program and its abbre-
Our algorithm recovers expressions one basic
viated disassembly. Note the level of typ e informa-
blo ck at a time. Some basic blo cks such as those
tion in the disassembly pro duced by Sun's javap
pro duced by the conditional expression op eration, utility.
A?B:C do not b egin with empty stacks, so some
Symb olic execution of the byteco de creates the
information is required to prop ogate from prede-
corresp onding Java source expressions. It also cre-
cessors. Also, basic blo cks that b egin exception-
ates conditional and unconditional goto's, which
handling blo cks|which are easily identi ed|b egin
will b e removed by subsequent decompilation steps.
with the raised exception on the stack.
Symb olic execution simulates the Java Virtual Ma-
Figure 3 provides the step-by-step decompilation chine's evaluation stack with strings that represent
of the byteco de in Figure 2. The initial aload_0 the source-level expressions b eing computed. For
instruction pushes a Java reference onto the stack. program with exactly the same control owasthe
In virtual functions, the \0'th" lo cal variable, a0, goto-only program.
always refers to this. The getfield instruction Ramshaw's algorithm requires two inputs: the
references a named eld, \sam", of the current top control ow graph, and an instruction ordering. His
of stack. Therefore, the \this" is p opp ed and algorithm enco des this order into the ow graph
replaced with \this.sam". iload_1 pushes \i1" using augmenting edges, such that every instruc-
onto the stack. The ifcmple compares the top two tion has an augmenting edge to the next instruc-
stack elements and branches to the appropriate in- tion in sequence. These augmenting edges o ccur
struction if the lower is less than or equal to the between every pair of physically adjacent instruc-
top element. Symb olically executing the ifcmple tions even if actual control owbetween them is
requires p opping the top two elements and emit- imp ossible. He proves that if this augmented graph
ting the appropriate conditional branch. Translat- is reducible, then a structural ly equivalent [PKT73]
ing the remaining instructions is similar. program can b e created without goto's. How-
ever, Ramshaw provides no algorithm for nding
Most of the byteco de instructions are equally
a reducible augmented ow graph from a given re-
simple to symb olically execute. Unfortunately,a
ducible ow graph.
few require more information. Some of the stack
The control- ow graphs of Java programs are
manipulation routines e.g., pop2, dup2, etc. de-
reducible. Therefore, the compiled byteco de will
p end on byte o sets from the stack top. For in-
likely form a reducible control- ow graph. Unfor-
stance, pop2 removes the top 8 bytes from the
tunately, simple optimizations likeloopinversion
stack, whether those 8 bytes represent one 8-byte
create irreducible augmented ow graphs. The ow
double value, or two 4-byte scalar values. To cor-
graph of the program in Figure 8 has this problem
rectly simulate these instructions the symb olic ex-
b ecause the augmenting edge b etween the rst two
ecution keeps track of the size and typ e of each
statements creates a \jump" into the b o dy of the
stack element.
lo op formed by the next seven statements.
To utilize Ramshaw's algorithm, we develop ed an
3 Instruction Ordering
algorithm that orders a reducible graph's instruc-
tions such that the resulting augmented graph is
After recovering expressions, conditional and un-
also reducible.
conditional goto's along with implicit fal l through
b ehavior determine control ow. Java, however,
3.1 Augmenting the Flow Graph
has no goto statement, so its control owmust b e
expressed with structured statements.
Creating a reducible augmented ow graph re-
Ramshaw presented an algorithm for eliminating
quires that no augmenting edge enters a lo op any-
goto's from Pascal programs while preserving the
where other than at its header. Preventing this
program's structure [Ram88]. This algorithm re-
is simple|when ordering the instructions, make
places each goto with a multilevel break to a sur-
the header rst and contiguously order the lo op's
rounding lo op. The algorithm determines the ap-
instructions. Because physical adjacency deter-
propropriate lo cations for these surrounding lo ops.
mines augmenting edges, contiguously ordering the
We trivially extended his algorithm to use multi-
instructions guarantees that the only augmenting
level continue's.
edge entering the lo op from the outside will b e en-
tering at the top, which will not a ect reducibility Ramshaw's extended algorithm replaces each
if it is the lo op's header. forward goto with a break and each backward
A lo op with no nested lo ops inside is easy to goto with a continue. His algorithm inserts a lo op
order|simply remove the back edges and top o- that ends just b efore the target of each break state-
logically sort the remaining directed acyclic graph ment. Likewise, it inserts a lo op that starts just
DAG. Handling interior lo ops requires replacing b efore the target of each continue. These lo ops
them with a single placeholder no de in the graph ensure that each control-transfer statement jumps
and separately ordering b oth the lo op and the sur- to the correct instruction. Each newly-inserted
rounding graph. After ordering b oth, re-insert the lo op must also end with a break statement, so
lo op's no des at its placeholder. Re-ordering in- that control will fal l out of the lo op. Figure 4
structions maychange whether or not one instruc- shows an example of this technique. Additional
tion fal ls through to another as it did in the original lo ops and break/continue's create a structured
Byteco de Symb olic Stack Emitted Source
aload 0 "this"
getfield 3
iload 1 "this.sam", "i1"
if icmple 12 if this.sam <= i1 goto L12
iload 1 "i1"
iconst 2 "i1", "2"
imul "i1*2"
istore 2 i2 = i1*2
12: iload 2 "i2" L12:
ireturn return i2
Figure 3: Symb olic Execution of Byteco de
stmt0
L2: for ;; f
L1: for ;; f
if expr 1 break L1; stmt0
if expr 2 break L2; if expr 1 goto L1;
break L1; if expr 2 goto L2;
g // L1 L1: stmt1
stmt1 L2: stmt2
break L2;
g // L2
stmt2
Figure 4: Ramshaw's Goto Elimination: Before and After
ordering. Where implicit control ow has changed, dering of the no des the augmenting path, Kraka-
the algorithm must add new branches to restore toa can apply Ramshaw's technique to eliminate
the original control ow. Whenever p ossible, the goto's.
top ological sort attempts to maintain the original
fall-through b ehavior.
4 Co de Transformations
This algorithm pro duces a reducible augmented
graph. Because all lo ops are ordered separately,
4.1 Program Points
and laid out contiguously, the only augmenting
edge entering from outside enters at the top. The
After applying Ramshaw's algorithm for eliminat-
top ological sort of the lo op minus its backedges
ing goto's, Krakatoa has a complex, yet legal, Java
guarantees that this top no de is the lo op header and
AST see Figure 9. Krakatoa then pro ceeds to
that no internal edges cause irreducibility. Outside
recover more of the natural high-level constructs
edges into the lo op header cannot make a lo op irre-
of the original program e.g. if-then-else, etc..
ducible. Therefore, the resulting augmented graph
Krakatoa uses a program point analysis to summa-
is reducible.
rize a program's control- ow and to guide recover-
Lo ops are not the only blo cks of instructions
ing high-level constructs. A program p oint is a syn-
whichmust b e ordered contiguously. Exception
tactic lo cation in a program. Every statement has a
handling regions must form contiguous sections of
program p oint b oth b efore and after it. These pro-
instructions. Class les sp ecify which instructions
gram p oints havetwo prop erties: reachability and
are in which regions. Our algorithm orders those
equivalence class.
regions contiguously by treating them like lo ops.
A program p oint is unreachable if and only if it is
preceded along all execution paths by an uncondi- After applying this technique to create a total or-
tional jump statement i.e. return, throw, break, then-branchStmtlist1 can reach Stmtlist2 di-
or continue. For instance, in Figure 5, program rectly.
p oint3, , is unreachable, since it is preceded bya
3
return statement. is reachable, however, since
6
4.4 Lo op Rewriting Rules
one of the branches in the preceding if statement
The third rule in Table 1 removes useless continue
do es not end with a jump statement.
statements. If the program p oint after a continue
Two program p oints are equivalent denoted as
statement is equivalent to the program p oint b efore
if and only if future computation of the
x y
the continue statement, then that continue can
program is the same from b oth p oints. For in-
b e removed.
stance, the program p oint b efore a break state-
The fourth rule creates a short-circuit test ex-
ment is equivalent to the program p oint after the
pression within a for-lo op by eliminating an inte-
lo op it exits and in Figure 6. As an ex-
3 8
rior if statement. Doing so requires that the lo op
ample, in Figure 6, , , , , , and are
1 2 4 5 6 7
b o dy b egin with an if-then-else statement, and
equivalent, as are program p oints and .
3 8
that the then branch of that statement consists
Both reachability and equivalence are simple
of a single jump to a program p oint equivalentto
to compute via standard control- ow analyses
breaking out of the lo op.
[ASU86].
The fth transformation provides an example of
transforming lo ops into if statements. Aloopis
4.2 AST Rewrite Rules
equivalenttoanif if it can never rep eat itself, and if
all simple break statements can b e safely removed
Krakatoa p erforms a series of AST rewriting trans-
during the transformation. A lo op never rep eats
formations to recover as many of the \natural" pro-
if its last program p oint is unreachable. break's
gram constructs as it can e.g. if-then-else, etc..
may b e removed if the immediately following un-
Krakatoa applies these rewriting rules rep eatedly
reachable program p oint is equivalent to the last
until no changes o ccur. Wehave found that the
program p oint in the lo op in Table 1. The
1
few rules b elow are sucient to retrieve high-level
transformation replaces the lo op with an if state-
constructs of the Java language, including if-then-
ment, and deletes all of the break statements for
else statements, and short-circuit evaluation of ex-
that lo op.
pressions. Each rewriting rule reduces the size of
the AST, thus ensuring termination.
4.5 Short Circuit Evaluation
Table 1 summarizes the rules, whichwe describ e
b elow in greater detail. Many of these rules gen-
Rewriting Rules
eralize. Those that apply to for-lo ops often apply
The sixth rule shown in Table 1 recovers a short-
to other lo ops. Many rules have several symmetric
circuit Or conditional. Short-circuit Or's exist
cases. For example, the rst rule in Table 1 re-
when two adjacent conditionals guard the same
moves an empty else-branch from an if-then-else
statement list and failure of either will cause a
statement|there is a symmetric rule for removing
branch to equivalent lo cations.
an empty then-branchby negating the predicate.
The last transformation in Table 1 recovers short-
circuit And expressions. This transformation is ap-
4.3 if-then-else Rewriting Rules
plicable whenever a simple if statement represents
the entire b o dy of another.
The rst transformation shown in Table 1 changes
an if-then-else statementinto an if-then state-
ment when the else branch is empty. This trans-
5 Status
formation is always legal.
The second transformation creates an if-then- Wehave implemented a prototyp e Java decompiler,
else statement from an if-then statementby hoist- Krakatoa, in Java. Wehave run Krakatoa on a
ing the subsequent statement list into the else-part. numb er of class les, including some to whichwe
Our algorithm p erforms this transformation if and had no source co de access. We examined the output
only if no reachable program p ointin Stmtlist1 is of Krakatoa by hand, and Krakatoa app ears to re-
equivalent to the program p oint b efore Stmtlist2. cover high-level constructs very well. Figures 7{10
Essentially, this means that no statementinthe provide an example of the stages of decompilation.
Rule Before After Conditions
if expr f if expr f
Eliminate
Stmtlist Stmtlist None
Else
g else fg g
if expr f
if expr f
Stmtlist1 Stmtlist1 contains no
Stmtlist1
Create
g else f reachable program
g
if-then-else
Stmtlist2 p oints equivalentto .
1
: Stmtlist2
1
g
for A ;B ;C f
for A;B ;C f
Stmtlist
Delete
1 2
Stmtlist
continue
Continues
1 2
g
g
for A ;B ;C f
if expr f
for A ;
jump
1
B and not expr ;
g else f
C f
Move
, X is either a
1 2
Stmtlist1
Stmtlist1
Conditionals break or continue
g
Stmtlist2
Stmtlist2
g
g
2
for stmt ; expr ;f Stmtlist contains no
Stmtlist stmt reachable program
if expr f
Remove p oints equivalent to
1
0
Lo op g Stmtlist
. The program p oint
1
g after any break must b e
2
equivalentto .
1
if expr1 f
X
1
if expr1 or expr2 f
g else f
X
if expr2 f
X and Y are equivalent
g
Create Short
Y
2
jumps. I.e., .
else f
Circuit Or's
1 2
g else f
Stmtlist
Stmtlist
g
g
g
if expr1 f
if expr2 f if expr1 and expr2 f
Create Short Neither if stmt has an
Stmtlist Stmtlist
Circuit And's else branch
g g
g
Table 1: Canonical Co de Transformation Rules
// f , , , , g
1 2 4 5 6 7
for ;;f
1 2
if a
// f g
2 3 8
return a; break;
// unreachable // unreachable
3 4
g g
else f else f
4 5
a= b; continue;
// unreachable
5 6
g g
// unreachable
6 7
g
8
Figure 5: Reachable Points
Figure 6: EquivalentPoints
Figure 7 shows the original source co de of a sam- App endix B contains additional examples of
ple program. Figure 8 shows the results of expres- Krakatoa's output.
sion decompilation on the byteco de of this program.
Figure 9 shows the results of applying Ramshaw's
6 Countermeasures
algorithm to the decompiled expression graph. Fig-
ure 10 shows the result of the grammar rewrit-
Krakatoa is very e ective at repro ducing readable
ing rules applied to the output of Ramshaw's al-
Java source from Javabyteco de. This maybe
gorithm. Obviously, using DeMorgan's laws would
alarming to those who want to protect their source
simplify the b o olean expressions. Future versions
co de from unwanted copying. Unfortunately, there
of Krakatoa will do so.
are few countermeasures.
One could intro duce irreducible control- ow
For the JVM dup op erators, which duplicate
through b ogus conditional jumps to foil Ramshaw's
stack elements, Krakatoa simply creates a temp o-
algorithm. This, however, only stops the recre-
rary variable to hold the duplicated value. This
ation of high-level constructs. Krakatoa could sim-
yields unnatural, but easily readable, decompila-
ply pro duce source co de in a Java-like language ex-
tions. A more dicult problem is our failure to
tended with goto's.
recover the conditional-expression op erator, \?:".
One could intro duce bizarre stack b ehavior to foil
This op eration presents two diculties: it requires
determining short-circuit op erators during expres- expression recovery. This is dicult, however, b e-
cause the b ehavior cannot b e so bizarre as to yield
sion recovery, and it requires that expression recov-
unveri able byteco de. It is p ossible, however, to
ery handle non-empty stacks at basic blo ck b ound-
create many b ogus threads of control i.e., threads
aries. Fortunately, the short-circuit problem can b e
that will never execute that will confuse the ex-
handled easily with four simple graph-writing rules
given in [Cif93]. The non-empty stack problem is pression recovery mechanism in basic blo cks that
dicult b ecause it requires combining expressions are entered with non-empty stacks.
in our symb olic stackuponentering a basic blo ck One co de obfuscation technique that is mo destly
with multiple predecessors . Krakatoa again uses a e ectiveistochange the class le's symb ol table to
contain bizarre names for elds and metho ds. So
temp orary variable to hold the result of each branch
long as co op erating classes agree on these names,
of the conditional expression, and then assigns this
the class les will link and execute correctly [vV96, temp orary value to the conditional expression. We
Sri96]. are currently investigating other solutions to this
Another suggested solution is to use dedicated problem.
class fo o f
void fo oint x, int y f class fo o f
while x+y<10 && x > 5 f
void fo oint i1, int i2 f
if y > x k y < 100 f goto L4;
x=y; L1: if i2 > i1 goto L2;
g if i2 >= 100 goto L3;
else f L2: i1 = i2;
x += 100;
goto L4;
g L3: i1 += 100;
g L4: if i1+i2>=10 goto L5;
g if i1 > 5 goto L1;
g L5: return;
g // fo o
g // fo o
Figure 7: Original Source
Figure 8: After Expression Decompilation
class fo o f
void fo oint i1, int i2 f
lp3: for ;;f
if i1 + i2 >= 10 break lp3;
class fo o f
if !i1 > 5 break lp3;
void fo oint i1, int i2 f
lp2 : for ;;f
lp3: for ;!i1+i2>=10&&i1>5; f
lp1 : for ;; f
if i2 > i1 k !i2 >= 100 f
if i2 > i1 break lp1;
i1 = i2;
if !i2 >= 100 break lp1;
g // then
break lp2;
else f
g // lp1
i1 += 100;
i1 = i2;
g
continue lp3;
g // lp3
g // lp2
return;
i1 += 100;
g
continue lp3;
g
g // lp3
return;
g
g
Figure 10: After AST Transformation Final De-
compilation Results
Figure 9: After Goto Elimination Ramshaw's Algorithm
hardware and encryption to protect class les graph transformations. Our goal is similar, since
[Wil97]. the output of the decompiler should b e as readable
as p ossible. Her technique structures old FOR-
Many traditional countermeasures to reverse-
TRAN programs for readability. As a result, her
engineering will not work for Javabyteco de. It is
technique may leave some goto's in the resulting
imp ossible to mix co de and data. It is imp ossible to
programs, which is not allowed in Java.
jump to the middle of instructions. It is imp ossible
to generate byteco de and then jump to it.
Other techniques for eliminating goto's have
b een prop osed [EH94, Amm92, AKPW83, AM75].
These techniques maychange the structure of the
7 Related Work
program, and may add condition variables, or cre-
ate subroutines.
Ramshaw presented a technique for eliminating
goto's in Pascal programs by replacing them with
multilevel break's and surrounding lo ops [Ram88].
8 Conclusion
He made no attempt to recover high-level control
constructs. All high-level control structures were
In this pap er, we present a technique for decom-
provided by the original Pascal.
piling Javabyteco de into Java source. Our decom-
Several decompilation systems have used a se-
piler, Krakatoa, pro duces syntactically legal Java
ries of graph transformations to recover high-level
source from legal, reducible Javabyteco de. We fo-
constructs [Lic85, Cif93]. These systems encounter
cus on two subproblems of decompilation: recov-
diculties in the presence of nested lo ops, and
ery of expressions from Java's stack-based byte-
other arbitrarily control ow. Multilevel break's
co de, and recovery of high-level control- ow con-
cause considerable problems. Exception handling
structs. We present our stack simulation metho d
intro duces another diculty to such systems, as
for recovering expressions. We present an extension
the control ow graph can b e entered in several
of Ramshaw's goto elimination technique that can
places. Krakatoa easily creates multi-level break's
b e applied to any reducible control- ow graph.
and continue's, and is able to eliminate virtually
We also present a small, yet p owerful, set of co de
all of the unnecessary ones via successive applica-
rewriting rules for recovering the natural high-level
tion of the rewrite rules.
control- ow constructs of the Java source language.
\Mo cha" version 1 b eta 1 [vV96]isaJava de-
These rewrite rules enable Krakatoa to successfully
compiler written by Hanp eter van Vliet. Mo cha
decompile many class les that graph transforma-
uses graph transformations to recover high-level
tion systems fail. If Krakatoa is presented with
constructs. Mo cha often ab orts when it confronts
a high-level language idiom that it do es not rec-
tangled|yet structured|control ow including
ognize, it may leave unnecessary breaksor con-
multi-level break's and continue's. The system
tinues in the co de. It will still pro duce legal Java,
do es issue typ e declarations, and uses debugging in-
however. If a system relies on a graph transforma-
formation when present to recover lo cal variable
tion system to pro duce high-level constructs, it will
names.
fail when presented with an unexp ected construct.
Other graph transformation systems used no de-
Our techniques, combined with the abundant
splitting to transform an unstructured graph to a
typ e information available in class les, make de-
structured graph [WO78, PKT73, Wil77]. Peter-
compilation of Javabyteco de quite e ective.
son, Kasami, and Tokura present a pro of that every
ow graph can b e transformed into an equivalent
well-formed ow graph. Williams and Ossher use a
9 Acknowledgment
similar technique, but they recognize ve unstruc-
tured sub-graphs, and replace those with equivalent
Saumya Debray help ed develop the instruction or-
structured graphs. No de-splitting preserves the ex-
dering algorithm.
ecution sequence of a program, but not the struc-
ture. We do not consider this reasonable for de-
compilation.
References
Baker presents a technique for pro ducing pro-
[AKPW83] J.R. Allen, Ken Kennedy, Carrie grams from ow graphs [Bak77]. Baker gener-
Porter eld, and Jo e Warren. Conver- ates summary control ow information to guide her
[PKT73] W.W. Peterson, T. Kasami, and sion of control dep endence to data de-
N. Tokura. On the capabilities of while, p endence. pages 177{189, 1983.
rep eat and exit statements. Commu-
[AM75] E. Ashcroft and Z. Manna. Translating
nications of the ACM, 168:503{512,
programs schemas to while-schemas.
1973.
SIAM Journal of Computing, 42:125{
[Ram88] Lyle Ramshaw. Eliminating go to's
146, 1975.
while preserving program structure.
[Amm92] Zahira Ammarguellat. A control- ow
Journal of the Association for Comput-
normalization algorithm and its com-
ing Machinery, 354:893{920, Octob er
plexity. IEEE Transactions on Soft-
1988.
ware Engineering, 182:237{250, 1992.
[Sri96] KB Sriram. Hashjava. url:
[ASU86] A. V. Aho, R. Sethi, and J. D. Ull-
http://www.sbktech.org/hashjava.html,
man. Compilers: Principles, Tech-
1996.
niques, and Tools. Addison-Wesley,
[vV96] Hanp eter van Vliet. Mo cha. current
Reading, Massachusetts, 1986.
url: http://www.brouhaha.com/~eric/
computers/mo cha-b1.zip, 1996.
[Bak77] Brenda S. Baker. An algorithm for
structuring owgraphs. Journal of the
[Wil77] M.H. Williams. Generating struc-
Association for Computing Machinery,
tured ow diagrams: The nature of
241:98{120, January 1977.
unstructuredness . Computer Journal,
201:45{50, 1977.
[Cif93] Cristina Cifuentes. A structuring al-
gorithm for decompilation. In Pro-
[Wil97] U. G. Wilhelm. Cryptographically
ceedings of the XIX Conferencia Lati-
protected ob jects, May 1997. A
noamericana de Informatica, pages
frenchversion app eared in the Pro-
267{276, Buenos Aires, Argentina, Au-
ceedings of RenPar'9, Lausanne, CH.
gust 1993.
http://lsewww.epfl.ch/~wilhelm/
CryPO.html.
[EH94] Ana M. Erosa and Laurie J. Hendren.
Taming control ow: A structured ap-
[WO78] M.H. Williams and H.L. Ossher. Con-
proach to eliminating goto statements.
version of unstructured ow diagrams
pages 229{240. International Confer-
to structured. Computer Journal,
ence on Computer Languages, May
212:161{167, 1978.
1994.
[Lic85] Ulrike Lichtblau. Decompilation of
A Additional Rewriting Rules
control structures by means of graph
transformations. In C. F. M. Ni-
Weanticipate using a few other tree rewriting rules
vat Hartmut Ehrig and J. Thatcher,
that might improve readability of our co de. The
editors, Mathematical foundations of
anticipated rules build more natural for-lo ops. Ta-
software development: Proceedings of
ble 2 presents addition co de transformation rules
the International Joint Conferenceon
that could b e applied by Krakatoa. We exp ect
Theory and Practice of Software De-
to add these rules as we re-implement Krakatoa in
velopment TAPSOFT 85: volume 1
Java.
- Col loquium on Trees in Algebra and
Programming CAAP '85,volume 185
B Sample Decompiler Output
of Lecture Notes in Computer Science,
pages 284{297. Springer-Verlag, March
We've included a representative sampling of
1985.
Krakatoa's output on a class le that implements
[LY97] Tim Lindholm and Frank Yellin. The sets in Java. The original Java source is on the left
Java Virtual Machine Speci cation. and Krakatoa's output is on the right. Table 3 pro-
The Java Series. Addison-Wesley, 1997. vides original source class de nitions as well as the
Rule Before After Conditions
I
for I ; expr ; update f
for ;expr ; update f
Include
Stmtlist I is a simple statement
Stmtlist
Init
g
g
for init ; expr ;f
Stmtlist contains no
for init ; expr ; U f
Include Stmtlist reachable program
Stmtlist
Up date U p oints equivalentto.
1
g
g U is a simple statement
1
Table 2: Additional Co de Transformation Rules
Original Source Output from Krakatoa
import java.io.PrintStream; import java.io.PrintStream;
import java.util.Vector; import java.util.Vector;
public class Set public class Set
implements Cloneable { extends java.lang.Object
implements java.lang.Cloneable{
// class variables
static boolean echo_ops; static boolean echo_ops;
// instance variables
protected Vector members; protected java.util.Vector members;
// functions are defnied here.... // functions are defined here...
} }
Table 3: Class de nition output from Krakatoa
corresp onding Krakatoa output. Table 4 provides
original source of several small functions together
with Krakatoa output for those functions. Table 5
shows a larger function in original source as well as
Krakatoa output for that function.
Original Source Output from Krakatoa
public boolean isMemberObject o { public boolean isMember
java.lang.Object local1 {
return members.containso; return this.members.containslocal1;
} // isMember } // isMember
public void addMemberObject o { public void addMember
java.lang.Object local1 {
if !isMembero { if ! this.isMemberlocal1 != 0 {
members.addElemento; this.members.addElementlocal1
} // then } // then
return;
} // addMember } // addMember
public void removeMemberObject o { public void removeMember
java.lang.Object local1 {
members.removeElemento; this.members.removeElementlocal1
return;
} // removeMember } // removeMember
public int size { public int size {
return members.size; return this.members.size;
} // size } // size
boolean equalsSet local1 {
boolean equalsSet s { Set local2;
Set d1, d2; Set local3;
d1 = differences; local2 = this.differencelocal1;
d2 = s.differencethis; local3 = local1.differencethis;
if !local2.size != 0 ||
!local3.size == 0 {
return 1;
return d1.size == 0 && } // then
d2.size == 0; else {
return 0;
} }//if
} // equals
Table 4: Member Functions: Original Source and Krakatoa output
Original Source Output from Krakatoa
// This returns a NEW set, with all of
// the elements from this set and
// Set s.
public Set unionSet s { public Set unionSet local1 {
Set out; Set local2;
int size; int local3;
int i; int local4;
Object obj; java.lang.Object local5;
if echo_ops { if ! Set.echo_ops == 0 {
System.out.println"unioning"; java.lang.System.out.println"unioning";
} } // then
out = new Set; local2 = new Set;
local2.members = java.util.Vector
this.members.clone;
out.members = Vector members.clone; local3 = local1.size;
local4 = 0;
size = s.size; loop3 :
for ; !!local4 < local3 ; {
for i = 0; i < size; i++ { local5 =
obj = s.members.elementAti; local1.members.elementAtlocal4;
if !out.isMemberobj{ if !local2.isMemberlocal5 != 0 {
out.addMemberobj; local2.addMemberlocal5
} // then } // then
local4 += 1;
} // for } // loop3
return out; return local2;
} // union } // union
Table 5: Memb er functions: Original Source and Krakatoa output