Advanced Obfuscation Techniques for Java Bytecode

The Journal of Systems and Software 71 (2004) 1–10
www.elsevier.com/locate/jss

Jien-Tsai Chan *, Wuu Yang
Department of Computer and Information Science, National Chiao Tung University, Hsinchu, Taiwan 300, ROC

* Corresponding author. Tel.: +886-9-3330-3945; fax: +886-3-572-1490.

Received 17 May 2002; received in revised form 30 July 2002; accepted 2 August 2002

Abstract

There exist several obfuscation tools for preventing Java bytecode from being decompiled. Most of these tools simply scramble the names of the identifiers stored in a bytecode file by substituting them with meaningless names. However, this scrambling technique cannot deter a determined cracker for long. We propose several advanced obfuscation techniques that make Java bytecode impossible to recompile or make the decompiled program difficult to understand and to recompile. The crux of our approach is to overuse an identifier. That is, an identifier can denote several entities, such as types, fields, and methods, simultaneously. An additional benefit is that the size of the bytecode is reduced because fewer and shorter identifier names are used. We also propose several techniques to intentionally introduce syntactic and semantic errors into the decompiled program while preserving the original behavior of the bytecode. Thus, the decompiled program would have to be debugged manually. Although our basic approach is to scramble the identifiers in Java bytecode, the scrambled bytecode produced with our techniques is much harder to crack than that produced with other identifier scrambling techniques. Furthermore, the run-time efficiency of the obfuscated bytecode is also improved because the bytecode becomes smaller after obfuscation.

© 2002 Elsevier Inc. All rights reserved.

Keywords: Program protection; Bytecode obfuscation; Java programming language

0164-1212/$ - see front matter © 2002 Elsevier Inc. All rights reserved.
doi:10.1016/S0164-1212(02)00066-3

1. Introduction

The Java programming language has become more and more popular since its first release in 1994 (Gosling et al., 2000). One of the major benefits of Java is portability: the compiled program can run on most platforms. A Java program is compiled to platform-independent bytecode. To achieve platform independence, Java uses symbolic references instead of traditional memory addresses to link entities from different libraries (including the standard and proprietary libraries). Therefore, the names of types, fields, and methods are stored in a constant pool within a bytecode file (Engel, 1999; Lindholm and Yellin, 1999; Meyer and Downing, 1997; Venners, 1998). These names and the simple stack-machine instructions facilitate the decompilation of the bytecode file.

There are many free or commercial Java decompilers (D & C, 2001; Hoeniche, 2001; Kouznetsov, 2001; Kumar, 2001; Mayon, 2001; PsychoticSoftware, 2001; Vliet, 1996). The decompiled program is almost identical to the original source program. These decompilers have become a lethal weapon for intellectual property piracy.

Traditionally, a program is compiled to native code (or machine code). Most of the symbolic information is stripped off when the program is compiled: the identifiers that denote variables and functions in the source program become addresses in the compiled program. Decompiling such a program, though difficult, is still possible. Because no method can absolutely protect a program from decompilation attacks by experienced crackers, we usually consider a protection method successful if it makes cracking costly in terms of time and effort. Cracking becomes worthless when its cost exceeds that of rewriting the program. Therefore, one of the basic rules is to prevent decompilation from being done automatically with tools (i.e., decompilers).

Obfuscation tools are one of the major defenses against decompilers. Obfuscation transforms clear bytecode into more obscure bytecode. The goal of obfuscation is to make the decompiled program much harder to understand so that a cracker has to spend more time and effort on the obfuscated bytecode. Most of the existing obfuscation tools simply scramble the symbolic information (identifiers) in the constant pool (Dr. Java, 2001; Eastridge, 2000; Hoeniche, 2001; Plumb, 2001; Retrologic, 2000). Usually, a meaningful name is replaced by a meaningless one.

In this paper, we propose a new obfuscation approach that achieves better identifier scrambling. Based on this approach, several techniques are introduced to make the bytecode much harder to understand and, sometimes, to make the decompiled program impossible to recompile. The basic approach is to endow an identifier with as much information as possible. In the obfuscated bytecode, an identifier can denote several types, several fields, and several methods at the same time. The cracker is confused because the entity an identifier denotes is determined not only by its name but also by the context in which it appears. An additional benefit is that the size of the bytecode is reduced because long, meaningful names are replaced by shorter, meaningless names. We also propose several techniques to purposely introduce certain hidden compilation errors into the obfuscated bytecode so that the decompiled program cannot be compiled again. Therefore, a cracker has to spend a lot of time debugging the decompiled program manually. The basic approach and these techniques make a Java bytecode file harder to crack. Furthermore, the run-time efficiency of an obfuscated program is also improved.

2. Obfuscation scope

In Java, an application consists of one or more packages. A programmer may divide his own application into packages. He may also use packages from the standard library and proprietary libraries. Usually, only the part of an application that is developed by the programmer is distributed; the proprietary libraries are not distributed because of copyright restrictions.

The part of a program that will be obfuscated by the obfuscation techniques is called the obfuscation scope. Generally, only the programmer-developed part of an application is protected; the packages that serve as utilities in the standard and proprietary libraries are not obfuscated. However, the obfuscation scope is not necessarily limited to the packages written by the programmer. When an application is not big enough to confuse the cracker, the standard and proprietary libraries could be included in the obfuscation scope. However, redistributing obfuscated proprietary libraries may violate the copyright.

3. The candidates for identifier scrambling

According to the Java specification (Gosling et al., 2000), an identifier in a Java program may denote

• a package
• a top-level type (either a class or an interface)
• a nested type (either a class or an interface)
• a field
• a method
• a parameter (of a method, a constructor, or an exception handler)
• a local variable

However, not all of them are kept in the bytecode file after compilation. Only the identifiers that denote the first five items in the above list are stored in the bytecode. By default, the names of parameters and local variables are stripped from the bytecode, and these variables become indices into the local variable array of the corresponding stack frame (see Section 3.6 of Lindholm and Yellin (1999) and Section 3.7 of Engel (1999)). If the debug-info option of the compiler is enabled, the names of parameters and local variables are stored in the LocalVariableTable in the bytecode. The LocalVariableTable can be removed by disabling this option (which is the default setting of the Java compiler). If the LocalVariableTable is not available, Java decompilers usually generate names sequentially for parameters and local variables. Though it is possible to rename the variables in the LocalVariableTable to make decompilation more difficult, a smarter decompiler may simply ignore these modified names and generate new variable names instead. Since we cannot prevent decompilers from generating names for parameters and local variables, names in the LocalVariableTable are not candidates for obfuscation. The candidates for obfuscation are the first five items.

On the other hand, not all of the candidates can be obfuscated. When an application runs, the Java virtual machine (JVM) dynamically loads and links the referenced types into the runtime environment. The bytecode file that stores a referenced type is located by a symbolic reference, namely the fully qualified name of a class or an interface. These symbolic references cannot be changed. Hence, only the candidates that reference entities in the obfuscation scope will be obfuscated. The candidates that reference entities outside the obfuscation scope (which generally denote entities in the standard library or the proprietary libraries) should not be obfuscated.

The identifiers that denote entities in the obfuscation scope need further investigation. The following four groups of identifiers should not be obfuscated (these groups are called the Exception groups):

Exception group 1: The instance method that implements an abstract method of a superclass (or a superinterface) that is outside the obfuscation scope.
Exception group 2: The instance method that overrides an inherited method of a superclass that is outside the obfuscation scope.
Exception group 3: The entities that are explicitly des…

… actual object that contains the callback method is passed as a parameter and a callback method is invoked through the polymorphism mechanism. Based on this assumption, all the callback methods whose names should be retained will belong to Exception group 1 or 2.

Fields, static methods, and nested types are statically resolved by the Java compiler. Once the bytecode is generated, the JVM will not change the resolution.
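The central idea, overusing one identifier for several entities, has a source-level analogue that is easy to try. The sketch below is hypothetical: the class `Overuse` and the members named `a` are invented for illustration and are not taken from the paper. The paper works on bytecode, and the JVM permits even more overloading than Java source does; for example, two fields in one class file may share a name as long as their descriptors differ. The sketch relies only on Java keeping types, fields, and methods in separate namespaces, so the compiler decides what `a` means from the context in which it appears.

```java
// Hypothetical illustration: the single identifier "a" names a nested class,
// a static field, and a static method of the same enclosing class.
public class Overuse {

    // "a" as a type (nested class)
    static class a {
        int value() { return 1; }
    }

    // "a" as a field
    static int a = 2;

    // "a" as a method; inside the body, the simple name "a" denotes the field
    static int a(int x) { return x + a; }

    static int demo() {
        a obj = new a();               // type context: "a" is the nested class
        return obj.value() + a + a(3); // 1 + 2 + (3 + 2)
    }

    public static void main(String[] args) {
        System.out.println(demo());    // prints 8
    }
}
```

The same mechanism hints at why such code can resist recompilation. Suppose the nested class `a` also declared a static method `s()`, and a decompiler emitted the call as `a.s()`. When a simple name could denote either a variable or a type, Java prefers the variable (JLS §6.4.2, obscuring), so javac would read `a` as the `int` field and reject the call. The bytecode, which names the nested class explicitly in its method reference, would still run. This is only an illustration. The paper's own error-introducing techniques are not shown in this passage.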
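The claim that fewer and shorter names shrink the bytecode can be checked with simple arithmetic. Each name in the constant pool is stored as a CONSTANT_Utf8 entry: a 1-byte tag, a 2-byte length, and the encoded characters. The sketch below is a rough model under stated assumptions. It counts only these name entries and assumes each distinct string is stored once. It ignores descriptors, package prefixes, and every other pool entry. The class `PoolSize` and the sample identifiers are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Rough model, not a class-file writer: only CONSTANT_Utf8 entries holding
// simple names are counted; all other constant-pool entries are ignored.
public class PoolSize {

    // One CONSTANT_Utf8_info entry = u1 tag + u2 length + encoded bytes.
    // For ASCII identifiers, modified UTF-8 coincides with standard UTF-8.
    static int utf8EntrySize(String s) {
        return 1 + 2 + s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Assumption: identical strings share a single pool entry.
    static int poolBytes(List<String> names) {
        Set<String> distinct = new LinkedHashSet<>(names);
        int total = 0;
        for (String n : distinct) {
            total += utf8EntrySize(n);
        }
        return total;
    }

    // Apply a renaming map; names absent from the map are kept unchanged.
    static List<String> rename(List<String> names, Map<String, String> map) {
        return names.stream()
                .map(n -> map.getOrDefault(n, n))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical names of a class, a field, and a method
        List<String> original = List.of("Invoice", "customerList", "computeTotal");
        // Overuse: all three entities receive the same identifier "a"
        Map<String, String> map = Map.of(
                "Invoice", "a", "customerList", "a", "computeTotal", "a");
        List<String> renamed = rename(original, map);
        System.out.println(poolBytes(original) + " -> " + poolBytes(renamed)); // 40 -> 4
    }
}
```

Conventional scrambling that gives each entity a distinct short name would still need three entries (12 bytes here). Overuse collapses them into one, which is the "fewer" half of the abstract's "fewer and shorter".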
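Exception groups 1 and 2 can be approximated mechanically. An instance method in scope must keep its name if some supertype outside the scope declares a non-private, non-static method with the same name and parameter types. The sketch below uses Java reflection instead of the bytecode analysis an obfuscator would perform. It defines scope by a hypothetical class-name prefix, and the helper names (`namesToKeep`, `inScope`, `Task`) are invented. It deliberately ignores generic bridge methods and package-private override rules, so it is a simplification rather than a complete rule.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Simplified sketch via reflection; ignores bridge methods and
// package-private override subtleties.
public class ExceptionGroups {

    // Hypothetical scope test: a type is in scope if its binary name
    // starts with the given prefix.
    static boolean inScope(Class<?> c, String scopePrefix) {
        return c.getName().startsWith(scopePrefix);
    }

    // All superclasses and superinterfaces of c, transitively.
    static List<Class<?>> supertypes(Class<?> c) {
        List<Class<?>> result = new ArrayList<>();
        Deque<Class<?>> work = new ArrayDeque<>();
        work.push(c);
        while (!work.isEmpty()) {
            Class<?> cur = work.pop();
            List<Class<?>> direct = new ArrayList<>();
            if (cur.getSuperclass() != null) {
                direct.add(cur.getSuperclass());
            }
            for (Class<?> i : cur.getInterfaces()) {
                direct.add(i);
            }
            for (Class<?> s : direct) {
                if (!result.contains(s)) {
                    result.add(s);
                    work.push(s);
                }
            }
        }
        return result;
    }

    // True if type declares an overridable method matching m's name and parameters.
    static boolean declaresOverridable(Class<?> type, Method m) {
        try {
            Method sm = type.getDeclaredMethod(m.getName(), m.getParameterTypes());
            int mod = sm.getModifiers();
            return !Modifier.isPrivate(mod) && !Modifier.isStatic(mod);
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Names of instance methods of c that fall into Exception group 1 or 2.
    static Set<String> namesToKeep(Class<?> c, String scopePrefix) {
        Set<String> keep = new TreeSet<>();
        List<Class<?>> supers = supertypes(c);
        for (Method m : c.getDeclaredMethods()) {
            int mod = m.getModifiers();
            if (Modifier.isStatic(mod) || Modifier.isPrivate(mod) || m.isSynthetic()) {
                continue;
            }
            for (Class<?> s : supers) {
                if (!inScope(s, scopePrefix) && declaresOverridable(s, m)) {
                    keep.add(m.getName());
                    break;
                }
            }
        }
        return keep;
    }

    // Example class inside the hypothetical obfuscation scope.
    static class Task implements Runnable {
        public void run() { }                                   // group 1: implements Runnable.run
        @Override public String toString() { return "task"; }  // group 2: overrides Object.toString
        void helper() { }                                       // may be renamed
    }

    public static void main(String[] args) {
        System.out.println(namesToKeep(Task.class, "ExceptionGroups")); // [run, toString]
    }
}
```

Here `java.lang.Runnable` and `java.lang.Object` fall outside the prefix and play the role of library types. `helper` is the only method the sketch would leave free for renaming.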
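The final remark, that fields and static methods are resolved at compile time while instance methods are dispatched at run time, can be observed directly. In the hypothetical classes below (`StaticResolution`, `Base`, `Derived` are invented for illustration), the field access and the static call are bound to the declared type `Base`, whereas the instance call follows the object's actual class. One plausible reading, not spelled out in this passage, is that statically resolved names can be renamed consistently wherever they are referenced. A dynamically dispatched name, by contrast, must match the out-of-scope method it overrides, which is what Exception groups 1 and 2 protect.

```java
// Hypothetical classes contrasting compile-time binding of fields and
// static methods with run-time dispatch of instance methods.
public class StaticResolution {

    static class Base {
        String name = "base";
        static String id() { return "Base.id"; }
        String describe() { return "Base.describe"; }
    }

    static class Derived extends Base {
        String name = "derived";                                   // hides Base.name
        static String id() { return "Derived.id"; }                // hides Base.id
        @Override String describe() { return "Derived.describe"; } // overrides
    }

    static String demo() {
        Base b = new Derived();
        // b.name and b.id() are bound to Base, the static type of b;
        // b.describe() is selected at run time from the object's class.
        return b.name + "," + b.id() + "," + b.describe();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // base,Base.id,Derived.describe
    }
}
```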
