Decompiler: Reversing Compilation

Written By, Tushar B Kute, Lecturer in Information Technology (K. K. Wagh Polytechnic, Nashik – 3, Maharashtra, INDIA) Email – [email protected]


Abstract

This paper presents an overview of the decompiler, a program that performs the exact reverse of compilation, i.e. it creates high level language code from machine/assembly code. A decompiler comes into the picture on the many occasions when a user needs source code back from an executable. Decompilation is mainly a reverse engineering activity, which can be used for positive as well as negative purposes depending on the application the user puts it to. Hence we also need ways to protect our code from decompilers to avoid misuse. In industry, decompilation is being taken as seriously as any other discipline as its usefulness is discovered. In this paper I have a foot in two different camps: as a programmer I am interested in understanding how others achieve interesting effects, but from a business point of view I am not keen on someone reusing my code and selling it on to third parties as their own. The paper gives an overview of how decompilers work and where they are useful.

Introduction

A decompiler is a program that reads a program written in machine language and translates it into an equivalent program in a high level language (HLL). The process it performs, decompilation, is the reverse of compilation, i.e. creating high level language code from machine/assembly language code. At the most basic level, it only requires understanding the machine/assembly code and rewriting it in a high level language.

Machine language program -> Decompiler -> High level language program

Figure 1 Process of Decompilation

So, using a decompiler we can get source code back from executables. Although translations aim at preserving the extensional semantics of a program, it is usually not possible to retain all information across a translation. Compilation is a form of synthesis in which a program in a high-level language is transformed into machine language.

Need of Decompiler

Computer languages such as FORTRAN, COBOL, and now Java were developed to allow us to put our ideas in a human friendly format that can then be converted into a format that a computer chip can understand. There are various situations in which we need the source code back from the computer-understandable form, i.e. the executable (exe). These situations are discussed in detail in the applications part of this paper; in each of them a decompiler helps by giving us source code back from executables. Some of the situations where a decompiler helps us are:

• To recover lost source code
• Migration of applications to new hardware

and many other applications.

Some available Decompilers

The decompilers available today include the following:

- DisC for Turbo C 2.0/2.01. The source is free to download. The author is Satish Kumar. http://www.debugmode.com/dcompile/disc.htm
- JosephCo: VB 5 beta decompiler.
- Hans-Peter Diettrich (DoDi) VB3 Decompiler (made possible because Microsoft left much of the source code in the exe generated by VB3).
- Jad (Jad - the fast JAva Decompiler). Jad is a decompiler that is free for non commercial use.
- Salamander, a commercial decompiler for .NET. The first release was on 1st Feb 2002. There are four examples online. http://www.remotesoft.com/sal

Application-specific Decompilers

There are a number of applications that generate an intermediate (low level) code, which is interpreted by a virtual machine. With some applications this low level code is put inside an executable (e.g. Java bytecodes). Writing decompilers for such intermediate low level code is often much easier, because the intermediate code is the assembly language of the virtual machine. Examples are:

• MultiRipper: a Windows and Delphi/C++ Builder ripper by Baccan Matteo and Peruch Emiliano. A "ripper" program extracts files inside other files. MultiRipper extracts files from Windows and Delphi/C++ Builder applications. For Windows applications it extracts Windows resources and saves them onto disk, and for Delphi/C++ Builder applications it recovers the Delphi project and code. MultiRipper is not a decompiler, as it does not recover the source code of a Delphi application; however, the authors are working on this for a future release (current release is 2.6).
• SourceAgain: a Java decompiler by Ahpah. SourceAgain correctly recovers Java control structures and optimizations from the bytecode. Further, it supports irreducible graphs, polymorphic type inference, recognition of packages, and more, and provides debugging support. PC and Unix versions of SourceAgain are available with prices ranging between US$99 and US$299.
• ReFox 7.0: a decompiler by Xitech for restoring source code from FoxBASE+, FoxPro 1.x, FoxPro 2.x and Visual FoxPro executables or compiled modules. (The information given in the FoxPro programming FAQ is outdated.)
• Alexander Lobanov's FoxPro decompilers for FoxPro 2.0 and 2.5/2.6. Demo versions of the software are available; these programs are distributed as shareware.
• The Visual Basic 4 decompiler: it can decompile files generated by Visual Basic; accessible from the DoDi VB tools page.
• The Decaf decompiler for Java .class files: written in Ada95. Decompilers to Ada95 and Smalltalk are being worked on.
• The Mocha decompiler for Java .class files: we can use Crema to scramble symbolic information in the .class files.
• Sculptor and Realizer: these two decompilers were written in Spain. The first is a decompiler written in 1989 for a 4th generation language called Sculptor (also referred to as Sage and Sagerep), used for the development of information systems (it is similar to a context-free RPG II). This is a true case in which the development of a decompiler was required due to a huge loss of the originals and backups of an important program. The company for which it was developed no longer exists. The second decompiler is for Computer Associates Realizer 2.0, which is the Visual Basic of Computer Associates (considering Visual Objects is their xBase product). Realizer is very close in everything to VB 3.0 and surpasses it in included components, being overall superior, but it did not gain as much popularity. It includes a reporter, screen painter, configuration control and version tools (these tools were written in Realizer itself) and features custom controls, databases, ODBC, etc. This language is scarcely used now.
• Besides these, an attempt was made to develop a decompiler for COBOL as a hobby project, but it was abandoned due to a lack of knowledge of COBOL.

History of Decompilers

Decompilers are not new; they have existed for a long time in various forms. IBM has played an important role in the development of decompilers.

First Decompiler

Decompilers have been written for a variety of applications since the development of the first compilers. The very first decompiler was written by Joel Donnelly in 1960 at the Naval Electronic Labs to decompile machine code to Neliac on a Remington Rand Univac M-460 Countess computer. Hence the D-Neliac decompiler, developed in 1960, was the first decompiler.

Uses from Last Decades

Throughout the last decades, different uses have been given to decompilers. In the 1960s, decompilers were used to aid in the program conversion process from second to third generation computers; in this way, manpower would not be spent in the time-consuming task of rewriting programs for the third generation machines. During the 70s and 80s, decompilers were used for the portability of programs, documentation, debugging, re-creation of lost source code, and the modification of existing binaries. In the 90s, decompilers have become a reverse engineering tool capable of helping the user with such tasks as checking software for the existence of malicious code, checking that a compiler generates the right code, translation of binary programs from one machine to another, and understanding of the implementation of a particular library function.

The ethics of Decompilation

Is decompilation legal, and is it allowed? There are many situations in which decompilation can be used.

• To recover lost source code. We may have written a program for which we only have the executable now (or we got the exe of a program we wrote long back, from someone else!). If we want to have the source for such a program, we can use decompilation to recover it. In all rights, we are the owner of the program, so nobody is going to question us.
• Just as stated above, applications written long back for a legacy computer may not have the source code now, and we may need to port them to a new platform. Either we have to rewrite the application from scratch, or use decompilation to understand the working of the application and write it again.
• For example, we may have code written in some language for which we can't find a compiler today! If we have the executable, we can just decompile it and rewrite the logic in the language of our choice today.
• To discover the internals of someone else's program (like what algorithm they have used...).
• Usually all software is copyrighted by its authors. This means that copying the protected parts of a program into another program is prohibited. Hence if we are using decompilation to discover the internals of a program and if that particular part is breaching the copyright of the owner, we are liable to legal action. However, there are some permitted uses of decompilation, like the first three cases stated above. Also, decompilation of parts of software which do not come under the copyright laws (e.g. algorithms) is permitted.
• For all practical purposes, decompiling programs which were created by us can't be questioned! After all, we are the owner of all rights to the program. But we should be careful if we are trying it out on someone else's programs.

How is decompilation possible?

Let's take a look at a normal C compiler. When a C program is compiled, the first stage of the compiler generates a very elementary assembly language output (or something nearly equivalent to it), which is nothing but a line-by-line translation of the C source code. If any optimizations are requested, the next stages perform code optimization to replace redundant instructions and improve the overall efficiency of the output program. This output is then linked with the required libraries for any library function calls, and saved in the executable format of the platform.

Sample C code:

    void main()
    {
        int i, j, k;
        i = 10;
        j = 20;
        k = i*j + 5;
    }

Possible output of a compiler (without optimizations):

    ;---- i = 10;
    mov [bp+2], 10
    ;---- j = 20;
    mov [bp+4], 20
    ;---- k = i*j + 5;
    mov ax, [bp+2]
    mov bx, [bp+4]
    mul bx
    add ax, 5
    mov [bp+6], ax

If no optimizations are performed while the code is generated, it is very easy to understand what the output code does, and equivalent C code can be written or generated automatically. Note that we can only generate "an equivalent C code", and not the "same C code" which was compiled to get this executable. In other words, it is generally impossible to get the exact source code back, but we can generate an equivalent program which will function in the same way. However, things are a bit different if any compiler optimizations are used while building the original executable. It becomes more difficult to understand the flow of the program, and the more rigorous the optimizations are, the worse our chances of figuring out exactly what the code is doing.

Sample C code:

    void main()
    {
        int i, j, k;
        i = 10;
        j = 20;
        k = i*j + 5;
    }

Possible output of a compiler (using register variables):

    ;---- i = 10;
    mov si, 10
    ;---- j = 20;
    mov di, 20
    ;---- k = i*j + 5;
    mov ax, si
    mov bx, di
    mul bx
    add ax, 5
    mov [bp+6], ax

Possible output of a compiler (using code optimization):

    ;---- k = i*j + 5;
    mov ax, 10
    mov bx, 20
    mul bx
    add ax, 5
    mov [bp+6], ax
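To make the idea of regenerating "an equivalent C code" more concrete, the following is a minimal sketch in Java that lifts the unoptimized listing shown earlier back into C-like statements. It is only an illustration under strong assumptions: the class name TinyLifter, the invented variable names (var2, var4, ...) and the tiny supported instruction set (mov, mul, add with a register destination) are all hypothetical, and no real decompiler is this simple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of any real decompiler.
class TinyLifter {
    // Turn a stack slot such as "[bp+2]" into an invented variable name "var2".
    static String name(String slot) {
        return "var" + slot.substring(slot.indexOf('+') + 1, slot.length() - 1);
    }

    // Resolve an operand: a stack slot, a register holding a known expression, or a literal.
    static String value(String operand, Map<String, String> regs) {
        if (operand.startsWith("[")) return name(operand);
        return regs.getOrDefault(operand, operand);
    }

    // Track register contents symbolically and emit one statement per store to memory.
    static List<String> lift(List<String> asm) {
        Map<String, String> regs = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String line : asm) {
            String s = line.trim();
            if (s.isEmpty() || s.startsWith(";")) continue; // skip blank lines and comments
            String[] p = s.split("[\\s,]+");
            switch (p[0]) {
                case "mov":
                    if (p[1].startsWith("[")) {
                        out.add(name(p[1]) + " = " + value(p[2], regs) + ";");
                    } else {
                        regs.put(p[1], value(p[2], regs));
                    }
                    break;
                case "mul": // ax = ax * operand (16-bit form, high word ignored here)
                    regs.put("ax", "(" + value("ax", regs) + " * " + value(p[1], regs) + ")");
                    break;
                case "add": // only register destinations are handled in this sketch
                    regs.put(p[1], "(" + value(p[1], regs) + " + " + value(p[2], regs) + ")");
                    break;
                default:
                    throw new IllegalArgumentException("unsupported instruction: " + s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> asm = Arrays.asList(
            "mov [bp+2], 10", "mov [bp+4], 20",
            "mov ax, [bp+2]", "mov bx, [bp+4]", "mul bx", "add ax, 5", "mov [bp+6], ax");
        lift(asm).forEach(System.out::println);
    }
}
```

For the unoptimized listing this prints three statements: var2 = 10; var4 = 20; and var6 = ((var2 * var4) + 5);. The result is equivalent to the original program but not the same text: the names i, j and k are gone, which is exactly the limitation described above.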

One more complicating factor is that each compiler generates code in its own way. There are very few points where all compilers generate similar code. This means that we need to tailor the decompilation procedure for each compiler, so that if we know which compiler was used to generate the executable, we have a better chance of understanding the code. For example, the simple "if" statement can be compiled in many ways, like the ones given below.

Sample "if" statement:

    if (i>20)
    {
        j=30;
    }
    else
    {
        j=40;
    }
    j++;

Possible output #1:

    cmp [bp+2], 20
    jle lab1
    mov [bp+4], 30
    jmp lab2
    lab1:
    mov [bp+4], 40
    lab2:
    inc [bp+4]

Possible output #2:

    cmp [bp+2], 20
    jg lab1
    jmp lab2
    lab1:
    mov [bp+4], 30
    jmp lab3
    lab2:
    mov [bp+4], 40
    lab3:
    inc [bp+4]

Some other factors which hinder the decompilation process are:

1) Self modifying code. In code written for critical applications like games, where every machine cycle counts for performance, self-modifying code is sometimes used. For example, if the same condition is used at many places in a critical loop, it is evaluated once and the code after that is modified to branch to the same location without evaluating the condition again. But with today's processor speeds, this technique is fast becoming antique!

2) User-defined data types. User defined data types, like "struct"s, "typedef"s, "union"s and bit-fields, add to the confusion of the decompiler. Though it is easy to use structures while writing the code, it is almost impossible to figure out, by looking at the compiled output, whether a variable is part of a structure or a basic type on its own.

3) Use of processor-specific instructions/optimizations.

In spite of all these reasons, it is still possible to decompile a binary executable (though not 100% automated, and not 100% accurate!).

How to decompile?

A simple approach to decompiling a binary executable is to first parse it and separate it into functions (C style). Once we know where the entry point of the program is ("main" for a C program), we can start decompiling that function, and any other function it calls. This way, we can focus carefully on one function at a time.

To use this approach, the first thing that should be known is the entry point of the program. For a normal C program, it is the "main" function, and for a Win32 program, it is the "WinMain" function. But to find out where these functions begin, we must analyze the executable and figure out which compiler was used, because each compiler adds its own entry/exit code to the program, which we need not decompile. If the compiler used is known, we can trace where the "main" function starts, and from there on, until we reach a "return" instruction, we can separate out the function. There are a number of factors on which the working of a decompiler depends. Decompiler writing is based on compiler writing: basic decompiler techniques are used to decompile binary programs from a wide variety of machine languages to a variety of high level languages. The structure of a decompiler is based on the structure of a compiler, and similar principles and techniques are used to perform the analysis of programs.
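The function-at-a-time approach described above can be sketched roughly as follows. This is a simplified illustration, not a real disassembler: the instruction format (one text instruction per array index, with "call N" naming the index of the callee) and the class name FunctionSplitter are assumptions made for the example, and a function is assumed to end at its first "ret", as in the description above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: split a flat instruction list into functions,
// starting from a known entry point and following "call" targets.
class FunctionSplitter {
    static Map<Integer, List<String>> split(String[] code, int entry) {
        Map<Integer, List<String>> functions = new LinkedHashMap<>();
        Deque<Integer> pending = new ArrayDeque<>();
        pending.add(entry);
        while (!pending.isEmpty()) {
            int start = pending.poll();
            if (functions.containsKey(start)) continue; // already separated out
            List<String> body = new ArrayList<>();
            for (int pc = start; pc < code.length; pc++) {
                String ins = code[pc];
                body.add(ins);
                if (ins.startsWith("call ")) {
                    // Remember the callee so it is processed as its own function later.
                    pending.add(Integer.parseInt(ins.substring(5).trim()));
                }
                if (ins.equals("ret")) break; // simplification: first return ends the function
            }
            functions.put(start, body);
        }
        return functions;
    }

    public static void main(String[] args) {
        String[] code = { "mov ax, 1", "call 3", "ret", "add ax, 2", "ret" };
        split(code, 0).forEach((start, body) -> System.out.println(start + ": " + body));
    }
}
```

Real executables need much more care (several return instructions per function, indirect calls, compiler start-up code), which is why knowing the compiler matters so much in practice.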

Phases of Decompiler:

A decompiler works through a number of phases, similar to the phases of a compiler.

Binary Program -> Syntax Analyzer -> Semantic Analyzer -> Intermediate Code Generator -> Control flow graph Generator -> Data flow Analyzer -> Control flow analyzer -> Code Generator -> HLL Program

Figure 2 Phases of decompiler
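As one illustration of what the control flow graph generator phase has to do, the sketch below splits an instruction list into basic blocks, the nodes of a control flow graph. It uses the common textbook rule that a new block starts at every label and after every jump. The class name BasicBlocks and the text-based instruction format are assumptions for the example, and the input is "Possible output #1" of the "if" statement shown earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of basic-block splitting for a textual instruction list.
class BasicBlocks {
    static List<List<String>> split(List<String> code) {
        List<List<String>> blocks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String ins : code) {
            boolean isLabel = ins.endsWith(":");
            // A label can be a jump target, so it starts a new block.
            if (isLabel && !current.isEmpty()) {
                blocks.add(current);
                current = new ArrayList<>();
            }
            current.add(ins);
            // Any jump (jmp, jle, jg, ...) ends the current block.
            if (!isLabel && ins.startsWith("j")) {
                blocks.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) blocks.add(current);
        return blocks;
    }

    public static void main(String[] args) {
        List<String> code = Arrays.asList(
            "cmp [bp+2], 20", "jle lab1",
            "mov [bp+4], 30", "jmp lab2",
            "lab1:", "mov [bp+4], 40",
            "lab2:", "inc [bp+4]");
        split(code).forEach(System.out::println);
    }
}
```

The four blocks produced here correspond to the condition, the "then" part, the "else" part and the statement after the "if". Recognising that diamond shape in the graph is what later allows the control flow analyzer to emit an if/else instead of a tangle of gotos.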

Conceptually, a decompiler is structured in a similar way to a compiler, as a series of phases that transform the source machine program from one representation to another. The typical phases of a decompiler are shown above. These phases represent the logical organization of a decompiler. In practice some of the phases will be grouped together. A point to note is that there is no lexical analysis or scanning phase in the decompiler. This is due to the simplicity of machine languages: all tokens are represented by bytes or bits of a byte. Given a byte in isolation, however, it is not possible to determine whether that byte forms the start of a new token or not. For example, the byte 50 (hexadecimal) could represent the op-code for a "push ax" instruction, an immediate constant, or an offset to a data location.
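The ambiguity of a single byte can be shown with a small experiment. The sketch below decodes the same four bytes twice, starting at two different offsets. It understands only two 16-bit x86 op-codes (0x50 for "push ax" and 0xB8 for "mov ax, imm16") and prints everything else as raw data. The class name ByteAmbiguity and the "db" fallback are assumptions for illustration, not a real disassembler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-opcode decoder used only to show that the meaning of a
// byte depends on where decoding starts.
class ByteAmbiguity {
    static List<String> decode(int[] code, int start) {
        List<String> out = new ArrayList<>();
        int pc = start;
        while (pc < code.length) {
            int op = code[pc];
            if (op == 0x50) {
                out.add("push ax");
                pc += 1;
            } else if (op == 0xB8 && pc + 2 < code.length) {
                // The 16-bit immediate follows the op-code in little-endian order.
                int imm = code[pc + 1] | (code[pc + 2] << 8);
                out.add(String.format("mov ax, 0x%04X", imm));
                pc += 3;
            } else {
                out.add(String.format("db 0x%02X", op)); // not understood: treat as data
                pc += 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] bytes = { 0xB8, 0x50, 0x00, 0x50 };
        System.out.println(decode(bytes, 0)); // first 0x50 is part of an immediate constant
        System.out.println(decode(bytes, 1)); // the same 0x50 is now read as an op-code
    }
}
```

Decoding from offset 0 yields "mov ax, 0x0050" followed by "push ax", while decoding from offset 1 treats the first 0x50 as an instruction in its own right. This is why separating code from data is listed among the hard problems of decompiler writing.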

Problems in Writing a Decompiler

A decompiler writer has to face several theoretical and practical problems when writing a decompiler. Some of these problems can be solved by the use of heuristic methods; others cannot be solved completely.

• Data-type information is hard to recover.
• Code added by the compiler and routines added by the linker are difficult to recognise and separate out, which is a major problem.
• Control flow constructs are hard to recover.
• Separation of code and data in the binary form is one of the major problems faced while writing decompilers.

In order to achieve a greater percentage of the disassembly automatically, decompilers can make use of knowledge about the particular compilers and libraries used in the compilation of the file to be decompiled.

Case Study: Java Decompiler

The Java compiler compiles Java source code files (*.java) into binary files (*.class). A Java decompiler is used to convert Java class files (*.class) back into source code files (*.java).

.class file -> Java Decompiler -> .java file

Figure 3 Working of Java decompiler

How Does a Java Program Run?

In order to run a Java program we have to perform a two-stage process: compile and interpret. The programmer compiles the Java file into a class file, or bytecode, typically using the Java Development Kit (JDK) or some variant. These bytecodes are a series of assembler-like statements that can be interpreted by any Java Virtual Machine (JVM). A class-file is only partially compiled into bytecodes by 'javac', the Java compiler. It is then interpreted and executed by a JVM, usually on a completely different machine or operating system. The JVM's class-file interface is strictly defined by the Java Virtual Machine Specification. It may help to think of class files as being analogous to object files in other languages such as C or C++, waiting to be linked and executed by the JVM, only with a lot more symbolic information.

There are many good reasons why a class-file carries around so much information. For many, the Internet is seen as a bit of a modern day Wild West where crooks and criminals are plotting to infect your hard disk with a virus or waiting to grab any credit card details that might pass their way. As a result the JVM was designed from the bottom up to protect Web browsers from any rogue applets. Through a series of checks the Java Virtual Machine and the class loader make sure that no malicious code can be loaded from a web page. Hence Java's compiled classes contain a lot of symbolic information for the JVM, such as variable and method names, that would not otherwise be available. These names can be almost as helpful as comments when trying to understand decompiled source code.

Why Is Decompiling Java Easy?

In practice, Java decompilers give good results when recovering source code from a .class file. Some of the reasons for this are as follows:

• The information content of a .class file is quite high.
• For portability, Java code is partially compiled and then interpreted by the JVM.
• The JVM designers opted for a simple stack machine.
• Developers are given little security to prevent decompiling.
• There are very few instructions or op-codes in the JVM.
• Because of backward compatibility issues the JVM's design is not likely to change.
• Java applets are typically downloaded for free.

One important point is that Sun Microsystems even kick-started the decompiler industry by including a primitive debug tool, reminiscent of DOS’s Debug, in the JDK.

How Is Decompiling Done for Java?

There are different ways to decompile a program, according to the techniques used by the decompiler, and the quality of the result rises or falls with them. Normally a decompiler does not give 100% of the code back from an executable; source code retrieval is a mixture of code produced by the decompiler and code recovered manually by the programmer. In Java the percentage of code retrieved by the decompiler is normally quite high. Sometimes we do not even need a decompiler to get source code back from a .class file: armed with just a hexadecimal editor and an understanding of the class file structure, we can recover it. As we know, a class-file carries around a lot of information. All checks on applets have to be performed lightning quick to cut down on the download time, so it is not really surprising that the original JVM designers opted for a simple stack machine with lots of information available for those crucial security checks, and this is what makes so much information available in the .class file. Before starting with our example, we will first see what is inside a .class file. We can break the class file into the following constituents:

• Magic number
• Minor and Major version numbers
• Constant Pool Count
• Constant Pool
• Access Flags
• this class
• super class
• Interfaces Count
• Interfaces
• Fields Count
• Fields
• Methods Count
• Methods
• Attributes Count
• Attributes

These elements are all tied together into a single class-file. Using this information we will now try to retrieve some code from a .class file. The figure below shows a hexadecimal view of one class file, which we can easily obtain by opening the .class file in any hexadecimal editor. It contains all the necessary information. For demonstration purposes only a partial hexadecimal view of the .class file is shown.

CAFEBABE0003002D003908001A08002708002B0700260700320700330 700340700350700360A000500120A000700130A000700140A000600150 A000800160A000800170A000400180A000700190C0022001D0C002200 200C002D001F0C002E00210C002F001B0C0030001C0C0031001B0C00 38001B0100012101001428294C6A6176612F6C616E672F537472696E67 3B01001828294C6A6176612F6E65742F496E6574416464726573733B01 0003282956010016284C6A6176612F6177742F47726170686963733B295 601002C284C6A6176612F6C616E672F537472696E673B294C6A617661 2F6C616E672F537472696E674275666665723B010015284C6A6176612F 6C616E672F537472696E673B2956010017284C6A6176612F6C616E672

What Was Recovered?

As mentioned above we can recover a lot of information, but for demonstration purposes I will show only 3 constituents out of many:

• Magic number: It is pretty easy to find the magic and version numbers as they come at the start of the class-file. We should be able to make them out in the hexadecimal view above. The magic number is the first four bytes, i.e. 0xCAFEBABE in hex, and it just tells the JVM that it is receiving a class-file.
• Minor and major version numbers: The minor and major version numbers are the next four bytes, 0x0003 and 0x002D, i.e. minor version 3 and major version 45. These are used by the JVM to make sure that it recognizes and fully understands the format of the class file. Current JVMs will refuse to execute any class-file with a higher major or minor number than they support. The minor version is for small changes that require an updated JVM; the major number is for wholesale fundamental changes requiring a completely different and incompatible JVM, like one designed to stop decompiling.
• Constant pool count: All class or interface constants are stored in the constant pool. The constant pool count, taking up the next two bytes, tells us how many variable-length elements follow in the constant pool. 0x0039, or integer 57, is the number in our example above. The JVM specification tells us that constant_pool[0] is reserved by the JVM. In fact it does not even appear in the class-file, so the constant pool elements are stored in constant_pool[1] to constant_pool[56].

Here is some sample source code in Java, a parser for the constant pool count:

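    import java.io.IOException;
    import java.io.RandomAccessFile;

    class ClassParser {
        ClassParser(RandomAccessFile in, String s) throws IOException {
            // Skip the magic number (4 bytes) and the minor/major version numbers (2 bytes each).
            int filepointer = 8;
            in.seek(filepointer);
            // The constant pool count is an unsigned 16-bit value.
            System.out.println("Constant Pool Count: " + in.readUnsignedShort());
        }
    }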

In the above code, a RandomAccessFile is used instead of a DataInputStream so that we can skip over the magic number as well as the version numbers. The constant pool count is read next. The other constituents can be found in the same way, either manually or with a simple parser program, and from them the source code behind our .class file can be rebuilt.
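As a cross-check of the values read off the hexadecimal view, the following sketch parses the first ten bytes of that dump (CAFEBABE 0003 002D 0039) in memory. It is a minimal illustration: the class name ClassHeader and its fields are invented for this example, and it uses java.nio.ByteBuffer rather than a file so that it can be run without a .class file at hand.

```java
import java.nio.ByteBuffer;

// Hypothetical holder for the first few class-file fields.
class ClassHeader {
    final int minor;
    final int major;
    final int constantPoolCount;

    ClassHeader(int minor, int major, int constantPoolCount) {
        this.minor = minor;
        this.major = major;
        this.constantPoolCount = constantPoolCount;
    }

    // Convert a hex string such as "CAFEBABE" into raw bytes.
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    static ClassHeader parse(byte[] bytes) {
        // ByteBuffer is big-endian by default, which matches the class-file format.
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file: bad magic number");
        }
        int minor = buf.getShort() & 0xFFFF; // mask to read the value as unsigned
        int major = buf.getShort() & 0xFFFF;
        int count = buf.getShort() & 0xFFFF;
        return new ClassHeader(minor, major, count);
    }

    public static void main(String[] args) {
        ClassHeader h = parse(fromHex("CAFEBABE0003002D0039"));
        System.out.println("minor=" + h.minor + " major=" + h.major
            + " constant pool count=" + h.constantPoolCount);
    }
}
```

Run on those ten bytes, the sketch reports minor version 3, major version 45 and a constant pool count of 57, matching the values recovered by hand.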

How to verify Decompiler’s output?

In order to verify that the decompiler is regenerating the Java source code properly, use the following technique. Keep a copy of the original class file (here called myprogram_orig.class), then generate a new class file from the decompiled source code using the compiler:

javac myprogram.java

Now use the UNIX 'diff' command to compare the two class files:

diff myprogram.class myprogram_orig.class

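Both these files MUST BE IDENTICAL. This verifies that the decompiler is working perfectly. For the comparison to be meaningful, the same compiler version and options must be used as for the original build. On DOS or Windows 95 we may want to use the free Cygnus Cygwin 'diff' or the 'MKS' utilities.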
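Where 'diff' is not available, or where we want to know the position at which two class files start to differ, the same check can be sketched in Java. This is only an illustrative helper: the class name ClassFileCompare is invented, the file names are passed on the command line, and it relies on Arrays.mismatch, which is available from Java 9 onwards.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: byte-for-byte comparison of two class files.
class ClassFileCompare {
    // Returns -1 when the contents are identical, otherwise the offset of the first difference.
    static int firstDifference(byte[] a, byte[] b) {
        return Arrays.mismatch(a, b);
    }

    public static void main(String[] args) throws IOException {
        // Example: java ClassFileCompare myprogram.class myprogram_orig.class
        byte[] rebuilt = Files.readAllBytes(Paths.get(args[0]));
        byte[] original = Files.readAllBytes(Paths.get(args[1]));
        int at = firstDifference(rebuilt, original);
        System.out.println(at < 0 ? "identical" : "first difference at byte " + at);
    }
}
```

Knowing the offset of the first difference helps to tell whether the mismatch lies in the header, the constant pool or the method bodies.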

Applications of Decompiler

The applications of decompilers are vast, and decompilers have been used since the early days in a number of situations. Once the power of decompilers is recognized, the decompilation process can be used in many situations.

To recover lost source code

We may have written a program for which we only have the executable now (or we got the exe of a program we wrote long back, from someone else!). If we want to have the source for such a program, we can use decompilation to recover it. In all rights, we are the owner of the program, so nobody is going to question us. It is estimated that 5% of all software in the world has at least some source files missing.

Migration of application to new hardware platform


Applications written long back for a legacy computer may not have the source code now, and we may need to port them to a new platform. Either we have to rewrite the application from scratch, or use decompilation to understand the working of the application and rewrite it.

Compiler not available today

Say we have code written in some language for which we can’t find a compiler today! If we have the executable, just decompile it and rewrite the logic in the language of our choice today.

Decompilation of algorithm

Before decompiling to discover the internals of someone else's program (like what algorithm they have used...), check the copyright law. Decompilation of parts of software which do not come under the copyright laws (e.g. algorithms) is permitted. In any case, it is better to contact our legal advisor if we are doing any serious work with decompilation.

Y2K Problem

When the whole world was worrying about Y2K (the Year 2000 problem), reverse engineering to fix it became the acceptable face of decompilation.

EURO Problem

The introduction of the European single currency caused many more financial programs to fall over, so decompilation came to the rescue again.

Security

To ensure that compiled code is correct compared to the source code (maybe we don't or can't trust our compiler).

Interoperability

It may be legal and necessary to reverse engineer some binary code for the purposes of interoperability.

Support for programs that ship without debugging information

Decompilation helps to support programs that ship without debugging information, often linked with third party code, when one day they just stop working.

Converting .class To UML

Using this application we can get the design view of a project back from its .class files, i.e. its executable files.

Protection from Decompilers

As every coin has two sides, a decompiler can also be used positively or negatively. License agreements don't offer any real protection from a programmer who wants to decompile your code. So we need ways of protecting our code against the misuse of decompilers.

Use of Obfuscator:

An obfuscator acts like a filter, removing useful information such as variable names and line number attributes and allowing only the bare minimum of information to pass through. Essentially the obfuscator parses the constant pool to find and then rename all variable names and parameters. It has been only marginally effective: it allows the class-file to run as normal and it prevents the decompiled code from being recompiled. However, that is not the end of the story. True, the code is more difficult to understand and recompile, and once a class-file has been through an obfuscator we can never recover the original variable names, but we could have accomplished some of that by compiling. Some obfuscators rename as many methods as possible, using overloading wherever possible. Overloaded methods have the same name but different numbers or types of parameters. True, the overloaded methods are difficult to understand, but they are not impossible to comprehend. They too can be renamed into something easier to read. Some obfuscators have a far more aggressive form of obfuscation known as high mode obfuscation. For example, Zelix Klassmaster encrypts all strings in a class-file, and Neil Aggarwal's Obfuscate adds extra invalid entries to the symbolic table, rather like a bytecode manipulator. Unfortunately, while they do succeed in mangling the symbolic names beyond all recognition, some decompilers such as SourceAgain can substitute new strings automatically.
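The renaming step at the heart of an obfuscator can be sketched in a few lines. The example below only builds the mapping from meaningful identifiers to short meaningless ones (a, b, ..., z, aa, ab, ...); a real obfuscator would then rewrite the constant pool entries accordingly. The class name Renamer and the sample identifiers are assumptions made for illustration and do not describe how Crema, Zelix Klassmaster or any other product actually works.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the identifier-renaming step of an obfuscator.
class Renamer {
    // Produce a, b, ..., z, aa, ab, ... for index 0, 1, 2, ...
    static String shortName(int index) {
        StringBuilder sb = new StringBuilder();
        do {
            sb.insert(0, (char) ('a' + index % 26));
            index = index / 26 - 1;
        } while (index >= 0);
        return sb.toString();
    }

    // Assign each distinct identifier the next unused short name, in order of first appearance.
    static Map<String, String> rename(List<String> identifiers) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String id : identifiers) {
            if (!mapping.containsKey(id)) {
                mapping.put(id, shortName(mapping.size()));
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        System.out.println(rename(Arrays.asList("accountBalance", "interestRate", "accountBalance")));
    }
}
```

Because the mapping is one-way and the original names are discarded, a decompiler can at best invent new names of its own, which is the point made above about never recovering the original variable names.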

Selling source code with application by charging extra

Some companies reason that if source code is so readily accessible, they may as well sell it at a higher price. JClass is available from the KL Group as both class files and source code. The difference in price is so small that it just doesn't make any sense to decompile, given the time and energy that is sometimes required. If the price is not too high, we could convince the would-be decompiler to pay for the code, as programmers' comments are usually very informative. This won't be for everyone, but since some people will decompile our code to copy it anyway, why not try to gain some revenue from these otherwise illegal activities?

License agreement

License agreements don't offer any real protection from a programmer who wants to decompile our code.

Code Fingerprinting

In a court of law, one of the best ways of proving that an application or applet was decompiled and recompiled is to show spurious code which has no function and yet is in the original as well as the decompiled code. However, most decompilers will be able to spot spurious code put in there for fingerprinting, and it too will ultimately be useless, like stretching a watermarked image to remove the watermark. Both JAD and SourceAgain can already search out unexecuted code and remove it from the decompiled code.

IPR Protection Schemes

IPR protection schemes are one of the areas where real strides are likely to be made in class-file protection. The logic behind IPR protection schemes with respect to Java is that if we can't get at the class-file then we cannot possibly decompile it, and to achieve that we need a secure browser cache. Secure browsers that don't dump class-files, HTML or images to the Internet Explorer or Netscape Navigator cache are already on the market. Breaker Technologies have a secure browser, IBM's Cryptolope Live claims to have one, and no doubt there are others. For the moment this technology is in its infancy, but expect it to grow. Intellectual Property Rights (IPR) protection schemes such as IBM's Cryptolope Live, InterTrust's DigiBox and Breaker Technologies' SoftSEAL are normally used to sell HTML documents or audio files on some pay-per-view or pay-per-group basis. However, as they typically have built-in trusted HTML viewers, they allow Java applets to be seen but not copied. Unfortunately IPR protection schemes are not cheap. Worse still, some of the clients are written in 100% pure Java and can therefore be decompiled. Similar protection schemes in the future are likely to provide the best chance of success in nailing the decompiler issue once and for all.

Executable applications

The safest protection for Java applications is to compile them into native executables. This is an option on many Java compilers, for example SuperCede. The code will then be as safe as any C or C++ executable (read: a lot safer), but it will no longer be portable, because it no longer uses the JVM.

Keeping code on server side

The safest protection for applets is to hide all the interesting code on the web server and only use the applet as a thin front end GUI. However this increases web server load and goes against the Java methodology.
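The split between a thin client and server-side logic can be sketched as follows. This is only an illustration under stated assumptions: the class names (ServerPricingEngine, ThinClient), the one-line request format and the pricing rule are hypothetical, and a plain function stands in for the network call that a real applet would make to the web server.

```java
import java.util.function.Function;

// Hypothetical server-side class: in a real deployment this would run on the
// web server only, so its class file is never shipped to the browser.
final class ServerPricingEngine {
    // Assumed request format: the quantity as a decimal string, e.g. "3".
    static String handle(String request) {
        int quantity = Integer.parseInt(request.trim());
        // The "interesting" business rule stays on the server.
        int unitCents = quantity >= 100 ? 90 : 100;
        return Integer.toString(unitCents * quantity);
    }
}

// Hypothetical client-side class: the only code an applet would contain.
// It builds a request, forwards it, and displays the reply.
final class ThinClient {
    // Stand-in for an HTTP round trip to the server.
    private final Function<String, String> transport;

    ThinClient(Function<String, String> transport) {
        this.transport = transport;
    }

    String quoteLabel(int quantity) {
        String reply = transport.apply(Integer.toString(quantity));
        return "Total: " + reply + " cents";
    }

    public static void main(String[] args) {
        ThinClient client = new ThinClient(ServerPricingEngine::handle);
        System.out.println(client.quoteLabel(3)); // prints Total: 300 cents
    }
}
```

Decompiling ThinClient reveals only how requests are formatted and displayed; the pricing rule is never present in the downloaded code. The cost is the one already mentioned: every quote requires a round trip to the server.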

Decompiler developers develop obfuscators

Crema is the original obfuscator and was a complementary program to the above-mentioned Mocha. Mocha was given away free, but Crema cost somewhere around $30, and to safeguard against Mocha we had to buy Crema. It performs some rudimentary obfuscation and also has one interesting side effect: it flags class files so that Mocha refuses to decompile any applets or applications that have previously been run through Crema. However, other decompilers soon came onto the market that were not so Crema friendly. Now we find obfuscators and decompilers in what we could describe as a cold war scenario without any entente cordiale. So far the decompilers are winning, but that’s not to say the climate won’t change.
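To give a feel for what rudimentary obfuscation can mean, the sketch below shows identifier renaming, one of the simplest obfuscation techniques. It is an assumption that this is the kind of transformation meant here; the classes are hypothetical and the renamed version was written by hand, not produced by Crema or any other tool.

```java
// Hypothetical original: meaningful names make decompiled output easy to read.
final class InterestCalculator {
    static long compoundCents(long principalCents, int ratePercent, int years) {
        long balance = principalCents;
        for (int year = 0; year < years; year++) {
            balance += balance * ratePercent / 100;
        }
        return balance;
    }
}

// Hand-written illustration of the same code after identifier renaming.
// The behaviour is identical, but a decompiler can only recover these
// meaningless names.
final class a {
    static long b(long c, int d, int e) {
        long f = c;
        for (int g = 0; g < e; g++) {
            f += f * d / 100;
        }
        return f;
    }

    public static void main(String[] args) {
        System.out.println(InterestCalculator.compoundCents(10000, 10, 2)); // prints 12100
        System.out.println(a.b(10000, 10, 2));                              // prints 12100
    }
}
```

Renaming does not stop decompilation; it only removes the hints that names give the reader, which is one reason the decompilers have so far stayed ahead.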

Best Defense

It is said that perhaps the best defense is to provide a stable, useful application with all the usual phone support and good documentation. The majority of our customers will much prefer a supported application to an illegal copy that they will have to support themselves. So before we start worrying too much about decompilation, remember that people need someone to shout at when things go wrong.

Drawbacks

• Functionality is not user friendly: Getting full use of a decompiler’s functionality is not easy, sometimes because a technical background is needed and sometimes because there is no GUI.
• Large volume of source code: It is harder to recover usable source code from a large program. For example, decompiling Office 97 using dcc (a decompiler) would create so much code that it would be about as user friendly as debug or a hexadecimal dump. The source code of most modern commercial software is so large that it becomes unintelligible without the design documents and lots of source code comments. Let’s face it; many people’s C++ is hard enough to read six months after they wrote it. So how easy would it be for someone else to decipher C code that came from compiled C++ code without any help, even if the library calls aren’t traversed?
• No GUI for older decompilers: Older decompilers were used mainly by developers and therefore lack a GUI, which makes them harder to handle.
• Compiler dependent: The design of a decompiler depends on the compiler for which it was developed, so a change of compiler means an updated decompiler has to be written.
• Increasing trend towards stealing code: Decompilers are increasingly being used to steal code rather than for educational purposes.
• Giving trusted secrets to competitors and hackers: Using a decompiler, competitors and hackers can get inside a product and acquire trusted secrets that would otherwise be hard to obtain.
• Security holes exposed: Security holes in a piece of software can be found with a decompiler and made public.

Conclusion

• The future of decompilation and obfuscation techniques looks fairly reminiscent of the Cold War. An arms race is brewing between the decompiler developers and the obfuscator developers. Oddly enough, these are often the same companies, trying to break a rival’s product and then sell their own obfuscator as the only 100% secure solution.
• The attitude towards decompilers needs to change: there are many positive uses of this concept, so it deserves more attention.
• More research is required in executable to assembly code decompilation.

