Decompiler: Reversing Compilation
Written by Tushar B Kute, Lecturer in Information Technology
(K. K. Wagh Polytechnic, Nashik - 3, Maharashtra, INDIA)
Email: [email protected]

Abstract

This paper presents an overview of the decompiler, a program that performs the reverse of a compiler's work: it creates high-level language code from machine or assembly language code. A decompiler is needed on the many occasions when a user must obtain source code from an executable. Decompilation is a form of reverse engineering, which can serve legitimate or harmful purposes depending on how it is applied, so we also need ways to protect our code from decompilers to avoid misuse. Industry is taking decompilation as seriously as any other discipline as its usefulness becomes apparent. In this paper I have a foot in two camps: as a programmer I am interested in understanding how others achieve interesting effects, but from a business point of view I am not keen on someone reusing my code and selling it to third parties as their own. The paper gives an overview of how a decompiler works and where it is useful.

Introduction

A decompiler is a program that reads a program written in machine language and translates it into an equivalent program in a high-level language (HLL). The process it performs is decompilation, the reverse of compilation: creating high-level language code from machine or assembly language code. At the most basic level, it only requires understanding the machine or assembly code and rewriting it in a high-level language.

Figure 1: Process of Decompilation (machine language program -> Decompiler -> high-level language program)

So, using a decompiler, we can get source code back from executables. Although translations aim to preserve the extensional semantics of a program, it is usually not possible to retain all information across a translation.
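The idea of rewriting low-level code as high-level code, and the loss of information along the way, can be illustrated with a minimal sketch. The instruction set below (PUSH, LOAD, STORE, ADD, SUB, MUL) is a hypothetical stack machine invented for this example; it is not the format of any real processor or of any decompiler named in this paper. The sketch simulates the stack symbolically and rebuilds an assignment statement, but the original variable names cannot be recovered and are replaced by placeholders such as v0.

```python
# Hypothetical stack-machine instruction set, invented for this sketch.
# Each instruction is a tuple: (opcode, operand).
BINARY_OPS = {"ADD": "+", "SUB": "-", "MUL": "*"}


def decompile(code):
    """Rebuild source-like assignment statements from stack-machine code."""
    stack = []       # holds symbolic expressions, not runtime values
    statements = []  # recovered high-level statements
    for opcode, operand in code:
        if opcode == "PUSH":            # operand is a constant
            stack.append(str(operand))
        elif opcode == "LOAD":          # operand is a variable slot number
            stack.append(f"v{operand}")
        elif opcode in BINARY_OPS:      # pop two operands, push one expression
            right = stack.pop()
            left = stack.pop()
            stack.append(f"({left} {BINARY_OPS[opcode]} {right})")
        elif opcode == "STORE":         # pop an expression into a slot
            statements.append(f"v{operand} = {stack.pop()}")
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return statements


# Code a compiler might emit for:  total = (price + tax) * 2
machine_code = [
    ("LOAD", 0), ("LOAD", 1), ("ADD", None),
    ("PUSH", 2), ("MUL", None), ("STORE", 2),
]

recovered = decompile(machine_code)
print("\n".join(recovered))  # prints: v2 = ((v0 + v1) * 2)
```

The recovered statement computes the same result as the original, yet the names total, price and tax are gone, because this toy instruction stream only records slot numbers.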
A compilation is a form of synthesis in which a program in a high-level language is transformed into machine code.

Need of Decompiler

Computer languages such as FORTRAN, COBOL, C and now Java were developed to let us express our ideas in a human-friendly format that can then be converted into a format a computer chip can understand. There are various situations, discussed in detail in the applications part, in which we need the source code back from the computer-understandable form, i.e. the executable (exe). In those situations a decompiler helps by giving us source code back from the executable. Some of the situations where a decompiler helps are:
• To recover lost source code
• Migration of applications to new hardware
and many other applications.

Some Available Decompilers

The decompilers available today include:
- DisC for Turbo C 2.0/2.01. The source is free to download. The author is Satish Kumar. http://www.debugmode.com/dcompile/disc.htm
- JosephCo: VB 5 beta decompiler
- Hans-Peter Diettrich (DoDi): VB3 Decompiler (feasible because executables generated by VB3 retained much of the source code)
- Jad (Jad - the fast JAva Decompiler): Jad is a decompiler that is free for non-commercial use.
- Salamander: a commercial decompiler for .NET. The first release was on 1st Feb 2002. There are four examples online. http://www.remotesoft.com/sal

Application-specific Decompilers

A number of applications generate an intermediate (low-level) code that is interpreted by a virtual machine. In some cases this low-level code is stored inside an executable (e.g. Java bytecodes). Writing decompilers for such intermediate code is often much easier, because the intermediate code is effectively the assembly language of the virtual machine. Examples are:
• MultiRipper: a Windows and Delphi/C++ Builder ripper by Baccan Matteo and Peruch Emiliano. A "ripper" program extracts files inside other files.
MultiRipper extracts files from Windows and Delphi/C++ Builder applications. For Windows applications it extracts Windows resources and saves them to disk, and for Delphi/C++ Builder applications it recovers the Delphi project and code. MultiRipper is not a decompiler, as it does not recover the source code of a Delphi application; however, the authors are working on this for a future release (current release is 2.6).
• SourceAgain: a Java decompiler by Ahpah Software. SourceAgain correctly recovers Java control structures and optimizations from the bytecode. Further, it supports irreducible graphs, polymorphic type inference, recognition of packages, and more, and provides debugging support. PC and Unix versions of SourceAgain are available, with prices ranging between US$99 and US$299.
• ReFox 7.0: a decompiler by Xitech for restoring source code from FoxBASE+, FoxPro 1.x, FoxPro 2.x and Visual FoxPro executables or compiled modules. (The information given in the FoxPro programming FAQ is outdated.)
• Alexander Lobanov's FoxPro decompilers for FoxPro versions 2.0 and 2.5/2.6. Demo versions of the software are available; these programs are distributed as shareware.
• The Visual Basic 4 decompiler: it can decompile files generated by the Visual Basic compiler and is accessible from the DoDi VB tools page.
• The Decaf decompiler for Java .class files: written in Ada95. Decompilers targeting Ada95 and Smalltalk are being worked on.
• The Mocha decompiler for Java .class files: the Crema tool can be used to scramble symbolic information in the .class files.
• Sculptor and Realizer: these two decompilers were written in Spain. The first is a decompiler written in 1989 for a 4th generation language called Sculptor (also referred to as Sage and Sagerep), used for the development of information systems (it is similar to a context-free RPG II). This is a true case in which a decompiler had to be developed because the originals and backups of an important program had been lost.
The company for which it was developed no longer exists. The second decompiler is for Computer Associates Realizer 2.0, which is the Visual Basic of Computer Associates (considering that Visual Objects is their xBase product). Realizer is very close to VB 3.0 in every respect and surpasses it in included components, being superior overall, but it did not gain as much popularity. It includes a reporter, a screen painter, and configuration control and version tools (these tools were written in Realizer itself), and it features custom controls, databases, ODBC, etc. This language is scarcely used now.
• In addition, an attempt was made to develop a decompiler for COBOL as a hobby project, but it was abandoned due to a lack of knowledge of COBOL.

History of Decompilers

Decompilers are not new; they have existed for a long time in various forms. IBM played an important role in the development of decompilers.

First Decompiler

Decompilers have been written for a variety of applications since the development of the first compilers. The very first decompiler, D-Neliac, was written by Joel Donnelly in 1960 at the Naval Electronic Labs to decompile machine code to Neliac on a Remington Rand Univac M-460 Countess computer.

Uses from Last Decades

Over the last decades, decompilers have been put to different uses. In the 1960s, decompilers were used to aid the conversion of programs from second to third generation computers, so that manpower would not be spent on the time-consuming task of rewriting programs for the third generation machines. During the 1970s and 1980s, decompilers were used for the portability of programs, documentation, debugging, re-creation of lost source code, and the modification of existing binaries.
In the 1990s, decompilers became a reverse engineering tool capable of helping the user with such tasks as checking software for the existence of malicious code, checking that a compiler generates the right code, translating binary programs from one machine to another, and understanding the implementation of a particular library function.

The ethics of Decompilation

Is decompilation legal, and is it allowed? There are many situations in which decompilation can be used.
• To recover lost source code. We may have written a program for which we now have only the executable (or we may have obtained, from someone else, the exe of a program we wrote long ago). If we want the source for such a program, we can use decompilation to recover it. We are the rightful owner of the program, so nobody is going to question us.
• As in the case above, applications written long ago for a legacy computer may no longer have their source code, and we may need to port them to a new platform. Either we have to rewrite the application from scratch, or we use decompilation to understand how the application works and write it again.
• For example, we may have code written in a language for which no compiler can be found today. If we have the executable, we can decompile it and rewrite the logic in the language of our choice.
• To discover the internals of someone else's program (such as which algorithm they have used).
• Usually all software is copyrighted by its authors. This means that copying their code, or closely reproducing it in another program, is prohibited. Hence, if we use decompilation to discover the internals of a program and the part we take breaches the owner's copyright, we are liable to legal action. However, there are some permitted uses of decompilation, such as the first three cases stated above. Also, decompilation of parts of software which do not come under the copyright laws (e.g. algorithms) is permitted.
• For all practical purposes, decompiling programs that we created ourselves cannot be questioned. After all, we own all rights to the program.