Is It Possible to Reverse Engineer Obfuscated Bytecode Back to Source Code?

DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2020

Swedish title: Är det möjligt att dekompilera obfuskerad bytekod tillbaka till källkod?

Gustav Smedberg
Jenny Malmgren

Degree project in Computer Engineering, first cycle, 15 credits
Supervisor at KTH: Anders Cajander
Examiner: Ibrahim Orhan
TRITA-CBH-GRU-2020:052

KTH School of Engineering Sciences in Chemistry, Biotechnology and Health
141 52 Huddinge, Sweden

Summary (translated from the Swedish "Sammanfattning")
There is a great deal of old software in the world that is no longer maintained and would need to be updated in order to patch security holes or update functionality. In cases where the source code has been lost or deleted, would it be possible to use decompilation to recover it? This report describes what Java bytecode is and how it is used, how one can go from Java bytecode back to source code through a process called decompilation, and how one can protect against this by obfuscating the code. It also presents previous research on decompilation and obfuscation, supplemented by explanations of what a Java Virtual Machine, bytecode and obfuscation are and how they work. Three programs of varying complexity are compiled to bytecode, obfuscated and then decompiled, and the result is compared with the original source code. In conclusion, it is possible to decompile the obfuscated code, but only parts of the source code can be recreated. All variable names and unused methods disappear entirely, and the code is sometimes changed to unconventional ways of programming.

Keywords
Reverse engineering, Java, JVM, bytecode, obfuscation, decompilation, security.

Abstract
There is a lot of old software in the world that is no longer supported or kept up to date and that would need to be updated to close security vulnerabilities or to update its functionality. In cases where the source code has been lost or deliberately deleted, would it be possible to use reverse engineering to retrieve it? This study aims to show what Java bytecode is and how it is used, as well as how one can go from Java bytecode back to source code in a process called reverse engineering. Furthermore, the study presents previous work on reverse engineering and obfuscation, and explains in more detail what the Java Virtual Machine, bytecode and obfuscation are and how they work. Three programs of varying complexity are compiled to bytecode and then obfuscated. The differences between the original code and the obfuscated code are then analyzed. The results show that it is possible to reverse engineer obfuscated code, but only some parts of the source code can be recovered. Obfuscation does protect the code: all variable names are changed, every unused method is removed, and some methods are rewritten in unconventional ways.

Keywords
Reverse engineering, Java, JVM, bytecode, obfuscation, safety.

Acknowledgements
The authors of this paper have attended KTH for an entire bachelor's degree, and this is the biggest and last course. We wish to thank all of our teachers for supporting us during these years. A big thank you to Anders Cajander, who has been our mentor for this project, and Ibrahim Orhan, who has been our examiner; both of them helped us with their experience and perspective. Thank you to Sebastian Zeerak, who lent us the code for the most advanced program, which was written together with Gustav Smedberg. A big thank you to Anders Lindstöm, the author of AudioStreamUDP.java, who gave us permission to use it in our thesis.
We also wish to thank AstraZeneca, and in particular Mikael Engström and Olle Sundholm, who helped us settle on a report subject even though they did not have the resources to supervise us during the project. Finally, we wish to thank our friends and families for the support they have given us during the making of this project.

Table of contents
1 Introduction
  1.1 Problem definition
  1.2 Objective
  1.3 Limitations
2 Theory and background
  2.1 Coding language
  2.2 Java and Bytecode
    2.2.1 Class file
  2.3 Virtual Machine System
    2.3.1 Java Virtual Machine
  2.4 Reverse engineering analytics methods
    2.4.1 Dynamic reverse engineering
    2.4.2 Static reverse engineering
  2.5 Bytecode Obfuscation
  2.6 Previous work
  2.7 Tools and frameworks
    2.7.1 Javap - The Java Class File Disassembler
    2.7.2 Decompilers
    2.7.3 Dynamic analysis tool
    2.7.4 Code obfuscating program
    2.7.5 IntelliJ IDEA
  2.8 Code
  2.9 Practical implementation
3 Method
  3.1 Analysis Methods
  3.2 Java code obfuscation
  3.3 Static analysis
  3.4 Dynamic analysis
4 Results
  4.1 Simple program
    4.1.1 Obfuscated code
    4.1.2 Decompilation and dynamic analyzation
  4.2 More complex program
    4.2.1 Obfuscated code
    4.2.2 Decompilation and dynamic analysis
  4.3 Advanced program
    4.3.1 Obfuscated code
    4.3.2 Decompilation and dynamic analyzing
