Quick Introduction to Reverse Engineering for Beginners
Total Page:16
File Type:pdf, Size:1020Kb
Quick introduction to reverse engineering for beginners Dennis Yurichev <[email protected]> March 4, 2018 Contents Preface iv 2 Compiler’s patterns 1 2.1 Hello, world!..............................................2 2.2 Stack..................................................4 2.2.1 Save return address where function should return control after execution........4 2.2.2 Function arguments passing.................................5 2.2.3 Local variable storage....................................5 2.2.4 (Windows) SEH.......................................6 2.2.5 Buffer overflow protection..................................6 2.3 printf() with several arguments...................................7 2.4 scanf().................................................9 2.4.1 Global variables........................................ 10 2.4.2 scanf() result checking.................................... 12 2.5 Passing arguments via stack..................................... 14 2.6 One more word about results returning............................... 16 2.7 Conditional jumps.......................................... 17 2.8 switch()/case/default......................................... 19 2.8.1 Few number of cases..................................... 19 2.8.2 A lot of cases......................................... 20 2.9 Loops................................................. 24 2.10 strlen()................................................ 27 2.11 Division by 9............................................. 30 2.12 Work with FPU............................................ 32 2.12.1 Simple example........................................ 32 2.12.2 Passing floating point number via arguments....................... 34 2.12.3 Comparison example..................................... 34 2.13 Arrays................................................. 40 2.13.1 Buffer overflow........................................ 41 2.13.2 One more word about arrays................................ 45 2.13.3 Multidimensional arrays................................... 45 2.14 Bit fields................................................ 47 2.14.1 Specific bit checking..................................... 47 2.14.2 Specific bit setting/clearing................................. 49 2.14.3 Shifts............................................. 51 2.14.4 CRC32 calculation example................................. 53 2.15 Structures............................................... 56 2.15.1 SYSTEMTIME example................................... 56 2.15.2 Let’s allocate place for structure using malloc()...................... 57 2.15.3 Linux............................................. 58 2.15.4 Fields packing in structure.................................. 59 2.15.5 Nested structures....................................... 61 2.15.6 Bit fields in structure..................................... 62 i 2.16 C++ classes.............................................. 68 2.16.1 GCC.............................................. 71 2.17 Pointers to functions......................................... 73 2.17.1 GCC.............................................. 75 2.18 SIMD................................................. 77 2.18.1 Vectorization......................................... 77 2.18.2 SIMD strlen() () implementation............................. 83 2.19 x86-64................................................. 87 3 Couple things to add 94 3.1 LEA instruction............................................ 95 3.2 Function prologue and epilogue................................... 96 3.3 npad.................................................. 97 3.4 Signed number representations................................... 99 3.4.1 Integer overflow........................................ 99 3.5 Arguments passing methods (calling conventions)......................... 100 3.5.1 cdecl.............................................. 100 3.5.2 stdcall............................................. 100 3.5.3 fastcall............................................. 100 3.5.4 thiscall............................................. 101 3.5.5 x86-64............................................. 101 3.5.6 Returning values of float and double type......................... 101 4 Finding important/interesting stuff in the code 102 4.1 Communication with the outer world................................ 102 4.2 String................................................. 103 4.3 Constants............................................... 103 4.3.1 Magic numbers........................................ 103 4.4 Finding the right instructions.................................... 104 4.5 Suspicious code patterns....................................... 105 5 Tasks 106 5.1 Easy level............................................... 106 5.1.1 Task 1.1............................................ 106 5.1.2 Task 1.2............................................ 107 5.1.3 Task 1.3............................................ 109 5.1.4 Task 1.4............................................ 110 5.1.5 Task 1.5............................................ 112 5.1.6 Task 1.6............................................ 113 5.1.7 Task 1.7............................................ 114 5.1.8 Task 1.8............................................ 115 5.1.9 Task 1.9............................................ 115 5.1.10 Task 1.10........................................... 116 5.2 Middle level.............................................. 116 5.2.1 Task 2.1............................................ 116 5.3 crackme / keygenme......................................... 123 6 Tools 124 6.0.1 Debugger........................................... 124 ii 7 Books/blogs worth reading 125 7.1 Books................................................. 125 7.1.1 Windows........................................... 125 7.1.2 C/C++............................................ 125 7.1.3 x86 / x86-64......................................... 125 7.2 Blogs.................................................. 125 7.2.1 Windows........................................... 125 8 Other things 126 8.1 More examples............................................ 126 8.2 Compiler’s anomalies......................................... 126 9 Tasks solutions 128 9.1 Easy level............................................... 128 9.1.1 Task 1.1............................................ 128 9.1.2 Task 1.2............................................ 128 9.1.3 Task 1.3............................................ 128 9.1.4 Task 1.4............................................ 129 9.1.5 Task 1.5............................................ 129 9.1.6 Task 1.6............................................ 130 9.1.7 Task 1.7............................................ 130 9.1.8 Task 1.8............................................ 131 9.1.9 Task 1.9............................................ 131 9.2 Middle level.............................................. 132 9.2.1 Task 2.1............................................ 132 iii Preface Here (will be) some of my notes about reverse engineering in English language for those beginners who like to learn to understand x86 code created by C/C++ compilers (which is a most large mass of all executable software in the world). There are two most used compilers: MSVC and GCC, these we will use for experiments. There are two most used x86 assembler syntax: Intel (most used in DOS/Windows) and AT&T (used in *NIX) 1. Here we use Intel syntax. IDA6 produce Intel syntax listings too. 1http://en.wikipedia.org/wiki/X86_assembly_language#Syntax iv Chapter 2 Compiler’s patterns When I first learn C and then C++, I was just writing small pieces of code, compiling it and see what is producing in assembler, that was an easy way to me. I did it a lot times and relation between C/C++ code and what compiler produce was imprinted in my mind so deep so that I can quickly understand what was in C code when I look at produced x86 code. Perhaps, this method may be helpful for anyone other, so I’ll try to describe here some examples. 1 2.1 Hello, world! Let’s start with famous example from the book "The C programming Language"1: #include <stdio.h> int main () { printf("hello, world"); return 0; }; Let’s compile it in MSVC 2010: cl 1.cpp /Fa1.asm (/Fa option mean generate assembly listing file) CONSTSEGMENT $SG3830 DB ’hello, world’, 00H CONSTENDS PUBLIC _main EXTRN _printf:PROC ; Function compile flags: /Odtp _TEXTSEGMENT _main PROC push ebp mov ebp , esp push OFFSET $SG3830 call _printf add esp , 4 xor eax , eax pop ebp ret 0 _main ENDP _TEXTENDS Compiler generated 1.obj file which will be linked into 1.exe. In our case, the file contain two segments: CONST (for data constants) and _TEXT (for code). The string "hello, world" in C/C++ has type const char*, however hasn’t its own name. But compiler need to work with this string somehow, so it define internal name $SG3830 for it. As we can see, the string is terminated by zero byte this is C/C++ standard of strings. In the code segment _TEXT there are only one function so far _main. Function _main starting with prologue code and ending with epilogue code, like almost any function. Read more about it in section about function prolog and epilog 3.2. After function prologue we see a function printf() call: CALL _printf. Before the call, string address (or pointer to it) containing our greeting is placed into stack with help of PUSH instruction. When printf() function returning control flow to main() function, string address (or pointer to it) is still in stack. Because we do not need it anymore, stack