Advanced Static Analysis
Instructor: Davide Maiorca
Web Security and Malware Analysis
M.Sc. in Computer Engineering, Cybersecurity and Artificial Intelligence
University of Cagliari, Italy

More on Assembly X86: Calling Conventions
• In the «standard» calling convention, arguments are pushed to the stack from right to left: the last parameter is pushed first
  – The parameter that appears first in the function signature is pushed last, so it ends up at the lowest address among the arguments, next to the return address
• However, a calling convention covers much more than the order of the arguments
• Calling conventions also define how function results are returned and who cleans the stack
  – The return value is typically stored in the EAX register
• Stack cleaning: after the call, the arguments passed to a function must be cleaned up (removed from the stack or replaced)
• Normally, the caller is the one that takes care of the cleaning
• This convention is called _cdecl
• Typically, this is the default convention in C programs
http://pralab.diee.unica.it Web Security and Malware Analysis - Davide Maiorca

The _cdecl Convention
[Figure: stack diagram with the labels Return Address, Previous EBP, EBP and ESP, and the argument values 1 and 2]
retn immediately transfers control to the caller

Other Conventions
• Depending on specific needs or function optimization, alternative calling conventions can be used
• _stdcall
  – Like _cdecl, but the stack is cleaned by the called function (the callee), not by the caller
• _fastcall
  – Passes the first arguments in registers instead of pushing them to the stack (ecx and edx; some compilers also use eax)
  – Useful for calls with few arguments (generally two, maximum three); remaining arguments, if any, are pushed to the stack
  – The stack is cleaned by the called function (like _stdcall)
• _thiscall
  – Only used in object-oriented languages (e.g., for C++ member functions)
  – The pointer to the object (this) is stored in the ecx register
• More conventions can be defined by the specific compiler

_cdecl Vs _stdcall Vs _fastcall
Adding a number to ret tells the
instruction to pop additional bytes after the return address (in this case, ret 8 pops two 4-byte blocks)

Assembly X64
• Uses 64-bit registers and 64-bit addressing
• Employs more registers (see Chapter 10 of the slides)
  – 64-bit registers start with «r» (rax, rbp, rsp)
• Uses different calling conventions
• The first four parameters are passed in registers, not on the stack
  – In the Microsoft x64 convention the order is: RCX, RDX, R8, R9 -> Stack
  – The caller cleans the stack when stack arguments are present
• X64 also supports floating-point numbers
  – For this purpose, special registers called XMM are employed
  – These registers were introduced by the SSE extensions
  – They are 128-bit registers that can store two 64-bit values or four 32-bit values

Funny_Sum vs Funny_Sum_64 - Main
[Figure: side-by-side listings, panels X86 and X64]
The registers for the first two parameters MUST be RCX and RDX

Funny_Sum vs Funny_Sum_64 – Funny_Sum
[Figure: side-by-side listings, panels X86 and X64]
In many cases, the differences between X86 and X64 are very small

Decompilation
• Generation of C code from the original assembly code
• Decompilers use custom variables (and sometimes custom types)
  – You can rename variables, functions and structures
• By pressing a button, it is possible to generate a possible C equivalent of the assembly code
• One big advantage of decompilation is that you can recompile the whole program by copying and pasting the decompiled code
  – In IDA, you can use the «defs.h» library when recompiling the code
• An excellent, free decompiler is GHIDRA
  – Made by the NSA
  – Great alternative to IDA

Ghidra

Ghidra – Funny Sum Decompilation
Ghidra
automatically associates the decompiled lines with the assembly code

Ghidra – Funny Sum Function

Recognizing Code Constructs
• Decompilation should never be used alone, but in combination with disassembly
• Decompilation is not perfect; typical problems are:
  – Distinguishing code from data (a problem also related to disassembling)
  – Finding the proper return data types
  – Understanding complex control structures (e.g., jump tables, nested loops)
  – Understanding complex data structures (e.g., user-defined typedefs)
• It is important to always compare the disassembly output with the decompiled output
  – General rule: assembly is often more reliable
• It is also critical to recognize where a specific assembly construct starts and ends

If and Nested If Statements
    void if_statement() {
        int x, y, z;
        if (x == y) {
            if (z != 0) {
                puts("z is non-zero and x=y");
            }
        }
    }
Nested if = multiple cmp+j instructions
• IF statements work by setting special FLAGS with the CMP instruction
• Decisions are taken depending on whether a specific FLAG is SET or not
• The flag that is tested depends on the jump instruction (Zero Flag, Sign Flag…)
• Careful about JNZ: «JNZ» stands for «jump if the ZERO flag (ZF) is NOT set»
• If x=y, the ZERO flag is set to 1
• JNZ is taken (green branch) when the flag is NOT set (ZF=0)
• So, if the branch is not taken, it means that x=y
• JNZ is equivalent to JNE
• JZ jumps if the ZERO flag is SET

Recognizing Arrays
    void array() {
        int b[5] = {123, 87, 487, 7, 978};
        int i;    // local variable at ebp-4
        int a[5];
        for (i = 0; i < 5; i++) {
            a[i] = i;    // a is loaded at ebp-44
            b[i] = i;    // b is loaded at ebp-24
        }
    }
• Arrays are typically assigned sequentially (check what happens at ebp-24, ebp-20, …)
• A register is used as
an index (in this case, eax, which is often multiplied by 4 in int arrays because each int takes 4 bytes)

Recognizing For/While Loops
    void for_loop() {
        int i;
        for (i = 0; i <= 99; ++i)
            printf("i equals %d\n", i);
    }
• For loops are typically recognized by the presence of an IF condition that jumps to a specific location, plus a back-arrow in the block diagram
• There is an increment/decrement instruction on the same element used by the CMP instruction (in this case, ebp+i)
• The same element is initialized before the CMP
While loops are similar, but they may not feature the increment

Switch Statements – Jump Tables
Jump tables are essentially arrays of locations to which the program can jump
The value of the case in a switch controls the element of the table to which the program jumps
The array of addresses is stored in a predetermined location in the file (in the disassembly, it is typically referenced with a ds: prefix)
The next address in the jump table can depend on the size of the assembly code of the previous case
However, these cases may not necessarily be consecutive
Switches with few cases are translated to nested ifs
However, switches with many cases are translated to jump tables

References and Tools
• https://docs.microsoft.com/en-us/cpp/cpp/argument-passing-and-naming-conventions?view=vs-2019
• https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019
• https://docs.microsoft.com/en-us/cpp/build/x64-software-conventions?view=vs-2019
• Assembly X64 cheatsheet: https://cs.brown.edu/courses/cs033/docs/guides/x64_cheatsheet.pdf
• M. Sikorski, A. Honig. Practical Malware Analysis, Chapter 6
• GHIDRA (https://ghidra-sre.org/)
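
Example: a jump table written by hand
The jump-table mechanism described in the Switch Statements slide can be sketched in C. This is a minimal illustration, not the output of any specific compiler: the handler functions, the names jump_table, dispatch_switch and dispatch_table, and the four consecutive case values are all assumptions made for the example. It places a switch next to a hand-written equivalent that performs one bounds check and then an indexed indirect call, which is roughly the pattern a disassembler shows when a jump table is used. A real jump table usually holds the addresses of code labels inside a single function rather than function pointers.

```c
/* Hypothetical handlers standing in for the body of each case. */
static int case_zero(void)    { return 10; }
static int case_one(void)     { return 11; }
static int case_two(void)     { return 12; }
static int case_three(void)   { return 13; }
static int case_default(void) { return -1; }

/* The "jump table": an array of code addresses indexed by the case value. */
typedef int (*handler_t)(void);
static const handler_t jump_table[] = {
    case_zero, case_one, case_two, case_three
};

/* What the programmer writes. */
int dispatch_switch(unsigned value) {
    switch (value) {
    case 0:  return case_zero();
    case 1:  return case_one();
    case 2:  return case_two();
    case 3:  return case_three();
    default: return case_default();
    }
}

/* Hand-written equivalent of the table-based form:
   one bounds check (a cmp followed by a conditional jump),
   then an indirect transfer through jump_table[value]. */
int dispatch_table(unsigned value) {
    if (value > 3)               /* out of range: go to the default case */
        return case_default();
    return jump_table[value]();  /* index scaled by the pointer size */
}
```

For example, dispatch_switch(2) and dispatch_table(2) both return 12, and any value above 3 reaches the default handler and returns -1. When reading a disassembly, the bounds check followed by an indirect jump through an indexed address is the cue that a switch was compiled to a jump table.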