
Reverse Engineering II: The Basics
Gergely Erdélyi – Senior Manager, Anti-malware Research
Protecting the irreplaceable | f-secure.com

Binary Numbers

    1 0 1 1                            - Nibble (hex B)
    1 0 1 1 1 1 0 1                    - Byte   (hex B D)
    1 0 1 1 1 1 0 1 0 0 1 1 1 0 0 1    - Word   (hex B D 3 9)

Byte Order a.k.a. Endianness

    Offset:  00 01
    Bytes:   12 34    = 0x3412 (Little Endian)
                      = 0x1234 (Big Endian)
    Bytes:   34 12    = 0x1234 (Little Endian)
                      = 0x3412 (Big Endian)

Little Endian Dword

    Offset:  00 01 02 03
    Bytes:   12 34 56 78    = 0x78563412
    Bytes:   78 56 34 12    = 0x12345678

Endianness Matters
• Data exchange between computers
• Networking protocols
• File formats for disk storage
• Mixing endianness

System Endianness

    Little Endian        Big Endian           Switchable Endianness
    Intel x86            PowerPC (exc. G5)    ARM
    Intel 8051           Sparc (exc. v9)      Alpha
    Most uControllers    System/370           Intel IA64

ASCII Code

    Range          Contents                          Examples
    0x00 - 0x1F    Control Characters                Backspace, Line feed
    0x20 - 0x3F    Digits and Punctuation            0-9 < > = . , : * - ( ) !
    0x40 - 0x5F    Upper-case Letters and Special    ABCD... @ [ ] \ ^ _
    0x60 - 0x7E    Lower-case Letters and Special    abcd... ` { } | ~

ASCII Example

    H  e  l  l  o     1  2  3  4
    48 65 6C 6C 6F 20 31 32 33 34

http://en.wikipedia.org/wiki/ASCII

Unicode Strings

    BOM    H     e     l     l     o
    ff fe  48 00 65 00 6c 00 6c 00 6f 00

UTF-16 / UCS-2
http://en.wikipedia.org/wiki/UTF-16/UCS-2
http://en.wikipedia.org/wiki/Category:Unicode

String Storage
• ASCIIZ: Zero-terminated ASCII
• Pascal: Size byte + ASCII string
• Delphi: Size Dword + ASCII or Unicode string

                 H  e  l  l  o
    ASCIIZ:      48 65 6C 6C 6F 00
    Pascal:   05 48 65 6C 6C 6F

Intel x86 Architecture
(Image Copyright © 2004 GNU)

Introduction to Intel x86
• Started with 8086 in 1978
• Continued with 8088, 80186, 80286, 386, 486, Pentium, 686 ...
• CISC architecture
• 32-bit is called x86-32 or IA-32
• 64-bit is called x86-64, AMD64, EM64T

• 80386 introduced in 1985
• Has a 32-bit word length
• Has eight general-purpose registers
• Supports paging and virtual memory
• Addresses up to 4GiB of memory

Data Register Layout
(Image Copyright © 1997-2008 Intel Corporation)

Data Registers

    Registers              Name            Use
    AL / AH / AX / EAX     Accumulator     Arithmetic operations
    BL / BH / BX / EBX     Data index      General data storage, index
    CL / CH / CX / ECX     Loop counter    Loop constructs
    DL / DH / DX / EDX     Data register   Arithmetic

Address Registers

    Registers    Name                   Use
    IP / EIP     Instruction Pointer    Program execution
    SP / ESP     Stack Pointer          Stack operation
    BP / EBP     Base Pointer           Stack frame
    SI / ESI     Source Index           String operation
    DI / EDI     Destination Index      String operation

Segment Registers

    Registers       Name              Use
    CS              Code Segment      Program code
    DS              Data Segment      Program data
    ES / FS / GS    Other Segments    Other uses

EFLAGS Register
(Image Copyright © 1997-2008 Intel Corporation)

Mnemonic Examples

    MOV EAX, 1    Move 1 to EAX
    ADD EDX, 5    Add 5 to EDX
    SUB EBX, 2    Subtract 2 from EBX
    AND ECX, 0    Bit-wise AND ECX with 0
    XOR EDX, 4    Bit-wise eXclusive OR EDX with 4
    SHL ECX, 6    Shift ECX left by 6
    ROR EBX, 3    Bit-wise rotate EBX right by 3
    INC ECX       Increment ECX

More Mnemonics

    JNZ label     Jump if not zero (not equal)
    JMP label     Unconditional jump to label
    CALL func     Call function
    RET           Return from function
    LOOP label    ECX--, jump to label if ECX is not zero
    PUSH EAX      Push EAX to stack
    POP EDI       Pop EDI from stack
    LODSB         Load byte from DS:ESI to AL

Reversing C Code
(Image Copyright © 1988, 1978 by Bell Telephone Laboratories, Incorporated)

Basic Data Types

    char      - 1 byte
    short     - 2 bytes
    int       - 4 bytes (platform word)
    long      - 4 bytes
    float     - 4 bytes floating point
    double    - 8 bytes floating point

Arrays and Pointers
• Pointers can point to any memory location
• One-dimensional arrays are flat memory
• Multi-dimensional arrays use pointers

    char a[4];           /* four consecutive bytes: A A A A */
    char *b = a, c;      /* b must point at the array before it is dereferenced */

    c = a[2];            /* array indexing */
    c = *(b+2);          /* the same element through pointer arithmetic */

Structures and Unions

Structure

    struct {
        unsigned int id;
        unsigned short age;
        char name[16];
    } record;

Memory is allocated for all members combined.
sizeof(record) = 24

Union

    union foo {
        int one;
        char two;
    };

Memory is allocated for the largest member only.
sizeof(foo) = 4

Structure Alignment
• Structure members are aligned to their natural size by default, and the structure is padded to a multiple of its largest member's alignment
• #pragma pack(n) directive can change it
• #pragma pack(1) removes alignment
• Important when reconstructing structures

Structure Storage

    Aligned                 Packed
    DWORD id                DWORD id
    WORD age                WORD age
    16 BYTES name           16 BYTES name
    2-byte padding
    sizeof(record) = 24     sizeof(record) = 22

The char array needs no alignment of its own, so typical x86 compilers place the two padding bytes after name, rounding the size up to a multiple of 4.

Simple C Program

    int foobar(int x, int y)
    {
        int z = x+y;
        return z;
    }

    int main(void)
    {
        int z = foobar(1, 2);
    }

Function Calls
• Calling conventions are important to know
• Mixing them will crash the program
• __stdcall - Standard convention for Windows API calls
• __cdecl - Most common C calling convention
• __fastcall - Uses registers for arguments
• __thiscall - Passes ‘this’ pointer in ECX in C++

__cdecl Calls

Caller:

    PUSH arg2
    PUSH arg1
    CALL function
    ADD ESP, 8           ; caller removes the two arguments

Callee:

    PUSH EBP
    MOV EBP, ESP
    SUB ESP, 4
    MOV EAX, [EBP+8]
    MOV ESP, EBP
    POP EBP
    RET

Stack (highest address first):

    ARG2
    ARG1
    RET Addr.
    Saved EBP
    LOC1

    arg1: EBP+8
    arg2: EBP+12
    loc1: EBP-4

__stdcall Calls

Arguments are pushed right to left, exactly as in __cdecl. The difference is that the callee removes them with RETN 8.

Caller:

    PUSH arg2
    PUSH arg1
    CALL function

Callee:

    PUSH EBP
    MOV EBP, ESP
    SUB ESP, 4
    MOV EAX, [EBP+8]
    MOV ESP, EBP
    POP EBP
    RETN 8               ; callee removes the two arguments

Stack (highest address first):

    ARG2
    ARG1
    RET Addr.
    Saved EBP
    LOC1

    arg1: EBP+8
    arg2: EBP+12
    loc1: EBP-4

Further Reading

Intel Processor Documentation
http://www.intel.com/products/processor/manuals/index.htm

Netwide Assembler Mnemonic Documentation
http://sourceforge.net/docman/display_doc.php?docid=47259&group_id=6208

The Art of Assembly Language Programming, Windows 32-bit Edition
http://webster.cs.ucr.edu/AoA/index.html
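The byte-order and structure-size figures from the slides can be checked with a short C sketch. It assumes a compiler that supports #pragma pack (GCC, Clang and MSVC do) and a platform where int is 4 bytes and short is 2 bytes, as on x86. The names read_le32, read_be32, record_aligned, record_packed and check_slide_values are hypothetical and were chosen for this illustration only; they do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interpret four bytes, given in memory order, as a little-endian dword. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Interpret the same four bytes as a big-endian dword. */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Default alignment: the compiler is free to add padding. */
struct record_aligned {
    unsigned int   id;
    unsigned short age;
    char           name[16];
};

/* Same members with alignment removed by #pragma pack(1). */
#pragma pack(push, 1)
struct record_packed {
    unsigned int   id;
    unsigned short age;
    char           name[16];
};
#pragma pack(pop)

/* A union only needs room for its largest member. */
union foo {
    int  one;
    char two;
};

/* Assumption: 4-byte int and 2-byte short; aborts if a value differs. */
void check_slide_values(void)
{
    const unsigned char bytes[4] = { 0x12, 0x34, 0x56, 0x78 };

    assert(read_le32(bytes) == 0x78563412u);
    assert(read_be32(bytes) == 0x12345678u);

    assert(sizeof(struct record_aligned) == 24);
    assert(sizeof(struct record_packed) == 22);
    assert(sizeof(union foo) == 4);

    /* name has 1-byte alignment, so it starts right after age. */
    assert(offsetof(struct record_aligned, name) == 6);
}
```

Calling check_slide_values() from a main function returns silently when the platform matches these assumptions. Reading bytes with explicit shifts, rather than casting a pointer to uint32_t, gives the same result on little-endian and big-endian hosts, which is why file-format and network parsers are usually written this way.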