Reverse Engineering II: The Basics
Gergely Erdélyi – Senior Manager, Anti-malware Research
Protecting the irreplaceable | f-secure.com
Binary Numbers
Nibble: 1011 = 0xB
Byte:   1011 1101 = 0xBD
Word:   1011 1101 0011 1001 = 0xBD39
Byte Order a.k.a. Endianness
Offset  00 01
Bytes   12 34   = 0x3412 (Little Endian) = 0x1234 (Big Endian)
Bytes   34 12   = 0x1234 (Little Endian) = 0x3412 (Big Endian)
Little Endian Dword
Offset  00 01 02 03
Bytes   12 34 56 78   = 0x78563412
Bytes   78 56 34 12   = 0x12345678
Endianness Matters
• Data exchange between computers
• Networking protocols
• File formats for disk storage
• Mixing endianness
System Endianness
Little Endian       Big Endian            Switchable Endianness
Intel x86           PowerPC (exc. G5)     ARM
Intel 8051          Sparc (exc. v9)       Alpha
Most uControllers   System/370            Intel IA64
ASCII Code
0x00 - 0x1F   Control Characters               Backspace, Line feed
0x20 - 0x3F   Digits and Punctuation           0-9 <>= .,:*-()!
0x40 - 0x5F   Upper-case Letters and Special   ABCD... @[]\^_
0x60 - 0x7E   Lower-case Letters and Special   abcd... `{}|~
ASCII Example
H  e  l  l  o     1  2  3  4
48 65 6C 6C 6F 20 31 32 33 34
http://en.wikipedia.org/wiki/ASCII
Unicode Strings
UTF-16 / UCS-2
BOM   H     e     l     l     o
ff fe 48 00 65 00 6c 00 6c 00 6f 00
http://en.wikipedia.org/wiki/UTF-16/UCS-2
http://en.wikipedia.org/wiki/Category:Unicode
String Storage
• ASCIIZ: Zero-terminated ASCII
• Pascal: Size byte + ASCII string
• Delphi: Size Dword + ASCII or Unicode string
Example: "Hello"
ASCIIZ: 48 65 6C 6C 6F 00
Pascal: 05 48 65 6C 6C 6F
Intel x86 Architecture
Image Copyright © 2004 GNU
Introduction to Intel x86
• Started with 8086 in 1978
• Continued with 8088, 80186, 80286, 386, 486, Pentium, 686 ...
• CISC architecture
• 32-bit is called x86-32 or IA-32
• 64-bit is called x86-64, AMD64, EM64T
• 80386 introduced in 1985
• Has a 32-bit word length
• Has eight general-purpose registers
• Supports paging and virtual memory
• Addresses up to 4GiB of memory
Data Register Layout
Image Copyright © 1997-2008 Intel Corporation
Data Registers
AL / AH / AX / EAX   Accumulator     Arithmetic operations
BL / BH / BX / EBX   Data index      General data storage, index
CL / CH / CX / ECX   Loop counter    Loop constructs
DL / DH / DX / EDX   Data register   Arithmetic
Address Registers
IP / EIP   Instruction Pointer   Program execution
SP / ESP   Stack Pointer         Stack operation
BP / EBP   Base Pointer          Stack frame
SI / ESI   Source Index          String operation
DI / EDI   Destination Index     String operation
Segment Registers
CS             Code Segment     Program code
DS             Data Segment     Program data
ES / FS / GS   Other Segments   Other uses
EFLAGS Register
Image Copyright © 1997-2008 Intel Corporation
Mnemonic Examples
MOV EAX, 1   Move 1 to EAX
ADD EDX, 5   Add 5 to EDX
SUB EBX, 2   Subtract 2 from EBX
AND ECX, 0   Bit-wise AND ECX with 0
XOR EDX, 4   Bit-wise eXclusive OR EDX with 4
SHL ECX, 6   Shift ECX left by 6
ROR EBX, 3   Bit-wise rotate EBX right by 3
INC ECX      Increment ECX
More Mnemonics
JNZ label    Jump to label if not zero (not equal)
JMP label    Unconditional jump to label
CALL func    Call function
RET          Return from function
LOOP label   ECX--, jump to label if ECX is not zero
PUSH EAX     Push EAX to stack
POP EDI      Pop EDI from stack
LODSB        Load byte from DS:ESI to AL, then advance ESI
Reversing C Code
Image Copyright © 1988, 1978 by Bell Telephone Laboratories, Incorporated
Basic Data Types
char   - 1 byte
short  - 2 bytes
int    - 4 bytes (platform word)
long   - 4 bytes
float  - 4 bytes floating point
double - 8 bytes floating point
Arrays and Pointers
• Pointers can point to any memory location
• One-dimensional arrays are flat memory
• Multi-dimensional arrays are flat memory (row-major) or arrays of pointers
char a[4];        A A A A
char *b = a, c;
c = a[2];
c = *(b+2);
Structures and Unions
Structure                        Union
struct {                         union foo {
    unsigned int id;                 int one;
    unsigned short age;              char two;
    char name[16];               };
} record;
Memory is allocated for all      Memory is allocated for the
members combined, plus padding.  largest member only.
sizeof(record) = 24              sizeof(foo) = 4
Structure Alignment
• Data structures are aligned to word size by default
• #pragma pack(n) directive can change it
• #pragma pack(1) removes alignment
• Important when reconstructing structures
Structure Storage
Aligned                   Packed
DWORD id                  DWORD id
WORD age                  WORD age
16 BYTES name             16 BYTES name
2-byte padding
sizeof(record) = 24       sizeof(record) = 22
Simple C Program
int foobar(int x, int y)
{
    int z = x+y;
    return z;
}
int main(void)
{
    int z = foobar(1, 2);
}
Function Calls
• Calling conventions are important to know
• Mixing them will crash the program
• __stdcall - Standard calls on Windows
• __cdecl - Most common C calling convention
• __fastcall - Uses registers for arguments
• __thiscall - Pass ‘this’ pointer in ECX in C++
__cdecl Calls
    PUSH arg2
    PUSH arg1
    CALL function
    ADD ESP, 8
function:
    PUSH EBP
    MOV EBP, ESP
    SUB ESP, 4
    MOV EAX, [EBP+8]
    MOV ESP, EBP
    POP EBP
    RET
Stack (highest address first): ARG2, ARG1, RET Addr., Saved EBP, LOC1
arg1: EBP+8    arg2: EBP+12    loc1: EBP-4
__stdcall Calls
    PUSH arg2
    PUSH arg1
    CALL function
function:
    PUSH EBP
    MOV EBP, ESP
    SUB ESP, 4
    MOV EAX, [EBP+8]
    MOV ESP, EBP
    POP EBP
    RETN 8
Stack (highest address first): ARG2, ARG1, RET Addr., Saved EBP, LOC1
arg1: EBP+8    arg2: EBP+12    loc1: EBP-4
Further Reading
Intel Processor Documentation
http://www.intel.com/products/processor/manuals/index.htm
Netwide Assembler Mnemonic Documentation
http://sourceforge.net/docman/display_doc.php?docid=47259&group_id=6208
The Art of Assembly Language Programming, Windows 32-bit Edition
http://webster.cs.ucr.edu/AoA/index.html