
Reverse Engineering II: The Basics

Gergely Erdélyi – Senior Manager, Anti-malware Research

Protecting the irreplaceable | f-secure.com

Binary Numbers

1 0 1 1 - Nibble B

1 0 1 1 1 1 0 1 - Byte B D

1 0 1 1 1 1 0 1 0 0 1 1 1 0 0 1 - Word B D 3 9
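The nibble, byte and word groupings above can be checked with a few shifts and masks. This is a minimal sketch; the helper names `nibble_at`, `high_byte` and `low_byte` are made up for illustration and do not come from the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Return the 4-bit nibble at position `index` of a 16-bit word.
   Index 0 is the least significant nibble, index 3 the most significant. */
unsigned nibble_at(uint16_t word, unsigned index)
{
    assert(index < 4);
    return (unsigned)(word >> (4u * index)) & 0xFu;
}

/* Upper 8 bits of a 16-bit word. */
uint8_t high_byte(uint16_t word)
{
    return (uint8_t)(word >> 8);
}

/* Lower 8 bits of a 16-bit word. */
uint8_t low_byte(uint16_t word)
{
    return (uint8_t)(word & 0xFFu);
}
```

For the word B D 3 9 from the slide, `nibble_at(0xBD39, 3)` yields the nibble B and `high_byte(0xBD39)` yields the byte BD.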

Byte Order a.k.a. Endianness

Offset:  00 01
Bytes:   12 34   = 0x3412 (Little Endian)   = 0x1234 (Big Endian)
Bytes:   34 12   = 0x1234 (Little Endian)   = 0x3412 (Big Endian)

Little Endian Dword

Offset:  00 01 02 03
Bytes:   12 34 56 78   = 0x78563412
Bytes:   78 56 34 12   = 0x12345678
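One way to reproduce the dword values above without depending on the host CPU's byte order is to assemble the value from individual bytes. The sketch below assumes the bytes are already in a buffer; `read_le32` and `read_be32` are illustrative names, not functions from any particular library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interpret 4 bytes as a little-endian dword: the byte at offset 0
   is the least significant one. */
uint32_t read_le32(const uint8_t *bytes)
{
    assert(bytes != NULL);
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}

/* Interpret 4 bytes as a big-endian dword: the byte at offset 0
   is the most significant one. */
uint32_t read_be32(const uint8_t *bytes)
{
    assert(bytes != NULL);
    return ((uint32_t)bytes[0] << 24)
         | ((uint32_t)bytes[1] << 16)
         | ((uint32_t)bytes[2] << 8)
         | (uint32_t)bytes[3];
}
```

Reading the bytes 12 34 56 78 with `read_le32` gives 0x78563412, matching the slide, while `read_be32` on the same bytes gives 0x12345678.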

Endianness Matters

• Data exchange between computers
• Networking protocols
• File formats for disk storage
• Mixing endianness

System Endianness

Little Endian:          Intel, most uControllers
Big Endian:             System/370, Sparc (exc. v9)
Switchable Endianness:  PowerPC (exc. G5), ARM, Alpha, IA64

ASCII Code

0x00 - 0x1F   Control Characters               Backspace, Line feed
0x20 - 0x3F   Digits and Punctuation           0-9 <> = .,: *-()!
0x40 - 0x5F   Upper-case Letters and Special   ABCD... @[]\^_
0x60 - 0x7E   Lower-case Letters and Special   abcd... `{}|~

ASCII Example

H  e  l  l  o     1  2  3  4
48 65 6C 6C 6F 20 31 32 33 34

http://en.wikipedia.org/wiki/ASCII
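The hex row in the example can be regenerated from the text itself. The following is a small sketch using only the standard C library; `ascii_to_hex` is a hypothetical helper name chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write the bytes of `text` as space-separated upper-case hex into `out`.
   Returns the number of characters written (excluding the terminator),
   or -1 if `out` is too small. */
int ascii_to_hex(const char *text, char *out, size_t out_size)
{
    assert(text != NULL && out != NULL);

    size_t len = strlen(text);
    /* Each byte takes two hex digits plus a separator; the slot of the
       last separator is used by the terminating zero instead. */
    size_t needed = len ? 3 * len : 1;
    if (out_size < needed)
        return -1;

    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        if (i > 0)
            out[used++] = ' ';
        used += (size_t)snprintf(out + used, out_size - used, "%02X",
                                 (unsigned)(unsigned char)text[i]);
    }
    return (int)used;
}
```

Calling it on the string "Hello 1234" fills the buffer with `48 65 6C 6C 6F 20 31 32 33 34`; note the 20 for the space character.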

Strings

BOM    H     e     l     l     o
ff fe  48 00 65 00 6c 00 6c 00 6f 00

UTF-16 / UCS-2

http://en.wikipedia.org/wiki/UTF-16/UCS-2
http://en.wikipedia.org/wiki/Category:Unicode
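When such a string turns up in a dump, the BOM tells which byte order the 16-bit code units use. The sketch below only handles the BOM and single code units (no surrogate pairs); the type and function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    UTF16_NO_BOM,
    UTF16_LITTLE_ENDIAN,   /* buffer starts with FF FE */
    UTF16_BIG_ENDIAN       /* buffer starts with FE FF */
} Utf16Order;

/* Inspect the first two bytes of a buffer for a UTF-16 byte order mark. */
Utf16Order utf16_detect_bom(const uint8_t *data, size_t size)
{
    assert(data != NULL);
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return UTF16_LITTLE_ENDIAN;
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return UTF16_BIG_ENDIAN;
    return UTF16_NO_BOM;
}

/* Read one 16-bit code unit from two bytes. Anything other than
   UTF16_BIG_ENDIAN is treated as little endian (an assumption of this sketch). */
uint16_t utf16_read_unit(const uint8_t *two_bytes, Utf16Order order)
{
    assert(two_bytes != NULL);
    if (order == UTF16_BIG_ENDIAN)
        return (uint16_t)((two_bytes[0] << 8) | two_bytes[1]);
    return (uint16_t)(two_bytes[0] | (two_bytes[1] << 8));
}
```

With the bytes from the slide, the BOM ff fe is reported as little endian, and the code unit that follows it (48 00) decodes to 0x0048, the letter H.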

String Storage

• ASCIIZ: Zero-terminated ASCII
• Pascal: Size byte + ASCII string
• Delphi: Size Dword + ASCII or Unicode string

           H  e  l  l  o
ASCIIZ:    48 65 6C 6C 6F 00
Pascal: 05 48 65 6C 6C 6F
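The practical difference between the first two formats is how the length is found: ASCIIZ requires scanning for the terminator, while a Pascal string stores the length up front. A minimal sketch follows, with hypothetical helper names; the Delphi layout is left out because the slide does not give its exact byte layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASCIIZ: count bytes until the terminating zero. */
size_t asciiz_length(const uint8_t *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != 0x00)
        n++;
    return n;
}

/* Pascal: the first byte is the length, so no scan is needed. */
size_t pascal_length(const uint8_t *s)
{
    assert(s != NULL);
    return s[0];
}

/* Pascal: the characters start right after the size byte. */
const uint8_t *pascal_chars(const uint8_t *s)
{
    assert(s != NULL);
    return s + 1;
}
```

Both byte sequences from the slide report a length of 5. A single size byte also caps a Pascal string at 255 characters.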

Intel x86 Architecture

Image Copyright © 2004 GNU

Introduction to Intel x86

• Started with 8086 in 1978
• Continued with 8088, 80186, 80286, 386, 486, Pentium, 686 ...
• CISC architecture
• 32-bit is called x86-32 or IA-32
• 64-bit is called x86-64, AMD64, EM64T

• 80386 introduced in 1985
• Has a 32-bit word length
• Has eight general-purpose registers
• Supports paging and virtual memory
• Addresses up to 4GiB of memory

Data Register Layout

Image Copyright © 1997-2008 Intel Corporation

Data Registers

AL / AH / AX / EAX   Accumulator     Arithmetic operations
BL / BH / BX / EBX   Data index      General data storage, index
CL / CH / CX / ECX   Loop counter    Loop constructs
DL / DH / DX / EDX   Data register   Arithmetic
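The 8-bit and 16-bit names are views onto parts of the same 32-bit register. Modelled in C with shifts and masks (a sketch only; the function names are illustrative), the relationship for EAX looks like this, and the same pattern applies to EBX, ECX and EDX:

```c
#include <stdint.h>

/* AX: the low 16 bits of EAX. */
uint16_t reg_ax(uint32_t eax)
{
    return (uint16_t)(eax & 0xFFFFu);
}

/* AL: the low 8 bits of EAX. */
uint8_t reg_al(uint32_t eax)
{
    return (uint8_t)(eax & 0xFFu);
}

/* AH: bits 8..15 of EAX, i.e. the high byte of AX. */
uint8_t reg_ah(uint32_t eax)
{
    return (uint8_t)((eax >> 8) & 0xFFu);
}

/* Writing AL replaces only the low 8 bits and leaves the rest of EAX intact. */
uint32_t set_al(uint32_t eax, uint8_t value)
{
    return (eax & 0xFFFFFF00u) | value;
}
```

For EAX = 0x12345678 this gives AX = 0x5678, AH = 0x56 and AL = 0x78. There is no name for the upper 16 bits on their own.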

Address Registers

IP / EIP Instruction Pointer Program execution

SP / ESP Stack Pointer Stack operation

BP / EBP Base Pointer Stack frame

SI / ESI Source Index String operation

DI / EDI Destination Index String operation

Segment Registers

CS Code Segment Program code

DS Data Segment Program data

ES / FS / GS Other Segments Other uses

EFLAGS Register

Image Copyright © 1997-2008 Intel Corporation

Mnemonic Examples

MOV EAX, 1 Move 1 to EAX

ADD EDX, 5 Add 5 to EDX

SUB EBX, 2 Subtract 2 from EBX

AND ECX, 0 Bit-wise AND 0 to ECX

XOR EDX, 4 Bit-wise eXclusive OR 4 to EDX

SHL ECX, 6 Shift ECX left by six

ROR EBX, 3 Bit-wise rotate EBX right by 3

INC ECX Increment ECX
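Most of these instructions map directly onto C operators (+, -, &, ^, <<, ++). ROR is the exception because C has no rotate operator, so it is usually spelled as two shifts. The sketch below models the 32-bit forms; `shl32` and `ror32` are made-up names, and flag updates are deliberately ignored.

```c
#include <stdint.h>

/* SHL r32, n: shift left. The CPU only uses the low 5 bits of the
   count for 32-bit operands, which the mask reproduces. */
uint32_t shl32(uint32_t value, unsigned count)
{
    count &= 31u;
    return value << count;
}

/* ROR r32, n: rotate right. Bits shifted out on the right re-enter
   on the left. The count == 0 case avoids an undefined 32-bit shift in C. */
uint32_t ror32(uint32_t value, unsigned count)
{
    count &= 31u;
    if (count == 0)
        return value;
    return (value >> count) | (value << (32u - count));
}
```

With these, `shl32(1, 6)` is 64 and `ror32(1, 3)` is 0x20000000, because the single set bit wraps around to the top of the register.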

More Mnemonics

JNZ label Jump if not zero (not equal)

JMP label Unconditional jump to label

CALL func Call function

RET Return from function

LOOP label ECX--, Jump to label if not zero

PUSH EAX Push EAX to stack

POP EDI Pop EDI from stack

LODSB Load byte from DS:ESI to AL
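LODSB and LOOP often appear together in small byte-processing loops. As a rough C rendering of such a loop (an illustrative example, not code from the slides; it assumes the direction flag is clear so that ESI moves forward, and `sum_bytes` is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum `ecx` bytes starting at `esi`, written to mirror this pattern:
 *
 *   next: LODSB          ; AL = [ESI], ESI++
 *         ADD EDX, EAX   ; assuming the upper bits of EAX are zero
 *         LOOP next      ; ECX--, jump if ECX != 0
 */
uint32_t sum_bytes(const uint8_t *esi, uint32_t ecx)
{
    assert(esi != NULL);
    uint32_t edx = 0;

    /* A real LOOP entered with ECX == 0 would wrap around and run
       2^32 times, so this sketch guards against that case. */
    if (ecx == 0)
        return 0;

    do {
        uint8_t al = *esi++;   /* LODSB */
        edx += al;             /* ADD */
    } while (--ecx != 0);      /* LOOP */

    return edx;
}
```

The do/while shape is the giveaway when reversing: LOOP tests the counter at the bottom, after the body has already run once.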

Reversing Code

Image Copyright © 1988, 1978 by Bell Telephone Laboratories, Incorporated

Basic Data Types

char - 1 byte
short - 2 bytes
int - 4 bytes (platform word)
long - 4 bytes
float - 4 bytes floating point
double - 8 bytes floating point

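Arrays and Pointers

• Pointers can point to any memory location
• One-dimensional arrays are flat memory
• Multi-dimensional arrays use pointers

char a[4];        /* A A A A: four bytes of flat memory */
char *b = a, c;   /* b points at the first element of a */
c = a[2];
c = *(b+2);       /* same element as a[2] */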
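The equivalence of indexing and pointer arithmetic is what makes array accesses recognisable in disassembly: both compile to base + index * element size. A small sketch, with hypothetical function names, that also shows how a row and column pair maps onto flat memory for a built-in C two-dimensional array:

```c
#include <assert.h>
#include <stddef.h>

/* Read element i using array indexing. */
char read_by_index(const char *a, size_t i)
{
    assert(a != NULL);
    return a[i];
}

/* Read element i using pointer arithmetic; identical to a[i]. */
char read_by_pointer(const char *b, size_t i)
{
    assert(b != NULL);
    return *(b + i);
}

/* Position of [row][col] inside the flat storage of an array
   declared as T grid[rows][cols]. */
size_t flat_index(size_t row, size_t col, size_t cols)
{
    return row * cols + col;
}
```

For `int grid[2][3]`, element `grid[1][2]` sits at flat position 5, which is the last of the six ints.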

Structures and Unions

Structure

struct {
    unsigned int id;
    unsigned short age;
    char name[16];
} record;

Memory is allocated for all members combined.

sizeof(record) = 24

Union

union foo {
    int one;
    char two;
};

Memory is allocated for the largest member only.

sizeof(foo) = 4
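The union claim is easy to verify with a compiler. The sketch below repeats the `union foo` declaration from the slide and adds two hypothetical helpers; the value 4 assumes a platform where `int` is 4 bytes, as on 32-bit x86.

```c
#include <stddef.h>

union foo {
    int one;
    char two;
};

/* Size of the union: that of its largest member (plus any padding). */
size_t foo_size(void)
{
    return sizeof(union foo);
}

/* Returns 1 when both members start at the same address,
   which is always the case for a union. */
int foo_members_overlap(void)
{
    union foo u;
    return (void *)&u.one == (void *)&u.two;
}
```

Because the members overlap, writing `one` also changes what `two` reads back. Which byte of `one` is seen through `two` depends on the endianness of the machine.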

Structure Alignment

• Data structures are aligned to word size by default
• #pragma pack(n) directive can change it
• #pragma pack(1) removes alignment
• Important when reconstructing structures

Structure Storage

Aligned                    Packed

DWORD id                   DWORD id
WORD age                   WORD age
16 BYTES name              16 BYTES name
2-byte padding (at end)

sizeof(record) = 24        sizeof(record) = 22
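The two sizes can be reproduced with `sizeof` and `offsetof`. This sketch uses fixed-width types in place of DWORD and WORD and relies on `#pragma pack`, which GCC, Clang and MSVC accept; the struct and function names are made up for the example, and the exact numbers are an assumption about typical x86 compilers rather than a guarantee of the C standard.

```c
#include <stddef.h>
#include <stdint.h>

/* Default alignment: the struct is padded to a multiple of 4 bytes. */
struct record_aligned {
    uint32_t id;
    uint16_t age;
    char     name[16];
};

/* Packed to 1-byte alignment: no padding at all. */
#pragma pack(push, 1)
struct record_packed {
    uint32_t id;
    uint16_t age;
    char     name[16];
};
#pragma pack(pop)

/* Bytes of padding the compiler added to the aligned version. */
size_t record_padding(void)
{
    return sizeof(struct record_aligned) - sizeof(struct record_packed);
}
```

On common x86 compilers `name` starts at offset 6 in both layouts, since a char array needs no alignment, and the two padding bytes of the aligned version come after it.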

Simple C Program

int foobar(int x, int y)
{
    int z = x+y;
    return z;
}

int main(void)
{
    int z = foobar(1, 2);
}

Function Calls

• Calling conventions are important to know
• Mixing them will crash the program
• __stdcall - Standard calls on Windows
• __cdecl - Most common C calling convention
• __fastcall - Uses registers for arguments
• __thiscall - Pass ‘this’ pointer in ECX in C++

__cdecl Calls

Caller:
PUSH arg2
PUSH arg1
CALL function
ADD ESP, 8

Function:
PUSH EBP
MOV EBP, ESP
SUB ESP, 4
MOV EAX, [EBP+8]
MOV ESP, EBP
POP EBP
RET

Stack:
ARG2
ARG1
RET Addr.
Saved EBP
LOC1

arg1: EBP+8
arg2: EBP+12
loc1: EBP-4

__stdcall Calls

Caller:
PUSH arg2
PUSH arg1
CALL function

Function:
PUSH EBP
MOV EBP, ESP
SUB ESP, 4
MOV EAX, [EBP+8]
MOV ESP, EBP
POP EBP
RETN 8

Stack:
ARG2
ARG1
RET Addr.
Saved EBP
LOC1

arg1: EBP+8
arg2: EBP+12
loc1: EBP-4
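The EBP-relative offsets follow mechanically from the order of the pushes. The toy model below is a sketch under simplifying assumptions: a 64-byte stack held in an array, arguments pushed right to left, a placeholder return address, and a local that holds arg1 + arg2. All names and constants in it are invented for illustration. It walks through the call and prologue and then reads a dword at a given offset from EBP.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { STACK_BYTES = 64 };
#define FAKE_RETURN_ADDRESS 0x00401000u   /* placeholder value, not a real address */

typedef struct {
    uint32_t slots[STACK_BYTES / 4];   /* stack memory, one dword per slot */
    uint32_t esp;                      /* byte offset of the top of the stack */
    uint32_t ebp;                      /* byte offset of the frame base */
} ToyCpu;

/* PUSH: the stack grows towards lower addresses. */
static void push32(ToyCpu *cpu, uint32_t value)
{
    assert(cpu->esp >= 4);
    cpu->esp -= 4;
    cpu->slots[cpu->esp / 4] = value;
}

/* Build the frame for function(arg1, arg2) and return the dword at EBP+offset. */
uint32_t frame_read(uint32_t arg1, uint32_t arg2, int32_t offset)
{
    ToyCpu cpu;
    memset(&cpu, 0, sizeof cpu);
    cpu.esp = STACK_BYTES;

    push32(&cpu, arg2);                    /* PUSH arg2 */
    push32(&cpu, arg1);                    /* PUSH arg1 */
    push32(&cpu, FAKE_RETURN_ADDRESS);     /* CALL function */
    push32(&cpu, cpu.ebp);                 /* PUSH EBP */
    cpu.ebp = cpu.esp;                     /* MOV EBP, ESP */
    cpu.esp -= 4;                          /* SUB ESP, 4 */
    cpu.slots[cpu.esp / 4] = arg1 + arg2;  /* store the local at EBP-4 */

    /* Unsigned wrap-around makes negative offsets work as expected. */
    uint32_t address = cpu.ebp + (uint32_t)offset;
    assert(address % 4 == 0 && address < STACK_BYTES);
    return cpu.slots[address / 4];
}
```

Reading at offsets 8 and 12 returns arg1 and arg2, offset 4 returns the return address, and offset -4 returns the local. The frame layout is the same for both conventions; they differ only in who removes the arguments afterwards (ADD ESP, 8 in the caller versus RETN 8 in the function).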

Further Reading

Intel Documentation
http://www.intel.com/products/processor/manuals/index.htm

Netwide Assembler Mnemonic Documentation
http://sourceforge.net/docman/display_doc.php?docid=47259&group_id=6208

The Art of Assembly Language Programming, Windows 32-bit Edition
http://webster.cs.ucr.edu/AoA/index.html
