
Binary-level program analysis: Assembly basics

Gang Tan CSE 597 Spring 2019 Penn State University

Source Code, Assembly Code, Object Code, and Executable Code

Source code → (compiler) → Assembly code → (assembler) → Object code

• Then a linker links the object code of different compilation units (files, libraries) into executable code
• Assembly code
  – Consists of assembly instructions
  – Specific to a particular architecture (x86, x64, ARM, SPARC, etc.)
• Object code
  – Consists of encodings of assembly instructions in bytes
• Executable code
  – AKA binaries
  – In a particular format (e.g., ELF or PE)
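As a small illustration of "a particular format", the C sketch below tells ELF from PE by the magic bytes each format places at the start of a file. An ELF file begins with 0x7F 'E' 'L' 'F'. A PE file begins with an MS-DOS "MZ" header whose 32-bit little-endian field at offset 0x3C points to a "PE\0\0" signature. The function name exe_format is hypothetical, and the sketch checks only these signatures. It is not a full header validator.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Classify a file image by its leading "magic" bytes.
 * Returns "ELF", "PE", or "unknown". Only signatures are checked. */
const char *exe_format(const unsigned char *buf, size_t len) {
    /* "\x7f" "ELF" is split so that 'E' is not parsed as part of the hex escape. */
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        return "ELF";
    if (len >= 0x40 && buf[0] == 'M' && buf[1] == 'Z') {
        /* e_lfanew: little-endian 32-bit offset of the PE signature. */
        uint32_t pe_off = (uint32_t)buf[0x3C]
                        | (uint32_t)buf[0x3D] << 8
                        | (uint32_t)buf[0x3E] << 16
                        | (uint32_t)buf[0x3F] << 24;
        if (pe_off <= len - 4 && memcmp(buf + pe_off, "PE\0\0", 4) == 0)
            return "PE";
    }
    return "unknown";
}
```

To classify a real file, read its first few hundred bytes into a buffer and pass them in. A loader or analysis tool would go on to parse and validate the rest of the header.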

Example Source Code: hello.c

#include <stdio.h>

int main() {
    printf("Hello, World!");
    return 0;
}

Example Assembly Code: after “gcc -S -o hello.s hello.c”

    .file   "hello.c"
    .section    .rodata
.LC0:
    .string "Hello, World!"
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $.LC0, %eax
    movq    %rax, %rdi
    movl    $0, %eax
    call    printf
    movl    $0, %eax
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (GNU) 4.4.7 20120313 (Red Hat 4.4.7-23)"
    .section    .note.GNU-stack,"",@progbits

Example Executable Code: after “gcc -o hello hello.c”, run “objdump -s ./hello” to display the full contents of its sections in hex
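The output of objdump -s is the raw section bytes printed in hex next to their ASCII rendering. The sketch below formats one 16-byte line in roughly that layout: an address, four 4-byte groups of hex, and printable characters, with '.' for the rest. The function dump_line and the DUMP_LINE_MAX buffer size are hypothetical, and objdump's exact column widths may differ.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Large enough for a 16-hex-digit address, 32 hex digits, separators,
 * 16 ASCII characters, and the terminating NUL. */
#define DUMP_LINE_MAX 80

/* Format up to 16 bytes at p, located at address addr, into out. */
void dump_line(unsigned long addr, const unsigned char *p, size_t n,
               char out[DUMP_LINE_MAX]) {
    int pos = sprintf(out, " %04lx ", addr);
    for (size_t i = 0; i < 16; i++) {
        if (i < n)
            pos += sprintf(out + pos, "%02x", p[i]);
        else
            pos += sprintf(out + pos, "  ");  /* pad short final lines */
        if (i % 4 == 3)
            out[pos++] = ' ';                 /* space after each 4-byte group */
    }
    out[pos++] = ' ';
    for (size_t i = 0; i < n && i < 16; i++)
        out[pos++] = isprint(p[i]) ? (char)p[i] : '.';
    out[pos] = '\0';
}
```

Applied to the 16 bytes starting at the "Hello, World!" string in .rodata, this prints the string's hex codes beside the readable text. This is how the string literal from hello.c shows up in the dump.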

Binary Code Analysis

• Refers to analyzing assembly or executable code
• If given executable code
  – Step 1: disassemble it into assembly code
  – Step 2: analyze the assembly code
• The disassembly step may be hard or easy
  – Depending on whether meta information is embedded in the executable code
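Step 1 can be sketched as a toy linear-sweep disassembler: decode one instruction, advance by its length, and repeat. The hypothetical C code below recognizes only the opcodes that appear in main() of hello.s. The bytes used to exercise it are the standard x86-64 encodings of those instructions, but the addresses of .LC0 and printf are made up. Linear sweep fails in the situations where disassembly gets hard. Without meta information, the starting point of the code is unknown, and data mixed into code can derail the sweep.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Read a little-endian 32-bit value. */
static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode one instruction at p (located at address addr) into AT&T syntax.
 * Returns its length in bytes, or 0 for any opcode this toy does not know.
 * A real x86-64 decoder must handle prefixes, ModRM/SIB and many more opcodes. */
size_t decode_one(const uint8_t *p, size_t len, uint64_t addr,
                  char *out, size_t outlen) {
    static const char *reg64[8] = {"rax", "rcx", "rdx", "rbx",
                                   "rsp", "rbp", "rsi", "rdi"};
    if (len == 0) return 0;
    switch (p[0]) {
    case 0x55: snprintf(out, outlen, "pushq %%rbp"); return 1;
    case 0xc9: snprintf(out, outlen, "leave");       return 1;
    case 0xc3: snprintf(out, outlen, "ret");         return 1;
    case 0x48: /* REX.W prefix; only "89 /r" (MOV r/m64, r64) is handled */
        if (len < 3 || p[1] != 0x89 || (p[2] >> 6) != 3) return 0;
        /* Register-direct ModRM: reg field = source, rm field = destination. */
        snprintf(out, outlen, "movq %%%s, %%%s",
                 reg64[(p[2] >> 3) & 7], reg64[p[2] & 7]);
        return 3;
    case 0xb8: /* MOV imm32 into %eax */
        if (len < 5) return 0;
        snprintf(out, outlen, "movl $0x%" PRIx32 ", %%eax", le32(p + 1));
        return 5;
    case 0xe8: { /* CALL rel32: target = address of next instruction + rel32 */
        if (len < 5) return 0;
        uint32_t raw = le32(p + 1);
        int64_t rel = raw >= 0x80000000u ? (int64_t)raw - 0x100000000LL
                                         : (int64_t)raw;
        snprintf(out, outlen, "call 0x%" PRIx64,
                 (uint64_t)((int64_t)addr + 5 + rel));
        return 5;
    }
    default:
        return 0;
    }
}

/* Linear sweep: decode back to back from the start of code, printing each
 * instruction; stop at the end or at the first unknown byte.
 * Returns the number of instructions decoded. */
size_t linear_sweep(const uint8_t *code, size_t len, uint64_t base) {
    char text[64];
    size_t off = 0, count = 0;
    while (off < len) {
        size_t n = decode_one(code + off, len - off, base + off, text, sizeof text);
        if (n == 0) break;
        printf("%8" PRIx64 ":\t%s\n", base + off, text);
        off += n;
        count++;
    }
    return count;
}
```

For example, sweeping the 29 bytes 55, 48 89 e5, b8 e8 05 40 00, 48 89 c7, b8 00 00 00 00, e8 fb fe ff ff, b8 00 00 00 00, c9, c3 from a hypothetical base of 0x4004f4 recovers the nine instructions of main(), from pushq %rbp to ret.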

Meta information in Executable Code

• During compilation, meta information can be embedded into executable code
• Meta information: symbol tables
  – Information about symbols (e.g., function and variable names) from the source code
  – Each entry contains
    • The symbol name
    • The address the symbol is bound to
    • The type of the symbol
    • Misc. info
  – Symbol tables are consumed by linkers and debuggers

objdump --syms ./hello
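objdump --syms reads the .symtab section and the string table linked to it. Below is a minimal sketch of that lookup. It assumes a 64-bit ELF image already loaded into memory in the host's byte order, and it uses the Elf64_* definitions from Linux's <elf.h>. The function elf64_find_symbol is hypothetical. Each Elf64_Sym entry holds the parts of a symbol-table entry listed earlier: the name (as an offset into the string table), the value (the bound address), and the symbol type packed into st_info.

```c
#include <elf.h>     /* Elf64_* types and constants (Linux / glibc) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Find `name` in the .symtab of an in-memory ELF64 image. On success, store
 * the symbol's value and type (STT_FUNC, STT_OBJECT, ...) and return 1.
 * Return 0 if the symbol is absent or there is no .symtab (e.g., stripped). */
int elf64_find_symbol(const unsigned char *img, size_t len, const char *name,
                      uint64_t *value, unsigned *type) {
    Elf64_Ehdr eh;
    if (len < sizeof eh || memcmp(img, ELFMAG, SELFMAG) != 0 ||
        img[EI_CLASS] != ELFCLASS64)
        return 0;
    memcpy(&eh, img, sizeof eh);  /* memcpy avoids alignment/aliasing issues */
    if (eh.e_shentsize != sizeof(Elf64_Shdr) || eh.e_shoff > len ||
        (size_t)eh.e_shnum * sizeof(Elf64_Shdr) > len - eh.e_shoff)
        return 0;

    for (size_t i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr symsh, strsh;
        memcpy(&symsh, img + eh.e_shoff + i * sizeof symsh, sizeof symsh);
        if (symsh.sh_type != SHT_SYMTAB || symsh.sh_link >= eh.e_shnum)
            continue;
        /* sh_link gives the index of the string table holding symbol names. */
        memcpy(&strsh, img + eh.e_shoff + symsh.sh_link * sizeof strsh,
               sizeof strsh);
        if (symsh.sh_offset > len || symsh.sh_size > len - symsh.sh_offset ||
            strsh.sh_offset > len || strsh.sh_size > len - strsh.sh_offset)
            return 0;
        const char *strtab = (const char *)img + strsh.sh_offset;
        for (size_t j = 0; j < symsh.sh_size / sizeof(Elf64_Sym); j++) {
            Elf64_Sym sym;
            memcpy(&sym, img + symsh.sh_offset + j * sizeof sym, sizeof sym);
            if (sym.st_name >= strsh.sh_size ||
                !memchr(strtab + sym.st_name, '\0', strsh.sh_size - sym.st_name))
                continue;  /* name offset out of range or unterminated */
            if (strcmp(strtab + sym.st_name, name) == 0) {
                *value = sym.st_value;
                *type = ELF64_ST_TYPE(sym.st_info);
                return 1;
            }
        }
    }
    return 0;
}
```

On a stripped binary the .symtab section is gone, so this lookup returns 0 for every name.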

Meta information in Executable Code

• Relocation information
  – Before linking, the memory addresses of functions and global variables are unknown
  – The assembler therefore generates relocation entries that mark the places to be patched
  – Static/dynamic linkers patch the program during linking
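Here is a concrete case of relocation. In hello.s, `call printf` is assembled to the opcode e8 followed by a 32-bit displacement. The assembler cannot fill in that displacement yet, so it records a relocation entry instead. The x86-64 System V ABI defines the R_X86_64_PC32 relocation as S + A - P, where S is the symbol address, A is the addend, and P is the address of the patched field. The sketch below applies that formula. The function apply_pc32 and the addresses in the example are hypothetical. Newer toolchains emit R_X86_64_PLT32 for calls. That relocation patches the field the same way, except that the address of the PLT entry takes the place of S.

```c
#include <stdint.h>

/* Apply an R_X86_64_PC32-style relocation: write the 32-bit value
 * S + A - P, little-endian, into the 4-byte field at `field`, where
 *   S = address of the referenced symbol,
 *   A = addend from the relocation entry,
 *   P = address of the field being patched.
 * Returns 0 on success, -1 if the value does not fit in 32 signed bits. */
int apply_pc32(uint8_t *field, uint64_t S, int64_t A, uint64_t P) {
    int64_t v = (int64_t)S + A - (int64_t)P;
    if (v < INT32_MIN || v > INT32_MAX)
        return -1;  /* target too far away for a 32-bit displacement */
    uint32_t u = (uint32_t)v;  /* conversion is modulo 2^32 */
    field[0] = (uint8_t)(u & 0xff);
    field[1] = (uint8_t)((u >> 8) & 0xff);
    field[2] = (uint8_t)((u >> 16) & 0xff);
    field[3] = (uint8_t)((u >> 24) & 0xff);
    return 0;
}
```

With the call at 0x400500 and printf at 0x400400, the field at 0x400501 receives -0x105 (bytes fb fe ff ff). At run time the CPU adds this value to the address of the next instruction, 0x400505, and reaches 0x400400. The addend -4 compensates for the field sitting 4 bytes before the next instruction.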

Meta information in Executable Code

• Debugging information
  – Generated by the compiler and consumed by debuggers (e.g., gdb)
  – During debugging, the debugger uses debugging info to relate binary code to source code
    • E.g., this instruction was generated from this source code line
  – Includes
    • Source code info: types and scopes of identifiers
    • Line-number info: to relate binary code to source code
    • Other info, such as location descriptions
  – Debugging info formats: DWARF and STABS
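You can picture the line-number info as a table sorted by address, where each row names the source line that produced the code starting at that address. DWARF stores this table in .debug_line as a compact byte-coded program, and a debugger first runs that program to rebuild the rows. The sketch below assumes the decoding has already been done. The struct line_row, the function addr_to_line, and the addresses and line numbers in the example are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of a simplified, already-decoded line-number table: code from
 * `addr` up to the next row's addr came from source line `line`. */
struct line_row {
    uint64_t addr;
    int line;
};

/* Binary search for the last row whose addr <= pc (rows sorted by addr).
 * Returns that row's source line, or -1 if pc precedes the first row. */
int addr_to_line(const struct line_row *rows, size_t n, uint64_t pc) {
    size_t lo = 0, hi = n;
    while (lo < hi) {  /* find the number of rows with addr <= pc */
        size_t mid = lo + (hi - lo) / 2;
        if (rows[mid].addr <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? -1 : rows[lo - 1].line;
}
```

This sketch ignores DWARF's end-of-sequence markers, so it attributes any address past the last row to the last line.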

Stripped versus unstripped binaries

• Stripped binaries
  – Pure binary code; no meta information
  – Disassembly is hard (we do not even know where functions start)
• Unstripped binaries
  – Binary code plus meta information
  – Disassembly is much easier
• Why strip binaries?
  – Meta information occupies space
  – Stripped binaries are harder to reverse engineer, which helps protect intellectual property
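When there is no symbol table, tools must guess where functions start. One common and fragile heuristic works on x86-64 code built with frame pointers, as hello.s was. It looks for the prologue pushq %rbp; movq %rsp, %rbp, which is encoded as the bytes 55 48 89 e5. The sketch below scans raw bytes for that pattern. The function find_prologues is hypothetical. This heuristic misses functions compiled without frame pointers (for example, with -fomit-frame-pointer). It can also report false positives when the same bytes occur inside other instructions or inside data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scan code for the frame-setup prologue 55 48 89 e5
 * ("pushq %rbp; movq %rsp, %rbp"). Stores up to max candidate
 * function-start offsets in starts[] and returns how many were stored. */
size_t find_prologues(const uint8_t *code, size_t len,
                      size_t *starts, size_t max) {
    static const uint8_t pat[4] = {0x55, 0x48, 0x89, 0xe5};
    size_t found = 0;
    for (size_t i = 0; i + 4 <= len && found < max; i++)
        if (memcmp(code + i, pat, sizeof pat) == 0)
            starts[found++] = i;
    return found;
}
```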

Next: IA32 and Reverse Engineering basics

• NSA tutorial on reverse engineering
  – https://codebreaker.ltsnet.net/resources
  – Introduction to x86 Assembly
  – Reverse Engineering Machine Code Pt. 1
  – Reverse Engineering Machine Code Pt. 2
