Analysis of X86 Application and System Programs Via Machine-Code Verification
Total Page:16
File Type:pdf, Size:1020Kb
Analysis of x86 Application and System Programs via Machine-Code Verification Shilpi Goel The University of Texas at Austin Ph.D. Dissertation Proposal 29th April, 2015 Outline ๏ Motivation ๏ State of the Art ๏ Proposed Dissertation Project ‣ [Task 1] Developing an x86 ISA Model ‣ [Task 2] Building a Machine-Code Analysis Framework ‣ [Task 3] Verifying Application and System Programs ๏ Future Work ๏ Expected Contributions 2 /33 Motivation • Software systems are ubiquitous. • Cost of incorrect software is extremely high. • Formal verification can increase software quality. • Approach: Machine-code verification for x86 platforms Motivation 3 /33 Motivation • Why not high-level code verification? X High-level verification frameworks do not address compiler bugs ✓ Verified/verifying compilers can help X But these compilers typically generate inefficient code X Need to build verification frameworks for many high-level languages X Sometimes, high-level code is unavailable • Why x86? ✓ x86 is in widespread use — our approach will have immediate practical application Motivation 4 /33 State of the Art: x86 Machine-Code Analysis [Morrisett et al.] Formal x86 ISA Security policy violation checks in specification browsers [Reps et al.] Control & data flow Point tools; approximation of analysis program behavior [Myreen] Decompilation-in-Logic Analysis of application programs [Moore] Codewalker [Klein et al.] SeL4 Clean-slate development of [Shao et al.] Certified OS kernels verified software State of the Art 5 /33 State of the Art: x86 Machine-Code Analysis We need more! Security policy violation checks in General analysis browsers Precisely capture Point tools; approximation of program behavior program behavior Analyze application Analysis of application programs and system programs Verify existing Clean-slate development of software verified software State of the Art 5 /33 Example: Analysis of a Data-Copy Program Specification: Copy data x from linear (virtual) memory location l0 to disjoint linear memory location l1. l0 x Verification Objective: After a successful copy, l0 and l1 contain x. p x Implementation: l1 x Include the copy-on-write technique: l0 and l1 can be mapped to the same physical memory location p. Linear Physical ‣ System calls Memory Memory ‣ Modifications to address mapping ‣ Access control management Example 6 /33 Proposed Dissertation Project • Goal: Build robust tools to increase software reliability ‣ Verify critical properties of application and system programs ‣ Correctness with respect to behavior, security, & resource usage • Plan of Action: 1. Build a formal, executable x86 ISA model using ACL2, 2. Develop a machine-code analysis framework based on this model, and 3. Employ this framework to verify application and system programs. Proposed Dissertation Project 7 /33 Expected Contributions Briefly: • A new tool: general-purpose analysis framework for x86 machine-code • Program verification taking memory management into account: analysis of programs, including low-level system & ISA features • Reasoning strategies: insight into low-level code verification in general • Foundation for future research: target for verified/verifying compilers Expected Contributions 8 /33 Outline ๏ Motivation ๏ State of the Art ๏ Proposed Dissertation Project ‣ [Task 1] Developing an x86 ISA Model ‣ [Task 2] Building a Machine-Code Analysis Framework ‣ [Task 3] Verifying Application and System Programs ๏ Future Work ๏ Expected Contributions 9 /33 Model Development Obtaining the x86 ISA Specification Task 1 | x86 ISA Model | Model Development 10/33 Model Development Obtaining the x86 ISA Specification ~3400 pages Task 1 | x86 ISA Model | Model Development 10/33 Model Development Obtaining the x86 ISA Specification All AMD manuals: ~3000 pages ~3400 pages Task 1 | x86 ISA Model | Model Development 10/33 Model Development Obtaining the x86 ISA Specification __asm__ volatile ("stc\n\t" // Set CF. "mov $0, %%eax\n\t" // Set EAX = 0. "mov $0, %%ebx\n\t" // Set EBX = 0. "mov $0, %%ecx\n\t" // Set ECX = 0. "mov %4, %%ecx\n\t" // Set CL = rotate_by. "mov %3, %%edx\n\t" // Set EDX = old_cf = 1. All AMD manuals: ~3000 pages ~3400 "mov pages %2, %%eax\n\t" // Set EAX = num. "rcl %%cl, %%al\n\t" // Rotate AL by CL. "cmovb %%edx, %%ebx\n\t" // Set EBX = old_cf if CF = 1. // Otherwise, EBX = 0. "mov %%eax, %0\n\t" // Set res = EAX. "mov %%ebx, %1\n\t" // Set cf = EBX. : "=g"(res), "=g"(cf) : "g"(num), "g"(old_cf), "g"(rotate_by) : "rax", "rbx", "rcx", "rdx"); Task 1 | x86 ISA Model | Model Development Running tests on x86 machines 10/33 Model Development Focus: 64-bit sub-mode of Intel’s IA-32e mode Task 1 | x86 ISA Model | Model Development 11/33 BASIC EXECUTION ENVIRONMENT • Debug registers —Model Debug registers expand Development to 64 bits. See Chapter 17, “Debug, Branch Profile, TSC, and Quality of Service,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A. • Descriptor table registers — The global descriptor table register (GDTR) and interrupt descriptor table register (IDTR) expand to 10 bytes so that they can hold a full 64-bit base address. The local descriptor table Focus:register (LDTR) 64-bitand the task register sub-mode (TR) also expand to of hold aIntel’s full 64-bit base IA-32e address. mode Basic Program Execution Registers Address Space 2^64 -1 Sixteen 64-bit General-Purpose Registers Registers Six 16-bit Segment Registers Registers 64-bits RFLAGS Register 64-bits RIP (Instruction Pointer Register) FPU Registers Eight 80-bit Floating-Point Registers Data Registers 0 16 bits Control Register 16 bits Status Register 16 bits Tag Register Opcode Register (11-bits) 64 bits FPU Instruction Pointer Register 64 bits FPU Data (Operand) Pointer Register MMX Registers Eight 64-bit MMX Registers Registers XMM Registers Sixteen 128-bit Registers XMM Registers 32-bits MXCSR Register Figure 3-2. 64-Bit Mode Execution Environment Task 1 | x86 ISA Model | Model Development 11/33 Source: Intel Manuals 3.3 MEMORY ORGANIZATION The memory that the processor addresses on its bus is called physical memory. Physical memory is organized as a sequence of 8-bit bytes. Each byte is assigned a unique address, called a physical address. The physical address space ranges from zero to a maximum of 236 − 1 (64 GBytes) if the processor does not support Intel Vol. 1 3-5 BASIC EXECUTION ENVIRONMENT • Debug registers —Model Debug registers expand Development to 64 bits. See Chapter 17, “Debug, Branch Profile, TSC, and Quality of Service,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A. SYSTEM ARCHITECTURE OVERVIEW • Descriptor table registers — The global descriptor table register (GDTR) and interrupt descriptor table register (IDTR) expand to 10 bytes so that they can hold a full 64-bit base address. The local descriptor table register (LDTR) and the task register (TR) also expand to hold a full 64-bit base address. Focus: 64-bit sub-mode of Intel’sRFLAGS IA-32e mode Physical Address Code, Data or Stack Control Register Segment (Base =0) Basic Program Execution Registers Address Space Linear Address CR8 Task-State 2^64 -1 CR4 Segment Selector Segment (TSS) CR3 Sixteen 64-bit General-Purpose Registers CR2 Register Registers CR1 CR0 Global Descriptor Task Register Table (GDT) Six 16-bit Interrupt Handler Segment Registers Segment Sel. Seg. Desc. Registers Code NULL Interrupt TR TSS Desc. Stack 64-bits RFLAGS Register Vector Seg. Desc. Interrupt Descriptor 64-bits RIP (Instruction Pointer Register) Table (IDT) Seg. Desc. Interr. Handler Code Interrupt Gate Current TSS FPU Registers LDT Desc. Stack Interrupt Gate GDTR Eight 80-bit Floating-Point IST Data Registers Trap Gate Registers 0 Local Descriptor Exception Handler Table (LDT) Code NULL Stack IDTR Call-Gate Seg. Desc. 16 bits Control Register Segment Selector Call Gate 16 bits Status Register Protected Procedure 16 bits Tag Register XCR0 (XFEM) Code LDTR NULL Stack Opcode Register (11-bits) 64 bits FPU Instruction Pointer Register Linear Address Space Linear Address 64 bits FPU Data (Operand) Pointer Register PML4 Dir. Pointer Directory Table Offset MMX Registers Linear Addr. PML4 Pg. Dir. Ptr. Page Dir. Page Table Page Eight 64-bit MMX Registers Registers Physical PML4. Pg. Dir. Page Tbl Addr. Entry Entry Entry XMM Registers 0 This page mapping example is for 4-KByte pages CR3* and 40-bit physical address size. *Physical Address Sixteen 128-bit XMM Registers Registers Figure 2-2. System-Level Registers and Data Structures in IA-32e Mode 32-bits MXCSR Register 2.1.1 Global and Local Descriptor Tables When operating in protected mode, all memory accesses pass through either the global descriptor table (GDT) or an optional local descriptor table (LDT) as shown in Figure 2-1. These tables contain entries called segment Figure 3-2. 64-Bit Mode Executiondescriptors. Environment Segment descriptors provide the base address of segments well as access rights, type, and usage information. Task 1 | x86 ISA Model | Model Development Each segment descriptor has an associated segment selector. A segment selector provides the software11 that/33 uses Source: Intel itManuals with an index into the GDT or LDT (the offset of its associated segment descriptor), a global/local flag (deter- 3.3 MEMORY ORGANIZATION mines whether the selector points to the GDT or the LDT), and access rights information. The memory that the processor addresses on its bus is called physical memory. Physical memory is organized as a sequence of 8-bit bytes. Each byte is assigned a unique address, called a physical address. The physical address space ranges from zero to a maximum of 236 − 1 (64 GBytes) if the processor does not support Intel Vol. 3A 2-3 Vol. 1 3-5 PROTECTION — TSS — The DPL indicates the numerically highest privilege level that the currently executing program or task can be at and still be able to access the TSS. (This is the same access rule as for a data segment.) • Requested privilege level (RPL) — The RPL is an override privilege level that is assigned to segmentPROTECTION selectors.