Analysis of Application and System Programs via Machine-Code Verification

Shilpi Goel, Warren A. Hunt, Jr., and Matt Kaufmann

Department of Computer Science The University of Texas at Austin 1 University Way, M/S C0500 Austin, TX 78712-0233

[email protected] TEL: +1 512 471 9748 FAX: +1 512 471 8885

April, 2016 Introduction

Motivation: Increase the reliability of programs used in the industry

Approach: Machine-code verification for x86 platforms -We are developing a formal x86 model in ACL2 for code analysis. -We are vetting our tools on commercial-sized problems.

Today, we talk about our ongoing work in formal verification of software, and present our plans to verify supervisor-level code in the immediate future.

Objective: Emulate an , like FreeBSD, along with the programs running on it, and prove properties about kernel code

Introduction 2 /27 Ecosystem

Our group has significant collaboration with the government and industry.

Our own research includes: - Development of core technologies - Application of these technologies in different domains - Validation of commercial processor designs at Centaur and Oracle (10+ developers, 30+ users)

Introduction 3 /27 Project Overview

Goal: Build robust tools to increase software reliability ‣ Verify critical properties of application and system programs ‣ Correctness with respect to behavior, security, & resource usage

Plan of Action: 1. Build a formal, executable x86 ISA model using 2. Develop a machine-code analysis framework based on this model 3. Employ this framework to verify application and system programs

Introduction 4 /27 Contributions

A new tool: General-purpose analysis framework for x86 machine-code ‣ Accurate x86 ISA reference

Program verification taking memory management into account: ‣ Properties of x86 memory-management data structures ‣ Analysis of programs, including low-level system & ISA features

Reasoning strategies: Insight into low-level code verification in general ‣ Build effective lemma libraries

Foundation for future research: ‣ Target for verified/verifying compilers ‣ Resource usage guarantees ‣ Information-flow analysis ‣ Ensuring process isolation

Introduction 5 /27 Outline

๏ Motivation

๏ Project Description

➡ [1] Developing an x86 ISA Model

➡ [2] Building a Machine-Code Analysis Framework

➡ [3] Verifying Application and System Programs

๏ Future Work & Conclusion

๏ Accessing Source Code + Documentation

6 /27 Model Development

Obtaining the x86 ISA Specification

Model Development 7 /27 Model Development

Obtaining the x86 ISA Specification

~3400 pages

Model Development 7 /27 Model Development

Obtaining the x86 ISA Specification

All AMD manuals: ~3000 pages ~3400 pages

Model Development 7 /27 Model Development

Obtaining the x86 ISA Specification

__asm__ volatile ("stc\n\t" // Set CF. "mov $0, %%eax\n\t" // Set EAX = 0. "mov $0, %%ebx\n\t" // Set EBX = 0. "mov $0, %%ecx\n\t" // Set ECX = 0. "mov %4, %%ecx\n\t" // Set CL = rotate_by. "mov %3, %%edx\n\t" // Set EDX = old_cf = 1. All AMD manuals: ~3000 pages ~3400 "mov pages %2, %%eax\n\t" // Set EAX = num. "rcl %%cl, %%al\n\t" // Rotate AL by CL. "cmovb %%edx, %%ebx\n\t" // Set EBX = old_cf if CF = 1. // Otherwise, EBX = 0. "mov %%eax, %0\n\t" // Set res = EAX. "mov %%ebx, %1\n\t" // Set cf = EBX.

: "=g"(res), "=g"(cf) : "g"(num), "g"(old_cf), "g"(rotate_by) : "rax", "rbx", "rcx", "rdx");

Model Development Running tests on x86 machines 7 /27 Model Development

Focus: 64-bit sub-mode of Intel’s IA-32e mode

Model Development 8 /27 BASIC EXECUTION ENVIRONMENT

• Debug registers —Model Debug registers expand Development to 64 bits. See Chapter 17, “Debug, Branch Profile, TSC, and Quality of Service,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A. • Descriptor table registers — The register (GDTR) and descriptor table register (IDTR) expand to 10 bytes so that they can hold a full 64-bit base address. The local descriptor table Focus:register (LDTR) 64-bitand the task register sub-mode (TR) also expand to of hold aIntel’s full 64-bit base IA-32eaddress. mode

Basic Program Execution Registers Address Space 2^64 -1 Sixteen 64-bit General-Purpose Registers Registers

Six 16-bit Segment Registers Registers

64-bits RFLAGS Register

64-bits RIP (Instruction Pointer Register)

FPU Registers

Eight 80-bit Floating-Point Registers Data Registers 0

16 bits Control Register 16 bits Status Register 16 bits Tag Register Opcode Register (11-bits) 64 bits FPU Instruction Pointer Register 64 bits FPU Data (Operand) Pointer Register

MMX Registers

Eight 64-bit MMX Registers Registers

XMM Registers

Sixteen 128-bit Registers XMM Registers

32-bits MXCSR Register

Figure 3-2. 64-Bit Mode Execution Environment Model Development 8 /27 Source: Intel Manuals 3.3 MEMORY ORGANIZATION The memory that the processor addresses on its bus is called physical memory. Physical memory is organized as a sequence of 8-bit bytes. Each byte is assigned a unique address, called a physical address. The physical address space ranges from zero to a maximum of 236 − 1 (64 GBytes) if the processor does not support Intel

Vol. 1 3-5 BASIC EXECUTION ENVIRONMENT

• Debug registers —Model Debug registers expand Development to 64 bits. See Chapter 17, “Debug, Branch Profile, TSC, and Quality of Service,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A. SYSTEM ARCHITECTURE OVERVIEW • Descriptor table registers — The global descriptor table register (GDTR) and interrupt descriptor table register (IDTR) expand to 10 bytes so that they can hold a full 64-bit base address. The local descriptor table register (LDTR) and the task register (TR) also expand to hold a full 64-bit base address. Focus: 64-bit sub-mode of Intel’sRFLAGS IA-32e mode Physical Address Code, Data or Stack Control Register Segment (Base =0) Basic Program Execution Registers Address Space Linear Address CR8 Task-State 2^64 -1 CR4 Segment Selector Segment (TSS) CR3 Sixteen 64-bit General-Purpose Registers CR2 Register Registers CR1 CR0 Global Descriptor Task Register Table (GDT)

Six 16-bit Interrupt Handler Segment Registers Segment Sel. Seg. Desc. Registers Code NULL Interrupt TR TSS Desc. Stack 64-bits RFLAGS Register Vector Seg. Desc. Interrupt Descriptor 64-bits RIP (Instruction Pointer Register) Table (IDT) Seg. Desc. Interr. Handler Code Interrupt Gate Current TSS FPU Registers LDT Desc. Stack Interrupt Gate GDTR Eight 80-bit Floating-Point IST Data Registers Trap Gate Registers 0 Local Descriptor Exception Handler Table (LDT) Code NULL Stack IDTR Call-Gate Seg. Desc. 16 bits Control Register Segment Selector Call Gate 16 bits Status Register Protected Procedure 16 bits Tag Register XCR0 (XFEM) Code LDTR NULL Stack Opcode Register (11-bits) 64 bits FPU Instruction Pointer Register Linear Address Space Linear Address 64 bits FPU Data (Operand) Pointer Register PML4 Dir. Pointer Directory Table Offset MMX Registers Linear Addr. PML4 Pg. Dir. Ptr. Page Dir. Page Table Page Eight 64-bit MMX Registers Registers Physical PML4. Pg. Dir. Page Tbl Addr. Entry Entry Entry

XMM Registers 0 This page mapping example is for 4-KByte pages CR3* and 40-bit physical address size. *Physical Address Sixteen 128-bit XMM Registers Registers Figure 2-2. System-Level Registers and Data Structures in IA-32e Mode

32-bits MXCSR Register 2.1.1 Global and Local Descriptor Tables When operating in , all memory accesses pass through either the global descriptor table (GDT) or an optional local descriptor table (LDT) as shown in Figure 2-1. These tables contain entries called segment Figure 3-2. 64-Bit Mode Executiondescriptors. Environment Segment descriptors provide the base address of segments well as access rights, type, and usage information. Model Development Each segment descriptor has an associated segment selector. A segment selector provides the software8 that/27 uses Source: Intel itManuals with an index into the GDT or LDT (the offset of its associated segment descriptor), a global/local flag (deter- 3.3 MEMORY ORGANIZATION mines whether the selector points to the GDT or the LDT), and access rights information. The memory that the processor addresses on its bus is called physical memory. Physical memory is organized as a sequence of 8-bit bytes. Each byte is assigned a unique address, called a physical address. The physical address space ranges from zero to a maximum of 236 − 1 (64 GBytes) if the processor does not support Intel Vol. 3A 2-3

Vol. 1 3-5 PROTECTION

— TSS — The DPL indicates the numerically highest privilege level that the currently executing program or task can be at and still be able to access the TSS. (This is the same access rule as for a data segment.) • Requested privilege level (RPL) — The RPL is an override privilege level that is assigned to segmentPROTECTION selectors. It is stored in bits 0 and 1 of the segment selector. The processor checks the RPL along with the CPL BASIC EXECUTION ENVIRONMENT to determine if access to a segment is allowed. Even if the program or task requesting access to a segment has Thesufficient center (reserved privilege tofor access the most the privileged segment, code,access data, is denied and stacks) if the RPL is used is not for of the sufficien segmentst privilege containing level. the That critical is, software,if the RPL usually of a segment the kernel selector of an isoper numericallyating system. greater Outer than rings the are•CPL, usedDebug the RPLfor less overridesregisters critical the software. — CPL,Model Debug and (Systems viceregisters versa. that expand Development to 64 bits. See Chapter 17, “Debug, Branch Profile, TSC, and The RPL can be used to insure that privileged code does not access a segment on behalf of an application use only 2 of the 4 possible privilege levels should use levels 0 and 3.)Quality of Service,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A. SYSTEM ARCHITECTURE OVERVIEW program unless the program itself has access privileges for that segment. See Section 5.10.4, “Checking Caller Access Privileges (ARPL Instruction),” for a detailed description• of theDescriptor purpose and table typical registers use of the —RPL. The global descriptor table register (GDTR) and interrupt descriptor table Privilege levels are checked when the segment selector of a segment descriptorregister is(IDTR) loaded expand into a segment to 10 bytesregister. so that they can hold a full 64-bit base address. The local descriptor table Protection Rings The checks used for data access differ from those used for transfers ofFocus: registerprogram control(LDTR) among 64-bitand the code task segments; register sub-mode (TR) also expand to of hold aIntel’s full 64-bit base IA-32eaddress. mode therefore, the two kinds of accesses are considered separately in the following sections. RFLAGS Physical Address Code, Data or Stack Control Register Segment (Base =0) Basic Program Execution Registers Address Space Linear Address CR8 Operating Task-State 5.6 PRIVILEGE LEVEL CHECKING WHEN ACCESSING DATA SEGMENTS 2^64 -1 CR4 Segment Selector Segment (TSS) System CR3 Kernel Level 0 To access operands in a data segment, the segment selector for the data segment must beSixteen loaded 64-bit into the data-General-Purpose Registers CR2 Register Registers segment registers (DS, ES, FS, or GS)Operating or into System the stack-segment register (SS). (Segment registers can be loaded CR1 CR0 with the MOV, POP, LDS, LES, LFS, LGS, andServices LSS instructions.) BeforeLevel 1 the processor loads a segment selector into Global Descriptor a segment register, it performs a privilege check (see Figure 5-4) by comparing the privilege levels of the currently Task Register Table (GDT) running program or task (the CPL), the RPL of the segment selector,Level and2 the DPL of the segment’s segment Six 16-bit Interrupt Handler descriptor. The processor loads the segmentApplications selector into the segmentLevel 3 register if the DPL is numerically greaterSegment Registers Segment Sel. Seg. Desc. Registers Code than or equal to both the CPL and the RPL. Otherwise, a general-protection fault is generated and the segment NULL register is not loaded. Interrupt TR TSS Desc. Stack Figure 5-3. Protection Rings 64-bits RFLAGS Register Vector Seg. Desc. Interrupt Descriptor 64-bits RIP (Instruction Pointer Register) The processor uses privilege levels to prevent a program or task operating at a lesser privilege level from accessing Table (IDT) Seg. Desc. Interr. Handler CS Register a segment with a greater privilege, except under controlled situations. When the processor detects a privilege level Code Interrupt Gate Current TSS violation, it generates a general-protection exceptionCPL (#GP). FPU Registers LDT Desc. Stack Interrupt Gate To carry out privilege-level checks between code segments and data segments, the processor recognizes the GDTR following three types of privilege levels:Segment Selector Eight 80-bit Floating-Point IST For Data Segment Registers Data Registers Trap Gate • Current privilege level (CPL) — The CPL is the privilege level of the currently executing program or task. It 0 Local Descriptor Exception Handler RPL Table (LDT) Code is stored in bits 0 and 1 of the CS and SS segment registers. Normally, the CPL is equal to the privilege level of NULL the code segment from which instructions are being fetched. The processor changes the CPL when program Stack Data-Segment Descriptor Privilege IDTR Call-Gate Seg. Desc. control is transferred to a code segment with a different privilege level.Check The CPL is treated slightly differently16 bits Control Register Segment Selector DPL Call Gate when accessing conforming code segments. Conforming code segments can be accessed from any privilege16 bits Status Register Protected Procedure level that is equal to or numerically greater (less privileged) than the DPL of the conforming code segment. 16 bits Tag Register XCR0 (XFEM) Code Also, the CPL is not changed when the processor accesses a conforming code segment that has a different LDTR NULL Stack privilege level than the CPL. Opcode Register (11-bits) Figure 5-4. Privilege Check for Data Access • Descriptor privilege level (DPL) — The DPL is the privilege level of a segment or gate. It is stored64 in bitsthe DPL FPU Instruction Pointer Register field of the segment or gate descriptor for the segment or gate. When the currently executing code segment Linear Address Space Linear Address 64 bits FPU Data (Operand) Pointer Register Dir. Pointer Directory Table Figureattempts 5-5 shows to accessfour procedures a segment (located or gate, in the codes DPL segments of the se gmentA, B, C, or and gate D), is eachcompared running to theat different CPL and privilegeRPL of the PML4 Offset levelssegment and each or attempting gate selector to access (as described the same later data in segment.this section). The DPL is interpretedMMX differently, Registers depending on the type of segment or gate being accessed: Linear Addr. 1. The procedure in code segment A is able to access data segment E using segment selector E1, because the CPL PML4 Pg. Dir. Ptr. Page Dir. Page Table Page of— codeData segment segment A and — theThe RPL DPL of indicates segment the select numericallyor E1 are highest equal topriv theilege DPL level of data that segment a programEight E. 64-bit or task can have MMX Registers to be allowed to access the segment. For example, if the DPL of a data segment is 1, onlyRegisters programs running Physical 2. The procedureat a CPL ofin 0code or 1 segment can access B is the able segment. to access data segment E using segment selector E2, because the CPL PML4. Pg. Dir. Page Tbl Addr. of code segment B and the RPL of segment selector E2 are both numerically lower than (more privileged) than Entry Entry Entry the— DPLNonconforming of data segment code E. A segment code segment (without B proc usingedure a can call also gate) access — The data DPL segment indicates E theusing privilege segment level that a program or task must be at to access the segment. For example, if the DPL of a nonconforming code selector E1. XMM Registers 0 This page mapping example is for 4-KByte pages segment is 0, only programs running at a CPL of 0 can access the segment. CR3* and 40-bit physical address size. 3. The procedure in code segment C is not able to access data segment E using segment selector E3 (dotted line), — Call gate — The DPL indicates the numerically highest privilege level that the currently executing program *Physical Address because the CPL of code segment C and the RPL of segment selector E3 are both numerically greaterSixteen than 128-bit (less privileged)or task than can the be DPLat and of datastill be segment able to E.access Even theif a callcode gate. segment (This C is procedure the same wereaccess to rule use assegment for aRegisters data selector XMM Registers segment.) Figure 2-2. System-Level Registers and Data Structures in IA-32e Mode — Conforming code segment and nonconforming code segment accessed through a call gate — The 32-bits MXCSR Register DPL indicates the numerically lowest privilege level that a program or task can have to be allowed to access 2.1.1 Global and Local Descriptor Tables 5-8 Vol. 3Athe segment. For example, if the DPL of a conforming code segment is 2, programs running at a CPL of 0 or 1 cannot access the segment. When operating in protected mode, all memory accesses pass through either the global descriptor table (GDT) or an optional local descriptor table (LDT) as shown in Figure 2-1. These tables contain entries called segment Figure 3-2. 64-Bit Mode Executiondescriptors. Environment Segment descriptors provide the base address of segments well as access rights, type, and usage information. Model Development Vol. 3A 5-7 Each segment descriptor has an associated segment selector. A segment selector provides the software8 that/27 uses Source: Intel itManuals with an index into the GDT or LDT (the offset of its associated segment descriptor), a global/local flag (deter- 3.3 MEMORY ORGANIZATION mines whether the selector points to the GDT or the LDT), and access rights information. The memory that the processor addresses on its bus is called physical memory. Physical memory is organized as a sequence of 8-bit bytes. Each byte is assigned a unique address, called a physical address. The physical address space ranges from zero to a maximum of 236 − 1 (64 GBytes) if the processor does not support Intel Vol. 3A 2-3

Vol. 1 3-5 PROTECTION

— TSS — The DPL indicates the numerically highest privilege level that the currently executing program or task can be at and still be able to access the TSS. (This is the same access rule as for a data segment.) • Requested privilege level (RPL) — The RPL is an override privilege level that is assigned to segmentPROTECTION selectors. It is stored in bits 0 and 1 of the segment selector. The processor checks the RPL along with the CPL BASIC EXECUTION ENVIRONMENT to determine if access to a segment is allowed. Even if the program or task requesting access to a segment has Thesufficient center (reserved privilege tofor access the most the privileged segment, code,access data, is denied and stacks) if the RPL is used is not for of the sufficien segmentst privilege containing level. the That critical is, software,if the RPL usually of a segment the kernel selector of an isoper numericallyating system. greater Outer than rings the are•CPL, usedDebug the RPLfor less overridesregisters critical the software. — CPL,Model Debug and (Systems viceregisters versa. that expand Development to 64 bits. See Chapter 17, “Debug, Branch Profile, TSC, and The RPL can be used to insure that privileged code does not access a segment on behalf of an application use only 2 of the 4 possible privilege levels should use levels 0 and 3.)Quality of Service,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A. SYSTEM ARCHITECTURE OVERVIEW program unless the program itself has access privileges for that segment. See Section 5.10.4, “Checking Caller Access Privileges (ARPL Instruction),” for a detailed description• of theDescriptor purpose and table typical registers use of the —RPL. The global descriptor table register (GDTR) and interrupt descriptor table Privilege levels are checked when the segment selector of a segment descriptorregister is(IDTR) loaded expand into a segment to 10 bytesregister. so that they can hold a full 64-bit base address. The local descriptor table Protection Rings The checks used for data access differ from those used for transfers ofFocus: registerprogram control(LDTR) among 64-bitand the code task segments; register sub-mode (TR) also expand to of hold aIntel’s full 64-bit base IA-32eaddress. mode therefore, the two kinds of accesses are considered separately in the following sections. RFLAGS Physical Address Code, Data or Stack Control Register Segment (Base =0) Basic Program Execution Registers Address Space Linear Address CR8 Operating Task-State 5.6 PRIVILEGE LEVEL CHECKING WHEN ACCESSING DATA SEGMENTS 2^64 -1 CR4 Segment Selector Segment (TSS) System CR3 Kernel Level 0 To access operands in a data segment, the segment selector for the data segment must beSixteen loaded 64-bit into the data-General-Purpose Registers CR2 Register Registers segment registers (DS, ES, FS, or GS)Operating or into System the stack-segment register (SS). (Segment registers can be loaded CR1 CR0 with the MOV, POP, LDS, LES, LFS, LGS, andServices LSS instructions.) BeforeLevel 1 the processor loads a segment selector into Global Descriptor a segment register, it performs a privilege check (see Figure 5-4) by comparing the privilege levels of the currently Task Register Table (GDT) running program or task (the CPL), the RPL of the segment selector,Level and2 the DPL of the segment’s segment Six 16-bit Interrupt Handler descriptor. The processor loads the segmentApplications selector into the segmentLevel 3 register if the DPL is numerically greaterSegment Registers Segment Sel. Seg. Desc. Registers Code than or equal to both the CPL and the RPL. Otherwise, a general-protection fault is generated and the segment NULL register is not loaded. Interrupt TR TSS Desc. Stack Figure 5-3. Protection Rings 64-bits RFLAGS Register Vector Seg. Desc. Interrupt Descriptor 64-bits RIP (Instruction Pointer Register) The processor uses privilege levels to prevent a program or task operating at a lesser privilege level from accessing Table (IDT) Seg. Desc. Interr. Handler CS Register a segment with a greater privilege, except under controlled situations. When the processor detects a privilege level Code Interrupt Gate Current TSS violation, it generates a general-protection exceptionCPL (#GP). FPU Registers LDT Desc. Stack Interrupt Gate To carry out privilege-level checks between code segments and data segments, the processor recognizes the GDTR following three types of privilege levels:Segment Selector Eight 80-bit Floating-Point IST For Data Segment Registers Data Registers Trap Gate • Current privilege level (CPL) — The CPL is the privilege level of the currentlyINTERRUPT executing AND program EXCEPTION or task.HANDLING It 0 Local Descriptor Exception Handler RPL Table (LDT) Code is stored in bits 0 and 1 of the CS and SS segment registers. Normally, the CPL is equal to the privilege level of NULL the code segment from which instructions are being fetched. The processor changes the CPL when program Stack Data-Segment Descriptor Privilege IDTR Call-Gate Seg. Desc. control is transferred to a code segment with a different privilege level.Check The CPL is treated slightly differently16 bits Control Register Segment Selector when accessing conforming code segments.DPL ConformingStack Usage code with segments No can be accessed from any privilege Status Register Call Gate Privilege-Level Change 16 bits Protected Procedure level that is equal to or numerically greater (less privileged) than the DPL of the conforming code segment. Interrupted Procedure’s 16 bits Tag Register XCR0 (XFEM) Code Also, the CPL is not changed whenand Handler’sthe processor Stack accesses a conforming code segment that has a different LDTR NULL Stack privilege level than the CPL. Opcode Register (11-bits) • Descriptor privilege level (DPL)Figure — 5-4. The PrivilegeDPL is the Check ESPprivilege Before for Datalevel Accessof a segment or gate. It is stored in the DPL Transfer to Handler 64 bits FPU Instruction Pointer Register field of the segment or gate descriptorEFLAGS for the segment or gate. When the currently executing code segment Linear Address Space Linear Address attempts to access a segment or gate,CS the DPL of the segment or gate is compared to the CPL and 64RPL bits of the FPU Data (Operand) Pointer Register PML4 Dir. Pointer Directory Table Offset Figure 5-5 shows four procedures (located inEIP codes segments A, B, C, and D), each running at different privilege segment or gate selector (as described later in this section). The DPL is interpreted differently, depending on levels and each attempting to access theError same Code data segment.ESP After MMX Registers the type of segment or gate being accessed: Transfer to Handler Linear Addr. 1. The procedure in code segment A is able to access data segment E using segment selector E1, because the CPL PML4 Pg. Dir. Ptr. Page Dir. Page Table Page of— codeData segment segment A and — theThe RPL DPL of indicates segment the select numericallyor E1 are highest equal topriv theilege DPL level of data that segment a programEight E. 64-bit or task can have MMX Registers to be allowed to access the segment. For example, if the DPL of a data segment is 1, onlyRegisters programs running Physical 2. The procedure in code segment B is able to access dataStack segment Usage withE using segment selector E2, because the CPL at a CPL of 0 or 1 can access the segment. Privilege-Level Change PML4. Pg. Dir. Page Tbl Addr. of code segment B and the RPL of segment selector E2 are both numerically lower than (more privileged) than Entry Entry Entry Interrupted Procedure’s the— DPLNonconforming of data segment code E. A segment code segment (without B proc usingedure a can call also gate) access — The dataHandler’s DPL segment indicates Stack E theusing privilege segment level that a program or task must be at to accessStack the segment. For example, if the DPL of a nonconforming code selector E1. XMM Registers 0 This page mapping example is for 4-KByte pages segment is 0, only programs running at a CPL ofESP 0 can Before access the segment. CR3* and 40-bit physical address size. 3. The procedure in code segment C is not able to accessTransfer data segment to Handler E using segment SS selector E3 (dotted line), — Call gate — The DPL indicates the numerically highest privilege level that the currently executing program *Physical Address because the CPL of code segment C and the RPL of segment selector E3 are both ESP numerically greaterSixteen than 128-bit (less privileged)or task than can the be DPLat and of datastill be segment able to E.access Even theif a callcode gate. segment (This C is procedure the same EFLAGS wereaccess to rule use assegment for a data selector XMM Registers Registers Figure 2-2. System-Level Registers and Data Structures in IA-32e Mode segment.) CS — Conforming code segment and nonconforming code segment accessed EIP through a call gate — The ESP After Error Code 32-bits MXCSR Register DPL indicates the numerically lowest privilege levelTransfer that to Handlera program or task can have to be allowed to access 2.1.1 Global and Local Descriptor Tables 5-8 Vol. 3Athe segment. For example, if the DPL of a conforming code segment is 2, programs running at a CPL of 0 or 1 cannot access the segment. When operating in protected mode, all memory accesses pass through either the global descriptor table (GDT) or Figure 6-4. Stack Usage on Transfers to Interrupt and Exception-Handling Routines an optional local descriptor table (LDT) as shown in Figure 2-1. These tables contain entries called segment Figure 3-2. 64-Bit Mode Executiondescriptors. Environment Segment descriptors provide the base address of segments well as access rights, type, and usage information. To return from an exception- or interrupt-handler procedure, the handler must use the IRET (or IRETD) instruction. The IRET instructionModel is similar toDevelopment the RET instruction except that it restores the saved flags into the EFLAGSVol. 3A 5-7 Each segment descriptor has an associated segment selector. A segment selector provides the software8 that/27 uses register. The IOPL field of the EFLAGS register is restored only if the CPL is 0. The IF flag is changed only if the CPL it with an index into the GDT or LDT (the offset of its associated segment descriptor), a global/local flag (deter- is less than or equal to the IOPL. See Chapter 3, “Instruction Set Reference, A-M,” of the Intel® 64 and IA-32 Archi-Source: Intel minesManuals whether the selector points to the GDT or the LDT), and access rights information. tectures Software Developer’s Manual, Volume 2A, for a description3.3 of the comple MEMORYte operation ORGANIZATION performed by the IRET instruction. The memory that the processor addresses on its bus is called physical memory. Physical memory is organized as If a stack switch occurred when calling the handler procedure,a the sequence IRET instruction of 8-bit switches bytes. back Each to the byte interrupted is assigned a unique address, called a physical address. The physical procedure’s stack on the return. address space ranges from zero to a maximum of 236 − 1 (64 GBytes) if the processor does not support Intel Vol. 3A 2-3

6.12.1.1 Protection of Exception- and Interrupt-Handler Procedures

The privilege-level protection for exception- and interrupt-handler procedures is similar to that used for ordinary Vol. 1 3-5 procedure calls when called through a call gate (see Section 5.8.4, “Accessing a Code Segment Through a Call Gate”). The processor does not permit transfer of execution to an exception- or interrupt-handler procedure in a less privileged code segment (numerically greater privilege level) than the CPL. An attempt to violate this rule results in a general-protection exception (#GP). The protection mechanism for exception- and interrupt-handler procedures is different in the following ways: • Because interrupt and exception vectors have no RPL, the RPL is not checked on implicit calls to exception and interrupt handlers. • The processor checks the DPL of the interrupt or trap gate only if an exception or interrupt is generated with an INT n, INT 3, or INTO instruction. Here, the CPL must be less than or equal to the DPL of the gate. This restriction prevents application programs or procedures running at privilege level 3 from using a software interrupt to access critical exception handlers, such as the page-fault handler, providing that those handlers are placed in more privileged code segments (numerically lower privilege level). For hardware-generated and processor-detected exceptions, the processor ignores the DPL of interrupt and trap gates.

Vol. 3A 6-13 PROTECTION APPENDIX B — TSS — The DPL indicates the numerically highest privilege level that the currently executing program or INSTRUCTION FORMATS AND ENCODINGS task can be at and still be able to access the TSS. (This is the same access rule as for a data segment.) • Requested privilege level (RPL) — The RPL is an override privilege level that is assigned to segmentPROTECTION selectors. It is stored in bits 0 and 1 of the segment selector. The processor checks the RPL along with the CPL BASIC EXECUTION ENVIRONMENT to determine if access to a segment is allowed. Even if the program or task requesting access to a segment has sufficient privilege to access the segment, access is denied if theThis RPL isappendix not of sufficien providest privilege machine level. That instruction is, formats and encodings of IA-32 instructions. The first section describes The center (reserved for the most privileged code, data, and stacks)the is usedIA-32 for thearchitecture’s segments containing machine the critical instruction format. The remaining sections show the formats and encoding of software,if the RPL usually of a segment the kernel selector of an isoper numericallyating system. greater Outer than rings the are•CPL, usedDebug the RPLfor less overridesregisters critical the software. — CPL,Model Debug and (Systems viceregisters versa. that expand Development to 64 bits. See Chapter 17, “Debug, Branch Profile, TSC, and The RPL can be used to insure that privileged code does not access a segment on behalf of an application use only 2 of the 4 possible privilege levels should use levels 0 andgeneral-purpose, 3.)Quality of Service,” MMX, in the P6 Intel® family, 64 SSE/SSE2/SSE3, and IA-32 Architectures x87 FP SoftwareU instructions, Developer’s and Manual, VMX instructions. Volume 3A. Those instruction SYSTEM ARCHITECTURE OVERVIEW program unless the program itself has access privileges for that formatssegment. Seealso Section apply 5.10.4, to Intel “Checking 64 architecture. Caller Instruction formats used in 64-bit mode are provided as supersets of Access Privileges (ARPL Instruction),” for a detailed description of theDescriptor purpose and table typical registers use of the —RPL. The global descriptor table register (GDTR) and interrupt descriptor table •the above. Privilege levels are checked when the segment selector of a segment descriptorregister is(IDTR) loaded expand into a segment to 10 bytesregister. so that they can hold a full 64-bit base address. The local descriptor table Protection Rings The checks used for data access differ from those used for transfers ofFocus: registerprogram control(LDTR) among 64-bitand the code task segments; register sub-mode (TR) also expand to of hold aIntel’s full 64-bit base IA-32eaddress. mode therefore, the two kinds of accesses are considered separately in the following sections. RFLAGS Physical Address Code, Data or Stack Control Register Segment (Base =0) B.1 MACHINEBasic Program Execution INSTRUCTION Registers FORMAT Address Space Linear Address CR8 Operating Task-State 5.6 PRIVILEGE LEVEL CHECKING WHEN ACCESSING DATA SEGMENTS 2^64 -1 CR4 Segment Selector Segment (TSS) System All Intel Architecture instructions are encoded using subsets of the general machineCR3 instruction format shown in Kernel Level 0 To access operands in a data segment, the segment selector for theFigure data segment B-1. Eachmust be Sixteeninstruction loaded 64-bit into the consists data-General-Purpose of: Registers CR2 Register Registers segment registers (DS, ES, FS, or GS)Operating or into System the stack-segment register (SS). (Segment registers can be loaded CR1 CR0 with the MOV, POP, LDS, LES, LFS, LGS, andServices LSS instructions.) BeforeLevel• 1 thean processor opcode loads a segment selector into Global Descriptor a segment register, it performs a privilege check (see Figure 5-4) by comparing the privilege levels of the currently Task Register Table (GDT) running program or task (the CPL), the RPL of the segment selector,Level• and2 a the register DPL of the and/or segment’s address segment mode specifier consisting of the ModR/M byte and sometimes the scale-index-base Six 16-bit Interrupt Handler descriptor. The processor loads the segmentApplications selector into the segmentLevel 3 register(SIB) ifbyte the DPL (if isrequired) numerically greaterSegment Registers Segment Sel. Seg. Desc. Registers Code than or equal to both the CPL and the RPL. Otherwise, a general-protection fault is generated and the segment NULL register is not loaded. • a displacement and an immediate data field (if required) Interrupt TR TSS Desc. Stack Figure 5-3. Protection Rings 64-bits RFLAGS Register Vector Seg. Desc. Interrupt Descriptor 64-bits RIP (Instruction Pointer Register) The processor uses privilege levels to prevent a program or task operating at a lesser privilege level from accessing Table (IDT) Seg. Desc. Interr. Handler CS Register 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 a segment with a greater privilege, except under controlled situations. When the processor detects a privilege level Code Interrupt Gate Current TSS violation, it generates a general-protection exceptionCPL (#GP). FPU Registers Legacy Prefixes REX Prefixes T T T T T T T T T T T T T T T T T T T T T T T T LDT Desc. Stack Interrupt Gate To carry out privilege-level checks between code segments and data segments, the processor recognizesGrp 1, Grp the 2, (optional) GDTR following three types of privilege levels:Segment Selector Eight 80-bit Floating-Point IST For Data Segment RegistersGrp 3, Grp 4 Data Registers 1, 2, or 3 Byte OpcodesTrap Gate(T = Opcode • Current privilege level (CPL) — The CPL is the privilege level of the currentlyINTERRUPT executing AND program EXCEPTION or task.HANDLING It 0 Local Descriptor Exception Handler RPL Table (LDT) Code is stored in bits 0 and 1 of the CS and SS segment registers. Normally, the CPL is equal to the privilege level of NULL Stack the code segment from which instructions are being fetched. The processorPrivilege changes the CPL when program IDTR Call-Gate Seg. Desc. Data-Segment Descriptor 16 bits Control Register control is transferred to a code segment with a different privilege level.Check The CPL is treated slightly7-6 differently 5-3 2-0 7-6 5-3 2-0 Segment Selector when accessing conforming code segments.DPL ConformingStack Usage code with segments No can be accessed from any privilege Status Register Call Gate Privilege-Level Change 16 bits Protected Procedure level that is equal to or numerically greater (less privileged) than the DPL of the conforming code Modsegment. Reg* R/M Scale Index Base d32 | 16 | 8 | None d32 | 16 | 8 | None Interrupted Procedure’s 16 bits Tag Register XCR0 (XFEM) Code Also, the CPL is not changed whenand Handler’sthe processor Stack accesses a conforming code segment that has a different LDTR NULL Stack privilege level than the CPL. Opcode Register (11-bits) ModR/M Byte SIB Byte Address Displacement • Descriptor privilege level (DPL)Figure — 5-4. The PrivilegeDPL is the Check ESPprivilege Before for Datalevel Accessof a segment or gate. It is stored in the DPL Immediate Data Transfer to Handler 64 bits FPU Instruction(4, Pointer 2, 1 Bytes Register or None) field of the segment or gate descriptorEFLAGS for the segment or gate. When the currently executing code segment Linear(4,2,1 Address Bytes Space or None) Linear Address attempts to access a segment or gate,CS the DPL of the segment or gate is compared to the CPL and 64RPL bits of the FPU Data (Operand) Pointer Register PML4 Dir. Pointer Directory Table Offset Figure 5-5 shows four procedures (located inEIP codes segments A, B, C, and D), each running at different privilegeRegister and/or Address NOTE: segment or gate selector (as described later in this section). The DPL is interpreted differently, depending on levels and each attempting to access theError same Code data segment.ESP After MMX Registers Mode Specifier the type of segment or gate being accessed: Transfer to Handler * The Reg Field may Linearbe used Addr. as an 1. The procedure in code segment A is able to access data segment E using segment selector E1, because the CPL PML4 Pg. Dir. Ptr. Page Dir. Page Table Page of— codeData segment segment A and — theThe RPL DPL of indicates segment the select numericallyor E1 are highest equal topriv theilege DPL level of data that segment a programEight E. 64-bit or task can have opcode extension field (TTT) and as a MMX Registers to be allowed to access the segment. For example, if the DPL of a data segment is 1, onlyRegisters programs running way to encode diagnostic registers Physical 2. The procedure in code segment B is able to access dataStack segment Usage withE using segment selector E2, because the CPL at a CPL of 0 or 1 can access the segment. Privilege-Level Change PML4. Pg. Dir. Page Tbl Addr. of code segment B and the RPL of segment selector E2 are both numerically lower than (more privileged) than (eee). Entry Entry Entry Interrupted Procedure’s the— DPLNonconforming of data segment code E. A segment code segment (without B proc usingedure a can call also gate) access — The dataHandler’s DPL segment indicates Stack E theusing privilege segment level that a program or task must be at to accessStack the segment. For example, if the DPL of a nonconforming code selector E1. XMM Registers 0 This page mapping example is for 4-KByte pages segment is 0, only programs running at a CPL ofESP 0 can Before access the segment. CR3* and 40-bit physical address size. 3. The procedure in code segment C is not able to accessTransfer data segment to Handler E using segment SS selector E3 (dotted line),Figure B-1. General Machine Instruction Format — Call gate — The DPL indicates the numerically highest privilege level that the currently executing program *Physical Address because the CPL of code segment C and the RPL of segment selector E3 are both ESP numerically greaterSixteen than 128-bit (less privileged)or task than can the be DPLat and of datastill be segment able to E.access Even theif a callcode gate. segment (This C is procedure the same EFLAGS wereaccess to rule use assegment for a data selector XMM Registers Registers Figure 2-2. System-Level Registers and Data Structures in IA-32e Mode segment.) CS — Conforming code segment and nonconforming code segment accessed EIP through a call gate — The ESP AfterThe followingError Code sections discuss this format.32-bits MXCSR Register DPL indicates the numerically lowest privilege levelTransfer that to Handlera program or task can have to be allowed to access 2.1.1 Global and Local Descriptor Tables 5-8 Vol. 3Athe segment. For example, if the DPL of a conforming code segment is 2, programs running at a CPL of 0 or 1 cannot access the segment. When operating in protected mode, all memory accesses pass through either the global descriptor table (GDT) or Figure 6-4. Stack Usage on Transfers to InterruptB.1.1 and Exception-Handling Legacy RoutinesPrefixes an optional local descriptor table (LDT) as shown in Figure 2-1. These tables contain entries called segment Figure 3-2. 64-Bit Mode Executiondescriptors. Environment Segment descriptors provide the base address of segments well as access rights, type, and usage information. To return from an exception- or interrupt-handler procedure, theThe handler legacy must prefixes use the IRET noted (or IRETD) in Figure instruction. B-1 include 66H, 67H, F2H and F3H. They are optional, except when F2H, F3H The IRET instructionModel is similar toDevelopment the RET instruction except thatand it restores66H are the usedsaved flags in new into the instruction EFLAGSVol. 3A 5-7 extensions. LegacyEach segment prefixes descriptor must behas placedan associated before segment REX selectorprefixes.. A segment selector provides the software8 that/27 uses register. The IOPL field of the EFLAGS register is restored only if the CPL is 0. The IF flag is changed only if the CPL it with an index into the GDT or LDT (the offset of its associated segment descriptor), a global/local flag (deter- is less than or equal to the IOPL. See Chapter 3, “Instruction Set Reference, A-M,” of the Intel® 64 and IA-32 Archi-Source: Intel minesManuals whether the selector points to the GDT or the LDT), and access rights information. tectures Software Developer’s Manual, Volume 2A, for a description3.3Refer of to the Chapter comple MEMORYte operation2, “Instruction ORGANIZATION performed byFormat,” the in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, IRET instruction. Volume 2A, for more information on legacy prefixes. The memory that the processor addresses on its bus is called physical memory. Physical memory is organized as If a stack switch occurred when calling the handler procedure,a the sequence IRET instruction of 8-bit switches bytes. back Each to the byte interrupted is assigned a unique address, called a physical address. The physical procedure’s stack on the return. address space ranges from zero to a maximum of 236 − 1 (64 GBytes) if the processor does not support Intel Vol. 3A 2-3

6.12.1.1 Protection of Exception- and Interrupt-Handler Procedures

The privilege-level protection for exception- and interrupt-handler procedures is similar to that used for ordinary Vol. 1 3-5 Vol. 2C B-1 procedure calls when called through a call gate (see Section 5.8.4, “Accessing a Code Segment Through a Call Gate”). The processor does not permit transfer of execution to an exception- or interrupt-handler procedure in a less privileged code segment (numerically greater privilege level) than the CPL. An attempt to violate this rule results in a general-protection exception (#GP). The protection mechanism for exception- and interrupt-handler procedures is different in the following ways: • Because interrupt and exception vectors have no RPL, the RPL is not checked on implicit calls to exception and interrupt handlers. • The processor checks the DPL of the interrupt or trap gate only if an exception or interrupt is generated with an INT n, INT 3, or INTO instruction. Here, the CPL must be less than or equal to the DPL of the gate. This restriction prevents application programs or procedures running at privilege level 3 from using a software interrupt to access critical exception handlers, such as the page-fault handler, providing that those handlers are placed in more privileged code segments (numerically lower privilege level). For hardware-generated interrupts and processor-detected exceptions, the processor ignores the DPL of interrupt and trap gates.

Vol. 3A 6-13 Model Development

Under active development: an x86 ISA model in ACL2

Layered modeling approach mitigates ➡ x86 State: specifies the the trade-off between reasoning and components of the ISA execution efficiency [ACL2’13] (registers, flags, memory)

➡ Instruction Semantic Optimized for Functions: specify the effect reasoning of each instruction efficiency

➡ Step Function: fetches, Optimized for decodes, and executes one execution instruction efficiency

Model Development 9 /27 Model Validation

How can we know that our model faithfully represents the x86 ISA? Validate the model to increase trust in the applicability of formal analysis.

Model Development 10/27 Current Status: x86 ISA Model

• The x86 ISA model supports 400+ instructions, including some floating- point and supervisor-mode instructions ‣ Can execute almost all user-level programs emitted by GCC/LLVM ‣ Successfully co-simulated a contemporary SAT solver on our model ‣ Successfully simulated a supervisor-mode zero-copy program

• IA-32e paging for all page configurations (4K, 2M, 1G)

• Segment-based addressing

• Lines of Code: ~85,000 (not including blank lines)

• Simulation speed*: ‣ ~3.3 million instructions/second (paging disabled) ‣ ~330,000 instructions/second (with 1G pages)

Model Development: Current Status * Simulation speed measured on an Intel Xeon E31280 CPU @ 3.50GHz 11/27 Outline

๏ Motivation

๏ Project Description

➡ [1] Developing an x86 ISA Model

➡ [2] Building a Machine-Code Analysis Framework

➡ [3] Verifying Application and System Programs

๏ Future Work & Conclusion

๏ Accessing Source Code + Documentation

12/27 Building a Lemma Database

• Semantics of the program is given by the effect it has on the machine state. 1. read instruction from mem

add %edi, %eax 2. read operands je 0x400304 1. read instruction from mem 3. write sum to eax

2. read flags 4. write new value to flags 3. write new value to pc 5. write new value to pc

• The database should include lemmas about reads from and writes to the machine state, along with the interactions between these operations.

Machine-Code Analysis Framework 13/27 Building a Lemma Database

• System data structures, like the paging structures, are extremely complicated.

• Correct operation of a system heavily depends upon such structures.

• We need to prove lemmas that can aid in proving the following kinds of critical properties: - Processes are isolated from each other. - Page tables, including access rights, are set up correctly.

Machine-Code Analysis Framework 14/27 Address Translations

/27 Logical Address Address Translations

Segment Offset Selector

SEGMENTATION /27 Logical Address Address Translations

Segment Offset Selector

Descriptor Table(s)

Segment Descriptor

Linear Memory

Global or Local Descriptor Table Register SEGMENTATION /27 Logical Address Address Translations

Segment Offset Selector

Descriptor Table(s)

Segment Linear Addr. Descriptor

Segment

Linear Memory

Global or Local Descriptor Table Register SEGMENTATION /27 Logical Address Address Translations

Segment Linear Address Offset Selector PML4 Dir. Ptr. Dir. Table Offset

Descriptor Table(s)

Segment Linear Addr. Descriptor

Segment

Linear Memory

Global or Local Descriptor Table Register SEGMENTATION IA-32e PAGING (4K page) /27 Logical Address Address Translations

Segment Linear Address Offset Selector PML4 Dir. Ptr. Dir. Table Offset

Descriptor Table(s)

Segment Linear Addr. Descriptor

Segment

Linear Physical Memory Memory

PML4E

Control Register CR3 has the base address Global or Local of these structures. Descriptor Table Register SEGMENTATION IA-32e PAGING (4K page) /27 Logical Address Address Translations

Segment Linear Address Offset Selector PML4 Dir. Ptr. Dir. Table Offset

Descriptor Table(s)

Segment Linear Addr. Descriptor

PDPTE Segment

Linear Physical Memory Memory

PML4E

Control Register CR3 has the base address Global or Local of these structures. Descriptor Table Register SEGMENTATION IA-32e PAGING (4K page) /27 Logical Address Address Translations

Segment Linear Address Offset Selector PML4 Dir. Ptr. Dir. Table Offset

Descriptor Table(s)

PDE

Segment Linear Addr. Descriptor

PDPTE Segment

Linear Physical Memory Memory

PML4E

Control Register CR3 has the base address Global or Local of these structures. Descriptor Table Register SEGMENTATION IA-32e PAGING (4K page) /27 Logical Address Address Translations

Segment Linear Address Offset Selector PML4 Dir. Ptr. Dir. Table Offset

Descriptor Table(s)

PDE

Segment Linear Addr. Descriptor PTE

PDPTE Segment

Linear Physical Memory Memory

PML4E

Control Register CR3 has the base address Global or Local of these structures. Descriptor Table Register SEGMENTATION IA-32e PAGING (4K page) /27 Logical Address Address Translations

Segment Linear Address Offset Selector PML4 Dir. Ptr. Dir. Table Offset

Descriptor Table(s)

PDE

Segment Linear Addr. Descriptor PTE 4K Page

Physical Addr. PDPTE Segment 4K Page Linear Physical Memory Memory

PML4E

Control Register CR3 has the base address Global or Local of these structures. Descriptor Table Register SEGMENTATION IA-32e PAGING (4K page) /27 Logical Address Address Translations

Segment Linear Address Offset Selector PML4 Dir. Ptr. Dir. Table Offset

Descriptor Table(s)

PDE a

Segment Linear Addr. Descriptor PTE a 4K Page

Physical Addr. PDPTE a Segment 4K Page Linear Physical Memory Memory

PML4E a

Control Register CR3 has the base address Global or Local of these structures. Descriptor Table Register accessed SEGMENTATION a IA-32e PAGING (4K page) flag /27 Logical Address Address Translations

Segment Linear Address Offset Selector PML4 Dir. Ptr. Dir. Table Offset

Descriptor Table(s)

PDE a

Segment Linear Addr. Descriptor PTE a d 4K Page

Physical Addr. PDPTE a Segment 4K Page Linear Physical Memory Memory

PML4E a

Control Register CR3 has the base address Global or Local of these structures. Descriptor Table Register accessed dirty SEGMENTATION a d IA-32e PAGING (4K page) flag flag /27 Current Status: Analysis Framework

• Automatically generate and prove many lemmas about reads and writes

• Libraries to reason about (non-)interference of memory regions

• Proved general lemmas about paging data structure traversals

Machine-Code Analysis Framework 16/27 Outline

๏ Motivation

๏ Project Description

➡ [1] Developing an x86 ISA Model

➡ [2] Building a Machine-Code Analysis Framework

➡ [3] Verifying Application and System Programs

๏ Future Work & Conclusion

๏ Accessing Source Code + Documentation

17/27 Application Program #1: popcount

Automatically verify snippets of straight-line machine code using bit- blasting [VSTTE’13]

55 push %rbp 48 89 e5 mov %rsp,%rbp 89 7d fc mov %edi,-0x4(%rbp) 8b 7d fc mov -0x4(%rbp),%edi 8b 45 fc mov -0x4(%rbp),%eax c1 e8 01 shr $0x1,%eax 25 55 55 55 55 and $0x55555555,%eax 29 c7 sub %eax,%edi 89 7d fc mov %edi,-0x4(%rbp) 8b 45 fc mov -0x4(%rbp),%eax 25 33 33 33 33 and $0x33333333,%eax 8b 7d fc mov -0x4(%rbp),%edi c1 ef 02 shr $0x2,%edi 81 e7 33 33 33 33 and $0x33333333,%edi 01 f8 add %edi,%eax 89 45 fc mov %eax,-0x4(%rbp) 8b 45 fc mov -0x4(%rbp),%eax 8b 7d fc mov -0x4(%rbp),%edi c1 ef 04 shr $0x4,%edi 01 f8 add %edi,%eax 25 0f 0f 0f 0f and $0xf0f0f0f,%eax 69 c0 01 01 01 01 imul $0x1010101,%eax,%eax c1 e8 18 shr $0x18,%eax 89 45 fc mov %eax,-0x4(%rbp) 8b 45 fc mov -0x4(%rbp),%eax 5d pop %rbp c3 retq

Program Verification 18/27 Application Program #1: popcount

Automatically verify snippets of straight-line machine code using bit- blasting [VSTTE’13]

int55 popcount_32 ( unsigned push int %rbp v) { 48 89 e5 mov %rsp,%rbp 89// 7dFrom fc Sean Anderson’s mov Bit-Twiddling %edi,-0x4(%rbp) Hacks 8bv =7d v fc- ((v >> 1) &mov 0x55555555); -0x4(%rbp),%edi 8bv =45 (v fc & 0x33333333) mov + ((v -0x4(%rbp),%eax >> 2) & 0x33333333); c1v =e8 ((v 01 + (v >> 4) shr& 0xF0F0F0F) $0x1,%eax * 0x1010101) >> 24; 25return 55 55(v); 55 55 and $0x55555555,%eax } 29 c7 sub %eax,%edi 89 7d fc mov %edi,-0x4(%rbp) 8b 45 fc mov -0x4(%rbp),%eax 25 33 33 33 33 and $0x33333333,%eax 8b 7d fc mov -0x4(%rbp),%edi c1 ef 02 shr $0x2,%edi 81 e7 33 33 33 33 and $0x33333333,%edi 01 f8 add %edi,%eax 89 45 fc mov %eax,-0x4(%rbp) 8b 45 fc mov -0x4(%rbp),%eax 8b 7d fc mov -0x4(%rbp),%edi c1 ef 04 shr $0x4,%edi 01 f8 add %edi,%eax 25 0f 0f 0f 0f and $0xf0f0f0f,%eax 69 c0 01 01 01 01 imul $0x1010101,%eax,%eax c1 e8 18 shr $0x18,%eax 89 45 fc mov %eax,-0x4(%rbp) 8b 45 fc mov -0x4(%rbp),%eax 5d pop %rbp c3 retq

Program Verification 18/27 Application Program #1: popcount

Automatically verify snippets of straight-line machine code using bit- blasting [VSTTE’13] unsigned int int55 popcount_32 ( unsigned push int %rbp v) { 48 89 e5 mov %rsp,%rbp 89// 7dFrom fc Sean Anderson’s mov Bit-Twiddling %edi,-0x4(%rbp) Hacks RAX = popcount(input) 8bv =7d v fc- ((v >> 1) &mov 0x55555555); -0x4(%rbp),%edi 8bv =45 (v fc & 0x33333333) mov + ((v -0x4(%rbp),%eax >> 2) & 0x33333333); c1v =e8 ((v 01 + (v >> 4) shr& 0xF0F0F0F) $0x1,%eax * 0x1010101) >> 24; 25return 55 55(v); 55 55 and $0x55555555,%eax specification function } 29 c7 sub %eax,%edi 89 7d fc mov %edi,-0x4(%rbp) 8b 45 fc mov -0x4(%rbp),%eax 25 33 33 33 33 and $0x33333333,%eax popcount(x): 8b 7d fc mov -0x4(%rbp),%edi c1 ef 02 shr $0x2,%edi 81 e7 33 33 33 33 and $0x33333333,%edi if (x <= 0) then 01 f8 add %edi,%eax 89 45 fc mov %eax,-0x4(%rbp) return 0 8b 45 fc mov -0x4(%rbp),%eax else 8b 7d fc mov -0x4(%rbp),%edi c1 ef 04 shr $0x4,%edi lsb := x & 1 01 f8 add %edi,%eax x := x >> 1 25 0f 0f 0f 0f and $0xf0f0f0f,%eax 69 c0 01 01 01 01 imul $0x1010101,%eax,%eax return (lsb + popcount(x)) c1 e8 18 shr $0x18,%eax endif 89 45 fc mov %eax,-0x4(%rbp) 8b 45 fc mov -0x4(%rbp),%eax 5d pop %rbp c3 retq

Program Verification 18/27 Application Program #2: word-count

• Proved the functional correctness of a word-count program that reads input from the user using read system calls. System calls are non- deterministic for application programs. [FMCAD’14]

55 push %rbp 48 89 e5 mov %rsp,%rbp 48 83 ec 20 sub $0x20,%rsp c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp) c7 45 e8 00 00 00 00 movl $0x0,-0x18(%rbp) c7 45 ec 00 00 00 00 movl $0x0,-0x14(%rbp) c7 45 f0 00 00 00 00 movl $0x0,-0x10(%rbp) c7 45 f4 00 00 00 00 movl $0x0,-0xc(%rbp) e8 90 ff ff ff callq <_gc> … 05 01 00 00 00 add $0x1,%eax 89 45 f0 mov %eax,-0x10(%rbp) e9 00 00 00 00 jmpq <_main+0xb8> e9 6e ff ff ff jmpq <_main+0x2b> 31 c0 xor %eax,%eax 48 83 c4 20 add $0x20,%rsp 5d pop %rbp c3 retq

Program Verification 19/27 Application Program #2: word-count

• Proved the functional correctness of a word-count program that reads input from the user using read system calls. System calls are non- deterministic for application programs. [FMCAD’14] Specification for counting the # of 55 push %rbp 48 89 e5 mov %rsp,%rbp characters in str: 48 83 ec 20 sub $0x20,%rsp c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp) ncSpec(offset, str, count): c7 45 e8 00 00 00 00 movl $0x0,-0x18(%rbp) c7 45 ec 00 00 00 00 movl $0x0,-0x14(%rbp) c7 45 f0 00 00 00 00 movl $0x0,-0x10(%rbp) if (well-formed(str) && offset < len(str))c7 45 f4then 00 00 00 00 movl $0x0,-0xc(%rbp) c := str[offset] e8 90 ff ff ff callq <_gc> … if (c == EOF) then 05 01 00 00 00 add $0x1,%eax return count 89 45 f0 mov %eax,-0x10(%rbp) else e9 00 00 00 00 jmpq <_main+0xb8> e9 6e ff ff ff jmpq <_main+0x2b> count := (count + 1) mod 2^32 31 c0 xor %eax,%eax ncSpec(1 + offset, str, count) 48 83 c4 20 add $0x20,%rsp endif 5d pop %rbp endif Functionalc3 Correctness Theorem: retq Values computed by specification functions on standard input are found in the expected memory locations of the final x86 state.

Program Verification 19/27 Application Program #2: word-count

Other properties verified using our machine-code framework:

• Resource Usage: ‣ Program and its stack are disjoint for all inputs. ‣ Irrespective of the input, program uses a fixed amount of memory.

• Security: ‣ Program does not modify unintended regions of memory.

Program Verification 20/27 System Program: zero-copy

Specification: Copy data x from virtual memory location l0 to disjoint linear memory location l1.

Verification Objective: l0 x After a successful copy, l0 and l1 contain x. p x

x Implementation: l1 Include the copy-on-write technique: l0 and l1 can be mapped to the same physical memory location p. Linear Physical Memory Memory ‣ Modifications to address mapping ‣ Access control management

Program Verification 21/27 System Program: zero-copy

Proved that the implementation of a zero-copy program meets the specification of a simple copy operation. l0 x For simplicity, marking of paging structures p x during their traversal was turned off, i.e., no accessed and dirty bit updates were allowed for x this proof. l1

We are currently porting this proof over to a more accurate x86 model, which characterizes updates Linear Physical to accessed and dirty bits as well. Memory Memory

Program Verification 22/27 Outline

๏ Motivation

๏ Project Description

➡ [1] Developing an x86 ISA Model

➡ [2] Building a Machine-Code Analysis Framework

➡ [3] Verifying Application and System Programs

๏ Future Work & Conclusion

๏ Accessing Source Code + Documentation

23/27 Future Work

• Run a 64-bit FreeBSD kernel on our x86 ISA model - This involves identifying and implementing relevant instructions, call gates, supporting task management, etc.

• Develop lemma libraries to reason about kernel code - This involves developing automated reasoning infrastructure for page table walks, access rights, etc.

• Identify and prove critical invariants in kernel code - This includes proving the correctness of context switches, privilege escalations, etc.

We look forward to collaborating with the community!

Future Work 24/27 Conclusion

It is essential to state and prove properties related to behavior, security, and resource usage; bug-hunting can only take us so far. This task is within the scope of mechanized theorem proving, as is evidenced by its use by our collaborators in the government and the industry to prove complex properties about complex systems.

Although full verification of all software is the ultimate goal, the focus for the coming years is to create islands of trust, i.e., parts of the system for which complex properties have been formally verified.

Conclusion 25/27 Accessing Source Code + Documentation

The x86isa project is available under BSD 3-Clause license as a part of the ACL2 Community Books project.

Go to https://github.com/acl2/acl2/ and see books/projects/x86isa/README for details.

Also, documentation and user’s manual is available online at www.cs.utexas.edu/users/moore/acl2/manuals/ current/manual/?topic=ACL2____X86ISA

Source Code + Documentation 26/27 Some Publications

• Shilpi Goel, Warren A. Hunt, Jr., and Matt Kaufmann. Abstract Stobjs and Their Application to ISA Modeling In ACL2 Workshop, 2013

• Shilpi Goel and Warren A. Hunt, Jr. Automated Code Proofs on a Formal Model of the x86 In Verified Software: Theories, Tools, Experiments (VSTTE), 2013

• Shilpi Goel, Warren A. Hunt, Jr., Matt Kaufmann, and Soumava Ghosh. Simulation and Formal Verification of x86 Machine-Code Programs That Make System Calls In Formal Methods in Computer-Aided Design (FMCAD), 2014

• Shilpi Goel, Warren A. Hunt, Jr., and Matt Kaufmann. Engineering a Formal, Executable x86 ISA Simulator for Software Verification To appear in Provably Correct Software (ProCoS), 2015

27/27 Extra Slides

/27 Verification Effort vs. Verification Utility

Programmer-level Mode System-level Mode

- Verification of application - Verification of system programs programs

- Linear memory address space - Physical memory address space (264 bytes) (252 bytes)

- Assumptions about correctness of - No assumptions about OS OS operations operations

Task 3 | Program Verification | Effort vs. Utility 29/27 Motivation: x86 Machine-Code Verification

• Why not high-level code verification? X High-level verification frameworks do not address compiler bugs ✓ Verified/verifying compilers can help X But these compilers typically generate inefficient code X Need to build verification frameworks for many high-level languages X Sometimes, high-level code is unavailable

• Why x86? ✓ x86 is in widespread use — our approach will have immediate practical application

Motivation 30/27 Building a Lemma Database

Three kinds of theorems: ‣ Read-over-Write Theorems ‣ Write-over-Write Theorems ‣ Preservation Theorems

Task 2 | Machine-Code Analysis Framework | Lemma Database 31/27 Read-over-Write Theorem: #1

non-interference Program Order i j y memory

Task 2 | Machine-Code Analysis Framework | Lemma Database 32/27 Read-over-Write Theorem: #1

non-interference Program Order i j x y memory Wi(x)

Task 2 | Machine-Code Analysis Framework | Lemma Database 32/27 Read-over-Write Theorem: #1

non-interference Program Order i j x y memory Wi(x)

Rj: y

Task 2 | Machine-Code Analysis Framework | Lemma Database 32/27 Read-over-Write Theorem: #2

overlap Program Order i

memory

Task 2 | Machine-Code Analysis Framework | Lemma Database 33/27 Read-over-Write Theorem: #2

overlap Program Order i x memory Wi(x)

Task 2 | Machine-Code Analysis Framework | Lemma Database 33/27 Read-over-Write Theorem: #2

overlap Program Order i x memory Wi(x)

Ri: x

Task 2 | Machine-Code Analysis Framework | Lemma Database 33/27 Write-over-Write Theorem: #1

Program independent writes commute safely Order i j

memory

Task 2 | Machine-Code Analysis Framework | Lemma Database 34/27 Write-over-Write Theorem: #1

Program independent writes commute safely Order i j x memory Wi(x)

Task 2 | Machine-Code Analysis Framework | Lemma Database 34/27 Write-over-Write Theorem: #1

Program independent writes commute safely Order i j x y memory Wi(x)

Wj(y)

Task 2 | Machine-Code Analysis Framework | Lemma Database 34/27 Write-over-Write Theorem: #1

Program independent writes commute safely Order i j x y memory Wi(x)

Wj(y)

Program = Order i j

memory

Task 2 | Machine-Code Analysis Framework | Lemma Database 34/27 Write-over-Write Theorem: #1

Program independent writes commute safely Order i j x y memory Wi(x)

Wj(y)

Program = Order i j y memory Wj(y)

Task 2 | Machine-Code Analysis Framework | Lemma Database 34/27 Write-over-Write Theorem: #1

Program independent writes commute safely Order i j x y memory Wi(x)

Wj(y)

Program = Order i j x y memory Wj(y)

Wi(x)

Task 2 | Machine-Code Analysis Framework | Lemma Database 34/27 Write-over-Write Theorem: #2

Program visibility of writes Order i

memory

Task 2 | Machine-Code Analysis Framework | Lemma Database 35/27 Write-over-Write Theorem: #2

Program visibility of writes Order i x memory Wi(x)

Task 2 | Machine-Code Analysis Framework | Lemma Database 35/27 Write-over-Write Theorem: #2

Program visibility of writes Order i y memory Wi(x)

Wi(y)

Task 2 | Machine-Code Analysis Framework | Lemma Database 35/27 Write-over-Write Theorem: #2

Program visibility of writes Order i y memory Wi(x)

Wi(y)

Program = Order i

memory

Task 2 | Machine-Code Analysis Framework | Lemma Database 35/27 Write-over-Write Theorem: #2

Program visibility of writes Order i y memory Wi(x)

Wi(y)

Program = Order i y memory Wi(y)

Task 2 | Machine-Code Analysis Framework | Lemma Database 35/27 Preservation Theorems

i x memory

reading from a valid x86 state

valid-address-p(i) ⋀ valid-x86-p(x86) ⇒ valid-value-p( R i: x ) ⋀ valid-x86-p(x86)

Task 2 | Machine-Code Analysis Framework | Lemma Database 36/27 Preservation Theorems

i x memory

reading from a valid x86 state writing to a valid x86 state

valid-address-p(i) ⋀ valid-address-p(i) ⋀ valid-x86-p(x86) valid-value-p(x) ⋀ ⇒ valid-x86-p(x86) valid-value-p( R i: x ) ⋀ ⇒ valid-x86-p(x86) valid-x86-p( W i (x ) )

Task 2 | Machine-Code Analysis Framework | Lemma Database 36/27 Verification Effort vs. Verification Utility

Task 3 | Program Verification | Effort vs. Utility 37/27 Programmer-level Mode: Model Validation

Task A: Validate the logical mode against the execution mode Task B: Validate the execution mode against the processor + service provided by the OS /27 Programmer-level Mode: Execution Mode

/27 Programmer-level Mode: Execution and Reasoning

/27 Verification Landscape

Verification Tools: high degree of manual state effort explosion problem limited can be analysis applied to Model capabilities Interactive large systems checkers theorem Static & provers coupled Interactive dynamic SAT & SMT with automatic theorem analyzers solvers tools provers

Automatic Interactive

/27