Encrypted Execution and You
Assured Information Security (@ainfosec)
Encrypted Execution and You
@JacobTorrey
DayCon 9 - October 2015

@JacobTorrey | HARES

Outline
- Introduction
- Background
- HARES Details
- Results
- Implications
- Conclusions

Amp Up!
I was pretty nervous presenting here in front of all of you; luckily, one of my idols gave me the courage to stand up here in front of you all!

Who am I?
- Advising Research Engineer at Assured Information Security (words are my own)
- Site lead for Denver, CO office; provides InfoSec strategy consulting
- Leads low-level Computer Architectures research group
- Plays in Intel privilege rings ≤ 0
- Ultra-runner, ultra-cyclist & mountaineer

Overview
- Encrypted (non-reversible) execution is coming
- HARES provides the ability to execute fully-encrypted binaries with minimal performance impact (~2%)
- Intel Software Guard Extensions (SGX) provides support for encrypted enclaves
- How will this impact attackers vs. defenders?

Impact
- Provides a secure execution capability for running sensitive applications in contested environments
- Could integrate with cloud-computing offerings to minimize trust placed in the cloud provider
- Security of the solution depends on encryption, not security-through-obscurity
- Provides technology, does not enforce usage based on "morals"

Problem Statement
- Algorithms exposed to copying or theft
- Application code can be used to develop attacks
- Code can be reused for unintended purposes (ROP)
- Malware RE can lead to attribution and C&C attack

Current State-of-the-Art
- Obfuscation and packers: analysis tools and live debugging can recover instructions
- VM-based obfuscation: can still be mapped to x86 and impacts performance
- Encrypting entire OS with trusted boot: only protects against offline attacks

Von Neumann versus Harvard Architectures
Von Neumann systems have a unified memory and buses for data and instructions; Harvard architectures have separate memories for data and instructions.

AES-NI
- In response to software-based cache attacks on AES, Intel released an instruction set to support AES
- Hardware logic is faster and better protected
- Supports 128-bit and 256-bit AES
- Provides primitives; still requires engineering to build a safe system on top of these

TRESOR
- Uses AES-NI and CPU debug registers to provide accelerated, cold-RAM resistant AES on Linux
- Key loaded early in boot
- Kernel patch to prevent applications reading debug registers

TLB-Splitting
- Translation lookaside buffer (TLB) is a cache for virtual →
  physical address translations
- Used to improve paging performance
- Logically treated as a single entity, physically multiple components
- Switches x86 platform from apparent Von Neumann to Harvard architecture:
  - Used by PaX/GRSecurity to emulate no-execute bit
  - Shadow Walker used it for memory-hiding root-kit functionality

TLB-Splitting (cont.)
Figure 1: TLB during normal operation
Figure 2: Split TLB

TLB-Splitting Impact
- TLB-splitting allows, through software, the apparent hardware architecture to be changed, per process
- Most end-point protection capabilities (PatchGuard, AV, anti-root-kit tools, etc.) assume a Von Neumann system; TLB-splitting breaks that assumption and can bypass all of them (I bet your AV vendor forgot to mention that...)

S-TLB
With the Nehalem micro-architecture, a second-level (L2) TLB was introduced, the S-TLB; it breaks split-TLB assumptions.
Figure 3: S-TLB

MoRE: Measurement of Running Executables
- DARPA Cyber Fast Track program
- Explored using TLB-splitting for measurement/integrity verification of interleaved application
- Immutable code page (can repeatably measure in real-time)
- Mutable data page (for variable isolation)
- Used EPT granular permissions to simulate a split-TLB on newer CPUs with S-TLB
- Thin-VMM to simulate Harvard architecture on a per-process basis

Why HARES?
- HARES is an example of an encrypted capability (enclave) that allows exploration of the implications of encrypted execution
- Intel SGX will soon provide similar capabilities, but through hardware modifications, providing stronger security guarantees
- The following deep-dive provides background on the current state-of-the-art

VMX Thin-Hypervisor
- Loaded as Windows 7 kernel driver
- Based on vmcpu root-kit example
- No emulation of devices; OS retains direct access to HW
- Minimal performance impact
- Can use VMCS exit conditions to track certain architectural events

On-CPU AES
- "Hoists" TRESOR on-CPU AES into VMM
- AES-NI key stored in CPU debug registers to protect against cold-RAM attack
- Adds VMCS exit condition for debug register accesses (return NULL or silently discard write)
- Decrypts executable sections of program into execute-only memory

TLB-Splitting
- All data-fetch requests, even from the application itself, are directed via EPT/TLB to the encrypted page
- All instruction fetches are directed to execute-only, decrypted memory
- Must track Windows application memory management events (COW) and ensure EPT structures correspond with OS-level structures

EPT Fault Flowchart
[Figure: EPT fault flowchart]

Tested Applications
- Command-line synthetic test-suite
- Microsoft Windows Notepad
- Microsoft Windows Paint
- Microsoft Windows Calculator
Usability of HARES-protected applications was not noticeably affected for the end user.

Test Results
Overall, average performance impact was ~2%

Test Name             Average Execution Time (s)   Average HARES Execution Time (s)   Performance Impact (%)
Pi                    28.173                       28.600                             1.515
Randomized Sort [1]   0.016                        0.031                              95.83
Coin Flip             1.778                        1.809                              1.762
Timers                15.013                       15.023                             0.069

Table 1: Performance Results for CLI Test-Suite

[1] The randomized sort runs for such a short period that the initialization routine of the Windows process
creation monitor almost doubles the execution time. If the TLB-splitting and periodic measurement are disabled, the run-time is still almost double. For this test, it is reasonable to assume that there is a constant increase of ~0.015 s for each test case run under the VMX hypervisor; because this test case is so short, that constant skews the percentage.

Demonstration System
Specifications for the demonstration system to overcome some prototype limitations:
- Intel Core-i series processor with AES-NI; Windows 7 32-bit OS
- Single processor [numproc=1] (our thin-VMM only supports a single core currently)
- 2 GiB RAM [truncatememory=0x80000000] (Windows kernel memory layout changes > 2 GiB, and too lazy to update hard-coded values)
- No PAE/DEP [nx=AlwaysOff and pae=ForceDisable] (again, hard-coded memory layout code)

Demonstration
[ALT-TAB]

Major Challenge: Mixed Code & Data
Code and the string "Statistics" sharing the same section (.text).

Security Benefits
Protects against:
- Reverse-engineering and algorithmic IP theft
- "Weaponization" of crash case into RCE
- Mining application source for ROP gadgets
- Harvard architecture resistant to code injection attacks & pivoting

Weaknesses
- JTAG/ICE/XDM
- Memory Dumping
- DMA
- SMM/AMT
- Side-channels
- Emulation/VMM
- Syscall Inference

Overcoming Weaknesses & Future Work
Intel SGX will solve many of these, but for HARES:
- VT-d/IOMMU (DMA)
- Cache-as-RAM (memory dumping)
- DRTM launch (VMM)
- Combining with unique compilation

Offensive Use-Cases
- Offensive key management is challenging; needs further research into new models for a PKI-like system
- Easier to use when all parties are aware and in agreement

Physically Unclonable Functions
- Expose manufacturing variance across devices of the same type
- Return unique (unclonable without FIB) values for a
  given input
- Examples:
  - SRAM initializes to PUF pattern
  - Flash memory requires different amounts of charge to jump from 1 to 0
  - FPGA gate fabric can exhibit PUFs when "racing" ring oscillators
- Provides assurances the device has not been modified or simulated

S-DRTM
- Provides a method to verify trustworthiness of a platform without requiring hardware extensions such as Intel's Trusted Execution Technology
- Typically uses timing-based attestation against a remote trusted entity to measure system and loaded code and prevent active RE or emulation/virtualization [2]

[2] Martignoni, L. et al. Conqueror: Tamper-Proof Code Execution on Legacy Systems

Offensive Key Management
- Boot-kit for key loading
- Conqueror for VMM/debugger detection
- PUFs for machine-unique keys instead of a TPM (paired with multi-stage dropper)
- With protected execution and device-specific keys we can bootstrap a soft-TPM!
- Have trusted, opaque implants

AV Heuristics
- However, notepad.exe (unencrypted) with a 1-bit change to remove it from the MS white-list also hits 4/57

Attackers vs. Defenders
- There is a track record of attackers evolving to take advantage of technology quicker and more effectively than the defenders it was designed and built for:
  - Blue Pill
  - SMM root-kits
  - ARM TrustZone root-kits
  - Encrypted malware?
- Personal incentives are asymmetric between offense and