1
Assured Information Security (@ainfosec)
Encrypted Execution and You
@JacobTorrey
DayCon 9 – October 2015
@JacobTorrey — HARES Outline 2
Introduction
Background
HARES Details
Results
Implications
Conclusions
@JacobTorrey — HARES Amp Up! 3
I was pretty nervous presenting here in front of all of you, luckily one of my idols gave me the courage to stand up here in front of you all!
@JacobTorrey — HARES Who am I? 4
I Advising Research Engineer at Assured Information Security (words are my own)
I Site lead for Denver, CO office; provides InfoSec strategy consulting
I Leads low-level Computer Architectures research group
I Plays in Intel privilege rings ≤ 0
I Ultra-runner, ultra-cyclist & mountaineer
@JacobTorrey — HARES Overview 5
I Encrypted (non-reversable) execution is coming
I HARES provides the ability to execute fully-encrypted binaries with minimal performance impact (˜2%)
I Intel Software Guard Extensions (SGX) provides support for encrypted enclaves
I How will this impact attackers vs. defenders?
@JacobTorrey — HARES Impact 6
I Provides a secure execution capability for running sensitive applications in contested environments
I Could integrate with cloud-computing offerings to minimize trust placed in cloud provider
I Security of solution dependent on encryption, not security-through-obscurity
I Provides technology, does not enforce usage based on “morals”
@JacobTorrey — HARES Problem Statement 7
I Algorithms exposed to copying or theft
I Application code can be used to develop attacks
I Code can be reused for unintended purposes (ROP)
I Malware RE can lead to attribution and C&C attack
@JacobTorrey — HARES Current State-of-the-Art 8
I Obfuscation and packers — Analysis tools and live debugging can recover instructions
I VM-based obfuscation — Can still be mapped to x86 and impacts performance
I Encrypting entire OS with trusted boot — Only prevents against offline attacks
@JacobTorrey — HARES Von Neumann versus Harvard Architectures 9
Von Neumann systems have a unified memory and buses for data and instructions:
Harvard architectures have separate memories for data and instructions:
@JacobTorrey — HARES AES-NI 10
I In response to software-based caching attacks on AES, Intel released instruction set to support AES
I Hardware logic is faster, and more protected
I Supports 128-bit and 256-bit AES
I Provides primitives, still requires engineering to make a safe system on top of these
@JacobTorrey — HARES TRESOR 11
I Uses AES-NI and CPU debug registers to provide accelerated, cold-RAM resistant AES on Linux
I Key loaded early boot
I Kernel patch to prevent applications reading debug registers
@JacobTorrey — HARES TLB-Splitting 12
I Translation lookaside buffer (TLB) is a cache for virtual −→ physical address translations
I Used to improve paging performance
I Logically treated as single entity, physically multiple components I Switches x86 platform from apparent Von Neumann to Harvard architecture:
I Used by PaX/GRSecurity to emulate no-execute bit I Shadow Walker used for memory-hiding root-kit functionality
@JacobTorrey — HARES TLB-Splitting (cont.) 13
Figure 1: TLB during normal operation
Figure 2: Split TLB
@JacobTorrey — HARES TLB-Splitting Impact 14
I TLB-splitting allows, through software, the apparent hardware architecture to be changed, per process
I Most end-point protection capabilities (PatchGuard, AV, anti-root-kit tools, etc.) assume a Von Neumann system; TLB-splitting breaks that assumption and can bypass all of them (I bet your AV vendor forgot to mention that...)
@JacobTorrey — HARES S-TLB 15
With Nehalem micro-architecture, an L2 cache was introduced, the S-TLB; breaks split-TLB assumptions
Figure 3: S-TLB
@JacobTorrey — HARES MoRE: Measurement of Running Executables 16
I DARPA Cyber Fast Track program I Explored using TLB-splitting for measurement/integrity verification of interleaved application
I Immutable code page (can repeatably measure in real-time) I Mutable data page (for variable isolation)
I Used EPT granular permissions to simulate a split-TLB on newer CPUs with S-TLB
I Thin-VMM to simulate Harvard architecture on per-process basis
@JacobTorrey — HARES Why HARES? 17
I HARES is an example of an encrypted capability (enclave), that allows exploration of the implications of encrypted execution
I Intel SGX soon will provide similar capabilities, but through hardware modifications, providing stronger security guarantees
I The following deep-dive will provide background of what the current state-of-the-art is
@JacobTorrey — HARES VMX Thin-Hypervisor 18
I Loaded as Windows 7 kernel driver
I Based on vmcpu root-kit example
I No emulation of devices, OS retains direct access to HW
I Minimal performance impact
I Can use VMCS exit conditions to track certain architectural events
@JacobTorrey — HARES On-CPU AES 19
I “Hoists” TRESOR on-CPU AES into VMM
I AES-NI key stored in CPU debug registers to protect against cold-RAM attack
I Adds VMCS exit condition for debug register accesses (return NULL or silently discard write)
I Decrypts executable sections of program into execute-only memory
@JacobTorrey — HARES TLB-Splitting 20
I All data-fetch requests, even from application itself, directed via EPT/TLB to encrypted page
I All instruction fetches are directed to execute-only, decrypted, memory
I Must track Windows application memory management events (COW) and ensure EPT structures correspond with OS-level structures
@JacobTorrey — HARES EPT Fault Flowchart 21
@JacobTorrey — HARES Tested Applications 22
I Command-line synthetic test-suite
I Microsoft Windows Notepad
I Microsoft Windows Paint
I Microsoft Windows Calculator Usability of HARES-protected applications was not noticeably impacted to end-user
@JacobTorrey — HARES Test Results 23
Overall, average performance impact was ˜2%
Test Name Average Execution Time (s) Average HARES Execution Time (s) Performance Impact (%) Pi 28.173 28.600 1.515 Randomized Sort1 0.016 0.031 95.83 Coin Flip 1.778 1.809 1.762 Timers 15.013 15.023 0.069 Table 1: Performance Results for CLI Test-Suite
1The randomized sort runs for such a short period that the initialization routine of the Windows process creation monitor almost doubles the execution time. If the TLB-splitting and periodic measurement is disabled, the run-time is still almost double. For this test, it is reasonable to assume that there is a constant increase of ˜.015s for each test case run under the VMX hypervisor, and due to this test case’s short duration it skewed the percentage. @JacobTorrey — HARES Demonstration System 24
Specifications for the demonstration system to overcome some prototype limitations:
I Intel Core-i series processor with AES-NI; Windows 7 32-bit OS
I Single processor [numproc=1] (Our thin-VMM only supports single core currently)
I 2GiB RAM [truncatememory=0x80000000] (Windows kernel memory layout changes > 2 GiB, and too lazy to update hard-coded values)
I No PAE/DEP [nx=AlwaysOff and pae=ForceDisable] (Again, hard-coded memory layout code)
@JacobTorrey — HARES Demonstration 25
[ALT-TAB]
@JacobTorrey — HARES Major Challenge: Mixed Code & Data 26
Code and string ”Statistics” sharing same section (.text):
@JacobTorrey — HARES Security Benefits 27
Protects against:
I Reverse-engineering and algorithmic IP theft
I “Weaponization” of crash case into RCE
I Mining application source for ROP gadgets
I Harvard architecture resistant to code injection attacks & pivoting
@JacobTorrey — HARES Weaknesses 28
I JTAG/ICE/XDM
I Memory Dumping
I DMA
I SMM/AMT
I Side-channels
I Emulation/VMM
I Syscall Inference
@JacobTorrey — HARES Overcoming Weaknesses & Future Work 29
Intel SGX will solve many of these, but for HARES:
I VT-d/IOMMU (DMA)
I Cache-as-RAM (memory dumping)
I DRTM launch (VMM)
I Combining with unique compilation
@JacobTorrey — HARES Offensive Use-Cases 30
I Offensive key management is challenging, needs further research into new models for a PKI-like system
I Easier to use when all parties are aware and in agreement
@JacobTorrey — HARES Physically Unclonable Functions 31
I Expose manufacturing variance across devices of same type
I Return unique (unclonable without FIB) values to given input I Examples:
I SRAM initializes to PUF pattern I Flash memory requires different amounts of charge to jump from 1 to 0 I FPGA gate fabric can exhibit PUFs when “racing” ring oscillators
I Provides assurances device has not been modified or simulated
@JacobTorrey — HARES S-DRTM 32
I Provides method to verify trustworthiness of platform without requiring hardware extensions such as Intel’s Trusted Execution Technologies I Typically uses timing-based attestation against remote trusted entity to measure system and loaded code and prevent active RE or emulation/virtualization
2
@JacobTorrey2Martignoni, — HARES L et al. Conqueror: Tamper-Proof Code Execution on Legacy Systems Offensive Key Management 33
I Boot-kit for key loading
I Conqueror for VMM/debugger detection I PUFs for machine-unique keys instead TPM (paired with multi-stage dropper)
I With protected execution and device-specific keys we can bootstrap a soft-TPM!
I Have trusted, opaque implants
@JacobTorrey — HARES AV Heuristics 34
I However, notepad.exe (unencrypted) with a 1-bit change to remove from MS white-list also hits 4/57
@JacobTorrey — HARES Attackers vs. Defenders 35
I There is a track record of attackers evolving to take advantage of technology quicker and more effectively than the defenders it was designed and built for:
I Blue Pill I SMM root-kits I ARM TrustZone root-kits I Encrypted malware?
I Personal incentives are asymmetric between offense and defense, “cyber” is not unique
@JacobTorrey — HARES Attackers vs. Defenders (cont.) 36
I Author of “From Pablo to Osama” makes thesis claim that clandestine organizations will always be more effective and agile due to the substantial personal risk & incentives involved with criminal activities
I Law-enforcement is typically government-run, much more slight personal risk & incentive (hard to fire government employees) I I posit this same dynamic exists in “cyber”: defenders rarely go to jail or lose livelihood from failure, offense can face severe consequences
I If you’re fired due to breach, you can put “Incident Response” on CV and get new job I Interesting to see how US - China cyber agreements will change as indictments keep coming
@JacobTorrey — HARES Concluding Remarks 37
I HARES demonstrates viability of encrypted execution on existing, common hardware
I Reversing difficulty will soon become intractable
I Intel SGX will be an exciting hardware extension to the platform and should be explored
I It will be explored and exploited by attackers far more effectively than by defenders
@JacobTorrey — HARES Related Works 38
I TrulyProtect - AES VM running decryption routine in cache
I PrivateCore (now Facebook) - Linux kernel running in cache, programs encrypted outside of cache
I Intel SGX - Hardware extensions to x86 to support HARES-like capabilities
I Working towards a Palladium-like trusted and opaque execution environment
@JacobTorrey — HARES Acknowledgments 39
I Mark Bridgman (@c0ercion) for his work on this effort
I Mudge (@dotMudge)& DARPA for supporting the precursor work (MoRE)
I Loc Nguyen (@nocsi )& Ryan Stortz (@withzombies) for their input from a reverse engineering perspective
@JacobTorrey — HARES References 40
Formal references can be found in the whitepaper for:
I GRSecurity PAGEEXEC
I Shadow Walker
I Intel SDM
I TRESOR & TRESOR-Hunt
I ARIUM website
I Self-hashing applications
I CoreBoot Cache-as-RAM and more.
@JacobTorrey — HARES Thanks! 41
Hope to see you again soon!
@JacobTorrey — HARES Questions & Discussion 42
I Thanks for your attention!
I Any questions? or let the heckling begin!
I Eleven page white-paper: http://jacobtorrey.com/HARES-WP.pdf
@JacobTorrey — HARES