1

Assured Information Security (@ainfosec)

Encrypted Execution and You

@JacobTorrey

DayCon 9 – October 2015

@JacobTorrey — HARES Outline 2

Introduction

Background

HARES Details

Results

Implications

Conclusions

@JacobTorrey — HARES Amp Up! 3

I was pretty nervous presenting here in front of all of you, luckily one of my idols gave me the courage to stand up here in front of you all!

@JacobTorrey — HARES Who am I? 4

I Advising Research Engineer at Assured Information Security (words are my own)

I Site lead for Denver, CO office; provides InfoSec strategy consulting

I Leads low-level Computer Architectures research group

I Plays in privilege rings ≤ 0

I Ultra-runner, ultra-cyclist & mountaineer

@JacobTorrey — HARES Overview 5

I Encrypted (non-reversable) execution is coming

I HARES provides the ability to execute fully-encrypted binaries with minimal performance impact (˜2%)

I Intel Software Guard Extensions (SGX) provides support for encrypted enclaves

I How will this impact attackers vs. defenders?

@JacobTorrey — HARES Impact 6

I Provides a secure execution capability for running sensitive applications in contested environments

I Could integrate with cloud-computing offerings to minimize trust placed in cloud provider

I Security of solution dependent on , not security-through-obscurity

I Provides technology, does not enforce usage based on “morals”

@JacobTorrey — HARES Problem Statement 7

I Algorithms exposed to copying or theft

I Application code can be used to develop attacks

I Code can be reused for unintended purposes (ROP)

I Malware RE can lead to attribution and C&C attack

@JacobTorrey — HARES Current State-of-the-Art 8

I Obfuscation and packers — Analysis tools and live debugging can recover instructions

I VM-based obfuscation — Can still be mapped to and impacts performance

I Encrypting entire OS with trusted boot — Only prevents against offline attacks

@JacobTorrey — HARES Von Neumann versus Harvard Architectures 9

Von Neumann systems have a unified memory and buses for data and instructions:

Harvard architectures have separate memories for data and instructions:

@JacobTorrey — HARES AES-NI 10

I In response to software-based caching attacks on AES, Intel released instruction set to support AES

I Hardware logic is faster, and more protected

I Supports 128-bit and 256-bit AES

I Provides primitives, still requires engineering to make a safe system on top of these

@JacobTorrey — HARES TRESOR 11

I Uses AES-NI and CPU debug registers to provide accelerated, cold-RAM resistant AES on Linux

I Key loaded early boot

I Kernel patch to prevent applications reading debug registers

@JacobTorrey — HARES TLB-Splitting 12

I Translation lookaside buffer (TLB) is a cache for virtual −→ physical address translations

I Used to improve paging performance

I Logically treated as single entity, physically multiple components I Switches x86 platform from apparent Von Neumann to Harvard architecture:

I Used by PaX/GRSecurity to emulate no-execute bit I Shadow Walker used for memory-hiding root-kit functionality

@JacobTorrey — HARES TLB-Splitting (cont.) 13

Figure 1: TLB during normal operation

Figure 2: Split TLB

@JacobTorrey — HARES TLB-Splitting Impact 14

I TLB-splitting allows, through software, the apparent hardware architecture to be changed, per process

I Most end-point protection capabilities (PatchGuard, AV, anti-root-kit tools, etc.) assume a Von Neumann system; TLB-splitting breaks that assumption and can bypass all of them (I bet your AV vendor forgot to mention that...)

@JacobTorrey — HARES S-TLB 15

With Nehalem micro-architecture, an L2 cache was introduced, the S-TLB; breaks split-TLB assumptions

Figure 3: S-TLB

@JacobTorrey — HARES MoRE: Measurement of Running Executables 16

I DARPA Cyber Fast Track program I Explored using TLB-splitting for measurement/integrity verification of interleaved application

I Immutable code page (can repeatably measure in real-time) I Mutable data page (for variable isolation)

I Used EPT granular permissions to simulate a split-TLB on newer CPUs with S-TLB

I Thin-VMM to simulate Harvard architecture on per-process basis

@JacobTorrey — HARES Why HARES? 17

I HARES is an example of an encrypted capability (enclave), that allows exploration of the implications of encrypted execution

I Intel SGX soon will provide similar capabilities, but through hardware modifications, providing stronger security guarantees

I The following deep-dive will provide background of what the current state-of-the-art is

@JacobTorrey — HARES VMX Thin-Hypervisor 18

I Loaded as Windows 7 kernel driver

I Based on vmcpu root-kit example

I No emulation of devices, OS retains direct access to HW

I Minimal performance impact

I Can use VMCS exit conditions to track certain architectural events

@JacobTorrey — HARES On-CPU AES 19

I “Hoists” TRESOR on-CPU AES into VMM

I AES-NI key stored in CPU debug registers to protect against cold-RAM attack

I Adds VMCS exit condition for debug register accesses (return NULL or silently discard write)

I Decrypts executable sections of program into execute-only memory

@JacobTorrey — HARES TLB-Splitting 20

I All data-fetch requests, even from application itself, directed via EPT/TLB to encrypted page

I All instruction fetches are directed to execute-only, decrypted, memory

I Must track Windows application memory management events (COW) and ensure EPT structures correspond with OS-level structures

@JacobTorrey — HARES EPT Fault Flowchart 21

@JacobTorrey — HARES Tested Applications 22

I Command-line synthetic test-suite

I Microsoft Windows Notepad

I Microsoft Windows Paint

I Microsoft Windows Calculator Usability of HARES-protected applications was not noticeably impacted to end-user

@JacobTorrey — HARES Test Results 23

Overall, average performance impact was ˜2%

Test Name Average Execution Time (s) Average HARES Execution Time (s) Performance Impact (%) Pi 28.173 28.600 1.515 Randomized Sort1 0.016 0.031 95.83 Coin Flip 1.778 1.809 1.762 Timers 15.013 15.023 0.069 Table 1: Performance Results for CLI Test-Suite

1The randomized sort runs for such a short period that the initialization routine of the Windows process creation monitor almost doubles the execution time. If the TLB-splitting and periodic measurement is disabled, the run-time is still almost double. For this test, it is reasonable to assume that there is a constant increase of ˜.015s for each test case run under the VMX hypervisor, and due to this test case’s short duration it skewed the percentage. @JacobTorrey — HARES Demonstration System 24

Specifications for the demonstration system to overcome some prototype limitations:

I Intel Core-i series processor with AES-NI; Windows 7 32-bit OS

I Single processor [numproc=1] (Our thin-VMM only supports single core currently)

I 2GiB RAM [truncatememory=0x80000000] (Windows kernel memory layout changes > 2 GiB, and too lazy to update hard-coded values)

I No PAE/DEP [nx=AlwaysOff and pae=ForceDisable] (Again, hard-coded memory layout code)

@JacobTorrey — HARES Demonstration 25

[ALT-TAB]

@JacobTorrey — HARES Major Challenge: Mixed Code & Data 26

Code and string ”Statistics” sharing same section (.text):

@JacobTorrey — HARES Security Benefits 27

Protects against:

I Reverse-engineering and algorithmic IP theft

I “Weaponization” of crash case into RCE

I Mining application source for ROP gadgets

I Harvard architecture resistant to code injection attacks & pivoting

@JacobTorrey — HARES Weaknesses 28

I JTAG/ICE/XDM

I Memory Dumping

I DMA

I SMM/AMT

I Side-channels

I Emulation/VMM

I Syscall Inference

@JacobTorrey — HARES Overcoming Weaknesses & Future Work 29

Intel SGX will solve many of these, but for HARES:

I VT-d/IOMMU (DMA)

I Cache-as-RAM (memory dumping)

I DRTM launch (VMM)

I Combining with unique compilation

@JacobTorrey — HARES Offensive Use-Cases 30

I Offensive key management is challenging, needs further research into new models for a PKI-like system

I Easier to use when all parties are aware and in agreement

@JacobTorrey — HARES Physically Unclonable Functions 31

I Expose manufacturing variance across devices of same type

I Return unique (unclonable without FIB) values to given input I Examples:

I SRAM initializes to PUF pattern I Flash memory requires different amounts of charge to jump from 1 to 0 I FPGA gate fabric can exhibit PUFs when “racing” ring oscillators

I Provides assurances device has not been modified or simulated

@JacobTorrey — HARES S-DRTM 32

I Provides method to verify trustworthiness of platform without requiring hardware extensions such as Intel’s Trusted Execution Technologies I Typically uses timing-based attestation against remote trusted entity to measure system and loaded code and prevent active RE or emulation/virtualization

2

@JacobTorrey2Martignoni, — HARES L et al. Conqueror: Tamper-Proof Code Execution on Legacy Systems Offensive Key Management 33

I Boot-kit for key loading

I Conqueror for VMM/debugger detection I PUFs for machine-unique keys instead TPM (paired with multi-stage dropper)

I With protected execution and device-specific keys we can bootstrap a soft-TPM!

I Have trusted, opaque implants

@JacobTorrey — HARES AV Heuristics 34

I However, notepad.exe (unencrypted) with a 1-bit change to remove from MS white-list also hits 4/57

@JacobTorrey — HARES Attackers vs. Defenders 35

I There is a track record of attackers evolving to take advantage of technology quicker and more effectively than the defenders it was designed and built for:

I Blue Pill I SMM root-kits I ARM TrustZone root-kits I Encrypted malware?

I Personal incentives are asymmetric between offense and defense, “cyber” is not unique

@JacobTorrey — HARES Attackers vs. Defenders (cont.) 36

I Author of “From Pablo to Osama” makes thesis claim that clandestine organizations will always be more effective and agile due to the substantial personal risk & incentives involved with criminal activities

I Law-enforcement is typically government-run, much more slight personal risk & incentive (hard to fire government employees) I I posit this same dynamic exists in “cyber”: defenders rarely go to jail or lose livelihood from failure, offense can face severe consequences

I If you’re fired due to breach, you can put “Incident Response” on CV and get new job I Interesting to see how US - China cyber agreements will change as indictments keep coming

@JacobTorrey — HARES Concluding Remarks 37

I HARES demonstrates viability of encrypted execution on existing, common hardware

I Reversing difficulty will soon become intractable

I Intel SGX will be an exciting hardware extension to the platform and should be explored

I It will be explored and exploited by attackers far more effectively than by defenders

@JacobTorrey — HARES Related Works 38

I TrulyProtect - AES VM running decryption routine in cache

I PrivateCore (now Facebook) - Linux kernel running in cache, programs encrypted outside of cache

I Intel SGX - Hardware extensions to x86 to support HARES-like capabilities

I Working towards a Palladium-like trusted and opaque execution environment

@JacobTorrey — HARES Acknowledgments 39

I Mark Bridgman (@c0ercion) for his work on this effort

I Mudge (@dotMudge)& DARPA for supporting the precursor work (MoRE)

I Loc Nguyen (@nocsi )& Ryan Stortz (@withzombies) for their input from a reverse engineering perspective

@JacobTorrey — HARES References 40

Formal references can be found in the whitepaper for:

I GRSecurity PAGEEXEC

I Shadow Walker

I Intel SDM

I TRESOR & TRESOR-Hunt

I ARIUM website

I Self-hashing applications

I CoreBoot Cache-as-RAM and more.

@JacobTorrey — HARES Thanks! 41

Hope to see you again soon!

@JacobTorrey — HARES Questions & Discussion 42

I Thanks for your attention!

I Any questions? or let the heckling begin!

I Eleven page white-paper: http://jacobtorrey.com/HARES-WP.pdf

@JacobTorrey — HARES