<<

instead of Motivation Three Decades of Runtime Attacks

return-into- Return-oriented Morris Worm libc programming Continuing Arms 1988 Solar Designer Shacham Race 1997 CCS 2007

Borrowed Code Code Chunk Injection Exploitation AlephOne Krahmer 1996 2005 Recent Attacks Attacks on Tor Browser [2013] Stagefright [Drake, BlackHat 2015] FBI Admits It Controlled Tor Servers These issues in Stagefright code critically Behind Mass Malware Attack. expose 95% of Android devices, an estimated 950 million devices

MMS

Adversary

Cisco Router Exploit [2016] The Million Dollar Dissident [2016] Million CISCO ASA Firewalls potentially Government targeted human rights vulnerable to attacks defender with a chain of zero-day exploits to infect his iPhone with spyware.

Adversary Relevance and Impact

High Impact of Attacks

• Web browsers repeatedly exploited in pwn2own contests • Zero-day issues exploited in Stuxnet/Duqu [Microsoft, BH 2012] • iOS jailbreak

Industry Efforts on Defenses • Microsoft EMETCan includes either a ROP be detection bypassed, engine or may not • Microsoft Controlbe Flow sufficiently Guard (CFG) effectivein Windows 10 • Google‘s extension VTV (Virtual Table Verification) [Davi et al, Blackhat2014], [Liebchen et al CCS2015], • ’s Hardware[Schuster, Extension et CET al S&P2015] (Control-flow Enforcement Technology)

Hot Topic of Research

• A large body of recent literature on attacks and defenses Runtime Attacks & Defenses: Continuing Arms Race

Defenses Still seeking practicalAttacks and CFI, ROPGuard, BinCFI, ROPecker, secure solutions kBouncer, CCFIR, vTint, vfGuard, SafeDispatch, MoCFI, ROP wo Returns, RockJIT, TVip, Out-of-Control, StackArmor, CPI/CPS, Stitching the Oxymoron, XnR, Gadgets, SROP, JIT- Isomeron, ROP, BlindROP, O-CFI, COOP, StackDefiler, Readactor, "Missing the HAFIX, point(er)" ... The whole story ….. Runtime Attacks Code-Injection Attack Code-Reuse Attack

Basic Blocks (BBL) A Entry: instruction target of a branch A (e.g., first instruction of a function) Exit: Any branch B (e.g., indirect or direct jump/call, return) B corrupt code C D pointer C D

E F E F X DEPX inject malicious corrupt code pointer code Adversary flow Data Execution Prevention Program flow Return-oriented Programing (ROP): Prominent Code-Reuse Attack

ROP shown to be Turing-complete ROP: Basic Ideas/Steps  Use small instruction sequences instead of whole functions  Instruction sequences have length 2 to 5  All sequences end with a return instruction, or an indirect jump/call  Instruction sequences chained together as gadgets  Gadget perform particular task, e.g., load, store, xor, or branch  Attacks launched by combining gadgets  Generalization of return-to-libc Threat Model: Code-reuse Attacks

Application

RX ? Code 1 Writable Executable Code? Code? ? 2 ⊕ ? ? Opaque Memory Layout RW Data ? Data? Data ? 3 Disclose readable Memory RX ? ? Code? Data? 4 Manipulate writable Memory RW? 5 Computing Engine Data? Main Defenses against Code Reuse

1. Code Randomization

2. Control-Flow Integrity (CFI) Randomization vs. CFI

Randomization Control-flow Integrity Low Performance Overhead Formal Security (Explicit Control Flow Scales well to complex Checks) Software (OS, browser)

Information Disclosure Tradeoff: hard to prevent Performance & Security Challenging to integrate High entropy required in complex software, coverage EPISODE I Code Randomization Make gadgets locations unpredictable Fine-Grained ASLR Application Run 1 Application Run 2 Library (e.g., libc) Library (e.g., libc) Instruction RET Instruction Sequence 3 RET Sequence 1 Instruction RET Instruction Sequence 1 RET Sequence 2 Instruction RET Instruction Sequence 2 RET Sequence 3

 Instruction reordering/substitution within a BBL ORP [Pappas et al., IEEE S&P 2012]  Randomizing each instruction‘s location: ILR [Hiser et al., IEEE S&P 2012]  Permutation of BBLs: STIR [Wartell et al., CCS 2012] & XIFER [with Davi et al., AsiaCCS 2013] Randomization: Memory Leakage Problem

Direct memory disclosure • Pointer leakage on code pages • e.g., direct call and jump instruction

Indirect memory disclosure • Pointer leakage on data pages such as stack or heap • e.g., return addresses, function pointers, pointers in vTables JIT-ROP: Bypassing Randomization via Direct Memory Disclosure

Just-In-Time Code Reuse: On the Effectiveness of Fine-Grained Address Space Layout Randomization IEEE Security and Privacy 2013, and Blackhat 2013 Kevin Z. Snow, Lucas Davi, Alexandra Dmitrienko, Christopher Liebchen, Fabian Monrose, Ahmad-Reza Sadeghi Just-In-Time ROP: Direct Memory Disclosure

1 Undermines fine-grained ASLR

Shows memory disclosures are far more 2 damaging than believed

3 Can be instantiated with real-world exploit Readactor: Towards Resilience to Memory Disclosure

Readactor: Practical Code Randomization Resilient to Memory Disclosure IEEE Security and Privacy 2015 Stephen Crane, Christopher Liebchen, Andrei Homescu, Lucas Davi, Per Larsen, Ahmad-Reza Sadeghi, Stefan Brunthaler, Michael Franz CodeCode Randomization: Randomization: AttackAttack & & Defense Defense Techniques Techniques

Attack Timeline Static Attack RX Application Morris Worm / (Fine-grained) Return to libc [Solar Function_A Randomization Designer Bugtraq’97] ?Execute? - Function_B? Just-In-Time ROP [Snow et ? al. IEEE S&P’13] Direct code only Trampoline? Target disclosure call Function_A Isomeron (Attack) [Davi et poppopMemory ebx?ebx ? Execute-only pop ebx ? ?? al. NDSS’15] poppop? ecx?ecx ? Memory (XoM) ret?ret ? TrampolineXoM Trampoline Indirect code RW Reuse Attacks disclosure Pointers Register Code-pointer Return Address randomization hiding Code Randomization: Attack & Defense Techniques

RX Application Attack Timeline JIT Code JIT Code Attacks Counterfeit Object-oriented Same Protection Programming (COOP) XoM? ? ? as for AOT Code ? ? [Schuster et al. IEEE S&P’15] TrampolineXoM Crash-Resistant Oriented Brute-force X-Virtual Table Programming [Gawlik et al. Attacks on vFunc2 Tramp NDSS’16] Entropy Booby Traps BoobyXoM Trap Terminate vFunc1 Tramp

RW Pointers Function Pointer Trampoline Function Virtual Table ? Reuse for Reuse Attacks Ptrvirt.X Function1-virt table Attack Surface Single Function Trampolines & virt. Function2 Large enough? Pointers Booby Traps EPISODE II Control-Flow Integrity (CFI) Restricting indirect targets to a pre-defined control-flow graph Original CFI Label Checking [Abadi et al., CCS 2005 & TISSEC 2009] BBL A label_A ENTRY asm_ins, … EXIT

ATwo Questions CFI CHECK: 1. Benign and correct execution?EXIT(A) -> label_B ? C 2. BRuntimeBBL enforcement? B label_B ENTRY asm_ins, … EXIT CFI: CFG Analysis and Coverage Problem

CFG Analysis • Conservative “points-to” analysis • e.g., over-approximate to avoid breaking the program

CFG Coverage • Precision of CFG analysis determines security of CFI policy • e.g., more precise  more secure Which Instructions to Protect?

• Purpose: Return to calling function Returns • CFI Relevance: Return address located on stack

Indirect • Purpose: tables, dispatch to library functions • CFI Relevance: Target address taken from either Jumps register or memory

Indirect • Purpose: call through function pointer, virtual table calls • CFI Relevance: Target address taken from either processor Calls register or memory Label Granularity: Trade-Offs (1/2)  Many CFI checks are required if unique labels are assigned per node

CFI Check Exit(B) == Label_1 [Label_3, Label_4, Label_5] A Basic Block B Label_2 Label

Label_3 C D Label_4

E F Label_5 Label_6 Label Granularity: Trade-Offs (2/2)  Optimization step: Merge labels to allow single CFI check  However, this allows for unintended control-flow paths

CFI Check Exit(B) == Label_3 A Label_1 Basic Block B Label_2 Label

Label_3 C D Label_3Label_4

Exit(C) == Label_3 E F Label_5Label_3 Label_6 Label Problem for Returns  Static CFI label checking  Shadow stack allows for leads to coarse-grained fine-grained return protection for returns address protection but incurs higher overhead

Forward- CALL RET Edge CFI Backup Check Label_1 Label_2 Shadow Stack A‘ A B B‘ Backup storage for CALL return addresses R Return Addr … … RET Backward -Edge CFI Exit(R) == [Label_1, Label_2] Return Addr A’ Forward- vs. Backward-Edge  Some CFI schemes consider only forward-edge CFI  Google’s VTV and IFCC [Tice et al., USENIX Sec 2015]  SAFEDISPATCH [Jang et al., NDSS 2014]  And many more: TVIP, VTint, vfguard  Assumption: Backward-edge CFI through stack protection  Problems of stack protections:  Stack Canaries: memory disclosure of canary  ASLR (base address randomization of stack): memory disclosure of base address  Variable reordering (memory disclosure) Losing Control: On the Effectiveness of Control-Flow Integrity under Stack Attacks ACM CCS 2015 Christopher Liebchen, Marco Negro, Per Larsen, Lucas Davi, Ahmad-Reza Sadeghi, Stephen Crane, Mohaned Qunaibit, Michael Franz, Mauro Conti StackDefiler

• Goal: • Bypass fine-grained Control-Flow Integrity • IFCC & VTV (CFI implementations by Google for GCC and LLVM) • Approach: • Due to optimization by compiler critical CFI pointer is spilled on the stack • StackDefiler discloses the stack address and overwrites the spilled CFI pointer • At restoring of spilled registers a malicious CFI pointer is used for future CFI checks • No stack-based vulnerability needed Bypassing (Coarse-grained) CFI

COOP IEEE S&P 2015 USENIX Security 2014 Felix Schuster, Thomas Tendyck, Lucas Davi, Daniel Lehmann, Christopher Liebchen, Lucas Davi, Ahmad-Reza Sadeghi, Fabian Monrose Ahmad-Reza Sadeghi, Thorsten Holz Coarse-grained CFI: Lessons Learned Control-Flow Integrity

1. Too many callOut sites of control available Control-Flow Bending → Restrict returns[Göktas toet al.,their actual[Carlini caller et al., (shadow stack) IEEE S&P 2014] USENIX Sec. 2015] 2. Heuristics are ad-hoc and ineffective Stitching the gadgets FlowStich → Adjusted sequence[Davi et al., length leads[Hu et to al., high false positive USENIX Sec. 2014] USENIX Sec. 2015] 3. Too many indirect jump and call targets ROP is still dangerous Control Jujutsu  Resolving indirect[Carlini et al.,jumps and[Evans calls et alis., non-trivial → Compromise:USENIX Compiler Sec. 2014] supportCCS 2015] Size does matter StackDefiler [Göktas et al., [Conti et al., USENIX Sec. 2014] CCS 2015]

COOP Signal-oriented Programming (SROP) [Schuster et al., [Bosman et al., IEEE S&P 2015] IEEE S&P 2014] Hardware CFI Why Leveraging Hardware for CFI ?

Efficiency Security

Dedicated CFI instructions Isolated CFI storage

CFI_RETURN CFI Memory

CFI_JUMP Branch Targets CFI_CALL Why CFI Processor Support?

CFI Processor Support based on Instruction set architecture (ISA) extensions

Dedicated CFI instructions

Avoids offline training phase

Instant attack detection CFI control state: Binding CFI data to CFI state and instructions HAFIX++

Strategy Without Tactics: Policy-Agnostic Hardware-Enhanced Control-Flow Integrity Design Automation Conference (DAC 2016) Dean Sullivan, Orlando Arias, Lucas Davi, Per Larsen, Ahmad-Reza Sadeghi, Yier Jin Objectives Backward-Edge and Stateful, CFI policy agnostic Forward-Edge CFI No burden on developer No code annotations/changes

Security Hardware protection On-Chip Memory for CFI Data No unintended sequences High performance < 3% overhead

Enabling technology All applications can use CFI features Support of Multitasking Compatibility to legacy code CFI and non-CFI code on same platform HAFIX++ Fine-Grained CFI State Model

Initialization FE Control-Flow BE Control-Flow

CFI disabled insn State 2 • Support for bothpre-call CFI/nonSave FE- CFIcall processes State 1 target pre-call State 0 CFI State 1 Idle execution • Strict enforcementCFI of unique forward- ret execution pre-jump CFI State 2 edge control-flow targetsState 2 Save return enabled Save FE jump target call/jmp target State 1 State 3 CFI • Strict enforcement of unique backward- State 3 CFI Check execution edge controlCFI Check-flowX targets X State 4 State 4 Stop Stop execution execution HAFIX++ ISA Extensions

cfibr Issued at call site  setup Backward (BW) Edge

cfiret Issue at return site  check BW Edge

cfiprc Issued at call site  setup call target

cfiprj Issued at jump site  setup jump target

cfichk Issued at call/jmp target  check Forward (FW) Edge

Label State • Fine-grained forward edge control-flow policy Stack (LSS) • Separation of call/jump • Unique label per target • Fine-grained backward edge control-flow policy Label State • Return to only most recently issued return label Register (LSR) Indirect Call Policy

Label State Function A Stack (LSS) CFIBR label_A1 State 0 CFI State CFILSR label_B Normal Execution Only CFI instructions CALL *reg allowed label_A1 A1 CFIRET label_A1 Function A Code CALL *reg CFIBR label_A1

Code … CFILSR label_B Function B Function B CFICHK label_B CFICHK label_B B Code Code RET Label State RET CFIRET label_A1 Register (LSR)

label_B Function Return Policy

Function A State 0 CFI State Label State CFIBR label_A1 Normal Execution Only CFI instructions CALL B allowed Stack (LSS) A1 CFIRET label_A1 Function A Code CALL B Code CFIBR label_A1 label_A1 Function B Function B … Code Code CFIRET label_A1 RET RET HAFIX++ Pipeline

Convert CFI to NOP

Fetch Decode Execute Memory Write insn1CFI NOP insn3insn2 insn5insn4 Forward CFI to Label does not match CFI Control Unit CFI Label State → Stop Execution Memory CFI label label Forward label to CFI Control Unit to check activity Label access in dedicated memory Function A (25) Function B (31) Call Function B CFICHK CFICHK 25 31 insn Push Label 251 onto LSS insn CFIBR 251 insn CFIPRC 31 RET CALL *reg Return to Function A CFIRET 251 Label 31 valid insn Store Label Pop252 label off stack CFIPRJ 252 and validate JMP *reg CFICHK 252 insn CFICHK Label 252 valid 252 insn Store Label 31 to LSR CFIBR 253CFIPRC 45 Label State Register Label State Stack CALL *reg CFIRET 2525231 253 25112 insn RET Challenges … Architectural Issues • Runtime overhead caused by CFI instrumentation o Initializing and validating the CFI state upon every FW/BW edge o I- pressure during instruction fetch o Effective CPI • Runtime overhead and problems caused by hardware o Branch instruction occur about every 3-5 instructions o CFI instructions/operations around every one of them o Memory access for CFI metadata is slow o CFI metadata could be corrupted if considered data (StackDefiler) o CFI metadata could be a bottleneck if placed in code The Multiple Callers Problem

Function A Common Callee Function B CFIPRC 45 CFIPRC 33 CALL *reg CALL *reg • We can not assign both 45 and 33 at the same time. Fn Q • We could assign a common label45 to all targets

Fn M• Introduces erroneousFn P edges in theFn R Control Flow GraphFn U 45 → Call targets must be45 disjointed!33 Use a trampoline! 33 Fn N Fn O Fn S Fn T 45 45 33 33 Label confusion! System Challenges

1 Sharing CFI subsystem resources

2 Separation of process states

3 Handling CFI Module Exceptions

4 Handling of legacy code The Scheduling Issue

Process 1 Process 2

0287 LSSP 3536 7932 LSSP 0452 3589 8459 9265 8182 1415 1618 7182 5772 0003 Label State 0002 Label State Label State Label State Stack Register

This is running This is being scheduled The Scheduling Issue

Process 1 Process 2

7932 LSSP 3589 9265 1415 1618 0003 Label State Label State Label State Stack Register Label State Stack Register

This is running This is being scheduled The Stack Issue

2884 We ran out of stack 7950 space! What do we do? 3832 2643 3846 7932 LSSP 3589 9265 1415 0003 Label State Stack The Process Control Block

• Representation of a process to the kernel • In Linux, look for task_struct in include/linux/sched.h • Information contains: • Execution state (runnable, suspended, zombie…) • allocations • Process owner • Process group • Process id • I/O status information • CPU context state Kernel Scheduler Additions read current CFI awareness if CFI is enabled backup CFI state for current read next CFI awareness if CFI is enabled restore CFI state for next else disable CFI subsystem The Scheduling Issue Resolved

CFI Subsystem Process 1 -- PCB

TASK_RUNNING CFI Context CFI_ON

0287 LSSP … 3536 04527932 LSSP Process 2 -- PCB 84593589 81829265 TASK_RUNNING CFI Context 71821415 16185772 CFI_ON 00020003 Label State Label State Stack Register … The Scheduling Issue Resolved

CFI Subsystem Process 1 -- PCB

TASK_RUNNING CFI Context CFI_ON

7932 LSSP Process 2 -- PCB 3589 9265 TASK_RUNNING CFI Context 1415 1618 CFI_OFF 0003 Label State Label State Stack Register … Your stack still overflows or underflows for that matter

• We use the PCB already, add things there on overflow: copy bottom half of current’s LSS to PCB move top half of LSS to bottom set LSSP to new location on underflow: get bottom half of current’s LSS from PCB set LSSP to new location The Stack Issue Resolved

CFI Subsystem Process 1 -- PCB

2884 LSSP TASK_RUNNING 7950 CFI Context CFI_ON 3832 2643 … 3846 7932 3589 9265 1415 1618 0003 Label State Label State Stack Register CFI Faults

• The CFI subsystem detected a CFI violation • Add kernel log entry with CFI fault information • Send SIGKILL to offending process • This kills the process with no chance of a signal handler running Related Works

 HCFI:  New instructions to track control flow  Combines and relocates instructions into pipeline bubble slots  Single threaded, embedded applications only  Intel CET:  Shadow stack for return addresses  New register ssp for the shadow stack  Conventional move instructions cannot be used in shadow stack  New instructions to operate on shadow stack  New instruction for indirect call/jump targets: branchend  Any indirect call/jump can target any valid target Control-flow Enforcement Technology [Intel 2016]

Function A Function B A1 call [D1] ✔ B1 ENDBRANCH A2 call [D2] B2 push eax A3 ins ✘ B3 pop eax A4 return B4 return

✘✔ Function C ✔ C1 ENDBRANCH C2 … C3 return

Shadow Stack Stack Data

A2 A23 D1 B1

D2 BCB311 Control-flow Enforcement Technology [Intel 2016]

• Backward edge: • Shadow stack detects return-address manipulation • Shadow stack protected, cannot be accessed by attacker • New register ssp for the shadow stack • Conventional move instructions cannot be used in shadow stack • New instructions to operate on shadow stack

• Forward edge: • New instruction for indirect call/jump targets: branchend • Any indirect call/jump can target any valid indirect branch target • Could be combined with fine-grained compiler-based CFI (LLVM CFI) Comparison with HAFIX++

Shared library BE-Support FE-Support Granularity Overhead & Multitasking

XFI Budiu et al, ASID 2006 Coarse 3.75%

HAFIX Coarse 2% Davi et al., DAC 2015 Architectural dependent optimizations LandHere Coarse N/A http://landhere.galois.com Can branchHCFI to any call/jump target Christoulakiswith et endbranchal., inst. Fine 1% CODASPY 2016 Intel CET https://software.intel.com/site s/default/files/managed/4d/2a Coarse N/A /control-flow-enforcement- technology-preview.pdf

HAFIX++ Fine 1.75% Sullivan et al., DAC 2016