Ghosts in a Nutshell Moritz Lipp (@mlqxyz) Claudio Canella (@cc0x1f) Who am I?

Moritz Lipp PhD student @ Graz University of Technology 7 @mlqxyz R [email protected]

1 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Who am I?

Claudio Canella PhD student @ Graz University of Technology 7 @cc0x1f R [email protected]

2 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Media Media Meltdown & Spectre

• Side-Channel Vulnerability Variant 1, 2, 3 • and Variant 3a • Meltdown and Spectre have been disclosed

4 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Motivation

• Variant 1 - Bounds Check Bypass (BCB) • Variant 2 - Branch Target Injection (BTI) • Variant 3 - Rogue Data Cache Load (RDCL) • Variant 3a - Rogue System Register Read (RSRR) • Variant 4 - (SSB) • Variant 1.1 - Bounds Check Bypass Store (BCBS) • Variant 1.2 - Read-only protection bypass (RPB) • Lazy FP State Restore

5 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Motivation

• SpectreRSB - Return Mispredict • • L1 Terminal Fault (L1TF) • Portsmash • Netspectre • SMoTherSpectre • SPOILER

6 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Motivation

• KAISER patch / KPTI / KVA Shadow • Microcode Updates • IBRS / STIPB / IBPB • Retpoline • Taint Tracking • Serialization • InvisiSpec / SafeSpec / DAWG • RSB Stuffing • Site Isolation • SSBD / SSBB • ...

7 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Now you already lost me . . . • Give a comprehensible overview of all attacks and defenses • Show that systematic analysis allows to find new attacks and circumventions of countermeasures

What this talk is

• We want to shed some light and make it less confusing

8 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Show that systematic analysis allows to find new attacks and circumventions of countermeasures

What this talk is

• We want to shed some light and make it less confusing • Give a comprehensible overview of all attacks and defenses

8 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) What this talk is

• We want to shed some light and make it less confusing • Give a comprehensible overview of all attacks and defenses • Show that systematic analysis allows to find new attacks and circumventions of countermeasures

8 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Background • Information leaks due to underlying hardware • Exploit leakage through side-effects

Power Execution Microarchitectural consumption time elements

Side-channel Attacks

• Bug-free software does not mean safe execution

9 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Exploit leakage through side-effects

Power Execution Microarchitectural consumption time elements

Side-channel Attacks

• Bug-free software does not mean safe execution • Information leaks due to underlying hardware

9 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Power Execution Microarchitectural consumption time elements

Side-channel Attacks

• Bug-free software does not mean safe execution • Information leaks due to underlying hardware • Exploit leakage through side-effects

9 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Side-channel Attacks

• Bug-free software does not mean safe execution • Information leaks due to underlying hardware • Exploit leakage through side-effects

Power Execution Microarchitectural consumption time elements

9 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Interface between hardware and software • Microarchitecture is an ISA implementation

Architecture vs Microarchitecture

• Instruction Set Architecture (ISA) is an abstract model of a computer (x86, ARMv8, SPARC, . . . )

10 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Microarchitecture is an ISA implementation

Architecture vs Microarchitecture

• Instruction Set Architecture (ISA) is an abstract model of a computer (x86, ARMv8, SPARC, . . . ) • Interface between hardware and software

10 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Architecture vs Microarchitecture

• Instruction Set Architecture (ISA) is an abstract model of a computer (x86, ARMv8, SPARC, . . . ) • Interface between hardware and software • Microarchitecture is an ISA implementation

10 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Architecture vs Microarchitecture

• Instruction Set Architecture (ISA) is an abstract model of a computer (x86, ARMv8, SPARC, . . . ) • Interface between hardware and software • Microarchitecture is an ISA implementation

10 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Caches and buffer Predictor

• Transparent for the programmer • Timing optimizations → side-channel leakage

Microarchitectural Elements

• Modern CPUs contain multiple microarchitectural elements

11 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Transparent for the programmer • Timing optimizations → side-channel leakage

Microarchitectural Elements

• Modern CPUs contain multiple microarchitectural elements

Caches and buffer Predictor

11 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Timing optimizations → side-channel leakage

Microarchitectural Elements

• Modern CPUs contain multiple microarchitectural elements

Caches and buffer Predictor

• Transparent for the programmer

11 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Microarchitectural Elements

• Modern CPUs contain multiple microarchitectural elements

Caches and buffer Predictor

• Transparent for the programmer • Timing optimizations → side-channel leakage

11 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Caches & Cache Attacks Cache

printf("%d", i);

printf("%d", i);

12 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Cache

Cache miss printf("%d", i);

printf("%d", i);

12 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Cache

Cache miss Request printf("%d", i);

printf("%d", i);

12 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Cache

Cache miss Request printf("%d", i);

printf("%d", i); Response

12 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Cache

Cache miss Request printf("%d", i); i printf("%d", i); Response

12 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Cache

Cache miss Request printf("%d", i); i printf("%d", i); Response Cache hit

12 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Cache

DRAM access, slow

Cache miss Request printf("%d", i); i printf("%d", i); Response Cache hit

12 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Cache

DRAM access, slow

Cache miss Request printf("%d", i); i printf("%d", i); Response Cache hit

No DRAM access, much faster

12 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Caching speeds up Memory Accesses

·104 Cache hit 3 Cache miss

2

1 Number of accesses

0 100 200 300 400 500 600 700 800 900 1,000 1,100 1,200 Measured access time (CPU cycles)

13 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Flush+Reload

14 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Flush+Reload

14 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Flush+Reload

14 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Flush+Reload

14 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Flush+Reload

14 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Flush+Reload

14 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Flush+Reload

14 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Flush+Reload

14 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Only meta data. Not interesting and not in any threat model.

Cache Attacks

• Leak cryptographic keys • Leak information on co-located virtual machines • Monitor function calls of other applications • Build covert communication channels • ...

15 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Cache Attacks

• Leak cryptographic keys • Leak information on co-located virtual machines • Monitor function calls of other applications • Build covert communication channels • ...

Only meta data. Not interesting and not in any threat model.

15 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Out-of-order execution Out-of-order Execution

int width = 10, height = 5;

float diagonal= sqrt(width* width + height* height); int area= width* height;

printf("Area%dx%d=%d\n", width, height, area);

16 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Out-of-order Execution

Parallelize

int width = 10, height = 5;

float diagonal= sqrt(width* width + height* height); int area= width* height;

Dependency printf("Area%dx%d=%d\n", width, height, area);

16 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • dispatched to the backend • processed by individual execution units

Out-of-Order Execution

ITLB L1 Instruction Cache

Branch Instruction Fetch & PreDecode Predictor Instruction Queue

µOP Cache 4-Way Decode Frontend µOPs µOP µOP µOP µOP

MUX

Allocation Queue

µOP µOP µOP µOP Instructions are

CDB Reorder buffer µOP µOP µOP µOP µOP µOP µOP µOP • fetched and decoded in the front-end Scheduler

µOP µOP µOP µOP µOP µOP µOP µOP AGU Load data Load data Store data ALU, Branch ALU, Vect, . . . ALU, AES, . . . ALU, FMA, . . . Execution Engine Execution Units

Load Buffer Store Buffer

DTLB STLB L1 Data Cache Memory

Subsystem L2 Cache

17 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • processed by individual execution units

Out-of-Order Execution

ITLB L1 Instruction Cache

Branch Instruction Fetch & PreDecode Predictor Instruction Queue

µOP Cache 4-Way Decode Frontend µOPs µOP µOP µOP µOP

MUX

Allocation Queue

µOP µOP µOP µOP Instructions are

CDB Reorder buffer µOP µOP µOP µOP µOP µOP µOP µOP • fetched and decoded in the front-end Scheduler µOP µOP µOP µOP µOP µOP µOP µOP • dispatched to the backend AGU Load data Load data Store data ALU, Branch ALU, Vect, . . . ALU, AES, . . . ALU, FMA, . . . Execution Engine Execution Units

Load Buffer Store Buffer

DTLB STLB L1 Data Cache Memory

Subsystem L2 Cache

17 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Out-of-Order Execution

ITLB L1 Instruction Cache

Branch Instruction Fetch & PreDecode Predictor Instruction Queue

µOP Cache 4-Way Decode Frontend µOPs µOP µOP µOP µOP

MUX

Allocation Queue

µOP µOP µOP µOP Instructions are

CDB Reorder buffer µOP µOP µOP µOP µOP µOP µOP µOP • fetched and decoded in the front-end Scheduler µOP µOP µOP µOP µOP µOP µOP µOP • dispatched to the backend AGU Load data Load data Store data ALU, Branch ALU, Vect, . . . ALU, AES, . . . ALU, FMA, . . . Execution Engine • processed by individual execution units Execution Units

Load Buffer Store Buffer

DTLB STLB L1 Data Cache Memory

Subsystem L2 Cache

17 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • wait until their dependencies are ready • Later instructions might execute prior earlier instructions • retire in-order • State becomes architecturally visible • Exceptions are checked during retirement • Flush pipeline and recover state

Out-of-Order Execution

ITLB L1 Instruction Cache

Branch Instruction Fetch & PreDecode Predictor Instruction Queue Instructions µOP Cache 4-Way Decode Frontend µOPs µOP µOP µOP µOP MUX • are executed out-of-order Allocation Queue

µOP µOP µOP µOP

CDB Reorder buffer

µOP µOP µOP µOP µOP µOP µOP µOP

Scheduler

µOP µOP µOP µOP µOP µOP µOP µOP AGU Load data Load data Store data ALU, Branch ALU, Vect, . . . ALU, AES, . . . ALU, FMA, . . . Execution Engine Execution Units

Load Buffer Store Buffer

DTLB STLB L1 Data Cache Memory

Subsystem L2 Cache

18 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Later instructions might execute prior earlier instructions • retire in-order • State becomes architecturally visible • Exceptions are checked during retirement • Flush pipeline and recover state

Out-of-Order Execution

ITLB L1 Instruction Cache

Branch Instruction Fetch & PreDecode Predictor Instruction Queue Instructions µOP Cache 4-Way Decode Frontend µOPs µOP µOP µOP µOP MUX • are executed out-of-order Allocation Queue µOP µOP µOP µOP • wait until their dependencies are ready CDB Reorder buffer

µOP µOP µOP µOP µOP µOP µOP µOP

Scheduler

µOP µOP µOP µOP µOP µOP µOP µOP AGU Load data Load data Store data ALU, Branch ALU, Vect, . . . ALU, AES, . . . ALU, FMA, . . . Execution Engine Execution Units

Load Buffer Store Buffer

DTLB STLB L1 Data Cache Memory

Subsystem L2 Cache

18 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • retire in-order • State becomes architecturally visible • Exceptions are checked during retirement • Flush pipeline and recover state

Out-of-Order Execution

ITLB L1 Instruction Cache

Branch Instruction Fetch & PreDecode Predictor Instruction Queue Instructions µOP Cache 4-Way Decode Frontend µOPs µOP µOP µOP µOP MUX • are executed out-of-order Allocation Queue µOP µOP µOP µOP • wait until their dependencies are ready CDB Reorder buffer

µOP µOP µOP µOP µOP µOP µOP µOP

Scheduler • Later instructions might execute prior earlier instructions

µOP µOP µOP µOP µOP µOP µOP µOP AGU Load data Load data Store data ALU, Branch ALU, Vect, . . . ALU, AES, . . . ALU, FMA, . . . Execution Engine Execution Units

Load Buffer Store Buffer

DTLB STLB L1 Data Cache Memory

Subsystem L2 Cache

18 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • State becomes architecturally visible • Exceptions are checked during retirement • Flush pipeline and recover state

Out-of-Order Execution

ITLB L1 Instruction Cache

Branch Instruction Fetch & PreDecode Predictor Instruction Queue Instructions µOP Cache 4-Way Decode Frontend µOPs µOP µOP µOP µOP MUX • are executed out-of-order Allocation Queue µOP µOP µOP µOP • wait until their dependencies are ready CDB Reorder buffer

µOP µOP µOP µOP µOP µOP µOP µOP

Scheduler • Later instructions might execute prior earlier instructions µOP µOP µOP µOP µOP µOP µOP µOP • retire in-order AGU Load data Load data Store data ALU, Branch ALU, Vect, . . . ALU, AES, . . . ALU, FMA, . . . Execution Engine Execution Units

Load Buffer Store Buffer

DTLB STLB L1 Data Cache Memory

Subsystem L2 Cache

18 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Exceptions are checked during retirement • Flush pipeline and recover state

Out-of-Order Execution

ITLB L1 Instruction Cache

Branch Instruction Fetch & PreDecode Predictor Instruction Queue Instructions µOP Cache 4-Way Decode Frontend µOPs µOP µOP µOP µOP MUX • are executed out-of-order Allocation Queue µOP µOP µOP µOP • wait until their dependencies are ready CDB Reorder buffer

µOP µOP µOP µOP µOP µOP µOP µOP

Scheduler • Later instructions might execute prior earlier instructions µOP µOP µOP µOP µOP µOP µOP µOP • retire in-order AGU Load data Load data Store data ALU, Branch ALU, Vect, . . .

ALU, AES, . . . • State becomes architecturally visible ALU, FMA, . . . Execution Engine Execution Units

Load Buffer Store Buffer

DTLB STLB L1 Data Cache Memory

Subsystem L2 Cache

18 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Flush pipeline and recover state

Out-of-Order Execution

ITLB L1 Instruction Cache

Branch Instruction Fetch & PreDecode Predictor Instruction Queue Instructions µOP Cache 4-Way Decode Frontend µOPs µOP µOP µOP µOP MUX • are executed out-of-order Allocation Queue µOP µOP µOP µOP • wait until their dependencies are ready CDB Reorder buffer

µOP µOP µOP µOP µOP µOP µOP µOP

Scheduler • Later instructions might execute prior earlier instructions µOP µOP µOP µOP µOP µOP µOP µOP • retire in-order AGU Load data Load data Store data ALU, Branch ALU, Vect, . . .

ALU, AES, . . . • State becomes architecturally visible ALU, FMA, . . . Execution Engine Execution Units • Exceptions are checked during retirement

Load Buffer Store Buffer

DTLB STLB L1 Data Cache Memory

Subsystem L2 Cache

18 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Out-of-Order Execution

ITLB L1 Instruction Cache

Branch Instruction Fetch & PreDecode Predictor Instruction Queue Instructions µOP Cache 4-Way Decode Frontend µOPs µOP µOP µOP µOP MUX • are executed out-of-order Allocation Queue µOP µOP µOP µOP • wait until their dependencies are ready CDB Reorder buffer

µOP µOP µOP µOP µOP µOP µOP µOP

Scheduler • Later instructions might execute prior earlier instructions µOP µOP µOP µOP µOP µOP µOP µOP • retire in-order AGU Load data Load data Store data ALU, Branch ALU, Vect, . . .

ALU, AES, . . . • State becomes architecturally visible ALU, FMA, . . . Execution Engine Execution Units • Exceptions are checked during retirement

Load Buffer Store Buffer

DTLB STLB • Flush pipeline and recover state L1 Data Cache Memory

Subsystem L2 Cache

18 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) The state does not become architecturally visible but . . . • Do they leave any side effects? • Can we observe them?

Thinking

• What happens to the instructions that are dismissed?

19 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Can we observe them?

Thinking

• What happens to the instructions that are dismissed? • Do they leave any side effects?

19 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Thinking

• What happens to the instructions that are dismissed? • Do they leave any side effects? • Can we observe them?

19 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Toy example

*(volatile char*) 0; // raise_exception(); array[84 * 4096] = 0;

20 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • “Unreachable” code line was actually executed • Exception was only thrown afterwards • Out-of-order instructions leave microarchitectural traces • Give such instructions a name: transient instructions

Building the Code

• Flush+Reload over all pages of the array

500 400 300 [cycles] Access time 0 50 100 150 200 250 Page

21 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Exception was only thrown afterwards • Out-of-order instructions leave microarchitectural traces • Give such instructions a name: transient instructions

Building the Code

• Flush+Reload over all pages of the array

500 400 300 [cycles] Access time 0 50 100 150 200 250 Page

• “Unreachable” code line was actually executed

21 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Out-of-order instructions leave microarchitectural traces • Give such instructions a name: transient instructions

Building the Code

• Flush+Reload over all pages of the array

500 400 300 [cycles] Access time 0 50 100 150 200 250 Page

• “Unreachable” code line was actually executed • Exception was only thrown afterwards

21 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Give such instructions a name: transient instructions

Building the Code

• Flush+Reload over all pages of the array

500 400 300 [cycles] Access time 0 50 100 150 200 250 Page

• “Unreachable” code line was actually executed • Exception was only thrown afterwards • Out-of-order instructions leave microarchitectural traces

21 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Building the Code

• Flush+Reload over all pages of the array

500 400 300 [cycles] Access time 0 50 100 150 200 250 Page

• “Unreachable” code line was actually executed • Exception was only thrown afterwards • Out-of-order instructions leave microarchitectural traces • Give such instructions a name: transient instructions

21 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Does the CPU ignore the semantics of exceptions before handling it? • This isolation is a combination of hardware and software • User applications cannot access anything from the kernel

Memory Isolation

Userspace Kernelspace • Kernel is isolated from user space

Operating Applications System Memory

22 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • User applications cannot access anything from the kernel

Memory Isolation

Userspace Kernelspace • Kernel is isolated from user space • This isolation is a combination of hardware and software

Operating Applications System Memory

22 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Memory Isolation

Userspace Kernelspace • Kernel is isolated from user space • This isolation is a combination of hardware and software • User applications cannot access Operating Applications System Memory anything from the kernel

22 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Physical memory is organized in page frames • Virtual memory pages are mapped to page frames using page tables

Paging

• CPU support virtual address spaces to isolate processes

23 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Virtual memory pages are mapped to page frames using page tables

Paging

• CPU support virtual address spaces to isolate processes • Physical memory is organized in page frames

23 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Paging

• CPU support virtual address spaces to isolate processes • Physical memory is organized in page frames • Virtual memory pages are mapped to page frames using page tables

23 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Address Translation on x86-64

PML4 CR3 PML4E 0 PML4E 1 · · PDPT #PML4I PDPTE 0 · · PDPTE 1 PML4E 511 · · Page Directory #PDPTI PDE 0 · · PDE 1 PDPTE 511 · · Page Table PDE #PDI PTE 0 · · PTE 1 PDE 511 · · 4 KiB Page PTE #PTI Byte 0 · · Byte 1 PTE 511 · · Offset · · PML4I (9 b) PDPTI (9 b) PDI (9 b) PTI (9 b) Offset (12 b) Byte 4095 48-bit virtual address

24 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Address Translation on x86-64

PML4 CR3 PML4E 0 PML4E 1 · · PDPT #PML4I PDPTE 0 · · PDPTE 1 PML4E 511 · · Page Directory #PDPTI PDE 0 · · PDE 1 PDPTE 511 · · Page Table PDE #PDI PTE 0 · · PTE 1 PDE 511 · · 4 KiB Page PTE #PTI Byte 0 · · Byte 1 PTE 511 · · Offset · · PML4I (9 b) PDPTI (9 b) PDI (9 b) PTI (9 b) Offset (12 b) Byte 4095 48-bit virtual address

24 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Page Table Entry

P RW US WT UC R D S G Ignored Physical Page Number

Ignored PK X

• User/Supervisor bit defines in which privilege level the page can be accessed

25 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Then check whether any part of array is cached

Building the Code

• Add another layer of indirection to test

char data = *(char*) 0xffffffff81a000e0; // kernel address array[data * 4096] = 0;

26 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Building the Code

• Add another layer of indirection to test

char data = *(char*) 0xffffffff81a000e0; // kernel address array[data * 4096] = 0;

• Then check whether any part of array is cached

26 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Permission check is in some cases not fast enough

Building the Code

• Flush+Reload over all pages of the array

500 400 300 [cycles] Access time 0 50 100 150 200 250 Page

• Index of cache hit reveals data

27 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Building the Code

• Flush+Reload over all pages of the array

500 400 300 [cycles] Access time 0 50 100 150 200 250 Page

• Index of cache hit reveals data • Permission check is in some cases not fast enough

27 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f)

• Entire physical memory is typically accessible through kernel space

Meltdown

• Using out-of-order execution, we can read data at any address

28 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Meltdown

• Using out-of-order execution, we can read data at any address • Entire physical memory is typically accessible through kernel space

28 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Experiment where a thread flushes the value constantly and a thread on a different core reloads the value • Target data is not in the L1 cache of the attacking core • We can still leak the data at a lower reading rate • Meltdown might implicitly cache the data

Uncached memory

• Assumed that one can only read data stored in the L1 with Meltdown

29 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Target data is not in the L1 cache of the attacking core • We can still leak the data at a lower reading rate • Meltdown might implicitly cache the data

Uncached memory

• Assumed that one can only read data stored in the L1 with Meltdown • Experiment where a thread flushes the value constantly and a thread on a different core reloads the value

29 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • We can still leak the data at a lower reading rate • Meltdown might implicitly cache the data

Uncached memory

• Assumed that one can only read data stored in the L1 with Meltdown • Experiment where a thread flushes the value constantly and a thread on a different core reloads the value • Target data is not in the L1 cache of the attacking core

29 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Meltdown might implicitly cache the data

Uncached memory

• Assumed that one can only read data stored in the L1 with Meltdown • Experiment where a thread flushes the value constantly and a thread on a different core reloads the value • Target data is not in the L1 cache of the attacking core • We can still leak the data at a lower reading rate

29 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Uncached memory

• Assumed that one can only read data stored in the L1 with Meltdown • Experiment where a thread flushes the value constantly and a thread on a different core reloads the value • Target data is not in the L1 cache of the attacking core • We can still leak the data at a lower reading rate • Meltdown might implicitly cache the data

29 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Uncached memory

• Assumed that one can only read data stored in the L1 with Meltdown • Experiment where a thread flushes the value constantly and a thread on a different core reloads the value • Target data is not in the L1 cache of the attacking core • We can still leak the data at a lower reading rate • Meltdown might implicitly cache the data

29 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Which bits are left in the PTE? Page Table Entry

P RW US WT UC R D S G Ignored Physical Page Number

Ignored PK X

• Present bit defines whether a page is present in physical memory.

30 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Use virtual machine → completely controlled by attacker

Problem

• Can not really manage the PTE from user space

31 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Problem

• Can not really manage the PTE from user space • Use virtual machine → completely controlled by attacker

31 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Virtual Machine: Extended Page Table Nested Walk Nested Walk Nested Walk Nested Walk ...... #0 #1 #0 #1 #0 #1 #0 #1 #511 #511 #511 #511 Offset Byte 0 Byte 1 Byte 4095 Page Table 4 KiB Page PTE (guest) PDE (guest) Pointer Table Page Directory Page Directory PML4E (guest) PDPTE (guest) Pag-Map Level-4 Nested Walk CR3 (guest)

63 0

Sign extended (16 bit) PML4I (9 bit) PDPTI (9 bit) PDI (9 bit) PTI (9 bit) Offset (12 bit)

64-bit guest virtual address

32 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Present Bit

Page Table PTE 0 PTE 1 · · PTE #PTI · · PTE 511

L1 Cache

33 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Present Bit

Page Table PTE 0 PTE 1 · · present PTE #PTI · · PTE 511

L1 Cache

33 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Present Bit

Page Table PTE 0 PTE 1 · · present Guest Physical PTE #PTI · to Host Physical · PTE 511

L1 Cache

33 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Present Bit

Page Table PTE 0 PTE 1 · · present Guest Physical PTE #PTI Physical · to Host Physical · Page PTE 511

L1 lookup L1 with Cache physical address

33 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Present Bit

Page Table PTE 0 PTE 1 · · not present PTE #PTI · · PTE 511

L1 Cache

33 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Present Bit

Page Table PTE 0 PTE 1 · · not present PTE #PTI · · PTE 511

L1 lookup L1 with Cache virtual address

33 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Demo • Leak data from L1 data cache • Affects virtual machines (VM), hypervisors (VMM), operating systems (OS) and system management mode (SMM) • Read SGX-protected memory and leak machine’s private attestation key

Impact

• Foreshadow or L1TF

34 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Affects virtual machines (VM), hypervisors (VMM), operating systems (OS) and system management mode (SMM) • Read SGX-protected memory and leak machine’s private attestation key

Impact

• Foreshadow or L1TF • Leak data from L1 data cache

34 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Read SGX-protected memory and leak machine’s private attestation key

Impact

• Foreshadow or L1TF • Leak data from L1 data cache • Affects virtual machines (VM), hypervisors (VMM), operating systems (OS) and system management mode (SMM)

34 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Impact

• Foreshadow or L1TF • Leak data from L1 data cache • Affects virtual machines (VM), hypervisors (VMM), operating systems (OS) and system management mode (SMM) • Read SGX-protected memory and leak machine’s private attestation key

34 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) There are still bits left in the PTE Page Table Entry

P RW US WT UC R D S G Ignored Physical Page Number

Ignored PK X

• PK bits define the assigned protection key

35 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • 4 bits in PTE identify key for protected memory regions • Quick update of access rights • Available on Xeon CPUs

Intel MPK

• Protection key for a group of pages

36 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Quick update of access rights • Available on Intel Xeon CPUs

Intel MPK

• Protection key for a group of pages • 4 bits in PTE identify key for protected memory regions

36 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Available on Intel Xeon CPUs

Intel MPK

• Protection key for a group of pages • 4 bits in PTE identify key for protected memory regions • Quick update of access rights

36 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Intel MPK

• Protection key for a group of pages • 4 bits in PTE identify key for protected memory regions • Quick update of access rights • Available on Intel Xeon CPUs

36 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Protected value is forwarded to transient instructions

Protection Keys

• Protection keys are lazily enforced

37 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Protection Keys

• Protection keys are lazily enforced • Protected value is forwarded to transient instructions

37 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Other exceptions? • Data used in transient execution • Attacker can determine data • First Meltdown-type attack on AMD

Bound Range Exceeded

• x86 provides dedicated instruction raising #BR exception if bound-range is exceeded

38 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Attacker can determine data • First Meltdown-type attack on AMD

Bound Range Exceeded

• x86 provides dedicated instruction raising #BR exception if bound-range is exceeded • Data used in transient execution

38 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • First Meltdown-type attack on AMD

Bound Range Exceeded

• x86 provides dedicated instruction raising #BR exception if bound-range is exceeded • Data used in transient execution • Attacker can determine data

38 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Bound Range Exceeded

• x86 provides dedicated instruction raising #BR exception if bound-range is exceeded • Data used in transient execution • Attacker can determine data • First Meltdown-type attack on AMD

38 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) We tested all of them Non-Executable Present Read/Write Alignment Check

Protection Key Page Fault User/Supervisor Stack Segment

SMAP General Protection

Device not available Bound-Range-Exceeded Illegal Opcode

MPX Bound Division-by-Zero

Take a closer look . . . again

Exceptions

39 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Non-Executable Present Read/Write Alignment Check

Protection Key User/Supervisor Stack Segment

SMAP General Protection

Device not available Bound-Range-Exceeded Illegal Opcode

MPX Bound Division-by-Zero

Take a closer look . . . again

Page Fault

Exceptions

39 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Alignment Check

Stack Segment

General Protection

Device not available Bound-Range-Exceeded Illegal Opcode

MPX Bound Division-by-Zero

Take a closer look . . . again

Non-Executable Present Read/Write

Protection Key Page Fault User/Supervisor

SMAP Exceptions

39 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Alignment Check

Stack Segment

General Protection

Bound-Range-Exceeded Illegal Opcode

MPX Bound Division-by-Zero

Take a closer look . . . again

Non-Executable Present Read/Write

Protection Key Page Fault User/Supervisor

SMAP Exceptions

Device not available

39 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Stack Segment

General Protection

Bound-Range-Exceeded Illegal Opcode

MPX Bound Division-by-Zero

Take a closer look . . . again

Non-Executable Present Read/Write Alignment Check

Protection Key Page Fault User/Supervisor

SMAP Exceptions

Device not available

39 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Stack Segment

General Protection

Bound-Range-Exceeded Illegal Opcode

MPX Bound

Take a closer look . . . again

Non-Executable Present Read/Write Alignment Check

Protection Key Page Fault User/Supervisor

SMAP Exceptions

Device not available

Division-by-Zero

39 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Stack Segment

General Protection

Bound-Range-Exceeded

MPX Bound

Take a closer look . . . again

Non-Executable Present Read/Write Alignment Check

Protection Key Page Fault User/Supervisor

SMAP Exceptions

Device not available Illegal Opcode

Division-by-Zero

39 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) General Protection

Bound-Range-Exceeded

MPX Bound

Take a closer look . . . again

Non-Executable Present Read/Write Alignment Check

Protection Key Page Fault User/Supervisor Stack Segment

SMAP Exceptions

Device not available Illegal Opcode

Division-by-Zero

39 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Bound-Range-Exceeded

MPX Bound

Take a closer look . . . again

Non-Executable Present Read/Write Alignment Check

Protection Key Page Fault User/Supervisor Stack Segment

SMAP Exceptions General Protection

Device not available Illegal Opcode

Division-by-Zero

39 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) MPX Bound

Take a closer look . . . again

Non-Executable Present Read/Write Alignment Check

Protection Key Page Fault User/Supervisor Stack Segment

SMAP Exceptions General Protection

Device not available Bound-Range-Exceeded Illegal Opcode

Division-by-Zero

39 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Take a closer look . . . again

Non-Executable Present Read/Write Alignment Check

Protection Key Page Fault User/Supervisor Stack Segment

SMAP Exceptions General Protection

Device not available Bound-Range-Exceeded Illegal Opcode

MPX Bound Division-by-Zero

39 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Take a closer look . . . again

Non-Executable Present Read/Write Alignment Check

Protection Key Page Fault User/Supervisor Stack Segment

SMAP Exceptions General Protection

Device not available Bound-Range-Exceeded Illegal Opcode

MPX Bound Division-by-Zero

39 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Take a closer look . . . again

Non-Executable Present Read/Write Alignment Check

Protection Key Page Fault User/Supervisor Stack Segment

SMAP Exceptions General Protection

Device not available Bound-Range-Exceeded Illegal Opcode

MPX Bound Division-by-Zero

39 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Propose: use exception/bit for naming attack → Meltdown = Meltdown-US → Foreshadow = Meltdown-P → Meltdown 3a = Meltdown-GP → Protection Keys = Meltdown-PK → ...

New Naming Scheme

• Names are still confusing

40 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) → Meltdown = Meltdown-US → Foreshadow = Meltdown-P → Meltdown 3a = Meltdown-GP → Protection Keys = Meltdown-PK → ...

New Naming Scheme

• Names are still confusing • Propose: use exception/bit for naming attack

40 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) → Foreshadow = Meltdown-P → Meltdown 3a = Meltdown-GP → Protection Keys = Meltdown-PK → ...

New Naming Scheme

• Names are still confusing • Propose: use exception/bit for naming attack → Meltdown = Meltdown-US

40 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) → Meltdown 3a = Meltdown-GP → Protection Keys = Meltdown-PK → ...

New Naming Scheme

• Names are still confusing • Propose: use exception/bit for naming attack → Meltdown = Meltdown-US → Foreshadow = Meltdown-P

40 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) → Protection Keys = Meltdown-PK → ...

New Naming Scheme

• Names are still confusing • Propose: use exception/bit for naming attack → Meltdown = Meltdown-US → Foreshadow = Meltdown-P → Meltdown 3a = Meltdown-GP

40 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) → ...

New Naming Scheme

• Names are still confusing • Propose: use exception/bit for naming attack → Meltdown = Meltdown-US → Foreshadow = Meltdown-P → Meltdown 3a = Meltdown-GP → Protection Keys = Meltdown-PK

40 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) New Naming Scheme

• Names are still confusing • Propose: use exception/bit for naming attack → Meltdown = Meltdown-US → Foreshadow = Meltdown-P → Meltdown 3a = Meltdown-GP → Protection Keys = Meltdown-PK → ...

40 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) How can we fix this? D1 Architecturally inaccessible data is also microarchitecturally inaccessible

Meltdown Defense Categorization

Meltdown defenses in 2 categories:

41 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Meltdown Defense Categorization

Meltdown defenses in 2 categories:

D1 Architecturally inaccessible data is also microarchitecturally inaccessible

41 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Why don’t we take the kernel addresses...

Take the kernel addresses...

• Kernel addresses in user space are a problem

42 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Take the kernel addresses...

• Kernel addresses in user space are a problem • Why don’t we take the kernel addresses...

42 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • User accessible check in hardware is not reliable

...and remove them

• ...and remove them if not needed?

43 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) ...and remove them

• ...and remove them if not needed? • User accessible check in hardware is not reliable

43 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Meltdown-US Mitigation: KAISER

Userspace Kernelspace

Operating Applications System Memory

44 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) KAISER

Kernel View User View

Userspace Kernelspace Userspace Kernelspace

Operating Applications System Memory Applications

context switch

45 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Apple: Released updates • Windows: Kernel Virtual Address (KVA) Shadow

Mitigations

• Linux: Kernel Page-table Isolation (KPTI)

46 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Windows: Kernel Virtual Address (KVA) Shadow

Mitigations

• Linux: Kernel Page-table Isolation (KPTI) • Apple: Released updates

46 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Mitigations

• Linux: Kernel Page-table Isolation (KPTI) • Apple: Released updates • Windows: Kernel Virtual Address (KVA) Shadow

46 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Meltdown Defense Categorization

Meltdown defenses in 2 categories:

D1 Architecturally inaccessible data is D2 Preventing occurrence of faults also microarchitecturally inaccessible

47 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Accesses which would normally fault → become architecturally and microarchitecturally valid accesses • But do not leak data • SGX Page Abort Semantics • Returns -1 when enclave memory is accessed from the outside world • Meltdown-NM mitigation: Use eager switching instead of lazy switching in the FPU • FPU is always available → never faults

D2: Preventing occurrence of faults

• Prevent the occurrence of faults in the first place

48 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • But do not leak data • SGX Page Abort Semantics • Returns -1 when enclave memory is accessed from the outside world • Meltdown-NM mitigation: Use eager switching instead of lazy switching in the FPU • FPU is always available → never faults

D2: Preventing occurrence of faults

• Prevent the occurrence of faults in the first place • Accesses which would normally fault → become architecturally and microarchitecturally valid accesses

48 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • SGX Page Abort Semantics • Returns -1 when enclave memory is accessed from the outside world • Meltdown-NM mitigation: Use eager switching instead of lazy switching in the FPU • FPU is always available → never faults

D2: Preventing occurrence of faults

• Prevent the occurrence of faults in the first place • Accesses which would normally fault → become architecturally and microarchitecturally valid accesses • But do not leak data

48 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Returns -1 when enclave memory is accessed from the outside world • Meltdown-NM mitigation: Use eager switching instead of lazy switching in the FPU • FPU is always available → never faults

D2: Preventing occurrence of faults

• Prevent the occurrence of faults in the first place • Accesses which would normally fault → become architecturally and microarchitecturally valid accesses • But do not leak data • SGX Page Abort Semantics

48 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Meltdown-NM mitigation: Use eager switching instead of lazy switching in the FPU • FPU is always available → never faults

D2: Preventing occurrence of faults

• Prevent the occurrence of faults in the first place • Accesses which would normally fault → become architecturally and microarchitecturally valid accesses • But do not leak data • SGX Page Abort Semantics • Returns -1 when enclave memory is accessed from the outside world

48 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • FPU is always available → never faults

D2: Preventing occurrence of faults

• Prevent the occurrence of faults in the first place • Accesses which would normally fault → become architecturally and microarchitecturally valid accesses • But do not leak data • SGX Page Abort Semantics • Returns -1 when enclave memory is accessed from the outside world • Meltdown-NM mitigation: Use eager switching instead of lazy switching in the FPU

48 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) D2: Preventing occurrence of faults

• Prevent the occurrence of faults in the first place • Accesses which would normally fault → become architecturally and microarchitecturally valid accesses • But do not leak data • SGX Page Abort Semantics • Returns -1 when enclave memory is accessed from the outside world • Meltdown-NM mitigation: Use eager switching instead of lazy switching in the FPU • FPU is always available → never faults

48 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) D2: Preventing occurrence of faults

• Prevent the occurrence of faults in the first place • Accesses which would normally fault → become architecturally and microarchitecturally valid accesses • But do not leak data • SGX Page Abort Semantics • Returns -1 when enclave memory is accessed from the outside world • Meltdown-NM mitigation: Use eager switching instead of lazy switching in the FPU • FPU is always available → never faults

48 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Flush L1 upon switching protection domains

Clear phyiscal address field of unmapped PTEs

Meltdown-P Mitigation

49 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Flush L1 upon switching protection domains

Meltdown-P Mitigation

Clear phyiscal address field of unmapped PTEs

49 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Meltdown-P Mitigation

Clear phyiscal address field of Flush L1 upon switching protection unmapped PTEs domains

49 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) What about other microarchitectural elements? Branch Prediction • . . . based on what happened in the past • of instructions • Correct prediction, . . . • . . . very fast • otherwise: Discard results

Branch Prediction

• CPU tries to predict the future, . . .

50 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Speculative execution of instructions • Correct prediction, . . . • . . . very fast • otherwise: Discard results

Branch Prediction

• CPU tries to predict the future, . . . • . . . based on what happened in the past

50 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Correct prediction, . . . • . . . very fast • otherwise: Discard results

Branch Prediction

• CPU tries to predict the future, . . . • . . . based on what happened in the past • Speculative execution of instructions

50 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • . . . very fast • otherwise: Discard results

Branch Prediction

• CPU tries to predict the future, . . . • . . . based on what happened in the past • Speculative execution of instructions • Correct prediction, . . .

50 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • otherwise: Discard results

Branch Prediction

• CPU tries to predict the future, . . . • . . . based on what happened in the past • Speculative execution of instructions • Correct prediction, . . . • . . . very fast

50 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Branch Prediction

• CPU tries to predict the future, . . . • . . . based on what happened in the past • Speculative execution of instructions • Correct prediction, . . . • . . . very fast • otherwise: Discard results

50 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Memory Pattern History Table Branch Target Return Stack Buffer Disambiguation (PHT) Buffer (BTB) (RSB) (STL)

Branch Prediction

• CPU has many different predictors...

51 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Memory Branch Target Return Stack Buffer Disambiguation Buffer (BTB) (RSB) (STL)

Branch Prediction

• CPU has many different predictors...

Pattern History Table (PHT)

51 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Memory Return Stack Buffer Disambiguation (RSB) (STL)

Branch Prediction

• CPU has many different predictors...

Pattern History Table Branch Target (PHT) Buffer (BTB)

51 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Memory Disambiguation (STL)

Branch Prediction

• CPU has many different predictors...

Pattern History Table Branch Target Return Stack Buffer (PHT) Buffer (BTB) (RSB)

51 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Branch Prediction

• CPU has many different predictors...

Memory Pattern History Table Branch Target Return Stack Buffer Disambiguation (PHT) Buffer (BTB) (RSB) (STL)

51 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Can we trick the CPU into executing code speculatively? And exploit that? Pattern History Table (PHT) Pattern History Table (PHT)

LUT index =0;

char* data = "textKEY";

if (index <4)

else then Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =0;

char* data ="textKEY";

if (index <4)

else then Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =0;

char* data ="textKEY";

if (index <4) Speculate else then Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =0;

char* data ="textKEY";

if (index <4) Execute else then Index ’t’ Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =1;

char* data = "textKEY";

if (index <4)

else then Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =1;

char* data = "textKEY";

if (index <4)

else then Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =1;

char* data = "textKEY";

if (index <4) Speculate Index ’e’ else then Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =1;

char* data = "textKEY";

if (index <4)

Index ’e’ else then Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =2;

char* data = "textKEY";

if (index <4)

else then Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =2;

char* data = "textKEY";

if (index <4)

else then Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =2;

char* data = "textKEY";

if (index <4) Speculate else then Prediction Index ’x’ LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =2;

char* data = "textKEY";

if (index <4)

else then Prediction Index ’x’ LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =3;

char* data = "textKEY";

if (index <4)

else then Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =3;

char* data = "textKEY";

if (index <4)

else then Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =3;

char* data = "textKEY";

if (index <4) Speculate else then Index ’t’ Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =3;

char* data = "textKEY";

if (index <4)

else then Index ’t’ Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =4;

char* data = "textKEY";

if (index <4)

else then Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =4;

char* data = "textKEY";

if (index <4)

else then Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =4;

Index ’K’ char* data = "textKEY";

if (index <4) Speculate else then Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =4;

Index ’K’ char* data = "textKEY";

if (index <4) Execute else then Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =5;

char* data = "textKEY";

if (index <4)

else then Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =5;

char* data = "textKEY";

if (index <4)

else then Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =5;

Index ’E’ char* data = "textKEY";

if (index <4) Speculate else then Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =5;

Index ’E’ char* data = "textKEY";

if (index <4) Execute else then Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =6;

char* data = "textKEY";

if (index <4)

else then Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =6;

char* data = "textKEY";

if (index <4)

else then Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =6;

char* data = "textKEY"; Index ’Y’

if (index <4) Speculate else then Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Pattern History Table (PHT)

LUT index =6;

char* data = "textKEY"; Index ’Y’

if (index <4) Execute else then Prediction

LUT[data[index] * 4096] 0

52 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Spectre Variant 1.1: Bounds Check Bypass on Stores • Intel: Side-Channel Vulnerability 1 • Propose: Spectre-PHT

Naming

• Spectre Variant 1: Bounds Check Bypass on Load

53 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Intel: Side-Channel Vulnerability 1 • Propose: Spectre-PHT

Naming

• Spectre Variant 1: Bounds Check Bypass on Load • Spectre Variant 1.1: Bounds Check Bypass on Stores

53 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Propose: Spectre-PHT

Naming

• Spectre Variant 1: Bounds Check Bypass on Load • Spectre Variant 1.1: Bounds Check Bypass on Stores • Intel: Side-Channel Vulnerability 1

53 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Naming

• Spectre Variant 1: Bounds Check Bypass on Load • Spectre Variant 1.1: Bounds Check Bypass on Stores • Intel: Side-Channel Vulnerability 1 • Propose: Spectre-PHT

53 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Branch Target Buffer (BTB) Branch Target Buffer (BTB)

Animal* a = bird;

a->move()

swim() swim() fly()

Prediction LUT[data[index] * 4096] 0

54 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Branch Target Buffer (BTB)

Animal* a = bird;

a->move() Speculate swim() swim() fly()

Prediction LUT[data[index] * 4096] 0

54 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Branch Target Buffer (BTB)

Animal* a = bird;

a->move()

swim() swim() fly()

Prediction LUT[data[index] * 4096] 0

54 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Branch Target Buffer (BTB)

Animal* a = bird;

a->move() Execute swim() swim() fly()

Prediction LUT[data[index] * 4096] 0

54 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Branch Target Buffer (BTB)

Animal* a = bird;

a->move()

swim() fly() fly()

Prediction LUT[data[index] * 4096] 0

54 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Branch Target Buffer (BTB)

Animal* a = bird;

a->move() Speculate swim() fly() fly()

Prediction LUT[data[index] * 4096] 0

54 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Branch Target Buffer (BTB)

Animal* a = bird;

a->move()

swim() fly() fly()

Prediction LUT[data[index] * 4096] 0

54 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Branch Target Buffer (BTB)

Animal* a = fish;

a->move()

swim() fly() fly()

Prediction LUT[data[index] * 4096] 0

54 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Branch Target Buffer (BTB)

Animal* a = fish;

a->move() Speculate swim() fly() fly()

Prediction LUT[data[index] * 4096] 0

54 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Branch Target Buffer (BTB)

Animal* a = fish;

a->move()

swim() fly() fly()

Prediction LUT[data[index] * 4096] 0

54 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Branch Target Buffer (BTB)

Animal* a = fish;

a->move() Execute swim() fly() fly()

Prediction LUT[data[index] * 4096] 0

54 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Branch Target Buffer (BTB)

Animal* a = fish;

a->move()

swim() swim() fly()

Prediction LUT[data[index] * 4096] 0

54 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Intel: Side-Channel Vulnerability 2 • Propose: Spectre-BTB

Naming

• Spectre Variant 2: Branch Target Injection

55 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Propose: Spectre-BTB

Naming

• Spectre Variant 2: Branch Target Injection • Intel: Side-Channel Vulnerability 2

55 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Naming

• Spectre Variant 2: Branch Target Injection • Intel: Side-Channel Vulnerability 2 • Propose: Spectre-BTB

55 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Return Stack Buffer (RSB) Return Stack Buffer (RSB)

function() ...

Victim Attacker

RSB

56 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Return Stack Buffer (RSB)

function() ...

Victim Attacker reg = secret reg = dummy

RSB

56 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Return Stack Buffer (RSB)

function() ...

Victim Attacker reg = secret reg = dummy call function(SHORT) RSB

&victim

56 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Return Stack Buffer (RSB)

function() ...

Victim Attacker reg = secret reg = dummy call function(SHORT) call function(LONG) RSB data[reg * 4096]

&attacker &victim

56 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Return Stack Buffer (RSB)

function() ...

Victim Attacker reg = secret reg = dummy call function(SHORT) call function(LONG) RSB data[reg * 4096]

&attacker &victim

56 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Return Stack Buffer (RSB)

function() ...

Victim Attacker reg = secret reg = dummy call function(SHORT) call function(LONG) RSB data[reg * 4096]

&victim

56 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Return Stack Buffer (RSB)

function() ...

Victim Attacker reg = secret reg = dummy call function(SHORT) call function(LONG) RSB data[reg * 4096]

&victim

56 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • SpectreRSB • ret2spec • Propose: Spectre-RSB

Naming

• Spectre-NG

57 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • ret2spec • Propose: Spectre-RSB

Naming

• Spectre-NG • SpectreRSB

57 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Propose: Spectre-RSB

Naming

• Spectre-NG • SpectreRSB • ret2spec

57 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Naming

• Spectre-NG • SpectreRSB • ret2spec • Propose: Spectre-RSB

57 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Memory Disambiguation (STL) → need to check for previous stores • Check is time consuming • Optimization: Speculate whether a store happened or not • no store: bypass check • stall

Memory Disambiguation (STL)

• Loads can be executed out-of-order

58 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Check is time consuming • Optimization: Speculate whether a store happened or not • no store: bypass check • stall

Memory Disambiguation (STL)

• Loads can be executed out-of-order → need to check for previous stores

58 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Optimization: Speculate whether a store happened or not • no store: bypass check • stall

Memory Disambiguation (STL)

• Loads can be executed out-of-order → need to check for previous stores • Check is time consuming

58 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • no store: bypass check • stall

Memory Disambiguation (STL)

• Loads can be executed out-of-order → need to check for previous stores • Check is time consuming • Optimization: Speculate whether a store happened or not

58 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • stall

Memory Disambiguation (STL)

• Loads can be executed out-of-order → need to check for previous stores • Check is time consuming • Optimization: Speculate whether a store happened or not • no store: bypass check

58 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Memory Disambiguation (STL)

• Loads can be executed out-of-order → need to check for previous stores • Check is time consuming • Optimization: Speculate whether a store happened or not • no store: bypass check • stall

58 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Propose: Spectre-STL

Naming

• Spectre Variant 4: Speculative Store Bypass

59 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Naming

• Spectre Variant 4: Speculative Store Bypass • Propose: Spectre-STL

59 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) What can you attack? OS / HV

Other Process

What can you attack?

Own Process

60 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Other Process

What can you attack?

OS / HV

Own Process

60 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) What can you attack?

OS / HV

Own Process

Other Process

60 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) What can you attack?

OS / HV

Own Process

Other Process

60 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) How can we mistrain the CPU? How can we mistrain?

Victim

same address space/ Victim in place branch

61 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) How can we mistrain?

Victim

same address space/ Congruent out of place branch Address collision

same address space/ Victim in place branch

61 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) How can we mistrain?

Victim

same address space/ Congruent out of place branch Address collision

same address space/ Victim in place branch

Shared Branch Prediction State

61 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) How can we mistrain?

Victim Attacker

same address space/ Congruent out of place branch Address collision

same address space/ Victim in place branch

Shared Branch Prediction State

61 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) How can we mistrain?

Victim Attacker

same address space/ Congruent out of place branch Address collision

same address space/ Victim Shadow cross address space/ in place branch branch in place

Shared Branch Prediction State

61 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) How can we mistrain?

Victim Attacker

same address space/ Congruent Congruent cross address space/ out of place branch branch out of place Address collision Address collision

same address space/ Victim Shadow cross address space/ in place branch branch in place

Shared Branch Prediction State

61 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) How can we fix this? syscall/hypercall ? context switch

How can we fix this?

OS / HV

Own Process

Other Process

62 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) ? context switch

How can we fix this?

syscall/hypercall OS / HV

Own Process

Other Process

62 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) ?

How can we fix this?

syscall/hypercall OS / HV

Own Process context switch

Other Process

62 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) How can we fix this?

syscall/hypercall OS / HV ? Own Process context switch

Other Process

62 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) C1 Mitigating or reducing the accuracy of covert channels

Spectre Defense Categorization

Spectre defenses in 3 categories:

63 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Spectre Defense Categorization

Spectre defenses in 3 categories:

C1 Mitigating or reducing the accuracy of covert channels

63 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) → System would be too slow • Remove flush instructions → Use eviction • Remove rdtsc → Many alternative high-resolution timers • Build new caches: DAWG, InvisiSpec, SafeSpec → Require hardware changes

• Other covert channels besides the cache: • PortSmash • SMoTHERSpectre • AVX

C1: Mitigating covert channels

• Deactivate the cache

64 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Remove flush instructions → Use eviction • Remove rdtsc → Many alternative high-resolution timers • Build new caches: DAWG, InvisiSpec, SafeSpec → Require hardware changes

• Other covert channels besides the cache: • PortSmash • SMoTHERSpectre • AVX

C1: Mitigating covert channels

• Deactivate the cache → System would be too slow

64 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) → Use eviction • Remove rdtsc → Many alternative high-resolution timers • Build new caches: DAWG, InvisiSpec, SafeSpec → Require hardware changes

• Other covert channels besides the cache: • PortSmash • SMoTHERSpectre • AVX

C1: Mitigating covert channels

• Deactivate the cache → System would be too slow • Remove flush instructions

64 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Remove rdtsc → Many alternative high-resolution timers • Build new caches: DAWG, InvisiSpec, SafeSpec → Require hardware changes

• Other covert channels besides the cache: • PortSmash • SMoTHERSpectre • AVX

C1: Mitigating covert channels

• Deactivate the cache → System would be too slow • Remove flush instructions → Use eviction

64 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) → Many alternative high-resolution timers • Build new caches: DAWG, InvisiSpec, SafeSpec → Require hardware changes

• Other covert channels besides the cache: • PortSmash • SMoTHERSpectre • AVX

C1: Mitigating covert channels

• Deactivate the cache → System would be too slow • Remove flush instructions → Use eviction • Remove rdtsc

64 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Build new caches: DAWG, InvisiSpec, SafeSpec → Require hardware changes

• Other covert channels besides the cache: • PortSmash • SMoTHERSpectre • AVX

C1: Mitigating covert channels

• Deactivate the cache → System would be too slow • Remove flush instructions → Use eviction • Remove rdtsc → Many alternative high-resolution timers

64 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) → Require hardware changes

• Other covert channels besides the cache: • PortSmash • SMoTHERSpectre • AVX

C1: Mitigating covert channels

• Deactivate the cache → System would be too slow • Remove flush instructions → Use eviction • Remove rdtsc → Many alternative high-resolution timers • Build new caches: DAWG, InvisiSpec, SafeSpec

64 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Other covert channels besides the cache: • PortSmash • SMoTHERSpectre • AVX

C1: Mitigating covert channels

• Deactivate the cache → System would be too slow • Remove flush instructions → Use eviction • Remove rdtsc → Many alternative high-resolution timers • Build new caches: DAWG, InvisiSpec, SafeSpec → Require hardware changes

64 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) C1: Mitigating covert channels

• Deactivate the cache → System would be too slow • Remove flush instructions → Use eviction • Remove rdtsc → Many alternative high-resolution timers • Build new caches: DAWG, InvisiSpec, SafeSpec → Require hardware changes

• Other covert channels besides the cache: • PortSmash • SMoTHERSpectre • AVX

64 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Spectre Defense Categorization

Spectre defenses in 3 categories:

C1 Mitigating or reducing C2 Mitigating or aborting the accuracy of covert speculation channels

65 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) C2: Mitigating or aborting speculation in hardware

Drilling template (@kreon nrw)

66 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Single Thread Indirect Branch Prediction (STIBP) • Indirect Barrier (IBPB) • ARMv8.5-A: New barrier (sb) • Speculative Store Bypass Safe/Disable (SSBS/SSBD)

C2: Mitigating or aborting speculation in hardware

• Three new controls for ISA: • Indirect Branch Restricted Speculation (IBRS)

67 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Indirect Branch Predictor Barrier (IBPB) • ARMv8.5-A: New barrier (sb) • Speculative Store Bypass Safe/Disable (SSBS/SSBD)

C2: Mitigating or aborting speculation in hardware

• Three new controls for ISA: • Indirect Branch Restricted Speculation (IBRS) • Single Thread Indirect Branch Prediction (STIBP)

67 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • ARMv8.5-A: New barrier (sb) • Speculative Store Bypass Safe/Disable (SSBS/SSBD)

C2: Mitigating or aborting speculation in hardware

• Three new controls for ISA: • Indirect Branch Restricted Speculation (IBRS) • Single Thread Indirect Branch Prediction (STIBP) • Indirect Branch Predictor Barrier (IBPB)

67 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Speculative Store Bypass Safe/Disable (SSBS/SSBD)

C2: Mitigating or aborting speculation in hardware

• Three new controls for ISA: • Indirect Branch Restricted Speculation (IBRS) • Single Thread Indirect Branch Prediction (STIBP) • Indirect Branch Predictor Barrier (IBPB) • ARMv8.5-A: New barrier (sb)

67 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) C2: Mitigating or aborting speculation in hardware

• Three new controls for ISA: • Indirect Branch Restricted Speculation (IBRS) • Single Thread Indirect Branch Prediction (STIBP) • Indirect Branch Predictor Barrier (IBPB) • ARMv8.5-A: New barrier (sb) • Speculative Store Bypass Safe/Disable (SSBS/SSBD)

67 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Developers need to know where to put them • Retpoline: replacing indirect branches with return instructions • RSB Stuffing • Fill RSB with benign addresses on every context switch

C2: Mitigating or aborting speculation in software

• Use serializing instructions on branch outcomes (lfence, dsb sy)

68 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Retpoline: replacing indirect branches with return instructions • RSB Stuffing • Fill RSB with benign addresses on every context switch

C2: Mitigating or aborting speculation in software

• Use serializing instructions on branch outcomes (lfence, dsb sy) • Developers need to know where to put them

68 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • RSB Stuffing • Fill RSB with benign addresses on every context switch

C2: Mitigating or aborting speculation in software

• Use serializing instructions on branch outcomes (lfence, dsb sy) • Developers need to know where to put them • Retpoline: replacing indirect branches with return instructions

68 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Fill RSB with benign addresses on every context switch

C2: Mitigating or aborting speculation in software

• Use serializing instructions on branch outcomes (lfence, dsb sy) • Developers need to know where to put them • Retpoline: replacing indirect branches with return instructions • RSB Stuffing

68 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) C2: Mitigating or aborting speculation in software

• Use serializing instructions on branch outcomes (lfence, dsb sy) • Developers need to know where to put them • Retpoline: replacing indirect branches with return instructions • RSB Stuffing • Fill RSB with benign addresses on every context switch

68 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Spectre Defense Categorization

Spectre defenses in 3 categories:

C1 Mitigating or reducing C2 Mitigating or aborting C3 Ensuring secret data the accuracy of covert speculation cannot be reached channels

69 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Webkit: Index masking instead of array bounds checks • Webkit: Poison values to protect pointers • Chrome: Site Isolation

C3: Ensuring secret data cannot be reached

• What to do if an attacker attacks his own process (sandboxing)?

70 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Webkit: Poison values to protect pointers • Chrome: Site Isolation

C3: Ensuring secret data cannot be reached

• What to do if an attacker attacks his own process (sandboxing)? • Webkit: Index masking instead of array bounds checks

70 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Chrome: Site Isolation

C3: Ensuring secret data cannot be reached

• What to do if an attacker attacks his own process (sandboxing)? • Webkit: Index masking instead of array bounds checks • Webkit: Poison values to protect pointers

70 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) C3: Ensuring secret data cannot be reached

• What to do if an attacker attacks his own process (sandboxing)? • Webkit: Index masking instead of array bounds checks • Webkit: Poison values to protect pointers • Chrome: Site Isolation

70 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Systematization Transient cause? Spectre-type

prediction Transient cause? fault

Meltdown-type microarchitec- Spectre-PHT tural buffer Spectre-BTB

Spectre-RSB Spectre-type Spectre-STL

prediction Transient cause? fault

Meltdown-type mistraining strategy Cross-address-space microarchitec- Spectre-PHT Same-address-space tural buffer Spectre-BTB

Spectre-RSB Cross-address-space Spectre-type Spectre-STL Same-address-space

prediction Cross-address-space Transient Same-address-space cause? fault

Meltdown-type PHT-CA-IP ∗

PHT-CA-OP ∗

PHT-SA-OP ∗

BTB-SA-IP ∗

BTB-SA-OP ∗

in-place (IP) vs., out-of-place (OP)

mistraining strategy Cross-address-space microarchitec- Spectre-PHT Same-address-space tural buffer PHT-SA-IP Spectre-BTB

Spectre-RSB Cross-address-space Spectre-type BTB-CA-IP Spectre-STL Same-address-space BTB-CA-OP prediction Cross-address-space Transient Same-address-space cause?

fault RSB-CA-IP

RSB-CA-OP Meltdown-type RSB-SA-IP

RSB-SA-OP in-place (IP) vs., out-of-place (OP)

mistraining PHT-CA-IP ∗ strategy Cross-address-space PHT-CA-OP ∗ microarchitec- Spectre-PHT Same-address-space tural buffer PHT-SA-IP Spectre-BTB PHT-SA-OP ∗ Spectre-RSB Cross-address-space Spectre-type BTB-CA-IP Spectre-STL Same-address-space BTB-CA-OP prediction Cross-address-space BTB-SA-IP ∗ Transient Same-address-space cause? BTB-SA-OP ∗

fault RSB-CA-IP

RSB-CA-OP Meltdown-type RSB-SA-IP

RSB-SA-OP Meltdown-AC .

Meltdown-DE .

Meltdown-UD .

Meltdown-SS .

in-place (IP) vs., out-of-place (OP)

mistraining PHT-CA-IP ∗ strategy Cross-address-space PHT-CA-OP ∗ microarchitec- Spectre-PHT Same-address-space tural buffer PHT-SA-IP Spectre-BTB PHT-SA-OP ∗ Spectre-RSB Cross-address-space Spectre-type BTB-CA-IP Spectre-STL Same-address-space BTB-CA-OP prediction Cross-address-space BTB-SA-IP ∗ Transient Same-address-space cause? BTB-SA-OP ∗ Meltdown-NM fault RSB-CA-IP

RSB-CA-OP Meltdown-type RSB-SA-IP Meltdown-PF RSB-SA-OP

fault type

Meltdown-BR

Meltdown-GP in-place (IP) vs., out-of-place (OP)

mistraining PHT-CA-IP ∗ strategy Cross-address-space PHT-CA-OP ∗ microarchitec- Spectre-PHT Same-address-space tural buffer PHT-SA-IP Spectre-BTB PHT-SA-OP ∗ Spectre-RSB Cross-address-space Spectre-type BTB-CA-IP Spectre-STL Same-address-space BTB-CA-OP prediction Cross-address-space BTB-SA-IP ∗ Transient Same-address-space cause? BTB-SA-OP ∗ Meltdown-NM fault RSB-CA-IP Meltdown-AC . RSB-CA-OP Meltdown-type Meltdown-DE . RSB-SA-IP Meltdown-PF RSB-SA-OP Meltdown-UD . fault type Meltdown-SS .

Meltdown-BR

Meltdown-GP Meltdown-PK ∗

Meltdown-XD .

Meltdown-SM .

Meltdown-BND ∗

in-place (IP) vs., out-of-place (OP)

mistraining PHT-CA-IP ∗ strategy Cross-address-space PHT-CA-OP ∗ microarchitec- Spectre-PHT Same-address-space tural buffer PHT-SA-IP Spectre-BTB PHT-SA-OP ∗ Spectre-RSB Cross-address-space Spectre-type BTB-CA-IP Spectre-STL Same-address-space BTB-CA-OP prediction Cross-address-space BTB-SA-IP ∗ Transient Same-address-space cause? BTB-SA-OP ∗ Meltdown-NM Meltdown-US fault RSB-CA-IP Meltdown-AC . Meltdown-P RSB-CA-OP Meltdown-type Meltdown-DE . Meltdown-RW RSB-SA-IP Meltdown-PF RSB-SA-OP Meltdown-UD . fault type Meltdown-SS .

Meltdown-BR Meltdown-MPX

Meltdown-GP Meltdown-PK ∗

Meltdown-XD .

Meltdown-SM .

Meltdown-BND ∗

in-place (IP) vs., out-of-place (OP)

mistraining PHT-CA-IP ∗ strategy Cross-address-space PHT-CA-OP ∗ microarchitec- Spectre-PHT Same-address-space tural buffer PHT-SA-IP Spectre-BTB PHT-SA-OP ∗ Spectre-RSB Cross-address-space Spectre-type BTB-CA-IP Spectre-STL Same-address-space BTB-CA-OP prediction Cross-address-space BTB-SA-IP ∗ Transient Same-address-space cause? BTB-SA-OP ∗ Meltdown-NM Meltdown-US fault RSB-CA-IP Meltdown-AC . Meltdown-P RSB-CA-OP Meltdown-type Meltdown-DE . Meltdown-RW RSB-SA-IP Meltdown-PF RSB-SA-OP Meltdown-UD . fault type Meltdown-SS .

Meltdown-BR Meltdown-MPX

Meltdown-GP Meltdown-PK ∗

Meltdown-BND ∗

in-place (IP) vs., out-of-place (OP)

mistraining PHT-CA-IP ∗ strategy Cross-address-space PHT-CA-OP ∗ microarchitec- Spectre-PHT Same-address-space tural buffer PHT-SA-IP Spectre-BTB PHT-SA-OP ∗ Spectre-RSB Cross-address-space Spectre-type BTB-CA-IP Spectre-STL Same-address-space BTB-CA-OP prediction Cross-address-space BTB-SA-IP ∗ Transient Same-address-space cause? BTB-SA-OP ∗ Meltdown-NM Meltdown-US fault RSB-CA-IP Meltdown-AC . Meltdown-P RSB-CA-OP Meltdown-type Meltdown-DE . Meltdown-RW RSB-SA-IP Meltdown-PF RSB-SA-OP Meltdown-UD . Meltdown-XD . fault type Meltdown-SS . Meltdown-SM .

Meltdown-BR Meltdown-MPX

Meltdown-GP in-place (IP) vs., out-of-place (OP)

mistraining PHT-CA-IP ∗ strategy Cross-address-space PHT-CA-OP ∗ microarchitec- Spectre-PHT Same-address-space tural buffer PHT-SA-IP Spectre-BTB PHT-SA-OP ∗ Spectre-RSB Cross-address-space Spectre-type BTB-CA-IP Spectre-STL Same-address-space BTB-CA-OP prediction Cross-address-space BTB-SA-IP ∗ Transient Same-address-space cause? BTB-SA-OP ∗ Meltdown-NM Meltdown-US fault RSB-CA-IP Meltdown-AC . Meltdown-P RSB-CA-OP Meltdown-type Meltdown-DE . Meltdown-RW RSB-SA-IP Meltdown-PF Meltdown-PK ∗ RSB-SA-OP Meltdown-UD . Meltdown-XD . fault type Meltdown-SS . Meltdown-SM .

Meltdown-BR Meltdown-MPX

Meltdown-GP Meltdown-BND ∗ Are the defenses working? • STIPB, IBPB, SSBD: not enabled by default • Site Isolation: can still leak data of own process • Index Masking: only limits how far beyond leakage can occur • Speculative Load Hardening: Spectrum showed potential leakage • Timer reduction: can build new timers • Poison Value: works • Meltdown defenses

Defenses working?

• Evaluated all mitigations without hardware changes • Serialization: still leaks through TLB, AVX

72 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Site Isolation: can still leak data of own process • Index Masking: only limits how far beyond leakage can occur • Speculative Load Hardening: Spectrum showed potential leakage • Timer reduction: can build new timers • Poison Value: works • Meltdown defenses

Defenses working?

• Evaluated all mitigations without hardware changes • Serialization: still leaks through TLB, AVX • STIPB, IBPB, SSBD: not enabled by default

72 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Index Masking: only limits how far beyond leakage can occur • Speculative Load Hardening: Spectrum showed potential leakage • Timer reduction: can build new timers • Poison Value: works • Meltdown defenses

Defenses working?

• Evaluated all mitigations without hardware changes • Serialization: still leaks through TLB, AVX • STIPB, IBPB, SSBD: not enabled by default • Site Isolation: can still leak data of own process

72 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Speculative Load Hardening: Spectrum showed potential leakage • Timer reduction: can build new timers • Poison Value: works • Meltdown defenses

Defenses working?

• Evaluated all mitigations without hardware changes • Serialization: still leaks through TLB, AVX • STIPB, IBPB, SSBD: not enabled by default • Site Isolation: can still leak data of own process • Index Masking: only limits how far beyond leakage can occur

72 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Timer reduction: can build new timers • Poison Value: works • Meltdown defenses

Defenses working?

• Evaluated all mitigations without hardware changes • Serialization: still leaks through TLB, AVX • STIPB, IBPB, SSBD: not enabled by default • Site Isolation: can still leak data of own process • Index Masking: only limits how far beyond leakage can occur • Speculative Load Hardening: Spectrum showed potential leakage

72 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Poison Value: works • Meltdown defenses

Defenses working?

• Evaluated all mitigations without hardware changes • Serialization: still leaks through TLB, AVX • STIPB, IBPB, SSBD: not enabled by default • Site Isolation: can still leak data of own process • Index Masking: only limits how far beyond leakage can occur • Speculative Load Hardening: Spectrum showed potential leakage • Timer reduction: can build new timers

72 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Meltdown defenses

Defenses working?

• Evaluated all mitigations without hardware changes • Serialization: still leaks through TLB, AVX • STIPB, IBPB, SSBD: not enabled by default • Site Isolation: can still leak data of own process • Index Masking: only limits how far beyond leakage can occur • Speculative Load Hardening: Spectrum showed potential leakage • Timer reduction: can build new timers • Poison Value: works

72 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Defenses working?

• Evaluated all mitigations without hardware changes • Serialization: still leaks through TLB, AVX • STIPB, IBPB, SSBD: not enabled by default • Site Isolation: can still leak data of own process • Index Masking: only limits how far beyond leakage can occur • Speculative Load Hardening: Spectrum showed potential leakage • Timer reduction: can build new timers • Poison Value: works • Meltdown defenses

72 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Conclusion • Some mitigations cost more than gained by the feature they defend • Transient-execution attacks will keep us busy for a while • There are still many other microarchitectural elements left to explore

Conclusion

• Optimizations always come at a cost

73 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • Transient-execution attacks will keep us busy for a while • There are still many other microarchitectural elements left to explore

Conclusion

• Optimizations always come at a cost • Some mitigations cost more than gained by the feature they defend

73 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) • There are still many other microarchitectural elements left to explore

Conclusion

• Optimizations always come at a cost • Some mitigations cost more than gained by the feature they defend • Transient-execution attacks will keep us busy for a while

73 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f) Conclusion

• Optimizations always come at a cost • Some mitigations cost more than gained by the feature they defend • Transient-execution attacks will keep us busy for a while • There are still many other microarchitectural elements left to explore

73 Moritz Lipp (@mlqxyz), Claudio Canella (@cc0x1f)