ARMageddon How Your Smartphone CPU Breaks Software-Level Security And Privacy

Moritz Lipp and Cl´ementineMaurice November 3, 2016—Black Hat Europe • Information leaks because of the underlying hardware • We focus on the CPU cache • Cache attacks can be used for covert communications and attack crypto implementations • Only been demonstrated on Intel x86 for now • But why not on ARM?

Motivation

• Safe software infrastructure does not mean safe execution

2 • We focus on the CPU cache • Cache attacks can be used for covert communications and attack crypto implementations • Only been demonstrated on Intel x86 for now • But why not on ARM?

Motivation

• Safe software infrastructure does not mean safe execution • Information leaks because of the underlying hardware

2 • Cache attacks can be used for covert communications and attack crypto implementations • Only been demonstrated on Intel x86 for now • But why not on ARM?

Motivation

• Safe software infrastructure does not mean safe execution • Information leaks because of the underlying hardware • We focus on the CPU cache

2 • Only been demonstrated on Intel x86 for now • But why not on ARM?

Motivation

• Safe software infrastructure does not mean safe execution • Information leaks because of the underlying hardware • We focus on the CPU cache • Cache attacks can be used for covert communications and attack crypto implementations

2 • But why not on ARM?

Motivation

• Safe software infrastructure does not mean safe execution • Information leaks because of the underlying hardware • We focus on the CPU cache • Cache attacks can be used for covert communications and attack crypto implementations • Only been demonstrated on Intel x86 for now

2 Motivation

• Safe software infrastructure does not mean safe execution • Information leaks because of the underlying hardware • We focus on the CPU cache • Cache attacks can be used for covert communications and attack crypto implementations • Only been demonstrated on Intel x86 for now • But why not on ARM?

2 Who We Are Who We Are

• Moritz Lipp • Master Student, Graz University Of Technology • 7 @mlqxyz • R [email protected]

3 Who We Are

• Cl´ementineMaurice • PhD in InfoSec; Postdoc, Graz University Of Technology • 7 @BloodyTangerine • R [email protected]

4 The rest of the team

The rest of the research team • Daniel Gruss • Raphael Spreitzer • Stefan Mangard From Graz University of Technology

5 Demo

6 • What are the challenges for cache attacks on ARM? • How to solve those challenges • Attack scenarios • Tools

Outline

• Background information

7 • How to solve those challenges • Attack scenarios • Tools

Outline

• Background information • What are the challenges for cache attacks on ARM?

7 • Attack scenarios • Tools

Outline

• Background information • What are the challenges for cache attacks on ARM? • How to solve those challenges

7 • Tools

Outline

• Background information • What are the challenges for cache attacks on ARM? • How to solve those challenges • Attack scenarios

7 Outline

• Background information • What are the challenges for cache attacks on ARM? • How to solve those challenges • Attack scenarios • Tools

7 Cache Attacks • CPU registers • Different levels of the CPU cache • Main memory • Disk storage

Memory Hierarchy

CPU Registers L1 Cache L2 Cache Memory Disk storage

• Data can reside in

8 • Different levels of the CPU cache • Main memory • Disk storage

Memory Hierarchy

CPU Registers L1 Cache L2 Cache Memory Disk storage

• Data can reside in • CPU registers

8 • Main memory • Disk storage

Memory Hierarchy

CPU Registers L1 Cache L2 Cache Memory Disk storage

• Data can reside in • CPU registers • Different levels of the CPU cache

8 • Disk storage

Memory Hierarchy

CPU Registers L1 Cache L2 Cache Memory Disk storage

• Data can reside in • CPU registers • Different levels of the CPU cache • Main memory

8 Memory Hierarchy

CPU Registers L1 Cache L2 Cache Memory Disk storage

• Data can reside in • CPU registers • Different levels of the CPU cache • Main memory • Disk storage

8 • cache → fast (cache hit) • main memory → slow (cache miss)

·104

Cache hit Cache miss 3

2

1 Number of accesses

0 200 400 600 800 1,000 1,200 Measured access time (CPU cycles)

Cache Attacks

• Exploit timing differences of memory accesses:

9 • main memory → slow (cache miss)

·104

Cache hit Cache miss 3

2

1 Number of accesses

0 200 400 600 800 1,000 1,200 Measured access time (CPU cycles)

Cache Attacks

• Exploit timing differences of memory accesses: • cache → fast (cache hit)

9 ·104

Cache hit Cache miss 3

2

1 Number of accesses

0 200 400 600 800 1,000 1,200 Measured access time (CPU cycles)

Cache Attacks

• Exploit timing differences of memory accesses: • cache → fast (cache hit) • main memory → slow (cache miss)

9 Cache Attacks

• Exploit timing differences of memory accesses: • cache → fast (cache hit) • main memory → slow (cache miss)

·104

Cache hit Cache miss 3

2

1 Number of accesses

0 200 400 600 800 1,000 1,200 Measured access time (CPU cycles) 9 Set-Associative Caches

0 16 17 25 26 31 Address Index Offset

Cache

10 Set-Associative Caches

0 16 17 25 26 31 Address Index Offset

Cache set

Cache

Data loaded in a specific set depending on its address

10 Set-Associative Caches

0 16 17 25 26 31 Address Index Offset

way 0 way 3

Cache set

Cache

Data loaded in a specific set depending on its address Several ways per set

10 Set-Associative Caches

0 16 17 25 26 31 Address Index Offset

way 0 way 3

Cache set

Cache line

Cache

Data loaded in a specific set depending on its address Several ways per set Cache line loaded in a specific way depending on the replacement policy

10 Cache Attacks: Flush+Reload

Victim address space Cache Attacker address space

Step 1: Attacker maps shared library (shared memory, in cache)

11 Cache Attacks: Flush+Reload

cached cached

Victim address space Cache Attacker address space

Step 1: Attacker maps shared library (shared memory, in cache)

11 Cache Attacks: Flush+Reload

flushes

Victim address space Cache Attacker address space

Step 1: Attacker maps shared library (shared memory, in cache) Step 2: Attacker flushes the shared cache line

11 Cache Attacks: Flush+Reload

loads data

Victim address space Cache Attacker address space

Step 1: Attacker maps shared library (shared memory, in cache) Step 2: Attacker flushes the shared cache line Step 3: Victim loads the data

11 Cache Attacks: Flush+Reload

reloads data

Victim address space Cache Attacker address space

Step 1: Attacker maps shared library (shared memory, in cache) Step 2: Attacker flushes the shared cache line Step 3: Victim loads the data Step 4: Attacker reloads the data

11 Cache Attacks: Prime+Probe

Victim address space Cache Attacker address space

12 Cache Attacks: Prime+Probe

Victim address space Cache Attacker address space

Step 1: Attacker primes, i.e., fills, the cache (no shared memory)

12 Cache Attacks: Prime+Probe

loads data

Victim address space Cache Attacker address space

Step 1: Attacker primes, i.e., fills, the cache (no shared memory) Step 2: Victim evicts cache lines while running

12 Cache Attacks: Prime+Probe

loads data

Victim address space Cache Attacker address space

Step 1: Attacker primes, i.e., fills, the cache (no shared memory) Step 2: Victim evicts cache lines while running

12 Cache Attacks: Prime+Probe

fast access

Victim address space Cache Attacker address space

Step 1: Attacker primes, i.e., fills, the cache (no shared memory) Step 2: Victim evicts cache lines while running Step 3: Attacker probes data to determine if set has been accessed

12 Cache Attacks: Prime+Probe

slow access

Victim address space Cache Attacker address space

Step 1: Attacker primes, i.e., fills, the cache (no shared memory) Step 2: Victim evicts cache lines while running Step 3: Attacker probes data to determine if set has been accessed

12 Differences between Intel x86 and ARM • Cache maintenance instructions • Intel x86: Unprivileged clflush instruction • ARMv7-A: Only privileged cache maintenance instructions • ARMv8-A: Privileged instructions can be unlocked for userspace

Challenge #1 No flush instruction

Cache maintenance

• Basic operation for cache attacks: invalidate cache lines

13 • Intel x86: Unprivileged clflush instruction • ARMv7-A: Only privileged cache maintenance instructions • ARMv8-A: Privileged instructions can be unlocked for userspace

Challenge #1 No flush instruction

Cache maintenance

• Basic operation for cache attacks: invalidate cache lines • Cache maintenance instructions

13 • ARMv7-A: Only privileged cache maintenance instructions • ARMv8-A: Privileged instructions can be unlocked for userspace

Challenge #1 No flush instruction

Cache maintenance

• Basic operation for cache attacks: invalidate cache lines • Cache maintenance instructions • Intel x86: Unprivileged clflush instruction

13 • ARMv8-A: Privileged instructions can be unlocked for userspace

Challenge #1 No flush instruction

Cache maintenance

• Basic operation for cache attacks: invalidate cache lines • Cache maintenance instructions • Intel x86: Unprivileged clflush instruction • ARMv7-A: Only privileged cache maintenance instructions

13 Challenge #1 No flush instruction

Cache maintenance

• Basic operation for cache attacks: invalidate cache lines • Cache maintenance instructions • Intel x86: Unprivileged clflush instruction • ARMv7-A: Only privileged cache maintenance instructions • ARMv8-A: Privileged instructions can be unlocked for userspace

13 Cache maintenance

• Basic operation for cache attacks: invalidate cache lines • Cache maintenance instructions • Intel x86: Unprivileged clflush instruction • ARMv7-A: Only privileged cache maintenance instructions • ARMv8-A: Privileged instructions can be unlocked for userspace

Challenge #1 No flush instruction

13 • Until the target address is evicted from the cache

→ Ideal case with LRU replacement policy

→ too slow • Fill a specific cache set

cache set

Cache eviction

• Fill the whole cache

14 • Until the target address is evicted from the cache

→ Ideal case with LRU replacement policy

• Fill a specific cache set

cache set

Cache eviction

• Fill the whole cache → too slow

14 → Ideal case with LRU replacement policy

• Until the target address is evicted from the cache

Cache eviction

• Fill the whole cache → too slow • Fill a specific cache set

cache set 2 5 8 1 7 6 3 4

14 • Until the target address is evicted from the cache

→ Ideal case with LRU replacement policy

Cache eviction

• Fill the whole cache → too slow • Fill a specific cache set load

cache set 2 5 8 19 7 6 3 4

14 • Until the target address is evicted from the cache

→ Ideal case with LRU replacement policy

Cache eviction

• Fill the whole cache → too slow • Fill a specific cache set load

cache set 102 5 8 19 7 6 3 4

14 • Until the target address is evicted from the cache

→ Ideal case with LRU replacement policy

Cache eviction

• Fill the whole cache → too slow • Fill a specific cache set load

cache set 102 5 8 19 7 6 113 4

14 • Until the target address is evicted from the cache

→ Ideal case with LRU replacement policy

Cache eviction

• Fill the whole cache → too slow • Fill a specific cache set load

cache set 102 5 8 19 7 6 113 124

14 • Until the target address is evicted from the cache

→ Ideal case with LRU replacement policy

Cache eviction

• Fill the whole cache → too slow • Fill a specific cache set load

cache set 102 135 8 19 7 6 113 124

14 • Until the target address is evicted from the cache

→ Ideal case with LRU replacement policy

Cache eviction

• Fill the whole cache → too slow • Fill a specific cache set load

cache set 102 135 8 19 7 146 113 124

14 • Until the target address is evicted from the cache

→ Ideal case with LRU replacement policy

Cache eviction

• Fill the whole cache → too slow • Fill a specific cache set load

cache set 102 135 8 19 157 146 113 124

14 → Ideal case with LRU replacement policy

Cache eviction

• Fill the whole cache → too slow • Fill a specific cache set • Until the target address is evicted from the cache load

cache set 102 135 168 19 157 146 113 124

14 Cache eviction

• Fill the whole cache → too slow • Fill a specific cache set • Until the target address is evicted from the cache

cache set 102 135 168 19 157 146 113 124

→ Ideal case with LRU replacement policy

14 → Simple approach highly inefficient

Challenge #2 Pseudo-random replacement policy complicates eviction

Cache eviction (what actually happens)

• Pseudo-random cache replacement policy

cache set 2 5 8 1 7 6 3 4

15 → Simple approach highly inefficient

Challenge #2 Pseudo-random replacement policy complicates eviction

Cache eviction (what actually happens)

• Pseudo-random cache replacement policy load

cache set 2 5 8 19 7 6 3 4

15 → Simple approach highly inefficient

Challenge #2 Pseudo-random replacement policy complicates eviction

Cache eviction (what actually happens)

• Pseudo-random cache replacement policy load

cache set 102 5 8 19 7 6 3 4

15 → Simple approach highly inefficient

Challenge #2 Pseudo-random replacement policy complicates eviction

Cache eviction (what actually happens)

• Pseudo-random cache replacement policy load

cache set 102 5 8 1119 7 6 3 4

15 → Simple approach highly inefficient

Challenge #2 Pseudo-random replacement policy complicates eviction

Cache eviction (what actually happens)

• Pseudo-random cache replacement policy load

cache set 10122 5 8 1119 7 6 3 4

15 → Simple approach highly inefficient

Challenge #2 Pseudo-random replacement policy complicates eviction

Cache eviction (what actually happens)

• Pseudo-random cache replacement policy load

cache set 10122 5 8 1119 7 6 133 4

15 → Simple approach highly inefficient

Challenge #2 Pseudo-random replacement policy complicates eviction

Cache eviction (what actually happens)

• Pseudo-random cache replacement policy load

cache set 10122 5 8 1119 7 6 13143 4

15 → Simple approach highly inefficient

Challenge #2 Pseudo-random replacement policy complicates eviction

Cache eviction (what actually happens)

• Pseudo-random cache replacement policy load

cache set 10122 5 8 1119 7 6 13143 154

15 → Simple approach highly inefficient

Challenge #2 Pseudo-random replacement policy complicates eviction

Cache eviction (what actually happens)

• Pseudo-random cache replacement policy load

cache set 10122 5 8 1119 7 6 13143 15164

15 Challenge #2 Pseudo-random replacement policy complicates eviction

Cache eviction (what actually happens)

• Pseudo-random cache replacement policy

cache set 10122 5 8 1119 7 6 13143 15164

→ Simple approach highly inefficient

15 Cache eviction (what actually happens)

• Pseudo-random cache replacement policy

cache set 10122 5 8 1119 7 6 13143 15164

→ Simple approach highly inefficient

Challenge #2 Pseudo-random replacement policy complicates eviction

15 • Intel x86: Unprivileged rdtsc instruction for cycle count • ARM: Cycle counter only in privileged mode → Previous attacks required root access

Challenge #3 No unprivileged and accurate timing sources

Timing measurements

• Need fine-grained timing measurements

16 • ARM: Cycle counter only in privileged mode → Previous attacks required root access

Challenge #3 No unprivileged and accurate timing sources

Timing measurements

• Need fine-grained timing measurements • Intel x86: Unprivileged rdtsc instruction for cycle count

16 → Previous attacks required root access

Challenge #3 No unprivileged and accurate timing sources

Timing measurements

• Need fine-grained timing measurements • Intel x86: Unprivileged rdtsc instruction for cycle count • ARM: Cycle counter only in privileged mode

16 Challenge #3 No unprivileged and accurate timing sources

Timing measurements

• Need fine-grained timing measurements • Intel x86: Unprivileged rdtsc instruction for cycle count • ARM: Cycle counter only in privileged mode → Previous attacks required root access

16 Timing measurements

• Need fine-grained timing measurements • Intel x86: Unprivileged rdtsc instruction for cycle count • ARM: Cycle counter only in privileged mode → Previous attacks required root access

Challenge #3 No unprivileged and accurate timing sources

16 • shared • inclusive → Shared memory is shared in the cache across all cores

Cache Hierarchy on Intel CPUs

Core 0 Core 1 Core 2 Core 3

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache L2 Cache L2 Cache L2 Cache

L3 Cache

• Last-level cache: L3

17 • inclusive → Shared memory is shared in the cache across all cores

Cache Hierarchy on Intel CPUs

Core 0 Core 1 Core 2 Core 3

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache L2 Cache L2 Cache L2 Cache

L3 Cache

• Last-level cache: L3 • shared

17 → Shared memory is shared in the cache across all cores

Cache Hierarchy on Intel CPUs

Core 0 Core 1 Core 2 Core 3

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache L2 Cache L2 Cache L2 Cache

L3 Cache

• Last-level cache: L3 • shared • inclusive

17 Cache Hierarchy on Intel CPUs

Core 0 Core 1 Core 2 Core 3

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache L2 Cache L2 Cache L2 Cache

L3 Cache

• Last-level cache: L3 • shared • inclusive → Shared memory is shared in the cache across all cores

17 • shared • not inclusive → Shared memory that is not in L2 is not shared in the cache.

Challenge #4 Non-inclusive caches

Cache Hierarchy on ARM Cortex-A CPUs

Core 0 Core 1 Core 2 Core 3

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Last-level cache: L2

18 • not inclusive → Shared memory that is not in L2 is not shared in the cache.

Challenge #4 Non-inclusive caches

Cache Hierarchy on ARM Cortex-A CPUs

Core 0 Core 1 Core 2 Core 3

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Last-level cache: L2 • shared

18 → Shared memory that is not in L2 is not shared in the cache.

Challenge #4 Non-inclusive caches

Cache Hierarchy on ARM Cortex-A CPUs

Core 0 Core 1 Core 2 Core 3

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Last-level cache: L2 • shared • not inclusive

18 Challenge #4 Non-inclusive caches

Cache Hierarchy on ARM Cortex-A CPUs

Core 0 Core 1 Core 2 Core 3

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Last-level cache: L2 • shared • not inclusive → Shared memory that is not in L2 is not shared in the cache.

18 Cache Hierarchy on ARM Cortex-A CPUs

Core 0 Core 1 Core 2 Core 3

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Last-level cache: L2 • shared • not inclusive → Shared memory that is not in L2 is not shared in the cache.

Challenge #4 Non-inclusive caches

18 • CPUs do not share a cache

Challenge #5 No shared cache

Cache Hierarchy on ARM big.LITTLE

CPU1 CPU2

Core 0 Core 1 Core 2 Core 3 Core 0 Core 1 Core 2 Core 3 L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L2 Cache L2 Cache

CoreLink CCI-400

• Interconnects multiple CPUs to combine energy efficiency and performance

19 Challenge #5 No shared cache

Cache Hierarchy on ARM big.LITTLE

CPU1 CPU2

Core 0 Core 1 Core 2 Core 3 Core 0 Core 1 Core 2 Core 3 L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L2 Cache L2 Cache

CoreLink CCI-400

• Interconnects multiple CPUs to combine energy efficiency and performance • CPUs do not share a cache

19 Cache Hierarchy on ARM big.LITTLE

CPU1 CPU2

Core 0 Core 1 Core 2 Core 3 Core 0 Core 1 Core 2 Core 3 L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L2 Cache L2 Cache

CoreLink CCI-400

• Interconnects multiple CPUs to combine energy efficiency and performance • CPUs do not share a cache

Challenge #5 No shared cache

19 Let’s solve those challenges Challenges

Challenge #1 No flush instruction Challenge #2 Pseudo-random replacement policy Challenge #3 No unprivileged timing Challenge #4 Non-inclusive caches Challenge #5 No shared cache

20 • Works on Intel x86 • Prime+Probe • Flush+Reload → Evict+Reload

Solving #1: No flush instruction

• Replace the missing flush instruction with cache eviction

21 • Prime+Probe • Flush+Reload → Evict+Reload

Solving #1: No flush instruction

• Replace the missing flush instruction with cache eviction • Works on Intel x86

21 Solving #1: No flush instruction

• Replace the missing flush instruction with cache eviction • Works on Intel x86 • Prime+Probe • Flush+Reload → Evict+Reload

21 Challenges

Challenge #1 No flush instruction Ë Challenge #2 Pseudo-random replacement policy Challenge #3 No unprivileged timing Challenge #4 Non-inclusive caches Challenge #5 No shared cache

22 Challenges

23 • Eviction can be slow and unreliable. . . • Unless you know how to properly evict data → Central idea of our Rowhammer.js paper

Challenges

• No.

24 • Unless you know how to properly evict data → Central idea of our Rowhammer.js paper

Challenges

• No. • Eviction can be slow and unreliable. . .

24 → Central idea of our Rowhammer.js paper

Challenges

• No. • Eviction can be slow and unreliable. . . • Unless you know how to properly evict data

24 Challenges

• No. • Eviction can be slow and unreliable. . . • Unless you know how to properly evict data → Central idea of our Rowhammer.js paper

24 • Accessing once n addresses in an n-way cache set → Cache eviction slow and unreliable

Solution: • Accessing unique addresses several times, with different access patterns

Solving #2: Pseudo-random replacement policy

• Cache line to be discarded is chosen pseudo-randomly

25 → Cache eviction slow and unreliable

Solution: • Accessing unique addresses several times, with different access patterns

Solving #2: Pseudo-random replacement policy

• Cache line to be discarded is chosen pseudo-randomly • Accessing once n addresses in an n-way cache set

25 Solution: • Accessing unique addresses several times, with different access patterns

Solving #2: Pseudo-random replacement policy

• Cache line to be discarded is chosen pseudo-randomly • Accessing once n addresses in an n-way cache set → Cache eviction slow and unreliable

25 Solving #2: Pseudo-random replacement policy

• Cache line to be discarded is chosen pseudo-randomly • Accessing once n addresses in an n-way cache set → Cache eviction slow and unreliable

Solution: • Accessing unique addresses several times, with different access patterns

25 200 200 33 110  96.04%  800 800 142 876  99.10%  21 96 4 275  99.93%  22 102 5 101  99.99%  23 190 6 209  100.0% 

Solving #2: Pseudo-random replacement policy

Table 1: Different eviction strategies for the Alcatel One Touch Pop 2

Addresses Accesses Cycles Eviction rate 48 48 6 517  70.78% 

26 800 800 142 876  99.10%  21 96 4 275  99.93%  22 102 5 101  99.99%  23 190 6 209  100.0% 

Solving #2: Pseudo-random replacement policy

Table 1: Different eviction strategies for the Alcatel One Touch Pop 2

Addresses Accesses Cycles Eviction rate 48 48 6 517  70.78%  200 200 33 110  96.04% 

26 21 96 4 275  99.93%  22 102 5 101  99.99%  23 190 6 209  100.0% 

Solving #2: Pseudo-random replacement policy

Table 1: Different eviction strategies for the Alcatel One Touch Pop 2

Addresses Accesses Cycles Eviction rate 48 48 6 517  70.78%  200 200 33 110  96.04%  800 800 142 876  99.10% 

26 22 102 5 101  99.99%  23 190 6 209  100.0% 

Solving #2: Pseudo-random replacement policy

Table 1: Different eviction strategies for the Alcatel One Touch Pop 2

Addresses Accesses Cycles Eviction rate 48 48 6 517  70.78%  200 200 33 110  96.04%  800 800 142 876  99.10%  21 96 4 275  99.93% 

26 23 190 6 209  100.0% 

Solving #2: Pseudo-random replacement policy

Table 1: Different eviction strategies for the Alcatel One Touch Pop 2

Addresses Accesses Cycles Eviction rate 48 48 6 517  70.78%  200 200 33 110  96.04%  800 800 142 876  99.10%  21 96 4 275  99.93%  22 102 5 101  99.99% 

26 Solving #2: Pseudo-random replacement policy

Table 1: Different eviction strategies for the Alcatel One Touch Pop 2

Addresses Accesses Cycles Eviction rate 48 48 6 517  70.78%  200 200 33 110  96.04%  800 800 142 876  99.10%  21 96 4 275  99.93%  22 102 5 101  99.99%  23 190 6 209  100.0% 

26 • Compile target executable with generated eviction strategy • Execute on target device • Evaluate log files and build result database • Find fast and efficient eviction strategies for any device

Solving #2: Pseudo-random replacement policy

• We fully automated this process

27 • Execute on target device • Evaluate log files and build result database • Find fast and efficient eviction strategies for any device

Solving #2: Pseudo-random replacement policy

• We fully automated this process • Compile target executable with generated eviction strategy

27 • Evaluate log files and build result database • Find fast and efficient eviction strategies for any device

Solving #2: Pseudo-random replacement policy

• We fully automated this process • Compile target executable with generated eviction strategy • Execute on target device

27 • Find fast and efficient eviction strategies for any device

Solving #2: Pseudo-random replacement policy

• We fully automated this process • Compile target executable with generated eviction strategy • Execute on target device • Evaluate log files and build result database

27 Solving #2: Pseudo-random replacement policy

• We fully automated this process • Compile target executable with generated eviction strategy • Execute on target device • Evaluate log files and build result database • Find fast and efficient eviction strategies for any device

27 Prime+Probe

Victim access No victim access ·103 6 4 2

Number of accesses 1,600 1,800 2,000 2,200 2,400 2,600 2,800 3,000 3,200 3,400 Measured access time in cycles

Solving #2: Pseudo-random replacement policy

Evict+Reload

Victim access No victim access ·104

4

2

0

Number of accesses 0 50 100 150 200 250 300 Measured access time in cycles

28 Solving #2: Pseudo-random replacement policy

Evict+Reload

Victim access No victim access ·104

4

2

0

Number of accesses 0 50 100 150 200 250 300 Measured access time in cycles

Prime+Probe

Victim access No victim access ·103 6 4 2

Number of accesses 1,600 1,800 2,000 2,200 2,400 2,600 2,800 3,000 3,200 3,400 Measured access time in cycles 28 Challenges

Challenge #1 No flush instruction Ë Challenge #2 Pseudo-random replacement policy Ë Challenge #3 No unprivileged timing Challenge #4 Non-inclusive caches Challenge #5 No shared cache

29 2. perf event open: Unprivileged syscall • Unavailable on new devices: Privileged or kernel compiled with CONFIG PERF=n 3. clock gettime: Unprivileged POSIX function 4. Thread counter: Unprivileged, multithreaded

• Unprivileged timing sources • Nanosecond resolution for all sources → Allows distinguishing cache hits from cache misses

Solving #3: No unprivileged timing

1. Performance counter: Privileged

30 • Unavailable on new devices: Privileged or kernel compiled with CONFIG PERF=n 3. clock gettime: Unprivileged POSIX function 4. Thread counter: Unprivileged, multithreaded

• Unprivileged timing sources • Nanosecond resolution for all sources → Allows distinguishing cache hits from cache misses

Solving #3: No unprivileged timing

1. Performance counter: Privileged 2. perf event open: Unprivileged syscall

30 3. clock gettime: Unprivileged POSIX function 4. Thread counter: Unprivileged, multithreaded

• Unprivileged timing sources • Nanosecond resolution for all sources → Allows distinguishing cache hits from cache misses

Solving #3: No unprivileged timing

1. Performance counter: Privileged 2. perf event open: Unprivileged syscall • Unavailable on new devices: Privileged or kernel compiled with CONFIG PERF=n

30 4. Thread counter: Unprivileged, multithreaded

• Unprivileged timing sources • Nanosecond resolution for all sources → Allows distinguishing cache hits from cache misses

Solving #3: No unprivileged timing

1. Performance counter: Privileged 2. perf event open: Unprivileged syscall • Unavailable on new devices: Privileged or kernel compiled with CONFIG PERF=n 3. clock gettime: Unprivileged POSIX function

30 • Unprivileged timing sources • Nanosecond resolution for all sources → Allows distinguishing cache hits from cache misses

Solving #3: No unprivileged timing

1. Performance counter: Privileged 2. perf event open: Unprivileged syscall • Unavailable on new devices: Privileged or kernel compiled with CONFIG PERF=n 3. clock gettime: Unprivileged POSIX function 4. Thread counter: Unprivileged, multithreaded

30 • Nanosecond resolution for all sources → Allows distinguishing cache hits from cache misses

Solving #3: No unprivileged timing

1. Performance counter: Privileged 2. perf event open: Unprivileged syscall • Unavailable on new devices: Privileged or kernel compiled with CONFIG PERF=n 3. clock gettime: Unprivileged POSIX function 4. Thread counter: Unprivileged, multithreaded

• Unprivileged timing sources

30 → Allows distinguishing cache hits from cache misses

Solving #3: No unprivileged timing

1. Performance counter: Privileged 2. perf event open: Unprivileged syscall • Unavailable on new devices: Privileged or kernel compiled with CONFIG PERF=n 3. clock gettime: Unprivileged POSIX function 4. Thread counter: Unprivileged, multithreaded

• Unprivileged timing sources • Nanosecond resolution for all sources

30 Solving #3: No unprivileged timing

1. Performance counter: Privileged 2. perf event open: Unprivileged syscall • Unavailable on new devices: Privileged or kernel compiled with CONFIG PERF=n 3. clock gettime: Unprivileged POSIX function 4. Thread counter: Unprivileged, multithreaded

• Unprivileged timing sources • Nanosecond resolution for all sources → Allows distinguishing cache hits from cache misses

30 Solving #3: No unprivileged timing

Hit (PMCCNTR) Hit (clock gettime×.15) Miss (PMCCNTR) Miss (clock gettime×.15) Hit (syscall×.25) Hit (counter thread×.05) Miss (syscall×.25) Miss (counter thread×.05)

·104

4

2 Number of accesses

0 0 20 40 60 80 100 120 140 160 180 200 Measured access time (scaled)

31 Challenges

Challenge #1 No flush instruction Ë Challenge #2 Pseudo-random replacement policy Ë Challenge #3 No unprivileged timing Ë Challenge #4 Non-inclusive caches Challenge #5 No shared cache

32 • Fill-up L1 D-Cache and begin to populate shared L2 Cache • Inclusion: Line evicted from L2 → also evicted from L1

Solving #4: Non-inclusive caches

Core 0 Core 1

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Instruction-inclusive, data-non-inclusive caches

33 • Fill-up L1 D-Cache and begin to populate shared L2 Cache • Inclusion: Line evicted from L2 → also evicted from L1

Solving #4: Non-inclusive caches

Core 0 Core 1

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Instruction-inclusive, data-non-inclusive caches

33 • Inclusion: Line evicted from L2 → also evicted from L1

Solving #4: Non-inclusive caches

Core 0 Core 1

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache

33 • Inclusion: Line evicted from L2 → also evicted from L1

Solving #4: Non-inclusive caches

Core 0 Core 1

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache

33 • Inclusion: Line evicted from L2 → also evicted from L1

Solving #4: Non-inclusive caches

Core 0 Core 1

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache

33 • Inclusion: Line evicted from L2 → also evicted from L1

Solving #4: Non-inclusive caches

Core 0 Core 1

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache

33 • Inclusion: Line evicted from L2 → also evicted from L1

Solving #4: Non-inclusive caches

Core 0 Core 1

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache

33 • Inclusion: Line evicted from L2 → also evicted from L1

Solving #4: Non-inclusive caches

Core 0 Core 1

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache

33 • Inclusion: Line evicted from L2 → also evicted from L1

Solving #4: Non-inclusive caches

Core 0 Core 1

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache

33 • Inclusion: Line evicted from L2 → also evicted from L1

Solving #4: Non-inclusive caches

Core 0 Core 1

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache

33 • Inclusion: Line evicted from L2 → also evicted from L1

Solving #4: Non-inclusive caches

Core 0 Core 1

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache

33 • Inclusion: Line evicted from L2 → also evicted from L1

Solving #4: Non-inclusive caches

Core 0 Core 1

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache

33 • Inclusion: Line evicted from L2 → also evicted from L1

Solving #4: Non-inclusive caches

Core 0 Core 1

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache

33 • Inclusion: Line evicted from L2 → also evicted from L1

Solving #4: Non-inclusive caches

Core 0 Core 1

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache

33 • Inclusion: Line evicted from L2 → also evicted from L1

Solving #4: Non-inclusive caches

Core 0 Core 1

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache

33 • Inclusion: Line evicted from L2 → also evicted from L1

Solving #4: Non-inclusive caches

Core 0 Core 1

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache

33 • Inclusion: Line evicted from L2 → also evicted from L1

Solving #4: Non-inclusive caches

Core 0 Core 1

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache

33 • Inclusion: Line evicted from L2 → also evicted from L1

Solving #4: Non-inclusive caches

Core 0 Core 1

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache

33 • Inclusion: Line evicted from L2 → also evicted from L1

Solving #4: Non-inclusive caches

Core 0 Core 1

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache

33 • Inclusion: Line evicted from L2 → also evicted from L1

Solving #4: Non-inclusive caches

Core 0 Core 1

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache

33 Solving #4: Non-inclusive caches

Core 0 Core 1

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache

• Inclusion: Line evicted from L2 → also evicted from L1 33 Solving #4: Non-inclusive caches

Core 0 Core 1

L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache

L2 Cache

• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache

• Inclusion: Line evicted from L2 → also evicted from L1 33 • Cache coherency protocol • Fetches data from remote cores • Remote cache hit is faster than DRAM access → Detect if another core has accessed the memory location

Hit (same core) Hit (cross-core) ·104 Miss (same core) Miss (cross-core) 3 2 1 0

Number of accesses 0 100 200 300 400 500 600 700 800 900 1,000 Measured access time in cycles

Solving #4: Non-inclusive caches

• How can we determine if the victim has loaded the address?

34 • Fetches data from remote cores • Remote cache hit is faster than DRAM access → Detect if another core has accessed the memory location

Hit (same core) Hit (cross-core) ·104 Miss (same core) Miss (cross-core) 3 2 1 0

Number of accesses 0 100 200 300 400 500 600 700 800 900 1,000 Measured access time in cycles

Solving #4: Non-inclusive caches

• How can we determine if the victim has loaded the address? • Cache coherency protocol

34 • Remote cache hit is faster than DRAM access → Detect if another core has accessed the memory location

Hit (same core) Hit (cross-core) ·104 Miss (same core) Miss (cross-core) 3 2 1 0

Number of accesses 0 100 200 300 400 500 600 700 800 900 1,000 Measured access time in cycles

Solving #4: Non-inclusive caches

• How can we determine if the victim has loaded the address? • Cache coherency protocol • Fetches data from remote cores

34 → Detect if another core has accessed the memory location

Hit (same core) Hit (cross-core) ·104 Miss (same core) Miss (cross-core) 3 2 1 0

Number of accesses 0 100 200 300 400 500 600 700 800 900 1,000 Measured access time in cycles

Solving #4: Non-inclusive caches

• How can we determine if the victim has loaded the address? • Cache coherency protocol • Fetches data from remote cores • Remote cache hit is faster than DRAM access

34 Solving #4: Non-inclusive caches

• How can we determine if the victim has loaded the address? • Cache coherency protocol • Fetches data from remote cores • Remote cache hit is faster than DRAM access → Detect if another core has accessed the memory location

Hit (same core) Hit (cross-core) ·104 Miss (same core) Miss (cross-core) 3 2 1 0

Number of accesses 0 100 200 300 400 500 600 700 800 900 1,000 Measured access time in cycles

34 Challenges

Challenge #1 No flush instruction Ë Challenge #2 Pseudo-random replacement policy Ë Challenge #3 No unprivileged timing Ë Challenge #4 Non-inclusive caches Ë Challenge #5 No shared cache

35 • Cache coherency protocol (again) • Fetches data from remote CPU • Remote cache hit is faster than DRAM access

Hit (same core) Hit (cross-cpu) ·104 Miss (same core) Miss (cross-cpu) 3 2 1 0

Number of accesses 0 50 100 150 200 250 300 350 400 450 500 Measured access time in cycles

Solving #5: No shared cache

• Multiple CPUs that do not share a cache

36 • Fetches data from remote CPU • Remote cache hit is faster than DRAM access

Hit (same core) Hit (cross-cpu) ·104 Miss (same core) Miss (cross-cpu) 3 2 1 0

Number of accesses 0 50 100 150 200 250 300 350 400 450 500 Measured access time in cycles

Solving #5: No shared cache

• Multiple CPUs that do not share a cache • Cache coherency protocol (again)

36 • Remote cache hit is faster than DRAM access

Hit (same core) Hit (cross-cpu) ·104 Miss (same core) Miss (cross-cpu) 3 2 1 0

Number of accesses 0 50 100 150 200 250 300 350 400 450 500 Measured access time in cycles

Solving #5: No shared cache

• Multiple CPUs that do not share a cache • Cache coherency protocol (again) • Fetches data from remote CPU

36 Solving #5: No shared cache

• Multiple CPUs that do not share a cache • Cache coherency protocol (again) • Fetches data from remote CPU • Remote cache hit is faster than DRAM access

Hit (same core) Hit (cross-cpu) ·104 Miss (same core) Miss (cross-cpu) 3 2 1 0

Number of accesses 0 50 100 150 200 250 300 350 400 450 500 Measured access time in cycles

36 Challenges

Challenge #1 No flush instruction Ë Challenge #2 Pseudo-random replacement policy Ë Challenge #3 No unprivileged timing Ë Challenge #4 Non-inclusive caches Ë Challenge #5 No shared cache Ë

37 Attack scenarios Case study #1 Covert communication

38 • No permissions except accessing the Internet

• No permissions except accessing your images • Malicious weather widget

Case Study #1: Covert Channel

• Malicious privacy gallery app

39 • No permissions except accessing the Internet

• Malicious weather widget

Case Study #1: Covert Channel

• Malicious privacy gallery app • No permissions except accessing your images

39 • No permissions except accessing the Internet

Case Study #1: Covert Channel

• Malicious privacy gallery app • No permissions except accessing your images • Malicious weather widget

39 Case Study #1: Covert Channel

• Malicious privacy gallery app • No permissions except accessing your images • Malicious weather widget • No permissions except accessing the Internet

39 Case Study #1: Covert Channel

• Malicious privacy gallery app • No permissions except accessing your images • Malicious weather widget • No permissions except accessing the Internet

covert channel

39 Case Study #1: Covert Channel

• Malicious privacy gallery app • No permissions except accessing your images • Malicious weather widget • No permissions except accessing the Internet

covert channel

39 Case Study #1: Covert Channel

• Malicious privacy gallery app • No permissions except accessing your images • Malicious weather widget • No permissions except accessing the Internet

covert channel

39 • No permissions • No Intents, Binders, ASHMEM, . . . • A covert channel • Enables two unprivileged apps to communicate • Does not use data transfer mechanisms provided by the OS • Evades the sandboxing concept and permission system → Collusion attack

Case Study #1: Covert Channel

• Two apps want to communicate with each other, but are not allowed or not able to do so

40 • No Intents, Binders, ASHMEM, . . . • A covert channel • Enables two unprivileged apps to communicate • Does not use data transfer mechanisms provided by the OS • Evades the sandboxing concept and permission system → Collusion attack

Case Study #1: Covert Channel

• Two apps want to communicate with each other, but are not allowed or not able to do so • No permissions

40 • A covert channel • Enables two unprivileged apps to communicate • Does not use data transfer mechanisms provided by the OS • Evades the sandboxing concept and permission system → Collusion attack

Case Study #1: Covert Channel

• Two apps want to communicate with each other, but are not allowed or not able to do so • No permissions • No Intents, Binders, ASHMEM, . . .

40 • Enables two unprivileged apps to communicate • Does not use data transfer mechanisms provided by the OS • Evades the sandboxing concept and permission system → Collusion attack

Case Study #1: Covert Channel

• Two apps want to communicate with each other, but are not allowed or not able to do so • No permissions • No Intents, Binders, ASHMEM, . . . • A covert channel

40 • Does not use data transfer mechanisms provided by the OS • Evades the sandboxing concept and permission system → Collusion attack

Case Study #1: Covert Channel

• Two apps want to communicate with each other, but are not allowed or not able to do so • No permissions • No Intents, Binders, ASHMEM, . . . • A covert channel • Enables two unprivileged apps to communicate

40 • Evades the sandboxing concept and permission system → Collusion attack

Case Study #1: Covert Channel

• Two apps want to communicate with each other, but are not allowed or not able to do so • No permissions • No Intents, Binders, ASHMEM, . . . • A covert channel • Enables two unprivileged apps to communicate • Does not use data transfer mechanisms provided by the OS

40 → Collusion attack

Case Study #1: Covert Channel

• Two apps want to communicate with each other, but are not allowed or not able to do so • No permissions • No Intents, Binders, ASHMEM, . . . • A covert channel • Enables two unprivileged apps to communicate • Does not use data transfer mechanisms provided by the OS • Evades the sandboxing concept and permission system

40 Case Study #1: Covert Channel

• Two apps want to communicate with each other, but are not allowed or not able to do so • No permissions • No Intents, Binders, ASHMEM, . . . • A covert channel • Enables two unprivileged apps to communicate • Does not use data transfer mechanisms provided by the OS • Evades the sandboxing concept and permission system → Collusion attack

40 • Using addresses from a shared library/executable • Bits transmitted with cache hits and misses • Transmit 0: Do not access the address → cache miss • Transmit 1: Access the address → cache hit

Case Study #1: Covert Channel

• Build a covert channel using the cache

41 • Bits transmitted with cache hits and misses • Transmit 0: Do not access the address → cache miss • Transmit 1: Access the address → cache hit

Case Study #1: Covert Channel

• Build a covert channel using the cache • Using addresses from a shared library/executable

41 • Transmit 0: Do not access the address → cache miss • Transmit 1: Access the address → cache hit

Case Study #1: Covert Channel

• Build a covert channel using the cache • Using addresses from a shared library/executable • Bits transmitted with cache hits and misses

41 • Transmit 1: Access the address → cache hit

Case Study #1: Covert Channel

• Build a covert channel using the cache • Using addresses from a shared library/executable • Bits transmitted with cache hits and misses • Transmit 0: Do not access the address → cache miss

41 Case Study #1: Covert Channel

• Build a covert channel using the cache • Using addresses from a shared library/executable • Bits transmitted with cache hits and misses • Transmit 0: Do not access the address → cache miss • Transmit 1: Access the address → cache hit

41 • Use a sequence number (SQN) • Protect payload and sequence number with a checksum

078232431 Sender SQN Payload CRC

02310 Receiver SQN CRC

Case Study #1: Covert Channel

• Build a protocol based on packets

42 • Protect payload and sequence number with a checksum

078232431 Sender SQN Payload CRC

02310 Receiver SQN CRC

Case Study #1: Covert Channel

• Build a protocol based on packets • Use a sequence number (SQN)

42 Case Study #1: Covert Channel

• Build a protocol based on packets • Use a sequence number (SQN) • Protect payload and sequence number with a checksum

078232431 Sender SQN Payload CRC

02310 Receiver SQN CRC

42 • Works cross-core and cross-CPU • Faster than state of the art by several orders of magnitude

Work Type Bandwidth [bps] Error rate

Case Study #1: Covert Channel

• Works using Flush+Reload, Evict+Reload and Flush+Flush

43 • Faster than state of the art by several orders of magnitude

Work Type Bandwidth [bps] Error rate

Case Study #1: Covert Channel

• Works using Flush+Reload, Evict+Reload and Flush+Flush • Works cross-core and cross-CPU

43 Work Type Bandwidth [bps] Error rate

Case Study #1: Covert Channel

• Works using Flush+Reload, Evict+Reload and Flush+Flush • Works cross-core and cross-CPU • Faster than state of the art by several orders of magnitude

43 Case Study #1: Covert Channel

• Works using Flush+Reload, Evict+Reload and Flush+Flush • Works cross-core and cross-CPU • Faster than state of the art by several orders of magnitude

Work Type Bandwidth [bps] Error rate Schlegel et al. Vibration settings 87 – Schlegel et al. Volume settings 150 – Schlegel et al. File locks 685 – Marforio et al. UNIX socket discovery 2 600 – Marforio et al. Type of Intents 4 300 –

43 Case Study #1: Covert Channel

• Works using Flush+Reload, Evict+Reload and Flush+Flush • Works cross-core and cross-CPU • Faster than state of the art by several orders of magnitude

Work Type Bandwidth [bps] Error rate Schlegel et al. Vibration settings 87 – Schlegel et al. Volume settings 150 – Schlegel et al. File locks 685 – Marforio et al. UNIX socket discovery 2 600 – Marforio et al. Type of Intents 4 300 – Ours (OnePlus One) Evict+Reload, cross-core 12 537 5.00% Ours (Alcatel One Touch Pop 2) Evict+Reload, cross-core 13 618 3.79% Ours (Samsung Galaxy S6) Flush+Flush, cross-core 178 292 0.48% Ours (Samsung Galaxy S6) Flush+Reload, cross-CPU 257 509 1.83% Ours (Samsung Galaxy S6) Flush+Reload, cross-core 1 140 650 1.10%

43 Case study #2 Spying on the user

44 → Cache Template Attacks

1. Shared library or executable is mapped 2. Trigger an event in parallel and Flush+Reload one address → Cache hit: Address used by the library/executable 3. Repeat step 2 for every address

Case Study #2: Spying on the User

• Issue: Locating event-dependent memory access

45 1. Shared library or executable is mapped 2. Trigger an event in parallel and Flush+Reload one address → Cache hit: Address used by the library/executable 3. Repeat step 2 for every address

Case Study #2: Spying on the User

• Issue: Locating event-dependent memory access → Cache Template Attacks

45 2. Trigger an event in parallel and Flush+Reload one address → Cache hit: Address used by the library/executable 3. Repeat step 2 for every address

Case Study #2: Spying on the User

• Issue: Locating event-dependent memory access → Cache Template Attacks

1. Shared library or executable is mapped

45 → Cache hit: Address used by the library/executable 3. Repeat step 2 for every address

Case Study #2: Spying on the User

• Issue: Locating event-dependent memory access → Cache Template Attacks

1. Shared library or executable is mapped 2. Trigger an event in parallel and Flush+Reload one address

45 3. Repeat step 2 for every address

Case Study #2: Spying on the User

• Issue: Locating event-dependent memory access → Cache Template Attacks

1. Shared library or executable is mapped 2. Trigger an event in parallel and Flush+Reload one address → Cache hit: Address used by the library/executable

45 Case Study #2: Spying on the User

• Issue: Locating event-dependent memory access → Cache Template Attacks

1. Shared library or executable is mapped 2. Trigger an event in parallel and Flush+Reload one address → Cache hit: Address used by the library/executable 3. Repeat step 2 for every address

45 Case Study #2: Spying on the User

• Cache template matrix = How many cache hits for each pair (event, address)? • On shared library and ART binaries, e.g., AOSP keyboard

alphabet enter

Input space backspace 0x41500 0x45140 0x56940 0x57280 0x58480 0x60280 0x60340 0x60580 0x66340 0x66380 0x72300 0x72340 0x72380 0x78440 0x98600

A ddresses

46 Case Study #2: Spying on the User

• Cache template matrix = How many cache hits for each pair (event, address)? • On shared library and ART binaries, e.g., AOSP keyboard

alphabet enter

Input space backspace 0x41500 0x45140 0x56940 0x57280 0x58480 0x60280 0x60340 0x60580 0x66340 0x66380 0x72300 0x72340 0x72380 0x78440 0x98600

A ddresses Many cache hits

46 Case Study #2: Spying on the User

• Cache template matrix = How many cache hits for each pair (event, address)? • On shared library and ART binaries, e.g., AOSP keyboard

No cache hits

alphabet enter

Input space backspace 0x41500 0x45140 0x56940 0x57280 0x58480 0x60280 0x60340 0x60580 0x66340 0x66380 0x72300 0x72340 0x72380 0x78440 0x98600

A ddresses Many cache hits

46 Case Study #2: Spying on the User

Evict+Reload on two addresses on the Alcatel One Touch Pop 2 in custpack@app@[email protected]@classes.dex → Differentiate keys from spaces

Key Space 300

200 Access time

100 t h i s Space i s Space a Space m e s s a g e

0 1 2 3 4 5 6 7 Time in seconds

47 • Scan all the libraries • Find all secret-dependent accesses and automate attacks • No need for source code • Spy and learn about user’s behavior

Case Study #2: Spying on the User

• Endless possibilities

48 • Find all secret-dependent accesses and automate attacks • No need for source code • Spy and learn about user’s behavior

Case Study #2: Spying on the User

• Endless possibilities • Scan all the libraries

48 • No need for source code • Spy and learn about user’s behavior

Case Study #2: Spying on the User

• Endless possibilities • Scan all the libraries • Find all secret-dependent accesses and automate attacks

48 • Spy and learn about user’s behavior

Case Study #2: Spying on the User

• Endless possibilities • Scan all the libraries • Find all secret-dependent accesses and automate attacks • No need for source code

48 Case Study #2: Spying on the User

• Endless possibilities • Scan all the libraries • Find all secret-dependent accesses and automate attacks • No need for source code • Spy and learn about user’s behavior

48 Case study #3 Attacking cryptographic algorithms

49 • Uses precomputed look-up tables • One-round known-plaintext attack by Osvik et al. (2006) • p plaintext and k secret key (r) (r) (r) • Intermediate state x = (x0 ,...,x15 ) at each round r • First round, accessed table indices are

(0) xi = pi ⊕ ki for all i = 0,...,15

→ Recovering accessed table indices ⇒ recovering the key

Case Study #3: Attacking Cryptographic Algorithms

• AES T-Tables: Fast software implementation

50 • One-round known-plaintext attack by Osvik et al. (2006) • p plaintext and k secret key (r) (r) (r) • Intermediate state x = (x0 ,...,x15 ) at each round r • First round, accessed table indices are

(0) xi = pi ⊕ ki for all i = 0,...,15

→ Recovering accessed table indices ⇒ recovering the key

Case Study #3: Attacking Cryptographic Algorithms

• AES T-Tables: Fast software implementation • Uses precomputed look-up tables

50 → Recovering accessed table indices ⇒ recovering the key

Case Study #3: Attacking Cryptographic Algorithms

• AES T-Tables: Fast software implementation • Uses precomputed look-up tables • One-round known-plaintext attack by Osvik et al. (2006) • p plaintext and k secret key (r) (r) (r) • Intermediate state x = (x0 ,...,x15 ) at each round r • First round, accessed table indices are

(0) xi = pi ⊕ ki for all i = 0,...,15

50 Case Study #3: Attacking Cryptographic Algorithms

• AES T-Tables: Fast software implementation • Uses precomputed look-up tables • One-round known-plaintext attack by Osvik et al. (2006) • p plaintext and k secret key (r) (r) (r) • Intermediate state x = (x0 ,...,x15 ) at each round r • First round, accessed table indices are

(0) xi = pi ⊕ ki for all i = 0,...,15

→ Recovering accessed table indices ⇒ recovering the key

50 → Let’s monitor which T-Table entry is accessed! • Java VM creates a copy of the T-tables when the app starts • No shared memory → no Evict+Reload or Flush+Reload → Prime+Probe to the rescue! A ddress 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x0F 0x0A 0x0B 0x0C 0x0E 0x0D Plaintext byte values

Case Study #3: Attacking Cryptographic Algorithms

• Bouncy Castle → default implementation uses T-Tables

51 • Java VM creates a copy of the T-tables when the app starts • No shared memory → no Evict+Reload or Flush+Reload → Prime+Probe to the rescue! A ddress 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x0F 0x0A 0x0B 0x0C 0x0E 0x0D Plaintext byte values

Case Study #3: Attacking Cryptographic Algorithms

• Bouncy Castle → default implementation uses T-Tables → Let’s monitor which T-Table entry is accessed!

51 • No shared memory → no Evict+Reload or Flush+Reload → Prime+Probe to the rescue! A ddress 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x0F 0x0A 0x0B 0x0C 0x0E 0x0D Plaintext byte values

Case Study #3: Attacking Cryptographic Algorithms

• Bouncy Castle → default implementation uses T-Tables → Let’s monitor which T-Table entry is accessed! • Java VM creates a copy of the T-tables when the app starts

51 → Prime+Probe to the rescue! A ddress 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x0F 0x0A 0x0B 0x0C 0x0E 0x0D Plaintext byte values

Case Study #3: Attacking Cryptographic Algorithms

• Bouncy Castle → default implementation uses T-Tables → Let’s monitor which T-Table entry is accessed! • Java VM creates a copy of the T-tables when the app starts • No shared memory → no Evict+Reload or Flush+Reload

51 Case Study #3: Attacking Cryptographic Algorithms

• Bouncy Castle → default implementation uses T-Tables → Let’s monitor which T-Table entry is accessed! • Java VM creates a copy of the T-tables when the app starts • No shared memory → no Evict+Reload or Flush+Reload → Prime+Probe to the rescue! A ddress 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x0F 0x0A 0x0B 0x0C 0x0E 0x0D Plaintext byte values 51 Case study #4 Monitoring ARM TrustZone

52 • Secure Execution Environment • Roots of Trust • Applications (trustlets) running in a secure world • Credential-store • Secure element for payments • Digital Rights Management (DRM) • Information from the trusted world should not be leaked to the non-secure world

Case Study #4: Monitoring ARM TrustZone

• Hardware-based security technology

53 • Roots of Trust • Applications (trustlets) running in a secure world • Credential-store • Secure element for payments • Digital Rights Management (DRM) • Information from the trusted world should not be leaked to the non-secure world

Case Study #4: Monitoring ARM TrustZone

• Hardware-based security technology • Secure Execution Environment

53 • Applications (trustlets) running in a secure world • Credential-store • Secure element for payments • Digital Rights Management (DRM) • Information from the trusted world should not be leaked to the non-secure world

Case Study #4: Monitoring ARM TrustZone

• Hardware-based security technology • Secure Execution Environment • Roots of Trust

53 • Credential-store • Secure element for payments • Digital Rights Management (DRM) • Information from the trusted world should not be leaked to the non-secure world

Case Study #4: Monitoring ARM TrustZone

• Hardware-based security technology • Secure Execution Environment • Roots of Trust • Applications (trustlets) running in a secure world

53 • Secure element for payments • Digital Rights Management (DRM) • Information from the trusted world should not be leaked to the non-secure world

Case Study #4: Monitoring ARM TrustZone

• Hardware-based security technology • Secure Execution Environment • Roots of Trust • Applications (trustlets) running in a secure world • Credential-store

53 • Digital Rights Management (DRM) • Information from the trusted world should not be leaked to the non-secure world

Case Study #4: Monitoring ARM TrustZone

• Hardware-based security technology • Secure Execution Environment • Roots of Trust • Applications (trustlets) running in a secure world • Credential-store • Secure element for payments

53 • Information from the trusted world should not be leaked to the non-secure world

Case Study #4: Monitoring ARM TrustZone

• Hardware-based security technology • Secure Execution Environment • Roots of Trust • Applications (trustlets) running in a secure world • Credential-store • Secure element for payments • Digital Rights Management (DRM)

53 Case Study #4: Monitoring ARM TrustZone

• Hardware-based security technology • Secure Execution Environment • Roots of Trust • Applications (trustlets) running in a secure world • Credential-store • Secure element for payments • Digital Rights Management (DRM) • Information from the trusted world should not be leaked to the non-secure world

53 • No knowledge of the TrustZone/trustlet implementation • Valid keys and invalid keys are distinguishable

·106 Valid key 1 Valid key 2 Valid key 3 1 Invalid key

0.5 Mean Squared Error (MSE)

0 250 260 270 280 290 300 310 320 330 340 350 Set number

Case Study #4: Monitoring ARM TrustZone

• Timing difference for different RSA signature keys

54 • Valid keys and invalid keys are distinguishable

·106 Valid key 1 Valid key 2 Valid key 3 1 Invalid key

0.5 Mean Squared Error (MSE)

0 250 260 270 280 290 300 310 320 330 340 350 Set number

Case Study #4: Monitoring ARM TrustZone

• Timing difference for different RSA signature keys • No knowledge of the TrustZone/trustlet implementation

54 Case Study #4: Monitoring ARM TrustZone

• Timing difference for different RSA signature keys • No knowledge of the TrustZone/trustlet implementation • Valid keys and invalid keys are distinguishable

·106 Valid key 1 Valid key 2 Valid key 3 1 Invalid key

0.5 Mean Squared Error (MSE)

0 250 260 270 280 290 300 310 320 330 340 350 Set number

54 Tools • x86 • ARMv7 • ARMv8 • Implements various attack techniques • Flush+Reload • Evict+Reload • Prime+Probe • Flush+Flush • Prefetch • Open Source

libflush

• Library to build cross-platform cache attacks

55 • Implements various attack techniques • Flush+Reload • Evict+Reload • Prime+Probe • Flush+Flush • Prefetch • Open Source

libflush

• Library to build cross-platform cache attacks • x86 • ARMv7 • ARMv8

55 • Open Source

libflush

• Library to build cross-platform cache attacks • x86 • ARMv7 • ARMv8 • Implements various attack techniques • Flush+Reload • Evict+Reload • Prime+Probe • Flush+Flush • Prefetch

55 libflush

• Library to build cross-platform cache attacks • x86 • ARMv7 • ARMv8 • Implements various attack techniques • Flush+Reload • Evict+Reload • Prime+Probe • Flush+Flush • Prefetch • Open Source

55 • Includes eviction strategies for several devices • Comes with example code and documentation • Even allows to implement cross-platform Rowhammer attacks

‡ github.com/iaik/armageddon/libflush

libflush

• All described examples have been developed on top of libflush

56 • Comes with example code and documentation • Even allows to implement cross-platform Rowhammer attacks

‡ github.com/iaik/armageddon/libflush

libflush

• All described examples have been developed on top of libflush • Includes eviction strategies for several devices

56 • Even allows to implement cross-platform Rowhammer attacks

‡ github.com/iaik/armageddon/libflush

libflush

• All described examples have been developed on top of libflush • Includes eviction strategies for several devices • Comes with example code and documentation

56 ‡ github.com/iaik/armageddon/libflush

libflush

• All described examples have been developed on top of libflush • Includes eviction strategies for several devices • Comes with example code and documentation • Even allows to implement cross-platform Rowhammer attacks

56 libflush

• All described examples have been developed on top of libflush • Includes eviction strategies for several devices • Comes with example code and documentation • Even allows to implement cross-platform Rowhammer attacks

‡ github.com/iaik/armageddon/libflush

56 • Automatically executed on target platform • Results are stored in a database file

‡ github.com/iaik/armageddon/eviction_strategy_ evaluator

Eviction Strategy Evaluator

• Utilizes libflush to evaluate eviction strategies

57 • Results are stored in a database file

‡ github.com/iaik/armageddon/eviction_strategy_ evaluator

Eviction Strategy Evaluator

• Utilizes libflush to evaluate eviction strategies • Automatically executed on target platform

57 ‡ github.com/iaik/armageddon/eviction_strategy_ evaluator

Eviction Strategy Evaluator

• Utilizes libflush to evaluate eviction strategies • Automatically executed on target platform • Results are stored in a database file

57 Eviction Strategy Evaluator

• Utilizes libflush to evaluate eviction strategies • Automatically executed on target platform • Results are stored in a database file

‡ github.com/iaik/armageddon/eviction_strategy_ evaluator

57 • Scan libraries and executable for vulnerable addresses • Additional tool to simulate input events

‡ github.com/iaik/armageddon/cache_template_attacks

Cache Template Attacks

• Cross-platform cache template attack tool

58 • Additional tool to simulate input events

‡ github.com/iaik/armageddon/cache_template_attacks

Cache Template Attacks

• Cross-platform cache template attack tool • Scan libraries and executable for vulnerable addresses

58 ‡ github.com/iaik/armageddon/cache_template_attacks

Cache Template Attacks

• Cross-platform cache template attack tool • Scan libraries and executable for vulnerable addresses • Additional tool to simulate input events

58 Cache Template Attacks

• Cross-platform cache template attack tool • Scan libraries and executable for vulnerable addresses • Additional tool to simulate input events

‡ github.com/iaik/armageddon/cache_template_attacks

58 Countermeasures → But a thread timer is sufficient. . . • No shared memory → What about Prime+Probe? • Restrict system information, e.g., pagemap → Harder but still possible • Use cryptographic instruction extensions → Still not the default everywhere. . . → Doesn’t protect from spying user behavior. . .

→ Protect crypto. For the rest. . . No satisfying solution

Countermeasures

• Coarse-grained timers

59 • No shared memory → What about Prime+Probe? • Restrict system information, e.g., pagemap → Harder but still possible • Use cryptographic instruction extensions → Still not the default everywhere. . . → Doesn’t protect from spying user behavior. . .

→ Protect crypto. For the rest. . . No satisfying solution

Countermeasures

• Coarse-grained timers → But a thread timer is sufficient. . .

59 → What about Prime+Probe? • Restrict system information, e.g., pagemap → Harder but still possible • Use cryptographic instruction extensions → Still not the default everywhere. . . → Doesn’t protect from spying user behavior. . .

→ Protect crypto. For the rest. . . No satisfying solution

Countermeasures

• Coarse-grained timers → But a thread timer is sufficient. . . • No shared memory

59 • Restrict system information, e.g., pagemap → Harder but still possible • Use cryptographic instruction extensions → Still not the default everywhere. . . → Doesn’t protect from spying user behavior. . .

→ Protect crypto. For the rest. . . No satisfying solution

Countermeasures

• Coarse-grained timers → But a thread timer is sufficient. . . • No shared memory → What about Prime+Probe?

59 → Harder but still possible • Use cryptographic instruction extensions → Still not the default everywhere. . . → Doesn’t protect from spying user behavior. . .

→ Protect crypto. For the rest. . . No satisfying solution

Countermeasures

• Coarse-grained timers → But a thread timer is sufficient. . . • No shared memory → What about Prime+Probe? • Restrict system information, e.g., pagemap

59 • Use cryptographic instruction extensions → Still not the default everywhere. . . → Doesn’t protect from spying user behavior. . .

→ Protect crypto. For the rest. . . No satisfying solution

Countermeasures

• Coarse-grained timers → But a thread timer is sufficient. . . • No shared memory → What about Prime+Probe? • Restrict system information, e.g., pagemap → Harder but still possible

59 → Still not the default everywhere. . . → Doesn’t protect from spying user behavior. . .

→ Protect crypto. For the rest. . . No satisfying solution

Countermeasures

• Coarse-grained timers → But a thread timer is sufficient. . . • No shared memory → What about Prime+Probe? • Restrict system information, e.g., pagemap → Harder but still possible • Use cryptographic instruction extensions

59 → Doesn’t protect from spying user behavior. . .

→ Protect crypto. For the rest. . . No satisfying solution

Countermeasures

• Coarse-grained timers → But a thread timer is sufficient. . . • No shared memory → What about Prime+Probe? • Restrict system information, e.g., pagemap → Harder but still possible • Use cryptographic instruction extensions → Still not the default everywhere. . .

59 → Protect crypto. For the rest. . . No satisfying solution

Countermeasures

• Coarse-grained timers → But a thread timer is sufficient. . . • No shared memory → What about Prime+Probe? • Restrict system information, e.g., pagemap → Harder but still possible • Use cryptographic instruction extensions → Still not the default everywhere. . . → Doesn’t protect from spying user behavior. . .

59 For the rest. . . No satisfying solution

Countermeasures

• Coarse-grained timers → But a thread timer is sufficient. . . • No shared memory → What about Prime+Probe? • Restrict system information, e.g., pagemap → Harder but still possible • Use cryptographic instruction extensions → Still not the default everywhere. . . → Doesn’t protect from spying user behavior. . .

→ Protect crypto.

59 No satisfying solution

Countermeasures

• Coarse-grained timers → But a thread timer is sufficient. . . • No shared memory → What about Prime+Probe? • Restrict system information, e.g., pagemap → Harder but still possible • Use cryptographic instruction extensions → Still not the default everywhere. . . → Doesn’t protect from spying user behavior. . .

→ Protect crypto. For the rest. . .

59 Countermeasures

• Coarse-grained timers → But a thread timer is sufficient. . . • No shared memory → What about Prime+Probe? • Restrict system information, e.g., pagemap → Harder but still possible • Use cryptographic instruction extensions → Still not the default everywhere. . . → Doesn’t protect from spying user behavior. . .

→ Protect crypto. For the rest. . . No satisfying solution

59 Conclusion • Building fast covert channels • Spying on user activity with a high accuracy • Deriving cryptographic keys • ARM TrustZone leaking through the cache • Try our tools yourself!

Conclusion

• All powerful cache attacks are applicable to mobile devices • Without any permissions or privileges

60 • Spying on user activity with a high accuracy • Deriving cryptographic keys • ARM TrustZone leaking through the cache • Try our tools yourself!

Conclusion

• All powerful cache attacks are applicable to mobile devices • Without any permissions or privileges • Building fast covert channels

60 • Deriving cryptographic keys • ARM TrustZone leaking through the cache • Try our tools yourself!

Conclusion

• All powerful cache attacks are applicable to mobile devices • Without any permissions or privileges • Building fast covert channels • Spying on user activity with a high accuracy

60 • ARM TrustZone leaking through the cache • Try our tools yourself!

Conclusion

• All powerful cache attacks are applicable to mobile devices • Without any permissions or privileges • Building fast covert channels • Spying on user activity with a high accuracy • Deriving cryptographic keys

60 • Try our tools yourself!

Conclusion

• All powerful cache attacks are applicable to mobile devices • Without any permissions or privileges • Building fast covert channels • Spying on user activity with a high accuracy • Deriving cryptographic keys • ARM TrustZone leaking through the cache

60 Conclusion

• All powerful cache attacks are applicable to mobile devices • Without any permissions or privileges • Building fast covert channels • Spying on user activity with a high accuracy • Deriving cryptographic keys • ARM TrustZone leaking through the cache • Try our tools yourself!

60 ARMageddon How Your Smartphone CPU Breaks Software-Level Security And Privacy

Moritz Lipp and Cl´ementineMaurice November 3, 2016—Black Hat Europe