ARMageddon How Your Smartphone CPU Breaks Software-Level Security And Privacy
Moritz Lipp and Cl´ementineMaurice November 3, 2016—Black Hat Europe • Information leaks because of the underlying hardware • We focus on the CPU cache • Cache attacks can be used for covert communications and attack crypto implementations • Only been demonstrated on Intel x86 for now • But why not on ARM?
Motivation
• Safe software infrastructure does not mean safe execution
2 • We focus on the CPU cache • Cache attacks can be used for covert communications and attack crypto implementations • Only been demonstrated on Intel x86 for now • But why not on ARM?
Motivation
• Safe software infrastructure does not mean safe execution • Information leaks because of the underlying hardware
2 • Cache attacks can be used for covert communications and attack crypto implementations • Only been demonstrated on Intel x86 for now • But why not on ARM?
Motivation
• Safe software infrastructure does not mean safe execution • Information leaks because of the underlying hardware • We focus on the CPU cache
2 • Only been demonstrated on Intel x86 for now • But why not on ARM?
Motivation
• Safe software infrastructure does not mean safe execution • Information leaks because of the underlying hardware • We focus on the CPU cache • Cache attacks can be used for covert communications and attack crypto implementations
2 • But why not on ARM?
Motivation
• Safe software infrastructure does not mean safe execution • Information leaks because of the underlying hardware • We focus on the CPU cache • Cache attacks can be used for covert communications and attack crypto implementations • Only been demonstrated on Intel x86 for now
2 Motivation
• Safe software infrastructure does not mean safe execution • Information leaks because of the underlying hardware • We focus on the CPU cache • Cache attacks can be used for covert communications and attack crypto implementations • Only been demonstrated on Intel x86 for now • But why not on ARM?
2 Who We Are Who We Are
• Moritz Lipp • Master Student, Graz University Of Technology • 7 @mlqxyz • R [email protected]
3 Who We Are
• Cl´ementineMaurice • PhD in InfoSec; Postdoc, Graz University Of Technology • 7 @BloodyTangerine • R [email protected]
4 The rest of the team
The rest of the research team • Daniel Gruss • Raphael Spreitzer • Stefan Mangard From Graz University of Technology
5 Demo
6 • What are the challenges for cache attacks on ARM? • How to solve those challenges • Attack scenarios • Tools
Outline
• Background information
7 • How to solve those challenges • Attack scenarios • Tools
Outline
• Background information • What are the challenges for cache attacks on ARM?
7 • Attack scenarios • Tools
Outline
• Background information • What are the challenges for cache attacks on ARM? • How to solve those challenges
7 • Tools
Outline
• Background information • What are the challenges for cache attacks on ARM? • How to solve those challenges • Attack scenarios
7 Outline
• Background information • What are the challenges for cache attacks on ARM? • How to solve those challenges • Attack scenarios • Tools
7 Cache Attacks • CPU registers • Different levels of the CPU cache • Main memory • Disk storage
Memory Hierarchy
CPU Registers L1 Cache L2 Cache Memory Disk storage
• Data can reside in
8 • Different levels of the CPU cache • Main memory • Disk storage
Memory Hierarchy
CPU Registers L1 Cache L2 Cache Memory Disk storage
• Data can reside in • CPU registers
8 • Main memory • Disk storage
Memory Hierarchy
CPU Registers L1 Cache L2 Cache Memory Disk storage
• Data can reside in • CPU registers • Different levels of the CPU cache
8 • Disk storage
Memory Hierarchy
CPU Registers L1 Cache L2 Cache Memory Disk storage
• Data can reside in • CPU registers • Different levels of the CPU cache • Main memory
8 Memory Hierarchy
CPU Registers L1 Cache L2 Cache Memory Disk storage
• Data can reside in • CPU registers • Different levels of the CPU cache • Main memory • Disk storage
8 • cache → fast (cache hit) • main memory → slow (cache miss)
·104
Cache hit Cache miss 3
2
1 Number of accesses
0 200 400 600 800 1,000 1,200 Measured access time (CPU cycles)
Cache Attacks
• Exploit timing differences of memory accesses:
9 • main memory → slow (cache miss)
·104
Cache hit Cache miss 3
2
1 Number of accesses
0 200 400 600 800 1,000 1,200 Measured access time (CPU cycles)
Cache Attacks
• Exploit timing differences of memory accesses: • cache → fast (cache hit)
9 ·104
Cache hit Cache miss 3
2
1 Number of accesses
0 200 400 600 800 1,000 1,200 Measured access time (CPU cycles)
Cache Attacks
• Exploit timing differences of memory accesses: • cache → fast (cache hit) • main memory → slow (cache miss)
9 Cache Attacks
• Exploit timing differences of memory accesses: • cache → fast (cache hit) • main memory → slow (cache miss)
·104
Cache hit Cache miss 3
2
1 Number of accesses
0 200 400 600 800 1,000 1,200 Measured access time (CPU cycles) 9 Set-Associative Caches
0 16 17 25 26 31 Address Index Offset
Cache
10 Set-Associative Caches
0 16 17 25 26 31 Address Index Offset
Cache set
Cache
Data loaded in a specific set depending on its address
10 Set-Associative Caches
0 16 17 25 26 31 Address Index Offset
way 0 way 3
Cache set
Cache
Data loaded in a specific set depending on its address Several ways per set
10 Set-Associative Caches
0 16 17 25 26 31 Address Index Offset
way 0 way 3
Cache set
Cache line
Cache
Data loaded in a specific set depending on its address Several ways per set Cache line loaded in a specific way depending on the replacement policy
10 Cache Attacks: Flush+Reload
Victim address space Cache Attacker address space
Step 1: Attacker maps shared library (shared memory, in cache)
11 Cache Attacks: Flush+Reload
cached cached
Victim address space Cache Attacker address space
Step 1: Attacker maps shared library (shared memory, in cache)
11 Cache Attacks: Flush+Reload
flushes
Victim address space Cache Attacker address space
Step 1: Attacker maps shared library (shared memory, in cache) Step 2: Attacker flushes the shared cache line
11 Cache Attacks: Flush+Reload
loads data
Victim address space Cache Attacker address space
Step 1: Attacker maps shared library (shared memory, in cache) Step 2: Attacker flushes the shared cache line Step 3: Victim loads the data
11 Cache Attacks: Flush+Reload
reloads data
Victim address space Cache Attacker address space
Step 1: Attacker maps shared library (shared memory, in cache) Step 2: Attacker flushes the shared cache line Step 3: Victim loads the data Step 4: Attacker reloads the data
11 Cache Attacks: Prime+Probe
Victim address space Cache Attacker address space
12 Cache Attacks: Prime+Probe
Victim address space Cache Attacker address space
Step 1: Attacker primes, i.e., fills, the cache (no shared memory)
12 Cache Attacks: Prime+Probe
loads data
Victim address space Cache Attacker address space
Step 1: Attacker primes, i.e., fills, the cache (no shared memory) Step 2: Victim evicts cache lines while running
12 Cache Attacks: Prime+Probe
loads data
Victim address space Cache Attacker address space
Step 1: Attacker primes, i.e., fills, the cache (no shared memory) Step 2: Victim evicts cache lines while running
12 Cache Attacks: Prime+Probe
fast access
Victim address space Cache Attacker address space
Step 1: Attacker primes, i.e., fills, the cache (no shared memory) Step 2: Victim evicts cache lines while running Step 3: Attacker probes data to determine if set has been accessed
12 Cache Attacks: Prime+Probe
slow access
Victim address space Cache Attacker address space
Step 1: Attacker primes, i.e., fills, the cache (no shared memory) Step 2: Victim evicts cache lines while running Step 3: Attacker probes data to determine if set has been accessed
12 Differences between Intel x86 and ARM • Cache maintenance instructions • Intel x86: Unprivileged clflush instruction • ARMv7-A: Only privileged cache maintenance instructions • ARMv8-A: Privileged instructions can be unlocked for userspace
Challenge #1 No flush instruction
Cache maintenance
• Basic operation for cache attacks: invalidate cache lines
13 • Intel x86: Unprivileged clflush instruction • ARMv7-A: Only privileged cache maintenance instructions • ARMv8-A: Privileged instructions can be unlocked for userspace
Challenge #1 No flush instruction
Cache maintenance
• Basic operation for cache attacks: invalidate cache lines • Cache maintenance instructions
13 • ARMv7-A: Only privileged cache maintenance instructions • ARMv8-A: Privileged instructions can be unlocked for userspace
Challenge #1 No flush instruction
Cache maintenance
• Basic operation for cache attacks: invalidate cache lines • Cache maintenance instructions • Intel x86: Unprivileged clflush instruction
13 • ARMv8-A: Privileged instructions can be unlocked for userspace
Challenge #1 No flush instruction
Cache maintenance
• Basic operation for cache attacks: invalidate cache lines • Cache maintenance instructions • Intel x86: Unprivileged clflush instruction • ARMv7-A: Only privileged cache maintenance instructions
13 Challenge #1 No flush instruction
Cache maintenance
• Basic operation for cache attacks: invalidate cache lines • Cache maintenance instructions • Intel x86: Unprivileged clflush instruction • ARMv7-A: Only privileged cache maintenance instructions • ARMv8-A: Privileged instructions can be unlocked for userspace
13 Cache maintenance
• Basic operation for cache attacks: invalidate cache lines • Cache maintenance instructions • Intel x86: Unprivileged clflush instruction • ARMv7-A: Only privileged cache maintenance instructions • ARMv8-A: Privileged instructions can be unlocked for userspace
Challenge #1 No flush instruction
13 • Until the target address is evicted from the cache
→ Ideal case with LRU replacement policy
→ too slow • Fill a specific cache set
cache set
Cache eviction
• Fill the whole cache
14 • Until the target address is evicted from the cache
→ Ideal case with LRU replacement policy
• Fill a specific cache set
cache set
Cache eviction
• Fill the whole cache → too slow
14 → Ideal case with LRU replacement policy
• Until the target address is evicted from the cache
Cache eviction
• Fill the whole cache → too slow • Fill a specific cache set
cache set 2 5 8 1 7 6 3 4
14 • Until the target address is evicted from the cache
→ Ideal case with LRU replacement policy
Cache eviction
• Fill the whole cache → too slow • Fill a specific cache set load
cache set 2 5 8 19 7 6 3 4
14 • Until the target address is evicted from the cache
→ Ideal case with LRU replacement policy
Cache eviction
• Fill the whole cache → too slow • Fill a specific cache set load
cache set 102 5 8 19 7 6 3 4
14 • Until the target address is evicted from the cache
→ Ideal case with LRU replacement policy
Cache eviction
• Fill the whole cache → too slow • Fill a specific cache set load
cache set 102 5 8 19 7 6 113 4
14 • Until the target address is evicted from the cache
→ Ideal case with LRU replacement policy
Cache eviction
• Fill the whole cache → too slow • Fill a specific cache set load
cache set 102 5 8 19 7 6 113 124
14 • Until the target address is evicted from the cache
→ Ideal case with LRU replacement policy
Cache eviction
• Fill the whole cache → too slow • Fill a specific cache set load
cache set 102 135 8 19 7 6 113 124
14 • Until the target address is evicted from the cache
→ Ideal case with LRU replacement policy
Cache eviction
• Fill the whole cache → too slow • Fill a specific cache set load
cache set 102 135 8 19 7 146 113 124
14 • Until the target address is evicted from the cache
→ Ideal case with LRU replacement policy
Cache eviction
• Fill the whole cache → too slow • Fill a specific cache set load
cache set 102 135 8 19 157 146 113 124
14 → Ideal case with LRU replacement policy
Cache eviction
• Fill the whole cache → too slow • Fill a specific cache set • Until the target address is evicted from the cache load
cache set 102 135 168 19 157 146 113 124
14 Cache eviction
• Fill the whole cache → too slow • Fill a specific cache set • Until the target address is evicted from the cache
cache set 102 135 168 19 157 146 113 124
→ Ideal case with LRU replacement policy
14 → Simple approach highly inefficient
Challenge #2 Pseudo-random replacement policy complicates eviction
Cache eviction (what actually happens)
• Pseudo-random cache replacement policy
cache set 2 5 8 1 7 6 3 4
15 → Simple approach highly inefficient
Challenge #2 Pseudo-random replacement policy complicates eviction
Cache eviction (what actually happens)
• Pseudo-random cache replacement policy load
cache set 2 5 8 19 7 6 3 4
15 → Simple approach highly inefficient
Challenge #2 Pseudo-random replacement policy complicates eviction
Cache eviction (what actually happens)
• Pseudo-random cache replacement policy load
cache set 102 5 8 19 7 6 3 4
15 → Simple approach highly inefficient
Challenge #2 Pseudo-random replacement policy complicates eviction
Cache eviction (what actually happens)
• Pseudo-random cache replacement policy load
cache set 102 5 8 1119 7 6 3 4
15 → Simple approach highly inefficient
Challenge #2 Pseudo-random replacement policy complicates eviction
Cache eviction (what actually happens)
• Pseudo-random cache replacement policy load
cache set 10122 5 8 1119 7 6 3 4
15 → Simple approach highly inefficient
Challenge #2 Pseudo-random replacement policy complicates eviction
Cache eviction (what actually happens)
• Pseudo-random cache replacement policy load
cache set 10122 5 8 1119 7 6 133 4
15 → Simple approach highly inefficient
Challenge #2 Pseudo-random replacement policy complicates eviction
Cache eviction (what actually happens)
• Pseudo-random cache replacement policy load
cache set 10122 5 8 1119 7 6 13143 4
15 → Simple approach highly inefficient
Challenge #2 Pseudo-random replacement policy complicates eviction
Cache eviction (what actually happens)
• Pseudo-random cache replacement policy load
cache set 10122 5 8 1119 7 6 13143 154
15 → Simple approach highly inefficient
Challenge #2 Pseudo-random replacement policy complicates eviction
Cache eviction (what actually happens)
• Pseudo-random cache replacement policy load
cache set 10122 5 8 1119 7 6 13143 15164
15 Challenge #2 Pseudo-random replacement policy complicates eviction
Cache eviction (what actually happens)
• Pseudo-random cache replacement policy
cache set 10122 5 8 1119 7 6 13143 15164
→ Simple approach highly inefficient
15 Cache eviction (what actually happens)
• Pseudo-random cache replacement policy
cache set 10122 5 8 1119 7 6 13143 15164
→ Simple approach highly inefficient
Challenge #2 Pseudo-random replacement policy complicates eviction
15 • Intel x86: Unprivileged rdtsc instruction for cycle count • ARM: Cycle counter only in privileged mode → Previous attacks required root access
Challenge #3 No unprivileged and accurate timing sources
Timing measurements
• Need fine-grained timing measurements
16 • ARM: Cycle counter only in privileged mode → Previous attacks required root access
Challenge #3 No unprivileged and accurate timing sources
Timing measurements
• Need fine-grained timing measurements • Intel x86: Unprivileged rdtsc instruction for cycle count
16 → Previous attacks required root access
Challenge #3 No unprivileged and accurate timing sources
Timing measurements
• Need fine-grained timing measurements • Intel x86: Unprivileged rdtsc instruction for cycle count • ARM: Cycle counter only in privileged mode
16 Challenge #3 No unprivileged and accurate timing sources
Timing measurements
• Need fine-grained timing measurements • Intel x86: Unprivileged rdtsc instruction for cycle count • ARM: Cycle counter only in privileged mode → Previous attacks required root access
16 Timing measurements
• Need fine-grained timing measurements • Intel x86: Unprivileged rdtsc instruction for cycle count • ARM: Cycle counter only in privileged mode → Previous attacks required root access
Challenge #3 No unprivileged and accurate timing sources
16 • shared • inclusive → Shared memory is shared in the cache across all cores
Cache Hierarchy on Intel CPUs
Core 0 Core 1 Core 2 Core 3
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache L2 Cache L2 Cache L2 Cache
L3 Cache
• Last-level cache: L3
17 • inclusive → Shared memory is shared in the cache across all cores
Cache Hierarchy on Intel CPUs
Core 0 Core 1 Core 2 Core 3
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache L2 Cache L2 Cache L2 Cache
L3 Cache
• Last-level cache: L3 • shared
17 → Shared memory is shared in the cache across all cores
Cache Hierarchy on Intel CPUs
Core 0 Core 1 Core 2 Core 3
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache L2 Cache L2 Cache L2 Cache
L3 Cache
• Last-level cache: L3 • shared • inclusive
17 Cache Hierarchy on Intel CPUs
Core 0 Core 1 Core 2 Core 3
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache L2 Cache L2 Cache L2 Cache
L3 Cache
• Last-level cache: L3 • shared • inclusive → Shared memory is shared in the cache across all cores
17 • shared • not inclusive → Shared memory that is not in L2 is not shared in the cache.
Challenge #4 Non-inclusive caches
Cache Hierarchy on ARM Cortex-A CPUs
Core 0 Core 1 Core 2 Core 3
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Last-level cache: L2
18 • not inclusive → Shared memory that is not in L2 is not shared in the cache.
Challenge #4 Non-inclusive caches
Cache Hierarchy on ARM Cortex-A CPUs
Core 0 Core 1 Core 2 Core 3
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Last-level cache: L2 • shared
18 → Shared memory that is not in L2 is not shared in the cache.
Challenge #4 Non-inclusive caches
Cache Hierarchy on ARM Cortex-A CPUs
Core 0 Core 1 Core 2 Core 3
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Last-level cache: L2 • shared • not inclusive
18 Challenge #4 Non-inclusive caches
Cache Hierarchy on ARM Cortex-A CPUs
Core 0 Core 1 Core 2 Core 3
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Last-level cache: L2 • shared • not inclusive → Shared memory that is not in L2 is not shared in the cache.
18 Cache Hierarchy on ARM Cortex-A CPUs
Core 0 Core 1 Core 2 Core 3
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Last-level cache: L2 • shared • not inclusive → Shared memory that is not in L2 is not shared in the cache.
Challenge #4 Non-inclusive caches
18 • CPUs do not share a cache
Challenge #5 No shared cache
Cache Hierarchy on ARM big.LITTLE
CPU1 CPU2
Core 0 Core 1 Core 2 Core 3 Core 0 Core 1 Core 2 Core 3 L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L2 Cache L2 Cache
CoreLink CCI-400
• Interconnects multiple CPUs to combine energy efficiency and performance
19 Challenge #5 No shared cache
Cache Hierarchy on ARM big.LITTLE
CPU1 CPU2
Core 0 Core 1 Core 2 Core 3 Core 0 Core 1 Core 2 Core 3 L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L2 Cache L2 Cache
CoreLink CCI-400
• Interconnects multiple CPUs to combine energy efficiency and performance • CPUs do not share a cache
19 Cache Hierarchy on ARM big.LITTLE
CPU1 CPU2
Core 0 Core 1 Core 2 Core 3 Core 0 Core 1 Core 2 Core 3 L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L1I L1D L2 Cache L2 Cache
CoreLink CCI-400
• Interconnects multiple CPUs to combine energy efficiency and performance • CPUs do not share a cache
Challenge #5 No shared cache
19 Let’s solve those challenges Challenges
Challenge #1 No flush instruction Challenge #2 Pseudo-random replacement policy Challenge #3 No unprivileged timing Challenge #4 Non-inclusive caches Challenge #5 No shared cache
20 • Works on Intel x86 • Prime+Probe • Flush+Reload → Evict+Reload
Solving #1: No flush instruction
• Replace the missing flush instruction with cache eviction
21 • Prime+Probe • Flush+Reload → Evict+Reload
Solving #1: No flush instruction
• Replace the missing flush instruction with cache eviction • Works on Intel x86
21 Solving #1: No flush instruction
• Replace the missing flush instruction with cache eviction • Works on Intel x86 • Prime+Probe • Flush+Reload → Evict+Reload
21 Challenges
Challenge #1 No flush instruction Ë Challenge #2 Pseudo-random replacement policy Challenge #3 No unprivileged timing Challenge #4 Non-inclusive caches Challenge #5 No shared cache
22 Challenges
23 • Eviction can be slow and unreliable. . . • Unless you know how to properly evict data → Central idea of our Rowhammer.js paper
Challenges
• No.
24 • Unless you know how to properly evict data → Central idea of our Rowhammer.js paper
Challenges
• No. • Eviction can be slow and unreliable. . .
24 → Central idea of our Rowhammer.js paper
Challenges
• No. • Eviction can be slow and unreliable. . . • Unless you know how to properly evict data
24 Challenges
• No. • Eviction can be slow and unreliable. . . • Unless you know how to properly evict data → Central idea of our Rowhammer.js paper
24 • Accessing once n addresses in an n-way cache set → Cache eviction slow and unreliable
Solution: • Accessing unique addresses several times, with different access patterns
Solving #2: Pseudo-random replacement policy
• Cache line to be discarded is chosen pseudo-randomly
25 → Cache eviction slow and unreliable
Solution: • Accessing unique addresses several times, with different access patterns
Solving #2: Pseudo-random replacement policy
• Cache line to be discarded is chosen pseudo-randomly • Accessing once n addresses in an n-way cache set
25 Solution: • Accessing unique addresses several times, with different access patterns
Solving #2: Pseudo-random replacement policy
• Cache line to be discarded is chosen pseudo-randomly • Accessing once n addresses in an n-way cache set → Cache eviction slow and unreliable
25 Solving #2: Pseudo-random replacement policy
• Cache line to be discarded is chosen pseudo-randomly • Accessing once n addresses in an n-way cache set → Cache eviction slow and unreliable
Solution: • Accessing unique addresses several times, with different access patterns
25 200 200 33 110 96.04% 800 800 142 876 99.10% 21 96 4 275 99.93% 22 102 5 101 99.99% 23 190 6 209 100.0%
Solving #2: Pseudo-random replacement policy
Table 1: Different eviction strategies for the Alcatel One Touch Pop 2
Addresses Accesses Cycles Eviction rate 48 48 6 517 70.78%
26 800 800 142 876 99.10% 21 96 4 275 99.93% 22 102 5 101 99.99% 23 190 6 209 100.0%
Solving #2: Pseudo-random replacement policy
Table 1: Different eviction strategies for the Alcatel One Touch Pop 2
Addresses Accesses Cycles Eviction rate 48 48 6 517 70.78% 200 200 33 110 96.04%
26 21 96 4 275 99.93% 22 102 5 101 99.99% 23 190 6 209 100.0%
Solving #2: Pseudo-random replacement policy
Table 1: Different eviction strategies for the Alcatel One Touch Pop 2
Addresses Accesses Cycles Eviction rate 48 48 6 517 70.78% 200 200 33 110 96.04% 800 800 142 876 99.10%
26 22 102 5 101 99.99% 23 190 6 209 100.0%
Solving #2: Pseudo-random replacement policy
Table 1: Different eviction strategies for the Alcatel One Touch Pop 2
Addresses Accesses Cycles Eviction rate 48 48 6 517 70.78% 200 200 33 110 96.04% 800 800 142 876 99.10% 21 96 4 275 99.93%
26 23 190 6 209 100.0%
Solving #2: Pseudo-random replacement policy
Table 1: Different eviction strategies for the Alcatel One Touch Pop 2
Addresses Accesses Cycles Eviction rate 48 48 6 517 70.78% 200 200 33 110 96.04% 800 800 142 876 99.10% 21 96 4 275 99.93% 22 102 5 101 99.99%
26 Solving #2: Pseudo-random replacement policy
Table 1: Different eviction strategies for the Alcatel One Touch Pop 2
Addresses Accesses Cycles Eviction rate 48 48 6 517 70.78% 200 200 33 110 96.04% 800 800 142 876 99.10% 21 96 4 275 99.93% 22 102 5 101 99.99% 23 190 6 209 100.0%
26 • Compile target executable with generated eviction strategy • Execute on target device • Evaluate log files and build result database • Find fast and efficient eviction strategies for any device
Solving #2: Pseudo-random replacement policy
• We fully automated this process
27 • Execute on target device • Evaluate log files and build result database • Find fast and efficient eviction strategies for any device
Solving #2: Pseudo-random replacement policy
• We fully automated this process • Compile target executable with generated eviction strategy
27 • Evaluate log files and build result database • Find fast and efficient eviction strategies for any device
Solving #2: Pseudo-random replacement policy
• We fully automated this process • Compile target executable with generated eviction strategy • Execute on target device
27 • Find fast and efficient eviction strategies for any device
Solving #2: Pseudo-random replacement policy
• We fully automated this process • Compile target executable with generated eviction strategy • Execute on target device • Evaluate log files and build result database
27 Solving #2: Pseudo-random replacement policy
• We fully automated this process • Compile target executable with generated eviction strategy • Execute on target device • Evaluate log files and build result database • Find fast and efficient eviction strategies for any device
27 Prime+Probe
Victim access No victim access ·103 6 4 2
Number of accesses 1,600 1,800 2,000 2,200 2,400 2,600 2,800 3,000 3,200 3,400 Measured access time in cycles
Solving #2: Pseudo-random replacement policy
Evict+Reload
Victim access No victim access ·104
4
2
0
Number of accesses 0 50 100 150 200 250 300 Measured access time in cycles
28 Solving #2: Pseudo-random replacement policy
Evict+Reload
Victim access No victim access ·104
4
2
0
Number of accesses 0 50 100 150 200 250 300 Measured access time in cycles
Prime+Probe
Victim access No victim access ·103 6 4 2
Number of accesses 1,600 1,800 2,000 2,200 2,400 2,600 2,800 3,000 3,200 3,400 Measured access time in cycles 28 Challenges
Challenge #1 No flush instruction Ë Challenge #2 Pseudo-random replacement policy Ë Challenge #3 No unprivileged timing Challenge #4 Non-inclusive caches Challenge #5 No shared cache
29 2. perf event open: Unprivileged syscall • Unavailable on new devices: Privileged or kernel compiled with CONFIG PERF=n 3. clock gettime: Unprivileged POSIX function 4. Thread counter: Unprivileged, multithreaded
• Unprivileged timing sources • Nanosecond resolution for all sources → Allows distinguishing cache hits from cache misses
Solving #3: No unprivileged timing
1. Performance counter: Privileged
30 • Unavailable on new devices: Privileged or kernel compiled with CONFIG PERF=n 3. clock gettime: Unprivileged POSIX function 4. Thread counter: Unprivileged, multithreaded
• Unprivileged timing sources • Nanosecond resolution for all sources → Allows distinguishing cache hits from cache misses
Solving #3: No unprivileged timing
1. Performance counter: Privileged 2. perf event open: Unprivileged syscall
30 3. clock gettime: Unprivileged POSIX function 4. Thread counter: Unprivileged, multithreaded
• Unprivileged timing sources • Nanosecond resolution for all sources → Allows distinguishing cache hits from cache misses
Solving #3: No unprivileged timing
1. Performance counter: Privileged 2. perf event open: Unprivileged syscall • Unavailable on new devices: Privileged or kernel compiled with CONFIG PERF=n
30 4. Thread counter: Unprivileged, multithreaded
• Unprivileged timing sources • Nanosecond resolution for all sources → Allows distinguishing cache hits from cache misses
Solving #3: No unprivileged timing
1. Performance counter: Privileged 2. perf event open: Unprivileged syscall • Unavailable on new devices: Privileged or kernel compiled with CONFIG PERF=n 3. clock gettime: Unprivileged POSIX function
30 • Unprivileged timing sources • Nanosecond resolution for all sources → Allows distinguishing cache hits from cache misses
Solving #3: No unprivileged timing
1. Performance counter: Privileged 2. perf event open: Unprivileged syscall • Unavailable on new devices: Privileged or kernel compiled with CONFIG PERF=n 3. clock gettime: Unprivileged POSIX function 4. Thread counter: Unprivileged, multithreaded
30 • Nanosecond resolution for all sources → Allows distinguishing cache hits from cache misses
Solving #3: No unprivileged timing
1. Performance counter: Privileged 2. perf event open: Unprivileged syscall • Unavailable on new devices: Privileged or kernel compiled with CONFIG PERF=n 3. clock gettime: Unprivileged POSIX function 4. Thread counter: Unprivileged, multithreaded
• Unprivileged timing sources
30 → Allows distinguishing cache hits from cache misses
Solving #3: No unprivileged timing
1. Performance counter: Privileged 2. perf event open: Unprivileged syscall • Unavailable on new devices: Privileged or kernel compiled with CONFIG PERF=n 3. clock gettime: Unprivileged POSIX function 4. Thread counter: Unprivileged, multithreaded
• Unprivileged timing sources • Nanosecond resolution for all sources
30 Solving #3: No unprivileged timing
1. Performance counter: Privileged 2. perf event open: Unprivileged syscall • Unavailable on new devices: Privileged or kernel compiled with CONFIG PERF=n 3. clock gettime: Unprivileged POSIX function 4. Thread counter: Unprivileged, multithreaded
• Unprivileged timing sources • Nanosecond resolution for all sources → Allows distinguishing cache hits from cache misses
30 Solving #3: No unprivileged timing
Hit (PMCCNTR) Hit (clock gettime×.15) Miss (PMCCNTR) Miss (clock gettime×.15) Hit (syscall×.25) Hit (counter thread×.05) Miss (syscall×.25) Miss (counter thread×.05)
·104
4
2 Number of accesses
0 0 20 40 60 80 100 120 140 160 180 200 Measured access time (scaled)
31 Challenges
Challenge #1 No flush instruction Ë Challenge #2 Pseudo-random replacement policy Ë Challenge #3 No unprivileged timing Ë Challenge #4 Non-inclusive caches Challenge #5 No shared cache
32 • Fill-up L1 D-Cache and begin to populate shared L2 Cache • Inclusion: Line evicted from L2 → also evicted from L1
Solving #4: Non-inclusive caches
Core 0 Core 1
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Instruction-inclusive, data-non-inclusive caches
33 • Fill-up L1 D-Cache and begin to populate shared L2 Cache • Inclusion: Line evicted from L2 → also evicted from L1
Solving #4: Non-inclusive caches
Core 0 Core 1
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Instruction-inclusive, data-non-inclusive caches
33 • Inclusion: Line evicted from L2 → also evicted from L1
Solving #4: Non-inclusive caches
Core 0 Core 1
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache
33 • Inclusion: Line evicted from L2 → also evicted from L1
Solving #4: Non-inclusive caches
Core 0 Core 1
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache
33 • Inclusion: Line evicted from L2 → also evicted from L1
Solving #4: Non-inclusive caches
Core 0 Core 1
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache
33 • Inclusion: Line evicted from L2 → also evicted from L1
Solving #4: Non-inclusive caches
Core 0 Core 1
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache
33 • Inclusion: Line evicted from L2 → also evicted from L1
Solving #4: Non-inclusive caches
Core 0 Core 1
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache
33 • Inclusion: Line evicted from L2 → also evicted from L1
Solving #4: Non-inclusive caches
Core 0 Core 1
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache
33 • Inclusion: Line evicted from L2 → also evicted from L1
Solving #4: Non-inclusive caches
Core 0 Core 1
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache
33 • Inclusion: Line evicted from L2 → also evicted from L1
Solving #4: Non-inclusive caches
Core 0 Core 1
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache
33 • Inclusion: Line evicted from L2 → also evicted from L1
Solving #4: Non-inclusive caches
Core 0 Core 1
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache
33 • Inclusion: Line evicted from L2 → also evicted from L1
Solving #4: Non-inclusive caches
Core 0 Core 1
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache
33 • Inclusion: Line evicted from L2 → also evicted from L1
Solving #4: Non-inclusive caches
Core 0 Core 1
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache
33 • Inclusion: Line evicted from L2 → also evicted from L1
Solving #4: Non-inclusive caches
Core 0 Core 1
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache
33 • Inclusion: Line evicted from L2 → also evicted from L1
Solving #4: Non-inclusive caches
Core 0 Core 1
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache
33 • Inclusion: Line evicted from L2 → also evicted from L1
Solving #4: Non-inclusive caches
Core 0 Core 1
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache
33 • Inclusion: Line evicted from L2 → also evicted from L1
Solving #4: Non-inclusive caches
Core 0 Core 1
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache
33 • Inclusion: Line evicted from L2 → also evicted from L1
Solving #4: Non-inclusive caches
Core 0 Core 1
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache
33 • Inclusion: Line evicted from L2 → also evicted from L1
Solving #4: Non-inclusive caches
Core 0 Core 1
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache
33 • Inclusion: Line evicted from L2 → also evicted from L1
Solving #4: Non-inclusive caches
Core 0 Core 1
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache
33 Solving #4: Non-inclusive caches
Core 0 Core 1
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache
• Inclusion: Line evicted from L2 → also evicted from L1 33 Solving #4: Non-inclusive caches
Core 0 Core 1
L1 I-Cache L1 D-Cache L1 I-Cache L1 D-Cache
L2 Cache
• Instruction-inclusive, data-non-inclusive caches • Fill-up L1 D-Cache and begin to populate shared L2 Cache
• Inclusion: Line evicted from L2 → also evicted from L1 33 • Cache coherency protocol • Fetches data from remote cores • Remote cache hit is faster than DRAM access → Detect if another core has accessed the memory location
Hit (same core) Hit (cross-core) ·104 Miss (same core) Miss (cross-core) 3 2 1 0
Number of accesses 0 100 200 300 400 500 600 700 800 900 1,000 Measured access time in cycles
Solving #4: Non-inclusive caches
• How can we determine if the victim has loaded the address?
34 • Fetches data from remote cores • Remote cache hit is faster than DRAM access → Detect if another core has accessed the memory location
Hit (same core) Hit (cross-core) ·104 Miss (same core) Miss (cross-core) 3 2 1 0
Number of accesses 0 100 200 300 400 500 600 700 800 900 1,000 Measured access time in cycles
Solving #4: Non-inclusive caches
• How can we determine if the victim has loaded the address? • Cache coherency protocol
34 • Remote cache hit is faster than DRAM access → Detect if another core has accessed the memory location
Hit (same core) Hit (cross-core) ·104 Miss (same core) Miss (cross-core) 3 2 1 0
Number of accesses 0 100 200 300 400 500 600 700 800 900 1,000 Measured access time in cycles
Solving #4: Non-inclusive caches
• How can we determine if the victim has loaded the address? • Cache coherency protocol • Fetches data from remote cores
34 → Detect if another core has accessed the memory location
Hit (same core) Hit (cross-core) ·104 Miss (same core) Miss (cross-core) 3 2 1 0
Number of accesses 0 100 200 300 400 500 600 700 800 900 1,000 Measured access time in cycles
Solving #4: Non-inclusive caches
• How can we determine if the victim has loaded the address? • Cache coherency protocol • Fetches data from remote cores • Remote cache hit is faster than DRAM access
34 Solving #4: Non-inclusive caches
• How can we determine if the victim has loaded the address? • Cache coherency protocol • Fetches data from remote cores • Remote cache hit is faster than DRAM access → Detect if another core has accessed the memory location
Hit (same core) Hit (cross-core) ·104 Miss (same core) Miss (cross-core) 3 2 1 0
Number of accesses 0 100 200 300 400 500 600 700 800 900 1,000 Measured access time in cycles
34 Challenges
Challenge #1 No flush instruction Ë Challenge #2 Pseudo-random replacement policy Ë Challenge #3 No unprivileged timing Ë Challenge #4 Non-inclusive caches Ë Challenge #5 No shared cache
35 • Cache coherency protocol (again) • Fetches data from remote CPU • Remote cache hit is faster than DRAM access
Hit (same core) Hit (cross-cpu) ·104 Miss (same core) Miss (cross-cpu) 3 2 1 0
Number of accesses 0 50 100 150 200 250 300 350 400 450 500 Measured access time in cycles
Solving #5: No shared cache
• Multiple CPUs that do not share a cache
36 • Fetches data from remote CPU • Remote cache hit is faster than DRAM access
Hit (same core) Hit (cross-cpu) ·104 Miss (same core) Miss (cross-cpu) 3 2 1 0
Number of accesses 0 50 100 150 200 250 300 350 400 450 500 Measured access time in cycles
Solving #5: No shared cache
• Multiple CPUs that do not share a cache • Cache coherency protocol (again)
36 • Remote cache hit is faster than DRAM access
Hit (same core) Hit (cross-cpu) ·104 Miss (same core) Miss (cross-cpu) 3 2 1 0
Number of accesses 0 50 100 150 200 250 300 350 400 450 500 Measured access time in cycles
Solving #5: No shared cache
• Multiple CPUs that do not share a cache • Cache coherency protocol (again) • Fetches data from remote CPU
36 Solving #5: No shared cache
• Multiple CPUs that do not share a cache • Cache coherency protocol (again) • Fetches data from remote CPU • Remote cache hit is faster than DRAM access
Hit (same core) Hit (cross-cpu) ·104 Miss (same core) Miss (cross-cpu) 3 2 1 0
Number of accesses 0 50 100 150 200 250 300 350 400 450 500 Measured access time in cycles
36 Challenges
Challenge #1 No flush instruction Ë Challenge #2 Pseudo-random replacement policy Ë Challenge #3 No unprivileged timing Ë Challenge #4 Non-inclusive caches Ë Challenge #5 No shared cache Ë
37 Attack scenarios Case study #1 Covert communication
38 • No permissions except accessing the Internet
• No permissions except accessing your images • Malicious weather widget
Case Study #1: Covert Channel
• Malicious privacy gallery app
39 • No permissions except accessing the Internet
• Malicious weather widget
Case Study #1: Covert Channel
• Malicious privacy gallery app • No permissions except accessing your images
39 • No permissions except accessing the Internet
Case Study #1: Covert Channel
• Malicious privacy gallery app • No permissions except accessing your images • Malicious weather widget
39 Case Study #1: Covert Channel
• Malicious privacy gallery app • No permissions except accessing your images • Malicious weather widget • No permissions except accessing the Internet
39 Case Study #1: Covert Channel
• Malicious privacy gallery app • No permissions except accessing your images • Malicious weather widget • No permissions except accessing the Internet
covert channel
39 Case Study #1: Covert Channel
• Malicious privacy gallery app • No permissions except accessing your images • Malicious weather widget • No permissions except accessing the Internet
covert channel
39 Case Study #1: Covert Channel
• Malicious privacy gallery app • No permissions except accessing your images • Malicious weather widget • No permissions except accessing the Internet
covert channel
39 • No permissions • No Intents, Binders, ASHMEM, . . . • A covert channel • Enables two unprivileged apps to communicate • Does not use data transfer mechanisms provided by the OS • Evades the sandboxing concept and permission system → Collusion attack
Case Study #1: Covert Channel
• Two apps want to communicate with each other, but are not allowed or not able to do so
40 • No Intents, Binders, ASHMEM, . . . • A covert channel • Enables two unprivileged apps to communicate • Does not use data transfer mechanisms provided by the OS • Evades the sandboxing concept and permission system → Collusion attack
Case Study #1: Covert Channel
• Two apps want to communicate with each other, but are not allowed or not able to do so • No permissions
40 • A covert channel • Enables two unprivileged apps to communicate • Does not use data transfer mechanisms provided by the OS • Evades the sandboxing concept and permission system → Collusion attack
Case Study #1: Covert Channel
• Two apps want to communicate with each other, but are not allowed or not able to do so • No permissions • No Intents, Binders, ASHMEM, . . .
40 • Enables two unprivileged apps to communicate • Does not use data transfer mechanisms provided by the OS • Evades the sandboxing concept and permission system → Collusion attack
Case Study #1: Covert Channel
• Two apps want to communicate with each other, but are not allowed or not able to do so • No permissions • No Intents, Binders, ASHMEM, . . . • A covert channel
40 • Does not use data transfer mechanisms provided by the OS • Evades the sandboxing concept and permission system → Collusion attack
Case Study #1: Covert Channel
• Two apps want to communicate with each other, but are not allowed or not able to do so • No permissions • No Intents, Binders, ASHMEM, . . . • A covert channel • Enables two unprivileged apps to communicate
40 • Evades the sandboxing concept and permission system → Collusion attack
Case Study #1: Covert Channel
• Two apps want to communicate with each other, but are not allowed or not able to do so • No permissions • No Intents, Binders, ASHMEM, . . . • A covert channel • Enables two unprivileged apps to communicate • Does not use data transfer mechanisms provided by the OS
40 → Collusion attack
Case Study #1: Covert Channel
• Two apps want to communicate with each other, but are not allowed or not able to do so • No permissions • No Intents, Binders, ASHMEM, . . . • A covert channel • Enables two unprivileged apps to communicate • Does not use data transfer mechanisms provided by the OS • Evades the sandboxing concept and permission system
40 Case Study #1: Covert Channel
• Two apps want to communicate with each other, but are not allowed or not able to do so • No permissions • No Intents, Binders, ASHMEM, . . . • A covert channel • Enables two unprivileged apps to communicate • Does not use data transfer mechanisms provided by the OS • Evades the sandboxing concept and permission system → Collusion attack
40 • Using addresses from a shared library/executable • Bits transmitted with cache hits and misses • Transmit 0: Do not access the address → cache miss • Transmit 1: Access the address → cache hit
Case Study #1: Covert Channel
• Build a covert channel using the cache
41 • Bits transmitted with cache hits and misses • Transmit 0: Do not access the address → cache miss • Transmit 1: Access the address → cache hit
Case Study #1: Covert Channel
• Build a covert channel using the cache • Using addresses from a shared library/executable
41 • Transmit 0: Do not access the address → cache miss • Transmit 1: Access the address → cache hit
Case Study #1: Covert Channel
• Build a covert channel using the cache • Using addresses from a shared library/executable • Bits transmitted with cache hits and misses
41 • Transmit 1: Access the address → cache hit
Case Study #1: Covert Channel
• Build a covert channel using the cache • Using addresses from a shared library/executable • Bits transmitted with cache hits and misses • Transmit 0: Do not access the address → cache miss
41 Case Study #1: Covert Channel
• Build a covert channel using the cache • Using addresses from a shared library/executable • Bits transmitted with cache hits and misses • Transmit 0: Do not access the address → cache miss • Transmit 1: Access the address → cache hit
41 • Use a sequence number (SQN) • Protect payload and sequence number with a checksum
078232431 Sender SQN Payload CRC
02310 Receiver SQN CRC
Case Study #1: Covert Channel
• Build a protocol based on packets
42 • Protect payload and sequence number with a checksum
078232431 Sender SQN Payload CRC
02310 Receiver SQN CRC
Case Study #1: Covert Channel
• Build a protocol based on packets • Use a sequence number (SQN)
42 Case Study #1: Covert Channel
• Build a protocol based on packets • Use a sequence number (SQN) • Protect payload and sequence number with a checksum
078232431 Sender SQN Payload CRC
02310 Receiver SQN CRC
42 • Works cross-core and cross-CPU • Faster than state of the art by several orders of magnitude
Work Type Bandwidth [bps] Error rate
Case Study #1: Covert Channel
• Works using Flush+Reload, Evict+Reload and Flush+Flush
43 • Faster than state of the art by several orders of magnitude
Work Type Bandwidth [bps] Error rate
Case Study #1: Covert Channel
• Works using Flush+Reload, Evict+Reload and Flush+Flush • Works cross-core and cross-CPU
43 Work Type Bandwidth [bps] Error rate
Case Study #1: Covert Channel
• Works using Flush+Reload, Evict+Reload and Flush+Flush • Works cross-core and cross-CPU • Faster than state of the art by several orders of magnitude
43 Case Study #1: Covert Channel
• Works using Flush+Reload, Evict+Reload and Flush+Flush • Works cross-core and cross-CPU • Faster than state of the art by several orders of magnitude
Work Type Bandwidth [bps] Error rate Schlegel et al. Vibration settings 87 – Schlegel et al. Volume settings 150 – Schlegel et al. File locks 685 – Marforio et al. UNIX socket discovery 2 600 – Marforio et al. Type of Intents 4 300 –
43 Case Study #1: Covert Channel
• Works using Flush+Reload, Evict+Reload and Flush+Flush • Works cross-core and cross-CPU • Faster than state of the art by several orders of magnitude
Work Type Bandwidth [bps] Error rate Schlegel et al. Vibration settings 87 – Schlegel et al. Volume settings 150 – Schlegel et al. File locks 685 – Marforio et al. UNIX socket discovery 2 600 – Marforio et al. Type of Intents 4 300 – Ours (OnePlus One) Evict+Reload, cross-core 12 537 5.00% Ours (Alcatel One Touch Pop 2) Evict+Reload, cross-core 13 618 3.79% Ours (Samsung Galaxy S6) Flush+Flush, cross-core 178 292 0.48% Ours (Samsung Galaxy S6) Flush+Reload, cross-CPU 257 509 1.83% Ours (Samsung Galaxy S6) Flush+Reload, cross-core 1 140 650 1.10%
43 Case study #2 Spying on the user
44 → Cache Template Attacks
1. Shared library or executable is mapped 2. Trigger an event in parallel and Flush+Reload one address → Cache hit: Address used by the library/executable 3. Repeat step 2 for every address
Case Study #2: Spying on the User
• Issue: Locating event-dependent memory access
45 1. Shared library or executable is mapped 2. Trigger an event in parallel and Flush+Reload one address → Cache hit: Address used by the library/executable 3. Repeat step 2 for every address
Case Study #2: Spying on the User
• Issue: Locating event-dependent memory access → Cache Template Attacks
45 2. Trigger an event in parallel and Flush+Reload one address → Cache hit: Address used by the library/executable 3. Repeat step 2 for every address
Case Study #2: Spying on the User
• Issue: Locating event-dependent memory access → Cache Template Attacks
1. Shared library or executable is mapped
45 → Cache hit: Address used by the library/executable 3. Repeat step 2 for every address
Case Study #2: Spying on the User
• Issue: Locating event-dependent memory access → Cache Template Attacks
1. Shared library or executable is mapped 2. Trigger an event in parallel and Flush+Reload one address
45 3. Repeat step 2 for every address
Case Study #2: Spying on the User
• Issue: Locating event-dependent memory access → Cache Template Attacks
1. Shared library or executable is mapped 2. Trigger an event in parallel and Flush+Reload one address → Cache hit: Address used by the library/executable
45 Case Study #2: Spying on the User
• Issue: Locating event-dependent memory access → Cache Template Attacks
1. Shared library or executable is mapped 2. Trigger an event in parallel and Flush+Reload one address → Cache hit: Address used by the library/executable 3. Repeat step 2 for every address
45 Case Study #2: Spying on the User
• Cache template matrix = How many cache hits for each pair (event, address)? • On shared library and ART binaries, e.g., AOSP keyboard
alphabet enter
Input space backspace 0x41500 0x45140 0x56940 0x57280 0x58480 0x60280 0x60340 0x60580 0x66340 0x66380 0x72300 0x72340 0x72380 0x78440 0x98600
A ddresses
46 Case Study #2: Spying on the User
• Cache template matrix = How many cache hits for each pair (event, address)? • On shared library and ART binaries, e.g., AOSP keyboard
alphabet enter
Input space backspace 0x41500 0x45140 0x56940 0x57280 0x58480 0x60280 0x60340 0x60580 0x66340 0x66380 0x72300 0x72340 0x72380 0x78440 0x98600
A ddresses Many cache hits
46 Case Study #2: Spying on the User
• Cache template matrix = How many cache hits for each pair (event, address)? • On shared library and ART binaries, e.g., AOSP keyboard
No cache hits
alphabet enter
Input space backspace 0x41500 0x45140 0x56940 0x57280 0x58480 0x60280 0x60340 0x60580 0x66340 0x66380 0x72300 0x72340 0x72380 0x78440 0x98600
A ddresses Many cache hits
46 Case Study #2: Spying on the User
Evict+Reload on two addresses on the Alcatel One Touch Pop 2 in custpack@app@[email protected]@classes.dex → Differentiate keys from spaces
Key Space 300
200 Access time
100 t h i s Space i s Space a Space m e s s a g e
0 1 2 3 4 5 6 7 Time in seconds
47 • Scan all the libraries • Find all secret-dependent accesses and automate attacks • No need for source code • Spy and learn about user’s behavior
Case Study #2: Spying on the User
• Endless possibilities
48 • Find all secret-dependent accesses and automate attacks • No need for source code • Spy and learn about user’s behavior
Case Study #2: Spying on the User
• Endless possibilities • Scan all the libraries
48 • No need for source code • Spy and learn about user’s behavior
Case Study #2: Spying on the User
• Endless possibilities • Scan all the libraries • Find all secret-dependent accesses and automate attacks
48 • Spy and learn about user’s behavior
Case Study #2: Spying on the User
• Endless possibilities • Scan all the libraries • Find all secret-dependent accesses and automate attacks • No need for source code
48 Case Study #2: Spying on the User
• Endless possibilities • Scan all the libraries • Find all secret-dependent accesses and automate attacks • No need for source code • Spy and learn about user’s behavior
48 Case study #3 Attacking cryptographic algorithms
49 • Uses precomputed look-up tables • One-round known-plaintext attack by Osvik et al. (2006) • p plaintext and k secret key (r) (r) (r) • Intermediate state x = (x0 ,...,x15 ) at each round r • First round, accessed table indices are
(0) xi = pi ⊕ ki for all i = 0,...,15
→ Recovering accessed table indices ⇒ recovering the key
Case Study #3: Attacking Cryptographic Algorithms
• AES T-Tables: Fast software implementation
50 • One-round known-plaintext attack by Osvik et al. (2006) • p plaintext and k secret key (r) (r) (r) • Intermediate state x = (x0 ,...,x15 ) at each round r • First round, accessed table indices are
(0) xi = pi ⊕ ki for all i = 0,...,15
→ Recovering accessed table indices ⇒ recovering the key
Case Study #3: Attacking Cryptographic Algorithms
• AES T-Tables: Fast software implementation • Uses precomputed look-up tables
50 → Recovering accessed table indices ⇒ recovering the key
Case Study #3: Attacking Cryptographic Algorithms
• AES T-Tables: Fast software implementation • Uses precomputed look-up tables • One-round known-plaintext attack by Osvik et al. (2006) • p plaintext and k secret key (r) (r) (r) • Intermediate state x = (x0 ,...,x15 ) at each round r • First round, accessed table indices are
(0) xi = pi ⊕ ki for all i = 0,...,15
50 Case Study #3: Attacking Cryptographic Algorithms
• AES T-Tables: Fast software implementation • Uses precomputed look-up tables • One-round known-plaintext attack by Osvik et al. (2006) • p plaintext and k secret key (r) (r) (r) • Intermediate state x = (x0 ,...,x15 ) at each round r • First round, accessed table indices are
(0) xi = pi ⊕ ki for all i = 0,...,15
→ Recovering accessed table indices ⇒ recovering the key
50 → Let’s monitor which T-Table entry is accessed! • Java VM creates a copy of the T-tables when the app starts • No shared memory → no Evict+Reload or Flush+Reload → Prime+Probe to the rescue! A ddress 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x0F 0x0A 0x0B 0x0C 0x0E 0x0D Plaintext byte values
Case Study #3: Attacking Cryptographic Algorithms
• Bouncy Castle → default implementation uses T-Tables
51 • Java VM creates a copy of the T-tables when the app starts • No shared memory → no Evict+Reload or Flush+Reload → Prime+Probe to the rescue! A ddress 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x0F 0x0A 0x0B 0x0C 0x0E 0x0D Plaintext byte values
Case Study #3: Attacking Cryptographic Algorithms
• Bouncy Castle → default implementation uses T-Tables → Let’s monitor which T-Table entry is accessed!
51 • No shared memory → no Evict+Reload or Flush+Reload → Prime+Probe to the rescue! A ddress 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x0F 0x0A 0x0B 0x0C 0x0E 0x0D Plaintext byte values
Case Study #3: Attacking Cryptographic Algorithms
• Bouncy Castle → default implementation uses T-Tables → Let’s monitor which T-Table entry is accessed! • Java VM creates a copy of the T-tables when the app starts
51 → Prime+Probe to the rescue! A ddress 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x0F 0x0A 0x0B 0x0C 0x0E 0x0D Plaintext byte values
Case Study #3: Attacking Cryptographic Algorithms
• Bouncy Castle → default implementation uses T-Tables → Let’s monitor which T-Table entry is accessed! • Java VM creates a copy of the T-tables when the app starts • No shared memory → no Evict+Reload or Flush+Reload
51 Case Study #3: Attacking Cryptographic Algorithms
• Bouncy Castle → default implementation uses T-Tables → Let’s monitor which T-Table entry is accessed! • Java VM creates a copy of the T-tables when the app starts • No shared memory → no Evict+Reload or Flush+Reload → Prime+Probe to the rescue! A ddress 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x0F 0x0A 0x0B 0x0C 0x0E 0x0D Plaintext byte values 51 Case study #4 Monitoring ARM TrustZone
52 • Secure Execution Environment • Roots of Trust • Applications (trustlets) running in a secure world • Credential-store • Secure element for payments • Digital Rights Management (DRM) • Information from the trusted world should not be leaked to the non-secure world
Case Study #4: Monitoring ARM TrustZone
• Hardware-based security technology
53 • Roots of Trust • Applications (trustlets) running in a secure world • Credential-store • Secure element for payments • Digital Rights Management (DRM) • Information from the trusted world should not be leaked to the non-secure world
Case Study #4: Monitoring ARM TrustZone
• Hardware-based security technology • Secure Execution Environment
53 • Applications (trustlets) running in a secure world • Credential-store • Secure element for payments • Digital Rights Management (DRM) • Information from the trusted world should not be leaked to the non-secure world
Case Study #4: Monitoring ARM TrustZone
• Hardware-based security technology • Secure Execution Environment • Roots of Trust
53 • Credential-store • Secure element for payments • Digital Rights Management (DRM) • Information from the trusted world should not be leaked to the non-secure world
Case Study #4: Monitoring ARM TrustZone
• Hardware-based security technology • Secure Execution Environment • Roots of Trust • Applications (trustlets) running in a secure world
53 • Secure element for payments • Digital Rights Management (DRM) • Information from the trusted world should not be leaked to the non-secure world
Case Study #4: Monitoring ARM TrustZone
• Hardware-based security technology • Secure Execution Environment • Roots of Trust • Applications (trustlets) running in a secure world • Credential-store
53 • Digital Rights Management (DRM) • Information from the trusted world should not be leaked to the non-secure world
Case Study #4: Monitoring ARM TrustZone
• Hardware-based security technology • Secure Execution Environment • Roots of Trust • Applications (trustlets) running in a secure world • Credential-store • Secure element for payments
53 • Information from the trusted world should not be leaked to the non-secure world
Case Study #4: Monitoring ARM TrustZone
• Hardware-based security technology • Secure Execution Environment • Roots of Trust • Applications (trustlets) running in a secure world • Credential-store • Secure element for payments • Digital Rights Management (DRM)
53 Case Study #4: Monitoring ARM TrustZone
• Hardware-based security technology • Secure Execution Environment • Roots of Trust • Applications (trustlets) running in a secure world • Credential-store • Secure element for payments • Digital Rights Management (DRM) • Information from the trusted world should not be leaked to the non-secure world
53 • No knowledge of the TrustZone/trustlet implementation • Valid keys and invalid keys are distinguishable
·106 Valid key 1 Valid key 2 Valid key 3 1 Invalid key
0.5 Mean Squared Error (MSE)
0 250 260 270 280 290 300 310 320 330 340 350 Set number
Case Study #4: Monitoring ARM TrustZone
• Timing difference for different RSA signature keys
54 • Valid keys and invalid keys are distinguishable
·106 Valid key 1 Valid key 2 Valid key 3 1 Invalid key
0.5 Mean Squared Error (MSE)
0 250 260 270 280 290 300 310 320 330 340 350 Set number
Case Study #4: Monitoring ARM TrustZone
• Timing difference for different RSA signature keys • No knowledge of the TrustZone/trustlet implementation
54 Case Study #4: Monitoring ARM TrustZone
• Timing difference for different RSA signature keys • No knowledge of the TrustZone/trustlet implementation • Valid keys and invalid keys are distinguishable
·106 Valid key 1 Valid key 2 Valid key 3 1 Invalid key
0.5 Mean Squared Error (MSE)
0 250 260 270 280 290 300 310 320 330 340 350 Set number
54 Tools • x86 • ARMv7 • ARMv8 • Implements various attack techniques • Flush+Reload • Evict+Reload • Prime+Probe • Flush+Flush • Prefetch • Open Source
libflush
• Library to build cross-platform cache attacks
55 • Implements various attack techniques • Flush+Reload • Evict+Reload • Prime+Probe • Flush+Flush • Prefetch • Open Source
libflush
• Library to build cross-platform cache attacks • x86 • ARMv7 • ARMv8
55 • Open Source
libflush
• Library to build cross-platform cache attacks • x86 • ARMv7 • ARMv8 • Implements various attack techniques • Flush+Reload • Evict+Reload • Prime+Probe • Flush+Flush • Prefetch
55 libflush
• Library to build cross-platform cache attacks • x86 • ARMv7 • ARMv8 • Implements various attack techniques • Flush+Reload • Evict+Reload • Prime+Probe • Flush+Flush • Prefetch • Open Source
55 • Includes eviction strategies for several devices • Comes with example code and documentation • Even allows to implement cross-platform Rowhammer attacks
github.com/iaik/armageddon/libflush
libflush
• All described examples have been developed on top of libflush
56 • Comes with example code and documentation • Even allows to implement cross-platform Rowhammer attacks
github.com/iaik/armageddon/libflush
libflush
• All described examples have been developed on top of libflush • Includes eviction strategies for several devices
56 • Even allows to implement cross-platform Rowhammer attacks
github.com/iaik/armageddon/libflush
libflush
• All described examples have been developed on top of libflush • Includes eviction strategies for several devices • Comes with example code and documentation
56 github.com/iaik/armageddon/libflush
libflush
• All described examples have been developed on top of libflush • Includes eviction strategies for several devices • Comes with example code and documentation • Even allows to implement cross-platform Rowhammer attacks
56 libflush
• All described examples have been developed on top of libflush • Includes eviction strategies for several devices • Comes with example code and documentation • Even allows to implement cross-platform Rowhammer attacks
github.com/iaik/armageddon/libflush
56 • Automatically executed on target platform • Results are stored in a database file
github.com/iaik/armageddon/eviction_strategy_ evaluator
Eviction Strategy Evaluator
• Utilizes libflush to evaluate eviction strategies
57 • Results are stored in a database file
github.com/iaik/armageddon/eviction_strategy_ evaluator
Eviction Strategy Evaluator
• Utilizes libflush to evaluate eviction strategies • Automatically executed on target platform
57 github.com/iaik/armageddon/eviction_strategy_ evaluator
Eviction Strategy Evaluator
• Utilizes libflush to evaluate eviction strategies • Automatically executed on target platform • Results are stored in a database file
57 Eviction Strategy Evaluator
• Utilizes libflush to evaluate eviction strategies • Automatically executed on target platform • Results are stored in a database file
github.com/iaik/armageddon/eviction_strategy_ evaluator
57 • Scan libraries and executable for vulnerable addresses • Additional tool to simulate input events
github.com/iaik/armageddon/cache_template_attacks
Cache Template Attacks
• Cross-platform cache template attack tool
58 • Additional tool to simulate input events
github.com/iaik/armageddon/cache_template_attacks
Cache Template Attacks
• Cross-platform cache template attack tool • Scan libraries and executable for vulnerable addresses
58 github.com/iaik/armageddon/cache_template_attacks
Cache Template Attacks
• Cross-platform cache template attack tool • Scan libraries and executable for vulnerable addresses • Additional tool to simulate input events
58 Cache Template Attacks
• Cross-platform cache template attack tool • Scan libraries and executable for vulnerable addresses • Additional tool to simulate input events
github.com/iaik/armageddon/cache_template_attacks
58 Countermeasures → But a thread timer is sufficient. . . • No shared memory → What about Prime+Probe? • Restrict system information, e.g., pagemap → Harder but still possible • Use cryptographic instruction extensions → Still not the default everywhere. . . → Doesn’t protect from spying user behavior. . .
→ Protect crypto. For the rest. . . No satisfying solution
Countermeasures
• Coarse-grained timers
59 • No shared memory → What about Prime+Probe? • Restrict system information, e.g., pagemap → Harder but still possible • Use cryptographic instruction extensions → Still not the default everywhere. . . → Doesn’t protect from spying user behavior. . .
→ Protect crypto. For the rest. . . No satisfying solution
Countermeasures
• Coarse-grained timers → But a thread timer is sufficient. . .
59 → What about Prime+Probe? • Restrict system information, e.g., pagemap → Harder but still possible • Use cryptographic instruction extensions → Still not the default everywhere. . . → Doesn’t protect from spying user behavior. . .
→ Protect crypto. For the rest. . . No satisfying solution
Countermeasures
• Coarse-grained timers → But a thread timer is sufficient. . . • No shared memory
59 • Restrict system information, e.g., pagemap → Harder but still possible • Use cryptographic instruction extensions → Still not the default everywhere. . . → Doesn’t protect from spying user behavior. . .
→ Protect crypto. For the rest. . . No satisfying solution
Countermeasures
• Coarse-grained timers → But a thread timer is sufficient. . . • No shared memory → What about Prime+Probe?
59 → Harder but still possible • Use cryptographic instruction extensions → Still not the default everywhere. . . → Doesn’t protect from spying user behavior. . .
→ Protect crypto. For the rest. . . No satisfying solution
Countermeasures
• Coarse-grained timers → But a thread timer is sufficient. . . • No shared memory → What about Prime+Probe? • Restrict system information, e.g., pagemap
59 • Use cryptographic instruction extensions → Still not the default everywhere. . . → Doesn’t protect from spying user behavior. . .
→ Protect crypto. For the rest. . . No satisfying solution
Countermeasures
• Coarse-grained timers → But a thread timer is sufficient. . . • No shared memory → What about Prime+Probe? • Restrict system information, e.g., pagemap → Harder but still possible
59 → Still not the default everywhere. . . → Doesn’t protect from spying user behavior. . .
→ Protect crypto. For the rest. . . No satisfying solution
Countermeasures
• Coarse-grained timers → But a thread timer is sufficient. . . • No shared memory → What about Prime+Probe? • Restrict system information, e.g., pagemap → Harder but still possible • Use cryptographic instruction extensions
59 → Doesn’t protect from spying user behavior. . .
→ Protect crypto. For the rest. . . No satisfying solution
Countermeasures
• Coarse-grained timers → But a thread timer is sufficient. . . • No shared memory → What about Prime+Probe? • Restrict system information, e.g., pagemap → Harder but still possible • Use cryptographic instruction extensions → Still not the default everywhere. . .
59 → Protect crypto. For the rest. . . No satisfying solution
Countermeasures
• Coarse-grained timers → But a thread timer is sufficient. . . • No shared memory → What about Prime+Probe? • Restrict system information, e.g., pagemap → Harder but still possible • Use cryptographic instruction extensions → Still not the default everywhere. . . → Doesn’t protect from spying user behavior. . .
59 For the rest. . . No satisfying solution
Countermeasures
• Coarse-grained timers → But a thread timer is sufficient. . . • No shared memory → What about Prime+Probe? • Restrict system information, e.g., pagemap → Harder but still possible • Use cryptographic instruction extensions → Still not the default everywhere. . . → Doesn’t protect from spying user behavior. . .
→ Protect crypto.
59 No satisfying solution
Countermeasures
• Coarse-grained timers → But a thread timer is sufficient. . . • No shared memory → What about Prime+Probe? • Restrict system information, e.g., pagemap → Harder but still possible • Use cryptographic instruction extensions → Still not the default everywhere. . . → Doesn’t protect from spying user behavior. . .
→ Protect crypto. For the rest. . .
59 Countermeasures
• Coarse-grained timers → But a thread timer is sufficient. . . • No shared memory → What about Prime+Probe? • Restrict system information, e.g., pagemap → Harder but still possible • Use cryptographic instruction extensions → Still not the default everywhere. . . → Doesn’t protect from spying user behavior. . .
→ Protect crypto. For the rest. . . No satisfying solution
59 Conclusion • Building fast covert channels • Spying on user activity with a high accuracy • Deriving cryptographic keys • ARM TrustZone leaking through the cache • Try our tools yourself!
Conclusion
• All powerful cache attacks are applicable to mobile devices • Without any permissions or privileges
60 • Spying on user activity with a high accuracy • Deriving cryptographic keys • ARM TrustZone leaking through the cache • Try our tools yourself!
Conclusion
• All powerful cache attacks are applicable to mobile devices • Without any permissions or privileges • Building fast covert channels
60 • Deriving cryptographic keys • ARM TrustZone leaking through the cache • Try our tools yourself!
Conclusion
• All powerful cache attacks are applicable to mobile devices • Without any permissions or privileges • Building fast covert channels • Spying on user activity with a high accuracy
60 • ARM TrustZone leaking through the cache • Try our tools yourself!
Conclusion
• All powerful cache attacks are applicable to mobile devices • Without any permissions or privileges • Building fast covert channels • Spying on user activity with a high accuracy • Deriving cryptographic keys
60 • Try our tools yourself!
Conclusion
• All powerful cache attacks are applicable to mobile devices • Without any permissions or privileges • Building fast covert channels • Spying on user activity with a high accuracy • Deriving cryptographic keys • ARM TrustZone leaking through the cache
60 Conclusion
• All powerful cache attacks are applicable to mobile devices • Without any permissions or privileges • Building fast covert channels • Spying on user activity with a high accuracy • Deriving cryptographic keys • ARM TrustZone leaking through the cache • Try our tools yourself!
60 ARMageddon How Your Smartphone CPU Breaks Software-Level Security And Privacy
Moritz Lipp and Cl´ementineMaurice November 3, 2016—Black Hat Europe