Efficient Fault Tolerance using Intel MPX and TSX
Oleksii Oleksenko, Pascal Felber, Dmitrii Kuvaiskii, Pramod Bhatotia, Christof Fetzer
DSN’16
Data corruption
● Performance-critical systems ➜ written in a low-level language (C/C++)
● Low-level language ➜ no memory protection
  ○ Applications are more vulnerable to hardware faults
● A pointer gets corrupted ➜ the corruption stays undetected
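[Figure: two memory diagrams. One shows a pointer referring to an object; the other shows a corrupted pointer referring to a different object, which becomes a corrupted object.]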
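The silent corruption described on this slide can be illustrated with a toy model. This is only a sketch under loud assumptions: "memory" is a Python list, a "pointer" is an integer index, and `flip_bit` is a hypothetical helper that models a single-bit hardware fault. None of these names come from the talk.

```python
# Toy model: "memory" is a flat list and a "pointer" is just an integer index.
# All names are hypothetical and chosen for illustration only.

def flip_bit(value: int, bit: int) -> int:
    """Model a single-bit hardware fault in a stored value."""
    return value ^ (1 << bit)

memory = [0] * 16
memory[2] = 111   # object A lives at address 2
memory[6] = 222   # object B lives at address 6

ptr = 2                      # intended to point at object A
bad_ptr = flip_bit(ptr, 2)   # bit 2 flips: 0b0010 becomes 0b0110 (address 6)

# Nothing checks the access, so the update lands in object B without any error.
memory[bad_ptr] += 1
```

After the faulty update, object A is unchanged and object B has been modified, yet no error was raised anywhere. That is the "stays undetected" case from the bullet list.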
Overview
Problem:
● Existing solutions are expensive
  ○ They harden the entire program
Approach:
● Partial protection for efficient fault tolerance
  ○ Protect only data pointers
Hypothesis
Leverage the new ISA extensions in modern CPUs for fault tolerance
● Fault detection: Intel MPX (Memory Protection Extensions)
● Fault recovery: Intel TSX (Transactional Synchronization Extensions)
Fault detection via Memory Protection Extensions
Intel MPX: bounds checking in hardware
Insight: a pointer error will cause a bounds violation with high probability
[Figure: memory containing an object and its bounds. A corrupted pointer makes an out-of-bounds access, which is detected by MPX.]
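The "high probability" insight can be made concrete with a software stand-in for bounds checking. This sketch does not use MPX itself. `BoundedPointer`, `checked_load`, and `with_flipped_bit` are hypothetical names, the upper bound is assumed to be inclusive, and only the pointer value (not the stored bounds) is assumed to be hit by the fault.

```python
from dataclasses import dataclass

class BoundsViolation(Exception):
    """Raised when an access falls outside the pointer's recorded bounds."""

@dataclass
class BoundedPointer:
    address: int
    lower: int   # lowest valid address of the object
    upper: int   # highest valid address (inclusive, by assumption)

def in_bounds(p: BoundedPointer) -> bool:
    return p.lower <= p.address <= p.upper

def checked_load(memory, p: BoundedPointer):
    """Software stand-in for a bounds-checked load."""
    if not in_bounds(p):
        raise BoundsViolation(f"address {p.address} outside [{p.lower}, {p.upper}]")
    return memory[p.address]

def with_flipped_bit(p: BoundedPointer, bit: int) -> BoundedPointer:
    """Model a single-bit fault in the pointer value; bounds stay intact."""
    return BoundedPointer(p.address ^ (1 << bit), p.lower, p.upper)

memory = list(range(100, 116))                    # 16 cells of toy memory
good = BoundedPointer(address=5, lower=4, upper=7)

# Try every single-bit flip of a 64-bit pointer and count how many are caught.
detected = sum(not in_bounds(with_flipped_bit(good, b)) for b in range(64))
```

In this toy setup, flips of the two lowest bits leave the pointer inside the 4-cell object and go unnoticed, while every other flip moves it out of bounds. The smaller the object relative to the address space, the larger the fraction of flips a bounds check catches, which is why detection is probabilistic rather than guaranteed.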
Fault recovery using transactions
Intel TSX: transactions for optimistic concurrency control
Approach:
● Detect faults using MPX
● Use transactions for recovery
[Figure: a transaction wrapping an instruction sequence (add, mul, mov). When MPX detects a fault, the transaction is rolled back and re-executed.]
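The detect, roll back, re-execute loop can be sketched in software. This is an emulation under stated assumptions, not TSX: state is snapshotted and restored by copying, `FaultDetected` stands in for a fault reported by the detector, and `run_transaction`, `body`, and the retry limit are hypothetical names and choices made for illustration.

```python
import copy

class FaultDetected(Exception):
    """Stands in for a fault reported by the detection mechanism."""

def run_transaction(state: dict, body, max_retries: int = 3) -> int:
    """Run body(state, attempt); on a detected fault, roll back and retry.

    Returns the index of the attempt that committed.
    """
    for attempt in range(max_retries + 1):
        snapshot = copy.deepcopy(state)      # start of the "transaction"
        try:
            body(state, attempt)
            return attempt                   # commit: keep the changes
        except FaultDetected:
            state.clear()                    # rollback: discard partial updates
            state.update(snapshot)
    raise RuntimeError("fault persisted across all retries")

def body(state: dict, attempt: int) -> None:
    state["acc"] += 10                       # add
    state["acc"] *= 2                        # mul
    if attempt == 0:                         # transient fault on the first try only
        raise FaultDetected()
    state["out"] = state["acc"]              # mov

state = {"acc": 1, "out": 0}
committed_on = run_transaction(state, body)
```

The first attempt is abandoned halfway, its partial update to `acc` is discarded, and the second attempt commits a clean result. Re-execution only helps with transient faults, so the sketch bounds the number of retries instead of looping forever.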
Performance overheads
[Figure: performance overhead plot; lower is better]
~50% slowdown on average
Summary
Leverage the new ISA extensions for fault tolerance:
● MPX: fault detection
● TSX: fault recovery
Improved efficiency:
● ~50% slowdown
  ○ State-of-the-art full hardening: 100%
[Figure: a transaction wrapping an instruction sequence (add, mul, mov). When MPX detects a fault, the transaction is rolled back and re-executed.]
Thanks!
[email protected]

Backups
Generic Solutions
There are solutions for full-program hardening:
● Thread- and process-level redundancy
  ○ additional hardware
  ○ i.e., more cores used
● Duplicated execution
  ○ high performance overhead
  ○ 2x on average
[Figure: the instruction sequence add, mul, mov shown in replicated copies]
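Duplicated execution can be sketched as running the same computation twice and comparing the results. This is a minimal illustration, not the implementation of any particular hardening tool. `duplicated`, `make_faulty`, and `compute` are hypothetical names, and the fault is injected artificially by flipping the lowest bit of one result.

```python
class Mismatch(Exception):
    """Raised when the two redundant executions disagree."""

def duplicated(fn, *args):
    """Run fn twice and accept the result only if both runs agree."""
    first = fn(*args)
    second = fn(*args)
    if first != second:
        raise Mismatch(f"{first!r} != {second!r}")
    return first

def make_faulty(fn, faulty_call: int):
    """Wrap fn so that its Nth call returns a result with one bit flipped."""
    calls = {"n": 0}
    def wrapped(*args):
        calls["n"] += 1
        result = fn(*args)
        if calls["n"] == faulty_call:
            result ^= 1                      # injected single-bit fault
        return result
    return wrapped

def compute(x: int) -> int:
    return (x + 3) * 2                       # stands in for add, mul, mov

ok = duplicated(compute, 4)                  # both runs agree

try:
    duplicated(make_faulty(compute, 2), 4)   # second run is corrupted
    fault_caught = False
except Mismatch:
    fault_caught = True
```

Every operation runs twice even when no fault occurs, which is consistent with the roughly 2x cost listed above for this class of solutions.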
Approach
[Figure: an instruction sequence (add, mul, mov) annotated with "Detect (MPX)" and "Recover (TSX)"]