Efficient Fault Tolerance Using Intel MPX and TSX
Total Page:16
File Type:pdf, Size:1020Kb
Efficient Fault Tolerance using Intel MPX and TSX Oleksii Oleksenko, Pascal Felber Dmitrii Kuvaiskii, Pramod Bhatotia, Christof Fetzer DSN’16 Data corruption ● Performance-critical systems ➜ in a low-level language (C/C++) ● Low-level language ➜ no memory protection ○ Applications are more vulnerable to hardware faults DSN’16 2 Data corruption ● Performance-critical systems ➜ in a low-level language (C/C++) ● Low-level language ➜ no memory protection ○ Applications are more vulnerable to hardware faults ● A pointer gets corrupted ➜ stays undetected Memory Memory Corrupted object Object Object Corrupted Pointer pointer DSN’16 2 Overview Problem: ● Existing solutions are expensive ○ They harden the entire program Approach: ● Partial protection for efficient fault-tolerance ○ Protect only data pointers DSN’16 3 Hypothesis Leverage the new ISA extensions in modern CPUs for fault tolerance DSN’16 4 Hypothesis Leverage the new ISA extensions in modern CPUs for fault tolerance Fault detection Fault recovery Intel MPX Intel TSX (Memory Protection Extension) (Transaction Sync Extensions) DSN’16 4 Fault detection via Memory Protection Extension Intel MPX: Bounds checking in the H/W Memory Insight: Pointer error will cause bound violation with high probability Object bounds Pointer DSN’16 5 Fault detection via Memory Protection Extension Intel MPX: Bounds checking in the H/W Memory Insight: Pointer error will cause bound violation with high probability Object bounds Corrupted pointer DSN’16 5 Fault detection via Memory Protection Extension Intel MPX: Bounds checking in the H/W Memory Detected by MPX Out-of-bound Insight: Pointer error will cause bound access violation with high probability Object bounds Corrupted pointer DSN’16 5 Fault recovery using transactions Intel TSX: Transactions for optimistic Transaction concurrency control ... Approach: add mul ● Detect faults using MPX ● Use transactions for recovery mov ... DSN’16 6 Fault recovery using transactions Intel TSX: Transactions for optimistic Transaction concurrency control ... Approach: add mul ● Detect faults using MPX mov ● Use transactions for recovery MPX detects a fault ... DSN’16 6 Fault recovery using transactions Intel TSX: Transactions for optimistic Transaction concurrency control ... Rollback the transaction and re-execute Approach: add mul ● Detect faults using MPX mov ● Use transactions for recovery MPX detects a fault ... DSN’16 6 Performance overheads lower better DSN’16 7 Performance overheads lower better DSN’16 7 Performance overheads lower better ~ 50% slowdown on average DSN’16 7 Summary Leverage the new ISA extensions for Transaction fault tolerance: ... Rollback the transaction ● MPX: fault detection add and re-execute mul ● TSX: fault recovery mov MPX detects a Improved efficiency: fault ... ● ~ 50% slowdown ○ State-of-the-art full hardening - 100% DSN’16 8 Summary Leverage the new ISA extensions for Transaction fault tolerance: ... Rollback the transaction ● MPX: fault detection add and re-execute mul ● TSX: fault recovery mov MPX detects a Improved efficiency: fault ... ● ~ 50% slowdown ○ State-of-the-art full hardening - 100% Thanks! [email protected] DSN’16 8 DSN’16 9 Backups DSN’16 19 Generic Solutions There are solutions for full-program hardening: ● Thread- and process-level ● Duplicated execution redundancy ○ high performance overhead ○ additional hardware ○ x2 on average ○ i.e., more cores used add add add add mul mul mul mul mov mov mov mov DSN’16 20 Approach ... add Recover (TSX) mul mov Detect (MPX) DSN’16 21.