Using Address Independent Seed Encryption and Bonsai Merkle Trees to Make Secure Processors OS- and Performance-Friendly
Brian Rogers, Siddhartha Chhabra, Milos Prvulovic§, Yan Solihin
NC State University, §Georgia Tech
Motivation
Why is there a need for secure processors?
Digital Rights Management, Copy Protection, Trusted Distributed Computing, Software Piracy, Reverse Engineering, Data Theft
Why are architectural mechanisms necessary?
Hardware attacks emerging (e.g. Mod-chips, bus analyzers)
SW-only protection vulnerable to HW attacks
[Figure: example mod-chips and prices: XBOX mod-chip $59.99, PS mod-chip $53.49, GC mod-chip $49.69]
Brian Rogers AISE + BMT for Secure Processors MICRO 40
Secure Processor Architecture
[Figure: the secure processor (processor core, cache, crypto engine) is the trusted domain; main memory (encrypted code/data & authentication codes) is the untrusted domain]
Private and Tamper Resistant execution environment
Prior Work
Memory Encryption
Counter Mode Encryption [Suh ’03], [Yang ’03]
Overlap decryption and memory latencies
System-level issues (difficult to support common features)
Virtual Memory
Shared memory-based Inter-Process Communication (IPC)
Memory Integrity Verification
Merkle Tree Integrity Verification [Gassend ’03]
Prevents data replay attacks
Performance & storage overheads
Contributions
Address Independent Seed Encryption (AISE)
Retains same cryptographic latency-hiding ability
Compatible with support for virtual memory and IPC
Bonsai Merkle Trees (BMT)
New, reduced size Merkle Tree organization
Same protection, but lower storage & performance overheads
Extended Merkle Tree Protection
Novel mechanism to protect both physical memory and the disk from tampering attacks
Outline
Motivation & Background
Memory Encryption
Overview of counter-mode encryption
Address Independent Seed Encryption
Memory Integrity Verification
Evaluation
Conclusion
Counter Mode Encryption
[Figure: counter-mode encryption datapath; labels: Lowest-level Cache, Secret Key, Seed, AES, Pad, Secure Chip Boundary, Main Memory]
Security: Seed must be used only once
Performance: Seed must be known at cache miss time
Seed (128 bits): Padding | Block Address (spatial uniqueness) | Block Counter (temporal uniqueness)
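A minimal sketch of the pad generation described on this slide, to make the two seed requirements concrete. It is an illustration under stated assumptions, not the hardware design: HMAC-SHA256 stands in for the AES engine (the Python standard library has no AES), and the field widths, the chunk index, and the names make_seed, make_pad, and xor_with_pad are hypothetical.

```python
import hashlib
import hmac

BLOCK_BYTES = 64  # cache block size used in the talk's examples


def make_seed(block_address: int, block_counter: int) -> bytes:
    """Pack a 128-bit seed: padding | block address | block counter.

    The 4/8/4-byte split is an assumption made for this sketch.
    """
    padding = (0).to_bytes(4, "big")
    return padding + block_address.to_bytes(8, "big") + block_counter.to_bytes(4, "big")


def make_pad(key: bytes, seed: bytes) -> bytes:
    """Stand-in for AES_key(seed): derive a 64-byte pad from the seed alone.

    The pad does not depend on the data, so it can be computed while the
    memory fetch is still in flight.
    """
    chunks = [hmac.new(key, seed + bytes([i]), hashlib.sha256).digest() for i in range(2)]
    return b"".join(chunks)[:BLOCK_BYTES]


def xor_with_pad(data: bytes, pad: bytes) -> bytes:
    """Encrypt or decrypt a block by XORing it with the pad."""
    return bytes(d ^ p for d, p in zip(data, pad))


key = b"demo-key-16bytes"  # hypothetical on-chip secret key
plaintext = bytes(range(BLOCK_BYTES))
seed = make_seed(block_address=0x1000, block_counter=1)
ciphertext = xor_with_pad(plaintext, make_pad(key, seed))
recovered = xor_with_pad(ciphertext, make_pad(key, seed))
print(recovered == plaintext)  # True
```

Incrementing the block counter on every write-back changes the seed, and therefore the pad, which is what the "used only once" requirement asks for.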
Problems with Address-Based Seeds
What if seed includes Physical Address?
Security: Possible pad reuse between disk & memory
Complexity: Extra cryptographic work on page swaps
What if seed includes Virtual Address?
Complexity: Storage of VAs in lowest-level on-chip cache
Security: Possible pad reuse between different processes
Prevented by including process ID in seed, but…
Shared-memory based IPC is difficult to support
OS will reuse process IDs
Fundamental Problem: Address used for memory management purposes, not as a component for security
Possible Solution – Global Counter
Eliminates system-level problems of address-based seeds
Maintain a large (64b) global counter on-chip
Seed == global counter value
Larger performance and storage overheads
Large per-block counters do not cache well
Require more storage in memory
Address Independent Seed Encryption
Use logical identifiers in seeds instead of address
Manage logical ID per physical page, not per block
New seed composition: Padding | Logical Page IDentifier (LPID) | Block Page Offset | Block Counter
Spatial uniqueness: LPID and block page offset
Temporal uniqueness: block counter
LPID
Unique value assigned to a page when allocated
Obtained from a 64b on-chip Global Page Counter
Stored in a non-volatile register
Overflow not an issue
Remains associated with page throughout its lifetime
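The LPID scheme can be sketched as follows. This is an illustrative model only: the class GlobalPageCounter, the function aise_seed, and the byte widths of the seed fields are assumptions made for the example, not details given in the talk.

```python
class GlobalPageCounter:
    """Model of the 64-bit on-chip counter that hands out LPIDs."""

    def __init__(self, start: int = 0) -> None:
        self.value = start

    def next_lpid(self) -> int:
        self.value += 1
        if self.value >= 2**64:
            raise OverflowError("64-bit Global Page Counter exhausted")
        return self.value


def aise_seed(lpid: int, block_page_offset: int, block_counter: int) -> bytes:
    """Pack a 128-bit seed: padding | LPID | block page offset | block counter.

    Assumed widths: 6 bytes of padding, 8-byte LPID, 1 byte each for the
    block's index within the page and its 7-bit counter.
    """
    return (
        (0).to_bytes(6, "big")
        + lpid.to_bytes(8, "big")
        + block_page_offset.to_bytes(1, "big")
        + block_counter.to_bytes(1, "big")
    )


gpc = GlobalPageCounter()
page_a = gpc.next_lpid()  # assigned when the page is allocated
page_b = gpc.next_lpid()

# The seed contains no physical or virtual address, so it is unchanged if the
# OS moves page_a to another frame, swaps it to disk, or maps it into a
# second process for shared-memory IPC.
seed_a = aise_seed(page_a, block_page_offset=3, block_counter=5)
seed_b = aise_seed(page_b, block_page_offset=3, block_counter=5)
print(seed_a != seed_b)  # True: same offset and counter, different pages
```

Because the LPID travels with the page, the function takes no address argument at all, which is the point of the design.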
LPID Storage
Borrow an idea from split counter organization [Yan ’06]
Co-store LPIDs with block counters
Example: 4KB page size, 64B block size, 64-bit LPID, 7-bit counter per block
64B counter block: LPID (64b) | 64 x 7-bit block counters
On block counter overflow: Assign new LPID to block’s page & re-encrypt that page
On-chip counter cache to enable latency-hiding
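A small sketch of the 64B counter block layout from the example above, showing that one 64-bit LPID plus 64 seven-bit counters fill exactly 512 bits. The CounterBlock class and bump_counter function are hypothetical names, and resetting all counters to zero when a new LPID is assigned is an assumption of this sketch (the slide only says the page gets a new LPID and is re-encrypted).

```python
from dataclasses import dataclass, field

BLOCKS_PER_PAGE = 64          # 4KB page / 64B block
COUNTER_BITS = 7
COUNTER_MAX = (1 << COUNTER_BITS) - 1


@dataclass
class CounterBlock:
    lpid: int
    counters: list = field(default_factory=lambda: [0] * BLOCKS_PER_PAGE)

    def pack(self) -> bytes:
        """Serialize as 64-bit LPID followed by 64 x 7-bit counters (64 bytes)."""
        value = self.lpid
        for c in self.counters:
            value = (value << COUNTER_BITS) | c
        return value.to_bytes(64, "big")

    @classmethod
    def unpack(cls, raw: bytes) -> "CounterBlock":
        value = int.from_bytes(raw, "big")
        counters = []
        for _ in range(BLOCKS_PER_PAGE):
            counters.append(value & COUNTER_MAX)
            value >>= COUNTER_BITS
        counters.reverse()
        return cls(lpid=value, counters=counters)


def bump_counter(cb: CounterBlock, block_index: int, next_lpid: int) -> bool:
    """Advance a block's counter on write-back.

    Returns True when the counter overflowed, meaning the whole page must be
    re-encrypted under the new LPID.
    """
    if cb.counters[block_index] == COUNTER_MAX:
        cb.lpid = next_lpid
        cb.counters = [0] * BLOCKS_PER_PAGE  # assumed reset; new LPID keeps seeds unique
        return True
    cb.counters[block_index] += 1
    return False


cb = CounterBlock(lpid=42)
cb.counters[5] = COUNTER_MAX
raw = cb.pack()
print(len(raw))  # 64
```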
AISE Advantages
Retains latency-hiding ability
Counter caching or counter prediction
Seeds are globally unique
Eliminates pad reuse
No special mechanisms for page swaps
Swap page of data, LPID, & block counters to/from disk
Shared-memory IPC naturally supported
Low memory storage overhead (1.6%)
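The 1.6% storage figure follows from the example parameters given for LPID storage (4KB page, 64B block, 64-bit LPID, 7-bit counter per block). A quick check of the arithmetic, with variable names chosen for this sketch:

```python
PAGE_BYTES = 4096
BLOCK_BYTES = 64
LPID_BITS = 64
COUNTER_BITS = 7

blocks_per_page = PAGE_BYTES // BLOCK_BYTES                       # 64 blocks
counter_block_bits = LPID_BITS + blocks_per_page * COUNTER_BITS   # 64 + 448 = 512 bits
counter_block_bytes = counter_block_bits // 8                     # 64 bytes per page
overhead = counter_block_bytes / PAGE_BYTES                       # 64 / 4096

print(f"{overhead:.1%}")  # 1.6%
```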
Outline
Background
Memory Encryption
Memory Integrity Verification
Merkle Tree overview
Bonsai Merkle Trees
Evaluation
Conclusion
Merkle Tree Integrity Verification
[Figure: Merkle Tree over main memory: MACs of memory blocks, intermediate MACs above them, and a single root MAC at the top]
64B Block Size
128b Auth. Codes
Performance Optimization
Problem: Large performance overhead
Verify MACs to root for every data fetch
Optimization: Cache MACs
Cached MAC blocks are verified & trusted
Only verify up the Merkle Tree to first cached MAC
Lower performance overhead, but…
Large portion of L2 cache may be occupied by MACs
Increase in cache capacity misses
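The verify-up-to-the-first-cached-MAC idea can be sketched as below. This is a toy model under assumptions: truncated HMAC-SHA256 stands in for the 128b authentication codes, the tree has arity 4 (four 128b MACs per 64B block), and build_tree, verify, and the dictionary used as the on-chip cache are hypothetical.

```python
import hashlib
import hmac

ARITY = 4  # a 64B block holds four 128b MACs


def mac(key: bytes, data: bytes) -> bytes:
    """128-bit MAC (truncated HMAC-SHA256 as a stand-in)."""
    return hmac.new(key, data, hashlib.sha256).digest()[:16]


def build_tree(key: bytes, blocks: list) -> list:
    """levels[0] holds the data-block MACs; the last level holds the root MAC."""
    levels = [[mac(key, b) for b in blocks]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([mac(key, b"".join(prev[i:i + ARITY]))
                       for i in range(0, len(prev), ARITY)])
    return levels


def verify(key, blocks, levels, index, root, cache):
    """Check block `index`, stopping at the first node found in the on-chip cache.

    `levels` models untrusted memory; `cache` maps (level, index) to trusted
    MAC values. Returns (ok, number of MAC computations performed).
    """
    computed = mac(key, blocks[index])
    steps = 1
    for level in range(len(levels)):
        if (level, index) in cache:            # trusted copy on chip: stop here
            return computed == cache[(level, index)], steps
        if level == len(levels) - 1:           # reached the top: compare to on-chip root
            return computed == root, steps
        if computed != levels[level][index]:   # mismatch with the copy in memory
            return False, steps
        start = (index // ARITY) * ARITY
        computed = mac(key, b"".join(levels[level][start:start + ARITY]))
        steps += 1
        index //= ARITY


key = b"demo-key-16bytes"                      # hypothetical secret key
blocks = [bytes([i]) * 64 for i in range(16)]
levels = build_tree(key, blocks)
root = levels[-1][0]

print(verify(key, blocks, levels, 5, root, cache={}))                         # (True, 3)
print(verify(key, blocks, levels, 5, root, cache={(1, 1): levels[1][1]}))     # (True, 2)
```

With the parent node cached, the walk stops one level earlier, which is the saving the slide describes; the cost is that those cached nodes occupy cache space.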
Bonsai Merkle Trees (BMT)
Leverage counter-mode memory encryption
Two Observations:
Merkle Tree only needed to prevent replay attacks
Counter-mode encryption schemes maintain a counter per memory block
Essentially a version number
Claim: Data blocks don’t need MT to guard replay attacks if:
(1) Each block is protected with a MAC
(2) Block’s MAC computed on ciphertext & counter value
(3) Integrity & freshness of counter values guaranteed
BMTs (Cont.)
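Why claim holds true:
$\mathrm{MAC}^{old} = H_k(\mathrm{Ctext}^{old}, \mathrm{Counter}^{old})$
Attacker replays $\mathrm{MAC}^{old}$ & $\mathrm{Ctext}^{old}$ instead of $\mathrm{MAC}^{fresh}$ & $\mathrm{Ctext}^{fresh}$
$\mathrm{MAC}^{old} \neq H_k(\mathrm{Ctext}^{old}, \mathrm{Counter}^{fresh})$
Processor knows $\mathrm{Counter}^{fresh}$
How to guarantee processor knows $\mathrm{Counter}^{fresh}$? We protect counters with Merkle Tree!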
Significantly smaller and shallower Bonsai Merkle Tree
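The replay argument above can be sketched in a few lines. This is an illustration, assuming truncated HMAC-SHA256 as the keyed hash H_k and an 8-byte counter encoding; block_mac and check_block are hypothetical names, and the fresh counter is simply passed in as a trusted value, standing in for a counter whose integrity the Bonsai Merkle Tree has already verified.

```python
import hashlib
import hmac


def block_mac(key: bytes, ciphertext: bytes, counter: int) -> bytes:
    """MAC = H_k(Ctext, Counter), truncated to 128 bits."""
    msg = counter.to_bytes(8, "big") + ciphertext
    return hmac.new(key, msg, hashlib.sha256).digest()[:16]


def check_block(key: bytes, ciphertext: bytes, stored_mac: bytes, fresh_counter: int) -> bool:
    """Recompute the MAC with the counter the processor trusts and compare."""
    return hmac.compare_digest(block_mac(key, ciphertext, fresh_counter), stored_mac)


key = b"demo-key-16bytes"            # hypothetical secret key

ctext_old, counter_old = b"A" * 64, 1
mac_old = block_mac(key, ctext_old, counter_old)

ctext_fresh, counter_fresh = b"B" * 64, 2
mac_fresh = block_mac(key, ctext_fresh, counter_fresh)

# Normal fetch: fresh ciphertext and MAC verify under the fresh counter.
print(check_block(key, ctext_fresh, mac_fresh, counter_fresh))  # True

# Replay: the attacker restores the old (ciphertext, MAC) pair in memory, but
# the processor still checks against the fresh counter, so the MAC mismatches.
print(check_block(key, ctext_old, mac_old, counter_fresh))      # False
```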
BMT Structure
[Figure: Standard Merkle Tree vs. Bonsai Merkle Tree. Each keeps a secure root inside the secure chip boundary; memory holds data, counters (Ctrs), and Merkle Tree (MT) nodes, and the Bonsai organization additionally holds per-block data MACs]
Reduced memory storage overhead & L2 cache contention
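A back-of-the-envelope sketch of why a tree over counters is smaller and shallower than a tree over data. It assumes the parameters quoted elsewhere in the talk (64B blocks, 128b MACs, one 64B counter block per 4KB page, 1GB of protected memory), counts only tree nodes stored in memory, and assumes the standard tree covers data blocks only. The function tree_shape is hypothetical, and the printed numbers are this sketch's arithmetic, not the storage figures reported in the talk.

```python
BLOCK_BYTES = 64
MAC_BYTES = 16                      # 128b MACs
ARITY = BLOCK_BYTES // MAC_BYTES    # 4 MACs per 64B tree node


def tree_shape(covered_bytes: int) -> tuple:
    """Return (levels stored in memory, bytes of MACs) for a tree over covered_bytes."""
    n = covered_bytes // BLOCK_BYTES   # blocks that need a MAC at this level
    levels = 0
    total_macs = 0
    while n > 1:
        total_macs += n                # one MAC per block at this level
        n = -(-n // ARITY)             # those MACs pack into ceil(n / 4) blocks
        levels += 1
    return levels, total_macs * MAC_BYTES


DATA_BYTES = 1 << 30                   # 1GB of protected data
COUNTER_BYTES = DATA_BYTES // 64       # one 64B counter block per 4KB page

std_levels, std_bytes = tree_shape(DATA_BYTES)
bmt_levels, bmt_bytes = tree_shape(COUNTER_BYTES)

print(std_levels, bmt_levels)          # 12 9
print(std_bytes // bmt_bytes)          # 64
```

Under these assumptions the tree over counters is three levels shallower and its nodes take roughly 1/64 of the space, since it covers 1/64 as many bytes. The per-block data MACs that the Bonsai organization adds are a separate, flat cost not counted here.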
Outline
Background
Memory Encryption
Memory Integrity Verification
Evaluation
Conclusion
Simulation Setup
SESC – A detailed, execution-driven simulator
Three-issue, out-of-order processor
L1 Cache: Split I&D, 32KB each, 2-way set-associative, 64B line, 2-cycle/access
L2 Cache: Unified 1MB, 8-way set-associative, 64B line, 10-cycle/access
Memory/Bus: 200-cycle uncontended access time, 600MHz bus
AES/SHA-1 Engines: 80-cycle latency, 16-stage pipeline
Counter Cache: 32KB, 16-way set-associative, 64B line; one 64b LPID, 64 7b minor counters (4KB page)
Merkle Tree: Covers 1GB memory space, 128b MACs
Address Independent Seed Encryption
[Figure: Execution Time Overhead (0% to 20%) of AISE, global32, and global64 for art, gap, mcf, apsi, mesa, swim, applu, mgrid, equake, wupwise, and AVG21; off-scale bars are labeled 23%, 33%, and 39%]
AISE performs significantly better for memory-intensive applications AISE performance equivalent to prior counter-mode studies [Yan ’06]
Bonsai Merkle Tree
[Figure: Execution Time Overhead (0% to 35%) of AISE, AISE + MT, and AISE + BMT for art, gap, mcf, apsi, mesa, swim, applu, mgrid, equake, wupwise, and AVG21; off-scale bars are labeled 74% and 63%]
BMTs eliminate significant portion of the overhead of Merkle Tree-based integrity protection (12% to 2%)
L2 Cache Miss Rate
[Figure: Normalized L2 Cache Miss Rate (0% to 60%) of AISE+MT and AISE+BMT for art, mcf, gap, apsi, swim, mesa, applu, mgrid, equake, wupwise, and AVG21]
BMTs significantly reduce cache contention & L2 miss rates
Counter Cache hits filter Merkle Tree integrity checks
Conclusions
AISE
Retains cryptographic latency-hiding ability
Compatible w/ virtual memory & shared memory IPC
Simplifies process of page swapping encrypted pages
BMTs
Retain security of standard Merkle Tree over memory
Significant reduction in performance overheads (12% to 2%) and memory storage overheads (33% to 21%)
Questions
Email: [email protected]
NC State University, Georgia Tech