Using Address Independent Seed Encryption and Bonsai Merkle Trees to Make Secure Processors OS- and Performance-Friendly

Brian Rogers, Siddhartha Chhabra, Milos Prvulovic§, Yan Solihin
NC State University, §Georgia Tech
Brian Rogers, AISE + BMT for Secure Processors, MICRO 40

Motivation
• Why is there a need for secure processors?
  • Digital Rights Management, Copy Protection, Trusted Distributed Computing
  • Software Piracy, Reverse Engineering, Data Theft
• Why are architectural mechanisms necessary?
  • Hardware attacks emerging (e.g. mod-chips, bus analyzers)
  • SW-only protection vulnerable to HW attacks
[Figure: XBOX, PS, and GC mod-chips; listed prices $59.99, $53.49, $49.69]

Secure Processor Architecture
[Figure: The secure processor (processor core, cache, crypto engine) is the trusted domain; main memory (encrypted code/data & authentication codes) is the untrusted domain]
• Private and tamper-resistant execution environment

Prior Work
• Memory Encryption
  • Counter Mode Encryption [Suh ’03], [Yang ’03]
  • Overlap decryption and memory latencies
  • System-level issues (difficult to support common features)
    • Virtual Memory
    • Shared memory-based Inter-Process Communication (IPC)
• Memory Integrity Verification
  • Merkle Tree Integrity Verification [Gassend ’03]
  • Prevents data replay attacks
  • Performance & storage overheads

Contributions
• Address Independent Seed Encryption (AISE)
  • Retains same cryptographic latency-hiding ability
  • Compatible with support for virtual memory and IPC
• Bonsai Merkle Trees (BMT)
  • New, reduced size Merkle Tree organization
  • Same protection, but lower storage & performance overheads
• Extended Merkle Tree Protection
  • Novel mechanism to protect both physical memory and the disk from tampering attacks

Outline
• Motivation & Background
• Memory Encryption
  • Overview of counter-mode encryption
  • Address Independent Seed Encryption
• Memory Integrity Verification
• Evaluation
• Conclusion

Counter Mode Encryption
[Figure: Lowest-level cache, secret key, seed, AES, and pad inside the secure chip boundary; main memory outside it]
• Security: Seed must be used only once
• Performance: Seed must be known at cache miss time
• Seed (128 bits): Padding | Block Address | Block Counter
  • Block Address: spatial uniqueness
  • Block Counter: temporal uniqueness

Problems with Address-Based Seeds
• What if seed includes Physical Address?
  • Security: Possible pad reuse between disk & memory
  • Complexity: Extra cryptographic work on page swaps
• What if seed includes Virtual Address?
  • Complexity: Storage of VAs in lowest-level on-chip cache
  • Security: Possible pad reuse between different processes
    • Prevented by including process ID in seed, but…
    • Shared-memory based IPC is difficult to support
    • OS will reuse process IDs
• Fundamental Problem: Address used for memory management purposes, not as a component for security

Possible Solution: Global Counter
• Eliminates system-level problems of address-based seeds
  • Maintain a large (64b) global counter on-chip
  • Seed == global counter value
• Larger performance and storage overheads
  • Large per-block counters do not cache well
  • Require more storage in memory

Address Independent Seed Encryption
• Use logical identifiers in seeds instead of address
• Manage logical ID per physical page, not per block
• New seed composition: Padding | Logical Page IDentifier (LPID) | Block Page Offset | Block Counter
  • LPID and Block Page Offset: spatial uniqueness
  • Block Counter: temporal uniqueness
• LPID
  • Unique value assigned to a page when allocated
  • Obtained from a 64b on-chip Global Page Counter
    • Stored in a non-volatile register
    • Overflow not an issue
  • Remains associated with page throughout its lifetime

LPID Storage
• Borrow an idea from split counter organization [Yan ’06]
• Co-store LPIDs with block counters
• Example: 4KB page size, 64B block size, 64-bit LPID, 7-bit counter per block
  • 64B counter block = 64b LPID + 64 x 7-bit block counters
• On block counter overflow: assign new LPID to block’s page & re-encrypt that page
• On-chip counter cache to enable latency-hiding

AISE Advantages
• Retains latency-hiding ability
  • Counter caching or counter prediction
• Seeds are globally unique
  • Eliminates pad reuse
• No special mechanisms for page swaps
  • Swap page of data, LPID, & block counters to/from disk
• Shared-memory IPC naturally supported
• Low memory storage overhead (1.6%)

Outline
• Background
• Memory Encryption
• Memory Integrity Verification
  • Merkle Tree overview
  • Bonsai Merkle Trees
• Evaluation
• Conclusion

Merkle Tree Integrity Verification
[Figure: Merkle Tree with a root MAC over intermediate MACs over main memory]
• 64B Block Size
• 128b Auth. Codes

Performance Optimization
• Problem: Large performance overhead
  • Verify MACs to root for every data fetch
• Optimization: Cache MACs
  • Cached MAC blocks are verified & trusted
  • Only verify up the Merkle Tree to first cached MAC
• Lower performance overhead, but…
  • Large portion of L2 cache may be occupied by MACs
  • Increase in cache capacity misses

Bonsai Merkle Trees (BMT)
• Leverage counter-mode memory encryption
• Two Observations:
  • Merkle Tree only needed to prevent replay attacks
  • Counter-mode encryption schemes maintain a counter per memory block
    • Essentially a version number
• Claim: Data blocks don’t need MT to guard against replay attacks if:
  (1) Each block is protected with a MAC
  (2) Block’s MAC computed on ciphertext & counter value
  (3) Integrity & freshness of counter values guaranteed

BMTs (Cont.)
• Why claim holds true:
  • $MAC^{old} = H_k(Ctext^{old}, Counter^{old})$
  • Attacker replays $MAC^{old}$ & $Ctext^{old}$ instead of $MAC^{fresh}$ & $Ctext^{fresh}$
  • $MAC^{old} \neq H_k(Ctext^{old}, Counter^{fresh})$
  • Processor knows $Counter^{fresh}$
• How to guarantee processor knows $Counter^{fresh}$?
  • We protect counters with Merkle Tree!
  • Significantly smaller and shallower Bonsai Merkle Tree

BMT Structure
[Figure: Standard Merkle Tree vs. Bonsai Merkle Tree. Each shows a secure root inside the secure chip boundary and, in memory, data, counters (Ctrs), and MT nodes; the Bonsai version also has data MACs.]
• Reduced memory storage overhead & L2 cache contention

Outline
• Background
• Memory Encryption
• Memory Integrity Verification
• Evaluation
• Conclusion

Simulation Setup
• SESC: a detailed, execution-driven simulator
• Three-issue, out-of-order processor
• L1 Cache: Split I&D, 32KB each, 2-way set-associative, 64B line, 2-cycle/access
• L2 Cache: Unified 1MB, 8-way set-associative, 64B line, 10-cycle/access
• Memory/Bus: 200-cycle uncontended access time, 600MHz bus
• AES/SHA-1 Engines: 80-cycle latency, 16-stage pipeline
• Counter Cache: 32KB, 16-way set-associative, 64B line; one 64b LPID and 64 7b minor counters (4KB page)
• Merkle Tree: Covers 1GB memory space, 128b MACs

Address Independent Seed Encryption
[Figure: Execution time overhead of AISE, global32, and global64 for art, gap, mcf, apsi, mesa, swim, applu, mgrid, equake, wupwise, and AVG21; y-axis 0% to 20%, with off-scale values 23%, 33%, and 39% labeled]
• AISE performs significantly better for memory-intensive applications
• AISE performance equivalent to prior counter-mode studies [Yan ’06]

Bonsai Merkle Tree
[Figure: Execution time overhead of AISE, AISE + MT, and AISE + BMT for art, gap, mcf, apsi, mesa, swim, applu, mgrid, equake, wupwise, and AVG21; y-axis 0% to 35%, with off-scale values 74% and 63% labeled]
• BMTs eliminate significant portion of the overhead of Merkle Tree-based integrity protection (12% to 2%)

L2 Cache Miss Rate
[Figure: Normalized L2 cache miss rate of AISE+MT and AISE+BMT for art, mcf, gap, apsi, swim, mesa, applu, mgrid, equake, wupwise, and AVG21; y-axis 0% to 60%]
• BMTs significantly reduce cache contention & L2 miss rates
• Counter cache hits filter Merkle Tree integrity checks

Conclusions
• AISE
  • Retains cryptographic latency-hiding ability
  • Compatible w/ virtual memory & shared memory IPC
  • Simplifies process of page swapping encrypted pages
• BMTs
  • Retain security of standard Merkle Tree over memory
  • Significant reduction in performance overheads (12% to 2%) and memory storage overheads (33% to 21%)

Questions
• Email: [email protected]
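The LPID Storage example (4KB page, 64B block, 64-bit LPID, 7-bit counter per block) can be sketched in Python as below. This is a minimal illustration, not the authors' hardware design. The class names, the bit order of the seed fields, the zero padding, and the reset of counters after an overflow are all assumptions made for the sketch. The AES step that turns a seed into a pad is left out.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096                              # bytes, from the slide example
BLOCK_SIZE = 64                               # bytes, from the slide example
BLOCKS_PER_PAGE = PAGE_SIZE // BLOCK_SIZE     # 64 blocks per page
LPID_BITS = 64
COUNTER_BITS = 7
OFFSET_BITS = 6                               # enough to index 64 blocks


class GlobalPageCounter:
    """Hypothetical stand-in for the 64b on-chip Global Page Counter."""

    def __init__(self):
        self._next = 0

    def allocate_lpid(self):
        lpid = self._next
        self._next += 1
        return lpid


@dataclass
class CounterBlock:
    """One LPID plus 64 per-block counters, co-stored for one page."""
    lpid: int
    counters: list = field(default_factory=lambda: [0] * BLOCKS_PER_PAGE)

    def size_bits(self):
        return LPID_BITS + len(self.counters) * COUNTER_BITS

    def seed(self, block_index):
        """128-bit seed: padding | LPID | page offset | counter (order assumed)."""
        value = self.lpid << (OFFSET_BITS + COUNTER_BITS)
        value |= block_index << COUNTER_BITS
        value |= self.counters[block_index]
        return value.to_bytes(16, "big")      # upper bits are zero padding

    def bump(self, block_index, page_counter):
        """Advance a block counter on write-back.

        Returns True when the counter overflowed, meaning the page got a
        new LPID and must be re-encrypted by the caller.
        """
        if self.counters[block_index] + 1 < (1 << COUNTER_BITS):
            self.counters[block_index] += 1
            return False
        self.lpid = page_counter.allocate_lpid()
        self.counters = [0] * BLOCKS_PER_PAGE  # reset is an assumption
        return True


# One 64B counter block per 4KB page gives the storage overhead.
overhead_percent = 100 * BLOCK_SIZE / PAGE_SIZE   # 1.5625
```

The counter block comes to 64 + 64 x 7 = 512 bits, which is exactly one 64B block. One such block per 4KB page is 1.5625% of the page, which matches the 1.6% storage overhead quoted on the AISE Advantages slide.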

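The replay argument from the BMT slides can be illustrated with a short Python sketch. It is a toy model under stated assumptions: the key, the truncated HMAC-SHA256 used as a 128-bit MAC, the 8-byte counter encoding, and the helper names are all hypothetical. It also recomputes the whole counter tree rather than verifying a single path as hardware would.

```python
import hashlib
import hmac

KEY = b"demo-key-not-from-the-slides"   # hypothetical on-chip secret


def mac(ctext, counter):
    """128-bit MAC over ciphertext and counter (truncated HMAC, assumed)."""
    msg = ctext + counter.to_bytes(8, "big")
    return hmac.new(KEY, msg, hashlib.sha256).digest()[:16]


def merkle_root(counters):
    """Root of a binary hash tree over the counters only.

    Assumes the number of counters is a power of two.
    """
    level = [hashlib.sha256(c.to_bytes(8, "big")).digest() for c in counters]
    while len(level) > 1:
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]


def verify_block(ctext, stored_mac, counters, index, trusted_root):
    """Check counter freshness against the on-chip root, then the data MAC."""
    if merkle_root(counters) != trusted_root:
        return False                     # counters were tampered or replayed
    expected = mac(ctext, counters[index])
    return hmac.compare_digest(stored_mac, expected)


# Block 1 is written twice; the attacker keeps the first version.
old_counters = [0, 1, 0, 0]
old_ctext, old_mac = b"A" * 64, mac(b"A" * 64, 1)

counters = [0, 2, 0, 0]
new_ctext, new_mac = b"B" * 64, mac(b"B" * 64, 2)
root = merkle_root(counters)             # kept inside the secure chip

fresh_ok = verify_block(new_ctext, new_mac, counters, 1, root)
replay_data = verify_block(old_ctext, old_mac, counters, 1, root)
replay_all = verify_block(old_ctext, old_mac, old_counters, 1, root)
```

Replaying only the old ciphertext and MAC fails the MAC check, because the processor uses the fresh counter. Replaying the old counter as well fails the tree check against the on-chip root. The tree covers only counters, not data blocks.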