
Using Address Independent Seed and Bonsai Merkle Trees to Make Secure Processors OS- and Performance-Friendly

Brian Rogers, Siddhartha Chhabra, Milos Prvulovic§, Yan Solihin
NC State University, §Georgia Tech

Motivation

• Why is there a need for secure processors?
    - Digital rights management, copy protection, trusted distributed computing, software piracy, reverse engineering, data theft

• Why are architectural mechanisms necessary?
    - Hardware attacks are emerging (e.g., mod-chips, bus analyzers)
    - Software-only protection is vulnerable to hardware attacks

[Figure: mod-chips on sale: XBOX mod-chip ($59.99), PS mod-chip ($53.49), GC mod-chip ($49.69)]

Brian Rogers, AISE + BMT for Secure Processors, MICRO 40

Secure Processor Architecture

[Figure: secure processor architecture. The secure processor chip (processor core, cache, crypto engine) is the trusted domain; main memory, which holds encrypted data and authentication codes, is in the untrusted domain.]

• Private and tamper-resistant execution environment

Prior Work

• Memory Encryption
    - Counter-mode encryption [Suh ’03], [Yang ’03]
        - Overlaps decryption and memory latencies
        - System-level issues (difficult to support common features):
          virtual memory, shared memory-based inter-process communication (IPC)

• Memory Integrity Verification
    - Merkle Tree integrity verification [Gassend ’03]
        - Prevents data replay attacks
        - Performance & storage overheads

Contributions

• Address Independent Seed Encryption (AISE)
    - Retains the same cryptographic latency-hiding ability
    - Compatible with support for virtual memory and IPC

• Bonsai Merkle Trees (BMT)
    - New, reduced-size Merkle Tree organization
    - Same protection, but lower storage & performance overheads

• Extended Merkle Tree Protection
    - Novel mechanism to protect both physical memory and the disk from tampering attacks

Outline

• Motivation & Background
• Memory Encryption
    - Overview of counter-mode encryption
    - Address Independent Seed Encryption
• Memory Integrity Verification
• Evaluation
• Conclusion

Counter Mode Encryption

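[Figure: counter-mode encryption datapath. Inside the secure chip boundary, AES turns the seed and the secret key into a pad, which is applied to blocks moving between the lowest-level cache and main memory.]

• Security: seed must be used only once
• Performance: seed must be known at cache miss time

128-bit seed = Padding | Block Address | Block Counter
    - Block address: spatial uniqueness
    - Block counter: temporal uniqueness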
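
The seed layout and pad generation above can be sketched as follows. This is a minimal illustration, not the hardware design: the field widths inside the 128-bit seed are assumed, and HMAC-SHA256 from the Python standard library stands in for the AES engine. The names `make_seed`, `make_pad`, and `xor_block` are hypothetical.

```python
import hashlib
import hmac

BLOCK_BYTES = 64  # cache block size used in the slides


def make_seed(block_address: int, block_counter: int) -> bytes:
    """Build a 128-bit seed: padding | block address | block counter.

    The 32/64/32-bit split is an assumption made for illustration.
    """
    assert 0 <= block_address < 2**64 and 0 <= block_counter < 2**32
    padding = bytes(4)
    return padding + block_address.to_bytes(8, "big") + block_counter.to_bytes(4, "big")


def make_pad(secret_key: bytes, seed: bytes) -> bytes:
    """Derive a 64-byte pad from the seed alone (no data needed).

    Stand-in PRF: HMAC-SHA256 replaces the AES engine of the slides.
    """
    chunks = [
        hmac.new(secret_key, seed + bytes([i]), hashlib.sha256).digest()
        for i in range(BLOCK_BYTES // 32)
    ]
    return b"".join(chunks)


def xor_block(data: bytes, pad: bytes) -> bytes:
    """Encrypt or decrypt a block: counter mode uses the same XOR for both."""
    assert len(data) == len(pad) == BLOCK_BYTES
    return bytes(d ^ p for d, p in zip(data, pad))


key = b"k" * 16
plaintext = bytes(range(BLOCK_BYTES))
seed = make_seed(block_address=0x1000, block_counter=7)
ciphertext = xor_block(plaintext, make_pad(key, seed))
recovered = xor_block(ciphertext, make_pad(key, seed))
```

Because `make_pad` depends only on the seed, the pad can be computed while the ciphertext is still in flight from memory, which is the latency overlap the slide refers to.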

Problems with Address-Based Seeds

• What if the seed includes the physical address?
    - Security: possible pad reuse between disk & memory
    - Complexity: extra cryptographic work on page swaps
• What if the seed includes the virtual address?
    - Complexity: VAs must be stored in the lowest-level on-chip cache
    - Security: possible pad reuse between different processes
    - Prevented by including the process ID in the seed, but…
        - Shared-memory based IPC is difficult to support
        - The OS will reuse process IDs

• Fundamental problem: addresses are used for memory management purposes, not as a component for security
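
To make the pad-reuse risk concrete, here is a small sketch of one way it could arise with physical-address seeds. The scenario is an assumption chosen for illustration: a page is swapped to disk still encrypted, its frame is reused for another page, and the per-block counter restarts at the same value. HMAC-SHA256 again stands in for AES, and all names are hypothetical.

```python
import hashlib
import hmac


def pad_from_physical_seed(key: bytes, physical_address: int, counter: int) -> bytes:
    """32-byte pad from a seed made of physical address + counter (AES stand-in)."""
    seed = physical_address.to_bytes(8, "big") + counter.to_bytes(8, "big")
    return hmac.new(key, seed, hashlib.sha256).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


key = b"on-chip secret key"
frame_address = 0x4000

secret_a = b"page A contents".ljust(32, b".")
secret_b = b"page B contents".ljust(32, b".")

# Page A lives in the frame, is encrypted with counter 1, then swapped to disk.
cipher_a_on_disk = xor(secret_a, pad_from_physical_seed(key, frame_address, 1))

# The frame is reused for page B and the counter starts again at 1 (assumed),
# so the seed, and therefore the pad, is identical.
cipher_b_in_memory = xor(secret_b, pad_from_physical_seed(key, frame_address, 1))

# An attacker who sees both ciphertexts cancels the pad without knowing the key.
leaked = xor(cipher_a_on_disk, cipher_b_in_memory)
```

`leaked` equals the XOR of the two plaintexts, so knowing or guessing one page reveals the other.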

Possible Solution – Global Counter

• Eliminates the system-level problems of address-based seeds
• Maintain a large (64b) global counter on-chip
• Seed == global counter value
• Larger performance and storage overheads
    - Large per-block counters do not cache well
    - Require more storage in memory

Address Independent Seed Encryption

• Use logical identifiers in seeds instead of addresses
• Manage one logical ID per physical page, not per block
• New seed composition:

Padding | Logical Page IDentifier (LPID) | Block's Page Offset | Block Counter
    - LPID + page offset: spatial uniqueness
    - Block counter: temporal uniqueness

• LPID
    - Unique value assigned to a page when it is allocated
    - Obtained from a 64b on-chip Global Page Counter
        - Stored in a non-volatile register
        - Overflow is not an issue
    - Remains associated with the page throughout its lifetime
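
The AISE seed can be sketched as a simple bit-packing routine. The 64-bit LPID and 7-bit block counter widths come from the LPID storage example in the slides, and the 6-bit page offset follows from 64 blocks per 4KB page; the field order, the zero padding, and the names `GlobalPageCounter` and `aise_seed` are assumptions for illustration.

```python
PAGE_BYTES = 4096
BLOCK_BYTES = 64
BLOCKS_PER_PAGE = PAGE_BYTES // BLOCK_BYTES  # 64 blocks, so a 6-bit page offset


class GlobalPageCounter:
    """Models the 64-bit on-chip Global Page Counter that hands out LPIDs."""

    def __init__(self, start: int = 0) -> None:
        self.value = start

    def next_lpid(self) -> int:
        if self.value >= 2**64:
            raise OverflowError("Global Page Counter exhausted")
        lpid = self.value
        self.value += 1
        return lpid


def aise_seed(lpid: int, block_in_page: int, block_counter: int) -> bytes:
    """Pack padding | LPID (64b) | page offset (6b) | block counter (7b) into 128 bits."""
    assert 0 <= lpid < 2**64
    assert 0 <= block_in_page < BLOCKS_PER_PAGE
    assert 0 <= block_counter < 2**7
    packed = (lpid << 13) | (block_in_page << 7) | block_counter
    return packed.to_bytes(16, "big")  # unused high bits act as zero padding


gpc = GlobalPageCounter()
page_a = gpc.next_lpid()  # LPID assigned when page A is allocated
page_b = gpc.next_lpid()  # a different page always gets a different LPID

seeds = {
    aise_seed(lpid, block, counter)
    for lpid in (page_a, page_b)
    for block in range(BLOCKS_PER_PAGE)
    for counter in range(4)
}
```

Note that `aise_seed` takes no virtual or physical address at all, so moving the page to another frame, swapping it to disk, or mapping it into a second process leaves every seed unchanged.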

LPID Storage

• Borrow an idea from the split counter organization [Yan ’06]
    - Co-store LPIDs with block counters
• Example: 4KB page size, 64B block size, 64-bit LPID, 7-bit counter per block

64B counter block = 64b LPID | 64 x 7-bit block counters

• On block counter overflow:
    - Assign a new LPID to the block’s page & re-encrypt that page
• On-chip counter cache to enable latency-hiding
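
The 64B counter block from the example can be sketched as below: 64 bits of LPID plus 64 x 7 bits of counters is exactly 512 bits. The bit order inside the block, the reset of all counters to zero after an overflow, and the names `CounterBlock`, `pack`, `unpack`, and `bump` are assumptions; the slides only state that the page gets a new LPID and is re-encrypted.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, List

BLOCKS_PER_PAGE = 64
COUNTER_BITS = 7
COUNTER_MAX = 2**COUNTER_BITS - 1  # 127


@dataclass
class CounterBlock:
    lpid: int
    counters: List[int] = field(default_factory=lambda: [0] * BLOCKS_PER_PAGE)

    def pack(self) -> bytes:
        """Serialize to 64 bytes: LPID in the high 64 bits, then the counters."""
        value = self.lpid
        for counter in self.counters:
            value = (value << COUNTER_BITS) | counter
        return value.to_bytes(64, "big")

    @classmethod
    def unpack(cls, raw: bytes) -> "CounterBlock":
        value = int.from_bytes(raw, "big")
        counters = []
        for _ in range(BLOCKS_PER_PAGE):
            counters.append(value & COUNTER_MAX)
            value >>= COUNTER_BITS
        counters.reverse()  # counters were read last-to-first
        return cls(lpid=value, counters=counters)

    def bump(self, block_index: int, new_lpid: Callable[[], int]) -> bool:
        """Advance a block's counter on write-back.

        Returns True when the counter overflowed, meaning the caller must
        re-encrypt the whole page under the newly assigned LPID.
        """
        if self.counters[block_index] < COUNTER_MAX:
            self.counters[block_index] += 1
            return False
        self.lpid = new_lpid()
        self.counters = [0] * BLOCKS_PER_PAGE  # assumed reset policy
        return True


lpid_source = itertools.count(start=100)  # stand-in for the Global Page Counter
block = CounterBlock(lpid=next(lpid_source))
overflowed = [block.bump(3, lambda: next(lpid_source)) for _ in range(128)]
```

One 64-byte counter block per 4KB page is 64/4096, or about 1.6% of memory.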

AISE Advantages

• Retains latency-hiding ability
    - Counter caching or counter prediction
• Seeds are globally unique
    - Eliminates pad reuse
• No special mechanisms for page swaps
    - Swap the page of data, LPID, & block counters to/from disk
• Shared-memory IPC naturally supported
• Low memory storage overhead (1.6%)

Outline

• Background
• Memory Encryption
• Memory Integrity Verification
    - Merkle Tree overview
    - Bonsai Merkle Trees
• Evaluation
• Conclusion

Merkle Tree Integrity Verification

[Figure: Merkle Tree over main memory (64B block size, 128b authentication codes). MACs of the memory blocks form the leaves, intermediate MACs form the inner levels, and a single root MAC sits at the top.]
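
A Merkle Tree of this shape can be sketched as follows. With 64B blocks and 128b MACs, one tree node holds four child MACs, so the sketch uses an arity of 4. HMAC-SHA256 truncated to 16 bytes is a stand-in for the hardware MAC, and `build_tree` and `verify_block` are hypothetical names; only the root is assumed to be held on-chip.

```python
import hashlib
import hmac
from typing import List

BLOCK_BYTES = 64
MAC_BYTES = 16                      # 128b authentication codes
ARITY = BLOCK_BYTES // MAC_BYTES    # 4 child MACs fit in one 64B node


def mac(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()[:MAC_BYTES]


def build_tree(key: bytes, blocks: List[bytes]) -> List[List[bytes]]:
    """Return all levels: levels[0] are leaf MACs, levels[-1] is [root MAC]."""
    level = [mac(key, block) for block in blocks]
    levels = [level]
    while len(level) > 1:
        level = [
            mac(key, b"".join(level[i:i + ARITY]))
            for i in range(0, len(level), ARITY)
        ]
        levels.append(level)
    return levels


def verify_block(key: bytes, block: bytes, index: int,
                 levels: List[List[bytes]], trusted_root: bytes) -> bool:
    """Check one fetched block against every MAC on its path to the root."""
    current = mac(key, block)
    for level in levels[:-1]:
        if level[index] != current:          # stored MAC disagrees
            return False
        start = index - index % ARITY
        current = mac(key, b"".join(level[start:start + ARITY]))
        index //= ARITY
    return hmac.compare_digest(current, trusted_root)


key = b"processor secret"
old_blocks = [bytes([i]) * BLOCK_BYTES for i in range(16)]
old_levels = build_tree(key, old_blocks)

# The processor later overwrites block 5; tree and on-chip root are updated.
new_blocks = list(old_blocks)
new_blocks[5] = b"\xff" * BLOCK_BYTES
levels = build_tree(key, new_blocks)
root_on_chip = levels[-1][0]

# Replay attack: restore the old block and its old leaf MAC in memory.
levels[0][5] = old_levels[0][5]
replay_accepted = verify_block(key, old_blocks[5], 5, levels, root_on_chip)
```

The replayed pair is internally consistent, but the parent MAC recomputed from it no longer matches the tree under the on-chip root, so `replay_accepted` is `False`.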

Performance Optimization

• Problem: large performance overhead
    - Verify MACs up to the root for every data fetch
• Optimization: cache MACs
    - Cached MAC blocks are verified & trusted
    - Only verify up the Merkle Tree to the first cached MAC
    - Lower performance overhead, but…
        - A large portion of the L2 cache may be occupied by MACs
        - Increase in cache capacity misses
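
The effect of caching MACs can be illustrated by counting MAC computations per fetch. This sketch simplifies the real design: it caches individual MACs rather than 64B MAC blocks, models the on-chip cache as a dictionary, and uses HMAC-SHA256 truncated to 16 bytes as a stand-in MAC. The name `verify_with_cache` and the cache keying by `(level, index)` are assumptions.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

ARITY = 4
MAC_BYTES = 16


def mac(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()[:MAC_BYTES]


def build_tree(key: bytes, blocks: List[bytes]) -> List[List[bytes]]:
    level = [mac(key, block) for block in blocks]
    levels = [level]
    while len(level) > 1:
        level = [mac(key, b"".join(level[i:i + ARITY]))
                 for i in range(0, len(level), ARITY)]
        levels.append(level)
    return levels


def verify_with_cache(key: bytes, block: bytes, index: int,
                      levels: List[List[bytes]],
                      cache: Dict[Tuple[int, int], bytes]) -> Tuple[bool, int]:
    """Walk up the tree, stopping at the first MAC held in the trusted cache.

    Returns (verified, number of MAC computations performed).
    """
    assert (len(levels) - 1, 0) in cache, "the root must always be on-chip"
    current = mac(key, block)
    computed = 1
    for depth, level in enumerate(levels):
        trusted = cache.get((depth, index))
        if trusted is not None:              # first cached MAC: stop here
            return hmac.compare_digest(current, trusted), computed
        if level[index] != current:
            return False, computed
        start = index - index % ARITY
        current = mac(key, b"".join(level[start:start + ARITY]))
        computed += 1
        index //= ARITY
    return False, computed  # not reached while the root is cached


key = b"processor secret"
blocks = [bytes([i]) * 64 for i in range(64)]
levels = build_tree(key, blocks)              # level sizes: 64, 16, 4, 1

cache = {(3, 0): levels[3][0]}                # only the root is trusted
cold = verify_with_cache(key, blocks[5], 5, levels, cache)

cache[(1, 1)] = levels[1][1]                  # parent of blocks 4..7 now cached
warm = verify_with_cache(key, blocks[5], 5, levels, cache)
```

With only the root cached, the walk for block 5 performs four MAC computations; once its level-1 ancestor is cached, it performs two. The price, as the slide notes, is that cached MACs compete with data for cache space.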

Bonsai Merkle Trees (BMT)

• Leverage counter-mode memory encryption
• Two observations:
    - A Merkle Tree is only needed to prevent replay attacks
    - Counter-mode encryption schemes maintain a counter per memory block
        - Essentially a version number
• Claim: data blocks don’t need a Merkle Tree to guard against replay attacks if:
    (1) Each block is protected with a MAC
    (2) The block’s MAC is computed over its data (ciphertext) & counter value
    (3) Integrity & freshness of counter values are guaranteed

BMTs (Cont.)

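• Why the claim holds true:
    - $\mathrm{MAC}^{old} = H_k(\mathrm{Ctext}^{old}, \mathrm{Counter}^{old})$
    - Attacker replays $\mathrm{MAC}^{old}$ & $\mathrm{Ctext}^{old}$ instead of $\mathrm{MAC}^{fresh}$ & $\mathrm{Ctext}^{fresh}$
    - $\mathrm{MAC}^{old} \neq H_k(\mathrm{Ctext}^{old}, \mathrm{Counter}^{fresh})$
        - Processor knows $\mathrm{Counter}^{fresh}$
• How to guarantee the processor knows $\mathrm{Counter}^{fresh}$?
    - We protect the counters with a Merkle Tree!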

Significantly smaller and shallower Bonsai Merkle Tree
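
The replay argument can be checked with a few lines of code. This is a sketch under stated assumptions: HMAC-SHA256 truncated to 128 bits plays the role of the keyed hash H_k, the counter is encoded as 8 bytes, and `block_mac` and `check_block` are hypothetical names. The fresh counter is assumed to be trustworthy, which is exactly what the Bonsai Merkle Tree over the counters provides.

```python
import hashlib
import hmac


def block_mac(key: bytes, ciphertext: bytes, counter: int) -> bytes:
    """Per-block MAC over the ciphertext and its counter (version number)."""
    message = counter.to_bytes(8, "big") + ciphertext
    return hmac.new(key, message, hashlib.sha256).digest()[:16]


def check_block(key: bytes, ciphertext: bytes, stored_mac: bytes,
                fresh_counter: int) -> bool:
    """Verify a fetched block using the counter the processor trusts."""
    expected = block_mac(key, ciphertext, fresh_counter)
    return hmac.compare_digest(expected, stored_mac)


key = b"processor secret"

# Old version of the block, written when its counter was 6.
ctext_old, counter_old = b"old ciphertext".ljust(64, b"\0"), 6
mac_old = block_mac(key, ctext_old, counter_old)

# Current version, written after the counter advanced to 7.
ctext_fresh, counter_fresh = b"fresh ciphertext".ljust(64, b"\0"), 7
mac_fresh = block_mac(key, ctext_fresh, counter_fresh)

honest_fetch_ok = check_block(key, ctext_fresh, mac_fresh, counter_fresh)
replay_ok = check_block(key, ctext_old, mac_old, counter_fresh)
```

The honest fetch verifies, while the replayed old ciphertext and old MAC fail because the MAC is recomputed with the fresh counter.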

BMT Structure

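[Figure: Standard Merkle Tree vs. Bonsai Merkle Tree. In both, the secure root is kept inside the secure chip boundary. The standard Merkle Tree covers the data and counters (Ctrs) with MT nodes; the Bonsai Merkle Tree covers only the counters, with the data protected by data MACs.]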

• Reduced memory storage overhead & L2 cache contention

Outline

• Background
• Memory Encryption
• Memory Integrity Verification
• Evaluation
• Conclusion

Simulation Setup

• SESC: a detailed, execution-driven simulator

Processor          Three-issue, out-of-order
L1 Cache           Split I&D, 32KB each, 2-way set-associative, 64B line, 2-cycle/access
L2 Cache           Unified 1MB, 8-way set-associative, 64B line, 10-cycle/access
Memory/Bus         200-cycle uncontended access time, 600MHz bus
AES/SHA-1 Engines  80-cycle latency, 16-stage pipeline
Counter Cache      32KB, 16-way set-associative, 64B line;
                   one 64b LPID + 64 7b minor counters per line (4KB page)
Merkle Tree        Covers 1GB memory space, 128b MACs

Address Independent Seed Encryption

[Figure: execution time overhead (axis 0% to 20%) of AISE, global32, and global64 for art, gap, mcf, apsi, mesa, swim, applu, mgrid, equake, wupwise, and AVG21; off-scale bars are labeled 23%, 33%, and 39%.]

• AISE performs significantly better than the global counter schemes for memory-intensive applications
• AISE performance is equivalent to prior counter-mode studies [Yan ’06]

Bonsai Merkle Tree

[Figure: execution time overhead (axis 0% to 30%) of AISE, AISE + MT, and AISE + BMT for art, gap, mcf, apsi, mesa, swim, applu, mgrid, equake, wupwise, and AVG21; off-scale bars are labeled 74%, 35%, and 63%.]

• BMTs eliminate a significant portion of the overhead of Merkle Tree-based integrity protection (12% to 2%)

L2 Cache Miss Rate

[Figure: normalized L2 cache miss rate (axis 0% to 60%) of AISE+MT and AISE+BMT for art, mcf, gap, apsi, swim, mesa, applu, mgrid, equake, wupwise, and AVG21.]

• BMTs significantly reduce cache contention & L2 miss rates
• Counter cache hits filter Merkle Tree integrity checks

Conclusions

• AISE
    - Retains cryptographic latency-hiding ability
    - Compatible with virtual memory & shared-memory IPC
    - Simplifies the process of swapping encrypted pages

• BMTs
    - Retain the security of a standard Merkle Tree over memory
    - Significant reduction in performance overheads (12% to 2%) and memory storage overheads (33% to 21%)

Questions

Email: [email protected]
