Logical Partitions on Many-core Processors
Ramya Masti, Claudio Marforio, Kari Kostiainen, Claudio Soriente, Srdjan Capkun (ETH Zurich)
ACSAC 2015

Infrastructure as a Service (IaaS)
[Figure: apps on per-user OSes and hardware (personal platforms) vs. VMs consolidated on shared cloud hardware]
Personal platforms vs. a shared cloud platform (IaaS):
- Economies of scale
- Shared resources
- Security problems
Resource Partitioning
[Figure: VMs with dedicated hardware resources]
Cloud platform with dedicated resources:
- Guaranteed performance
- Secure against shared-memory/CPU attacks
- Example: IBM cloud
Resource Partitioning
Logical partition: a subset of a system's resources that can run an OS/VM.
[Figure: processor (cores C0-C3) and memory divided between two VMs]
Every logical partition needs dedicated resources.
Many logical partitions therefore need platforms with lots of resources!
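The requirement that each partition gets dedicated resources can be illustrated with a toy allocator that hands out disjoint core sets and memory ranges. This is a sketch for intuition only; the names and granularity are invented and do not reflect the paper's mechanism.

```python
# Toy model: a logical partition = a disjoint set of cores + a dedicated memory range.
class Partitioner:
    def __init__(self, num_cores, mem_size):
        self.free_cores = set(range(num_cores))
        self.next_free_addr = 0
        self.mem_size = mem_size

    def create_partition(self, cores_needed, mem_needed):
        """Reserve cores and a memory range exclusively for one partition."""
        if cores_needed > len(self.free_cores) or \
           self.next_free_addr + mem_needed > self.mem_size:
            raise RuntimeError("not enough dedicated resources left")
        cores = {self.free_cores.pop() for _ in range(cores_needed)}
        base = self.next_free_addr
        self.next_free_addr += mem_needed
        return {"cores": cores, "mem": (base, base + mem_needed)}

# Two partitions on a 4-core platform with 1 GB of memory.
p = Partitioner(num_cores=4, mem_size=1 << 30)
vm1 = p.create_partition(2, 256 << 20)
vm2 = p.create_partition(2, 256 << 20)
```

Because resources are never shared, the platform runs out quickly, which is exactly why many logical partitions need a platform with many resources.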
Many-core Processors for Logical Partitioning
[Figure: many-core processor (many simple cores with caches) hosting many VMs]
Many simple cores could host many logical partitions.
But many-core processors are designed for workloads that share data (no isolation!).
Can many-core architectures support scalable logical partitioning?
Hypervisor Design Alternatives
Three alternatives: traditional, distributed, and centralized.
[Figure: traditional — one hypervisor beneath all VMs; distributed — a hypervisor instance on every core beneath its VM; centralized — the hypervisor on dedicated cores, VMs alone on the others (cores C0..Cn)]
Run-time VM-hypervisor interaction:
- Increases the hypervisor's attack surface
- Causes performance overhead
Centralized Hypervisors Today
Every core supports virtualization:
- Hypervisor and VM modes (higher and lower privilege)
- Privileged instructions and memory addressing
- Enables memory confinement
But every core runs in only one mode, so there is unused virtualization functionality in every core.
Goal: solutions without processor virtualization.
Intel Single-chip Cloud Computer (SCC)
[Figure: SCC block diagram and tile architecture — cores, caches, memory controllers, message-passing buffer, network interface, tiles on a network on chip (NoC)]
Sources: Howard et al., IEEE JSSC, Jan. 2011; "The Future of Many-core Computing", Tim Mattson, Intel Labs
Intel SCC Architecture
[Figure: SCC tile — cores and a network interface with Look-Up Tables (LUTs), attached to the network on chip]
- Asymmetric processor: each core can run an independent OS
- Simple cores: no virtualization support
- Look-Up Tables (LUTs) determine the resources available to a core
- Resources: other tiles, RAM, and I/O
All system resources are memory mapped.
Intel SCC Address Translation
[Figure: two-stage translation]
1. The core's Memory Management Unit maps a virtual address (e.g., 0x87256199) to a physical address (e.g., 0x72008127).
2. The network interface's Look-Up Tables (LUTs) map the physical address to a system-wide address.
The system-wide address targets either on-tile resources (LUT registers, on-tile I/O, on-tile memory) or off-tile resources (DRAM, memory at another tile).
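The LUT stage of the translation can be sketched as follows. The entry format (destination tile plus a system-address prefix) and the 16 MB segment granularity are simplifying assumptions for illustration, not the exact SCC register layout.

```python
# Illustrative sketch of the SCC's second translation stage (entry format simplified).
LUT_ENTRIES = 256   # top 8 bits of the 32-bit core physical address index the LUT
SEGMENT_BITS = 24   # each entry then covers a 16 MB segment

def lut_translate(lut, phys_addr):
    """Map a core-local 32-bit physical address to a system-wide address."""
    index = (phys_addr >> SEGMENT_BITS) & 0xFF
    entry = lut[index]
    if entry is None:
        raise MemoryError("segment not mapped for this core")
    dest_tile, sys_prefix = entry
    offset = phys_addr & ((1 << SEGMENT_BITS) - 1)
    return dest_tile, (sys_prefix << SEGMENT_BITS) | offset

# Example: map core-physical segment 0x72 to a DRAM region behind tile (0, 5).
lut = [None] * LUT_ENTRIES
lut[0x72] = ((0, 5), 0x1A2)
tile, sys_addr = lut_translate(lut, 0x72008127)
```

The key property this models: a core can only reach resources for which its LUT holds an entry, so the LUT contents define the core's view of the system.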
Problem: Lack of Isolation
[Figure: any core on a tile can rewrite its own network-interface LUTs and reach all off-tile memory (DRAM)]
Every core can change its own memory-access configuration!
Solution Intuition
[Figure: a designated master core writes the LUTs on every tile; LUT writes from all other cores are blocked]
The master core configures memory confinement (the LUTs) for all cores.
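A minimal sketch of the master-core rule, assuming a software model in which every LUT write carries the requesting core's ID. The class and method names are invented for illustration; the paper realizes this check in the NoC hardware (emulated in software).

```python
# Sketch: only a designated master core may reconfigure LUTs.
MASTER_CORE = 0

class LUTGuard:
    """Reject LUT writes from any core other than the master."""
    def __init__(self, num_entries=256):
        self.entries = [None] * num_entries

    def write_entry(self, requesting_core, index, entry):
        if requesting_core != MASTER_CORE:
            raise PermissionError("only the master core may reconfigure LUTs")
        self.entries[index] = entry

guard = LUTGuard()
guard.write_entry(0, 0x72, ("tile(0,5)", 0x1A2))  # master core: allowed
```

With writes restricted this way, a VM core can no longer widen its own memory view, which is what restores isolation between partitions.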
Contributions
1. A hardware change to the Intel SCC that enables logical partitioning: isolation on the Network on Chip (emulated in software)
2. A custom hypervisor with a small TCB and attack surface
3. A cloud (IaaS) architecture: implementation and evaluation
Cloud Architecture (IaaS)
[Figure: computing node — Intel SCC running the hypervisor and VMs, a crypto engine (HSM), and HW-virtualized peripherals — connected to the management service]
Management service: user management, storage, and VM management services.
Operation Example: VM Startup
[Figure: sequence between the helper core, hypervisor (master core), VM core, storage service, and crypto engine; VM images are encrypted and installed in the storage service beforehand]
1. Boot
2. Start
3. Fetch the (encrypted) VM image from the storage service
4. Decrypt the VM image via the crypto engine
5. Assign a core and configure its LUTs
6. Start the VM
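The fetch-decrypt-assign-start steps can be sketched end to end. Every class here is an illustrative stub standing in for the storage service, the crypto engine (HSM), and the master-core hypervisor; none of these names or interfaces come from the paper.

```python
# Sketch of VM startup (steps 3-6); all components are illustrative stubs.
class Storage:
    def __init__(self, images): self.images = images
    def fetch(self, vm_id): return self.images[vm_id]           # 3. fetch encrypted VM

class CryptoEngine:
    def decrypt(self, blob): return blob[len("enc:"):]          # 4. decrypt (stub cipher)

class MasterCore:
    def __init__(self): self.free_cores = [1, 2, 3]; self.luts = {}
    def assign_free_core(self): return self.free_cores.pop(0)   # 5. pick a dedicated core
    def configure_luts(self, core, region): self.luts[core] = region  # confine its memory
    def start_core(self, core, image): return (core, image)     # 6. start the VM

def start_vm(vm_id, storage, crypto, master, region):
    image = crypto.decrypt(storage.fetch(vm_id))
    core = master.assign_free_core()
    master.configure_luts(core, region)
    return master.start_core(core, image)

master = MasterCore()
core, image = start_vm("vm1", Storage({"vm1": "enc:kernel"}),
                       CryptoEngine(), master, region=(0, 16 << 20))
```

Note that the LUTs are configured before the VM core starts, so the VM never runs with an unconfined memory view.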
Recap of Main Properties
- No processor virtualization: beneficial for a high number of cores
- Small TCB: the hypervisor implementation is 3.4K LOC; tolerates compromise of other cloud components
- Reduced VM interaction: small attack surface; no virtualization overhead
Comparison

                           No Processor     Small   Reduced VM
                           Virtualization   TCB     Interaction
Xen (>100K LOC)                  ✓            ✗          ✗
HypeBIOS (4K LOC)                ✗            ✓          ✗
NoHype (>100K LOC)               ✗            ✗          ✓
Our solution (3.4K LOC)          ✓            ✓          ✓

Even merging HypeBIOS and NoHype would still rely on processor virtualization.
Summary
Can a many-core processor host a hypervisor and many isolated VMs? Yes, with minor modifications.
Intel SCC case study: isolation on the Network on Chip (NoC).
Thank you!
Ramya Masti, Claudio Marforio, Kari Kostiainen, Claudio Soriente, Srdjan Capkun