Configurable Fine-Grain Protection for Multicore Processor Virtualization

David Wentzlaff (1), Christopher J. Jackson (2), Patrick Griffin (3), and Anant Agarwal (2)
(1) Princeton University, (2) Tilera Corp., (3) Google Inc.

Abstract

Multicore architectures, with their abundant on-chip resources, are effectively collections of systems-on-a-chip. The protection system for these architectures must support multiple concurrently executing operating systems (OSes) with different needs, and manage and protect the hardware's novel communication mechanisms and hardware features. Traditional protection systems are insufficient; they protect supervisor from user code, but typically do not protect one system from another, and only support fixed assignment of resources to protection levels. In this paper, we propose an alternative to traditional protection systems which we call configurable fine-grain protection (CFP). CFP enables the dynamic assignment of in-core resources to protection levels. We investigate how CFP enables different system software stacks to utilize the same configurable protection hardware, and how differing OSes can execute at the same time on a multicore processor with CFP. As an illustration, we describe an implementation of CFP in a commercial multicore, the TILE64 processor.

1. Introduction

Multicore computing systems have changed the face of what is achievable on a single chip. Instead of a chip being a unit in a larger system, or a single system being realized on a chip, one chip can now contain a collection of systems. Each of these systems can execute an independent operating system (OS) concurrently within a single multicore processor. One example of this is the integration of a server farm on a chip. Another is a multicore cell phone where the GUI processor executes a full-featured OS, while the baseband processor executes a very thin real-time OS. The ability to integrate multiple systems on a single chip levies new requirements on protection systems that manage and restrict access to multicore resources.

When dealing with collections of systems integrated on one chip, traditional protection systems break down. This is because traditional protection systems are concerned with protecting supervisory code from user code, but do not address peer-to-peer protection, that is, protecting one system from another system. Traditional protection systems are concerned with temporally protecting resources, while in multicore systems, a protection system must both temporally protect and spatially isolate access to resources. Spatial isolation is the need to isolate different system software stacks concurrently executing on spatially disparate cores in a multicore system. Spatial isolation is especially important now that multicore systems have directly accessible networks connecting cores to other cores and cores to I/O devices.

The spatial aspect of multicore systems, mixed with the increase of silicon area afforded to multicore designers, has fueled a resurgence of architectural feature innovation. Examples include in-core DMA engines, directly accessible networks, and co-processor accelerators. It is desirable that these new features not be available to arbitrary code, but rather that they can be restricted, virtualized, and multiplexed between multiple OSes executing on a single chip.

Multicore computing chips are a fabric for desktop OSes, embedded OSes, real-time OSes, and very thin runtimes. While full-featured OSes share similar protection requirements, the diverse set of multicore OSes shares fewer commonalities. Therefore, multicore processors require configurability in their protection systems to be able to execute a diverse set of OSes. This configurability becomes even more important when executing a real-time system where emulation can detrimentally affect real-time performance.

We introduce Configurable Fine-Grain Protection (CFP) as a mechanism to control access to different on-chip resources. CFP is a unified hardware mechanism which enables dynamic modification of the protection level required to access a particular hardware resource. In CFP, the protection level needed to access each protectable hardware resource is individually configurable per resource. Resources can be grouped by similar function into protection groups. CFP is configured dynamically by software, which lies in stark contrast to the traditional fixed assignment of features to protection levels. The ability to change a hardware resource's protection level allows differing protection models to be created within a single chip. CFP is simple to implement in hardware as it only requires a few bits per resource group being protected. The protection level needed to access a particular hardware resource or set of resources can be set in CFP much in the same way that a knob can be turned. Figure 1 demonstrates how the protection level needed to access four different hardware resources (TLB access, DMA engine access, user network access, and I/O network access) can each be set individually. In this example, user code can access the user network, the operating system can access the DMA engine, and the TLB and I/O network are reserved for use by the hypervisor.

[Figure 1: four panels labeled TLB Access, DMA Engine, "User" Network, and I/O Network, each showing levels 0 to 3. Key: 0 = User Code, 1 = OS, 2 = Hypervisor, 3 = Hypervisor Debugger.]
Figure 1. With CFP, system software can dynamically set the privilege level needed to access each fine-grain processor resource.

We illustrate these concepts through the implementation of CFP in the Tilera TILE64 processor, a homogeneous general-purpose multicore with 64 cores. This processor contains all of the protection features to protect cores from each other, and to run spatially and/or temporally multiplexed operating systems in a safe protected environment; it protects 48 groups of resources using CFP. We discuss the implementation overhead of CFP and the unique system software opportunities that it affords.

2. Protection Challenges

In this section, we begin with an overview of the requirements that multicore puts on protection systems. We discuss different resources that need to be protected and challenges which are introduced specifically by integrating many cores onto one piece of silicon. This section then describes previous protection systems and places Configurable Fine-Grain Protection (CFP) in the space of possible protection systems. Finally, we categorize how each of the different protection systems addresses the challenges posed by multicore.

2.1. Requirements on Protection Systems

The requirements in building a fully self-virtualizable [19] multicore processor architecture extend the requirements placed onto protection systems by uniprocessor system software. First, the system needs to have differing classes of software executing on it and some manner to restrict access to resources to certain classes of software. There also needs to be a way for these different classes of software to safely communicate. Safe communication means passing information between different classes of software without allowing the privileges of one class to be passed to another.

Uniprocessor systems are primarily concerned with protecting access to memory, I/O devices, in-processor state, and special instructions which modify processor state. Multiprocessor systems have extended these uniprocessor resources that need to be protected. For instance, multicore architectures have seen the integration of in-core performance counters, debug registers, cryptography units (e.g., Niagara 2 [16]), and DMA engines (e.g., TILE64 [27]).

2.1.1. Spatial Isolation

Multicore and multiprocessor systems have extended uniprocessor protection requirements even further due to the need to integrate multiple system software stacks on a single system. One example of this can be seen in the server consolidation space where multiple machines are aggregated and put into one physical computer or chip. Software virtualization [26] and hybrid hardware-software schemes [9] have been used to address this challenge.

Multicores and multiprocessors such as TRIPS [20], Raw [24], the Transputer [28], the J machine [18], and Alewife [2] have introduced architecturally-exposed, direct communication channels. Several of these architectures go so far as to map communication networks into the processor register space, thereby increasing performance of network communication. This introduces the challenge of spatial isolation. Spatial isolation is the need to isolate different system software stacks executing on different portions of a multiprocessor or multicore system. What makes this unique is that memory protection alone is not sufficient to restrict different cores communicating via direct messaging networks. For many of these systems, the spatial nature of the communication is important; for example, if cores are connected in a 2D grid, then the location of communicating processes is critical. Directly accessible networks may also be used to connect to I/O devices. Finally, the destination for messages on these networks is not always known at message insertion time; Transputer channels and the Raw static network have this property. For these reasons, spatial isolation needs to be able to cut a multicore chip into multiple isolatable domains or regions.

2.1.2. Diverse Fine-Grain Resource Control

Multicore systems are opening up diverse new fields which are taxing traditional protection systems. This is because multicore systems enable the integration of not only systems on a single chip, but systems of systems which were never envisioned to share one piece of silicon. For example, on a multicore cell phone, different OSes may run concurrently on a single chip: an application processor may run Android Linux while the baseband processor is executing an embedded OS. The Linux instance wants all of the protections afforded by modern protection systems. However, the baseband processor may not; it may want its memory to be protected from the Linux instance, while still receiving direct access to on-chip networks and I/O devices.
