The Case for the Holistic Language Runtime System

Martin Maas*, Krste Asanović*, Tim Harris†, John Kubiatowicz*
* University of California, Berkeley    † Oracle Labs

First International Workshop on Rack-scale Computing (WRSC 2014)

ABSTRACT

We anticipate that, by 2020, the unit of warehouse-scale cloud computing will be a rack-sized machine instead of an individual server. At the same time, we expect a shift from commodity hardware to custom SoCs that are specifically designed for use in warehouse-scale computing. In this paper, we make the case that the software for such custom rack-scale machines should move away from the model of running managed language workloads in separate language runtimes on top of a traditional operating system, and instead run a distributed language runtime system capable of handling different target languages and frameworks. All applications will execute within this runtime, which performs most traditional OS and cluster manager functionality, such as resource management, scheduling and isolation.

1. INTRODUCTION

We introduce Holistic Language Runtime Systems as a way to program future cloud platforms. We anticipate that over the next years, cloud providers will shift towards rack-scale machines of custom-designed SoCs, replacing the commodity parts in current data centers. At the same time, we expect a shift towards interactive workloads developed by third-party, non-systems programmers. This leads to a programming crisis as productivity programmers have to develop for increasingly complex and opaque hardware (Section 2). We believe that, as a result, the cloud will be exclusively programmed through high-level languages and frameworks. However, today's software stack is ill-suited for such a scenario (Section 3): the conventional approach of running each application in a separate runtime system leads to inefficiency and unpredictable latencies due to interference between multiple runtime instances, and prevents optimizations.

We propose to solve this problem by running all applications in a rack-wide distributed runtime system. This holistic runtime (Section 4) consists of a per-node runtime that executes all workloads within that node, and a distributed inter-node runtime that coordinates activities between them (Figure 1). The per-node runtime takes over the functionality of a traditional OS (such as resource management, scheduling and isolation), while the distributed layer coordinates activities between the different nodes (similar to a cluster manager, but with low-level control of the system). This makes it possible, for example, to coordinate garbage collection between nodes so that it does not interfere with low-latency RPCs, to share JIT results, or to transparently migrate execution and data. We discuss the merits of this model and show challenges that need to be addressed by such a system (Section 5).

Figure 1: Comparing today's software stack and holistic runtimes. Today: multiple runtime instances per node on a commodity OS, coordinated by a cluster scheduler. The Holistic Language Runtime: one per-node language runtime per node, with a distributed runtime (coordination layer) providing global coordination.

2. CLOUD COMPUTING IN 2020

Warehouse-scale computers in the cloud are becoming the backbone for a growing portion of applications. We expect this trend to continue: in this section, we identify a set of developments that, we believe, will lead to significant changes to cloud infrastructure over the next 5-10 years.

2.1 Different Cloud Hardware

As the market for IaaS and PaaS cloud services is growing, and an increasing number of organizations and developers are adopting the cloud, we expect providers to shift away from using commodity parts. While this kind of hardware has been the most economical solution for the last decade, the overheads of replicated and often unused components (such as unnecessary interfaces, motherboards or peripherals) lead to significant inefficiency in energy consumption, hardware costs and resource utilization. We expect that, due to the increasing scale of warehouse-scale computing, it will become more economical to design custom SoCs for data centers (either as a third-party offering, or designed from stock IP by the cloud provider itself). This has been facilitated by the availability of high-quality IP (such as ARM server processors) and would make it possible to produce SoCs with 10s of cores, accelerators, on-chip NICs integrated into the cache hierarchy (such as [38]) and DRAM controllers that enable remote accesses without going through a CPU.

We envision 100 to 1,000 of these SoCs to be connected in a rack-scale environment with a high-bandwidth/low-latency interconnect. Such a rack of SoCs will be the new basic unit of the data center. While the different SoCs are tightly coupled and may provide a global address space, remote caching may be limited. At the same time, I/O and bulk storage are moved to the periphery of the rack and accessed over the network. Compared with today's clusters, advantages of such a system include better energy efficiency and better predictability due to flatter interconnects enabled by the rack-scale form factor (such as a rack-level high-radix switch) and being able to integrate NICs into the SoC.


The rack-scale setup bears similarity to systems such as the HP Moonshot [4] or the AMD SeaMicro SM10000 [41], but with custom SoCs. This trend towards rack-scale systems extends to the research community: for example, an increasing amount of work is looking at scale-out architectures for memory [38] and CPUs [35]. Storage systems are changing as well, with DRAM becoming the main base for storage [45] and new low-latency non-volatile memory technologies on the horizon [19]. Research is also looking at accelerators to make use of available chip area and tackle the power wall [34, 48], and custom SoCs enable inclusion of accelerators for common cloud workloads, such as databases [34]. However, this increases the degree of heterogeneity.

→ The cloud is becoming more opaque: It is likely that details of the SoC design will be proprietary and programmers will have less knowledge about the hardware they run on. Even today it is difficult (and not desirable for portability) to fine-tune programs to the underlying hardware [31]. However, as cloud providers move to custom SoCs, it will become infeasible. This creates a programmability crisis: programmers will have to program a platform they know little about, while the complexity of the system is growing.

2.2 Different Mix of Cloud Workloads

Today, the majority of workloads running in cloud providers' data centers are developed in-house, such as Google Search or Hotmail. Developers of these workloads have access to deep knowledge about the underlying infrastructure and can fine-tune them to the platform if necessary. These workloads are often interleaved with IaaS jobs from external customers, through platforms such as AWS [1], AppEngine [3] or Azure [6]. Some cloud providers (such as Rackspace) even cater exclusively to external customers.

By 2020, we envision that these third-party workloads will make up the majority of data center workloads, just like third-party apps today make up the majority of apps on any smartphone or PC. We see a shift towards ubiquitous computing, with large numbers of apps for smartphones, watches or TVs that interact with a growing number of sensors and use the cloud as a backend. At the same time, new computation- and data-intensive interactive applications emerge, such as augmented reality, remote gaming or live translation.

This significantly changes the mix of data center workloads. Today, latency-sensitive workloads are relatively rare and can be run on their own servers, with an ample supply of batch workloads sharing the other machines. However, by 2020, interactive workloads will be in the majority, requiring low-latency response times while operating on large amounts of data. Further, there will be a much larger and more heterogeneous set of workloads running in the cloud, and the majority of them will be supplied by external developers. Like today, these workloads will require performance guarantees, tail-tolerance [22], failure tolerance (e.g. through replication) and security guarantees (such as privacy through encryption [8] and isolation between applications [43]). Furthermore, there is increasing adoption of service-oriented architectures (SOAs) to increase productivity and reduce risk. We expect that this adoption will continue, requiring cloud platforms to handle such service interfaces efficiently.

→ Radical over-provisioning will cease to be cost-effective: Cloud providers today have to radically over-provision resources. This is happening for two reasons:

First, the high degree of complexity in the system can lead to high tail latencies. This is unacceptable for interactive jobs, which hence require over-provisioning and other techniques to be tail-tolerant [22]. This approach works today, since interactive jobs make up a relatively small part of the mix. However, as the portion of interactive jobs is growing, more resources are wasted for over-provisioning and replication, which quickly becomes economically infeasible.

Second, load on the warehouse-scale system varies greatly between different times of the day, week and year, and cloud providers need to provision for peak times. Shutting off individual machines is difficult, since workloads and storage are spread across many machines and the overhead of consolidating them before shutting off may be too large.

Due to the rising cost of over-provisioning, we expect that much of it will have to be replaced by fine-grained sharing of resources – both to reduce tail latency and to make it possible to consolidate workloads on a smaller set of nodes (and switch off unused parts of a running system). This implies a need to reduce detrimental interference between jobs on a highly loaded machine, which is a challenging problem [27].

2.3 Different Group of Cloud Programmers

An increasing portion of cloud workloads is programmed by productivity programmers [20], similar to PHP webapps in 2000 and iOS apps today. A large portion of smartphone apps already use the cloud as a backend, and high-level frameworks and languages are widely used when programming for the cloud – examples include the AppEngine stack [3], Hadoop [51], DryadLINQ [55] and Spark [56]. The programming languages of choice include Java, C# and Python, which are all high-level managed languages.

The reason these frameworks and languages are popular is that, except for an expert programmer deeply familiar with the underlying system, it is prohibitively difficult to handle its complexity (e.g. distribution, replication, synchronization and parallelism) while maintaining productivity.

We expect that in 2020, the majority of data center workloads will be written by productivity programmers. Expert programmers deeply familiar with the system will exist, but will be creating a smaller portion of the workloads themselves. They will instead provide frameworks for productivity programmers to use, and program workloads for companies large enough to run their own data centers.

→ The cloud will be exclusively programmed using high-level languages and frameworks: While high-level languages and frameworks are already widely used to program the cloud, they will become the only way to develop for future cloud platforms, due to the nascent programmability gap of productivity programmers having to program complex, opaque systems. High-level managed languages bridge the gap between software and hardware, as they abstract away hardware details and have more flexibility in adapting code to the underlying platform. This makes the problem of programming increasingly complex cloud architectures tractable and facilitates moving between cloud providers. We therefore expect that cloud applications in 2020 will be almost exclusively written in high-level managed languages and use high-level frameworks for storage, parallelism, distribution and failure/tail tolerance. Instead of providing customers with machine VMs, cloud providers will expose high-level interfaces that give access to frameworks which the provider implements however it chooses (a PaaS model).


3. LANGUAGE RUNTIMES TODAY

We are already seeing the start of some of the trends identified in Section 2. In particular, high-level managed languages and frameworks are already a main way to program the cloud (Section 2.3). Arguably, the current source of this popularity is their good productivity and safety properties. However, they also give the runtime system additional freedom in tuning workloads to the underlying hardware. This property becomes increasingly important on platforms that cannot be hand-tuned for, such as the rack-scale machines we are envisioning. Specifically, the advantages of managed language workloads include the following:

• They add a level of abstraction: Managed languages hide the complexity of the underlying architecture. This is often considered a disadvantage (since it prevents some low-level tuning), but in the cloud scenario, it is necessary as hand-tuning will not be possible.

• They operate on portable intermediate code (bytecode): This allows transparent recompilation and auto-tuning to a particular architecture based on runtime performance counters (using a JIT). There are also high-level frameworks such as SEJITS [20] or Dandelion [44] that can help to program accelerators in a high-level language by using introspection into compiled programs.

• They provide automatic memory management: Explicit memory management is prone to errors, and bugs such as memory leaks or buffer overruns can be particularly damaging in a cloud scenario of long-running, security-sensitive workloads. At the same time, there is little benefit to explicit management in an opaque system – automatic memory management and garbage collection (GC) are therefore a significant advantage.

• They operate on references instead of pointers: This allows transparent migration of data and execution between different nodes and automatic forwarding. It also gives more flexibility for memory management, as it allows data to be relocated (illustrated by the sketch after this list).
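The last point can be made concrete with a small, purely illustrative Java sketch; the Handle indirection below is hypothetical and not part of any runtime described here. Because application code only ever holds references, a runtime is free to interpose such an indirection and forward it after moving the underlying data.

    // Illustrative sketch: references allow the runtime to relocate data behind
    // the application's back. Real runtimes use forwarding pointers or handle
    // tables inside the VM rather than an explicit Handle class.
    final class Handle<T> {
        private volatile T target;            // the runtime may swap this at any time
        Handle(T initial) { this.target = initial; }
        T get() { return target; }            // application code only dereferences handles
        void relocate(T copyAtNewLocation) {  // runtime-internal: copy the object, then forward
            this.target = copyAtNewLocation;
        }
    }

    public class RelocationSketch {
        public static void main(String[] args) {
            Handle<int[]> data = new Handle<>(new int[] {1, 2, 3});
            int before = data.get()[0];
            // The runtime copies the array (e.g. to another heap region or another
            // node's memory) and forwards the handle; application code is unchanged.
            data.relocate(data.get().clone());
            System.out.println(before == data.get()[0]);   // prints: true
        }
    }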
Figure 2: The current software stack – each application runs in its own runtime instance on a commodity OS (Linux) on a commodity server (CPUs, DRAM); servers are connected by a 10Gb/s Ethernet rack-level interconnect and coordinated by a cluster-level scheduler (Mesos, Omega).

Unfortunately, the way managed languages are integrated into today's software stack has inefficiencies and is a bad fit for future cloud data centers, with large (rack-scale) machines, high load and a requirement for fine-grained resource sharing. In today's deployments, applications run in isolated containers (depending on the platform, these are VMs, resource containers [14] or processes), and each container runs a separate instance of the language runtime for that application (Figure 2). This causes four problems:

The Interference Problem: As cloud workloads exhibit an increasing spectrum of resource requirements, it becomes increasingly important to be able to run a large number of different applications on the same machine (to consolidate workloads and for load balancing reasons). For parallel workloads, co-scheduling of applications on the same machine can lead to detrimental interference through scheduling [27, 39] (in addition to other sources of interference such as memory bandwidth or caches). This applies to managed language runtimes as well: for example, GC threads can get descheduled at inopportune times (and stall the application), or application servers receiving an RPC may not be scheduled when the RPC arrives. The OS cannot schedule around this because the required high-level information about the threads is not available to the OS scheduler.

In addition, when there are more runtime instances than hardware threads, it can be challenging to dispatch a request to the right instance, and the overhead of waking up that process adds latency. This makes RPC unsuitable for fine-grained communication with remote calls on the order of microseconds. However, in a service-oriented architecture, there will be a very large number of service calls with low latency requirements. Work on the hardware level tackled this for RDMA [38], but does not generalize to RPCs yet.

The Redundancy Problem: When running multiple language runtimes, a large amount of code in shared libraries is loaded and JITed multiple times (for example, OpenJDK 7 loads 379 classes for a minimal "Hello World" program). Since the JITed code is tied into the logical data structures of the runtime, page sharing between processes cannot avoid this problem. Additionally, each runtime instance runs its own service daemons such as the JIT compiler or profiler, adding further overheads. Finally, VM images often contain an entire OS with a large amount of unused libraries and code, even if only a small subset is used (which can lead to three orders of magnitude overhead in binary size [36]).
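As a rough, standalone illustration of this redundancy (not an experiment from the paper), the following Java program prints how many classes its own runtime instance has loaded just to print a line of text; every co-located runtime instance repeats this loading and JIT work independently, and the exact count depends on the JDK version.

    // Minimal sketch: observe per-instance class loading on a stock JVM.
    import java.lang.management.ClassLoadingMXBean;
    import java.lang.management.ManagementFactory;

    public class RedundancyDemo {
        public static void main(String[] args) {
            System.out.println("Hello World");
            ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
            // Each runtime instance loads (and later JIT-compiles) these classes on
            // its own, even though most of them are identical library code.
            System.out.println("Classes loaded by this instance: " + cl.getLoadedClassCount());
        }
    }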
The Composability Problem: For productivity and maintainability, applications are often broken into a large number of services – e.g. page requests to Amazon typically require over 150 service calls [23]. However, when services are shared between different runtime instances, a service call requires crossing the boundary between two processes. This prevents optimizations such as code inlining or service fusion, and adds overheads from context switches and scheduling. At the same time, simply putting all services into the same process would cause problems in current systems, such as losing failure isolation and not being able to dynamically migrate service instances between nodes.

The Elasticity Problem: Adding servers in response to increased load involves launching new VMs, which have a significant startup latency and take time to warm up (e.g. JITing the code). This makes it harder to react to varying workload demands at a fine granularity.

Taken together, these problems show that the current approach of running each application in its own runtime system appears infeasible for the cloud in 2020 (and already causes inefficiency today). We aim to solve these problems through the introduction of a Holistic Language Runtime System.


4. HOLISTIC LANGUAGE RUNTIMES

There is a sense that the changing cloud computing landscape provides an opportunity for a new approach to the software stack. Some work proposes new OS structures that span multiple nodes/cores and enable inter-node coordination such as gang-scheduling – examples include Akaros [42], Barrelfish [15] and DiOS [46]. The service-oriented model has also been adopted by several research OSs (such as fos [50] or Tessellation [21]). At the same time, OSs running on hypervisors (like Mirage [36] or OSv [7]) tackle the redundancy problem by compiling application and OS together and removing layers of indirection.

We are exploring a different approach: we believe that the problems from Section 3 can be tackled by having all jobs logically share a distributed language runtime that manages all applications in a rack-scale machine. In this model, all threads, heaps and applications are exposed to the same runtime system and live in the same address space (assuming a global, but not necessarily cache-coherent, address space). Applications are supplied as a set of (PaaS-style) services¹ that are restricted to a single coherence domain and can be replicated. The runtime system consists of two parts: a per-node runtime on each SoC that runs all workloads within that node, and a distributed runtime that spans all nodes in a distributed fashion, enabling fine-grained coordination between SoCs. We call this system the "Holistic Language Runtime System", for its holistic view of the rack (Figure 3).

Figure 3: High-level overview of a Holistic Language Runtime – a per-node runtime on each SoC (CPUs, accelerators, DRAM) runs multiple programs, while the distributed runtime coordinates GC, schedules work on available runtime instances, targets accelerators, shares JIT/profiling results, manages distributed heaps, provides failure tolerance, transparently relocates execution and data, and enforces SLAs across the high-speed interconnect.

Compared to the current model (Figure 2), the per-node runtime and the OS merge. Services become the unit of virtualization and the runtime coordinates resource allocation and device access (in particular the NIC) between them. Storage and drivers are implemented as libraries, similar to the Exokernel [24]. This proposal is supported by the fact that many traditional OS services have already been pushed out of the OS: for example, many scheduling tasks have been moved to the application level (e.g. through cluster schedulers such as Mesos [28] or Omega [47]) and many virtualization tasks have been pushed into hardware (e.g. Infiniband NICs). Within each per-node runtime, applications are protected through software-based isolation, with controlled references between sub-heaps (and potentially hardware support in the SoC, such as Mondrian-style protection [53]).

In contrast, the distributed runtime coordinates the different per-node runtime instances. This includes tasks traditionally performed by the cluster manager (such as load balancing), but also coordinated decisions such as when to run GC on a part of the heap, how to relocate or resize applications, and how to share profiling/optimization results. While the distributed runtime can make use of the fact that it can access the entire shared address space (e.g. for zero-copy transfers), each SoC is treated as a separate fault domain and the distributed runtime has to operate using traditional agreement protocols (similar to a Multikernel [15]).
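To make this kind of coordination concrete, the sketch below shows one simple way a per-node runtime could ask a rack-level coordinator for permission to collect, so that only a bounded number of nodes run GC at once and the rest stay responsive to low-latency RPCs. The GcCoordinator interface and its policy are hypothetical illustrations, not a design prescribed by this paper; a real holistic runtime would implement the coordinator with an agreement protocol across fault domains rather than a single in-memory object.

    // Hypothetical sketch of rack-level GC coordination (names and policy are
    // illustrative only).
    import java.util.concurrent.Semaphore;

    public class GcCoordinator {
        // Allow at most a fixed number of nodes to collect simultaneously, so that
        // enough nodes remain available to serve latency-critical requests.
        private final Semaphore concurrentCollections;

        public GcCoordinator(int maxConcurrentGcs) {
            this.concurrentCollections = new Semaphore(maxConcurrentGcs);
        }

        // Called by a per-node runtime before it starts a stop-the-world phase.
        public void requestGcSlot(int nodeId) throws InterruptedException {
            concurrentCollections.acquire();
        }

        // Called by the per-node runtime once its collection has finished.
        public void releaseGcSlot(int nodeId) {
            concurrentCollections.release();
        }

        public static void main(String[] args) throws InterruptedException {
            GcCoordinator coordinator = new GcCoordinator(100);  // e.g. 100 of ~1,000 SoCs
            coordinator.requestGcSlot(42);   // the per-node runtime on SoC 42 wants to collect
            // ... run the collection on node 42 ...
            coordinator.releaseGcSlot(42);
        }
    }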
This approach has a number of advantages and solves the four problems from Section 3. We believe that a holistic runtime system will enable the following features:

• Transparent migration of execution and data between nodes of the system while preserving a PaaS view. This can exploit locality in workloads, and interacts with other mechanisms such as performance guarantees.

• Solving the redundancy problem by reusing JIT results or profiling data across multiple nodes and consolidating runtime services within the rack.

• Fine-grained coordination and interleaving of parallel jobs [27], alleviating the interference problem.

• Coordinating GC between nodes (i.e. scheduling GC so that it does not interrupt critical operations, and avoiding descheduling threads in certain GC phases). A holistic runtime can also track and collect references across nodes.

• Low-latency RPCs (∼1 µs per RPC call) in the language runtime (to solve the composability problem). This is not currently possible under load and requires scheduling handlers from different applications very quickly, without the overhead of a context switch.

• Service fusion and other dynamic optimizations that become possible by having high-level information available (similar to those performed in Optimus [33]) and by sharing a JIT compiler between jobs. This can be guided by higher-level measurements and semantics as in Jockey [25] or Pacora [17], and enables auto-tuning. This also targets the composability problem.

• A unified mechanism for failure handling instead of each framework having its own. This would make it possible to coordinate failure handling between applications (such as replicating to the same racks so that everything can be migrated at once in the case of correlated failures).

• Adding resources at a much finer granularity (additional threads instead of VMs). This allows scaling much more smoothly, to solve the elasticity problem.

Most of these advantages also apply to conventional rack-scale clusters with RDMA, but benefit from tight coupling of nodes, low latencies and hardware support – the custom rack-scale machines we envision therefore pose an opportunity to exploit these advantages even better.

Holistic runtimes bear some resemblance to Singularity [30] and MVM [32] (and the debate about the precise split between OS and runtime is, in fact, much older [29]). However, in contrast to this work, we target rack-scale machines, which bring some unique challenges (Section 5). The holistic runtime is also related to past work on distributed JVMs [12, 13]. While these JVMs aim to provide a single-system view at the level of the JVM interface, we aim to provide a single-system view at the framework and service level.

¹ Here, a service is a monolithic unit that communicates only through a service interface. Services access bulk DRAM through RDMA-style transfers and communicate through RPCs (more complex patterns can be provided by libraries).
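As a purely illustrative picture of the service model in footnote 1, the sketch below shows an application that interacts with another only through a service interface obtained from the runtime; the ServiceRegistry stand-in and its methods are hypothetical. The point of the indirection is that the holistic runtime, not the application, decides whether a call becomes a local invocation, a fused or inlined call, or a cross-node RPC, and where replicas of the implementation live.

    // Illustrative sketch of the PaaS-style service model; ServiceRegistry and its
    // methods are hypothetical, not part of any existing runtime.
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    interface RecommendationService {
        // The only way other applications may interact with this service.
        List<String> recommend(String userId, int limit);
    }

    class InMemoryRecommendations implements RecommendationService {
        public List<String> recommend(String userId, int limit) {
            List<String> items = List.of("item-1", "item-2");
            return items.subList(0, Math.min(limit, items.size()));
        }
    }

    // Stand-in for the holistic runtime: callers obtain services by interface,
    // leaving the runtime free to inline, replicate or relocate implementations,
    // or to turn a call into a cross-node RPC.
    class ServiceRegistry {
        private final Map<Class<?>, Object> services = new HashMap<>();
        <T> void register(Class<T> iface, T impl) { services.put(iface, impl); }
        <T> T lookup(Class<T> iface) { return iface.cast(services.get(iface)); }
    }

    public class ServiceModelSketch {
        public static void main(String[] args) {
            ServiceRegistry runtime = new ServiceRegistry();
            runtime.register(RecommendationService.class, new InMemoryRecommendations());
            RecommendationService svc = runtime.lookup(RecommendationService.class);
            System.out.println(svc.recommend("user-7", 2));
        }
    }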


5. CHALLENGES & FUTURE WORK

There are many challenges that need to be addressed to implement a holistic runtime and make it work in a real deployment. Many of these challenges apply to conventional systems as well, but can benefit from the coordination abilities and high-level information available to a holistic runtime.

Communication: The communication model between services is crucial, since it has to enable low-latency communication within and between nodes, but also strong isolation between applications. Singularity provided this isolation through channel contracts, but incurred performance hits due to the lack of pipelining, context switch overheads and not being able to express certain asynchronous communication patterns. This problem may be solved through a more expressive contract language and hardware support for ultra-low-latency RPCs (e.g. inspired by work on active messages [49] and LRPC [16]) – low-latency access to the NIC from a managed runtime is another important challenge. At the same time, it is important to keep the programmer-facing model simple, to not harm productivity.

Garbage Collection: Objects can be spread across multiple nodes and have references between them. This may require approaches for cross-node garbage collection, a largely unexplored area. Approaches to divide the heap into isolated portions will be crucial, as a rack-wide heap would be infeasible to garbage-collect, and collection of tera- or petabyte-sized heaps is an unsolved problem. However, research has looked at garbage collection for NUMA systems [26] and DRAM-based storage systems [45]. Some of these solutions may apply to the rack-scale scenario as well.
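One naive form of the bookkeeping this requires is sketched below; this is our own illustration of the challenge, not a design from the paper, and the RemoteRef and ExportTable names are hypothetical. Each node exports objects that are referenced remotely into a table, and a local collector must treat those exports as roots until a distributed protocol determines that no remote node still holds the handle.

    // Hypothetical sketch of cross-node reference bookkeeping. Exported objects
    // act as extra GC roots for local collections; deciding when they can be
    // released is exactly the global problem a holistic runtime would have to solve.
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicLong;

    final class RemoteRef {
        final int ownerNode;   // SoC that holds the actual object
        final long handle;     // opaque handle, only valid on the owner node
        RemoteRef(int ownerNode, long handle) { this.ownerNode = ownerNode; this.handle = handle; }
    }

    final class ExportTable {
        private final AtomicLong nextHandle = new AtomicLong();
        private final Map<Long, Object> exported = new ConcurrentHashMap<>();

        // Export a local object so another node can refer to it; from now on the
        // object is pinned as a root for purely local collections.
        RemoteRef export(int localNode, Object obj) {
            long h = nextHandle.incrementAndGet();
            exported.put(h, obj);
            return new RemoteRef(localNode, h);
        }

        // Called once a distributed protocol (e.g. reference counting or a global
        // trace) has established that no remote node holds the handle any more.
        void release(long handle) { exported.remove(handle); }

        Iterable<Object> extraGcRoots() { return exported.values(); }
    }

    public class CrossNodeGcSketch {
        public static void main(String[] args) {
            ExportTable table = new ExportTable();
            RemoteRef ref = table.export(3, new byte[1024]);  // node 3 exports a buffer
            System.out.println("node " + ref.ownerNode + " exported handle " + ref.handle);
            table.release(ref.handle);                        // remote holders are gone
        }
    }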
Fault isolation and life-cycle support: It is essential that failures in a single application, a hardware component (e.g. an SoC) or a runtime instance do not bring down the entire rack-scale machine. In addition, it must be possible to update parts of the system without shutting down the entire rack (a rack is a much larger unit than an individual server, and shutting it down incurs a higher cost). These problems are already being recognized in Java: they were summarized in JSR-121 [5] and implemented in the Multi-tasking VM project [32] and commercial products [9]. They could be addressed by the holistic runtime through, e.g., isolated per-application data structures and language-level containers.

Security and isolation: A holistic runtime needs to provide security guarantees. While applications live in the same address space, they need to be logically isolated, potentially with hardware support to harden software-based security. This challenge already exists in data centers today: just like today's cloud servers, the rack-level machine will be shared between multiple entities, and it is important to avoid side-channels between tenants and to encrypt all data in memory. Performance interference is important as well, to prevent applications from impacting one another's performance (either maliciously – such as through a DoS attack – or through failure).

Extensibility: Expert programmers should be able to implement new optimizations and auto-tuners, potentially using some knowledge of the hardware. This is required to allow the runtime to make optimal use of the platform it is running on. Projects like Graal [54] can help with this by exposing low-level runtime functionality to frameworks. As projects such as the Jikes RVM [10] show, writing low-level systems frameworks is possible in high-level, safe languages.

Backward-compatibility: It must be possible to run existing frameworks (such as Naiad [37] or Cassandra [2]) and applications on top of the system with minimal changes. The compatibility layer is the language runtime (e.g. Java bytecode and the class library). However, how to best structure this interface is an open question, as is integrating other (e.g. per-tenant) runtimes, similar to the goal of Mesos [28]. For backward-compatibility, it is necessary to emulate a vast amount of existing functionality in the class libraries, which use syscalls and OS functionality. One approach is to have an instance of a legacy OS running in a container to handle functionality not handled by the holistic language runtime (this is the approach taken in Libra [11]). An alternative is to handle it with a library OS such as Drawbridge [40].

Support for different languages: Since the holistic language runtime will run most workloads on the rack-scale machine, it must support all languages that people will use in their applications. Precedents for this type of system exist: Microsoft already uses a cross-language approach with their CLR, and projects like Truffle [52] can help to support additional languages in a runtime. Frameworks like MMTk [18] can support GC in a language-independent way.

Tail-tolerance: Dean and Barroso define tail-tolerance as "[creating] a predictably responsive whole out of less predictable parts" [22]. Tail-tolerance is crucial for cloud workloads, particularly interactive ones. A holistic runtime needs to be able to support techniques for providing tail-tolerance, such as hedging requests or selective replication.
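As one concrete example of such a technique, the sketch below implements a hedged request with standard Java futures: the backup copy of the request is delayed by a short hedge interval and the first response wins. This is a generic illustration of hedged requests [22] under our own assumptions, not an interface defined by the paper; a production version would also cancel the losing request rather than merely ignore it.

    // Generic sketch of a hedged request: delay the backup copy of a request by a
    // small hedge interval and take whichever replica answers first. The losing
    // call is ignored here rather than cancelled, to keep the sketch short.
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.TimeUnit;
    import java.util.function.Supplier;

    public class HedgedRequest {
        static <T> CompletableFuture<T> hedged(Supplier<T> primary, Supplier<T> backup, long hedgeDelayMillis) {
            CompletableFuture<T> first = CompletableFuture.supplyAsync(primary);
            CompletableFuture<T> second = CompletableFuture.supplyAsync(
                    backup, CompletableFuture.delayedExecutor(hedgeDelayMillis, TimeUnit.MILLISECONDS));
            return first.applyToEither(second, response -> response);
        }

        public static void main(String[] args) {
            CompletableFuture<String> reply = hedged(
                    () -> call("replica-A", 50),   // primary happens to be slow (50 ms)
                    () -> call("replica-B", 5),    // backup only starts after the hedge delay
                    10);                           // hedge after 10 ms
            System.out.println(reply.join());      // with these delays, replica-B wins
        }

        private static String call(String replica, long millis) {
            try { Thread.sleep(millis); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            return "response from " + replica;
        }
    }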
Performance Guarantees: The system needs to be able to provide probabilistic performance guarantees to services. However, providing such guarantees for managed language workloads (in a way that is compositional) is difficult, in particular in the presence of a large portion of interactive workloads. One question is how guarantees are to be expressed. For example, they could be provided as a service agreement when an application enters the system (similar to the Tessellation OS [21]) or specified as high-level application goals as in Pacora [17]. There are many aspects to enforcing these guarantees, such as handling interference between applications while taking GC, JIT compilation and other hard-to-predict aspects into account. There also needs to be a way to respond to unpredicted interference (e.g. in the cache hierarchy), such as dynamically relocating jobs [57].

Resource management: The runtime needs to make decisions about when and where to migrate data or execution. This is similar to the classic resource allocation problem, as well as to cluster-level scheduling [47]. However, trade-offs are different in a rack-scale machine, where migration to a different SoC may be cheaper than in a traditional cluster. The runtime also has to colocate data and computation to reduce communication. Another challenge is how to handle applications that span more than a single rack – these cases still require the equivalent of a cluster-level scheduler that interacts with the holistic runtimes within each rack.

In conclusion, we make the case that future warehouse-scale cloud computing will be performed on rack-scale machines and that the traditional software stack will be unsuitable for them. We instead propose to run cloud workloads within a holistic language runtime. We are investigating such runtimes on both the software and hardware level and are planning to implement a JVM-based prototype of a holistic runtime system.

Acknowledgements

We would like to thank Scott Beamer, Juan Colmenares, Kim Keeton, Margo Seltzer and the anonymous reviewers for their feedback on earlier drafts of this paper. We would also like to thank Philip Reames and Mario Wolczko for discussions about managed language runtime systems. Additional thanks go to the FireBox group at UC Berkeley for discussion and input on the future of rack-scale platforms.

References

[1] "Amazon Web Services." [Online]. Available: http://aws.amazon.com/
[2] "The Apache Cassandra project." [Online]. Available: http://cassandra.apache.org/
[3] "Google App Engine: Platform as a Service."
[4] "HP Moonshot system." [Online]. Available: www.hp.com/go/moonshot
[5] "JSR-000121 Application Isolation API specification." [Online]. Available: https://jcp.org/aboutJava/communityprocess/final/jsr121/
[6] "Microsoft Windows Azure." [Online]. Available: http://www.windowsazure.com/
[7] "OSv: designed for the cloud." [Online]. Available: http://osv.io/
[8] Software Guard Extensions Programming Reference. [Online]. Available: http://software.intel.com/sites/default/files/329298-001.pdf
[9] "Waratek - cloud virtualization and Java specialists." [Online]. Available: http://www.waratek.com/
[10] B. Alpern, S. Augart, S. Blackburn, M. Butrico, A. Cocchi, P. Cheng, J. Dolby, S. Fink, D. Grove, M. Hind, K. McKinley, M. Mergen, J. E. B. Moss, T. Ngo, V. Sarkar, and M. Trapp, "The Jikes Research Virtual Machine project: Building an open-source research community," IBM Systems Journal, vol. 44, no. 2, 2005.
[11] G. Ammons, J. Appavoo, M. Butrico, D. Da Silva, D. Grove, K. Kawachiya, O. Krieger, B. Rosenburg, E. Van Hensbergen, and R. W. Wisniewski, "Libra: A library operating system for a JVM in a virtualized execution environment," in Proceedings of the 3rd International Conference on Virtual Execution Environments, ser. VEE '07, 2007.
[12] J. Andersson, S. Weber, E. Cecchet, C. Jensen, and V. Cahill, "Kaffemik - a distributed JVM featuring a single address space architecture," in Proceedings of the 2001 Symposium on JavaTM Virtual Machine Research and Technology Symposium - Volume 1, ser. JVM'01, 2001.
[13] Y. Aridor, M. Factor, and A. Teperman, "cJVM: a single system image of a JVM on a cluster," in Parallel Processing, 1999. Proceedings. 1999 International Conference on, 1999.
[14] G. Banga, P. Druschel, and J. C. Mogul, "Resource containers: A new facility for resource management in server systems," in Proceedings of the Third Symposium on Operating Systems Design and Implementation, ser. OSDI '99, 1999.
[15] A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania, "The multikernel: A new OS architecture for scalable multicore systems," in Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, ser. SOSP '09, 2009.
[16] B. N. Bershad, T. E. Anderson, E. D. Lazowska, and H. M. Levy, "Lightweight remote procedure call," ACM Trans. Comput. Syst., vol. 8, no. 1, Feb. 1990.
[17] S. L. Bird and B. J. Smith, "PACORA: Performance aware convex optimization for resource allocation," in Proceedings of the 3rd USENIX Workshop on Hot Topics in Parallelism (HotPar: Posters), 2011.
[18] S. M. Blackburn, P. Cheng, and K. S. McKinley, "Oil and water? High performance garbage collection in Java with MMTk," in Proceedings of the 26th International Conference on Software Engineering, ser. ICSE '04, 2004.
[19] G. Burr, B. Kurdi, J. Scott, C. Lam, K. Gopalakrishnan, and R. S. Shenoy, "Overview of candidate device technologies for storage-class memory," IBM Journal of Research and Development, vol. 52, no. 4.5, Jul. 2008.
[20] B. Catanzaro, S. Kamil, Y. Lee, K. Asanović, J. Demmel, K. Keutzer, J. Shalf, K. Yelick, and A. Fox, "SEJITS: Getting productivity and performance with selective embedded JIT specialization," Programming Models for Emerging Architectures, 2009.
[21] J. A. Colmenares, G. Eads, S. Hofmeyr, S. Bird, M. Moretó, D. Chou, B. Gluzman, E. Roman, D. B. Bartolini, N. Mor, K. Asanović, and J. D. Kubiatowicz, "Tessellation: Refactoring the OS around explicit resource containers with continuous adaptation," in Proceedings of the 50th Annual Design Automation Conference, ser. DAC '13, 2013.
[22] J. Dean and L. A. Barroso, "The tail at scale," Commun. ACM, vol. 56, no. 2, Feb. 2013.
[23] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, "Dynamo: Amazon's highly available key-value store," in Proceedings of Twenty-first ACM SIGOPS Symposium on Operating Systems Principles, ser. SOSP '07, 2007.
[24] D. R. Engler, M. F. Kaashoek, and J. O'Toole, Jr., "Exokernel: An operating system architecture for application-level resource management," in Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles, ser. SOSP '95, 1995.
[25] A. D. Ferguson, P. Bodik, S. Kandula, E. Boutin, and R. Fonseca, "Jockey: Guaranteed job latency in data parallel clusters," in Proceedings of the 7th ACM European Conference on Computer Systems, ser. EuroSys '12, 2012.
[26] L. Gidra, G. Thomas, J. Sopena, and M. Shapiro, "A study of the scalability of stop-the-world garbage collectors on multicores," in Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS '13, 2013.
[27] T. Harris, M. Maas, and V. J. Marathe, "Callisto: Co-scheduling parallel runtime systems," in Proceedings of the 9th ACM European Conference on Computer Systems, ser. EuroSys '14, 2014.
[28] B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker, and I. Stoica, "Mesos: A platform for fine-grained resource sharing in the data center," in Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation, ser. NSDI'11, 2011.
[29] J. Howell and M. Montague, "Hey, you got your language in my operating system!" Dartmouth College, Tech. Rep., 1998.
[30] G. C. Hunt and J. R. Larus, "Singularity: Rethinking the software stack," SIGOPS Oper. Syst. Rev., vol. 41, no. 2, Apr. 2007.
[31] K. Jackson, L. Ramakrishnan, K. Muriki, S. Canon, S. Cholia, J. Shalf, H. J. Wasserman, and N. Wright, "Performance analysis of high performance computing applications on the Amazon Web Services cloud," in Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on, Nov. 2010.
[32] M. Jordan, L. Daynès, G. Czajkowski, M. Jarzab, and C. Bryce, "Scaling J2EE application servers with the multi-tasking virtual machine," Sun Microsystems, Inc., Tech. Rep., 2004.
[33] Q. Ke, M. Isard, and Y. Yu, "Optimus: A dynamic rewriting framework for data-parallel execution plans," in Proceedings of the 8th ACM European Conference on Computer Systems, ser. EuroSys '13, 2013.
[34] O. Kocberber, B. Grot, J. Picorel, B. Falsafi, K. Lim, and P. Ranganathan, "Meet the walkers: Accelerating index traversals for in-memory databases," in Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013.
[35] P. Lotfi-Kamran, B. Grot, M. Ferdman, S. Volos, O. Kocberber, J. Picorel, A. Adileh, D. Jevdjic, S. Idgunji, E. Ozer, and B. Falsafi, "Scale-out processors," in Proceedings of the 39th Annual International Symposium on Computer Architecture, ser. ISCA '12, 2012.
[36] A. Madhavapeddy, R. Mortier, C. Rotsos, D. Scott, B. Singh, T. Gazagnaire, S. Smith, S. Hand, and J. Crowcroft, "Unikernels: Library operating systems for the cloud," in Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS '13, 2013.
[37] D. G. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, and M. Abadi, "Naiad: A timely dataflow system," in Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, ser. SOSP '13, 2013.
[38] S. Novakovic, A. Daglis, E. Bugnion, B. Falsafi, and B. Grot, "Scale-out NUMA," in Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, 2014.
[39] H. Pan, B. Hindman, and K. Asanović, "Composing parallel software efficiently with Lithe," in Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI '10, 2010.
[40] D. E. Porter, S. Boyd-Wickizer, J. Howell, R. Olinsky, and G. C. Hunt, "Rethinking the library OS from the top down," in Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS XVI, 2011.
[41] A. Rao, "System overview for the SM10000 family," Jul. 2011. [Online]. Available: http://www.seamicro.com/sites/default/files/TO2 SM10000 System Overview.pdf
[42] B. Rhoden, K. Klues, D. Zhu, and E. Brewer, "Improving per-node efficiency in the datacenter with new OS abstractions," in Proceedings of the 2nd ACM Symposium on Cloud Computing, ser. SOCC '11, 2011.
[43] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage, "Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds," in Proceedings of the 16th ACM Conference on Computer and Communications Security, ser. CCS '09, 2009.
[44] C. J. Rossbach, Y. Yu, J. Currey, J.-P. Martin, and D. Fetterly, "Dandelion: A compiler and runtime for heterogeneous systems," in Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, ser. SOSP '13, 2013.
[45] S. M. Rumble, A. Kejriwal, and J. Ousterhout, "Log-structured memory for DRAM-based storage," in Proceedings of the 12th USENIX Conference on File and Storage Technologies, 2014.
[46] M. Schwarzkopf, M. P. Grosvenor, and S. Hand, "New wine in old skins: The case for distributed operating systems in the data center," in Proceedings of the 4th Asia-Pacific Workshop on Systems, ser. APSys '13, 2013.
[47] M. Schwarzkopf, A. Konwinski, M. Abd-El-Malek, and J. Wilkes, "Omega: Flexible, scalable schedulers for large compute clusters," in Proceedings of the 8th ACM European Conference on Computer Systems, ser. EuroSys '13, 2013.
[48] G. Venkatesh, J. Sampson, N. Goulding, S. Garcia, V. Bryksin, J. Lugo-Martinez, S. Swanson, and M. B. Taylor, "Conservation cores: Reducing the energy of mature computations," in Proceedings of the Fifteenth Edition of ASPLOS on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS XV, 2010.
[49] T. von Eicken, D. E. Culler, S. C. Goldstein, and K. E. Schauser, "Active messages: A mechanism for integrated communication and computation," in Proceedings of the 19th Annual International Symposium on Computer Architecture, ser. ISCA '92, 1992.
[50] D. Wentzlaff, C. Gruenwald III, N. Beckmann, K. Modzelewski, A. Belay, L. Youseff, J. Miller, and A. Agarwal, "An operating system for multicore and clouds: Mechanisms and implementation," in Proceedings of the 1st ACM Symposium on Cloud Computing, ser. SoCC '10, 2010.
[51] T. White, Hadoop: The Definitive Guide, 2009.
[52] C. Wimmer and T. Würthinger, "Truffle: A self-optimizing runtime system," in Proceedings of the 3rd Annual Conference on Systems, Programming, and Applications: Software for Humanity, ser. SPLASH '12, 2012.
[53] E. Witchel, J. Cates, and K. Asanović, "Mondrian memory protection," in Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS X, 2002.
[54] T. Würthinger, "Extending the Graal compiler to optimize libraries (demonstration)," in Proceedings of the ACM International Conference Companion on Object Oriented Programming Systems Languages and Applications Companion, ser. SPLASH '11, 2011, pp. 41–42.
[55] Y. Yu, M. Isard, D. Fetterly, M. Budiu, Ú. Erlingsson, P. K. Gunda, and J. Currey, "DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language," in Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, ser. OSDI'08, 2008.
[56] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, "Spark: Cluster computing with working sets," in Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, ser. HotCloud'10, 2010.
[57] X. Zhang, E. Tune, R. Hagmann, R. Jnagal, V. Gokhale, and J. Wilkes, "CPI2: CPU performance isolation for shared compute clusters," in Proceedings of the 8th ACM European Conference on Computer Systems, ser. EuroSys '13, 2013.
