The Case for the Holistic Language Runtime System


Programming FireBox with Managed Languages
Martin Maas ([email protected]). Advisors: Krste Asanovic, John Kubiatowicz. In collaboration with Tim Harris, Oracle Labs. Thanks are owed to Mario Wolczko (Oracle Labs) and Philip Reames (Azul Systems) for early feedback.
ASPIRE Laboratory (Algorithms and Specializers for Provably-optimal Implementations with Resilience and Efficiency)

[Poster figures: today's stack of commodity servers (CPUs, DRAM, 10Gb/s Ethernet rack-level interconnect) runs one runtime instance (JIT, GC, profiling, threads, objects) per application on a commodity OS (Linux), coordinated by a cluster-level scheduler (Mesos, Omega) alongside storage (SQL/NoSQL), failure tolerance, and parallel computation/dataflow frameworks. The proposed alternative is a rack of ~1,000 custom SoCs (CPUs, vector cores, crypto and other accelerators, on-chip NIC, RAM handling remote DRAM requests) on a high-speed interconnect, with disaggregated bulk non-volatile memory that is globally accessible. A distributed runtime coordinates GC, schedules work on available runtime instances, targets accelerators, shares JIT/profiling results, provides failure tolerance and transparent relocation of execution and data, manages distributed heaps, and enforces SLAs.]

First International Workshop on Rack-scale Computing (WRSC 2014)
Martin Maas, Krste Asanovic, John Kubiatowicz (University of California, Berkeley); Tim Harris (Oracle Labs)

ABSTRACT

We anticipate that, by 2020, the basic unit of warehouse-scale cloud computing will be a rack-sized machine instead of an individual server. At the same time, we expect a shift from commodity hardware to custom SoCs that are specifically designed for use in warehouse-scale computing. In this paper, we make the case that the software for such custom rack-scale machines should move away from the model of running managed language workloads in separate language runtimes on top of a traditional operating system and instead run a distributed language runtime system capable of handling different target languages and frameworks. All applications will execute within this runtime, which performs most traditional OS and cluster-manager functionality such as resource management, scheduling and isolation.

Figure 1: Comparing today's software stack and holistic runtimes. Today: multiple runtime instances per node (each with its own JIT, GC, profiling, threads and objects, on a commodity OS), coordinated by a cluster scheduler. The holistic runtime system: one per-node language runtime per SoC (CPUs, accelerators, DRAM, high-speed interconnect), with a distributed runtime (coordination layer) providing global coordination.

1. INTRODUCTION

We introduce Holistic Language Runtime Systems as a way to program future cloud platforms. We anticipate that over the next years, cloud providers will shift towards rack-scale machines built from custom-designed SoCs, replacing the commodity parts in current data centers. At the same time, we expect a shift towards interactive workloads developed by third-party, non-systems programmers. This leads to a programming crisis, as productivity programmers have to develop for increasingly complex and opaque hardware (Section 2). We believe that, as a result, the cloud will be exclusively programmed through high-level languages and frameworks. However, today's software stack is ill-suited for such a scenario (Section 3): the conventional approach of running each application in a separate runtime system leads to inefficiency and unpredictable latencies due to interference between multiple runtime instances, and prevents optimizations.

We propose to solve this problem by running all applications in a rack-wide distributed runtime system. This holistic runtime (Section 4) consists of a per-node runtime that executes all workloads within that node, and a distributed inter-node runtime that coordinates activities between them (Figure 1). The per-node runtime takes over the functionality of a traditional OS (such as resource management, scheduling and isolation), while the distributed layer coordinates activities between the different nodes (similar to a cluster manager, but with low-level control of the system). This makes it possible, for example, to coordinate garbage collection between nodes so that it does not interfere with low-latency RPCs, to share JIT results, or to transparently migrate execution and data.

We discuss the merits of this model and show challenges that need to be addressed by such a system (Section 5).

2. CLOUD COMPUTING IN 2020

Warehouse-scale computers in the cloud are becoming the backbone for a growing portion of applications. We expect this trend to continue: in this section, we identify a set of developments that, we believe, will lead to significant changes to cloud infrastructure over the next 5-10 years.

2.1 Different Cloud Hardware

As the market for IaaS and PaaS cloud services grows, and an increasing number of organizations and developers adopt the cloud, we expect providers to shift away from using commodity parts. While this kind of hardware has been the most economical solution for the last decade, the overheads of replicated and often unused components (such as unnecessary interfaces, motherboards or peripherals) lead to significant inefficiency in energy consumption, hardware costs and resource utilization. We expect that, due to the increasing scale of warehouse-scale computing, it will become more economical to design custom SoCs for data centers (either as a third-party offering, or designed from stock IP by the cloud provider itself). This has been facilitated by the availability of high-quality IP (such as ARM server processors) and would make it possible to produce SoCs with tens of cores, accelerators, on-chip NICs integrated into the cache hierarchy (such as [38]), and DRAM controllers that enable remote accesses without going through a CPU.

We envision 100 to 1,000 of these SoCs being connected in a rack-scale environment with a high-bandwidth/low-latency interconnect. Such a rack of SoCs will be the new basic unit of the data center. While the different SoCs are tightly coupled and may provide a global address space, remote caching may be limited. At the same time, I/O and bulk storage are moved to the periphery of the rack and accessed over the network. Compared with today's clusters, advantages of such a system include better energy efficiency and better predictability, due to flatter interconnects enabled by the rack-scale form factor (such as a rack-level high-radix switch) and the ability to integrate NICs into the SoC.

The rack-scale setup bears similarity to systems such as the HP Moonshot [4] or the AMD SeaMicro SM10000 [41], but with custom SoCs. The trend towards rack-scale systems extends to the research community: for example, an increasing amount of work is looking at scale-out architectures for memory [38] and CPUs [35]. Storage systems are changing as well, with DRAM becoming the main basis for storage [45] and new low-latency non-volatile memory technologies on the horizon [19]. Research is also looking at accelerators to make use of available chip area and tackle the power wall [34, 48], and custom SoCs enable the inclusion of accelerators for common cloud workloads, such as databases [34]. However, this increases the degree of heterogeneity.

The cloud is becoming more opaque: It is likely that details of the SoC design will be proprietary and programmers will have less knowledge about the hardware they run on. Even today, it is difficult (and not desirable for portability) to fine-tune programs to the underlying hardware [31]. However, as cloud providers move to custom SoCs, it will become infeasible. This creates a programmability crisis: programmers will have to program a platform they know little about.

First, the high degree of complexity in the system can lead to high tail latencies. This is unacceptable for interactive jobs, which hence require over-provisioning and other techniques to be tail-tolerant [22]. This approach works today, since interactive jobs make up a relatively small part of the mix. However, as the portion of interactive jobs grows, more resources are wasted on over-provisioning and replication, which quickly becomes economically infeasible.

Second, load on the warehouse-scale system varies greatly between different times of the day, week and year, and cloud providers need to provision for peak times. Shutting off individual machines is difficult, since workloads and storage are spread across many machines and the overhead of consolidating them before shutting off may be too large.

Due to the rising cost of over-provisioning, we expect that much of it will have to be replaced by fine-grained sharing of resources – both to reduce tail latency and to make it possible to consolidate workloads on a smaller set of nodes (and switch off unused parts of a running system). This implies a need to reduce detrimental interference between jobs on a highly loaded machine, which is a challenging problem [27].
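The rack-wide GC coordination sketched in the introduction can be made concrete with a toy admission policy: the distributed layer grants a node permission to pause for collection only if no replica peer of any service it hosts is currently paused, so low-latency RPCs always have a pause-free replica to reach. This is only a minimal sketch of the idea; the class and method names (GcCoordinator, request_gc, finish_gc) are hypothetical, not an API from the paper.

```python
# Hypothetical sketch of holistic-runtime GC coordination: stagger GC pauses
# so no two replicas of the same service collect at the same time.

class GcCoordinator:
    def __init__(self, replica_sets):
        # replica_sets: service name -> set of node ids hosting its replicas
        self.replica_sets = replica_sets
        self.collecting = set()  # node ids currently paused for GC

    def request_gc(self, node):
        """Grant a GC slot unless a replica peer is already collecting."""
        for nodes in self.replica_sets.values():
            if node in nodes and self.collecting & (nodes - {node}):
                return False  # a peer is paused; this node must retry later
        self.collecting.add(node)
        return True

    def finish_gc(self, node):
        self.collecting.discard(node)


coord = GcCoordinator({"kv-store": {0, 1, 2}, "frontend": {2, 3}})
assert coord.request_gc(0)        # node 0 may collect
assert not coord.request_gc(1)    # node 1 must wait: replica peer 0 is paused
assert coord.request_gc(3)        # node 3 shares no service with node 0
coord.finish_gc(0)
assert coord.request_gc(1)        # now node 1 may collect
```

A real coordinator would also have to handle nodes that cannot defer collection (e.g. when the heap is nearly exhausted), which is one of the challenges such a system must address.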
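One of the tail-tolerant techniques alluded to above [22] is the hedged request: issue a request to one replica and, if it has not answered within a hedge delay, issue a duplicate to a second replica and take whichever answers first. The helper below is an illustrative sketch, not an implementation from the paper, and the latencies are made up.

```python
# Sketch of hedged requests, a tail-tolerance technique: a duplicate request
# is sent to a backup replica once the primary exceeds a hedge delay.

def hedged_latency(primary_ms, backup_ms, hedge_after_ms):
    """Completion time when a backup request is sent after hedge_after_ms."""
    if primary_ms <= hedge_after_ms:
        return primary_ms  # primary answered in time; no hedge was sent
    # Otherwise the backup starts at hedge_after_ms; take the earlier finish.
    return min(primary_ms, hedge_after_ms + backup_ms)

# A primary stuck in a 200 ms tail (e.g. a GC pause) is rescued by a fast backup:
assert hedged_latency(primary_ms=200, backup_ms=12, hedge_after_ms=20) == 32
# A fast primary never triggers the hedge:
assert hedged_latency(primary_ms=8, backup_ms=12, hedge_after_ms=20) == 8
```

The cost of this technique is exactly the replication and over-provisioning the text describes: every hedged request consumes capacity on a second node, which is why it becomes expensive as the share of interactive jobs grows.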
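The consolidation goal in the closing paragraph can be sketched as a bin-packing problem: pack jobs onto as few nodes as possible so the remainder of the rack can be powered down. The first-fit-decreasing policy and the capacities below are illustrative assumptions; the paper proposes the goal, not this particular policy.

```python
# Illustrative first-fit-decreasing packing: consolidate jobs onto few nodes
# so unused parts of a running system can be switched off.

def consolidate(jobs, node_capacity):
    """Return per-node job lists; jobs maps job name -> resource demand."""
    nodes = []  # each entry: [remaining_capacity, [job names]]
    for name, demand in sorted(jobs.items(), key=lambda kv: -kv[1]):
        for node in nodes:
            if node[0] >= demand:       # first node with enough headroom
                node[0] -= demand
                node[1].append(name)
                break
        else:                           # no node fits: power on a new one
            nodes.append([node_capacity - demand, [name]])
    return [n[1] for n in nodes]

jobs = {"web": 0.5, "db": 0.7, "batch": 0.3, "cache": 0.4}
placement = consolidate(jobs, node_capacity=1.0)
assert len(placement) == 2  # four jobs fit on two nodes; the rest can power off
```

In practice the hard part is not the packing itself but the interference it induces: co-locating jobs on a highly loaded machine is precisely the challenging problem [27] the section ends on.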