Architectural Support for Javascript Type-Checking on Mobile Processors

Checked Load: Architectural Support for JavaScript Type-Checking on Mobile Processors Owen Anderson Emily Fortuna Luis Ceze Susan Eggers Computer Science and Engineering, University of Washington http://sampa.cs.washington.edu Abstract applications, including Google Maps, Twitter, and Face- book would not be feasible without both high-throughput Dynamic languages such as Javascript are the de-facto and low-latency JavaScript virtual machines on the client. standard for web applications. However, generating effi- At the same time, innovations in mobile device pro- cient code for dynamically-typed languages is a challenge, grammability have opened up embedded targets to the same because it requires frequent dynamic type checks. Our anal- class of programmers. Today’s smart mobile devices are ysis has shown that some programs spend upwards of 20% expected to provide a developer API that is usable by of dynamic instructions doing type checks, and 12.9% on normal application developers, as opposed to the special- average. ized embedded developers of the past. One such plat- In this paper we propose Checked Load, a low- form, HP/Palm’s WebOS [17], uses JavaScript as its pri- complexity architectural extension that replaces software- mary application development language. Others encourage based, dynamic type checking. Checked Load is comprised JavaScript-heavy web applications in addition to their na- of four new ISA instructions that provide flexible and au- tive development environments, as a means of providing tomatic type checks for memory operations, and whose im- feature-rich, portable applications with minimal develop- plementation requires minimal hardware changes. We also ment costs. propose hardware support for dynamic type prediction to Because of their power and space constraints, embedded reduce the cost of failed type checks. We show how to use processors for mobile devices typically do not employ tradi- Checked Load in the Nitro JavaScript just-in-time compiler tional heavyweight architectural techniques, such as wide- (used in the Safari 5 browser). Speedups on a typical mo- issue, out-of-order execution, to hide instruction latencies. bile processor range up to 44.6% (with a mean of 11.2%) in Instead, lightweight, minimal techniques must be used. For popular JavaScript benchmarks. While we have focused our lightweight architectures which were originally designed work on JavaScript, Checked Load is sufficiently general to for executing languages such as C, dynamically-typed lan- support other dynamically-typed languages, such as Python guages pose special performance problems. While these ar- or Ruby. chitectures have successfully weathered the transition from their procedural-language roots to modern object-oriented languages like C++ and Java, progressing to dynamically- 1 Introduction typed languages is considerably more challenging to imple- ment efficiently. In contrast to object-oriented languages, Dynamically-typed languages, including JavaScript [9], where the majority of the language closely resembles a pro- Python [18], and Ruby [7], have exploded in popularity, cedural language (with the addition of dynamic method dis- thanks largely to their lower barriers to entry for both de- patch), dynamically-typed languages are built on higher- velopment and deployment, afforded, in part, by their high- level primitives. level abstractions. These language features allow for more In JavaScript this translates into an instruction stream expressive programs, while simultaneously making univer- dominated by dynamic type checks for “primitive” types sal portability significantly more tenable. and hash-table accesses for user-defined types. In addition, JavaScript, in particular, has seen massive growth, al- because of a combination of a language that is not con- lowing rich client-side interactions in web applications, en- ducive to static analysis and the demands responsiveness abled by the availability of high-performance JavaScript places on code-generation latency, one cannot rely on the virtual machines in all major web browsers. Popular web JavaScript virtual machine’s just-in-time compiler to per- 1 form the aggressive optimizations that would reduce the we first demonstrate the costs of dynamic typing on mo- costs of these operations. For example, the JavaScript VM bile processors by instrumenting the execution of a modern Nitro (previously known as SquirrelFish Extreme [26]) that JavaScript VM, and running it on a simulator to reflect the is used in Safari 5 performs only local optimization, such as performance characteristics of a mobile processor. allocating registers within a single macro-opcode, spilling Type guards, as type checks are known in modern, dy- and restoring them at opcode boundaries. namic language implementations, follow the prototype laid out by Steenkiste [23]. Under this approach, all values, Contributions. The contributions of this paper are two- regardless of type, are represented as machine-word-sized fold. First, we quantify the costs of type checks on mo- (virtual) registers, the high bits of which are used to imple- bile processors, in effect, identifying a performance gap to ment a type tag. For primitive integers, the remaining bits be closed. Second, we propose an instruction set extension are the value of the (reduced-width) integer; for other types to address this gap, design its implementation, and demon- they are a pointer into a memory allocation pool, thus pro- strate its use in code generation and its impact on perfor- viding the address of the object. mance. Given this structure, before a value can be accessed, a More specifically, we investigate the special costs of sequence of mask instructions first extracts the value of the dynamic typing in JavaScript and propose and evaluate tag. This is then followed by a branch on the tag value, architecture-level instructions that reduce them. Our study either to a fast-path block (if the comparison succeeds), or shows that, while type checks are individually small, on to an error or recovery block (if it fails). In the fast case, the order of two x86 instructions per check, they occur the value is then extended and used in the type-appropriate with such frequency as to account for an average 10:9% computation. A clever tag encoding [23] is employed, so (and a high of 46:8%) of executed instructions generated that two’s-complement arithmetic operations may be issued by JavaScript’s virtual machine and 12:9% (62:0% at the on values known to be integers without the need to mask out maximum) of the execution time of the generated code. the tag bits beforehand. In the slow case, fully general type- To reduce the cost of these type checks, we propose a conversion routines (typically written in C and provided by simple instruction set extension, Checked Load. Checked the VM) are called to perform the necessary conversions Load adds four instructions to an ISA and requires only a before the actual computation can be performed. handful of basic hardware components the size of a com- Figure 1 shows the x86 assembly produced by the Nitro parator or MUX for its implementation. It brings perfor- JavaScript VM for the fast-path block of an indexed array mance benefits on generated code that average 11:2% and store operation, written as a[i] = c; in both JavaScript run as high as 44:6%, without slowing down microarchitec- and C. In C this would typically be a single store instruction tural critical paths. In conjunction with Checked Load, we with a base-plus-offset addressing mode. also propose dynamic type prediction, which utilizes stan- In contrast to the simplicity of that implementation, the dard branch prediction hardware and is responsible for a x86 generated by the JavaScript VM contains five branches, large fraction of the performance improvements. three of which are directly attributable to type checking. To minimize the number of instructions generated, this particular implementation uses x86’s rich branching constructs Outline. The remainder of this paper is organized as to avoid complex tag masking operations for the guards. follows: Section 2 analyzes the costs of type checking on Despite this, more than a third of the instructions are ex- a modern mobile processor. Section 3 presents Checked pended on type guards. On a RISC ISA, these guards may Load, our proposed instruction set extension for efficiently require more instructions, raising the type-guard-to-total- implementing type checks. Section 4 demonstrates the per- instruction ratio and leading to an even worse performance formance benefits of Checked Load, including several type- picture. prediction mechanisms. Section 5 examines related work in the scope of optimizing the execution of dynamic lan- 2.1 Experimental Methodology guages, from historical work on LISP to modern JavaScript implementations. Finally, Section 6 concludes. JavaScript VM. To measure the cost of type checks on a real JavaScript implementation, we instrumented a copy 2 Quantifying Type Check Costs of the Nitro JavaScript VM [26], that is used in Apple’s Sa- fari and MobileSafari browsers for desktop computers and While it is accepted wisdom that dynamic type check- the iPhone, respectively. Nitro is a method-oriented JIT ing is a source of performance loss in dynamically typed compiler, with a macro-expansion code generator that is de- languages, any investment in customized hardware must signed for low-latency code generation. Compilation occurs be driven by strong empirical evidence. For that reason, in two

Architectural Support for Javascript Type-Checking on Mobile Processors

Webcore: Architectural Support for Mobile Web Browsing

EVA: an Efficient Vision Architecture for Mobile Systems

6Th Generation Intel® Core™ Processors Based on the Mobile U-Processor for Iot Solutions (Intel® Core™ I7-6600U, I5-6300U, and I3-6100U Processors)

PART I ITEM 1. BUSINESS Industry We Are

NVIDIA Tegra 4 Family CPU Architecture 4-PLUS-1 Quad Core

Intel® Core™2 Duo Mobile Processor for Intel® Centrino® Duo Mobile Processor Technology

Design and Implementation of Low Power 5 Stage Pipelined 32 Bits MIPS Processor Using 28Nm Technology

5 Microprocessors

Introducing Intel's First-Ever Core™ I9 Mobile Processor

Intel® Core™ I7-900/I7-800/I7-700 Mobile Processor

Android in the Cloud on Arm Native Servers

Cubesat and Mobile Processors