Parallelizing the Web Browser

Parallelizing the Web Browser Christopher Grant Jones, Rose Liu, Leo Meyerovich, Krste Asanovic,´ Rastislav Bodík Department of Computer Science, University of California, Berkeley {cgjones,rfl,lmeyerov,krste,bodik}@cs.berkeley.edu Abstract load time is still slowed down by 50%. On an equiva- lent network connection, the iPhone browser is 5 to 10- We argue that the transition from laptops to handheld times slower than Firefox on a fast laptop. The browser computers will happen only if we rethink the design of is CPU-bound because it is a compiler (for HTML), a web browsers. Web browsers are an indispensable part page layout engine (for CSS), and an interpreter (for of the end-user software stack but they are too ineffi- JavaScript); all three tasks are on a user’s critical path. cient for handhelds. While the laptop reused the soft- The successively smaller form factors of previous ware stack of its desktop ancestor, solid-state device computer generations (workstations, desktops, and then trends suggest that today’s browser designs will not be- laptops) were enabled by exponential improvements in come sufficiently (1) responsive and (2) energy-efficient. the performance of single-thread execution. Future We argue that browser improvements must go beyond CMOS generations are expected to increase the clock JavaScript JIT compilation and discuss how parallelism frequency only marginally, and handheld application de- may help achieve these two goals. Motivated by a future velopers have already adapted to the new reality: rather browser-based application, we describe the preliminary than developing their applications in the browser, as is design of our parallel browser, its work-efficient parallel the case on the laptop, they rely on native frameworks algorithms, and an actor-based scripting language. such as the iPhone SDK (Objective C), Android (Java), or Symbian (C++). These frameworks offer higher per- 1 Browsers on Handheld Computers formance but do so at the cost of portability and pro- Bell’s Law of Computer Classes predicts that handheld grammer productivity. computers will soon replace and exceed laptops. Indeed, 2 An Efficient Web Browser internet-enabled phones have already eclipsed laptops in number, and may soon do so in input-output capability We want to redesign browsers in order to improve their as well. The first phone with a pico-projector (Epoq) (1) responsiveness and (2) energy efficiency. While our launched in 2008; wearable and foldable displays are primary motivation is the handheld browser, most im- in prototyping phases. Touch screens keyboards and provements will benefit laptop browsers equally. There speech recognition have become widely adopted. Con- are several ways to achieve the two goals: tinuing Bell’s prediction, phone sensors may enable ap- • Offloading the computation. Used in the Deepfish plications not feasible on laptops. and Skyfire mobile browsers, a server-side proxy Further evolution of handheld devices is limited browser renders a page and sends compressed im- mainly by their computing power. Constrained by the ages to a handheld. Offloading bulk computations, battery and heat dissipation, the mobile CPU noticeably such as speech recognition, seems attractive, but impacts the web browser in particular: even with a fast adding server latencies exceeds the 40ms threshold wi-fi connection, the iPhone may take 20 seconds to load for visual perception, making proxy architectures the Slashdot home page. Until mobile browsers become insufficient for interactive GUIs. Disconnected op- dramatically more responsive, they will continue to be eration is inherently impossible. used only when a laptop browser is not available. With- • Removing the abstraction tax. Browser program- out a fast browser, handhelds may not be able to support mers pay for their increased productivity with an compelling web applications, such as Google Docs, that abstraction tax—the overheads of the page layout are sprouting in laptop browsers. engine, the JavaScript interpreter, parsers for ap- One may expect that mobile browsers are network- plications deployed in plain text, and other com- bound, but this is not the case. On a 2Mbps network ponents. We measured this tax to be two orders of connection, soon expected on internet phones, the CPU magnitude: a small Google Map JavaScript appli- bottleneck becomes visible: On a ThinkPad X61 lap- cation runs about 70-times slower than the equiv- top, halving the CPU frequency doubles the Firefox page alent written using C and pixel frame buffers [3]. load times for cached pages; with a cold cache, the Removing the abstraction tax is attractive because it improves both responsiveness (the program runs What are the implications of these trends on browser faster) and energy efficiency (the program performs parallelization? Consider the goal of sustained 50-way less work). JavaScript overheads can be reduced parallelization execution. A pertinent question is how with JIT compilation, typically 5 to 10-times, but much of the browser we can afford not to parallelize. browsers often spend only 15% of time in the inter- Assume that the processor can execute 250 parallel op- preter. Abstraction reduction remains attractive but erations. Amdahl’s Law shows that to to sustain 50-way we leave it out of this paper. parallelism, only 4% of the browser execution can re- • Parallelizing the browser. Future CMOS genera- main sequential, with the rest running at 250-way paral- tions will not allow significantly faster clock rates, lelism. Obtaining this level of parallelism is challenging but they will be about 25% more energy efficient because, with the exception of media decoding, browser per generation. The savings translate into addi- algorithms have been optimized for single-threaded ex- tional cores, already observed in handhelds. Paral- ecution. This paper suggests that it may be possible to lelizing the browser improves responsiveness (goal uncover 250-way parallelism in the browser. 1) and to some extent also energy efficiency (goal 2. Type of parallelism: Should we exploit task paral- 2): vector instructions improve energy efficiency lelism, data parallelism, or both? Data parallel architec- per operation, and although parallelization does not tures such as SIMD are efficient because their instruc- decrease the amount of work, gains in program ac- tion delivery, which consumes about 50% of energy on celeration may allow us to slow down the clock, superscalar processors, is amortized among the parallel further improving the energy efficiency. operations. A vector accelerator has been shown to in- The rest of this paper discusses how one may go about crease energy efficiency 10-times [9]. In this paper, we parallelizing the web browser. show that at least a part of the browser can be imple- mented in data parallel fashion. 3 What Kind of Parallelism? 3. Algorithms: Which parallel algorithms improve en- The performance of the browser is ultimately limited by ergy efficiency? Parallel algorithms that accelerate pro- energy constraints and these constraints dictate optimal grams do not necessarily improve energy efficiency. parallelization strategies. Ideally, we want to minimize The handheld calls for parallel algorithms that are work energy consumption while being sufficiently responsive. efficient—i.e., they do not perform more total work than This goal motivates the following questions: a sequential algorithm. An example of a work-inefficient 1. Amount of parallelism: Do we decompose the algorithm is speculative parallelization that misspecu- browser into 10 or 1000 parallel computations? The lates too often. Work efficiency is a demanding re- answer depends primarily on the parallelism available in quirement: for some “inherently sequential” problems, handheld processors. In five years, 1W processors are such as finite-state machines, only work-inefficient al- expected to have about four cores. With two threads per gorithms are known [5]. We show that careful specula- core and 8-wide SIMD instructions, devices may thus tion allows work-efficient parallelization of finite state support about 64-way parallelism. machines in the lexical analysis. The second consideration comes from voltage scaling. The energy efficiency of CMOS transistors in- 4 The Browser Anatomy creases when the frequency and supply voltage are de- Original web browsers were designed to render hyper- creased; parallelism accelerates the program to make up linked documents. Later, JavaScript programs, embed- for the lower frequency. How much more parallelism ded in the document, provided simple animated menus can we get? There is a limit to CMOS scaling benefits: by dynamically modifying the document. Today, AJAX reducing clock frequency beyond roughly 3-times from applications rival their desktop counterparts. the peak frequency no longer improves efficiency lin- The typical browser architecture is shown in Figure 1. early [6]. In the 65nm Intel Tera-scale processor, reduc- Loading an HTML page sets off a cascade of events: ing the frequency from 4Ghz to 1GHz reduces the en- The page is scanned, parsed, and compiled into a doc- ergy per operation 4-times, while for the ARM9, reduc- ument object model (DOM), an abstract syntax tree of ing supply voltage to near-threshold levels and running the document. Content referenced by URLs is fetched on 16 cores, rather on one, reduces energy consumption and inlined into the DOM. As the content necessary to by 5-times [15]. It might be possible to design mobile display the page becomes available, the page layout is processors that operate at the peak frequency allowed (incrementally) solved and drawn to the screen. After by the CMOS technology in short bursts (for example, the initial page load, scripts respond to events generated 10ms needed to layout a web page), scaling down the by user input and server messages, typically modifying frequency for the remainder of the execution; the scal- the DOM. This may, in turn, cause the page layout to be ing may allow them to exploit additional parallelism. recomputed and redrawn. optimized with sequential execution in mind.

Load more