<<

Atlantis: Robust, Extensible Execution Environments for Web Applications

James Mickens Mohan Dhawan Rutgers University [email protected] [email protected]

ABSTRACT variety of standards that define the software stack for web Today’s web applications run inside a complex browser en- applications [14, 19, 36, 52, 55]. In practice, the aggregate vironment that is buggy, ill-specified, and implemented in specification is so complex that it is difficult for any browser different ways by different browsers. Thus, web applications to implement it properly. To further complicate matters, that desire robustness must use a variety of conditional code browser vendors often add new features without consulting paths and ugly hacks to deal with the vagaries of their run- other vendors. Thus, each browser defines an idiosyncratic time. Our new exokernel browser, called Atlantis, solves JavaScript interpreter, HTML parser, and layout engine. this problem by providing pages with an extensible execu- tion environment. Atlantis defines a narrow API for basic These components are loosely compatible across different services like collecting user input, exchanging network data, vendors, but fine-tuning an application for multiple brows- and rendering images. By composing these primitives, web ers often means developing specialized application code in- pages can define custom, high-level execution environments. tended for just one browser. For example: Thus, an application which does not want a dependence on Atlantis’ predefined web stack can selectively redefine com- • Writing a portable web-based GUI is challenging be- ponents of that stack, or define markup formats and script- cause different browsers handle mouse and keyboard ing languages that look nothing like the current browser run- events in different, buggy ways (§2.4.1). time. Unlike prior microkernel browsers like OP, and unlike • Web pages use CSS [52] to express complex visual compile-to-JavaScript frameworks like GWT, Atlantis is the styles. Some browsers do not support certain CSS el- first browsing system to truly minimize a web page’s depen- ements, or do not implement them correctly (§2.4.3). dence on black box browser code. This makes it much easier When this happens, web developers must trick each to develop robust, secure web applications. browser into providing the proper layout by cobbling together CSS elements that the browser does under- stand. Categories and Subject Descriptors • D.4.7 [Operating Systems]: Organization and Design; D.4.5 Browsers expose certain internal data structures as Ja- [Operating Systems]: Reliability; D.4.6 [Operating Sys- vaScript objects that web pages can access. By in- tems]: Security and Protection trospecting these objects, pages can implement useful low-level services like logging/replay frameworks [31]. However, the reflection interface for internal browser General Terms objects is extremely brittle, making it challenging to Design, Reliability, Security implement low-level services in a reliable, portable fash- ion (§2.4.4). Keywords Web browsers, microkernels, exokernels All of these issues make it difficult for web developers to rea- son about the robustness and the security of their applica- 1. INTRODUCTION tions. JavaScript frameworks like jQuery [5] try to encapsu- Modern web browsers have evolved into sophisticated com- late browser incompatibilities, providing high-level services putational platforms. Unfortunately, creating robust web like GUI libraries atop an abstraction layer that hides condi- applications is challenging due to well-known quirks and de- tional code paths. However, these frameworks cannot hide ficiencies in commodity browsers [41]. In theory, there are a all browser bugs, or change the fundamental reality that some browsers support useful features that other browsers lack [27, 31]. Indeed, the abstraction libraries themselves may perform differently on different browsers due to unex- pected incompatibilities [18, 28].

1.1 A Problem of Interface These problems are problems of interface: web applications interact with the browser using a bloated, complex API that is hard to secure, difficult to implement correctly, and which

217 often exposes important functionality in an obscure way, if Atlantis’ extensibility and complete agnosticism about the at all. For example, a page’s visual appearance depends on application stack differentiates it from prior browsers like how the page’s HTML and CSS are parsed, how the parsed OP [25] and IBOS [46] that also use a small kernel. Those output is translated into a DOM tree (§2.2), how the result- systems are tightly coupled to the current browser abstrac- ing DOM tree is geometrically laid out, and how the renderer tions for the web stack. For example, OP pushes the Ja- draws that layout. A web page cannot directly interact with vaScript interpreter and the HTML renderer outside the any step of this process. Instead, the page can only provide kernel, isolating them in separate processes connected by a HTML and CSS to the browser and hope that the browser message passing interface. However, the DOM tree abstrac- behaves in the intended fashion. Unfortunately, this is not tion is still managed by native code that a web page cannot guaranteed [15, 30, 39, 41]. Using an abstraction framework introspect or modify in a principled way. In Atlantis, ap- like jQuery does not eliminate the fundamentally black box plications do not have to take dependencies on such opaque nature of the browser’s rendering engine. code bases. Thus, in contrast to prior microkernel brows- ers, Atlantis is more accurately described as an exokernel Using Xax [13] or NaCl [56], developers can write native browser [16] in which web pages supply their own “library code web applications, regaining low-level control over the OSes” that implement the bulk of the web stack. execution environment and unlocking the performance that results from running on the bare metal. However, most web 1.3 Contributions developers who are proficient with JavaScript, HTML, and Atlantis’ primary contribution is to show that exokernel CSS do not want to learn a native code development en- principles can lead to a browser that is not just more secure, vironment. Instead, these developers want a more robust but that also provides a more extensible execution environ- version of their current software stack, a stack which does ment; in turn, this increased extensibility allows web pages have several advantages over native code, like ease of devel- to be more robust. We provide a demonstration web stack opment and the ability to deploy to any device which runs that is written in pure JavaScript and contains an HTML a browser. parser, a layout engine, a DOM tree, and so on. This stack takes advantage of our new scripting engine, called Syphon, 1.2 Our Solution: Atlantis which provides several language features that make it eas- To address these issues, we have created a new microkernel ier to create application-defined runtimes (§3.3). We show called Atlantis. Atlantis’ design was guided by that our prototype Syphon engine is fast enough to execute a fundamental insight: the modern web stack is too compli- application-defined layout engines and GUI event handlers. cated to be implemented by any browser in a robust, secure We also demonstrate how easy it is to extend our demon- way that satisfies all web pages. The Atlantis kernel only stration web stack. For example, we show that it is trivial defines a narrow, low-level API that provides basic services to modify the DOM tree innerHTML feature so that a san- like network I/O, screen rendering, and the execution of ab- itizer [33] is automatically invoked on write accesses. This stract syntax trees [1] that respect same-origin security poli- allows a page to prevent script injection attacks [38]. cies. Each web page composes these basic services to define a richer high-level runtime that is completely controlled by The rest of this paper is organized as follows. In Section 2, that page. we describe modern browser architectures, explaining why current browsers, both monolithic and microkernel, impede The Atlantis kernel places few restrictions on such a runtime. application robustness by exposing brittle black box and We envision that in many cases, web pages will customize a grey box interfaces. This discussion motivates the exokernel third-party implementation of the current HTML/CSS stack design of Atlantis, which we describe in Section 3. After that was written in pure JavaScript and compiled to Atlantis describing our prototype implementation (§4) and several ASTs. However, a page is free to use a different combination practical deployment issues (§5), we evaluate our prototype of scripting and markup technologies. Atlantis is agnostic in Section 6, demonstrating its extensibility and its perfor- about the details of the application stack—Atlantis’ main mance on a variety of tasks. We then discuss related work role is to enforce the same origin policy and provide fair allo- in Section 7 before concluding. cation of low-level system resources. Thus, Atlantis provides web developers with an unprecedented ability to customize 2. BACKGROUND the runtime environment for their pages. For example: The vast majority of lay people, and many computer sci- entists, are unaware of the indignities that web developers • Writing robust GUIs is easy because the developer has currently face as they try to create stable, portable web ap- access to low-level input events, and can completely plications. In this section, we describe the architecture of define higher-level event semantics. a modern web browser, and explain why even new research • Creating a specific visual layout is straightforward be- browsers fail to address the fundamental deficiency of the cause the developer can define his own HTML and CSS web programming model, namely, that the aggregate “web parsers, and use Atlantis’ bitmap rendering to API” is too complex for any browser to implement in a ro- precisely control how content is displayed. bust fashion. • Pages can now safely implement low-level services that introspect “internal” browser state. The introspection is robust because the internal browser state is com- 2.1 Core Web Technologies pletely managed by the page itself—the Atlantis kernel Modern web development uses four essential technologies: knows nothing about traditional browser data struc- HTML,CSS,JavaScript,andplugins.HTML[55]isadeclar- tures like the DOM tree. ative markup language that describes the basic content in a

218 web page. HTML defines a variety of tags for including dif- application, the JavaScript wrappers for internal na- ferent kinds of data; for example, an tag references an tive objects should support the same programming se- external image, and a tag indicates a section of bold text. mantics that are supported by application-defined ob- Tags nest using an acyclic parent-child structure. Thus, a jects. However, as we discuss later, browsers do not page’s tags form a tree which is rooted by a top-level provide this equivalence in practice. node. • The storage layer managesaccesstopersistentdata like cookies, cached web objects, and DOM storage [48], Using tags like , HTML supports rudimentary manip- a new abstraction that provides each domain with sev- ulation of a page’s visual appearance. However, cascading eral megabytes of key/value storage. style sheets (often abbreviated as CSS [52]) provide much richer control. Using CSS, a page can choose fonts and color Even simple browser activities require a flurry of communi- schemes, and specify how tags should be visually positioned cation between the modules described above. For example, with respect to each other. suppose that JavaScript code wants to dynamically add an image to a page. First, the JavaScript interpreter must send JavaScript [20] is the most popular language for -side a fetch request to the network stack. Once the stack has browser scripting. JavaScript allows web pages to dynami- fetched the image, it examines the response headers and cally modify their HTML structure and register handlers for caches the image if appropriate. The browser adds a new GUI events. JavaScript also allows a page to asynchronously image node to the DOM tree, recalculates the layout, and fetch new data from web servers. renders the result. The updated DOM tree is then reflected into the JavaScript namespace, and the interpreter triggers JavaScript has traditionally lacked access to client-side hard- any application-defined event handlers that are associated ware like web cameras and microphones. However, a vari- with the image load. ety of native code plugins like Flash and Silverlight pro- vide access to such resources. These plugins run within the 2.3 Isolating Browser Components browser’s address space and are often used to manipulate Figure 1(a) shows the architecture of a monolithic browser audio or video data. A web page instantiates a plugin using like or IE8. Monolithic browsers share two impor- a special HTML tag like . tant characteristics. First, a browser “instance” consists of a process containing all of the components mentioned in Sec- 2.2 Standard Browser Modules tion 2.2. In some monolithic browsers, separate tabs receive Browsers implement the core web technologies using a stan- separate processes; however, within a tab, browser compo- dard set of software components. The idiosyncratic experi- nents are not isolated. ence of “surfing the web” on a particular browser is largely governed by how the browser implements these standard The second characteristic of a monolithic browser is that, modules. from the web page’s perspective, all of the browser com- ponents are either black box or grey box. In particular, • The network stack implements various transfer proto- the HTML/CSS parser, layout engine, and renderer are all cols like http://, ://, file://,andsoon. black boxes—the application has no way to monitor or di- • The HTML parser validates a page’s HTML. The CSS rectly influence the operation of these components. Instead, parser performs a similar role for CSS. Since mal- the application provides HTML and CSS as inputs, and re- formed HTML and CSS are pervasive, parsers define ceives a DOM tree and a screen repaint as outputs. The Ja- rules for coercing ill-specified pages into a valid format. vaScript runtime is grey box, since the JavaScript language • The browser internally represents the HTML tag tree provides powerful facilities for reflection and dynamic ob- using a data structure called the DOM tree. “DOM” ject modification. However, many important data structures is an abbreviation for the , are defined by native objects, and the JavaScript proxies for a browser-neutral standard for describing HTML con- these objects are only partially compatible with JavaScript’s tent [51]. The DOM tree contains a node for every ostensible object semantics. The reason is that these prox- HTML tag. Each node is adorned with the associated ies are bound to hidden browser state that an application CSS data, as well as any application-defined event han- cannot directly observe. Thus, seemingly innocuous inter- dlers for GUI activity. actions with native code proxies may force internal browser • The layout and rendering engine traverses the DOM structures into inconsistent states [31]. We provide examples tree and determines the visual size and spatial posi- of these problems in Section 2.4. tioning of each element. For complex web pages, the layout engine may require multiple passes over the Figure 1(b) shows the architecture of the OP microkernel DOM tree to calculate the associated layout. browser [25]. The core browser consists of a network stack, • The JavaScript interpreter provides two services. First, a storage system, and a user-interface system. Each com- it implements the core JavaScript runtime. The core ponent is isolated in a separate process, and they commu- runtime defines basic datatypes like strings, and pro- nicate with each other by exchanging messages through the vides simple library services like random number gen- kernel. A webpageinstanceruns atop these core compo- eration. Second, the interpreter reflects the DOM tree nents. Each instance consists of an HTML parser/renderer, into the JavaScript namespace, defining JavaScript ob- a JavaScript interpreter, an Xvnc [47] , and zero or jects which are essentially proxies for internal browser more plugins. All of these are isolated in separate processes objects. These internal objects are written in native and communicate via message passing. For example, the Ja- code (typically C++). From the perspective of a web vaScript interpreter sends messages to the HTML parser to

219 Qhtrv†‡hpr Qhtrv†‡hpr

Tp vƒ‡vt Gh’‚ˆ‡ Hh xˆƒ 9PH 9PH‡ rr ET ˆ‡v€r ˆ‡v€r rqr vt Qh †r ‡ rr

Y‰p CUHG8TT T’ƒu‚v‡r ƒ r‡r ET ˆ‡v€r 9PH‡ rr ET ƒh †r yh’‚ˆ‡ v‡r ƒ r‡r rqr vt

CUHG8TT VD Ir‡‚ x ETv‡r ƒ r‡r ƒh †r yh’‚ˆ‡ rqr vt T‡‚ htr Ir‡‚ x VD Q vpvƒhyD†‡hpr 8 rh‡v‚ 9r‰vpr T‡‚ htr Ir‡‚ x VD T‡‚ htr †r ‰r Fr ry 8 ‚††QD€r††htvt

7 ‚†r riƒhtr 7 ‚†r Hh†‡r xr ry (a) A monolithic browser like (b) The OP browser. (c) Atlantis. Firefox.

Figure 1: Browser architectures. Rectangles represent strong isolation containers (either processes or C# AppDomains). Rounded rectangles represent modules within the same container. Solid borders indicate a lack of extensibility. Dotted borders indicate partial extensibility, and no border indicates complete extensibility. dynamically update a page’s content; the parser sends screen In the target phase, the