
Privilege Separation in HTML5 Applications

Devdatta Akhawe, Prateek Saxena, Dawn Song University of California, Berkeley {devdatta,prateeks,dawnsong}@cs.berkeley.edu

Abstract

The standard approach for privilege separation in web applications is to execute application components in different web origins. This limits the practicality of privilege separation since each web origin has financial and administrative cost. In this paper, we propose a new design for achieving effective privilege separation in HTML5 applications that shows how applications can cheaply create an arbitrary number of components. Our approach utilizes standardized abstractions already implemented in modern browsers. We do not advocate any changes to the underlying browser or require learning new high-level languages, in contrast to prior approaches. We empirically show that we can retrofit our design to real-world HTML5 applications (browser extensions and rich client-side applications) and achieve a reduction of 6x to 10000x in TCB for our case studies. Our mechanism requires less than 13 lines of application-specific code changes and considerably improves auditability.

1 Introduction

Applications written with JavaScript, HTML5 and CSS constructs (called HTML5 applications) are becoming ubiquitous. Rich web applications and browser extensions are examples of HTML5 applications that already enjoy massive popularity [1, 2]. The introduction of browser operating systems [3, 4], and support for HTML5 applications in classic operating systems [5, 6] herald the convergence of web and desktop applications. However, web vulnerabilities are still pervasive in emerging web applications and browser extensions [7], despite immense prior research on detection and mitigation techniques [8–12].

Privilege separation is an established security primitive for providing an important second line of defense [13]. Commodity OSes enable privilege-separated applications via isolation mechanisms such as LXC [14], seccomp [15], and SysTrace [16]. Traditional applications have utilized these for increased assurance and security. Some well-known examples include OpenSSH [17], QMail [18] and Chrome [19]. In contrast, privilege separation in web applications is harder and comes at a cost. If an HTML5 application wishes to separate its functionality into multiple isolated components, the same-origin policy (SOP) mandates that each component execute in a separate web origin.¹ Owning and maintaining multiple web origins has significant practical administrative overheads.² As a result, in practice, the number of origins available to a single web application is limited. Web applications cannot use the same-origin policy to isolate every new component they add into the application. At best, web applications can only utilize sub-domains for isolating components, which does not provide proper isolation, due to special powers granted to sub-domains in the cookie and document.domain behaviors.

¹ Browsers isolate applications based on their origins. An origin is defined as the tuple (protocol, domain, port). In recent browser extension platforms, such as in Google Chrome, each extension is assigned a unique public key as its web origin. These origins are assigned and fixed at registration time.

² To create new origins, the application needs to either create new DNS domains or run services at ports different from ports 80 and 443. New domains cost money, need to be registered with DNS servers and are long-lived. Creating new ports for web services does not work: first, network firewalls block atypical ports and, second, Internet Explorer doesn't include the port in determining an application's origin.

Recent research [12, 20] and modern HTML5 platforms, such as the Google Chrome extension platform (also used for "packaged web applications"), have recognized the need for better privilege separation in HTML5 applications. These systems advocate re-architecting the underlying browser or OS platform to force HTML5 applications to be divided into a fixed number of components. For instance, the Google Chrome extension framework requires that extensions have three components, each of which executes with different privileges [19]. Similarly, recent research proposes to partition HTML5 applications in "N privilege rings", similar to the isolation primitives supported by x86 processors [12]. We observe two problems with these approaches. First, the fixed limit on the number of partitions or components creates an artificial and unnecessary limitation. Different applications require differing numbers of components, and a "one-size-fits-all" approach does not work. We show that, as a result, HTML5 applications in such platforms have large amounts of code running with unnecessary privileges, which increases the impact from attacks like cross-site scripting. Second, browser re-design has a built-in deployment and adoption cost and it takes significant time before applications can enjoy the benefits of privilege separation.

In this paper, we rethink how to achieve privilege separation in HTML5 applications. In particular, we propose a solution that does not require any platform changes and is orthogonal to privilege separation architectures enforced by the underlying browsers. Our proposal utilizes standardized primitives available in today's web browsers, requires no additional web domains and improves the auditability of HTML5 applications. In our proposal, HTML5 applications can create an arbitrary number of "unprivileged components." Each component executes in its own temporary origin isolated from the rest of the components by the SOP. For any privileged call, the unprivileged components communicate with a "privileged" (parent) component, which executes in the main (permanent) origin of the web application. The privileged code is small and we ensure its integrity by enforcing key security invariants, which we define in Section 3. The privileged code mediates all access to the critical resources granted to the web application by the underlying browser platform, and it enforces a fine-grained policy on all accesses that can be easily audited. Our proposal achieves the same security benefits in ensuring application integrity as enjoyed by desktop applications with process isolation and sandboxing primitives available in commodity OSes [14–16].

We show that our approach is practical for existing HTML5 applications. We retrofit two widely used Google Chrome extensions and a popular HTML5 application for SQL database administration to use our design. In our case studies, we show that the amount of trusted code running with full privileges reduces by a factor of 6 to 10000. Our architecture does not sacrifice any performance as compared to alternative approaches that redesign the underlying web browser. Finally, our migration of existing applications requires minimal changes to code. For example, in porting our case studies to this new design we changed no more than 13 lines of code in any application. Developers do not need to learn new languages or type safety primitives to migrate code to our architecture, in contrast to recent proposals [21]. We also demonstrate strong data confinement policies. To encourage adoption, we have released our core infrastructure code as well as the case studies (where permitted) and made it all freely available online [22]. We are currently collaborating with the Google Chrome team to apply this approach to secure Chrome applications, and our design has influenced the security architecture of upcoming Chrome applications.

In our architecture, HTML5 applications can define more expressive policies than supported by existing HTML5 platforms, namely the Chrome extension platform [19] and the Metro platform [5]. Google Chrome and Windows 8 rely on applications declaring install-time permissions that end users can check [23]. Multiple studies have found permission systems to be inadequate: the bulk of popular applications run with powerful permissions [24, 25] and users rarely check install-time permissions [26]. In our architecture, policy code is explicit and clearly separated, can take into account runtime ordering of privileged accesses, and can be more fine-grained. This design enables expert auditors, such as maintainers of software application galleries, to reason about the security of applications. In our case studies, these policies are typically a small amount of static JavaScript code, which is easily auditable.

2 Problem and Approach Overview

Traditional HTML applications execute with the authority of their "web origin" (protocol, port, and domain). The browser's same origin policy (SOP) isolates different web origins from one another and from the file system. However, applications rarely rely on domains for isolation, due to the costs associated with creating new domains or origins.

In more recent application platforms, such as the Google Chrome extension platform [23], Chrome packaged web application store [1] and Windows 8 Metro applications [5], applications can execute with enhanced privileges. These privileges, such as access to the geo-location, are provided by the underlying platform through privileged APIs. Applications utilizing these privileged APIs explicitly declare the permissions they need at install time via manifest files. These applications are authored using the standard HTML5 features and web languages (like JavaScript) that web applications use; we use the term HTML5 applications to collectively refer to web applications and the aforementioned class of emerging applications.

Install-time manifests are a step towards better security. However, these platforms still limit the number of application components to a finite few and rely on separate origins to isolate them. For example, each Google Chrome extension has three components. One component executes in the origin of web sites that the extension interacts with. A second component executes with the extension's permanent origin (a unique public key assigned to it at creation time). The third component executes in an all-powerful origin having the authority of the web browser. In this section, we show how this limits the degree of privilege separation for HTML5 applications in practice.

2.1 Issues with the Current Architecture

In this section, we point out two artifacts of today's HTML5 applications: bundling of privileges and TCB inflation. We observe that these issues are rooted in the fact that, in these designs, the ability to create new web origins (or security principals) is severely restricted. Common vulnerabilities (like XSS and mixed content) today actually translate to powerful gains for attackers in current architectures. Recent findings corroborate the need for better privilege separation: for instance, 27 out of 100 Google Chrome extensions (including the top 50) recently studied have been shown to have exploitable vulnerabilities [7]. These attacks grant powerful privileges like code execution in all HTTP and HTTPS web sites and access to the user's browsing history.

As a running example, we introduce a hypothetical extension for Google Chrome called ScreenCap. ScreenCap is an extension for capturing screenshots that also includes a rudimentary image editor to annotate and modify the image before sending it to the cloud or saving it to a disk.

Bundling. The ScreenCap extension consists of two functionally disjoint components: a screenshot capturing component and an image editor. In the current architecture, both components run in the same principal (origin), despite requiring disjoint privileges. We call this phenomenon bundling. The screenshot component requires the tabs and <all_urls> permissions, while the image editor only requires the pictureLibrary permission to save captured images to the user's picture library on the cloud.

Bundling causes over-privileged components. For example, the image editor component runs with the powerful tabs and <all_urls> permissions. In general, if an application's components require privilege sets $\alpha_1, \alpha_2, \ldots$, all components of the application run with the privileges $\bigcup_i \alpha_i$, leading to over-privileging. As we show in Section 5.4, 19 out of the Top 20 extensions for the Google Chrome platform exhibit bundling. As discussed earlier, this problem manifests on the web too.

TCB inflation. Privileges in HTML5 are ambient: all code in a principal runs with the full privileges of the principal. In reality, only a small application core needs access to these privileges and the rest of the application does not need to be in the trusted computing base (TCB). For example, the image editor in ScreenCap consists of a number of complex and large UI and image manipulation libraries. All this JavaScript code runs with the ambient privilege to write to the user's picture library. Note that this is in addition to it running bundled with the privileges of the screenshot component.

We measured the TCB inflation for the top 50 Chrome extensions. Figure 1 shows the percentage of total functions in an extension requiring privileges as a fraction of the total number of static functions. In half the extensions studied, less than 5% of the functions actually need any privileges. In 80% of the extensions studied, less than 20% of the functions require any privileges.

Figure 1: CDF of percentage of functions in an extension that make privileged calls (X axis) vs. the fraction of extensions studied (in percentage) (Y axis). The lines for 50% and 20% of extensions as well as for 5% and 20% of functions are marked. [Plot axes: X, Percentage of Functions requiring privileges; Y, Number of Extensions (Percentage).]

Summary. It is clear from our data that HTML5 applications, like Chrome extensions, do not sufficiently isolate their sub-components. The same-origin policy equates web origins and security principals, and web origins are fixed at creation time or tied to the web domain of the application. All code from a given provider runs under a single principal, which forces privileges to be ambient. Allowing applications to cheaply create as many security principals as necessary and to confine them with fine-grained, flexible policies can make privilege separation more practical.

Ideally, we would like to isolate the image editor component from the screenshot component, and give each component exactly the privileges it needs. Moving the complex UI and image manipulation code to an unprivileged component can tremendously aid audit and analysis. Our first case study (Section 5.1) discusses unbundling and TCB reduction on a real world screenshot application. We achieved a 58x TCB reduction.

2.2 Problem Statement

Our goal is to design a new architecture for privilege separation that sidesteps the problem of scarce web origins and enables the following properties:

Reduced TCB. Given the pervasive nature of code injection vulnerabilities, we are interested, instead, in reducing the TCB.

Ease of Audit. Dynamic code inclusion and use of complex JS constructs is pervasive. An architecture that eases audits, in spite of these issues, is necessary.

Flexible policies. Current manifest mechanisms provide insufficient contextual data for meaningful security policies. A separate flexible policy mechanism can ease audits and analysis.

Reduce Over-privileging. Bundling of disjoint applications in the same origin results in over-privileging. We want an architecture that can isolate applications agnostic of origin.

Ease of Use. For ease of adoption, we also aim for minimal compatibility costs for developers. Mechanisms that would involve writing applications for a new platform are outside scope.

Scope. We focus on the threat of vulnerabilities in benign HTML5 applications. We aim to enable a privilege separation architecture that benign applications can utilize with ease to provide a strong second line of defense. We consider malicious applications as out of scope, but our design improves auditability and may be applicable to HTML5 malware in the future.

This paper strictly focuses on mechanisms for achieving privilege separation and on mechanisms for expressive policy-based confinement. Facilitating policy development and checking if policies are reasonable is an important issue, but beyond the scope of this paper.

3 Design

We describe our privilege separation architecture in this section. We describe the key security invariants we maintain in Section 3.2 and the mechanisms we use for enforcing them in Section 3.3.

3.1 Approach Overview

We advocate a design that is independent of any privilege separation scheme enforced by the underlying browser. In our design, HTML5 applications have one privileged parent component, and can have an arbitrary number of unprivileged children. Each child component is spawned by the parent and it executes in its own temporary origin. These temporary origins are created on the fly for each execution and are destroyed after the child exits; we detail how temporary origins can be implemented using modern web browser primitives in Section 3.3. The privileged parent executes in the main (permanent) origin assigned to the HTML5 application, typically the web origin for a traditional web application. The same origin policy isolates unprivileged children from one another and from the privileged parent. Figure 2 shows our proposed HTML5 application architecture. In our design, applications can continue to be authored in existing web languages like JavaScript, HTML and CSS. As a result, our design maintains compatibility and facilitates adoption.

Figure 2: High-level design of our proposed architecture. [Diagram: a browser page containing the parent (bootstrap code, policy code, shim) and child iframes (application code, shim).]

Parent. Our design ensures the integrity of the privileged parent by maintaining a set of key security invariants that we define in Section 3.2. The parent guards access to a powerful API provided by the underlying platform, such as the Google Chrome extension API. For making any privileged call or maintaining persistent data, the unprivileged children communicate with the parent over a thin, well-defined messaging interface. The parent component has three components:

• Bootstrap Code. When a user first navigates to the HTML5 application, a portion of the parent code called the bootstrap code executes. Bootstrap code is the unique entry point for the application. The bootstrap code downloads the application source, spawns the unprivileged children in separate temporary origins, and controls the lifetime of their execution. It also includes boilerplate code to initialize the messaging interface in each child before child code starts executing. Privileges in HTML5 applications are tied to origins; thus, a temporary origin runs with no privileges. We explain temporary origins further in Section 3.3.

• Parent Shim. During their execution, unprivileged children can make privileged calls to the parent. The parent shim marshals and unmarshals these requests to and from the children. The parent shim also presents a usable interface to the policy code component of the parent.

• Policy Code. The policy code enforces an application-specific policy on all messages received from children. Policy code decides whether to allow or disallow access to privileged APIs, such as access to the user's browsing history. This mechanism provides complete mediation on access to privileged APIs and supports fine-grained policies, similar to system call monitors in commodity OSes like SysTrace [16]. In addition, as part of the policy code, applications can define additional restrictions on the privileges of the children, such as disabling import of additional code from the web.

Only the policy code is application-specific; the bootstrap and parent shim are the same across all applications. To ease adoption, we have made the application-independent components available online. The application-independent components need to be verified once for correctness and can be reused for all applications in the future. For new applications using our design, only the application's policy code needs to be audited. In our experimental evaluation, we find that the parent code is typically only a small fraction of the rest of the application and our design invariants make it statically auditable.

Children. Our design moves all functional components of the application to the children. Each child consists of two key components:

• Application Code. Application code starts executing in the child after the bootstrap code initializes the messaging interface. All the application logic, including code to handle visual layout of the application, executes in the unprivileged child; the parent controls no visible area on the screen. This implies that all dynamic HTML (and code) rendering operations execute in the child. Children are allowed to include libraries and code from the web and execute them. Vulnerabilities like XSS or mixed content bugs (inclusion of HTTP scripts in HTTPS domains) can arise in child code. In our threat model, we assume that children may be compromised during the application's execution.

• Child Shim. The parent includes application-independent shim code into the child to seamlessly allow privileged calls to the parent. This is done to keep compatibility with existing code and facilitate porting applications to our design. Shim code in the child defines wrapper functions for privileged APIs (e.g., the Google Chrome extension API [27]). The wrapper functions forward any privileged API calls as messages to the parent. The parent shim unmarshals these messages, checks the integrity of the message and executes the privileged call if allowed by the policy. The return value of the privileged API call is marshaled into messages by the parent shim and returned to the child shim. The child shim unmarshals the result and returns it to the original caller function in the child. Certain privileged API functions take callbacks or structured data objects; in Section 4.1 we outline how our mechanism proxies these transparently. Together, the parent and child shim hide the existence of the privilege boundary from the application code.

3.2 Security Invariants

Our security invariants ensure the integrity and correctness of code running in the parent with full privileges. We do not restrict code running in the child; our threat model assumes that unprivileged children can be compromised any time during their execution. We enforce four security invariants on the parent code:

1. The parent cannot convert any string to code.

2. The parent cannot include external code from the web.

3. The parent code is the only entry point into the privileged origin.

4. Only primitive types (specifically, strings) cross the privilege boundary.

The first two invariants help increase assurance in the parent code. Together, they disable dynamic code execution and import of code from the web, which eliminates the possibility of XSS and mixed content vulnerabilities in parent code. Furthermore, they make parent code statically auditable and verifiable. Several analysis techniques can verify JavaScript when dynamic code execution constructs like eval and setTimeout have been syntactically eliminated [9–11, 28, 29].

Invariant 3 ensures that only the trusted parent code executes in the privileged origin; no other application code should execute in the permanent origin. The naive approach of storing the unprivileged (child) code as an HTML file on the server suffers from a subtle but serious vulnerability. An attacker can directly navigate to the unprivileged code. Since it is served from the same origin as the parent, it will execute with full privileges of the parent without going through the parent's bootstrap mechanism. To prevent such escalation, Invariant 3 ensures that all entry points into the application are directed only through the bootstrap code in the parent. Similarly, no callbacks to unprivileged code are passed to the privileged API; they are proxied by parent functions to maintain Invariant 3. We detail how we enforce this invariant in Section 3.3.

Privilege separation, in and of itself, is insufficient to improve security. A problem in privilege-separated C applications is the exchange of pointers across the privilege boundary, leading to possible errors [30, 31]. While JavaScript does not have C-style pointers, it has first-class functions. Exchanging functions and objects across the privilege boundary can introduce security vulnerabilities. Invariant 4 eliminates such attacks by requiring that only primitive strings are exchanged across the privilege boundary.

3.3 Mechanisms

We detail how we implement the design and enforce the above invariants in this section. Whenever possible, we rely on the browser's mechanisms to declaratively enforce the outlined invariants, thereby minimizing the need for code audits.

Temporary Origins. To isolate components, we execute unprivileged children in separate iframes sourced from temporary origins. A temporary origin can be created by assigning a fresh, globally unique identifier that the browser guarantees will never be used again [32]. A temporary origin does not have any privileges, or in other words, it executes with null authority. The globally unique nature means that the browser isolates every temporary origin from another temporary origin, as well as the parent. The temporary origin only lasts as long as the lifetime of the associated iframe.

Several mechanisms for implementing temporary origins are available in today's browsers, but these are rarely found in use on the web. In the HTML5 standard, iframes with the sandbox directive run in a temporary origin. This primitive is standardized and already supported in shipping versions of Google Chrome/ChromeOS, Safari, Internet Explorer/Windows 8, and a patch for Firefox is in the final stages of review [33].

Enforcement of Security Invariants. To enforce security invariants 1 and 2 in the parent, our implementation utilizes the Content Security Policy (CSP) [34]. CSP is a new specification, already supported in Google Chrome and Firefox, that defines browser-enforced restrictions on the resources and execution of application code. In our case studies, it suffices to use the CSP policy directive default-src 'none'; script-src 'self'. This disables all constructs to convert strings into code (Invariant 1) and restricts the source of all scripts included in the page to the origin of the application (Invariant 2). We find that application-specific code is typically small (5 KB) and easily auditable in our case studies. On platforms on which CSP is not supported, we point out that disabling code evaluation constructs and external code import is possible by syntactically restricting the application language to a subset of JavaScript [11, 28, 29].

We require that all non-parent code, when requested, is sent back as a text file. Browsers do not execute text files: the code in the text files can only execute if downloaded and executed by the parent, via the bootstrap mechanism. This ensures Invariant 3. In case of pure client-side platforms like Chrome, this involves simply renaming the file to use the .txt extension. In case of classic client-server web applications, this involves returning a Content-Type header of text/plain. To disable mime-sniffing, we also set the X-Content-Type-Options HTTP header to nosniff.

Messaging Interface. We utilize standard primitives like XMLHttpRequest and the DOM API for downloading the application code and executing it in an iframe. We rely on the postMessage API for communication across the privilege boundary. postMessage is an asynchronous, cross-domain, purely client-side messaging mechanism. By design, postMessage only accepts primitive strings. This ensures Invariant 4.

Policy. Privilege separation isolates the policy and the application logic. Policies, in our design, are written in JavaScript devoid of any dynamic evaluation constructs and are separated from the rest of the complex application logic. Permissions on existing browser platforms are granted at install-time. In contrast, our design allows for more expressive and fine-grained policies like granting and revoking privileges at run-time. For example, in the case of ScreenCap, a child can get the ability to capture a screenshot only once and only after the user clicks the 'capture' button. Such fine-grained policies require the policy engine to maintain state, reason about ordering and have the ability to grant/revoke fine-grained privileges. Our attempt at expressive policies is along the line of active research in this space [21], but in contrast to existing proposals, it does not require developers to specify policies in new high-level languages. Our focus is on mechanisms to support expressive policies; determining what these policies should be for applications is beyond the scope of this paper.

Additional Confinement of Child Code. By default, no restrictions are placed on the children beyond those implied by use of temporary origins. Specifically, the child does not inherit the parent's CSP policy restrictions. In certain scenarios, the application developer may choose to enforce additional restrictions on the child code, via an appropriate CSP policy on the child iframe at the time of its creation by the parent code. For example, in the case of ScreenCap, the screenshot component can be run under the CSP directive script-src 'self'. This increases assurance by disabling inline scripts and code included from the web, making XSS and mixed content attacks impossible. The policy code can then grant the powerful privilege of capturing a screenshot of a user's webpage to a high assurance screenshot component.
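The ScreenCap policy mentioned above (one capture, and only after a click) could be written roughly as follows. This is a minimal sketch, not the paper's code: the names createScreenCapPolicy, onUserClickedCapture and allows, the child identifier "screenshot", and the JSON message shape are all hypothetical, and the string "captureVisibleTab" is used only as a label for the privileged call. The sketch also assumes that the click is reported to the parent by the platform rather than by the (possibly compromised) child.

```javascript
"use strict";

// Hypothetical stateful policy for the ScreenCap example.
// It uses no dynamic evaluation constructs (no eval, no Function).
function createScreenCapPolicy() {
  // Policy state: has the user clicked 'capture' since the last grant?
  let captureArmed = false;

  return {
    // Assumed to be invoked by trusted parent code when the platform
    // reports the user's click on the 'capture' button.
    onUserClickedCapture() {
      captureArmed = true;
    },

    // Decide whether a message from a child may trigger a privileged call.
    // `message` is the string that crossed the privilege boundary.
    allows(childId, message) {
      let request;
      try {
        request = JSON.parse(message);
      } catch (e) {
        return false; // malformed input is denied
      }
      if (request === null || typeof request !== "object") {
        return false;
      }
      if (childId === "screenshot" && request.api === "captureVisibleTab") {
        if (!captureArmed) {
          return false; // no click observed yet
        }
        captureArmed = false; // revoke after a single use
        return true;
      }
      return false; // default deny for everything else
    },
  };
}
```

Because the policy keeps its own state and defaults to deny, an auditor only has to read this one function to see when the capture privilege can be exercised.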

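The marshaling performed by the parent and child shims can be illustrated with a small sketch. It is an assumption-laden illustration rather than the released shim code: createParentShim, createChildShim, the send function, and the JSON message fields (id, api, args) are hypothetical names. In a browser, send would wrap postMessage; current browsers let postMessage carry structured data, so the sketch serializes to strings explicitly to keep Invariant 4. Callbacks stay on the child side and are looked up by a numeric id.

```javascript
"use strict";

// Parent side: unmarshal the request, consult the policy, perform the
// privileged call, and marshal the reply back into a string.
function createParentShim(privilegedApi, policyAllows) {
  return function handleMessage(childId, message) {
    if (typeof message !== "string") {
      return null; // only strings may cross the boundary (Invariant 4)
    }
    let request;
    try {
      request = JSON.parse(message);
    } catch (e) {
      return null;
    }
    if (request === null || typeof request !== "object" ||
        typeof request.api !== "string" || !Array.isArray(request.args)) {
      return null;
    }
    const reply = { id: request.id };
    const known = Object.prototype.hasOwnProperty.call(privilegedApi, request.api);
    const fn = known ? privilegedApi[request.api] : undefined;
    if (typeof fn !== "function" || !policyAllows(childId, request)) {
      reply.error = "denied";
    } else {
      reply.result = fn.apply(null, request.args);
    }
    return JSON.stringify(reply);
  };
}

// Child side: a wrapper that looks like an API call to application code
// but only ever sends strings to the parent.
function createChildShim(send) {
  let nextId = 0;
  const pending = new Map(); // id -> callback; callbacks never leave the child

  return {
    call(api, args, callback) {
      const id = nextId++;
      pending.set(id, callback);
      send(JSON.stringify({ id: id, api: api, args: args }));
    },
    onReply(message) {
      const reply = JSON.parse(message);
      const callback = pending.get(reply.id);
      pending.delete(reply.id);
      if (callback) {
        callback(reply.error || null, reply.result);
      }
    },
  };
}
```

Keeping callbacks in a child-side table is one way to respect Invariant 3: the parent never holds a reference to child functions, only an id that it echoes back.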
Figure 3: Sequence of events to run application in sandbox. Note that only the bootstrap code is sent to the browser to execute. Application code is sent directly to the parent, which then creates a child with it.

The parent proceeds to create a temporary origin, unprivileged iframe using the downloaded code as the source (Step 3, Figure 3). Listing 1 outlines the code to create the unprivileged temporary origin. The parent builds up the child's HTML in the sb_content variable. The parent can optionally include content restrictions on the child via a CSP policy, as explained in Section 3.3. Creating multiple children is a simple repetition of step 3.

The parent also sources the child shim into the child iframe. The parent concatenates the child's code (HTML) and URI-encodes it all into a variable called sb_content. The parent creates an iframe with sb_content as the data: URI source, sets the sandbox attribute and appends the iframe to the document. The parent code also inserts a base HTML tag that enables relative URIs to work seamlessly.

data: is a URI scheme that enables references to inline data as if it were an external reference. For example, an iframe with a data: URI as its source is similar to an iframe pointing to an HTML page con-

    var sb_content = "";
    sb_content += "";
    sb_content += "
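The child-creation step described above might look roughly like the sketch below. It is not the paper's Listing 1: buildChildDataUri, spawnChild, escapeAttr and their parameters are hypothetical, and the sketch omits the optional CSP restriction. It assumes the sandbox attribute is set to allow-scripts without allow-same-origin, which is the standard way to run scripts in a unique origin. The document object is passed in as a parameter so the logic can also be exercised outside a browser.

```javascript
"use strict";

// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value).replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Build the child's HTML (base tag, child shim, application markup)
// and URI-encode it into a data: URI.
function buildChildDataUri(shimSource, appHtml, baseHref) {
  let sbContent = "<!DOCTYPE html><html><head>";
  // The base tag lets relative URIs in the application code keep working.
  sbContent += '<base href="' + escapeAttr(baseHref) + '">';
  // The shim runs before any application code. This sketch assumes
  // shimSource does not itself contain a closing script tag.
  sbContent += "<script>" + shimSource + "</script>";
  sbContent += "</head><body>" + appHtml + "</body></html>";
  return "data:text/html;charset=utf-8," + encodeURIComponent(sbContent);
}

// Create the unprivileged child: a sandboxed iframe sourced from the
// data: URI. `doc` is the parent's document object.
function spawnChild(doc, dataUri) {
  const frame = doc.createElement("iframe");
  // Scripts may run, but without allow-same-origin the frame gets a
  // unique origin with no access to the parent's origin.
  frame.setAttribute("sandbox", "allow-scripts");
  frame.setAttribute("src", dataUri);
  doc.body.appendChild(frame);
  return frame;
}
```

In a browser, the parent would call spawnChild(document, buildChildDataUri(...)) once per child; repeating the call yields additional isolated children.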
