Establishing Browser Security Guarantees Through Formal Shim Veriﬁcation

Establishing Browser Security Guarantees through Formal Shim Veriﬁcation

Dongseok Jang Zachary Tatlock Sorin Lerner UC San Diego UC San Diego UC San Diego

Abstract ternative security policies [25, 40, 21, 8, 39, 5] and improved JavaScript safety [14, 23, 38, 6, 44]. While all Web browsers mediate access to valuable private data in these techniques improve browser security, the intricate domains ranging from health care to banking. Despite subtleties of Web security make it very difficult to know this critical role, attackers routinely exploit browser vul- with full certainty whether a given technique works as in- nerabilities to exfiltrate private data and take over the un- tended. Often, a solution only “works” until an attacker derlying system. We present QUARK, a browser whose finds a bug in the technique or its implementation. Even kernel has been implemented and verified in Coq. We in work that attempts to provide strong guarantees (for give a specification of our kernel, show that the imple- example [17, 13, 41, 12]) the guarantees come from ana- mentation satisfies the specification, and finally show lyzing a model of the browser, not the actual implemen- that the specification implies several security properties, tation. Reasoning about such a simplified model eases including tab non-interference, cookie integrity and con- the verification burden by omitting the gritty details and fidentiality, and address bar integrity. corner cases present in real systems. Unfortunately, attackers exploit precisely such corner cases. Thus, these 1 Introduction approaches still leave a formality gap between the theory and implementation of a technique. Web browsers increasingly dominate computer use as There is one promising technique that could mini- people turn to Web applications for everything from busi- mize this formality gap: fully formal verification of the ness productivity suites and educational software to so- browser implementation, carried out in the demanding cial networking and personal banking. Consequently, and foundational context of a mechanical proof assistant. browsers mediate access to highly valuable, private data. This severe discipline forces the programmer to specify Given the browser’s sensitive, essential role, it should be precisely how their code should behave and then pro- highly secure and robust in the face of adversarial attack. vides the tools to formally guarantee that it does, all in Unfortunately, security experts consistently discover fully formal logic, building from basic axioms up. For vulnerabilities in all popular browsers, leading to data their trouble, the programmer is rewarded with a ma- loss and remote exploitation. In the annual Pwn2Own chine checkable proof that the implementation satisfies competition, part of the CanSecWest security confer- the specification. With this proof in hand, we can avoid ence [4], security experts demonstrate new attacks on up- further reasoning about the large, complex implementa- to-date browsers, allowing them to subvert a user’s ma- tion, and instead consider only the substantially smaller, chine through the click of a single link. These vulnera- simpler specification. In order to believe that such a bilities represent realistic, zero-day exploits and thus are browser truly satisfies its specification, one needs only quickly patched by browser vendors. Exploits are also trust a very small, extensively tested proof checker. By regularly found in the wild; Google maintains a Vulner- reasoning about the actual implementation directly, we ability Reward Program, publishing its most notorious can guarantee that any security properties implied by the bugs and rewarding the cash to their reporters [2]. specification will hold in every case, on every run of the Researchers have responded to the problems of actual browser. browser security with a diverse range of techniques, from Unfortunately, formal verification in a proof assistant novel browser architectures [10, 42, 17, 41, 31] and de- is tremendously difficult. Often, those systems which we fenses against specific attacks [26, 20, 22, 8, 36] to al- can formally verify are severely restricted, “toy” versions of the programs we actually have in mind. Thus, many user being involved, and always displays the correct researchers still consider full formal verification of real- address bar. istic, browser-scale systems an unrealistic fantasy. Fortu- nately, recent advances in fully formal verification allow To summarize, our contributions are as follows: us to begin challenging this pessimistic outlook. • We demonstrate how formal shim verification en- In this paper we demonstrate how formal shim verifi- abled us to formally verify the implementation of cation radically reduces the verification burden for large a modern Web browser. We discuss the techniques, systems to the degree that we were able to formally tools, and design decisions required to formally ver- verify the implementation of a modern Web browser, ify QUARK in detail. QUARK, within the demanding and foundational context of the mechanical proof assistant Coq. • We identify and formally prove key security prop- At its core, formal shim verification addresses the erties for a realistic Web browser. challenge of formally verifying a large system by clev- erly reducing the amount of code that must be con- • We provide a framework that can be used to further sidered; instead of formalizing and reasoning about gi- investigate and prove more complex policies within gantic system components, all components communi- a working, formally verified browser. cate through a small, lightweight shim which ensures the components are restricted to only exhibit allowed behav- The rest of the paper is organized as follows. Section 2 iors. Formal shim verification only requires one to rea- provides background on browser security techniques and son about the shim, thus eliminating the tremendously formal verification. Section 3 presents an overview of expensive or infeasible task of verifying large, complex the QUARK browser. Section 4 details the design of the components in a proof assistant. QUARK kernel and its implementation. Section 5 ex- plains the tools and techniques we used to formally ver- Our Web browser, QUARK, exploits formal shim verification and enables us to verify security properties for ify the implementation of the QUARK kernel. Section 6 a million lines of code while reasoning about only a few evaluates QUARK along several dimensions while Sec- tion 7 discusses lessons learned from our endeavor. hundred. To achieve this goal, QUARK is structured similarly to Google Chrome [10] or OP [17]. It consists of a small browser kernel which mediates access to sys- 2 Background and Related Work tem resources for all other browser components. These other components run in sandboxes which only allow the This section briefly discusses both previous efforts to im- component to communicate with the kernel. In this way, prove browser security and verification techniques to en- QUARK is able to make strong guarantees about a million sure programs behave as specified. lines of code (e.g., the renderer, JavaScript implementation, JPEG decoders, etc.) while only using a proof as- Browser Security As mentioned in the Introduction, sistant to reason about a few hundred lines of code (the there is a rich literature on techniques to improve browser kernel). Because the underlying system is protected from security [10, 42, 17, 41, 31, 13, 12]. We distinguish our- QUARK’s untrusted components (i.e., everything other selves from all previous techniques by verifying the ac- than the kernel) we were free to adopt state-of-the-art tual implementation of a modern Web browser and for- implementations and thus QUARK is able to run popu- mally proving that it satisfies our security properties, all lar, complex Web sites like Facebook and GMail. in the context of a mechanical proof assistant. Below, we By applying formal shim verification to only reason survey the most closely related work. about a small core of the browser, we formally establish Previous browsers like Google Chrome [10], the following security properties in QUARK, all within a Gazelle [42], and OP [17] have been designed using proof assistant: privilege separation [35], where the browser is divided into components which are then limited to only those 1. Tab Non-Interference: no tab can ever affect how privileges they absolutely require, thus minimizing the the kernel interacts with another tab damage an attacker can cause by exploiting any one component. We follow this design strategy. 2. Cookie Confidentiality and Integrity: cookies for Chrome’s design compromises the principles of priv- a domain can only be accessed/modified by tabs of ilege separation for the sake of performance and com- that domain patibility. Unfortunately, its design does not protect the user’s data from a compromised tab which is free to 3. Address Bar Integrity and Correctness: the ad- leak all cookies for every domain. Gazelle [42] adopts dress bar cannot be modified by a tab without the a more principled approach, implementing the browser

2 as a multi-principal OS, where the kernel has exclusive control over resource management across various Web sand Input Tab principals. This allows Gazelle to enforce richer policies Proc box than those found in Chrome. However, neither Chrome nor Gazelle apply any formal methods to make guarantees about their browser. Quark The OP [17] browser goes beyond privilege separa- Network Kernel tion. Its authors additionally construct a model of their browser kernel and apply the Maude model checker to ensure that this model satisfies important security properties such as the same origin policy and address bar cor- Cookie sand Output box rectness. As such, the OP browser applies insight similar Proc to our work, in that OP focuses its formal reasoning on a small kernel. However, unlike our work, OP does not Figure 1: QUARK Architecture. This diagram shows how QUARK make any formal guarantees about the actual browser im- factors a modern browser into distinct components which run in sep- plementation, which means there is still a formality gap arate processes; arrows indicate information flow. We guarantee our between the model and the code that runs. Our formal security properties by formally verifying the QUARK Kernel in the Coq proof assistant, which allows us to avoid reasoning about the intricate shim verification closes this formality gap by conducting details of other components. all proofs in full formal detail using a proof assistant.

features which allow us to verify QUARK in a familiar Formal Verification Recently, researchers have begun style: invariants expressed as pre- and post-conditions using proof assistants to fully formally verify imple- over program states, essentially a variant of Hoare Type mentations for foundational software including Operat- Theory [33]. Specifically, Ynot enables trace-based ver- ing Systems [27], Compilers [28, 1], Database Man- ification, used extensively in [30] to prove properties of agement Systems [29], Web Servers [30], and Sand- servers. This technique entails reasoning about the se- boxes [32]. Some of these results have even experimen- quence of externally visible actions a program may per- tally been shown to to drastically improve software relia- form on any input, also known as traces. Essentially, bility: Yang et al. [43] show through random testing that our specification delineates which sequences of system the CompCert verified C compiler is substantially more calls the QUARK kernel can make and our verification robust and reliable than its non-verified competitors like consists of proving that the implementation is restricted GCC and LLVM. to only making such sequences of system calls. We go As researchers verify more of the software stack, the on to formally prove that satisfying this specification im- frontier is being pushed toward higher level platforms plies higher level security properties like tab isolation, like the browser. Unfortunately, previous verification re- cookie integrity and confidentiality, and address bar in- sults have only been achieved at staggering cost; in the tegrity and correctness. Building QUARK with a different case of seL4, verification took over 13 person years of proof assistant like Isabelle/HOL would have required effort. Based on these results, verifying a browser-scale essentially the same approach for encoding imperative platform seemed truly infeasible. programming features, but we chose Coq since Ynot is Our formal verification of QUARK was radically available and has been well vetted. cheaper than previous efforts. Previous efforts were Our approach is fundamentally different from pre- tremendously expensive because researchers proved vious verification tools like ESP [16], SLAM [7], nearly every line of code correct. We avoid these costs in BLAST [18] and Terminator [15], which work on ex- QUARK by applying formal shim verification: we struc- isting code bases. In our approach, instead of trying ture our browser so that all our target security properties to prove properties about a large existing code base ex- can be ensured by a very small browser kernel and then pressed in difficult-to-reason-about languages like C or reason only about that single, tiny component. Leverag- C++, we rewrite the browser inside of a theorem prover. ing this technique enabled us to make strong guarantees This provides much stronger reasoning capabilities. about the behavior of a million of lines of code while reasoning about only a few hundred in the mechanical proof assistant Coq. 3 QUARK Architecture and Design We use the Ynot library [34] extensively to reason about imperative programming features, e.g., impure Figure 1 diagrams QUARK’s architecture. Similar to functions like fopen, which are otherwise unavailable in Chrome [10] and OP [17], QUARK isolates complex and Coq’s pure implementation language. Ynot also provides vulnerability-ridden components in sandboxes, forcing

3 them to access all sensitive resources through a small, simple browser kernel. Our kernel, written in Coq, runs in its own process and mediates access to resources including the keyboard, disk, and network. Each tab runs a modified version of WebKit in its own process. WebKit is the open source browser engine used in Chrome and Safari. It provides various callbacks for clients as Python bindings which we use to implement tabs. Since tab processes cannot directly access any system resources, we hook into these callbacks to re-route WebKit’s network, screen, and cookie access through our kernel written in Coq. QUARK also uses separate processes for displaying to the screen, storing and accessing cookies, as well reading input from the user. Throughout the paper, we assume that an attacker can compromise any QUARK component which is exposed to Figure 2: QUARK Screenshot. This screenshot shows QUARK run- content from the Internet, except for the kernel which we ning a Google search, including an interactive drop-down suggesting formally verified. This includes all tab processes, cookie query completions and an initial set of search results from a JavaScript processes, and the graphical output process. Thus, we event handler dispatching an “instant search” as well as a page preview from a search result link. (Location blurred for double-blind review.) provide strong formal guarantees about tab and cookie isolation, even when some processes have been completely taken over (e.g., by a buffer overflow attack in cal/text output as shown in Figure 2 where the kernel has the rendering or JavaScript engine of WebKit). complete control over the address bar. With this design, the graphical output process is never able to spoof the 3.1 Graphical User Interface address bar. QUARK also uses a separate input process to support The traditional GUI for Web browsers manages several richer inputs, e.g., the mouse. The input process is a key responsibilities: reading mouse and keyboard input, simple Python script which grabs keyboard and mouse showing rendered graphical output, and displaying the events from the user, encodes them as user input mes- current URL. Unfortunately, such a monolithic compo- sages, and forwards them on to the kernel’s stdin. For nent cannot be made to satisfy our security goals. If keystrokes, the input process simply writes characters in compromised, such a GUI component could spoof the ASCII format to the kernel’s stdin. We use several “un- current URL or send arbitrary user inputs to the kernel, printable” ASCII values (all smaller than 60 and all un- which, if coordinated with a compromised tab, would vi- typeable from the keyboard) to pass special information olate tab isolation. Thus QUARK must carefully separate from the input process to the kernel. For example, the in- GUI responsibilities to preserve our security guarantees put process maps keys F1-F12 to such un-printable char- while still providing a realistic browser. acters, which allows the kernel to use F11 for “new tab”, QUARK divides GUI responsibilities into several com- and F1-F10 for selecting tabs 1-10. Mouse clicks are also ponents which the kernel orchestrates to provide a tradi- sent to the kernel through un-printable ASCII values. Be- tional GUI for the user. The most complex component cause the input process only reads from the keyboard and displays rendered bitmaps on the screen. QUARK puts mouse, and never from the kernel or any other QUARK this component in a separate process to which the kernel components, it cannot be exposed any attacks originating directs rendered bitmaps from the currently selected tab. from the network. Because the kernel never reads input from this graphical output process, any vulnerabilities it may have can- 3.2 Example of Message Exchanges not subvert the kernel or impact any other component in QUARK. Furthermore, treating the graphical output To illustrate how the kernel orchestrates all the com- component as a separate process simplifies the kernel and ponents in QUARK, we detail the steps from startup proofs because it allows the kernel to employ a uniform to a tab loading http://www.google.com. The user mechanism for interacting with the outside world: mes- opens QUARK by starting the kernel which in turn sages over channels. starts three processes: the input process, the graph- To formally reason about the address bar, we designed ical output process, and a tab process. The ker- our kernel so that the current URL is written directly to nel establishes a two-way communication channel with the kernel’s stdout. This gives rise to a hybrid graphi- each process it starts. Next, the kernel then sends a

4 (Go "http://www.google.com") message to the tab the tab process determines which part of the display has indicating it should load the given URL (for now, assume changed, and sends bitmaps only for the rectangular re- this is normal behavior for all new tabs). gions that need to be updated. This drastically reduces The tab process comprises our modified version of the size of the bitmaps being transferred, and the amount WebKit wrapped by a thin layer of Python to handle of redrawing on the screen. messaging with the kernel. After recieving the Go mes- For I/O performance, the original Ynot library used sage, the Python wrapper tells WebKit to start process- single-character read/write routines, imposing significant ing http://www.google.com. Since the tab process is overhead. We defined a new I/O library which uses size running in a sandbox, WebKit cannot directly access the n reads/writes. This reduced reading an n byte message network. When it attempts to, our Python wrapper in- from n I/O calls to just three: reading a 1 byte tag, fol- tervenes and sends a GetURL request to the kernel. As lowed by a 4 byte payload size, and then a single read for long as the request is valid, the kernel responds with a the entire payload. ResDoc message containing the HTML document the tab We also optimized socket connections in QUARK. Our requested. original prototype opened a new TCP connection for each Once the tab process has received the necessary re- HTTP GET request, imposing significant overhead. Mod- sources from the kernel and rendered the Web pages, it ern Web servers and browsers use persistent connections sends a Display message to the kernel which contains a to improve the efficiency of page loading and the respon- bitmap to display. When the kernel receives a Display siveness of Web 2.0 applications. These connections are message from the current tab, it forwards the message on maintained anywhere from a few seconds to several min- to the graphical output process, which in turn displays utes, allowing the client and server can exchange mul- the bitmap on the screen. tiple request/responses on a single connection. Services When the kernel reads a printable character c from like Google Chat make use of very long-lived HTTP con- standard input, it sends a (KeyPress c) message to the nections to support responsive interaction with the user. currently selected tab. Upon receiving such a message, We support such persistent HTTP connections via the tab calls the appropriate input handler in WebKit. For Unix domain sockets which allow processes to send open example, if a user types “a” on Google, the “a” character file descriptors over channels using the sendmsg and is read by the kernel, passed to the tab, and then passed recvmsg system calls. When a tab needs to open a to WebKit, at which point WebKit adds the “a” charac- socket, it sends a GetSoc message to the kernel with the ter to Google’s search box. This in turn causes WebKit’s host and port. If the request is valid, the kernel opens JavaScript engine to run an event handler that Google has and connects the socket, and then sends an open socket installed on their search box. The event handler performs file descriptor to the tab. Once the tab gets the socket file an “instant search”, which initiates further communica- descriptor, it can read/write on the socket, but it cannot tion with the QUARK kernel to access additional network re-connect the socket to another host/port. In this way, resources, followed by another Display message to re- the kernel controls all socket connections. paint the screen. Note that to ease verification, QUARK Even though we formally verify our browser kernel in currently handles all requests synchronously. a proof assistant, we were still able to implement and reason about these low-level optimizations. 3.3 Efficiency 3.4 Socket Security Policy With a few simple optimizations, we achieve performance comparable to WebKit on average (see Section 6 The GetSoc message brings up an interesting security for measurements). Following Chrome, we adopt two issue. If the kernel satisfied all GetSoc requests, then a optimizations critical for good graphics performance. compromised tab could open sockets to any server and First, QUARK uses shared memory to pass bitmaps from exchange arbitrary amounts of information. The kernel the tab process through the kernel to the output process, must prevent this scenario by restricting socket connec- so that the Display message only passes a shared mem- tions. ory ID instead of a bitmap. This drastically reduces the To implement this restriction, we introduce the idea communication cost of sending bitmaps. To prevent a of a domain suffix for a tab which the user enters when malicious tab from accessing another tab’s shared mem- the tab starts. A tab’s domain suffix controls several se- ory, we run each tab as a different user, and set access curity features in QUARK, including which socket con- controls so that a tab’s shared memory can only be ac- nections are allowed and how cookies are handled (see cessed by the output process. Second, QUARK uses Section 3.5). In fact, our address bar, located at the very rectangle-based rendering: instead of sending a large top of the browser (see Figure 2), displays the domain bitmap of the entire screen each time the display changes, suffix, not just the tab’s URL. We therefore refer to it as

5 the “domain bar”. mains can open sockets to the associated content distri- For simplicity, our current domain suffixes build on bution network. The tab domain suffix remains a sin- the notion of a public suffix, which is a top-level domain gle string, e.g. facebook.com, but behind the scenes, it under which Internet users can directly register names, gets expanded into a list depending on the domain, e.g., for example .com, .co.uk, or .edu – Mozilla main- [facebook.com, fbcdn.net]. When deciding whether tains an exhaustive list of such suffixes [3]. In particu- to satisfy a given socket request, QUARK considers this lar, we require the domain suffix for a tab to be exactly list as a disjunction of allowed domain suffixes. Cur- one level down from a public suffix, e.g., google.com, rently, we provide these whitelists manually. amazon.com, etc. In the current QUARK prototype the user provides a tab’s domain suffix separately from its 3.5 Cookies and Cookie Policy initial URL, but one could easily compute the former from the later. Note that, once set, a tab’s domain suf- QUARK maintains a set of cookie processes to handle fix never changes. In particular, any frames a tab loads cookie accesses from tabs. This set of cookie processes do not affect its domain suffix. will contain a cookie process for domain suffix S if S is We considered using the tab’s origin (which includes the domain suffix of a running tab. By restricting mes- the URL, scheme, and port) to restrict socket creation, sages to and from cookie processes, the QUARK kernel but such a policy is too restrictive for many useful guarantees that browser components will only be able to sites. For example, a single GMail tab uses frames access cookies appropriate for their domain. from domains such as static.google.com and mail. The kernel receives cookie store/retrieve requests from google.com. However, our actual domain suffix checks tabs and directs the requests to the appropriate cookie are modularized within QUARK, which will allow us to process. If a tab with domain suffix t asks to store a experiment with finer grained policies in future work. cookie with domain c, then our kernel allows the oper- To enforce our current socket creation policy, we first ation if c ≤ t, in which case it sends the store request to define a subdomain relation ≤ as follows: given domain the cookie process for domain t. Similarly, if a tab with d1 and domain suffix d2, we use d1 ≤ d2 to denote that domain suffix t wants to retrieve a cookie for domain c, d1 is a subdomain of d2. For example www.google.com then our kernel allows the operation if c ≤ t, in which ≤ google.com. If a tab with domain suffix t requests case it sends the request to the cookie process for domain to open a connection to a host h, then the kernel allows t and forwards any response to the requesting tab. the connection if h ≤ t. To load URLs that are not a The above policy prevents cross-domain cookie reads subdomain of the tab suffix, the tab must send a GetURL from a compromised tab, and it prevents a compro- message to the kernel – in response, the kernel does not mised cookie process from leaking information about open a socket but, if the request is valid, may provide the its cookies to another domain; yet it also allows dif- content of the URL. Since the kernel does not attach any ferent tabs with the same domain suffix (but different cookies to the HTTP request for a GetURL message, a URLs) to communicate through cookies (for example, tab can only access publicly available data using GetURL. mail.google.com and calendar.google.com). In addition, GetURL requests only provide the response body, not HTTP headers. 3.6 Security Properties of QUARK Note that an exploited tab could leak cookies by encoding information within the URL parameter of GetURL We provide intuitive descriptions of the security prop- requests, but only cookies for that tab’s domain could be erties we proved for QUARK’s kernel; formal defini- leaked. Because we do not provide any access to HTTP tions appear later in Section 4. A tab in the kernel is a headers with GetURL, we consider this use of GetURL pair, containing the tab’s domain suffix as a string and to leak cookies analogous to leaking cookie data over the tab’s communication channel as a file descriptor. A timing channels. cookie process is also a pair, containing the domain suffix Although we elide details in the current work, we also that this cookie process manages and its communication slightly enhanced our socket policy to improve perfor- channel. We define the state of the kernel as the cur- mance. Sites with large data sets often use content dis- rently selected tab, the list of tabs, and the list of cookie tribution networks whose domains will not satisfy our processes. Note that the kernel state only contains strings domain suffix checks. For example facebook.com uses and file descriptors. fbcdn.net to load much of its data. Unfortunately, the We prove the following main theorems in Coq: simple socket policy described above will force all this data to be loaded using slow GetURL requests through 1. Response Integrity: The way the kernel responds the kernel. To address this issue, we associate whitelists to any request only depends on past user “control with the most popular sites so that tabs for those do- keys” (namely keys F1-F12). This ensures that one

6 browser component (e.g., a tab or cookie process) Definition kstep(ctab, ctabs) := chan <- iselect(stdin, tabs); can never influence how the kernel responds to an- match chan with other component, and that the kernel never allows | Stdin => untrusted input (e.g., data from the web) to influ- c <- read(stdin); ence how the kernel responds to a request. match c with | "+" => t <- mktab(); 2. Tab Non-Interference: The kernel’s response to a write_msg(t, Render); tab’s request is the same no matter how other tabs return (t, t::tabs) interact with the kernel. This ensures that the kernel | ... end never provides a direct way for one tab to attack an- | Tab t => other tab or steal private information from another msg <- read_msg(t); tab. match msg with | GetSoc(host, port) => if(safe_soc(host, domain_suffix(t)) then 3. No Cross-domain Socket Creation: The kernel send_soc(t, host, port); disallows any cross-domain socket creation (as de- return (ctab, tabs) scribed in Section 3.4). else write_msg(t, Error); 4. Cookie Integrity/Confidentiality: The kernel dis- return (ctab, tabs) | ... allows any cross-domain cookie stores or retrieves end (as described in Section 3.5). end

5. Domain Bar Integrity and Correctness: The domain bar cannot be compromised by a tab, and is Figure 3: Body for Main Kernel Loop. This Coq code shows how our QUARK kernel receives and responds to requests from other browser always equal to the domain sufﬁx of the currently components. It ﬁrst uses a Unix-style select to choose a ready input selected tab. channel, reads a request from that channel, and responds to the message appropriately. For example, if the user enters “+”, the kernel creates a new tab and sends it the Render message. In each case, the code 4 Kernel Implementation in Coq returns the new kernel state resulting from handling this request.

QUARK’s most distinguishing feature is its kernel, which is implemented and proved correct in Coq. In this section which takes the current kernel state, handles a single re- we present the implementation of the main kernel loop. quest, and returns the new kernel state as shown in Fig- In the next section we explain how we formally veriﬁed ure 3. The kernel state is a tuple of the current tab (ctab), the kernel. the list of tabs (tabs), and a few other components which Coq enables users to write programs in a small, simple we omit here (e.g., the list of cookie processes). For functional language and then reason formally about them details regarding the loop and kernel initialization code using a powerful logic, the Calculus of Constructions. please see [24]. This language is essentially an effect-free (pure) subset kstep starts by calling iselect (the “i” stands for of popular functional languages like ML or Haskell with input) which performs a Unix-style select over stdin the additional restriction that programs must always ter- and all tab input channels, returning Stdin if stdin is minate. Unfortunately, these limitations make Coq’s de- ready for reading or Tab t if the input channel of tab fault implementation language ill-suited for writing sys- t is ready. iselect is implemented in Coq using a tem programs like servers or browsers which must be ef- select primitive which is ultimately just a thin wrap- fectful to perform I/O and by design may not terminate. per over the Unix select system call. The Coq extraction To address the limitations of Coq’s implementation process, which converts Coq into OCaml for execution, language, we use Ynot [34]. Ynot is a Coq library can be customized to link our Coq code with OCaml im- which provides monadic types that allow us to write ef- plementations of primitives like select. Thus select fectful, non-terminating programs in Coq while retain- is exposed to Coq essentially as a primitive of the ap- ing the strong guarantees and reasoning capabilities Coq propriate monadic type. We have similar primitives for normally provides. Equipped with Ynot, we can write reading/writing on channels, and opening sockets. our browser kernel in a fairly straightforward style whose Request from User. If stdin is ready for reading, essence is shown in Figure 3. the kernel reads one character c using the read primi- Single Step of Kernel. QUARK’s kernel is essentially tive, and then responds based on the value of c. If c is a loop that continuously responds to requests from the “+”, the kernel adds a new tab to the browser. To achieve user or tabs. In each iteration, the kernel calls kstep this, it ﬁrst calls mktab to start a tab process (another

7 primitive implemented in OCaml). mktab returns a tab to the appropriate tab. The kernel responds to Display object, which contains an input and output channels to messages by forwarding them to the output process. communicate with the tab process. Once the tab t is Monads in Ynot. The code in Figure 3 shows how created, the kernel sends it a Render message using the Ynot supports an imperative programming style in Coq. write_msg function – this tells t to render itself, which This is achieved via monads which allow one to en- will later cause the tab to send a Display message to code effectful, non-terminating computations in pure the kernel. Finally, we return an updated kernel state languages like Haskell or Coq. Here we briefly show (t, t::tabs), which sets the newly created tab t as how monads enable this encoding. In the next section we the current tab, and adds t to the list of tabs. extend our discussion to show how Ynot’s monads also In addition to “+” the kernel handles several other enable reasoning about the kernel using pre- and post- cases for user input, which we omit in Figure 3. For conditions as in Hoare logic. example, when the kernel reads keys F1 through F10, We use Ynot’s ST monad which is a parameterized it switches to tabs 1 through 10, respectively, if the tab type where ST T denotes computations which may per- exists. To switch tabs, the kernel updates the currently form some I/O and then return a value of type T. To use selected tab and sends it a Render message. The ker- ST, Ynot provides a bind primitive which has the fol- nel also processes mouse events delivered by the input lowing dependent type: process to the kernel’s stdin. For now, we only han- bind : forall T1 T2, dle mouse clicks, which are delivered by the input pro- ST T1 -> (T1 -> ST T2) -> ST T2 cess using a single un-printable ASCII character (adding richer mouse events would not fundamentally change our This type indicates that, for any types T1 and T2, bind kernel or proofs). The kernel in this case calls a primi- will take two parameters: (1) a monad of type ST T1 and tive implemented in OCaml which gets the location of (2) a function that takes a value of type T1 and returns a the mouse, and it sends a MouseClick message using monad of type ST T2; then bind will produce a value the returned coordinates to the currently selected tab. We in the ST T2 monad. The type parameters T1 and T2 use this two-step approach for mouse clicks (un-printable are inferred automatically by Coq. Thus, the expression character from the input process, followed by primitive bind X Y returns a monad which represents the compu- in OCaml), so that the kernel only needs to processes a tation: run X to get a value v; run (Y v) to get a value single character at a time from stdin, which simplifies v’; return v’. the kernel and proofs. To make using bind more convenient, Ynot Request from Tab. If a tab t is ready for reading, the also defines Haskell-style “do” syntactic sugar us- kernel reads a message m from the tab using read_msg, ing Coq’s Notation mechanism, so that x <- a;b and then sends a response which depends on the mes- is translated to bind a (fun x => b), and a;b is sage. If the message is GetSoc(host, port), then the translated to bind a (fun _ => b). Finally, the tab is requesting that a socket be opened to the given Ynot library provides a return primitive of type host/port. We apply the socket policy described in Sec- forall T (v: T), ST T (where again T is inferred by tion 3.4, where domain_suffix t returns the domain Coq). Given a value v, the monad return v represents suffix of a tab t, and safe_soc(host, domsuf) ap- the computation that does no I/O and simply returns v. plies the policy (which basically checks that host is a sub-domain of domsuf). If the policy allows the socket 5 Kernel Verification to be opened, the kernel uses the send_socket to open a socket to the host, and send the socket over the chan- In this section we explain how we verified QUARK’s kernel to the tab (recall that we use Unix domain sockets to nel. First, we specify correct behavior of the kernel in send open file descriptors from one process to another). terms of traces. Second, we prove the kernel satisfies this Otherwise, it returns an Error message. specification using the full power of Ynot’s monads. Fi- In addition to GetSoc the kernel handles several other nally, we prove that our kernel specification implies our cases for tab requests, which we omit in Figure 3. For target security properties. example, the kernel responds to GetURL by retrieving a URL and returning the result. It responds to cookie store 5.1 Actions and Traces and retrieve messages by checking the security policy from Section 3.5 and forwarding the message to the ap- We verify our kernel by reasoning about the sequences of propriate cookie process (note that for simplicity, we did calls to primitives (i.e., system calls) it can make. We call not show the cookie processes in Figure 3). The kernel such a sequence a trace; our kernel specification (hence- also responds to cookie processes that are sending cookie forth “spec”) defines which traces are allowed for a cor- results back to a tab, by forwarding the cookie results rect implementation as in [30].

8 Definition Trace := list Action. Inductive tcorrect : Trace -> Prop := | tcorrect_nil: Inductive Action := tcorrect nil | ReadN : chan -> positive -> list ascii -> Action | tcorrect_step: forall tr req rsp, | WriteN : chan -> positive -> list ascii -> Action tcorrect tr -> | MkTab : tab -> Action step_correct tr req rsp -> | SentSoc : tab -> list ascii -> list ascii -> Action tcorrect (rsp ++ req ++ tr). | ... Inductive step_correct : Definition Read c b := Trace -> Trace -> Trace -> Prop := ReadN c 1 [c] | step_correct_add_tab: forall tr t, step_correct tr Figure 4: Traces and Actions. This Coq code defines the type of (MkTab t :: Read stdin "+" :: nil) externally visible actions our kernel can take. A trace is simply a list (WroteMsg t Render) of such actions. We reason about our kernel by proving properties of | step_correct_socket_true: forall tr t host port, the traces it can have. Traces are like other Coq values; in particular, is_safe_soc host (domain_suffix t) = true -> we can write functions that return traces. Read is a helper function to step_correct tr construct a trace fragment corresponding to reading a single byte. (ReadMsg t (GetSoc host port)) (SentSoc t host port) We use a list of actions to represent the trace the | step_correct_socket_false: forall tr t host port, is_safe_soc host (domain_suffix t) <> true -> kernel produces by calling primitives. Each action in step_correct tr a trace corresponds to the kernel invoking a particular (ReadMsg t (GetSoc host port) ++ tr) primitive. Figure 4 shows a partial definition of the (WroteMsg t Error ++ tr) Action datatype. For example: ReadN f n l is an | ... Action indicating that the n bytes in list l were read Figure 5: Kernel Specification. step correct is a predicate over from input channel f; MkTab t indicates that tab t was triples containing a past trace, a request trace, and a response trace; it created; SentSoc t host port indicates a socket was holds when the response is valid for the given request in the context of connected to host/port and passed to tab t. the past trace. tcorrect defines a correct trace for our kernel to be a sequence of correct steps, i.e., the concatenation of valid request and We can manipulate traces and Actions like any other response trace fragments. values in Coq. For example, we can define a function Read c b to encode the special case that a single byte step_correct can be established. For example, b was read on input channel c. Though not shown here, step_correct_add_tab states that typing “+” on we also define similar helper functions to build up trace stdin leads to the creation of a tab and sending the fragments which correspond to having read or written Render message. The step_correct_socket_true a particular message to a given component. For exam- case captures the successful socket creation case, ple, ReadMsg t (GetSoc host port) corresponds to whereas step_correct_socket_false captures the the trace fragment that results from reading a GetSoc re- error case. quest from tab t. 5.3 Monads in Ynot Revisited 5.2 Kernel Specification In the previous section, we explained Ynot’s ST monad Figure 5 shows a simplified snippet of our kernel spec. as being parameterized over a single type T. In re- The spec is a predicate tcorrect over traces with two ality, ST takes two additional parameters representing constructors, stating the two ways in which tcorrect pre- and post-conditions for the computation encoded by can be established: (1) tcorrect_nil states that the the monad. Thus, ST T P Q represents a computation empty trace satisfies tcorrect (2) tcorrect_step which, if started in a state where P holds, may perform states that if tr satisfies tcorrect and the kernel some I/O and then return a value of type T in a state takes a single step, meaning that after tr it gets a where Q holds. For technical reasons, these pre- and post- request req, and responds with rsp, then the trace conditions are expressed using separation logic, but we rsp ++ req ++ tr (where ++ is list concatenation) defer details to a tech report [24]. also satisfies tcorrect. By convention the first action Following the approach of Malecha et al. [30], we de- in a trace is the most recent. fine an opaque predicate (traced tr) to represent the The predicate step_correct defines correctness fact that at a given point during execution, tr captures for a single iteration of the kernel’s main loop: all the past activities; and (open f) to represent the fact step_correct tr req rsp holds if given the past that channel f is currently open. An opaque predicate trace tr and a request req, the response of the cannot be proven directly. This property allows us to kernel should be rsp. The predicate has several ensure that no part of the kernel can forge a proof of constructors (not all shown) enumerating the ways (traced tr) for any trace it independently constructs.

9 Axiom readn: Ynot for this trace, the field is not realized at run- forall (f: chan) (n: positive) {tr: Trace}, tr ST (list ascii) time, it is only used for proof purposes. {traced tr * open f} We define the kcorrect predicate as follows (s.tr {fun l => projects the current trace out of kernel state s): traced (ReadN f n l :: tr) * [len l = n] * open f }. Definition kcorrect (s: kstate) := traced s.tr * [tcorrect s.tr] Definition read_msg: forall (t: tab) {tr: Trace}, Now we want to show that kcorrect is an invariant that ST msg holds throughout execution of the kernel. Essentially we {traced tr * open (tchan t)} must show that (kcorrect s) is a loop invariant on the {fun m => kernel state s for the main kernel loop, which boils down traced (ReadMsg t m ++ tr) * open (tchan t)} := ... to showing that (kcorrect s) is valid as both the pre- and post-condition for the loop body, kstep as shown in Figure 6: Example Monadic Types. This Coq code shows the Figure 3. monadic types for the readn primitive and for the read msg func- As mentioned previously, Coq will ask us to prove im- tion which is implemented in terms of readn. In both cases, the first expression between curly braces represents a pre-condition and the sec- plications between the post-condition of one monad and ond represents a post-condition. The asterisk (*) may be read as normal the pre-condition of the next. While these proofs are ul- conjunction in this context. timately spelled out in full formal detail, Coq provides facilities to automate a substantial portion of the proof Thus (traced tr) can only be true for the current trace process. Ynot further provides a handful of sophisticated tr. tactics which helped automatically dispatch tedious, re- Figure 6 shows the full monadic type for the readn peatedly occurring proof obligations. We had to manu- primitive, which reads n bytes of data and returns it. ally prove the cases which were not handled automati- The * connective represents the separating conjunc- cally. While we have only shown the key kernel invari- tion from separation logic. For our purposes, con- ant here, in the full implementation there are many ad- sider it as a regular conjunction. The precondition of ditional Hoare predicates for the intermediate goals be- (readn f n tr) states that tr is the current trace and tween program points. We defer details of these predi- that f is open. The post-condition states that the trace cates and the manual proof process to [24], but discuss after readn will be the same as the original, but with proof effort in Section 6. an additional (ReadN f n l) action at the beginning, where the length of l is equal to n (len l = n is a regular predicate, which is lifted using square brackets into 5.5 Security Properties a separation logic predicate). After the call, the channel Our security properties are phrased as theorems about the f is still open. spec. We now prove that our spec implies these key secu- The full type of the Ynot bind operation makes sure rity properties, which we intend to hold in QUARK. Fig- that when two monads are sequenced, the post-condition ure 7 shows these key theorems, which correspond pre- of the first monad implies the pre-condition of the sec- cisely to the security properties outlined in Section 3.6. ond. This is achieved by having bind take an additional State Integrity. The first security property, third argument, which is a proof of this implication. The kstate dep user, ensures that the kernel state only syntactic sugar for x <- a;b is updated to pass the wild- changes in response to the user pressing a “control key” card “_” for the additional argument. When processing (e.g. switching to the third tab by pressing F3). The the definition of our kernel, Coq will enter into an inter- theorem establishes this property by showing its contra- active mode that allows the user to construct proofs to positive: if the kernel steps by responding with rsp to re- fill in these wildcards. This allows us to prove that the quest req after trace tr and no “control keys” were read post-condition of each monad implies the pre-condition from the user, then the kernel state remains unchanged of the immediately following monad in Coq’s interactive by this step. The function proj_user_control, not proof environment. shown here, simply projects from the trace all actions of the form (Read c stdin) where c is a control key. 5.4 Back to the Kernel The function kernel_state, not shown here, just com- putes the kernel state from a trace. We also prove that at We now return to our kernel from Figure 3 and show how the beginning of any invocation to kloop in Figure 3, all we prove that it satisfies the spec from Figure 5. We fields of s aside from tr are equal to the corresponding augment the kernel state to additionally include the trace field in (kernel_state s.tr). of the kernel so far, and we update our kernel code to Response Integrity. The second security property, maintain this tr field. By using a special encoding in kresponse dep kstate, ensures that every kernel re-

10 Theorem kstate_dep_user: is consistent with our notion of requests as inputs and forall tr req rsp, responses as outputs. step_correct tr req rsp -> proj_user_control tr Tab Non-Interference. The second security property, = proj_user_control (rsp ++ req ++ tr) -> tab NI, states that the kernel’s response to a tab is not kernel_state tr = kernel_state (rsp ++ req ++ tr). affected by any other tab. In particular, tab NI shows that if in the context of a valid trace, tr1, the kernel Theorem kresponse_dep_kstate: forall tr1 tr2 req rsp, responds to a request req from tab t with rsp1, then kernel_state tr1 = kernel_state tr2 -> the kernel will respond to the same request req with an step_correct tr1 req rsp -> equivalent response in the context of any other valid trace step_correct tr2 req rsp. tr2 which also contains tab t, irrespective of what other Theorem tab_NI: tabs are present in tr2 or what actions they take. Note forall tr1 tr2 t req rsp1 rsp2, that this property holds in particular for the case where tcorrect tr1 -> tcorrect tr2 -> trace tr2 contains only tab t, which leads to the follow- from_tab t req -> (cur_tab tr1 = Some t <-> cur_tab tr2 = Some t) -> ing corollary: the kernel’s response to a tab will be the step_correct tr1 req rsp1 -> same even if all other tabs did not exist step_correct tr2 req rsp2 -> The formal statement of the theorem in Figure 7 is rsp1 = rsp2 \/ made slightly more complicated because of two issues. (exists m, rsp1 = WroteCMsg (cproc_for t tr1) m /\ rsp2 = WroteCMsg (cproc_for t tr2) m). First, we must assume that the focused tab at the end of tr1 (denoted by cur tab tr1) is t if and only if the Theorem no_xdom_sockets: forall tr t, focused tab at the end of tr2 is also t. This additional tcorrect tr -> assumption is needed because the kernel responds differ- In (SendSocket t host s) tr -> is_safe_soc host (domain_suffic t). ently based on whether a tab is focused or not. For example, when the kernel receives a Display message from a Theorem no_xdom_cookie_set: forall tr1 tr2 cproc, tab (indicating that the tab wants to display its rendered tcorrect (tr1 ++ SetCookie key value cproc :: tr2) -> exists tr t, page to the user), the kernel only forwards the message (tr2 = (SetCookieRequest t key value :: tr) /\ to the output process if the tab is currently focused. is_safe_cook (domain cproc) (domain_suffix t)) The second complication is that the communication channel underlying the cookie process for t’s domain Theorem dom_bar_correct: forall tr, tcorrect tr -> dom_bar tr = domain_suffix (cur_tab tr). may not be the same between tr1 and tr2. Thus, in the case that kernel responds by forwarding a valid re- Figure 7: Kernel Security Properties. This Coq code shows how quest from t to its cookie process, we guarantee that the traces allow us to formalize QUARK’s security properties. kernel sends the same payload to the cookie process corresponding to t’s domain. sponse depends solely on the request and the kernel state. Note that, unlike kresponse dep kstate, tab NI This delineates which parts of a trace can affect the ker- does not require tr1 and tr2 to induce the same kernel’s behavior: for a given request req, the kernel will nel state. Instead, it merely requires the request req to produce the same response rsp, for any two traces that be from a tab t, and tr1 and tr2 to be valid traces that induce the same kernel state, even if the two traces have both contain t (indeed, t must be on both traces other- completely different sets of requests/responses (recall wise the step correct assumptions would not hold). that the kernel state only includes the current tab and Other than these restrictions, tr1 and tr2 may be arbi- the set of tabs, and most request responses don’t change trarily different. They could contain different tabs from these). Since the kernel state depends only the user’s different domains, have different tabs focused, different control key inputs, this theorem immediately establishes cookie processes, etc. the fact that our browser will never allow one component Response Integrity and Tab Non-Interference provide to inﬂuence how the kernel treats another component un- different, complimentary guarantees. Response Integrity less the user intervenes. ensures the response to any request req is only affected Note that kresponse dep kstate shows that the ker- by control keys and req, while Tab Non-Interference nel will produce the same response given the same re- guarantees that the response to a tab request does not leak quest after any two traces that induce the same kernel information to another tab. Note that Response Integrity state. This may seem surprising since many of the ker- could still hold for a kernel which mistakenly sends re- nel’s operations produce nondeterministic results, e.g., sponses to the wrong tab, but Tab Non-Interference pre- there is no way to guarantee that two web fetches of the vents this. Similarly, Tab Non-Interference could hold same URL will produce the same document. However, for a kernel which allows a tab to affect how the kernel such nondeterminism is captured in the request, which responds to a cookie process, but Response Integrity pre-

11 cludes such behavior. Component Language Lines of code It is also important to understand that tab NI proves Kernel Code Coq 859 the absence of interference as caused by the kernel, not Kernel Security Properties Coq 142 by other components, such as the network or cookie pro- Kernel Proofs Coq 4,383 cesses. In particular, it is still possible for two websites to Kernel Primitive Specification Coq 143 Kernel Primitives Ocaml/C 538 communicate with each other through the network, caus- Tab Process Python 229 ing one tab to affect another tab’s view of the web. Sim- Input Process Python 60 ilarly, it is possible for one tab to set a cookie which is Output Process Python 83 read by another tab, which again causes a tab to affect an- Cookie Process Python 135 other one. For the cookie case, however, we have a sep- Python Message Lib Python 334 arate theorem about cookie integrity and confidentiality WebKit Modifications C 250 which states that cookie access control is done correctly. WebKit C/C++ 969,109 Note that this property is an adaptation of the tra- Figure 8: QUARK Components by Language and Size. ditional non-interference property. In traditional non- interference, the program has ”high” and ”low” inputs use GetURL when accessing websites that do not match and outputs; a program is non-interfering if high inputs its domain suffix, thus restricting the tab to only access never affect low outputs. Intuitively, this constrains the publicly available data from other domains. program to never reveal secret information to untrusted Cookie Integrity/Confidentiality. The fourth secu- principles. rity property states cookie integrity and confidentiality. We found that this traditional approach to non- As an example of how cookies are processed, consider interference fits poorly with our trace-based verification the following trace when a cookie is set: approach. In particular, because the browser is a non- terminating, reactive program, the ”inputs” and ”out- SetCookie key value cproc :: puts” are infinite streams of data. SetCookieRequest tab key value :: ... Previous research [11] has adapted the notion of non- First, the SetCookieRequest action occurs, stat- interference to the setting of reactive programs like ing that a given tab just requested a cookie (in browsers. They provide a formal definition of non- fact, SetCookieRequest is just defined in terms of a interference in terms of possibly infinite input and out- ReadMsg action of the appropriate message). The ker- put streams. A program at a particular state is non- nel responds with a SetCookie action (defined in terms interfering if it produces similar outputs from similar in- of WroteMsg), which represents the fact that the kernel puts. The notion of similarity is parameterized in their sent the cookie to the cookie process cproc. The ker- definition; they explore several options and examine the nel implementation is meant to find a cproc whose do- consequences of each definition for similarity. main suffix corresponds to the tab. This requirement is Our tab non-interference theorem can be viewed in given in the theorem no xdom cookie set, which en- terms of the definition from [11], where requests are “in- codes cookie integrity. It requires that, within a correct puts” and responses are “outputs”; essentially, our the- trace, if a cookie process is ever asked to set a cookie, orem shows the inductive case for potentially infinite then it is in immediate response to a cookie set request streams. Adapting our definition to fit directly in the for the same exact cookie from a tab whose domain framework from [11] is complicated by the fact that we matches that of the cookie process. There is a similar deal with a unified trace of input and output events in the theorem no xdom cookie get, not shown here, which sequence they occur instead of having one trace of input encodes cookie confidentiality. events and a separate trace of output events. In future Domain Bar Integrity and Correctness. The fifth work, we hope to refine our notion of non-interference property states that the domain bar is equal to the domain to be between domains instead of tabs, and we believe suffix of the currently selected tab, which encodes the that applying the formalism from [11] will be useful in correctness of the address bar. achieving this goal. Unlike [11], we prove a version of non-interference for a particular program, the QUARK 6 Evaluation browser kernel, directly in Coq. No Cross-domain Socket Creation. The third secu- In this section we evaluate QUARK in terms of proof ef- rity property, no xdom sockets, ensures that the ker- fort, trusted computing base, performance, and security. nel never delivers a socket bound to domain d to a tab Proof Effort and Component Sizes. QUARK com- whose domain does not match d. This involves check- prises several components written in various languages; ing URL suffixes in a style very similar to the cookie we summarize their sizes in Figure 8. All Python com- policy as discussed earlier. This property forces a tab to ponents share the “Python Message Lib” for messaging

12 5

with the kernel. Implementing QUARK took about 6 per- not optimized + socket (same origin) + socket (whitelist) + cookie cache

son months, which includes several iterations redesign- 4 ing the kernel, proofs, and interfaces between components. Formal shim veriﬁcation saved substantial effort: 3 we guaranteed our security properties for a million lines of code by reasoning just 859. 2 Trusted Computing Base. The trusted computing

Time (Normalized Time WebKit) to (Normalized 1

base (TCB) consists of all system components we as- Load sume to be correct. A bug in the TCB could invalidate 0 our security guarantees. QUARK’s TCB includes:

• Coq’s core calculus and type checker Figure 9: QUARK Performance. This graph shows QUARK load • Our formal statement of the security properties times for the Alexa Top 10 Web sites, normalized to stock WebKit’s • load times. In each group, the leftmost bar shows the unoptimized load Several primitives used in Ynot time, the rightmost bar shows the load time in the final, optimized ver- • Several primitives unique to QUARK sion of QUARK, and intermediate bars show how additional optimiza- • The Ocaml compiler and runtime tions improve performance. Smaller is better. • The underlying Operating System kernel Performance. We evaluate our approach’s perfor- • Our chroot sandbox mance impact by comparing QUARK’s load times to stock WebKit. Figure 9 shows QUARK load times for Because Coq exploits the Curry-Howard Isomor- the top 10 Alexa Web sites, normalized to stock We- phism, its type checker is actually the “proof checker” we bKit. QUARK’s overhead is due to factoring the browser have mentioned throughout the paper. We assume that into distinct components which run in separate processes our formal statement of the security properties correctly and explicitly communicate through a formally verified reflects how we understand them intuitively. We also as- browser kernel. sume that the primitives from Ynot and those we added in By performing a few simple optimizations, the final QUARK correctly implement the monadic type they are version of QUARK loads large, sophisticated websites axiomatically assigned. We trust the OCaml compiler with only 24% overhead. This is a substantial improve- and runtime since our kernel is extracted from Coq and ment over a na¨ıve implementation of our architecture, run as an OCaml program. We also trust the operating shown by the left-most “not-optimized” bars in Figure 9. system kernel and our traditional chroot sandbox to pro- Passing bound sockets to tabs, whitelisting content distri- vide process isolation, specifically, our design assumes bution networks for major websites, and caching cookie the sandboxing mechanism restricts tabs to only access accesses, improves performance by 62% on average. resources provided by the kernel, thus preventing com- The WebKit baseline in Figure 9 is a full-featured promised tabs from commuting over covert channels. browser based on the Python bindings to WebKit. These Our TCB does not include WebKit’s large code base or bindings are simply a thin layer around WebKit’s C/C++ the Python implementation. This is because a compro- implementation which provide easy access to key call- mised tab or cookie process can not affect the security backs. We measure 10 loads of each page and take the guarantees provided by kernel. Furthermore, the TCB average. Over all 10 sites, the average slowdown in load- does not include the browser kernel code, since it has time is 24% (with a minimum of 5% for blogger and a been proved correct. maximum of 42% for yahoo). Ideally, QUARK will take advantage of previous for- We also measured load-time for the previous version mally verified infrastructure to minimize its TCB. For of QUARK, just before rectangle-based rendering was example, by running QUARK in seL4 [27], compiling added. In this previous version, the average load-time QUARK’s ML-like browser kernel with the MLCom- was only 12% versus 24% for the current version. The pCert compiler [1], and sandboxing other QUARK com- increase in overhead is due to additional communica- ponents with RockSalt [32], we could drastically reduce tion with the kernel during incremental rendering. De- our TCB by eliminating its largest components. In this spite this additional overhead in load time, incremental light, our work shows how to build yet another piece of rendering is preferable because it allows QUARK to dis- the puzzle (namely a verified browser) needed to for a play content to the user as it becomes available instead fully verified software stack. However, these other ver- of waiting until an entire page is loaded. ified building blocks are themselves research prototypes Security Analysis. QUARK provides strong, formal which, for now, makes them very difficult to stitch to- guarantees for security policies which are not fully com- gether as a foundation for a realistic browser. patible with traditional web security policies, but still

13 provide some of the assurances popular web browsers actions on the server side. This guarantee prohibits a seek to provide. large class of attacks, e.g., cross-site request forgery at- For the policies we have not formally verified, QUARK tacks against Amazon domains from non-Amazon do- offers exactly the same level of traditional, unverified mains. However, this policy cannot prevent cross-site enforcement WebKit provides. Thus, QUARK actually request forgery attacks against sites sharing the same provides security far beyond the handful policies we for- domain suffix with the tab, e.g., attacks from a tab on mally verified. Below we discuss the gap between the www.ucsd.edu against cse.ucsd.edu since the tab on subset of policies we verified and the full set of common www.ucsd.edu can directly connect to cse.ucsd.edu browser security policies. using a socket and cookies on cse.ucsd.edu are also The same origin policy [37] (SOP) dictates which re- available to the tab. sources a tab may access. For example, a tab is allowed Compatibility Issues. QUARK enforces non-standard to load cross-domain images using an img tag, but not security policies which break compatibility with some using an XMLHttpRequest. web applications. For example, Mashups do not work Unfortunately, we cannot easily verify this policy properly because a tab can only access cookies for its since restricting how a resource may be used after it has domain and subdomains, e.g., a subframe in a tab can- been loaded (e.g., in an img tag vs. as a JavaScript value) not properly access any page that needs user creden- requires reasoning across abstraction boundaries, i.e., an- tials identified by cookies if the subframe’s domain suf- alyzing the large, complex tab implementation instead of fix does not match with the tab’s one. This limitation treating it as a black box. arises because tabs are the principles of Quark as op- The SOP also restricts how JavaScript running in dif- posed to subframes inside tabs. Unfortunately, tabs are ferent frames on the same page may access the DOM. We too coarse grained to properly support mashups and re- could formally reason about this aspect of the SOP by tain our strong guarantees. making frames the basic protection domains in QUARK For the same reason as above, Quark cannot currently instead of tabs. To support this refined architecture, support third-party cookies. It is worth noting that third- frames would own a rectangle of screen real estate which party cookies have been considered a privacy-violating they could subdivide and delegate to sub-frames. Com- feature of the web, and there are even popular browser munication between frames would be coordinated by the extensions to suppress them. However, many websites kernel, which would allow us to formally guarantee that depend on third party cookies for full functionality, and all frame access to the DOM conforms with the SOP. our current Quark browser does not allow such cookies We only formally prove inter-domain cookie isolation. since they would violate our non-interference guarantees. Even this coarse guarantee prohibits a broad class of at- Finally, Quark does not support communications like tacks, e.g., it protects all Google cookies from any non- “postMessage” between tabs; again, this would violate Google tab. QUARK does enforce restrictions on cookie our tab non-interference guarantees. access between subdomains; it just does so using WebKit Despite these incompatibilities, Quark works well on a as unverified cookie handling code within our cookie variety of important sites such as Google Maps, Amazon, processes. Formally proving finer-grained cookie poli- and Facebook since they mostly comply with Quarks’ cies in Coq would be possible and would not require sig- security policies. More importantly, our hope is that in nificant changes to the kernel or proofs. the future Quark will provide a foundation to explore all Unfortunately, Quark does not prevent all cookie exfil- of the above features within a formally verified setting. tration attacks. If a subframe is able to exploit the entire In particular, adding the above features will require fu- tab, then it could steal the cookies of its top-level parent ture work in two broad directions. First, frames need tab, and leak the stolen cookies by encoding the informa- to become the principles in Quark instead of tabs. This tion within the URL parameter of GetURL requests. This change will require the kernel to support parent frames limitation arises because tabs are principles in Quark in- delegating resources like screen region to child frames. stead of frames. This problem can be prevented by refin- Second, finer grained policies will be required to retain ing Quark so that frames themselves are the principles. appropriate non-interference results in the face of these Our socket security policy prevents an important sub- new features, e.g. to support interaction between tabs set of cross-site request forgery attacks [9]. Quark guar- via ”postMessage”. Together, these changes would pro- antees that a tab uses a GetURL message when request- vide a form of ”controlled” interference, where frames ing a resource from sites whose domain suffix doesn’t are allowed to communicate directly, but only in a sanc- match with the tab’s one. Because our implementa- tioned manner. Even more aggressively, one may attempt tion of GetURL does not send cookies, the resources re- to re-implement other research prototypes like Mashu- quested by a GetURL message are guaranteed to be pub- pOS [19] within Quark, going beyond the web standards licly available ones which do not trigger any privileged of today, and prove properties of its implementation.

14 There are also several other features that Quark does ify the browser in a single shot, we found that an itera- not currently support, and would be useful to add, includ- tive approach was much more effective. We started with ing local storage, file upload, browser cache, browser his- a text-based browser, where the tab used lynx to gener- tory, etc. However, we believe that these are not funda- ate a text-based version of QUARK. We then evolved this mental limitations of our approach or Quark’s current de- browser into a GUI-based version based on WebKit, but sign. Indeed, most of these features don’t involve inter- with no sockets or cookies. Then we added sockets and tab communication. For the cases where they do (for ex- finally cookies. When combined with our philosophy of ample history information is passed between tabs if vis- “write the non-verified version first”, this meant that we ited links are to be correctly rendered), one would again kept a working version of the kernel written in Python have to refine the non-interference definition and theo- throughout the various iterations. Just for comparison, rems to allow for controlled flow of information. the Python kernel which is equivalent to the Coq version listed in Figure 8 is 305 lines of code. 7 Discussion Favor Ease of Reasoning. When forced to choose between adding complexity to the browser kernel or to the In this section we discuss lessons learned while develop- untrusted tab implementation, it was always better keep ing QUARK and verifying its kernel in Coq. In hindsight, the kernel as simple as possible. This helped manage the these guidelines could have substantially eased our ef- verification burden which was the ultimate bottleneck in forts. We hope they prove useful for future endeavors. developing QUARK. Similarly, when faced with a choice Formal Shim Verification. Our most essential tech- between flexibility/extensibility of code and ease of rea- nique was formal shim verification. For us, it reduced soning, we found it best to aim for ease of reasoning. the verification burden to proving a small browser kernel. Previous browsers like Chrome, OP, and Gazelle clearly 8 Conclusions demonstrate the value of kernel-based architectures. OP In this paper, we demonstrated how formal shim verifica- further shows how this approach enables reasoning about tion can be used to achieve strong security guarantees for a model of the browser. We take the next step and for- a modern Web browser using a mechanical proof assis- mally prove the actual browser implementation correct. tant. We formally proved that our browser provides tab Modularity through Trace-based Specification. We noninterference, cookie integrity and confidentiality, and ultimately specified correct browser behavior in terms address bar integrity and correctness. We detailed our of traces and proved both that (1) the implementation design and verification techniques and showed that the satisfies the spec and (2) the spec implies our secu- resulting browser, QUARK, provides a modern browsing rity properties. Splitting our verification into these two experience with performance comparable to the default phases improved modularity by separating concerns. The WebKit browser. For future research, QUARK furnishes first proof phase reasons using monads in Ynot to show a framework to easily experiment with additional web that the trace-based specification correctly abstracts the policies without re-engineering an entire browser or for- implementation. The second proof phase is no longer malizing all the details of its behavior from scratch. bound to reasoning in terms of monads – it only needs to reason about traces, substantially simplifying proofs. This modularity aided us late in development 9 Acknowledgments when we proved address bar correctness (Theorem dom_bar_correct in Figure 7). To prove this theorem, We thank Kirill Levchenko for many fruitful conversa- we only had to reason about the trace-based specifica- tions regarding shim verification. We would also like to tion, not the implementation. As a result, the proof of thank our shepherd, Anupam Datta, and the anonymous dom_bar_correct was only about 300 lines of code, reviewers for helping us improve our paper. tiny in comparison to the total proof effort. Thus, proving additional properties can be done with relatively little References effort over the trace-based specification, without having [1] http://gallium.inria.fr/ dargaye/mlcompcert.html. to reason about monads or other implementation details. ~ [2] Chrome security hall of fame. http://dev.chromium.org/ Implement Non-verified Prototype First. Another Home/chromium-security/hall-of-fame. approach we found effective was to write a non-verified [3] Public suffix list. http://publicsuffix.org/. version of the kernel code before verifying it. This allowed us to carefully design and debug the interfaces be- [4] Pwn2own. http://en.wikipedia.org/wiki/Pwn2Own. tween components and to enable the right browsing func- [5]A KHAWE,D.,BARTH,A.,LAMY, P. E., MITCHELL,J., AND SONG, D. Towards a formal foundation of web security. In tionality before starting the verification task. Proceedings of CSF 2010 (July 2010), M. Backes and A. Myers, Iterative Development. After failing to build and ver- Eds., IEEE Computer Society, pp. 290–304.

15 [6]A NSEL,J.,MARCHENKO, P., ERLINGSSON, U.,T´ AYLOR,E., [25]J ANG,D.,VENKATARAMAN,A.,SAWKA,G.M., AND CHEN,B.,SCHUFF,D.L.,SEHR,D.,BIFFLE,C., AND YEE, SHACHAM, H. Analyzing the cross-domain policies of flash ap- B. Language-independent sandboxing of just-in-time compila- plications. In In Web 2.0 Security and Privacy (W2SP 2011) (May tion and self-modifying code. In PLDI (2011), pp. 355–366. 2011). [7]B ALL, T., MAJUMDAR,R.,MILLSTEIN, T., AND RAJAMANI, [26]J IM, T., SWAMY,N., AND HICKS, M. Defeating script injec- S. K. Automatic predicate abstraction of C programs. In Pro- tion attacks with browser-enforced embedded policies. In WWW ceedings of the ACM SIGPLAN 2001 Conference on Program- (2007), pp. 601–610. ming Language Design and Implementation (Snowbird, Utah, [27]K LEIN,G.,ELPHINSTONE,K.,HEISER,G.,ANDRONICK,J., June 2001). COCK,D.,DERRIN, P., ELKADUWE,D.,ENGELHARDT,K., [8]B ARTH,A.,JACKSON,C., AND MITCHELL, J. C. Robust de- KOLANSKI,R.,NORRISH,M.,SEWELL, T., TUCH,H., AND fenses for cross-site request forgery. In ACM Conference on Com- WINWOOD, S. seL4: formal verification of an OS kernel. In puter and Communications Security (2008), pp. 75–88. SOSP (2009). [28]L EROY, X. Formal certification of a compiler back-end, or: pro- [9]B ARTH,A.,JACKSON,C., AND MITCHELL, J. C. Robust de- gramming a compiler with a proof assistant. In PLDI (2006). fenses for cross-site request forgery. In To appear at the 15th ACM Conference on Computer and Communications Security [29]M ALECHA,G.,MORRISETT,G.,SHINNAR,A., AND WIS- (CCS 2008) (2008). NESKY, R. Toward a verified relational database management system. In POPL (2010). [10]B ARTH,A.,JACKSON,C.,REIS,C., AND THE GOOGLE [30]M ALECHA,G.,MORRISETT,G., AND WISNESKY, R. Trace- CHROME TEAM. The security architecture of the Chromium browser. Tech. rep., Google, 2008. based verification of imperative programs with I/O. J. Symb. Comput. 46 (February 2011), 95–118. [11]B OHANNON,A.,PIERCE,B.C.,SJOBERG¨ , V., WEIRICH,S., [31]M ICKENS,J., AND DHAWAN, M. Atlantis: robust, extensible AND ZDANCEWIC, S. Reactive noninterference. In Proceedings execution environments for web applications. In SOSP (2011), of the 16th ACM conference on Computer and communications pp. 217–231. security (2009). [32]M ORRISETT,G.,TAN,G.,TASSAROTTI,J.,TRISTAN,J.-B., [12]C HEN, E. Y., BAU,J.,REIS,C.,BARTH,A., AND JACKSON, AND GAN, E. Rocksalt: Better, faster, stronger sfi for the x86. In C. App isolation: get the security of multiple browsers with just PLDI (2012). one. In Proceedings of the 18th ACM conference on Computer and communications security (2011). [33]N ANEVSKI,A.,MORRISETT,G., AND BIRKEDAL, L. Poly- morphism and separation in Hoare type theory. In ICFP (2006). [13]C HEN,S.,MESEGUER,J.,SASSE,R.,WANG,H.J., ANDMIN [34]N ANEVSKI,A.,MORRISETT,G.,SHINNAR,A.,GOVEREAU, WANG, Y. A systematic approach to uncover security flaws in P., AND BIRKEDAL, L. Ynot: Dependent types for imperative GUI logic. In IEEE Symposium on Security and Privacy (2007). programs. In ICFP (2008). [14]C HUGH,R.,MEISTER,J.A.,JHALA,R., AND LERNER,S. [35]P ROVOS,N.,FRIEDL,M., AND HONEYMAN, P. Preventing Staged information flow for javascript. In PLDI (2009). privilege escalation. In Proceedings of the 12th conference on [15]C OOK,B.,PODELSKI,A., AND RYBALCHENKO, A. Termina- USENIX Security Symposium - Volume 12 (2003), USENIX As- tor: Beyond safety. In CAV (2006). sociation. [16]D AS,M.,LERNER,S., AND SEIGLE, M. ESP: Path-sensitive [36]R ATANAWORABHAN, P., LIVSHITS, V. B., AND ZORN,B.G. program verification in polynomial time. In PLDI (2002). Nozzle: A defense against heap-spraying code injection attacks. In USENIX Security Symposium (2009), pp. 169–186. [17]G RIER,C.,TANG,S., AND KING, S. T. Secure web browsing [37]R UDERMAN, J. The same origin policy, 2001. with the OP web browser. In IEEE Symposium on Security and http://www.mozilla.org/projects/security/ components/same- Privacy (2008). origin.html. [18]H ENZINGER, T. A., JHALA,R.,MAJUMDAR,R., AND SUTRE, [38]S AXENA, P., AKHAWE,D.,HANNA,S.,MAO, F., MCCA- G. Lazy abstraction. In POPL (2002). MANT,S., AND SONG, D. A symbolic execution framework for [19]H OWELL,J.,JACKSON,C.,WANG,H.J., AND FAN,X. javascript. In IEEE Symposium on Security and Privacy (2010), MashupOS: operating system abstractions for client mashups. In pp. 513–528. HotOS (2007). [39]S INGH,K.,MOSHCHUK,A.,WANG,H.J., AND LEE, W. On [20]H UANG,L.-S.,WEINBERG,Z.,EVANS,C., AND JACKSON, the incoherencies in web browser access control policies. In IEEE C. Protecting browsers from cross-origin css attacks. In ACM Symposium on Security and Privacy (2010), pp. 463–478. Conference on Computer and Communications Security (2010), [40]S TAMM,S.,STERNE,B., AND MARKHAM, G. Reining in the pp. 619–629. web with content security policy. In Proceedings of the 19th in- [21]J ACKSON,C., AND BARTH, A. Beware of finer-grained origins. ternational conference on World wide web (2010), WWW ’10, In In Web 2.0 Security and Privacy (W2SP 2008) (May 2008). pp. 921–930. [41]T ANG,S.,MAI,H., AND KING, S. T. Trust and protection in the [22]J ACKSON,C.,BARTH,A.,BORTZ,A.,SHAO, W., AND illinois browser operating system. In OSDI (2010), pp. 17–32. BONEH, D. Protecting browsers from dns rebinding attacks. In ACM Conference on Computer and Communications Security [42]W ANG,H.J.,GRIER,C.,MOSHCHUK,A.,KING, S. T., (2007), pp. 421–431. CHOUDHURY, P., AND VENTER, H. The multi-principal OS construction of the gazelle web browser. Tech. Rep. MSR-TR- [23]J ANG,D.,JHALA,R.,LERNER,S., AND SHACHAM, H. An em- 2009-16, MSR, 2009. pirical study of privacy-violating information flows in JavaScript Web applications. In Proceedings of the ACM Conference Com- [43]Y ANG,X.,CHEN, Y., EIDE,E., AND REGEHR, J. Finding and puter and Communications Security (CCS) (2010). understanding bugs in C compilers. In PLDI (2011). [44]Y U,D.,CHANDER,A.,ISLAM,N., AND SERIKOV,I. [24]J ANG,D.,TATLOCK,Z., AND LERNER, S. Establishing browser Javascript instrumentation for browser security. In POPL (2007), security guarantees through formal shim verification. Tech. rep., pp. 237–249. UC San Diego, 2012.