Towards High Assurance HTML5 Applications
Devdatta Akhawe
Electrical Engineering and Computer Sciences
University of California at Berkeley
Technical Report No. UCB/EECS-2014-56
http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-56.html
May 7, 2014

Copyright © 2014, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

Towards High Assurance HTML5 Applications
by Devdatta Madhav Akhawe

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley

Committee in charge:
Professor Dawn Song, Chair
Professor David Wagner
Professor Brian Carver

Spring 2014

Copyright 2014 by Devdatta Madhav Akhawe

Abstract

Rich client-side applications written in HTML5 are proliferating across diverse platforms such as mobile devices, commodity PCs, and the web platform. These client-side HTML5 applications increasingly access sensitive data, including users' personal and social data, sensor data, and capability-bearing tokens. Instead of following the classic client/server model of web applications, modern HTML5 applications are complex client-side applications that may call some web services and that run with ambient privileges to access sensitive data or sensors. The goal of this work is to enable the creation of higher-assurance HTML5 applications.
We propose two major directions: first, we present the use of formal methods to analyze web protocols for errors. Second, we use existing primitives to enable practical privilege separation for HTML5 applications. We also propose a new primitive for complete mediation of HTML5 applications. Our proposed designs considerably ease analysis and improve auditability.

To my parents.

Contents

List of Figures
List of Tables

1 Introduction
  1.1 Towards a Formal Foundation for Web Protocols
  1.2 Privilege Separation for HTML5 Applications
  1.3 Data Confined HTML5 Applications

2 Towards A Formal Foundation for Web Protocols
  2.1 Introduction
  2.2 General Model
  2.3 Implementation in Alloy
  2.4 Case Studies
  2.5 Measurement
  2.6 Summary of Results

3 Privilege Separation for HTML5 Applications
  3.1 Introduction
  3.2 Problem and Approach Overview
  3.3 Design
  3.4 Implementation
  3.5 Case Studies
  3.6 Performance Benchmarks
  3.7 Summary of Results

4 Data-Confined HTML5 Applications
  4.1 Introduction
  4.2 Data Confinement in HTML5 applications
  4.3 Problem Formulation
  4.4 The Data Confined Sandbox
  4.5 Implementation
  4.6 Case Studies
  4.7 Summary of Results

5 Related Work
  5.1 Formal Verification of Security Protocols
  5.2 Privilege Separation for Web Applications
  5.3 Data-confined HTML5 Applications

6 Conclusion

Bibliography

List of Figures

2.1 The metamodel of our formalization of web security. Red unmarked edges represent the 'extends' relationship.
2.2 Vulnerability in Referer Validation. This figure is adapted from [85], with the attack (dashed line) added.
2.3 Counterexample generated by Alloy for the HTML5 form vulnerability.
2.4 The WebAuth protocol.
2.5 Log-scale graph of analysis time for increasing scopes. The SAT solver ran out of memory for scopes greater than eight after the fix.
3.1 CDF of percentage of functions in an extension that make privileged calls (X axis) vs. the fraction of extensions studied (in percentage) (Y axis). The lines for 50% and 20% of extensions as well as for 5% and 20% of functions are marked.
3.2 High-level design of our proposed architecture.
3.3 Sequence of events to run application in sandbox. Note that only the bootstrap code is sent to the browser to execute. Application code is sent directly to the parent, which then creates a child with it.
3.4 Typical events for proxying a privileged API call. The numbered boxes outline the events. The event boxes span the components involved. For example, event 4 involves the parent shim calling the policy code.
3.5 Frequency distribution of event listeners and API calls used by the top 42 extensions requiring the tabs permission.
4.1 High-level design of an application running in a DCS. The only component that runs privileged is the parent. The children run in data-confined sandboxes, with no ambient privileges and all communication channels monitored by the parent.

List of Tables

2.1 Statistics for each case study.
3.1 Overview of case studies. The TCB sizes are in KB. The lines changed column only counts changes to application code, and not application independent shims and parent code.
4.1 Comparison of current solutions for data confinement.
4.2 List of our case studies, as well as the individual components and policies in our redesign.
4.3 Confidentiality Invariants in the Top 20 Google Chrome Extensions.

Acknowledgments

First, I want to thank my advisor Dawn for being such a fantastic advisor and guide through my graduate career. Also, thanks to David Wagner whose advice and guidance I have always sought and received during my graduate career. Thanks also to my committee members Brian Carver and George Necula for their help and guidance.

The research presented in this thesis is a joint effort.
Special thanks go to all my co-authors: Adam Barth, Warren He, Eric Lam, Frank Li, John Mitchell, Prateek Saxena, and Dawn Song. Over the course of my graduate life I have co-authored papers with nearly 30 different co-authors. These collaborators, all my friends in the Security group, and all my teachers at Berkeley have directly impacted my research, my work, and my evolution as a researcher, and I remain thankful to them all. I am extremely lucky to have been surrounded by and worked with such a tremendously talented group of people over the past five years.

My decision to pursue graduate studies was in large part due to all the great mentors and teachers I have had over the years. I would like to particularly thank my undergraduate advisor, Sundar Balasubramaniam, as well as Helen Wang and Xiaofeng Fan for their fantastic mentoring and advice. Without their help and support, it is unlikely I would have even applied to graduate school.

Thanks also to all my friends, from Pilani to Berkeley, who made the stress of graduate life easy to manage. You know who you are and I feel blessed to call such an amazing group of people my friends.

Finally, and most importantly, I want to thank my extended family: my brother, my cousins, my uncles and aunts, and their respective families for their amazing love, care, and guidance over the years. I would like to particularly thank all four of my aunts: they ensured I got an education and never lost focus.

Chapter 1
Introduction

Rich client-side HTML5 applications, including packaged browser applications (Chrome Apps) [57], browser extensions [56], Windows 8 Metro applications [98], and applications in new browser operating systems (B2G [105], Tizen [134]), are fast proliferating on diverse computing platforms. These applications run with access to sensitive data such as the user's browsing history, personal and social data, financial documents, and capability-bearing tokens that grant access to these data.
A recent study reveals that 58% of the 5,943 Google Chrome browser extensions studied require access to the user's browsing history, and 35% request permissions to the user's data on all websites [34]. In addition, the study found that 67% of 34,370 third-party Facebook applications analyzed have access to the user's personal data [34].¹ HTML5 applications also form a significant chunk of mobile applications; Chin et al. recently found that 70% of the smartphone applications they surveyed on Google Play rely on HTML5 code in some form [35]. These HTML5 applications often execute with access to the same sensors available to native mobile applications, including private data from GPS receivers, accelerometers, and cameras. These trends indicate the evolution of the client-side web from a front-end for servers to a complex application platform running privileged applications.

¹ The study measured install-time permissions, which are a lower bound for Facebook applications, since they can request further permissions at runtime.

Despite extensive prior research on detection and mitigation techniques [7, 45, 66, 82, 122], web vulnerabilities are still pervasive in HTML5 applications on emerging platforms such as browser extensions [30]. As the HTML5 platform achieves wider adoption, enabling higher assurance in HTML5 applications is critical to its success. In this thesis, we address this need.

1.1 Towards a Formal Foundation for Web Protocols

First, we present initial work on formal modeling and verification of web protocols. As we discussed above, HTML5 applications on emerging platforms are moving away from the client/server paradigm to a new paradigm of standalone HTML5 applications that call diverse web services. The security of protocols used by HTML5 applications to communicate with diverse cloud-based services is just as critical to the security of the platform as the security of the HTML5 application itself.