Wang Paper (Prepublication)
Total Page:16
File Type:pdf, Size:1020Kb
Riverbed: Enforcing User-defined Privacy Constraints in Distributed Web Services Frank Wang Ronny Ko, James Mickens MIT CSAIL Harvard University Abstract 1.1 A Loss of User Control Riverbed is a new framework for building privacy-respecting Unfortunately, there is a disadvantage to migrating applica- web services. Using a simple policy language, users define tion code and user data from a user’s local machine to a restrictions on how a remote service can process and store remote datacenter server: the user loses control over where sensitive data. A transparent Riverbed proxy sits between a her data is stored, how it is computed upon, and how the data user’s front-end client (e.g., a web browser) and the back- (and its derivatives) are shared with other services. Users are end server code. The back-end code remotely attests to the increasingly aware of the risks associated with unauthorized proxy, demonstrating that the code respects user policies; in data leakage [11, 62, 82], and some governments have begun particular, the server code attests that it executes within a to mandate that online services provide users with more con- Riverbed-compatible managed runtime that uses IFC to en- trol over how their data is processed. For example, in 2016, force user policies. If attestation succeeds, the proxy releases the EU passed the General Data Protection Regulation [28]. the user’s data, tagging it with the user-defined policies. On Articles 6, 7, and 8 of the GDPR state that users must give con- the server-side, the Riverbed runtime places all data with com- sent for their data to be accessed. Article 17 defines a user’s patible policies into the same universe (i.e., the same isolated right to request her data to be deleted; Article 32 requires instance of the full web service). The universe mechanism a company to implement “appropriate” security measures allows Riverbed to work with unmodified, legacy software; for data-handling pipelines. Unfortunately, requirements like unlike prior IFC systems, Riverbed does not require devel- these lack strong definitions and enforcement mechanisms at opers to reason about security lattices, or manually annotate the systems level. Laws like GDPR provide little technical code with labels. Riverbed imposes only modest performance guidance to a developer who wants to comply with the laws overheads, with worst-case slowdowns of 10% for several while still providing the sophisticated applications that users real applications. enjoy. The research community has proposed information flow 1 INTRODUCTION control (IFC) as a way to constrain how sensitive data spreads In a web service, a client like a desktop browser or smart- throughout a complex system [35, 42]. IFC assigns labels to phone app interacts with datacenter machines. Although program variables or OS-level resources like processes and smartphones and web browsers provide rich platforms for pipes; given a partial ordering which defines the desired secu- computation, the core application state typically resides in rity relationships between labels, an IFC system can enforce cloud storage. This state accrues much of its value from server- rich properties involving data secrecy and integrity. Unfortu- side computations that involve no participation (or explicit nately, traditional IFC is too burdensome to use in modern, consent) from end-user devices. large-scale web services. The reason is that creating and main- By running the bulk of an application atop VMs in a com- taining a partial ordering of labels is too difficult—the average modity cloud, developers receive two benefits. First, develop- programmer or end-user struggles to reason about data safety ers shift the burden of server administration to professional via the abstraction of fine-grained label hierarchies. As a re- datacenter operators. Second, developers gain access to scale- sult, no popular, large-scale web service uses IFC to restrict out resources that vastly exceed those that are available to how sensitive data is processed and shared. a single user device. Scale-out storage allows developers to co-locate large amounts of data from multiple users; scale-out 1.2 Our Solution: Riverbed computation allows developers to process the co-located data In this paper, we introduce Riverbed, a distributed web plat- for the benefit of users (e.g., by providing tailored search form for safeguarding the privacy of user data. Riverbed pro- results) and the benefit of the application (e.g., by providing vides benefits to both web developers and end users. To web targeted advertising). developers, Riverbed provides a practical IFC system which Web content who share the same data flow policies. We call each copy a (x.com) universe. Since users in the same universe allow the same Browser Application code Universes types of data manipulations, any policy violations indicate HTTP POST upload.html true problems with the application (e.g., the application tried <Sensitive user data> HTTP response IO to transmit sensitive data to a server that was not whitelisted by the inhabitants of the universe). Riverbed Riverbed managed web proxy runtime (IFC) Before a user’s Riverbed proxy sends data to a server, the x.com User-defined policy Trusted hardware proxy employs remote attestation [9, 15] to verify that the Riverbed y.com server is running an IFC-enforcing Riverbed runtime. Impor- policies policy z.com tantly, a trusted server will perform next-hop attestation—the policy HTTP POST upload.html <Sensitive user data> server will not transmit sensitive data to another network + IO devices endpoint unless that endpoint is an attested Riverbed run- x.com policy time whose TLS certificate name is explicitly whitelisted by Figure 1: Riverbed’s architecture. The user’s client device is on the the user’s data flow policy. In this manner, Riverbed enables left, and the web service is on the right. Unmodified components are controlled data sharing between machines that span different white; modified or new components are grey. domains. allows developers to easily “bolt on” stronger security poli- 1.3 Our Contributions cies for complex applications written in standard managed To the best of our knowledge, Riverbed is the first distributed languages. To end users, Riverbed provides a straightforward IFC framework which is practical enough to support large- mechanism to verify that server-side code is running within a scale, feature-rich web services that are written in general- privacy-preserving environment. purpose managed languages. Riverbed preserves the tradi- Figure 1 describes Riverbed’s architecture. For each tional advantages of cloud-based applications, allowing devel- Riverbed web service, a user defines an information flow opers to offload administrative tasks and leverage scale-out policy using simple, human-understandable constraints like resources. However, Riverbed’s universe mechanism, coupled “do not save my data to persistent storage” or “my data may with a simple policy language, provides users with understand- only be sent over the network to x.com.” In the common able, enforceable abstractions for controlling how datacenters case, users employ predefined, templated policy files that are manipulate sensitive data. Riverbed makes it easier for devel- designed by user advocacy groups like the EFF. When a user opers to comply with laws like GDPR—users give explicit generates an HTTP request, a web proxy on the user’s de- consent for data access via Riverbed policies, with server-side vice transparently adds the appropriate data flow policy as a universes constraining how user data may be processed, and special HTTP header. where its derivatives can be stored. We have ported several non-trivial applications to Riverbed, Within a datacenter, Riverbed leverages the fact that many and written data flow policies for those applications. Our services run atop managed runtimes like Python, .NET, or experiments show that Riverbed enforces policies with worst- the JVM. Riverbed modifies such a runtime to automatically case end-to-end overheads of 10%. Riverbed also supports taint incoming HTTP data with the associated user policies. legacy code with little or no developer intervention, making As the application derives new data from tainted bytes, the it easy for well-intentioned (but average-skill) developers to runtime ensures that the new data is also marked as tainted. If write services that respect user privacy. the application tries to externalize data via the disk or the net- work, the externalization is only allowed if it is permitted by 2 RELATED WORK user policies. The Riverbed runtime terminates an application In this section, we compare Riverbed to representative in- process which attempts a disallowed externalization. stances of prior IFC systems. At a high level, Riverbed’s In Riverbed, application code (i.e., the code which the innovation is the leveraging of universes and human- managed runtime executes) is totally unaware that IFC is understandable, user-defined policies to enforce data flow con- occurring. Application developers have no way to read, write, straints in IFC-unaware programs. Riverbed enforces these create, or destroy taints and data flow policies. The advantage constraints without requiring developers to add security anno- of this scheme is that it makes Riverbed compatible with tations to source code. code that has not been explicitly annotated with traditional IFC labels. However, different end users will likely define 2.1 Explicit Labeling incompatible data flow policies. As a result, policy-agnostic In a classic IFC system, developers explicitly label program code would quickly generate a policy violation for some state, and construct a lattice which defines the ways in which subset of users; Riverbed would then terminate the application. differently-labeled state can interact. Roughly speaking, a To avoid this problem, Riverbed spawns multiple, lightweight program is composed of assignment statements; the IFC sys- copies of the back-end service, one for each set of users tem only allows a particular assignment if all of the policies involving righthand objects are compatible with the policies TaintDroid works on unmodified applications.