Efficient Verification of Oblivious Algorithms with Unobservable State

ObliCheck: Efficient Verification of Oblivious Algorithms with Unobservable State Jeongseok Son Griffin Prechter Rishabh Poddar Raluca Ada Popa Koushik Sen University of California, Berkeley Abstract an attacker can reconstruct secret information such as confi- Encryption of secret data prevents an adversary from learning dential search keywords, entire sensitive documents, or secret sensitive information by observing the transferred data. Even images. though the data itself is encrypted, however, an attacker can As a result, a rich line of work designs oblivious execu- watch which locations of the memory, disk, and network are tion to prevent such side channels based on access patterns. accessed and infer a significant amount of secret information. There are two types of oblivious algorithms. The first, Oblivi- To defend against attacks based on this access pattern leak- ous RAM (ORAM) [31, 66], can be used generically to hide age, a number of oblivious algorithms have been devised. accesses to memory, and fits best for workloads of the type These algorithms transform the access pattern in a way that “point queries”. Intuitively, ORAM randomizes accesses to the access sequences are independent of the secret input data. memory. However, even the fastest ORAM scheme incurs Since oblivious algorithms tend to be slow, a go-to optimiza- polylogarithmic overhead proportional to the memory size tion for algorithm designers is to leverage space unobservable per access, which becomes prohibitively slow for processing to the attacker. However, one can easily miss a subtle detail a large amount of data as in data analytics and machine learn- and violate the oblivious property in the process of doing so. ing. For these workloads, instead, researchers have proposed In this paper, we propose ObliCheck, a checker verify- a large array of specialized oblivious algorithms, such as algo- ing whether a given algorithm is indeed oblivious. In con- rithms for joins, filters, aggregates [7, 11, 14, 19, 57, 78], and trast to existing checkers, ObliCheck distinguishes observable machine learning algorithms [36, 48, 58, 64]. These special- and unobservable state of an algorithm. It employs symbolic ized algorithms work by accessing memory according to a execution to check whether all execution paths exhibit the predefined schedule of accesses, which depends only on an same observable behavior. To achieve accuracy and efficiency, upper bound on the data size and not on data content. In this ObliCheck introduces two key techniques: Optimistic State paper, we focus on such specialized oblivious algorithms. Merging to quickly check if the algorithm is oblivious, and Oblivious algorithms in general tend to be notoriously slow Iterative State Unmerging to iteratively refine its judgment if (e.g., hundreds of times for data analytics [78] and tens of the algorithm is reported as not oblivious. ObliCheck achieves times for point queries [66]). To reduce such overhead, many ×50300 of performance improvement over conventional sym- oblivious algorithms take advantage of an effective design bolic execution without sacrificing accuracy. strategy: they leverage special regions of memory that are not observable to the attacker. Such unobservable memory, albeit 1 Introduction often smaller than the observable one, allows the algorithm to Security and privacy have become crucial requirements in the make direct and fast accesses to data. It essentially works as modern computing era. To preserve the secrecy of sensitive a cache for the slower observable memory, which is accessed data, data encryption is now widely adopted and prevents obliviously. Different techniques choose different resources an adversary from learning secret information by observing as unobservable. For example, some techniques [7,51,58,60] the data content. However, attackers can still infer secret in- treat registers as unobservable but all the cache and main mem- formation by observing access patterns to the data. Even ory as observable in the context of hardware enclaves such as though the data itself is encrypted, an attacker can watch Intel SGX. GhostRider [46] employs an on-chip scratchpad which locations of the memory, disk, and network are ac- as an unobservable space to make the memory trace oblivious. cessed. Such concerns are growing with the increasing adop- Certain techniques focus on the network as being observable tion of hardware enclaves such as Intel SGX [49], which by an attacker and the internal secure region of a machine as provides memory encryption but does not hide accesses to unobservable [57, 78]. These techniques show one or more memory. By simply observing the access patterns, several orders of magnitude [78] performance improvement by lever- research efforts [23, 37, 42, 43, 47, 57, 58, 71] have shown that aging the unobservable memory. While generic algorithms like ORAM are heavily scruti- both send an identically-sized encrypted message over the nized, specialized algorithms designed for different settings network, our checker can conclude both branches maintain do not receive the same level of scrutiny. Further, these al- the same observable state (the size of the message and its des- gorithms can be quite complex, balancing rich computations tination) since the message content itself is encrypted (thus with efficiency. The designer can miss a subtle detail and vio- unobservable). late the oblivious property. Currently, an oblivious algorithm However, a naïve application of symbolic execution does comes with written proof, and users must verify the proof not scale. The main challenge with employing symbolic ex- manually. As a result, recent research efforts devise ways to ecution is that the program state quickly blows up as the check whether an algorithm is oblivious in an automated way number of branches in the program increases, making it in- (by looking for a secret dependent branch) using taint analy- feasible to complete the check for many algorithms. While sis [15,33,59,77]. These techniques, however, cannot discern traditional state merging [10,27, 27, 30,63] can merge states unobservable state and would classify an algorithm as not to alleviate the path explosion problem to some extent, it only oblivious because of its non-oblivious accesses to unobserv- works when the values in two different paths are the same. To able state. Thus, they cannot model a vast array of modern address this problem, ObliCheck employs a novel optimistic oblivious algorithms. state merging technique (§4), which leverages the domain- We propose ObliCheck, a checker that can verify oblivious specific knowledge of oblivious algorithms that the actual algorithms having unobservable state in an efficient and accu- values are unobservable to the attacker. ObliCheck uses this rate manner. ObliCheck allows algorithm designers to write insight to optimistically merge two different unobservable an oblivious algorithm using ObliCheck’s APIs to distinguish values by introducing a new unconstrained symbolic value for between observable and unobservable space. Based on this over-approximating the two unobservable values. distinction, ObliCheck precisely records the access patterns Such “aggressive” state merging for symbolic values is visible to an attacker. Then, ObliCheck automatically proves effective at tackling path explosion, but could result in a false that the algorithm satisfies the obliviousness condition. Oth- “not-oblivious” prognosis. If a symbolic variable, x, is merged erwise, ObliCheck provides counterexamples – i.e., inputs into an unconstrained new symbolic variable y, later accesses that violate the oblivious property – and identifies program to y in a conditional statement may trigger an execution path statements that trigger non-oblivious behavior. which would have been impossible if x were not replaced with ObliCheck primarily aims to verify the oblivious property unconstrained y. To address this issue, we devise a technique of an algorithm, not the actual implementation of the algo- called iterative state unmerging (§5). ObliCheck records sym- rithm. We use a subset of JavaScript for modeling algorithms. bolic variables merged during the execution. Then, it iter- We made this choice to leverage an existing program analy- atively refines its judgment by backtracking the execution sis framework, Jalangi [61], for ObliCheck’s implementation. and unmerges a part of merged variables which may have Moreover, we focus on a subset of the language because ver- caused the wrong prognosis. This iterative probing process ification of programs in the full JavaScript language could continues until it either classifies the algorithm as oblivious, result in verification conditions having undecidable theories. or completes the refinement process. Automated verification fails for undecidable theories. We ex- Although iterative state unmerging costs extra symbolic pect that an algorithm designer will use ObliCheck to verify execution, we find that the overhead is tolerable. This is be- algorithms rapidly during the algorithm design phase, instead cause our target algorithms are mostly oblivious: an algorithm of trying to verify the algorithm manually. designer who wants to check their algorithm for obliviousness likely did a decent job making much of the algorithm 1.1 Techniques and contributions oblivious, but is worried about subtle mistakes. Hence, most We observed that taint analysis used in prior work [15, 33, 59, algorithms require few iterations of the iterative state unmerg- 77] is too ‘coarse’

Load more