
Implementing a Secure Abstract Machine

Adriaan Larmuseau¹, Marco Patrignani², Dave Clarke¹,²
¹Dept. of IT, Uppsala University, Sweden
²iMinds-Distrinet, KU Leuven, Belgium
fi[email protected][email protected]

ABSTRACT

Abstract machines are both theoretical models used to study language properties and practical models of language implementations. As with all language implementations, abstract machines are subject to security violations by the context in which they reside. This paper presents the implementation of an abstract machine for ML that preserves the abstractions of ML in possibly malicious, low-level contexts. To guarantee this security result, we make use of a low-level memory isolation mechanism and derive the formalisation of the machine through a methodology, whose every step is accompanied by formal properties that ensure that the step has been carried out properly. We provide an implementation of the abstract machine and analyse its performance.

CCS Concepts

•Security and privacy → Software security engineering; Domain-specific security and privacy architectures;

Keywords

Abstract Machine; Memory Protection

1. INTRODUCTION

Abstract machines are both theoretical models used to study language properties and practical models of language implementations. Nowadays, several languages, especially functional ones, are implemented using abstract machines. For example, OCaml's bytecode runs on the Zinc abstract machine and the Glasgow Haskell Compiler uses the Spineless Tagless G-machine internally [7].

When security-sensitive applications are run by an abstract machine, it is crucial that the abstract machine implementation does not leak security-sensitive information. Implementation mistakes aside, abstract machine implementations are also threatened by the low-level context in which they reside. In practice, an abstract machine implementation will use or interact with various low-level libraries and/or components that may be written with malicious intent or be susceptible to code injection attacks. Such malicious low-level components can bypass software-based security mechanisms and may disclose confidential data, break integrity checks and so forth [15].

This paper presents the derivation and implementation of an abstract machine for MiniML, a light-weight version of ML featuring references and recursion, that runs on a processor enhanced with the Protected Module Architecture (PMA) [15]. PMA is a low-level memory isolation mechanism that protects a certain memory area by restricting access to that area based on the location of the program counter. Our abstract machine executes programs within this protected memory without sacrificing their ability to interact with the outside world.

To guarantee the security of the implemented abstract machine, we follow a two-step methodology to derive the formalisation of the abstract machine. In the first step MiniML is extended with a secure foreign function interface (FFI). This extension to MiniML is derived by improving the theoretical secure operational semantics of Larmuseau et al. [8] with Patrignani et al.'s trace semantics of a realistic low-level attacker [13]. In the second step we apply the syntactic correspondences of Biernacka et al. [2] to this extension of MiniML, to obtain the formalisation of a CESK machine implementation for MiniML. For each of these syntactic correspondences we prove that they do not result in the abstract machine leaking security-sensitive information.

After presenting the source language MiniML and the relevant security concepts and formalisations (Section 2), this paper makes the following contributions. It describes our methodology and how we apply it to derive a secure CESK machine for MiniML (Section 3). This paper also details our implementation of the CESK machine as well as the performance of the machine in certain test scenarios (Section 4).

The proposed work is not without limitations.
While the formalisation is derived in a correct manner, the implementation is hand-made, possibly introducing mistakes that violate the security properties. As with all software, testing and verification techniques can minimise these risks.

ACM 978-1-4503-3738-0. http://dx.doi.org/10.1145/0000000.0000000

2. OVERVIEW

This section first presents the source language MiniML (Section 2.1) and the formal method used to reason about its security (Section 2.2). Next it details the security threats to abstract machine implementations (Section 2.3). Lastly we cover the memory isolation mechanism that is used to protect the abstract machine implementation (Section 2.4) and the formal attacker model (Section 2.5).

2.1 The Source Language MiniML

The source language of our secure abstract machine implementation is MiniML: an extension of the typed λ-calculus featuring references and recursion. The syntax is as follows.

t ::= v | x | (t1 t2) | t1 op t2 | if t1 t2 t3 | ref t | t1 := t2 | t1 ; t2 | let x = t1 in t2 | !t | fix t | hash t | letrec x : τ = t1 in t2
op ::= + | − | ∗ | < | > | ==
v ::= unit | li | n | (λx : τ.t) | true | false
τ ::= Bool | Int | Unit | τ1 → τ2 | Ref τ
E ::= [·] | E t | v E | E op t | v op E | ...

Here n indicates the syntactic term representing the number n, τ denotes the types and E is a Felleisen-and-Hieb-style evaluation context with a hole [·] that lifts the basic reduction steps to a standard left-to-right call-by-value semantics. The letrec operator is syntactic sugar for a combination of let and fix. The operands op apply only to booleans and integers. The locations li are an artefact of the dynamic semantics that do not appear in the syntax used by programmers and are tracked at run-time in a store µ ::= ∅ | µ, li = v, which is assumed to be an ideal store: it never runs out of space. Allocating new locations is done deterministically: l1, l2, .., ln. The term hash t maps a location to its index, li ↦ i, similar to how Java's .hashCode method converts references to integers.

The reduction and type rules are standard and are thus omitted. The interested reader can find the full formalisation of the semantics in an accompanying tech report [9].

2.2 Contextual Equivalence

To formally state and reason about the security concerns of MiniML, contextual equivalence is used. A MiniML context C is a MiniML term with a single hole [·]; two MiniML terms are contextually equivalent if and only if no context C can distinguish them.

Definition 1. Contextual equivalence is defined as:

t1 ' t2 def= ∀C. C[t1]⇑ ⇐⇒ C[t2]⇑

where ⇑ denotes divergence, t1 and t2 are closed terms and neither the terms nor the contexts feature explicit locations li, as they are not part of the static semantics. Note that contextually equivalent MiniML terms are of the same type τ, as a context C observes the same typing rules as the terms.

MiniML's λ-terms introduce many equivalences: there is no context C, for example, that can distinguish the following two λ-terms.

(λx : Int. 0)    (λx : Int. (x − x))    (Ex-1)

The equivalences over the locations of MiniML are a little more complex. Due to the deterministic allocation order and the inclusion of the hash operation, a context can observe the number of locations as well as their indices. Locations that are kept secret, however, still produce equivalences, as a context C cannot observe locations that do not leave a term. The following two terms, for example, are thus contextually equivalent.

(ref false; 0)    (ref true; 0)    (Ex-2)

Contextual equivalence can be used to capture security properties such as confidentiality and integrity at the language level. This is illustrated by the following two examples.

Confidentiality. Consider the following two contextually equivalent MiniML terms.

let secret = ref 0 in (λx : Int. secret++; x)

let secret = ref 0 in (λx : Int. x)

These two terms differ only in the value they store in the secret reference. Because these terms are contextually equivalent, there is no MiniML context that can read the secret.

Integrity. Consider the following two contextually equivalent MiniML terms.

(λx : Int → Int. let y = ref 0 in (x 1))

(λx : Int → Int. let y = ref 0 in let r = (x 1) in if (!y == 0) r (−1))

These two terms differ only in that the second term performs an integrity check: it checks that the reference y was not modified. Because these terms are contextually equivalent, there is no way for a MiniML context to modify the reference y.

2.3 The Security Challenges of Abstract Machine Implementations

An abstract machine for MiniML is a program that inputs programs written in MiniML and then executes them according to the semantics for MiniML encoded in the machine. An abstract machine implementation has two security concerns: (i) implementation mistakes and (ii) malicious behaviour by the low-level context in which it resides. In this work we consider the latter concern.

More specifically, we consider the threats posed to an abstract machine implementation by an attacker with kernel-level code injection privileges. Kernel-level code injection is a critical vulnerability that bypasses all existing software-based security mechanisms: disclosing confidential data, disrupting applications and so forth [15]. This attacker poses four threats to abstract machine implementations.

Inspection and manipulation. An abstract machine must isolate running programs and their machine state from any kind of inspection and manipulation from outside the abstract machine. A failure to do so could break the confidentiality and integrity requirements of MiniML programs.

Abuse of references. An abstract machine interoperates with the outside context, including possibly the attacker, by sharing references to data structures. An attacker that modifies these references can manipulate the internal state of the abstract machine and alter the control flow (as is the case for the Java VM [16]).

Observing implementation details. Most abstract machines do not encode the exact semantics of MiniML, but instead use more concrete and more efficient encodings such as continuations and closures. Because these encodings make MiniML abstractions more concrete, they may reveal to the attacker implementation details not observable to the MiniML contexts of Section 2.2.

Violation of type safety. When interoperating with the outside context, the abstract machine not only shares data, it also receives it. If the abstract machine does not type check this incoming data, the attacker can violate the contextual equivalence of MiniML.

2.4 The Protected Module Architecture

To secure the run-time memory of our abstract machine implementation, addressing the inspection threat of Section 2.3, we make use of the Protected Module Architecture (PMA). PMA is a fine-grained, program counter-based memory access control mechanism that divides memory into a protected memory module and unprotected memory. The protected module is further split into two sections: a protected code section, accessible only through a fixed collection of designated entry points, and a protected data section that can only be accessed by the code section. As such, the unprotected memory is limited to executing the code at the entry points: the code section can only be executed from the outside through the entry points and the data section can only be accessed by the code section. An overview of the access control mechanism is given below, where r, w and x denote read, write and execute rights.

From \ To      Protected Entry   Protected Code   Protected Data   Unprotected
Protected      r x               r x              r w              r w x
Unprotected    x                                                   r w x

A variety of PMA implementations exist. While our current implementation of the abstract machine makes use of a research prototype [15], Intel is developing a new instruction set, referred to as SGX, that enables the usage of PMA in commercial processors [11].

2.5 The Low-Level Attacker Model

The implemented abstract machine, which resides within the protected memory, remains secure in the face of the previously mentioned attacker with kernel-level code injection capabilities. To formally reason about the capabilities and behaviour of this attacker, we make use of the fully abstract trace semantics of Patrignani and Clarke for assembly programs enhanced with PMA [13].

These trace semantics transition over a state Λ, which is a quintuple (p, r, f, m, s), where p is the program counter, r the register file, f a flags register, m represents only the protected memory of PMA and s is a descriptor that details where the protected memory partition starts, as well as the number of entry points and the size of the code and data sections. Additionally, the state Λ can be (unknown, m, s): a state modelling that the attacker is executing in unprotected memory. The trace semantics denote the observations and inputs of the attacker through the following labels L.

L ::= α | τ    α ::= √ | γ! | γ?    γ ::= call p(r) | ret p(r)

A label L is either an observable action α or a non-observable action τ. Decorations ? and ! indicate the direction of an observable action: from the unprotected memory to the protected memory (?) or vice versa (!). Observable actions γ are function calls or returns to a certain address p, combined with a sequence of registers r. Registers are in the labels as they contain the arguments to the function calls and also convey information on the behaviour of the code executing in the protected memory. Additionally, an observable action α can be a tick √ indicating termination.

Note that these labels do not denote low-level reads and writes to memory addresses. These are not required in this work: the memory of the protected memory module cannot be read or written by the low-level attacker, and our secure abstract machine is designed in a manner that removes the need for reads and writes to unprotected memory.

These trace semantics provide an accurate model of the attacker, as they coincide with contextual equivalence for assembly programs enhanced with PMA. Formally, the traces of an assembly program P, denoted Tr(P), are computed as follows: Tr(P) = {α | ∃Λ. Λ0(P) ==α⇒⇒ Λ}, where Λ0 is the initial state and the relation Λ ==α⇒⇒ Λ′ describes the traces generated by transitions between states. Whenever two assembly programs P1 and P2 are contextually equivalent they will have the same traces (the same inputs and observations by attackers), as follows.

Proposition 1 (Fully Abstract Traces [13]).

P1 'a P2 ⇐⇒ Tr(P1) = Tr(P2)

where 'a denotes contextual equivalence between two assembly programs.

3. DERIVING A SECURE CESK MACHINE

Our goal in this paper is to derive and implement a secure CESK machine for MiniML. A CESK machine is a transition system of four elements: the term being evaluated (Control), a map from variables to values (Environment), a map from locations to values (Store) and a stack of evaluation contexts (Kontinuations) [4].

Directly implementing a secure CESK for MiniML is, however, likely to fail. Instead, we derive a formalisation of the secure CESK machine for MiniML through a methodology of two distinct steps that starts from the semantics of MiniML. This formalisation is then implemented using the PMA mechanism of Section 2.4, as detailed in Section 4.

To derive the CESK formalisation, we first extend MiniML with a secure foreign function interface (FFI) by adapting the theoretical formalisation of Larmuseau et al. [8] (Section 3.1) to the low-level attacker of Section 2.5.
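As a point of reference, the four CESK components can be sketched as a C structure. This is a minimal sketch for illustration only: all type, field and function names here are our own and do not reflect the actual implementation described in Section 4.

```c
/* Illustrative sketch of a CESK state; all names are ours, not the paper's. */
#include <assert.h>
#include <stddef.h>

typedef struct term term;
struct term {
    int tag;            /* e.g. VAR, LAM, APP, ... */
    term *sub[2];       /* sub-terms, when present */
    const char *var;    /* variable name, when relevant */
};

/* Values are closures; only pointers to them are used below. */
typedef struct binding { const char *var; struct value *val; struct binding *next; } env;   /* Environment */
typedef struct cell    { int loc;         struct value *val; struct cell    *next; } store; /* Store */
typedef struct frame   { int tag;         struct frame *next;                      } kont;  /* Kontinuation stack */

typedef struct {
    term  *control;   /* C: the term being evaluated */
    env   *env;       /* E: variables to values */
    store *store;     /* S: locations to values */
    kont  *kont;      /* K: stack of evaluation contexts */
} cesk_state;

/* Injecting a program yields a state with empty E, S and K. */
cesk_state inject(term *t) {
    cesk_state s = { t, NULL, NULL, NULL };
    return s;
}
```

A freshly injected program starts with an empty environment, store and continuation stack; the machine then repeatedly steps the state until a value is reached.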
This FFI enables MiniML programs to interoperate with the low-level attacker in a manner that prevents the low-level attacker from distinguishing between contextually equivalent MiniML terms. We reason over MiniML extended with this FFI by means of a labelled transition system.

In the second step of the methodology we derive a CESK formalisation from the previously obtained LTS by making use of syntactic correspondences that preserve the formal properties of MiniML (Section 3.2).

Each step of the methodology includes a proposition that is proven to hold, establishing that the contextual equivalences of MiniML are preserved. We validate the methodology by showing that the combination of these propositions ensures that the derived CESK machine is secure (Section 3.3).

3.1 Step 1: Secure Foreign Function Interface

We extend MiniML with a secure FFI to the low-level attacker of Section 2.5 by adapting the theoretical FFI design of Larmuseau et al. [8]. The resulting extension of MiniML is formalised by means of a labelled transition system (LTS). Concretely this LTS is a triple (S, L, −L→), where the state S is a MiniML program extended with certain FFI-specific attributes, L are the labels of the low-level attacker's fully abstract trace semantics (Section 2.5) and −L→ denotes the labelled transitions between the states: S −L→ S′. The LTS does not include a state for the attacker, as the labels capture all the relevant details of the low-level attacker.

The state S is a quadruple ⟨ω, µ, N, p⟩. The first element ω is a substate that captures the four different execution states of MiniML throughout the FFI. The second element µ is the store of locations (Section 2.1). The third element N is a map used to keep track of the reference objects that are used to mask the security-relevant terms of MiniML. The final element p is a stack of low-level pointers that enables MiniML to transfer control to the correct address of the attacker. In what follows we elaborate on these four different execution states, the employed reference objects, the low-level usage of addresses, how input from the attacker is handled, the transition rules and the formal security of this FFI.

The four FFI states of MiniML. Throughout its interoperation with the low-level attacker a MiniML program adopts four different states. A MiniML program is either (1) executing a term t of type τ, (2) marshalling out values to be returned to the attacker, (3) marshalling in input from the attacker that is expected to be of type τ or (4) waiting on input from the attacker.

(1) Σ ◦ t : τ   (2) Σ £ m : τ   (3) Σ ¡ m : τ   (4) Σ

where m ::= v | w, as the marshalling states (2) and (3) are responsible for converting MiniML values v to low-level words w and vice versa.

Each of these four states includes Σ: a type-annotated stack of evaluation contexts, which keeps track of the interoperation between the attacker and the MiniML program. This type-annotated stack is formally defined as:

Σ ::= ε | Σ, E : τ → τ′

where, as explained in Section 2.1, E is an evaluation context that represents the computation that was halted when control was reverted to the attacker. This stack of evaluation contexts is type annotated to ensure that no typing information is lost during the interactions with the attacker.

Reference objects. Security-relevant MiniML terms, namely λ-terms and locations, are shared by providing the attacker with reference objects: objects that refer to the original terms of the MiniML program. These reference objects have two purposes: firstly they mask the contents of the original term and secondly they enable the FFI, and the derived CESK machine, to keep track of which λ-terms and locations have been shared with the attacker. Larmuseau et al. [8] model reference objects for λ-terms and locations through names n^f_i and n^l_i respectively. Both names are tracked in a map N that records the associated term and type, as follows.

N ::= ∅ | N, n^f_i ↦ (t, τ) | N, n^l_i ↦ (t, τ)

A fresh name n^f_i is created deterministically every time a λ-term is shared between MiniML and the attacker. The index i of these names n^f_i enumerates the shared λ-terms: the first λ-term to be shared will be masked by a name n^f_1, the second λ-term by a name n^f_2 and so on. In contrast, the index i of a name n^l_i corresponds to the index of the location it refers to (n^l_i ↦ li). Larmuseau et al. have previously proven that these names and their indices i do not reveal to an attacker any information not available to MiniML contexts and thus do not violate the contextual equivalences of MiniML [8].

Entry points and return pointers. The interoperation with the low-level attacker involves two kinds of addresses: entry points and return pointers. The entry points are, as detailed in Section 2.4, the interface that the attacker (residing in unprotected memory) uses to interact with a MiniML program and the CESK machine that executes it. To provide the low-level attacker the same level of functionality as a MiniML context, we introduce an entry point p^e for each possible MiniML operation that the low-level attacker may need. Whenever the attacker wants to dereference a shared location, for example, it will use the entry point for dereferencing shared locations p^e_deref.

Whenever the attacker makes use of an entry point, control switches to the protected memory where the MiniML program and the CESK machine reside. To correctly revert control to the attacker, the FFI keeps track of the address from which the call to the entry point originates. On the flip side, when control reverts from the attacker to the secure abstract machine, control cannot simply jump to the correct address within the protected memory, as it is inaccessible from the outside (Section 2.4). Instead, the attacker must make use of a return-back entry point p^e_retb that handles the reverting of control.

Attacker input. Input from the attacker is handled in the marshalling-in state of the FFI: ⟨Σ ¡ m : τ, µ, N, p⟩. Whenever the input does not observe the type τ, the MiniML program errors (wr) and subsequently terminates.

Besides basic values, the attacker can also share its low-level functions with a MiniML program by sharing a function pointer p. This function pointer is syntactically embedded into a MiniML program as a term τ→τ′Fp, where the type τ → τ′ is included with the pointer to enable the FFI to type check the use of this attacker function at run-time.

The attacker cannot share its locations with the MiniML program. Reading and writing unprotected memory creates many challenges to contextual equivalence [13] and is thus avoided in our design of the secure CESK machine. Instead, the attacker is given the ability to request the creation of a MiniML location through an entry point p^e_alloc. The attacker is only able to write a value to such a shared MiniML location through the entry point p^e_set, which ensures that this occurs in a type-safe manner: during each of these operations the expected type is always inferred at run-time.

Transitions. The most relevant transitions are as follows.

⟨Σ ◦ t : τ, µ, N, p⟩ −τ→ ⟨Σ ◦ t′ : τ, µ′, N, p⟩   (Silent)

⟨Σ ¡ wr : τ, µ, N, pr : p⟩ −√→ ⟨ε, ∅, ∅, ∅⟩   (Wr-I)

⟨Σ ◦ t : τ, ∅, ∅, ∅⟩ −call p^e_start(pr)?→ ⟨Σ ◦ t : τ, ∅, ∅, pr : ∅⟩   (A-Start)

⟨Σ £ w : τ, µ, N, pr : p⟩ −ret pr(w)!→ ⟨Σ, µ, N, p⟩   (M-Ret)

⟨(Σ, E : τ → τ′), µ, N, p⟩ −ret p^e_retb(w)?→ ⟨(Σ, E : τ → τ′ ¡ w : τ), µ, N, p⟩   (A-Ret)

⟨Σ, µ, N, p⟩ −call p^e_deref(wn, pr)?→ ⟨Σ ◦ !li : τ, µ, N, pr : p⟩  where N(wn) = (li, Ref τ)   (A-Deref)

⟨Σ, µ, N, p⟩ −call p^e_appl(wn, w, pr)?→ ⟨(Σ, (t [·]) : τ → τ′ ¡ w : τ), µ, N, pr : p⟩  where N(wn) = (t, τ → τ′)   (A-Apply)

⟨Σ, µ, N, p⟩ −call p^e_set(wn, w, pr)?→ ⟨(Σ, (li := [·]) : τ → Unit ¡ w : τ), µ, N, pr : p⟩  where N(wn) = (li, Ref τ)   (A-Set)

Full abstraction. To obtain a secure CESK machine for MiniML, we must formally prove that this FFI preserves the contextual equivalences of MiniML. To do so we first define a notion of weak bisimulation over the LTS. In contrast to strong bisimulations, such a bisimulation does not take into account the silent transitions of the LTS, only the actions α. Define the transition relation S ==α⇒ S′ as S −τ→* −α→ S′, where −τ→* is the reflexive transitive closure of the silent transitions −τ→.

Definition 2. B is a bisimulation iff S1 B S2 implies:

1. If S1 ==α⇒ S1′, then ∃S2′. S2 ==α⇒ S2′ and S1′ B S2′.
2. If S2 ==α⇒ S2′, then ∃S1′. S1 ==α⇒ S1′ and S1′ B S2′.

We denote bisimilarity, the largest bisimulation, as ≈. To reason about the security properties of the FFI we must first define a compilation scheme that places a MiniML term t into a state S of the FFI.

⟦t⟧FFI def= ⟨ε ◦ t : τ, ∅, ∅, ∅⟩

where τ is the type of t. We now state that this secure FFI preserves the contextual equivalences of MiniML as follows.

Proposition 2 (Full Abstraction).

⟦t1⟧FFI ≈ ⟦t2⟧FFI ⇐⇒ t1 ' t2

The proof proceeds by relating the bisimilarity ≈ to a congruent bisimilarity over MiniML, a bisimilarity that coincides with the contextual equivalences of MiniML. A complete formalisation of the FFI and the proof of full abstraction can be found in the associated tech report [9].
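The deterministic indexing of reference-object names described above can be sketched as follows. This is our own toy model with invented helper names, not the paper's formalisation: λ-terms are numbered in the order they are shared, while a shared location simply reuses its own index.

```c
/* Toy model (our names) of the reference objects of Section 3.1:
   the i-th shared λ-term is masked by n^f_i, while a shared
   location l_i is masked by n^l_i, carrying the same index i. */
#include <assert.h>

enum kind { NF, NL };                       /* n^f versus n^l */
typedef struct { enum kind kind; int index; } name;

static int shared_lambdas = 0;              /* deterministic n^f counter */

/* Sharing a λ-term mints the next fresh n^f_i (1-based). */
name share_lambda(void) {
    name n = { NF, ++shared_lambdas };
    return n;
}

/* Sharing location l_i reuses the location's own index. */
name share_loc(int i) {
    name n = { NL, i };
    return n;
}
```

Because both counters are deterministic, the names leak nothing beyond what a MiniML context can already observe through allocation order and hash.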

Transitions within the MiniML program are labelled as silent through the label τ (Silent). Errors, caused either by faulty calls or by type-inappropriate input from the attacker (Wr-I), are followed by immediate termination √. To start the computation of the MiniML program, the low-level attacker calls the entry point p^e_start, passing as its only argument pr: the address at which it expects the result to be returned (A-Start). Once the MiniML program has computed a value and marshalled it to a byte word representation w, control will revert to that address pr (M-Ret). The low-level attacker, as mentioned earlier, cannot jump to an address of the protected memory outside of the entry points, and must thus return the values it computes through the return-back entry point p^e_retb (A-Ret).

The low-level attacker calls a separate entry point p^e for each type of operation on MiniML terms. These entry points take as an argument a byte word representation of the names that model reference objects (wn) as well as byte words that represent the arguments. The example operations listed above handle function calls (A-Apply) and the setting and dereferencing of shared locations (A-Set, A-Deref).

3.2 Step 2: Deriving the CESK Machine

We now derive a secure CESK machine from the LTS representation of the FFI by applying an adapted version of Biernacka et al.'s syntactic correspondence between context-sensitive calculi and abstract machines [2]. This correspondence consists of four transformations that modify state and reduction rules in a way that preserves the contextual equivalences of MiniML. Throughout this section the result of each transformation is annotated with a superscript that indicates to which transformation it belongs.

For each transformation T we prove that the contextual equivalences of MiniML are preserved in two steps. We first develop a bisimilarity ≈T (Definition 2) over the modified LTS. In a second step we use that bisimilarity to prove that there exists a compilation scheme ⟦·⟧T that compiles the current state S into the derived machine state ST in a manner that preserves the contextual equivalences of MiniML. This is proven by relating the newly derived bisimilarity ≈T to the previously derived bisimilarity ≈ over the FFI.

Proposition 3 (Preservation).

⟦S1⟧T ≈T ⟦S2⟧T ⇐⇒ S1 ≈ S2

Each step of the syntactic correspondence was implemented in OCaml¹ and tested using MiniML programs. In what follows, we provide a short overview of the transformations as well as proof sketches for Proposition 3. A full formalisation of each of the derived LTSs as well as the resulting CESK machine can be found in the accompanying tech report [9].

¹https://github.com/sylvarant/secure-abstract-machine

1. Context-Sensitive Reduction. In this step explicit continuation contexts are derived to separate reduction contexts from the term being executed. To that end, the LTS obtained in Section 3.1 is transformed into a new LTS^K, defined as a triple (S^K, L, −L→K). The new state S^K is a quintuple ⟨c, µ, k, N, p⟩ where c is a control and k the context. A control c is either a MiniML term t, a low-level word w (when marshalling) or a stop state sp:

c ::= t | w | sp

The stop state sp is needed to indicate that the LTS is halted, waiting on input from the attacker. The contexts k of the new machine state are derived by first transforming MiniML's evaluation contexts E (Section 2.1) into explicit continuation contexts K, as follows.

K ::= [·] | K[[·] c] | K[v [·]] | ...

Next the annotated evaluation stack Σ is converted into an outer continuation k that captures the four different states of MiniML within the FFI (Section 3.1).

k ::= [·] | k[K : τ → τ′] | k[◦ K : τ] | k[¡ K : τ] | k[£ K : τ]

The new transition rules −L→K differ from −L→ in that they include explicit transitions to plug and construct the continuation contexts K. The following rule M-R1, for example, enforces the left-to-right evaluation order of MiniML by selecting the left term c1 as the control and converting the yet-to-be-evaluated right term c2 into a new continuation.

⟨(c1 c2), k[◦ K : τ], µ, N, p⟩ −τ→ ⟨c1, k[◦ K[[·] c2] : τ], µ, N, p⟩   (M-R1)

Proof Sketch of Proposition 3. Given a bisimilarity ≈K over LTS^K, the following compilation scheme ⟦·⟧K:

⟦⟨Σ ◦ t : τ, µ, N, p⟩⟧K = ⟨t, ⟦Σ⟧E[◦ [·] : τ], µ, N, p⟩
⟦⟨Σ ¡ m : τ, µ, N, p⟩⟧K = ⟨m, ⟦Σ⟧E[¡ [·] : τ], µ, N, p⟩
⟦⟨Σ £ m : τ, µ, N, p⟩⟧K = ⟨m, ⟦Σ⟧E[£ [·] : τ], µ, N, p⟩
⟦⟨Σ, µ, N, p⟩⟧K = ⟨sp, ⟦Σ⟧E[£ [·] : τ], µ, N, p⟩
⟦Σ, E : τ⟧E = ⟦Σ⟧E[K : τ]    ⟦ε⟧E = [·]

where each K makes the corresponding E explicit, compiles states S that are bisimilar in ≈ into states S^K that are bisimilar in ≈K, as the new transitions of LTS^K are silent transitions that are ignored by our weak bisimulations.

2. Closure Conversion. In this second transformation the λ-terms of MiniML are converted into closures. As a result, the previously derived LTS^K is transformed into a new LTS^C (S^C, L, −L→C). The new state S^C is a quintuple ⟨cl, µ, k, N, p⟩ where the control terms are now closures cl as in the λρ̂-calculus [2]:

cl ::= c[e] | cl1 cl2 | if cl1 cl2 cl3 | ...

where e is a map of substitutions: e ::= ∅ | e·(cl/x). Note that in this closure calculus a λ-term is simply a term, but its closure is a value. Note also that the explicit continuations K now also use closures cl instead of controls c.

The new transition rules −L→C differ from −L→K in that their control element is a closure. This requires rules that propagate the map of substitutions e across sub-terms, as, for example, in the following transition rule (Prop-IF), where e is propagated across the sub-terms of the term if.

⟨(if t1 t2 t3)[e], µ, k, N, p⟩ −τ→ ⟨(if t1[e] t2[e] t3[e]), µ, k, N, p⟩   (Prop-IF)

Having closures as controls of the transition system also requires us to update the reduction rules that use substitution. Function application, for example, is updated to make use of the map of substitutions by splitting it into two new rules, M-B and M-V. These rules respectively add a new substitution to the map when performing an application and fetch a substitution from that same map when reducing a standalone variable.

⟨((λx : τ.c)[e] v[e′]), µ, k, N, p⟩ −τ→ ⟨c[e · (v[e′]/x)], µ, k, N, p⟩   (M-B)

⟨x[...(cl/x)...], µ, k, N, p⟩ −τ→ ⟨cl, µ, k, N, p⟩   (M-V)

where x[...(cl/x)...] matches the first instance of x within the substitution map e.

Proof Sketch of Proposition 3. Using explicit substitutions does not interfere with the contextual equivalences of MiniML, as this internal implementation detail is never observed by the low-level attacker. Given a bisimilarity ≈C over LTS^C, the compilation scheme ⟦·⟧C:

⟦⟨c, µ, k, N, p⟩⟧C = ⟨c[∅], µ, k, N, p⟩

compiles states S^K that are bisimilar in ≈K into states S^C that are bisimilar in ≈C, as the additional transitions of LTS^C are silent transitions −τ→ that are ignored by the weak bisimulations.

3. Refocusing and Transition Compression. The previous two transformations introduced various additional reduction steps into the semantics of the original LTS. In this third step, the previously derived LTS^C is transformed into a new LTS^R (S^R, L, −L→R), where the new rules −L→R are a refocused and compressed version of −L→C, following the methodology of Biernacka [2]. For example, the rule M-B introduced in the previous transformation is optimised into the following:

⟨v[e], µ, k[◦ K[(λx : τ′.c)[e′] [·]] : τ], N, p⟩ −τ→ ⟨c[e′ · (v[e]/x)], µ, k[◦ K : τ], N, p⟩   (M-B-O)

Whereas the previous rule required the value to be plugged into the context before a reduction step could take place, this rule applies the value directly to the λ-term found in the continuation k.

Proof Sketch of Proposition 3. Given a bisimilarity ≈R over LTS^R, the identity relation compiles states S^C that are bisimilar in ≈C into states S^R that are bisimilar in ≈R, as only the silent transitions of LTS^R differ from those of LTS^C; the states remain unchanged in this transformation.

4. Unfolding Closures. The previously derived LTS^R is transformed into a CESK machine (M, L, −L→M) where the state M is ⟨c, e, µ, k, N, p⟩. The control c and substitution map e are obtained by unfolding the closures cl of LTS^R. Note that closures remain the values used to encode λ-terms: the application of closures cannot be unfolded. The new transition rules −L→M are identical to −L→R except for the cosmetically different states they relate.

[Figure 1: The CESK machine resides within the protected memory: the entry points (start, deref, ...) and the CESK transition rules occupy the protected code section, while the CESK state, stack and heap occupy the protected data section. The low-level attacker can only interact with the program running on the machine through the entry points.]

Proof Sketch of Proposition 3. Given a bisimilarity ≈M over the CESK machine (the CESK machine is also an LTS),
3.3 Validation

Once combined, the formal properties of each step of the methodology form a chain of bi-implications, similar to ones found in work on verifying multi-transformation compilers [12], as follows:

  MiniML:        t1 ≃ t2
                   ⇕ (P.2)
  MiniML + FFI:  ⟦t1⟧FFI ≈ ⟦t2⟧FFI
                   ⇕ (P.3)
  CESK:          ⟦⟦t1⟧FFI⟧CESK ≈^M ⟦⟦t2⟧FFI⟧CESK

where P. is short for Proposition and where ⟦·⟧CESK combines the compilation schemes of Section 3.2 as follows:

  ⟦·⟧CESK = ⟦⟦⟦·⟧K⟧C⟧M

This chain highlights the preservation and reflection of contextual equivalence in the source language MiniML down to its CESK implementation by means of bisimulations.

This chain of propagation through bisimulations may, however, seem less efficient than a direct full abstraction result between MiniML and the derived CESK. However, the CESK is merely the final product of a series of systematic transformations applied to an LTS that models a FFI for MiniML. Because, as detailed in Section 3.2, these transformations affect only the internal reductions of the original LTS, these bisimilarity relations are not only obtained easily but also obviously coincide with the original bisimilarity ≈.

4. IMPLEMENTATION

The derived CESK machine has been implemented in C (available online²) and deployed to the Fides implementation of PMA [15]. Fides implements PMA through the use of a hypervisor that runs two virtual machines: one that handles the secure memory module and one that handles the outside memory. One consequence of this architecture is that, as the low-level context interacts with the abstract machine, the Fides hypervisor is forced to switch between the two virtual machines for each call and callback between the context and the module, producing a lot of overhead. Note that the new Intel SGX instruction set, which will provide this PMA technology in commercially available processors, is likely to substantially reduce this overhead [11].

As illustrated in Figure 1, the CESK is compiled into the protected memory of the PMA mechanism. The machine state as well as the run-time stack and heap are placed in the protected data section, restricting access to them to the transition rules of the CESK that reside in the protected code section. As explained in Section 3.1, access to these transition rules is provided to the attacker by means of entry points that enable the attacker to interact with the program running on the CESK in a controlled and secure manner.

Note that in our current implementation the MiniML program to be executed is loaded into the protected memory module at compile time. A possible extension to this work is thus to extend the implemented CESK with the secure authentication capabilities of Fides, enabling trusted third parties to upload MiniML programs to the CESK machine.

To get an indication of the performance overhead of our secure implementation, we have benchmarked the overhead produced by our implementation in three scenarios. In the first scenario (Application) the low-level attacker applies a secure MiniML function to a boolean value. In the second scenario (Callback) the attacker applies a higher-order secure function to an attacker function, triggering a callback. In the third scenario (Read) the attacker dereferences a shared location.

               CESK      CESK + PMA
  Application  0.28 µs   16.71 µs
  Callback     0.33 µs   40.92 µs
  Read         0.24 µs   16.88 µs

The tests were performed on a Dell Latitude with a 2.67 GHz Intel Core i5 and 4 GB of DDR3 RAM. All scenarios incur high overheads due to the Fides hypervisor switching between VMs every time the system transitions between secure and insecure memory. The Callback scenario incurs especially high overheads as it requires multiple transitions between the secure and insecure memory.

2 https://github.com/sylvarant/secure-abstract-machine
5. RELATED WORK

Our methodology as detailed in Section 3 improves upon the interoperation semantics of Larmuseau et al. [8] to introduce a foreign function interface between the low-level attacker and MiniML. There exist multiple alternative foreign function interface designs: Matthews and Findler's multi-language semantics [10] enables two languages to interoperate through direct syntactic embedding, and Zdancewic et al.'s multi-agent calculus treats the different modules or calculi that make up a program as different principals, each with a different view of the environment [17]. These alternative interoperation techniques, however, rely on type checking to enforce security properties, which does not defend against the presented low-level attacker model.

Step 2 of our methodology (Section 3.2) uses the syntactic correspondence of Biernacka and Danvy [2] between Curien's λρ-calculus and CESK machines. Other syntactic correspondences have focused on calculi with lazy evaluation and calculi with objects [3] and have targeted other abstract machine types such as the Spineless Tagless G-machine [14]. An alternative approach to the syntactic correspondence could be to adapt Ager et al.'s functional correspondence [1] between evaluators and abstract machines.

Our notions of bisimilarity over the derived LTSs and the CESK machine are based on the bisimulations for the νref-calculus by Jeffrey and Rathke [6]. The proof of full abstraction for the introduced FFI is inspired by the full abstraction proof used for Fournet et al.'s secure compiler to JavaScript [5].

6. CONCLUSIONS AND FUTURE WORK

This paper presented the implementation of a secure CESK machine for MiniML. The CESK machine is made secure by applying a low-level memory isolation mechanism (PMA) and by following a methodology that first extends Larmuseau et al.'s secure FFI with a realistic low-level attacker and subsequently applies Biernacka et al.'s syntactic correspondence. A concatenation of formal properties for each step of the methodology ensures that the result is secure.

There are different directions for future work. One is to investigate different abstract machine implementations and what changes (if any) must be made to the methodology to scale to them. Another direction is the integration of a secure abstract machine with runtime aspects of advanced programming languages such as, for example, garbage collection. Two challenges arise in this setting: implementing secure garbage collection and proving that it does not introduce security leaks.

References

[1] M. S. Ager, D. Biernacki, O. Danvy, and J. Midtgaard. A functional correspondence between evaluators and abstract machines. In PPDP '03, pages 8–19. ACM, 2003.
[2] M. Biernacka and O. Danvy. A syntactic correspondence between context-sensitive calculi and abstract machines. Theor. Comput. Sci., 375(1):76–108, 2007.
[3] O. Danvy and J. Johannsen. Inter-deriving semantic artifacts for object-oriented programming. Journal of Computer and System Sciences, 76(5):302–323, 2010.
[4] M. Felleisen. The Calculi of Lambda-nu-cs Conversion: A Syntactic Theory of Control and State in Imperative Higher-order Programming Languages. PhD thesis, Indiana University, 1987.
[5] C. Fournet, N. Swamy, J. Chen, P.-E. Dagand, P.-Y. Strub, and B. Livshits. Fully abstract compilation to JavaScript. In POPL, pages 371–384, 2013.
[6] A. Jeffrey and J. Rathke. Towards a theory of bisimulation for local names. Computer Science Report 02-2000, University of Sussex, 2000.
[7] S. L. Peyton Jones. Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine. Journal of Functional Programming, 2(2):127–202, April 1992.
[8] A. Larmuseau and D. Clarke. Formalizing a secure foreign function interface. In SEFM 2015, LNCS, pages 215–230. Springer, 2015.
[9] A. Larmuseau, M. Patrignani, and D. Clarke. Implementing a secure abstract machine – extended version. Technical Report 2015-034, Uppsala IT, 2015.
[10] J. Matthews and R. B. Findler. Operational semantics for multi-language programs. TOPLAS, 31(3), 2009.
[11] F. McKeen, I. Alexandrovich, A. Berenzon, C. V. Rozas, H. Shafi, V. Shanbhogue, and U. R. Savagaonkar. Innovative instructions and software model for isolated execution. In HASP '13. ACM, 2013.
[12] G. Morrisett, D. Walker, K. Crary, and N. Glew. From System F to Typed Assembly Language. TOPLAS, 21(3), 1999.
[13] M. Patrignani and D. Clarke. Fully abstract trace semantics of low-level isolation mechanisms. In SAC '14, pages 1562–1569. ACM, 2014.
[14] M. Pirog and D. Biernacki. A systematic derivation of the STG machine verified in Coq. In Haskell '10, pages 25–36. ACM, 2010.
[15] R. Strackx and F. Piessens. Fides: Selectively hardening software application components against kernel-level or process-level malware. In CCS, 2012.
[16] G. Tan, S. Chakradhar, R. Srivaths, and R. D. Wang. Safe Java native interface. In ESSoS, 2006.
[17] S. Zdancewic, D. Grossman, and G. Morrisett. Principals in programming languages: a syntactic proof technique. In ICFP '99. ACM, 1999.