<<

Cryptographic Protocol Synthesis and Verification for Multiparty Sessions

Karthikeyan Bhargavan2,1 Ricardo Corin1 Pierre-Malo Denielou´ 1 Cedric´ Fournet2,1 James J. Leifer1 1 MSR-INRIA Joint Centre, Orsay, France 2 Microsoft Research, Cambridge, UK

Abstract at the application layer, for instance by embedding signatures in message payloads. We the design and implementation of a Rather than hand-crafted protocol design, we advo- compiler that, given high-level multiparty session de- cate the use of compilers and automated verification scriptions, generates custom cryptographic protocols. tools for systematically generating secure, efficient Our sessions specify pre-arranged patterns of mes- cryptographic protocols from high-level descriptions. sage exchanges and data accesses between distributed We outline our approach as regards design, implemen- participants. They provide each participant with strong tation, and security verification. security guarantees for all their messages. Multiparty sessions Multiparty sessions, also called Our compiler generates code for sending and receiv- session types, have proved to be an elegant way of ing these messages, with cryptographic operations and specifying structured communications protocols be- checks, in order to enforce these guarantees against tween distributed parties. We define a language for any adversary that may control both the network and specifying multiparty sessions. This language features some session participants. a clear notion of control flow, expressed as asyn- We verify that the generated code is secure by chronous messages, with a shared, distributed store relying on a recent type system for . Most that may be updated and read during communication, of the proof is performed by mechanized type checking as well as dynamic selection of additional parties to and does not rely on the correctness of our compiler. join the protocol. The global store is subject to fine We obtain the strongest session security guarantees to grained read/write access control and may be used to date in a model that captures the executable details of selectively share and hide data, and to commit to values protocol code. We illustrate and evaluate our approach which are initially blinded and later revealed during on a series of protocols inspired by web services. protocol execution. Our design enables simple, abstract reasoning on and secrecy properties of 1. Security by compilation and typing sessions. For example, the correspondence properties traditionally established for cryptographic protocols Taking advantage of modern programming tools, can be read off session specifications. one can sometimes design, develop, and deploy com- On the other hand, our sessions do not attempt to plex distributed protocols in a matter of hours, re- capture the behaviour of local participants: the applica- lying, for instance, on automated proxy generators tion programmer remains in charge of deciding which to rapidly expose existing applications as networked sessions to join (and in particular which principals to services. Achieving application security under realistic accept as remote peers), which message to send (when assumptions on the network and remote parties is the session offers a choice), which values to write, and more difficult. The problem is that widely-available how to treat received messages. protocols for cryptographic communications (say TLS or IPSEC) operate at a lower level; they provide au- From sessions to cryptographic protocols Whereas thenticity and confidentiality guarantees only for mes- much work has been done on session types and sages exchanged between two URLs or IP addresses, expressiveness, we believe that ours are the first to but leave the interpretation of these messages and protect session execution integrity via the systematic addresses to the application programmer. In particular, generation of cryptographic protocols. We implement any guarantee that involves more than two parties (say, a compiler from the session language to custom pro- clients, gateways, and web servers) must be carefully tocols, coded as ML modules, which can be linked established by linking lower-level guarantees on mes- to application code for each party of the protocol. sages, or by layering ad hoc cryptographic mechanisms These modules can be compiled both with F# [Syme

To appear in the proceedings of the 22st IEEE Foundations Symposium (CSF 2009). c IEEE 2009. et al., 2007] using .NET cryptographic libraries, and code). Thus, we obtain strong security guarantees in a with OCaml using OpenSSL libraries. Our compiler model that accounts for the actual details of our code, combines a variety of cryptographic techniques and without the need to trust our protocol compiler. primitives to produce compact message formats and An alternative approach would be to verify, or even fast processing. Hence, we shift most of the complexity certify, our session compiler (4 300 LOCs in ML). That of our implementation to the compiler, which generates task appears much more complex; it is beyond the ca- efficient custom protocols with a minimal amount of pability of automatic tools at present and would require dynamic processing. We illustrate and evaluate our long, delicate handwritten proofs. That approach is implementation on a series of protocols inspired by also brittle when experimenting with language design web services. and cryptographic optimization, and would not provide As an important design goal, we entirely hide cryp- direct guarantees at the level of the generated code. tographic enforcement from the application program- We obtain additional functional properties by typing: mer, who may thus reason about the runtime behaviour any well-typed application code (for ordinary ML of a session as if every participant followed precisely typing) linked to our protocol implementation complies its high level specification. In particular, our imple- with the session specification; at any point in the mentation preserves the communications structure of session, it may send only one of the messages indicated the session, with exactly one low-level cryptographic in the global sessions, and it must provide a message message for every message of the session. Similarly, handler for every message that may be subsequently any low-level message that fails to verify is silently received. discarded and does not affect the state of the session. On the other hand, we do not address many other properties of interest, such as liveness (any participant Security verification Cryptographic communications may block our sessions), resistance to traffic analysis are difficult to design and implement correctly; in (only our payloads are kept secret), and mitigation of particular, we need solid correctness properties for the denial-of-service attacks. cryptographic code generated by our protocol compiler. Contributions In summary, our contributions are: Various work addresses this problem. We build on 1) A high-level language for specifying multiparty an approach proposed by Bhargavan et al. [2006] who sessions, with integrity and secrecy support for aim at verifying executable protocol code, rather than a global store, and dynamic principal selection; abstract protocol models, to narrow the gap between this language enables simple, abstract reasoning what is verified and what is deployed. Specifically, on global control and data flows. we use the refinement typechecker of Bengtson et al. 2) A family of custom cryptographic protocols that [2008] based on the Z3 SMT solver of de Moura and combine standard cryptographic and networking Bjorner [2008]. This is a good match for our present primitives to support our security requirements. purposes: for each session specification, our compiler 3) A prototype compiler that generates ML inter- generates detailed type annotations (from a predicate faces and implementations for our protocols, as logic) which are then mechanically checked against well as proof annotations. actual executable code. Thus, we overcome a common 4) Security theorems stating that, from the view- limitation of cryptographic verification: typechecking point of compliant participants, all sessions al- is modular, so each function can be checked separately ways run according to their global specification, and verification time grows linearly with the number despite active adversaries in control of both the of functions. network and compromised participants. We define compromised principals as those whose 5) Experimental results for a series of multiparty keys are known to the adversary; they include ma- sessions of increasing complexity, showing that licious principals as well as principals whose keys our approach yields efficient distributed code. have been inadvertently leaked. We say that all other 6) Novel, mostly-automated security proof tech- principals are compliant. Our goal is to protect com- niques: to our knowledge, ours are the first pliant principals from an adversary who controls all automated generate-and-verify implementations compromised principals and the network. To this end, for multiparty cryptographic protocols, and our we verify the generated protocols, showing security generated protocols are the largest to date with for all runs, even when some of the parties involved verified implementations. are compromised. Our proof combines invariants estab- lished through typechecking with a general argument Related work We build upon earlier work [Corin on the structure of the protocol (but independent of the et al., 2008] which explores the secure cryptographic

2 () Fault () c (x) Reply (x) c (c,w,) Request (c,w,q) (c,w,q) Request (c,w,q) (x) Reply (x) c w () Fault () c w () Exit () (a) Ws c (b) Wsn (q) Retry (q) c w

() Forward (c,p,w,q) (c,p,w,q) Request (c,p,w,q) (x) Reply (x) c p () Audit (c,p,w) () Resume (q) w c (d) Details (d) w p

(c) Proxy (o) Retry (o)

(c,p,q) Query (c,p,q) (w) Server (w) () Forward (c,p,w,q) (x) Reply (x) c p c w c (d) Redirect

Figure 1. Sample sessions: (a) Ws: a single query; (b) Wsn: an iterated query; (c) Proxy: a three-party negotiation; (d) Redirect: a query forwarded to a dynamically selected web server. implementation of session abstractions for a simpler Contents Section 2 describes and illustrates our de- language; Corin and Denielou´ [2007] detail its first sign for multiparty sessions. Section 3 presents our design and implementation. The main differences are: a programming model for using sessions from ML. Sec- much-improved expressiveness (with support for value tion 4 states our main theorem. Section 5 describes the binding and dynamic selection of session participants); cryptographic implementation. Section 6 outlines the a more sophisticated implementation (with more effi- hybrid security proof. Section 7 reports experimental cient cryptographic mechanisms); a simpler and more results with our prototype. Additional materials, in- realistic model for the adversary; and a complete cluding detailed proofs [Bhargavan et al., 2009] and formalization of the generated code, with automated the code for all examples and libraries, are available proofs (we used manual proofs on a simplified model). online at http://msr-inria.inria.fr/projects/sec/sessions. Session types Honda, Vasconcelos, and Kubo [1998], Gay and Hole [1999], Dezani-Ciancaglini et al. [2005, 2. Multiparty sessions 2006], Vasconcelos et al. [2006], Bonelli and Com- pagnoni [2007], and Honda, Yoshida, and Carbone We introduce sessions by illustrating their graphical [2008] consider type systems for concurrent and dis- representation and describing their features and design tributed sessions. However, they do not consider im- choices. Sessions are subject to well-formedness prop- plementations or security. More recently, Hu et al. erties (listed in Appendix A), which we also discuss by [2008] integrate session types in Java; McCarthy and example. We then give a formal definition of sessions Krishnamurthi [2008] globally specify security proto- as graphs that obey all well-formedness properties. col narrations and show functional (but not security) Figure 1 represents our four running examples, aspects of their projection to local roles [like in Honda loosely inspired by Web Services; they involve a client, et al., 2008]. Guha et al. [2009] infer sessions types a web service, and a proxy. from existing Javascript applications. Graphs, roles, and labels The first session, named Verified cryptographic implementations Further re- Ws, is depicted as a directed graph with a distinguished lated work tackles secure implementation problems for (circled) initial node. Each node is decorated by a other programming models. For instance: Malkhi et al. role; here we have two roles, c for “client” and w [2004] develop implementations for cryptographically- for “web service”. Each edge is identified by a unique secured multiparty computations, based on replicated label, in this case Request, Reply, or Fault. (The other blinded computation. Zheng et al. [2003] develop com- annotations on the edges are explained below.) The pilers that can automatically split sequential programs source and target roles of an edge (or label) are the between hosts running at different levels of security, distinct roles decorating, respectively, the edge’s source while maintaining information-flow properties. Fournet and target nodes. Thus, Reply has source role w and and Rezk [2008] propose cryptographic type systems target role c. Each role represents a participant of and compilers for distributed information-flow prop- the session, with its own local implementation and erties. Those works consider imperative programs, application code for sending and receiving messages, whereas our sessions enable a more structured, declar- subject to the rules of session execution, which we ative view of interactions between roles, leading to explain next. The precise way in which application simpler, more specific security properties. code links to sessions is described in Section 3.

3 Session execution Sessions specify patterns of al- into variable q, representing some query, as it sends lowed communication between the roles: in Ws, the the Request message, since q appears in the written client starts a session execution by sending a Request variables of Request. This variable also appears in the to the web service, which in turn may send either a read variables of Request, so the proxy may in turn Reply or a Fault back to the client. Each execution of read q and then take a decision whether to carry on a session consists of a walk of a single token through with Forward or Audit. In both cases, the proxy may the session graph. At each step, the role decorating not modify q, since q does not appear anywhere else as the token’s current node chooses one of the outgoing a written variable, so the web service eventually gets edges, and the token advances to the target node of the the same value of q as the proxy did. chosen edge. If the token reaches a node that has no Variables may be hidden from roles. During each outgoing edges, session execution terminates. iteration of the Detail-Retry loop, the web service may modify d as it sends Details, and likewise the proxy Loops and branches The session Wsn of Figure 1 may modify o. Both these variables are hidden from extends the graph with a cycle. On receiving a Request the client role, since it has no incoming edges where d or an Retry, the web service may choose to either or o are read. Intuitively, the graph represents a global terminate the session or send a message Reply back viewpoint, so the variables locally written and locally to the client; the client and service may then repeat read on an edge need not coincide, and all readers are this Reply-Retry loop any number of times before the guaranteed to get the same values unless the variable is client chooses to Exit the session or receives a Fault. explicitly rewritten. However, on each edge, the source The session Proxy of Figure 1 introduces a third role must know all the locally read variables, so that role, p for “proxy”, that intercedes between the client it can forward them to the target. and the web service and chooses between two branches of the session upon receiving a Request from the client. Assigning principals to roles Roles themselves are In the shorter upper branch, the proxy sends a treated as a special class of variables and are assigned Forward message to the web service, which gives a during session execution to principals, representing Reply to the client. some participant equipped with a network address and In the lower branch, the proxy instead sends an a cryptographic identity. Audit message, indicating that further processing is In session Proxy, the principal acting as client ini- needed before accepting the request. The web service tially assigns principals to all three role variables c, and proxy then loop via Details and Retry until the p, and w, writing these in the Request message. (To proxy is satisfied and sends Resume to the web service, form this message, the principal must use credentials which, finally, gives a Reply back to the client. associated with its claimed identity c.) If a compromised principal controls a role that In the general case, the first message need not write chooses between branches, then it may send messages all the role variables; instead, the principals already along multiple branches, hence violating the invariant involved in a session may negotiate who should fill its that each session execution has a single token. To de- remaining open roles. Well-formedness properties on tect such violations, we exclude graphs with branches sessions ensure that all principals agree on role assign- where the set of roles along one branch is not included ments: role variables are written only once, and must in the set of roles along the other. This restriction is have been read by all principals already in the session formally stated as the well-formedness property 4 in before sending any message to a new participant. Appendix A. In session Redirect of Figure 1, for instance, the first Binding and receiving values Each session has a message assigns c and p but not w, the web server role. finite set of typed mutable variables (though their types Based on the client’s query q, the proxy p, acting as are omitted from graphs for brevity) and imposes an a discovery service, can dynamically select a server access control discipline for writing and reading these w and redirects c to continue its session with w. The variables, via the annotation on each edge in the graph: client can then either contact w, or drop the session if the vector just before the label constitutes the written it does not trust w. On the other hand, our compiler variables; the vector just after constitutes the read would reject a session such that p directly forwards the variables. At the start of session execution, all variables request to any server of its choice. (In that case, the are uninitialised. At each communication, the source graph would still indicate that a single server is passed role assigns values to the edge’s written variables, and the client request, whereas a compromised proxy may the target role receives values of the read variables. pass the request to multiple servers and then choose In session Proxy, the client writes an initial value which reply it prefers.)

4 Global session graphs (definition) In preparation for our formal development in Section 4, we define ses- sions as directed graphs, where nodes are session states tagged with their role, and edges are labelled with message descriptors decorated with written and read variables. We write ve to denote sequences (v0 . . . vk). A session graph

G = (R, V, X , L, m0 ∈ V, E,R : V → R) 0 consists of a finite set of roles r, r , ri ∈ R; a finite set 0 of nodes m, m , mi ∈ V; a set of variables X = Xd]R Figure 2. Compiling session programs (the disjoint union of data variables Xd and roles R); a set of labels f, g, l ∈ L; an initial node m0; a set generated protocol module to run Ws sessions, but may 0 of labelled edges (m, x,e f, y,e m ) ∈ E (where E ⊆ also participate in other sessions and communications. ∗ ∗ V × X × L × X × V) for which each variable occurs However, compromised principals are free to use their at most once in the vector xe and likewise for ye (though own session protocol implementation. (The figure de- the two vectors may have variables in common); and picts the case where both principals are compliant, but a function R from nodes to roles. our security theorems are more general; see Section 4.) Edges are uniquely identified by their labels, thus All our implementation code is in ML, and may we may unambiguously conflate them. For an edge be compiled either by the F# compiler using .NET 0 (m, x,e f, y,e m ), we say that write(f) = xe and libraries for networking and cryptography, or by the read(f) = ye, the written and read variables of f OCaml compiler using the OpenSSL library. (respectively). We use src(f) = R(m) and tgt(f) = In the rest of this section, we describe the structure 0 R(m ) for the source and target roles of f. of the main elements of our compilation framework. A path is a sequence of labels fe where the target node of each label is the source node of the next one. Syntactic sessions We explain the syntax of sessions by example. (Appendix B gives the full syntax, with We write ffe or feg to denote the path fe concatenated e support for loops and joins). The file Ws.session spec- with a final f or another path ge, respectively. The empty path is written ε. An initial path is a path for ifies Ws as follows: which the source node of the first label is m0, the initial session Ws = node of the graph. An extended path is a sequence of varq: string alternating labels (not necessarily adjacent) and lists varx: int ˆ rolec: unit= of variables, of the form (xe0)f0 ... (xek)fk. We let f send (Request (c,w,q); recv [ Reply (x) | Fault ]) range over extended paths. rolew: string= Appendix A formalizes well-formedness properties recv [Request (c,w,q) → send ( Reply (x) + Fault )] for session graphs. In the rest of this paper, we only The session Ws has two variables (q, x) and two roles consider graphs that satisfy these properties. (c, w). Each variable represents a value communicated in the session and is given a type. Each role is 3. Programming with sessions given a return type and a role process that describes the local sequences of alternating send and receive Figure 2 illustrates our framework for session Ws actions for this role. At every send, the role process of Figure 1. The programmer first writes a ses- expresses an internal choice (+) of messages that may sion description (Ws.session). This file is compiled be sent, depending on the application. At every recv, to generate a library module (Ws protocol.ml) that it expresses an external choice (|) of messages that implements all cryptographic communications for the may be received, depending on the other roles. Here, session and provides a simple continuation-based API c sends Request, writing c, w, and q, then receives for each role. We verify this protocol implementation either a Reply (reading x) or a Fault. by typechecking it against a refined type interface Our syntax for sessions is local, with a process for (Ws protocol.ml7) also generated by the compiler each role, unlike the global session graphs in Section 2, (Section 6). The programmer then writes application but we can convert between the two representations. code for each of the roles of the session he wishes to For example, the four graphs depicted in Figure 1 run, here code for the client and for the web service. were automatically generated from syntactic sessions Compliant principals run application code that uses the during their compilation. Our compiler checks that

5 each session yields a well-formed graph with respect To verify that the generated cryptographic protocol to the properties in Appendix A; for example, sends code meets this security goal, the compiler generates and receives within a role process must alternate, and an interface (Ws protocol.ml7) that encodes the ses- only one role may send a message with a particular sion graph in terms of logical pre- and post-conditions label. on the protocol functions that send and receive session messages. This interface uses an ML syntax extended Generated protocol module: role functions The with refinement types that allow the embedding of for- generated protocol module provides send and receive mulas in a first-order logic; a specialized typechecker functions for the messages that may be sent or received verifies these formulas by calling out to an automated in a session. These functions are not used directly by theorem prover [Bengtson et al., 2008]. application code; instead, the protocol interface exports a function for each protocol role, called a role function, 4. Session integrity that implements the corresponding role process by performing all session-related sends and receives for In this section, we formalize our main security a given role. theorem for protocol implementations generated by our The first argument of a role function contains in- compiler after they have been verified by typechecking. formation about the running principal (IP address and The theorem is stated in terms of the events emitted asymmetric keys). Using a continuation-passing style, by the implementation, for all messages sent and the second argument allows the application to bind received by compliant principals during session exe- variables and choose which message is to be sent at cutions: irrespective of the behaviour of compromised each state. For instance, the role function for the web principals, these events always correspond to correct service of session Ws is declared by: session traces. A verified Se-system is a program com- valw: principal → m0 → result w posed of the ML modules: Data, Net, Crypto, Prins, (S protocol)i=1..k,U where result w is the server return type (here string) i and the type m0 encodes the role process for w as: where • Data, Net, Crypto, and Prins are symbolic im- type m0 = {hRequest: (var c ∗ var w ∗ var q → m1)} and m1 = Reply of (var x ∗ result w) | Fault of result w plementations of trusted platform libraries; Data is for bytearrays and strings, Net for networking, where var [cwqx] are the types of the variables. Hence, Crypto for cryptography, and Prins maps prin- m0 defines a single message handler for the Request cipals to cryptographic keys (see Section 5); message; the handler takes a triple of values for the • Si protocol is the verified module generated by read variables c, w, q as input and computes a response our compiler from the session description Si and of type m1 that is either a Reply or a Fault. then typechecked against its refined typed inter- In addition to enforcing a role process, role functions faces and those of the libraries, for i = 1, . . . , k. have two features. They log security events before • U represents application code, as well as the sending and after receiving each message. We use these adversary. It ranges over arbitrary ML code with events to formalize our security goals (see section 4). access to all functions exported by Data, Net, For example, before sending the Reply message, the Crypto, and Si protocol, and access to some web server must log an event, Send Reply, that states keys in Prins (as detailed below). that it intends to send a reply with some specific value The module U has the usual capabilities of a Dolev- for x. Second, they are locally sequential; in particular, Yao adversary: it controls compromised principals that they never send messages in both branches of a session. may instantiate any of the roles in a session; it may Typechecking By writing application code that is intercept, modify, and send messages on public chan- well typed (under ordinary ML typing) against our nels, and perform cryptographic computations, but it generated protocol interface, a programmer is guaran- cannot break cryptography or guess secrets belonging teed that his code locally complies with the session to compliant principals. A run of an S-system consists of the events logged discipline. However, compromised principals playing e during execution. For each session, we define three remote roles may not comply with the session and may kinds of events that are observable: collude with a network-based adversary to confuse compliant principals. Hence, the protocol module uses Send f(a, ve) Recv f(a, ve) Leak(a) cryptography to ensure global session integrity—all where f ranges over labels in the session. Send f compliant principals have consistent states (Section 4). asserts that, in some run of the session, principal a

6 instantiating the source role of f commits to sending It means that the views of the session state at all com- a message labelled f with values ve for its written pliant principals must be consistent. Hence, all princi- variables. Recv f asserts that principal a instantiating pals who use protocol implementations (Si protocol) the target role of f after examining the over-the-wire generated by our compiler and verified by typecheck- cryptographic evidence, accepts a message labelled f ing are protected against adversaries U. with values ve for its read variables. For example, in a run of the Proxy session, suppose The event Leak(a) states that the principal a is that the client principal playing the role c and the proxy compromised; this event is generated whenever the playing p are compliant, but the web service playing adversary U demands a from the Prins module; in w may be compromised. Then, the theorem guarantees a run where a principal’s keys are never accessed by U, that whenever the client receives a Reply message from this event does not occur, and the principal is treated as the web service, it must be that the proxy previously compliant. (This functionality of Prins formally models sent the web service a Forward or Resume message; selective key compromise; it is of course disabled in the web service cannot reply to the client before or our concrete implementation.) For a given run of an during its negotiation with the proxy and convince him Se-system, we say that a compliant event of the run is to accept the message. Moreover, the values of session a Send or a Recv event present in the run whose first variables, such as q, d, o, and x, must be consistent at argument is a principal a for which there is no Leak(a) all compliant principals. event anywhere in the run. We now relate events and session graphs: a ses- 5. Protocol design and cryptography sion trace of a session is a sequence of Send and Recv events obtained by (globally) instantiating all the For each session, our compiler generates a custom bound variables of an initial path of the session. cryptographic protocol by first extracting the control Definition 1 (Session traces): The traces of S are as flow graph for each role, and then selecting crypto- follows: graphic protection for each message according to this control flow. We first discuss the design and structure 1) let f1 . . . fk be an initial path of S; of our generated protocols by example, relying on two 2) let xei = write(fi) and yei = read(fi) be the informal protocol narrations for selected paths in the written and read variables of fi for i = 1..k; graph of Proxy in Figure 1. Later in this section, 3) let (αi)i=1..k be a sequence of maps from vari- we describe the general case, the detailed message ables X to values for which αi and αi+1 may formats, and the compilation process. differ only on xei, for i = 1..k − 1; 4) replace each fi from the path with two events Sample protocol narrations The first narration, in Figure 3, corresponds to the upper path of Proxy, Send fi(αi(src(fi)), αixi) e which does not involve any Audit edges. The second Recv fi(αi(tgt(fi)), αiyi) e narration, in Figure 4, corresponds to the lower path 5) optionally discard the final Recv f event. k with a single Details-Retry iteration. For a given run of an Se-system, a compliant subtrace For simplicity, we first assume that every pair of of a session S ∈ Se is a projection of a trace of S principals a, b already shares keys for encrypting and where non-compliant events are discarded. Session MACing data using symmetric cryptography. We write a a traces capture all sequences of events for a partial run encb {m} and macb {m} for the and the of an S-system. Moreover, the values of the variables MAC of m, respectively, using a key shared by a and b. recorded in the events are related to one another As explained later in this section, the first time a needs other exactly in accordance with the variable (re)writes keys for b, it generates these keys, encrypts and signs allowed by the graph (possibly shadowing each other). them using asymmetric cryptography, and includes the We are now ready to state our main security result: result in its outgoing message. Each message consists of a header, a series of Theorem 1 (Session Integrity): For any run of a ver- variables assignments (either in the clear, or hashed, ified S-system, there is a partition of the compliant e or encrypted), and a series of MACs (with at most one events that coincides with compliant subtraces of ses- MAC for every pair of principals of the session). sions from S. e The message header consists of a session-label tag, The theorem states that the compliant events of any a session identifier s , and a message sequence number run are interleavings of the compliant events that may (0, 1, 2,. . . ). The tag is precomputed as a hash of the be seen along execution of initial paths of the sessions. whole session definition plus the message label.

7 Initially, c, p, w share symmetric keys for mac and enc c : Assign c, p, w, q Application at c assigns c,p,w,q c : Fresh s Fresh session identifier s (Request) c → p : let h0 = Request(s, 0) in Request header let m0 = h0 | Vars(c, p, w) in Plaintext message from c to p let a0 = h0 | Hashes(c, p, w, q) in Authenticated message from c to p, w c c c m0 | encp{q} | macp{a0} | macw{a0} Formatted Request message (Forward) p → w : let h1 = Forward(s, 1) in Forward header let m1 = h1 | Vars(c, p, w) in Plaintext message from p to w let a1 = h1 | Hashes(c, p, w, q) in Authenticated message from p to w, c p c p p m1 | encw{q} | macw{a0} | macw{a1} | macc {a1} Formatted Forward message w : Assign x Application at w assigns x (Reply) w → c : let h2 = Reply(s, 2) in Reply header let a2 = h2 | Hashes(c, p, w, q, x) in Authenticated message from w to c w p w h2 | encc {x} | macc {a1} | macc {a2} Formatted Reply message

Figure 3. Protocol narration for upper path of Proxy

The message payload includes the (possibly en- particular assignment to c, p, w, and q. crypted) value of each variable that can be read by the • The second message (Forward) forwards c’s MAC receiver. It also includes a cryptographic hash of the to w, and includes two new MACs from p, so value of each variable that has been initially assigned that the two other principals can verify that p or updated but cannot be read by the receiver. For accepted c’s request and assignment then selected instance, the Forward message. (Without a MAC forwarded • The first message (Request) carries an assignment from c to w, for instance, a compromised p may of principals to c, p, and w (in the clear) and of involve c in a session by faking a request from c.) a request to q (encrypted for p, which reads q). • The third message (Reply) forwards p’s MAC • The second message (Forward) forwards the same to c, and includes a new MAC from w. In com- assignment of principals to c, p, and w, and of the bination, they enable c to verify that p accepted same request to q (now encrypted for w). and forwarded its original request to w, and • The third message (Reply) carries an encrypted that w accepted and replied to the request with assignment to the response x; conversely, there is response x. no need to re-send c, p, w, or q since they have • On the lower path, each of the two MACs to w not been assigned since c sent its last message. carried by the Audit message contains the hashes • On the lower path in Figure 4, the Audit message of c,p,w,q. To verify them, w recomputes the does not carry the assignment to q, as w cannot hashes of c,p,w from their values and uses the read q yet. Instead, it carries the hash of the value transmitted hash for q (since w does not read q of q, used by w as a commitment as it checks the at this stage). received MAC. Later, the Details-Retry loop involves only p Later, the Resume message carries this assign- and w. Both have already checked a MAC from ment, as w is granted access to q. At this point, c and recorded its assignment, so a single MAC w can check that the received value matches the suffices in each message to authenticate the new earlier commitment. assignment to d or o. At the same time, the Next, we explain the purpose and contents of the inclusion of strictly-increasing sequence numbers MACs. In contrast with the message payloads, they in the MACS prevents any confusion between authenticate the global state of the store, using an successive assignments to d. incremental hash computation, written Hashes here. Protection from replay attacks relies on caching of Implicitly, every message recipient verifies every MAC session identifiers, as follows. When receiving a mes- for which it has the authentication key, by recomputing sage, each principal knows from the header whether the presumed content from its local state plus the the message is an invitation to join a session or a received assignments and hashes. continuation for a session in which the principal is • The first message (Request) has two MACs, one already participating. In the former case, the principal for each of the other principals p and w, so that accepts the message only if it is not already involved they can both verify that c initiates a session run in a session with the same session identifier and the as the client, with session identifier s and this same role; to this end, it maintains a local anti-replay

8 Initially c, p, w share symmetric keys for mac and enc c : Assign c, p, w, q Application at c assigns c,p,w,q c : Fresh s Fresh session identifier s (Request) c → p : let h0 = Request(s, 0) in Request header let m0 = h0 | Vars(c, p, w) in Plaintext message from c to p let a0 = h0 | Hashes(c, p, w, q) in Authenticated message from c to p, w c c c m0 | encp{q} | macp{a0} | macw{a0} Formatted Request message (Audit) p → w : let h1 = Audit(s, 1) in Audit header let m1 = h1 | Vars(c, p, w) | Hashes(q) in Plaintext message from p to w let a1 = h1 | Hashes(c, p, w, q) in Authenticated message from p to w c p m1 | macw{a0} | macw{a1} Formatted Audit message w : Assign d Application at w assigns d (Details) w → p : let h2 = Details(s, 2) in Details header let a2 = h2 | Hashes(c, p, w, q, d) in Authenticated message from w to p w w h2 | encc {d} | macp {a2} Formatted Details message p : Assign o Application at p assigns o (Retry) p → w : let h3 = Retry(s, 3) in Retry header let a3 = h3 | Hashes(d, o) in Authenticated message from p to w w p h3 | encc {o} | macw{a3} Formatted Retry message w : Assign d0 Application at w re-assigns d (Details) w → p : let h4 = Details(s, 4) in Details header 0 let a4 = h4 | Hashes(o, d ) in Authenticated message from w to p w 0 w h4 | encc {d } | macp {a4} Formatted Details message (Resume) p → w : let h5 = Resume(s, 5) in Resume header 0 let a5 = h5 | Hashes(d ) in Authenticated message from p to w 0 0 let a5 = h5 | Hashes(c, p, w, q, o, d ) in Authenticated message from p to c p p p 0 h5 | encw{q} | macw{a5} | macc {a5} Formatted Resume message w : Assign x Application at w assigns x (Reply) w → c : let h6 = Reply(s, 6) in Reply header 0 let = h6 | Hashes(o, d ) in Hashes for MAC verification 0 let a6 = h6 | Hashes(c, p, w, q, o, d , x) in Authenticated message from w to c w p 0 w m6 | encc {x} | macc {a5} | macc {a6} Formatted Reply message

Figure 4. Protocol narration for lower path of Proxy with one Details−Retry loop iteration cache, and updates it whenever it joins a session. In Obtaining control flow states The control flow graph the latter case, the principal knows the state of the of each role is used to determine its possible states session as it sent its last message, and updates this during execution; a role in a particular state only state after accepting the message, which prevents any accepts certain incoming messages but not others. To replay (because it only ever accepts messages within precisely capture the runtime states for each role, we the session with strictly increasing sequence numbers). rely on the fact that the content of a given message to be sent is determined by the position of each of the On the lower path, for instance, once w accepts a roles in the session graph and their current knowledge first Audit message, (1) it will never again accept any of the variables’ contents. Each of these states can thus Forward message with the same session identifier, so a be indexed by a role name (the sender) and an extended compromised p cannot trick it into running an “upper path containing the last label sent by each of the roles path” session by reusing c’s MAC; (2) it will not accept and the last occurrence of each written variable. We any message for the session before sending Details; (3) call internal control flow states these paths, indexed it may then accept either a Retry or a Resume. by ρ, and give a formal definition below. We now explain our general compilation scheme. The states ρ rely on an updated analysis of the As shown in the protocol narrations, each session path session graph from the one presented in Corin et al. determines a particular sequence of valid messages, [2008]. The main idea is the same, namely that it protected accordingly with cryptography. Thus, the is sufficient to check a signed partial history of the compilation proceeds in two steps. First, it explores all session’s messages at each execution step. This no- possible execution paths by extracting the control flow tion, which we called visibility in our previous work, graph for each role. Then, it protects cryptographically allows for an optimized protocol that only requires the each message, depending on this control flow. transmission at each step of one MAC from each of the

9 roles that have been involved in the session since the w: ε receiver’s last involvement. The significant difference () Forward (c,p,w,q) () Audit (c,p,w)

is that freshly bound variables are now included in w: (c,p,w,q)Request()Forward w: (c,p,w,q)Request()Audit the MACs to account for the distributed store. This (x) Reply (x) (d) Details (d) compact design allows for a finite statically-computed state-based implementation. w: (c,p,w,q)Request()Forward(x)Reply w: (c,p,w,q)Request()Audit(d)Details The states ρ form a refined graph of the original () Resume (q) (o) Retry (o)

session graph. The refinement roughly duplicates every w: (c,p,w,q)Request(d)Details()Resume w: (c,p,w,q)Request(d)Details(o)Retry

node within a loop to distinguish the case when the (x) Reply (x) (d) Details (d) (o) Retry (o) loop is entered for the first time from subsequent iterations. Hence, the refined graph can contain more w: (c,p,w,q)Request(d)Resume(x)Reply w: (c,p,w,q)Request(o)Retry(d)Details nodes and edges than the session graph, but it remains () Resume (q) finite. The states ρ can then serve as indices for w: (c,p,w,q)Request(o,d)Details()Resume

the various receiving and sending functions in the (x) Reply (x) generated protocol module. w: (c,p,w,q)Request(o,d)Resume(x)Reply Definition 2: An internal control flow state, denoted ρ, is an extended path that is the result of applying the Figure 5. Refined graph for Proxy projected for w state function st (defined below) to some initial path. Let st(fe) = filter(f,e ε, ε) where the filter func- tion filter( , , ) is defined as namely (c,p,w,q). This is consistent with the protocol narration described above. When the next messages filter(ε, z,e re) = ε ( are Details and Resume, the internal control flow state (filter(f,e x0z, rr)) (x0)f if r∈ / r filter(ff, z, r) = e e e e e becomes (c,p,w,q)Request(d)Details()Resume. Since e e e 0 0 (filter(f,e xe z,e re)) (xe ) elsif r ∈ re Details was sent by w, only a MAC of Resume from where r = src(f) and x0 = write(f) \ z. p with the hash of d is expected. The internal control e e flow state also gives the information that all the roles Intuitively, filter(f,e z,e re) sweeps through an initial know the value of (c,p,w,q), but only p and w know path fe, right to left, filtering out any labels sent by the value of d. roles encountered to the right (accumulated in re) and The “dotted” nodes represent states of the protocol any written variables (accumulated in ze). Only the last implementation of a role where this role is inactive occurrence of each variable and the last label sent by (e.g. waiting for a message). Each of these nodes can each role are kept. This way, an internal control flow be uniquely designated by the internal control flow state gives the position of each of the roles in the graph state of the initial node of the abstracted subgraph. and, by the interleaving of variables and labels, the current knowledge of the store by each the roles. Cryptographically protecting messages Once the The refined graph can be projected to give the compiler calculates the control flow states for a role, execution states of a given role (in the same way that it can generate sending and receiving functions that roles processes are projection of the global session emit and expect the correct messages for each par- graph). For example, Figure 5 outlines the refined ticular session state. The receiving functions inspect graph of the Proxy session projected for role w. We the incoming label to decide whether the message is obtain the projection by keeping the nodes in which expected, in which case, it is be further verified with role w is receiving or sending a message, and by cryptographic checks; otherwise, the message is just replacing the subgraphs that involve communications ignored. between the other roles with “dotted” nodes (whose We now turn our attention to the (cryptographic) annotations are explained below). protection of messages exchanged by roles, elucidating Role w has two initial internal control flow states, the elements included in each message which were first depending on whether it receives an Audit or a Forward introduced in the narration earlier in this section. Con- message. In the case of an Audit, the reached internal sider an edge (xe)f(ye) with target role w and source control flow state is (c,p,w,q)Request()Audit, from role u. The compiler generates a sending function for which it is possible to deduce that w expects two role u where a low-level representation of this edge MACs: a MAC of Request from c and a MAC of Audit is sent over the network. This message embeds values from p. Each of those MACs would include the hashes for ye, plus the tag f (as well as additional auxiliary of the variables bound since the last involvement of w, information, detailed below).

10 The functional and cryptographic goals that this session identifier s, the source principal sr (c for message needs to accomplish are: Request), the variable tag ‘v’, its current value v, and the confounder v ; 0 is the initial timestamp; ` • provide the values for ye to w, since ye can be read c 0 by w; is the tag ‘(c,p,w,q) Request’; aencx(m) denotes the asymmetric encryption of m under the public key of • protect the confidentiality of ye, as otherwise pri- vate values of the store may be leaked; principal x; asignx(m) denotes the of • ensure session integrity by giving evidence to w m under the private key of x. and subsequent roles that the message comes from The correspondence between the components of the the source role u and that the session specification message and those shown in the narration of Request was respected until then. in Figure 3 is as follows: Symmetric session keys are initially established • Line (1) is equivalent to h0. between principals via a key-establishment protocol • Line (3) is the expansion of c relying on asymmetric encryption and signature; this Vars(c, p, w) | encp{q} is the only use of public-key cryptography in the Variables c, p, and w are principal variables and generated protocol. Key establishment is carried out hence are sent in the clear. On the other hand, by including data in the regular session messages, so variable q is a data variable and must be kept it does not require further messaging. confidential; hence, it is encrypted for p. c Each session message consists of • Line (2) is the expansion of macp{a0}, revealing • a series of MACs from the sender and earlier the full definition of Hashes(c, p, w, q) (intention- participants, intended to provide integrity of the ally elided in the narration) as a concatenation of session path, and a series of (cryptographically the hashes of the variables c, p, w, and q. c hashed) variables needed by the receivers to re- • Line (5) is the expansion of macw{a0}. compute and verify the MACs; • Lines (4) and (6) are there for key establishment • a series of (possibly encrypted) variables with and therefore do not directly correspond to any- their current values, for the readable variables of thing in the narration figure, where we omit key the receiver; establishment. • a set of encrypted session keys, for the initial con- In (4), the component a e tact between the sending and receiving principals. asignc(aencp(‘c’, k (c, p) | k (c, p))) As an example, we detail the structure for the initial serves for key establishment, necessary only the message of the Proxy session, the first message in first time the principal c sends a message to p. Figures 3 and 4. It contains two fresh session keys ka(c, p) and ke(c, p) asymmetrically encrypted for the recipi- Example message: Proxy Request Consider the first ent w and digitally signed by the source role p; receipt of a Request message by p from c in a run of ka(c, p) is intended for MACing between c and p the Proxy example. The internal control flow state for p (macc {·}), and ke(c, p) is intended for symmetric is (c,p,w,q)Request, denoting that variables c,p,w,q are p encryption (encc {·}). written by c in the initial Request. Let s be the session p Upon receiving the Request message, the principal identifier, c, p, w be the principals instantiating roles p can choose whether to join the session in role p. It c,p,w; and let q be the value for q. Moreover, for each then processes (4) to obtain the session keys. These variable v, let v be a fresh confounder for the session c keys are used to decrypt the variables in (3); once all run s; confounders are used when hashing variables variables are processed, p recomputes the hashes and to protect them from dictionary attacks. The format of checks the MAC of (2). The principal variables are the message sent by c to p is as follows: checked to see if the principal is indeed assigned to s | 0 | `0 (1) its role, and that the s is not a replay. If all the checks c | macp{s | 0 | `0 | h[c] | h[p] | h[w] | h[q]} (2) succeed, role p updates the session store and invokes c | c | cc | p | pc | w | wc | encp{q | qc} (3) the application code handler for the Request message. a e | asignc(aencp(‘c’ | k (c, p) | k (c, p))) (4) Generated protocol module Our compiler generates c | macw{s | 0 | `0 | h[c] | h[p] | h[w] | h[q]} (5) a protocol module with functions that send and receive a e | asignc(aencw(‘c’ | k (c, w) | k (c, w))) (6) messages like the one above. The detailed structure of the generated code is shown in Appendix C. To ease notation, we write h[v] = h(s | sr | ‘v’ | v | vc) For each role, the generated send and receive func- to denote the hash of the concatenation of the (fresh) tions operate on a data store (of type store) that

11 contains all the session parameters, such as the session 6. Sessions as path predicates identifier, the current timestamp, and the values of all the variables known to the role. In addition, the store In this section, we outline the proof of our main contains the hashes of all the variables bound in the theoretical result, Theorem 1, which states the in- session so far, including the ones whose values are not tegrity of session executions as observed by compliant known locally. Finally, the store contains cryptographic principals despite the presence of arbitrary coalitions materials for the session, such as established session of compromised ones. Due to space limitations, we keys, received MACs, and confounders for hashes, confine most of the details to the long version of this as described later in this section. During any run of paper [Bhargavan et al., 2009]. the session, the current internal control flow state in We proceed as follows: we first enrich send and combination with the value of the current data store receive events with additional parameters and lift the describes the full state of each role in the session. definition of session traces accordingly; for each ses- sion, we describe families of predicates that capture invariants that must be maintained by a session imple- Trusted platform libraries Our code relies on a set mentation; we define typed interfaces for our generated of trusted libraries for data manipulation, cryptogra- code, and show that if the code meets these types, then phy, networking, and managing principals. For each it maintains these invariants (Lemma 1); we prove, by library, we provide two implementations: a concrete hand, that the implementation of each role is locally implementation that is used to compile and execute the sequential (Lemma 2); using the invariants and local session, and a symbolic implementation that abstractly sequentiality, we establish (via Lemma 3) the integrity models the library and meets a refined typed interface. theorem for all code that is generated by our compiler Concrete libraries are not verified in our method; they and typechecked (Theorem 1). form part of our trusted computing base. Theorem 1 Stores, timestamps, and enriched events We model applies to protocol code linked with symbolic libraries. a store as a session identifier, herein written s, and A first module, Data, provides data types bytes a variable store, written σ. We treat σ as a triple of (for raw bytes arrays) and str (for strings) used for partial maps σv, σh, σc, where σv maps variables to networking and cryptography; its interface provides (their known) values, σh maps variables to hashes, σc e.g. base64: bytes → str for encoding string payloads, maps variables to confounders. and concat: bytes → bytes → bytes for concatenation. Enriched events are obtained from those of Section 4 by adding timestamps and stores (following the imple- The Crypto library provides functions for encryp- mentation in Section 5): tion (RSA and AES), MACing (HMACSHA1), hash- Send f(a, s, ts, v, σ) Recv f(a, s, ts0, ts, v, σ) ing (SHA1), and generating fresh nonces and keys. e e Timestamps, ranged over by ts, are natural numbers. The Prins library maintains a database of principals. The timestamp ts in Send and Recv events records The database records, for every other principal, the the time at which the event is issued and we refer principal name, its public-key certificate, its network to it as the upper timestamp of the event; in Recv address, and any session keys shared with it. We events, ts0 also records the time at which the role assume an existing public key infrastructure (PKI) tgt(f) previously sent a message, or 0 if no previous in which each principal has a public/private keypair message was sent; σ is the local store of a when the and knows the other principals’ public keys. Prins Send and Recv event is issued. also maintains locally an anti-replay cache for each We can lift Definition 1 accordingly to specify the principal, containing session identifiers and roles for all traces of enriched events, as shown in the long version sessions it has joined, to ensure that it never joins the of this paper. same session twice in the same role. During a session, Invariant path predicates A pair of mutually recur- whenever a principal contacts another principal for the sive families of predicates, Q and Q0, serves as the first time, it generates fresh session keys and registers invariant at each send and receive event emitted by them in the database. the generated implementation code for sessions. This Relying on the networking library, Net, Prins also invariant reflects the full complexity of our optimized provides functions for sending and receiving messages protocol and is established by typechecking. Consider ˆ between principals; Net is never accessed directly by any internal control flow state ρ = f (xe)f; then: our generated code, but may be accessed by application • Q ρ(s, ts, σ) asserts that the principal playing role code for non-session communications. src(f) in a session instance with identifier s is

12 satisfied that its global execution has followed an Typechecking a program against its refined interface initial path whose image under the state func- guarantees that in every execution of the system with tion st (see Definition 2) is ρ, with the final step of an active adversary, whenever a function is called, the execution being the send of f at timestamp ts; its pre-condition holds, and when it returns, its post- moreover, the current values for all the variables condition holds. The typechecker relies on the logical written along ρ (i.e. the state after the send) are properties of MACs and , expressed as in the store σ. refined types for the Crypto library, and on the secrecy 0 0 • Q ρ(s, ts , ts, σ) asserts that the principal play- and usage of keys, expressed as a session key type ing role tgt(f) in a session instance with iden- generated by our compiler. It analyzes the protocol tifier s is satisfied that its global execution has code and generates proof obligations in first-order- followed an initial path whose image under st logic, which are discharged by the Z3 SMT solver. is ρ, with the final step of the execution being Hence, by typechecking we establish that path pred- the receive of f at timestamp ts; the last time the icates are maintained as an invariant by the protocol role sent a message was at timestamp ts0 (or 0 if module generated by our compiler. this is the first time the role enters the session); Lemma 1: For any run of a verified Se-system, for any moreover, the current values for all the variables session S ∈ Se, for any session identifier s running S, written along ρ (i.e. the state after the receive) are for any compliant principal a, in the store σ. • the event Send f(a, s, ts, v, σ) in the run implies 0 e Expanding the Q and Q predicates at a particular that there exists an internal control flow state ρ internal control flow state yields a tree of disjunctions of S ending in the sent label f, such that a = and conjunctions of assertions that when combined, σv(src(f)), ve = σvxe, and Q ρ(s, ts, σ) where xe characterizes the valid traces of send and receive events are the written variables of f; at the compliant principals in a trace of the session. 0 • the event Recv f(a, s, ts , ts, v,e σ) in the run im- Proofs by typing Our compiler generates a re- plies that there exists an internal control flow state fined type interface that uses path predicates as pre- ρ of S ending in the received label f, such that 0 0 and post-conditions for the session messaging func- a = σv(tgt(f)), ve = σvye, and Q ρ(s, ts , ts, σ) tions. For example, for the Ws session, the gener- where ye are the read variables of f; ated protocol module contains a role function w that During the design of our compiler, we found several calls the messaging functions recv w Request and bugs (violations of Lemma 1) by typechecking. More send w Reply. The refined type interface for these two often, we found that our type annotations were not functions is of the form: strong enough to establish our results, or that our

val recv w Request: (s:store){Q ε(s.sid,0,0,s)} → typechecker required predicates to be structured in a 0 (s’:store){Q (cwq)Request(s’.sid,1,s’)} specific way. Discovering sufficiently strong annota- val send w Reply: (x:int) → tions for keys, libraries, and auxiliary functions, and 0 (s:store){Q (cwq)Request(s.sid,1,s)} → designing a compiler that automatically generates them {Q } (s’:store) (cwq)Request(x)Reply(s’.sid,2,s’) requires some effort, but is rewarded with an automated The curly braces after (s:store) enclose a logical verification method. We have used this method to formula that must hold about the store s. In general, typecheck several examples; their verification time and formulas that appear in the type of function arguments other statistics are listed in Section 7. represent pre-conditions; formulas in the their result Proof of integrity We complete our proof by hand type represent post-conditions. The pre-condition on (as our typechecker does not keep track of linearity), recv w Request says that, initially, the store s must with a lemma establishing that the implementation of 0 be empty. Its post-condition is the Q predicate on w’s each role in a session is locally sequential: store at the internal control flow state (cwq)Request; in particular, it requires that c must have sent a Request Lemma 2: In any run of a verified Se-system, if the to w and that w must agree with c on the values of principal a is compliant, then for any role r and c, w, and q. The pre-condition on send w Reply is session identifier s for S, the series of events emitted the same as the post-condition of recv w Request; by a with s in role r forms an alternation of sends and hence, a Reply may be sent only after a Request receives such that for any adjacent pair of events is received; its post-condition is the Q predicate de- Send f(a, s, ts0, v, σ0), Recv g(a, s, ts1, ts2, w, σ2) or e scribing w’s store at the internal control flow state Recv g(a, s, ts1, ts2, w,e σ2), Send f(a, s, ts3, ve , σ3) (cwq)Request(x)Reply. we have ts0 = ts1, ts1 < ts2, and ts2 + 1 = ts3.

13 Session S Roles S.session Application Graph Refined Graph S protocol.ml S protocol.ml7 Verification (see Figure 1) (lines) (lines) (.dot lines) (.dot lines) (lines) (lines) (seconds) Rpc 2 8 24 11 18 472 315 6.1 Ws (a) 2 8 33 14 24 592 414 8.8 Commit 2 16 29 14 24 603 399 10.3 Wsn (b) 2 21 47 20 60 1,171 921 45.1 Fwd 3 15 38 11 19 581 357 8.6 Proxy (c) 3 28 65 26 80 2,181 1,939 154.1 Redirect (d) 3 21 34 17 31 788 550 29.3 Login 4 28 54 29 74 2,053 1,542 103.4

Figure 6. Code sizes and verification results for sample sessions

For example, in the session Ws, the type of the role and a four-party distributed login session between function w (see Section 3) ensures that it receives and a client, a gateway, a database, and a dynamically sends messages in strict alternation and that it cannot selected web server (Login). For each session, we send two messages in parallel. Lemma 2 says that experimentally confirmed that the generated protocol all role functions generated by our compiler have this is functional, by testing it over a network. property. The proof follows from the structure of the Our compiler is written in around 4300 lines of compiler code that generates role functions. ML. The trusted libraries, Data, Crypto, Prins, and We use Lemma 2 to show that the predicates Q and Net, are shared by all session protocols and have Q0 imply session integrity: around 800 lines of code, both for concrete (F#/.NET, Ocaml/OpenSSL) and symbolic implementations, al- Lemma 3: For every run, if Q ρ(s, ts, σ) or though the concrete implementations rely on much- Q0 ρ(s, ts0, ts, σ) holds, then there is a session trace larger system libraries. of an initial path ending in state ρ that matches a For each session example (S), Figure 6 first gives the subsequence of the compliant events. numbers of roles; the size of the input file (S.session); Theorem 1 follows from Lemmas 1 and 3. the size of the handwritten application code we used for testing the session; the size of the session graph Secrecy This paper focuses on integrity rather than generated by our compiler, both before and after refin- secrecy, whose formulation is more technical (as it ing the graph with internal control flow states; the size involves the behaviour of application code, not just of the generated ML implementation (S protocol.ml) protocol code). We only outline our secrecy results. and its refinement-typed interface (S protocol.ml7); By typechecking, we obtain secrecy for values as- and finally the time spent typechecking our implemen- signed to session variables, under the assumption that tation against refinement-typed interfaces at the end the application code run by compliant principals is of the compilation. (The rest of the compilation is trusted to provide secret values for these variables and relatively fast.) not leak them to the adversary outside the session. Even when programming complex multi-party ses- (Bengtson et al. [2008] also provide a discussion of sions with many roles and messages, the application secrecy by typing.) The value assigned to a variable in programmer needs to write less than a hundred lines a session run may be obtained by the adversary only of code. The generated modules are larger and more if a compromised principal plays a role in the session complicated—in fact the auxiliary formulas that an- that can read the variable. To verify this property, we notate function declarations are as large as their actual annotate each session encryption key with a refined code. However, the programmer can ignore their details type that enforces that the key is kept secret and that and rely instead on the typechecker. The verification it is only used to encrypt secret values, unless the time is roughly linear in the size of the generated code. principals sharing the key have been compromised. (Pragmatically, the programmer may gain additional confidence in the verification process by reviewing the 7. Performance evaluation location and content of the security events in generated code, which involve only a small part of that code.) We present compilation and verification results for We also measured the overhead of cryptographic a series of examples. In addition to the sessions of protection for several simple cases. Figure 7 gives the Figure 1, our examples include a simple remote call total runtimes (in seconds) for completing a number of (Rpc); a session with early commitment to values instances of session Wsn (Figure 1(b)), which involves (Commit); a session with message forwarding (Fwd); a loop between two roles. We used a Core 2 Duo

14 Session Wsn Instances 5,000 5,000 50 50 Messages per session 2 20 200 2,000 Total messages 10,000 100,000 10,000 100,000 Total runtime (seconds) 4.02 11.12 1.52 6.62 consisting of symmetric encryption 0.10 1.01 0.09 0.68 MAC computations 0.21 1.09 0.09 0.63 2.17 2.21 0.60 0.74 unprotected messaging 1.54 6.41 0.74 4.47

Figure 7. Cryptographic overhead

E8400 at 3.00GHz with 4GB RAM, running Linux E. Bonelli and A. B. Compagnoni. Multipoint session 2.6.24 with OpenSSL cryptography and only local types for a distributed calculus. In 3rd Symposium on communication. The parameters of our runs are the Trustworthy Global Computing (TGC’07), volume number of instances (5,000 twice, then 50 twice) and 4912 of LNCS. Springer, 2007. the number of messages exchanged in each session R. Corin and P.-M. Denielou.´ A protocol compiler (successively 2, 20, 200 and 2,000). for secure sessions in ML. In 3rd Symposium on We measure the total runtime, first for the automat- Trustworthy Global Computing (TGC’07), volume ically generated cryptographic protocol then by suc- 4912 of LNCS. Springer, 2007. cessively disabling the payload encryption, the MAC R. Corin, P.-M. Denielou,´ C. Fournet, K. Bhargavan, generation and verification, and the key exchange of and J. Leifer. A secure compiler for session ab- the protocol. We report the overhead for each of these stractions. Journal of Computer Security, 16(5):573– operations. (The last line thus corresponds to sessions 636, 2008. An earlier version appeared at the 20th that run without any protection.) IEEE Computer Security Foundations Symposium The cryptographic overhead is of two kinds. For (CSF’07). the key-establishment mechanism, it depends only on L. de Moura and N. Bjorner. Z3: An efficient SMT the number of instances. For the symmetric encryption solver. In 14th International Conference on Tools and MACs, the overhead depends on the number of and Algorithms for the Construction and Analysis of messages, and is around 10-15% each per message. Systems (TACAS’08), volume 4963 of LNCS, pages In terms of global performance, we also measured the 337–340. Springer, 2008. total runtime of establishing 5,000 SSL connections M. Dezani-Ciancaglini, N. Yoshida, A. Ahern, and (using the openssl command line), which yields a S. Drossopoulou. A Distributed Object-Oriented comparable 15.93s in the same conditions. language with Session types. In 1st Symposium on Trustworthy Global Computing (TGC’05), Apr. Acknowledgments This work benefited from com- 2005. ments from Joshua Guttman, Jer´ emy´ Planul, Nobuko M. Dezani-Ciancaglini, D. Mostrous, N. Yoshida, and Yoshida, and the anonymous referees. It was partially S. Drossopoulou. Session types for object-oriented supported by EPSRC EP/F003757/1. languages. In 20th European Conference on Object- Oriented Languages, Nantes, France (ECOOP’06), References July 2006. C. Fournet and T. Rezk. Cryptographically sound im- J. Bengtson, K. Bhargavan, C. Fournet, A. D. Gor- plementations for typed information-flow security. don, and S. Maffeis. Refinement types for secure In 35th Symposium on Principles of Programming implementations. In 21st IEEE Computer Security Languages (POPL’08), pages 323–335. ACM Press, Foundations Symposium (CSF’08), 2008. Jan. 2008. K. Bhargavan, C. Fournet, A. D. Gordon, and S. Tse. S. J. Gay and M. Hole. Types and subtypes for client- Verified interoperable implementations of security server interactions. In 8th European Symposium on protocols. In 19th IEEE Computer Security Founda- Programming (ESOP’99), pages 74–90, 1999. tions Workshop (CSFW’06), pages 139–152, 2006. A. Guha, S. Krishnamurthi, and T. Jim. Using static K. Bhargavan, R. Corin, P.-M. Denielou,´ C. Four- analysis for Ajax intrusion detection. In 18th Inter- net, and J. Leifer. Cryptographic protocol syn- national World Wide Web Conference, pages 561– thesis and verification for multiparty sessions. 570, 2009. Long version of this paper, at http://www.msr-inria. K. Honda, V. T. Vasconcelos, and M. Kubo. Lan- inria.fr/projects/sec/sessions/, 2009.

15 guage primitives and type disciplines for struc- a role r active on either fe1 or fe2 such that r1 = r tured communication-based programming. In 7th or r2 = r. European Symposium on Programming (ESOP’98), 5) On any initial path, every role variable is written volume 1381, pages 22–138. Springer, 1998. at most once. K. Honda, N. Yoshida, and M. Carbone. Multi- 6) For any initial path ffe ending in role r, we have party asynchronous session types. In 35th ACM r ∈ knows(r, ffe ). SIGPLAN-SIGACT Symposium on Principles of Pro- 7) For any initial path fe ending in role r, and any gramming Languages (POPL’08), pages 273–284. role r0 active on fe, we have r0 ∈ knows(r, fe). ACM, 2008. 8) For any initial path fe ending in role r for the R. Hu, N. Yoshida, and K. Honda. Session-based first time (i.e. r not active on fe), every role r0 distributed programming in java. In 22nd European active on fe is such that r ∈ knows(r0, fe). Conference on Object-Oriented Programming, Pa- 9) For any initial path ffe , if x is read on f and f phos, Cyprus (ECOOP’08), pages 516–541, 2008. has source role r then x ∈ knows(r, ffe ). D. Malkhi, N. Nisan, B. Pinkas, and Y. Sella. Fairplay 10) For any initial path ffe ending in role r, if x is - secure two-party computation system. In USENIX read on f and x ∈ knows(r, fe) then x is written Security Symposium, 2004. on f. J. McCarthy and S. Krishnamurthi. Cryptographic protocol explication and end-point projection. In Appendix B. European Symposium on Research in Computer Se- Session syntax curity (ESORICS’08), 2008. D. Syme, A. Granicz, and A. Cisternino. Expert F#. We declare sessions using a process-like syntax for Apress, 2007. each role. This is a local representation, unlike the V. T. Vasconcelos, S. Gay, and A. Ravara. Typecheck- global session graphs in Section 2, but we can convert ing a multithreaded functional language with session between the two representations. (Honda et al. [2008] types. Theoretical Computer Science, 368(1–2):64– also explore conditions under which such interconver- 87, 2006. sions are possible.) All the global session graphs in this L. Zheng, S. Chong, A. C. Myers, and S. Zdancewic. paper are automatically generated from local syntactic Using replication and partitioning to build secure descriptions. distributed systems. In IEEE Symposium on Security τ ::= int | string Payload types and Privacy (S&P’03), pages 236–250, 2003. p ::= Role processes send( {+fi(xei); pi}i

16 the knowledge of hashes and values in order to deduce (header is built as in content) the commitment checks and the fwd hashes relation let content = content ρ00 z s.header.ts s in 00 e 00 that lists the hashes to be forwarded in each message. let mackeyrr = get mackey s.vars.r s.vars.r in let macmsg = mac mackeyrr00 (pickle content) in Finally, an independant fwd keys relation that as- 00 let r fze = concat header macmsg in sociates messages with the keys that need to be 00 00  let store.macs.r fze ← r fze in forwarded is computed: it is used to propagate the fold over (r00, l, z) ∈ fwd macs(ρ, f)  e 00  symmetric shared keys of the session between each let macs = concat store.macs.r lze macs in pair of roles. As shown in the code below, fwd keys is fold over z ∈ fwd hashes(ρ, f)   also used to lookup (if it already exists) or generate a let hashes = concat store.hashes.hz hashes in new shared key, using function gen keys. fold over y ∈ ye let keyrr0 = get symkey s.vars.r s.vars.r0 in Store Updated throughout the execution, the store let encr y = sym encrypt keyrr0 (pickle mar y) in  contains the values of the variable received, some let variables = concat encr y variables in hashes of variables, some MACs, a logical , and let security = concat keys (concat hashes macs) in let payload = concat variables security in the session id. It corresponds to the local view of the let content = concat (utf8 (cS "ρ")) payload in global distributed store. We use the notation ← to let msg = base64 (concat header content) in designate a store with an updated field. let () = psend s.vars.r0 msg in s type store = { vars : { for each (x:t) ∈ X ,  x: t } ; hashes : { for each x ∈ X ,  hx : hashstore  } ; Receiving functions For each receiving state se- (∗ hashstore has hashes and confounders ∗) quence, the compiler generates a sum type with a macs : { for all l with x visible constructor for each possible return values of the  e  received by r, rlxe : bytes } ; receiveWired functions. keys : { for each pair r,r’ of roles,  key r r0:  For all receiving state ρ, bytes } ;  header = { ts : int ; sid : bytes }} type wired ρ = for each f that can be received in state ρ Auxiliary functions The following content functions  | Wired f ρ of types of Read(f) ∗ store  build the content of the MACs used in the protocol The receiveWired function checks whether the re- (as described in Section 5). It is used by the MAC ceived message is initial or not: in the former case, generation and verification functions. the cache needs to be checked for guarding against For all state ρ that is MACed with variables z replay attacks; in the latter, only session id verification  e let content ρ ze = fun ts store → and time-stamp progress are necessary. fold over z ∈ ze Once the header of an incoming message is checked, let hashes = concat store.hashes.hz.hash hashes in let state = utf8 (cS "ρ") in the receiving code verifies the included visible se- let payload = concat state hashes in quence is acceptable. Then, the protocol unmarshalls let header = concat store.header.sid and decrypts variables and keys (read and fwd keys), (utf8 (cS (string of int ts))) in checks commitments (that is, adequacy between an  concat header payload already known hash (i.e. not learnt) of a value that has now become readable), unmarshalls hashes Sending functions For each message (i.e. edge) in (fwd hashes), unmarshalls MACs (fwd macs), checks the refined graph, the compiler generates a sendWired MACs (visib), and finally returns the corresponding function that builds and sends a message (as detailed in Wired data type. Section 5). Confounder management has been removed for clarity.

(x) f (y) For all ρ −−−−−→e e ρ0 of the refined graph The sending role is r, the receiving role is r0  let sendWired f ρ xe (s:store) = fold over x ∈ xe  s.vars.x ← x ; s.hashes.hx ← sha1 x ; s.header.ts ← s.header.ts + 1; fold over r, r0 ∈ fwd key(ρ) let keyrr0 = gen keys s.vars.r s.vars.r0 in let keys = concat keyrr0 keys in 00 00 for all (r , ρ , ze) ∈ future(ρ, l)

17