

Balancing Performance, Accuracy, and Precision for Secure Cloud Transactions

Marian K. Iskander, Tucker Trainor, Dave W. Wilkinson, Adam J. Lee, Panos K. Chrysanthis
Department of Computer Science, University of Pittsburgh
{marianky, tmt33, dwilk, adamlee, panos}@cs.pitt.edu

Abstract—In distributed transactional systems deployed over cloud servers, entities cooperate to form proofs of authorization that are justified by collections of certified credentials. These proofs and credentials may be evaluated and collected over extended time periods, during which the underlying authorization policies or the user credentials may be in inconsistent states. It therefore becomes possible for policy-based authorization systems to make unsafe decisions that might threaten sensitive resources. In this paper, we highlight the criticality of the problem. We then define the notion of trusted transactions when dealing with proofs of authorization. Accordingly, we propose several increasingly stringent levels of policy consistency constraints, and present different enforcement approaches to guarantee the trustworthiness of transactions executing on cloud servers. We propose a Two-Phase Validation Commit protocol as a solution, which is a modified version of the basic Two-Phase Commit protocol. Finally, we analyze the different approaches using both an analytical evaluation of the overheads and simulations to guide decision makers as to which approach to use.

Index Terms—Cloud computing, authorization policies, consistency, distributed transactions, atomic commit protocol

1 I makes it difficult to strictly maintain the ACID guarantees,  computing has recently emerged as a computing as the ’C’ (consistency) part of ACID is sacrificed to provide C paradigm in which storage and computation can be reasonable availability [3]. outsourced from organizations to next generation data cen- In systems that host sensitive resources, accesses are pro- ters hosted by companies such as Amazon, Google, Yahoo, tected via authorization policies that describe the conditions and Microsoft. Such companies help free organizations from under which users should be permitted access to resources. requiring expensive infrastructure and expertise in-house, and These policies describe relationships between the system prin- instead make use of the cloud providers to maintain, support, cipals, as well as the certified credentials that users must and broker access to high-end resources. From an economic provide to attest to their attributes. In a transactional database perspective, cloud consumers can save huge IT capital invest- system that is deployed in a highly distributed and elastic sys- ments and be charged on the basis of a pay-only-for-what- tem such as the cloud, policies would typically be replicated— you-use pricing model. very much like data—among multiple sites, often following One of the most appealing aspects of cloud computing the same weak or eventual consistency model. It therefore is its elasticity, which provides an illusion of infinite, on- becomes possible for a policy-based authorization system to demand resources [1] making it an attractive environment make unsafe decisions using stale policies. for highly-scalable, multi-tiered applications. 
However, this Interesting consistency problems can arise as transactional can create additional challenges for back-end, transactional database systems are deployed in cloud environments and database systems, which were designed without elasticity in use policy-based authorization systems to protect sensitive mind. Despite the efforts of key-value stores like Amazon’s resources. In addition to handling consistency issues amongst SimpleDB, Dynamo, and Google’s Bigtable to provide scal- database replicas, we must also handle two types of security able access to huge amounts of data, transactional guarantees inconsistency conditions. First, the system may suffer from remain a bottleneck [2]. policy inconsistencies during policy updates due to the re- To provide and elasticity, cloud services often laxed consistency model underlying most cloud services. For make heavy use of replication to ensure consistent perfor- example, it is possible for several versions of the policy to be mance and availability. As a result, many cloud services rely observed at multiple sites within a single transaction, leading on the notion of eventual consistency when propagating data to inconsistent (and likely unsafe) access decisions during throughout the system. This consistency model is a variant of the transaction. Second, it is possible for external factors to weak consistency that allows data to be inconsistent among cause user credential inconsistencies over the lifetime of a some replicas during the update process, but ensures that transaction [4]. For instance, a user’s login credentials could updates will eventually be propagated to all replicas. This be invalidated or revoked after collection by the authorization 2 server, but before the completion of the transaction. In this paper, we address this confluence of data, policy, and creden- Transactions tial inconsistency problems that can emerge as transactional Transaction Managers database systems are deployed to the cloud. 
In doing so we (TMs) make the following contributions: • We formalize the concept of trusted transactions. Trusted

DBs and Policies DBs and Policies transactions are those transactions that do not violate DBs and Policies credential or policy inconsistencies over the lifetime of the transaction. We then present a more general term, Verifiable Trusted Third safe transactions, that identifies transactions that are both Parties (CAs) trusted and conform to the ACID properties of distributed database systems (Sec. 2). Fig. 1. Interaction among the system components • We define several different levels of policy consistency constraints and corresponding enforcement approaches that guarantee the trustworthiness of transactions being a set of inference rules from the set R, a policy administrator executed on cloud servers (Sec. 3). from the set A, and a version number. We denote by C the • We propose a Two-Phase Validation Commit (2PVC) set of all credentials, which are issued by the Certificate protocol that ensures that a transaction is safe by checking Authorities (CAs) within the system. We assume that each policy, credential, and data consistency during transaction CA offers an online method that allows any server to check execution (Sec. 4). the current status of credentials that it has issued [5]. Given a • We carry out an experimental evaluation of our proposed credential ck ∈ C, α(ck) and ω(ck) denote issue and expiration D approaches (Sec. 5), and present a trade-off discussion times of ck, respectively. Given a function m : Q → 2 that to guide decision makers as to which approach is most identifies the data items accessed by a particular query, a proof suitable in various situations (Sec 6). of authorization for query qi evaluated at server s j at time

Finally, Sec. 7 describes previous related work, while Sec. 8 tk is a tuple hqi, s j, Ps j (m(qi)), tk, Ci, where C is the set of presents our conclusions. credentials presented by the querier to satisfy Ps j (m(qi)). In this paper, we use the function eval : F × TS → B to denote whether a proof f ∈ F is valid at time t ∈ TS . 2 S A  P D To enhance the general applicability of the consistency 2.1 System Model models developed in this paper, the above formalism is in- Fig. 1 illustrates the interaction among the components in our tentionally opaque with respect to the policy and credential system. We assume a cloud infrastructure consisting of a set formats used to implement the system. For instance, this of S servers, where each server is responsible for hosting a formalism could easily be used to model the use of XACML subset D of all data items D belonging to a specific application policies [6] as the set of inference rules R, and traditional domain (D ⊂ D). Users interact with the system by submitting (e.g., X.509 [7]) credentials for the set C. On the other hand, queries or update requests encapsulated in ACID transactions. it can also model the use of more advanced trust management A transaction is submitted to a Transaction Manager (TM) policies (e.g., [8], [9]) for the inference rules R, and the use that coordinates its execution. Multiple TMs could be invoked of privacy-friendly credentials (e.g., [10], [11]) for the set C. as the system workload increases for load balancing, but each transaction is handled by only one TM. 2.2 Problem Definition We denote each transaction as T = q1, q2,..., qn, where qi ∈ Since transactions are executed over time, the state Q is a single query/update belonging to the set of all queries Q. 
of the credentials and the policies enforced by different servers The start time of each transaction is denoted by α(T), and the are subject to changes at any instance of time, therefore it time at which the transaction finishes execution and is ready to becomes important to introduce precise definitions for the commit is denoted by ω(T). We assume that queries belonging different consistency levels that could be achieved within a to a transaction execute sequentially, and that a transaction transactions lifetime. These consistency models strengthen the does not fork sub-transactions. These assumptions simplify our trusted transaction definition by defining the environment in presentation, but do not affect the correctness or the which policy versions are consistent relative to the rest of the of our consistency definitions. system. Before we do that, we define a transaction’s view in Let P denote the set of all authorization policies, and let terms of the different proofs of authorization evaluated during Ps (D) denote the policy that server si uses to protect data i the lifetime of a particular transaction. item D. We represent a policy P as a mapping P : S × 2D → 2R × A×N that associates a server and a set of data items with Definition 1: (View) A transaction’s view VT is the set 3 of proofs of authorization observed during the lifetime of a 3 T T E transaction [α(T), ω(T)] and defined as VT = { f | f = si si In this section, we present several increasingly stringent ap- hq , s , P (m(q )), t , Ci ∧ q ∈ T}. _ i i si i i i proaches for enforcing trusted transactions. We show that Following from Def. 1, a transaction’s view is built incre- each approach offers different guarantees during the course mentally as more proofs of authorization are being evaluated of a transaction. Fig. 2 is a graphical depiction of how these by servers during the transaction execution. 
We now present approaches could be applied to a transaction running across two increasingly more powerful definitions of consistencies three server, and will be referenced throughout this section. In within transactions. this figure, dots represent the arrival time of a query to some server, stars indicate the times at which a server validates a Definition 2: (View Consistency) A view proof of authorization, and dashed lines represent view- or T V = {hqi, si, Psi (m(qi)), ti, Ci,..., hqn, sn, Psn (m(qn)), tn, Ci} is globally-consistency policy synchronization among servers. view consistent, or φ-consistent, if VT satisfies a predicate φ-consistent that places constraints on the versioning of the 3.1 Deferred Proofs of Authorization T policies such that φ-consistent(V ) ↔ ∀i, j : ver(Psi ) = ver(Ps j ) for all policies belonging to the same administrator A, where Definition 5: (Deferred Proofs of Authorization) Given a T function ver is defined as ver : P → N. _ transaction T and its corresponding view V , T is trusted under the Deferred proofs of authorization approach, iff at commit T With a view consistency model, the policy versions should time ω(T), ∀ T : eval( f , ω(T)) ∧ (φ-consistent(V ) ∨ ψ- fsi ∈V si be internally consistent across all servers executing the trans- consistent(VT )) _ action. The view consistency model is weak in that the policy Deferred proofs present an optimistic approach with rela- version agreed upon by the subset of servers within the tively weak authorization guarantees. The proofs of authoriza- transaction may not be the latest policy version v. It may be tion are evaluated simultaneously only at commit time (using the case that a server outside of the S servers has a policy either view or global consistency from Defs. 2 and 3) to decide that belongs to the same administrative domain and with a whether the transaction is trusted. version v0 > v. A more strict consistency model is the global consistency and is defined as follows. 
3.2 Punctual Proofs of Authorization Definition 3: (Global Consistency) A view Definition 6: (Punctual Proofs of Authorization) Given a VT = {hq , s , P (m(q )), t , Ci,..., hq , s , P (m(q )), t , Ci} is i i si i i n n sn n n transaction T and its corresponding view VT , T is trusted global consistent, or ψ-consistent, if VT satisfies a predicate under the Punctual proofs of authorization approach, iff at ψ-consistent that places constraints on the versioning of the any time instance ti : α(T) ≤ ti ≤ ω(T) ∀ ∈ T : eval( fs , ti) ∧ T fsi V i policies such that ψ-consistent(V ) ↔ ∀i : ver(Ps ) = ver(P) T T i eval( fs , ω(T)) ∧ (φ-consistent(V ) ∨ ψ-consistent(V )) _ for all policies belonging to the same administrator A, and i function ver follows the same aforementioned definition, while Punctual proofs present a more proactive approach in ver(P) refers to the latest policy version. _ which the proofs of authorization are evaluated instantaneously whenever a query is being handled by a server. This facilitates With a global consistency model, policies used to evaluate early detections of unsafe transactions which can save the the proofs of authorization during a transaction execution system from going into expensive undo operations. All the among S servers should match the latest policy version among proofs of authorization are then re-evaluated at commit time the entire policy set P, for all policies enforced by the same to ensure that policies were not updated during the transaction administrator A. in a way that would invalidate a previous proof, and that Given the above definitions, we now have a precise vocab- credentials were not invalidated. ulary for defining the conditions necessary for a transaction to Punctual proofs do not impose any restrictions on the be asserted as “trusted”. freshness of the policies used by the servers during the trans- action execution. 
Consequently, servers might falsely deny or Definition 4: (Trusted Transaction) Given a transaction allow access to data. Thus, we propose two more restrictive T = {q , q ,..., q } and its corresponding view VT , T is trusted 1 2 n approaches that enforce some degree of consistency among iff ∀ f ∈VT : eval( fs , t), at some time instance t : α(T) ≤ t ≤ si i the participating servers each time a proof is evaluated. ω(T) ∧ (φ-consistent(VT ) ∨ ψ-consistent(VT )) _

Finally, we say that a transaction is safe if it is a trusted 3.3 Incremental Punctual Proofs of Authorization transaction that also satisfies all data integrity constraints im- Before we define the Incremental Punctual proofs of autho- posed by the database management system. A safe transaction rization approach, we define a view instance, which is a view is allowed to commit, while an unsafe transaction is forced to snapshot at a specific instance of time. rollback. 4
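The φ- and ψ-consistency predicates can be sketched as follows (a minimal illustration of Defs. 2 and 3, not the paper's implementation; the proof records and the `latest` map are hypothetical stand-ins for the policies actually attached to proofs of authorization):

```python
# Sketch: checking view (phi) and global (psi) consistency of a
# transaction's view. Each proof records the administrator and the
# version of the policy it was evaluated under.
from collections import defaultdict

def phi_consistent(view):
    """View consistency: all proofs sharing a policy administrator
    must have been evaluated under the same policy version."""
    versions = defaultdict(set)
    for proof in view:
        versions[proof["admin"]].add(proof["version"])
    return all(len(v) == 1 for v in versions.values())

def psi_consistent(view, latest):
    """Global consistency: every proof must use the latest version
    known for its administrator (e.g., the master policy server's)."""
    return all(proof["version"] == latest[proof["admin"]]
               for proof in view)

view = [{"admin": "A1", "version": 3}, {"admin": "A1", "version": 3},
        {"admin": "A2", "version": 7}]
print(phi_consistent(view))                      # True
print(psi_consistent(view, {"A1": 4, "A2": 7}))  # False: A1 is stale
```

Note that ψ-consistency implies φ-consistency for this view, but not vice versa: the two A1 proofs agree with each other yet may still lag behind the latest version.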

Fig. 2. Different variants of proofs of authorization: (a) Deferred, (b) Punctual, (c) View Incremental Punctual, (d) Global Incremental Punctual, (e) View Continuous, (f) Global Continuous. Each panel shows three servers s_1, s_2, s_3 over the interval [α(T), ω(T)]; dots mark query start times and stars mark proofs of authorization.

Definition 7: (View Instance) A view instance V^T_ti ⊆ V^T is defined as V^T_ti = { f_si | f_si = ⟨q_i, s_i, P_si(m(q_i)), t, C⟩ ∈ V^T ∧ t ≤ t_i }, ∀t, t_i : α(T) ≤ t ≤ t_i ≤ ω(T).

Informally, a view instance V^T_ti is the subset of all proofs of authorization evaluated by servers involved in transaction T up until the time instance t_i.

Definition 8: (Incremental Punctual Proofs of Authorization) Given a transaction T and its corresponding view V^T, T is trusted under the Incremental Punctual proofs of authorization approach, iff at any time instance t_i : α(T) ≤ t_i ≤ ω(T), ∀ f_si ∈ V^T_ti : eval(f_si, t_i) ∧ (φ-consistent(V^T_ti) ∨ ψ-consistent(V^T_ti)).

Incremental Punctual proofs develop a stronger notion of trusted transactions, as a transaction is not allowed to proceed unless each server achieves the desired level of policy consistency with all previous servers. This implies that all participating servers will be forced to have a consistent view with the first executing server, unless a newer policy version shows up at a later server, in which case the transaction aborts.

For view consistency, no consistency check at commit time is required, since all participating servers will be view consistent by commit time. On the other hand, the global consistency condition necessitates another consistency check at commit time to confirm that the policies used have not become stale during the window of execution between the last proofs of authorization and commit time.

3.4 Continuous Proofs of Authorization

We now present the least permissive approach, which we call Continuous proofs of authorization.

Definition 9: (Continuous Proofs of Authorization) A transaction T is trusted under the Continuous approach, iff at any time instance t_i : α(T) ≤ t_i ≤ ω(T), ∀ 1 ≤ i ≤ n, ∀ 1 ≤ j ≤ i : eval(f_si, t_i) ∧ eval(f_sj, t_i) ∧ (φ-consistent(V^T_ti) ∨ ψ-consistent(V^T_ti)).

In Continuous proofs, whenever a proof is evaluated, all previous proofs have to be re-evaluated if a newer version of the policy is found at any of the participating servers. At commit time, Continuous proofs behave similarly to Incremental Punctual proofs. In contrast with the Incremental Punctual proofs, if later executing servers are using newer policy versions, all previous servers must (i) update their policies to be consistent with the newest one, and (ii) re-evaluate their proofs of authorization using the newer policies. In the case of global consistency, all servers will be forced to use the latest policy version at all times. Therefore, we consider this variant of our approaches to be the most strict approach of all, giving the best privacy and consistency guarantees.

The decision of which approach to adopt is likely to be a strategic choice made independently by each application. As with any trade-off, the stronger the security and accuracy given by an approach, the more the system has to pay in terms of implementation and message exchange overheads. Further discussion of these trade-offs will be presented in Sec. 6.

4 Implementing Safe Transactions

A safe transaction is a transaction that is both trusted (i.e., satisfies the correctness properties of proofs of authorization) and database correct (i.e., satisfies the data integrity constraints). We first describe an algorithm that enforces trusted transactions, and then expand this algorithm to enforce safe transactions. Finally, we show how these algorithms can be used to implement the approaches discussed in Sec. 3.
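The policy-version reconciliation that Continuous proofs perform at each query, and that the validation machinery of this section performs at commit time, can be sketched as follows (our illustration, not the paper's Java implementation; the `Participant` interface is hypothetical):

```python
class Participant:
    """Hypothetical participant: holds policy versions and a truth
    value for its local proofs of authorization."""
    def __init__(self, versions, ok=True):
        self.versions, self.ok = dict(versions), ok

    def validate(self):
        # Collection-phase reply: (truth value, {policy_id: version}).
        return self.ok, dict(self.versions)

    def update(self, pid, version):
        # TM pushes the newest policy; a real participant would also
        # re-evaluate its proofs of authorization here.
        self.versions[pid] = version

def reconcile_and_validate(participants):
    """Repeatedly collect replies, push the largest version of each
    policy to stale participants, and re-vote until consistent."""
    while True:
        replies = [p.validate() for p in participants]
        latest = {}
        for _, versions in replies:
            for pid, v in versions.items():
                latest[pid] = max(latest.get(pid, v), v)
        stale = [p for p, (_, vers) in zip(participants, replies)
                 if any(v < latest[pid] for pid, v in vers.items())]
        if not stale:
            return "ABORT" if not all(ok for ok, _ in replies) else "CONTINUE"
        for p in stale:
            for pid, v in latest.items():
                p.update(pid, v)

print(reconcile_and_validate([Participant({"p1": 1}),
                              Participant({"p1": 2})]))  # CONTINUE
```

Here every stale participant is brought up to the maximum version seen and asked to re-vote, mirroring the update round that the next subsection's collection phase performs.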

4.1 Two-Phase Validation Algorithm

A common characteristic of most of our proposed approaches to achieve trusted transactions is the need for policy consistency validation at the end of a transaction. That is, in order for a trusted transaction to commit, its TM has to enforce either view or global consistency among the servers participating in the transaction. Toward this, we propose a new algorithm called Two-Phase Validation (2PV).

As the name implies, 2PV operates in two phases: collection and validation. During collection, the TM first sends a Prepare-to-Validate message to each participant server. In response to this message, each participant (1) evaluates the proofs for each query of the transaction using the latest policies it has available and (2) sends a reply back to the TM containing the truth value (TRUE/FALSE) of those proofs along with the version number and policy identifier for each policy used. Further, each participant keeps track of its reply (i.e., the state of each query), which includes the id of the TM (TM_id), the id of the transaction (T_id) to which the query belongs, and a set of policy versions used in the query's authorization (v_i, p_i).

Algorithm 1: Two-Phase Validation - 2PV (TM)
1 Send "Prepare-to-Validate" to all participants
2 Wait for all replies (a True/False, and a set of policy versions for each unique policy)
3 Identify the largest version for all unique policies
4 If all participants utilize the largest version for each unique policy
5   If any responded False
6     ABORT
7   Otherwise
8     CONTINUE
9 Otherwise, for all participants with old versions of policies
10  Send "Update" with the largest version number of each policy
11  Goto 2

Once the TM receives the replies from all the participants, it moves on to the validation phase. If all policies are consistent, then the protocol honors the truth values, where any FALSE causes an ABORT decision and all TRUE cause a CONTINUE decision. In the case of inconsistent policies, the TM identifies the latest policy and sends an Update message to each out-of-date participant with a policy identifier and returns to the collection phase. In this case, the participants (1) update their policies, (2) re-evaluate the proofs, and (3) send a new reply to the TM. Algorithm 1 shows the process for the TM.

In the case of view consistency (Def. 2), there will be at most two rounds of the collection phase. A participant may only be asked to re-evaluate a query using a newer policy by an Update message from the TM after one collection phase. For the global consistency case (Def. 3), the TM retrieves the latest policy version from a master policies server (Step 2) and uses it to compare against the version numbers of each participant (Step 3). This master version may be retrieved only once or each time Step 3 is invoked. In the former case, collection may only be executed twice, as in the case of view consistency. In the latter case, if the TM retrieves the latest version every round, global consistency may execute the collection phase many times. This is the case if the policy is updated during the round. While the number of rounds is theoretically unbounded, in a practical setting, this should occur infrequently.

4.2 Two-Phase Validation Commit Algorithm

The 2PV protocol enforces trusted transactions, but does not enforce safe transactions because it does not validate any integrity constraints. Since the Two-Phase Commit atomic protocol (2PC) commonly used to enforce integrity constraints has a similar structure to 2PV, we propose integrating these protocols into a Two-Phase Validation Commit (2PVC) protocol. 2PVC can be used to ensure both the data and policy consistency requirements of safe transactions.

Specifically, 2PVC will evaluate the policies and authorizations within the first, voting phase. That is, when the TM sends out a Prepare-to-Commit message for a transaction, the participant server has three values to report: (1) the YES or NO reply for the satisfaction of integrity constraints as in 2PC, (2) the TRUE or FALSE reply for the satisfaction of the proofs of authorization as in 2PV, and (3) the version number of the policies used to build the proofs (v_i, p_i) as in 2PV.

The process given in Algorithm 2 is for the TM under view consistency. It is similar to that of 2PV, with the exception of handling the YES or NO reply for integrity constraint validation and having a decision of COMMIT rather than CONTINUE. The TM enforces the same behavior as 2PV in identifying policy inconsistencies and sending the Update messages. The same changes to 2PV can be made here to provide global consistency by consulting the master policies server for the latest policy version (Step 5).

The resilience of 2PVC to system and communication failures can be achieved in the same manner as 2PC by recording the progress of the protocol in the logs of the TM and participants. In the case of 2PVC, a participant must forcibly log the set of (v_i, p_i) tuples along with its vote and truth value. Similarly to 2PC, the cost of 2PVC can be measured in terms of log complexity (i.e., the number of times the protocol forcibly logs for recovery) and message complexity (i.e., the number of messages sent). The log complexity of 2PVC is no different than basic 2PC and can be improved by using any of the log-based optimizations of 2PC, such as Presumed-Abort (PrA) and Presumed-Commit (PrC) [12]. The message complexity of 2PVC was analyzed in [13].

4.3 Using 2PV & 2PVC in Safe Transactions

2PV and 2PVC can be used to enforce each of the consistency levels defined in Sec. 3. Deferred and Punctual (Defs. 5 and 6) proofs are roughly the same. The only difference is that

Algorithm 2: Two-Phase Validation Commit - 5 E 2PVC (TM) 5.1 Environment and Setup

1 Send “Prepare-to-Commit” to all participants We used Java to implement each proof approach described 2 Wait for all replies (Yes/No, True/False, and a set of in Sec. 3 with support for both view and global consistency. policy versions for each unique policy) Although the approaches were implemented in their entirety, 3 If any participant replied No for integrity check the underlying database and policy enforcement systems were 4 ABORT 5 Identify the largest version for all unique policies simulated with parameters chosen according to Table 1. To 6 If all participants utilize the largest version for each understand the performance implications of the different ap- unique policy proaches, we varied the (i) protocol used, (ii) level of con- 7 If any responded False sistency desired, (iii) frequency of master policy updates, (iv) 8 ABORT transaction length, and (v) number of servers available. 9 Otherwise 10 COMMIT Our experimentation framework consists of three main com- 11 Otherwise, for participants with old policies ponents: a randomized transaction generator, a master policy 12 Send “Update” with the largest version server that controls the propagation of policy updates, and an number of each policy array of transaction processing servers. 13 Wait for all replies Our experiments were run within a research lab consisting 14 Goto 5 of 38 Apple Mac Mini computers. These machines were running OS X 10.6.8 and had 1.83 GHz Intel Core Duo processors coupled with 2GB of RAM. All machines were connected to a gigabit ethernet LAN with average round trip Punctual will return proof evaluations upon executing each times of 0.35 ms. All WAN experiments were also conducted query. Yet this is done on a single server, and therefore does within this testbed by artificially delaying packet transmission not need 2PVC or 2PV to distribute the decision. To provide by an additional 75 ms. 
for trusted transactions, both require a commit-time evaluation For each simulation and each possible combination of at all participants using 2PVC. parameters, 1000 transactions were run to gather average Incremental Punctual (Def. 8) proofs are slightly different. on transaction processing delays induced by the As queries are executed, the TM must also check for consis- particular protocol and system parameter choices. The ran- tency within the participating servers. Hence, a variant of the domized transactions were randomly composed of database basic 2PV protocol is used during the transaction execution. reads and writes with equal . To simulate policy For view consistency, the TM needs to check the version updates at different servers, the master policy server picks a number it receives from each server with that of the very random participating server to receive the updates. first participating server. If they are different, the transaction Given that our interest in this article lies in exploring the aborts due to a consistency violation. At commit time, all the average performance of each of the different approaches, we proofs will have been generated with consistent policies and made few assumptions to simplify the experimentations and only 2PC is invoked. In the global consistency case, the TM help limit the influence of other factors on transaction execu- needs to validate the policy versions used against the latest tion time. Specifically, we assumed the existence of a single policy version known by the master policies server to decide master policy server that has to be consulted for the latest whether to abort or not. At commit time, 2PVC is invoked by policy version belonging to a specific policy administrator. 
the TM to check the data integrity constraints and verify that the master policies server has not received any newer policy versions. This simplifies the 2PV protocol and reduces the number of exchanged messages needed to realize the latest version among all policy servers.

Finally, Continuous proofs (Def. 9) are the most involved. Unlike the case of Incremental Punctual in view consistency, Continuous proofs invoke 2PV at the execution of each query, which will update the older policies with the new policy and re-evaluate. When a query is requested, its TM will (1) execute 2PV to validate the authorizations of all queries up to this point, and (2) upon CONTINUE being the decision of 2PV, submit the next query to be executed at the appropriate server; otherwise, the transaction aborts. The same actions occur under global consistency, with the exception that the latest policy version is used, as identified by the master policy server.

TABLE 1
Simulation Parameters

Parameter                              | Value(s)
Times of policy updates                | once during operations, once per participant join, or once at commit time
Disk read latency                      | 1-3 ms
Disk write latency                     | 12-20 ms
Authorization check delay              | 1-3 ms
Data integrity constraint verification | 1-3 ms
Transaction size                       | Short: 8-15 operations, Medium: 16-30 operations, or Long: 31-50 operations

We further assumed that all proofs of authorization evaluate to true (i.e., only authorized individuals attempted to run transactions in the system) and that no data integrity violations were encountered during any transaction's execution. Therefore, transactions would only abort due to a policy inconsistency. This implies that Deferred, Punctual, and Continuous proofs will always produce successful commits, whether or not a policy change is detected.

We measure the average execution time of the shortest successfully committed transactions (denoted t_s), which occurs when there are no policy changes, and the average execution time of the longest successfully committed transactions (denoted t_f), which occurs when policy changes force re-evaluations of the proofs of authorization or multiple rounds of 2PV are invoked (e.g., in Continuous proofs). Essentially, t_f captures the cost of recovering from a conflict. In the case of Continuous proofs, the worst case is when a policy change is detected each time a new server joins the execution of a transaction. The average transaction execution time to terminate (abort or commit) for Deferred, Punctual, and Continuous proofs can be computed using the following equation, where P_u represents the probability of a policy update:

t = t_s(1 - P_u) + t_f P_u    (1)

As opposed to the other proofs of authorization, in Incremental Punctual proofs, if a policy change is detected during the execution of a transaction, the transaction will abort regardless of whether it is using view or global consistency. Therefore, to compute the average execution time, we assume that each aborted transaction is re-executed once to successful commit, with all servers using consistent policies. This assumption approximates the cost of rolling back the aborted transactions. We use the following equation to compute the average transaction execution time:

t = (t_f + t_s)P_u + t_s(1 - P_u)    (2)

where t_f denotes the measured average time of the quickest aborted transactions among the simulation runs, and t_s denotes the average time of the successfully committed transactions.

5.2 Simulation Results

Using Eqs. 1 and 2, we plot Figs. 3 and 4 to show our simulation results for the LAN arrangement and the simulated WAN, respectively. Each figure shows the execution time of the committed transactions (y-axis) as the probability of a policy update changes (x-axis). The figures contrast the four different approaches for proofs of authorization, each with the two validation modes, namely view and global consistency. The figures show different transaction lengths: (a) short transactions involve 8–15 operations running on up to 5 servers, (b) medium transactions involve 16–30 operations running on up to 15 servers, and (c) long transactions involve 31–50 operations running on up to 25 servers. For each case, and as a baseline, we measured the transaction execution time when transactions execute without any proofs of authorization and are terminated using the basic 2PC (shown in the figures as a solid line labeled Deferred 2PC only). In all cases, the average transaction execution time of Deferred proofs with 2PVC was effectively the same as the baseline, indicating that 2PVC has negligible overhead over the basic 2PC.

The relative performance of the different proofs of authorization is consistent throughout the different experiments. From the figures, we can conclude that Deferred proofs have the best performance of all, as the transaction operations are allowed to proceed without interruption until commit time. Of course, proofs of authorization failing at commit time will force the transaction into a potentially expensive rollback. That is not the case with the other schemes, as their proofs are evaluated earlier during the execution of the transaction, and the rollback process of aborted transactions involves fewer operations.

Punctual proofs come next in terms of performance. The minor difference between Punctual and Deferred proofs is because Punctual proofs incur the cost of the local authorization checks, each of which is in the range of 3-5 ms. Both Deferred and Punctual proofs are on average insensitive to the probability of policy updates (as seen from the graph slopes). This is due to the fact that both schemes only enforce consistency at commit time.

Incremental Punctual proofs show the worst performance of all schemes and are the most sensitive to the probability of policy updates. This is because Incremental Punctual proofs, using either view or global consistency, run the risk of aborting and re-executing each time a policy update is encountered. As the policy update probability increases, the performance of Incremental Punctual is severely penalized.

Continuous proofs show better performance than the Incremental Punctual approach, but worse than the Deferred and Punctual approaches. Just as with Incremental Punctual, the performance of Continuous proofs suffers as the probability of policy updates increases, since with each policy update all previously evaluated proofs must go through a re-evaluation phase using the 2PV protocol.

A final observation is that in most cases, global consistency proofs are slightly slower than view consistency proofs. This extra latency comes from the additional communication round between the TM and the master policy server to retrieve the latest policy version. Global consistency proofs were faster than view consistency ones only in the few cases when the latest policy happened to match the policy used by all participating servers, and as a result all servers skipped the re-evaluation step of 2PVC.

6 TRADE-OFF DISCUSSION

In this section, we briefly discuss the strengths and weaknesses of each of our protocol variants relative to three metrics: performance, accuracy, and precision.
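The cost model of Eqs. 1 and 2 can be made concrete with a short sketch. The following is an illustration only (the t_s and t_f values are hypothetical placeholders, not measurements from our simulations): Eq. 1 models the schemes that always commit, and Eq. 2 models the abort-and-retry behavior of Incremental Punctual.

```python
# Expected transaction execution time as a function of the
# policy-update probability Pu (Eqs. 1 and 2).

def expected_time_commit(ts: float, tf: float, pu: float) -> float:
    """Eq. 1: Deferred, Punctual, and Continuous proofs always commit;
    with probability pu the slower conflict-recovery path (tf) is taken."""
    return ts * (1 - pu) + tf * pu

def expected_time_retry(ts: float, tf: float, pu: float) -> float:
    """Eq. 2: Incremental Punctual aborts on a policy change (cost tf)
    and is re-executed once to a successful commit (cost ts)."""
    return (tf + ts) * pu + ts * (1 - pu)

if __name__ == "__main__":
    ts, tf = 100.0, 180.0  # hypothetical averages in ms
    for pu in (0.0, 0.25, 0.5, 1.0):
        t1 = expected_time_commit(ts, tf, pu)
        t2 = expected_time_retry(ts, tf, pu)
        assert t2 >= t1  # retrying is never cheaper than recovering in place
        print(f"Pu={pu:.2f}  Eq.1: {t1:.1f} ms  Eq.2: {t2:.1f} ms")
```

At P_u = 0 both expressions reduce to t_s; their difference grows as t_s * P_u, which matches the steeper slope observed for Incremental Punctual in Figs. 3 and 4.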

[Fig. 3: transaction execution time (ms, y-axis) vs. policy update probability (0%–100%, x-axis) under the LAN arrangement; panels (a) Short Transactions (8–15 operations), (b) Medium Transactions (16–30 operations), and (c) Long Transactions (31–50 operations); each panel plots the four proof approaches under view and global consistency, plus the Deferred (2PC only) baseline.]

Fig. 3. Results for LAN experiments

[Fig. 4: transaction execution time (ms, y-axis) vs. policy update probability (0%–100%, x-axis) for the simulated WAN; panels (a) Short Transactions (8–15 operations), (b) Medium Transactions (16–30 operations), and (c) Long Transactions (31–50 operations); each panel plots the four proof approaches under view and global consistency, plus the Deferred (2PC only) baseline.]

Fig. 4. Results for WAN experiments

Since choosing a scheme for data and policy consistency enforcement is a strategic decision that has to consider many trade-offs, we also discuss the impact of application-level requirements on this decision.

6.1 Absolute Comparison

Table 2 presents a comparison between each of our eight approaches for consistency enforcement. In this chart, all approaches are assessed relative to one another using the metrics described above. Note that the rankings used in the chart (high, moderate, low, and poor) are not absolute; rather, each approach is assessed comparatively.

From the table, we can see that scoring "high" on all the metrics is not possible, indicating a trade-off between performance and the other metrics. By design, all our approaches are highly precise, as all participants internally agree on the policy version to be used during the transaction, although the accuracy of this policy can vary. In the general case, view consistency is less accurate than global consistency, since the view of the servers executing the transaction might not account for recent updates to the policy in use. Precision and accuracy together define the security of the system. Deferred proofs are less accurate, and thus less secure, than the other approaches (as policies remain unchecked until the end of a transaction), but have the best performance overall. The Punctual approach has similar performance to Deferred, but is slightly more accurate, since authorizations are at least checked locally throughout the transaction. Incremental Punctual has the worst performance, especially when frequent policy updates occur, but is more accurate and secure than the previous approaches. Finally, Continuous has moderate to low performance, which is the penalty that must be paid for the high accuracy afforded by this approach.

6.2 Impact of Application Requirements

Choosing a consistency enforcement scheme is not something that can be done in isolation, as application requirements may limit the schemes available for use within a particular application.
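The application-driven selection process walked through below can be distilled into a simple rule of thumb. The sketch that follows is our own illustration, not part of the protocols: the requirement flags and the ordering of the rules are a hypothetical distillation of the Table 2 rankings and the two application scenarios.

```python
# Hypothetical helper: pick the cheapest proof scheme whose accuracy
# and read-externalization behavior satisfy the application.

def recommend_scheme(frequent_updates: bool,
                     needs_high_accuracy: bool,
                     reads_externalized: bool) -> str:
    if needs_high_accuracy:
        # Continuous proofs re-validate all prior proofs on every query;
        # global consistency additionally consults the master policy server.
        return "Continuous (global consistency)"
    if reads_externalized:
        # Intermediate reads are exposed to users, so authorizations must
        # hold before each operation, not just at commit time.
        return "Punctual"
    if frequent_updates:
        # Incremental Punctual aborts and re-executes on each policy
        # change; Deferred stays insensitive to the update rate.
        return "Deferred"
    # Updates are rare: Incremental Punctual's extra accuracy costs
    # little, since policy-change aborts will be uncommon.
    return "Incremental Punctual"

print(recommend_scheme(False, False, True))  # EMS-style workload
print(recommend_scheme(True, True, False))   # sales-database-style workload
```

The two example calls mirror the Event Scheduling and Sales Database scenarios discussed below, which recommend Punctual and Continuous (global) proofs, respectively.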

TABLE 2
Contrasting the various proofs of authorization

                                 | Deferred        | Punctual        | Incremental       | Continuous
                                 | View  | Global  | View  | Global  | View     | Global | View     | Global
Performance (infrequent updates) | high  | high    | high  | high    | low      | low    | moderate | moderate
Performance (frequent updates)   | high  | high    | high  | high    | poor     | poor   | low      | low
Precision                        | high  | high    | high  | high    | high     | high   | high     | high
Accuracy                         | poor  | low     | poor  | low     | moderate | high   | moderate | high

We now investigate two cloud-based applications that are representative of larger classes of interesting applications to show how requirements can impact the choice of consistency enforcement scheme. In particular, we consider three orthogonal axes of requirements: code complexity (which is directly related to trusted computing base size), transaction mix (i.e., write-only, read/write with internal reads, and read/write with materialized reads), and policy/credential update frequency.

Application: Event Scheduling. Consider an Event Monitoring Service (EMS) used by a multi-campus university to track events within the university and to allow staff, faculty members, and student organizations to make online event registrations. The university is using a cloud infrastructure to host the various EMS databases and execute the different transactions. Users have varying access privileges that are governed by authorization policies and credentials issued by a university-wide credentialing system. In this system, read requests must be externalized to users during the transaction execution so that intermediate decisions can be made. Furthermore, the university system in general has infrequent policy and credential updates, and requires lower code complexity to minimize code verification overheads.

Recommendation: In this case, the use of Punctual proofs makes the most sense. Note that this approach has low code complexity, performs fast, and is suitable for systems with infrequent updates. Read externalization is also permissible, as policies are checked prior to each operation in the transaction.

Application: Sales Database. This example is derived from the traveling salesperson example in [13]. According to company requirements, a customer's data should only be read by the representatives in the operations region of that customer, while any other data should not be materialized until commit time. The company also experiences very frequent policy and credential updates, as representatives are frequently assigned to different operational regions. The company considers security to be very important, so as to avoid incorrect authorization decisions that might leak customer information. Finally, the company has enough resources to manage complex code, but still requires reasonable execution latency.

Recommendation: This company should use the Continuous global approach for the highest accuracy to avoid any information leakage at runtime, or Continuous view for slightly lower accuracy. This provides a good balance between accuracy and performance, at the cost of higher code complexity.

7 RELATED WORK

Relaxed Consistency Models for the Cloud. Many database solutions have been written for use within the cloud environment, for instance, Amazon's Dynamo database [14], Google's BigTable storage system [15], Facebook's Cassandra [16], and Yahoo!'s PNUTS [17]. The common thread between each of these custom data models is BASE, with a relaxed notion of consistency provided in order to support massively parallel environments.

Such a relaxed consistency model adds a new dimension to the complexity of the design of large-scale applications and introduces a new set of consistency problems [18]. In [19], the authors presented a model that allows queriers to express consistency and concurrency constraints on their queries that can be enforced by the DBMS at runtime. On the other hand, [20] introduces a dynamic consistency rationing mechanism which automatically adapts the level of consistency at runtime. Both of these works focus on data consistency, while our work focuses on attaining both data and policy consistency.

Reliable Outsourcing. Security is considered one of the major obstacles to a wider adoption of cloud computing. Particular attention has been given to client security as it relates to the proper handling of outsourced data. For example, proofs of data possession have been proposed as a means for clients to ensure that service providers actually maintain copies of the data that they are contracted to host [21]. In other works, data replication has been combined with proofs of retrievability to provide users with integrity and consistency guarantees when using cloud storage [22], [23].

To protect user access patterns from a cloud data store, [24] introduces a mechanism by which cloud storage users can issue encrypted reads, writes, and inserts. Further, [25] proposes a mechanism that enables untrusted service providers to support transaction serialization, backup, and recovery with full data confidentiality and correctness. This work is orthogonal to the problem that we focus on in this article, namely consistency problems in policy-based database transactions.

Distributed Transactions. CloudTPS provides full ACID properties with a scalable transaction manager designed for a NoSQL environment [26]. However, CloudTPS is primarily concerned with providing consistency and isolation upon data, without regard to considerations of authorization policies.

There has also been recent work that focuses on providing some level of guarantee about the relationship between data and policies [27]. This work proactively ensures that data stored at a particular site conforms to the policy stored at that site. If the policy is updated, the server will scan the data items and throw out any that would be denied based on the revised policy. It is obvious that this will lead to an eventually consistent state where data and policy conform, but this work only concerns itself with the local consistency of a single node, not with transactions that span multiple nodes.

Distributed Authorization. The consistency of distributed proofs of authorization has previously been studied, though not in a dynamic cloud environment (e.g., [4]). This work highlights the inconsistency issues that can arise in the case where authorization policies are static, but the credentials used to satisfy these policies may be revoked or altered. The authors develop protocols that enable various consistency guarantees to be enforced during the proof construction process to minimize these types of security issues. These consistency guarantees are similar to our notions of safe transactions. However, our work also addresses the case in which policies, in addition to credentials, may be altered or modified during a transaction.

8 CONCLUSIONS

Despite the popularity of cloud services and their wide adoption by enterprises and governments, cloud providers still lack services that guarantee both data and access control policy consistency across multiple data centers. In this article, we identified several consistency problems that can arise during cloud-hosted transaction processing using weak consistency models, particularly if policy-based authorization systems are used to enforce access controls. To this end, we developed a variety of lightweight proof enforcement and consistency models (i.e., Deferred, Punctual, Incremental, and Continuous proofs, with view or global consistency) that can enforce increasingly strong protections with minimal runtime overheads.

We used simulated workloads to experimentally evaluate implementations of our proposed consistency models relative to three core metrics: transaction processing performance, accuracy (i.e., global vs. view consistency and recency of policies used), and precision (level of agreement among transaction participants). We found that high performance comes at a cost: Deferred and Punctual proofs had minimal overheads, but failed to detect certain types of consistency problems. On the other hand, high accuracy models (i.e., Incremental and Continuous) required higher code complexity to implement correctly, and had only moderate performance when compared to the lower accuracy schemes. To better explore the differences between these approaches, we also carried out a trade-off analysis of our schemes to illustrate how application-centric requirements influence the applicability of the eight protocol variants explored in this article.

Acknowledgments. This work was supported in part by the National Science Foundation under awards CCF–0916015, CNS–0964295, CNS–1017229 and IIS–1050301.

REFERENCES

[1] M. Armbrust et al., "Above the clouds: A Berkeley view of cloud computing," University of California, Berkeley, Tech. Rep., Feb. 2009.
[2] S. Das, D. Agrawal, and A. El Abbadi, "ElasTraS: An elastic transactional data store in the cloud," in USENIX HotCloud, 2009.
[3] D. J. Abadi, "Data management in the cloud: Limitations and opportunities," IEEE Data Eng. Bull., Mar. 2009.
[4] A. J. Lee and M. Winslett, "Safety and consistency in policy-based authorization systems," in ACM CCS, 2006.
[5] M. Myers, R. Ankney, A. Malpani, S. Galperin, and C. Adams, "X.509 Internet public key infrastructure online certificate status protocol - OCSP," RFC 2560, Jun. 1999, http://tools.ietf.org/html/rfc2560.
[6] E. Rissanen, "eXtensible access control markup language (XACML) version 3.0," Jan. 2013, http://docs.oasis-open.org/xacml/3.0/xacml-3.0-core-spec-os-en.
[7] D. Cooper et al., "Internet X.509 public key infrastructure certificate and certificate revocation list (CRL) profile," RFC 5280, May 2008, http://tools.ietf.org/html/rfc5280.
[8] J. Li, N. Li, and W. H. Winsborough, "Automated trust negotiation using cryptographic credentials," in ACM CCS, Nov. 2005.
[9] L. Bauer et al., "Distributed proving in access-control systems," in Proc. of the IEEE Symposium on Security and Privacy, May 2005.
[10] J. Li and N. Li, "OACerts: Oblivious attribute certificates," IEEE TDSC, Oct. 2006.
[11] J. Camenisch and A. Lysyanskaya, "An efficient system for non-transferable anonymous credentials with optional anonymity revocation," in EUROCRYPT, 2001.
[12] P. K. Chrysanthis, G. Samaras, and Y. J. Al-Houmaily, "Recovery and performance of atomic commit processing in distributed database systems," in Recovery Mechanisms in Database Systems. PHPTR, 1998.
[13] M. K. Iskander, D. W. Wilkinson, A. J. Lee, and P. K. Chrysanthis, "Enforcing policy and data consistency of cloud transactions," in IEEE ICDCS-SPCC, 2011.
[14] G. DeCandia et al., "Dynamo: Amazon's highly available key-value store," in ACM SOSP, 2007.
[15] F. Chang et al., "Bigtable: A distributed storage system for structured data," in USENIX OSDI, 2006.
[16] A. Lakshman and P. Malik, "Cassandra: A decentralized structured storage system," ACM SIGOPS Oper. Syst. Rev., Apr. 2010.
[17] B. F. Cooper et al., "PNUTS: Yahoo!'s hosted data serving platform," Proc. VLDB Endow., Aug. 2008.
[18] W. Vogels, "Eventually consistent," Commun. ACM, Jan. 2009.
[19] H. Guo, P.-A. Larson, R. Ramakrishnan, and J. Goldstein, "Relaxed currency and consistency: How to say 'good enough' in SQL," in ACM SIGMOD, 2004.
[20] T. Kraska, M. Hentschel, G. Alonso, and D. Kossmann, "Consistency rationing in the cloud: Pay only when it matters," Proc. VLDB Endow., Aug. 2009.
[21] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song, "Provable data possession at untrusted stores," in ACM CCS, 2007.
[22] K. D. Bowers, A. Juels, and A. Oprea, "HAIL: A high-availability and integrity layer for cloud storage," in ACM CCS, 2009.
[23] A. Shraer, C. Cachin, A. Cidon, I. Keidar, Y. Michalevsky, and D. Shaket, "Venus: Verification for untrusted cloud storage," in ACM CCSW, 2010.
[24] P. Williams, R. Sion, and B. Carbunar, "Building castles out of mud: Practical access pattern privacy and correctness on untrusted storage," in ACM CCS, 2008.
[25] P. Williams, R. Sion, and D. Shasha, "The blind stone tablet: Outsourcing durability to untrusted parties," in NDSS, 2009.
[26] Z. Wei, G. Pierre, and C.-H. Chi, "Scalable transactions for web applications in the cloud," in Euro-Par, Aug. 2009.
[27] T. Wobber, T. L. Rodeheffer, and D. B. Terry, "Policy-based access control for weakly consistent replication," in ACM EuroSys, 2010.

Marian K. Iskander is currently pursuing a PhD in computer science at the University of Pittsburgh. She received her BS and MS degrees in computer and information sciences from Ain Shams University, Egypt, in 2002 and 2007, respectively. Her broad research interests focus on security and privacy in distributed systems. She has previous research work on privacy and fault tolerance in wireless sensor networks and is currently working on designing location-privacy-preserving techniques for crowd sensing applications.

Tucker Trainor received his BS in computer science from the University of Pittsburgh in 2012. His interests lie in information security and online privacy. He is currently working as a mobile web developer for Branding Brand, a Pittsburgh-based mobile commerce provider.

Dave W. Wilkinson earned his BS in computer science at the University of Pittsburgh in 2009 and is currently pursuing his PhD. His research relates to distributed operating system design, and he participated in the design and implementation of the XOmB and Djehuty operating systems. His research focus is writing maintainable, socially relevant systems that help people use technology and not the other way around.

Adam J. Lee is an Assistant Professor in the Department of Computer Science at the University of Pittsburgh. He obtained a BS in computer science from Cornell University in 2003, as well as MS and PhD degrees from the University of Illinois at Urbana-Champaign in 2005 and 2008, respectively. His research interests lie at the intersection of the distributed systems, security, and privacy fields, and his recent work has been funded by DARPA and the NSF.

Panos K. Chrysanthis is a Professor of Computer Science and the founding director of the Advanced Data Management Technologies Laboratory at the University of Pittsburgh. His lab has a broad focus on user-centric data management for scalable network-centric and collaborative applications. He is an Adjunct Professor at Carnegie Mellon University and at the University of Cyprus. He is a recipient of the NSF CAREER Award, an ACM Distinguished Scientist, and a Senior Member of the IEEE. He earned his BS degree from the University of Athens, Greece, in 1982, and his MS and PhD degrees from the University of Massachusetts at Amherst in 1986 and 1991, respectively.