Ficklebase: Looking into the Future to Erase the Past

Sumeet Bajaj and Radu Sion
Computer Science, Stony Brook University, Stony Brook, NY, USA
[email protected], [email protected]

Abstract—It has become apparent that in the digital world data once stored is never truly deleted, even when such an expunction is desired either as a normal system function or for regulatory compliance purposes. Forensic analysis techniques are often successful at recovering information said to have been deleted in the past.

Efforts aimed at thwarting such forensic analysis have either focused on (i) identifying the system components where deleted data lingers and performing a secure delete operation over these remnants, or (ii) designing history independent data structures that hide information about the past operations which resulted in the current system state.

Yet, new data is constantly derived by processing existing (input) data, which makes it increasingly difficult to remove all traces of this existing data, e.g., for regulatory compliance purposes. Even after deletion, significant information can linger in, and be recoverable from, the side effects the deleted data records left on the currently available state.

In this paper we address this aspect in the context of a relational database, such that when combined with (i) & (ii), complete erasure of data and its effects can be achieved ("un-traceable deletion"). We introduce Ficklebase, a relational database wherein, once a tuple has been "expired", any and all of its side-effects are removed, thereby eliminating all its traces, rendering it un-recoverable, and also guaranteeing that the deletion itself is undetectable. We present the design and evaluation of Ficklebase, and then discuss several of the fundamental functional implications of un-traceable deletion.

I. INTRODUCTION

The "delete" operation in modern computer systems can at many times be an illusion [65]. Although once deleted, data may no longer be accessible via legitimate system interfaces, numerous instances [29,68,70] have demonstrated that presumably erased data can be recovered with simple mining techniques.

Preserving un-wanted data not only jeopardizes user privacy & confidentiality but can also violate retention policies set forth by legislation such as HIPAA [17], FERPA [8], FISMA [9], the EU Data Protection Directive [7] and the Gramm–Leach–Bliley Act [10]. E.g., the fifth principle of the UK Data Protection Act [20] mandates that information shall not be retained for any longer than its intended purpose.

Prior work has addressed this issue on two fronts: "secure deletion" and "history independent" data structures.

The observation that data artifacts can linger in systems for a significant period after deletion [32,40] gave rise to the requirement of "secure deletion". This is important since numerous system sub-components such as memory [33], storage media [39] and file systems [30] have been shown to preserve deleted data. Applications such as databases hold on to deleted data in transaction logs, error logs, temporary tables, de-allocated data pages, index entries and audit logs [38,69]. These data remnants can later be recovered by employing forensic analysis techniques [3]. In good hands [6,14,53,62,67,71] these techniques are very helpful in incriminating wrong-doings by malicious users; in the wrong hands, however, they pose a grave threat to data & user privacy.

Mechanisms have therefore been designed to identify the system parts where deleted data artifacts linger and subsequently remove them. Solutions have been proposed for general storage media [39,45,54,72], file systems [24,56] and applications [69]. Also, off-the-shelf tools can now be used to perform such a secure data erase [5].

"History independent" data structures [48] (also referred to as "uniquely represented" data structures) have the property that their storage layout is a function of only the current state and not of the history of past operations that led to it. Such data structures reveal no additional information to an adversary beyond what can be inferred anyway via legitimate interfaces. If a delete operation is part of a system interface, then utilizing a "history independent" data structure ensures that an adversary subsequently gaining access to the system storage is unable to infer whether the delete operation was performed at all. This is critical since traditional data structures (e.g., B-Tree indexes) preserve, in their current state layout, information about past operations. "History independent" variants have been developed for Hash Tables [61], 2-3 Trees [59], B-Trees [42] and Skip-Lists [43].

A third, largely ignored aspect in preventing erased data from being recovered concerns its relationship to the current state (the data present in the system now). The main observation here is that side-effects of deleted data persist within the current state; these can then be exploited to derive information about the deleted data items, in direct violation of regulation and of the intent behind deleting the data in the first place.

For true regulatory compliance, "secure deletion" and "history independent" data structures are thus not sufficient by themselves, but rather need to be augmented with mechanisms for the full erasure of post-deletion data side-effects. We term this "un-traceable deletion"¹ (Section II).

In the following we introduce Ficklebase, a relational database with fully un-traceable deletion guarantees.

¹ To differentiate from a "secure deletion" performed by overwriting.

II. MOTIVATION AND CONCEPTS

The numerous regulatory compliant data expiration mandates derive from real-life privacy concerns in today's increasingly digital societies.

To illustrate how deleted/expired data can leave side-effects behind, consider a snippet from a hypothetical intelligence agency database (Figure 1(a)). As a simple example, suppose that once agent Sarah leaves the agency all evidence of her existence in the database needs to be eradicated. This would mean deletion of her agent information tuples from the Agent relation, travel information from the Travel relation, and mission assignments (Mission). In addition, the Avg(Rating) attribute in Mission would need to be recomputed for each mission the agent was assigned to. Note that a simple rollback of the transactions that computed the Avg(Rating) will not suffice; they need to be re-executed to compute the new correct values (Figure 1(b)). If such a re-computation is not performed, any adversary that gains access to the database in the future can infer that a deletion took place, as well as potentially significant additional information about the deleted agent, by looking at properties of the tuples and indexes of the Mission and Agent relations.

Fig. 1. (a) Intelligence Agency Database Snippet (b) Rollback vs Transaction Re-Execution. [The snippet comprises the relations Agent (name, alias, location, speciality, rating), Mission (code, department, Avg(Rating)) with per-mission agent assignments, and Travel (agent, source, destination, mission, date); panel (b) contrasts rolling back the assignment transactions with re-executing them after agent "Chopper" is deleted, yielding different Avg(Rating) values.]

At first glance it may appear that such a deletion can simply be performed by application logic. However, it quickly becomes apparent that the deletion of all data linked to the agent gets complicated, e.g., in the case of a transaction that uses agent information and mission data to generate new travel assignments for others. Also, implementing such a deletion in application logic (e.g. by using pre-defined Compensating Transactions [34,55]) requires detailed semantic knowledge of all operations. Finally, this would also significantly increase the burden on database application developers. Ideally, the removal of all traces of the deleted agent from the agency records should be supported transparently by the underlying database. Ficklebase achieves exactly this.
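For concreteness, the Avg(Rating) side-effect can be phrased in SQL over a hypothetical rendering of Figure 1(a)'s schema (the table and column definitions below are our illustration, not the paper's):

    -- Hypothetical rendering of Figure 1(a), for illustration only.
    CREATE TABLE AGENT      (alias VARCHAR(16) PRIMARY KEY, rating DECIMAL(3,1));
    CREATE TABLE MISSION    (code CHAR(5) PRIMARY KEY, avg_rating DECIMAL(4,2));
    CREATE TABLE ASSIGNMENT (mission CHAR(5), agent VARCHAR(16));

    -- A "new assignment" transaction: reading the agent's rating in order
    -- to update MISSION.avg_rating makes the stored average a side-effect
    -- of the AGENT tuple.
    START TRANSACTION;
    INSERT INTO ASSIGNMENT VALUES ('MI832', 'Chopper');
    UPDATE MISSION
       SET avg_rating = (SELECT AVG(a.rating)
                           FROM AGENT a JOIN ASSIGNMENT s ON s.agent = a.alias
                          WHERE s.mission = 'MI832')
     WHERE code = 'MI832';
    COMMIT;

    -- Deleting 'Chopper' from AGENT and ASSIGNMENT afterwards still leaves
    -- avg_rating computed over a set that included him; erasing that trace
    -- requires re-executing the UPDATE, not merely rolling it back.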
A. Concepts

To generalize, consider the following notation for a transaction T_j in a relational database: T_j(R) → (M), where R represents the read operations performed by T_j while M indicates the data modification (update & insert) operations of T_j (|R| ≥ 0, |M| ≥ 0). Let r_ti, u_ti and i_ti denote the read, update and insert operation, respectively, of a tuple t_i. Also, TS(o_i) denotes the timestamp of the transaction that performs operation o_i.

Now suppose that the following transactions have been executed & committed (in sequence): T_1() → (i_t1), T_2(r_t1) → (i_t2), T_3(r_t2) → (u_t5, u_t6), T_4(r_t5, r_t4) → (u_t7, i_t10).

Let us first determine the tuple side-effects in the above execution. Transaction T_2 read tuple t_1 and inserted t_2. Hence the insertion of t_2 is a side-effect of t_1. Similarly, transaction T_3 read tuple t_2 and updated tuples t_5 and t_6. Hence the updates to t_5 and t_6 are side-effects of t_2. In addition, T_4 read tuple t_5 updated by T_3. Hence the modifications made by T_4, i.e. the update of t_7 and the insertion of t_10, are also side-effects of t_2, and so on. Overall, the side-effects (SA) are as follows:

SA(t_1) = (i_t2 by T_2; u_t5, u_t6 by T_3; u_t7, i_t10 by T_4)
SA(t_2) = (u_t5, u_t6 by T_3; u_t7, i_t10 by T_4)
SA(t_5) = SA(t_4) = (u_t7, i_t10 by T_4).

It is to be noted from the above illustration that data side-effects go beyond simple primary-key/foreign-key relationships. In fact, any data that is read & then results in the modification of other data constitutes a side-effect and needs to be hidden after deletion (of the read data item).

Now, consider the case when tuple t_2 expires and is to be deleted. An un-traceable delete of t_2 should leave the database in a state such that no trace of t_2 is left behind, not even its effects on other data. This requires the following: (1) rollback of transactions T_4 & T_3; (2) deletion of t_2 (or rollback of T_2); (3) re-execution of transactions T_3 & T_4 (this is necessary as illustrated by the average example of Figure 1).

This results in the execution schedule T_1 T_3 T_4. Any database that performs operations equivalent to the above steps, and thus achieves a schedule in which the transaction that inserted t_2 never took place, achieves un-traceable deletion of t_2 (proofs in Section V)². We define side-effects and un-traceable deletion in the following.

² Secure deletion & history independence would still be required to truly erase t_2 (Section IV-E).

Definition 1: The side-effects of a tuple t_i (SA(t_i)) are represented by the set of all data modification (update and insert) operations such that
1) if ∃ T_j(R_j) → (M_j) s.t. r_ti ∈ R_j and TS(T_j) > TS(i_ti), then ∀ o_tm ∈ M_j, o_tm ∈ SA(t_i);
2) ∀ o_tm ∈ SA(t_i), if ∃ T_j(R_j) → (M_j) s.t. r_tm ∈ R_j and TS(T_j) > TS(o_tm), then ∀ o_tn ∈ M_j, o_tn ∈ SA(t_i).

Note that this definition is recursive but not circular. There are two reasons for this: (a) under a fully serializable mode of execution there exists a serial schedule (i.e. sequential, with no overlap in time) of the database transactions; (b) SA(t_i) includes database operations and not the tuples themselves. Consider the sequence T_1(r_t1) → (u_t2), T_2(r_t2) → (u_t1). Although it may appear circular at first glance, the operations u_t1 and u_t2 are distinct, thereby SA(t_1) = {u_t2} and SA(t_2) = {u_t1}.

Definition 2: Un-Traceable Delete. Let the current database state be achieved by the serialized transaction execution sequence Γ_S = ... T_{j-2} T_{j-1} T_j T_{j+1} T_{j+2} ..., where tuple t_i was inserted by transaction T_j. Then an un-traceable delete of t_i is a (set of) operation(s) that changes the current database state into a state computationally indistinguishable³ from the state resulting from the execution sequence Γ_E = ... T_{j-2} T_{j-1} T_j′ T_{j+1} T_{j+2} ..., where T_j′ = φ or T_j′ = T_j − i_ti.

T_j′ = φ when the application logic dictates that non-insertion of t_i means complete rollback of transaction T_j; in this case Γ_E = Γ_S − T_j. Otherwise, T_j′ = T_j − i_ti, e.g. when T_j inserts t_i using an insert-select query.

³ No non-uniform probabilistic polynomial time algorithm exists that can distinguish between them [52]. Ficklebase in fact offers stronger information theoretic guarantees, but we formulate this definition in terms of computational adversaries to allow for the deployment of cryptography in the underlying data structures and mechanisms.

B. Applications

It is important to note that such an un-traceable delete operation is very often not desirable, especially in scenarios involving data with real-life artifacts such as cash. E.g., consider a banking application that records money transfers between clients. If a client A has transfers recorded with another client B, then the deletion of client A (when A closes its account) does not justify the deletion of the A ↔ B transfers and their side-effects, since these "side-effects" are in fact the cash that now belongs to B!

On the other hand, consider a privacy sensitive application that maintains confidential documents, records document accesses by its users, and generates statistical or cross-document intelligence information. Once a document D is to be purged, it is important to properly erase all associated access records & intelligence information deduced from D, lest this reveal its existence as well as leak information from therein.

A third category of applications where an equivalent of the un-traceable delete operation is desired is not privacy- but rather functionality-centric: economic data such as the Current Population Survey (CPS) [4] are permitted to undergo revisions. A simple case for revision could be that an individual I is wrongly classified, which means the deletion of I's information from the data set and of its effects on computed statistics (e.g. average earnings).

C. Discussion

A database providing un-traceable deletion will in certain aspects function differently than a traditional database without it. Here we discuss some of these differences.

Time-sensitive Queries. If a query running over "past" (previously generated) data is repeated, then the intuition is that the database responses should be unchanged, since the past has already occurred (e.g. the order is shipped, the patient is discharged etc). However, the un-traceable deletion of one or more tuples that comprised the result set of such a query could result in a different response (for the same query) at a later time!

E.g., consider the query "find the number of distinct agents that travelled on date dt" on the sample database from Figure 1, which in SQL is:

    SELECT COUNT(DISTINCT AGENT) FROM TRAVEL WHERE DATE = dt

If a given agent was made un-traceable on date de, where de > dt, then the responses of the above query will be different on two dates d1 and d2 (dt < d1 < de < d2), although the expected answer for the query on both dates d1 and d2 would be the same (in a traditional database).

Committed Transactions. Traditionally, once committed, transactions are treated as permanent and irreversible. However, with un-traceable deletion this is no longer the case. In fact, for a database to support un-traceable deletion it must employ mechanisms to change the effects of transactions committed in the past. This is not only required, but is also the most challenging requirement to meet.

External Application Logic. Un-traceable deletion cannot easily be applied transparently for databases that are agnostic to the application logic semantics, e.g., when most of the business logic or functionality resides in application programs which in turn access the database externally via a standard SQL interface. The reason for this is straightforward: for untraceability, the database must be able to re-execute transactions. This means that the database must have access to, and understand, all application logic. E.g., in Figure 1, if the database did not know that the Rating statistic was computed as an average, it would not be able to correctly remove the effects of the deleted agent Chopper. While often this can be alleviated by moving as much as possible from the application logic into the data layer, for fully external logic things can get complicated.

Delete vs Un-Traceable Delete. Note that the transaction notation of Section II-A and the definition of un-traceable delete (Definition 2) do not include traditional delete operations (d_i). This is done for simplicity and also to differentiate traditional delete operations (delete queries) from un-traceable deletion. Traditional delete queries are nevertheless required in practice and fully supported.

III. MODEL

Adversary. We assume an adversary with full access to today's database. She can employ any mining or forensic techniques and wishes to recover information about any tuples deleted in the past.

Suppose that a tuple t_i expires and is made un-traceable at time E_t. Let D_ct denote the database state at time c_t. The goal of un-traceable deletion is then to prevent the adversary from recovering any information (via side-effects) about t_i (including its existence), while having full access to any (or all) of the database states D_cj, where c_j > E_t. Note that the case where the adversary gains access to any two database states D_cm and D_cn where t_i ∈ D_cm and c_m < E_t is excluded, since in D_cm the not-yet-expired tuple t_i is still legitimately visible.

Tuples are assigned expiration times and expire at fixed intervals of time, determined by an expiration policy (e.g. daily, monthly, quarterly or yearly; Section IV-B).

IV. FICKLEBASE

A. Overview

Ficklebase maintains, in addition to the current version, a number of future versions of the database in which the expired tuples are not supposed to exist. Ongoing transactions are then applied to all these (current and future) versions. This in effect constructs directly from the un-traceable delete definition (Section II), in which if all transactions are applied to a database instance except for the insertion of tuple t_j, then t_j has undergone an un-traceable delete in that instance.

This approach avoids two key problems: (i) keeping track of all system-wide side-effects, and (ii) retroactive rollbacks of committed transactions & their re-execution.

To exemplify, recall from Section III that tuples expire at fixed intervals of time (based on policy). We denote the end time of each such interval as E_i. For each time interval range E_{i-1} < E_t ≤ E_i a version V_i is said to exist (i ≥ 1). A version V_i is expired (utilizing a secure delete operation) at its expiration time E_x(V_i) (Section IV-E).

To illustrate, at any time E_t, E_0 < E_t < E_1, versions V_1, V_2, ..., V_n exist, with V_1 being the current version visible to clients (Figure 2). Once the current time approaches E_1, version V_1 is deleted, V_2 becomes V_1, V_3 becomes V_2 and so on. Also, E_x(V_1) ← E_2, E_x(V_2) ← E_3 and so on. Finally, a new version V_{n+1} is created when a tuple with expiration time > E_n is inserted by any client transaction T_j.

Fig. 2. Version maintenance & expiration with progression of time. The maintenance of future versions, transaction application and expiration are entirely achieved using versioning and runtime query re-writing.

Components. Figure 3(a) illustrates the main Ficklebase components. The main query re-writing logic resides in the Ficklebase proxy. The proxy intercepts all client queries and communicates with the server on behalf of the clients. The database server is an off-the-shelf DBMS.

Fig. 3. Overview of (a) Architecture (b) Query Re-writing; Q_i = Query_i, V_j = Version_j. [(a): clients connect to the proxy, which rewrites their queries and submits them to the database server. (b): an original transaction {BEGIN, Q_1, ..., Q_m, COMMIT} is rewritten into queries for V_1, queries for V_2, ..., queries for V_n, followed by consolidation queries, executed between the original BEGIN and COMMIT.]

Execution Model. A client transaction T_j is a set of serializable SQL statements T_j = {begin, Q_1, Q_2, Q_3, ..., Q_m, commit}, where each Q_j is a DDL (create/drop), DML (insert/update/delete) or select query.

B. Versioning

All versions are maintained within a single database instance. To limit storage overheads, tuple copies are combined, i.e. if tuple attributes have the same value across multiple versions then only a single copy of the tuple is maintained for all such versions.

A special VERSION attribute is transparently added to each relation by query rewriting (Section IV-C) and is not visible to clients. The VERSION attribute is a bit field of size β_v, wherein a set bit b_i (0 < i ≤ β_v) indicates that the tuple copy is valid in version V_i.

Also, tuples are copied "on write" only, when an update modifies an attribute value causing it to differ between versions. The version attribute is then automatically modified by query rewrites to indicate distinct tuple versions.

As an example, consider the following tuple t_j with k attributes: t_j = {VERSION=0000..11, ATTR1=value1, ATTR2=value2, ..., ATTRk=valuek}. The VERSION attribute has bits b_1 & b_2 set, indicating that the same tuple copy is valid in both versions V_1 & V_2, i.e. the expiration time of t_j is ≤ E_x(V_2). Now, suppose that an update query being applied to V_2 modifies ATTR1 from value1 to value1′. Then a new copy of t_j is created such that
t_j = {VERSION=0000..01, ATTR1=value1, ATTR2=value2, ..., ATTRk=valuek} and
t_j′ = {VERSION=0000..10, ATTR1=value1′, ATTR2=value2, ..., ATTRk=valuek}.
The version fields of the original and copied tuples are updated (by query rewrites, Figure 5(b)) to correctly maintain distinct version copies.

The mapping from a tuple's expiration date to its VERSION bits is computed by the function EXDT2VER (Algorithm 1), invoked once per version, according to the configured expiration policy (daily, monthly, quarterly or yearly); if an application requires a different expiration functionality (e.g. hourly) the necessary modifications should be made.

Algorithm 1 EXDT2VER
Input: sver INT, exdt DATE, policy INT
Output: version BIT(β_v)
 1: ver ← 0
 2: ever ← 0
 3: switch (policy)
 4:   case 1: // daily
 5:     ever = datediff(exdt, curdate()) + 1
 6:   case 2: // monthly
 7:     ever = period_diff(extract(year_month from exdt), extract(year_month from curdate())) + 1
 8:   case 3: // quarterly
 9:     ever = (period_diff(extract(year_month from exdt), extract(year_month from curdate())) div 3) + 1
10:   case 4: // yearly
11:     ever = year(exdt) - year(curdate()) + 1
12: end switch
13: if ever ≥ sver then
14:   ver ← 2^(sver−1)
15: end if
16: return ver
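Since the pseudocode above is already phrased in terms of MySQL date functions, Algorithm 1 can be rendered almost verbatim as a stored function. The packaging below is our sketch (the paper invokes EXDT2VER from the rewritten insert queries of Figure 5(a) without fixing where the function resides), and it simplifies the BIT(β_v) field to a 64-bit integer:

    DELIMITER //
    CREATE FUNCTION EXDT2VER(sver INT, exdt DATE, policy INT)
    RETURNS BIGINT UNSIGNED NOT DETERMINISTIC NO SQL
    BEGIN
      DECLARE ever INT DEFAULT 0;
      CASE policy
        WHEN 1 THEN  -- daily expiration intervals
          SET ever = DATEDIFF(exdt, CURDATE()) + 1;
        WHEN 2 THEN  -- monthly
          SET ever = PERIOD_DIFF(EXTRACT(YEAR_MONTH FROM exdt),
                                 EXTRACT(YEAR_MONTH FROM CURDATE())) + 1;
        WHEN 3 THEN  -- quarterly
          SET ever = (PERIOD_DIFF(EXTRACT(YEAR_MONTH FROM exdt),
                                  EXTRACT(YEAR_MONTH FROM CURDATE())) DIV 3) + 1;
        WHEN 4 THEN  -- yearly
          SET ever = YEAR(exdt) - YEAR(CURDATE()) + 1;
        ELSE SET ever = 0;
      END CASE;
      -- the tuple is valid in version sver iff it expires at version sver or later
      IF ever >= sver THEN
        RETURN 1 << (sver - 1);  -- set bit b_sver of the VERSION field
      END IF;
      RETURN 0;
    END//
    DELIMITER ;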
C. Query Rewriting

Ficklebase relies heavily on query re-writing within the proxy. Each client query is transformed into a set of queries, each of which is then classified as a version specific or a consolidation query. Figure 3(b) gives an overview of query re-writing along the order of query submission by the client and the order of execution after re-writing. BEGIN and COMMIT statements are executed as-is at the start and end of the re-written transaction.

Version specific queries only affect the version that they are applicable to, while consolidation queries ensure compact storage by combining tuple copies across versions.

Figures 4(a)-5(b) detail the query re-writing operations. We briefly discuss them in the following.

DDL (Create/Drop) statements [Figure 4(a)]. DDL statements are re-written in order to (1) transparently add the VERSION attribute to the relation being created (the VERSION attribute is also added, as the terminal field, on any indexes), and (2) create and drop the Version Specific views which are later used in rewriting select & DML (insert/update) statements.

Select statements [Figure 4(b)]. Select statements are similarly re-written to replace table references with the corresponding Version Specific views; only the results obtained on the current version V_1 are returned to the client.
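Putting the pieces together, the shape of a rewritten DML transaction (Figure 3(b)) for n = 2 active versions looks roughly as follows; the view names AGENT_V1/AGENT_V2 are illustrative, and of the consolidation queries we show only the zero-version purge that Section V's proof relies on:

    -- Client submits:
    BEGIN;
    UPDATE AGENT SET location = 'Warsaw' WHERE alias = 'Hawk';
    COMMIT;

    -- Proxy executes instead (sketch):
    BEGIN;
    UPDATE AGENT_V1 SET location = 'Warsaw' WHERE alias = 'Hawk';  -- version specific, V1
    UPDATE AGENT_V2 SET location = 'Warsaw' WHERE alias = 'Hawk';  -- version specific, V2
    -- consolidation: merge tuple copies left identical across versions and
    -- purge tuples inserted with an all-zero VERSION field:
    DELETE FROM AGENT WHERE VERSION = 0;
    COMMIT;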

Fig. 4. Query Rewrites for (a) DDL (Create & Drop) statements. (b) Select statement. [Each client query is expanded by the proxy into queries targeting versions V_1 to V_n.]
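As a sketch of what the DDL rewriting of Figure 4(a) amounts to (the view naming, the BIGINT simplification of BIT(β_v), and the projection of VERSION out of the views are our illustration):

    -- Client issues:
    CREATE TABLE AGENT (alias VARCHAR(16), rating DECIMAL(3,1),
                        PRIMARY KEY (alias));

    -- Proxy executes (sketch): VERSION is added transparently, appended
    -- as the terminal primary-key field, and one version specific view
    -- is maintained per active version.
    CREATE TABLE AGENT (alias VARCHAR(16), rating DECIMAL(3,1),
                        VERSION BIGINT UNSIGNED NOT NULL,
                        PRIMARY KEY (alias, VERSION));
    CREATE VIEW AGENT_V1 AS
      SELECT alias, rating FROM AGENT WHERE VERSION & (1 << 0) <> 0;
    CREATE VIEW AGENT_V2 AS
      SELECT alias, rating FROM AGENT WHERE VERSION & (1 << 1) <> 0;
    -- ... AGENT_V3 .. AGENT_Vn; views are dropped/recreated as versions
    -- expire; inserts additionally route the expiration date through
    -- EXDT2VER to populate VERSION.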

The case where a transaction is read-only (i.e. comprises only select statements, e.g. a reporting application) is handled differently. In this case all queries within the transaction are applied only to the current version V_1 (and the results of each statement returned to the client). This is sufficient since read-only transactions do not modify any tuples, and hence do not generate side-effects.

DML (insert/update) statements [Figures 5(a), 5(b)]. Similar to select, DML statements are also re-written to replace table references with Version Specific views. DML statements also create new tuples (or modify existing tuples in the case of updates), thereby bringing additional tuple copies into existence. These extraneous copies are combined together by consolidation queries, which are generated for each relation in which either a tuple is inserted or modified by the client transaction. Also, similar to select, only the results of statements executed on the current version V_1 are seen by clients.

D. Version Specific Rollbacks

It is often desired by application logic that transactions be rolled back under certain conditions, e.g. tuple not found. Note that these are not error/failure conditions such as duplicate key or deadlock (in which case the transaction is implicitly rolled back by the DBMS) but rather a part of application functionality. E.g. consider the following two queries submitted as part of a client transaction:

    SELECT @d_next_o_id := d_next_o_id, d_tax
    FROM DISTRICT WHERE d_id = 1 AND d_w_id = 1

    ROLLBACK(ISNULL(@d_next_o_id))

Here, the client application desires that if no tuple is selected by the first select query then the transaction be rolled back. The ROLLBACK syntax is specially provided by Ficklebase for this purpose. Recall from Section II that it is essential for the database to possess the entire application logic; the ROLLBACK shown in this example enables the specification of such conditions within transaction queries.

Now, it is entirely possible that, when being applied to distinct versions, a rollback may occur for certain versions but not for others. Query rewriting in Ficklebase handles this at different levels: (1) the client ROLLBACK statement is also re-written for each version, including the creation of separate copies (for each version) of user-defined variables (like @d_next_o_id in the example above); (2) a savepoint is created on the database before the execution of the queries for each version.

When a rollback occurs (i.e. the condition in the ROLLBACK statement evaluates to true) for any version, the Ficklebase proxy issues a SQL ROLLBACK statement to the database, rolling back the transaction up to the previous savepoint. This undoes the effects of all queries executed on that specific version. The effects of queries on other versions remain intact.

This is not unlike the case of nested transactions [64] with a distinct sub-transaction for each version. Individual sub-transactions can be rolled back without affecting the parent transaction.
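The per-version rollback machinery maps onto standard SQL savepoints; a sketch of what the proxy could issue for the DISTRICT example above, with per-version variable copies (savepoint and variable names are ours):

    BEGIN;
    SAVEPOINT sp_v1;
    SELECT @d_next_o_id_v1 := d_next_o_id, d_tax
      FROM DISTRICT_V1 WHERE d_id = 1 AND d_w_id = 1;
    -- proxy evaluates the rewritten condition for V1; if
    -- ISNULL(@d_next_o_id_v1) it issues:
    --   ROLLBACK TO SAVEPOINT sp_v1;
    -- undoing V1's queries while leaving other versions intact.
    SAVEPOINT sp_v2;
    SELECT @d_next_o_id_v2 := d_next_o_id, d_tax
      FROM DISTRICT_V2 WHERE d_id = 1 AND d_w_id = 1;
    -- ... and so on for each version, then consolidation
    COMMIT;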


Fig. 5. Query Rewrites for (a) Insert statement. (b) Update statement.

E. Expiration

As illustrated in Figure 2, when the current time E_t approaches E_x(V_1), version V_1 expires. Expiration involves the following: (1) the physical deletion of all tuples that were valid only until version V_1, i.e. tuples with expiration time ≤ E_t = E_x(V_1); (2) setting the next version V_2 as the current version V_1 visible to clients, i.e. V_i ← V_{i+1}, i ≥ 1, and thus E_x(V_i) ← E_x(V_{i+1}).

(1) + (2) are achieved by a scheduled task that executes (at the expiration interval E_x(V_1)) a transaction comprising the following queries for each relation R_i:

    UPDATE Ri SET VERSION = VERSION >> 1

    DELETE FROM Ri WHERE VERSION = 0

The net effect of the above queries is the deletion of all tuples that were valid only in version V_1 and not in any other version V_i, i > 1.

Secure Deletion. However, the execution of the above two queries is not sufficient to physically delete all tuples from the expired version, for two reasons: (1) many database systems do not physically delete (at the time of execution of the delete statement) but rather mark tuples for deletion at a later time; this limitation of many popular database systems has been analyzed in [69]. (2) Other database components such as the transaction log, temporary files etc. may still reveal deleted content.

Thus deleted tuples will remain in existence even after expiration. To avoid such leakage of deleted content we suggest the adoption of the secure deletion mechanisms of [69], wherein (1) during the execution of a delete statement the space where the tuple content resides is overwritten (with zeros), and (2) individual log records in the transaction log are encrypted to avoid revealing any deleted tuples. We refer the reader to [69] for details.

History Independence. Data structures such as B-Trees are commonly used by the underlying database storage engines. The storage layout of B-Trees (or variations such as B+-Trees) is often a function of the (sequence of) operations performed on them, due to their deterministic insertions & deletions. Thus, even after un-traceable delete & secure deletion (as above), an adversary gaining access to the database storage that utilizes these structures can potentially (by analyzing their layouts) still reveal deleted content. We note that although such deductions are difficult in practice [53,69], they are nonetheless possible, and in certain very specific cases trivial [53] (e.g. an index based on incrementing values). For B+-Trees in particular the amount of information regarding past operations decreases as the fan-out increases, and fan-outs in typical usages are usually large. Ideally a fan-out of N (N is the total number of tuples) stores all index values sorted in a single root node and is completely history independent, but this is not practical since it affects performance.

Such leakages from the storage layouts of data structures can be prevented using one of the two approaches below: (1) the adoption of history independent versions of storage data structures such as B-Treaps [42] or B-SkipLists [43]; (2) the re-creation of indexes on expiration.

Although ready-to-use, well evaluated implementations of history independent B-Trees are not yet easily available, (1) is nonetheless a promising direction to pursue.

(2) is a rather simple technique to employ and can be achieved by executing the following queries (or equivalent operations) for each relation R_i:

    CREATE TABLE Ri_tmp LIKE Ri

    INSERT INTO Ri_tmp SELECT * FROM Ri ORDER BY Ai

    DROP TABLE Ri

    RENAME TABLE Ri_tmp TO Ri

where A_i is a set of (any) attributes of R_i such that |A_i| ≥ 1. The indices of the newly created relation will not have had any delete operations performed on them, and hence the effects of delete operations will not be evident in their layouts.

Thus, as suggested earlier, a combination of secure deletion & history independence plays a crucial role (along with Ficklebase) in ensuring true erasure of deleted content.

Having adopted the above solutions, the final component to tackle would be the underlying file system (if the database is not deployed using raw disks). A file system has (also deterministic) mechanisms to allocate free pages to the DBMS on request (e.g. when a B+-Tree node is full and requires splitting). Hence, not unlike the storage data structures, this also warrants the use of history independent file systems. Although research on history independent data structures clearly points out their applicability to file systems, providing usable implementations needs further investigation. An alternative approach (until the easy availability of such file systems) is to use a separate disk partition when re-creating the index. The new partition will have no imprints of prior file system operations.
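The scheduled expiration task could, for instance, be realized as a database event; the sketch below shows one relation R1 under a daily policy. Note that this is our assumption about the scheduling mechanism: the MySQL 5.0 server used in Section VII predates the event scheduler, so the task would there have to be driven externally (the two queries themselves are exactly those given above).

    DELIMITER //
    CREATE EVENT expire_current_version
    ON SCHEDULE EVERY 1 DAY
    DO
    BEGIN
      -- shift the version bit field: V2 becomes V1, V3 becomes V2, ...
      UPDATE R1 SET VERSION = VERSION >> 1;
      -- tuples valid only in the expired V1 now have an all-zero VERSION
      DELETE FROM R1 WHERE VERSION = 0;
    END//
    DELIMITER ;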
F. Storage Analysis

Suppose the database comprises N tuples and the number of active versions is n. Then in the worst case every tuple has a distinct copy in each version V_i, 1 ≤ i ≤ n, giving an overall storage requirement of N · n. In the best case each tuple has the same attribute values for all of its versions and only N storage is required.

Now, suppose that each tuple is equally likely to expire at any E_i, 1 ≤ i ≤ n, and that it has distinct copies for each of its versions V_j s.t. E_x(V_j) ≤ E_i. Then the storage requirement is N · (n+1)/2.

If we further assume a random distribution of client queries such that each tuple expiring at E_i, 1 ≤ i ≤ n, is equally likely to have j copies (1 ≤ j ≤ i), then the average storage requirement is N · (n+3)/4.

This gives an overall storage complexity of O(N · n).
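For completeness, both averages follow from short uniform-expectation calculations under the stated assumptions (expiration version i uniform over 1..n; for the second figure, the number of copies j additionally uniform over 1..i):

\[
\mathrm{E}[\text{copies}] \;=\; \frac{1}{n}\sum_{i=1}^{n} i \;=\; \frac{n+1}{2}
\;\;\Rightarrow\;\; N \cdot \frac{n+1}{2},
\]
\[
\mathrm{E}[\text{copies}] \;=\; \frac{1}{n}\sum_{i=1}^{n} \frac{1}{i}\sum_{j=1}^{i} j
\;=\; \frac{1}{n}\sum_{i=1}^{n} \frac{i+1}{2} \;=\; \frac{n+3}{4}
\;\;\Rightarrow\;\; N \cdot \frac{n+3}{4}.
\]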
V. UNTRACEABILITY

We now show that query rewriting and versioning indeed achieve un-traceable deletion as defined (Section II).

Let Γ_S denote the set of all transactions submitted by the client until time ℓ. Let tuple t_k with expiration time E_t > ℓ be inserted by a transaction T_j, where T_j ∈ Γ_S. The expiration of tuple t_k will coincide with the expiration of some version V_j, i.e. E_x(t_k) = E_x(V_j) = E_t (as per the expiration model in Section III). Let Γ_E^{Vi} denote the set of transactions successfully executed (without being rolled back) on version V_i until time ℓ.

Then, Ficklebase guarantees that each version does not see any tuples that should already have been expired by the expiration time of the previous version. More formally:

Theorem 1: ∀ V_i, T_j ∈ Γ_E^{Vi} iff E_x(t_k) ≥ E_x(V_i) ("un-traceable deletion of t_k").⁶

Proof (summarized): This follows naturally from the construction of query rewriting. We enumerate all transactions that "touch" V_i. First, for any incoming transaction, the function EXDT2VER (Algorithm 1, invoked by the rewritten insert queries of Figure 5(a)) determines, for each specific version V_i, whether t_k is valid in (and should touch) it, i.e. whether E_x(t_k) ≥ E_x(V_i). If not, t_k is inserted with a zero-valued VERSION attribute such that t_k is not visible to any subsequent query on V_i; in the end, all tuples with zero VERSION attributes are deleted by the consolidation queries before the transaction commits (Figure 5(a)). Second, if the transaction rolls back on V_i then the rollback mechanism of Section IV-D ensures its queries have no effect on V_i. Third, when the current time is > E_t, all versions V_j where E_x(V_j) ≤ E_t will expire and be securely deleted (Section IV-E). The only versions left would have expiration time > E_t.

⁶ This can also be written as Γ_E^{Vi} = Γ_S − T_j when E_x(V_i) > E_x(t_k).

VI. DISCUSSION

Forensic Analysis. At first glance it may seem that un-traceable deletion rules out additional security mechanisms such as the maintenance of audit logs to detect/prevent data tampering, since any additional recorded data opens up another avenue through which deleted content can be leaked. However, this is not the case. The only aspect that must be considered while recording any such information is making it non-readable to the adversary (e.g. by using encryption), as in [67].

Future Work. In Section II we laid down the requirement that for un-traceable deletion all application logic should be in the possession of the database. Ficklebase requires all application logic to be written as database queries. Another direction via which this can be achieved is by utilizing stored procedures [15]. Under this scenario the re-writing of SQL queries is insufficient; instead, mechanisms are needed for the execution of arbitrary procedural code in the context of versioning.

We note that although Ficklebase's approach of query re-writing makes it independent of the underlying DBMS, it limits its performance in scenarios where a large number of versions needs to be maintained. Hence, for greater performance we are looking into (1) moving parts of the Ficklebase functionality into the DBMS by modifying the storage engine, and (2) designing new data structures that efficiently allow the maintenance & expiration of versions.

Finally, it is important to continue to integrate Ficklebase with the work on history independent file systems to ensure cross-layer, end-to-end assurances.

Limitations. The current implementation does not provide support for re-writing user defined triggers and custom views. We plan to implement these features in the future.

VII. EXPERIMENTS

Benchmark. We evaluate the performance of Ficklebase using the TPC-C benchmark [19]. The benchmark data is set up with 16 warehouses, giving a total database size (on disk) of 1.5 GB for each run. The database buffer pool size is 200 MB. In the initial versioned database, tuples in the relations oorder, order_line and new_order are given random expiration times. The tuples in other relations have fixed maximum expiration times. New tuples inserted during the benchmark transactions are also given random expiration times.

Setup. The database server runs on an Intel Xeon 3.4 GHz, 4 GB RAM Linux box (kernel 2.6.18). The server DBMS is off-the-shelf MySQL version 14.12 Distrib 5.0.45. The client system is an Ubuntu VM running on an Intel Core i5 at 1.60 GHz with 2 GB RAM. The Ficklebase proxy is implemented in Lua [11] and runs within the mysql-proxy [12] component, version 0.8.2. To simulate the TPC-C clients we use the BenchmarkSQL tool [1], modified so that all TPC-C logic is comprised in SQL queries.

Measurements. To measure the TPC-C transaction execution times we execute n_i × 50 runs of each TPC-C transaction using a single client and record the average execution time (n_i is the target number of versions the test database instance is set up for). The multiplicative factor (× 50) ensures that the targets of insert/update queries are distributed across all versions of the test database instance. Figure 6 shows the results for each of the TPC-C transactions with a varying number of versions. We observe the following overheads for each added version: New Order (≈4.9%), Delivery (≈7.8%), Payment (≈6.7%).

Fig. 6. Execution times for TPC-C transactions. [Per-transaction execution time (ms) vs. number of versions (1, 5, 10, 15, 20) for the Order Status, Stock Level, New Order, Delivery and Payment transactions.]

Version maintenance and query-rewriting (Section IV) may initially give the impression that each added version will in theory result in an overhead of close to 1x, i.e. if a transaction takes time t to complete execution on one version, then on two versions it would require 2t time, on three versions 3t, and so on. This is because each client transaction is applied to all logical database versions. In practice, however, this is not the case and the actual overheads are far lower, as seen above. This is due to database caching and the co-location of tuple versions. Updates to individual tuples can cause distinct copies to be present in the database. However, these copies reside close together and are very often located in the same storage node (i.e. leaf node of the underlying B+-tree). Hence queries applicable to a specific version often locate their target tuples in the database caches, where they were processed for previous versions. In addition, the re-writing of create statements further ensures this by adding the VERSION attribute only as the terminal field of primary keys or other indexes (Figure 4(a)). Thus, even if tuples are valid in different versions (and differ in their VERSION attribute), they will not be dispersed within the storage indexes.

Stock Level and Order Status are both read-only transactions. Note from Section IV-C that read-only transactions are executed only on version one. Hence an increasing number of versions does not contribute any overheads on these transactions.

VIII. RELATED WORK

Secure Deletion. Solutions providing secure deletion employ either (1) overwriting or (2) encryption to erase deleted content from storage media.

Methods to recover erased data from magnetic storage were originally presented in [45], along with schemes to make this recovery significantly more difficult. In fact [45] suggests that it may be necessary to overwrite deleted content up to 35 times to completely ensure non-recovery. Later, [51,72] claim that at least software-based data recovery can be made impossible by a single overwrite. [51] also provides extensions to the Ext3 file system that implement overwriting not just of deleted file content but also of file meta-data (e.g. name, owner, group, size etc). [72] investigates the possibility of recovering deleted content utilizing an electron microscope, concluding that although the recovery of an individual bit is possible, the likelihood of recovering sizeable data using this technique is negligible.

An extension for the Ext2 file system was made available by [24]. Here an asynchronous overwriting mechanism is employed, which causes less interference with user tasks but sacrifices security for a short interval (from deletion time to the overwrite operation). Many available off-the-shelf tools aid in secure deletion [5]. A survey of these sanitization & forensic analysis tools is provided in [39].

[33] addresses the identification and removal of deleted content from main memory. The goal here is to reduce the lifetime of data in main memory (referred to as secure deallocation). On deallocation the solution overwrites the heap/stack content with zeros to prevent recovery.

[54] posits that overwriting is insufficient and instead employs encoding/decoding to protect sensitive data. AES encryption/decryption is used (within a modified firmware) to protect deleted content. [56] designed a NAND flash file system based on YAFFS to support secure deletion. Encryption is used to delete files, while a single block is allocated for the storage of all keys. The key store block is erased using overwriting, and once this is done all deleted (encrypted) content becomes un-recoverable. Encryption is also employed in [74] to dispose of relevant index entries when a record expires. Secure deletion for a versioning file system is provided in [63]. Here, a special stub is stored with each encrypted data block. On deletion only the stub is overwritten, which renders the associated block un-recoverable. For a more detailed survey on secure deletion we refer the reader to [37].

History Independence. Both secure deletion and history independent data structures [48] are complementary to Ficklebase, since all three are essential & need to exist in tandem to achieve complete erasure of deleted content. Initial work on history independence focused on hash tables [25,26,61] and is not directly applicable to relational databases (unless specific hash indices are used). B-Treaps [42] and B-Skip-Lists [43] are promising alternatives for use in database storage engines. Both offer the same functions as a standard B-Tree and have the same depth O(log_B n), where B is the block transfer size. The main advantage of B-Skip-Lists over B-Treaps is their simplicity, making them easier to implement. A comprehensive survey and explanation of these history independent data structures is available via [41].

Statistical Databases. Statistical databases [27] are used for maintaining statistics over data in an OLAP (online analytical processing) model. The main security concern here is to prevent an adversary from deducing very specific information by issuing statistical queries. Typical approaches to prevent such leakage include (but are not limited to) only supporting aggregate queries, refusing to answer queries with small result sets, returning ranges instead of specific values etc. [31,36]. At first glance it may seem that statistical databases achieve un-traceable deletion, at least for aggregates: e.g., if a data item is deleted then all aggregates will be updated to remove its effects. However, such databases are designed for the OLAP model and are not intended for data modification operations such as deletion.

Forensic Analysis. Forensic analysis [44], related research [53,60,71] and available tools [6,14] serve to enable the detection/prevention of tampering of system data. Several forensic algorithms are discussed in [62]. In some cases forensic analysis can be complementary to Ficklebase, e.g. [67] makes audit logs unreadable by the adversary, thereby closing another avenue of possible leakage of deleted data items.

Data Degradation. Data degradation [22] is a work-in-progress to address the removal of sensitive data. Here the goal is to gradually degrade sensitive information over time, eventually making it un-recoverable. Although comprehensive techniques are yet to be designed, [21] gives a simple introductory solution. Here, a data item is degraded in steps from specific to more general values. E.g. an address field may initially contain the entire detailed address; in the next iteration the street part is removed, a following iteration removes the state & zip, leaving only the country code, and so on.

Information Flow Control. Although not dealing with the removal of side-effects, information flow control and related implementations [35,66,73] enable the tracking of sensitive data across system components. This can be used along with Ficklebase to detect and later delete copies of data items that have crossed system boundaries (e.g. moved to another node in a distributed system).

Compensating Transactions. A compensating transaction on execution undoes the effect of a previously committed transaction without resorting to cascading aborts. Hence, compensating transactions can potentially be utilized to undo the side-effects of deleted tuples, as in Ficklebase. However, compensating transactions are application-dependent [55], need to be pre-defined and can only be minimally automated. Ficklebase on the other hand provides support for un-traceable deletion at the database level. Guidelines for designing compensating transactions are discussed in [55]. [34] uses an example of an online bookshop transaction to review several notations for compensation, including their syntax and semantics. Sagas [28] is a flow composition language which achieves atomicity based on compensation. In case of a long running transaction that fails to complete, compensation is employed to undo its effects. [28] also addresses parallel composition, nesting and exception handling. In addition, composition languages such as BPEL4WS [2] enable programmers to specify compensations for associated transactions.

Multiversion Databases. On the flip side of deletion is the requirement to record every single change made to data. This may be required for historical queries or to document system evolution. Research on multiversion databases achieves this by designing data structures that are efficient for both storing & retrieving versioned data. Here, no information is ever deleted but is rather made available for later querying by version or by time. The designed data structures range from basic B-Trees [57] to transactional B+-Trees [46] with concurrency support. In addition, [49,50] address branched evolution, while [58] enables the creation of views on multi-versioned data. A summary of various multi-versioned data structures is available in [47]. Commercial [13] and open source implementations [16,18] are also available.

Other. [23] enables application developers to specify destructive policies on business records. These policies are stored and later executed as stored procedures. The execution is triggered by additional policies that define a critical view which comprises sensitive data. The destructive policies here need to be predefined, not unlike compensating transactions.
IX. CONCLUSION

In this paper we introduced un-traceable deletion, which along with secure deletion and history independence is integral in ensuring the complete erasure of deleted content. We provide insights into the new functional aspects of this new assurance in the context of databases, and present the design and evaluation of Ficklebase, a relational database which achieves un-traceable deletion via versioning and query-rewriting.

REFERENCES

[1] BenchmarkSQL. Online at http://sourceforge.net/projects/benchmarksql/.
[2] Business Process Execution Language for Web Services. Online at http://www.ibm.com/developerworks/library/specification/ws-bpel/.
[3] Computer Forensics. Online at http://en.wikipedia.org/wiki/Computer_forensics.
[4] Current Population Survey (CPS). Online at http://www.bls.gov/cps/.
[5] Deleted but not gone. Online at http://www.nytimes.com/2005/11/03/technology/circuits/03basics.html?pagewanted=print.
[6] EnCase. Online at http://www.guidancesoftware.com/forensic.htm.
[7] EU's Data Protection Directive. Online at http://ec.europa.eu/justice/data-protection/index_en.htm.
[8] Family Educational Rights and Privacy Act (FERPA). Online at http://www2.ed.gov/policy/gen/guid/fpco/ferpa/index.html.
[9] Federal Information Security Management Act (FISMA). Online at http://csrc.nist.gov/groups/SMA/fisma/index.html.
[10] Gramm–Leach–Bliley Act (GLB). Online at http://en.wikipedia.org/wiki/Gramm-Leach-Bliley_Act.
[11] Lua programming language. Online at http://www.lua.org/.
[12] MySQL Proxy. Online at http://forge.mysql.com/wiki/MySQL_Proxy.
[13] Oracle Total Recall. Online at http://www.oracle.com/us/products/database/options/total-recall/overview/index.html.
[14] Sleuth Kit. Online at http://www.sleuthkit.org/.
[15] Stored Procedures. Online at http://en.wikipedia.org/wiki/Stored_procedure.
[16] Tau BerkeleyDB. Online at http://www.cs.arizona.edu/projects/tau/tbdb/.
[17] The Health Insurance Portability and Accountability Act of 1996 (HIPAA). Online at http://www.hhs.gov/ocr/privacy/.
[18] TimeDB. Online at http://www.timeconsult.com/.
[19] TPC-C Benchmark. Online at http://www.tpc.org/tpcc/default.asp.
[20] UK Data Protection Act 1998 (DPA). Online at http://en.wikipedia.org/wiki/Data_Protection_Act_1998#Data_protection_principles.
[21] Nicolas Anciaux, Luc Bouganim, Harold van Heerde, Philippe Pucheral, and Peter M. G. Apers. InstantDB: Enforcing timely degradation of sensitive data. In Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE '08, pages 1373–1375, Washington, DC, USA, 2008. IEEE Computer Society.
[22] Nicolas Anciaux, Luc Bouganim, Harold van Heerde, Philippe Pucheral, and Peter M. G. Apers. Data degradation: making private data less sensitive over time. In Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM '08, pages 1401–1402, New York, NY, USA, 2008. ACM.
[23] Ahmed A. Ataullah, Ashraf Aboulnaga, and Frank Wm. Tompa. Records retention in relational database systems. In Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM '08, pages 873–882, New York, NY, USA, 2008. ACM.
[24] Steven Bauer and Nissanka B. Priyantha. Secure data deletion for Linux file systems. In Proceedings of the 10th USENIX Security Symposium, SSYM'01, Berkeley, CA, USA, 2001. USENIX Association.
[25] Guy E. Blelloch. Strongly history-independent hashing with applications. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, pages 272–282, 2007.
[26] Guy E. Blelloch and Daniel Golovin. Strongly history-independent hashing with applications. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, FOCS '07, pages 272–282, Washington, DC, USA, 2007. IEEE Computer Society.
[27] Claus Boyens, Oliver Günther, and Hans J. Lenz. Statistical databases.
[28] Roberto Bruni, Hernán Melgratti, and Ugo Montanari. Theoretical foundations for compensations in flow composition languages. SIGPLAN Not., 40(1):209–220, January 2005.
[29] Simon Byers. Scalable exploitation of, and responses to information leakage through hidden data in published documents. AT&T Research, 2003.
[30] Brian Carrier. File System Forensic Analysis. Addison-Wesley Professional, 2005.
[31] Francis Y. Chin. Security in statistical databases for queries with small counts. ACM Trans. Database Syst., 3(1):92–104, March 1978.
[32] Jim Chow, Ben Pfaff, Tal Garfinkel, Kevin Christopher, and Mendel Rosenblum. Understanding data lifetime via whole system simulation. In Proceedings of the 13th USENIX Security Symposium, SSYM'04, Berkeley, CA, USA, 2004. USENIX Association.
[33] Jim Chow, Ben Pfaff, Tal Garfinkel, and Mendel Rosenblum. Shredding your garbage: reducing data lifetime through secure deallocation. In Proceedings of the 14th USENIX Security Symposium, SSYM'05, Berkeley, CA, USA, 2005. USENIX Association.
[34] Christian Colombo and Gordon J. Pace. A compensating transaction example in twelve notations. Technical Report CS2011-01, Department of Computer Science, University of Malta, 2011. Available from http://www.um.edu.mt/ict/cs/research/technical_reports.
[35] Michael Dalton, Hari Kannan, and Christos Kozyrakis. Raksha: a flexible information flow architecture for software security. SIGARCH Comput. Archit. News, 35(2):482–493, June 2007.
[36] Dorothy E. Denning and Jan Schlörer. A fast procedure for finding a tracker in a statistical database. ACM Trans. Database Syst., 5(1):88–102, March 1980.
[37] Sarah M. Diesburg and An-I Andy Wang. A survey of confidential data storage and deletion methods. ACM Comput. Surv., 43(1):2:1–2:37, December 2010.
[38] Kevvie Fowler. SQL Server Forensic Analysis. Addison-Wesley Professional, 2008.
[39] Simson L. Garfinkel and Abhi Shelat. Remembrance of data passed: A study of disk sanitization practices. IEEE Security and Privacy, 1(1):17–27, January 2003.
[40] Tal Garfinkel, Ben Pfaff, Jim Chow, and Mendel Rosenblum. Data lifetime is a systems problem. In Proceedings of the 11th Workshop on ACM SIGOPS European Workshop, EW 11, New York, NY, USA, 2004. ACM.
[41] Daniel Golovin. Uniquely represented data structures with applications to privacy. PhD thesis, Pittsburgh, PA, USA, 2008. AAI3340637.
[42] Daniel Golovin. B-treaps: A uniquely represented alternative to B-trees. In Proceedings of the 36th International Colloquium on Automata, Languages and Programming: Part I, ICALP '09, pages 487–499, Berlin, Heidelberg, 2009. Springer-Verlag.
[43] Daniel Golovin. The B-skip-list: A simpler uniquely represented alternative to B-trees. CoRR, abs/1005.0662, 2010.
[44] Mario A. M. Guimaraes, Richard Austin, and Huwida Said. Database forensics. In 2010 Information Security Curriculum Development Conference, InfoSecCD '10, pages 62–65, New York, NY, USA, 2010. ACM.
[45] Peter Gutmann. Secure deletion of data from magnetic and solid-state memory. In Proceedings of the 6th USENIX Security Symposium, Focusing on Applications of Cryptography, SSYM'96, Berkeley, CA, USA, 1996. USENIX Association.
[46] Tuukka Haapasalo, Ibrahim Jaluta, Bernhard Seeger, Seppo Sippu, and Eljas Soisalon-Soininen. Transactions on the multiversion B+-tree. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, EDBT '09, pages 1064–1075, New York, NY, USA, 2009. ACM.
[47] Tuukka K. Haapasalo, Ibrahim M. Jaluta, Seppo S. Sippu, and Eljas O. Soisalon-Soininen. Concurrency control and recovery for multiversion database structures. In Proceedings of the 2nd PhD Workshop on Information and Knowledge Management, PIKM '08, pages 73–80, New York, NY, USA, 2008. ACM.
[48] Jason D. Hartline, Edwin S. Hong, Alexander E. Mohr, William R. Pentney, and Emily Rocke. Characterizing history independent data structures. In Proceedings of the 13th International Symposium on Algorithms and Computation, ISAAC '02, pages 229–240, London, UK, 2002. Springer-Verlag.
[49] Linan Jiang, Betty Salzberg, David B. Lomet, and Manuel Barrena García. The BT-tree: A branched and temporal access method. In Proceedings of the 26th International Conference on Very Large Data Bases, VLDB '00, pages 451–460, San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc.
[50] Khaled Jouini and Geneviève Jomier. Indexing multiversion databases. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, CIKM '07, pages 915–918, New York, NY, USA, 2007. ACM.
[51] Nikolai Joukov, Harry Papaxenopoulos, and Erez Zadok. Secure deletion myths, issues, and solutions. In Proceedings of the Second ACM Workshop on Storage Security and Survivability, StorageSS '06, pages 61–66, New York, NY, USA, 2006. ACM.
[52] J. Katz and Y. Lindell. Introduction to Modern Cryptography. Chapman & Hall/CRC Cryptography and Network Security. Chapman & Hall/CRC, 2008.
[53] Peter Kieseberg, Sebastian Schrittwieser, Martin Mulazzani, Markus Huber, and Edgar Weippl. Trees cannot lie: Using data structures for forensics purposes. In Proceedings of the 2011 European Intelligence and Security Informatics Conference, EISIC '11, pages 282–285, Washington, DC, USA, 2011. IEEE Computer Society.
[54] Marek Klonowski, Michał Przykucki, and Tomasz Strumiński. Data deletion with provable security. In Information Security Applications, pages 240–255. Springer-Verlag, Berlin, Heidelberg, 2009.
[55] Henry F. Korth, Eliezer Levy, and Avi Silberschatz. A formal approach to recovery by compensating transactions. Technical report, Austin, TX, USA, 1990.
[56] Jaeheung Lee, Junyoung Heo, Yookun Cho, Jiman Hong, and Sung Y. Shin. Secure deletion for NAND flash file system. In Proceedings of the 2008 ACM Symposium on Applied Computing, SAC '08, pages 1710–1714, New York, NY, USA, 2008. ACM.
[57] David Lomet and Betty Salzberg. Access methods for multiversion data. In Proceedings of the 1989 ACM SIGMOD International Conference on Management of Data, SIGMOD '89, pages 315–324, New York, NY, USA, 1989. ACM.
[58] Claudia Bauzer Medeiros, Marie-Jo Bellosta, and Geneviève Jomier. Multiversion views: constructing views in a multiversion database. Data Knowl. Eng., 33(3):277–306, June 2000.
[59] Daniele Micciancio. Oblivious data structures: applications to cryptography. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, STOC '97, pages 456–464, New York, NY, USA, 1997. ACM.
[60] Soumyadeb Mitra, Marianne Winslett, Richard T. Snodgrass, Shashank Yaduvanshi, and Sumedh Ambokar. An architecture for regulatory compliant database management. In Proceedings of the 2009 IEEE International Conference on Data Engineering, ICDE '09, pages 162–173, Washington, DC, USA, 2009. IEEE Computer Society.
[61] Moni Naor, Gil Segev, and Udi Wieder. History-independent cuckoo hashing. In Proceedings of the 35th International Colloquium on Automata, Languages and Programming, Part II, ICALP '08, pages 631–642, Berlin, Heidelberg, 2008. Springer-Verlag.
[62] Kyriacos E. Pavlou and Richard T. Snodgrass. Forensic analysis of database tampering. ACM Trans. Database Syst., 33(4):30:1–30:47, December 2008.
[63] Zachary N. J. Peterson, Randal Burns, Joe Herring, Adam Stubblefield, and Aviel D. Rubin. Secure deletion for a versioning file system. In Proceedings of the 4th USENIX Conference on File and Storage Technologies, FAST '05, Berkeley, CA, USA, 2005. USENIX Association.
[64] R. F. Resende and A. El Abbadi. On the serializability theorem for nested transactions. Inf. Process. Lett., 50(4):177–183, May 1994.
[65] James M. Rosenbaum. In defence of the delete key. The Green Bag, 3(4), 2000.
[66] Indrajit Roy, Donald E. Porter, Michael D. Bond, Kathryn S. McKinley, and Emmett Witchel. Laminar: practical fine-grained decentralized information flow control. SIGPLAN Not., 44(6):63–74, June 2009.
[67] Bruce Schneier and John Kelsey. Secure audit logs to support computer forensics. ACM Trans. Inf. Syst. Secur., 2(2):159–176, May 1999.
[68] Jitesh Shetty and Jafar Adibi. The Enron email dataset database schema and brief statistical report, 2004.
[69] Patrick Stahlberg, Gerome Miklau, and Brian Neil Levine. Threats to privacy in the forensic analysis of database systems. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, SIGMOD '07, pages 91–102, New York, NY, USA, 2007. ACM.
[70] Latanya Sweeney. Protecting job seekers from identity theft. IEEE Internet Computing, 10(2):74–78, March 2006.
[71] Maolin Tang and Colin Fidge. Reconstruction of falsified computer logs for digital forensics investigations. In Proceedings of the Eighth Australasian Conference on Information Security, Volume 105, AISC '10, pages 12–21, Darlinghurst, Australia, 2010. Australian Computer Society, Inc.
[72] Craig Wright, Dave Kleiman, and Shyaam Sundhar R.S. Overwriting hard drive data: The great wiping controversy. In Proceedings of the 4th International Conference on Information Systems Security, ICISS '08, pages 243–257, Berlin, Heidelberg, 2008. Springer-Verlag.
[73] Nickolai Zeldovich, Silas Boyd-Wickizer, Eddie Kohler, and David Mazières. Making information flow explicit in HiStar. Commun. ACM, 54(11):93–101, November 2011.
[74] Qingbo Zhu and Windsor W. Hsu. Fossilized index: the linchpin of trustworthy non-alterable electronic records. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, SIGMOD '05, pages 395–406, New York, NY, USA, 2005. ACM.