A Position Paper on Data Sovereignty: The Importance of Geolocating Data in the Cloud
Zachary N. J. Peterson, Mark Gondree, Robert Beverly
Naval Postgraduate School

Abstract

In this paper we define the problem and scope of data sovereignty: the coupling of stored data authenticity and geographical location in the cloud. Establishing sovereignty is an especially important concern amid legal and policy constraints when data and resources are virtualized and widely distributed. We identify the key challenges that need to be solved to achieve an effective and un-cheatable solution as well as propose an initial technique for data sovereignty.

1 Introduction

The exponential growth of electronic data has led private organizations and governmental agencies with limited storage and IT resources to outsource data storage to cloud-based service providers. Storage service providers agree, by a service level agreement (SLA) contract, to preserve and make data available for retrieval for some level of durability. In addition to availability, many SLAs also guarantee that data will be stored only at data centers within a specific geographical region (e.g. within a state, time zone or political boundary) for performance, regulatory and continuity reasons. Actually verifying that cloud storage service providers are meeting their contractual geographic obligations, however, is a challenging problem, and one that has emerged as a critical issue. For example, careless or naïve storage service providers may move data, in violation of an SLA, to an overseas data center to leverage cheaper IT costs. Such actions, however, may make data available to foreign governments through search warrants or other legal mechanisms. A dishonest storage provider may intentionally move data overseas, to more easily leak information or to avoid legal liability.

In this position paper, we propose the need for developing new algorithms for establishing the integrity, authenticity, and geographical location of data stored in the cloud. Of particular interest is establishing data location at a granularity sufficient for placing it within the borders of a particular nation-state. We call this notion data sovereignty. We desire to establish some (probabilistic) guarantee that a provider is storing data at some expected physical location(s) and maintain such guarantees amid potentially dishonest providers. The problem of verifying that data exists only at allowed locations (and copies have not moved to some location that violates a policy) is a difficult problem in general; data sovereignty provides a much weaker guarantee, but a step toward actively monitoring compliance with some SLA policies.

Within the problem of data sovereignty, key concerns include developing techniques that minimize storage and network (thus, economic) costs. The immense size of digital archives, and the even larger size of the data centers on which they reside, make linear schemes (i.e. schemes that access every block of every file) prohibitively expensive, for either an auditor or storage provider. We posit that data sovereignty may not be solvable using any one technology, but rather may be achieved with a suite of existing and future tools that both detect and deter malicious behavior. Tools like these, which break the abstractions of the cloud to geolocate data, may be essential in the future to gather evidence, establish compliance (or show non-compliance) with contracts and laws.

2 Motivation

Most industries and governments are considering leveraging the scalability, cost, rapid provisioning and other benefits of the cloud. Those that are not, are at least considering the reality that, to enjoy new technologies, to stay competitive, or to remain relevant, they may be forced to follow this overwhelming technology trend and move business into the cloud. Moving to the cloud, however, requires organizations to interact with their data at a new level of abstraction. This comes with significant benefits, but also some limitations.

For example, a data owner may wish to store backups at multiple remote sites to provide resilience against natural disasters or for other continuity planning. Or, a data owner may desire her data be located physically near her target customers, for performance reasons. Rather than actively monitor QoS from the perspective of those customers, it might be more straight-forward to monitor the data's location. The cloud's abstractions, however, undermine the ability for a data owner to choose or control location, outside trusting a provider to meet some agreement. The US Federal Cloud Computing Strategy [12] outlines "actively monitoring service level agreements" and holding vendors accountable for failures as an essential part of any agency's cloud migration strategy. Data sovereignty protocols would provide such an active monitoring tool, for detecting compliance with SLA provisions concerning data geolocation.

Data sovereignty protocols may also be a complementary technology providing solutions to other data security problems. For digital provenance, when determining the origin and history of a digital document, one of the most fundamental questions is: where is this data, right now? With no reliable answer to this question at any point in the data's lifetime, one may never establish reliable provenance data.

Data sovereignty will also be beneficial to honest storage service providers. Even when data sovereignty protocols come at additional cost (e.g. performance or economic) we posit service providers may pay these costs to establish a level of trust with their clients, to comply with data retention legislation, or to meet new contractual obligations and remain competitive in the marketplace.

3 The Scope of Data Sovereignty

Data sovereignty has been recognized (although not given a name) by cloud practitioners as a critical issue [15]; however, to date, the problem has been neither solved nor even posed in the research literature. As with any new security problem, it is important to clarify what a protocol should and should not attempt to accomplish. We propose that a proof of data sovereignty protocol should guarantee the possession, integrity and location of an instance of data in large networks that are not under an interrogator's control (i.e. the Internet). The data holder may act dishonestly (and possibly collude with other parties) to falsely claim to be holding a particular piece of data at some geographic location.

We presume the adversary to any data sovereignty solution is stronger than those previously considered by the geolocation literature. Much of the work on geophysical identification has ignored adversarial behavior: servers generate packets or other trace evidence whose measurement is presumed to reflect reality. Data sovereignty considers a more challenging scenario, as active adversaries may act strategically to fool or confuse the querier. Recent work in position-based cryptography considers a similarly strong type of adversary (capable of breaking nearly all previous geolocation strategies) that is able to clone itself at multiple, specific, hidden locations [5]. Capkun et al. present a slightly weaker adversary that is unable to locate certain landmarks ("hidden, mobile base stations") during the protocol, and thus unable to execute certain attacks [4]. Although these adversarial models were originally posed in a wireless setting, such active adversaries are closer to those we consider for data sovereignty, and make a good starting point for our future analysis.

Data sovereignty cannot guarantee that additional copies of data are not instantiated outside of a prescribed geographic area, only that there exists at least one copy of the data at an interrogation point. Considering adversaries that hold copies of the data at multiple geographic locations is outside the scope of data sovereignty. Tracking all copies of data, without total control of the network, is a different and very hard problem. We note that, from an economic perspective, it may be punitively expensive for a storage service to replicate digital archives that are large relative to existing network bandwidth, and therefore it may be infeasible to successfully answer random challenges on large data sets by selectively copying or distributing the archive to multiple geographically-distant locations. The premise of data sovereignty is that an adversary may have some incentive for re-locating its data in breach of a contract, but storing copies at multiple locations undermines any such motive.

4 The State of the Art

Tools to actively monitor real cloud performance or SLA compliance, such as CloudCmp [14], SLAm [20] or Nimsoft's commercial monitoring service, do not yet offer support for checking compliance with respect to data durability or location clauses of an SLA. Most tools do monitor certain QoS metrics potentially relevant to inferring geolocation and data presence, such as up-time and end-to-end response times. Thus, extending support to monitor data sovereignty is quite natural. A data sovereignty protocol needs to achieve two things, simultaneously: (1) proof of the physical location of a server on a network within some acceptable margin of error, and (2) proof that a client's data is indeed stored at this location. We summarize applicable technologies, and describe their relationship to our problem.

Internet geolocation

Geolocation of servers on the Internet is currently achieved through a variety of evidence-gathering practices, including mining data from whois databases and DNS records, using modern Internet topology tools and through the manual inspection of Internet artifacts (e.g. confirming a webpage is written in Chinese). These methods provide a "best guess" based on a small constellation of heuristic evidence, generously assumed to be non-malicious. The only reliable, technical method for bounding location on the Internet, however, is active measurement (i.e.