Database-as-a-Service (DaaS)
AN EXPLORATION IN CONFIDENTIALITY-PRESERVING INDEXES

What is Database-as-a-Service?

• A cloud-based solution to databases
• Offered by all major cloud providers (Amazon, Google, Microsoft, etc.)
• Customer outsources their data to a cloud service provider
• Cloud databases can be implemented in a wide variety of ways:
  • Multi-tenancy
  • Single-tenancy
  • Relational
  • Non-relational (NoSQL)
  • NewSQL

Examples of Services

• Amazon
  • RDS – relational
  • DynamoDB – non-relational
  • Redshift – data warehousing
• Google
  • Cloud SQL
  • Spanner – NewSQL
• Microsoft
  • Azure – relational
  • Azure DocumentDB – non-relational
• Oracle
  • Data

Advantages of DaaS:

• CONVENIENCE!
• Pay-as-you-go
• No need to purchase physical hardware to contain large databases
• Fire your Database Administrator!
  • Tuning and optimization
  • Automatic patching and updates
• Easy-peasy setup: providers make it simple to migrate existing databases over to their service
• ...This sounds too good to be true…!?

Disadvantages of DaaS

• SECURITY!
• Lack of control
  • Must entrust the provider to safeguard your data
• The ungoverned cloud: the Cloud Security Alliance promotes best practices for data security, but ultimately data security is the responsibility of the client
• Anyone in the world can potentially access/view your data

A Perfect World: Convenience AND Security?

• Can we have both?

• Yes, but every measure taken to increase security corresponds to a decrease in convenience
  • Multi-Factor Authentication
  • Encryption…

The Problem With Encryption…

• If we encrypt data before outsourcing, we prevent any outsiders from viewing it
• But then how do we query that data?
• Option #1: retrieve the entire database or table we are interested in querying
  • Decrypt locally, and perform queries
  • Horribly inefficient!
• Option #2: use specialized indexes to query the encrypted data
  • “Confidentiality-Preserving Indexes”

Confidentiality-Preserving Indexes (CPIs)

• The solution is to build special indexes that let the owner of the data query the encrypted data held by the cloud database provider
• Numerous CPIs to choose from, each with its own pros and cons:
  • Deterministic
  • Order-preserving
  • Bucketized
  • B+ Trees
  • Many more
• Let’s look at a few of them in detail…

Deterministic Indexes

• IDEA: the client encrypts all the values of an attribute that contains sensitive information, using a deterministic cipher (equal plaintexts always give equal ciphertexts). The client is the only one that knows the corresponding plaintext value for each encrypted value.
• Example query: SELECT FirstName, LastName FROM Employees WHERE Salary = encrypt(35000)
• Unauthorized observers will not be able to see the most important part of the query!
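A minimal Python sketch of the deterministic index idea, under stated assumptions: an in-memory SQLite table stands in for the cloud database, and a keyed hash (HMAC-SHA256) stands in for the deterministic cipher. A keyed hash cannot be decrypted, so this only illustrates the matching behavior. The names SECRET_KEY, det_token, SalaryToken, and the sample rows are hypothetical.

```python
import hashlib
import hmac
import sqlite3

SECRET_KEY = b"client-only-secret"  # hypothetical key; never sent to the provider


def det_token(value: int) -> str:
    """Deterministic keyed token: equal plaintexts always give equal tokens."""
    return hmac.new(SECRET_KEY, str(value).encode(), hashlib.sha256).hexdigest()


# In-memory SQLite stands in for the provider's database (an assumption).
cloud = sqlite3.connect(":memory:")
cloud.execute(
    "CREATE TABLE Employees (FirstName TEXT, LastName TEXT, SalaryToken TEXT)"
)

# Made-up sample rows; only the token of each salary leaves the client.
rows = [("Ada", "Lovelace", 35000), ("Alan", "Turing", 42000), ("Grace", "Hopper", 35000)]
cloud.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(first, last, det_token(salary)) for first, last, salary in rows],
)


def employees_with_salary(salary: int) -> list:
    """Equality selection: the provider compares tokens, never plaintext."""
    cursor = cloud.execute(
        "SELECT FirstName, LastName FROM Employees "
        "WHERE SalaryToken = ? ORDER BY LastName",
        (det_token(salary),),
    )
    return cursor.fetchall()


matches = employees_with_salary(35000)
print(matches)
```

Equal salaries produce equal tokens, so the provider can evaluate the equality predicate without ever seeing 35000. The tokens carry no ordering, so a predicate such as Salary > 35000 cannot be answered this way.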

• Useful for equality selections, but useless for range selections.

Bucket-based Indexes

• IDEA: the client partitions a table into ranges of values (buckets). The buckets are encrypted and then outsourced to the cloud. Queries will return any and all buckets that contain values matching the query.
• Example buckets for an Employee table:
  • Bucket 1: all records with salary $30,000-$40,000
  • Bucket 2: all records with salary $40,000-$50,000
  • ...
• Example query: SELECT FirstName, LastName FROM Employee WHERE Salary > $35,000
  • Query returns all buckets matching the query
• Overhead: user must filter out false positives locally

B+ Tree Indexes

• IDEA: create a B+ Tree index locally (using plaintext values), and a table which represents the tree. Encrypt the table, then outsource it. To query the tree, retrieve the root from the cloud database, decrypt it, then retrieve the desired node at the next level of the tree. Repeat until you reach the leaf level.
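The retrieve, decrypt, descend loop of the tree index can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the tree is hand-built, CloudStore is a hypothetical stand-in for the provider, and toy_encrypt/toy_decrypt use a hash-based XOR keystream that is NOT secure (a real client would use a vetted authenticated cipher).

```python
import hashlib
import json
from bisect import bisect_right

KEY = b"client-only-secret"  # hypothetical key; stays on the client


def _keystream(node_id: int, length: int) -> bytes:
    """Toy keystream from SHA-256 in counter mode. NOT secure; illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        block = KEY + node_id.to_bytes(4, "big") + counter.to_bytes(4, "big")
        out += hashlib.sha256(block).digest()
        counter += 1
    return out[:length]


def toy_encrypt(node_id: int, node: dict) -> bytes:
    plain = json.dumps(node).encode()
    return bytes(p ^ k for p, k in zip(plain, _keystream(node_id, len(plain))))


def toy_decrypt(node_id: int, blob: bytes) -> dict:
    plain = bytes(c ^ k for c, k in zip(blob, _keystream(node_id, len(blob))))
    return json.loads(plain)


# Plaintext tree built locally. Child i of an inner node holds keys below
# keys[i]; the last child holds the rest. Leaves hold [salary, row id] pairs.
TREE = {
    0: {"leaf": False, "keys": [50000], "children": [1, 2]},
    1: {"leaf": False, "keys": [35000], "children": [3, 4]},
    2: {"leaf": False, "keys": [70000], "children": [5, 6]},
    3: {"leaf": True, "entries": [[30000, "row-4"]]},
    4: {"leaf": True, "entries": [[35000, "row-1"], [42000, "row-2"]]},
    5: {"leaf": True, "entries": [[50000, "row-3"], [61000, "row-5"]]},
    6: {"leaf": True, "entries": [[70000, "row-6"], [88000, "row-7"]]},
}


class CloudStore:
    """Stands in for the provider: stores opaque blobs and counts round trips."""

    def __init__(self):
        self.blobs = {}
        self.fetches = 0

    def fetch(self, node_id: int) -> bytes:
        self.fetches += 1
        return self.blobs[node_id]


def lookup(store: CloudStore, salary: int):
    """Walk from the root to a leaf, fetching and decrypting one node per level."""
    node_id = 0  # the root
    while True:
        node = toy_decrypt(node_id, store.fetch(node_id))
        if node["leaf"]:
            return dict(node["entries"]).get(salary)
        node_id = node["children"][bisect_right(node["keys"], salary)]


# Encrypt every node on the client, then hand only ciphertext to the store.
store = CloudStore()
for node_id, node in TREE.items():
    store.blobs[node_id] = toy_encrypt(node_id, node)

row = lookup(store, 42000)
print(row, store.fetches)
```

Each level of the tree costs one fetch, so this three-level tree needs three round trips per lookup. The provider only ever sees node identifiers and ciphertext.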

• Preserves order over the encrypted values
• Range queries possible
• Requires O(log n) rounds of communication between the client and the service provider

Conclusions about CPIs

• The ability to query encrypted data comes at a large “cost”. Types of costs include:

  • Communication between client and provider:
    • As with the B+ Tree Index
  • Computation:
    • As with bucket-based indexes
  • Security:
    • We may also give up a degree of security to support certain types of queries
• All CPIs involve some effort/inconvenience for the client
• Better solutions in the future?

Sources

• https://hackernoon.com/5-top-cloud-databases-that-works-wonders-7e628810e3ac
• https://enlighteninglife.com/wp-content/uploads/2014/08/tug-of-war-1013740_1920-1080x675.jpg
• https://null-byte.wonderhowto.com/how-to/mr-robot-hacks/
• https://blog.jhnr.ch/images/daas_vendors.png?w=660
• https://cdn.shopify.com/s/files/1/1026/1507/products/B7101_2.jpg?v=1505741145
• Foresti, S. (2011). Preserving Privacy in Data Outsourcing (Vol. 51, Advances in Information Security). Boston, MA: Springer US.
• Köhler, J., Jünemann, K., & Hartenstein, H. (2015). Confidential database-as-a-service approaches: Taxonomy and survey. Journal of Cloud Computing, 4(1), 1-14.