Sypse: Privacy-first Data Management through Pseudonymization and Partitioning

Amol Deshpande
University of Maryland, College Park, MD, USA

ABSTRACT
Data privacy and ethical/responsible use of personal information are becoming increasingly important, in part because of new regulations like GDPR, CCPA, etc., and in part because of growing public distrust in companies’ handling of personal data. However, operationalizing privacy-by-design principles is difficult, especially given that current data management systems are designed primarily to make it easier and more efficient to store, process, access, and share vast amounts of data. In this paper, we present a vision for transparently rearchitecting database systems by combining pseudonymization, synthetic data, and data partitioning to achieve three privacy goals: (1) reduce the impact of breaches by separating detailed personal information from personally identifying information (PII) and scrambling it, (2) make it easy to comply with a deletion re-

[...] there has been much work in recent years on techniques like differential privacy and encrypted databases, those have not found wide adoption and, further, their primary target is untrusted environments or users. We believe that operational databases and data warehouses in trusted environments, which are much more prevalent, also need to be redesigned and re-architected from the ground up to embed privacy-by-design principles and to enforce better data hygiene. It is also necessary that any such redesign not cause a major disruption to the day-to-day operations of software engineers or analysts; otherwise, it is unlikely to be widely adopted. In this paper, we present such a redesign of a standard relational database system, called Sypse, that uses “pseudonymization”, data partitioning, and synthetic data to achieve several key privacy goals without undue burden on the users.
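To make the first privacy goal concrete, the following is a minimal sketch of separating PII from detailed personal information through pseudonymization and partitioning. It is an illustration under stated assumptions, not the Sypse design: the keyed-hash (HMAC-SHA256) pseudonym, the in-memory dictionaries standing in for two partitions, and the names `pseudonym`, `partition`, and `lookup` are all hypothetical choices made for this example.

```python
import hashlib
import hmac


def pseudonym(secret_key: bytes, user_id: str) -> str:
    """Derive a stable pseudonym from a user id with a keyed hash.

    Assumption: HMAC-SHA256 is used here purely for illustration.
    """
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def partition(records, secret_key, pii_fields):
    """Split records into a PII partition and a detail partition.

    The PII partition is keyed by the real user id; the detail partition
    is keyed only by the pseudonym, so the two cannot be joined without
    the secret key.
    """
    pii_store = {}
    detail_store = {}
    for rec in records:
        uid = rec["user_id"]
        pii_store[uid] = {f: rec[f] for f in pii_fields}
        detail_store[pseudonym(secret_key, uid)] = {
            k: v for k, v in rec.items() if k != "user_id" and k not in pii_fields
        }
    return pii_store, detail_store


def lookup(uid, secret_key, pii_store, detail_store):
    """Re-join both partitions for one user; requires the secret key."""
    return {**pii_store[uid], **detail_store[pseudonym(secret_key, uid)]}


# Hypothetical example data.
records = [
    {"user_id": "u1", "name": "Ada", "email": "ada@example.org", "diagnosis": "flu"},
    {"user_id": "u2", "name": "Bob", "email": "bob@example.org", "diagnosis": "cold"},
]
key = b"demo-secret-key"
pii, details = partition(records, key, pii_fields=("name", "email"))
full = lookup("u1", key, pii, details)
```

In this sketch, someone who obtains only the detail partition sees pseudonymous rows with no names or emails, and someone who obtains only the PII partition learns nothing about the detailed attributes. How the real system stores keys, scrambles data, or handles deletion is not specified in the text above and is not modeled here.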