Successes of Differential Privacy
Cynthia Dwork, Harvard University

Pre-Modern Cryptography
[Diagram: the pre-modern cycle – propose a scheme, break it, propose another, with no definitions in sight.]

Modern Cryptography
[Diagram: the definitional cycle. Propose a definition; propose algorithms satisfying the definition. If the definition is broken, propose a STRONGER definition. If provably no algorithm exists – why? – propose a WEAKER/DIFFERENT definition, or conclude it was a bad definition.]

Scientific Launch
Dinur-Nissim and the Fundamental Law of Information Recovery:
“Overly accurate” estimates of “too many” statistics destroy privacy.
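The Fundamental Law can be made concrete. Below is a minimal sketch, not from the talk, of a Dinur-Nissim-style linear reconstruction attack: when released subset counts are too accurate, least squares plus rounding recovers essentially the whole database, while heavier noise defeats the attack. The sizes and noise scales are illustrative assumptions, not calibrated to any particular ε.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 500                        # n secret bits; m released subset-count statistics

secret = rng.integers(0, 2, size=n)   # the private database: one bit per person
A = rng.integers(0, 2, size=(m, n))   # each row: a random subset whose count is released

def reconstruct(noisy_counts):
    """Least-squares-and-round attack: solve A x ~ counts, round to bits."""
    est, *_ = np.linalg.lstsq(A, noisy_counts, rcond=None)
    return (est > 0.5).astype(int)

# "Overly accurate" answers: each released count is off by less than 1.
accurate = A @ secret + rng.uniform(-0.4, 0.4, size=m)
# Answers protected by much larger symmetric (Laplace) noise.
protected = A @ secret + rng.laplace(scale=25.0, size=m)

acc_attack = (reconstruct(accurate) == secret).mean()
prot_attack = (reconstruct(protected) == secret).mean()
print(f"bits recovered from accurate answers: {acc_attack:.0%}")
print(f"bits recovered from noisy answers:    {prot_attack:.0%}")
```

With overly accurate answers the attack recovers nearly every bit; with large noise it degrades toward guessing, which is exactly the trade-off the Fundamental Law describes.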
1. Methodology
2. Engaging with negative results: Dinur-Nissim; impossibility of semantic security (the Terry Gross example)
3. Algorithmic approach: privacy-preserving programming from a few primitives
   RR, symmetric noise, EM: the ORs and ANDs of DP
   The astonishing Blum-Ligett-Roth result
   Composition
   Analytical insights: sparse vector and PMW; the geometric view
4. Complexity

Fruitful Interplay with Other Fields
Learning theory, discrepancy theory, cryptography, geometry, complexity theory, mechanism design, pseudorandomness, communication complexity, machine learning, (robust) statistics, fingerprinting codes, coding theory

Rich Algorithmic Literature
Counts, linear queries, histograms, contingency tables (marginals)
Location and spread (e.g., median, interquartile range)
Dimension reduction (PCA, SVD), clustering
Support Vector Machines
Sparse regression/LASSO, logistic and linear regression
Gradient descent
Boosting, Multiplicative Weights
Combinatorial optimization, mechanism design
Privacy under continual observation, pan-privacy
Kalman filtering
Statistical Queries learning model, PAC learning
False Discovery Rate control
…

Outreach
Formative engagement with statistics led to the earliest public deployment
Social science research; law, economics, medicine, …
PLSC, Berkman, Brussels, Simons Foundation, EC, iDASH, …
Omics: Stanford (past); IPAM (upcoming); Society for Epidemiologic Research

Policy
CPUC hearings on the Energy Data Center, the ruling, the Southern California power company
Podesta report, PCAST report
Commission on Evidence-Based Policymaking
Consumer Financial Protection Bureau
…

Deployment
RAPPOR, Google more generally, Apple, …
A couple of startups (LeapYear, Privitar(?))
Census – OnTheMap and upcoming
Help wanted!

DP when Privacy is not a Concern
Markets, Economics, Game Theory
   Hartline, McSherry, Talwar; Roth; Pai and Roth; Lykouris, Syrgkanis, and Tardos
Fairness in Algorithmic Classification
Generalizability under Adaptive Analysis

Fairness Through Awareness
Dwork, Hardt, Pitassi, Reingold, Zemel 2012

Individual Fairness
People who are similar with respect to a specific classification task should be treated similarly.
S + math ∼ Sc + finance
Classifier
A classifier is a map M: V → O, taking each individual x in V to an outcome M(x) in O.
V: individuals; O: classification outcomes.

Individual Fairness – Lipschitz Classifier
[Diagram: individuals x and y at tiny distance d in V map under M to nearby outcomes in O.]

Lipschitz Mappings
M: V → Δ(O), with ‖M(x) − M(y)‖ ≤ d(x, y)
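One way to make the Lipschitz condition concrete (a sketch of my own, not from the talk): represent M(x) as a distribution over outcomes, measure distance between output distributions by total variation, and check the condition over all pairs. The toy classifier and task metric below are illustrative assumptions.

```python
def total_variation(p: dict, q: dict) -> float:
    """Statistical distance between two distributions over outcomes."""
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in outcomes)

def is_lipschitz(M, d, individuals) -> bool:
    """Check ||M(x) - M(y)|| <= d(x, y) for every pair of individuals."""
    return all(
        total_variation(M(x), M(y)) <= d(x, y) + 1e-9  # small slack for float error
        for x in individuals for y in individuals
    )

# Toy example: two applicants the task metric deems similar must
# receive similar hiring lotteries.
applicants = ["x", "y"]
task_metric = lambda a, b: 0.0 if a == b else 0.25
classifier = lambda a: ({"hire": 0.6, "reject": 0.4} if a == "x"
                        else {"hire": 0.45, "reject": 0.55})
```

Here the two lotteries differ by 0.15 in total variation, within the allowed 0.25, so the classifier treats these similar individuals similarly.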
              Differential Privacy                Individual Fairness
Objects       Databases                           Individuals
Outcomes      Output of statistical analysis      Classification outcome
Similarity    General-purpose metric              Task-specific metric
Can use DP techniques for fairness.
Theorem: the exponential mechanism of [MT07] yields individual fairness and small loss when the metric has bounded doubling dimension.
Which is “Right”?

Statistical Validity in Adaptive Data Analysis
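The exponential mechanism of [MT07], the EM primitive referenced on the fairness slide, can be sketched in a few lines; the sampling code below is my generic illustration, not the talk's.

```python
import math
import random

def exponential_mechanism(outcomes, utility, epsilon, sensitivity=1.0):
    """Sample an outcome with probability proportional to
    exp(epsilon * utility(o) / (2 * sensitivity)), per [MT07]."""
    weights = [math.exp(epsilon * utility(o) / (2 * sensitivity)) for o in outcomes]
    r = random.uniform(0.0, sum(weights))
    for o, w in zip(outcomes, weights):
        r -= w
        if r <= 0:
            return o
    return outcomes[-1]  # guard against floating-point underrun
```

Higher-utility outcomes are exponentially preferred, yet no single individual's data can shift any outcome's probability by more than a factor of e^ε, which is the bridge between private selection and individually fair classification.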
Dwork, Feldman, Hardt, Pitassi, Reingold, Roth

[Diagram: a data analyst adaptively queries a curator M holding the database: q1 → a1, q2 → a2, q3 → a3, …]
푞푖 depends on 푎1, 푎2, …, 푎푖−1

Differential privacy neutralizes the risks incurred by adaptivity:
it is hard to find a query for which the data set is not representative.

The Re-Usable Holdout
[Diagram: the data is split into a “Training” set and a “Holdout” set H.]
Learn on the training set.
Check against the holdout via a differentially private mechanism.
Future exploration does not significantly depend on H, so H stays fresh.

3 Sides of the Same Coin
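The re-usable holdout above can be sketched in a simplified, Thresholdout-style form. This is my own illustration under stated assumptions: queries are means of per-record statistics in [0, 1], the threshold and noise scales are illustrative, and the full algorithm's noise-budget accounting is omitted.

```python
import random

def make_reusable_holdout(train, holdout, threshold=0.04, sigma=0.01):
    """Simplified Thresholdout-style re-usable holdout.

    A query `phi` maps one record to a value in [0, 1]; we answer with its
    mean. The holdout H is consulted only through a noisy comparison, so its
    answers remain statistically valid under adaptively chosen queries.
    """
    def answer(phi):
        mean = lambda data: sum(phi(x) for x in data) / len(data)
        t, h = mean(train), mean(holdout)
        # Noisy check: does the training estimate agree with the holdout?
        if abs(t - h) < threshold + random.gauss(0.0, sigma):
            return t                         # agree: release the training value
        return h + random.gauss(0.0, sigma)  # disagree: noisy holdout value
    return answer
```

Because each query touches H only through this differentially private comparison, the analyst's future exploration does not significantly depend on H, and H stays fresh.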
Fairness, Privacy, Generalizability

“Keep Up the Good Work” – Moni Naor (by channeling)
Let your research be fruitful and multiply
Build the ε registry, formally or informally
Build libraries, continue outreach efforts
Confront implications of the Fundamental Law
   Prioritization? Who decides? Which fields have the tools?
   Public understanding
Generalization beyond the sample distribution / transfer learning?
   Strong relation to fairness

Thank You