
MEASURING THE UNMEASURED: NEW THREATS TO MACHINE LEARNING SYSTEMS

A Dissertation Presented to the Faculty of the Graduate School

of Cornell University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

by
Congzheng Song
December 2020

© 2020 Congzheng Song
ALL RIGHTS RESERVED

MEASURING THE UNMEASURED: NEW THREATS TO MACHINE LEARNING SYSTEMS

Congzheng Song, Ph.D.

Cornell University 2020

Machine learning (ML) is at the core of many Internet services and applications. Practitioners evaluate ML models based on accuracy metrics, which measure the models' predictive power on unseen future data. On the other hand, as ML systems become more personalized and more important in decision-making, malicious adversaries have an incentive to interfere with the ML environment for various purposes, such as extracting information about sensitive training data or inducing desired behavior in models' outputs. However, none of these security and privacy threats are captured by accuracy, and it is unclear to what extent current ML systems could go wrong.

In this dissertation, we identify and quantify a number of threats to ML systems that are not measured by conventional performance metrics: (1) we consider privacy threats at training time, where we show that an adversary can supply malicious training code to force an ML model into intentionally "memorizing" sensitive training data, and later extract the memorized information from the model; (2) motivated by data-protection regulations, we identify a compliance issue where personal information might be collected for training ML models without consent, and design practical auditing techniques for detecting such unauthorized data collection; (3) we study the overlearning phenomenon, in which models' internal representations reveal sensitive information uncorrelated with the training objective, and discuss its implications in terms of privacy leakage and compliance with regulations; and (4) we demonstrate a security vulnerability in ML models for analyzing text semantic similarity, where we propose attacks for generating texts that are semantically unrelated but judged as similar by these ML models.

The goal of this dissertation is to provide ML practitioners with ways of measuring risks in ML models through threat modeling. We hope that our proposed attacks can give insights for better mitigation methods, and we advocate that the ML community consider all of these aspects, rather than accuracy alone, when designing new learning algorithms and building new ML systems.

BIOGRAPHICAL SKETCH

Congzheng Song was born in Changsha, China. He earned a B.S. degree summa cum laude in Computer Science from Emory University. In 2016, he entered Cornell University to pursue a Ph.D. in Computer Science, advised by Prof. Vitaly Shmatikov at the Cornell Tech campus in New York City. His doctoral research focused on identifying and quantifying security and privacy issues in machine learning. He was a doctoral fellow at Cornell Tech's Digital Life Initiative in 2020. During his Ph.D. studies, he interned at Amazon, Google, and Petuum Inc. for industrial research.

To my parents.

ACKNOWLEDGEMENTS

I am extremely fortunate to have Vitaly Shmatikov as my Ph.D. advisor. He supported me in any way he could, and I learned more than I could have hoped for from him, from formulating and refining research ideas to writing and presenting the outcomes. His passion and wisdom guided me through many difficult times and made my Ph.D. research very productive. I owe Vitaly deeply, and this dissertation would not have been possible without his help. I am very grateful to the rest of my thesis committee: Thomas Ristenpart and Helen Nissenbaum. Tom introduced me to security and privacy research problems in machine learning when I first came to Cornell Tech, which is also the focus of this dissertation.

I was a doctoral fellow at the Digital Life Initiative (DLI) founded by Helen. From Helen and the DLI team, I learned about the societal aspects of technology and how people outside our field view our work from different perspectives. I also want to thank Tom and Helen for their valuable feedback on this dissertation.

I want to acknowledge my collaborators and co-authors: Emiliano De Cristofaro, Luca Melis, Roei Schuster, Vitaly Shmatikov, Reza Shokri, Marco Stronati, Eran Tromer, Ananth Raghunathan, Thomas Ristenpart, and Alexander M. Rush. They are incredible researchers in this field, and I benefited greatly from their insights.

I would like to express my gratitude to my mentors and colleagues at Amazon, Google, and Petuum Inc. during my internships. From them, I learned how security and privacy research is deployed in practice and about the differences between academic research and the real world. Last but not least, I want to thank all my friends and family members in the U.S. and China for their endless support throughout my entire life.

TABLE OF CONTENTS

Biographical Sketch ...... iii
Dedication ...... iv
Acknowledgements ...... v
Table of Contents ...... vi
List of Tables ...... ix
List of Figures ...... xii

1 Introduction 1
1.1 Thesis Contribution ...... 2
1.2 Thesis Structure ...... 4

2 Background 6
2.1 Machine Learning Preliminaries ...... 6
2.1.1 Supervised learning ...... 6
2.1.2 Linear models ...... 9
2.1.3 Deep learning models ...... 9
2.2 Machine Learning Pipeline ...... 10
2.2.1 Data collection ...... 11
2.2.2 Training ML models ...... 12
2.2.3 Deploying ML models ...... 13
2.3 Memorization in ML ...... 14
2.3.1 Membership Inference Attacks ...... 14
2.4 Privacy-preserving Techniques ...... 16
2.4.1 Differential privacy ...... 16
2.4.2 Secure ML environment ...... 17
2.4.3 Model partitioning ...... 18

3 Intentional Memorization with Untrusted Training Code 20
3.1 Threat Model ...... 20
3.2 White-box Attacks ...... 24
3.2.1 LSB Encoding ...... 24
3.2.2 Correlated Value Encoding ...... 25
3.2.3 Sign Encoding ...... 28
3.3 Black-box Attacks ...... 30
3.3.1 Abusing Model Capacity ...... 30
3.3.2 Synthesizing Malicious Augmented Data ...... 31
3.3.3 Why Capacity Abuse Works ...... 34
3.4 Experiments ...... 35
3.4.1 Datasets and Tasks ...... 35
3.4.2 ML Models ...... 37
3.4.3 Evaluation Metrics ...... 38
3.4.4 LSB Encoding Attack ...... 40
3.4.5 Correlated Value Encoding Attack ...... 42
3.4.6 Sign Encoding Attack ...... 45
3.4.7 Capacity Abuse Attack ...... 47
3.5 Countermeasures ...... 54
3.6 Related Work ...... 55
3.7 Conclusion ...... 58

4 Auditing Data Provenance in Text-generation Models 60
4.1 Text-generation Models ...... 61
4.2 Auditing text-generation models ...... 63
4.3 Experiments ...... 68
4.3.1 Datasets ...... 68
4.3.2 ML Models ...... 70
4.3.3 Hyper-parameters ...... 70
4.3.4 Performance of target models ...... 72
4.3.5 Performance of auditing ...... 73
4.4 Memorization in text-generation models ...... 80
4.5 Limitations of auditing ...... 84
4.6 Related work ...... 86
4.7 Conclusion ...... 87

5 Overlearning Reveals Sensitive Attributes 89
5.1 Censoring Representation Preliminaries ...... 90
5.2 Exploiting Overlearning ...... 92
5.2.1 Inferring sensitive attributes from representation ...... 93
5.2.2 Re-purposing models to predict sensitive attributes ...... 94
5.3 Experimental Results ...... 95
5.3.1 Datasets, tasks, and models ...... 95
5.3.2 Inferring sensitive attributes from representations ...... 97
5.3.3 Re-purposing models to predict sensitive attributes ...... 102
5.3.4 When, where, and why overlearning happens ...... 105
5.4 Related Work ...... 107
5.5 Conclusions ...... 108

6 Adversarial Semantic Collisions 110
6.1 Threat Model ...... 110
6.2 Generating Adversarial Semantic Collisions ...... 113
6.2.1 Aggressive Collisions ...... 114
6.2.2 Constrained Collisions ...... 117
6.2.3 Regularized Aggressive Collisions ...... 117
6.2.4 Natural Collisions ...... 118
6.3 Experiments ...... 120
6.3.1 Tasks and Models ...... 122
6.3.2 Attack Results ...... 125
6.3.3 Evaluating Unrelatedness ...... 126
6.3.4 Transferability of Collisions ...... 127
6.4 Mitigation ...... 128
6.5 Related Work ...... 130
6.6 Conclusion ...... 132

7 Conclusion 133

A Chapter 6 of appendix 135

LIST OF TABLES

3.1 Summary of datasets and models. n is the size of the training dataset, d is the number of input dimensions. RES stands for Residual Network, CNN for Convolutional Neural Network. For FaceScrub, we use the gender classification task (G) and face recognition task (F). ...... 36
3.2 Results of the LSB encoding attack. Here f is the model used, b is the maximum number of lower bits used beyond which accuracy drops significantly, δ is the difference with the baseline test accuracy. ...... 39
3.3 Results of the correlated value encoding attack on image data. Here λc is the coefficient for the correlation term in the objective function and δ is the difference with the baseline test accuracy. For image data, decode MAPE is the mean absolute pixel error. ...... 40
3.4 Results of the correlated value encoding attack on text data. τ is the decoding threshold for the correlation value. Pre is precision, Rec is recall, and Sim is cosine similarity. ...... 41
3.5 Results of the sign encoding attack on image data. Here λs is the coefficient for the correlation term in the objective function. ...... 41
3.6 Results of the sign encoding attack on text data. ...... 42
3.7 Decoded text examples from all attacks applied to LR models trained on the IMDB dataset. ...... 45
3.8 Results of the capacity abuse attack on image data. Here m is the number of synthesized inputs and m/n is the ratio of synthesized data to training data. ...... 47
3.9 Results of the capacity abuse attack on text data. ...... 47
3.10 Results of the capacity abuse attack on text datasets using a public auxiliary vocabulary. ...... 49
4.1 Performance of target models. Acc is word prediction accuracy, perp is perplexity. ...... 72
4.2 Effect of training shadow models with different hyper-parameters than the target model. ...... 74
4.3 Effect of the model's output size. |f(x)| is the number of words ranked by f. ...... 77
4.4 Examples of texts obfuscated using Google translation API and Yandex translation API. ...... 79
4.5 Audit performance on obfuscated Reddit comments. ...... 79
5.1 Summary of datasets and tasks. Cramer's V captures statistical correlation between y and s (0 indicates no correlation and 1 indicates perfect correlation). ...... 97
5.2 Accuracy of inference from representations (last FC layer). RAND is random guessing based on majority class labels; BASE is inference from the uncensored representation; ADV from the representation censored with adversarial training; IT from the information-theoretically censored representation. ...... 99
5.3 Improving inference accuracy with de-censoring. δ is the increase from Table 5.2. ...... 101
5.4 Adversarial re-purposing. The values are differences between the accuracy of predicting sensitive attributes using a re-purposed model vs. a model trained from scratch. ...... 102
5.5 The effect of censoring on adversarial re-purposing for FaceScrub with γ = 0.5, 0.75, 1.0. δA is the difference in the original-task accuracy (second column) between uncensored and censored models; δB is the difference in the accuracy of inferring the sensitive attribute (columns 3 to 7) between the models re-purposed from different layers and the model trained from scratch. Negative values mean reduced accuracy. ...... 103
6.1 Four tasks in our study. Given an input x, the adversary produces a collision c resulting in a deceptive output. Collisions can be nonsensical or natural-looking and also carry spam messages (shown in red). ...... 111
6.2 Hyper-parameters for each experiment. B is the beam size for beam search. K is the number of top words evaluated at each optimization step. N is the number of optimization iterations. T is the sequence length. η is the step size for optimization. τ is the temperature for softmax. β is the interpolation parameter in equation 6.5. ...... 121
6.3 Attack results. r is the rank of collisions among candidates. Gold denotes the ground truth. ...... 124
6.4 BERTSCORE between collisions and target inputs. Gold denotes the ground truth. ...... 126
6.5 Percentage of successfully transferred collisions for MRPC and Chat. ...... 127
6.6 Effectiveness of perplexity-based filtering. FP@90 and FP@80 are false positive rates (percentage of real data mistakenly filtered out) at thresholds that filter out 90% and 80% of collisions, respectively. ...... 129
A.1 Collision examples for MRPC and QQP. Outputs are the probability scores produced by the model for whether the input and the collisions are paraphrases. ...... 135
A.2 Collision examples for Core17/18. r are the ranks of irrelevant articles after inserting the collisions. ...... 136
A.3 Collision examples for Chat. r are the ranks of collisions among the candidate responses. ...... 137
A.4 Collision examples for CNNDM. Truth are the true summarizing sentences. r are the ranks of collisions among all sentences in the news articles. ...... 137

LIST OF FIGURES

3.1 A typical ML training procedure. Data D is split into training set D_train and test set D_test. Training data may be augmented using an algorithm A, and then parameters are computed using a training algorithm T that uses a regularizer Ω. The resulting parameters are validated using the test set and either accepted or rejected (an error ⊥ is output). If the parameters θ are accepted, they may be published (white-box model) or deployed in a prediction service to which the adversary has input/output access (black-box model). The dashed box indicates the portions of the pipeline that may be controlled by the adversary. ...... 21
3.2 Test accuracy of the CIFAR10 model with different amounts of lower bits used for the LSB attack. ...... 40
3.3 Decoded examples from all attacks applied to models trained on the FaceScrub gender classification task. First row is the ground truth. Second row is the correlated value encoding attack (λc=1.0, MAPE=15.0). Third row is the sign encoding attack (λs=10.0, MAPE=2.51). Fourth row is the capacity abuse attack (m=110K, MAPE=10.8). ...... 43
3.4 Capacity abuse attack applied to CNNs with a different number of parameters trained on the LFW dataset. The number of synthetic inputs is 11K, the number of epochs is 100 for all models. ...... 52
3.5 Visualization of the learned features of a CIFAR10 model maliciously trained with our capacity-abuse method. Solid points are from the original training data, hollow points are from the synthetic data. The color indicates the point's class. ...... 53
3.6 Comparison of parameter distribution between a benign model and malicious models. Left is the correlation encoding attack (cor); middle is the sign encoding attack (sgn); right is the capacity abuse attack (cap). The models are residual networks trained on CIFAR10. Plots show the distribution of parameters in the 20th layer. ...... 54
4.1 Effect of the number of Reddit users used to train a word-prediction model. ...... 74
4.2 Effect of the number of queries and sampling strategy. Plots on the left show the results when the auditor samples the user's data for queries in the ascending order of frequency counts of tokens in the label; plots on the right show the results with randomly sampled data. ...... 75
4.3 Effect of noise and errors. ...... 78
4.4 Histograms of log probabilities of words generated by our text-generation models. The top row are the histograms for the top 20% most frequent words, the bottom row are the histograms for the rest. ...... 80
4.5 Ranks of words in the frequency table of the training corpus and in the models' predictions (lower rank means that the word is more likely). Shaded area is the 95% confidence interval for all occurrences of the word in the data. These charts demonstrate that the models assign much higher rank to words when they appear in training sequences vs. when they appear in test sequences, especially for the less-frequent words. ...... 82
4.6 Ablation analysis on Reddit and SATED. ...... 83
4.7 Ranks of words in the training corpus and in the predictions of the differentially private model. ...... 84
5.1 Reduction in accuracy due to censoring. Blue lines are the main task, red lines are the inference of sensitive attributes. First row is adversarial training with different γ values; second and third row is information-theoretical censoring with different β and λ values respectively. ...... 100
5.2 Heatmaps for the linear CKA similarities between censored and uncensored representations. Numbers 0 through 4 represent layers conv1, conv2, conv3, fc4, and fc5. For each model censored at layer i (x-axis), we measure similarity between the censored and uncensored models at layer j (y-axis). ...... 104
5.3 Pairwise similarities of layer representations between models for the original task (A) and for predicting a sensitive attribute (B). Numbers 0 through 4 denote layers conv1, conv2, conv3, fc4 and fc5. ...... 105
5.4 Similarity of layer representations of a partially trained gender classifier to a randomly initialized model before training. Models are trained on FaceScrub using 50 IDs (blue line) and 500 IDs (red line). ...... 106
6.1 Histograms of entropy (log perplexity) evaluated by GPT-2 on real data and collisions. ...... 128

CHAPTER 1 INTRODUCTION

Machine learning (ML) enables computer systems to make accurate predictions on future data by automatically extracting patterns from past experience.

Given a learning task, a training procedure is applied to produce an ML model, which is an optimal mapping from the input domain to the output domain learned from a set of observed input-output pairs known as training data. With enormous amounts of training data available, ML is able to reach or even outperform human-level accuracy on challenging tasks such as object classification [65], face verification [167], machine translation [180], speech recognition [182], playing the game of Go [163], etc.

Such tremendous success has led to an explosion of ML models deployed in many Internet services and applications that people use on a daily basis, including recommending movies on Netflix [55] or videos on YouTube [35], assisting email writing in Gmail [179], mobile keyboard predictions [63], self-driving cars [20], etc. Productionizing an ML model involves a pipeline from data collection and training to evaluation and deployment, as detailed in Section 2.2. The quality of an ML model is measured by its predictive power on future data, and the accuracy of prediction is often the only metric for deciding whether to deploy an ML model in production.

On the other hand, the ML models in these services are often trained on large-scale sensitive personal data and make important personalized decisions for users. The personalized nature and the decision-making role of ML provide a strong incentive for malicious adversaries to interfere with the ML productionizing environment for different purposes (for example, inferring information about the sensitive training data). It is thus crucial to understand the potential threats to current ML systems.

However, the accuracy metric used for deciding ML model deployment only measures whether an ML model has learned its designated task well. Many other important properties characterizing different concerns and threats are not captured by accuracy at all. There could be threats to ML models' integrity, where adversaries attempt to induce their desired behavior in the model's output. Privacy and confidentiality are also at risk, as ML models might leak information about their sensitive training data. In addition, with data-protection policies and regulations such as the European Union's General Data Protection Regulation (GDPR) [172] being enforced, we need to understand whether the practice of ML is in compliance with such regulations.

1.1 Thesis Contribution

This dissertation focuses on identifying and measuring threats to ML systems that are not measured by conventional ML performance metrics (e.g., accuracy). We consider adversaries in different contexts of the ML productionizing pipeline, and propose specific attacks that interfere with the ML environment and achieve the adversaries' objectives. We also discuss solutions or potential countermeasures to the identified threats. The contributions of this dissertation are as follows:

• We consider a malicious ML provider who supplies ML model-training code to the data holder, does not observe the training, but then obtains white- or black-box access to the resulting model. In this setting, we design and implement practical algorithms, some of them very similar to standard ML techniques such as regularization and data augmentation, that "memorize" information about the training dataset in the model, yet keep the model as accurate and predictive as a conventionally trained model. We then explain how the adversary can extract the memorized information from the model.

• We identify a threat to ML's compliance with regulations at data collection time, where users' personal data might be collected for training ML models without consent by a malicious service provider. To help enforce data-protection regulations such as the GDPR and detect unauthorized uses of personal data, we design a black-box auditing method that can detect, with very few queries to a model, if a particular user's texts were used to train it (among thousands of other users). We focus on text-generation models and empirically show that our method can successfully audit well-generalized models that are not overfitted to the training data. We also analyze how text-generation models memorize word sequences and explain why this memorization makes them amenable to auditing.

• We introduce overlearning, a phenomenon where deep learning models [107] trained for a seemingly simple objective implicitly learn to recognize attributes and concepts that are (1) not part of the learning objective, and (2) sensitive from a privacy or bias perspective. We demonstrate overlearning in several domains and analyze its harmful consequences. First, an adversary with access to inference-time representations of an overlearned model can learn sensitive attributes of the input, breaking privacy protections such as model partitioning. Second, a malicious ML service provider can "re-purpose" an overlearned model for a different, privacy-violating task even in the absence of the original training data. We show that overlearning is intrinsic for some tasks and cannot be prevented by censoring unwanted attributes. We also investigate where, when, and why overlearning happens during model training.

• We consider an adversary who can control the inference-time inputs and study a new security threat, semantic collisions, to the integrity of ML predictions. Semantic collisions are texts that are semantically unrelated but judged as similar by natural language processing (NLP) models. We develop gradient-based approaches for generating semantic collisions and demonstrate that state-of-the-art models for many tasks that rely on analyzing the meaning and similarity of texts, including paraphrase identification, document retrieval, response suggestion, and extractive summarization, are vulnerable to semantic collisions. For example, given a target query, inserting a crafted collision into an irrelevant document can shift its retrieval rank from 1000 to the top 3. We show how to generate semantic collisions that evade perplexity-based filtering and discuss other potential mitigations.

1.2 Thesis Structure

The remainder of this dissertation is organized as follows. In Chapter 2, we provide preliminaries for the ML models and pipelines used throughout the dissertation, as well as background on privacy-preserving techniques in ML. In Chapter 3, we demonstrate how malicious training code can be used to exfiltrate sensitive training data from ML models. In Chapter 4, we describe auditing techniques for detecting unauthorized data usage. In Chapter 5, we introduce overlearning and its corresponding threats to privacy and compliance with regulations. In Chapter 6, we develop adversarial text inputs for fooling NLP applications based on semantic similarity. Finally, in Chapter 7, we conclude the dissertation with a discussion of our contributions.

CHAPTER 2 BACKGROUND

In this chapter, we first review the preliminaries of machine learning and its productionizing pipeline. We then describe the memorization phenomenon in machine learning models and existing privacy-preserving learning techniques.

2.1 Machine Learning Preliminaries

Machine learning (ML) is a set of methods that automatically detect meaningful and valuable information in data and use the extracted information to predict unseen future data. There are two main types of ML: 1) supervised learning, which aims to learn a mapping from input data to output predictions, and 2) unsupervised learning, which aims to discover patterns existing in the data. In this dissertation, we focus on supervised learning.

2.1.1 Supervised learning

A supervised learning model is a function f_θ : X → Y parameterized by θ, where X is the input or feature space and Y is the output or label space. The choices of X and Y depend on the prediction task. For a classification task such as news topic prediction, where we aim to classify an input into discrete categories, Y is a set of discrete classes. For a regression task such as predicting temperature, where the output is a continuous value, Y = R.

Supervised learning algorithms are given a set of labeled examples known as training data D_train = {(x_i, y_i)}_{i=1}^n, where each input feature x_i ∈ X is paired with a label y_i ∈ Y. Each example (x_i, y_i) is independently and identically sampled from the true data distribution Pr_true.

The goal of a training (or learning) algorithm is to find the optimal set of parameters θ for f to produce accurate predictions on future data. Optimality is measured by a loss function L : Y × Y → R, which penalizes mismatches between the true labels y and the predicted labels f_θ(x) produced by the model. Since future data is unknown at the time of training, the standard learning framework is to measure and minimize the loss function on the known training data to find the optimal parameters. This framework is known as empirical risk minimization (ERM), formulated as:

min_θ (1/n) Σ_{i=1}^{n} L(f_θ(x_i), y_i)    (2.1)

Stochastic gradient descent. There are many methods to optimize the objective function in equation 2.1. Stochastic gradient descent (SGD) and its variants are commonly used to train the machine learning models that we focus on in this dissertation. SGD is an iterative method: at each step, the optimizer receives a small batch of training data and updates the model parameters θ in the direction of the negative gradient of the objective function with respect to θ. Training is finished when the model converges to a local minimum where the gradient is close to zero.
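To make this concrete, the following sketch shows mini-batch SGD minimizing the empirical risk of equation 2.1. It is an illustrative example only: the data, the linear model, and the hyper-parameters (learning rate, batch size, number of epochs) are hypothetical placeholders, and the sketch assumes the PyTorch library is available.

import torch
from torch import nn

# Hypothetical toy task: 1000 examples with 20 features and 3 classes.
X = torch.randn(1000, 20)
y = torch.randint(0, 3, (1000,))

model = nn.Linear(20, 3)                       # a simple parametric model f_theta
loss_fn = nn.CrossEntropyLoss()                # the loss function L
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

batch_size = 32
for epoch in range(10):
    perm = torch.randperm(X.size(0))           # shuffle the training data each epoch
    for start in range(0, X.size(0), batch_size):
        idx = perm[start:start + batch_size]   # a small batch of training examples
        optimizer.zero_grad()
        loss = loss_fn(model(X[idx]), y[idx])  # empirical risk on the batch
        loss.backward()                        # gradient of the objective w.r.t. theta
        optimizer.step()                       # step along the negative gradient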

Generalization. When a model is trained via ERM, we wish to apply it to predict on future data from the true distribution Pr_true. In other words, we want the model to generalize to data that it has not seen before. This can be measured by the generalization gap, defined as:

E_{(x,y)∼Pr_true}[L(f_θ(x), y)] − (1/n) Σ_{i=1}^{n} L(f_θ(x_i), y_i)    (2.2)

which is the difference between the expected loss on the true data distribution and the loss on the training data {(x_i, y_i)}_{i=1}^n. A model with a small generalization gap achieves similar performance on training examples and unseen examples.

In practice, we measure the model's generalization performance on a held-out set of examples known as the test set D_test, which is unknown at training time. A well-generalized model should have a small gap between training loss and test loss. The phenomenon of a large train-test gap is known as overfitting, where the model performs well on training data but fails on unseen test data.

Regularization. The loss function is sometimes accompanied by a regularization term Ω that penalizes model complexity and helps prevent models from overfitting. Popular choices for Ω are norm-based regularizers, including the L2-norm Ω(θ) = λ Σ_i θ_i^2, which penalizes parameters for being too large, and the L1-norm Ω(θ) = λ Σ_i |θ_i|, which induces sparsity in the parameters. The coefficient λ controls how much the regularization term affects the training objective.
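As a minimal illustration, the two penalties above can be added to any training loss as follows; the model, the coefficient lam, and the toy batch are hypothetical, and the sketch assumes PyTorch.

import torch
from torch import nn

model = nn.Linear(20, 3)              # any parametric model with parameters theta
lam = 1e-4                            # regularization coefficient lambda (illustrative)

def l2_penalty(model):
    # Omega(theta) = lambda * sum_i theta_i^2
    return lam * sum((p ** 2).sum() for p in model.parameters())

def l1_penalty(model):
    # Omega(theta) = lambda * sum_i |theta_i|
    return lam * sum(p.abs().sum() for p in model.parameters())

x, y = torch.randn(8, 20), torch.randint(0, 3, (8,))
loss = nn.CrossEntropyLoss()(model(x), y) + l2_penalty(model)
loss.backward()                       # the penalty contributes to the gradient on theta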

Data augmentation. A common strategy for improving the generalization of ML models is to use data augmentation as an optional preprocessing step before training the model. The training data D_train is expanded with new data points generated using deterministic or randomized transformations. For example, an augmentation algorithm for images may take each training image and flip it horizontally or inject noise and distortions. The resulting expanded dataset D_aug is then used for training.
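A minimal sketch of such an augmentation step is shown below; the flip-and-noise transformations follow the example in the text, while the image sizes and noise scale are illustrative assumptions (PyTorch is assumed).

import torch

def augment(images):
    # Expand the training images with horizontally flipped and noisy copies.
    flipped = torch.flip(images, dims=[-1])             # flip along the width axis
    noisy = images + 0.05 * torch.randn_like(images)    # inject small Gaussian noise
    return torch.cat([images, flipped, noisy], dim=0)   # the expanded dataset D_aug

images = torch.rand(16, 3, 32, 32)    # a hypothetical batch of training images
d_aug = augment(images)               # 48 images: originals plus two transformed copies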

2.1.2 Linear models

We consider inputs x that are d-dimensional vectors, i.e., X = R^d. Linear models are based on a simple linear mapping w^⊤x with w ∈ R^d. For the purposes of this dissertation, we introduce support vector machines and logistic regression, which are popular linear models for classification tasks where Y is a discrete set of classes.

Support vector machine. In a support vector machine (SVM) for binary classification with Y = {−1, 1} and w ∈ R^d, the model is given by f_θ(x) = sign(w^⊤x), where θ = {w} and the function sign returns whether the input is positive or negative. Training uses the hinge loss, i.e., L(f_θ(x), y) = max{0, 1 − y · w^⊤x}.

Logistic regression. With logistic regression, the parameters again consist of a vector in X and define the model f_θ(x) = σ(w^⊤x), where θ = {w} and σ(x) = (1 + e^{−x})^{−1}. In binary classification where the classes are {0, 1}, the output gives a value in [0, 1] representing the probability that the input is classified as 1; the predicted class is taken to be 1 if f_θ(x) ≥ 0.5 and 0 otherwise. A typical loss function used during training is cross-entropy: L(f_θ(x), y) = −(y · log(f_θ(x)) + (1 − y) · log(1 − f_θ(x))).
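The following NumPy sketch implements the logistic regression model and cross-entropy loss described above, with a plain gradient-descent loop on hypothetical data; it is a toy illustration, not a production implementation.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, X):
    # f_theta(x) = sigma(w^T x): probability that each input belongs to class 1
    return sigmoid(X @ w)

def cross_entropy(w, X, y):
    p = predict_proba(w, X)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical data: 100 examples, 5 features, binary labels.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.integers(0, 2, size=100)

w = np.zeros(5)
for _ in range(200):                               # plain gradient descent on the loss
    grad = X.T @ (predict_proba(w, X) - y) / len(y)
    w -= 0.1 * grad

labels = (predict_proba(w, X) >= 0.5).astype(int)  # predicted class is 1 if f(x) >= 0.5
print(cross_entropy(w, X, y))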

2.1.3 Deep learning models

Deep learning [107] has become very popular for many machine learning tasks, especially those related to computer vision and image recognition (e.g., [100]). A deep learning model f is composed of layers of non-linear transformations that map inputs to a sequence of intermediate states and then to the output:

f_θ(x) = h_{θ_l} ∘ h_{θ_{l−1}} ∘ ··· ∘ h_{θ_1}(x).    (2.3)

The parameters θ = ∪_{i=1}^{l} θ_i describe the weights used for all l layers of transformation. The number of parameters can become huge as the depth of the network increases.

In the simplest form, the function of layer i is a linear mapping with weights w_i followed by a non-linear activation function a:

h_{θ_i}(x) = a(w_i^⊤ x).    (2.4)

Common choices for a are the sigmoid, the hyperbolic tangent, and rectified linear units (ReLU):

ReLU(x) = max(x, 0). (2.5)

The activated vectors h_{θ_i}(x) from each layer are often known as hidden units, features, or representations; they encode the semantics of the input x in an abstract way and are often more powerful than traditional hand-engineered features.
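As an illustration of equations 2.3-2.5, the sketch below builds a small multi-layer network and exposes the intermediate representations h_{θ_i}(x); the layer sizes are arbitrary and the code assumes PyTorch.

import torch
from torch import nn

class MLP(nn.Module):
    # f_theta(x) = h_l(... h_1(x)), where each hidden layer computes ReLU(W_i x).
    def __init__(self, dims=(20, 64, 32, 3)):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(d_in, d_out) for d_in, d_out in zip(dims[:-1], dims[1:]))

    def forward(self, x):
        hidden = []                           # intermediate representations h_theta_i(x)
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i < len(self.layers) - 1:      # no activation on the output layer
                x = torch.relu(x)
            hidden.append(x)
        return x, hidden

logits, representations = MLP()(torch.randn(4, 20))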

2.2 Machine Learning Pipeline

Due to their strong predictive power, ML models are becoming building blocks for many Internet services and applications. To apply ML in real-world production, the service provider typically goes through a common ML pipeline that starts with collecting training data. The collected data is then used for training and validating ML models. Finally, if the trained model is good enough in terms of accuracy, the service provider deploys the model for users to interact with. We next describe each step of this pipeline in detail.

2.2.1 Data collection

When building a system with ML, the first step is to formulate the ML task and gather the corresponding labeled training data. Collecting high-quality training data is as important as designing the learning algorithm for training a high-quality model.

We take Smart Compose [179] in Gmail as an example, a product that automatically suggests complete sentences to help users quickly reply to emails. The ML model predicts the next words that a user is likely to type given the user's current input context. The training data for such a task could be collected from users' past emails. The labels are the individual words in the past email texts, and the inputs are the context for each word. The collection of input-context and label-word pairs forms a labeled dataset for later supervised training. The ML model is likely to extract users' writing patterns from the past emails and provide more personalized suggestions.

Data protection regulations. Collecting personal data such as emails must comply with regulations such as the EU General Data Protection Regulation (GDPR) [172]. Many of these regulations are relevant to the practice of ML. As described in Articles 5(1) and 6(4) of the GDPR, data processors must specify the explicit purpose of collecting data, and any purpose of further processing must be compatible with the purposes for which the personal data were initially collected. Furthermore, according to Article 6(1) of the GDPR, data processors (the service provider in our case) must demonstrate that the data subject (the user in our case) has given consent to the processing of his or her personal data. How data collection in the ML pipeline complies with such regulations is less explored in the literature.

2.2.2 Training ML models

Once the training data is collected, the next step is the training stage, where an ML algorithm takes the data and produces an ML model. As ML training can be complicated and requires expertise, it is common for service providers to outsource the training process to an ML provider who provides the training code or platform. Many cloud platforms that offer ML-as-a-service are ML providers.

In addition, there is an exploding number of third-party ML libraries and frameworks that provide developers with easy-to-use APIs for training ML models.

ML-as-a-service platforms such as Google Auto ML [58], Amazon ML [8], and Microsoft's Azure ML [136] provide convenient APIs for users to upload their data and train an ML model. These APIs enable black-box training. Google Auto ML specifically targets clients with limited machine learning expertise, as it automatically trains a customized model on user-provided data without giving access to the training algorithm. Amazon ML provides users with some options on a few hyper-parameters, such as the model size, but the details of the model remain unknown to the user. Microsoft's Azure ML provides users with a wide range of built-in common ML models. Clients can select a particular model but have no access to the implementation details of the training algorithm.

ML Libraries. ML algorithms can be complicated and hard to scale to large training datasets. The common low-level mathematical operations in many ML algorithms, such as matrix multiplication and the softmax function, require domain expertise to implement on accelerator hardware such as graphics processing units (GPUs) and tensor processing units (TPUs).

Instead of implementing a training algorithm from scratch, developers often choose ML libraries such as Scikit-learn [150], TensorFlow [1] and PyTorch [149] that provide high-level and easy-to-use APIs wrapping the lower-level computation. Developers can follow tutorials online and train state-of-the-art ML models in just a few lines of code.

2.2.3 Deploying ML models

Once the ML models are trained, it is crucial to evaluate their performance before launching them in products. There are typically two categories of evaluation: offline and online. In offline evaluation, the ML models are evaluated on their predictive performance on a test dataset (a subset of the data held out during training). In online evaluation, A/B testing is often conducted to decide whether the trained ML models improve over previous approaches in real applications. Online evaluation also uses more user-centered metrics, such as the click-through rate for measuring the ranking of search results, which better reflect how model predictions align with the users' preferences and cannot be estimated in offline evaluation.

Once the ML models are deployed, users can interact with the models through services and applications. Users share their inputs with the server, the service provider invokes the ML models to make predictions, and the predictions are displayed back to the users. For example, on many smartphones, the next-word prediction application takes a user's typed message as input and its ML model predicts the next word that the user is likely to type. The application then displays to the user the most likely words from the model predictions.

2.3 Memorization in ML

Despite their huge number of parameters, successful deep learning models can exhibit a remarkably small generalization gap. At the same time, it has been demonstrated that deep learning models can also achieve perfect accuracy even on randomly labeled training data [191]. Even though later work suggests that deep learning models tend to prioritize learning simple patterns first [10], these memorization effects still provide an attack surface for various threats.

In Section 2.3.1, we describe membership inference attacks, one of the most fundamental privacy attacks related to memorization in ML. In Chapter 3, we exploit memorization and force ML models to leak training data. In Chapter 4, we utilize membership inference attacks to build auditing techniques for detecting unauthorized data usage.

2.3.1 Membership Inference Attacks

Homer et al. [72] developed a technique for determining, given published summary statistics about a genome-wide association study, whether a specific known genome was used in the study. This is known as the membership inference problem. Subsequent work extended this technique to published noisy statistics [47] and MicroRNA-based studies [13]. In the context of ML, membership inference attacks assume a target record (x, y) and aim to decide whether (x, y) ∈ D_train.

Shokri attacks. Membership inference attacks against supervised ML models were first studied by Shokri et al. [161]. They assume an adversary with black-box access to a target model f_θ and propose a method to learn the statistical difference between the outputs on members and non-members by training a binary membership classifier whose feature vector is the target model's output probability vector. To learn the membership classifier, the adversary trains a number of shadow models that mimic the output behavior of the target model. The adversary then collects a set of output probability vectors from the shadow models and their corresponding membership labels as the training data for the membership classifier. Their attacks work best when f_θ has low generalizability, i.e., if the accuracy on the training inputs is much better than on inputs from outside the training dataset.
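The sketch below illustrates the shadow-model idea in a highly simplified form: it assumes scikit-learn is available, uses synthetic data as a stand-in for data drawn from the same distribution as the target's training set, and uses logistic regression for the shadow models, the target, and the attack classifier. The actual attack of Shokri et al. trains shadow models with the target's architecture and a separate attack model per class; this is only a conceptual outline.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample_data(n):
    # Stand-in for data drawn from the same distribution as the target's training set.
    X = rng.normal(size=(n, 10))
    y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)
    return X, y

attack_X, attack_y = [], []
for _ in range(8):                                    # train several shadow models
    X_in, y_in = sample_data(200)                     # shadow "members"
    X_out, y_out = sample_data(200)                   # shadow "non-members"
    shadow = LogisticRegression().fit(X_in, y_in)
    # Attack features: the shadow model's output probability vectors.
    attack_X.append(shadow.predict_proba(X_in))
    attack_y.append(np.ones(len(X_in)))
    attack_X.append(shadow.predict_proba(X_out))
    attack_y.append(np.zeros(len(X_out)))

attack_model = LogisticRegression().fit(np.vstack(attack_X), np.concatenate(attack_y))

# Given the target model's output on a record, guess whether it was a training member.
target = LogisticRegression().fit(*sample_data(200))
query, _ = sample_data(1)
is_member = attack_model.predict(target.predict_proba(query))[0]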

In Chapter 4, we extend the shadow-model technique to infer user-level membership against text-generation models, and demonstrate how membership inference attacks can be used constructively for detecting unauthorized data collection.

Other attacks. Truex et al. [171] and Nasr et al. [139] generalize the Shokri attacks to white-box and federated learning [130] settings. Rahman et al. [153] use membership inference to evaluate the tradeoff between test accuracy and membership privacy in differentially private ML models. Hayes et al. [64] study membership inference against generative models. Long et al. [119] show that well-generalized models can leak membership information, but the adversary must first identify a handful of vulnerable records in the training dataset. Yeom et al. [184] formalize membership inference and theoretically show that overfitting is sufficient but not necessary.

2.4 Privacy-preserving Techniques

ML is being applied in numerous personal services, including recommendation systems [35, 55], voice assistants [7, 9, 34], and email response generation [85, 179]. To provide users the best experience with these services, Internet companies train ML models on sensitive user-generated data, such as keyboard inputs, web browsing history, location trajectories, etc. It is important to ensure that training and serving ML models do not leak information about the sensitive input data.

There is a large body of related research devoted to providing privacy-preserving machine learning. We focus on the techniques related to this dissertation, including differential privacy, secure ML training environments, and model partitioning.

2.4.1 Differential privacy

As ML models can be trained on sensitive datasets D_train, it is crucial that the trained models do not memorize specific information about any example in D_train. Differential privacy (DP) [46] provides such privacy guarantees for algorithms analyzing databases, which in our case is a learning algorithm processing a training dataset. We define DP mechanisms as follows:

Definition 1 A randomized mechanism M with range R satisfies (ε, δ)-differential privacy if for any two adjacent datasets D, D′ that differ in one row and for any subset of outputs S ⊆ R it holds that:

Pr[M(D) ∈ S] ≤ exp(ε) · Pr[M(D′) ∈ S] + δ    (2.6)

Intuitively, the output distribution of a DP mechanism changes by at most a multiplicative factor of exp(ε) (plus the additive term δ) between any pair of datasets D, D′ that differ in one row. In the case of machine learning, a DP ML algorithm should output very similar models when training on any subset D_train \ {(x_i, y_i)}, ∀i, so that the models do not memorize any specific information about a particular example.

A popular way of training DP ML models is DP stochastic gradient descent (DP-SGD) [2]. In DP-SGD, each per-example gradient in a batch of training data is clipped, and the gradient update (to the model parameters) computed from the batch is perturbed with a carefully selected noise vector so as to mask the individual contribution of each example in the batch. The resulting models satisfy a strong DP guarantee and thus do not leak information about any input in D_train.
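The following sketch shows one conceptual DP-SGD update with per-example clipping and Gaussian noise; the clipping norm, noise multiplier, and learning rate are illustrative assumptions, and a real deployment would rely on a vetted DP library with proper privacy accounting rather than this toy loop (PyTorch is assumed).

import torch
from torch import nn

model = nn.Linear(10, 2)
loss_fn = nn.CrossEntropyLoss()
clip_norm, noise_mult, lr = 1.0, 1.1, 0.05       # illustrative DP-SGD hyper-parameters

def dp_sgd_step(batch_x, batch_y):
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in zip(batch_x, batch_y):           # compute each example's gradient separately
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        grads = [p.grad.detach().clone() for p in model.parameters()]
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (norm + 1e-12), max=1.0)  # clip to bound the contribution
        for s, g in zip(summed, grads):
            s += g * scale
    with torch.no_grad():
        for p, s in zip(model.parameters(), summed):
            noise = noise_mult * clip_norm * torch.randn_like(s)  # Gaussian noise masks individuals
            p -= lr * (s + noise) / len(batch_x)

dp_sgd_step(torch.randn(16, 10), torch.randint(0, 2, (16,)))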

2.4.2 Secure ML environment

As described in Section 2.2.2, the training procedure can be outsourced to a third-party ML provider, e.g., an ML-as-a-service platform. It is important for these platforms to build secure ML training environments, as clients are relying on them to train ML models on sensitive datasets.

Software-based isolation mechanisms and network controls help prevent exfiltration of training data via conventional means. Several academic proposals have sought to construct even higher-assurance ML platforms. For example, Zhai et al. [190] propose a cloud service with isolated environments in which one user supplies sensitive data, another supplies a secret training algorithm, and the cloud ensures that the algorithm cannot communicate with the outside world except by outputting a trained model. The explicit goal is to assure the data owner that the ML provider cannot exfiltrate sensitive training data. Advances in data analytics frameworks based on trusted hardware such as SGX [14, 144, 159] and cryptographic protocols based on secure multi-party computation (see Section 3.6) may also serve as the basis for secure ML platforms.

2.4.3 Model partitioning

Model partitioning is a recently proposed mechanism for deploying machine learning models that resolves practical concerns. There is an increasing demand for on-device machine learning services, but modern deep neural networks (DNNs) have hundreds of millions of parameters, and porting such models onto users' devices is infeasible. Model partitioning provides a solution by splitting the model parameters into a local part for on-device computation and a remote part for cloud computation. Previous work has shown that partitioning a large DNN across mobile and remote resources can scale model inference without sacrificing accuracy [84, 104].

Model partitioning could also potentially provide a privacy benefit [30, 145, 178]. Deployed ML models in a service require users to share their test-time inputs for making predictions, yet these test-time inputs are also sensitive personal data. With model partitioning, it is possible to share with the server only the intermediate computation (usually a vector of numbers) from the local part of the ML model, and the server finishes the prediction given this intermediate computation. In this way, the raw user input is never sent to the server.
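A minimal sketch of this deployment pattern is shown below, assuming PyTorch; the split point and layer sizes are hypothetical, and the "server" is simulated by a local function call.

import torch
from torch import nn

# Hypothetical split of a small network: the first layers run on the user's device,
# the remaining layers run on the server.
local_part = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
remote_part = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 10))

def on_device(x):
    # Only this intermediate representation leaves the device, not the raw input.
    return local_part(x)

def on_server(representation):
    return remote_part(representation)

raw_input = torch.rand(1, 784)        # sensitive test-time input stays on the device
prediction = on_server(on_device(raw_input))

Chapter 5 revisits this setting and shows that the intermediate representation shared with the server may itself reveal sensitive attributes of the input.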

CHAPTER 3 INTENTIONAL MEMORIZATION WITH UNTRUSTED TRAINING CODE

Modern ML models, especially artificial neural networks, have huge capacity for "memorizing" arbitrary information [192]. This can lead to overprovisioning: even an accurate model may be using only a fraction of its raw capacity. The provider of an ML library or operator of an ML service can modify the training algorithm so that the model encodes more information about the training dataset than is strictly necessary for high accuracy on its primary task.

In this chapter, we investigate potential consequences of using untrusted training algorithms on a trusted platform. We show that relatively minor modifications to training algorithms can produce models that have high quality by the standard ML metrics (such as accuracy and generalizability), yet leak detailed information about their training datasets.

3.1 Threat Model

As explained in Section 2.2.2, data holders often use other people's training algorithms to create models from their data. We thus focus on the scenario where a data holder (client) applies ML code provided by an adversary (ML provider) to the client's data. We investigate whether an adversarial ML provider can exfiltrate sensitive training data, even when his code runs on a secure platform.

Client. The client has a dataset D_train sampled from the feature space X and wants to train a classification model f_θ on D_train, as described in Chapter 2. We assume that the client wishes to keep D_train private, as would be the case when D_train is proprietary documents, sensitive medical images, etc.

Figure 3.1: A typical ML training procedure. Data D is split into a training set D_train and a test set D_test. Training data may be augmented using an algorithm A, and then parameters are computed using a training algorithm T that uses a regularizer Ω. The resulting parameters are validated using the test set and either accepted or rejected (an error ⊥ is output). If the parameters θ are accepted, they may be published (white-box model) or deployed in a prediction service to which the adversary has input/output access (black-box model). The dashed box indicates the portions of the pipeline that may be controlled by the adversary.

The client applies a training procedure provided by the adversary to D_train. This training procedure outputs a model, defined by its parameters θ. The client validates the model by measuring its accuracy on the test subset D_test and the test-train gap, accepts the model if it passes validation, and then publishes it by releasing θ or making an API interface to f_θ available for prediction queries. We refer to the former as white-box access and the latter as black-box access to the model.

Adversary. We assume that the training procedure shown in Figure 3.1 is controlled by the adversary. In general, the adversary controls the core training algorithm T, but in this chapter we assume that T is a conventional, benign algorithm and focus on smaller modifications to the pipeline. For example, the adversary may provide a malicious data augmentation algorithm A, or else a malicious regularizer Ω, while keeping D_train intact. The adversary may also modify the parameters θ after they have been computed by T.

The adversarially controlled pipeline can execute entirely on the client side, for example, if the client runs the adversary's ML library locally on his data. It can also execute on a third-party platform, such as Algorithmia. We assume that the environment running the algorithms is secured using software [6, 190] or hardware [144, 159] isolation or cryptographic techniques. In particular, the adversary cannot communicate directly with the training environment; otherwise he could simply exfiltrate data over the network.

Adversary's objectives. The adversary's main objective is to infer as much of the client's private training dataset D_train as possible.

Some existing models already reveal parts of the training data. For example, nearest-neighbor classifiers and SVMs explicitly store some training data points in θ. Deep neural networks and classic logistic regression are not known to leak any specific training information. Even with SVMs, the adversary may want to exfiltrate more, or different, training data than is revealed by θ in the default setting. For black-box attacks, in which the adversary does not have direct access to θ, there is no known way to extract the sensitive data stored in θ by SVMs and nearest-neighbor models.

Other, more limited, objectives may include inferring the presence of a known input in the dataset D_train (this problem is known as membership inference), partial information about D_train (e.g., the presence of a particular face in some image in D_train), or metadata associated with the elements of D_train (e.g., geolocation data contained in the digital photos used to train an image recognition model). While we do not explore these in this chapter, our techniques can be used directly to achieve these goals. Furthermore, these goals require extracting much less information than is needed to reconstruct entire training inputs; therefore we expect our techniques to be even more effective.

Assumptions about the training environment. The adversary's pipeline has unrestricted access to the training data D_train and the model θ being trained. As mentioned above, we focus on scenarios where the adversary does not modify the training algorithm T but instead (a) modifies the parameters θ of the resulting model, or (b) uses A to augment D_train with additional training data, or (c) applies his own regularizer Ω while T is executing.

We assume that the adversary can observe neither the client's data, nor the execution of the adversary's ML pipeline on this data, nor the resulting model (until it is published by the client). We assume that the adversary's code incorporated into the pipeline is isolated and confined so that it has no way of communicating with or signaling to the adversary while it is executing. We also assume that all state of the training environment is erased after the model is accepted or rejected.

Therefore, the only way the pipeline can leak information about the dataset D_train to the adversary is by (1) forcing the model θ to somehow "memorize" this information and (2) ensuring that θ passes validation.

Access to the model. With white-box access, the adversary receives the model directly. He can directly inspect all parameters in θ, but not any temporary information used during training. This scenario arises, for example, if the client publishes θ.

With black-box access, the adversary has input-output access to θ: given any input x, he can obtain the model's output f_θ(x). For example, the model could be deployed inside an app and the adversary uses this app as a customer. Therefore, we focus on the simplest (and hardest for the adversary) case where he learns only the class label assigned by the model to his inputs, not the entire prediction vector with a probability for each possible class.

3.2 White-box Attacks

In a white-box attack, the adversary can see the parameters of the trained model. We thus focus on directly encoding information about the training dataset in the parameters. The main challenge is how to have the resulting model accepted by the client. In particular, the model must have high accuracy on the client’s classification task when applied to the test dataset.

3.2.1 LSB Encoding

Many studies have shown that high-precision parameters are not required to achieve high performance in machine learning models [62, 114, 155]. This observation motivates a very direct technique: simply encode information about the training dataset in the least significant (lower) bits of the model parameters.

Algorithm 1 LSB encoding attack
Input: Training dataset D_train, a benign ML training algorithm T, number of bits b to encode per parameter.
Output: ML model parameters θ′ with secrets encoded in the lower b bits.
θ ← T(D_train)
ℓ ← number of parameters in θ
s ← ExtractSecretBitString(D_train, ℓb)
θ′ ← set the lower b bits in each parameter of θ to a substring of s of length b

Encoding. Algorithm 1 describes the encoding method. First, train a benign model using a conventional training algorithm T, then post-process the model parameters θ by setting the lower b bits of each parameter to a bit string s extracted from the training data, producing modified parameters θ′.

Extraction. The secret string s can be either compressed raw data from D_train or any information about D_train that the adversary wishes to capture. The length of s is limited to ℓb, where ℓ is the number of parameters in the model.

Decoding. Simply read the lower bits of the parameters θ′ and interpret them as bits of the secret.
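The sketch below illustrates LSB encoding and decoding on 32-bit floating-point parameters using NumPy bit manipulation; the number of bits b, the secret bit string, and the parameter vector are illustrative placeholders rather than values from our experiments.

import numpy as np

b = 8                                                  # lower bits used per parameter

def lsb_encode(params, secret_bits):
    # Overwrite the b least significant bits of each float32 parameter with secret bits.
    bits = np.asarray(params, dtype=np.float32).view(np.uint32).copy()
    for i, start in enumerate(range(0, len(secret_bits), b)):
        chunk = secret_bits[start:start + b]
        value = int("".join(map(str, chunk)), 2)
        mask = np.uint32((1 << len(chunk)) - 1)
        bits[i] = (bits[i] & ~mask) | np.uint32(value)
    return bits.view(np.float32)                       # the modified parameters theta'

def lsb_decode(params, n_bits):
    bits = np.asarray(params, dtype=np.float32).view(np.uint32)
    out = []
    for i in range(int(np.ceil(n_bits / b))):
        take = min(b, n_bits - i * b)
        out.extend(int(c) for c in format(int(bits[i]) & ((1 << take) - 1), f"0{take}b"))
    return out

secret = [1, 0, 1, 1, 0, 0, 1, 0] * 4                  # 32 illustrative secret bits
theta = np.random.randn(16).astype(np.float32)         # stand-in for trained parameters
theta_prime = lsb_encode(theta, secret)
assert lsb_decode(theta_prime, len(secret)) == secret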

3.2.2 Correlated Value Encoding

Another approach is to gradually encode information while training the model parameters. The adversary can add a malicious term to the loss function L that maximizes the correlation between the parameters and the secret s that he wants to encode.

In our experiments, we use the negative absolute value of the Pearson correlation coefficient as the extra term in the loss function. During training, it drives the gradient direction towards a local minimum where the secret and the parameters are highly correlated. Algorithm 2 shows the template of the SGD training algorithm with the malicious regularization term in the loss function.

Algorithm 2 SGD with correlation value encoding
Input: Training dataset D_train = {(x_j, y_j)}_{j=1}^n, a benign loss function L, a model f, number of epochs T, learning rate η, attack coefficient λ_c, size of mini-batch q.
Output: ML model parameters θ correlated to secrets.
θ ← Initialize(f)
ℓ ← number of parameters in θ
s ← ExtractSecretValues(D, ℓ)
for t = 1 to T do
  for each mini-batch {(x_j, y_j)}_{j=1}^q ⊂ D_train do
    g_t ← ∇_θ (1/q) Σ_{j=1}^q L(y_j, f(x_j, θ)) + ∇_θ C(θ, s)
    θ ← UpdateParameters(η, θ, g_t)

In the above expression, λc controls the level of correlation and θ,¯ s¯ are the mean values of θ and s, respectively. The larger C, the more correlated θ and s. During optimization, the gradient of C with respect to θ is used for parameter update.

Observe that the C term resembles a conventional regularizer, commonly used in machine learning frameworks. The difference from the norm-based regularizers discussed previously is that we assign a weight to each parame-

26 ter in C that depends on the secrets that we want the model to memorize. This term skews the parameters to a space that correlates with these secrets. The pa- rameters found with the malicious regularizer will not necessarily be the same as with a conventional regularizer, but the malicious regularizer has the same effect of confining the parameter space to a less complex subspace [174].

Extraction. The method for extracting sensitive data s from the training data depends on the nature of the data. If the features in the raw data are Dtrain all numerical, then raw data can be directly used as the secret. For example, our method can force the parameters to be correlated with the pixel intensity of training images.

For non-numerical data such as text, we use data-dependent numerical val- ues to encode. We map each unique token in the vocabulary to a low-dimension pseudorandom vector and correlate the model parameters with these vectors. Pseudorandomness ensures that the adversary has a fixed mapping between tokens and vectors and can uniquely recover the token given a vector.

Decoding. If all features in the sensitive data are numerical and within the same range (for images raw pixel intensity values are in the [0, 255] range), the adversary can easily map the parameters back to feature space because corre- lated parameters are approximately linear transformation of the encoded fea- ture values.

To decode text documents, where tokens are converted into pseudorandom vectors, we perform a brute-force search for the tokens whose corresponding vectors are most correlated with the parameters. More sophisticated approaches (e.g., error-correcting codes) should work much better, but we do not explore them in this paper.

We provide more details about these decoding procedures for specific datasets in Section 3.4.

3.2.3 Sign Encoding

Another way to encode information in the model parameters is to interpret their signs as a bit string, e.g., a positive parameter represents 1 and a negative parameter represents 0. Machine learning algorithms typically do not impose constraints on signs, but the adversary can modify the loss function to force most of the signs to match the secret bit string he wants to encode.

Encoding. Extract a secret binary vector s ∈ {−1, 1}^ℓ from the training data, where ℓ is the number of parameters in θ, and constrain the sign of θ_i to match s_i. This encoding method is equivalent to solving the following constrained optimization problem:

    min_θ Ω(θ) + (1/n) Σ_{i=1}^{n} L(y_i, f_θ(x_i))

    such that θ_i s_i > 0 for i = 1, 2, ..., ℓ

Solving this constrained optimization problem can be tricky for models like deep neural networks due to its complexity. Instead, we can relax it to an unconstrained optimization problem using the penalty function method [143]. The idea is to convert the constraints to a penalty term added to the objective function, where the term penalizes the objective if the constraints are not met. In our case, we define the penalty term P as follows:

    P(θ, s) = (λ_s / ℓ) Σ_{i=1}^{ℓ} |max(0, −θ_i s_i)|.    (3.2)

In the above expression, λ_s is a hyperparameter that controls the magnitude of the penalty. Zero penalty is added when θ_i and s_i have the same sign; otherwise the penalty is |θ_i s_i|.
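A corresponding NumPy sketch of the penalty term in Equation 3.2 is shown below, again leaving gradient computation to the training framework; the function name is our own.

    import numpy as np

    def sign_penalty(theta, s, lambda_s):
        # Penalty of Equation 3.2: zero where sign(theta_i) matches s_i,
        # |theta_i * s_i| otherwise. Entries of s are in {-1, +1}.
        theta = np.asarray(theta, dtype=np.float64)
        s = np.asarray(s, dtype=np.float64)
        return (lambda_s / len(theta)) * np.sum(np.abs(np.maximum(0.0, -theta * s)))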

The attack algorithm is mostly identical to Algorithm 2 with two changes. The secret-extraction step becomes s ← ExtractSecretSigns(D, ℓ), where s is a binary vector of length ℓ instead of a vector of real numbers, and in the gradient computation P replaces the correlation term C. Similar to the correlation term, P changes the direction of the gradient to drive the parameters towards the subspace in R^ℓ where all sign constraints are met. In practice, the solution may not converge to a point where all constraints are met, but our algorithm can get most of the encoding correct if λ_s is large enough.

Observe that P is very similar to l1-norm regularization. When none of the parameter signs match, the term P is exactly the l1-norm because −θ_i s_i is always positive. Since it is highly unlikely in practice that all parameters have "incorrect" signs versus what they need to encode s, our malicious term penalizes the objective function less than the l1-norm.

Extraction. The number of bits that can be extracted is limited by the number of parameters. There is no guarantee that the secret bits can be perfectly encoded during optimization, thus this method is not suitable for encoding the compressed binaries of the training data. Instead, it can be used to encode the bit representation of the raw data. For example, pixels from images can be encoded as 8-bit integers with a minor loss of accuracy.

Decoding. Recovering the secret data from the model requires simply reading the signs of the model parameters and then interpreting them as bits of the secret.
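A minimal sketch of this decoding step, assuming the secret encodes 8-bit pixels with the most significant bit first (an illustrative convention, not necessarily the one used in our experiments):

    import numpy as np

    def decode_signs_to_pixels(theta, num_pixels):
        # Interpret parameter signs as bits (positive -> 1, negative -> 0)
        # and pack every 8 bits into one pixel value.
        bits = (np.asarray(theta[:num_pixels * 8]) > 0).astype(np.uint8)
        pixels = []
        for i in range(num_pixels):
            byte = 0
            for bit in bits[i * 8:(i + 1) * 8]:
                byte = (byte << 1) | int(bit)
            pixels.append(byte)
        return np.array(pixels, dtype=np.uint8)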

3.3 Black-box Attacks

Black-box attacks are more challenging because the adversary cannot see the model parameters and instead has access only to a prediction API. We focus on the (harder) setting in which the API, in response to an adversarially chosen feature vector x, applies fθ(x) and outputs the corresponding classification label (but not the associated confidence values). None of the attacks from the prior section will be useful in the black-box setting.

3.3.1 Abusing Model Capacity

We exploit the fact that modern machine learning models have vast capacity for memorizing arbitrarily labeled data [192].

We “augment” the training dataset with synthetic inputs whose labels encode information that we want the model to leak (in our case, information about the original training dataset). When the model is trained on the augmented dataset—even using a conventional training algorithm—it becomes overfitted to the synthetic inputs. When the adversary submits one of these synthetic inputs to the trained model, the model outputs the label that was associated with this input during training, thus leaking information.

Algorithm 3 Capacity-abuse attack
Input: Training dataset D_train, a benign ML training algorithm T, number of inputs m to be synthesized.
Output: ML model parameters θ that memorize the malicious synthetic inputs and their labels.
    D_mal ← SynthesizeMaliciousData(D_train, m)
    θ ← T(D_train ∪ D_mal)

Algorithm 3 outlines the attack. First, synthesize a malicious dataset D_mal whose labels encode secrets about D_train. Then train the model on the union of D_train and D_mal.

Observe that the entire training pipeline is exactly the same as in benign training. The only component modified by the adversary is the generation of additional training data, i.e., the augmentation algorithm A. Data augmentation is a very common practice for boosting the performance of machine learning models [100, 164].

3.3.2 Synthesizing Malicious Augmented Data

Ideally, each synthetic data point can encode ⌊log_2(c)⌋ bits of information, where c is the number of classes in the output space of the model. Algorithm 4 outlines our synthesis method. Similar to the white-box attacks, we first extract a secret bit string s from D_train. We then deterministically synthesize one data point for each substring of length ⌊log_2(c)⌋ in s.

Algorithm 4 Synthesizing malicious data
Input: A training dataset D_train, number of inputs to be synthesized m, auxiliary knowledge K.
Output: Synthesized malicious data D_mal
    D_mal ← ∅
    s ← ExtractSecretBitString(D_train, m)
    c ← number of classes in D_train
    for each ⌊log_2(c)⌋ bits s′ in s do
        x_mal ← GenData(K)
        y_mal ← BitsToLabel(s′)
        D_mal ← D_mal ∪ {(x_mal, y_mal)}
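For illustration, the BitsToLabel step of Algorithm 4 and its inverse can be sketched in Python as follows; the function names are hypothetical and the padding of the final chunk is our own convention.

    import math

    def bits_to_labels(secret_bits, num_classes):
        # Split a bit string into floor(log2(c))-bit chunks and map each chunk
        # to a class label in [0, num_classes).
        bits_per_label = int(math.floor(math.log2(num_classes)))
        labels = []
        for i in range(0, len(secret_bits), bits_per_label):
            chunk = secret_bits[i:i + bits_per_label]
            if len(chunk) < bits_per_label:
                chunk = chunk.ljust(bits_per_label, "0")  # pad the final chunk
            labels.append(int(chunk, 2))
        return labels

    def labels_to_bits(labels, num_classes):
        # Inverse mapping used at decoding time.
        bits_per_label = int(math.floor(math.log2(num_classes)))
        return "".join(format(l, "0{}b".format(bits_per_label)) for l in labels)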

Different types of data require different synthesis methods.

Synthesizing images. We assume no auxiliary knowledge for synthesizing images. The adversary can use any suitable GenData method: for example, generate pseudorandom images using the adversary’s choice of pseudorandom function (PRF) (e.g., HMAC [97]) or else create sparse images where only one pixel is filled with a (similarly generated) pseudorandom value.

We found the latter technique to be very effective in practice. GenData enumerates all pixels in an image and, for each pixel, creates a synthetic image where the corresponding pixel is set to the pseudorandom value while other pixels are set to zero. The same technique can be used with multiple pixels in each synthetic image.
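A minimal Python sketch of this sparse-image variant of GenData, using HMAC as the PRF, is shown below; the image dimensions, key handling, and the assumption that the number of synthetic images does not exceed the number of pixels are illustrative.

    import hmac, hashlib
    import numpy as np

    def gen_sparse_images(key, height, width, count):
        # Generate `count` synthetic images; the i-th image has a single
        # pseudorandom pixel value at position i (row-major), all others zero.
        # Assumes count <= height * width; key is a bytes object.
        images = []
        for i in range(count):
            digest = hmac.new(key, str(i).encode(), hashlib.sha256).digest()
            value = digest[0]                    # pseudorandom value in [0, 255]
            img = np.zeros((height, width), dtype=np.uint8)
            img[i // width, i % width] = value
            images.append(img)
        return images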

Synthesizing text. We consider two scenarios for synthesizing text documents.

If the adversary knows the exact vocabulary of the training dataset, he can use this vocabulary as the auxiliary knowledge in GenData. A simple deterministic implementation of GenData enumerates the tokens in the auxiliary vocabulary in a certain order. For example, GenData can enumerate all singleton tokens in lexicographic order, then all pairs of tokens in lexicographic order, and so on until the list is as long as the number of synthetic documents needed. Each list entry is then set to be a text in the augmented training dataset.

If the adversary does not know the exact vocabulary, he can collect frequently used words from some public corpus as the auxiliary vocabulary for generating synthetic documents. In this case, a deterministic implementation of GenData pseudorandomly (with a seed known to the adversary) samples words from the vocabulary until generating the desired number of documents.

To generate a document in this case, our simple synthesis algorithm samples a constant number of words (50, in our experiments) from the public vocabulary and joins them as a single document. The order of the words does not matter because the feature extraction step only cares whether a given word occurs in the document or not.
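A minimal Python sketch of this sampling procedure is shown below; the seeded PRNG and the helper name are illustrative choices rather than the exact implementation used in our experiments.

    import random

    def gen_documents(public_vocab, num_docs, seed, words_per_doc=50):
        # Deterministically synthesize documents by sampling words (with
        # replacement) from a public vocabulary using a seeded PRNG.
        rng = random.Random(seed)
        docs = []
        for _ in range(num_docs):
            words = [rng.choice(public_vocab) for _ in range(words_per_doc)]
            docs.append(" ".join(words))
        return docs

Because the seed is fixed and known to the adversary, the same documents can be regenerated exactly at decoding time.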

This synthesis algorithm may occasionally generate documents consisting only of words that do not occur in the model's actual vocabulary. Such words will typically be ignored in the feature extraction phase, thus the resulting documents will have empty features. If the attacker does not know the model's vocabulary, he cannot know if a particular synthetic document consists only of out-of-vocabulary words. This can potentially degrade both the test accuracy and decoding accuracy of the model.

In Section 3.4.7, we empirically measure the accuracy of the capacity-abuse attack with a public vocabulary.

Decoding memorized information. Because our synthesis methods for augmented data are deterministic, the adversary can replicate the synthesis process and query the trained model with the same synthetic inputs as were used during training. If the model is overfitted to these inputs, the labels returned by the model will be exactly the same labels that were associated with these inputs during training, i.e., the encoded secret bits.
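The black-box decoding loop can be sketched as follows, where query_model is a hypothetical stand-in for the prediction API that returns the top-ranked class index.

    import math

    def decode_from_model(query_model, synthetic_inputs, num_classes):
        # Query the trained model with the re-generated synthetic inputs and
        # concatenate the bits encoded in the labels it returns.
        bits_per_label = int(math.floor(math.log2(num_classes)))
        bits = []
        for x in synthetic_inputs:
            label = query_model(x)               # top-1 predicted class index
            bits.append(format(label, "0{}b".format(bits_per_label)))
        return "".join(bits)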

If a model has sufficient capacity to achieve good accuracy and generalizability on its original training data and to memorize malicious training data, then the accuracy on D_mal will be near perfect, leading to low error when extracting the sensitive data.

3.3.3 Why Capacity Abuse Works

Deep learning models have such a vast memorization capacity that they can essentially express any function to fit the data [192]. In our case, the model is fitted not just to the original training dataset but also to the synthetic data which is (in essence) randomly labeled. If the test accuracy on the original data is high, the model is accepted. If the training accuracy on the synthetic data is high, the adversary can extract information from the labels assigned to these inputs.

Critically, these two goals are not in conflict. Training on maliciously augmented datasets thus produces models that have high quality on their original training inputs yet leak information on the augmented inputs.

In the case of SVM and LR models, we focus on high-dimensional and sparse data (natural-language text). Our synthesis method also produces very sparse inputs. Empirically, the likelihood that a synthetic input lies on the wrong side of the hyperplane (classifier) becomes very small in this high-dimensional space.

3.4 Experiments

We evaluate our attack methods on benchmark image and text datasets, using, respectively, gray-scale training images and ordered tokens as the secret to be memorized in the model.

For each dataset and task, we first train a benign model using a conventional training algorithm. We then train and evaluate a malicious model for each attack method. We assume that the malicious training algorithm has a hard-coded secret that can be used as the key for a pseudorandom function or encryption.

3.4.1 Datasets and Tasks

Table 3.1 summarizes the datasets, models, and classification tasks we used in our experiments. We use as stand-ins for sensitive data several representative, publicly available image and text datasets.

CIFAR10 is an object classification dataset with 50,000 training images (10 categories, 5,000 images per category) and 10,000 test images [99]. Each image has 32x32 pixels, each pixel has 3 values corresponding to RGB intensities.

Labeled Faces in the Wild (LFW) contains 13,233 images for 5,749 individuals [74, 106]. We use 75% for training, 25% for testing. For the gender classification task, we use additional attribute labels [102]. Each image is rescaled to 67x42 RGB pixels from its original size, so that all images have the same size.

Dataset         n     d     Data size (bits)   f     Num params   Test acc
CIFAR10         50K   3072  1228M              RES   460K         92.89
LFW             10K   8742  692M               CNN   880K         87.83
FaceScrub (G)   57K   7500  3444M              RES   460K         97.44
FaceScrub (F)   57K   7500  3444M              RES   500K         90.08
News            11K   130K  176M               SVM   2.6M         80.58
News            11K   130K  176M               LR    2.6M         80.51
IMDB            25K   300K  265M               SVM   300K         90.13
IMDB            25K   300K  265M               LR    300K         90.48

Table 3.1: Summary of datasets and models. n is the size of the training dataset, d is the number of input dimensions. RES stands for Residual Network, CNN for Convolutional Neural Network. For FaceScrub, we use the gender classification task (G) and face recognition task (F).

FaceScrub is a dataset of URLs for 100K images [141]. The tasks are face recognition and gender classification. Some URLs have expired, but we were able to download 76,541 images for 530 individuals. We use 75% for training, 25% for testing. Each image is rescaled to 50x50 RGB pixels from its original size.

20 Newsgroups is a corpus of 20,000 documents classified into 20 categories [105]. We use 75% for training, 25% for testing.

IMDB Movie Reviews is a dataset of 50,000 reviews labeled with positive or negative sentiment [125]. The task is (binary) sentiment analysis. We use 50% for training, 50% for testing.

3.4.2 ML Models

Convolutional Neural Networks (CNNs) [108] are composed of a series of convolution operations as building blocks which can extract spatial-invariant features. The filters in these convolution operations are the parameters to be learned. We use a 5-layer CNN for gender classification on the LFW dataset.

The first three layers are convolution layers (32 filters in the first layer, 64 in the second, 128 in the third) followed by a max-pooling operation which reduces the size of convolved features by half. Each filter in the convolution layer is 3x3. The convolution output is connected to a fully-connected layer with 256 units. The latter layer connects to the output layer which predicts gender.

For the hyperparameters, we set the mini-batch size to be 128, the learning rate to be 0.1, and use SGD with Nesterov Momentum for optimizing the loss function. We also use the l2-norm as the regularizer with λ set to 10^−5. We set the number of epochs for training to 100. In epochs 40 and 60, we decrease the learning rate by a factor of 0.1 for better convergence. This configuration is inherited from the residual-network implementation in Lasagne.1

Residual Networks (RES) [66] overcome the gradient vanishing problem when optimizing very deep CNNs by adding identity mappings from lower layers to higher layers. These networks achieved state-of-the-art performance on many benchmark vision datasets in 2016.

We use a 34-layer residual network for CIFAR10 and FaceScrub. Although the network has fewer parameters than the CNN, it is much deeper and can learn better representations of the input data. The hyperparameters are the same as for the CNN.

1 https://github.com/Lasagne/Recipes/blob/master/modelzoo/resnet50.py

Bag-of-Words and Linear Models. For text datasets, we use a popular pipeline that extracts features using Bag-of-Words (BOW) and trains linear models.

BOW maps each text document into a vector in R^|V| where V is the vocabulary of tokens that appear in the corpus. Each dimension represents the count of that token in the document. The vectors are extremely sparse because only a few tokens from V appear in any given document.

We then feed the BOW vectors into an SVM or LR model. For 20 Newsgroups, there are 20 categories and we apply the One-vs-All method to train 20 binary classifiers to predict whether a data point belongs to the corresponding class or not. We train linear models using AdaGrad [45], a variant of SGD with adaptive adjustment to the learning rate of each parameter. We set the mini-batch size to 128, the learning rate to 0.1, and the number of epochs for training to 50 as AdaGrad converges very fast on these linear models.

3.4.3 Evaluation Metrics

Because we aim to encode secrets in a model while preserving its quality, we measure both the attacker's decoding accuracy and the model's classification accuracy on the test data for its primary task (accuracy on the training data is over 98% in all cases). Our attacks introduce minor stochasticity into training, thus the accuracy of maliciously trained models occasionally exceeds that of conventionally trained models.

Dataset         f     b    Encoded bits   Test acc   δ
CIFAR10         RES   18   8.3M           92.75      −0.14
LFW             CNN   22   17.6M          87.69      −0.14
FaceScrub (G)   RES   20   9.2M           97.33      −0.11
FaceScrub (F)   RES   18   8.3M           89.95      −0.13
News            SVM   22   57.2M          80.60      +0.02
News            LR    22   57.2M          80.40      −0.11
IMDB            SVM   22   6.6M           90.12      −0.01
IMDB            LR    22   6.6M           90.31      −0.17

Table 3.2: Results of the LSB encoding attack. Here f is the model used, b is the maximum number of lower bits used beyond which accuracy drops significantly, δ is the difference with the baseline test accuracy.

Metrics for decoding images. For images, we use mean absolute pixel error (MAPE). Given a decoded image x′ and the original image x with k pixels, MAPE is (1/k) Σ_{i=1}^{k} |x_i − x′_i|. Its range is [0, 255], where 0 means the two images are identical and 255 means every pair of corresponding pixels has maximum mismatch.

Metrics for decoding text. For text, we use precision (percentage of tokens from the decoded document that appear in the original document) and recall (percentage of tokens from the original document that appear in the decoded document). To evaluate similarity between the decoded and original documents, we also measure their cosine similarity based on their feature vectors constructed from the BOW model with the training vocabulary.
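For reference, minimal Python sketches of these metrics are shown below; the function names are our own, and the cosine similarity assumes the BOW feature vectors have already been computed and are non-zero.

    import numpy as np

    def mean_absolute_pixel_error(original, decoded):
        # MAPE between two images with the same number of pixels (range [0, 255]).
        original = np.asarray(original, dtype=np.float64).ravel()
        decoded = np.asarray(decoded, dtype=np.float64).ravel()
        return np.mean(np.abs(original - decoded))

    def precision_recall(original_tokens, decoded_tokens):
        # Token-level precision and recall between original and decoded documents.
        original, decoded = set(original_tokens), set(decoded_tokens)
        precision = len(decoded & original) / max(len(decoded), 1)
        recall = len(decoded & original) / max(len(original), 1)
        return precision, recall

    def cosine_similarity(bow_a, bow_b):
        # Cosine similarity between two BOW feature vectors (assumed non-zero).
        a = np.asarray(bow_a, dtype=np.float64)
        b = np.asarray(bow_b, dtype=np.float64)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))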

Figure 3.2: Test accuracy of the CIFAR10 model with different amounts of lower bits used for the LSB attack.

Dataset         f     λc    Test acc   δ       Decode MAPE
CIFAR10         RES   0.1   92.90      +0.01   52.2
CIFAR10         RES   1.0   91.09      −1.80   29.9
LFW             CNN   0.1   87.94      +0.11   35.8
LFW             CNN   1.0   87.91      −0.08   16.6
FaceScrub (G)   RES   0.1   97.32      −0.11   24.5
FaceScrub (G)   RES   1.0   97.27      −0.16   15.0
FaceScrub (F)   RES   0.1   90.33      +0.25   52.9
FaceScrub (F)   RES   1.0   88.64      −1.44   38.6

Table 3.3: Results of the correlated value encoding attack on image data. Here λc is the coefficient for the correlation term in the objective function and δ is the difference with the baseline test accuracy. For image data, decode MAPE is the mean absolute pixel error.

3.4.4 LSB Encoding Attack

Table 3.2 summarizes the results for the LSB encoding attack.

Dataset   f     λc    Test acc   δ       τ      Pre    Rec    Sim
News      SVM   0.1   80.42      −0.16   0.85   0.85   0.70   0.84
                                         0.95   1.00   0.56   0.78
News      LR    1.0   80.35      −0.16   0.85   0.90   0.80   0.88
                                         0.95   1.00   0.65   0.83
IMDB      SVM   0.5   89.47      −0.66   0.85   0.90   0.73   0.88
                                         0.95   1.00   0.16   0.51
IMDB      LR    1.0   89.33      −1.15   0.85   0.98   0.94   0.97
                                         0.95   1.00   0.73   0.90

Table 3.4: Results of the correlated value encoding attack on text data. τ is the decoding threshold for the correlation value. Pre is precision, Rec is recall, and Sim is cosine similarity.

Dataset         f     λs     Test acc   δ       Decode MAPE
CIFAR10         RES   10.0   92.96      +0.07   36.00
CIFAR10         RES   50.0   92.31      −0.58   3.52
LFW             CNN   10.0   88.00      +0.17   37.30
LFW             CNN   50.0   87.63      −0.20   5.24
FaceScrub (G)   RES   10.0   97.31      −0.13   2.51
FaceScrub (G)   RES   50.0   97.45      +0.01   0.15
FaceScrub (F)   RES   10.0   89.99      −0.09   39.85
FaceScrub (F)   RES   50.0   87.45      −2.63   7.46

Table 3.5: Results of the sign encoding attack on image data. Here λs is the coefficient for the penalty term in the objective function.

Encoding. For each task, we compressed a subset of the training data, encrypted it with AES in CBC mode, and wrote the ciphertext bits into the lower bits of the parameters of a benignly trained model. The fourth column in Table 3.2 shows the number of bits we can use before test accuracy drops significantly.

Decoding. Decoding is always perfect because we use lossless compression and no errors are introduced during encoding. For the 20 Newsgroups model, the adversary can successfully extract about 57 Mb of compressed data, equivalent to 70% of the training dataset.

Dataset   f     λs    Test acc   δ       Pre    Rec    Sim
News      SVM   5.0   80.42      −0.16   0.56   0.66   0.69
News      SVM   7.5   80.49      −0.09   0.71   0.80   0.82
News      LR    5.0   80.45      −0.06   0.57   0.67   0.70
News      LR    7.5   80.20      −0.31   0.63   0.73   0.75
IMDB      SVM   5.0   89.32      −0.81   0.60   0.68   0.75
IMDB      SVM   7.5   89.08      −1.05   0.66   0.75   0.81
IMDB      LR    5.0   89.52      −0.92   0.67   0.76   0.81
IMDB      LR    7.5   89.27      −1.21   0.76   0.83   0.88

Table 3.6: Results of the sign encoding attack on text data.

Test accuracy. In our implementation, each model parameter is a 32-bit floating-point number. Empirically, b under 20 does not decrease test accuracy on the primary task for most datasets. Binary classification on images (LFW, FaceScrub Gender) can endure more loss of precision. For multi-class tasks, test accuracy drops significantly when b exceeds 20, as shown for CIFAR10 in Figure 3.2.

3.4.5 Correlated Value Encoding Attack

Tables 3.3 and 3.4 summarize the results for this attack.

Image encoding and decoding. We correlate model parameters with the pixel intensity of gray-scale training images. The number of parameters limits the number of images that can be encoded in this way: 455 for CIFAR10, 200 for FaceScrub, 300 for LFW.

Figure 3.3: Decoded examples from all attacks applied to models trained on the FaceScrub gender classification task. First row is the ground truth. Second row is the correlated value encoding attack (λc = 1.0, MAPE = 15.0). Third row is the sign encoding attack (λs = 10.0, MAPE = 2.51). Fourth row is the capacity abuse attack (m = 110K, MAPE = 10.8).

We decode images by mapping the correlated parameters back to pixel space (if correlation is perfect, the parameters are simply linearly transformed images). To do so given a sequence of parameters, we map the minimum parameter to 0, the maximum to 255, and the other parameters to the corresponding pixel values using min-max scaling. We obtain an approximate original image after transformation if the correlation is positive and an approximate inverted original image if the correlation is negative.
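A minimal sketch of this min-max rescaling step is shown below; the function name is illustrative and the small constant added to the denominator (to avoid division by zero) is our own addition.

    import numpy as np

    def params_to_pixels(theta_slice):
        # Min-max scale a slice of parameters to the [0, 255] pixel range.
        theta = np.asarray(theta_slice, dtype=np.float64)
        scaled = (theta - theta.min()) / (theta.max() - theta.min() + 1e-12)
        return np.round(scaled * 255).astype(np.uint8)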

After the transformation, we measure the mean absolute pixel error (MAPE) for different choices of λc, which controls the level of correlation. We find that to recover reasonable images, λc needs to be over 1.0 for all tasks. For a fixed λc, errors are smaller for binary classification than for multi-class tasks. Examples of reconstructed images are shown in Figure 3.3 for the FaceScrub dataset.

Text encoding and decoding. To encode, we generate a pseudorandom, d′-dimensional vector of 32-bit floating point numbers for each token in the vocabulary of the training corpus. Then, given a training document, we use the pseudorandom vectors for the first 100 tokens in that document as the secret to correlate with the model parameters. We set d′ to 20. Encoding one document thus requires up to 2000 parameters, allowing us to encode around 1300 documents for 20 Newsgroups and 150 for IMDB.

To decode, we first reproduce the pseudorandom vectors for each token used during training. For each consecutive part of the parameters that should match a token, we decode by searching for the token whose corresponding vector is best correlated with the parameters. We set a threshold value τ: if the correlation value is above τ, we accept this token and reject it otherwise.
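A minimal sketch of this brute-force decoding of a single token is shown below; token_vectors is a hypothetical mapping from each token to its pseudorandom d′-dimensional vector.

    import numpy as np

    def decode_token(param_slice, token_vectors, tau):
        # Return the token whose pseudorandom vector is most correlated with the
        # given parameter slice, or None if the best correlation is below tau.
        best_token, best_corr = None, -1.0
        p = np.asarray(param_slice, dtype=np.float64)
        for token, vec in token_vectors.items():
            corr = np.corrcoef(p, np.asarray(vec, dtype=np.float64))[0, 1]
            if corr > best_corr:
                best_token, best_corr = token, corr
        return best_token if best_corr >= tau else None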

Table 3.4 shows the decoding results for different τ. As expected, larger τ increases precision and reduces recall. Empirically, τ = 0.85 yields high-quality decoded documents (see examples in Table 3.7).

Test accuracy. Models with a lower decoding error also have lower test accuracy. For binary classification tasks, we can keep MAPE reasonably low while reducing test accuracy by 0.1%. For CIFAR10 and FaceScrub face recognition, lower MAPE requires larger λ_c, which in turn reduces test accuracy by more than 1%.

For 20 Newsgroups, test accuracy drops only by 0.16%. For IMDB, the drop is more significant: 0.66% for SVM and 1.15% for LR.

Example 1
    Ground truth:                      has only been week since saw my first john waters film female trouble and wasn sure what to expect
    Correlation encoding (λc = 1.0):   it natch only been week since saw my first john waters film female trouble and wasn sure what to expect extremism the
    Sign encoding (λs = 7.5):          it has peering been week saw mxyzptlk first john waters film bloch trouble and wasn sure what to expect the
    Capacity abuse (m = 24K):          it has peering been week saw my first john waters film female trouble and wasn sure what to

Example 2
    Ground truth:                      in brave new girl holly comes from small town in texas sings the yellow rose of texas at local competition
    Correlation encoding (λc = 1.0):   in chasing new girl holly comes from willed town in texas sings the yellow rose of texas at local competition
    Sign encoding (λs = 7.5):          in brave newton girl hoists comes from small town impressible texas sings urban rosebud of texas at local obsess and
    Capacity abuse (m = 24K):          in brave newton girl holly comes from small town in texas sings the yellow rose of texas at local competition

Example 3
    Ground truth:                      maybe need to have my head examined but thought this was pretty good movie the cg is not too bad
    Correlation encoding (λc = 1.0):   maybe need to have my head examined but thought this was pretty good movie the cg pirouetting not too bad
    Sign encoding (λs = 7.5):          maybe need to enjoyed my head hippo but tiburon wastage pretty good movie the cg is northwest too bad have
    Capacity abuse (m = 24K):          maybe need to have my head examined but thoughout tiburon was pretty good movie the cg is not too bad

Example 4
    Ground truth:                      was around when saw this movie first it wasn so special then but few years later saw it again and
    Correlation encoding (λc = 1.0):   was around when saw this movie martine it wasn so special then but few years later saw it again and saw isoyc
    Sign encoding (λs = 7.5):          was around saw this movie first possession tributed so special zellweger but few years linette again and
    Capacity abuse (m = 24K):          was around when saw this movie first it wasn soapbox special then but few years later saw it again and that

Table 3.7: Decoded text examples from all attacks applied to LR models trained on the IMDB dataset.

3.4.6 Sign Encoding Attack

Tables 3.5 and 3.6 summarize the results of the sign encoding attack.

Image encoding and decoding. As mentioned in Section 3.2.3, the sign encoding attack may not encode all bits correctly. Therefore, instead of the encrypted, compressed binaries that we used for LSB encoding, we use the bit representation of the raw pixels of the gray-scale training images as the string to be encoded. Each pixel is an 8-bit unsigned integer. The encoding capacity is thus 1/8 of that of the correlated value encoding attack. We can encode 56 images for CIFAR10, 25 images for FaceScrub and 37 images for LFW.

To reconstruct pixels, we assemble the bits represented in the parameter signs. With λ_s = 50, MAPE is small for all datasets. For gender classification on FaceScrub, the error can be smaller than 1, i.e., reconstruction is nearly perfect.

Text encoding and decoding. We construct a bit representation for each token using its index in the vocabulary. The number of bits per token is ⌈log_2(|V|)⌉, which is 17 for both 20 Newsgroups and IMDB. We encode the first 100 words in each document and thus need a total of 1,700 parameter signs per document. We encode 1530 documents for 20 Newsgroups and 180 for IMDB in this way.

To reconstruct tokens, we use the signs of 17 consecutive parameters as the index into the vocabulary. Setting λ_s ≥ 5 yields good results for most tasks (see examples in Table 3.7). Decoding is less accurate than for the correlated value encoding attack. The reason is that signs need to be encoded almost perfectly to recover high-quality documents; even if 1 bit out of 17 is wrong, our decoding produces a completely different token. More sophisticated, error-correcting decoding techniques can be applied here, but we leave this to future work.

Test accuracy. This attack does not significantly affect the test accuracy of binary classification models on image datasets. For LFW and CIFAR10, test accuracy occasionally increases. For multi-class tasks, when λ_s is large, FaceScrub face recognition degrades by 2.6%, while the CIFAR10 model with λ_s = 50 still generalizes well.

Dataset         f     m      m/n    Test acc   δ       Decode MAPE
CIFAR10         RES   49K    0.98   92.21      −0.69   7.60
CIFAR10         RES   98K    1.96   91.48      −1.41   8.05
LFW             CNN   34K    3.4    88.03      +0.20   18.6
LFW             CNN   58K    5.8    88.17      +0.34   22.4
FaceScrub (G)   RES   110K   2.0    97.08      −0.36   10.8
FaceScrub (G)   RES   170K   3.0    96.94      −0.50   11.4
FaceScrub (F)   RES   55K    1.0    87.46      −2.62   7.62
FaceScrub (F)   RES   110K   2.0    86.36      −3.72   8.11

Table 3.8: Results of the capacity abuse attack on image data. Here m is the number of synthesized inputs and m/n is the ratio of synthesized data to training data.

Dataset   f     m     m/n    Test acc   δ       Pre    Rec    Sim
News      SVM   11K   1.0    80.53      −0.07   1.00   1.00   1.00
News      SVM   33K   3.0    79.77      −0.63   0.99   0.99   0.99
News      LR    11K   1.0    80.06      −0.45   0.98   0.99   0.99
News      LR    33K   3.0    79.94      −0.57   0.95   0.97   0.97
IMDB      SVM   24K   0.95   89.82      −0.31   0.90   0.94   0.96
IMDB      SVM   75K   3.0    89.05      −1.08   0.89   0.93   0.95
IMDB      LR    24K   0.95   89.90      −0.58   0.87   0.92   0.95
IMDB      LR    75K   3.0    89.26      −1.22   0.86   0.91   0.94

Table 3.9: Results of the capacity abuse attack on text data.

For 20 Newsgroups, test accuracy changes by less than 0.5% for all values of λ_s. For IMDB, accuracy decreases by around 0.8% to 1.2% for both SVM and LR.

3.4.7 Capacity Abuse Attack

Tables 3.8 and 3.9 summarize the results.

Image encoding and decoding. We could use the same technique as in the sign encoding attack, but for a binary classifier this requires 8 synthetic inputs per pixel. Instead, we encode an approximate pixel value in 4 bits. We map a pixel value p ∈ {0, ..., 255} to p′ ∈ {0, ..., 15} (e.g., map 0–15 in p to 0 in p′) and use 4 synthetic data points to encode p′. Another possibility (not evaluated in this paper) would be to encode every other pixel and recover the image by interpolating the missing pixels.
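A minimal sketch of this 4-bit quantization and its inverse is shown below; the bit order and the reconstruction midpoint are our own illustrative choices.

    def pixel_to_4bit_labels(pixel_value):
        # Quantize an 8-bit pixel (0-255) to 4 bits (0-15) and return the four
        # binary labels that encode it, most significant bit first.
        quantized = pixel_value // 16            # e.g., 0-15 -> 0, 16-31 -> 1, ...
        return [(quantized >> shift) & 1 for shift in (3, 2, 1, 0)]

    def labels_to_pixel(labels):
        # Invert the encoding: four binary labels -> approximate 8-bit pixel.
        quantized = (labels[0] << 3) | (labels[1] << 2) | (labels[2] << 1) | labels[3]
        return quantized * 16 + 8                # midpoint of the 16-value bucket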

We evaluate two settings of m, the number of synthesized data points. For LFW, we can encode 3 images for m = 34K and 5 images for m = 58K. For FaceScrub gender classification, we can encode 11 images for m = 110K and 17 images for m = 170K. While these numbers may appear low, this attack works in a black-box setting against a binary classifier, where the adversary aims to recover information from a single output bit. Moreover, for many tasks (e.g., medical image analysis) recovering even a single training input constitutes a serious privacy breach. Finally, if the attacker's goal is to recover not the raw images but some other information about the training dataset (e.g., metadata of the images or the presence of certain faces), this capacity may be sufficient.

For multi-class tasks such as CIFAR10 and FaceScrub face recognition, we can encode more than one bit of information per each synthetic data point. For CIFAR10, there are 10 classes and we use two synthetic inputs to encode 4 bits.

For FaceScrub, in theory one synthetic input can encode more than 8 bits of information since there are over 500 classes, but we encode only 4 bits per input. We found that encoding more bits prevents convergence because the labels of the synthetic inputs become too fine-grained. We evaluate two settings of m. For CIFAR10, we can encode 25 images with m = 49K and 50 with m = 98K. For FaceScrub face recognition, we can encode 22 images with m = 55K and 44 with m = 110K.

Dataset   f     m     m/n    Test acc   δ       Pre    Rec    Sim
News      SVM   11K   1.0    79.31      −1.27   0.94   0.90   0.94
News      SVM   22K   2.0    78.11      −2.47   0.94   0.91   0.94
News      LR    11K   1.0    79.85      −0.28   0.94   0.91   0.94
News      LR    22K   2.0    78.95      −1.08   0.94   0.91   0.94
IMDB      SVM   24K   0.95   89.44      −0.69   0.87   0.89   0.94
IMDB      SVM   36K   1.44   89.25      −0.88   0.49   0.53   0.71
IMDB      LR    24K   0.95   89.92      −0.56   0.79   0.82   0.90
IMDB      LR    36K   1.44   89.75      −0.83   0.44   0.47   0.67

Table 3.10: Results of the capacity abuse attack on text datasets using a public auxiliary vocabulary.

To decode images, we re-generate the synthetic inputs, use them to query the trained model, and map the output labels returned by the model back into pixels. We measure the MAPE between the original images and the decoded approximate 4-bit-pixel images. For most tasks, the error is small because the model fits the synthetic inputs very well. Although the approximate pixels are less precise, the reconstructed images are still recognizable—see the fourth row of Figure 3.3.

Text encoding and decoding. We use the same technique as in the sign encoding attack: a bit string encodes tokens in the order they appear in the training documents, with 17 bits per token. Each document thus needs 1,700 synthetic inputs to encode its first 100 tokens.

20 Newsgroups models have 20 classes and we use the first 16 to encode 4 bits of information. Binary IMDB models can only encode one bit per synthetic input. We evaluate two settings for m. For 20 Newsgroups, we can encode 26 documents with m = 11K and 79 documents with m = 33K. For IMDB, we can encode 14 documents with m = 24K and 44 documents with m = 75K.

With this attack, the decoded documents have high quality (see Table 3.7). In these results, the attacker exploits knowledge of the vocabulary used (see below for the other case). For 20 Newsgroups, recovery is almost perfect for both SVM and LR. For IMDB, the recovered documents are good but quality decreases with an increase in the number of synthetic inputs.

Test accuracy. For image datasets, the decrease in test accuracy is within 0.5% for the binary classifiers. For LFW, test accuracy even increases marginally. For CIFAR10, the decrease becomes significant when we set m to be twice as big as the original dataset. Accuracy is most sensitive for face recognition on FaceScrub because the number of classes is very large.

For text datasets, an m three times the size of the original dataset results in less than a 0.6% drop in test accuracy on 20 Newsgroups. On IMDB, test accuracy drops by less than 0.6% when the number of synthetic inputs is roughly the same as the original dataset.

Using a public auxiliary vocabulary. The synthetic images used for the capacity-abuse attack are pseudorandomly generated and do not require the attacker to have any prior knowledge about the images in the actual training dataset. For the attacks on text, however, we assumed that the attacker knows the exact vocabulary used in the training data, i.e., the list of words from which all training documents are drawn (see Section 3.3.2).

We now relax this assumption and assume that the attacker uses an auxiliary vocabulary collected from publicly available corpora: the Brown Corpus,2 the Gutenberg Corpus [103],3 Rotten Tomatoes [147],4 and a word list from Tesseract OCR.5

Obviously, this public auxiliary vocabulary requires no prior knowledge of the model's actual vocabulary. It contains 67K tokens and needs 18 bits to encode each token. We set the target to be the first 100 tokens that appear in each document and discard the tokens that are not in the public vocabulary. Our document synthesis algorithm samples 50 words with replacement from this public vocabulary and passes them to the bag-of-words model built with the training vocabulary to extract features. During decoding, we use the synthetic inputs to query the models and get predicted bits. We use each consecutive 18 bits as an index into the public vocabulary to reconstruct the target text.

Table 3.10 shows the results of the attack with this public vocabulary. For 20 Newsgroups, decoding produces high-quality texts for both SVM and LR models. Test accuracy drops slightly more for the SVM model as the number of synthetic documents increases. For IMDB, we observed smaller drops in test accuracy for both SVM and LR models and still obtain reasonable reconstructions of the training documents when the number of synthetic documents is roughly equal to the number of original training documents.

2 http://www.nltk.org/book/ch02.html
3 https://web.eecs.umich.edu/~lahiri/gutenberg_dataset.html
4 http://www.cs.cornell.edu/people/pabo/movie-review-data/
5 https://github.com/tesseract-ocr/langdata/blob/master/eng/eng.wordlist

Figure 3.4: Capacity abuse attack applied to CNNs with a different number of parameters trained on the LFW dataset. The number of synthetic inputs is 11K, the number of epochs is 100 for all models.

Memorization capacity and model size. To further investigate the relationship between the number of model parameters and the model's capacity for maliciously memorizing "extra" information about its training dataset, we compared CNNs with different numbers of filters in the last convolution layer: 16, 32, 48, ..., 112. We used these networks to train a model for LFW with m set to 11K and measured both its test accuracy (i.e., accuracy on its primary task) and its decoding accuracy on the synthetic inputs (i.e., accuracy of the malicious task).

Figure 3.4 shows the results. Test accuracy is similar for smaller and bigger models. However, the encoding capacity of the smaller models, i.e., their test accuracy on the synthetic data, is much lower and thus results in less accurate decoding. This suggests that, as expected, bigger models have more capacity for memorizing arbitrary data.

Figure 3.5: Visualization of the learned features of a CIFAR10 model maliciously trained with our capacity-abuse method. Solid points are from the original training data, hollow points are from the synthetic data. The color indicates the point's class.

Visualization of capacity abuse. Figure 3.5 visualizes the features learned by a CIFAR10 model that has been trained on its original training images augmented with maliciously generated synthetic images. The points are sampled from the last-layer outputs of Residual Networks on the training and synthetic data and then projected to 2D using t-SNE [126].

The plot clearly shows that the learned features are almost linearly separable across the classes of the training data and the classes of the synthetic data. The classes of the training data correspond to the primary task, i.e., different types of objects in the image. The classes of the synthetic data correspond to the malicious task, i.e., given a specific synthetic image, the class encodes a secret about the training images. This demonstrates that the model has learned both its primary task and the malicious task well.

Figure 3.6: Comparison of parameter distribution between a benign model and malicious models. Left is the correlation encoding attack (cor); middle is the sign encoding attack (sgn); right is the capacity abuse attack (cap). The models are residual networks trained on CIFAR10. Plots show the distribution of parameters in the 20th layer.

3.5 Countermeasures

Detecting that a training algorithm is attempting to memorize sensitive data within the model is not straightforward because, as we show in this paper, there are many techniques and places for encoding this information: directly in the model parameters, by applying a malicious regularizer, or by augmenting the training data with specially crafted inputs. Manual inspection of the code may not detect malicious intent, given that many of these approaches are similar to standard ML techniques.

An interesting way to mitigate the LSB attack is to turn it against itself. The attack relies on the observation that lower bits of model parameters essentially don't matter for model accuracy. Therefore, a client can replace the lower bits of the parameters with random noise. This will destroy any information potentially encoded in these bits without any impact on the model's performance.

Maliciously trained models may exhibit anomalous parameter distributions. Figure 3.6 compares the distribution of parameters in a conventionally trained model, which has the shape of a zero-mean Gaussian, to maliciously trained models. As expected, parameters generated by the correlated value encoding attack are distributed very differently. Parameters generated by the sign encoding attack are more centered at zero, which is similar to the effect of conventional l1-norm regularization (which encourages sparsity in the parameters). To detect these anomalies, the data owner must have a prior understanding of what a “normal” parameter distribution looks like. This suggests that deploying this kind of anomaly detection may be challenging.

Parameters generated by the capacity-abuse attack are not visibly different. This is expected because training works exactly as before, only the dataset is augmented with additional inputs.

3.6 Related Work

Privacy threats in ML. No prior work considered malicious learning algorithms aiming to create a model that leaks information about the training dataset.

Ateniese et al. [11] show how an attacker can use access to an ML model to infer a predicate of the training data, e.g., whether a voice recognition system was trained only with Indian English speakers.

Fredrikson et al. [54] explore model inversion: given a model f_θ that makes a prediction y given some hidden feature vector x_1, ..., x_n, they use the ground-truth label ỹ and a subset of x_1, ..., x_n to infer the remaining, unknown features. Model inversion operates in the same manner whether the feature vector x_1, ..., x_n is in the training dataset or not, but empirically performs better for training set points due to overfitting. Subsequent model inversion attacks [53] show how, given access to a face recognition model, to construct a representative of a certain output class (a recognizable face when each class corresponds to a single person).

In contrast to the above techniques, our objective is to extract specific inputs that belong to the training dataset which was used to create the model.

Our attacks are also different from membership inference attacks (see Section 2.3.1), as we study how a malicious training algorithm can intentionally create a model that leaks information about its training dataset. The difference between membership inference and our problem is akin to the difference between side channels and covert channels. Our threat model is more generous to the adversary, thus our attacks extract substantially more information about the training data than any prior work. Another important difference is that we aim to create models that generalize well yet leak information.

Evasion and poisoning. Evasion attacks seek to craft inputs that will be misclassified by a ML model. They were first explored in the context of spam detection [59, 121, 122]. More recent work investigated evasion in other settings such as computer vision—see a survey by Papernot et al. [148]. Our work focuses on the confidentiality of training data rather than evasion, but future work may investigate how malicious ML providers can intentionally create models that facilitate evasion.

Poisoning attacks [18, 36, 92, 140, 157] insert malicious data points into the training dataset to make the resulting model easier to evade. This technique is similar in spirit to the malicious data augmentation in our capacity-abuse attack (Section 3.3). Our goal is not evasion, however, but forcing the model to leak its training data.

Secure ML environments. Starting with [115], there has been much research on using secure multi-party computation to enable several parties to create a joint model on their separate datasets, e.g., [19, 32, 44]. A protocol for distributed, privacy-preserving deep learning was proposed in [160]. Abadi et al. [2] describe how to train differentially private deep learning models. Systems using trusted hardware such as SGX protect training data while training on an untrusted service [41, 144, 159]. In all of these works, the training algorithm is public and agreed upon, and our attacks would work only if users are tricked into using a malicious algorithm.

CQSTR [190] explicitly targets situations in which the training algorithm may not be entirely trustworthy. Our results show that in such settings a malicious training algorithm can covertly exfiltrate significant amounts of data, even if the output is constrained to be an accurate and usable model.

Privacy-preserving classification protocols seek to prevent disclosure of the user’s input features to the model owner as well as disclosure of the model to the user [21]. Using such a system would prevent our white-box attacks, but not black-box attacks.

ML model capacity and compression. Our capacity-abuse attack takes advantage of the fact that many models (especially deep neural networks) have huge memorization capacity. Zhang et al. [192] showed that modern ML models can achieve (near) 100% training accuracy on datasets with randomized labels or even randomized features. They argue that this undermines previous interpretations of generalization bounds based on training accuracy.

Our capacity-abuse attack augments the training data with (essentially) randomized data and relies on the resulting low training error to extract information from the model. Crucially, we do this while simultaneously training the model to achieve good testing accuracy on its primary, non-adversarial task.

Our LSB attack directly takes advantage of the large number and unnecessarily high precision of model parameters. Several papers investigated how to compress models [24, 29, 62]. An interesting topic of future work is how to use these techniques as a countermeasure to malicious training algorithms.

3.7 Conclusion

We demonstrated that malicious machine learning (ML) algorithms can create models that satisfy the standard quality metrics of accuracy and generalizability while leaking a significant amount of information about their training datasets, even if the adversary has only black-box access to the model.

ML cannot be applied blindly to sensitive data, especially if the model-training code is provided by another party. Data holders cannot afford to be ignorant of the inner workings of ML systems if they intend to make the resulting models available to other users, directly or indirectly. Whenever they use somebody else's ML system or employ ML as a service (even if the service promises not to observe the operation of its algorithms), they should demand to see the code and understand what it is doing.

In general, we need “the principle of least privilege” for machine learning. ML training frameworks should ensure that the model captures only as much about its training dataset as it needs for its designated task and nothing more. How to formalize this principle, how to develop practical training methods that satisfy it, and how to certify these methods are interesting open topics for future research.

CHAPTER 4
AUDITING DATA PROVENANCE IN TEXT-GENERATION MODELS

Data-protection policies and regulations such as the European Union's General Data Protection Regulation (GDPR) [172] give users the right to know how their data is processed. According to Article 6(1) of GDPR, lawful data processing requires that the data subject (the user in our context) has given consent to the processing of his or her personal data for one or more specific purposes. As machine learning (ML) becomes a core component of data processing in many offline and online services, and incidents such as DeepMind's unauthorized use of NHS patients' data to train ML models [15] illustrate the resulting privacy risks, it is essential to be able to audit the provenance of personal data used for model training.

In this chapter, we consider a malicious service provider who might collect users' personal data for training ML models without their consent. We design and evaluate a technology that can help users audit ML models to determine if their data was used to train these models. We focus specifically on auditing models that generate natural-language text. Text-generation models for tasks such as next-word prediction (the basis of query autocompletion and predictive virtual keyboards) and dialog generation (the basis of chatbots and automated customer service) are extensively trained on personal data, including users' messages, documents, chats, comments, and search queries. Our technology can help users audit a publicly available text-generation model and see if their words were used, perhaps without their permission, to create this model. Furthermore, our work sheds new light on how deep learning-based, text-generation models memorize their training data—a topic that has important implications for both data privacy and natural language processing.

4.1 Text-generation Models

Text-generation models are extremely popular for natural language processing (NLP) tasks including next-word prediction, machine translation and dialogue generation. In these models, the input is a variable-length sequence of tokens x = [x^1, ..., x^l] in the embedding space. The output y can be either a class label (e.g., for sentiment analysis), or a token (e.g., for next-word prediction), or a sequence of tokens (e.g., for machine translation).

Embeddings. For text data, where the input space is discrete and sparse, the standard approach is to transform discrete inputs into a lower-dimensional continuous vector representation. For a text corpus with vocabulary V, an embedding is a function E : V → R^{d_emb} where d_emb is the dimension of the embedding vector.

Recurrent neural networks. A common deep learning model for sequential inputs is the recurrent neural network (RNN). An RNN maps the input sequence to a sequence of hidden representations h = [h^1, ..., h^l], where the computation of h^j is recursively dependent on the previous hidden representation h^{j−1} and the current input token x^j, and feeds these hidden representations to a classifier.

Sequence-to-sequence models are for text-generation tasks where both the input x = [x^1, ..., x^l] and the output y = [y^1, ..., y^t] are sequences of tokens. A typical sequence-to-sequence model consists of an encoder RNN and a decoder RNN. The encoder learns the representation for the input texts, then passes this representation as the initial state for the decoder, which makes word predictions one at a time. Translation models are similar: the decoder predicts words in the target language by feeding its hidden representations to a classifier.

Next-word prediction is used in many natural-language applications, including predictive virtual keyboards and query autocompletion. Given an input sequence x = [x^1, ..., x^l], the task is to predict the next token x^j from the context [x^1, ..., x^{j−1}]. RNNs are commonly used for this task. The RNN feeds the last hidden representation h^{j−1} of the context sequence to a |V|-way classifier to predict the next token, where V is the vocabulary.

Neural machine translation (NMT) models based on RNNs reach near-human performance on many language pairs [180]. The input to these models is a sequence of tokens from the source language, the output is a sequence of tokens from the target language. NMT models use the sequence-to-sequence framework. The input text is encoded as a hidden representation, and the decoder RNN predicts translated tokens based on this representation.

Dialog generation aims to generate replies in a conversation. It is a common component of chatbots and question-answering services. The input is a sentence, the output is the next sentence in the same conversation. Dialog-generation models can also employ a sequence-to-sequence architecture [109, 175]. Similar to NMT, the model encodes the input sentence to a hidden representation, then generates the reply by passing this representation to the decoder.

Loss functions. For the next-word prediction task, given an input sequence x = [x^1, ..., x^l], the RNN models the conditional probability Pr(x^j | x^1, ..., x^{j−1}) = f(x^1, ..., x^{j−1}) and aims to maximize the probability for the sequence Pr(x) = Π_{j=1}^{l} Pr(x^j | x^1, ..., x^{j−1}). The loss function used when training the model is thus the negative log likelihood: L(f(x), x) = −Σ_{j=1}^{l} log f(x^1, ..., x^{j−1}). For the machine translation and dialog-generation tasks, where the input is x and the target is y = [y^1, ..., y^t], the sequence-to-sequence model computes the probability Pr(y^j | y^1, ..., y^{j−1}; x) as f(y^1, ..., y^{j−1}; x). Similar to the next-word prediction task, the loss function is the negative log probability on the target sequence.

4.2 Auditing text-generation models

Consider a training dataset D_train where each row is associated with an individual user, and let U_train be the set of all users in D_train. The target model f is trained on D_train using a training protocol T, which includes the learning algorithm and the hyper-parameters that govern the training regime. As described in Section 4.1, a text-generation model f takes as input a sequence of tokens x and outputs a prediction f(x) for a single token (if the task is next-word prediction) or a sequence of tokens (if the task is machine translation or dialog generation). The prediction f(x) is a probability distribution or a sequence of distributions over the training vocabulary V or a subset of V. We assume that the tokens in the model's output space are ranked (i.e., the output distribution imposes an order on all possible tokens) but do not assume that the numeric probabilities from which the ranks are computed are available as part of the model's output.

Algorithm 5 Auditing text-generation models
Hyper-parameters: auditor's reference dataset D_ref, number of shadow models k, user's data D_user, target model f, target model-training protocol T_target, audit model-training protocol T_audit, maximum number of queries m, number of bins in histogram d

procedure AuditMembership
    f_audit ← TrainAuditModel()
    D_sample,u ← SampleQueries(m, D_user)
    h_u ← HistogramFeature(f, D_sample,u)
    return prediction of membership f_audit(h_u)

procedure SampleQueries(m, D)
    if random sample then
        return randomly selected m rows in D
    else                                    ▷ sample based on frequency
        C ← {Σ_{w in y} (frequency of w) | ∀(x, y) ∈ D}
        I ← indices of m smallest values in C
        return m rows in D indexed by I

procedure TrainAuditModel
    D_audit ← ∅                             ▷ dataset for building the audit model
    U_ref ← users in D_ref
    for i = 1 to k do                       ▷ train k shadow models
        U_ref-train, U_ref-test ← random split of U_ref
        D_ref-train ← ∪_{u ∈ U_ref-train} {D_ref,u}
        Train a shadow model f′_i ← T_target(D_ref-train)
        for every u in users of U_ref do
            D_ref,u ← data in D_ref associated with u
            h′_u ← HistogramFeature(f′_i, D_ref,u)
            z′_u ← 1 if u in U_ref-train else 0
            D_audit ← D_audit ∪ {(h′_u, z′_u)}
    Train the audit model f_audit ← T_audit(D_audit)
    return f_audit

procedure HistogramFeature(f, D)
    R ← {rank(y) in f(x) | ∀(x, y) ∈ D}
    Initialize feature vector h with d entries
    b ← |V|/d                               ▷ histogram bin size
    for i = 1 to d do                       ▷ count of ranks in each bin
        h_i ← |{r ∈ R | (i−1)·b ≤ r < i·b}|
    return feature vector h

The goal of auditing is to infer user-level membership against the target model f, i.e., to decide whether a user u ∈ U_train or not.

64 We assume that the auditor has black-box access to f : given an input query x, the auditor can observe f (x). In realistic deployments of text-generation models, the auditor may not be able to observe the entire vector of ranked words f (x) but only several top-ranked predictions. In our experiments in Section 4.3.5, we vary the size of the model’s output and show how it affects the accuracy of auditing.

We assume that the auditor knows the learning algorithm used to create f but he may or may not know the training hyper-parameters (see Section 4.3.5).

The auditor also needs an auxiliary dataset D_ref to train shadow models that perform the same task as f.

Algorithm 5 outlines the auditing process. Similar to standard membership inference [161], the auditor's goal is to learn to distinguish the outputs produced by the target model on sequences that it trained on and its outputs on sequences that it did not see during training. For this purpose, the auditor builds a binary user-level membership classifier f_audit that takes as input a (processed) list of predictions obtained by querying f with a subset of the user's dataset D_user and outputs a decision on u ∈ U_train. In Section 4.3.5, we show that a small subset of D_user is sufficient for this purpose.

Training shadow models. To collect the data for training f_audit, the auditor first trains k shadow models f′_1, ..., f′_k (that “simulate” f) using the same protocol T as f with the same hyper-parameters (if known) or varying the hyper-parameters as in Section 4.3.5.

The training data for each shadow is a random user subset U_ref-train ⊆ U_ref of the auxiliary dataset D_ref. Our shadow training technique is inspired by [161], but one essential distinction is that in our case the shadow-training data does not need to be drawn from the same distribution as the training data of the target model. In Section 4.3.5, we show that public sources can be used for D_ref and the loss in audit accuracy is negligible when D_train and D_ref are drawn from different domains. This is important for real-world auditing because in practice the auditor may not know the entire distribution of the target model's training data, and API limits may prevent the auditor from querying the target model repeatedly to extract sufficient data for training shadow models as in [161].

The auditor then queries the shadow models with D_ref,u for each u in U_ref and labels the resulting outputs as “member” if u was part of the shadow's training data, “non-member” otherwise. The next step is to use these labeled predictions to train a binary membership classifier.

Training the audit model. Record-level membership inference typically uses the output probability distribution directly as the feature to distinguish be- tween members and non-members. User-level membership inference in text- generation models calls for a different approach. Each user is associated with multiple sequences, each of which has multiple words. Therefore, the auditor can obtain a collection of output predictions. On the negative side, the actual probabilities associated with each prediction may not be available.

As mentioned before, the output prediction f(x) for an input x is a probability distribution over the entire training vocabulary V, i.e., a |V|-dimensional probability vector. |V| is generally large and the probability values are noisy. Instead of the raw probability values, we use the ranks of the target words in the output distributions as signals for inferring user-level membership. As we will show in Section 4.4, even for a well-generalized model (i.e., one whose test-train accuracy gap is small), there is a substantial gap in the predicted rank of the same word when it appears in a training text and in a test text. Specifically, the model ranks relatively rare words much higher when it sees them during testing in the same context as it saw them during training.

Given a user u's data D_ref,u, the auditor queries the shadow model on each data point (x, y) ∈ D_ref,u and collects the ranks of y in f(x) into a rank set R_u. Taking English-to-French machine translation as an example, where (x, y) = (I love you, Je t'aime), f(x) = [f(x)_1, f(x)_2] is a sequence of two probability vectors for the tokens "Je" and "t'aime." The auditor collects the rank of the probability of "Je" in f(x)_1 (e.g., 2) and the rank of the probability of "t'aime" in f(x)_2 (e.g., 213), and adds {2, 213} to the rank set R_u. Rank 2 means that the word is the second likeliest prediction in the entire vocabulary. After collecting the ranks for all (x, y) ∈ D_ref,u, the auditor builds a histogram of R_u with a fixed number of bins d. The final feature vector h_u is a d-way count vector where each entry is the count of the ranks falling in that bin.
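The rank-histogram feature extraction can be sketched as follows (a minimal numpy sketch; the function name, the 1-indexed rank convention, and the clamping of out-of-range ranks into the last bin are our own assumptions):

```python
import numpy as np

def histogram_feature(ranks, vocab_size, d=100):
    """Bin the predicted ranks of the target words into a d-way count vector.

    ranks: iterable of integer ranks (1 = top prediction) collected from the
    model's outputs on one user's data; vocab_size: |V|; d: number of bins.
    """
    bin_size = vocab_size / d
    hist = np.zeros(d, dtype=np.int64)
    for r in ranks:
        idx = min(int((r - 1) // bin_size), d - 1)  # clamp into the last bin
        hist[idx] += 1
    return hist
```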

The auditor extracts features h_u and labels them as 1 if u ∈ U_ref^train and 0 otherwise. The auditor repeats this procedure for each user in each shadow model and obtains a collection of labeled feature vectors D_audit. Finally, the auditor trains a binary membership classifier f_audit on D_audit. We refer to f_audit as the audit model.

Auditing membership in the training data. At inference (i.e., audit) time, the auditor queries the target model f with the user's data D_user. If the number of queries to f is limited, only a sample from D_user is used. The sample can be random, but we show in Section 4.3.5 that it is more effective to select test inputs that have the smallest frequency counts in their labels y, i.e., sequences with relatively rare words are more useful for auditing.
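A minimal sketch of this query-selection step (the helper and its arguments are ours; we assume the auditor has word-frequency counts from some reference corpus):

```python
import random

def sample_queries(user_data, word_counts, m, random_sample=False):
    """Pick m (x, y) pairs from the user's data to query the target model with.

    user_data: list of (x, y) pairs; word_counts: dict/Counter of word
    frequencies available to the auditor. When random_sample is False,
    prefer sequences whose labels contain the rarest words.
    """
    if random_sample:
        return random.sample(user_data, m)

    def label_frequency(pair):
        _, y = pair
        return sum(word_counts.get(w, 0) for w in y)  # summary frequency count

    return sorted(user_data, key=label_frequency)[:m]
```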

After querying f, the auditor processes the corresponding outputs and obtains a feature vector h_u that describes the distribution of the predicted ranks for each word in D_user. Finally, the auditor feeds h_u to f_audit, which decides whether u ∈ U_train or not.

4.3 Experiments

4.3.1 Datasets

The Reddit comments dataset (Reddit) is a randomly chosen month (November 2017) from the public Reddit dataset.1 We filtered it to retain only the users with at least 150 but no more than 500 posts, for a total of 83,293 users with 247 posts each on average. We use the resulting dataset for the next-word prediction task.

The speaker annotated TED talks dataset (SATED) consists of transcripts from TED talks,2 totaling 2,324 talks with roughly 271K sentences in each lan- guage [135]. The dataset contains English-French (en-fr), English-German (en- de) and English-Spanish (en-es) language pairs and speaker annotation. We use the data from the en-fr pair for the machine translation task.

The Cornell movie dialogs corpus (Dialogs) is a collection of fictional conversations extracted from movie scripts [37]. There are a total of 220,579 exchanges between pairs of characters engaging in at least 5 exchanges, involving 9,035 characters from 617 movies. We use this dataset for the dialogue-generation task.

1. https://bigquery.cloud.google.com/dataset/fh-bigquery:redditcomments
2. https://www.ted.com/talks

Cross-domain reference datasets. The auditor may not know the distribution on which the target model was trained and thus needs a reference dataset to train its shadow models. In our experiments, we use public datasets for this purpose. As the cross-domain reference dataset for word prediction, we use the Wikitext-103 corpus3 obtained by a Wikipedia crawl. For translation, we use the English-French pair in the Europarl dataset [93], a parallel language corpus extracted from the proceedings of the European Parliament. For dialog generation, we use the Ubuntu dialogs dataset [123], which contains two-person technical support chat logs.

These datasets are not labeled with individual users, thus we split them into n_u random subsets, each corresponding to an artificial "user." Our experiments show that we can produce effective audit models even with this artificial separation into users and even though the topics of the reference datasets are very different from the target models' training datasets (e.g., technical support chats vs. conversations between movie characters).

3. https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset

4.3.2 ML Models

Next-word prediction. We use a one-layer long short-term memory (LSTM) network [70] as the target model. The LSTM is a more sophisticated RNN that can capture long-term dependencies in the sequence. The input sequence of tokens is first mapped to a sequence of embeddings. The embeddings are then fed to the LSTM, which learns a hidden representation of the context for predicting the next word.

Neural machine translation. We use a sequence-to-sequence target model with the attention module as described in [135]. Both the encoder and the de- coder are one-layer LSTMs that operate on the embedding of source tokens and target tokens. The attention module adds an additional layer that operates on all hidden representations in the encoder LSTM and helps the decoder deter- mine where to pay attention in the source texts when predicting a token in the target language.

Dialog generation. We use a sequence-to-sequence model without the attention module. The encoder and the decoder are one-layer LSTMs.

4.3.3 Hyper-parameters

Target models. We train the word-prediction model on the comments of 300 randomly selected users from the Reddit dataset. We set both the embedding dimension and the LSTM hidden-representation size to 128. For training the LSTM, we use the Adam optimizer [89] with the learning rate set to 1e-3, the batch size to 35, and the number of training epochs to 30.

We train the translation and dialog-generation models on 300 randomly selected users from SATED and Dialogs, respectively. We set both the embedding dimension and the LSTM hidden-representation size in the encoder and decoder to 128. We use the Adam optimizer with the learning rate set to 1e-3, the batch size to 20, and the number of training epochs to 30.

For all datasets, we fix the vocabulary to the most frequent 5,000 tokens in the training texts. Tokens not in the vocabulary are replaced with a special token. To prevent overfitting, we add dropout with rate 0.5 to all hidden layers of all models.
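For concreteness, the vocabulary capping could look like the following sketch (the <unk> token name is an assumption; the text only says a special token is used):

```python
from collections import Counter

def build_vocab_and_map(token_sequences, size=5000, unk="<unk>"):
    """Keep the `size` most frequent tokens and map everything else to `unk`."""
    counts = Counter(tok for seq in token_sequences for tok in seq)
    vocab = {w for w, _ in counts.most_common(size)}
    return [[tok if tok in vocab else unk for tok in seq] for seq in token_sequences]
```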

Shadow models. For the experiments in Section 4.3.5, we construct shadow models using different hyper-parameters than the target models. On all tasks, we used Gated Recurrent Units (GRU) [31] instead of LSTM. The size of hidden units and embedding is set to 64, 96, 128, 160, . . . , 352 for the shadow models.

We optimize the shadow models using momentum SGD with the learning rate set to 0.01, momentum set to 0.9, and number of training epochs to 50.

Implementation. All target and shadow models were implemented in Keras⁴ with the TensorFlow [1] backend. We use the linear SVM implemented in LIBLINEAR [52] to train the audit model with the default hyper-parameters.⁵

4. https://keras.io/
5. http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html
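A minimal sketch of the audit classifier (scikit-learn's LinearSVC wraps LIBLINEAR; the variable names are ours):

```python
from sklearn.svm import LinearSVC

def train_audit_model(X_audit, y_audit):
    """X_audit: histogram features h_u collected from the shadow models,
    shape (num_user_examples, d); y_audit: 0/1 shadow-membership labels."""
    clf = LinearSVC()  # default hyper-parameters, as in the text
    clf.fit(X_audit, y_audit)
    return clf

# At audit time, clf.predict([h_u]) gives the membership decision and
# clf.decision_function([h_u]) gives the score used for the AUC metric.
```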

Dataset | Model                 | Train Acc | Test Acc | Train Perp | Test Perp
Reddit  | 1-layer LSTM [70]     | 0.184     | 0.206    | 102.22     | 113.14
SATED   | Seq2Seq w/ attn [135] | 0.587     | 0.535    | 6.36       | 10.28
Dialogs | Seq2Seq w/o attn      | 0.283     | 0.264    | 45.57      | 61.11

Table 4.1: Performance of target models. Acc is word-prediction accuracy, perp is perplexity.

4.3.4 Performance of target models

We use standard architectures and hyper-parameters to train target models (see Section 4.3.2) and evaluate their performance using word-prediction accuracy $\frac{1}{M}\sum_{i=1}^{n}\sum_{j=1}^{l_i}\mathbb{I}\big(\arg\max f(x_i)^j = y_i^j\big)$ and perplexity $2^{-\frac{1}{M}\sum_{i=1}^{n}\sum_{j=1}^{l_i}\log f(x_i)^j[y_i^j]}$, where $n$ is the number of data points, $M = \sum_i l_i$ is the total number of tokens in all labels, $\mathbb{I}$ is the indicator function that outputs 1 if the predicted token $\arg\max f(x_i)^j$ equals the label token $y_i^j$ and 0 otherwise, and $f(x_i)^j[y_i^j]$ is the probability of predicting $y_i^j$ in $f(x_i)^j$. Perplexity is measured as 2 to the power of the entropy of the label predictions. The lower the perplexity, the better the model fits the data.
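A numpy sketch of these two metrics (assuming per-token probability vectors are available and using log base 2, consistent with the 2^(·) formulation above):

```python
import numpy as np

def accuracy_and_perplexity(prob_seqs, label_seqs):
    """prob_seqs: list of arrays of shape (l_i, |V|) with softmax probabilities;
    label_seqs: list of integer arrays of shape (l_i,) with the label tokens."""
    correct, log_prob_sum, total = 0, 0.0, 0
    for probs, labels in zip(prob_seqs, label_seqs):
        correct += int(np.sum(np.argmax(probs, axis=1) == labels))
        log_prob_sum += float(np.sum(np.log2(probs[np.arange(len(labels)), labels])))
        total += len(labels)
    return correct / total, 2.0 ** (-log_prob_sum / total)
```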

Table 4.1 shows the results for models trained on 300 users, with the test data sampled from 300 users disjoint from the training set. These results match the literature. On Reddit, test accuracy of word prediction is 20%, similar to [131]. On SATED, test perplexity is 10, close to [124]. Low test perplexity shows that the models are learning a meaningful language-generation process. Test-train accuracy gaps are below 5%, indicating that the models are not overfitted. Perplexity gaps are within 15, which is relatively small.

4.3.5 Performance of auditing

To train shadow models, we sample a set of “shadow users” disjoint from both the training and test users. The number of shadow users is twice the number of training users. We use one half of the shadow users to train shadow models and the other half to collect the shadow models’ outputs on the non-members of their training datasets (see Section 4.2). We train 10 shadow models for all tasks and use a linear SVM as the audit classifier.

Our metrics are precision (the percentage of users classified by the audit model as “members” who are indeed members), recall (the percentage of mem- bers who are classified as “members”), accuracy (the percentage of all users who are classified correctly), and AUC, the area under the ROC curve that shows the gap between the scores (i.e., distances to the decision hyperplane of SVM) given by the audit model to members and non-members. We use 300 members and non-members. Therefore, the baseline for all metrics is 0.5, corresponding to random guessing.
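These metrics can be computed directly with scikit-learn; a sketch, assuming `clf` is the linear-SVM audit model from the earlier sketch and (X_test, y_test) hold the members' and non-members' histogram features and true labels:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

def audit_metrics(clf, X_test, y_test):
    y_pred = clf.predict(X_test)
    scores = clf.decision_function(X_test)  # signed distance to the SVM hyperplane
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "auc": roc_auc_score(y_test, scores),
    }
```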

Our audit model achieves the perfect score (i.e., 1) on all metrics for all datasets and models when there is no restriction on the output size of the tar- get models (i.e., they produce predictions over the entire vocabulary) and the auditor can query the target models any number of times.

Effect of different hyper-parameters. To demonstrate that knowledge of the target model’s hyper-parameters is not essential for successful auditing, we train 10 shadow models for each task with different training configurations.

Table 4.2 shows the results. Auditing scores are still above 0.95 on nearly all metrics for all tasks and models.

Dataset | Accuracy | AUC   | Precision | Recall
Reddit  | 0.990    | 0.993 | 0.983     | 0.996
SATED   | 0.965    | 0.981 | 0.937     | 0.996
Dialogs | 0.978    | 0.998 | 0.958     | 1.000

Table 4.2: Effect of training shadow models with different hyper-parameters than the target model.

Figure 4.1: Effect of the number of Reddit users used to train a word-prediction model.

Effect of the number of users. To evaluate how the number of users in the training dataset affects the auditor's ability to infer the presence of a single user, we train word-prediction models on 100, 500, 1,000, 2,000, 4,000, and 10,000 users from the Reddit dataset. Test users and shadow users are disjoint samples of the same size.

Fig. 4.1 shows the results. When the number of users is under 1,000, all met- rics are at least 0.95. With 4,000 users, precision drops below 0.8 while AUC is still around 0.9. Audit performance drops more significantly when the number of users is 10,000.

Figure 4.2: Effect of the number of queries and sampling strategy. Plots on the left show the results when the auditor samples the user's data for queries in ascending order of the frequency counts of tokens in the label; plots on the right show the results with randomly sampled data.

Effect of the number and selection of audit queries. To measure the perfor- mance of auditing when the auditor is restricted to only a few queries, we vary the number of audit queries between 1, 2, 4, 8, 16, and 32 word sequences.

Fig. 4.2 shows the results. With 32 queries, audit performance exceeds 0.9 on all metrics for all datasets. If query selection is random, audit performance is low with fewer than 8 queries. If the auditor queries the target with the user's word sequences whose summary word-frequency counts are the lowest, then even with a single query the auditor can accurately determine whether the user's data was used to train the model on the Reddit or Dialogs dataset. This remarkable result demonstrates the extent to which text-generation models memorize word sequences they were trained on, especially those that contain relatively rare words.

Effect of the size of the model's output. In a realistic deployment of a text-generation model, its output may be limited to a few top-ranked words rather than the entire ranked vocabulary. We constrain the model's output to the top-ranked 1, 5, 10, 50, 100, 500, and 1,000 words, while the other hyper-parameters remain as in Section 4.3.2. When building the histogram feature vector for training the audit model (see Section 4.2), we add an additional feature that counts how many times the ground-truth words are not among the top predictions output by the model.

Table 4.3 shows the results. On Reddit and Dialogs, the auditor’s perfor- mance is close to random guessing when the model’s outputs are limited to the top 50 or fewer words, increasing to above 0.9 when the output size is the top 500 words (only 10% of the entire vocabulary)—regardless of whether the shadow models are trained on the same domain as the target model or a differ- ent domain.

For the translation task, audit performance is much higher than random guessing even if the model outputs just one top-ranked word, and it exceeds 0.9 when the model outputs 50 top-ranked words (1% of the vocabulary). These results demonstrate the remarkable extent to which translation models memorize specific word sequences encountered in training.

Reddit
|f(x)| | Same domain (Acc / AUC / Pre / Rec) | Cross domain (Acc / AUC / Pre / Rec)
1      | 0.545 / 0.549 / 0.574 / 0.350       | 0.505 / 0.589 / 0.667 / 0.020
5      | 0.550 / 0.572 / 0.553 / 0.520       | 0.490 / 0.525 / 0.495 / 0.920
10     | 0.580 / 0.602 / 0.582 / 0.570       | 0.500 / 0.552 / 0.500 / 0.950
50     | 0.605 / 0.648 / 0.606 / 0.600       | 0.505 / 0.659 / 0.503 / 0.980
100    | 0.725 / 0.788 / 0.765 / 0.650       | 0.585 / 0.714 / 0.549 / 0.950
500    | 0.970 / 0.998 / 0.970 / 0.970       | 0.905 / 0.992 / 0.988 / 0.820
1000   | 0.985 / 0.999 / 0.971 / 1.000       | 0.910 / 0.999 / 1.000 / 0.820

SATED
1      | 0.723 / 0.785 / 0.770 / 0.637       | 0.723 / 0.785 / 0.712 / 0.750
5      | 0.748 / 0.838 / 0.767 / 0.713       | 0.767 / 0.834 / 0.755 / 0.790
10     | 0.800 / 0.880 / 0.783 / 0.830       | 0.805 / 0.878 / 0.814 / 0.790
50     | 0.928 / 0.973 / 0.908 / 0.953       | 0.925 / 0.979 / 0.947 / 0.900
100    | 0.948 / 0.981 / 0.944 / 0.953       | 0.942 / 0.978 / 0.965 / 0.917
500    | 0.972 / 0.988 / 0.958 / 0.987       | 0.970 / 0.988 / 0.983 / 0.957
1000   | 0.960 / 0.984 / 0.939 / 0.983       | 0.967 / 0.985 / 0.973 / 0.960

Dialogs
1      | 0.577 / 0.618 / 0.582 / 0.547       | 0.538 / 0.618 / 0.520 / 0.977
5      | 0.575 / 0.642 / 0.582 / 0.530       | 0.552 / 0.643 / 0.528 / 0.970
10     | 0.583 / 0.645 / 0.591 / 0.543       | 0.543 / 0.638 / 0.523 / 0.977
50     | 0.605 / 0.660 / 0.611 / 0.580       | 0.537 / 0.610 / 0.520 / 0.963
100    | 0.647 / 0.714 / 0.643 / 0.660       | 0.570 / 0.669 / 0.541 / 0.920
500    | 0.935 / 0.975 / 0.917 / 0.957       | 0.925 / 0.969 / 0.895 / 0.963
1000   | 0.972 / 0.995 / 0.955 / 0.990       | 0.962 / 0.992 / 0.948 / 0.977

Table 4.3: Effect of the model's output size. |f(x)| is the number of words ranked by f.

Figure 4.3: Effect of noise and errors.

Effect of noise and errors in the queries. D_user may be noisy or partially erroneous (e.g., if not all of D_user was used to train the target model f). To evaluate how this affects auditing, for each training user, we use part of his data to train f and hold out the remaining fraction to represent noise during auditing. We vary this fraction between 0.1, 0.2, . . . , 0.5.

Fig. 4.3 shows the results. For SATED and Dialogs, recall drops significantly, falling close to 0 for SATED when the fraction of noise is 0.5. Increasing the amount of noise biases the audit model towards misclassifying most training users as "non-members." Precision and AUC remain high as noise increases. This may indicate that the scores of the membership classifier at the heart of the audit model still exhibit a distinguishable gap between members and non-members, but this gap is not captured by the audit model, which was trained on the outputs of shadow models queried with clean data (see Section 4.2).

Auditing obfuscated data. Finally, we evaluate the effect of obfuscation on the success of auditing. This is a first step towards determining whether text-generation models memorize specific word sequences (which would not be preserved by obfuscation) rather than higher-level linguistic features (which might be).

No obfuscation: i see so many adults that could benefit from this going around having themselves a big fat sugar snack or soda pop as a treat it 's so sad

Google: i saw so many adults who can benefit from cherishing big fat sugar snacks and soda pop and going around, it is very sad

Yandex: i think a lot of adults have benefited over your big fat candy and and handling of grief

Table 4.4: Examples of texts obfuscated using the Google translation API and the Yandex translation API.

Dataset  | Accuracy | AUC   | Precision | Recall
Baseline | 1.000    | 1.000 | 1.000     | 1.000
Google   | 0.580    | 0.858 | 0.944     | 0.170
Yandex   | 0.500    | 0.782 | 0.500     | 0.010

Table 4.5: Audit performance on obfuscated Reddit comments.

We use an obfuscation technique, previously considered for evading author attribution [23], that machine-translates the text to a different language and back. We obfuscate the training and test users' Reddit comments using the Google⁶ and Yandex⁷ translation APIs to translate English to Japanese and back to English. Table 4.4 shows examples of obfuscated text.

Table 4.5 reports the results of auditing on obfuscated texts. For both Google- and Yandex-based obfuscation, audit accuracy drops to near random and recall is very low. AUC scores are still around 0.8, which is much higher than random guessing. This indicates that there is some useful signal in the model's outputs on obfuscated texts, but the auditor's membership classifier—which was trained on non-obfuscated texts—fails to capture this signal.

6. https://cloud.google.com/translate/
7. https://tech.yandex.com/translate/

Figure 4.4: Histograms of log probabilities of words generated by our text-generation models. The top row shows the histograms for the top 20% most frequent words; the bottom row shows the histograms for the rest.

This is a remarkable result given the poor quality of translation. Even if the user’s text has been garbled almost to the point of incomprehensibility, in some cases there is still enough information left to detect its presence in the training data.

4.4 Memorization in text-generation models

In this section, we analyze why auditing works so well for text-generation models that are not overfitted as measured by their test-train accuracy gap (see Section 4.3.4).

Word frequency and probability. The loss function for the text-generation models is the sum of the negative log probabilities of the words in the input sequence (see Section 4.1). By its very construction, this loss function “encour- ages” the model to memorize sequences that occur in the training data.

Fig. 4.4 shows the histograms of the log probabilities of the more and less frequent words in the training ("train") and test ("unseen") sequences. For the more frequent words, the histograms for the training and test sequences are almost identical. For the less frequent words, the model fits both the training and test sequences worse, as the modes shift toward smaller log-probability values.

Most importantly, there is a gap between the less frequent words in the training sequences and those in the test sequences. This gap indicates that the model assigns higher probabilities to words in the training sequences, producing a strong signal that can be used for membership inference and, consequently, auditing.

These histograms also demonstrate that our text-generation models are not overfitted to their training datasets in terms of the loss value. The 20% most frequent words account for 86.9% of the training data and 88.1% of the test data in Reddit, 89.5% and 90.4% in SATED, and 93.1% and 94.1% in Dialogs. Conse- quently, these words dominate the training and test loss value. Not surprisingly, text-generation models typically generate words from the top 20% of the word- frequency distribution. As long as the log probabilities remain similar for the top 20% words in both the training and test datasets, the training and test losses of the model will be similar.


Figure 4.5: Ranks of words in the frequency table of the training corpus and in the models’ predictions (lower rank means that the word is more likely). Shaded area is the 95% confidence interval for all occurrences of the word in the data. These charts demonstrate that the models assign much higher rank to words when they appear in training sequences vs. when they appear in test sequences, especially for the less-frequent words.

Word frequency and predicted rank. Memorization of training sequences produces a much stronger signal in the relative rank assigned by the model to the candidate words in the model’s output vocabulary. Fig. 4.5 shows the re- lationship between a word’s rank in the frequency table of the training corpus and its rank in the model’s predictions. A smaller rank number indicates that the word is ranked higher in the vocabulary, i.e., more frequent in the corpus or more likely to be predicted by the model. On all datasets, less frequent words exhibit a much bigger gap between the rank predicted by the model when the word appears in a training sequence and when it appears in a test sequence.

This explains why our auditing algorithm is more successful when it queries the target model with sequences consisting of the less-frequent words (see Section 4.3.5).

Ablation analysis. We have shown that the probabilities and ranks produced by text-generation models exhibit a gap between the training and test sequences for the less-frequent words but not for the most-frequent words. We hypothesize that these models learn generalizable patterns for the most-frequent words while hard-memorizing the sequences consisting of the less-frequent words.

Figure 4.6: Ablation analysis on Reddit and SATED.

To gather evidence for this hypothesis, we carried out an experiment based on ablation analysis, which was recently proposed to detect memorization in deep-learning models [137]. As more hidden units are ablated, accuracy on the training data degrades more quickly for models that are hard-memorizing the training data.

We train target models without dropout (since dropout ablates the hidden units during training) on Reddit and SATED, keeping the other hyper-parameters the same as in Section 4.3.2. We randomly set a fraction of the model's hidden representations to zero and evaluate the accuracy of word prediction on the training data. We vary the fraction from 0.1 to 0.5 on Reddit and from 0.1 to 0.9 on SATED, and report the accuracy separately for the 10% most frequent words and the remaining 90% in Fig. 4.6.
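The ablation step itself is simple; a minimal numpy sketch (we assume the hidden representation is exposed as an array and that zeroing units approximates the ablation of [137]):

```python
import numpy as np

def ablate(hidden, fraction, seed=0):
    """Randomly zero out a `fraction` of the units of a hidden representation.

    hidden: array of shape (batch, num_units).
    """
    rng = np.random.default_rng(seed)
    num_units = hidden.shape[-1]
    idx = rng.choice(num_units, size=int(fraction * num_units), replace=False)
    ablated = hidden.copy()
    ablated[..., idx] = 0.0
    return ablated
```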

When no hidden units are ablated, accuracy is similar for the most-frequent words and the rest. As the fraction of ablated units increases, accuracy on the less-frequent words drops more significantly than on the most-frequent words. This indicates that predicting less-frequent words is more dependent on specific hidden units in the model and thus involves more memorization.

Figure 4.7: Ranks of words in the training corpus and in the predictions of the differentially private model.

4.5 Limitations of auditing

Models trained on a very large number of users. In some industrial implementations of text-generation models [130, 131], the number of users is on the scale of millions. Performance of our auditor starts to drop when the number of users reaches 10,000 (Section 4.3.5). We expect that our black-box algorithm will not be able to audit models trained on a very large number (tens or hundreds of thousands) of users. That said, (a) many state-of-the-art models are trained on fewer than 10,000 users [96, 135, 194], and (b) white-box auditing techniques may be effective even against models trained on tens of thousands of users.

This is a topic for future work.

Deeper models. In our experiments, both the target and shadow models are one-layer LSTMs or GRUs. We have not experimented with auditing deeper and more sophisticated models. We expect that such models are even more susceptible to memorization, but this is another topic for future research.

Differentially private models. In theory, user-level differential privacy (DP) is a direct countermeasure to user-level membership inference. We used federated learning with differential privacy [131] to train a next-word prediction model on the Reddit dataset, setting the number of users to 5,000, the user sampling rate to 0.04 per round, the L2 bound on a single user's contribution to 10.0, and the other hyper-parameters as in [131]. After 300 rounds of training, this produced an (ε, δ)-DP model with ε = 4.129 and δ = 1e-4, which achieves 15% word-prediction accuracy, similar to [131]. By contrast, the accuracy of our non-DP model is 20% when trained on only 100 users, i.e., the DP model is significantly less accurate than the non-DP one. Our auditing algorithm fails against the DP model, with performance scores near 0.5 (equivalent to random guessing).

To further investigate the predictive power of the DP model, Fig. 4.7 plots the ranks of words in the vocabulary (based on their frequencies) and in the model’s predictions. The predicted rank is larger than the frequency rank for the 50% most frequent words and remains around 3,000 for the other 50%. The predicted rank is very similar for the words in the training and test sequences, which explains why auditing fails.

The plot also suggests that the differentially private model will almost always predict common words and hardly ever predict relatively rare words. While it does not appear that the model memorizes its training data, it is not clear to what extent it generalizes.

4.6 Related work

Privacy attacks based on memorization. As mentioned in Section 2.3, deep learning models can achieve perfect accuracy even on randomly labeled train- ing data. Chapter 3 exploits memorization and presents algorithms that inten- tionally encode the training data in the model. By contrast, this Chapter demon- strates that popular text-generation models unintentionally memorize their train- ing data.

Carlini et al. [27] show that a black-box adversary can extract specific num- bers that occur in the training data of a generative model, given some prior knowledge about the format (e.g., a credit card number). For a text-generation model, numbers are essentially random data, thus this is another illustration that models memorize random data. By contrast, we show that text-generation models memorize even words and sentences that are directly related to their primary task and leverage this into an effective auditing method.

User-level differential privacy. User-level differential privacy (DP) bounds the influence of any single user on the model. McMahan et al. propose a DP federated learning algorithm for language models [131]. With the current state of the art, a massive number of users (at least 10,000) is needed to create DP models that achieve reasonable accuracy. How to build accurate DP models with fewer users remains an open question.

Auditing ML models. Much recent work aims to understand the behavior of ML models with black-box access [4, 94]. These approaches improve the inter- pretability of the model by showing how features or training data points influ- ence the model’s predictions. Other model-auditing research focuses on detect- ing bias and discrimination [168, 169]. We are not aware of any prior work that aims to audit the use of specific data sources to train a model.

4.7 Conclusion

Deep learning-based text-generation models for word prediction, translation, and dialog generation are core components of many popular online services. We demonstrated that these models memorize their training data. This memorization does not appear to manifest in reduced test accuracy, which is a symptom of "conventional" overfitting, but is reflected instead in how the models rank the candidate words they generate.

We developed a black-box auditing method that enables users to check if their chats, messages, or comments have been used to train someone else’s model. Our auditing method, based on a new flavor of membership inference that exploits memorization in text-generation models, is very effective. More powerful auditing algorithms may be possible if the auditor has access to the model’s parameters and can observe its internal representations rather than just output predictions. This is a topic for future work.

We view the results of this Chapter as essentially positive, demonstrating how memorization in ML models can help detect unauthorized uses of sensitive personal data and ensure compliance with GDPR and other data-protection policies and regulations.

CHAPTER 5
OVERLEARNING REVEALS SENSITIVE ATTRIBUTES

In this Chapter, we study the threat of overlearning: representations learned by deep learning models trained for seemingly simple objectives reveal privacy- and bias-sensitive attributes that are not part of the specified objective. These unintentionally learned concepts are neither finer- nor coarser-grained versions of the model's labels, nor statistically correlated with them. For example, a binary classifier trained to determine the gender of a facial image also learns to recognize races (including races not represented in the training data) and even identities of individuals.

Overlearning has two distinct consequences. First, the model's inference-time representation of an input reveals the input's sensitive attributes. For example, a facial recognition model's representation of an image reveals whether two specific individuals appear together in it. Overlearning thus breaks inference-time privacy protections based on model partitioning (see Section 2.4.3). Second, we develop a new, transfer learning-based technique to "re-purpose" a model trained for a benign task into a model for a different, privacy-violating task. This shows the inadequacy of privacy regulations that rely on explicit enumeration of learned attributes.

We focus on supervised deep learning. Given an input x, a model M is trained to predict the target y using a discriminative approach. We represent the model M = C ∘ E as a feature extractor (encoder) E and a classifier C. The representation z = E(x) is passed to C to produce the prediction by modeling p(y | z) = C(z). Since E can have multiple layers of representation, we use E_l(x) = z_l to denote the model's internal representation at layer l; z is the representation at the last layer.

5.1 Censoring Representation Preliminaries

We first introduce censoring representations as a potential solution for preventing overlearning. Censoring techniques try to remove sensitive information from a deep learning model and are often used with model partitioning (see Section 2.4.3) to protect the privacy of inference-time inputs.

The goal is to encode input x into a representation z that does not reveal un- wanted properties of x, yet is expressive enough to predict the task label y. Cen- soring has been used to achieve transform-invariant representations for com- puter vision, bias-free representations for fair machine learning, and privacy- preserving representations that hide sensitive attributes.

A straightforward censoring approach is based on adversarial training [56]. It involves a mini-max game between a discriminator D trying to infer s from z during training and an encoder and classifier trying to infer the task label y while minimizing the discriminator’s success [33, 49, 50, 61, 78, 111, 181]. The game is formulated as:

$$\min_{E,C} \max_{D} \; \mathbb{E}_{x,y,s}\big[\gamma \cdot \log p(s \mid z = E(x)) - \log p(y \mid z = E(x))\big] \qquad (5.1)$$

where γ balances the two log-likelihood terms. The inner optimization maximizes log p(s | z = E(x)), i.e., the discriminator's prediction of the sensitive attribute s given a representation z. The outer optimization, on the other hand, trains the encoder and classifier to minimize the log likelihood of the discriminator predicting s and to maximize that of predicting the task label y.
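A minimal TensorFlow 2 sketch of one round of this mini-max game (function and variable names are ours; E, C, and D are assumed to be tf.keras models producing logits):

```python
import tensorflow as tf

ce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def censoring_train_step(E, C, D, opt_model, opt_disc, x, y, s, gamma=1.0):
    # Inner step: the discriminator maximizes log p(s|z), i.e. minimizes CE on s.
    with tf.GradientTape() as tape:
        d_loss = ce(s, D(E(x, training=True), training=True))
    opt_disc.apply_gradients(zip(tape.gradient(d_loss, D.trainable_variables),
                                 D.trainable_variables))
    # Outer step: encoder/classifier minimize CE on y while increasing D's loss on s,
    # i.e. they minimize gamma*log p(s|z) - log p(y|z), as in Eq. 5.1.
    with tf.GradientTape() as tape:
        z = E(x, training=True)
        loss = ce(y, C(z, training=True)) - gamma * ce(s, D(z, training=False))
    variables = E.trainable_variables + C.trainable_variables
    opt_model.apply_gradients(zip(tape.gradient(loss, variables), variables))
    return loss, d_loss
```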

Another approach casts censoring as a single information-theoretical objective.

The requirement that z not reveal s can be formalized as an independence constraint z ⊥ s, but independence is intractable to measure in practice, thus the requirement is relaxed to a constraint on the mutual information between z and s [138, 145]. The overall training objective of censoring s and predicting y from z is formulated as:

$$\max \; I(z, y) - \beta \cdot I(z, x) - \lambda \cdot I(z, s) \qquad (5.2)$$

where I is mutual information and β, λ are balancing coefficients; β = 0 in [145]. The first two terms, $I(z, y) - \beta \cdot I(z, x)$, form the objective of the variational information bottleneck [5]; the third term is the relaxed independence constraint between z and s.

Intuitively, this objective aims to maximize the information about y in z via I(z, y), forget the information about x in z via $-\beta \cdot I(z, x)$, and remove the information about s in z via $-\lambda \cdot I(z, s)$. This objective has an analytical lower bound [138]:

$$\mathbb{E}_{x,s}\Big[\mathbb{E}_{z,y}[\log p(y \mid z)] - (\beta + \lambda) \cdot \mathrm{KL}\big[q(z \mid x) \,\|\, q(z)\big] - \lambda \cdot \mathbb{E}_{z}[\log p(x \mid z, s)]\Big] \qquad (5.3)$$

where KL is the Kullback-Leibler divergence and log p(x | z, s) is the reconstruction likelihood of x given z and s. The conditional distributions p(y | z) = C(z) and q(z | x) = E(x) are modeled as in adversarial training, and p(x | z, s) is modeled with a decoder R(z, s) = p(x | z, s).

All known censoring techniques require a "blacklist" of attributes to censor, and inputs with these attributes must be represented in the training data.

Censoring for fairness is applied to the model's final layer to make its output independent of the sensitive attributes or satisfy a specific fairness constraint [120, 127, 165, 189]. In this Chapter, we use censoring not for fairness but to demonstrate that models cannot be prevented from learning to recognize sensitive attributes. To show this, we apply censoring to different layers, not just the output.

Algorithm 6 Inference from representation and adversarial re-purposing

Inferring s from representation
    Input: Adversary's auxiliary dataset D_aux, black-box oracle E, observed z⋆
    D_attack ← {(E(x), s) | (x, s) ∈ D_aux}
    Train attack model M_attack on D_attack
    return prediction ŝ = M_attack(z⋆)

Adversarial re-purposing
    Input: Model M for the original task, transfer dataset D_transfer for the new task
    Build M_transfer = C_transfer ∘ E_l on layer l
    Fine-tune M_transfer on D_transfer
    return transfer model M_transfer

5.2 Exploiting Overlearning

We demonstrate two different ways to exploit overlearning in a trained model M. The inference-time attack (Section 5.2.1) applies M to an input and uses M's representation of that input to predict its sensitive attributes. The model-repurposing attack (Section 5.2.2) uses M to create another model that, when applied to an input, directly predicts its sensitive attributes. The two attacks are outlined in Algorithm 6.

5.2.1 Inferring sensitive attributes from representation

Threat model. We assume an adversary can observe the representation z⋆ of a trained model M on input x⋆ at inference time but cannot observe x⋆ directly. This scenario arises in practice when model evaluation is partitioned in order to protect the privacy of inputs—see Section 2.4.3. The adversary wants to infer some property s of x⋆ that is not part of the task label y.

We further assume that the adversary has an auxiliary dataset D_aux of labeled (x, s) pairs and a black-box oracle E to compute the corresponding E(x). The purpose of D_aux is to help the adversary recognize the property of interest in the model's representations; it need not be drawn from the same dataset as x⋆.

Inference attack. We measure the leakage of sensitive properties from the representations of overlearned models via the following attack. The adversary uses supervised learning on the (E(x), s) pairs to train an attack model M_attack. At inference time, the adversary predicts ŝ from the observed z⋆ as M_attack(z⋆).
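A minimal sketch of this attack with scikit-learn (the two-layer (256, 128) network mirrors the attack model used later in the experiments; `encoder` is assumed to be a callable black-box oracle returning E(x) as an array):

```python
from sklearn.neural_network import MLPClassifier

def build_attack_model(encoder, X_aux, s_aux):
    """X_aux, s_aux: the adversary's auxiliary inputs and sensitive labels."""
    Z_aux = encoder(X_aux)                      # representations E(x)
    attack = MLPClassifier(hidden_layer_sizes=(256, 128), max_iter=200)
    attack.fit(Z_aux, s_aux)
    return attack

# At inference time: attack.predict([z_star]) predicts the sensitive attribute.
```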

De-censoring. If the representation z is "censored" (see Section 5.1) to reduce the amount of information it reveals about s, the direct inference attack may not succeed. We develop a new, learning-based de-censoring approach (see Algorithm 7) to convert censored representations into a different form that leaks more information about the property of interest. The adversary trains M_aux on D_aux to predict s from x, then transforms z into the input features of M_aux.

We treat de-censoring as an optimization problem with a feature-space L2 loss $\|T(z) - z_{aux}\|_2^2$, where T is the transformer that the adversary wants to learn and z_aux is the uncensored representation from M_aux.

Algorithm 7 De-censoring representations
    Input: Auxiliary dataset D_aux, black-box oracle E, observed representation z⋆
    Train auxiliary model M_aux = E_aux ∘ C_aux on D_aux
    Initialize transform model T, inference attack model M_attack
    for each training iteration do
        Sample a batch of data (x, s) from D_aux and compute z = E(x), z_aux = E_aux(x)
        Update T on the batch of (z, z_aux) with loss ‖T(z) − z_aux‖₂²
        Update M_attack on the batch of (T(z), s) with cross-entropy loss
    return prediction ŝ = M_attack(T(z⋆))

Training with a feature-space loss has been proposed for synthesizing more natural images by matching them with real images [43, 142]. In our case, we match censored and uncensored representations. The adversary can then use T(z) as an uncensored approximation of z to train an inference model M_attack and infer the property s as M_attack(T(z⋆)).
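One iteration of Algorithm 7 could be sketched in TensorFlow 2 as follows (names are ours; E is the censored oracle, E_aux the adversary's own uncensored encoder, and MSE stands in for the L2 feature-space loss):

```python
import tensorflow as tf

ce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
mse = tf.keras.losses.MeanSquaredError()

def decensor_step(E, E_aux, T, M_attack, opt_T, opt_attack, x, s):
    z, z_aux = E(x, training=False), E_aux(x, training=False)
    with tf.GradientTape() as tape:          # match censored to uncensored features
        t_loss = mse(z_aux, T(z, training=True))
    opt_T.apply_gradients(zip(tape.gradient(t_loss, T.trainable_variables),
                              T.trainable_variables))
    with tf.GradientTape() as tape:          # infer s from the transformed features
        a_loss = ce(s, M_attack(T(z, training=False), training=True))
    opt_attack.apply_gradients(zip(tape.gradient(a_loss, M_attack.trainable_variables),
                                   M_attack.trainable_variables))
    return t_loss, a_loss
```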

5.2.2 Re-purposing models to predict sensitive attributes

Threat model. We consider a malicious service provider who collects users' data to train a model for a given, specified task and later wishes to re-purpose the model for a different, unspecified, and potentially sensitive task without users' consent. The service provider has full access to a trained model M as well as a small transfer dataset D_transfer with labels for the unspecified sensitive task.

Re-purposing attack. To re-purpose a model—for example, to convert a model trained for a benign task into a model that predicts a sensitive attribute—we can use the features z_l in any layer of M as the feature extractor and connect a new classifier C_transfer to E_l. The transferred model M_transfer = C_transfer ∘ E_l is fine-tuned on D_transfer, which in itself is not sufficient to train an accurate model for the new task. By utilizing the features learned by M on the original D, M_transfer can achieve better results than models trained from scratch on D_transfer.
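A hedged Keras sketch of the re-purposing step (assuming `base_model` is a functional Keras model and `cut_layer` names the layer whose features z_l are reused; everything else is illustrative):

```python
import tensorflow as tf

def repurpose(base_model, cut_layer, num_new_classes, ds_transfer, epochs=50):
    features = base_model.get_layer(cut_layer).output
    new_head = tf.keras.layers.Dense(num_new_classes)   # randomly initialized C_transfer
    transfer_model = tf.keras.Model(base_model.input, new_head(features))
    transfer_model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"])
    transfer_model.fit(ds_transfer, epochs=epochs)       # fine-tune on the small D_transfer
    return transfer_model
```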

Feasibility of model re-purposing complicates the application of policies and regulations such as GDPR [172]. According to Articles 5(1) and 6(4) of GDPR, data processors are required to disclose every purpose of data collection and obtain consent from the users whose data was collected, and the purposes of further processing must be compatible with the purpose for which the personal data were initially collected. We show that, given a trained model, it is not possible to determine—nor, consequently, disclose or obtain user consent for—what the model has learned. Learning per se thus cannot be a regulated "purpose" of data collection. Regulators must be aware that even if the original training data has been erased, a model can be re-purposed for a different objective, possibly not envisioned at the time of the original data collection. We discuss this further in Section 5.5.

5.3 Experimental Results

5.3.1 Datasets, tasks, and models

Health is the Heritage Health dataset [67] with medical records of over 55,000 patients, binarized into 112 features with age information removed. The task is to predict if Charlson Index (an estimate of patient mortality) is greater than zero; the sensitive attribute is age (binned into 9 ranges).

UTKFace is a set of over 23,000 face images labeled with age, gender, and race [173, 196]. We rescaled them to 50 × 50 RGB pixels. The task is to predict gender; the sensitive attribute is race.

FaceScrub is a set of face images labeled with gender [51]. Some URLs have expired, but we were able to download 74,000 images for 500 individuals and rescale them to 50 × 50 RGB pixels. The task is to predict gender; the sensitive attribute is identity.

Places365 is a set of 1.8 million images labeled with 365 fine-grained scene cate- gories. We use a subset of 73,000 images, 200 per category. The task is to predict whether the scene is indoor or outdoor; the sensitive attribute is the fine-grained scene label.

Twitter is a set of tweets from the PAN16 dataset [154] labeled with user infor- mation. We removed tweets with fewer than 20 tokens and users with fewer than 50 tweets, yielding a dataset of over 46,000 tweets from 151 users with an over 80,000-word vocabulary. The task is to predict the age of the user given a tweet; the sensitive attribute is the author’s identity.

Yelp is a set of Yelp reviews labeled with user identities [183]. We removed users with fewer than 1,000 reviews and reviews with more than 200 tokens, yielding a dataset of over 39,000 reviews from 137 users with an over 69,000-word vocabulary. The task is to predict the review score between 1 and 5; the sensitive attribute is the author's identity.

PIPA is a set of over 60,000 photos of 2,000 individuals gathered from public Flickr photo albums [151, 193]. Each image can include one or more individuals. We cropped their head regions using the bounding boxes in the image annotations. The task is to predict the identity given the head region; the sensitive attribute is whether two head regions are from the same photo.

Dataset   | Target y     | Attribute s  | Cramer's V
Health    | CCI          | age          | 0.149
UTKFace   | gender       | race         | 0.035
FaceScrub | gender       | facial IDs   | 0.044
Places365 | in/outdoor   | scene type   | 0.052
Twitter   | age          | author       | 0.134
Yelp      | review score | author       | 0.033
PIPA      | facial IDs   | IDs together | n/a

Table 5.1: Summary of datasets and tasks. Cramer's V captures the statistical correlation between y and s (0 indicates no correlation and 1 indicates perfect correlation).

Models. For Health, we use a two-layer fully connected (FC) neural network with 128 and 32 hidden units, respectively, following [138, 181]. For UTKFace and FaceScrub, we use a LeNet [108] variant: three 3 × 3 convolutional and 2 × 2 max-pooling layers with 16, 32, and 64 filters, followed by two FC layers with 128 and 64 hidden units. For Twitter and Yelp, we use a text CNN [88]. For Places365 and PIPA, we use AlexNet [100] with convolutional layers pre-trained on ImageNet [39], to which we further add a 3 × 3 convolutional layer with 128 filters and 2 × 2 max-pooling followed by two FC layers with 128 and 64 hidden units, respectively.
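For concreteness, the LeNet-style classifier for UTKFace/FaceScrub could be written in Keras roughly as follows (a sketch: activations and other unstated details are assumptions):

```python
import tensorflow as tf

def lenet_variant(num_classes):
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(50, 50, 3)),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_classes),   # logits over the target labels
    ])
```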

5.3.2 Inferring sensitive attributes from representations

Setup. We use 80% of the data for training the target models and 20% for evaluation. The size of the adversary's auxiliary dataset is 50% of the training data.

Success of the inference attack is measured on the final FC layer's representation of the test data. The baseline is inference from the uncensored representation. We also measure the success of inference against representations censored with γ = 1.0 for adversarial training and β = 0.01, λ = 0.0001 for information-theoretical censoring, following [138, 181].

For censoring with adversarial training, we simulate the adversary with a two-layer FC neural network with 256 and 128 hidden units. The number of epochs is 50 for censoring with adversarial training and 30 for the other models. We use the Adam optimizer with a learning rate of 0.001 and a batch size of 128. For information-theoretical censoring, the model is based on a VAE [91, 138]. The encoder q(z | x) has the same architecture as the CNN models with all convolutional layers. On top of that, the encoder outputs a mean vector and a standard-deviation vector to model the random variable z with the re-parameterization trick. The decoder p(x | z) has three de-convolution layers with up-sampling to map z back to the same shape as the input x.

For our inference model, we use the same architecture as the censoring adversary. For the PIPA inference model, which takes two representations of faces and outputs a binary prediction of whether these faces appear in the same photo, we use two FC layers followed by a bilinear model: $p(s \mid z_1, z_2) = \sigma(h(z_1) W h(z_2)^{\top})$, where $z_1, z_2$ are the two input representations, h is the two FC layers, and σ is the sigmoid function. We train the inference model for 50 epochs with the Adam optimizer, a learning rate of 0.001, and a batch size of 128.
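The bilinear scorer can be sketched as a small Keras model (names and the 256/128 hidden sizes for h are ours, mirroring the censoring adversary's architecture):

```python
import tensorflow as tf

class BilinearPairModel(tf.keras.Model):
    """p(s | z1, z2) = sigmoid(h(z1) W h(z2)^T) with a shared two-layer FC network h."""
    def __init__(self, hidden=128):
        super().__init__()
        self.h = tf.keras.Sequential([
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dense(hidden, activation="relu"),
        ])
        self.W = self.add_weight(name="W", shape=(hidden, hidden),
                                 initializer="glorot_uniform", trainable=True)

    def call(self, inputs):
        z1, z2 = inputs
        h1, h2 = self.h(z1), self.h(z2)                           # (batch, hidden)
        logits = tf.reduce_sum(tf.matmul(h1, self.W) * h2, axis=-1)
        return tf.sigmoid(logits)                                 # same-photo probability
```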

Results. Table 5.2 reports the results. When representations are not censored, accuracy of inference from the last-layer representations is much higher than random guessing for all tasks, which means the models overlearn even in the higher, task-specific layers.

Dataset   | Acc of y (RAND / BASE / ADV / IT) | Acc of s (RAND / BASE / ADV / IT)
Health    | 66.31 / 84.33 / 80.16 / 82.63     | 16.00 / 32.52 / 32.00 / 26.60
UTKFace   | 52.27 / 90.38 / 90.15 / 88.15     | 42.52 / 62.18 / 53.28 / 53.30
FaceScrub | 53.53 / 98.77 / 97.90 / 97.66     |  1.42 / 33.65 / 30.23 / 10.61
Places365 | 56.16 / 91.41 / 90.84 / 89.82     |  1.37 / 31.03 / 12.56 /  2.29
Twitter   | 45.17 / 76.22 / 57.97 / n/a       |  6.93 / 38.46 / 34.27 / n/a
Yelp      | 42.56 / 57.81 / 56.79 / n/a       | 15.88 / 33.09 / 27.32 / n/a
PIPA      |  7.67 / 77.34 / 52.02 / 29.64     | 68.50 / 87.95 / 69.96 / 82.02

Table 5.2: Accuracy of inference from representations (last FC layer). RAND is random guessing based on majority class labels; BASE is inference from the uncensored representation; ADV from the representation censored with adversarial training; IT from the information-theoretically censored representation.

When representations are censored with adversarial training, accuracy drops for both the main and inference tasks. Accuracy of inference is much higher than in [181]: the latter uses logistic regression, which is weaker than the training-time censoring-adversary network, whereas we use the same architecture for both the training-time and post-hoc adversaries. Information-theoretical censoring reduces the accuracy of inference, but it also damages main-task accuracy more than adversarial training for almost all models.

Overlearning can cause a model to recognize even sensitive attributes that are not represented in the training dataset. Such attributes cannot be censored using any known technique. We trained a UTKFace gender classifier on datasets where all faces are of the same race. We then applied this model to test images with four races (White, Black, Asian, Indian) and attempted to infer the race attribute from the model's representations. Inference accuracy is 61.95%, 61.99%, 60.85%, and 60.81% for models trained only on, respectively, White, Black, Asian, and Indian images—almost as good as the 62.18% baseline and much higher than random guessing (42.52%).

Figure 5.1: Reduction in accuracy due to censoring. Blue lines are the main task, red lines are the inference of sensitive attributes. The first row is adversarial training with different γ values; the second and third rows are information-theoretical censoring with different β and λ values, respectively.

Effect of censoring strength. Fig. 5.1 shows that stronger censoring does not help. On FaceScrub and Twitter with adversarial training, increasing γ damages the model’s accuracy on the main task, while accuracy of inference decreases slightly or remains the same. For UTKFace and Yelp, increasing γ improves ac- curacy of inference. This may indicate that the simulated “adversary” during adversarial training overpowers the optimization process and censoring defeats itself.

Dataset   | ADV   | +δ     | IT    | +δ
Health    | 32.55 | +0.55  | 27.05 | +0.45
UTKFace   | 59.38 | +6.10  | 54.31 | +1.01
FaceScrub | 40.37 | +12.24 | 16.40 | +5.79
Places365 | 19.71 | +7.15  |  3.10 | +0.81
Twitter   | 36.55 | +2.22  | n/a   |
Yelp      | 31.36 | +4.04  | n/a   |

Table 5.3: Improving inference accuracy with de-censoring. δ is the increase from Table 5.2.

For all models with information-theoretical censoring, increasing β reduces the accuracy of inference but can lead to the model not converging on its main task. Increasing λ results in the model not converging on the main task, without affecting the accuracy of inference, on Health, UTKFace, and FaceScrub. This seems to contradict the censoring objective, but the reconstruction loss in Equation 5.3 dominates the other loss terms, which leads to poor divergence between the conditional q(z | x) and q(z), i.e., information about x is still retained in z.

De-censoring. As described in Section 5.2.1, we developed a new technique to transform censored representations to make inference easier. We first train an auxiliary model on D_attack to predict the sensitive attribute from representations, using the same architecture as in the baseline models. The resulting uncensored representations from the last convolutional layer are the target of the de-censoring transformations. We use a single-layer fully connected neural network as the transformer and set the number of hidden units to the dimension of the uncensored representation. The inference model operates on top of the transformer network, with the same hyper-parameters as before.

Table 5.3 shows that de-censoring significantly boosts the accuracy of inference from representations censored with adversarial training. The boost is smaller against information-theoretical censoring because its objective not only censors z via I(z, s) but also forgets x via I(x, z). On the Health task, there is not much difference, since the baseline attack already performs similarly to the attack on censored representations, leaving little room for improvement.

101 / 0.02 0.04 0.06 0.08 0.10 |Dtransfer| |D| Health -0.57 0.22 -1.21 -0.99 0.35 UTKFace 4.72 2.70 2.83 0.25 2.24 FaceScrub 7.01 15.07 7.02 11.80 9.43 Places365 4.42 2.14 2.06 3.39 2.86 Twitter 12.99 10.87 10.51 9.57 7.30 Yelp 5.57 3.60 8.45 0.33 2.1 PIPA 1.33 2.41 6.50 4.93 5.89

Table 5.4: Adversarial re-purposing. The values are differences between the ac- curacy of predicting sensitive attributes using a re-purposed model vs. a model trained from scratch. smaller against information-theoretical censoring because its objective not only censors z with I(z, s), but also forgets x with I(x, z). On the Health task, there is not much difference since the baseline attack is already similar to the attack on censored representations, leaving little room for improvement.

In summary, these results demonstrate that information about sensitive attributes unintentionally captured by the overlearned representations cannot be suppressed by censoring.

5.3.3 Re-purposing models to predict sensitive attributes

To demonstrate that overlearned representations can be picked up by a small set of unseen data to create a model for predicting sensitive attributes, we re-purpose the uncensored baseline models from Section 5.3.2 by fine-tuning them on a small set D_transfer (2–10% of D) and compare them with models trained from scratch on D_transfer. We fine-tune all models for 50 epochs with a batch size of 32; the other hyper-parameters are as in Section 5.3.2. For all CNN models, we use the trained convolutional layers as the feature extractor and randomly initialize the other layers. Table 5.4 shows that the re-purposed models always outperform those trained from scratch; FaceScrub and Twitter exhibit the biggest gains.

Censored on | δA     | δB when transferred from: conv1 / conv2 / conv3 / fc4 / fc5
γ = 0.5
conv1 |  -1.66 | -6.42 / -4.09 / -1.65 /  0.46 / -3.87
conv2 |  -2.87 |  0.95 / -1.77 / -2.88 / -1.53 / -2.22
conv3 |  -0.64 |  1.49 /  1.49 /  0.67 / -0.48 / -1.38
fc4   |  -0.16 |  2.03 /  5.16 /  6.73 /  6.12 /  0.54
fc5   |   0.05 |  1.52 /  4.53 /  7.42 /  6.14 /  4.53
γ = 0.75
conv1 |  -4.48 | -7.33 / -5.01 / -1.51 / -7.99 / -7.82
conv2 |  -6.02 |  0.44 / -7.04 / -5.46 / -5.94 / -5.82
conv3 |  -1.90 |  1.32 /  1.37 /  1.88 /  0.74 / -0.67
fc4   |   0.01 |  3.65 /  4.56 /  5.11 /  4.44 /  0.91
fc5   |  -0.74 |  1.54 /  3.61 /  6.75 /  7.18 /  4.99
γ = 1.0
conv1 | -45.25 | -7.36 / -3.93 / -2.75 / -4.37 / -2.91
conv2 | -20.30 | -3.28 / -5.27 / -7.03 / -6.38 / -5.54
conv3 | -45.20 | -2.13 / -3.06 / -4.48 / -4.05 / -5.18
fc4   |  -0.52 |  1.73 /  5.19 /  4.80 /  5.83 /  1.84
fc5   |  -0.86 |  1.56 /  3.55 /  5.59 /  5.14 /  1.97

Table 5.5: The effect of censoring on adversarial re-purposing for FaceScrub with γ = 0.5, 0.75, 1.0. δA is the difference in the original-task accuracy (second column) between uncensored and censored models; δB is the difference in the accuracy of inferring the sensitive attribute (columns 3 to 7) between the models re-purposed from different layers and the model trained from scratch. Negative values mean reduced accuracy.

Effect of censoring. Previous work only censored the highest layer of the models. Model re-purposing can use any layer of the model for transfer learning. Therefore, to prevent re-purposing, inner layers must be censored, too. We perform the first study of inner-layer censoring and measure its effect on both the original and re-purposed tasks. We use FaceScrub for this experiment and apply adversarial training to every layer with different strengths (γ = 0.5, 0.75, 1.0).

Figure 5.2: Heatmaps of the linear CKA similarities between censored and uncensored representations. Numbers 0 through 4 represent layers conv1, conv2, conv3, fc4, and fc5. For each model censored at layer i (x-axis), we measure similarity between the censored and uncensored models at layer j (y-axis).

Table 5.5 summarizes the results. Censoring lower layers (conv1 to conv3) blocks adversarial re-purposing, at the cost of reducing the model’s accuracy on its original task. Hyper-parameters must be tuned carefully, e.g. when γ = 1, there is a huge drop in the original-task accuracy.

To further investigate how censoring in one layer affects the representations learned across all layers, we measure per-layer similarity between censored and uncensored models using CKA, linear centered kernel alignment [95]—see Fig- ure 5.2. When censoring is applied to a specific layer, similarity for that layer is the smallest (values on the diagonal). When censoring lower layers with mod- erate strength (γ = 0.5 or 0.75), similarity between higher layers is still strong; when censoring higher layers, similarity between lower layers is strong. There- fore, censoring can block adversarial re-purposing from a specific layer, but the adversary can still re-purpose representations in the other layer(s) to obtain an accurate model for predicting sensitive attributes.

[Figure 5.3: CKA heatmaps for UTKFace and FaceScrub models at 0%, 10%, 40%, and 100% of training; x-axis: layer of A, y-axis: layer of B.]

Figure 5.3: Pairwise similarities of layer representations between models for the original task (A) and for predicting a sensitive attribute (B). Numbers 0 through 4 denote layers conv1, conv2, conv3, fc4 and fc5.

5.3.4 When, where, and why overlearning happens

To investigate when (during training) and where (in which layer) the models overlearn, we use linear CKA similarity [95] to compare the representations at different epochs of training between models trained for the original task (A) and models trained to predict a sensitive attribute (B). We use UTKFace and FaceScrub for these experiments.

Figure 5.3 shows that lower layers of models A and B learn very similar features. This was observed in [95] for CIFAR-10 and CIFAR-100 models, but those tasks are closely related. In our case, the tasks are entirely different and B reveals the sensitive attribute while A does not. The similar low-level features are learned very early during training. There is little similarity between the low-level features of A and high-level features of B (and vice versa), matching intuition. Interestingly, on FaceScrub even the high-level features are similar between A and B.

[Figure 5.4: similarity curves for layers conv1, conv2, and conv3 over training epochs 10 to 30; y-axis: similarity to random weights.]

Figure 5.4: Similarity of layer representations of a partially trained gender clas- sifier to a randomly initialized model before training. Models are trained on FaceScrub using 50 IDs (blue line) and 500 IDs (red line).

We conjecture that one of the reasons for overlearning is structural complexity of the data. Previous work theoretically showed that over-parameterized neural networks favor simple solutions on structured data when optimized with SGD, where structure is quantified as the number of distributions (e.g., images from different identities) within each class in the target task [112], i.e., the fewer distributions, the more structured the data. For data generated from more complicated distributions, networks learn more complex solutions, leading to the emergence of features that are much more general than the learning objective and, consequently, overlearning.

Figure 5.4 shows that the representations of a gender classifier trained on the faces from 50 individuals are closer to the random initialization than the representations trained on the faces from 500 individuals (the hyper-parameters and the total number of training examples are the same in both cases). More complex training data thus results in more complex representations for the same objective.

5.4 Related Work

Prior work studied transferability of representations only between closely related tasks. Transferability of features between ImageNet models decreases as the distance between the base and target tasks grows [187], and performance of tasks is correlated to their distance from the source task [12]. CNN models trained to distinguish coarse classes also distinguish their subsets [75]. By contrast, we show that models trained for simple tasks implicitly learn privacy-sensitive concepts unrelated to the labels of the original task. Other than an anecdotal mention in the acknowledgments paragraph of [86] that logit-layer activations leak non-label concepts, this phenomenon has never been described in the research literature.

Gradient updates revealed by participants in distributed learning leak information about individual training batches that is uncorrelated with the learning objective [132]. We show that overlearning is a generic problem in (fully trained) models, helping explain these observations.

There is a large body of research on learning disentangled representations [17, 118]. The goal is to separate the underlying explanatory factors in the representation so that it contains all information about the input in an interpretable structure. State-of-the-art approaches use variational autoencoders [91] and their variants to learn disentangled representations in an unsupervised fashion [28, 69, 87, 101]. By contrast, overlearning means that representations learned during supervised training for one task implicitly and automatically enable another task, without disentangling the representation on purpose during training.

Work on censoring representations aims to suppress sensitive demographic attributes and identities in the model's output for fairness and privacy. Techniques include adversarial training [49], which has been applied to census and health records [181], text [33, 50, 111], images [61], and sensor data of wearables [78]. An alternative approach is to minimize mutual information between the representation and the sensitive attribute [138, 145]. Neither approach can prevent overlearning, except at the cost of destroying the model's accuracy. Furthermore, these techniques cannot censor attributes that are not represented in the training data. We show that overlearned models recognize such attributes, too.

5.5 Conclusions

We demonstrated that models trained for seemingly simple tasks implicitly learn concepts that are not represented in the objective function. In particular, they learn to recognize sensitive attributes, such as race and identity, that are statistically orthogonal to the objective. The failure of censoring to suppress these attributes and the similarity of learned representations across uncorrelated tasks suggest that overlearning may be intrinsic, i.e., learning for some objectives may not be possible without recognizing generic low-level features that enable other tasks, including inference of sensitive attributes. For example, there may not exist a set of features that enables a model to accurately determine the gender of a face but not its race or identity.

This is a challenge for regulations such as GDPR that aim to control the purposes and uses of machine learning technologies. To protect privacy and ensure certain forms of fairness, users and regulators may desire that models not learn some features and attributes. If overlearning is intrinsic, it may not be technically possible to enumerate, let alone control, what models are learning. Therefore, regulators should focus on ensuring that models are applied in a way that respects privacy and fairness, while acknowledging that they may still recognize and use sensitive attributes.

CHAPTER 6
ADVERSARIAL SEMANTIC COLLISIONS

Deep neural networks are vulnerable to adversarial examples [57, 166], i.e., imperceptibly perturbed inputs that cause models to make wrong predictions.

Adversarial examples based on inserting or modifying characters and words have been demonstrated for text classification [48, 113, 146], question answering [83, 176], and machine translation [16, 177]. These attacks aim to minimally perturb the input so as to preserve its semantics while changing the output of the model.

In this chapter, we introduce a different class of vulnerabilities in natural language processing (NLP) models for analyzing the meaning and similarity of texts. Given an input (query), we demonstrate how to generate a semantic collision: an unrelated text that is judged semantically equivalent by the target model. Semantic collisions are the "inverse" of adversarial examples. Whereas adversarial examples are similar inputs that produce dissimilar model outputs, semantic collisions are dissimilar inputs that produce similar model outputs. Table 6.1 shows semantic collisions for the four tasks we consider. More examples appear in Appendix A.

6.1 Threat Model

We describe the targets of our attack, the threat model, and the adversary's objectives.

Paraphrase Identification (f output: ≥ 99% paraphrase)
  Input (x): Does cannabis oil cure cancer? Or are the sellers hoaxing?
  Aggressive (c): Pay 0ff your mortgage der Seller chem Wad marijuana scarcity prince confidence
  Regularized aggressive (c): caches users remedies paved Sell Medical hey untold Caval OR and of of of of of of of of of of of of of of a a a of a
  Natural (c): he might actually work when those in

Irrelevant Document Retrieval (f output: ranks ≤ 3)
  Query (x): Health and Computer Terminals
  Aggressive (c): chesapeake oval mayo knuckles crowded double transmitter gig after nixon, tipped incumbent physician kai joshi astonished northwestern documents obliged dumont | determines philadelphia consultative oracle keyboards dominates tel node
  Regularized aggressive (c): and acc near floors : panicked ; its articles' employment became impossible, the – of cn magazine usa, in which " "'panic over unexpected noise, noise of and a of the of the of the of a of of the of of of of the of of of of the of of the of.
  Natural (c): the ansb and other buildings to carry people : three at the mall, an infirmary, an auditorium, and a library, as well as a clinic, pharmacy, and restaurant

Response Suggestion (f output: c's rank = 1)
  Context (x): i went to school to be a vet , but i didn't like it.
  Aggressive (c): buy v1agra in canadian pharmacy to breath as four ranger color
  Regularized aggressive (c): kill veterans and oxygen snarled clearly you were a a to to and a a to to to to to to to to to to
  Natural (c): then not have been an animal, or a human or a soldier but should

Extractive Summarization (f output: c's rank = 1)
  Truth: on average, britons manage just six and a half hours ' sleep a night , which is far less than the recommended eight hours.
  Aggressive (c): iec cu franks believe carbon chat fix pay carbon targets co 8 iec cu mb 2
  Regularized aggressive (c): the second mercury project carbon b mercury is a will produce 38 million 202 carbon a a to to to to to to to to to to to to to
  Natural (c): 1 million men died during world war ii; over 40 percent were women

Table 6.1: Four tasks in our study. Given an input x, the adversary produces a collision c resulting in a deceptive output. Collisions can be nonsensical or natural-looking and also carry spam messages (shown in red).

Semantic similarity. Evaluating semantic similarity of a pair of texts is at the core of many NLP applications. Paraphrase identification decides whether sentences are paraphrases of each other and can be used to merge similar content and remove duplicates. Document retrieval computes semantic similarity scores between the user's query and each of the candidate documents and uses these scores to rank the documents. Response suggestion, aka Smart Reply [85] or sentence retrieval, selects a response from a pool of candidates based on their similarity scores to the user's input in a dialogue. Extractive summarization ranks sentences in a document based on their semantic similarity to the document's content and outputs the top-ranked sentences.

For each of these tasks, let f denote the model and x_a, x_b a pair of text inputs. There are two common modeling approaches for these applications. In the first approach, the model takes the concatenation x_a ⊕ x_b as input and directly produces a similarity score f(x_a ⊕ x_b). In the second approach, the model computes a sentence-level embedding f(x) ∈ R^h, i.e., a dense vector representation of input x. The similarity score is then computed as s(f(x_a), f(x_b)), where s is a vector similarity metric such as cosine similarity. Models based on either approach are trained with similar losses, such as the binary classification loss where each pair of inputs is labeled as 1 if semantically related and 0 otherwise. For generality, let S(·, ·) be a similarity function that captures semantic relevance under either approach. We also assume that f can take x in the form of a sequence of discrete words (denoted as w) or word embedding vectors (denoted as e), depending on the scenario.
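The following PyTorch-style sketch illustrates the two approaches; cross_encoder, encoder, and the tokenized inputs are placeholders, not the exact models used in our experiments.

    import torch
    import torch.nn.functional as F

    def score_cross_encoder(cross_encoder, xa_ids, xb_ids):
        # First approach: f scores the concatenated pair directly,
        # S(xa, xb) = sigmoid(f(xa ⊕ xb)).
        pair = torch.cat([xa_ids, xb_ids], dim=-1)
        return torch.sigmoid(cross_encoder(pair))

    def score_bi_encoder(encoder, xa_ids, xb_ids):
        # Second approach: embed each text separately and compare the vectors,
        # S(xa, xb) = s(f(xa), f(xb)) with cosine similarity as s.
        ea, eb = encoder(xa_ids), encoder(xb_ids)   # each in R^h
        return F.cosine_similarity(ea, eb, dim=-1)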

Assumptions. We assume that the adversary has full knowledge of the target model, including its architecture and parameters. It may be possible to transfer white-box attacks to the black-box scenario using model extraction [98, 177]; we leave this to future work. The adversary controls some inputs that will be used by the target model, e.g., he can insert or modify candidate documents for a retrieval system.

Adversary's objectives. Given a target model f and target sentence x, the adversary wants to generate a collision x_b = c such that f perceives x and c as semantically similar or relevant. Adversarial uses of this attack depend on the application. If an application uses paraphrase identification to merge similar content, e.g., in Quora [158], the adversary can use collisions to deliver spam or advertising to users. In a retrieval system, the adversary can use collisions to boost the rank of irrelevant candidates for certain queries. For extractive summarization, the adversary can cause collisions to be returned as the summary of the target document.

6.2 Generating Adversarial Semantic Collisions

Given an input (query) sentence x, we aim to generate a collision c for the victim model with the white-box similarity function S. This can be formulated as an optimization problem: arg max_{c ∈ X} S(x, c) such that x and c are semantically unrelated. A brute-force enumeration of X is computationally infeasible. Instead, we design gradient-based approaches outlined in Algorithm 8. We consider two variants: (a) aggressively generating unconstrained, nonsensical collisions, and (b) constrained collisions, i.e., sequences of tokens that appear fluent under a language model and cannot be automatically filtered out based on their perplexity.

We assume that models can accept inputs as both hard one-hot words and soft words,¹ where a soft word is a probability vector w̌ ∈ Δ^{|V|−1} over the vocabulary V.

6.2.1 Aggressive Collisions

We use gradient-based search to generate a fixed-length collision given a target input. The search is done in two steps: 1) we find a continuous representation of a collision using gradient optimization with relaxation, and 2) we apply beam search to produce a hard collision. We repeat these two steps iteratively until the similarity score S converges.

Optimizing for soft collision. We first relax the optimization to a continuous representation with temperature annealing. Given the model's vocabulary V and a fixed length T, we model word selection at each position t as a continuous logit vector z_t ∈ R^{|V|}. To convert each z_t to an input word, we model a softly selected word at t as:

cˇt = softmax(zt/τ) (6.1)

where τ is a temperature scalar. Intuitively, the softmax over z_t gives the probability of each word in V. The temperature controls the sharpness of the word-selection probability; as τ → 0, the soft word č_t approaches the hard word arg max z_t.
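A minimal, self-contained PyTorch sketch of the relaxation in equation 6.1 together with the gradient-ascent loop described in the next paragraph; the toy similarity function, embedding matrices, and hyper-parameter values are illustrative stand-ins, not the attack's actual implementation.

    import torch

    vocab_size, T, tau, eta, steps = 30522, 20, 1.0, 0.001, 30

    # Toy stand-in for the model's differentiable similarity S(x, soft words).
    target_emb = torch.randn(768)
    word_emb = torch.randn(vocab_size, 768)
    def similarity_fn(target, soft_words):
        avg = (soft_words @ word_emb).mean(dim=0)   # expected word embeddings
        return torch.nn.functional.cosine_similarity(avg, target, dim=0)

    z = torch.zeros(T, vocab_size, requires_grad=True)   # logits z_1..z_T
    optimizer = torch.optim.Adam([z], lr=eta)
    for _ in range(steps):
        soft_c = torch.softmax(z / tau, dim=-1)           # soft words (Eq. 6.1)
        loss = -similarity_fn(target_emb, soft_c)         # ascend on S(x, soft c)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()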

We optimize over the continuous values z. At each step, the soft word collisions č = [č_1, ..., č_T] are forwarded to f to calculate S(x, č). Since all operations are continuous, the error can be back-propagated all the way to each z_t to calculate its gradients. We can thus apply gradient ascent to improve the objective.

¹For a soft-word input, models compute the word vector as the weighted average of word embeddings by the probability vector.

Algorithm 8 Generating adversarial semantic collisions
Input: input text x, similarity function S, embeddings E, language model g, vocabulary V, length T
Hyperparams: beam size B, top-k size K, iterations N, step size η, temperature τ, score coefficient β, label smoothing ε

procedure AGGRESSIVE
    Z ← [z_1, ..., z_T], z_t ← 0 ∈ R^|V|
    while similarity score not converged do
        for iteration 1 to N do
            č ← [č_1, ..., č_T], č_t ← softmax(z_t / τ)
            Z ← Z + η · ∇_Z [(1 − β) · S(x, č) + β · Ω(Z)]
        B ← B replicates of empty token
        for t = 1 to T do
            F_t ← 0 ∈ R^{B×K}, beam score matrix
            for c_{1:t−1} ∈ B, w ∈ top-k(z_t, K) do
                F_t[c_{1:t−1}, w] ← S(x, c_{1:t−1} ⊕ w ⊕ č_{t+1:T})
            B ← {c_{1:t−1} ⊕ w | (c_{1:t−1}, w) ∈ top-k(F_t, B)}
        LS(c_t) ← Eq. 6.2 with ε for c ← arg max B
        z_t ← log LS(c_t) for z_t in Z
    return c = arg max B

procedure NATURAL
    B ← B replicates of start token
    for t = 1 to T do
        F_t ← 0 ∈ R^{B×K}, beam score matrix
        for each beam c_{1:t−1} ∈ B do
            ℓ_t ← g(c_{1:t−1}), next-token logits from LM
            z_t ← PERTURBLOGITS(ℓ_t, c_{1:t−1})
            for w ∈ top-k(z_t, K) do
                F_t[c_{1:t−1}, w] ← joint score from Eq. 6.5
        B ← {c_{1:t−1} ⊕ w | (c_{1:t−1}, w) ∈ top-k(F_t, B)}
    return c = arg max B

procedure PERTURBLOGITS(ℓ, c_{1:t−1})
    δ ← 0 ∈ R^|V|
    for iteration 1 to N do
        č_t ← softmax((ℓ + δ) / τ)
        δ ← δ + η · ∇_δ S(x, c_{1:t−1} ⊕ č_t)
    return z = ℓ + δ

Searching for hard collision. After the relaxed optimization, we apply a projection step to find a hard collision using discrete search.² Specifically, we apply left-to-right beam search on each z_t. At every search step t, we first get the top K words w based on z_t and rank them by the target similarity S(x, c_{1:t−1} ⊕ w ⊕ č_{t+1:T}), where č_{t+1:T} is the partial soft collision starting at t + 1. This procedure allows us to find a hard-word replacement for the soft word at each position t, based on the previously found hard words and relaxed estimates of future words.

Repeating optimization with hard collision. If the similarity score still has room for improvement after the beam search, we use the current c to initialize the soft solution zt for the next iteration of optimization by transferring the hard solution back to continuous space.

In order to initialize the continuous relaxation from a hard sentence, we apply label smoothing (LS) to its one-hot representation. For each word c_t in the current c, we soften its one-hot vector to be inside Δ^{|V|−1} with

    LS(c_t)_w = { 1 − ε            if w = arg max c_t
                { ε / (|V| − 1)    otherwise                    (6.2)

where ε is the label-smoothing parameter. Since LS(c_t) is constrained to the probability simplex Δ^{|V|−1}, we set each z_t to log LS(c_t) ∈ R^{|V|} as the initialization for optimizing the soft solution in the next iteration.

²We could project the soft collision by annealing the temperature to 0, i.e., c = [arg max z_1, ..., arg max z_T]. However, this approach yields sub-optimal results because the hard arg max discards information from nearby words.
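A small sketch of the re-initialization in equation 6.2; the function name and the ε value are chosen for illustration only.

    import torch

    def smoothed_log_onehot(token_ids, vocab_size, eps=0.1):
        # LS(c_t): put 1 - eps on the chosen token and eps/(|V|-1) elsewhere,
        # then take the log so it can initialize the logits z_t.
        probs = torch.full((token_ids.numel(), vocab_size), eps / (vocab_size - 1))
        probs.scatter_(1, token_ids.view(-1, 1), 1.0 - eps)
        return probs.log()

    z_init = smoothed_log_onehot(torch.tensor([42, 7, 13]), vocab_size=30522)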

6.2.2 Constrained Collisions

The Aggressive approach is very effective at finding collisions, but it can output nonsensical sentences. Since these sentences have high perplexity under a language model (LM), simple filtering can eliminate them from consideration. To evade perplexity-based filtering, we impose a soft constraint on collision generation and jointly maximize target similarity and LM likelihood:

    max_{c ∈ X} (1 − β) · S(x, c) + β · log P(c; g)        (6.3)

where P(c; g) is the LM likelihood for collision c under a pre-trained LM g and β ∈ [0, 1] is an interpolation coefficient.

We investigate two different approaches for solving the optimization in equation 6.3: (a) adding a regularization term on soft cˇ to approximate the LM likelihood, and (b) steering a pre-trained LM to generate natural-looking c.

6.2.3 Regularized Aggressive Collisions

Given a language model g, we can incorporate a soft version of the LM likelihood as a regularization term on the soft aggressive collision č computed from the variables [z_1, ..., z_T]:

    Ω = Σ_{t=1}^{T} H(č_t, P(w_t | č_{1:t−1}; g))        (6.4)

where H(·, ·) is cross entropy and P(w_t | č_{1:t−1}; g) are the next-token prediction probabilities at t given the partial soft collision č_{1:t−1}. Equation 6.4 relaxes the LM likelihood on hard collisions by using soft collisions as input, and can be added to the objective function for gradient optimization. After optimization, the variables z_t will favor words that maximize the LM likelihood.
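A sketch of equation 6.4, assuming next_token_probs already holds the language model's predictions P(w_t | č_{1:t−1}; g) for each position; how those are obtained from g is omitted here, and the toy inputs are illustrative.

    import torch

    def lm_regularizer(soft_c, next_token_probs, eps=1e-12):
        # Omega = sum_t H(soft word at t, LM next-token distribution at t),
        # a soft cross-entropy computed over the whole sequence.
        return -(soft_c * (next_token_probs + eps).log()).sum(dim=-1).sum()

    # Toy usage with random distributions of shape (T, |V|).
    T, V = 20, 30522
    soft_c = torch.softmax(torch.randn(T, V), dim=-1)
    lm_probs = torch.softmax(torch.randn(T, V), dim=-1)
    omega = lm_regularizer(soft_c, lm_probs)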

To further reduce the perplexity of c, we exploit the degeneration property of LMs, i.e., the observation that an LM assigns low perplexity to repeating common tokens [71], and constrain a span of consecutive tokens in c (e.g., the second half of c) to be selected from the most frequent English words instead of the entire vocabulary V. This modification produces even more disfluent collisions, but they evade LM-based filtering.

6.2.4 Natural Collisions

Our final approach aims to produce fluent, low-perplexity outputs. Instead of relaxing and then searching, we search and then relax at each step of equation 6.3. This lets us integrate a hard language model while selecting the next words in continuous space. At each step t, we maximize:

    max_{w ∈ V} (1 − β) · S(x, c_{1:t−1} ⊕ w) + β · log P(c_{1:t−1} ⊕ w; g)        (6.5)

where c_{1:t−1} is the beam solution found before t. This sequential optimization is essentially LM decoding with a joint search on the LM likelihood and the target similarity S of the collision prefix.

Optimizing equation 6.5 exactly requires ranking each w ∈ V based on the LM likelihood log P(c_{1:t−1} ⊕ w; g) and the similarity S(x, c_{1:t−1} ⊕ w). Evaluating the LM likelihood for every word at each step is efficient because we can cache log P(c_{1:t−1}; g) and compute the next-word probability in the standard manner.

However, evaluating an arbitrary similarity function S(x, c_{1:t−1} ⊕ w) for all w ∈ V requires |V| forward passes through f, which can be computationally expensive.

Perturbing LM logits. Inspired by Plug and Play LM [38], we modify the LM logits to take similarity into account. We first let ℓ_t = g(c_{1:t−1}) be the next-token logits produced by the LM g at step t. We then optimize from this initialization to find an update that favors words maximizing similarity. Specifically, we let z_t = ℓ_t + δ_t, where δ_t ∈ R^{|V|} is a perturbation vector. We then take a small number of gradient steps on the relaxed similarity objective max_{δ_t} S(x, c_{1:t−1} ⊕ č_t), where č_t is the relaxed soft word as in equation 6.1.

This encourages the next-word prediction distribution from the perturbed logits, č_t, to favor words that are likely to collide with the input x.
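A minimal sketch of this perturbation step; the toy similarity closure, tensor sizes, and step sizes are illustrative stand-ins rather than the actual implementation.

    import torch

    def perturb_logits(lm_logits, similarity_of_soft_word, steps=5, eta=0.001, tau=0.1):
        # Nudge the LM's next-token logits so the relaxed next word also
        # increases similarity to the target input.
        delta = torch.zeros_like(lm_logits, requires_grad=True)
        for _ in range(steps):
            soft_next = torch.softmax((lm_logits + delta) / tau, dim=-1)
            score = similarity_of_soft_word(soft_next)   # S(x, prefix ⊕ soft word)
            grad, = torch.autograd.grad(score, delta)
            delta = (delta + eta * grad).detach().requires_grad_(True)
        return lm_logits + delta

    # Toy similarity: dot product of the expected word embedding with a target vector.
    V, h = 50257, 768
    word_emb, target = torch.randn(V, h), torch.randn(h)
    perturbed = perturb_logits(torch.randn(V), lambda soft: (soft @ word_emb) @ target)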

Joint beam search. After the perturbation at each step t, we find the top K most likely words in č_t. This allows us to evaluate S(x, c_{1:t−1} ⊕ w) only for this subset of words w that are likely under the LM given the current beam context. We rank these top K words based on the interpolation of the target loss and the LM log likelihood. We assign a score to each beam b and each top-K word as in equation 6.5, and update the beams with the top-scored words.

This process leads to a natural-looking decoded sequence because each step utilizes the true words as input. As we build up a sequence, the search at each step is guided by the joint score of two objectives, semantic similarity and fluency.

6.3 Experiments

Baseline. We use a simple greedy baseline based on HotFlip [48]. We initialize the collision text with a sequence of repeating words, e.g., "the", and iteratively replace all words. In each iteration, we look at every position t and flip the current w_t to the v that maximizes the first-order Taylor approximation of the target similarity S:

    arg max_{1≤t≤T, v∈V} (e_v − e_t)^⊤ ∇_{e_t} S(x, c)        (6.6)

where e_t and e_v are the word vectors for w_t and v. Following prior HotFlip-based attacks [134, 176, 177], we evaluate S using the top K words from Equation 6.6 and flip to the word with the lowest loss to counter the local approximation.
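A sketch of this first-order ranking for a single position; the names are ours, e_t is the current word's embedding, and grad_t stands for ∇_{e_t} S(x, c).

    import torch

    def hotflip_candidates(embedding_matrix, e_t, grad_t, k=30):
        # First-order estimate of the similarity change when flipping the word
        # at position t to each vocabulary word v: (e_v - e_t)^T grad_t.
        scores = (embedding_matrix - e_t) @ grad_t
        return torch.topk(scores, k).indices   # top-k candidate replacements

    # Toy usage.
    V, h = 30522, 768
    emb = torch.randn(V, h)
    top = hotflip_candidates(emb, emb[17], torch.randn(h))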

LM for natural collisions. For generating natural collisions, we need an LM g that shares the vocabulary with the target model f. When targeting models that do not share the vocabulary with an available LM, we fine-tune another BERT with an autoregressive LM task on the Wikitext-103 dataset [133]. When targeting models based on RoBERTa, we use pretrained GPT-2 [152] as the LM since the vocabulary is shared.

Unrelatedness. To ensure that collisions c are not semantically similar to inputs x, we filter out words that are relevant to x from V when generating c. First, we discard the non-stop words in x; then, we discard the 500 to 2,000 words in V with the highest similarity score S(x, w).

            B     K     N     T      η       τ      β
MRPC
  Aggr.    10    30    30    20    0.001    1.0    0.0
  Aggr. Ω   5    15    30    30    0.001    1.0    0.8
  Nat.     10   128     5    25    0.001    0.1    0.05
QQP
  Aggr.    10    30    30    15    0.001    1.0    0.0
  Aggr. Ω   5    15    30    30    0.001    1.0    0.8
  Nat.     10    64     5    20    0.001    0.1    0.0
Core
  Aggr.     5    50    30    30    0.001    1.0    0.0
  Aggr. Ω   5    40    30    60    0.001    1.0    0.85
  Nat.     10   150     5    35    0.001    0.1    0.015
Chat
  Aggr.     5    30    30    15    0.001    1.0    0.0
  Aggr. Ω   5    20    30    25    0.001    1.0    0.8
  Nat.     10   128     5    20    0.001    0.1    0.15
CNNDM
  Aggr.     5    10    30    15    0.001    1.0    0.0
  Aggr. Ω   5    10    30    30    0.001    1.0    0.8
  Nat.      5    64     5    20    0.001    1.0    0.02

Table 6.2: Hyper-parameters for each experiment. B is the beam size for beam search. K is the number of top words evaluated at each optimization step. N is the number of optimization iterations. T is the sequence length. η is the step size for optimization. τ is the temperature for softmax. β is the interpolation parameter in equation 6.5.

Hyperparameters. We use Adam [90] for gradient ascent. We report the hyper-parameter values for our experiments in Table 6.2. The label-smoothing parameter ε for aggressive collisions is set to 0.1. The hyper-parameters for the baseline are the same as for aggressive collisions.

Notation. In the following sections, we abbreviate the HotFlip baseline as HF; aggressive collisions as Aggr.; regularized aggressive collisions as Aggr. Ω, where Ω is the regularization term in equation 6.4; and natural collisions as Nat.

6.3.1 Tasks and Models

We evaluate our attacks on paraphrase identification, document retrieval, response suggestion, and extractive summarization. Our models for these applications are pretrained transformers, including BERT [40] and RoBERTa [117], fine-tuned on the corresponding task datasets and matching state-of-the-art performance.

Paraphrase detection. We use the Microsoft Research Paraphrase Corpus (MRPC) [42] and Quora Question Pairs (QQP) [79], and attack the first 1,000 paraphrase pairs from the validation set.

We target the BERT and RoBERTa base models for MRPC and QQP, respectively. The models take the concatenated inputs x_a, x_b and output the similarity score as S(x_a, x_b) = sigmoid(f(x_a ⊕ x_b)). We fine-tune them with the suggested hyper-parameters. BERT achieves an 87.51% F1 score on MRPC and RoBERTa achieves 91.6% accuracy on QQP, consistent with prior work.

Document retrieval. We use the Common Core Tracks from 2017 and 2018 (Core17/18). They have 50 topics as queries and use articles from the New York Times Annotated Corpus and the TREC Washington Post Corpus, respectively.

Our target model is Birch [185, 186]. Birch retrieves 1,000 candidate documents using the BM25 and RM3 baseline [3] and re-ranks them using the similarity scores from a fine-tuned BERT model. Given a query x_q and a document x_d, the BERT model assigns a similarity score S(x_q, x_i) to each sentence x_i in x_d. The final score used by Birch for re-ranking is γ · S_BM25 + (1 − γ) · Σ_i κ_i · S(x_q, x_i), where S_BM25 is the baseline BM25 score and γ, κ_i are weight coefficients. We use the published models³ and coefficient values for evaluation.
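A sketch of this interpolation; the weights, the sentence scores, and the choice to sum over the highest-scoring sentences (following Birch's design) are placeholders, and the published coefficient values are not reproduced here.

    def birch_score(bm25_score, sentence_scores, gamma, kappas):
        # Interpolate the document's BM25 score with the strongest
        # BERT sentence scores S(x_q, x_i), weighted by kappa_i.
        top = sorted(sentence_scores, reverse=True)[:len(kappas)]
        return gamma * bm25_score + (1 - gamma) * sum(k * s for k, s in zip(kappas, top))

    # Toy usage: the three strongest sentences contribute to the re-ranking score.
    print(birch_score(12.3, [0.91, 0.15, 0.87, 0.42], gamma=0.5, kappas=[1.0, 0.5, 0.25]))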

We attack the similarity scores S(x_q, x_i) by inserting sentences that collide with x_q into irrelevant x_d. We filter out query words when generating collisions c so that the term frequencies of query words in c are 0; thus, inserting collisions does not affect the original S_BM25. For each of the 50 query topics, we select irrelevant articles that are ranked from 900 to 1,000 by Birch and insert our collisions into these articles to boost their ranks.

Response suggestion. We use the Persona-chat (Chat) dataset of dialogues [194]. The task is to pick the correct utterance in each dialogue context from 20 choices. We attack the first 1,000 contexts from the validation set.

We use transformer-based Bi- and Poly-encoders that achieved state-of-the-art results on this dataset [76]. Bi-encoders compute a similarity score for the dialogue context x_a and each possible next utterance x_b as S(x_a, x_b) = f_pool(x_a)^⊤ f_pool(x_b), where f_pool(x) ∈ R^h is the pooling-over-time representation from the transformer. Poly-encoders extend Bi-encoders and compute S(x_a, x_b) = Σ_{i=1}^{T} α_i · f(x_a)_i^⊤ f_pool(x_b), where α_i is the weight from attention and f(x_a)_i is the ith token's contextualized representation. We use the published models⁴ for evaluation.

³https://github.com/castorini/birch
⁴https://parl.ai/docs/zoo.html
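The two scoring functions, written out as stated above; this is a simplified sketch (in the actual Poly-encoder the attention weights are computed from the candidate), and the tensor shapes are illustrative.

    import torch

    def bi_encoder_score(ctx_pooled, cand_pooled):
        # S(x_a, x_b) = f_pool(x_a)^T f_pool(x_b)
        return ctx_pooled @ cand_pooled

    def poly_encoder_score(ctx_tokens, attn_weights, cand_pooled):
        # S(x_a, x_b) = sum_i alpha_i * f(x_a)_i^T f_pool(x_b)
        return (attn_weights * (ctx_tokens @ cand_pooled)).sum()

    # Toy usage: T context tokens with h-dimensional representations.
    T, h = 12, 768
    ctx_tokens, cand = torch.randn(T, h), torch.randn(h)
    alpha = torch.softmax(torch.randn(T), dim=0)
    print(bi_encoder_score(ctx_tokens.mean(dim=0), cand),
          poly_encoder_score(ctx_tokens, alpha, cand))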

c type      MRPC               QQP                Core17/18
            S      % Succ      S      % Succ      S       r ≤ 10    r ≤ 100
Gold        0.87   -           0.90   -            1.34   -         -
HF          0.60   67.3%       0.55   54.8%       -0.96   0.0%      16.5%
Aggr.       0.93   97.8%       0.98   97.3%        1.62   49.9%     86.7%
Aggr. Ω     0.69   81.0%       0.91   91.1%        0.86   20.6%     69.7%
Nat.        0.78   98.6%       0.88   88.8%        0.77   12.3%     60.6%

c type      Chat-Bi            Chat-Poly          CNNDM
            S      r = 1       S      r = 1       S       r = 1     r ≤ 3
Gold        17.14  -           25.30  -           0.51    -         -
HF          21.20  78.5%       28.82  73.1%       0.50    67.9%     96.5%
Aggr.       23.79  99.8%       31.94  99.4%       0.69    99.4%     100.0%
Aggr. Ω     21.66  92.9%       29.51  90.7%       0.58    90.7%     100.0%
Nat.        22.15  86.0%       31.10  86.6%       0.37    30.4%     77.7%

Table 6.3: Attack results. r is the rank of collisions among candidates. Gold denotes the ground truth.

Extractive summarization. We use the CNN / DailyMail (CNNDM) dataset [68], which consists of news articles and labeled overview highlights. We attack the first 1,000 articles from the validation set.

Our target model is PreSumm [116]. Given a text x_d, PreSumm first obtains a vector representation φ_i ∈ R^h for each sentence x_i using BERT, and scores each sentence x_i in the text as S(x_d, x_i) = sigmoid(u^⊤ f(φ_1, ..., φ_T)_i), where u is a weight vector, f is a sentence-level transformer, and f(·)_i is the ith sentence's contextualized representation. Our objective is to insert a collision c into x_d such that the rank of S(x_d, c) among all sentences is high. We use the published models⁵ for evaluation.

⁵https://github.com/nlpyang/PreSumm
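A sketch of this scoring step, with the sentence-level transformer abstracted as a callable; the names and shapes are ours, not PreSumm's code.

    import torch

    def extractive_scores(sentence_vectors, sentence_transformer, u):
        # phi_1..phi_T -> contextualized sentence representations f(phi)_i,
        # then S(x_d, x_i) = sigmoid(u^T f(phi)_i) for each sentence.
        contextualized = sentence_transformer(sentence_vectors)   # (T, h)
        return torch.sigmoid(contextualized @ u)

    # Toy usage with an identity "transformer" standing in for f.
    T, h = 8, 768
    scores = extractive_scores(torch.randn(T, h), lambda v: v, torch.randn(h))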

6.3.2 Attack Results

For all attacks, we report the similarity score S between x and c; the "gold" baseline is the similarity between x and the ground truth. For MRPC, QQP, Chat, and CNNDM, the ground truth is the annotated label sentences (e.g., paraphrases or summaries); for Core17/18, we use the sentences with the highest similarity S to the query. For MRPC and QQP, we also report the percentage of successful collisions with S > 0.5. For Core17/18, we report the percentage of irrelevant articles ranking in the top 10 and top 100 after inserting collisions. For Chat, we report the percentage of collisions achieving the top-1 rank. For CNNDM, we report the percentage of collisions with top-1 and top-3 ranks (likely to be selected as the summary). Table 6.3 shows the results.

On MRPC, aggressive and natural collisions achieve around 98% success; aggressive ones have higher similarity S. With regularization Ω, the success rate drops to 81%. On QQP, aggressive collisions achieve 97% vs. 90% for constrained collisions.

On Core17/18, aggressive collisions shift the rank of almost half of the irrelevant articles into the top 10. Regularized and natural collisions are less effective, but more than 60% are still ranked in the top 100. Note that query topics are compact phrases with narrow semantics; thus it might be harder to find constrained collisions for them.

On Chat, aggressive collisions achieve rank of 1 more than 99% of the time for both Bi- and Poly-encoders. With regularization Ω, success drops slightly to above 90%. Natural collisions are less successful, with 86% ranked as 1.

On CNNDM, aggressive collisions are almost always ranked as the top summarizing sentence. HotFlip and regularized collisions are in the top 3 more than 96% of the time. Natural collisions perform worse, with 77% ranked in the top 3.

c type      MRPC      QQP       Core      Chat      CNNDM
            FBERT     FBERT     PBERT     PBERT     FBERT
Gold         0.66      0.68      0.17      0.14      0.38
Aggr.       -0.22     -0.17     -0.34     -0.31     -0.31
Aggr. Ω     -0.34     -0.34     -0.48     -0.43     -0.36
Nat.        -0.12     -0.09     -0.11     -0.10     -0.25

Table 6.4: BERTSCORE between collisions and target inputs. Gold denotes the ground truth.

Aggressive collisions always beat HotFlip on all tasks; constrained collisions are often better, too. The similarity scores S for aggressive collisions are always higher than for the ground truth.

6.3.3 Evaluating Unrelatedness

We use BERTSCORE [195] to demonstrate that our collisions are unrelated to the target inputs. Instead of exact matches in raw texts, BERTSCORE computes a semantic similarity score, ranging from -1 to 1, between a candidate and a reference by using contextualized representations for each token in the candidate and reference.
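For reference, the publicly available bert-score package computes these scores as follows; this usage sketch is our assumption about tooling, not necessarily the exact evaluation script used, and the example strings are drawn from Table 6.1.

    from bert_score import score

    candidates = ["he might actually work when those in"]   # a natural collision
    references = ["Does cannabis oil cure cancer? Or are the sellers hoaxing?"]
    P, R, F1 = score(candidates, references, lang="en", rescale_with_baseline=True)
    print(F1.item())   # low or negative values indicate unrelated texts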

The baseline for comparison is the BERTSCORE between the target input and the ground truth. For MRPC and QQP, we use x as the reference; the ground truth is the given paraphrases. For Core17/18, we use x concatenated with the top sentences, except the one with the highest S, as the reference; the ground truth is the sentence in the corpus with the highest S. For Chat, we use the dialogue contexts as the reference and the labeled response as the ground truth. For CNNDM, we use the labeled summarizing sentences in articles as the reference and the given abstractive summarization as the ground truth.

c type      MRPC                 Chat
            BERT     RoBERTa     Bi→Poly    Poly→Bi
HF          34.0%    0.0%        55.3%      48.9%
Aggr.       64.5%    0.0%        77.4%      71.3%
Aggr. Ω     38.9%    0.0%        60.5%      56.0%
Nat.        41.4%    0.0%        71.4%      68.2%

Table 6.5: Percentage of successfully transferred collisions for MRPC and Chat.

For MRPC, QQP, and CNNDM, we report the FBERT (F1) score. For Core17/18 and Chat, we report PBERT (content from the reference found in the candidate) because the references are longer and not token-wise equivalent to the collisions or the ground truth. Table 6.4 shows the results. The scores for collisions are all negative while the scores for the ground truth are positive, indicating that our collisions are unrelated to the target inputs. Since aggressive and regularized collisions are nonsensical, their contextualized representations are less similar to the reference texts than those of natural collisions.

6.3.4 Transferability of Collisions

To evaluate whether collisions generated for one target model f are effective against a different model f′, we use the MRPC and Chat datasets. For MRPC, we set f′ to a BERT base model trained with a different random seed and to a RoBERTa model. For Chat, we use the Poly-encoder as f′ for the Bi-encoder f, and vice versa. Both the Poly-encoder and the Bi-encoder are fine-tuned from the same pretrained transformer model. We report the percentage of successfully transferred attacks, e.g., S(x, c) > 0.5 for MRPC and r = 1 for Chat.

[Figure 6.1: one histogram per task (MRPC, QQP, Core17/18, Chat, CNNDM), comparing real data, aggressive collisions, aggressive collisions with Ω, and natural collisions.]

Figure 6.1: Histograms of entropy (log perplexity) evaluated by GPT-2 on real data and collisions.

Table 6.5 summarizes the results. All collisions achieve some transferability (40% to 70%) if the model architecture is the same and f, f′ are fine-tuned from the same pretrained model. Furthermore, our attacks produce more transferable collisions than the HotFlip baseline. No attacks transfer if f and f′ are fine-tuned from different pretrained models (BERT and RoBERTa). We leave a study of the transferability of collisions across different types of pretrained models to future work.

6.4 Mitigation

Perplexity-based filtering. Because our collisions are synthetic rather than human-generated texts, it is possible that their perplexity under a language model (LM) is higher than that of real text. Therefore, one plausible mitigation is to filter out collisions by setting a threshold on LM perplexity.
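A minimal sketch of such a filter using GPT-2 from the Hugging Face Transformers library; the library choice and the threshold value are our assumptions, and in practice the threshold would be tuned on held-out data.

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def log_perplexity(text):
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            out = model(ids, labels=ids)       # loss is the mean token NLL
        return out.loss.item()

    THRESHOLD = 6.0                            # placeholder value
    def looks_like_collision(text):
        return log_perplexity(text) > THRESHOLD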

c type      MRPC                QQP                 Core17/18
            FP@90    FP@80      FP@90    FP@80      FP@90    FP@80
HF           2.1%     0.8%       3.1%     1.2%       4.6%     1.2%
Aggr.        0.0%     0.0%       0.0%     0.0%       0.8%     0.7%
Aggr. Ω     47.5%    35.6%      15.8%    11.9%      29.3%    17.8%
Nat.        94.9%    89.2%      20.5%    12.1%      13.7%    10.9%

c type      Chat                CNNDM
            FP@90    FP@80      FP@90    FP@80
HF           1.5%     0.8%       3.2%     3.1%
Aggr.        5.2%     2.6%       3.1%     3.1%
Aggr. Ω     76.5%    65.3%      52.8%    35.7%
Nat.        93.8%    86.5%      59.8%    37.7%

Table 6.6: Effectiveness of perplexity-based filtering. FP@90 and FP@80 are false positive rates (percentage of real data mistakenly filtered out) at thresholds that filter out 90% and 80% of collisions, respectively.

Figure 6.1 shows perplexity measured using GPT-2 [152] for real data and collisions for each of our attacks. We observe a gap between the distributions of real data and aggressive collisions, showing that it might be possible to find a threshold that discards aggressive collisions while retaining the bulk of the real data. On the other hand, constrained collisions (regularized or natural) overlap with the real data.

We quantitatively measure the effectiveness of perplexity-based filtering using thresholds that would discard 80% and 90% of collisions, respectively. Table 6.6 shows the false positive rate, i.e., the fraction of the real data that would be mistakenly filtered out. Both HotFlip and aggressive collisions can be filtered out with little to no false positives since both are nonsensical. For regularized or natural collisions, a substantial fraction of the real data would be lost, while 10% or 20% of collisions evade filtering. On MRPC and Chat, perplexity-based filtering is least effective, discarding around 85% to 90% of the real data.

Learning-based filtering. Recent works explored automatic detection of generated texts using a binary classifier trained on human-written and machine-generated data [77, 188]. These classifiers might be able to filter out our collisions, assuming that the adversary is not aware of the defense.

As a general evaluation principle [26], any defense mechanism should assume that the adversary has complete knowledge of how the defense works. In our case, a stronger adversary may use the detection model to craft collisions to evade the filtering. We leave a thorough evaluation of these defenses to future work.

Adversarial training. Including adversarial examples during training can be effective against inference-time attacks [128]. Similarly, training with collisions might increase models’ robustness against collisions. Generating collisions for each training example in each epoch can be very inefficient, however, because it requires additional search on top of gradient optimization. We leave adversarial training to future work.

6.5 Related Work

Adversarial examples in NLP. Most of the previously studied adversarial attacks in NLP aim to minimally modify or perturb inputs while changing the model's output. [73] showed that perturbations, such as inserting dots or spaces between characters, can deceive a toxic comment classifier. HotFlip used gradients to find such perturbations given white-box access to the target model [48].

[176] extended HotFlip by inserting a short crafted "trigger" text into any input as a perturbation; the trigger words are often highly associated with the target class label. Other approaches are based on rules, heuristics, or generative models [80, 129, 156, 197]. As explained in Section 6.1, our goal is the inverse of adversarial examples: we aim to generate inputs with drastically different semantics that are perceived as similar by the model.

Several works studied attacks that change the semantics of inputs. [83] showed that inserting a heuristically crafted sentence into a paragraph can trick a question answering (QA) system into picking the answer from the inserted sentence. Aggressively perturbed texts based on HotFlip are nonsensical and can be translated into meaningful and malicious outputs by black-box translation systems [177]. Our semantic collisions extend the idea of changing input semantics to a different class of NLP models; we design new gradient-based approaches that are not perturbation-based and are more effective than HotFlip attacks; and, in addition to nonsensical adversarial texts, we show how to generate "natural" collisions that evade perplexity-based defenses.

Feature collisions in computer vision. Feature collisions have been studied in image analysis models. [81] showed that images from different classes can end up with identical representations due to the excessive invariance of deep models. An adversary can modify the input to change its class while leaving the model's prediction unaffected [82]. An intrinsic property of the rectifier activation function can cause images with different labels to have the same feature vectors [110].

6.6 Conclusion

We demonstrated a new class of vulnerabilities in NLP applications: semantic collisions, i.e., input pairs that are unrelated to each other but perceived by the application as semantically similar. We developed gradient-based search algorithms for generating collisions and showed how to incorporate constraints that help generate more "natural" collisions. We evaluated the effectiveness of our attacks on state-of-the-art models for paraphrase identification, document and sentence retrieval, and extractive summarization. We also demonstrated that simple perplexity-based filtering is not sufficient to mitigate our attacks, motivating future research on more effective defenses.

CHAPTER 7
CONCLUSION

In this dissertation, we introduced new threats to the security and privacy of machine learning systems in different contexts. We first described a malicious ML provider who supplies training code to force an ML model into intentionally "memorizing" sensitive training data, and later extracts the memorized information from the model's parameters or predictions. To help enforce data-protection regulations in practice, we then designed practical auditing techniques based on membership inference for detecting unauthorized data collection. We next presented the overlearning phenomenon: deep representations learned for simple objectives are useful for inferring sensitive and uncorrelated information. We found that overlearning might be an intrinsic issue, which not only leads to privacy leakage but also raises a challenge for regulations that try to control the purpose of ML. Finally, we identified a new class of vulnerabilities in natural language processing models for measuring semantic similarity, and demonstrated attacks that generate semantically unrelated texts that are judged as relevant by these models.

The works presented in this dissertation are by no means a comprehensive coverage of all possible threats to ML systems. Many other attack vectors arise as ML evolves. As ML becomes a commodity in which the training data, algorithms, and even model predictions can be expensive, adversaries have an incentive to steal a deployed ML model for their own use without the additional cost of data collection or training [98, 170]. There are also threats to ML models' availability, where adversaries craft inputs that slow down the model's decision-making process and increase the energy consumption of ML predictions [162]. In addition, there are other compliance issues, e.g., how do ML models comply with the "right to be forgotten" under Article 17 of GDPR [172]? Removing users' information from a trained ML model requires careful implementation and is still an active area of research [22, 25, 60].

We view the outcome of this dissertation as a supportive step toward building secure and private ML systems. The techniques we built are not only for breaking ML but also for detecting and measuring potential threats. These threat measurements complement conventional ML performance metrics (e.g., accuracy) and help ML practitioners better understand where their ML models could go wrong. We also hope that this dissertation provides insights for building better defense mechanisms, and we advocate that the ML community measure all aspects of its systems instead of relying on predictive power and accuracy metrics alone when applying ML in practice.

APPENDIX A
APPENDIX FOR CHAPTER 6

Tables A.1, A.2, A.3, and A.4 show additional collision examples for MRPC/QQP, Core17/18, Chat, and CNNDM, respectively.

MRPC/QQP target inputs and collisions (with model outputs):

MRPC input (x): PCCW 's chief operating officer, Mike Butcher, and Alex Arena, the chief financial officer, will report directly to Mr So.
  Aggressive (c) [output 99.5%]: primera metaphysical declaration dung southernmost among structurally favorably endeavor from superior morphology indirectly materialized yesterday sorority would indirectly sg h
  Regularized aggressive (c) [output 81.6%]: in one time rave rave — in . . . " in but . . . rv rv smacked a a of a a a a a a a a a of a a
  Natural (c) [output 81.7%]: in 1989 and joined the new york giants in 1990

MRPC input (x): Under terms of the deal, Legato stockholders will receive 0.9 of a share of EMC common stock for each share of Legato stock.
  Aggressive (c) [output 96.7%]: moreover author elk telling assert honest exact inventions locally mythical confirms newer feat said assert according locally prefecture municipal realization
  Regularized aggressive (c) [output 95.0%]: in new " news lust release " on connected different " vibe " reassure females and and to to and and to and to and to and to and to
  Natural (c) [output 83.4%]: she is also a member of the united states house of representatives, serving as a representative

QQP input (x): How can I slowly lose weight?
  Aggressive (c) [output 80.5%]: sustain fitness recover bru become bolst Enhanced additional distinguished contend crunch Cutting Vital Time cov
  Regularized aggressive (c) [output 85.2%]: fat Ensure burner www Enhancement Lar Cure Dou St Reaper of of of of of a to and to the the the and to to to of of a of
  Natural (c) [output 80.2%]: be able that in less long time it

Table A.1: Collision examples for MRPC and QQP. Outputs are the probability scores produced by the model for whether the input and the collisions are paraphrases.

Core17/18 query inputs and collisions (r is the resulting rank):

Query (x): abuses of e-mail
  Aggressive (c) [r = 1]: trailing helsinki, competent regimes internally outlaw wireless offence road : cables by nhs sided head lockheed ford announce oblast million offenders climb ranged postal courier administrations courtesy guangdong oracle
  Regularized aggressive (c) [r = 1]: un / australia overthrow " — of most telegraph telegraph operations " : the state office in consensus in document lifts down us " by trial " for using ¡ the a and a to and a and a to the a to a a to to a a and a a and a a a the a to to
  Natural (c) [r = 3]: the itc ordered all wireless posts confiscated and usps were stripped of their offices and property, leading to a number of

Query (x): heroic acts
  Aggressive (c) [r = 1]: colossal helmet vedic bro axes resembling neighbours lead floods blacksmith : evening eligibility caller indicates sculptor coroner lakshmi' than lama announced seizure branded, crafts informing nottinghamshire watch commission.
  Regularized aggressive (c) [r = 1]: recorded health and human execution followed, applause prompted, support increased extended : thayer and some there danger, while frank teammate followed feat of personal injury injuries of a the a of the a of the the of of the and of of of of and of of of of and of and of of of the
  Natural (c) [r = 11]: the american fighter ( 1 november 1863 ; kia for his feat ) — the japanese ship carrying the cargo of wheat from australia to sydney

Query (x): cult lifestyles
  Aggressive (c) [r = 1]: indiana - semiconductor cut and radiating fire damage, domain reproductive nighttime pastoral calendar failing critical soils indicates force practice ritual belarus stall ; cochin sabha fragmented nut dominance owing
  Regularized aggressive (c) [r = 2]: preferred fruits, as willow, suggested to botanist ro spike'for resident nursery : big spreads of pipe rolls and other european pie, a long season at the a and a a and the and of of and of the a of and of of and of of and of of of of and of of the
  Natural (c) [r = 15]: the early 1980s their appeal soared : during in los angeles ( 1993 ), a large number of teenagers went to church to confess their connection to the

Query (x): art, stolen, forged
  Aggressive (c) [r = 1]: colossal helmet vedic bro axes resembling neighbours lead floods blacksmith : evening eligibility caller indicates sculptor coroner lakshmi' than lama announced seizure branded, crafts informing nottinghamshire watch commission
  Regularized aggressive (c) [r = 3]: - house and later car dead with prosecutors remaining : " and cathedral gallery ' import found won british arrest prosecution a a portrait or mural ( patron at from the the to the a and a to the a and to the a to the of a and to the the and to the to the a and a
  Natural (c) [r = 8]: the work which left its owner by a mishandle - the royal academy's chief judge inquest

Table A.2: Collision examples for Core17/18. r are the ranks of irrelevant articles after inserting the collisions.

Chat target inputs and collisions (r is the rank of the collision among candidate responses):

Context (x): i'm 33 and love giving back i grew up poor. i did too , back during the great depression.
  Aggressive (c) [r = 1]: that to existed with and that is with cope warlord s s came the on
  Regularized aggressive (c) [r = 1]: camps wii also until neutral in later addiction and the the the the of to and the the the of to and to the the
  Natural (c) [r = 1]: was the same side of abject warfare that had followed then for most people in this long

Context (x): i am a male . i have a children and a dogs . hey there how is it going ?
  Aggressive (c) [r = 1]: is frantically in to it programs s junior falls of it s talking a juan
  Regularized aggressive (c) [r = 1]: in on from the it department with gabrielle and the the and a and a a to a a and of and of and of
  Natural (c) [r = 1]: as of this point, and in the meantime it's having very technical support : it employs

Context (x): hi ! how are you doing today ? great , just ate pizza my favorite . . and you ? that's not as good as shawarma
  Aggressive (c) [r = 1]: safer to eat that and was mickey in a cut too on it s foreigner
  Regularized aggressive (c) [r = 1]: dipped in in kai tak instead of that and the the a of a of a to to the to and a a of a
  Natural (c) [r = 1]: not as impressive, its artistic production provided an environment

Table A.3: Collision examples for Chat. r are the ranks of collisions among the candidate responses.

CNNDM ground truth and collisions (r is the rank of the collision among all sentences):

Truth: zayn malik is leaving one direction . rumors about such a move had started since malik left the band 's tour last week .
  Aggressive (c) [r = 1]: bp interest yd offering funded fit literacy 2020 can propose amir pau laureate conservation
  Regularized aggressive (c) [r = 1]: the are shortlisted to compete 14 times zealand in in the 2015 zealand artist yo a to to to to to to to to to to to to to to
  Natural (c) [r = 1]: an estimated $2 billion by 2014 ; however estimates suggest only around 20 percent are being funded from

Truth: she says sometimes his attacks are so violent, she's had to call the police to come and save her.
  Aggressive (c) [r = 1]: bwf special editor councils want qc iec melinda rey marry selma iec qc disease translated
  Regularized aggressive (c) [r = 1]: poll is in 2012 eight percent b dj dj dj coco behaviors in dj coco and a a to of to to to the a a to the to a
  Natural (c) [r = 1]: first national strike since world war ii occurred between january 13 – 15 2014 ; this date will occur

Table A.4: Collision examples for CNNDM. Truth are the true summarizing sentences. r are the ranks of collisions among all sentences in the news articles.

137 BIBLIOGRAPHY

[1] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis,

Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. TensorFlow: A system for large-scale machine learn- ing. In OSDI, 2016.

[2] Martín Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya

Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In CCS, 2016.

[3] Nasreen Abdul-jaleel, James Allan, W Bruce Croft, O Diaz, Leah Larkey, Xiaoyan Li, Mark D Smucker, and Courtney Wade. UMass at TREC 2004:

Novelty and HARD. In TREC, 2004.

[4] Philip Adler, Casey Falk, Sorelle A Friedler, Tionney Nix, Gabriel Ry- beck, Carlos Scheidegger, Brandon Smith, and Suresh Venkatasubrama- nian. Auditing black-box models for indirect influence. KAIS, 54(1):95–

122, 2018.

[5] Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, and Kevin Murphy. Deep variational information bottleneck. In ICLR, 2017.

[6] Algorithmia. https://algorithmia.com, 2017.

[7] Amazon Alexa. https://developer.amazon.com/en-US/alexa/, 2020.

[8] Amazon Machine Learning. https://aws.amazon.com/

machine-learning, 2017.

[9] Apple Siri. https://www.apple.com/siri/, 2020.

[10] Devansh Arpit, Stanislaw Jastrzebski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, et al. A closer look at memorization in deep

networks. In ICML, 2017.

[11] Giuseppe Ateniese, Luigi V Mancini, Angelo Spognardi, Antonio Villani, Domenico Vitali, and Giovanni Felici. Hacking smart machines with smarter ones: How to extract meaningful data from machine learning

classifiers. IJSN, 10(3):137–150, 2015.

[12] Hossein Azizpour, Ali Sharif Razavian, Josephine Sullivan, Atsuto Maki, and Stefan Carlsson. From generic to specific deep representations for visual recognition. In CVPR Workshops, 2015.

[13] Michael Backes, Pascal Berrang, Mathias Humbert, and Praveen Manoha-

ran. Membership privacy in MicroRNA-based studies. In CCS, 2016.

[14] Andrew Baumann, Marcus Peinado, and Galen Hunt. Shielding applica- tions from an untrusted cloud with haven. TOCS, 33(3):8, 2015.

[15] BBC. Google DeepMind NHS app test broke UK privacy law. https: //www.bbc.com/news/technology-40483202, 2017.

[16] Yonatan Belinkov and Yonatan Bisk. Synthetic and natural noise both

break neural machine translation. In ICLR, 2018.

[17] Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. PAMI, 2013.

[18] Battista Biggio, Blaine Nelson, and Pavel Laskov. Poisoning attacks against support vector machines. In ICML, 2012.

[19] Dan Bogdanov, Margus Niitsoo, Tomas Toft, and Jan Willemson. High-performance secure multi-party computation for data mining applications. IJIS, 11(6):403–418, 2012.

[20] Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard

Firner, Beat Flepp, Prasoon Goyal, Lawrence D Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, et al. End to end learning for self-driving cars. arXiv:1604.07316, 2016.

[21] Raphael Bost, Raluca Ada Popa, Stephen Tu, and Shafi Goldwasser. Ma-

chine learning classification over encrypted data. In NDSS, 2015.

[22] Lucas Bourtoule, Varun Chandrasekaran, Christopher Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Paper- not. Machine unlearning. In S& P, 2021.

[23] Michael Brennan, Sadia Afroz, and Rachel Greenstadt. Adversarial sty-

lometry: Circumventing authorship recognition to preserve privacy and anonymity. TISSEC, 15(3):12, 2012.

[24] Cristian Bucila,˘ Rich Caruana, and Alexandru Niculescu-Mizil. Model compression. In KDD, 2006.

[25] Yinzhi Cao and Junfeng Yang. Towards making systems forget with ma-

chine unlearning. In S& P, 2015.

[26] Nicholas Carlini, Anish Athalye, Nicolas Papernot, Wieland Brendel, Jonas Rauber, Dimitris Tsipras, Ian Goodfellow, Aleksander Madry, and Alexey Kurakin. On evaluating adversarial robustness. arXiv preprint

arXiv:1902.06705, 2019.

[27] Nicholas Carlini, Chang Liu, Jernej Kos, Úlfar Erlingsson, and Dawn Song. The Secret Sharer: Measuring unintended neural network memorization & extracting secrets. arXiv:1802.08232, 2018.

[28] Tian Qi Chen, Xuechen Li, Roger B Grosse, and David K Duvenaud.

Isolating sources of disentanglement in variational autoencoders. In NeurIPS, 2018.

[29] Wenlin Chen, James Wilson, Stephen Tyree, Kilian Q Weinberger, and Yixin Chen. Compressing convolutional neural networks in the frequency

domain. In KDD, 2016.

[30] Jianfeng Chi, Emmanuel Owusu, Xuwang Yin, Tong Yu, William Chan, Patrick Tague, and Yuan Tian. Privacy partitioning: Protecting user data during the deep learning inference phase. arXiv:1812.02863, 2018.

[31] Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bah-

danau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder–decoder for statistical ma- chine translation. In EMNLP, 2014.

[32] Chris Clifton, Murat Kantarcioglu, Jaideep Vaidya, Xiaodong Lin, and

Michael Y Zhu. Tools for privacy preserving distributed data mining. ACM SIGKDD Explorations Newsletter, 4(2):28–34, 2002.

[33] Maximin Coavoux, Shashi Narayan, and Shay B. Cohen. Privacy- preserving neural representations of text. In EMNLP, 2018.

[34] Cortana - Your personal productivity assistant. https://www.

microsoft.com/en-us/cortana/, 2020.

141 [35] Paul Covington, Jay Adams, and Emre Sargin. Deep neural networks for youtube recommendations. In RecSys, 2016.

[36] Nilesh Dalvi, Pedro Domingos, Mausam, Sumit Sanghai, and Deepak Verma. Adversarial classification. In KDD, 2004.

[37] Cristian Danescu-Niculescu-Mizil and Lillian Lee. Chameleons in imagined conversations: A new approach to understanding coordination of linguistic style in dialogs. In Workshop on Cognitive Modeling and Computational Linguistics, ACL, 2011.

[38] Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, and Rosanne Liu. Plug and play language models: a simple approach to controlled text generation. In ICLR, 2020.

[39] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.

[40] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL, 2019.

[41] Tien Tuan Anh Dinh, Prateek Saxena, Ee-Chien Chang, Beng Chin Ooi, and Chunwang Zhang. M2R: Enabling stronger privacy in MapReduce computation. In USENIX Security, 2015.

[42] William B Dolan and Chris Brockett. Automatically constructing a corpus of sentential paraphrases. In International Workshop on Paraphrasing, 2005.

[43] Alexey Dosovitskiy and Thomas Brox. Generating images with perceptual similarity metrics based on deep networks. In NeurIPS, 2016.

[44] Wenliang Du, Yunghsiang S Han, and Shigang Chen. Privacy-preserving multivariate statistical analysis: Linear regression and classification. In ICDM, 2004.

[45] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. JMLR, 12(Jul):2121–2159, 2011.

[46] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In TCC, 2006.

[47] Cynthia Dwork, Adam Smith, Thomas Steinke, Jonathan Ullman, and Salil Vadhan. Robust traceability from trace amounts. In FOCS, 2015.

[48] Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. HotFlip: White-box adversarial examples for text classification. In ACL, 2018.

[49] Harrison Edwards and Amos J. Storkey. Censoring representations with an adversary. In ICLR, 2016.

[50] Yanai Elazar and Yoav Goldberg. Adversarial removal of demographic attributes from text data. In EMNLP, 2018.

[51] FaceScrub. http://vintage.winklerbros.net/facescrub.html, 2014.

[52] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. LIBLINEAR: A library for large linear classification. JMLR, 9(Aug):1871–1874, 2008.

[53] Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. In CCS, 2015.

[54] Matthew Fredrikson, Eric Lantz, Somesh Jha, Simon Lin, David Page, and Thomas Ristenpart. Privacy in pharmacogenetics: An end-to-end case study of personalized Warfarin dosing. In USENIX Security, 2014.

[55] Carlos A Gomez-Uribe and Neil Hunt. The Netflix recommender system: Algorithms, business value, and innovation. TMIS, 2015.

[56] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NeurIPS, 2014.

[57] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.

[58] Google Cloud Prediction API, 2017.

[59] John Graham-Cumming. How to beat an adaptive spam filter. In MIT Spam Conference, 2004.

[60] Chuan Guo, Tom Goldstein, Awni Hannun, and Laurens van der Maaten. Certified data removal from machine learning models. In ICML, 2020.

[61] Jihun Hamm. Minimax filter: Learning to preserve privacy from inference attacks. JMLR, 18(129):1–31, 2017.

[62] Song Han, Huizi Mao, and William J Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In ICLR, 2016.

[63] Andrew Hard, Kanishka Rao, Rajiv Mathews, Swaroop Ramaswamy, Françoise Beaufays, Sean Augenstein, Hubert Eichner, Chloé Kiddon, and Daniel Ramage. Federated learning for mobile keyboard prediction. arXiv:1811.03604, 2018.

[64] Jamie Hayes, Luca Melis, George Danezis, and Emiliano De Cristofaro. LOGAN: Membership inference attacks against generative models. In PETS, 2019.

[65] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In CVPR, 2015.

[66] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.

[67] Heritage health prize. https://www.kaggle.com/c/hhp, 2012.

[68] Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. Teaching machines to read and comprehend. In NeurIPS, 2015.

[69] Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. beta-VAE: Learning basic visual concepts with a constrained variational framework. In ICLR, 2017.

[70] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

[71] Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration. In ICLR, 2020.

[72] Nils Homer, Szabolcs Szelinger, Margot Redman, David Duggan, Waibhav Tembe, Jill Muehling, John V. Pearson, Dietrich A. Stephan, Stanley F. Nelson, and David W. Craig. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLOS Genetics, 2008.

[73] Hossein Hosseini, Sreeram Kannan, Baosen Zhang, and Radha Poovendran. Deceiving Google's Perspective API built for detecting toxic comments. arXiv:1702.08138, 2017.

[74] Gary B. Huang, Manu Ramesh, Tamara Berg, and Erik Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst, October 2007.

[75] Minyoung Huh, Pulkit Agrawal, and Alexei A Efros. What makes ImageNet good for transfer learning? arXiv:1608.08614, 2016.

[76] Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, and Jason Weston. Poly-encoders: Transformer architectures and pre-training strategies for fast and accurate multi-sentence scoring. In ICLR, 2020.

[77] Daphne Ippolito, Daniel Duckworth, Douglas Eck, and Chris Callison-Burch. Automatic detection of generated text is easiest when humans are fooled. In ACL, 2020.

[78] Yusuke Iwasawa, Kotaro Nakayama, Ikuko Yairi, and Yutaka Matsuo. Privacy issues regarding the application of DNNs to activity-recognition using wearables and its countermeasures by use of adversarial training. In IJCAI, 2016.

[79] Shankar Iyer, Nikhil Dandekar, and Kornel Csernai. First Quora dataset release: Question pairs. https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs, 2017.

[80] Mohit Iyyer, John Wieting, Kevin Gimpel, and Luke Zettlemoyer. Adversarial example generation with syntactically controlled paraphrase networks. In NAACL, 2018.

[81] Joern-Henrik Jacobsen, Jens Behrmann, Richard Zemel, and Matthias Bethge. Excessive invariance causes adversarial vulnerability. In ICLR, 2019.

[82] Jörn-Henrik Jacobsen, Jens Behrmann, Nicholas Carlini, Florian Tramèr, and Nicolas Papernot. Exploiting excessive invariance caused by norm-bounded adversarial robustness. arXiv:1903.10484, 2019.

[83] Robin Jia and Percy Liang. Adversarial examples for evaluating reading comprehension systems. In EMNLP, 2017.

[84] Yiping Kang, Johann Hauswald, Cao Gao, Austin Rovinski, Trevor Mudge, Jason Mars, and Lingjia Tang. Neurosurgeon: Collaborative intelligence between the cloud and mobile edge. In ASPLOS, 2017.

[85] Anjuli Kannan, Karol Kurach, Sujith Ravi, Tobias Kaufmann, Andrew Tomkins, Balint Miklos, Greg Corrado, Laszlo Lukacs, Marina Ganea, Peter Young, et al. Smart Reply: Automated response suggestion for email. In KDD, 2016.

[86] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, and Rory Sayres. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). arXiv:1711.11279, 2017.

[87] Hyunjik Kim and Andriy Mnih. Disentangling by factorising. In ICML, 2018.

[88] Yoon Kim. Convolutional neural networks for sentence classification. In EMNLP, 2014.

[89] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.

[90] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.

[91] Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. arXiv:1312.6114, 2013.

[92] Marius Kloft and Pavel Laskov. Online anomaly detection under adversarial impact. In AISTATS, 2010.

[93] Philipp Koehn. Europarl: A parallel corpus for statistical machine translation. In MT Summit, volume 5, 2005.

[94] Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. In ICML, 2017.

[95] Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. Similarity of neural network representations revisited. In ICML, 2019.

[96] Satwik Kottur, Xiaoyu Wang, and Vitor R Carvalho. Exploring personalized neural conversational models. In IJCAI, 2017.

[97] Hugo Krawczyk, Ran Canetti, and Mihir Bellare. HMAC: Keyed-hashing for message authentication. https://tools.ietf.org/html/rfc2104, 1997.

[98] Kalpesh Krishna, Gaurav Singh Tomar, Ankur P Parikh, Nicolas Papernot, and Mohit Iyyer. Thieves on Sesame Street! Model extraction of BERT-based APIs. In ICLR, 2020.

[99] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.

[100] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In NeurIPS, 2012.

[101] Abhishek Kumar, Prasanna Sattigeri, and Avinash Balakrishnan. Variational inference of disentangled latent concepts from unlabeled observations. In ICLR, 2018.

[102] Neeraj Kumar, Alexander C Berg, Peter N Belhumeur, and Shree K Nayar. Attribute and simile classifiers for face verification. In ICCV, 2009.

[103] Shibamouli Lahiri. Complexity of word collocation networks: A preliminary structural analysis. In Proc. Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics, 2014.

[104] Nicholas D Lane and Petko Georgiev. Can deep learning revolutionize mobile sensing? In HotMobile, 2015.

[105] Ken Lang. NewsWeeder: Learning to filter netnews. In ICML, 1995.

[106] Gary B. Huang and Erik Learned-Miller. Labeled faces in the wild: Updates and new reporting procedures. Technical Report UM-CS-2014-003, University of Massachusetts, Amherst, May 2014.

[107] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.

[108] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proc. IEEE, 86(11):2278–2324, 1998.

[109] Jiwei Li, Michel Galley, Chris Brockett, Georgios Spithourakis, Jianfeng Gao, and Bill Dolan. A persona-based neural conversation model. In ACL, 2016.

[110] Ke Li, Tianhao Zhang, and Jitendra Malik. Approximate feature collisions in neural nets. In NeurIPS, 2019.

[111] Yitong Li, Timothy Baldwin, and Trevor Cohn. Towards robust and privacy-preserving text representations. In ACL, 2018.

[112] Yuanzhi Li and Yingyu Liang. Learning overparameterized neural networks via stochastic gradient descent on structured data. In NeurIPS, 2018.

[113] Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, and Wenchang Shi. Deep text classification can be fooled. In IJCAI, 2018.

[114] Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, and Yoshua Bengio. Neural networks with few multiplications. In ICLR, 2016.

[115] Yehuda Lindell and Benny Pinkas. Privacy preserving data mining. Journal of Cryptology, 15(3), 2002.

[116] Yang Liu and Mirella Lapata. Text summarization with pretrained encoders. In EMNLP-IJCNLP, 2019.

[117] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692, 2019.

[118] Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Raetsch, Sylvain Gelly, Bernhard Schölkopf, and Olivier Bachem. Challenging common assumptions in the unsupervised learning of disentangled representations. In ICML, 2019.

[119] Yunhui Long, Vincent Bindschaedler, Lei Wang, Diyue Bu, Xiaofeng Wang, Haixu Tang, Carl A Gunter, and Kai Chen. Understanding membership inferences on well-generalized learning models. arXiv:1802.04889, 2018.

[120] Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, and Richard Zemel. The variational fair autoencoder. In ICLR, 2016.

[121] Daniel Lowd. Good word attacks on statistical spam filters. In CEAS, 2005.

[122] Daniel Lowd and Christopher Meek. Adversarial learning. In KDD, 2005.

[123] Ryan Lowe, Nissan Pow, Iulian V Serban, and Joelle Pineau. The Ubuntu dialogue corpus: A large dataset for research in unstructured multi-turn dialogue systems. In SIGDIAL, 2015.

[124] Thang Luong, Michael Kayser, and Christopher D Manning. Deep neural language models for machine translation. In CoNLL, 2015.

[125] Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Proc. 49th Annual Meeting of the ACL: Human Language Technologies, 2011.

[126] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. JMLR, 9(Nov):2579–2605, 2008.

[127] David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. Learning adversarially fair and transferable representations. In ICML, 2018.

[128] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018.

[129] Taylor Mahler, Willy Cheung, Micha Elsner, David King, Marie-Catherine de Marneffe, Cory Shain, Symon Stevens-Guille, and Michael White. Breaking NLP: Using morphosyntax, semantics, pragmatics and world knowledge to fool sentiment analysis systems. In Workshop on Building Linguistically Generalizable NLP Systems, 2017.

[130] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Agüera y Arcas. Communication-efficient learning of deep networks from decentralized data. In AISTATS, 2017.

[131] H Brendan McMahan, Daniel Ramage, Kunal Talwar, and Li Zhang. Learning differentially private language models without losing accuracy. arXiv:1710.06963, 2017.

[132] Luca Melis, Congzheng Song, Emiliano De Cristofaro, and Vitaly Shmatikov. Exploiting unintended feature leakage in collaborative learning. In S&P, 2019.

[133] Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. Pointer sentinel mixture models. In ICLR, 2017.

[134] Paul Michel, Xian Li, Graham Neubig, and Juan Pino. On evaluation of adversarial perturbations for sequence-to-sequence models. In NAACL, 2019.

[135] Paul Michel and Graham Neubig. Extreme adaptation for personalized neural machine translation. arXiv:1805.01817, 2018.

[136] Microsoft Azure Machine Learning. https://azure.microsoft.com/en-us/services/machine-learning, 2017.

[137] Ari S Morcos, David GT Barrett, Neil C Rabinowitz, and Matthew Botvinick. On the importance of single directions for generalization. arXiv:1803.06959, 2018.

[138] Daniel Moyer, Shuyang Gao, Rob Brekelmans, Aram Galstyan, and Greg Ver Steeg. Invariant representations without adversarial training. In NeurIPS, 2018.

[139] Milad Nasr, Reza Shokri, and Amir Houmansadr. Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In S&P, 2019.

[140] James Newsome, Brad Karp, and Dawn Song. Paragraph: Thwarting signature learning by training maliciously. In RAID, 2006.

[141] Hong-Wei Ng and Stefan Winkler. A data-driven approach to cleaning large face datasets. In ICIP, 2014.

[142] Anh Nguyen, Alexey Dosovitskiy, Jason Yosinski, Thomas Brox, and Jeff Clune. Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. In NeurIPS, 2016.

[143] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, New York, 2nd edition, 2006.

[144] Olga Ohrimenko, Felix Schuster, Cédric Fournet, Aastha Mehta, Sebastian Nowozin, Kapil Vaswani, and Manuel Costa. Oblivious multi-party machine learning on trusted processors. In USENIX Security, 2016.

[145] Seyed Ali Osia, Ali Taheri, Ali Shahin Shamsabadi, Minos Katevas, Hamed Haddadi, and Hamid R. R. Rabiee. Deep private-feature extraction. TKDE, 2018.

[146] Bijeeta Pal and Shruti Tople. To transfer or not to transfer: Misclassification attacks against transfer learned text classifiers. arXiv:2001.02438, 2020.

[147] Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proc. ACL, 2005.

[148] Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael Wellman. Towards the science of security and privacy in machine learning. arXiv:1611.03814, 2016.

[149] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In NeurIPS, 2019.

[150] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. JMLR, 2011.

[151] Piper project page. https://people.eecs.berkeley.edu/~nzhang/piper.html, 2015.

[152] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. OpenAI Blog, 2019.

[153] Md Atiqur Rahman, Tanzila Rahman, Robert Laganière, Noman Mohammed, and Yang Wang. Membership inference attack against differentially private deep learning model. Transactions on Data Privacy, 11(1):61–79, 2018.

[154] Francisco Rangel, Paolo Rosso, Ben Verhoeven, Walter Daelemans, Martin Potthast, and Benno Stein. Overview of the 4th author profiling task at PAN 2016: Cross-genre evaluations. In CEUR Workshop, 2016.

[155] Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. XNOR-Net: ImageNet classification using binary convolutional neural networks. In ECCV, 2016.

[156] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Semantically equivalent adversarial rules for debugging NLP models. In ACL, 2018.

[157] Benjamin IP Rubinstein, Blaine Nelson, Ling Huang, Anthony D Joseph, Shing-hon Lau, Satish Rao, Nina Taft, and JD Tygar. Antidote: Understanding and defending against poisoning of anomaly detectors. In IMC, 2009.

[158] Laura Scharff. Introducing question merging. https://www.quora.com/q/quora/Introducing-Question-Merging, 2015.

[159] Felix Schuster, Manuel Costa, Cédric Fournet, Christos Gkantsidis, Marcus Peinado, Gloria Mainar-Ruiz, and Mark Russinovich. VC3: Trustworthy data analytics in the cloud using SGX. In S&P, 2015.

[160] Reza Shokri and Vitaly Shmatikov. Privacy-preserving deep learning. In CCS, 2015.

[161] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In S&P, 2017.

[162] Ilia Shumailov, Yiren Zhao, Daniel Bates, Nicolas Papernot, Robert Mullins, and Ross Anderson. Sponge examples: Energy-latency attacks on neural networks. arXiv:2006.03463, 2020.

[163] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016.

[164] Patrice Y Simard, Dave Steinkraus, and John C Platt. Best practices for convolutional neural networks applied to visual document analysis. In ICDAR, 2003.

[165] Jiaming Song, Pratyusha Kalluri, Aditya Grover, Shengjia Zhao, and Stefano Ermon. Learning controllable fair representations. In AISTATS, 2019.

[166] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014.

[167] Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, and Lior Wolf. DeepFace: Closing the gap to human-level performance in face verification. In CVPR, 2014.

[168] Sarah Tan, Rich Caruana, Giles Hooker, and Yin Lou. Detecting bias in black-box models using transparent model distillation. arXiv:1710.06169, 2017.

[169] Florian Tramèr, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, Jean-Pierre Hubaux, Mathias Humbert, Ari Juels, and Huang Lin. FairTest: Discovering unwarranted associations in data-driven applications. In EuroS&P, 2017.

[170] Florian Tramèr, Fan Zhang, Ari Juels, Michael K Reiter, and Thomas Ristenpart. Stealing machine learning models via prediction APIs. In USENIX Security, 2016.

[171] Stacey Truex, Ling Liu, Mehmet Emre Gursoy, Lei Yu, and Wenqi Wei. Towards demystifying membership inference attacks. arXiv:1807.09173, 2018.

[172] European Union. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). Official Journal, L 119:1–88, 2016-05-04.

[173] UTKFace. http://aicip.eecs.utk.edu/wiki/UTKFace, 2017.

[174] Vladimir Vapnik. The Nature of Statistical Learning Theory. Springer Science & Business Media, 2013.

[175] Oriol Vinyals and Quoc Le. A neural conversational model. arXiv:1506.05869, 2015.

[176] Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, and Sameer Singh. Universal adversarial triggers for attacking and analyzing NLP. In EMNLP-IJCNLP, 2019.

[177] Eric Wallace, Mitchell Stern, and Dawn Song. Imitation attacks and defenses for black-box machine translation systems. arXiv:2004.15015, 2020.

[178] Ji Wang, Jianguo Zhang, Weidong Bao, Xiaomin Zhu, Bokai Cao, and Philip S Yu. Not just privacy: Improving performance of private deep learning in mobile cloud. In KDD, 2018.

[179] Yonghui Wu. Smart Compose: Using neural networks to help write emails. https://ai.googleblog.com/2018/05/smart-compose-using-neural-networks-to.html, 2018.

[180] Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144, 2016.

[181] Qizhe Xie, Zihang Dai, Yulun Du, Eduard H. Hovy, and Graham Neubig. Controllable invariance through adversarial feature learning. In NeurIPS, 2017.

[182] Wayne Xiong, Lingfeng Wu, Fil Alleva, Jasha Droppo, Xuedong Huang, and Andreas Stolcke. The Microsoft 2017 conversational speech recognition system. In ICASSP, 2018.

[183] Yelp Open Dataset. https://www.yelp.com/dataset, 2018.

[184] Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. In CSF, 2018.

[185] Zeynep Akkalyoncu Yilmaz, Shengjin Wang, Wei Yang, Haotian Zhang, and Jimmy Lin. Applying BERT to document retrieval with Birch. In EMNLP-IJCNLP, 2019.

[186] Zeynep Akkalyoncu Yilmaz, Wei Yang, Haotian Zhang, and Jimmy Lin. Cross-domain modeling of sentence-level evidence for document retrieval. In EMNLP-IJCNLP, 2019.

[187] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? In NeurIPS, 2014.

[188] Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, and Yejin Choi. Defending against neural fake news. In NeurIPS, 2019.

[189] Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In ICML, 2013.

[190] Yan Zhai, Lichao Yin, Jeffrey Chase, Thomas Ristenpart, and Michael Swift. CQSTR: Securing cross-tenant applications with cloud containers. In SoCC, 2016.

[191] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization. In ICLR, 2017.

[192] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization. In ICLR, 2017.

[193] Ning Zhang, Manohar Paluri, Yaniv Taigman, Rob Fergus, and Lubomir Bourdev. Beyond frontal faces: Improving person recognition using multiple cues. In CVPR, 2015.

[194] Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, and Jason Weston. Personalizing dialogue agents: I have a dog, do you have pets too? In ACL, 2018.

[195] Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q Weinberger, and Yoav Artzi. BERTScore: Evaluating text generation with BERT. In ICLR, 2020.

[196] Zhifei Zhang, Yang Song, and Hairong Qi. Age progression/regression by conditional adversarial autoencoder. In CVPR, 2017.

[197] Zhengli Zhao, Dheeru Dua, and Sameer Singh. Generating natural adversarial examples. In ICLR, 2018.
