Printable Format

SIGMOD/PODS 2003 Final Program
San Diego, California, June 8-13, 2003

Sunday, June 8th
8:30-17:30
DEBS Workshop (Windsor)
PKC50 Workshop (Hampton)
MPDS Workshop (Sheffield)
19:30-21:30 SIGMOD/PODS Welcoming Reception (Garden Salon 1)

Monday, June 9th
8:00-8:45 Continental Breakfast (Atlas Foyer)
8:45-10:00 PODS Session 1: Invited Talk (Hampton & Sheffield)
Chair: Tova Milo
E-services: A Look Behind the Curtain, Rick Hull
10:00-10:15 Morning Coffee Break (Atlas Foyer)
10:15-11:15 PODS Session 2: Award Papers (Hampton & Sheffield)
Chair: Catriel Beeri
Best Paper: An Information-Theoretic Approach to Normal Forms for Relational and XML Data, Marcelo Arenas, Leonid Libkin
Best Newcomer Paper: Algorithms for Data Migration with Cloning, Samir Khuller, Yoo-Ah Kim, Yung-Chun (Justin) Wan
11:15-11:30 Break
11:30-12:30 FCRC Plenary (Grand Ballroom)
Provably Unbreakable Encryption: Theory and Implementation, Michael Rabin (Harvard University)

SIGMOD Research Session 1: XML and Text (Windsor)
Chair: Divesh Srivastava
Querying Structured Text in an XML Database, Shurug Al-Khalifa, Cong Yu, H. V. Jagadish
XRANK: Ranked Keyword Search over XML Documents, Lin Guo, Feng Shao, Chavdar Botev, Jayavel Shanmugasundaram

SIGMOD Research Session 2: Stream Query Processing I (Hampton)
Chair: Minos Garofalakis
Distributed Top-K Monitoring, Brian Babcock, Chris Olston
Approximate Join Processing Over Data Streams, Abhinandan Das, Johannes Gehrke, Mirek Riedewald

SIGMOD Research Session 3: OLAP (Garden Salon 1)
Chair: Latha Colby
Best Paper: Spreadsheets in RDBMS for OLAP, Abhinav Gupta, Andy Witkowski, Gregory Dorman, Srikanth Bellamkonda, Tolga Bozkaya, Nathan Folkert, Lei Sheng, Sankar Subramanian
QC-Trees: An Efficient Summary Structure for Semantic OLAP, Laks Lakshmanan, Jian Pei, Yan Zhao

SIGMOD Industrial Session 1: Web Services (Sheffield)
Chair: Surajit Chaudhuri
The Future of Web Services - I, Adam Bosworth (BEA Systems)
The Future of Web Services - II, Felipe Cabrera (Microsoft)

SIGMOD Demos: Group A (Crescent)
A System for Watermarking Relational Databases, Rakesh Agrawal, Jerry Kiernan
Query by Humming - in Action with its Technology Revealed, Yunyue Zhu, Dennis Shasha
PeerDB: Peering into Personal Databases, Beng Chin Ooi, Kian-Lee Tan, Aoying Zhou, Chin Hong Goh, Yingguang Li, Chu Yee Liau, Bo Ling, Wee Siong Ng, Yanfeng Shu, Xiaoyu Wang, Ming Zhang
GridDB: A Database Interface to the Grid, David T. Liu, Michael J. Franklin
CMVF: A Novel Dimension Reduction Scheme for Efficient Indexing in A Large Image Database, Jialie Shen, Anne H.H. Ngu, John Shepherd, Du Q. Huynh, Quan Z. Sheng
PLASTIC: Reducing the Cost of Query Optimization through Query Clustering, Vibhuti S. Sengar, Jayant R. Haritsa
BIRN-M: A Semantic Mediator for Solving Real-World Neuroscience Problems, Amarnath Gupta, Bertram Ludascher, Maryann E. Martone, Arcot Rajasekar, Edward Ross, Xufei Qian, Simone Santini, Haiyun He, Ilya Zaslavsky

12:30-14:00 Lunch (Royal Palm Court)
14:00-15:30 PODS Session 3: Invited Tutorial (Hampton & Sheffield)
Chair: Surajit Chaudhuri
Privacy in Data Systems, Rakesh Agrawal
15:30-16:00 Afternoon Coffee Break (Atlas Foyer)
16:00-18:00 PODS Session 4: Views (Hampton & Sheffield)
Chair: Marie-Christine Rousset
Materializing Views with Minimal Size to Answer Queries, Rada Chirkova, Chen Li
The Impact of the Constant Complement Approach Towards View Updating, Jens Lechtenboerger
View-Based Query Containment, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, Moshe Y. Vardi
The View Selection Problem for XML Content Based Routing, Ashish Kumar Gupta, Alon Y. Halevy, Dan Suciu
19:00-22:00 SIGMOD New DB Faculty Symposium (Windsor & Sheffield)
21:00-23:00 PODS Business Meeting (Garden Salon 1)

Tuesday, June 10th
8:00-8:45 Continental Breakfast (Atlas Foyer)
8:45-9:45 SIGMOD Keynote (Regency Ballroom)
Chair: Alon Halevy
The Database Course Qua Poster Child For More Efficient Education, Jeffrey D. Ullman (Stanford University)
9:45-10:15 Morning Coffee Break (Atlas Foyer)
10:15-11:15 PODS Session 5: Data Integration (Garden Salon 2)
Chair: Foto Afrati
Computing Full Disjunctions, Yaron Kanza, Yehoshua Sagiv
Data Exchange: Getting to the Core, Ronald Fagin, Phokion G. Kolaitis, Lucian Popa
11:15-11:30 Break
11:30-12:30 FCRC Plenary (Grand Ballroom)
Computer Architecture and Technology: Some Thoughts on the Road Ahead, Michael Flynn (Stanford University)
12:30-14:00 Lunch (Royal Palm Court)
14:00-15:30 PODS Session 6: Optimization (Garden Salon 2)
Chair: Victor Vianu
Soft Stratification for Magic Set Based Query Evaluation in Deductive Databases, Andreas Behrend
Query Containment and Rewriting Using Views for Regular Path Queries under Constraints, Gosta Grahne, Alex Thomo
Concise Descriptions of Subsets of Structured Sets, Alberto O. Mendelzon, Ken Q. Pu
On Producing Join Results Early, Jens-Peter Dittrich, Bernhard Seeger, David Scot Taylor, Peter Widmayer

SIGMOD Research Session 4: Data Security and Protection (Hampton)
Chair: Sharad Mehrotra
Winnowing: Local Algorithms for Document Fingerprinting, Saul Schleimer, Daniel Wilkerson, Alex Aiken
Information Sharing Across Private Databases, Rakesh Agrawal, Alexandre Evfimievski, Ramakrishnan Srikant
Rights Protection for Relational Data, Radu Sion, Mikhail Atallah, Sunil Prabhakar

SIGMOD Research Session 5: XML Indexing and Compression (Windsor)
Chair: Hank Korth
ViST: A Dynamic Index Method for Querying XML Data by Tree Structures, Haixun Wang, Sanghyun Park, Wei Fan, Philip Yu
XPRESS: A Queriable Compression for XML Data, Jun-Ki Min, Myung-Jae Park, Chin-Wan Chung
D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data, Qun Chen, Andrew Lim, Kian Win Ong

SIGMOD Industrial Session 2: Server Technology (Sheffield)
Chair: James Hamilton
Oracle RAC: Architecture and Performance, Angelo Pruscino (Oracle)
Oracle XML DB Repository, Viswanathan Krishnamurthy (Oracle)
Multi-Dimensional Clustering: A New Data Layout Scheme in DB2, Sriram Padmanabhan, Bishwaranjan Bhattacharjee, Timothy Malkemus, Leslie Cranston, Matthew Huras (IBM)

15:30-16:00 Afternoon Coffee Break (Atlas Foyer)
16:00-18:00 PODS Session 7: XML (Garden Salon 2)
Chair: Gerhard Weikum
Correlating XML Data Streams Using Tree-Edit Distance Embeddings, Minos Garofalakis, Amit Kumar
Numerical Document Queries, Anca Muscholl, Thomas Schwentick, Helmut Seidl
Typing and Querying XML Documents: Some Complexity Bounds, Luc Segoufin
The Complexity of XPath Query Evaluation, Georg Gottlob, Christoph Koch, Reinhard Pichler

SIGMOD Research Session 6: Join Algorithms (Hampton 16:00-17:00)
Chair: Alfons Kemper
Containment Join Size Estimation: Models and Methods, Wei Wang, Haifeng Jiang, Hongjun Lu, Jeffrey Xu Yu
Efficient Processing of Joins on Set-valued Attributes, Nikos Mamoulis

SIGMOD Research Session 7: Temporal Queries (Hampton 17:00-18:00)
Chair: John Cho
Temporal Coalescing with Now, Granularity, and Incomplete Information, Curtis Dyreson
Query by Humming: a Time Series Database Approach, Yunyue Zhu, Dennis Shasha

SIGMOD Research Session 8: Meta-Data Management (Windsor 16:00-17:30)
Chair: Zack Ives
Rondo: A Programming Platform for Generic Model Management, Sergey Melnik, Erhard Rahm, Phil Bernstein
On Schema Matching with Opaque Column Names and Data Values, Jaewoo Kang, Jeffrey Naughton
Statistical Schema Integration across the Deep Web, Bin He, Kevin Chen-Chuan Chang

SIGMOD Research Session 9: Statistics (Sheffield 16:00-17:00)
Chair: Phil Gibbons
Extended Wavelets for Multiple Measures, Antonios Deligiannakis, Nick Roussopoulos
Spectral Bloom Filters, Saar Cohen, Yossi Matias

SIGMOD Industrial Session 3: Database Applications (Sheffield 17:00-18:00)
Chair: Michael Carey
Integration of Electronic Tickets and Personal Guide System for Public Transport using Mobile Terminals, Koichi Goto (RTRI), Yahiko Kambayashi (Kyoto University)
Gigascope: A Stream Database for Network Applications, Theodore Johnson (AT&T), Chuck Cranor (AT&T), Oliver Spatscheck (AT&T), Vladislav Shkapenyuk (CMU)

SIGMOD Tutorial 1 (Garden Salon 1)
Data Quality and Data Cleaning: An Overview, Theodore Johnson, Tamraparni Dasu (AT&T Labs Research)

SIGMOD Demos: Group C (Crescent)
QXtract: A Building Block for Efficient Information Extraction from Plain-Text Databases, Eugene Agichtein, Luis Gravano

SIGMOD Panel 1: Querying Network Databases (Garden Salon 1)
Moderators: Nick Koudas, Divesh Srivastava
Panelists: Daniela Florescu, Hector Garcia-Molina, Tim Griffin, Alon Halevy, Tomasz Imielinski, Prabhakar Raghavan

SIGMOD Demos: Group B (Crescent)
STREAM: The Stanford Stream Data Manager, Arvind Arasu, Brian Babcock, Shivnath Babu, Mayur Datar, Keith Ito, Justin Rosenstein, and Jennifer Widom
Aurora: A Data Stream Management System, D. Abadi, D. Carney, U. Cetintemel, M. Cherniack, C. Convey, C. Erwin, E. Galvez, M. Hatoun, A. Maskey, A. Rasin, A. Singer, M. Stonebraker, N. Tatbul, Y. Xing, R. Yan, S. Zdonik
IrisNet: Internet-scale Resource-Intensive Sensor Services, Amol Deshpande, Suman Nath, Phillip B. Gibbons, Srinivasan Seshan
TelegraphCQ: Continuous Dataflow Processing, The TelegraphCQ Team
Transparent Mid-Tier Database Caching in SQL Server, Per-Åke Larson, Jonathan Goldstein
DBCache: Middle-tier Database Caching for Highly Scalable e-Business Architectures, Mehmet Altinel, Christof Bornhövd, Sailesh Krishnamurthy, C. Mohan, Hamid Pirahesh, Berthold Reinwald
IPSOFACTO: A Visual Correlation Tool for Aggregate Network Traffic Data, Flip Korn, S. Muthukrishnan, Yunyue Zhu
SOCQET: Semantic OLAP with Compressed Cube and Summarization, Laks V.S. Lakshmanan, Jian Pei, Yan Zhao
Recommended publications
  • Challenges in Web Search Engines
    Challenges in Web Search Engines. Monika R. Henzinger (Google Inc., 2400 Bayshore Parkway, Mountain View, CA 94043), Rajeev Motwani (Department of Computer Science, Stanford University, Stanford, CA 94305), Craig Silverstein (Google Inc., 2400 Bayshore Parkway, Mountain View, CA 94043). Abstract: This article presents a high-level discussion of some problems that are unique to web search engines. The goal is to raise awareness and stimulate research in these areas. 1 Introduction: Web search engines are faced with a number of difficult problems in maintaining or enhancing the quality of their performance. These problems are either unique to this domain, or novel variants of problems that have been studied in the literature. Our goal in writing this article is to raise awareness of …
    … or a combination thereof. There are web ranking optimization services which, for a fee, claim to place a given web site highly on a given search engine. Unfortunately, spamming has become so prevalent that every commercial search engine has had to take measures to identify and remove spam. Without such measures, the quality of the rankings suffers severely. Traditional research in information retrieval has not had to deal with this problem of "malicious" content in the corpora. Quite certainly, this problem is not present in the benchmark document collections used by researchers in the past; indeed, those collections consist exclusively of high-quality content such as newspaper or scientific articles. Similarly, the spam problem is not present in the context of intranets, the web that exists within a corporation.
  • The Limits of Post-Selection Generalization
    The Limits of Post-Selection Generalization. Kobbi Nissim (Georgetown University), Adam Smith (Boston University), Thomas Steinke (IBM Research – Almaden), Uri Stemmer (Ben-Gurion University), Jonathan Ullman (Northeastern University). Abstract: While statistics and machine learning offer numerous methods for ensuring generalization, these methods often fail in the presence of post selection (the common practice in which the choice of analysis depends on previous interactions with the same dataset). A recent line of work has introduced powerful, general purpose algorithms that ensure a property called post hoc generalization (Cummings et al., COLT'16), which says that no person, when given the output of the algorithm, should be able to find any statistic for which the data differs significantly from the population it came from. In this work we show several limitations on the power of algorithms satisfying post hoc generalization. First, we show a tight lower bound on the error of any algorithm that satisfies post hoc generalization and answers adaptively chosen statistical queries, showing a strong barrier to progress in post selection data analysis. Second, we show that post hoc generalization is not closed under composition, despite many examples of such algorithms exhibiting strong composition properties. 1 Introduction: Consider a dataset X consisting of n independent samples from some unknown population P. How can we ensure that the conclusions drawn from X generalize to the population P? Despite decades of research in statistics and machine learning on methods for ensuring generalization, there is an increased recognition that many scientific findings do not generalize, with some even declaring this to be a "statistical crisis in science" [14].
  • Individuals and Privacy in the Eye of Data Analysis
    Individuals and Privacy in the Eye of Data Analysis. Thesis submitted in partial fulfillment of the requirements for the degree of "DOCTOR OF PHILOSOPHY" by Uri Stemmer. Submitted to the Senate of Ben-Gurion University of the Negev, October 2016, Beer-Sheva. This work was carried out under the supervision of Prof. Amos Beimel and Prof. Kobbi Nissim in the Department of Computer Science, Faculty of Natural Sciences. Acknowledgments: I could not have asked for better advisors. I will be forever grateful for their close guidance, their constant encouragement, and the warm shelter they provided. Without them, this thesis could have never begun. I have been fortunate to work with Raef Bassily, Amos Beimel, Mark Bun, Kobbi Nissim, Adam Smith, Thomas Steinke, Jonathan Ullman, and Salil Vadhan. I enjoyed working with them all, and I thank them for everything they have taught me. Contents: Acknowledgments; Contents; List of Figures; Abstract; 1 Introduction (1.1 Differential Privacy; 1.2 The Sample Complexity of Private Learning; 1.3 Our Contributions; 1.4 Additional Contributions); 2 Related Literature (2.1 The Computational Price of Differential Privacy; 2.2 Interactive Query Release; 2.3 Answering Adaptively Chosen Statistical Queries; 2.4 Other Related Work); 3 Background and Preliminaries (3.1 Differential privacy; 3.2 Preliminaries from Learning Theory; 3.3 Generalization Bounds for Points and Thresholds; 3.4 Private Learning; 3.5 Basic Differentially Private Mechanisms; 3.6 Concentration Bounds); 4 The Generalization Properties of Differential Privacy (4.1 Main Results …)
  • Are All Distributions Easy?
    Are all distributions easy? Emanuele Viola, November 5, 2009.
    Abstract: Complexity theory typically studies the complexity of computing a function h(x): {0,1}^n → {0,1}^m of a given input x. We advocate the study of the complexity of generating the distribution h(x) for uniform x, given random bits. Our main results are:
    • There are explicit AC0 circuits of size poly(n) and depth O(1) whose output distribution has statistical distance 1/2^n from the distribution (X, Σ_i X_i) ∈ {0,1}^n × {0, 1, ..., n} for uniform X ∈ {0,1}^n, despite the inability of these circuits to compute Σ_i x_i given x. Previous examples of this phenomenon apply to different distributions such as (X, Σ_i X_i mod 2) ∈ {0,1}^(n+1). We also prove a lower bound independent of n on the statistical distance between the output distribution of NC0 circuits and the distribution (X, majority(X)). We show that 1 − o(1) lower bounds for related distributions yield lower bounds for succinct data structures.
    • Uniform randomized AC0 circuits of poly(n) size and depth d = O(1) with error ε can be simulated by uniform randomized circuits of poly(n) size and depth d + 1 with error ε + o(1) using ≤ (log n)^(O(log log n)) random bits. Previous derandomizations [Ajtai and Wigderson '85; Nisan '91] increase the depth by a constant factor, or else have poor seed length.
    Given the right tools, the above results have technically simple proofs. (Supported by NSF grant CCF-0845003.)
    1 Introduction: Complexity theory, with some notable exceptions, typically studies the complexity of computing a function h(x): {0,1}^n → {0,1}^m of a given input x.
  • Calibrating Noise to Sensitivity in Private Data Analysis
    Calibrating Noise to Sensitivity in Private Data Analysis. Cynthia Dwork and Frank McSherry (Microsoft Research, Silicon Valley, {dwork,mcsherry}@microsoft.com), Kobbi Nissim (Ben-Gurion University), and Adam Smith (Weizmann Institute of Science). Abstract: We continue a line of research initiated in [10, 11] on privacy-preserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function f mapping databases to reals, the so-called true answer is the result of applying f to the database. To protect privacy, the true answer is perturbed by the addition of random noise generated according to a carefully chosen distribution, and this response, the true answer plus noise, is returned to the user. Previous work focused on the case of noisy sums, in which f = Σ_i g(x_i), where x_i denotes the ith row of the database and g maps database rows to [0, 1]. We extend the study to general functions f, proving that privacy can be preserved by calibrating the standard deviation of the noise according to the sensitivity of the function f. Roughly speaking, this is the amount that any single argument to f can change its output. The new analysis shows that for several particular applications substantially less noise is needed than was previously understood to be the case. The first step is a very clean characterization of privacy in terms of indistinguishability of transcripts. Additionally, we obtain separation results showing the increased value of interactive sanitization mechanisms over non-interactive.
  • Stanford University's Economic Impact Via Innovation and Entrepreneurship
    Full text available at: http://dx.doi.org/10.1561/0300000074. Impact: Stanford University's Economic Impact via Innovation and Entrepreneurship. Charles E. Eesley, Associate Professor in Management Science & Engineering, W.M. Keck Foundation Faculty Scholar, School of Engineering, Stanford University; William F. Miller†, Herbert Hoover Professor of Public and Private Management Emeritus, Professor of Computer Science Emeritus and former Provost, Stanford University, and Faculty Co-director, SPRIE. Boston, Delft. Contents: 1 Executive Summary (1.1 Regional and Local Impact; 1.2 Stanford's Approach; 1.3 Nonprofits and Social Innovation; 1.4 Alumni Founders and Leaders); 2 Creating an Entrepreneurial Ecosystem (2.1 History of Stanford and Silicon Valley); 3 Analyzing Stanford's Entrepreneurial Footprint (3.1 Case Study: Google Inc., the Global Reach of One Stanford Startup; 3.2 The Types of Companies Stanford Graduates Create; 3.3 The BASES Study); 4 Funding Startup Businesses (4.1 Study of Investors; 4.2 Alumni Initiatives: Stanford Angels & Entrepreneurs Alumni Group; 4.3 Case Example: Clint Korver); 5 How Stanford's Academic Experience Creates Entrepreneurs; 6 Changing Patterns in Entrepreneurial Career Paths; 7 Social Innovation, Non-Profits, and Social Entrepreneurs (7.1 Case Example: Eric Krock; 7.2 Stanford Centers and Programs for Social Entrepreneurs; 7.3 Case Example: Miriam Rivera; 7.4 Creating Non-Profit Organizations); 8 The Lean Startup; 9 How Stanford Supports Entrepreneurship: Programs, Centers, Projects (9.1 Stanford Technology Ventures Program …)
  • SIGMOD Flyer
    SIGMOD 2006: 25th ACM SIGMOD International Conference on Management of Data, June 26-June 29, 2006, Chicago, USA.
    DATES: Research paper abstracts: Nov. 15, 2005. Research papers, demonstrations, industrial talks, tutorials, panels: Nov. 29, 2005. Author Notification: Feb. 24, 2006.
    The annual ACM SIGMOD conference is a leading international forum for database researchers, developers, and users to explore cutting-edge ideas and results, and to exchange techniques, tools, and experiences. We invite the submission of original research contributions as well as proposals for demonstrations, tutorials, industrial presentations, and panels. We encourage submissions relating to all aspects of data management defined broadly and particularly encourage work that represents deep technical insights or presents new abstractions and novel approaches to problems of significance. We especially welcome submissions that help identify and solve data management systems issues by leveraging knowledge of applications and related areas, such as information retrieval and search, operating systems & storage technologies, and web services. Areas of interest include but are not limited to: benchmarking and performance evaluation; data cleaning and integration; database monitoring and tuning; data privacy and security; data warehousing and decision-support systems; embedded, sensor, mobile databases and applications; managing uncertain and imprecise information; peer-to-peer data management; personalized information systems; query processing and optimization; replication, caching, and publish-subscribe systems; text search and database querying; semi-structured data; …
    ORGANIZERS: General Chair: Clement Yu, U. of Illinois at Chicago. Vice Gen. Chair: Peter Scheuermann, Northwestern Univ. PC Chair: Surajit Chaudhuri, Microsoft Research. Demo. Chair: Anastassia Ailamaki, CMU. Industrial PC Chair: Alon Halevy, U. of Washington, Seattle. Panels Chair: Christian S. Jensen, Aalborg University. Tutorials Chair: David DeWitt, U. …
  • The Best Nurturers in Computer Science Research
    The Best Nurturers in Computer Science Research. Bharath Kumar M. and Y. N. Srikant. Technical report IISc-CSA-TR-2004-10, http://archive.csa.iisc.ernet.in/TR/2004/10/, Computer Science and Automation, Indian Institute of Science, India, October 2004. Abstract: The paper presents a heuristic for mining nurturers in temporally organized collaboration networks: people who facilitate the growth and success of the young ones. Specifically, this heuristic is applied to the computer science bibliographic data to find the best nurturers in computer science research. The measure of success is parameterized, and the paper demonstrates experiments and results with publication count and citations as success metrics. Rather than just the nurturer's success, the heuristic captures the influence he has had in the independent success of the relatively young in the network. These results can hence be a useful resource to graduate students and post-doctoral candidates. The heuristic is extended to accurately yield ranked nurturers inside a particular time period. Interestingly, there is a recognizable deviation between the rankings of the most successful researchers and the best nurturers, which, although obvious from a social perspective, has not been statistically demonstrated. Keywords: Social Network Analysis, Bibliometrics, Temporal Data Mining. 1 Introduction: Consider a student Arjun, who has finished his undergraduate degree in Computer Science, and is seeking a PhD degree followed by a successful career in Computer Science research. How does he choose his research advisor? He has the following options with him: 1. Look up the rankings of various universities [1], and apply to any "reasonably good" professor in any of the top universities.
  • Differential Privacy, Property Testing, and Perturbations
    Differential Privacy, Property Testing, and Perturbations, by Audra McMillan. A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Mathematics) in The University of Michigan, 2018. Doctoral Committee: Professor Anna Gilbert, Chair; Professor Selim Esedoglu; Professor John Schotland; Associate Professor Ambuj Tewari. ORCID iD: 0000-0003-4231-6110. © Audra McMillan 2018. ACKNOWLEDGEMENTS: First and foremost, I would like to thank my advisor, Anna Gilbert. Anna managed to strike the balance between support and freedom that is the goal of many advisors. I always felt that she believed in my ideas and I am a better, more confident researcher because of her. I was fortunate to have a large number of pseudo-advisors during my graduate career. Martin Strauss, Adam Smith, and Jacob Abernethy deserve special mention. Martin guided me through the early stages of my graduate career and helped me make the transition into more applied research. Martin was the first to suggest that I work on privacy. He taught me that interesting problems and socially-conscious research are not mutually exclusive. Adam Smith hosted me during what became my favourite summer of graduate school. I published my first paper with Adam, which he tirelessly guided towards a publishable version. He has continued to be my guide in the differential privacy community, introducing me to the right people and the right problems. Jacob Abernethy taught me much of what I know about machine learning. He also taught me the importance of being flexible in my research and that research is as much about exploration as it is about solving a pre-specified problem.
  • Amicus Brief of Data Privacy Experts
    Case 3:21-cv-00211-RAH-ECM-KCN Document 99-1 Filed 04/23/21 Page 1 of 27. EXHIBIT A.
    UNITED STATES DISTRICT COURT FOR THE MIDDLE DISTRICT OF ALABAMA, EASTERN DIVISION. THE STATE OF ALABAMA, et al., Plaintiffs, v. UNITED STATES DEPARTMENT OF COMMERCE, et al., Defendants. Civil Action No. 3:21-CV-211-RAH.
    AMICUS BRIEF OF DATA PRIVACY EXPERTS: Ryan Calo, Ran Canetti, Aloni Cohen, Cynthia Dwork, Roxana Geambasu, Somesh Jha, Nitin Kohli, Aleksandra Korolova, Jing Lei, Katrina Ligett, Deirdre K. Mulligan, Omer Reingold, Aaron Roth, Guy N. Rothblum, Aleksandra (Seša) Slavkovic, Adam Smith, Kunal Talwar, Salil Vadhan, Larry Wasserman, Daniel J. Weitzner.
    Counsel for the Data Privacy Experts: Shannon L. Holliday (ASB-5440-Y77S), COPELAND, FRANCO, SCREWS & GILL, P.A., P.O. Box 347, Montgomery, AL 36101-0347; Michael B. Jones (Georgia Bar No. 721264), BONDURANT MIXSON & ELMORE, LLP, 1201 West Peachtree Street, NW, Suite 3900, Atlanta, GA 30309.
    TABLE OF CONTENTS: STATEMENT OF INTEREST; SUMMARY OF ARGUMENT; ARGUMENT; I. Reconstruction attacks Are Real and Put the Confidentiality of Individuals Whose Data are Reflected …
  • Efficiently Querying Databases While Providing Differential Privacy
    Epsolute: Efficiently Querying Databases While Providing Differential Privacy. Dmytro Bogatov (Boston University, Boston, MA, USA), Georgios Kellaris (Canada), George Kollios (Boston University, Boston, MA, USA), Kobbi Nissim (Georgetown University, Washington, D.C., USA), Adam O'Neill (University of Massachusetts, Amherst, Amherst, MA, USA). ABSTRACT: As organizations struggle with processing vast amounts of information, outsourcing sensitive data to third parties becomes a necessity. To protect the data, various cryptographic techniques are used in outsourced database systems to ensure data privacy, while allowing efficient querying. A rich collection of attacks on such systems has emerged. Even with strong cryptography, just communication volume or access pattern is enough for an adversary to succeed. In this work we present a model for differentially private outsourced database system and a concrete construction, Epsolute, that provably conceals the aforementioned leakages, while remaining efficient and scalable. In our solution, differential privacy is preserved at the record level even against an untrusted server that controls data and queries. KEYWORDS: Differential Privacy; ORAM; differential obliviousness; sanitizers. ACM Reference Format: Dmytro Bogatov, Georgios Kellaris, George Kollios, Kobbi Nissim, and Adam O'Neill. 2021. Epsolute: Efficiently Querying Databases While Providing Differential Privacy. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (CCS '21), November 15–19, 2021, Virtual Event, Republic of Korea. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3460120.3484786. 1 INTRODUCTION: Secure outsourced database systems aim at helping organizations outsource their data to untrusted third parties, without compro…
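The post-selection effect described in the Nissim, Smith, Steinke, Stemmer, and Ullman abstract above can be reproduced with a small simulation. This is a hypothetical illustration written for this page, not code or an experiment from the paper. The sizes `n` and `d` and every function name are assumptions. Features and labels are independent fair coin flips, so no classifier can beat 50% accuracy on the population. The analyst first asks one statistical query per feature (its empirical correlation with the label), then builds a classifier from the signs of those answers. Because the second step depends on answers computed from the same data, accuracy measured on that data overstates the truth. Accuracy on fresh samples stays near 50%.

```python
import random
from typing import List, Tuple

# (feature rows, labels); all entries are +1 or -1
Dataset = Tuple[List[List[int]], List[int]]


def draw(n: int, d: int, rng: random.Random) -> Dataset:
    """Draw n rows of d independent +/-1 features and an independent +/-1 label."""
    xs = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(n)]
    ys = [rng.choice((-1, 1)) for _ in range(n)]
    return xs, ys


def select_signs(data: Dataset) -> List[int]:
    """Adaptive step: query each feature's correlation with the label and keep its sign."""
    xs, ys = data
    signs = []
    for j in range(len(xs[0])):
        corr = sum(row[j] * y for row, y in zip(xs, ys))
        signs.append(1 if corr > 0 else -1)  # with n odd, corr is never 0
    return signs


def accuracy(signs: List[int], data: Dataset) -> float:
    """Fraction of rows where sign(sum_j signs[j] * x_j) equals the label."""
    xs, ys = data
    correct = 0
    for row, y in zip(xs, ys):
        score = sum(s * v for s, v in zip(signs, row))  # with d odd, never 0
        correct += (1 if score > 0 else -1) == y
    return correct / len(ys)


if __name__ == "__main__":
    rng = random.Random(2003)
    n, d = 501, 501  # both odd so that no correlation or score ties at 0
    train = draw(n, d, rng)
    signs = select_signs(train)  # chosen by looking at `train`
    fresh = draw(n, d, rng)      # independent sample from the same population
    print(f"accuracy on reused data: {accuracy(signs, train):.2f}")
    print(f"accuracy on fresh data:  {accuracy(signs, fresh):.2f}")
```

In the abstract's terms, the classifier's accuracy is a statistic on which the reused data differs significantly from the population. Post hoc generalization is meant to rule out exactly this kind of statistic.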
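The Viola abstract above contrasts computing a function with sampling its distribution, and it cites (X, Σ_i X_i mod 2) as an earlier example. A standard construction shows why that distribution is easy to sample even though parity cannot be computed by small constant-depth (AC0) circuits. Draw n+1 uniform bits y_0, ..., y_n, output x_i = y_(i-1) XOR y_i, and output y_0 XOR y_n as the last bit. Each output bit reads only two random bits. The XOR of all x_i telescopes to y_0 XOR y_n, so the last bit always equals the parity of x. For each fixed y_0, the map from (y_1, ..., y_n) to x is a bijection, so x is uniform. The sketch below checks both facts exhaustively for small n. The function names are hypothetical, and this is not the paper's AC0 construction for (X, Σ_i X_i).

```python
import itertools
from collections import Counter
from typing import List, Sequence, Tuple


def sample_with_parity(y: Sequence[int]) -> Tuple[List[int], int]:
    """Map n+1 random bits y to (x, b) with x in {0,1}^n and b = parity(x).

    Every output bit reads exactly two input bits (a locality-2, NC0-style map).
    """
    n = len(y) - 1
    x = [y[i - 1] ^ y[i] for i in range(1, n + 1)]
    b = y[0] ^ y[n]  # telescoping: x_1 ^ ... ^ x_n == y_0 ^ y_n
    return x, b


def output_distribution(n: int) -> Counter:
    """Count how often each output (x, b) occurs over all 2^(n+1) seeds y."""
    counts: Counter = Counter()
    for y in itertools.product((0, 1), repeat=n + 1):
        x, b = sample_with_parity(y)
        counts[(tuple(x), b)] += 1
    return counts


if __name__ == "__main__":
    n = 4
    dist = output_distribution(n)
    # Each x in {0,1}^n appears exactly twice (once per value of y_0), always with its parity.
    assert len(dist) == 2 ** n
    assert all(b == sum(x) % 2 and c == 2 for (x, b), c in dist.items())
    print(f"n={n}: {len(dist)} outcomes, each with probability 1/{2 ** n}")
```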
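For the Dwork, McSherry, Nissim, and Smith abstract above, the noisy-sum case can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code. The names `laplace_noise` and `noisy_sum` are hypothetical. Each row is mapped into [0, 1] by a caller-supplied `g`, so changing one row moves the sum by at most 1 (sensitivity 1). The noise is Laplace with scale sensitivity/ε, the calibration the paper analyzes.

```python
import random
from typing import Callable, Sequence, TypeVar

Row = TypeVar("Row")


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponentials with mean `scale` is
    Laplace-distributed with that scale; random.expovariate takes the rate 1/scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_sum(
    rows: Sequence[Row],
    g: Callable[[Row], float],
    epsilon: float,
    rng: random.Random,
) -> float:
    """Answer f(DB) = sum of g(x_i), plus Laplace noise of scale sensitivity / epsilon.

    g's output is clamped to [0, 1], so changing any single row moves the true
    answer by at most 1; that bound is the sensitivity of f.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0
    true_answer = sum(min(max(g(x), 0.0), 1.0) for x in rows)
    return true_answer + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [23, 35, 47, 51, 62]
    # Hypothetical counting query: how many rows have age > 40? (true answer: 3)
    answer = noisy_sum(ages, lambda a: 1.0 if a > 40 else 0.0, epsilon=0.5, rng=rng)
    print(f"noisy count: {answer:.2f}")
```

Halving ε doubles the noise scale. A query whose single-row influence exceeds 1 would need a proportionally larger scale. The abstract's point is that this sensitivity-based calibration extends from noisy sums to general functions f.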