
Enhancing Privacy and Fairness in Search Systems

A dissertation submitted towards the degree Doctor of Engineering (Dr.-Ing.) of the Faculty of Mathematics and Computer Science of Saarland University

by JOANNA BIEGA

Saarbrücken December 2018

Defense Colloquium

Date: 1 April 2019

Dean of the Faculty: Prof. Dr. Sebastian Hack

Examination Committee

Chair: Prof. Dr. Bernt Schiele

Reviewer, Advisor: Prof. Dr. Gerhard Weikum

Reviewer, Co-Advisor: Prof. Dr. Krishna P. Gummadi

Reviewer: Prof. Dr. Carlos Castillo

Reviewer: Prof. Dr. Wolfgang Nejdl

Academic Assistant: Dr. Erisa Terolli

Acknowledgments

I am greatly indebted to my PhD advisors, Gerhard Weikum and Krishna Gummadi, for their mentorship. Thank you Gerhard for your excellent technical guidance, for pushing me to work to the best of my ability, and for allowing me so much freedom to pursue my research interests. Thank you Krishna for teaching me to think deeply about the problems I work on, for showing how to be playful in coming up with new ideas, and for the many hours of non-technical discussions that helped me to grow as a researcher. Tremendous thanks to Prof. Carlos Castillo and Prof. Wolfgang Nejdl for reviewing this thesis and for all the helpful feedback. Thanks also to Prof. Bernt Schiele and Dr. Erisa Terolli for agreeing to devote their time to serve on my examination committee. Special thanks to Fabian Suchanek for being a fantastic mentor, for introducing me to the research world and later convincing me to do a PhD, and for being an inspiring example of how to reconcile professional excellence with various hobbies. Thanks to the Privacy Infrastructure Team at Google Zurich for hosting me during my internship and teaching me so many valuable lessons about engineering excellence. I want to thank all the friends and colleagues from D5@MPI-INF, SocSys@MPI-SWS, and the institute administration, for creating such an amazing working environment. Thanks to my old friends from the ’Sanok’ group for being a constant in my life despite the fact that we are so geographically distributed now, and for reminding me that I am not my work. Thanks to Dominika as well as all the friends from our Polish geek group in Sb for all the good times and great discussions. Thanks to all the ’Balnica’ friends for all the artistically creative times that powered me through the work times. Heartfelt thanks to the family Roth for making Germany and Switzerland feel a bit more like home. Thanks to my sister Ela for all the emotional support - during the good times and the bad times. I want to thank my parents, Lilianna and Marek, for making our education a priority. Particular thanks go to my mom who, in a small Polish town in the 80s and 90s, had a visionary dream that her daughters become engineers and grow up to be independent women. It is only years later that I realize how privileged I was to have been raised this way as a girl. Last but not least, thank you Ben for having supported me in countless ways through all these years, for all the amazing experiences we’ve had together, and for all the value you bring to my life.

Asia


Abstract

Following a period of expedited progress in the capabilities of digital systems, society begins to realize that systems designed to assist people in various tasks can also harm individuals and society. Mediating access to information and explicitly or implicitly ranking people in increasingly many applications, search systems have a substantial potential to contribute to such unwanted outcomes. Since they collect vast amounts of data about both searchers and search subjects, they have the potential to violate the privacy of both of these groups of users. Moreover, in applications where rankings influence people's economic livelihood outside of the platform, such as sharing economy or hiring support websites, search engines have an immense economic power over their users in that they control user exposure in ranked results. This thesis develops new models and methods broadly covering different aspects of privacy and fairness in search systems for both searchers and search subjects. Specifically, it makes the following contributions:

• We propose a model for computing individually fair rankings where search subjects get exposure proportional to their relevance. The exposure is amortized over time using constrained optimization to overcome searcher attention biases while preserving ranking utility.

• We propose a model for computing sensitive search exposure where each subject gets to know the sensitive queries that lead to her profile in the top-k search results. The problem of finding exposing queries is technically modeled as reverse nearest neighbor search, followed by a weakly-supervised learning-to-rank model ordering the queries by privacy-sensitivity.

• We propose a model for quantifying privacy risks from textual data in online communities. The method builds on a topic model where each topic is annotated by a crowdsourced sensitivity score, and privacy risks are associated with a user's relevance to sensitive topics. We propose relevance measures capturing different dimensions of user interest in a topic and show how they correlate with human risk perceptions.

• We propose a model for privacy-preserving personalized search where search queries of different users are split and merged into synthetic profiles. The model mediates the privacy-utility trade-off by keeping semantically coherent fragments of search histories within individual profiles, while trying to minimize the similarity of any of the synthetic profiles to the original user profiles.

The models are evaluated using information retrieval techniques and user studies over a variety of datasets, ranging from query logs, through social media and community question answering postings, to item listings from sharing economy platforms.


Kurzfassung

Nach einer Zeit schneller Fortschritte in den Fähigkeiten digitaler Systeme beginnt die Gesellschaft zu erkennen, dass Systeme, die Menschen bei verschiedenen Aufgaben unterstützen sollen, den Einzelnen und die Gesellschaft auch schädigen können. Suchsysteme haben ein erhebliches Potenzial, um zu solchen unerwünschten Ergebnissen beizutragen, weil sie den Zugang zu Informationen vermitteln und explizit oder implizit Menschen in immer mehr Anwendungen in Ranglisten anordnen. Da sie riesige Datenmengen sowohl über Suchende als auch über Gesuchte sammeln, können sie die Privatsphäre dieser beiden Benutzergruppen verletzen. In Anwendungen, in denen Ranglisten einen Einfluss auf den finanziellen Lebensunterhalt der Menschen außerhalb der Plattform haben, z. B. auf Sharing-Economy-Plattformen oder Jobbörsen, haben Suchmaschinen eine immense wirtschaftliche Macht über ihre Nutzer, indem sie die Sichtbarkeit von Personen in Suchergebnissen kontrollieren. In dieser Dissertation werden neue Modelle und Methoden entwickelt, die verschiedene Aspekte der Privatsphäre und der Fairness in Suchsystemen, sowohl für Suchende als auch für Gesuchte, abdecken. Insbesondere leistet die Arbeit folgende Beiträge:

• Wir schlagen ein Modell für die Berechnung von fairen Rankings vor, bei denen Suchsubjekte entsprechend ihrer Relevanz angezeigt werden. Die Sichtbarkeit wird im Laufe der Zeit durch ein Optimierungsmodell adjustiert, um die Verzerrungen der Sichtbarkeit für Sucher zu kompensieren, während die Nützlichkeit des Rankings beibehalten bleibt.

• Wir schlagen ein Modell für die Bestimmung kritischer Suchanfragen vor, in dem für jeden Nutzer Anfragen, die zu seinem Nutzerprofil in den Top-k-Suchergebnissen führen, herausgefunden werden. Das Problem der Berechnung von exponierenden Suchanfragen wird als Reverse-Nearest-Neighbor-Suche modelliert. Solche kritischen Suchanfragen werden dann von einem Learning-to-Rank-Modell geordnet, um die sensitiven Suchanfragen herauszufinden.

• Wir schlagen ein Modell zur Quantifizierung von Risiken für die Privatsphäre aus Textdaten in Online-Communities vor. Die Methode baut auf einem Themenmodell auf, bei dem jedes Thema durch einen Crowdsourcing-Sensitivitätswert annotiert wird. Die Risiko-Scores sind mit der Relevanz eines Benutzers für kritische Themen verbunden. Wir schlagen Relevanzmaße vor, die unterschiedliche Dimensionen des Benutzerinteresses an einem Thema erfassen, und wir zeigen, wie diese Maße mit der Risikowahrnehmung von Menschen korrelieren.

• Wir schlagen ein Modell für personalisierte Suche vor, in dem die Privatsphäre geschützt wird. In dem Modell werden Suchanfragen von Nutzern partitioniert und in synthetische Profile eingefügt. Das Modell erreicht einen guten Kompromiss zwischen der Suchsystemnützlichkeit und der Privatsphäre, indem semantisch kohärente Fragmente der Suchhistorie innerhalb einzelner Profile beibehalten werden, wobei gleichzeitig angestrebt wird, die Ähnlichkeit der synthetischen Profile mit den ursprünglichen Nutzerprofilen zu minimieren.

Die Modelle werden mithilfe von Informationssuchtechniken und Nutzerstudien ausgewertet. Wir benutzen eine Vielzahl von Datensätzen, die von Abfrageprotokollen über Social-Media-Postings und Fragen aus Q&A-Foren bis hin zu Artikellistungen von Sharing-Economy-Plattformen reichen.

Contents

1 Introduction
  1.1 Motivation
    1.1.1 Search systems
    1.1.2 Privacy in search
    1.1.3 Fairness in search
  1.2 Challenges
  1.3 Thesis contributions
  1.4 Other contributions of the author
  1.5 Prior publications
  1.6 Organization

2 Background: User Privacy
  2.1 Preliminaries
  2.2 Privacy risks and notions
    2.2.1 Information leakage
    2.2.2 Profiling
    2.2.3 Exposure
  2.3 Achieving privacy
    2.3.1 Limiting information leakage
    2.3.2 Limiting profiling
    2.3.3 Limiting exposure
  2.4 Cost of privacy
  2.5 Privacy in search systems
  2.6 Selected other dimensions in privacy research

3 Background: Algorithmic Fairness
  3.1 Preliminaries
  3.2 Algorithmic fairness notions
    3.2.1 Group fairness
    3.2.2 Individual fairness
  3.3 Achieving algorithmic fairness
  3.4 Accountability
  3.5 Cost of fairness
  3.6 Sources of algorithmic unfairness
  3.7 Fairness in search systems
  3.8 Selected other dimensions in algorithmic fairness research

4 Equity of Attention
  4.1 Introduction
  4.2 Equity-of-attention fairness
    4.2.1 Notation
    4.2.2 Defining equity of attention
    4.2.3 Equality of attention
    4.2.4 Relation to group fairness in rankings
  4.3 Rankings with equity of attention
    4.3.1 Measuring (un)fairness
    4.3.2 Measuring ranking quality
    4.3.3 Optimizing fairness-quality tradeoffs
    4.3.4 An ILP-based fair ranking mechanism
  4.4 Experiments
    4.4.1 Data
    4.4.2 Position bias
    4.4.3 Implementation and parameters
    4.4.4 Mechanisms under comparison
    4.4.5 Data characteristics: relevance vs. attention
    4.4.6 Performance on synthetic data
    4.4.7 Performance on Airbnb data
    4.4.8 Performance on StackExchange data
  4.5 Related work
  4.6 Conclusion

5 Sensitive Search Exposure
  5.1 Introduction
  5.2 Problem statement
  5.3 Generating exposure sets
  5.4 Ranking of queries in exposure sets
    5.4.1 Learning to rank the exposing queries
    5.4.2 Features
    5.4.3 Relevance
  5.5 Experiments
    5.5.1 Dataset
    5.5.2 RkNN generation
    5.5.3 Query ranking in exposure sets
    5.5.4 User-study evaluation
  5.6 Insights into search exposure relevance
    5.6.1 Tweet context
    5.6.2 Search exposure relevance vs topical sensitivity
  5.7 Related work
  5.8 Conclusion

6 Rank-Susceptibility
  6.1 Introduction
  6.2 R-Susceptibility model
    6.2.1 Sensitive states and adversaries
    6.2.2 Sensitive topics
    6.2.3 Background knowledge
    6.2.4 R-Susceptibility
  6.3 Risk assessment measures
    6.3.1 Entropy baseline measure
    6.3.2 Differential-privacy baseline measure
    6.3.3 Topical risk measure
  6.4 Identifying sensitive topics
    6.4.1 Experiments on topic sensitivity
  6.5 Experiments
    6.5.1 Setup
    6.5.2 Traditional vs. IR risk scoring
    6.5.3 Risk scoring with dimensions of interest
    6.5.4 Robustness to configuration changes
    6.5.5 Discussion
  6.6 Related work
  6.7 Conclusion

7 Privacy through Solidarity
  7.1 Introduction
  7.2 Framework overview
    7.2.1 Architecture
    7.2.2 Incentives of participating parties
    7.2.3 Trusted and adversarial parties
  7.3 Assignment model
    7.3.1 Concepts and notation
    7.3.2 Objective
    7.3.3 Measuring privacy gain
    7.3.4 Measuring user utility loss
    7.3.5 Assignment
  7.4 Mediator accounts in search systems
    7.4.1 Framework elements
    7.4.2 Service provider model
  7.5 Experiments
    7.5.1 Experimental setup
    7.5.2 Results and insights
  7.6 Related work
  7.7 Conclusion

8 Conclusions and Outlook

A AMT User Study: Topical Sensitivity
B AMT User Study: Search Exposure

Bibliography
List of Figures
List of Tables

Chapter 1 Introduction


1.1 Motivation

Following a period of expedited progress in the capabilities of digital systems, society begins to realize that systems designed to assist people in various tasks can also harm individuals and society. The harm may occur across a number of dimensions, ranging from privacy intrusion caused by massive collection of personal data, through discrimination caused by algorithms trained on biased data, marginalization of certain groups in online communities, polarization of society caused by massive personalization, and disinformation caused by viral spread of false information, all the way to addiction caused by systems aiming to aggressively monetize people's attention. These problems have gained the attention of an interdisciplinary community of researchers, including computer scientists, social scientists, and legal scholars¹. The information retrieval (IR) community has also recently recognized FATE (standing for Fairness, Accountability, Transparency, and Ethics) and the societal impact of IR technology as one of the crucial directions for the field [Culpepper et al. 2018]. In line with that direction, user rights in search systems motivate the work carried out in this thesis. In particular, we focus on the issues of privacy and fairness.

¹ See, for instance, fatconference.org, ainow.com.

1.1.1 Search systems

Search systems mediate access to information. Figure 1.1 schematically describes a search environment. People may participate in this environment in two different roles – as searchers or search subjects. Searchers are the users who turn to search engines to find information. Typically, they phrase their information needs as keyword queries, which are issued to the search system. The ranking mechanism computes the relevance of each document in the underlying collection to the issued search query, and returns a ranked list of documents likely to be relevant to the searcher's information need. If a search system observes a searcher over time and collects all the queries she issues, the relevance and the document ranking mechanism can be personalized. For instance, it is possible to determine that a user querying for python is interested in the programming language rather than the animal if she has issued other queries related to programming in the past.

While traditionally documents are thought of as text, without necessarily any person associated with them, many search systems nowadays implicitly or explicitly rank people. Those people might be job seekers on human resource support platforms, such as LinkedIn, or content creators on platforms like Spotify (music rankings), Amazon (product rankings), Airbnb (apartment rankings), or Twitter (social media posting rankings). We call these ranked users search subjects. Search subjects are exposed to searchers in ranking results, and the currency in which they are being paid on the platform is the searcher's attention. Attention can be measured in terms of click rates, gaze fixation times measured in eye-tracking studies, or more directly by the total amount of income earned by search subjects from successful transactions.

Because search subjects are exposed to searchers in response to queries, search queries determine the context in which exposure happens. Such a context can be positive or negative, yielding exposure desirable in some scenarios and undesirable in others. Exposure might be desirable, for instance, on hiring support platforms where job candidates want to be exposed to recruiters searching for new employees. Exposure might be undesirable, however, in social media search engines, when searchers issue sensitive queries related to diseases or controversial political issues.

Since search systems collect vast amounts of data about both searchers and search subjects, they have the potential to violate the privacy of both of these groups of users. Moreover, by controlling the exposure of different search subjects and the quality of results for different searchers, they have an immense power to deliver unfairly disparate levels of service and experience to different people. The increasing dependence on search systems in various platforms and areas of life thus calls for investigation of the issues of privacy and fairness in search.

1.1.2 Privacy in search

Privacy of searchers. The potential to violate the privacy of searchers is a result of search systems collecting search queries and aggregating them into detailed user profiles. Because search engines are often the first source people refer to when seeking information necessary for their work or hobbies, when seeking information related to health issues or personal problems, or when planning travels, search histories often paint a very intimate picture of a searcher's life. Having all of this information aggregated per user profile leads to a number of privacy risks, including linking of sensitive queries to real-world individuals, inference of additional user attributes that were not disclosed to the search system directly, or profiling and targeting. We describe these risks in detail in the following paragraphs.

[Figure 1.1: A schematic depiction of a search system. Users participate in the system either as searchers or search subjects.]

Linking of search histories to individuals is possible through queries containing pieces of individual-specific information. This harm was exemplified when AOL released a naively anonymized querylog with usernames replaced by random IDs in 2006². Following the release, journalists were able to deanonymize some individuals by cross-referencing queries containing phone numbers (some users might query for their own phone numbers either deliberately or through copy-pasting mistakes) with phone book entries. Once deanonymized, search histories enable further linking of individuals to sensitive information, including queries related to topics like health or hygiene. Such attacks were demonstrated to be viable beyond just search data. For instance, it is possible to match anonymized Netflix movie recommendations with publicly available IMDB movie rating data, thus matching real-world identities to possibly sensitive ratings revealing political or sexual orientations [Narayanan and Shmatikov 2008].

A collection of large amounts of user data might also enable inference of information that is not present in the data and perhaps explicitly protected by the users. It is feasible, for instance, to infer demographic information using search logs [Bi et al. 2013]. Even beyond relatively rich and complex search histories, very private data, such as personality traits, can be inferred from items a user likes on a platform like Facebook³ [Kosinski et al. 2013].

Detailed user profiles might also be made available to third parties, often without the knowledge of the profile owner. Data can be passed on beyond the original intentions in case of company mergers, or if the search provider infrastructure gets compromised.

² https://en.wikipedia.org/wiki/AOL_search_data_leak
³ https://facebook.com

Beyond unintentional leaks and breaches, user data might also be intentionally used against users' interests. For instance, search providers use query histories for profiling and targeted advertising. Especially if the topic of an ad is sensitive, such an advertisement being displayed in a user's browser might lead to privacy breaches if seen by external observers.

Privacy of search subjects. Privacy problems of search subjects are tied to exposure in the ranking results of sensitive queries. As depicted in Figure 1.1, the query of a searcher determines the context in which search subjects are exposed. Sensitive queries will as a result lead to sensitive exposure. Examples of such queries include those topically related to health, finance, or other personal issues, those which are unique to a user, such as phone numbers or e-mail addresses, or those which are rather uncommon for a given user profile.

Sensitive exposure enabled hackers to scrape the profile data of around 2 billion Facebook users in April 2018⁴. Scraping user data on such a massive scale faces a fundamental problem – how to enumerate all users in the system to get their profile URLs. The hackers had reportedly acquired a database of e-mail addresses and phone numbers on the Dark Web, and issued these as queries to Facebook's search engine. As a response to these queries, the engine returned links to the profiles of users to whom the e-mail or the phone number belonged. There exist other scenarios of this kind - with adversaries not targeting any particular individual but rather searching for targets matching their relevance criteria. Examples include bloggers looking for examples for their stories, or governments looking for politically controversial statements. Adversarial situations like these could potentially be avoided if the search engines computed the exposure information. Computing search exposure means reversing search and, for each user profile, finding all the queries that yield the profile in the top-k results.

⁴ https://www.independent.co.uk/news/world/americas/facebook-hackers-personal-data-collection-users-cambridge-analytica-trump-mark-zuckerberg-latest-a8289816.html

1.1.3 Fairness in search

Fairness for searchers. Delivering search results of disparate quality to different demographic groups might mean that historically disadvantaged groups receive worse access to information. Thus, fairness for searchers has been understood as a lack of disparity in the quality of results. In this spirit, Mehrotra et al. [2017] proposed a methodology to measure whether a search engine delivers less satisfactory results to searchers from different groups determined by attributes such as gender or age. It is worth noting that a search engine might underperform for searchers from minority groups not necessarily intentionally, but because it might have less observational data about these groups at its disposal to train the algorithms, or perhaps because the relevance feedback collected from the majority groups is biased against the minority.

Fairness for search subjects. Fairness for search subjects matters especially in scenarios where rankings influence people's lives outside of the platform. Such is the case for two-sided economy platforms, including Airbnb or Uber, or hiring support platforms such as LinkedIn. In each of these systems, subjects seek to be displayed high in the rankings as it increases their chances of getting a real-world advantage, be it a higher income or being contacted by recruiters. With such a tangible influence over people's lives, search engines in these scenarios should make sure their results are fair, or more specifically, that subjects get a fair representation in the ranked results. Most papers thus far have proposed to quantify such fair representation using different forms of diversity and exposure. In practice, exposure can be determined in eye-tracking studies by measuring the time the searchers spend investigating a result, or by estimating click probabilities for different ranking results. To ensure fairness to individuals, a system should provide each subject the amount of exposure that is proportional to her relevance [Biega et al. 2018]. Unfairness, however, often falls along the lines of historical inequities. Ensuring fairness on a group level – for groups defined by legally protected attributes, such as gender or race – means granting equal exposure to different groups [Singh and Joachims 2018; Zehlike et al. 2017].

1.2 Challenges

In the context of the described privacy and fairness problems, this thesis tackles the following specific challenges:

• Fair exposure for search subjects. To be individually fair to each subject, a search system should grant subjects exposure that is proportional to their relevance. However, if many subjects have similar relevance in a given search task, it is impossible to grant everyone the attention they deserve in a single ranking. This problem arises because of a phenomenon called position bias where the searchers pay disproportionately more attention to subjects ranked higher, often irrespective of their relevance. As a result, it is impossible to be individually fair to subjects in a single ranking. We can instead look at sequences of rankings, and amortize exposure over time. This thesis tackles the challenge of granting every subject in a ranking system the amortized exposure they deserve.

• Sensitive exposure of search subjects. If a user’s post is returned as a top-k answer to a sensitive search query, the user is exposed in a sensitive context. The richness and volume of the content we post online make it challenging to maintain awareness of the contexts in which our posts are returned as top-k results in search systems. Online users have very limited information about the queries that lead others to their profiles, yet - from the privacy perspective - such information is crucial if these exposing queries are of sensitive nature. This thesis tackles the problem of privacy-sensitive search exposure, that is, finding the sensitive queries for which any post of a given user is returned as a top-k search result.

• Quantifying privacy risks from textual data. Prior work on privacy has largely focused on structured data, such as databases or graphs. These solutions prove insufficient for users in online communities that allow for creation of textual content. In particular, quantifying sensitive exposure requires a methodology for quantifying privacy-sensitivity in text. This thesis tackles the problem of quantifying privacy risks from textual data.

• Privacy-preserving personalization for searchers. Information accumulated within a single user account often draws an exact picture of a person's life. Such massive accumulation of personal data leads to significant privacy concerns. At the same time, while caring about privacy, many users feel compelled to give up all of this information in exchange for quality personalized results. This thesis challenges the assumption that detailed user profiles are necessary to personalize the results and tackles the problem of designing mechanisms for delivering personalized search results without the need for accurate user profiling.

1.3 Thesis contributions

Equity of Attention in Rankings. This dissertation develops a mechanism for reordering rankings such that each subject in the system receives attention from the searchers that is proportional to their relevance. It is, however, impossible to achieve such a proportionality in a single ranking – searchers are susceptible to position bias, which makes them pay disproportionately more attention to subjects ranked at the top, irrespective of relevance. We thus propose to amortize attention over time by reordering consecutive rankings. While addressing fairness concerns, reordering subjects in the ranking will lead to accuracy loss when order is no longer determined by relevance. Trying to balance both of these dimensions, we formalize reordering as a constrained optimization problem, where we minimize unfairness (measured as a disparity between attention and relevance) subject to constraints on ranking accuracy loss. Choosing appropriate fairness and quality measures, the problem can be solved as an Integer Linear Program (ILP). We apply and analyze the behavior of the proposed mechanism on synthetic and real-world data of rental apartments from the Airbnb platform⁵. This work was published as a full paper at SIGIR 2018 [Biega et al. 2018].

⁵ https://airbnb.com
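To make the amortization idea concrete, the following sketch implements a heavily simplified greedy variant: at each ranking round, subjects are ordered by their accumulated attention deficit, with a crude quality check that falls back to the relevance ranking. It is an illustration only, not the ILP-based mechanism of Chapter 4; the geometric position-bias model, the `max_quality_loss` threshold, and all names are assumptions made for this example.

```python
# Heavily simplified, greedy illustration of amortizing attention over a sequence
# of rankings (the thesis formulates this as an ILP; see Chapter 4).

def position_bias(rank, gamma=0.5):
    """Geometrically decaying attention received at a given rank position."""
    return gamma ** rank

def amortized_rerank(relevance, rounds, max_quality_loss=0.5):
    """relevance: dict subject -> relevance score (assumed static across rounds)."""
    attention = {s: 0.0 for s in relevance}   # accumulated attention share
    deserved = {s: 0.0 for s in relevance}    # accumulated relevance share
    ideal = sorted(relevance, key=relevance.get, reverse=True)
    history = []
    for _ in range(rounds):
        # Order subjects by attention deficit (deserved minus received), ties by relevance.
        ranking = sorted(relevance,
                         key=lambda s: (deserved[s] - attention[s], relevance[s]),
                         reverse=True)
        # Crude quality check: fall back to the relevance ranking if the reordering
        # displaces subjects too far from their relevance-based positions.
        displacement = sum(abs(ranking.index(s) - ideal.index(s)) for s in relevance)
        if displacement > max_quality_loss * len(relevance) ** 2:
            ranking = ideal
        # Accumulate attention shares (position bias) and deserved shares (relevance).
        weights = [position_bias(pos) for pos in range(len(ranking))]
        total_w, total_rel = sum(weights), sum(relevance.values())
        for pos, s in enumerate(ranking):
            attention[s] += weights[pos] / total_w
            deserved[s] += relevance[s] / total_rel
        history.append(ranking)
    return history

# Subjects with similar relevance get rotated into the top position over time.
print(amortized_rerank({"a": 0.9, "b": 0.85, "c": 0.3}, rounds=3))
```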

Sensitive Search Exposure. This dissertation develops methodology for quantifying sensitive search exposure. We define search exposure as the problem of finding all the queries that expose any of a user's posts in the top-k results in a community's search engine. With this formulation, the problem can be seen as reverse search. Thus, if we think about search as the problem of finding the k nearest neighbors (i.e., the k documents closest to the search query by a given similarity or relevance metric), one can cast search exposure as an instance of the well-defined problem of Reverse-k-Nearest-Neighbors. Generating such queries is not enough – our empirical analysis with user profiles from Twitter reveals that exposure sets for some users might be enormous and largely contain noisy and meaningless queries. To make the outputs useful to end users, we design a weakly-supervised learning-to-rank method, ordering the queries such that those at the top are most concerning. We show that the queries can be effectively ranked using only implicit signals which are readily available to service providers. This work was published as a full paper at CIKM 2017 [Biega et al. 2017a].
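As an illustration of this reverse view of search, the sketch below computes exposure sets by brute force: every candidate query is run against a toy word-overlap retrieval function and the top-k results are inverted into per-user exposure sets. The thesis instead relies on efficient reverse-k-nearest-neighbor search (which, per Section 1.5, is not a contribution of this thesis); the scoring function and data here are illustrative assumptions.

```python
# Naive computation of exposure sets: for every candidate query, retrieve the
# top-k documents and record the query in the exposure set of each returned
# document's owner. (A stand-in for efficient reverse-k-nearest-neighbor search.)
from collections import defaultdict

def score(query, doc):
    """Toy relevance function: word overlap between query and document."""
    q, d = set(query.split()), set(doc.split())
    return len(q & d) / (len(q) + 1e-9)

def exposure_sets(candidate_queries, docs_by_user, k=2):
    exposure = defaultdict(set)
    corpus = [(user, doc) for user, docs in docs_by_user.items() for doc in docs]
    for query in candidate_queries:
        ranked = sorted(corpus, key=lambda ud: score(query, ud[1]), reverse=True)[:k]
        for user, _ in ranked:
            exposure[user].add(query)
    return dict(exposure)

docs = {"alice": ["struggling with anxiety lately"], "bob": ["new python tutorial"]}
print(exposure_sets(["anxiety help", "python tutorial"], docs, k=1))
```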

R-Susceptibility. This dissertation develops methodology for quantifying privacy risks from textual data in an online community. We propose to quantify the risks using a skeleton of a topic model, which is a set of distributions over words. Each topic is annotated with a privacy sensitivity score determined in a crowdsourcing study. The model provides each user with the information on how relevant her postings are to each of the sensitive topics. Relevance is determined using a number of measures capturing how personal a user's interest in a given topic is. To this end, beyond pure lexical relevance, we model how broadly a user is interested in the domain the topic comes from (attempting to differentiate professional from personal interests), and how temporally spread a user's interest is (attempting to differentiate occasional and recurring interest). We moreover propose a notion of R-Susceptibility as a measure showing the user how high they rank in a given community with respect to a given sensitive topic. We evaluate the approach in a user study over profiles from three different online communities. This work was published as a full paper at SIGIR 2016 [Biega et al. 2016].
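A minimal sketch of the topical risk idea, assuming topic distributions for a user's postings and crowdsourced topic sensitivity scores are already available: a user's risk for a topic is approximated by the topic's average weight in her postings scaled by its sensitivity. The actual R-Susceptibility measures in Chapter 6 additionally model breadth and temporal spread of interest, which this toy function ignores.

```python
# Toy illustration of topic-based privacy risk scoring: aggregate each user's
# relevance to crowd-annotated sensitive topics.

def topical_risk(post_topic_weights, topic_sensitivity):
    """post_topic_weights: list of dicts topic -> weight (one dict per posting)."""
    n = max(len(post_topic_weights), 1)
    risk = {}
    for topic, sensitivity in topic_sensitivity.items():
        avg_weight = sum(p.get(topic, 0.0) for p in post_topic_weights) / n
        risk[topic] = sensitivity * avg_weight   # higher = more exposed to a sensitive topic
    return risk

posts = [{"health": 0.7, "sports": 0.3}, {"health": 0.2, "finance": 0.8}]
sensitivity = {"health": 0.9, "finance": 0.8, "sports": 0.1}
print(topical_risk(posts, sensitivity))
```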

Privacy-preserving personalization. This dissertation proposes a framework of Mediator Accounts allowing for personalization of search results without the need to store exact user interaction histories. The mediator platform splits and merges queries of different users into synthetic user profiles, guided by a privacy-utility trade-off. Privacy is achieved by random assignments, and utility by keeping semantically coherent contexts intact (that is, topically similar queries of a user are kept together). The thesis moreover proposes a formalization of the notions of profiling privacy and individual user utility. Our experimental results using a querylog synthesized from the questions on the StackExchange platform⁶ provided a detailed analysis of the trade-offs from the perspective of individual users, in contrast to much of the previous work focusing on system utility. Our results showed that it is indeed possible to reconcile substantial profiling privacy gains with low personalization utility loss, particularly for users with rich profiles and diversified interests. This work was published as a full paper at SIGIR 2017 [Biega et al. 2017b].

⁶ http://stackexchange.com
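The sketch below illustrates the splitting-and-merging idea in its simplest form: each user's queries are grouped into coarse topical buckets, and each bucket is assigned to a random synthetic profile, so that coherent fragments stay together while no mediator account accumulates a full history. The bucketing heuristic and the number of mediator accounts are illustrative assumptions, not the assignment model of Chapter 7.

```python
# Toy illustration of mediator accounts: split each user's queries into topical
# buckets and scatter the buckets across synthetic profiles at random.
import random
from collections import defaultdict

def topic_of(query):
    """Stand-in for a topical clustering of queries (here: first keyword)."""
    return query.split()[0]

def assign_to_mediators(user_queries, num_mediators=3, seed=0):
    rng = random.Random(seed)
    mediators = defaultdict(list)
    for user, queries in user_queries.items():
        buckets = defaultdict(list)
        for q in queries:
            buckets[topic_of(q)].append(q)        # keep coherent fragments together
        for bucket in buckets.values():
            mediators[rng.randrange(num_mediators)].extend(bucket)  # random assignment
    return dict(mediators)

logs = {"u1": ["python decorators", "python asyncio", "rash on arm"],
        "u2": ["rash treatment", "flights to Pisa"]}
print(assign_to_mediators(logs))
```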

In summary, the contributions of the thesis complement each other in a number of ways. First, we investigate two different societal problems – fairness (Chapter 4) and privacy (Chapters 5, 6, and 7). Second, we cover the problems of both search subjects (Chapters 4, 5, and 6) and searchers (Chapter 7). Third, in the context of exposure specifically, we propose mechanisms for dealing with both wanted (Chapter 4) and unwanted exposure (Chapters 5 and 6).

1.4 Other contributions of the author

The author of this thesis has co-authored a number of other papers and initiatives related to fairness and privacy, which are not included as contributions of this thesis.

• Our FATREC@RecSys 2017 paper [Chakraborty et al. 2017] focused on two-sided match-making platforms such as Uber. We argued that a single match cannot be fair as there are many relevant providers, yet only one provider receives the benefit of the match. To allow for a more uniform distribution of the benefits, we proposed to evaluate fairness over time by making sure the cumulative ratios of the deserved benefit and the actual benefit are the same for all the providers in a system. Interpreting the problem this way allowed us to draw a parallel to fair resource sharing algorithms, which have been well studied in the networking community.

• We have demonstrated the generalizability of the Mediator Accounts framework [Biega et al. 2017b] by applying it to the problem of Privacy of Hidden Profiles, which we identified and defined in our CIKM 2017 paper [Eslami et al. 2017]. Hidden profiles are the profiles of users who decided to leave an online community, but whose data was retained by the service providers for analytic purposes. Such data still poses privacy risks for the users as it can easily be passed on beyond the original intentions, upon a governmental inquiry, a company merger, or when the infrastructure of the provider is compromised. Our results show that it is possible to protect the hidden profiles by scrambling user data while keeping the analytic utility of the data minimally affected.

• Our work on Mediator Accounts [Biega et al. 2017b] used a querylog synthesized from an online Community Question Answering (CQA) community. The derivation methodology employed a simple heuristic for converting user questions to queries. To design better querylog derivation methods, we conducted a user study with the goal of understanding the query formulation process. We moreover proposed a methodology for deriving other characteristics of information retrieval collections, such as relevance judgments, from the structure of the CQA forums. This work is under submission.

• To facilitate further work on fairness in rankings, the author of this thesis has co-authored a successful proposal for a TREC track focusing on the problem of fairness⁷. TREC is an information retrieval conference whose goal is to design benchmarks (document collections with relevance judgments), as well as metrics and standardized experimentation protocols for the most important information retrieval tasks. This track will run for the first time in TREC 2019.

• Our PSBD@CIKM 2014 paper proposes a method to probabilistically predict whether users are personally afflicted by privacy-sensitive states (such as depression or pregnancy), feeding the lexical information from their search histories into a probabilistic graphical model. Preliminary experimental analysis in this paper showed that the method can achieve a good accuracy in predicting privacy-sensitive states. This work laid a foundation for the R-Susceptibility project [Biega et al. 2016].

⁷ https://fair-trec.github.io/

1.5 Prior publications

The results of this thesis have been published in the following articles:

1. Asia J. Biega, Krishna P. Gummadi, and Gerhard Weikum. Equity of attention: Amortizing individual fairness in rankings. In Proceedings of the 41st Interna- tional ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018, Ann Arbor, MI, USA, July 08-12, 2018, pages 405–414.

2. Asia J. Biega, Azin Ghazimatin, Hakan Ferhatosmanoglu, Krishna P. Gummadi, and Gerhard Weikum. Learning to un-rank: Quantifying search exposure for users in online communities. In Proceedings of the 2017 ACM Conference on Information and Knowledge Management, CIKM 2017, Singapore, November 06-10, 2017, pages 267–276. The efficient algorithm for generating exposure sets and the corresponding experiments (Sections 3, 5.2 and 5.3 in this publication) are not contributions of this thesis.

3. Joanna Asia Biega, Krishna P. Gummadi, Ida Mele, Dragan Milchevski, Christos Try- fonopoulos, and Gerhard Weikum. R-Susceptibility: An IR-centric approach to assessing privacy risks for users in online communities. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, SIGIR 2016, Pisa, Italy, July 17-21, 2016, pages 365–374.

4. Asia J. Biega, Rishiraj Saha Roy, and Gerhard Weikum. Privacy through solidarity: A user-utility-preserving framework to counter profiling. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, August 7-11, 2017, pages 675–684. The experiments on applying the Mediator Account framework to the scenario of recommender systems (Sections 5 and 7 in this publication) are excluded from this dissertation.

1.6 Organization

The remainder of the thesis is organized as follows. Chapters 2 and 3 provide a background on user privacy and algorithmic fairness. Chapter 4 describes our contributions related to fairness in rankings. Chapter 5 describes our contributions related to sensitive search exposure. Chapter 6 describes our contributions related to quantifying privacy risks from textual data. Chapter 7 describes our contributions related to privacy-preserving personalization for searchers. Finally, Appendices A and B provide additional details on the user studies conducted in this thesis, which were omitted in the corresponding publications due to space constraints.

Chapter 2 Background: User Privacy


2.1 Preliminaries

This background chapter provides an overview of privacy notions and methods for achieving them in various systems. We are concerned with the privacy of an individual user whose data is collected by the system. The data may be created or generated by the user (as is the case for social network postings or web browsing histories), or by others (as is the case for medical databases with patient information collected by hospitals).

Types of user data. User data may consist of structured attributes as well as unstructured text. Structured attributes may be binary (for instance, whether a user liked a certain item), categorical (for instance, marital status), or numerical (for instance, user age). Textual data may occur in the form of search queries, or social media postings. Certain attributes may also encode connections between multiple users - for instance in the form of friendship edges in social networks.

Privacy-sensitivity of data. Certain items in user data might be considered personally identifiable information (PII). While determining what constitutes PII is generally context specific, one might assume attributes such as a username or a social security number are personally identifying. These attributes are often referred to as identifiers. Certain attributes constitute quasi-identifiers. These attributes are not PII on their own, but may uniquely identify a user when combined with other quasi-identifiers. For instance, a given combination of a surname, a birth date, and a zip code might describe a unique individual. We call certain attributes sensitive. This is the information a user would like to protect in a given context – for instance, it might be undesirable to link diseases to patients, in which case the attribute disease would be considered sensitive. Sensitive attributes do not necessarily need to be explicitly present in the data - inferring the value of a sensitive attribute using the available data would also be considered a privacy breach.

Data usage scenarios. User data might be sanitized by a service provider if they want to make a public release (data publishing) or if they provide an interface for querying the data by analysts. Users might want to protect their data from the service providers, in which case sanitization is performed either by the user or a trusted third-party before the data reaches the service provider.

2.2 Privacy risks and notions

2.2.1 Information leakage

Linkability. Linkability (or identity disclosure) is a risk of matching data to a real-world individual who is the owner of the data or whom the data describes. While the most straightforward way to prevent this breach is to remove any personally identifying information, such protection is often not enough¹ because of the possibility of matching quasi-identifiers with external data sources. The protected data might contain sensitive information with quasi-identifiers but without personally identifiable information. External public data sources might contain personally identifiable information together with quasi-identifiers, making it possible to match anonymized records by quasi-identifier values. Such a strategy was used to deanonymize medical records of the Governor of Massachusetts using public employee records [Sweeney 2002b]. Moreover, research shows it is possible to de-anonymize individuals from social network graph data by correlating anonymized graphs with publicly available graphs [Narayanan and Shmatikov 2009], from browser fingerprints constructed from browser settings and plugins, which are available to any website a user visits [Eckersley 2010], or from movie rating data by correlating anonymized ratings with ratings publicly available on the IMDB² film website [Narayanan and Shmatikov 2008].

¹ https://en.wikipedia.org/wiki/AOL_search_data_leak
² https://imdb.com
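The following toy example shows the mechanics of such a linkage attack: an "anonymized" release with identifiers removed is joined with a public registry on quasi-identifier values. All records and attribute names are fabricated for illustration.

```python
# Toy linkage attack: join an "anonymized" release with a public dataset on
# quasi-identifier values (zip code, gender, birth date). All records are fabricated.

anonymized = [  # identifiers removed, sensitive attribute kept
    {"zip": "66123", "gender": "f", "birth": "1982-09-07", "diagnosis": "flu"},
    {"zip": "66125", "gender": "m", "birth": "1975-01-02", "diagnosis": "diabetes"},
]
public = [  # e.g., a voter or employee registry that includes names
    {"name": "Jane Doe", "zip": "66123", "gender": "f", "birth": "1982-09-07"},
]

def link(anonymized, public, quasi_ids=("zip", "gender", "birth")):
    matches = []
    for a in anonymized:
        for p in public:
            if all(a[q] == p[q] for q in quasi_ids):
                matches.append((p["name"], a["diagnosis"]))  # identity re-attached
    return matches

print(link(anonymized, public))  # [('Jane Doe', 'flu')]
```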

Attribute disclosure. Even if matching of an individual to a specific data record is not possible, it might still be possible to learn the value of a sensitive attribute of a given individual [Machanavajjhala et al. 2007]. Such a breach is called attribute disclosure. For instance, imagine we know an individual with the quasi-identifier values of (66123, male, 07.09.1982) who has been hospitalized at an institution releasing patient records. Even when we cannot uniquely link the person to a single row in the dataset, attribute disclosure will occur if each row the individual possibly maps to has the same value of the sensitive attribute.

Attribute disclosure can also be understood as gaining additional information about the value of a sensitive attribute from a dataset when compared to the prior knowledge [Li et al. 2007]. For instance, if an individual has a heart disease with the probability 0.6 according to the dataset, while the prior probability in a given population for having a heart disease is 0.1, the adversary increases her certainty about the sensitive value even though she does not learn the exact value.

Attribute inference. Privacy may also be violated when attributes and information not directly observable or recorded in the data are inferred from the observable data of multiple users using machine learning methods. While the two concepts are related, the main conceptual difference between attribute inference and disclosure is that disclosure typically refers to recovering the value of a sensitive attribute that is present in an anonymized dataset, while inference refers to learning the attributes not present in the data based on observations of attribute patterns in user populations. Various types of attribute inference have been demonstrated in the literature. For instance, it is possible to predict psychological and demographic traits from the items people like in a social network such as Facebook [Kosinski et al. 2013], predict whether two accounts in two different online communities belong to the same individual [Goga et al. 2015], predict a user’s location from the tags they add to their online postings [Zhang et al. 2018], or predict sensitive information (such as whether a person is on a vacation, driving drunk, or having a certain disease) from the textual contents of their social media postings [Mao et al. 2011]. Classifiers based on stylometric techniques for text, such as the analysis of usage of different words or syntactic and linguistic patterns, have been shown to enable authorship inference [Abbasi and Chen 2008; Narayanan et al. 2012].

2.2.2 Profiling

Loss of privacy can also occur when a large amount of data is collected about a user, as the entirety of such data (for instance, a search history spanning multiple years) may paint a very intimate picture of a person’s life. The risk might be defined, for instance, as the total amount of data collected [Singla et al. 2014; Biega et al. 2017b], or the total number of topics present in a user profile [Biega et al. 2017b; Meng et al. 2016b]. Such detailed data is usually collected to enable personalization [Biega et al. 2017b; Singla et al. 2014], or targeted advertising [Yu et al. 2016; Meng et al. 2016b].

2.2.3 Exposure

Privacy breaches include inappropriate exposure of user data. Inappropriate exposure might mean that data is accessible by an unintended audience [Sandhu et al. 1996], either shortly after data creation or long thereafter, when a user does not necessarily remember about the data's existence and exposure possibility [Mondal et al. 2016]. Moreover, access to data might be made easier through various platform features. Examples include Facebook's News Feed, where updates to the content of user profiles are actively broadcast to other users, or search engines, where access to user profiles is enabled by matching of keyword queries (see Chapter 5).

2.3 Achieving privacy

2.3.1 Limiting information leakage

K-anonymity. The success ratio of linkability based on quasi-identifiers depends on how many individuals share unique combinations of values. This insight underlies the anonymization technique called k-anonymity [Sweeney 2002b]. A dataset is k-anonymous if each combination of quasi-identifier values appears in the dataset at least k times. The protection offered by this mechanism is that the probability of correctly mapping an individual to a row in the database is at most 1/k. Means of preventing linkability include removal of any personally identifiable information, or perturbation of data so as to satisfy the k-anonymity requirement [Sweeney 2002a; Bayardo Jr. and Agrawal 2005; LeFevre et al. 2005, 2006]. Such perturbations include generalization, where values are replaced by sets of values (for instance, zip code 66123 might become 66***), and suppression, where individual user records are fully removed from the data. Alternative approaches include generation of synthetic datasets that satisfy privacy criteria while preserving patterns from the original data. This idea has been explored in the context of preserving frequent itemsets while ensuring k-anonymity [Vreeken et al. 2007], or preserving the distribution of attribute values while adding controlled noise to the values generated from these distributions [Howe et al. 2017].
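A minimal sketch of checking k-anonymity over a generalized table (the toy rows and quasi-identifiers are illustrative):

```python
# Check whether a table is k-anonymous with respect to a set of quasi-identifiers:
# every combination of quasi-identifier values must occur at least k times.
from collections import Counter

def is_k_anonymous(rows, quasi_ids, k):
    counts = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return all(c >= k for c in counts.values())

rows = [  # zip codes generalized to 66***, ages generalized to a range
    {"zip": "66***", "age": "30-40", "diagnosis": "flu"},
    {"zip": "66***", "age": "30-40", "diagnosis": "diabetes"},
    {"zip": "66***", "age": "30-40", "diagnosis": "flu"},
]
print(is_k_anonymous(rows, quasi_ids=("zip", "age"), k=2))  # True
```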

L-diversity. While mitigating the threat of linkability, k-anonymity does not offer full protection against sensitive attribute disclosure. To prevent such attribute disclosure, Machanavajjhala et al. [2007] proposed the notion of l-diversity, which requires that within each equivalence class of rows defined by a given combination of quasi-identifier values (that is, within each k-anonymous block), there exist at least l different values of the sensitive attribute. As a result, even if an adversary is able to map an individual to a given k-anonymous block unambiguously, she still faces uncertainty about the individual's sensitive data.
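Continuing the toy example, a (distinct) l-diversity check additionally requires at least l distinct sensitive values inside every block of rows sharing the same quasi-identifier values:

```python
# Check (distinct) l-diversity: within each block of rows sharing the same
# quasi-identifier values, there must be at least l distinct sensitive values.
from collections import defaultdict

def is_l_diverse(rows, quasi_ids, sensitive, l):
    blocks = defaultdict(set)
    for row in rows:
        blocks[tuple(row[q] for q in quasi_ids)].add(row[sensitive])
    return all(len(values) >= l for values in blocks.values())

rows = [
    {"zip": "66***", "age": "30-40", "diagnosis": "flu"},
    {"zip": "66***", "age": "30-40", "diagnosis": "diabetes"},
    {"zip": "66***", "age": "30-40", "diagnosis": "flu"},
]
print(is_l_diverse(rows, ("zip", "age"), "diagnosis", l=2))  # True
```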

T-closeness. L-diversity does not protect against attribute disclosure in a probabilistic sense. Aggregate statistics over the whole dataset provide the prior probability over the values of sensitive attributes in the population. If the distribution within the equivalence class an individual is mapped to is different from the prior distribution, the certainty about the value of the sensitive attribute of the individual changes. To prevent this kind of disclosure, the notion of t-closeness [Li et al. 2007] requires that the distributions of sensitive values within anonymous blocks are within a distance of at most t from the global distribution in the whole dataset. Distribution distance can be captured by metrics like the KL-divergence or the Earth Mover's distance.
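A sketch of a t-closeness check for a categorical sensitive attribute; for simplicity it uses total variation distance as a stand-in for the distances mentioned above (the original proposal uses the Earth Mover's distance):

```python
# Check t-closeness with total variation distance between the sensitive-value
# distribution of each quasi-identifier block and the global distribution.
from collections import Counter, defaultdict

def distribution(values):
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

def is_t_close(rows, quasi_ids, sensitive, t):
    global_dist = distribution([r[sensitive] for r in rows])
    blocks = defaultdict(list)
    for r in rows:
        blocks[tuple(r[q] for q in quasi_ids)].append(r[sensitive])
    for values in blocks.values():
        block_dist = distribution(values)
        # Total variation distance between block and global distributions.
        tv = 0.5 * sum(abs(block_dist.get(v, 0.0) - p) for v, p in global_dist.items())
        if tv > t:
            return False
    return True

rows = [
    {"zip": "66***", "diagnosis": "flu"},
    {"zip": "66***", "diagnosis": "diabetes"},
    {"zip": "67***", "diagnosis": "flu"},
    {"zip": "67***", "diagnosis": "flu"},
]
print(is_t_close(rows, ("zip",), "diagnosis", t=0.4))  # True
```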

Differential privacy. The notion of differential privacy [Dwork 2008] requires that the presence or absence of an individual's data in a dataset does not significantly change the output of a mechanism applied to the data. More precisely, a mechanism A satisfies ε-differential privacy if the following inequality holds for any two neighboring datasets D1 and D2 that differ by at most one row: P(A(D1) ∈ S) ≤ e^ε · P(A(D2) ∈ S), for any set S of outputs that the mechanism A may compute. Note that this requirement is imposed over a mechanism applied to the data, and not over the data itself, and is studied mostly in scenarios where an analyst uses a data querying interface. For example, assume an analyst wants to learn from census data D how many people in a given town have an income over $100k. Let us denote this query function as f, and the mechanism A(D) = f(D). The presence of an individual u with an income over $100k in the dataset will change this value by 1. If the answer of the system for the dataset without the individual is n, then P(A(D) ∈ {n}) = 1 and P(A(D ∪ {u}) ∈ {n}) = 0, thus violating the differential privacy requirement for any ε. To satisfy the requirement of differential privacy, a system needs to add random noise to the mechanism's results. For count queries, it has been shown that the privacy requirement can be satisfied by adding noise from the Laplace distribution with the scale parameter ∆f/ε, where ∆f = max_{D1,D2} |f(D1) − f(D2)|, also called the sensitivity, determines the maximum difference in the value of f when applied to neighboring datasets D1, D2 [Dwork 2008]. In the aforementioned example, instead of returning A(D) = f(D), the system returns A(D) = f(D) + Y, where Y ∼ Laplace(∆f/ε). The scale of the noise that needs to be added depends on the sensitivity of the function f (the higher the sensitivity, the bigger the scale of the noise), as well as the privacy parameter ε (the lower the ε and thus the stricter the privacy requirement, the bigger the scale of the noise). Because differential privacy prevents inference of an individual's presence in the data, it offers protection from both linkability and attribute disclosure.
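A minimal sketch of the Laplace mechanism for the count-query example above, where the sensitivity of a counting query is ∆f = 1:

```python
# Laplace mechanism for a count query: add noise with scale Δf/ε to the true answer.
import random

def laplace_noise(scale, rng=random):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon):
    true_count = sum(1 for r in records if predicate(r))      # f(D)
    sensitivity = 1.0  # one individual changes a count by at most 1 (Δf = 1)
    return true_count + laplace_noise(sensitivity / epsilon)  # f(D) + Y

incomes = [42_000, 120_000, 98_000, 150_000]
print(private_count(incomes, lambda x: x > 100_000, epsilon=0.5))
```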

Decreasing inference accuracy. In the context of sensitive attribute inference using machine learning models, privacy loss is usually quantified as the accuracy of the applied model – the more accurately a sensitive attribute can be predicted, the more user privacy is at stake. Protections against inference attacks focus on perturbing the data to decrease the accuracy of the predictions. For instance, Zheleva and Getoor [2009] studied how friendship and group membership information influences the accuracy of sensitive attribute prediction in social networks. Zhang et al. [2018] propose a method to select tags appended to a social posting to prevent the inference of the posting user's location. Decreasing inference accuracy is also used to prevent authorship attribution. Solutions along these lines include stylistic suggestions for the authors to decrease the uniqueness of their style [Kacmarcik and Gamon 2006; McDonald et al. 2012], or crowdsourcing text reformulations [Almishari et al. 2014].

2.3.2 Limiting profiling

Limiting data collection. To limit the total amount of data collected from users, Singla et al. [2014] propose the notion of stochastic privacy. Stochastic privacy limits the probability that different pieces of user data will be stored to enable further personalized service, thus effectively limiting the size of user profiles.
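As a toy illustration of the idea (the selection policy in the original work is more elaborate), the sketch below logs each interaction independently with probability p, bounding the expected fraction of a profile that is retained:

```python
# Toy illustration of limiting data collection: store each user interaction only
# with probability p, so the expected retained profile size is p times the original.
import random

def sampled_profile(interactions, p, seed=0):
    rng = random.Random(seed)
    return [x for x in interactions if rng.random() < p]

history = ["flu symptoms", "python asyncio", "flights to Pisa", "mortgage rates"]
print(sampled_profile(history, p=0.5))
```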

Obfuscating data. Profiling privacy might be protected by hiding real user interactions within fake data. Such an approach is taken, for instance, in the TrackMeNot browser extension, which issues synthetic search queries on a user's behalf [Howe and Nissenbaum 2009]. Similarly, a number of approaches have been proposed to obfuscate the topical interest of search users by issuing fake queries on topics not covered in a user's real search history [Pang et al. 2012; Wang and Ravishankar 2014]. Masood et al. [2018] propose a profile obfuscation method, where sensitive user data is replaced by semantically similar non-sensitive data.

Data grouping and splitting. An early idea of grouping user interactions was implemented in the Crowds system, in which requests were passed on in a random walk over a network of users before being forwarded to a server [Reiter and Rubin 1998]. A procedure like that results in requests of individuals being split across multiple identities. This thesis pursues a related idea by proposing a Mediator Accounts framework (Chapter 7), where the goal is to split and merge search queries of different users such that the personalization quality is minimally reduced. At the same time, we want to create synthetic profiles which do not resemble original user profiles. Instead of merging user profiles or interactions, it is also possible to split them into smaller chunks. Such ideas have been investigated in the context of personalized search, where search histories were divided into chunks with topically related queries [Chen et al. 2011; Xu et al. 2007]. As a result, for instance, instead of seeing an individual interested in programming, cooking, and sports, a search engine would see three individuals, each interested in one of the three aforementioned topics.

Anti-tracking. Beyond service providers directly collecting data about their users, there are third parties who track users when they browse the web. Such trackers collect information about the websites users visit, thus learning about their topical interests over time. Such data is then used to deliver targeted advertising to the users, a practice referred to as Online Behavioral Advertising. Several protection mechanisms have been proposed to protect user privacy in this context, either by requiring the website publishers to mediate between users and trackers by adding noise to user data [Akkus et al. 2012], by designing targeting architectures where user data is stored on a local device [Toubiana et al. 2010], by preventing collection of unique user attributes by the trackers [Yu et al. 2016], or by allowing users to select the information they share with the trackers, acknowledging that users might want to receive personalized ads for certain topics [Meng et al. 2016b].

2.3.3 Limiting exposure

Limiting data audience. Early approaches to privacy revolved around limiting access to data to a pre-specified audience. Such access control lists (ACLs) were defined either by specifying individuals or by specifying groups of individuals (role-based ACLs) [Sandhu et al. 1996]. Such approaches might be too cumbersome and give little flexibility to users with hundreds of connections in online social networks. To overcome this limitation, a number of solutions have been proposed, either helping users statically predefine ACLs based on the network structure using community detection approaches [Mazzia et al. 2012], or suggesting friends to share the content with on the fly when adding new posts using machine learning approaches [Fang and LeFevre 2010].

Limiting data lifetime. Even when exposure is desirable at the time of uploading the content, it might become undesirable when the content is re-discovered some time after publication. In the meantime, a user might forget such content is present in her digital traces, while her preferences and views change. An evolution like this is especially prominent when a young adult enters the job market with a long history of teenage social media postings. Motivated by such scenarios among others, Mondal et al.[2016] propose an inactivity-based content withdrawal mechanism where postings are automatically withdrawn after the initial audience interest fades.

Exposure awareness. Apart from strict exposure control, it is important to help users be aware of the exposure of their content, especially when it happens through complex and often non-transparent mechanisms. A number of studies have motivated the need for exposure support, showing that users consistently underestimate the size of their content's audience [Bernstein et al. 2013], or that people have strong feelings regarding exposure (thousands of Facebook users protested when the platform introduced the News Feed, believing their privacy was breached when updates to their profiles were actively broadcast to other users, even though the content was accessible to those same users upon a visit to individual profiles) [Boyd 2008]. Service providers do realize that such support is a crucial privacy awareness feature. For instance, Facebook allows its users to preview how their profile looks to other people via a functionality called View As. Several interface designs have been proposed to make users aware of the size of their content's audience, including showing a pair of eyes whose size is proportional to the size of the audience [Schlegel et al. 2011]. Arguably, in the context of search, the information on which keyword queries return a user's posts in a platform's search engine results is as important as the information on who can see the user's personal content. The work presented in Chapter 5 contributes to this line of work on exposure awareness in search.

2.4 Cost of privacy

Achieving privacy comes at the cost of utility loss. For instance, the goal of releasing structured data is for people to be able to compute certain statistics and gain insights from the data. When trying to achieve the requirements of k-anonymity, l-diversity, t-closeness, or differential privacy, these computations become inaccurate when attribute values are generalized, when rows get suppressed, or when noise is added to the results of queries. Generally, the higher the privacy, the lower the resulting utility of the data. Because there can be many different anonymizations satisfying a chosen privacy criterion, one usually chooses the one which offers the highest utility. For models such as k-anonymity, l-diversity, or t-closeness, utility is measured using proxy measures such as the number of resulting abstraction classes, the average number of rows in abstraction classes [LeFevre et al. 2005; Machanavajjhala et al. 2007; Li et al. 2007], the accuracy of classifiers predicting a sensitive attribute [Brickell and Shmatikov 2008], or the difference between the distributions of sensitive attributes in the original and the sanitized datasets [Li and Li 2009]. Techniques for achieving differential privacy control the amount of noise added to the data through the parameter ε. The lower the value of ε, the more noise one needs to add to achieve ε-differential privacy. Thus, the higher the privacy, the lower the utility of the data (that is, the accuracy of the mechanism results). Utility loss caused by obfuscation of user profiles is often measured using various personalization quality measures [Singla et al. 2014; Wang and Ravishankar 2014; Chen et al. 2011; Zhu et al. 2010; Biega et al. 2017b].
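The ε-utility trade-off can be seen in the standard Laplace mechanism, where a count query of sensitivity 1 is released with noise of scale 1/ε. The short sketch below is a generic illustration rather than any specific system from the literature; it shows how the expected error grows as ε shrinks.

```python
import numpy as np

def laplace_count_release(true_count, epsilon, sensitivity=1.0, rng=None):
    # Laplace mechanism: noise scale = sensitivity / epsilon, so smaller epsilon
    # (stronger privacy) means larger expected error and hence lower utility.
    rng = rng or np.random.default_rng()
    return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

rng = np.random.default_rng(0)
for eps in (1.0, 0.1, 0.01):
    noisy = np.array([laplace_count_release(1000, eps, rng=rng) for _ in range(1000)])
    print(eps, np.mean(np.abs(noisy - 1000)))  # mean absolute error grows as eps shrinks
```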

2.5 Privacy in search systems

In the context of the main theme of this thesis, it is worth revisiting the privacy problems specific to search systems.

Obfuscating searcher profiles. One of the goals in privacy-preserving IR is privacy-preserving search personalization. To this end, various approaches for obfuscating user profiles under constraints on personalization utility have been proposed. Such obfuscation approaches include: removing parts of the logs [Singla et al. 2014], generating fake search queries and mixing them with the user-issued search queries [Howe and Nissenbaum 2009; Pang et al. 2012; Wang and Ravishankar 2014], splitting logs into multiple logs [Chen et al. 2011; Xu et al. 2007], or grouping the logs of different users [Biega et al. 2017b; Zhu et al. 2010].

Anonymizing search logs. If a service provider wants to release a query log, the log needs to be sanitized to ensure the anonymity of the searchers whose data is being made public. A crucial step is the removal of any personally identifiable information. Assuming any query which appears in the log infrequently can be personally identifying, Adar[2007] proposed a scheme where queries in the logs are masked until they appear in a certain number of user profiles. Stronger anonymization techniques apply differential privacy mechanisms over raw query logs and publish the results returned by these mechanisms instead of the original log. For example, Zhang and Yang[2017] propose publishing query sessions (small subsets of user profiles encompassing sequences of queries issued within a short time frame to satisfy a single information need) with differentially private session counts, while Zhang et al.[2016b] propose a differentially private mechanism that publishes query counts without user profiles. Götz et al.[2012] provide a summary of the guarantees offered by query log anonymization mechanisms whose goal is to publish frequent query log elements.
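A minimal sketch of frequency-based query masking in the spirit of Adar's proposal is shown below; the threshold k and the masking token are illustrative assumptions, not parameters from the original work.

```python
from collections import Counter

def mask_infrequent_queries(query_log, k=5, mask="<MASKED>"):
    """Mask queries that appear in fewer than k distinct user profiles.

    query_log: list of (user_id, query) pairs; returns a sanitized copy.
    """
    users_per_query = Counter()
    for user, query in set(query_log):        # count each (user, query) pair once
        users_per_query[query] += 1
    return [(user, query if users_per_query[query] >= k else mask)
            for user, query in query_log]
```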

Adversarial inference using search logs. Another line of work investigates various adversarial attacks using search logs. For instance, Peddinti and Saxena[2010] and Gervais et al.[2014] have demonstrated machine learning approaches that enable distinguishing real and fake queries in obfuscated user search logs, while Jones et al.[2007] showed that it is possible to infer demographic and location information from query histories.

Private information retrieval. A related theme in security research is that of private information retrieval, where a user wants to retrieve an item from a database stored on a server without revealing to the server which item is being requested [Chor et al. 1995]. The goal in this field is to design protocols that are more efficient than the baseline approach in which the server sends a full copy of the database to the requester.

Sensitive search exposure. This thesis tackles the novel problem of sensitive search exposure for search subjects. In Chapter 5 we propose a methodology for finding all the sensitive queries that expose a subject in the top-k results of a given search system. In Chapter 6 we show how sensitive exposure can serve as a tool for quantifying privacy risks in textual data.

2.6 Selected other dimensions in privacy research

Sensitivity analysis. For textual data, automated privacy mechanisms should be able to determine which content is sensitive. To this end, several approaches build on the idea that content is sensitive if people tend to create it anonymously. In this spirit, Correa et al. [2015] showed that linguistic factors such as usage of the first person singular pronoun, or semantic factors such as the topics of money, work, emotions, and sexuality, can be used to train machine learning methods to distinguish anonymous from non-anonymous social media posts. Peddinti et al.[2014] studied the differences between anonymous and non-anonymous posts in the online question and answer community Quora, finding that beyond conventional sensitive topics such as sex, health, or religion, people often choose to post anonymously on topics such as education and educational institutions, or patent law.

Longitudinal privacy. Beyond the privacy threats stemming from occasional data release, it is important to understand the long-term effects of information disclosure. Mondal et al.[2016] studied the content deletion behavior of Twitter users and proposed methods to control longitudinal privacy by automatically hiding inactive content. Rizoiu et al.[2016] studied the longitudinal privacy of Wikipedia editors, measured as the prediction accuracy of attributes such as gender, education, or religion. The paper demonstrates that the privacy of editors who become inactive and do not contribute any new data still decreases over time, as the accuracy of predictors increases thanks to data contributed by other users. Eslami et al.[2017] proposed a mechanism for perturbing information that stays in the system after a user decides to close her account, so as to minimize the effect of privacy breaches over data she no longer controls.

Human perceptions of privacy. Research in privacy is often guided by user perceptions and privacy needs. To that end, researchers in the field of usable privacy perform user studies and interviews to understand these preferences as well as misconceptions people have about the way systems deal with user data. In the context of online identity management, for instance, Leavitt[2015] has found that people who feel non-anonymous using their primary identity on Reddit are more likely to create temporary throwaway accounts to post content. In the context of targeted advertising, Ur et al.[2012] have found that users consider such advertising techniques both useful and creepy, incorrectly believing that personally identifiable information is collected as a part of the process. Following up on this study, Agarwal et al.[2013] found that user concerns are context-dependent, and that users are generally only concerned about topically embarrassing ads. Moreover, users do not generally want to opt-out of targeted advertising, considering some of the personalized ads as useful. These results highlight that users do not want to completely lose the utility of online services to preserve their privacy.

Economics of privacy. One interesting line of thought in privacy research is the economics of privacy. For instance, data can be seen as a product users need to be remunerated for. Li et al.[2014] have proposed a pricing mechanism where analysts pay for making queries over a dataset, and the payments are distributed to the dataset users in proportion to the contribution of their data to the query answer. Behavioral economics techniques have been applied to study how much users value privacy and how they make data sharing decisions [Acquisti 2009]. Acquisti et al.[2016] provide a broad literature survey in this area.

Chapter 3 Background: Algorithmic Fairness

Contents
3.1 Preliminaries
3.2 Algorithmic fairness notions
3.2.1 Group fairness
3.2.2 Individual fairness
3.3 Achieving algorithmic fairness
3.4 Accountability
3.5 Cost of fairness
3.6 Sources of algorithmic unfairness
3.7 Fairness in search systems
3.8 Selected other dimensions in algorithmic fairness research

Anti-discrimination laws have been introduced to account for the fact that certain groups of people have been historically subordinated. Regulations specify attributes, such as gender, race, or religion, along which it is illegal to discriminate in domains including education, employment, credit, or housing. Society is beginning to realize that digital systems can unfairly discriminate as well. The realization that seemingly objective computer systems can be biased has led to a number of questions the research community has sought to answer. How to define fairness mathematically? How to design mechanisms that are fair? How to audit black-box systems to hold them accountable? What constitutes undesired bias and how can it be measured?

3.1 Preliminaries

We investigate algorithmic fairness for individuals or groups of individuals defined by one of the aforementioned protected attributes. Issues of fairness matter in tasks such as classification or search, and in applications that involve people as subjects or system users.

Classification and regression. Classification and regression methods are increasingly used in finance to predict customer creditworthiness, in hiring to predict whether a candidate is going to make a good employee, or in justice systems to predict whether a convict will reoffend on parole. Because in each of these scenarios algorithmic decisions influence the lives and opportunities of the subjects, the fairness of these decisions has become a concern. To illustrate the most important fairness notions, we focus on binary classification. In this setup, one of the classification outcomes is usually considered positive. Assume a classifier is to make a hiring decision - whether to hire an individual or not. Each individual X is represented using a feature vector x describing various characteristics the designer of the classifier thought were important for the task, for instance, age, gender, highest degree received, GPA score, etc. Values can be categorical or numerical. The goal is to train a classifier to make binary hiring decisions D ∈ {0, 1} (hire or no hire). The classifier is trained using data in the form of pairs (x, y): feature vectors of past hires annotated with binary ground-truth decisions y specifying whether individual X had good performance reviews from her manager two years after being hired. The performance score is a proxy definition for being a good employee and a ground for a positive hiring decision. A classifier learns patterns from a collection of vectors x that distinguish people with positive and negative ground-truth values. We further choose gender as the protected attribute and denote by x_G the value of the gender feature of individual X. The setup for a regression task is similar to that of classification, the difference being that we predict a real-valued target attribute. In the hiring context, we might want to predict, for example, how many years a person is likely to stay at the company.
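A minimal, purely illustrative sketch of this setup is given below; the toy features, labels, and choice of a logistic regression model are assumptions made for demonstration and are not taken from any of the cited works.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy hiring data: columns are [age, gender, has_degree, gpa];
# gender is the protected attribute x_G (1 and 0 denote the two groups).
X = np.array([
    [29, 1, 1, 3.6],
    [35, 0, 1, 3.1],
    [41, 0, 0, 2.8],
    [26, 1, 1, 3.9],
    [31, 0, 1, 3.4],
    [38, 1, 0, 2.9],
])
y = np.array([1, 1, 0, 1, 0, 0])  # proxy ground truth: good performance review after two years

clf = LogisticRegression().fit(X, y)
decisions = clf.predict(X)        # D in {0, 1}: hire / no hire
print(decisions)
```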

Search and recommendation. In many domains, search engines rank people (explicitly, or implicitly by ranking the content and products people produce). Since ranking positions in scenarios like this influence people's real-world economic livelihood, issues of fair representation in rankings have become a major concern. In the hiring context, for example, employers might screen for potential employees using search engines on hiring support platforms. As a response to a keyword query q issued by an employer, such as machine learning engineer, the platform returns a ranked list of candidates e_1, ..., e_k ordered by a relevance score r(q, e_i) computed by a ranking algorithm. The ranking method can be based on data statistics or employ machine learning techniques. Machine learning approaches use training data with labels provided by expert annotators (an annotator determines which user profiles are relevant to which queries), or implicitly inferred from user click patterns. For example, a user is likely to click on profiles she deems relevant to her query. Chuklin et al.[2015] provide a detailed overview of various click models used to infer relevance. A hiring platform might also proactively recommend potential employees to recruiters. In such a context, a recommendation algorithm can be thought of as a search system where the employer is the query. Recommendation strategies can recommend candidates to employers based on a candidate's similarity to previous candidates the employer interacted with (item-item recommendation), because the candidate interacted with other employers similar to the given employer (user-user recommendation), or using a mix of both approaches (collaborative filtering).
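As a concrete example of how click patterns relate to relevance, the sketch below implements one simple click model from this literature, a position-based model in which the click probability factorizes into a rank-dependent examination probability and the candidate's relevance; the numerical values are illustrative assumptions.

```python
import numpy as np

def position_based_click_probability(relevance, examination):
    """Position-based click model: P(click at rank j) = P(examine rank j) * relevance_j.

    relevance:   per-candidate relevance in [0, 1], in ranked order.
    examination: per-rank examination probabilities, decreasing with rank (position bias).
    """
    return np.asarray(relevance) * np.asarray(examination)

# Equally relevant candidates still receive very different expected clicks,
# because the examination probability drops quickly with rank.
print(position_based_click_probability([0.7, 0.7, 0.7], [1.0, 0.5, 0.25]))
```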

3.2 Algorithmic fairness notions

3.2.1 Group fairness

Notions of group fairness broadly aim at making sure that algorithms do not disproportionately adversely impact members of protected groups.

Demographic parity. The notion of demographic parity requires that $P(D = 1 \mid x_G = 1) = P(D = 1 \mid x_G = 0)$. For example, in the context of job application classification with the protected attribute gender, this requirement means that groups of people of different genders have an equal chance of being accepted. Beyond demographic or statistical parity, this notion has appeared in the literature under a variety of different names, including avoiding disparate impact [Feldman et al. 2015], independence [Barocas et al. 2018], and anti-classification [Corbett-Davies and Goel 2018]. For tasks other than classification, demographic parity is often understood as equal representation in the results. For instance, clustering algorithms should make sure that different groups are similarly represented in all the clusters [Chierichetti et al. 2017], recommendation algorithms should make sure different groups are similarly represented in recommendation sets [Mehrotra et al. 2018] or that group proportions in recommendation sets are similar to group proportions in the input ratings [Ekstrand et al. 2018b], while ranking algorithms should make sure groups are similarly represented in ranking prefixes [Yang and Stoyanovich 2017; Celis et al. 2018; Zehlike et al. 2017; Singh and Joachims 2018].
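The sketch below shows a hypothetical empirical check of demographic parity on held-out decisions; it simply compares the acceptance rates of the two groups.

```python
import numpy as np

def demographic_parity_gap(decisions, protected):
    """Absolute difference between the groups' acceptance rates.

    decisions: array of D in {0, 1}; protected: array of x_G in {0, 1}.
    A gap of 0 corresponds to demographic parity.
    """
    decisions, protected = np.asarray(decisions), np.asarray(protected)
    return abs(decisions[protected == 1].mean() - decisions[protected == 0].mean())

print(demographic_parity_gap([1, 0, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0]))  # ~0.33
```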

Performance parity. Another category of group fairness definitions revolves around the idea of equal error rates, thus requiring equal performance of an algorithm for different groups of people. Since error can be captured using a variety of different metrics, various papers have focused on satisfying different metric equalities. Notable examples include: equality of true positive rates, also known as equality of opportunity [Hardt et al. 2016]: $P(D = 1 \mid y = 1, x_G = 1) = P(D = 1 \mid y = 1, x_G = 0)$; equality of both true positive and false positive rates, also known as equalized odds [Hardt et al. 2016]: $P(D = 1 \mid y = 1, x_G = 1) = P(D = 1 \mid y = 1, x_G = 0)$ and $P(D = 1 \mid y = 0, x_G = 1) = P(D = 1 \mid y = 0, x_G = 0)$; equality of misclassification rates (including equality of false negative rates), also known as lack of disparate mistreatment [Zafar et al. 2017]: $P(D = 0 \mid y = 1, x_G = 1) = P(D = 0 \mid y = 1, x_G = 0)$; and equality of positive predictive values, also known as calibration: $P(y = 1 \mid D = 1, x_G = 1) = P(y = 1 \mid D = 1, x_G = 0)$. Fairness has also been studied as error parity for different groups in recommendations [Ekstrand et al. 2018a; Yao and Huang 2017], and in search (using measures of satisfaction for searchers) [Mehrotra et al. 2017].
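A hypothetical check of these parity criteria can be computed from predictions in a few lines; the example below reports per-group true positive and false positive rates, whose equality corresponds to equality of opportunity and equalized odds, respectively.

```python
import numpy as np

def group_rates(y_true, decisions, protected):
    """True positive and false positive rates per protected group."""
    y_true, decisions, protected = map(np.asarray, (y_true, decisions, protected))
    rates = {}
    for g in (0, 1):
        mask = protected == g
        rates[g] = {
            "TPR": decisions[mask & (y_true == 1)].mean(),  # P(D=1 | y=1, x_G=g)
            "FPR": decisions[mask & (y_true == 0)].mean(),  # P(D=1 | y=0, x_G=g)
        }
    return rates

print(group_rates([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0], [1, 1, 1, 0, 0, 0]))
```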

It has been shown that satisfying different mathematical notions of fairness simultaneously is generally not feasible [Kleinberg et al. 2017b; Chouldechova 2017]. These results highlight the importance of analyzing the context of a given application and choosing a fairness definition best serving the cause.

3.2.2 Individual fairness

Dwork et al.[2012] observed that the requirement of demographic parity might be satisfied by accepting qualified individuals from one group and random individuals from another. Thus, satisfying certain notions of group fairness might mean degrading fairness on an individual level. This observation has led to the notion of individual fairness, which posits that individuals similar with respect to the task at hand should have similar probabilities of positive classification outcomes. Definitions along these lines have also been investigated in other algorithmic scenarios. For instance, Kearns et al.[2017] proposed a notion of individual fairness for the problem of candidate set selection from diverse, incomparable source sets. An example of such a problem is choosing faculty interview candidates from a number of diverse research communities which are not directly comparable to each other in terms of research metrics (for instance, citation rates differ across research communities). The proposed notion of meritocratic fairness requires that less qualified candidates are probabilistically almost never preferred over more qualified candidates when selecting the candidate subsets. This thesis contributes to the individual fairness line of research by developing methods for making rankings individually fair. Assuming search relevance can be used to determine the similarity of individuals with respect to the task, we propose a fairness notion where each ranked individual receives attention from searchers that is proportional to her relevance. This contribution is presented in Chapter 4.

3.3 Achieving algorithmic fairness

To achieve algorithmic fairness, interventions can be made at different steps of the processing pipeline. A broader overview of various approaches along these lines is provided by Friedler et al.[2018].

Pre-processing methods. Pre-processing methods aim at compensating for biases in the data which might contribute to algorithmic unfairness. Some of the approaches focus on balancing the datasets. For instance, Feldman et al.[2015] modify the numerical attributes in the data to equalize marginal distributions of these attributes conditioned on the sensitive attributes. Hajian and Domingo-Ferrer[2013] propose modifying the values of attributes and labels in a dataset to prevent mining of unfair association rules from the datasets [Pedreschi et al. 2008]. Other approaches construct intermediary (lower-dimensional) representations of the data so as to strip the information about sensitive attributes, while keeping the utility of the modified data for the task at hand [Zemel et al. 2013; Lahoti et al. 2018].

In-processing methods. In-processing approaches try to prevent unfair outcomes by modifying the algorithms. Interventions along these lines most commonly take the form of regularizers reflecting certain soft constraints. When defining an optimization objective for algorithm training, beyond a component controlling the error, regularizers are introduced to control certain structural properties of models. While primarily used to reduce model complexity and prevent overfitting, regularizers can also capture the unfairness of the model.

Fairness regularization has, for example, been considered for classification and regression [Berk et al. 2017; Kamishima et al. 2012], and recommendation [Yao and Huang 2017]. Zafar et al.[2017] propose encoding fairness notions as additional constraints added on top of accuracy optimization objectives.

Post-processing methods. Post-processing approaches modify the outputs of algorithms to satisfy fairness criteria. For example, Fish et al.[2016] propose a method which shifts decision boundaries of trained classifiers to achieve statistical parity, while ensuring a minimal decrease in accuracy. Hardt et al.[2016] modify the decision score thresholds of a trained model to balance the true positive rates of different groups. Kamiran et al.[2010] propose a method for relabeling the nodes of decision tree classifiers to ensure demographic parity.

3.4 Accountability

Once fairness criteria for algorithms are specified, a question remains whether systems actually adhere to such standards and how one can audit systems externally to hold them accountable. Audit mechanisms have been proposed to examine whether protected features influence the outcomes of algorithmic decisions in black-box systems [Adler et al. 2018]. Kilbertus et al.[2018] proposed a method based on encryption of sensitive attributes that enables auditing machine learning models for the absence of disparate impact without having the users disclose the values of their sensitive attributes. Kroll et al.[2016] provide an overview of computational techniques that could be applied to assure compliance of algorithmic outcomes with legal requirements, taking into account that transparency might be limited by the business incentives of service providers.

3.5 Cost of fairness

Satisfying different algorithmic fairness requirements might lead to a decrease in the quality and utility of the algorithmic outputs. This trade-off has been explored for all common algorithmic tasks, including classification [Hardt et al. 2016; Zafar et al. 2017], regression [Berk et al. 2017], and ranking [Zehlike et al. 2017; Singh and Joachims 2018; Biega et al. 2018]. Leonhardt et al.[2018] and Mehrotra et al.[2018] show how increasing the diversity of groups represented in recommendation sets might decrease the satisfaction of recommendation consumers.

3.6 Sources of algorithmic unfairness

In search systems, implicit relevance information is often collected from click data. If searchers exhibit bias towards certain groups or individuals, algorithms will learn to imitate these biases in the displayed results. Researchers have uncovered that advertisements for high-paying jobs are shown more often to men than to women [Datta et al. 2015], and that advertisements for criminal record checks are more often shown in response to queries with names commonly associated with African Americans [Sweeney 2013]. The roots of both of these phenomena can be traced to biased click behaviors of search engine users. Algorithmic unfairness is thus not necessarily a result of unfair mechanisms; it can also stem from various forms of data and human biases.

Model biases. When translating complex real-world problems into computational tasks, we necessarily make certain assumptions and simplifications. For instance, as pointed out by Barocas and Selbst[2016], it is not straightforward to determine what creditworthiness or a good employee exactly mean in terms of machine learning prediction variables. Moreover, humans very often need to be described using simplistic low-dimensional representations, with only selected features present. Some of these features, even when unproblematic on the surface, might be strongly correlated with the protected attributes. Evaluation metrics will furthermore determine which aspects models optimize for and which are ignored. Biases might also emerge from a mismatch between the modeling assumptions and the reality of system use contexts. For example, a platform creator might assume that when a recruiter likes a candidate profile on a hiring platform, she thinks the profile is relevant to her search requirement. In reality, however, recruiters might use the feature simply to mark profiles for further investigation. The development choices regarding the task abstraction, features, target variables, and metrics will influence the bias of the resulting models.

Data biases. Unfairness may stem from the underrepresentation of certain populations in the data. For instance, when ranking programming job candidates, more data may be available about male programmers, leading to better system performance for male applicants. Note that such underrepresentation might be a result of biased sampling processes, or of activity and self-selection biases of platform users (for instance, the platform might be unpopular with female programmers, or women might share less data with the platform on average). Data might also contribute to algorithmic unfairness when it is re-purposed for a new task. System developers working with such data might not understand the methods and metrics used when collecting the data, or the technical and normative limitations of the platform on which the data was generated. Gebru et al.[2018] propose a standardized dataset description template which could help foster more conscious data reuse practices. Olteanu et al.[2016] provide a detailed discussion of various data biases and limitations.

Human biases. Human biases may enter digital systems in various ways. Machine learning algorithms might be trained on data encoding certain stereotypes: for instance, training labels for the hiring decision task might be provided by annotators with a strong gender bias, or generated from historical hiring decisions made when such gender bias was a reflection of the societal reality. Various cognitive biases influence the way people interact with information and interfaces. For instance, the fact that people tend to scan information from the top down when investigating ranked results leads to position bias, where users pay most of their attention to items ranked high [Joachims et al. 2005]. Eickhoff[2018] has studied how human cognitive biases might influence the results of crowdsourcing studies. Baeza-Yates[2018] discusses further forms of user interaction bias.

Caliskan et al.[2017] have demonstrated that human association biases (perceiving certain semantic concepts as more related than others) are replicated in embeddings trained on text corpora. While the goal of word embeddings is to capture such similarities, certain associations are socially undesirable. For instance, Bolukbasi et al.[2016] show that word embeddings associate men more strongly than women with words related to programming. An association of this kind would be problematic if the embeddings were used as a plug-in component in downstream applications such as resume ranking.

Quantifying bias. A number of efforts have been directed towards measuring and auditing systems for the presence of undesired bias. These include, for instance, empirical studies of how gender influences positions in ranked outputs on various human resource platforms [Chen et al. 2018], audits of biased surge pricing practices in ride-hailing platforms [Chen et al. 2015], frameworks for deconstructing input and output biases in search over political social media postings [Kulshrestha et al. 2017], and audits of bias in personalized political search results [Robertson et al. 2018].

Note that the biases discussed here differ from the notion of bias known in statistics, whereby the expected value of a statistical parameter estimator differs from the true parameter value. In particular, even if an estimator is unbiased in the statistical sense, the estimated value might represent an undesirable social phenomenon. For instance, even if a certain minority population underperforms in a school admission test, we might want to modify a statistically unbiased predictor of the performance, knowing that the worse performance of the minority is a result of worse access to educational resources. Conversely, a statistically biased performance predictor might not violate any societal fairness notions if, for instance, it underestimates performance equally for everyone.

Friedman and Nissenbaum[1996]; Barocas and Selbst[2016]; Olteanu et al.[2016]; Baeza-Yates[2018] provide comprehensive overviews of the sources of technical, human, and data biases.

3.7 Fairness in search systems

In the context of the main theme of this thesis, it is worth revisiting the fairness problems specific to search systems. Work in this area has primarily focused on fairness for the search subjects.

Fair representation through diversity. Zehlike et al.[2017] focused on fair representation of protected groups in ranking prefixes, and proposed a statistical fairness test for determining whether a given ranking was generated according to a Bernoulli trial, as well as a post-processing algorithm for reshuffling rankings to pass the fairness test. Celis et al.[2018] study the complexity of the problem of fair representation of groups in rankings.

Fair representation through exposure. Apart from the notions of diversity, fairness based on equal exposure has been proposed in parallel in our work [Biega et al. 2018], discussed in Chapter 4, and by Singh and Joachims[2018]. Singh and Joachims[2018] focus on notions of group fairness and develop a probabilistic mechanism guaranteeing ex-ante group exposure fairness in expectation. Zehlike and Castillo[2018] follow up on this work by incorporating group-fair exposure regularizers in learning-to-rank algorithms. The work conducted in this thesis proposes a notion of individual fairness, where each ranked subject should receive attention from searchers proportional to her relevance, and this requirement is explicitly amortized across a sequence of rankings for ex-post fairness.

Quantifying and detecting unfairness. Yang and Stoyanovich[2017] proposed measures to quantify bias in ranked outputs inspired by standard IR evaluation measures, where the protected category membership information is used in place of the relevance information. Wu et al.[2018] propose a methodology for analyzing the causality of error in ranked outputs. To this end, the authors construct a directed graph where discrete user profile attributes influence a synthetic score derived from the ranking position, and perform causality analysis on the resulting graph. A notion of a nutritional label, wherein different quantitative statistics are presented to the users, has been proposed both for Web documents returned as search results [Fuhr et al. 2017] and for the rankings themselves [Yang et al. 2018].

Fair ranking quality. On the searchers’ side, fairness has been understood as error parity. Mehrotra et al.[2017] proposed a measurement methodology to quantify different levels of satisfaction from the search results for different demographic groups.

3.8 Selected other dimensions in algorithmic fairness research

Human decision making. While the majority of work in the area of algorithmic fairness focuses on machine learning algorithms that replace humans in decision making, some authors have investigated how decision making can be enhanced with humans and automated predictions working in concord [Kleinberg et al. 2017a; Valera et al. 2018].

Human perceptions. Grgic-Hlaca et al.[2018a] have studied human perceptions of fairness with regard to the usage of certain features in machine learning algorithms. Beyond fairness, researchers have sought to understand whether increased interpretability of machine learning models increases the trust people have in these models [Poursabzi-Sangdeh et al. 2018].

Procedural fairness. While the majority of literature focuses on the fairness of outcomes, a separate question is that of procedural fairness, that is, whether the model itself operates in a fair way. Along these lines, Grgic-Hlaca et al.[2018b] have proposed to crowdsource opinions on whether the usage of certain features is fair in the context of criminal risk prediction, and designed a submodular optimization problem to minimize the unfairness of feature use while preserving classifier accuracy.

Fair matching and resource division. Work on fairness in computational economics includes designing incentives for two-sided economy producers to prevent discriminatory treatment of consumers [Kannan et al. 2017], or procedures for fair division of resources [Abebe et al. 2017; Chakraborty et al. 2017]. Social and legal scientists have also investigated problems arising from power asymmetries in two-sided economy platforms [Rosenblat and Stark 2016; Calo and Rosenblat 2017].

Ethics of experimentation. Large-scale experimentation involving humans, such as A/B testing, raises many ethical concerns. Bird et al.[2016] propose a number of principles the design of such experiments should follow, including informed consent from the users, minimizing the potential harm done to the users while maximizing research benefits, and fairly distributing the potential harm risks among the users.

Predictive policing. A number of articles point out the problems predictive policing might lead to. For example, it has been shown that there might exist feedback loops leading to increased police presence in historically over-policed neighborhoods [Lum and Isaac 2016; Ensign et al. 2018].

Long-term effects of fair machine learning. Recent efforts have begun to focus on the long-term impact of fairness constraints, investigating whether the fairness interventions proposed in the literature thus far might have undesired effects. For instance, enforcing demographic parity in credit risk prediction might lead to a situation where members of protected groups default more often [Liu et al. 2018].

Chapter 4 Equity of Attention

Contents
4.1 Introduction
4.2 Equity-of-attention fairness
4.2.1 Notation
4.2.2 Defining equity of attention
4.2.3 Equality of attention
4.2.4 Relation to group fairness in rankings
4.3 Rankings with equity of attention
4.3.1 Measuring (un)fairness
4.3.2 Measuring ranking quality
4.3.3 Optimizing fairness-quality tradeoffs
4.3.4 An ILP-based fair ranking mechanism
4.4 Experiments
4.4.1 Data
4.4.2 Position bias
4.4.3 Implementation and parameters
4.4.4 Mechanisms under comparison
4.4.5 Data characteristics: relevance vs. attention
4.4.6 Performance on synthetic data
4.4.7 Performance on Airbnb data
4.4.8 Performance on StackExchange data
4.5 Related work
4.6 Conclusion

Rankings of people and items are at the heart of selection-making, match-making, and recommender systems, ranging from employment sites to sharing economy platforms. As ranking positions influence the amount of attention the ranked subjects receive, biases in rankings can lead to an unfair distribution of opportunities and resources such as jobs or income. This chapter proposes new measures and mechanisms to quantify and mitigate unfairness from a bias inherent to all rankings, namely, the position bias, which leads to disproportionately less attention being paid to low-ranked subjects. Our approach differs from recent fair ranking approaches in two important ways. First, existing works measure unfairness at the level of subject groups while our measures capture unfairness at the level of individual subjects, and as such subsume group unfairness. Second, as no single ranking can achieve individual attention fairness, we propose a novel mechanism that achieves amortized fairness, where attention accumulated across a series of rankings is proportional to accumulated relevance. We formulate the challenge of achieving amortized individual fairness subject to constraints on ranking quality as an online optimization problem and show that it can be solved as an integer linear program. Our experimental evaluation reveals that unfair attention distribution in rankings can be substantial, and demonstrates that our method can improve individual fairness while retaining high ranking quality.

4.1 Introduction

Motivation and Problem. Rankings of subjects like people, hotels, or songs are at the heart of selection, matchmaking, and recommender systems. Such systems are in use on a variety of platforms that affect different aspects of life, from entertainment and dating all the way to employment and income. Notable examples of platforms with a tangible impact on people's livelihood include two-sided sharing economy websites, such as Airbnb or Uber, or human-resource matchmaking platforms, such as LinkedIn or TaskRabbit. The ongoing migration to online markets and the growing dependence of many users on these platforms for securing an income have spurred investigations into the issues of bias, discrimination, and fairness in the platforms' mechanisms [Calo and Rosenblat 2017; Levy and Barocas 2017]. One aspect in particular has evaded scrutiny thus far: to be successful on these platforms, ranked subjects need to gain the attention of searchers. Since exposure on the platform is a prerequisite for attention, subjects have a strong desire to be highly ranked. However, when inspecting ranked results, searchers are susceptible to position bias, which makes them pay most of their attention to the top-ranked subjects. As a result, lower-ranked subjects often receive disproportionately less attention than they deserve according to the ranking relevance. Position bias has been studied in information retrieval in scenarios where the subjects are documents such as web pages [Craswell et al. 2008; Chuklin et al. 2015]. It has been shown that top-ranked documents receive most clicks, often irrespective of their actual relevance [Joachims and Radlinski 2007]. Systemic correction for the bias becomes important when ranking positions potentially translate into financial gains or losses. This is the case when ranking people on platforms like LinkedIn or Uber, products on platforms like Amazon, or creative works on platforms like Spotify. For example, concentrating exposure on a subset of drivers in ride-hailing platforms might lead to economic starvation of others, while low-ranked artists on music platforms might not get their deserved chance of earning royalties. Observing that attention is influenced by a human perception bias, while relevance is not, uncovers a fundamental problem: there necessarily exists a discrepancy between the attention that subjects receive at their respective ranks and their relevance in a given search task. For example, attention could decrease geometrically, whereas relevance scores may decrease linearly as the rank decreases. If a ranking is displayed unchanged to many searchers over time, the lower-ranked subjects might be systematically and repeatedly disadvantaged in terms of the attention they receive.

Problem statement. A vast body of ranking models literature has focused on aligning system relevance scores with the true relevance of ranked subjects, and in this work we assume the two are proportional. What we focus on instead is the relation between relevance and attention. Since relevance can be thought of as a proxy for worthiness in the context of a given search task, the attention a subject receives from searchers should ideally be proportional to her relevance. In economics and psychology, a similar idea of proportionality exists under the name of equity [Walster et al. 1973] and is employed as a fairness principle in the context of distributive justice [Greenberg 1987]. Thus, in this thesis, we make a translational normative claim and argue for equity of attention in rankings. Operationally, the problem we address is to devise measures and mechanisms which ensure that, for all subjects in the system, the received attention approximately equals the deserved attention, while preserving ranking quality. For a single ranking this goal is infeasible, since attention is influenced by position bias, while relevance is not. Therefore, our approach looks at a series of rankings and aims at measures of amortized fairness.

State of the art and limitations. Fairness has become a major concern for decision- making systems based on machine learning methods. Various notions of group fairness have been investigated [Kamishima et al. 2012; Pedreschi et al. 2008; Feldman et al. 2015; Hardt et al. 2016; Zafar et al. 2017], with the goal of making sure that protected attributes such as gender or race do not influence algorithmic decisions. Fair classifiers are then trained to maximize accuracy subject to group fairness constraints. These approaches, however, do not distinguish between different subjects from within a group. The notion of individual fairness [Dwork et al. 2012; Zemel et al. 2013; Kearns et al. 2017] aims at treating each individual fairly by requiring that subjects who are similar to each other receive similar decision outcomes. For instance, the concept of meritocratic fairness requires that less qualified candidates are almost never preferred over more qualified ones when selecting candidates from a set of diverse populations. Relevance-based rankings, where more relevant subjects are ranked higher than less relevant ones, also satisfy meritocratic fairness. A stronger fairness concept, however, is needed for rankings to be a means of distributive justice. Prior work on fair rankings is scarce and includes approaches that perturb results to guarantee various types of group fairness. This goal is achieved by techniques similar to those for ranking result diversification [Celis et al. 2018; Yang and Stoyanovich 2017; Zehlike et al. 2017], or by granting equal ranking exposure to groups [Singh and Joachims 2018]. Individual fairness is inherently beyond the scope of group-based perturbation.

Approach and contribution. Our approach in this thesis differs from the prior work in two major ways. First, the measures introduced here capture fairness at the level of individual subjects, and subsume group fairness as a special case. Second, as no single ranking can guarantee fair attention to every subject, we devise a novel mechanism that ensures amortized fairness, where attention is fairly distributed across a series of rankings.

For an intuitive example, consider a ranking where all the relevance scores are almost the same. Such tiny differences in relevance will push subjects apart in the display of the results, leading to a considerable difference in the attention received from searchers. To compensate for the position bias, we can reorder the subjects in consecutive rankings so that everyone who is highly relevant is displayed at the top every now and then. Our goal is not just to balance attention, but to keep it proportional to relevance for all subjects while preserving ranking quality. To this end, we permute subjects in each ranking so as to improve fairness subject to constraints on quality loss. We cast this approach as an online optimization problem, formalizing it as an integer linear program (ILP). We moreover devise filters to prune the combinatorial space of the ILP, which ensures that it can be solved in an online system. Experiments with synthetic and real-life data demonstrate the viability of our method. Note that we assume that searchers are indifferent to the varying ordering in the ranking except for the utility loss. In practice, searchers might be confused if they see very different results for the same queries. While this thesis does not tackle this problem, possible solutions might include incremental updates to ranking changes, or ensuring that varying results for a given query are shown to different searchers. This chapter makes the following novel contributions:
• To the best of our knowledge, we are the first to formalize the problem of individual equity-of-attention fairness in rankings, and define measures that capture the discrepancy between the deserved and received attention.
• We propose online mechanisms for fairly amortizing attention over time in consecutive rankings.
• We investigate the properties and behavior of the proposed mechanisms in experiments with synthetic and real-world data.

4.2 Equity-of-attention fairness

We now formally define equity of attention, accounting for position bias, which determines how attention is distributed over the ranking positions. We consider a sequence of rankings at different time points, by different criteria or on request of different users.

4.2.1 Notation

We use the following notation:

• $u_1, \dots, u_n$ is a set of subjects ranked in a system,

• $\rho^1, \dots, \rho^m$ is a sequence of rankings,

• $r_i^j$ is the $[0..1]$-normalized relevance score of subject $u_i$ in ranking $\rho^j$,

• $a_i^j$ is the $[0..1]$-normalized attention value received by subject $u_i$ in ranking $\rho^j$,

• $A$ denotes the distribution of cumulated attention across subjects, that is, $A_i = \sum_{j=1}^{m} a_i^j$ for subject $u_i$,

• $R$ denotes the distribution of cumulated relevance across subjects, that is, $R_i = \sum_{j=1}^{m} r_i^j$ for subject $u_i$.

4.2.2 Defining equity of attention

Our fairness notion in this work is in the spirit of the individual fairness proposed by Dwork et al.[2012], which requires that "similar individuals are treated similarly", where "similarity" between individuals is a metric capturing suitability for the task at hand. In the context of rankings, we consider relevance to be a measure of subject suitability. Further, in applications where rankings influence people's economic livelihood, we can think of rankings not as an end, but as a means of achieving distributive justice, that is, fair sharing of certain real-world resources. In the context of rankings, we consider the attention of searchers to be a resource to be distributed fairly. There exist different types of distributive norms, one of them being equity. Equity encodes the idea of proportionality of inputs and outputs [Walster et al. 1973], and might be employed to account for "differences in effort, in productivity, or in contribution" [Yaari and Bar-Hillel 1984]. Building upon these ideas, we make a translational normative claim and propose a new notion of individual fairness for rankings called equity of attention, which requires that ranked subjects receive attention that is proportional to their worthiness in a given search task. As a proxy for worthiness, we turn to the currently best available ground truth, that is, the system-predicted relevance.

Definition 1 (Equity of Attention). A ranking offers equity of attention if each subject receives attention proportional to its relevance:
\[
\frac{a_{i_1}}{r_{i_1}} = \frac{a_{i_2}}{r_{i_2}}, \quad \forall u_{i_1}, u_{i_2}. \tag{4.1}
\]

Note that this definition is unlikely to be satisfied in any single ranking, since the relevance scores of subjects are determined by the data and the query, while the attention paid to the subjects (in terms of views or clicks) is strongly influenced by position bias. The effects of this mismatch will be aggravated if multiple subjects are similarly relevant, yet obviously cannot occupy the same ranking position and receive similar attention. To operationalize our definition in practice, we propose an alternative fairness definition that requires attention to be distributed proportionally to relevance, when amortized over a sequence of rankings.

Definition 2 (Equity of Amortized Attention). A sequence of rankings $\rho^1, \dots, \rho^m$ offers equity of amortized attention if each subject receives cumulative attention proportional to her cumulative relevance, i.e.:
\[
\frac{\sum_{l=1}^{m} a_{i_1}^l}{\sum_{l=1}^{m} r_{i_1}^l} = \frac{\sum_{l=1}^{m} a_{i_2}^l}{\sum_{l=1}^{m} r_{i_2}^l}, \quad \forall u_{i_1}, u_{i_2}. \tag{4.2}
\]

Observe that this modified fairness definition allows us to permute individual rankings so as to satisfy fairness requirements over time. The deficiency in the attention received by a subject relative to her relevance in a given ranking instance can be compensated in a subsequent ranking, where the subject is positioned higher relative to her relevance.
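To make Definition 2 operational, a minimal Python sketch (an illustration, not the thesis code) accumulates attention and relevance over a sequence of rankings and reports the attention-to-relevance ratios per subject; under equity of amortized attention these ratios would be equal.

```python
import numpy as np

def amortized_ratios(attention_per_ranking, relevance_per_ranking):
    """Attention-to-relevance ratios A_i / R_i accumulated over m rankings.

    Both arguments are m x n arrays: rows are rankings, columns are subjects.
    Under equity of amortized attention (Definition 2), all ratios are equal.
    """
    A = np.asarray(attention_per_ranking).sum(axis=0)  # cumulated attention A_i
    R = np.asarray(relevance_per_ranking).sum(axis=0)  # cumulated relevance R_i
    return A / R

# Two subjects with equal relevance; swapping them across two rankings equalizes the ratios.
attention = [[0.8, 0.2],   # ranking 1: subject 1 on top
             [0.2, 0.8]]   # ranking 2: subject 2 on top
relevance = [[1.0, 1.0],
             [1.0, 1.0]]
print(amortized_ratios(attention, relevance))  # [0.5 0.5]
```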

4.2.3 Equality of attention

In certain scenarios, it may be desirable for subjects to receive the same amount of attention, irrespective of their relevance. Such is the case when we suspect the ranking is biased and cannot confidently correct for that bias, or when the subjects are not shown as an answer to any query but need to be visually displayed in a ranked order (e.g., a list of candidates on an informational website for an election). In such scenarios, the desired notion of fairness would be equality of attention. We observe that this egalitarian version of fairness is a special case of equity of attention, where the relevance distributions are uniform, i.e., $r_{i_1} = r_{i_2}, \forall u_{i_1}, u_{i_2}$. As equity of attention subsumes equality of attention, we do not explicitly discuss it further in this thesis.

4.2.4 Relation to group fairness in rankings

To our knowledge, all prior works on fairness in rankings have focused on notions of group fairness, which define fairness requirements over the collective treatment received by all members of a demographic group like women or men. Our motivation for tackling fairness at the individual level stems from the fact that position bias affects all individuals, independently of their group membership. It is easy to see, however, that when equity of attention is achieved for individuals, it will also be achieved at the group level: the cumulated attention received by all members of a group will be proportional to their cumulated relevance. Prior work on fairness in rankings [Celis et al. 2018; Yang and Stoyanovich 2017; Zehlike et al. 2017] has mostly focused on diversification of the results. These approaches are geared towards one-time rankings and, as any static model, will steadily accumulate equity-of-attention unfairness over time. Since they were developed with a different goal in mind, they are not directly comparable to our dynamic approach. In parallel with our work, Singh and Joachims[2018] have explored similar ideas of how position bias influences fairness of exposure. Their probabilistic formulations are possibly a counterpart of our amortization ideas, and it will be interesting to see to what extent these formulations are interchangeable. In line with other prior works on fairness in rankings, and different from our work, however, they focus on satisfying constraints on group rather than individual fairness, and on notions of equality rather than equity.

4.3 Rankings with equity of attention

4.3.1 Measuring (un)fairness

To be able to optimize ranking fairness, we need to measure to what extent a sequence of rankings $\rho^1, \dots, \rho^m$ violates Definition 2. Since the proposed fairness criterion is equivalent to the requirement that the empirical distributions $A$ and $R$ be equal, we can measure unfairness as the distance between these two distributions. A variety of measures can be applied here, including the KL-divergence or the L1-norm distance. In this work, we measure fairness using the latter:

\[
\mathrm{unfairness}(\rho^1, \dots, \rho^m) = \sum_{i=1}^{n} |A_i - R_i| = \sum_{i=1}^{n} \left| \sum_{j=1}^{m} a_i^j - \sum_{j=1}^{m} r_i^j \right|. \tag{4.3}
\]
The L1-norm is minimized with a value of 0 for distributions satisfying the fairness criterion from Definition 2, and is thus useful as an optimization objective. However, since the measure is cumulative and indifferent to the exact distribution of unfairness among individuals, other measures could be developed to quantify unfairness in the system at any given point.
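A minimal implementation of this measure (illustrative, not the evaluation code used later in the chapter) follows directly from the definition:

```python
import numpy as np

def unfairness(attention_per_ranking, relevance_per_ranking):
    """L1 distance between cumulated attention and cumulated relevance (Eq. 4.3).

    Both arguments are m x n arrays (rankings x subjects). The value is 0 exactly
    when every subject's cumulated attention equals her cumulated relevance.
    """
    A = np.asarray(attention_per_ranking).sum(axis=0)
    R = np.asarray(relevance_per_ranking).sum(axis=0)
    return np.abs(A - R).sum()
```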

4.3.2 Measuring ranking quality

Permuting a ranking to satisfy fairness criteria can lead to a quality loss when less relevant subjects get ranked higher than more relevant ones. We propose to quantify ranking quality using measures that draw from IR evaluation. Traditionally, ranking models are evaluated in comparison with ground-truth rankings based on human-given relevance labels. Here, we are interested in quantifying the divergence from the original ranking. Thus, we consider the original ranking $\rho$ to be the ground-truth reference for evaluating the quality of a reordered ranking $\rho^*$. We assume that the ground-truth scores are the relevance scores returned by the system, and that these scores reflect the best ordering of subjects. These considerations lead to the following definitions. Discounted cumulative gain (DCG) quantifies the quality of a ranking by summing the relevance scores at consecutive positions with a logarithmic discount for the values at lower positions. The measure thus puts an emphasis on having higher relevance scores at top positions.
\[
DCG@k(r) = \sum_{i=1}^{k} \frac{2^{r(i)} - 1}{\log_2(i + 1)} \tag{4.4}
\]
This value can be further normalized by the DCG score of a perfect ranking ordered by the ground-truth relevance scores. The normalized discounted cumulative gain (NDCG)-based quality measure can thus be expressed as:
\[
\text{NDCG-quality}@k(\rho, \rho^*) = \frac{DCG@k(\rho^*)}{DCG@k(\rho)} \tag{4.5}
\]
This measure is maximized with a value of 1 if the rankings do not differ or if swaps are only made within ties (i.e., subjects with equal relevance). Other measures, like Kendall's Tau or an appropriately defined MAP-quality, could be applied as well.
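For concreteness, a small sketch of these two measures (an illustration under the stated definitions, not the chapter's experimental code) is given below.

```python
import numpy as np

def dcg_at_k(relevances, k):
    """DCG@k over relevance scores given in ranked order (Eq. 4.4)."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))
    return ((2 ** rel - 1) / discounts).sum()

def ndcg_quality(original, reordered, k):
    """NDCG-quality@k: DCG of the reordered ranking normalized by the original (Eq. 4.5)."""
    return dcg_at_k(reordered, k) / dcg_at_k(original, k)

original  = [0.9, 0.8, 0.3, 0.1]   # relevance scores in the system's order
reordered = [0.8, 0.9, 0.3, 0.1]   # top two subjects swapped
print(ndcg_quality(original, reordered, k=4))
```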

4.3.3 Optimizing fairness-quality tradeoffs

As discussed in the previous section, there is “no free lunch”: to improve fairness, we need to perturb relevance-based rankings, which might lead to lower ranking quality. To address the tradeoff, we can formulate two types of constrained optimization problems: one where we minimize unfairness subject to constraints on quality (i.e., lower-bound the minimum acceptable quality), and another where we maximize quality subject to constraints on unfairness (i.e., upper-bound the maximum acceptable unfairness). In this thesis, we focus on the former, since at the moment ranking quality measures are more interpretable, and so are the constraints on quality.

4.3.3.1 Offline optimization

Let ρ^1, ..., ρ^m be a sequence of rankings where the subjects are ordered by the relevance scores. These rankings induce zero quality loss. We wish to reorder them into ρ^{1*}, ..., ρ^{m*} so as to minimize the distance between the distributions A and R, with constraints on the NDCG-quality loss in each ranking:

minimize  Σ_i |A_i − R_i|
subject to  NDCG-quality@k(ρ^j, ρ^{j*}) ≥ θ,  j = 1, ..., m    (4.6)

where A_i and R_i denote the cumulated attention and relevance scores that subject u_i has gained across all the m rankings. Instead of thresholding the loss in each individual ranking, an alternative would be to threshold the average loss over the m rankings.

4.3.3.2 Online optimization

In practice, ranking amortization needs to be done in an online manner, one query at a time. Without knowledge of future query loads, the goal is then to reorder the current ranking so as to minimize unfairness over the cumulative attention and relevance distributions in the rankings seen so far, subject to a constraint on the quality of the current ranking. Thus, for the l-th ranking we want to:

minimize  Σ_i |A_i^{l−1} + a_i^l − (R_i^{l−1} + r_i^l)|
subject to  NDCG-quality@k(ρ^l, ρ^{l*}) ≥ θ    (4.7)

where A_i^{l−1} and R_i^{l−1} denote the cumulated attention and relevance scores that subject u_i has gained up to and including ranking ρ^{l−1}.

4.3.4 An ILP-based fair ranking mechanism

4.3.4.1 ILP for online attention amortization

The optimization problem defined in Sec. 4.3.3.2 can be solved as an integer linear program (ILP). Assume we are to rerank the l-th ranking in a series of rankings. We introduce n² decision variables X_{i,j}, which are set to 1 if subject u_i is assigned to ranking position j, and to 0 otherwise. At the time of reordering the l-th ranking, the following values are constants:

• relevance scores for each subject u_i in the current ranking: r_i^l,

• attention values assigned to ranking positions: w_j,

• relevance scores accumulated up to (and excluding) the current ranking for each subject: R_i^{l−1},

• attention values accumulated up to (and excluding) the current ranking for each subject: A_i^{l−1},

• the IDCG@k value computed over the current ranking ρ^l, which is used as a normalization score for NDCG-quality@k.

For each subject u_i, the accumulated attention and relevance are initialized as A_i^0 = 0 and R_i^0 = 0. The ILP is then defined as follows:

minimize  Σ_{i=1}^{n} Σ_{j=1}^{n} |A_i^{l−1} + w_j − (R_i^{l−1} + r_i^l)| · X_{i,j}

subject to  Σ_{j=1}^{k} Σ_{i=1}^{n} ((2^{r_i^l} − 1) / log_2(j + 1)) · X_{i,j} ≥ θ · IDCG@k
            X_{i,j} ∈ {0, 1},  ∀ i, j
            Σ_i X_{i,j} = 1,  ∀ j
            Σ_j X_{i,j} = 1,  ∀ i    (4.8)

The first constraint bounds the loss in ranking quality, in terms of the NDCG-quality measure, by the multiplicative threshold 0 ≤ θ ≤ 1. The other constraints ensure that the solution is a bijective mapping of subjects onto ranking positions. The terms A_i^{l−1} + w_j and R_i^{l−1} + r_i^l encode the updates of the cumulative attention and relevance, respectively, if and only if u_i is mapped to position j. It is worth noting that:

• When θ = 1, we do not allow any quality loss. This, however, does not mean that the ranking will remain unchanged: subjects can still be reordered within ties to minimize unfairness.

• When θ = 0, any permutation of the ranking is allowed, and the ILP strives to minimize unfairness in the current iteration.
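A compact sketch of this ILP using Gurobi's Python interface (gurobipy), the solver used in Section 4.4.3, is shown below; the helper's name, signature, and data layout are our assumptions, and a deployed version would operate on the prefiltered candidate set described next.

```python
import numpy as np
import gurobipy as gp
from gurobipy import GRB

def rerank_online(A_prev, R_prev, rel, w, k, theta):
    """One step of online attention amortization (Eq. 4.8).

    A_prev, R_prev: accumulated attention/relevance per subject (length n)
    rel:            current relevance scores r_i^l (length n)
    w:              position weights w_j, rescaled to sum to 1 (length n)
    k, theta:       quality rank cut-off and quality threshold
    Returns a list mapping position j -> index of the subject placed there.
    """
    n = len(rel)
    gains = 2.0 ** np.asarray(rel) - 1.0
    idcg = float(np.sum(np.sort(gains)[::-1][:k] / np.log2(np.arange(2, k + 2))))

    m = gp.Model("equity_of_attention")
    X = m.addVars(n, n, vtype=GRB.BINARY, name="X")

    # Objective: each X[i, j] contributes the (constant) unfairness term
    # |A_i^{l-1} + w_j - (R_i^{l-1} + r_i^l)| if subject i is placed at position j.
    m.setObjective(
        gp.quicksum(abs(A_prev[i] + w[j] - (R_prev[i] + rel[i])) * X[i, j]
                    for i in range(n) for j in range(n)),
        GRB.MINIMIZE)

    # Quality constraint: DCG@k of the reordered ranking >= theta * IDCG@k.
    m.addConstr(
        gp.quicksum(gains[i] / np.log2(j + 2) * X[i, j]
                    for j in range(k) for i in range(n)) >= theta * idcg)

    # Bijection: exactly one subject per position and one position per subject.
    m.addConstrs(gp.quicksum(X[i, j] for i in range(n)) == 1 for j in range(n))
    m.addConstrs(gp.quicksum(X[i, j] for j in range(n)) == 1 for i in range(n))

    m.optimize()
    return [next(i for i in range(n) if X[i, j].X > 0.5) for j in range(n)]
```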

4.3.4.2 ILP with candidate pre-filtering

The ILP operates on a huge combinatorial space, with the number of binary variables being quadratic in the number of subjects. Real systems deal with millions of subjects, and the optimization needs to be carried out each time a new ranking is requested. Such a problem size is a bottleneck for ILP solvers, and in practice the optimization needs to resort to approximation algorithms, such as LP relaxations or greedy-style heuristics; this is one of the directions for further research. To deal with the issue in this work, instead of reranking all subjects in each iteration, we rerank only subjects from a prefiltered candidate set. Different strategies are possible for selecting the candidate sets. On the one hand, prefiltering the top-ranked subjects by relevance scores would let us satisfy the quality constraints, but may entail small fairness gains, especially for near-uniform relevance distributions. On the other hand, prefiltering based on the objective function might lead to situations where the ILP cannot find any solution without violating the constraints.¹

¹ Without prefiltering, the ILP always has at least one feasible solution (the original ranking).

Our strategy is thus as follows. Assume we want to select a candidate subset D of t subjects to be reranked, and we constrain the quality in Eq. 4.8 at rank k. Since the attention weights w_j are positive, the biggest contributors to the objective function are the subjects with the smallest values of A_i − (R_i + r_i). These are the subjects with the highest deficit (negative value) of fair-share attention. We always select the k subjects with the highest relevance scores in r^l, to make sure we can satisfy the quality constraint, plus the t − k other subjects with the lowest A_i − (R_i + r_i) values, who are most worthy of being promoted to high ranks. As a result, when no feasible solution can be found by reranking the most worthy subjects, the ILP defaults to choosing the top-k candidates by relevance scores.
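The selection strategy can be sketched as follows; the helper name and the NumPy-based signature are ours.

```python
import numpy as np

def prefilter_candidates(A, R, rel, k, t):
    """Candidate pre-filtering (Sec. 4.3.4.2): the k most relevant subjects,
    so that the quality constraint at rank k stays satisfiable, plus the
    t - k subjects with the largest attention deficit A_i - (R_i + r_i).
    Returns the indices of the t selected subjects.
    """
    by_relevance = np.argsort(-np.asarray(rel))        # most relevant first
    top_k = list(by_relevance[:k])
    deficit = np.asarray(A) - (np.asarray(R) + np.asarray(rel))
    by_deficit = [i for i in np.argsort(deficit) if i not in top_k]
    return top_k + by_deficit[:t - k]
```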

4.3.4.3 Extensions

Granularity. The presented model assumes that attention and relevance are aggregated per ranked subject. It is straightforward to extend it to handle higher-level actors such as product brands or Internet domains, by summing the relevance and attention scores over the corresponding subjects. As a consequence of this modification, bigger organizations would obtain higher exposure. Deciding whether this effect is fair is a policy issue.

Handling dynamics. In a real-world system, the size of the population will vary over time, with new subjects joining and existing ones dropping out. Our model is capable of handling this kind of dynamics, since new users starting with no deserved attention will be positioned in between the users who got more than they deserved and those who got less. Moreover, ranking quality constraints will prevent such users from being positioned too low in rankings where they are highly relevant.

4.4 Experiments

4.4.1 Data

The datasets we use are either synthetically generated or derived from other publicly available resources. They are freely available to other researchers.

4.4.1.1 Synthetic datasets

We create 3 synthetic datasets to analyze the performance of the model in a controlled setup under different relevance distributions. We assume the following distribution shapes: (i) uniform, where every user has the same relevance score, (ii) linear, where the scores decrease linearly with the rank position, and (iii) exponential, where the scores decrease exponentially with the rank position. Each dataset has 100 subjects.
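For reference, the synthetic relevance distributions can be generated along the following lines; the exact decay rate of the exponential shape is not specified in the text, so the value below is an assumption.

```python
import numpy as np

def synthetic_relevances(shape, n=100, decay=0.1):
    """Relevance scores for the synthetic datasets (uniform, linear,
    exponential), normalized to sum to 1; `decay` is an assumed parameter."""
    if shape == "uniform":
        scores = np.ones(n)
    elif shape == "linear":
        scores = np.arange(n, 0, -1, dtype=float)      # decreasing with rank
    elif shape == "exponential":
        scores = np.exp(-decay * np.arange(n))         # decreasing with rank
    else:
        raise ValueError(f"unknown shape: {shape}")
    return scores / scores.sum()
```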

4.4.1.2 Airbnb datasets

To analyze the model in a real-world scenario, we construct rankings based on Airbnb² apartment listings from 3 cities located in different parts of the world: Boston, Geneva, and Hong Kong. Airbnb is a two-sided sharing economy platform allowing people to offer their free rooms or apartments for short-term rental. It is a prime example of a platform where exposure and attention play a crucial role in the subjects' financial success. The data we use is freely available for research.³ Rankings are constructed using the attribute id as a subject identifier, and various review ratings as the ranking criteria, with the rating scores serving as relevance scores. Such crowd-sourced judgments serve as a good worthiness-of-attention proxy on this particular platform, although one has to keep in mind that rating distributions tend to be skewed towards higher scores, which is confirmed by our experimental analysis. For each of the 3 datasets, we run the amortization model on two types of ranking sequences:

² https://www.airbnb.com/

1. Single-query: We examine the amortization effects when a single ranking is repeated multiple times. To construct the rankings, we use the values of the review_scores_rating attribute, which corresponds to the overall quality of the listing.

2. Multi-query: We examine the behavior of the model when a sequence of rankings, each with a different relevance distribution, is repeated multiple times. To this end, for each city, we construct 7 rankings based on different rating attributes: review_scores_rating, review_scores_accuracy, review_scores_cleanliness, review_scores_checkin, review_scores_communication, review_scores_location, and review_scores_value.

The datasets for Boston, Geneva, and Hong Kong contain 3944, 1728, and 4529 subjects, respectively. Note that, for the purpose of model performance evaluation, the queries themselves become irrelevant once the relevance is computed. Since the values of the aforementioned attributes serve as relevance scores, the queries are abstracted out.

4.4.1.3 StackExchange dataset

We create another dataset from a query log and a document collection synthesized from the StackExchange dump by Biega et al. [2017b]; please refer to the original paper for details. We choose a random subset of users and order their queries by timestamps, creating a workload of around 20K queries. We use Indri⁴ to retrieve the 500 most relevant answers for each query, and treat the author of an answer as the subject to be ranked. Using this dataset helps us gain insight into the performance of the method in core IR tasks and with different sets of subjects ranked in each iteration.

³ Downloaded from http://insideairbnb.com/
⁴ https://www.lemurproject.org/indri/

4.4.2 Position bias

Our model requires that we assign a weight to each ranking position, denoting the fraction of the total attention the position gets. These weights will depend on the application and platform, and may be estimated from historical click data. In this thesis, we study the behavior of the equity-of-attention mechanism under generic models of attention distribution. We focus on the following distributions:

1. Geometric: The weights of the positions are distributed geometrically with parameter p up to position k, and are 0 for positions lower than k. Geometrically distributed weights are a special case of the cascade model [Craswell et al. 2008], where each subject has the same probability p of being clicked. Setting the weights of lower positions to 0 is based on the assumption that low-ranked subjects are not inspected.

w_j = p(1 − p)^{j−1} if j ≤ k,  and  w_j = 0 if j > k    (4.9)

2. Singular: The top-ranked subject receives all the attention. This is a special case of the geometric attention model with parameters p = 1, k = 1. Studying this attention model is motivated by systems such as Uber, which present only top-1 matches to the searchers by default.

w_j = 1 if j = 1,  and  w_j = 0 if j > 1    (4.10)

Before being passed on to the model, the weights are rescaled such that Σ_j w_j = 1. Studying the effects of position bias on individual fairness under more complex attention models is future work.
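Both attention models can be sketched in a few lines; the function below is illustrative and already applies the rescaling so that the weights sum to 1.

```python
import numpy as np

def attention_weights(n, model="geometric", p=0.5, k=5):
    """Position weights w_j for Eqs. 4.9-4.10, rescaled to sum to 1.
    The singular model is the geometric model with p = 1 and k = 1."""
    if model == "singular":
        p, k = 1.0, 1
    j = np.arange(1, n + 1)
    w = np.where(j <= k, p * (1.0 - p) ** (j - 1), 0.0)
    return w / w.sum()
```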

4.4.3 Implementation and parameters

We implement the ILP-based amortization defined in Section 4.3.4 using the Gurobi software.⁵ Constraints are set to be satisfied up to a feasibility threshold of 1e−7. We prefilter 100 candidates for reranking in each iteration, as described in Section 4.3.4.2. In the singular attention model, since all the attention is assumed to go to the first ranking position, the ILP constrains the NDCG-quality at rank k = 1. We construct the geometric attention model with p = 0.5 and k = 5, and in this case the ILP constrains the NDCG-quality at rank k = 5. In the single-query mode, where a single ranking is repeated multiple times, we set the number of iterations to 20K. In the multi-query mode, with a repeated sequence of different rankings, we repeat the whole sequence 3K times, which leads to a total of 21K rankings. Relevance scores in the framework need to be normalized to form a distribution. In this work, we assume relevance is a direct proxy for worthiness and rescale the rating scores linearly. Note, however, that if additional knowledge is available to the platform regarding

the correspondence between relevance and worthiness, other transformations can be applied as well.

⁵ http://www.gurobi.com/

4.4.4 Mechanisms under comparison

We compare the performance of the ILP-based online mechanism against two baseline heuristics.

1. Relevance: The first heuristic is to allow only relevance-based ranking, completely disregarding fairness.

2. Objective: The second heuristic is an objective-driven ranking strategy, which orders subjects by increasing priority value A_i − R_i − r_i (see Sec. 4.3.4.2) in each ranking.

Since all position weights w_j are positive, assigning the highest weights to the subjects with the lowest priority values is in line with the minimization goal. This ranking strategy aims at strong fairness amortization without any quality constraints, and is expected to perform similarly to the ILP with θ = 0.
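A minimal sketch of the Objective baseline follows; the function name and array-based interface are ours.

```python
import numpy as np

def objective_rerank(A, R, rel):
    """Objective heuristic: order subjects by increasing priority value
    A_i - R_i - r_i, so the most underserved subjects get the top positions.
    Returns subject indices in ranking order."""
    return np.argsort(np.asarray(A) - np.asarray(R) - np.asarray(rel))
```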

4.4.5 Data characteristics: relevance vs. attention

Figure 4.1 shows the relevance score distributions in the single-query Airbnb datasets for Boston, Geneva, and Hong Kong. The seemingly flatter shape of the Boston and Hong Kong distributions is the result of a bigger size of these datasets when compared to the Geneva dataset, where each individual has, on average, a larger fraction of the total relevance. Overall, the distributions have a shape which complements the uniform, linear, and exponential shapes of distributions in the synthetic datasets. Figure 4.2 presents an example strongly motivating our research. Namely, it compares the distribution of relevance in the Geneva dataset with the distribution of attention according to the geometric model with p = 0.5, where the weights closely follow the empirical observations made in previous position bias studies [Joachims and Radlinski 2007]. Observe that the relevance distribution plotted in green is the same as that in Figure 4.1. There is a huge discrepancy between these two distributions, while, as argued in this thesis, they should ideally be equal to ensure individual fairness. Similar discrepancy exists in the two other Airbnb datasets.

4.4.6 Performance on synthetic data

Singular attention model. Figure 4.3 reveals a number of interesting properties of the mechanism for the Uniform relevance distribution. We plot the iteration number on the x-axis, and the value of the unfairness measure defined by Equation 4.3 on the y-axis. First, since reshuffling does not lead to any quality loss when all the relevance scores are equal, all the reshuffling methods perform equally well irrespective of θ. Their amortizing behavior should be contrasted with the black line denoting the relevance baseline. Unfairness for this method always increases linearly by a constant factor incurred by the single ranking. Second, amortization methods periodically bring unfairness to 0. The minimum occurs every n iterations, where n is the number of subjects in the dataset. Within the cycle, each subject is placed in the top position (receiving all the attention) exactly once.

Figure 4.1: Relevance distributions in the Airbnb datasets.

Figure 4.2: Comparison of the attention and relevance distributions for the top-10 ranking positions in the Geneva dataset. Note that the relevance distribution presented here is the same as in Fig. 4.1. To satisfy equity-of-attention fairness, the two distributions would have to be the same.

Figure 4.4, with the results for the Linear dataset, confirms another anticipated behavior. With no ties in the relevance scores, it is not possible to improve fairness without incurring quality loss. Thus, all methods with θ > 0 lead to higher unfairness than the Objective baseline, although the unfairness is still lower for the ILP with θ < 0.8 than for the Relevance baseline.

When the relevance scores decrease exponentially (Figure 4.5), the ILP is not able to satisfy the quality constraint with any θ ≥ 0.5, and these rerankings thus become equivalent to those of the Relevance heuristic.

Figure 4.3: Model performance on the synthetic Uniform dataset. Attention singular.

Figure 4.4: Model performance on the synthetic Linear dataset. Attention singular.

Figure 4.5: Model performance on the synthetic Exponential dataset. Attention singular.

Figure 4.6: Model performance on the synthetic Uniform dataset. Attention geometric.

Figure 4.7: Model performance on the synthetic Linear dataset. Attention geometric.

Figure 4.8: Model performance on the synthetic Exponential dataset. Attention geometric.

Geometric attention model. As shown in Figures 4.6, 4.7, and 4.8, the periodicity effect becomes less pronounced under the general geometric attention model. Figure 4.9 helps to understand this behavior by showing the unfairness values achieved by the Objective heuristic with different values of the attention cut-off k (see Equation 4.9). With k = 1, the model is equivalent to Singular. As we increase k, the distribution of the position weights becomes smoother, which also smooths the periodicity of the unfairness values. The very good performance of the ILP-based rerankings with any θ < 1 in Figure 4.8 stems from the fact that the relevance and attention distributions are almost the same (the only difference being that the scores in the relevance distribution are non-zero for more positions). Our results show that in this case the ILP performs a reordering only every now and then, when the subjects ranked lower than position 5 in the original ranking gather enough deserved attention. This causes the unfairness to go up and down periodically.

4.4.7 Performance on Airbnb data

4.4.7.1 Single-query, singular attention

We first analyze the model performance on the Airbnb datasets where a single ranking is repeated multiple times, and the attention model is set to singular. The results are shown in Figures 4.10, 4.11, 4.12 for Boston, Geneva, and Hong Kong, respectively. As in the analysis with the synthetic data, we plot the iteration number on the x-axis, and the value of the unfairness measure defined by Equation 4.3 on the y-axis. There are a number of observations:

• As noted before, the loss in the Relevance baseline (plotted in black) increases linearly by the constant unfairness factor incurred by the single ranking.

Figure 4.9: Performance of the Objective heuristic on the synthetic Uniform dataset under the geometric attention model with different attention cut-off points.

• Relaxing the quality constraint by decreasing θ allows us to achieve lower unfairness values in the corresponding ranking iterations.

• The Objective heuristic with no quality constraints and the ILP where θ = 0 are able to amortize fairness over time well, with no significant growth of unfairness over time.

• The periodicity effect we observed on synthetic uniform data appears here as well. This is due to the relative closeness of the relevance distributions in the Airbnb data to the uniform distribution. Unfairness achieved by the amortizing methods is close to 0 every n iterations. The frequency of the minimum indeed corresponds to the size of the respective datasets.

• In some methods, unfairness starts to grow linearly after a certain number of iterations (see, e.g., the blue curve in Figure 4.10). This is a side effect of the candidate prefiltering heuristic we chose. When the ILP receives a filtered candidate set in which none of the subjects selected based on the objective can be placed at the top of the ranking without violating the quality constraint, the ILP defaults to placing the most relevant subjects at the top, which causes the quality loss to be 0 and unfairness to grow linearly. This effect persists until some of the more relevant subjects gather enough deserved attention to be pre-selected; note the variability that reappears in the blue curve around the 17K-th iteration.

• For a number of iterations at the beginning (equal to the number of ties at the top of the ranking), all the methods perform the same, irrespective of the quality constraints. This is due to the fact that unfairness is minimized by reshuffling the most deserving relevant subjects first, which does not incur any quality loss.

Figure 4.10: Model performance on the single-query Boston dataset. Attention singular.

Figure 4.11: Model performance on the single-query Geneva dataset. Attention singular.

Figure 4.12: Model performance on the single-query Hong Kong dataset. Attention singular.

Figure 4.13: Model performance on the multi-query Boston dataset. Attention singular.

Figure 4.14: Model performance on the multi-query Geneva dataset. Attention singular.

4.4.7.2 Multi-query, singular attention

Our methods amortize fairness better (achieving lower unfairness) on the Airbnb multi-query datasets (Figures 4.13, 4.14, and 4.15) when compared to the single-query datasets for two reasons. First, the variability in subject relevance and ordering in different iterations is a factor helpful in smoothing the deserved attention distributions over time. Second, distributions of the rating attributes in the Airbnb datasets used to construct the rankings are more uniform than the global rating score, and have more ties at the top of the ranking. These relevance distribution characteristics enable methods with conservative quality constraints (even the ILP with θ = 1) to perform very well.

4.4.7.3 Single-query, geometric attention

The general geometric attention distribution is closer to the relevance distributions in the Airbnb datasets than the singular distribution is. As noted in the analysis with synthetic data, the closeness of the two distributions helps amortize fairness at a lower quality loss. We can observe a similar effect in Figure 4.16, with more ILP-based methods reaching the performance of the Objective heuristic. Note, however, that the improved performance here is also partly due to the fact that we constrain the quality at a higher rank when assuming the geometric attention, which is easier to satisfy.

4.4.7.4 Unfairness vs. quality loss

The results presented so far show the performance of the ILP-based fairness amortization under different quality thresholds. Since the thresholds bound the maximum quality loss over all iterations, the actual loss in most cases might be lower. To investigate these effects, we plot the actual NDCG-quality values of the rerankings done by different methods on the Boston dataset under the Singular attention model in Figure 4.17. The results confirm that the actual loss is often lower than the threshold enforced by the ILP. Observe that NDCG-quality is 1 for a number of initial iterations in all the methods. This is where the reshuffling of the top ties happens. The quality starts decreasing as less relevant subjects gather enough deserved attention, and periodically goes back to 1 when the top-relevant subjects gain priority again. Similar conclusions regarding the absolute loss hold under the general geometric attention model.

Figure 4.15: Model performance on the multi-query Hong Kong dataset. Attention singular.

Note that without explicit control, results of lower utility could be consistently delivered to the same searchers, leading to unfairness in search quality on the searcher side. Mitigating this problem would require a two-sided fairness model covering both search subjects and searchers.

Figure 4.16: Model performance on the single-query Boston dataset. Attention geometric. Results are similar for the Geneva and Hong Kong datasets.

Figure 4.17: Actual values of ranking quality. Boston dataset, attention singular.

4.4.8 Performance on StackExchange data

The relative trends in the performance of our method are the same here as in the results for the other datasets. One characteristic that distinguishes the StackExchange dataset is that each individual subject occurs in relatively few rankings. An observation that follows is that a longer amortization timeframe is necessary under such conditions: a subject needs to appear in a number of rankings before the model can reposition them to fairly distribute attention.

4.5 Related work

Fairness. The growing ubiquity of data-driven learning models in algorithmic decision-making has recently boosted concerns about the issues of fairness and bias. The problem of discrimination in data mining and machine learning has been studied for a number of years [Pedreschi et al. 2008; Kamishima et al. 2012; Romei and Ruggieri 2014]. The goal there is to analyze and counter data bias and unfair decisions that may lead to discrimination. Much prior work has centered around various notions of group fairness: preserving certain ratios of members of protected vs. unprotected groups in the decision-making outcomes, with the groups derived from discrimination-prone attributes like gender, race, or nationality [Feldman et al. 2015; Hardt et al. 2016]. For example, the criterion of statistical parity requires that a classifier's outcomes do not depend on membership in the protected group. State-of-the-art mechanisms for dealing with such group fairness requirements solve constrained optimization problems, e.g., maximizing prediction accuracy subject to certain bounds on group membership in the output labels. This has led to classification models with fairness-aware regularization (e.g., [Zafar et al. 2017]). Beyond the fairness of outcomes, researchers have looked into the fairness of process in decision-making systems [Grgic-Hlaca et al. 2018b]. Individual fairness [Dwork et al. 2012] requires that individual subjects who have similar attributes should, with high probability, receive the same prediction outcomes. Literature to this end has so far focused on classification and selection problems [Zemel et al. 2013; Kearns et al. 2017]. Other lines of work investigate mechanisms for fair division of resources [Abebe et al. 2017], or how automated systems can assist humans in decision making [Kleinberg et al. 2017a].

Fairness in rankings. Prior work on fair rankings is scarce and recent. Some proposals show how to incorporate various notions of group fairness into ranking quality measures [Yang and Stoyanovich 2017]. There have been approaches that diversify the ranking results in terms of the presence of members of different groups in ranking prefixes, while keeping the ranking quality high [Zehlike et al. 2017]. This problem has also been studied from a theoretical perspective, with results on the computational complexity of the problem [Celis et al. 2018]. All of these approaches consider static rankings only, and all focus on group fairness. In parallel with our work, Singh and Joachims [2018] have proposed a notion of group fairness based on equality of exposure for demographic groups. While technically complementary and similar in spirit to our approach, this method is also geared towards a purpose different from individual fairness, and does not aim at binding attention to relevance.

Bias in IR. The existence of position bias in rankings of search results has been revealed by a number of eye-tracking and other empirical studies [Craswell et al. 2008; Dupret and Piwowarski 2008; Guo et al. 2009]. Top-ranked answers have a much higher probability of being viewed and clicked than those at lower ranks. The effect persists even if the elements at different ranks are randomly permuted [Joachims and Radlinski 2007]. These observations have led to a variety of click models ([Chuklin et al. 2015] provide a comprehensive survey), and several methods for bias-aware re-ranking [Wang et al. 2016; Joachims et al. 2017]. However, position bias has been primarily studied in the context of document ranking, and no prior work has investigated the influence of the bias on the fairness of ranked results. A large search engine has been investigated for the presence of differential quality of results across demographic groups [Mehrotra et al. 2017]. Similar studies have been carried out on other kinds of tasks such as credit worthiness or recidivism prediction [Adler et al. 2018].

Relation to other models. The fairness dimension has been considered for job dispatching at the OS level, for packet-level network flows [Ghodsi et al. 2012], for production planning in factories [Ghodsi et al. 2011], and even for two-sided matchmaking in call centers [Armony and Ward 2010]. Fairness understood as envy-freeness is also investigated in computational advertising, including generalized second-price auctions [Edelman et al. 2007]. In the context of rankings, a potential connection between fair rankings and fair queuing has recently been suggested [Chakraborty et al. 2017].

4.6 Conclusion

This thesis argues for equity of attention – a new notion of fairness in rankings, which requires that the attention ranked subjects receive from searchers is proportional to their relevance. As this definition cannot be satisfied in a single ranking because of the position bias, we propose to amortize fairness over time by reordering consecutive rankings, and formulate a constrained optimization problem which achieves this goal. Our experimental study using real-world data shows that the discrepancy between the attention received from searchers and the deserved attention can be substantial, and that many subjects have equal relevance scores. These observations suggest that improving equity of attention is crucial and can often be done without sacrificing much quality in the rankings. Incorporating such fairness mechanisms is especially important on sharing economy or two-sided market platforms where rankings influence people’s economic livelihood, and our work addresses this gap. Equity of attention opens a number of interesting directions for future work, including calibration of ranker scores in economically-themed applications, all the way down the IR stack to properly training judges to provide relevance labels with fairness in mind.

Chapter 5 Sensitive Search Exposure

Contents

5.1 Introduction 58
5.2 Problem statement 59
5.3 Generating exposure sets 60
5.4 Ranking of queries in exposure sets 61
  5.4.1 Learning to rank the exposing queries 61
  5.4.2 Features 62
  5.4.3 Relevance 64
5.5 Experiments 65
  5.5.1 Dataset 65
  5.5.2 RkNN generation 65
  5.5.3 Query ranking in exposure sets 65
  5.5.4 User-study evaluation 67
5.6 Insights into search exposure relevance 70
  5.6.1 Tweet context 70
  5.6.2 Search exposure relevance vs topical sensitivity 70
5.7 Related work 71
5.8 Conclusion 73

Search engines in online communities such as Twitter or Facebook not only return matching posts, but also provide links to the profiles of the authors. Thus, when a user appears in the top-k results for a sensitive keyword query, she becomes widely exposed in a sensitive context. Such exposure can result in a serious privacy violation, ranging from embarrassment all the way to becoming a victim of organizational discrimination. In this chapter, we propose the first model for quantifying search exposure on the service provider side, casting it into a reverse k-nearest-neighbor problem. Moreover, since a single user can be exposed by a large number of queries, we also devise a learning-to-rank method for identifying the most critical queries and thus making the warnings user-friendly. We develop efficient algorithms, and present experiments with a large number of user profiles from Twitter that demonstrate the practical viability and effectiveness of our framework.

5.1 Introduction

Motivation. A search engine in online communities, such as Twitter or Facebook, not only returns matching posts, but also identifies the users who have written them. The resulting search exposure risk is particularly pronounced when a user's post appears in the top-k results for a sensitive keyword query. Note that exposure is different from just having contents visible within a community. When Facebook introduced the News Feed feature, many users responded with outrage. They felt their privacy was being violated, even though the new feature only meant that newly generated content would be broadcast to people who would have access to that content anyway [Boyd 2008]. Analogously, in the context of search systems, while a user may be fine with posting about health problems, controversial political issues, or using swearwords, she may feel very uncomfortable with the posts being returned as top-ranked results. Content found this way could be used, for example, in stories written by journalists or bloggers, and attract uninvited attention to the user's account. Beyond topically sensitive queries, there are also risks regarding search exposure by unique strings. An adversary could search for people posting URLs of sensitive domains, such as pirate websites, or certain price tokens, such as $1K. An adversary with a list of e-mail addresses could issue these as queries to find answers to the security questions needed to reset passwords. An adversary with a list of generated credit card numbers could issue these as queries to find other personal information necessary for credit card transactions.

State of the art and limitations. Despite the existence of such threats, to the best of our knowledge, there is no support for users to find out about their search exposure risks. The only way would be to try out all possible queries and inspect their top-k results, yet this is far from practical. The service providers – search engines or social network platforms – do not provide such support at all. Work in the broad area of exposure has been tangibly motivated by a study showing the discrepancy between the expected and actual audience of user-generated content [Bernstein et al. 2013]. Exposure has been addressed in other contexts so far, including information exposure among friends in social networks [Mondal et al. 2014], location exposure [Shokri et al. 2011], longitudinal information exposure [Mondal et al. 2016], controlled information sharing [Schlegel et al. 2011], or exposure with respect to sensitive topics [Biega et al. 2016]. The importance of exposure control has led service providers to introduce features such as Facebook's View As, which informs a user how her profile appears to other people. However, this feature does not quantify the exposure, and the problem of search exposure in particular has been disregarded completely.

Problem and challenges. To the best of our knowledge, this dissertation is the first to address the problem of modeling, analyzing and quantifying search exposure risks. As the risk is most significant when a user is spotted in the top-k results of a query, our goal is to identify these top-k exposing queries for each user. Such information can then be used to guide the user, for example, in deleting posts or restricting their visibility. In an online setting, a tool based on our model could even alert the user about the exposure before a post is submitted. The search exposure problem poses a number of challenges:

• Efficiency: A user could possibly be found by millions of distinct queries. An algorithm to identify the critical queries thus faces a huge scalability and efficiency problem.

• Dynamics: With the high rate of new online contents, the critical queries cannot simply be computed offline from a query log. The exposure of users keeps continuously shifting.

• Usability: Showing all queries for which a user appears in the top-k results would in many cases flood the user with millions of irrelevant or small-risk queries and miss the point of guiding the user. Thus, it is crucial that the queries are ranked by an informative measure of, possibly user-specific, sensitivity.

An interesting thing to note is that, from the perspective of a user, reducing search exposure can be seen as a problem of “inverse search engine optimization”, inverse SEO for short. SEO aims to push a user to the top-ranked results for certain queries. Here, the goal is the opposite – the users would like to be moved to the low-ranked tail of answers, or even completely removed from the search results of particularly sensitive queries.

Approach and contributions. We model search exposure as a problem of reverse search – instead of starting with a query and finding the top-k documents relevant to the query, we start with a document and want to find all the queries that return the document in the top-k results. If we then think of keyword search with answer ranking as the problem of finding the top-k nearest-neighbor posts according to a given similarity function, search exposure becomes a reverse k-nearest-neighbor problem (RkNN). To assist a user in understanding her search exposure risks, we devise an algorithm for ranking the queries in the user's RkNN set, which potentially contains hundreds of queries. To this end, we combine informative features ranging from topical sensitivity (e.g., usually higher for queries about health problems than for those about movies), through query selectivity and entropy (e.g., higher for queries containing birth dates or social security numbers), to user surprisal (e.g., high for queries matching a post about a user's children in an otherwise professional profile). The salient contributions of this chapter are:

• A model of the search exposure problem;

• A learning to rank method with informative features for ranking the queries in the exposure sets according to a new notion of search exposure relevance;

• An experimental study with a large set of Twitter profiles, providing insights on the exposure sets and the effectiveness of our query ranking methods.

5.2 Problem statement

Preliminaries. Assume we have a set of users U and a set of documents D posted by the users. We denote the fact that a post d is written by user u by d ∈ u. The profile of each user is defined as the set of all documents she has posted in a community.

Search exposure. The problem of search exposure of user u can be formalized as finding all the reverse k-nearest neighbors of u, i.e., the set of all queries for which any of the posts of u appears among the top-k results. We call such sets of queries exposure sets.

Generation of exposure sets. Before defining the exposure sets of all users, we first define RkNNs(d) for each document d as follows:

RkNNs(d) = {(q, r) | q ∈ Q ∧ d is the r-th NN of q ∧ r ≤ k}    (5.1)

where Q is the set of all queries and r is the rank of d for the query q (r = rank(q, d)). According to Emrich et al., the above equation is equivalent to the definition of a bichromatic RkNN [Emrich et al. 2015]. Accordingly, we define the exposure set of each user as the union of the exposure sets of all the documents in her profile. We denote the exposure set of user u by RkNNs(u), which is defined as follows:

RkNNs(u) = {(d, q, r)|(q, r) ∈ RkNNs(d) ∧ d ∈ u} (5.2)

An efficient algorithm for generating exposure sets, developed by Biega et al. [2017a], is not a contribution of this thesis.

Ranking of queries in exposure sets. Exposure sets of certain users might be big and dominated by rare, non-informative, or non-critical queries. On the other hand, exposure by certain sensitive queries might leave the user uncomfortable. Therefore, to make the exposure sets user-friendly, we want to rank the triples in RkNNs(u) such that the queries the users would not want to be searched by appear at the top. This defines the notion of relevance in our ranking problem, termed search exposure relevance. We discuss the exposure set ranking methods and our notion of relevance in Sec. 5.4.

5.3 Generating exposure sets

Biega et al. [2017a] develop an efficient algorithm for computing the exposure sets defined by Equation 5.2 under the assumption that the search engine uses a ranking mechanism based on language models. This algorithm is not a contribution of this thesis. Instead, to present the further contributions of this chapter, we assume we have the exposure sets (Equation 5.2) computed for all the users in a system. Since considering all possible queries makes the problem intractable, we similarly limit the considered queries to (i) unigram and bigram queries, and (ii) only those queries for which there exists at least one document in the underlying collection containing all the query terms. Note that such exposure sets can be generated by the algorithm proposed by Biega et al. [2017a], or in a brute-force manner, by first computing the top-k rankings for all considered queries and then re-aggregating the results.
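The brute-force variant can be sketched as below; `search_top_k` stands in for the platform's retrieval function and, like the other names, is an assumption of this sketch.

```python
from collections import defaultdict

def exposure_sets(queries, author_of, search_top_k, k=10):
    """Brute-force computation of RkNNs(u) (Eq. 5.2): run every candidate
    query and record, for each user, the (document, query, rank) triples in
    which one of her posts appears among the top-k results.

    queries:      iterable of candidate unigram/bigram queries
    author_of:    dict mapping document id -> user id
    search_top_k: callable(query, k) -> ranked list of document ids (assumed)
    """
    rknn = defaultdict(list)
    for q in queries:
        for rank, doc in enumerate(search_top_k(q, k), start=1):
            rknn[author_of[doc]].append((doc, q, rank))
    return rknn
```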

aim oshtitsbaj, asleep oshtitsbaj, http://ra*.com/teh_ba splash, mac vanilla, suck wake, mood sick emma sun, watch xxxxxx, forget toast, @jeff* fall, heavyweight ladder, omg tan, alcohol nice blown death, comin lake, parilla wait, bathroom wanna, crush hannah, friend lord, record woman

Table 5.1: Example queries from unprocessed exposure sets.

5.4 Ranking of queries in exposure sets

Generating the exposure sets is not enough for the results to be presentable to end users, for two reasons. First of all, for many users the size of their RkNN set is simply too big for easy consumption. Our experiments on a sample of 50K user profiles from Twitter later confirm this – even when only unigram and bigram queries are considered, more than 35K users are exposed by more than 100 queries, with some users exposed by millions. Figure 5.1 in the upcoming experimental section of this chapter shows the distribution of exposure set size for users from the sample. Moreover, since we do not a priori exclude queries such as infrequent or numerical tokens, most RkNN sets will end up dominated by garbage queries. Leaving such queries in during the generation phase is a design choice motivated by the ’worst case scenario’ principle that often guides privacy and security research. While most users will find these queries uninformative, for some people it might be important to know they are searchable by certain URLs (e.g., when the domain is known to contain sensitive content) or numbers (e.g., their year of birth or the prices of products they buy). Table 5.1 shows examples of the top queries in the raw exposure sets, where queries are ordered by the rank position of the corresponding user post. These examples illustrate the need for ranking the queries before presentation to end users – raw sets are uninformative when mostly garbage queries are shown to the users first.

5.4.1 Learning to rank the exposing queries

Recall the search exposure sets defined by Eq. 5.2. We want to rank the triples within these sets according to search exposure relevance, i.e., such that the queries the users would not want to be searched by appear at the top. The traditional IR learning-to-rank setup, in which the learned function orders documents by relevance to queries, is replaced by one where we rank queries according to relevance to users. Each user-document-query triple can be represented as a feature vector Φ(u, d, q). For each user, together with the relevance score annotations, these form partial rankings determining pairwise relevance constraints between the data points (e.g., for a user u, an exposing query q1 matching a document d1 should be ranked higher than a query q2 matching a document d2). We want to learn a ranking function that minimizes a loss measure over these partial training rankings. For example, when learning to rank using SVM rank [Joachims 2006], it is the number of violated pairwise constraints that is minimized, which implicitly leads to maximization of Kendall's τ between the golden and learned rankings. We describe the features and relevance scores we used to learn the ranking function in the following two sections.
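The thesis trains the ranking function with SVM-rank; as an illustration of the pairwise setup, the sketch below uses the standard pairwise-difference transform with a linear SVM from scikit-learn as a stand-in for the SVM-rank tool.

```python
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_transform(X, y, users):
    """Turn per-user partial rankings into pairwise classification examples:
    for two triples of the same user with different relevance, the feature
    difference is labeled +1 or -1 depending on which should rank higher.
    X: feature vectors Phi(u, d, q); y: relevance scores; users: user ids."""
    X_pairs, y_pairs = [], []
    for u in np.unique(users):
        idx = np.where(users == u)[0]
        for a in idx:
            for b in idx:
                if y[a] > y[b]:
                    X_pairs.append(X[a] - X[b]); y_pairs.append(1)
                    X_pairs.append(X[b] - X[a]); y_pairs.append(-1)
    return np.array(X_pairs), np.array(y_pairs)

# A linear classifier on the pairwise differences approximates a ranking SVM;
# the learned weight vector w then scores a triple as w . Phi(u, d, q).
# ranker = LinearSVC(C=1.0).fit(*pairwise_transform(X, y, users))
```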

5.4.2 Features

5.4.2.1 Semantic features

The meaning of words plays an important role in determining the criticality of search exposure. In a similar context, user studies have shown topical sensitivity to be useful for privacy risk quantification from text [Biega et al. 2016]. To capture the coarse-grained semantics of the queries, we annotate them with categories from the LIWC dictionaries [Tausczik and Pennebaker 2010]. LIWC categorizes words into 80 linguistically and psychologically meaningful categories such as positive emotion (love, nice, sweet), affective processes (happy, cry, abandon), swear words (damn, piss, fuck), anxiety (worried, fearful, nervous), or sexual (horny, love, incest). We create one binary feature per category, with a value of 1 if any of the query words matches any of the words from the category.
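A sketch of the resulting feature extraction is given below; it assumes the LIWC dictionary has already been loaded into a mapping from category names to word sets.

```python
def semantic_features(query_terms, liwc_categories):
    """One binary feature per LIWC category (Sec. 5.4.2.1): 1 if any query
    word appears in the category's word list. `liwc_categories` is assumed
    to be a dict mapping category name -> set of words."""
    terms = set(query_terms)
    return {cat: int(bool(terms & words)) for cat, words in liwc_categories.items()}
```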

5.4.2.2 Uniqueness of queries

While any query generated from a community's text contents search-exposes some of its members, from the perspective of a single user, it is the rare tokens that are more likely to lead to exposure. While a considerable portion of rare queries are simply meaningless noise, there can be meaningful infrequent tokens with the potential to violate privacy. Recall some of our motivating examples where an adversary searches for information associated with a given sensitive domain, or an e-mail address. We propose two features to capture how rare a query is: query selectivity and query entropy. We define the query selectivity as the number of documents matching the query exactly:

selectivity(q) = |{d : q ∈ d}|    (5.3)

This measure will be low for queries which appear infrequently. Another aspect of a query being unique is how skewed the distribution of the relevance scores is. We capture this by measuring the entropy over the distribution of ranking scores of the top-k returned results. Let R be the distribution of the relevance scores of the top-k results. We measure the entropy of the query as:

entropy(q) = H(R),  where  R(i) = score(q, d_i) / Σ_{j=1}^{k} score(q, d_j)    (5.4)

Note that these measures are not dependent on a given user, but are dependent on the community as a whole, i.e., the relative rankings of queries in different communities might differ. For instance, while the query Lyme borreliosis might be an infrequent query on Twitter, it could be more popular in a medical Q&A forum. 5.4. Ranking of queries in exposure sets 63
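Both features can be computed as in the following sketch; documents are assumed to be represented as sets of tokens, and the entropy assumes strictly positive relevance scores for the top-k results.

```python
import numpy as np

def selectivity(query_terms, documents):
    """Eq. 5.3: number of documents containing all of the query terms.
    Each document is assumed to be a set of tokens."""
    return sum(all(t in doc for t in query_terms) for doc in documents)

def query_entropy(top_k_scores):
    """Eq. 5.4: entropy of the normalized relevance scores of the top-k
    results; assumes the scores are strictly positive."""
    r = np.asarray(top_k_scores, dtype=float)
    r = r / r.sum()
    return float(-np.sum(r * np.log(r)))
```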

5.4.2.3 User surprisal

The lexical context of a user might also matter when determining the criticality of a query. Imagine a user with a Twitter profile where she posts mostly professional content. It would not be surprising, and perhaps even desirable, that the user's posts are returned as top results for queries from that professional domain. However, if it turns out that the user profile comes up at the top only for the query funny cats, matching the single post the user has ever made outside of the professional domain, this might be both unexpected and undesirable. We propose to capture this intuition using surprisal, which is measured by inverting the probability of the query being generated from the user's vocabulary distribution estimated from her posts:

surprisal(q, u) = log(1 / P(q|u)) = log(1 / Π_{w∈q} P(w|u))    (5.5)

To account for the sparsity of user profiles, we compute these probabilities using Dirichlet smoothing.
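A sketch of the surprisal feature with Dirichlet smoothing follows; the smoothing parameter value and the fallback background probability are assumptions of this sketch.

```python
import math

def surprisal(query_terms, user_tf, user_len, collection_prob, mu=2500):
    """Eq. 5.5 with Dirichlet smoothing:
    P(w|u) = (tf(w, u) + mu * P(w|C)) / (|u| + mu), surprisal = -log P(q|u).

    user_tf:         dict term -> count in the user's profile
    user_len:        total number of tokens in the user's profile
    collection_prob: dict term -> background probability P(w|C)
    """
    log_p = 0.0
    for w in query_terms:
        p = (user_tf.get(w, 0) + mu * collection_prob.get(w, 1e-9)) / (user_len + mu)
        log_p += math.log(p)
    return -log_p
```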

5.4.2.4 Document surprisal

Even though it is the queries that are ranked, the users might not want to be matched to a non-critical query when it exposes a critical post. Analogously to the surprisal of queries, we define the surprisal of the posts matched by the exposing queries by replacing q with d in Eq. 5.5.

5.4.2.5 RkNN features

Two traditional methods for ranking the reverse nearest neighbors by relevance to the user are the proximity of the reverse neighbor to the user and the rank of the reverse neighbor. While not likely to be useful when the relevance is defined as criticality, we include these features for comparison. We measure proximity using the probability of generating the query from the posting history of u:

proximity(q, u) = log (P (q|u)) (5.6)

Let du be the post of a user u that is returned as an answer to query q at position rank(q, du):

rankposition(q, u) = rank(q, du) (5.7)

5.4.2.6 Syntactic features

We also introduce a number of binary post-dependent features that characterize emotional display or content the users might not want to be exposed by through search. These include:

• has_url (set to 1 if the post contains a URL),

• has_at_mention (set to 1 if the post mentions another user),

• has_hashtag (set to 1 if the post contains a hashtag),

• has_emoticon (set to 1 if the post contains an emoticon),

• has_repeated_punctuation (set to 1 if any token in the post ends with a double exclamation mark, double question mark, or an ellipsis),

• has_repeated_vowels (set to 1 if any token in a post contains a vowel repeated at least three times in a row),

• has_laughter (set to 1 if any token contains a substring like haha, with different vowels).
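The regular expressions below sketch one way to implement these binary features; the exact patterns are our approximations of the informal definitions above.

```python
import re

def syntactic_features(post_text):
    """Binary post-dependent features from Sec. 5.4.2.6."""
    t = post_text
    return {
        "has_url": int(bool(re.search(r"https?://\S+", t))),
        "has_at_mention": int(bool(re.search(r"@\w+", t))),
        "has_hashtag": int(bool(re.search(r"#\w+", t))),
        "has_emoticon": int(bool(re.search(r"[:;]-?[)(DPp]", t))),
        "has_repeated_punctuation": int(bool(re.search(r"(!!|\?\?|\.\.\.)", t))),
        "has_repeated_vowels": int(bool(re.search(r"([aeiou])\1{2}", t, re.I))),
        "has_laughter": int(bool(re.search(r"(?:ha|he|hi|ho){2,}", t, re.I))),
    }
```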

5.4.3 Relevance

Search exposure relevance differs in many ways from the topical relevance of traditional IR tasks. A query might be relevant not only because it is topically sensitive, but also because it could embarrass, offend, or otherwise violate the privacy of the exposed person. The subjective nature of such judgments makes the manual collection of relevance labels at scale an extremely time-consuming task, especially if done by external evaluators. To decide which queries would be relevant, a judge would have to put themselves in the shoes of the evaluated user, imagine who that person is based on the contents of the profile, and decide which queries would concern her. Moreover, a judge would have to come up with likely threat scenarios. It is a non-trivial task to prime the judges regarding these issues without biasing them. With all these considerations, we derive implicit relevance scores from other user-generated signals that indicate a reluctance to be associated with a given content. Implicit relevance signals, especially in the form of clickthrough patterns, are commonly used in traditional retrieval tasks [Carterette and Jones 2007]. The remainder of this section presents our method for synthesizing the search exposure relevance scores.

User score. If a user deletes a post, it is a signal she does not want to be associated with its content. Thus, a query matching a post that got deleted after publication receives a user score of 1, whereas a query matching a non-deleted post receives a user score of 0. While a service provider quantifying exposure would have a direct access to this information, there are also ways for collecting it outside of the system [Mondal et al. 2016]. We describe our collection method in more detail in the experimental section.

Community score. The deletion information is a noisy signal, however, as users delete posts for a variety of reasons, including language errors or double posting. We want to sanitize these scores using stronger, community-wide signals that encode the differences in language distributions between anonymous and non-anonymous communities. These linguistic differences have been observed, for instance, when comparing posts from Twitter and Whisper (an anonymous microposting platform) [Correa et al. 2015]. Having estimated the vocabulary distributions in an anonymous (P_anon) and a non-anonymous (P_non−anon) community, we treat the relative probability of a query being generated from these distributions as a community-wide signal that users do not want to be associated with the keywords. More precisely, we set the community score of a query to:

community_score(q) = P_anon(q) / P_non−anon(q) = Π_{w∈q} P_anon(w) / P_non−anon(w)    (5.8)

Golden ranking. Finally, we derive the relevance as a linear combination of both scores:

score(q) = α · user_score(q) + (1 − α) · community_score(q) (5.9)

Combining both scores allows us to discount the relevance of noisy queries that match deleted posts, as well as add relevance to sensitive queries matching posts that did not get deleted, as the user perhaps did not have any privacy concerns in mind.
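Putting the pieces together, the golden relevance score can be computed as in the sketch below; the value of α, the smoothing constants, and the data structures are assumptions of this sketch.

```python
def exposure_relevance(query_terms, post_was_deleted, p_anon, p_nonanon, alpha=0.5):
    """Golden relevance (Eqs. 5.8-5.9): a linear combination of the user
    (deletion) score and the community score.

    p_anon / p_nonanon: dicts mapping a word to its unigram probability in
    the anonymous / non-anonymous community."""
    user_score = 1.0 if post_was_deleted else 0.0
    community_score = 1.0
    for w in query_terms:
        community_score *= p_anon.get(w, 1e-9) / p_nonanon.get(w, 1e-9)
    return alpha * user_score + (1.0 - alpha) * community_score
```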

5.5 Experiments

In this section, we discuss our insights into the search exposure problem through evaluation of the ranking methods, as well as an analysis of user perceptions regarding exposing queries collected in an experiment on Amazon Mechanical Turk.

5.5.1 Dataset

For our experiments, we use a sample of Twitter profiles from the longitudinal exposure study by Mondal et al. [2016]. It consists of 51,550 user profiles with a total of about 5.5 million tweets posted over the year 2009.

5.5.2 RkNN generation

The experiments on RkNN generation reported by Biega et al. [2017a] are not a part of this thesis. However, we use the exposure sets the authors generated as an input for the ranking algorithm experiments presented in this thesis. We thus report on the data preparation techniques applied before generating the exposure sets. The cleaning included stop-word removal, lemmatization, and stemming, resulting in around 2 million unique tokens. The query filtering strategy described in Section 5.3 resulted in 45 million queries considered for the exposure sets. Figure 5.1 shows the distribution of the size of the exposure sets of the users in the dataset for different values of k. We assume k = 10 for the experiments in this chapter.

5.5.3 Query ranking in exposure sets

5.5.3.1 Exposure sets cleaning

For the evaluation results to be meaningful, we excluded the following queries from the exposure sets: queries with tokens shorter than 3 letters, queries for which none of the tokens is an English word, queries with numerical tokens, URLs, and references to other accounts. We also excluded users whose posts are primarily written in a language other than English. While all of these queries could be search-exposure relevant in certain contexts, it is unlikely that human judges who evaluate the ranking outputs would be able to associate any meaning with them.

Figure 5.1: Distribution of the size of exposure sets. The values on the Y-axis are on a logarithmic scale with base 10.

5.5.3.2 Relevance score statistics

We construct the relevance scores as described in Sec. 5.4.3. Community scores are derived from the Whisper dataset collected by Correa et al.[2015] and the Twitter dataset collected by Mondal et al.[2016]. The Twitter dataset, moreover, comes with the information regarding tweet deletion. More precisely, by querying the Twitter API using a subset of the tweet IDs, the authors were able to determine which tweets got deleted after publication. This information was collected for 11M tweets, 400K of which turned out to have been removed. We use these signals as the user score. Because the information about post deletion is limited, the ground truth provides us with only a partial ranking over RkNN queries. We therefore exclude the queries for which we cannot infer relevance from the evaluation in this part. These include: (i) the queries matching posts for which we do not have the deletion information, (ii) the queries with only a partial overlap with the source post (it might happen that a post is returned in the top-k results for a query even though not all query words appear in the post; for such queries we do not assume the deletion information signals not wanting to be associated with the words). Excluding exposure sets with less than 30 queries, which do not need ranking to be 5.5. Experiments 67

Feature   Example queries
Sexual    gross kiss, breast whitney, gay shirt
Humans    dumbest guy, girl xoxo, chess kid
Friend    pal wife, fellow fool, ummm honey
Anger     buy weapon, mad scientist, idiot vegetarian

Table 5.2: Most important semantic features learned by the L2R model together with example queries.

5.5.3.3 Ranking algorithm

To learn the ranking function, we use SVMrank [Joachims 2006] with a linear kernel. The parameter C is tuned on a random sample of 10% of the RkNN sets, and the rest of the data is used to evaluate the L2R method in 10-fold cross-validation.
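As a rough, hedged approximation of this setup (SVMrank is a standalone tool and is not reproduced here), the sketch below uses the classic pairwise reduction behind ranking SVMs: a linear SVM is trained on feature-difference vectors so that more relevant queries score higher. Features, labels, and the C value are placeholders; the learned linear weights also illustrate the feature-importance reading used in the next subsection.

```python
# Sketch of the pairwise reduction behind a ranking SVM (not the SVMrank tool itself).
# Within one exposure set, queries are compared pairwise; the linear model learns
# to score the more exposure-relevant query higher. All data below is synthetic.
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_transform(X, y):
    """Build difference vectors x_i - x_j labeled by sign(y_i - y_j)."""
    diffs, labels = [], []
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] != y[j]:
                diffs.append(X[i] - X[j])
                labels.append(1 if y[i] > y[j] else -1)
    return np.array(diffs), np.array(labels)

X = np.random.rand(20, 5)            # hypothetical query features in one exposure set
y = np.random.randint(0, 4, 20)      # hypothetical graded relevance labels (0..3)
Xd, yd = pairwise_transform(X, y)

model = LinearSVC(C=1.0).fit(Xd, yd)        # C would be tuned on held-out exposure sets
scores = X @ model.coef_.ravel()            # w^T x ranks queries within the set
feature_weights = model.coef_.ravel()       # linear weights, interpretable as in Table 5.2
```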

5.5.3.4 Feature analysis

The weights of the decision boundary vector learned using an SVM with a linear kernel can be interpreted as feature importance weights. Table 5.2 lists the most important features learned by our model together with example queries exhibiting these features. The model captures well that the categories related to personal issues are the ones people feel most uncomfortable sharing. The high importance of words related to sexuality stems from a bias of the Whisper data – a large majority of anonymous posts from this community concern sexuality. However, the methodology we propose is general enough to handle different types of anonymous content. For instance, as an alternative, it would be possible to collect anonymous posts from more general Question & Answer communities such as Quora.

5.5.4 User-study evaluation

Because the relevance scores used for training the algorithm constitute noisy signals for search exposure relevance, we evaluate the reranked exposure sets in a user study. The leading question is whether users themselves would find the output useful, that is, whether exposure by the top-ranked keywords would make them feel uncomfortable. This section provides the details of the study.

5.5.4.1 Evaluation setup

To evaluate the rankings, we sample a number of exposure sets and a number of queries from each.

User sampling. The first important thing to note is that not all of the exposure sets contain sensitive queries. To account for this and make sure we cover the sensitive users in the evaluation, we sample users non-uniformly in the following way. Queries within exposure sets are ordered by the predicted relevance scores. The score of the highest-ranking query within a set can be thought of as an indicator of how sensitive the exposure set is overall (i.e., the lower the highest score, the less sensitive content there is overall). For evaluation, we choose the 50 most sensitive exposure sets, and 50 exposure sets sampled from the remaining tail with probability proportional to the predicted relevance of the highest-scoring query. We thus evaluate 100 exposure sets in total.

Query sampling. To construct assignments with reasonable workloads, we evaluate 50 queries from each of the sampled exposure sets. Having the queries ranked by the L2R method under evaluation, we choose the 25 highest-scoring queries (to see how useful the top of the ranking is), and 25 queries chosen uniformly at random from the remaining tail (to check whether the head of the ranking misses critical queries).
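A minimal sketch of this two-level sampling is given below. The data structures and helper names are placeholders; the tail exposure sets are drawn with replacement here for brevity, whereas the study evaluates 50 distinct sets.

```python
# Sketch of the evaluation sampling: 50 most sensitive exposure sets plus 50 sets
# drawn from the tail proportionally to their top predicted score; within each set,
# 25 top-ranked queries plus 25 random tail queries. Inputs are placeholders.
import random

def sample_exposure_sets(exposure_sets, n_top=50, n_tail=50):
    """exposure_sets: dict user -> list of (query, predicted_score), sorted descending."""
    ranked_users = sorted(exposure_sets, key=lambda u: exposure_sets[u][0][1], reverse=True)
    head, tail = ranked_users[:n_top], ranked_users[n_top:]
    weights = [exposure_sets[u][0][1] for u in tail]
    sampled_tail = random.choices(tail, weights=weights, k=n_tail)  # with replacement
    return head + sampled_tail

def sample_queries(ranked_queries, n_head=25, n_tail=25):
    head = ranked_queries[:n_head]
    remaining = ranked_queries[n_head:]
    tail = random.sample(remaining, min(n_tail, len(remaining)))
    return head + tail
```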

5.5.4.2 AMT survey

Each set of 50 sampled queries was shown to 3 Amazon Mechanical Turk workers. The queries were displayed in a random order. We required that the workers have a master qualification (to ensure the quality of annotations) and are located in the USA (to prevent language misunderstanding). After explaining the basic pipeline of the Twitter search engine and priming the workers on what exposure is, the survey asked the following question:

Would you feel concerned (uncomfortable, embarrassed, privacy-violated, or threatened) if your tweet was returned as one of the top answers to these search terms? (Yes/No)

Having three people evaluate each query leads to a 4-graded (0..3) relevance scale, based on how many people chose Yes. Out of 5K evaluated queries, 10% had a score of 3, 12% a score of 2, 24% a score of 1, and 54% a score of 0. Inter-annotator agreement measured by Fleiss' κ was 0.376, which corresponds to a fair agreement.
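For reference, a self-contained sketch of Fleiss' κ for this setting (three binary judgments per query); the input format, a list of Yes-vote counts per query, is an assumption.

```python
# Sketch of Fleiss' kappa for the AMT judgments: each query is judged by 3 workers
# with a binary Yes/No answer; 'yes_counts' holds the number of Yes votes per query.
import numpy as np

def fleiss_kappa(yes_counts, n_raters=3):
    counts = np.array([[c, n_raters - c] for c in yes_counts], dtype=float)  # [Yes, No]
    N = counts.shape[0]
    p_j = counts.sum(axis=0) / (N * n_raters)                                # category shares
    P_i = (np.sum(counts ** 2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    P_bar, P_e = P_i.mean(), np.sum(p_j ** 2)
    return (P_bar - P_e) / (1 - P_e)

# e.g. fleiss_kappa([3, 0, 1, 2, 0, 0, 1]) on a toy set of 7 judged queries
```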

5.5.4.3 Results

We report the values for NDCG@[5,10,20] and Kendall's τ. Moreover, since the collected scores offer good interpretability in terms of binary relevance as well, we also report Precision@[5,10,20], assuming a query is search exposure relevant if it was marked by at least one judge. Table 5.3 shows the results of the user-study evaluation. Note that, although the queries were sampled from the L2R-ranked exposure sets, the collected judgments also let us evaluate other ranking heuristics. We use rankings based on the values of several high-level features as baselines. The majority of these perform significantly worse than the L2R method – differences significant by a paired t-test with p < 0.05 are marked with the ∗ symbol. The strongest heuristics include document surprisal and selectivity. Both of these quantities capture a different aspect of the rareness of the content, and thus shine in situations where, for instance, the judges thought that exposure by a typo might lead to embarrassment. We also observed that a number of query tokens are typos that can be mapped to a sensitive word. Such queries were often marked by the judges as relevant, and because of their rareness, heuristics such as selectivity gain in performance.

Method               Prec@5   Prec@10  Prec@20  NDCG@5   NDCG@10  NDCG@20  Kendall's τ
L2R                  0.636    0.566    0.494    0.515    0.496    0.530     0.107
Rank                 0.448∗   0.449∗   0.489    0.210∗   0.245∗   0.316∗    0.016∗
Entropy              0.480∗   0.471∗   0.509    0.229∗   0.262∗   0.350∗    0.026∗
Surprisal            0.472∗   0.489∗   0.447∗   0.209∗   0.262∗   0.347∗    0.037∗
Selectivity          0.548∗   0.508∗   0.470∗   0.278∗   0.305∗   0.378∗    0.018∗
Document Surprisal   0.460∗   0.463∗   0.466∗   0.204∗   0.248∗   0.330∗   -0.035∗

Table 5.3: Exposure set ranking user-study results averaged over all users. Methods marked with ∗ perform significantly worse than L2R on a given metric (paired t-test, p < 0.05).
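To make the reported metrics concrete, the following is a minimal sketch of NDCG@k and Precision@k as applied per exposure set; it assumes the standard exponential-gain NDCG formulation (the exact gain function is not specified in the text) and treats a query as relevant if at least one judge marked it.

```python
# Sketch of the per-exposure-set metrics; results would then be averaged over users.
# Graded labels (0..3) feed NDCG; a label >= 1 counts as relevant for Precision@k.
import numpy as np

def dcg_at_k(labels, k):
    labels = np.asarray(labels, dtype=float)[:k]
    return np.sum((2 ** labels - 1) / np.log2(np.arange(2, labels.size + 2)))

def ndcg_at_k(labels_in_ranked_order, k):
    ideal = dcg_at_k(sorted(labels_in_ranked_order, reverse=True), k)
    return dcg_at_k(labels_in_ranked_order, k) / ideal if ideal > 0 else 0.0

def precision_at_k(labels_in_ranked_order, k):
    top = np.asarray(labels_in_ranked_order[:k])
    return float(np.mean(top >= 1))

# e.g. ndcg_at_k([3, 0, 2, 1, 0], k=5), precision_at_k([3, 0, 2, 1, 0], k=5)
```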

5.5.4.4 Anecdotal examples of sensitive exposure sets

Table 5.4 presents examples of exposure sets with the top-10 queries ranked by the L2R method and is meant as an overview of the types of sensitive keywords a user might be exposed by on Twitter. Queries were generated from the contents of user posts, which means that each presented word combination matches at least one post in our sample. We resort to showing a manually chosen subset of examples, as the most sensitive exposure sets were highly explicit and offensive.

5.6 Insights into search exposure relevance

5.6.1 Tweet context

An interesting question regarding search exposure relevance is whether it is influenced by the context of the returned tweet. It might happen that a query that looks sensitive is constructed from words that do not form a coherent context within a post, thus being a false alarm. On the other hand, innocent-looking queries might bring out posts that do contain otherwise sensitive content. To gain preliminary insight into this problem, we conducted a second survey on AMT, in which the workers assessed the relevance of queries while also seeing the tweet returned as a result; the rest of the setup remained the same. A comparison of the two surveys is summarized in Figure 5.2. The existence of dark squares off the diagonal indeed suggests that the context might change the exposure relevance judgment. This happens both ways, suggesting that both scenarios mentioned in the previous paragraph are plausible. We believe that investigating the factors that influence search exposure relevance is an interesting topic for future work.

5.6.2 Search exposure relevance vs topical sensitivity

Topical sensitivity is a concept introduced for studying privacy risks of text, in particular for quantifying R-Susceptibility (Rank-Susceptibility) in communities where user profiles consist of textual contents [Biega et al. 2016]. It measures how likely the presence of words from different topics (understood as distributions over words) leads to privacy risks, irrespective of the user or community context. We want to understand whether there is a correlation between topical sensitivity defined this way and search exposure relevance. We thus annotate each query from our evaluation set using the topical sensitivity annotations sensitivity(t) collected in the R-Susceptibility paper [Biega et al. 2016]. We define the sensitivity of a query as:

$$\mathrm{sensitivity}(q) = \frac{1}{|q|} \sum_{w \in q} \sum_{t} \mathrm{sensitivity}(t) \cdot P(w \mid t) \qquad (5.10)$$

where $P(w \mid t)$ is the probability of a word $w$ in the topic $t$. We measure the correlation between these sensitivity annotations and the collected relevance scores using the Pearson correlation coefficient.

Figure 5.2: Influence of the tweet context on search exposure relevance. The number in a square (x(q), y(q+t)) denotes the number of tweets that received the score x in the study with queries only, and the score y in the study with queries in context.

We find a strong correlation between these scores in the case of the relevance judgments collected for queries without the tweet context (Pearson coefficient of 0.44), and a somewhat lower correlation (Pearson coefficient of 0.32) in the case of the judgments for queries with the tweet context. This result reconfirms the findings from the evaluation of the L2R method – the meaning of the query is an important factor in determining search exposure relevance, and topical sensitivity is a viable alternative source of implicit relevance scores.
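A minimal sketch of Eq. (5.10) and the correlation analysis follows. The topic-sensitivity dictionary and the word-given-topic probabilities are placeholders standing in for the crowdsourced R-Susceptibility annotations and the LDA word distributions.

```python
# Sketch of Eq. (5.10): query sensitivity as a topic-sensitivity-weighted average
# over the query's words. Inputs are placeholders: topic_sensitivity[t] ~ sensitivity(t),
# p_word_given_topic[t][w] ~ P(w|t) from a topic model.
def query_sensitivity(query_words, topic_sensitivity, p_word_given_topic):
    total = 0.0
    for w in query_words:
        total += sum(topic_sensitivity[t] * p_word_given_topic[t].get(w, 0.0)
                     for t in topic_sensitivity)
    return total / len(query_words)

# Correlation with the collected judgments could then be computed, e.g., with
# scipy.stats.pearsonr over the per-query sensitivity and relevance scores.
```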

5.7 Related work

Exposure. Although, to the best of our knowledge, the problem of search exposure has not been addressed in the past, different aspects of user and data exposure have been studied in the prior literature. Mondal et al. proposed exposure control as an alternative to access control in social networks [Mondal et al. 2014], and later devised solutions for longitudinal exposure control [Mondal et al. 2016]. Biega et al. [2016] quantify privacy risks for sensitive topics in rankings based on textual posts using the notion of R-Susceptibility. Exposure has also been studied in the context of individual attribute leakage, such as location [Shokri et al. 2011]. Another interesting problem is that of the usability of exposure warnings. Example solutions include depicting the current size of the content audience by the size of a displayed pair of eyes [Schlegel et al. 2011].

blame gay, dutch gay, gay rabo, gay guy, blame dutch, suck teacher, start tht, rider tour, donald duck, attack bad
gay racist, fuckin young, deal gay, simon watchin, fuckin kinda, guy note, kind sum, live net, bcoz gay, dnt season
adopt convert, adopt religion, convert religion, convert essay, essay religion, bon convert, bon religion, bon essay, river tonight, adult love
lesbian pregnant, lesbian music, lesbian live, boat lesbian, gay norway, end lesbian, fritt gay, bell page, star trend, lesbian uuum
oooh virgin, virgin wen, video virgin, crack oooh, oooh wen, normal tom, normal smith, smith tom, swine year, outfit xoxo
gay israel, bit web, gay gunman, michael pant, hat pant, attack bit, e-mail match, israel wtf, china tale, obama recov
david queer, queer ted, asian queer, queer warhol, david folk, model race, keith york, driver rule, kind remix, jean odd
camera stick, stick tape, rep usa, china rep, stick tehran, governor tehran, governor stick, prayer tehran, prayer tehran, israel rep
detail obama, alex detail, alex obama, box long, bloomberg flash, dubai investor, dubai investor, june real, june real, alybi*@gmail.com investor
u.s. union, mexico union, canada union, american union, demand democrat, agenda reform, nasa obama, american borderless, nazi obama, demand overhaul

Table 5.4: Top-10 sensitive exposing queries returned by the L2R model for a subset of users.

Privacy-preserving IR. Problems studied in privacy-preserving IR include sanitization of query logs prior to a release [Götz et al. 2012; Zhang et al. 2016a], or obfuscation of query histories through broadened or dummy queries [Gervais et al. 2014; Wang and Ravishankar 2014]. A number of works also investigate the viability of personalized search under privacy constraints [Chen et al. 2011; Shen et al. 2007; Xu et al. 2007; Zhu et al. 2010].

User protection and internal audits. Service providers (SPs) increasingly come under close scrutiny by external organizations and observers, including journalists and researchers. This pressure encourages SPs to perform proactive, internal audits to improve their services and infrastructure. New solutions for increased privacy are constantly introduced to mitigate threats to users from external adversaries in services like maps [Huang et al. 2017]. User data itself has also been analyzed, for example, to deliver better security protections in the context of personal questions for account recovery [Bonneau et al. 2015]. Beyond privacy, there are also other societal issues that press SPs to audit their services, including the issues of fairness and bias [Feldman et al. 2015], or user satisfaction with search results [Mehrotra et al. 2017]. Search exposure can be seen as another dimension for internal audits. Along these lines, we believe more work can be done to examine which types of search queries should be blocked altogether, and which search results should be removed to protect against finding users in sensitive contexts. While certain ad-hoc protections are already in place (for instance, it seems impossible to explicitly query for credit card numbers in Google, Twitter, or Facebook, since these tokens get post-processed and end up matching other numerical tokens as well), there is a need for more direct examination and protection mechanisms regarding the exposure of users in search systems.

(Reverse) k-nearest-neighbors problems. The problem of finding reverse k nearest neighbors has been studied in scenarios different from the one proposed in this thesis. These scenarios include matching user preferences to products [Vlachou et al. 2011], or assigning new publications to subscribers [Basik et al. 2015; Chen and Cong 2015]. The reverse k-nearest-neighbors problem can be studied in two different setups: monochromatic or bichromatic. The setup is determined by whether the sets of queries and reverse nearest neighbors are the same (monochromatic) or disjoint (bichromatic) [Emrich et al. 2015]. The model proposed in this thesis is an instance of bichromatic RkNN, since the sets of queries and documents are disjoint. Existing algorithms for finding reverse k nearest neighbors are not sufficient for application in the context of search exposure due to the high dimensionality, large cardinality, and sparsity of the query space. Most approaches heavily depend on geometric properties of the underlying space to perform efficient pruning [Vlachou et al. 2011].

5.8 Conclusion

This chapter introduces the problem of quantifying user search exposure, that is, finding the queries for which any of the user's posts is returned as a top-ranked result in a given search system. We cast the problem formally as reverse search and propose a method for ranking the queries in the resulting exposure sets to make the output user-friendly. The ranking task, moreover, uses a newly defined concept of search exposure relevance, which we studied in a series of AMT surveys. We believe there are a number of fascinating research questions that could be studied as an extension to the work presented in this chapter. On the generation side, considering various ranking models, expanding the query length, and efficiently processing streams of search exposure requests (including parallel computation, caching, and request partitioning) would be necessary in a real-world deployment. On the usability and ranking side, further understanding of exposure relevance, designing better ranking methods, incorporating the probabilities of queries being asked into the overall setup, or detecting exposure in black-box systems are only a few of such extension possibilities. Finally, further investigating layman perceptions regarding search exposure, as well as developing the expert understanding of the possible threats, would give us a better grip on this newly defined privacy question.

Chapter 6 Rank-Susceptibility

Contents
6.1 Introduction 76
6.2 R-Susceptibility model 78
6.2.1 Sensitive states and adversaries 78
6.2.2 Sensitive topics 78
6.2.3 Background knowledge 79
6.2.4 R-Susceptibility 79
6.3 Risk assessment measures 79
6.3.1 Entropy baseline measure 80
6.3.2 Differential-privacy baseline measure 80
6.3.3 Topical risk measure 81
6.4 Identifying sensitive topics 85
6.4.1 Experiments on topic sensitivity 85
6.5 Experiments 87
6.5.1 Setup 87
6.5.2 Traditional vs. IR risk scoring 90
6.5.3 Risk scoring with dimensions of interest 90
6.5.4 Robustness to configuration changes 91
6.5.5 Discussion 92
6.6 Related work 94
6.7 Conclusion 96

Privacy of Internet users is at stake because they expose personal information in posts created in online communities, in search queries, and other activities. An adversary that monitors a community may identify the users with the most sensitive properties and utilize this knowledge against them (e.g., by adjusting the pricing of goods or targeting ads of a sensitive nature). Existing privacy models for structured data are inadequate to capture privacy risks from user posts. This chapter presents a ranking-based approach to the assessment of privacy risks emerging from textual contents in online communities, focusing on sensitive topics, such as being depressed. We propose ranking as a means of modeling a rational adversary who targets the most afflicted users. To capture the adversary's background knowledge regarding vocabulary and correlations, we use latent topic models. We cast these considerations into the new model of R-Susceptibility, which can inform and alert users about their potential for being targeted, and devise measures for quantitative risk assessment. Experiments with real-world data show the feasibility of our approach.

6.1 Introduction

Motivation and background. The goal of this chapter is to provide privacy risk assessments from textual data for users in online communities. An online post may directly or indirectly disclose personal information, such as gender, age, political affiliation, or interests. An adversary can combine such observations with his background knowledge of correlations between different attributes to infer privacy-sensitive information and discriminate against users. We argue that existing privacy models for structured data, such as k-anonymity [Sweeney 2002b], l-diversity [Machanavajjhala et al. 2007], t-closeness [Li et al. 2007], membership privacy [Li et al. 2013] and differential privacy [Dwork 2008], are inherently inappropriate to capture these situations. One reason is that user posts in social media are mostly of textual form, inducing a high-dimensional data space of word-level or phrase-level features. A second reason is that users might not want to be prevented from posting contents, but instead be selectively warned about emerging privacy risks. In our setting, certain assumptions also differ from the assumptions of prior work on privacy-preserving data publishing [Fung et al. 2010]: users do want to post information, but they should be aware of possible exposure and targeting risks. For these reasons, we pursue an IR-centric approach to privacy in this chapter, making novel use of topic models and ranking.

Scenario. To understand why adversaries and user risks are different from the privacy concerns for structured databases, consider the following scenario. An unscrupulous drug company wishes to advertise its new anxiety-reducing drug to Facebook users. It decides to target ads at a million users that are most susceptible to be afflicted by depression within the 1 billion population of Facebook. The company plans to infer users' demographics by text mining their posts and combine it with background knowledge correlating demographics and certain vocabulary usage with depression, obtained from text mining an archive of medical journals. In such a scenario, how can a Facebook user estimate her risk of being targeted? Similar issues arise also within specialized online communities such as healthboards.com or patient.co.uk. Although these have a much smaller scale, a smart adversary would still target only a subset of highly susceptible users to avoid the impression of mass spamming. Targeted ads of a sensitive nature constitute one kind of risk, but there are even more severe threats with real cases reported: scoring users for financial credit worthiness or insurance payments, factoring a user's social-media posts into the assessment of her job application, and more. Despite these being big trends, most users do not need hard guarantees regarding privacy (e.g., preventing de-anonymization by all means), and perfect anonymity cannot be guaranteed without severely diminishing the utility of social media. For example, someone who always posts using a one-off anonymous identity cannot build up a reputation as a credible information source. Conversely, even making all posts under a pseudonym is insufficient to prevent tracking-and-rating companies (e.g., www.spokeo.com) from linking user accounts across different social platforms. Therefore, we focus on the assessment of privacy risks and on alerting users to support their awareness, rather than pursuing the elusive goal of enforcing privacy. Existing privacy models fail to capture these issues, along the following dimensions:

• Data model: Privacy models like k-anonymity or differential privacy are primarily geared for structured data or content that can be cast into low-dimensional feature spaces. Capturing risks from textual contents in online communities faces the problem of high-dimensional feature vectors (e.g., word bigrams). Prior work coped with text only in specific settings, such as predicting sensitive posts [Peddinti et al. 2014], sanitizing the information in query logs [Carpineto and Romano 2015; Navarro-Arribas et al. 2012], or publishing high-dimensional datasets [Day and Li 2015]. Our goal, on the other hand, is to be able to quantify privacy risks from text in a generic way.

• Adversary's background knowledge: Prior work on privacy assumes computationally powerful adversaries, but disregards or makes special assumptions about the background knowledge that an adversary may have beyond the dataset at hand. However, adversaries may easily tap into many datasets, including large text corpora, thus obtaining a model of the typical vocabulary used by potential targets as well as semantic dependencies or statistical correlations between topics.

• Disclosure vs. discrimination risk: Existing privacy models focus on limiting information disclosure, but they do not capture the exposure within a community with regard to sensitive properties. Standing out in a community this way may result in discriminatory treatment, such as being rejected for loans or job applications, or receiving ads of sensitive nature.

Approach and challenges. This thesis introduces R-Susceptibility: a ranking-based model for assessing users' privacy risks in online communities, accompanied by IR-style risk measures for quantifying risks from textual contents. The model is very versatile: in this chapter we demonstrate how it can capture user posts or search queries, but it can also be used with click streams and other online activities. Semantic dependencies and statistical correlations among words and sensitive topics are represented using latent topic models, such as LDA [Blei et al. 2003] or Skip-grams [Mikolov et al. 2013]. This way, we anticipate adversaries with rich background knowledge. Adversaries are assumed to be rational: they target only a fraction of “promising” users. Therefore, we model the risk of a user as the ranking position in the community when all the users are ordered by the relevance of their contents to sensitive topics, such as pregnancy, depression, financial debts, etc. This ranking-based model is meant to alert the users whenever critical situations arise. We posit that users might then be guided to selectively post anonymously. Our model addresses several technical challenges:

• Sensitive vs. general topics: A trained latent topic model does not indicate which of the topics are privacy-sensitive. We carried out a crowdsourcing study to identify sensitive topics.

Our study differs from the prior work of Peddinti et al. [2014], as the latter relied on explicit categories.

• Personal vs. professional interest: A user who posts about a sensitive topic may merely have a professional or educational interest without being personally afflicted. To be able to rank such users lower, our model introduces the notion of topical breadth of interest, complementing the user’s strength of interest in a sensitive topic.

• Personal interest vs. curiosity: A user may become interested in a topic out of curiosity, perhaps prompted by an external event (e.g., a celebrity scandal). To be able to rank such non-critical users lower, our model also considers the temporal variation of interest in a topic.

The chapter’s salient contributions are:

• a novel approach to privacy risks focusing on exposure in user rankings within online communities, and emphasizing risk awareness;

• a framework for quantifying privacy risks from textual contents in online communities, based on latent topic models and user rankings;

• measures for computing risk scores with regard to sensitive topics based on users’ posts or search queries.

6.2 R-Susceptibility model

6.2.1 Sensitive states and adversaries

We assess the risk of a user being perceived as afflicted by a sensitive state, such as depression, pregnancy, or financial debts. An adversary in our model attempts to find the most susceptible users, that is, the users who are most exposed with regard to a sensitive state. For instance, an adversarial insurance company might want to identify the users who are likely afflicted by certain diseases, an adversarial HR department of a company might want to screen for the users with likely drug or alcohol problems, while a seller of illegal anti-depressants might want to find the users most likely to be depressed, and thus prospective customers. We therefore propose ranking as a means of modeling a rational adversary trying to identify the most susceptible users. To rank the users with respect to a given sensitive state, an adversary needs to choose a measure of quantitative risk assessment based on the contents of user profiles. We discuss several such measures in Section 6.3.

6.2.2 Sensitive topics

We associate sensitive states with a vocabulary distribution, i.e., distributional vectors of related words. For example, the topic financial debts could be captured by related words and phrases like loan, mortgage, money, problem, sorrows, or sleepless night. Such salient phrases related to a sensitive state can be obtained by unsupervised or semi-supervised training of latent topic models over external datasets such as news archives, digital libraries, or large crawls of social media. This way we capture the adversaries' background knowledge about the vocabulary of a topic and about semantic dependencies and correlations. Sensitive states might manifest themselves in the online contents of users. User posts can also be characterized as distributional vectors of salient words. Then, the similarity between the distributional vectors of the user's posts and a sensitive topic can be used to assess the user's susceptibility to being exposed with regard to that topic.

6.2.3 Background knowledge

An adversary in our model is assumed to be interested in a sensitive state and aims to target a fraction of the most afflicted users. The adversary has background knowledge, characterized by statistical language and topic models. This is a natural form of useful knowledge for a rational adversary who wants to rank the users based on their textual contents and to bound the cost of his targeting efforts. In this dissertation, we consider three versions of the adversary's background knowledge. The basic version is the knowledge of the most salient words for different topics, which is assumed in all the solutions we explore. The more advanced version assumes that the adversary is able to compute similarities between words, in the sense of semantic relatedness. Finally, in some of the solutions, we assume the adversary is able to assign latent topics to broader thematic domains, e.g., the topic of depression to the domain of psychiatry. We believe that this model reflects a wide class of adversaries whose goal is to discriminate against and target the most susceptible users in online communities.

6.2.4 R-Susceptibility

We propose R-Susceptibility (Rank-Susceptibility) as a measure of a user's privacy risk. To measure R-Susceptibility with respect to a sensitive topic, we first rank all users within an online community in order of decreasing susceptibility to being exposed with regard to that topic (as described above) and then compute the position at which the user is ranked. Intuitively, the R-Susceptibility model could also have the following IR interpretation: we rank the users according to the relevance of their posts to a query containing the words of a sensitive topic, and choose the top-ranked, who should be the most likely to be personally afflicted.

6.3 Risk assessment measures

Risk measures are plug-in components in the framework and orthogonal to the idea of R-Susceptibility. In this dissertation, we begin by investigating three kinds of risk scores, leaving an extended risk-measure study as future work. The first two risk scores are baselines, inspired by standard measures in privacy research, namely, the entropy of attribute value distributions (as used in the t-closeness model) and the changes in the global probability distributions of attribute values incurred by the inclusion of an individual user's data (as used in the differential privacy model). The third measure is a novel IR-centric score based on topic models, capturing lexical correlations and three different characteristics of user interest in a topic: the strength of interest, the breadth of interest, and the temporal variation of interest.

Desired properties. By considering the community and interpreting risk with respect to a user’s rank in the community, our framework does not impose any restrictions on the absolute values or the value domains of valid risk measures. Intuitively, for the framework to function, we expect a good measure to correlate with human assessments on the sensitivity of user profiles: the more human observers agree that a user might be in a sensitive state, the higher the value of the risk score should be.

6.3.1 Entropy baseline measure

The entropy baseline measure is inspired by comparing a global probability distribution (for an entire community) against a local distribution (for an individual user) using relative entropy (aka KL divergence). We apply this measure to textual data as follows.

Let $X$ be a sensitive topic, and $\{x_1, \ldots, x_j\}$ be the salient words and phrases of $X$. The knowledge of this vocabulary for different topics is assumed to be a part of the adversarial background knowledge (e.g., derived from latent topic models). We treat $x_1, \ldots, x_j$ as database attributes and represent users as database records where the value of an attribute $x_i$ equals 1 if the word appears in the user's contents, and 0 otherwise.

Let $U_0$ be the user for whom we wish to compute the risk score with respect to $X$, and $U = \{U_1, \ldots, U_k\}$ be the set of other users in the community. Let further $U^* = \{U_0\} \cup U$, and let $P_U, P_{U^*}$ denote the distributions of attribute values for $U$ and $U^*$, respectively. We compute the risk score by averaging the relative entropy of the univariate distributions $P_U, P_{U^*}$ for the individual attributes $x_1, \ldots, x_j$. Note that measuring the relative entropy over the multivariate joint distributions of the attributes could be an alternative, but we do not pursue this here because of the data sparseness that we would face.

Definition 3 (Entropy baseline risk score of topic X for U0). The entropy baseline risk score of the user $U_0$ with respect to a topic $X$ is:

$$\mathrm{risk}_{\mathrm{ENT}}(U_0, X) = \frac{1}{j} \sum_{i} \sum_{v \in \{0,1\}} P_U[x_i = v] \log\left(\frac{P_U[x_i = v]}{P_{U^*}[x_i = v]}\right) \qquad (6.1)$$

The ranking method based on this definition is referred to as ENT.

Measure properties. It holds that $\mathrm{risk}_{\mathrm{ENT}}(U_0, X) \geq 0$. The lowest value of 0 is reached when the user does not have any of the topic's salient attributes in her observable contents. Otherwise, the risk score is lowest when half of the community's users exhibit an attribute in their contents and highest when all or none of the users have the attribute.
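A minimal sketch of Eq. (6.1) follows, assuming the users are represented as binary indicator vectors over the topic's salient words; the smoothing constant is an added assumption to avoid division by zero and is not part of the definition.

```python
# Sketch of the entropy baseline risk score (Eq. 6.1).
# 'community' is a 0/1 matrix of users x salient words (the community without U0);
# 'u0' is U0's binary attribute vector. Smoothing 'eps' is an assumption.
import numpy as np

def risk_ent(u0, community, eps=1e-9):
    both = np.vstack([u0, community])                                     # U* = {U0} + U
    p_u = np.stack([1 - community.mean(axis=0), community.mean(axis=0)])  # P_U[x_i = 0/1]
    p_us = np.stack([1 - both.mean(axis=0), both.mean(axis=0)])           # P_U*[x_i = 0/1]
    kl = np.sum(p_u * np.log((p_u + eps) / (p_us + eps)), axis=0)         # per-attribute KL
    return kl.mean()                                                      # average over j attributes

# e.g. risk_ent(np.array([1, 0, 1]), np.random.randint(0, 2, size=(100, 3)))
```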

6.3.2 Differential-privacy baseline measure

The differential-privacy-based measure is inspired by the definition of differential privacy, that is, calculating the changes in attribute probabilities incurred by the inclusion of a user's data.

Let $X$, $\{x_1, \ldots, x_j\}$, $U_0$, $U$, $U^*$, $P_U$, and $P_{U^*}$ be defined as in the previous section. The differential privacy principle requires that:

$$P_U[x_i] \leq 2^{\varepsilon} P_{U^*}[x_i] \quad \text{and} \quad P_{U^*}[x_i] \leq 2^{\varepsilon} P_U[x_i] \qquad (6.2)$$

for a small $\varepsilon > 0$. To give an $\varepsilon$-differential-privacy guarantee, existing methods would perturb the data with Laplacian noise if the inequalities are not already satisfied. However, our “attributes” are words in user posts that the user intentionally chose, and our goal is to quantify risk rather than perturb the data. We thus aim to determine the best possible value of $\varepsilon$ for which the guarantee holds without perturbation. This is the minimum $\varepsilon$ for each $x_i$, but the guarantee is only as strong as the weakest $x_i$, leading to the following formulation:

Definition 4 (Differential-privacy baseline risk score of topic X for U0). The differential-privacy baseline risk score of the user $U_0$ with respect to a topic $X$ is:

$$\mathrm{risk}_{\text{D-P}}(U_0, X) = \max_{x_i} \max\left\{ \log\left(\frac{P_U[x_i]}{P_{U^*}[x_i]}\right),\ \log\left(\frac{P_{U^*}[x_i]}{P_U[x_i]}\right) \right\} \qquad (6.3)$$

The ranking method based on this definition is referred to as DIFF-PRIV.

Measure properties. It holds that $\mathrm{risk}_{\text{D-P}}(U_0, X) \geq 0$. The risk value is lowest for a user who does not have any of the sensitive topic's salient attributes in her contents and highest for a user who has a critical attribute that is not present in the contents of any other user.
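A minimal sketch of Eq. (6.3), under the same binary-attribute representation as above; the smoothing constant is an assumption, and base-2 logarithms are used to match the $2^{\varepsilon}$ formulation of Eq. (6.2).

```python
# Sketch of the differential-privacy baseline risk score (Eq. 6.3): the smallest epsilon
# for which both inequalities of Eq. (6.2) hold, taken over the salient attributes.
import numpy as np

def risk_dp(u0, community, eps=1e-9):
    both = np.vstack([u0, community])
    p_u = community.mean(axis=0) + eps        # P_U[x_i], smoothed (assumption)
    p_us = both.mean(axis=0) + eps            # P_U*[x_i], smoothed (assumption)
    ratios = np.maximum(np.log2(p_u / p_us), np.log2(p_us / p_u))
    return ratios.max()                       # the weakest attribute determines the guarantee
```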

6.3.3 Topical risk measure

To capture these characteristics, we construct a distributional representation of each of the sensitive topics $X$ (e.g., financial debts), of the user contents $U$ (e.g., from an online community such as quora.com), and of each post $P$ the user authors in the online community. We model $X$, $P$, and $U$ as vectors in a distributional vector space.

6.3.3.1 Distributional vectors for topics and users

Topic vectors. Topics are represented as vocabulary distributions found by collecting word statistics over suitably chosen corpora.

Definition 5 (Sensitive Topic Vector). For a sensitive topic $X$, the topic vector is a distributional vector $\vec{X}$ constructed using words or bigrams weighted by topic relevance.

For example, hiv and positive are salient for the topic of hiv infection. Such topics and their salient phrases can be automatically extracted by applying latent topic analysis to large, thematically broad text corpora.

User vectors. To be able to relate posts and users to topics, we map each user $U$ and each post $P$ created by the user in an online community to a vector.

Definition 6 (User Post and User Vectors). The content of a post $P$ of a user $U$ is modeled as a distributional vector $\vec{P}$. User $U$ in the context of a topic $X$ is modeled as a distributional vector $\vec{U}$ defined as:

$$\vec{U} = \max_{P \in U} \cos(\vec{P}, \vec{X}) \qquad (6.4)$$

Vector construction. The exact mapping of topics and posts to vectors depends on the vector space in which we are operating. We use three different configurations in our experiments: i) a bag-of-words model (bow), ii) an LDA model (lda), and iii) a Skip-gram model (w2v). Note that the use of LDA here is to construct a lower-dimensional vector space; this is orthogonal to using LDA for obtaining topics with their salient phrases, which we discussed above. In the bow vector space, we create topic vectors directly over the characteristic topic words with binary scoring; we also use these words as features with tf-scoring for user and post vectors. In the lda model, topic vectors are indicator vectors over the latent dimensions. Users and posts are treated as documents that LDA maps into its low-dimensional latent space. The third technique that we consider, w2v, is a model based on learning word relatedness, which can be trained over large text corpora [Mikolov et al. 2013]. To create the topic vectors in this word-centric vector space, we compute a weighted sum of the words from the previously computed sensitive topic distributions. Since there is no natural mapping of documents to vectors in this setting, the procedure for posts is similar. However, to discount the impact of words unrelated to the topic at hand, we introduce a topic-dependent weighting scheme for the post vectors (and hence for the user vectors derived from them). Namely, for a topic $X$ and a post containing the set of words $\{v_1, v_2, \ldots\}$, the post vector is $\vec{P} = \sum_j \cos(\vec{v}_j, \vec{X}) \cdot \vec{v}_j$.

Risk scoring. Given these vectors, we can now compare a user's posting history against a sensitive topic using vector-based similarity measures, like the cosine similarity. An advantage of this risk measure is that, unlike the entropy or diff-priv measures, it does not require any community-level data, as the risk score of a user is independent of other users' data. Thus, each user can compute her score locally and privately, and send the value to a server to obtain an R-Susceptibility rank. In addition to quantifying the strength of user interest in a sensitive topic, we also capture the breadth and temporal variation of that interest. This is crucial to avoid erroneously ranking higher those users who have a professional interest in a topic without being personally afflicted, or who are temporarily interested out of curiosity. In our previous preliminary work in this area, we identified these two components as crucial for reducing classification error in a similar setup [Biega et al. 2014].

6.3.3.2 Strength of interest

Having a vector representation of a user U, we can now compute the similarity between U and a topic vector X.

Definition 7 (Topic-aware risk score). The strength-of-interest risk score for a user $U$ with respect to a topic $X$ is:

$$\mathrm{risk}(U, X) = \cos\left(\vec{U}, \vec{X}\right) \qquad (6.5)$$

We further refer to methods based on this definition as bow, lda, and w2v.

Measure properties. It holds that −1 ≤ risk(U, X) ≤ 1. A high value of this measure means the user has at least one post with vocabulary related to the topic. Thus, the strength of interest is reflected by the presence of the topic’s salient vocabulary in user posts.
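A minimal sketch of the w2v-space construction and the resulting strength-of-interest score; a plain dictionary from word to embedding vector stands in for any trained word-embedding lookup, and the user score is computed directly as the best post-to-topic cosine, one natural reading of Definitions 6 and 7.

```python
# Sketch of the w2v configuration: topic-weighted post vectors (last formula of the
# vector-construction paragraph) and the strength-of-interest score. Inputs are placeholders.
import numpy as np

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def post_vector(words, topic_vec, word_vectors):
    """P = sum_j cos(v_j, X) * v_j over the post's embedded words."""
    vecs = [word_vectors[w] for w in words if w in word_vectors]
    if not vecs:
        return np.zeros_like(topic_vec)
    return np.sum([cos(v, topic_vec) * v for v in vecs], axis=0)

def strength_of_interest(posts, topic_vec, word_vectors):
    """Best post-to-topic cosine over the user's posting history."""
    return max(cos(post_vector(p, topic_vec, word_vectors), topic_vec) for p in posts)
```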

6.3.3.3 Breadth of interest

When ranking users, an adversary might want to distinguish between users who show a focused interest in a topic and users who show a broad interest in many topics within a domain, ranking the former higher than the latter. Applying this strategy could help, for instance, to capture users who are not personally afflicted but rather show an educational, hobbyist, or professional interest in a topic. For example, for the topic of financial debts, a bank agent or finance hobbyist could offer advice in Q&A communities; similarly, a medical doctor or student could engage herself in health forums. The posts of a user with a broad interest should exhibit a diversity of topics within their respective domain. We aim to capture this behavior by means of distributional vectors, assigning each topic X to a broader domain, such as finance, medicine, or psychology.

Definition 8 (Domain Vectors). A domain $D$ is a set of topics $X_1, \ldots, X_{|D|}$, and its vector representation is the set of corresponding topic vectors $(\vec{X}_1, \ldots, \vec{X}_{|D|})$.

To assess the risk while taking into account whether a user $U$ has a focused or a broad interest in a topic $X$, we compute:

1. how similar $\vec{U}$ is to $\vec{X}$, and

2. how dissimilar $\vec{U}$ is to the domain $D$, by computing the similarities between $\vec{U}$ and $\vec{X}_j$ for $j = 1..|D|$ and taking the $\lceil k \cdot |D| \rceil$-th largest value, for some $0 < k \leq 1$.

If both of these measures are high, then we conclude that U is personally afflicted by topic X.

Definition 9 (Domain-aware risk score). The domain-aware risk score for a user $U$ with respect to a topic $X$ from the domain $D$ is:

$$\mathrm{risk}_D(U, X) = \cos\left(\vec{U}, \vec{X}\right) - \max{}_{\lceil k \cdot |D| \rceil}\left\{\cos\left(\vec{U}, \vec{X}_j\right)\right\}_{j=1..|D|} \qquad (6.6)$$

where $\max_{\lceil k \cdot |D| \rceil}$ denotes the $\lceil k \cdot |D| \rceil$-th largest value.

We further refer to methods based on this definition as bow-d, lda-d, and w2v-d.

Measure properties. It holds that $-2 \leq \mathrm{risk}_D(U, X) \leq 2$. The value is high for a user who has a post containing the topic's salient vocabulary, but whose contents do not exhibit any vocabulary from other topics in the respective domain. A low value occurs when the user has not written any posts related to the topic at hand, but has contents related to other topics in the domain. Studying the relative importance of the two components in different online communities is an interesting topic for future work. The intuition behind the parameter $k$ is that a personally afflicted user would not have high posting activity in a $k$-fraction of different topics within the same domain. The value of the parameter controls how large the domain coverage should be for a user to be considered broadly interested. In practice, setting this parameter requires knowledge of the breadth of topics discussed in a particular community.
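A minimal sketch of Eq. (6.6); the user and domain topic vectors are placeholders, and the default k = 0.3 mirrors the value used later in the experimental configuration.

```python
# Sketch of the domain-aware risk score (Eq. 6.6): similarity to the topic minus the
# ceil(k*|D|)-th largest similarity to any topic in the domain. Inputs are placeholders.
import math
import numpy as np

def risk_domain(user_vec, topic_vec, domain_topic_vecs, k=0.3):
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    sims = sorted((cos(user_vec, x) for x in domain_topic_vecs), reverse=True)
    kth_largest = sims[math.ceil(k * len(domain_topic_vecs)) - 1]
    return cos(user_vec, topic_vec) - kth_largest
```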

6.3.3.4 Temporal variation of interest

Since we are interested in the users most likely afflicted by a given state, we would like to rank users who exhibit recurring activity regarding a topic X higher than non-afflicted (possibly curious or exploratory) users who exhibit only a short-term interest in the topic. Such bursty activity might be prompted by prominent news related to X, be it sex scandals in the press or social campaigns about depression. To capture this issue, rather than computing a user vector $\vec{U}$ over the entire user history, we divide the history into time buckets and compute a sequence of vectors $\vec{U}_i$ using the contents from each bucket $i$ separately. In our model, bucketization may be realized at different granularity levels depending on the user observation period and the characteristics of the community. We then identify the top-$m$ time buckets with the highest risk level, representing $m$ different time periods (such as days or weeks). Let us denote these buckets of the user model as $U^*_1, \ldots, U^*_m$. A user whose interest in $X$ is clearly above the level of a bursty interest (signifying occasional curiosity) would consistently have high risk scores in all of the top-$m$ buckets. This leads us to our next definition of a user's privacy risk regarding topic $X$.

Definition 10 (Time-aware risk score). The time-aware risk score for a user $U$ in time period $i$ with respect to a topic $X$ is:

$$\mathrm{risk}_T(U, X) = \mathrm{avg}_{i=1..m}\left\{\cos\left(\vec{U}^*_i, \vec{X}\right)\right\} \qquad (6.7)$$

We further refer to methods based on this definition as bow-t, lda-t, and w2v-t.

Measure properties. It holds that $-1 \leq \mathrm{risk}_T(U, X) \leq 1$. The value is high for a user whose posts contain relevant topic vocabulary in at least $m$ observation buckets, and low for a user who does not exhibit the topic's vocabulary in their contents. The choice of a particular value of the parameter $m$ depends on the available observation timeline and the characteristics of a given community. The parameter controls how often the activity regarding a topic should occur in order not to be considered occasional.
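A minimal sketch of Eq. (6.7), assuming the per-bucket user vectors have already been computed; m = 3 matches the weekly-bucket configuration used later but is otherwise a placeholder.

```python
# Sketch of the time-aware risk score (Eq. 6.7): score each time bucket against the
# topic and average the top-m bucket scores. Inputs are placeholders.
import numpy as np

def risk_time(bucket_vectors, topic_vec, m=3):
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    scores = sorted((cos(u, topic_vec) for u in bucket_vectors), reverse=True)
    top_m = scores[:m]
    return float(np.mean(top_m)) if top_m else 0.0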

6.3.3.5 Combining domain- and time-awareness

The final measure we introduce combines all the aforementioned dimensions of interest. Note that we use bucketized user contents for computing the temporal variation component, but the breadth-of-interest component is computed over the full contents.

Definition 11 (Domain- and time-aware risk score). The risk of user $U$ in time period $i$ for topic $X$ in domain $D$ is:

$$\mathrm{risk}_{DT}(U, X) = \mathrm{avg}_{i=1..m}\left\{\cos\left(\vec{U}^*_i, \vec{X}\right)\right\} - \cos\left(\vec{U}, \vec{D} - \vec{X}\right). \qquad (6.8)$$

We further refer to methods based on this definition as bow-dt, lda-dt, and w2v-dt.

6.4 Identifying sensitive topics

To complete our framework, we need to train a background knowledge model and answer the remaining question of how to identify sensitive topics. Although our model is applicable to any topic irrespective of its sensitivity, in practice users would only be interested in their R-Susceptibility ranks for truly sensitive topics. There is indeed a systematic way of gathering such information in a reasonably inter-subjective manner: training a latent topic model on a background corpus and crowdsourcing sensitivity judgments for each topic. This section presents our results along these lines.

6.4.1 Experiments on topic sensitivity

Datasets. We trained 3 LDA models using the Mallet topic modeling toolkit: i) with 500 topics, on 600K Quora posts we crawled; ii) with 200 topics, on 3M posts from health Q&A online forums; and iii) with 500 topics, on a sample of 700K articles from the New York Times (NYT) news archive.

Crowdsourcing sensitivity and domain judgements. We collected human judgements regarding the sensitivity and the domains of topics using Amazon Mechanical Turk (AMT), employing only master workers from the USA and collecting 7 judgements per topic. For each of the topics, the workers were shown the 20 most salient words computed by LDA and asked whether they would consider a post in social media containing these words privacy-sensitive. We explained that by privacy-sensitive we mean that a person uses these words because he/she is in a privacy-sensitive situation (e.g., being addicted to alcohol), or that the usage of these words might lead to a privacy-sensitive situation (e.g., political extremism). The first condition can capture, for instance, words related to diseases; the second can capture words related to political or religious positions. We computed Fleiss' Kappa to measure the inter-annotator agreement for this task, obtaining 0.241 for the Quora topics, 0.294 for the HF topics, and 0.157 for the NYT topics. These low values confirm that sensitivity is rather subjective. However, there is a considerable number of topics in all of these corpora which were unanimously or almost unanimously rated as sensitive. These were mostly related to health, private relationships, political and religious convictions, personal finance, legal problems, and others.

#judges   #topics Quora   #topics NYT   #topics HF
7         29              8             38
6         43              27            32
5         48              60            30
4         56              84            21
3         68              73            22
2         99              90            28
1         106             111           23
0         51              47            6

Table 6.1: #topics with #judges agreeing on the topic being sensitive.

Topic                    Vocabulary
clinical depression      depression depress suicide feel depressed suffer suicidal commit
drug addiction           drug addiction addict cocaine heroin substance meth addictive
pregnancy                baby birth pregnancy pregnant mother woman born child
hiv and viral diseases   hiv disease aids virus spread infection cure vaccine
financial debts          debt loan pay student interest payment money owe

Table 6.2: Examples: vocabulary of sensitive topics.

Table 6.1 shows the numbers of topics on which certain numbers of judges agree on their sensitivity. The judges were also asked to assign a topic to one of seven high-level categories. Six of these, potentially containing some sensitive topics, were chosen based on the top-level Microsoft Academic Search categories. The annotators could also choose a generic category, other.

Topics for evaluation in Section 6.5. For our further experiments, to make the much more laborious and costly evaluation of user profiles feasible, we leverage the above study to restrict the evaluation to 5 topics from the group of the most sensitive topics. The choice of particular topics is guided by the reported cases of social-media screening by insurance companies, employers, and credit companies mentioned in Section 6.1. These are: clinical depression, drug addiction, hiv, pregnancy, and financial debts, assigned to the domains of psychology, medicine, and finance&economy. Table 6.2 shows the most prominent words for each of the chosen topics from the Quora topic model.

6.5 Experiments

6.5.1 Setup

6.5.1.1 Data sources

To test our methods in a variety of scenarios, we constructed three datasets using online communities of different nature. As a first data source, we used the AOL query log collected between March and May 2006. The resulting data source amounts to around 107K users and more than 13M queries. The second data source consisted of over 5M posts spanning 13 years (2000-2013) from the healthboards and ehealthforum Q&A health communities. We also collected data from the Quora Q&A community over a period of three months, between February and May 2015. The crawl focused on Quora users who were active in categories related to the considered sensitive topics and their domains, and comprised more than 200K users and 1.3M posts.

Ethics. To adhere to ethical standards concerning the incorporation of user data into research, we decided to only use data that is publicly available – either as online profiles (Quora, Health Q&A), or as datasets used in numerous other studies (AOL). We never attempted to identify the individuals whose profiles we analyzed.

6.5.1.2 User sampling

We created our datasets by sampling users from the data sources described above. However, we encounter a technical challenge, as the number of sensitive users for a given topic is very small compared to the size of the whole community. Sampling users uniformly would not constitute a good benchmark for risk scoring methods. For example, we could achieve high accuracy, in a misleading way, with the simplistic prediction that all users are non-sensitive. What we want, though, is a ranking evaluation – our goal is to see how sensitive the users are in different ranking regions. Therefore, our sampling method is non-uniform and proceeds as follows. We first rank all users in each of the datasets using our basic strength-of-interest method from Section 6.3.3.2, and then sample users from this ranking. To pick a user, the sampling procedure orders users by their score, then computes prefix sums $\Sigma_i$ for all users up to user $i$, with $\Sigma_n$ being the score sum over all users. Then we draw a random number between 0 and $\Sigma_n$. If the number falls between $\Sigma_i$ and $\Sigma_{i+1}$, we choose user $i+1$ (with users numbered from 1 to $n$). However, given that the risk scores are extremely skewed, this sampling still does not yield good coverage of all the ranking regions. Therefore, we transform the original risk score $q$ into $a^q$, where the constant $a$ needs to be determined based on the score skew in a data source. The intuition is to give a higher probability of being sampled to users with higher scores, so that the final sample set has good coverage of users with both high and low scores. Figure 6.1 depicts the depression risk scores of our 100 samples from the AOL data vs. the scores of the original dataset of 170K users. In our case, a value of $a = 10^2$ for the Health Q&A data and a value of $a = 10^3$ for the AOL data were reasonable to compensate for the skew. For each of these datasets, we sampled 100 users for each sensitive topic.

Figure 6.1: Example comparison of risk scores of the sample vs. the full data.

We did not perform this kind of sampling for Quora, as our dataset was based on a focused crawl with a focus on sensitive discussion threads in the first place. Since evaluating sizeable Quora profiles requires much more effort, for this data source we constructed smaller datasets with 40 users per topic. In total, our datasets comprised 1100 user profiles: personal histories of posts or queries.
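A minimal sketch of the skew-compensating sampling described above; the transform base and the score list are placeholders.

```python
# Sketch of the evaluation-set sampling: risk scores q are transformed into a**q to
# compensate for skew, and users are drawn proportionally via prefix sums.
import bisect
import random

def sample_users(scores, n_samples, a=100.0):
    weights = [a ** q for q in scores]                 # skew-compensating transform
    prefix, total = [], 0.0
    for w in weights:
        total += w
        prefix.append(total)                           # prefix sums Sigma_i
    picks = []
    for _ in range(n_samples):
        r = random.uniform(0.0, total)
        picks.append(bisect.bisect_left(prefix, r))    # user whose interval contains r
    return picks
```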

6.5.1.3 User study for ground-truth labels

To assign sensitivity labels for user-topic pairs as ground truth, we used crowdsourcing and asked human judges to examine user profiles with chronologically ordered textual posts. Specifically, we asked whether, based on the content of the profile, the judge suspects that the user (or a family member) is in a given sensitive state. To evaluate the AOL and Health Q&A datasets, we employed AMT master workers from the USA and collected 5 judgements for each of the profiles. Since the majority of Quora profiles contain hundreds of posts, to ensure that proper care was given to evaluating them, we collected the judgements employing 19 students from our institution. We computed Fleiss' Kappa to quantify the global inter-annotator agreement across all the topics. The values for the AOL, Health Forums, and Quora datasets were 0.442, 0.444, and 0.468, respectively, all corresponding to a moderate agreement. Table 6.3 shows the number of users who were marked by the human judges as sensitive by a majority vote.

Topic        AOL       HF        Quora
depression   24        42        20
drugs        22        31        11
pregnancy    15        42        21
hiv          14        19        5
debts        24        n/a       11
TOTAL        99/500    134/400   68/200

Table 6.3: Number of sensitive users according to judges’ assessments (x/y means x sensitive users out of y in total).

6.5.1.4 Configuration of methods

To evaluate the topic-model-based method, we used three different distributional vector spaces: a bag-of-words vector space, as well as two 500-dimensional vector spaces trained with (i) the LDA implementation from the Mallet toolkit [McCallum 2002] and (ii) the word2vec tool [Mikolov et al. 2013] (https://code.google.com/p/word2vec/). The latter two models were trained on the NYT and Quora corpora described in Section 6.4. In the breadth-of-interest model from Section 6.3.3.3, we set the parameter k to 0.3, i.e., we expect a user with a broad interest in a domain to cover at least 30% of the topics from the domain. In the temporal-variation-of-interest models described in Section 6.3.3.4, we compute the results using weekly time buckets and set the number-of-buckets parameter m to 3. We later analyze the robustness of the ranking methods with respect to these parameters.

6.5.1.5 Ranking effectiveness metrics

• R-Precision. For a given sensitive topic, where r users were identified by the judges as sensitive, we compute the precision@r. When computing the average precision over all sensitive topics, we report both micro and macro average scores (summing over individual samples, and summing over topic precisions, respectively). To apply this measure, for each of the profiles we have to cast the five collected judgements to a binary score. We assume that an average of more than 0.5 classifies a user as sensitive. Note that r-precision imitates an adversary who, for instance, knowing that 1% of the population is depressed, ranks the users according to a depression-risk measure and chooses the top 1% of the users for further investigation.

• Mean Average Precision (MAP). For a given sensitive topic, we compute the average of the precision values at the ranking positions of sensitive users.

• Normalized Discounted Cumulative Gain (NDCG). To assess the effectiveness of our methods using the actual non-binary judge assessments, we employ NDCG, which compares the rankings our methods yield with a perfect ranking obtained using the crowdsourced scores.


6.5.1.6 Significance testing

The number of topics in our experiments is too small to perform significance tests over macro-averaged metrics. We thus resort to performing a paired t-test over r-precision differences on individual test samples within a dataset, marking the significance in the micro-averaged r-precision columns in the result tables. The ∗ symbols denote the case when the gain of a given ranking method over the baseline (ent and diff-priv in Table 6.4, strength-of-interest baselines in Table 6.5) is statistically significant with a p-value < 0.05. This lets us conclude that a good r-precision score of a ranking method is unlikely to depend on the particular choice of user profiles.

6.5.1.7 Research questions

The remainder of the experimental section seeks to answer the following research questions.

RQ 1: Do the proposed topical risk measures perform better than the entropy and diff-priv methods in predicting human risk judgements? (Sec. 6.5.2.)

RQ 2: Does the topical risk scoring measure perform better when extended with the breadth and temporal dimensions of user interest? (Sec. 6.5.3.)

RQ 3: How robust is the proposed method against changes in the parameter configuration and the background knowledge of the adversary? (Sec. 6.5.4.)

6.5.2 Traditional vs. IR risk scoring

We begin the analysis of the risk scoring methods by comparing the effectiveness of the baseline (entropy, diff-priv) and the strength-of-interest topical risk scoring methods. Here, we use the basic IR-based measures for the comparison; extending the measures with dimensions of interest will be addressed in the sections that follow. Table 6.4 shows that the lda risk scoring outperforms the alternatives (a similar observation holds for w2v), which confirms that the baseline methods are not naturally applicable to textual data in the context of risk scoring. The relatively good Precision@5 of the baseline measures indicates that the most sensitive users tend to use highly salient words. However, operating on explicitly given salient attributes for each topic, the baseline measures do not capture any lexical correlations, an important prerequisite for capturing users who manifest their sensitivity in a less direct way. This result validates the need to design new privacy risk measures better tuned to textual contents.

6.5.3 Risk scoring with dimensions of interest

We posited that extending the topical risk measures with the breadth and the temporal-variation dimensions of interest can help to predict sensitivity judgements better. Table 6.5 shows the evaluation results averaged over all topics, confirming that incorporating breadth and temporal variation into the risk score indeed improves the ranking performance. We observe that breadth-of-interest is especially important for Quora, which is a Q&A community with a very wide variety of topics. Many Quora users seem to frequently post replies prompted by others rather than by their personal situation; hence the lower impact of the temporal component. In contrast, in AOL the temporal component takes over. With merely implicit cues in the form of queries, the temporal dimension is an important indicator of user sensitivity (also for the annotators). The breadth-of-interest component performs worse for AOL, possibly due to the short time span of the query log (3 months). Note that in the case of the proposed breadth-of-interest score, an underlying assumption is that an adversary is able to assign latent topics to broader thematic domains. Thus the best-performing -DT methods imply stronger background knowledge of an adversary.

                R-precision        Prec@5   MAP     NDCG
                micro    macro
AOL
  entropy       0.495    0.496     0.760    0.524   0.819
  diff-priv     0.475    0.465     0.480    0.492   0.789
  w2v           0.556∗   0.533     0.720    0.589   0.836
Health Forums
  entropy       0.560    0.537     0.750    0.613   0.870
  diff-priv     0.560    0.559     0.500    0.542   0.794
  w2v           0.664∗   0.634     0.750    0.696   0.894
Quora
  entropy       0.239    0.205     0.240    0.317   0.632
  diff-priv     0.239    0.223     0.200    0.310   0.623
  w2v           0.343∗   0.341     0.280    0.352   0.637

Table 6.4: Average metrics over all sensitive topics for different risk assessment measures.

Risk scoring for different topics. Table 6.6 shows the values of r-precision split by topic, for different variants of lda-based risk scoring. The trends observed in the results averaged over all topics can be seen here as well: there are consistent improvements across topics when incorporating the temporal and breadth dimensions. These results constitute anecdotal evidence that the proposed methods are general enough to be potentially applied to a variety of topics.

6.5.4 Robustness to configuration changes

Model changes. The BOW vector space models only adversarial knowledge of salient words for different topics, whereas the latent vector spaces additionally enable an adversary to compute similarities between arbitrary words. The results presented in Table 6.5 show that this has a direct consequence for the risk ranking performance. The methods with the latent models as the background knowledge outperform the methods with the BOW background knowledge, while being comparable with each other. Thus the model seems resilient to rational changes of the background knowledge model, capturing a wide class of adversaries: the rational, cost-aware adversaries adopting latent models.

Training corpus changes. The results presented in the experimental section were obtained using the Quora topic model as the background knowledge model. We ran additional experiments using the NYT topic model described in Section 6.4.1, and noticed that for the topics which were captured in the other latent model as well, we observe similar trends and dependencies in the results. This would suggest that an adversary has the freedom to choose among the inputs where his topics of interest are well captured.

Parameter changes in risk measures. The topical risk measures introduce two parameters: k for coverage of domain topics, and m for the number of (weekly) time buckets. Observing the values of r-precision and NDCG obtained when varying these parameters between k = {0.1, 0.2, ..., 1.0} and m = {1, 2, ..., 12} yields the following observations. First, when the parameters are set to values from the lower half of the ranges, we still observe improvements over the baseline strength-of-interest measure. Second, when the parameters are set to higher values, the results tend to deteriorate, possibly due to the incompleteness of user profiles in our datasets. Third, we observe higher sensitivity to parameters when a given dimension of interest is important for a given community (e.g., temporal for AOL, breadth for Quora). This result suggests that there is room for improvement within the framework of R-Susceptibility in that community-specific risk measures could be employed.

6.5.5 Discussion

The presented experimental results suggest that R-Susceptibility with appropriate risk measures is able to identify sensitive users with reasonable accuracy. The topical risk measures that quantify a user’s exposure with respect to different topics work well, especially when the domain- and time-awareness components are included. The R-Susceptibility framework allows the plugging of different risk measures, and in the future more advanced measures could be studied to address some of the limitations of this work. These could, for instance, model semi-experts, subtle vocabulary correlations, user contexts, or specific characteristics of a community.

6.5.5.1 User guidance

The R-Susceptibility model and risk measures can work on a user history in a streaming manner, considering all contents up to a given point and periodically or continuously repeating the risk assessment. These methods could also be embedded in a privacy advisor tool that would help users assess their privacy risk, raising an alert when they become too exposed with regard to a sensitive topic.

6.5.5.2 Possible countermeasures by the platform

To prevent the risks described in this chapter, the platform could refrain from displaying results related to certain sensitive topics. However, such a countermeasure could also be considered a form of censorship. A middle-ground approach might then be for the platform to allow users to select the topics for which they do not wish to be ranked.

Table 6.5: Results averaged over all sensitive topics.

6.6 Related work

Data-centric privacy. Methods for privacy-preserving data publishing [Fung et al. 2010] aim at preventing the disclosure of individuals’ sensitive attribute values, while maintaining data utility, e.g., for data mining [Bertino et al. 2008], using concepts like k-anonymity [Sweeney 2002b], l-diversity [Machanavajjhala et al. 2007], t-closeness [Li et al. 2007], and membership privacy [Li et al. 2013]. All these models are geared for and limited to dealing with structured data, and this holds also for the most powerful and versatile privacy model, differential privacy [Dwork 2008]. In the field of Private Information Retrieval the goal of retrieving data from a database without revealing the query is mainly addressed by query encryption or obfuscation [Yekhanin 2010]. Generating dummy queries to obscure user activity is another technique studied in this area [Pang et al. 2012].

Sensitivity prediction. There is little research on characterizing what constitutes a sensitive topic. The recent work of Peddinti et al. [2014] analyzed features of posts and user behavior in Quora, and developed a classifier that can predict the sensitivity of individual posts. However, the solution is largely based on explicit categories (rather than latent embeddings) and the “go anonymous” posting option that users may choose. In contrast, our work aims to understand the sensitivity of any latently represented topic, and provide assessment for risk understood as topical exposure in a community.

Query log sanitization. This line of work tackles the challenge of an adversary using session information to infer user identities from queries [Adar 2007]. A variety of techniques have been proposed for anonymizing query logs, e.g., hashing tokens, removing identifiers, deleting infrequent queries, shortening sessions, and more [Cooper 2008; Fan et al. 2014; Hong et al. 2012; Korolova et al. 2009; Kumar et al. 2007]. Götz et al. [2012] compared different methods for publishing frequent keywords, queries and clicks, and showed that most methods are vulnerable to information leakage.

User-centric privacy. Stochastic privacy [Singla et al. 2014] is one of the few works that focus on users rather than data. This model introduces a user-defined threshold for sharing data to be obeyed by the platform provider. Closest in spirit to our approach is the work of Biega et al. [2014], which uses probabilistic graphical models to infer sensitive user properties, but is very limited in scope.

Linkability and de-anonymization. Privacy research for social networks has demonstrated the feasibility of linking user profiles across different communities [Goga et al. 2013] and de-anonymizing users [Narayanan et al. 2012; Narayanan and Shmatikov 2009; Zhang et al. 2014]. To prevent such attacks, a family of methods eliminates joinable attributes from published datasets [Vatsalan et al. 2013].

User behavior modeling. It has been shown that search queries can often be used to predict the identity of users, as well as their gender, location, and other demographic attributes [Jones et al. 2007; Hu et al. 2007; Weber and Castillo 2010]. Such information can be harnessed for personalization but may also incur privacy threats. Pennacchiotti and Popescu [2011] analyzed Twitter profiles and network information to predict the political affiliation and race of users.

Table 6.6: Comparison of r-precision of lda, lda-d, lda-t, and lda-dt for different topics.

Expertise identification and trust analysis. Expert and trustworthy users can be identified based on their questions/answers contents and community votes [Adamic et al. 2008] or by analyzing user interaction graphs [Jurczyk and Agichtein 2007; Zhang et al. 2007]. Unlike in these works, our aim is not to identify experts, but to push the users who have a broad interest in a domain down the privacy risk ranking.

6.7 Conclusion

This chapter proposes a framework for quantifying privacy risks from textual contents of user profiles in online communities. By employing IR techniques such as ranking and latent topic models, it specifically addresses the risk of exposure with respect to sensitive topics and targeting by a rational adversary with rich background knowledge about topic vocabulary and word correlations. Although more large-scale studies of adversarial risk scoring strategies are needed, our experiments constitute a proof of concept that the approach is a viable basis for privacy risk assessment for users who want to post about sensitive topics but would like to be warned when the risk of being targeted becomes high. In the future, R-Susceptibility can be extended to incorporate other forms of online activities, and be integrated in a framework for risk mitigation through appropriately guided user actions. Our vision is a trusted personal privacy advisor which assesses risks, alerts the user when critical situations arise, and guides her in appropriate countermeasures.

Chapter 7 Privacy through Solidarity

Contents

7.1 Introduction
7.2 Framework overview
  7.2.1 Architecture
  7.2.2 Incentives of participating parties
  7.2.3 Trusted and adversarial parties
7.3 Assignment model
  7.3.1 Concepts and notation
  7.3.2 Objective
  7.3.3 Measuring privacy gain
  7.3.4 Measuring user utility loss
  7.3.5 Assignment algorithms
7.4 Mediator accounts in search systems
  7.4.1 Framework elements
  7.4.2 Service provider model
7.5 Experiments
  7.5.1 Experimental setup
  7.5.2 Results and insights
7.6 Related work
7.7 Conclusion

Online service providers gather vast amounts of data to build user profiles. Such profiles improve service quality through personalization, but may also intrude on user privacy and incur discrimination risks. As the next contribution of the thesis, we propose a framework which leverages solidarity in a large community to scramble user interaction histories. While this is beneficial for anti-profiling, the potential downside is that individual user utility, in terms of the quality of search results, may severely degrade. To reconcile privacy and user utility and control their trade-off, we develop quantitative models for these dimensions and effective strategies for assigning user queries to Mediator Accounts. We demonstrate the viability of our framework by experiments using a query log with rich user profiles synthesized from the StackExchange Community Question Answering (CQA) forum.

7.1 Introduction

Motivation. Users are profiled and targeted in virtually every aspect of their digital lives: when searching, browsing, shopping, or posting on social media. The gathered information is used by service providers to personalize search results, customize ads, provide differential pricing, and more [Hannak et al. 2013; Teevan et al. 2005]. Since such practices can greatly intrude on an individual’s privacy, the goal of our research is to devise a mechanism to counter such extensive profiling. A careful user can largely preserve her privacy by taking measures like anonymizing communication or using online services only in a non-linkable manner (for instance, by changing accounts or pseudonyms on a regular basis). However, this comes at the cost of greatly reducing utility, both for the service providers and the user. On the one hand, the service provider will miss out on learning from the same user’s long-term behavior, which may result in less effective systems. This issue of system-level utility has been studied in the past research on privacy [Krause and Horvitz 2010; He et al. 2014]. On the other hand, the individual user will experience degraded service quality, such as poor search results, as the service provider would not understand the user’s interests and intentions. This notion of user-level utility has not been extensively explored in prior work. This dissertation formalizes the trade-off between a user’s profiling privacy and her individual utility.

State of the art and its limitations. Research in privacy has primarily addressed the disclosure of critical properties in data publishing [Bertino et al. 2008; Chen et al. 2009; Fung et al. 2010]. Common techniques include coarsening the data so that different users become indistinguishable (e.g., k-anonymity [Sweeney 2002b], l-diversity [Machanavajjhala et al. 2007], and t-closeness [Li et al. 2007]), or perturbing the answers of an algorithm so that the absence or presence of any record does not significantly influence the output – the principle of differential privacy [Dwork 2008]. These methods consider notions of utility that reflect a system-level error in an analytical task, such as classification. In contrast, our goal is to prevent detailed profiling and targeting while keeping the individual user utility as high as possible, for example, in terms of the quality of personalized search results or product recommendations. For privacy-preserving search, many approaches have been proposed based on query obfuscation [Gervais et al. 2014; Peddinti and Saxena 2014]. In these solutions, queries are generalized to hide user intentions, or additional dummy queries are generated to prevent accurate profiling. Both techniques come at the cost of largely reducing user utility. However, none of the prior work addressed the trade-off between privacy and user utility in a quantitative manner. A few methods [Chen et al. 2011; Peddinti and Saxena 2014] have considered an entire user community as a means for query obfuscation. This idea is related to our approach in this dissertation – we generalize it and make it applicable in the context of anti-profiling.

Approach and contribution. Our approach to reconcile privacy and user utility builds on the following observation: service providers often do not need a complete and accurate user profile to return personalized results. Thus, in accordance with the need-to-know principle, we assign user requests to Mediator Accounts (MA) mimicking real users, such that (i) individual user profiles are scrambled across MAs to counter profiling, while (ii) coherent fragments of a user’s profile are kept intact in the MAs to keep user utility high. We call this paradigm privacy through solidarity. Specifically, MAs are constructed by split-merge assignment strategies: splitting the interaction history of a user and merging pieces of different users together. Mediator Accounts are meant as an intermediate layer between users and the service provider, so that the provider only sees MAs instead of the real users. Ideas along these lines have been around in the prior literature [Reiter and Rubin 1998; Santos et al. 2008; Xu et al. 2009; Goodrich et al. 2012; Rebollo-Monedero et al. 2012], but the formalization of the trade-off between privacy and user-utility has never been worked out. In particular, to make this idea viable, one needs to devise quantitative measures for the effects of Mediator Accounts on privacy and utility. In addition, a strategy is needed for assigning user requests to such accounts. The simplest approach of uniform randomization would be ideal for privacy but could prove disastrous for user utility. This dissertation addresses these challenges within a framework of Mediator Accounts. Our ideas are general enough to be applied to search engines, recommender systems, and other online services where personalization is based on the user interaction history. The salient contributions of this chapter are:

• a model with measures for quantifying the trade-off between profiling privacy and user utility;

• the Mediator Accounts framework together with strategies for assigning user interactions to MAs;

• comprehensive experiments using a large query log derived from the StackExchange CQA community.

7.2 Framework overview

7.2.1 Architecture

The architecture of the Mediator Accounts framework is shown in Fig. 7.1. It consists of three parties: users, a service provider (SP), and a Mediator Accounts proxy (MA-proxy). A user profile consists of a set of objects, such as queries, product ratings or other forms of user interactions with the SP. Instead of issuing objects directly to the SP, users pass them on to the MA-proxy together with some context information. The goal of the MA-proxy is to redistribute the incoming objects on to mediator profiles mimicking real users. The MA-proxy assigns each incoming object to a Mediator Account offering the right context for the current object and user, and issues the object to the SP from the chosen MA. Upon receiving a response (for example, a result page or a product recommendation) from the SP, the MA-proxy passes it back to the user. When an interaction is over, the MA-proxy discards all linking information about the original user and the object and remembers only the association between the mediator account and the object. As a result, the original user profiles are scrambled across multiple MAs, and each MA consists of data from multiple users.

Figure 7.1: Overview of the MA framework.

7.2.2 Incentives of participating parties

Users. The goal of a user participating in an MA system is to be able to get high-quality personalized results, while not letting any online provider (neither SPs nor the MA-proxy) keep her interaction history and link it to her as an individual. The MA-proxy has the user interaction history scrambled across multiple accounts, and no links between the objects and the real users are stored. Users of anonymous services that do not offer topical personalization, such as the DuckDuckGo, Startpage or Qwant search engines, may be open to trading off some privacy for enhanced results through the MA framework.

Non-profiling service providers. The incentive of a non-profiling service provider would be to enhance personalization in the results, without compromising on the non- profiling principle.

Profiling service providers. A big question is whether profiling service providers would allow a third-party like an MA-proxy to mediate between them and the users. While examples of such third-parties already exist (the Startpage search engine uses Google as a source of search results), we believe that (i) an MA-proxy being able to group objects into realistic profiles that yield similar analytics results for the SP, and (ii) an MA-proxy being able to attract privacy-wary users who would not otherwise use the profiling SP, would be viable incentives for an SP not to block an MA service.

MA-proxy. An MA-proxy could be set up by individuals, or cooperatives of non-profiling SPs (to provide personalization without accumulating real user profiles), or by non-governmental organizations that promote online privacy. The Electronic Frontier Foundation is one such organization – a non-profit that has built privacy-preserving solutions like Privacy Badger.

7.2.3 Trusted and adversarial parties

MA-proxy. Users opting for an MA service would need to trust that it scrambles their profiles across mediator accounts, and discards the original profiles as well as any identifying information once an interaction (a single request or a session) is complete. A standard approach to gain such trust would be to make the MA solution open-source, enabling the code to be vetted by the community. A real implementation of an MA framework would have to take into account secure end-to-end communication channels between users and SPs via the MA-proxy. These issues may be resolved using encryption and security techniques (e.g., secure browser, onion routing, etc.), and are outside the scope of this thesis.

Provider. The service provider is not exactly distrusted, but there have been cases where user-related information has been leaked or passed on beyond the original intentions – by sabotage, acquisition by other companies, or enforcement by government agencies. By detaching users from profiles and limiting their accuracy, the potential damage is bounded. Other risks might result from service providers displaying privacy-sensitive personalized ads, such as ads related to pregnancy or health issues, especially when observed by others on a user’s screen. The architecture would allow an MA-proxy to support filtering ads and adjusting them to users’ topical interest. Such a configuration has indeed been found to be a preferable ad-serving setup in a user study [Agarwal et al. 2013]. Ad filtering, however, is orthogonal to this research.

Third parties. Profiling companies that operate outside the user-provider connections are considered untrusted. The same holds for agglomerates of providers that aggregate and exchange user data. A conceivable attack could be to guess a user’s attribute (e.g., whether she is pregnant) by combining (i) observations on the MAs and (ii) observations on a set of accounts in a social network, using statistical inference methods. The MA framework aims to keep such risks low by breaking observable associations between MAs and real users, and limiting the profiling accuracy of the split-merge superpositions of different users that cannot be easily disentangled.

7.3 Assignment model

The core of the MA framework is an algorithm for assigning user objects to Mediator Accounts. To guide it on the privacy-utility trade-offs and to assess the quality of the output, we need measures for quantifying the effect of an assignment on privacy and user utility. This section presents such measures, and the algorithm for object assignment based on the split-merge concept.

7.3.1 Concepts and notation

We use the following notations:

• A set U of users u1 . . . up.

• A set O of objects o1 . . . os issued by users; the objects are treated as unique even if they represent the same content. For instance, a query folk music issued by user ui is treated as an object distinct from the same query issued by uj. Analogously, a product rating for a book (Folk Music History, 3.0) by ui is distinct from the rating by uj for the same book, irrespective of the rating value.

• A set M of mediator accounts m1 . . . mt to which objects are assigned by the MA-proxy.

We reserve the symbols i, j, k for subscripts of users, objects, and MAs. If user ui issues object oj, we write oj ∈ ui. Similarly, if oj is assigned to MA mk, we write oj ∈ mk.

Assignments. An assignment of objects on to MAs can be denoted as an s × t matrix A of 0-1 values, where Aij = 1 means that oi is assigned to mj. If we think of the Cartesian product O × M as a bipartite graph, then the assignment can be conceptualized as a subgraph S ⊆ O × M where each node of type O has exactly one edge with one of the M nodes.

7.3.2 Objective

In a real application, an MA-proxy has to assign objects to accounts in an online manner, one object at a time as input arrives. In this chapter, we focus on analyzing the model and assignments in an offline setting, although the algorithm we devise can be applied in both offline and online scenarios. The offline case is useful for two reasons. First, it is a foundation for understanding the underlying privacy-utility trade-offs. Second, performing offline assignment on a set of initial user profiles can address the cold-start problem that a new MA-proxy would face. Using the notation from Sec. 7.3.1, the MA offline assignment problem can be defined as follows:

• Given a set of objects O belonging to a set of users U, and the set of mediator accounts M, compute an assignment matrix A that optimizes a desired objective function for the privacy-utility trade-off.

The MA online assignment problem is:

• Given an assignment A of past user objects to MAs and a newly arriving object o of user u, find the best MA to which o should be assigned with regard to a desired goal for the privacy-utility trade-off.

7.3.3 Measuring privacy gain

An ideal situation from the perspective of privacy is when the objects from a user profile are spread across MAs uniformly at random – this minimizes the object-level similarity of any MA to the original profile. We thus measure privacy as the entropy of the user distribution over MAs, formalizing these notions as follows.

Entropy. We introduce for each user ui an MA-per-user vector $\vec{m}_{u_i} \in (\mathbb{N}_0)^t$ with one counter (≥ 0) per MA, written as $\vec{m}_{u_i} = \langle x_{i1} \ldots x_{ij} \ldots x_{it} \rangle$, where $x_{ij}$ is the number of objects by user ui in account mj (such that $\sum_{j=1}^{t} x_{ij} = |u_i|$). We can cast this into an MA-per-user probability distribution $\Phi_i = \langle \phi_{i1} \ldots \phi_{ij} \ldots \phi_{it} \rangle$ by setting $\phi_{ij} = x_{ij}/|u_i|$, followed by smoothing (e.g., Laplace smoothing) so that $\phi_{ij} > 0$ for each j and $\sum_{j=1}^{t} \phi_{ij} = 1$. The degree of ui's profile fragmentation can be captured by the entropy of the distribution $\Phi_i$. We can define the MA-per-user entropy as a measure of privacy gain (gain over having each user exhibit her full individual profile):

$$\mathrm{privacy\text{-}gain}(u_i) = H_i = -\sum_{j=1}^{t} \phi_{ij} \log \phi_{ij} \qquad (7.1)$$

This quantifies the spread of the user's objects across accounts. The higher the entropy value, the higher the gain in profiling privacy.

Profile overlap. If a use-case requires a more user-interpretable measure of privacy, an alternative is to minimize the maximum profile overlap. For a user ui, this measure can be expressed as:

$$O_i = \max_{j=1}^{t} \frac{|\{o \in u_i \cap m_j\}|}{|u_i|} \qquad (7.2)$$

This measure of overlap can directly tell a user how much "error" could be made by an adversary who assumes one of the MAs is the user's profile. The optimum for this measure, as with entropy, is achieved when the objects are uniformly spread across accounts. Thus, in the following, we use entropy as our privacy measure, and leave maximum profile overlap as a design alternative.

7.3.4 Measuring user utility loss

User utility loss measures to what extent an object ok of user ui is placed out of context by mapping it to account mj. We define a real-valued function sim(·, ·) to measure the coherence of user and MA profiles: sim(oi, oj) ∈ [0, 1] is a symmetric measure of the relatedness between objects represented by oi and oj. In practice, different notions of relatedness can be used, based on object properties or usage. In settings where labels for topics or categories are available, we can set sim(oi, oj) = 1 if oi and oj are issued by the same user and have the same topic/category label, and 0 otherwise. Generally, we assume that sim measures are normalized with values between 0 and 1.

The objects of user ui form a context, typically with high pairwise relatedness among the objects. When considering sets of objects as a whole (rather than time-ordered sequences of object posts), we can measure the normalized context coherence of an object ok in the profile of user ui by:

$$\mathrm{coh}(o_k, u_i) = \frac{\sum_{o_l \in u_i,\, l \neq k} \mathrm{sim}(o_k, o_l)}{|u_i| - 1} \qquad (7.3)$$

When ok is placed in MA mj, we analogously define:

$$\mathrm{coh}(o_k, m_j) = \frac{\sum_{o_l \in m_j,\, l \neq k} \mathrm{sim}(o_k, o_l)}{|m_j| - 1} \qquad (7.4)$$

The utility loss of ui in a given MA assignment is then measured as an average coherence loss over all user objects:

$$\mathrm{utility\text{-}loss}(u_i) = \frac{\sum_{o_k \in u_i} \left[\mathrm{coh}(o_k, u_i) - \mathrm{coh}(o_k, m_j)\right]}{|u_i|} \qquad (7.5)$$

where mj is the account containing ok in the given assignment. The normalization helps to account for varying sizes of user profiles. As a result, coherence values are always between 0 and 1, and utility loss is normalized to take values between −1 and 1. Note that our utility measure assumes that the context coherence can increase if an object is assigned to an MA with more similar objects. A coherence increase will result in negative utility loss.
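A minimal Python sketch of the measures in Eqs. 7.1 and 7.3–7.5 follows, assuming that user profiles and accounts are given as lists of object identifiers, that `assignment` maps each object to an account index, and that `sim` is any symmetric similarity function with values in [0, 1]; all names are illustrative.

import math
from collections import Counter

def privacy_gain(user_objects, assignment, num_accounts):
    # Eq. 7.1: entropy of the user's distribution over mediator accounts,
    # with add-one (Laplace) smoothing so that every account has phi > 0.
    counts = Counter(assignment[o] for o in user_objects)
    total = len(user_objects) + num_accounts
    phis = [(counts.get(j, 0) + 1) / total for j in range(num_accounts)]
    return -sum(phi * math.log(phi) for phi in phis)

def coherence(obj, pool, sim):
    # Eqs. 7.3/7.4: average similarity of obj to the other objects in the pool.
    others = [o for o in pool if o != obj]
    if not others:
        return 0.0
    return sum(sim(obj, o) for o in others) / len(others)

def utility_loss(user_objects, assignment, accounts, sim):
    # Eq. 7.5: average coherence loss over all of the user's objects;
    # accounts[j] is the list of objects currently assigned to MA j.
    losses = [coherence(o, user_objects, sim) - coherence(o, accounts[assignment[o]], sim)
              for o in user_objects]
    return sum(losses) / len(user_objects)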

7.3.5 Assignment algorithms

The role of an assignment algorithm is to scramble user objects across accounts so as to satisfy a desired privacy-utility tradeoff or optimize a corresponding objective function. In this chapter, we experiment with a number of assignment algorithms and study their output quality.

7.3.5.1 Optimal assignment (offline)

The trade-off can be expressed as a joint non-linear optimization problem as follows:

$$\max_{A} \min_{u} \left[\alpha \cdot \mathrm{privacy\text{-}gain}(u) - (1 - \alpha) \cdot \mathrm{utility\text{-}loss}(u)\right] \qquad (7.6)$$

Alternatively, one could optimize one of the two measures with a constraint on the other. Solving this problem exactly, however, is computationally expensive. If we use the less complex overlap privacy measure, we could cast the problem into a Quadratic Integer Program. However, this would have millions (|M| · |O|) of variables, so it would remain intractable in practice. We thus do not pursue this direction in this dissertation and instead consider a number of heuristics. The following are also suitable for the online case.

7.3.5.2 Profiling-tradeoff assignment

We aim to approximate the combined objective function as follows. Let o be an object we want to assign to one of the accounts mj. If we want to optimize for privacy (i.e., entropy), we should choose an MA at random from a uniform distribution over MAs:

$$P_{priv}(m_j \mid o) = \frac{1}{|M|} \qquad (7.7)$$

If we want to optimize for utility, we could choose an MA that offers the best coherence:

$$P_{util}(m_j \mid o) = \begin{cases} 1, & \text{if } m_j = m_{max} \\ 0, & \text{otherwise} \end{cases} \qquad (7.8)$$

where $m_{max} = \arg\max_{m_k} \mathrm{coh}(o, m_k)$.

Let α be a parameter that controls the trade-off between privacy and utility. We sample an MA according to the distribution:

$$P(m_j \mid o) = \alpha \cdot P_{priv}(m_j \mid o) + (1 - \alpha) \cdot P_{util}(m_j \mid o) \qquad (7.9)$$

In the offline case, we may choose an arbitrary order of objects to feed into this assignment heuristic. In the online case, we process objects ordered by the timestamps in which they are issued to the MA-proxy. It is also worth noting that in an online setting users could choose different α for each object, deciding that some should be assigned randomly, and some with the best possible context.
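A sketch of this assignment step is shown below. Sampling from the mixture in Eq. 7.9 amounts to choosing the privacy branch (Eq. 7.7) with probability α and the utility branch (Eq. 7.8) otherwise; the data structures and the tie-breaking of m_max are assumptions of this sketch rather than the original implementation.

import random

def assign_object(obj, accounts, alpha, sim):
    # accounts: list of lists with the objects already assigned to each MA.
    if random.random() < alpha:
        # Privacy branch (Eq. 7.7): uniform choice over mediator accounts.
        target = random.randrange(len(accounts))
    else:
        # Utility branch (Eq. 7.8): MA with the highest coherence for obj.
        def coh(pool):
            return sum(sim(obj, o) for o in pool) / len(pool) if pool else 0.0
        target = max(range(len(accounts)), key=lambda j: coh(accounts[j]))
    accounts[target].append(obj)
    return target

With α = 1 this reduces to the Random assignment and with α = 0 to the Coherent assignment described next.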

7.3.5.3 Random assignment

In this assignment, objects are assigned to accounts uniformly at random. This is a special case of the Profiling-tradeoff algorithm with α = 1. This assignment maximizes privacy.

7.3.5.4 Coherent assignment

Personalization is usually based on semantically coherent parts of user profiles. If we retain such coherent fragments of a profile within the accounts, individual utility should be preserved better than in a completely random assignment. The mode in which we assign an object to the account that offers the best coherence is a special case of the Profiling-tradeoff algorithm, in which we set α = 0. We refer to this method as Coherent. This assignment explicitly aims for the best utility only, yet some privacy is gained as chunks of user profiles get assigned to MAs randomly.

7.4 Mediator accounts in search systems

By analyzing query-and-click logs, search engines can customize results to individual users. Such user profiling, however, may reveal a detailed picture of a person’s life, posing potential privacy risks. At the same time, personalization of a single query is often based on only a subset of a user’s history. Thus, as a first use case, we apply the MA framework in a search engine setting, scrambling the query histories of different users across accounts.

7.4.1 Framework elements

In the search scenario, the elements of the framework described in Sec. 7.3 are instantiated as follows. The objects are keyword queries, and user profiles consist of sets (or sequences) of queries, possibly with timestamps. Accounts contain re-assigned queries of different users. Object similarity can be understood as topical similarity between queries, with topics being either explicit such as categories or classifier labels, or latent, based on embeddings. As a query is characterized by a set (or weight vector) of topics, the similarity can be computed, for instance, using (weighted) Jaccard overlap or vector cosine. The service provider in this setting is a search engine, which, upon receiving a query from a given user profile, returns a ranked list of documents personalized for that user. User utility is measured by the quality of the result list.

7.4.2 Service provider model

The ability of the MA framework to preserve utility while splitting user profiles across accounts depends on a retrieval model for ranking query answers. We use the language-model-based retrieval technique [Croft et al. 2009], as described below. Let o ∈ u be a query of user u consisting of a number of words w ∈ o, and D be the document collection. The model retrieves the results in two steps. First, it fetches a set of top-k documents $D_o \subseteq D$, each document d ∈ D being scored by the query-likelihood model with Dirichlet smoothing (parameter $\mu_D$) [Croft et al. 2009]:

$$\mathrm{score}(o, d) = \log P(o \mid d) = \sum_{w \in o} \log \left( \frac{tf_{w,d} + \mu_D \cdot P(w \mid D)}{|V_d| + \mu_D} \right) \qquad (7.10)$$

where $tf_{w,d}$ is the count of w in d, P(w|D) is the probability that w occurs in D, and $|V_d|$ is the count of all words in d. For every user u, we compute a personalization score as the log-probability of the document d being generated from the user language model using Dirichlet smoothing with parameter $\mu_U$, where U is the set of all users (or equivalently, the collection of their search histories):

$$\mathrm{score}(d, u) = \log P(d \mid u) = \sum_{w \in d} \log \left( \frac{tf_{w,u} + \mu_U \cdot P(w \mid d)}{|V_u| + \mu_U} \right) \qquad (7.11)$$

where $tf_{w,u}$ is the count of w in the search history of u, P(w|d) is the probability that w occurs in d, and $|V_u|$ is the count of all words in the search history of u.

In the second step, documents d ∈ Do are re-ranked using a linear combination of the two scores:

$$\mathrm{score}_u(o, d) = \gamma \cdot \mathrm{score}(o, d) + (1 - \gamma) \cdot \mathrm{score}(d, u) \qquad (7.12)$$

In practice, γ would be set to a low value to put more importance on personalization. When we use the MA framework, the computations are similar. The notion of a user is simply replaced by an account m. The personalization stage is adjusted as follows: we compute score(d, m) using P(d|m), which in turn is computed using $tf_{w,m}$, $\mu_M$ and $|V_m|$ with Eq. 7.11. Definitions of these quantities are analogous to their user counterparts.
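An illustrative Python version of this two-step scoring is given below, with documents and user (or MA) profiles represented as word-count dictionaries. The background probability functions and the token-weighted reading of the sum over w ∈ d are assumptions of the sketch, not the exact Indri-based implementation used in the experiments; the default parameter values mirror those reported in Sec. 7.5.1.2.

import math

def query_likelihood(query_words, doc_counts, coll_prob, mu_d):
    # Eq. 7.10: query-likelihood score of document d for query o,
    # with Dirichlet smoothing against the collection model P(w|D).
    doc_len = sum(doc_counts.values())
    return sum(math.log((doc_counts.get(w, 0) + mu_d * coll_prob(w)) / (doc_len + mu_d))
               for w in query_words)

def personalization(doc_counts, profile_counts, doc_prob, mu_u):
    # Eq. 7.11: log-probability of d under the user (or MA) language model;
    # the sum over w in d is read here as a token-weighted sum (a simplification).
    profile_len = sum(profile_counts.values())
    return sum(tf * math.log((profile_counts.get(w, 0) + mu_u * doc_prob(w)) / (profile_len + mu_u))
               for w, tf in doc_counts.items())

def rerank_score(query_words, doc_counts, profile_counts, coll_prob, doc_prob,
                 mu_d=56.0, mu_u=56.0, gamma=0.1):
    # Eq. 7.12: final score used to re-rank the top-k retrieved documents
    # (mu set to the average document length and gamma to 0.1, as in Sec. 7.5.1.2).
    return (gamma * query_likelihood(query_words, doc_counts, coll_prob, mu_d)
            + (1 - gamma) * personalization(doc_counts, profile_counts, doc_prob, mu_u))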

7.5 Experiments

7.5.1 Experimental setup

7.5.1.1 Dataset

For lack of publicly available query logs with user profiles, we created a query log and a document collection using the data from the Stack Exchange Q&A community (dump as of 13-06-2016). We excluded the large software subforums from outside the Stack Exchange web domain (such as StackOverflow), as they would dominate and drastically reduce the topical diversity. The final dataset consists of ca. 6M posts of type ‘Question’ or ‘Answer’ in 142 diverse subforums (e.g., Astronomy, Security, Christianity, Politics, Parenting, and Travel).

Document collection. We use all posts of type ‘Answer’ as our collection. The resulting corpus contains 3.9M documents.

User query histories. We construct a query log from posts of type ‘Question’, as these reflect users’ information needs. Each question is cast into a keyword query selecting the top-l question words with the highest TF-IDF scores, where l is a random integer between 1 and 5. We consider only users with at least 150 questions, which yields a total of 975 users and 253K queries. Each query is assigned a topical label, used for object similarity. We set this label to the subforum where the original question was posted.
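Purely as an illustration, a query log of this form could be derived with scikit-learn's TF-IDF implementation as sketched below; the tokenization, the TF-IDF variant, and the input format are assumptions and may differ from the pipeline actually used to build the dataset.

import random
from sklearn.feature_extraction.text import TfidfVectorizer

def build_query_log(questions):
    # questions: hypothetical list of (user_id, subforum, question_text) tuples.
    texts = [text for _, _, text in questions]
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(texts)
    vocab = vectorizer.get_feature_names_out()
    log = []
    for row, (user, topic, _) in enumerate(questions):
        l = random.randint(1, 5)  # random query length between 1 and 5
        scores = tfidf.getrow(row).toarray().ravel()
        top_terms = [vocab[i] for i in scores.argsort()[::-1][:l] if scores[i] > 0]
        log.append((user, topic, top_terms))  # topic label = source subforum
    return log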

7.5.1.2 Service provider

For reproducible experiments, we base our search engine model on the open-source IR system Indri [Strohman et al. 2005]. Indri ranks query answers based on state-of-the-art statistical language models with Dirichlet smoothing [Croft et al. 2009]. We use Indri to retrieve the top-100 results for every query from the entire corpus, and implement user-personalized re-ranking ourselves (see Sec. 7.4.2). We compute per-user language models from the original questions to tackle sparsity. The Dirichlet smoothing parameter is set to the average document length (56 words), and γ is set to 0.1.

7.5.1.3 Empirical measures

Privacy Gain. The model entropy reflects how scrambled the user profiles are. Yet from the perspective of a profiling adversary it is rather the distribution over semantic topics that matters. Empirically, a proper way to measure privacy then is to compare the original topic distribution per user against the topic distributions of the MAs. The minimum KL-divergence between pairs of these distributions signifies the privacy level:

$$\mathrm{emp\text{-}priv\text{-}gain}(u_i) = \min_{m_j \in M} D_{KL}(P^{u_i} \,\|\, Q^{m_j}) \qquad (7.13)$$

where $P^{u_i}$ and $Q^{m_j}$ refer to the user and MA profile distributions over topics with add-one Laplace smoothing. We use subforums as explicit labels for topics.

Utility Loss. Rankings of documents d for a query are derived from score_u(o, d) and score_m(o, d) (Eq. 7.12), respectively, where the former refers to the query being issued by user u and the latter to the query being issued by the mediator account m (see Sec. 7.4.2). We quantify the empirical utility loss as the divergence between the two rankings. We compute two measures: the loss in Kendall's Tau over the top-100 document rankings, 1 − KTau@100 (as the personalization step considers the top-100 documents), and the loss in the Jaccard similarity coefficient over the first 20 ranking positions, 1 − Jaccard@20 (as end-users typically care only about a short prefix of ranked results). For each user, we average these scores over all queries.
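A minimal sketch of these empirical measures follows, assuming smoothed topic distributions given as dictionaries over the same topic set and rankings given as lists of document ids; names and input formats are illustrative.

import math
from scipy import stats

def min_kl_divergence(user_dist, ma_dists):
    # Eq. 7.13: smallest KL divergence between the user's topic distribution
    # and any MA's topic distribution (both already Laplace-smoothed).
    def kl(p, q):
        return sum(p[t] * math.log(p[t] / q[t]) for t in p)
    return min(kl(user_dist, ma) for ma in ma_dists)

def ktau_loss(user_ranking, ma_ranking, k=100):
    # 1 - Kendall's Tau over the top-k positions of the two rankings,
    # which contain the same re-ranked top-k documents in this setting.
    ranks = {d: i for i, d in enumerate(user_ranking[:k])}
    common = [d for d in ma_ranking[:k] if d in ranks]
    tau, _ = stats.kendalltau([ranks[d] for d in common], list(range(len(common))))
    return 1 - tau

def jaccard_loss(user_ranking, ma_ranking, k=20):
    # 1 - Jaccard similarity coefficient over the first k ranking positions.
    a, b = set(user_ranking[:k]), set(ma_ranking[:k])
    return 1 - len(a & b) / len(a | b)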

α             M-Priv-Gain   M-Util-Loss        E-Priv-Gain      E-Util-Loss
              (Entropy)     (Coherence Loss)   (Min. KL-div.)   (1 - KTau@100)
Original      0.000         0.000              0.000            0.000
0.0 (Coh)     1.180         0.178              0.320            0.170
0.2           2.208         0.293              0.319            0.203
0.4           3.130         0.389              0.346            0.228
0.6           3.975         0.463              0.389            0.246
0.8           4.731         0.515              0.494            0.260
1.0 (Rand)    5.287         0.535              0.863            0.266

Table 7.1: Results with trade-off parameter α for the model (M) and empirical (E) measures.

7.5.1.4 Assignment methods

Object similarity. We set sim(oi, oj) = 1 if both oi and oj belong to the same user and to the same topic, and 0 otherwise. During the assignment, this measure helps to keep related parts of a user profile together.

Assignment algorithms. We run the Profiling-Tradeoff algorithm varying α between 0 and 1 with a 0.1 increment, and setting the number of MAs to be the number of users (975). With the chosen object similarity, the special case of α = 0, i.e. the Coherent assignment, results in splitting user profiles into subforum chunks and assigning each chunk to a randomly chosen account.

7.5.2 Results and insights

Aggregate trends. Table 7.1 presents the results on the model measures and empirical measures for different values of the assignment trade-off parameter, macro-averaged over users. Recall that α = 0.0 and α = 1.0 correspond to the special cases of Coherent and Random assignments, respectively. These results need to be contrasted with the baseline, denoted Original in the table, where each original user forms exactly one account (i.e., no scrambling at all). Compared to the baseline, all numbers are statistically significant by paired t-tests with p < 0.01. For empirical utility loss, we report Kendall’s Tau; the results for the Jaccard coefficient are similar. The results show that the Profiling-Tradeoff assignments improve privacy over the Original baseline (the topical KL-div. between original users and MAs is increased) while keeping the utility loss low. This is largely true regardless of the exact choice of α. So the MA framework provides a fairly robust solution to reconciling privacy and utility, supporting the observation that high-quality topical personalization does not require complete user profiles.

Figure 7.2: Model measures per user.

As α increases, assignments become more random, so privacy increases and utility is reduced (but with a low gradient). In this regard, the empirical measures reflect the expected behavior according to the model measures well.

Results per user. Figs. 7.2 and 7.3 show privacy and utility values of each user, for the model and empirical measures, respectively. Different colors represent different assignments, and each dot represents a user, with measures averaged over the user's queries. We have several observations:

• Higher privacy gain is correlated with higher utility loss. The Original assignment maps each user to the origin (0 utility loss, but also 0 privacy gain). No assignment reaches the bottom-right area of the chart – which would be an ideal.

• Varying α not only tunes the privacy-utility tradeoff at the community aggregate level, but also affects the variance over individual user scores. This suggests that we should further explore choosing α on an individual per-user basis (which is easily feasible in our framework, but is not studied in this thesis).

• Even the Random assignment (α = 1.0) keeps utility reasonably high. This is due to the fact that random MAs – sampled from queries in the community – end up being averaged rather than random profiles.

Figure 7.3: Empirical measures per user.

• Some users achieve high privacy gains while losing hardly any utility, and vice versa. We investigate this further below.

Effect of profile size and diversity. We analyze how different user profile characteristics affect the assignment results. Figure 7.4 presents the empirical trade-offs for the Coherent (top row) and Random (bottom row) assignments, where each dot is a user and the dot color represents (i) the logarithm of the number of queries in the user profile (left column), or (ii) the diversity of the profile measured by the entropy of the distribution of queries across topics (right column). We make the following observations:

• Users with more queries (darker dots) in the Coherent assignment clearly gain privacy at the cost of losing utility, whereas for the smaller profiles (lighter dots), the trade-off is not as pronounced. In the Random assignment this trade-off is less pronounced irrespective of the size of the profile.

• In the right column, one can see the lighter dots (profiles with little diversity) moving from the bottom-left for the Coherent assignment (little privacy gain, little utility loss) to the top-right for the Random assignment (higher privacy gain, higher utility loss). This suggests that our framework does not offer much help to the users with uniform and focused interests. This is an inherent limitation, regardless of which privacy protection is chosen. Such homogeneous users cannot hide their specific interests, unless they give up on personalization utility.

Figure 7.4: Effect of profile size and diversity.

• Our split-merge assignments offer good results for users with high diversity. As suggested by the darker dots, the Coherent assignment leads to a lower utility loss and higher privacy gain for users with diverse profiles, when compared to the Random assignment. This is because such users have more independent and internally coherent chunks that can be split without affecting utility. This class of users is exactly where the right balance of utility and privacy matters most, and where we can indeed reconcile the two dimensions to a fair degree.

7.6 Related work

Grouping for privacy. The idea of masking the traces of individual users by combining them into groups has been around since the Crowds proposal by Reiter and Rubin [1998]. However, this early work solely focused on anonymity of web-server requests. Narayanan and Shmatikov [2005] devised an abstract framework for group privacy over obfuscated databases, but did not address utility. For search engines specifically, Jones et al. [2008] proposed a notion of query bundles as an implicit grouping of users, but focused on countering de-anonymization in the presence of so-called vanity queries. The short paper by Zhu et al. [2010] sketches a preliminary approach where semantically similar queries by different users are grouped for enhancing privacy. Aggregation of users’ website-specific privacy preferences through a centralized server [Yu et al. 2016] can also be perceived as a type of privacy through solidarity. The principle of solidarity has moreover been explored through a game-theoretic framework over recommender systems [Halkidi and Koutsopoulos 2011].

Tracking and profiling. A good body of work investigates to what extent and how users are tracked by third parties in web browsers [Lerner et al. 2016; Meng et al. 2016b; Yu et al. 2016], or through mobile apps [Meng et al. 2016a]. These are primarily empirical studies with an emphasis on identifying the tracking mechanisms. The interactions with service providers, where users log in and leave extensive traces, have been largely disregarded. In contrast, our framework helps counter both tracking and individual profiling by detaching users from online accounts. To reduce the scale of profiling, a model called stochastic privacy has been proposed to selectively sample user profiles for use by personalizing algorithms [Singla et al. 2014]. To counter profiling by search engines in particular, Xu et al. [2007] proposed to issue queries anonymously, but provide the engine with a coarse topical profile for answer quality. On the tracking front, the Non-Tracking Web Analytics system reconciles users’ need of privacy and online providers’ need of accurate analytics [Akkus et al. 2012]. Although these various works address the privacy-utility trade-off, no explicit control mechanism has been proposed for user utility.

Privacy-preserving IR. The intersection of privacy and IR has received some attention in the past years [Yang et al. 2016]. One of the key problems studied in the field is that of post-hoc log sanitization for data publishing [Cooper 2008; Götz et al. 2012; Zhang et al. 2016a]. Online sanitization, on the other hand, aims at proactively perturbing and blurring user profiles. Techniques along these lines typically include query broadening or dummy query generation [Shen et al. 2007; Balsa et al. 2012; Peddinti and Saxena 2014; Wang and Ravishankar 2014]. It has also been proposed to perturb user profiles by making users swap queries and execute them on behalf of each other [Rebollo-Monedero et al. 2012]. Very few of these prior works consider the adverse impact that obfuscation has on utility, and the usual focus is on the utility of single query results. To the best of our knowledge, none of them focuses on personalization utility or offers quantitative measures for the trade-off. Another privacy concept studied in IR is that of exposure. Recently, the notions of R-Susceptibility and topical sensitivity have been proposed to quantify user exposure in sensitive contexts within a given community [Biega et al. 2016].

Privacy-preserving data mining. There is a vast body of literature on preserving privacy in mining data for rules and patterns and learning classifiers and clustering models [Fan et al. 2014; Aggarwal 2015]. In this context, utility is measured from the provider's perspective, typically an error measure of the mining task at hand (e.g., classification error) [Bertino et al. 2008]. In the context of recommender systems, rating prediction accuracy [McSherry and Mironov 2009; Nikolaenko et al. 2013] and category aggregates [Shen and Jin 2016] are typically used as proxies for utility. Techniques for user profile perturbation have also been studied for utility-preserving differentially-private recommenders [Guerraoui et al. 2015].

7.7 Conclusion

We presented Mediator Accounts (MAs): a framework to counter user profiling while preserving individual user utility as much as possible. The framework enables decoupling users from accounts, making direct targeting impossible, and profile reconstruction or de-anonymization much harder. At the same time, users are still able to benefit from personalization by service providers. The versatility of the framework has been demonstrated in experiments using a large query log synthesized from the StackExchange platform. While the application of the framework to recommender systems is not a contribution of this thesis, Biega et al. [2017b] have additionally demonstrated the applicability of the framework in that scenario. While our model allows for flexible trade-offs between privacy and utility, a key question in our empirical study has been to understand how well the MAs can preserve the utility in terms of high-quality search results. The experiments show that the split-merge approach with Coherent assignment improves privacy while incurring little user utility loss. These benefits are most pronounced for users with larger profiles (i.e., more activity) and higher diversity of interests. Open issues for future work include practical deployment, handling of other personalization features, and exploring the options for tuning assignments and framework parameters to the specific needs of individual users. On top of that, analyzing the three-dimensional trade-off between user privacy, user utility and the traditional service provider utility could help ensure that the resulting mediator profiles are a useful source for user analytics, making an MA-proxy a tolerable component of the online landscape. Finally, we would hope that the MA proposal spurs further investigation of how the need-to-know principle could be implemented in the case of personalized online services.

Chapter 8 Conclusions and Outlook

This thesis broadly investigates privacy and fairness problems of search system users, including both searchers and searched subjects. The models we propose, together with the experimental results, suggest that there is scope for search systems to provide a better experience for their users in terms of privacy and fairness, without the need to sacrifice much of the search utility. The results from Chapter 4 show that systems could provide more equitable exposure to search subjects without much reduction in ranking utility, especially since in many existing scenarios there are numerous subjects who are equally relevant to various queries. Chapters 5 and 6 exemplify how systems could provide their users with more privacy awareness by computing queries which expose users in search results. Annotators in our user studies found exposure by topically sensitive queries to be especially problematic, while media reports about privacy breaches in search engines highlight the problems with exposure by unique queries. Finally, Chapter 7 shows that search engines do not need to accumulate rich query histories per user to deliver quality personalization, particularly for users with topically diverse profiles and interests. Going forward, there are a number of fascinating questions that need to be answered to design privacy-friendly and fair search engines.

Fair exposure. To create holistically fair systems, we need to gain a deeper understanding of the intrinsic properties of ranking and human relevance feedback mechanisms that might lead to bias and unfairness. Moreover, the existence of position bias calls for redesigning display interfaces to try to minimize the biasing effects of visual ranking perceptions in applications such as people search. Beyond exposure measured through ranking position, interface design and the results of people search should also be examined for representational harms. It might be possible, for instance, that various demographic groups have different information highlighted in the snippets on the search results. To aid the algorithm and system design, inspired by social comparison effects, we should also better understand human perceptions regarding the fairness of their positions in ranking.

Sensitive exposure. To mitigate the consequences of exposure without reducing the utility of the systems, topical sensitivity and search exposure relevance should be contextualized. To this end, we need to better understand which textual contents are sensitive for which types of users in which situations. On the mechanism side, developing external black-box methods for auditing search exposure is an important complement to existing privacy support methods, as well as a means of incentivizing service providers to develop more comprehensive internal tools. Beyond user perceptions, it is also vital to develop expert understanding of the risks, studying the consequences of exposure by different types of queries and proposing reasonable mitigation solutions beyond awareness. Last but not least, to be able to tangibly raise awareness of these problems, we should investigate user perceptions regarding search exposure, as well as user understanding of the underlying mechanisms and coping strategies.
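One way to picture such an external black-box audit is sketched below: probe a search interface with candidate queries and record which of them surface a given user's content in the top-k results. The search function is a hypothetical stand-in for whatever public endpoint is being audited, and sorting exposing queries by the user's best rank is only a simple heuristic.

# A sketch of an external, black-box audit of search exposure. `search` is a
# hypothetical stand-in for the public search endpoint being audited.
from typing import Callable, Iterable, List, Tuple


def audit_exposure(user_id: str,
                   candidate_queries: Iterable[str],
                   search: Callable[[str, int], List[Tuple[str, str]]],
                   k: int = 10) -> List[Tuple[str, int]]:
    """Return (query, best_rank) pairs for queries exposing the user in the top k.

    `search(query, k)` is assumed to return a ranked list of (owner_id, doc_id).
    """
    exposing = []
    for query in candidate_queries:
        results = search(query, k)
        ranks = [r for r, (owner, _) in enumerate(results, start=1) if owner == user_id]
        if ranks:
            exposing.append((query, min(ranks)))
    # Most exposing queries (highest position of the user's content) first.
    return sorted(exposing, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Tiny in-memory corpus standing in for the audited search system.
    corpus = {"flu symptoms treatment": [("u7", "t1"), ("u2", "t9")],
              "best pizza nearby": [("u5", "t3")]}
    fake_search = lambda q, k: corpus.get(q, [])[:k]
    print(audit_exposure("u2", corpus.keys(), fake_search))
    # -> [('flu symptoms treatment', 2)]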

Minimizing profiling. Our results revealed that often only a fraction of a user's interaction history is needed to deliver quality personalization. We can turn this observation around to arrive at a more general question: what is the minimum amount of information needed to maintain personalization quality? To answer this question, more work is needed to define generalized, framework-independent notions of profiling privacy and user utility, and to inspect the interplay between user utility and system utility. Equally important is understanding which solutions would provide enough incentives for service providers to adopt them, and to what extent users would be willing to trade some personalization utility for more privacy.
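The following sketch illustrates the kind of measurement this question suggests: retain only the most recent k interactions of a user's history and check how much of the full profile is preserved. Cosine similarity of term-frequency profiles is used here merely as a crude proxy for personalization utility, not as the evaluation protocol of Chapter 7.

# A sketch of the "minimum information" experiment: keep only the last k queries
# of a history and measure how close the truncated profile stays to the full one.
import math
from collections import Counter


def profile(queries):
    return Counter(term for q in queries for term in q.lower().split())


def cosine(p, q):
    common = set(p) & set(q)
    dot = sum(p[t] * q[t] for t in common)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def retained_utility(history, k):
    """Compare the profile built from the last k queries to the full profile."""
    return cosine(profile(history[-k:]), profile(history))


if __name__ == "__main__":
    history = ["vegan recipes", "marathon training", "running shoes review",
               "vegan protein sources", "knee pain after running"]
    for k in (1, 2, 3, len(history)):
        print(k, round(retained_utility(history, k), 3))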

Infrastructure supporting research in privacy and fairness. Ironically, developing solutions for user privacy and fairness often requires access to user data, or even to the users themselves when privacy- and fairness-related annotations from the data owners are necessary. While such data is more readily available in industry, access to real user data is limited for academic researchers. Developing infrastructure for research in privacy and fairness is thus a crucial factor in fostering progress in these areas. Such infrastructure includes, for instance, protocols and data sanitization algorithms for data release, methods for synthesizing realistic data (this thesis contributes to such infrastructure by creating a synthetic query log), and shared benchmark datasets.

We envision search engines that serve their users equitably, that offer support mechanisms for their users to understand and control the use of their data and their exposure in the search results, and that collect the minimum amount of data necessary to deliver the service.

Appendix A AMT User Study: Topical Sensitivity

This appendix documents the details of the Amazon Mechanical Turk1 user study described in Chapter 6, Section 6.4.1. The goal of the study was to collect judgments on privacy sensitivity of different topics in a topic model (topic models were trained using Mallet [McCallum 2002]). Each topic was represented by its 20 most salient words, that is, the words with the highest probability from the topic.
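For illustration, the snippet below shows how such top-word topic representations can be extracted. The study itself used Mallet [McCallum 2002]; this sketch uses gensim's LDA implementation purely as an example of the kind of output that annotators were shown.

# Illustrative only: extract the most salient words per topic with gensim's LDA,
# mirroring the top-20-word topic representations used in the user study.
from gensim import corpora, models

documents = [
    "doctor prescribed medication for chronic pain and anxiety",
    "court ruling on immigration law and border policy",
    "mortgage rates and credit card debt after the recession",
]
texts = [doc.split() for doc in documents]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=3, passes=10, random_state=0)

for topic_id in range(lda.num_topics):
    # Each topic is summarized by its highest-probability words;
    # the user study displayed the top 20 per topic.
    top_words = [word for word, _ in lda.show_topic(topic_id, topn=20)]
    print(topic_id, top_words)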

Setup details:

• Title: A survey about sensitivity of words in online posts (WARNING: This HIT may contain adult content. Worker discretion is advised.)

• Description: We’d like to collect your judgments on how privacy-sensitive different sets of words might be when used in online posts.

• Keywords: online posts, sensitive words, privacy, sensitivity of topics

• Reward per assignment: $2

• Number of assignments per HIT: 7

• Time allotted per assignment: 90 minutes

• Master workers: Yes

• HIT Approval Rate for all Requester’s HITs: >= 90%

• Location: is United States

• Number of HITs Approved: >= 100

Instructions: In this task, we want to collect your judgements on whether you would consider public posts on social media (e.g., Facebook or Twitter or blog) containing certain sets of words to be potentially “privacy sensitive”. We consider the usage of a set of words to be privacy sensitive if any of the following is true:

• A person is likely to use these words because they are in a situation that is privacy-sensitive. For example, if you use words related to diseases it might mean you are sick, which is a privacy-sensitive situation.

1 https://www.mturk.com/

• The usage of these words might create a privacy-sensitive situation or some problems. For example, if you use vocabulary indicating your religious views, the information might easily be used against you, leading to a violation of privacy.

Additionally, for each set of words, please choose a best suiting thematic domain.

Question: Each HIT consisted of 50 sets of questions of the following form:

1. Do you think a post in social media containing these words can be privacy-sensitive? (Yes/No) [The 20 most salient words from the topic were displayed here.]

2. What thematic domain do these words come from? (Law and Politics, Humanities, Psychiatry and Psychology, Health and Medicine, Economy and Finance, Other)

Appendix B AMT User Study: Search Exposure

This appendix documents the details of the Amazon Mechanical Turk1 user study described in Chapter 5, Section 5.5.4. The goal of the study was to collect judgments on the search exposure relevance of different queries (more privacy-critical queries have higher search exposure relevance).

Setup details:

• Title: A study of profile exposure through keyword search (WARNING: This HIT may contain adult content. Worker discretion is advised.)

• Description: We’d like to collect your opinions regarding sensitivity of different search terms on Twitter.

• Keywords: exposure of tweets, keyword search

• Masters has been granted: Yes

• Qualification Requirement: HIT Approval Rate (%) for all Requesters’ HITs greater than or equal to 95

• Location: is United States

• Number of HITs Approved: >= 100

Instructions, queries only: Websites like Twitter offer their users the option to search for tweets using keyword queries (= sets of words). Upon entering the keywords, you receive a ranked list of tweets that somehow match these keywords as a response. Assume for now you are a Twitter user with a number of tweets in your profile. Even though these tweets are public, they are usually exposed just to your followers. But if it turns out that one of your tweets is returned as a top-ranked result in response to these keyword queries, your tweet (and profile) will be exposed to whoever issues these keywords. There is a variety of people who might be looking for user profiles or tweets that match certain keywords. Examples include journalists looking for examples to their stories, companies building databases of people for different purposes, or even criminals looking for victims.

1 https://www.mturk.com/

In this survey, we will ask for your opinions regarding sensitivity of different keyword queries in the above context. More specifically, we will show you a number of keyword queries and ask whether you would feel concerned (e.g., uncomfortable, embarrassed, privacy-violated, or threatened) if your tweet was returned as one of the top answers to these search terms.

Question, queries only: Would you feel concerned (uncomfortable, embarrassed, privacy-violated, or threatened) if your tweet was returned as one of the top answers to these search terms? Please, try to choose ’Yes’ for at least 10-12 search terms you would feel most uncomfortable with, although you are also free to choose more.

Search terms: XXX

(Yes/No)

Instructions, queries+tweets: Websites like Twitter offer their users the option to search for tweets using keyword queries (= sets of words). Upon entering the keywords, you receive a ranked list of tweets that somehow match these keywords as a response. Assume for now you are a Twitter user with a number of tweets in your profile. Even though these tweets are public, they are usually exposed just to your followers. But if it turns out that one of your tweets is returned as a top-ranked result in response to these keyword queries, your tweet (and profile) will be exposed to whoever issues these keywords. There is a variety of people who might be looking for user profiles or tweets that match certain keywords. Examples include journalists looking for examples to their stories, companies building databases of people for different purposes, or even criminals looking for victims. In this survey, we will ask for your opinions regarding sensitivity of different keyword queries in the above context. More specifically, we will show you a number of search terms together with the tweets that are returned as top results for these terms. The question is: in your opinion, should a user be concerned (uncomfortable, embarrassed, privacy-violated, or threatened) if the tweet was returned as one of the top answers to the search terms?

Question, queries+tweets: In your opinion, should a user be concerned (uncomfortable, embarrassed, privacy-violated, or threatened) if their tweet below was returned as one of the top answers to these search terms? Please, try to choose ’Yes’ for at least 10-12 search terms you would feel most uncomfortable with, although you are also free to choose more.

Search terms: XXX

(Yes/No)

Bibliography

Ahmed Abbasi and Hsinchun Chen. Writeprints: A stylometric approach to identity-level identification and similarity detection in cyberspace. ACM Transactions on Information Systems, 26(2):7:1–7:29, 2008. 13

Rediet Abebe, Jon M. Kleinberg, and David C. Parkes. Fair division via social comparison. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, AAMAS 2017, São Paulo, Brazil, May 8-12, 2017, pages 281–289. ACM, 2017. 29, 54

Alessandro Acquisti. Nudging privacy: The behavioral economics of personal information. IEEE Security & Privacy, 7(6):82–85, 2009. 20

Alessandro Acquisti, Curtis Taylor, and Liad Wagman. The economics of privacy. Journal of Economic Literature, 54(2):442–92, 2016. 20

Lada A. Adamic, Jun Zhang, Eytan Bakshy, and Mark S. Ackerman. Knowledge sharing and yahoo answers: everyone knows something. In Proceedings of the 17th International Conference on World Wide Web, WWW 2008, Beijing, China, April 21-25, 2008, pages 665–674. ACM, 2008. 96

Eytan Adar. User 4xxxxx9: Anonymizing query logs. In Proceedings of the Query Log Analysis Workshop, International Conference on World Wide Web, 2007. 18, 94

Philip Adler, Casey Falk, Sorelle A. Friedler, Tionney Nix, Gabriel Rybeck, Carlos Scheidegger, Brandon Smith, and Suresh Venkatasubramanian. Auditing black-box models for indirect influence. Knowledge and Information Systems, 54(1):95–122, 2018. 25, 55

Lalit Agarwal, Nisheeth Shrivastava, Sharad Jaiswal, and Saurabh Panjwani. Do not embarrass: re-examining user concerns for online tracking and advertising. In Symposium On Usable Privacy and Security, SOUPS ’13, Newcastle, United Kingdom, July 24-26, 2013, pages 8:1–8:13. ACM, 2013. 20, 101

Charu C. Aggarwal. Data Mining - The Textbook. Springer, 2015. ISBN 978-3-319-14141-1. 112

Istemi Ekin Akkus, Ruichuan Chen, Michaela Hardt, Paul Francis, and Johannes Gehrke. Non-tracking web analytics. In the ACM Conference on Computer and Communications Security, CCS’12, Raleigh, NC, USA, October 16-18, 2012, pages 687–698, 2012. 16, 112

Mishari Almishari, Ekin Oguz, and Gene Tsudik. Fighting authorship linkability with crowdsourcing. In Proceedings of the second ACM conference on Online social networks, COSN 2014, Dublin, Ireland, October 1-2, 2014, pages 69–82, 2014. 15

Mor Armony and Amy R. Ward. Fair dynamic routing in large-scale heterogeneous-server systems. Operations Research, 58(3):624–637, 2010. 55

Ricardo A. Baeza-Yates. Bias on the web. Communications of the ACM, 61(6):54–61, 2018. 26, 27

Ero Balsa, Carmela Troncoso, and Claudia Díaz. OB-PWS: obfuscation-based private web search. In IEEE Symposium on Security and Privacy, SP 2012, 21-23 May 2012, San Francisco, California, USA, pages 491–505. IEEE Computer Society, 2012. 112

Solon Barocas and Andrew D Selbst. Big data’s disparate impact. California Law Review, 104(3):671, 2016. 26, 27

Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and Machine Learning. fairmlbook.org, 2018. http://www.fairmlbook.org. 23

Fuat Basik, Bugra Gedik, Hakan Ferhatosmanoglu, and Mert Emin Kalender. S3-tm: scalable streaming short text matching. VLDB Journal, 24(6):849–866, 2015. 73

Richard Berk, Hoda Heidari, Shahin Jabbari, Matthew Joseph, Michael J. Kearns, Jamie Morgenstern, Seth Neel, and Aaron Roth. A convex framework for fair regression. CoRR, abs/1706.02409, 2017. 25

Michael S. Bernstein, Eytan Bakshy, Moira Burke, and Brian Karrer. Quantifying the invisible audience in social networks. In 2013 ACM SIGCHI Conference on Human Factors in Computing Systems, CHI ’13, Paris, France, April 27 - May 2, 2013, pages 21–30. ACM, 2013. 17, 58

Elisa Bertino, Dan Lin, and Wei Jiang. A survey of quantification of privacy preserving data mining algorithms. In Privacy-Preserving Data Mining - Models and Algorithms, volume 34 of Advances in Database Systems, pages 183–205. Springer, 2008. 94, 98, 112

Bin Bi, Milad Shokouhi, Michal Kosinski, and Thore Graepel. Inferring the demographics of search users: social data meets search queries. In 22nd International World Wide Web Conference, WWW ’13, Rio de Janeiro, Brazil, May 13-17, 2013, pages 131–140, 2013. 3

Asia J. Biega, Azin Ghazimatin, Hakan Ferhatosmanoglu, Krishna P. Gummadi, and Gerhard Weikum. Learning to un-rank: Quantifying search exposure for users in online communities. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM 2017, Singapore, November 06 - 10, 2017, pages 267–276, 2017a. 6, 60, 65

Asia J. Biega, Rishiraj Saha Roy, and Gerhard Weikum. Privacy through solidarity: A user-utility-preserving framework to counter profiling. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, August 7-11, 2017, pages 675–684, 2017b. 7, 8, 13, 18, 41, 113

Asia J. Biega, Krishna P. Gummadi, and Gerhard Weikum. Equity of attention: Amortizing individual fairness in rankings. In Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR 2018, Ann Arbor, MI, USA, July 08-12, 2018, pages 405–414, 2018. 5, 6, 25, 28

Joanna Biega, Ida Mele, and Gerhard Weikum. Probabilistic prediction of privacy risks in user search histories. In Proceedings of the First International Workshop on Privacy and Security of Big Data, PSBD@CIKM 2014, Shanghai, China, November 7, 2014, pages 29–36, 2014. 82, 94

Joanna Asia Biega, Krishna P. Gummadi, Ida Mele, Dragan Milchevski, Christos Tryfonopoulos, and Gerhard Weikum. R-susceptibility: An ir-centric approach to assessing privacy risks for users in online communities. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, SIGIR 2016, Pisa, Italy, July 17-21, 2016, pages 365–374, 2016. 7, 8, 58, 62, 70, 71, 112

Sarah Bird, Solon Barocas, Kate Crawford, Fernando Diaz, and Hanna Wallach. Exploring or exploiting? social and ethical implications of autonomous experimentation in ai. Available at SSRN: https://ssrn.com/abstract=2846909, 2016. 29

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003. 77

Tolga Bolukbasi, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, and Adam Tauman Kalai. Man is to computer programmer as woman is to homemaker? debiasing word embeddings. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pages 4349–4357, 2016. 27

Joseph Bonneau, Elie Bursztein, Ilan Caron, Rob Jackson, and Mike Williamson. Secrets, lies, and account recovery: Lessons from the use of personal knowledge questions at google. In Proceedings of the 24th International Conference on World Wide Web, WWW 2015, Florence, Italy, May 18-22, 2015, pages 141–150. ACM, 2015. 72

Danah Boyd. Facebook’s privacy trainwreck: Exposure, invasion, and social convergence. Convergence, 14(1):13–20, 2008. 17, 58

Justin Brickell and Vitaly Shmatikov. The cost of privacy: destruction of data-mining utility in anonymized data publishing. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada, USA, August 24-27, 2008, pages 70–78, 2008. 18

Aylin Caliskan, Joanna J Bryson, and Arvind Narayanan. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183–186, 2017. 26

Ryan Calo and Alex Rosenblat. The taking economy: Uber, information, and power. Columbia Law Review, 117:1623, 2017. 29, 32

Claudio Carpineto and Giovanni Romano. KΘ-affinity privacy: Releasing infrequent query refinements safely. Information Processing & Management, 51(2):74–88, 2015. 77

Ben Carterette and Rosie Jones. Evaluating search engines by modeling the relationship between relevance and clicks. In Advances in Neural Information Processing Systems 20: Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 3-6, 2007, pages 217–224. Curran Associates, Inc., 2007. 64

L. Elisa Celis, Damian Straszak, and Nisheeth K. Vishnoi. Ranking with fairness constraints. In 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, July 9-13, 2018, Prague, Czech Republic, pages 28:1–28:15, 2018. 23, 27, 33, 36, 54

Abhijnan Chakraborty, Asia J. Biega, Aniko Hannak, and Krishna P. Gummadi. Fair sharing for sharing economy platforms. In Proceedings of the FATREC@RecSys Workshop, 2017. 7, 29, 55

Bee-Chung Chen, Daniel Kifer, Kristen LeFevre, and Ashwin Machanavajjhala. Privacy-preserving data publishing. Foundations and Trends in Databases, 2(1-2):1–167, 2009. 98

Gang Chen, He Bai, Lidan Shou, Ke Chen, and Yunjun Gao. UPS: efficient privacy protection in personalized web search. In Proceeding of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011, Beijing, China, July 25-29, 2011, pages 615–624, 2011. 16, 18, 72, 98

Le Chen, Alan Mislove, and Christo Wilson. Peeking beneath the hood of uber. In Proceedings of the 2015 ACM Internet Measurement Conference, IMC 2015, Tokyo, Japan, October 28-30, 2015, pages 495–508, 2015. 27

Le Chen, Ruijun Ma, Anikó Hannák, and Christo Wilson. Investigating the impact of gender on rank in resume search engines. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI 2018, Montreal, QC, Canada, April 21-26, 2018, page 651, 2018. 27

Lisi Chen and Gao Cong. Diversity-aware top-k publish/subscribe for text stream. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31 - June 4, 2015, pages 347–362. ACM, 2015. 73

Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii. Fair clustering through fairlets. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 5036–5044, 2017. 23

Benny Chor, Oded Goldreich, Eyal Kushilevitz, and Madhu Sudan. Private information retrieval. In 36th Annual Symposium on Foundations of Computer Science, Milwaukee, Wisconsin, USA, 23-25 October 1995, pages 41–50. IEEE Computer Society, 1995. 19

Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data, 5(2):153–163, 2017. 23

Aleksandr Chuklin, Ilya Markov, and Maarten de Rijke. Click Models for Web Search. Synthesis Lectures on Information Concepts, Retrieval, and Services. Morgan & Claypool Publishers, 2015. 22, 32, 55

Alissa Cooper. A survey of query log privacy-enhancing techniques from a policy perspective. TWEB, 2(4):19:1–19:27, 2008. 94, 112

Sam Corbett-Davies and Sharad Goel. The measure and mismeasure of fairness: A critical review of fair machine learning. CoRR, abs/1808.00023, 2018. 23

Denzil Correa, Leandro Araújo Silva, Mainack Mondal, Fabrício Benevenuto, and Krishna P. Gummadi. The many shades of anonymity: Characterizing anonymous social media content. In Proceedings of the Ninth International Conference on Web and Social Media, ICWSM 2015, University of Oxford, Oxford, UK, May 26-29, 2015, pages 71–80, 2015. 19, 64, 66

Nick Craswell, Onno Zoeter, Michael J. Taylor, and Bill Ramsey. An experimental comparison of click position-bias models. In Proceedings of the International Conference on Web Search and Web Data Mining, WSDM 2008, Palo Alto, California, USA, February 11-12, 2008, pages 87–94. ACM, 2008. 32, 42, 55

W. Bruce Croft, Donald Metzler, and Trevor Strohman. Search Engines - Information Retrieval in Practice. Pearson Education, 2009. ISBN 978-0-13-136489-9. 106, 107

J. Shane Culpepper, Fernando Diaz, and Mark D. Smucker. Research frontiers in information retrieval: Report from the third strategic workshop on information retrieval in Lorne (SWIRL 2018). SIGIR Forum, 52(1):34–90, 2018. 1

Amit Datta, Michael Carl Tschantz, and Anupam Datta. Automated experiments on ad privacy settings. PoPETs, 2015(1):92–112, 2015. 25

Wei-Yen Day and Ninghui Li. Differentially private publishing of high-dimensional data using sensitivity control. In Feng Bao, Steven Miller, Jianying Zhou, and Gail-Joon Ahn, editors, Proceedings of the 10th ACM Symposium on Information, Computer and Communications Security, ASIA CCS ’15, Singapore, April 14-17, 2015, pages 451–462. ACM, 2015. 77

Georges Dupret and Benjamin Piwowarski. A user browsing model to predict search engine click data from past observations. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2008, Singapore, July 20-24, 2008, pages 331–338. ACM, 2008. 55

Cynthia Dwork. Differential privacy: A survey of results. In Theory and Applications of Models of Computation, 5th International Conference, TAMC 2008, Xi’an, China, April 25-29, 2008. Proceedings, pages 1–19, 2008. 15, 76, 94, 98

Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard S. Zemel. Fairness through awareness. In Innovations in Theoretical Computer Science 2012, Cambridge, MA, USA, January 8-10, 2012, pages 214–226, 2012. 24, 33, 35, 54

Peter Eckersley. How unique is your web browser? In Privacy Enhancing Technologies, 10th International Symposium, PETS 2010, Berlin, Germany, July 21-23, 2010. Proceedings, pages 1–18, 2010. 12

Benjamin Edelman, Michael Ostrovsky, and Michael Schwarz. Internet advertising and the generalized second-price auction: Selling billions of dollars worth of keywords. American economic review, 97(1):242–259, 2007. 55

Carsten Eickhoff. Cognitive biases in crowdsourcing. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM 2018, Marina Del Rey, CA, USA, February 5-9, 2018, pages 162–170. ACM, 2018. 26

Michael D. Ekstrand, Mucun Tian, Ion Madrazo Azpiazu, Jennifer D. Ekstrand, Oghenemaro Anuyah, David McNeill, and Maria Soledad Pera. All the cool kids, how do they fit in?: Popularity and demographic biases in recommender evaluation and effectiveness. In Conference on Fairness, Accountability and Transparency, FAT 2018, 23-24 February 2018, New York, NY, USA, volume 81 of Proceedings of Machine Learning Research, pages 172–186. PMLR, 2018a. 23

Michael D. Ekstrand, Mucun Tian, Mohammed R. Imran Kazi, Hoda Mehrpouyan, and Daniel Kluver. Exploring author gender in book rating and recommendation. In Proceedings of the 12th ACM Conference on Recommender Systems, RecSys 2018, Vancouver, BC, Canada, October 2-7, 2018, pages 242–250, 2018b. 23

Tobias Emrich, Hans-Peter Kriegel, Peer Kröger, Johannes Niedermayer, Matthias Renz, and Andreas Züfle. On reverse-k-nearest-neighbor joins. GeoInformatica, 19(2):299–330, 2015. 60, 73

Danielle Ensign, Sorelle A. Friedler, Scott Neville, Carlos Scheidegger, and Suresh Venkatasubramanian. Runaway feedback loops in predictive policing. In Conference on Fairness, Accountability and Transparency, FAT 2018, 23-24 February 2018, New York, NY, USA, volume 81 of Proceedings of Machine Learning Research, pages 160–171. PMLR, 2018. 29

Sedigheh Eslami, Asia J. Biega, Rishiraj Saha Roy, and Gerhard Weikum. Privacy of hidden profiles: Utility-preserving profile removal in online forums. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM 2017, Singapore, November 06 - 10, 2017, pages 2063–2066, 2017. 8, 19

Liyue Fan, Luca Bonomi, Li Xiong, and Vaidy S. Sunderam. Monitoring web browsing behavior with differential privacy. In Proceedings of the 23rd International World Wide Web Conference, WWW ’14, Seoul, Republic of Korea, April 7-11, 2014, pages 177–188. ACM, 2014. 94, 112

Lujun Fang and Kristen LeFevre. Privacy wizards for social networking sites. In Proceedings of the 19th International Conference on World Wide Web, WWW 2010, Raleigh, North Carolina, USA, April 26-30, 2010, pages 351–360, 2010. 17

Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. Certifying and removing disparate impact. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 10-13, 2015, pages 259–268, 2015. 23, 24, 33, 54, 72

Benjamin Fish, Jeremy Kun, and Ádám Dániel Lelkes. A confidence-based approach for balancing fairness and accuracy. In Proceedings of the 2016 SIAM International Conference on Data Mining, Miami, Florida, USA, May 5-7, 2016, pages 144–152. SIAM, 2016. 25

Sorelle A. Friedler, Carlos Scheidegger, Suresh Venkatasubramanian, Sonam Choudhary, Evan P. Hamilton, and Derek Roth. A comparative study of fairness-enhancing interven- tions in machine learning. CoRR, abs/1802.04422, 2018. 24

Batya Friedman and Helen Nissenbaum. Bias in computer systems. ACM Transactions on Information Systems, 14(3):330–347, 1996. 27

Norbert Fuhr, Anastasia Giachanou, Gregory Grefenstette, Iryna Gurevych, Andreas Hanselowski, Kalervo Järvelin, Rosie Jones, Yiqun Liu, Josiane Mothe, Wolfgang Nejdl, Isabella Peters, and Benno Stein. An information nutritional label for online documents. SIGIR Forum, 51(3):46–66, 2017. 28

Benjamin C. M. Fung, Ke Wang, Rui Chen, and Philip S. Yu. Privacy-preserving data publishing: A survey of recent developments. ACM Computing Surveys, 42(4):14:1–14:53, 2010. 76, 94, 98

Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna M. Wallach, Hal Daumé III, and Kate Crawford. Datasheets for datasets. CoRR, abs/1803.09010, 2018. 26

Arthur Gervais, Reza Shokri, Adish Singla, Srdjan Capkun, and Vincent Lenders. Quantifying web-search privacy. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, Scottsdale, AZ, USA, November 3-7, 2014, pages 966–977, 2014. 19, 72, 98

Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, and Ion Stoica. Dominant resource fairness: Fair allocation of multiple resource types. In Proceedings of the 8th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2011, Boston, MA, USA, March 30 - April 1, 2011. USENIX Association, 2011. 55

Ali Ghodsi, Vyas Sekar, Matei Zaharia, and Ion Stoica. Multi-resource fair queueing for packet processing. In ACM SIGCOMM 2012 Conference, SIGCOMM ’12, Helsinki, Finland - August 13 - 17, 2012, pages 1–12. ACM, 2012. 55

Oana Goga, Howard Lei, Sree Hari Krishnan Parthasarathi, Gerald Friedland, Robin Sommer, and Renata Teixeira. Exploiting innocuous activity for correlating users across sites. In Proceedings of the 22nd International World Wide Web Conference, WWW ’13, Rio de Janeiro, Brazil, May 13-17, 2013, pages 447–458. International World Wide Web Conferences Steering Committee / ACM, 2013. 94

Oana Goga, Patrick Loiseau, Robin Sommer, Renata Teixeira, and Krishna P. Gummadi. On the reliability of profile matching across large online social networks. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 10-13, 2015, pages 1799–1808, 2015. 13

Michael T. Goodrich, Michael Mitzenmacher, Olga Ohrimenko, and Roberto Tamassia. Privacy-preserving group data access via stateless oblivious RAM simulation. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Kyoto, Japan, January 17-19, 2012, pages 157–167. SIAM, 2012. 99

Michaela Götz, Ashwin Machanavajjhala, Guozhang Wang, Xiaokui Xiao, and Johannes Gehrke. Publishing search logs - A comparative study of privacy guarantees. IEEE Transactions on Knowledge and Data Engineering, 24(3):520–532, 2012. 18, 72, 94, 112

Jerald Greenberg. A taxonomy of organizational justice theories. Academy of Management review, 12(1):9–22, 1987. 33

Nina Grgic-Hlaca, Elissa M. Redmiles, Krishna P. Gummadi, and Adrian Weller. Human perceptions of fairness in algorithmic decision making: A case study of criminal risk prediction. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, April 23-27, 2018, pages 903–912, 2018a. 28

Nina Grgic-Hlaca, Muhammad Bilal Zafar, Krishna P. Gummadi, and Adrian Weller. Beyond distributive fairness in algorithmic decision making: Feature selection for procedurally fair learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pages 51–60, 2018b. 28, 54

Rachid Guerraoui, Anne-Marie Kermarrec, Rhicheek Patra, and Mahsa Taziki. D2P: distance-based differential privacy in recommenders. PVLDB, 8(8):862–873, 2015. 113

Fan Guo, Chao Liu, and Yi Min Wang. Efficient multiple-click models in web search. In Proceedings of the Second International Conference on Web Search and Web Data Mining, WSDM 2009, Barcelona, Spain, February 9-11, 2009, pages 124–131. ACM, 2009. 55

Sara Hajian and Josep Domingo-Ferrer. A methodology for direct and indirect discrimination prevention in data mining. IEEE Transactions on Knowledge and Data Engineering, 25 (7):1445–1459, 2013. 24

Maria Halkidi and Iordanis Koutsopoulos. A game theoretic framework for data privacy preservation in recommender systems. In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Athens, Greece, September 5-9, 2011. Proceedings, Part I, volume 6911 of Lecture Notes in Computer Science, pages 629–644. Springer, 2011. 112

Aniko Hannak, Piotr Sapiezynski, Arash Molavi Kakhki, Balachander Krishnamurthy, David Lazer, Alan Mislove, and Christo Wilson. Measuring personalization of web search. In 22nd International World Wide Web Conference, WWW ’13, Rio de Janeiro, Brazil, May 13-17, 2013, pages 527–538. International World Wide Web Conferences Steering Committee / ACM, 2013. 98

Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pages 3315–3323, 2016. 23, 25, 33, 54

Xi He, Ashwin Machanavajjhala, and Bolin Ding. Blowfish privacy: tuning privacy-utility trade-offs using policies. In International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014, pages 1447–1458. ACM, 2014. 98

Yuan Hong, Jaideep Vaidya, Haibing Lu, and Mingrui Wu. Differentially private search log sanitization with optimal output utility. In 15th International Conference on Extending Database Technology, EDBT ’12, Berlin, Germany, March 27-30, 2012, Proceedings, pages 50–61. ACM, 2012. 94

Bill Howe, Julia Stoyanovich, Haoyue Ping, Bernease Herman, and Matt Gee. Synthetic data for social good. CoRR, abs/1710.08874, 2017. 14

Daniel C Howe and Helen Nissenbaum. Trackmenot: Resisting surveillance in web search. Lessons from the Identity trail: Anonymity, privacy, and identity in a networked society, 23:417–436, 2009. 16, 18

Jian Hu, Hua-Jun Zeng, Hua Li, Cheng Niu, and Zheng Chen. Demographic prediction based on user’s browsing behavior. In Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, Alberta, Canada, May 8-12, 2007, pages 151–160. ACM, 2007. 94

Danny Yuxing Huang, Doug Grundman, Kurt Thomas, Abhishek Kumar, Elie Bursztein, Kirill Levchenko, and Alex C. Snoeren. Pinning down abuse on google maps. In Proceedings of the 26th International Conference on World Wide Web, WWW 2017, Perth, Australia, April 3-7, 2017, pages 1471–1479. ACM, 2017. 72

Thorsten Joachims. Training linear svms in linear time. In Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20-23, 2006, pages 217–226. ACM, 2006. doi: 10.1145/1150402.1150429. 61, 67

Thorsten Joachims and Filip Radlinski. Search engines that learn from implicit feedback. IEEE Computer, 40(8):34–40, 2007. 32, 43, 55

Thorsten Joachims, Laura A. Granka, Bing Pan, Helene Hembrooke, and Geri Gay. Accurately interpreting clickthrough data as implicit feedback. In SIGIR 2005: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Salvador, Brazil, August 15-19, 2005, pages 154–161, 2005. 26

Thorsten Joachims, Adith Swaminathan, and Tobias Schnabel. Unbiased learning-to-rank with biased feedback. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, WSDM 2017, Cambridge, United Kingdom, February 6-10, 2017, pages 781–789. ACM, 2017. 55

Rosie Jones, Ravi Kumar, Bo Pang, and Andrew Tomkins. "i know what you did last summer": query logs and user privacy. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, CIKM 2007, Lisbon, Portugal, November 6-10, 2007, pages 909–914, 2007. 19, 94

Rosie Jones, Ravi Kumar, Bo Pang, and Andrew Tomkins. Vanity fair: privacy in querylog bundles. In Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM 2008, Napa Valley, California, USA, October 26-30, 2008, pages 853–862. ACM, 2008. 111

Roberto J. Bayardo Jr. and Rakesh Agrawal. Data privacy through optimal k-anonymization. In Proceedings of the 21st International Conference on Data Engineering, ICDE 2005, 5-8 April 2005, Tokyo, Japan, pages 217–228, 2005. 14

Pawel Jurczyk and Eugene Agichtein. Discovering authorities in question answer communities by using link analysis. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, CIKM 2007, Lisbon, Portugal, November 6-10, 2007, pages 919–922. ACM, 2007. 96

Gary Kacmarcik and Michael Gamon. Obfuscating document stylometry to preserve author anonymity. In ACL 2006, 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, Sydney, Australia, 17-21 July 2006, 2006. 15

Faisal Kamiran, Toon Calders, and Mykola Pechenizkiy. Discrimination aware decision tree learning. In ICDM 2010, The 10th IEEE International Conference on Data Mining, Sydney, Australia, 14-17 December 2010, pages 869–874. IEEE Computer Society, 2010. doi: 10.1109/ICDM.2010.50. 25

Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. Fairness-aware classifier with prejudice remover regularizer. In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2012, Bristol, UK, September 24-28, 2012. Proceedings, Part II, pages 35–50, 2012. 25, 33, 54

Sampath Kannan, Michael J. Kearns, Jamie Morgenstern, Mallesh M. Pai, Aaron Roth, Rakesh V. Vohra, and Zhiwei Steven Wu. Fairness incentives for myopic agents. In Proceedings of the 2017 ACM Conference on Economics and Computation, EC ’17, Cambridge, MA, USA, June 26-30, 2017, pages 369–386. ACM, 2017. 29

Michael J. Kearns, Aaron Roth, and Zhiwei Steven Wu. Meritocratic fairness for cross-population selection. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pages 1828–1836, 2017. 24, 33, 54

Niki Kilbertus, Adrià Gascón, Matt J. Kusner, Michael Veale, Krishna P. Gummadi, and Adrian Weller. Blind justice: Fairness with encrypted sensitive attributes. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pages 2635–2644, 2018. 25

Jon Kleinberg, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. Human decisions and machine predictions. The Quarterly Journal of Economics, 133(1): 237–293, 2017a. 28, 54

Jon M. Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. In 8th Innovations in Theoretical Computer Science Conference, ITCS 2017, January 9-11, 2017, Berkeley, CA, USA, volume 67 of LIPIcs, pages 43:1–43:23. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2017b. 23

Aleksandra Korolova, Krishnaram Kenthapadi, Nina Mishra, and Alexandros Ntoulas. Releasing search queries and clicks privately. In Proceedings of the 18th International Conference on World Wide Web, WWW 2009, Madrid, Spain, April 20-24, 2009, pages 171–180. ACM, 2009. 94

Michal Kosinski, David Stillwell, and Thore Graepel. Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 110(15):5802–5805, 2013. ISSN 0027-8424. doi: 10.1073/pnas.1218772110. 3, 13

Andreas Krause and Eric Horvitz. A utility-theoretic approach to privacy in online services. Journal of Artificial Intelligence Research, 39:633–662, 2010. 98

Joshua A Kroll, Solon Barocas, Edward W Felten, Joel R Reidenberg, David G Robinson, and Harlan Yu. Accountable algorithms. University of Pennsylvania Law Review, 165: 633, 2016. 25

Juhi Kulshrestha, Motahhare Eslami, Johnnatan Messias, Muhammad Bilal Zafar, Saptarshi Ghosh, Krishna P. Gummadi, and Karrie Karahalios. Quantifying search bias: Investigating sources of bias for political searches in social media. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, CSCW 2017, Portland, OR, USA, February 25 - March 1, 2017, pages 417–432, 2017. 27

Ravi Kumar, Jasmine Novak, Bo Pang, and Andrew Tomkins. On anonymizing query logs via token-based hashing. In Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, Alberta, Canada, May 8-12, 2007, pages 629–638. ACM, 2007. 94

Preethi Lahoti, Gerhard Weikum, and Krishna P. Gummadi. ifair: Learning individually fair data representations for algorithmic decision making. CoRR, abs/1806.01059, 2018. 24

Alex Leavitt. "this is a throwaway account": Temporary technical identities and perceptions of anonymity in a massive online community. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, CSCW 2015, Vancouver, BC, Canada, March 14 - 18, 2015, pages 317–327, 2015. 20

Kristen LeFevre, David J. DeWitt, and Raghu Ramakrishnan. Incognito: Efficient full-domain k-anonymity. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Baltimore, Maryland, USA, June 14-16, 2005, pages 49–60, 2005. 14, 18

Kristen LeFevre, David J. DeWitt, and Raghu Ramakrishnan. Mondrian multidimensional k-anonymity. In Proceedings of the 22nd International Conference on Data Engineering, ICDE 2006, 3-8 April 2006, Atlanta, GA, USA, page 25, 2006. 14

Jurek Leonhardt, Avishek Anand, and Megha Khosla. User fairness in recommender systems. In Companion of the The Web Conference 2018 on The Web Conference 2018, WWW 2018, Lyon , France, April 23-27, 2018, pages 101–102. ACM, 2018. 25

Adam Lerner, Anna Kornfeld Simpson, Tadayoshi Kohno, and Franziska Roesner. Internet jones and the raiders of the lost trackers: An archaeological study of web tracking from 1996 to 2016. In 25th USENIX Security Symposium, USENIX Security 16, Austin, TX, USA, August 10-12, 2016. USENIX Association, 2016. 112

Karen Levy and Solon Barocas. Designing against discrimination in online markets. Berkeley Technology Law Journal, 32:1183, 2017. 32

Chao Li, Daniel Yang Li, Gerome Miklau, and Dan Suciu. A theory of pricing private data. ACM Transactions on Database Systems, 39(4):34:1–34:28, 2014. 20

Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian. t-closeness: Privacy beyond k-anonymity and l-diversity. In Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, The Marmara Hotel, Istanbul, Turkey, April 15-20, 2007, pages 106–115, 2007. 13, 14, 18, 76, 94, 98

Ninghui Li, Wahbeh H. Qardaji, Dong Su, Yi Wu, and Weining Yang. Membership privacy: a unifying framework for privacy definitions. In 2013 ACM SIGSAC Conference on Computer and Communications Security, CCS’13, Berlin, Germany, November 4-8, 2013, pages 889–900. ACM, 2013. 76, 94

Tiancheng Li and Ninghui Li. On the tradeoff between privacy and utility in data publishing. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, June 28 - July 1, 2009, pages 517–526, 2009. 18

Lydia T. Liu, Sarah Dean, Esther Rolf, Max Simchowitz, and Moritz Hardt. Delayed impact of fair machine learning. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, volume 80 of JMLR Workshop and Conference Proceedings, pages 3156–3164. JMLR.org, 2018. 29

Kristian Lum and William Isaac. To predict and serve? Significance, 13(5):14–19, 2016. 29

Ashwin Machanavajjhala, Daniel Kifer, Johannes Gehrke, and Muthuramakrishnan Venkitasubramaniam. L-diversity: Privacy beyond k-anonymity. TKDD, 1(1):3, 2007. 12, 14, 18, 76, 94, 98

Huina Mao, Xin Shuai, and Apu Kapadia. Loose tweets: an analysis of privacy leaks on twitter. In Proceedings of the 10th annual ACM workshop on Privacy in the electronic society, WPES 2011, Chicago, IL, USA, October 17, 2011, pages 1–12, 2011. 13

Rahat Masood, Dinusha Vatsalan, Muhammad Ikram, and Mohamed Ali Kâafar. Incognito: A method for obfuscating web data. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, April 23-27, 2018, pages 267–276. ACM, 2018. 16

Alessandra Mazzia, Kristen LeFevre, and Eytan Adar. The pviz comprehension tool for social network privacy settings. In Symposium On Usable Privacy and Security, SOUPS ’12, Washington, DC, USA - July 11 - 13, 2012, page 13, 2012. 17

Andrew Kachites McCallum. Mallet: A machine learning for language toolkit. 2002. 89, 117

Andrew W. E. McDonald, Sadia Afroz, Aylin Caliskan, Ariel Stolerman, and Rachel Greenstadt. Use fewer instances of the letter "i": Toward writing style anonymization. In Privacy Enhancing Technologies - 12th International Symposium, PETS 2012, Vigo, Spain, July 11-13, 2012. Proceedings, pages 299–318, 2012. 15

Frank McSherry and Ilya Mironov. Differentially private recommender systems: Building privacy into the netflix prize contenders. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, June 28 - July 1, 2009, pages 627–636. ACM, 2009. 112

Rishabh Mehrotra, Ashton Anderson, Fernando Diaz, Amit Sharma, Hanna M. Wallach, and Emine Yilmaz. Auditing search engines for differential satisfaction across demographics. In Proceedings of the 26th International Conference on World Wide Web Companion, Perth, Australia, April 3-7, 2017, pages 626–633. ACM, 2017. 4, 23, 28, 55, 72

Rishabh Mehrotra, James McInerney, Hugues Bouchard, Mounia Lalmas, and Fernando Diaz. Towards a fair marketplace: Counterfactual evaluation of the trade-off between relevance, fairness & satisfaction in recommendation systems. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, Torino, Italy, October 22-26, 2018, pages 2243–2251, 2018. 23, 25

Wei Meng, Ren Ding, Simon P. Chung, Steven Han, and Wenke Lee. The price of free: Privacy leakage in personalized mobile in-apps ads. In 23rd Annual Network and Distributed System Security Symposium, NDSS 2016, San Diego, California, USA, February 21-24, 2016. The Internet Society, 2016a. 112

Wei Meng, Byoungyoung Lee, Xinyu Xing, and Wenke Lee. Trackmeornot: Enabling flexible control on web tracking. In Proceedings of the 25th International Conference on World Wide Web, WWW 2016, Montreal, Canada, April 11 - 15, 2016, pages 99–109, 2016b. 13, 16, 112

Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States., pages 3111–3119, 2013. 77, 82, 89

Mainack Mondal, Peter Druschel, Krishna P Gummadi, and Alan Mislove. Beyond access control: Managing online privacy via exposure. In Proceedings of the Workshop on Useable Security, pages 1–6, 2014. 58, 71

Mainack Mondal, Johnnatan Messias, Saptarshi Ghosh, Krishna P. Gummadi, and Aniket Kate. Forgetting in social media: Understanding and controlling longitudinal exposure of socially shared data. In Twelfth Symposium on Usable Privacy and Security, SOUPS 2016, Denver, CO, USA, June 22-24, 2016, pages 287–299, 2016. 13, 17, 19, 58, 64, 65, 66, 71

Arvind Narayanan and Vitaly Shmatikov. Obfuscated databases and group privacy. In Proceedings of the 12th ACM Conference on Computer and Communications Security, CCS 2005, Alexandria, VA, USA, November 7-11, 2005, pages 102–111. ACM, 2005. 111

Arvind Narayanan and Vitaly Shmatikov. Robust de-anonymization of large sparse datasets. In 2008 IEEE Symposium on Security and Privacy (S&P 2008), 18-21 May 2008, Oakland, California, USA, pages 111–125, 2008. 3, 12

Arvind Narayanan and Vitaly Shmatikov. De-anonymizing social networks. In 30th IEEE Symposium on Security and Privacy (S&P 2009), 17-20 May 2009, Oakland, California, USA, pages 173–187, 2009. 12, 94

Arvind Narayanan, Hristo S. Paskov, Neil Zhenqiang Gong, John Bethencourt, Emil Stefanov, Eui Chul Richard Shin, and Dawn Song. On the feasibility of internet-scale author identification. In IEEE Symposium on Security and Privacy, SP 2012, 21-23 May 2012, San Francisco, California, USA, pages 300–314, 2012. 13, 94

Guillermo Navarro-Arribas, Vicenç Torra, Arnau Erola, and Jordi Castellà-Roca. User k-anonymity for privacy preserving data mining of query logs. Information Processing & Management, 48(3):476–487, 2012. 77

Valeria Nikolaenko, Stratis Ioannidis, Udi Weinsberg, Marc Joye, Nina Taft, and Dan Boneh. Privacy-preserving matrix factorization. In 2013 ACM SIGSAC Conference on Computer and Communications Security, CCS’13, Berlin, Germany, November 4-8, 2013, pages 801–812. ACM, 2013. 112

Alexandra Olteanu, Carlos Castillo, Fernando Diaz, and Emre Kiciman. Social data: Biases, methodological pitfalls, and ethical boundaries. Available at SSRN: https://ssrn.com/abstract=2886526, 2016. 26, 27

HweeHwa Pang, Xiaokui Xiao, and Jialie Shen. Obfuscating the topical intention in enterprise text search. In IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, DC, USA (Arlington, Virginia), 1-5 April, 2012, pages 1168–1179, 2012. 16, 18, 94

Sai Teja Peddinti and Nitesh Saxena. On the privacy of web search based on query obfuscation: A case study of trackmenot. In Privacy Enhancing Technologies, 10th International Symposium, PETS 2010, Berlin, Germany, July 21-23, 2010. Proceedings, pages 19–37, 2010. 19

Sai Teja Peddinti and Nitesh Saxena. Web search query privacy: Evaluating query obfuscation and anonymizing networks. Journal of Computer Security, 22(1):155–199, 2014. 98, 112

Sai Teja Peddinti, Aleksandra Korolova, Elie Bursztein, and Geetanjali Sampemane. Cloak and swagger: Understanding data sensitivity through the lens of user anonymity. In 2014 IEEE Symposium on Security and Privacy, SP 2014, Berkeley, CA, USA, May 18-21, 2014, pages 493–508, 2014. 19, 77, 78, 94

Dino Pedreschi, Salvatore Ruggieri, and Franco Turini. Discrimination-aware data mining. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada, USA, August 24-27, 2008, pages 560–568, 2008. 24, 33, 54

Marco Pennacchiotti and Ana-Maria Popescu. A machine learning approach to twitter user classification. In Proceedings of the Fifth International Conference on Weblogs and Social Media, Barcelona, Catalonia, Spain, July 17-21, 2011. The AAAI Press, 2011. 96

Forough Poursabzi-Sangdeh, Daniel G. Goldstein, Jake M. Hofman, Jennifer Wortman Vaughan, and Hanna M. Wallach. Manipulating and measuring model interpretability. CoRR, abs/1802.07810, 2018. 28

David Rebollo-Monedero, Jordi Forné, and Josep Domingo-Ferrer. Query profile obfuscation by means of optimal query exchange between users. IEEE Transactions on Dependable and Secure Computing, 9(5):641–654, 2012. 99, 112

Michael K. Reiter and Aviel D. Rubin. Crowds: Anonymity for web transactions. ACM Transactions on Information and System Security, 1(1):66–92, 1998. 16, 99, 111

Marian-Andrei Rizoiu, Lexing Xie, Tibério S. Caetano, and Manuel Cebrián. Evolution of privacy loss in wikipedia. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, San Francisco, CA, USA, February 22-25, 2016, pages 215–224, 2016. 19

Ronald E. Robertson, David Lazer, and Christo Wilson. Auditing the personalization and composition of politically-related search engine results pages. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, April 23-27, 2018, pages 955–965. ACM, 2018. 27

Andrea Romei and Salvatore Ruggieri. A multidisciplinary survey on discrimination analysis. Knowledge Eng. Review, 29(5):582–638, 2014. 54

Alex Rosenblat and Luke Stark. Algorithmic labor and information asymmetries: A case study of uber’s drivers. International Journal of Communication, 10:27, 2016. 29

Ravi S. Sandhu, Edward J. Coyne, Hal L. Feinstein, and Charles E. Youman. Role-based access control models. IEEE Computer, 29(2):38–47, 1996. 13, 16

Nuno Santos, Alan Mislove, Marcel Dischinger, and Krishna Gummadi. Anonymity in the personalized web. In NSDI Posters ’08, 2008. 99

Roman Schlegel, Apu Kapadia, and Adam J. Lee. Eyeing your exposure: quantifying and controlling information sharing for improved privacy. In Symposium On Usable Privacy and Security, SOUPS ’11, Pittsburgh, PA, USA - July 20 - 22, 2011, page 14. ACM, 2011. 17, 58, 71

Xuehua Shen, Bin Tan, and ChengXiang Zhai. Privacy protection in personalized search. SIGIR Forum, 41(1):4–17, 2007. 72, 112

Yilin Shen and Hongxia Jin. Epicrec: Towards practical differentially private framework for personalized recommendation. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, October 24-28, 2016, pages 180–191. ACM, 2016. 112

Reza Shokri, George Theodorakopoulos, George Danezis, Jean-Pierre Hubaux, and Jean-Yves Le Boudec. Quantifying location privacy: The case of sporadic location exposure. In Privacy Enhancing Technologies - 11th International Symposium, PETS 2011, Waterloo, ON, Canada, July 27-29, 2011. Proceedings, volume 6794 of Lecture Notes in Computer Science, pages 57–76. Springer, 2011. 58, 71

Ashudeep Singh and Thorsten Joachims. Fairness of exposure in rankings. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, London, UK, August 19-23, 2018, pages 2219–2228, 2018. 5, 23, 25, 28, 33, 36, 54

Adish Singla, Eric Horvitz, Ece Kamar, and Ryen White. Stochastic privacy. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, July 27 -31, 2014, Québec City, Québec, Canada., pages 152–158, 2014. 13, 15, 18, 94, 112

Trevor Strohman, Donald Metzler, Howard Turtle, and W Bruce Croft. Indri: A language model-based search engine for complex queries. In Proceedings of the International Conference on Intelligent Analysis, volume 2, pages 2–6, 2005. 107

Latanya Sweeney. Achieving k-anonymity privacy protection using generalization and suppression. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):571–588, 2002a. 14

Latanya Sweeney. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557–570, 2002b. 12, 14, 76, 94, 98

Latanya Sweeney. Discrimination in online ad delivery. Communications of the ACM, 56 (5):44–54, 2013. 25

Yla R Tausczik and James W Pennebaker. The psychological meaning of words: Liwc and computerized text analysis methods. Journal of language and social psychology, 29(1): 24–54, 2010. 62

Jaime Teevan, Susan T. Dumais, and Eric Horvitz. Personalizing search via automated analysis of interests and activities. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2005, Salvador, Brazil, August 15-19, 2005, pages 449–456. ACM, 2005. 98

Vincent Toubiana, Arvind Narayanan, Dan Boneh, Helen Nissenbaum, and Solon Barocas. Adnostic: Privacy preserving targeted advertising. In Proceedings of the Network and Distributed System Security Symposium, NDSS 2010, San Diego, California, USA, 28th February - 3rd March 2010, 2010. 16

Blase Ur, Pedro Giovanni Leon, Lorrie Faith Cranor, Richard Shay, and Yang Wang. Smart, useful, scary, creepy: perceptions of online behavioral advertising. In Symposium On Usable Privacy and Security, SOUPS ’12, Washington, DC, USA - July 11 - 13, 2012, page 4. ACM, 2012. 20

Isabel Valera, Adish Singla, and Manuel Gomez Rodriguez. Enhancing the accuracy and fairness of human decision making. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada., pages 1774–1783, 2018. 28

Dinusha Vatsalan, Peter Christen, and Vassilios S. Verykios. A taxonomy of privacy- preserving record linkage techniques. Information Systems, 38(6):946–969, 2013. 94

Akrivi Vlachou, Christos Doulkeridis, Yannis Kotidis, and Kjetil Nørvåg. Monochromatic and bichromatic reverse top-k queries. IEEE Transactions on Knowledge and Data Engineering, 23(8):1215–1229, 2011. 73

Jilles Vreeken, Matthijs van Leeuwen, and Arno Siebes. Preserving privacy through data generation. In Proceedings of the 7th IEEE International Conference on Data Mining (ICDM 2007), October 28-31, 2007, Omaha, Nebraska, USA, pages 685–690, 2007. 14

Elaine Walster, Ellen Berscheid, and G William Walster. New directions in equity research. Journal of personality and social psychology, 25(2):151, 1973. 33, 35

Peng Wang and Chinya V. Ravishankar. On masking topical intent in keyword search. In IEEE 30th International Conference on Data Engineering, ICDE 2014, Chicago, IL, USA, March 31 - April 4, 2014, pages 256–267, 2014. 16, 18, 72, 112

Xuanhui Wang, Michael Bendersky, Donald Metzler, and Marc Najork. Learning to rank with selection bias in personal search. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, SIGIR 2016, Pisa, Italy, July 17-21, 2016, pages 115–124. ACM, 2016. 55

Ingmar Weber and Carlos Castillo. The demographics of web search. In Proceeding of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010, Geneva, Switzerland, July 19-23, 2010, pages 523–530. ACM, 2010. 94

Yongkai Wu, Lu Zhang, and Xintao Wu. On discrimination discovery and removal in ranked data using causal graph. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, London, UK, August 19-23, 2018, pages 2536–2544, 2018. 28

Yabo Xu, Ke Wang, Benyu Zhang, and Zheng Chen. Privacy-enhancing personalized web search. In Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, Alberta, Canada, May 8-12, 2007, pages 591–600, 2007. 16, 18, 72, 112

Yabo Xu, Ke Wang, Guoliang Yang, and Ada Wai-Chee Fu. Online anonymity for personalized web services. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM 2009, Hong Kong, China, November 2-6, 2009, pages 1497–1500. ACM, 2009. 99

Menahem E. Yaari and Maya Bar-Hillel. On dividing justly. Social Choice and Welfare, 1(1):1–24, 1984. 35

Grace Hui Yang, Ian Soboroff, Li Xiong, Charles L. A. Clarke, and Simson L. Garfinkel. Privacy-preserving IR 2016: Differential privacy, search, and social media. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, SIGIR 2016, Pisa, Italy, July 17-21, 2016, pages 1247–1248. ACM, 2016. 112

Ke Yang and Julia Stoyanovich. Measuring fairness in ranked outputs. In Proceedings of the 29th International Conference on Scientific and Statistical Database Management, Chicago, IL, USA, June 27-29, 2017, pages 22:1–22:6. ACM, 2017. 23, 28, 33, 36, 54

Ke Yang, Julia Stoyanovich, Abolfazl Asudeh, Bill Howe, H. V. Jagadish, and Gerome Miklau. A nutritional label for rankings. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10-15, 2018, pages 1773–1776. ACM, 2018. 28

Sirui Yao and Bert Huang. Beyond parity: Fairness objectives for collaborative filtering. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 2925–2934, 2017. 23, 25

Sergey Yekhanin. Private information retrieval. Communications of the ACM, 53(4):68–73, 2010. 94

Zhonghao Yu, Sam Macbeth, Konark Modi, and Josep M. Pujol. Tracking the trackers. In Proceedings of the 25th International Conference on World Wide Web, WWW 2016, Montreal, Canada, April 11 - 15, 2016, pages 121–132, 2016. 13, 16, 112

Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez-Rodriguez, and Krishna P. Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web, WWW 2017, Perth, Australia, April 3-7, 2017, pages 1171–1180, 2017. 23, 25, 33, 54

Meike Zehlike and Carlos Castillo. Reducing disparate exposure in ranking: A learning to rank approach. CoRR, abs/1805.08716, 2018. 28

Meike Zehlike, Francesco Bonchi, Carlos Castillo, Sara Hajian, Mohamed Megahed, and Ricardo A. Baeza-Yates. FA*IR: A fair top-k ranking algorithm. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM 2017, Singapore, November 06 - 10, 2017, pages 1569–1578. ACM, 2017. 5, 23, 25, 27, 33, 36, 54

Richard S. Zemel, Yu Wu, Kevin Swersky, Toniann Pitassi, and Cynthia Dwork. Learning fair representations. In Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013, volume 28 of JMLR Workshop and Conference Proceedings, pages 325–333. JMLR.org, 2013. 24, 33, 54

Aston Zhang, Xing Xie, Kevin Chen-Chuan Chang, Carl A. Gunter, Jiawei Han, and XiaoFeng Wang. Privacy risk in anonymized heterogeneous information networks. In Proceedings of the 17th International Conference on Extending Database Technology, EDBT 2014, Athens, Greece, March 24-28, 2014, pages 595–606. OpenProceedings.org, 2014. 94

Jun Zhang, Mark S. Ackerman, and Lada A. Adamic. Expertise networks in online communities: structure and algorithms. In Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, Alberta, Canada, May 8-12, 2007, pages 221–230. ACM, 2007. 96

Sicong Zhang and Grace Hui Yang. Deriving differentially private session logs for query suggestion. In Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval, ICTIR 2017, Amsterdam, The Netherlands, October 1-4, 2017, pages 51–58, 2017. 18

Sicong Zhang, Grace Hui Yang, and Lisa Singh. Anonymizing query logs by differential privacy. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, SIGIR 2016, Pisa, Italy, July 17-21, 2016, pages 753–756. ACM, 2016a. 72, 112

Sicong Zhang, Grace Hui Yang, Lisa Singh, and Li Xiong. Safelog: Supporting web search and mining by differentially-private query logs. In 2016 AAAI Fall Symposia, Arlington, Virginia, USA, November 17-19, 2016, 2016b. 18

Yang Zhang, Mathias Humbert, Tahleen Rahman, Cheng-Te Li, Jun Pang, and Michael Backes. Tagvisor: A privacy advisor for sharing hashtags. In Proceedings of the 2018 Conference on World Wide Web, WWW 2018, Lyon, France, April 23-27, 2018, pages 287–296, 2018. 13, 15

Elena Zheleva and Lise Getoor. To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles. In Proceedings of the 18th International Conference on World Wide Web, WWW 2009, Madrid, Spain, April 20-24, 2009, pages 531–540, 2009. 15

Yun Zhu, Li Xiong, and Christopher Verdery. Anonymizing user profiles for personalized web search. In Proceedings of the 19th International Conference on World Wide Web, WWW 2010, Raleigh, North Carolina, USA, April 26-30, 2010, pages 1225–1226, 2010. 18, 72, 111

List of Figures

1.1 A schematic depiction of a search system. Users participate in the system either as searchers or search subjects...... 3

4.1 Relevance distributions in the Airbnb datasets...... 44
4.2 Comparison of the attention and relevance distributions for the top-10 ranking positions in the Geneva dataset. Note that the relevance distribution presented here is the same as in Fig. 4.1. To satisfy equity-of-attention fairness, the two distributions would have to be the same...... 44
4.3 Model performance on the synthetic Uniform dataset. Attention singular...... 45
4.4 Model performance on the synthetic Linear dataset. Attention singular...... 45
4.5 Model performance on the synthetic Exponential dataset. Attention singular...... 46
4.6 Model performance on the synthetic Uniform dataset. Attention geometric...... 46
4.7 Model performance on the synthetic Linear dataset. Attention geometric...... 47
4.8 Model performance on the synthetic Exponential dataset. Attention geometric...... 47
4.9 Performance of the Objective heuristic on the synthetic Uniform dataset under the geometric attention model with different attention cut-off points...... 48
4.10 Model performance on the single-query Boston dataset. Attention singular...... 49
4.11 Model performance on the single-query Geneva dataset. Attention singular...... 50
4.12 Model performance on the single-query Hong Kong dataset. Attention singular...... 50
4.13 Model performance on the multi-query Boston dataset. Attention singular...... 51
4.14 Model performance on the multi-query Geneva dataset. Attention singular...... 51
4.15 Model performance on the multi-query Hong Kong dataset. Attention singular...... 52
4.16 Model performance on the single-query Boston dataset. Attention geometric. Results are similar for the Geneva and Hong Kong datasets...... 53
4.17 Actual values of ranking quality. Boston dataset, attention singular...... 53

5.1 Distribution of the size of exposure sets. The values on the Y axis are in logarithmic scale with base 10...... 66
5.2 Influence of the tweet context on search exposure relevance. The number in a square x(q), y(q + t) denotes the number of tweets that received the score of x in the study with queries only, and the score of y in the study with queries in context...... 71

6.1 Example comparison of risk scores of sample vs. full data...... 88

7.1 Overview of the MA framework...... 100
7.2 Model measures per user...... 109
7.3 Empirical measures per user...... 110
7.4 Effect of profile size and diversity...... 111

List of Tables

5.1 Example queries from unprocessed exposure sets...... 61
5.2 Most important semantic features learned by the L2R model together with example queries...... 67
5.3 Exposure set ranking user-study results averaged over all users. Methods marked with ∗ perform significantly worse than L2R on a given metric (paired t-test, p < 0.05)...... 69
5.4 Top-10 sensitive exposing queries returned by the L2R model for a subset of users...... 72

7.1 Results with trade-off parameter α for the model (M) and empirical (E) measures...... 108