Towards Understanding the Interaction Between Learning Algorithms and Humans a Dissertation Submitted to the Institute for Compu
Total Page:16
File Type:pdf, Size:1020Kb
TOWARDS UNDERSTANDING THE INTERACTION BETWEEN LEARNING ALGORITHMS AND HUMANS A DISSERTATION SUBMITTED TO THE INSTITUTE FOR COMPUTATIONAL AND MATHEMATICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY Sven Schmit June 2018 © 2018 by Sven Peter Schmit. All Rights Reserved. Re-distributed by Stanford University under license with the author. This work is licensed under a Creative Commons Attribution- Noncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/ This dissertation is online at: http://purl.stanford.edu/tm630yr3734 ii I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Ramesh Johari, Primary Adviser I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Ashish Goel I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Johan Ugander Approved for the Stanford University Committee on Graduate Studies. Patricia J. Gumport, Vice Provost for Graduate Education This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives. iii Voor Ellen Kerseboom Abstract This thesis considers several learning problems where interaction between the end-user and the algorithm play an important role. The thesis is organized as follows. The first chapter provides background material on mechanism design and reinforcement learning, and discusses related work. This sets us up for the next three chapters, which cover our contributions. In the second chapter, we consider recommendation systems. Many recommendation algorithms rely on user data to generate recommendations. However, these recommendations also affect the data obtained from future users. We aim to understand the effects of this dynamic interaction. We propose a simple model where users with heterogeneous preferences arrive over time. Based on this model, we prove that naive estimators, i.e. those which ignore this feedback loop, are not consistent. We show that consistent estimators are efficient in the presence of myopic agents. Our results are validated using simulations. In the third chapter, we consider a platform that wants to learn a personalized policy for each user, but the platform faces the risk of a user abandoning the platform if they are dissatisfied with the actions of the platform. For example, a platform is interested in personalizing the number of newsletters it sends, but faces the risk that the user unsubscribes forever. We propose a general thresholded learn- ing model for scenarios like this, and discuss the structure of optimal policies. We describe salient features of optimal personalization algorithms and how feedback the platform receives impacts the results. Furthermore, we investigate how the platform can efficiently learn the heterogeneity across users by interacting with a population and provide performance guarantees. In the fourth chapter, we propose a new experimentation framework for the setting where there are many hypotheses and observations are costly. In such scenario, it is important to internalize the opportunity cost of assigning a sample to an experiment. We fully characterize the optimal policy and give an algorithm to compute it. Furthermore, we provide a simple heuristic that helps understand the optimal policy. Simulations based on baseball batting average data demonstrate superior performance compared to alternative algorithms. We also discuss more general insights gained from this testing paradigm, such as the paradox of power; high-powered tests can lead to inefficient sampling. v Acknowledgments I have always enjoyed reading the acknowledgement section of dissertations, and it feels a bit strange to write my own as my “educational road trip” is coming to an end. But more than that, I feel very fortunate to have benefited in a myriad of ways by fantastic people who have helped me, supported me, believed in me, and frankly, have made this all possible. I want to start by thanking the professors from the University of Groningen that have been instru- mental. Helena Nusse taught me mathematics and how to teach. Johannes Nieuwenhuis planted the seed for my love/hate relationship with Probability Theory. Unfortunately, I never had the chance to take a course by Maarten van der Vlerk, who sadly passed away. Nonetheless, he has always been incredibly supportive and enthusiastic in his role as director of the Econometrics and Operations Research program. Without their support, I would have never crossed the Channel, let alone the Atlantic ocean. During three summer internships, I found great mentors at HP Labs and Stitch Fix. I cherish attempts by Rob Schreiber to teach me how to write. Big thanks to Brad Klingenberg and Eric Colson at Stitch Fix for believing in me and giving me freedom to explore in my two summers in San Francisco. While it might be difficult to recognize now, the initial ideas which led to this thesis originated from there, many of them based on conversations with Brad. Furthermore, working alongside Jeff, Daragh, Eli, Chris, Greg, Tatsiana, Paolo, Patrick, Deep and fellow intern Thomas, among others, was an absolute blast. Indira, Matt, Emily, Claudine and Antoinette at ICME and Jim, Amy and Lori at MS&E were always available to help out. Let me also wish Matt more luck and wisdom for his future March Madness brackets. Margot Gerritsen has been an absolutely amazing director and tireless supporter and advocate for all ICME students. I feel incredibly lucky that our journeys crossed paths. Professors Emmanuel Candes, Percy Liang, Stephen Boyd, John Duchi, Greg Valiant, Ben Van Roy, and Andrea Montanari, among others, are amazing teachers, and I am privileged to have had the chance to take their courses. Thanks to all the students that I had the pleasure to teach for, either as TA or Instructor. I can only hope they learned half as much from me as I did from them. Johan Ugander and Ashish Goel have graciously agreed to serve on the reading committee, and their vi questions and comments have provided me with different perspectives on the research questions. In fact, Johan has even provided great advice as a PhD student at Cornell when I was considering applying to graduate schools. Furthermore, Yonatan Gur, Mohsen Bayati, Ben Van Roy, Emma Brunnskill, and Andrzej Skrzypacz took time out of their busy schedule to discuss research and provide feedback. It has been a real privilege to have Ramesh Johari as advisor. I must have caused him quite some head aches, though he would never admit it. Ramesh has amazing vision and has an answer ready for every question. His candor has helped me become a better researcher, and he has been a constant source of wisdom. He has given me the freedom to explore my own interests, while at the same time guiding me towards promising directions. I also benefited from and had a lot of fun with other members of Ramesh’s research group; Nick, Carlos, Sid, Vijay, Virag, Mohammad, David, Tum, Nikhil, and Hannah. Over the years, I have met many wonderful friends. Despite being away for the Netherlands for years, Kenneth, Marjolein, Jelmar, and Kevin always make me feel like I never left when I visit. At Stanford, I have spent many nights on tennis courts with Xiaofan, Seth, and Jared, and it has been great to still be in touch with friends from Cambridge, in particular Alex, Andrew, Ho Yin, and Ping. Traveling to conferences and meeting other researchers is one of the perks of a PhD student. In particular, it has been wonderful running into Andrea many times. ICME has a very strong community that provides great support to every student. Victor, Austin, Kon- stantin, Chris, Nolan, Yuekai, Jason, Carlos, Nick, Tania, Fay, Matthias, Lan, Ron, Anjan, Nurbek, and Allison, among other, have turned the basement of Huang into the best office on campus. No stone has been left unturned in lunch conversations with Ramon and Carlos, from the latest tweet controversies to the meaning of life. These often turned into fierce debates, and I loved every bit of disagreeing with both of them, and I look forward to the many conversations that are to come. More serious conversations about research haven proved very fruitful, leading to many collaborations. My mom has taught me to make the most of (and enjoy) the opportunities we are given, which has been the most valuable advice of all. She and Peter have been incredibly supportive despite being far away, and have often taken the arduous trip to visit. Cheryl, as eye-witness, has been an amazing source of inspiration, support and gezelligheid over the years. Finally, I would like to thank the Groningen University Fund, the Huygens Scholarship Program, the Prins Bernhard Cultuurfonds, the Institute for Computational and Mathematical Engineering at Stanford, the National Science Foundation, and the TomKat Center for providing financial support. Thank you all, Sven vii Contents Abstract v Acknowledgments vi 1 Introduction 1 1.1 Organization . 1 1.2 Background . 2 1.3 Mechanism design . 2 1.3.1 Strategies and optimality . 3 1.3.2 Revelation principle . 4 1.3.3 Information with myopic agents . 5 1.3.4 Connections with this thesis . 5 1.4 Reinforcement learning . 6 1.4.1 The multi-armed bandit problem . 6 1.4.2 Extensions of the multi-armed bandit problem . 11 1.4.3 Multi-armed bandits and agents .