Transferable Strategic Meta-Reasoning Models
TRANSFERABLE STRATEGIC META-REASONING MODELS

by MICHAEL WUNDER

Written under the direction of Matthew Stone

New Brunswick, New Jersey
October 2013

ABSTRACT OF THE DISSERTATION

Transferable Strategic Meta-Reasoning Models
by MICHAEL WUNDER
Dissertation Director: Matthew Stone

How do strategic agents make decisions? For the first time, a confluence of advances in agent design, the formation of massive online data sets of social behavior, and computational techniques has allowed researchers to construct and learn much richer models than before. My central thesis is that, when agents engaged in repeated strategic interaction undertake a reasoning or learning process, the behavior resulting from this process can be characterized by two factors: depth of reasoning over base rules and time-horizon of planning. Values for these factors can be learned effectively from interaction and are transferable to new games, producing highly effective strategic responses. The dissertation formally presents a framework for addressing the problem of predicting a population's behavior using a meta-reasoning model containing these strategic components. To evaluate this model, I explore several experimental case studies that show how to use the framework to predict and respond to behavior using observed data, covering settings ranging from a small number of computer agents to a larger number of human participants.

Preface

Captain Amazing: I knew you couldn't change.
Casanova Frankenstein: I knew you'd know that.
Captain Amazing: Oh, I know that. AND I knew you'd know I'd know you knew.
Casanova Frankenstein: But I didn't. I only knew that you'd know that I knew. Did you know THAT?
Captain Amazing: Of course.
- Mystery Men

Portions of this dissertation are based on work previously published by the author [Wunder et al., 2010, 2011, 2012, 2013].

Acknowledgements

I would like to extend my gratitude to all those who have made contributions in support of my completion of a Ph.D.

First and foremost, my advisors Michael Littman and Matthew Stone enabled me to satisfy my curiosity in a number of directions until something stuck. I profoundly appreciate their complementary roles in helping me develop as a researcher and a computer scientist. Matthew constantly encouraged me to focus my thinking and clarify my writing with his sharp lines of questioning, and helped me to understand the broader context of this research project. Through my interactions with Michael, I benefited immensely from his experience and good nature. His feedback and advice have shaped me in too many ways to count and have been invaluable throughout this process.

I could not have hoped for a stronger committee, which includes Kobi Gal, Haym Hirsh, Michael Littman, and Matthew Stone. Their diverse views and inputs during the defense process made it a very rewarding experience.

Over the years, I have been supported by several grants from the National Science Foundation that made my research experience both possible and fruitful. The support via HSD-0624191, in particular, was crucial during the early years of my study. Later, NSF IIS-1018152 and IIS-1017811 were of great assistance.

This dissertation depends heavily on prior work, and without my co-authors it never would have come to fruition. Collaborators who share much of the credit for this earlier work include Monica Babes-Vroman, Michael Kaisers, Michael Littman, Siddharth Suri, Duncan Watts, and Bobby Yaros.
I have learned an incredible amount through my discussions with everyone. A special thanks goes to Bobby, who generously helped me proofread portions of this document, and whose door was always open for me to stop by and kick around my latest ideas.

Another important contributor to this research was Martin Zinkevich, who invented the Lemonade Stand Game and whose tireless efforts made the competition possible. The four tournaments taking place during these past four years provided a crucial outlet to apply my research and test my ideas in the field. I am glad I was able to be a part of them, and it was a lot of fun working with the Rutgers dream team of Michael Kaisers and Bobby Yaros to design the perfect agent.

To all the members of the RL3 lab, including John Asmuth, Monica Babes-Vroman, Carlos Diuk, Sergiu Goschin, Lihong Li, Chris Mansley, Vukosi Marivate, Ali Nouri, Tom Walsh, and Ari Weinstein, thanks for the memories and punctual lunches. You all made the lab a friendly and stimulating place. A warm thanks also goes to everyone else I met at Rutgers who helped me develop as a scientist and a person, as well as all of my other friends who have accompanied me on this journey.

Finally, I would like to thank my loving and supportive family. My mom, Patricia, and sister, Beth, were always there for me as I made my way over the years. I am eternally grateful for your non-stop encouragement and faith in me.

Dedication

To Mom and Beth.

Table of Contents

Abstract
Preface
Acknowledgements
Dedication
List of Tables
List of Figures

1. Introduction
   1.1. The Multiagent Cycle
   1.2. Thesis Statement and Dissertation Overview
   1.3. Experimental Case Studies
        1.3.1. Learning Algorithms in Two-player Normal-form Games
               Features for Modeling Learning Algorithms
        1.3.2. Reasoning Models in the Lemonade Stand Game Tournaments
               Features for Learning a Level-based Discounted Model
               Experimental Results
        1.3.3. Populations of Heterogeneous Strategies in Public Goods Games
               Features for Predictive Models
               Building an Empirically-based Computational Social Model
               Empirically Informed Simulations in Networks
   1.4. Why Multiagent Learning?

2. A Framework for Transferable Strategic Meta-reasoning
   2.1. Equilibrium
   2.2. Level-k Reasoning and Non-reasoning Strategies in Repeated Domains
   2.3. Learning in Multiagent Environments
   2.4. Planning over Discounted Time Horizons
   2.5. A Formal Model of a Population of Strategies
   2.6. Related Work
        2.6.1. Network of Influence Diagrams
        2.6.2. Cognitive Hierarchies/Level-K Reasoning
        2.6.3. I-POMDPs as Level-based Reasoners
   2.7. Summary

3. Learning Algorithms in Simple Games
   3.1. Fictitious Play
   3.2. Gradient-based Algorithms
        3.2.1. Infinitesimal Gradient Ascent (IGA)
        3.2.2. Win-or-Learn-Fast Infinitesimal Gradient Ascent (WoLF-IGA)
        3.2.3. Cross Learning
   3.3. Q-Learning
        3.3.1. Reinforcement Learning/Opponent Model Hybrids
        3.3.2. Q-Learning with Boltzmann Exploration
        3.3.3. Frequency-Adjusted Q-Learning
   3.4. Infinitesimal Q-Learning
        3.4.1. ε-Greedy Infinitesimal Q-Learning (IQL-ε)
        3.4.2. One-player Sliding Greedy Update
        3.4.3. Two-player Sliding Greedy Update
   3.5. Classes of 2-player 2-action Games
   3.6. Analysis of Convergence in Dominant Action Games
        3.6.1. Dominant Action Games: Subclass 3a
        3.6.2. Cooperative Equilibrium Analysis
        3.6.3. Prisoner's Dilemma Phases
        3.6.4. Conditions for Convergence in Subclass 3b
   3.7. Empirical Comparison
   3.8. Conclusion

4. Modeling Learning Algorithms
   4.1. Learning and Teaching
   4.2. Teaching a Learner with Unknown Discount
   4.3. Modeling a Learning Process with a Dynamic Meta-reasoning Model
        4.3.1. Applying the Meta-reasoning Framework to Model Learners
   4.4. Experiments
        4.4.1. Hypothesis
        4.4.2. Evaluation
        4.4.3. Results
        4.4.4. Transferability to Other Games
   4.5. Conclusion

5. Reasoning Models in the Lemonade Stand Game Tournaments
   5.1. Introduction
   5.2. Related Work: Location Games
   5.3. Lemonade Stand Game
        5.3.1. Generalized LSG
   5.4. Level-k Meta-reasoning in a γ-model
        5.4.1. Uniform Game Levels
        5.4.2. Meta-learning Level-based Model Distributions over k and γ
        5.4.3. Planning with a Meta-learned Level-based γ-Model
   5.5. Generalized LSG Experiments
        5.5.1. Experiments Using 2009 and 2010 Agents
        5.5.2. Experiments Using 2011 and 2012 Agents
   5.6. Conclusion

6. Empirical Agent-Based Models of Cooperation in Public Goods Games
   6.1. Introduction to Empirical Agent-Based Modeling
   6.2. Related Work
   6.3. Background on Experimental Setup and Data
   6.4. Deterministic Models
        6.4.1. Model Definitions
        6.4.2. Predicting Average Contributions
               Homogeneous Population Evaluation
               Heterogeneous Population Evaluation
               Analysis of Types
        6.4.3. Predicting Full Distribution of Contributions
   6.5. Stochastic Models
        6.5.1. Baseline Stochastic Models
               Simple Stochastic Model
               Experience-Weighted Attraction
        6.5.2. Predicting the Full Distribution of Contributions
        6.5.3. Selecting a Model
   6.6. Simulating Empirical Agent-Based Models
        6.6.1. Network Size
        6.6.2. Network Density and Variance of Degree
        6.6.3. Correlations between Player Type and Node Degree
   6.7. Conclusions and Future Work

7. Conclusion
   7.1. A Meta-reasoning Framework for Repeated Games
   7.2. Experiments in Repeated Games
        7.2.1. Learning Algorithms in Simple Games
        7.2.2. Lemonade Stand Game
        7.2.3. Public Goods Behavioral Experiments
   7.3. Implications for Future Research