Computing Strong Game-Theoretic Strategies and Exploiting Suboptimal Opponents in Large Games

Sam Ganzfried
CMU-CS-15-104
May 2015

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

Thesis Committee:
Tuomas Sandholm, Chair
Avrim Blum
Geoffrey Gordon
Michael Bowling, University of Alberta
Vincent Conitzer, Duke University

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Copyright © 2015 Sam Ganzfried

This research was sponsored by the National Science Foundation under grant numbers IIS-0427858, IIS-0905390, IIS-0964579, CCF-1101668, CCR-0122581, and IIS-1320620, as well as XSEDE computing resources provided by the Pittsburgh Supercomputing Center. I also acknowledge Intel Corporation and IBM for their machine gifts. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government, or any other entity.

Keywords: artificial intelligence, game theory, multiagent systems, game solving, game abstraction, equilibrium computation, qualitative models, opponent modeling.

Abstract

Designing successful agents for multiagent strategic settings is a challenging problem for several reasons. First, many games of interest are far too large to be solved (for a relevant game-theoretic solution concept) by the best current algorithms. For example, no-limit Texas hold 'em has approximately 10^165 nodes in its game tree, while the best algorithms for computing a Nash equilibrium only scale to games with around 10^17 states. A second challenge is that it is not even clear that our goal should be computing a Nash equilibrium in the first place. In games with more than two players (or two-player games that are not zero-sum (competitive)), playing a Nash equilibrium has no performance guarantee. Furthermore, even in two-player zero-sum games, we can often obtain significantly higher payoffs by learning to exploit the mistakes of a suboptimal opponent than by playing a Nash equilibrium.

The leading paradigm for addressing the first challenge is to first approximate the full game with a strategically similar but significantly smaller game, and then to solve this smaller abstract game. All of this computation is done offline in advance, and the strategies are then looked up in a table for actual game play. We have developed new algorithms for improving each step of this paradigm. Specifically, I present new algorithms for performing game abstraction and for computing equilibria in several game classes, as well as new approaches for addressing the problem of interpreting opponent actions that have been removed from the abstraction, and further post-processing techniques that achieve robustness against limitations of the abstraction and equilibrium-finding phases.

I then describe two new game-solving paradigms: in the first, relevant portions of the game are solved in real time to a better degree of accuracy than the abstract game, which is solved offline according to the leading paradigm; in the second, qualitative representations of the structure of equilibrium strategies are leveraged to improve the speed of equilibrium finding. The latter paradigm can be utilized to obtain human-understandable knowledge from strategies, which are often represented as massive binary files, thereby enabling improved human decision-making.
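To make the offline paradigm concrete, here is a minimal toy sketch of its three steps: abstraction, offline strategy computation, and online lookup with action translation. All names, bucket counts, bet sizes, and the placeholder uniform strategy are illustrative assumptions for the sketch, not the algorithms developed in this thesis.

```python
import random

# Hypothetical constants: 5 hand-strength buckets (information abstraction)
# and 3 allowed bet sizes in pot units (action abstraction).
NUM_BUCKETS = 5
ABSTRACT_BETS = [0.5, 1.0, 2.0]

def bucket(hand_strength: float) -> int:
    """Step 1a: map a hand strength in [0, 1] to a coarse abstract bucket."""
    return min(int(hand_strength * NUM_BUCKETS), NUM_BUCKETS - 1)

def translate_bet(observed_bet: float) -> float:
    """Step 1b (reverse mapping): interpret an opponent bet size that was
    removed from the abstraction as the nearest abstract bet size."""
    return min(ABSTRACT_BETS, key=lambda b: abs(b - observed_bet))

# Step 2 (offline): the strategy table that equilibrium finding would
# produce for the abstract game; filled here with a placeholder uniform
# strategy purely so the sketch runs.
STRATEGY = {
    (b, bet): {"fold": 1 / 3, "call": 1 / 3, "raise": 1 / 3}
    for b in range(NUM_BUCKETS)
    for bet in ABSTRACT_BETS
}

def act(hand_strength: float, observed_bet: float) -> str:
    """Step 3 (online): translate the real situation into the abstraction,
    look up the stored mixed strategy, and sample an action from it."""
    key = (bucket(hand_strength), translate_bet(observed_bet))
    dist = STRATEGY[key]
    return random.choices(list(dist), weights=list(dist.values()))[0]

print(act(hand_strength=0.83, observed_bet=1.4))  # e.g. 'raise'
```

In the actual pipeline, Step 2 is the expensive offline equilibrium computation, and the resulting strategy tables are the massive binary files mentioned above, which is why the translation and post-processing steps matter.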
In the final portion of the thesis, I address the second challenge by presenting new algorithms for effectively learning to exploit unknown static opponents in large imperfect-information games after only a small number of interactions. Furthermore, I present new algorithms for exploiting weak opponents that are able to guarantee a good worst-case performance even against strong dynamic opponents. The approaches are domain-independent and apply to any games within very broad classes, which include many important real-world situations. While most of the approaches are presented in the context of two-player zero-sum games, they also apply to games with more than two agents, though in some cases this results in a modification of the theoretical guarantees. One application domain that was considered was two-player no-limit Texas hold 'em. Several of the approaches were utilized to create an agent that came in first place in the most recent (2014) AAAI Annual Computer Poker Competition, beating each opposing agent with statistical significance.

Acknowledgments

I would like to thank my advisor Tuomas Sandholm for allowing me to pursue an extremely ambitious research agenda; a more conservative one would have been significantly less fulfilling. I am glad that we ended up agreeing on Claudico as the name of our man-machine competition agent as opposed to some of Professor Sandholm's earlier suggestions, which included Carnegietastic, Finntastic, and Sharknegie; I don't think those would have had quite the same ring! I would like to thank Dong Kim, Jason Les, Bjorn Li, and Doug Polk for taking it easy on Claudico so that our loss was slightly below the value needed to be statistically significant at the 95% confidence level!

Claudico may not have been possible without the groundwork that was paved by Andrew Gilpin, my predecessor on the poker project. The "workhorse" file is still called "DoylesGame.java," even though the no-limit competition has not been 500 BBs since 2008! Claudico also certainly benefited from having Noam Brown on the team, and I am confident that I am leaving the code in extremely capable hands at my departure. Rest assured, Troels Sørensen, your contributions have not been forgotten, and your code has been pivotal in producing some of the finest betting abstractions in the world to this day. I also enjoyed our conversations on various equilibrium refinements, including trembling-hand perfect equilibrium, quasi-perfect equilibrium, and proper equilibrium (both the normal-form and extensive-form variants!). The same goes for Branislav Bosansky, Jiří Čermák, and Nicola Gatti.

I would like to thank the other members of my thesis committee (Avrim Blum, Michael Bowling, Vincent Conitzer, and Geoffrey Gordon), as well as Kevin Leyton-Brown, who was on my thesis proposal committee. I appreciate the time, effort, and thoughtfulness you each put into providing valuable feedback. I hope that Geoff was not too devastated to learn that the equilibrium strategies he had computed for one-card poker, which had been on his website for many years, were actually (weakly) dominated! It was a close call, but Vince wins the award for longest distance traversed to attend, both for the proposal and the defense! I would also like to thank Michael Wellman for insightful discussions; it's too bad that purification has not yet proven to be as helpful for trading agents as it has for poker agents.
I enjoyed working with Ariel Procaccia on a project for optimizing weights for exam questions to minimize the average squared-error difference from the homework average, which we used as a proxy for the students' "true ability." It is too bad that much of the improvement from the techniques we considered disappeared when all the scores were normalized by the average, but I am optimistic that there is a bright future ahead for this line of inquiry.

I would like to thank everyone who has been a part of the poker competition and workshop over the years (am I the only person who has attended all four events?). I'd particularly like to thank Michael Johanson, who was extremely helpful in providing details on the information abstraction algorithms used by Alberta's poker agents. Without his numerous detailed emails, it would have taken me eons longer to comprehend the differences between perfect- and imperfect-recall abstraction, not to mention distribution-aware abstraction, the merits of earth mover's distance over L2, and "OCHS." It was a pleasure co-organizing the workshop with Eric Jackson in 2014, and I am very appreciative of Nolan Bard, Neil Burch, Eric Jackson, Jonathan Rubin, and Kevin Waugh for their effort running the computer poker competition, as well as Adrian Koy, Jon Parker, and Tim Reiff for many enjoyable conversations.

I would also like to thank Abe Othman (aka "othasaurus"), Daniel Miller (aka "millertie"), Winslow Strong (aka "WINSLOW STRONG"), and Raja "fitbit" Sambasivan for entertaining all my wacky "startup" ideas. I just know that "SHPRATZ" will go bigtime. I will never forget the dentist of Italian descent who rejoined at just the proper time, shrieking "Oh no!" after ripping backhands that exceeded the baseline by not more than two inches. Thank you, HH, for informing me that it probably isn't the greatest idea to wear a t-shirt in my main website/profile photo; they don't teach these things in grad school! Andrew "azntracker" Li, now that you are retired, I feel OK telling you that you call wayyy too wide on the bubble, jeez. Deb Cavlovich and Charlotte Yano have done far more than necessary to ensure the logistical aspects of my time at CMU proceeded impeccably. And finally, a big thank you to Sally, "Big J," and of course "the cowboy," among other things, for all the couscous.

Contents

I   Introduction
1   Overview
2   Game Theory Background
    2.1   Extensive-form games
    2.2   Repeated games
    2.3   Other game representations
3   Poker
4   Leading Game-Solving Paradigm

II  New Approaches for Game Solving within the Leading Paradigm
5   Potential-Aware Imperfect-Recall Abstraction with Earth Mover's Distance in Imperfect-Information Games
    5.1   Potential-aware abstraction
          5.1.1   Domain-independent example
          5.1.2   Poker example
    5.2   Algorithm for potential-aware imperfect-recall abstraction, with EMD