ALGORITHMS AND ASSESSMENT IN COMPUTER POKER

by

Darse Billings

A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Department of Computing Science

Edmonton, Alberta
Fall 2006

University of Alberta

Library Release Form

Name of Author: Darse Billings

Title of Thesis: Algorithms and Assessment in Computer Poker

Degree: Doctor of Philosophy

Year this Degree Granted: 2006

Permission is hereby granted to the University of Alberta Library to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only.

The author reserves all other publication and other rights in association with the copyright in the thesis, and except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatever without the author’s prior written permission.

Darse Billings

Date:

University of Alberta


Faculty of Graduate Studies and Research

The undersigned certify that they have read, and recommend to the Faculty of Graduate Studies and Research for acceptance, a thesis entitled Algorithms and Assessment in Computer Poker submitted by Darse Billings in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Dr. Jonathan Schaeffer Co-Supervisor

Dr. Duane Szafron Co-Supervisor

Dr. Robert Holte

Dr. Michael Carbonaro

Dr. Michael Littman External Examiner

Date:

Dedication

To Howard and Eileen Billings, for their eternal love and support.

And to Xor, for many years of unconditional love and friendship.

Abstract

The game of poker offers a clean well-defined domain in which to investigate some truly fundamental issues in computing science, such as how to handle deliberate misinformation, and how to make intelligent guesses based on partial knowledge.

In the taxonomy of games, poker lies at the opposite end of the spectrum from well-studied board games like checkers and chess. Poker is a multi-player game with stochasticity (random events occurring over a known probability distribution), imperfect information (some information is hidden from each player), and partially observable outcomes (some information might never be known). Consequently, the use of deception, opponent modeling, and coping with uncertainty are indispensable elements of high-level strategic play. Traditional methods for computer game-playing are incapable of handling these properties.

Writing a program to play a skillful game of poker is a challenging proposition from an engineering perspective as well, since each decision must be made quickly (typically about one second). A major theme of this dissertation is the evolution of architectures for poker-playing programs that has occurred since the research began in 1992. Four distinct approaches we address are: knowledge-based systems, simulation, game-theoretic methods, and adaptive imperfect information game-tree search. Each phase of the research explored the strengths and weaknesses of the corresponding architectures, both in theory and in practice. The important problem of obtaining an accurate assessment of performance is also addressed. The creation of a powerful tool for this purpose, called DIVAT, is discussed in detail. The academic papers arising from each of these studies constitute the core chapters of this thesis. The conclusion discusses some of the advantages and shortcomings of each approach, along with the most valuable lessons learned in the process.

Although the goal of surpassing all human players has not yet been attained, the path has been cleared. The best poker programs are beginning to pose a serious threat, even in this most “human” of games. As programs continue to improve, they provide new insights into poker strategy, and valuable lessons on how to handle various forms of uncertainty.

Preface

It is very common for Ph.D. students to be dissatisfied with their dissertation. The feeling is especially acute in my case, because it could reflect poorly on the excellent work of my colleagues.

Due to the scope and collaborative nature of this work, this thesis is not nearly as comprehensive as I would like. Many of the current research topics are beyond the scope of the thesis, and are not discussed. Expedience had to win over idealism. Many excellent comments by my thesis committee (Michael Littman and Robert Holte in particular) were not addressed in this document, due to time constraints. Revisions (and perhaps additional content) might be available in online versions.

The reader is also encouraged to read Morgan Kan’s upcoming M.Sc. thesis for an alternate presentation of the DIVAT method discussed in Chapter 5, and some detailed analysis of the Vegas’05 and AAAI’06 competitions (among other things).

Numerous people have made significant contributions to this research; however, I alone accept the blame for errors and failures to communicate effectively.

Darse Billings, September 27, 2006.

Acknowledgements

I have been truly privileged to work closely with a large team of researchers for many years. In fact, it is fair to say that I have been spoiled, having so much support in developing and implementing my ideas. Consequently, I have many people to thank, and cannot possibly do an adequate job of expressing my gratitude in just a few pages.

I could not have asked for a better thesis committee – their feedback and guidance has simply been outstanding.

Jonathan Schaeffer is the best supervisor I could have ever asked for. He gave me the latitude to create my own research area, to blaze a new path, and to learn from my mistakes along the way. He pushed me when I needed to be pushed. He organized the Computer Poker Research Group (CPRG) to further the research, resulting in more than 20 scientific publications that simply would not have happened otherwise. Apart from his own remarkable achievements, he is a catalyst to the research of all those around him. Although I tease him at every opportunity, it is only because my respect for him runs so deep.

Duane Szafron is one of the finest role models I have ever encountered. He is revered by his students for his patience, wisdom, and diligence to their education. His feedback is invaluable for filling gaps, and greatly improving communication.

Robert Holte is an excellent person to converse with, not only because of his incredible breadth and depth of knowledge, but also for his many interests and anecdotes. His supervision of Bret Hoehn’s M.Sc. thesis provided a comprehensive study of opponent modeling in simpler poker domains, illustrating some of the inherent challenges and limitations we face.

Mike Carbonaro has been a friend for many years, and has provided much encouragement. I am particularly indebted to him for investigating and lobbying for the paper-based thesis option. Although it is a common format in other faculties and universities, this is the first paper-based Ph.D. thesis in Computing Science at the U of A. This option eliminated months of drudgery in re-hashing work that was in some cases more than ten years old. A considerable amount of redundancy is present as a result, but I believe the benefits far outweigh the disadvantages. I hope future students will benefit from the precedent that Mike has championed.

Michael Littman has been an inspiration to me since 1998, when he followed my AAAI talk with the single best example I’ve ever seen of a presentation that is both entertaining and informative. His enthusiasm is infectious, and he is a perfect exemplar of how to do good science and have fun doing it. He provided extensive excellent feedback on the dissertation, only some of which I have had time to incorporate.

Martin Müller was originally on my thesis committee, but was on sabbatical in Europe at the time of the defense. He provided a lot of valuable feedback on earlier drafts of the dissertation, as well as some insightful words of wisdom about the process.

Among my peers, Aaron Davidson has done more than anyone to make me look good. His coding wizardry has had a huge impact on the CPRG. We had way too much fun developing POKI’s online personality (responding to conversation, telling jokes, teaching Latin, and on and on). Aaron’s online poker server has provided an important venue for empirical testing and data collection. As the lead developer of POKER ACADEMY (www.poker-academy.com), he has helped produce the finest poker training software in the world, by far. Apart from these things, his friendship has enhanced my life in innumerable ways.

The other major workhorse for the CPRG has been Neil Burch. He was responsible for assimilating the many concepts and details needed to produce SPARBOT. Half of our papers would not have been possible without his tireless efforts running countless experiments.

Terry Schauenberg was responsible for merging some vague modeling ideas with a concrete search algorithm to produce our temperamental champion, VEXBOT. He constantly surprises me with his keen insights, and previously unknown areas of expertise.

In the past year, Morgan Kan has had an enormous positive impact on my life. He has a knack for asking innocent questions that hit like a punch to the spleen. He is uncompromising in demanding clarity and precision, and equally conscientious in implementation. The DIVAT analysis technique is much stronger as a result of his involvement.

All of the members of the CPRG have made significant contributions to the work. Without the interest expressed by Denis Papp and Lourdes Peña, the CPRG might never have been formed. Finnegan Southey, Martin Zinkevich, Chris Rayner, Nolan Bard, Mike Johanson, Carmelo Piccione, and other new recruits have created a dynamic, energetic environment for discussing various aspects of the problem. Michael Bowling is a brilliant young investigator, and the future of poker research at the U of A is certainly bright under his direction.

I have often said that I am particularly fortunate to be surrounded by so many like-minded (iNTp and iNTj) friends, who insulate me from the cold harsh world, and teach me so much. The value from knowing them is inestimable. Among those not already mentioned are Roel van der Goot, Andreas Junghanns, Yngvi Björnsson, Michael and Karen Buro, Nathan Sturtevant, Jason Spencer, and my lifelong friend, Greg Huber. I have had the good fortune to interact with many other students and faculty, both academically and socially. Getting to know the many members of the GAMES group has been particularly rewarding.

Over the years I have benefited from many interesting discussions with poker players, including Paul Phillips, Michael Maurer, David Sklansky, Mason Malmuth, Lee Jones, Barry Tanenbaum, Steve Brecher, Andy Bloch, Chris Ferguson, Howard Lederer, Annie Duke, Phil Hellmuth, Lou Krieger, , and many others. Numerous late-night conversations with Gautam Rao and Dan Hanson deepened our understanding of the game, and of life.

I have had many fruitful discussions with computer scientists who have made insightful comments or asked pertinent questions to be pondered. Among the most helpful have been Michael Buro, Michael Littman, Brian Sheppard, Gerry Tesauro, Matt Ginsberg, Daphne Koller, Rich Korf, Dana Nau, and Rich Sutton. (I’m sure there are many others who deserve mention, but don’t immediately come to mind).

My parents and family have always provided encouragement and support throughout my life. I love and admire all of them.

Finally, I want to thank the love of my life, Alexandra Baracu, for her love, patience, and perpetual wondrousness.

Contents

1 Introduction
   1.1 Motivation and Historical Development
   1.2 Major Influences
      1.2.1 Classic Books on Poker Strategy
      1.2.2 Fundamental Principles of Game Theory
      1.2.3 Traditional Game-Tree Search
   1.3 Extending Game Tree Representations
   1.4 The University of Alberta Computer Poker Research Group
   1.5 Summary of Contents
      1.5.1 Knowledge-based Methods and Simulation (1997-2001)
      1.5.2 Game-Theoretic Methods (2002-2003)
      1.5.3 Adaptive Imperfect Information Game-Tree Search (2004)
      1.5.4 Assessment of Performance (2005)
      1.5.5 Conclusion
   Bibliography

2 Knowledge-based Methods and Simulation (1997-2001)
   2.1 Introduction
   2.2 Other Research
   2.3 Texas Hold’em
      2.3.1 Rules of Play
      2.3.2 Poker Strategy
      2.3.3 Requirements for a World-Class Poker Player
   2.4 POKI’s Architecture
   2.5 Betting Strategy
      2.5.1 Pre-flop Betting Strategy
         2.5.1.1 Comparing Pre-flop Strategies
         2.5.1.2 Iterated Roll-Out Simulations
      2.5.2 Basic Betting Strategy
         2.5.2.1 Hand Strength
         2.5.2.2 Hand Potential
         2.5.2.3 Effective Hand Strength
         2.5.2.4 Weighting the Enumerations
         2.5.2.5 Probability Triples and Evaluation Functions
      2.5.3 Selective Sampling and Simulation-based Betting Strategy
   2.6 Opponent Modeling
      2.6.1 RoShamBo
      2.6.2 Statistics-based Opponent Modeling
      2.6.3 Neural Networks-based Opponent Modeling
   2.7 Performance Evaluation
      2.7.1 Experimental Methodology
      2.7.2 Experimental Results
   2.8 A Framework for Stochastic Game-Playing Programs
   2.9 Conclusions and Future Work
   Bibliography

3 Game-Theoretic Methods (2002-2003)
   3.1 Introduction
   3.2 Game Theory
   3.3 Texas Hold’em
   3.4 Abstractions
      3.4.1 Betting Round Reduction
      3.4.2 Elimination of Betting Rounds
      3.4.3 Composition of Pre-flop and Post-flop models
      3.4.4 Abstraction by Bucketing
   3.5 Experiments
      3.5.1 Testing Against Computer Players
      3.5.2 Testing Against Human Players
      3.5.3 Testing Against a World-class Player
   3.6 Conclusions and Future Work
   3.7 Chapter 3 Endnotes
      3.7.1 Overlaying Models
      3.7.2 Reverse-Mapping Strategies
   Bibliography

4 Adaptive Imperfect Information Game-Tree Search (2004)
   4.1 Introduction
   4.2 Texas Hold’em Poker
   4.3 Optimal versus Maximal Play
   4.4 Miximax and Miximix Search
   4.5 EV Calculation Example
   4.6 Abstractions for Opponent Modeling
      4.6.1 Motivation for Abstractions
      4.6.2 Abstraction
   4.7 Experiments
   4.8 Conclusions and Future Work
   Bibliography

5 Assessment of Performance (2005)
   5.1 Introduction
   5.2 Motivation
      5.2.1 Example Game
      5.2.2 EVAT and LFAT
   5.3 Development
      5.3.1 Definitions and Metrics
         5.3.1.1 IHR, 7cHR, and EHR
         5.3.1.2 AIE and ROE
      5.3.2 DIVAT Overview
      5.3.3 DIVAT Details and Parameter Settings
      5.3.4 Basic AIE DIVAT Analysis of the Example Game
      5.3.5 Smoothing and Using Equilibrium Baselines
   5.4 Experiments
      5.4.1 Correctness and Variance Reduction
         5.4.1.1 ALWAYS CALL versus ALWAYS RAISE Match
         5.4.1.2 PSOPTI4 Self-play Match
         5.4.1.3 PSOPTI4 versus PSOPTI6 Duplicate Match
      5.4.2 Learning Curves for Non-stationary Players
         5.4.2.1 The 2003 “thecount” versus PSOPTI1 Match
         5.4.2.2 The VEXBOT versus PSOPTI4 Match
      5.4.3 Robustness of DIVAT Difference
      5.4.4 Round-By-Round DIVAT Analysis
   5.5 Conclusions
   Bibliography

6 Conclusion
   6.1 Introduction
   6.2 Rule-based Methods
   6.3 Formula-based Methods
   6.4 Simulation Methods
   6.5 Game-Theoretic Methods
   6.6 Adaptive Imperfect Information Game-Tree Search
   6.7 Conclusions and Future Work
   Bibliography

Appendix A: Glossary of Poker Terms

List of Tables

2.1 Income Rate values versus Sklansky groupings.
2.2 Iterated income rate (profitable hands).
2.3 Hand Potential example.
2.4 Betting scenario for the example game in Section 2.3.2.
3.1 Computer vs computer matches (small bets per game).
3.2 Human vs PSOPTI2 matches.
3.3 Human vs PSOPTI1 matches.
4.1 Expected values for call or raise for selected hand ranks.
4.2 Computer vs computer matches (small bets per game).
4.3 VEXBOT vs human matches.
5.1 Standard (moderate) DIVAT settings.
5.2 Round-by-round analysis of the example game.
5.3 Full-game analysis of the example game (AIE and ROE).

List of Figures

1.1 A portion of a poker game tree, with chance nodes.
1.2 A complete betting round in 2-player Limit poker.
1.3 Information sets in an imperfect information game tree.
1.4 The overall structure of the Texas Hold’em game tree.
2.1 Sample game after the flop.
2.2 The architecture of POKI.
2.3 Hand Strength calculation.
2.4 Hand Potential calculation.
2.5 Updating the weight table.
2.6 Progressive weight tables for opponent EP in the example game.
2.7 A neural network predicting an opponent’s future action.
2.8 POKI’s performance on the IRC poker server (introductory level).
2.9 POKI’s performance on the web applet.
2.10 Framework for two-player, zero-sum, imperfect information games.
2.11 Comparing two search frameworks.
3.1 Branching factors for Hold’em and abstractions.
3.2 Composition of PSOPTI1 and PSOPTI2.
3.3 Transition probabilities (six buckets per player).
3.4 Progress of the “thecount” vs PSOPTI1.
4.1 Betting tree for the EV calculation example.
5.1 DIVAT Difference pseudo-code.
5.2 Roll-out Equity (ROE) pseudo-code.
5.3 High variance in two ALWAYS CALL vs ALWAYS RAISE matches.
5.4 DIVAT Difference analysis of two ALWAYS CALL vs ALWAYS RAISE matches.
5.5 DIVAT Difference analysis and DIVAT Cash baseline for an extended ALWAYS CALL vs ALWAYS RAISE match.
5.6 DIVAT Difference analysis of PSOPTI4 self-play match.
5.7 DIVAT Difference analysis of PSOPTI4 vs PSOPTI6 duplicate match.
5.8 Duplicate Money Average and Duplicate DIVAT Average.
5.9 Duplicate Money Average and Duplicate DIVAT Average over the first 1000 games.
5.10 DIVAT Difference analysis of “thecount” vs PSOPTI1.
5.11 DIVAT Difference analysis of VEXBOT vs PSOPTI4.
5.12 DIVAT Difference analysis over the last 170,000 games.
5.13 Varying DIVAT parameters.
5.14 Round-by-round analysis of PSOPTI4 vs PSOPTI6 duplicate match.
5.15 Round-by-round analysis of “thecount” vs PSOPTI1.
5.16 Round-by-round analysis of VEXBOT vs PSOPTI4.

Chapter 1

Introduction

1.1 Motivation and Historical Development

Games have played an important role in Artificial Intelligence (AI) research since the beginning of the computer era. Many pioneers in computer science spent time on algorithms for chess, checkers, and other games of strategy. A partial list includes such luminaries as Alan Turing, John von Neumann, Claude Shannon, Herbert Simon, Allen Newell, John McCarthy, Arthur Samuel, Donald Knuth, Donald Michie, and Ken Thompson [1].

The study of board games, card games, and other mathematical games of strategy is desirable for a number of reasons. In general, they have some or all of the following properties:

• Games have well-defined rules and simple logistics, making it relatively easy to implement a complete player, allowing more time and effort to be spent on the actual topics of scientific interest.

• Games have complex strategies, and are among the hardest problems known in computational complexity and theoretical computer science.

• Games have a clear specific goal, providing an unambiguous definition of success, and efforts can be focused on achieving that goal.

• Games allow measurable results, either by the degree of success in playing the game against other opponents, or in the solutions to related subtasks.

Apart from the establishment of game theory by John von Neumann, the strategic aspects of poker were not studied in detail by computer scientists prior to 1992 [1]. Poker features many attributes not found in previously studied games (such as checkers and chess), making it an excellent domain for the study of challenging new problems. In terms of the underlying mathematical structure and taxonomy of games, some of the most important properties include the following:

• Poker is a game of imperfect information. Various forms of uncertainty are a natural consequence. This property creates a necessity for using and coping with deception (specifically, bluffing and trapping),1 and ensures a theoretical advantage for the use of randomized mixed strategies.

• Poker has stochastic outcomes. The element of chance (the random dealing of cards) at several stages of the game introduces uncertainty and uncontrollable outcomes. Among other things, this adds a high degree of variance to the results, and makes accurate assessment of performance difficult.

• Hidden states in poker are partially observable. A player can win a game uncontested when all opponents fold, in which case no private information (i.e., the cards held by any of the players) is revealed. Partial observability makes it much more difficult to learn about an opponent’s strategy over the course of many games, both in theory and in practice.

• Poker is a non-cooperative multi-player game. A wealth of challenging problems exist in multi-player games that do not exist in two-player games. Multi-player games are inherently unstable, due in part to the possibility of coalitions (i.e., teams), but those complexities are minimized in a non-cooperative game [60, 63].

As a general class, stochastic imperfect information games with partial observability are among the hardest problems known in theoretical computer science. This class includes many problems that are easy to express but are computationally undecidable [20, 38].

In practice, writing a program to play a legal game of poker is trivial, but designing and implementing a competent poker player (for example, the strength of an intermediate human player) is a challenging task. Writing a program that also adapts smoothly to exploit each opponent’s particular playing style, betting patterns, biases and tendencies is a difficult learning problem.

1 Technical terms and jargon from poker theory appear in bold face italics throughout this dissertation, and are defined in Appendix A: Glossary of Poker Terms.

1.2 Major Influences

Since there was no specific research on poker game-playing in the computer science literature prior to 1992, the mathematical models and scientific methodology for the research project were based on other available sources of knowledge. Three major influences were:

1. Classic books on poker strategy,

2. Fundamental principles of game theory, and

3. Traditional game-playing programs based on game-tree search.

1.2.1 Classic Books on Poker Strategy

The single most important book to date for understanding poker strategy is “The Theory of Poker” by David Sklansky [55]. Other books by Sklansky and frequent co-author Mason Malmuth also provide valuable insights [56, 57]. Additional resources, and their utility for scientific research, are discussed in Billings [1].

Although written for human students of the game, the clear exposition in these texts allows a mathematically inclined reader to gain an appreciation for the underlying logical structure of the game. This insight suggests a wealth of algorithmic possibilities to be explored for knowledge-based approaches. Incorporating probabilistic knowledge into a formula-based architecture was the topic of our early research, and is discussed in Chapter 2. The serious limitations of that approach and lessons learned from the research are discussed in Chapter 6.

1.2.2 Fundamental Principles of Game Theory

The game of poker was used as a model of adversarial conflict in the development of mathematical game theory by John von Neumann in the 1920s [69]. The (general probabilistic) Minimax Theorem proves that for any two-player zero-sum game, there always exists an equilibrium strategy. Using such a strategy ensures (at least) the game-theoretic value of the game, regardless of the opponent’s strategy. Thus, playing an equilibrium strategy would guarantee not losing in the long run (assuming the players alternate positions over many games).2

John Nash extended the idea of equilibrium strategies to non-zero-sum games and multi-player games, again using poker as an example [41]. A set of strategies is said to be in equilibrium when no player can benefit from unilaterally changing their style or strategy [25, 24, 70]. The 1972 book “Winning Poker Systems” by Norman Zadeh attempted to apply game theoretic strategies to a variety of real poker variants, with some degree of success [71, 72, 1].

There are serious limitations to equilibrium strategies in practice, because they are static, are oblivious to the opponent’s strategy, and have implicit assumptions that generally give the opponent far too much credit. These inherent limitations have been clearly demonstrated in the simpler imperfect information games of Rock-Paper-Scissors [4, 2] and Oshi-Zumo [17]. Finding an approximation of an equilibrium strategy for real poker is discussed in Chapter 3. Further insights and limitations of applying game-theoretic methods in general are discussed in Chapter 6.

2 In practice, poker is normally a negative constant sum game, because the host (e.g., casino) charges the players a rake or a time charge. Nevertheless, an equilibrium strategy (or approximations thereof) would be highly useful for playing the real game.
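The point about equilibrium strategies can be made concrete with the Rock-Paper-Scissors example cited above. The following is a minimal sketch (not from the thesis; the function name is illustrative): the uniform mixed strategy yields zero expected value to any best response, whereas any bias can be punished.

```python
# Rock-Paper-Scissors payoffs for the row player against columns (R, P, S).
PAYOFF = [[0, -1, 1],    # Rock
          [1, 0, -1],    # Paper
          [-1, 1, 0]]    # Scissors

def best_response_value(opponent_mix):
    """Expected value of the best pure counter-strategy against a fixed mixed strategy."""
    evs = [sum(PAYOFF[a][b] * opponent_mix[b] for b in range(3)) for a in range(3)]
    return max(evs)

print(best_response_value([1/3, 1/3, 1/3]))     # 0.0  -> the equilibrium cannot be exploited
print(best_response_value([0.5, 0.25, 0.25]))   # 0.25 -> a Rock-heavy bias loses to Paper
```

Note that the equilibrium strategy also gains nothing extra against the biased player, which is precisely the limitation noted above: it is oblivious to the opponent.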

1.2.3 Traditional Game-Tree Search

Many lessons have been learned from traditional high-performance game-playing programs. Research from 1970 to 1990 focused primarily on chess, and other two-player zero-sum games with perfect information. As these programs improved, a recurring theme was an increasing emphasis on computer-oriented solutions, and diminishing reliance on human knowledge [46, 34].

In many games with relatively simple logistics but complex strategy, computer programs have now surpassed the best human players by a vast margin. In each case, the formula for success has been the same: deep look-ahead search of the game tree using the highly efficient alpha-beta search algorithm, combined with a domain-specific evaluation function applied at the nominal leaf nodes of the search frontier [47].

• In 1990, the checkers program CHINOOK earned the right to challenge the human world champion, Marion Tinsley, in a title match. CHINOOK lost narrowly in 1992, but won the return match in 1994, becoming the first computer program to win an official world championship in a game of skill [48]. An effort is now underway to solve the game of checkers, with two of the official tournament openings now proven to be drawn [50].

• In 1997, the Othello program LOGISTELLO defeated the human world champion, Takeshi Murakami, in a six game exhibition match, winning all six games [15, 16].

• In 2000, the Lines of Action program MONA won the de facto world championship, defeating all of the top human players, and winning every game it ever played against human opposition [5, 3].

• In 2002, the ancient game of Awari was strongly solved, computing the exact minimax value for every reachable position in the game [68, 45]. Although the best programs already played at a level far beyond any human player, the difference between super-human play and perfect play was shown to be enormous [44].

• In 1997, the chess machine DEEP BLUE won a short exhibition match against the human world champion, Garry Kasparov, scoring two wins, one loss, and three draws [18]. This led to a widely held but premature belief that chess programs had surpassed all human players. Several years later, the programs SHREDDER, FRITZ, and JUNIOR demonstrated performances on par with the best human players. In 2005, the program HYDRA convincingly defeated one of the strongest human players, Michael Adams, scoring five wins, zero losses, and one draw, providing a strong argument for the dominance of chess programs [23].

Similar successes continue to be obtained for this general class of games, usually with the same architecture of alpha-beta search combined with a good heuristic evaluation. The approach has not been successful for the game of Go, however, owing to the high branching factor and vast search space (for 19 × 19), and the fact that goals and subgoals are very difficult to assess with heuristic evaluation [40].
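For reference, the recipe described above can be sketched in a few lines. This is a generic illustration only; `evaluate`, `children`, and `is_terminal` are hypothetical domain-specific callbacks, not code from any of the programs mentioned.

```python
def alpha_beta(state, depth, alpha, beta, maximizing, evaluate, children, is_terminal):
    """Depth-limited alpha-beta search with a heuristic evaluation at the frontier."""
    if depth == 0 or is_terminal(state):
        return evaluate(state)                     # domain-specific evaluation function
    if maximizing:
        value = float("-inf")
        for child in children(state):
            value = max(value, alpha_beta(child, depth - 1, alpha, beta, False,
                                          evaluate, children, is_terminal))
            alpha = max(alpha, value)
            if alpha >= beta:                      # cutoff: this line is already refuted
                break
        return value
    value = float("inf")
    for child in children(state):
        value = min(value, alpha_beta(child, depth - 1, alpha, beta, True,
                                      evaluate, children, is_terminal))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value
```

It is this divide-and-conquer structure that breaks down for stochastic and imperfect information games, as discussed in the next section.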

1.3 Extending Game Tree Representations

Many games admit some element of random chance. The traditional game tree representation can be extended to handle stochastic outcomes by incorporating chance nodes. Each branch from a chance node represents one of the finite number of random outcomes. From a search perspective, all of these branches must be considered, and combined to obtain an overall expected value (EV), so the size of the game tree grows multiplicatively with each subsequent chance node. The alpha-beta search algorithm is not applicable to this class of problems, but other search algorithms such as *-Minimax [28, 27] and simulation methods [67, 54] are able to contend with this form of uncertainty adequately. The property of stochasticity has not been a major impediment to world-class computer performance in practice.

The game of backgammon is a classic example of a perfect information game with an element of stochasticity (the roll of the dice). Excellent evaluation functions have been learned automatically from self-play [66, 67], resulting in several programs that are at least on par with the best human players, without requiring deep search [27].

Multi-player games are much more challenging, both in theory and in practice; but to date they have not received a lot of attention in AI research. Several search algorithms for multi-player game trees are known in the literature, but the potential for guaranteed safe pruning of the search space is much lower than that enjoyed by the alpha-beta algorithm [59, 61, 64]. This fact, combined with the larger branching factor resulting from many players, means that deep search is less feasible, in general.

Moreover, multi-player games are inherently unstable, being subject to possible collusion between players.3 In practice, minor differences in search algorithms for multi-player game trees can produce radically different results. For example, two different move choices could be exactly equal in value for Player A, but could dictate whether Player B or Player C wins the game. Due to these volatile conditions, good opponent modeling (for example, knowing each player’s method of tie-breaking between equal moves) is necessary to obtain robust and reliable results [60, 62, 63, 65].

3 This is true even for ostensibly non-cooperative games, like poker, since that ethic cannot be enforced, in general. John von Neumann showed that multi-player games become stable only when they devolve into a two-player game between two coalitions [69].

However, the major distinguishing factor between poker and other games is the property of imperfect information, the effects of which can range from obvious to subtle, from inconsequential to profound. One important consequence is that a complete strategy in poker must include a certain degree of deception, such as bluffing (betting or raising with a weak hand) and trapping (playing a strong hand as though it were weak). This fact was one of the earliest results in game theory [69]. The objective of these deception tactics is to disguise the strength of one’s hand (called information hiding), and to create uncertainty in the beliefs of the opponent, resulting in greater profitability overall.

The relative balance of these deceptive plays (and of responses to the opponent’s actions) is of critical importance. Any inappropriate imbalances necessarily imply the existence of statistical biases, patterns, or other weaknesses that are vulnerable to exploitation. Since there may be many ways of obtaining the desired balance of plays in poker, the players have some discretion in how they actually achieve that balance. For example, a particular situation might call for a 10% bluff frequency, but the player is otherwise free to decide when to bluff or not bluff.

As a result, there is in general no single best move in a given poker situation. This is in stark contrast to a perfect information game, where there is a single move, or small set of moves, that preserves the game-theoretic value of the game. In backgammon, for example, there is typically only one move in a given position that will maximize the expected value against perfect play.

Furthermore, theoretically correct play in an imperfect information game requires probabilistic mixed strategies, where different moves are chosen some fraction of the time in identical circumstances. In contrast, a deterministic pure strategy (always playing one particular move in a given situation) is sufficient to obtain the game-theoretic value in a perfect information game (although the player may choose randomly from a set of equal-valued moves).

Game trees can be further extended to handle imperfect information games, with the inclusion of information sets. An information set is a set of decision nodes in the game tree that cannot be distinguished from the perspective of a given player. Since the opponent’s cards are hidden in poker, this corresponds to the complete set of all possible opponent holdings in a given situation. Obviously, the same policy (such as a particular mixed strategy) must be applied identically to all of the nodes in the information set, since it is not possible to know precisely which of those states we are in. The immediate consequence is that nodes of an imperfect information game tree are not independent, in general.4 Thus, a divide-and-conquer search algorithm, such as the alpha-beta minimax technique, is not applicable to this class of problems, since sub-trees cannot be handled independently.

4 A perfect information game tree can be thought of as a special case in which all decision nodes belong to their own unique information set.

Another characteristic that distinguishes poker from perfect information board games is that it is not enough to simply “play well”, while largely ignoring the existence of the opponent. To maximize results, it is absolutely essential to understand the opponent’s style, and the nature of their errors (such as patterns or tendencies in their play).

As a simple demonstration, consider two opponents, one of whom bluffs far too often, the other of whom seldom bluffs. Both are “weak players”, in an objective sense. To maximize our profit against the former, we call (or perhaps raise) more often with mediocre hands. To maximize against the latter, we fold more often with marginal hands. Our strategy adjustments are diametrically opposite, depending on the nature of the opponent’s predictable weaknesses. In perfect information games, simply playing strong moves will naturally punish weaker moves, and it is not necessary to understand why an opponent is weak. Opponent modeling has been investigated in chess and other two-player perfect information games, but has not led to significant improvements in performance [33, 31, 32, 19].

An interesting case study is the game of Scrabble. Although Scrabble is technically a game of imperfect information, that property plays a relatively minor role in strategy. Super-human performance has been attained for the two-player game without special consideration for opponent modeling. Relatively simple techniques, such as Monte Carlo simulation and selective sampling, can be used to account for the unknown information adequately [52, 54]. Moreover, the strengths of the computer player, including perfect knowledge of the dictionary and the ability to consider every legal play, are sufficient to surpass all human players in skill [53].

Figure 1.1 shows a small portion of the imperfect information game tree for any Limit poker variant, featuring decision nodes for each player during a betting round. In general, a player will choose one of three possible actions: fold (f), call (c), or raise (r). By convention, capital letters are used to indicate the actions of the second player. When the betting round is complete, the game is either over (one player folded, leading to a terminal node), or the game continues with the next chance event (cards being dealt). Figure 1.2 shows the game tree for a complete betting round of 2-player Limit poker (with a maximum of three raises per round).

Figure 1.1: A portion of a poker game tree, with chance nodes.

Figure 1.2: A complete betting round in 2-player Limit poker.

Figure 1.3 illustrates the notion of information sets in the game of 2-player Limit Texas Hold’em. Only three of the 1,624,350 branches from the initial chance node (i.e., the dealing of the hole cards) are shown. For any given hand, a player will have 1225 indistinguishable states, since the opponent’s cards are not known. Naturally, the same decision policy will apply to all states in that information set.

Figure 1.3: Information sets in an imperfect information game tree.

Figure 1.4 shows a high-level view of the structure of Texas Hold’em. Each betting round is depicted with a triangle, and corresponding chance nodes are collected to indicate the stage of the hand. The numbers on the left indicate the branching factors at each stage, leading to more than a quintillion (1,179,000,604,565,715,751) nodes of all types.

Figure 1.4: The overall structure of the Texas Hold’em game tree.

Defining appropriate search algorithms for this fundamentally different mathematical structure of game tree is discussed in Chapter 4. The problems encountered and the necessary modifications for future research are discussed in Chapter 6.
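The combinatorial figures quoted above can be checked directly; a quick sketch using Python’s standard library (not part of the thesis):

```python
from math import comb

initial_chance_branches = comb(52, 2) * comb(50, 2)   # both players' hole cards
states_per_information_set = comb(50, 2)              # unseen opponent holdings

print(initial_chance_branches)       # 1624350 branches from the initial chance node
print(states_per_information_set)    # 1225 indistinguishable states for a given hand
```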

1.4 The University of Alberta Computer Poker Research Group

The University of Alberta Computer Poker Research Group (CPRG) is the major contributor to the academic literature on poker game-playing AI. The purpose of this section is to explain the structure of the research group and the roles of the members. Since it is a collaborative team effort, it is necessary to identify the specific contributions made by this author, distinguishing them from the work of other members, and the group as a whole. All conceptual designs, architectures, and specific component algorithms discussed in this dissertation are attributable to the author unless noted otherwise. The use of the words “our” and “we” in this document refers to the group as a whole.

The research began in 1992 with scientific foundations, methodologies, and research philosophy [1]. This included a complete basic implementation, along with computer-oriented algorithms (rather than knowledge-based methods) for advanced hand assessment, simulation techniques, and other essential functions. The CPRG was formed in 1997 to follow up on this work. The author is the lead architect for the group, and the domain expert.5

Dr. Jonathan Schaeffer is a co-founder, scientific advisor, and the administrative head of the CPRG. Dr. Duane Szafron is also a co-founder and scientific advisor. Dr. Robert Holte joined the group in 2001, contributing expertise in machine learning and linear programming. Dr. Michael Bowling joined the group in 2004, adding more knowledge in game theory and learning algorithms. Several M.Sc. students, summer students, and one full-time programmer/analyst have contributed to implementations and experimentation of the resulting systems.

5 The author played poker professionally from 1996 to 1999, after several years of studying and extending poker theory.

• Denis Papp (M.Sc. student) constructed the original LOKI system, in C++, re-implementing the author’s Monte Carlo simulation and weighted enumeration algorithms for hand assessment, along with numerous other components (discussed in Chapter 2) [42]. He incorporated the GNU poker library high-speed hand comparators as a core function [39]. He implemented all of the communication protocols to enable LOKI to participate in poker games on the IRC Online Poker Server [14].

• Lourdes Peña (M.Sc. student) built on top of the existing system (LOKI II) for the first implementation of selective simulation techniques and the subsequent experiments [43, 11].

• Aaron Davidson (M.Sc. student) re-wrote the entire codebase (re-christened POKI), in Java, using native methods where necessary to maintain high-speed performance. He performed code reviews with the author, discovering and correcting numerous errors, and made significant improvements to many components. The neural network approach for opponent modeling was entirely his own design [22, 7, 21]. Aaron developed test suites for conducting experiments, and wrote the University of Alberta online poker server, allowing extensive empirical testing. He also proposed new simulation methods to reduce the problem of compounding errors with sequential actions. Those ideas were refined and reformulated by the author as the Miximax and Miximix algorithms for imperfect information game-tree search (discussed in Chapter 4). Aaron then implemented and co-developed refinements for those systems [8].

• Neil Burch (programmer/analyst) implemented numerous algorithms and support routines, and performed many of the scientific experiments reported in CPRG publications. He developed a system for specifying general poker game definitions and converting them into the sequence form linear program encoding described by Koller et al. [36, 37]. Neil oversaw all related computations, using a commercial linear program engine (CPLEX) to produce the game-theoretic equilibrium solutions (discussed in Chapter 3) [6]. He also wrote alternate implementations of adaptive architectures (discussed in Chapter 4), for the purposes of testing and comparison [8].

• Terence Schauenberg (M.Sc. student) implemented the adaptive Miximax algorithm; co-developed the data structures, parameters, and abstractions used in VEXBOT; and performed related experiments (discussed in Chapter 4) [8, 51]. He implemented the author’s Expected Value Assessment Tool (EVAT) and Luck Filtering Assessment Tool (LFAT), which were precursors to the Ignorant Value Assessment Tool (DIVAT) performance metric (discussed in Chapter 5) [10]. Terence has also investigated a variety of methods for learning approximations of Nash-equilibrium solutions by means of fictitious play.

• Bret Hoehn (M.Sc. student) performed an independent study of opponent modeling, under the direction of Dr. Holte. He used the tiny game of Kuhn poker to reduce the complexity of learning an opponent’s weaknesses and quickly adopting an appropriate counter-strategy [30, 29]. Serious limitations are encountered despite the large reduction in pertinent variables, demonstrating some of the fundamental impediments to rapid learning and adaptation in partially observable stochastic domains.

• Morgan Kan (M.Sc. student) implemented the author’s DIVAT method for direct assessment of poker decision quality, and performed numerous experiments during its development that led to deeper insights into the problem (discussed in detail in Chapter 5) [10].

The research group has expanded rapidly in recent years, with the addition of post-doctoral fellows Finnegan Southey and Martin Zinkevich; M.Sc. students Chris Rayner, Nolan Bard, and Mike Johanson; and research associate Carmelo Piccione. The research has also branched out with several new topics (which are outside of the scope of this thesis), including development of the author’s pdf-cutting algorithm for creating parameterized probabilistic profiles of the poker strategy space, and new methods for rapid learning using Bayesian inference methods [58].

1.5 Summary of Contents

This thesis identifies four distinct approaches to computer poker-playing, with a corresponding program architecture designed for each technique. Each approach has proven to be highly successful, despite the inherent theoretical limitations. Each generation has superseded the previous one by addressing the most important limitations discovered during the extensive empirical testing, which includes millions of games played. The core chapters of this paper-based thesis are comprised of the academic papers that stemmed from each of these studies.

1.5.1 Knowledge-based Methods and Simulation (1997-2001)

The first two approaches, discussed in Chapter 2, are formula-based strategies and simulation. Formula-based methods are a generalization of the somewhat intuitive but overly-simplistic method of deterministic rule-based systems. Various forms of simulation are an important technique for enhancing the performance of established programs, or for playing the game directly. The representative paper for the formula-based and simulation methodology is “The Challenge of Poker”, published in the journal Artificial Intelligence [7]. The paper subsumes most of the previous work by the CPRG [13, 12, 42, 14, 11, 43, 49, 22]. Some of the most important contributions of this work include:

• Expert systems for the (relatively uncomplicated) strategy of the first betting round (the pre-flop), based on values determined by Monte Carlo roll-out simulations.

• Exhaustive enumeration algorithms for the assessment of hand quality (hand strength and hand potential); a toy sketch of this style of enumeration is shown after this list.

• Selective simulation techniques for enhancing and refining expected value estimates.

• Statistical opponent modeling, and routines for the utilization, maintenance, and updating of relevant belief states.

• Procedures and advanced modules for post-flop betting strategy, incorporating general and specific opponent modeling, and including occasional deceptive plays (bluffing and trapping).

• The only poker programs (POKI and its derivatives) that are known to play better than an average human player who plays in low-limit casino games.
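The following is a toy sketch of the exhaustive enumeration idea referred to in the list above (immediate hand strength). It is not POKI’s implementation; `best_rank` is a stand-in for any 7-card hand evaluator, such as the comparators mentioned in Section 1.4, and counting ties as half is one common convention.

```python
from itertools import combinations

def hand_strength(our_cards, board, best_rank):
    """Fraction of possible opponent holdings that we currently beat or tie,
    by brute-force enumeration of all two-card combinations of unseen cards."""
    deck = [(rank, suit) for rank in range(2, 15) for suit in "cdhs"]
    seen = set(our_cards) | set(board)
    unseen = [card for card in deck if card not in seen]
    ours = best_rank(list(our_cards) + list(board))
    ahead = tied = behind = 0
    for opp_cards in combinations(unseen, 2):
        theirs = best_rank(list(opp_cards) + list(board))
        if ours > theirs:
            ahead += 1
        elif ours == theirs:
            tied += 1
        else:
            behind += 1
    return (ahead + tied / 2) / (ahead + tied + behind)
```

Hand potential works along the same lines, but additionally enumerates (or samples) possible future board cards to estimate the chance of improving to the best hand.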

In recent years, numerous hobbyists and researchers have referred to these early publications, and based their poker programs on those architectures. They have invariably discovered the advantages and the inherent limitations of knowledge-based systems for themselves.

1.5.2 Game-Theoretic Methods (2002-2003)

The third approach, discussed in Chapter 3, is based on game theory. This addresses the serious shortcomings of the formula-based approach in achieving a well-balanced betting strategy, with an appropriate ratio of deceptive plays (bluffs and traps) in relation to the frequency of legitimate bets, calls, and folds.

The corresponding paper, “Approximating Game-Theoretic Optimal Strategies for Full-scale Poker”, won the Distinguished Paper Award at the International Joint Conference on Artificial Intelligence in 2003 [6]. Some of the most important contributions of this work include:

• Abstraction techniques for exact and near-exact reformulation of defined poker games, yielding reductions of the problem size by about two orders of magnitude.

• Crude but powerful abstraction techniques, capable of reductions of the problem size by more than ten orders of magnitude (from 10^18 states to less than 10^8 states), but with no guarantees on error bounds. These severe abstractions nevertheless maintain the key properties and relationships of the game, such that exact solutions to the abstract game provide reasonable approximations for use in the full-scale game (a toy illustration of this kind of bucketing abstraction is shown after this list).

• Poker programs (known collectively as PSOPTI or SPARBOT) that exhibit a vast improvement in skill for two-player Limit Texas Hold’em.

• The first demonstration of a program that could be competitive with a world-class player.
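To give a flavour of the bucketing abstraction referred to above, here is a toy sketch (not the actual abstraction used for PSOPTI): hands are mapped to a small number of strength classes, so the astronomical space of card combinations collapses to a handful of abstract states per betting round. The figure of six buckets per player comes from the experiments listed in Chapter 3; the uniform bucket widths below are an assumption for illustration only.

```python
def strength_bucket(hand_strength, n_buckets=6):
    """Toy abstraction: map a hand-strength value in [0, 1] to one of
    n_buckets classes (real abstractions can use more refined groupings,
    for example handling high-potential drawing hands separately)."""
    return min(int(hand_strength * n_buckets), n_buckets - 1)

# With six buckets per player, a betting round distinguishes only 6 * 6 = 36
# abstract card states instead of millions of concrete card combinations.
print(strength_bucket(0.83))   # -> bucket 4 of 0..5
```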

Several other researchers have recently built on this work, including Andrew Gilpin and Tuomas Sandholm at Carnegie Mellon University [26].

1.5.3 Adaptive Imperfect Information Game-Tree Search (2004)

The fourth approach, discussed in Chapter 4, is based on imperfect information game-tree search, with built-in data structures for opponent modeling and adaptive play. This addresses the serious shortcomings of the game-theoretic and formula-based approaches in rapidly adapting to the opponent’s style of play, exploiting biases and predictable patterns, and making it much more challenging to learn against the program. The Miximax and Miximix algorithms accommodate the more general class of game trees where some domain information is hidden from one or more players, and where each decision node may be associated with a randomized mixed strategy, rather than a single action.

The related paper is “Game-Tree Search with Adaptation in Stochastic Imperfect Information Games”, from the 2004 Computers and Games conference [8]. Some of the most important contributions of this work include:

• A generalized framework for stochastic imperfect information games based on generalizations of the (perfect information) Expectimax algorithm (a minimal sketch of the idea is shown after this list).

• Refined methods for opponent modeling, with direct applicability to expected value calculations for each available action.

• Abstraction techniques for partitioning distinct betting sequences into a manageable number of highly correlated situations.

• The experimental poker program VEXBOT, which eventually learns to defeat all known programs by a large margin, and can provide a serious threat to world-class players.
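In the spirit of the generalized Expectimax framework in the list above, the following sketch computes the expected value of a node by maximizing at our own decision nodes and taking probability-weighted averages at opponent and chance nodes. It is only an illustration of the idea: the node structure and the `opponent_model` callback are hypothetical, not the data structures of the actual Miximax/Miximix implementation.

```python
def expected_value(node, opponent_model):
    """EV over a poker-like tree with hypothetical node kinds: 'leaf' (payoff),
    'our' (pick the max-EV action), 'opp' (weight actions by a model of the
    opponent's mixed strategy), and 'chance' (weight outcomes by probability)."""
    if node.kind == "leaf":
        return node.payoff
    if node.kind == "our":
        return max(expected_value(child, opponent_model)
                   for child in node.children.values())
    if node.kind == "opp":
        action_probs = opponent_model(node)   # e.g. {"fold": 0.1, "call": 0.6, "raise": 0.3}
        return sum(prob * expected_value(node.children[action], opponent_model)
                   for action, prob in action_probs.items())
    if node.kind == "chance":
        return sum(prob * expected_value(child, opponent_model)
                   for prob, child in node.outcomes)
    raise ValueError(f"unknown node kind: {node.kind!r}")
```

Because the opponent’s decision nodes are averaged under a learned model rather than minimized, the quality of that model directly determines the quality of the computed expected values.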

17 1.5.4 Assessment of Performance (2005)

Chapter 5 addresses the difficult issue of performance assessment in poker. Unfortunately, measuring the performance of a poker program simply by playing games requires many thousands of trials to produce a single data point, which is then only relevant to that one narrow set of preconditions. Moreover, performance in poker is decidedly non-transitive: “A beats B” and “B beats C” does not imply that “A beats C”, nor does it say anything about the relative magnitude of win rates against future opponents. The outcome of any particular match may be governed by a clash of styles, rather than the objective strengths of the players. Testing against a wide variety of opponents is essential, but is not guaranteed to be sufficient.

To combat these serious obstacles, the author invented the Ignorant Value Assessment Tool (DIVAT).6 Similar metrics (called EVAT and LFAT) were developed previously for analyzing experiments and matches, but they had serious shortcomings. DIVAT provides an objective means of accurately assessing decision quality, with a large reduction in the natural variance of outcomes. The tool is based on a hindsight expected value assessment of each decision, comparing the actual equities against a theoretically motivated baseline [9, 35]. The paper “A Tool for the Direct Assessment of Poker Decisions” has been accepted for publication in the International Computer Games Association Journal [10].
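The variance-reduction principle behind this kind of hindsight assessment can be illustrated with a toy simulation. This shows only the general idea of subtracting a zero-mean, luck-correlated baseline from each outcome; it is not the DIVAT algorithm, and all numbers below are invented for illustration.

```python
import random

def toy_skill_estimates(n_games=10_000, true_skill=0.05, seed=1):
    """Two estimates of a player's win rate: the raw average of outcomes, and
    the average after subtracting an (imperfect) hindsight estimate of the luck
    in each game. Both converge to true_skill, but the second is far less noisy."""
    rng = random.Random(seed)
    raw, corrected = [], []
    for _ in range(n_games):
        luck = rng.gauss(0.0, 6.0)                # large zero-mean chance component
        outcome = true_skill + luck               # observed result of one game
        baseline = luck + rng.gauss(0.0, 1.0)     # imperfect hindsight view of the luck
        raw.append(outcome)
        corrected.append(outcome - baseline)
    return sum(raw) / n_games, sum(corrected) / n_games

print(toy_skill_estimates())   # the raw mean wanders widely; the corrected mean is close to 0.05
```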

1.5.5 Conclusion

Chapter 6 concludes the thesis with a retrospective look at some of the most important lessons that have been learned over the years. A major theme that ties these publications together is the evolution of architectures for poker programs. Each approach has both theoretical and practical limitations. Some of these limitations were known before the system was built, but the full implications can only be understood after many implementations and refinements are tested. Recurring themes include the need for well-balanced betting strategies, better opponent modeling, and faster learning and adaptation.

6 The ‘D’ refers to the author’s first initial.

For each architecture, program development is often a cyclic process, with each iteration introducing an improved method for handling a particular aspect of the game that had become the limiting factor to performance. In some cases, the cycle was very long and arduous, with some “temporary” components not being re-visited again for years.

There has always been a healthy interplay between theory and practice. Diminishing returns from these refinements help identify fundamental limitations that necessitate a revolutionary change – a new approach and new architecture that does a much better job of addressing some critical strategic aspect of the game. Ultimately, we seek unifying methods that reduce the complexity of the system, and eliminate human intervention, allowing the program to “think for itself”.

Although much work remains to be done, poker programs have evolved from very weak players to programs that are a serious threat to world-class players. The past successes and failures suggest what types of solutions are the most viable in general, and which directions of research will be most fruitful in the future.

Bibliography

[1] D. Billings. Computer Poker. Master’s thesis, Department of Computing Science, University of Alberta, 1995.
[2] D. Billings. The First International RoShamBo Programming Competition. The International Computer Games Association Journal, 23(1):42–50, 2000.

[3] D. Billings. MONA and YL’s Lines of Action page. World Wide Web, 2000. http://www.cs.ualberta.ca/˜games/LOA/. [4] D. Billings. Thoughts on RoShamBo. The International Computer Games Association Journal, 23(1):3–8, 2000. [5] D. Billings and Y. Bjornsson.¨ Search and knowledge in Lines of Action. In H. J. van den Herik, H. Iida, and E. A. Heinz, editors, Advances in Computer Games 10: Many Games, Many Challenges, ACG’04, pages 231–248. Kluwer Academic, 2004. [6] D. Billings, N. Burch, A. Davidson, T. Schauenberg, R. Holte, J. Schaeffer, and D. Szafron. Approximating game-theoretic optimal strategies for full- scale poker. In The Proceedings of the Eighteenth International Joint Confer- ence on Artificial Intelligence, IJCAI’03, pages 661–668, 2003. [7] D. Billings, A. Davidson, J. Schaeffer, and D. Szafron. The challenge of poker. Artificial Intelligence, 134(1–2):201–240, January 2002. [8] D. Billings, A. Davidson, T. Schauenberg, N. Burch, M. Bowling, R. Holte, J. Schaeffer, and D. Szafron. Game-tree search with adaptation in stochas- tic imperfect-information games. In H. J. van den Herik, Y. Bjornsson,¨ and N. Netanyahu, editors, Computers and Games: 4th International Conference, CG’04, LNCS 3846, pages 21–34. Springer-Verlag GmbH, 2004. [9] D. Billings and M. Kan. Development of a tool for the direct assessment of poker decisions. Technical Report TR06-07, University of Alberta Depart- ment of Computing Science, April 2006. [10] D. Billings and M. Kan. A tool for the direct assessment of poker decisions. The International Computer Games Association Journal, 2006. To appear. [11] D. Billings, D. Papp, L. Pena,˜ J. Schaeffer, and D. Szafron. Using selective- sampling simulations in poker. In American Association of Artificial Intel- ligence Spring Symposium on Search Techniques for Problem Solving under Uncertainty and Incomplete Information, pages 13–18. American Association of Artificial Intelligence, 1999. [12] D. Billings, D. Papp, J. Schaeffer, and D. Szafron. Opponent modeling in poker. In American Association of Artificial Intelligence National Conference, AAAI’98, pages 493–499, 1998. [13] D. Billings, D. Papp, J. Schaeffer, and D. Szafron. Poker as a testbed for machine intelligence research. In R. Mercer and E. Neufeld, editors, Advances in Artificial Intelligence, AI’98, pages 228–238. Springer-Verlag, 1998. [14] D. Billings, L. Pena,˜ J. Schaeffer, and D. Szafron. Using probabilistic knowl- edge and simulation to play poker. In American Association of Artificial In- telligence National Conference, AAAI’99, pages 697–703, 1999.

20 [15] M. Buro. The Othello match of the year: Takeshi Murakami vs. Logistello. ICCA Journal, 20(3):189–193, 1997. [16] M. Buro. Improving heuristic mini-max search by supervised learning. Arti- ficial Intelligence, 134(1–2):85–99, 2002. [17] M. Buro. Solving the Oshi-Zumo game. In H. J. van den Herik, H. Iida, and E. A. Heinz, editors, Advances in Computer Games 10: Many Games, Many Challenges, pages 361–366. Kluwer Academic, 2004. [18] M. Campbell, A. J. Hoane, and F-h. Hsu. Deep Blue. Artificial Intelligence, 134(1–2):57–83, 2002. [19] D. Carmel and S. Markovitch. Incorporating opponent models into adversary search. In American Association of Artificial Intelligence National Confer- ence, AAAI’96, pages 120–125, 1996. [20] A. Condon. On algorithms for simple stochastic games. In J. Cai, editor, Advances in Computational Complexity Theory, volume 13 of DIMACS Se- ries in Discrete Mathematics and Theoretical Computer Science, pages 51–73. American Mathematical Society, 1993. [21] A. Davidson. Opponent modeling in poker. Master’s thesis, Department of Computing Science, University of Alberta, 2002. [22] A. Davidson, D. Billings, J. Schaeffer, and D. Szafron. Improved opponent modeling in poker. In International Conference on Artificial Intelligence, ICAI’00, pages 1467–1473, 2000. [23] C. Donninger and U. Lorenz. Hydra chess webpage. World Wide Web, 2005. http://hydrachess.com/. [24] D. Fudenberg and D. K. Levine. The Theory of Learning in Games. MIT Press, May 1998. [25] D. Fudenberg and J. Tirole. Game Theory. MIT Press, August 1991. [26] A. Gilpin and T. Sandholm. A competitive Texas Hold’em poker player via automated abstraction and real-time equilibrium computation. In American Association of Artificial Intelligence National Conference, AAAI’06, pages 1007–1013, July 2006. [27] T. Hauk, M. Buro, and J. Schaeffer. *-minimax performance in backgammon. In H. J. van den Herik, Y. Bjornsson,¨ and N. Netanyahu, editors, Computers and Games: 4th International Conference, CG’04, Ramat-Gan, Israel, July 5- 7, 2004. Revised Papers, volume 3846 of Lecture Notes in Computer Science, pages 51–66. Springer-Verlag GmbH, 2004. [28] T. Hauk, M. Buro, and J. Schaeffer. Rediscovering *-minimax search. In H. J. van den Herik, Y. Bjornsson,¨ and N. Netanyahu, editors, Computers and Games: 4th International Conference, CG’04, Ramat-Gan, Israel, July 5-7, 2004. Revised Papers, volume 3846 of Lecture Notes in Computer Science, pages 35–50. Springer-Verlag GmbH, 2004. [29] B. Hoehn. The effectiveness of opponent modelling in a small imperfect infor- mation game. Master’s thesis, Department of Computing Science, University of Alberta, 2006.

[30] B. Hoehn, F. Southey, R. Holte, and V. Bulitko. Effective short-term opponent exploitation in simplified poker. In American Association of Artificial Intelligence 20th National Conference, AAAI'05, pages 783–788, July 2005.
[31] H. Iida, J. Uiterwijk, H. J. van den Herik, and I. Herschberg. Potential applications of opponent-model search. ICCA Journal, 16(4):201–208, 1993.
[32] H. Iida, J. Uiterwijk, H. J. van den Herik, and I. Herschberg. Thoughts on the application of opponent-model search. In Advances in Computer Chess 7, pages 61–78. University of Maastricht, 1995.
[33] P. Jansen. Using Knowledge About the Opponent in Game-Tree Search. PhD thesis, School of Computer Science, Carnegie-Mellon University, 1992.
[34] A. Junghanns and J. Schaeffer. Search versus knowledge in game-playing programs revisited. In The International Joint Conference on Artificial Intelligence, IJCAI'97, pages 692–697, 1997.
[35] M. Kan. Post-game analysis of poker decisions. Master's thesis, Department of Computing Science, University of Alberta, 2006. In preparation.
[36] D. Koller, N. Megiddo, and B. von Stengel. Fast algorithms for finding randomized strategies in game trees. In Annual ACM Symposium on Theory of Computing, STOC'94, pages 750–759, 1994.
[37] D. Koller and A. Pfeffer. Representations and solutions for game-theoretic problems. Artificial Intelligence, 94(1):167–215, 1997.
[38] O. Madani, A. Condon, and S. Hanks. On the undecidability of probabilistic planning and infinite-horizon partially observable Markov decision process problems. Artificial Intelligence, 147(1-2):5–34, 2003.
[39] M. Maurer, B. Goetz, and L. Dachary. Gnu poker evaluation library. WWW, 1997. http://sourceforge.net/projects/pokersource/, http://pokersource.sourceforge.net/.
[40] M. Müller. Computer Go. Artificial Intelligence, 134(1–2):134–179, 2002.
[41] J. F. Nash. Equilibrium points in N-person games. Proceedings of the National Academy of Sciences, 36:48–49, 1950.
[42] D. Papp. Dealing with imperfect information in poker. Master's thesis, Department of Computing Science, University of Alberta, 1998.
[43] L. Peña. Probabilities and simulations in poker. Master's thesis, Department of Computing Science, University of Alberta, 1999.
[44] J. W. Romein and H. E. Bal. Awari is solved. The International Computer Games Association Journal, 25(3):162–165, September 2002.
[45] J. W. Romein and H. E. Bal. Solving Awari with parallel retrograde analysis. IEEE Computer, 36(10):26–33, October 2003.
[46] J. Schaeffer. Experiments in Search and Knowledge. PhD thesis, Department of Computer Science, University of Waterloo, 1986.

[47] J. Schaeffer. The history heuristic and the performance of alpha-beta enhancements. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(11):1203–1212, 1989.
[48] J. Schaeffer. One Jump Ahead: Challenging Human Supremacy in Checkers. Springer-Verlag, 1997.
[49] J. Schaeffer, D. Billings, L. Peña, and D. Szafron. Learning to play strong poker. In The International Conference on Machine Learning Workshop on Game Playing. J. Stefan Institute, 1999. Invited paper.
[50] J. Schaeffer, Y. Björnsson, N. Burch, A. Kishimoto, M. Müller, R. Lake, P. Lu, and S. Sutphen. Solving checkers. In The International Joint Conference on Artificial Intelligence, IJCAI'05, pages 292–297, 2005.
[51] T. Schauenberg. Opponent modelling and search in poker. Master's thesis, Department of Computing Science, University of Alberta, 2006.
[52] B. Sheppard. Toward Perfection of Scrabble Play. PhD thesis, Computer Science, University of Maastricht, 2002.
[53] B. Sheppard. World-championship-caliber Scrabble. Artificial Intelligence, 134(1–2):241–275, 2002.
[54] B. Sheppard. Efficient control of selective simulations. In H. J. van den Herik, Y. Björnsson, and N. Netanyahu, editors, Computers and Games: 4th International Conference, CG'04, Ramat-Gan, Israel, July 5-7, 2004. Revised Papers, volume 3846 of Lecture Notes in Computer Science, pages 1–20. Springer-Verlag GmbH, 2004.
[55] D. Sklansky. The Theory of Poker. Two Plus Two Publishing, 1992.
[56] D. Sklansky and M. Malmuth. Hold'em Poker for Advanced Players. Two Plus Two Publishing, 2nd edition, 1994.
[57] D. Sklansky and M. Malmuth. 2+2 website and poker discussion forum. WWW, 1998. http://www.twoplustwo.com/.
[58] F. Southey, M. Bowling, B. Larson, C. Piccione, N. Burch, D. Billings, and C. Rayner. Bayes' bluff: Opponent modelling in poker. In 21st Conference on Uncertainty in Artificial Intelligence, UAI'05, pages 550–558, July 2005.
[59] N. Sturtevant. On pruning techniques for multi-player games. In American Association of Artificial Intelligence National Conference, AAAI'00, pages 201–207, 2000.
[60] N. Sturtevant. A comparison of algorithms for multi-player games. In J. Schaeffer, M. Müller, and Y. Björnsson, editors, Computers and Games 2002, LNCS 2883, pages 108–122. Springer-Verlag, 2002.
[61] N. Sturtevant. Last-branch and speculative pruning algorithms for Maxn. In The International Joint Conference on Artificial Intelligence, IJCAI'03, pages 669–675, 2003.
[62] N. Sturtevant. Multi-Player Games: Algorithms and Approaches. PhD thesis, Department of Computer Science, University of California, Los Angeles (UCLA), 2003.

[63] N. Sturtevant. Current challenges in multi-player game search. In H. J. van den Herik, Y. Björnsson, and N. Netanyahu, editors, Computers and Games: 4th International Conference, CG'04, Ramat-Gan, Israel, July 5-7, 2004. Revised Papers, volume 3846 of Lecture Notes in Computer Science, pages 285–300. Springer-Verlag GmbH, 2004.
[64] N. Sturtevant. Leaf-value tables for pruning non-zero sum games. In The International Joint Conference on Artificial Intelligence, IJCAI'05, pages 317–323, 2005.
[65] N. Sturtevant and M. Bowling. Robust game play against unknown opponents. In P. Stone and G. Weiss, editors, Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS'06, pages 713–719, May 2006.
[66] G. Tesauro. Temporal difference learning and TD Gammon. Communications of the ACM, 38(3):58–68, 1995.
[67] G. Tesauro. Programming backgammon using self-teaching neural nets. Artificial Intelligence, 134(1–2):181–199, 2002.
[68] R. van der Goot. Awari retrograde analysis. In T. A. Marsland and I. Frank, editors, Computers and Games 2000, LNCS 2063, pages 87–95. Springer-Verlag, 2002.
[69] J. von Neumann and O. Morgenstern. The Theory of Games and Economic Behavior. Princeton University Press, 1944.
[70] Wikipedia. Game theory. Wikipedia: The Free Online Encyclopedia. http://en.wikipedia.org/wiki/Game_theory.
[71] N. Zadeh. Winning Poker Systems. Prentice Hall, 1974.
[72] N. Zadeh. Computation of optimal poker strategies. Operations Research, 25(4):541–562, 1977.

Chapter 2

Knowledge-based Methods and Simulation (1997-2001)

The Challenge of Poker 1

2.1 Introduction

The artificial intelligence community has recently benefited from the positive publicity generated by chess, checkers, backgammon, and Othello programs that are capable of defeating the best human players. However, there is an important difference between these board games and popular card games like bridge and poker. In the board games, players have complete knowledge of the entire game state, since everything is visible to both participants. In contrast, bridge and poker involve imperfect information, since the other players' cards are not known. Traditional methods like deep search have not been sufficient to play these games well, and dealing with imperfect information is the main reason that progress on strong bridge and poker programs has lagged behind the advances in other games. However, it is also the reason these games promise greater potential research benefits. Poker has a rich history of study in other academic fields. Economists and mathematicians have applied a variety of analytical techniques to poker-related problems. For example, the earliest investigations in game theory, by luminaries such as John von Neumann and John Nash, used simplified poker to illustrate the

1 The contents of this chapter originally appeared in the journal Artificial Intelligence. Copyright 2001 Elsevier Science B.V. All rights reserved. D. Billings, A. Davidson, J. Schaeffer, and D. Szafron. The challenge of poker. Artificial Intelligence, 134(1–2):201–240, January 2002.

fundamental principles [41, 24, 25]. Until recently, the computing science community has largely ignored poker. However, the game has a number of attributes that make it an interesting domain for artificial intelligence research. These properties include incomplete knowledge, multiple competing agents, risk management, opponent modeling, deception, and dealing with unreliable information. All of these are challenging dimensions to a difficult problem. We are attempting to build a program that is capable of playing poker at a world-class level. We have chosen to study the game of Texas Hold'em, which is one of the most strategically complex and popular variants of poker. Our experiences with our

first program, called LOKI, were positive [6, 8]. In 1999, we rewrote the program,

christening the new system POKI. These programs have been playing on Internet poker servers since 1997, and have accrued an impressive winning record, albeit against weak opponents. Early versions of the program were only able to break even against better opposition, but recent improvements have made the program substantially stronger, and it is now winning comfortably in the more difficult games. Although most of these Internet games simulate real game conditions quite well, it would be premature to extrapolate that degree of success to games where real money is at stake. Regardless,

analysis of POKI’s play indicates that it is not yet ready to challenge the best human players. Ongoing research is attempting to bridge that gap. Section 2.2 reviews previous work and related research on poker. Section 2.3 provides an overview of Texas Hold’em, including an illustrative example of strate- gic concepts, and a minimal set of requirements necessary to achieve world-class

play. An overview of POKI’s architecture is described in Section 2.4. Section 2.5 discusses the program’s betting strategy, detailing some of the components of the system. The special problem of opponent modeling is addressed in Section 2.6. Experimental methods and the performance of the program are assessed in Section 2.7. Section 2.8 provides a generalized framework for stochastic games, based on

POKI’s simulation search strategy. Section 2.9 discusses the challenges that remain for building a world-class poker-playing program.

2.2 Other Research

There are several ways that poker can be used for artificial intelligence research. One approach is to study simplified variants that are easier to analyze. We have already mentioned some of the founding work in game theory, which could only handle extremely simple poker games. An example is Kuhn’s game for two players, using a three-card deck, one-card hands, and one betting round, with at most two betting decisions [23]. While this was sufficient to demonstrate certain fundamental principles of game theory, it bears little resemblance to normal competitive poker variations. Mathematicians have also explored many interesting problems related to poker, and highly simplified variations are again sufficient to provide complex problems (Sakaguchi and Sakai [28] for example). Another way to reduce the complexity of the problem is to look at a subset of the game, and try to address each sub-problem in isolation. Several attempts have been made to apply machine learning techniques to a particular aspect of poker (some examples include [11, 22, 37, 42]). Similarly, many studies only look at two- player poker games. Multi-player games are vastly more complicated in general, even with the usual assumption of no co-operative behavior between players. One danger with any type of simplification is that it can destroy the most challenging and interesting aspects of the problem. An alternate approach, which we advocate, is to tackle the entire problem: choose a real variant of poker and address all of the considerations necessary to build a program that performs at a level comparable to or beyond the best human players. Clearly, this is a most ambitious undertaking, but also the one that promises the most exciting research contributions if successful. Nicholas Findler worked on and off for 20 years on a poker-playing program for 5-card Draw poker [14]. His primary objective was to model human cognitive processes, and he developed a program that could learn. While successful to a degree, the program itself was not reported to be a strong player. Furthermore, the game of 5-card Draw, although quite popular at that time, is not as strategically

complex as other poker games, such as 7-card Stud and Texas Hold'em. Some success in analyzing larger scale poker variants was achieved by Norman Zadeh in the 1970s, and much of this work is still of value today [43, 44]. Other individuals, including expert players with a background in mathematics, have gained considerable insight into "real" poker by using partial mathematical analyses, simulation, and ad hoc expert experience (Sklansky [36] is a popular example). There is a viable middle-ground between the theoretical and empirical approaches. Recently, Daphne Koller and Avi Pfeffer have revived the possibility of investigating poker from a game-theoretic point of view [21]. They presented an algorithm for finding randomized equilibrium strategies in two-player imperfect information games, which avoids the usual exponential blow-up of the problem size when con-

verting it to normal form. This algorithm is used in their GALA system, a tool for specifying and solving a greatly extended range of such problems. However, the size of the translated problems is still proportional to the size of the game tree, which is prohibitively large for most common variations of poker. For this reason, the authors concluded “...we are nowhere close to being able to solve huge games such as full-scale poker, and it is unlikely that we will ever be able to do so.” Nevertheless, this does raise the interesting possibility of computing near-optimal equilibrium solutions for real poker variants, which might require far less compu- tation to obtain a satisfactory answer. This is analogous to efficient approximation algorithms for certain combinatorial optimization problems that are known to be intractable (NP-hard). One obvious technique for simplifying the problem is to use abstraction, col- lecting many instances of similar sub-problems into a single class. There are many states in the poker game tree that are isomorphic to each other (for example, a hand where all relevant cards are hearts and diamonds is isomorphic to two correspond- ing hands with all spades and clubs). Beyond this, strictly distinct cases might be so similar that the appropriate strategy is essentially identical. For example, the smallest card of a hand being a deuce instead of a trey may have no bearing on the outcome. This is analogous to the approach used by Matt Ginsberg in parti- tion search, where he defined equivalence classes for the smallest cards of each suit

in a bridge hand [16]. Jiefu Shi and Michael Littman have made some preliminary attempts along these lines to produce near-optimal equilibrium solutions for a scaled-down version of Texas Hold'em [34]. A second method is aimed at constructing a shallower game tree, using expected value estimates to effectively truncate subtrees. This is similar to the method used so successfully in most perfect information games, where an evaluation function is applied to the leaves of a depth-limited search. However, it is not as easy to accomplish because, unlike perfect information games, the states of a poker game tree are not independent of each other (specifically, we cannot distinguish states where the opponent has different possible hidden cards). Ken Takusagawa, a former student of Koller and Pfeffer, has extended their work by combining this method with abstraction, to produce some approximate equilibrium solutions for particular scenarios of Texas Hold'em [38]. Alex Selby has applied the Simplex algorithm directly to two-player Pre-flop Hold'em, and has computed equilibrium solutions for that re-defined game, using expected values in place of the post-flop phase [30]. Our own empirical studies over the past few years have used similar methods of abstraction and expected value estimation to reduce the computational complexity of the problem, so the approaches are not as different as they may at first appear. It will be interesting to see if these theoretical "hybrid techniques" can be applied directly to a competitive poker program in the future.

2.3 Texas Hold’em

We have chosen to study the game of Texas Hold'em, the poker variation used to determine the world champion in the annual World Series of Poker. Hold'em is generally considered to be the most strategically complex poker variant that is widely played in casinos and card clubs. It is also convenient because it has particularly simple rules and logistics. We assume the reader is familiar with the ranking of poker hands (if not, many good introductions to poker can be found on the Internet). As mentioned, bold face italics are used to highlight common poker terms, which are defined in Appendix

A: Glossary of Poker Terms.

2.3.1 Rules of Play

A game2 of Texas Hold’em begins with the pre-flop. Each player is dealt two hole cards face down, followed by the first round of betting, which is started with two forced bets called the small blind and the big blind. Three community cards, collectively called the flop, are then dealt face up on the table, and the second round of betting occurs. On the turn, a fourth community card is dealt face up and another round of betting ensues. Finally, on the river, a fifth community card is dealt face up and the final round of betting occurs. The players still active in the game at that time reveal their two hole cards for the showdown. The best five-card poker hand formed from each player’s two private hole cards and the five public community cards wins the pot. If a tie occurs, the pot is split. Texas Hold’em is typically played with 8 to 10 players. Limit Texas Hold’em uses a structured betting system, where the amount of each bet is strictly controlled in each betting round.3 There are two denominations of bets, called a small bet and a big bet, which will be $10 and $20 in this paper. In the first two betting rounds, all bets and raises are $10, while in the last two rounds, they are always $20. In general, when it is a player’s turn to act, one of three betting options is available: fold, check/call, or bet/raise.4 There is normally a maximum of three raises allowed per betting round. The betting option rotates clockwise until each player has matched the current bet, or folded. If there is only one player remaining (all others having folded) that player is the winner and is awarded the pot, without having to reveal their cards.

2 The term “hand” is often used in place of “game”. Thus, the word “hand” is used in two ways: to denote a player’s private cards, and to refer to one complete deal, or game. We have tried to avoid the possible ambiguity by using “game” whenever appropriate (although that term also carries some ambiguities of its own). Regardless, the intended meaning of “hand” should be clear from the context. 3 In No-Limit Texas Hold’em, there are no restrictions on the size of bets; a player may wager any amount, up to their entire stack, at any time. 4 A check and a call are logically equivalent, in that the betting level is not increased. The term check is used when the current betting level is zero, and call when there has been a wager in the current betting round. Similarly, a bet and a raise are logically equivalent, but the term bet is used for the first wager of a betting round.

2.3.2 Poker Strategy

To illustrate some of the decisions one must face in Texas Hold'em, we will present a sample game, with some typical reasoning a good player might go through. This game is relatively basic, in order to make the example easier to follow. Many complex interactions can contribute to much more difficult situations, but it is hoped that this example will suffice to demonstrate some of the strategic richness of the game. The game is $10-$20 Limit Hold'em with ten players. We "have the button", meaning that we will be the last to act in each betting round, which is an advantage. The two players to the left of us post the small blind ($5) and the big blind ($10), and the cards are dealt. The action begins with the player to the left of the big blind, who calls $10 (we will refer to this player as "EP", for "early position"). The next three players fold (throwing their cards into the discard pile), a middle position player (MP) calls $10, and the next two players fold. We are next to act and have 7♦-6♦. A strong poker player would know that this is a reasonably good drawing hand, which should be profitable to play for one bet from late position against several players. This would not be a good hand to call a raise with, or to play against only one or two opponents. From previous games played, we know that EP is a tight (conservative) player. We expect that EP probably has two big cards, since he called in early position (but did not raise, making large pairs highly unlikely for this particular player). Our opponent modeling has concluded that MP is a loose player, who sees the flop about 70% of the time, so he could have almost anything (e.g., any pair, any two cards of the same suit, or even a hand like 6-4 of different suits). The small blind is an extremely tight player who will probably fold most hands rather than calling another $5. The big blind almost always defends her blind (i.e., she will call a raise). A raise in this situation, for deceptive purposes, is not completely out of the question. However, it would be inappropriate against this particular set of opponents (it might be more suitable in a game with higher limits). We call the $10, the small blind calls, and the big blind checks. The flop is Q♠-7♥-4♦. We have second pair (connecting with the second largest card on the board) for a hand of moderate strength and moderate potential for improvement. If we do not currently have the best hand, there are five direct outs (outcomes) that can immediately improve our hand (7♠, 7♣, 6♠, 6♣, 6♥). We also have some indirect flush and straight potential, which will come in about 7% of the time,5 and can be treated as roughly three direct outs. The board texture is fairly dry, with only a few possible straight draws, and no direct flush draws. Therefore, any bets by the opponents are likely to indicate a made hand (e.g., a pair) rather than a draw (a hand where additional cards are needed), unless they are a chronic bluffer. An expert player would not actually need to go through this thought process – it would simply be known the moment the flop hits the table, through experience and pattern recognition. Both blinds check, EP bets, and MP folds (see Figure 2.1). There is $60 in the pot, and it will cost us $10 to call.
We believe the bettor seldom bluffs, and almost certainly has a Queen, given his early position pre-flop call.6 The small blind is known to check-raise on occasion, and might also have a Queen, but is more likely to have a poor match with the board cards, because he is highly selective before the flop. We have never observed the big blind check-raising in the past, so the danger of being trapped for an extra bet is not too high. If we play, we must decide whether to raise, trying to drive the other players out of the game, or call, inviting others to call also. If there was a good chance of currently having the best hand, we would be much more inclined to raise. However, we feel that chance is relatively small in the current situation. We might also want to drive out other players who are looking to hit the same cards we want, such as 5-3, which needs a 6 to make a straight against our two pair. However, the added equity from having an extra bet in the pot is normally greater than the risk of shared outs, so we are happy to let the blinds draw with us against the bettor. From our previous study and experience, we know that calling in this situation is a small positive expectation play, but we still cannot rule out the possibility of raising for a free-card. If we raise now, we may induce the bettor to call and then

5 73 out of 990 outcomes (43 flushes and 30 straights). 6 Ironically, reasonably good players are often the most predictable, whereas very good players are not.

Figure 2.1: Sample game after the flop.

check to us next round, when we can also check and get a second card for "free" (actually for half-price). We need to assess the likelihood that EP will re-raise immediately (costing us two extra bets, which is a very bad result), or will call but then bet into us again on the turn anyway (costing us one extra bet). Since we do not feel we have much control over this particular player, we reject the fancy raise maneuver, and just call the $10. Both of the blinds fold, so we are now one-on-one with the bettor. Despite the many factors to consider, our decision is made quickly (normally within one second when it is our turn). The turn card is the 5♥ and EP bets. The 5♥ gives us an open-ended draw to a straight, in addition to our other outs. In terms of expected value, this is essentially a "free pass" to the river, as we now have a clearly correct call of $20 to win $90. However, we again need to consider raising. This opponent will probably give us credit for having a very strong hand, since the 5♥ connects for several plausible two pair hands or straights. We could also be slow-playing a very strong hand, like a set (three of a kind using a pocket pair, such as 4♠-4♣). Since we are quite certain he

has only one pair, this particular opponent might even fold the best hand, especially if his kicker (side-card) is weak. At the very least, he will probably check to us on the river, when we can also check, unless we improve our hand. Thus, we would be investing the same amount of money as calling twice to reach the showdown, and we would be earning an extra big bet whenever we make our draw. On the other hand, we do not necessarily have to call that last bet on the river (although if we fold too often, we will become vulnerable to bluffing). We decide to make the expert play in this situation, confidently raising immediately after his bet. He thinks about his decision for a long time, and reluctantly calls. The river card is the 5♠, and our opponent immediately checks. We know that he is not comfortable with his hand, so we can consider bluffing with what we believe is the second-best hand. From our past sessions we know that once this player goes to the river, he will usually see the hand through to the end. In effect, his decision after our raise was whether to fold, or to call two more bets. Since a bluff in this situation is unlikely to be profitable, we stick to our plan and check. He shows Q♣-J♣, we say "good hand", and throw our cards into the discard pile.7 Now we consider what effect this game has had on our table image, in anticipation of how the players at the table will react to our future actions. The better players might have a pretty good idea of what we had (a small pair that picked up a good draw on the turn), and will not make any major adjustments to their perception of our play. Our opponent, EP, is more likely to call us down if a similar situation arises, so we might earn an extra bet on a strong hand later. Weaker players may think we are a somewhat wild gambler, so we expect them to call even more liberally against us. This reinforces our plan of seldom bluffing against them, but betting for value with more marginal hands.

7 The rules for conducting the showdown vary depending on where the game is being played. In games played at casinos, it is standard practice to discard losing hands at the showdown, without revealing them. Players may demand to see a hand that has gone to the showdown if they suspect something suspicious (such as collusion), but it is otherwise considered to be a form of “needling” (harassment), and is socially unacceptable. In most online games, hands at the showdown are revealed in playing order, one at a time. If a hand cannot beat the best hand shown so far, it is folded without being shown, just as in a live game. However, players have access to a full transcript of the game that includes all hands that went to the showdown. Thus, the common practice for online games is full disclosure of all cards at the showdown. This will be the standard assumption made for games between computer programs.

2.3.3 Requirements for a World-Class Poker Player

We have identified several necessary attributes for an algorithm to play poker at a world-class level. A system may handle some of these requirements indirectly, rather than by explicit design, but all of them must be solved at least satisfactorily if a program is to compete with the best human players. We present one or more ways of solving each requirement, but there are many different approaches that could be just as viable, or possibly much better. Furthermore, these components are not independent of each other. They must be continually refined and integrated as new capabilities are added to the system. Hand Strength assesses the strength of a hand in relation to the other hands. A simple hand strength computation is a function of the cards held and the current community cards. A better evaluation takes into account the number of players still in the game, the relative position of the player at the table, and the history of betting for the current game. An even more accurate calculation considers the probabilities for each possible opponent hand, based on the likelihood of each hand being played to the current point in the game. Hand Potential computes the probability that a hand will improve to win, or that a leading hand will lose, after future community cards appear. For example, a hand that contains four cards in the same suit may have a low hand strength, but has good potential to win with a flush as additional community cards are dealt. Conversely, a hand with a high pair might be expected to decrease in strength if many draws are available for opposing hands. At a minimum, hand potential is a function of the cards in the hand and the current community cards. However, a better calculation would use all of the additional factors described in the hand strength computation. Bluffing makes it possible to win with a weak hand,8 and creates doubt on the part of the opponent, thereby increasing the amount won on subsequent strong hands. Bluffing is essential for successful play. Game theory can be used to com- pute a theoretically optimal bluffing frequency in certain situations. A minimal

8 Other forms of deception such as slow-playing (only calling with a strong hand) are not considered here.

bluffing system would bluff this percentage of hands, indiscriminately. In practice, other factors (such as hand potential) should also be considered. A better system would identify profitable bluffing opportunities by deducing the opponent's approximate hand strength and predicting the probability of a fold. Unpredictability makes it difficult for opponents to form an accurate model of our strategy. Mixing strategies (occasionally handling a given situation in different ways) hides information about the nature of our current hand. By varying our playing style over time, opponents may be induced to make mistakes based on incorrect beliefs. Opponent modeling determines a likely probability distribution of the opponent's hand. Minimal opponent modeling might use a single generic model for all opponents. This can be improved by modifying those probabilities based on the personal betting history and collected statistics of each opponent. Certain fundamental principles of poker, such as pot odds, are taken as a given. There are several other identifiable characteristics that might not be necessary to play reasonably strong poker, but may eventually be required for world-class play. Collectively, these concepts are part of an overall betting strategy, which determines whether we fold, call, or raise in any particular situation. The most important of these attributes for poker-playing programs are discussed in greater detail in the following sections.

2.4 POKI’s Architecture

A poker game consists of a dealer together with multiple players that represent either human players or computer players. In our Java implementation, these players are defined as objects. The dealer handles the addition and removal of players from the game, deals the cards to each player at the start of a new game, prompts the players for an appropriate action when it is their turn, broadcasts player actions to other players, and updates a public game context as the game progresses. The game context contains all of the public information about the game, including the names and relative locations of the players, and the board cards.
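The thesis does not list the actual class layout, so the following is only a minimal sketch in the spirit of the Java design described above; all type and method names here are hypothetical illustrations rather than POKI's real API.

    import java.util.List;

    enum Action { FOLD, CALL, RAISE }

    /** Public information the dealer shares with every participant. */
    class GameContext {
        List<String> playerNames;   // names and relative seating order
        List<String> boardCards;    // community cards dealt so far
        // ... betting history, pot size, etc.
    }

    /** Any participant, human or computer, could implement this interface. */
    interface Player {
        void receiveHoleCards(String card1, String card2);
        Action getAction(GameContext context);                  // called on this player's turn
        void observeAction(String playerName, Action action);   // broadcast of other players' actions
    }

A dealer object would then hold a list of Player references, deal cards, call getAction() on the player to act, and broadcast the result via observeAction().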

[Figure 2.2 (titled "Poki Program Architecture") diagrams the system: the public game state and a 1,081-entry weight table per opponent feed the Opponent Modeler and Hand Evaluator; together with the game context (round number, bets to call, betting history, number of players, position, bet ratio, pot odds) these drive the Betting Rulebase, Simulator, and Action Selector, which produce a probability triple (e.g., 0% fold, 80% call, 20% raise) and finally a fold, call, or raise decision.]

Figure 2.2: The architecture of POKI.

We have implemented several different dealer interfaces: an IRC-Dealer for playing against other players on the Internet Relay Chat poker server, a Tournament-

Dealer for internal experiments, and a TCP/IP-Dealer that allows POKI to play against humans using a web browser, and against other programs using a published protocol (see http://www.cs.ualberta.ca/~games/poker/).

An overview of POKI’s architecture is shown in Figure 2.2. Although each version of the program represents and uses the available information in a different way, all versions share a common high-level architecture.

In addition to the public game context, POKI stores private information: its current hand, and a collection of statistical opponent models. The assessment of the initial two-card hand is explained in Section 2.5.1, and the first-round betting decisions are made with a simple rule-based system. The opponent model (essentially a probability distribution over all possible hands) is maintained for each player participating in the game, including POKI itself, as detailed in Section 2.6. The

Opponent Modeler uses the Hand Evaluator, a simplified rule-based Betting Strategy, and learned parameters about each player to update the current model after each opponent action, as described in Section 2.5.2.4. After the flop, the Hand Evaluator in turn uses the opponent model and the

game-state information to assess the value of POKI's hand in the current context, as explained in Sections 2.5.2.1 and 2.5.2.2. Thus, there is a certain amount of cyclic feedback among the core components of the system. The evaluation is used by a more sophisticated rule-based Betting Strategy to determine a plan (how often to fold, call, or raise in the current situation), and a specific action is chosen, as discussed in Section 2.5.2.5 and throughout Section 2.5. The entire process is repeated each time it is our turn to act. For a more advanced decision procedure, the Simulator iterates this process using different instantiations of opponent hands, as discussed in Section 2.5.3.

2.5 Betting Strategy

Betting strategies before the flop and after the flop are significantly different. Before the flop there is little information available to influence the betting decision (only two hole cards and the previous player actions), and a relatively simple expert system is sufficient for competent play. After the flop the program can analyze how all possible opponent holdings combine with the given public cards, and many other factors are relevant to each decision. A post-flop betting strategy uses the full game context, the private hand, and the applicable opponent models to generate an action. Three betting strategies will be described in this paper, one for the pre-flop and two for the post-flop.

2.5.1 Pre-flop Betting Strategy

There are {52 choose 2} = 1326 possible hands prior to the flop. The value of one of these hands is called an income rate, and is based on a simple technique that we will call a roll-out simulation.9 This is an off-line computation that consists of playing

9 The term “roll-out” originates from the game of backgammon (rolling the dice). In the past, strong players would re-play a given position dozens of times in order to obtain a better estimate of

several million games (trials) where all players call the first bet (i.e., the big blind), and then all the remaining cards are dealt out without any further betting. This highly unrealistic always call assumption does not necessarily reflect an accurate estimate for the expected value of the hand. However, it does provide a first-order approximation, and the relative values of the hands are reasonably accurate for the given situation. More generally, this method is referred to as the all-in equity. It is a calculation of the percentage expectation for the current hand assuming the player is all-in,10 and all active hands proceed to the showdown. It can be applied at any phase of the game, and serves as a baseline estimate of the expected value of a hand in any given situation.
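As a rough illustration of the roll-out computation described above (not POKI's actual implementation), the following Java sketch estimates all-in equity by Monte Carlo sampling. The rankOf() evaluator is a placeholder that a real implementation would supply; cards are encoded as integers 0-51.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    class RollOut {
        // Placeholder: return a comparable strength value for the best 5-card
        // hand formed from two hole cards plus the five board cards.
        static int rankOf(List<Integer> holeCards, List<Integer> board) {
            return 0; // a real 7-card evaluator goes here
        }

        // Estimate the all-in equity of 'heroHole' against (numPlayers - 1)
        // random hands, averaged over 'trials' deals of the remaining cards.
        static double allInEquity(List<Integer> heroHole, int numPlayers, int trials) {
            double total = 0.0;
            for (int t = 0; t < trials; t++) {
                List<Integer> deck = new ArrayList<>();
                for (int c = 0; c < 52; c++) if (!heroHole.contains(c)) deck.add(c);
                Collections.shuffle(deck);
                int idx = 0;
                List<List<Integer>> oppHands = new ArrayList<>();
                for (int p = 1; p < numPlayers; p++)
                    oppHands.add(List.of(deck.get(idx++), deck.get(idx++)));
                List<Integer> board = new ArrayList<>();
                for (int b = 0; b < 5; b++) board.add(deck.get(idx++));

                int heroRank = rankOf(heroHole, board);
                int best = heroRank, winners = 1;
                for (List<Integer> opp : oppHands) {
                    int r = rankOf(opp, board);
                    if (r > best) { best = r; winners = 1; }
                    else if (r == best) winners++;
                }
                if (heroRank == best) total += 1.0 / winners;  // split pots count fractionally
            }
            return total / trials;  // average fraction of the pot won
        }
    }

Averaging the pot share over millions of such trials, with every hand forced to call to the showdown, yields exactly the kind of income-rate estimate discussed in the text.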

2.5.1.1 Comparing Pre-flop Strategies

The best known and most widely respected expert opinion on pre-flop play is that of David Sklansky, a professional poker player and author of the most important books on the game [35, 36]. In "Hold'em Poker for Advanced Players" [36] he prescribes a hand classification scheme to be used in typical middle-limit games (e.g., $20-$40 Limit Hold'em). There is a strong correlation between his rankings and the results of the roll-out simulations. Before proceeding to a closer comparison of the two ranking systems, a few caveats should be mentioned. First, there is no single ranking of starting hands that applies to all situations. An expert player will make adjustments based on the prevailing conditions (for example, a loose game (many players seeing the flop), a wild game (lots of gambling), etc.). Furthermore, the true expectation of each hand will depend on the precise context at the time of each betting decision. For example, a hand is assessed very differently after all previous players have folded than it would be after one or more players have called. The general guidelines must cover a wide variety of situations, so naturally there will be exceptions. Sklansky's recommen-

the true equity. With the advent of strong programs, computer roll-outs (re-playing the game to the end thousands of times) have become the definitive authority on the value of a given position. 10 Under normal table stakes rules, a player who does not have enough money on the table to meet the outstanding bet can go all-in, and remains eligible to win the portion of the pot contributed to. The betting continues (toward a side-pot) for the remaining active hands.

dations are also intended for a full game of ten players. A completely different set of hand rankings is necessary for short-handed games (e.g. five players or less), and this is reflected in the different income rates computed by roll-out simulations with fewer players in the hand. Table 2.1 shows how the roll-out simulations compare to Sklansky's rankings. In the tables, 's' refers to a suited hand (two cards of the same suit), 'o' refers to an offsuit hand (two cards of different suits, also called unsuited), and '*' indicates a pocket pair (two cards of the same rank). Table 2.1 is divided into eight groups, corresponding to Sklansky's rating system, with Group 1 being the best hands, and Group 8 being weak hands that should only be played under special circumstances (e.g., for one bet after many players have called). In general, there is a strong correlation between Sklansky's rankings and the income rates obtained from roll-out simulations. The simulation values demonstrate a bias in favor of certain hands that play well against many players, known as "good multi-way hands". These are cards that can easily draw to a very strong hand, such as a flush (e.g., suited hands like A♥-2♥), a straight (e.g., connectors like 8♥-7♣), or three of a kind (e.g., a pocket pair like 2♥-2♣). Since all ten players proceed to the showdown in a roll-out simulation, the average winning hand needs to be considerably stronger than in a real ten-player game (where typically half of the players will fold before the flop, and many games are won uncontested before the showdown). By the same reasoning, large pairs may be undervalued, because of the unaccounted potential of winning without improvement against a smaller number of opponents. Conversely, Sklansky's rankings show evidence of a bias in favor of unsuited connectors, where suited hands should be preferred.11 Certain small-card combinations, such as 7♠-6♠, may have been given a higher ranking by Sklansky because they add a good balance of deception to the overall play list (for example, one does not want the opposition to conclude that we cannot have a 7 when the flop is 7♥-7♦-3♣). However, the hands intended for information hiding purposes should not extend to the unsuited connectors like 7♣-6♥, which have a much lower overall
11 The highest valued hands not in Sklansky's rankings are T7s (+231) and Q7s (+209).

Group 1          Group 2          Group 3          Group 4
+2112 AA*        +714 TT*         +553 99*         +481 T9s [1]
+1615 KK*        +915 AQs         +657 JTs         +515 KQo
+1224 QQ*        +813 AJs         +720 QJs         +450 88*
 +935 JJ*        +858 KQs         +767 KJs         +655 QTs
+1071 AKs        +718 AKo         +736 ATs         +338 98s [1]
                                  +555 AQo         +449 J9s
                                                   +430 AJo
                                                   +694 KTs

Group 5          Group 6          Group 7          Group 8
+364 77*         +304 66*         +214 44*          -75 87o [2]
+270 87s [1]     +335 ATo          +92 J9o [2]      +87 53s [3] (> 43s)
+452 Q9s         +238 55*          +41 43s [3]     +119 A9o
+353 T8s [1]     +185 86s         +141 75s          +65 Q9o
+391 KJo         +306 KTo         +127 T9o         -129 76o [2]
+359 QJo         +287 QTo         +199 33*          -42 42s [3] (< 52s)
+305 JTo         +167 54s          -15 98o [2]      -83 32s [3] (< 52s)
+222 76s [1]     +485 K9s         +106 64s         +144 96s
+245 97s [1]     +327 J8s         +196 22*          +85 85s
+538 A9s                          +356 K8s          -51 J8o [2]
+469 A8s                          +309 K7s         +206 J7s
+427 A7s                          +278 K6s         -158 65o [2]
+386 A6s                          +245 K5s         -181 54o [2]
+448 A5s                          +227 K4s          +41 74s
+422 A4s                          +211 K3s          +85 K9o
+392 A3s                          +192 K2s          -10 T8o
+356 A2s                          +317 Q8s
+191 65s [1]

Three possible explanations for the differences: [1] small card balancing, [2] bias for unsuited connectors, and [3] logical error (inconsistent).

Table 2.1: Income Rate values versus Sklansky groupings.

expectation. There are also a few instances of small logical errors in Sklansky's rankings. For example, 43s is ranked in Group 7, ahead of 53s in Group 8, but it can be shown that 53s logically dominates 43s, because it has the same straight and flush potential, with better high-card strength. Similarly, 52s dominates 42s and 32s, but 52s is not ranked in any of the eight groups, whereas the latter are members of Group 8. Since the differences are not large, it is clear that roll-out simulations provide an acceptable means of quantifying the pre-flop value of each hand. This information is currently used as part of a formula-based expert system for playing before the flop, which is not unlike the guidelines given by Sklansky in the aforementioned text. We prefer to use the computed results, rather than transcribing the Sklansky rules, because (a) we wish to eliminate the use of human knowledge whenever possible, (b) the roll-out simulation information is quantitative rather than qualitative, and (c) the algorithmic approach can be applied to many different specific situations (such as having exactly six players in the game), whereas Sklansky gives only a few recommendations for atypical circumstances. Future versions of the program should be even more autonomous, adapting to the observed game conditions and making context-sensitive decisions on its own.

2.5.1.2 Iterated Roll-Out Simulations

An interesting refinement to roll-out simulation is to use repeated iterations of the technique, where the previous results govern the betting decision for each player. In the ten-player case, a negative value in the previous simulation would dictate that the hand be folded, rather than calling the big blind. This drastically reduces the number of active players in each game, producing a more realistic distribution of opponents and probable hands. The result is a reduction in the bias toward multi-way hands, and a much better estimation of the hands that can be played profitably when ten players are originally dealt in. After each round of simulations has reached a reasonable degree of stability, another iteration is performed. This process eventually reaches an equilibrium,

Hand   IR-10   Iterated     Hand   IR-10   Iterated     Hand   IR-10   Iterated
AA*    +2112   +2920        ATs     +736    +640        KQo     +515    +310
KK*    +1615   +2180        99*     +553    +630        QTs     +655    +280
QQ*    +1224   +1700        KQs     +858    +620        QJs     +720    +270
JJ*     +935   +1270        AQo     +555    +560        A9s     +538    +220
TT*     +714    +920        KJs     +767    +480        ATo     +335    +200
AKs    +1071    +860        88*     +450    +450        KTs     +694    +190
AKo     +718    +850        77*     +364    +390        KJo     +391    +160
AQs     +915    +780        AJo     +430    +380        A8s     +469    +110
AJs     +813    +680        JTs     +657    +360        66*     +304     +40

Table 2.2: Iterated income rate (profitable hands).

defining a set of hands that can be played profitably against the blinds and the other unknown hands. The results are most applicable to the "play or don't play" decision for each player. Although much better than a simple roll-out simulation, this technique is still far from perfect, because other important considerations such as betting position and known opponent actions have not been accounted for. In our experiments, each iteration lasted for 50,000 trials. A diminishing noise factor was added to each income rate, analogous to the cooling factor used in simulated annealing. This gives negative expectation hands a chance to recover as the prevailing context changes. After ten generations, the remaining positive expectation hands were played for another 500,000 trials, to ensure stability. The resulting set of profitable hands, shown in Table 2.2, is in strong agreement with expert opinion on this matter. The table shows a comparison of the income rates for 10-player roll-out simulations (IR-10) and the results refined by iterating (Iterated). The values shown are in milli-bets (e.g., a hand with an income rate of +1000 should win an average of one small bet each time it is played). The iterated values are reasonable estimates of actual income rates, unlike the simple roll-out values, which are only used as relative measures. One of the factors used by Sklansky and other experts is the possibility of a hand being dominated. For example, AQ is said to dominate AJ, because the AQ has a tremendous advantage if they are in a game against each other (an Ace on board does not help the AJ). In contrast, AQ does not dominate the inferior holding of KJ,

because they are striving to hit different cards. The role of domination is clearly demonstrated in the results of the iterated roll-out simulations. Examples include the increased value of large pairs and AK unsuited, and the diminished value of KQ (which is dominated by AA, KK, QQ, AK, and AQ). Iterated roll-out simulations have also been used to compute accurate expected values for two-player Pre-flop Hold'em. The resulting betting decisions are in very good agreement with Alex Selby's computation of the game-theoretic equilibrium strategy, in which he used an adaptation of the Simplex algorithm for solving this game directly [30].12 The small number of cases where the strategies differ are all near the boundary conditions between raise and call, or call and fold. Furthermore, the expected values are always close to the threshold for making the alternate choice, with a difference usually less than 0.1 small bets (100 milli-bets).

2.5.2 Basic Betting Strategy

The basic betting strategy after the flop chooses an action using three steps:

1. Compute the hand strength (HS), positive potential (PPot), negative potential (NPot), and effective hand strength (EHS) of POKI's hand relative to the board.

2. Use the game context, a set of betting rules, and formulas to translate the EHS into a probability triple: {Pr(fold), Pr(call), Pr(raise)}.

3. Generate a random number in the range zero to one, and use it to choose an action from the probability distribution (a minimal sketch of this step follows the list). This contributes to the unpredictability of the program.
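The third step is simple to implement; the following is a minimal Java sketch, assuming the probability triple from step 2 has already been computed (the example numbers are illustrative only).

    import java.util.Random;

    class ActionSelector {
        enum Action { FOLD, CALL, RAISE }

        static Action select(double pFold, double pCall, double pRaise, Random rng) {
            double r = rng.nextDouble();           // uniform in [0, 1)
            if (r < pFold) return Action.FOLD;
            if (r < pFold + pCall) return Action.CALL;
            return Action.RAISE;
        }

        public static void main(String[] args) {
            // e.g. a triple of 0% fold, 80% call, 20% raise
            System.out.println(select(0.0, 0.8, 0.2, new Random()));
        }
    }

Because the action is sampled from the distribution rather than chosen deterministically, repeated identical situations can produce different actions, which is the source of the unpredictability mentioned in step 3.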

EHS is a measure of how well the program’s hand stands in relationship to the remaining active opponents in the game. It is a combination of the current hand strength and positive potential for the hand to improve. These are discussed in the following sections.

12 We are assuming that an optimal equilibrium solution to the re-defined game of Pre-flop Hold’em will serve as a near-optimal equilibrium solution to the pre-flop phase of real Hold’em (i.e., that a “perfect” solution to a simpler game will be a “good” solution to the full-scale version).

HandStrength(ourcards, boardcards)
{
    ahead = tied = behind = 0
    ourrank = Rank(ourcards, boardcards)
    /* Consider all two-card combinations of the remaining cards. */
    for each case(oppcards)
    {
        opprank = Rank(oppcards, boardcards)
        if (ourrank > opprank)        ahead += 1
        else if (ourrank == opprank)  tied += 1
        else /* < */                  behind += 1
    }
    handstrength = (ahead + tied/2) / (ahead + tied + behind)
    return(handstrength)
}

Figure 2.3: Hand Strength calculation.

2.5.2.1 Hand Strength

The hand strength (HS) is the probability that a given hand is better than that of an active opponent. Suppose an opponent is equally likely to have any possible two hole card combination.13 All of these opponent hands can be enumerated, identifying when POKI's hand is better (+1), tied (+1/2), or worse (0). Taking the summation and dividing by the total number of possible opponent hands gives the (unweighted) hand strength. Figure 2.3 gives the algorithm for a simple hand strength calculation. Suppose our hand is A♦-Q♣ and the flop is J♥-4♣-3♥. There are 47 remaining unknown cards and therefore {47 choose 2} = 1,081 possible hands an opponent might hold. In this example, any three of a kind, two pair, one pair, or AK is better (444 cases), the remaining AQ combinations are equal (9 cases), and the rest of the hands are worse (628 cases). Counting ties as one half, this corresponds to a percentile ranking, or hand strength, of 0.585. In other words, there is a 58.5% chance that A♦-Q♣ is better than a random hand. The hand strength calculation is with respect to one opponent, but can be
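Concretely, applying the formula of Figure 2.3 to these counts gives handstrength = (628 + 9/2) / (628 + 9 + 444) = 632.5 / 1,081 ≈ 0.585.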

13 This is not true, in general, but simplifies the presentation of the algorithm. This form of unweighted hand strength is also called hand rank (HR). We eliminate this assumption and generalize the algorithm in the next section.

A♦-Q♣ hole cards, J♥-4♣-3♥ board cards

                          7 Cards
5 Cards    Ahead      Tied      Behind     Sum
Ahead      449,005    3,211     169,504      628 x 990 =   621,720
Tied             0    8,370         540        9 x 990 =     8,910
Behind      91,981    1,036     346,543      444 x 990 =   439,560
Sum        540,986   12,617     516,587    1,081 x 990 = 1,070,190

Table 2.3: Hand Potential example.

extrapolated to multiple opponents by raising it to the power of the number of active opponents.14 Against five opponents with random hands, the adjusted hand strength, HS_5, is 0.585^5 = 0.069. Hence, the presence of the additional opponents has reduced the likelihood of A♦-Q♣ being the best hand to only 6.9%.

2.5.2.2 Hand Potential

After the flop, there are still two more board cards to be revealed. On the turn, there is one more card to be dealt. We want to determine the potential impact of these cards. The positive potential (PPot) is the chance that a hand that is not currently the best improves to win at the showdown. The negative potential (NPot) is the chance that a currently leading hand ends up losing. PPot and NPot are calculated by enumerating over all possible hole cards for the opponent, like the hand strength calculation, and also over all possible board cards. For all combinations of opponent hands and future cards, we count the number of times POKI's hand is behind, but ends up ahead (PPot), and the number of times POKI's hand is ahead but ends up behind (NPot). The algorithm is given in Figure 2.4, and the results for the preceding example are shown in Table 2.3. In this example, if the hand A♦-Q♣ is ahead against one opponent after five cards, then after 7 cards there is a 449,005 / 621,720 = 72% chance of still being ahead. Computing the potential on the flop can be expensive, given the real-time

14 This assumes that all of the opponent hands are independent of each other. Strictly speaking, this is not true. To be a useful estimate for the multi-player case, the error from this assumption must be less than the error introduced from other approximations made by the system. More accurate means are available, but we defer that discussion in the interest of clarity.

HandPotential(ourcards, boardcards)
{
    /* Hand Potential array, each index represents ahead, tied, and behind. */
    integer array HP[3][3]      /* initialize to 0 */
    integer array HPTotal[3]    /* initialize to 0 */

    ourrank = Rank(ourcards, boardcards)
    /* Consider all two-card combinations of the remaining cards for the opponent. */
    for each case(oppcards)
    {
        opprank = Rank(oppcards, boardcards)
        if (ourrank > opprank)        index = ahead
        else if (ourrank == opprank)  index = tied
        else /* < */                  index = behind
        HPTotal[index] += 1

        /* All possible board cards to come. */
        for each case(turn)
        {
            for each case(river)
            {
                /* Final 5-card board */
                board = [boardcards, turn, river]
                ourbest = Rank(ourcards, board)
                oppbest = Rank(oppcards, board)
                if (ourbest > oppbest)        HP[index][ahead] += 1
                else if (ourbest == oppbest)  HP[index][tied] += 1
                else /* < */                  HP[index][behind] += 1
            }
        }
    }
    /* PPot: were behind but moved ahead. */
    PPot = (HP[behind][ahead] + HP[behind][tied]/2 + HP[tied][ahead]/2)
           / (HPTotal[behind] + HPTotal[tied]/2)
    /* NPot: were ahead but fell behind. */
    NPot = (HP[ahead][behind] + HP[tied][behind]/2 + HP[ahead][tied]/2)
           / (HPTotal[ahead] + HPTotal[tied]/2)
    return(PPot, NPot)
}

Figure 2.4: Hand Potential calculation.

constraints of the game (about one second per decision). There are {45 choose 2} = 990 possible turn and river cards to consider for each possible two-card holding by the opponent. In practice, a fast approximation of the PPot calculation may be used, such as considering only the next one card to come. Previous implementations have used a fast function to produce a crude estimate of PPot, which was within 5% of the actual value about 95% of the time.
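To make the calculation concrete, plugging the entries of Table 2.3 into the PPot and NPot formulas at the end of Figure 2.4 gives, for the A♦-Q♣ example, PPot = (91,981 + 1,036/2 + 0/2) / (439,560 + 8,910/2) ≈ 0.208 and NPot = (169,504 + 3,211/2 + 540/2) / (621,720 + 8,910/2) ≈ 0.274.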

2.5.2.3 Effective Hand Strength

The effective hand strength (EHS) combines hand strength and potential to give a single measure of the relative strength of POKI’s hand against an active opponent. One simple formula for computing the probability of winning at the showdown15 is:

Pr(win) = Pr(ahead) × Pr(opponent does not improve) + Pr(behind) × Pr(we improve)
        = HS × (1 − NPot) + (1 − HS) × PPot

In practice, we generally want to bet when we currently have the best hand, regardless of negative potential, so that an opponent with a marginal hand must either fold, or pay to draw. Hence, NPot is not as important as PPot for betting purposes. Since we are interested in the probability that our hand is either currently the best, or will improve to become the best, one possible formula for EHS sets NPot = 0, giving:

EHS = HS + (1 − HS) × PPot        (2.1)

This has the effect of betting a hand aggressively despite good draws being possible for opponent hands, which is a desirable behavior. For n active opponents, this can be generalized to:

EHS = HS^n + (1 − HS^n) × PPot        (2.2)

15 The formula can be made more precise by accounting for ties, but becomes less readable.

assuming that the same EHS calculation suffices for all opponents. This is not a good assumption, since each opponent has a different style (and thus a different distribution over possible hands). A better generalization is to have a different HS and PPot for each opponent i. EHS with respect to each opponent can then be defined as:

EHS_i = HS_i + (1 − HS_i) × PPot_i  (2.3)

Modifying these calculations based on individual opponents is the subject of Section 2.6.
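Written out directly, equations (2.1) to (2.3) are only a few lines of code. The sketch below is illustrative rather than POKI's actual routine; in particular, the multi-way hand strength HS_n is approximated here as HS raised to the power n, relying on the independence assumption discussed in the earlier footnote.

def ehs(hs, ppot):
    """EHS = HS + (1 - HS) * PPot, equation (2.1)."""
    return hs + (1.0 - hs) * ppot

def ehs_multiway(hs, ppot, n_opponents):
    """EHS = HS_n + (1 - HS_n) * PPot, equation (2.2),
    with HS_n approximated as HS ** n (independence assumption)."""
    hs_n = hs ** n_opponents
    return hs_n + (1.0 - hs_n) * ppot

def ehs_per_opponent(hs_list, ppot_list):
    """EHS_i = HS_i + (1 - HS_i) * PPot_i, equation (2.3), one value per opponent."""
    return [h + (1.0 - h) * p for h, p in zip(hs_list, ppot_list)]

# For example, HS = 0.60 and PPot = 0.20 give EHS = 0.60 + 0.40 * 0.20 = 0.68.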

2.5.2.4 Weighting the Enumerations

The calculations of hand strength and hand potential in Figures 2.3 and 2.4 assume that all two-card combinations are equally likely. However, the probability of each hand being played to a particular point in the game will vary. For example, the probability that an active opponent holds Ace-King is much higher than 7-2 after the flop, because most players will fold 7-2 before the flop.

To account for this, POKI maintains a weight table for each opponent. The table has an entry for every possible two-card hand, where each value is the conditional probability of the opponent having played those cards to the current point in the game. To get a better estimate of hand strength, each hand in the enumeration is multiplied by its corresponding probability in the weight table.

In practice, the weights have a value in the range zero to one, rather than absolute probabilities (summing to one), because only the relative sizes of the weights affect the later calculations. When a new game begins, all entries are initialized to

a weight of one. As cards become known (POKI’s private cards or the public board cards), many hands become impossible, and the weight is set to zero. After each betting action, the weight table for that opponent is updated in a process called re-weighting. For example, suppose an opponent calls before the flop. The updated weight for the hand 7-2 might be 0.01, since it should normally be folded. The probability of Ace-King might be 0.40, since it would seldom be folded before the flop, but is often raised. The relative value for each hand is increased or

UpdateWeightTable(Action A, WeightTable WT, GameContext GC, OpponentModel OM)
{
    foreach (entry E in WT)
    {
        ProbabilityDistribution PT[FOLD, CALL, RAISE]
        PT = PredictOpponentAction(OM, E, GC)
        WT[E] = WT[E] * PT[A]
    }
}

Figure 2.5: Updating the weight table.

decreased to be consistent with every opponent action.

The strength of each possible hand is assessed, and a mixed strategy (a probability distribution over available actions) is determined by a formula-based betting strategy. These values are then used to update the weight table after each opponent action. The algorithm is shown in Figure 2.5. For example, assume that the observed player action is a bet, and that the weight table currently has entries:

[A♠-K♣, 0.40], ..., [Q♦-2♦, 0.20], ...

Further assume that in the given situation, the PredictOpponentAction procedure (Figure 2.5) generates probability distributions {Pr(fold), Pr(check/call), Pr(bet/raise)} of {0.0, 0.7, 0.3} for the hand A♠-K♣, and {0.0, 0.1, 0.9} for the hand Q♦-2♦. After re-weighting, the new weight table entry for A♠-K♣ will be 0.4 × 0.3 = 0.12, and 0.2 × 0.9 = 0.18 for Q♦-2♦. Had the opponent checked in this situation, the weights would be 0.28 and 0.02, respectively.16

Table 2.4 shows a possible game scenario based on the example given in Section 2.3.2 (with the five players that immediately folded in the pre-flop removed). In this game, player EP is assumed to be a default player rather than the well-modeled

tight opponent described previously. Figure 2.6 shows POKI’s weight table for EP at three stages of the game (pre-flop, flop, and river). In each figure, darker cells

16 In the parlance of Bayesian (conditional) probabilities, the old weight table represents the prior distribution of the opponent’s cards, and the new weight table is the posterior distribution.

              SB            BB           EP      MP      Poki
  Pre-flop    small blind   big blind    call    call    call
              call          check
  Flop        Q♠ 7♥ 4♦
              check         check        bet     fold    call
              fold          fold
  Turn        5♥
                                         bet             raise
                                         call
  River       5♠
                                         check            check

Table 2.4: Betting scenario for the example game in Section 2.3.2.

correspond to higher relative weights. Suited hands are shown in the upper right portion of the grid, and unsuited hands are on the lower left. The program gathers more information as the game is played, refining the distribution of hands that are consistent with the betting actions of EP.17

Figure 2.6: Progressive weight tables for opponent EP in the example game.
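Returning to the numerical re-weighting example above (the bet observed from a player holding either A♠-K♣ or Q♦-2♦), the update of Figure 2.5 can be sketched in a few lines of Python. The weights and predicted triples are the hypothetical values from the example, not output of the real PredictOpponentAction routine.

# Relative weights (not normalized probabilities) for two sample holdings.
weights = {"AsKc": 0.40, "Qd2d": 0.20}

# Hypothetical probability triples {fold, check/call, bet/raise} for this context.
predicted = {"AsKc": (0.0, 0.7, 0.3), "Qd2d": (0.0, 0.1, 0.9)}

ACTION_INDEX = {"fold": 0, "check/call": 1, "bet/raise": 2}

def reweight(weights, predicted, observed_action):
    """Multiply each weight by the modeled probability of the observed action."""
    i = ACTION_INDEX[observed_action]
    return {hand: w * predicted[hand][i] for hand, w in weights.items()}

print(reweight(weights, predicted, "bet/raise"))   # AsKc -> ~0.12, Qd2d -> ~0.18
print(reweight(weights, predicted, "check/call"))  # AsKc -> ~0.28, Qd2d -> ~0.02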

2.5.2.5 Probability Triples and Evaluation Functions

A probability triple is an ordered triple of values, PT = {f, c, r}, such that f + c + r = 1.0, representing the probability distribution that the next betting action in a given context is a fold, call, or raise, respectively. This representation of

17 The weight table for the turn is not shown, but is similar to that of the flop (the continuation bet provides little new information).

future actions (analogous to a randomized mixed strategy in game theory) is used

in three places in POKI:

1. The basic betting strategy uses a probability triple to decide on a course of action (fold, call, or raise).

2. The opponent modeling component (Section 2.6) uses an array of probability triples to update the opponent weight tables.

3. In a simulation-based betting strategy (Section 2.5.3) probability triples are used to choose actions for simulated opponent hands.

Hand strength, hand potential, and effective hand strength are simple algorithms for capturing some of the probabilistic information needed to make a good decision. However, there are many other factors that influence the betting decision. These include things like pot odds, implied odds, relative betting position, betting history of the current game, etc. Hence, the probability triple generation routine consists of ad hoc rules and formulas that use EHS, the opponent model, game conditions, and probability estimates to assess the likelihood of each possible betting action. A professional poker player (Billings) defined this system based on crude estimates of the return on investment for each betting decision. We refer to this as either a rule-based or formula-based betting strategy. The precise details of this procedure will not be discussed, as they are of limited scientific interest.

An important advantage of the probability triple abstraction is that most of the expert-defined knowledge in POKI has been gathered together into the triple-generation routines. This is similar to the way that external knowledge is restricted to the evaluation function in alpha-beta search. The probability triple framework allows the “messy” elements of the program to be amalgamated into one component, which can then be treated as a black box by the rest of the system. Thus, aspects like Hold’em-specific knowledge, complex expert-defined rule systems, and knowledge of human behavior are all separated from the engine that uses this input for its calculations. The essential algorithms should be applicable to other poker variants with little or no modification, and perhaps to substantially different domains.
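As a data structure, the triple itself is trivial; the sketch below (hypothetical names, not POKI's interface) shows the sum-to-one invariant and how a triple can be sampled to produce a mixed-strategy action, which is how it is used both for POKI's own betting decision and for the simulated opponents described in the next section.

import random
from typing import NamedTuple

class ProbabilityTriple(NamedTuple):
    fold: float
    call: float
    raise_: float   # trailing underscore because 'raise' is a Python keyword

def make_triple(f, c, r):
    assert abs(f + c + r - 1.0) < 1e-9, "a probability triple must sum to one"
    return ProbabilityTriple(f, c, r)

def sample_action(triple, rng=random):
    """Choose fold, call, or raise according to the triple (a mixed strategy)."""
    return rng.choices(["fold", "call", "raise"], weights=list(triple), k=1)[0]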

2.5.3 Selective Sampling and Simulation-based Betting Strategy

Having an expert identify all the betting rules necessary to play poker is time consuming and difficult. The game is strategically complex, and decisions must be based on the exact context of the current game and historical information of past sessions. A system based on expert rules is unlikely to produce a world-class level of play, because covering every relevant situation in sufficient detail is not feasible. We believe that dynamic, adaptive, computer-oriented techniques will be essential to compete with the best human players.

As mentioned above, a knowledge-based betting strategy is analogous to a static evaluation function in deterministic perfect information games. Given the current state of the game, it attempts to determine the action that yields the best result. The corresponding analogue would be to add search to the evaluation function. While this is easy to achieve in a game such as chess (consider all possible moves as deeply as resources permit), the same approach is not directly applicable to poker. There are fundamental differences in the structure of imperfect information game trees, and the total number of possibilities to consider is prohibitive.

Toward this end, POKI supports a simulation-based betting strategy. It consists of playing out many likely scenarios, keeping track of how much money each

decision will win or lose. Every time it faces a decision, POKI invokes the Simulator to get an estimate of the expected value of each betting action (see the dashed box in Figure 2.2, with the Simulator replacing the Action Selector). A single trial consists of playing out the game from the current state through to the end. Many trials produce a full-information simulation (which is not to be confused with the simpler roll-out simulations mentioned in Section 2.5.1).

Each trial is played out twice – once to consider the consequences of a check or call, and once to consider a bet or raise. In each trial, a hand is assigned to each opponent, based on the probabilities maintained in their weight table. The resulting instance is simulated to the end, and the amount of money won or lost is

determined. Probability triples are used to determine the future actions of POKI and the opponents, based on the two cards they are assigned for that trial and threshold values determined by the specific opponent model. The average over all trials in

which we check or call is the call EV, and the average for the matching trials where we bet or raise is the raise EV. The fold EV can be calculated without simulation, since there is no future profit or loss. In the current implementation, we simply choose the action with the greatest expectation. If two actions have the same expectation, we opt for the most aggressive one (prefer a raise, then a call, then a fold). To increase the program’s unpredictability, we can randomize the selection between betting actions whose EVs are close in value, but the level of noise in the simulation already provides some natural variation for close decisions.18

Enumerating all possible opponent hands and future community cards would be analogous to exhaustive game-tree search, and is impractical for poker. Simulation is analogous to a selective expansion of some branches of a game tree. To get a good approximation of the expected value of each betting action, one must have a preference for expanding and evaluating the nodes that are most likely to occur. To obtain a correctly weighted average, all of the possibilities must be considered in proportion to the underlying non-uniform probability distribution of the opponent hands and future community cards. We use the term selective sampling to indicate that the assignment of probable hands to each opponent is consistent with this distribution.

At each betting decision, a player must choose a single action. The choice is strongly correlated to the quality of the cards that they have, and we can use the opponent model and formula-based betting strategy to compute the likelihood that the player will fold, call, or raise in each instance. The player’s action is then randomly selected based on this probability distribution, and the simulation proceeds. As shown in Figure 2.2, the Simulator calls the opponent model to obtain each of our opponent’s betting actions and our own actions. Where two or three alternatives are equally viable, the resulting EVs should be nearly equal, so there is little consequence if the “wrong” action is chosen.

18 Unfortunately, this simple approach does convey some useful information to observant opponents, in that the strength of our hand and the betting level are too closely correlated. Moving toward a mixed equilibrium strategy would provide better information-hiding, and may be necessary to reach the world-class level.
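A rough sketch of this two-branch simulation is shown below. The helper functions are hypothetical stand-ins for the components described above (weight-table sampling, board dealing, and the formula-based play-out), so this illustrates the control flow rather than POKI's actual code.

def simulate_betting_decision(state, n_trials,
                              sample_opponent_hands, sample_future_board, play_out):
    """Estimate the EV of check/call versus bet/raise by selective sampling.

    sample_opponent_hands(state): deal hole cards to each active opponent in
        proportion to their weight tables (selective sampling).
    sample_future_board(state): deal the remaining community cards.
    play_out(state, hands, board, first_action): finish the game with the
        formula-based strategy choosing all later actions; returns our net
        profit (in small bets) for this trial.
    """
    call_total = raise_total = 0.0
    for _ in range(n_trials):
        hands = sample_opponent_hands(state)
        board = sample_future_board(state)
        # The same cards are used for both branches of the trial.
        call_total += play_out(state, hands, board, first_action="call")
        raise_total += play_out(state, hands, board, first_action="raise")
    call_ev = call_total / n_trials
    raise_ev = raise_total / n_trials
    fold_ev = 0.0   # folding has no future profit or loss
    # max() keeps the first of equal values, so ties go to the most aggressive action.
    return max([("raise", raise_ev), ("call", call_ev), ("fold", fold_ev)],
               key=lambda pair: pair[1])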

It is reasonable to expect that the simulation approach will be better than the static approach, because it essentially uses a selective search to augment and refine a static evaluation function. Barring serious misconceptions, or bad luck on a limited sample size, playing out many relevant scenarios will improve the estimates obtained by heuristics alone, resulting in a more accurate assessment overall. As seen in other domains, we find that the search itself contains implicit knowledge. A simulation contains inherent information that improves the basic evaluation, such as:

• hand strength (fraction of trials where our hand is better than the one assigned to the opponent),

• hand potential (fraction of trials where our hand improves to the best, or is overtaken), and

• subtle considerations that are not addressed in the simplistic betting strategy (e.g., implied odds, extra bets won after a successful draw).

It also allows complex strategies to be uncovered without providing additional expert knowledge. For example, simulations produce advanced betting tactics like check-raising as an emergent property, even if the basic strategy used within each trial is incapable of this play.

At the heart of the simulation is the evaluation function, discussed in Section 2.5.2.5. The better the quality of the evaluation function, the better the simulation results will be. Furthermore, the evaluation system must be compatible and harmonious with the nature of the simulations. Since the formula-based betting strategy was developed and tuned for the original system, it may not be entirely consistent or appropriate for use in the simulation-based version. It is possible that built-in biases that were useful (or compensated for) in the original version are sources of serious systemic error when used as the evaluation function for simulations. It may be the case that a simpler function would be more balanced, producing better results.

One of the interesting results of work on alpha-beta search is that even a simple evaluation function can result in a powerful program. We see a similar situation

in poker. The implicit knowledge contained in the search itself improves the basic evaluation, refining the quality of the approximation. As with alpha-beta, there are important trade-offs to consider. A more sophisticated evaluation function can reduce the size of the tree, at the cost of more time spent on each node. In simulation analysis, we can improve the accuracy of each trial, but at the expense of reducing the total number of trials performed in real-time.

Variations of selective sampling have been used in other games, including Scrabble [33, 32], backgammon [39], and bridge [17]. Likelihood weighting is another method of biasing stochastic simulations [15, 31]. In our case, the goal is different because we need to differentiate between EVs (for call/check, bet/raise) instead of counting events. Poker also imposes tight real-time constraints (typically a maximum of a few seconds per decision). This forces us to maximize the information gained from a limited number of samples. The problem of handling unlikely events (which is a concern for any sampling-based result) is smoothly handled by the re-weighting system (Section 2.5.2.4), allowing POKI to dynamically adjust the likelihood of an event based on observed actions. An unlikely event with a large payoff figures naturally into the EV calculations.

2.6 Opponent Modeling

No poker strategy is complete without a good opponent modeling system. A strong poker player must develop a dynamically changing (adaptive) model of each opponent, to identify potential weaknesses.

In traditional games, such as chess, this aspect of strategy is not required to achieve a world-class level of play. In perfect information games, it has been sufficient to play an objectively best move, without special regard for the opponent. If the opponent plays sub-optimally, then continuing to play good objective moves will naturally exploit those errors. Opponent modeling has been studied in the context of two-player games, but the research has not translated into significant performance benefits [10, 19, 20].

In poker, the situation is different. Two opponents can make opposite kinds

of errors – both can be exploited, but it requires a different response for each. For example, one opponent may bluff too much, the other too little. We adjust by calling more frequently against the former, and less frequently against the latter. To simply call with the optimal frequency would decline an opportunity for increased profit, which is how the game is scored. Even very strong players can employ radically different styles, so it is essential to try to deduce each opponent’s basic approach to the game, regardless of how well they play.

2.6.1 RoShamBo

The necessity of modeling the opponent is nicely illustrated in the game of RoShamBo (also known as Rock-Paper-Scissors). This is a well-known “kid’s game”, where each player chooses an action simultaneously, and there is a cyclic set of outcomes: scissors beats paper, paper beats rock, and rock beats scissors (choosing the same action results in a tie). The game-theoretic equilibrium strategy for this zero-sum game is also well known: one chooses any of the three actions uniformly at random. However, the equilibrium strategy is oblivious to opponent actions, and is not exploitive. The best one can do using the equilibrium strategy is to break even in the long run (an expected value of zero, even if the opponent always goes rock). Contrary to popular belief, the game is actually very complex when trying to out-guess an intelligent opponent.

The International RoShamBo Programming Competition19 is a contest for programs that play Rock-Paper-Scissors [3]. More than 50 entries were submitted from all over the world for each competition. Every program plays every other program in a round-robin tournament, with each match consisting of 1,000 games. Scores are based on total games won, and on the match results (with the match declared a draw if the scores are not different by a statistically significant margin). Since the equilibrium strategy can only draw each match, it consistently finishes in the middle of the pack, and has no chance of winning the tournament.

The authors of the top entries, including some well-known AI researchers, have commented that writing a strong RoShamBo program was much more challenging

19 See http://www.cs.ualberta.ca/˜games.

than they initially expected [4, 13]. The best programs do sophisticated analysis of the full history of the current match in order to predict the opponent’s next action, while avoiding being predictable themselves. Programs that used a simple rule-base for making their decisions consistently finished near the bottom of the standings. All of the top programs define completely general methods for pattern detection, some of which are remarkably elegant. Given the simple nature of RoShamBo, some of these nice ideas might be applicable to the much more complex problems faced by a poker playing system.
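Even the simplest exploitive idea illustrates the point: predict the opponent's next throw from their action history and play the counter. The sketch below is purely illustrative and far weaker than the actual contest winners, which model much richer patterns.

import random
from collections import Counter

BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def frequency_counter_move(opponent_history):
    """Counter the opponent's most frequent past throw; with no history yet,
    fall back to the equilibrium strategy (uniformly random)."""
    if not opponent_history:
        return random.choice(list(BEATS))
    most_common, _ = Counter(opponent_history).most_common(1)[0]
    return BEATS[most_common]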

2.6.2 Statistics-based Opponent Modeling

In poker, opponent modeling is used in at least two different ways. We want a general method of deducing the strength of the opponent’s hand, based on the betting actions. We also want to predict their specific action in a given situation.

At the heart of an opponent modeling system is a predictor. The predictor’s job is to map any given game context into a probability distribution over the opponent’s potential actions. In Limit poker, this distribution can be represented by a probability triple {Pr(fold), Pr(call), Pr(raise)}.

One way to predict an opponent action would be to use our own betting strategy, or some other set of rules, to make a rational choice on behalf of the opponent. When we use this type of fixed strategy as a predictor, we are assuming the player will play in one particular “reasonable” manner, and we refer to it as generic opponent modeling (GOM).

Another obvious method for predicting opponent actions is to expect them to continue to behave as they have done in the past. For example, if an opponent is observed to bet 40% of the time immediately after the flop, we can infer that they will normally bet with the top 40% of their hands in that situation (including a certain percentage of weak hands that have a good draw). When we use an opponent’s personal history of actions to make predictions, we call it specific opponent modeling (SOM).

Our first opponent modeling effort was based on the collection of simple statistical information, primarily on the betting frequencies in a variety of contexts.

For example, a basic system distinguishes twelve contexts, based on the betting round (pre-flop, flop, turn, or river), and the betting level (zero, one, or two or more bets). For any particular situation, we use the historical frequencies to determine the opponent’s normal requirements (i.e., the average effective hand strength) for the observed action. This threshold is used as input into a formula-based betting strategy that generates a mixed strategy of rational actions for the given game context (see Section 2.5.2.5).

However, this is a limited definition of distinct contexts, since it does not account for many relevant properties, such as the number of active opponents, the relative betting position, or the texture of the board cards (e.g., whether many draws are possible). Establishing a suitable set of conditions for defining the various situations is not an easy task. There are important trade-offs that determine how quickly the algorithm can learn and apply its empirically discovered knowledge. If a context is defined too broadly, it will fail to capture relevant information from very different circumstances. If it is too narrow, it will take too long to experience enough examples of each scenario, and spotting general trends becomes increasingly difficult. Equally important to deciding how many equivalence classes to use is knowing what kinds of contextual information are most relevant in practice.

Furthermore, there are many considerations that are specific to each player. For example, some players will have a strong affinity for flush draws, and will raise or re-raise on the flop with only a draw. Knowing these kinds of personality-specific characteristics can certainly improve the program’s performance against typical human players, but this type of modeling has not yet been fully explored.

Opponent modeling in poker appears to have many of the characteristics of the most difficult problems in machine learning – noise, uncertainty, an unbounded number of dimensions to explore, and a need to quickly learn and generalize from a relatively small number of heterogeneous training examples.20 As well, the real-time nature of poker (a few seconds per betting decision) limits the effectiveness of most popular learning algorithms.

20 By “heterogeneous” we mean that not all games and actions reveal the same type or amount of information. For example, if a player folds a hand, we do not get to see the cards.
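The twelve-context bookkeeping described above can be sketched as follows. This is a simplified illustration, not POKI's code; in the real system the observed frequencies are converted into hand-strength thresholds and fed to the formula-based betting strategy.

from collections import defaultdict

def context(betting_round, bets_to_call):
    """Twelve contexts: betting round x betting level (zero, one, or two or more bets)."""
    return (betting_round, min(bets_to_call, 2))

class FrequencyModel:
    def __init__(self):
        self.counts = defaultdict(lambda: {"fold": 0, "call": 0, "raise": 0})

    def observe(self, betting_round, bets_to_call, action):
        self.counts[context(betting_round, bets_to_call)][action] += 1

    def action_frequencies(self, betting_round, bets_to_call):
        """Relative fold/call/raise frequencies observed in this context."""
        c = self.counts[context(betting_round, bets_to_call)]
        total = sum(c.values()) or 1
        return {a: n / total for a, n in c.items()}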

2.6.3 Neural Networks-based Opponent Modeling

To create a more general system for opponent modeling, we implemented a neural network for predicting the opponent’s next action in any given context. Guessing the next action is useful for planning advanced betting strategies, such as a check-raise, and is also used in each trial of a full-information simulation (see Section 2.5.3).

A standard feed-forward neural net was trained on contextual data collected from online games against real human opponents. The networks contain a set of nineteen inputs corresponding to properties of the game context, such as the number of active players, texture of the board, opponent’s position, and so on. These are easily identified factors that may either influence, or be correlated with, a player’s next action. The output layer consists of three nodes corresponding to the fold, call, and raise probabilities. Given a set of inputs, the network will produce a probability distribution of the opponent’s next action in that context (by normalizing the values of the three output nodes).

By graphically displaying the relative connection strengths, we are able to determine which input parameters have the largest effects on the output. After observing networks trained on many different opponents, it is clear that certain factors are dominant in predicting the actions of most opponents, while other variables are almost completely irrelevant. The accuracy of these networks (and other prediction methods) is measured by cross-validating with the real data collected from past games with each opponent. Details are available in a previous paper [12].

Figure 2.7 shows a typical neural network after being trained on a few hundred games played by a particular opponent. The inputs are on the top row, with the activation level ranging from zero (fully white) to one (fully black). The thickness of the lines represents the magnitude of the weights (black being positive, grey being negative). In this example, the connections from input node number twelve (true if the opponent’s last action was a raise) are very strong, indicating that it is highly correlated with what the opponent will do next. The bottom row shows the network predicting that the opponent will probably fold, with a small chance of calling.

Figure 2.7: A neural network predicting an opponent’s future action.
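A minimal feed-forward predictor of this shape (nineteen game-context inputs, a small hidden layer, and three outputs normalized into a probability triple) might look like the sketch below. The hidden-layer size and the random weights are placeholders for illustration; they are not POKI's trained network.

import numpy as np

rng = np.random.default_rng(0)
N_INPUTS, N_HIDDEN, N_OUTPUTS = 19, 10, 3   # hidden-layer size is illustrative

W1 = rng.normal(scale=0.1, size=(N_HIDDEN, N_INPUTS))
W2 = rng.normal(scale=0.1, size=(N_OUTPUTS, N_HIDDEN))

def predict_triple(context_features):
    """Map a 19-element game-context vector to (fold, call, raise) probabilities."""
    x = np.asarray(context_features, dtype=float)
    hidden = np.tanh(W1 @ x)
    output = 1.0 / (1.0 + np.exp(-(W2 @ hidden)))   # three output activations in (0, 1)
    return output / output.sum()                    # normalize into a probability triple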

The first result of this study was the identification of new features to focus on when modeling common opponents. This produced a relatively small set of context equivalence classes that significantly improved the statistical opponent modeling reported previously [12]. We are currently experimenting with using a real-time neural network system to replace the frequency table method entirely. Preliminary results from games with both human and computer opponents suggest that this may lead to a dramatic improvement.

2.7 Performance Evaluation

Measuring the performance of a poker-playing program is difficult. POKI is a complex system of interacting components, and changing a single component often has cascading effects, leading to unpredictable and unforeseen behavior. We have employed a variety of methods for assessing the program, but none of them is completely adequate.

2.7.1 Experimental Methodology

Poker is a game of high variance, and the element of luck dominates the outcome of any one game. Among evenly matched players, the effects of good or bad fortune are still significant even after several thousand games. Measurements are always susceptible to high levels of noise and anomalous games. Furthermore, players are constantly adapting during this time, improving their understanding of each opponent, or changing styles to make it more difficult for others to form an accurate model of them.

Internal experiments are a simple way to test new features, by playing older versions of the program against newer versions. This provides an easily controlled closed environment, where many thousands of games can be played quickly. To reduce variance we use a duplicate tournament system similar to that used in duplicate bridge. Since each game can be played with no memory of preceding games, in a ten-player game, each deal of the cards can be replayed ten times, shuffling the seating arrangement each time so that every player holds each hand once. This reduces the amount of noise considerably, and also reduces the effects of relative seating position (for example, it would be advantageous to always act immediately after a particularly aggressive or unpredictable player). However, this method still admits a lot of variance. For example, one player might choose to fold a marginal hand whereas another might play in that same situation, possibly winning or losing many bets.

Another assessment method attempts to compute an objective measurement of the expected value for each decision, using the perfect information of the actual situation. For example, a weak looking hand might actually win 20% of the time against the current field, and the expected value for making a “loose call” in that situation might be +0.6 bets, compared to −0.6 bets for a more conservative fold, or −0.2 bets for a raise. When comparing two or more players, this kind of specific evaluation of an action can be applied to the first differing decision of each deal, since the subsequent betting actions are not comparable (being different contexts). While this method is attractive in principle, it is somewhat difficult to define a reliable expected value measure for all situations, and consequently it has not been

used extensively to date.21

The major drawback of internal experiments is that they lack the wide variety of styles and game conditions exhibited by real players. Other researchers have previously commented on the “myopia” of self-play games in chess [2]. The problem is much more acute and limiting for the development of a poker-playing system, because the style of the opponent is of paramount importance to correct play. A program that does very well against normal opponents may be vulnerable to a particular type of erratic or irrational player, even if the play is objectively worse. Although we try to create a variety of computer opponents by varying parameter settings of the players (e.g., percentage of hands played, aggressiveness, advanced betting strategies, etc.), the range of styles is still much more restricted than that of human opponents.

Even with a carefully selected, well-balanced field of artificial opponents, it is important to not over-interpret the results of any one experiment. Often all that can be concluded is the relative ranking of the algorithms amongst themselves. One particular strategy may dominate in an internal experiment, even though another approach is more robust in real games against human opponents. A good demonstration of this limitation was seen in the testing of early simulation-based betting strategies. The results of internal experiments were very encouraging, and occasionally spectacular. However, this was largely due to the pure aggressiveness of the new strategy, which was particularly effective at exploiting the overly conservative nature of its computer opponents at that time. When testing the new betting strategy in online games, it was much less successful against reasonably strong human opposition, who were able to adapt quickly.

For this reason, playing games against real human opponents is still indispensable for proper evaluation. Unfortunately, this entails other sources of inaccuracy. A poker program can participate in a real game with willing participants, using a laptop computer on the table. This turns out to be surprisingly difficult, due to the fast pace of a real game and the amount of information to be entered. Even with

21 Many of these issues were resolved after the publication of this paper, when DIVAT replaced EVAT, as discussed in detail in Chapter 5.

numerous single-character accelerators, text entry is a bottleneck to the process. A well-designed graphical interface might help considerably, and an automatic card-reader (e.g., a bar-code scanner) could prevent the operator from giving away useful information, since only the program would know its hand. However, it may always be more practical to have human players participate in a virtual game, rather than having programs compete in the physical world.

For more than three years, our programs have regularly participated in online poker games against human opposition on the Internet Relay Chat (IRC). Players connect to the IRC poker server and participate in numerous games that are conducted by dedicated software. No real money is at stake, but the accumulated bankroll for each player is preserved between sessions, and a variety of statistics are maintained. There is a hierarchy of games for Limit Hold’em, and a player must win a specified amount in the introductory level games to qualify for the higher tiered games. These lowest level games (open to everyone) vary from wild to fairly normal, offering a wide variety of game conditions to test the program. The second and third tier games resemble typical games in a casino or card room. Most of these players take the game seriously, and some are very strong (including some professionals).

Since POKI has been a consistent winner in these higher tiered games (and is in the top 10% of all players on the server), we believe the program plays better than the average player in a low-limit casino game.

Recently, several online poker servers have begun offering real-money games played over the Internet. The response has been very favorable, and it is normal to have more than 1,000 players logged into a virtual card room at any given time. With the agreement of the entrepreneurs, this might provide a future venue for testing programs in a completely realistic setting. Another form of online poker is a free Java web applet, where users can play at a table with poker programs and other people. POKI currently hosts such a facility, which provides an interesting hybrid between internal experiments and games against humans.22

22 See http://www.cs.ualberta.ca/˜games.

While online poker is useful for measuring the progress of a program, it is not a controlled environment. The game is constantly changing, and positive results over a given time-frame can easily be due to playing against a weaker set of opponents, rather than actual improvements to the algorithm. Considering that it may take thousands of games to measure small improvements, it is difficult to obtain precise quantified results. There is also no guarantee that an objectively stronger program will be more successful in this particular style of game. Certain plays that might be good against master players could be inappropriate for the more common opponents in these games. Moreover, regular players may have acquired a lot of experience

against previous versions of POKI, making it difficult to achieve the same level of performance. As a result, it is still beneficial to have a master poker player review hundreds of games played by the program, looking for errors or dubious decisions. Needless to say, this is a slow and laborious method of assessment. A human master can also play against one or more versions of the program, probing for weaknesses

or unbalanced strategy. Based on these direct encounters, we believe POKI is an intermediate level player, but has not yet reached the master level.

2.7.2 Experimental Results

The unit of measurement for program performance is the average number of small bets won per game (sb/g). For example, in a game of $10/$20 Hold’em with 40 games per hour, an income rate of +0.05 sb/g translates into $20 per hour. Human players sometimes use this metric in preference to dollars per hour, since it is not dependent on the speed of play, which can vary from 20 to 60 games per hour.

Since no variance reduction methods are available for online games, we generally test new algorithms for a minimum of 20,000 games before interpreting the results. On this scale, the trends are usually clear and stable amid the noise. Unfortunately, it can take several weeks to accumulate this data, depending on the popularity of the online game in question.

Any embellishment resulting in an improvement of +0.05 sb/g in internal experiments against previous versions is considered to be significant. However, this does

Figure 2.8: POKI’s performance on the IRC poker server (introductory level). [Plot of Small Bets Won versus Hands Played (0 to 35,000); legend: poki (+0.22 sb/h), pokisim-S (+0.19 sb/h).]

not always translate into comparable gains in actual games, as many factors affect the ultimate win rate. Nevertheless, the program has made steady progress over the

course of the project. In recent play on the IRC poker server, POKI has consistently performed between +0.10 and +0.20 sb/g in the lowest level games, and between +0.07 and +0.10 sb/g in the higher tiered games against stronger opposition. The results of simulation-based betting strategies have so far been inconsistent. Despite some programming errors that were discovered later, the earliest (1998)

versions of simulation-based LOKI outperformed the regular formula-based version in both internal experiments (+0.10 ± 0.04 sb/g), and in the introductory level games of IRC (+0.13 sb/g vs +0.08 sb/g). However, it lost slowly in the more advanced IRC games, whereas the regular version would at least break even.

The more recent versions are substantially stronger, but a similar pattern is apparent. Figure 2.8 shows that both the regular betting strategy (labeled “poki”) and the simulation-based betting strategy (labeled “pokisim-S”) win at about +0.20 sb/g

in the introductory level games on the IRC poker server. It is quite likely that differences in playing strength cannot be demonstrated against this particular level of opposition, since both may be close to their maximum income rate for this game. In other words, there are diminishing returns after achieving a very high win rate, and further improvement becomes increasingly difficult. However, there is a clear difference in the more advanced games, where the regular betting strategy routinely wins at about +0.09 sb/g, but the simulation-based version could only break even (peaking at +0.01 sb/g after 5,000 games, but returning to zero after 10,000 games).

When the simulation-based versions were introduced, some of the credit for their success was probably due to the solid reputation that the more conservative

versions of POKI had previously established. Many opponents required several hundred games to adjust to the more aggressive style resulting from the simulations. However, the stronger opposition was able to adapt much more quickly, and learned to exploit certain weaknesses that had not been detrimental against weaker players. Figure 2.9 shows some recent results using the online web applet. This game consists of several computer players (some of which are intentionally weaker than

the most recent versions of POKI), and at least one human opponent at all times. Since the artificial players are quite conservative, this game is quite a bit tighter than most IRC games, and the win rate for the regular formula-based betting strategy is +0.13 sb/g. The simulation-based betting strategy performs at +0.08 sb/g, indicating that this particular set of opponents is much less vulnerable to its strategy differences than the players in the introductory IRC games.

A new simulation-based player (labeled “pokisim-A”) maintains three different methods for opponent modeling (statistical frequencies, the rule-based method used by POKI, and a real-time neural network predictor), and uses whichever one has been most successful for each opponent in the past. Not surprisingly, it outperforms the single approach, earning +0.12 sb/g, for a 50% overall improvement in this particular game. This is roughly the same degree of success as the formula-based strategy (“poki”), despite the fact that the original system has benefited from much more tuning, and that the underlying evaluation function was not designed for this fundamentally different approach.

Figure 2.9: POKI’s performance on the web applet. [Plot of Small Bets Won versus Hands Played (0 to 35,000); legend: poki (+0.13 sb/h), pokisim-A (+0.12 sb/h), pokisim-S (+0.08 sb/h).]

We note that the variance is quite a bit higher in this experiment, which is the more common situation.23 The results could be quite misleading if interpreted after only 5,000, or even after 15,000 games. The two bottom lines cross over at 15,000 games, but “pokisim-S” is lower before and after that point. There have been hundreds of internal experiments over the last few years, testing individual enhancements, and the effects of different game conditions. We refer the reader to our previous publications for further details [7, 6, 26, 8, 5, 27, 29, 9, 12].

2.8 A Framework for Stochastic Game-Playing Programs

Using simulations for stochastic games is not new. Consider the following three games:

1. In Scrabble, the opponent’s tiles are unknown, so the outcome of future turns must be determined probabilistically. A simulation consists of repeatedly

23 The relatively low variance in the previous figure may again be a result of both programs being close to maximal gains against that particular level of opposition.

generating a plausible set of tiles for the opponent. Each trial might involve a two ply or four ply search of the game tree, to determine which initial move leads to the maximum gain in points for the program. A simulation-based approach has been used for a long time in Scrabble programs. Brian Sheppard,

the author of the Scrabble program MAVEN, coined the term “simulator” for this type of game-playing program structure [33, 32].

2. In backgammon, simulation is used for “roll-outs” of the remainder of a game, which are now generally regarded to be the best available estimates for the equity of a given position. A simulation consists of generating a series of dice rolls, and playing through to the end of the game with a strong program choosing moves for both sides. Gerry Tesauro has shown that relatively simple roll-outs can achieve a level of play comparable to the original neural

network evaluation function of TD GAMMON [39, 40].

3. In bridge, the cards of other players are hidden information. A simulation consists of assigning cards to the opponents in a manner that is consistent with the bidding. The game is then played out and the result determined. Repeated deals are played out to decide which play produces the highest probability of success. Matt Ginsberg has used this technique in GIB to achieve a world-class level for play of the hand [17].

In the above examples, the programs are not using traditional Monte Carlo simulation to generate the unknown information. They use selective sampling, biased to take advantage of all the available information. In each case, and in poker, we are using information about the game state to skew the underlying probability distribution, rather than assuming a uniform or other fixed probability distribution. Monte Carlo techniques might eventually converge on the right answer, but selective sampling allows reduced variance and faster convergence.
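Operationally, skewing the underlying distribution simply means drawing the hidden information in proportion to its modeled likelihood instead of uniformly. A minimal sketch for dealing one opponent's hole cards from a weight table (the data layout is a hypothetical simplification):

import random

def deal_opponent_hand(weight_table, dead_cards, rng=random):
    """Selective sampling: draw a two-card holding with probability proportional
    to its weight-table entry, rather than uniformly (plain Monte Carlo).

    weight_table maps a tuple of two card codes to a relative weight;
    dead_cards are cards already visible (our hand and the board)."""
    holdings = [h for h in weight_table if not set(h) & set(dead_cards)]
    weights = [weight_table[h] for h in holdings]
    return rng.choices(holdings, weights=weights, k=1)[0]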

In the Scrabble example, MAVEN does not assign tiles for the opponent by choosing from the remaining unknown tiles uniformly at random. It biases its choice to give the opponent a “nice” hand, because strong players usually make plays that leave them with good tiles for future turns (such as letters that may score

the 50 point bonus for using all tiles). It also samples without replacement, to ensure that every remaining tile is selected equally often, thereby reducing the natural variance [33]. In backgammon, future dice rolls are generated randomly, but the choice of moves is made by an external player agent. In bridge, the assignment of cards to an opponent is subject to the information obtained from the bidding. If one opponent has indicated high point strength, then the assignment of cards to that opponent reflects this information [17].

The alpha-beta framework has proven to be an effective tool for the design of two-player, zero-sum, deterministic games with perfect information. It has been around for more than 30 years, and in that time the basic structure has not changed much (although there have been numerous algorithmic enhancements to improve the search efficiency). The search technique usually has the following properties:

1. The search is full breadth, but limited depth. That is, all move alternatives are considered, except those that can be logically eliminated (such as alpha-beta cutoffs).

2. Heuristic evaluation occurs at the leaf nodes of the search tree, which are interior nodes of the game tree.

3. The search gets progressively deeper (iterative deepening), until real-time constraints dictate that a choice be made.

The alpha-beta algorithm typically uses integer values for positions and is designed to identify a single “best” move, not differentiating between other moves. The selection of the best move may be brittle, in that a single node mis-evaluation can propagate to the root of the search and alter the move choice. As the search progresses, the bounds on the value of each move are narrowed, and the certainty of the best move choice increases. The deeper the search, the greater the confidence in the selected move, and after a certain point there are diminishing returns for further search.

In an imperfect information game of respectable size, it is impossible to examine the entire game tree of possibilities [21]. This is especially true for poker because of

SimulationFramework()
{
    obvious_move = NO
    trials = 0
    while( (trials <= MAX_TRIALS) and (obvious_move == NO) )
    {
        trials = trials + 1
        position = current state of the game
                   + ( selective sampling to generate missing information )
        for( each legal move m )
        {
            value[m] += PlayOut( position.m, info )
        }
        if( exists i such that value[i] >> value[j] (forall j ≠ i) )
        {
            obvious_move = YES
        }
    }
    select move based on value[]
}

Figure 2.10: Framework for two-player, zero-sum, imperfect information games.

the many opponents, each making independent decisions. The pseudo-code for the proposed method of selective sampling is shown in Figure 2.10 [5]. This approach has the following properties:

1. The search is full depth, but limited breadth. That is, each line is played out to the end of the game (in poker, to the showdown or until one player wins uncontested).

2. Heuristic evaluation occurs at the interior nodes of the search tree to decide on future moves by the players. Outcomes are determined at the leaf nodes of the game tree, and are 100% accurate.

3. The search gets progressively wider, performing trials consistent with the probability distribution of hidden information, until real-time constraints dictate that a choice be made.

Figure 2.11: Comparing two search frameworks.

The expected values of all move alternatives are computed, and the resulting choice may be a randomized mixed strategy. As the search progresses, the values for each move become more precise, and the certainty of the highest expected value choice increases.24 The more trials performed, the greater the confidence in the selected move, and after a certain point there are diminishing returns for performing additional trials.

Although the move sequences examined during an alpha-beta search are systematic and non-random, it can be viewed as a sampling of related positions, used as evidence to support the choice of best move. In the case of selective sampling, the evidence is statistical, and the confidence can be measured precisely. The two contrasting methods are depicted in Figure 2.11, with alpha-beta search on the left and simulation-based search on the right.

As noted previously, it is not essential to continue each trial to the end of the game. In stochastic games, the expected value of internal game tree nodes can also be heuristically estimated with a score (as in Scrabble), an evaluation function (as in backgammon), or other methods (such as the roll-out simulations described in Section 2.5.1).

An important feature of the simulation-based framework is the notion of an obvious move. Although some alpha-beta programs try to incorporate an obvious

24 The “best” move is highly subjective. Here we do not consider deceptive plays that misrepresent the strength of the hand.

move feature, the technique is usually ad hoc and based on programmer experience, rather than a sound analytic technique (an exception is the B* proof procedure [1]). In the simulation-based framework, an obvious move is well-defined. If one choice exceeds the alternatives by a statistically significant margin, we can stop the simulation early and take that action, with precise knowledge of the mathematical validity of the decision. Like alpha-beta pruning, this early cut-off may prove to be an effective means for reducing the required amount of search effort, especially if it is applied at all levels of the imperfect information game tree.

The proposed framework is not a complete ready-made solution for stochastic games, any more than alpha-beta search is the only thing required for high performance in a particular deterministic game. As discussed in Section 2.5.3, there are many trade-offs to be considered and explored. One must find a good balance between the amount of effort spent on each trial, and the total number of trials completed in the allotted time. There are many different ways to create an evaluation function, and as with other strong game programs, speed and consistency may be more important than explicit knowledge and complexity.
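The obvious-move cut-off can be made precise with an ordinary significance test on the running EV estimates. The sketch below uses a simple two-standard-error rule; the thesis does not prescribe this exact criterion, so the threshold and minimum trial count are illustrative assumptions.

from statistics import mean, stdev

def obvious_move(samples_by_action, z=2.0, min_trials=30):
    """Return the dominating action if its mean EV exceeds every rival's by at
    least z combined standard errors; otherwise return None (keep sampling)."""
    stats = {}
    for action, samples in samples_by_action.items():
        if len(samples) < min_trials:
            return None
        std_err = stdev(samples) / len(samples) ** 0.5
        stats[action] = (mean(samples), std_err)
    best = max(stats, key=lambda a: stats[a][0])
    best_mean, best_err = stats[best]
    for action, (m, err) in stats.items():
        if action != best and best_mean - m < z * (best_err ** 2 + err ** 2) ** 0.5:
            return None
    return best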

2.9 Conclusions and Future Work

Poker is a complex game, with many different aspects, from mathematics and hidden information to human psychology and motivation. To master the game, a player must handle all of them at least adequately, and excel in most. Strong play also requires a player to be adaptive and unpredictable – any form of fixed recipe can and will be exploited by a good opponent. Good players must dynamically alter their style, based on the current game conditions and on historical knowledge (including past sessions). In contrast, traditional games like chess are somewhat homogeneous in nature, where one can focus very deeply on one particular type of strategy.

Like other computer game-playing research, poker has a well-defined goal, and the relative degree of success is measurable – whether the program plays the game well, or does not. We have resisted the temptation of focusing only on the clearly tractable problems, in favor of grounding the research on those topics that actually

affect the bottom line the most. As a result, developing POKI has been a cyclic process. We improve one ability of the program until it becomes apparent that another property is the performance bottleneck. Some of the components in the current system are extremely simplistic (such as a constant where a formula or an adaptive method would be better), but do not yet appear to limit overall performance. Others have received much more attention, but are still woefully inadequate.

Human poker players are very good at understanding their opponent, often forming an accurate model based on a single data point (and occasionally before the first hand is dealt!). Programs may never be able to match the best players in this area, but they must at least try to reduce the gap, since they can clearly be

superior in other aspects of the game. Although POKI has successfully used opponent modeling to improve its level of play, it is abundantly clear that these are only the first steps, and there are numerous opportunities for improvement. For example, the current system becomes slower to adjust as more information is collected on a particular opponent. This “build-up of inertia” after thousands of data points have been observed can be detrimental if the player happens to be in an uncommon mood that day. Moreover, past success may have largely been due to opponents staying with a fixed style that does not vary over time (most computer opponents certainly have this property). It is much more difficult to track good players who constantly “change gears” for a relatively brief time. Although recent actions are mixed with the long-term record, a superior historical decay function could allow the system to keep up with current events better.

It is easy to gather lots of data on each opponent, but it is difficult to discern the most useful features. It is possible that simpler metrics may be better predictors of an opponent’s future behavior. There are also several techniques in the literature for learning in noisy domains where one must make inferences based on limited data, which have not yet been explored.

For the simulations, the major problem is the high variance in the results. Even with noise-reduction techniques, the standard deviation can still be high. Faster machines and parallel computations might help to base decisions on a larger sample size. This eventually has diminishing returns, and our empirical results suggest

that the benefits may be small beyond a necessary minimum number of data points (roughly 500). Once the critical minimum can be attained in real-time, the more important issue is whether the trials are fair and representative of the situation being modeled.

For the game of bridge, simulations have successfully allowed computer programs to play hands at a world-class level [17]. Nevertheless, limitations in the simulation-based approach and the high variance have prompted Matt Ginsberg, the author of GIB, to look at other solutions, including building the entire search tree [18]. We, too, may have to look for new approaches to overcome the limitations of simulations.

The poker project is rich in research opportunities, and there is no shortage of new ideas to investigate. Having explored some fairly straightforward techniques to accomplish a reasonable level of play, we are now contemplating re-formulations that might produce a breakthrough to a world-class level of play. Toward this end, some of our current research has moved toward empirical techniques for deriving game-theoretic equilibrium solutions for betting strategies. We have also given more attention to two-player Hold’em, in which many of the flaws of the current system are emphasized.

However, it is not clear if a single unifying framework is possible for poker programs. Certain abilities, such as the accurate estimation of expected values in real time, will eventually be well solved. However, other aspects, like opponent modeling, are impossible to solve perfectly, since even the opponents may not understand what drives their actions!

Acknowledgments

This research was supported by the Natural Sciences and Engineering Research Council of Canada and the Alberta Informatics Circle of Research Excellence. We gratefully acknowledge the many comments and suggestions made by the anonymous referees.

Bibliography

[1] H. Berliner. The B* tree search algorithm: A best-first proof procedure. Artificial Intelligence, 12:23–40, 1979.
[2] H. Berliner, G. Goetsch, and M. Campbell. Measuring the performance potential of chess programs. Artificial Intelligence, 43:7–20, 1990.
[3] D. Billings. The First International RoShamBo Programming Competition. The International Computer Games Association Journal, 23(1):42–50, 2000.
[4] D. Billings. Thoughts on RoShamBo. The International Computer Games Association Journal, 23(1):3–8, 2000.
[5] D. Billings, D. Papp, L. Peña, J. Schaeffer, and D. Szafron. Using selective-sampling simulations in poker. In American Association of Artificial Intelligence Spring Symposium on Search Techniques for Problem Solving under Uncertainty and Incomplete Information, pages 13–18. American Association of Artificial Intelligence, 1999.
[6] D. Billings, D. Papp, J. Schaeffer, and D. Szafron. Opponent modeling in poker. In American Association of Artificial Intelligence National Conference, AAAI’98, pages 493–499, 1998.
[7] D. Billings, D. Papp, J. Schaeffer, and D. Szafron. Poker as a testbed for machine intelligence research. In R. Mercer and E. Neufeld, editors, Advances in Artificial Intelligence, AI’98, pages 228–238. Springer-Verlag, 1998.
[8] D. Billings, L. Peña, J. Schaeffer, and D. Szafron. Using probabilistic knowledge and simulation to play poker. In American Association of Artificial Intelligence National Conference, AAAI’99, pages 697–703, 1999.
[9] D. Billings, L. Peña, J. Schaeffer, and D. Szafron. Learning to play strong poker. In J. Fürnkranz and M. Kubat, editors, Machines That Learn To Play Games, volume 8 of Advances in Computation: Theory and Practice, chapter 11, pages 225–242. Nova Science Publishers, 2001.
[10] D. Carmel and S. Markovitch. Incorporating opponent models into adversary search. In American Association of Artificial Intelligence National Conference, AAAI’96, pages 120–125, 1996.
[11] C. Cheng. Recognizing poker hands with genetic programming and restricted iteration. In J. Koza, editor, Genetic Algorithms and Genetic Programming at Stanford, 1997.
[12] A. Davidson, D. Billings, J. Schaeffer, and D. Szafron. Improved opponent modeling in poker. In International Conference on Artificial Intelligence, ICAI’00, pages 1467–1473, 2000.
[13] D. Egnor. Iocaine powder. International Computer Games Association Journal, 23(1):33–35, 2000.
[14] N. Findler. Studies in machine cognition using the game of poker. Communications of the ACM, 20(4):230–245, 1977.

76 [15] R. Fung and K. Chang. Weighting and integrating evidence for stochastic simulation in Bayesian networks. In Uncertainty in Artificial Intelligence. Morgan Kaufmann, 1989. [16] M. Ginsberg. Partition search. In American Association of Artificial Intelli- gence National Conference, AAAI’96, pages 228–233, 1996. [17] M. Ginsberg. GIB: Steps toward an expert-level bridge-playing program. In The International Joint Conference on Artificial Intelligence, IJCAI’99, pages 584–589, 1999. [18] M. Ginsberg. GIB: Imperfect information in a computationally challenging game. Journal of Artificial Intelligence Research, 14:303–358, 2001. [19] H. Iida, J. Uiterwijk, H. J. van den Herik, and I. Herschberg. Thoughts on the application of opponent-model search. In Advances in Computer Chess 7, pages 61–78. University of Maastricht, 1995. [20] P. Jansen. Using Knowledge About the Opponent in Game-Tree Search. PhD thesis, School of Computer Science, Carnegie-Mellon University, 1992. [21] D. Koller and A. Pfeffer. Representations and solutions for game-theoretic problems. Artificial Intelligence, 94(1):167–215, 1997. [22] K. Korb, A. Nicholson, and N. Jitnah. Bayesian poker. In Annual Conference on Uncertainty in Artificial Intelligence, UAI’99, pages 343–350, 1999. [23] H. W. Kuhn. A simplified two-person poker. In H. W. Kuhn and A. W. Tucker, editors, Contributions to the Theory of Games, volume 1, pages 97– 103. Princeton University Press, 1950. [24] J. F. Nash. Equilibrium points in N-person games. Proceedings of the National Academy of Sciences, 36:48–49, 1950. [25] J. F. Nash. Non-cooperative games. Annals of Mathematics, 54:286–295, 1951. [26] D. Papp. Dealing with imperfect information in poker. Master’s thesis, De- partment of Computing Science, University of Alberta, 1998. [27] L. Pena.˜ Probabilities and simulations in poker. Master’s thesis, Department of Computing Science, University of Alberta, 1999. [28] M. Sakaguchi and S. Sakai. Solutions of some three-person stud and draw poker. Mathematica Japonica, 6(37):1147–1160, 1992. [29] J. Schaeffer, D. Billings, L. Pena,˜ and D. Szafron. Learning to play strong poker. In The International Conference on Machine Learning Workshop on Game Playing. J. Stefan Institute, 1999. Invited paper. [30] A. Selby. Optimal heads-up pre-flop hold’em. WWW, 1999. http://www.archduke.demon.co.uk/simplex/. [31] R. Shacter and M. Peot. Simulation approaches to general probabilistic in- ference on belief networks. In Uncertainty in Artificial Intelligence. Morgan Kaufmann, 1989.

77 [32] B. Sheppard. Toward Perfection of Scrabble Play. PhD thesis, Computer Science, University of Maastricht, 2002. [33] B. Sheppard. World-championship-caliber Scrabble. Artificial Intelligence, 134(1–2):241–275, 2002. [34] J. Shi and M. Littman. Abstraction methods for game theoretic poker. In T. A. Marsland and I. Frank, editors, Computers and Games 2000, LNCS 2063, pages 333–345. Springer-Verlag, 2002. [35] D. Sklansky. The Theory of Poker. Two Plus Two Publishing, 1992. [36] D. Sklansky and M. Malmuth. Hold’em Poker for Advanced Players. Two Plus Two Publishing, 2nd edition, 1994. [37] S. Smith. Flexible learning of problem solving heuristics through adaptive search. In The International Joint Conference on Artificial Intelligence, IJ- CAI’83, pages 422–425, 1983. [38] K. Takusagawa. Nash equilibrium of Texas Hold’em poker, 2000. Undergrad- uate thesis, Computer Science, Stanford University. [39] G. Tesauro. Temporal difference learning and TD Gammon. Communications of the ACM, 38(3):58–68, 1995. [40] G. Tesauro. Programming backgammon using self-teaching neural nets. Arti- ficial Intelligence, 134(1–2):181–199, 2002. [41] J. von Neumann and O. Morgenstern. The Theory of Games and Economic Behavior. Princeton University Press, 2nd edition, 1947. [42] D. Waterman. A generalization learning technique for automating the learning of heuristics. Artificial Intelligence, 1:121–170, 1970. [43] N. Zadeh. Winning Poker Systems. Prentice Hall, 1974. [44] N. Zadeh. Computation of optimal poker strategies. Operations Research, 25(4):541–562, 1977.

Chapter 3

Game-Theoretic Methods (2002-2003)

Approximating Game-Theoretic Optimal Strategies for Full-scale Poker 1

3.1 Introduction

Mathematical game theory was introduced by John von Neumann in the 1940s, and has since become one of the foundations of modern economics [14]. Von Neumann used the game of poker as a basic model for 2-player zero-sum adversarial games, and proved the first fundamental result, the famous Minimax Theorem. A few years later, John Nash added results for N-player non-cooperative games, for which he later won the Bank of Sweden Prize in Economic Sciences in Memory of Alfred Nobel [8]. Many decision problems can be modeled using game theory, and it has been employed in a wide variety of domains in recent years. Of particular interest is the existence of optimal solutions, or Nash equilibria.2

1 The contents of this chapter originally appeared in the proceedings of IJCAI’03. Copyright 2003 International Joint Conferences on Artificial Intelligence, Inc. All rights reserved. D. Billings, N. Burch, A. Davidson, T. Schauenberg, R. Holte, J. Schaeffer, and D. Szafron. Approximating game-theoretic optimal strategies for full-scale poker. In The Proceedings of the Eighteenth Inter- national Joint Conference on Artificial Intelligence, IJCAI’03, pages 661–668, 2003. 2 The term “optimal” is over-loaded in computer science, and is highly misleading (overly flatter- ing) in this particular context. The more neutral terms “equilibrium strategy” or “Nash equilibrium” are now preferred. An equilibrium strategy is optimal only in the sense of not being exploitable by a perfect opponent; but since it fails to exploit imperfect opponents, it can perform much worse than a maximal strategy in practice. The term “equilibrium” is used in several places where “optimal” appeared in the original publication. However, the term “pseudo-optimal” has been retained.

79 An equilibrium solution provides a randomized mixed strategy, which is basically a recipe of how to play in each possible situation. Using this strategy ensures that an agent will obtain at least the game-theoretic value of the game, regardless of the opponent’s strategy. Unfortunately, finding exact equilibrium solutions is limited to relatively small problem sizes, and is not practical for most real domains. This paper explores the use of highly abstracted mathematical models which capture the most essential properties of the real domain, such that an exact solution to the smaller problem provides a useful approximation of an equilibrium strategy for the real domain. The application domain used is Limit Texas Hold’em. Due to the computational limitations involved, only simplified poker variations have been solved in the past (e.g., [7, 9]). While these are of theoretical interest, the same methods are not feasible for real games, which are too large by many orders of magnitude [6]. Selby [10] computed an equilibrium solution for the abbreviated game of Pre- flop Hold’em. Shi and Littman [11] investigated abstraction techniques to reduce the large search space and complexity of the problem, using a simplified variant of poker. Takusagawa [13] created approximate equilibrium strategies for the play of three specific Hold’em flops and betting sequences. Using new abstraction techniques, we have produced viable “pseudo-optimal” strategies for the game of 2-player Texas Hold’em. The resulting poker-playing pro- grams have demonstrated a tremendous improvement in performance. Whereas the previous best poker programs were easily beaten by any competent human player, the new programs are capable of defeating very strong players, and can hold their own against world-class opposition. Although some domain-specific knowledge is an asset when creating accurate reduced-scale models, analogous methods can be developed for many other im- perfect information domains and generalized game trees. We describe a general method of problem reformulation that permits the independent solution of sub-trees by estimating the conditional probabilities needed as input for each computation. This paper makes the following contributions:

1. Abstraction techniques that can reduce a Θ(10^18) poker search space to a manageable Θ(10^7), without losing the most important properties of the game.

2. A poker-playing program that is a major improvement over previous efforts, and is capable of competing with world-class opposition.

3.2 Game Theory

Game theory encompasses all forms of competition between two or more agents. Unlike chess or checkers, poker is a game of imperfect information and chance outcomes. It can be represented with an imperfect information game tree having chance nodes and decision nodes, which are grouped into information sets. Since the nodes in this game tree are not independent, divide-and-conquer methods for computing sub-trees (such as the alpha-beta search algorithm) are not applicable. More detailed descriptions of imperfect information game tree structure are available elsewhere (e.g., [4]).

A strategy is a set of rules for choosing an action at every decision node of the game tree. In general, this will be a randomized mixed strategy, which is a probability distribution over the various alternatives. A player must use the same policy across all nodes in the same information set, since from that player’s perspective they are indistinguishable from each other (differing only in the hidden information component).

The conventional method for solving such a problem is to convert the descriptive representation, or extensive form, into a system of linear equations, which is then solved by a linear programming (LP) system such as the Simplex algorithm. The equilibrium solutions are computed simultaneously for all players, ensuring the best worst-case outcome for each player.

Traditionally, the conversion to normal form was accompanied by an exponential blow-up in the size of the problem, meaning that only very small problem instances could be solved in practice. Koller [5] described an alternate LP representation, called sequence form, which exploits the property of perfect recall (wherein all players know the preceding history of the game), to obtain a system of equations and unknowns that is only linear in the size of the game tree. This exponential reduction in representation has re-opened the possibility of using game-theoretic analysis for many domains. However, since the game tree itself can be very large, the LP solution method is still limited to moderate problem sizes (normally less than a billion nodes).
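To make the linear programming idea concrete, the following is a minimal sketch of computing an equilibrium mixed strategy for a small zero-sum game. It uses the generic normal-form construction with an off-the-shelf LP solver, not the sequence-form representation used for full-scale poker, and the payoff matrix is an invented toy example rather than a poker model.

```python
# A minimal sketch: solve a small zero-sum matrix game by linear programming.
# The payoff matrix in the demo is a made-up toy example, not a poker model.
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(payoffs):
    """Return (mixed strategy for the row player, game value).

    payoffs[i][j] is the row player's payoff when row i meets column j.
    Maximize v subject to: for every column j, sum_i x[i]*payoffs[i][j] >= v,
    with x a probability distribution over the rows.
    """
    A = np.asarray(payoffs, dtype=float)
    m, n = A.shape
    c = np.concatenate([np.zeros(m), [-1.0]])        # variables: x[0..m-1], v; minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])        # -A^T x + v <= 0, one row per column
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)   # sum(x) == 1
    b_eq = [1.0]
    bounds = [(0, 1)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[m]

if __name__ == "__main__":
    strategy, value = solve_zero_sum([[0, -1, 2], [2, 0, -1], [-1, 2, 0]])
    print(strategy, value)   # mixed equilibrium (1/3, 1/3, 1/3) with value ~1/3
```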

3.3 Texas Hold’em

A game (or hand) of Texas Hold’em consists of four stages, each followed by a round of betting:

1. Pre-flop: Each player is dealt two private cards face down (the hole cards).

2. Flop: Three community cards (shared by all players) are dealt face up.

3. Turn: A single community card is dealt face up.

4. River: A final community card is dealt face up.

After the betting, all active players reveal their hole cards for the showdown. The player with the best five-card poker hand formed from their two private cards and the five public cards wins all the money wagered (ties are possible). The game starts off with two forced bets (the blinds) put into the pot. When it is a player’s turn to act, they must either bet/raise (increase their investment in the pot), check/call (match what the opponent has bet or raised), or fold (quit and surrender all money contributed to the pot).
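The following is a minimal sketch of the deal structure just described, for illustration only; the card encoding and function name are our own choices, and no betting logic is included.

```python
# A minimal sketch of the deal structure described above (no betting logic).
import random

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]   # 52 cards, e.g. "Ah" = ace of hearts

def deal_holdem(num_players=2, rng=random):
    """Deal one game of Texas Hold'em: two private hole cards per player, then
    the board revealed in three stages (flop, turn, river)."""
    deck = DECK[:]
    rng.shuffle(deck)
    hole = [[deck.pop(), deck.pop()] for _ in range(num_players)]
    flop = [deck.pop() for _ in range(3)]
    turn = deck.pop()
    river = deck.pop()
    return hole, flop, turn, river

if __name__ == "__main__":
    hole, flop, turn, river = deal_holdem()
    print("hole cards:", hole, "board:", flop + [turn, river])
```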

The best-known non-commercial Texas Hold’em program is POKI. It has been playing online since 1997 and has earned an impressive winning record, albeit against generally weak opposition [3]. The system’s abilities are based on enumeration and simulation techniques, expert knowledge, and opponent modeling. The program’s weaknesses are easily exploited by strong players, especially in the 2-player game.

Figure 3.1: Branching factors for Hold’em and abstractions.

3.4 Abstractions

Texas Hold’em has an easily identifiable structure, alternating between chance nodes and betting rounds in four distinct stages. A high-level view of the imperfect information game tree is shown in Figure 3.1.

Hold’em can be reformulated to produce similar but much smaller games. The objective is to reduce the scale of the problem without severely altering the fundamental structure of the game, or the resulting equilibrium strategies. There are many ways of doing this, varying in the overall reduction and in the accuracy of the resulting approximation.

Some of the most accurate abstractions include suit equivalence isomorphisms (offering a reduction of at most a factor of 4! = 24), rank equivalence (only under certain conditions), and rank near-equivalence. The equilibrium solutions to these abstracted problems will either be exactly the same or will have a small bounded error, which we refer to as near-optimal solutions. Unfortunately, the abstractions that produce an exact or near-exact reformulation do not produce the very large reductions required to make full-scale poker tractable.

A common method for controlling the game size is deck reduction. Using less than the standard 52-card deck greatly reduces the branching factor at chance nodes. Other methods include reducing the number of cards in a player’s hand (e.g., from a 2-card hand to a 1-card hand), and reducing the number of board cards (e.g., a 1-card flop), as was done by Shi and Littman [11] for the game of Rhode Island Hold’em.3 Koller and Pfeffer [6] used such parameters to generate a wide variety of tractable games to solve with their Gala system.

We have used a number of small and intermediate sized games, ranging from eight cards (two suits, four ranks) to 24 cards (three suits, eight ranks) for the purpose of studying abstraction methods, comparing the results with known exact or near-optimal solutions. However, these smaller games are not suitable for use as an approximation for Texas Hold’em, as the underlying structures of the games are different. To produce good playing strategies for full-scale poker, we look for abstractions of the real game which do not alter that basic structure.

The abstraction techniques used in practice are powerful in terms of reducing the problem size, and subsume those previously mentioned. However, since they are also much cruder, we call their solutions pseudo-optimal, to emphasize that there is no guarantee that the resulting approximations will be accurate, or even reasonable. Some will be low-risk propositions, while others will require empirical testing to determine if they have merit.
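As an illustration of the suit equivalence isomorphism mentioned above, the sketch below maps any set of cards to a canonical key shared by all of its suit relabelings; the encoding is our own and is not taken from the thesis software.

```python
# A minimal sketch of a suit-equivalence isomorphism: two card sets that differ
# only by a permutation of the suit labels map to the same canonical key.
from itertools import permutations

RANKS = "23456789TJQKA"
SUITS = "cdhs"

def canonical_key(cards):
    """Return a key identical for all suit-isomorphic versions of `cards`
    (e.g. AhKh and AsKs map to the same key)."""
    best = None
    for perm in permutations(SUITS):                    # at most 4! = 24 relabelings
        relabel = dict(zip(SUITS, perm))
        key = tuple(sorted(card[0] + relabel[card[1]] for card in cards))
        if best is None or key < best:
            best = key
    return best

if __name__ == "__main__":
    assert canonical_key(["Ah", "Kh"]) == canonical_key(["As", "Ks"])   # suited AK
    assert canonical_key(["Ah", "Kd"]) != canonical_key(["As", "Ks"])   # offsuit differs
    print(canonical_key(["Ah", "Kh"]))
```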

3.4.1 Betting Round Reduction

The standard rules of Limit Hold’em allow for a maximum of four bets per player per round.4 Thus, in 2-player Limit poker there are 19 possible betting sequences, of which two do not occur in practice.5 Of the remaining 17 sequences, 8 end in a fold (leading to a terminal node in the game tree), and 9 end in a call (carrying forward

3 Recently, Gilpin and Sandholm introduced an automated abstraction technique called gameshrink, which was used to solve the game of Rhode Island Hold’em (see Chapter 6). 4 Some rules allow unlimited raises when only two players are involved. However, occasions with more than three legitimate raises are rare, and do not greatly alter an equilibrium strategy. 5 Technically, a player may fold even though there is no outstanding bet. This is logically dominated, and therefore does not occur in an equilibrium strategy, and is seldom seen in practice.

to the next chance node). Using k = check, b = bet, f = fold, c = call, r = raise, and capital letters for the second player, the tree of possible betting sequences for each round is:

kK kBf kBc kBrF kBrC kBrRf kBrRc kBrRrF kBrRrC bF bC bRf bRc bRrF bRrC bRrRf bRrRc

We call this local collection of decision nodes a betting tree, and represent it diagrammatically with a triangle (see Chapter 1 Figure 1.2). With betting round reduction, each player is allowed a maximum of three bets per round, thereby eliminating the last two sequences in each line. The effective branching factor of the betting tree is reduced from nine to seven. This does not appear to have a substantial effect on play, or on the expected value (EV) for each player. This observation has been verified experimentally. In contrast, we computed the corresponding post-flop models with a maximum of two bets per player per round, and found radical changes to the equilibrium strategies, strongly suggesting that that level of abstraction is not safe.
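The sequence counts quoted above can be reproduced mechanically. The sketch below is our own encoding of the 2-player Limit betting rules (lower case for the first player, upper case for the second), excluding the two dominated fold-without-a-bet lines; with a cap of four bets it yields the 17 sequences listed, and with a cap of three bets it yields the 13 sequences of the reduced betting tree.

```python
# A minimal sketch: enumerate the betting sequences of one 2-player Limit round.
def betting_sequences(max_bets=4, seq="", bets=0, first_to_act=True, facing_bet=False):
    """Yield every betting sequence for one round, excluding the dominated
    'fold when there is no outstanding bet' lines."""
    me = str.lower if first_to_act else str.upper
    if facing_bet:
        yield seq + me("f")                              # fold: terminal node
        yield seq + me("c")                              # call: round carries forward
        if bets < max_bets:                              # raise, if the cap allows
            yield from betting_sequences(max_bets, seq + me("r"), bets + 1,
                                         not first_to_act, True)
    else:
        checked = seq + me("k")
        if checked.lower().endswith("kk"):               # both players checked: round ends
            yield checked
        else:
            yield from betting_sequences(max_bets, checked, bets,
                                         not first_to_act, False)
        yield from betting_sequences(max_bets, seq + me("b"), bets + 1,
                                     not first_to_act, True)

if __name__ == "__main__":
    full = list(betting_sequences(max_bets=4))
    folds = sum(s[-1] in "fF" for s in full)
    print(len(full), folds, len(full) - folds)   # -> 17 sequences, 8 folds, 9 carry forward
    print(len(list(betting_sequences(max_bets=3))))   # -> 13 with betting round reduction
```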

3.4.2 Elimination of Betting Rounds

Large reductions in the size of a poker game tree can be obtained by elimination of betting rounds. There are several ways to do this, and they generally have a significant impact on the nature of the game. First, the game may be truncated, by eliminating the last round or rounds. In Hold’em, ignoring the last board card and the final betting round produces a 3-round model of the actual 4-round game. The solution to the 3-round model loses some of the subtlety involved in the true equilibrium strategy, but the degradation applies primarily to advanced tactics on the turn. There is a smaller effect on the flop strategy, and the strategy for the first betting round may have no significant changes, since it incorporates all the outcomes of two future betting rounds. We use this particular abstraction to define an appropriate strategy for play in the first round, and thus call it a pre-flop model (see Figure 3.2). The effect of truncation can be lessened through the use of expected value leaf nodes. Instead of ending the game abruptly and awarding the pot to the strongest

85 hand at that moment, we compute an average conclusion over all possible chance outcomes. For a 3-round model ending on the turn, we roll-out all 44 possible river cards, assuming no further betting (or alternately, assuming one bet per player for the last round). Each player is awarded a fraction of the pot, corresponding to the probability of winning the game. In a 2-round pre-flop model, we roll-out all 990 2-card combinations of the turn and river. The most extreme form of truncation results in a 1-round model, with no fore- sight of future betting rounds. Since each future round provides a refinement to the approximation, this will not reflect a correct strategy for the real game. In particu- lar, betting plans that extend over more than one round, such as deferring the raise of a very strong hand, are lost entirely. Nevertheless, even these simplistic models can be useful when combined with expected value leaf nodes. Alex Selby computed an equilibrium solution for the game of Pre-flop Hold’em, which consists of only the first betting round followed by an EV roll-out of the five board cards to determine the winner [10]. Although there are some serious limitations in the strategy based on this 1-round model, we have incorporated the

Selby pre-flop system into one of our programs, PSOPTI1, as described later in this section. In contrast to truncating rounds, we can bypass certain early stages of the game. We frequently use post-flop models, which ignore the pre-flop betting round, and use a single fixed flop of three cards (see Figure 3.2). It is natural to consider the idea of independent betting rounds, where each phase of the game is treated in isolation. Unfortunately, the betting history from previous rounds will almost always contain contextual information that is critical to making appropriate decisions. The probability distribution over the hands for each player is strongly dependent on the path that led to that decision point, so it cannot be ignored without risking a considerable loss of information. However, the naive independence assumption can be viable in certain circumstances, and we do implicitly use it in the design of PSOPTI1 to bridge the gap between the 1-round pre-flop model and the 3-round post-flop model. Another possible abstraction we explored was merging two or more rounds into

a single round, such as creating a combined 2-card turn/river. However, it is not clear what the appropriate bet size should be for this composite round. In any case, the solutions for these models (over a full range of possible bet sizes) all turned out to be substantially different from their 3-round counterparts, and the method was therefore rejected.
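A minimal sketch of the expected value leaf nodes described in this section is given below. The 7-card evaluator used here is a deliberately crude toy stand-in (it ignores straights and flushes) so that the example is self-contained; it is not the evaluator used in the actual programs.

```python
# A minimal sketch of an EV leaf node: instead of awarding the whole pot to the
# current leader, roll out every remaining board card and award each player a
# fraction of the pot equal to their probability of winning.
from collections import Counter
from itertools import combinations

RANK_ORDER = "23456789TJQKA"

def toy_rank(hole, board):
    """Toy stand-in for a real 7-card evaluator: scores only pairs/trips/quads
    and high cards (no straights or flushes).  For illustration only."""
    ranks = [RANK_ORDER.index(c[0]) for c in hole + board]
    return sorted(((cnt, r) for r, cnt in Counter(ranks).items()), reverse=True)

def ev_leaf_value(hole_p1, hole_p2, board, deck, pot, cards_to_come=1, rank=toy_rank):
    """Player 1's share of `pot` at a truncated leaf, averaged over every way the
    remaining board cards can fall (44 rivers on the turn, 990 turn/river pairs
    in a 2-round pre-flop model), assuming no further betting."""
    total, outcomes = 0.0, 0
    for extra in combinations(deck, cards_to_come):
        full_board = board + list(extra)
        r1, r2 = rank(hole_p1, full_board), rank(hole_p2, full_board)
        total += 1.0 if r1 > r2 else 0.5 if r1 == r2 else 0.0
        outcomes += 1
    return pot * total / outcomes

if __name__ == "__main__":
    deck = [r + s for r in RANK_ORDER for s in "cdhs"]
    hole1, hole2, board = ["Ah", "Kd"], ["Qc", "Qs"], ["2c", "7d", "Th", "Jc"]
    for c in hole1 + hole2 + board:
        deck.remove(c)
    print(ev_leaf_value(hole1, hole2, board, deck, pot=8.0))   # averages 44 rivers
```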

3.4.3 Composition of Pre-flop and Post-flop models

Although the nodes of an imperfect information game tree are not independent in general, some decomposition is possible. For example, the sub-trees resulting from different pre-flop betting sequences can no longer have nodes that belong to the same information set.6 The sub-trees for our post-flop models can be computed in isolation, provided that the appropriate preconditions are given as input. Unfortunately, knowing the correct conditional probabilities would normally entail solving the whole game, so there would be no advantage to the decomposition. For simple post-flop models, we dispense with the prior probabilities. For the

post-flop models used in PSOPTI0 and PSOPTI1, we simply ignore the implications of the pre-flop betting actions, and assume a uniform distribution over all possible hands for each player. Different post-flop solutions were computed for initial pot sizes of two, four, six, and eight bets (corresponding to pre-flop sequences with zero, one, two, or three raises, but ignoring which player initially made each raise).

In PSOPTI1, the four post-flop solutions are simply appended to the Selby pre-flop strategy (Figure 3.2). Although these simplifying assumptions are technically wrong, the resulting play is still surprisingly effective. A better way to compose post-flop models is to estimate a set of conditional probabilities, using the solution to a pre-flop model. With a tractable pre-flop model, we have a means of estimating an appropriate strategy at the root, and thereby determine the consequent probability distributions.
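The PSOPTI1-style composition just described amounts to selecting one of the four post-flop solutions by the number of pre-flop raises, regardless of who made them. A minimal sketch, with our own encoding of raise actions as 'r'/'R' and placeholder strategy tables:

```python
# A minimal sketch of selecting a post-flop solution by pre-flop raise count.
# `postflop_solutions` is a placeholder list of four independently solved tables.
def select_postflop_model(preflop_sequence, postflop_solutions):
    """Pick the post-flop table solved for the matching initial pot size.
    With 0..3 pre-flop raises the pot holds 2, 4, 6 or 8 small bets; which
    player made each raise is deliberately ignored."""
    raises = min(sum(1 for a in preflop_sequence if a in "rR"), 3)
    return postflop_solutions[raises]

# Example: "rRc" has two raises, so the table solved for a 6-small-bet pot is used.
tables = ["pot=2 table", "pot=4 table", "pot=6 table", "pot=8 table"]
print(select_postflop_model("rRc", tables))
```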

In PSOPTI2, a 3-round pre-flop model was designed and solved. The resulting

6 To see this, each decision node of the game tree can be labeled with all the cards known to that player, and the full path that led to that node. Nodes with identical labels differ only in the hidden information, and are therefore in the same information set. Since the betting history is different for these sub-trees, none of the nodes are inter-dependent.

Figure 3.2: Composition of PSOPTI1 and PSOPTI2.

pseudo-optimal strategy for the pre-flop (which was significantly different from the Selby strategy) was used to determine the corresponding distribution of hands for each player in each context. This provided the necessary input parameters for each of the seven pre-flop betting sequences that carry over to the flop stage. Since each of these post-flop models has been given (an approximation of) the perfect recall knowledge of the full game, they are fully compatible with each other, and are properly integrated under the umbrella of the pre-flop model (Figure 3.2). In theory, this should be equivalent to computing the much larger tree, but it is limited by the accuracy and appropriateness of the proposed pre-flop betting model.7

7 Actually, this is only true if there is a unique equilibrium strategy, whereas most interesting games have an infinite number of equilibria. This caveat has serious implications on the theoretical and practical value of this approach. Please see Chapter Endnote 3.7.1 for more information on this important limitation.
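A minimal sketch of how a solved pre-flop model could supply the conditional input distributions for the post-flop models is given below. It assumes a `strategy` lookup from (hand, betting so far) to action probabilities, ignores card-removal effects between the two players, and uses placeholder data structures rather than the actual PSOPTI2 representation.

```python
# A minimal sketch: Bayesian update of one player's hand distribution from the
# observed pre-flop betting, using a solved pre-flop strategy as the model.
def hand_distribution(strategy, hands, preflop_sequence, player, prior=None):
    """P(hand | observed pre-flop betting) for the given player.

    Walks the betting sequence, multiplying in the probability of each action
    the player took under `strategy`, then normalizes.  In heads-up play,
    player 0 acts on even positions of the sequence and player 1 on odd ones."""
    prior = prior or {h: 1.0 / len(hands) for h in hands}
    weights = {}
    for hand in hands:
        w = prior[hand]
        for i, action in enumerate(preflop_sequence):
            if i % 2 == player:                      # only this player's decisions matter
                w *= strategy[(hand, preflop_sequence[:i])].get(action, 0.0)
        weights[hand] = w
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()} if total else prior
```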

3.4.4 Abstraction by Bucketing

The most important method of abstraction for the computation of our pseudo-optimal strategies is called bucketing. This is an extension of the natural and intuitive concept that has been applied many times in previous research (e.g., [12, 13, 11]). The set of all possible hands is partitioned into equivalence classes (also called buckets or bins). A many-to-one mapping function determines which hands will be grouped together. Ideally, the hands should be grouped according to strategic similarity, meaning that they can all be played in a similar manner without much loss in EV. If every hand was played with a particular pure strategy (i.e., only one of the available choices), then a perfect mapping function would group all hands that follow the same plan, and 17 equivalence classes for each player would be sufficient for each betting round. However, since a mixed strategy may be indicated for equilibrium play in some cases, we would like to group hands that have a similar probability distribution over action plans.

One obvious but rather crude bucketing function is to group all hands according to strength (i.e., its rank with respect to all possible hands, or the probability of currently being in the lead). This can be improved by considering the roll-out of all future cards, giving an (unweighted) estimate of the chance of winning the game. However, this is only a one-dimensional view of hand types, in what can be considered to be an N-dimensional space of strategies, with a vast number of different ways to classify them. A superior practical method would be to project the set of all hands onto a two-dimensional space consisting of (roll-out) hand strength and hand potential (similar to the hand assessment used in POKI [3]). Clusters in the resulting scattergram suggest reasonable groups of hands to be treated similarly.

We eventually settled on a simple compromise. With n available buckets, we allocate n − 1 to roll-out hand strength. The number of hand types in each class is not uniform; the classes for the strongest hands are smaller than those for mediocre and weak hands, allowing for better discrimination of the smaller fractions of hands that should be raised or re-raised.

One special bucket is designated for hands that are low in strength but have high potential, such as good draws to a flush or straight. This plays an important role in identifying good hands to use for bluffing (known as semi-bluffs [12]). Comparing post-flop solutions that use six strength buckets to solutions with five strength plus one high-potential bucket, we see that most bluffs in the latter are taken from the special bucket, which is sometimes played in the same way as the strongest bucket. This confirmed our expectations that the high-potential bucket would improve the selection of hands for various betting tactics, and increase the overall EV.

The number of buckets that can be used in conjunction with a 3-round model is very small, typically six or seven for each player (i.e., 36 or 49 pairs of bucket assignments). Obviously this results in a very coarse-grained abstract game, but it may not be substantially different from the number of distinctions an average human player might make. Regardless, it is the best we can currently do given the computational constraints of this approach.

The final requirement to sever the abstract game from the underlying real game tree is the transition probabilities. The chance node between the flop and turn represents a particular card being dealt from the remaining stock of 45 cards. In the abstract game, there are no cards, only buckets. The effect of the turn card in the abstract game is to dictate the probability of moving from one pair of buckets on the flop to any pair of buckets on the turn. Thus, the collection of chance nodes in the game tree is represented by an (n × n) to (n × n) transition network as shown in Figure 3.3. For post-flop models, this can be estimated by walking the entire tree, enumerating all transitions for a small number of characteristic flops. For pre-flop models, the full enumeration is more expensive (encompassing all {48 choose 3} = 17296 possible flops), so it is estimated either by sampling, or by (parallel) enumeration of a truncated tree.

For a 3-round post-flop model, we can comfortably solve abstract games with up to seven buckets for each player in each round. Changing the distribution of buckets, such as six for the flop, seven for the turn, and eight for the river, does not appear to significantly affect the quality of the solutions, better or worse.

The final linear programming solution produces a large table of mixed strategies (probabilities for fold, call, or raise) for every reachable scenario in the abstract game. To use this, the poker-playing program looks for the corresponding situation based on the same hand strength and potential measures, and randomly selects an action from the mixed strategy.

Figure 3.3: Transition probabilities (six buckets per player).

The large LP computations typically take less than a day (using CPLEX with the barrier method), and use up to two Gigabytes of RAM. Larger problems will exceed available memory, which is common for large LP systems. Certain LP techniques such as constraint generation could potentially extend the range of solvable instances considerably, but this would probably only allow the use of one or two additional buckets per player.
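Using the strategy table at play time then reduces to a weighted random choice, as in the sketch below; the table keys and the example entry are placeholders for illustration.

```python
# A minimal sketch of play-time use of the solved strategy table: map the current
# situation to its abstract coordinates and draw an action from the stored mix.
import random

def choose_action(strategy_table, betting_sequence, bucket_sequence, rng=random):
    """Draw fold/call/raise from the mixed strategy stored for this situation."""
    probs = strategy_table[(betting_sequence, tuple(bucket_sequence))]
    actions, weights = zip(*probs.items())
    return rng.choices(actions, weights=weights, k=1)[0]

# Example with a made-up entry: in this situation we call 60% and raise 40%.
table = {("kB", (3, 5)): {"fold": 0.0, "call": 0.6, "raise": 0.4}}
print(choose_action(table, "kB", [3, 5]))
```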

3.5 Experiments

3.5.1 Testing Against Computer Players

A series of matches between computer programs was conducted, with the results shown in Table 3.1. Win rates are measured in small bets per game (sb/g). Each match was run for at least 20,000 games (and over 100,000 games in some cases). The variance per game depends greatly on the styles of the two players involved, but is typically +/- 6 sb. The standard deviation for each match outcome is not shown, but is normally less than +/- 0.03 sb/g. The “bot players” were:

PSOPTI2, composed of a hand-crafted pre-flop model, which was solved by linear programming to provide conditional probability distributions to each of seven 3-round post-flop models (Figure 3.2). All models in this prototype used six buckets per player per round.

Program             1        2        3        4        5        6        7        8
1 PsOpti1                 +0.090   +0.091   +0.251   +0.156   +0.047   +0.546   +0.635
2 PsOpti2        -0.090            +0.069   +0.118   +0.054   +0.045   +0.505   +0.319
3 PsOpti0        -0.091   -0.069            +0.163   +0.135   +0.001   +0.418   +0.118
4 Aadapti        -0.251   -0.118   -0.163            +0.178   +0.550   +0.905   +2.615
5 Anti-Poki      -0.156   -0.054   -0.135   -0.178            +0.385   +0.143   +0.541
6 Poki           -0.047   -0.045   -0.001   -0.550   -0.385            +0.537   +2.285
7 Always Call    -0.546   -0.505   -0.418   -0.905   -0.143   -0.537            =0.000
8 Always Raise   -0.635   -0.319   -0.118   -2.615   -0.541   -2.285   =0.000

Table 3.1: Computer vs computer matches (small bets per game).


PSOPTI1, composed of four 3-round post-flop models under the naive uniform distribution assumption, with seven buckets per player per round. Selby’s equilibrium solution for Pre-flop Hold’em is used to play the pre-flop [10].

PSOPTI0, composed of a single 3-round post-flop model, wrongly assuming uniform distributions and an initial pot size of two bets, with seven buckets per player per round. This program used an always-call policy for the pre-flop betting round.

POKI, the University of Alberta poker program. This older version of POKI was not designed to play the 2-player game, and can be defeated rather easily, but is a useful benchmark.

ANTI-POKI, a rule-based program designed to beat POKI by exploiting its weaknesses and vulnerabilities in the 2-player game. Any specific counter-strategy can be even more vulnerable to exploitation by adaptive players.

AADAPTI, a relatively simple adaptive player, capable of slowly learning and exploiting persistent patterns in play.

ALWAYS CALL, a very weak benchmark strategy.

ALWAYS RAISE, a very weak benchmark strategy.

It is important to understand that a game-theoretic equilibrium player is, in principle, not designed to win. Its purpose is to not lose. An implicit assumption is that the opponent is also playing an equilibrium, and nothing can be gained by observing

92 the opponent for patterns or weaknesses. In a simple game like RoShamBo (also known as Rock-Paper-Scissors), playing the equilibrium strategy ensures a break-even result, regardless of what the oppo- nent does, and is therefore insufficient to defeat weak opponents, or to win a tour- nament ([2, 1]). Poker is more complex, and in theory an equilibrium player can win, but only if the opponent makes dominated errors. Any time a player makes any choice that is part of a randomized mixed strategy of any game-theoretic equi- librium policy, that decision is not dominated. In other words, it is possible to play in a highly sub-optimal manner, but still break even against an equilibrium player, because those choices are not strictly dominated. Since the pseudo-optimal strategies do no opponent modeling, there is no guar- antee that they will be especially effective against very bad or highly predictable players. They must rely only on these fundamental strategic errors, and the margin of victory might be relatively modest as a result. The critical question is whether such errors are common in practice. There is no definitive answer to this question yet, but preliminary evidence suggests that dom- inated errors occur often enough to gain a measurable EV advantage over weaker players, but may not be very common in the play of very good players.

The first tests of the pseudo-optimal solutions were done with PSOPTI0 playing Post-flop Hold’em, where both players agree to simply call in the pre-flop (thereby matching the exact pre-conditions for the post-flop solution). In those preliminary tests, the author played more than 2000 games, and was unable to defeat the pseudo-

optimal strategy. In contrast, POKI had been beaten consistently at a rate of over 0.8 sb/g (which is more than would be lost by simply folding every hand).

Using the same no-bet pre-flop policy, PSOPTI0 was able to defeat POKI at a rate of +0.144 sb/g (compared to +0.001 sb/g for the full game including pre-flop), and defeated AADAPTI at +0.410 sb/g (compared to +0.163 sb/g for the full game). All of the pseudo-optimal players play substantially better than any previously

existing computer programs. Even PSOPTI0, which is not designed to play the full game, earns enough from the post-flop betting rounds to offset the EV losses from the pre-flop round (where it never raises good hands, nor folds bad ones).

Player            Games   Posn 1   Posn 2     sb/g
Master-1 early     1147   -0.324   +0.360   +0.017
Master-1 late      2880   -0.054   +0.396   +0.170
Experienced-1       803   +0.175   +0.002   +0.088
Experienced-2      1001   -0.166   -0.168   -0.167
Experienced-3      1378   +0.119   -0.016   +0.052
Experienced-4      1086   +0.042   -0.039   +0.002
Intermediate-1     2448   +0.031   +0.203   +0.117
Novice-1           1277   -0.159   -0.154   -0.156
All Opponents     15125                     -0.015

Table 3.2: Human vs PSOPTI2 matches.

It is suspicious that PSOPTI1 outperformed PSOPTI2, which in principle should be a better approximation. Subsequent analysis of the play of PSOPTI2 revealed some programming errors, and also suggested that the bucket assignments for the pre-flop model were flawed. This may have resulted in an inaccurate pseudo-optimal pre-flop strategy, and consequent imbalances in the prior distributions used as input for the post-flop models. We expect that this will be rectified in future versions, and that the PSOPTI2 design will surpass PSOPTI1 in performance.8

3.5.2 Testing Against Human Players

While these results are encouraging, none of the non-pseudo-optimal computer opponents are better than intermediate strength at 2-player Texas Hold’em. Therefore, matches were conducted against human opponents. More than 100 participants volunteered to play against the pseudo-optimal players on our public web applet (www.cs.ualberta.ca/˜games/poker/), including many experienced players, a few masters, and one world-class player. The programs provided some fun opposition, and ended up with a winning record overall. The results are summarized in Table 3.2 and Table 3.3.9

8 Although the overlaid architecture did overtake PSOPTI1 in later incarnations, the side-effects of having a specific pre-flop model are a serious impediment to top-level play, as discussed in Chapter Endnote 3.7.1. 9 Note that the overall averages in these two tables should not be compared directly to each other, as they are based on a different mixture of opponents and different circumstances. For example, some players may have played against PSOPTI2 only after gaining experience against PSOPTI1, thus playing better.

Player            Games   Posn 1   Posn 2     sb/g
thecount           7030   -0.006   +0.103   +0.048
Master-1           2872   +0.141   +0.314   +0.228
Master-2            569   -0.007   +0.035   +0.014
Master-3            425   +0.047   +0.373   +0.209
Experienced-1      4078   -0.058   +0.164   +0.053
Experienced-2       511   +0.152   +0.369   +0.260
Experienced-3      2303   -0.252   +0.128   -0.062
Experienced-5       610   -0.250   -0.229   -0.239
Intermediate-1    16288   -0.145   +0.048   -0.049
Intermediate-2      478   -0.182   +0.402   +0.110
Novice-1           5045   -0.222   -0.010   -0.116
Novice-2            485   -0.255   -0.139   -0.197
Novice-3           1017   -0.369   -0.051   -0.210
Novice-4            583   -0.053   -0.384   -0.219
Novice-5            425   -0.571   -0.296   -0.433
All Opponents     46479                     -0.057

Table 3.3: Human vs PSOPTI1 matches.

(Master-1 is author Darse Billings, Experienced-1 is Aaron Davidson).

In most cases, the relatively short length of the match leaves a high degree of uncertainty in the outcome, limiting how much can be safely concluded. Nevertheless, some players did appear to have a definite edge, while others were clearly losing.

A number of interesting observations were made over the course of these games. It was obvious that most people had a lot of difficulty learning and adjusting to the computer’s style of play. In poker, knowing the basic approach of the opponent is essential, since it will dictate how to properly handle many situations that arise. Some players wrongly attributed intelligence where none was present. After losing a 1000-game match, one experienced player commented “the bot has me figured out now”, indicating that its opponent model was accurate, when in fact the pseudo-optimal player is oblivious and does no modeling at all.

It was also evident that these programs do considerably better in practice than might be expected, due to the emotional frailty of their human opponents. Many players commented that playing against the pseudo-optimal opponent was an exasperating experience. The bot routinely makes unconventional plays that confuse

95 and confound humans. Invariably, some of these “bizarre” plays happen to coincide with a lucky escape, and several of these bad beats in quick succession will often cause strong emotional reactions (sometimes referred to as “going on tilt”). The level of play generally goes down sharply in these circumstances. This suggests that a perfect game-theoretic equilibrium poker player could per- haps beat even the best humans in the long run, because any EV lost in moments of weakness would never be regained. However, the win rate for such a program could still be quite small, giving it only a slight advantage. Thus, it would be unable to ex- ert its superiority convincingly over the short term, such as the few hundred games of one session, or over the course of a world championship tournament. Since even the best human players are known to have biases and weaknesses, opponent mod- eling will almost certainly be necessary to produce a program that surpasses all human players.

3.5.3 Testing Against a World-class Player

The elite poker expert was Gautam Rao, who is known as “thecount” or “CountDracula” in the world of popular online poker rooms. Mr. Rao is the #1 all-time winner in the history of the oldest online game, by an enormous margin over all other players, both in total earnings and in dollar-per-game rate. His particular specialty is in short-handed games with five or fewer players. He is recognized as one of the best players in the world in these games, and is also exceptional at 2-player Hold’em. Like many top-flight players, he has a dynamic ultra-aggressive style.

Mr. Rao agreed to play an exhibition match against PSOPTI1, playing more than 7000 games over the course of several days. The graph in Figure 3.4 shows the progression of the match.

The pseudo-optimal player started with some good fortune, but lost at a rate of about −0.2 sb/g over the next 2000 games. Then, there was a sudden reversal, following a series of fortuitous outcomes for the program. Although “thecount” is renowned for his mental toughness, an uncommon run of bad luck can be very frustrating even for the most experienced players. Mr. Rao believes he played below his best level during that stage, which contributed to a dramatic drop where he lost

[Plot: cumulative small bets won by “thecount” (final rate +0.046 sb/g) versus hands played, 0 to 7000.]

Figure 3.4: Progress of the “thecount” vs PSOPTI1.

300 sb in less than 400 games. Mr. Rao resumed play the following day, but was unable to recover the losses, slipping further to −200 sb after 3700 games. At this point, he stopped play and did a careful reassessment. It was clear that his normal style for maximizing income against typical human opponents was not effective against the pseudo-optimal player. Whereas human players would normally succumb to a lot of pressure from aggressive betting, the bot was willing to call all the way to the showdown with as little as a Jack or Queen high card. That kind of play would be folly against most opponents, but is appropriate against an extremely aggressive opponent. Most human players fail to make the necessary adjustment under these atypical conditions, but the program has no sense of fear.

Mr. Rao changed his approach to be less aggressive, with immediate rewards, as shown by the +600 sb increase over the next 1100 games (some of which he credited to a good run of cards). Mr. Rao was able to utilize his knowledge that the computer player did not do any opponent modeling. Knowing this allows a human player to systematically probe for weaknesses, without any fear of being punished for playing

97 in a methodical and highly predictable manner, since an oblivious opponent does not exploit those patterns and biases. Although he enjoyed much more success in the match from that point forward, there were still some “adventures”, such as the sharp decline at 5400 games. Poker is a game of very high variance, especially between two opponents with sharp styles, as can be seen by the dramatic swings over the course of this match. Al- though 7000 games may seem like a lot, Mr. Rao’s victory in this match was still not statistically conclusive. We now believe that a human poker master can eventually gain a sizable ad- vantage over these pseudo-optimal prototypes (perhaps +0.20 sb/g or more is sus- tainable). However, it requires a good understanding of the design of the program and its resulting weaknesses. That knowledge is difficult to learn during normal play, due to the good information hiding provided by an appropriate mixture of plans and tactics. This “cloud of confusion” is a natural barrier to opponent learn- ing. It would be even more difficult to learn against an adaptive program with good opponent modeling, since any methodical testing by the human would be easily exploited. This is in stark contrast to typical human opponents, who can often be accurately modeled after only a small number of games.

3.6 Conclusions and Future Work

The pseudo-optimal players presented in this paper are the first complete approximations of a game-theoretic equilibrium strategy for a full-scale variation of real poker.

Several abstraction techniques were explored, resulting in the reasonably accurate representation of a large imperfect information game tree having Θ(10^18) nodes with a small collection of models of size Θ(10^7). Despite these massive reductions and simplifications, the resulting programs play respectably. For the first time ever, computer programs are not completely outclassed by strong human opposition in the game of 2-player Texas Hold’em.

Useful abstractions included betting tree reductions, truncation of betting rounds combined with EV leaf nodes, and bypassing betting rounds. A 3-round model anchored at the root provided a pseudo-optimal strategy for the pre-flop round, which in turn provided the proper contextual information needed to determine conditional probabilities for post-flop models. The most powerful abstractions for reducing the problem size were based on bucketing, a method for partitioning all possible holdings according to strategic similarity. Although these methods exploit the particular structure of the Texas Hold’em game tree, the principles are general enough to be applied to a wide variety of imperfect information domains.

Many refinements and improvements will be made to the basic techniques in the coming months. Further testing will also continue, since accurate assessment in a high variance domain is always difficult.

The next stage of the research will be to apply these techniques to obtain approximations of Nash equilibria for N-player Texas Hold’em. This promises to be a challenging extension, since multi-player games have many properties that do not exist in the 2-player game.

Finally, having reasonable approximations of equilibrium strategies does not lessen the importance of good opponent modeling. Learning against an adaptive adversary in a stochastic game is a challenging problem, and there will be many ideas to explore in combining the two different forms of information. That will likely be the key difference between a program that can compete with the best, and a program that surpasses all human players. Quoting “thecount”:

“You have a very strong program. Once you add opponent modeling to it, it will kill everyone.”

Acknowledgments

The authors would like to thank Gautam Rao, Sridhar Mutyala, and the other poker players for donating their valuable time. We also wish to thank Daphne Koller, Michael Littman, Matthew Ginsberg, Rich Sutton, David McAllester, Mason Malmuth, and David Sklansky for their valuable insights in past discussions.

This research was supported in part by grants from the Natural Sciences and Engineering Research Council of Canada (NSERC), the Alberta Informatics Circle of Research Excellence (iCORE), and an Izaak Walton Killam Memorial postgraduate scholarship.

3.7 Chapter 3 Endnotes

3.7.1 Overlaying Models

Following the original publication of this paper, a serious flaw was discovered with the proposition of overlaying a 3-round pre-flop model with a 3-round post-flop model, as in the architecture of PSOPTI2. Although this technique can produce systems that play reasonably well in practice, it is not well-grounded in theory. In general there are an infinite number of possible equilibrium solutions to a complex game. This is certainly true in Texas Hold’em poker, which gives rise to the wide variety of playing styles that are completely viable. The equilibrium solution to any one pre-flop model may be inappropriate for use as a model of any specific opponent, or as a generic opponent model. Combining a pre-flop equilibrium strategy with a post-flop equilibrium strategy is not guaranteed to produce an equilibrium strategy to the whole game, because the two strategies may not be consistent with each other.10 In general, there is no reason to assume that two independently derived approximations will be harmonious with each other. Moreover, using any particular pre-flop strategy as a rigid model of one specific opponent can lead to seriously incorrect beliefs about the distribution of possible hands they can hold after the flop. This inconsistency creates a noticeable “schism” or “gap” between the pre-flop and post-flop play in the pseudo-optimal players. For example, the program may re- raise in the pre-flop, building a large pot, but then fold on the flop to a single small bet. This is almost always incorrect, even if the flop was of no help whatsoever. In practice, the play in the pre-flop often provides key clues that make it easier to accurately deduce the computer player’s range of hands later in the game.

10 This observation was formally proven by Neil Burch, using much simpler imperfect informa- tion domains to generate counterexamples.

100 In our experience with both game-theoretic and adaptive poker algorithms, it has repeatedly been demonstrated that assuming one particular model of the opponent’s play can be far worse than having no model at all. That is, the naive assumption of a uniform distribution over opponent holdings is often safer and superior to having a plausible but counter-indicated model of the opponent.

3.7.2 Reverse-Mapping Strategies

Once an equilibrium strategy for an abstracted game has been obtained, there are many ways it can be used to play the real game. The most obvious is to simply do the reverse mapping of the abstraction to translate the strategy back onto the full game. However, numerous embellishments are worth consideration. One method in particular, called bucket splitting, was implemented but was not discussed in the original paper, due to space limitations.

Suppose that for a particular context we have a bucket 4 hand, and the equilibrium strategy gives us a mixed strategy of {0.00, 0.60, 0.40} for fold, call, and raise, respectively. The straightforward way to use this information is to spin a spinner (i.e., generate a random number in the range 0.0–1.0) and play the corresponding action. However, this treats all hands in the bucket identically. That was a necessary assumption for accomplishing the coarse-granularity abstraction, but when the time comes to choose an action for the real game, it might be advantageous to distinguish between hands within that wide bucket range.

In simple bucket splitting, we distinguish between the top half and the bottom half of the hands in the bucket, and then bias our actions in a way that should produce better results. In the given example, we might call 100% of the time with hands from the bottom half, and raise 80% of the time with hands from the top half. Thus, we maintain the same recommended game-theoretic frequency (ratio) of actions overall, but we improve our hand selection for those designated actions.

More generally, we can split the class into as many subdivisions as we like, up to the total number of distinct hands in the class. We call this more general process bucket splintering. In the extreme case, almost all hands in the class can be assigned a pure strategy, while the mixed strategy is still maintained for the class as a whole.

Technically, aligning the actions with an ordering of hands by strength could violate the equilibrium, and could potentially leak information. However, in practice the differences might be almost impossible to detect. Ironically, we can take advantage of the stochastic noise and partial observability of the game (which are usually impediments to us), to provide more than adequate obfuscation of our slightly biased actions. Moreover, in view of the crudeness of the entire approach, we can only claim that pseudo-optimal playing strategies are “in the spirit” of equilibrium strategies. Going to any length to preserve the integrity of the abstracted mixed strategies would almost certainly be needlessly pedantic.
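A minimal sketch of the simple bucket splitting described above is given below. It reproduces the worked example (call 100% in the bottom half, call 20% / raise 80% in the top half) and generalizes to more subdivisions, under the assumption that actions are ordered from least to most aggressive.

```python
# A minimal sketch of bucket splitting: divide one bucket's mixed strategy over
# equal sub-ranges of hand strength while keeping the aggregate frequencies fixed.
def split_bucket(mixed_strategy, splits=2, order=("fold", "call", "raise")):
    """Return one (possibly mixed) strategy per sub-range, weakest hands first,
    biasing stronger hands toward the more aggressive actions."""
    # Lay the actions out on [0, 1] in order of increasing aggression.
    edges, start = [], 0.0
    for action in order:
        p = mixed_strategy.get(action, 0.0)
        edges.append((action, start, start + p))
        start += p
    result, width = [], 1.0 / splits
    for i in range(splits):
        lo, hi = i * width, (i + 1) * width
        sub = {}
        for action, a, b in edges:
            overlap = max(0.0, min(hi, b) - max(lo, a))
            if overlap > 0:
                sub[action] = overlap / width
        result.append(sub)
    return result

# The example from the text: {fold 0.00, call 0.60, raise 0.40} split in two gives
# call 100% for the bottom half, and call 20% / raise 80% for the top half.
halves = split_bucket({"fold": 0.0, "call": 0.6, "raise": 0.4}, splits=2)
print([{a: round(p, 2) for a, p in s.items()} for s in halves])
```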

Bibliography
[1] D. Billings. The First International RoShamBo Programming Competition. The International Computer Games Association Journal, 23(1):42–50, 2000.
[2] D. Billings. Thoughts on RoShamBo. The International Computer Games Association Journal, 23(1):3–8, 2000.
[3] D. Billings, A. Davidson, J. Schaeffer, and D. Szafron. The challenge of poker. Artificial Intelligence, 134(1–2):201–240, January 2002.
[4] D. Koller and N. Megiddo. The complexity of two-person zero-sum games in extensive form. Games and Economic Behavior, 4(4):528–552, 1992.
[5] D. Koller, N. Megiddo, and B. von Stengel. Fast algorithms for finding randomized strategies in game trees. In Annual ACM Symposium on Theory of Computing, STOC’94, pages 750–759, 1994.
[6] D. Koller and A. Pfeffer. Representations and solutions for game-theoretic problems. Artificial Intelligence, 94(1):167–215, 1997.
[7] H. W. Kuhn. A simplified two-person poker. In H. W. Kuhn and A. W. Tucker, editors, Contributions to the Theory of Games, volume 1, pages 97–103. Princeton University Press, 1950.
[8] J. F. Nash. Equilibrium points in N-person games. Proceedings of the National Academy of Sciences, 36:48–49, 1950.
[9] M. Sakaguchi and S. Sakai. Solutions of some three-person stud and draw poker. Mathematica Japonica, 6(37):1147–1160, 1992.
[10] A. Selby. Optimal heads-up pre-flop hold’em. WWW, 1999. http://www.archduke.demon.co.uk/simplex/.
[11] J. Shi and M. Littman. Abstraction methods for game theoretic poker. In T. A. Marsland and I. Frank, editors, Computers and Games 2000, LNCS 2063, pages 333–345. Springer-Verlag, 2002.
[12] D. Sklansky and M. Malmuth. Hold’em Poker for Advanced Players. Two Plus Two Publishing, 2nd edition, 1994.
[13] K. Takusagawa. Nash equilibrium of Texas Hold’em poker, 2000. Undergraduate thesis, Computer Science, Stanford University.
[14] J. von Neumann and O. Morgenstern. The Theory of Games and Economic Behavior. Princeton University Press, 1944.

Chapter 4

Adaptive Imperfect Information Game-Tree Search (2004)

Game-Tree Search with Adaptation in Stochastic Imperfect Information Games 1

4.1 Introduction

Modeling the preferences and biases of users is an important topic in recent AI research. For example, this type of information can be used to anticipate user program commands [14], predict web buying and access patterns [8], and automatically generate customized interfaces [23]. It is usually easy to gather a corpus of data on a user (e.g., web page accesses), but mining that data to predict future patterns (e.g., the next web page request) is challenging. Predicting human strategies in a competitive environment is even more challenging.

The game of poker has become a popular domain for exploring challenging AI problems. This has led to the development of programs that are competitive with strong human players. The current best program, PSOPTI (also known as SPARBOT), is based on approximating a game-theoretic Nash equilibrium solution [4]. However, there remains a significant obstacle to overcome before programs can

1 The contents of this chapter originally appeared in the proceedings of Computers and Games 2004. Copyright 2004 Springer-Verlag GmbH. All rights reserved. D. Billings, A. Davidson, T. Schauenberg, N. Burch, M. Bowling, R. Holte, J. Schaeffer, and D. Szafron. Game-tree search with adaptation in stochastic imperfect-information games. In H. J. van den Herik, Y. Bjornsson,¨ and N. Netanyahu, editors, Computers and Games: 4th International Conference, CG’04, volume 3846 of Lecture Notes in Computer Science, pages 21–34. Springer-Verlag GmbH, 2004.

play at a world-class level: opponent modeling. As the world-class poker player who played against PSOPTI insightfully recognized [4]:

“You have a very strong program. Once you add opponent modeling to it, it will kill everyone.”

This issue has been studied in two-player perfect information games (e.g., [16, 10, 15]), but has not played a significant role in developing strong programs. In poker, however, opponent modeling is a critical facet of strong play. Since a player has imperfect information (does not know the opponent’s cards), any information that a player can glean from the opponent’s past history of play can be used to improve the quality of future decisions. Skillful opponent modeling is often the differentiating factor among world-class players. Opponent modeling is a challenging learning problem, and there have been sev- eral attempts to apply standard machine learning techniques to poker. Recent ef- forts include: neural nets [5], reinforcement learning [11], and Bayesian nets [18], which have had only limited success. There are a number of key issues that make this problem difficult:

1. Learning must be rapid (within 100 games, preferably fewer). Matches with human players do not last for many thousands of games, but the information presented over a short term can provide valuable insights into the opponent’s strategy.

2. Strong players change their playing style during a session; a fixed style is predictable and exploitable.

3. There is only partial feedback on opponent decisions. When a player folds (a common scenario), their cards are not revealed. They might have had little choice with a very weak hand; or they might have made a very good play by folding a strong hand that was losing; or they might have made a mistake by folding the best hand. Moreover, understanding the betting decisions they made earlier in that game becomes a matter of speculation.

This paper presents a novel technique to automatically compute an exploitive counter-strategy in stochastic imperfect information domains like poker. The program searches an imperfect information game tree, consulting an opponent model at all opponent decision nodes and all leaf nodes. The most challenging aspects to the computation are determining: 1) the probability that each branch will be taken at an opponent decision node, and 2) the expected value (EV) of a leaf node. These difficulties are due to the hidden information and partial observability of the domain, and opponent models are used to estimate the unknown probabilities. As more games are played, the opponent modeling information used by the game-tree search generally becomes more accurate, thus improving the quality of the evaluations. Opponents can and will change their style during a playing session, so old data needs to be gradually phased out. This paper makes the following contributions:

1. Miximax and Miximix, applications of the Expectimax search algorithm to stochastic imperfect information adversarial game domains.

2. Using opponent modeling to refine expected values in an imperfect information game-tree search.

3. Abstractions for compressing the large set of observable poker situations into a small number of highly correlated classes.

4. The program VEXBOT, which convincingly defeats strong poker programs including PSOPTI, and is competitive with strong human players.

4.2 Texas Hold’em Poker

Texas Hold’em is generally regarded as the most strategically complex poker variant that is widely played in casinos and card clubs. It is the game used in the annual World Series of Poker to determine the world champion. A good introduction to the rules of the game can be found in [5]. The salient points needed for this paper are that each player has two private cards (hidden from

the opponent), some public cards are revealed that are shared by all players (community cards), and each player is asked to make numerous betting decisions: either bet/raise (increase the stakes), check/call (match the current wager and stay in the game), or fold (quit and lose all money invested thus far). A game ends when all but one player folds, or when all betting rounds are finished, in which case each player reveals their private cards and the best poker hand wins (the showdown). The work discussed in this paper concentrates on two-player Limit Texas Hold’em.

Computer Poker Programs

The history of computer poker programs goes back more than 30 years to the initial work by Findler [13]. Some of the mathematical foundations go back to the dawn of game theory [22, 19]. Recently, most of the AI literature has been produced by the Computer Poker Research Group at the University of Alberta. Those programs – LOKI, POKI, and PSOPTI – illustrate an evolution of ideas that has taken place in the quest to build a program capable of world-class play:

Rule-based [6]: Much of the program’s knowledge is explicit in the form of expert-defined rules, formulas, and procedures.

Simulations [7, 5]: Betting decisions are determined by simulating the rest of the game. Likely card holdings are dealt to the opponents and the game is played out to completion. A large sample of hands is played, and the betting decision with the highest expected value is chosen.

Game theory [4]: Two-player Texas Hold’em has a search space size of O(10^18). This was abstracted down to a structurally similar game of size O(10^7). Linear programming was used to find a Nash equilibrium solution to that game (using techniques described in [17]), and the solution was then mapped back onto real poker. The resulting solution – in effect, a strategy lookup table – has the advantage of containing no expert-defined knowledge. The resulting pseudo-optimal poker program, PSOPTI, plays reasonably strong poker and is competitive with strong players. However, the technique is only a crude approximation of equilibrium play. Strong players can eventually find the seams in the abstraction and exploit the resulting flaws in strategy. Nevertheless, this represented a large leap forward in the abilities of poker-playing programs.

As has been seen in many other game-playing domains, progressively stronger programs have resulted from better algorithms and less explicit knowledge.

4.3 Optimal versus Maximal Play

In the literature on game theory, a Nash equilibrium solution is often referred to as an optimal strategy. However, the adjective “optimal” is dangerously misleading when applied to a poker program, because there is an implication that an equilibrium strategy will perform better than any other possible solution. “Optimal” in the game theory sense has a specific technical meaning that is quite different, so the term equilibrium strategy is preferred.

A Nash equilibrium strategy is one in which no player has an incentive to deviate from the strategy, because the alternatives could lead to a worse result. This simply maximizes the minimum outcome (sometimes referred to as the minimax solution for two-player zero-sum games). This is essentially a defensive strategy that implicitly assumes the opponent is perfect in some sense (which is definitely not the case in real poker, where the opponents are highly fallible).

A Nash equilibrium player will not necessarily defeat a non-optimal opponent. For example, in the game of rock-paper-scissors, the equilibrium strategy is to select an action uniformly at random among the three choices. Using that strategy means that no one can defeat you in the long term, but it also means that you will not win, since you have an expected value of zero against any other strategy.

Unlike rock-paper-scissors, poker is a game in which some strategies are dominated, and could potentially lose to an equilibrium player. Nevertheless, even a relatively weak and simplistic strategy might break even against a Nash equilibrium opponent, or not lose by very much over the long term. There are many concrete examples of this principle, but one of the clearest demonstrations was seen in the game of Oshi-Zumo [9].

In contrast, a maximal player can make moves that are non-optimal (in the game-theoretic sense) when it believes that such a move has a higher expected value. The best response strategy, which computes the maximizing counter-strategy to a static strategy, is an example of a maximal player.

Consider the case of rock-paper-scissors where an opponent has played “rock” 100 times in a row. A Nash equilibrium program is completely oblivious to the other player’s tendencies, and does not attempt to punish predictable play in any way. A maximal player, on the other hand, will attempt to exploit perceived patterns or biases. This always incurs some risk (the opponent might have been setting a trap with the intention of deviating on the 101st game). A maximal player would normally accept this small risk, playing “paper” with a belief of positive expectation [2, 1]. Similarly, a poker program can profitably deviate from an equilibrium strategy by observing the opponent’s play and biasing its decision-making process to exploit the perceived weaknesses.
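To make the contrast concrete, the small sketch below (Python; the action labels and the 80% "rock" bias are illustrative assumptions, not taken from the text) pits a frequency-counting best-response player against the uniform-random equilibrium strategy. The exploiter's edge against a biased opponent is exactly the profit that an equilibrium player forgoes.

```python
import random
from collections import Counter

ACTIONS = ["rock", "paper", "scissors"]
BEATEN_BY = {"rock": "paper", "paper": "scissors", "scissors": "rock"}  # value beats key

def equilibrium_action():
    # Nash equilibrium strategy: uniformly random, oblivious to the opponent.
    return random.choice(ACTIONS)

def maximal_action(history):
    # Best response to the opponent's observed action frequencies.
    if not history:
        return random.choice(ACTIONS)
    predicted = Counter(history).most_common(1)[0][0]
    return BEATEN_BY[predicted]            # play whatever beats the predicted action

def score(mine, theirs):
    if mine == theirs:
        return 0
    return 1 if BEATEN_BY[theirs] == mine else -1

# A biased opponent (mostly "rock") breaks even against the equilibrium player,
# but loses steadily once the maximal player has seen enough history.
history, eq_total, max_total = [], 0, 0
for _ in range(10000):
    opp = "rock" if random.random() < 0.8 else random.choice(ACTIONS)
    eq_total += score(equilibrium_action(), opp)
    max_total += score(maximal_action(history), opp)
    history.append(opp)
print(eq_total / 10000.0, max_total / 10000.0)   # roughly 0.0 versus roughly +0.8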

If PSOPTI was based on a true Nash equilibrium solution, then no human or computer player could expect to defeat it in the long run. However, PSOPTI is only an approximation of an equilibrium strategy, and it will not be feasible to compute a true Nash equilibrium solution for Texas Hold’em in the foreseeable future. There is also an important practical limitation to this approach. Since PSOPTI uses a fixed strategy, and is oblivious to the opponent’s strategy, a strong human player can systematically explore various options, probing for weaknesses without fear of being punished for using a highly predictable style. This kind of methodical exploration for the most effective counter-strategy is not possible against a rapidly adapting opponent. Moreover, the key to defeating all human poker players is to exploit their highly non-optimal play. This requires a program that can observe an opponent’s play and adapt to dynamically changing conditions.

4.4 Miximax and Miximix Search

Expectimax search is the counterpart of perfect information minimax search for domains with a stochastic element [20, 21]. Expectimax combines the minimization and maximization nodes of minimax search with the addition of chance nodes, where a stochastic event happens (for example, a dice roll). The value of a chance node is the sum of the values of each of the children of that node, weighted by the probability of that event occurring (1/6 for each roll in the case of a die).

For perfect information stochastic games such as backgammon, an opponent decision node is treated as a normal max (or min) node. However, this cannot be done for imperfect information games like poker, because the nodes of the tree are not independent. Several opponent decision nodes belong to the same information set, and are therefore indistinguishable from each other. In poker, the information set comprises all the possible opponent hands, and our policy must be the same for all of those cases, since we do not know the opponent’s cards. Furthermore, a player can, in general, use a randomized mixed strategy (a probability distribution over the possible actions), so taking the maximum (or minimum) value of the subtrees is not appropriate.

To extend Expectimax for poker, we handle all of the opponent decision nodes within a particular information set as a single node, implicitly maintaining a probability distribution over the range of possible hands they might hold. We cannot treat all possible combinations as having equal probability (for example, weak hands might be folded early and strong hands played through to the end). Imperfect information adds an extra challenge in evaluating leaf nodes in the search, since we can only estimate the relative probabilities for the opponent’s possible holdings, rather than having exact values.

We have implemented two variants of Expectimax for search on poker game trees, which we call Miximax and Miximix. These algorithms compute the expected value at decision nodes of an imperfect information game tree by modeling them as chance nodes with probabilities based on the information known or estimated about the domain, and the specific opponent [12].

The algorithm performs a full-width depth-first search to the leaf nodes of the imperfect information game tree. For two-player poker, the leaf nodes are terminal nodes that end the game – either at a showdown, or when one of the players folds. The search tree is the set of all possible betting sequences from the current state to the end of the game, over all possible outcomes of future chance nodes. At the showdown leaf nodes, the probability of winning is estimated with a heuristic evaluation function, and the resulting EV is backed up the tree. This tree can be fairly large (millions of nodes), but with efficient data structures and caching of intermediate calculations, it can be computed in real-time (about one second). In general, the search can be stopped at any depth, and the evaluation function used to estimate the EV of that subtree, as is done in traditional game-playing programs.

The EV calculation is used to decide which action the program should perform: bet/raise, check/call, or fold. Given the EV for each of our three possible actions, one could simply select the option with the maximum value. In that case, the tree will contain mixed nodes for the opponent’s decisions and max nodes for our own decisions. Hence we call this algorithm Miximax. However, always taking the maximum EV could lead to predictable play that might be exploited by an observant opponent. Instead, we could choose to use a mixed strategy ourselves. Although we (presumably) know the randomized policy we will use, it can be viewed as both players having mixed nodes, and we call this more general algorithm Miximix. (Thus, Miximax is a special case of Miximix in which all of our own decision nodes use a pure strategy, choosing one action 100% of the time). There are two unresolved issues:

1. How to determine the relative probabilities of the opponent’s possible actions at each decision node. This is based on frequency counts of past actions at corresponding nodes (i.e., given the same betting sequence so far).

2. How to determine the expected value of a leaf node. At fold nodes, computing the EV is easy – it is the net amount won or lost during the game. At showdown nodes, a probability density function over the strength of the opponent’s hand is used to estimate our probability of winning. This histogram is an empirical model of the opponent, based on the hands shown in corresponding (identical or similar) situations in the past.

In summary, the search tree consists of four types of nodes, each with different properties:

Chance nodes: For chance nodes in the game tree, the EV of the node is the weighted sum of the EVs of the subtrees associated with each possible outcome. In Texas Hold’em, chance outcomes correspond to the dealing of public board cards. To be perfectly precise, the probability of each possible chance outcome is dependent on the cards that each player will likely hold at that point in the game tree. However, since that is difficult or impossible to determine, we currently make the simplifying (but technically incorrect) assumption that the chance outcomes occur uniformly at random.2 Thus, the EV of a chance node is simply the average EV over all of the expansions.

Let Pr(C_i) be the probability of each branch i of chance node C, and let n be the number of branches. The EV of node C is:

    EV(C) = Σ_{1 ≤ i ≤ n} Pr(C_i) × EV(C_i)        (4.1)

Opponent decision nodes: Let Pr(O_i) be the estimated probability of each branch i (one of fold, call, or raise) at an opponent decision node O. The EV of node O is the weighted sum:

    EV(O) = Σ_{i ∈ {f,c,r}} Pr(O_i) × EV(O_i)        (4.2)

Program decision nodes: At decision node U, we can use a mixed policy as above (Miximix), or we can always take the maximum EV action for ourselves (Miximax), in which case:

    EV(U) = max(EV(U_f), EV(U_c), EV(U_r))        (4.3)

2 To see why the remaining deck does not have a uniform density over all cards, notice that the opponent has had previous opportunities to selectively fold with very weak cards. Since the opponent has elected to not fold, the cards in the opponent’s hand are stronger than the uniform average, and the deck is slightly richer in low-rank cards than in high-rank cards.

Leaf nodes: Let L be a leaf node, Pwin be the probability of winning the pot, L$pot be the size of the pot, and L$cost be the cost of reaching the leaf node (normally half of L$pot). At terminal nodes resulting from a fold, Pwin is either zero (if we folded) or one (if the opponent folded), so the EV is simply the amount won or lost during the game. The net EV of a showdown leaf node is:

    EV(L) = (Pwin × L$pot) − L$cost        (4.4)
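A minimal sketch of the backup defined by Equations 4.1 to 4.4 is given below in Python. The node representation and the opponent-model interface (action_probs and prob_win) are hypothetical placeholders rather than the actual VEXBOT data structures; only the recursion itself mirrors the equations.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    kind: str                                     # "chance", "opponent", "us", or "leaf"
    children: Dict[str, "Node"] = field(default_factory=dict)
    chance_probs: Dict[str, float] = field(default_factory=dict)  # Pr(C_i) for chance nodes
    pot: float = 0.0                              # L$pot at a leaf
    cost: float = 0.0                             # L$cost: our contribution to the pot
    folder: Optional[str] = None                  # "us" or "opp" if the leaf ends in a fold

class OpponentModel:
    """Hypothetical interface; real frequencies and histograms come from observed play."""
    def action_probs(self, seq: str) -> Dict[str, float]:
        return {"f": 0.2, "c": 0.8, "r": 0.0}     # placeholder Pr(O_i), Equation 4.2
    def prob_win(self, seq: str, our_hand_rank: float) -> float:
        return 0.5                                # placeholder Pwin at a showdown leaf

def miximax(node: Node, seq: str, model: OpponentModel, hr: float) -> float:
    if node.kind == "leaf":
        if node.folder == "us":
            p_win = 0.0
        elif node.folder == "opp":
            p_win = 1.0
        else:
            p_win = model.prob_win(seq, hr)       # showdown: estimated from the model
        return p_win * node.pot - node.cost       # Equation 4.4
    if node.kind == "chance":                     # Equation 4.1 (uniform outcomes assumed)
        return sum(node.chance_probs[o] * miximax(c, seq, model, hr)
                   for o, c in node.children.items())
    if node.kind == "opponent":                   # Equation 4.2: weight by modeled frequencies
        probs = model.action_probs(seq)
        return sum(probs[a] * miximax(c, seq + a, model, hr)
                   for a, c in node.children.items())
    # Our own decision node: Miximax takes the maximum-EV action (Equation 4.3).
    return max(miximax(c, seq + a, model, hr) for a, c in node.children.items())
```

Miximix would replace the final max with a weighted sum over our own (randomized) action probabilities, turning our decision nodes into mixed nodes as well.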

4.5 EV Calculation Example

For each showdown leaf node of the game tree, we store a histogram of the hand rank (HR, a percentile ranking between 0.0 and 1.0, broken into 20 cells with a range of 0.05 each) that the opponent has shown in previous games with that exact betting sequence. We will use 10-cell histograms in this section to simplify the explanation.

For example, suppose we are Player 1 (P1), the opponent is Player 2 (P2), and the pot contains four small bets (sb) on the final betting round. We bet (2 sb) and are faced with a raise from P2. We want to know what distribution of hands P2 would have in this particular situation. Suppose that a corresponding 10-cell histogram has relative weights of [1 1 0 0 0 0 0 4 4 0], like that shown in Figure 4.1. This means that based on our past experience, there is a 20% chance that P2’s raise is a bluff (a hand in the HR range 0.0-0.2), and an 80% chance that P2 has a hand in the HR range 0.7-0.9 (but not higher). The histogram for the showdown node after we re-raise and P2 calls (bRrC) will be related, probably having a shape like [0 0 0 0 0 0 0 5 5 0], because P2 will probably fold if he was bluffing, and call with all legitimate hands. The action frequency data we have on this opponent will be consistent, perhaps indicating that after we re-raise, P2 will fold 20% of the time, call 80%, and will not re-raise (because it is not profitable to do so). The probability triple of action frequencies is Pr(F, C, R) = {0.2, 0.8, 0.0}.

Figure 4.1: Betting tree for the EV calculation example.

To decide what action to take in this situation, we compute the expected value for each choice: EV(fold), EV(call), and EV(raise). EV(fold) is easily determined from the betting history – the game will have cost us -4 small bets.

EV(call) depends on our probability of winning, which depends on the strength of our hand. If our hand rank is in the range 0.2-0.7, then we can only beat a bluff, and our chance of winning the showdown is Pr(win | bRc) = 0.20. Since the final pot will contain 12 sb, of which we have contributed 6 sb, the net EV(call) = -3.6 sb. Therefore, we would not fold a hand in the range 0.2-0.7, because we expect to lose less in the long run by calling (0.4 sb less). If our hand rank is only HR = 0.1 (we were bluffing), then EV(call) = -4.8 sb, and we would be better off folding. If hand rank is HR = 0.8, then we can also beat half of P2’s legitimate hands, yielding an expected profit of EV(call) = +1.2 sb. Table 4.1 gives the EV for calling with selected hand ranks in the range 0.7-0.9.

HR     Pr(w|c)   EV(c)   Pr(w|rC)   EV(r)   Action
0.70   0.2       -3.6    0.0        -5.2    call
0.75   0.4       -1.2    0.25       -2.0    call
0.80   0.6       +1.2    0.5        +1.2    c or r
0.85   0.8       +3.6    0.75       +4.4    raise
0.90   1.0       +6.0    1.0        +7.6    raise

Table 4.1: Expected values for call or raise for selected hand ranks.

To calculate the expected value for re-raising, we must compute the weighted average of all cases in that subtree, namely: bRrF, bRrC, bRrRf, and bRrRc. Since the probability assigned to a P2 re-raise is zero, the two latter cases will not affect the overall EV and can be ignored. The share from bRrF is 0.2 × 6 = +1.2 sb, and is independent of our hand strength. The probability of winning after bRrC is determined by the histogram for that case, as before. Thus, in this example EV(raise) = 1.2 + 0.8 × (16 × Pr(win | rC) − 8), as shown in Table 4.1 for the same selected hand ranks.

As a consequence of this analysis, if we are playing a strictly maximizing strategy, we would decide to fold if our hand is weaker than HR = 0.167, call if it is in the range 0.167 to 0.80, and re-raise if it is stronger than HR = 0.80. Computing the viability of a possible bluff re-raise is done similarly.
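The arithmetic above can be checked mechanically. The short script below is a sketch: the histograms, pot sizes, and the 20/80 fold/call split are simply the numbers from this example, not output from any real opponent model. It reproduces the EV(call) and EV(raise) columns of Table 4.1.

```python
# Opponent hand-rank histograms for this betting sequence (10 cells of width 0.1).
hist_bRc  = [1, 1, 0, 0, 0, 0, 0, 4, 4, 0]    # opponent raises and we call
hist_bRrC = [0, 0, 0, 0, 0, 0, 0, 5, 5, 0]    # we re-raise and the opponent calls

def prob_win(our_hr, hist):
    """Probability that our hand rank beats a hand drawn from the histogram
    (hands falling in the same cell are assumed to beat us half the time)."""
    win = 0.0
    for cell, weight in enumerate(hist):
        lo, hi = cell / 10.0, (cell + 1) / 10.0
        if our_hr >= hi:
            win += weight                              # we beat every hand in this cell
        elif our_hr > lo:
            win += weight * (our_hr - lo) / (hi - lo)  # partial credit within our cell
    return win / sum(hist)

for hr in (0.70, 0.75, 0.80, 0.85, 0.90):
    ev_call = prob_win(hr, hist_bRc) * 12 - 6    # 12 sb final pot, 6 sb contributed by us
    # Re-raise: 20% of the time the opponent folds (we net +6 sb); 80% of the time the
    # opponent calls, giving a 16 sb pot of which we have contributed 8 sb.
    ev_raise = 0.2 * 6 + 0.8 * (prob_win(hr, hist_bRrC) * 16 - 8)
    print(f"HR={hr:.2f}  EV(call)={ev_call:+.1f}  EV(raise)={ev_raise:+.1f}")
```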

4.6 Abstractions for Opponent Modeling

After each hand is played against a particular opponent, the observations made during that game are used to update our opponent model. The action decisions made by the opponent are used to update the betting frequencies corresponding to the sequence of actions during the game. When showdowns occur, the hand rank (HR) shown by the opponent is used to update a leaf node histogram, as illustrated in the previous section.

The context tree is an explicit representation of the imperfect information game tree, having the same skeletal structure with respect to decision nodes. Chance nodes in the tree are represented implicitly (all possible chance outcomes are accounted for during the EV calculation). A leaf node of the context tree corresponds to all of the leaves of the game tree with the same betting sequence (regardless of the preceding chance nodes). Associated with this is an efficient data structure for maintaining the empirically observed action frequencies and showdown histograms for the opponent. For this we use a trie, based on the natural prefix structure of related betting sequences. Hash tables were used for low-overhead indexing.

4.6.1 Motivation for Abstractions

The Miximax and Miximix search algorithms perform the type of mathematical computation that underlies a theoretically correct decision procedure for poker. For the game of two-player Limit Texas Hold’em, there are 9^4 = 6561 showdown nodes for each player, or 13122 leaf-level histograms to be maintained and considered. This fine level of granularity is desirable for distinguishing different contexts and ensuring a high correlation within each class of observations.

However, having so many distinct contexts also means that most betting sequences occur relatively rarely. As a result, many thousands of games may be required before enough data is collected to ensure reliable conclusions and effective learning. Moreover, by the time a sufficient number of observations have been made, the information may no longer be current.

This formulation alone is not adequate for practical poker. It is common for top human players to radically change their style of play many times over the course of a match. A worthy adversary will constantly use deception to disguise the strength of their hand, mask their intentions, and try to confuse our model of their overall strategy. To be effective, we need to accumulate knowledge very quickly, and have a preference toward more recent observations. Ideally, we would like to begin applying our experience (to some degree) immediately, and be basing decisions primarily on what we have learned over a scope of dozens or hundreds of recent games, rather than many thousands. This must be an ongoing process, since we may need to keep up with a rapidly changing opponent.

Theoretically, this is a more challenging learning task than most of the problems studied in the machine learning and AI literature. Unlike most Markov decision process (MDP) problems, we are not trying to determine a static property of the domain, but rather the dynamic characteristics of an adversarial opponent, where historical perspective is essential.

In order to give a preference toward more recent data, we gradually “forget” old observations using exponential history decay functions. Each time an observation is made in a given context, the previously accumulated data is diminished by a history decay factor, h, and the new data point is then added. Thus, for h = 0.95, the most recent event accounts for 5% of the total weight, the last 1/(1 − h) = 20 observations account for approximately (1 − 1/e) = 0.63 of the total, and so on.3
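As an illustration, the fragment below sketches one plausible shape for a context-tree entry. The class and field names are hypothetical; only the 20-cell histogram and the decay factor h follow the description above. Every new observation first fades all of the old data, so recent games dominate the profile.

```python
class ContextNode:
    """Model data for one betting-sequence context (one node of the context tree)."""
    def __init__(self, h=0.95, cells=20):
        self.h = h                                            # history decay factor
        self.action_weights = {"f": 0.0, "c": 0.0, "r": 0.0}  # decayed action frequencies
        self.histogram = [0.0] * cells                        # decayed hand-rank histogram

    def observe_action(self, action):
        for a in self.action_weights:                         # fade the old observations
            self.action_weights[a] *= self.h
        self.action_weights[action] += 1.0                    # then add the new one

    def observe_showdown(self, hand_rank):
        for i in range(len(self.histogram)):
            self.histogram[i] *= self.h
        cell = min(int(hand_rank * len(self.histogram)), len(self.histogram) - 1)
        self.histogram[cell] += 1.0

    def action_probs(self):
        total = sum(self.action_weights.values())
        if total == 0.0:
            return {"f": 1 / 3, "c": 1 / 3, "r": 1 / 3}       # placeholder default data
        return {a: w / total for a, w in self.action_weights.items()}

# The tree itself can be a trie or hash table keyed by the betting sequence, e.g.:
context_tree = {}
node = context_tree.setdefault("bR", ContextNode())
node.observe_action("c")
```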

4.6.2 Abstraction

In order to learn faster and base our inferences on more observations, we would like to combine contexts that we expect to have a high mutual correlation. This allows us to generalize the observations we have made, and apply that knowledge to other related situations. There are many possible ways of accomplishing these abstractions, and we will address only a few basic techniques. An important consideration is how to handle the zero-frequency problem, when

3 The exact value for this case is 1 − h^20 = 0.6415, but the approximation is true in general for 1/(1 − h) observations.

there have yet to be any observations for a given context; and more generally, how to initialize the trie with good default data. Early versions of the system employed somewhat simplistic defaults, which resulted in rather unbalanced play early in a match. More recent implementations use default data based on rational play for both players, derived in a manner analogous to Nash equilibrium strategies.

The finest level of granularity is the context tree itself, where every possible betting sequence is distinct, and a different histogram is used for each. The opponent action frequencies are determined from the number of times each action was chosen at each decision node (again using a history decay factor to favour recent events). Unfortunately, having little data in each class will result in unreliable inferences.

One coarse-grained abstraction groups all betting sequences where the opponent made an equal number of bets and raises throughout the game, ignoring what stage of the game they were made. A finer-grained version of the same idea maintains an ordered pair for the number of bets and raises by each player. However, testing reveals that an even coarser-grained abstraction may be desirable. Summing the total number of raises by both players (no longer distinguishing which player initiated the action) yields only nine distinct classes. Despite the crudeness of this abstraction, the favorable effects of grouping the data are often more important than the lower expected correlations between those lines of play. Another similar type of coarse-grained abstraction considers only the final size of the pot, adjusting the resolution (i.e., the range of pot sizes) to provide whatever number of abstraction classes is desired.

An abstraction system can be hierarchical, in which case we also need to consider how much weight should be assigned to each tier of abstraction. This is based on the number of actual observations covered at each level, striving for an effective balance, which will vary depending on the opponent. Our method of combining different abstraction classes is based on an exponential mixing parameter (say m = 0.95) as follows. Let the lowest-level context tree (no abstraction) be called A0, a fine-grained abstraction be called A1, a cruder amalgam of those classes be called A2, and the broadest classification level be called A3. Suppose the showdown situation in question has five data points that

match the context exactly, in A0. This data is given a weight of (1 − m^5) = 0.23 of the total. If the next level of abstraction, A1, has 20 data points (including those from A0), it is assigned (1 − m^20) = 0.64 of the remaining weight, or about 50% of the total. The next abstraction level might cover 75 data points, and be given (1 − m^75) = 0.98 of the remainder, or 26% of the total. The small remaining weight is given to the crudest level of abstraction. Thus, all levels contribute to the overall profile, depending on how relevant each is to the current situation.
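The weighting scheme just described is easy to write down. In the sketch below (a hypothetical helper, using the observation counts from the example), each tier takes a 1 − m^n share of whatever weight remains, and the coarsest tier absorbs the rest.

```python
def tier_weights(counts, m=0.95):
    """Mixing weights for abstraction tiers, finest (A0) first.  `counts` holds the
    number of observations covered by each tier except the coarsest; each tier takes
    (1 - m**n) of the remaining weight, and the coarsest tier gets what is left."""
    weights, remaining = [], 1.0
    for n in counts:
        share = remaining * (1.0 - m ** n)
        weights.append(share)
        remaining -= share
    weights.append(remaining)                  # crudest level of abstraction
    return weights

# The example from the text: 5 data points in A0, 20 in A1, 75 in A2, remainder to A3.
print([round(w, 2) for w in tier_weights([5, 20, 75])])   # about [0.23, 0.5, 0.27, 0.01]
```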

4.7 Experiments

To evaluate the strength of VEXBOT, we conducted both computer vs human experiments, and a round-robin tournament of computer vs computer matches. The field of computer opponents consisted of:

1) SPARBOT,4 the publicly available version of PSOPTI4, which surpassed all previous programs for two-player Limit Hold’em by a large margin [4].

2) POKI, a formula-based program that incorporates opponent modeling to adjust its hand evaluation. Although POKI is the strongest known program for the ten-player game, it was not designed to play the two-player game, and thus does not play that variation very well [5].

3) HOBBYBOT, a slowly adapting program written by a hobbyist, specifically designed to exploit POKI’s flaws in the two-player game.

4) JAGBOT, a simple static formula-based program that plays a rational, but unadaptive game.

5) ALWAYS CALL and

6) ALWAYS RAISE, extreme cases of weak exploitable players, included as a simple benchmark.

The results of the computer vs computer matches are presented in Table 4.2. Each match consisted of at least 40,000 games of poker.

4 The name SPARBOT refers generally to any program in the PSOPTI family of game-theoretic programs. PSOPTI4 through to the most recent version, PSOPTI7, are all based on the architecture of PSOPTI2, with overlaying 3-round models, described in Chapter 3. PSOPTI4 is publicly available on the University of Alberta online poker server (www.cs.ualberta.ca/~games/), and in the commercial program POKER ACADEMY (www.poker-academy.com/).

Program        Vexbot    Sparbot   Hobbot    Poki      Jagbot    A.Call    A.Raise
Vexbot            -       +0.052    +0.349    +0.601    +0.477    +1.042    +2.983
Sparbot        -0.052        -      +0.033    +0.093    +0.059    +0.474    +1.354
Hobbybot       -0.349     -0.033       -      +0.287    +0.099    +0.044    +0.463
Poki           -0.601     -0.093    -0.287       -      +0.149    +0.510    +2.139
Jagbot         -0.477     -0.059    -0.099    -0.149       -      +0.597    +1.599
Always Call    -1.042     -0.474    -0.044    -0.510    -0.597       -      =0.000
Always Raise   -2.983     -1.354    -0.463    -2.139    -1.599    =0.000       -

Table 4.2: Computer vs computer matches (small bets per game).

The outcomes are statistically significant, with a standard deviation of approximately ±0.03 small bets per game (sb/g).

VEXBOT won every match it played, and had the largest margin of victory over each opponent. VEXBOT approaches the theoretical maximum exploitation against ALWAYS CALL and ALWAYS RAISE. No other programs came close to this level, despite those opponents being perfectly predictable.

Against SPARBOT, the strongest previous program for the two-player game, VEXBOT was able to find and exploit flaws in the pseudo-optimal strategy. The learning phase was much longer against SPARBOT than against any other program, typically requiring several thousand games. However, once an effective counter-strategy is discovered, VEXBOT will continue to win at that rate or higher, due to the oblivious nature of the game-theoretic player.

The testing against humans (Table 4.3) involves a smaller number of trials, and should therefore be taken only as anecdotal evidence (the outcomes being largely dominated by short-term swings in luck). Most human players available for testing did not have the patience to play a statistically significant number of games (especially when frustrated by losing). However, it is safe to say that VEXBOT easily exploited weaker players, and was competitive against expert-level players. The results also consistently showed a marked increase in its win rate after the first 200-400 games of the match, presumably due to the opponent-specific modeling coming into effect.

A more recent implementation of the Miximax algorithm was able to improve considerably on the VEXBOT results against computer opponents.

Num   Rating         sb/g      Games
1     Expert         -0.022     3739
2     Intermediate   +0.136     1507
3     Intermediate   +0.440      821
4     Intermediate   +0.371      773

Table 4.3: VEXBOT vs human matches.

That version defeated SPARBOT by +0.145 sb/g – three times the win rate of the previous version.5 Moreover, its win rate against SPARBOT was more than three times the win rate achieved by a world-class player [4], but still considerably less than that of the most successful human player (the author). Revised versions of both programs competed in the 2003 Computer Olympiad, with VEXBOT again dominating SPARBOT, winning the gold and silver medals respectively [3].

4.8 Conclusions and Future Work

Limit poker is primarily a game of mathematics and opponent modeling. We have built a program that “does the math” in order to make its decisions. As a result, many sophisticated poker strategies emerge without any explicit encoding of expert knowledge. The adaptive and exploitive nature of the program produces a much more dangerous opponent than is possible with a purely game-theoretic approach.

One limitation not yet adequately addressed is the need for effective defaults, to ensure that the program does not lose too much while learning about a new (unknown) opponent at the beginning of a match. If good default data is not easily derivable by direct means, there are several ways that existing programs can be combined to form hybrids that are less exploitable than any of the component programs in isolation.

With a smoothly adapting program and a good starting point, it may be possible to use self-play matches and automated machine learning to constantly refine the

5 Although all versions of VEXBOT can be challenging and dangerous opponents, the play can also be highly volatile and suspect, especially early in a match, when it must depend on the wholly inadequate default seeding as its model of real play.

default data. In principle, that process could eventually approach a game-theoretic near-optimal default that is much closer to a true Nash equilibrium strategy than has been obtained to date.

The performance of VEXBOT can be improved further in numerous ways. While we believe that the modeling framework is theoretically sound, the parameter settings for the program could be improved considerably. Beyond that, there is a lot of room for improving the context tree abstractions, to obtain higher correlations among grouped sequences.

Only the two-player variant has been studied so far. Generalization of these techniques to handle the multi-player game should be more straightforward than with other approaches, such as those using approximations for game-theoretic solutions.

Refinements to the architecture and algorithms described in this paper will undoubtedly produce increasingly strong computer players. It is our belief that these programs will have something to teach all human poker players, and that they will eventually surpass all human players in overall skill.

Acknowledgments

This research was supported by grants from the Natural Sciences and Engineering Research Council of Canada (NSERC), Alberta’s Informatics Circle of Research Excellence (iCORE), and by the Alberta Ingenuity Center for Machine Learning (AICML). The author was supported in part by an Izaak Walton Killam Memorial Scholarship and by the Province of Alberta Ralph Steinhauer Award of Distinction.

Bibliography

[1] D. Billings. The First International RoShamBo Programming Competition. The International Computer Games Association Journal, 23(1):42–50, 2000.
[2] D. Billings. Thoughts on RoShamBo. The International Computer Games Association Journal, 23(1):3–8, 2000.
[3] D. Billings. Vexbot wins poker tournament. International Computer Games Association Journal, 26(4):281, 2003.
[4] D. Billings, N. Burch, A. Davidson, T. Schauenberg, R. Holte, J. Schaeffer, and D. Szafron. Approximating game-theoretic optimal strategies for full-scale poker. In The Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, IJCAI’03, pages 661–668, 2003.
[5] D. Billings, A. Davidson, J. Schaeffer, and D. Szafron. The challenge of poker. Artificial Intelligence, 134(1–2):201–240, January 2002.
[6] D. Billings, D. Papp, J. Schaeffer, and D. Szafron. Opponent modeling in poker. In American Association of Artificial Intelligence National Conference, AAAI’98, pages 493–499, 1998.
[7] D. Billings, L. Peña, J. Schaeffer, and D. Szafron. Using probabilistic knowledge and simulation to play poker. In American Association of Artificial Intelligence National Conference, AAAI’99, pages 697–703, 1999.
[8] P. Brusilovsky, A. Corbett, and F. de Rosis, editors. User Modeling 2003. Springer-Verlag, 2003.
[9] M. Buro. Solving the Oshi-Zumo game. In H. J. van den Herik, H. Iida, and E. A. Heinz, editors, Advances in Computer Games 10: Many Games, Many Challenges, pages 361–366. Kluwer Academic, 2004.
[10] D. Carmel and S. Markovitch. Incorporating opponent models into adversary search. In American Association of Artificial Intelligence National Conference, AAAI’96, pages 120–125, 1996.
[11] F. Dahl. A reinforcement learning algorithm to simplified two-player Texas Hold’em poker. In European Conference on Machine Learning, pages 85–96, 2001.
[12] A. Davidson. Opponent modeling in poker. Master’s thesis, Department of Computing Science, University of Alberta, 2002.
[13] N. Findler. Studies in machine cognition using the game of poker. Communications of the ACM, 20(4):230–245, 1977.
[14] E. Horvitz, L. Breese, D. Heckerman, D. Hovel, and K. Rommeke. The Lumiere project: Bayesian user modeling for inferring the goals and needs of software users. In Annual Conference on Uncertainty in Artificial Intelligence, UAI’98, pages 256–265, 1998.
[15] H. Iida, J. Uiterwijk, H. J. van den Herik, and I. Herschberg. Potential applications of opponent-model search. ICCA Journal, 16(4):201–208, 1993.
[16] P. Jansen. Using Knowledge About the Opponent in Game-Tree Search. PhD thesis, School of Computer Science, Carnegie-Mellon University, 1992.
[17] D. Koller and A. Pfeffer. Representations and solutions for game-theoretic problems. Artificial Intelligence, 94(1):167–215, 1997.
[18] K. Korb, A. Nicholson, and N. Jitnah. Bayesian poker. In Annual Conference on Uncertainty in Artificial Intelligence, UAI’99, pages 343–350, 1999.
[19] H. W. Kuhn. A simplified two-person poker. In H. W. Kuhn and A. W. Tucker, editors, Contributions to the Theory of Games, volume 1, pages 97–103. Princeton University Press, 1950.
[20] D. Michie and R. Chambers. Boxes: An experiment in adaptive control. In E. Dale and D. Michie, editors, Machine Intelligence 2, pages 125–133. Elsevier / North-Holland, 1968.
[21] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 1995.
[22] J. von Neumann and O. Morgenstern. The Theory of Games and Economic Behavior. Princeton University Press, 1944.
[23] D. Weld, C. Anderson, P. Domingos, O. Etzioni, T. Lau, K. Gajos, and S. Wolfman. Automatically personalizing user interfaces. In The International Joint Conference on Artificial Intelligence, IJCAI’03, pages 1613–1619, 2003.

Chapter 5

Assessment of Performance (2005)

A Tool for the Direct Assessment of Poker Decisions 1

5.1 Introduction

The game of poker has many properties that present new challenges for Artificial Intelligence research, making it distinct from all previously studied games. Traditional games like chess and Go are two-player perfect information games with no element of random chance. Poker is at the opposite end of the spectrum: a multi-player imperfect information game with partial observability and stochastic outcomes. Deception, opponent modeling, and coping with uncertainty are indispensable elements of high-level strategic play.

The property of stochasticity compounds many of the complex problems in the domain. It also has a highly detrimental effect on performance assessment: on quantifying how well a player is playing, or simply determining who the better players are. The “signal to noise ratio” is low – a player can play extremely well and still lose over an extended period of time, just by bad luck. The normal variance in the game is so high that many thousands of games need to be played before obtaining even a modest degree of confidence in the result.

1 The contents of this chapter have been accepted for publication in The International Computer Games Association Journal. A previous version of this work appeared in Technical Report TR06-07, Copyright 2006 University of Alberta. All rights reserved. D. Billings and M. Kan. Development of a tool for the direct assessment of poker decisions. Technical Report TR06-07, University of Alberta Department of Computing Science, April 2006.

Without reliable feedback on how effective a solution is, it is difficult to identify problem areas and make steady forward progress. Moreover, the problem will become more serious over time. As programs become stronger, and closer to each other in skill level, the number of games required to distinguish between them with a particular statistical confidence grows quadratically: halving the difference in win rate requires roughly four times as many games.

Some techniques can help ameliorate the problem of high variance. For example, duplicate tournament systems use independent tournaments with the same series of cards, shuffling the players to different positions for each replay [1]. Since games between computer programs can be re-played with no memory, the total number of good and bad situations is equalized, at least to some degree. However, once the players deviate in their corresponding actions (particularly if one player folds while another continues with the hand), the subsequent actions are no longer directly comparable, and are subject to random outcomes. The problem of accurate assessment persists, even if other sources of noise are kept under control. Furthermore, duplicate systems cannot be used for assessing an individual human player, since they will recognize previously played hands and outcomes.

Developing a tool to perform an objective and insightful analysis of the decisions made by each player would be of great value, because the effects of randomness could largely be factored out. Unfortunately, this turns out to be difficult to construct, both in theory and in practice, due to the implications of imperfect information in poker.

It is useful to compare poker to perfect information games that have a stochastic element. Two well-studied examples are backgammon and blackjack. In backgammon, both players have full access to the state of the game, but randomness is introduced by the roll of the dice. In blackjack, the dealer’s face-down card has no effect on the player’s decision making, and is drawn from the same probability distribution as the rest of the unknown cards. Since the dealer’s decisions are completely determined by the rules of the game, blackjack can be viewed as a single-player stochastic game. In all perfect information games, whether stochastic or deterministic, the notion of a best move, or set of maximal actions, is well-defined. In backgammon and blackjack, each choice has an objective game-theoretic expected value (EV), and usually there is exactly one choice that has a maximum EV.

In deterministic perfect information games like chess and checkers, there is a set of moves that preserves the game-theoretic value of the game. Of these, there may be a “best practical move” that maximizes the probability of success in a contest between typical imperfect players. For these games, the quality of each player decision can be compared to the best available choice, using an oracle or near-oracle to determine the quantitative value of every possible move. A perfect oracle is based on exact enumeration over all reachable game states. In blackjack, simulation or enumeration using the exact composition of the remaining deck provides a sound basis for direct assessment of EV decisions, as shown by Wolfe [15]. A near-oracle can be based on Monte Carlo simulations, using a strong program to provide accurate estimates of the objective value of each move. In backgammon, Monte Carlo roll-outs using a strong program like TD GAMMON are considered to be the definitive assessment of game equity [13, 14]. In the game of Scrabble, the program MAVEN has surpassed all human players in skill level, and is used in a similar fashion [11, 10]. Again, these are good baselines for excellent play because they are strongly correlated with the maximum EV play available in each situation.

Poker programs are not yet strong enough to serve as a near-oracle; but the problem goes much deeper than that, because the concept of a single maximum EV play is not applicable. In contrast to perfect information games, there is generally no single best choice in a given poker situation. The concept of a perfect oracle is not well-defined (or at least, is not very discriminating between the possible options). A sound strategy is dependent on the relative balance of deceptive plays (such as bluffing with a weak hand, or trapping with a strong hand). A balanced approach is essential, but the player has considerable flexibility in how to obtain that balance. In both theory and practice, radically different strategies can have the same objective EV and can be equally viable. Analogously, the notion of a best move is not meaningful in the imperfect information game of Rock-Paper-Scissors – the most effective choice is entirely dependent on the preceding history between the players.

In practice, a good poker player’s strategy is strongly dependent on the

beliefs about the opponent’s weaknesses and vulnerabilities. For example, if Alice bluffs too much, an appropriate counter-strategy is to call (or raise) more often with mediocre holdings. If Bob seldom bluffs, the appropriate response is to fold more often with mediocre holdings. We see the opposite response depending on the perception of the opponent’s style; and a player must continue to adapt as the opponent changes their strategy over time. In other words, in chess it is possible and appropriate to simply “play the board” (ignoring the opponent), whereas in poker it is necessary to “play the player” to maximize the win rate. Since there cannot be a “perfect” way to do this in general, it introduces a speculative element to the game that is not consistent with defining an objective assessment system.

For these and other reasons (some of which are outlined in the next section), it becomes evident that developing a “perfect” assessment tool is essentially impossible. In fact, an expert might argue that the assessment problem is “POKER-complete”, meaning that it is at least as hard as any other problem in the domain, including playing perfectly (if such a notion is even well-defined).

However, it is possible to design an imperfect assessment system that is conceptually simple, objective, and highly effective at reducing the variance due to stochastic outcomes. The Ignorant Value Assessment Tool (DIVAT)2 is one such system. The term “ignorant” refers to the fact that it implicitly ignores much of the context in each situation. The expected value estimates may not be especially well-informed, but they are consistent, and are applied in an egalitarian fashion to all players. In the two-player case, the same criteria are applied equally to both players, with the intention of obtaining an unbiased assessment, even if the specific criteria could be criticized as being somewhat arbitrary. After we developed the tool empirically, it was formally proven to be statistically unbiased by Zinkevich et al. [16]. This means that the long-term expected value from the DIVAT assessment is guaranteed to match the long-term expected value of money earnings.

In Section 5.2, we examine the strengths and limitations of perfect information hindsight analysis, provide some background on previous attempts to use this approach, and present a concrete example of a game in order to ground the discussion.

2 The first letter of the acronym corresponds to the author’s first initial.

In Section 5.3, we define the terminology and explain the metrics used in each of the components, and re-visit the example to perform a detailed quantitative analysis using the DIVAT system. In Section 5.4, we illustrate some of the uses of the tool, and demonstrate its power in variance reduction, while providing an unbiased estimate of the difference in skill between two players playing a series of games. We conclude with a discussion of future work and generalization of the technique to other domains.

5.2 Motivation

In this section we will focus our attention on a specific example of a game of poker, in order to identify some of the considerations and obstacles an analysis tool will need to cope with. We examine the reasons that an imperfect information domain cannot simply be broken down into a collection of perfect information instances. Then, we look at our previous attempts to use perfect knowledge hindsight analysis and identify their major failings, which leads us to the design of the DIVAT system.

5.2.1 Example Game

We now look at an example of one complete game of Limit Texas Hold’em. We will call Player 1 (P1) Alfred, and Player 2 (P2) Betty. Although we will try to motivate some of their decisions, the objective analysis will, of course, be completely free of any such subjective interpretation. A succinct summary of the game is:

Alfred: A♣-K♣    Betty: 7♥-6♥    Board: K♠-5♥-3♦ T♣ 4♥
Betting: SlRrC/kBrC/bC/bRc

All monetary units will be expressed in terms of small bets (sb), which is the size of all bets and raises in the first two rounds of play (the bet size doubles for the last two rounds of play). With the reverse-blinds format of two-player Limit Texas Hold’em, P2 posts a small blind bet of 0.5 sb, P1 posts a big blind bet of 1 sb, and P2 must make the first betting decision – to either fold, call 0.5 sb, or raise another 1 sb. The betting notation is: ‘s’ for small blind, ‘l’ for large blind, ‘k’ for check,

‘b’ for bet, ‘f’ for fold, ‘c’ for call, ‘r’ for raise, and ‘/’ is used as a delimiter for the four betting rounds.3 Upper-case letters are used to more easily distinguish the actions of the second player (Betty).

Betty is dealt the 7♥-6♥, which is not a particularly strong hand, but is certainly worth calling to see the flop. In this game, Betty elects to raise with the hand, for purposes of deception (called a semi-bluff). Alfred has a very strong hand, the A♣-K♣, and re-raises, to Betty’s detriment. Betty calls.

The flop is the K♠-5♥-3♦, giving Alfred a strong hand with a pair of Kings. Alfred decides to try for a check-raise trap, perhaps based on the belief that Betty bets too aggressively after a check (and is not as aggressive when it comes to raising a bet). Betty bets (another semi-bluff) and Alfred completes his plan by raising. Betty has a weak hand, but the pot is now large (nine small bets) so she calls in the hope of improving (with a 4, 6, or 7; or any heart, 8, or 9).

The turn card is the T♣, and Alfred simply bets his strong hand, since he believes that trying for another check-raise trap is unlikely to succeed. Betty still has nothing, and cannot be certain whether or not making a pair with a 6 or 7 will give her the best hand. If those outcomes will win (10/44 in total), then she has a correct call (2 sb to win 12 sb); if not, then she is better off folding. She elects to call, perhaps because Alfred has frequently followed this betting pattern in previous games without having a strong hand.

The river is a perfect 4♥ for Betty, giving her the best possible hand (she could tie, but she cannot lose). Of course, Alfred has no way of knowing that this card was a disaster for him, and bets with what he believes to be the best hand. Betty raises, and Alfred elects to only call, perhaps based on his prior experience where Betty’s raises on the final betting round are usually meaningful. The final pot size is 22 sb, giving Betty a net of +11 sb, and Alfred a net of -11 sb.

As an outside observer, we would like to know if one of these players played better than the other. We would like to filter out the stochastic luck element, and have an objective means of quantifying the perceived difference in skill. For a

3 A check is functionally equivalent to a call, but “check” is used when the current amount to call is zero. Similarly, a bet is functionally equivalent to a raise, but “bet” is used when the current amount to call is zero.

rational and accurate assessment, we would like to know how each player played in general, not just in hindsight, with all of the cards known. In this example game, most poker players would agree that Alfred played his hand well. However, most of his decisions were relatively easy,4 so it is difficult to say how competent he is based on this one game. From the perspective of an average player, Betty might appear to be a wild gambler; but to an expert observer, she might appear to be either a world-class player or a weak player. Much of the assessment is subjective opinion, and is neither right nor wrong. Moreover, the opinions cannot be well-informed without a full historical context – Betty’s decisions will depend very strongly on Alfred’s style of play, and vice-versa. Real poker is highly non-Markovian in nature (i.e., is not memoryless). A single game taken in isolation does not provide sufficient context for a well-informed assessment. However, for practical purposes, an analysis tool will likely have to ignore this fact (or temper it with some form of approximation).
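As a sanity check on the betting notation, the small parser below is a sketch that assumes only the standard Limit bet sizes described above (1 sb in the first two rounds, 2 sb in the last two). It replays the betting string and confirms the 22 sb pot and the ±11 sb net results.

```python
def contributions(betting, small_bet=1.0):
    """Replay a betting string such as 'SlRrC/kBrC/bC/bRc' and return the total
    amount each player puts into the pot.  Upper-case actions belong to P2 (Betty),
    lower-case to P1 (Alfred); the bet size doubles on the last two rounds."""
    total = {1: 0.0, 2: 0.0}
    for round_no, actions in enumerate(betting.split("/"), start=1):
        bet = small_bet if round_no <= 2 else 2 * small_bet
        in_round = {1: 0.0, 2: 0.0}
        for ch in actions:
            player = 2 if ch.isupper() else 1
            other = 3 - player
            a = ch.lower()
            if a == "s":                               # small blind
                in_round[player] += 0.5 * small_bet
            elif a == "l":                             # large (big) blind
                in_round[player] += small_bet
            elif a in ("k", "c"):                      # check / call: match the opponent
                in_round[player] = in_round[other]
            elif a in ("b", "r"):                      # bet / raise: match and add one bet
                in_round[player] = in_round[other] + bet
            # 'f' (fold) adds nothing further
        for p in (1, 2):
            total[p] += in_round[p]
    return total

c = contributions("SlRrC/kBrC/bC/bRc")
print(c, sum(c.values()))    # {1: 11.0, 2: 11.0} and a 22.0 sb pot
```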

5.2.2 EVAT and LFAT

Since 2001, we have used an assessment procedure called the Expected Value Assessment Tool (EVAT) to try to gain a more accurate picture of the results of short matches. The EVAT is based on hindsight analysis, where each player’s decisions are compared to what the maximal action would have been if all players had known all of the cards. Thus, we are comparing imperfect information decisions to the actions that would have been taken in the perfect information variant of the game. This provides a perspective that is quite distinct from the money line, and can occasionally provide useful insights. Unfortunately, it also suffers from some serious drawbacks, and tends to be rather unreliable as a predictor of future outcomes. Because of these limitations, the tool has not found its way into regular use.

The EVAT view is analogous to what poker author David Sklansky [12] calls “The Fundamental Theorem of Poker” (FToP), which states:

Every time you play a hand differently from the way you would have played it if you could see all your opponents’ cards, they gain; and

4 In general, it is easier to make good decisions with a strong hand than it is with a hand near the borderline between fold and not fold.

every time you play your hand the same way you would have played it if you could see all their cards, you lose. Conversely, every time opponents play their hands differently from the way they would have if they could see all your cards, you gain; and every time they play their hands the same way they would have played if they could see all your cards, you lose.

Sklansky is suggesting that the long-term expected value in poker is equivalent to the differences in perfect information decisions. Similar assertions have been made in the past for many domains involving imperfect or incomplete information. However, strictly speaking this is not true, and the FToP is not a theorem. An imperfect information game cannot be expressed merely as a collection of perfect information instances. Some of the key differences between these classes of problems were exemplified for the game of bridge by Frank and Basin [7], and by Ginsberg [8].5 The FToP is also qualitative in nature, not quantitative.6 It is, however, a useful heuristic that can guide human players to formulating better decisions.

The EVAT is essentially a quantitative comparison between actual decisions and the ideal perfect information counterparts. Each time the actions are different, the player is assessed a penalty, equal in magnitude to the difference in expected value. The sum of all such “misplays” is computed for each player, and the totals are compared, yielding a quantitative estimate of the difference in skill.

Unfortunately, the EVAT is based on a highly unrealistic baseline for comparison, because the players are implicitly expected to have omniscient knowledge, without regard to the actual conditions that exist in each instance. As an extreme example, a player with the second-best possible hand on the river would be expected to fold whenever the opponent happens to hold the best possible hand, and raise otherwise. Conversely, a player with the second-worst possible hand would be expected to fold except when the opponent happens to hold the worst possible

5 A simple example is the case of a two-way finesse: the declarer can win a key trick and secure the contract in every perfect information lay of the cards (a 100% chance of success), but is forced to guess in the real imperfect information game (for a 50% chance of success).

6 As stated, the FToP is also untrue for a number of mundane reasons, nor does it hold for multi-player situations, but that is of little concern to this discussion.

hand. In the example game, Alfred’s entirely reasonable bet on the river is a technical misplay, incurring an EVAT penalty, since his action is different from what he would have done if he had known all the cards. Moreover, he is expected to play the hand differently in otherwise identical situations, based only on the hidden information. This goes against fundamental principles of imperfect information games, in that the same policy must be employed within the same information set of the game tree. The perfect hindsight view is largely irrelevant to the imperfect information reality of the situation, and illustrates one of the biggest flaws with the EVAT policy: when a player has a moderately strong second-best hand, then with reasonable play they should lose money, in expectation.

Another way of looking at the problem of variance is in terms of opportunities. An immediate consequence of a high-variance stochastic game is that it can take a very long time before each player has had their “fair share” of good and bad situations. One of the reasons that the money line is a weak estimator of the long-term EV of the players is that the opportunities for each player do not balance out quickly. To a large extent, the EVAT analysis suffers from the same limitation. In contrast, the DIVAT system will automatically account for certain imbalances in opportunities, by distinguishing between good and bad situations, and scaling the penalties accordingly. As a result, the DIVAT hindsight view converges much more rapidly than the comparatively naive EVAT assumptions.

The Luck Filtering Analysis Tool (LFAT) is a complementary system that was developed to address some of the shortcomings of the EVAT. Using the same methods to compute the expected value of each situation, the LFAT compares each player’s pot equity7 before and after each chance event (i.e., cards being dealt). Thus, the LFAT analysis occurs between the betting rounds, while EVAT occurs between the chance events, so the analysis is split into the natural alternating phases of chance outcomes and player actions. The LFAT quantifies the effects of the stochastic outcomes alone, without regard to implications of betting decisions. Conversely,

7 The concept of equity is explained in Section 5.3.1.2.

the EVAT method considers only the betting decisions themselves, and simply disregards the fluctuations due to luck, which is a highly desirable feature for reducing variance. The DIVAT system is similar to EVAT in this regard. However, the separation is not perfect. Although the stochastic luck has been eliminated from consideration, the situations that arise truly are a consequence of all previous chance events that have occurred. Thus, the two are not independent, and treating them as such introduces some degree of error.

5.3 Development

In this section, we define terminology and explain the components of the Ignorant Value Assessment Tool. We present specific details of how the DIVAT system is constructed, and then work through the concrete example, illustrating its use.

5.3.1 Definitions and Metrics

In order to analyze the play of poker games, we will need metrics to assess the value of a hand. This is a complex problem, and good methods are often multi-dimensional. For example, adjusted hand strength is a combination of hand strength, which is a weighted probability of currently holding the best hand, and hand potential, which is a measurement of future outcomes [4]. For our purposes, we want to develop a one-dimensional metric of hand goodness, on a scale from zero to one.

For a given situation in the middle of a game, it will be necessary to estimate the equity for each player – the net amount that will be won or lost by the end of the game, on average. Most simplistic metrics do not consider the impact of future betting rounds (known as implied odds in the terminology of poker theory). While those methods tend to be easy to compute, and are sufficient for some purposes, we will also need to develop better-informed metrics that take future betting rounds into account.

5.3.1.1 IHR, 7cHR, and EHR

The Immediate Hand Rank (IHR) is the relative ranking of a hand, on a scale from zero to one, compared to all other possible holdings at the current point in a game.

IHR considers only the number of opponent hands that are currently ahead, behind, or tied with the player’s hand, and ignores all future possibilities. For two-player games, the formula is: IHR = (ahead + tied/2) / (ahead + tied + behind). Before the flop, there are 1225 possible two-card combinations for opponent hands. After the flop cards are known, there are 1081 possible opponent hands. There are 1035 combinations on the turn, and 990 combinations on the river. No inferences or assumptions are made about the opponent’s cards, so all hands implicitly have uniform probability in hand rank calculations.

The 7-card Hand Rank (7cHR) performs a complete enumeration of all possible future board cards, computing the hand rank on the river for each instance, and reporting the overall average outcome. On the river, IHR and 7cHR are equivalent, since there are no future cards to be dealt. On the turn, there are 46 possible cases for the river card (before considering the possible opponent holdings). Thus, 7cHR is an enumerated average of one-card futures with respect to 6-card Hand Rank (6cHR). On the flop, there are 1081 two-card futures to consider (order is not important in this case, since no folds are possible with this metric). Thus, 6cHR is an enumerated average of one-card futures with respect to 5-card Hand Rank (5cHR), and 7cHR is an enumerated average of two-card futures with respect to 5cHR. Before the flop, there are 2,118,760 five-card boards (ignoring order) to reach the river stage. 7cHR amounts to an unweighted enumeration of unweighted hand ranks. It can be computed efficiently through the use of clever caching techniques [4]. Much greater efficiency was obtained by pre-computing look-up tables for all pre-flop, flop, and turn possibilities, mapping to and from canonical representatives to take advantage of suit isomorphisms to reduce storage requirements.

Since 7cHR is an average over all possible future chance outcomes, it contains a mixture of positive potential (the chance of improving to the best hand when behind), and negative potential (the chance of falling behind when ahead). However, it is not a particularly good balance of the two, because it implicitly assumes that all hands will proceed to a showdown, which is patently false. In practice, 7cHR tends to overestimate the value of weak no-pair hands. For example, P1: 3♣-2♣

Board: K♠-T♥-7♦, has 7cHR = 0.1507, based on two chances of making a small pair. Since the hand could be forced to fold on the turn, it is incorrect to assume a two-card future. In general, we should only be looking forward to the next decision point, which is a one-card future.8

One possibility is to use 6cHR as an estimate of hand value on the flop. In practice, a better measure for Limit Hold’em is to take the average of 5cHR and 7cHR, because that captures the extra value from good two-card combinations. For example, P1: 3♥-2♥ Board: K♠-T♥-7♦, has 6cHR = 0.0729, but 7cHR = 0.1893 due to the possibility of two running hearts. Since hitting the first heart will provide a flush draw that almost certainly has enough value to proceed to the river, these indirect combinations have definite value. With respect to the DIVAT folding policy on the flop, we define the Effective Hand Rank (EHR) to be the average of IHR and 7cHR: EHR = (IHR + 7cHR) / 2. With respect to the DIVAT betting policies (when the hand is sufficiently strong), we define EHR to be the maximum of IHR and 7cHR: EHR = max(IHR, 7cHR). The reason the maximum is a better measure of hand value for this purpose is that a hand with high negative potential generally has a greater urgency to bet. In the terminology of poker theory, a hand that degrades in strength as the game progresses (i.e., 7cHR < IHR) is said to have a high free-card danger. This means that there is a relatively high risk in not betting (or raising), because allowing a free draw could be potentially disastrous in terms of EV (such as losing the pot when a bet would have forced the opponent to fold).

Since 7cHR does not distinguish cases where the opponent does or does not bet, the assessment is somewhat optimistic in practice, even for one-card futures from the turn forward. For example, P1: 3♣-2♣ Board: K♠-T♥-7♦-4♣, has 7cHR = 0.0620, but could easily be drawing dead against a one-pair hand or better. Making a small pair on the river will not be strong enough to bet, but will be strong enough to warrant a call. Thus, the hand stands to lose an additional two small bets whenever the opponent has a strong hand, but gain nothing when it is the best. In the terminology of poker theory, 7cHR does not adequately account for the reverse

8 This is especially imperative in No-Limit poker, where the opponent can apply much greater leverage with a large subsequent bet.

implied odds in most situations. Despite these drawbacks, 7cHR is a convenient metric that is simple to compute, and the shortcomings can be dealt with easily enough. The use of these metrics is illustrated in the detailed example of Section 5.3.4.
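To make the hand rank calculation concrete, the following Python sketch enumerates opponent holdings exactly as described above. Only the IHR formula and the enumeration counts are taken from the text; the generic hand_value evaluator and all names are illustrative, not part of the actual implementation.

from itertools import combinations
from math import comb

def immediate_hand_rank(hole, board, hand_value):
    # IHR = (ahead + tied/2) / (ahead + tied + behind), enumerating every
    # possible opponent holding from the unseen cards with uniform weight.
    deck = [(rank, suit) for rank in range(2, 15) for suit in "cdhs"]
    unseen = [c for c in deck if c not in hole and c not in board]
    ours = hand_value(hole, board)
    ahead = tied = behind = 0
    for opp in combinations(unseen, 2):
        theirs = hand_value(list(opp), board)
        if ours > theirs:
            ahead += 1
        elif ours == theirs:
            tied += 1
        else:
            behind += 1
    return (ahead + tied / 2.0) / (ahead + tied + behind)

# The enumeration sizes quoted above follow directly from the unseen-card counts:
assert comb(50, 2) == 1225   # pre-flop opponent holdings
assert comb(47, 2) == 1081   # on the flop
assert comb(46, 2) == 1035   # on the turn
assert comb(45, 2) == 990    # on the river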

5.3.1.2 AIE and ROE

The all-in equity (AIE) is the fraction of all future outcomes each player will win, given perfect knowledge of the cards held by all players.9 This fraction can be multiplied by the current pot size as a simple measure of (gross) pot equity – the “fair share” of the current pot that each player is entitled to, based on the complete enumeration of future outcomes. For two-player Texas Hold’em, there are 44 cases to consider for turn situations, 990 cases for the flop, and 1,712,304 cases for the pre-flop.

To convert the gross amount to the net gain or loss, we subtract the total amount invested from the current pot equity: Net = (AIE * potsize) - invested. For example, if the pot size after the turn betting is 10 sb (5 sb from each player), and 11 out of 44 outcomes will win, zero will tie, and 33 will lose (written as +11 =0 -33), then our current net pot equity is Net = 0.25 * 10 - 5 = -2.50 bets. All further discussion of equity will be in terms of net pot equity.10

In general, AIE is an unrealistic measure of true equity because it does not consider the effects of future betting. For example, a pure drawing hand like P1: 3♥-2♥ P2: A♠-T♠ Board: K♠-T♥-7♦-A♥, has AIE = 0.2045 (9/44) for P1, but has very favourable implied odds, since it stands to win several bets if a heart lands, and need not lose any additional bets if it does not improve. Conversely, a hand that will be mediocre in most of the outcomes will have reverse implied odds, and the

9 With the normal (table stakes) rules of poker, a player is said to go all-in when they commit their last chips to the pot. The player cannot be forced to fold, and is eligible to win the portion of the pot they have contributed to if they win the showdown. (If more than one active player remains, betting continues toward a side pot). Hence, from the perspective of the all-in player, there is no future betting, and the AIE is an accurate calculation of their true equity.
10 Note that measures of equity require the perfect knowledge hindsight view to assess the fraction of future outcomes that are favourable. In contrast, measures of hand strength are with respect to a player’s imperfect information view, and are used for realistic assessment, such as establishing baseline betting sequences.

true equity will be less than the AIE metric would indicate. In cases where a player may be forced to fold to a future bet, using AIE can result in a gross over-estimate of hand value.

In contrast, roll-out equity (ROE) is based on an explicit enumeration of each future outcome, accounting for the betting in each case according to some standard policy. A simple estimate might assume exactly one bet per round contributed by each player. A better measure would account for folding of the worst hands, and raising of the best hands. The tools we will develop in the following sections establish a more refined betting policy that can be used for all future betting rounds, providing a much more accurate estimate of the true equity of each situation. We refer to the basic implementation as AIE DIVAT, whereas the well-informed roll-outs give rise to the superior ROE DIVAT system.
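The net pot equity arithmetic can be illustrated with a minimal sketch. The function name is ours, and ties are assumed to split the pot, as in the AIE definition above; the worked numbers are the ones from the text.

def net_all_in_equity(wins, ties, losses, pot_size, invested):
    # Net = (AIE * potsize) - invested, where AIE is the fraction of
    # enumerated future outcomes that are won (ties counted as half).
    aie = (wins + ties / 2.0) / (wins + ties + losses)
    return aie * pot_size - invested

# Worked example from the text: a 10 sb pot after the turn betting
# (5 sb from each player), with outcomes +11 =0 -33 out of 44.
print(net_all_in_equity(11, 0, 33, pot_size=10, invested=5))   # -2.5 sb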

5.3.2 DIVAT Overview

Poker decisions are extremely context sensitive. Each decision depends on every decision that preceded it, throughout the history of previous games, and especially with respect to the previous actions in the current game. The scope of the DIVAT analysis is a single complete betting round, encompassing all of the player decisions made between chance events. Since it is essential to keep the frame of reference as simple as possible, we simply ignore much of the relevant context of each situation. This is not as serious a limiting factor as it might seem, as will be shown in Section 5.4. In particular, each player’s betting actions (and responses to the opponent) in previous betting rounds are treated as being essentially information free, and no inferences are made with respect to their likely holdings. Since the EHR metric is based on unweighted hand rank calculations, the distribution of opponent hands is implicitly assumed to be uniform. In a sense, each new round is taken as an independent event, with no memory of how the current situation arose, apart from the current size of the pot. However, the implications of bets and raises within a single betting round are taken into account. By largely ignoring the history of how each situation arose, we lose some relevant context, and possibly introduce error

to the assessment. Fortunately, this does not completely undermine the analysis, because those errors apply to both players in roughly equal proportions, and thus the effects largely cancel out.

The core component of the Ignorant Value Assessment Tool is the DIVAT baseline betting sequence. The baseline sequence is determined by folding policies and betting policies, applied symmetrically to both players. Each of the DIVAT policies is designed to use the simple metrics, while also compensating for some of their liabilities.

To understand DIVAT on an intuitive level, we need to consider the implications of deviations between the baseline sequence and the sequence that actually took place in the game. The baseline sequence represents a sane line of play between two “honest players”, free of past context, and devoid of any deceptive plays. If the total number of bets that actually went into the pot was not equal to the baseline amount, then the net difference in the resulting equities can be measured. For example, when Alfred decided to try for a check-raise trap on the flop, he was taking a calculated risk in the hope of a higher overall payoff. If his deceptive play is successful, then more money will go into the pot than in the baseline, and he is awarded the equity difference on that extra money. If the ploy fails, then he will receive a penalty based on the lost equity (which happens to be the same amount in this case). Thus, his judgement in attempting a non-standard play is immediately endorsed or criticized by the net difference in equities.11 Although the obvious limitations in measurement and scope can produce a somewhat uninformed view of the ideal play in some cases, the net difference in equities is a highly meaningful characteristic most of the time.

Figure 5.1 provides pseudo-code for how the DIVAT Difference is computed. A step-by-step example is provided in Section 5.3.4, using all-in equity. Figure 5.2 gives a recursive definition of the alternative roll-out equity calculation.

11 Since the net difference is relative, there is actually no distinction between a bonus for one player and a penalty for the other. The assessment is symmetric and zero sum in this regard.

/* map each hand strength onto a Make level (0-4);
   return kK, kBf, kBc, bF, bC, bRc, bRrC, or bRrRc */
Baseline = Derive_Baseline_Sequence (P1HS, P2HS)

/* play forward to the end of the betting round, using baseline sequence and actual sequence */

Play_Sequence (Current_State, Baseline)
Baseline_Equity = Rollout_Equity (P1_hand, P2_hand, Board, round, baseline_potsize)

Play_Sequence (Current_State, Actual)
Actual_Equity = Rollout_Equity (P1_hand, P2_hand, Board, round, actual_potsize)

DIVAT_Difference = Actual_Equity - Baseline_Equity

Figure 5.1: DIVAT Difference pseudo-code.
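For readers who prefer a programming-language rendering, the same computation might be sketched in Python as follows. The helper functions are assumed to behave like Derive_Baseline_Sequence, Play_Sequence, and Rollout_Equity in Figures 5.1 and 5.2; all names are illustrative rather than taken from the actual implementation.

def divat_difference(state, actual_sequence,
                     derive_baseline_sequence, play_sequence, rollout_equity):
    # Map each player's hand strength onto a Make level (0-4) and derive the
    # baseline sequence (kK, kBf, kBc, bF, bC, bRc, bRrC, or bRrRc).
    baseline_sequence = derive_baseline_sequence(state.p1_hs, state.p2_hs)

    # Play each sequence forward to the end of the betting round, then value
    # the resulting pots with the same hindsight roll-out equity.
    baseline_pot = play_sequence(state, baseline_sequence)
    baseline_equity = rollout_equity(state.p1_hand, state.p2_hand,
                                     state.board, state.round, baseline_pot)

    actual_pot = play_sequence(state, actual_sequence)
    actual_equity = rollout_equity(state.p1_hand, state.p2_hand,
                                   state.board, state.round, actual_pot)

    return actual_equity - baseline_equity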

5.3.3 DIVAT Details and Parameter Settings

The DIVAT folding policy is used to decide when a player should fold to an opponent’s bet. The folding policy is based on a quasi-equilibrium strategy, motivated by game-theoretic equilibrium strategies. Since the 7cHR metric is in the range zero to one, and is roughly uniformly distributed across that range on the river, the 7cHR is compared directly to the game-theoretic optimal frequency for folding. The game-theoretic equilibrium fold frequency is an invariant that depends only on the size of the bet in relation to the size of the pot. If the bet size is bs and the pot size is ps at the beginning of the river betting round, and the opponent then bets, the optimal fold frequency is bs/(ps + bs). For example, if the pot size on the river is 8 sb and the bet size is 2 sb, the DIVAT policy would fold all hands with 7cHR below 2/(8 + 2) = 0.20, as an estimate of the bottom 20% of all hands.12

Prior to the river round, this simple method is not applicable. First, some weak hands may be worth calling for their draw potential. One can imagine a situation where it is correct to call a bet with any hand, because the pot is much larger than the

12 Folding more often than this would be immediately exploitable by an opponent who always bluffs, showing a net profit overall. Conversely, folding less often than the equilibrium value would be exploitable by betting with slightly weaker hands. Employing the game-theoretic optimal fold policy makes a player indifferent to the opponent’s strategy.

Rollout_Equity (P1_hand, P2_hand, Board, round, potsize)

    /* base case: game ended with a fold */
    if (P2 folded) then return (net_outcome)
    else if (P1 folded) then return (-net_outcome)

    /* base case: game ended with a showdown */
    if (round == river) {
        net = potsize / 2
        if (P1_hand > P2_hand) then return (net)
        else if (P1_hand < P2_hand) then return (-net)
        else /* tie */ return (0)
    }

    /* general case: average future roll-out equity */
    sum = 0
    cases = 0
    round++
    for each chance outcome {
        cases++
        Update (Board)
        P1HS = GetHandStrength (P1_Hand, Board)
        P2HS = GetHandStrength (P2_Hand, Board)
        Baseline = Derive_Baseline_Sequence (P1HS, P2HS)
        Play_Sequence (Current_State, Baseline)
        value = Rollout_Equity (P1_hand, P2_hand, Board, round, baseline_potsize)
        sum += value
    }
    expected_value = sum / cases
    return (expected_value)

Figure 5.2: Roll-out Equity (ROE) pseudo-code.

size of the bet (giving extremely high pot odds). Clearly it is not correct in principle to fold at the same rate when there are still cards to be dealt. Secondly, as an average of future outcomes, 7cHR does not have a uniform distribution. Very low values (e.g., below 7cHR = 0.15 on the flop) do not occur, due to the ubiquitous possibility of improvement. Very high values are also under-represented, being vulnerable to a certain fraction of unlucky outcomes. Thirdly, the 7cHR metric is not especially well-balanced with respect to the extra expense of mediocre hands (reverse implied odds), as mentioned previously.

To account for these factors, we add an offset to the game-theoretic folding frequency, and use that as a fold threshold for the EHR metric. Thus, the formula for pre-river fold policies is: Fold Threshold = bs/(ps + bs) + offset. The offsets were first estimated on a theoretical basis, then verified and tuned empirically. The empirical data based on millions of simulated outcomes is omitted in the interest of brevity. On the turn, the normal offset is +0.10, so the fold threshold in the previous example would be increased to hands below 7cHR = 0.30. On the flop, the normal offset is +0.075, thus hands would be folded below 7cHR = 0.275 when the initial pot is four bets in size. Prior to the flop, all hands have sufficient draw odds to see the three-card flop, so no offset is required.13

The DIVAT betting policy is used to decide how many bets and raises a hand is worth. It is strictly a bet-for-value policy, meaning that all bets and raises are intended to be positive expected value actions, with no deceptive plays (i.e., no bluffing with a weak hand, nor trapping with a strong one). Betting in direct proportion to hand strength is a very poor poker strategy in general, because it conveys far too much reliable information to the opponent. Nevertheless, it serves as a reasonable guideline of how many bets each player should wager in a given situation, all else being equal.

Although bluffing is an absolutely essential component of any equilibrium strategy, the benefits of bluffing are exhibited as a side-effect, increasing the profitability of strong hands. The actual bluffing plays themselves have a net EV of zero against

13 The pre-flop offset could be negative, but it would make little difference in practice, since almost all cases will proceed to the flop.

an equilibrium strategy (neither gaining nor losing equity). Since the DIVAT fold policies and betting policies are applied in an oblivious manner, the highly predictable behavior of the bet-for-value policy is not being exploited. In effect, the deceptive information-hiding plays are taken as a given, and all players simply employ the ideal bet-for-value policy without consideration of the opponent’s possible counter-strategies. Since the DIVAT policy provides a realistic estimate of how many bets should be invested by each player, and is applied in an unbiased fashion to all players, the weighted gains and losses in hindsight EV are a low-variance estimate of the long-term differentials in decision quality.

As an example, if Player 1 holds a hand of value EHR = 0.70 and Player 2 holds an EHR = 0.90 hand, then the bet-for-value betting sequence for the round would be bet-Raise-call (bRc), indicating that each player will normally invest two bets on that betting round. In the reverse case, with P1 = 0.90 and P2 = 0.70, the expected sequence would be bet-Call (bC), an investment of one bet each, because the second player does not have a strong enough hand for a legitimate raise. This demonstrates a natural asymmetry of the game, and reflects the inherent advantage enjoyed by the player in second position.

The Make1 threshold is the strength needed to warrant making the first bet of a betting round. The Make2 threshold specifies the strength required to raise after the opponent bets. The Make3 threshold is the strength required for a re-raise. The Make4 threshold governs re-re-raises (called capping the betting, because no more raises are allowed).14 The betting thresholds are derived from the equilibrium points ensuring positive expectation. On the river, the betting thresholds must account for the fact that the opponent will only call with a hand that has some minimum value. A bet cannot earn a profit if the opponent folds, and the hand must win more than half of the time when it is called to have positive expectation. For example, if the opponent will call at the game-theoretic optimal frequency, then the range above that point is the relevant interval. A Make1 threshold of 0.64 corresponds

14 Some Limit games permit a fifth level, or even unlimited raises in two-player (heads-up) competition. This does not greatly affect the analysis, as situations with more raises are increasingly uncommon, and can be handled if necessary, even for the infinite case.

to a policy of betting the top 41.4% of holdings in that interval.15 Prior to the river, there is tangible value in forcing the opponent to correctly fold a hand that has a small chance of winning (sometimes called the fold equity). We handle this case by considering the appropriate interval to be the entire window from zero to one, lowering the thresholds by the corresponding ratio. A Make1 threshold of 0.58 approximately corresponds to a policy of betting the top 41.4% of all possible holdings. The standard betting thresholds for each round of play are listed in Table 5.1.

           Fold Offset   Make1   Make2   Make3   Make4
Pre-flop      0.000      0.580   0.825   0.930   0.965
Flop          0.075      0.580   0.825   0.930   0.965
Turn          0.100      0.580   0.825   0.930   0.965
River         0.000      0.640   0.850   0.940   0.970

Table 5.1: Standard (moderate) DIVAT settings.
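A small sketch of how these settings translate into baseline decisions is given below. The threshold numbers are taken from Table 5.1; the function names, dictionary layout, and round keys are ours, used only for illustration.

MAKE_THRESHOLDS = {            # Make1  Make2  Make3  Make4
    "preflop": (0.580, 0.825, 0.930, 0.965),
    "flop":    (0.580, 0.825, 0.930, 0.965),
    "turn":    (0.580, 0.825, 0.930, 0.965),
    "river":   (0.640, 0.850, 0.940, 0.970),
}
FOLD_OFFSETS = {"preflop": 0.000, "flop": 0.075, "turn": 0.100, "river": 0.000}

def fold_threshold(bet_size, pot_size, round_name):
    # Game-theoretic fold frequency bs/(ps + bs), plus the pre-river offset.
    return bet_size / (pot_size + bet_size) + FOLD_OFFSETS[round_name]

def make_level(ehr, round_name):
    # Number of bets/raises (0-4) the bet-for-value policy is willing to make.
    return sum(ehr >= t for t in MAKE_THRESHOLDS[round_name])

# Examples from the text: an 8 sb pot facing a 2 sb bet gives a fold threshold
# of 2/(8+2) = 0.20 on the river, raised to 0.30 by the +0.10 offset on the turn.
print(round(fold_threshold(2, 8, "river"), 3))   # 0.2
print(round(fold_threshold(2, 8, "turn"), 3))    # 0.3
# EHR = 0.70 warrants one bet (Make1); EHR = 0.90 warrants a raise (Make2),
# reproducing the bet-Raise-call example above.
print(make_level(0.70, "turn"), make_level(0.90, "turn"))   # 1 2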

5.3.4 Basic AIE DIVAT Analysis of the Example Game

We now re-visit the example game presented in Section 5.2.1 to illustrate the quantitative DIVAT assessment of all the decisions made by each player. This analysis uses a basic version of DIVAT, based on immediate hand rank, seven-card hand rank, and all-in equity. Although roll-out equity is clearly superior (being better-informed and provably unbiased), the all-in equity is simpler to compute and easier to follow. Table 5.2 presents the pertinent values in the analysis of each round of betting.

In the first round of play, Betty chose to make a deceptive play by raising with a hand that is unlikely to be the best. She may not have done this with the expectation of Alfred folding immediately, but the raise could create a misrepresentation that will have lasting effects much later in the game. Moreover, if she never raised with weaker hands, she would potentially be conveying too much information to her opponent. If Alfred had only called the raise, then there would be no net difference

15 The zero EV equilibrium points for a 4-bet maximum were derived recursively. The fractions corresponding to the Make1 - Make4 thresholds are the top 12/29, 5/12, 2/5, and 1/2 of the interval, respectively. Generalizing the recurrence relation for unlimited raises shows the zero EV equilibrium point to be the top √2 − 1 at all levels. The derivations are omitted for brevity.

Pre-flop Analysis
  (no board cards)               IHR      7cHR     EHR
  P1: A♣-K♣                      0.9376   0.6704   0.9376
  P2: 7♥-6♥                      0.1604   0.4537   0.4537
  AIE = 0.6036 (+1029832 =7525 -674947)
  LFAT change = +0.2073
  DIVAT Baseline = SlCrC         Actual Seq = SlRrC
  Round EV = +0.2073 [SlCrC]
  Actual Equity = +0.6218 [SlRrC]
  Baseline Equity = +0.4145 [SlCrC]
  DIVAT Difference = +0.2073 sb

Flop Analysis
  Board: K♠-5♥-3♦                IHR      7cHR     EHR
  P1: A♣-K♣                      0.9685   0.8687   0.9685
  P2: 7♥-6♥                      0.0634   0.3798   0.2216
  AIE = 0.7636 (+756 =0 -234 of 990)
  LFAT change = +0.9600
  DIVAT Baseline = bC            Actual Seq = kBrC
  Round EV = +0.5273 [bC]
  Actual Equity = +2.6364 [kBrC]
  Baseline Equity = +2.1091 [bC]
  DIVAT Difference = +0.5273 sb

Turn Analysis
  Board: K♠-5♥-3♦-T♣             IHR      7cHR     EHR
  P1: A♣-K♣                      0.9411   0.8902   0.9411
  P2: 7♥-6♥                      0.0662   0.2146   0.2146
  AIE = 0.9091 (+40 =0 -4 of 44)
  LFAT change = +1.4545
  DIVAT Baseline = bF            Actual Seq = bC
  Round EV = 0.0000 [bF]         Fold Equity = 0.9091 [bF]
  Actual Equity = +5.7273 [bC]
  Baseline Equity = +5.0000 [bF]
  DIVAT Difference = +0.7273 sb

River Analysis
  Board: K♠-5♥-3♦-T♣-4♥          IHR      7cHR     EHR
  P1: A♣-K♣                      0.8576   0.8576   0.8576
  P2: 7♥-6♥                      0.9955   0.9955   0.9955
  AIE = 0.0000 (+0 =0 -1 of 1)
  LFAT change = -12.7273
  DIVAT Baseline = bRc           Actual Seq = bRc
  Round EV = -4.0000 [bRc]
  Actual Equity = -11.0000 [bRc]
  Baseline Equity = -11.0000 [bRc]
  DIVAT Difference = +0.0000 sb

Table 5.2: Round-by-round analysis of the example game.

in equity compared to the DIVAT baseline, since the same amount of money would be invested by the two players. In the actual game sequence, Alfred correctly re-raised, and is thus awarded a skill credit for the round, at the expense of Betty. Since Alfred’s hand will win about 60% of the future outcomes, he earns a small fraction of the two extra bets that went into the pot (Net = 0.60 * 2 - 1 = +0.20 sb).16

The flop was favourable for Alfred, giving him a strong hand, and increasing his chance of winning a showdown to 76%. He tried for a check-raise trap, which was successful, netting him another 0.53 sb in equity. If Betty had simply checked, she would have earned that amount instead of Alfred, and his gamble would have backfired.

16 In reality, the equity difference is not that large, because Betty will usually not continue with the hand when it does not connect with the flop, but stands to win several extra bets in the future if the flop is favourable. These positive implied odds are reflected in the roll-out equity, which indicates that Alfred’s advantage in this situation is actually much smaller, as shown in Table 5.3.

The turn brings another good card for Alfred, increasing his expected share of the pot to 91%. Since his check-raise on the flop may have revealed the true strength of his hand, he opts for the straightforward play of betting. Now Betty is faced with a difficult decision between calling with a weak hand, or folding when the pot is relatively large. Betty might have supposed that there was still a chance that Alfred had no pair, and that she could perhaps win if a 6 or 7 landed. The necessity of making informed guesses is the very essence of poker. In this particular case, that belief would be wrong, and folding would have had a higher expected return for Betty. The magnitude of the resulting misplay is accurately reflected by the DIVAT Difference for the round.17 When a fold is involved, the DIVAT Difference is not the same as the baseline EV for the round. Here Betty loses a net of 1.64 sb from the actions of the round (bet-Call), but calling only loses 0.73 sb more than folding. The difference is due to the fold equity that Betty refused to abandon (1/11 of the pot prior to the betting), which gives her partial compensation for the calling error.

The perfect 4♥ for Betty on the river erases all of the good luck Alfred enjoyed up to that point. The LFAT swing indicates that the card cost him 12.73 sb, based solely on his pot equity after the turn betting. However, the damage is actually much greater, because he naturally continues to bet his strong hand and walks into a raise, losing an additional 4 sb.

Here we witness the dramatic difference between the perfect knowledge hindsight view of EVAT, and the more realistic DIVAT view. The EVAT baseline sequence for the final round of betting would be check-Bet-fold, which is quite absurd with respect to the imperfect information reality. The DIVAT baseline of bet-Raise-call is much more reflective of real play. Since Alfred did not compound the problem with a re-raise, he loses only the expected amount from the river betting, and thus does not receive any penalty. If he had chosen to play check-Bet-call (perhaps based on a telling mannerism by Betty), he would in fact gain in the view of DIVAT, because he lost less than was expected.

Not only is the DIVAT baseline more reasonable, it is also clear that the EVAT

17 We again see the effects of favourable implied odds when a 4 lands, which outweigh the reverse implied odds when a 6 or 7 lands. The slightly better prospects after the calling error are captured by the roll-out equity analysis.

Example Game AIE DIVAT summary
  A♣-K♣ vs 7♥-6♥    Board: K♠-5♥-3♦ / T♣ / 4♥
  Bet sequence: SlRrC/kBrC/bC/bRc

  Round       LFAT      RndEV     DIVAT
  Pre-flop    +0.207    +0.207    +0.207
  Flop        +0.960    +0.527    +0.527
  Turn        +1.455    +0.909    +0.727
  River       -12.727   -4.000     0.000

  Total DIVAT Difference = +1.462 sb

Example Game ROE DIVAT summary
  A♣-K♣ vs 7♥-6♥    Board: K♠-5♥-3♦ / T♣ / 4♥
  Bet sequence: SlRrC/kBrC/bC/bRc

  Round       LFAT      RndEV     DIVAT
  Pre-flop    +0.207    +0.029    +0.029
  Flop        +0.960    +0.523    +0.523
  Turn        +1.455    +0.909    +0.636
  River       -12.727   -4.000     0.000

  Total DIVAT Difference = +1.189 sb

Table 5.3: Full-game analysis of the example game (AIE and ROE).

analysis is statistically biased, because it is preferential toward one particular style of play over another, apart from EV considerations. As we have seen, it is simply impossible to play the river betting round without frequent misplays, with respect to the EVAT perfect information baseline. This means that a player who employs a conservative style (folding in neutral or marginal EV situations) will be viewed more favourably than a player with a liberal style (calling in those same situations), simply because the conservative player gets to the river round with a marginal hand less often. In fact, if one wanted to maximize the EVAT opinion of one’s play, it would be wise to sacrifice slightly positive expectation situations on the flop and turn, simply to avoid being measured against an impossible standard on the river. The irrational EVAT baseline has the same undesirable effect on earlier rounds, but it is most pronounced on the river, where the effect is not dampened by averaging over future outcomes. In contrast, the DIVAT baseline provides one standard way to play the hand for all cases, without regard to the opponent’s actual (hidden) holding.

The round-by-round LFAT transitions cannot easily be combined to determine a net effect on the game as a whole. They cannot simply be summed, since instances of early good luck can be wiped out by a final reversal, as in the example game. The LFAT value can, however, provide some relevant context when interpreting the round-by-round analyses.

The DIVAT analyses of each round are treated as being essentially independent of each other, and can be summed to give the net difference for the complete game. Indeed, the sum is generally more meaningful than the individual rounds in isolation, because a player may make a deliberate misplay on an early betting round in order to induce larger errors on the turn or river. For example, a player may slow-play by only calling with a strong hand on the flop, intending to raise on the turn when the bet size is larger. Conversely, a player may raise for a free-card with a drawing hand, intending to check on the turn after the opponent checks. All else being equal, the success or failure of the tactic will be reflected in the total DIVAT Difference for the game.

Table 5.3 gives a summary for the complete example game. The corresponding summary for the better-informed and theoretically sound ROE DIVAT is given for direct comparison. The ROE DIVAT procedure has been formally proven to be statistically unbiased by Zinkevich et al. [16].

Overall, we can conclude that Alfred did better than expected with the cards and situations that arose in this game. In hindsight, he made better choices than Betty, worth a total of about +1.46 small bets in terms of long-term expected value. The fact that Alfred actually lost 11 sb on the hand is not particularly relevant, and only serves to occlude the most important considerations.

Of course, this is only a single game, and as previously noted, Alfred’s hand was somewhat easier to play well than Betty’s. The non-independence of consecutive games should also be kept in mind. In the wider scope, Betty’s play might have been entirely appropriate, and might earn some compensation in future games. Long-term plans taking effect over many games are also captured by the running total of DIVAT Differences over the course of a match. Given many examples over a series of games, the accumulated DIVAT Differences provide a much more accurate, low-variance estimate of the eventual win rate for one player over another.
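For instance, summing the per-round DIVAT Differences reported in Table 5.3 reproduces the game totals, up to rounding of the displayed per-round values:

aie_rounds = [+0.207, +0.527, +0.727, 0.000]   # DIVAT column, AIE summary
roe_rounds = [+0.029, +0.523, +0.636, 0.000]   # DIVAT column, ROE summary
print(round(sum(aie_rounds), 3))   # 1.461, i.e. the +1.462 sb total up to rounding
print(round(sum(roe_rounds), 3))   # 1.188, i.e. the +1.189 sb total up to rounding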

5.3.5 Smoothing and Using Equilibrium Baselines

Further refinement to DIVAT can be obtained by averaging the results of several runs, using different parameter settings for each. For this purpose, we defined nine

sets of folding parameters that reflect styles of play ranging from very loose to very tight; and defined nine sets of betting parameters that reflect styles of play ranging from very aggressive to very conservative. In our experiments, we normally run a sweep of five settings from loose/aggressive to tight/conservative.18 This provides a natural smoothing of the DIVAT Differences over the short term. For example, a situation might arise that is close to the borderline between bet-Call and bet-Raise-call. Over the five separate runs, the baseline might be bet-Call four times and bet-Raise-call once. Thus, instead of a single threshold with a jump from 2.0 bets to 4.0 bets, we have a smoother transition with five smaller steps, yielding a jump of 2.4 bets on average.19

In effect, the average of several runs is a hedge, being less committal in situations where the best line of play is more debatable. This can be especially useful over the range of fold thresholds, because the difference in EV between folding and continuing with the hand can be very large (since the whole pot is at stake, rather than fractions of a bet). However, over the course of many games the average of several runs will simply converge on the DIVAT Difference line for the median (moderate style) parameter settings. Thus, smoothing is primarily useful for short matches.

In imperfect information games where an equilibrium solution is known (or can be accurately estimated), the ideas behind DIVAT can be used in a manner analogous to smoothing. For example, in the case of poker, suppose we have a weak hand on the final round of betting, which would be checked in the DIVAT baseline. If we know that the bluff frequency in this situation should be 10%, then the corresponding baselines for betting would be computed, and would contribute 10% of the total weight for actions made with a weak hand. The pertinent regions

18 Although the fold parameters and betting parameters could be varied independently, the number of runs would grow multiplicatively, without much benefit. For the same reason, we keep the parameter settings for each of the four betting rounds consistent, and apply the same settings for the actions of both players, rather than mixing them. The intention is to aim for a range of distinct perspectives over a span of reasonable styles, rather than an exhaustive enumeration of many possibilities.
19 The weight of each run need not be equal. Since the median value is deemed to be the most reasonable, the weights for five runs could be 1-2-3-2-1 instead of 1-1-1-1-1; or could follow the binomial distribution 1-4-6-4-1 to reflect a Gaussian dispersion of styles over the specified range.

of the strategy space are not continuous (unlike smoothing, which was done over a continuous range of values), but that is of no consequence for determining the average EV outcome for a given situation. Thus, knowing an equilibrium strategy provides us with an appropriate way to compute a weighted average of possible actions (or action sequences).

We have not attempted to use a full quasi-equilibrium strategy to obtain a collection of DIVAT baselines to be averaged. However, it remains an interesting possibility for future work.
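The smoothing calculation itself is a simple weighted average. A minimal sketch follows, using the borderline bet-Call / bet-Raise-call example from above and the illustrative 1-2-3-2-1 weighting of footnote 19; the per-run values shown are hypothetical.

def smoothed_baseline(values, weights=None):
    # Weighted average of a per-run quantity (e.g., baseline bets invested).
    if weights is None:
        weights = [1] * len(values)
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Four runs give a bet-Call baseline (2.0 bets), one gives bet-Raise-call (4.0 bets),
# so the equally weighted sweep yields the 2.4-bet average quoted in the text.
print(smoothed_baseline([2.0, 2.0, 2.0, 2.0, 4.0]))                            # 2.4
# The same five runs (aggressive to conservative) under a 1-2-3-2-1 weighting.
print(round(smoothed_baseline([4.0, 2.0, 2.0, 2.0, 2.0], [1, 2, 3, 2, 1]), 2))  # 2.22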

5.4 Experiments

The Ignorant Value Assessment Tool is based on relatively simple methods for hand strength measurement, betting policies, and folding policies. During development, each method was empirically tested and refined iteratively, since the components are mutually dependent to some degree. For example, the folding policy was tuned to be consistent with the strengths and liabilities of the simplistic IHR and 7cHR hand strength metrics.

In the case of 7cHR, it was observed that reverse implied odds were not being given enough consideration, resulting in folding policies that were too liberal when cards were still to be dealt. The correction was to add an offset to the game-theoretic threshold value.20 The value of this offset was determined empirically using roll-out simulations to find the best practical setting for achieving a neutral outcome, thereby estimating the game-theoretic equilibrium point. The experimentally derived offset was consistent with the predicted offset based on expert calculations and intuition.

Each series of experiments was repeated on at least seven different matches, involving a wide range of playing styles and skill levels. This is necessary because the conditions of the match dictate the frequency and type of situations that will arise. The computer players ranged from simple rule-based algorithms to the strongest known game-theoretic and adaptive programs. The human players ranged from intermediate strength to world-class experts.

20 Actually, the linear adjustment involved both a multiplier and an offset (y = mx + b), but a multiplier of 1.0 was found to be sufficient, leaving only the constant offset as a correction factor.

[Figure: two plots of Bankroll (Small Bets) versus Games, 0 to 100,000, with ±6.856*x/sqrt(x) guide-curves; measured win rates of 0.0260 sb/g and 0.02954 sb/g for the two matches.]

Figure 5.3: High variance in two ALWAYS CALL vs ALWAYS RAISE matches.

To make the presentation of experimental results easier to follow, we will limit ourselves to only five types of matches. The first type is between two extremely

simple players: ALWAYS CALL and ALWAYS RAISE. Although both algorithms are extremely weak as poker players, there are several advantages to studying this contest, because it holds many variables constant. The exact betting sequence is

always known: if ALWAYS RAISE is the first player, the betting sequence is bet-Call

(bC) for every round; if ALWAYS CALL is the first player, the betting sequence is check-Bet-call (kBc) for every round. There is never a fold, so every betting round is involved in every game. The final pot size is always 14 sb, and the net outcome for one player is either +7 sb, -7 sb, or zero (a tie will occur approximately 4.06% of the time). The long-term expected value for each player is exactly zero, since neither player exhibits superior skill. The outcome of each game is similar to coin-flipping, with occasional ties.

The natural variance can easily be determined for these conditions. Simulation experiments were run for 25 million games, showing a variance of 47.010, for a standard deviation of ±6.856 small bets per game (sb/g). The empirical variance for each 100,000-game match agreed, also showing ±6.856 sb/g. This gives us a frame of reference for the natural fluctuations in the amount won or lost during the match (the “money line”, labeled as “Bankroll”). The ±6.856*x/sqrt(x) guide-curves indicate the one standard deviation boundary on the total outcome after x games. The 95% confidence interval would correspond to roughly two standard deviations. The variance in real matches strongly depends on the styles of the two players involved, but ±6 sb/g is typical.21 When complete data is available, we use the actual measured variance during the match for more accurate guide-curves.
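The quoted variance can be verified directly from the outcome distribution of this degenerate match. A small sketch follows; the only inputs are the ±7 sb outcomes and the 4.06% tie probability given above.

from math import sqrt

p_tie = 0.0406
variance = (1 - p_tie) * 7**2      # mean outcome is zero, so E[X^2] is the variance
std_dev = sqrt(variance)
print(round(variance, 2), round(std_dev, 3))   # ~47.01 sb^2 per game, ~6.856 sb/g

# The guide-curves bound the total after x games by one standard deviation,
# +-6.856*sqrt(x); after 100,000 games that is roughly +-2170 sb.
print(round(std_dev * sqrt(100_000)))          # ~2168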

Figure 5.3 shows two separate matches of 100,000 games between ALWAYS CALL and ALWAYS RAISE. The money line in these two matches nicely illustrates just how pernicious the problem of variance can be. In the first match, there is every appearance that one player is outplaying the other, by an average of more than +0.025

21 Many games will end early when one player folds, but those frequent small net outcomes (n) are offset by the occasional large outcomes, which carry greater weight toward total (n^2) variance.

small bets per game (sb/g). Although there are some fluctuations along the way, they are small enough to give the impression of a steady increase. This is extremely misleading, since we know that the long-term average is exactly zero. With the aid of the guide-curves, we can see that the money line strays outside the boundary of one standard deviation, but is still easily within the realm of possibility. In other words, this is not a severely abnormal outcome, and it is not statistically significant at the 95% confidence interval.

In the second match (generated with an independent random number generator, due to our own suspicions), we again see a high-variance event with a potentially misleading conclusion.22 Another feature to notice about this second match is the dramatic rise from about -650 sb at 38,000 games, to almost +2000 sb by 48,000 games. During that 10,000-game span (which is much longer than most competitive matches) one player demonstrates an enormous win rate of +0.25 sb/g that is entirely due to random noise.

The second type of match we will examine is a self-play match, with the game-

theoretic equilibrium-based program PSOPTI4 playing both sides. This player is stationary (i.e., uses a strategy that is randomized but static, is oblivious to the opponent, and does no learning or adaptation), so clearly the long-term expectation

is exactly zero. However, unlike the ALWAYS CALL vs ALWAYS RAISE match, the play is realistic – the player folds bad hands, raises good hands, bluffs, and attempts to trap the opponent.

The third match type is between PSOPTI4 and PSOPTI6. The newer PSOPTI6 plays a substantially different style, but loses to the older version in head-to-head competition.23 Both players are static, so learning effects are eliminated. Since we want to compare the DIVAT estimate to the long-term EV, this match and the self-play match were extended to 400,000 games, concatenating the first 100,000-

22 These matches were not selected after the fact – they were the first two matches generated.
23 Note that this does not mean that PSOPTI4 is a better player than PSOPTI6. The result of any one match-up is not necessarily indicative of overall ability. By analogy, consider a pseudo-optimal Rock-Paper-Scissors player that chooses Rock 30% of the time, Paper 40%, and Scissors 30%. The maximum exploitation best response (+0.10 units per game) is achieved by Always Scissors, but that is one of the worst possible strategies in general. Poker results are highly non-transitive (e.g., A beats B beats C beats A) in practice. Nor is there a simple cycle – it is a complex wreath of dominance relationships between styles.

game series above with three new 100,000-game series (each generated with a different random seed). The same series of cards was then used to play a duplicate match, with the players occupying the reverse seats (note that the self-play match is self-duplicate). Thus, 800,000 distinct games were played between PSOPTI4 and

PSOPTI6, with each 400,000-game match being directly comparable to the other. The hand-by-hand results were averaged to obtain the Duplicate Money Average and the Duplicate DIVAT Average lower-variance estimators, for comparison with the single-sided DIVAT.

The fourth type of match we examine here is a man versus machine match between world-class expert “thecount” (Gautam Rao), and the first game-theoretic

equilibrium-based program PSOPTI1. This match was featured in our 2003 paper [3]. The 7030-game contest again featured large swings, which we now know were almost entirely attributable to stochastic noise, with each player having alternating phases of extreme luck.

A match involving real players is significantly different from a match played under the sanitized conditions of the laboratory. Here we witness a clash between radically different approaches to the game. Top-flight experts change their style rapidly and frequently, as the game conditions dictate. The human player avoids being predictable, explores new tactics, and learns over time how best to exploit the opponent. In this match, “thecount” started with a hyper-aggressive style that is highly effective against most human opponents, but was counter-indicated against the program. After shifting to a more patient approach, he enjoyed more success. Much of that was due to fortuitous cards and situations that coincidentally occurred at around the same time; but the DIVAT analysis is able to extract signal from the wash of noise, revealing the overall learning curve and the positive impact of that major shift in style.

The final match we will examine is between PSOPTI4 and the adaptive program

VEXBOT. VEXBOT is the strongest poker program to date, having defeated every computer opponent it has faced, and often provides a serious threat to top-flight

human players [2, 5]. However, the learning systems embedded in the VEXBOT architecture are slow and imperfect. They often require many thousands of games

to train, and frequently lead to local minima that are well below the maximum exploitation level of the given opponent. The DIVAT analysis reveals much more about the changes of VEXBOT over time, and again shows how misleading the basic money line can be.

The experiments shown in this section address: (1) the correctness and variance-reduction properties of DIVAT, (2) the ability to reveal learning curves and changes in style for non-stationary players, (3) the robustness of the DIVAT Difference line, and (4) the usefulness of the round-by-round analysis to gain extra insights into match results. Many other experiments were conducted during the development of the DIVAT system. In the interest of brevity and focus, we do not show experimental results pertaining to: (1) system components and parameter tuning (other than robustness), (2) the significant statistical bias of the (perfect information) EVAT view, (3) other reduced-variance estimators arising out of the DIVAT analysis (such as Money minus Dcash), (4) asymmetric assignments of credit and blame to each player (such as RedGreen points), (5) comparison of ROE and AIE formulations of DIVAT (ROE DIVAT is used throughout), and (6) smoothing by averaging over a range of parameter settings. Some of these experiments may be addressed in future updates of the full technical report [6], and the M.Sc. thesis of Morgan Kan [9].

5.4.1 Correctness and Variance Reduction

We now present a series of experiments to verify empirically that the DIVAT Difference is an unbiased estimate of the long-term expected value difference between two players. We will measure the overall reduction in variance in each case.

5.4.1.1 ALWAYS CALL versus ALWAYS RAISE Match

Figure 5.4 shows the ROE DIVAT Difference line for the two 100,000-game matches

between ALWAYS CALL and ALWAYS RAISE. In each case, the DIVAT Difference line hovers near the zero line, showing much better accuracy than the money line, and showing no apparent bias in favour of one player over the other. The standard deviation for the money line is ±6.856 sb/g in each case (as expected), whereas the measured standard deviations for the DIVAT Difference lines are ±2.934 sb/g and

[Figure: two plots of Bankroll and DIVAT Difference (Small Bets) versus Games, 0 to 100,000, with ±6.856*x/sqrt(x) guide-curves; Bankroll win rates 0.0260 and 0.0295 sb/g, DIVAT Difference 0.0009 and 0.0010 sb/g.]

Figure 5.4: DIVAT Difference analysis of two ALWAYS CALL vs ALWAYS RAISE matches.

[Figure: plot of Bankroll (-0.006878 sb/g), DIVAT Difference (0.000045 sb/g), and Dcash (-0.006287 sb/g) in Small Bets versus Games, 0 to 400,000, with ±6.85624*x/sqrt(x) guide-curves.]

Figure 5.5: DIVAT Difference analysis and DIVAT Cash baseline for an extended ALWAYS CALL vs ALWAYS RAISE match.

±2.917 sb/g, respectively, for an overall reduction in variance by a factor of 5.50.

Figure 5.5 shows the results after extending the first match for an additional 300,000 games. Following the steady climb over the first 100,000 games, the money line displays a natural regression toward the mean, but later shows a relentless down-swing, reaching about -1.4 standard deviations. This graphically demonstrates that the full statistical interval is indeed used over the course of normal stochastic events.

There appears to be a positive correlation between the money line and the DIVAT Difference line (particularly over the last half of the second 100,000-game match, and in the middle regions of the 400,000-game series). This is expected, because the two measures are not completely independent. In general, strong hands are easier to play correctly than weak hands, because the EV consequences of an incorrect raise (or lack thereof) are generally smaller than the EV consequences of an incorrect fold (or lack thereof).

In this match, ALWAYS RAISE makes frequent errors by betting instead of

checking with weak hands. The ALWAYS CALL player makes frequent errors by

checking instead of betting strong hands (in first position), but those errors are effectively forgiven when the opponent bets. The ALWAYS CALL errors of calling instead of raising with especially strong hands are distinct, as are the large EV errors of calling instead of folding with very weak hands. We know that the weighted sum of all misplays will exactly balance out, because the overall EV difference between these two players is zero.

Also in Figure 5.5, we show the DIVAT Cash (Dcash) baseline, which is the amount that would be won or lost if both players followed the DIVAT baseline policy for the entire game. Since Dcash is a reasonable estimate of the normal outcomes, we can see that Money minus Dcash would be a decent lower-variance estimator of the difference in skill, and is certainly much better than the money line alone. However, this estimate has a higher variance than the DIVAT Difference, is more strongly dependent on the stochastic luck of the card outcomes, and can only be applied to complete games. The Dcash line is also biased in favour of styles that are similar to the (overly honest) DIVAT baseline, whereas the ROE DIVAT Difference is a provably unbiased estimator.

The identical sequence of 400,000 deals was used for subsequent experiments, to eliminate sampling differences. We include the Dcash line in those graphs to provide a common frame of reference.

5.4.1.2 PSOPTI4 Self-play Match

Figure 5.6 shows the results for the PSOPTI4 self-play match. Here the Bankroll line represents the actual amount of money won or lost (in sb) when this particular player plays both sides of each hand. We can observe that the self-play money line is much closer to the Dcash line than it was for the essentially random ALWAYS CALL vs ALWAYS RAISE match, because the play is much more realistic. Nevertheless, the final Bankroll difference is about 35% greater in magnitude. This is also to be expected, because the baseline reflects a highly conservative “boring” style of play.

PSOPTI4 has a relatively low-variance style itself, as evidenced by the measured standard deviation of “only” ±4.691 sb/g. The measured standard deviation of the Money minus Dcash estimator is ±2.825 sb/g in this match. The measured standard

[Figure: plot of Bankroll (-0.010135 sb/g), DIVAT Difference (-0.000911 sb/g), and Dcash (-0.006502 sb/g) in Small Bets versus Games, 0 to 400,000, with ±4.69072*x/sqrt(x) guide-curves.]

Figure 5.6: DIVAT Difference analysis of PSOPTI4 self-play match.

deviation of the DIVAT Difference is ±1.863 sb/g, which is a 6.34-fold reduction in variance from the self-play money line.24

In practical terms, this means that routine 40,000-game matches could be replaced with 6000-game matches having comparable variance. More to the point, the same 40,000-game matches can discern between two players who are nearly three times closer to each other in skill level, instead of requiring matches of more than a quarter million games. With 95% confidence, a 40,000-game match distinguishes differences of roughly 0.06 sb/g, which represents a substantial difference in skill (such as a professional player against a mediocre player). Being able to discern a difference of 0.02 sb/g (such as the difference between a good player and a very good player) is much more useful in practice.
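The equivalent-match-length claims follow from the ratio of the variances. A rough check, using the measured standard deviations quoted above (the precise results depend on exactly which deviations are used):

money_sd, divat_sd = 4.691, 1.863   # measured sb/g for this self-play match

# Games needed scale with the variance, so a 40,000-game money-line match has
# roughly the resolving power of a much shorter DIVAT match...
print(round(40_000 * (divat_sd / money_sd) ** 2))   # ~6300 games
# ...while matching the DIVAT resolution with the money line alone would take
# roughly a quarter-million games.
print(round(40_000 * (money_sd / divat_sd) ** 2))   # ~253,600 games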

[Figure, panel (a): PSOPTI4 (North) vs PSOPTI6 (South) – Bankroll (0.103685 sb/g), DIVAT Difference (0.110333 sb/g), and Dcash (-0.0067805 sb/g) in Small Bets versus Games, 0 to 400,000, with ±4.74954*x/sqrt(x) guide-curves.]
[Figure, panel (b): PSOPTI6 (North) vs PSOPTI4 (South) – Bankroll (-0.119840 sb/g), DIVAT Difference (-0.112693 sb/g), and Dcash (-0.009046 sb/g), with ±4.74100*x/sqrt(x) guide-curves.]

Figure 5.7: DIVAT Difference analysis of PSOPTI4 vs PSOPTI6 duplicate match.

5.4.1.3 PSOPTI4 versus PSOPTI6 Duplicate Match

Figure 5.7 shows the results for the duplicate match between PSOPTI4 and PSOPTI6 over the same 400,000 deals. All graphs are shown from the perspective of the

player in the North seat,25 so PSOPTI4 wins both sides of this duel by a convincing margin of +0.112 sb/g. In both passes, the money line dips below the DIVAT Difference line at about the 300,000-game mark, because of the huge swing of luck in South’s favour at that stage in the match. The Bankroll difference is not a very accurate measure of skill difference even

after 400,000 games. On the North side of the cards PSOPTI4 wins at +0.104 sb/g, compared to +0.120 on the (stronger) South side; whereas the DIVAT estimate is within ±0.0012 sb/g in either case. Using the Dcash line as a correction of the money line would considerably improve the accuracy. However, the DIVAT Difference is strictly more powerful for predicting future outcomes.

The measured standard deviations for Bankroll are ±4.750 and ±4.747 sb/g respectively. For Money minus Dcash they are ±2.948 and ±2.945 sb/g. For the DIVAT Difference, they are ±1.929 and ±1.932 sb/g respectively, for an overall reduction in variance by a factor of 6.05. Note that the measurements for the two halves of this duplicate match are very consistent, because 400,000 games is sufficiently large for the number and variety of opportunities to be reasonably well balanced. Over a shorter duration, one side could enjoy more good opportunities, while facing relatively few difficult situations.

Figure 5.8 shows the results after combining the outcomes of each North and South pair of duplicate games. Both the Duplicate Money Average and the Duplicate DIVAT Average are good low-variance predictors of future outcomes, and match each other almost exactly over the long term.26 The measured standard

24 The DIVAT Difference line does not cling tightly to the zero line in this graph, but the meaningful indicator of EV correctness and low variance is simply the levelness (i.e., horizontal flatness) of the line. Once it drifts away from zero, it should persist at the same constant difference. Obviously the DIVAT Difference line is much flatter than the money line overall.
25 The North seat still alternates between first and second position (large and small blind). By convention, the first named player is North, having the big blind on all odd numbered hands and the small blind on all even numbered hands.
26 The duplicate Dcash line shown in the figure is not exactly zero because of the way it is computed. In cases where a player folds, the roll-out equity is computed by applying the DIVAT

[Figure 5.8 plot: cumulative Small Bets versus Games (0 to 400,000). Bankroll (0.111763 sb/g), DIVAT Diff. (0.111513 sb/g), Dcash (0.00113393 sb/g), guide-curves +-1.64255*x/sqrt(x).]

Figure 5.8: Duplicate Money Average and Duplicate DIVAT Average.

The measured standard deviation for Duplicate Money Average is 1.643, for a reduction in the normal variance between these two (fairly conservative) players by a factor of 8.36. The measured standard deviation for Duplicate DIVAT Average is 1.178, for a 16.25-fold reduction in variance, making it the lowest variance estimator we currently have.

Figure 5.9 zooms in on the first 1000 games of the pairwise duplicate games in Figure 5.8. Here, we can see the limits of the discrimination power for each metric. The guide-curves show the one standard deviation bound for the Duplicate Money Average (at 1.643), and the narrower bound for the Duplicate DIVAT Average (at 1.178). After 200 games, neither technique has detected a difference in skill between

the players, with a net difference close to zero. It is fair to say that PSOPTI6 had some “luck”, in that the most telling skill differences were not yet exposed during that phase.

After 300 games, the Duplicate DIVAT Average is beginning to favour PSOPTI4

baseline to all future chance outcomes, and taking the average. Thus, if one player folds while the other continues with the hand, their Dcash lines will usually be slightly different.

[Figure 5.9 plot: cumulative Small Bets versus Games (0 to 1000). Duplicate Bankroll Average, Duplicate DIVAT Average, and Duplicate Dcash Average, with guide-curves +-1.642545*x/sqrt(x) and +-1.177849*x/sqrt(x).]

Figure 5.9: Duplicate Money Average and Duplicate DIVAT Average over the first 1000 games.

(about +1.5 standard deviations), whereas the Duplicate Money Average is still inconclusive. There is a sharp increase in both metrics between 300 and 400 games,

where PSOPTI4 is able to exhibit its superior skill quite a bit faster than usual. The non-zero duplicate Dcash line gives us a hint that some of that difference may be

due to the folding behaviors of the two players (either PSOPTI6 folds too easily to bluffs, or does not fold correctly when it is behind). After 400 games, the Duplicate DIVAT Average is at about +3.4 standard de- viations, and would conclude with more than 99% confidence that PSOPTI4 is the

better player heads-up against PSOPTI6. The Duplicate Money Average cannot make the same conclusion even at the 95% confidence level. The same statements are still true after 1000 games. This example happened to be quite favourable to the DIVAT method. In gen- eral, to reach the 95% confidence interval for this highly unbalanced (+0.112 sb/g) contest, the Duplicate Money Average would require about 870 games, while the Duplicate DIVAT Average would need about 450 games. For a live match, the nor- mal single-sided DIVAT Difference would require about 1200 games, whereas the

simple Bankroll difference would need about 7260 games on average.

A natural question is whether the two approaches are somewhat orthogonal to each other, and could be combined for an even sharper resolution. Unfortunately, we can easily deduce that there must be a substantial amount of overlap between the two techniques. As mentioned previously, the single-sided DIVAT baseline is especially useful for discerning the “normal” outcome when a strong second-best hand loses many bets to an even stronger hand. The Duplicate Money Average achieves a similar neutralization, because both players are given the opportunity to play the weak side and strong side of that situation. Moreover, if one player gained on the DIVAT scale by losing less on the weaker side, that superior skill could also be reflected in the Duplicate Money Average by showing a net profit over the pair of duplicate games. In the case of a successful or unsuccessful bluff, we can see that the player is given full credit or full blame with either the DIVAT measure or the duplicate result.

For a pair of duplicate games, noise is introduced into the result after the actions of the two players diverge. In particular, when one player folds while the other continues with the hand, all chance outcomes from that point forward are subject to the usual effects of stochastic noise.27 Directly measuring the loss or gain of equity from each decision yields a more stable estimate.
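For concreteness, the following is a minimal sketch of how the duplicate estimators above can be formed from per-seat results; the array names are hypothetical, and the per-pair averaging is the only substantive step.

```python
import numpy as np

def duplicate_average(pass1_results, pass2_results):
    """Per-pair duplicate average for one player: their result (in sb) on each
    deal in the first pass, averaged with their result on the same deal
    replayed with the seats (and hence the cards) reversed."""
    a = np.asarray(pass1_results, dtype=float)
    b = np.asarray(pass2_results, dtype=float)
    pairs = (a + b) / 2.0
    # Win rate (sb/g) and per-pair standard deviation of the duplicate estimator.
    return pairs.mean(), pairs.std(ddof=1)
```

The same routine applies whether the per-deal values are money outcomes or DIVAT differences; only the input series changes.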

5.4.2 Learning Curves for Non-stationary Players

We now show how the DIVAT analysis can provide powerful insights into the changing behavior of non-stationary players, which includes virtually all strong human players, and the most advanced programs.

5.4.2.1 The 2003 “thecount” versus PSOPTI1 Match

Figure 5.10 compares the money line to the ROE DIVAT Difference line for the

2003 match between “thecount” and PSOPTI1.28 We observe a similar improvement from 5.756 sb/g standard deviation for the money line to 2.109 sb/g for the DIVAT Difference, for a 7.45-fold reduction in variance. The Money minus Dcash estimate had a standard deviation of 3.417 sb/g.

27 We see a similar source of variance in the Money minus Dcash estimator, whenever the assumed fold policy differs from actual events.
28 Note that a complete-knowledge log is required for this analysis, including all cards folded by either player. The match was played on our online poker server, which maintains complete-knowledge logs of all games played.

[Figure 5.10 plot: cumulative Small Bets versus Games (0 to 8000). Bankroll (0.0482 sb/g), DIVAT Diff. (0.01091 sb/g), guide-curves +-5.756*x/sqrt(x).]

Figure 5.10: DIVAT Difference analysis of “thecount” vs PSOPTI1.

Overall, the DIVAT Difference line indicates that “thecount” exhibited a small advantage against PSOPTI1 over the course of this match. However, it also suggests that the actual win rate should have been less than one quarter of what the money line indicates. In view of the high noise element, and given the estimated difference of only +0.01 sb/g, it is not difficult to imagine that the final result could have been dominated by short-term luck. If we trust the DIVAT analysis, it tells a very different story of how the match

progressed. First, it appears that PSOPTI1 held a slight advantage in play early in the match. This is consistent with the comments “thecount” made after the match. He used an extremely aggressive style, which is effective against most human op- ponents (who can be intimidated), but turned out to be counter-indicated against

this program. PSOPTI1 won many games during that phase by calling to the end with relatively weak hands, frequently beating a bluff. The human expert then went on an extended winning streak, but the DIVAT

line suggests that the play was actually close to break-even during that phase. The dramatic collapse that occurred before game 3000 was almost entirely due to bad luck. However, that turnaround did cause “thecount” to stop playing, and do a complete reassessment of the opponent. He changed tactics at this point of the match, toward a more conservative style. The DIVAT analysis indicates that this was a good decision, despite the fact that he continued to lose due to bad luck. Then the cards broke in favour of the human again, but his true advantage does not appear to have changed by much. Toward the end of the match, the two players appear to be playing roughly on par with each other.

Regardless of whether this is a perfectly accurate synopsis of the true long-term expected values, one point is irrefutable: that it is almost impossible to discern the truth from the money line alone.

Interestingly, the DIVAT Difference line also appears to reveal the learning curve of “thecount” as the match progresses. The general upward bend of the DIVAT win rate suggests that the human expert was continuously adapting to the pro-

gram’s style, and learning how to exploit certain weaknesses. PSOPTI1 is not an easy opponent to learn against, because it employs a healthy mixture of deceptive plays (bluffing and trapping). Nevertheless, it is a static strategy that is oblivious to the opponent, and is vulnerable to systematic probing and increasing exploitation rates over time.29 The exposition of learning curves is one of several unplanned bonuses from the DIVAT analysis technique.
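One simple way to expose such a learning curve is a sliding-window win rate over the per-game DIVAT differences. A minimal sketch follows; the window size is an arbitrary choice, not a value used in the thesis.

```python
import numpy as np

def sliding_win_rate(per_game_divat, window=500):
    """Moving-average win rate (sb/g) over the most recent `window` games,
    which makes a gradual upward trend easier to see than the cumulative
    total alone."""
    x = np.asarray(per_game_divat, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")
```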

5.4.2.2 The VEXBOT versus PSOPTI4 Match

Figure 5.11 shows the DIVAT Difference line for the match between VEXBOT and

PSOPTI4. The measured standard deviation is 5.611 sb/g for the money line, 3.671 sb/g for Money minus Dcash, and 2.696 sb/g for the DIVAT Difference, reducing variance by a factor of 4.33. In this experiment, the adaptive program took a particularly long time to find a winning counter-strategy, and the strategy it finally adopted secured only a modest

29 As a point of reference, the author’s win rate was approximately +0.3 sb/g against PSOPTI1, after extensive experience. Against subsequent pseudo-optimal computer opponents, the author’s win rate has been in excess of +0.4 sb/g.

[Figure 5.11 plot: cumulative Small Bets versus Games (0 to 300,000). Bankroll (0.0122062 sb/g), DIVAT Diff. (0.03116 sb/g), guide-curves +-5.61132*x/sqrt(x).]

Figure 5.11: DIVAT Difference analysis of VEXBOT vs PSOPTI4.

win rate. In other runs, VEXBOT discovered a much more effective exploitation far more quickly (often after only a few thousand hands).30 After 300,000 games, the money line is inconclusive (a little over one standard deviation), and close to meaningless (having come from minus one standard deviation, which could be due to normal drift, as we have seen in previous matches). In

contrast, the DIVAT analysis is quite certain that VEXBOT is exhibiting an advan-

tage. Moreover, the DIVAT line indicates that VEXBOT found the counter-strategy after about 80,000 games, whereas the money line does not start to run parallel until about 130,000 games. The 50,000-game lag phase is yet another demonstration of the misleading nature of high-variance stochastic outcomes.

Figure 5.12 ignores the early learning phase of VEXBOT by removing the first 130,000 games of the match. This re-calibrated view overlays the two lines, and shows that the win rate was over +0.03 sb/g from that point forward, while the

30 This illustrates another limitation of the duplicate match system. VEXBOT is capable of dis- playing many different personalities, from wildly aggressive and over-optimistic to passive and sullen. Since the sequence of cards on the opposing side provides a very different learning ex- perience, there is no way of predicting which VEXBOT will show up in each run, thus nullifying some of the beneficial effects of opportunity equalization.

[Figure 5.12 plot: cumulative Small Bets versus Games for the remaining portion of the match. Bankroll (0.031347 sb/g), DIVAT Diff. (0.04109 sb/g), guide-curves +-5.59114*x/sqrt(x).]

Figure 5.12: DIVAT Difference analysis over the last 170,000 games.

DIVAT total was over +0.04 sb/g.31 Furthermore, there is evidence of a reversal at

about 80,000 games (210,000 games in total), where VEXBOT appears to slip into

a less effective counter-strategy.32 VEXBOT continues to learn throughout a match, albeit at a decreasing rate because of “inertia” from the full match history. We can

see from the slope of the DIVAT line that VEXBOT was winning at about +0.05 sb/g during the middle stages, but fell off to about +0.033 sb/g over the final 90,000 games.

5.4.3 Robustness of DIVAT Difference

Over the course of many experiments, it was observed that even radical changes to the underlying policies and parameters had relatively little effect on the DIVAT Difference line. This indicates that it is a robust measurement – it is not overly sensitive to the exact construction of each component, nor to the precise settings of

31 Note that we have committed the statistical crime of selecting our endpoints, but only to serve an illustrative purpose.
32 As pointed out by Martin Müller, this speculation could be verified by taking snapshots of VEXBOT’s complete strategy at various points in the match, and running them against the static PSOPTI strategy to determine the actual win rate.

parameters.

Figure 5.13(a) shows three runs of DIVAT, holding all parameters constant except for the fold policy on the turn. These are varied to reflect a loose style (calling with rather weak hands that may or may not have any chance of winning), a tight style (surrendering in all borderline cases), and a normal well-balanced style between the two extremes. Simulations of 100,000 complete roll-out turn scenarios indicate that each increment increases the fold frequency by approximately 6.9% absolute (e.g., for a pot-size of 4 sb: from 36.3% folds for tight, to 29.3% folds for normal, to 22.5% folds for loose).

Figure 5.13(b) shows three runs of DIVAT, holding all parameters constant except for the betting and raising thresholds on the turn. These are varied to reflect an aggressive style (having rather low standards for betting and raising), a conservative style (requiring a very solid holding), and a normal well-balanced style between the two extremes. Simulations of 100,000 complete roll-out turn scenarios indicate that each increment decreases the bet frequency by approximately 7.2% absolute (e.g., from 45.0% betting for aggressive, to 37.2% betting for normal, to 30.5% betting for conservative).

The lines are all close together, so clearly these settings did not have a significant impact on the overall DIVAT Difference in this match. This is not too surprising, in view of how the metric is being used. Even if the absolute measurements are somewhat inaccurate in certain cases, the same objective standards are always being applied equally to both players. To put it more colloquially, even a “crooked measuring stick” can be used to compare the relative lengths of two objects (and using a hockey stick or a driftwood walking stick could work equally well for that particular purpose). If certain kinds of inaccuracies are common, they will tend to cancel each other out fairly quickly. Relatively infrequent opportunities might not balance out as quickly, but those sources of inaccuracy might not occur at all over the short term, or might not have much impact on the overall score in any case.

Similar robustness results were observed for broad ranges of all parameters, over a large variety of match conditions. Even under extreme settings, the overall assessment appears to degrade gracefully.

[Figure 5.13 plots: cumulative Small Bets versus Games (0 to 8000). (a) Varying DIVAT Turn Fold Offsets (+0.05, +0.10, +0.15): Bankroll (0.0482 sb/g), DIVAT Diff. loose (0.0105 sb/g), DIVAT Diff. moderate (0.0109 sb/g), DIVAT Diff. tight (0.0134 sb/g). (b) Varying DIVAT Turn Bet Thresholds (Make1 0.52, 0.58, 0.64): Bankroll (0.0482 sb/g), DIVAT Diff. aggressive (0.0119 sb/g), DIVAT Diff. moderate (0.0109 sb/g), DIVAT Diff. conservative (0.0118 sb/g).]

Figure 5.13: Varying DIVAT parameters.

For example, there is relatively little change in the DIVAT Difference line until using fold policies that are obviously much too liberal, or much too restrictive (well outside the normal behavior of strong players). In some cases, the settings might be inconsistent with the actual styles of the players involved, such as very aggressive betting parameters for a match between very conservative players, but this does not appear to lead to serious consequences or suspicious evaluations.

As previously mentioned, it has been shown that roll-out equity DIVAT is a statistically unbiased estimator of long-term expected value. Although this is certainly a desirable theoretical property, it is not absolutely essential for the assessment tool to have practical value. We have seen that stochastic noise can have disastrous consequences on evaluation, occluding the true averages even for very long matches. In practice, an assessment technique that has a lot of variance reduction power can be highly valuable, even if it cannot be formally proven to be unbiased. By design, the DIVAT Difference is largely insensitive to the fluctuations incurred from stochastic outcomes, imbuing it with considerable power for variance reduction, regardless of the precise formulation.

5.4.4 Round-By-Round DIVAT Analysis

The DIVAT system is designed for round-by-round analysis of each game, with the sum of all rounds characterizing the game as a whole. However, viewing the cumulative results for each round independently can be very enlightening. Figure 5.14 shows the breakdown of DIVAT Differences for the pre-flop, flop,

turn, and river in the PSOPTI4 vs PSOPTI6 duplicate match. The difference in pre- flop skill is shown to be minuscule, which is correct because both programs use the same (randomized) pre-flop expert system. The remaining rounds all contribute

roughly equal portions to the total DIVAT Difference, meaning that PSOPTI4 con-

sistently outplayed PSOPTI6 in every phase of the game. The EV magnitude of decisions on the flop is generally much smaller than decisions in the last two rounds, because (1) the bet size is half as much (so the pot odds are roughly double), (2) pot equities are usually closer to 50%, meaning less EV is at stake, and (3) large implied odds further equalize the true equities. On the other hand, the flop occurs more frequently than the later rounds.33 Similarly, decisions on the river usually count for a full big bet in equity, which is somewhat larger than decisions on the turn, but they are also somewhat less frequent.
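A minimal sketch of the per-round bookkeeping behind this breakdown, assuming each game's DIVAT difference is already available for each betting round (the field names are hypothetical):

```python
from collections import defaultdict

ROUNDS = ("preflop", "flop", "turn", "river")

def round_by_round_totals(games):
    """Accumulate the DIVAT difference separately for each betting round.
    Each game is a dict mapping round name -> DIVAT difference (sb); the
    four per-round totals sum to the full-game DIVAT Difference."""
    totals = defaultdict(float)
    for game in games:
        for rnd in ROUNDS:
            totals[rnd] += game.get(rnd, 0.0)
    return dict(totals)
```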

[Figure 5.14 plot: cumulative Small Bets versus Games (0 to 400,000). (a) PSOPTI4 (North) vs PSOPTI6 (South): Bankroll (0.103685 sb/g), DIVAT Diff. (0.110333 sb/g), DIVAT Diff. Pre-flop (0.000016 sb/g), DIVAT Diff. Flop (0.0448159 sb/g), DIVAT Diff. Turn (0.034741 sb/g), DIVAT Diff. River (0.03076 sb/g), guide-curves +-4.74954*x/sqrt(x). (b) PSOPTI6 (North) vs PSOPTI4 (South): Bankroll (-0.119840 sb/g), DIVAT Diff. (-0.112693 sb/g), DIVAT Diff. Pre-flop (-0.000038 sb/g), DIVAT Diff. Flop (-0.045416 sb/g), DIVAT Diff. Turn (-0.035107 sb/g), DIVAT Diff. River (-0.032208 sb/g), guide-curves +-4.74100*x/sqrt(x).]

Figure 5.14: Round-by-round analysis of PSOPTI4 vs PSOPTI6 duplicate match.

[Figure 5.15 plot: cumulative Small Bets versus Games (0 to 8000). Bankroll (0.0482 sb/g), DIVAT Diff. (0.01091 sb/g), DIVAT Diff. Pre-flop (-0.0803 sb/g), DIVAT Diff. Flop (0.0742 sb/g), DIVAT Diff. Turn (0.0134 sb/g), DIVAT Diff. River (0.0037 sb/g).]

Figure 5.15: Round-by-round analysis of “thecount” vs PSOPTI1.

Figure 5.15 shows the round-by-round breakdown for the match between “thecount” and PSOPTI1, with some fascinating revelations. Of immediate note is the pre-flop DIVAT Difference line, which indicates that “thecount” was being badly outplayed before the flop. The loss rate of -0.08 sb/g is substantial, exceeding the standard per game win rate for most professional players. This is quite surprising, because the EV magnitude of pre-flop decisions is normally very small,34 making it the least important round in two-player Limit Hold’em.

In reviewing the match log, the reason for the difference became apparent: “thecount” was folding too frequently before the flop (about 16% of the time). By refusing to fight with weak hands, he was sacrificing the 0.5 sb small blind (or 1.0 sb

33 The graphs show the total (absolute) effect of each round, rather than the average (relative) magnitude of the differences.
34 Most of the time one player is no more than a 60-40 favourite, and the small fraction of a bet in EV advantage is further diminished by the effects of implied odds.

big blind) too often, incurring a large net loss. With this kind of objective analysis, even good players can identify weaknesses or “leaks” in their strategy, and make appropriate adjustments.

On the flop betting round, the situation is reversed, with “thecount” holding a large edge in play. Although it has not been verified, it is possible that PSOPTI1 was surrendering excessive amounts of equity by folding too easily after the flop. It is also possible that “thecount” was exhibiting a superior understanding of certain strategic aspects, such as the effects of implied odds. There could also be some compensation for his pre-flop selectivity, in that he is more likely to have the better hand when both players connect with the flop. Whatever the reasons, the DIVAT analysis reveals a huge disparity in play, thereby identifying an area of weakness for researchers to investigate.

On the turn and river rounds, “thecount” maintained a small DIVAT advantage. Given the styles of the two players (folding many hands on the pre-flop and on the flop), the turn and river rounds occurred less often than usual, thus having a smaller effect on the final outcome. Again of interest is the fact that “thecount” appeared to improve his results on the turn as the match progressed, as he learned more about the opponent’s predictable style and weaknesses. In comparison, he did not appear to find additional ways to exploit the opponent on the river round. The full-game DIVAT summary simply indicates a slight playing advantage for “thecount”; but the round-by-round analysis goes well beyond that, providing many additional insights into the reasons for the differences in equity.

Figure 5.16 shows the round-by-round breakdown for the match between VEXBOT

and PSOPTI4, again with some surprising insights. VEXBOT held a negligible ad- vantage from play on the turn, and was actually losing equity on the pre-flop and (especially) on the flop. However, those losses were more than offset by huge equity gains on the river. To an expert observing the match, the meaning of this in terms of poker strat-

egy is clear. VEXBOT was employing a hyper-aggressive “fast” style on the flop, putting in many raises and re-raises. This creates a false impression of strength, at a moderate expense in terms of EV, because the relative differences in pot equity on the flop are small.

[Figure 5.16 plot: cumulative Small Bets versus Games (0 to 300,000). Bankroll (0.0122062 sb/g), DIVAT Diff. (0.03116 sb/g), DIVAT Diff. Pre-flop (-0.01066 sb/g), DIVAT Diff. Flop (-0.03217 sb/g), DIVAT Diff. Turn (0.00369 sb/g), DIVAT Diff. River (0.07031 sb/g), guide-curves +-5.61132*x/sqrt(x).]

Figure 5.16: Round-by-round analysis of VEXBOT vs PSOPTI4.

That false impression was then exploited with follow-up bluffs on the river, when PSOPTI4 folded too often based on its implicit beliefs about the strength of the opponent’s hand. A casual observer might conclude that VEXBOT is a “maniac” based on its play on the flop, but the DIVAT analysis clearly shows that there is a method to its madness.

Moreover, it is evident that VEXBOT discovered this imbalance in the strategy

of PSOPTI4 very early in the match, since the river DIVAT score maintains the same slope from the beginning. The change at 80,000 games was in fact due to a

shift in style on the flop, where VEXBOT eventually learned that it did not need to put in as many extra raises to set-up the same exploitative plays on the river.

5.5 Conclusions

Stochasticity is a major impediment to the accurate assessment of player skill in poker. The simple money outcome of a short-term match is highly unreliable – many thousands of games are required to obtain statistically significant results. Some variance-reduction methods are available, such as duplicate tournaments, but

they do not eliminate the problem of assessment in normal live play.

The Ignorant Value Assessment Tool provides an accurate unbiased estimate of the long-term expected value between two players. DIVAT is a practical system that provides a significant reduction in variance, thus extracting signal from the noise. DIVAT uses hindsight analysis to quantify the difference in value between the players’ actual actions and a standard benchmark betting sequence. The comparison sequence is based on game-theoretic invariant frequencies and quasi-equilibrium policies, reflecting an appropriate amount to be invested by each player in the given situation. Although much of the relevant context is largely ignored (the previous rounds of betting in particular), the most important aspects are estimated adequately enough for the system to be highly effective in practice.

DIVAT is versatile in its uses, and promises to be an important tool for all researchers working in the domain of poker. The results from a match can be broken down in many ways and analyzed with DIVAT. For example, the player position can be isolated (separating the games played as the first player from those as the second player). We have also used a variation of the tool, called runtime DIVAT, to provide more meaningful feedback during matches (where special consideration must be given for the biases caused by the partial observability in live games). There are numerous other uses for the DIVAT analysis technique that have not been addressed here, in the interest of brevity.

The first extension to DIVAT will likely be for multi-player games of Limit Hold’em. In principle, this generalization should be much easier than other two-player to multi-player generalizations (such as game-theoretic Nash equilibria, or imperfect information game-tree search approaches). However, there will invariably be some theoretical and practical issues that will need to be resolved.

Developing DIVAT systems for other betting structures (e.g., Pot Limit, No Limit), and other poker variants (e.g., Omaha, 7-card Stud) should be fairly straightforward. Imperfect information games with a known equilibrium strategy can employ similar methods, using a weighted mixture of baselines corresponding to the relative weights of actions in each mixed strategy.

Applying the general methods to other imperfect information domains is feasi-

ble in principle, but might encounter new obstacles due to fundamental differences in the structure of the game trees. For example, games in which the chance nodes and decision nodes are intertwined and not easily separable could be problematic.

Acknowledgments

This research was supported by the Izaak Walton Killam Memorial Scholarship Foundation, the Alberta Heritage Scholarship Fund, the Alberta Informatics Circle of Research Excellence (iCORE), the Alberta Ingenuity Center for Machine Learning (AICML), and the Natural Sciences and Engineering Research Council of Canada (NSERC).

Bibliography

[1] D. Billings. Computer Poker. Master’s thesis, Department of Computing Science, University of Alberta, 1995.
[2] D. Billings. Vexbot wins poker tournament. International Computer Games Association Journal, 26(4):281, 2003.
[3] D. Billings, N. Burch, A. Davidson, T. Schauenberg, R. Holte, J. Schaeffer, and D. Szafron. Approximating game-theoretic optimal strategies for full-scale poker. In The Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, IJCAI’03, pages 661–668, 2003.
[4] D. Billings, A. Davidson, J. Schaeffer, and D. Szafron. The challenge of poker. Artificial Intelligence, 134(1–2):201–240, January 2002.
[5] D. Billings, A. Davidson, T. Schauenberg, N. Burch, M. Bowling, R. Holte, J. Schaeffer, and D. Szafron. Game-tree search with adaptation in stochastic imperfect-information games. In H. J. van den Herik, Y. Björnsson, and N. Netanyahu, editors, Computers and Games: 4th International Conference, CG’04, LNCS 3846, pages 21–34. Springer-Verlag GmbH, 2004.
[6] D. Billings and M. Kan. Development of a tool for the direct assessment of poker decisions. Technical Report TR06-07, University of Alberta Department of Computing Science, April 2006.
[7] I. Frank and D. A. Basin. A theoretical and empirical investigation of search in imperfect information games. Theoretical Computer Science, 252(11):217–256, 2001.
[8] M. Ginsberg. GIB: Imperfect information in a computationally challenging game. Journal of Artificial Intelligence Research, 14:303–358, 2001.
[9] M. Kan. Post-game analysis of poker decisions. Master’s thesis, Department of Computing Science, University of Alberta, 2006. In preparation.
[10] B. Sheppard. Toward Perfection of Scrabble Play. PhD thesis, Computer Science, University of Maastricht, 2002.
[11] B. Sheppard. World-championship-caliber Scrabble. Artificial Intelligence, 134(1–2):241–275, 2002.
[12] D. Sklansky. The Theory of Poker. Two Plus Two Publishing, 1992.
[13] G. Tesauro. Temporal difference learning and TD Gammon. Communications of the ACM, 38(3):58–68, 1995.
[14] G. Tesauro. Programming backgammon using self-teaching neural nets. Artificial Intelligence, 134(1–2):181–199, 2002.
[15] D. Wolfe. Distinguishing gamblers from investors at the blackjack table. In J. Schaeffer, M. Müller, and Y. Björnsson, editors, Computers and Games 2002, LNCS 2883, pages 1–10. Springer-Verlag, 2002.
[16] M. Zinkevich, M. Bowling, N. Bard, M. Kan, and D. Billings. Optimal unbiased estimators for evaluating agent performance. In American Association of Artificial Intelligence National Conference, AAAI’06, pages 573–578, 2006.

Chapter 6

Conclusion

A Retrospective on Computer Poker

6.1 Introduction

In this chapter, we review each of the major approaches we have addressed in this thesis for building a complete poker-playing system: (1) simple deterministic rule-based methods, (2) probabilistic formula-based methods, (3) Monte Carlo and enhanced simulation methods, (4) Nash equilibrium game-theoretic methods, and (5) adaptive imperfect information game-tree search. While many of the strengths of each system have been addressed in the core chapters, some of the most important short-comings did not become fully apparent until much later. We take this opportunity to re-visit some of the main issues in- volved with each architecture, identifying some of the key lessons for developing improved systems in the future.

6.2 Rule-based Methods

A natural first step to building a computer program to play poker is to define a set of conditional if-then-else rules, specifying what action should be taken for each of a large set of situations that can arise. This approach is intuitively reasonable,

being similar to how human experts would describe how they play, on a case-by-case basis. In domains with a small number of distinct cases to be handled, this type of “expert system” might be reasonably effective.

Unfortunately, this approach is dependent upon having a knowledgeable domain expert, who is also conversant with the principles and limitations of computer programming. Apart from this inherent “knowledge acquisition bottleneck”, the approach will normally be found to be inadequate for strategically rich domains, simply due to the vast number of distinct situations and consequent actions that can arise. It could also be argued that such an approach is of limited scientific interest, since there is little thinking or learning involved within the system itself. Nevertheless, the practical limitations of such an approach cannot be fully understood until it has actually been tried and tested.

A deterministic rule-based approach is a set of conditional rules that specify a pure strategy (i.e. exactly one move, or one betting action in poker) for each of the distinguished contexts. This method is strictly more rigid and restrictive than a probabilistic rule-based approach, where each identified context is assigned a randomized mixed strategy (i.e. a probability distribution over the set of available moves or actions).

It is well-known that deterministic rule-based strategies cannot obtain the same level of performance as probabilistic mixed strategies in general. A simple example would be the game of Rock-Paper-Scissors, where the game-theoretic equilibrium strategy is to choose each action randomly one-third of the time. Any pure strategy, such as always going Rock in a particular set of conditions, is subject to detection and exploitation by an intelligent opponent, both in theory and in practice [3, 2] (see the sketch at the end of this section).

In poker, deterministic rule-based systems are overly simplistic in theory, and do not perform well in practice. The fundamental limitations of the approach severely hamper performance, regardless of the programmer’s diligence. This has been amply

demonstrated in the commercial poker program TURBO TEXAS HOLD’EM, written by Bob Wilson [45]. After more than 15 years of conscientious development and refinement, the program handles thousands of specific conditions, accounting for many important variables such as the cards held, the number of active opponents,

relative position, and so on. Nevertheless, it is easily defeated by players of average skill level after a short learning phase.

Moreover, any deterministic strategy for Texas Hold’em is necessarily wrong in many circumstances that commonly arise. This is simply a consequence of having trillions of specific contexts in the game.1 Furthermore, it is relatively easy to understand and learn an appropriate counter-strategy against an opponent who adopts only one rigid style of play. When an expert plays against a simplistic rule-based system, it is particularly easy to make correct inferences about the set of possible opponent holdings as the game progresses. This leads to highly profitable opportunities, such as correctly folding a hand that would normally be much too strong to abandon, or bluffing when it will succeed an inordinately high percentage of the time. It is much more difficult to adapt to players who vary their style over time.

It is not surprising that a small finite set of conditionals cannot capture the subtlety and nuance of a strategically complex game like poker. Distinguishing and handling thousands of distinct contexts in Texas Hold’em is not sufficient to produce an adequate approximation of excellent play, because superficially similar cases can have subtle but important distinctions. Two situations that are grouped together might require radically different actions, such as raising in one and folding in the other. Since a perfect partitioning of situations is unattainable, frequent errors are unavoidable, and are a fundamental limiting factor to performance.
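The Rock-Paper-Scissors point made earlier in this section can be shown in a few lines: the uniform mixed strategy is unexploitable, while any pure strategy is fully exploited by the obvious counter. A minimal sketch, assuming the standard +1/0/-1 payoffs:

```python
import itertools

ACTIONS = ("R", "P", "S")
BEATS = {"R": "S", "P": "R", "S": "P"}   # key beats value

def payoff(a, b):
    """+1 if a beats b, -1 if b beats a, 0 on a tie."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

def expected_value(strategy_a, strategy_b):
    """Expected payoff for player A under the two mixed strategies."""
    return sum(pa * pb * payoff(a, b)
               for (a, pa), (b, pb) in itertools.product(strategy_a.items(),
                                                         strategy_b.items()))

uniform = {a: 1 / 3 for a in ACTIONS}
always_rock = {"R": 1.0, "P": 0.0, "S": 0.0}
counter = {"R": 0.0, "P": 1.0, "S": 0.0}        # best response to always-Rock

print(expected_value(uniform, counter))          # 0.0: the equilibrium cannot be exploited
print(expected_value(always_rock, counter))      # -1.0: the pure strategy is fully exploited
```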

6.3 Formula-based Methods

A formula-based approach is a more flexible generalization of rule-based approaches, where an arbitrarily complex formula or procedure is used as the criterion for dis- tinguishing cases. For example, an exact weighted enumeration of subcases might be used to determine a hand assessment value in the range zero to one; with a con- sequent probabilistic action based on predetermined thresholds for that value. The thresholds themselves might be determined by means of a simple constant, a linear

1 Two-player Limit Texas Hold’em has a relatively small search space compared to other real poker variants, but the imperfect information game tree has more than a quintillion states (1,179,000,604,565,715,751 nodes in total).

formula, or a complex formula based on many parameters.

Probabilistic formula-based approaches can consider many more variables pertaining to strategic elements in the game, effectively multiplying the number of distinct situations with each additional variable. The first versions of our formula-

based programs, LOKI and POKI, easily out-performed TURBO TEXAS HOLD’EM, due to their more accurate context-sensitive hand assessment, flexible strategies, and gradual adaptation to each specific opponent’s biases and tendencies [10, 9, 29, 11, 8, 30, 33]. Nevertheless, formula-based approaches still rely on the principle of abstract- ing trillions of specific cases onto a much smaller number of general circumstances, controlled by some particular parameterization of situations defined by the archi- tect. As a result, they still run into many of the same liabilities as rule-based sys- tems, albeit to a lesser extent. Even in the best case, there are inherent limitations on how well any such collection of components can perform. There is also a serious practical limitation to this approach, in that the code becomes disjointed and difficult to maintain. Adding an additional feature to the “patchwork quilt” can produce unpredictable and undesirable side effects, requiring a complete re-tuning of the system. Eventually, the code base becomes cumbersome to embellish and maintain.
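A minimal caricature of the threshold-driven, formula-based betting style described above; the hand-strength value, thresholds, and bluffing frequency below are hypothetical placeholders, not the actual LOKI or POKI parameters.

```python
import random

def formula_based_action(hand_strength, bet_threshold=0.58,
                         raise_threshold=0.85, bluff_freq=0.04):
    """Map a hand assessment value in [0, 1] (e.g., the estimated probability
    of currently holding the best hand) to a betting action via fixed thresholds."""
    if hand_strength >= raise_threshold:
        return "raise"
    if hand_strength >= bet_threshold:
        return "bet"
    if random.random() < bluff_freq:   # occasional bluff so the strategy is not purely deterministic
        return "bet"
    return "check_or_fold"
```

Every additional strategic variable (position, number of opponents, pot size, and so on) multiplies the number of such cases, which is both the strength of the approach and its maintenance burden.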

POKI has proven to be more skillful than average human poker players, based on empirical testing that includes tens of millions of games played against human opposition. Nevertheless, it became clear early on that this approach would be inadequate to achieve the goal of surpassing all human players [14, 5, 13]. Of particular concern is the lack of a well-balanced betting strategy. The formula- based systems are adequate for playing a straightforward style, but are much too predictable, in general. The instances of bluffing (including semi-bluffs) and trap- ping (either check-raising or slow-playing) are generally not frequent enough to adequately mask the strength and nature of the program’s holding. This “informa- tion leakage” is compounded as the game progresses, allowing an astute opponent to make correct inferences and superior decisions frequently enough to obtain signifi- cant expected value gains. It is not sufficient to simply decrease the predictability of

182 the program by bluffing and trapping more often, as that would lose equity against most typical opponents. We do not know what game conditions we will face in the future, and tuning the program for one particular set of circumstances means that it will not play well in other circumstances. Thus, it is not really a tunable parameter, because it is opponent dependent. More generally, to reach the highest levels of poker skill, a player must be able to reason about the prevailing game conditions, and adjust to any specific circum- stances. This ability is essential to mastering the game of poker, whether the player is a human or a program.

6.4 Simulation Methods

Simulation is simply the repetition of many instances in order to obtain a statistical average. In basic Monte Carlo simulation, each trial is simple and unbiased, such as sampling uniformly at random from the complete space of possibilities. Each in- dividual outcome might be a rather poor indicator of the average, but that is of little consequence as long as the relative balance of instances is fair. One tremendous advantage of simulation is that it is conceptually simple, and easy to program. In many computing science domains (not limited to games), Monte Carlo sim- ulation has proven to be a powerful technique in practice. In backgammon, for example, simple roll-out evaluations can perform as well as the more famous neu- ral network trained evaluation functions [41, 23].2 An example in poker is the cold roll-out of many pre-flop instances, in order to determine the relative values of starting hands. The resulting averages are not an accurate reflection of the true expected values, because all of the subsequent betting has been ignored. Nevertheless, it turns out that the resulting relative ranking of starting hands is reasonably accurate. These raw scores are the basis of the formula-

based expert systems that the author designed for pre-flop play in LOKI, LOKI II,

and POKI.
2 In the incomplete information domain of hockey pool drafting, the author used Monte Carlo simulations of a Poisson model of scoring to produce a computer program that apparently outperforms expert humans (unpublished course project).
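A minimal sketch of the cold roll-out described above, dealing random opponent hands and boards and recording how often the starting hand wins at showdown with no betting; `showdown_result` is a hypothetical 7-card evaluator supplied by the caller, not shown here.

```python
import random

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]

def cold_rollout_equity(hole_cards, showdown_result, trials=10000):
    """Estimate the all-in equity of a starting hand against one random
    opponent, ignoring all betting (the 'always call' assumption)."""
    score = 0.0
    for _ in range(trials):
        deck = [c for c in DECK if c not in hole_cards]
        random.shuffle(deck)
        opp, board = deck[:2], deck[2:7]
        # showdown_result returns +1 (win), 0 (split), or -1 (loss) for hole_cards.
        score += (showdown_result(hole_cards, opp, board) + 1) / 2.0
    return score / trials
```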

183 With iterated roll-outs, described in Chapter 2, the results of a simple roll-out are used as the decision criteria for playing each hand in a subsequent generation of simulations. This quickly ameliorates the flawed always-call assumption built into each trial, and the results are remarkably good, despite the relative simplicity of the process. When applied to the two-player game of Pre-flop Texas Hold’em, the results were in excellent agreement with Selby’s direct linear programming solution using a modification of the Simplex algorithm [34]. It is quite likely that much more could be done with this high-leverage methodology in the future. Unfortunately, with purely random sampling, it can take a long time for the simulation to converge on accurate estimates. To accelerate the process, various methods of biased sampling have been introduced. The objective is to emphasize samples that provide the most information gain, without introducing a severe bias (or at least to maintain a measurable degree of bias) in the result. In poker, we did this with selective sampling, where the opponent’s hand was selected from the weight table of current likelihoods for each possible holding, as explained in Chapter 2. An ad hoc method was used to quickly generate (random- ized) future actions of all players, and the trial was played through to the end of the game to obtain a net value. Averaging the outcomes of many such trials pro- duces a refined estimate of the expected value of each available action. In Scrabble, selective simulations have also been used to enhance the basic evaluations [36, 37]. As noted in Chapter 2, this method also resulted in new strategies, like check- raising, as an emergent behavior that was not part of any of the individual trials. Thus, the value of simulations is not limited to a refinement of basic evaluation. A major drawback of this approach in practice is that it can be highly volatile, and result in extremely unbalanced play. For example, if the relative probability of the opponent folding to a future bet is slightly too high, the maximum expected value play could be to raise 100% of the time in the current situation. This form of instability was witnessed on many occasions, with the program raising constantly as a result. Similar instability of simulations was seen by Ginsberg in the game of bridge [21, 22]. A more subtle but theoretically critical limitation is that repeated trials of perfect

184 information instances cannot, in general, produce accurate results for an imperfect information situation [16, 15]. In bridge, this is nicely illustrated by simulations of double finesse situations. The declarer has a choice of two tactical finesses, one through the left-hand opponent, the other through the right-hand opponent. In each perfect information instance, the declarer can win the finesse (one way or the other) and make the contract, thus the simulation returns a 100% chance of success. In the actual imperfect information situation, the declarer must choose which way to take the finesse, and the real chance of success is only 50%. In poker, this means that future actions must be determined in some objective manner that does not depend on explicit knowledge of the opponent’s cards in each simulated trial.3 Since future actions are based on some kind of heuristic speculation, there is also a problem with compounding errors. The relative frequencies for future actions have some degree of error, and that error is multiplied with each subsequent action. As a result, the generated sequence of actions leading to the end of the game could be rather unrealistic in a large fraction of the samples. One work-around to this problem is to generate a sequence of actions leading to the end of the game first, according to the presumed probability distributions. The opponent hand is then assigned after the entire sequence has been determined, ac- cording to some measure of consistency with the given sequence of actions. Aaron Davidson experimented with this technique, and found that it produced much more stable results for the simulation. Of course, it could also contain a huge amount of bias in the relative frequencies of possible opponent holdings, making the results arbitrarily inaccurate. However, suppose one was to simulate all possible future action sequences, with an accurate estimate of the relative frequency for each, and combine that with a probabilistic estimate for the net outcome of each trial. The simulation would now be much better grounded, with only a single error term for each estimate, rather than compounding errors. In fact, we could eliminate the variance from

3 This fact has not been a serious impediment in Scrabble, because imperfect information does not play a critical role in the strategy of the game. Approximate methods are sufficient to account for the hidden information (i.e., the probability distribution of the opponent tiles), and selective simulation has produced a super-human level of play [35].

185 running many trials, by running each action sequence exactly once and computing the weighted combination of expected values according to the relative probability for each sequence. This reformulation of the problem was the basis of the author’s imperfect infor- mation game-tree search algorithms, Miximax and Miximix, discussed in detail in Chapter 4. Since those direct expected value calculations essentially subsume the improved simulation method, with zero variance, the selective sampling techniques have largely been abandoned.
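Abstractly, the difference between sampling and direct calculation can be shown in a few lines; the (probability, value) pairs below are hypothetical stand-ins for the enumerated future betting sequences of a single game state.

```python
import random

def sampled_ev(sequences, trials=1000):
    # Monte Carlo estimate: draw sequences according to their probabilities
    # and average the outcomes (non-zero sampling variance).
    probs = [p for p, _ in sequences]
    values = [v for _, v in sequences]
    draws = random.choices(values, weights=probs, k=trials)
    return sum(draws) / trials

def enumerated_ev(sequences):
    # Direct expected value: each sequence is visited exactly once and
    # weighted by its probability (no sampling variance).
    return sum(p * v for p, v in sequences)

example = [(0.50, -1.0), (0.30, 2.0), (0.20, 4.5)]   # hypothetical (probability, net sb) pairs
print(enumerated_ev(example))   # exactly 1.0 sb
print(sampled_ev(example))      # roughly 1.0 sb, but noisy
```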

6.5 Game-Theoretic Methods

The major limitations of rule-based and formula-based approaches naturally lead to the study of architectures utilizing game-theoretic equilibrium strategies, or ap- proximations of equilibrium solutions. Game-theoretic strategies for the full game are inherently well-balanced, and thus the leakage of information is automatically prevented. In game theory, all two-player zero-sum games have at least one equilibrium strategy, where a randomized mixed strategy for every decision point can ensure the best worst-case outcome, regardless of the opponent’s strategy. This is known as the Minimax Theorem, and was proven in the foundational work by John von Neumann, who used poker as a model of all two-player adversarial contests [42]. John Nash later proved the necessary existence of equilibrium strategies in multi- player games [27, 28]. It has long been known that bluffing and trapping are an essential part of a sound poker strategy. Even in extremely simple poker games, equilibrium strate- gies involve a mathematically determined frequency of bluffing and trapping, which accomplishes the task of information hiding [26]. Moreover, the game-theoretic value of the game cannot be guaranteed without these deceptive plays. To apply game-theoretic principles to a practical poker-playing system, it greatly simplifies matters to study the two-player game first. Multi-player games are vastly more complex in theory, and should be considered after the pertinent lessons from

186 the two-player game have been learned. Interestingly, the two-player game of Limit Texas Hold’em has certain charac- teristics that make it more difficult than the full ten-player game in practice. With more than one opponent, deceptive plays are less common, both in theory and in practice. Bluffing is less frequent because the chance of a successful bluff is greatly reduced against several opponents, and slow-playing a strong hand is more danger- ous against multiple drawing hands. Thus, the play tends to be considerably more straightforward in the multi-player game, making it more amenable to formula- based systems than the two-player game. The greater prevalence and necessity of tricky play in the two-player game dra- matically exposes the serious imbalances of rule-based and formula-based betting strategies – all programs to date have performed terribly in this variant. The best of

this class of programs is POKI, but in the two-player game it is easily defeated by the author at a win rate exceeding +0.8 small bets per game (sb/g), meaning that the program would actually lose less by folding its blind every game.

In principle, an equilibrium strategy would never lose to any opponent over the long run (assuming a zero-sum game, with no rake), and might possibly win. In theory, an imperfect opponent can make dominated errors, losing expected value to an equilibrium strategy, thereby losing in the long run. For example, a highly sub-optimal player could fold very strong hands. In contrast, no dominated strategies exist in the game of Rock-Paper-Scissors. This means that the game-theoretic equilibrium strategy (Pr(R, P, S) = {1/3, 1/3, 1/3}) can only break even against any opponent strategy whatsoever.

Whether or not a perfect equilibrium strategy would win at a significant and sustainable rate against top human poker players is an open question. In all likelihood it would win slowly in a zero-sum game, but might not overcome the overhead cost of the rake in a game played for real money. An equilibrium strategy can be viewed as a primarily defensive strategy, designed to not lose, as opposed to trying to win. The game of Oshi-Zumo is similar to poker in that equilibrium strategies are complex, and might be a viable approach for practical play due to the existence of dominated strategies. However, an equilibrium strategy does not necessarily per-

187 form well in practice. Michael Buro has demonstrated that very simplistic strate- gies, which are sub-optimal and highly exploitable in general, can nevertheless break even or have a negligible losing rate against an equilibrium solution [12]. Regardless, an equilibrium strategy would still be highly useful in practice as a default strategy to use while observing an unknown opponent, since it cannot be exploited. Provided that a sufficient variety of strategies are naturally explored during this observation phase, the safe baseline could then be abandoned at any time in favour of positive expected value (+EV) exploitive play. Since this switch would be at the discretion of the program, it can be done with a controlled minimal level of risk. Although game theory has been applied in a wide range of disciplines, it has always been severely limited by the size of problems that can be solved computa- tionally. With the traditional normal form representation, the number of combined strategies for two players is doubly exponential in the number of variables, thereby restricting its use to only tiny problems. However, the normal form ignores certain dependencies that may exist in the game. For certain classes of problems, including poker, there exists an alternative representation called sequence form. This formulation was popularized in AI by Daphne Koller [24, 25], although it was apparently known to practitioners as early as the 1960s [31]. The method produces a linear program (LP) representation that is only singly exponential (i.e., linear in the size of the poker game tree). Thus, moderate-sized games (e.g., having more than one million nodes in the game tree) can be solved. This is still many orders of magnitude smaller than real poker vari- ants, but it does open the possibility of solving small scale models of real games to be used as an approximation of an equilibrium strategy. We used several methods of abstraction to define smaller poker variants that retain most of the key underlying probabilities and structural properties of Limit Texas Hold’em. The most powerful of these is the natural grouping of hands by relative strength, which we called bucketing. A very similar approach was used by Shi and Littman for the reduced game of Rhode Island Hold’em [38]. Linear programs for these abstracted games were generated automatically, and

188 their solutions produced a reasonably well-balanced strategy for the real game. This

class of programs, collectively called PSOPTI or SPARBOT, constituted a break- through in playing performance for two-player Hold’em. Despite reductions in the size of the game tree by a factor of more than 10 billion, the programs were able to defeat all previous computer programs, defeated average human players by a significant margin, and did not lose rapidly against very strong opposition [4].
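For intuition about the underlying linear programming step, the sketch below solves a tiny zero-sum game (Rock-Paper-Scissors) in normal form with scipy; the sequence-form LPs behind PSOPTI/SPARBOT are the same idea in spirit, but the variables range over betting sequences and card buckets rather than single actions.

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(payoff):
    """Maximin mixed strategy for the row player of a zero-sum matrix game.
    Variables are the strategy probabilities x and the game value v; maximize v
    subject to x'A >= v for every opponent column, sum(x) = 1, x >= 0."""
    A = np.asarray(payoff, dtype=float)
    m, n = A.shape
    c = np.concatenate([np.zeros(m), [-1.0]])        # linprog minimizes, so minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])        # v - x'A[:, j] <= 0 for each column j
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    b_eq = [1.0]
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[m]

rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]   # row payoffs for R, P, S against R, P, S
strategy, value = solve_zero_sum(rps)
print(strategy, value)                        # roughly [1/3, 1/3, 1/3], value near 0
```

For the abstracted Hold'em models, the corresponding LP has variables and constraints on the order of the game-tree size, which is what makes the sequence-form representation tractable where the normal form is not.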

A major difference in the style of SPARBOT from most human players is the willingness of the program to make mathematically correct defensive calls with weak hands. Whereas sheer aggression is very successful against most human play- ers, the computer opponent is not easily intimidated. This feature was a large factor in the program’s success in its 7030-game match against world-class player Gau- tam Rao. When Mr. Rao changed his approach to a less aggressive style half-way through the match, the effects were dramatically positive. However, recent analy- sis indicates that much of that turn-around was due to the luck of the cards – the program was simply destined to lose with reasonable play (as shown in Chapter 5). Overall, the objective quality of the program’s play was nearly on par with that of the opponent, thus the statistically inconclusive outcome was appropriate. However, this approach only gives a crude approximation of a true equilibrium strategy. Over time, a skilled player can eventually discover subtle tendencies and weaknesses in the program’s play, and exploit them. In particular, after studying

SPARBOT’s style for many thousands of games, the author is able to defeat it at a win rate more than ten times greater than any other human opponent the program has faced. Although there is unquestionably a lot of value in knowing the inti- mate details of the program’s design and playing habits, this outcome cannot be dismissed as a special case, as it does indicate the level of exploitability that could be achieved by master strength opponents who play against the program continu- ously over a long period of time. Moreover, an effective counter-strategy could be communicated to another expert player in just a few sentences.

There are three notable liabilities in the play of SPARBOT. One is a tendency for weak actions to indicate a weak holding (trapping plays with strong hands are relatively uncommon). In contrast, a strong action does not necessarily indicate a

189 strong holding, as the program’s bluffing levels are high enough to create sufficient doubt. As a result of these biases, it is often possible to detect bluffing opportuni- ties that have an inordinately high likelihood of success. Thus, the program could benefit from trapping more often. Secondly, the program will usually bet in second position after the opponent checks, leaving itself vulnerable to frequent check-raises. The betting sequence check-Bet-raise-Fold (kBrF) implies that either the opponent earned an extra bet with a legitimate hand, or executed a successful check-raise bluff. The program could obtain a more effective balance by checking more often with mediocre hands, rather than only the weakest of hands. Perhaps the most serious flaw is a disconnect between the betting actions be- fore the flop, and subsequent play after the flop. This is likely due to the manner in which the abstract models are constructed. A 3-round abstracted model of the game is defined for the first three betting rounds (the pre-flop, flop, and turn), with expected values applied at the truncated leaf nodes to improve the approximation. The solutions to these pre-flop models are then used to estimate the relative distri- bution of hand holdings for each of the pre-flop betting sequences. These distribu- tions are then used as input priors for computing the 3-round post-flop models (flop, turn, and river). Unfortunately, overlaying these 3-round models is not seamless, and there are detectable inconsistencies in the resulting play. Although the prior distributions are not unreasonable in general, it has been repeatedly observed that applying any one particular pre-flop distribution as a specific model of the opponent can be counter-indicated for the post-flop equilibrium solutions. In other words, the prior distributions might be appropriate for some post-flop equilibrium strategy, but there are infinitely many post-flop equilibrium strategies in general [18, 43]. There is no guarantee that the solution to the pre-flop model will be harmonious with the post-flop solution. In principle, combining two equilib- rium strategies in this fashion can produce inconsistencies, and a non-equilibrium strategy overall.4 The same limitation is true for other methods of overlaying subtree solutions.

4 This observation was proven by Neil Burch, using counterexamples from simpler domains.

190 Recently, Andrew Gilpin and Tuomas Sandholm at Carnegie Mellon University

have repeated the PSOPTI approach, using a powerful automated abstraction tech- nique called gameshrink, which was used to solve Shi and Littman’s game of Rhode Island Hold’em [19]. To produce a program for the much larger game of Limit Texas Hold’em, they pre-compute solutions for the first two rounds, then generate and solve real-time LP problems for the turn and river. A very similar idea was tried by our group in 2003, but was abandoned because: (a) we had no satisfactory method for obtaining appropriate prior distributions for the turn and river LPs, and (b) the real-time LP generation and solving could not maintain the proper pace of Limit Hold’em (roughly one second per decision). The method Gilpin and Sand- holm used for obtaining priors was to interpret the previous betting as a specific model of the distribution of opponent holdings [20]. Unfortunately, this can leave the program highly susceptible to acquiring false beliefs when facing an opponent who uses a very different (but equally viable) strategy.5 To avoid these problems, it would be highly preferable to use a single 4-round model for the abstract game. Unfortunately, to date the results for 4-round models have been unsatisfactory because the size of the problem is prohibitive (or the ab- stractions need to be so severe that the solutions suffer serious distortions, making them unsuitable as approximations of the real game).

The lack of harmony between the pre-flop and post-flop play of SPARBOT is quite obvious in practice. For example, when the pre-flop betting round has two or three raises, the program will frequently fold to a single small bet on the flop. Having built up a very large pot (getting 7-to-1 pot odds, or better), it is seldom correct to fold, especially in second position. The probable explanation is that the prior distributions for those pre-flop sequences are too extreme in terms of assigning

5 A recent poker competition for computer programs was held at the annual conference of the American Association of Artificial Intelligence (AAAI’06) [39]. The University of Alberta team won every match it played in both tournaments (using a simple combination of PSOPTI4 and PSOPTI6, along with a performance feedback signal based on a runtime version of the DIVAT analysis technique) [1]. The second place finisher in both tournaments was BLUFFBOT, a SPARBOT clone written by Teppo Salonen [32]. A slower-paced competition was held to accommodate the entry from Carnegie Mellon, GS2, which required extra time for decisions on the turn and river (for solving real-time LPs). GS2 used roughly 100 times as much time for the turn and river, but lost duplicate matches by a statistically significant margin to the University of Alberta program (-0.18 sb/g), and to BLUFFBOT (-0.12 sb/g).

a strong hand to the opponent, and the bucketing function largely disregards the residual draw value of a hand (unless it belongs to the special bucket for high-potential hands). Although the overall average strength of the opponent’s hand might be high in these cases, the specific context of the actual hand is ignored. In particular, if the flop board cards contain two or three low cards, there is a high likelihood that the flop failed to help either player, and the program either still has the best hand, or has sufficient chance of improvement to easily warrant a call.
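The pot-odds arithmetic behind this point is standard and easy to verify (a generic calculation, not code from SPARBOT):

    # Break-even winning chance implied by pot odds, ignoring future betting.
    def break_even_equity(pot_size, bet_size):
        return bet_size / (pot_size + bet_size)

    # Getting 7-to-1 (seven bets in the pot, one bet to call), a call only needs
    # to win more than 1/8 = 12.5% of the time to show an immediate profit.
    print(break_even_equity(7, 1))   # 0.125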

The failure of SPARBOT to account for this type of context-sensitive information is actually the major liability of the method as a whole. To put it simply, it fails to grasp the specific meaning of certain chance outcomes, and the implications of certain board textures.

In the process of abstraction, most subtle forms of information have been sacrificed. First, knowledge of the cards and any specific features of Texas Hold’em have been abstracted away, leaving only generic bucket classifications that are correlated with general hand strength (or a special case of a weak hand with high potential). While this might cause an acceptable amount of error (and is for the most part unavoidable), the abstraction over chance outcomes is far more limiting.

Between each betting round, a transition function is used to model the effect of dealing cards to the board. The function maps the probability of each pair of hand classes onto the set of subsequent class pairs. For example, with seven buckets for each player, each of the 49 pairs (P1, P2) = (i, j), 0 ≤ i, j ≤ 6, on the flop goes to 49 pairs (P1, P2) = (x, y), 0 ≤ x, y ≤ 6, on the turn, with edge weights summing to the parent.6 These transition networks are computed by averaging the changes of hand bucket classes over a large sample of representative examples, either by simulation or by exhaustive enumeration of all cases. This can provide an accurate estimate of the overall effect for “the turn happens”, but does nothing to account for the actual turn

6 Typically, changes to the matching bucket or next lower bucket have high probabilities; and jumps up to high buckets are more likely than jumps down to much weaker hands. The transition network from the pre-flop to the flop has a much flatter natural distribution, with all edges having a higher baseline probability.

card that lands. The system cannot understand the card itself, since cards do not exist in the abstraction. More importantly, it has no means of distinguishing outcomes that are generally favourable, unfavourable, or neutral. For example, if the program holds a medium pair, there is a huge difference between the turn card being a Deuce (good) or an Ace (bad). Apart from the small shift in hand value (which may or may not change the bucket classification), there is nothing to indicate that the Ace outcome is generally much more dangerous than the Deuce.

Using the grand aggregate average for a single transition function fails to account for the specific context of the game, resulting in relatively uninformed decisions. Certain extreme boards, such as four cards in the same suit or four cards to a straight, are handled poorly by the program, because the general guidelines do not apply to those special circumstances.7

A significant improvement could be obtained by using a family of transition functions. For example, averages could be computed for turn cards that land in each of the four ranges around existing board cards, or that pair a board card (for instance, a flop of Q-8-5 would partition turn outcomes into the sets {A,K}, {J,T,9}, {7,6}, {4,3,2}, and {Q,8,5}). The actual turn outcome would then be interpreted and used accordingly. With more computing power, each rank could have a distinct transition function; or even every legal turn card for a designated flop. Unfortunately, the size of the LP problems generated is already as large as can be conveniently computed, so it is not an easy task to improve the existing method.
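The following sketch illustrates the kind of partition just described, grouping turn cards by rank relative to the flop (an illustration only; the rank encoding and function names are assumptions, not code from PSOPTI):

    RANK_NAMES = {14: "A", 13: "K", 12: "Q", 11: "J", 10: "T"}

    def rank_name(r):
        return RANK_NAMES.get(r, str(r))

    def partition_turn_ranks(flop_ranks):
        """Group the thirteen ranks into one class for cards that pair the board,
        plus one class for each gap between (and around) the board ranks."""
        board = sorted(set(flop_ranks), reverse=True)
        classes = {"pairs a board card": [r for r in range(2, 15) if r in board]}
        bounds = [15] + board + [1]          # sentinels above the Ace and below the Deuce
        for hi, lo in zip(bounds, bounds[1:]):
            members = [r for r in range(2, 15) if lo < r < hi]
            if not members:
                continue                     # adjacent board ranks leave an empty gap
            if hi == 15:
                label = "above " + rank_name(lo)
            elif lo == 1:
                label = "below " + rank_name(hi)
            else:
                label = "between " + rank_name(hi) + " and " + rank_name(lo)
            classes[label] = members
        return classes

    # A flop of Q-8-5 yields {A,K}, {J,T,9}, {7,6}, {4,3,2}, and {Q,8,5}.
    for label, ranks in partition_turn_ranks([12, 8, 5]).items():
        print(label, [rank_name(r) for r in ranks])

Each class would then receive its own averaged transition network, so that “an Ace falls” and “a Deuce falls” are no longer collapsed into a single aggregate statistic.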

As it stands, SPARBOT is a difficult opponent to learn against, due to its well-balanced mixture of bluffs, traps, and straightforward play. In comparison, the large majority of human players are much easier to categorize and exploit quickly. Nevertheless, SPARBOT has the major liability of being a stationary player. Although it uses a randomized mixed strategy, it is a static strategy (does not change over time), and the program is completely oblivious to the opponent. Knowing this fact makes it much easier to play against the program in practice. A human player can

7 As an extreme example, if the board contained A-K-Q-J-T with four suits, then all hands would be equivalent – a tie is guaranteed. SPARBOT does not understand this, and could conceivably raise and then fold with its “average hand”. This type of logically dominated behavior did actually occur, prompting the use of a wrapper program that could veto obviously bad decisions.

systematically experiment, probing for weaknesses and statistical biases, without any fear of being punished for playing in a highly predictable fashion. Although it might take a fairly long time to discover the flaws in the program’s strategy, once they are learned they can be exploited forever after that time. Thus, the program is destined to be less and less successful as time goes by, and will be soundly defeated in the long run.

As pointed out by Michael Bowling, this is not actually a failing of game theory itself, but is due to implicit simplifications in our application of game theory. We are striving to find an approximation of the Nash equilibrium for the single-shot game. In reality, the previous history cannot be ignored, and poker should be treated as a repeated game. In addition, there are numerous sub-classes of equilibria that might be appropriate to look at, such as subgame perfect equilibria, perfect Bayesian equilibria, and trembling hand perfect equilibria [18, 17, 44]. Unfortunately, there is no obvious way of accounting for the additional context, even if the already enormous scope of the problem somehow permitted the inclusion of thousands of previous games. So, it seems that a crude estimate of a simple stationary equilibrium strategy is the best we can do in practice.

Much more could be said on the practical limitations of the game-theoretic approach, but even in the strongest case, there are clear motivations to investigate architectures that produce rapidly learning players. In particular, a computer opponent that is constantly observing and adapting to the opponent does not allow such a relaxed approach to learning, because regular systematic play can and will be punished. Thus, a program that detects and exploits betting patterns should put up a much tougher fight against strong opposition in practice. In the final analysis, the surest way to win against the best human players is to actually try to win.

6.6 Adaptive Imperfect Information Game-Tree Search

A major concern of all previous poker programs is that they do not handle the important task of opponent modeling particularly well. General and specific statistical models are maintained within the formula-based architectures (LOKI and POKI), but the manner in which they are maintained and updated is a limitation. Since those procedures are hard-coded, the process is inherently rigid, and presupposes many assumptions about the opponent’s approach to the game. To be completely flexible, the modeling system should not presume anything about the opponent’s understanding of the game, and should handle any arbitrary style fluidly.

There are many dimensions to the opponent modeling problem that have only been touched on to date. We have discussed the general necessity for deceptive plays (bluffing and trapping) for information hiding purposes. However, the specific opportunities for trick plays are not assessed very carefully, in regard to the opponent’s past behavior and tendencies. Normally these plays occur as part of an overall randomized mixed strategy, such as bluffing 10% of the time when we would otherwise check. A much more exploitive approach would try to deduce when a bluff will be relatively likely (or unlikely) to succeed, based on the prevailing conditions of the game and opponent.

For these and other reasons, the most promising program architecture performs a detailed analysis of the expected value for each possible action, using an adaptive imperfect information game-tree search. Two algorithms have been developed for this purpose: Miximax and Miximix [13, 6]. In essence, these algorithms “do the math” of poker. For every combination of future chance outcomes, the net loss or gain of every future betting sequence is considered, each weighted by its relative likelihood. This yields an overall expected value for each initial action available.

For example, suppose we face a bet on the turn. We can elect to fold, call, or raise. If we call, there are 46 possible river cards that could be dealt, and 19 possible betting sequences that can occur afterward. In cases where one player folds, the outcome is known and exact. In cases that lead to a showdown, we will estimate our probability of winning given the particular sequence of actions to determine our equity. The linear combination of relative weights for each of those 874 cases produces an overall expected value for calling. The expected value for raising is computed similarly, but has more cases due to remaining possibilities for bet sequences on the turn. Having computed the expected values for fold, call, and raise, we can decide how to proceed (such as simply choosing the action with maximum expected value; or perhaps mixing actions when two alternatives are close in value).

This process mimics the type of calculations done by human poker players, albeit at a much finer level of granularity. The computation can be performed in real time, even for decisions after the flop with more than a million subcases. Thus, it can be very precise, but that does not guarantee that it is accurate. The accuracy is limited by the correctness of the relative weights assigned to each future action of the opponent, and to our equity estimates for each showdown outcome. These two properties form the heart of the adaptive modeling system. The data structures and update policies used to estimate these values will govern how successful this approach can be in practice.
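The following toy sketch conveys the flavour of that expected-value recursion: maximize at our own decision nodes, take probability-weighted averages at opponent and chance nodes, and read exact or estimated payoffs at the leaves. The node representation and the opponent-model interface are illustrative assumptions, not the Miximax implementation used in VEXBOT.

    def ev(node, opponent_model):
        kind = node["kind"]
        if kind == "leaf":
            return node["payoff"]            # folds are exact; showdowns use estimated equity
        if kind == "us":
            # Best response: take the child action with the highest expected value.
            return max(ev(child, opponent_model)
                       for child in node["children"].values())
        if kind == "opponent":
            # Weight each opponent action by its modeled probability in this context.
            probs = opponent_model(node["context"])
            return sum(probs[action] * ev(child, opponent_model)
                       for action, child in node["children"].items())
        if kind == "chance":
            # Weight each possible card outcome by its likelihood.
            return sum(p * ev(child, opponent_model)
                       for p, child in node["children"])
        raise ValueError(kind)

    # Minimal usage: an opponent node where the opponent bets 40% of the time
    # (we lose one bet) and checks 60% of the time (we win two bets).
    model = lambda context: {"bet": 0.4, "check": 0.6}
    node = {"kind": "opponent", "context": "river",
            "children": {"bet": {"kind": "leaf", "payoff": -1.0},
                         "check": {"kind": "leaf", "payoff": 2.0}}}
    print(ev(node, model))                   # 0.4 * -1.0 + 0.6 * 2.0 = 0.8

Miximix differs mainly in allowing mixed actions at our own decision nodes rather than a strict maximum, which relates directly to the predictability concerns discussed below.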

This architecture has been implemented in the program VEXBOT. Although it is not yet fully refined, the results have been noteworthy. The program learns to exploit simplistic rule-based players, such as ALWAYS CALL and ALWAYS RAISE, at close to the maximum win rate possible. Against POKI (which was never specifically designed to play the two-player game), it wins at a rate comparable to this author, and occasionally better (approximately 1.0 sb/g). Against SPARBOT, it eventually learns to exploit the gaps in the equilibrium approximation, surpassing the win rate attained by any human player other than this author. In play against strong human players, it has proven to be a dangerous opponent, possessing many positive attributes not seen in previous programs [6]. However, despite these encouraging results there are numerous causes for concern and issues to be resolved.

In principle, we are computing a best response, based on our current beliefs about the opponent, which are subject to change over time. Although the best response is a powerful counter-strategy for maximizing our exploitation of perceived weaknesses, it is not necessarily the best policy to follow in all cases, for a number of reasons.

First, as a poker-playing policy, pure best response without modifications represents a highly predictable reaction to the given circumstances. Good poker players can easily understand the resulting straightforward style and use it to their advantage. In fact, any form of predictability is potentially vulnerable to exploitation (which is why mixed strategies are a good idea, in general). With a best response player, bluffing and trapping plays are only executed if the calculation determines that they have the highest expected value.

This tends to be exhibited as an “all or nothing” response, depending on the particular beliefs currently held about the opponent. Specifically, if the belief state suggests that the opponent folds slightly too often, there can be a sudden switch in behavior toward betting and raising very frequently. There is no moderation in its strategy adjustments. Similarly, if the system believes that the opponent bluffs slightly below the optimal frequency, the reaction may be to fold all marginal situations. This latter case is particularly dangerous, since it prevents future observations that could revise that belief if it is mistaken (a situation we refer to as a “folding trap”). Against a player who is constantly changing styles, the over-reactive best response opponent may swing like a pendulum between belief states. Since the program’s beliefs are rather transparent and obvious, a master opponent can stay one step ahead, defeating it easily.

Even if those patterns were not an issue, it is not clear that maximum exploitation would be desirable. A strong poker player will often consider not exploiting a known opponent weakness too aggressively, because punishing an error too severely could make it easy for the opponent to identify the flaw in their strategy, forcing them to change behavior. When possible, it could be preferable to exploit the error at a slower sustainable rate, so that the known weakness will persist indefinitely.

Another reason to deviate from a simple best response is for the purpose of exploration. Regardless of how well the current belief state may be performing, there is always the possibility that other untested lines might be more profitable. The “exploration / exploitation trade-off” is a well-known problem in machine learning, and is no less important in poker, with no ideal solution in general.

Although this architecture has numerous advantages over other approaches, it also has many practical limitations. More technical details will need to be addressed before the system can perform at its highest potential. First, there is a desperate need for good default data. The frequentist model

allows for essentially any belief to be held, regardless of how irrational it might be. If no default seeding is provided, the system can infer that a small number of repeated observations are indicative of what will continue to happen in the future. For example, if the opponent should happen to win with an Ace-high flush, the system may implicitly believe that the opponent will always produce an Ace-high flush in those circumstances, since it has no built-in knowledge of how often that outcome “should” happen.

The particular case of having no defaults whatsoever can lead to even more bizarre beliefs. Suppose there is no betting whatsoever through to the showdown, and the program wins, beating the opponent’s 32o. All else being equal, the program will develop a bias in favour of checking, because of the chance of leading to the same favourable outcome. In a sense, the system believes that it can increase the likelihood of the opponent having 32o through the act of checking!

Although it is not proper to be ascribing causality relations, it should be clear that unconstrained belief systems can admit highly irrational beliefs. It could be argued that this is a good thing – that the program should be free to hold any belief whatsoever. For instance, the game could be rigged against us, such that the opponent really will produce the Ace-high flush every game. A purist would argue that there should be no impediment to forming any possible belief, since a seemingly strange belief might be entirely appropriate in response to some hypothetical opponent. However, in practice, this uncontrolled range of beliefs is a major encumbrance, and often leads to embarrassing imbalances in play. In fact, the play of VEXBOT can be compared to that of an overly emotional player, where the program’s “mood” can quickly swing from elated (hyper-aggressive and over-optimistic) to dejected (very passive and pessimistic).

By initializing the system with good default data, we implicitly provide a wealth of basic knowledge about how the game is played, what kind of outcomes are possible, and the relative likelihood of each. By mixing actual observations with a larger proportion of “normal” data, we temper the interpretation of noisy input and dampen the over-reactions, without eliminating the ability to learn and adapt. Indeed, without good default seeding, it is impossible to determine just how well the program could potentially play.

Unfortunately, generating good defaults is a difficult task, and the current default data leaves much to be desired. The author’s current research is investigating methods for automatically generating good default data, and much more, by producing a probabilistic map of the entire strategy space. Another interesting research topic would be to use data mining to determine good defaults empirically.

Another major issue with the current design is the problem of data sparsity. The structure of the imperfect information game tree provides a natural separation of contexts, providing a fine level of granularity that is both a strength and a liability. While it permits a more than adequate level of precision, it also partitions actual observations into thousands of subcases. To effect reasonable learning rates and to detect patterns with statistical significance, these distinct contexts must be combined according to greatest similarity, which is a difficult research problem in its own right. In fact, some aspects of opponent modeling are essentially impossible, in principle. On the other hand, a solution does not have to be perfect to be highly effective in practice. Indeed, it might be the case that a relatively simple system will end up out-performing all human players, possibly by a large margin.

There are many obstacles still to be overcome with this approach, some of which may be quite challenging. Nevertheless, this method for building adaptive poker players has already proven to be successful, with much greater promise for improved implementations.
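To close this section, here is a minimal sketch of the default-seeding idea discussed above: blending observed action counts with prior “normal” data so that a handful of observations cannot swing the model to an extreme belief. The default frequencies and the weight of the virtual observations are assumptions made for illustration, not VEXBOT’s actual values or data structures.

    DEFAULT_FREQ = {"fold": 0.30, "call": 0.45, "raise": 0.25}
    DEFAULT_WEIGHT = 20.0        # how many virtual observations the defaults are worth

    def action_probabilities(observed_counts):
        """Estimate the opponent's action frequencies in one context by mixing
        real observations with the default pseudocounts."""
        total = sum(observed_counts.values()) + DEFAULT_WEIGHT
        return {a: (observed_counts.get(a, 0) + DEFAULT_WEIGHT * DEFAULT_FREQ[a]) / total
                for a in DEFAULT_FREQ}

    # After seeing three raises and nothing else in some context, the estimate
    # shifts toward "raise" without concluding that the opponent always raises.
    print(action_probabilities({"fold": 0, "call": 0, "raise": 3}))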

6.7 Conclusions and Future Work

The many years of study in poker AI have raised as many questions as they have answered, which is a sure indication of the field’s vitality. The domain offers many unique challenges of theoretical and practical importance to computer game-playing, and to computer science in general.

The evolution of architectures for poker programs has followed a natural progression over the past 14 years, with each new generation addressing some of the fundamental limitations encountered by the previous generation. The following summary highlights some of the main strengths and major limitations of each approach.

1. Deterministic rule-based methods

Example: Wilson Software TURBO TEXAS HOLD’EM.

Advantages: simple, intuitive, minimal CPU and memory requirements.

Limitations: requires extensive poker knowledge, creates unavoidable gaps, predictable play, easily exploited.

2. Probabilistic formula-based methods

Examples: LOKI, POKI, POKER ACADEMY.

Advantages: intuitive, flexible, modest CPU and memory requirements.

Limitations: requires extensive knowledge, complicated multi-component systems, difficult to build on, produces a single playing style.

3. Monte Carlo and enhanced simulation methods

Examples: LOKI II, POKI, POKER ACADEMY.

Advantages: conceptually simple, emergent strategies, less dependent on poker knowledge.

Limitations: perfect information instances are inadequate, volatile play, compounding errors over future actions.

4. Nash equilibrium game-theoretic methods

Examples: PSOPTI0 - PSOPTI7 (a.k.a. SPARBOT), BLUFFBOT, GS2.

Advantages: well-balanced overall strategy with deception, requires minimal poker knowledge.

Limitations: crude approximation, loss of relevant context, non-exploitive (oblivious), vulnerable to systematic probing, generalization to multi-player will be difficult.

5. Adaptive imperfect information game-tree search

Example: VEXBOT.

Advantages: theoretically correct EV calculation, adaptive, independent strategies, creative play.

Limitations: data sparsity, requires good abstractions, volatile learning, over-reactive style, deducible belief states, gets stuck in local minima.

Not surprisingly, deterministic rule-based systems have failed to produce strong poker-playing programs. The more flexible but still relatively simple formula-based approach produces much better practical results. Although still below the skill level of a human expert, the most recent version of POKI (in the commercial software POKER ACADEMY) is the strongest known computer program for multi-player Limit Texas Hold’em.

Simulation can be used to further refine an existing formula-based system, or to make independent betting decisions. However, the results have been somewhat mixed, and the resulting play can be quite volatile. More could be done with Monte Carlo or full-information selective simulations, but the method is largely subsumed by the EV calculations in adaptive imperfect information game-tree search.

Looking at the two-player game, crude approximations of game-theoretic equilibrium strategies have produced programs like the PSOPTI family, with a vast overall improvement in well-balanced strategies. These simple stationary players are able to defeat average players, and can hold their own against strong opposition, at least until their subtle flaws are discovered.

The Miximax and Miximix algorithms for adaptive imperfect information game-tree search address the critical principles of opponent modeling and strategy adjustments. The program VEXBOT, based on this architecture, is currently the strongest computer program for two-player Limit Texas Hold’em. Although still in the relatively primitive stages of development, it eventually learns to defeat all other computer programs, and can provide a dangerous opponent for elite human players.

Many lessons have been learned through the course of this research, and future architectures will build upon these lessons. For instance, well-balanced quasi-equilibrium strategies can be obtained by alternate means, such as the author’s pdf-cutting algorithm, avoiding the crude abstractions that necessarily sacrifice valuable context.

Learning systems will need to be much faster and smoother. One method currently being investigated is to pre-define parameterized categories of player styles, and use Bayesian inference to classify opponents based on the limited observations [40]. By jumping to a best guess about the opponent’s predilections, we hope to immediately find an appropriate (although not maximal) counter-strategy, and increase the exploitation rate as the classification improves over time. As long as the early strategies are not seriously counter-indicated, this should be much more effective than trying to build a model of the opponent gradually from scratch, bridging over thousands of specific contexts.

It is also becoming apparent that there will always be a need for better abstraction techniques, partitioning distinct contexts into classes for collective treatment. Ideally, these would be based on abstract distance metrics that can be applied to automated learning, without requiring a great deal of poker-specific knowledge.

The problem of assessment will always be a concern, because of the high variance inherent in the game. The DIVAT analysis technique, discussed at length in Chapter 5, does a commendable job of separating the signal from the noise, but many improvements are possible. Tools of this kind will likely become indispensable, in view of the ever-increasing number of games required to statistically discern players of nearly equal skill. Moreover, DIVAT analysis can be used directly in competition, providing better feedback on our performance than the simple money outcome. However, since we do not gain knowledge of the opponent’s cards when a game is won uncontested, some care is needed to avoid introducing a significant bias.

There is a long way to go before poker programs completely master the game and surpass all human players. Current lines of research might be successful, or entirely new paths and new architectures might be necessary. It is clear that future systems must be able to adapt rapidly to maximize success. These systems must be able to reason about the game, in order to adjust to the prevailing conditions and specific circumstances. The program must be able to cope with talented opponents who quickly and repeatedly change their style, adapting to the program at the same time it is adapting to them. While these are very complex problems in principle, there is hope for pragmatic success, if only because humans are far from perfect themselves.

Many popular algorithms from other areas of AI and machine learning could be applied to poker. In fact, the game can provide a challenging test for those algorithms, because poker may not have certain “convenient” properties that make other domains considerably easier in practice. It is a worthwhile field of battle, where the best ideas can prove their worthiness in direct combat. Regardless of the methods employed, poker will continue to provide a vibrant and demanding domain for Artificial Intelligence research for many years to come.

Bibliography

[1] D. Billings. The University of Alberta Computer Poker Research Group webpage. World Wide Web, 1997. http://www.cs.ualberta.ca/~games/poker/.
[2] D. Billings. The First International RoShamBo Programming Competition. The International Computer Games Association Journal, 23(1):42–50, 2000.
[3] D. Billings. Thoughts on RoShamBo. The International Computer Games Association Journal, 23(1):3–8, 2000.
[4] D. Billings, N. Burch, A. Davidson, T. Schauenberg, R. Holte, J. Schaeffer, and D. Szafron. Approximating game-theoretic optimal strategies for full-scale poker. In The Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, IJCAI'03, pages 661–668, 2003.
[5] D. Billings, A. Davidson, J. Schaeffer, and D. Szafron. The challenge of poker. Artificial Intelligence, 134(1–2):201–240, January 2002.
[6] D. Billings, A. Davidson, T. Schauenberg, N. Burch, M. Bowling, R. Holte, J. Schaeffer, and D. Szafron. Game-tree search with adaptation in stochastic imperfect-information games. In H. J. van den Herik, Y. Björnsson, and N. Netanyahu, editors, Computers and Games: 4th International Conference, CG'04, LNCS 3846, pages 21–34. Springer-Verlag GmbH, 2004.
[7] D. Billings and M. Kan. A tool for the direct assessment of poker decisions. The International Computer Games Association Journal, 2006. To appear.
[8] D. Billings, D. Papp, L. Peña, J. Schaeffer, and D. Szafron. Using selective-sampling simulations in poker. In American Association of Artificial Intelligence Spring Symposium on Search Techniques for Problem Solving under Uncertainty and Incomplete Information, pages 13–18. American Association of Artificial Intelligence, 1999.
[9] D. Billings, D. Papp, J. Schaeffer, and D. Szafron. Opponent modeling in poker. In American Association of Artificial Intelligence National Conference, AAAI'98, pages 493–499, 1998.
[10] D. Billings, D. Papp, J. Schaeffer, and D. Szafron. Poker as a testbed for machine intelligence research. In R. Mercer and E. Neufeld, editors, Advances in Artificial Intelligence, AI'98, pages 228–238. Springer-Verlag, 1998.
[11] D. Billings, L. Peña, J. Schaeffer, and D. Szafron. Using probabilistic knowledge and simulation to play poker. In American Association of Artificial Intelligence National Conference, AAAI'99, pages 697–703, 1999.
[12] M. Buro. Solving the Oshi-Zumo game. In H. J. van den Herik, H. Iida, and E. A. Heinz, editors, Advances in Computer Games 10: Many Games, Many Challenges, pages 361–366. Kluwer Academic, 2004.
[13] A. Davidson. Opponent modeling in poker. Master's thesis, Department of Computing Science, University of Alberta, 2002.
[14] A. Davidson, D. Billings, J. Schaeffer, and D. Szafron. Improved opponent modeling in poker. In International Conference on Artificial Intelligence, ICAI'00, pages 1467–1473, 2000.

[15] I. Frank and D. A. Basin. A theoretical and empirical investigation of search in imperfect information games. Theoretical Computer Science, 252(11):217–256, 2001.
[16] I. Frank, D. A. Basin, and H. Matsubara. Finding optimal strategies for imperfect information games. In American Association of Artificial Intelligence National Conference, AAAI'98, pages 500–507, 1998.
[17] D. Fudenberg and D. K. Levine. The Theory of Learning in Games. MIT Press, May 1998.
[18] D. Fudenberg and J. Tirole. Game Theory. MIT Press, August 1991.
[19] A. Gilpin and T. Sandholm. Optimal Rhode Island Hold'em poker. In American Association of Artificial Intelligence 20th National Conference, AAAI'05, pages 1684–1685, 2005. Intelligent Systems Demonstration.
[20] A. Gilpin and T. Sandholm. A competitive Texas Hold'em poker player via automated abstraction and real-time equilibrium computation. In American Association of Artificial Intelligence National Conference, AAAI'06, pages 1007–1013, July 2006.
[21] M. Ginsberg. GIB: Steps toward an expert-level bridge-playing program. In The International Joint Conference on Artificial Intelligence, IJCAI'99, pages 584–589, 1999.
[22] M. Ginsberg. GIB: Imperfect information in a computationally challenging game. Journal of Artificial Intelligence Research, 14:303–358, 2001.
[23] T. Hauk, M. Buro, and J. Schaeffer. *-minimax performance in backgammon. In H. J. van den Herik, Y. Björnsson, and N. Netanyahu, editors, Computers and Games: 4th International Conference, CG'04, Ramat-Gan, Israel, July 5-7, 2004. Revised Papers, volume 3846 of Lecture Notes in Computer Science, pages 51–66. Springer-Verlag GmbH, 2004.
[24] D. Koller and N. Megiddo. The complexity of two-person zero-sum games in extensive form. Games and Economic Behavior, 4(4):528–552, 1992.
[25] D. Koller and A. Pfeffer. Representations and solutions for game-theoretic problems. Artificial Intelligence, 94(1):167–215, 1997.
[26] H. W. Kuhn. A simplified two-person poker. In H. W. Kuhn and A. W. Tucker, editors, Contributions to the Theory of Games, volume 1, pages 97–103. Princeton University Press, 1950.
[27] J. F. Nash. Equilibrium points in N-person games. Proceedings of the National Academy of Sciences, 36:48–49, 1950.
[28] J. F. Nash. Non-cooperative games. Annals of Mathematics, 54:286–295, 1951.
[29] D. Papp. Dealing with imperfect information in poker. Master's thesis, Department of Computing Science, University of Alberta, 1998.
[30] L. Peña. Probabilities and simulations in poker. Master's thesis, Department of Computing Science, University of Alberta, 1999.

[31] I. Romanovskii. Reduction of a game with complete memory to a matrix game. Soviet Mathematics, 3:678–681, 1962.

[32] T. Salonen. The BLUFFBOT webpage. World Wide Web, 2006. http://www.bluffbot.com/.
[33] J. Schaeffer, D. Billings, L. Peña, and D. Szafron. Learning to play strong poker. In The International Conference on Machine Learning Workshop on Game Playing. J. Stefan Institute, 1999. Invited paper.
[34] A. Selby. Optimal heads-up pre-flop hold'em. WWW, 1999. http://www.archduke.demon.co.uk/simplex/.
[35] B. Sheppard. Toward Perfection of Scrabble Play. PhD thesis, Computer Science, University of Maastricht, 2002.
[36] B. Sheppard. World-championship-caliber Scrabble. Artificial Intelligence, 134(1–2):241–275, 2002.
[37] B. Sheppard. Efficient control of selective simulations. In H. J. van den Herik, Y. Björnsson, and N. Netanyahu, editors, Computers and Games: 4th International Conference, CG'04, Ramat-Gan, Israel, July 5-7, 2004. Revised Papers, volume 3846 of Lecture Notes in Computer Science, pages 1–20. Springer-Verlag GmbH, 2004.
[38] J. Shi and M. Littman. Abstraction methods for game theoretic poker. In T. A. Marsland and I. Frank, editors, Computers and Games 2000, LNCS 2063, pages 333–345. Springer-Verlag, 2002.
[39] C. Smith and M. Zinkevich. The AAAI'06 poker competition webpage. World Wide Web, 2006. http://www.cs.ualberta.ca/~pokert/.
[40] F. Southey, M. Bowling, B. Larson, C. Piccione, N. Burch, D. Billings, and C. Rayner. Bayes' bluff: Opponent modelling in poker. In 21st Conference on Uncertainty in Artificial Intelligence, UAI'05, pages 550–558, July 2005.
[41] G. Tesauro. Programming backgammon using self-teaching neural nets. Artificial Intelligence, 134(1–2):181–199, 2002.
[42] J. von Neumann and O. Morgenstern. The Theory of Games and Economic Behavior. Princeton University Press, 1944.
[43] Wikipedia. Game theory. Wikipedia: The Free Online Encyclopedia. http://en.wikipedia.org/wiki/Game_theory.
[44] Wikipedia. Nash equilibrium. Wikipedia: The Free Online Encyclopedia. http://en.wikipedia.org/wiki/Nash_equilibrium.
[45] B. Wilson. Wilson software. WWW, 1993. wilsonsoftware.com.
[46] M. Zinkevich, M. Bowling, N. Bard, M. Kan, and D. Billings. Optimal unbiased estimators for evaluating agent performance. In American Association of Artificial Intelligence National Conference, AAAI'06, pages 573–578, 2006.

Appendix A: Glossary of Poker Terms

This appendix contains informal definitions of common poker terms used in this dissertation. More extensive and precise poker glossaries are available on the world wide web, such as http://www.seriouspoker.com/dictionary.html, http://conjelco.com/pokglossary.html, and the online Wikipedia, http://en.wikipedia.org/wiki/Poker_jargon.

• All-in. To have one’s entire stake committed to the current pot. Action continues toward a side pot, with the all-in player being eligible to win only the main pot.
• All-in Equity. The expected value income of a hand assuming the game will proceed to the showdown with no further betting (i.e., a fraction of the current pot, based on all possible future outcomes).
• Bad Beat. An unlucky loss. In particular, losing a game where the opponent probably should have folded, but instead got extremely lucky to win.
• Bet. To make the first wager of a betting round (compare raise).
• Bet for Value. To bet with the expectation of winning if called (compare bluff).
• Big Bet. The largest bet size in Limit poker (e.g., $20 in $10-$20 Hold’em).
• Big Blind (sometimes called the Large Blind). A forced bet made before the deal of the cards (e.g., $10 in $10-$20 Hold’em, posted by the second player to the left of the button).
• Blind. A forced bet made before the deal of the cards (see small blind and big blind).
• Bluff. To play a weak hand as though it were strong, with the expectation of losing if called (see also semi-bluff and pure bluff, compare bet for value).
• Board (or Board Cards). The community cards shared by all players.
• Board Texture. Classification of the type of board, such as having lots of high cards, or not having many draws (see dry).
• Button. The last player to act in each betting round in Texas Hold’em. Also called the dealer button, representing the person who would be the dealer in a home game.

• Call. To match the current level of betting. If the current level of betting is zero, the term check is preferred.
• Cap. (a) The maximum number of raises permitted in any single round of betting (typically four in Limit Hold’em, but occasionally unlimited). (b) (vt) To make the last permitted raise in the current betting round (e.g., after a bet, raise, and re-raise, a player caps the betting).
• Check. To decline to make the first wager of a betting round (compare call).
• Check-Raise. To check on the first action, with the intention of raising in the same betting round after an opponent bets.
• Community Cards. The public cards shared by all players.
• Connectors. Two cards differing by one in rank, such as 7-6. More likely to make a straight than other combinations.
• Dominated. A Hold’em hand that has a greatly reduced chance of winning against another because one or both cards cannot make a useful pair (e.g., KQ is dominated by AK, AQ, AA, KK, and QQ, but not by AJ or JJ).
• Draw. A holding with high potential to make a strong hand, such as a straight draw or a flush draw (compare made hand).
• Draw Potential. The relative likelihood of a hand improving to be the best if it is currently behind.
• Drawing Dead. Playing a draw to a hand that will only lose, such as drawing to a flush when the opponent already holds a full house.
• Drawing Hand. A hand that has a good draw (compare made hand).
• Dry. Lacking possible draws or betting action, as in a dry board or a dry game.
• Equity (or Pot Equity). An estimate of the expected value income from a hand that accounts for future chance outcomes, and may or may not account for the effects of future betting (e.g., all-in equity).
• Expected Value (EV) (also called mathematical expectation). The average amount one expects to win in a given game situation, based on the payoffs for each possible random outcome.
• Flop. The first three community cards dealt in Hold’em, followed by the second betting round (compare board).
• Fold. To discard a hand instead of matching the outstanding bet, thereby losing any chance of winning the pot.
• Fold Equity. The equity gained by a player when an opponent folds. In particular, the positive equity gained despite the fact that the opponent’s fold was entirely correct.

• Forward Blinds. The logical extension of blinds for heads-up (two-player) games, where the first player posts the small blind and the second player (button) posts the big blind (compare reverse blinds). (Both rules are seen in practice, with various casinos and online card rooms having different policies for multi-player games that have only two active players).
• Free-Card Danger. The risk associated with allowing an opponent to improve and win the pot without having to call a bet (in particular, when they would have folded).
• Free-Card Raise. To raise on the flop intending to check on the turn.
• Game. (a) A competitive activity in which players contend with each other according to a set of rules (in poker, a contest with two or more players). (b) A single instance of such an activity (in poker, from the initial dealing of the cards to the showdown, or until one player wins uncontested).
• Game Theory. Among serious poker players, game theory normally pertains to the optimal calling frequency (in response to a possible bluff), or the optimal bluffing frequency. Both depend only on the size of the bet in relation to the size of the pot.
• Hand. (a) A player’s private cards (e.g., two hole cards in Hold’em). (b) One complete game of poker (see game (b)).
• Heads-up. A two-player (head-to-head) poker game.
• Hole Card. A private card in poker (Texas Hold’em, Omaha, 7-Stud, etc.).
• Implied Odds. (a) The pot odds based on the probable future size of the pot instead of the current size of the pot (positive or negative adjustments). (b) The extra money a strong hand stands to win in future betting rounds (compare reverse implied odds).
• Kicker. A side card, often deciding the winner when two hands are otherwise tied (e.g., a player holding Q-J when the board is Q-7-4 has top pair with a Jack kicker).
• Large Blind (usually called the Big Blind). A forced bet made before the deal of the cards (e.g., $10 in $10-$20 Hold’em, posted by the second player to the left of the button).
• Loose Game. A game having several loose players.
• Loose Player. A player who does not fold often (e.g., one who plays most hands at least to the flop in Hold’em).
• Made Hand. A hand with a good chance of currently being the best, such as top pair on the flop in Hold’em (compare draw).
• Mixed Strategy. Handling a particular type of situation in more than one way, such as to sometimes call, and sometimes raise.
• Offsuit. Two cards of different suits (also called unsuited, compare suited).
• Open-Ended Draw. A draw to a straight with eight cards to make the straight, such as 6-5 with a board of Q-7-4 in Hold’em.

• Outs. Cards that will improve a hand to a probable winner (compare draw).
• Pocket Pair. Two cards of the same rank, such as 6-6. More likely to make three of a kind than other combinations (see set).
• Post-flop. The actions after the flop in Texas Hold’em, including the turn and river cards interleaved with the three betting rounds, and ending with the showdown.
• Pot. The common pool of all collected wagers during a game.
• Pot Equity (or simply Equity). An estimate of the expected value income from a hand that accounts for future chance outcomes, and may or may not account for the effects of future betting (e.g., all-in equity).
• Pot Odds. The ratio of the size of the pot to the size of the outstanding bet, used to determine if a draw will have a positive expected value.
• Pre-flop. The first round of betting in Texas Hold’em before the flop, beginning with the posting of the blinds and the dealing of the private hole cards.
• Pure bluff. A bluff with a hand that can only win if the opponent folds (compare semi-bluff).
• Pure Drawing Hand. A weak hand that can only win by completing a draw, or by a successful bluff.
• Raise. To increase the current level of betting. If the current level of betting is zero, the term bet is preferred.
• Raising for a Free-card. To raise on the flop intending to check on the turn.
• Rake. A portion of the pot withheld by the casino or host of a poker game, typically a percentage of the pot up to some maximum, such as 5% up to $3.
• Re-raise. To increase to the third level of betting after a bet and a raise.
• Reverse Blinds. A special rule sometimes used for heads-up (two-player) games, where the second player (button) posts the small blind and the first player posts the big blind (compare forward blinds). (Both rules are seen in practice, with various casinos and online card rooms having different policies for multi-player games that have only two active players).
• Reverse Implied Odds. The unaccounted (negative) money a mediocre hand stands to lose in future betting rounds (compare implied odds (b)).
• River. The fifth community card dealt in Hold’em, followed by the fourth (and final) betting round.
• Semi-bluff. A bluff when there are still cards to be dealt, with a hand that might be the best, or that has a reasonable chance of improving to the best if it is called (compare pure bluff).
• Second pair. Matching the second highest community card in Hold’em, such as having 7-6 with a board of Q-7-4.
• Session. A series of games, typically lasting several hours in length.

• Set. Three of a kind, formed with a pocket pair and one card of matching rank on the board. A very powerful and well-disguised hand (compare trips).
• Short-handed Game. A game with less than the full complement of players, such as a Texas Hold’em game with five or fewer players.
• Showdown. The revealing of cards at the end of a game to determine the winner.
• Side pot. A second pot for the remaining active players after another player is all-in.
• Slow-play. To check or call a strong hand as though it were weak, with the intention of raising in a later betting round (compare smooth-call and check-raise).
• Small Bet. The smallest bet size in Limit poker (e.g., $10 in $10-$20 Hold’em).
• Small Blind. A forced bet made before the deal of the cards (e.g., $5 in $10-$20 Hold’em, posted by the first player to the left of the button).
• Smooth-call. To only call a bet instead of raising with a strong hand, for purposes of deception (as in a slow-play).
• Suited. Two cards of the same suit, such as both Hearts. More likely to make a flush than other combinations (compare offsuit or unsuited).
• Table Image. The general perception other players have of one’s play.
• Table Stakes. A poker rule allowing a player who cannot match the outstanding bet to go all-in with his remaining money, and proceed to the showdown (also see side pot).
• Texture of the Board. Classification of the type of board, such as having lots of high cards, or not having many draws (see dry).
• Tight Player. A player who usually folds unless the situation is clearly profitable (e.g., one who folds most hands before the flop in Hold’em).
• Time Charge. A fee charged to the players in a poker game by a casino or other host of the game, typically collected once every 30 minutes.
• Top Pair. Matching the highest community card in Hold’em, such as having Q-J with a board of Q-7-4.
• Trap. To play a strong hand as though it were weak, hoping to lure a weaker hand into betting. Usually a check-raise, or a slow-play.
• Trips. Three of a kind, formed with one hole card and two cards of matching rank on the board. A strong hand, but not well-disguised (compare set).
• Turn. The fourth community card dealt in Hold’em, followed by the third betting round.
• Unsuited. Two cards of different suits (also called offsuit, compare suited).
• Value Bet. To bet with the expectation of winning if called (compare bluff).
• Wild Game. A game with a lot of raising and re-raising. Also called an action game.
