Game Theory, On-Line Prediction and Boosting

Proceedings of the Ninth Annual Conference on Computational Learning Theory, 1996. Game Theory, On-line Prediction and Boosting Yoav Freund Robert E. Schapire AT&T Laboratories 600 Mountain Avenue Murray Hill, NJ 07974-0636 yoav, schapire ¡ @research.att.com Abstract for learning to play repeated games based on the on-line prediction methods of Littlestone and Warmuth [15]. The We study the close connections between game the- analysis of this algorithmyields a new (as far as we know) and ory, on-line prediction and boosting. After a brief simple proof of von Neumann's famous minmax theorem, as review of game theory, we describe an algorithm well as a provable method of approximately solving a game. for learning to play repeated games based on the In the last part of the paper we show that the on-line on-line prediction methods of Littlestone and War- prediction model is obtained by applying the game-playing muth. The analysis of this algorithm yields a sim- algorithm to an appropriate choice of game and that boosting ple proof of von Neumann's famous minmax theo- is obtained by applying the same algorithm to the ªdualº of rem, as well as a provable method of approximately this game. solving a game. We then show that the on-line prediction model is obtained by applying this game- 2 GAME THEORY playing algorithm to an appropriate choice of game We begin with a review of basic game theory. Further back- and that boosting is obtained by applying the same ground can be found in any introductorytext on game theory; algorithm to the ªdualº of this game. see for instance Fudenberg and Tirole [11]. We study two- person games in normal form. That is, each game is de®ned by a matrix ¦ . There are two players called the row player 1 INTRODUCTION and column player. To play the game, the row player chooses a row § , and, simultaneously, the column player chooses a The purpose of this paper is to bring out the close connec- © § ¨ column ¨ . The selected entry is the loss suffered by tions between game theory, on-line prediction and boosting. the row player. Brie¯y, game theory is the study of games and other interac- For instance, the loss matrix for the children's game tions of various sorts. On-line prediction is a learning model ªRock, Paper, Scissorsº is given by: in which an agent predicts the classi®cation of a sequence of items and attempts to minimize the total number of prediction R P S R 1 errors. Finally, boosting is a method of converting a ªweakº 2 1 0 learning algorithm which performs only slightly better than P 0 1 1 random guessing into one that performs extremely well. 2 S 1 All three of these topics will be explained in more detail 1 0 2 below. All have been studied extensively in the past. In this The row player's goal is to minimize its loss. Often, the paper, the close relationship between these three seemingly goal of the column player is to maximize this loss, in which unrelated topics will be brought out. case the game is said to be ªzero-sum.º Most of our results Here is an outline of the paper. We will begin with a are given in the context of a zero-sum game. However, our review of game theory. Then we will describe an algorithm results also apply when no assumptions are made about the Home page: ªhttp://www.research.att.com/orgs/ssr/people/uidº. goal or strategy of the column player. We return to this point Expected to change to ªhttp://www.research.att.com/Äuidº some- below. For the sake of simplicity, we assume that all the losses ¥ time in the near future (for uid ¢¤£ yoav, schapire ). are in the range 0 1 . Simple scaling can be used to get more general results. Also, we restrict ourselves to the case Permission to make digital/hard copy of all or part of this mate- where the number of choices available to each player is ®nite. rial without fee is granted provided that copies are not made or However, most of the results translate with very mild addi- distributed for pro®t or commercial advantage, the ACM copy- tional assumptions to cases in which the number of choices right/server notice, the title of the publication and its date appear, is in®nite. For a discussion of in®nite matrix games see, for and notice is given that copying is by permission of the Association for Computing Machinery, Inc. (ACM). To copy otherwise, to re- instance, Chapter 2 in Ferguson [3]. publish, to post on servers or to redistribute to lists, requires prior speci®c permission and/or a fee. 2.1 RANDOMIZED PLAY We might go on naively to conjecture that the advantage of As described above, the players choose a single row or col- playing last is strict for some games so that, at least in some umn. Usually, this choice of play is allowed to be random- cases, the inequality in Eq. (2) is strict. ized. That is, the row player chooses a distribution over the Surprisingly, it turns out not to matter which player plays ®rst. Von Neumann's well-known minmax theorem states rows of ¦ , and (simultaneously) the column player chooses that the outcome is the same in either case so that a distribution ¡ over columns. The row player's expected loss is easily computed as ¨ max min M P Q min max M P Q 3 ¢¤£ T Q P P Q § ¦ § ¨ §¡ ¨ ©¨ ¦ ¡ ¦ ¦ ¥ for every matrix . The common value of the two sides For ease of notation, we will often denote this quantity by of the equality is called the value of the game ¦ . A proof of the minmax theorem will be given in Section 2.5. ¡¤ ¦ , and refer to it simply as the loss (rather than In words, Eq. (3) means that the row player has a (min- expected loss). In addition, if the row player chooses a dis- £ max) strategy such that regardless of the strategy ¡ played ¨ tribution but the column player chooses a single column , ¡ by the column player, the loss suffered ¦ will be at § ¦ § ¨ then the (expected) loss is which we denote most . Symmetrically, it means that the column player has ¦ ¨ ¦ § ¡¤ by . The notation is de®ned analogously. a (maxmin) strategy ¡ such that, regardless of the strategy § Individual (deterministically chosen) rows and columns played by the row player the loss will be at least . This ¨ are called pure strategies. Randomized plays de®ned by means that the strategies ¡ and are optimal in a strong distributions and ¡ over rows and columns are called sense. mixed strategies. The number of rows of the matrix ¦ will Thus, classical game theory says that given a (zero-sum) be denoted by . game ¦ , one should play using a minmax strategy. Such a 2.2 SEQUENTIAL PLAY strategy can be computed using linear programming. Up until now, we have assumed that the players choose their However, there are a number of problems with this ap- (pure or mixed) strategies simultaneously. Suppose now that proach. For instance, instead play is sequential. That is, suppose that the column ¦ may be unknown; player chooses its strategy ¡ after the row player has chosen and announced its strategy . Assume further that the column ¦ may be so large that computing a minmax strategy player's goal is to maximize the row player's loss (i.e., that using linear programming is infeasible; the game is zero-sum). Then given , such a ªworst-caseº or ªadversarialº column player will choose ¡ to maximize the column player may not be truly adversarial and may ¡¤ ¦ ; that is, if the row player plays mixed strategy , behave in a manner that admits loss signi®cantly smaller then its payoff will be than the game value . max M P Q 1 Q Overcoming these dif®culties in the one-shot game is (It is understood here and throughout the paper that maxQ hopeless. But suppose instead that we are playing the game denotes maximum over all probability distributions over col- repeatedly. Then it is natural to ask if one can learn to play umns; similarly, minP will always denote minimum over well against the particular opponent that is being faced. all probability distributions over rows. These extrema exist because the set of distributionsover a ®nite space is compact.) 2.4 REPEATED PLAY Knowing this, the row player should choose to mini- Such a model of repeated play can be formalized as described mize Eq. (1), so the row player's loss will be below. To emphasize the roles of the two players, we refer to the row player as the learner and the column player as the min max M P Q P Q environment. A mixed strategy realizing this minimum is called a min- Let ¦ be a matrix, possibly unknown to the learner. The max strategy. game is played repeatedly in a sequence of rounds. On round Suppose now that the column player plays ®rst and the ¨ 1 : row player can choose its play with the bene®t of knowing 1. the learner chooses mixed strategy ; the column player's chosen strategy ¡ . Then by a symmetric argument, the loss of the row player will be 2. the environment chooses mixed strategy ¡ (which may max min M P Q be chosen with knowledge of ) Q P ¡ § ¡ and a realizing the maximum is called a maxmin strategy. 3. the learner is permitted to observe the loss ¦ for each row § ; this is the loss it would have suffered had it 2.3 THE MINMAX THEOREM played using pure strategy § ; Intuitively, we expect the player who chooses its strategy last ¦ ¡ to have the advantage since it plays knowing its opponent's 4.

Load more