
TTIC 31250: An Introduction to the Theory of Machine Learning

Learning and Game Theory

Avrim Blum 5/13/20, 5/18/20

• Zero-sum games, Minimax Optimality & the Minimax Thm; connection to Boosting & Regret Minimization
• General-sum games, Nash and Correlated Equilibria; Internal/Swap Regret Minimization

Game theory

• Field developed by economists to study social & economic interactions.
  – Wanted to understand why people behave the way they do in different economic situations. Effects of incentives. Rational explanation of behavior.
• “Game” = interaction between parties with their own interests. Could be called “interaction theory”.
• Important for understanding/improving large systems:
  – Internet routing, social networks, e-commerce
  – Problems like spam, etc.

Game Theory: Setting

• Have a collection of participants, or players.
• Each has a set of choices, or strategies, for how to play/behave.
• Combined behavior results in payoffs (a satisfaction level) for each player.

Start by talking about the important case of 2-player zero-sum games.


2-Player Zero-Sum games

• Two players, Row and Col. Zero-sum means that what’s good for one is bad for the other.
• Game defined by a matrix with a row for each of Row’s options and a column for each of Col’s options. Matrix R gives the row player’s payoffs, C gives the column player’s payoffs, and R + C = 0.
• E.g., penalty shot [Matrix R; payoff 1 to the shooter = GOAALLL!!!, payoff 0 = no goal]:

                     goalie Left   goalie Right
    shooter Left          0              1
    shooter Right         1              0

Consider the following scenario…

• Shooter has a penalty shot. Can choose to shoot left or shoot right.
• Goalie can choose to dive left or dive right.
• If the goalie guesses correctly, (s)he saves the day. If not, it’s a goooooaaaaall!
• Vice-versa for the shooter.
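As a concrete illustration (not from the slides; a minimal sketch assuming NumPy, with function names of my own choosing), one can represent the shooter's payoff matrix R and compute expected payoffs under mixed strategies:

import numpy as np

# Row player's (shooter's) payoff matrix R for the penalty shot:
# rows = shooter's actions (shoot Left, shoot Right),
# columns = goalie's actions (dive Left, dive Right).
R = np.array([[0.0, 1.0],
              [1.0, 0.0]])

def expected_payoff(R, p_row, p_col):
    """Expected payoff to the row player when Row mixes according to p_row
    and Col mixes according to p_col (both probability vectors)."""
    return float(np.asarray(p_row) @ R @ np.asarray(p_col))

# Shooter shoots left half the time; goalie always dives left.
print(expected_payoff(R, [0.5, 0.5], [1.0, 0.0]))   # 0.5
# Zero-sum: the goalie's payoff matrix is C = -R, so R + C = 0.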


Minimax-optimal strategies

• A minimax optimal strategy is a (randomized) strategy that has the best guarantee on its expected payoff, over choices of the opponent [it maximizes the minimum].
• I.e., the thing to play if your opponent knows you well.

Minimax-optimal strategies

• What are the minimax optimal strategies for this game?

                     goalie Left   goalie Right
    shooter Left          0              1
    shooter Right         1              0

• Minimax optimal strategy for the shooter is 50/50. Guarantees expected payoff ≥ 1/2 no matter what the goalie does.
• Minimax optimal strategy for the goalie is 50/50. Guarantees expected shooter payoff ≤ 1/2 no matter what the shooter does.
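To make the word “guarantees” concrete, here is a small sketch (my own, assuming NumPy) of the worst-case value each player locks in by publishing a mixed strategy; it confirms the 1/2 figures above:

import numpy as np

R = np.array([[0.0, 1.0],    # shooter's payoffs, as in the matrix above
              [1.0, 0.0]])

def guarantee_row(R, p_row):
    """Worst-case expected payoff to the row player if Col best-responds
    to the published mixed strategy p_row."""
    return float(np.min(np.asarray(p_row) @ R))

def guarantee_col(R, p_col):
    """Worst-case cap on the row player's payoff that Col enforces by
    playing p_col (Row then best-responds)."""
    return float(np.max(R @ np.asarray(p_col)))

print(guarantee_row(R, [0.5, 0.5]))   # 0.5: shooter gets >= 1/2 regardless
print(guarantee_col(R, [0.5, 0.5]))   # 0.5: goalie holds shooter to <= 1/2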


Minimax-optimal strategies

• How about for a goalie who is weaker on the left? [Matrix R]:

                     goalie Left   goalie Right
    shooter Left          ½              1
    shooter Right         1              0

• Minimax optimal for the shooter is (2/3, 1/3). Guarantees expected gain at least 2/3.
• Minimax optimal for the goalie is also (2/3, 1/3). Guarantees expected loss at most 2/3.

Minimax Theorem (von Neumann 1928)

• Every 2-player zero-sum game has a unique value V.
• Minimax optimal strategy for R guarantees R’s expected gain at least V.
• Minimax optimal strategy for C guarantees C’s expected loss at most V.
• Counterintuitive: it means it doesn’t hurt to publish your strategy if both players are optimal. (Borel had proved this for symmetric 5x5 games but thought it was false for larger games.)
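A minimal sketch (my own; brute force is plenty for a 2x2 game and is not the method used later in the lecture) that recovers the (2/3, 1/3) shooter strategy and the value 2/3 for the weaker-goalie game above:

import numpy as np

# Penalty shot where the goalie is weaker diving left: shooter's payoffs.
R = np.array([[0.5, 1.0],
              [1.0, 0.0]])

def maximin_search(R, steps=10001):
    """Brute-force search over p = Pr[shoot left] for the mix maximizing the
    worst-case expected payoff over goalie actions; fine for 2x2 games."""
    best_p, best_val = 0.0, -np.inf
    for p in np.linspace(0.0, 1.0, steps):
        val = float(np.min(np.array([p, 1.0 - p]) @ R))
        if val > best_val:
            best_p, best_val = p, val
    return best_p, best_val

print(maximin_search(R))   # approx (0.667, 0.667): shoot left 2/3 of the time, value 2/3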


Minimax-optimal strategies

• Claim: no-regret strategies will do nearly as well as, or better than, minimax optimal against any sequence of opponent plays.
  – They do nearly as well as the best fixed choice in hindsight.
  – This implies doing nearly as well as the best distribution in hindsight.
  – Which implies doing nearly as well as minimax optimal!

Proof of minimax thm using RWM

• Suppose for contradiction it was false.
• This means some game G has V_C > V_R:
  – If the Column player commits first, there exists a row that gets the Row player at least V_C.
  – But if the Row player has to commit first, the Column player can make him get only V_R.
• Scale the matrix so payoffs to the row player are in [-1, 0]. Say V_R = V_C − δ.


Proof contd

• Now, consider playing the randomized weighted-majority algorithm as Row, against a Col who plays optimally against Row’s distribution.
• In T steps, in expectation:
  – Alg gets ≥ [best row in hindsight] − 2(T log n)^{1/2}.
  – Best row in hindsight ≥ T·V_C  [best response to the opponent’s empirical distribution].
  – Alg ≤ T·V_R  [each time, the opponent knows your randomized strategy].
  – The gap is δT. This contradicts the assumption once δT > 2(T log n)^{1/2}, i.e., once T > 4 log(n)/δ².

What if two regret minimizers play each other?

• Then their time-average strategies must approach minimax optimality.
  1. If Row’s time-average is far from minimax, then Col has a strategy that in hindsight substantially beats the value of the game.
  2. So, by Col’s no-regret guarantee, Col must substantially beat the value of the game.
  3. So Row will do substantially worse than the value.
  4. This contradicts the no-regret guarantee for Row.
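The following is a rough sketch (my own, with illustrative parameter values) of two Hedge/RWM-style learners playing the weaker-goalie game against each other; as claimed above, their time-averaged strategies approach the minimax-optimal (2/3, 1/3) and the average payoff approaches the value 2/3:

import numpy as np

R = np.array([[0.5, 1.0],    # row (shooter) payoffs; the value of this game is 2/3
              [1.0, 0.0]])

def hedge_update(w, payoff, eta):
    """Multiplicative-weights update on payoffs in [0, 1]."""
    return w * np.exp(eta * payoff)

def selfplay(R, T=20000, eta=0.05):
    n, m = R.shape
    w_row, w_col = np.ones(n), np.ones(m)
    avg_row, avg_col = np.zeros(n), np.zeros(m)
    for _ in range(T):
        p = w_row / w_row.sum()                   # Row's current mixed strategy
        q = w_col / w_col.sum()                   # Col's current mixed strategy
        avg_row += p / T
        avg_col += q / T
        w_row = hedge_update(w_row, R @ q, eta)          # Row maximizes its payoff
        w_col = hedge_update(w_col, 1.0 - p @ R, eta)    # Col minimizes Row's payoff
    return avg_row, avg_col

p_bar, q_bar = selfplay(R)
print(p_bar, q_bar)          # both time-averages are approx (2/3, 1/3)
print(p_bar @ R @ q_bar)     # approx 2/3, the value of the game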


Boosting & game theory

• Suppose I have an algorithm A that, for any distribution (weighting fn) over a dataset S, can produce a rule h ∈ H that gets < 45% error.
• Adaboost gives a way to use such an A to get error → 0 at a good rate, using weighted votes of the rules produced.
• How can we see that this is even possible?

Boosting & game theory

• Let’s assume the class H is finite.
• Think of a matrix game where the columns are indexed by the examples in S and the rows are indexed by the hypotheses h in H.
• M_ij = 1 if h_i(x_j) is correct, else M_ij = −1.


Boosting & game theory

[Matrix game: rows h_1, h_2, …, h_m; columns x_1, x_2, x_3, …, x_n; entry ij = 1 if h_i(x_j) is correct, −1 if incorrect.]

• Assume that for any distribution D over the columns, there exists a row with E[payoff] ≥ 0.1.
• Minimax implies there exists a weighting α over the rows such that for every x_i the expected payoff is ≥ 0.1.
• So, sgn(Σ_t α_t h_t) is correct on all x_i. The weighted vote has L1 margin at least 0.1.
• AdaBoost gives you a way to get this with only access via the weak learner. But this at least implies existence…
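To see the game-theoretic picture in code, here is a toy sketch (my own construction; the matrix M, the step size, and the round count are illustrative, and this is only boosting-flavored, not AdaBoost itself): the booster reweights examples multiplicatively, the “weak learner” returns the best row against the current weighting, and the resulting weighted vote is correct on every column:

import numpy as np

# Toy correctness matrix M: rows = hypotheses in H, columns = examples in S.
# M[i, j] = +1 if h_i(x_j) is correct, -1 otherwise.  No single row is correct
# everywhere, but every distribution over columns has some row with expected
# payoff >= 1/3 (a "weak learner" with margin gamma = 1/3).
M = np.array([[ 1,  1, -1],
              [ 1, -1,  1],
              [-1,  1,  1]])

def boost_by_game(M, rounds=200, eta=0.5):
    """Booster keeps a distribution D over examples (columns); each round the
    'weak learner' returns the row with the best D-weighted payoff, and D is
    shifted toward the examples that row gets wrong.  Returns how many times
    each row was chosen (the vote weights alpha)."""
    m, n = M.shape
    D = np.ones(n) / n
    votes = np.zeros(m)
    for _ in range(rounds):
        i = int(np.argmax(M @ D))        # weak learner's response to D
        votes[i] += 1
        D = D * np.exp(-eta * M[i])      # up-weight examples h_i gets wrong
        D /= D.sum()
    return votes

alpha = boost_by_game(M)
print(np.sign(alpha @ M))   # [1, 1, 1]: the weighted vote sgn(sum_t alpha_t h_t)
                            # is correct on every example in this toy case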


Internal/Swap Regret and Correlated Equilibria

General-sum games

• In general-sum games, we can get win-win and lose-lose situations.
• E.g., “what side of the sidewalk to walk on?” (rows = you, columns = the person walking towards you; entries are (your payoff, their payoff)):

                 Left        Right
    Left        (1,1)      (-1,-1)
    Right      (-1,-1)      (1,1)

Nash Equilibrium

• A Nash Equilibrium is a stable pair of strategies (could be randomized).
• Stable means that neither player has an incentive to deviate on their own.
• E.g., for “what side of the sidewalk to walk on” above, the NE are: both left, both right, or both 50/50.
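A small sketch (my own helper, assuming NumPy) of checking the Nash-equilibrium condition directly on the sidewalk game; it confirms the three equilibria listed above:

import numpy as np

# "Which side of the sidewalk?" game: R = your payoffs, C = the other walker's.
R = np.array([[ 1, -1],
              [-1,  1]])
C = np.array([[ 1, -1],
              [-1,  1]])

def is_nash(R, C, p, q, tol=1e-9):
    """Check that the mixed-strategy pair (p, q) is a Nash equilibrium:
    neither player can gain by deviating to some pure strategy."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    row_ok = p @ R @ q >= np.max(R @ q) - tol     # no better row deviation
    col_ok = p @ C @ q >= np.max(p @ C) - tol     # no better column deviation
    return bool(row_ok and col_ok)

print(is_nash(R, C, [1, 0], [1, 0]))           # True: both walk left
print(is_nash(R, C, [0, 1], [0, 1]))           # True: both walk right
print(is_nash(R, C, [0.5, 0.5], [0.5, 0.5]))   # True: both mix 50/50
print(is_nash(R, C, [1, 0], [0, 1]))           # False: you left, they right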

Existence of NE

• Nash (1950) proved: any general-sum game must have at least one such equilibrium.
  – Might require randomized strategies (called “mixed strategies”).
• This also yields the minimax thm as a corollary:
  – Pick some NE and let V = the value to the row player in that equilibrium.
  – Since it’s a NE, neither player can do better even knowing the (randomized) strategy their opponent is playing.
  – So, they’re each playing minimax optimal.

What if all players minimize regret?

• In zero-sum games, empirical frequencies quickly approach minimax optimal.
• In general-sum games, does behavior quickly (or at all) approach a Nash equilibrium?
• After all, a Nash Eq is exactly a set of distributions that are no-regret wrt each other. So if the distributions stabilize, they must converge to a Nash equilibrium.
• Well, unfortunately, no.


A bad example for general-sum games

• Augmented Shapley game from [Zinkevich04]:
  – The first 3 rows/cols are the Shapley game (rock/paper/scissors, but if both do the same action then both lose).
  – A 4th action, “play foosball”, has a slight negative payoff if the other player is still doing r/p/s, but a positive payoff if the other player does the 4th action too.
• RWM will cycle among the first 3 actions and have no regret, but do worse than the only Nash Equilibrium, which is both playing foosball.
• We didn’t really expect this to work given how hard NE can be to find…

A bad example for general-sum games

• [Balcan-Constantin-Mehta12]: failure to converge even in rank-1 games (games where R+C has rank 1).
• Interesting because one can find equilibria efficiently in such games.
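Below is a small experimental harness (my own; not Zinkevich's construction, and it uses only the plain 3x3 Shapley payoffs that appear on a later slide, without the fourth "foosball" action) for running two multiplicative-weights learners on a general-sum game and recording the time-averaged strategies and empirical joint distribution of play, which are the quantities these non-convergence results are about:

import numpy as np

# Plain 3x3 Shapley-style game (rock/paper/scissors where a tie means both
# players lose): R = row player's payoffs, C = column player's payoffs.
R = np.array([[-1, -1,  1],
              [ 1, -1, -1],
              [-1,  1, -1]])
C = R.T

def run_dynamics(R, C, T=5000, eta=0.1, seed=0):
    """Two independent multiplicative-weights learners, each updating on its
    own expected payoffs.  Returns the time-averaged mixed strategies and the
    empirical distribution over sampled joint action profiles."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    w_row, w_col = np.ones(n), np.ones(m)
    avg_row, avg_col = np.zeros(n), np.zeros(m)
    joint = np.zeros((n, m))
    for _ in range(T):
        p, q = w_row / w_row.sum(), w_col / w_col.sum()
        avg_row += p / T
        avg_col += q / T
        i, j = rng.choice(n, p=p), rng.choice(m, p=q)
        joint[i, j] += 1 / T
        w_row *= np.exp(eta * (R @ q + 1) / 2)    # payoffs rescaled to [0, 1]
        w_col *= np.exp(eta * (p @ C + 1) / 2)
    return avg_row, avg_col, joint

avg_row, avg_col, joint = run_dynamics(R, C)
print(avg_row, avg_col)   # time-averaged strategies
print(joint)              # empirical joint distribution: is play settling down?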


What can we say?

• If players minimize “internal” or “swap” regret, then the empirical distribution of play approaches correlated equilibrium.
  – Foster & Vohra, Hart & Mas-Colell, …
  – Though this doesn’t imply that play is stabilizing.
• What are internal/swap regret and correlated equilibria?

More general forms of regret

1. “Best expert” or “external” regret:
   – Given n strategies. Compete with the best of them in hindsight.
2. “Sleeping expert” or “regret with time-intervals”:
   – Given n strategies and k properties. Let S_i be the set of days satisfying property i (the sets might overlap). Want to simultaneously achieve low regret over each S_i.
3. “Internal” or “swap” regret: like (2), except that S_i = the set of days on which we chose strategy i.
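To pin down definitions (1) and (3), here is a sketch (my own helper functions) computing the external regret and the swap regret of a fixed sequence of actions and cost vectors; swap regret is always at least external regret, since constant swap functions are allowed:

import numpy as np

def external_regret(actions, costs):
    """Our total cost minus the cost of the best single action in hindsight.
    actions: length-T list of chosen action indices; costs: T x n array."""
    costs = np.asarray(costs, float)
    ours = costs[np.arange(len(actions)), actions].sum()
    return ours - costs.sum(axis=0).min()

def swap_regret(actions, costs):
    """Our total cost minus the cost of the best swap function f in hindsight:
    for each action j separately, the rounds on which we played j may be
    reassigned to the single best alternative for those rounds."""
    costs, actions = np.asarray(costs, float), np.asarray(actions)
    ours = costs[np.arange(len(actions)), actions].sum()
    best = sum(costs[actions == j].sum(axis=0).min() for j in np.unique(actions))
    return ours - best

# Toy sequence: 3 actions, 4 rounds of cost vectors in [0, 1].
acts = [0, 1, 0, 2]
cost = [[0.2, 0.9, 0.5],
        [0.7, 0.1, 0.8],
        [0.6, 0.4, 0.3],
        [0.5, 0.5, 0.0]]
print(external_regret(acts, cost))   # can be negative
print(swap_regret(acts, cost))       # always >= the external regret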


Internal/swap-regret

• E.g., each day we pick one stock to buy shares in.
• We don’t want to have regret of the form “every time I bought IBM, I should have bought Microsoft instead”.
• Formally, swap regret is wrt the optimal function f: {1,…,n} → {1,…,n} such that every time you played action j, it plays f(j).

Weird… why care?

“Correlated equilibrium”
• A distribution over entries in the matrix, such that if a trusted party chooses one at random and tells you your part, you have no incentive to deviate.
• E.g., the Shapley game (entries are (row payoff, column payoff)):

              R          P          S
    R     (-1,-1)    (-1, 1)    ( 1,-1)
    P     ( 1,-1)    (-1,-1)    (-1, 1)
    S     (-1, 1)    ( 1,-1)    (-1,-1)

• In general-sum games, if all players have low swap-regret, then the empirical distribution of play is an apx correlated equilibrium.
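A minimal sketch (my own helper, using the Shapley payoffs from the matrix above) of checking the correlated-equilibrium condition directly: under the uniform distribution over the six off-diagonal cells, no player who is told its action gains by switching:

import numpy as np

# Shapley game from the matrix above: R = row payoffs, C = column payoffs.
R = np.array([[-1, -1,  1],
              [ 1, -1, -1],
              [-1,  1, -1]])
C = R.T

def is_correlated_eq(R, C, P, tol=1e-9):
    """P[i, j] = probability the trusted party draws the joint action (i, j).
    Check that a player told to play some action cannot improve its expected
    payoff, conditioned on that advice, by switching to another action."""
    for i in range(P.shape[0]):              # row player is told to play i
        if P[i].sum() > 0:
            cond = P[i] / P[i].sum()         # column's action given that advice
            if (R @ cond).max() > R[i] @ cond + tol:
                return False
    for j in range(P.shape[1]):              # column player is told to play j
        if P[:, j].sum() > 0:
            cond = P[:, j] / P[:, j].sum()
            if (cond @ C).max() > cond @ C[:, j] + tol:
                return False
    return True

# Uniform distribution over the six off-diagonal cells: nobody is told to tie.
P = (np.ones((3, 3)) - np.eye(3)) / 6
print(is_correlated_eq(R, C, P))   # True: a correlated equilibrium of this game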

Connection

• If all parties run a low swap regret algorithm, then the empirical distribution of play is an apx correlated equilibrium.
• Correlator chooses a random time t ∈ {1,2,…,T}. It tells each player to play the action j they played at time t (but does not reveal the value of t).
• Expected incentive to deviate: Σ_j Pr(j)·(Regret | j) = the swap-regret of the algorithm.
• So, this suggests correlated equilibria may be natural things to see in multi-agent systems where individuals are optimizing for themselves.

Correlated vs Coarse-correlated Eq

In both cases: a distribution over entries in the matrix. Think of a third party choosing from this distribution and telling you your part as “advice”.

“Correlated equilibrium”
• You have no incentive to deviate, even after seeing what the advice is.

“Coarse-Correlated equilibrium”
• If your only choice is to see and follow, or not to see at all, you would prefer the former.

Low external-regret ⇒ apx coarse correlated equilibrium.
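To contrast the two notions in code, here is a sketch (my own; it uses ordinary rock-paper-scissors with 0 for ties rather than the Shapley variant above, because that gives a clean separation): the distribution that always recommends the same action to both players is a coarse-correlated equilibrium but not a correlated equilibrium:

import numpy as np

# Ordinary rock-paper-scissors: 0 for a tie, +1 for a win, -1 for a loss.
R = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]])
C = -R    # zero-sum

def is_coarse_correlated_eq(R, C, P, tol=1e-9):
    """No player gains by ignoring the advice entirely and committing to one
    fixed action, compared with following the advice (in expectation over P)."""
    follow_row, follow_col = (P * R).sum(), (P * C).sum()
    best_fixed_row = (R @ P.sum(axis=0)).max()   # best fixed row vs Col's marginal
    best_fixed_col = (P.sum(axis=1) @ C).max()   # best fixed col vs Row's marginal
    return follow_row >= best_fixed_row - tol and follow_col >= best_fixed_col - tol

def is_correlated_eq(R, C, P, tol=1e-9):
    """No player gains by deviating after seeing its own recommended action."""
    for i in range(P.shape[0]):
        if P[i].sum() > 0:
            cond = P[i] / P[i].sum()
            if (R @ cond).max() > R[i] @ cond + tol:
                return False
    for j in range(P.shape[1]):
        if P[:, j].sum() > 0:
            cond = P[:, j] / P[:, j].sum()
            if (cond @ C).max() > cond @ C[:, j] + tol:
                return False
    return True

P = np.eye(3) / 3    # advice: both players play the same uniformly random action
print(is_coarse_correlated_eq(R, C, P))   # True
print(is_correlated_eq(R, C, P))          # False: having seen the advice, you'd deviate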


Internal/swap-regret, contd

Algorithms for achieving low regret of this form:
• Foster & Vohra, Hart & Mas-Colell, Fudenberg & Levine.
• Will present the method of [BM05] showing how to convert any “best expert” algorithm into one achieving low swap regret.
  – Unfortunately, the #steps to achieve low swap regret is O(n log n) rather than O(log n).

Can convert any “best expert” algorithm A into one achieving low swap regret. Idea:

– Instantiate one copy A_j responsible for our expected regret over the times we play j.
  [Diagram: the master algorithm wraps copies A_1, …, A_n; each copy A_j proposes a distribution q_j; the master plays the distribution p satisfying p = pQ, where Q is the matrix whose rows are the q_j; when the cost vector c arrives, copy A_j is charged the scaled cost vector p_j·c.]
– This lets us view p_j either as the probability we play action j, or as the probability we play alg A_j.
– Give A_j the fraction p_j of the cost vector, i.e., p_j·c.
– A_j guarantees: Σ_t (p_j^t c^t)·q_j^t ≤ min_i Σ_t p_j^t c_i^t + [regret term].
– Write this as: Σ_t p_j^t (q_j^t · c^t) ≤ min_i Σ_t p_j^t c_i^t + [regret term].


Can convert any “best expert” algorithm A into one achieving low swap regret, contd:

– Sum over j to get: Σ_t p^t Q^t c^t ≤ Σ_j min_i Σ_t p_j^t c_i^t + n·[regret term].
  – The left-hand side is our total cost, since we play p^t = p^t Q^t.
  – On the right-hand side, for each j we can move our probability to its own best target i = f(j).
– So we get swap-regret at most n times the original external regret.

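A compact sketch of the reduction just described (my own code; “Hedge” stands in for the generic best-expert algorithm A, and power iteration for solving p = pQ is one convenient choice, not necessarily what [BM05] do):

import numpy as np

class Hedge:
    """A standard multiplicative-weights 'best expert' algorithm on losses in [0, 1]."""
    def __init__(self, n, eta=0.1):
        self.w = np.ones(n)
        self.eta = eta
    def distribution(self):
        return self.w / self.w.sum()
    def update(self, loss_vector):
        self.w = self.w * np.exp(-self.eta * np.asarray(loss_vector))

class SwapRegretWrapper:
    """One copy A_j of a best-expert algorithm per action j.  Each round, stack
    the copies' distributions q_j as the rows of a stochastic matrix Q, play a
    distribution p with p = pQ, and charge copy A_j the scaled cost p_j * c."""
    def __init__(self, n, eta=0.1):
        self.n = n
        self.copies = [Hedge(n, eta) for _ in range(n)]
    def distribution(self):
        Q = np.array([a.distribution() for a in self.copies])   # row j is q_j
        p = np.ones(self.n) / self.n
        for _ in range(100):            # power iteration to solve p = pQ
            p = p @ Q
        return p / p.sum()
    def update(self, p, cost_vector):
        for j, copy in enumerate(self.copies):
            copy.update(p[j] * np.asarray(cost_vector))

# Usage sketch against an arbitrary (here random) sequence of cost vectors.
rng = np.random.default_rng(0)
alg = SwapRegretWrapper(n=3)
for t in range(1000):
    p = alg.distribution()
    c = rng.random(3)       # this round's cost for each of the 3 actions
    alg.update(p, c)
print(alg.distribution())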
