
TTIC 31250: An Introduction to the Theory of Machine Learning

Learning and Game Theory

Avrim Blum 5/13/20, 5/18/20

• Zero-sum games, Minimax Optimality & the Minimax Thm; connection to Boosting & Regret Minimization
• General-sum games, Nash and Correlated Equilibria; Internal/Swap Regret Minimization

Game theory

• Field developed by economists to study social & economic interactions.
  – Wanted to understand why people behave the way they do in different economic situations. Effects of incentives. Rational explanation of behavior.
• “Game” = interaction between parties with their own interests. Could be called “interaction theory”.
• Important for understanding/improving large systems:
  – Internet routing, social networks, e-commerce
  – Problems like spam, etc.

Game Theory: Setting

• Have a collection of participants, or players.
• Each has a set of choices, or strategies, for how to play/behave.
• Combined behavior results in payoffs (a satisfaction level) for each player.

Start by talking about the important case of 2-player zero-sum games.


2-Player Zero-Sum games

• Two players, Row and Col. Zero-sum means that what’s good for one is bad for the other.
• Game defined by a matrix with a row for each of Row’s options and a column for each of Col’s options. Matrix R gives the row player’s payoffs, C gives the column player’s payoffs, and R + C = 0.
• E.g., penalty shot [Matrix R; payoff 1 to the shooter = GOAALLL!!!, payoff 0 = no goal]:

                     goalie Left   goalie Right
    shooter Left          0              1
    shooter Right         1              0

Consider the following scenario…

• Shooter has a penalty shot. Can choose to shoot left or shoot right.
• Goalie can choose to dive left or dive right.
• If the goalie guesses correctly, (s)he saves the day. If not, it’s a goooooaaaaall!
• Vice-versa for the shooter.
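As a concrete illustration (not from the slides; a minimal sketch assuming NumPy, with function names of my own choosing), one can represent the shooter's payoff matrix R and compute expected payoffs under mixed strategies:

import numpy as np

# Row player's (shooter's) payoff matrix R for the penalty shot:
# rows = shooter's actions (shoot Left, shoot Right),
# columns = goalie's actions (dive Left, dive Right).
R = np.array([[0.0, 1.0],
              [1.0, 0.0]])

def expected_payoff(R, p_row, p_col):
    """Expected payoff to the row player when Row mixes according to p_row
    and Col mixes according to p_col (both probability vectors)."""
    return float(np.asarray(p_row) @ R @ np.asarray(p_col))

# Shooter shoots left half the time; goalie always dives left.
print(expected_payoff(R, [0.5, 0.5], [1.0, 0.0]))   # 0.5
# Zero-sum: the goalie's payoff matrix is C = -R, so R + C = 0.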


Minimax-optimal strategies

• A minimax optimal strategy is a (randomized) strategy that has the best guarantee on its expected payoff, over choices of the opponent [it maximizes the minimum].
• I.e., the thing to play if your opponent knows you well.

Minimax-optimal strategies

• What are the minimax optimal strategies for this game?

                     goalie Left   goalie Right
    shooter Left          0              1
    shooter Right         1              0

• Minimax optimal strategy for the shooter is 50/50. Guarantees expected payoff ≥ 1/2 no matter what the goalie does.
• Minimax optimal strategy for the goalie is 50/50. Guarantees expected shooter payoff ≤ 1/2 no matter what the shooter does.
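To make the word “guarantees” concrete, here is a small sketch (my own, assuming NumPy) of the worst-case value each player locks in by publishing a mixed strategy; it confirms the 1/2 figures above:

import numpy as np

R = np.array([[0.0, 1.0],    # shooter's payoffs, as in the matrix above
              [1.0, 0.0]])

def guarantee_row(R, p_row):
    """Worst-case expected payoff to the row player if Col best-responds
    to the published mixed strategy p_row."""
    return float(np.min(np.asarray(p_row) @ R))

def guarantee_col(R, p_col):
    """Worst-case cap on the row player's payoff that Col enforces by
    playing p_col (Row then best-responds)."""
    return float(np.max(R @ np.asarray(p_col)))

print(guarantee_row(R, [0.5, 0.5]))   # 0.5: shooter gets >= 1/2 regardless
print(guarantee_col(R, [0.5, 0.5]))   # 0.5: goalie holds shooter to <= 1/2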


Minimax-optimal strategies

• How about for a goalie who is weaker on the left? [Matrix R]:

                     goalie Left   goalie Right
    shooter Left          ½              1
    shooter Right         1              0

• Minimax optimal for the shooter is (2/3, 1/3). Guarantees expected gain at least 2/3.
• Minimax optimal for the goalie is also (2/3, 1/3). Guarantees expected loss at most 2/3.

Minimax Theorem (von Neumann 1928)

• Every 2-player zero-sum game has a unique value V.
• Minimax optimal strategy for R guarantees R’s expected gain at least V.
• Minimax optimal strategy for C guarantees C’s expected loss at most V.
• Counterintuitive: it means it doesn’t hurt to publish your strategy if both players are optimal. (Borel had proved this for symmetric 5x5 games but thought it was false for larger games.)
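A minimal sketch (my own; brute force is plenty for a 2x2 game and is not the method used later in the lecture) that recovers the (2/3, 1/3) shooter strategy and the value 2/3 for the weaker-goalie game above:

import numpy as np

# Penalty shot where the goalie is weaker diving left: shooter's payoffs.
R = np.array([[0.5, 1.0],
              [1.0, 0.0]])

def maximin_search(R, steps=10001):
    """Brute-force search over p = Pr[shoot left] for the mix maximizing the
    worst-case expected payoff over goalie actions; fine for 2x2 games."""
    best_p, best_val = 0.0, -np.inf
    for p in np.linspace(0.0, 1.0, steps):
        val = float(np.min(np.array([p, 1.0 - p]) @ R))
        if val > best_val:
            best_p, best_val = p, val
    return best_p, best_val

print(maximin_search(R))   # approx (0.667, 0.667): shoot left 2/3 of the time, value 2/3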


Minimax-optimal strategies

• Claim: no-regret strategies will do nearly as well as, or better than, minimax optimal against any sequence of opponent plays.
  – They do nearly as well as the best fixed choice in hindsight.
  – This implies doing nearly as well as the best distribution in hindsight.
  – Which implies doing nearly as well as minimax optimal!

Proof of minimax thm using RWM

• Suppose for contradiction it was false.
• This means some game G has V_C > V_R:
  – If the Column player commits first, there exists a row that gets the Row player at least V_C.
  – But if the Row player has to commit first, the Column player can make him get only V_R.
• Scale the matrix so payoffs to the row player are in [-1, 0]. Say V_R = V_C − δ.


Proof contd

• Now, consider playing the randomized weighted-majority algorithm as Row, against a Col who plays optimally against Row’s distribution.
• In T steps, in expectation:
  – Alg gets ≥ [best row in hindsight] − 2(T log n)^{1/2}.
  – Best row in hindsight ≥ T·V_C  [best response to the opponent’s empirical distribution].
  – Alg ≤ T·V_R  [each time, the opponent knows your randomized strategy].
  – The gap is δT. This contradicts the assumption once δT > 2(T log n)^{1/2}, i.e., once T > 4 log(n)/δ².

What if two regret minimizers play each other?

• Then their time-average strategies must approach minimax optimality.
  1. If Row’s time-average is far from minimax, then Col has a strategy that in hindsight substantially beats the value of the game.
  2. So, by Col’s no-regret guarantee, Col must substantially beat the value of the game.
  3. So Row will do substantially worse than the value.
  4. This contradicts the no-regret guarantee for Row.
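The following is a rough sketch (my own, with illustrative parameter values) of two Hedge/RWM-style learners playing the weaker-goalie game against each other; as claimed above, their time-averaged strategies approach the minimax-optimal (2/3, 1/3) and the average payoff approaches the value 2/3:

import numpy as np

R = np.array([[0.5, 1.0],    # row (shooter) payoffs; the value of this game is 2/3
              [1.0, 0.0]])

def hedge_update(w, payoff, eta):
    """Multiplicative-weights update on payoffs in [0, 1]."""
    return w * np.exp(eta * payoff)

def selfplay(R, T=20000, eta=0.05):
    n, m = R.shape
    w_row, w_col = np.ones(n), np.ones(m)
    avg_row, avg_col = np.zeros(n), np.zeros(m)
    for _ in range(T):
        p = w_row / w_row.sum()                   # Row's current mixed strategy
        q = w_col / w_col.sum()                   # Col's current mixed strategy
        avg_row += p / T
        avg_col += q / T
        w_row = hedge_update(w_row, R @ q, eta)          # Row maximizes its payoff
        w_col = hedge_update(w_col, 1.0 - p @ R, eta)    # Col minimizes Row's payoff
    return avg_row, avg_col

p_bar, q_bar = selfplay(R)
print(p_bar, q_bar)          # both time-averages are approx (2/3, 1/3)
print(p_bar @ R @ q_bar)     # approx 2/3, the value of the game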


Boosting & game theory

• Suppose I have an algorithm A that, for any distribution (weighting fn) over a dataset S, can produce a rule h ∈ H that gets < 45% error.
• Adaboost gives a way to use such an A to get error → 0 at a good rate, using weighted votes of the rules produced.
• How can we see that this is even possible?

Boosting & game theory

• Let’s assume the class H is finite.
• Think of a matrix game where the columns are indexed by the examples in S and the rows are indexed by the hypotheses h in H.
• M_ij = 1 if h_i(x_j) is correct, else M_ij = −1.


Boosting & game theory

[Matrix game: rows h_1, h_2, …, h_m; columns x_1, x_2, x_3, …, x_n; entry ij = 1 if h_i(x_j) is correct, −1 if incorrect.]

• Assume that for any distribution D over the columns, there exists a row with E[payoff] ≥ 0.1.
• Minimax implies there exists a weighting α over the rows such that for every x_i the expected payoff is ≥ 0.1.
• So, sgn(Σ_t α_t h_t) is correct on all x_i. The weighted vote has L1 margin at least 0.1.
• AdaBoost gives you a way to get this with only access via the weak learner. But this at least implies existence…
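To see the game-theoretic picture in code, here is a toy sketch (my own construction; the matrix M, the step size, and the round count are illustrative, and this is only boosting-flavored, not AdaBoost itself): the booster reweights examples multiplicatively, the “weak learner” returns the best row against the current weighting, and the resulting weighted vote is correct on every column:

import numpy as np

# Toy correctness matrix M: rows = hypotheses in H, columns = examples in S.
# M[i, j] = +1 if h_i(x_j) is correct, -1 otherwise.  No single row is correct
# everywhere, but every distribution over columns has some row with expected
# payoff >= 1/3 (a "weak learner" with margin gamma = 1/3).
M = np.array([[ 1,  1, -1],
              [ 1, -1,  1],
              [-1,  1,  1]])

def boost_by_game(M, rounds=200, eta=0.5):
    """Booster keeps a distribution D over examples (columns); each round the
    'weak learner' returns the row with the best D-weighted payoff, and D is
    shifted toward the examples that row gets wrong.  Returns how many times
    each row was chosen (the vote weights alpha)."""
    m, n = M.shape
    D = np.ones(n) / n
    votes = np.zeros(m)
    for _ in range(rounds):
        i = int(np.argmax(M @ D))        # weak learner's response to D
        votes[i] += 1
        D = D * np.exp(-eta * M[i])      # up-weight examples h_i gets wrong
        D /= D.sum()
    return votes

alpha = boost_by_game(M)
print(np.sign(alpha @ M))   # [1, 1, 1]: the weighted vote sgn(sum_t alpha_t h_t)
                            # is correct on every example in this toy case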


Internal/Swap Regret and Correlated Equilibria

General-sum games

• In general-sum games, we can get win-win and lose-lose situations.
• E.g., “what side of the sidewalk to walk on?” (rows = you, columns = the person walking towards you; entries are (your payoff, their payoff)):

                 Left        Right
    Left        (1,1)      (-1,-1)
    Right      (-1,-1)      (1,1)

Nash Equilibrium

• A Nash Equilibrium is a stable pair of strategies (could be randomized).
• Stable means that neither player has an incentive to deviate on their own.
• E.g., for “what side of the sidewalk to walk on” above, the NE are: both left, both right, or both 50/50.
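A small sketch (my own helper, assuming NumPy) of checking the Nash-equilibrium condition directly on the sidewalk game; it confirms the three equilibria listed above:

import numpy as np

# "Which side of the sidewalk?" game: R = your payoffs, C = the other walker's.
R = np.array([[ 1, -1],
              [-1,  1]])
C = np.array([[ 1, -1],
              [-1,  1]])

def is_nash(R, C, p, q, tol=1e-9):
    """Check that the mixed-strategy pair (p, q) is a Nash equilibrium:
    neither player can gain by deviating to some pure strategy."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    row_ok = p @ R @ q >= np.max(R @ q) - tol     # no better row deviation
    col_ok = p @ C @ q >= np.max(p @ C) - tol     # no better column deviation
    return bool(row_ok and col_ok)

print(is_nash(R, C, [1, 0], [1, 0]))           # True: both walk left
print(is_nash(R, C, [0, 1], [0, 1]))           # True: both walk right
print(is_nash(R, C, [0.5, 0.5], [0.5, 0.5]))   # True: both mix 50/50
print(is_nash(R, C, [1, 0], [0, 1]))           # False: you left, they right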

Existence of NE

• Nash (1950) proved: any general-sum game must have at least one such equilibrium.
  – Might require randomized strategies (called “mixed strategies”).
• This also yields the minimax thm as a corollary:
  – Pick some NE and let V = the value to the row player in that equilibrium.
  – Since it’s a NE, neither player can do better even knowing the (randomized) strategy their opponent is playing.
  – So, they’re each playing minimax optimal.

What if all players minimize regret?

• In zero-sum games, empirical frequencies quickly approach minimax optimal.
• In general-sum games, does behavior quickly (or at all) approach a Nash equilibrium?
• After all, a Nash Eq is exactly a set of distributions that are no-regret wrt each other. So if the distributions stabilize, they must converge to a Nash equilibrium.
• Well, unfortunately, no.


A bad example for general-sum games

• Augmented Shapley game from [Zinkevich04]:
  – The first 3 rows/cols are the Shapley game (rock/paper/scissors, but if both do the same action then both lose).
  – A 4th action, “play foosball”, has a slight negative payoff if the other player is still doing r/p/s, but a positive payoff if the other player does the 4th action too.
• RWM will cycle among the first 3 actions and have no regret, but do worse than the only Nash Equilibrium, which is both playing foosball.
• We didn’t really expect this to work given how hard NE can be to find…

A bad example for general-sum games

• [Balcan-Constantin-Mehta12]: failure to converge even in rank-1 games (games where R+C has rank 1).
• Interesting because one can find equilibria efficiently in such games.
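Below is a small experimental harness (my own; not Zinkevich's construction, and it uses only the plain 3x3 Shapley payoffs that appear on a later slide, without the fourth "foosball" action) for running two multiplicative-weights learners on a general-sum game and recording the time-averaged strategies and empirical joint distribution of play, which are the quantities these non-convergence results are about:

import numpy as np

# Plain 3x3 Shapley-style game (rock/paper/scissors where a tie means both
# players lose): R = row player's payoffs, C = column player's payoffs.
R = np.array([[-1, -1,  1],
              [ 1, -1, -1],
              [-1,  1, -1]])
C = R.T

def run_dynamics(R, C, T=5000, eta=0.1, seed=0):
    """Two independent multiplicative-weights learners, each updating on its
    own expected payoffs.  Returns the time-averaged mixed strategies and the
    empirical distribution over sampled joint action profiles."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    w_row, w_col = np.ones(n), np.ones(m)
    avg_row, avg_col = np.zeros(n), np.zeros(m)
    joint = np.zeros((n, m))
    for _ in range(T):
        p, q = w_row / w_row.sum(), w_col / w_col.sum()
        avg_row += p / T
        avg_col += q / T
        i, j = rng.choice(n, p=p), rng.choice(m, p=q)
        joint[i, j] += 1 / T
        w_row *= np.exp(eta * (R @ q + 1) / 2)    # payoffs rescaled to [0, 1]
        w_col *= np.exp(eta * (p @ C + 1) / 2)
    return avg_row, avg_col, joint

avg_row, avg_col, joint = run_dynamics(R, C)
print(avg_row, avg_col)   # time-averaged strategies
print(joint)              # empirical joint distribution: is play settling down?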


What can we say?

• If players minimize “internal” or “swap” regret, then the empirical distribution of play approaches correlated equilibrium.
  – Foster & Vohra, Hart & Mas-Colell, …
  – Though this doesn’t imply that play is stabilizing.
• What are internal/swap regret and correlated equilibria?

More general forms of regret

1. “Best expert” or “external” regret:
   – Given n strategies. Compete with the best of them in hindsight.
2. “Sleeping expert” or “regret with time-intervals”:
   – Given n strategies and k properties. Let S_i be the set of days satisfying property i (the sets might overlap). Want to simultaneously achieve low regret over each S_i.
3. “Internal” or “swap” regret: like (2), except that S_i = the set of days on which we chose strategy i.
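To pin down definitions (1) and (3), here is a sketch (my own helper functions) computing the external regret and the swap regret of a fixed sequence of actions and cost vectors; swap regret is always at least external regret, since constant swap functions are allowed:

import numpy as np

def external_regret(actions, costs):
    """Our total cost minus the cost of the best single action in hindsight.
    actions: length-T list of chosen action indices; costs: T x n array."""
    costs = np.asarray(costs, float)
    ours = costs[np.arange(len(actions)), actions].sum()
    return ours - costs.sum(axis=0).min()

def swap_regret(actions, costs):
    """Our total cost minus the cost of the best swap function f in hindsight:
    for each action j separately, the rounds on which we played j may be
    reassigned to the single best alternative for those rounds."""
    costs, actions = np.asarray(costs, float), np.asarray(actions)
    ours = costs[np.arange(len(actions)), actions].sum()
    best = sum(costs[actions == j].sum(axis=0).min() for j in np.unique(actions))
    return ours - best

# Toy sequence: 3 actions, 4 rounds of cost vectors in [0, 1].
acts = [0, 1, 0, 2]
cost = [[0.2, 0.9, 0.5],
        [0.7, 0.1, 0.8],
        [0.6, 0.4, 0.3],
        [0.5, 0.5, 0.0]]
print(external_regret(acts, cost))   # can be negative
print(swap_regret(acts, cost))       # always >= the external regret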


Internal/swap-regret

• E.g., each day we pick one stock to buy shares in.
• We don’t want to have regret of the form “every time I bought IBM, I should have bought Microsoft instead”.
• Formally, swap regret is wrt the optimal function f: {1,…,n} → {1,…,n} such that every time you played action j, it plays f(j).

Weird… why care?

“Correlated equilibrium”
• A distribution over entries in the matrix, such that if a trusted party chooses one at random and tells you your part, you have no incentive to deviate.
• E.g., the Shapley game (entries are (row payoff, column payoff)):

              R          P          S
    R     (-1,-1)    (-1, 1)    ( 1,-1)
    P     ( 1,-1)    (-1,-1)    (-1, 1)
    S     (-1, 1)    ( 1,-1)    (-1,-1)

• In general-sum games, if all players have low swap-regret, then the empirical distribution of play is an apx correlated equilibrium.
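A minimal sketch (my own helper, using the Shapley payoffs from the matrix above) of checking the correlated-equilibrium condition directly: under the uniform distribution over the six off-diagonal cells, no player who is told its action gains by switching:

import numpy as np

# Shapley game from the matrix above: R = row payoffs, C = column payoffs.
R = np.array([[-1, -1,  1],
              [ 1, -1, -1],
              [-1,  1, -1]])
C = R.T

def is_correlated_eq(R, C, P, tol=1e-9):
    """P[i, j] = probability the trusted party draws the joint action (i, j).
    Check that a player told to play some action cannot improve its expected
    payoff, conditioned on that advice, by switching to another action."""
    for i in range(P.shape[0]):              # row player is told to play i
        if P[i].sum() > 0:
            cond = P[i] / P[i].sum()         # column's action given that advice
            if (R @ cond).max() > R[i] @ cond + tol:
                return False
    for j in range(P.shape[1]):              # column player is told to play j
        if P[:, j].sum() > 0:
            cond = P[:, j] / P[:, j].sum()
            if (cond @ C).max() > cond @ C[:, j] + tol:
                return False
    return True

# Uniform distribution over the six off-diagonal cells: nobody is told to tie.
P = (np.ones((3, 3)) - np.eye(3)) / 6
print(is_correlated_eq(R, C, P))   # True: a correlated equilibrium of this game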

Connection

• If all parties run a low swap regret algorithm, then the empirical distribution of play is an apx correlated equilibrium.
• Correlator chooses a random time t ∈ {1,2,…,T}. It tells each player to play the action j they played at time t (but does not reveal the value of t).
• Expected incentive to deviate: Σ_j Pr(j)·(Regret | j) = the swap-regret of the algorithm.
• So, this suggests correlated equilibria may be natural things to see in multi-agent systems where individuals are optimizing for themselves.

Correlated vs Coarse-correlated Eq

In both cases: a distribution over entries in the matrix. Think of a third party choosing from this distribution and telling you your part as “advice”.

“Correlated equilibrium”
• You have no incentive to deviate, even after seeing what the advice is.

“Coarse-Correlated equilibrium”
• If your only choice is to see and follow, or not to see at all, you would prefer the former.

Low external-regret ⇒ apx coarse correlated equilibrium.
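To contrast the two notions in code, here is a sketch (my own; it uses ordinary rock-paper-scissors with 0 for ties rather than the Shapley variant above, because that gives a clean separation): the distribution that always recommends the same action to both players is a coarse-correlated equilibrium but not a correlated equilibrium:

import numpy as np

# Ordinary rock-paper-scissors: 0 for a tie, +1 for a win, -1 for a loss.
R = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]])
C = -R    # zero-sum

def is_coarse_correlated_eq(R, C, P, tol=1e-9):
    """No player gains by ignoring the advice entirely and committing to one
    fixed action, compared with following the advice (in expectation over P)."""
    follow_row, follow_col = (P * R).sum(), (P * C).sum()
    best_fixed_row = (R @ P.sum(axis=0)).max()   # best fixed row vs Col's marginal
    best_fixed_col = (P.sum(axis=1) @ C).max()   # best fixed col vs Row's marginal
    return follow_row >= best_fixed_row - tol and follow_col >= best_fixed_col - tol

def is_correlated_eq(R, C, P, tol=1e-9):
    """No player gains by deviating after seeing its own recommended action."""
    for i in range(P.shape[0]):
        if P[i].sum() > 0:
            cond = P[i] / P[i].sum()
            if (R @ cond).max() > R[i] @ cond + tol:
                return False
    for j in range(P.shape[1]):
        if P[:, j].sum() > 0:
            cond = P[:, j] / P[:, j].sum()
            if (cond @ C).max() > cond @ C[:, j] + tol:
                return False
    return True

P = np.eye(3) / 3    # advice: both players play the same uniformly random action
print(is_coarse_correlated_eq(R, C, P))   # True
print(is_correlated_eq(R, C, P))          # False: having seen the advice, you'd deviate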


Internal/swap-regret, contd

Algorithms for achieving low regret of this form:
• Foster & Vohra, Hart & Mas-Colell, Fudenberg & Levine.
• Will present the method of [BM05] showing how to convert any “best expert” algorithm into one achieving low swap regret.
  – Unfortunately, the #steps to achieve low swap regret is O(n log n) rather than O(log n).

Can convert any “best expert” algorithm A into one achieving low swap regret. Idea:

– Instantiate one copy A_j responsible for our expected regret over the times we play j.
  [Diagram: the master algorithm wraps copies A_1, …, A_n; each copy A_j proposes a distribution q_j; the master plays the distribution p satisfying p = pQ, where Q is the matrix whose rows are the q_j; when the cost vector c arrives, copy A_j is charged the scaled cost vector p_j·c.]
– This lets us view p_j either as the probability we play action j, or as the probability we play alg A_j.
– Give A_j the fraction p_j of the cost vector, i.e., p_j·c.
– A_j guarantees: Σ_t (p_j^t c^t)·q_j^t ≤ min_i Σ_t p_j^t c_i^t + [regret term].
– Write this as: Σ_t p_j^t (q_j^t · c^t) ≤ min_i Σ_t p_j^t c_i^t + [regret term].


Can convert any “best expert” algorithm A into one achieving low swap regret, contd:

– Sum over j to get: Σ_t p^t Q^t c^t ≤ Σ_j min_i Σ_t p_j^t c_i^t + n·[regret term].
  – The left-hand side is our total cost, since we play p^t = p^t Q^t.
  – On the right-hand side, for each j we can move our probability to its own best target i = f(j).
– So we get swap-regret at most n times the original external regret.

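A compact sketch of the reduction just described (my own code; “Hedge” stands in for the generic best-expert algorithm A, and power iteration for solving p = pQ is one convenient choice, not necessarily what [BM05] do):

import numpy as np

class Hedge:
    """A standard multiplicative-weights 'best expert' algorithm on losses in [0, 1]."""
    def __init__(self, n, eta=0.1):
        self.w = np.ones(n)
        self.eta = eta
    def distribution(self):
        return self.w / self.w.sum()
    def update(self, loss_vector):
        self.w = self.w * np.exp(-self.eta * np.asarray(loss_vector))

class SwapRegretWrapper:
    """One copy A_j of a best-expert algorithm per action j.  Each round, stack
    the copies' distributions q_j as the rows of a stochastic matrix Q, play a
    distribution p with p = pQ, and charge copy A_j the scaled cost p_j * c."""
    def __init__(self, n, eta=0.1):
        self.n = n
        self.copies = [Hedge(n, eta) for _ in range(n)]
    def distribution(self):
        Q = np.array([a.distribution() for a in self.copies])   # row j is q_j
        p = np.ones(self.n) / self.n
        for _ in range(100):            # power iteration to solve p = pQ
            p = p @ Q
        return p / p.sum()
    def update(self, p, cost_vector):
        for j, copy in enumerate(self.copies):
            copy.update(p[j] * np.asarray(cost_vector))

# Usage sketch against an arbitrary (here random) sequence of cost vectors.
rng = np.random.default_rng(0)
alg = SwapRegretWrapper(n=3)
for t in range(1000):
    p = alg.distribution()
    c = rng.random(3)       # this round's cost for each of the 3 actions
    alg.update(p, c)
print(alg.distribution())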
