Game Theory I

15-859(B) Machine Learning Theory
Learning and Game Theory
Avrim Blum

Plan for Today & Next Time
• 2-player zero-sum games
• 2-player general-sum games
  – Nash equilibria
  – Correlated equilibria
• Internal/swap regret and the connection to correlated equilibria
• Many-player games with structure: congestion games / exact potential games
  – Best-response dynamics
  – Price of anarchy, price of stability

2-Player Zero-Sum Games
• Two players, R and C. Zero-sum means that what's good for one is bad for the other.
• The game is defined by a matrix with a row for each of R's options and a column for each of C's options. The matrix tells who wins how much.
• An entry (x,y) means: x = payoff to the row player, y = payoff to the column player. "Zero sum" means that y = -x.
• E.g., penalty shot (rows = shooter, columns = goalie):

                   goalie
                   Left      Right
    shooter Left   (0,0)     (1,-1)
            Right  (1,-1)    (0,0)

  Here (1,-1) means "GOAALLL!!!" and (0,0) means no goal.

Game Theory Terminology
• Rows and columns are called pure strategies.
• Randomized algorithms are called mixed strategies.
• "Zero sum" means that the game is purely competitive: every entry (x,y) satisfies x+y=0. (The game doesn't have to be fair.)

Minimax-Optimal Strategies
• A minimax optimal strategy is a (randomized) strategy that has the best guarantee on its expected gain over the choices of the opponent. [It maximizes the minimum.]
• I.e., it is the thing to play if your opponent knows you well.
• Can solve for minimax-optimal strategies using linear programming (a short code sketch of this LP appears below, after the next two slides).
• No-regret strategies will do nearly as well or better.

Minimax Theorem (von Neumann 1928)
• Every 2-player zero-sum game has a unique value V.
• The minimax optimal strategy for R guarantees R's expected gain is at least V.
• The minimax optimal strategy for C guarantees C's expected loss is at most V.
• The existence of no-regret strategies gives one way of proving the theorem.

Interesting game to think about
• Graph G, source s, sink t.
• Player A chooses a path P from s to t.
• Player B chooses an edge e in G.
• If e is in P, B wins. Else A wins.
• What are the minimax optimal strategies for B and A?
• Note that we can run RWM for B and best response for A (a shortest-path algorithm on B's weights) to get approximately minimax-optimal strategies.
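The linear program mentioned on the minimax slide is easy to set up in practice. Below is a minimal sketch, not from the slides, assuming numpy and scipy are available (the function name minimax_row_strategy is mine). It solves for the row player's minimax-optimal mixed strategy of a zero-sum game given by its payoff matrix, using the penalty-shot game as the test case.

    # Sketch: minimax-optimal strategy for the row player via linear programming.
    # Variables: p_1..p_n (mixed strategy) and v (guaranteed expected gain).
    # Maximize v subject to: for every column j, sum_i p_i * M[i][j] >= v,
    # with p a probability distribution.
    import numpy as np
    from scipy.optimize import linprog

    def minimax_row_strategy(M):
        """M[i][j] = payoff to the row player; returns (p, v)."""
        M = np.asarray(M, dtype=float)
        n_rows, n_cols = M.shape
        # Objective: minimize -v (i.e., maximize v). Variable order: p_1..p_n, v.
        c = np.zeros(n_rows + 1)
        c[-1] = -1.0
        # For each column j:  v - sum_i p_i * M[i][j] <= 0
        A_ub = np.hstack([-M.T, np.ones((n_cols, 1))])
        b_ub = np.zeros(n_cols)
        # Probabilities sum to 1.
        A_eq = np.hstack([np.ones((1, n_rows)), np.zeros((1, 1))])
        b_eq = np.array([1.0])
        bounds = [(0, None)] * n_rows + [(None, None)]  # v may be negative
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        return res.x[:-1], res.x[-1]

    # Penalty-shot game from the slides (shooter's payoffs):
    # should return p = (0.5, 0.5) with value 0.5.
    print(minimax_row_strategy([[0, 1], [1, 0]]))

The same LP with the transposed, negated matrix gives the column player's minimax-optimal strategy.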
Now, to General-Sum Games…

General-sum games
• In general-sum games, we can get win-win and lose-lose situations.
• E.g., "what side of the sidewalk do we walk on?" (rows = you, columns = a person walking towards you):

               Left      Right
    Left       (1,1)     (-1,-1)
    Right      (-1,-1)   (1,1)

• E.g., "which movie should we go to?":

                    Bully     Hunger Games
    Bully           (8,2)     (0,0)
    Hunger Games    (0,0)     (2,8)

• There is no longer a unique "value" to the game.

Nash Equilibrium
• A Nash Equilibrium is a stable pair of strategies (which could be randomized).
• Stable means that neither player has an incentive to deviate on their own.
• E.g., in the sidewalk game, the Nash equilibria are: both Left, both Right, or both 50/50.

Uses
• Economists use games and equilibria as models of interaction.
• E.g., pollution / prisoner's dilemma (imagine pollution controls cost $4 but improve everyone's environment by $3):

                       don't pollute    pollute
    don't pollute      (2,2)            (-1,3)
    pollute            (3,-1)           (0,0)

• We need to add extra incentives to get good overall behavior.

NE can do strange things: Braess paradox
• Road network with traffic going from s to t; the travel time of each edge is either 1 (independent of traffic) or t(x) = x, where x is the fraction of traffic on that edge.
• Two routes from s to t: one is an x-edge followed by a 1-edge, the other a 1-edge followed by an x-edge.
• Fine. The NE is a 50/50 split. Travel time = 1.5.
• Now add a new superhighway of cost 0 connecting the midpoints of the two routes. New NE: everyone uses the zig-zag path (x-edge, superhighway, x-edge). Travel time = 2.

Existence of NE
• Nash (1950) proved: any general-sum game must have at least one such equilibrium.
  – It might require mixed strategies.
• This also yields the minimax theorem as a corollary:
  – Pick some NE and let V = the value to the row player in that equilibrium.
  – Since it's a NE, neither player can do better even knowing the (randomized) strategy their opponent is playing.
  – So, they're each playing minimax optimal.

Existence of NE in 2-player games
• The proof will be non-constructive.
• Unlike the case of zero-sum games, we do not know any polynomial-time algorithm for finding Nash equilibria in n × n general-sum games. [This is known to be "PPAD-hard".]
• Notation:
  – Assume an n × n matrix.
  – Use (p1,...,pn) to denote a mixed strategy for the row player, and (q1,...,qn) to denote a mixed strategy for the column player.

Proof
• We'll start with Brouwer's fixed point theorem:
  – Let S be a compact convex region in R^n and let f: S → S be a continuous function.
  – Then there must exist x ∈ S such that f(x) = x.
  – x is called a "fixed point" of f.
• Simple case: S is the interval [0,1].
• We will care about: S = {(p,q): p, q are legal probability distributions on 1,...,n}. I.e., S = simplex_n × simplex_n.

Proof (cont)
• S = {(p,q): p, q are mixed strategies}.
• We want to define f(p,q) = (p',q') such that:
  – f is continuous. This means that changing p or q a little bit shouldn't cause p' or q' to change a lot.
  – Any fixed point of f is a Nash Equilibrium.
• Then Brouwer will imply the existence of NE.

Try #1
• What about f(p,q) = (p',q') where p' is a best response to q and q' is a best response to p?
• Problem: not necessarily well-defined.
  – E.g., in the penalty shot game, if p = (0.5,0.5) then q' could be anything.
• Problem: also not continuous.
  – E.g., if p = (0.51, 0.49) then q' = (1,0); if p = (0.49, 0.51) then q' = (0,1).

Instead we will use...
• f(p,q) = (p',q') such that:
  – q' maximizes [(expected gain wrt p) - ||q-q'||^2]
  – p' maximizes [(expected gain wrt q) - ||p-p'||^2]
• Note: quadratic + linear = quadratic.
• f is well-defined and continuous, since the quadratic has a unique maximum and a small change to p, q only moves this maximum a little.
• Also, fixed point = NE: if a player had even a tiny incentive to move, the map would move it a little bit, so at an exact fixed point there is no incentive to move at all.
• So, that's it! (A small numerical sketch of this map appears below.)
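To make the map above concrete, here is a small numerical sketch; it is not from the slides, it assumes numpy, and the helper names are mine. Completing the square shows that maximizing [(expected gain) - (squared distance moved)] over the simplex is just a Euclidean projection, so f is easy to evaluate. The snippet only checks that Nash equilibria of the sidewalk game are fixed points of f; Brouwer's theorem guarantees a fixed point exists, but iterating f is not claimed to find one.

    import numpy as np

    def project_to_simplex(v):
        # Euclidean projection of v onto {x : x >= 0, sum(x) = 1} (standard sort-based method).
        u = np.sort(v)[::-1]
        css = np.cumsum(u)
        rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
        theta = (1.0 - css[rho]) / (rho + 1)
        return np.maximum(v + theta, 0.0)

    def f(p, q, R, C):
        # The map from the proof: each player moves to the point maximizing
        # (expected gain against the opponent's current strategy) minus
        # (squared distance moved). Completing the square, the maximizer is
        # the projection of (current strategy + half the gain vector).
        p_new = project_to_simplex(p + (R @ q) / 2.0)    # row player's gains vs. q are R q
        q_new = project_to_simplex(q + (C.T @ p) / 2.0)  # column player's gains vs. p are p^T C
        return p_new, q_new

    # Sidewalk game from the slides: (1,1) if both pick the same side, (-1,-1) otherwise.
    R = np.array([[1.0, -1.0], [-1.0, 1.0]])  # row player's payoff matrix
    C = np.array([[1.0, -1.0], [-1.0, 1.0]])  # column player's payoff matrix
    p = q = np.array([1.0, 0.0])              # the "both Left" Nash equilibrium
    print(f(p, q, R, C))                      # stays at ((1,0),(1,0)): a fixed point
    p = q = np.array([0.5, 0.5])              # the mixed 50/50 Nash equilibrium
    print(f(p, q, R, C))                      # stays at ((0.5,0.5),(0.5,0.5)): also fixed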
What if all players minimize regret?
• In zero-sum games, the empirical frequencies quickly approach minimax optimal.
• In general-sum games, does behavior quickly (or at all) approach a Nash equilibrium? (After all, a Nash equilibrium is exactly a set of distributions that are no-regret with respect to each other.)
• Well, unfortunately, no.

Internal Regret and Correlated Equilibria

A bad example for general-sum games
• Augmented Shapley game from [Z04]: "RPSF".
• The first 3 rows/cols are the Shapley game (rock/paper/scissors, except that if both players do the same action then both lose).
• The 4th action, "play foosball", has a slight negative payoff if the other player is still doing rock/paper/scissors, but a positive payoff if the other player plays the 4th action too.
• No-regret algorithms will cycle among the first 3 actions and have no regret, but do worse than the only Nash equilibrium, which is both players playing foosball.
• We didn't really expect this to work, given how hard NE can be to find…

What can we say?
• If algorithms minimize "internal" or "swap" regret, then the empirical distribution of play approaches a correlated equilibrium.
  – Foster & Vohra, Hart & Mas-Colell, …
  – Though this doesn't imply that play is stabilizing.
• So what are internal regret and correlated equilibria?

More general forms of regret
1. "Best expert" or "external" regret:
   – Given n strategies, compete with the best of them in hindsight.
2. "Sleeping expert" or "regret with time-intervals":
   – Given n strategies and k properties, let S_i be the set of days satisfying property i (the sets might overlap). Want to simultaneously achieve low regret over each S_i.
3. "Internal" or "swap" regret: like (2), except that S_i = the set of days on which we chose strategy i.

Internal/swap regret
• E.g., each day we pick one stock to buy shares in.
  – We don't want to have regret of the form "every time I bought IBM, I should have bought Microsoft instead."
• Formally, regret is with respect to the optimal function f:{1,…,N} → {1,…,N} such that every time you played action j, it plays f(j) instead. (A sketch of how to measure this on a played history follows below.)
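As an illustration, here is a minimal sketch, not from the slides, of measuring external regret versus swap regret on a played history; it assumes numpy, and the function names and toy numbers are mine. The toy history has small external regret but larger swap regret, which is exactly the "every time I bought IBM, I should have bought Microsoft" phenomenon.

    import numpy as np

    def external_regret(plays, gains):
        # Regret vs. the best single fixed action in hindsight.
        gains = np.asarray(gains, dtype=float)
        earned = sum(gains[t, a] for t, a in enumerate(plays))
        return gains.sum(axis=0).max() - earned

    def swap_regret(plays, gains, n_actions):
        # Regret vs. the best modification rule f: every time we played j, f plays f(j).
        # The best f can be chosen separately for each action j.
        gains = np.asarray(gains, dtype=float)
        total = 0.0
        for j in range(n_actions):
            days_j = [t for t, a in enumerate(plays) if a == j]  # days we chose strategy j
            if days_j:
                best_replacement = gains[days_j].sum(axis=0).max()
                actually_earned = gains[days_j, j].sum()
                total += best_replacement - actually_earned
        return total

    # Toy history with 2 actions over 4 days: whenever we played action 0,
    # action 1 did better, and vice versa.
    plays = [0, 0, 1, 1]
    gains = [[1, 0], [0, 2], [2, 0], [0, 1]]   # gains[t][i] = payoff of action i on day t
    print(external_regret(plays, gains))            # 1.0 (best fixed action earns 3, we earned 2)
    print(swap_regret(plays, gains, n_actions=2))   # 2.0 (swapping 0->1 and 1->0 would earn 4)

If every player keeps its swap regret low, the empirical joint distribution of play approaches a correlated equilibrium, which is the claim on the "What can we say?" slide.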
