Structure Learning for Approximate Solution of Many-Player Games
Zun Li, Michael P. Wellman
University of Michigan, Ann Arbor
{lizun, wellman}@umich.edu

Normal Form Games: Limitations

• In an N-player normal-form game G, agent n ∈ {1, ..., N} chooses an action a_n ∈ {1, ..., M} and receives payoff u_n(a) as a function of the agents' joint action a.
• The payoff for n under a joint mixed strategy σ is u_n(σ) ≜ E_{a∼σ}[u_n(a_n, a_{−n})]; the deviation payoff of n to action m under σ is u_n(m, σ_{−n}) ≜ E_{a_{−n}∼σ_{−n}}[u_n(m, a_{−n})].
• Nash equilibrium: σ is an ε-Nash equilibrium if max_{n, a_n} u_n(a_n, σ_{−n}) − u_n(σ) ≤ ε (a Monte Carlo sketch of this quantity appears after these panels).
• The representational complexity is O(N M^N), which is prohibitive when N is large. Need a more succinct representation!
• The computational complexity of finding a Nash equilibrium is PPAD-complete. Need more advanced computational tools!

Structured Game Models

• Games with Symmetry [3]:
  – Anonymous game: agent n's payoff depends only on its own action and how many agents choose each action: u_n(a_n, a_{−n}) = u_n(a_n, f_1, ..., f_M).
  – Symmetric game: ∀n, u_n = u.
  – Role-symmetric game: let R(n) ∈ {1, ..., K} denote the role of agent n; the payoff of agent n depends on its own action and the action distribution within each role: u_n(a_n, a_{−n}) = u_n^{R(n)}(a_n, f_{1,1}, ..., f_{1,M}, ..., f_{K,M}).
• Games with Sparsity: in a graphical game [2], agent n's payoff depends only on the joint action profile over its neighborhood N(n) on an interaction graph: u_n(a_n, a_{−n}) = u_n(a_n, a_{N(n)}).

Game Model Learning [4]

• Game model learning: solve a complex unknown game by learning a succinct representation of it in a hypothesis game space whose structure can be exploited for equilibrium computation; the solution of the learned model serves as an approximate solution of the original game.
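As a concrete illustration of the regret definition above, the following is a minimal Monte Carlo sketch (not the paper's procedure) that estimates the ε for which a mixed profile σ is an ε-Nash equilibrium. It assumes the payoff function is available as a Python callable payoff(n, a); the function name and sampling scheme are illustrative assumptions.

```python
import numpy as np

def regret_estimate(payoff, sigma, num_samples=1000, seed=None):
    """Monte Carlo estimate of max_{n, a_n} u_n(a_n, sigma_{-n}) - u_n(sigma).

    payoff(n, a) -> float : payoff to agent n under joint pure profile a (assumed given)
    sigma : (N, M) array, sigma[n, m] = probability that agent n plays action m
    """
    rng = np.random.default_rng(seed)
    N, M = sigma.shape
    # Sample joint profiles a ~ sigma (one column per agent)
    profiles = np.stack([rng.choice(M, size=num_samples, p=sigma[n]) for n in range(N)], axis=1)

    # u_n(sigma): expected payoff of agent n under the joint mixed strategy
    u_sigma = np.array([np.mean([payoff(n, a) for a in profiles]) for n in range(N)])

    # u_n(m, sigma_{-n}): deviation payoff of agent n to each pure action m
    dev = np.zeros((N, M))
    for n in range(N):
        for m in range(M):
            deviated = profiles.copy()
            deviated[:, n] = m  # agent n deviates to m; the others still follow sigma_{-n}
            dev[n, m] = np.mean([payoff(n, a) for a in deviated])

    # epsilon: the largest gain any agent can obtain by a unilateral pure deviation
    return float(np.max(dev - u_sigma[:, None]))
```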

Empirical Game Models [1]

• Empirical Game-Theoretic Analysis (EGTA) employs simulation or sampling to induce a game model.
• Formally, in EGTA the multiagent environment is represented by a game oracle O (e.g., a simulator).
• A dataset D of action-payoff tuples (a, u) can be collected by querying the oracle, where u is the (noisy) payoff vector associated with action profile a.
• A normal-form game model induced from D is called an empirical game.
• In EGTA, the game analyst does not need to store the whole game matrix to compute an approximate Nash equilibrium.

Iterative Structure Learning Framework

• The only explicit game descriptors are the sets of agents and actions. Starting with an arbitrary guess solution σ*, each iteration:
  – queries oracle O in the region of σ*, obtaining through this online sampling process a new dataset, which is added to the data buffer D;
  – through offline interaction with D, learns a game model using function approximators, and solves it to reach the next σ*. (A minimal sketch of this loop follows Figure 1.)

Figure 1: Game Model Learning & Iterative Structure Learning Framework. (Diagram: queries to oracle O in the region of σ* fill the data buffer D; payoff function regression and game structure learning map the target game in the full game space to a game model G with structure S in the hypothesis game space; equilibrium computation on G yields the next σ*, and ultimately a final hypothesis and equilibrium.)
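The loop below is a minimal sketch of this framework under assumed interfaces: oracle.sample(a) returning a noisy payoff vector, sample_profiles_near(sigma, k) drawing k profiles in the region of σ*, fit_game_model(D) fitting a structured game model to the buffer, and solve(model) computing an equilibrium of the learned model. None of these names come from the paper; they stand in for the components of Figure 1.

```python
def iterative_structure_learning(oracle, init_sigma, num_iters=10, queries_per_iter=100):
    """Sketch of the iteration: query near sigma*, learn a structured model offline, re-solve."""
    data_buffer = []          # D: all (profile, noisy payoff vector) pairs collected so far
    sigma, model = init_sigma, None
    for _ in range(num_iters):
        # Online phase: query the oracle O in the region of the current sigma*
        for a in sample_profiles_near(sigma, queries_per_iter):
            data_buffer.append((a, oracle.sample(a)))
        # Offline phase: fit a succinct (e.g., role-symmetric or graphical) model to D ...
        model = fit_game_model(data_buffer)
        # ... and solve it to obtain the next candidate sigma*
        sigma = solve(model)
    return sigma, model
```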

K-Roles: Learning Role Symmetry

• Hyperparameter K̂: the number of roles.
• Idea: represent each agent by its vector of deviation payoffs and apply unsupervised learning to these vector embeddings (see the sketch after these three panels).
• Can be regarded as feature extraction.

G3L: Learning Graphical Structure

• Hyperparameter κ̂: the maximum neighborhood size.
• Idea: greedily learn a graphical model guided by payoff training loss.
• Can be regarded as feature selection.

Symmetry Can Arise from Sparsity

• u_n = y_n − ζ · x_n, where y_n is a symmetric-game term, x_n is a graphical-game term, and ζ ≥ 0 is a structure parameter defining a spectrum of games between perfect symmetry and perfect sparsity (a toy instance appears after Figure 4 below).
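As referenced above, here is a minimal sketch of the K-Roles idea of clustering deviation-payoff embeddings. The use of k-means (and scikit-learn) is an assumption for illustration, not necessarily the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_roles(dev_payoffs, num_roles):
    """Assign agents to roles by clustering their deviation-payoff embeddings.

    dev_payoffs : (N, M) array with dev_payoffs[n, m] ~ u_n(m, sigma_{-n})
    num_roles   : the hyperparameter K-hat
    Returns an array of N role labels in {0, ..., num_roles - 1}.
    """
    km = KMeans(n_clusters=num_roles, n_init=10).fit(np.asarray(dev_payoffs))
    return km.labels_
```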

Figure 2: Performance of K-Roles on a 300-agent, 3-action, 3-role role-symmetric game. Left and right panels respectively measure equilibrium quality (regret vs. number of iterations) and structure quality (Rand index vs. number of iterations) w.r.t. the true game model.

Figure 3: Performance of G3L on a 100-agent, 2-action graphical game. Left and right panels respectively measure equilibrium quality (regret vs. number of iterations) and structure quality (graph score vs. number of iterations) w.r.t. the true game model.

Figure 4: Performance of all methods (K-Roles, G3L, FPP, IBR) on an approximately structured game class; regret vs. the structure parameter ζ.
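Figure 4 sweeps the structure parameter ζ from the "Symmetry Can Arise from Sparsity" panel. The toy payoff below is a purely illustrative instance of the form u_n = y_n − ζ·x_n; the particular symmetric term y_n and graphical term x_n used here are assumptions, not the game class from the paper.

```python
import numpy as np

def mixed_structure_payoff(n, a, adjacency, zeta, num_actions):
    """Toy payoff u_n = y_n - zeta * x_n interpolating between symmetry and sparsity.

    a         : length-N integer action profile
    adjacency : (N, N) 0/1 interaction graph
    zeta      : structure parameter; zeta = 0 gives a purely symmetric game
    """
    counts = np.bincount(a, minlength=num_actions)   # global action counts f_1, ..., f_M
    y = counts[a[n]] / len(a)                        # symmetric term: own action and global counts only
    neighbors = np.flatnonzero(adjacency[n])
    x = float(np.mean(a[neighbors] == a[n])) if neighbors.size else 0.0  # graphical term: neighbors only
    return y - zeta * x
```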

References

[1] Wellman, Michael P. "Methods for empirical game-theoretic analysis." AAAI, 2006.
[2] Kearns, Michael, Michael L. Littman, and Satinder Singh. "Graphical models for game theory." UAI, 2001.
[3] Jiang, Albert Xin, Kevin Leyton-Brown, and Navin A. R. Bhat. "Action-graph games." Games and Economic Behavior 71.1 (2011): 141-173.
[4] Vorobeychik, Yevgeniy, Michael P. Wellman, and Satinder Singh. "Learning payoff functions in infinite games." Machine Learning 67(1-2): 145-168, 2007.