Game Theory, Machine Learning and Reasoning under Uncertainty

This workshop explores the benefits that may result from carrying out research at the interface between machine learning and game theory. While classical game theory makes limited provision for dealing with uncertainty and noise, research in machine learning, and particularly in probabilistic inference, has produced a remarkable array of powerful algorithms for performing statistical inference on difficult real-world data. Recent work at this interface has suggested computationally tractable algorithms for analysing games with large numbers of players, while insights from game theory have inspired new work on strategic learning behaviour in probabilistic inference and are suggesting new algorithms for intelligent sampling in Markov chain Monte Carlo methods. The goal of this workshop is to explore the significant advantages that game theory and machine learning seem to offer each other, to examine the correspondences and differences between the two fields, and to identify interesting and exciting areas of future work.

Schedule

Morning Session

07:30 Introduction and Aims, Iead Rezek, University of Oxford
07:35 Invited Talk: Machine Learning: Principles, Probabilities and Perspectives, Stephen J. Roberts, University of Oxford
08:05 Invited Talk: Learning Topics in Game-Theoretic Decision Making, Michael Littman, Rutgers
08:35 Invited Talk: Predictive Game Theory, David Wolpert, NASA Ames Research Center
09:05 Break
09:15 Model-based Reinforcement Learning for Partially Observable Games with Sampling-based State Estimation, Hajime Fujita, Nara Institute of Science and Technology
09:40 Effective Negotiation Proposals Using Models of Preference and Risk Behavior, Angelo Restificar, Oregon State University
10:05 Mechanism Design via Machine Learning, Yishay Mansour, Tel-Aviv University
10:30 Ski Break

Afternoon Session

16:00 N-Body Games, Albert Xin Jiang, University of British Columbia
16:25 Probabilistic Inference for Computing Optimal Policies in MDPs, Marc Toussaint, University of Edinburgh
16:50 Graphical Models, Evolutionary Game Theory, and the Power of Randomization, Siddharth Suri, University of Pennsylvania
17:15 Probability Collectives for Adaptive Distributed Control, David H. Wolpert, NASA Ames Research Center
17:40 Break
17:50 Probability Collectives: Examples and Applications, Dev Rajnarayan, Stanford University
18:15 A Formalization of Game Balance Principles, Jeff Long, University of Saskatchewan
18:40 A Stochastic Optimal Control Formulation of Distributed Decision Making, Bert Kappen, Radboud University, Nijmegen
19:05 Discussion and Wrap-Up
19:30 End

Organizers

I. Rezek, University of Oxford, Oxford, UK. [email protected], www.robots.ox.ac.uk/~irezek
A. Rogers, University of Southampton, Southampton, UK. [email protected], www.ecs.soton.ac.uk/people/~acr
David Wolpert, NASA Ames Research Center, California, USA. [email protected], ti.arc.nasa.gov/people/dhw

Abstracts

Learning Topics in Game-Theoretic Decision Making, M. LITTMAN, Rutgers, NJ
This presentation will review some topics of recent interest in AI and economics concerning decision making in a computational game-theory framework. It will highlight areas in which machine learning has played a role and could play a greater role in the future. Covered areas include recent representational and algorithmic advances, stochastic games and reinforcement learning, no-regret algorithms, and the role of various equilibrium concepts.

Machine Learning: Principles, Probabilities and Perspectives, S. J. ROBERTS, University of Oxford, UK
This talk will offer an overview of some of the key principles in machine learning. It will discuss how uncertainty enters at every stage, from data to models, how learning may be defined, and how we may evaluate the value of information. From these simple principles, strategies may be viewed as maximizing expected information. The differences (and similarities) between machine learning and game theory will be considered.
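As a concrete illustration of valuing information, the expected information gain of an observation is the prior entropy minus the expected posterior entropy. The sketch below uses invented numbers for a two-hypothesis problem; it is not drawn from the talk itself.

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# Made-up example: two hypotheses with prior (0.7, 0.3), and a binary
# observation whose likelihoods under each hypothesis are given below.
prior = [0.7, 0.3]
likelihood = {  # P(obs | hypothesis)
    "pos": [0.9, 0.2],
    "neg": [0.1, 0.8],
}

# Expected information gain = H(prior) - E_obs[H(posterior)].
h_prior = entropy(prior)
expected_posterior_entropy = 0.0
for obs, lik in likelihood.items():
    p_obs = sum(l * p for l, p in zip(lik, prior))
    posterior = [l * p / p_obs for l, p in zip(lik, prior)]
    expected_posterior_entropy += p_obs * entropy(posterior)

print(f"expected information gain: {h_prior - expected_posterior_entropy:.3f} bits")
```

Under this view, a strategy that must choose which observation to gather next would pick the one with the largest expected gain.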

Predictive Game Theory, DAVID H. WOLPERT, NASA Ames Research Center
Conventional noncooperative game theory hypothesizes that the joint strategy of a set of reasoning players in a game will necessarily satisfy an "equilibrium concept". All other joint strategies are considered impossible. Under this hypothesis the only issue is which equilibrium concept is "correct". This hypothesis violates the first-principles arguments underlying probability theory. Indeed, probability theory renders moot the controversy over which equilibrium concept is correct: every joint strategy can arise with non-zero probability. Rather than a first-principles derivation of an equilibrium concept, game theory requires a first-principles derivation of a distribution over joint (mixed) strategies. If you wish to distill such a distribution down to the prediction of a single joint strategy, that prediction should be set by decision theory, using your (!) loss function. Accordingly, for any fixed game, the predicted joint strategy, one's "equilibrium concept", will vary with the loss function of the external scientist making the prediction. Game theory based on such considerations is called Predictive Game Theory (PGT). This talk shows how information theory can provide such a distribution over joint strategies. The connection of this distribution to the quantal response equilibrium is elaborated. It is also shown that in many games, having a distribution with support restricted to Nash equilibria, as stipulated by conventional game theory, is impossible. PGT is also used to: i) derive an information-theoretic quantification of the degree of rationality; ii) derive bounded rationality as a cost of computation; iii) elaborate the close formal relationship between game theory and statistical physics; iv) use this relationship to extend game theory to allow stochastically varying numbers of players.

Model-based Reinforcement Learning for Partially Observable Games with Sampling-based State Estimation, HAJIME FUJITA AND SHIN ISHII, Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, JP
We present a model-based reinforcement learning (RL) scheme for large-scale multi-agent problems with partial observability, and apply it to the card game Hearts, a well-defined example of an imperfect-information game. To reduce the computational cost, we use a sampling technique based on Markov chain Monte Carlo (MCMC) in which the heavy integration required for estimation and prediction is approximated by a plausible number of samples. Computer simulation results show that our RL agent can learn an appropriate strategy and exhibit performance comparable to an expert-level human player in this partially observable multi-agent problem.
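The sampling idea in the preceding abstract can be sketched as follows: instead of integrating over every possible assignment of unseen cards, draw a modest number of hidden-state samples consistent with what has been observed and average a value estimate over them. This is a simplified illustration, not the authors' implementation: the hypothetical value_of_deal function stands in for the learned model, and uniform sampling replaces the MCMC chain described in the abstract.

```python
import random

def sample_consistent_deals(my_hand, seen_cards, n_opponents, hand_size, n_samples):
    """Sample complete assignments of the unseen cards to the opponents.

    In the scheme described above the samples would come from an MCMC chain
    conditioned on the full play history; uniform sampling over consistent
    deals is a deliberate simplification.
    """
    deck = [(rank, suit) for rank in range(2, 15) for suit in "CDHS"]
    unseen = [c for c in deck if c not in my_hand and c not in seen_cards]
    deals = []
    for _ in range(n_samples):
        cards = random.sample(unseen, n_opponents * hand_size)
        deals.append([cards[i * hand_size:(i + 1) * hand_size]
                      for i in range(n_opponents)])
    return deals

def estimate_action_value(action, my_hand, seen_cards, value_of_deal,
                          n_opponents=3, hand_size=5, n_samples=200):
    """Approximate E[value | action] by averaging over sampled hidden states."""
    deals = sample_consistent_deals(my_hand, seen_cards,
                                    n_opponents, hand_size, n_samples)
    return sum(value_of_deal(action, deal) for deal in deals) / len(deals)
```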

Effective Negotiation Proposals Using Models of Preference and Risk Behavior, ANGELO RESTIFICAR, Oregon State University, Corvallis, OR, AND PETER HADDAWY, Asian Institute of Technology, Pathumthani, Thailand
In previous work, we inferred implicit preferences and attitudes toward risk by interpreting offer/counter-offer exchanges in negotiation as a choice between a certain offer and a gamble [A. Restificar et al. 2004]. Supervised learning can then be used to construct models of preference and risk behavior by generating training instances from such implicit information. In this paper, we introduce a procedure that uses these learned models to find effective negotiation proposals. Experiments were performed using this procedure via repeated negotiations between a buyer and a seller agent. The results suggest that the use of learned opponent models leads to a significant increase in the number of agreements and a remarkable reduction in the number of negotiation exchanges.
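A minimal sketch of how one exchange might be turned into a supervised training instance, under the abstract's reading of a counter-offer as a preference for a gamble over a certain offer. The feature encoding, numbers, and labels below are invented for illustration.

```python
def training_instance(certain_offer, gamble_value, p_agree, rejected):
    """Encode one offer/counter-offer exchange as a supervised example.

    A rejected certain offer is read as evidence that the agent prefers the
    gamble (keep negotiating, succeed with probability p_agree) over the
    sure amount; feature encoding and labels are illustrative only.
    """
    expected_gain = p_agree * gamble_value - certain_offer
    features = [certain_offer, gamble_value, p_agree, expected_gain]
    label = 1 if rejected else 0  # 1: prefers the gamble, 0: prefers certainty
    return features, label

# Example: a seller turns down a sure 80 hoping for 100 with probability 0.7,
# an expected loss of 10, i.e. risk-seeking behavior at these stakes.
x, y = training_instance(certain_offer=80, gamble_value=100,
                         p_agree=0.7, rejected=True)
```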

Mechanism Design via Machine Learning, YISHAY MANSOUR, Tel-Aviv University, IL
We use techniques from sample complexity in machine learning to reduce problems of incentive-compatible mechanism design to standard algorithmic questions for a wide variety of revenue-maximizing pricing problems. Our reductions imply that given an optimal (or beta-approximation) algorithm for the standard algorithmic problem, we can convert it into a (1 + epsilon)-approximation (or beta(1 + epsilon)-approximation) for the problem of designing a revenue-maximizing incentive-compatible mechanism, so long as the number of bidders is sufficiently large as a function of an appropriate measure of complexity of the comparison class of solutions. We apply these results to the problem of auctioning a digital good, the "attribute auction" problem, and the problem of item-pricing in unlimited-supply combinatorial auctions. From a learning perspective, these settings present unique challenges: in particular, the loss function is discontinuous and asymmetric, and the range of bidders' valuations may be large. This is joint work with Maria-Florina Balcan, Avrim Blum, and Jason D. Hartline.
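To convey the flavor of such reductions, the sketch below implements the classic random-sampling auction for a digital good: bidders are split into two halves, and each half is offered the revenue-optimal fixed price learned from the other half, which keeps the mechanism incentive-compatible. This is a standard construction used here for illustration and is not claimed to be the authors' exact mechanism.

```python
import random

def best_fixed_price(bids):
    """Revenue-optimal single price against a list of bids (the 'learning' step)."""
    best_price, best_revenue = 0.0, 0.0
    for p in sorted(set(bids)):
        revenue = p * sum(1 for b in bids if b >= p)
        if revenue > best_revenue:
            best_price, best_revenue = p, revenue
    return best_price

def random_sampling_auction(bids):
    """Offer each half of the bidders the optimal price learned on the other half.

    Because a bidder's own report never influences the price she faces,
    truthful bidding is a dominant strategy.
    """
    idx = list(range(len(bids)))
    random.shuffle(idx)
    half_a, half_b = idx[:len(idx) // 2], idx[len(idx) // 2:]
    outcome = {}
    for group, other in ((half_a, half_b), (half_b, half_a)):
        price = best_fixed_price([bids[i] for i in other])
        for i in group:
            # Winners receive the good at 'price'; others are not served.
            outcome[i] = price if bids[i] >= price else None
    return outcome

print(random_sampling_auction([1.0, 3.0, 3.0, 5.0, 8.0, 2.0]))
```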

N-Body Games, ALBERT XIN JIANG, KEVIN LEYTON-BROWN, AND NANDO DE FREITAS, University of British Columbia, CA
This paper introduces n-body games, a new compact game-theoretic representation which permits a wide variety of game-theoretic quantities to be efficiently computed, both approximately and exactly. This representation is useful for games in which actions are chosen from a metric space (e.g., points in space) and in which payoffs are computed as a function of the distances between players' action choices.

Probabilistic Inference for Computing Optimal Policies in MDPs, MARC TOUSSAINT AND AMOS STORKEY, University of Edinburgh, UK
We investigate how the problem of planning in a stochastic environment can be translated into a problem of inference. Previous work on planning by probabilistic inference was limited in that a total time T had to be fixed and the computed policy was not optimal with respect to expected rewards. The generative model we propose treats the total time T as a random variable, and we show equivalence to maximizing the expected future return for arbitrary reward functions. Optimal policies are computed via Expectation-Maximization.
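The equivalence claimed in the preceding abstract, that a random total time T recovers the expected discounted return, can be checked numerically: with a geometric prior P(T = t) = (1 - gamma) * gamma^t and a reward in [0, 1] read as the probability of a binary "success" observation at the final time, the likelihood of success is proportional to the expected discounted return. A toy verification under those assumptions (the chain and rewards are invented):

```python
import numpy as np

# Toy 2-state Markov chain under a fixed policy; rewards in [0, 1] are read
# as the probability of a binary 'success' observation (numbers invented).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])      # P[s, s'] state-transition matrix
r = np.array([0.0, 1.0])        # P(success | state)
gamma, s0, horizon = 0.9, 0, 400

# Expected discounted return, computed directly.
dist, discounted = np.eye(2)[s0], 0.0
for t in range(horizon):
    discounted += gamma ** t * (dist @ r)
    dist = dist @ P

# Likelihood of 'success' in the mixture model with geometric time prior
# P(T = t) = (1 - gamma) * gamma ** t.
dist, likelihood = np.eye(2)[s0], 0.0
for t in range(horizon):
    likelihood += (1 - gamma) * gamma ** t * (dist @ r)
    dist = dist @ P

# The two agree up to the constant (1 - gamma), so maximizing the success
# likelihood over policies maximizes the expected discounted return.
assert np.isclose(likelihood, (1 - gamma) * discounted)
```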

Graphical Models, Evolutionary Game Theory, and the Power of Randomization, MICHAEL KEARNS AND SIDDHARTH SURI, University of Pennsylvania, PA
We study a natural extension of classical evolutionary game theory to a setting in which pairwise interactions are restricted to the edges of an undirected graph or network. We generalize the definition of an evolutionarily stable strategy (ESS), and show a pair of complementary results that exhibit the power of randomization in our setting: subject to minimal edge-density conditions, the classical ESS of any game are preserved when the graph is chosen randomly and the mutation set is chosen adversarially, or when the graph is chosen adversarially and the mutation set is chosen randomly. We examine natural strengthenings of our generalized ESS definition, and show that similarly strong results are not possible for them.

Probability Collectives for Adaptive Distributed Control, DAVID H. WOLPERT, NASA Ames Research Center
There are two major fields that analyze distributed systems: statistical physics and game theory. Recently it was realized that these fields can be re-expressed in a way that makes them mathematically identical. This provides a way to combine techniques from them, producing a hybrid with many strengths that do not exist in either field considered in isolation. This mathematical hybrid is called Probability Collectives (PC). As borne out by numerous experiments, it is particularly well suited to distributed optimization and to adaptive distributed control. The unifying idea of these applications is that rather than directly optimizing a variable of interest x, it is often preferable to optimize an associated probability distribution P(x). In particular, since probabilities are real-valued, P(x) can be optimized using powerful techniques for optimization of continuous variables, e.g., gradient descent, Newton's method, etc. This is true even if the underlying variable x is categorical, of mixed type, time-extended, etc. In this way PC allows us to, for example, apply gradient descent to optimize a function over a categorical variable. Another advantage of PC is that P(x) provides sensitivity information about the optimization problem, e.g., telling us which variables are most important. In addition, finding P(x) is an inherently adaptive process, with excellent robustness against noise. This makes it particularly well suited to real-world control problems. Moreover, PC algorithms typically "fracture" in a way that allows completely distributed implementation, typically with excellent scaling behavior.

Probability Collectives: Examples and Applications, DEV RAJNARAYAN, Stanford University
Probability Collectives (PC) is a broad framework that translates and unifies concepts from statistical mechanics, game theory and optimization. In this framework, optimization is performed not on the variables of the problem, but on a probability distribution over those variables. Such an approach has many advantages. In particular, we can now tackle categorical and mixed problems using powerful methods like gradient descent from continuous optimization. Since PC is based on random sampling, it is inherently suited to large problems. Simulated Annealing (SA) and Estimation of Distribution Algorithms (EDAs) can be shown to be instances of particular variants of the PC framework. With a language that describes the working of many global optimization approaches, we show how one can analyze the performance of a particular approach and deduce how it needs to be changed to improve performance. In this paper, we describe specific pedagogical and real-world problems and a systematic approach to solving them using the PC framework. The pedagogical examples range from categorical ones, like two-player common-payoff matrix games and UAV path-planning over a 2-D grid of hexagonal cells, to standard continuous optimization benchmarks like the Rosenbrock function. Finally, we describe the application of PC to two real-world problems: control design for flight control of a UAV with novel distributed actuators, and control design for gust alleviation using distributed actuators.
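A minimal sketch of the "optimize P(x) instead of x" idea for a single categorical variable, assuming a softmax-parameterized distribution, a score-function gradient estimate from samples, and a fixed temperature on the entropy term; actual PC algorithms (maxent Lagrangian, per-agent product distributions, annealing schedules) are considerably richer.

```python
import numpy as np

rng = np.random.default_rng(0)

def pc_minimize(cost, n_choices, iters=500, n_samples=64, lr=0.5, temp=0.1):
    """Descend E_P[cost(x)] + temp * E_P[log P(x)] over a categorical P(x).

    P is a softmax over logits theta; the gradient is estimated from samples
    with the score-function trick (the baseline keeps the variance down).
    """
    theta = np.zeros(n_choices)
    for _ in range(iters):
        p = np.exp(theta - theta.max())
        p /= p.sum()
        x = rng.choice(n_choices, size=n_samples, p=p)
        g = cost(x) + temp * np.log(p[x])   # per-sample regularized objective
        g -= g.mean()                       # baseline
        grad = np.zeros(n_choices)
        np.add.at(grad, x, g)               # estimate of E[(g - b) * dlogP]
        theta -= lr * grad / n_samples
    p = np.exp(theta - theta.max())
    return p / p.sum()

# Example: a categorical 'game' with five moves; mass should concentrate on
# the cheapest move (index 3), approximating the Boltzmann distribution
# exp(-cost / temp).
move_costs = np.array([3.0, 2.0, 2.5, 0.5, 1.5])
print(pc_minimize(lambda x: move_costs[x], n_choices=5).round(3))
```

Note that gradient descent operates on the continuous logits even though the underlying variable is categorical, which is exactly the advantage both abstracts emphasize.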

A Formalization of Game Balance Principles, JEFF LONG AND MICHAEL C. HORSCH, University of Saskatchewan
Game balance is the problem of determining the fairness of actions or action sets in competitive, multiplayer games. In this paper, I formalize issues related to game balance using the mathematical language of game theory, as used in the economic sciences. I show how to detect game imbalance in this language using existing concepts and algorithms, and provide a new algorithm for correcting imbalances thus discovered. Finally, I discuss the application of these techniques to large, real-world competitive games through the use of high-level strategic abstraction.

A Stochastic Optimal Control Formulation of Distributed Decision Making, BERT KAPPEN, BART VAN DEN BROEK, AND WIM WIEGERINCK, SNN, Radboud University, Nijmegen, The Netherlands
It has recently been shown that a class of stochastic optimal control problems can be formulated as a path integral in which the noise plays the role of temperature. The path integral displays symmetry breaking, and there exists a critical noise value that separates regimes where optimal control yields qualitatively different solutions. The path integral can be computed efficiently by Monte Carlo integration or by Laplace approximation, and can therefore be used to solve high-dimensional stochastic control problems. In this contribution, I discuss the consequences of this approach for distributed decision making in multi-agent systems. It is shown that the optimal cost-to-go function J(x1, ..., xn) takes the form of a log partition sum over all configurations at the final time. The optimal action for agent i is given by the gradient of J with respect to xi. The free energy displays symmetry breaking as a function of the noise as well as of the time to go. This means that in different regimes the optimal behaviour of the agents changes from averaging over all possible strategies to specializing in one particular strategy. The cost-to-go is intractable for large systems, but a variety of methods can be used to approximate J; I will discuss Monte Carlo sampling, variational approximations, and belief propagation. I will present a number of examples to show the phenomenology of this model as well as the effectiveness of the various approximations, and I will discuss possible extensions to competitive games. (A Monte Carlo sketch of the path-integral computation follows the references below.)

H. J. Kappen. A linear theory for control of non-linear stochastic systems. Physical Review Letters, 2005. In press.
H. J. Kappen. Path integrals and symmetry breaking for optimal control theory. Journal of Statistical Mechanics: Theory and Experiment, 2005. In press.
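As a hedged illustration of the Monte Carlo route mentioned above: in the simplest path-integral control setting (drift-free dynamics, terminal cost only, and the control-cost weight tied to the noise level through a single parameter lam, all assumptions of this sketch), the optimal cost-to-go is J(x, 0) = -lam * log E[exp(-Phi(x_T)/lam)], with the expectation over uncontrolled diffusion paths. The dynamics and cost below are invented to show the mechanics, including a two-well terminal cost where the symmetry breaking appears.

```python
import numpy as np

rng = np.random.default_rng(1)

def cost_to_go(x0, phi, T=1.0, dt=0.01, lam=0.5, n_paths=20000):
    """Monte Carlo estimate of J(x0, 0) = -lam * log E[exp(-phi(x_T) / lam)].

    The expectation runs over uncontrolled diffusion paths dx = sqrt(lam) dW
    (noise level tied to the control-cost weight through the single
    parameter lam); drift-free dynamics with a terminal cost only.
    """
    n_steps = int(T / dt)
    x = np.full(n_paths, float(x0))
    for _ in range(n_steps):
        x += np.sqrt(lam * dt) * rng.standard_normal(n_paths)
    return -lam * np.log(np.mean(np.exp(-phi(x) / lam)))

# Two-well terminal cost: at low noise the controller must commit to one
# well, at high noise it can average over both, the symmetry breaking
# discussed in the abstract.
phi = lambda x: (x**2 - 1.0) ** 2
for lam in (0.1, 0.5, 2.0):
    print(f"lam = {lam}: J(0, 0) = {cost_to_go(0.0, phi, lam=lam):.3f}")
```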