Computer Games Workshop at ECAI 2012
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Minimax TD-Learning with Neural Nets in a Markov Game
Minimax TD-Learning with Neural Nets in a Markov Game Fredrik A. Dahl and Ole Martin Halck Norwegian Defence Research Establishment (FFI) P.O. Box 25, NO-2027 Kjeller, Norway {Fredrik-A.Dahl, Ole-Martin.Halck}@ffi.no Abstract. A minimax version of temporal difference learning (minimax TD- learning) is given, similar to minimax Q-learning. The algorithm is used to train a neural net to play Campaign, a two-player zero-sum game with imperfect information of the Markov game class. Two different evaluation criteria for evaluating game-playing agents are used, and their relation to game theory is shown. Also practical aspects of linear programming and fictitious play used for solving matrix games are discussed. 1 Introduction An important challenge to artificial intelligence (AI) in general, and machine learning in particular, is the development of agents that handle uncertainty in a rational way. This is particularly true when the uncertainty is connected with the behavior of other agents. Game theory is the branch of mathematics that deals with these problems, and indeed games have always been an important arena for testing and developing AI. However, almost all of this effort has gone into deterministic games like chess, go, othello and checkers. Although these are complex problem domains, uncertainty is not their major challenge. With the successful application of temporal difference learning, as defined by Sutton [1], to the dice game of backgammon by Tesauro [2], random games were included as a standard testing ground. But even backgammon features perfect information, which implies that both players always have the same information about the state of the game. -
Stochastic Game Theory Applications for Power Management in Cognitive Networks
STOCHASTIC GAME THEORY APPLICATIONS FOR POWER MANAGEMENT IN COGNITIVE NETWORKS A thesis submitted to Kent State University in partial fulfillment of the requirements for the degree of Master of Digital Science by Sham Fung Feb 2014 Thesis written by Sham Fung M.D.S, Kent State University, 2014 Approved by Advisor , Director, School of Digital Science ii TABLE OF CONTENTS LIST OF FIGURES . vi LIST OF TABLES . vii Acknowledgments . viii Dedication . ix 1 BACKGROUND OF COGNITIVE NETWORK . 1 1.1 Motivation and Requirements . 1 1.2 Definition of Cognitive Networks . 2 1.3 Key Elements of Cognitive Networks . 3 1.3.1 Learning and Reasoning . 3 1.3.2 Cognitive Network as a Multi-agent System . 4 2 POWER ALLOCATION IN COGNITIVE NETWORKS . 5 2.1 Overview . 5 2.2 Two Communication Paradigms . 6 2.3 Power Allocation Scheme . 8 2.3.1 Interference Constraints for Primary Users . 8 2.3.2 QoS Constraints for Secondary Users . 9 2.4 Related Works on Power Management in Cognitive Networks . 10 iii 3 GAME THEORY|PRELIMINARIES AND RELATED WORKS . 12 3.1 Non-cooperative Game . 14 3.1.1 Nash Equilibrium . 14 3.1.2 Other Equilibriums . 16 3.2 Cooperative Game . 16 3.2.1 Bargaining Game . 17 3.2.2 Coalition Game . 17 3.2.3 Solution Concept . 18 3.3 Stochastic Game . 19 3.4 Other Types of Games . 20 4 GAME THEORETICAL APPROACH ON POWER ALLOCATION . 23 4.1 Two Fundamental Types of Utility Functions . 23 4.1.1 QoS-Based Game . 23 4.1.2 Linear Pricing Game . -
4 Beyond Classical Search
BEYOND CLASSICAL 4 SEARCH In which we relax the simplifying assumptions of the previous chapter, thereby getting closer to the real world. Chapter 3 addressed a single category of problems: observable, deterministic, known envi- ronments where the solution is a sequence of actions. In this chapter, we look at what happens when these assumptions are relaxed. We begin with a fairly simple case: Sections 4.1 and 4.2 cover algorithms that perform purely local search in the state space, evaluating and modify- ing one or more current states rather than systematically exploring paths from an initial state. These algorithms are suitable for problems in which all that matters is the solution state, not the path cost to reach it. The family of local search algorithms includes methods inspired by statistical physics (simulated annealing)andevolutionarybiology(genetic algorithms). Then, in Sections 4.3–4.4, we examine what happens when we relax the assumptions of determinism and observability. The key idea is that if an agent cannot predict exactly what percept it will receive, then it will need to consider what to do under each contingency that its percepts may reveal. With partial observability, the agent will also need to keep track of the states it might be in. Finally, Section 4.5 investigates online search,inwhichtheagentisfacedwithastate space that is initially unknown and must be explored. 4.1 LOCAL SEARCH ALGORITHMS AND OPTIMIZATION PROBLEMS The search algorithms that we have seen so far are designed to explore search spaces sys- tematically. This systematicity is achieved by keeping one or more paths in memory and by recording which alternatives have been explored at each point along the path. -
Adversarial Search
Artificial Intelligence CS 165A Oct 29, 2020 Instructor: Prof. Yu-Xiang Wang ® ExamplEs of heuristics in A*-search ® GamEs and Adversarial SEarch 1 Recap: Search algorithms • StatE-spacE diagram vs SEarch TrEE • UninformEd SEarch algorithms – BFS / DFS – Depth Limited Search – Iterative Deepening Search. – Uniform cost search. • InformEd SEarch (with an heuristic function h): – Greedy Best-First-Search. (not complete / optimal) – A* Search (complete / optimal if h is admissible) 2 Recap: Summary table of uninformed search (Section 3.4.7 in the AIMA book.) 3 Recap: A* Search (Pronounced “A-Star”) • Uniform-cost sEarch minimizEs g(n) (“past” cost) • GrEEdy sEarch minimizEs h(n) (“expected” or “future” cost) • “A* SEarch” combines the two: – Minimize f(n) = g(n) + h(n) – Accounts for the “past” and the “future” – Estimates the cheapest solution (complete path) through node n function A*-SEARCH(problem, h) returns a solution or failure return BEST-FIRST-SEARCH(problem, f ) 4 Recap: Avoiding Repeated States using A* Search • Is GRAPH-SEARCH optimal with A*? Try with TREE-SEARCH and B GRAPH-SEARCH 1 h = 5 1 A C D 4 4 h = 5 h = 1 h = 0 7 E h = 0 Graph Search Step 1: Among B, C, E, Choose C Step 2: Among B, E, D, Choose B Step 3: Among D, E, Choose E. (you are not going to select C again) 5 Recap: Consistency (Monotonicity) of heuristic h • A heuristic is consistEnt (or monotonic) provided – for any node n, for any successor n’ generated by action a with cost c(n,a,n’) c(n,a,n’) • h(n) ≤ c(n,a,n’) + h(n’) n n’ – akin to triangle inequality. -
Deterministic and Stochastic Prisoner's Dilemma Games: Experiments in Interdependent Security
NBER TECHNICAL WORKING PAPER SERIES DETERMINISTIC AND STOCHASTIC PRISONER'S DILEMMA GAMES: EXPERIMENTS IN INTERDEPENDENT SECURITY Howard Kunreuther Gabriel Silvasi Eric T. Bradlow Dylan Small Technical Working Paper 341 http://www.nber.org/papers/t0341 NATIONAL BUREAU OF ECONOMIC RESEARCH 1050 Massachusetts Avenue Cambridge, MA 02138 August 2007 We appreciate helpful discussions in designing the experiments and comments on earlier drafts of this paper by Colin Camerer, Vince Crawford, Rachel Croson, Robyn Dawes, Aureo DePaula, Geoff Heal, Charles Holt, David Krantz, Jack Ochs, Al Roth and Christian Schade. We also benefited from helpful comments from participants at the Workshop on Interdependent Security at the University of Pennsylvania (May 31-June 1 2006) and at the Workshop on Innovation and Coordination at Humboldt University (December 18-20, 2006). We thank George Abraham and Usmann Hassan for their help in organizing the data from the experiments. Support from NSF Grant CMS-0527598 and the Wharton Risk Management and Decision Processes Center is gratefully acknowledged. The views expressed herein are those of the author(s) and do not necessarily reflect the views of the National Bureau of Economic Research. © 2007 by Howard Kunreuther, Gabriel Silvasi, Eric T. Bradlow, and Dylan Small. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source. Deterministic and Stochastic Prisoner's Dilemma Games: Experiments in Interdependent Security Howard Kunreuther, Gabriel Silvasi, Eric T. Bradlow, and Dylan Small NBER Technical Working Paper No. 341 August 2007 JEL No. C11,C12,C22,C23,C73,C91 ABSTRACT This paper examines experiments on interdependent security prisoner's dilemma games with repeated play. -
Stochastic Games with Hidden States∗
Stochastic Games with Hidden States¤ Yuichi Yamamoto† First Draft: March 29, 2014 This Version: January 14, 2015 Abstract This paper studies infinite-horizon stochastic games in which players ob- serve noisy public information about a hidden state each period. We find that if the game is connected, the limit feasible payoff set exists and is invariant to the initial prior about the state. Building on this invariance result, we pro- vide a recursive characterization of the equilibrium payoff set and establish the folk theorem. We also show that connectedness can be replaced with an even weaker condition, called asymptotic connectedness. Asymptotic con- nectedness is satisfied for generic signal distributions, if the state evolution is irreducible. Journal of Economic Literature Classification Numbers: C72, C73. Keywords: stochastic game, hidden state, connectedness, stochastic self- generation, folk theorem. ¤The author thanks Naoki Aizawa, Drew Fudenberg, Johannes Horner,¨ Atsushi Iwasaki, Michi- hiro Kandori, George Mailath, Takeaki Sunada, and Masatoshi Tsumagari for helpful conversa- tions, and seminar participants at various places. †Department of Economics, University of Pennsylvania. Email: [email protected] 1 1 Introduction When agents have a long-run relationship, underlying economic conditions may change over time. A leading example is a repeated Bertrand competition with stochastic demand shocks. Rotemberg and Saloner (1986) explore optimal col- lusive pricing when random demand shocks are i.i.d. each period. Haltiwanger and Harrington (1991), Kandori (1991), and Bagwell and Staiger (1997) further extend the analysis to the case in which demand fluctuations are cyclic or persis- tent. One of the crucial assumptions of these papers is that demand shocks are publicly observable before firms make their decisions in each period. -
570: Minimax Sample Complexity for Turn-Based Stochastic Game
Minimax Sample Complexity for Turn-based Stochastic Game Qiwen Cui1 Lin F. Yang2 1School of Mathematical Sciences, Peking University 2Electrical and Computer Engineering Department, University of California, Los Angeles Abstract guarantees are rather rare due to complex interaction be- tween agents that makes the problem considerably harder than single agent reinforcement learning. This is also known The empirical success of multi-agent reinforce- as non-stationarity in MARL, which means when multi- ment learning is encouraging, while few theoret- ple agents alter their strategies based on samples collected ical guarantees have been revealed. In this work, from previous strategy, the system becomes non-stationary we prove that the plug-in solver approach, proba- for each agent and the improvement can not be guaranteed. bly the most natural reinforcement learning algo- One fundamental question in MBRL is that how to design rithm, achieves minimax sample complexity for efficient algorithms to overcome non-stationarity. turn-based stochastic game (TBSG). Specifically, we perform planning in an empirical TBSG by Two-players turn-based stochastic game (TBSG) is a two- utilizing a ‘simulator’ that allows sampling from agents generalization of Markov decision process (MDP), arbitrary state-action pair. We show that the em- where two agents choose actions in turn and one agent wants pirical Nash equilibrium strategy is an approxi- to maximize the total reward while the other wants to min- mate Nash equilibrium strategy in the true TBSG imize it. As a zero-sum game, TBSG is known to have and give both problem-dependent and problem- Nash equilibrium strategy [Shapley, 1953], which means independent bound. -
Bayesian Games Professors Greenwald 2018-01-31
Bayesian Games Professors Greenwald 2018-01-31 We describe incomplete-information, or Bayesian, normal-form games (formally; no examples), and corresponding equilibrium concepts. 1 A Bayesian Model of Interaction A Bayesian, or incomplete information, game is a generalization of a complete-information game. Recall that in a complete-information game, the game is assumed to be common knowledge. In a Bayesian game, in addition to what is common knowledge, players may have private information. This private information is captured by the notion of an epistemic type, which describes a player’s knowledge. The Bayesian-game formalism makes two simplifying assumptions: • Any information that is privy to any of the players pertains only to utilities. In all realizations of a Bayesian game, the number of players and their actions are identical. • Players maintain beliefs about the game (i.e., about utilities) in the form of a probability distribution over types. Prior to receiving any private information, this probability distribution is common knowledge: i.e., it is a common prior. After receiving private information, players conditional on this information to update their beliefs. As a consequence of the com- mon prior assumption, any differences in beliefs can be attributed entirely to differences in information. Rational players are again assumed to maximize their utility. But further, they are assumed to update their beliefs when they obtain new information via Bayes’ rule. Thus, in a Bayesian game, in addition to players, actions, and utilities, there is a type space T = ∏i2[n] Ti, where Ti is the type space of player i.There is also a common prior F, which is a probability distribution over type profiles. -
A. Local Beam Search with K=1. A. Local Beam Search with K = 1 Is Hill-Climbing Search
4. Give the name that results from each of the following special cases: a. Local beam search with k=1. a. Local beam search with k = 1 is hill-climbing search. b. Local beam search with one initial state and no limit on the number of states retained. b. Local beam search with k = ∞: strictly speaking, this doesn’t make sense. The idea is that if every successor is retained (because k is unbounded), then the search resembles breadth-first search in that it adds one complete layer of nodes before adding the next layer. Starting from one state, the algorithm would be essentially identical to breadth-first search except that each layer is generated all at once. c. Simulated annealing with T=0 at all times (and omitting the termination test). c. Simulated annealing with T = 0 at all times: ignoring the fact that the termination step would be triggered immediately, the search would be identical to first-choice hill climbing because every downward successor would be rejected with probability 1. d. Simulated annealing with T=infinity at all times. d. Simulated annealing with T = infinity at all times: ignoring the fact that the termination step would never be triggered, the search would be identical to a random walk because every successor would be accepted with probability 1. Note that, in this case, a random walk is approximately equivalent to depth-first search. e. Genetic algorithm with population size N=1. e. Genetic algorithm with population size N = 1: if the population size is 1, then the two selected parents will be the same individual; crossover yields an exact copy of the individual; then there is a small chance of mutation. -
Guru Nanak Dev University Amritsar
FACULTY OF ENGINEERING & TECHNOLOGY SYLLABUS FOR B.TECH. COMPUTER ENGINEERING (Credit Based Evaluation and Grading System) (SEMESTER: I to VIII) Session: 2020-21 Batch from Year 2020 To Year 2024 GURU NANAK DEV UNIVERSITY AMRITSAR Note: (i) Copy rights are reserved. Nobody is allowed to print it in any form. Defaulters will be prosecuted. (ii) Subject to change in the syllabi at any time. Please visit the University website time to time. 1 CSA1: B.TECH. (COMPUTER ENGINEERING) SEMESTER SYSTEM (Credit Based Evaluation and Grading System) Batch from Year 2020 to Year 2024 SCHEME SEMESTER –I S. Course Credit Course Title L T P No. Code s 1. CEL120 Engineering Mechanics 3 1 0 4 Engineering Graphics & Drafting 2. MEL121 3 0 1 4 Using AutoCAD 3 MTL101 Mathematics-I 3 1 0 4 4. PHL183 Physics 4 0 1 5 5. MEL110 Introduction to Engg. Materials 3 0 0 3 6. Elective-I 2 0 0 2 SOA- 101 Drug Abuse: Problem, Management 2 0 0 2 7. and Prevention -Interdisciplinary Course (Compulsory) List of Electives–I: 1. PBL121 Punjabi (Compulsory)OR 2 0 0 2. PBL122* w[ZYbh gzikph OR 2 0 0 2 Punjab History & Culture 3. HSL101* 2 0 0 (1450-1716) Total Credits: 20 2 2 24 SEMESTER –II S. Course Code Course Title L T P Credits No. 1. CYL197 Engineering Chemistry 3 0 1 4 2. MTL102 Mathematics-II 3 1 0 4 Basic Electrical &Electronics 3. ECL119 4 0 1 5 Engineering Fundamentals of IT & Programming 4. CSL126 2 1 1 4 using Python 5. -
Outline for Static Games of Complete Information I
Outline for Static Games of Complete Information I. Definition of a game II. Examples III. Definition of Nash equilibrium IV. Examples, continued V. Iterated elimination of dominated strategies VI. Mixed strategies VII. Existence theorem on Nash equilibria VIII. The Hotelling model and extensions Copyright © 2004 by Lawrence M. Ausubel Definition: An n-player, static game of complete information consists of an n-tuple of strategy sets and an n-tuple of payoff functions, denoted by G = {S1, … , Sn; u1, … , un} Si, the strategy set of player i, is the set of all permissible moves for player i. We write si ∈ Si for one of player i’s strategies. ui, the payoff function of player i, is the utility, profit, etc. for player i, and depends on the strategies chosen by all the players: ui(s1, … , sn). Example: Prisoners’ Dilemma Prisoner II Remain Confess Silent Remain –1 , –1 –5 , 0 Prisoner Silent I Confess 0, –5 –4 ,–4 Example: Battle of the Sexes F Boxing Ballet Boxing 2, 1 0, 0 M Ballet 0, 0 1, 2 Definition: A Nash equilibrium of G (in pure strategies) consists of a strategy for every player with the property that no player can improve her payoff by unilaterally deviating: (s1*, … , sn*) with the property that, for every player i: ui (s1*, … , si–1*, si*, si+1*, … , sn*) ≥ ui (s1*, … , si–1*, si, si+1*, … , sn*) for all si ∈ Si. Equivalently, a Nash equilibrium is a mutual best response. That is, for every player i, si* is a solution to: ***** susssssiiiiin∈arg max{ (111 , ... ,−+ , , , ... , )} sSii∈ Example: Prisoners’ Dilemma Prisoner II -
Discovery and Equilibrium in Games with Unawareness∗
Discovery and Equilibrium in Games with Unawareness∗ Burkhard C. Schippery April 20, 2018 Abstract Equilibrium notions for games with unawareness in the literature cannot be interpreted as steady-states of a learning process because players may discover novel actions during play. In this sense, many games with unawareness are \self-destroying" as a player's representa- tion of the game may change after playing it once. We define discovery processes where at each state there is an extensive-form game with unawareness that together with the players' play determines the transition to possibly another extensive-form game with unawareness in which players are now aware of actions that they have discovered. A discovery process is rationalizable if players play extensive-form rationalizable strategies in each game with un- awareness. We show that for any game with unawareness there is a rationalizable discovery process that leads to a self-confirming game that possesses a self-confirming equilibrium in extensive-form rationalizable strategies. This notion of equilibrium can be interpreted as steady-state of both a discovery and learning process. Keywords: Self-confirming equilibrium, conjectural equilibrium, extensive-form rational- izability, unawareness, extensive-form games, equilibrium, learning, discovery. JEL-Classifications: C72, D83. ∗I thank Aviad Heifetz, Byung Soo Lee, and seminar participants at UC Berkeley, the University of Toronto, the Barcelona workshop on Limited Reasoning and Cognition, SAET 2015, CSLI 2016, TARK 2017, and the Virginia Tech Workshop on Advances in Decision Theory 2018 for helpful discussions. An abbreviated earlier version appeared in the online proceedings of TARK 2017 under the title “Self-Confirming Games: Unawareness, Discovery, and Equilibrium".