Learning to Play Two-Player Perfect-Information Games Without Knowledge
Total Pages: 16
File Type: PDF, Size: 1020 KB

Recommended publications
- Lecture 4: Rationalizability & Nash Equilibrium
14.12 Game Theory, Muhamet Yildiz. Road map: 1. Strategies (completed); 2. Quiz; 3. Dominance; 4. Dominant-strategy equilibrium; 5. Rationalizability; 6. Nash equilibrium.

Strategy. A strategy of a player is a complete contingent plan, determining which action he will take at each information set at which he is to move (including the information sets that will not be reached according to this strategy). In matching pennies with perfect information, player 2's strategies are: HH = Head if 1 plays Head, Head if 1 plays Tail; HT = Head if 1 plays Head, Tail if 1 plays Tail; TH = Tail if 1 plays Head, Head if 1 plays Tail; TT = Tail if 1 plays Head, Tail if 1 plays Tail. [Slide figures: game trees and payoff matrices for matching pennies with perfect and with imperfect information, and a game with nature.]

Mixed strategy. Definition: a mixed strategy of a player is a probability distribution over the set of his strategies. Given pure strategies S_i = {s_i1, s_i2, …, s_ik}, a mixed strategy is a map σ_i : S_i → [0, 1] such that σ_i(s_i1) + σ_i(s_i2) + … + σ_i(s_ik) = 1. If the other players play s_-i = (s_1, …, s_{i-1}, s_{i+1}, …, s_n), then the expected utility of playing σ_i is σ_i(s_i1)u_i(s_i1, s_-i) + σ_i(s_i2)u_i(s_i2, s_-i) + … + σ_i(s_ik)u_i(s_ik, s_-i).
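A minimal sketch of the expected-utility formula just stated, with hypothetical example values (not taken from the lecture notes):

```python
# Expected utility of a mixed strategy sigma_i against a fixed profile s_{-i}.
# Hypothetical illustration; the payoff function below is an assumption.

def expected_utility(sigma_i, u_i, s_minus_i):
    """sigma_i: dict pure strategy -> probability (probabilities sum to 1).
    u_i: payoff function (own pure strategy, others' profile) -> number.
    s_minus_i: the fixed strategies of the other players."""
    return sum(p * u_i(s, s_minus_i) for s, p in sigma_i.items())

# Matching pennies, player 1: payoff +1 if the two choices match, -1 otherwise.
u1 = lambda s, s2: 1 if s == s2 else -1
sigma1 = {"Head": 0.5, "Tail": 0.5}
print(expected_utility(sigma1, u1, "Head"))  # 0.0
```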
- An Iterated Greedy Metaheuristic for the Blocking Job Shop Scheduling Problem
Marco Pranzo and Dario Pacciarelli, Technical Report RT-DIA-208-2013, November 2013. Università degli Studi Roma Tre, Dipartimento di Ingegneria, Via della Vasca Navale 79, 00146 Roma, Italy; Università degli Studi di Siena, Via Roma 56, 53100 Siena, Italy.

Abstract. In this paper we consider a job shop scheduling problem with blocking (zero buffer) constraints. Blocking constraints model the absence of buffers, whereas in the traditional job shop scheduling model buffers have infinite capacity. There are two known variants of this problem, namely the blocking job shop scheduling (BJSS) with swap allowed and the BJSS with no-swap. A swap is needed when there is a cycle of two or more jobs, each waiting for the machine occupied by the next job in the cycle. Such a cycle represents a deadlock in no-swap BJSS, while with a swap all the jobs in the cycle move simultaneously to their subsequent machine. This scheduling problem is receiving increasing interest in the recent literature, and we propose an Iterated Greedy (IG) algorithm to solve both variants of the problem. The IG is a metaheuristic based on the repetition of a destruction phase, which removes part of the solution, and a construction phase, in which a new solution is obtained by applying an underlying greedy algorithm starting from the partial solution. Comparison with recently published results shows that the iterated greedy outperforms other state-of-the-art algorithms on benchmark instances.
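The destruction/construction loop described in the abstract follows a generic pattern. The sketch below is a problem-agnostic illustration of that pattern with a hypothetical toy problem and a simple improving-only acceptance rule; it is not the authors' BJSS implementation:

```python
import random

def iterated_greedy(initial_solution, cost, destroy, rebuild, iterations=1000, seed=0):
    """Generic Iterated Greedy skeleton: repeatedly remove part of the incumbent
    solution (destruction) and rebuild it with a greedy constructor (construction),
    keeping the result when it improves the incumbent."""
    rng = random.Random(seed)
    best = incumbent = initial_solution
    for _ in range(iterations):
        partial = destroy(incumbent, rng)      # destruction phase
        candidate = rebuild(partial, rng)      # construction phase (greedy)
        if cost(candidate) < cost(incumbent):  # acceptance criterion (assumed)
            incumbent = candidate
        if cost(incumbent) < cost(best):
            best = incumbent
    return best

# Toy demo: a "schedule" is a permutation; cost = number of adjacent inversions.
def toy_cost(p): return sum(1 for a, b in zip(p, p[1:]) if a > b)

def toy_destroy(p, rng):
    p = list(p)
    removed = [p.pop(rng.randrange(len(p))) for _ in range(3)]
    return p, removed

def toy_rebuild(partial, rng):
    p, removed = partial
    for x in removed:
        # greedy: reinsert each removed element at its best position
        pos = min(range(len(p) + 1), key=lambda i: toy_cost(p[:i] + [x] + p[i:]))
        p = p[:pos] + [x] + p[pos:]
    return p

print(iterated_greedy(list(range(10))[::-1], toy_cost, toy_destroy, toy_rebuild, 200))
```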
- Greedy Algorithms
Greedy algorithms solve problems by making the choice that seems best at the particular moment. Many optimization problems can be solved using a greedy algorithm. Some problems have no efficient solution, but a greedy algorithm may provide a solution that is close to optimal. A greedy algorithm works if a problem exhibits the following two properties: 1. Greedy choice property: a globally optimal solution can be arrived at by making locally optimal choices; in other words, an optimal solution can be obtained by making "greedy" choices. 2. Optimal substructure: optimal solutions contain optimal subsolutions; in other words, solutions to subproblems of an optimal solution are optimal.

Difference between greedy and dynamic programming. The main difference between greedy algorithms and dynamic programming is that we don't solve every optimal subproblem with greedy algorithms. In some cases, greedy algorithms can be used to produce suboptimal solutions, that is, solutions which aren't necessarily optimal but are perhaps very close. In dynamic programming, we make a choice at each step, but the choice may depend on the solutions to subproblems. In a greedy algorithm, we make whatever choice seems best at the moment and then solve the subproblems arising after the choice is made. The choice made by a greedy algorithm may depend on choices made so far, but it cannot depend on any future choices or on the solutions to subproblems. Thus, unlike dynamic programming, which solves the subproblems bottom up, a greedy strategy usually progresses in a top-down fashion, making one greedy choice after another, iteratively reducing each given problem instance to a smaller one.
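A hypothetical illustration of the distinction drawn above (the coin system {1, 3, 4} is an assumption chosen for the example, not from the text): the greedy choice ignores subproblems and can miss the optimum, while dynamic programming consults the solutions of subproblems.

```python
# Coin change with denominations {1, 3, 4}: greedy (largest coin first) is not
# optimal here, while the DP, which looks up subproblem solutions, is.

def greedy_change(x, denoms=(4, 3, 1)):
    coins = 0
    for d in denoms:                 # take as many of the largest coin as possible
        coins += x // d
        x %= d
    return coins

def dp_change(x, denoms=(1, 3, 4)):
    best = [0] + [float("inf")] * x  # best[v] = fewest coins summing to v
    for v in range(1, x + 1):
        best[v] = min(best[v - d] + 1 for d in denoms if d <= v)
    return best[x]

print(greedy_change(6))  # 3 coins: 4 + 1 + 1
print(dp_change(6))      # 2 coins: 3 + 3
```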
- Sequential Games with Perfect Information
Example 4.9 (page 105). Consider the sequential game given in Figure 4.9. We want to apply backward induction to the tree. Vertex B is owned by player two, P2. The payoffs for P2 are 1 and 3, with 3 > 1, so the player picks R′. Thus, the payoffs at B become (0, 3). Next, vertex C is also owned by P2, with payoffs 1 and 0. Since 1 > 0, P2 picks L′′, and the payoffs are (4, 1). Player one, P1, owns A; the choice of L gives a payoff of 0 and R gives a payoff of 4; 4 > 0, so P1 chooses R. The final payoffs are (4, 1).

We claim that this strategy profile, {R} for P1 and {R′, L′′} for P2, is a Nash equilibrium. Notice that the strategy profile gives a choice at each vertex. For the strategy {R′, L′′} fixed for P2, P1 has a maximal payoff by choosing {R}:

π1(R, {R′, L′′}) = 4 ≥ π1(L, {R′, L′′}) = 0.

In the same way, for the strategy {R} fixed for P1, P2 has a maximal payoff by choosing {R′, L′′}:

π2(R, {∗, L′′}) = 1 ≥ π2(R, {∗, R′′}) = 0,

where ∗ means choose either L′ or R′. Since no change of choice by a player can increase that player's own payoff, the strategy profile is called a Nash equilibrium. Notice that the above strategy profile is also a Nash equilibrium on each branch of the game tree, namely starting at either B or at C.
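A small sketch of backward induction on a tree shaped like the example's; the leaf payoffs that the excerpt does not state are filled with placeholder values, marked in the comments:

```python
# Backward induction on a two-player game tree. Leaves hold payoff pairs
# (P1, P2); internal nodes name the player to move.

def backward_induction(node):
    """Return (payoffs, chosen_action) for the subtree rooted at node."""
    if "payoffs" in node:                        # leaf
        return node["payoffs"], None
    player = node["player"]                      # 0 = P1, 1 = P2
    best_action, best_payoffs = None, None
    for action, child in node["children"].items():
        payoffs, _ = backward_induction(child)
        if best_payoffs is None or payoffs[player] > best_payoffs[player]:
            best_action, best_payoffs = action, payoffs
    return best_payoffs, best_action

B = {"player": 1, "children": {"L'": {"payoffs": (2, 1)},     # P1 value 2 is a placeholder
                               "R'": {"payoffs": (0, 3)}}}
C = {"player": 1, "children": {"L''": {"payoffs": (4, 1)},
                               "R''": {"payoffs": (1, 0)}}}   # P1 value 1 is a placeholder
A = {"player": 0, "children": {"L": B, "R": C}}

print(backward_induction(A))  # ((4, 1), 'R'), matching the example's conclusion
```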
- Greedy Estimation of Distributed Algorithm to Solve Bounded Knapsack Problem
Shweta Gupta, Devesh Batra, Pragya Verma. (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 5 (3), 2014, 4313-4316. Affiliations: Indian Institute of Technology (IIT) Roorkee, India; Stanford University, CA 94305, USA; IEEE, USA.

Abstract. This paper develops a new approach to find a solution to the Bounded Knapsack Problem (BKP). The knapsack problem is a combinatorial optimization problem where the aim is to maximize the profits of objects in a knapsack without exceeding its capacity. BKP is a generalization of the 0/1 knapsack problem in which multiple instances of distinct items, but a single knapsack, are taken. Real-life applications of this problem include cryptography, finance, etc. This paper proposes an estimation of distribution algorithm (EDA) using a greedy operator approach to find a solution to the BKP. EDA is a stochastic optimization technique that explores the space of potential solutions by modelling probabilistic models of promising candidate solutions. Index terms: Bounded Knapsack, Greedy Algorithm, Estimation of Distribution Algorithm, Combinatorial Problem.

From the introduction: EDAs, also called probabilistic model-building genetic algorithms, are stochastic optimization methods that guide the search for the optimum by building probabilistic models of favourable candidate solutions. The main difference between EDAs and evolutionary algorithms is that evolutionary algorithms produce new candidate solutions using an implicit distribution defined by variation operators like crossover and mutation, whereas EDAs use an explicit probability distribution model such as a Bayesian network, a multivariate normal distribution, etc. EDAs can be used to solve optimization problems like other conventional evolutionary algorithms. The rest of the paper is organized as follows: the next section deals with related work, followed by Section III, which contains the theoretical procedure of EDA.
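For orientation, here is a generic greedy heuristic for the bounded knapsack problem (profit-per-weight ordering). It is only an illustrative sketch under assumed item data, not the EDA proposed in the paper:

```python
# Greedy heuristic for the Bounded Knapsack Problem: consider items in
# decreasing profit/weight ratio and take as many copies of each as fit.

def greedy_bounded_knapsack(items, capacity):
    """items: list of (profit, weight, max_copies). Returns (total_profit, counts)."""
    order = sorted(range(len(items)),
                   key=lambda i: items[i][0] / items[i][1], reverse=True)
    counts = [0] * len(items)
    total_profit = 0
    for i in order:
        profit, weight, bound = items[i]
        take = min(bound, capacity // weight)   # as many copies as capacity allows
        counts[i] = take
        capacity -= take * weight
        total_profit += take * profit
    return total_profit, counts

items = [(10, 4, 2), (7, 3, 3), (4, 2, 5)]      # (profit, weight, max copies), assumed data
print(greedy_bounded_knapsack(items, capacity=12))  # (27, [2, 1, 0])
```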
- Economics 201B Economic Theory (Spring 2021): Strategic Games
Topics: terminology and notations (OR 1.7), games and solutions (OR 1.1-1.3), rationality and bounded rationality (OR 1.4-1.6), formalities (OR 2.1), best response (OR 2.2), Nash equilibrium (OR 2.2), 2×2 examples (OR 2.3), existence of Nash equilibrium (OR 2.4), mixed strategy Nash equilibrium (OR 3.1, 3.2), strictly competitive games (OR 2.5), evolutionary stability (OR 3.4), rationalizability (OR 4.1), dominance (OR 4.2, 4.3), trembling hand perfection (OR 12.5).

Terminology and notations (OR 1.7). Sets: for x, y ∈ R^k, x ≥ y ⟺ x_i ≥ y_i for all i; x > y ⟺ x_i ≥ y_i for all i and x_i > y_i for some i; x ≫ y ⟺ x_i > y_i for all i.

Preferences: ≿ is a binary relation on some set of alternatives X ⊆ R^k. From ≿ we derive two other relations on X: the strict preference relation, x ≻ y ⟺ x ≿ y and not y ≿ x; and the indifference relation, x ∼ y ⟺ x ≿ y and y ≿ x.

Utility representation: ≿ is said to be complete if, for all x, y ∈ X, x ≿ y or y ≿ x; and transitive if, for all x, y, z ∈ X, x ≿ y and y ≿ z imply x ≿ z. ≿ can be represented by a utility function only if it is complete and transitive (rational). A function u : X → R is a utility function representing ≿ if, for all x, y ∈ X, u(x) ≥ u(y) ⟺ x ≿ y. ≿ is said to be continuous (preferences cannot jump) if for any sequence of pairs {(x^n, y^n)}_{n=1}^∞ with x^n ≿ y^n, x^n → x and y^n → y, we have x ≿ y; and (strictly) quasi-concave if for any y the upper contour set {x ∈ X : x ≿ y} is (strictly) convex. These guarantee the existence of a continuous, well-behaved utility function representation.

Profiles: let N be the set of players. (x_i)_{i∈N}, or simply (x_i), is a profile: a collection of values of some variable, one for each player. x_{-i} = (x_j)_{j∈N\{i}} is the list of elements of the profile x = (x_j)_{j∈N} for all players except i. (x_{-i}, x_i) is a list x_{-i} and an element x_i, which together form the profile (x_j)_{j∈N}.
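As a small illustration of best response and Nash equilibrium in a 2×2 strategic game (an assumed textbook example, not taken from these notes):

```python
# Find pure-strategy Nash equilibria of a 2x2 game by checking best responses.
from itertools import product

# payoffs[(a1, a2)] = (u1, u2); the Prisoner's Dilemma is used as the example.
payoffs = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 4),
    ("D", "C"): (4, 0), ("D", "D"): (1, 1),
}
actions = ["C", "D"]

def is_nash(a1, a2):
    u1, u2 = payoffs[(a1, a2)]
    # a1 must be a best response to a2, and a2 a best response to a1
    return all(payoffs[(d, a2)][0] <= u1 for d in actions) and \
           all(payoffs[(a1, d)][1] <= u2 for d in actions)

print([prof for prof in product(actions, actions) if is_nash(*prof)])  # [('D', 'D')]
```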
- Greedy Algorithms (CSE 548: Design and Analysis of Algorithms)
CSE 548: (Design and) Analysis of Algorithms, Greedy Algorithms. R. Sekar. Slide sections: Overview, Kruskal, Dijkstra, Huffman compression.

Overview. Greedy is one of the strategies used to solve optimization problems: multiple solutions exist; pick one of low (or least) cost. Greedy strategy: make a locally optimal choice, or simply, what appears best at the moment. Often, local optimality does not imply global optimality, so use it with a great deal of care, and always prove optimality. If it is unpredictable, why use it? It simplifies the task!

Making change. Given coins of denominations 25¢, 10¢, 5¢ and 1¢, make change for x cents (0 < x < 100) using the minimum number of coins. Greedy solution:

makeChange(x)
  if (x = 0) return
  Let y be the largest denomination that satisfies y ≤ x
  Issue ⌊x/y⌋ coins of denomination y
  makeChange(x mod y)

Show that it is optimal. Is it optimal for arbitrary denominations?

When does a greedy algorithm work? Greedy choice property: the greedy (i.e., locally optimal) choice is always consistent with some (globally) optimal solution. What does this mean for the coin change problem? Optimal substructure: the optimal solution contains optimal solutions to subproblems; this implies that a greedy algorithm can invoke itself recursively after making a greedy choice. Greedy choice property, proof by contradiction: start with the assumption that there is an optimal solution that does not include the greedy choice, and show a contradiction.

Knapsack problem (fractional knapsack): a sack that can hold a maximum of x lbs; you have a choice of items you can pack in the sack; maximize the combined "value" of items in the sack. (Chapter 5, Greedy algorithms: "A game like chess can be won only by thinking ahead: a player who is focused entirely on …")
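A runnable sketch of the fractional knapsack greedy named on the last slide (the standard value-per-weight ordering; the item data below is an assumption for illustration, not from the slides):

```python
# Fractional knapsack: sort by value/weight ratio, take items greedily,
# and split the last item that does not fully fit.

def fractional_knapsack(items, capacity):
    """items: list of (value, weight). Returns the maximum total value."""
    total = 0.0
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if capacity <= 0:
            break
        take = min(weight, capacity)        # whole item, or the fraction that fits
        total += value * (take / weight)
        capacity -= take
    return total

print(fractional_knapsack([(60, 10), (100, 20), (120, 30)], capacity=50))  # 240.0
```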
- Implementing Game-Tree Searching Strategies to the Game of Hasami Shogi (California State University, Northridge)
A graduate project submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science, by Pallavi Gulati, California State University, Northridge, December 2012. Approved by Peter N. Gabrovsky, Ph.D., John J. Noga, Ph.D., and Richard J. Lorentz, Ph.D. (Chair).

Acknowledgements. I would like to acknowledge the guidance provided by and the immense patience of Prof. Richard Lorentz throughout this project. His continued help and advice made it possible for me to successfully complete this project.

Table of contents (excerpt): Signature Page; Acknowledgements; List of Tables; List of Figures; 1 Introduction; 1.1 The Game of Hasami Shogi …
- Adversarial Search
Artificial Intelligence, CS 165A, Oct 29, 2020. Instructor: Prof. Yu-Xiang Wang. Topics: examples of heuristics in A* search; games and adversarial search.

Recap: search algorithms. State-space diagram vs. search tree. Uninformed search algorithms: BFS / DFS, depth-limited search, iterative deepening search, uniform cost search. Informed search (with a heuristic function h): greedy best-first search (not complete / optimal); A* search (complete / optimal if h is admissible). Recap: summary table of uninformed search (Section 3.4.7 in the AIMA book).

Recap: A* search (pronounced "A-star"). Uniform-cost search minimizes g(n) ("past" cost); greedy search minimizes h(n) ("expected" or "future" cost). A* search combines the two: minimize f(n) = g(n) + h(n), which accounts for the "past" and the "future" and estimates the cheapest solution (complete path) through node n.

function A*-SEARCH(problem, h) returns a solution or failure
  return BEST-FIRST-SEARCH(problem, f)

Recap: avoiding repeated states using A* search. Is GRAPH-SEARCH optimal with A*? Try with TREE-SEARCH and GRAPH-SEARCH. [Slide figure: a small weighted graph over nodes A-E with a heuristic value h at each node.] Graph search: Step 1: among B, C, E, choose C. Step 2: among B, E, D, choose B. Step 3: among D, E, choose E (you are not going to select C again).

Recap: consistency (monotonicity) of heuristic h. A heuristic is consistent (or monotonic) provided that, for any node n and any successor n′ generated by action a with cost c(n, a, n′), h(n) ≤ c(n, a, n′) + h(n′), akin to the triangle inequality.
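A compact A* sketch over an explicit graph, minimizing f(n) = g(n) + h(n) as above; the graph and heuristic values are illustrative assumptions, not the slide's example:

```python
import heapq

def a_star(graph, h, start, goal):
    """graph: dict node -> list of (neighbor, edge_cost); h: dict node -> heuristic."""
    frontier = [(h[start], 0, start, [start])]   # entries are (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for nbr, cost in graph[node]:
            g2 = g + cost
            if g2 < best_g.get(nbr, float("inf")):
                best_g[nbr] = g2
                heapq.heappush(frontier, (g2 + h[nbr], g2, nbr, path + [nbr]))
    return None

graph = {"A": [("B", 1), ("C", 2)], "B": [("D", 5)], "C": [("D", 2)], "D": []}
h = {"A": 3, "B": 4, "C": 2, "D": 0}              # admissible and consistent here
print(a_star(graph, h, "A", "D"))                 # (4, ['A', 'C', 'D'])
```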
- Pure and Bayes-Nash Price of Anarchy for Generalized Second Price Auction
Pure and Bayes-Nash Price of Anarchy for Generalized Second Price Auction Renato Paes Leme Eva´ Tardos Department of Computer Science Department of Computer Science Cornell University, Ithaca, NY Cornell University, Ithaca, NY [email protected] [email protected] Abstract—The Generalized Second Price Auction has for advertisements and slots higher on the page are been the main mechanism used by search companies more valuable (clicked on by more users). The bids to auction positions for advertisements on search pages. are used to determine both the assignment of bidders In this paper we study the social welfare of the Nash equilibria of this game in various models. In the full to slots, and the fees charged. In the simplest model, information setting, socially optimal Nash equilibria are the bidders are assigned to slots in order of bids, and known to exist (i.e., the Price of Stability is 1). This paper the fee for each click is the bid occupying the next is the first to prove bounds on the price of anarchy, and slot. This auction is called the Generalized Second Price to give any bounds in the Bayesian setting. Auction (GSP). More generally, positions and payments Our main result is to show that the price of anarchy is small assuming that all bidders play un-dominated in the Generalized Second Price Auction depend also on strategies. In the full information setting we prove a bound the click-through rates associated with the bidders, the of 1.618 for the price of anarchy for pure Nash equilibria, probability that the advertisement will get clicked on by and a bound of 4 for mixed Nash equilibria. -
- Types of Algorithms
Material adapted, courtesy of Prof. Dave Matuszek at UPenn.

Algorithm classification. Algorithms that use a similar problem-solving approach can be grouped together. This classification scheme is neither exhaustive nor disjoint. The purpose is not to be able to classify an algorithm as one type or another, but to highlight the various ways in which a problem can be attacked.

A short list of categories. Algorithm types we will consider include: simple recursive algorithms, divide and conquer algorithms, dynamic programming algorithms, greedy algorithms, brute force algorithms, and randomized algorithms. Note: we may even classify algorithms based on the type of problem they are trying to solve, for example sorting algorithms, searching algorithms, etc.

Simple recursive algorithms. A simple recursive algorithm solves the base cases directly, recurs with a simpler subproblem, and does some extra work to convert the solution to the simpler subproblem into a solution to the given problem. We call these "simple" because several of the other algorithm types are inherently recursive.

Example recursive algorithms. To count the number of elements in a list: if the list is empty, return zero; otherwise, step past the first element, count the remaining elements in the list, and add one to the result. To test if a value occurs in a list: if the list is empty, return false; otherwise, if the first thing in the list is the given value, return true; otherwise, step past the first element and test whether the value occurs in the remainder of the list.

Backtracking algorithms. Backtracking algorithms are based on a depth-first* recursive search. A backtracking algorithm tests to see if a solution has been found and, if so, returns it; otherwise, for each choice that can be made at this point: make that choice, recur, and if the recursion returns a solution, return it; if no choices remain, return failure. (*We will cover depth-first search very soon, once we get to tree traversal and trees/graphs.)
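The two recursive examples above translate directly into code; an illustrative sketch:

```python
def count(lst):
    """Count the elements of a list recursively."""
    if not lst:                      # base case: empty list
        return 0
    return 1 + count(lst[1:])        # step past the first element, then add one

def occurs(value, lst):
    """Test recursively whether value occurs in the list."""
    if not lst:                      # base case: empty list
        return False
    if lst[0] == value:              # found it at the front
        return True
    return occurs(value, lst[1:])    # otherwise check the remainder

print(count([3, 1, 4, 1, 5]))        # 5
print(occurs(4, [3, 1, 4, 1, 5]))    # True
```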
- Syllabus for B.Tech. Computer Engineering, Guru Nanak Dev University, Amritsar
Faculty of Engineering & Technology, Guru Nanak Dev University, Amritsar. Syllabus for B.Tech. Computer Engineering (Credit Based Evaluation and Grading System), Semesters I to VIII, Session 2020-21, batch from year 2020 to year 2024. Note: (i) Copyrights are reserved; nobody is allowed to print it in any form; defaulters will be prosecuted. (ii) Subject to change in the syllabi at any time; please visit the University website from time to time.

CSA1: B.Tech. (Computer Engineering) Semester System (Credit Based Evaluation and Grading System), batch from year 2020 to year 2024. Scheme:

Semester I (S. No., Course Code, Course Title, L, T, P, Credits):
1. CEL120 Engineering Mechanics (L:3, T:1, P:0; Credits: 4)
2. MEL121 Engineering Graphics & Drafting Using AutoCAD (L:3, T:0, P:1; Credits: 4)
3. MTL101 Mathematics-I (L:3, T:1, P:0; Credits: 4)
4. PHL183 Physics (L:4, T:0, P:1; Credits: 5)
5. MEL110 Introduction to Engg. Materials (L:3, T:0, P:0; Credits: 3)
6. Elective-I (L:2, T:0, P:0; Credits: 2)
7. SOA-101 Drug Abuse: Problem, Management and Prevention (Interdisciplinary Course, Compulsory) (L:2, T:0, P:0; Credits: 2)
List of Electives-I: 1. PBL121 Punjabi (Compulsory) (L:2, T:0, P:0), OR 2. PBL122* Mudhli Punjabi (Elementary Punjabi) (L:2, T:0, P:0; Credits: 2), OR 3. HSL101* Punjab History & Culture (1450-1716) (L:2, T:0, P:0)
Total: L 20, T 2, P 2; Credits 24

Semester II (S. No., Course Code, Course Title, L, T, P, Credits):
1. CYL197 Engineering Chemistry (L:3, T:0, P:1; Credits: 4)
2. MTL102 Mathematics-II (L:3, T:1, P:0; Credits: 4)
3. ECL119 Basic Electrical & Electronics Engineering (L:4, T:0, P:1; Credits: 5)
4. CSL126 Fundamentals of IT & Programming using Python (L:2, T:1, P:1; Credits: 4)
5. …