Generation Based Analysis Model for Non-Cooperative Games

Soham Banerjee Department of Computer Engineering International Institute of Information Technology Pune, India Email: [email protected]

Abstract—In the analysis of most real-world games, participating agents are assumed to be perfectly rational and to choose the optimal set of actions; but that is not always the case in complex games. This paper proposes a model for analysing the possible set of moves and their outcomes when not all participating agents are perfectly rational. The proposed model is applied to three games: Prisoners' Dilemma, Platonia Dilemma and Guess 2/3 of the Average. The three games have fundamental differences in their action spaces and outcome payoffs, making them good examples for analysis with the proposed model.

Index Terms—analysis model, deterministic games, superrationality, prisoners' dilemma, platonia dilemma

I. INTRODUCTION

While selecting appropriate actions to perform in a game, a common assumption is that all other participating agents (co-operative or not) will make the optimal move. Such an agent is called a Perfectly Rational Agent. When a perfectly rational agent is aware of the fact that all other agents are also perfectly rational, the agent is said to be Superrational. This concept of decision mapping was first introduced by Douglas Hofstadter in his book Metamagical Themas, while proposing a solution to the Platonia Dilemma.

Analysing the outcomes of games where all agents are superrational is useful and usually leads to a good mapping from observations to actions; but in some cases the agents might not be superrational, and the aforementioned analysis will not yield proper results. The knowledge and understanding of a real-world agent changes based on multiple factors including, but not limited to, previous iterations, changing goals, actions performed by other agents, or simply finding a better set of actions which maps to a better output. As the actions taken by any agent change, the optimal action choices for other agents may change. To analyse all permutations of actions taken by each of the agents, this paper proposes a model that divides the decision process into Generations of Knowledge, which define the actions of each participating agent.

In this paper the proposed model is applied to three games: Prisoners' Dilemma, Platonia Dilemma and Guess 2/3 of the Average. These games have different Nash Equilibria and strategies to achieve the optimal outcome. Each of the selected games has a different approach to its optimal solution, which forms a good base for comparison of the model's performance.

II. PROPOSED MODEL

The proposed model works within the framework of predetermined assumptions, but may be altered or expanded to other games as well.

• All agents have infinite memory - this means that an agent remembers all previous iterations of the game along with the observation as well as the result.
• All the games have a well-defined pure strategy Nash Equilibrium.
• The number of participating agents is always known.
• All the games are non co-operative - this indicates that the agents will only form an alliance if it is self-enforcing.

Each participating agent is said to possess a Generation of Knowledge G_i. A Generation of Knowledge is defined as G_i = f(G_{i-1}), where f(x) is the generation increment function and G_x is the Generation of Knowledge.

A Generation of Knowledge is the state of an agent which defines its decision mapping. For example, an agent who knows its destination is to the right will move towards the right. This is the Generation of Knowledge possessed by the agent. If the agent learns about a shortcut to its destination, it would prefer the shorter path and hence eventually use the shortcut. This is the incremented Generation of Knowledge possessed by the agent.

The model has the following properties:

• The initial Generation of Knowledge is G_0 and, for most games, it is a random choice made over the entire action space.
• Each generation is incremented over the previous Generation of Knowledge based on the output provided by the previous generations.
• For each game the Generation of Knowledge starts repeating after a certain number of generations, as the agent reaches the optimal situation.
• The generations of knowledge are ordinal in nature and have no numeric weight assigned to them.

Using Generations of Knowledge to dictate the agents' actions makes it possible to find the optimal action even when dealing with agents who do not select the optimal action, i.e. are not superrational in nature. In most real-world games such as the stock market, warfare, data sharing etc. it is a challenging task to find agents who can function rationally and make optimal decisions. If we can identify the Generation of Knowledge possessed by an agent, we can select the optimal action from the total action space.

The Generation of Knowledge also changes when different agents have different objectives. An agent wanting to earn the maximum amount of money possible will not care about the money earned by other agents. This means that in games where different agents are trying to achieve different goals they will be at different generations of knowledge, making the proposed model useful in these situations as well.

This paper uses the proposed model on "Prisoners' Dilemma", "Platonia Dilemma" and "Guess 2/3 of the Average" to analyse the payoffs.

III. PRISONERS' DILEMMA

Prisoners' Dilemma is a common game played between two or more agents. The game is defined as follows: Two criminals are arrested and are taken in for interrogation. There exists no means of communication between the two individuals. The prosecutors have sufficient evidence to convict the pair on a lesser charge, but lack sufficient evidence to convict the pair on the primary charge. In order to convict them on their primary charge, the prosecutors offer each prisoner a bargain. Each criminal is given the opportunity to either cooperate with the other criminal by remaining silent or betray the other criminal by testifying that the other committed the crime. The offer provides different payouts for different scenarios. These are:

• If both criminals betray each other, each of them serves two years in prison.
• If one of them (A) betrays the other (B) while B remains silent, A faces no charges and is free to go, but B, who stays quiet, will have to serve three years in prison.
• If both criminals remain silent, they will each serve only one year in prison.

Fig. 1. Payoff Matrix for Prisoners' Dilemma

If we apply our model to this game we get the Generations of Knowledge as:

• G_0 = 50% chance of staying silent and 50% chance of betraying.
• G_1 = 100% chance of betraying (Pure Strategy Nash Equilibrium).
• G_i = the i-th iteration, where you perform the action performed by the other agent in the previous generation. This strategy is also called Tit-For-Tat.
• G_n = it becomes apparent that both players will always pick the same action, making staying silent the superrational action to take.

Fig. 2. Payoff Matrix for each generation in Prisoners' Dilemma

In the inter-generation payoff table, there exist cells that have multiple values; these have been reduced to one value based on their probabilities of occurrence. For example, if a cell has a 50% probability of having value 3 and a 50% probability of having value 5, the reduced value will be 4.

IV. PLATONIA DILEMMA

Douglas Hofstadter, in his book Metamagical Themas, introduces a game played among 20 people who have no means of communication with each other. The game is defined as follows: An eccentric trillionaire gathers 20 people together, and tells them that if one and only one of them sends him a telegram (reverse charges) by noon the next day, that person will receive a billion dollars. If he receives more than one telegram, or none at all, no one will get any money.

It may seem impossible to win this game, but there is a set of actions shared by all participating agents of the game which leads to one person acquiring the billion dollars. The game has the generations of knowledge defined as follows:

• G_0 = 50% probability of sending a telegram and 50% probability of not sending a telegram.
• G_1 = Send a telegram anyway, as it increases your probability of attaining the billion dollars.
• G_2 = Roll a 20-sided die and only send a telegram if the outcome of the roll is 1 (as this leads to exactly one person sending the telegram more often than any other shared strategy).
• G_3 = Send a telegram anyway, as it increases your probability of attaining the billion dollars.
• G_{2i} = Roll a 20-sided die and only send a telegram if you roll a 1. (Pure Strategy Nash Equilibrium)
• G_{2i+1} = Send a telegram anyway, as it increases your probability of attaining the billion dollars.

After a couple of increments, the Generations of Knowledge start to toggle between rolling a die and sending the telegram anyway. This makes the model especially useful, as knowing the Generation of Knowledge of the other contestants can greatly increase the agent's chances.

The following table indicates the probability (in percentage) of winning the billion dollars based on what Generation of Knowledge the agent and the opposing agents are on.

Fig. 3. Payoff Matrix for each generation in Platonia Dilemma

V. GUESS 2/3 OF THE AVERAGE

Alain Ledoux used this game as a tie breaker in his French magazine Jeux et Stratégie. The game was played by about 4000 readers of the magazine. The rules were as follows: each contestant chooses an integer between one and a hundred. The winner of the game is the participant who chose the number closest to 2/3 of the average of all the numbers submitted by each candidate.

At first glance the solution seems to be in the range of 20 to 40, but that will not be the case if all participants share the same Generation of Knowledge. If each participant assumes the average to be 30 then each player guesses 20; this means that the correct answer has now changed from 20 to about 13 (2/3 of 20).

If we assume the optimal ranges for the game, it has generations of knowledge as follows:

• G_0 = randomly select any number from 1 to 66.
• G_1 = select a number in the range of 30 to 40.
• G_2 = select a number in the range of 15 to 20.
• G_3 = select a number in the range of 12 to 15.
• G_i = select a number in the range r_i, where r_i is the range of the i-th generation.
• G_n = select 1 (Pure Strategy Nash Equilibrium).

The range of each generation is smaller than the previous one because 2/3 of the average keeps reducing until it reaches its minimum value of 1. If the game asked the participants to pick the number closest to 3/2 of the average, then the n-th generation would be to select 100. If each participant chooses 1, then 2/3 of the average is 0.66, whose closest number in the range is 1, making all participants winners.

VI. RESULTS

After defining the model and applying it to three different games, different action mappings which dictate the outcome of the games can be obtained. In Prisoners' Dilemma, if the Generation of Knowledge of the other agent is known, we can select the optimal action. The same principle can also be applied to Platonia Dilemma as well as "Guess 2/3 of the Average". After dividing the various actions by Generations of Knowledge, it can be noticed that selecting the action that leads to the Pure Strategy Nash Equilibrium may not be the solution if the other participating agents are at different generations of knowledge.

For games like Prisoners' Dilemma, the actions of consecutive generations of knowledge are similar to each other and gradually reach superrationality as the generations are incremented. For games like Platonia Dilemma, the generations of knowledge start to toggle between two actions. This implies that the outcome of such games changes drastically based on the actions performed by the participating agents. For games like Guess 2/3 of the Average, we see that the generations of knowledge start to close in on a single action irrespective of the action space or the payoff.

VII. CONCLUSION

After analysing all the selected games using the proposed model, a better mapping of actions to outcomes can be obtained. The state of the other agents can also be taken into consideration. Smaller games like Prisoners' Dilemma have a very small action space, but real-world games have a much larger action space with many possible outcomes. This makes the proposed model very useful, as most real-world agents are at different Generations of Knowledge.

REFERENCES

[1] "Metamagical Themas" Douglas Hofstadter - Basic Books, 1985
[2] "A Short Note on the Solution of the Prisoner's Dilemma" Mario Köppen, Masato Tsuru - ICINCS, 2015
[3] "The evolution of cooperation" R. Axelrod - Penguin Books, 2013
[4] "Computing Nash Equilibria and Evolutionarily Stable States of Evolutionary Games" Jiawei Li, Graham Kendall, Robert John - IEEE Transactions on Evolutionary Computation, 2015
[5] https://en.wikipedia.org/wiki/Guess_2/3_of_the_average
[6] https://theincidentaleconomist.com/wordpress/analysis-of-whats-23-of-the-average
[7] https://en.wikipedia.org/wiki/Superrationality
[8] https://en.wikipedia.org/wiki/Prisoner%27s_dilemma
[9] https://mindingourway.com/causal-reasoning-is-unsatisfactory
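As a supplementary illustration of Section III, the following minimal sketch encodes the prison sentences stated in the game definition and shows how a cell with several possible values can be reduced to one expected value, in the same way that 3 and 5 at 50% each reduce to 4. The names `YEARS`, `expected_years`, `best_response` and `tit_for_tat` are hypothetical, and the choice to stay silent on the first Tit-For-Tat round is an assumption of this sketch, not something the paper specifies.

```python
# Years in prison from Section III, indexed as (my_action, other_action).
# Lower is better. The action labels are chosen for this sketch only.
SILENT, BETRAY = "silent", "betray"
YEARS = {
    (SILENT, SILENT): 1,
    (SILENT, BETRAY): 3,
    (BETRAY, SILENT): 0,
    (BETRAY, BETRAY): 2,
}


def expected_years(my_action, p_other_betrays):
    """Expected sentence when the other agent betrays with the given probability."""
    return (p_other_betrays * YEARS[(my_action, BETRAY)]
            + (1 - p_other_betrays) * YEARS[(my_action, SILENT)])


def best_response(p_other_betrays):
    """Action with the lowest expected sentence against the given opponent."""
    return min((SILENT, BETRAY),
               key=lambda action: expected_years(action, p_other_betrays))


def tit_for_tat(other_history):
    """G_i: repeat the other agent's previous action.

    Staying silent when there is no history is an assumption of this sketch.
    """
    return other_history[-1] if other_history else SILENT
```

Against a G_0 opponent (betrays with probability 0.5), staying silent costs 2 years in expectation and betraying costs 1 year, which is consistent with G_1 choosing to betray.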
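The toggling between generations described in Section IV can be checked with a short calculation. The sketch below assumes that every agent decides independently and that all opposing agents share one sending probability; the function names are hypothetical, and the numbers it produces are not claimed to reproduce the values of Fig. 3.

```python
def p_exactly_one_sender(n_agents, p_send):
    """Probability that exactly one of n independent agents sends a telegram."""
    return n_agents * p_send * (1 - p_send) ** (n_agents - 1)


def p_agent_wins(n_agents, p_send_self, p_send_others):
    """Probability that one agent sends while none of the other agents do."""
    return p_send_self * (1 - p_send_others) ** (n_agents - 1)


if __name__ == "__main__":
    n = 20
    # Everyone at G_0 (coin flip), G_1 (always send) or G_2 (20-sided die).
    for label, p in (("G_0", 0.5), ("G_1", 1.0), ("G_2", 1 / 20)):
        print(label, p_exactly_one_sender(n, p))
    # One agent deviates to "always send" while the others roll the die.
    print("deviator", p_agent_wins(n, 1.0, 1 / 20))
```

When all 20 agents roll the die, someone wins with probability 0.95^19, roughly 0.377. A single agent who always sends while the others roll the die wins with that same probability, compared with about 0.019 when rolling the die as well, which is why the next generation sends anyway. If everyone sends, nobody wins, which pushes the following generation back to the die.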
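The shrinking ranges in Section V can be illustrated by iterating a single representative guess per generation. This is only a sketch under stated assumptions: the starting average of 50, the use of one point guess instead of a range, and the function name `guess_generations` are all hypothetical, so the sequence does not match the ranges listed for G_1 to G_3 exactly.

```python
def guess_generations(initial_average=50.0, factor=2 / 3,
                      low=1, high=100, max_generations=50):
    """Iterate the guess each generation would make if it believed every
    other agent were one generation behind.

    Each generation guesses `factor` times the previous generation's guess,
    rounded to an integer and clipped to the allowed range [low, high].
    Iteration stops once the guess no longer changes.
    """
    guesses = []
    average = initial_average
    for _ in range(max_generations):
        guess = min(high, max(low, round(factor * average)))
        if guess == average:  # fixed point reached
            break
        guesses.append(guess)
        average = guess
    return guesses


if __name__ == "__main__":
    print(guess_generations())              # 2/3 of the average
    print(guess_generations(factor=3 / 2))  # 3/2 of the average
```

With the default arguments the guesses fall 33, 22, 15, 10, 7, 5, 3, 2, 1 and then stay at 1, in line with G_n = select 1. With a factor of 3/2 the guesses rise to 75 and then to the upper bound of 100.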