Modeling Human Learning in Games
Modeling Human Learning in Games

Thesis by Norah Alghamdi

In Partial Fulfillment of the Requirements for the Degree of Master of Science

King Abdullah University of Science and Technology
Thuwal, Kingdom of Saudi Arabia

December, 2020

EXAMINATION COMMITTEE PAGE

The thesis of Norah Alghamdi is approved by the examination committee.

Committee Chairperson: Prof. Jeff S. Shamma
Committee Members: Prof. Eric Feron, Prof. Meriem T. Laleg

©December, 2020
Norah Alghamdi
All Rights Reserved

ABSTRACT

Modeling Human Learning in Games
Norah Alghamdi

Human-robot interaction is an important and broad area of study. To achieve successful interaction, we have to study human decision-making rules. This work investigates human learning rules in games in the presence of intelligent decision makers. In particular, we analyze human behavior in a congestion game. The game models traffic in a simple scenario where multiple vehicles share two roads. Ten vehicles are controlled by the human player, who decides how to distribute them between the two roads. There are one hundred simulated players, each controlling one vehicle. The game is repeated for many rounds, allowing the players to adapt and formulate a strategy; after each round, the cost of each road, along with visual assistance, is shown to the human player. The goal of all players is to minimize the total congestion experienced by the vehicles they control. To demonstrate our results, we first built a human player simulator using the Fictitious Play and Regret Matching algorithms. We then showed the passivity property of these algorithms after adjusting the passivity condition to suit a discrete-time formulation. Next, we conducted the experiment online to allow players to participate. A similar analysis was performed on the collected data to study the passivity of the human decision-making rule. We observe different performance with different types of virtual players; however, in all cases, the human decision rule satisfied the passivity condition. This result implies that human behavior can be modeled as passive, and systems can be designed to use these results to influence human behavior and reach desirable outcomes.

ACKNOWLEDGEMENTS

First and foremost, thanks to God for helping me throughout this journey and for blessing me with all the amazing and inspiring people I have met.

I would like to thank my advisor, Professor Jeff Shamma, for the help, guidance, and advice he gave me. I am fortunate to have had the opportunity to work under his supervision, and I am grateful that he was always kind, supportive, and inspiring.

I extend my thanks to Professor Eric Feron and Professor Meriem Laleg for being part of my master's defense committee. I am grateful to have met them, and I am thankful for everything I learned in the courses they offered.

I am extremely thankful to Dr. Ayman Shamma for his invaluable assistance in implementing the experiment online. While working on the experiment, he was supportive and patient, and he gave me a lot of advice that made the learning experience much easier.

My appreciation also goes to all RISC lab members for their help and the friendly environment they have created. They are always glad to help and to share their knowledge and experience.

Last but not least, I wish to show my gratitude to my family, especially my parents, for their kind support, encouragement, and continuous prayers. I extend my gratitude to all my friends, who are always supportive and encouraging.
TABLE OF CONTENTS

Examination Committee Page
Copyright
Abstract
Acknowledgements
Table of Contents
1 Introduction
  1.1 Literature Review
  1.2 Objectives and Contributions
  1.3 Thesis Outline
2 Background and Preliminaries
  2.1 Population Games
  2.2 Stable Games
  2.3 Congestion Games
  2.4 A Primer on Passivity
    2.4.1 Input-Output Operators and Feedback Interconnection
    2.4.2 Passivity with Input-Output Operators
  2.5 Passivity for Stable Games and Evolutionary Dynamics
    2.5.1 Evolutionary Games Feedback Interconnection
    2.5.2 δ-Passivity and δ-Anti-Passivity
    2.5.3 Stable Games and Passive Evolutionary Dynamics
3 Congestion Game Experiment Formulation and Implementation
  3.1 Problem Formulation
  3.2 Utility Design
    3.2.1 Roads Congestion Functions
    3.2.2 Human Player Utility Function
    3.2.3 Virtual Users Utility Function
  3.3 Virtual Player Decision Making Rules
    3.3.1 Random
    3.3.2 Best Response
    3.3.3 Fictitious Play
    3.3.4 Regret Matching
  3.4 Experiment Setup and Game Flow
  3.5 Web Interface
4 Results and Analysis
  4.1 Human Player Simulator
    4.1.1 Fictitious Play Virtual User
    4.1.2 Regret Matching Virtual User
  4.2 The Truck Routing Challenge Analysis
  4.3 Human Players Data Analysis
5 Concluding Remarks
  5.1 Summary
  5.2 Future Research Work
References

Chapter 1

Introduction

Human-robot interaction is an important and broad area of study. Much work has been done to delegate complete missions or individual tasks to robots, which must take the human aspect of the process into account. However, unlike software and hardware, there is no model that captures human behavior completely and perfectly. Therefore, to entrust tasks and responsibilities to robots, we have to study human behavior and gain some understanding of how choices and actions are made.

Recently, much emphasis has been placed on collaborative robots and on task distribution between robots and humans. This comes with its own challenges because of the complexity of human behavior. The challenges include predicting what humans will do in response to specific scenarios and how the available information is perceived. By understanding human behavior, robots can be programmed to be better teammates for humans.

In this framework, we use a game-theoretic approach to gain some insight into human behavior. Game theory is the study of strategies and of what happens when independent, self-interested agents interact. These interactions may take place in cooperative or non-cooperative environments where the overall outcome depends on the decisions of all autonomous agents; therefore, no single decision maker can fully control the outcome [1], [2]. To take advantage of game theory, we must state the problem in game form, that is, define a set of n players (agents), their allowable actions (action profiles), and their utility (or payoff) functions; a minimal illustrative sketch is given below. More details about games and their definitions are in the next chapter.

Although in this work we take advantage of game-theoretic concepts, there are different ways to approach human modeling. Much work has been done with the intention of learning from humans. The goals of these projects are usually along the lines of making robots that either work as well as a human or work with humans. In the next section, we explore some of these projects.
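As an illustration of this game-form viewpoint, the following minimal Python sketch (not part of the thesis; all names are hypothetical) encodes players, action sets, and payoff functions for a toy two-player, two-road congestion game of the kind studied later:

```python
# Illustrative only: a finite game in game form -- players, allowable
# actions, and utility (payoff) functions. Names are hypothetical.
from typing import Callable, Dict, List, Tuple

ActionProfile = Tuple[str, ...]  # one action per player


class Game:
    def __init__(self,
                 players: List[str],
                 actions: Dict[str, List[str]],
                 utilities: Dict[str, Callable[[ActionProfile], float]]):
        self.players = players        # the set of n players (agents)
        self.actions = actions        # allowable actions per player
        self.utilities = utilities    # payoff function per player

    def payoff(self, player: str, profile: ActionProfile) -> float:
        # A player's payoff depends on the joint action of all players,
        # so no single decision maker fully controls the outcome.
        return self.utilities[player](profile)


def congestion_utility(player_index: int) -> Callable[[ActionProfile], float]:
    # A player's congestion equals the number of players on the same road.
    def u(profile: ActionProfile) -> float:
        load = profile.count(profile[player_index])
        return -float(load)  # utility = negative congestion
    return u


game = Game(
    players=["P1", "P2"],
    actions={"P1": ["road_A", "road_B"], "P2": ["road_A", "road_B"]},
    utilities={"P1": congestion_utility(0), "P2": congestion_utility(1)},
)
print(game.payoff("P1", ("road_A", "road_A")))  # -2.0: both share road A
```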
1.1 Literature Review

Due to the rising interest in having robots assist human workers in a wide range of applications, it has become important to understand how humans learn and make their decisions.

An example of learning how humans learn in a game is given in [3]. The authors conduct an experiment in which players play a routing game online. Under the assumption that each player's learning dynamic is an entropic mirror descent, the problem they try to solve is estimating the learning rate of each player, using that player's decisions and the resulting outcomes (a sketch of the assumed update rule is given at the end of this section). They developed the online routing game as a web application. Players log in and are assigned a starting point and a destination. Every player chooses a distribution over their available routes, and the collective decision of all players determines the cost of each path. In this game, each player's objective is to minimize their own cost. The model parameters estimated in this experiment were used to predict the flow distribution over routes. By comparing their predictions with the actual distributions, the authors show that the online learning model can be used as a predictive model over a short horizon. Their results show that the estimated distribution closely fits the actual distribution for one of the players. This suggests that the mirror descent model is a good approximation of the human learning model in this game, though not necessarily the best one.

Another work that involves the human element is that of the authors of [4]. The goal of their work was to determine how a dynamical system should act to match human preferences. However, it is difficult for the user to provide a demonstration, or a description, of what the reward function should be. Therefore, they propose a preference-based approach with the intention of learning the desired reward function of the dynamical system. The preference-based reward-learning algorithm works as follows (a simplified sketch of the weight update is given at the end of this section):

• The first step is to synthesize queries. In this part of the algorithm, two trajectories (hypothetical situations) are generated through active volume removal.

• Then, these trajectories are shown to the participant, who provides a preference between the specific pair of trajectories.

• The last step is to update the human's reward function. The reward function is a linear combination of a set of features, so the feature weights are updated using the participant's input acquired in the previous step.

These steps are repeated for N iterations. The authors tested their algorithm, and the reward function they learned outperformed reward functions obtained using non-active and non-synthesis learning.
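For concreteness, on the probability simplex the entropic mirror descent update assumed in [3] reduces to a multiplicative-weights step. The following Python sketch is ours, not the cited authors' code; the function name and the learning rate eta (the per-player parameter that the cited work estimates from observed play) are illustrative:

```python
# A hedged sketch of one entropic mirror descent step on the simplex.
import numpy as np

def entropic_mirror_descent_step(x: np.ndarray,
                                 costs: np.ndarray,
                                 eta: float) -> np.ndarray:
    """One update of a route distribution x given the observed route costs.

    With the entropy regularizer, mirror descent reduces to a
    multiplicative-weights (exponentiated-gradient) update followed
    by renormalization back onto the simplex.
    """
    w = x * np.exp(-eta * costs)   # down-weight costly routes
    return w / w.sum()             # renormalize to a distribution

# Example: two routes, the second is currently more congested.
x = np.array([0.5, 0.5])
costs = np.array([1.0, 2.0])
print(entropic_mirror_descent_step(x, costs, eta=0.7))
# probability mass shifts toward the cheaper route
```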
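To make the linear-reward update concrete, the following is a deliberately simplified Python sketch under our own assumptions. The actual algorithm in [4] maintains a distribution over the weights and synthesizes queries by active volume removal; this sketch only applies a perceptron-style correction for a single preference, with hypothetical feature names:

```python
# Simplified illustration (not the method of [4]): the reward is linear in
# trajectory features, r(xi) = w . phi(xi); one human preference nudges w.
import numpy as np

def update_weights(w: np.ndarray,
                   phi_preferred: np.ndarray,
                   phi_rejected: np.ndarray,
                   step: float = 0.1) -> np.ndarray:
    """Nudge w so the preferred trajectory scores higher than the rejected one."""
    if w @ phi_preferred <= w @ phi_rejected:
        w = w + step * (phi_preferred - phi_rejected)
    return w / max(np.linalg.norm(w), 1e-9)  # keep the weights bounded

# One query: feature vectors of two hypothetical trajectories; the
# participant prefers trajectory A.
w = np.zeros(3)
phi_A = np.array([0.8, 0.1, 0.3])   # e.g., speed, proximity, heading
phi_B = np.array([0.2, 0.9, 0.4])
w = update_weights(w, phi_A, phi_B)
print(w)  # weights now favor the features of the preferred trajectory
```

Repeating this query-and-update loop for N iterations, as the bullet list above describes, gradually aligns the learned reward with the participant's preferences.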