Internal Model from Observations for Reward Shaping

Total Pages: 16

File Type: PDF, Size: 1020 KB

Internal Model from Observations for Reward Shaping

Daiki Kimura, IBM Research AI, Tokyo, Japan, [email protected]
Subhajit Chaudhury, IBM Research AI, Tokyo, Japan, [email protected]
Ryuki Tachibana, IBM Research AI, Tokyo, Japan, [email protected]
Sakyasingha Dasgupta*, Ascent Robotics Inc., Tokyo, Japan, [email protected]

*This work originated while working at IBM Research.

arXiv:1806.01267v4 [cs.LG] 14 Oct 2018

ABSTRACT

Reinforcement learning methods require careful design involving a reward function to obtain the desired action policy for a given task. In the absence of hand-crafted reward functions, prior work on the topic has proposed several methods for reward estimation by using expert state trajectories and action pairs. However, there are cases where complete or good action information cannot be obtained from expert demonstrations. We propose a novel reinforcement learning method in which the agent learns an internal model of observation on the basis of expert-demonstrated state trajectories to estimate rewards, without completely learning the dynamics of the external environment from state-action pairs. The internal model is obtained in the form of a predictive model for the given expert state distribution. During reinforcement learning, the agent predicts the reward as a function of the difference between the actual state and the state predicted by the internal model. We conducted multiple experiments in environments of varying complexity, including the Super Mario Bros and Flappy Bird games. We show that our method successfully trains good policies directly from expert game-play videos.

KEYWORDS

Inverse Reinforcement Learning, Reward Shaping, Expert demonstration

1 INTRODUCTION

Reinforcement learning (RL) [30] enables an agent to learn the desired behavior required to accomplish a given objective, such that the expected return or reward for the agent is maximized over time. Typically, a scalar reward signal is used to guide the agent's behavior so that the agent learns a control policy that maximizes the cumulative scalar reward over trajectories. This type of learning is referred to as model-free RL if the agent does not have an a priori model or knowledge of the dynamics of the environment it is acting in. Notable breakthroughs among the many recent research efforts that incorporate deep models are the deep Q-network (DQN) [17], which approximated a Q-value function with a deep neural network and trained agents to play Atari games with discrete control; the deep deterministic policy gradient (DDPG) [15], which successfully applied deep RL to continuous control; and trust region policy optimization (TRPO) [27], which formulated a method for optimizing control policies with guaranteed monotonic improvement.

In most RL methods, it is critical to choose a well-designed reward function to ensure that the agent learns a good action policy for performing the task. Moreover, there are cases in which the reward function is very sparse or may not be directly available. Humans can often imitate the behavior of their instructors and estimate which actions or environmental states are good for the eventual accomplishment of a task without being provided with a continual reward. For example, young adults initially learn how to write letters by imitating demonstrations provided by their teachers or other adults (experts). Further skills are developed through exploration around this initial grounding provided by the demonstrations. Taking inspiration from such scenarios, various methods have been proposed that are collectively known as imitation learning [8, 9] or learning from demonstration [26]. Inverse reinforcement learning [1, 19, 35], behavior cloning [25], and curiosity-based exploration [23] are also examples of research in this field. Typically, in all these formulations, expert demonstrations are provided as input.

The majority of such prior work assumes that the demonstrations contain both states and actions, {(s_0^i, a_0^i), ..., (s_t^i, a_t^i)}, and that these can be used to solve the problem of having only a sparse reward or a complete lack thereof. However, there are many cases in real-world environments in which such detailed action information is not readily available. For example, a typical schoolteacher does not tell students the exact amount of force to apply to each of their fingers while they are learning how to write.

As such, in this work, as our primary contribution, we propose a reinforcement learning method in which the agent learns an internal predictive model of the external environment that is trained on state-only trajectories from expert demonstrations, not on state-action pairs. During each RL step, the method estimates an expected reward value on the basis of the similarity between the actual state and the state predicted by the internal model. The agent therefore learns to reward known good states and penalize unknown deviations. We formulate this internal model as a temporal-sequence prediction model that predicts the next state value given the current and past state values at every time step. This paper presents experimental results on multiple environments with varying input and output settings for the internal model. In particular, we show that it is possible to learn good policies using an internal model trained by observing only game-playing videos, akin to the way we as humans learn by observing others. Furthermore, we compare the performance of our proposed method against hand-crafted reward baselines, prior research efforts, and other baseline methods in the different environments.

2 RELATED WORK

In RL, an agent learns a policy π(a_t|s_t) that produces good actions from the observation at the time. DQN [17] showed that a Q-value function q(s_t, a_t) can be successfully approximated with a deep neural network. DAQN [12] showed that pre-training with a generative model reduces the number of training iterations. Similarly, the actor and critic networks in DDPG can enable continuous control, e.g., in robotic manipulation, by minimizing the distance between the robot end-effector and the target position. Since the success of DDPG, other methods, such as TRPO [27] and proximal policy optimization (PPO) [28], have been proposed as further improvements for model-free RL in continuous control.

Although RL enables an agent to learn an optimal policy in the absence of supervised training data, in the standard case it involves the difficult task of hand-crafting a good reward function for each environment [1]. Several kinds of approaches have been proposed to work around or tackle this problem. One approach that does not require hand-crafted rewards is behavior cloning, which is based on supervised learning instead of RL [25]. It learns the conditional distribution of actions given states in a supervised manner. Although it has the advantage of fast convergence [8] (as behavior cloning learns a single action from states during each step), it typically results in compounding errors in future states.

An alternate approach, inverse reinforcement learning (IRL), was proposed in [19]. There, the authors tried to recover the reward function that best describes the given expert demonstrations from humans or expert agents using linear programming methods, based on the assumption that expert demonstrations are solutions to a Markov decision process (MDP) defined by a hidden reward function [19]. They demonstrated successful estimation of the reward function in relatively simple environments, such as a grid world and the mountain car problem. Extending [19], entropy-based methods that compute a suitable reward function by maximizing the entropy of the expert demonstrations have been proposed [36].

Our method is also related to the reward-shaping approach proposed by [5], because both methods calculate the similarity of demonstrations as a reward-shaping function. However, while their paper dealt only with discrete action tasks, we show that a similar approach can be applied to continuous action tasks. Moreover, all the above-mentioned methods rely on both state and action information provided by expert demonstrations.

Another recent line of work aimed at learning useful policies for agents even in the absence of expert demonstrations. In this regard, they trained an RL agent with a combination of an intrinsic curiosity-based reward and a hand-engineered reward that had a continuous or very sparse scalar signal [23]. The curiosity-based reward was designed to have a high value when the agent encountered unseen states and a low value when it was in a state similar to previously explored states. The paper reported good policies in games, such as Super Mario Bros. and Doom, without any expert demonstrations. Here, we also compared our proposed method with the curiosity-based approach and demonstrated better-learned behavior. However, as a limitation, our method assumed that state demonstrations were available as expert data.

There is also work that estimates the reward with a linear function [29]. However, it was evaluated only on simple tasks; specifically, it used 27 discrete state variables for Mario. Our method, in contrast, uses a non-linear model.

Concurrently with this work, a recent paper [31] proposed learning policies with a behavior cloning method based on observations only. Unlike that work, we put our primary focus on reward shaping based on an internal model learned from observation data.

3 PROPOSED METHOD

3.1 Problem Statement

We considered an MDP consisting of states S and actions A, where the reward signal r : S × A → R was unknown. An agent acted in the environment defined by this MDP following a policy, π(a_t|s_t). Here, we assumed knowledge of a finite set of expert state trajectories, τ = {S^0, ..., S^n}, where S^i = {s_0^i, ..., s_m^i}. These trajectories represented joint angles, raw images, or other environmental states. Since the reward signal was unknown, our primary goal was to find a reward signal that enabled the agent to learn a policy, π, that
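To make the reward-shaping idea above concrete, here is a minimal sketch in PyTorch. This is not the authors' implementation: the LSTM predictor, the fixed training window, the Gaussian similarity, and all names (InternalModel, fit_internal_model, shaped_reward, sigma) are illustrative assumptions, since the excerpt above does not specify the exact architecture or similarity function.

    import torch
    import torch.nn as nn

    class InternalModel(nn.Module):
        # Temporal-sequence predictor: given a window of past states,
        # predict the next state. Trained only on expert state trajectories.
        def __init__(self, state_dim, hidden_dim=64):
            super().__init__()
            self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, state_dim)

        def forward(self, states):          # states: (batch, time, state_dim)
            out, _ = self.lstm(states)
            return self.head(out[:, -1])    # predicted next state

    def fit_internal_model(model, windows, epochs=100, lr=1e-3):
        # windows: (N, T+1, state_dim) slices of expert trajectories;
        # the first T states are the input, the last state is the target.
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            pred = model(windows[:, :-1])
            loss = nn.functional.mse_loss(pred, windows[:, -1])
            opt.zero_grad()
            loss.backward()
            opt.step()

    def shaped_reward(model, past_states, next_state, sigma=1.0):
        # Reward is near 1 when the actual next state matches the prediction
        # of the expert-trained model and decays toward 0 for deviations.
        with torch.no_grad():
            pred = model(past_states.unsqueeze(0))[0]
        err = torch.norm(pred - next_state)
        return torch.exp(-err ** 2 / (2 * sigma ** 2)).item()

During RL, shaped_reward would then stand in for the unknown environment reward at every step, pulling the agent toward states that the expert state distribution makes predictable.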
Recommended publications
  • Vietnam's Hit Game Developer Pulls Plug on Flappy Bird (Update) 10 February 2014, by Cat Barton
    The Vietnamese developer behind the smash-hit free game Flappy Bird has pulled his creation from online stores after announcing that its runaway success had ruined his "simple life".

    "'Flappy Bird' is a success of mine. But it also ruins my simple life. So now I hate it," the game's creator Nguyen Ha Dong tweeted. "I am sorry 'Flappy Bird' users, 22 hours from now, I will take 'Flappy Bird' down. I cannot take this anymore," he wrote Saturday in a message that had been retweeted nearly 140,000 times.

    "It is not anything related to legal issues. I just cannot keep it anymore," Dong tweeted from his @dongatory handle, which has seen its follower count grow by tens of thousands in the last few days.

    Technology experts say the addictive and notoriously difficult game rose from obscurity at its release last May to become one of the most downloaded free mobile games on Apple's App Store and Google's Play store. The free game has been the number one app in Apple's iOS App Store in more than 100 countries, according to An Minh Do, editor at the Tech in Asia online media company.

    Flappy Bird features 2D retro-style graphics. The aim of the game is to direct a flying bird between oncoming sets of pipes without touching them. Dong has said in interviews that his brainchild was pulling in as much as $50,000 per day in revenue from online advertising banners.

    [Photo captions: Nguyen Ha Dong, creator of the game Flappy Bird, is pictured in Hanoi on February 5, 2014. Online commentators have speculated that Nguyen Ha Dong took down Flappy Bird after being pressured by Japan's Nintendo.]
  • FlaPGA Mario: Implementing a Simple Video Game on Basys 3 FPGA Board
    FlaPGA Mario: Implementing a simple video game on Basys 3 FPGA board. Liu Haohua, June 2018. (The full code can be obtained at https://github.com/howardlau1999/flapga-mario)

    Contents: 1 Introduction (1.1 Background; 1.2 Drawing the blueprint; 1.3 System architecture). 2 Fundamentals (2.1 VGA Module: 2.1.1 Basys 3 VGA port specs, 2.1.2 VGA Timing; 2.2 ROMs: 2.2.1 Accessing the pixels, 2.2.2 Writing data to the board; 2.3 RAMs). 3 Graphics Engine (3.1 Displaying images; 3.2 Sprites; 3.3 Background Engine; 3.4 Object Engine; 3.5 Layers blending). 4 Audio Engine (4.1 Sine wave generator; 4.2 Music data). 5 Game Logic (5.1 Mario; 5.2 Pipes and coins: 5.2.1 Pipes and coin generation; 5.3 Collision detection; 5.4 Scrolling: 5.4.1 Split scrolling, 5.4.2 Parallax scrolling effect; 5.5 Data writing arrangement; 5.6 Game status). 6 Conclusion (6.1 Screenshots; 6.2 My feelings; 6.3 Possible improvements). 7 References. 8 Acknowledgement. 9 Appendix (9.1 Resources conversion: 9.1.1 Image data conversion, 9.1.2 Music data conversion).

    1 Introduction. 1.1 Background: Early in my childhood I had my first contact with video game consoles, and I have long been fascinated by the virtual world that games create.
  • Flapping Genius!: The Ultimate Flappy Birds Trivia Challenge PDF, EPUB, EBOOK
    Flapping Genius!: The Ultimate Flappy Birds Trivia Challenge. Mr James Moore | 26 pages | 25 Nov 2014 | CreateSpace Independent Publishing Platform | 9781503332591 | English

    Flapping Genius!: The Ultimate Flappy Birds Trivia Challenge PDF Book. Reasons to play this reactions-based arcade game: Nostalgic fans of 80s arcade games like Pong and Atari Breakout should appreciate the classic 80s gameplay with a fun twist for added excitement. Squareman 3. Newer Super Mario Bros. We suggest giving players lots of useful new games and lessons that you can draw from them. Higurashi no Naku Koro ni. If two player mode is activated, good teamwork is essential as you and your partner must work in tandem to defeat enemies! Click or hold down the screen to start shooting when you are an assassin and turn back when you are king. Not just in its sound but in his writing as well. Reasons to play this virtual bowling skill game: Whether you like to bowl in real life or just would like a taste of the indoor action, this cool and realistic version should whet your appetite to see some pins fall! Show off your sniping ability in this epic edition of the Sni[p]r series. Squareman 2. A steady hand, deft wrist and solid mouse control are also important for the smooth bowling action required to score strikes and spares at will. Metal Gear Solid: Peace Walker. MapleStory DS. Haunted Castle. Space Defenders. It is a scarce commodity, and requires a lot of skill to play well. Can you complete the training? Beat all the Robots and defend the earth! Nintendo 3DS Music.
  • Math 202H Project
    Math 202H Project - due Monday, April 20th. Flappy Bird is a mobile game released in May 2013 where the player tries to fly the bird (Faby) between rows of green pipes without hitting them. Once Faby has hit a pipe, the game is over. Flappy Bird soared in popularity in early 2014 and was the most downloaded free game in January 2014 from the iOS App Store. The developer, Dong Nguyen, claims he was making $50,000 per day at that time due to in-game advertisements. However, Flappy Bird was removed from Google Play and Apple's App Store in February 2014 due to the creator's guilt over the addictive nature of the game. Shortly after the game's removal, devices where the game was pre-installed were sold on eBay for thousands of dollars. One of the reasons this simplistic game is so addictive is its challenge. If you haven't played it yet, you'll soon see that flying Faby through the pipes is quite a task.

    1. Play Flappy Bird ten times and record your best score and your average score over those ten trials. You can play for free online here: http://flappybird.io/.

    2. Our goal in this project is to develop a formula for the expected score based on the probability of successfully getting through a single pipe. We'll start by modeling Flappy Bird scores as a series of independent success events (I realize that's a simplification, but I don't think an unreasonable one). The very simple model assumes you have a success rate of "p" for each pipe.
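    Under the project's independence assumption, the score is geometric: the probability of passing at least k pipes is p^k, so the expected score is the geometric series sum over k >= 1 of p^k, which equals p/(1-p). The following short sketch (illustrative, not part of the assignment) checks that formula by simulation:

        import random

        def expected_score(p):
            # E[score] = sum_{k>=1} P(score >= k) = sum_{k>=1} p**k = p / (1 - p)
            return p / (1 - p)

        def simulated_score(p, trials=100_000):
            total = 0
            for _ in range(trials):
                while random.random() < p:   # pass each pipe independently with prob p
                    total += 1
            return total / trials

        p = 0.9
        print(expected_score(p))    # 9.0
        print(simulated_score(p))   # close to 9.0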
  • Vidio-Game Mainia!!!!
    VIDIO-GAME MAINIA!!!! - Find some things out about Minecraft that you didn't know before! There are also comics and some other stuff about video games!

    Table of Contents: Letter from the Editor (P.1-2); Letters to the Editor: Cheats for Minecraft (P.3), Video Game Rant (P.4); Some News: Super Mario (P.5), Video Games (P.6-7); Now Some Stories: Minecraft (P.8), Short Story (P.9); Things That May Interest You: Minecraft! (P.10), Wii U (P.11), Minecraft!! (P.12-13), Other Video Games (P.14)

    LETTER FROM THE EDITOR. Dear Readers, It's been an exciting month in the video gaming world. Like last week, there was a new video game that came out, "Unicorn Revolution." I'm not sure you boys out there would like to play it, but there are lots of girls who do. I know this because I have a little sister who enjoys playing it (while annoying me at the same time). Now for you boys, there is a new X-box controller out there that gives you high definition graphics for certain games. There is also a TV that's being designed only for video games. I'm not going to spoil the surprise by going into detail, but there is one thing I will say: there's an advertisement in this issue (you have to keep reading to find it) that describes it. The ad shows a glitch in the TV. So keep reading, guys, because in this month's articles there are some tips that I didn't even know about. Believe it or not, even though I am the editor, I didn't know half the stuff you guys sent in.
  • Brisio Innovations Inc
    BRISIO INNOVATIONS INC. FOR IMMEDIATE RELEASE. CSE: BZI

    BRISIO INNOVATIONS ANNOUNCES ACQUISITION OF HIT GAME, SPERMY'S JOURNEY, WITH OVER 4 MILLION DOWNLOADS AND OVER 1 MILLION INSTALLED USERS

    VANCOUVER, BC, April 2, 2014 – Brisio Innovations Inc. (CSE: BZI) (the "Company") is pleased to announce that it has purchased all rights, intellectual property and online assets associated with "Spermy's Journey, A Race To The Egg!", an Android and iOS game app that has been one of the most highly downloaded and played games on Android and iOS since its release earlier this year. The Company paid the vendor $130,000 USD as consideration for these assets.

    About Spermy's Journey: According to AppAnnie.com, an app store market data tracking website, Spermy's Journey reached the following daily ranks within the past month (March 2014):

    Google Play: Top 5 overall downloaded app in 2 countries; Top 10 overall downloaded app in 3 countries; #1 app in overall games in 3 countries; #1 app in arcade games in 9 countries; Top 5 overall game in 10 countries; Top 5 arcade game in 21 countries; Top 10 overall game in 17 countries; Top 10 arcade game in 26 countries.

    Apple App Store (iPhone): #1 overall downloaded app in 6 countries; Top 5 overall downloaded app in 17 countries; Top 10 overall app in 19 countries; #1 app in overall games in 8 countries; #1 app in strategy games in 21 countries; Top 5 overall game in 19 countries; Top 5 strategy game in 28 countries; Top 10 overall game in 21 countries; Top 10 strategy game in 40 countries.

    Since its launch on the Google Play store on February 7, 2014 and the Apple App Store on March 11, 2014, Spermy's Journey has been downloaded over 4 million times and was a featured review on DailyAppShow, which cites statements such as "The Next Flappy Bird", "Entertaining and funny", and "hilarious and highly addictive" (1).
  • City News and Activity Guide
    Concord City News and Activity Guide, www.cityofconcord.org, Summer 2018. Inside: Music & Market (pages 7-8), Sports Day Camps (pages 20-23), Camp Concord (pages 10-11), New BART plaza (page 8).

    Welcome to MCE, Concord! The City of Concord is proud to announce that starting April 1, Concord residents' and business owners' electric accounts were upgraded to 50% renewable energy (provided by MCE) at lower cost than PG&E. By joining MCE, Concord expects to reduce over 20,000 metric tons* of greenhouse gas emissions, which is similar to removing over 4,200 cars from the road for a year.** Together, we'll bring Concord approximately 20% closer to achieving its 2020 emissions reduction goals within the first year of MCE service.

    [Photo caption: Sustainable Contra Costa Board Member enjoying Mt. Diablo]

    About MCE. In operation since 2010, MCE is a local, not-for-profit, public agency that provides renewable energy service at low and stable rates to approximately 450,000 Bay Area customers. PG&E continues to provide the same reliable delivery and billing service you're used to, while MCE buys and builds more California renewables to power homes and businesses with solar, wind, bioenergy, hydroelectricity, and geothermal heat. MCE's Community Choice Energy programs also serve communities including Alameda, San Francisco, Sonoma, Silicon Valley, and the Peninsula.

    Electricity Choice. Residents and business owners don't need to do anything to receive MCE Light Green 50% renewable energy service. MCE is not an extra or new charge — it simply replaces PG&E service with lower cost and more renewable electricity. PG&E will continue to deliver your power, maintain power lines, and provide your gas service.

    Local Control
  • Keogh, Brendan. "Between Aliens, Hackers and Birds: Non-Casual Mobile Games and Casual Game Design." Social, Casual and Mobile Games: the Changing Gaming Landscape
    Keogh, Brendan. "Between aliens, hackers and birds: Non-casual mobile games and casual game design." Social, Casual and Mobile Games: The changing gaming landscape. Ed. Tama Leaver and Michele Willson. New York: Bloomsbury Academic, 2015. 31–46. Bloomsbury Collections. Web. 24 Sep. 2021. <http://dx.doi.org/10.5040/9781501310591.ch-003>. Downloaded from Bloomsbury Collections, www.bloomsburycollections.com, 24 September 2021, 04:10 UTC. Copyright © Tama Leaver, Michele Willson and Contributors 2016. You may share this work for non-commercial purposes only, provided you give attribution to the copyright holder and the publisher, and provide a link to the Creative Commons licence. 3 Between aliens, hackers and birds: Non-casual mobile games and casual game design B r e n d a n K e o g h he January 2012 issue of video game magazine Edge did not display a T high-defi nition render of an upcoming blockbuster game on its cover. It simply showed the increasingly ubiquitous logo of technology company Apple, creators of the iPhone. ‘Apple has changed the video game industry irrevo- cably’ the corresponding feature starts. ‘And the simple truth is that it has changed it without even really trying. It did it with a handheld device that has no buttons, no sticks and no ports for physical media, and it did it with a vir- tual storefront that was created, in the main, to revolutionize the way people bought music, not videogames’ ( Edge 2012, 77). As devices that traditionally brought together digital screens and buttons, that video games would appear on mobile phones in one fashion or another was inevitable.
  • Why Flappy Bird Is / Was So Successful and Popular but So Difficult to Learn: a Cognitive Teardown of the User Experience (UX)
    Why Flappy Bird Is / Was So Successful And Popular But So Difficult To Learn: A Cognitive Teardown Of The User Experience (UX). Charles Mauro (http://www.mauronewmedia.com/blog/author/cmauromauronewmedia-com/), February 12, 2014

    Promised simplicity, delivered deep complexity. One of the fascinating conundrums in the science of man-machine design is the human mind's complete inability to accurately assess the operational complexity of a given user interface by visual inspection. It turns out the human operating system is hardwired to judge the complexity of almost anything based on the visual/spatial arrangement of elements on whatever you are looking at. This is especially true for screen-based interfaces. Simply put, the more elements presented to a user on the screen, the higher the judged initial complexity. The opposite is also true: a very simple screen leads to an assumption of simplicity. The key point here is that the actual cognitive complexity of a given UX solution cannot be judged by visual inspection... nothing is actually further from the truth. Yet we do it every time we open a new app, visit a website, or load new software. In the course of consulting in the science of human factors engineering I have constantly run into this problem. It is so ubiquitous that I have called it the simplicity/complexity deceit (SCD). The problem is everywhere we turn today: in our software, in our automobiles, in our kitchen appliances, in our media systems, and yes, even in apps like Flappy Bird which we download for minimal cost or even free.
  • Hitting Reset: Devising a New Video Game Copyright Regime
    COMMENT. HITTING RESET: DEVISING A NEW VIDEO GAME COPYRIGHT REGIME. DREW S. DEAN†

    INTRODUCTION .......................................... 1240
    I. OVERVIEW OF THE VIDEO GAME INDUSTRY ................ 1242
       A. Mobile Games Industry ........................... 1243
       B. Cloning in the Mobile Gaming Space .............. 1249
    II. COPYRIGHT DOCTRINE AND VIDEO GAMES ................ 1251
       A. Copyrightable Subject Matter .................... 1252
       B. Idea–Expression Dichotomy in Video Games ........ 1253
       C. Limiting Doctrines .............................. 1255
       D. Copyright Infringement Tests .................... 1257
       E. Significant Case Law ............................ 1258
    III. SHIFTS IN CASE LAW ............................... 1264
       A. Tetris Holding, LLC v. Xio Interactive, Inc. .... 1264
       B. Spry Fox LLC v. LOLApps Inc. .................... 1267
       C. DaVinci Editrice S.R.L. v. ZiKo Games, LLC ...... 1269
       D. Summary and Conclusions ......................... 1273
    IV. THREES: A CASE STUDY IN CLONING ................... 1275
       A. Video Game Copyright, 1981–2012 ................. 1277
       B. Video
  • ANNUAL REPORT WISSAHICKON SCHOOL DISTRICT and WISSAHICKON EDUCATIONAL OPPORTUNITIES FOUNDATION (WEOF)
    ANNUAL REPORT: WISSAHICKON SCHOOL DISTRICT and WISSAHICKON EDUCATIONAL OPPORTUNITIES FOUNDATION (WEOF). IMPORTANT NEWS ABOUT OUR SCHOOLS & STUDENTS, 2015-2016 SCHOOL YEAR. REACHING HIGHER. ACHIEVING MORE.

    Greetings to the Wissahickon Community: The Wissahickon Administration, School Board and WEOF Board of Directors are pleased to partner this year to bring you the 2015-2016 Annual Report featuring highlights from the school district and the education foundation. This year was marked by tremendous progress as the school district and our education foundation reached higher to achieve more - in the classroom, on the field, in co-curricular activities and in the community. Our teachers and other staff, coaches, advisors and volunteers demonstrated in countless ways their commitment to providing our students an exceptional learning experience. In turn, our students demonstrated that they are not only learning essential skills in academic subject areas, but becoming responsible, well-rounded individuals - physically, socially and emotionally. We hope you enjoy reading about the great things happening every day at Wissahickon!

    50 Years of Excellence in Education. As we reflect on the many accomplishments of the past school year in this Annual Report, the Wissahickon School District will be kicking off the celebration of another special year - the 50th anniversary of the establishment of our district. In 1966, the district was organized under Pennsylvania Act 299 - Advanced Establishment - when the school districts of the Borough of Ambler and the Townships of Lower Gwynedd and Whitpain became the Wissahickon School District (some individual schools are older, such as Blue Bell Elementary, which turns 60 this year). In recognition of 50 years of excellence in education and beyond, we will be sharing the history of our schools with the community and highlighting the amazing achievements of our alumni and staff.
  • Score Distribution Analysis, Artificial Intelligence, and Player Modeling for Quantitative Game Design
    SCORE DISTRIBUTION ANALYSIS, ARTIFICIAL INTELLIGENCE, AND PLAYER MODELING FOR QUANTITATIVE GAME DESIGN. DISSERTATION Submitted in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY (Computer Science) at the NEW YORK UNIVERSITY TANDON SCHOOL OF ENGINEERING by Aaron Isaksen, May 2017.

    Approved by the Guidance Committee (Major: Computer Science): Andy Nealen, Assistant Professor of Computer Science, New York University, Tandon School of Engineering; Julian Togelius, Associate Professor of Computer Science, New York University, Tandon School of Engineering; Frank Lantz, Full Arts Professor and Director, New York University, NYU Game Center; Michael Mateas, Professor of Computational Media and Director, University of California at Santa Cruz Center for Games and Playable Media; Leonard McMillan, Associate Professor of Computer Science, University of North Carolina at Chapel Hill.

    Microfilm or copies of this dissertation may be obtained from: UMI Dissertation Publishing, ProQuest CSA, 789 E. Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106-1346.

    Vita: Aaron Mark Isaksen. Education: Ph.D. in Computer Science, Jan. 2014 - May 2017, New York University, Tandon School of Engineering, Game Innovation Lab. Pearl Brownstein Doctoral Research Award, 2017; Outstanding Performance on Ph.D. Qualifying Exam, D. Rosenthal, MD Award, 2015; Best paper in Artificial Intelligence and Game Technology, FDG 2015. M.S.