Deep Reinforcement Learning

Total Page:16

File Type:pdf, Size:1020Kb

Deep Reinforcement Learning DEEP REINFORCEMENT LEARNING Yuxi Li ([email protected]) ABSTRACT We discuss deep reinforcement learning in an overview style. We draw a big pic- ture, filled with details. We discuss six core elements, six important mechanisms, and twelve applications, focusing on contemporary work, and in historical con- texts. We start with background of artificial intelligence, machine learning, deep learning, and reinforcement learning (RL), with resources. Next we discuss RL core elements, including value function, policy, reward, model, exploration vs. exploitation, and representation. Then we discuss important mechanisms for RL, including attention and memory, unsupervised learning, hierarchical RL, multi- agent RL, relational RL, and learning to learn. After that, we discuss RL appli- cations, including games, robotics, natural language processing (NLP), computer vision, finance, business management, healthcare, education, energy, transporta- tion, computer systems, and, science, engineering, and art. Finally we summarize briefly, discuss challenges and opportunities, and close with an epilogue. 1 Keywords: deep reinforcement learning, deep RL, algorithm, architecture, ap- plication, artificial intelligence, machine learning, deep learning, reinforcement learning, value function, policy, reward, model, exploration vs. exploitation, representation, attention, memory, unsupervised learning, hierarchical RL, multi- agent RL, relational RL, learning to learn, games, robotics, computer vision, nat- ural language processing, finance, business management, healthcare, education, energy, transportation, computer systems, science, engineering, art arXiv:1810.06339v1 [cs.LG] 15 Oct 2018 1Work in progress. Under review for Morgan & Claypool: Synthesis Lectures in Artificial Intelligence and Machine Learning. This manuscript is based on our previous deep RL overview (Li, 2017). It benefits from discussions with and comments from many people. Acknowledgements will appear in a later version. 1 CONTENTS 1 Introduction5 2 Background8 2.1 Artificial Intelligence . .8 2.2 Machine Learning . 10 2.3 Deep Learning . 11 2.4 Reinforcement Learning . 12 2.4.1 Problem Setup . 13 2.4.2 Value Function . 13 2.4.3 Exploration vs. Exploitation . 14 2.4.4 Dynamic Programming . 14 2.4.5 Monte Carlo . 15 2.4.6 Temporal Difference Learning . 16 2.4.7 Multi-step Bootstrapping . 17 2.4.8 Model-based RL . 18 2.4.9 Function Approximation . 18 2.4.10 Policy Optimization . 20 2.4.11 Deep RL . 21 2.4.12 Brief Summary . 22 2.5 Resources . 22 Part I: Core Elements 23 3 Value Function 24 3.1 Deep Q-Learning . 24 3.2 Distributional Value Function . 27 3.3 General Value Function . 28 4 Policy 30 4.1 Policy Gradient . 31 4.2 Actor-Critic . 32 4.3 Trust Region Methods . 33 4.4 Policy Gradient with Off-Policy Learning . 34 4.5 Benchmark Results . 35 5 Reward 36 6 Model 39 2 7 Exploration vs. Exploitation 42 8 Representation 46 8.1 Classics . 46 8.2 Neural Networks . 48 8.3 Knowledge and Reasoning . 49 Part II: Important Mechanisms 53 9 Attention and Memory 54 10 Unsupervised Learning 56 11 Hierarchical RL 59 12 Multi-Agent RL 62 13 Relational RL 66 14 Learning to Learn 68 14.1 Few/One/Zero-Shot Learning . 69 14.2 Transfer/Multi-task Learning . 70 14.3 Learning to Optimize . 71 14.4 Learning Reinforcement Learn . 71 14.5 Learning Combinatorial Optimization . 72 14.6 AutoML . 73 Part III: Applications 75 15 Games 76 15.1 Board Games . 76 15.2 Card Games . 80 15.3 Video Games . 81 16 Robotics 82 16.1 Sim-to-Real . 82 16.2 Imitation Learning . 83 16.3 Value-based Learning . 83 16.4 Policy-based Learning . 84 16.5 Model-based Learning . 84 16.6 Autonomous Driving Vehicles . 84 17 Natural Language Processing 86 3 17.1 Sequence Generation . 87 17.2 Machine Translation . 87 17.3 Dialogue Systems . 88 18 Computer Vision 90 18.1 Recognition . 90 18.2 Motion Analysis . 91 18.3 Scene Understanding . 91 18.4 Integration with NLP . 91 18.5 Visual Control . 92 18.6 Interactive Perception . 92 19 Finance and Business Management 93 19.1 Option Pricing . 94 19.2 Portfolio Optimization . 94 19.3 Business Management . 95 20 More Applications 96 20.1 Healthcare . 96 20.2 Education . 96 20.3 Energy . 97 20.4 Transportation . 98 20.5 Computer Systems . 98 20.6 Science, Engineering and Art . 100 20.6.1 Chemistry . 101 20.6.2 Mathematics . 101 20.6.3 Music . 101 21 Discussions 103 21.1 Brief Summary . 103 21.2 Challenges and Opportunities . 103 21.3 Epilogue . 109 4 1 INTRODUCTION Reinforcement learning (RL) is about an agent interacting with the environment, learning an optimal policy, by trial and error, for sequential decision making problems, in a wide range of fields in natural sciences, social sciences, and engineering (Sutton and Barto, 1998; 2018; Bertsekas and Tsitsiklis, 1996; Bertsekas, 2012; Szepesvari´ , 2010; Powell, 2011). The integration of reinforcement learning and neural networks has a long history (Sutton and Barto, 2018; Bertsekas and Tsitsiklis, 1996; Schmidhuber, 2015). With recent exciting achievements of deep learning (LeCun et al., 2015; Goodfellow et al., 2016), benefiting from big data, powerful computation, new algorithmic techniques, mature software packages and architectures, and strong financial support, we have been witnessing the renaissance of reinforcement learning (Krakovsky, 2016), especially, the combination of deep neural networks and reinforcement learning, i.e., deep reinforcement learning (deep RL).2 Deep learning, or deep neural networks, has been prevailing in reinforcement learning in the last several years, in games, robotics, natural language processing, etc. We have been witnessing break- throughs, like deep Q-network (DQN) (Mnih et al., 2015), AlphaGo (Silver et al., 2016a; 2017), and DeepStack (Moravcˇ´ık et al., 2017); each of them represents a big family of problems and large number of applications. DQN (Mnih et al., 2015) is for single player games, and single agent con- trol in general. DQN has implications for most algorithms and applications in RL. DQN ignites this round of popularity of deep reinforcement learning. AlphaGo (Silver et al., 2016a; 2017) is for two player perfect information zero-sum games. AlphaGo makes a phenomenal achievement on a very hard problem, and sets a landmark in AI. The success of AlphaGo influences similar games directly, and Alpha Zero (Silver et al., 2017) has already achieved significant successes on chess and Shogi. The techniques underlying AlphaGo (Silver et al., 2016a) and AlphaGo Zero (Silver et al., 2017), namely, deep learning, reinforcement learning, Monte Carlo tree search (MCTS), and self-play, will have wider and further implications and applications. As recommended by AlphaGo authors in their papers (Silver et al., 2016a; 2017), the following applications are worth further inves- tigation: general game-playing (in particular, video games), classical planning, partially observed planning, scheduling, constraint satisfaction, robotics, industrial control, online recommendation systems, protein folding, reducing energy consumption, and searching for revolutionary new mate- rials. DeepStack (Moravcˇ´ık et al., 2017) is for two player imperfect information zero-sum games, a family of problems with inherent hardness to solve. DeepStack, similar to AlphaGo, also makes an extraordinary achievement on a hard problem, and sets a milestone in AI. It will have rich impli- cations and wide applications, e.g., in defending strategic resources and robust decision making for medical treatment recommendations (Moravcˇ´ık et al., 2017). We also see novel algorithms, architectures, and applications, like asynchronous methods (Mnih et al., 2016), trust region methods (Schulman et al., 2015; Schulman et al., 2017b; Nachum et al., 2018), deterministic policy gradient (Silver et al., 2014; Lillicrap et al., 2016), combining policy gradient with off-policy RL (Nachum et al., 2017; 2018; Haarnoja et al., 2018), interpolated policy gradient (Gu et al., 2017), unsupervised reinforcement and auxiliary learning (Jaderberg et al., 2017; Mirowski et al., 2017), hindsight experience replay (Andrychowicz et al., 2017), differentiable neu- ral computer (Graves et al., 2016), neural architecture design (Zoph and Le, 2017), guided policy search (Levine et al., 2016), generative adversarial imitation learning (Ho and Ermon, 2016), multi- agent games (Jaderberg et al., 2018), hard exploration Atari games (Aytar et al., 2018), StarCraft II Sun et al.(2018); Pang et al.(2018), chemical syntheses planning (Segler et al., 2018), character animation (Peng et al., 2018a), dexterous robots (OpenAI, 2018), OpenAI Five for Dota 2, etc. Creativity would push the frontiers of deep RL further w.r.t. core elements, important mechanisms, and applications, seemingly without a boundary. RL probably helps, if a problem can be regarded as or transformed into a sequential decision making problem. Why has deep learning been helping reinforcement learning make so many and so enormous achieve- ments? Representation learning with deep learning enables automatic feature engineering and end- to-end learning through gradient descent, so that reliance on domain knowledge is significantly 2We choose to abbreviate deep reinforcement learning as ”deep RL”, since it is a branch of reinforcement learning, in particular, using deep neural networks in reinforcement learning, and ”RL” is a well established abbreviation for reinforcement learning. 5 reduced or even removed for some problems. Feature engineering used to be done manually and is usually time-consuming, over-specified, and incomplete. Deep distributed representations exploit the.
Recommended publications
  • Procedural Content Generation for Games
    Procedural Content Generation for Games Inauguraldissertation zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften der Universit¨atMannheim vorgelegt von M.Sc. Jonas Freiknecht aus Hannover Mannheim, 2020 Dekan: Dr. Bernd L¨ubcke, Universit¨atMannheim Referent: Prof. Dr. Wolfgang Effelsberg, Universit¨atMannheim Korreferent: Prof. Dr. Colin Atkinson, Universit¨atMannheim Tag der m¨undlichen Pr¨ufung: 12. Februar 2021 Danksagungen Nach einer solchen Arbeit ist es nicht leicht, alle Menschen aufzuz¨ahlen,die mich direkt oder indirekt unterst¨utzthaben. Ich versuche es dennoch. Allen voran m¨ochte ich meinem Doktorvater Prof. Wolfgang Effelsberg danken, der mir - ohne mich vorher als Master-Studenten gekannt zu haben - die Promotion an seinem Lehrstuhl erm¨oglichte und mit Geduld, Empathie und nicht zuletzt einem mir unbegreiflichen Verst¨andnisf¨ur meine verschiedenen Ausfl¨ugein die Weiten der Informatik unterst¨utzthat. Sie werden mir nicht glauben, wie dankbar ich Ihnen bin. Weiterhin m¨ochte ich meinem damaligen Studiengangsleiter Herrn Prof. Heinz J¨urgen M¨ullerdanken, der vor acht Jahren den Kontakt zur Universit¨atMannheim herstellte und mich ¨uberhaupt erst in die richtige Richtung wies, um mein Promotionsvorhaben anzugehen. Auch Herr Prof. Peter Henning soll nicht ungenannt bleiben, der mich - auch wenn es ihm vielleicht gar nicht bewusst ist - davon ¨uberzeugt hat, dass die Erzeugung virtueller Welten ein lohnenswertes Promotionsthema ist. Ganz besonderer Dank gilt meiner Frau Sarah und meinen beiden Kindern Justus und Elisa, die viele Abende und Wochenenden zugunsten dieser Arbeit auf meine Gesellschaft verzichten mussten. Jetzt ist es geschafft, das n¨achste Projekt ist dann wohl der Garten! Ebenfalls geb¨uhrt meinen Eltern und meinen Geschwistern Dank.
    [Show full text]
  • Black Box Explanation by Learning Image Exemplars in the Latent Feature Space
    Black Box Explanation by Learning Image Exemplars in the Latent Feature Space Riccardo Guidotti1, Anna Monreale2, Stan Matwin3;4, and Dino Pedreschi2 1 ISTI-CNR, Pisa, Italy, [email protected] 2 University of Pisa, Italy, [email protected] 3 Dalhousie University, [email protected] 4 Institute of Computer Scicne, Polish Academy of Sciences Abstract. We present an approach to explain the decisions of black box models for image classification. While using the black box to label im- ages, our explanation method exploits the latent feature space learned through an adversarial autoencoder. The proposed method first gener- ates exemplar images in the latent feature space and learns a decision tree classifier. Then, it selects and decodes exemplars respecting local decision rules. Finally, it visualizes them in a manner that shows to the user how the exemplars can be modified to either stay within their class, or to be- come counter-factuals by \morphing" into another class. Since we focus on black box decision systems for image classification, the explanation obtained from the exemplars also provides a saliency map highlighting the areas of the image that contribute to its classification, and areas of the image that push it into another class. We present the results of an experimental evaluation on three datasets and two black box models. Be- sides providing the most useful and interpretable explanations, we show that the proposed method outperforms existing explainers in terms of fidelity, relevance, coherence, and stability. Keywords: Explainable AI, Adversarial Autoencoder, Image Exemplars. 1 Introduction Automated decision systems based on machine learning techniques are widely used for classification, recognition and prediction tasks.
    [Show full text]
  • Context-Aware Learning to Rank with Self-Attention
    Context-Aware Learning to Rank with Self-Attention Przemysław Pobrotyn Tomasz Bartczak Mikołaj Synowiec ML Research at Allegro.pl ML Research at Allegro.pl ML Research at Allegro.pl [email protected] [email protected] [email protected] Radosław Białobrzeski Jarosław Bojar ML Research at Allegro.pl ML Research at Allegro.pl [email protected] [email protected] ABSTRACT engines, recommender systems, question-answering systems, and Learning to rank is a key component of many e-commerce search others. engines. In learning to rank, one is interested in optimising the A typical machine learning solution to the LTR problem involves global ordering of a list of items according to their utility for users. learning a scoring function, which assigns real-valued scores to Popular approaches learn a scoring function that scores items in- each item of a given list, based on a dataset of item features and dividually (i.e. without the context of other items in the list) by human-curated or implicit (e.g. clickthrough logs) relevance labels. optimising a pointwise, pairwise or listwise loss. The list is then Items are then sorted in the descending order of scores [23]. Perfor- sorted in the descending order of the scores. Possible interactions mance of the trained scoring function is usually evaluated using an between items present in the same list are taken into account in the IR metric like Mean Reciprocal Rank (MRR) [33], Normalised Dis- training phase at the loss level. However, during inference, items counted Cumulative Gain (NDCG) [19] or Mean Average Precision are scored individually, and possible interactions between them are (MAP) [4].
    [Show full text]
  • An In-Depth Exploration of the Effect of 2D/3D Views and Controller Types on First Person Shooter Games in Virtual Reality
    An In-Depth Exploration of the Effect of 2D/3D Views and Controller Types on First Person Shooter Games in Virtual Reality Diego Monteiro Hai-Ning Liang¹ Jialin Wang Hao Chen Nilufar Baghaei Xi'an Jiaotong- Xi'an Jiaotong- Xi'an Jiaotong- Xi'an Jiaotong- Massey University Liverpool University Liverpool University Liverpool University Liverpool University they affect immersion [5]. Similarly, the effects of viewing types ABSTRACT on Simulator Sickness (SS) during fast-paced applications, such as The amount of interest in Virtual Reality (VR) research has First-Person Shooter (FPS) games [6]–[9], are also underexplored. significantly increased over the past few years, both in academia Players pay attention to several factors during gameplay [10], and industry. The release of commercial VR Head-Mounted [11]. In VR, the level of playability and immersion are important Displays (HMDs) has been a major contributing factor. However, considerations because they affect enjoyment and performance, especially while players are (or are not) feeling SS symptoms. there is still much to be learned, especially how views and input Playability and immersion can be affected by how the VR techniques, as well as their interaction, affect the VR experience. environment is controlled [12]–[14]. Nevertheless, we must note There is little work done on First-Person Shooter (FPS) games in that research shows that immersive technologies can hinder VR, and those few studies have focused on a single aspect of VR performance when compared to a traditional setup (i.e., monitor, FPS. They either focused on the view, e.g., comparing VR to a keyboard, and mouse) [15], [16].
    [Show full text]
  • On the Boosting Ability of Top-Down Decision Tree Learning Algorithms
    On the Bo osting AbilityofTop-Down Decision Tree Learning Algorithms Michael Kearns Yishay Mansour AT&T Research Tel-Aviv University May 1996 Abstract We analyze the p erformance of top-down algorithms for decision tree learning, such as those employed by the widely used C4.5 and CART software packages. Our main result is a pro of that such algorithms are boosting algorithms. By this we mean that if the functions that lab el the internal no des of the decision tree can weakly approximate the unknown target function, then the top-down algorithms we study will amplify this weak advantage to build a tree achieving any desired level of accuracy. The b ounds we obtain for this ampli catio n showaninteresting dep endence on the splitting criterion used by the top-down algorithm. More precisely, if the functions used to lab el the internal no des have error 1=2 as approximation s to the target function, then for the splitting criteria used by CART and C4.5, trees 2 2 2 O 1= O log 1== of size 1= and 1= resp ectively suce to drive the error b elow .Thus for example, a small constant advantage over random guessing is ampli ed to any larger constant advantage with trees of constant size. For a new splitting criterion suggested by our analysis, the much stronger 2 O 1= b ound of 1= which is p olynomial in 1= is obtained, whichisprovably optimal for decision tree algorithms. The di ering b ounds have a natural explanation in terms of concavity prop erties of the splitting criterion.
    [Show full text]
  • Boosting with Multi-Way Branching in Decision Trees
    Boosting with Multi-Way Branching in Decision Trees Yishay Mansour David McAllester AT&T Labs-Research 180 Park Ave Florham Park NJ 07932 {mansour, dmac }@research.att.com Abstract It is known that decision tree learning can be viewed as a form of boosting. However, existing boosting theorems for decision tree learning allow only binary-branching trees and the generalization to multi-branching trees is not immediate. Practical decision tree al­ gorithms, such as CART and C4.5, implement a trade-off between the number of branches and the improvement in tree quality as measured by an index function. Here we give a boosting justifica­ tion for a particular quantitative trade-off curve. Our main theorem states, in essence, that if we require an improvement proportional to the log of the number of branches then top-down greedy con­ struction of decision trees remains an effective boosting algorithm. 1 Introduction Decision trees have been proved to be a very popular tool in experimental machine learning. Their popularity stems from two basic features - they can be constructed quickly and they seem to achieve low error rates in practice. In some cases the time required for tree growth scales linearly with the sample size. Efficient tree construction allows for very large data sets. On the other hand, although there are known theoretical handicaps of the decision tree representations, it seem that in practice they achieve accuracy which is comparable to other learning paradigms such as neural networks. While decision tree learning algorithms are popular in practice it seems hard to quantify their success ,in a theoretical model.
    [Show full text]
  • An Introduction to Neural Information Retrieval
    The version of record is available at: http://dx.doi.org/10.1561/1500000061 An Introduction to Neural Information Retrieval Bhaskar Mitra1 and Nick Craswell2 1Microsoft, University College London; [email protected] 2Microsoft; [email protected] ABSTRACT Neural ranking models for information retrieval (IR) use shallow or deep neural networks to rank search results in response to a query. Tradi- tional learning to rank models employ supervised machine learning (ML) techniques—including neural networks—over hand-crafted IR features. By contrast, more recently proposed neural models learn representations of language from raw text that can bridge the gap between query and docu- ment vocabulary. Unlike classical learning to rank models and non-neural approaches to IR, these new ML techniques are data-hungry, requiring large scale training data before they can be deployed. This tutorial intro- duces basic concepts and intuitions behind neural IR models, and places them in the context of classical non-neural approaches to IR. We begin by introducing fundamental concepts of retrieval and different neural and non-neural approaches to unsupervised learning of vector representations of text. We then review IR methods that employ these pre-trained neural vector representations without learning the IR task end-to-end. We intro- duce the Learning to Rank (LTR) framework next, discussing standard loss functions for ranking. We follow that with an overview of deep neural networks (DNNs), including standard architectures and implementations. Finally, we review supervised neural learning to rank models, including re- cent DNN architectures trained end-to-end for ranking tasks. We conclude with a discussion on potential future directions for neural IR.
    [Show full text]
  • Towards the Implementation of an Intelligent Software Agent for the Elderly Amir Hossein Faghih Dinevari
    Towards the Implementation of an Intelligent Software Agent for the Elderly by Amir Hossein Faghih Dinevari A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science Department of Computing Science University of Alberta c Amir Hossein Faghih Dinevari, 2017 Abstract With the growing population of the elderly and the decline of population growth rate, developed countries are facing problems in taking care of their elderly. One of the issues that is becoming more severe is the issue of compan- ionship for the aged people, particularly those who chose to live independently. In order to assist the elderly, we suggest the idea of a software conversational intelligent agent as a companion and assistant. In this work, we look into the different components that are necessary for creating a personal conversational agent. We have a preliminary implementa- tion of each component. Among them, we have a personalized knowledge base which is populated by the extracted information from the conversations be- tween the user and the agent. We believe that having a personalized knowledge base helps the agent in having better, more fluent and contextual conversa- tions. We created a prototype system and conducted a preliminary evaluation to assess by users conversations of an agent with and without a personalized knowledge base. ii Table of Contents 1 Introduction 1 1.1 Motivation . 1 1.1.1 Population Trends . 1 1.1.2 Living Options for the Elderly . 2 1.1.3 Companionship . 3 1.1.4 Current Technologies . 4 1.2 Proposed System . 5 1.2.1 Personal Assistant Functionalities .
    [Show full text]
  • Intellibot: a Domain-Specific Chatbot for the Insurance Industry
    IntelliBot: A Domain-specific Chatbot for the Insurance Industry MOHAMMAD NURUZZAMAN A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy UNSW Canberra at Australia Defence Force Academy (ADFA) School of Business 20 October 2020 ORIGINALITY STATEMENT ‘I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institute, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project’s design and conception or in style, presentation and linguistic expression is acknowledged.’ Signed Date To my beloved parents Acknowledgement Writing a thesis is a great process to review not only my academic work but also the journey I took as a PhD student. I have spent four lovely years at UNSW Canberra in the Australian Defence Force Academy (ADFA). Throughout my journey in graduate school, I have been fortunate to come across so many brilliant researchers and genuine friends. It is the people who I met shaped who I am today. This thesis would not have been possible without them. My gratitude goes out to all of them.
    [Show full text]
  • Full Academic Cv: Grigori Fursin, Phd
    FULL ACADEMIC CV: GRIGORI FURSIN, PHD Current position VP of MLOps at OctoML.ai (USA) Languages English (British citizen); French (spoken, intermediate); Russian (native) Address Paris region, France Education PhD in computer science with the ORS award from the University of Edinburgh (2004) Website cKnowledge.io/@gfursin LinkedIn linkedin.com/in/grigorifursin Publications scholar.google.com/citations?user=IwcnpkwAAAAJ (H‐index: 25) Personal e‐mail [email protected] I am a computer scientist, engineer, educator and business executive with an interdisciplinary background in computer engineering, machine learning, physics and electronics. I am passionate about designing efficient systems in terms of speed, energy, accuracy and various costs, bringing deep tech to the real world, teaching, enabling reproducible research and sup‐ porting open science. MY ACADEMIC RESEARCH (TENURED RESEARCH SCIENTIST AT INRIA WITH PHD IN CS FROM THE UNIVERSITY OF EDINBURGH) • I was among the first researchers to combine machine learning, autotuning and knowledge sharing to automate and accelerate the development of efficient software and hardware by several orders of magnitudeGoogle ( scholar); • developed open‐source tools and started educational initiatives (ACM, Raspberry Pi foundation) to bring this research to the real world (see use cases); • prepared and tought M.S. course at Paris‐Saclay University on using ML to co‐design efficient software and hardare (self‐optimizing computing systems); • gave 100+ invited research talks; • honored to receive the
    [Show full text]
  • Delivering a Machine Learning Course on HPC Resources
    Delivering a machine learning course on HPC resources Stefano Bagnasco, Federica Legger, Sara Vallero This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement LHCBIGDATA No 799062 The course ● Title: Big Data Science and Machine Learning ● Graduate Program in Physics at University of Torino ● Academic year 2018-2019: ○ Starts in 2 weeks ○ 2 CFU, 10 hours (theory+hands-on) ○ 7 registered students ● Academic year 2019-2020: ○ March 2020 ○ 4 CFU, 16 hours (theory+hands-on) ○ Already 2 registered students 2 The Program ● Introduction to big data science ○ The big data pipeline: state-of-the-art tools and technologies ● ML and DL methods: ○ supervised and unsupervised models, ○ neural networks ● Introduction to computer architecture and parallel computing patterns ○ Initiation to OpenMP and MPI (2019-2020) ● Parallelisation of ML algorithms on distributed resources ○ ML applications on distributed architectures ○ Beyond CPUs: GPUs, FPGAs (2019-2020) 3 The aim ● Applied ML course: ○ Many courses on advanced statistical methods available elsewhere ○ Focus on hands-on sessions ● Students will ○ Familiarise with: ■ ML methods and libraries ■ Analysis tools ■ Collaborative models ■ Container and cloud technologies ○ Learn how to ■ Optimise ML models ■ Tune distributed training ■ Work with available resources 4 Hands-on ● Python with Jupyter notebooks ● Prerequisites: some familiarity with numpy and pandas ● ML libraries ○ Day 2: MLlib ■ Gradient Boosting Trees GBT ■ Multilayer Perceptron Classifier MCP ○ Day 3: Keras ■ Sequential model ○ Day 4: bigDL ■ Sequential model ● Coming: ○ CUDA ○ MPI ○ OpenMP 5 ML Input Dataset for hands on ● Open HEP dataset @UCI, 7GB (.csv) ● Signal (heavy Higgs) + background ● 10M MC events (balanced, 50%:50%) ○ 21 low level features ■ pt’s, angles, MET, b-tag, … Signal ○ 7 high level features ■ Invariant masses (m(jj), m(jjj), …) Background: ttbar Baldi, Sadowski, and Whiteson.
    [Show full text]
  • Self-Training Wavenet for TTS in Low-Data Regimes
    StrawNet: Self-Training WaveNet for TTS in Low-Data Regimes Manish Sharma, Tom Kenter, Rob Clark Google UK fskmanish, tomkenter, [email protected] Abstract is increased. However, it can be seen from their results that the quality degrades when the number of recordings is further Recently, WaveNet has become a popular choice of neural net- decreased. work to synthesize speech audio. Autoregressive WaveNet is To reduce the voice artefacts observed in WaveNet stu- capable of producing high-fidelity audio, but is too slow for dent models trained under a low-data regime, we aim to lever- real-time synthesis. As a remedy, Parallel WaveNet was pro- age both the high-fidelity audio produced by an autoregressive posed, which can produce audio faster than real time through WaveNet, and the faster-than-real-time synthesis capability of distillation of an autoregressive teacher into a feedforward stu- a Parallel WaveNet. We propose a training paradigm, called dent network. A shortcoming of this approach, however, is that StrawNet, which stands for “Self-Training WaveNet”. The key a large amount of recorded speech data is required to produce contribution lies in using high-fidelity speech samples produced high-quality student models, and this data is not always avail- by an autoregressive WaveNet to self-train first a new autore- able. In this paper, we propose StrawNet: a self-training ap- gressive WaveNet and then a Parallel WaveNet model. We refer proach to train a Parallel WaveNet. Self-training is performed to models distilled this way as StrawNet student models. using the synthetic examples generated by the autoregressive We evaluate StrawNet by comparing it to a baseline WaveNet teacher.
    [Show full text]