Deep Learning

Kairit Sirts, lecture at TUT, 19.12.2016

Outline
• What can be done with deep learning?
• Deep learning demystified
• How can you get started with deep learning?

Why deep learning?
• (Chart from the link below comparing deep learning with gradient boosting, random forests, and a linear model)
http://www.infoworld.com/article/3003315/big-data/deep-learning-a-brief-guide-for-practical-problem-solvers.html

What can be done with deep learning?

Handwritten digit recognition
• MNIST benchmark dataset
• The best reported error rate is 0.21%

Street view number recognition
• Obtained from house numbers in Google Street View images
• Best error rate is 1.69%

Image classification
• 10 objects, 6000 labeled instances for each object
• Best accuracy so far 96.53%

Image classification (fine-grained)
• 20 superclasses, 100 fine-grained classes, 600 labeled images per class
• Best classification accuracy 75.72%

Detecting doodles
• https://quickdraw.withgoogle.com
• There are other simple and fun AI experiments launched by Google: https://aiexperiments.withgoogle.com

Image captioning
• Works well on some images; results are not so great on others

Automatic colorization of images
• Sometimes works impressively, sometimes fails
http://richzhang.github.io/colorization/resources/images/teaser3.jpg

DeepDream
https://deepdreamgenerator.com

Word embeddings
http://metaoptimize.s3.amazonaws.com/cw-embeddings-ACL2010/embeddings-mostcommon.EMBEDDING_SIZE=50.png
• Nearby embeddings form clusters of related words: months, weekdays, numbers
• v(man) − v(woman) ≈ v(king) − v(queen)
• v(walking) − v(walked) ≈ v(swimming) − v(swam)

Automatic text generation – pseudo Shakespeare
http://karpathy.github.io/2015/05/21/rnn-effectiveness

Machine translation
• Google Translate app

Learning to play Atari arcade games
https://www.youtube.com/watch?v=cjpEIotvwFY

AlphaGo
https://www.youtube.com/watch?v=PQCrX1sQSzY

Other tasks tackled with deep neural networks
• Speech recognition
• Various tasks in robotics
• Log analysis/risk detection
• Recommendation systems
• Motion detection from videos
• Business and economics analytics
• Etc.

Deep learning demystified

How does deep learning work?
• Biological neuron vs. artificial neuron
• Biological neural network vs. artificial neural network
http://www.theprojectspot.com/tutorial-post/introduction-to-artificial-neural-networks-part-1/7
https://www.eeweb.com/blog/rob_riemen/deep-machine-learning-and-the-google-brain

What happens inside a neuron?
• Net input: $z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n = \sum_{i=1}^{n} w_i x_i$
• Output: $h = f(z)$

Activation function
• Threshold: $f(z) = 1$ if $z \ge \mathrm{th}$, $0$ if $z < \mathrm{th}$
• Sigmoid: $f(z) = \frac{1}{1 + e^{-z}}$
• Tanh: $f(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$
• ReLU: $f(z) = \max(0, z)$
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/neural_networks.html
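To make the neuron computation concrete, here is a minimal Python/NumPy sketch of a single artificial neuron: a weighted sum of the inputs followed by one of the activation functions above. The input and weight values are made up for illustration.

```python
import numpy as np

def step(z, th=0.0):
    return 1.0 if z >= th else 0.0       # threshold activation

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))      # squashes z into (0, 1)

def relu(z):
    return max(0.0, z)                    # rectified linear unit

x = np.array([0.5, -1.0, 2.0])           # inputs x1..xn (illustrative values)
w = np.array([0.1, 0.4, -0.2])           # weights w1..wn (illustrative values)

z = np.dot(w, x)                          # net input: sum_i w_i * x_i
print("net input:", z)
print("step:", step(z), "sigmoid:", sigmoid(z), "tanh:", np.tanh(z), "relu:", relu(z))
```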
Single neuron logic gates
• With a threshold activation function, a single neuron can implement simple logic gates
https://blog.abhranil.net/2015/03/03/training-neural-networks-with-genetic-algorithms/

XOR gate
• Cannot be done with a single neuron
• A hidden layer is necessary: an OR neuron and a NAND neuron feed into an AND neuron
• (0,0): OR: $f(0 \cdot 1 + 0 \cdot 1 > 0.5) = 0$; NAND: $f(0 \cdot (-1) + 0 \cdot (-1) > -1.5) = 1$; AND of the two: $f(0 \cdot 1 + 1 \cdot 1 > 1.5) = 0$
• (0,1): OR: $f(0 \cdot 1 + 1 \cdot 1 > 0.5) = 1$; NAND: $f(0 \cdot (-1) + 1 \cdot (-1) > -1.5) = 1$; AND: $f(1 \cdot 1 + 1 \cdot 1 > 1.5) = 1$
• (1,0): OR: $f(1 \cdot 1 + 0 \cdot 1 > 0.5) = 1$; NAND: $f(1 \cdot (-1) + 0 \cdot (-1) > -1.5) = 1$; AND: $f(1 \cdot 1 + 1 \cdot 1 > 1.5) = 1$
• (1,1): OR: $f(1 \cdot 1 + 1 \cdot 1 > 0.5) = 1$; NAND: $f(1 \cdot (-1) + 1 \cdot (-1) > -1.5) = 0$; AND: $f(1 \cdot 1 + 0 \cdot 1 > 1.5) = 0$
https://blog.abhranil.net/2015/03/03/training-neural-networks-with-genetic-algorithms/

How to assign weights?
• Even a small network has many weights: $8 \cdot 9 + 9 \cdot 9 + 9 \cdot 9 + 9 \cdot 4 = 270$ weights
http://neuralnetworksanddeeplearning.com/

Backpropagation
• Standard and efficient method for training neural networks
• The general idea:
• Compute the error with a forward pass
• Propagate the error back to change the weights such that the error becomes smaller (ERROR → ERROR′, with ERROR′ < ERROR)

Diversion to calculus – derivative
• $y' = f'(x)$
• The derivative is the slope of the tangent line
• It is the rate of change when going in the direction of steepest ascent

Derivatives
• When $f'(x) = 0$, the point is a local or global maximum or minimum, or a saddle point
• When $f'(x) > 0$, the function is increasing
• When $f'(x) < 0$, the function is decreasing

Gradients
• Generalization of derivatives to multivariate functions
• The gradient is a vector pointing in the direction of steepest ascent
• $\nabla f(x, y) = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right)$
• $\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}$ are partial derivatives: take the derivative with respect to one variable while treating all others as constants

Gradients and backpropagation
• Backpropagation is used to compute the gradients with respect to all parameters in a neural network
• The gradients are then used in gradient descent, a general method for minimizing functions
• We want to minimize the cost function that measures the error made by the neural network
• To do that, we move in the direction of steepest descent given by the negative gradient

Gradient descent
• An iterative algorithm
• Start with initial parameter values $\theta_0$
• Update the parameters iteratively until convergence: $\theta_{t+1} := \theta_t - \eta \nabla J(\theta_t)$
• $\eta$ is the learning rate; it controls the step size
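As a concrete illustration of the update rule above, here is a minimal NumPy sketch of gradient descent on a simple two-variable function whose gradient can be written by hand. The function, starting point, learning rate, and step count are illustrative choices, not from the slides.

```python
import numpy as np

def J(theta):
    # cost function J(x, y) = (x - 3)^2 + (y + 1)^2, minimized at (3, -1)
    x, y = theta
    return (x - 3) ** 2 + (y + 1) ** 2

def grad_J(theta):
    # partial derivatives dJ/dx and dJ/dy
    x, y = theta
    return np.array([2 * (x - 3), 2 * (y + 1)])

theta = np.array([0.0, 0.0])   # initial parameter values theta_0
eta = 0.1                      # learning rate, controls the step size

for step in range(100):
    theta = theta - eta * grad_J(theta)   # theta_{t+1} := theta_t - eta * grad J(theta_t)

print(theta, J(theta))   # theta approaches the minimum at (3, -1)
```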
How does backpropagation work?

Backpropagation explained
• Example from: https://mattmazur.com/2015/03/17/
• 2 inputs
• 1 hidden layer with 2 neurons
• Bias terms in both the hidden and output layer
• 2 outputs

Initial configuration
• Training values
• Initial weights: $w_1, \dots, w_8$
• Initial biases: $b_1, b_2$

Forward pass
• First hidden unit
• Second hidden unit
• First output unit
• Second output unit
• Error of the first output
• Total output error

Backwards pass
• Consider $w_5$: how much does a change in $w_5$ affect the total error?
• Apply the chain rule

Chain rule
• Formula for computing the derivative of the composition of two or more functions
• $h(x) \equiv f(g(x)) \equiv (f \circ g)(x)$ – composition of functions $f$ and $g$
• $h'(x) = f'(g(x)) \, g'(x)$
• Example: $h(x) = e^{3x}$, with $g(x) = 3x$ and $f(g(x)) = e^{g(x)} = e^{3x}$
• $h'(x) = f'(g(x)) \, g'(x) = e^{g(x)} (3x)' = e^{3x} \cdot 3 = 3e^{3x}$

Backwards pass, step by step
• How much does the error change wrt the output?
• How much does the output change wrt its net input? This needs the derivative of the sigmoid function: $\sigma(z) = \frac{1}{1 + e^{-z}}$, $\sigma'(z) = \sigma(z)(1 - \sigma(z))$
• How much does the net input change wrt $w_5$?
• Putting it all together

This is known as the delta rule
• The delta rule is the gradient descent rule for updating the weights of the inputs to neurons in a single-layer neural network
• Apply the delta rule to the output layer weights

Update the weights with gradient descent
• Set the learning rate $\eta = 0.5$
• $w^{\mathrm{new}} := w - \eta \frac{\partial E}{\partial w}$

Backpropagation to the hidden layer
• Continue the backwards pass to calculate new values for $w_1$, $w_2$, $w_3$ and $w_4$
• $out_{h1}$ affects both outputs, so both error terms need to be taken into account
• For each of them, the first term can be calculated from values computed before; the second term is just the corresponding output-layer weight (e.g. $w_5$)
• Next we need $\frac{\partial out_{h1}}{\partial net_{h1}}$ and $\frac{\partial net_{h1}}{\partial w}$ for each weight $w$
• Compute the partial derivative wrt the weight, put it all together, and update $w_1$
• Compute the partial derivatives in the same way for $w_2$, $w_3$ and $w_4$, and update them

After the first update with backpropagation: did the error decrease?
• The old error was 0.298371109; the improvement after one update is 0.007343335
• After 10000 updates the error will be about 0.000035085
• The generated outputs will then be 0.015912196 for the 0.01 target and 0.984065734 for the 0.99 target

In conclusion
• Neural networks consist of artificial neurons organized into layers and connected to each other with learnable weights
• Backpropagation with gradient descent is the standard method for training neural networks
• Backpropagation can be used to compute the gradients of a neural network, regardless of the depth of the network
• Of course, there are other important tricks and tips, but this is the basis for understanding neural networks and deep learning
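The worked example can be condensed into a short NumPy sketch: one forward pass, one backward pass, and one gradient descent update on the 2-2-2 network described above. The concrete inputs, initial weights, and biases are assumed to be the values used in the referenced mattmazur.com post; only the weights are updated here, as in that walkthrough.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.05, 0.10])           # inputs i1, i2 (assumed example values)
t = np.array([0.01, 0.99])           # targets for o1, o2 (from the slides)
W1 = np.array([[0.15, 0.20],         # w1, w2 (into hidden unit h1)
               [0.25, 0.30]])        # w3, w4 (into hidden unit h2)
b1 = 0.35
W2 = np.array([[0.40, 0.45],         # w5, w6 (into output unit o1)
               [0.50, 0.55]])        # w7, w8 (into output unit o2)
b2 = 0.60
lr = 0.5                             # learning rate from the slides

# Forward pass
h = sigmoid(W1 @ x + b1)             # hidden activations out_h1, out_h2
o = sigmoid(W2 @ h + b2)             # output activations out_o1, out_o2
print("error before update:", 0.5 * np.sum((t - o) ** 2))   # ~0.2984

# Backward pass (delta rule for the output layer, chain rule for the hidden layer)
delta_o = (o - t) * o * (1 - o)              # dE/dnet for the output units
grad_W2 = np.outer(delta_o, h)               # dE/dw5..w8
delta_h = (W2.T @ delta_o) * h * (1 - h)     # dE/dnet for the hidden units
grad_W1 = np.outer(delta_h, x)               # dE/dw1..w4

# Gradient descent update
W2 -= lr * grad_W2
W1 -= lr * grad_W1

# Forward pass with the updated weights: the error should drop slightly
# (the slides report an improvement of about 0.0073 for these values)
h = sigmoid(W1 @ x + b1)
o = sigmoid(W2 @ h + b2)
print("error after one update:", 0.5 * np.sum((t - o) ** 2))
```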
Common neural network architectures

Feed-forward network
• Simplest type of neural network
• Connections between units do not form cycles
• Information always moves in one direction; it never goes backwards
https://upload.wikimedia.org/wikipedia/en/5/54/Feed_forward_neural_net.gif

Recurrent neural network
• Connections between units form cycles
• They possess internal memory – they "remember" past inputs
• Suitable for modeling sequential/temporal data, such as text and language data

Convolutional neural networks
• Convolutional layers have neurons arranged in 3 dimensions
• Especially suitable for processing image data
http://parse.ele.tue.nl/education/cluster2

Autoencoders
• The output layer attempts to reconstruct the input
• Used for unsupervised feature learning
• The hidden layer typically has fewer neurons, thus performing data compression

Getting started with neural networks

Courses and tutorials
• https://www.coursera.org/learn/machine-learning – introductory course on machine learning, provides the necessary background
• https://www.coursera.org/learn/neural-networks – course on neural networks, assumes knowledge about machine learning
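As a hands-on complement to the courses above, the sketch below ties the pieces of the lecture together: it trains a small feed-forward network with backpropagation and gradient descent to learn the XOR function that a single neuron cannot represent. The architecture (2 inputs, 4 hidden sigmoid units, 1 output), initialization, learning rate, and iteration count are illustrative choices, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)     # XOR targets

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)       # hidden layer
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)       # output layer
eta = 0.5

for _ in range(20000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    o = sigmoid(h @ W2 + b2)
    # backward pass (chain rule, squared error)
    delta_o = (o - y) * o * (1 - o)
    delta_h = (delta_o @ W2.T) * h * (1 - h)
    # gradient descent updates
    W2 -= eta * h.T @ delta_o; b2 -= eta * delta_o.sum(axis=0)
    W1 -= eta * X.T @ delta_h; b1 -= eta * delta_h.sum(axis=0)

h = sigmoid(X @ W1 + b1)
o = sigmoid(h @ W2 + b2)
print(o.round(2))   # typically approaches [0, 1, 1, 0]; exact values depend on the initialization
```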