Modern Methods of Statistical Learning sf2935
Lecture 1: Introduction to Learning Theory
Timo Koski


2018-08-28

Overview of the Lecture

The methods of statistical learning provide a heterogeneous collection of points of view, tasks, algorithms, models and probability distributions. This lecture introduces a few overarching concepts and presents the historically first instance of learning theory, the perceptron algorithm.

Your Learning Outcomes

Supervised learning
Bias-Variance Trade-off

E-learning

Statistical Learning = Machine Learning + Statistics

Machine Learning means the development of algorithms and techniques that allow computers to learn to
1. record observations about a phenomenon
2. build a model of this phenomenon
3. predict the future of this phenomenon

Statistics gives
- a formal definition of machine learning
- some guarantees of expected results
- suggestions for new or improved modelling tools

Statistical Learning = Machine Learning + Statistics

Machine Learning: a phenomenon is recorded via observations $\{z_i\}_{i=1}^n$ with $z_i \in \mathcal{Z}$. There are two generic situations:
1. Unsupervised learning: no predefined structure in $\mathcal{Z}$. The goal is to find some structure: clusters, association rules, estimates of probability distributions/densities.
2. Supervised learning: $\mathcal{Z} = \mathcal{X} \times \mathcal{Y}$; the components are not exchangeable, $z = (x, y) \in \mathcal{X} \times \mathcal{Y}$. Modelling: finding how $x$ and $y$ are related. The goal is to make predictions: given $x$, find a reasonable value for $y$ such that $z = (x, y)$ is compatible with the phenomenon.

Unsupervised learning

Learning what normally happens (no teacher)
No output
Clustering: grouping similar instances
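As a concrete illustration, here is a minimal sketch of clustering with k-means. The data, the three cluster centres, and the choice of k = 3 are made up for the example; scikit-learn's KMeans is used, but any clustering routine would do.

import numpy as np
from sklearn.cluster import KMeans

# Made-up unlabelled data: 300 points in the plane around three centres.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=centre, scale=0.5, size=(100, 2))
    for centre in ([0, 0], [4, 0], [2, 3])
])

# k-means groups similar instances; note that no outputs y are involved.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])      # cluster index assigned to each point
print(kmeans.cluster_centers_)  # estimated cluster centres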

Other types

Semi-Supervised Learning: a class of supervised learning techniques that also makes use of a large amount of unlabelled data $\{z_i\}_{i=1}^{n_1}$ for training, together with a typically small amount of labelled data $\{(x_i, y_i)\}_{i=1}^{n_2}$. Semi-supervised learning falls between unsupervised learning and supervised learning.

Active learning is said to be a special case of semi-supervised machine learning in which a learning algorithm is able to interactively query some information source to obtain the desired outputs at new data points. In statistics it is sometimes also called optimal experimental design.

Other types

Reinforcement Learning is (Wikipedia) an area of machine learning inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.

Learning by Doing

Supervised learning

Several types of $\mathcal{Y}$ can be considered:
1. $\mathcal{Y} = \{1, 2, \ldots, q\}$: classification into $q$ classes, e.g., to tell whether a patient has a certain kind of disease.
2. $\mathcal{Y} = \mathbb{R}^q$: regression, e.g., to calculate the price $y$ of a house based on some characteristics $x$ (like the neighbourhood, year of construction, architectural style).
3. $\mathcal{Y}$ is something complex but structured.

The modelling difficulty increases with the complexity of $\mathcal{Y}$.
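A minimal sketch of the first two cases on made-up data (scikit-learn's LogisticRegression and LinearRegression are used purely for illustration; the thresholds and coefficients are invented):

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=(200, 1))

# Case 1, classification: Y = {0, 1}, e.g. healthy / diseased.
y_class = (x[:, 0] + rng.normal(scale=1.0, size=200) > 5).astype(int)
clf = LogisticRegression().fit(x, y_class)
print(clf.predict([[2.0], [8.0]]))   # predicted class labels

# Case 2, regression: Y = R, e.g. the price of a house.
y_reg = 3.0 * x[:, 0] + rng.normal(scale=2.0, size=200)
reg = LinearRegression().fit(x, y_reg)
print(reg.predict([[2.0], [8.0]]))   # predicted real values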

Modelling: f from X to Y

Given a data set or training set $\{(x_i, y_i)\}_{i=1}^n$, a machine learning method builds up a machine $h$ from $\mathcal{X}$ to $\mathcal{Y}$.

Example: the conditional expectation or regression function:

$$h(x) = E(Y \mid X = x)$$

where $Y$ and $X$ are random variables with values in $\mathcal{Y}$ and $\mathcal{X}$, respectively.
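A standard fact, included here for completeness (it is not on the slide): the regression function minimizes the expected squared prediction error among all predictors $g$, because the cross term vanishes when $h(X) = E(Y \mid X)$:

$$
\begin{aligned}
E\big[(Y - g(X))^2\big]
&= E\big[(Y - h(X))^2\big] + 2\,E\big[(Y - h(X))(h(X) - g(X))\big] + E\big[(h(X) - g(X))^2\big] \\
&= E\big[(Y - h(X))^2\big] + E\big[(h(X) - g(X))^2\big] \;\geq\; E\big[(Y - h(X))^2\big],
\end{aligned}
$$

since $E\big[(Y - h(X))(h(X) - g(X))\big] = E\big[(h(X) - g(X))\,E[Y - h(X) \mid X]\big] = 0$.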

A Good Model should be such that for all $i = 1, \ldots, n$

$$h(x_i) \approx y_i.$$

In statistics, one often chooses to measure the quality of the modelling by the Mean Square Error (assuming $\mathcal{Y} \subseteq \mathbb{R}$):

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \big(y_i - h(x_i)\big)^2.$$

Modelling: f from X to Y

The MSE in

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \big(y_i - h(x_i)\big)^2$$

is computed using the training data that was used to fit the model, and so should more accurately be referred to as the training MSE. In general, we do not really care how well the method works on the training data. Rather, we are interested in the accuracy of the predictions that we obtain when we apply our method to previously unseen test data. This is called the generalization or test error.
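A minimal numerical sketch of this distinction, under an assumed data-generating process $y = \sin(x) + e$ and an illustrative polynomial model (numpy only; all names and constants are made up):

import numpy as np

rng = np.random.default_rng(2)

def f(x):                        # assumed true regression function
    return np.sin(x)

# Training data (used to fit the model) and test data (previously unseen).
x_train = rng.uniform(0, 6, 50)
y_train = f(x_train) + rng.normal(scale=0.3, size=50)
x_test = rng.uniform(0, 6, 1000)
y_test = f(x_test) + rng.normal(scale=0.3, size=1000)

# Fit a flexible model h: here a degree-9 polynomial.
h = np.poly1d(np.polyfit(x_train, y_train, deg=9))

train_mse = np.mean((y_train - h(x_train)) ** 2)
test_mse = np.mean((y_test - h(x_test)) ** 2)
print(train_mse, test_mse)       # the test MSE is typically the larger one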

Bias-Variance Trade-Off

Suppose that we have a training set consisting of points $x_1, \ldots, x_n$ and real values $y_i$ associated with each point $x_i$. We assume that there is a functional, but noisy relation

$$y_i = f(x_i) + e,$$

where the noise $e$ has zero mean and variance $\sigma^2$. We want to find a function $\hat{f}(x)$ that approximates the true function $y = f(x)$ as well as possible, by means of some learning algorithm. We make "as well as possible" precise by measuring the MSE between $y$ and $h = \hat{f}(x)$, which we want to be minimal both for $x_1, \ldots, x_n$ AND for points outside of our sample. Of course, we cannot hope to do so perfectly, since the $y_i$ contain noise $e$. This means we must be prepared to accept an irreducible error in any function we come up with.

Bias-Variance Decomposition

Finding an $\hat{f}$ that generalizes to points outside of the training set can be done with any of the countless algorithms used for supervised learning. It turns out that whichever function $\hat{f}$ we select, we can decompose its expected error on an unseen sample $x$ as follows:

$$E\Big[\big(y - \hat{f}(x)\big)^2\Big] = \mathrm{Bias}\big[\hat{f}(x)\big]^2 + \mathrm{Var}\big[\hat{f}(x)\big] + \sigma^2 \qquad (1)$$

where:

$$\mathrm{Bias}\big[\hat{f}(x)\big] = E\big[\hat{f}(x)\big] - f(x) \qquad (2)$$

and

$$\mathrm{Var}\big[\hat{f}(x)\big] = E\Big[\big(\hat{f}(x) - E[\hat{f}(x)]\big)^2\Big] \qquad (3)$$
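For completeness, a sketch of the standard derivation, using $E[e] = 0$, $E[e^2] = \sigma^2$, the fact that $f(x)$ is deterministic, and the independence of the noise $e$ from $\hat{f}(x)$ (whose randomness comes from the training set):

$$
\begin{aligned}
E\Big[\big(y - \hat{f}(x)\big)^2\Big]
&= E\Big[\big(f(x) + e - \hat{f}(x)\big)^2\Big] \\
&= E\Big[\big(f(x) - \hat{f}(x)\big)^2\Big] + 2\,E\Big[e\,\big(f(x) - \hat{f}(x)\big)\Big] + E[e^2] \\
&= \big(f(x) - E[\hat{f}(x)]\big)^2 + E\Big[\big(\hat{f}(x) - E[\hat{f}(x)]\big)^2\Big] + \sigma^2 \\
&= \mathrm{Bias}\big[\hat{f}(x)\big]^2 + \mathrm{Var}\big[\hat{f}(x)\big] + \sigma^2.
\end{aligned}
$$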

Bias-Variance Trade-Off

The expectation ranges over different choices of the training set $x_1, \ldots, x_n, y_1, \ldots, y_n$, all sampled from the same distribution. The three terms represent:
- the square of the bias of the learning method, which can be thought of as the error caused by the simplifying assumptions built into the method. E.g., when approximating a non-linear function $f(x)$ using a learning method for linear models, there will be error in the estimates $\hat{f}(x)$ due to this assumption;
- the variance of the learning method, or, intuitively, how much the learning method $\hat{f}(x)$ will move around its mean;
- the irreducible error $\sigma^2$.
Since all three terms are non-negative, the irreducible error forms a lower bound on the expected error on unseen samples.

Bias-Variance Trade-Off

If the model is too simple, the solution it yields is biased and does not fit the data. If the model is too complex, it is too sensitive to small variations in the data.
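A minimal simulation of this effect, under assumed choices $f(x) = \sin x$, noise level $\sigma = 0.3$, and polynomial models of increasing degree (numpy only; the constants are illustrative). Repeatedly redrawing the training set and refitting lets us estimate the squared bias and the variance directly:

import numpy as np

rng = np.random.default_rng(3)
f = np.sin                       # assumed true function
sigma = 0.3                      # assumed noise level
x_grid = np.linspace(0, 6, 100)  # fixed points at which f-hat is evaluated

def fit_predict(degree):
    """Draw a fresh training set, fit a polynomial f-hat, evaluate on x_grid."""
    x = rng.uniform(0, 6, 30)
    y = f(x) + rng.normal(scale=sigma, size=30)
    return np.poly1d(np.polyfit(x, y, deg=degree))(x_grid)

for degree in (1, 3, 9):
    preds = np.array([fit_predict(degree) for _ in range(500)])
    bias2 = np.mean((preds.mean(axis=0) - f(x_grid)) ** 2)  # squared bias, averaged over x
    var = np.mean(preds.var(axis=0))                        # variance, averaged over x
    print(f"degree {degree}: bias^2 = {bias2:.3f}, variance = {var:.3f}")

A too simple model (degree 1) shows high bias and low variance; a too complex one (degree 9) shows low bias and higher variance.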

Classification

In classification, one measures the quality of a model by the training error rate

$$\frac{1}{n} \sum_{i=1}^{n} I\big(y_i \neq h(x_i)\big),$$

where $I(y_i \neq h(x_i)) = 1$ if $y_i \neq h(x_i)$ and $0$ otherwise. A bias-variance trade-off can be established here, too.
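The training error rate in code (a trivial sketch; the label vectors are made up):

import numpy as np

y = np.array([0, 1, 1, 0, 1, 0])      # true classes y_i
y_hat = np.array([0, 1, 0, 0, 1, 1])  # classifier outputs h(x_i)

# Mean of I(y_i != h(x_i)) over the training set.
train_error_rate = np.mean(y != y_hat)
print(train_error_rate)               # 2 errors out of 6 = 0.333...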

Inductive Learning

Inductive Learning means learning a general rule from a finite number of cases.

Google DeepMind's AlphaGo algorithm masters ancient game of Go. Deep-learning software defeats human professional for first time. Nature, 27 January 2016

The AlphaGo program applied deep learning in neural networks: brain-inspired programs in which connections between layers of simulated neurons are strengthened through examples and experience. It first studied 30 million positions from expert games, gleaning abstract information on the state of play from board data, much as other programmes categorize images from pixels. Then it played against itself across 50 computers, improving with each iteration, a technique known as reinforcement learning.

Visions

"the embryo of an electronic computer that will be able to walk, talk, see, write, reproduce itself and be conscious of its existence." (The New York Times, 1958, reporting on Rosenblatt's perceptron)

Hannes Alfvén: "(Man's) real greatness lies in being the living creature that was intelligent enough to realize that the goal of (evolution) is the computer." (p. 17 in H. Alfvén, Sagan om den stora datamaskinen)

Concern for a Frankenstein-like world, where machines are programmed so perfectly that one day they surpass human intellect.
