SB2b Statistical Machine Learning, Hilary Term 2017

Mihaela van der Schaar and Seth Flaxman
Guest lecturer: Yee Whye Teh
Department of Statistics, Oxford

Slides and other materials available at: http://www.oxford-man.ox.ac.uk/~mvanderschaar/home_page/course_ml.html

Administrative details: Course Structure

MMath Part B & MSc in Applied Statistics
Lectures: Wednesdays 12:00-13:00, LG.01; Thursdays 16:00-17:00, LG.01.
MSc: 4 problem sheets, discussed at the classes in weeks 2, 4, 6, 7 (check website).
Part B: 4 problem sheets.
Class Tutors: Lloyd Elliott, Kevin Sharp, and Hyunjik Kim.
Please sign up for the classes on the sign up sheet!

Administrative details: Course Aims

1. Understand the statistical fundamentals of machine learning, with a focus on supervised learning (classification and regression) and empirical risk minimisation.
2. Understand the difference between generative and discriminative learning frameworks.
3. Learn to identify and use appropriate methods and models for given data and task.
4. Learn to use the relevant R or Python packages to analyse data, interpret results, and evaluate methods.

Administrative details: Syllabus I

Part I: Introduction to supervised learning (4 lectures)
  Empirical risk minimization
  Bias/variance, Generalization, Overfitting, Cross validation
  Regularization
  Neural networks
Part II: Classification and regression (3 lectures)
  Generative vs. Discriminative models
  K-nearest neighbours, Maximum Likelihood Estimation, Mixture models
  Naive Bayes, Decision trees, CART
  Support Vector Machines
  Random forest, Bootstrap Aggregation (Bagging), Ensemble learning
  Expectation Maximization

Administrative details: Syllabus II

Part III: Theoretical frameworks
  Statistical learning theory
  Decision theory
Part IV: Further topics
  Optimisation
  Hidden Markov Models
  Backward-forward algorithms
  Reinforcement learning

Overview: Statistical Machine Learning: What is Machine Learning?

http://gureckislab.org

Arthur Samuel, 1959 Field of study that gives computers the ability to learn without being explicitly programmed.

Tom Mitchell, 1997 Any computer program that improves its performance at some task through experience.

Kevin Murphy, 2012 To develop methods that can automatically detect patterns in data, and then to use the uncovered patterns to predict future data or other outcomes of interest.

[Diagram labels: data; information, structure, prediction, decisions, actions]

Larry Page about DeepMind’s ML systems that can learn to play video games like humans

[Diagram: Machine Learning shown among related fields: statistics, business, computer science, finance, cognitive science, biology, genetics, psychology, physics, mathematics, engineering, operations research]

Overview: Statistical Machine Learning: What is Data Science? Early years

John Tukey, The Future of Data Analysis, 1962
For a long time I have thought I was a statistician, interested in inferences from the particular to the general. But as I have watched mathematical statistics evolve, I have had cause to wonder and to doubt. ... All in all I have come to feel that my central interest is in data analysis, which I take to include, among other things: procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data.

Four driving forces, according to Tukey:
  The formal theories of statistics
  Accelerating developments in computers...
  The challenge, in many fields, of more and ever larger bodies of data
  The emphasis on quantification in an ever wider variety of disciplines

Overview: Statistical Machine Learning: What is Data Science?

Bin Yu, Let us own Data Science, IMS Presidential Address, 2014
  Statistics
  Domain/science knowledge
  Computing
  Collaboration/teamwork
  Communication to outsiders

David Donoho, 50 years of Data Science, 2015
“Greater Data Science”:
  Data Exploration and Preparation
  Data Representation and Transformation
  Computing with Data
  Data Modeling
  Data Visualization and Presentation
  Science about Data Science

Overview: Statistical Machine Learning: Statistics vs Machine Learning

Traditional Problems in Applied Statistics
  Well formulated question that we would like to answer.
  Expensive data gathering and/or expensive computation.
  Create specially designed experiments to collect high quality data.

Information Revolution
  Improvements in data processing and data storage.
  Powerful, cheap, easy data capturing.
  Lots of (low quality) data with potentially valuable information inside.

CS and Stats forced back together: unified framework of data, inferences, procedures, algorithms
  statistics taking computation seriously
  computing taking statistical risk seriously
Michael I. Jordan: On the Computational and Statistical Interface and "Big Data"
Max Welling: Are Machine Learning and Statistics Complementary?

Overview: Types of Machine Learning

Unsupervised learning
  Extract key features of the “unlabelled” data
  clustering, signal separation, density estimation
  Goal: representation, hypothesis generation, visualization
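As one possible illustration of clustering on unlabelled data, the sketch below runs k-means (Lloyd's algorithm) on a handful of numbers. The slides do not name k-means at this point; it is chosen here only as a simple example, and the function name `kmeans_1d`, the toy data, and the fixed seed are all assumptions made for illustration.

```python
import random

def kmeans_1d(points, k, n_iter=20, seed=0):
    """Lloyd's algorithm for k-means on a list of numbers (illustrative sketch)."""
    rng = random.Random(seed)
    # Initialise the k centres at randomly chosen data points.
    centres = rng.sample(points, k)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centre.
        clusters = [[] for _ in range(k)]
        for x in points:
            nearest = min(range(k), key=lambda c: (x - centres[c]) ** 2)
            clusters[nearest].append(x)
        # Update step: move each centre to the mean of its cluster
        # (an empty cluster keeps its previous centre).
        centres = [sum(c) / len(c) if c else centres[j]
                   for j, c in enumerate(clusters)]
    return sorted(centres)

# Made-up unlabelled observations forming two visible groups.
observations = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centres = kmeans_1d(observations, k=2)
print(centres)
```

No labels are used anywhere: the two centres are a summary of structure found in the observations alone.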

Supervised learning
  Data contains “labels”: every example is an input-output pair
  classification, regression
  Goal: prediction on new examples
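A minimal sketch of prediction from input-output pairs, using k-nearest neighbours (listed in the syllabus) for both classification and regression. The function names and the toy data are hypothetical and only meant to show the shape of the problem, not a recommended implementation.

```python
from collections import Counter

def k_nearest(train, x_new, k):
    """Return the responses of the k training inputs closest to x_new."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    ranked = sorted(train, key=lambda pair: sq_dist(pair[0], x_new))
    return [y for _, y in ranked[:k]]

def knn_classify(train, x_new, k=3):
    """Classification: majority vote among the k nearest labels."""
    return Counter(k_nearest(train, x_new, k)).most_common(1)[0][0]

def knn_regress(train, x_new, k=3):
    """Regression: average of the k nearest numerical responses."""
    neighbours = k_nearest(train, x_new, k)
    return sum(neighbours) / len(neighbours)

# Made-up labelled examples: each is an (input, output) pair.
labelled_points = [((0.0, 0.0), -1), ((0.1, 0.2), -1), ((0.2, 0.1), -1),
                   ((1.0, 1.0), +1), ((0.9, 1.1), +1), ((1.1, 0.9), +1)]
numeric_pairs = [((1.0,), 2.0), ((2.0,), 4.0), ((3.0,), 6.0), ((10.0,), 20.0)]

print(knn_classify(labelled_points, (0.95, 1.0)))  # discrete response
print(knn_regress(numeric_pairs, (2.1,)))          # numerical response
```

The same training data format serves both tasks; only the way the neighbours' responses are combined changes.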

Semi-supervised Learning
  A database of examples, only a small subset of which are labelled.

Multi-task Learning
  A database of examples, each of which has multiple labels corresponding to different prediction tasks.

Reinforcement Learning
  An agent acting in an environment, given rewards for performing appropriate actions, learns to maximize its reward.

Overview: Supervised Learning

Unsupervised learning: To “extract structure” and postulate hypotheses about the data generating process from “unlabelled” observations $x_1, \ldots, x_n$. Visualize, summarize and compress data.
Supervised learning: In addition to the observations of $X$, we have access to their response variables / labels $Y \in \mathcal{Y}$: we observe $\{(x_i, y_i)\}_{i=1}^n$.
Types of supervised learning:
  Classification: discrete responses, e.g. $\mathcal{Y} = \{+1, -1\}$ or $\{1, \ldots, K\}$.
  Regression: a numerical value is observed and $\mathcal{Y} = \mathbb{R}$.
The goal is to accurately predict the response $Y$ on new observations of $X$, i.e., to learn a function $f : \mathbb{R}^p \to \mathcal{Y}$, such that $f(X)$ will be close to the true response $Y$.

Overview: Supervised Learning: Applications of Machine Learning

spam filtering, recommendation systems, fraud detection
self-driving cars, stock market analysis, image recognition

ImageNet: Computer Eyesight Gets a Lot More Accurate, Krizhevsky et al, 2012
New applications of ML: Machine Learning is Eating the World

Machine learning in practice: Spam detection

Observations $X$ are text documents.
Labels $Y$ are spam = +1 and not spam = −1.
How do we encode documents of different lengths as a vector $X \in \mathbb{R}^p$?
Given a set of labelled documents $\{(x_i, y_i)\}_{i=1}^n$, how do we learn a function $f : \mathbb{R}^p \to \mathcal{Y}$?
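One possible answer to both questions, sketched under simplifying assumptions: a bag-of-words encoding maps every document to a count vector of the same length p (the vocabulary size), and a multinomial naive Bayes classifier with add-one smoothing plays the role of f. The toy documents, the whitespace tokenizer, and all function names are hypothetical.

```python
import math

def tokenize(doc):
    return doc.lower().split()

def build_vocab(docs):
    """Map each distinct training word to a coordinate 0..p-1."""
    words = sorted({w for d in docs for w in tokenize(d)})
    return {w: j for j, w in enumerate(words)}

def encode(doc, vocab):
    """Bag-of-words: a length-p count vector, whatever the document length."""
    x = [0] * len(vocab)
    for w in tokenize(doc):
        if w in vocab:  # words unseen in training are ignored
            x[vocab[w]] += 1
    return x

def fit_naive_bayes(X, y, alpha=1.0):
    """Multinomial naive Bayes with add-alpha smoothing."""
    p = len(X[0])
    model = {}
    for label in set(y):
        rows = [x for x, yi in zip(X, y) if yi == label]
        totals = [sum(col) for col in zip(*rows)]  # word counts in this class
        denom = sum(totals) + alpha * p
        log_prior = math.log(len(rows) / len(y))
        log_probs = [math.log((t + alpha) / denom) for t in totals]
        model[label] = (log_prior, log_probs)
    return model

def predict(model, x):
    """f(x): the label with the largest log-posterior score."""
    def score(label):
        log_prior, log_probs = model[label]
        return log_prior + sum(c * lp for c, lp in zip(x, log_probs))
    return max(model, key=score)

# Made-up labelled documents: +1 = spam, -1 = not spam.
docs = ["win money now", "cheap money offer now",
        "meeting agenda for monday", "lunch on monday"]
labels = [+1, +1, -1, -1]
vocab = build_vocab(docs)
X = [encode(d, vocab) for d in docs]
model = fit_naive_bayes(X, labels)

print(predict(model, encode("win cheap money", vocab)))
print(predict(model, encode("agenda for lunch meeting on monday please", vocab)))
```

The encoding discards word order, which is a deliberate simplification; other encodings and classifiers make different trade-offs.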

Many answers to both questions will be covered in this course: logistic regression, naive Bayes, neural networks, Support Vector Machines, etc.

Machine learning in practice: Image classification

Observations $X$ are images.
Labels $Y \in \{0, 1, \ldots, 9\}$.
Learn a function $f : \mathbb{R}^p \to \mathcal{Y}$.

Machine learning in practice: Face recognition

Observations $X$ are images.
Labels $Y$ are a very large set of people: {Queen Elizabeth, Bill Gates, Justin Trudeau, Leonardo DiCaprio, etc.}
How do we encode images as vectors $X \in \mathbb{R}^p$?
Given a set of labelled images $\{(x_i, y_i)\}_{i=1}^n$, how do we learn a function $f : \mathbb{R}^p \to \mathcal{Y}$?
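The simplest encoding, shown here only as a sketch, stacks the pixel intensities of an h-by-w grayscale image into a single vector of length p = h * w. The function name and the tiny example image are hypothetical, and the sketch assumes all images have already been brought to the same size so that p is shared.

```python
def flatten_image(image):
    """Stack the rows of an h-by-w grid of pixel intensities into one vector."""
    width = len(image[0])
    if any(len(row) != width for row in image):
        raise ValueError("all rows must have the same number of pixels")
    return [pixel for row in image for pixel in row]

# Made-up 3x3 grayscale image with a bright vertical stripe.
tiny_image = [[0, 255, 0],
              [0, 255, 0],
              [0, 255, 0]]
x = flatten_image(tiny_image)
print(len(x))  # p = 3 * 3
```

Flattening ignores which pixels are neighbours, so spatial structure has to be recovered by the learning method if it matters.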

Is this fundamentally harder than, or different from, image classification?

Machine learning in practice: Face detection

Farfade, Saberian, and Li 2015 https://arxiv.org/pdf/1502.02766v3.pdf

Observations X are images.
What are the labels Y?
How should our function f work?

Machine learning in practice: Machine translation

Kyunghyun Cho https://devblogs.nvidia.com/parallelforall/introduction-neural-machine-translation-gpus-part-3/

Observations X are sentences in language A.
Labels Y are sentences in language B.
How should we encode X and Y numerically?
Is this regression or classification?

Machine learning in practice: Speech recognition

Dahl et al. 2012

Machine learning in practice: Self-driving cars

27 million connections and 250 thousand parameters
devblogs.nvidia.com/parallelforall/deep-learning-self-driving-cars/

Machine learning in practice: Product recommendation

Fully observe all user interactions on a website (what pages they view, what items they buy, what reviews they leave, etc.).
What products should be recommended to them? On which websites?
How can you phrase this as supervised learning?

Machine learning in practice: Software

R
Python: scikit-learn, mlpy, Theano
Weka, mlpack, Torch, Shogun, TensorFlow...
Matlab/Octave

Machine learning in practice: Software: Machine learning advances in 2016 and challenges ahead

2016:
  Free/open source software for deep learning: TensorFlow (Google), CNTK (Microsoft), PaddlePaddle (Baidu), MXNet (Amazon)
  Audio generation
  Go
  Advances in machine translation (Google Translate)
2017 and beyond:
  Increasing concern about, and regulation of, algorithms
  Transparency / explainability in machine learning
  Effect of increasing automation of work on society
  Medical advances?