
ECS171: Machine Learning
Lecture 1: Overview of class, LFD 1.1, 1.2
Cho-Jui Hsieh, UC Davis, Jan 8, 2018

Course Information

Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ECS171_Winter2018/main.html and Canvas
My office: Mathematical Sciences Building (MSB) 4232
Office hours: Tuesday 1pm-2pm, MSB 4232 (starting next week)
TAs: Patrick Chen ([email protected]) and Xuanqing Liu ([email protected])
TA office hour: Thursday 10am-11am, Kemper 55 (starting next week)
My email: [email protected]

Course Material

Part I (before the midterm exam): uses the book "Learning from Data" (LFD) by Abu-Mostafa, Magdon-Ismail, and Hsuan-Tien Lin.
- Foundations of machine learning: why can we learn from data? Overfitting, underfitting, training vs. testing, regularization.
- About 11 lectures. Most slides are based on:
  - Yaser Abu-Mostafa (Caltech): http://work.caltech.edu/lectures.html#lectures
  - Hsuan-Tien Lin (NTU): https://www.csie.ntu.edu.tw/~htlin/course/mlfound17fall/
Part II: introduces some practical machine learning models: deep learning, kernel methods, boosting, tree-based approaches, clustering, dimensionality reduction.

Grading Policy

- Midterm (30%): written exam for Part I.
- Homework (30%): 2 or 3 homeworks.
- Final project (40%): a competition.

Final Project

- Groups of ≤ 4 students. We will announce the dataset and task.
- Kaggle-style competition: upload your model/predictions online, and our website will report the accuracy.
- Final report: describe the algorithms you tested and the implementation details, and discuss your findings.

The Learning Problem

From learning to machine learning:
- What is learning? observations → Learning → skill
- Machine learning: data → Machine Learning → skill
- Machine learning automates the learning process!
- Skill: how to make a decision (take an action), e.g., classify an image, predict the Bitcoin price, ...

Example: Movie Recommendation

- Data: user-movie ratings.
- Skill: predict how a user would rate an unrated movie.
- Known as the "Netflix problem": a competition held by Netflix in 2006. About 100 million ratings, 480K users, 17K movies; a 10% improvement over the baseline won a 1-million-dollar prize.

Movie Rating: A Solution

- Each viewer and each movie is associated with a "latent factor" vector.
- Prediction: rating ≈ (viewer factor) · (movie factor).
- Learning: known ratings → viewer/movie factors (a small sketch follows below).
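To make the latent-factor idea concrete, here is a minimal NumPy sketch, assuming a plain dot-product model fitted by a stochastic-gradient step on squared error; the factor dimension, learning rate, and example rating are invented for illustration and are not from the lecture.

```python
import numpy as np

# Each viewer and each movie gets a k-dimensional latent factor;
# the predicted rating is the dot product of the two vectors.
k = 3                                     # latent dimension (arbitrary)
rng = np.random.default_rng(0)
viewer = rng.normal(scale=0.1, size=k)    # one viewer's latent factor
movie = rng.normal(scale=0.1, size=k)     # one movie's latent factor

def predict_rating(viewer, movie):
    """Predicted rating = dot product of viewer and movie factors."""
    return viewer @ movie

# "Learning: known ratings -> factors": one stochastic-gradient step
# on the squared error (observed - predicted)^2 for a single rating.
observed = 4.0                            # a known rating (made up)
lr = 0.05                                 # step size (made up)
err = observed - predict_rating(viewer, movie)
viewer, movie = (viewer + lr * err * movie,   # step for the viewer factor
                 movie + lr * err * viewer)   # step for the movie factor
```

Looping this update over all known ratings drives the factors toward values whose dot products reproduce the observed ratings; this is the basic matrix-factorization recipe behind many Netflix-problem solutions.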
Credit Approval Problem

Customer record: the customer's attributes. To be learned: "Is approving the credit card good for the bank?"

Formalize the Learning Problem

- Input: x ∈ X (customer application), e.g., x = [23, 1, 1000000, 1, 0.5, 200000]
- Output: y ∈ Y (good/bad after approving the credit card)
- Target function to be learned: f: X → Y (the ideal credit approval formula)
- Data (historical records in the bank): D = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}
- Hypothesis (function): g: X → Y (the learned formula to be used)

Basic Setup of the Learning Problem

(Diagram: the unknown target function f generates the training examples D; a learning algorithm picks the final hypothesis g from the hypothesis set H.)

Learning Model

A learning model has two components:
- The hypothesis set H: the set of candidate hypotheses (functions).
- The learning algorithm: picks a hypothesis (function) from H. Usually an optimization algorithm (choose the best function by minimizing the training error).

Perceptron

Our first ML model: the perceptron (1957).
- Learns a linear function.
- A single-layer neural network.
Next, we introduce the two components of the perceptron: What is the hypothesis space? What is the learning algorithm?

Perceptron Hypothesis Space

Define the hypothesis set H. For input x = (x_1, ..., x_d), the "attributes of a customer":
- Approve credit if $\sum_{i=1}^{d} w_i x_i > \text{threshold}$
- Deny credit if $\sum_{i=1}^{d} w_i x_i < \text{threshold}$
Define Y = {+1 (good), -1 (bad)}. The linear hypothesis space H consists of all h of the form
$h(x) = \operatorname{sign}\Big(\sum_{i=1}^{d} w_i x_i - \text{threshold}\Big)$
(the perceptron hypothesis).

Perceptron Hypothesis Space (cont'd)

Introduce an artificial coordinate x_0 = -1 and set w_0 = threshold; then
$h(x) = \operatorname{sign}\Big(\sum_{i=1}^{d} w_i x_i - \text{threshold}\Big) = \operatorname{sign}\Big(\sum_{i=0}^{d} w_i x_i\Big) = \operatorname{sign}(w^T x)$
(vector form).
- Customer features x: points in $\mathbb{R}^d$ (d-dimensional space).
- Labels y: +1 or -1.
- Hypotheses h: linear hyperplanes.

Select g from H

H contains all possible linear hyperplanes; how do we select the best one? We want g(x_n) ≈ f(x_n) = y_n for most of n = 1, ..., N.
Naive approach: test every h ∈ H and choose the one minimizing the "training error"
$\text{train error} = \frac{1}{N}\sum_{n=1}^{N} I\big(h(x_n) \neq y_n\big)$
(I(·) is the indicator function). Difficulty: H is of infinite size. (A code sketch of the hypothesis and the training error follows after the PLA discussion below.)

Perceptron Learning Algorithm

Perceptron Learning Algorithm (PLA):
- Initialize from some w (e.g., w = 0).
- For t = 1, 2, ...:
  - Find a misclassified point n(t): $\operatorname{sign}(w^T x_{n(t)}) \neq y_{n(t)}$.
  - Update the weight vector: $w \leftarrow w + y_{n(t)} x_{n(t)}$.
PLA iteratively finds a misclassified point and rotates the hyperplane according to that point.

Perceptron Learning Algorithm: Convergence

PLA converges in the "linearly separable" case:
- Linearly separable: there exists a perceptron (linear) hypothesis f with 0 training error.
- PLA is then guaranteed to reach such a hypothesis (it stops when there is no more misclassified point).
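To tie these pieces together, here is a minimal NumPy sketch of the perceptron hypothesis $h(x) = \operatorname{sign}(w^T x)$ with the artificial coordinate x_0 = -1, plus the training-error formula; the helper names and the convention that sign(0) counts as +1 are my own choices, not the lecture's.

```python
import numpy as np

def add_bias(X):
    """Prepend the artificial coordinate x0 = -1 to every example."""
    return np.hstack([-np.ones((X.shape[0], 1)), X])

def predict(w, X):
    """Perceptron hypothesis h(x) = sign(w^T x); ties at 0 map to +1."""
    return np.where(add_bias(X) @ w >= 0, 1, -1)

def train_error(w, X, y):
    """Average of the indicator I(h(x_n) != y_n) over the N examples."""
    return np.mean(predict(w, X) != y)
```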
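Building on the helpers above, this is a sketch of the PLA loop following the update rule on the slide ($w \leftarrow w + y_{n(t)} x_{n(t)}$ at a misclassified point); the iteration cap and the pick-the-first-misclassified-point policy are implementation choices the lecture leaves open.

```python
def pla(X, y, max_iters=10_000):
    """Perceptron Learning Algorithm: repeat w <- w + y_n * x_n at a
    misclassified point until none remain. Terminates with zero
    training error when the data are linearly separable."""
    Xb = add_bias(X)                    # reuse the helper from the sketch above
    w = np.zeros(Xb.shape[1])           # initialize from w = 0
    for _ in range(max_iters):
        preds = np.where(Xb @ w >= 0, 1, -1)
        wrong = np.flatnonzero(preds != y)
        if wrong.size == 0:             # no misclassified point: done
            break
        n = wrong[0]                    # pick a misclassified point
        w = w + y[n] * Xb[n]            # rotate the hyperplane toward it
    return w
```

On a linearly separable toy dataset, `train_error(pla(X, y), X, y)` should return 0.0, matching the convergence guarantee quoted above.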
Binary Classification

- Data: features for each training example, $\{x_n\}_{n=1}^{N}$ with each $x_n \in \mathbb{R}^d$, and labels $y_n \in \{+1, -1\}$.
- Goal: learn a function $f: \mathbb{R}^d \to \{+1, -1\}$.
- Examples: credit approve/disapprove, email spam/not spam, patient sick/not sick, ...

Other Types of Labels: Multi-class

Multi-class classification: $y_n \in \{1, \ldots, C\}$ (C-way classification).
Example: coin recognition. Classify coins by two features, size and mass ($x_n \in \mathbb{R}^2$), with $y_n \in Y =$ {1c, 5c, 10c, 25c} (i.e., Y = {1, 2, 3, 4}).
Other examples: hand-written digits, ...

Other Types of Labels: Regression

Regression: $y_n \in \mathbb{R}$ (the output is a real number). Examples: stock price prediction, movie rating prediction, ...

Other Types of Labels: Structure Prediction

"I love ML" → pronoun, verb, noun.
Multiclass classification for each word (word → word class) does not use information from the whole sentence.
Structure prediction problem: sentence → structure (the class of each word).
Other examples: speech recognition, image captioning, ...

Machine Learning Problems

Machine learning problems can usually be categorized into:
- Supervised learning: every x_n comes with a label y_n (plus the semi-supervised variant).
- Unsupervised learning: only x_n, no y_n.
- Reinforcement learning: examples contain (input, some output, a grade for this output).

Unsupervised Learning (no y_n)

Clustering: given examples x_1, ..., x_N, classify them into K classes.
Other unsupervised learning problems: outlier detection ({x_n} ⇒ unusual(x)), dimensionality reduction, ...

Semi-supervised Learning

Only some (few) of the x_n have labels y_n; labeled data is much more expensive than unlabeled data.

Reinforcement Learning

Used a lot in game AI and robotic control.
- The agent observes state S_t.
- The agent takes action A_t (the ML model, based on input S_t).
- The environment gives the agent a reward R_t.
- The environment gives the agent the next state S_{t+1}.
We only observe a "grade" for the chosen action (the best action is not revealed). Example: an ads system with (customer, ad choice, click or not).

Conclusions

Two components in ML:
- Set up a hypothesis space (the candidate functions).
- Develop an algorithm to choose a good hypothesis based on the training examples.
We saw the perceptron algorithm (linear classification) and supervised vs. unsupervised learning.
Next class: LFD 1.3, 1.4. Questions?