

Machine Learning Tutorial

Danny Dunlavy, 01461

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. SAND2018-7925 TR

Goals for this Tutorial

§ Introduction to main concepts in machine learning § Preparation for participation in the MLDL Workshop

Caveats § Awareness stressed over education § Neural Networks/Deep Learning mostly avoided § Deep Learning Tutorial: Thursday, July 19, 2018

Machine Learning

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

--Tom Mitchell, Machine Learning, 1997

Example: Handwriting Recognition

§ Task (T): § recognizing and classifying handwritten numbers within images § Performance measure (P): § percent of numbers correctly classified § Experience (E): § a database of handwritten numbers with given classifications

Example adapted from Tom Mitchell, Machine Learning, 1997. Data from MNIST database, http://yann.lecun.com/exdb/mnist/

Example ML Workflow

Data -> Features -> Model -> Solution -> Evaluation

Figure: workflow illustrated on MNIST digit instances with predicted labels and evaluation as the percent of each label correctly classified (e.g., 87%, 96%, 84%, 82%, ...).

Feature Engineering

§ Feature engineering is the process of using domain knowledge to create features § Often a manual, time-consuming process § Many machine learning algorithms take vectors as inputs § Raw data often is not in vector format § For many data types, there are existing conventions for creating feature vectors

Pedro Domingos. 2012. A few useful things to know about machine learning. Communications of the ACM, 55(10), 78-87.

Feature Vectors: Images

Pixel Values (vectorized)

Image Processing Features (Feature Detectors) § Edge, corner, blob, ridge detection § Histogram of Oriented Gradients (HoG) § Hu’s Invariant Moments § Local binary patterns (LBP) § Hough transform

Example Software: Python: scikit-image; Matlab: Image Processing Toolbox; Julia: JuliaImages (ImageFeatures); R: https://github.com/bnosac/image

Reference: Image Feature Detectors and Descriptors. Eds. Awad and Hassaballah, Springer, 2016. Image from Matlab 2018a demo: street1.jpg
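A minimal sketch of computing some of these image feature vectors with scikit-image (one of the packages listed above); the sample image and parameter settings are illustrative choices, not values from the slide:

# Minimal sketch: turning an image into feature vectors with scikit-image.
# Assumptions: scikit-image is installed; the sample image and the HoG/LBP
# parameters below are illustrative choices.
import numpy as np
from skimage import data, color
from skimage.feature import hog, local_binary_pattern

image = color.rgb2gray(data.astronaut())          # example grayscale image

# Raw pixel values, vectorized
pixel_features = image.ravel()

# Histogram of Oriented Gradients (HoG) feature vector
hog_features = hog(image, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2))

# Local binary patterns (LBP), histogrammed into a fixed-length vector
lbp = local_binary_pattern(image, P=8, R=1.0, method="uniform")
lbp_features, _ = np.histogram(lbp, bins=np.arange(0, 11), density=True)

print(pixel_features.shape, hog_features.shape, lbp_features.shape)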

Feature Vectors: Text
(Works generally with counts of observations on discrete domains)

§ Vector Space Model (Bag of Words Model) § Variations § Stop words (high frequency): the, a, and § Stemming: jumps, jumped -> jump § N-grams: quick brown, brown fox, fox jump § Weighting: TF-IDF (term frequency-inverse doc frequency)

§ Document 1: The quick brown fox jumped over the lazy dog. § Document 2: The brown dog jumped over the dog fence.

Term-Document Matrix (Term, Doc 1, Doc 2):
quick  1 0
brown  1 1
fox    1 0
jump   1 1
over   1 1
dog    1 2
fence  0 1

Salton, et al., 1975. A vector space model for automatic indexing. Communications of the ACM, 18(11), 613-620. Manning and Schutze, Foundations of Statistical Natural Language Processing. MIT Press, 1999.
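A minimal sketch of building term-document counts and TF-IDF weights with scikit-learn for the two example documents above (the stop-word and n-gram settings are illustrative and do not exactly reproduce the matrix shown):

# Minimal sketch: bag-of-words counts and TF-IDF weighting with scikit-learn.
# Assumptions: scikit-learn >= 1.0 (for get_feature_names_out); the stop-word
# list and n-gram range are illustrative choices (no stemming is applied here).
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["The quick brown fox jumped over the lazy dog.",
        "The brown dog jumped over the dog fence."]

# Term-document counts (unigrams and bigrams, English stop words removed)
count_vec = CountVectorizer(stop_words="english", ngram_range=(1, 2))
counts = count_vec.fit_transform(docs)            # sparse matrix: documents x terms
print(count_vec.get_feature_names_out())
print(counts.toarray())

# TF-IDF (term frequency - inverse document frequency) weighting
tfidf_vec = TfidfVectorizer(stop_words="english")
weights = tfidf_vec.fit_transform(docs)
print(weights.toarray().round(3))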

Feature Vectors: Sequential Data

§ Natural Language Processing (NLP) § Part-of-speech tagging example (Brown Corpus): The/at Fulton/np-tl County/nn-tl Grand/jj-tl Jury/nn-tl said/vbd Friday/nr an/at § Computer Network Traffic Analysis § Video Analysis (recognition of human actions from video sequences)

W. Nelson Francis and Henry Kucera, 1979. The Brown Corpus: A Standard Corpus of Present-Day Edited American English. Wylie, et al., Using NoSQL Databases for Streaming Network Analysis, LDAV, 2012. Takayuki Hori, Jun Ohya and Jun Kurumisawa, Computational Imaging, 2010.

Major Types of Machine Learning

§ Unsupervised Learning § Supervised Learning § Semi-supervised Learning § Reinforcement Learning

Unsupervised Learning

§ Tasks § Clustering (grouping) § Dimensionality reduction § Anomaly detection § Association § Generative modeling § Experience (data) § Instances are unlabeled § Performance measures § Challenging due to lack of labels/known solutions § Validation often leverages labeled data sets (labels only used in testing)

Fisher, 1936. The use of multiple measurements in taxonomic problems. Annals of Eugenics. 7 (2): 179–188. Anderson, 1936. The species problem in Iris. Annals of the Missouri Botanical Garden. 23 (3): 457–509.

K-means Clustering § Task § Group data instances by distance into K groups § Data instances are points in a multidimensional feature vector space

§ Standard Algorithm (cluster centroid = arithmetic mean of the points in the cluster) 1. Initialize cluster centroids randomly 2. Iterate until convergence a) Assign each instance to the cluster whose centroid is “closest” b) Update the centroids given the current cluster assignments

Figure: three panels showing one k-means iteration. (1) Centroids (x) and cluster assignments (color) at start of iteration. (2) Assignment of instances to the cluster with the closest centroid. (3) Update of centroids based on the new cluster assignments.
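A minimal NumPy sketch of the standard algorithm above (random 2-D data and K=3 are illustrative choices; a library implementation such as sklearn.cluster.KMeans would typically be used in practice):

# Minimal sketch of the standard (Lloyd's) k-means algorithm in NumPy.
# Assumptions: random 2-D data and K=3 are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))                     # data instances (rows) in feature space
K = 3

# 1. Initialize centroids randomly (here: K random instances)
centroids = X[rng.choice(len(X), size=K, replace=False)]

# 2. Iterate until convergence (assignment stagnation)
assignments = np.full(len(X), -1)
while True:
    # a) Assign each instance to the cluster whose centroid is closest (Euclidean)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    new_assignments = dists.argmin(axis=1)
    if np.array_equal(new_assignments, assignments):
        break
    assignments = new_assignments
    # b) Update each centroid to the arithmetic mean of its assigned points
    for k in range(K):
        if np.any(assignments == k):
            centroids[k] = X[assignments == k].mean(axis=0)

print(centroids)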

K-means Clustering § Task § Group data instances by distance into K groups § Data instances are points in a multidimensional feature vector space § Challenges § What value to use for K? § Most often chosen by the user/analyst/subject matter expert § How to initialize the centroids? § Random instances as centroids vs. random cluster assignments § How to compute distances? § Euclidean distance often used § Often data- and problem-dependent § When to stop iterating? § Assignment stagnation often used § K-means only converges to a local minimum of the within-cluster sum of squared distances

Lloyd, 1982. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28 (2): 129–137.

Hierarchical Clustering § Clustering Approaches § Agglomerative: merging from bottom to top § Divisive: splitting from top to bottom § Metric § Distance between data points § Linkage Criteria § Distance between sets § Single: minimum § Complete: maximum § Average § Number of clusters § Choose a level to cut the dendrogram

Figure: six points (1-6) in a 2D feature space, the corresponding dendrogram (merging {2,3}, {1,2,3}, {4,5}, {4,5,6}, then all six points), and the resulting clusters.

Gan, et al., Data Clustering: Theory, Algorithms, and Applications. SIAM, 2007.
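A minimal sketch of agglomerative clustering with SciPy (the six 2-D points and the single-linkage choice are illustrative, not the exact example on the slide):

# Minimal sketch: agglomerative hierarchical clustering with SciPy.
# Assumptions: the 2-D points and single linkage are illustrative choices.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

X = np.array([[0.0, 0.0], [0.2, 0.1], [0.3, 0.0],     # points 1, 2, 3
              [2.0, 2.0], [2.1, 2.2], [2.5, 1.9]])    # points 4, 5, 6

Z = linkage(X, method="single", metric="euclidean")   # bottom-up merge history

# Cut the dendrogram to obtain a chosen number of clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)                                          # e.g., [1 1 1 2 2 2]

# dendrogram(Z) would plot the merge tree (requires matplotlib)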

Other Clustering Methods

§ K-medoids § K-means-like algorithm using medoids (actual data instances that are most central in each cluster) instead of means § Mean-Shift § sliding-window search for dense areas § DBSCAN § density-based clustering with outlier detection and no predetermined number of clusters § Gaussian Mixture Models § K-means-like algorithm with Gaussian distribution assumptions & probabilistic assignment

Gan, et al., Data Clustering: Theory, Algorithms, and Applications. SIAM, 2007.

Clustering Performance Measures

§ Rand Index (RI): probability that two label sets will agree on a randomly chosen pair of labels § Jaccard Similarity (JS): size of the intersection divided by the size of the union of two label sets § Normalized Mutual Information (NMI): normalization to [0,1] of the Mutual Information (MI) scores between label sets

Example (true vs. predicted cluster labels for 13 instances):
True Cluster:      1 1 1 2 2 2 3 3 3 4 4 4 4
Predicted Cluster: 2 2 2 2 3 3 1 1 1 1 0 0 0

RI [0,1]: 0.8590   JS [0,1]: 0.4762   NMI [0,1]: 0.7560

Gan, et al., Data Clustering: Theory, Algorithms, and Applications. SIAM, 2007.
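A minimal sketch computing these measures for the label sets above with scikit-learn (assumes scikit-learn >= 0.24, which provides rand_score and pair_confusion_matrix):

# Minimal sketch: clustering performance measures with scikit-learn.
# Assumptions: scikit-learn >= 0.24.
from sklearn.metrics import rand_score, normalized_mutual_info_score
from sklearn.metrics.cluster import pair_confusion_matrix

true_labels = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4]
pred_labels = [2, 2, 2, 2, 3, 3, 1, 1, 1, 1, 0, 0, 0]

ri  = rand_score(true_labels, pred_labels)                    # ~0.859
nmi = normalized_mutual_info_score(true_labels, pred_labels)  # value depends on normalization choice

# Jaccard similarity over pairs of instances, from the pair confusion matrix
C = pair_confusion_matrix(true_labels, pred_labels)
js = C[1, 1] / (C[1, 1] + C[1, 0] + C[0, 1])                  # ~0.476

print(ri, js, nmi)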

Representation Learning

§ Definition: Learning representations of the data (i.e., features) that make it easier to extract useful information when building machine learning models § Goals: § Dimensionality reduction § Noise reduction § More interpretable feature space § Sparse feature vectors § …

Bengio, et al., 2013. Representation Learning: A Review and New Perspectives, IEEE PAMI, 35 (8), 1798-1828.

Principal Component Analysis (PCA)

Statistical procedure that converts data with correlated features to data with linearly uncorrelated features (principal components [PC])

§ PCA = eigenvalue decomposition of data covariance matrix § PCA captures data variance § First PC accounts for as much variability in data as possible § Subsequent PCs account for highest variance constrained to be orthogonal to preceding PCs § Reduced dimension data model uses first K principal components

Figure: 2D scatter plot of data with the First PC and Second PC directions overlaid.

Pearson, 1901. On Lines and Planes of Closest Fit to Systems of Points in Space, Philosophical Magazine, 2 (11): 559–572.
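A minimal sketch of PCA-based dimensionality reduction with scikit-learn (the random correlated data and K=2 retained components are illustrative choices):

# Minimal sketch: PCA with scikit-learn.
# Assumptions: random correlated data and K=2 retained components are illustrative.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
X[:, 1] = 0.8 * X[:, 0] + 0.2 * X[:, 1]           # introduce correlated features

pca = PCA(n_components=2)                          # keep first K=2 principal components
X_reduced = pca.fit_transform(X)                   # project data onto the PCs

print(pca.components_)                             # principal component directions
print(pca.explained_variance_ratio_)               # fraction of variance per PC
print(X_reduced.shape)                             # (500, 2)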

Principal Component Analysis (PCA)

§ PCA applicable to continuous (multivariate normal) data § Extensions to handle binary, count, and categorical data § Principal components are linear combinations of features § Extensions to nonlinear mappings

More Dimensionality Reduction

§ Independent Component Analysis (ICA) § Separates data into additive subcomponents that are maximally independent, works with non-Gaussian data § Non-negative Matrix Factorization (NMF) § Assumes data and components are non-negative, more interpretable § Multidimensional scaling (MDS) § Family of nonlinear methods, preserves between-instance distances in lower dimensional feature space § t-distributed stochastic neighbor embedding (t-SNE) § Nonlinear method, preserves both local and global structure in lower dimensional feature space

Tensor Factorizations

What if we want to apply PCA to multiway data (e.g., instances by features over time)?

Viewpoint 1: Sum of outer products, useful for interpretation § CP Model: sum of N-way outer products (CANDECOMP, PARAFAC, Canonical Polyadic, CP)

Viewpoint 2: High-variance subspaces, useful for compression § Tucker Model: project onto high-variance subspaces to reduce dimensionality (HO-SVD, best rank-(R1, R2, ..., RN) decomposition)

Kolda and Bader, Tensor Decompositions and Applications, SIAM Review, 51(3), 455–500, 2009.

Supervised Learning

§ Tasks § Regression (continuous response) § Classification (discrete response) § Binary (2 classes) § Multiclass (>2 classes) § Experience (data) § Regression: input-output pairs § Classification: feature-label pairs § Performance measures § Many different methods

Iris Data (subset) -- features: sepal length, sepal width, petal length, petal width; label: species
5.1 3.5 1.4 0.2 setosa
4.9 3 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5 3.6 1.4 0.2 setosa
7 3.2 4.7 1.4 versicolor
6.4 3.2 4.5 1.5 versicolor
6.9 3.1 4.9 1.5 versicolor
5.5 2.3 4 1.3 versicolor
6.5 2.8 4.6 1.5 versicolor
6.3 3.3 6 2.5 virginica
5.8 2.7 5.1 1.9 virginica
7.1 3 5.9 2.1 virginica
6.3 2.9 5.6 1.8 virginica
6.5 3 5.8 2.2 virginica

Fisher, 1936. The use of multiple measurements in taxonomic problems. Annals of Eugenics. 7 (2): 179–188. Anderson, 1936. The species problem in Iris. Annals of the Missouri Botanical Garden. 23 (3): 457–509.

Supervised Learning Workflow

Data -> Features -> Model -> Prediction -> Evaluation

§ Split data into three parts § Training: used to fit the models § Validation: used to estimate prediction error § Test: used for assessment of generalization error of the final model § Cross-validation § N-fold: partition training data into N equal-sized partitions, train N models using each partition as the validation data and the rest as training data, then average the prediction errors across the models § Nx2: do 2-fold cross-validation N times and average the prediction errors across the models
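A minimal sketch of a train/test split plus N-fold cross-validation with scikit-learn (Iris data, logistic regression, and N=5 folds are illustrative choices):

# Minimal sketch: train/validation/test splitting and N-fold cross-validation
# with scikit-learn. Assumptions: Iris data, logistic regression, and N=5 folds
# are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X, y = load_iris(return_X_y=True)

# Hold out a test set for the final assessment of generalization error
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# N-fold cross-validation on the training data to estimate prediction error
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X_train, y_train, cv=5)
print(scores.mean(), scores.std())

# Fit on all training data and assess once on the held-out test set
model.fit(X_train, y_train)
print(model.score(X_test, y_test))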

Linear Regression

Data (x, y):
0.1221 0.2418
0.0684 0.1023
0.1500 0.1083
0.0871 0.1448
0.0965 0.0714
0.1667 0.2076
0.1559 0.1937
0.2036 0.2996
0.1728 0.2202
0.2037 0.2368
0.2214 0.1963
0.1985 0.2304
0.2401 0.2034
0.2678 0.3496
0.2856 0.3446

Figure: Linear Regression Model -- scatter plot of the data with the fitted line (Linear Least Squares Estimation).
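A minimal NumPy sketch of fitting the linear least squares model to the (x, y) data above:

# Minimal sketch: linear least squares fit to the (x, y) data above with NumPy.
import numpy as np

x = np.array([0.1221, 0.0684, 0.1500, 0.0871, 0.0965, 0.1667, 0.1559, 0.2036,
              0.1728, 0.2037, 0.2214, 0.1985, 0.2401, 0.2678, 0.2856])
y = np.array([0.2418, 0.1023, 0.1083, 0.1448, 0.0714, 0.2076, 0.1937, 0.2996,
              0.2202, 0.2368, 0.1963, 0.2304, 0.2034, 0.3496, 0.3446])

slope, intercept = np.polyfit(x, y, deg=1)        # least squares line y = slope*x + intercept
print(slope, intercept)

y_hat = slope * x + intercept                     # model predictions
print(np.sum((y - y_hat) ** 2))                   # residual sum of squares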

Linear Regression

§ Linear regression useful when data is approximately linear

All four data sets have the same linear regression lines!

Anscombe, 1973. Graphs in Statistical Analysis, American Statistician, 27 (1): 17–21.

Nonparametric Regression

§ Polynomial Regression § Relationship between the independent variable x and the dependent variable y is modelled as an nth degree polynomial in x § Kernel Regression § Find nonlinear relationship between x and y using kernel weighting of dependent and/or independent variables § Gaussian Process Regression (Kriging) § Predict values of a function at a given point by computing a weighted average of the known values of the function in the neighborhood of the point

k-Nearest Neighbors (kNN)

§ Input: k closest instances (nearest neighbors) in feature space § Output § Regression: average values of k nearest neighbors § Classification: majority class of k nearest neighbors

kNN: example of instance-based learning § Function only approximated locally § Computation deferred until prediction

https://www.quora.com/How-is-the-k-nearest-neighbor-algorithm-different-from-k-means-clustering
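A minimal scikit-learn sketch of kNN classification (Iris data and k=5 are illustrative choices):

# Minimal sketch: k-nearest neighbors classification with scikit-learn.
# Assumptions: Iris data and k=5 are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)         # majority class of 5 nearest neighbors
knn.fit(X_train, y_train)                         # "training" just stores the instances
print(knn.score(X_test, y_test))                  # classification accuracy

# For regression, KNeighborsRegressor averages the values of the k neighbors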

Logistic Regression

§ Logistic model: log-odds of the probability of a binary outcome is a linear combination of the independent variables § Logistic regression: estimating parameters of a logistic model § Used for binary classification, with dependent variables labeled 0 or 1

http://scikit-learn.org/stable/auto_examples/linear_model/plot_logistic.html#sphx-glr-auto-examples-linear-model-plot-logistic-py
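A minimal scikit-learn sketch of binary logistic regression (the synthetic data are an illustrative stand-in for the example at the link above):

# Minimal sketch: binary logistic regression with scikit-learn.
# Assumptions: synthetic data from make_classification are illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)  # labels 0/1

clf = LogisticRegression()
clf.fit(X, y)

print(clf.coef_, clf.intercept_)       # parameters of the linear log-odds model
print(clf.predict_proba(X[:3]))        # class probabilities P(y=0|x), P(y=1|x)
print(clf.predict(X[:3]))              # predicted labels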

Logistic Regression

§ Binomial Logistic Regression § Dependent variable can have only two possible outcomes (labels) § See previous slide for example § Multinomial Logistic Regression § Dependent variable can have three or more possible outcome types § e.g., red vs. blue vs. green § Ordinal Logistic Regression § Dependent variables are ordered § e.g., ratings in the set {1, 2, 3, 4, 5}

Decision Trees

Iris Data (subset) -- features: sepal length, sepal width, petal length, petal width; label: species
5.1 3.5 1.4 0.2 setosa
4.9 3 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5 3.6 1.4 0.2 setosa
7 3.2 4.7 1.4 versicolor
6.4 3.2 4.5 1.5 versicolor
6.9 3.1 4.9 1.5 versicolor
5.5 2.3 4 1.3 versicolor
6.5 2.8 4.6 1.5 versicolor
6.3 3.3 6 2.5 virginica
5.8 2.7 5.1 1.9 virginica
7.1 3 5.9 2.1 virginica
6.3 2.9 5.6 1.8 virginica
6.5 3 5.8 2.2 virginica

Decision tree for the Iris data:
petal width < 1?
YES -> setosa
NO  -> petal length > 5?
       YES -> virginica
       NO  -> versicolor

Breiman, et al., 1984. Classification and Regression Trees. Wadsworth & Brooks/Cole.

Decision Trees

§ Advantages § Easy to interpret § Handles numerical and categorical data § Scales well to large data § Which feature to use for splitting? § Minimize Gini impurity: probability of misclassification in children § Maximize information gain: entropy of the parent minus the weighted sum of entropy of the children § How many levels should be in the tree? § As many as needed to get pure children nodes (all instances of same class) § Pruning increases model generalization (i.e., reduces overfitting)

Breiman, et al., 1984. Classification and Regression Trees. Wadsworth & Brooks/Cole.
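A minimal scikit-learn sketch of fitting and inspecting a shallow decision tree on the Iris data (assumes scikit-learn >= 0.21 for export_text; the depth limit is illustrative):

# Minimal sketch: decision tree classification with scikit-learn.
# Assumptions: scikit-learn >= 0.21 (for export_text); max_depth=2 is illustrative.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Print the learned splits (compare with the petal width / petal length tree above)
print(export_text(tree, feature_names=list(iris.feature_names)))
print(tree.feature_importances_)       # which features drive the splits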

Naïve Bayes

§ Probabilistic classifier based on Bayes’ Theorem with strong independence assumptions among features

§ Assuming independence of n features (x1,…,xn)

§ Use maximum a posteriori (MAP) rule to choose from K classes
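In standard notation, assuming the n features are conditionally independent given class C_k, the posterior factorizes and the MAP rule selects the most probable of the K classes:

P(C_k \mid x_1, \ldots, x_n) \;\propto\; P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k)

\hat{y} \;=\; \arg\max_{k \in \{1, \ldots, K\}} \; P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k)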

Hand and Yu, 2001. Idiot's Bayes — not so stupid after all? International Statistical Review, 69 (3): 385–399.

Support Vector Machines

Figure: linearly separable 2D data with the maximum-margin hyperplane; the maximum margin is determined by the support vectors.

§ Linear SVM Classifier: if training data is linearly separable by class label, solution is straightforward § Nonlinear SVM Classifier: if training data is not linearly separable, use a kernel to implicitly embed the data into a higher dimensional feature space where it is linearly separable; the resulting decision boundary is nonlinear in the original, lower dimensional feature space

Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, 1995.
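A minimal scikit-learn sketch of linear and kernel (nonlinear) SVM classifiers (the synthetic data and the RBF kernel are illustrative choices):

# Minimal sketch: linear and nonlinear (kernel) SVM classifiers with scikit-learn.
# Assumptions: synthetic data and the RBF kernel are illustrative choices.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)  # not linearly separable

linear_svm = SVC(kernel="linear").fit(X, y)      # maximum-margin hyperplane in input space
rbf_svm = SVC(kernel="rbf").fit(X, y)            # kernel trick: nonlinear boundary in input space

print(linear_svm.score(X, y), rbf_svm.score(X, y))
print(rbf_svm.support_vectors_.shape)            # the support vectors define the boundary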

Classification Evaluation Metrics

Scenario: binary classification of data with labels 0 and 1

§ Confusion Matrix § True Positives (TP): true label = 1, predicted label = 1 § True Negatives (TN): true label = 0, predicted label = 0 § False Positives (FP): true label = 0, predicted label = 1 § False Negatives (FN): true label = 1, predicted label = 0

                 Predicted Labels
                   0    1
True Labels   0   TN   FP
              1   FN   TP

Classification Evaluation Metrics

Scenario: binary classification of data with labels 0 and 1

§ Accuracy: (TP + TN) / (TP + TN + FP + FN) § Precision: TP / (TP + FP) § Recall (Sensitivity): TP / (TP + FN) § Specificity: TN / (TN + FP) § F1 Score: 2 * Recall * Precision / (Recall + Precision) § Class Averaged Accuracy: 0.5 * (Sensitivity + Specificity)
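A minimal scikit-learn sketch computing these metrics from true and predicted labels (the labels below are illustrative, not from the slide):

# Minimal sketch: binary classification evaluation metrics with scikit-learn.
# Assumptions: the true/predicted labels below are illustrative.
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 1, 0, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)

print(accuracy_score(y_true, y_pred))            # (TP + TN) / (TP + TN + FP + FN)
print(precision_score(y_true, y_pred))           # TP / (TP + FP)
print(recall_score(y_true, y_pred))              # TP / (TP + FN), sensitivity
print(f1_score(y_true, y_pred))                  # 2 * precision * recall / (precision + recall)

specificity = tn / (tn + fp)                     # TN / (TN + FP)
print(0.5 * (recall_score(y_true, y_pred) + specificity))   # class-averaged accuracy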

Classification Evaluation Metrics

§ Receiver Operating Characteristic (ROC) curve: plot of true positive rate (sensitivity) vs. false positive rate (1-specificity) § Area under the curve (AUC): probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one

https://docs.eyesopen.com/toolkits/cookbook/python/plotting/roc.html
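A minimal scikit-learn sketch of computing an ROC curve and AUC from predicted class-1 scores (the data and classifier are illustrative choices):

# Minimal sketch: ROC curve and AUC with scikit-learn.
# Assumptions: synthetic data and logistic regression are illustrative choices.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]          # predicted probability of class 1

fpr, tpr, thresholds = roc_curve(y_test, scores)  # false/true positive rates per threshold
print(roc_auc_score(y_test, scores))              # area under the ROC curve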

Ensemble Learning

§ Idea: use a collection of machine learning models to improve classification performance § Choices/Challenges: § Which model (weak learner) should be used in the collection? § How many models should be used? § How to promote diversity among the models? § How much should each member impact the ensemble prediction?

Ensemble Learning

§ Bootstrap aggregating (bagging) § Build each weak learner on a subset of data, using sampling with replacement § Boosting § Incrementally build ensemble by training each new model to emphasize the training instances that previous models misclassified § iVoting § Leverages importance sampling to determine training data for each new model in the ensemble § Heterogeneous ensembles § Use different types of weak learners in the ensemble (e.g., kNN, decision trees, SVMs, etc.)
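A minimal scikit-learn sketch of bagged and boosted decision-tree ensembles (the synthetic data and ensemble sizes are illustrative choices):

# Minimal sketch: bagging and boosting ensembles with scikit-learn.
# Assumptions: synthetic data and 50 weak learners are illustrative choices;
# both ensembles use decision trees as the default weak learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

bagging = BaggingClassifier(n_estimators=50, random_state=0)    # bootstrap samples of the data
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)  # reweights misclassified instances

print(cross_val_score(bagging, X, y, cv=5).mean())
print(cross_val_score(boosting, X, y, cv=5).mean())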

Random Forests

§ Breiman’s model (Random Forests™) § Bagging used for determining training data for weak learners § Weak learners are decision trees that use a random subset of features to determine each split (usually √m when there are m total features) § Model evaluation consists of majority vote of weak learner predictions § Advantages § Empirically outperforms many (non-ensemble) classification models on many tasks § Robust to overfitting § Quick to train § Easy to compute feature importance

Breiman, 2001. Random Forests, Machine Learning, 45 (1): 5–32.
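A minimal scikit-learn sketch of a Breiman-style random forest (Iris data and 100 trees are illustrative choices):

# Minimal sketch: random forest classification with scikit-learn.
# Assumptions: Iris data and 100 trees are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# max_features="sqrt": each split considers a random subset of sqrt(m) features
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)

print(cross_val_score(forest, X, y, cv=5).mean())   # majority vote of the trees

forest.fit(X, y)
print(forest.feature_importances_)                  # per-feature importance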

Semi-Supervised Learning

§ Tasks § Supervised Learning Tasks § Experience (data) § Small amount of labeled data § Mostly unlabeled data § Performance measures § Supervised Learning measures

Chapelle, et al. Semi-supervised learning. MIT Press, 2006.

Semi-Supervised Learning § Self-training § Train a model using labeled data § Use model to predict labels for unlabeled data § Add (some) unlabeled data and predicted labels to labeled data § Repeat § Co-training § Train two models on two partitions of labeled data § Use both models to predict labels for unlabeled data § Add (some) unlabeled data and predicted labels from model 1 to labeled data used to train model 2 § Add (some) unlabeled data and predicted labels from model 2 to labeled data used to train model 1 § Repeat § Generative Models
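A minimal sketch of the self-training loop above (Iris data, a logistic regression base model, and a 0.9 confidence threshold are illustrative choices):

# Minimal sketch of self-training. Assumptions: Iris data, a logistic regression
# base model, and a 0.9 confidence threshold are illustrative choices.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
labeled = rng.choice(len(X), size=15, replace=False)          # small labeled set
unlabeled = np.setdiff1d(np.arange(len(X)), labeled)          # mostly unlabeled data
X_lab, y_lab = X[labeled], y[labeled]
X_unlab = X[unlabeled]

for _ in range(5):                                            # repeat
    model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)   # train on labeled data
    if len(X_unlab) == 0:
        break
    proba = model.predict_proba(X_unlab)                      # predict labels for unlabeled data
    pseudo_labels = model.predict(X_unlab)
    confident = proba.max(axis=1) > 0.9                       # keep only confident predictions
    X_lab = np.vstack([X_lab, X_unlab[confident]])            # add them to the labeled data
    y_lab = np.concatenate([y_lab, pseudo_labels[confident]])
    X_unlab = X_unlab[~confident]

print(len(X_lab), "labeled instances after self-training")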

Reinforcement Learning

§ Tasks § Take the best action based on current state (i.e., information available) § Experience (data) § Interactions with the environment/system § State of environment/system § Performance measures § Maximize reward § Minimize risk

Reinforcement Learning: State-of-the-Art. Eds. Wiering and van Otterlo, Springer-Verlag, 2012.

How to Get Started with ML

§ Python § scikit-learn (sklearn): regression, classification, clustering, dimensionality reduction, model selection, preprocessing § Tutorial: https://www.datacamp.com/community/tutorials/machine-learning-python § Matlab § Statistics and Machine Learning Toolbox: regression, classification, clustering, dimensionality reduction, model selection, preprocessing, visualization, statistical methods, distributions § Mathworks: https://www.mathworks.com/machinelearning § Java § WEKA: data pre-processing, classification, regression, clustering, association rules, and visualization § FutureLearn: https://www.cs.waikato.ac.nz/ml/weka/courses.html

How to Get Started with ML (cont.)

§ R § e1071: latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes § caret: Classification And REgression Training, set of functions that attempt to streamline the process for creating predictive models § randomForest: random forests for classification and regression § kernlab: classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction § Tutorial: https://www.datacamp.com/community/tutorials/machine-learning-in-r § Julia § ScikitLearn.jl (wrapper interface to Python’s scikit-learn)

How to Get Started with ML (cont.)

§ Data § UCI Machine Learning Repository: § https://archive.ics.uci.edu/ml/index.php § Kaggle: § https://www.kaggle.com/datasets

MLDL Workshop

§ Full schedule available Friday, 7/20/18 § Classified Day, 7/30/18 § ML Day, 7/31/18 § DL Day, 8/1/18 § MLDL Challenge still open § develop an MLDL solution to a data science problem § Submissions due 7/24/18

https://mldl.sandia.gov
