Machine learning – an introduction
Josefin Rosén, Senior Analytical Expert, SAS Institute
Twitter: @rosenjosefin #SASFORUMSE
Copyright © 2015, SAS Institute Inc. All rights reserved.
Machine learning – an introduction
Agenda
What is machine learning?
When, where, and how is machine learning used?
Example – deep learning
Machine learning in SAS
Machine learning – what is it?
Wikipedia: Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data.
SAS: Machine learning is a branch of artificial intelligence that automates the building of systems that learn from data, identify patterns, and make decisions – with minimal human intervention.
What is what, really?
[Figure: overlapping fields: Statistics, Pattern Recognition, Computational Neuroscience, Data Science, Data Mining, Machine Learning, AI, Databases, Information Retrieval]
Machine learning – what is it?
“Complicated methods, but useful results”
When is machine learning used?
When the model's predictive accuracy matters more than its interpretability
When traditional approaches are not a good fit, for example when you have:
more variables than observations
many correlated variables
unstructured data
fundamentally nonlinear or unusual phenomena
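Why "more variables than observations" breaks a traditional method can be shown in a few lines. The following is a minimal pure-Python sketch; the toy data and the function names `solve` and `least_squares` are hypothetical, chosen only for illustration. With n = 2 observations and p = 3 variables, the ordinary least-squares normal equations X^T X β = X^T y have a singular matrix, because X^T X has rank at most n. Adding a ridge penalty λI (ridge regression) makes the system uniquely solvable.

```python
# Minimal sketch (pure Python, hypothetical toy data): OLS vs. ridge when p > n.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    # Matrix product of two lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, b, tol=1e-12):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Raises ValueError if a is (numerically) singular."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def least_squares(X, y, lam=0.0):
    """Solve (X^T X + lam*I) beta = X^T y.
    lam = 0 gives ordinary least squares, lam > 0 gives ridge regression."""
    Xt = transpose(X)
    A = matmul(Xt, X)
    for i in range(len(A)):
        A[i][i] += lam
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(A, Xty)

# n = 2 observations, p = 3 variables (hypothetical numbers).
X = [[1.0, 2.0, 3.0],
     [2.0, 0.0, 1.0]]
y = [1.0, 2.0]

try:
    least_squares(X, y)  # X^T X is 3x3 but has rank <= 2, so it is singular
except ValueError as err:
    print("OLS:", err)

beta = least_squares(X, y, lam=0.1)  # X^T X + 0.1*I is positive definite
print("ridge coefficients:", beta)
```

The same penalty also helps with the second case, many correlated variables, since near-collinear columns make X^T X nearly singular and the unpenalized estimates unstable.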
[Figure: panels labeled Training data, Decision tree, Regression, and Neural network]
Where is machine learning used?
Some examples:
Recommendation applications
Fraud detection
Predictive maintenance
Text analytics
Pattern and image recognition
Google's self-driving car
Machine Learning
Supervised learning (know y): Regression, LASSO regression, Logistic regression, Ridge regression, Decision tree, Gradient boosting, Random forests, Neural networks, SVM, Naïve Bayes, Neighbors, Gaussian processes
Unsupervised learning (don't know y): A priori rules, Clustering, k-means clustering, Mean shift clustering, Spectral clustering, Kernel density estimation, Nonnegative matrix factorization, PCA, Kernel PCA, Sparse PCA, Singular value decomposition, SOM
Semi-supervised learning (sometimes know y): Prediction and classification*, Clustering*, EM, TSVM, Manifold regularization, Autoencoders, Multilayer perceptron, Restricted Boltzmann machines
[Figure label: Data Mining]
*In semi-supervised learning, supervised prediction and classification algorithms are often combined with clustering.
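What separates the columns is whether the target y is available. As a minimal illustration of the unsupervised column, here is k-means clustering (Lloyd's algorithm) in pure Python. It uses only the inputs and never a target. The toy points and the function name `kmeans` are hypothetical, and real implementations usually add smarter initialization (for example k-means++) and multiple restarts.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm on a list of 2-D points; returns (centers, clusters)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                                            + (p[1] - centers[c][1]) ** 2)
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for j, cl in enumerate(clusters):
            if cl:
                new_centers.append((sum(p[0] for p in cl) / len(cl),
                                    sum(p[1] for p in cl) / len(cl)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:  # assignments no longer change
            break
        centers = new_centers
    return centers, clusters

# Hypothetical unlabeled data: two well-separated groups, no y anywhere.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, clusters = kmeans(pts, 2)
print(sorted(centers))
```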
Deep learning
Deep learning – using neural networks with more than two hidden layers
Used successfully in areas such as pattern recognition
Good at extracting features from a dataset
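Following the definition above (more than two hidden layers), a deep network is a stack of layers applied one after another. The sketch below is a minimal pure-Python forward pass through a network with three hidden layers. The layer sizes, the ReLU activation, and the random, untrained weights are assumptions for illustration, and training by backpropagation is omitted.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def dense(x, weights, biases):
    # Fully connected layer: out_j = sum_i W[j][i] * x_i + b_j
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def init_layers(sizes, seed=0):
    # Random Gaussian weights scaled by 1/sqrt(fan-in), zero biases (untrained).
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = [[rng.gauss(0.0, 1.0 / n_in ** 0.5) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((W, [0.0] * n_out))
    return layers

def forward(x, layers):
    # Hidden layers use ReLU; the output layer is left linear.
    for W, b in layers[:-1]:
        x = relu(dense(x, W, b))
    W, b = layers[-1]
    return dense(x, W, b)

sizes = [4, 8, 8, 8, 2]  # hypothetical: input, three hidden layers, output
layers = init_layers(sizes)
n_hidden = len(sizes) - 2
print("hidden layers:", n_hidden, "deep:", n_hidden > 2)
print(forward([0.1, 0.5, -0.3, 0.9], layers))
```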
MNIST training data
784 variables form a 28x28 digital grid
784-dimensional input vector X = (x1, …, x784)
Grayscale values ranging from 0 to 255
60,000 training images with labels
10,000 test images without labels
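A minimal sketch of how one 28x28 image becomes the 784-dimensional input vector. Two assumptions are not stated on the slide: pixels are read row by row (so grid[r][c] becomes x[28*r + c]), and grayscale values are rescaled from the 0 to 255 range to [0, 1], a common preprocessing step for neural networks. The function name `flatten_image` and the example image are hypothetical.

```python
def flatten_image(grid):
    """Turn a 28x28 grid of grayscale values (0 to 255) into a 784-element
    input vector, row by row, scaled to [0, 1]."""
    if len(grid) != 28 or any(len(row) != 28 for row in grid):
        raise ValueError("expected a 28x28 grid")
    return [pixel / 255.0 for row in grid for pixel in row]

# Hypothetical image: black except two pixels.
grid = [[0] * 28 for _ in range(28)]
grid[0][5] = 255   # white pixel in row 0, column 5 -> x[5]
grid[1][0] = 51    # gray pixel in row 1, column 0 -> x[28]
x = flatten_image(grid)
print(len(x), x[5], x[28])
```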
MNIST example
Train a stacked denoising autoencoder
Extract representative features from the MNIST data
Compare with PCA using two principal components (PCs)
Stacked denoising autoencoder
[Figure: network diagram, from bottom to top:
Input layer: partially corrupted input features
Hidden layers: h1, h2, h3, h4, h5 (hidden neurons; h3 holds the extractable features)
Target layer: uncorrupted output features]
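The core idea in the figure can be sketched in pure Python: corrupt the input, encode it into hidden units, decode, and penalize the difference from the uncorrupted input. This is a minimal single-layer sketch under stated assumptions (sigmoid units, masking noise that zeroes about 20% of inputs, squared-error loss, plain stochastic gradient descent, hypothetical 6-pixel toy data); it is not a description of how SAS implements it. One common way to build a stacked version like the one in the figure is greedy layer-wise training: train one such autoencoder, then train the next one on its hidden codes.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def corrupt(x, rate, rng):
    # Masking noise: each input is set to 0 with probability `rate`.
    return [0.0 if rng.random() < rate else xi for xi in x]

class DenoisingAutoencoder:
    """One encoder/decoder pair with sigmoid units (a single layer of a stack)."""

    def __init__(self, n_in, n_hidden, seed=0):
        self.rng = random.Random(seed)
        u = self.rng.uniform
        self.W = [[u(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]  # encoder
        self.b = [0.0] * n_hidden
        self.V = [[u(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]  # decoder
        self.c = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def decode(self, h):
        return [sigmoid(sum(v * hj for v, hj in zip(row, h)) + ck)
                for row, ck in zip(self.V, self.c)]

    def loss(self, data):
        # Average per-record sum of squared reconstruction errors on clean inputs.
        total = 0.0
        for x in data:
            y = self.decode(self.encode(x))
            total += sum((yk - xk) ** 2 for yk, xk in zip(y, x))
        return total / len(data)

    def train_step(self, x, rate=0.2, lr=0.5):
        x_tilde = corrupt(x, rate, self.rng)  # partially corrupted input
        h = self.encode(x_tilde)
        y = self.decode(h)
        # Gradient of 0.5*||y - x||^2 at the output pre-activations; target is the clean x.
        d_out = [(yk - xk) * yk * (1 - yk) for yk, xk in zip(y, x)]
        # Backpropagate to the hidden pre-activations (before any weight is changed).
        d_hid = [sum(d_out[k] * self.V[k][j] for k in range(len(d_out))) * h[j] * (1 - h[j])
                 for j in range(len(h))]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.V[k][j] -= lr * dk * hj
            self.c[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x_tilde):
                self.W[j][i] -= lr * dj * xi
            self.b[j] -= lr * dj

# Hypothetical binary "images" with 6 pixels each.
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1], [1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1]]
dae = DenoisingAutoencoder(n_in=6, n_hidden=3, seed=1)
before = dae.loss(data)
for epoch in range(2000):
    for x in data:
        dae.train_step(x)
print("loss before:", before, "after:", dae.loss(data))
print("hidden features of record 1:", dae.encode(data[0]))
```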
Extracted features (h3):
Record ID  Hidden Unit 1  Hidden Unit 2
1          0.98754        0.32453
2          0.76854        0.87345
3          0.87435        0.05464
⋮          ⋮              ⋮
[Figure: input layer (partially corrupted input features) → h1 → h2 → h3 (extractable features)]
Input pixel values:
Record ID  Pixel 1  Pixel 2  Pixel 3  Pixel 4  Pixel 5  Pixel 6  Pixel 7  Pixel 8  Pixel 9  Pixel 10  …
1          0        0        0        0        0        5        8        11       6        3         …
2          0        0        0        0        10       20       45       46       36       24        …
3          0        25       37       32       40       64       107      200      67       46        …
⋮          ⋮        ⋮        ⋮        ⋮        ⋮        ⋮        ⋮        ⋮        ⋮        ⋮         ⋱
Feature extraction – denoising autoencoder
Feature extraction – PCA
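For comparison, the PCA side of the MNIST example reduces each record to scores on two principal components. Below is a minimal pure-Python sketch that finds them by power iteration with deflation on the sample covariance matrix. This technique, the function name `pca_two_components`, and the toy data are assumptions for illustration; in practice a library eigensolver would normally be used.

```python
def pca_two_components(data, iters=500):
    """Return the first two principal components of `data` (rows = records)
    and each record's scores on them. Assumes the centered data has rank >= 2."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    X = [[row[j] - means[j] for j in range(d)] for row in data]  # center columns
    C = [[sum(r[i] * r[j] for r in X) / (n - 1) for j in range(d)] for i in range(d)]
    components = []
    for _ in range(2):
        v = [1.0] + [0.5] * (d - 1)  # arbitrary non-zero starting vector
        for _ in range(iters):  # power iteration converges to the top eigenvector
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    scores = [[sum(r[j] * pc[j] for j in range(d)) for pc in components] for r in X]
    return components, scores

points = [[2.0, 2.0], [-2.0, -2.0], [1.0, -1.0], [-1.0, 1.0]]  # hypothetical toy data
components, scores = pca_two_components(points)
print(components)
print(scores)
```

Here the points spread mostly along the diagonal, so, up to sign, the first component comes out close to (0.707, 0.707) and the second close to (0.707, -0.707).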
SAS machine learning algorithms
Neural networks
Decision trees
Random forests
Associations and sequence discovery
Gradient boosting and bagging
Support vector machines
Nearest-neighbor mapping
K-means clustering
Self-organizing maps
Local search optimization techniques such as genetic algorithms
Expectation maximization
Multivariate adaptive regression splines
Bayesian networks
Kernel density estimation
Principal components analysis
Singular value decomposition
Gaussian mixture models
Sequential covering rule building
DBSCAN
Model ensembles
Recommendations
SAS products that use machine learning
SAS Enterprise Miner
SAS Text Miner
SAS In-Memory Statistics for Hadoop
SAS Visual Statistics
SAS/STAT
SAS/OR
SAS Factory Miner
Supervised learning algorithms
Algorithm | SAS EM nodes | SAS procedures
Regression | High Performance Regression, LARS, Partial Least Squares, Regression | ADAPTIVEREG, GAM, GENMOD, GLMSELECT, HPGENSELECT, HPLOGISTIC, HPQUANTSELECT, HPREG, LOGISTIC, QUANTREG, QUANTSELECT, REG
Decision tree | Decision Tree, High Performance Tree | ARBORETUM, HPSPLIT
Random forest | High Performance Tree | HPFOREST
Gradient boosting | Gradient Boosting | ARBORETUM
Neural networks | AutoNeural, DMNeural, High Performance Neural, Neural Network | HPNEURAL, NEURAL
Support vector machine | High Performance Support Vector Machine | HPSVM
Naïve Bayes | | HPBNET*
Neighbors | Memory Based Reasoning | DISCRIM
*PROC HPBNET can learn different network structures (naïve, TAN, PC, and MB) and automatically select the best model
Unsupervised learning algorithms
Algorithm | SAS EM nodes | SAS procedures
A priori rules | Association, Link Analysis |
K-means clustering | Cluster, High Performance Cluster | FASTCLUS, HPCLUS
Spectral clustering | | Custom solution using Base SAS and the DISTANCE and PRINCOMP procedures
Kernel density estimation | | KDE
Kernel PCA | | Custom solution using Base SAS and the CORR, PRINCOMP, and SCORE procedures
Singular value decomposition | | HPTMINE, IML
Self-organizing maps | SOM/Kohonen |
Semi-supervised learning algorithms
Algorithm | SAS EM nodes | SAS procedures
Denoising autoencoders | | HPNEURAL, NEURAL
Why has machine learning attracted growing interest?
Big data
Computing resources
Powerful computers
Cheap data storage
“Space is big. You just won't believe how vastly, hugely, mind-bogglingly big it is.” Douglas Adams, The Hitchhiker's Guide to the Galaxy
Further reading
• White papers
http://www.sas.com/en_us/whitepapers/machine-learning-with-sas-enterprise-miner-107521.html
http://support.sas.com/resources/papers/proceedings14/SAS313-2014.pdf
• SAS links
http://www.sas.com/en_us/insights/analytics/machine-learning.html
http://www.sas.com/en_us/insights/articles/analytics/introduction-to-machine-learning-five-things-the-quants-wish-we-knew.html
• SAS Data Mining Community https://communities.sas.com/community/support-communities/sas_data_mining_and_text_mining/
• Big Data Matters Webinar Series: www.sas.com/bigdatamatters
Thank you!