Machine learning: an introduction

Josefin Rosén, Senior Analytical Expert, SAS Institute

Twitter: @rosenjosefin #SASFORUMSE

Copyright © 2015, SAS Institute Inc. All rights reserved.

Agenda

• What is machine learning?
• When, where, and how is machine learning used?
• Example
• Machine learning in SAS

Machine learning: what is it?

Wikipedia: Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data.

SAS: Machine learning is a branch of artificial intelligence that automates the building of systems that learn from data, identify patterns, and make decisions – with minimal human intervention.

What is what, really?

[Figure: diagram of overlapping fields: Statistics, Pattern Recognition, Computational Neuroscience, Data Science, Machine Learning, AI, Databases, Information Retrieval]

Machine learning: what is it?

“Complicated methods, but useful results”

When is machine learning used?

When the model's predictive accuracy matters more than its interpretability

When traditional approaches are a poor fit, for example when you have:
• more variables than observations
• many correlated variables
• unstructured data
• fundamentally nonlinear or unusual phenomena
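The first case in the list, more variables than observations, can be illustrated with a small sketch. With 2 observations and 3 variables, the matrix X'X that ordinary least squares needs to invert is singular. Adding a ridge penalty makes it invertible. Ridge regression is used here only as one possible remedy; the toy data, the helper names `gram` and `det3`, and the penalty value are all assumptions made for illustration.

```python
def gram(X):
    """Return X'X for a matrix X given as a list of rows."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)]
            for i in range(p)]


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


# Toy data (made up): 2 observations, 3 variables, so p > n.
X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

G = gram(X)
det_ols = det3(G)  # zero: X'X has rank at most 2, so OLS has no unique solution

lam = 0.1  # hypothetical ridge penalty
G_ridge = [[G[i][j] + (lam if i == j else 0.0) for j in range(3)]
           for i in range(3)]
det_ridge = det3(G_ridge)  # strictly positive: the penalized system is solvable

print(det_ols, det_ridge)
```

The point of the sketch is only that the unpenalized problem has no unique solution when p > n, which is one reason regularized or otherwise non-traditional methods are used in that setting.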

[Figure panels: Training data, Decision tree, Regression, Neural network]

Where is machine learning used?

Some examples:
• Recommendation applications
• Fraud detection
• Predictive maintenance
• Text analytics
• Pattern and image recognition
• The self-driving Google car

[Figure: diagram of overlapping fields: Statistics, Pattern Recognition, Computational Neuroscience, Data Science, Data Mining, Machine Learning, AI, Databases, Information Retrieval]

Machine learning

Supervised learning (know y): regression, LASSO regression, ridge regression, decision tree, random forests, neural networks, SVM, naïve Bayes, neighbors, Gaussian processes

Unsupervised learning (don't know y): a priori rules, clustering, k-means clustering, spectral clustering, kernel density estimation, nonnegative matrix factorization, PCA, kernel PCA, sparse PCA, singular value decomposition, SOM

Semi-supervised learning (sometimes know y): prediction and classification*, clustering*, EM, TSVM, manifold regularization, multilayer perceptron, restricted Boltzmann machines

*In semi-supervised learning, supervised prediction and classification algorithms are often combined with clustering.
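As a concrete instance of the unsupervised column, here is a minimal sketch of k-means clustering on one-dimensional data: no target y is used, only the points themselves. The function name `kmeans_1d`, the data, and the starting centers are made up for illustration. A production implementation would also handle initialization and convergence checks.

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain k-means on 1-D data: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for x in points:
            nearest = min(range(len(centers)), key=lambda k: abs(x - centers[k]))
            clusters[nearest].append(x)
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[k]
                   for k, c in enumerate(clusters)]
    return centers


# Made-up data with two obvious groups, around 1 and around 5.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
final_centers = kmeans_1d(points, centers=[0.0, 6.0])
print(final_centers)
```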

Deep learning

• Deep learning: using neural networks with more than two hidden layers
• Used successfully in, among other areas, pattern recognition
• Good at extracting features from a dataset
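To make "more than two hidden layers" concrete, the following sketch runs a forward pass through a small network with three hidden layers. It is an illustration only: the layer sizes, the sigmoid activation, the random weights, and the helper names `dense` and `random_layer` are assumptions, and no training is performed.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer with a sigmoid activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def random_layer(n_in, n_out, rng):
    """Hypothetical random initialization; a real network would be trained."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


rng = random.Random(0)
layer_sizes = [4, 5, 3, 5, 1]  # input, three hidden layers, output
layers = [random_layer(n_in, n_out, rng)
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

activations = [0.2, 0.7, 0.1, 0.9]  # made-up input vector
layer_outputs = []
for weights, biases in layers:
    activations = dense(activations, weights, biases)
    layer_outputs.append(activations)

output = activations
print(len(layers) - 1, "hidden layers, output:", output)
```

The values in `layer_outputs[:-1]` are the hidden-layer activations, which is where extracted features would be read from in a trained network.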

MNIST training data

• 784 variables form a 28x28 digital grid
• 784-dimensional input vector X = (x1,…,x784)
• Grayscale values ranging from 0 to 255
• 60,000 labeled training images
• 10,000 test images without labels
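The mapping from a 28x28 grid to a 784-dimensional input vector can be sketched as follows. The row-by-row ordering and the scaling to [0, 1] are assumptions made for this example, and the image is made up rather than taken from MNIST.

```python
ROWS, COLS = 28, 28


def flatten(image):
    """Turn a 28x28 grid (list of rows) into one 784-element vector, row by row."""
    assert len(image) == ROWS and all(len(row) == COLS for row in image)
    return [pixel for row in image for pixel in row]


def scale(vector):
    """Map gray levels 0..255 to the interval [0, 1]."""
    return [value / 255.0 for value in vector]


# Made-up image: black background with one white pixel at row 3, column 5.
image = [[0] * COLS for _ in range(ROWS)]
image[3][5] = 255

x = scale(flatten(image))
print(len(x), x[3 * COLS + 5])
```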

MNIST example

• Train a stacked denoising autoencoder
• Extract representative features from the MNIST data
• Compare with PCA using two principal components
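For the PCA side of the comparison, the sketch below reduces each record to two principal component scores. It uses power iteration with deflation on a tiny three-variable toy dataset, which is an assumption made to keep the example dependency-free and is not how the slides computed PCA. All function names are hypothetical.

```python
import math


def covariance(rows):
    """Sample covariance matrix of the rows, plus the mean-centered rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    return cov, centered


def power_iteration(M, steps=200):
    """Approximate the dominant eigenvector and eigenvalue of a symmetric matrix."""
    size = len(M)
    v = [1.0] * size
    for _ in range(steps):
        w = [sum(M[i][j] * v[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Mv = [sum(M[i][j] * v[j] for j in range(size)) for i in range(size)]
    eigenvalue = sum(a * b for a, b in zip(v, Mv))
    return v, eigenvalue


def two_principal_components(rows):
    cov, centered = covariance(rows)
    pc1, lam1 = power_iteration(cov)
    # Deflation: subtract the first component, then find the next dominant direction.
    size = len(cov)
    deflated = [[cov[i][j] - lam1 * pc1[i] * pc1[j] for j in range(size)]
                for i in range(size)]
    pc2, _ = power_iteration(deflated)
    # Two scores per record: projections onto the first and second component.
    scores = [[sum(c[j] * pc[j] for j in range(size)) for pc in (pc1, pc2)]
              for c in centered]
    return pc1, pc2, scores


# Made-up data: most variance along the first axis, then the second, then the third.
rows = [[3.0, 0.0, 0.0], [-3.0, 0.0, 0.0],
        [0.0, 2.0, 0.0], [0.0, -2.0, 0.0],
        [0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
pc1, pc2, scores = two_principal_components(rows)
print(pc1, pc2)
```

On MNIST the same idea would map each 784-dimensional record to two scores, which can then be plotted and compared with two features taken from the autoencoder.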

Stacked denoising autoencoder

[Figure: network diagram, from top to bottom: target layer (uncorrupted output features); hidden layers h5, h4, h3, h2, h1 (hidden neurons), with h3 marked as the extractable features; input layer (partially corrupted input features)]
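The pairing of a partially corrupted input with an uncorrupted target can be sketched as below. Masking noise (randomly setting features to zero) is one common corruption scheme and is an assumption here, since the slides do not say which noise was used. The function name `corrupt`, the corruption rate, and the data are made up.

```python
import random


def corrupt(features, corruption_rate, rng):
    """Masking noise: set each feature to 0 with probability corruption_rate."""
    return [0.0 if rng.random() < corruption_rate else value
            for value in features]


rng = random.Random(42)
clean = [0.0, 0.1, 0.8, 1.0, 0.6, 0.3, 0.9, 0.2]  # made-up scaled pixel values

# One training pair for a denoising autoencoder:
# the network sees the corrupted vector and is asked to reproduce the clean one.
network_input = corrupt(clean, corruption_rate=0.3, rng=rng)
target = clean

print(network_input)
print(target)
```

Because the network cannot simply copy its input, it has to learn structure in the data, and the middle hidden layer's activations can then be used as extracted features.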

Record ID  Hidden Unit 1  Hidden Unit 2
1          0.98754        0.32453
2          0.76854        0.87345
3          0.87435        0.05464
⋮          ⋮              ⋮

[Figure: lower half of the network, from top to bottom: h3 hidden neurons (extractable features); h2 hidden neurons; h1 hidden neurons; input layer (partially corrupted input features)]

Record ID  Pixel 1  Pixel 2  Pixel 3  Pixel 4  Pixel 5  Pixel 6  Pixel 7  Pixel 8  Pixel 9  Pixel 10  …
1          0        0        0        0        0        5        8        11       6        3         …
2          0        0        0        0        10       20       45       46       36       24        …
3          0        25       37       32       40       64       107      200      67       46        …
⋮          ⋮        ⋮        ⋮        ⋮        ⋮        ⋮        ⋮        ⋮        ⋮        ⋮         ⋱

Feature extraction: denoising autoencoder

Feature extraction: PCA

SAS machine learning algorithms

• Neural networks
• Decision trees
• Random forests
• Associations and sequence discovery
• Gradient boosting and bagging
• Support vector machines
• Nearest-neighbor mapping
• K-means clustering
• DBSCAN
• Self-organizing maps
• Local search optimization techniques such as genetic algorithms
• Expectation maximization
• Multivariate adaptive regression splines
• Bayesian networks
• Kernel density estimation
• Principal components analysis
• Singular value decomposition
• Gaussian mixture models
• Sequential covering rule building
• Model ensembles
• Recommendations

SAS products that use machine learning

• SAS Enterprise Miner
• SAS Text Miner
• SAS In-Memory Statistics for Hadoop
• SAS Visual Statistics
• SAS/STAT
• SAS/OR
• SAS Factory Miner

Supervised learning algorithms

Each algorithm is listed with its SAS EM nodes and SAS procedures.

Regression
  SAS EM nodes: High Performance Regression, LARS, Partial Least Squares, Regression
  SAS procedures: ADAPTIVEREG, GAM, GENMOD, GLMSELECT, HPGENSELECT, HPLOGISTIC, HPQUANTSELECT, HPREG, LOGISTIC, QUANTREG, QUANTSELECT, REG

Decision tree
  SAS EM nodes: Decision Tree, High Performance Tree
  SAS procedures: ARBORETUM, HPSPLIT

Random forest
  SAS EM nodes: High Performance Tree
  SAS procedures: HPFOREST

Gradient boosting
  SAS EM nodes: Gradient Boosting
  SAS procedures: ARBORETUM

Neural networks
  SAS EM nodes: AutoNeural, DMNeural, High Performance Neural, Neural Network
  SAS procedures: HPNEURAL, NEURAL

Support vector machine
  SAS EM nodes: High Performance Support Vector Machine
  SAS procedures: HPSVM

Naïve Bayes
  SAS procedures: HPBNET*

Neighbors
  SAS EM nodes: Memory Based Reasoning
  SAS procedures: DISCRIM

*PROC HPBNET can learn different network structures (naïve, TAN, PC, and MB) and automatically select the best model.

Unsupervised learning algorithms

Each algorithm is listed with its SAS EM nodes and SAS procedures.

A priori rules
  SAS EM nodes: Association, Link Analysis

K-means clustering
  SAS EM nodes: Cluster, High Performance Cluster
  SAS procedures: FASTCLUS, HPCLUS

Spectral clustering
  Custom solution through Base SAS and the DISTANCE and PRINCOMP procedures

Kernel density estimation
  SAS procedures: KDE

Kernel PCA
  Custom solution through Base SAS and the CORR, PRINCOMP, and SCORE procedures

Singular value decomposition
  SAS procedures: HPTMINE, IML

Self-organizing maps
  SAS EM nodes: SOM/Kohonen

Semi-supervised learning algorithms

Each algorithm is listed with its SAS EM nodes and SAS procedures.

Denoising autoencoders
  SAS procedures: HPNEURAL, NEURAL

Why has interest in machine learning increased?

• Big data
• Computing resources
• Powerful computers
• Cheap data storage

“Space is big. You just won't believe how vastly, hugely, mind-bogglingly big it is”
Douglas Adams, "The Hitchhiker's Guide to the Galaxy"

Further reading

• White papers
  http://www.sas.com/en_us/whitepapers/machine-learning-with-sas-enterprise-miner-107521.html
  http://support.sas.com/resources/papers/proceedings14/SAS313-2014.pdf

• SAS links
  http://www.sas.com/en_us/insights/analytics/machine-learning.html
  http://www.sas.com/en_us/insights/articles/analytics/introduction-to-machine-learning-five-things-the-quants-wish-we-knew.html

• SAS Data Mining Community
  https://communities.sas.com/community/support-communities/sas_data_mining_and_text_mining/

• Big Data Matters Webinar Series
  www.sas.com/bigdatamatters

Thank you!