An Intro to Machine Learning & Deep Learning
Meena Arunachalam, Ph.D., Principal Engineer, Intel Corp
AGENDA
• AI is transformative across all domains
• Machine Learning: usages, Spark MLlib, MKL
• Deep Learning: usages, topologies, convolution, MKL
• Software stacks, frameworks and segments
• TensorFlow and Neon: how to get started
History of Machine Learning (ML)
• <1950 (Statistical Methods): Bayes' Theorem; Markov chains
• 1950-1960 (Pioneering Machine Learning): simple machine learning algorithms suggested; Perceptron, nearest neighbor
• 1970 ("AI winter"): pessimism about ML; ML algorithms were not as effective as expected
• 1980 (Rediscovery of Machine Learning): David Rumelhart, Geoff Hinton and Ronald J. Williams discovered "backpropagation", causing a rethinking of ML
• 1990 (Data-Driven Approach): analyze large amounts of data and draw conclusions from the data; Support Vector Machines (SVM), Recurrent Neural Networks (RNN)
• 2000 (Machine Learning Becomes Feasible): inspired by deep learning and the help of powerful processors, ML becomes a popular method
• 2010 (Widespread): ML is widely used by Google, Apple, Amazon, …
AI is transformative
• Consumer: smart assistants, chatbots, search, personalization, augmented reality, robots
• Health: enhanced diagnostics, drug discovery, patient care, research, sensory aids
• Finance: algorithmic trading, fraud detection, research, personal finance, risk mitigation
• Retail: support, experience, marketing, merchandising, loyalty, supply chain, security
• Government: defense, data insights, safety & security, resident engagement, smarter cities
• Energy: oil & gas exploration, smart grid, operational improvement, conservation
• Transport: autonomous cars, automated trucking, aerospace, shipping, search & rescue
• Industrial: factory automation, predictive maintenance, precision agriculture, field automation
• Other: advertising, education, gaming, professional & IT services, telco/media, sports
Source: Intel forecast
The next big wave
Each wave of computing, from mainframes to standards-based servers to cloud computing to artificial intelligence, has been driven by a data deluge, a compute breakthrough, and an innovation surge.
AI compute cycles will grow 12X by 2020. Source: Intel forecast
Inside AI
• Experiences: Intel® Nervana™ Cloud & Appliance, Intel® Nervana™ DL Studio, Intel® Computer Vision SDK, Movidius platforms (VPU)
• Frameworks: MLlib, BigDL
• Libraries: Intel® Data Analytics Acceleration Library (DAAL), Intel® Nervana™ Graph*, Intel® Python Distribution, Intel® Math Kernel Library (MKL, MKL-DNN)
• Hardware: compute, memory & storage, networking
*Future. Other names and brands may be claimed as the property of others.
End-to-end AI compute
• Cloud/appliance: many-to-many hyperscale for massive batch data processing
• Gateway: 1-to-many for stream data processing, with the majority of streaming data coming from devices
• Edge: 1-to-1 devices with lower power and UX requirements
Connected by secure, high-throughput Ethernet, wireless, and non-IP wired protocols for real-time data movement.

Hardware across the spectrum:
• Intel® Xeon® Processors
• Intel® Xeon Phi™ Processors
• Intel® Core™ & Atom™ Processors
• CPU + Intel® Processor Graphics
• Intel® FPGA
• Intel® Nervana™ Neural Network Processor (NNP), formerly codenamed the Crest Family
• Vision: Movidius VPU
• Speech: Intel® GNA (IP)
All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
Modern AI: Theorists vs. Practitioners
Modern AI research has largely split into two camps:
• Theorists work on mathematical and statistical problems related to algorithms that learn (classical ML). E.g., is an email message spam or not? One could use a Bayesian algorithm to build a model from a large dataset, then use the model to filter spam.
• Practitioners apply ML to various real-world problems, guided more by experimentation than by mathematical theory, though they still use algorithms in their work (DL). For the same spam question, one could "tag" what is spam and what is not, process millions of emails that are spam and billions that are not, and teach a machine to filter spam.
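A minimal sketch of the theorists' approach above: a Laplace-smoothed Naïve Bayes spam filter in plain Python. The toy corpus and word-level model are invented for illustration, not taken from the slides:

```python
from collections import Counter
import math

# Toy corpus of (message, is_spam) pairs -- all invented for illustration
train = [
    ("win cash prize now", True),
    ("claim your free prize", True),
    ("meeting agenda attached", False),
    ("lunch at noon tomorrow", False),
]

spam_counts, ham_counts = Counter(), Counter()
n_spam = n_ham = 0
for text, label in train:
    if label:
        spam_counts.update(text.split()); n_spam += 1
    else:
        ham_counts.update(text.split()); n_ham += 1

vocab = set(spam_counts) | set(ham_counts)

def log_likelihood(words, counts):
    # Laplace-smoothed log P(words | class)
    total = sum(counts.values())
    return sum(math.log((counts[w] + 1) / (total + len(vocab))) for w in words)

def is_spam(text):
    # Compare log P(spam) * P(words | spam) against the same for ham
    words = text.split()
    spam_score = math.log(n_spam / len(train)) + log_likelihood(words, spam_counts)
    ham_score = math.log(n_ham / len(train)) + log_likelihood(words, ham_counts)
    return spam_score > ham_score

print(is_spam("free cash prize"))   # True on this toy corpus
```

With more tagged emails, the same counting scheme becomes the model the slide describes: the data does the work, the algorithm just keeps score.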
Classic ML vs. Deep Learning
Classic ML: using functions or algorithms to extract insights from new data.
• Apply functions f1, f2, …, fK to new data* to produce an inference or classification
• Random Forest, Naïve Bayes, Decision Trees, Graph Analytics, Regression, Ensemble Methods, Support Vector Machines (SVM), and more
Deep Learning: using massive data sets to train deep (neural) graphs (CNN, RNN, RBM, etc.) that can extract insights from new data.
• Step 1: Training (hours to days, in cloud): use a massive "known" dataset (e.g., 10M tagged images) to iteratively adjust the weighting of neural network connections
• Step 2: Inference (real-time, at edge/cloud): form an inference about new input data (e.g., a photo) using the trained neural network
*Not all classical machine learning algorithms require separate training and new data sets.

End-To-End Analytics Flow
• The data analysis process shares common analytics building blocks despite the variety of data formats, usages and domains
Data sources (business, web/social, scientific/engineering) → Extract, Transform, Load (ETL) → ML/DL compute → scoring/prediction
• Pre-processing: decompression, filtering, normalization
• Transformation: aggregation, dimension reduction
• Analysis: summary statistics, clustering, etc.
• Modeling: machine learning (training), parameter estimation, simulation
• Validation: hypothesis testing, model errors
• Decision making: inference
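Two of the building blocks above, normalization and dimension reduction, can be sketched in a few lines of NumPy; the toy data below is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # toy "raw" data: 100 rows, 5 features

# Pre-processing: z-score normalization (zero mean, unit variance per feature)
Xn = (X - X.mean(axis=0)) / X.std(axis=0)

# Transformation: dimension reduction via PCA (project onto 2 leading components)
U, S, Vt = np.linalg.svd(Xn, full_matrices=False)
X2 = Xn @ Vt[:2].T

print(Xn.mean(axis=0).round(6))        # ~0 for every feature
print(X2.shape)                        # (100, 2)
```

In practice, libraries such as Spark MLlib or Intel DAAL provide these same stages as scalable building blocks.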
Types of Learning
• Supervised (inductive) learning: training data includes desired outputs; prediction/classification (discrete labels), regression (real values)
• Unsupervised learning: training data does not include desired outputs; clustering/probability distribution estimation, finding associations (in features), dimension reduction
• Semi-supervised learning: training data includes a few desired outputs
• Reinforcement learning: rewards from a sequence of actions; decision making (robot, chess machine)

[Figure: examples ranging from unsupervised through "weakly" supervised to fully supervised; the definition depends on the task. Slide credit: L. Lazebnik]

Visualizing Types of Learning
[Figure: supervised, unsupervised, and semi-supervised learning visualized]

First, let there be data!
[Figure: the universal set (unobserved) splits into a training set (observed), used for data acquisition, and a testing set (unobserved), standing in for practical usage]

Training and testing
• Training = the process of making the system able to learn
• Testing = the process of seeing how well the system learned
– Simulates "real world" usage
– Training set and testing set come from the same distribution
– Need to make some assumptions or bias
• Deployment = actually using the learned system in practice

What is Deep Learning?
• Artificial Intelligence: "I propose to consider the question, 'Can machines think?'" (Alan Turing, 1950)
• Machine Learning: the study of building an algorithm that can "learn" and "draw" a prediction without specific direction
• Brain Inspired: create a program that mimics the workings of the human brain
Brain Inspired
• Human brain
– Neuron: the basic computational unit of the brain; there are about 86 billion neurons in the brain
– Neurons are connected by synapses; there are on the order of 10^14 to 10^15 synapses
– Neurons receive input signals through dendrites, propagate them along the axon, and send signals to other neurons via synapses
What is Deep Learning?
• Artificial Intelligence: "I propose to consider the question, 'Can machines think?'" (Alan Turing, 1950)
• Machine Learning: the study of building an algorithm that can "learn" and "draw" a prediction without specific direction
• Brain Inspired → Neural Network: a network that mimics the workings of the human brain
Machine learning
• Classic ML: an N × N image → hand-crafted features (f1, f2, …, fK) → a classifier (SVM, Random Forest, Naïve Bayes, Decision Trees, Logistic Regression, Ensemble Methods) → "Arjun"
• Deep learning: an N × N image → a deep neural network (~60 million parameters) → "Arjun"
Deep learning
• Features are discovered from data
• Extracts features at multiple levels of abstraction
• Performance improves with more data
• High degree of representational power
Deep learning breakthroughs
[Charts: image recognition error fell from ~30% in 2010 to below the human level at present, e.g. labeling a "person" with 97% confidence; speech recognition error fell from ~23% in 2000 to below the human level at present, e.g. recognizing "play song" with 99% confidence]
Deep learning in practice
• Healthcare: tumor detection (normal vs. tumor)
• Industry: agricultural robotics (plant vs. weed)
• Energy: oil & gas
• Automotive: speech interfaces
• Finance: document classification
• Genomics: sequence analysis
Biological Neuron vs. Artificial Neuron
[Figure: a biological neuron's dendrites, cell body, and axon alongside the corresponding parts of an artificial neuron]

Biology vs. Artificial Neural Networks
[Figure: biological neural circuits alongside an artificial neural network]
Neural Networks: Weighted Sum
[Figure: each output is a weighted sum of the inputs plus a bias, passed through a nonlinear activation function. Image source: Stanford]
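The weighted sum above can be written in a few lines of NumPy; the input, weight, and bias values below are invented for illustration:

```python
import numpy as np

def neuron(x, w, b, activation=np.tanh):
    # Weighted sum of inputs plus a bias, passed through a nonlinearity
    return activation(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # inputs (e.g. outputs of a previous layer)
w = np.array([0.4, 0.3, -0.1])   # learned weights
b = 0.1                          # learned bias
y = neuron(x, w, b)              # tanh(0.2 - 0.3 - 0.2 + 0.1) = tanh(-0.2)
```

A layer is just many such neurons sharing the same inputs, which is why whole networks reduce to matrix multiplications (the kind of work MKL accelerates).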
Topologies: network depth matters
Faster & less accurate <----> more accurate:
• AlexNet, 8 layers (2012) - U. Toronto
• VGG, 19 layers (2014) - Visual Geometry Group (VGG)
• GoogLeNet, 22 layers (2014) - Google
• ResNet-50 and ResNet-152, 152 layers (2015) - Microsoft (diagram shows the first 35 of the 152 layers)

Deep Learning Networks
• Deep Neural Networks (DNN): billions of parameters
• Convolutional Neural Networks (CNN): millions of parameters
• Recurrent Neural Networks (RNN): used for time series, e.g. speech
What is Deep Learning?
[Figure: a deep network recognizing an image of a Volvo XC90, building up from edges to parts to whole objects. Image source: Lee et al., Comm. ACM 2011]

Deep Neural Networks Example: Image Recognition
• How features are extracted in a DNN
[Figure: forward pass and backward pass through the convolution filters and fully connected layers]

Deep Learning in Action: Image Recognition
• AlexNet (Krizhevsky et al., 2012): 5 convolution layers, 3 fully connected layers + soft-max; 650K neurons, 60M weights
[Figure: the forward pass of iteration i+1 overlapping the backward pass of iteration i through the convolution and fully connected layers]
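As a sketch of what one of AlexNet's convolution layers computes, here is a minimal 2D convolution in NumPy: a filter slides over the image, taking a dot product at each position. Real frameworks add padding, strides, and many channels; the example image and edge filter below are invented:

```python
import numpy as np

def conv2d(image, kernel):
    # "Valid" 2D convolution (strictly, cross-correlation, as in most DL frameworks)
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 "image"
edge = np.array([[1.0, -1.0]])                    # simple horizontal edge filter
result = conv2d(image, edge)
print(result)   # every horizontal neighbor differs by 1, so all entries are -1
```

In a CNN the kernel values are not hand-picked like this edge filter: they are the weights learned during training.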
Mini-Batch Stochastic Gradient Descent
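The idea behind mini-batch SGD: instead of computing the gradient over the whole dataset, update the weights using the gradient on a small random batch at a time. A minimal NumPy sketch, fitting a linear model; the data, learning rate, and batch size are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(256, 3))               # toy inputs
true_w = np.array([2.0, -1.0, 0.5])         # weights we hope to recover
y = X @ true_w + 0.01 * rng.normal(size=256)

w = np.zeros(3)
lr, batch_size = 0.1, 32
for epoch in range(100):
    idx = rng.permutation(len(X))           # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        # Gradient of mean squared error on this mini-batch only
        grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)
        w -= lr * grad

print(w.round(2))   # close to [2.0, -1.0, 0.5]
```

The same loop structure, with backpropagation supplying the gradients, is how AlexNet-style networks are trained.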
What are RNNs?
The idea behind RNNs is to make use of sequential information. In a traditional neural network we assume that all inputs (and outputs) are independent of each other. For many tasks that is a bad idea: if you want to predict the next word in a sentence, you had better know which words came before it. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output dependent on the previous computations. Another way to think about RNNs is that they have a "memory" which captures information about what has been calculated so far.

RNN model
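A minimal NumPy sketch of a vanilla RNN cell as described above: the same weights are reused at every step, and the hidden state h carries the "memory" of earlier elements. All dimensions and values below are invented for illustration:

```python
import numpy as np

def rnn_step(x_t, h_prev, Wxh, Whh, bh):
    # One step of a vanilla RNN: the new hidden state mixes the current
    # input with the previous hidden state (the network's "memory")
    return np.tanh(Wxh @ x_t + Whh @ h_prev + bh)

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 8
Wxh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input-to-hidden
Whh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden-to-hidden
bh = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)
sequence = rng.normal(size=(5, input_dim))   # a sequence of 5 input vectors
for x_t in sequence:                         # same weights applied at every step
    h = rnn_step(x_t, h, Wxh, Whh, bh)
print(h.shape)                               # (8,)
```

Unfolding this loop over time gives the familiar unrolled-RNN diagram; an output layer on top of h would produce the prediction at each step.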
[Figure: an RNN unfolded through time; the previous result affects the next stage]
Vanilla RNN models are used for language translation and for predicting missing words. The basic idea is to connect to information from the past: given "The clouds are in the ___", the model predicts "sky".

Image Descriptors
• More advanced than simple recognition: aligns the generated words with features found in the images
• In the long run: automatically narrating a soccer game?

Acknowledgements and Resources
ACK: Material collected from many public sources: Yann LeCun, Intel Corp, NVIDIA, MIT Eyeriss Project, MIT CSAIL, Alex Ororbia, Jihyun Ryoo, Stanford University, Mahmut Kandemir
http://deeplearning.net/
http://deeplearning.net/software_links/