Deep Learning Meena Arunachalam, Ph.D Principal Engineer, Intel Corp
AN INTRO TO Machine Learning & Deep Learning
Meena Arunachalam, Ph.D, Principal Engineer, Intel Corp

AGENDA
• AI is transformative across all domains
• Machine Learning: Usages, Spark MLlib, MKL
• Deep Learning: Usages, Topologies, Convolution, MKL
• Software stacks, frameworks and segments
• TensorFlow and Neon: How to get started

History of Machine Learning (ML)
• <1950: Statistical Methods
  - Bayes's Theorem
  - Markov chains
• 1950-1960: Pioneering Machine Learning
  - Suggested simple machine learning algorithms
  - Perceptron, nearest neighbor
• 1970: "AI winter"
  - Pessimism about ML
  - ML algorithms were not as effective as expected
• 1980: Rediscovery of Machine Learning
  - David Rumelhart, Geoff Hinton and Ronald J. Williams discovered the "Backpropagation" algorithm
  - Causes rethinking of ML
• 1990: Data-Driven Approach
  - Analyze large amounts of data and draw conclusions from the data
  - Support Vector Machine (SVM), Recurrent Neural Networks (RNN)
• 2000: Machine Learning becomes Feasible
  - Inspired by deep learning and with the help of powerful processors, ML becomes a popular method
• 2010: Wide Spread
  - ML is widely used
  - Google, Apple, Amazon, …

Artificial intelligence

AI is transformative
• Consumer: Smart Assistants, Chatbots, Search, Personalization, Augmented Reality, Robots
• Health: Enhanced Diagnostics, Drug Discovery, Patient Care, Research, Sensory Aids
• Finance: Algorithmic Trading, Fraud Detection, Research, Personal Finance, Risk Mitigation
• Retail: Support, Experience, Marketing, Merchandising, Loyalty, Supply Chain, Security
• Government: Defense, Data Insights, Safety & Security, Resident Engagement, Smarter Cities
• Energy: Oil & Gas Exploration, Smart Grid, Operational Improvement, Conservation
• Transport: Autonomous Cars, Automated Trucking, Aerospace, Shipping, Search & Rescue
• Industrial: Factory Automation, Predictive Maintenance, Precision Agriculture, Field Automation
• Other: Advertising, Education, Gaming, Professional & IT Services, Telco/Media, Sports
Source: Intel forecast

The next big wave
• Data deluge
• Compute breakthrough
• Innovation surge
Mainframes, standards-based servers, cloud computing, artificial intelligence
AI compute cycles will grow 12X by 2020
Source: Intel forecast

Inside AI
• Experiences
• Platforms: Intel® Nervana™ Cloud & Appliance, Intel® Nervana™ DL Studio, Intel® Computer Vision SDK, Movidius (VPU)
• Frameworks: MLlib, BigDL
• Libraries: Intel® Nervana™ Graph*, Intel Python Distribution, Intel® Data Analytics Acceleration Library (DAAL), Intel® Math Kernel Library (MKL, MKL-DNN)
• Hardware: Compute, Memory & Storage, Networking
*Future
Other names and brands may be claimed as the property of others.

End-to-end AI compute
• Cloud/appliance: many-to-many hyperscale for stream and massive batch data processing
• Gateway: 1-to-many with majority streaming data from devices
• Edge: 1-to-1 devices with lower power and often UX requirements
• Connectivity: Ethernet & wireless; wireless and non-IP wired protocols
• Requirements: secure, high throughput, real-time
• Hardware:
  - Intel® Xeon® Processors
  - Intel® Xeon Phi™ Processors*
  - Intel® Core™ & Atom™ Processors
  - CPU + Intel® Processor Graphics
  - Intel® FPGA
  - Intel® Nervana™ Neural Network Processor (NNP)✝
  - Vision: Movidius VPU
  - Speech: Intel® GNA (IP)*
(DC = datacenter, WS = workstation)
*Formerly codenamed as the Crest Family
All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.

Modern AI: Theorists vs Practitioners
• Modern AI research has largely split into 2 camps
  - Theorists: Work on mathematical and statistical problems related to algorithms that learn (Classical ML)
    . E.g. Is an email message spam or not spam?
      – Could use a Bayesian algorithm to build a model with a large dataset
      – Use the model to filter spam
  - Practitioners: Apply ML to various real-world problems, guided more by experimentation than by mathematical theory. However, they still use algorithms in their work (DL)
    . E.g.
      Is an email message spam or not spam?
      – One could "tag" what is spam and what is not spam
      – Could process millions of emails that are spam and billions that are not
      – Teach a machine to filter spam

Classic ML vs Deep Learning
• Classic ML: Using functions or algorithms to extract insights from new data
  - Functions f1, f2, …, fK applied to Training Data* and New Data* for inference or classification
    . Random Forest
    . Naïve Bayes
    . Decision Trees
    . Graph Analytics
    . Regression
    . Ensemble methods
    . Support Vector Machines (SVM)
    . More…
• Deep Learning: Using massive data sets to train deep (neural) graphs that can extract insights from new data
  - Graph types: CNN, RNN, RBM, etc.
  - Step 1: Training (hours to days, in cloud). Use a massive "known" dataset (e.g. 10M tagged images) to iteratively adjust the weighting of neural network connections, turning an untrained network into a trained one
  - Step 2: Inference (real-time, at edge/cloud). Form an inference about new input data (e.g. a photo) using the trained neural network
*Not all classical machine learning algorithms require separate training and new data sets

End-To-End Analytics Flow
• The data analysis process and analytics building blocks are common across the variety of data formats, usages and domains
Flow: Data Source, Extract, Transform, Load (ETL), ML Compute, DL Compute, Scoring/Prediction
Stages: Pre-processing, Transformation, Analysis, Modeling, Validation, Decision Making
Building blocks:
• Data sources: Business, Web/Social, Scientific/Engineering
• Pre-processing: Decompression, Filtering, Normalization
• Transformation: Aggregation, Dimension Reduction
• Analysis: Summary Statistics, Clustering, etc.
• Validation: Hypothesis testing, Model errors
• Decision Making: Inference
• Modeling: Machine Learning (Training), Parameter Estimation,
  Simulation

Types of Learning
• Supervised (inductive) learning
  - Training data includes desired outputs
  - Prediction / Classification (discrete labels), Regression (real values)
• Unsupervised learning
  - Training data does not include desired outputs
  - Clustering / probability distribution estimation
  - Finding association (in features)
  - Dimension reduction
• Semi-supervised learning
  - Training data includes a few desired outputs
• Reinforcement learning
  - Rewards from sequence of actions
  - Decision making (robot, chess machine)

Unsupervised, "weakly" supervised, fully supervised: the definition depends on the task
Slide credit: L. Lazebnik

Visualizing Types of Learning
[Figure panels: Supervised learning, Unsupervised learning, Semi-supervised learning]

First, let there be data!
[Figure: data acquisition yields the training set (observed); practical usage draws on the universal set (unobserved), from which the testing set (unobserved) is taken]

Training and testing
• Training = the process of making the system able to learn
• Testing = the process of seeing how well the system learned
  – Simulates "real world" usage
  – Training set and testing set come from the same distribution
  – Need to make some assumptions or bias
• Deployment = actually using the learned system in practice

What is Deep Learning?
• Artificial Intelligence: "I propose to consider the question, 'Can machines think?'", Alan Turing, 1950
• Machine Learning: Study to build an algorithm that can "learn" and "draw" a prediction without specific direction
• Brain Inspired: Create a program that mimics the work of the human brain

Brain Inspired
• Human brain
  - Neuron: basic computational unit of the brain
    . 86 billion neurons in the brain
  - Neurons are connected by synapses
    . Roughly 10^14 to 10^15 synapses in the brain
  - Neurons receive input signals through dendrites, pass them along the axon, and send signals to other neurons via synapses

What is Deep Learning?
• Artificial Intelligence: "I propose to consider the question, 'Can machines think?'", Alan Turing, 1950
• Machine Learning: Study to build an algorithm that can "learn" and "draw" a prediction without specific direction
• Brain Inspired: Create a program that mimics the work of the human brain
• Neural Network: The network that mimics the work of the human brain

Machine learning
• Classic ML: N × N input, features (f1, f2, …, fK), then a classifier that outputs the label "Arjun"
  - Classifiers: SVM, Random Forest, Naïve Bayes, Decision Trees, Logistic Regression, Ensemble Methods
• Deep learning: N × N input, a network with ~60 million parameters, output label "Arjun"

Deep learning
• Features are discovered from data
• Extract features at multiple levels of abstraction
• Performance improves with more data
• High degree of representational power

Deep learning breakthroughs
[Figure: error rate (0% to 30%) over time, compared with human-level error. Image recognition, 2010 to present, example output: 97% "person". Speech recognition, 2000 to present, example output: 99% "play song".]

Deep learning in practice
• Healthcare: Tumor detection (Normal vs Tumor)
• Industry: Agricultural Robotics (Plant vs Weed)
• Energy: Oil & Gas
• Automotive: Speech interfaces
• Finance: Document Classification
• Genomics: Sequence analysis

Biological Neuron vs Artificial Neuron
[Figure labels: Dendrite, Cell Body, Axon]

Biology vs Artificial Neural Networks

Neural Networks: Weighted Sum
Image Source: Stanford

Neural Network Weighted Sum
Image Source: Stanford

Topologies: Network Depth is of importance
Ordered from faster and less accurate to more accurate:
• AlexNet, 8 layers (2012), U Toronto
• VGG, 19 layers (2014), Visual Geometry Group (VGG)
• GoogLeNet, 22 layers (2014), Google
• ResNet-50 and ResNet, 152 layers (2015), Microsoft (figure shows the first 35 layers of 152)

Deep Learning Networks
• Deep Neural Networks (DNN)
  - Billions of parameters
• Convolutional Neural Networks (CNN)
  - Millions of parameters
• Recurrent Neural Networks (RNN)
  - Used for time-series, e.g. speech etc.

What is Deep Learning?
[Figure: an input image is mapped to the label "Volvo XC90"]
Image Source: [Lee et al., Comm.
ACM 2011]

Deep Neural Networks Example: Image Recognition
• How the features are extracted in a DNN
[Figure labels: Filters, Fully Connected Layers, Forward-Pass, Backward-Pass]

Deep Learning in Action: Image Recognition
• AlexNet (Krizhevsky et al., 2012)
  - 5 convolution layers, 3 fully connected layers + softmax, 650K neurons, 60 million weights
[Figure labels: Fully Connected Layers, Forward-Pass, Forward-Iter i+1, Backward-Iter i]
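
The "60 million weights" figure for AlexNet can be sanity-checked with a rough parameter tally. The Python sketch below is illustrative only: the kernel sizes, channel counts and the two-group split in conv2, conv4 and conv5 are assumptions taken from the published AlexNet description rather than from these slides, and the helper names conv_params and fc_params are made up for this example.

```python
# Rough parameter count for AlexNet.
# ASSUMPTION: layer shapes follow the published AlexNet description
# (Krizhevsky et al., 2012); they are not stated in the slides.

def conv_params(out_ch, in_ch, kernel, groups=1):
    """Weights plus biases of a 2-D convolution with square kernels."""
    weights = out_ch * (in_ch // groups) * kernel * kernel
    return weights + out_ch  # one bias per output channel

def fc_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

# 5 convolution layers followed by 3 fully connected layers
ALEXNET_LAYERS = {
    "conv1": conv_params(96, 3, 11),
    "conv2": conv_params(256, 96, 5, groups=2),
    "conv3": conv_params(384, 256, 3),
    "conv4": conv_params(384, 384, 3, groups=2),
    "conv5": conv_params(256, 384, 3, groups=2),
    "fc6": fc_params(256 * 6 * 6, 4096),  # 6x6x256 feature map, flattened
    "fc7": fc_params(4096, 4096),
    "fc8": fc_params(4096, 1000),         # 1000 output classes, then softmax
}

total_params = sum(ALEXNET_LAYERS.values())
fc_total = sum(n for name, n in ALEXNET_LAYERS.items() if name.startswith("fc"))

for name, count in ALEXNET_LAYERS.items():
    print(f"{name:>5}: {count:>12,}")
print(f"total: {total_params:>12,}")
print(f"share in fully connected layers: {fc_total / total_params:.1%}")
```

Under these assumed shapes the tally comes to 60,965,224 parameters, in line with the roughly 60 million quoted on the slide, and the three fully connected layers hold about 96% of them.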