An Intro to …
Meena Arunachalam, Ph.D., Principal Engineer, Intel Corp.

Agenda

• AI is transformative across all domains
• Machine Learning: usages, Spark MLlib, MKL
• Deep Learning: usages, topologies, convolution, MKL
• Software stacks, frameworks and segments
• TensorFlow and Neon: how to get started

History of Machine Learning (ML)

• <1950: Statistical Methods. Bayes' Theorem; Markov chains.
• 1950-1960: Pioneering Machine Learning. Simple ML algorithms suggested, e.g., nearest neighbor.
• 1970: "AI winter". Pessimism about ML; ML algorithms were not as effective as expected.
• 1980: Rediscovery of Machine Learning. David Rumelhart, Geoff Hinton and Ronald J. Williams discovered "Backpropagation", causing a rethinking of ML.
• 1990: Data-Driven Approach. Analyze large amounts of data and draw conclusions from the data; Support Vector Machines (SVM), Recurrent Neural Networks (RNN).
• 2000: Machine Learning Becomes Feasible. Inspired by deep learning and helped by powerful processors, ML becomes a popular method.
• 2010: Wide Spread. ML is widely used in artificial intelligence: Google, Apple, Amazon, …

AI is transformative

• Consumer: Smart Assistants, Chatbots, Search, Personalization, Augmented Reality, Robots
• Health: Enhanced Diagnostics, Drug Discovery, Patient Care, Research, Sensory Aids
• Finance: Algorithmic Trading, Fraud Detection, Research, Personal Finance, Risk Mitigation
• Retail: Support, Experience, Marketing, Merchandising, Loyalty, Supply Chain, Security
• Government: Defense, Data Insights, Safety & Security, Resident Engagement, Smarter Cities
• Energy: Oil & Gas Exploration, Smart Grid, Operational Improvement, Conservation
• Transport: Autonomous Cars, Automated Trucking, Aerospace, Shipping, Search & Rescue
• Industrial: Factory Automation, Predictive Maintenance, Precision Agriculture, Field Automation
• Other: Advertising, Education, Gaming, Professional & IT Services, Telco/Media, Sports

Source: Intel forecast

The next big wave

Data deluge + compute breakthrough + innovation surge: mainframes → standards-based servers → cloud computing → artificial intelligence.
AI compute cycles will grow 12X by 2020. (Source: Intel forecast)

Inside AI experiences

• Platforms: Intel® Nervana™ Cloud & Appliance, Intel® Nervana™ DL Studio, Intel® Computer Vision SDK, Movidius (VPU)
• Frameworks: MLlib, BigDL
• Libraries: Intel® Data Analytics Acceleration Library (DAAL), Intel® Nervana™ Graph*, Intel Python Distribution, Intel® Math Kernel Library (MKL, MKL-DNN)
• Hardware: compute, memory & storage, networking

*Future. Other names and brands may be claimed as the property of others.

End-to-end AI compute

• Cloud/appliance: many-to-many hyperscale for stream and massive batch data processing
• Gateway: 1-to-many, with majority streaming data from devices
• Edge: 1-to-1 devices with lower power and, often, UX requirements
• Connectivity: wireless and non-IP wired protocols; secure Ethernet; high throughput; real-time

Hardware across the pipeline:
• Intel® Xeon® Processors
• Intel® Xeon Phi™ Processors*
• Intel® Core™ & Atom™ Processors
• CPU + Intel® Processor Graphics
• Intel® FPGA
• Intel® Nervana™ Neural Network Processor (NNP)✝ (✝formerly codenamed the Crest Family)
• Vision: Movidius VPU
• Speech: Intel® GNA (IP)*

(DC = datacenter, WS = workstation.) All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.

Modern AI: Theorists vs. Practitioners

• Modern AI research has largely split into two camps
- Theorists: work on mathematical and statistical problems related to algorithms that learn (classical ML)
  . E.g., is an email message spam or not spam?
    – Could use a Bayesian algorithm to build a model from a large dataset
    – Use the model to filter spam

- Practitioners: apply ML to various real-world problems, guided more by experimentation than by mathematical theory, while still using the theorists' algorithms in their work (DL)
  . E.g., is an email message spam or not spam?
    – One could "tag" what is spam and what is not spam
    – Could process millions of emails that are spam and billions that are not
    – Teach a machine to filter spam
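Both camps converge on the same spam task. As a minimal sketch of the Bayesian approach described above, here is a tiny Naive Bayes spam filter in plain Python; the corpus and word choices are made up for illustration:

```python
import math
from collections import Counter

# Toy labeled corpus (made up): messages "tagged" as spam or ham.
train = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]

# "Build a model": count words per class and messages per class.
word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def classify(text):
    """Pick the class with the higher Naive Bayes log-score (add-one smoothing)."""
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Log prior: how common this class is overall.
        score = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for w in text.split():
            # Log likelihood of each word under this class, smoothed.
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

With enough tagged messages, the same two lines of "build counts, score new text" scale to the millions-of-emails setting the practitioners' workflow describes.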

Classic ML vs. Deep Learning

Classic ML: using functions or algorithms to extract insights from new data.
• Functions f1, f2, …, fK:
  . Naïve Bayes
  . Decision Trees
  . Graph Analytics
  . Regression
  . Ensemble methods
  . Support Vector Machines (SVM)
  . More…

Deep Learning: using massive data sets to train deep (neural) graphs (CNN, RNN, RBM, etc.) that can extract insights from new data.
• Step 1: Training (hours to days, in cloud). Use a massive "known" dataset (e.g., 10M tagged images) to iteratively adjust the weighting of neural network connections.
• Step 2: Inference (real-time, at edge/cloud). Form an inference about new input data (e.g., a photo) using the trained neural network.

*Not all classical machine learning algorithms require separate training and new data sets.

End-to-End Analytics Flow

• The data analysis process and its analytics building blocks stay largely the same despite the variety of data formats, usages and domains

Data sources: business, web/social, scientific/engineering

Pipeline: Extract, Transform, Load (ETL) → ML compute / DL compute → scoring/prediction

• Pre-processing: decompression, filtering, normalization
• Transformation: aggregation, dimension reduction
• Analysis: summary statistics, clustering, etc.
• Modeling: machine learning (training), parameter estimation, simulation
• Validation: hypothesis testing, model errors
• Decision making: inference
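A minimal sketch of this end-to-end flow in plain Python, with made-up sensor records standing in for a real data source: filtering in pre-processing, normalization in transformation, and outlier flagging as the decision step.

```python
import statistics

# Hypothetical raw records (made up): one reading is missing, one is anomalous.
raw = [{"id": 1, "temp": 21.5}, {"id": 2, "temp": None}, {"id": 3, "temp": 85.0},
       {"id": 4, "temp": 22.1}, {"id": 5, "temp": 20.9}]

def preprocess(records):
    """Pre-processing: filter out records with missing values."""
    return [r for r in records if r["temp"] is not None]

def transform(records):
    """Transformation: normalize readings to zero mean and unit variance."""
    temps = [r["temp"] for r in records]
    mu, sigma = statistics.mean(temps), statistics.stdev(temps)
    return [(r["id"], (r["temp"] - mu) / sigma) for r in records]

def decide(features):
    """Decision making: flag readings more than one standard deviation out."""
    return [rec_id for rec_id, z in features if abs(z) > 1.0]

outliers = decide(transform(preprocess(raw)))
```

Each stage here is a pure function, so the same pipeline shape applies whether the compute in the middle is classical ML or a deep network.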

Types of Learning
• Supervised (inductive) learning
  - Training data includes desired outputs
  - Prediction / classification (discrete labels), regression (real values)
• Unsupervised learning
  - Training data does not include desired outputs
  - Clustering / probability distribution estimation
  - Finding associations (in features)
  - Dimension reduction
• Semi-supervised learning
  - Training data includes a few desired outputs
• Reinforcement learning
  - Rewards from a sequence of actions

  - Decision making (robot, chess machine)

Supervision spectrum: unsupervised → "weakly" supervised → fully supervised

Definition depends on task

Slide credit: L. Lazebnik

Visualizing Types of Learning
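The two ends of the supervision spectrum can be sketched with tiny examples (all numbers made up): supervised learning fits labeled (x, y) pairs, while unsupervised learning must find structure in unlabeled points on its own.

```python
# Supervised: fit y = w * x by least squares; training data includes desired outputs.
xs, ys = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.0, 8.1]
w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Unsupervised: 2-means clustering on unlabeled 1-D data; no outputs are given.
data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
c0, c1 = min(data), max(data)              # initial centroids
for _ in range(10):                        # Lloyd's algorithm iterations
    g0 = [x for x in data if abs(x - c0) <= abs(x - c1)]
    g1 = [x for x in data if abs(x - c0) > abs(x - c1)]
    c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
```

The supervised fit recovers the slope behind the labels (w ≈ 2), while the clustering recovers the two groups (centroids near 1 and 9) without ever seeing a label.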

Figure: supervised, unsupervised and semi-supervised learning visualized as labeled, unlabeled and partially labeled point sets.

First, let there be data!

Figure: from data acquisition to practical usage. The universal set (unobserved) is split into a training set (observed) and a testing set (unobserved).

Training and testing
• Training = the process of making the system able to learn
• Testing = the process of seeing how well the system learned
  – Simulates "real world" usage
  – Training set and testing set come from the same distribution
  – Need to make some assumptions or bias
• Deployment = actually using the learned system in practice

What is Deep Learning?

Nested circles, outermost to innermost:
• Artificial Intelligence: "I propose to consider the question, 'Can machines think?'" (Alan Turing, 1950)
• Machine Learning: the study of building an algorithm that can "learn" and "draw" a prediction without specific direction
• Brain Inspired: create a program that mimics the work of the human brain

Brain Inspired
• Human brain
  - Neuron: the basic computational unit of the brain
    . 86 billion neurons in the brain
  - Neurons are connected by synapses
    . There are roughly 10^14 to 10^15 synapses
  - Neurons receive input signals via dendrites, pass them through the axon, and send signals to other neurons across synapses


What is Deep Learning?

Nested circles, outermost to innermost:
• Artificial Intelligence
• Machine Learning
• Brain Inspired
• Neural Network: the network that mimics the work of the human brain

Machine learning

Classic ML: an N × N image is reduced to hand-crafted features (f1, f2, …, fK), which feed a classifier such as SVM, Random Forest, Naïve Bayes, Decision Trees or Ensemble Methods to produce the label "Arjun".

Deep learning: the same N × N image is fed directly to a deep network (~60 million parameters) that produces the label "Arjun".

Deep learning

• Features are discovered from data
• Features are extracted at multiple levels of abstraction
• Performance improves with more data
• High degree of representational power

Deep learning breakthroughs

Figure: two charts of error rate over time. Image recognition error fell from ~30% in 2010 to below human-level error today (e.g., "person" recognized with 97%+ confidence); speech recognition error fell from ~23% in 2000 to below human-level error today (e.g., "play song" recognized with 99% confidence).

Deep learning in practice

• Healthcare: tumor detection (normal vs. tumor)
• Industry: agricultural robotics (plant vs. weed)
• Energy: oil & gas
• Automotive: speech interfaces
• Finance: document classification
• Genomics: sequence analysis

Biological Neuron → Artificial Neuron

Figure: the parts of a biological neuron: dendrites, cell body, axon.

Biology → Artificial Neural Networks

Neural Networks: Weighted Sum

(Image source: Stanford)

Neural Network Weighted Sum
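The slide shows this as a diagram, but the computation is small enough to write out: an artificial neuron takes a weighted sum of its inputs plus a bias and applies a nonlinearity (a sigmoid here; the inputs and weights below are made up):

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of inputs plus bias, squashed by a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# One neuron with three inputs: z = 0.1*0.5 + 0.2*(-1.0) + 0.3*2.0 + 0.05 = 0.5
out = neuron([0.5, -1.0, 2.0], [0.1, 0.2, 0.3], bias=0.05)
```

A neural network layer is just many of these neurons applied to the same inputs with different weights, and a deep network stacks such layers.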

(Image source: Stanford)

Topologies: Network Depth is of Importance

Faster & less accurate ↔ more accurate:
• AlexNet, 8 layers (2012): U. Toronto
• VGG, 19 layers (2014): Visual Geometry Group (VGG)
• GoogleNet, 22 layers (2014): Google
• ResNet-50 and ResNet-152, 152 layers (2015): Microsoft (the figure shows the first 35 of 152 layers)

Deep Learning Networks
• Deep Neural Networks (DNN)
  - Billions of parameters

• Convolutional Neural Networks (CNN) - Millions of parameters

• Recurrent Neural Networks (RNN)
  - Used for time-series data, e.g., speech
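The parameter counts above follow directly from layer shapes. As a quick sketch (the layer sizes are made-up examples, not from the slides): a convolution layer shares its small filters across all spatial positions, while a fully connected layer pays for every input-output pair, which is where counts explode.

```python
def conv_params(k, in_ch, out_ch):
    """A conv layer: one k x k filter per (input channel, output channel)
    pair, shared across the whole image, plus a bias per output channel."""
    return k * k * in_ch * out_ch + out_ch

def fc_params(n_in, n_out):
    """A fully connected layer: one weight per input-output pair,
    plus one bias per output."""
    return n_in * n_out + n_out

conv = conv_params(3, 64, 128)   # a typical 3x3 conv layer: ~74K parameters
fc = fc_params(4096, 4096)       # an AlexNet-style FC layer: ~16.8M parameters
```

A handful of FC layers this size already reaches tens of millions of parameters, which is why the fully connected portions dominate classic CNN parameter counts.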

What is Deep Learning?

Figure: a deep network transforms an input image through successive feature layers into the label "Volvo XC90". (Image source: [Lee et al., Comm. ACM 2011])

Deep Neural Networks Example: Image Recognition
• How features are extracted in a DNN

Figure: the forward pass runs the input through convolution filters and fully connected layers; the backward pass propagates errors in the reverse direction.
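The "filters" in that figure perform convolution. A minimal sketch in plain Python (the tiny image and edge-detecting kernel are made up; like most DL frameworks, this actually computes cross-correlation):

```python
def conv2d(image, kernel):
    """Valid 2-D convolution: slide the kernel over the image and take
    the weighted sum at each position."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            out[i][j] = sum(image[i + di][j + dj] * kernel[di][dj]
                            for di in range(kh) for dj in range(kw))
    return out

# A vertical-edge detector on a tiny image: left half dark, right half bright.
image = [[0, 0, 1, 1]] * 4
kernel = [[1, -1], [1, -1]]   # responds where intensity changes left-to-right
edges = conv2d(image, kernel)
```

In a CNN the kernel values are not hand-chosen like this; they are the weights that training discovers via the backward pass.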

Deep Learning in Action: Image Recognition
• AlexNet (Krizhevsky et al., 2012)
  - 5 convolution layers, 3 fully connected layers + soft-max; 650K neurons, 60 million weights

Figure: training alternates a forward pass (iteration i+1) and a backward pass (iteration i) through the convolution filters and fully connected layers.

Mini-Batch Stochastic Gradient Descent
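The idea: each update uses the average gradient over a small random mini-batch rather than the full dataset. A minimal sketch fitting y = w*x + b on synthetic, noiseless data (true w = 2, b = 1; all numbers made up):

```python
import random

random.seed(0)
data = [(x, 2.0 * x + 1.0) for x in [i / 100 for i in range(100)]]

w, b = 0.0, 0.0
lr, batch_size = 0.1, 8
for epoch in range(200):
    random.shuffle(data)                      # "stochastic": new sample order each epoch
    for k in range(0, len(data), batch_size): # step through mini-batches
        mb = data[k:k + batch_size]
        # Average gradient of the squared error over the mini-batch only.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in mb) / len(mb)
        grad_b = sum(2 * (w * x + b - y) for x, y in mb) / len(mb)
        w -= lr * grad_w                      # gradient descent step
        b -= lr * grad_b
```

Deep learning frameworks apply exactly this loop to millions of weights, with the gradients supplied by the backward pass instead of a closed-form derivative.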

What are RNNs?

The idea behind RNNs is to make use of sequential information. In a traditional neural network we assume that all inputs (and outputs) are independent of each other. For many tasks, however, that is a bad idea: if you want to predict the next word in a sentence, you had better know which words came before it. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output depending on the previous computations. Another way to think about RNNs is that they have a "memory" which captures information about what has been computed so far.

RNN model

• The previous result affects the next stage
• The recurrent loop can be unfolded through time into a chain of identical steps
• Applications of the vanilla RNN model: language translation, predicting missing words
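A single step of such a vanilla RNN fits in a few lines; note that the new hidden state depends on both the current input and the previous hidden state, which is the "memory" (all weights below are made up):

```python
import math

def rnn_step(x, h, Wxh, Whh, b):
    """One vanilla RNN step: h' = tanh(Wxh @ x + Whh @ h + b)."""
    pre = [sum(Wxh[i][j] * x[j] for j in range(len(x))) +
           sum(Whh[i][j] * h[j] for j in range(len(h))) + b[i]
           for i in range(len(h))]
    return [math.tanh(v) for v in pre]

# A 2-unit RNN reading a 3-step sequence of 1-D inputs; the hidden state
# carries information from earlier inputs into later steps.
Wxh = [[0.5], [-0.3]]
Whh = [[0.1, 0.2], [0.0, 0.1]]
b = [0.0, 0.0]
h = [0.0, 0.0]
for x in ([1.0], [0.0], [1.0]):
    h = rnn_step(x, h, Wxh, Whh, b)
```

Unfolding the loop over the sequence yields the chain of identical steps shown in the slide's diagram: the same weights are reused at every position.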

The basic idea is to connect information from previous steps: in "The clouds are in the sky", the preceding words predict "sky".

Image Descriptors
• More advanced than simple recognition
• Aligns the generated words with features found in the images

In the long run: automatically narrating a soccer game?

Acknowledgements and Resources

ACK: Material collected from many public sources: Yann LeCun, Intel Corp., NVIDIA, MIT Eyeriss Project, MIT CSAIL, Alex Ororbia, Jihyun Ryoo, Stanford University, Mahmut Kandemir.
• http://deeplearning.net/
• http://deeplearning.net/software_links/