Caffe Deep Dive Session

May 22-25, 2017 | San Francisco
Machine Learning/Deep Learning Track
▶ Caffe Deep Dive Session
▶ Increase your expertise on the Caffe framework through hands-on guided exercises. You will experience fast, scalable, easy machine learning with OpenPOWER, GPUs and Docker on the Nimbix public cloud. You'll look at image classification with a trained model, train a model on a small dataset with DIGITS, and classify images with your own trained model. Compare different techniques and explore the "Model Zoo".
▶ Duration – 1.5 hours
▶ Day/Time – Thu May 25, 9:00AM – 10:30AM
▶ Leader/Trainers – Jason Furmanek, Franck Barillaud, Clarisse Taaffe-Hedglin

Agenda
• Deep Learning and Caffe
  – Why Now
  – Why Caffe
• Hands-on Lab: Caffe and DIGITS
  – Inference using a trained model
  – Training a new model

Machines Are Learning the Way We Learn…
▪ From "Texture of the Nervous System of Man and the Vertebrates" by Santiago Ramón y Cajal
▪ Artificial Neural Networks

Deep Learning Across All Industries
• Automotive and Transportation: autonomous driving (pedestrian detection, accident avoidance). Customers: auto, trucking, heavy equipment, Tier 1 suppliers (Hyundai, Toyota, Komatsu, General Motors, Volvo)
• Security and Public Safety: video surveillance, image analysis, facial recognition and detection. Customers: local and national police, public and private safety/security (ADT, IViz, Pinkerton, Sentry)
• Consumer Web, Mobile, Retail: image tagging, speech recognition, natural language, sentiment analysis. Customers: hyperscale web companies, large retail (Google Photos, Twitter, Woolworths, Aeon)
• Medicine and Biology: drug discovery, diagnostic assistance, cancer cell detection. Customers: pharmaceutical, medical equipment, diagnostic labs (Takeda, Asian Pharma, Pfizer)
• Broadcast, Media and Entertainment: captioning, search, recommendations, real time translation. Customers: consumer-facing companies with large streaming of existing media, or real time content

Why Now?
→ Data
→ Compute
→ Technique

Data Sources
• MNIST: digits 0-9
• CIFAR-10 (RGB)
• ImageNet: 10,000,000 labeled images depicting 10,000+ object categories
• 300,000 labeled images
• https://quickdraw.withgoogle.com/data
• Over 1000 datasets at: https://www.kaggle.com/datasets
(Figure: learned filters for AlexNet, Krizhevsky et al. 2012)

Data & Compute at the Core of Training and Inference
Training
• Data intensive: historical data sets
• Compute intensive: 100% accelerated
• Develop a model for use on the edge as inference
Inference
• Enables the computer to act in real time
• Low power
• Out at the edge

Requires a Lot of Computational Resources
But training DL models takes time and effort:
• Training can take hours, days or weeks
• Input data and model sizes are becoming larger than ever (e.g. video input, billions of features, etc.)
• Moore's law is dying
Resulting in…
Unprecedented demand for offloaded computation, accelerators, and higher memory bandwidth systems

OpenPOWER: Open Hardware for High Performance
Systems designed for big data analytics and superior cloud economics
Traditional Intel x86 vs. OpenPOWER (https://power.jarvice.com, https://mc.jarvice.com)
Up to:
▪ 12 cores per CPU
▪ 96 hardware threads per CPU
▪ 1 TB RAM
▪ 7.6 Tb/s combined I/O bandwidth
▪ GPUs and FPGAs coming…
http://www.softlayer.com/POWER-SERVERS

New Paradigm, New Chip, New Servers for AI
• New Chip, "POWER8 with NVLink": POWER8 + coherent CAPI + novel NVLink for high-bandwidth coherent CPU/GPU acceleration
• New Power Accelerated AI Linux Servers:
  – S821LC: high-density 2-socket 1U
  – S822LC for Big Data
  – S822LC for High Performance Computing
Credit: M. Gschwind, Bringing the Deep Learning Revolution into the Enterprise

NVLink and P100 Advantage
• NVLink reduces communication time and overhead
• Incorporating the fastest GPU for deep learning
• Data gets from GPU to GPU and from memory to GPU faster, for shorter training times
IBM advantage: data communication and GPU performance
ImageNet / AlexNet, minibatch size = 128:
• POWER8 + Tesla P100 + NVLink: 78 ms
• x86-based GPU system: 170 ms

Accelerated Compute with OpenPOWER and NVIDIA GPU
Huge speed-ups with GPUs and OpenPOWER!
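
For the ImageNet / AlexNet comparison (minibatch size = 128), the ratio of the two reported times gives the relative speed-up. This assumes both figures measure the same unit of work on each system, which the slide implies but does not state:

$$\frac{170\ \text{ms}}{78\ \text{ms}} \approx 2.18$$

That is, roughly 2.2x faster for the POWER8 + Tesla P100 + NVLink system on this single measurement.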

Mesos Supports GPUs
IBM Spectrum Conductor includes enhanced support for fine-grained GPU and CPU scheduling with Apache Spark and Docker
▪ Credit: Kevin Klues, Mesosphere

Introducing PowerAI to Get Started Fast
• Package of pre-compiled major deep learning frameworks
• Easy to install & get started with deep learning, with enterprise-class support
• Optimized for performance, to take advantage of NVLink
https://www.ibm.com/us-en/marketplace/deep-learning-platform
Enabled by High Performance Computing Infrastructure

NVIDIA Docker Available on OpenPOWER
• A Docker wrapper and tools to package and run GPU-based apps
• Uses drivers on the host
• Manual GPU assignment
• Good for single node
Credit: https://github.com/NVIDIA/nvidia-docker

DevOps for Accelerated Deep Learning on POWER
(Diagram: developer Jane; build servers on VMs; image registry / DockerHub; inference microservices in Docker containers; POWER bare-metal servers with GPUs for model training; shared data store)

Technique Increasing in Complexity
Convolutional Neural Networks (CNNs) are evolving

Caffe Framework Addresses Technique
With the right tools and plenty of examples:
• frontend: a language for any network, any task
• tools: visualization, profiling, debugging, etc.
• network: internal representation
• backend: dispatch compute for learning and inference
• layer library: fast implementations of common functions and gradients
From the Concrete to the Abstract

Caffe Overview
• DL framework developed by Berkeley AI Research (BAIR), formerly the Berkeley Vision and Learning Center (BVLC)
• Yangqing Jia created the project during his PhD at UC Berkeley
• Caffe is released under the BSD 2-Clause license: https://github.com/BVLC/caffe
• Strong developer/user community: 11,000+ forks, 248 contributors
• Written in C++ and CUDA; command-line, Python and MATLAB interfaces

Caffe Constructs
Model Development
• Blob: how you represent your data (N-D arrays)
• Layer: takes bottom blobs (input) to top blobs (output) and back, with weights and biases
  – Setup: initialize the layer and its connections
  – Forward: given input from the bottom, compute the output and send it to the top
  – Backward: given the gradient w.r.t. the top output, compute the gradient w.r.t. the input and send it to the bottom
• Net: a collection of layers organized as a directed graph, including a loss layer that defines the learning
Implementation/Optimization
• Solver: how you train your model, with methods that address the general optimization problem of loss minimization
• Uses Google's Protocol Buffers: you build a model schema and the code is built for you
Example model definition:
    name: "LogReg"
    layer {
      name: "mnist"
      type: "Data"
      top: "data"
      top: "label"
      data_param {
        source: "input_leveldb"
        batch_size: 64
      }
    }
    layer {
      name: "ip"
      type: "InnerProduct"
      bottom: "data"
      top: "ip"
      inner_product_param {
        num_output: 2
      }
    }
    layer {
      name: "loss"
      type: "SoftmaxWithLoss"
      bottom: "ip"
      bottom: "label"
      top: "loss"
    }

Caffe Use Cases and Model Zoo
• Primarily focused on vision but branching out: sequences, reinforcement learning, speech + text
• Used by many companies, including Facebook, Pinterest and Adobe, in production implementations at scale
• Provides reference models via the Model Zoo

Caffe Model Zoo
• Not just a static repository but a strong community where you can find model definitions, optimization parameters, solver configurations and pre-trained weights
• Table of contents includes over 40 areas of interest: https://github.com/BVLC/caffe/wiki/Model-Zoo
• Great starting point for a fast start or fine tuning

Fine Tuning Made Easy
A trained model that classifies objects into a defined number of categories may not be sufficient to classify objects into a new set of categories based on the learned features. Fine tuning is a way to copy the layers of features you want to keep and modify the model layers to learn new weights for the new categories.
Fine Tuning Tips
• Start with the last layer first
  – Caffe layers have local learning rates: param { lr_mult: 1 }
  – Freeze all but the last layer for fast optimization and to avoid early divergence; set lr_mult: 0 to fix a parameter
  – Stop if good enough, or keep fine-tuning
• Reduce the learning rate
  – Drop the solver learning rate by 10x or 100x
  – Preserve the initialization from pre-training and avoid divergence
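
To make the Caffe Constructs and Fine Tuning slides concrete, here is a minimal pure-Python sketch of the "LogReg" net (InnerProduct with num_output: 2, then SoftmaxWithLoss), where each layer has the Setup/Forward/Backward roles described above and a solver step scales each parameter's update by base_lr × lr_mult. It is a toy illustration, not pycaffe: the class and function names (InnerProduct, softmax_with_loss, sgd_step, train) are hypothetical, updates are per sample (batch size 1 rather than batch_size: 64), the solver is plain SGD without momentum or weight decay, and the data set is made up.

```python
import math

# Hypothetical toy mirror of the slide's "LogReg" net:
# Data -> InnerProduct ("ip", num_output: 2) -> SoftmaxWithLoss. Not pycaffe.


class InnerProduct:
    def __init__(self, num_input, num_output, lr_mult_w=1.0, lr_mult_b=1.0):
        # Setup: allocate weights and biases (zeros keep the example deterministic)
        self.W = [[0.0] * num_input for _ in range(num_output)]
        self.b = [0.0] * num_output
        # Per-parameter local learning-rate multipliers, as in param { lr_mult: ... }
        self.lr_mult = {"W": lr_mult_w, "b": lr_mult_b}

    def forward(self, x):
        # Forward: top = W x + b for one sample (bottom blob x)
        self.x = x
        return [sum(w * xi for w, xi in zip(row, x)) + bj
                for row, bj in zip(self.W, self.b)]

    def backward(self, top_diff):
        # Backward: store parameter gradients and return the gradient w.r.t. the bottom
        self.dW = [[td * xi for xi in self.x] for td in top_diff]
        self.db = list(top_diff)
        return [sum(self.W[j][i] * top_diff[j] for j in range(len(top_diff)))
                for i in range(len(self.x))]


def softmax_with_loss(scores, label):
    # Forward: numerically stable softmax followed by cross-entropy loss
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    loss = -math.log(probs[label])
    # Backward: d(loss)/d(scores) = probs - one_hot(label)
    diff = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    return loss, diff


def sgd_step(layer, base_lr):
    # Solver: each parameter moves by base_lr * lr_mult * gradient,
    # so lr_mult = 0 freezes that parameter (plain SGD, no momentum)
    lr_w = base_lr * layer.lr_mult["W"]
    lr_b = base_lr * layer.lr_mult["b"]
    for j, row in enumerate(layer.W):
        for i in range(len(row)):
            row[i] -= lr_w * layer.dW[j][i]
    for j in range(len(layer.b)):
        layer.b[j] -= lr_b * layer.db[j]


def train(data, base_lr=0.1, iters=50, lr_mult_w=1.0):
    # Returns the trained layer and the mean loss of each pass over the data
    ip = InnerProduct(num_input=2, num_output=2, lr_mult_w=lr_mult_w)
    losses = []
    for _ in range(iters):
        total = 0.0
        for x, label in data:
            loss, diff = softmax_with_loss(ip.forward(x), label)
            ip.backward(diff)
            sgd_step(ip, base_lr)
            total += loss
        losses.append(total / len(data))
    return ip, losses


if __name__ == "__main__":
    # Tiny made-up 2-D data set: class 0 near (1, 0), class 1 near (0, 1)
    data = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([0.9, 0.1], 0), ([0.2, 0.8], 1)]
    ip, losses = train(data)
    print("mean loss, first vs last pass:", losses[0], losses[-1])
    frozen, _ = train(data, lr_mult_w=0.0)
    print("weights with lr_mult 0 stay at their initial values:", frozen.W)
```

Running it, the mean loss falls across passes, while the run with lr_mult_w=0.0 leaves the weights at their initial zeros and only the biases learn. That is the same mechanism the Fine Tuning tips rely on to freeze all but the last layer. Lowering base_lr plays the role of dropping the solver learning rate by 10x or 100x.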

Why Caffe
• Active community
• Fast and efficient
• Plenty of reference models to build upon

Tutorial Steps
→ Connect to the NIMBIX public cloud
→ Image classification with Caffe and a trained model
→ Train a model on a small dataset with DIGITS
→ Classify images with your own trained model in DIGITS
https://www.slideshare.net/IndrajitPoddar/fast-scalable-easy-machine-learning-with-openpower-gpus-and-docker

In Summary
→ We reviewed the structure and advantages of the Caffe Deep Learning framework
→ We've used the NIMBIX Power container cloud running Docker containers with GPUs.
→ We built new Docker containers,
