FPGAs and Machine Learning – Intel® FPGA Deep Learning Acceleration Suite – Execution on the FPGA (Model Optimizer & Inference Engine) • Intel® Agilex® FPGA • OneAPI
Intel Network & Custom Logic Group
Greg Nash, System Architect – Government, Aerospace & Military
Jim Moawad, Technical Solution Specialist – FPGA Acceleration
July 29, 2019 – Argonne Training Program on Extreme Scale Computing

Public Notices & Disclaimers

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No product or component can be absolutely secure.

Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. For more complete information about performance and benchmark results, visit http://www.intel.com/benchmarks.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/benchmarks.

Intel® Advanced Vector Extensions (Intel® AVX)* provides higher throughput to certain processor operations. Due to varying processor power characteristics, utilizing AVX instructions may cause a) some parts to operate at less than the rated frequency and b) some parts with Intel® Turbo Boost Technology 2.0 to not achieve any or maximum turbo frequencies. Performance varies depending on hardware, software, and system configuration and you can learn more at http://www.intel.com/go/turbo.
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.

Intel, the Intel logo, Intel Atom, Intel Core, Intel Nervana, Intel Optane, Intel Xeon, Iris, Movidius, OpenVINO, Pentium, Celeron, Stratix, Arria and the Stratix logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as property of others.
© 2019 Intel Corporation

Agenda
• Intel® and AI / Machine Learning
• Accelerate Deep Learning Using the OpenVINO Toolkit
• Deep Learning Acceleration with FPGA
  – FPGAs and Machine Learning
  – Intel® FPGA Deep Learning Acceleration Suite
  – Execution on the FPGA (Model Optimizer & Inference Engine)
• Intel® Agilex® FPGA
• OneAPI

Machine Learning Applies to Nearly Every Market
• Agriculture – Achieve higher yields & increase efficiency
• Energy – Maximize production and uptime
• Education – Transform the learning experience
• Government – Enhance safety, research, and more
• Finance – Turn data into valuable intelligence
• Health – Revolutionize patient outcomes
• Industrial – Empower truly intelligent Industry 4.0
• Media – Create thrilling experiences
• Retail – Transform stores and inventory
• Smart Home – Enable homes that see, hear, and respond
• Telecom – Drive network and operational efficiency
• Transport – Automated driving
A wide variety of market segments are incorporating AI and deep learning, which is increasing compute demands across the board.

AI & Machine Learning Is a Vast Field of Study
AI encompasses machine learning, which in turn encompasses deep learning. Within this field:
• Supervised learning – regression, classification, decision trees
• Unsupervised learning – clustering, data generation
• Deep learning – image processing, speech processing, natural language processing, recommender systems, adversarial networks
• Reinforcement learning
There is no one-size-fits-all approach to AI.

Data Is Driving the Desire for AI
Data deluge (2019)¹,²:
• Internet user – 25 GB per month
• Smart car – 50 GB per day
• Smart hospital – 3 TB per day
• Airplane data – 40 TB per day
• Smart factory – 1 PB per day
• City safety – 50 PB per day
Analytics curve, from hindsight to foresight:
• Descriptive analytics – hindsight
• Diagnostic analytics – operational insight
• Predictive analytics – foresight
• Prescriptive analytics – business forecast
• Cognitive analytics – act/adapt
Data is the driving force behind AI.
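As a concrete illustration of the unsupervised-learning branch named above, here is a minimal k-means clustering sketch in pure Python. The data points, the value k = 2, and the simple "first k points" initialization are all invented for illustration; they are not from the slides.

```python
def kmeans(points, k, iters=20):
    """Cluster 2-D points into k groups by iteratively refining centroids."""
    centroids = list(points[:k])  # simple deterministic init, for illustration only
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: (p[0] - centroids[i][0]) ** 2
                                        + (p[1] - centroids[i][1]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = (sum(p[0] for p in cluster) / len(cluster),
                                sum(p[1] for p in cluster) / len(cluster))
    return centroids, clusters

# Two obvious groups of points; no labels are given -- the structure is learned.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
        (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
centroids, clusters = kmeans(data, k=2)
```

Note that nothing in the data says which group a point belongs to; that is exactly what distinguishes this from the supervised (regression/classification) branch, which trains against known labels.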
1. Source: http://www.cisco.com/c/en/us/solutions/service-provider/vni-network-traffic-forecast/infographic.html
2. Source: https://www.cisco.com/c/dam/m/en_us/service-provider/ciscoknowledgenetwork/files/547_11_10-15-DocumentsCisco_GCI_Deck_2014-2019_for_CKN__10NOV2015_.pdf

Diverse Workloads Require Diverse Architectures
The future is a diverse mix of scalar, vector, matrix, and spatial (SVMS) architectures deployed in CPUs, GPUs, AI accelerators, FPGAs, and other accelerators.

AI Compute Guide
• CPU – Foundation for all workloads, including AI; Intel® DL Boost for inference & training acceleration
• Accelerators:
  – FPGA – Delivers real-time workflows with AI for optimized system performance
  – VPU – Energy-efficient edge AI for video/vision
  – GPU – Discrete & integrated, optimized for highly parallel workload acceleration
• NNP-I (inference) – Highly efficient multi-model inferencing for cloud, data center, and appliances at scale
• NNP-T (training) – Fastest time-to-train at scale, with high-bandwidth AI server connections for the most persistent, intense usage
• Multi-model – High performance across a broad set of existing and emerging DL models, for applications such as natural language processing, speech & text, recommendation engines, and video & image search & filtering
• AI+ – Combined workflows where use cases span AI plus additional workloads (e.g., immersive media: media + graphics on GPU, or L3 forwarding in FPGA)

Tame Your Data with a Robust Data Layer
Key questions: source(s)? structured? volume? durability? streaming? locality? governance? other?
• Ingest – Tools for live streaming data ingestion from Internet of Things (IoT) sensors in endpoint devices
• Store – File-, block-, or object-based storage solution given cost, access, volume, and performance requirements
• Process – Integration, cleaning, normalization, and other transformations on batch and/or streaming data
• Analyze – Applications in HPC, Big Data, HPDA, AI & more that have access to a common compute and data pool
Visit: intel.com/content/www/us/en/analytics/tame-the-data-deluge.html

Choose Any Approach from Machine to Deep Learning
Visit: software.intel.com/ai/courses
The landscape spans classical statistical methods through neural nets:
• Supervised learning
  – Classification: SVM, nearest neighbor, naïve Bayes, discriminant analysis, neural nets
  – Regression: GPR, SVR, linear, logistic, decision trees, ensemble methods, neural nets
• Unsupervised learning
  – Clustering: GMM, k-means, hierarchical, hidden Markov models, neural nets
• Reinforcement learning
Visit: www.intel.ai/technology

Speed Up Development Using Open AI Software
Toolkits (app developers):
• Open source platform for building end-to-end analytics & AI applications on Apache Spark* with distributed TensorFlow*, Keras*, BigDL
• Deep learning inference deployment on CPU/GPU/FPGA/VPU for Caffe*, TensorFlow*, MXNet*, ONNX*, Kaldi*¹
• Open source, scalable, and extensible distributed deep learning platform built on Kubernetes (BETA)
Frameworks (data scientists):
• Python: Scikit-learn, Pandas, NumPy
• R: Cart, Random Forest, e1071
• Distributed: MlLib (on Spark), Mahout
• Intel-optimized frameworks, with more framework optimizations underway, including PaddlePaddle*, Chainer*, CNTK* & others
Libraries (library developers):
• Intel® Distribution for Python* – Intel distribution optimized for machine learning & data analytics
• Intel® Data Analytics Acceleration Library (DAAL) – High-performance, open source machine learning & data analytics library
• Intel® Math Kernel Library for Deep Neural Networks (MKL-DNN) – Open source DNN functions for CPU / integrated graphics
• Open source compiler for deep learning model computations, optimized for multiple devices (CPU, GPU, NNP) from multiple frameworks (TF, MXNet, ONNX)
1. An open source version is available at: 01.org/openvinotoolkit
*Other names and brands may be claimed as the property of others.
Developer personas shown above represent the primary user base for each row, but are not mutually exclusive. All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice.
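The deck's FPGA flow centers on two OpenVINO stages: the Model Optimizer converts a trained model into an intermediate representation, and the Inference Engine executes it on the chosen device (CPU/GPU/FPGA/VPU). What any such engine ultimately runs is layers of multiply-accumulate followed by activation functions. The pure-Python sketch below shows that arithmetic for a toy two-layer network; it is not OpenVINO API code, and the weights, sizes, and layer choices are made up for illustration.

```python
import math

def dense(x, weights, biases):
    """One fully connected layer: y = W.x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(v):
    """Rectified linear activation, applied element-wise."""
    return [max(0.0, a) for a in v]

def softmax(v):
    """Turn raw scores into probabilities (max-shifted for stability)."""
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    s = sum(exps)
    return [e / s for e in exps]

# Toy 2-input, 2-hidden, 2-class network; a trained model would supply these.
W1 = [[0.5, -0.2], [0.3, 0.8]]
b1 = [0.1, -0.1]
W2 = [[1.0, -1.0], [-1.0, 1.0]]
b2 = [0.0, 0.0]

def infer(x):
    """Forward pass: dense -> relu -> dense -> softmax."""
    hidden = relu(dense(x, W1, b1))
    return softmax(dense(hidden, W2, b2))

probs = infer([1.0, 0.5])  # class probabilities for one input vector
```

In a real deployment these loops are what the Inference Engine maps onto vector units, GPU threads, or FPGA DSP blocks; the FPGA's appeal is that the multiply-accumulate datapath can be laid out spatially to match the network.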