Business Brief

Service Providers Data Center

Enable Artificial Intelligence Workloads in the Cloud

Cloud Service Providers can grow their businesses by offering or supporting artificial intelligence (AI) applications in the cloud. Discover how AI capabilities can be added to your infrastructure.

Industry Strategic Challenges

Artificial intelligence (AI) is enjoying a boom, with IDC predicting worldwide spending will reach USD 57.6 billion in 2021, a compound annual growth rate of 50.1 percent between 2016 and 2021¹. In recent years, AI has enabled breakthroughs in computer vision, speech recognition, and control systems, among other fields. These advances have been fundamental for solving real-life problems such as screening medical images, genomics analysis, and log analysis.

For next-wave cloud service providers (CSPs), competition is fierce. It is essential to be able to differentiate, and to compete with top-tier CSPs, who are already offering AI services in the cloud. By offering AI services that are integrated with the cloud stack, next-wave CSPs can enhance the value that they deliver to their clients and help them to solve real business problems using the data in the CSP's care.

Many businesses have huge data lakes, thanks to the growth of the Internet of Things (IoT) and the rise of the digital economy. Making sense of them can be challenging, but AI can help businesses to derive useful insights, so that they can improve their products, processes, and customer service.

In many cases, speed is essential for the application. A smart factory IoT workload, for example, may require a quick response to intervene in the event of a malfunction, and a public safety application would require a prompt response if an incident was detected on a security camera or using telemetry. A trading application in financial services would need to act quickly before the market moved. More routinely, an application for serving product recommendations would require a quick response to deliver a web page to an online customer without delay. In healthcare, AI can be used for analyzing genomics, CT scans, and MRI scans, and for carrying out brain tumor diagnoses. The faster the analysis, the faster the patient can get the care they need. Applications such as these can use AI to analyze incoming data and trigger the most suitable response or output. To achieve the required responsiveness, it is essential that AI applications are hosted on highly performant architecture.

At the same time, implementing the technology stack for AI can appear to be complex. Businesses may turn to CSPs for support with this, and CSPs in turn will be looking for a technology stack and a reference architecture that enables them to deliver flexible AI services, integrated with their existing automation and orchestration tools.

Business Drivers

CSPs can differentiate their business and better compete with top-tier CSPs by hosting AI applications. The challenge is to implement a flexible architecture that easily integrates with their existing cloud stack, and that meets the performance requirements of AI workloads. If CSPs are able to do that, they have an opportunity to host the most demanding customer workloads. They could also offer Artificial Intelligence as a Service (AIaaS), opening up a new market and helping to differentiate in the highly competitive marketplace.

There are two phases to creating and running an AI application (see Figure 1):

• Training, where large volumes of data are typically processed to create an AI model; and

• Inference, where the model is applied to real data to enable new insights or action to be taken.

For example, an image recognition application might be fed a database of labeled images for the training phase, which it would use to create an AI model. Afterwards, that model could be used to classify images as they are posted online, to help with content filtering. To enable inference workloads like this, the challenge is to offer the required performance in real time.

Many organizations already have their data stored in the cloud in Apache Hadoop* or Apache Spark*, and CSPs can enable their customers to derive more value from it by enabling it to be used for training or inference, without it being transferred to another system. By adding AI capabilities to the existing Hadoop/Spark stack, it is possible to increase utilization of that stack, while creating new commercial opportunities.

Figure 1. The training and inference processes used for creating and deploying artificial intelligence models. In step 1, training, which takes place in the data center over hours, days, or weeks, massive labeled or tagged data sets are used to create a trained "deep neural net" mathematical model; validating the trained network yields output classifications such as 90% person, 8% traffic light. In step 2, inference, which runs at the edge or in the data center and is near-instantaneous, new input data from cameras and sensors is passed through the trained model to produce an output classification, such as 97% person.
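To make the two steps in Figure 1 concrete, here is a minimal, illustrative sketch using the tf.keras API available in TensorFlow 1.6, the version used in the reference stack below. The model, the random stand-in data, and all dimensions are assumptions for illustration only, not part of the reference architecture.

```python
# Minimal training-then-inference sketch (illustrative only).
# Assumes TensorFlow 1.6+ with the tf.keras API; shapes are toy values.
import numpy as np
import tensorflow as tf

# --- Step 1: Training (data center; hours/days/weeks at real scale) ---
# Stand-in for a massive labeled data set: 1,000 28x28 grayscale images.
train_images = np.random.rand(1000, 28, 28).astype("float32")
train_labels = np.random.randint(0, 10, size=(1000,))

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # class probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_images, train_labels, epochs=2, batch_size=32)

# --- Step 2: Inference (edge or data center; near-instantaneous) ---
# New, unlabeled input, e.g. a frame from a camera or sensor feed.
new_image = np.random.rand(1, 28, 28).astype("float32")
probabilities = model.predict(new_image)  # cf. "97% person" in Figure 1
print("Predicted class:", probabilities.argmax(axis=1)[0])
```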

Customer Applications

Kubernetes* (KubeFlow*): orchestration using familiar tools

Intel-optimized version of TensorFlow* 1.6: TensorFlow framework for machine learning

Intel® Math Kernel Library (Intel® MKL): Intel® software to accelerate mathematical processes

IaaS-exposed Virtual Machines: Infrastructure as a Service layer

Ubuntu* 16.04: operating system

Intel® Xeon® Scalable processor: latest generation Intel® processors

Figure 2. The TensorFlow* software stack
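Performance on this stack depends heavily on how Intel MKL and OpenMP threading is configured. The sketch below shows tuning knobs commonly recommended for Intel-optimized TensorFlow 1.x on Intel Xeon Scalable processors; the specific thread counts are placeholder assumptions that should be tuned for each host.

```python
# Threading configuration sketch for Intel-optimized TensorFlow 1.x.
# The numeric values below are placeholders; tune them to your core count.
import os
import tensorflow as tf

# OpenMP/MKL environment settings commonly recommended for Intel CPUs.
os.environ["OMP_NUM_THREADS"] = "28"   # e.g. physical cores per socket
os.environ["KMP_BLOCKTIME"] = "1"      # release idle threads quickly
os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"

# TensorFlow session-level parallelism (TF 1.x API).
config = tf.ConfigProto(
    intra_op_parallelism_threads=28,  # threads within a single op
    inter_op_parallelism_threads=2,   # ops that may run concurrently
)

with tf.Session(config=config) as sess:
    a = tf.random_normal([2048, 2048])
    b = tf.random_normal([2048, 2048])
    c = tf.matmul(a, b)               # MKL-accelerated matrix multiply
    sess.run(c)
```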

Enabling Transformation

Through a combination of services, and optimized hardware and software, Intel is helping CSPs to meet the performance and scale requirements of AI, so they can better differentiate and compete in the CSP marketplace. Intel offers a reference architecture that enables CSPs to easily add AI capabilities to their cloud stack, and a software solution that enables CSPs to add AI to workloads stored in Spark/Hadoop.

AI frameworks that are optimized to take advantage of performance features in Intel® processors deliver enhanced performance, which helps to differentiate the CSP's capabilities and attract more demanding workloads.

The AI solutions proposed here can provide a CSP with a foundation for AIaaS, targeting industry sectors including genomics, financial services, and security. An optimized stack has been developed to meet the AI needs of a wide range of business sectors, with capabilities including image recognition, speech recognition, recommendation engines, object detection, language translation, text-to-speech conversion, image segmentation, image generation, adversarial networks, and reinforcement learning.

A reference architecture for machine learning in the cloud

Figure 2 shows a multilayered reference architecture that can be used to easily host and orchestrate AI workloads, integrated with the existing cloud stack. Kubernetes* can be used to orchestrate the AI workloads to the virtual machines. Intel® Xeon® Scalable processors are workload-optimized to support the most high-demand applications. The processors include Intel® Advanced Vector Extensions 512 (Intel® AVX-512), which accelerate AI by enabling a single instruction to process up to 512 bits of data simultaneously.

The reference architecture is based on TensorFlow*, a leading software framework that is often used for machine learning, a subset of AI where applications learn from data instead of being explicitly programmed. TensorFlow was created by Google, and Intel and Google have worked together to ensure that it is able to take advantage of hardware features in Intel® Xeon® processor-based platforms. TensorFlow has been optimized for more than 20 topologies that cover various usage models (see Figure 3). Alternative solutions can be used in place of TensorFlow in this layer of the stack, such as MXNet* or Caffe*, so this reference architecture helps to guide CSPs, whatever their preferred AI framework. CSPs can also use their choice of orchestrator, using OpenStack* in place of Kubernetes, for example.

Intel has optimized TensorFlow, Caffe, MXNet, and several other AI frameworks to take advantage of CPU features such as Intel AVX-512 to accelerate performance. Using the Intel-optimized version of MXNet, for example:

• Speeds up image classification performance by 24x using the Inception v3 topology²,³;

• Speeds up text translation by 4x using GNMT²,⁴;

• Accelerates object detection by 22x using SSD-VGG16²,⁵; and

• Speeds up generative adversarial networks by 35x using DCGAN²,⁶.
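Because the Intel-optimized builds are drop-in package swaps, picking up these gains typically requires no code changes. As an illustration, the hypothetical benchmark sketch below times Inception v3 inference with MXNet's Gluon model zoo; the same script runs against stock MXNet (pip install mxnet==1.3.0) or the Intel-optimized build (pip install mxnet-mkl==1.3.0), the comparison used in the footnoted results. The batch size of 128 mirrors that configuration; the loop count and the 299x299 input (the standard Inception v3 shape) are assumptions, and timings will vary by system.

```python
# Compare stock vs Intel-optimized MXNet by swapping the package:
#   pip install mxnet==1.3.0       (stock)
#   pip install mxnet-mkl==1.3.0   (Intel MKL-DNN accelerated)
# The script itself is unchanged; only the installed package differs.
import time
import mxnet as mx
from mxnet.gluon.model_zoo import vision

# Inception v3 with random weights; dummy data, as in the cited benchmarks.
net = vision.inception_v3(pretrained=False)
net.initialize(mx.init.Xavier())

batch = mx.nd.random.uniform(shape=(128, 3, 299, 299))  # batch size 128

net(batch).wait_to_read()  # warm-up pass

start = time.time()
for _ in range(10):
    net(batch).wait_to_read()  # block until the forward pass finishes
elapsed = time.time() - start
print("Images/sec: %.1f" % (10 * 128 / elapsed))
```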

Adding AI to data in an Apache Spark/Hadoop cluster

If you have an existing Apache Spark/Hadoop cluster, you can use Intel® BigDL, which enables your customers to build AI capabilities into their existing workloads, without any special-purpose hardware or new programming tools or frameworks. Excess capacity in the cluster can be deployed for AI workloads, increasing utilization and revenue opportunities.

Intel BigDL enables developers and big data analysts to write machine learning applications using standard Apache Spark programs written in Python* or Scala*. Apache Spark is used for distributed training and inference, and the solution scales out to thousands of servers. As a result, CSPs can offer machine learning performance at cloud scale, with a low barrier to entry.

Intel BigDL brings machine learning capabilities to your existing Apache Hadoop and Apache Spark platform and sits on top of it, as shown in Figure 4. The Intel Xeon Scalable processor delivers the performance required.
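As a sketch of what this looks like in practice, the hypothetical Python example below uses the BigDL 0.x API to train a small classifier as an ordinary Spark job: define a model from BigDL layers, wrap the training data as an RDD of BigDL Samples, and hand both to BigDL's Optimizer, which distributes training across the existing cluster. The feature sizes, model shape, and random stand-in data are illustrative assumptions.

```python
# Minimal distributed-training sketch with BigDL on Spark (BigDL 0.x API).
# Feature sizes and the data source are illustrative placeholders.
import numpy as np
from pyspark import SparkContext
from bigdl.util.common import init_engine, create_spark_conf, Sample
from bigdl.nn.layer import Sequential, Linear, ReLU, LogSoftMax
from bigdl.nn.criterion import ClassNLLCriterion
from bigdl.optim.optimizer import Optimizer, SGD, MaxEpoch

sc = SparkContext(conf=create_spark_conf())  # BigDL-aware Spark config
init_engine()                                # initialize the BigDL engine

# Toy stand-in for data already in the cluster: 10,000 labeled records.
train_rdd = sc.parallelize(range(10000)).map(
    lambda _: Sample.from_ndarray(
        np.random.rand(100).astype("float32"),       # 100 features
        np.array([float(np.random.randint(1, 3))])   # label: class 1 or 2
    ))

# A small feed-forward classifier built from BigDL layers.
model = Sequential().add(Linear(100, 32)).add(ReLU()) \
                    .add(Linear(32, 2)).add(LogSoftMax())

# Distributed training runs as an ordinary Spark job across the cluster.
optimizer = Optimizer(model=model,
                      training_rdd=train_rdd,
                      criterion=ClassNLLCriterion(),
                      optim_method=SGD(learningrate=0.01),
                      end_trigger=MaxEpoch(2),
                      batch_size=256)
trained_model = optimizer.optimize()
```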

Figure 4. The Intel® BigDL software stack. BigDL runs on Apache Spark*, orchestrated by YARN*, Mesos*, Kubernetes, or Apache Spark standalone, on a JVM with optional Python*. It can optionally run in virtual machines or containers (Docker*, KVM*, Xen*, and VMware*), on Linux* (Red Hat* and CentOS*), across racks of Intel® Xeon® Scalable processor-based servers connected by a 10GbE or 25GbE network with 10Gb or 25Gb switches.

Intel® BigDL for Image Analysis

JD.com, one of the leading online retailers in China, needed an application to support image extraction and picture deduplication. Intel® BigDL offered the scalability, model interoperability, and performance necessary to analyze and deduplicate hundreds of millions of images on Intel® Xeon® processor-based clusters.

Figure 3. TensorFlow* topologies:

• Image recognition: ResNet50, InceptionV3, Inception ResV2, InceptionV4, MobileNet, SqueezeNets, DenseNet

• Speech recognition: Deep Speech, Transformer

• Recommender systems: Wide & Deep

• Object detection and localization: SSD-VGG16, R-FCN, Faster-RCNN, YoloV2

• Language translation: GNMT, Transformer

• Text to speech: WaveNet

• Image segmentation: MaskRCNN, U-Net, 3D-Unet

• Image generation: DRAW

• Adversarial networks: DCGAN, 3DGAN

• Reinforcement learning: A3C

Conclusion

Offering AIaaS can enable CSPs to differentiate their offering and attract new business. Using the reference architecture outlined here, based on the Intel-optimized version of TensorFlow, it's possible to more easily offer customers machine learning capabilities with enhanced performance. For customers that already have data stored in an existing Apache Hadoop or Apache Spark platform, Intel BigDL enables machine learning capabilities to be added on top of the existing infrastructure.

Find the solution that is right for your organization. Contact your Intel representative or visit intel.com/cloud

Solution Ingredients for TensorFlow*

• Kubernetes* (KubeFlow*)
• TensorFlow 1.6
• Intel® Math Kernel Library (Intel® MKL)
• Ubuntu* 16.04
• Intel® Xeon® Scalable processor

Solution Ingredients for Intel® BigDL

• Intel® BigDL
• Apache Spark*
• Java* Virtual Machine
• YARN*/Mesos*/Kubernetes
• Virtual machine/containers infrastructure
• Red Hat* and CentOS* Linux*
• Intel® Xeon® Scalable processor
• Programming language: Python* or Scala*


1 IDC Spending Guide Forecasts Worldwide Spending on Cognitive and Artificial Intelligence Systems to Reach $57.6 Billion in 2021, September 2017.

2 Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: http://www.intel.com/performance. Performance results are based on testing as of December 6, 2018 and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure.

3 Tested by Intel as of December 6, 2018. System details: 2-socket Intel® Xeon® Platinum 8180 CPU @ 2.50GHz, 28 cores, HT ON, Turbo ON, total memory 192 GB (12 slots / 16 GB / 2666 MHz), CentOS Linux 7.5.1708 (Core), kernel 3.10.0-862.6.3.el7.x86_64. Deep learning framework: MXNet. BIOS: SE5C620.86B.0X.01.0117.021220182317, microcode version 0x200004d, MKL-DNN version v0.17.1. Topology: Inception v3, batch size=128, dataset: dummy data (3,244,244). Intel® Optimized MXNet version 1.3 (pip install mxnet-mkl==1.3.0) using FP32 precision vs. stock MXNet version 1.3 (pip install mxnet==1.3.0) using FP32 precision.

4 Tested by Intel as of December 6, 2018. System details as in footnote 3. Topology: GNMT, batch size=64, dataset: newstest2016, German->English. Intel® Optimized MXNet version 1.3 (pip install mxnet-mkl==1.3.0) using FP32 precision vs. stock MXNet version 1.3 (pip install mxnet==1.3.0) using FP32 precision.

5 Tested by Intel as of December 6, 2018. System details as in footnote 3. Topology: SSD-VGG16, batch size=224, dataset: dummy data (3,244,244). Intel® Optimized MXNet version 1.3 (pip install mxnet-mkl==1.3.0) using FP32 precision vs. stock MXNet version 1.3 (pip install mxnet==1.3.0) using FP32 precision.

6 Tested by Intel as of December 6, 2018. System details as in footnote 3. Topology: DCGAN, batch size=128, dataset: cifar10. Intel® Optimized MXNet version 1.3 (pip install mxnet-mkl==1.3.0) using FP32 precision vs. stock MXNet version 1.3 (pip install mxnet==1.3.0) using FP32 precision.

Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer, or learn more at https://www.intel.com/content/www/us/en/products/processors/xeon.html

Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804

All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.

Intel, the Intel logo, and Xeon are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.

* Other names and brands may be claimed as the property of others.

© Intel Corporation 0219/JS/CAT/PDF 338650-001EN