Simón Viñals Larruga Corporation Feb 2017

Legal Disclaimers

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at [intel.com] or at https://www-ssl.intel.com/content/www/us/en/high-performance-computing/path-to-.html.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.

Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.

Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase.
For more complete information about performance and benchmark results, visit http://www.intel.com/performance. 3D XPoint, Intel, the Intel logo, Intel. Experience What’s Inside, the Intel. Experience What’s Inside logo, Intel Phi, Optane, and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other names and brands may be claimed as the property of others. © 2016 Intel Corporation. All rights reserved.

2 What are the growing challenges in HPC?

• “The Walls” (System Bottlenecks): Memory | I/O | Storage; Energy Efficient Performance; Space | Resiliency | Unoptimized Software
• Divergent Infrastructure: Resources Split Among Modeling and Simulation | Big Data Analytics | Machine Learning | Visualization
• Barriers to Extending Usage: Democratization at Every Scale | Cloud Access | Exploration of New Parallel Programming Models

[Diagram labels: HPC Optimized; Big Data; HPC; Machine Learning; Visualization]

The “walls”, divergent usages, and “democratization” are the top issues

3 What is required to deal with these growing challenges?

Innovative Technologies | Tighter Integration (System) | Modernized Code (Application)

[Diagram labels: Cores, Graphics, FPGA, Compute, Memory, Fabric, Storage, I/O, System Software; application software: Community, ISV, Proprietary. Axes: Performance, Capability, Time]

A “holistic” approach is needed…

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. *Other names and brands may be claimed as the property of others. All products, dates, and figures are preliminary and are subject to change without any notice. Copyright © 2016, Intel Corporation. Data Center Group. Intel Confidential | NDA Required

4 Fuel Your Insight: Intel® Scalable System Framework

Small Clusters Through
Compute and Data-Centric Computing | Standards-Based Programmability | On-Premise and Cloud-Based

• Compute: Intel® Xeon® Processors; Intel® ™ Processors; Intel® Xeon Phi™; Intel® Server Boards and Platforms
• Memory/Storage: Intel® Solutions for Lustre*; Intel® Optane™ Technology; 3D XPoint™ Technology; Intel® SSDs
• Fabric: Intel® Omni-Path Architecture; Intel® True Scale Fabric; Intel® Ethernet; Intel® Silicon Photonics
• Software: Intel® HPC Orchestrator; Intel® Software Tools; Intel® Cluster Ready Program; Intel Supported SDVis

5 Intel® SSF Market Momentum

HPE/Intel HPC Alliance Project Azimuth Innovation Centers HPC Solutions Frameworks Dell HPC System Portfolio – Stuttgart, Germany – Beijing, China

Collaboration Partners Oil & Gas Life Sciences Finance Genomics Manufacturing Research Dell HPC Innovation Lab – University of Oxford Centers of Excellence – Barcelona Supercomputing – Grenoble, France University of Cambridge Centre – Houston, Texas, USA

6 Intel® SSF rapid adoption

Intel® SSF Design Guidance Simplifies… System Design and Build | Software Development | Procurement, Deployment, Management

Coming Q1’16:
• Reference Architectures: designs for compatibility
• Reference Designs: system build recipes
• Validation Tools: streamlined testing

Public statements of adoption since April ’15

7 Other Key Customer Determinants

Nvidia* GPU | Intel® Xeon Phi™ Processor
Proprietary CUDA* programming | Open-standards based programming
Lack of code flexibility, portability | Runs workloads
Data offloading bottlenecks | No PCIe bottlenecks
Greater system complexity | HPC-optimized (integrated memory, fabric)
Higher power requirements | Lower power
 | Large memory footprint
 | Future-ready (AVX-512, ecosystem and long-term roadmap)

As a host processor that runs x86 code, Intel® Xeon Phi™ is much more than an accelerator

8 Introducing the Intel® Xeon Phi™ Processor

1st Integrated Fabric | 1st Host CPU for Highly-Parallel Apps | 1st Integrated Memory

Leadership performance … with all the benefits of a CPU

Up to 5x Perf*, up to 8x Perf/W*, up to 9x Perf/$* vs. GPU Accelerator

• No PCIe Bottleneck
• Run x86 Workloads
• Programmability
• Large Memory Footprint
• Power Efficient
• Scalability & Future-Ready

*Intel measured results as of April 2016; see speakers notes for full configuration and performance disclaimers

9 Intel® Xeon Phi™ Processor: Your Path to Deeper Insight
A Foundational Element of Intel® Scalable System Framework

• Solve Biggest Challenges Faster: Highly-Parallel; Eliminate Bottlenecks; Scalability
• Realize Compelling Value: Power Efficiency; Programmability; High Utilization
• Maximize Future Potential: Future-Ready Code; Broad Ecosystem; Robust Roadmap

For discovery and business innovation in science, visualization & analytics

10 For Discovery and Business Innovation in Science, Visualization & Analytics

• Life Sciences – Genomics / Sequencing
• Financial – Risk
• Weather
• Energy – Seismic/Reservoir
• Scientific Simulation, CAE & CFD
• Defense / Security
• Big Data Analytics / Machine Learning
• Visualization / Professional Rendering

and other emerging usages…

*See the Intel® Xeon Phi™ application showcase for examples of workloads that are most suitable

11 Proof Points and Applications Speed Ups: Verticals Snapshot

Intel® Xeon Phi™ Processor proof points1:

Vertical | Peak speed up | Average speed up
Financial Services | Up to 6.48X | 3.45X
Manufacturing | Up to 3.65X | 1.86X
Material Sciences | Up to 3.3X | 1.96X
Geophysics | Up to 2.8X | 2.17X
Life Sciences | Up to 2.66X | 1.74X
Physics | Up to 2.44X | 2X
Climate and Weather | Up to 2.1X | 1.46X

Various applications compared to * GPU: 2.17X average speed up

1 - Performance demonstrated in proof points in this presentation

12 Intel Inside. Software Optimization Outside on the Intel® Xeon Phi™ Product Family

Experts from Allinea*, Altair*, Convergent Science*, Kitware*, and LSTC* share some of the use cases and explore the significant advantages of running their applications on the Intel® Xeon Phi™ product family. See what the Intel Xeon Phi Processor can do for key software applications.

13 More Intel® Xeon Phi™ Processor Software Enablement
http://itpeernetwork.intel.com

• Optimizing Automotive Designs with Intel and Altair*
• Momentum Grows for Intel Scalable System Framework
• Incredible Machine Learning Advancements Made Possible by Intel and QCT*: The Viscovery Use Case
• The Next Giant Leap in Adaptive Supercomputing* – The Intel Xeon Phi Processor
• Next-Generation Intel HPC Fabric Takes Flight

14 Intel® Xeon Phi™ processor statements of support

“As we continue to optimize our RADIOSS* solution to best embrace the many cores of Xeon Phi for a release later this year, we’re excited to deliver new levels of value beyond application performance with our new node license model called HyperWorks Unlimited Solver Node*.”
Piush Patel, VP of Corporate Development, Altair

“LSTC* is working closely with Intel to evaluate the KNL platform and is exploring support in an upcoming release of LS-DYNA*”
Marsha Victory, Marketing Director, LSTC

“We believe that the Knights Landing architecture has great potential for our customers, and we look forward to fully embracing this exciting new Xeon family member in our future product releases”
Dr. Wim Slagter, Director, HPC & Cloud Marketing, ANSYS

“……The Intel® Xeon Phi™ processor is a great step forward and provides awesome performance for molecular simulations with GROMACS”
Eric Lindahl, of KTH* and Stockholm University*, GROMACS* Project Leader

“Paradigm is evaluating Intel’s next generation Xeon Phi platform as part of our current technology partnership and we are working with Intel to best take advantage of Intel’s evolving platform for our products.”
Somesh Singh, Chief Product Officer, Paradigm

“…..The Intel® Xeon Phi™ processor is at the forefront of CPU architectures poised to open the door to Exascale systems…”
Didier Juvin, Program Director, CEA

“These achievements are enabling the LAMMPS user community to overcome barriers in computational modeling, enabling new research with larger simulation sizes and longer timescales”
Steve Plimpton, Sandia National Laboratories

“We’re looking forward to delivering solutions to market that take advantage of this many core platform to deliver improved experiences to our users”
Michael Russel, Senior Manager HPC Cloud Services Automotive, Autodesk

“SIMULIA* is working with Intel to evaluate the KNL platform and is exploring support in an upcoming release”
Matt Dunbar, SIMULIA R&D Senior Director

15 A Growing Ecosystem: Developing Today on Intel® Xeon Phi™ processors and Coprocessors intel.com/xeonphi/partners

16 A Growing Ecosystem: Developing Today on Intel® Xeon Phi™ processors and Coprocessors intel.com/xeonphi/partners

17 A Growing Ecosystem: Developing Today on Intel® Xeon Phi™ processors and Coprocessors intel.com/xeonphi/partners

18 Intel® Parallel Computing Centers Community intel.com/xeonphi/partners

Collaborating to accelerate the pace of discovery

19 Intel® Parallel Computing Centers Community intel.com/xeonphi/partners

Collaborating to accelerate the pace of discovery

20 Intel® Xeon Phi™ KNL Architecture Overview

• Self-Boot Processor: binary-compatibility with Xeon, 3+ TFLOPS1 (DP)
• On-package memory (MCDRAM): 16GB, 490 GB/s STREAM TRIAD
• Platform Memory: up to 384GB (6ch DDR4-2400 MHz)
• I/O: x4 DMI2 to PCH; 36 Lanes PCIe* Gen3 (x16, x16, x4)

Other Key Features
• 2D Mesh Architecture
• Out-of-Order Cores
• 3X Single-Thread vs. KNC
• Intel® AVX-512 Instructions
• Scatter/Gather Engine
• Integrated Fabric - OPA

TILE (up to 36): 2 Cores, each with 2 VPU; 1MB L2; HUB
Enhanced Intel® ™ cores based on Microarchitecture

[Package diagram labels: KNL Package, MCDRAM, DDR4, Tile, EDC (Embedded DRAM Controller), IMC (Integrated Memory Controller), IIO (Integrated I/O Controller)]

1Theoretical peak performance
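
The "3+ TFLOPS (DP)" theoretical peak can be sanity-checked from figures that appear in this deck: up to 72 cores, 2 VPUs per core, and 512-bit vectors. The sketch below is a rough estimate only. It assumes each VPU lane retires one fused multiply-add (2 floating-point operations) per cycle, and it uses a hypothetical 1.4 GHz clock; neither value is stated in the slides.

```python
def peak_dp_gflops(cores, vpus_per_core, vector_bits, clock_ghz, flops_per_lane=2):
    """Theoretical peak double-precision GFLOPS.

    flops_per_lane=2 assumes one fused multiply-add per lane per cycle
    (an assumption, not a figure from the deck).
    """
    dp_lanes = vector_bits // 64  # 64-bit doubles per vector register
    flops_per_cycle = cores * vpus_per_core * dp_lanes * flops_per_lane
    return flops_per_cycle * clock_ghz


# 72 cores, 2 VPUs per core and 512-bit vectors come from the slides;
# the 1.4 GHz clock is a hypothetical value chosen for illustration.
ASSUMED_CLOCK_GHZ = 1.4
peak = peak_dp_gflops(cores=72, vpus_per_core=2, vector_bits=512,
                      clock_ghz=ASSUMED_CLOCK_GHZ)
print(f"Estimated peak: {peak / 1000:.2f} TFLOPS (DP)")
```

Under these assumptions the estimate lands above 3 TFLOPS, in line with the slide's claim. A different clock scales the result linearly.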

21 Intel® Xeon Phi™ Processor: A Highly-Parallel CPU that Transcends GPU Accelerators

• No PCIe Bottleneck: bootable host processor
• Topple Memory Wall: integrated 16GB memory
• Run x86 Workloads: Intel® Xeon® processor binary-compatible
• Scale Out Seamlessly: efficient scaling like Intel® Xeon® processors
• Reduce Cost: dual-port Intel® Omni-Path Fabric
• Raise Memory Ceiling: platform memory up to 384 GB (DDR4)

[Diagram labels: Processor Package; Bootable Host CPU; Integrated Fabric; tile of two Cores with 2 VPU each, 1 MB L2, HUB]

1Reduced cost based on Intel internal estimate comparing cost of discrete networking components with the integrated fabric solution

22 Intel® Xeon Phi™ Product Family x200

• Intel® Xeon Phi™ Processor x200: Host Processor in Groveport Platform. Self-boot Intel® Xeon Phi™ processor. Also shown with integrated Intel® Omni-Path Fabric.
• Intel® Xeon Phi™ x200: Ingredient of Grantley Platforms. Requires Intel® Xeon® processor host.

23 How can I get higher performance & TCO for my apps?

Performance is being left on the table

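[Chart: performance of four code versions across CPU generations (4C, 4C, 6C, 8C, 12C, 14C, 22C); the gap is annotated 152x, and "Intel believes most codes are here" marks the lower curves]
• VP = Vectorized & Parallelized (MT)
• SP = Scalar & Parallelized (MT)
• VS = Vectorized & Single-Threaded (ST)
• SS = Scalar & Single-Threaded (ST)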

Modernization (i.e. parallelization and vectorization) of your code is the solution

Source: Intel measured as of Q3 2014. Configuration details: please see slide speaker notes. For more information go to http://www.intel.com/performance. For configuration details, see slide 122

24 Solve Biggest Challenges Faster Highly-Parallel

Intel® Xeon® processors are increasingly parallel and require modern code.

[Chart: performance by CPU generation (2011-2016) for Vectorized & Parallelized, Scalar & Parallelized, Vectorized & Single-Threaded, and Scalar & Single-Threaded code; Vectorized & Parallelized reaches >100x*]

Intel® Xeon Phi™ processors are extremely parallel and use general purpose programming: up to 72 cores (288 threads) and 512-bit vectors (V[512]) with Intel® Advanced Vector Extensions 512 (AVX-512).

*Binomial Options DP simulation performed on Intel® Xeon® processor X5570 (formerly codenamed Nehalem), Intel® Xeon® processor x5680 (formerly codenamed Westmere), and Intel® Xeon® processor E5 2600 families v1 through v4 for 4 sets of code with varying levels of vectorization and threading optimization

25 What platform should I use for code modernization?

The world is going parallel – stick with sequential code and you will fall behind.

 | Intel® Xeon® Processor E5-2600 v3 Product Family (formerly codenamed Haswell) | Intel® Xeon Phi™ x100 Product Family (formerly codenamed Knights Corner) | Intel® Xeon® Processor E5-2600 v4 Product Family (codenamed Broadwell) | Intel® Xeon Phi™ x200 Product Family (codenamed Knights Landing) | Future, tba (codenamed … Skylake)
Cores | 18 | 61 | 22 | 72 | 28
Threads/Core | 2 | 4 | 2 | 4 | 2
Vector Width | 256-bit | 512-bit | 256-bit | 512-bit (x2) | 512-bit (x2)
Peak Memory Bandwidth | 68 GB/s | 352 GB/s | 77 GB/s | >500 GB/s | 228 GB/s

Both Xeon and KNL are suitable platforms; KNL provides higher scale & future code readiness. Single investment across KNL & Xeon (Vs two with Nvidia*/CUDA) for HPC workloads.

26 What is KNL’s differentiation Vs SKX (Skylake)?

Feature | 2S-Skylake | KNL | ~KNL/SKX Ratio
Cores | Up to 56 | Up to 72 | +
Threads/core | 2 | 4 | 2x
Total Threads | 112 | 288 | 2.5x
HBM (High Bandwidth Memory) | None | 16 GB | +
Peak Memory BW | 128 GB/s | >500 GB/s | 4x
Intel® AVX-512 ER, PF | No | Yes | +
Intel® OPA Ports | 1+1 | 2 | + (more/CPU)
Power TDP | 410 W | 245 W | 0.6x

KNL provides higher scale and a couple more HPC-oriented AVX-512 features at much lower power
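
The ratio column of the comparison can be reproduced from the raw figures on the slide. The sketch below is a minimal worked example; the dictionary keys are illustrative names, and the KNL bandwidth is entered as 500 GB/s even though the slide states ">500 GB/s", so the bandwidth ratio is a lower bound.

```python
# Figures as listed on the slide (2-socket Skylake vs. one KNL).
skx = {"cores": 56, "threads_per_core": 2, "peak_bw_gbs": 128, "tdp_w": 410}
knl = {"cores": 72, "threads_per_core": 4, "peak_bw_gbs": 500, "tdp_w": 245}


def total_threads(cpu):
    """Hardware threads = cores x threads per core."""
    return cpu["cores"] * cpu["threads_per_core"]


ratios = {
    "total_threads": total_threads(knl) / total_threads(skx),
    "peak_bw": knl["peak_bw_gbs"] / skx["peak_bw_gbs"],  # lower bound (>500 GB/s)
    "tdp": knl["tdp_w"] / skx["tdp_w"],
}

for name, value in ratios.items():
    print(f"{name}: {value:.2f}x")
```

The computed thread ratio is slightly above the slide's approximate 2.5x, which is consistent with the "~" in the column header.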

27 FPGA Vs KNL

FPGA: Reprogrammable accelerators that benefit certain algorithms that can be parallelized and pipelined such as packet processing, signal processing, or image processing.
• Requires host processor, such as Intel Xeon
• Uses a different source code and programming model than Xeon, requiring greater code investment
• Targeted for enterprise workloads such as video, compression/decompression, and some genomics applications.

Xeon Phi Processor: bootable host processor for highly-parallel applications that require performance that achieves maximum FLOPs.
• Is already a host processor
• Uses common source code and programming model as Xeon, making it a single code investment on IA
• Targeted for HPC applications such as life sciences, energy, weather, etc.

28 When is KNL optimal versus Xeon†?

Why Xeon Phi?
• Improve Performance -AND/OR- Improve ROI
• Unlock Potential

Which Apps?†
• Scalable to >60 cores AND (Heavily Vectorized -OR- Local memory BW bound)?
• If yes… Optimized for Highly-Parallel Applications
• If no… Commonly-Used Parallel Processor*

†Performance results on Xeon Phi will vary depending on app characteristics. For more information, see: https://software.intel.com/sites/default/files/article/383067/is-xeon-phi-right-for-me.pdf

KNL is optimal for apps that scale to >60 cores & are vectorized or memory BW bound
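
The selection rule on this slide can be written as a small predicate. This is an illustrative sketch rather than an Intel tool: the `AppProfile` fields and the function name are hypothetical, and the 60-core threshold is taken directly from the slide.

```python
from dataclasses import dataclass


@dataclass
class AppProfile:
    """Hypothetical summary of an application's scaling behaviour."""
    scalable_cores: int        # core count the app is known to scale to
    heavily_vectorized: bool   # makes good use of wide vector units
    memory_bw_bound: bool      # limited by local memory bandwidth


def knl_is_optimal(app, core_threshold=60):
    """Slide rule: scales to >60 cores AND (vectorized OR memory-BW bound)."""
    scales = app.scalable_cores > core_threshold
    return scales and (app.heavily_vectorized or app.memory_bw_bound)


stencil = AppProfile(scalable_cores=128, heavily_vectorized=False, memory_bw_bound=True)
legacy = AppProfile(scalable_cores=16, heavily_vectorized=True, memory_bw_bound=False)

print(knl_is_optimal(stencil))  # scales and is bandwidth bound
print(knl_is_optimal(legacy))   # does not scale past the threshold
```

As the footnote on the slide says, actual results vary with application characteristics, so a predicate like this is only a first filter before measuring.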

† Xeon = Intel® Xeon® processor

29 Positioning Statement by Product Family*

Xeon
• Parallel, Fast Serial Multicore + Vector Leadership Today and Tomorrow
• Ideal for servers running widely diverse workloads
• Services Compute, Storage, and Network
• Excellent Single and Multi-Thread Performance
• Broadest base of programming options
• Broadest ecosystem of applications

Xeon Phi (KNL)
• Optimized for highly parallel, highly vectorized and heavily threaded applications
• Applications requiring improved performance that achieves maximum FLOPS
• Host processor able to run diverse, highly-parallel applications (also available as a co-processor)
• A growing base of industry standard HPC programming options (e.g. C, C++, Fortran, etc.)
• A growing ecosystem of parallelized commercial applications

*Covers products through 2017

30 Positioning Statement by Product Family* (cont’d)

Xeon+FPGA (MCP)
• Integrated host processor with memory coherency between CPU and FPGA
• Reconfigurable hardware that cooperates with the Xeon CPU providing a heterogeneous compute solution for optimization of workloads with routine algorithms
• Enables a reconfigurable number of execution units for customized workloads
• Improved Performance/W for custom, evolving or repeating workloads
• Direct high bandwidth interfaces to networking and storage
• Programming options include RTL and OpenCL

Xeon and FPGA (Discrete)
• Range of FPGA sizes for different workloads and power budgets
• Range of interface options for choices in IO bandwidth, protocols and configurability
• Option for FPGA direct attach memory for lower latency and improved inline workload performance
• Providing option to deploy multiple FPGAs per node

*Covers products through 2017

31 Intel® Xeon® Processor E7 - Workload Alignment

[Matrix of workloads by segment; each cell is rated Very Applicable, Applicable, or Less Common]
• Business Processing: OLTP; File & Print; Email; ERP; CRM; Application Servers
• Analytics: Data Analysis & Mining; Big Data Analytics; Machine Learning - Training; Machine Learning - Evaluation; Financial - Trading; Financial - Risk
• Scientific: Simulation/CAE & CFD; CAD; Life Sciences and Genomics; Molecular Dynamics; Energy – Seismic/Reservoir; Weather; Defense Security
• Cloud Services: Front End Web; Data Caching; Search; VDI (Clients)
• Visualization & Audio: Media Delivery and Transcode; Remote Visualization; Remote Gaming; Image & Video Analytics; Speech & Audio
• Comms: Wired Networking; Packet Processing; Virtual Switching; Network Security; Wireless Access; Wireless Core
• Storage: HOT (Analytics; Business Processing); WARM (Cloud/Object Storage; Active-Archive); COLD (Archive/Compliance; Backup/Recovery; Disaster Recovery)

32 Intel® Xeon® Processor E5 – Workload Alignment

[Matrix of workloads by segment; each cell is rated Very Applicable, Applicable, or Less Common]
• Business Processing: OLTP; File & Print; Email; ERP; CRM; Application Servers
• Analytics: Data Analysis & Mining; Big Data Analytics; Machine Learning - Training; Machine Learning - Evaluation; Financial - Trading; Financial - Risk
• Scientific: Simulation/CAE & CFD; CAD; Life Sciences and Genomics; Molecular Dynamics; Energy – Seismic/Reservoir; Weather; Defense Security
• Cloud Services: Front End Web; Data Caching; Search; VDI (Clients)
• Visualization & Audio: Media Delivery and Transcode; Remote Visualization; Remote Gaming; Image & Video Analytics; Speech & Audio
• Comms: Wired Networking; Packet Processing; Virtual Switching; Network Security; Wireless Access; Wireless Core
• Storage: HOT (Analytics; Business Processing); WARM (Cloud/Object Storage; Active-Archive); COLD (Archive/Compliance; Backup/Recovery; Disaster Recovery)

Legend notes:
• Subset of applications for this workload value high frequency, low core count E5 SKUs. May also use additional accelerators or scale out clusters.
• High core counts: E5 high core count SKUs very applicable

33 Intel® Xeon Phi™ - What are Target Usages? Workload Alignment Overview

[Matrix of workloads by segment; each cell is rated Very Applicable, Applicable, or Less Common]
• Business Processing: OLTP; File & Print; Email; ERP; CRM; Application Servers
• Analytics: Data Analysis & Mining; Big Data Analytics; Machine/Deep Learning - Training; Machine/Deep Learning - Evaluation; Financial - Trading; Financial - Risk
• Scientific: Simulation/CAE & CFD; CAD; Life Sciences – Genomics/Sequencing; Life Science - Molecular Dynamics; Energy – Seismic/Reservoir; Weather; Defense/Security
• Cloud Services: Front End Web; Data Caching; Search; VDI (Clients)
• Visualization & Audio: Media Delivery and Transcode; Remote Visualization; Remote Gaming; Image & Video Analytics; Speech & Audio; Scientific Visualization; Professional Rendering
• Comms: Wired Networking; Packet Processing; Virtual Switching; Network Security; Wireless Access; Wireless Core
• Storage: HOT (Analytics; Business Processing); WARM (Cloud Storage; Object Storage; Active-Archive); COLD (Archive/Regulatory Compliance; Backup/Recovery; Disaster Recovery)

34 FPGA Workload Alignment

[Matrix of workloads by segment; each cell is rated Very Applicable, Applicable, or Less Common]
• Business Processing: OLTP; File & Print; Email; ERP; CRM; Application Servers
• Analytics: Data Analysis & Mining; Big Data Analytics; Machine Learning - Training; Machine Learning - Evaluation; Financial - HFT; Financial - Risk
• Scientific: Simulation/CAE & CFD; CAD; Life Sciences and Genomics; Molecular Dynamics; Energy – Seismic; Weather; Defense Security
• Cloud Services: Front End Web; Data Caching; Search; VDI (Clients)
• Visualization & Audio: Media Delivery and Transcode; Remote Visualization; Remote Gaming; Image & Video Analytics; Speech & Audio
• Comms: Wired Networking; Packet Processing; Virtual Switching; Network Security; Wireless Access; Wireless Core
• Storage: HOT (Analytics; Business Processing); WARM (Cloud/Object Storage; Active-Archive); COLD (Archive/Compliance; Backup/Recovery; Disaster Recovery)

35 Solve Biggest Challenges Faster: Eliminate Bottlenecks

• Bootable host: No PCIe* Dependency
• Memory: Integrated (MCDRAM) & Platform (DDR4)
• Fabric: Integrated on-package Intel® Omni-Path Fabric

[Diagram labels: Cost1, Power1, Density1, Memory Capacity, Memory Bandwidth2]

1 Reduced cost, power and increased density based on Intel internal estimate comparing discrete networking components with the integrated fabric solution
2 Sustained memory bandwidth (STREAM) up to 490GB/s using MCDRAM compared with only 90GB/s with DDR4 platform memory
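
To make the 490 GB/s versus 90 GB/s comparison concrete, the sketch below shows the STREAM TRIAD kernel in plain Python and estimates how long one pass would take at each sustained bandwidth. It uses the usual STREAM accounting of three 8-byte array accesses per element (two reads and one write). The array length is an arbitrary illustrative choice, and the timings are idealized estimates, not measurements.

```python
def triad(b, c, scalar):
    """STREAM TRIAD kernel: a[i] = b[i] + scalar * c[i]."""
    return [bi + scalar * ci for bi, ci in zip(b, c)]


def triad_bytes(n, bytes_per_element=8):
    """Bytes moved by one TRIAD pass: read b, read c, write a."""
    return 3 * n * bytes_per_element


def seconds_per_pass(n, bandwidth_gbs):
    """Idealized time for one pass at a sustained bandwidth in GB/s."""
    return triad_bytes(n) / (bandwidth_gbs * 1e9)


N = 500_000_000           # illustrative array length (12 GB across 3 arrays)
MCDRAM_GBS = 490          # sustained STREAM figure from the slide
DDR4_GBS = 90             # sustained STREAM figure from the slide

t_mcdram = seconds_per_pass(N, MCDRAM_GBS)
t_ddr4 = seconds_per_pass(N, DDR4_GBS)
print(f"MCDRAM: {t_mcdram * 1e3:.1f} ms, DDR4: {t_ddr4 * 1e3:.1f} ms, "
      f"ratio: {t_ddr4 / t_mcdram:.1f}x")
```

For a kernel that is purely bandwidth bound, the time ratio equals the bandwidth ratio, which is why memory-bound codes are singled out as good candidates for MCDRAM.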

36 Financial Services: Up to 6.5x higher performance

[Chart: normalized performance, higher is better; the highest bar is 6.5 against a 1.0 baseline]

Financial Services applications allow us to efficiently model options pricing to best determine investing strategies. Faster codes drive more certain investment decisions more quickly, giving an advantage over the competition.

Intel® Xeon Phi™ processor family with up to 16 GB of high bandwidth on package memory improves Financial Services codes by up to 6.5x.

For more complete information visit: http://www.intel.com/performance. Source: Intel measured or estimated as of May 2016.

For configuration details, see back-up.

37 PUBLIC PRESENTATION Performance: Intel® Xeon Phi™ Processor vs NVIDIA* K80

[Chart: relative performance1 on the Intel® Xeon Phi™ Processor 7250, higher is better. Bars: Nvidia K80, Binomial Options SP, LINPACK, CP2K, STAC-A2 Warm Greeks, MonteCarlo SP, STREAM TRIAD, BlackScholes DP, MonteCarlo DP, LAMMPS, Embree vs OptiX (vs Titan X). Data labels shown: 1.0, 1.1, 1.2, 1.3, 1.3, 1.4, 1.6, 2.0, 2.7, 5.0, 5.2]

For more complete information visit: http://www.intel.com/performance. Source: Intel measured or estimated as of May 2016. For configuration details, see back-up.

Knights Landing delivers up to 2.7x higher performance versus K80 on FSI benchmarks

38 Performance/W: Intel® Xeon Phi™ Processor vs NVIDIA* K80

[Chart: relative performance per watt on the Intel® Xeon Phi™ Processor 7250, higher is better, using measured total system power. Bars: Nvidia K80, Binomial Options SP, MonteCarlo SP, BlackScholes DP, MonteCarlo DP, LAMMPS. Data labels shown: 1.0, 1.1, 1.4, 2.0, 2.8, 8.0]

Knights Landing delivers up to 2.8x higher performance/watt versus K80 on FSI benchmarks

For configuration details, see back-up.

For more complete information visit: http://www.intel.com/performance. Source: Intel measured or estimated as of May 2016.

39 Life Sciences: Up to 2.7x higher performance

[Chart: normalized performance, higher is better; the highest bar is 2.7 against a 1.0 baseline]

Life sciences HPC codes allow researchers to look deeper into some of the biggest mysteries in biology. Faster simulations allow users to more quickly understand the underlying mechanisms impacting our cells.

Intel® Xeon Phi™ processor family with up to 68 cores with 272 threads provides life science results up to 2.7x faster. For configuration details, see slides 86-96.

For more complete information visit: http://www.intel.com/performance. Source: Intel measured or estimated as of May 2016. See backup slides for configuration data.

40 Intel® Xeon Phi™ Processor Life Sciences Applications Performance

Life sciences are in the midst of a dramatic transformation as technology redefines what is possible for life as we know it. With Intel® technology, healthcare IT moves faster in everything from sequencing genomes and speeding up molecular dynamics workloads to connecting patients, care teams, and data. The following proof points show tested and proven performance1 for the most important applications, with an average software performance improvement with the Intel Xeon Phi processor 7250 of up to 1.73X, and an average performance/watt improvement of up to 1.67X.

• LAMMPS: Up to 1.41X
• AMBER 16 IMPLICIT: Up to 2.66X
• AMBER 16 EXPLICIT: Up to 1.83X
• ROME/SML: Up to 2.36X
• RELION: Up to 1.31X
• GROMACS: Up to 1.22X, and up to 1.45X performance/watt
• NAMD: Up to 1.36X, and up to 1.91X performance/watt

Headline: Up to 2.66X

1 - Performance is the Intel® Xeon Phi™ Processor 7250 compared to the Intel® Xeon® processor E5-2697 v4

For more information go to http://www.intel.com/performance

41 Oil & Gas - GeoPhysics: Up to 1.7x higher performance

[Chart: normalized performance, higher is better. Benchmarks shown: SeisSol (M7.2 1992 Landers Dynamic Rupture GTS; Mount Merapi LTS MR2; Mount Merapi GTS) and ISO 3D, against a 2s Xeon® E5-2697 v4 baseline of 1.0; the highest bar is 1.7]

GeoPhysics HPC codes dig deeper into the tectonics and wave propagation through the earth. Better performance on GeoPhysics gives researchers a deeper understanding of underlying geophysical phenomena.

Intel® Xeon Phi™ processor family with Intel AVX-512 speeds up GeoPhysics codes by up to 1.7x.

For more complete information visit: http://www.intel.com/performance. Source: Intel measured or estimated as of May 2016.

Intel® Xeon Phi™ Processor Geophysics Applications Performance

The Intel® Xeon Phi™ processor improves the software performance of geophysics applications with features such as high-bandwidth memory (MCDRAM) and the Intel® AVX-512 vector instruction set architecture, helping these important applications realize meaningful performance gains. The following proof points show tested performance (note 1) for the most important applications, with an average software performance improvement on the Intel Xeon Phi processor 7250/7210 of up to 2.17X.

Up to 2.8X
• ISO3DFD: Up to 1.71X
• Distributed ISO3DFD on 64 nodes: 100% efficiency
• YASK AWP-ODC: Up to 2.8X
• YASK ISO3DFD: Up to 2.5X
• SeisSol: Up to 1.59X
• SPECFEM3D_GLOBE: Up to 1.8X

1 - Performance is that of the Intel® Xeon Phi™ processor 7250 compared to the Intel® Xeon® processor E5-2697 v4

Solve Biggest Challenges Faster
Performance Results

Modeling & Simulation: Manufacturing, Engineering, Weather, Oil & gas, Applied science, Defense…
Artificial intelligence: Training (Faster and More Scalable, note 1); Scoring (Most Widely Deployed, note 2)
Life Science: Up to 5.0x Perf* (LAMMPS)
Finance: Up to 2.7x Perf* (Monte Carlo DP)
Visualization: Up to 5.2x Perf* (Embree)
And many more…

Fueling Breakthroughs in Science and Industry One Architecture for All Advanced Analytics

*See speaker notes for performance disclaimers **No published GPU result for 128 instances running the AlexNet topology † Internal development version 1 See next slide for data substantiating this claim


*Performance versus GPU Accelerator, see speaker notes for configuration details

Intel Confidential – Do Not Forward

A common language for AI analytics

Artificial intelligence | Traditional analytics
Sense, reason, act, adapt, remember

Big data analytics
Machine learning: deep learning; classic ML
Reasoning systems: memory based; logic based

#IntelAI

Machine Learning: Classic ML vs. Deep Learning

Classic ML: using functions or algorithms to extract insights from new data.

New Data* → Functions f1, f2, …, fK → Inference or Classification
• Random Forest
• Decision Trees
• Graph Analytics
• Regression
• Naïve Bayes
• Ensemble methods
• SVM
• More…

Deep learning: using massive data sets to train deep (neural) graphs that can extract insights from new data.

Training Data* → Untrained network (CNN, RNN, RBM, etc.) → Trained network; New Data* → Trained network → Inference

Step 1: Training (hours to days, in cloud). Use massive "known" dataset (e.g. 10M tagged images) to iteratively adjust weighting of neural network connections.

Step 2: Inference (scoring) (real-time, at edge/cloud). Form inference about new input data (e.g. a photo) using trained neural network.
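The two-step flow described above (adjust weights iteratively on labeled data, then score new data with the trained model) can be sketched in a few lines. This is a minimal illustration and not code from the presentation: it assumes a single-layer logistic model instead of a deep network, a toy four-sample dataset, and the hypothetical names `train`, `infer`, `TRAIN_X` and `TRAIN_Y`.

```python
import math


def sigmoid(z):
    """Squash a raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=2000, lr=0.5):
    """Step 1 (training): iteratively adjust weights on a labeled dataset."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss with respect to the raw score
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def infer(model, x):
    """Step 2 (inference/scoring): apply the trained weights to new input."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy "known" dataset (an assumption for illustration): label is 1 only
# when both features are 1.
TRAIN_X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
TRAIN_Y = [0, 0, 0, 1]

model = train(TRAIN_X, TRAIN_Y)
print(f"score for [1, 1]: {infer(model, [1.0, 1.0]):.2f}")
print(f"score for [0, 1]: {infer(model, [0.0, 1.0]):.2f}")
```

Training loops over the data many times, while inference is a single pass over the weights. That difference is why the slide places training in the cloud over hours to days and inference in real time at the edge or in the cloud.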

Deep Learning example: image recognition

Step 1: Training (in data center, over hours/days/weeks)
Lots of labeled input data → create "deep neural net" math model → output classification

Step 2: Scoring (end point or data center, instantaneous)
New input from camera and sensors → trained neural network model → output classification

Example classification outputs shown on the slide: 97% person; 90% person, 8% traffic light.
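Classification outputs such as "97% person" are typically produced by normalizing a network's raw output scores into probabilities. The sketch below shows one common way to do that, a softmax. The presentation does not say which method was used, so softmax is an assumption here, and the labels, scores and the `classify` helper are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(labels, scores):
    """Return (label, probability) pairs, most likely class first."""
    probs = softmax(scores)
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


# Hypothetical raw outputs of a trained network for one camera frame.
LABELS = ["person", "traffic light", "bicycle"]
RAW_SCORES = [4.0, 1.5, 0.2]

ranked = classify(LABELS, RAW_SCORES)
for label, prob in ranked:
    print(f"{prob:.0%} {label}")
```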

Deep learning breakthroughs

[Charts: error rate over time, with a "Human" reference level marked. Image recognition, 2010 to present (example: 97% "person"). Speech recognition, 2000 to present (example: 99% "play song"). Axis ticks: 30%, 23%, 15%, 8%, 0%.]

Enabling new and enhanced applications!

Intel® Xeon Phi™ Processor Family Enables Shorter Time to Train Using General Purpose Infrastructure

Ideal for HPC & enterprise customers running scale-out, highly-parallel, memory intensive apps

Removing IO and Memory Barriers
• Integrated Intel® Omni-Path fabric increases price-performance and reduces communication latency
• Direct access of up to 400 GB of memory with no PCIe performance lag (vs. GPU: 16GB)

Breakthrough Highly Parallel Performance
• Near linear scaling with 31X reduction in time to train when scaling to 32 nodes
• Up to 400X performance on existing hardware via Intel software optimizations
• Up to 4X deep learning performance increase estimated (Knights Mill, 2017)

Easier Programmability
• Binary-compatible with Intel® Xeon® processors
• Open standards, libraries and frameworks

Configuration details on slide: 30
Source: Intel measured as of November 2016. For more complete information visit: http://www.intel.com/performance

Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804
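The "near linear scaling" claim on this slide (a 31X reduction in time to train on 32 nodes) can be restated as a scaling efficiency. The sketch below uses the usual definition, speedup divided by node count. The function names are hypothetical, and the only inputs are the 31X and 32-node figures quoted on the slide.

```python
def speedup(time_single_node, time_multi_node):
    """How many times faster the multi-node run finished."""
    return time_single_node / time_multi_node


def scaling_efficiency(observed_speedup, nodes):
    """Fraction of ideal (linear) speedup achieved; 1.0 is perfectly linear."""
    return observed_speedup / nodes


# Figures quoted on the slide: 31X reduction in time to train on 32 nodes.
efficiency = scaling_efficiency(31.0, 32)
print(f"scaling efficiency: {efficiency:.1%}")  # prints "scaling efficiency: 96.9%"
```

An efficiency close to 1.0 means that adding nodes keeps paying off almost proportionally, which is what "near linear" refers to.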

Intel® Xeon Phi™ processor performance
Shattering misconceptions that CPU is not well-suited for deep learning: SW optimization delivers up to 400X perf gain on existing HW in <6 months

[Chart: normalized throughput (images/second), higher is better. Caffe/AlexNet on Intel® Xeon Phi™ processor 7250: out-of-box (OOB*) baseline performance = 1.0; current performance = up to 400x.]

Configuration details on slide: 30
Source: Intel measured as of November 2016. For more complete information visit: http://www.intel.com/performance

Intel® Xeon Phi™ processor performance
Continued performance breakthroughs: Knights Mill (2017) will deliver up to 4X deep learning performance increase over current generation Intel® Xeon Phi™ processor

[Chart: deep learning performance, estimated and normalized to the Intel® Xeon Phi™ processor 7290. Intel® Xeon Phi™ processor 7290 compared with Intel® Xeon Phi™ processor family - Knights Mill: up to 4x.]

Configuration details on slide: 30
Knights Mill: Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.
Source: Intel measured as of November 2016. For more complete information visit: http://www.intel.com/performance

PUBLIC PRESENTATION

Over 40 applications optimized for Intel® Xeon Phi™ processor family are available, with up to 6.48X (1.99X average) performance improvement (note 1)

[Chart: Intel® Xeon Phi™ processor 7250 relative performance, normalized to a baseline (1) of a 2-socket Intel® Xeon® processor E5-2697 v4. Normalized results range from 0.96 to 6.48; individual bar values omitted.]

Categories: Benchmarks; Trinity Benchmarks; Life Science; Weather and Climate; Physics / Geophysics / Energy; Material Science; Manufacturing; Financial Services.

Workloads shown: NIM, POP, MILC, CP2K, VASP, HPCG, NEMO, PETSc, HiFUN, VLPL-S, RELION, Linpack, SGEMM, DGEMM, OpenLB, HOMME, iso3DFD, MPAS-O, QphiX CG, GROMACS, GNAQPMS, GE Tacoma, ROME/SML, TACC LB3D, IFS PAPS14, NAMD stmv, NAMD apoa1, Soft Sphere, BerkeleyGW, Amber 2w49, Amber Rubisco, Amber Polio Virus, Amber Nucleosome, Trinity-GTC, Trinity-AMG, Trinity-SNAP, Trinity-MILC, Trinity-UMT, Trinity-miniFE, Trinity-miniDFT, Trinity-miniGhost, CloverLeaf3D, CloverLeaf2D, STREAM Triad, NASA Overflow, MASNUM Wave, YASK-iso3DFD, YASK-AWP-ODC, BlackScholes SP, BlackScholes DP, BinomialOptions, PWMAT-GaAs64, PWMAT-GaAs160, WRF CONUS 12KM, DMI HIROMB-BOOS, Quantum ESPRESSO, QphiX Wilson DSLASH, BAW American Options…, UPDATED BAW American Options…, SeisSol-Mount Merapi GTS, SeisSol-Mount Merapi LTS MR2, SeisSol-M7.2 1992 Landers…, Coarse-Grain Water Simulation…, OpenFOAM MotorBike Cells 4M, OpenFOAM MotorBike Cells 11M, OpenFOAM MotorBike Cells 20M, OpenFOAM DrivAer car Cells 10M :…, SPECFEM_3DGLOBE 25 nodes-…, SPECFEM_3DGLOBE 6 nodes - 55K…, MonteCarlo European Options SP, MonteCarlo European Options DP.

1 – As demonstrated by respective proof points in this presentation

For more complete information visit: http://www.intel.com/performance

Intel® Deep Learning SDK
Accelerate Deep Learning Training & Deployment

• FREE for data scientists and software developers to develop, train & deploy deep learning
• Simplify installation of Intel optimized frameworks and libraries
• Increase productivity through simple and highly-visual interface
• Enhance deployment through model compression and normalization
• Facilitate integration with full software stack via inference engine

http://software.intel.com/deep-learning-sdk

Solve Biggest Challenges Faster
Deep learning Performance Results (cont’d): training

x4: 2.3x Faster Training* (Topology: AlexNet; 4 GPU vs. 4 Xeon Phi)
x32: 38% Better Scaling* (Topology: GoogLeNet; scaling efficiency over 1, 16 and 32 instances: Xeon Phi 87%, GPU 63%)
x128: 50x Faster Training* (Topology: AlexNet; 1 vs. 128 Xeon Phi; no published GPU result**)

Proven scalability for deep learning

*See speaker notes for performance disclaimers
**No published GPU result for 128 instances running the AlexNet topology
† Internal development version
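The "38% better scaling" headline is consistent with the two scaling-efficiency values on this slide (87% and 63%). The short check below reproduces that arithmetic. It assumes the headline is the simple ratio of the two efficiencies, which the slide does not state explicitly, and the helper names are hypothetical.

```python
def relative_gain(value, reference):
    """Fractional improvement of `value` over `reference`."""
    return value / reference - 1.0


def implied_speedup(efficiency, instances):
    """Speedup implied by a given scaling efficiency at a given instance count."""
    return efficiency * instances


# Scaling efficiencies shown on the slide for the GoogLeNet topology.
EFFICIENCY_HIGH = 0.87
EFFICIENCY_LOW = 0.63

gain = relative_gain(EFFICIENCY_HIGH, EFFICIENCY_LOW)
print(f"relative scaling gain: {gain:.0%}")
print(f"implied speedup at 32 instances: {implied_speedup(EFFICIENCY_HIGH, 32):.2f}x "
      f"vs. {implied_speedup(EFFICIENCY_LOW, 32):.2f}x")
```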

Realize Compelling Value
Power Efficiency & Cost Savings

Configurations compared: 2x + 1x; 683W* vs. 378W*; $13,750* vs. $7,300*

vs. GPU Accelerator:
• Up to 5x Perf*
• Up to 8x Perf/W*
• Up to 9x Perf/$*

*Intel measured results as of April 2016; see speakers notes for full configuration and performance disclaimers

Realize Compelling Value
Programmability

Common code, tools and developers

“[…] porting to the Intel Xeon Phi processor only requires a simple recompile, and it took us less than a week to hand-tune our kernels for AVX512. For the first time, this will enable us to have a single set of kernels that work both on many core Xeon Phi and future multi-core Xeon processors.”
– Erik Lindahl of KTH* and Stockholm University*, GROMACS* Project Leader

Extreme parallel performance with general purpose programming

*Other names and brands may be claimed as the property of others

Realize Compelling Value
High Utilization

GPU accelerators in the datacenter are frequently idle, consuming space, power and capital budget*

[Diagram labels: All Applications†; Highly-Parallel; Suitable for GPU offload; GPU Optimized]

*Based on Intel customer survey feedback of customers with significant GPU deployments † All x86 applications that run on Intel® Xeon® processors

Maximize Your Potential
Future-Ready Code

Software is long-lasting, longer than hardware

Codes shown on a timeline from 1970 to 2010:
• SCRYU/Tetra* - CFD
• scSTREAM* - CFD
• Dalton* - Quantum Chemistry
• WRF* - Weather
• NWCHEM* - Chemistry
• LAPACK* - Solvers
• PETSc* - Solvers
• UKMO Unified Model* - Weather
• Pam-Crash*
• Spice*
• NASTRAN*

• OpenMP*
• MPI*
• Fortran*, C*, C++*…
• Open Source Libraries
• Community Codes
• General-purpose approach

Code optimizations based on open standards for a general purpose CPU are portable to many similar architectures going forward.

*Other names and brands may be claimed as the property of others

Maximize Your Potential
Broad Ecosystem

>30 Systems Providers (note 1)
>15 ISV Application Partners (note 1)
>60 Intel® Parallel Computing Centers (note 1)

Intel® Xeon Phi™ Processor: Broad Ecosystem Support: www.intel.com/xeonphi/partners
Intel® Parallel Computing Centers (IPCC): software.intel.com/en-us/ipcc

*Other names, brands and logos may be claimed as the property of others 1 As of June 2016

Maximize Your Potential
Robust Roadmap

Intel® Scalable System Framework: Compute, Memory/Storage, Fabric, Software

Roadmap products: KNF*, KNC*, KNL*, KNM*, KNH*
Dates marked on the timeline: 2016, April 2017, Q4 2017, 2018

*KNF (Knights Ferry), KNC (Knights Corner), KNL (Knights Landing) are abbreviations for former codenames for Intel® Xeon Phi™ product family products. KNM is the abbreviation for the Knights Mill codename for a future Intel® Xeon Phi™ product. KNH is the abbreviation for the Knights Hill codename of a future Intel® Xeon Phi™ product

Intel® Xeon Phi™ Processor SKU Lineup

SKU     Cores   GHz   Memory            Integrated Fabric*   DDR4               Power**   Price†    Positioning
7290*   72      1.5   16GB, 7.2 GT/s    Yes                  384GB, 2400 MHz    245W      $6,254    Best Performance/Node
7250    68      1.4   16GB, 7.2 GT/s    Yes                  384GB, 2400 MHz    215W      $4,876    Best Performance/Watt
7230    64      1.3   16GB, 7.2 GT/s    Yes                  384GB, 2400 MHz    215W      $3,710    Best Memory Bandwidth/Core
7210    64      1.3   16GB, 6.4 GT/s    Yes                  384GB, 2133 MHz    215W      $2,438    Best Value

*Available beginning in September
**Add 15 watts for integrated fabric
†Recommended Customer Pricing (RCP); add $287 for integrated fabric option
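The footnotes above imply some simple arithmetic: the integrated fabric option adds $287 to the price and 15 watts to the power figure. The sketch below applies those adders to the listed SKUs and also derives a price per core. The `Sku` class and its method names are hypothetical, and price per core is an illustrative metric that does not appear in the presentation.

```python
from dataclasses import dataclass

# Adders taken from the slide footnotes for the integrated fabric option.
FABRIC_PRICE_ADDER_USD = 287
FABRIC_POWER_ADDER_W = 15


@dataclass(frozen=True)
class Sku:
    name: str
    cores: int
    ghz: float
    power_w: int
    price_usd: int

    def with_fabric(self):
        """Return (price, power) for the integrated-fabric variant."""
        return (self.price_usd + FABRIC_PRICE_ADDER_USD,
                self.power_w + FABRIC_POWER_ADDER_W)

    def price_per_core(self):
        """List price divided by core count (illustrative metric only)."""
        return self.price_usd / self.cores


# Values as listed in the SKU lineup.
SKUS = [
    Sku("7290", 72, 1.5, 245, 6254),
    Sku("7250", 68, 1.4, 215, 4876),
    Sku("7230", 64, 1.3, 215, 3710),
    Sku("7210", 64, 1.3, 215, 2438),
]

for sku in SKUS:
    price, power = sku.with_fabric()
    print(f"{sku.name}: ${price} and {power}W with fabric, "
          f"${sku.price_per_core():.2f} per core without")

cheapest_per_core = min(SKUS, key=Sku.price_per_core)
```

By this metric the 7210 has the lowest price per core, in line with its "Best Value" positioning.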

Developer Access Program
Order today starting under $5K*

Highly-Parallel Performance

Software Tools & Libraries

Support & Training

http://dap.xeonphi.com/

*Cost for base configuration pedestal system as shown on http://dap.xeonphi.com/ninja-dev-platform-pedestal.aspx

Intel® Software Development Tools and Libraries for Developers and System Administrators

Create Faster Code…Faster
Intel® Parallel Studio XE 2017

• High Performance Scalable Code
  – C++*, C*, Fortran*, Python* and Java*
  – Standards-driven parallel models: OpenMP*, MPI, and Intel® Threading Building Blocks (Intel® TBB)
• New for 2017
  – 2nd generation Intel® Xeon Phi™ processor and Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
  – Optimized compilers and libraries
  – Vectorization and threading optimization tools
  – High bandwidth memory optimization tools
  – Faster Python application performance
  – Faster deep learning on Intel® architecture

Intel® Parallel Studio XE
Tools to build, analyze and scale high performance software

BUILD: Composer Edition
• Optimizing Compilers: Intel® C/C++ and Fortran Compilers
• Performance Scripting: Intel® Distribution for Python*
• Machine Learning and Analytics Library: Intel® Data Analytics Acceleration Library
• Image, Signal, and Compression Routines: Intel® Integrated Performance Primitives
• Fast Math Library: Intel® Math Kernel Library
• Task-Based Parallel C++ Template Library: Intel® Threading Building Blocks

ANALYZE: Professional Edition adds:
• Performance Profiler: Intel® VTune™ Amplifier
• Memory and Threading Debugging: Intel® Inspector
• Vectorization Optimization & Thread Design: Intel® Advisor

SCALE: Cluster Edition adds:
• Scalable Cluster Messaging: Intel® MPI Library
• MPI Profiler: Intel® Trace Analyzer and Collector
• Cluster Diagnostic Expert System: Intel® Cluster Checker

Intel® Xeon Phi™ Processor: Your Path to Deeper Insight
A Foundational Element of Intel® Scalable System Framework

Solve Biggest Challenges Faster: Highly-Parallel; Eliminate Bottlenecks; Scalability
Realize Compelling Value: Power Efficiency; Programmability; High Utilization
Maximize Future Potential: Future-Ready Code; Broad Ecosystem; Robust Roadmap

For discovery and business innovation in science, visualization & analytics
