Intel® Advisor User Guide

Total Page:16

File Type:pdf, Size:1020Kb

Intel® Advisor User Guide Intel® Advisor User Guide Intel Corporation Intel® Advisor User Guide Contents Chapter 1: Intel® Advisor User Guide Introduction .............................................................................................7 What's New ......................................................................................8 Design and Optimization Methodology................................................ 13 Tutorials ........................................................................................ 17 Get Help and Support ...................................................................... 18 Install and Launch Intel® Advisor ............................................................... 19 Install Intel® Advisor........................................................................ 19 Set Up Environment Variables ........................................................... 20 Set Up System to Analyze GPU Kernels .............................................. 22 Set Up Environment to Offload DPC++, OpenMP* target, and OpenCL™ Applications to CPU ..................................................................... 24 Launch Intel® Advisor ...................................................................... 25 Get Started with Intel® Advisor GUI ................................................... 28 Set Up Project......................................................................................... 30 Configure Target Application ............................................................. 30 Limit the Number of Threads Used by Parallel Frameworks ........... 31 Choose a Small, Representative Data Set .................................. 31 Build Target Application.................................................................... 32 Create Project................................................................................. 37 Configure Project .................................................................... 38 Configure Binary/Symbol Search Directories ............................... 45 Configure Source Search Directory ............................................ 46 Binary/Symbol Search and Source Search Locations .................... 47 Analyze Vectorization Perspective .............................................................. 49 Run Vectorization and Code Insights Perspective from GUI.................... 50 Vectorization Accuracy Presets.................................................. 52 Customize Vectorization and Code Insights Perspective ................ 53 Run Vectorization and Code Insights Perspective from Command Line .... 58 Vectorization Accuracy Levels in Command Line ......................... 64 Explore Vectorization and Code Insights Results .................................. 65 Vectorization Report Overview .................................................. 65 Examine Not-Vectorized and Under-Vectorized Loops ................... 67 Analyze Loop Call Count .......................................................... 70 Investigate Memory Usage and Traffic........................................ 71 Find Data Dependencies........................................................... 74 Analyze CPU Roofline ............................................................................... 76 Run CPU / Memory Roofline Insights Perspective from GUI.................... 77 CPU Roofline Accuracy Presets .................................................. 79 Customize CPU / Memory Roofline Insights Perspective ................ 79 Run CPU / Memory Roofline Insights Perspective from Command Line .... 84 CPU Roofline Accuracy Levels in Command Line ......................... 90 Explore CPU/Memory Roofline Results ................................................ 91 CPU Roofline Report Overview .................................................. 91 Examine Bottlenecks on CPU Roofline Chart................................ 97 Examine Relationships Between Memory Levels ........................ 100 Compare CPU Roofline Results ................................................ 105 Model Threading Designs........................................................................ 107 2 Contents Run Threading Perspective from GUI................................................ 109 Customize Threading Perspective ............................................ 110 Run Threading Perspective from Command Line ................................ 116 Threading Accuracy Levels in Command Line ........................... 120 Annotate Code for Deeper Analysis .................................................. 121 Annotate Code to Model Parallelism ......................................... 123 Annotations ......................................................................... 133 Annotation Report................................................................. 166 Model Threading Parallelism............................................................ 172 Suitability Report Overview .................................................... 175 Choose Modeling Parameters in the Suitability Report ................ 179 Fix Annotation-related Errors Detected by the Suitability Tool...... 181 Advanced Modeling Options.................................................... 181 Reduce Parallel Overhead, Lock Contention, and Enable Chunking 182 Check for Dependencies Issues ....................................................... 184 Code Locations Pane ............................................................. 184 Filter Pane (Dependencies Report)........................................... 186 Problems and Messages Pane ................................................. 186 Dependencies Source Window ................................................ 187 Add Parallelism to Your Program...................................................... 192 Before You Add Parallelism: Choose a Parallel Framework ........... 192 Add the Parallel Framework to Your Build Environment............... 196 Annotation Report................................................................. 199 Replace Annotations with Intel® oneAPI Threading Building Blocks (oneTBB) Code ....................................................... 200 Replace Annotations with OpenMP* Code ................................. 205 Next Steps for the Parallel Program ......................................... 219 Model Offloading to a GPU ...................................................................... 220 Run Offload Modeling Perspective from GUI ...................................... 222 Offload Modeling Accuracy Presets .......................................... 224 Customize Offload Modeling Perspective................................... 225 Run Offload Modeling Perspective from Command Line ....................... 231 Offload Modeling Accuracy Levels in Command Line .................. 244 Run GPU-to-GPU Performance Modeling from Command Line ...... 247 Explore Offload Modeling Results..................................................... 250 Offload Modeling Report Overview ........................................... 251 Examine Regions Recommended for Offloading ......................... 253 Examine Data Transfers for Modeled Regions ............................ 255 Check for Dependency Issues ................................................. 257 Explore Performance Gain from GPU-to-GPU Modeling................ 258 Investigate Non-Offloaded Code Regions .................................. 260 Advanced Modeling Strategies ........................................................ 268 Manage Invocation Taxes ....................................................... 268 Enforce Offloading for Specific Loops ....................................... 270 Check How Assumed Dependencies Affect Modeling................... 271 Analyze GPU Roofline ............................................................................. 273 Run GPU Roofline Insights Perspective from GUI................................ 275 GPU Roofline Accuracy Presets ............................................... 276 Customize GPU Roofline Insights Perspective ............................ 277 Run GPU Roofline Insights Perspective from Command Line ................ 281 GPU Roofline Accuracy Levels in Command Line ....................... 287 Explore GPU Roofline Results .......................................................... 288 Examine GPU Roofline Summary ............................................. 289 Examine Bottlenecks on GPU Roofline Chart ............................. 290 Examine Kernel Details .......................................................... 296 3 Intel® Advisor User Guide Compare GPU Roofline Results................................................ 299 Design and Analyze Flow Graphs ............................................................. 301 Where to Find the Flow Graph Analyzer ............................................ 301 Launching the Flow Graph Analyzer ................................................. 302 Flow Graph Analyzer GUI Overview.................................................. 303 Menus ................................................................................. 304 Toolbars .............................................................................. 306 Tabs.................................................................................... 307 Main Canvas......................................................................... 310 Charts ................................................................................. 310 Flow Graph Analyzer Workflows......................................................
Recommended publications
  • Beyond BIOS Developing with the Unified Extensible Firmware Interface
    Digital Edition Digital Editions of selected Intel Press books are in addition to and complement the printed books. Click the icon to access information on other essential books for Developers and IT Professionals Visit our website at www.intel.com/intelpress Beyond BIOS Developing with the Unified Extensible Firmware Interface Second Edition Vincent Zimmer Michael Rothman Suresh Marisetty Copyright © 2010 Intel Corporation. All rights reserved. ISBN 13 978-1-934053-29-4 This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold with the understanding that the publisher is not engaged in professional services. If professional advice or other expert assistance is required, the services of a competent professional person should be sought. Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the presented subject matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel or otherwise, to any such patents, trademarks, copyrights, or other intellectual property rights. Intel may make changes to specifications, product descriptions, and plans at any time, without notice. Fictitious names of companies, products, people, characters, and/or data mentioned herein are not intended to represent any real individual, company, product, or event. Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications. Intel, the Intel logo, Celeron, Intel Centrino, Intel NetBurst, Intel Xeon, Itanium, Pentium, MMX, and VTune are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
    [Show full text]
  • Intel Advisor for Dgpu Intel® Advisor Workflows
    Profile DPC++ and GPU workload performance Intel® VTune™ Profiler, Advisor Vladimir Tsymbal, Technical Consulting Engineer, Intel, IAGS Agenda • Introduction to GPU programming model • Overview of GPU Analysis in Intel® VTune Profiler • Offload Performance Tuning • GPU Compute/Media Hotspots • A DPC++ Code Sample Analysis Demo • Using Intel® Advisor to increase performance • Offload Advisor discrete GPUs • GPU Roofline for discrete GPUs Copyright © 2020, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 2 Intel GPUs and Programming Model Gen9 Application Workloads • Most common Optimized Middleware & Frameworks in mobile, desktop and Intel oneAPI Product workstations Intel® Media SDK Direct Direct API-Based Gen11 Programming Programming Programming • Data Parallel Mobile OpenCL platforms with C API C++ Libraries Ice Lake CPU Gen12 Low-Level Hardware Interface • Intel Xe-LP GPU • Tiger Lake CPU Copyright © 2020, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 3 GPU Application Analysis GPU Compute/Media Hotspots • Visibility into both host and GPU sides • HW-events based performance tuning methodology • Provides overtime and aggregated views GPU In-kernel Profiling • GPU source/instruction level profiling • SW instrumentation • Two modes: Basic Block latency and memory access latency Identify GPU occupancy and which kernel to profile. Tune a kernel on a fine grain level Copyright © 2020, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. 4 GPU Analysis: Aggregated and Overtime Views Copyright © 2020, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
    [Show full text]
  • Hands-On Intel® Software Development & Oneapi WORKSHOP
    Hands-on Intel® Software Development & oneAPI WORKSHOP May 26-27, 2020 Scandic Solli, Parkveien 68 Box 2458 Solli, 0202 Oslo AGENDA DAY 1 - Technical Computing & Developer Tools - May 26 Timing Sessions 08:30 – 09:00 Registration & Light breakfast Part 1: Coding for maximum performance using the new Intel® Parallel Studio XE 2020 A refresher on the Intel® Hardware Architecture for Software Developers and Architects This session will offer in-depth insights into the current and future Intel® hardware platforms tailored to the 09:00 -09:45 needs of software developers, software architects, HPC and AI experts. We will cover the latest Intel® processors and the future Intel® GPU architecture. Developing code for Intel® architecture: how to achieve maximum performance using the new Intel® Parallel Studio XE 2020 09:45 – 10:30 Learn how Intel® Software Development Tools will help you to achieve optimal performance in your High Performance Computing, Artificial Intelligence ,and IoT projects. Includes a look at the new Intel® Parallel Studio XE 2020 tools which are designed to take advantage of the latest generation of Intel processors. 10:30 – 11:00 Coffee Break How to optimize and maximize code performance Learn how to use some of the advanced features of Intel® VTune™ Amplifier profile your applications. See how you can use event-based and architectural analysis to fine-tune your code so that it is taking full 11:00 – 12:00 advantage of the latest processor features of the target CPU. Learn how to use Intel Advisor, a powerful tool for tracking down and solving vectorization problems. In this session we will demonstrate how the Intel Advisor vector analysis and associated Roofline Model can be used to identify and help fixing vectorization problems.
    [Show full text]
  • Evaluating Techniques for Parallelization Tuning in MPI, Ompss and MPI/Ompss
    Evaluating techniques for parallelization tuning in MPI, OmpSs and MPI/OmpSs Advisors: Author: Prof. Jesús Labarta Vladimir Subotic´ Prof. Mateo Valero Prof. Eduard Ayguade´ A THESIS SUBMITTED IN FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor per la Universitat Politècnica de Catalunya Departament d’Arquitectura de Computadors Barcelona, 2013 i Abstract Parallel programming is used to partition a computational problem among multiple processing units and to define how they interact (communicate and synchronize) in order to guarantee the correct result. The performance that is achieved when executing the parallel program on a parallel architec- ture is usually far from the optimal: computation unbalance and excessive interaction among processing units often cause lost cycles, reducing the efficiency of parallel computation. In this thesis we propose techniques oriented to better exploit parallelism in parallel applications, with especial emphasis in techniques that increase asynchronism. Theoretically, this type of parallelization tuning promises multiple benefits. First, it should mitigate communication and synchro- nization delays, thus increasing the overall performance. Furthermore, parallelization tuning should expose additional parallelism and therefore increase the scalability of execution. Finally, increased asynchronism would allow more flexible communication mechanisms, providing higher toler- ance to slower networks and external noise. In the first part of this thesis, we study the potential for tuning MPI par- allelism. More specifically, we explore automatic techniques to overlap communication and computation. We propose a speculative messaging technique that increases the overlap and requires no changes of the orig- inal MPI application. Our technique automatically identifies the applica- tion’s MPI activity and reinterprets that activity using optimally placed non-blocking MPI requests.
    [Show full text]
  • Introduction to Intel Performance Tools Part
    Introduction to Intel Performance Tools Part 1/2 Doug Roberts SHARCNET / COMPUTE CANADA Intel® Performance Tools o Intel Advisor - Optimize Vectorization and Thread Prototyping for C, C++, Fortran o Intel Inspector - Easy-to-use Memory and Threading Error Debugger for C, C++, Fortran o Intel Vtune Amplifier - Serial/Threaded Performance Profiler for C, C++, Fortran, Mixed Python o Intel Trace Analyzer and Collector - Understand MPI application behavior for C, C++, Fortran, OpenSHMEM o Intel Distribution for Python - High-performance Python powered by native Intel Performance Libraries Intel® Parallel Studio XE – Cluster Edition https://software.intel.com/en-us/parallel-studio-xe o Intel Advisor* https://software.intel.com/en-us/intel-advisor-xe o Intel Inspector* https://software.intel.com/en-us/intel-inspector-xe o Intel Vtune Amplifier* https://software.intel.com/en-us/intel-vtune-amplifier-xe o Intel Trace Analyzer and Collector* https://software.intel.com/en-us/intel-trace-analyzer o Intel Distribution for Python https://software.intel.com/en-us/distribution-for-python * Product Support → Training, Docs, Faq, Code Samples Initializating the Components – The Intel Way ssh graham.sharcnet.ca cd /opt/software/intel/18.0.1/parallel_studio_xe_2018.1.038 source psxevars.sh → linux/bin/compilervars.sh → clck_2018/bin/clckvars.sh → itac_2018/bin/itacvars.sh → inspector_2018/inspxe-vars.sh → vtune_amplifier_2018/amplxe-vars.sh → advisor_2018/advixe-vars.sh Examples ls /opt/software/intel/18.0.1/parallel_studio_xe_2018.1.038/samples_2018/en
    [Show full text]
  • Intel® Software Products Highlights and Best Practices
    Intel® Software Products Highlights and Best Practices Edmund Preiss Business Development Manager Entdecken Sie weitere interessante Artikel und News zum Thema auf all-electronics.de! Hier klicken & informieren! Agenda • Key enhancements and highlights since ISTEP’11 • Industry segments using Intel® Software Development Products • Customer Demo and Best Practices Copyright© 2012, Intel Corporation. All rights reserved. 2 *Other brands and names are the property of their respective owners. Key enhancements & highlights since ISTEP’11 3 All in One -- Intel® Cluster Studio XE 2012 Analysis & Correctness Tools Shared & Distributed Memory Application Development Intel Cluster Studio XE supports: -Shared Memory Processing MPI Libraries & Tools -Distributed Memory Processing Compilers & Libraries Programming Models -Hybrid Processing Copyright© 2012, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Intel® VTune™ Amplifier XE New VTune Amplifier XE features very well received by Software Developers Key reasons : • More intuitive – Improved GUI points to application inefficiencies • Preconfigured & customizable analysis profiles • Timeline View highlights concurrency issues • New Event/PC counter ratio analysis concept easy to grasp Copyright© 2012, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Intel® VTune™ Amplifier XE The Old Way versus The New Way The Old Way: To see if there is an issue with branch misprediction, multiply event value (86,400,000) by 14 cycles, then divide by CPU_CLK_UNHALTED.THREAD (5,214,000,000). Then compare the resulting value to a threshold. If it is too high, investigate. The New Way: Look at the Branch Mispredict metric, and see if any cells are pink.
    [Show full text]
  • Intel® Offload Advisor
    NHR@ZIB - Intel oneAPI Workshop, 2-3 March 2021 Intel® Advisor Offload Modelling and Analysis Klaus-Dieter Oertel Intel® Advisor for High Performance Code Design Rich Set of Capabilities Offload Modelling Design offload strategy and model performance on GPU. One Intel Software & Architecture (OISA) 2 Agenda ▪ Offload Modelling ▪ Roofline Analysis – Recap ▪ Roofline Analysis for GPU code ▪ Flow Graph Analyzer One Intel Software & Architecture (OISA) 3 Offload Modelling 4 Intel® Advisor - Offload Advisor Find code that can be profitably offloaded Starting from an optimized binary (running on CPU): ▪ Helps define which sections of the code should run on a given accelerator ▪ Provides performance projection on accelerators One Intel Software & Architecture (OISA) 5 Intel® Advisor - Offload Advisor What can be expected? Speedup of accelerated code 8.9x One Intel Software & Architecture (OISA) 6 Modeling Performance Using Intel® Advisor – Offload Advisor Baseline HW (Programming model) Target HW 1. CPU (C,C++,Fortran, Py) CPU + GPU measured measured estimated 1.a CPU (DPC++, OCL, OMP, CPU + GPU measured “target=host”) measured estimated 2 CPU+iGPU (DPC++, OCL, OMP, CPU + GPU measured “target=offload”) measured Estimated Optimization Notice Copyright © 2019, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Modeling Performance Using Intel® Advisor – Offload Advisor Region X Region Y Execution time on baseline platform (CPU) • Execution time on accelerator. Estimate assuming bounded exclusively
    [Show full text]
  • Intel Edge Computing Portfolio E-Booklet
    Intel Edge Computing Portfolio e-Booklet Learn how Intel products, developer tools, programs, and ecosystem solutions accelerate the development and deployment of edge computing solutions 2021 Febuary Release Legal Notices and Disclaimers Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer to learn more. Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction. Other names and brands may be claimed as the property of others. Any third-party information referenced on this document is provided for information only. Intel does not endorse any specific third-party product or entity mentioned on this document. Intel, the Intel Logo and other Intel marks are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. Intel Statement on Product Usage Intel is committed to respecting human rights and avoiding complicity in human rights abuses. See Intel’s Global Human Rights Principles. Intel’s products and software are intended only to be used in applications that do not cause or contribute to a violation of an internationally recognized human right. © Intel Corporation. Contents Unmatched
    [Show full text]
  • Intel® Inspector XE 2013
    MEMORY AND THREAD DEBUGGER Product Brief Intel® Inspector XE 2013 Top Features Deliver More Reliable Applications . Inspect C, C++, C# and Fortran Intel® Inspector XE 2013 is an easy to use dynamic memory and threading error detector for . No special builds required. Use Windows* and Linux*. Enhance productivity, cut cost and speed time-to-market. your normal compiler and build. Find memory and threading defects early in the development cycle. The earlier an error is . Inspects all code, even if the found, the lower the cost. Intel Inspector XE makes it easy to find and diagnose errors early. source is unavailable . Highlights the error at multiple Find errors that traditional regression testing and static analysis miss. Intel Inspector XE source code locations, provides finds latent errors on the executed code path plus intermittent and non-deterministic errors, corresponding call stacks even if the error-causing timing scenario does not happen. New! Debugger breakpoints for easier diagnosis of difficult bugs Memory Errors Threading Errors . New! Heap growth analysis finds . Memory leaks . Data races cause of heap growth in a . Memory corruption and Illegal Accesses - Heap races problematic region . Allocation / de-allocation API mismatches - Stack races . Inconsistent memory API usage . Deadlocks “We struggled for a week with a crash situation, the corruption was C, C++, C# and Fortran. Or any mix. Is your GUI in C# with performance sensitive code in C++? identified but the source was Got legacy code in Fortran? Using libraries without the source? No problem, it all works. really hard to find. Then we ran Dynamic instrumentation enables inspection of all code including third party libraries where the Intel® Inspector XE and source is not available.
    [Show full text]
  • Intel® Inspector for Systems
    Intel® Inspector for Systems 1 Agenda 1. Intro to Intel® Inspector 2. The Inspector workflow and walk thru 3. Dynamic Memory and Threading Analysis 4. Static Analysis 5. Readying your sources and builds 6. Managing analysis results 7. Team collaboration 8. Advanced features Copyright© 2013, Intel Corporation. All rights reserved. 2 *Other brands and names are the property of their respective owners. Intel® Inspector Is a debugging tool for threaded software. It is also called as a “Correctness Analyzer”. Has an intuitive GUI. Provides powerful results management, navigation and filtering! Easy to use one-click help for diagnostics (Possible causes and solution suggestions) Finds threading bugs in OpenMP*, CilkTM Plus, Intel® Threading Building Blocks, Win32* and Posix Threads threaded software Locates bugs quickly that can take days to find using traditional methods and tools – Isolates problems, not the symptoms – Bug does not have to occur to find it! Intel® Inspector has a comprehensive portfolio of analyses and an easy to use GUI for effective and efficient results management. Copyright© 2013, Intel Corporation. All rights reserved. 3 *Other brands and names are the property of their respective owners. Motivation for The Inspector Where are my application’s… Memory Errors Threading Errors Security Errors • Invalid Accesses • Races • Buffer overflows and • Memory Leaks • Deadlocks underflows • Uninitialized Memory • Cross Stack References • Incorrect pointer usage Accesses • Over 250 error types… • Developing threaded applications can be complex and expensive • New class of correctness problems are caused by the interaction between concurrent threads Multi-threading problems are hard to reproduce, difficult to debug and expensive to fix! 4 Copyright© 2013, Intel Corporation.
    [Show full text]
  • Intel® Parallel Studio XE 2020 Update 2 Release Notes
    Intel® Parallel StudIo Xe 2020 uPdate 2 15 July 2020 Contents 1 Introduction ................................................................................................................................................... 2 2 Product Contents ......................................................................................................................................... 3 2.1 Additional Information for Intel-provided Debug Solutions ..................................................... 4 2.2 Microsoft Visual Studio Shell Deprecation ....................................................................................... 4 2.3 Intel® Software Manager ........................................................................................................................... 5 2.4 Supported and Unsupported Versions .............................................................................................. 5 3 What’s New ..................................................................................................................................................... 5 3.1 Intel® Xeon Phi™ Product Family Updates ...................................................................................... 12 4 System Requirements ............................................................................................................................. 13 4.1 Processor Requirements........................................................................................................................ 13 4.2 Disk Space Requirements .....................................................................................................................
    [Show full text]
  • Code That Performs
    PRODUCT BRIEF High-Performance Computing Intel® Parallel Studio XE 2020 Software Code that Performs Intel® Parallel Studio XE helps developers take their HPC, enterprise, AI, and cloud applications to the max—with fast, scalable, and portable parallel code Intel® Parallel Studio XE is a comprehensive suite of development tools that make it fast and easy to build modern code that gets every last ounce of performance out of the newest Intel® processors. This tool-packed suite simplifies creating code with the latest techniques in vectorization, multi-threading, multi-node, and memory optimization. Get powerful, consistent programming with Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions for Intel® Xeon® Scalable processors, plus support for the latest standards and integrated development environments (IDEs). Who Needs It? • C, C++, Fortran, and Python* software developers and architects building HPC, enterprise, AI, and cloud solutions • Developers looking to maximize their software’s performance on current and future Intel® platforms What it Does • Creates faster code1. Boost application performance that scales on current and future Intel® platforms with industry-leading compilers, numerical libraries, performance profilers, and code analyzers. • Builds code faster. Simplify the process of creating fast, scalable, and reliable parallel code. • Delivers Priority Support. Connect directly to Intel’s engineers for confidential answers to technical questions, access older versions of the products, and receive free updates for a year. Paid license required. What’s New • Speed artificial intelligence inferencing. Intel® Compilers, Intel® Performance Libraries and analysis tools support Intel® Deep Learning Boost, which includes Vector Neural Network Instructions (VNNI) in 2nd generation Intel® Xeon® Scalable processors (codenamed Cascade Lake/AP platforms) • Develop for large memories of up to 512GB DIMMs with Persistence.
    [Show full text]