
© 2017 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
Part of Intel® Parallel Studio XE
Roadmap Notice: All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.
*Other names and brands may be claimed as the property of others.
For more complete information about compiler optimizations, see our Optimization Notice.

Contacts
Advisor Support Mail List: [email protected]
Zakhar Matveev, Intel Advisor Product Architect: [email protected]
Kirill Rogozhin, Intel Advisor Project Manager: [email protected]
Egor Kazachkov, Intel Advisor Senior Developer: [email protected]

Optimization Notice
Copyright © 2015, Intel Corporation. All rights reserved. Intel Confidential
*Other names and brands may be claimed as the property of others.

What is Intel® Advisor
• Vectorization analysis
• Roofline
• Cache Simulator and MAP
• Python API
• Threading prototyping

Get Faster Code Faster! Intel® Advisor Vectorization Optimization

Have you:
• Recompiled for AVX2 with little gain?
• Wondered where to vectorize?
• Recoded intrinsics for a new architecture?
• Struggled with compiler reports?

Data-Driven Vectorization:
• What vectorization will pay off most?
• What’s blocking vectorization? Why?
• Are my loops vector-friendly?
• Will reorganizing data increase performance?
• Is it safe to just use #pragma omp simd?

The Right Data at Your Fingertips
Get all the data you need for high-impact vectorization:
• Focus on hot loops
• Filter by which loops are vectorized!
• What prevents vectorization?
• What vectorization issues do I have?
• Which vector instructions are being used?
• How efficient is the code?
• Trip counts
5 Steps to Efficient Vectorization (Intel® Advisor – Vectorization Advisor)
1. Compiler diagnostics + Performance Data + SIMD efficiency information
2. Guidance: detect problem and recommend how to fix it
3. Trip Counts + FLOP: understand utilization, parallelism granularity & overheads
4. Memory Access Patterns Analysis
5. Loop-Carried Dependency Analysis

Step 1. Compiler diagnostics + Performance Data + SIMD efficiency information + Binary Analysis

Vector Efficiency: All the Data in One Place
My “performance thermometer”:
• Auto-vectorization affected <3% of the code, with moderate speed-ups.
• A first attempt to simply add #pragma omp simd introduced a slow-down.
• Look at Vector Issues and Traits to find out why: all kinds of “memory manipulations”, usually an indication of a “bad” access pattern.
Efficiency scale: the original (scalar) code corresponds to a 1x speed-up; the achieved efficiency is what the vectorized loop actually reaches; the upper bound, 100% efficiency, corresponds to a 4x gain (VL=4).
Survey: find out if your code is “under-vectorized” and why.

Vectorization Tied to Your Code

Don’t Just Vectorize, Vectorize Efficiently
See detailed times for each part of your loops. Is it worth more effort?
Step 2. Guidance: detect problem and recommend how to fix it

Get Specific Advice for Improving Vectorization
Click to see the recommendation. Here, Advisor shows hints to move iterations to the vector body.

Step 3. Trip Counts + FLOP: understand utilization, parallelism granularity & overheads

Critical Data Made Easy: Loop Trip Counts
Knowing the time spent in a loop is not enough! Check the actual trip counts, and find trip counts for each part of a loop.

Precise, Repeatable FLOP Metrics
• FLOPS by loop and function
• All recent Intel processors
• Instrumentation (count FLOP) plus sampling (time with low overhead)
• Adjusted for masking with AVX-512 processors

Step 4. Memory Access Patterns Analysis

Improve Vectorization: Memory Access Pattern Analysis
Select loops of interest, then run Memory Access Patterns analysis to check how memory is used in the loop and the called functions.
Advisor Memory Access Pattern (MAP): Know Your Access Pattern
• Unit-stride access: for (i = 0; i < N; i++) A[i] = C[i] * D[i];
• Constant-stride access: for (i = 0; i < N; i++) point[i].x = x[i];
• Variable-stride access: for (i = 0; i < N; i++) A[B[i]] = C[i] * D[i];

Find Vector Optimization Opportunities: Memory Access Pattern Analysis

Step 5. Loop-Carried Dependency Analysis

Enabling Vectorization
Check dependencies, then use #pragma simd.

Is It Safe to Vectorize?
Loop-carried dependency analysis verifies correctness. Select the loop for Dependencies analysis and press play! A vector dependence prevents vectorization!

Correctness – Is It Safe to Vectorize?
Loop-carried dependency analysis. After receiving a recommendation to force vectorization of a loop:
1. Mark up the loop and check for REAL dependencies.
2. Explore the dependencies with code snippets.
In this example, 3 dependencies were detected:
• RAW – Read After Write
• WAR – Write After Read
• WAW – Write After Write
The report lists the detected dependencies and the source lines where the read and write accesses were detected. This is NOT a good candidate to force vectorization!
Data Dependencies – Tough Problem #1
Is it safe to force the compiler to vectorize?

Questions to Answer with Roofline for Your Loops/Functions
1. Am I doing well? How far am I from the peak? (Do I utilize the hardware well or not?) A big optimization gap means platform underutilization.
2. Where is the final bottleneck? (Where will my limit be after all optimizations?) This guides long-term ROI and optimization strategy:
• Memory-bound: invest in cache blocking, etc.
• Compute-bound: invest in SIMD, ...

Automated Roofline Chart Generation in Advisor: CARM (Cache-Aware Roofline Model)
• Each roof (slope) gives the peak CPU/memory throughput of your PLATFORM (benchmarked).
• Each dot represents a loop or function in YOUR APPLICATION (profiled).
• Legend for dots: takes less time / takes considerable time / takes much time.
The chart gives a summarized memory-compute efficiency picture for the application.

Roofline Picture
Chart configuration, roof configuration, performance headroom, tooltips with more data for dots, and a switch to grid representation.

Chart Configuration
• Select which operations are counted
• Aggregate data over the call tree
• Select memory levels
• Select only loads or stores