Vector Speedup of mkFit: Effects of Different SIMD Options & Turbo Boost

Steve Lantz
4/17/2020

What’s the Point?

• Is it really best to measure mkFit’s vectorization scaling by increasing MPT_SIZE with the code optimized for AVX-512?
  – It means that the “serial” (MPT_SIZE=1) code is still being vectorized by the compiler, to the extent that it can do so
  – Previously we found that turning off vectorization entirely increases the serial time by about 15%
  – To make scaling tests more consistent, maybe we should also match intermediate MPT_SIZEs (4, 8) to the right ISA extensions (SSE, AVX2)?
• There is also a question of whether AVX-512 is really faster than AVX2 when Turbo Boost is enabled and all cores are active

Compiling for the Test Runs on phi3

• The benchmark script and Makefile.config were altered to use different “vOpts” at compile time, depending on MPT_SIZE:

    -DMPT_SIZE=1   -g -O3 -qopenmp -no-vec -qno-openmp-simd
    -DMPT_SIZE=2   -g -O3 -qopenmp -march=core2      # = -xssse3
    -DMPT_SIZE=4   -g -O3 -qopenmp -march=core2
    -DMPT_SIZE=8   -g -O3 -qopenmp -march=haswell    # = -xcore-avx2
    -DMPT_SIZE=16  -g -O3 -qopenmp -xHost -qopt-zmm-usage=high

• Surprising (?) result: MPT_SIZE=4 became anomalously slow
  – Hypothesis: this is due to the lack of an FMA instruction in SSE
  – Tests were repeated with the -no-fma option for all MPT_SIZE values

Vectorization Scaling Test Results on phi3

[Figure: vectorization scaling test results on phi3]

Reasons Not to Publish That Figure

• The Amdahl fit becomes worse because FMAs add about 3% to the performance of AVX2 and AVX-512
  – Not something we want to have to explain!
• Probably there are also other shortcuts and improvements in the later instruction sets
  – The performance improvement from using them goes beyond the degree of vectorization

Overall Vectorization Scaling Test Results

[Figure: two panels. Before: AVX-512 (or AVX) options for all. After: options matched to MPT_SIZE ??]
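The Amdahl fit mentioned on the slides can be illustrated with a minimal sketch. It assumes the standard form of Amdahl's law, with the vector width n standing in for MPT_SIZE and p for the fraction of the work that vectorizes. The function names and the value p = 0.7 are illustrative assumptions, not fitted values from these tests.

```python
def amdahl_speedup(p, n):
    """Speedup predicted by Amdahl's law for vectorizable fraction p and width n."""
    return 1.0 / ((1.0 - p) + p / n)


def vector_fraction(speedup, n):
    """Invert Amdahl's law: the fraction p implied by an observed speedup at width n."""
    # From 1/S = (1 - p) + p/n it follows that p = (1 - 1/S) / (1 - 1/n).
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


# Hypothetical vectorizable fraction, chosen only for illustration.
P_ASSUMED = 0.7

for width in (1, 2, 4, 8, 16):
    print(f"MPT_SIZE={width:2d}  predicted speedup = {amdahl_speedup(P_ASSUMED, width):.2f}")
```

Under this model, a speedup of 2x at width 8 would correspond to `vector_fraction(2.0, 8)`, which is about 0.57. A pure Amdahl curve has no term for per-instruction gains such as FMA, which is one way to see why matching the ISA to MPT_SIZE degrades the fit.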
Reasons Not to Publish That Figure Either

• KNL suddenly looks like it has superpowers with AVX-512
  – The real reason: only 1 of its 2 VPUs can process AVX2 and earlier
  – Again, not something we want to have to explain!
• SNB sticks out like a sore thumb due to flat speedup at size 8
  – SNB predates AVX2, so AVX must be used for MPT_SIZE=8
  – Therefore SNB has no real FMA (though it can pipeline MUL and ADD)
  – We have never gotten more than 2x vectorization speedup on SNB
  – To save the apology, Giuseppe and I opted to eliminate SNB from the mkFit paper

Turbo Boost for AVX2 vs. AVX-512 on phi3

    Turbo  ISA ext.  Events  MEIF  Threads  GHz range  Time, s  Evt. loop, s
    ON     AVX2      640     32    64       2.2-2.4    1.12     1
    ON     AVX-512   640     32    64       1.9-2.0    1.13     1.02
    off    AVX2      640     32    64       2.0-2.1    1.26     1.11
    off    AVX-512   640     32    64       1.9-1.95   1.22     1.08
    ON     AVX2      6400    32    64       2.2-2.4    8.21     8.06
    ON     AVX-512   6400    32    64       1.9-2.0    8.16     8.02
    off    AVX2      6400    32    64       2.0-2.1    8.86     8.69
    off    AVX-512   6400    32    64       1.9-1.95   8.44     8.24
    ON     AVX2      6400    32    32       2.2-2.4    9.43     9.25
    ON     AVX-512   6400    32    32       1.9-1.95   9.78     9.65
    off    AVX2      6400    32    32       2.0-2.1    10.6     10.44
    off    AVX-512   6400    32    32       1.9-1.95   9.98     9.83

Conclusion: when all cores run mkFit, Turbo lets AVX2 perform as well as AVX-512. Intuitively, if vectors are narrower, clocks run faster, and the work is done just as fast.

Backup

Results for Serial Baseline on phi3, 1 core

    Options for serial run         Build time for 20 events, s
    -xHost -qopt-zmm-usage=high    1.531
    -no-simd -no-vec               1.738
    -qno-openmp-simd -no-vec       1.760

  – The more restrictive options lead to ~15% slower times
  – Speedups are nearly the same if first event is discarded

Speedup Curves with Different Baselines

[Figure: speedup curves with different baselines]
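As a check on the Turbo Boost conclusion, the total times in the "Turbo Boost for AVX2 vs. AVX-512 on phi3" table can be compared directly. The sketch below copies the "Time, s" column from that table and reports how much slower (positive) or faster (negative) AVX-512 is than AVX2 for each configuration. The dictionary layout and function names are assumptions made for this illustration.

```python
# Total run time in seconds, keyed by (turbo, isa, events, threads).
# Values are copied from the "Time, s" column of the Turbo Boost table.
TIMES = {
    ("ON", "AVX2", 640, 64): 1.12,   ("ON", "AVX-512", 640, 64): 1.13,
    ("off", "AVX2", 640, 64): 1.26,  ("off", "AVX-512", 640, 64): 1.22,
    ("ON", "AVX2", 6400, 64): 8.21,  ("ON", "AVX-512", 6400, 64): 8.16,
    ("off", "AVX2", 6400, 64): 8.86, ("off", "AVX-512", 6400, 64): 8.44,
    ("ON", "AVX2", 6400, 32): 9.43,  ("ON", "AVX-512", 6400, 32): 9.78,
    ("off", "AVX2", 6400, 32): 10.6, ("off", "AVX-512", 6400, 32): 9.98,
}


def percent_diff(value, reference):
    """Relative difference of value with respect to reference, in percent."""
    return 100.0 * (value - reference) / reference


def avx512_vs_avx2(turbo, events, threads):
    """Percent by which the AVX-512 time exceeds the AVX2 time (negative = faster)."""
    t512 = TIMES[(turbo, "AVX-512", events, threads)]
    t256 = TIMES[(turbo, "AVX2", events, threads)]
    return percent_diff(t512, t256)


for events, threads in ((640, 64), (6400, 64), (6400, 32)):
    for turbo in ("ON", "off"):
        diff = avx512_vs_avx2(turbo, events, threads)
        print(f"events={events:5d} threads={threads} turbo={turbo:3s}  "
              f"AVX-512 vs AVX2: {diff:+.1f}%")
```

With Turbo ON and 64 threads the two ISAs differ by less than 1% in either direction, while with Turbo off AVX-512 is roughly 3% to 6% faster in every configuration. At 32 threads with Turbo ON, AVX2 comes out ahead by a few percent.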
