Enhancing OMB with Python Benchmarks Presentation at MUG ‘21

Enhancing OMB with Python Benchmarks Presentation at MUG ‘21

MVAPICH2 Meets Python: Enhancing OMB with Python Benchmarks Presentation at MUG ‘2 Nawras Alnaasan Network-Based Computing Laboratory Dept. of Computer Science and Engineering !he #hio State $ni%ersity alnaasan.&'osu.edu O$tline • #S$ (icro Benc"marks )#(B* • +"y ,yt"on- • (,. for ,yt"on • .ntroducing #(B-,y • .nitial /esults • Conclusion !etwork Base" Com#$ting %a&oratory MUG ‘2 2 O)U Micro Benchmarks (OMB+ • #(B is a widely used package to measure performance of (,. operations. • .t helps in optimi0ing (,I applications on different HPC systems. • #1ers a series of benchmarks including but not limited to point-to-point, blocking and non-blocking collecti%es, and one-sided tests. • A large set of options and flags for users to create customi0able tests. • Supports different platforms like /#Cm/CUDA to run on AR(4N5IDIA GPUs. • +ritten in C !etwork Base" Com#$ting %a&oratory MUG ‘2 ( -hy Python? • Second most popular programming language according to the !.#BE inde7 as of August 8021. Adopted from= "ttps=44www.tiobe.com4tiobe-inde74 • 6reat community and support around (achine Learning Big Data and Cloud Computing – (any deep learning frameworks like ,yTorch, !ensor3ow :eras etc. – #ne of the most important tools for data science and analytics. • 6reat ,otential to use (,. and distributed computing tec"niques. !etwork• (any Base" other Com#$ting ,yt"on libraries and frameworks like Sci,y Numpy D<ango etc. %a&oratory MUG ‘2 , -hy Python? 5ery flexible and simplified synta7. Adopted from= !etwork Base" Com#$ting "ttps:44www."ardikp.com489&>4&84?94pyt"on-cpp4 %a&oratory MUG ‘2 / MPI for Python • (,. support in ,yt"on is needed for multiple purposes especially for distributed computing applications )DL data science etc.* • ,yt"on needs a wrapper to use (,. functionalities. • mpiApy is t"e most widely used wrapper to pro%ide ,yt"on bindings for (,. • #t"er wrappers exist like t"e torc".distributed package from ,y!orc" !etwork Base" Com#$ting %a&oratory MUG ‘2 0 mpi4py • Bor #(B-,y we mainly use mpiApy to provide ,yt"on bindings for (,.. • Supports communication for selected ,yt"on ob<ects as memory bu1ers )bytearrays Numpy etc.* • Capable of seriali0ing general ,yt"on ob<ects using built-in C,ickleD module. – (,..Send)* uses supported data structures as memory bu1ers. – (,..send)* seriali0es ob<ects before sending data. • /ecent releases E%?.& supports 6,$-aware (,.. !etwork Base" Com#$ting %a&oratory MUG ‘2 2 Intro"$cing OMB4Py PythonPython OMB4PyOMB4Py A##'icationsA##'ications BenchmarksBenchmarks PythonPython m#i,#ym#i,#y MPIMPI Im#'ementationIm#'ementation 1or1or CPUCPU oror GPUGPU Har"wareHar"ware com#onentscom#onents !etwork Base" Com#$ting %a&oratory MUG ‘2 3 -hat does OMB4Py o6er.7 • Bollows a similar design to the original OMB and brings it into Pyt"on. • Support for point-to-point and collecti%e (,. operations. • Mimics beha%ior of real python applications t"at use MPI. • Support for both CP$ and 6,U tests using CUDA-aware Python data structures as buffers like Cu,y Numba and PyCUDA. • A wide range of command-line arguments to run customi0able tests. !etwork Base" Com#$ting %a&oratory MUG ‘2 5 Sample Latency O$t#ut #(B #(B-,y !etwork Base" Com#$ting Similar CL. options %a&oratory MUG ‘2 8 OMB4Py: Initial CPU Results osuGlatency inter-node initial results Small message Large message si0e si0e • Small ,ython o%erhead around 0.5 microseconds. • #%er"ead only apparent in smaller message si0es !etwork Base" Com#$ting %a&oratory MUG ‘2 OMB4Py: Initial GPU Results osuGlatency inter-node initial results Small message Large message • Differentsi0e o%erheads depending on the C$DA-aware datasi0e structure • Around 3 microseconds for CuPy and ,yC$DA • Around 5 microseconds for Numba !etwork Base" Com#$ting %a&oratory• #%er"ead is only apparent in smaller MUG message ‘2 si0es 2 Conclusion • (,. support for Python is crucially needed in t"e Python community. • (,. benchmarks are needed to measure and optimi0e MPI performance. • mpi4py provides Python bindings for (,I and is used with OMB-Py. • OMB-Py provides Python (,. benchmarking similar to OMB in C. • Initial results show a small o%erhead due to Python-MPI bindings layer and the use of Python objects as buffers for communication. !etwork Base" Com#$ting %a&oratory MUG ‘2 ( Thank ;ou< alnaasan.&'osu.edu Network-Based Computing Laboratory "ttp=44nowlab.cse.ohio-state.edu/ Hig" ,erformance Deep Learning http:/4"idl.cse.ohio-state.edu/ !"e 2ig"-,erformance Deep Learning !"e 2ig"-,erformance (,.4,6AS ,ro<ect ,ro<ect "ttp=44m%apich.cse.o"io-state.edu4 !etwork Base""ttp=44"idl.cse.o"io-state.edu4 Com#$ting %a&oratory MUG ‘2 ,.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    14 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us