Enhancing OMB with Python Benchmarks Presentation at MUG ‘21
MVAPICH2 Meets Python: Enhancing OMB with Python Benchmarks
Presentation at MUG ‘21
Nawras Alnaasan
Network-Based Computing Laboratory
Dept. of Computer Science and Engineering
The Ohio State University
alnaasan.1@osu.edu

Outline
• OSU Micro Benchmarks (OMB)
• Why Python?
• MPI for Python
• Introducing OMB-Py
• Initial Results
• Conclusion

OSU Micro Benchmarks (OMB)
• OMB is a widely used package to measure the performance of MPI operations.
• It helps in optimizing MPI applications on different HPC systems.
• Offers a series of benchmarks including, but not limited to, point-to-point, blocking and non-blocking collectives, and one-sided tests.
• A large set of options and flags for users to create customizable tests.
• Supports different platforms like ROCm/CUDA to run on AMD/NVIDIA GPUs.
• Written in C.

Why Python?
• Second most popular programming language according to the TIOBE index as of August 2021. Adapted from: https://www.tiobe.com/tiobe-index/
• Great community and support around Machine Learning, Big Data, and Cloud Computing
– Many deep learning frameworks like PyTorch, TensorFlow, Keras, etc.
– One of the most important tools for data science and analytics.
• Great potential to use MPI and distributed computing techniques.
• Many other Python libraries and frameworks like SciPy, NumPy, Django, etc.

Why Python?
• Very flexible and simplified syntax. Adapted from: https://www.hardikp.com/2017/12/30/python-cpp/

MPI for Python
• MPI support in Python is needed for multiple purposes, especially for distributed computing applications (DL, data science, etc.).
• Python needs a wrapper to use MPI functionalities.
• mpi4py is the most widely used wrapper to provide Python bindings for MPI.
• Other wrappers exist, like the torch.distributed package from PyTorch.

mpi4py
• For OMB-Py, we mainly use mpi4py to provide Python bindings for MPI.
• Supports communication for selected Python objects as memory buffers (bytearrays, NumPy arrays, etc.).
• Capable of serializing general Python objects using the built-in “Pickle” module.
– MPI.Send() uses supported data structures as memory buffers.
– MPI.send() serializes objects before sending data.
• Recent releases (≥ v3.1) support GPU-aware MPI.

Introducing OMB-Py
[Figure: software stack diagram. Labels, top to bottom: Python Applications and OMB-Py Benchmarks; Python; mpi4py; MPI Implementation for CPU or GPU; Hardware components.]

What does OMB-Py offer?
• Follows a similar design to the original OMB and brings it into Python.
• Support for point-to-point and collective MPI operations.
• Mimics the behavior of real Python applications that use MPI.
• Support for both CPU and GPU tests using CUDA-aware Python data structures as buffers, like CuPy, Numba, and PyCUDA.
• A wide range of command-line arguments to run customizable tests.

Sample Latency Output
[Figure: sample latency output from OMB and from OMB-Py, side by side. Caption: similar CLI options.]

OMB-Py: Initial CPU Results
osu_latency inter-node initial results
[Figure: two latency plots, one for small message sizes and one for large message sizes.]
• Small Python overhead, around 0.5 microseconds.
• Overhead is only apparent for smaller message sizes.

OMB-Py: Initial GPU Results
osu_latency inter-node initial results
[Figure: two latency plots, one for small message sizes and one for large message sizes.]
• Different overheads depending on the CUDA-aware data structure
• Around 3 microseconds for CuPy and PyCUDA
• Around 5 microseconds for Numba
• Overhead is only apparent for smaller message sizes.

Conclusion
• MPI support for Python is crucially needed in the Python community.
• MPI benchmarks are needed to measure and optimize MPI performance.
• mpi4py provides Python bindings for MPI and is used with OMB-Py.
• OMB-Py provides Python MPI benchmarking similar to OMB in C.
• Initial results show a small overhead due to the Python-MPI bindings layer and the use of Python objects as buffers for communication.

Thank You!
alnaasan.1@osu.edu
Network-Based Computing Laboratory: http://nowlab.cse.ohio-state.edu/
The High-Performance MPI/PGAS Project: http://mvapich.cse.ohio-state.edu/
The High-Performance Deep Learning Project: http://hidl.cse.ohio-state.edu/
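
Appendix: a point-to-point latency sketch
The mpi4py slide distinguishes the buffer-based Send from the pickle-based send, and the results slides report osu_latency numbers. The following is a minimal sketch of how such a ping-pong latency test might look with mpi4py. It is not the OMB-Py source code. The function names, default iteration counts, and output layout are assumptions made for illustration, and the timer is Python's time.perf_counter rather than whatever OMB-Py uses. Only standard mpi4py calls appear (Get_rank, Get_size, Barrier, Send, Recv, send, recv).

```python
import time


def pingpong_latency_us(comm, msg_size, iterations=1000, warmup=100,
                        use_pickle=False):
    """Return the average one-way latency in microseconds (ranks 0 and 1).

    use_pickle=False: Send/Recv operate directly on a bytearray buffer.
    use_pickle=True:  send/recv pickle the object before transmitting it.
    Hypothetical helper for illustration; not part of OMB-Py or mpi4py.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")

    rank = comm.Get_rank()
    send_buf = bytearray(msg_size)
    recv_buf = bytearray(msg_size)

    comm.Barrier()  # start both ranks together
    start = time.perf_counter()
    for i in range(warmup + iterations):
        if i == warmup:
            # Discard warm-up iterations by restarting the clock here.
            start = time.perf_counter()
        if rank == 0:
            if use_pickle:
                comm.send(send_buf, dest=1, tag=1)
                comm.recv(source=1, tag=1)
            else:
                comm.Send(send_buf, dest=1, tag=1)
                comm.Recv(recv_buf, source=1, tag=1)
        elif rank == 1:
            if use_pickle:
                comm.recv(source=0, tag=1)
                comm.send(send_buf, dest=0, tag=1)
            else:
                comm.Recv(recv_buf, source=0, tag=1)
                comm.Send(send_buf, dest=0, tag=1)
    elapsed = time.perf_counter() - start

    # Each iteration is a round trip, so halve it for one-way latency.
    return elapsed * 1e6 / (2 * iterations)


def run_with_mpi4py(max_size=1 << 20):
    """Print buffer vs. pickle latency for power-of-two message sizes."""
    from mpi4py import MPI  # imported here so the helper above has no MPI dependency

    comm = MPI.COMM_WORLD
    if comm.Get_size() != 2:
        raise RuntimeError("this sketch expects exactly 2 ranks")

    if comm.Get_rank() == 0:
        print(f"{'size (B)':>10} {'Send (us)':>12} {'send (us)':>12}")
    size = 1
    while size <= max_size:
        lat_buf = pingpong_latency_us(comm, size)
        lat_pkl = pingpong_latency_us(comm, size, use_pickle=True)
        if comm.Get_rank() == 0:
            print(f"{size:>10} {lat_buf:>12.2f} {lat_pkl:>12.2f}")
        size *= 2
```

To try it, call run_with_mpi4py() at the end of the script and launch it on two ranks through the MPI launcher of your installation, for example mpiexec -n 2. Running the buffer path and the pickle path over the same message sizes is one way to see how much of the Python overhead comes from serialization.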