
MPI Runtime Error Detection with MUST
For the 13th VI-HPS Tuning Workshop
Joachim Protze and Felix Münchhalfen
IT Center, RWTH Aachen University
February 2014

Content
• MPI Usage Errors
• Error Classes
• Avoiding Errors
• Correctness Tools
• Runtime Error Detection
• MUST
• Hands On

Motivation
• MPI programming is error prone
• Portability errors (occur just on some systems, or just for some runs)
• Bugs may manifest as (from more to less obvious):
  – Crash
  – Application hanging
  – Finishing with incorrect results
• Questions:
  – Why does it crash or hang?
  – Is my result correct?
  – Will my code also give correct results on another system?
• Tools help to pinpoint these bugs

Common MPI Error Classes
• Common syntactic errors (tools to use: MUST, a static analysis tool, or a debugger):
  – Incorrect arguments
  – Resource usage
  – Lost/dropped requests
  – Buffer usage
  – Type matching
  – Deadlocks
• Semantic errors that are correct in terms of the MPI standard, but do not match the programmer's intent (tool to use: a debugger):
  – Displacement/size/count errors

MPI Usage Errors (2)
• Complications in MPI usage:
  – Non-blocking communication
  – Persistent communication
  – Complex collectives (e.g. MPI_Alltoallw)
  – Derived datatypes
  – Non-contiguous buffers
• Error classes include:
  – Incorrect arguments
  – Resource errors
  – Buffer usage
  – Type matching
  – Deadlocks

Error Classes – Incorrect Arguments
• Complications:
  – Calls with many arguments
  – In Fortran, many arguments are of type INTEGER
  – Several restrictions on the arguments of some calls
  – Compilers can't detect all incorrect arguments
• Example (the Fortran datatype MPI_INTEGER is used in a C call):
  MPI_Send (buf, count, MPI_INTEGER, target, tag, MPI_COMM_WORLD);

Error Classes – Resource Usage
• Complications:
  – Many types of resources
  – Leaks
  – MPI internal limits
• Example (the duplicated communicator is never freed):
  MPI_Comm_dup (MPI_COMM_WORLD, &newComm);
  MPI_Finalize ();

Error Classes – Buffer Usage
• Complications:
  – Memory regions passed to MPI must not overlap (except send-send)
  – Derived datatypes can span non-contiguous regions
  – Collectives can both send and receive
• Example (the two requests overlap in buf[4]):
  MPI_Isend (&(buf[0]), 5 /*count*/, MPI_INT, ...);
  MPI_Irecv (&(buf[4]), 5 /*count*/, MPI_INT, ...);

Error Classes – Type Matching
• Complications:
  – Complex derived types
  – Types match if their signatures match, not their constructors
  – Partial receives
• Example 1:
  Task 0: MPI_Send (buf, 1, MPI_INT);    Task 1: MPI_Recv (buf, 1, MPI_INT);
  Matches => equal types match

Error Classes – Type Matching (2)
• Example 2: consider the type T1 = {MPI_INT, MPI_INT}
  Task 0: MPI_Send (buf, 1, T1);    Task 1: MPI_Recv (buf, 2, MPI_INT);
  Matches => the type signatures are equal
• Example 3: T1 = {MPI_INT, MPI_FLOAT}, T2 = {MPI_INT, MPI_INT}
  Task 0: MPI_Send (buf, 1, T1);    Task 1: MPI_Recv (buf, 1, T2);
  Mismatch => MPI_INT != MPI_FLOAT

Error Classes – Type Matching (3)
• Example 4: T1 = {MPI_INT, MPI_FLOAT}, T2 = {MPI_INT, MPI_FLOAT, MPI_INT}
  Task 0: MPI_Send (buf, 1, T1);    Task 1: MPI_Recv (buf, 1, T2);
  Matches => MPI allows partial receives
• Example 5: T1 = {MPI_INT, MPI_FLOAT}, T2 = {MPI_INT, MPI_FLOAT, MPI_INT}
  Task 0: MPI_Send (buf, 2, T1);    Task 1: MPI_Recv (buf, 1, T2);
  Mismatch => a partial send is not allowed

Error Classes – Deadlocks
• Complications:
  – Non-blocking communication
  – Complex completions (MPI_Wait{all, any, some})
  – Non-determinism (e.g. MPI_ANY_SOURCE)
  – Choices left to the MPI implementation (e.g. a buffered MPI_Send)
  – Deadlocks may be caused by non-trivial dependencies
• Example 1:
  Task 0: MPI_Recv (from:1);    Task 1: MPI_Recv (from:0);
  Deadlock: 0 waits for 1, which waits for 0

Error Classes – Deadlocks (2)
• How to visualize/understand deadlocks?
  – Common approach: waiting-for graphs (WFGs)
  – One node for each rank
  – Rank X waits for rank Y => node X has an arc to node Y
• Consider the situation from Example 1:
  Task 0: MPI_Recv (from:1);    Task 1: MPI_Recv (from:0);
• Visualization: 0: MPI_Recv -> 1: MPI_Recv and 1: MPI_Recv -> 0: MPI_Recv
• Deadlock criterion: a cycle (for simple cases)

Error Classes – Deadlocks (3)
• What about collectives?
  – A rank calling a collective operation waits for all tasks to issue a matching call
  – One arc to each task that did not issue a matching call
  – One node can have multiple outgoing arcs
  – Multiple arcs mean: waits for all of these nodes
• Example 2:
  Task 0: MPI_Bcast (WORLD);    Task 1: MPI_Bcast (WORLD);    Task 2: MPI_Gather (WORLD);
• Visualization: 0: MPI_Bcast and 1: MPI_Bcast each have an arc to 2: MPI_Gather, and vice versa
• Deadlock criterion: a cycle (also here)

Error Classes – Deadlocks (4)
• What about freedom in the semantics?
  – Collectives may or may not be synchronizing
  – A standard mode send may (or may not) be buffered
• Example 3:
  Task 0: MPI_Send (to:1);    Task 1: MPI_Send (to:0);
• This is a deadlock!
  – These are called "potential" deadlocks
  – They can manifest for some MPI implementations and/or message sizes
• Visualization: 0: MPI_Send -> 1: MPI_Send and 1: MPI_Send -> 0: MPI_Send
Error Classes – Deadlocks (5)
• What about timely interleaving?
  – Non-deterministic applications
  – The interleaving determines which calls match or are issued
  – This causes bugs that only occur "sometimes"
• Example 4:
  Task 0: MPI_Send (to:1);  MPI_Barrier ();
  Task 1: MPI_Recv (from:ANY);  MPI_Recv (from:2);  MPI_Barrier ();
  Task 2: MPI_Send (to:1);  MPI_Barrier ();
• What happens:
  – Case A: the MPI_Recv (from:ANY) matches the send from task 0 => all calls complete
  – Case B: the MPI_Recv (from:ANY) matches the send from task 2 => deadlock

Error Classes – Deadlocks (6)
• What about "any" and "some"?
  – MPI_Waitany/MPI_Waitsome and wild-card (MPI_ANY_SOURCE) receives have special semantics
  – These wait for at least one out of a set of ranks
  – This differs from the "waits for all" semantics
• Example 5:
  Task 0: MPI_Recv (from:1);    Task 1: MPI_Recv (from:ANY);    Task 2: MPI_Recv (from:1);
• What happens: no call can progress — deadlock
  – 0 waits for 1; 1 waits for either 0 or 2; 2 waits for 1

Error Classes – Deadlocks (7)
• How to visualize the "any/some" semantics?
  – The "waits for all of" wait type => "AND" semantics
  – The "waits for any of" wait type => "OR" semantics
  – Each type gets its own kind of arc: AND: solid arcs; OR: dashed arcs
• Visualization for Example 5:
  0: MPI_Recv -> 1 (solid);  1: MPI_Recv -> 0 and -> 2 (dashed);  2: MPI_Recv -> 1 (solid)

Error Classes – Deadlocks (8)
• Deadlock criterion for AND + OR:
  – Cycles are necessary, but not sufficient
  – A weakened form of a knot (an OR-knot) is the actual criterion
  – Tools can detect it and visualize the core of the deadlock
• Some examples (figures from the slides, shown here as descriptions):
  – An OR-knot that is also a knot (ranks 0, 1, 2): deadlock
  – A cycle but no OR-knot (ranks 0, 1, 2): not deadlocked
  – An OR-knot that is not a knot (ranks 0, 1, 2, 3): deadlock

Error Classes – Semantic Errors
• Description: erroneous sizes, counts, or displacements, e.g. during datatype construction or communication
• Often "off-by-one" errors
• Example (C) — the stride must be 10 to select a column of a 10x10 array:
  /* Create a datatype to send a column of a 10x10 array */
  MPI_Type_vector (
      10,       /* #blocks */
      1,        /* #elements per block */
      9,        /* #stride -- off by one, must be 10 */
      MPI_INT,  /* old type */
      &newType  /* new type */
  );

Avoiding Errors
• The bugs you don't introduce are the best ones:
  – Think, don't hack
  – Comment your code
  – Confirm consistency with asserts
  – Consider a verbose mode for your application
  – Use unit testing, or at least provide test cases
  – Set up nightly builds
• MPI Testing Tool: http://www.open-mpi.org/projects/mtt/
• CTest & dashboards: http://www.vtk.org/Wiki/CMake_Testing_With_CTest

Tool Overview – Approaches and Techniques
• Debuggers:
  – Helpful to pinpoint any error
  – Finding the root cause may be hard
  – Won't detect sleeping errors
  – E.g.: gdb, TotalView, Allinea DDT
• Static analysis:
  – Compilers and source analyzers
  – Typically find type and expression errors
  – E.g.: MPI-Check
  – Example: MPI_Recv (buf, 5, MPI_INT, -1, 123, MPI_COMM_WORLD, &status);
    "-1" used instead of "MPI_ANY_SOURCE"
• Model checking:
  – Requires a model of your application
  – State explosion is possible
  – E.g.: MPI-Spin
  – Example: if (rank == 1023) crash ();
    Only fails with 1024 tasks or more

Tool Overview – Approaches and Techniques (2)
• Runtime error detection:
  – Inspects MPI calls at runtime
  – Limited to the timely interleaving that is observed
  – Causes overhead during the application run
  – E.g.: Intel Trace Analyzer, Umpire, Marmot, MUST
  – Example: Task 0: MPI_Send (to:1, type=MPI_INT);  Task 1: MPI_Recv (from:0, type=MPI_FLOAT);
    Type mismatch

Tool Overview – Approaches and Techniques (3)
• Formal verification:
  – An extension of runtime error detection
  – Explores all relevant interleavings (explores around non-determinism)
  – Detects errors that only manifest in some runs
  – Possibly many interleavings to explore
  – E.g.: ISP
• Example:
  Task 0: spend_some_time();  MPI_Send (to:1);  MPI_Barrier ();
  Task 1: MPI_Recv (from:ANY);  MPI_Recv (from:0);  MPI_Barrier ();
  Task 2: MPI_Send (to:1);  MPI_Barrier ();
  Deadlock if MPI_Send(to:1)@0 matches MPI_Recv(from:ANY)@1

Approaches to Remove Bugs (Selection)
• (Overview figure: approaches arranged by challenge — repartitioning? node memory? representative input? reproducibility? grid? — covering runtime checking (Umpire, ISP/DAMPI, Bug@Scale), debuggers, static code analysis (e.g. barrier analysis), model checking (TASS, pCFG's), and downscaling; our contribution is in runtime checking)

Content
• MPI Usage Errors
• Error Classes
• Avoiding Errors
• Correctness Tools
• Runtime Error Detection
• MUST
• Hands On

Runtime Error Detection
• An MPI wrapper library intercepts all MPI calls:
  – The application calls MPI_X(...)
  – The correctness tool runs its checks, then calls PMPI_X(...) in the MPI library
• Checks analyse the intercepted calls:
  – Local checks require data from just one task
  – E.g.: invalid arguments, resource usage errors