Combining Symbolic Execution and Model Checking to Verify MPI Programs
Hengbiao Yu1∗, Zhenbang Chen1∗, Xianjin Fu1,2, Ji Wang1,2∗, Zhendong Su3, Jun Sun4, Chun Huang1, Wei Dong1
1 College of Computer, National University of Defense Technology, Changsha, China
2 State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha, China
3 Department of Computer Science, ETH Zurich, Switzerland
4 School of Information Systems, Singapore Management University, Singapore
{hengbiaoyu,zbchen,wj}@nudt.edu.cn,[email protected],[email protected],[email protected]

∗ The first two authors contributed equally to this work and are co-first authors. Zhenbang Chen and Ji Wang are the corresponding authors.

arXiv:1803.06300v2 [cs.PL] 17 Jan 2020

ABSTRACT
Message passing is the standard paradigm of programming in high-performance computing. However, verifying Message Passing Interface (MPI) programs is challenging, due to complex program features such as non-determinism and non-blocking operations. In this work, we present MPI symbolic verifier (MPI-SV), the first symbolic execution based tool for automatically verifying MPI programs with non-blocking operations. MPI-SV combines symbolic execution and model checking in a synergistic way to tackle the challenges in MPI program verification. The synergy improves the scalability and enlarges the scope of verifiable properties. We have implemented MPI-SV (available at https://mpi-sv.github.io) and evaluated it with 111 real-world MPI verification tasks. The pure symbolic execution-based technique successfully verifies 61 out of the 111 tasks (55%) within one hour, while in comparison, MPI-SV verifies 100 tasks (90%). On average, compared with pure symbolic execution, MPI-SV achieves 19x speedups on verifying the satisfaction of the critical property and 5x speedups on finding violations.

CCS CONCEPTS
• Software and its engineering → Software verification and validation;

KEYWORDS
Symbolic Verification; Symbolic Execution; Model Checking; Message Passing Interface; Synergy

ACM Reference Format:
Hengbiao Yu, Zhenbang Chen, Xianjin Fu, Ji Wang, Zhendong Su, Jun Sun, Chun Huang, and Wei Dong. 2020. Combining Symbolic Execution and Model Checking to Verify MPI Programs. In ICSE '20: 42nd International Conference on Software Engineering, May 23-29, 2020, Seoul, South Korea. ACM, New York, NY, USA, 17 pages. https://doi.org/XX.XXXX/XXXXXXX.XXXXXXX

© 2020 Association for Computing Machinery.

1 INTRODUCTION
Nowadays, an increasing number of high-performance computing (HPC) applications have been developed to solve large-scale problems [11]. The Message Passing Interface (MPI) [78] is the current de facto standard programming paradigm for developing HPC applications. Many MPI programs are developed with significant human effort. One of the reasons is that MPI programs are error-prone because of complex program features (such as non-determinism and asynchrony) and their scale. Improving the reliability of MPI programs is challenging [29, 30].

Program analysis [64] is an effective technique for improving program reliability. Existing methods for analyzing MPI programs can be categorized into dynamic and static approaches. Most existing methods are dynamic, such as debugging [51], correctness checking [71] and dynamic verification [83]. These methods need concrete inputs to run MPI programs and perform analysis based on runtime information. Hence, dynamic approaches may miss input-related program errors. Static approaches [5, 9, 55, 74] analyze abstract models of MPI programs and suffer from false alarms, manual effort, and poor scalability. To the best of our knowledge, existing automated verification approaches for MPI programs either do not support input-related analysis or fail to support the analysis of MPI programs with non-blocking operations, the invocations of which do not block the program execution. Non-blocking operations are ubiquitous in real-world MPI programs because they improve performance, but they make programming more complex.

Symbolic execution [27, 48] supports input-related analysis by systematically exploring a program's path space. In principle, symbolic execution provides a balance between concrete execution and static abstraction with improved input coverage or more precise program abstraction. However, symbolic execution based analyses suffer from path explosion due to the exponential increase of program paths w.r.t. the number of conditional statements. The problem is particularly severe when analyzing MPI programs because of parallel execution and non-deterministic operations. Existing symbolic execution based verification approaches [25, 77] do not support non-blocking MPI operations.

In this work, we present MPI-SV, a novel verifier for MPI programs that integrates symbolic execution and model checking.
As far as we know, MPI-SV is the first automated verifier that supports non-blocking MPI programs and LTL [58] property verification. MPI-SV uses symbolic execution to extract path-level models from MPI programs and verifies the models w.r.t. the expected properties by model checking [17]. The two techniques complement each other: (1) symbolic execution abstracts the control and data dependencies to generate verifiable models for model checking, and (2) model checking improves the scalability of symbolic execution by leveraging the verification results to prune redundant paths and enlarges the scope of verifiable properties of symbolic execution.

In particular, MPI-SV combines two algorithms: (1) symbolic execution of non-blocking MPI programs with non-deterministic operations, and (2) modeling and checking the behaviors of an MPI program path precisely. To safely handle non-deterministic operations, the first algorithm delays the message matchings of non-deterministic operations as much as possible. The second algorithm extracts a model from an MPI program path. The model represents all the path's equivalent behaviors, i.e., the paths generated by changing the interleavings and matchings of the communication operations in the path. We have proved that our modeling algorithm is precise and consistent with the MPI standard [24]. We feed the generated models from the second algorithm into a model checker to perform verification w.r.t. the expected properties, i.e., safety and liveness properties in linear temporal logic (LTL) [58]. If the extracted model from a path $p$ satisfies the property $\varphi$, $p$'s equivalent paths can be safely pruned; otherwise, if the model checker reports a counterexample, a violation of $\varphi$ is found. This way, we significantly boost the performance of symbolic execution by pruning a large set of paths that are equivalent to paths that have already been model-checked.

We have implemented MPI-SV for MPI C programs based on Cloud9 [10] and PAT [80]. We have used MPI-SV to analyze 12 real-world MPI programs, totaling 47K lines of code (LOC) (three are beyond the scale that the state-of-the-art MPI verification tools can handle), w.r.t. the deadlock freedom property and non-reachability properties. For the 111 deadlock freedom verification tasks, when we set the time threshold to be an hour, MPI-SV can complete 100 tasks, i.e., deadlock reported or deadlock freedom verified, while pure symbolic execution can complete 61 tasks.

    Proc ::= var r : T | r := e | Comm | Proc ; Proc |
             if e Proc else Proc | while e do Proc
    Comm ::= Ssend(e) | Send(e) | Recv(e) | Recv(*) | Barrier |
             ISend(e,r) | IRecv(e,r) | IRecv(*,r) | Wait(r)

Figure 1: Syntax of a core MPI language.

2 ILLUSTRATION
In this section, we first introduce MPI programs and use an example to illustrate the problem that this work targets. Then, we give an informal overview of MPI-SV using the example.

2.1 MPI Syntax and Motivating Example
MPI implementations, such as MPICH [31] and OpenMPI [26], provide the programming interfaces of message passing to support the development of parallel applications. An MPI program can be implemented in different languages, such as C and C++. Without loss of generality, we focus on MPI programs written in C. Let $\mathbb{T}$ be a set of types, $\mathbb{N}$ a set of names, and $\mathbb{E}$ a set of expressions. To simplify the discussion, we define a core language for MPI processes in Figure 1, where $T \in \mathbb{T}$, $r \in \mathbb{N}$, and $e \in \mathbb{E}$. An MPI program $\mathcal{MP}$ is defined by a finite set of processes $\{Proc_i \mid 0 \le i \le n\}$. For brevity, we omit complex language features (such as the messages in the communication operations and pointer operations), although MPI-SV does support real-world MPI C programs.

The statement var r : T declares a variable r with type T. The statement r := e assigns the value of expression e to variable r. A process can be constructed from basic statements by using the composition operations including sequence, condition and loop. For brevity, we incorporate the key message passing operations in the syntax, where e indicates the destination process's identifier. These message passing operations can be blocking or non-blocking. First, we introduce blocking operations:
• Ssend(e): sends a message to the $e$-th process, and the sending process blocks until the message is received by the destination