OpenMP 4.5 Validation and Verification Testsuite
OPENMP 4.5 VALIDATION AND VERIFICATION TESTSUITE DESIGN AND IMPLEMENTATION FOR OFFLOADING FEATURES

by

Jose Manuel Monsalve Diaz

A Master Thesis submitted to the Faculty of the University of Delaware in partial fulfillment of the requirements for the degree of Master of Science in Electrical and Computer Engineering

Winter 2020

© 2020 Jose Manuel Monsalve Diaz
All Rights Reserved

Approved: Sunita Chandrasekaran, Ph.D., Professor in charge of Master Thesis on behalf of the Advisory Committee
Approved: Guang R. Gao, Ph.D., Co-Professor in charge of Master Thesis on behalf of the Advisory Committee
Approved: Kenneth E. Barner, Ph.D., Chair of the Department of Electrical and Computer Engineering
Approved: Levi Thompson, Ph.D., Dean of the College of Engineering
Approved: Douglas J. Doren, Ph.D., Interim Vice Provost for Graduate and Professional Education and Dean of the Graduate College

ACKNOWLEDGMENTS

This material is based upon work supported by the U.S. Department of Energy, Office of Science, the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration under contract number DE-AC05-00OR22725.

This project is a joint effort between several laboratories of the Department of Energy and the University of Delaware. In particular, Oak Ridge National Laboratory and Argonne National Laboratory are both large contributors to this project.

This project has been the effort of many people, who are acknowledged on our website for their contributions. While my contribution has been considerable, there is also a large group of people involved in this project whose contributions are invaluable. In particular, Dr. Swaroop Pophale (ORNL), Dr. Oscar Hernandez (ORNL), Dr. David E. Bernholdt (ORNL), Dr.
Hal Finkel (ANL), and Professor Sunita Chandrasekaran (UD) are major leaders of this project, which is part of the SOLLVE initiative of the Exascale Computing Project. Additionally, Sergio Pino, MS, as well as undergraduate students Joshua Davis, Kyle Friedline, and Thomas Huber have contributed heavily to this project, and they have created many of the tests presented in this work.

Special recognition goes to external collaborators, including many members of the OpenMP ARB, application developers who have submitted tests and ideas, and vendors who have contributed hardware donations. In particular, special recognition goes to AMD and NVIDIA for their donations of GPGPU accelerators, which we have used to run this testsuite. Also, special thanks to all the vendors who have provided us with feedback through their use of this software.

To my parents, my brother, my wife and all my family. They are who I would like to be one day.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF LISTINGS
ABSTRACT

Chapter
1 INTRODUCTION
2 OBJECTIVES AND PROBLEM FORMULATION
3 BACKGROUND AND MOTIVATION
  3.1 A brief history and overview of OpenMP
  3.2 OpenMP Accelerators Offloading support
  3.3 Implementation of OpenMP offloading support in compilers
    3.3.1 Challenges for compilers
    3.3.2 Translating code for GPGPU device offloading
      3.3.2.1 Dynamic Parallelism
      3.3.2.2 If-Master coordination
      3.3.2.3 Executor/Inspector
  3.4 Compilers supporting OpenMP Offloading
    3.4.1 GNU GCC:
    3.4.2 LLVM:
    3.4.3 Other Vendors and Devices
4 RELATED WORK
5 TEST SUITE INFRASTRUCTURE DESIGN
  5.1 Test design and review process
  5.2 Infrastructure
    5.2.1 Development environment and website
    5.2.2 Folder Structure
    5.2.3 Tests structure, Header file and Fortran Module
    5.2.4 Makefile
      5.2.4.1 Rules for Makefile
      5.2.4.2 Options for Makefile
        5.2.4.2.1 CC, CXX, and FC: compiler selection
        5.2.4.2.2 OMP_VERSION
        5.2.4.2.3 SOURCES
        5.2.4.2.4 TESTS_TO_RUN
        5.2.4.2.5 VERBOSE and VERBOSE_TESTS
        5.2.4.2.6 LOG and LOG_ALL
        5.2.4.2.7 LOG_DIR and BIN_DIR
        5.2.4.2.8 SYSTEM, MODULE_LOAD and ADD_BATCH_SCHED
        5.2.4.2.9 NO_OFFLOADING
        5.2.4.2.10 REPORT_ONLINE_TAG and REPORT_ONLINE_APPEND
  5.3 System customization
  5.4 Results, Logs and Reports
    5.4.1 Raw format
    5.4.2 Summary Report
    5.4.3 JSON format
    5.4.4 CSV Format
    5.4.5 HTML format
    5.4.6 Online Report
  5.5 Online Result report tool
    5.5.1 The create tag operation
    5.5.2 The obtain result operation
    5.5.3 The delete result operation
    5.5.4 The update result operation
    5.5.5 The append result operation
  5.6 Measuring overhead in current OpenMP offloading implementations
6 TEST EXAMPLES
  6.1 Offloading to multiple devices
  6.2 Handling task dependencies
  6.3 Mapping C++ features
  6.4 Mapping linked-list to device
  6.5 Matrix Multiplication
7 FINDINGS AND RESULTS OF THIS WORK
  7.1 Test bed configurations
  7.2 Specification findings
  7.3 Testsuite results
    7.3.1 Summary of used systems, compilers and compiler versions
    7.3.2 Programming language evolution
    7.3.3 Compiler version results, errors and evolution
      7.3.3.1 AOMP
      7.3.3.2 IBM XL
      7.3.3.3 GNU GCC
      7.3.3.4 LLVM/Clang
  7.4 Overhead
    7.4.1 Offloading on Summit
    7.4.2 Offloading on Fatnode
    7.4.3 Combined Constructs
    7.4.4 The effect of number of teams and number of threads
8 CONCLUSIONS AND FUTURE WORK
  8.1 The future of OpenMP
REFERENCES

LIST OF TABLES

5.1 List of operations supported by the OMPVV header files for test formatting
5.2 Set of rules available in the Makefile
5.3 List of supported out-of-the-box compilers with the used flags for each
7.1 List of used compiler vendors, languages and their versions
7.2 List of compilers per system, compiler versions and used flags

LIST OF FIGURES

1.1 Top 500 List: number of systems reported in the list using accelerators (left axis), and evolution of the average number of cores per socket (yellow line and right axis)
1.2 Overlapping the evolution of OpenMP and OpenACC directive-based programming model specification releases with figure 1.1
1.3
Counting elements per OpenMP Specification Document: number of pages in the PDF, number of constructs and directives defined, number of API functions defined, and number of environment variables defined.
3.1 OpenMP 4.0+ Execution model.
5.1 Workflow for developing the Validation and Verification Suite
5.2 Project folder tree structure.
5.3 Snapshot of the HTML report generated by the testsuite
5.4 Diagram of the Online Report Infrastructure
6.1 Task graph created by Code 6.2
6.2 UML Diagram of the example in listings 6.3, 6.4, and 6.5
7.1 Testbed systems. The different execution environments used in this study
7.2 Percentage of Pass and Fail in all results per programming language
7.3 AOMP Version evolution
7.4 IBM XL Version evolution
7.5 GNU GCC Version Evolution
7.6 LLVM/Clang Version evolution. YTK corresponds to the CORAL Clang Compiler.
7.7 Overhead measurement for offloading directives on Fatnode cluster
7.8 Overhead measurement for offloading directives on Summit
7.9 Comparing combined directives vs. nesting of those OpenMP directives
7.10 target teams distribute varying the number of teams. Effects of the number of teams on the overhead of the runtime on multiple compilers and systems
7.11 Effects of the number of teams and number of threads on the overhead of the runtime on multiple compilers and systems