University Degree in Computer Science and Engineering
Academic Year (e.g. 2018-2019)

Bachelor Thesis “Visualization tool development and malleable applications planning”

Federico Goldfryd Sprukt
David Expósito Singh

Abstract

High performance computing is a field of increasing relevance both for the industrial and corporate sectors and for academia. It speeds up research in many fields by running applications that would need years of execution on a conventional computer. For that reason, access to supercomputers and computing clusters has been growing over the years, while governments and organizations support initiatives for the development of new supercomputers with exponentially more power that can significantly accelerate this research and development. The power of these machines comes from the high degree of parallelization they offer: the more powerful the system, the higher the number of parallel nodes working at the same time. Given the economic and time costs of executing these applications, performance is a key concern in supercomputing. To improve it, many monitoring applications have been developed. These tools gather the data generated during the execution of highly parallel applications in order to maximize their efficiency, and the amount of data generated grows with the power of the computers used. FlexMPI, a project of University Carlos III de Madrid, offers a tool for running parallel applications and monitoring them. Using FlexMPI, applications can be modified at execution time, changing the number of processes they run or moving them from one computing node to another. Until now, FlexMPI has been a command line tool, accessible only through a terminal.
This project focuses on the development of a graphical user interface to interact with the system from a remote machine, making it possible to visualize the system and control the execution of the applications in real time: a simple graphical interface that allows the computing cluster and the applications running under FlexMPI to be controlled with simple buttons. In addition to the visualization and control features, the GUI will issue notifications when problems are detected in the supercomputer, monitoring both the applications and the computing nodes where they run. All the collected data will be stored persistently so that it can be analysed later, and the tool will also offer the option of automating the actions sent to the applications by analysing the real-time data received from the cluster.


Table of Contents

1. INTRODUCTION

1.1. MOTIVATION
1.2. RESEARCH GROUP CONTEXT
1.2.1. SOFTWARE CONTEXT
1.2.2. HARDWARE CONTEXT: TUCAN
1.3. PROJECT OBJECTIVES
1.4. REPORT STRUCTURE

2. STATE OF THE ART

2.1. CURRENT CHALLENGES ON SUPERCOMPUTING
2.2. APPLICATIONS MONITORING TOOLS
2.2.1. ARM MAP
2.2.2. AWS CLOUDWATCH
2.2.3. GANGLIA
2.2.4. HPC TOOLKIT
2.2.5. INTEL VTUNE
2.2.6. PARAVER AND EXTRAE
2.2.7. PERISCOPE
2.2.8. SCALASCA
2.3. APPLICATION RUNNING ENVIRONMENT IN CLUSTERS
2.3.1. SLURM
2.3.2. TORQUE
2.3.3. IBM SPECTRUM LSF
2.3.4. UNIVA GRID ENGINE
2.4. FLEXMPI TOOL

3. ENVIRONMENT DESCRIPTION

3.1. DEVELOPMENT ENVIRONMENT
3.1.1. GUI
3.1.2. CONTROLLER
3.1.3. COMMON ENVIRONMENT
3.2. SOCIO-ECONOMIC ENVIRONMENT

4. DESCRIPTION OF THE PROPOSED ARCHITECTURE

4.1. OVERVIEW
4.2. COMPONENTS
4.2.1. GUI
4.2.2. CONTROLLER
4.2.3. APPLICATION
4.3. INTERFACES DEFINITION
4.3.1. CONTROLLER-GUI CONNECTION
4.3.2. APPLICATION REGISTRATION
4.3.3. MONITOR REGISTRATION
4.3.4. COMMAND SENDING
4.3.5. APPLICATION METRICS
4.3.6. CONTENTION NOTIFICATION
4.4. REQUIREMENTS ANALYSIS
4.4.1. FUNCTIONAL REQUIREMENTS
4.4.2. NON-FUNCTIONAL REQUIREMENTS
4.5. SYSTEM DESIGN
4.5.1. CONTAINER CLASSES
4.5.2. BUTTONS HANDLERS AND THREADS
4.5.3. GUI VIEWS DESIGN
4.6. PROJECT PLANNING
4.6.1. PHASE 1
4.6.2. PHASE 2
4.7. BUDGET
4.7.1. STAFF COSTS
4.7.2. HARDWARE COSTS
4.7.3. SOFTWARE COSTS
4.7.4. TOTAL BUDGET

5. EVALUATION

5.1. PLATFORM DESCRIPTION
5.2. TRACEABILITY MATRIX
5.3. FUNCTIONAL TESTS
5.4. PERFORMANCE TESTS
5.4.1. CONTROLLER PERFORMANCE TESTS
5.4.2. GUI PERFORMANCE TESTS
5.4.3. APPLICATIONS PERFORMANCE TESTS

6. CONCLUSIONS AND FUTURE WORK

6.1. CONCLUSIONS
6.2. FUTURE WORK

7. REFERENCES

APPENDIX A: USER MANUAL

FIRST STEPS
STEP 1: INSTALL AND SETUP
STEP 2: RUN
STEP 3: CONNECT CONTROLLER
USING THE GUI
MAIN WINDOW
COMMANDS PANEL
STATISTICS VIEW
NODES VIEW AND MANAGEMENT

Figures Reference

Figure 1: Arm MAP
Figure 2: GUI of Arm MAP
Figure 3: AWS CloudWatch
Figure 4: AWS Capture GUI
Figure 5: Ganglia
Figure 6: Ganglia GUI
Figure 7: HPCToolkit
Figure 8: HPCToolkit components
Figure 9: HPCToolkit TraceViewer GUI
Figure 10: Intel VTune GUI
Figure 11: Paraver GUI Views
Figure 12: Periscope
Figure 13: Scalasca Report Explorer
Figure 14: Slurm Workload Manager
Figure 15: Slurm Architecture Diagram
Figure 16: Entities in Slurm
Figure 17: Spectrum LSF Architecture
Figure 18: IBM Spectrum LSF Security Model
Figure 19: Univa Grid Engine
Figure 20: FlexMPI application running environment
Figure 21: Workflow related to FlexMPI
Figure 22: HPC Performance Development
Figure 23: HPC user base growth
Figure 24: Diagram of FlexMPI components
Figure 25: Overview of the threads of the system
Figure 26: Flow diagram of controller-GUI connection
Figure 27: Flow diagram of application registration
Figure 28: Flow diagram of contention monitor thread connection
Figure 29: Flow diagram of command sending
Figure 30: Flow diagram of metrics retrieval process
Figure 31: Flow diagram of contention message received
Figure 32: Flow diagram of solving contention problem
Figure 33: Relationship between nodes, apps and processes
Figure 34: Class diagrams of container classes
Figure 35: Threads and handlers class diagram
Figure 36: Main GUI panel
Figure 37: GUI nodes grid view
Figure 38: Node information view
Figure 39: Application metrics visualization window in GUI
Figure 40: Application commands view
Figure 41: Gantt chart for first part of Phase 1
Figure 42: Gantt chart for second part of Phase 1
Figure 43: Gantt chart for first part of Phase 2
Figure 44: Gantt chart for second part of Phase 2
Figure 45: Traceability Matrix
Figure 46: Evolution of memory usage of the controller
Figure 47: Evolution of the CPU usage of the GUI
Figure 48: Evolution of the memory usage of the GUI
Figure 49: Evolution of the execution time for an application
Figure 50: FlexMPI GUI directory example
Figure 51: FlexMPI GUI main window after launch
Figure 52: FlexMPI GUI main window components
Figure 53: Commands panel components
Figure 54: Commands panel with statistics service active
Figure 55: Application processes and statistics messages view
Figure 56: Metrics plot panel
Figure 57: Nodes view panel
Figure 58: Nodes view panel with contention alert
Figure 59: Node information panel

Tables Reference

Table 1: Requirement Definition Template
Table 2: Functional Requirement FR-01
Table 3: Functional Requirement FR-02
Table 4: Functional Requirement FR-03
Table 5: Functional Requirement FR-04
Table 6: Functional Requirement FR-05
Table 7: Functional Requirement FR-06
Table 8: Functional Requirement FR-07
Table 9: Functional Requirement FR-08
Table 10: Functional Requirement FR-09
Table 11: Functional Requirement FR-10
Table 12: Functional Requirement FR-11
Table 13: Functional Requirement FR-12
Table 14: Functional Requirement FR-13
Table 15: Functional Requirement FR-14
Table 16: Functional Requirement FR-15
Table 17: Functional Requirement FR-16
Table 18: Functional Requirement FR-17
Table 19: Functional Requirement FR-18
Table 20: Functional Requirement FR-19
Table 21: Non-Functional Requirement NFR-01
Table 22: Non-Functional Requirement NFR-02
Table 23: Non-Functional Requirement NFR-03
Table 24: Non-Functional Requirement NFR-04
Table 25: Non-Functional Requirement NFR-05
Table 26: Salaries description
Table 27: Total costs description
Table 28: Hardware costs breakdown
Table 29: Software costs breakdown
Table 30: Functional Test definition template
Table 31: Functional Test 01
Table 32: Functional Test 02
Table 33: Functional Test 03
Table 34: Functional Test 04
Table 35: Functional Test 05
Table 36: Functional Test 06
Table 37: Functional Test 07
Table 38: Functional Test 08
Table 39: Functional Test 09
Table 40: Functional Test 10
Table 41: Functional Test 11
Table 42: Functional Test 12
Table 43: Functional Test 13
Table 44: Functional Test 14
Table 45: Functional Test 15
Table 46: Functional Test 16
Table 47: Functional Test 17
Table 48: Functional Test 18
Table 49: Functional Test 19
Table 50: Functional Test 20
Table 51: Functional Test 21
Table 52: Functional Test 22
Table 53: Functional Test 23
Table 54: Functional Test 24
Table 55: Functional Test 25
Table 56: Functional Test 26
Table 57: Functional Test 27
Table 58: Functional Test 28

1. Introduction

This chapter serves as an introduction to the final degree project. It explains the motivation behind the project, its context and its objectives, and details the structure of the report.

1.1. Motivation

High Performance Computing [1], commonly referred to as HPC, is a computing technique that joins a vast number of independent computers in order to achieve much higher performance than a typical desktop computer or workstation. The result of joining all these computers is called a supercomputer. The main advantage of having all these independent computing nodes connected as a single supercomputer is the high level of parallelization they can achieve, which makes it possible to execute specialized programs that solve complex tasks. All these nodes run processes that need to communicate with each other to exchange the necessary data. Given this decentralized design, data is spread across the nodes, and establishing these communications is a complex task. The result is complex machines with high operating costs, used mainly for commercial and research projects where time and economic costs are especially relevant. For that reason, getting the best possible performance is highly valuable, as it allows significant cuts in execution time and, as a result, in cost too.

The way to increase the power of these machines is mainly to increase the number of nodes, since there is a limit to the power that each individual node can reach. The race to build more powerful supercomputers is a hot topic nowadays, and with the increase in complexity there are many problems to face in order to scale these systems. The applications executed on HPC systems are monitored constantly to ensure the most efficient use of these expensive resources, and with bigger systems the amount of information to be processed increases substantially.
Since processing that vast amount of information is one of the main challenges for scaling and building the next generations of supercomputers, this final degree project focuses on the development of a tool that processes the generated data and offers visualization features to detect possible performance degradation during execution, as well as ways to solve these problems both manually and automatically.

1.2. Research group context

1.2.1. Software context

MPI (Message Passing Interface) is a communication protocol used for parallel-computing programming [2]. It is a specification for a standard message-passing library that was defined by a group of academic researchers, who also implemented it in MPICH. It is a language-independent protocol supporting both point-to-point and collective communications, its two major implementations being the aforementioned MPICH and Open MPI [3]. More implementations exist, but most of them are derivatives of these two. Focused on high-performing, scalable and portable applications, MPI has become the de facto industry standard for communication on distributed-memory architectures such as clusters and supercomputers. Most MPI implementations consist of a specific set of subroutines (an API) directly callable from C, C++ and any language able to interface with these libraries, such as Python. The advantages of MPI over older message-passing libraries come from fulfilling the objectives mentioned before: implementations exist for almost every distributed-memory architecture, which allows high portability, together with optimizations specific to the hardware on which each implementation runs.
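To make the point-to-point API described above concrete, the following is a minimal sketch of an MPI program in C. It assumes an MPI implementation such as MPICH or Open MPI is installed (compiled with `mpicc`, launched with `mpirun`); the message value is arbitrary.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's identifier */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

    if (rank == 0) {
        /* Point-to-point communication: rank 0 sends one integer
           to every other rank in the communicator. */
        for (int dest = 1; dest < size; dest++) {
            int payload = 42;
            MPI_Send(&payload, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
        }
    } else {
        int payload;
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank %d received %d\n", rank, payload);
    }

    MPI_Finalize();
    return 0;
}
```

Built as `mpicc hello.c -o hello` and run as `mpirun -np 4 ./hello`, each non-zero rank prints the value it received from rank 0. The same source runs unchanged on a laptop or a cluster, which illustrates the portability objective mentioned above.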

1.2.2. Hardware context: Tucan

Tucan is the name of the cluster used for Computer Science research at UC3M. It is operated by the computer architecture group, ARCOS, and offers HPC services to the university departments that need them. It is made up of 32 computing nodes with different characteristics, all of them running Ubuntu. Although all the nodes are powered by Intel CPUs, not all of them use the same model. Depending on the goal of the program to be executed, the cluster offers nodes with between 4 and 64 cores. Regarding memory, there is also a variety of options to cover the different needs a user may have, with nodes ranging from 8 GB up to 378 GB. Regarding storage, there are options with different capacities and speeds, either HDD or SSD. Finally, some nodes also have graphics cards to be used in artificial intelligence or graphics-computing applications.

1.3. Project objectives

This section states the different goals to be achieved during the development of this project:

• Integration with FlexMPI controller: Implement in the controller the changes necessary to integrate it with the GUI and to support the GUI functionalities.

• Statistics visualization: Implement a GUI that allows the user to connect to FlexMPI and visualize the metrics generated by the applications in real-time plots.

• FlexMPI control with graphic components: Allow users to interact with FlexMPI through simple-to-use GUI components: buttons, selectors or text boxes where necessary.

• Node contention monitoring and automatic solving: Show information about the status of the nodes and raise an alert if there is a contention problem in any of them. Aside from showing these problems, offer a way to solve them manually. Also enable the GUI to manage contention solving automatically, improving performance without user intervention.

• Persistence of metrics: Store the metrics received from the applications persistently to enable later analysis of the performance results.

1.4. Report structure

Here, each of the sections that make up this report is listed with a short description, to improve readability:

• Introduction: Overview of the project, exposing the motivation behind it, the context in which it was developed and the goals to achieve.

• State of the art: Analysis of the challenges that the supercomputing environment is currently facing, as well as the software currently used for monitoring HPC applications and for running them in clusters.

• Environment description: Description of the software and hardware environment in which the project and each one of the components has been developed. Also, the socioeconomic environment for the project is described here.

• Description of the proposed architecture: This section describes in depth the design of each component of the system, as well as their interactions within the larger system. The requirements that the system had to meet are also described here, together with the planning followed to implement the design and the project budget.

• Evaluation: Description of the different tests that have been carried out to ensure the system works properly and meets all the requirements.

• Conclusions and future work: Explanation of the conclusions drawn from the project, as well as an analysis of possible future improvements and lines of work that could follow this project.

• Appendix. User Manual: Guide on how to use the GUI, integrate it with FlexMPI and control the applications through the GUI options.

2. State of the Art

In this section, the current state of the art in supercomputing and HPC is described, looking at the challenges the industry faces as the technology develops. Different alternatives for performance monitoring and for application running environments are also analysed.

2.1. Current challenges on supercomputing

The main challenge in supercomputing is achieving ever more computing power. Currently, the objective is to reach pre-exascale systems by the end of the decade, and the exascale generation of supercomputers in the first years of the next one. This will mean supercomputers that can perform over one exaFLOPS (10^18 operations per second). Different organizations are investing in the development of the exascale generation, such as the European Union, the United States and China. The U.S. Department of Energy has identified the following challenges for this development [4]:

• Extreme parallelism. Clock speeds are no longer evolving as fast as they used to, which means that the majority of the performance gains for exascale HPC systems will come from improvements in concurrency. Going from petascale to exascale means an improvement of 1,000 times, so parallelism is expected to increase by close to that factor. Exascale systems are predicted to have billion-way concurrency via a combination of tasks, threads and vectorization, and more than one hundred thousand nodes, which is a big challenge for the design of both the systems and the applications that will run on them.

• Data movement in a deep memory hierarchy. Data movement has been identified as one of the main bottlenecks, both for performance and for reducing power consumption. To solve these problems, new systems are being designed with more and more types and layers of memory. Another challenge in this area will be increasing data locality and reuse in running applications, which will also mean less data movement.

• Resilience. As the number of components in a system grows, the resilience of the hardware decreases because there are more places where a failure can appear. To manage this issue, resilience also has to be implemented in software. The goal is to have systems that can adapt to any possible hardware failure without a global failure of the system. The availability of technologies such as non-volatile memory is helping to reduce the impact a failure can cause, but there are still problems to be solved around this issue.

• Power consumption. Given the size and performance of exascale systems, power consumption was expected to be excessively high, with some predictions reaching gigawatts [5]. These predictions made power the main concern in the design and planning of the systems, which led to an aggressive power-consumption goal of 20-30 MW, not much more than the power consumed by the largest systems of today. Meeting this goal will require the development of power monitoring and management software that does not exist yet.
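To put the 20-30 MW goal in perspective, a back-of-the-envelope calculation (using only the figures quoted above) shows the energy efficiency an exascale machine would need to reach:

```python
exaflops = 1e18       # 1 exaFLOPS, in operations per second
power_low_w = 20e6    # lower end of the 20-30 MW power goal, in watts

# Operations per joule = operations per second divided by watts.
ops_per_joule = exaflops / power_low_w
print(f"{ops_per_joule / 1e9:.0f} GFLOPS per watt")  # → 50 GFLOPS per watt
```

That is, an exaFLOPS at 20 MW requires roughly 50 billion operations per joule, which motivates the power-management software mentioned above.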

Although these are the challenges imposed by the scale of exascale computers, there are additional goals that some actors are trying to achieve with this new generation of supercomputers:

• Productivity: Traditionally, supercomputer software has required a high degree of expertise to use. To make HPC systems accessible to a wider scientific community, it is necessary to develop software that improves ease of use and productivity.

• Diversity: Software is commonly developed focusing on just one new supercomputer every couple of years. The intention is to make software run across diverse exascale systems, enabling diverse architectures. Careful design and the use of portability layers to reduce code differences as much as possible will be necessary to create software that can run efficiently on different systems.

• Analytics and machine learning: Aside from more traditional modelling and simulation applications, future supercomputers will have to solve data-science and machine-learning problems. To do so, it will be necessary to develop new scalable and parallel analytics and machine-learning software.

2.2. Applications monitoring tools

Performance monitoring is of great interest to many companies and institutions. For that reason, different alternatives have been developed and are in use nowadays. This section offers a view of the most relevant tools available for this task.

2.2.1. Arm MAP

Figure 1: Arm MAP

Arm MAP, formerly known as Allinea MAP before the company was bought by Arm, is an application profiler compatible with different languages such as C, C++, Fortran and Python. It is used for its profiling functionality for MPI applications [6], to monitor performance and find the causes of bottlenecks. The screenshot of the GUI in Figure 2 shows some of the data this tool can visualize. On top there is information about CPU and memory usage, as well as about the threads, processes and nodes. The middle panel shows the instructions to which specific metrics belong. Information about I/O, inter-process communication and energy consumption can also be gathered with this tool.

Figure 2: GUI of Arm MAP

2.2.2. AWS CloudWatch

Figure 3: AWS CloudWatch

Amazon CloudWatch is Amazon's monitoring and management service for its cloud platform AWS, the most widely used worldwide [7]. It collects performance metrics from the applications, infrastructure and services being used on the platform, and this data can be observed in the form of logs and metrics for the whole stack. It enables the use of alarms, logs and event data to take automated actions and reduce the Mean Time to Resolution (MTTR). It also provides 15 months of metric retention and the ability to perform calculations on these metrics, allowing historical analysis for cost optimization of applications and infrastructure resources. The GUI of this tool is completely customizable, allowing users to set up a dashboard to their preferences. Figure 4 shows an example, with information about the services used, some metrics and some configured alarms.

Figure 4: AWS Capture GUI

2.2.3. Ganglia

Figure 5: Ganglia

Ganglia is a scalable, distributed monitoring system for HPC systems such as clusters, developed by the University of California, Berkeley. Its development was initially funded by the National Partnership for Advanced Computational Infrastructure (NPACI) and the National Science Foundation of the United States. The software is distributed under a BSD open-source license and is used by organizations such as Cray, MIT and NASA [8]. It is based on a hierarchical design, organizing clusters into groups called federations. Using a tree structure, each node reports its data to its representative, which aggregates the state. The main goal of the tool is to impose a low overhead on each node it monitors. This organization allows the information to be visualized at many levels of specificity, from the whole system down to a specific node, as Figure 6 shows.

Figure 6: Ganglia GUI

2.2.4. HPC Toolkit

Figure 7: HPCToolkit

As part of the Exascale Computing Project (ECP), we can find HPCToolkit. This software is developed with the collaboration of the Office of Science (DOE-SC) and the National Nuclear Security Administration (NNSA) [9]. It is a suite of tools for performance measurement and analysis on systems of all sizes. It uses statistical sampling of timers and hardware counters to measure a program's work, resource consumption and performance, attributing them to the context in which they occur. It has a low overhead (1-5%) and is compatible with threaded and MPI applications. Figure 8 shows the different components of the software:

Figure 8: HPCToolkit components


• hpcrun: collects performance measurements for unmodified, fully optimized applications. It uses asynchronous sampling, triggered by system timers and performance-monitoring-unit events, to drive the collection of call-path profiles and, optionally, traces.
• hpcstruct: relates binary code to source code files and components, in order to associate measurements with the program structure.
• hpcprof: hpcprof and hpcprof/mpi join the data from hpcrun and hpcstruct to generate a performance database that can be explored using the graphical user interfaces.
• hpcviewer: a graphical user interface that presents performance data focused on the code, as well as a view to check how performance changes across different threads and processes.
• hpctraceviewer: a graphical user interface that presents a hierarchical view of the execution of the program, focused on time. It offers efficient rendering of trace lines for large numbers of nodes.
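The components above form a measure-analyse-visualize pipeline, which might be sketched as the following shell session. This is an illustrative sketch: `myapp` is a hypothetical binary, and the exact flags and output directory names should be checked against the HPCToolkit documentation.

```shell
# 1. Measure: run the binary under asynchronous sampling with hpcrun.
hpcrun ./myapp                     # writes hpctoolkit-myapp-measurements/

# 2. Recover the program structure from the optimized binary.
hpcstruct myapp                    # writes myapp.hpcstruct

# 3. Correlate measurements with structure into a performance database.
hpcprof -S myapp.hpcstruct hpctoolkit-myapp-measurements

# 4. Explore the resulting database with the GUIs.
hpcviewer hpctoolkit-myapp-database
hpctraceviewer hpctoolkit-myapp-database   # if traces were collected
```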

Figure 9: HPCToolkit TraceViewer GUI

2.2.5. Intel VTune

Intel also has a solution for performance profiling and monitoring, called Intel VTune. This tool is compatible with several languages: C, C++, Fortran, Java, Python, Go and assembly. It supports single- and multi-threaded applications, as well as parallel applications using MPI. Among its features we can find:

• Software sampling: on compatible processors, this functionality locates the parts of the code where time is being spent.
• JIT profiling support: profiles dynamically generated code.
• Locks and waits analysis: finds long synchronization waits that happen when cores are underutilized.
• Threading timeline: shows the relationships between threads to identify load-balancing and synchronization issues. It can also be used to select a region of time and filter the results.
• Source view: results can be displayed line by line on the source or binary code.
• Hardware event sampling: on compatible Intel processors, this functionality helps find specific tuning opportunities such as cache misses or wrong branch predictions.
• Memory access analysis: helps to optimize data structures for performance and to improve latency and scalability.

Figure 10. Intel VTune GUI

2.2.6. Paraver and Extrae

The Barcelona Supercomputing Center has developed a toolkit for performance analysis. It is composed of Paraver [10], the tool used for data visualization, and Extrae [11], the tool used to extract the data from the applications. In addition to these tools, there is another one for simulating the performance of a parallel application using a single-core CPU.

Extrae is a library that generates the traces that can then be visualized with Paraver. It is compatible with all the most common architectures, with several programming languages, and with programming models such as MPI, OpenMP and pthreads. The tool can be easily configured using an XML file. The traces generated by Extrae consist of performance and energy data obtained from different hardware counters. These metrics are associated with precise timestamp information and with the code fragment that was being executed at that moment. Metrics other than the defaults can also be captured using the API that the library includes.

Paraver serves as the visualization tool for the generated traces. It is focused on flexibility, based on two main principles. The first is that the trace format it uses has no semantics: this allows support for new performance data and programming models to be added without any changes to the visualizer, simply by adding the data to the trace being visualized. The second is that metrics are programmed instead of hardwired. The tool includes different mechanisms to display a wide variety of metrics with the available data, and once programmed, these views can be saved into a configuration file to be reused at a different moment or in a different project. Thanks to this flexibility, Paraver provides a simple GUI that suffices to display all the metrics. As we can see in Figure 11, the GUI has only two views: a timeline view and a statistical view. To support this flexibility, the tool includes a semantic module with many different functions for visualization.
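The XML configuration mentioned above might look roughly like the following abridged sketch. This is illustrative only and may not match the current Extrae schema; the installation path, counter names and trace file name are placeholders, so the Extrae manual should be consulted for the exact elements.

```xml
<?xml version="1.0"?>
<trace enabled="yes" home="/opt/extrae" initial-mode="detail" type="paraver">
  <!-- Instrument MPI calls and read hardware counters at each event -->
  <mpi enabled="yes">
    <counters enabled="yes"/>
  </mpi>
  <counters enabled="yes">
    <cpu enabled="yes">
      <set>PAPI_TOT_INS,PAPI_TOT_CYC</set>
    </cpu>
  </counters>
  <!-- Merge per-process intermediate files into a single Paraver trace -->
  <merge enabled="yes">myapp.prv</merge>
</trace>
```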

Figure 11: Paraver GUI Views

2.2.7. Periscope

Figure 12: Periscope

The Periscope Tuning Framework is a scalable automatic performance analysis tool developed at the Technical University of Munich. It consists of a frontend and a hierarchy of communication and analysis agents. Each analysis agent autonomously searches for inefficiencies in a subset of the application processes [12]. The application processes are linked with a monitoring system that provides a network interface, which allows the agent to configure the measurements, control the execution of the application and retrieve the performance data. Currently, only summary information is supported. On start-up of the application and the network of agents, the framework analyses the set of processors available, determines the mapping of application and analysis-agent processes, and then starts the application and the agent hierarchy. After launch, a command is propagated down to the analysis agents to start the search, which is performed according to a search strategy selected when the frontend is started. When the agents finish the search, the performance metrics are delivered back to the frontend. Another property of Periscope is its support for plugins that add functionality: plugins are currently available to help tune MPI applications and energy efficiency, among other options.

2.2.8. Scalasca

Scalasca is a tool for performance optimization developed as a joint project by the Jülich Supercomputing Centre, the University of Darmstadt and the German Research School for Simulation Sciences. It is sponsored both by the European Union and the US DoE. It is focused on HPC applications and compatible with MPI [13]. This software tool allows the performance of parallel applications to be optimized through measurement and analysis of their runtime behaviour. With these data, bottlenecks are detected, and the software offers guidance for finding the root causes of the problems found.

The metrics are stored to be analysed after execution, and the tool offers two different execution modes [14]. The first mode is called profiling mode, in which Scalasca captures measurements from individual function calls and generates aggregate metrics. This allows the most time-consuming parts of the program to be found and process-local metrics, such as those resulting from hardware counters, to be analysed. The second mode is tracing mode, where individual performance-relevant events are collected. This mode makes it possible to automatically identify call paths that show wait states. Both modes come with a graphical interface that enables interactive exploration of the data, as we can see in Figure 13.

Figure 13: Scalasca Report Explorer

2.3. Application running environment in clusters

In a computing cluster, applications do not run alone. Instead, the common practice is to rely on different tools that ease the task of managing the applications and processes that run at each specific moment on an HPC system.

2.3.1. Slurm

Figure 14. Slurm Workload Manager

Currently known as Slurm Workload Manager or just Slurm, the original name of this software was Simple Linux Utility for Resource Management (SLURM). Slurm is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters of all sizes. It is a leading choice for supercomputers worldwide, being used in around 60% of the supercomputers listed in the TOP500 ranking [15].

It has three key functions [16]. The first one is to allocate exclusive and/or non-exclusive access to compute nodes to users for a specific period of time so they can perform their work. The second is to provide a framework for starting, executing, and monitoring work on the group of nodes where the user has received permission. The last function is to arbitrate contention for resources by managing a queue of pending work.

The architecture of the software consists of a centralized manager (slurmctld) that performs the monitoring. This manager can also have a backup manager to avoid system-wide failures in case the primary one fails. On the node side, there is the slurmd daemon, which basically waits for communications from the manager. Once a job has been received, the daemon executes it and responds with the status. An optional database daemon (slurmdbd) is also available, which allows multiple clusters to be managed and the monitoring information to be stored in one single database. Users have different tools for initiating jobs, terminating jobs or retrieving system information, among others. These tools allow users to communicate either with the central manager or with specific nodes, depending on the command sent. A diagram can be found in Figure 15.


Figure 15. Slurm Architecture Diagram

The daemons can manage different entities, which are sets of nodes. These entities can be of several types, the simplest one being a single computing node. The other possible entities are partitions, which group nodes into logical sets; jobs, which represent resources assigned to a user for a specific time interval; and job steps, which are sets of tasks from the same job. The different entities can be seen in Figure 16.

Figure 16. Entities in Slurm

Additionally, Slurm has a modular design that offers the option to use it with different plugins that extend its functionality. There are more than 100 plugins available that enable different tasks such as implementation-specific MPI hooks, energy consumption gathering, container support, time sharing for parallel jobs or topology-optimized resource selection.

Finally, all this functionality can be configured with a simple configuration file. Using this file, it is possible to specify the nodes, with their cores, memory or disk space. It also allows the partitioning of the nodes and the organization of the different entities to be set up, as well as limits such as the maximum time for a job, the maximum number of nodes for a task or different policies, among other things.
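As an illustration of such a configuration file, a minimal slurm.conf fragment could look like the following sketch. All host names, node counts, sizes and limits here are invented for the example and are not part of this project:

```
# Hypothetical cluster description (names and sizes are examples only)
SlurmctldHost=manager            # node running the slurmctld daemon
ClusterName=example

# Compute nodes with their resources
NodeName=node[01-04] CPUs=8 RealMemory=16000 State=UNKNOWN

# A partition grouping the nodes, with a per-job time limit
PartitionName=debug Nodes=node[01-04] Default=YES MaxTime=30:00 State=UP
```

With a file like this, the manager knows which nodes exist, which resources each one offers, and how they are grouped into partitions with their scheduling limits.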

2.3.2. TORQUE

TORQUE is a resource manager that provides control over batch jobs and distributed computing resources. It is developed by Adaptive Computing, based on the earlier open-source project PBS (Portable Batch System). Optionally, it can integrate with Moab Cloud, a workload and resource orchestration platform developed by the same company [17]. Unlike Slurm, as of June 2018 TORQUE is no longer open source and is distributed under a proprietary license [18], but it is used in tens of thousands of government, academic and commercial sites all over the world [19]. Compared to PBS, it offers enhancements in key features:

• It improves fault tolerance by adding new failure conditions and support for node health check scripts.

• The scheduling interface is extended, adding support for the collection of statistics. Both the query and control interfaces are extended, providing the scheduler with additional and more accurate information and increasing control over job behaviour and attributes.

• For scalability, it incorporates support for handling larger clusters and jobs, with compatibility for clusters of over 15 teraflops or 2,500 processors, and for jobs of over 2,000 processors. It also offers support for larger server messages and an improved server-to-MOM communication model.

• Usability is enhanced by adding many different logging functionalities, as well as by making logs more human-readable.

2.3.3. IBM Spectrum LSF

Created by Platform Computing and formerly known as Platform Load Sharing Facility, or LSF, it was renamed Spectrum LSF after IBM bought the company in January 2012 [20]. It is a workload management platform and job scheduler for supercomputing, and part of IBM’s HPC suites. It is focused mainly on enterprise customers, offering security and fault tolerance.

In a similar way to Slurm, there is a master host, which is the machine where the main piece of code runs and that has the responsibility of coordinating all the other nodes. Additionally, several master host candidates can exist in the system, so one of them will assume control in case of any failure. The rest of the nodes in the cluster can be considered client hosts or server hosts, where clients can only submit jobs and servers can also run them. Depending on its state, a server host can be an execution host if it is running a task at the moment or a submission host if it is idle [21].

Jobs are managed using job queues. These queues can be system-wide or can deliver tasks to a group of nodes only. Each queue may have different job scheduling and control policies. Jobs are assigned transparently to the user, who just has to choose the queue to submit to, and LSF will assign the best resources available for the execution of the task. A diagram of the different components of the LSF architecture can be found in Figure 17.

Figure 17. Spectrum LSF Architecture

Each one of the hosts runs various daemon processes, depending on its role in the system. The master host runs all the functionality to monitor the system and schedule the different tasks, while the server hosts run the processes necessary to receive their jobs and return the results. Client hosts do not run any daemon, since it is not necessary.

Regarding security, LSF offers different roles that a user can have [22]. Every user is allowed to submit jobs, but the available resources will depend on the user’s assignment. The primary administrator is the one with permissions over the whole cluster. It can control all jobs, regardless of who submitted them, and is the only one who can change configuration files. There are further administrator roles, scoped by where they have permission: cluster administrator, queue administrator, host group administrator and user group administrator. None of these has permission to change any configuration file, as this privilege is only given to the primary administrator.

For authentication, the LSF security model tracks user accounts internally by default. A user account defined in LSF includes a password to provide authentication and an assigned role, such as administrator, to provide authorization. There is also the option to rely on external authentication systems if desired, such as Kerberos, Active Directory or LDAP. A diagram of the security model can be seen in Figure 18.

Figure 18. IBM Spectrum LSF Security Model

2.3.4. Univa Grid Engine

Figure 19. Univa Grid Engine

Univa Grid Engine is a batch-queuing system, forked from Sun Grid Engine. That happened after Univa acquired the software from Oracle in October 2018 [23]. From the time Oracle bought Sun until the product was given to Univa, it was called Oracle Grid Engine. Its purpose is to manage workloads automatically, maximizing resource sharing and accelerating the execution of any application, container or service. It can be deployed in many environments, such as on-premise, cloud, hybrid cloud or cloud-native HPC systems.

Univa Grid Engine allows workloads to be shared across machines in a data centre with the objective of optimizing the usage of the computing infrastructure. The scheduling policies can be applied to any work submitted, making sure that high-priority tasks are finished on time while at the same time keeping the utilization of the computing nodes as high as possible [24]. Another feature is the possibility to select resources at a granular level. This way, it supports complex topologies with multiple different CPUs, GPUs and network interfaces, optimising resource allocation. It also offers high scalability, with a single Grid Engine cluster able to contain up to 10,000 nodes and to run 200 million jobs per month. In a single on-premise environment, it has scaled up to 20,000 cores and, in a cloud environment, to 1 million cores.


2.4. FlexMPI tool

As stated in the previous section, MPI is the main standard used to implement applications that execute on high performance computing (HPC) clusters. Applications built to run on an HPC cluster tend to be parallel, typically with a high degree of parallelization. Regarding their capability to vary the number of processes used at a time, applications can be classified into four categories: rigid, mouldable, malleable, or evolving. Rigid and mouldable applications have in common that their number of processes remains fixed for the whole execution; the difference is that in mouldable applications this number is set at application start-up. Malleable and evolving applications may both vary the number of processes during execution time; the difference is that while the latter are autonomous and make the changes on their own when more processors are available, malleable applications make these changes under the control of an external Resource Management System (RMS). This makes malleable applications more flexible and efficient, as the RMS may take into account the global state of the cluster, including other applications and the priority of each, in order to set the policy for each of them. Although different RMSs offer malleable capabilities, MPI does not support this natively.

Regarding the design of dynamic reconfiguration techniques for malleable MPI applications, the complexity arises because it is not enough to simply modify the number of processes when resources become available; it is also necessary to take performance into account. Reconfiguration can actually decrease the application performance, not only because of the overhead of the action itself, but also because of an increase in communication and synchronization overheads. Complexity also grows when running on clusters with heterogeneous computing nodes.
FlexMPI is an MPI extension which supports malleability and implements performance-aware dynamic reconfiguration for iterative MPI applications, working as a library on top of the MPICH [25] implementation. It uses completion time as the performance objective, and it automatically reconfigures the application to use the number of processes necessary to achieve the target execution time. This reconfiguration takes place each time low performance is detected in the application, and it is based on user-given constraints. The prediction model decides the number of processes and the processors used, deriving the number of dynamically created processes from the efficiency constraint and, from the cost constraint, selecting the processors with the lowest cost (USD per CPU time) that satisfy the performance needs.

The development of FlexMPI is focused on single program multiple data (SPMD) applications, which tend to be iterative. This kind of application executes the same code in different processes, each of them with its own subset of the data. The common structure of these applications is an initialization section where the data is partitioned, followed by an iterative section where the processes operate in parallel, communicating with each other to find a global solution. As it is implemented as a library on top of MPICH, FlexMPI is fully compatible with all the features of the MPI-3 standard. The running environment of a FlexMPI application consists of the FlexMPI library, the MPI user application, the Performance API (PAPI) and MPI libraries, the user-given performance objective and performance constraints, and the resource management system, as Figure 20 shows.

Figure 20: FlexMPI application running environment

During execution, FlexMPI is organized in different modules that are responsible for different parts of the functionality. The first one is the monitoring module, which receives the performance metrics to be aggregated in each group of iterations of the application. Then, in each sampling interval, which consists of 100 iterations, the performance module receives the gathered metrics. This module uses the data received to track the performance of the applications and to decide whether it is necessary to reconfigure the

application by either adding or removing processes. The computational prediction model estimates the number of processes and the computing power, in FLOPS, required to satisfy the objective. With this prediction, the performance module maps the processes to the available processors, taking into account their number and type and the performance constraints. The dynamic process management module is the one implementing the process creation and removal functionalities and is responsible for rescheduling the processes according to the mapping.

Each time a reconfiguration is done, the data distribution between processes changes, which might lead to imbalance. To deal with this problem, the load balance module computes the new workload distribution based on the computing power of the nodes that are allocated to the application after every reconfiguration process. The last module is the data redistribution module, which is in charge of mapping and redistributing the data between processes following the workload distribution. After reconfiguration has finished, the application resumes its execution. Figure 21 shows the relationship of all the modules with each other and with the FlexMPI applications.

Figure 21: Workflow related to FlexMPI

3. Environment Description

In this section, the environment where the project has been carried out will be described, analysing both the development environment and the socio-economic environment.

3.1. Development environment

In this section, the decisions made about the different aspects of the environment will be stated. The topics described here are the programming languages chosen for developing the code, as well as the different tools and libraries used for simplifying development and adding the desired functionality. The project is made of two connected parts: the GUI (Graphical User Interface) developed for the visualization of the monitored data, and the controller, which manages the FlexMPI applications. Different options have been chosen for each part, so they will be explained separately.

3.1.1. GUI

The GUI is the main component developed during this project. It allows the user to visualize all the metrics generated about the performance of the applications running in the cluster, and to interact with the system to solve problems when they appear, such as contention in computing nodes.

For the programming language, the choice for the GUI was Java. It is a general-purpose object-oriented programming language, developed by Sun Microsystems, which was later bought by Oracle [26]. Oracle provides a free implementation, and there is also the OpenJDK implementation, which is open source. Java is one of the most used programming languages [27], having a big collection of libraries and tools. It is easy to learn and use. Additionally, even if it is not as fast as lower-level languages, it is still one of the most efficient languages [28]. This is more than enough for a program that will not require high performance, since most of its execution will consist of listening for events and network calls, without a high load. In exchange for its lower performance, it offers helpful characteristics: it is highly portable since it runs in a virtual machine, error verification at compilation and execution time makes it easier to debug, and the garbage collector removes the need to manage memory manually. It also provides all the functionality needed for threads and sockets, necessary for the connection with the rest of the system.

The reasons to choose this programming language have been:

• Easy to learn and use.
• More-than-enough performance.
• High portability.
• Wide usage, support and community.
• Support for the necessary functionality.

Along with the programming language, some tools and libraries can be chosen in order to ease the development and the implementation of some of the functionalities. When working with Java, the most common practice is to use an IDE (Integrated Development Environment), since it helps with many of the tasks of the development process. For this reason, the only software used will be the Eclipse IDE [29]. This tool offers many advantages when working with Java, which will be discussed next.

Eclipse is an IDE currently developed by the Eclipse Foundation. It is a multiplatform application distributed under a free license (Eclipse Public License, EPL), which allows free usage. It offers a wide variety of features, including code autocompletion, debugging tools and utilities for refactoring. In addition, many plugins that extend its functionality can be obtained through its plugin marketplace.

Despite these many functionalities, the main objective of using a specific development tool is to increase productivity. For that reason, having the shortest possible learning curve is a good criterion when choosing which tools to use, especially in one-person projects. Eclipse is the most used IDE for Java development, and specifically it is the one used in all the UC3M lectures that work with Java. This, in addition to its functionality and free licensing, makes it the best choice, since there will be no need to learn new configurations that could slow down development, compared to other options such as NetBeans (which is free) [30] or IntelliJ IDEA (which is paid software) [31].

The main functionalities dependent on libraries are the GUI components and the plotting of the different metrics. For the implementation of the GUI the options considered have been Apache Pivot and Java Swing, and for the chart plotting JFreeChart and JChart2D.
Apache Pivot is an open-source platform for building installable Internet applications [32], maintained by the Apache Software Foundation. It was considered as the first option to develop the application interface, but it was discarded in favour of Swing.

Java Swing is a GUI widget toolkit developed by Oracle for Java, and it is part of Oracle’s JFC (Java Foundation Classes), a framework for building portable GUIs in Java [33]. Both toolkits offer everything needed for the project, Swing being the chosen one for a few reasons:

• Better documentation and community.
• Ease of use.
• Developed by the language maintainer, thus no need for external libraries.

JFreeChart is a free Java chart library [34], distributed under the GNU LGPL (Lesser General Public License). It has support for many output types, including Swing, image files and vector graphics file formats.

JChart2D is a minimalistic chart library. It is also free software, distributed under the GNU LGPL. Although it has less functionality than other charting libraries, it is focused on reducing the overhead of plotting real-time data, allowing dynamic display at run time. It is also compatible with Swing, is simple to use and has straightforward documentation. Both solutions would be valid for the project, JChart2D being the choice for the following reasons:

• It offers simplicity for the implementation of real-time chart visualization.
• It has exactly the functionality needed, without extra characteristics that are not needed.
• It is focused on reducing the overhead of using it, which improves the overall performance of the application.

3.1.2. Controller

The controller is the component that connects all the other parts. It is in charge of receiving all the messages and sending them to their destination. Since this component only communicates with the GUI over the network, there is no need to use the same language, as the requirements here differ. This component needs high performance, as its main task is to process large amounts of data quickly and send them where necessary. To cover this performance requirement, the C programming language was chosen for the development.

C is a general-purpose programming language mainly oriented towards systems programming. It is one of the main languages used for the implementation of most of the operating systems in use nowadays (Linux/Unix, Windows, MacOS), and also in the development of applications on embedded systems and supercomputers. It is cross-platform, as most systems have a C compiler, and it is known for its high performance compared to other programming languages. The reasons for this performance are the low-level access to memory that the language provides, the efficient mapping between C code and machine instructions, and the minimal runtime system used for executing compiled applications. These characteristics often cause it to be referred to as a medium-level language. The higher performance of the language also makes it more complex than others, leaving, for example, memory management as the total responsibility of the developer, compared to other languages that have a garbage collector.

This was the language of choice in previous stages of the project, so it is the natural option given that this new component will be integrated with the rest of the system. Given that the size of this component is relatively small, the higher complexity of using this language has low relevance compared to the benefits. The reasons to choose this programming language are:

• It has been used in previous stages of the project.
• High performance and portability.
• Compatibility with the necessary libraries.
• Widely used in systems programming.

When working with C, especially in not-so-large projects, it is common to use a code editor instead of an IDE. There is a wide variety of solutions for this, but the chosen one has been Visual Studio Code [35]. For compiling the code, the choice has been GCC (GNU Compiler Collection) [36].

Visual Studio Code is a code editor developed by Microsoft. It is open source and free to use, distributed under an MIT license. It is compatible with the three most widespread operating systems (Linux, Windows and MacOS) and has support for most programming languages. It is highly customizable, offering many functionalities, and it also supports plugins and extensions to add functionality or support for more languages.

GCC is a compiler system produced by the GNU Project. It was originally developed only for the C language and named GNU C Compiler, but it was later renamed GNU Compiler Collection when support for other languages was implemented. It is distributed under the GNU GPL License (GNU General Public License) and is open source, as is common practice in the GNU Project. It is the most used C compiler and has become the de facto industry standard when compiling for GNU/Linux systems. It also supports most of the processor families used nowadays.

3.1.3. Common environment

One of the advantages of using development tools that are available on all major operating systems has been the possibility to work with all of them. The code has been developed using Windows, Linux and MacOS throughout different phases of the project. Regarding the GUI, since one of the objectives was to make it multiplatform, this has been useful in order to test that the application works on all the systems. The controller is a piece of software meant to work only under Linux/Unix systems, so even though development has been done on all systems, testing has only been carried out under Linux, using a virtual machine when the development machine was not running the needed OS.

To manage the usage of different systems, the tools used are ssh and sshfs: the former allows connecting remotely to a different computer's shell to run any command, and the latter uses ssh to mount a remote computer's file system in order to work locally as if the files were on our own computer. scp has also been used, as it allows files to be transferred securely and easily from one computer to another. As these tools are only available for Unix-like systems, on Windows the solution was to use PuTTY [37] and WinSCP [38], applications that offer the functionality of ssh and scp with a graphical interface that is really convenient to use. Finally, testing has been carried out both locally on the development machine and on the University cluster, Tucán. This cluster also runs a Linux distribution.

3.2. Socio-economic environment

HPC is one of the hot topics in computer engineering nowadays, as it has become necessary to process all the data generated by all kinds of new technologies, such as artificial intelligence or the growing number of IoT devices. It is also a key tool in scientific research, allowing different physical or biological phenomena to be simulated in a way that cuts costs significantly and drastically reduces the time needed to reach a relevant discovery. Just as with computing power in general, the performance achieved by HPC systems is growing fast year after year, as seen in Figure 22 [39], and the community around HPC is also growing exponentially, as Figure 23 shows [40].

Figure 22: HPC Performance Development


Figure 23: HPC user base growth

As the topic gains relevance in the eyes of governments and big companies, the investments in the area grow. The European Union has created the European High-Performance Computing Joint Undertaking [41] with a budget of €1 billion for developing top-level supercomputers between 2018 and 2026, with an additional €400 million from private members. In contrast, the budget of the previous period was €700 million [42], half as much.

This tool is aimed at system administrators. The objective of the project is to offer a lightweight alternative for monitoring the performance of applications running on HPC systems. This way, costs can be cut both by having a light tool and by improving the efficiency of the running applications with its functionality. Being an academic project, both free and open source, there is no intention of gaining any economic benefit. The main benefit sought is related to the academic environment. This means that the project is more successful the more usage it gets, along with the references it obtains in different academic fields, as well as by achieving collaboration projects with different universities and institutions. Nevertheless, in this kind of project there is the possibility to obtain a benefit through support contracts. An example of that is Red Hat, a company based on offering a Linux distribution, which makes the majority of its profit with support subscriptions, with licenses that can cost US$1,299 with a year of premium support or up to US$3,096 with all the possible add-ons included [43]. Proof of the potential of this kind of business model is the purchase of Red Hat by IBM for around $34 billion last year [44].

Given the high energy consumption of an HPC system, executing parallel applications can be expensive. It is for this reason that efficiency is really important, since a small improvement can mean a big difference in cost. Therefore, a free and open tool can benefit many actors both in industry and in the academic environment, especially in fields related to services and infrastructure.

4. Description of the Proposed Architecture

In this section the designed architecture will be described, showing the different components that form it and how they relate to each other.

4.1. Overview

The system works in a distributed way, enabling the execution of different components on different computers. On one side, the GUI is intended to be launched on the local computer of the user. The GUI will connect to the cluster and receive the communications to show them to the user. The rest of the system is meant to be executed on the cluster, although that is not strictly necessary for all the components. In the middle, the controller is the component in charge of communicating with all the different parts, sending the information about the metrics from the applications to the GUI, and the commands from the GUI to the applications. A diagram of the components and their relationships can be seen in Figure 24.

Figure 24: Diagram of FlexMPI components

In this project, the components of the system that have been developed are the GUI and some elements of the controller. The communications between both components have also been part of the work. As seen in Figure 24, not every part of the system communicates directly with every other. As the arrows in the diagram show, the only interaction the GUI has with the rest of the system is with the controller. This component is responsible for centralizing all the communications. It receives the metrics of the applications and sends them to the GUI to be plotted. It also receives the commands from the GUI and redirects them to the corresponding application.

Of course, that is a simple general overview of the major components of the system. Each one of the components consists of different threads, each with a specific responsibility. The three major parts into which the system can be divided are the application statistics service, the application command sending and the contention monitor. A more in-depth overview of the different threads can be seen in Figure 25, and they will be explained throughout the following sections.

Figure 25: Overview of the threads of the system

4.2. Components

The system consists of three main components. The GUI has the function of showing metrics and information to the user, and of sending commands to the controller. The controller is the component that connects to the GUI and to the applications, relaying metrics and commands between them. Lastly, the applications also execute part of the functionality, sending the metrics when necessary and listening for commands. Each component will be described in this section, showing some pseudocode to illustrate the logic of different parts of these components.

4.2.1. GUI

As stated before, the GUI is the component responsible for showing the metrics and sending the commands from the user to the rest of the system. It runs several threads to avoid the possibility of the system blocking, as well as functions launched on specific events. The main thread of the GUI is used to receive the connection from the controller, and also to receive contention data and the registration of new applications. The logic of this thread looks like the following:

Algorithm 1 Main thread of GUI

1: init ()
2: while running do
3:     message = receiveControllerMessage ()
4:     if message == REGISTER_MESSAGE then
5:         runInitialControllerRegistration (message)
6:     else if message == APP_MESSAGE && controller is registered then
7:         runAppRegistration (message)
8:     else if message == MONITOR_MESSAGE && controller is registered then
9:         runMonitorRegistration (message)
10:    end if
11: end while
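The dispatch logic of Algorithm 1 can be sketched in Python. The message codes (REGISTER, APP, MONITOR) and the handler return values are illustrative stand-ins, not the actual FlexMPI constants:

```python
# Hypothetical sketch of the GUI main loop's dispatch step (Algorithm 1).
def dispatch(message, state):
    """Route one controller message to the matching registration handler."""
    code = message.split(":", 1)[0]
    if code == "REGISTER":                      # initial controller registration
        state["controller_registered"] = True
        return "controller"
    if not state.get("controller_registered"):  # ignore anything sent too early
        return None
    if code == "APP":                           # new application registration
        return "app"
    if code == "MONITOR":                       # contention monitor registration
        return "monitor"
    return None
```

Keeping the dispatch in a single function makes it easy to ignore application or monitor registrations that arrive before the controller itself has connected.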

For sending the commands to each application, asynchronous events are used, as is common practice for GUI components. There are also two threads associated with each application, but they are only executed when the statistics service is activated, since they are used to gather and process the application metrics. The first thread just waits for a new message with statistics information, parses it and stores it in a circular buffer. The functionality of the second thread is to read the

buffer in order to process the data, plot the metrics into the graph when the user displays it, or save them to disk if that option is activated. This is done so that, if the frequency of statistics messages becomes high at some point, the socket can be read at a matching frequency without the thread being stuck processing the data. The logic of the statistics collector thread is very simple. It is described in this pseudocode:

Algorithm 2 Statistics collector thread

1: init ()
2: while running do
3:     message = receiveStatisticsMessage ()
4:     if message == STATISTICS_MESSAGE then
5:         saveToBuffer (message)
6:     end if
7: end while
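A minimal Python sketch of this collector, assuming a "STATS" message prefix and an arbitrary buffer size (both illustrative, not the actual FlexMPI values):

```python
from collections import deque

# Sketch of the statistics collector (Algorithm 2): each received statistics
# message is appended to a bounded circular buffer so the socket can be
# drained quickly, leaving the heavy parsing to the buffer reader thread.
STATS_BUFFER = deque(maxlen=1024)   # circular: oldest entries are dropped

def collect(message):
    """Store a statistics message; ignore anything that is not statistics."""
    if message.startswith("STATS:"):
        STATS_BUFFER.append(message)
        return True
    return False
```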

The buffer reader thread is the one taking and processing the information. When it is executed, it first initializes the necessary structures and then loops reading the data from the buffer. Aside from saving the data to disk when necessary, it is also possible to analyse the metrics in order to execute actions or raise alerts. The pseudocode for this thread is shown below. The last thread in the GUI is a single one that is launched when the monitor command is registered. This thread parses the information about the nodes that are running in the cluster and generates a grid with them.

Algorithm 3 Buffer reader thread

1: init ()
2: while running do
3:     message = waitForBufferRead ()
4:     parseMetrics (message)
5:     plotMetrics (message)
6:     if writeToDisk == True && bufferMoreThan50%Full then
7:         writeToDisk ()
8:     end if
9:     action = checkDataForActions ()
10:    if action != null then
11:        execute (action)
12:    end if
13: end while
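The reader side can be sketched as follows; the 50% flush threshold mirrors the pseudocode, while the colon-separated metric format is an assumption:

```python
from collections import deque

# Sketch of the buffer reader (Algorithm 3): drain the shared circular buffer,
# parse each metrics line, and report whether a disk flush is due.
def drain(buffer, write_to_disk=False):
    """Parse every buffered metrics line; return (parsed rows, flush flag)."""
    # Flush when persistence is on and the buffer is more than half full,
    # matching the pseudocode's bufferMoreThan50%Full check.
    flush = bool(write_to_disk and buffer.maxlen
                 and len(buffer) > buffer.maxlen // 2)
    parsed = [[float(v) for v in raw.split(":")] for raw in buffer]
    buffer.clear()
    return parsed, flush
```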

After the first part, it stays listening for contention alerts, creating colour alerts and automatically executing actions if necessary to solve the notified problem. The pseudocode of this thread is the following:

Algorithm 4 Monitor listener thread

1: init ()
2: // Parse nodes in cluster
3: message = getRegistrationMessage ()
4: nodesList = parse (message)
5: createNodesGrid (nodesList)
6: while running do
7:     message = receiveContentionMessage ()
8:     node, info = parse (message)
9:     node.update (info)
10:    if node.autoSolve == True then
11:        solveContention (node)
12:    else
13:        showColourAlert (node)
14:    end if
15: end while

4.2.2. Controller

In the same way as the GUI, the controller executes different threads for different functionalities. Although there are more threads that might be executed for other purposes, only the main thread and the ones related to the interaction with the GUI will be described here. The main thread is just responsible for initializing all the necessary variables and threads, and then waiting for user input from the terminal. The pseudocode of this thread is the following:

Algorithm 5 Main thread of controller

1: init ()
2: parseParameters ()
3: createGUIConnectionThread ()
4: initializeApplications ()
5: while running do
6:     readTerminalInput ()
7:     processTerminalInput ()
8: end while

After the main thread, the next thread to be executed is the GUI listener thread. This thread is responsible for connecting the controller to the GUI. After that, it loops waiting for commands coming from the GUI to the applications. The logic of this thread is as follows:

Algorithm 6 GUI Listener thread

1: guiPort, guiAddress = getGUIPortAndAddress ()
2: registerToGUI (guiPort, guiAddress)
3: while running do
4:     command, appId = receiveGUICommandMessage ()
5:     sendMessageToApp (command, appId)
6: end while
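A sketch of the forwarding step, assuming the command layout of Section 4.3.4 ("&lt;app id&gt; @ &lt;command&gt;...") and a simple app-to-port table standing in for the controller's internal structures:

```python
# Hypothetical sketch of the controller's GUI listener (Algorithm 6): a
# command arriving from the GUI carries the target application's ID, which
# selects the socket/port on which the command is forwarded.
def route_command(message, app_ports):
    """Return (destination port, payload) for the addressed application."""
    header, payload = message.split("@", 1)   # "<app id> @ <command>..."
    app_id = header.strip()
    return app_ports[app_id], payload.strip()
```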

After the connection with the GUI, it is time to initialize the applications. Each application has a matching thread in the controller. This thread holds the responsibility of registering each application with the GUI and of monitoring the app status. When the statistics service is active for an app, it is from this thread that the GUI receives the execution metrics. Different states of the application are also monitored here, such as whether the application has terminated. This is the pseudocode of this thread:

Algorithm 7 App management thread in controller

1: init_app ()
2: connectToAppInCluster ()
3: registerAppToGUI ()
4: while running do
5:     message = receiveApplicationInformation ()
6:     if parse (message) == APPLICATION_TERMINATED then
7:         exitThread ()
8:     else if parse (message) == DIFFERENT_POSSIBLE_MESSAGES then
9:         executeNecessaryActions ()
10:    else if parse (message) == STATISTICS_MESSAGE then
11:        parseStatistics (message)
12:        if isGUIWaitingForStatistics == True then
13:            sendStatistics ()
14:        end if
15:    end if
16: end while

The last thread to be launched is the one dealing with the contention monitoring. Similar to the other threads described, this thread first executes some set-up instructions and afterwards repeats the same tasks in order inside a loop. This is the pseudocode for this thread:

Algorithm 8 Contention monitor thread

1: connectToMonitor ()
2: connectToGUI ()
3: while running do
4:     message = receiveContentionMessageFromMonitor ()
5:     sendMessageToGUI (message)
6:     processMessage (message)
7: end while

4.2.3. Application

The logic inside the applications that run using FlexMPI is simple, and it is intended to add the lowest possible overhead to the application itself. As the applications run an iterative program, the only extra logic is to check every 100 iterations whether a command has been received and process it, or to send back to the monitor the metrics about the status of the application if necessary. This logic works the following way:

Algorithm 9 Application running FlexMPI

1: init ()
2: for each iteration ∈ appIterations do
3:     executeAppParallelComputing ()
4:     command = checkForReceivedCommand ()
5:     if command != null then
6:         processCommand (command)
7:     end if
8:     if statisticsServiceActive then
9:         sendStatistics ()
10:    end if
11: end for
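The per-iteration hook can be sketched like this; `poll_command` and `send_statistics` are hypothetical stand-ins for the FlexMPI calls, and the every-100-iterations check mirrors the text above:

```python
# Sketch of the FlexMPI hook inside the application loop (Algorithm 9): the
# command check only runs every 100 iterations to keep the added overhead low.
def run_iterations(n, poll_command, send_statistics, stats_active=False):
    """Run n iterations; return how many commands were handled."""
    handled = 0
    for i in range(1, n + 1):
        # ... executeAppParallelComputing() would run here ...
        if i % 100 == 0:                 # cheap: one check per 100 iterations
            command = poll_command()
            if command is not None:
                handled += 1             # processCommand(command) in the tool
        if stats_active:
            send_statistics(i)           # report metrics back to the monitor
    return handled
```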

4.3. Interfaces definition

As the system is composed of different independent components, the way to interconnect them is through the exchange of messages over the network. All these communications are done using UDP sockets. The advantage of UDP sockets compared to TCP sockets is that they are more lightweight [45]. Although TCP sockets offer interesting features such as reliability and ordering, the lower overhead of UDP is more relevant here. Since the majority of messages carry application metrics, the traffic is comparable to audio or video streaming, where it is not that important if one of the messages is lost. Ordering is also not relevant, since the metrics are plotted using the timestamp as one of the parameters received. In this section, the different interfaces of the components for each functionality will be described, showing diagrams of how the communications work. All the message strings consist of different values separated by a colon “:”. The at character “@” is also used to separate some information in certain messages.
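A generic splitter for this wire format might look as follows (a sketch, not the actual GUI parser):

```python
# Sketch of a generic parser for the FlexMPI-GUI wire format: fields are
# separated by ":" and, in some messages, groups are separated by "@".
def split_message(message):
    """Split a message into @-groups of :-separated, stripped fields."""
    return [[field.strip() for field in group.split(":") if field.strip()]
            for group in message.split("@")]
```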

4.3.1. Controller-GUI connection

This is the initial interaction between the controller and the GUI, where the controller connects to the GUI in order to later register the applications. Before, the GUI stays passively listening to a previously chosen socket. This message is used basically to register in the GUI the address and port that will be listening for the commands sent by the user. This command has the following format:

&lt;connection code&gt;: 0

The only important part is the first parameter, which indicates that this is a connection message, so the GUI stores the information about where to send the messages to the controller if necessary. The rest is irrelevant, and it is just meant to keep consistency with the rest of the messages. Extra information for more functionality could be added to this message in later versions. The flow diagram representing the controller connection process is shown in Figure 26.


Figure 26: Flow diagram of controller-GUI connection

4.3.2. Application registration

The second interaction that the components of the system will have is the application initialization and registration in the GUI. In this process, the controller launches an application and then informs the GUI that the application has been initialized. The format of this message is the following:

APP: &lt;app name&gt;: &lt;timestamp&gt; @ &lt;format filename&gt; @ &lt;app id&gt; @ &lt;command port&gt;

In this message, the first parameter (“APP”) indicates that it is a message with information about a new application, the second parameter being the application name and the third one the timestamp when the application was launched. This timestamp is used to differentiate between applications that have the same name, as well as providing data about the moment the application was launched. The next parameter, “format filename”, is the name of the XML file where the format of the metrics for that application is described. This file allows each application to easily have different metrics, keeping the statistics messages smaller. The “app id”, as its name states, is the ID of the application as stored in the controller, which will be used to identify the application when the commands are sent by the user. Lastly, the “command port” is the port number where the GUI will send the commands for the application. When this message is received, the data is parsed by the GUI and the necessary threads and GUI components are initialized and created. A flow chart of this process can be seen in Figure 27.

Figure 27: Flow diagram of application registration
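Parsing the registration message into the named fields described above can be sketched as follows; the field order comes from the message format, while the dataclass is an illustrative structure:

```python
from dataclasses import dataclass

# Sketch of parsing the application-registration message of Section 4.3.2.
@dataclass
class AppRegistration:
    name: str
    timestamp: str
    format_filename: str   # XML file describing the metrics format
    app_id: str
    command_port: int      # where the GUI sends commands for this app

def parse_app_registration(message):
    """Parse 'APP: <name>: <ts> @ <format file> @ <app id> @ <port>'."""
    head, fmt, app_id, port = [p.strip() for p in message.split("@")]
    code, name, timestamp = [p.strip() for p in head.split(":")]
    assert code == "APP", "not an application registration message"
    return AppRegistration(name, timestamp, fmt, app_id, int(port))
```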

4.3.3. Monitor registration

After launching and registering the applications, the next step is to connect the contention monitor. The format of this message string is similar to the previous ones:

MONITOR: &lt;node 1&gt;: … : &lt;node n&gt;

After the first part of the message, which states that it is the monitor registration, the following parameters correspond to the nodes found in the cluster. Using these names, the GUI represents the nodes in a grid to show the problems when they are notified by the controller. When the monitor connection message is received, a thread is created to listen for those alerts. Along with the thread, a socket is initialized, and the port number is sent back to the controller so that the contention notifications are sent to that port. The flow diagram for this process is described in Figure 28.

Figure 28: Flow diagram of contention monitor thread connection
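A sketch of parsing this message, keeping the node order from the message as the stable grid order:

```python
# Sketch of the monitor-registration handling: the node names in the message
# fix the grid order, which the GUI keeps unchanged for the whole execution.
def parse_monitor_registration(message):
    """Parse 'MONITOR: <node 1>: ... : <node n>' into an ordered node list."""
    code, *nodes = [p.strip() for p in message.split(":")]
    assert code == "MONITOR", "not a monitor registration message"
    return nodes   # grid order == message order
```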

4.3.4. Command sending

Once an application is registered in the GUI, it is possible to send different commands to the app. The different commands will be described later in the report, but they all have the same format:

&lt;app id&gt; @ &lt;command code&gt;: &lt;parameter 1&gt;: … : &lt;parameter n&gt;

The app ID is the ID of the application as received in the registration message. This ID is used by the controller to know where to redirect the command. The command code corresponds to the action that has to be executed, and the parameters are optional and vary among the different commands. Every command is sent using a button, which launches an asynchronous event that parses the data and sends the message to the controller in a non-blocking way. Before sending the command, it is checked that it is correct and not redundant (like activating statistics when they are already active). If everything is correct the command is sent. The flow diagram of this process is described in Figure 29.

Figure 29: Flow diagram of command sending
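The button-handler logic can be sketched as follows; the command code names and the redundancy check are illustrative assumptions:

```python
# Sketch of the GUI-side command path (Section 4.3.4): the handler builds
# "<app id> @ <command code>: <parameters>" and refuses redundant commands,
# such as activating statistics when they are already active.
def build_command(app_id, code, params, app_state):
    """Return the command string to send, or None if it would be redundant."""
    if code == "STATS_ON" and app_state.get("stats_active"):
        return None                      # redundant: statistics already active
    tail = ":".join([code] + [str(p) for p in params])
    return f"{app_id} @ {tail}"
```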

4.3.5. Application metrics

When the statistics service is active, the GUI receives the metrics of each application on a port specific to that application. For that reason, there is no need for application-specific data in the message. The metrics arrive to the GUI with the following format:

&lt;value 1&gt;: … : &lt;value n&gt;: &lt;timestamp&gt;

In this string, the different values correspond to the different metrics that are configured to be received. The last value is the timestamp, which is always added by default regardless of the other metrics, since it is indispensable for plotting the graphs. Once the metrics arrive at the GUI, the values are processed. Then, alerts are shown if necessary, and if the graph is open, the metrics are plotted. The diagram is shown in Figure 30.

Figure 30: Flow diagram of metrics retrieval process
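A sketch of the metrics parsing, with the metric names assumed to come from the application's XML format file:

```python
# Sketch of metrics parsing (Section 4.3.5): the last field is always the
# timestamp; the remaining fields are matched positionally against the
# metric names configured for the application.
def parse_metrics(message, metric_names):
    """Return ({metric name: value}, timestamp) for one metrics message."""
    *values, timestamp = message.split(":")
    assert len(values) == len(metric_names), "message/format mismatch"
    return dict(zip(metric_names, map(float, values))), float(timestamp)
```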

4.3.6. Contention notification

The last interaction between the controller and the GUI is the notification of contention in some node of the cluster. These notifications arrive at the thread that is created after the monitor is registered in the GUI. This is the format of the contention message string:

&lt;node name&gt;: &lt;app 1&gt;: &lt;processes of app 1&gt;: … : &lt;app n&gt;: &lt;processes of app n&gt; @ &lt;metrics&gt;

In the message, the node name refers to the node that is suffering the contention. The node name is followed by the applications that are running in that node, with the number of processes that each one is executing in the node. The last part is the information about the metrics that are being monitored to detect contention. These metrics are the percentages of CPU, memory, cache and network that are being used by the node. This communication has two parts; the first one, where the notification arrives, is described in Figure 31. The second part of the communication is for solving the contention problem, and it varies depending on whether it is in auto mode or the solution is applied manually.

Figure 31: Flow diagram of contention message received
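Parsing this message can be sketched as follows, based on the layout described above:

```python
# Sketch of parsing the contention notification: node name, then alternating
# app / process-count pairs, then the node-level metrics after the "@".
def parse_contention(message):
    """Return (node name, {app: process count}, [metric values])."""
    head, metrics = [p.strip() for p in message.split("@")]
    fields = [p.strip() for p in head.split(":")]
    node, pairs = fields[0], fields[1:]
    apps = {pairs[i]: int(pairs[i + 1]) for i in range(0, len(pairs), 2)}
    return node, apps, [float(m) for m in metrics.split(":")]
```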

The process for solving the contention manually or automatically is almost the same. It consists in removing an application from the problematic node and launching its processes on a different node with less load. The only difference between the manual and the auto mode is the way the application to be removed is selected. The application is selected by the user in manual mode, while in auto mode the GUI itself selects the application based on a set of rules. The diagram for this process is shown in Figure 32.

Figure 32: Flow diagram of solving contention problem
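The rule set used by the auto mode is not detailed here, so the sketch below assumes one simple, plausible rule: move the application with the most processes on the congested node, so the migration frees the most load.

```python
# Illustrative auto-mode selection rule (an assumption, not the thesis's
# actual rule set): evict the app holding the most processes on the node.
def pick_app_to_move(apps_on_node):
    """apps_on_node: {app name: process count} -> app to relocate."""
    return max(apps_on_node, key=apps_on_node.get)
```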

4.4. Requirements analysis

This section shows the software requirements that have been obtained for the project. They define how the system will work after its implementation. These requirements were defined throughout the whole project, as they changed in different phases as the objectives of the project itself changed. Those shown here are the definitive version. The template that will be used is the following, in Table 1.

Identifier

Title

Description

Priority

Necessity

Table 1: Requirement Definition Template

• Identifier: FR-XX for functional requirements and NFR-XX for non-functional requirements. XX will be the number of the requirement.
• Title: Brief definition of the requirement.
• Description: Explanation of the requirement.
• Priority: Importance of the requirement. Can be high, medium or low.
• Necessity: Relevance for the project. Can be necessary, desired or optional.

4.4.1. Functional requirements

Identifier: FR-01

Title Connection time.

Description The controller has to connect to the GUI on launch, so the GUI should be launched before the controller and wait for connection.

Priority High

Necessity Necessary

Table 2: Functional Requirement FR-01

Identifier: FR-02

Title Applications view.

Description All the applications shall appear on a list in the GUI, so the user can find each of them easily.

Priority Low

Necessity Optional

Table 3: Functional Requirement FR-02

Identifier: FR-03

Title Application identification.

Description Each application shall be individually identified.

Priority High

Necessity Necessary

Table 4: Functional Requirement FR-03


Identifier: FR-04

Title Control processes.

Description The user shall be able to launch and delete processes using the GUI to send the commands.

Priority Medium

Necessity Necessary

Table 5: Functional Requirement FR-04

Identifier: FR-05

Title Application termination.

Description The user shall be able to terminate an application using the GUI to send the command.

Priority High

Necessity Necessary

Table 6: Functional Requirement FR-05

Identifier: FR-06

Title Policies control.

Description The user shall be able to launch different policies for an app using the GUI to send the commands.

Priority Medium

Necessity Necessary

Table 7: Functional Requirement FR-06


Identifier: FR-07

Title Start statistics service.

Description The user shall be able to start the statistics service using the GUI to send the command.

Priority High

Necessity Necessary

Table 8: Functional Requirement FR-07

Identifier: FR-08

Title Stop statistics service.

Description The user shall be able to stop the statistics service using the GUI to send the command.

Priority Low

Necessity Desired

Table 9: Functional Requirement FR-08

Identifier: FR-09

Title Individual metrics.

Description The user shall be able to request individual metrics using the GUI to send the command.

Priority Low

Necessity Optional

Table 10: Functional Requirement FR-09


Identifier: FR-10

Title Data persistence.

Description Metrics received about applications have to be stored persistently.

Priority Low

Necessity Optional

Table 11: Functional Requirement FR-10

Identifier: FR-11

Title Graphs representation.

Description The graphs plotted have to be based on time.

Priority High

Necessity Necessary

Table 12: Functional Requirement FR-11

Identifier: FR-12

Title Metrics choice.

Description The user shall be able to visualize the different metrics of their choice, either together or individually.

Priority High

Necessity Necessary

Table 13: Functional Requirement FR-12


Identifier: FR-13

Title Metrics configuration.

Description The metrics that will be received shall be configurable by the user before launch.

Priority Medium

Necessity Desired

Table 14: Functional Requirement FR-13

Identifier: FR-14

Title Visual alerts.

Description There shall be visual alerts when notifications arrive from the system to the GUI.

Priority Medium

Necessity Desired

Table 15: Functional Requirement FR-14

Identifier: FR-15

Title Contention information.

Description There shall be a section in the GUI to show whether a node has contention.

Priority High

Necessity Necessary

Table 16: Functional Requirement FR-15


Identifier: FR-16

Title Apps in nodes information.

Description The apps running in each node should be visible when receiving a contention alert.

Priority Medium

Necessity Necessary

Table 17: Functional Requirement FR-16

Identifier: FR-17

Title Solve contention manually.

Description There shall be an option to solve contention manually by choosing one of the apps in the node.

Priority High

Necessity Necessary

Table 18: Functional Requirement FR-17

Identifier: FR-18

Title Solve contention automatically.

Description There shall be an option to solve contention for a node automatically.

Priority Medium

Necessity Desired

Table 19: Functional Requirement FR-18


Identifier: FR-19

Title Distributed system.

Description The GUI and FlexMPI shall also work on separate systems.

Priority High

Necessity Necessary

Table 20: Functional Requirement FR-19

4.4.2. Non-Functional Requirements

Identifier: NFR-01

Title FlexMPI integration.

Description The GUI shall work with FlexMPI.

Priority High

Necessity Necessary

Table 21: Non-Functional Requirement NFR-01

Identifier: NFR-02

Title GUI independence.

Description The GUI shall be independent of the rest of the system.

Priority High

Necessity Necessary

Table 22: Non-Functional Requirement NFR-02

Identifier: NFR-03

Title Controller reliability.

Description The controller shall not stop on disconnection with the GUI.

Priority High

Necessity Necessary

Table 23: Non-Functional Requirement NFR-03


Identifier: NFR-04

Title GUI reliability.

Description The GUI shall not stop if there is a problem delivering a message.

Priority High

Necessity Necessary

Table 24: Non-Functional Requirement NFR-04

Identifier: NFR-05

Title GUI buttons.

Description The GUI has to offer buttons for all the functionality.

Priority High

Necessity Necessary

Table 25: Non-Functional Requirement NFR-05

4.5. System design

Being a complex system with different components, there are different parts to describe. In this section the design of the different components and the system itself will be defined.

4.5.1. Container classes

A system running FlexMPI can be divided into nodes, applications and processes. These three different units are each modelled with a different container class. The classes are named containers because they are developed to contain the information about each entity. These three objects are related to each other as shown in Figure 33. The processes are related to the applications with a many-to-one relationship, while the applications and the nodes have a many-to-many relationship, where an app can be running on several nodes and a node can host more than one app at the same time. Besides the description of each class, a class diagram describing the different containers and the relationships between them can be found in Figure 34.

Figure 33: Relationship between nodes, apps and processes

The most basic container class is the process, which is inherited from the first phase of the project. Originally, the metrics for each application arrived per process, and they were stored individually in order to plot them separately. When starting the second phase, the metrics were changed to arrive aggregated by application, which helped to reduce the network usage by sending shorter messages.

To adapt the GUI to the new system, the simplest option was to keep things basically as they were implemented, keeping the process container class. The difference regarding the GUI is that each app appears to have just one process, although the actual number of processes is still available with this change. Keeping the design this way allows sending data per process again, if desired, with little change to the system. The process container stores the different metrics that arrive for an application, and it is where the GUI extracts the metrics to plot them in the chart, using a thread-safe buffer. This class also has the responsibility of writing to disk the metrics of the process it is related to, when that functionality is active. The processes are related to their parent application. The application container holds the different processes that belong to each application, as well as all the information about the app. The information about an application includes the application name and ID, the names of the metrics that are being monitored for this application, the sockets used to communicate with the controller, and the buttons and labels used to send the application commands and show the information.

Figure 34: Class diagrams of container classes


The responsibility of this class is to manage all the necessary aspects of each application. For that reason, the status of the application is also managed, such as whether the statistics service has been activated or whether the application has been terminated. The single buffer where the unprocessed metrics are stored also belongs to the application container class. The last container class is the node container, which was created to integrate the GUI with the new contention monitoring functionality. It was added as part of the second phase of development, since the information about the nodes was not necessary before that phase. Being the newest, this class is very simple. It just stores the node name, and the data about the status of the node when the last contention problem notification arrived. Together with the metrics, the applications that were running in the node at the time of the notification are also stored.

4.5.2. Buttons handlers and threads

The nodes, applications and processes are managed by several classes. The system can be divided into thread classes and button handler classes. Thread classes are responsible for managing the logic related to each one of the implemented functionalities, while button handler classes manage the events when buttons are clicked. In addition, both threads and handlers encapsulate the different GUI layouts. These layouts can be called when clicking a button that opens a new window, or when a new message arrives and has to be plotted or shown to the user. A diagram stating the relationship between all the components can be found following this description in Figure 35. The whole program is built around the MonitorParent class, which is in the centre of the diagram. This class executes the main function and is the entry point of the application. All the configurable values are located in this class, and it is responsible for managing the main window of the GUI, with the applications list. These applications are managed throughout the code using a hash table. This hash table uses a hash of the application name as the key and stores ApplicationContainer objects. The choice of a hash table was based on its higher efficiency when both saving and searching for an application, which is O(1). Once the main window is initialized, the responsibility of the MonitorParent class is to wait for messages from the controller. The messages that arrive here can be for the initial registration of the controller, for the registration of an application or for the registration of the contention monitor. When the controller is registered, the data received is simply stored and there are no further actions involved. When the contention monitor is registered, the nodes are parsed, and a new thread is initialized to listen for contention messages. Each parsed node involves the initialization of a NodeContainer instance.
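In Python terms, the registry described above behaves like a dictionary keyed by the hash of the application name; ApplicationContainer is reduced to a stub for illustration:

```python
# Sketch of the application registry: a dict keyed by the hash of the
# application name, giving O(1) insertion and lookup as in the design above.
class ApplicationContainer:
    """Illustrative stub of the real container class."""
    def __init__(self, name, app_id):
        self.name, self.app_id = name, app_id

apps = {}

def register(app):
    apps[hash(app.name)] = app        # O(1) insert keyed by name hash

def lookup(name):
    return apps.get(hash(name))       # O(1) search, None if unknown
```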
Lastly, receiving an application message involves the initialization of an ApplicationContainer instance that will be displayed and stored in the hash table. Along with the object, two threads are created but not started, StatCollector and BuffReader (on the top right of the diagram), which begin their execution when the statistics service is activated and which receive the app metrics and extract and parse them. Another thread class exists, ProcessReader. Although deprecated, since metrics do not arrive per process in the final version, this thread was the one populating in real time each one of the charts that were open at any moment. It still runs, but only once per application, since it is simpler to keep the original design and it allows obtaining per-process information again in future versions if desired. Regarding the buttons, there are five different classes that manage the usage of these buttons, and all the related classes are located in the bottom half of the diagram. The first

one is AppButtonHandler, which holds the functionality of showing the application commands window. There is one instance of this class for each application. Another object that appears once per application is CommandsButtonHandler, which is the class that executes all the different instructions to send the commands for an application to the controller. ProcessButtonHandler, while originally appearing once per application process, is currently also initialized once for each application, and it is the class that launches the charting window and initializes the thread that plots the metrics in the chart. Additionally, there are two button handlers related to the node contention: NodeButtonHandler and SolveNodeConflictButtonHandler. The former opens the window to see the information about a specific node, and the latter executes the code that sends the command to the controller to automatically solve the contention problem. A class diagram of all the classes mentioned in this section can be found in Figure 35.

Figure 35: Threads and handlers class diagram

4.5.3. GUI views design

The GUI has different sections that show the distinct information and options to the user. In this section each window will be explained and shown with an example. It is important to note that the data shown in the examples should not be taken into consideration, as it has been sent only with the purpose of simulating the real messages and showing the windows. Each one of the sections is found in a different panel, which can be shown or hidden. This allows the user to organize the elements on the screen as desired, as well as to hide the unnecessary ones. There is just one element that is always visible, which is the main view. An example of this panel is shown in Figure 36. The primary section of this panel is the view of the different applications that are registered. They are identified with their name and the application ID, in case there is more than one with the same name.

Figure 36: Main GUI panel

For each application, there are three different buttons. These allow to open the commands panel and the metrics visualization window for that app. The last button is used to request extra metrics information out of the normal sending interval.


The last components of the main view of the GUI are a small text panel at the bottom to print information such as new applications, and the left button that opens the node contention view. The node view is very simple, and it can be seen in Figure 37. It simply creates a grid-like layout where all nodes are shown as buttons. Each button is labelled with the name of the node, and the order remains unchanged during the whole execution, to simplify locating each node if necessary. As it is possible to see in the figure, the labels of the nodes can appear in different colours. When a node appears in red, it means that a contention problem has appeared, and that it has not been solved yet. This way the user is notified about a node when it requires attention, and it is easier to visually detect possible issues in the cluster nodes.

Figure 37: GUI nodes grid view

When a contention message is received, its values are parsed to be included in the contention message view. This window is opened by clicking on one of the node buttons and is shown in Figure 38.


Figure 38: Node information view

If there is no information about the node, the window simply shows a message stating that the node has not had problems yet. If there is previous data about the node, the information in this view is divided into two sections. The first shows the metrics sent from the controller to the GUI, along with the time at which these metrics were received. The second shows the applications running on the node, each with two buttons that open the application metrics visualization window or move the application to a different node to solve the problem. The window to visualize the metrics of an application is also simple, with just the necessary components. As seen in Figure 39, its main part is the chart where the metrics are plotted. The X axis of this chart is always time, and the Y axis is the value of the metrics. The scale of the Y axis changes depending on the values being plotted, and the metrics are updated in real time.

Figure 39: Application metrics visualization window in GUI

The other section of this window is the metric selector on the left side. All the metrics received appear there, each with a checkbox. By checking and unchecking them, the user can change which metrics are shown on the chart. This keeps the view clean when there are many metrics, and allows visualizing only the necessary ones at any moment. It also helps because different metrics do not share the same scale, so metrics with small values must be plotted alone to be readable. In order to see the metrics, the command activating the statistics service must be sent first. This command can be sent from the per-application command panel, where the user can access all the possible commands for that application. This window, shown in Figure 40, is also easy to understand. It shows a button for each supported command, with text boxes and selectors to insert the input when a command requires it. On a button press, the command is sent directly to the controller after verifying the parameters.

Figure 40: Application commands view
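The per-metric checkboxes described above can be wired as in this illustrative sketch (names are hypothetical, not from the actual codebase): the chart would query the selection model before replotting, so unchecked metrics simply stop being drawn.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import javax.swing.JCheckBox;

// Illustrative sketch of the metric selector logic: the chart only
// plots metrics whose checkbox is currently selected.
public class MetricSelector {
    private final Set<String> visibleMetrics = new LinkedHashSet<>();

    // Builds one checkbox per metric; toggling it updates the visible set.
    public JCheckBox checkBoxFor(String metric) {
        JCheckBox box = new JCheckBox(metric, true);
        visibleMetrics.add(metric);
        box.addItemListener(e -> {
            if (box.isSelected()) {
                visibleMetrics.add(metric);
            } else {
                visibleMetrics.remove(metric);
            }
        });
        return box;
    }

    // The chart component would query this before replotting a series.
    public boolean isVisible(String metric) {
        return visibleMetrics.contains(metric);
    }
}
```

Because Swing fires `ItemListener` events on both user clicks and programmatic state changes, the visible set stays consistent however the checkbox is toggled.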

4.6. Project planning

In this section, the development of the project is briefly described, stating the two main phases in which it was carried out, together with the Gantt charts representing the planning of each phase. This type of chart was used because it is "one of the most widely used management tools for project scheduling and control" [46]. The first phase of the project corresponds to the work done during a three-month research grant at the Carlos III University. This grant had the objective of developing a GUI to work with the then-existing version of FlexMPI. The second phase took place three years later, with the objective of adapting the GUI to the changes made to the system in the meantime, as well as adding further functionality both to the GUI and to the controller.

4.6.1. Phase 1

This phase was the first one, carried out as part of the research grant. Given the initial lack of familiarity with the system, the first tasks were aimed at learning the existing codebase and planning the whole development. After that, the development consisted of three sprints of different lengths, according to the complexity of each task. The last part of this phase consisted of testing, both functional and performance. As a research grant, this phase of the project was defined with clear goals. The aim was to have a functional GUI to control the applications running on FlexMPI, as well as to visualize the metrics of each application. These metrics were received per process, showing different graphs for each of the processes the application was running. Another functionality included during this phase was storing the application statistics to disk, keeping the data persistently so that it can be analysed after execution. The last goal of this phase was to choose the technologies to be used for the development of the project. Although some technologies were already predefined, several alternatives had to be evaluated for different aspects of the work in order to choose the most suitable ones. Regarding the methodology applied in this phase, it was based on agile methods. Applying agile methodologies encourages a more flexible and rapid approach to software development, making it faster and less prone to rework, since problems are detected in earlier phases. During this part of the project, the work was done on site, at a workstation provided by the University in an office. That organization made it possible to easily

have daily communication with the tutor. For that reason, during this period the development was divided into short sprints, each with a specific objective decided in advance to keep the work focused on that goal. In addition to the sprints, daily meetings were scheduled to review whether the work done the previous day was correct. This practice speeds up development and ensures correct work, as daily meetings and reviews make it easier to detect possible problems; the sooner problems are detected, the faster they can be solved.

Figure 41: Gantt chart for first part of Phase 1

Some small testing periods were also included at the end of sprints 1 and 3, to reduce the number of problems dragged into later stages of the development. The Gantt chart has been split in two parts to make it easier to view; the parts can be seen in Figure 41 and Figure 42.

Figure 42: Gantt chart for second part of Phase 1

4.6.2. Phase 2

Although Phase 2 was developed over a longer period of time, it consists of fewer parts, as the new functionalities were specific and clearly defined beforehand. Being developed years after the first phase, the GUI had to catch up with the several changes made to the rest of the system in the meantime, in addition to gaining some extra functionality. FlexMPI had remained under development, so while the result of the first phase was a fully functional GUI integrated with the system, FlexMPI continued to evolve afterwards. For that reason, the first part of this phase consisted of updating the GUI to work with the latest version of FlexMPI, adapting it to the new logic and the new communication protocol. As an example of these changes, the GUI was set up to receive metrics aggregated per application instead of per process. Aside from these updates, the main change in the system was the addition of a monitor detecting contention in the nodes of the cluster. This made it possible to also add contention information to the GUI, in order to visualize the performance problems happening in each node. In addition to the visualization options, the other goal was to use this information to solve the problems that appear throughout execution. These problems can be solved either manually by the user or automatically by a simple algorithm, which was also developed during this phase. Agile methodologies were applied in this phase as well. This time the work was not developed at the university but remotely, so daily meetings were not an option.
Instead, meetings with the tutor were scheduled weekly, with additional online communications. The second part of this phase involved adding the contention monitor to the GUI. This functionality was developed and integrated with the rest of the system after the first part was finished, so an upgrade was necessary to integrate it into the GUI. The charts for this phase are shown in Figure 43 and Figure 44.


Figure 43: Gantt chart for first part of Phase 2

Figure 44: Gantt chart for second part of Phase 2

4.7. Budget

In this section, the budget for the project is described. The cost sources are staff, hardware and software. The cost of using the cluster is not considered, as its usage is negligible compared to the total usage by the university.

4.7.1. Staff costs

The work was developed by a team of two members: David Expósito Singh, tutor supervising this final degree project, and Federico Goldfryd Sprukt, the student in charge of the design, planning and development of the project. The salary of each member was retrieved from Glassdoor [47]: for the tutor, that of a senior software engineer, and for the student, that of a junior software engineer. These annual salaries can easily be converted to hourly salaries by dividing by the total number of hours worked in a year. Once the hourly rate is calculated, the Social Security contributions the university has to pay [48] must be added to obtain the total cost of an hour of work. This contribution is 23.6% of the salary. The different amounts, from the base salary to the hourly cost including taxes, are shown in Table 26.

Staff                     Role     Annual salary   Hourly salary   Social security   Hourly cost (salary + taxes)
David Expósito Singh      Tutor    42,143 €        25 €            5.87 €            30.88 €
Federico Goldfryd Sprukt  Student  23,077 €        15 €            3.52 €            18.53 €

Table 26: Salaries description

The work of this project was carried out in two phases, the first spanning 70 working days and the second 86 working days. In the first phase, the tutor worked 2 hours per week and the student 4 hours per day. In the second, the tutor worked 1 hour per week and the student 2 hours per day. Taking into account the cost per hour of each team member, the total cost is shown in Table 27.

Staff                     Phase 1 hours   Phase 1 cost   Phase 2 hours   Phase 2 cost   Total hours   Total cost
David Expósito Singh      28 h            864.64 €       17 h            524.96 €       45 h          1,389.60 €
Federico Goldfryd Sprukt  280 h           5,188.40 €     172 h           3,187.16 €     452 h         8,375.56 €
Total                     308 h           6,053.04 €     189 h           3,712.12 €     497 h         9,765.16 €

Table 27: Total costs description
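The figures in Table 27 follow directly from multiplying hours worked by the hourly costs of Table 26; a short ad hoc snippet (the helper is illustrative, not part of the project code) can verify the arithmetic:

```java
// Verifies the staff-cost arithmetic: hours worked times the hourly
// cost (salary + taxes) from Table 26.
public class StaffCosts {
    static double cost(int hours, double hourlyCost) {
        return hours * hourlyCost;
    }

    public static void main(String[] args) {
        double phase1 = cost(28, 30.88) + cost(280, 18.53); // tutor + student, Phase 1
        double phase2 = cost(17, 30.88) + cost(172, 18.53); // tutor + student, Phase 2
        System.out.printf("Phase 1: %.2f €, Phase 2: %.2f €, total: %.2f €%n",
                phase1, phase2, phase1 + phase2);
    }
}
```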

4.7.2. Hardware costs

This section covers the costs of the hardware devices used in the development of the project. Each computer has an amortization period of 8 years, and the cost charged is proportional to the time it was used for this project. The information about the devices and their use is described in Table 28.

Product                        Price    Monthly cost   Months used   Total cost
University office workstation  1000 €   10.42 €        4 months      41.68 €
MacBook Pro 2019               1550 €   16.15 €        4 months      64.60 €
Asus GL552-VW                  699 €    7.3 €          8 months      58.4 €
Total hardware cost                                                  164.68 €

Table 28: Hardware costs breakdown
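The amortization rule above (the price spread over 8 years of monthly instalments, charged only for the months of use) can be sketched as follows; the helper names are ad hoc, not from the project:

```java
// Illustrative amortization helper: monthly cost = price / (years * 12),
// rounded to cents, then multiplied by the months of use.
public class Amortization {
    static double monthlyCost(double price, int years) {
        return Math.round(price / (years * 12) * 100.0) / 100.0;
    }

    static double chargedCost(double price, int years, int monthsUsed) {
        return Math.round(monthlyCost(price, years) * monthsUsed * 100.0) / 100.0;
    }

    public static void main(String[] args) {
        // 1000 € workstation, 8-year amortization, used 4 months
        System.out.printf("%.2f €%n", chargedCost(1000, 8, 4));
    }
}
```

The same helper with a 5-year period reproduces the software figures of the next section.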

4.7.3. Software costs

The software costs correspond to the licenses paid in order to use the different programs, including operating systems, development environments and any other kind of application. These costs are calculated using an amortization period of 5 years, counting the proportional time of usage. The information regarding these costs can be found in Table 29.

Product                        Price   Monthly cost   Months used   Total cost
Microsoft Windows 10 Pro       259 €   4.32 €         4 months      17.28 €
Microsoft Office Professional  579 €   9.65 €         6 months      57.9 €
Ubuntu 16.04                   0 €     0 €            8 months      0 €
Eclipse                        0 €     0 €            8 months      0 €
Visual Studio Code             0 €     0 €            8 months      0 €
VirtualBox                     0 €     0 €            4 months      0 €
Total software cost                                                 75.18 €

Table 29: Software costs breakdown

4.7.4. Total budget

The economic benefits of the project will come from its future commercial application, so for now a low benefit margin of 15% is considered. The risk is set at 6%, given that the project is directed by the Computer Architecture Department of the Polytechnic School of the University Carlos III of Madrid. The total budget, including taxes, is defined in the following table.

Concept               Cost
Staff                 9,765.16 €
Hardware              164.68 €
Software              75.18 €
Subtotal              10,005.02 €
Risk (6%)             600.30 €
Benefit (15%)         1,500.75 €
Total before V.A.T.   12,106.07 €
V.A.T. (21%)          2,542.27 €
Total                 14,648.34 €

To conclude, the total budget of the project amounts to FOURTEEN THOUSAND SIX HUNDRED AND FORTY-EIGHT EUROS AND THIRTY-FOUR CENTS: 14,648.34 € (V.A.T. included).
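The budget chain (risk and benefit computed over the subtotal of the three concept costs, then V.A.T. applied to the result) can be recomputed with a short snippet; the cent-rounding helper is ad hoc, not part of any project code:

```java
// Recomputes the budget chain from the three concept costs.
public class Budget {
    // Ad hoc helper: round a monetary amount to euro cents.
    static double round2(double v) { return Math.round(v * 100.0) / 100.0; }

    public static void main(String[] args) {
        double subtotal = round2(9765.16 + 164.68 + 75.18); // staff + hardware + software
        double risk = round2(subtotal * 0.06);
        double benefit = round2(subtotal * 0.15);
        double beforeVat = round2(subtotal + risk + benefit);
        double vat = round2(beforeVat * 0.21);
        System.out.printf("Subtotal: %.2f €, total: %.2f €%n",
                subtotal, round2(beforeVat + vat));
    }
}
```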

5. Evaluation

This section describes the testing performed to evaluate the execution of the system and the fulfilment of the requirements. The tests are divided into functional tests and performance tests. The functional tests check the functional requirements, ensuring that all the logic of the application executes as required. The performance tests check that the system executes within the accepted constraints, both in execution time and in the amount of resources, such as memory, that it uses.

5.1. Platform description

This subsection describes the platform used for testing, so that the results can be better understood and reproduced if necessary. Different hardware was used. For the functional tests, the controller ran on an Asus GL557VW and the GUI both on that laptop and on a MacBook Pro, to test network communications. For the performance tests, the controller and the applications were executed on the Tucan cluster to obtain more precise results, while the GUI was executed on both laptops. In the cluster, two nodes were used, in addition to the frontend node, which is necessary to communicate with systems outside the cluster itself. Both nodes have the same specifications: an Intel Xeon E5405 CPU with 8 cores running at 2.00 GHz, 8 GB of memory and 2 TB of HDD storage. The Asus GL557VW has an Intel Core i5-6300HQ CPU with 4 cores running at 2.30 GHz, 8 GB of memory and a 128 GB SSD. The MacBook Pro has an Intel Core i5-7360U CPU with 2 cores running at 2.3 GHz, 8 GB of memory and a 128 GB SSD. Regarding network communications, all connections between the computers used Wi-Fi N, with a 100 Mb/s connection for traffic outside the local network. The connection between the laptops and Tucan was handled through a VPN in order to access the network of the cluster: Tunnelblick 3.8.0 on the MacBook Pro, and the system's built-in tool on the Asus GL557VW. As for operating systems, the MacBook Pro ran macOS 10.14.5, the Asus GL557VW ran Ubuntu 18.04.3 LTS, and Tucan ran Ubuntu 16.04 LTS. Finally, the Java Runtime Environment used to execute the GUI was OpenJDK 11.0.4.

5.2. Traceability matrix

To easily track that each requirement is covered by at least one test, a traceability matrix is used. The matrix has a row for each test and a column for each requirement, identified by their corresponding identifiers. An X marks each cell where a test covers the corresponding requirement. If the tests have enough coverage, no column should be left without at least one X, meaning that every requirement has at least one test checking its implementation. The traceability matrix for this project is shown in Figure 45.

Figure 45: Traceability Matrix
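The coverage property the matrix encodes, no requirement column without an X, can be expressed as a small check; the boolean matrix below is a made-up example, not the one in Figure 45:

```java
// Checks the traceability-matrix coverage property: every requirement
// column must be covered by at least one test row.
public class Traceability {
    // matrix[test][requirement] == true means that test covers it.
    static boolean allRequirementsCovered(boolean[][] matrix) {
        int requirements = matrix[0].length;
        for (int r = 0; r < requirements; r++) {
            boolean covered = false;
            for (boolean[] testRow : matrix) {
                covered |= testRow[r];
            }
            if (!covered) {
                return false; // a requirement has no test checking it
            }
        }
        return true;
    }
}
```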

5.3. Functional tests

Functional tests are designed to ensure the implementation of all the defined functional requirements. Each requirement should be covered by at least one test, whose objective is to check whether or not the program fulfils the related requirements. Each functional test is defined using a table; the template used for these tests can be seen in Table 30.

Identifier

Description

Objective

Related requirements

Expected result

Obtained result

Table 30: Functional Test definition template

• Identifier: FT-XX, where XX is the unique number of the test.
• Description: Brief definition of the test.
• Objective: Explanation of what the test checks.
• Related requirements: Requirements that are proven by the test.
• Expected result: Behaviour the program is expected to show.
• Obtained result: Behaviour actually observed when executing the test.

FT-01

Description GUI initialization.

Objective Check that the GUI executes correctly without the controller running.

Related requirements FR-01

Expected result The GUI launches correctly.

Obtained result The GUI opens and shows the main window.

Table 31: Functional Test 01

FT-02

Description Launch the controller with the GUI open.

Objective Check that the controller connects to the GUI.

Related requirements FR-01

Expected result The controller starts execution and connects to the GUI.

Obtained result The controller is launched, and the GUI shows a new connection.

Table 32: Functional Test 02

FT-03

Description Starting applications.

Objective Check that the GUI receives the information about the new application and shows it in the list of the main window.

Related requirements FR-02

Expected result The application appears in the main window.

Obtained result The GUI receives the new application message and shows it in the main window list.

Table 33: Functional Test 03

FT-04

Description Showing several applications.

Objective Check that the GUI shows all the applications in the list, enabling scrolling when there are too many.

Related requirements FR-02

Expected result All the applications started are displayed in the window and the scroll is activated when they do not fit the size.

Obtained result All the applications are displayed in the main window list. When there are more applications, there is a scroll option.

Table 34: Functional Test 04

FT-05

Description Applications are uniquely identified.

Objective Check that all applications are uniquely identified, even the ones with the same name.

Related requirements FR-03

Expected result Application names are joined with their individual IDs to identify them uniquely.

Obtained result Applications appear with their ID attached, so when the name is the same the IDs are still different.

Table 35: Functional Test 05

FT-06

Description Remove processes.

Objective Check that the process removal command works.

Related requirements FR-04

Expected result After sending the command, the process is removed by the controller.

Obtained result The controller receives the command and reduces the number of processes.

Table 36: Functional Test 06


FT-07

Description Create processes.

Objective Check that the process addition command works.

Related requirements FR-04

Expected result After sending the command, the process is created by the controller.

Obtained result The command is received and the process spawned.

Table 37: Functional Test 07

FT-08

Description Stop applications.

Objective Check that the application stopping command works.

Related requirements FR-05

Expected result After sending the command, the controller stops the app execution.

Obtained result The controller receives the message and the application stops execution.

Table 38: Functional Test 08

FT-09

Description Change policy.

Objective Check that the policy command works.

Related requirements FR-06

Expected result After sending the command, the controller changes the execution policy.

Obtained result The policy is changed after receiving the message.

Table 39: Functional Test 09

FT-10

Description Start statistics service in GUI.

Objective Check that the GUI executes the necessary actions when statistics are started.

Related requirements FR-07

Expected result The statistics button shows that the service is active, and the necessary components are initialized.

Obtained result The statistics button shows they are active, and the visualization window opens.

Table 40: Functional Test 10

FT-11

Description Start statistics service in the controller.

Objective Check that starting the statistics service sends the command correctly to the controller.

Related requirements FR-07

Expected result After clicking the button, the controller receives the command and starts sending the metrics.

Obtained result The controller receives the message and starts the service.

Table 41: Functional Test 11

FT-12

Description Statistics reception.

Objective Check that the GUI receives and processes the metrics correctly.

Related requirements FR-07

Expected result When the statistics service is active, the GUI receives the data and parses it.

Obtained result The metrics received are displayed and the chart is populated with the last metrics.

Table 42: Functional Test 12


FT-13

Description Stop statistics in GUI.

Objective Check that when deactivating the statistics, the GUI executes the necessary actions.

Related requirements FR-08

Expected result The statistics button shows it is disabled.

Obtained result The GUI shows that the statistics service is disabled for the application.

Table 43: Functional Test 13

FT-14

Description Stop statistics in controller.

Objective Check that the controller receives the command and processes it properly.

Related requirements FR-08

Expected result The controller receives the command and stops sending metrics information.

Obtained result The controller stops sending statistics.

Table 44: Functional Test 14


FT-15

Description Retrieve individual metrics out of schedule.

Objective Check that the functionality of requesting metrics outside the scheduled interval works.

Related requirements FR-09

Expected result When the button is clicked, the controller sends the metrics for the application directly, and they are received and processed by the GUI correctly.

Obtained result The controller receives the message and sends the metrics instantly even when the service is disabled. The metrics are received and processed by the GUI.

Table 45: Functional Test 15

FT-16

Description Persistent storage of metrics.

Objective Check that the GUI stores in a file the data received.

Related requirements FR-10

Expected result When the buffer is full, the metrics should be dumped to a file that will be created in case it is missing.

Obtained result A file uniquely identified is created, and the metrics are written inside.

Table 46: Functional Test 16

FT-17

Description Metrics plotted in chart.

Objective Check that the statistics are plotted properly and based on time.

Related requirements FR-11

Expected result When the statistics service is active, the chart for the application should show the metrics received in a time-based chart.

Obtained result The chart shows the metrics that have arrived, with the time of each measurement.

Table 47: Functional Test 17

FT-18

Description Disable plotting a specific metric.

Objective Check that it is possible to remove a metric from the chart.

Related requirements FR-12

Expected result When unchecking the corresponding checkbox, the values of the metric stop being displayed in the chart.

Obtained result When the metric is unchecked, it stops being displayed on the chart.

Table 48: Functional Test 18

FT-19

Description Enable plotting a specific metric.

Objective Check that it is possible to add a metric to the chart.

Related requirements FR-12

Expected result When checking the corresponding checkbox, the values of the metric start being displayed in the chart.

Obtained result When the metric is checked again, it is instantly displayed in the chart.

Table 49: Functional Test 19


FT-20

Description Metrics configuration.

Objective Check that the metrics to be received can be changed using the XML file.

Related requirements FR-13

Expected result The metrics shown are the ones described in the XML.

Obtained result The metrics shown in the chart correspond with the ones in the XML. When the XML is modified and the GUI relaunched, the metrics change and keep matching the file.

Table 50: Functional Test 20

FT-21

Description Contention alerts processing.

Objective Check that the GUI receives and processes contention alerts correctly.

Related requirements FR-14, FR-15

Expected result When the contention message is received, the corresponding node button changes to red.

Obtained result The contention message arrives, and the corresponding node changes to red.

Table 51: Functional Test 21

FT-22

Description Contention information display.

Objective Check that the correct contention information about the node is displayed in the information window.

Related requirements FR-15, FR-16

Expected result When a contention message is received, the node values and applications are shown in the node window.

Obtained result When clicking the node button, the window with the node information shows the contents of the last message.

Table 52: Functional Test 22

FT-23

Description Manually fix contention choosing the application.

Objective Check that the move application button works, changing the application processes to a different node.

Related requirements FR-17

Expected result When the button is clicked, the GUI sends the controller the corresponding messages, creating the necessary processes in a new node and removing them from the current one.

Obtained result The GUI sends two messages, to create and remove the same amount of processes in different nodes for the selected application. The controller receives the messages and does both actions.

Table 53: Functional Test 23


FT-24

Description Manually fix contention using the auto solve button.

Objective Check that clicking the auto solve button moves the application with the lowest number of processes to a different node.

Related requirements FR-18

Expected result When the button is clicked, the GUI sends the messages to move the application with the fewest processes from that node to another one.

Obtained result The GUI sends two messages, to create and remove the same amount of processes in different nodes for the application with the lowest number of processes. The controller receives the messages and does both actions.

Table 54: Functional Test 24

FT-25

Description Activate automatic contention management for a node.

Objective Check that the GUI activates the automatic contention management for a node.

Related requirements FR-18

Expected result When the button is clicked and automatic management is enabled, the node button appears in green to show that it is being automatically managed.

Obtained result The node colour turns green, showing that it is being automatically managed.

Table 55: Functional Test 25

FT-26

Description Deactivate automatic contention management for a node.

Objective Check that the GUI deactivates the automatic contention management for a node.

Related requirements FR-18

Expected result When the button is clicked while automatic management is enabled, the node button turns black again to show that it is no longer being automatically managed.

Obtained result The node colour returns to black, showing that automatic management has been deactivated.

Table 56: Functional Test 26

FT-27

Description Automatic management for a node.

Objective Check that the automatic management of a node works correctly when a contention message arrives.

Related requirements FR-18

Expected result When the GUI receives a contention message for a node that is automatically managed, there is no notification; the application with the fewest processes is moved to a different node by sending the command to the controller.

Obtained result The message is processed and an application is moved to another node without any notification.

Table 57: Functional Test 27

FT-28

Description Connection from different computers.

Objective Check that the GUI and the controller can communicate when executed in different systems.

Related requirements FR-19

Expected result When the GUI and the controller are launched in different systems with the IP address properly set, they should connect and work the same as in the same computer.

Obtained result When the controller is launched, the GUI receives the connection message and shows it.

Table 58: Functional Test 28

5.4. Performance tests

These tests evaluate the performance of the system, regarding both the GUI and the controller, which are the components of the system targeted by this project. The measurements involve memory and CPU usage for both parts. To measure these values, different numbers of applications are launched, starting with 1 and doubling the amount up to 16 simultaneous applications. Resource usage is measured for each component without any load, to find the minimum usage, and again with the applications running, with the statistics service both enabled and disabled. The performance of the applications is also evaluated: one instance of the same application is launched several times, each time with a different number of processes, in the range between 1 and 16 following the same doubling pattern. The measurement in this case is the execution time of the application, giving information about how the number of processes affects it. All these tests were carried out using 2 computing nodes of the Tucan cluster, so that the metrics are closer to a real-life scenario.

5.4.1. Controller performance tests

The controller was tested running on Tucan, using htop [49] to capture the memory usage of the application in KB and its CPU usage as a percentage. Being written in C with a focus on performance, good values were expected. The measurements were taken with the statistics service both on and off, which changed the results. Several measurements were taken for each scenario and then averaged. Regarding CPU, the usage always stayed at 0%, which gives no relevant information. That is probably related to the efficiency of the program, and also to the fact that it was executed on the cluster, which is more powerful than a normal computer. Memory usage shows more variation depending on the number of applications executed. Right after launch, the base memory consumption was 2000 KB. The highest memory usage, measured when running 16 applications with the statistics active, was 3328 KB. A graph showing the variation of the results can be found in Figure 46.

[Bar chart: controller memory usage in KB, with statistics off and on, for 0, 1, 2, 4, 8 and 16 applications; values range from the 2000 KB baseline to a 3328 KB maximum.]

Figure 46: Evolution of memory usage of the controller

Looking at the graph, the beginning of a superlinear trend in memory consumption can be observed. Using quadratic regression, the predicted number of applications needed to reach a usage of 1 GB would be 611. This number is far above real needs, since the number of applications executed at the same time in an HPC cluster never gets that high.
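The extrapolation idea can be sketched as follows: fit the quadratic that passes exactly through three measured points and evaluate it at larger application counts. The sample points below are hypothetical, not the actual measurements, and the class name is ad hoc:

```java
// Illustrative sketch of quadratic extrapolation: Lagrange interpolation
// through three (applications, memory) points, evaluated further out.
public class QuadraticFit {
    // Returns p(x) for the unique quadratic through the three points.
    static double interpolate(double[] xs, double[] ys, double x) {
        double result = 0.0;
        for (int i = 0; i < 3; i++) {
            double term = ys[i];
            for (int j = 0; j < 3; j++) {
                if (j != i) {
                    term *= (x - xs[j]) / (xs[i] - xs[j]);
                }
            }
            result += term;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] apps = {1, 4, 16};          // hypothetical sample sizes
        double[] memKb = {2300, 2360, 3300}; // hypothetical memory readings
        // Extrapolate the fitted curve to 64 simultaneous applications.
        System.out.printf("%.0f KB%n", interpolate(apps, memKb, 64));
    }
}
```

A least-squares fit over all measured points, as used in the text, would be more robust than an exact three-point fit, but the extrapolation step is the same.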

5.4.2. GUI performance tests

Being written in Java, the GUI is not expected to match the performance of the controller, as this language is not as efficient as C. Another reason for being less memory efficient is the graphical interface itself, which generally uses more memory than a terminal application. This component of the system was executed on the MacBook Pro, using the system activity monitor to measure the results. Regarding CPU, consumption was still almost negligible, but in this case some values were higher than 0%, although usage never rose above 1% during the tests. The usage values are shown in Figure 47.

[Bar chart: CPU utilization in % with the statistics service off and on, for 0, 1, 2, 4, 8 and 16 applications; values range from 0.1% to 0.8%.]

Figure 47: Evolution of the CPU usage of the GUI

The memory consumption of the GUI was higher than in the controller tests. The base memory usage, when no application was registered, was 100 MB, an order of magnitude higher than the controller. The maximum memory usage registered was 450 MB, with 16 applications running and the statistics service active for all of them. The graph with the memory usage can be seen in Figure 48. The high memory consumption when the statistics are active corresponds to the usage when the windows that plot the statistics are open. Each open window uses between 25 MB and 30 MB of memory, which is released back to the system over time after the window is closed. This is not easy to control, since it is not possible to know exactly when the memory used by a closed window is released; the option taken has been to measure the consumption with all the windows open, although when the number of applications grows, not all the windows will be observed at the same time.

[Bar chart: memory usage in MB with the statistics service off and on, for 0, 1, 2, 4, 8 and 16 applications; values range from 100 MB to 450 MB.]

Figure 48: Evolution of the memory usage of the GUI

Using quadratic regression to predict the consumption with more applications, the results are that it is possible to execute between 35 and 59 applications using 1 GB of memory, and between 61 and 87 using 2 GB. Given that the GUI will be running on a normal computer, that consumption should not be problematic, and even the lowest estimate of applications that can run with 1 GB of memory, with all the statistics on and the windows open, should be enough for the great majority of use cases.

5.4.3. Applications performance tests

To test the performance of the applications, the program used with FlexMPI was Jacobi. Jacobi is an iterative application used for FlexMPI testing purposes that is designed to be CPU intensive, which makes it a good option to test parallelization. The execution time was measured using the command line utility time, which outputs the time taken by a command when it finishes. Starting with just a single process, the time taken to finish was 8 minutes and 55 seconds. When the number of processes is set to 16, the execution time is reduced to just 47 seconds. The evolution is very close to directly proportional and can be observed in Figure 49.

[Chart: execution time (minutes:seconds) for 1, 2, 4, 8 and 16 processes: 8:55, 4:37, 2:41, 1:47 and 0:47.]

Figure 49: Evolution of the execution time for an application
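From these measurements, the standard speedup and parallel-efficiency metrics can be derived directly; the short sketch below computes them from the two reported times (8:55 = 535 s with 1 process, 47 s with 16 processes).

```java
// Speedup S = T1/Tp and parallel efficiency E = S/p, computed from the
// measured Jacobi times reported in the text.
public class Speedup {
    static double speedup(double t1, double tp) { return t1 / tp; }
    static double efficiency(double t1, double tp, int p) { return speedup(t1, tp) / p; }

    public static void main(String[] args) {
        // 535 s / 47 s ≈ 11.4x speedup, ≈ 0.71 efficiency on 16 processes
        System.out.printf("speedup=%.1f efficiency=%.2f%n",
                speedup(535, 47), efficiency(535, 47, 16));
    }
}
```

An efficiency around 0.71 at 16 processes is consistent with the "very close to directly proportional" behaviour described above.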

6. Conclusions and Future Work

This section states the conclusions after the development of the project, and how the different goals have been achieved. It also describes the different ways in which the system could be improved in the future, and which of them may be more relevant.

6.1. Conclusions

The main goal of this project was to develop an application with a graphical user interface to allow the visualization of data and the control of an existing system, integrating the GUI with FlexMPI. That is the result of the development carried out during this project, which consisted of implementing the GUI application completely, but also making the necessary changes in the existing part of the system so that both components can interact seamlessly. All the other goals follow from this one and complement it.

With both systems communicating, the first and most important functionalities to implement were the interactions between them. The GUI interacts with the controller by sending all the commands that allow controlling FlexMPI, using the different buttons available throughout the menus. On the other side, the controller interacts with the GUI by delivering information about the statistics of the applications, which is later displayed for the user in a chart so it can be observed in a user-friendly way. This allows an easy visualization of the different metrics that the applications are generating.

Along with these metrics about each application, the information about the nodes was added in the second phase. This goal aimed to allow the user of the GUI to monitor the health of all the nodes that execute the applications, detecting node contention, since the status of a node is key for the performance of the applications that are running on it. To show the information about the nodes, a new panel was added to the GUI, with a grid-like distribution of all the nodes so each one can be easily found. Nevertheless, the goal was not only showing the nodes, but also implementing the option of solving node contention when it shows up, and even leveraging the GUI itself to solve these problems when they arrive. This was implemented by adding, for each node, the option of moving an application out of it, either letting the user choose which app to move or allowing the system to choose it. In addition, the automatic contention-fixing function permits the user to enable it for a node, so the GUI itself will process each contention message and send the corresponding commands to the controller to fix it.

Finally, the last of the goals was to implement persistent storage for all the metrics collected from the applications, so they can be analysed more in depth after the execution. To achieve this goal, the GUI has a logging system that stores the metrics received for each one of the applications in separate files.

6.2. Future work

Although all the proposed goals have been achieved, it is clear that this application has many possible ways of improving. There are two main fields where the space for improvement is most relevant.

The first of these paths would continue the development around the persistence of data. On the one hand, the data about the nodes and the contention issues is currently not stored at all, and having this information for later analysis could lead to better results. On the other hand, another interesting way of improving the system would be to incorporate functionalities for analysing the data after execution without having to use external tools. This functionality is provided by most of the existing monitoring programs and makes them more valuable to their users. It is important because some analyses cannot be carried out in real time, as they would take too long to be useful, but they can provide important insight about how to improve the execution of the applications in later iterations, and about the reasons for any performance problems found.

The second big path to explore regarding the improvement of the GUI would be automation. At the moment, the algorithm that handles contention problems is a simple solution: with each error message, it moves the application with the lowest number of processes in that node. This algorithm can surely be improved in order to find the actions that increase performance and reduce problems the most. Another place for automation is the application statistics themselves, which have received little analysis in this project. Analysing the metrics from the applications and the data about the nodes could allow setting up different actions to improve the efficiency of the overall system.

Of course, there are more things to improve. Another important improvement would be the UX, which is usually not taken much into account in this type of application. Lastly, simplifying the configuration to make it easier for the user would also be a good idea. Nevertheless, it is the first two paths that can lead to the greatest increase in the relevance of this tool.

7. References

[1] Inside HPC, “What is high performance computing?,” Inside HPC, [Online]. Available: https://insidehpc.com/hpc-basic-training/what-is-hpc. [Accessed 20 09 2018].

[2] Wikimedia Foundation, “Message Passing Interface,” Wikimedia Foundation, 29 08 2019. [Online]. Available: https://en.wikipedia.org/wiki/Message_Passing_Interface. [Accessed 15 06 2019].

[3] The Open MPI Project, “Open MPI: Open Source High Performance Computing,” The Open MPI Project, 20 05 2019. [Online]. Available: https://www.open-mpi.org. [Accessed 15 06 2019].

[4] M. A. Heroux, J. Carter, R. Thakur, J. Vetter, L. C. McInnes, J. Ahrens and J. R. Neely, “ECP Software Technology Capability Assessment Report,” vol. 07, 2018.

[5] P. Kogge, S. Borkar, D. Campbell, W. Carlson, W. Dally, M. Denneau, P. Franzon, W. Harrod, J. Hiller, S. Karp, D. Klein and R. Lucas, “ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems,” Defense Advanced Research Projects Agency Information Processing Techniques Office (DARPA IPTO), Technical Report, vol. 15, 2008.

[6] Arm, “Arm MAP,” Arm, 2019. [Online]. Available: https://developer.arm.com/tools-and-software/server-and-hpc/arm-architecture-tools/arm-forge/arm-map. [Accessed 10 06 2019].

[7] Amazon, “AWS CloudWatch,” Amazon, 2019. [Online]. Available: https://aws.amazon.com/cloudwatch/. [Accessed 10 06 2019].

[8] Ganglia, “What is Ganglia?,” Ganglia, 07 03 2018. [Online]. Available: http://ganglia.info. [Accessed 10 06 2019].

[9] HPCToolkit, “HPCToolkit Overview,” HPCToolkit, 22 10 2018. [Online]. Available: http://hpctoolkit.org/overview.html. [Accessed 11 06 2019].

[10] Barcelona Supercomputing Center, “Paraver: a flexible performance analysis tool,” Barcelona Supercomputing Center, 2019. [Online]. Available: https://tools.bsc.es/paraver. [Accessed 12 06 2019].

[11] Barcelona Supercomputing Center, “Extrae,” Barcelona Supercomputing Center, 2019. [Online]. Available: https://tools.bsc.es/extrae. [Accessed 12 06 2019].

[12] Technical University of Munich, “About Periscope,” [Online]. Available: https://periscope.in.tum.de. [Accessed 12 06 2019].

[13] Jülich Supercomputing Center, “About Scalasca,” Jülich Supercomputing Center, 23 03 2015. [Online]. Available: http://scalasca.org/about/about.html. [Accessed 14 06 2019].

[14] M. Geimer, F. Wolf, B. J. N. Wylie, E. Ábrahám, D. Becker and B. Mohr, “The Scalasca performance toolset architecture,” Concurrency and Computation: Practice and Experience, vol. 22, no. 6, pp. 702–719, 2010.

[15] University of Southern California, “Running a Job on HPC using Slurm,” University of Southern California, [Online]. Available: https://hpcc.usc.edu/support/documentation/slurm/. [Accessed 22 06 2019].

[16] SchedMD, “Slurm Overview,” 24 04 2019. [Online]. Available: https://slurm.schedmd.com/overview.html. [Accessed 23 06 2019].

[17] Adaptive Computing, “TORQUE Resource Manager,” Adaptive Computing, 2019. [Online]. Available: http://www.adaptivecomputing.com/products/torque/. [Accessed 23 06 2019].

[18] Wikimedia Foundation, “TORQUE,” 31 07 2018. [Online]. Available: https://en.wikipedia.org/wiki/TORQUE. [Accessed 23 06 2019].

[19] Adaptive Computing, “TORQUE Resource Manager Data Sheet,” Adaptive Computing, 2018. [Online]. Available: http://www.adaptivecomputing.com/wp-content/uploads/2018/07/TORQUE-Resource-Manager-Data-Sheet.pdf. [Accessed 23 06 2019].

[20] IBM, “IBM Closes on Acquisition of Platform Computing,” IBM, 09 01 2012. [Online]. Available: https://www-03.ibm.com/press/us/en/pressrelease/36372.wss. [Accessed 21 07 2019].

[21] IBM, “Inside an LSF cluster,” IBM, 2019. [Online]. Available: https://www.ibm.com/support/knowledgecenter/SSETD4_9.1.3/lsf_foundations/chap_lsf_cluster.html. [Accessed 24 06 2019].

[22] IBM, “LSF Security Model,” IBM, 2019. [Online]. Available: https://www.ibm.com/support/knowledgecenter/SSETD4_9.1.3/lsf_foundations/security_lsf_overview.html. [Accessed 24 06 2019].

[23] Oracle, “Grid Engine,” Oracle, 22 10 2018. [Online]. Available: https://www.oracle.com/technetwork/oem/grid-engine-166852.html. [Accessed 23 06 2019].

[24] Univa, “Univa Grid Engine Datasheet,” Univa, 2019. [Online]. Available: http://www.univa.com/resources/files/gridengine.pdf. [Accessed 23 06 2019].

[25] W. Gropp, E. Lusk, N. Doss and A. Skjellum, “A high-performance, portable implementation of the MPI message passing interface standard,” Parallel Computing, Elsevier, vol. 22, no. 6, pp. 789-828, 1996.

[26] Oracle, “About Sun,” Oracle, 2010. [Online]. Available: https://www.oracle.com/sun/. [Accessed 15 05 2019].

[27] Tiobe, “Tiobe Index,” 05 2019. [Online]. Available: https://www.tiobe.com/tiobe-index/. [Accessed 15 05 2019].

[28] R. Pereira, M. Couto, F. Ribeiro, R. Rua, J. Cunha, J. P. Fernandes and J. Saraiva, “Energy Efficiency Across Programming Languages: How Do Energy, Time, and Memory Relate?,” in Proceedings of the 10th ACM SIGPLAN International Conference on Software Language Engineering, Vancouver, BC, Canada, ACM, 2017, pp. 256-267.

[29] Eclipse Foundation, “About Us,” [Online]. Available: https://www.eclipse.org/org/. [Accessed 15 05 2019].

[30] Apache Foundation, “Apache Netbeans,” Apache Foundation, 27 04 2019. [Online]. Available: https://netbeans.apache.org/. [Accessed 22 06 2019].

[31] Jetbrains, “IntelliJ IDEA,” Jetbrains, 2019. [Online]. Available: https://www.jetbrains.com/idea/. [Accessed 22 06 2019].

[32] Apache Foundation, “About Apache Pivot,” Apache Foundation, [Online]. Available: https://pivot.apache.org/about.html. [Accessed 15 05 2019].

[33] Oracle, “Swing,” Oracle, [Online]. Available: https://docs.oracle.com/javase/8/docs/technotes/guides/swing/. [Accessed 16 05 2019].

[34] Andreas Viklund, “JFreeChart,” JFree, 2017. [Online]. Available: http://www.jfree.org/jfreechart/. [Accessed 16 05 2019].

[35] Microsoft, “Visual Studio Code,” Microsoft, 2019. [Online]. Available: https://code.visualstudio.com. [Accessed 16 05 2019].

[36] The GNU Foundation, “GCC,” The GNU Foundation, 03 05 2019. [Online]. Available: https://gcc.gnu.org. [Accessed 16 05 2019].

[37] Bitvise, “Download PuTTY,” Bitvise, [Online]. Available: https://www.putty.org. [Accessed 01 06 2019].

[38] WinSCP, “Introducing WinSCP,” 2019. [Online]. Available: https://winscp.net/eng/docs/introduction. [Accessed 01 06 2019].

[39] Top500, “Performance Development,” Top500, 2018. [Online]. Available: https://www.top500.org/statistics/perfdevel/. [Accessed 18 05 2019].

[40] Case Western Reserve University, “HPC Statistics,” Case Western Reserve University, 2014. [Online]. Available: https://sites.google.com/a/case.edu/hpc-upgraded-cluster/cluster-faq/hpc-statistics. [Accessed 18 05 2019].

[41] European Union, “EuroHPC,” European Union, 04 2019. [Online]. Available: https://ec.europa.eu/digital-single-market/en/eurohpc-joint-undertaking. [Accessed 17 05 2019].

[42] ES Horizonte 2020, “Asociaciones Público-Privadas,” [Online]. Available: https://eshorizonte2020.es/mas-europa/grandes-iniciativas/asociaciones-publico-privadas-ppps. [Accessed 17 05 2019].

[43] Red Hat, “Buy Red Hat Enterprise Linux Server,” Red Hat, 2019. [Online]. Available: https://www.redhat.com/en/store/red-hat-enterprise-linux-server. [Accessed 04 06 2019].

[44] Red Hat, “IBM TO ACQUIRE RED HAT, COMPLETELY CHANGING THE CLOUD LANDSCAPE AND BECOMING WORLD’S #1 HYBRID CLOUD PROVIDER,” Red Hat, 28 10 2018. [Online]. Available: https://www.redhat.com/en/about/press-releases/ibm-acquire-red-hat-completely-changing-cloud-landscape-and-becoming-worlds-1-hybrid-cloud-provider. [Accessed 04 06 2019].

[45] Wikipedia, “User Datagram Protocol,” Wikipedia, 27 07 2019. [Online]. Available: https://en.wikipedia.org/wiki/User_Datagram_Protocol#Comparison_of_UDP_and_TCP. [Accessed 30 07 2019].

[46] R. Klein, Scheduling of resource-constrained projects, Boston: Kluwer Academic, 2000.

[47] Glassdoor, “Glassdoor salaries,” Glassdoor, [Online]. Available: https://www.glassdoor.com/Salaries/madrid-salary-SRCH_IL.0,6_IM1030.htm. [Accessed 08 2019].

[48] Seguridad Social, “Bases y tipos de cotización 2019,” Seguridad Social, 2019. [Online]. Available: http://www.seg-social.es/wps/portal/wss/internet/Trabajadores/CotizacionRecaudacionTrabajadores/36537. [Accessed 08 2019].

[49] H. Muhammad, “htop,” [Online]. Available: https://hisham.hm/htop/. [Accessed 2019].

[50] Glassdoor, “Senior Software Engineer salaries in Madrid,” Glassdoor, [Online]. Available: https://www.glassdoor.com/Salaries/madrid-senior-software-engineer-salary-SRCH_IL.0,6_IM1030_KO7,31.htm. [Accessed 08 2019].

[51] Microsoft, “Windows 10 Pro,” Microsoft, 2019. [Online]. Available: https://www.microsoft.com/es-es/p/windows-10-pro/df77x4d43rkt/48DN?icid=Cat_Windows_mosaic_linknav_Pro_090117-en_US&activetab=pivot%3aoverviewtab. [Accessed 08 2019].

[52] Microsoft, “Microsoft Office Professional,” Microsoft, 2019. [Online]. Available: https://www.microsoft.com/es-es/p/office-profesional-2019/cfq7ttc0k7c5?activetab=pivot%3aoverviewtab. [Accessed 08 2019].

Appendix A: User manual

This is a brief guide explaining how to work with the GUI, from setting it up to using all its functionality.

First steps

Before doing anything else, it is necessary to set up the GUI in the user’s computer, and make sure it will be able to execute properly and connect with the controller to interact with the system.

Step 1: Install and setup

There is no installation process for the GUI. The only requirement is to have Java set up on the computer where the software is going to be used; proper execution has been tested with both Java 8 and Java 11. The FlexMPI GUI comes packed as a JAR file, named FlexMPI_GUI.jar. This file can be copied anywhere in the filesystem, and it has to be given execution privileges if it is to be run directly. Together with it comes the metrics format file, the XML file that the program uses to detect the format of the metrics that are going to be received. This file has to be placed in a folder called xml, in the same directory as the JAR file. There is an example of this in Figure 50.

Figure 50: FlexMPI GUI directory example

Step 2: Run

Running the GUI is really simple. If the JAR file has been granted execution privileges, it is as simple as double-clicking on the file. The other option is to run it from the terminal: once in the file's directory, the command java -jar FlexMPI_GUI.jar will execute the GUI. The main window of the GUI will appear empty, without any application registered yet, as in Figure 51.

Figure 51: FlexMPI GUI main window after launch

Step 3: Connect controller

While this manual is related to the usage of the GUI and does not cover the controller and the rest of FlexMPI, this section briefly explains how to launch the controller so that it connects to the GUI, since otherwise the GUI is useless. It is important that the GUI is executed before the controller is launched. To execute the controller, it is necessary to know the IP and the port where the GUI is listening. If not changed manually, the default port is 6660. With this data, the controller is launched from its corresponding directory with the following command, where the -GUI flag and its parameters are added to any other desired flags:

./controller -GUI <GUI IP> <GUI port> <controller port>

As for the parameters, <GUI IP> and <GUI port> are respectively the address and the port to connect with the GUI. <controller port> is the port number that the controller should use to receive the GUI messages, and the only constraint when choosing it is that it has to be a free port. Typically, the chosen port will be 6661.
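As a rough illustration of the GUI side of this connection, the following is a minimal sketch of a UDP listener bound to the default port 6660. The single-message flow, method names and message text are assumptions for illustration, not FlexMPI's actual code.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;

// Minimal sketch of how the GUI can listen for controller messages over UDP.
// Port 6660 is the GUI's default listening port described in the manual.
public class GuiListener {
    // Block until one datagram arrives on the given socket and return its text.
    static String receiveOnce(DatagramSocket socket) throws Exception {
        byte[] buf = new byte[4096];
        DatagramPacket packet = new DatagramPacket(buf, buf.length);
        socket.receive(packet); // blocks until the controller sends data
        return new String(packet.getData(), 0, packet.getLength());
    }

    public static void main(String[] args) throws Exception {
        try (DatagramSocket socket = new DatagramSocket(6660)) {
            System.out.println("Received: " + receiveOnce(socket));
        }
    }
}
```

UDP fits this setup because each metrics message is small and self-contained, so an occasional lost datagram only delays the next update.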

Using the GUI

Once everything is set up, it is time to actually use the tool. This section of the guide helps in getting familiar with the different components and functionalities that the GUI offers.

Main Window

All the applications executed appear in the main window of the GUI. This window offers a fast view of the status of the system, as well as access to the different functionalities. An example of this window can be found in Figure 52, with the different components highlighted and numbered.

Figure 52: FlexMPI GUI main window components

1. Application name, together with the app ID.
2. FLOPS and CPU time, which are shown whenever an application's statistics service is active.
3. Options button, which opens the options panel for that application.
4. Show info button, which opens the visualization panel for that application.
5. Request button, which requests a metrics message outside the default interval.
6. Show nodes button, which opens the nodes panel where nodes and contention alerts are shown.
7. Panel that shows some messages from the GUI.

Commands Panel

This panel contains most of the functionality for sending commands to a connected application. It is opened using the Options button from the main window for the application that is going to be messaged. Each option is shown highlighted and numbered in Figure 53.

Figure 53: Commands panel components

1. Get counters
2. Add/Remove process
3. Load balance
4. Start/stop statistics service
5. Terminate app

Statistics View

To watch the statistics, the first step is to activate the statistics service by clicking button 4 from the command panel. When the statistics service is active, it will be seen as in Figure 54.

Figure 54: Commands panel with statistics service active

To open the statistics window for an application, it is necessary to click button 4 (Show info) in the main window for that application, which will open the corresponding panel. The window that gets opened is shown in Figure 55.

Figure 55: Application processes and statistics messages view

This window has two sections. The button 0 is the one that shows the metrics plot; it corresponds to process number 0 of this application. Although at the moment the metrics are received aggregated per application, if that changed there would be more buttons in that panel. The bottom text panel shows the different metrics that have been received for that application. Finally, the code that appears on top of the window is the hash code of the application name, and it is used to name the file where the statistics are stored persistently. Pressing the process 0 button opens the plotting panel. This panel, shown in Figure 56, has on the left side the different metrics, to select which ones are going to be plotted. The main part of the panel is the chart, where the metrics are plotted over time. The colour corresponding to each metric can be found below, next to the metric name.

Figure 56: Metrics plot panel
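As a sketch of the file-naming scheme described above, the hypothetical snippet below derives a log file name from the hash code of the application name. The logs/ directory and the .log suffix are illustrative assumptions; only the use of the name's hash code comes from the GUI's behaviour.

```java
// Derive a stable per-application log file name from the hash code of the
// application name, as the GUI does for persistent statistics storage.
// The "logs/" directory and ".log" suffix are illustrative assumptions.
public class StatsLogName {
    static String logFileFor(String appName) {
        return "logs/" + appName.hashCode() + ".log";
    }

    public static void main(String[] args) {
        System.out.println(logFileFor("jacobi"));
    }
}
```

Because String.hashCode() is deterministic, the same application name always maps to the same file across executions.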

Nodes View and Management

The big button on the left of the main window (number 6 in Figure 52) opens the nodes panel. This panel, shown in Figure 57, contains a button for each existing node, organized in a grid-like layout so that each node can be found easily.

Figure 57: Nodes view panel

When a contention message arrives, the corresponding button changes its colour to red, as in Figure 58, notifying that there is an issue with that node. It is important to remark that if the node contains an application with the auto-solve option activated, the GUI will solve the issue without user interaction, and thus will not show any alert.

Figure 58: Nodes view panel with contention alert


Clicking on any of the buttons opens the information panel for that node. In the node information frame, the GUI shows the metrics that arrived with the contention alert, along with the applications that were running in that node at the time of the error. An example of this panel is shown in Figure 59.

Figure 59: Node information panel

As seen in the example, the metrics are shown in the top half of the panel. These metrics correspond to the values for the node when the alert arrived, and at least one of them should look problematic. Below the metrics, there is the list of the applications that are running in the node, with two buttons for each app. The first button, “Show info”, has the same functionality as the one in the main window: it opens the statistics window for that application, to easily check whether there is any performance drop. In case lower performance is detected, the second button moves that application to a different node, so the problems on the affected node are reduced. Finally, the two buttons at the bottom of the panel automatically solve the problem. The first one does it a single time, while the second one activates the automatic service. In both cases the system automatically chooses the application to move from that node to a different one, but with the second one, each time a new contention message arrives it is automatically handled by the GUI. The way to know that a node is being automatically fixed is that the corresponding button turns green.
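The selection rule used by the automatic mode, moving the application with the fewest processes on the contended node, can be sketched as follows; the App class and its fields are hypothetical illustrations, not the GUI's actual data structures.

```java
import java.util.Comparator;
import java.util.List;

// Sketch of the automatic contention handling: when a node reports
// contention, the application with the fewest processes on that node is
// chosen as the one to migrate elsewhere.
public class ContentionSolver {
    static class App {
        final String name;
        final int processes;
        App(String name, int processes) { this.name = name; this.processes = processes; }
    }

    // Pick the app with the lowest process count; a contention alert implies
    // that at least one application is running on the node.
    static App pickAppToMove(List<App> appsOnNode) {
        return appsOnNode.stream()
                .min(Comparator.comparingInt((App a) -> a.processes))
                .orElseThrow(IllegalStateException::new);
    }

    public static void main(String[] args) {
        List<App> node = List.of(new App("jacobi", 8), new App("epigraph", 2));
        System.out.println(pickAppToMove(node).name); // prints "epigraph"
    }
}
```

Moving the smallest application keeps the migration cheap, which is also why the thesis notes this simple heuristic leaves room for smarter policies.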
