University Degree in Computer Science and Engineering
Academic Year (e.g. 2018-2019)

Bachelor Thesis “Visualization tool development and malleable applications planning”

Federico Goldfryd Sprukt
David Expósito Singh

Abstract

High performance computing is a field of increasing relevance both for the industrial and corporate sectors and for academia. It speeds up research in many fields by running applications that would need years of execution on a conventional computer. For that reason, access to supercomputers and computing clusters has been growing over the years, while governments and organizations support initiatives for the development of new supercomputers with exponentially more power that can significantly accelerate this research and development. The power of these machines comes from the high degree of parallelization they offer: the more powerful the system, the higher the number of parallel nodes working at the same time. Given the economic and time costs of executing these applications, performance is a key concern in supercomputing. To improve it, many monitoring applications have been developed. These tools gather the data generated during the execution of highly parallel applications in order to maximize their efficiency, and the amount of data generated grows with the power of the computers used. FlexMPI, a project of University Carlos III de Madrid, offers a tool for running parallel applications and monitoring them. Using FlexMPI, applications can be modified at execution time, changing the number of processes they run or moving them from one computing node to another. Until now, FlexMPI has been a command line tool, accessible only through a terminal.
This project focuses on the development of a graphical user interface to interact with the system from a remote machine, making it possible to visualize the system and control the execution of the applications in real time: a simple graphical interface that allows the computing cluster and the applications running under FlexMPI to be controlled with simple buttons. In addition to the visualization and control features, the GUI will issue notifications when problems are detected in the supercomputer, monitoring both the applications and the computing nodes where they run. All the collected data will be stored persistently so that it can be analysed later, and the tool will also offer the option of automating the actions sent to the applications by analysing the real-time data received from the cluster.


Table of Contents

1. INTRODUCTION

1.1. MOTIVATION
1.2. RESEARCH GROUP CONTEXT
1.2.1. SOFTWARE CONTEXT
1.2.2. HARDWARE CONTEXT: TUCAN
1.3. PROJECT OBJECTIVES
1.4. REPORT STRUCTURE

2. STATE OF THE ART

2.1. CURRENT CHALLENGES ON SUPERCOMPUTING
2.2. APPLICATIONS MONITORING TOOLS
2.2.1. ARM MAP
2.2.2. AWS CLOUDWATCH
2.2.3. GANGLIA
2.2.4. HPC TOOLKIT
2.2.5. INTEL VTUNE
2.2.6. PARAVER AND EXTRAE
2.2.7. PERISCOPE
2.2.8. SCALASCA
2.3. APPLICATION RUNNING ENVIRONMENT IN CLUSTERS
2.3.1. SLURM
2.3.2. TORQUE
2.3.3. IBM SPECTRUM LSF
2.3.4. UNIVA GRID ENGINE
2.4. FLEXMPI TOOL

3. ENVIRONMENT DESCRIPTION

3.1. DEVELOPMENT ENVIRONMENT
3.1.1. GUI
3.1.2. CONTROLLER
3.1.3. COMMON ENVIRONMENT
3.2. SOCIO-ECONOMIC ENVIRONMENT

4. DESCRIPTION OF THE PROPOSED ARCHITECTURE

4.1. OVERVIEW
4.2. COMPONENTS
4.2.1. GUI
4.2.2. CONTROLLER
4.2.3. APPLICATION
4.3. INTERFACES DEFINITION
4.3.1. CONTROLLER-GUI CONNECTION
4.3.2. APPLICATION REGISTRATION
4.3.3. MONITOR REGISTRATION
4.3.4. COMMAND SENDING
4.3.5. APPLICATION METRICS
4.3.6. CONTENTION NOTIFICATION
4.4. REQUIREMENTS ANALYSIS
4.4.1. FUNCTIONAL REQUIREMENTS
4.4.2. NON-FUNCTIONAL REQUIREMENTS
4.5. SYSTEM DESIGN
4.5.1. CONTAINER CLASSES
4.5.2. BUTTONS HANDLERS AND THREADS
4.5.3. GUI VIEWS DESIGN
4.6. PROJECT PLANNING
4.6.1. PHASE 1
4.6.2. PHASE 2
4.7. BUDGET
4.7.1. STAFF COSTS
4.7.2. HARDWARE COSTS
4.7.3. SOFTWARE COSTS
4.7.4. TOTAL BUDGET

5. EVALUATION

5.1. PLATFORM DESCRIPTION
5.2. TRACEABILITY MATRIX
5.3. FUNCTIONAL TESTS
5.4. PERFORMANCE TESTS
5.4.1. CONTROLLER PERFORMANCE TESTS
5.4.2. GUI PERFORMANCE TESTS
5.4.3. APPLICATIONS PERFORMANCE TESTS

6. CONCLUSIONS AND FUTURE WORK

6.1. CONCLUSIONS
6.2. FUTURE WORK

7. REFERENCES

APPENDIX A: USER MANUAL

FIRST STEPS
STEP 1: INSTALL AND SETUP
STEP 2: RUN
STEP 3: CONNECT CONTROLLER
USING THE GUI
MAIN WINDOW
COMMANDS PANEL
STATISTICS VIEW
NODES VIEW AND MANAGEMENT

Figures Reference

Figure 1: Arm MAP
Figure 2: GUI of Arm MAP
Figure 3: AWS CloudWatch
Figure 4: AWS Capture GUI
Figure 5: Ganglia
Figure 6: Ganglia GUI
Figure 7: HPCToolkit
Figure 8: HPCToolkit components
Figure 9: HPCToolkit TraceViewer GUI
Figure 10: Intel VTune GUI
Figure 11: Paraver GUI Views
Figure 12: Periscope
Figure 13: Scalasca Report Explorer
Figure 14: Slurm Workload Manager
Figure 15: Slurm Architecture Diagram
Figure 16: Entities in Slurm
Figure 17: Spectrum LSF Architecture
Figure 18: IBM Spectrum LSF Security Model
Figure 19: Univa Grid Engine
Figure 20: FlexMPI application running environment
Figure 21: Workflow related to FlexMPI
Figure 22: HPC Performance Development
Figure 23: HPC user base growth
Figure 24: Diagram of FlexMPI components
Figure 25: Overview of the threads of the system
Figure 26: Flow diagram of controller-GUI connection
Figure 27: Flow diagram of application registration
Figure 28: Flow diagram of contention monitor thread connection
Figure 29: Flow diagram of command sending
Figure 30: Flow diagram of metrics retrieval process
Figure 31: Flow diagram of contention message received
Figure 32: Flow diagram of solving contention problem
Figure 33: Relationship between nodes, apps and processes
Figure 34: Class diagrams of container classes
Figure 35: Threads and handlers class diagram
Figure 36: Main GUI panel
Figure 37: GUI nodes grid view
Figure 38: Node information view
Figure 39: Application metrics visualization window in GUI
Figure 40: Application commands view
Figure 41: Gantt chart for first part of Phase 1
Figure 42: Gantt chart for second part of Phase 1
Figure 43: Gantt chart for first part of Phase 2
Figure 44: Gantt chart for second part of Phase 2
Figure 45: Traceability Matrix
Figure 46: Evolution of memory usage of the controller
Figure 47: Evolution of the CPU usage of the GUI
Figure 48: Evolution of the memory usage of the GUI
Figure 49: Evolution of the execution time for an application
Figure 50: FlexMPI GUI directory example
Figure 51: FlexMPI GUI main window after launch
Figure 52: FlexMPI GUI main window components
Figure 53: Commands panel components
Figure 54: Commands panel with statistics service active
Figure 55: Application processes and statistics messages view
Figure 56: Metrics plot panel
Figure 57: Nodes view panel
Figure 58: Nodes view panel with contention alert
Figure 59: Node information panel

Tables Reference

Table 1: Requirement Definition Template
Table 2: Functional Requirement FR-01
Table 3: Functional Requirement FR-02
Table 4: Functional Requirement FR-03
Table 5: Functional Requirement FR-04
Table 6: Functional Requirement FR-05
Table 7: Functional Requirement FR-06
Table 8: Functional Requirement FR-07
Table 9: Functional Requirement FR-08
Table 10: Functional Requirement FR-09
Table 11: Functional Requirement FR-10
Table 12: Functional Requirement FR-11
Table 13: Functional Requirement FR-12
Table 14: Functional Requirement FR-13
Table 15: Functional Requirement FR-14
Table 16: Functional Requirement FR-15
Table 17: Functional Requirement FR-16
Table 18: Functional Requirement FR-17
Table 19: Functional Requirement FR-18
Table 20: Functional Requirement FR-19
Table 21: Non-Functional Requirement NFR-01
Table 22: Non-Functional Requirement NFR-02
Table 23: Non-Functional Requirement NFR-03
Table 24: Non-Functional Requirement NFR-04
Table 25: Non-Functional Requirement NFR-05
Table 26: Salaries description
Table 27: Total costs description
Table 28: Hardware costs breakdown
Table 29: Software costs breakdown
Table 30: Functional Test definition template
Table 31: Functional Test 01
Table 32: Functional Test 02
Table 33: Functional Test 03
Table 34: Functional Test 04
Table 35: Functional Test 05
Table 36: Functional Test 06
Table 37: Functional Test 07
Table 38: Functional Test 08
Table 39: Functional Test 09
Table 40: Functional Test 10
Table 41: Functional Test 11
Table 42: Functional Test 12
Table 43: Functional Test 13
Table 44: Functional Test 14
Table 45: Functional Test 15
Table 46: Functional Test 16
Table 47: Functional Test 17
Table 48: Functional Test 18
Table 49: Functional Test 19
Table 50: Functional Test 20
Table 51: Functional Test 21
Table 52: Functional Test 22
Table 53: Functional Test 23
Table 54: Functional Test 24
Table 55: Functional Test 25
Table 56: Functional Test 26
Table 57: Functional Test 27
Table 58: Functional Test 28

1. Introduction

This chapter serves as an introduction to the final degree project. It explains the motivation behind the project, its context and its objectives, and details the structure of the report.

1.1. Motivation

High Performance Computing [1], commonly referred to as HPC, is a computing technique that joins a vast number of independent computers in order to achieve much higher performance than a typical desktop computer or workstation. The result of joining all these computers is called a supercomputer. The main advantage of having all these independent computing nodes connected as a single supercomputer is the high level of parallelization they can achieve, which makes it possible to execute specialized programs that solve complex tasks. All these nodes run processes that need to communicate with each other to exchange the necessary data. Given this decentralized design, data is spread across the nodes, and establishing these communications is a complex task. The result is complex machines with high operating costs, used mainly for commercial and research projects where time and economic costs are especially relevant. For that reason, getting the best possible performance is highly valuable, as it allows significant cuts in execution time and, as a result, in cost too.

The way to increase the power of these machines is mainly to increase the number of nodes, since there is a limit to the power that each individual node can reach. The race to build more powerful supercomputers is a hot topic nowadays, and with the increase in complexity there are many problems to face in order to scale these systems. The applications executed on HPC systems are monitored constantly to ensure the most efficient use of these expensive resources, and with bigger systems the amount of information to be processed increases substantially.
Since processing that vast amount of information is one of the main challenges for scaling and building the next generations of supercomputers, this final degree project focuses on the development of a tool that processes the generated data and offers visualization features to detect possible performance degradation during execution, as well as ways to solve these problems both manually and automatically.

1.2. Research group context

1.2.1. Software context

MPI (Message Passing Interface) is a communication protocol used for parallel-computing programming [2]. It is a specification for a standard message-passing library that was defined by a group of academic researchers, who also implemented it in MPICH. It is a language-independent protocol supporting both point-to-point and collective communications, its two major implementations being the aforementioned MPICH and Open MPI [3]. More implementations exist, but most of them are derivatives of these two. Focused on high-performing, scalable and portable applications, MPI has become the de facto industry standard for communication on distributed-memory architectures such as clusters and supercomputers. Most MPI implementations consist of a specific set of subroutines (an API) directly callable from C, C++ and any language able to interface with these libraries, such as Python. The advantages of MPI over older message-passing libraries come from fulfilling the objectives mentioned before: implementations exist for almost every distributed-memory architecture, which allows high portability, together with optimizations specific to the hardware on which each implementation runs.
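To make the point-to-point API described above concrete, the following is a minimal sketch of an MPI program in C. It assumes an MPI implementation such as MPICH or Open MPI is installed (compiled with `mpicc`, launched with `mpirun`); the message value is arbitrary.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's identifier */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

    if (rank == 0) {
        /* Point-to-point communication: rank 0 sends one integer
           to every other rank in the communicator. */
        for (int dest = 1; dest < size; dest++) {
            int payload = 42;
            MPI_Send(&payload, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
        }
    } else {
        int payload;
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank %d received %d\n", rank, payload);
    }

    MPI_Finalize();
    return 0;
}
```

Built as `mpicc hello.c -o hello` and run as `mpirun -np 4 ./hello`, each non-zero rank prints the value it received from rank 0. The same source runs unchanged on a laptop or a cluster, which illustrates the portability objective mentioned above.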

1.2.2. Hardware context: Tucan

Tucan is the name of the cluster used for Computer Science research at UC3M. It is operated by the computer architecture group, ARCOS, and offers HPC services to the university departments that need them. It is made up of 32 computing nodes with different characteristics, all of them running Ubuntu. Although all the nodes are powered by Intel CPUs, not all of them use the same model. Depending on the goal of the program to be executed, the cluster offers nodes with between 4 and 64 cores. Regarding memory, there is also a variety of options to cover the different needs a user may have, with nodes ranging from 8 GB up to 378 GB. Regarding storage, there are options with different capacities and speeds, either HDD or SSD. Finally, some nodes also have graphics cards to be used in artificial intelligence or graphics-computing applications.

1.3. Project objectives

This section states the different goals to be achieved during the development of this project:

• Integration with FlexMPI controller: Implement in the controller the changes necessary to integrate it with the GUI and to support the GUI functionalities.

• Statistics visualization: Implement a GUI that allows the user to connect to FlexMPI and visualize the metrics generated by the applications in real-time plots.

• FlexMPI control with graphic components: Allow users to interact with FlexMPI through simple-to-use GUI components: buttons, selectors or text boxes where necessary.

• Node contention monitoring and automatic solving: Show information about the status of the nodes and raise an alert if there is a contention problem in any of them. Aside from showing these problems, offer a way to solve them manually. Also enable the GUI to manage contention solving automatically, improving performance without user intervention.

• Persistence of metrics: Store the metrics received from the applications persistently to enable later analysis of the performance results.

1.4. Report structure

Here, each of the sections that make up this report is listed with a short description, to improve readability:

• Introduction: Overview of the project, exposing the motivation behind it, the context in which it was developed and the goals to achieve.

• State of the art: Analysis of the challenges that the supercomputing environment is currently facing, as well as the software currently used for monitoring HPC applications and for running them in clusters.

• Environment description: Description of the software and hardware environment in which the project and each one of the components has been developed. Also, the socioeconomic environment for the project is described here.

• Description of the proposed architecture: This section describes in depth the design of each component of the system, as well as their interactions within the larger system. The requirements that the system had to meet are also described here, together with the planning followed to implement the design and the project budget.

• Evaluation: Description of the different tests that have been carried out to ensure the system works properly and meets all the requirements.

• Conclusions and future work: Explanation of the conclusions drawn from the project, as well as an analysis of possible future improvements and lines of work that could follow this project.

• Appendix. User Manual: Guide on how to use the GUI, integrate it with FlexMPI and control the applications through the GUI options.

2. State of the Art

In this section, the current state of the art in supercomputing and HPC is described, looking at the challenges the industry faces as the technology develops. Different alternatives for performance monitoring and for application running environments are also analysed.

2.1. Current challenges on supercomputing

The main challenge in supercomputing is achieving ever more computing power. Currently, the objective is to reach pre-exascale systems by the end of the decade, and the exascale generation of supercomputers in the first years of the next one. This will mean supercomputers that can perform over one exaFLOPS (10^18 operations per second). Different organizations are investing in the development of the exascale generation, such as the European Union, the United States and China. The U.S. Department of Energy has identified the following challenges for this development [4]:

• Extreme parallelism. Clock speeds are no longer evolving as fast as they used to, which means that the majority of the performance gains for exascale HPC systems will come from improvements in concurrency. Going from petascale to exascale means an improvement of 1,000 times, so parallelism is expected to increase by close to that factor. Exascale systems are predicted to have billion-way concurrency via a combination of tasks, threads and vectorization, and more than one hundred thousand nodes, which is a big challenge for the design of both the systems and the applications that will run on them.

• Data movement in a deep memory hierarchy. Data movement has been identified as one of the main bottlenecks, both for performance and for reducing power consumption. To solve these problems, new systems are being designed with more and more types and layers of memory. Another challenge in this area will be increasing data locality and reuse in running applications, which will also mean less data movement.

• Resilience. As the number of components in a system grows, the resilience of the hardware decreases because there are more places where a failure can appear. To manage this issue, resilience also has to be implemented in software. The goal is to have systems that can adapt to any possible hardware failure without a global failure of the system. The availability of technologies such as non-volatile memory is helping to reduce the impact a failure can cause, but there are still problems to be solved around this issue.

• Power consumption. Given the size and performance of exascale systems, power consumption was expected to be excessively high, with some predictions reaching gigawatts [5]. These predictions made power the main concern in the design and planning of the systems, which led to an aggressive power-consumption goal of 20-30 MW, not much more than the power consumed by the largest systems of today. Meeting this goal will require the development of power monitoring and management software that does not exist yet.
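To put the 20-30 MW goal in perspective, a back-of-the-envelope calculation (using only the figures quoted above) shows the energy efficiency an exascale machine would need to reach:

```python
exaflops = 1e18       # 1 exaFLOPS, in operations per second
power_low_w = 20e6    # lower end of the 20-30 MW power goal, in watts

# Operations per joule = operations per second divided by watts.
ops_per_joule = exaflops / power_low_w
print(f"{ops_per_joule / 1e9:.0f} GFLOPS per watt")  # → 50 GFLOPS per watt
```

That is, an exaFLOPS at 20 MW requires roughly 50 billion operations per joule, which motivates the power-management software mentioned above.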

Although these are the challenges imposed by the scale of exascale computers, there are additional goals that some actors are trying to achieve with this new generation of supercomputers:

• Productivity: Traditionally, supercomputer software has required a high degree of expertise to use. To make HPC systems accessible to a wider scientific community, it is necessary to develop software that improves ease of use and productivity.

• Diversity: Software is commonly developed focusing on just one new supercomputer every couple of years. The intention is to make software run across diverse exascale systems, enabling diverse architectures. Careful design and the use of portability layers to reduce code differences as much as possible will be necessary to create software that can run efficiently on different systems.

• Analytics and machine learning: Aside from more traditional modelling and simulation applications, future supercomputers will have to solve data-science and machine-learning problems. To do so, it will be necessary to develop new scalable and parallel analytics and machine-learning software.

2.2. Applications monitoring tools

Performance monitoring is of great interest to many companies and institutions. For that reason, different alternatives have been developed and are in use nowadays. This section offers a view of the most relevant tools available for this task.

2.2.1. Arm MAP

Figure 1: Arm MAP

Arm MAP, formerly known as Allinea MAP before the company was bought by Arm, is an application profiler compatible with different languages such as C, C++, Fortran and Python. It is used for its profiling functionality for MPI applications [6], to monitor performance and find the causes of bottlenecks. The screenshot of the GUI in Figure 2 shows some of the data this tool can visualize. On top there is information about CPU and memory usage, as well as about the threads, processes and nodes. The middle panel shows the instructions to which specific metrics belong. Information about I/O, inter-process communication and energy consumption can also be gathered with this tool.

Figure 2: GUI of Arm MAP

2.2.2. AWS CloudWatch

Figure 3: AWS CloudWatch

Amazon CloudWatch is Amazon's monitoring and management service for its cloud platform AWS, the most widely used worldwide [7]. It collects performance metrics from the applications, infrastructure and services being used on the platform, and this data can be observed in the form of logs and metrics for the whole stack. It enables the use of alarms, logs and event data to take automated actions and reduce the Mean Time to Resolution (MTTR). It also provides 15 months of metric retention and the ability to perform calculations on these metrics, allowing historical analysis for cost optimization of applications and infrastructure resources. The GUI of this tool is completely customizable, allowing users to set up a dashboard to their preferences. Figure 4 shows an example, with information about the services used, some metrics and some configured alarms.

Figure 4: AWS Capture GUI

2.2.3. Ganglia

Figure 5: Ganglia

Ganglia is a scalable, distributed monitoring system for HPC systems such as clusters, developed by the University of California, Berkeley. Its development was initially funded by the National Partnership for Advanced Computational Infrastructure (NPACI) and the National Science Foundation of the United States. The software is distributed under a BSD open-source license and is used by organizations such as Cray, MIT and NASA [8]. It is based on a hierarchical design, organizing clusters into groups called federations. Using a tree structure, each node reports its data to its representative, which aggregates the state. The main goal of the tool is to impose a low overhead on each node it monitors. This organization allows the information to be visualized at many levels of specificity, from the whole system down to a specific node, as Figure 6 shows.

Figure 6: Ganglia GUI

2.2.4. HPC Toolkit

Figure 7: HPCToolkit

As part of the Exascale Computing Project (ECP), we can find HPCToolkit. This software is developed with the collaboration of the Office of Science (DOE-SC) and the National Nuclear Security Administration (NNSA) [9]. It is a suite of tools for performance measurement and analysis on systems of all sizes. It uses statistical sampling of timers and hardware counters to measure a program's work, resource consumption and performance, attributing them to the context in which they occur. It has a low overhead (1-5%) and is compatible with threaded and MPI applications. Figure 8 shows the different components of the software:

Figure 8: HPCToolkit components


• hpcrun: collects performance measurements for unmodified, fully optimized applications. It uses asynchronous sampling, triggered by system timers and performance-monitoring-unit events, to drive the collection of call-path profiles and, optionally, traces.
• hpcstruct: relates binary code to source code files and components, in order to associate measurements with the program structure.
• hpcprof: hpcprof and hpcprof/mpi join the data from hpcrun and hpcstruct to generate a performance database that can be explored using the graphical user interfaces.
• hpcviewer: a graphical user interface that presents performance data focused on the code, as well as a view to check how performance changes across different threads and processes.
• hpctraceviewer: a graphical user interface that presents a hierarchical view of the execution of the program, focused on time. It offers efficient rendering of trace lines for large numbers of nodes.
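The components above form a measure-analyse-visualize pipeline, which might be sketched as the following shell session. This is an illustrative sketch: `myapp` is a hypothetical binary, and the exact flags and output directory names should be checked against the HPCToolkit documentation.

```shell
# 1. Measure: run the binary under asynchronous sampling with hpcrun.
hpcrun ./myapp                     # writes hpctoolkit-myapp-measurements/

# 2. Recover the program structure from the optimized binary.
hpcstruct myapp                    # writes myapp.hpcstruct

# 3. Correlate measurements with structure into a performance database.
hpcprof -S myapp.hpcstruct hpctoolkit-myapp-measurements

# 4. Explore the resulting database with the GUIs.
hpcviewer hpctoolkit-myapp-database
hpctraceviewer hpctoolkit-myapp-database   # if traces were collected
```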

Figure 9: HPCToolkit TraceViewer GUI

2.2.5. Intel VTune

Intel also has a solution for performance profiling and monitoring, called Intel VTune. This tool is compatible with several languages: C, C++, Fortran, Java, Python, Go and assembly. It supports single- and multi-threaded applications, as well as parallel applications using MPI. Among its features we can find:

• Software sampling: on compatible processors, this functionality locates the parts of the code where time is being spent.
• JIT profiling support: profiles dynamically generated code.
• Locks and waits analysis: finds long synchronization waits that happen when cores are underutilized.
• Threading timeline: shows the relationships between threads to identify load-balancing and synchronization issues. It can also be used to select a region of time and filter the results.
• Source view: results can be displayed line by line on the source or binary code.
• Hardware event sampling: on compatible Intel processors, this functionality helps find specific tuning opportunities such as cache misses or wrong branch predictions.
• Memory access analysis: helps to optimize data structures for performance and to improve latency and scalability.

Figure 10. Intel VTune GUI

2.2.6. Paraver and Extrae

The Barcelona Supercomputing Center has developed a toolkit for performance analysis. It is composed of Paraver [10], the tool used for data visualization, and Extrae [11], the tool used to extract the data from the applications. In addition to these tools, there is another one for simulating the performance of a parallel application using a single-core CPU.

Extrae is a library that generates the traces that can then be visualized with Paraver. It is compatible with all the most common architectures, with several programming languages, and with programming models such as MPI, OpenMP and pthreads. The tool can be easily configured using an XML file. The traces generated by Extrae consist of performance and energy data obtained from different hardware counters. These metrics are associated with precise timestamp information and with the code fragment that was being executed at that moment. Metrics other than the defaults can also be captured using the API that the library includes.

Paraver serves as the visualization tool for the generated traces. It is focused on flexibility, based on two main principles. The first is that the trace format it uses has no semantics: this allows support for new performance data and programming models to be added without any changes to the visualizer, simply by adding the data to the trace being visualized. The second is that metrics are programmed instead of hardwired. The tool includes different mechanisms to display a wide variety of metrics with the available data, and once programmed, these views can be saved into a configuration file to be reused at a different moment or in a different project. Thanks to this flexibility, Paraver provides a simple GUI that suffices to display all the metrics. As we can see in Figure 11, the GUI has only two views: a timeline view and a statistical view. To support this flexibility, the tool includes a semantic module with many different functions for visualization.
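The XML configuration mentioned above might look roughly like the following abridged sketch. This is illustrative only and may not match the current Extrae schema; the installation path, counter names and trace file name are placeholders, so the Extrae manual should be consulted for the exact elements.

```xml
<?xml version="1.0"?>
<trace enabled="yes" home="/opt/extrae" initial-mode="detail" type="paraver">
  <!-- Instrument MPI calls and read hardware counters at each event -->
  <mpi enabled="yes">
    <counters enabled="yes"/>
  </mpi>
  <counters enabled="yes">
    <cpu enabled="yes">
      <set>PAPI_TOT_INS,PAPI_TOT_CYC</set>
    </cpu>
  </counters>
  <!-- Merge per-process intermediate files into a single Paraver trace -->
  <merge enabled="yes">myapp.prv</merge>
</trace>
```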

Figure 11: Paraver GUI Views

2.2.7. Periscope

Figure 12: Periscope

The Periscope Tuning Framework is a scalable automatic performance analysis tool developed at the Technical University of Munich. It consists of a frontend and a hierarchy of communication and analysis agents. Each analysis agent autonomously searches for inefficiencies in a subset of the application processes [12]. The application processes are linked with a monitoring system that provides a network interface, which allows the agent to configure the measurements, control the execution of the application and retrieve the performance data. Currently, only summary information is supported. On start-up of the application and the network of agents, the framework analyses the set of processors available, determines the mapping of application and analysis-agent processes, and then starts the application and the agent hierarchy. After launch, a command is propagated down to the analysis agents to start the search, which is performed according to a search strategy selected when the frontend is started. When the agents finish the search, the performance metrics are delivered back to the frontend. Another property of Periscope is its support for plugins that add functionality: plugins are currently available to help tune MPI applications and energy efficiency, among other options.

2.2.8. Scalasca

Scalasca is a tool for performance optimization developed as a joint project by the Jülich Supercomputing Centre, the University of Darmstadt and the German Research School for Simulation Sciences. It is sponsored both by the European Union and the US DoE. It is focused on HPC applications and compatible with MPI [13]. This software tool allows the performance of parallel applications to be optimized through measurement and analysis of their runtime behaviour. With these data, bottlenecks are detected, and the software offers guidance for finding the root causes of the problems found.

The metrics are stored to be analysed after execution, and the tool offers two different execution modes [14]. The first mode is called profiling mode, in which Scalasca captures measurements from individual function calls and generates aggregate metrics. This allows the most time-consuming parts of the program to be found and process-local metrics, such as those resulting from hardware counters, to be analysed. The second mode is tracing mode, where individual performance-relevant events are collected. This mode makes it possible to automatically identify call paths that show wait states. Both modes come with a graphical interface that enables interactive exploration of the data, as we can see in Figure 13.

Figure 13: Scalasca Report Explorer

2.3. Application running environment in clusters

In a computing cluster, applications do not run alone. Instead, the common practice is to rely on different tools that ease the task of managing the applications and processes that run at each specific moment on an HPC system.

2.3.1. Slurm

Figure 14. Slurm Workload Manager

Currently known as Slurm Workload Manager or just Slurm, the original name of this software was Simple Linux Utility for Resource Management (SLURM). Slurm is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters of all sizes. It is a leading choice for supercomputers worldwide, being used in around 60% of the supercomputers listed in the TOP500 ranking [15].

It has three key functions [16]. The first one is to allocate exclusive and/or non-exclusive access to compute nodes to users for a specific period of time so they can perform their work. The second is to provide a framework for starting, executing, and monitoring work on the group of nodes where the user has received permission. The last function is to arbitrate contention for resources by managing a queue of pending work.

The architecture of the software consists of a centralized manager (slurmctld) that performs the monitoring. This manager can also have a backup manager to avoid system-wide failures in case the primary one fails. On the node side, there is the slurmd daemon, which basically waits for communications from the manager. Once a job has been received, the daemon executes it and responds with the status. An optional database daemon (slurmdbd) is also available, which allows multiple clusters to be managed and the monitoring information to be stored in one single database. Users have different tools for initiating jobs, terminating jobs or retrieving system information, among others. These tools allow users to communicate either with the central manager or with specific nodes, depending on the command sent. A diagram can be found in Figure 15.


Figure 15. Slurm Architecture Diagram

The daemons can manage different entities, which are sets of nodes. These entities can be of several types, the simplest one being a single computing node. The other possible entities are partitions, which group nodes into logical sets; jobs, which represent resources assigned to a user for a specific time interval; and job steps, which are sets of tasks from the same job. The different entities can be seen in Figure 16.

Figure 16. Entities in Slurm

Additionally, Slurm has a modular design that offers the option to use it with different plugins that extend its functionality. There are more than 100 plugins available that enable different tasks such as implementation-specific MPI hooks, energy consumption gathering, container support, time sharing for parallel jobs or topology-optimized resource selection.

Finally, all this functionality can be configured with a simple configuration file. Using this file, it is possible to specify the nodes, with their cores, memory or disk space. It also allows the partitioning of the nodes and the organization of the different entities to be set up, as well as limits such as the maximum time for a job, the maximum number of nodes for a task or different policies, among other things.
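As an illustration of such a configuration file, a minimal slurm.conf fragment could look like the following sketch. All host names, node counts, sizes and limits here are invented for the example and are not part of this project:

```
# Hypothetical cluster description (names and sizes are examples only)
SlurmctldHost=manager            # node running the slurmctld daemon
ClusterName=example

# Compute nodes with their resources
NodeName=node[01-04] CPUs=8 RealMemory=16000 State=UNKNOWN

# A partition grouping the nodes, with a per-job time limit
PartitionName=debug Nodes=node[01-04] Default=YES MaxTime=30:00 State=UP
```

With a file like this, the manager knows which nodes exist, which resources each one offers, and how they are grouped into partitions with their scheduling limits.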

2.3.2. TORQUE

TORQUE is a resource manager that provides control over batch jobs and distributed computing resources. It is developed by Adaptive Computing, based on the earlier open-source project PBS (Portable Batch System). Optionally, it can integrate with Moab Cloud, a workload and resource orchestration platform developed by the same company [17]. Unlike Slurm, as of June 2018 TORQUE is no longer open source and is distributed under a proprietary license [18], but it is used in tens of thousands of government, academic and commercial sites all over the world [19]. Compared to PBS, it offers enhancements in key features:

• It improves fault tolerance by adding new failure conditions and support for node health check scripts.

• The scheduling interface is extended, adding support for the collection of statistics. Both the query and control interfaces are extended, providing the scheduler with additional and more accurate information and increasing control over job behaviour and attributes.

• For scalability, it incorporates support for handling larger clusters and jobs, with compatibility for clusters of over 15 teraflops or 2,500 processors, and for jobs of over 2,000 processors. It also offers support for larger server messages and an improved server-to-MOM communication model.

• Usability is enhanced by adding many different logging functionalities, as well as by making logs more human-readable.

2.3.3. IBM Spectrum LSF

Created by Platform Computing and formerly known as Platform Load Sharing Facility, or LSF, it was renamed Spectrum LSF after IBM bought the company in January 2012 [20]. It is a workload management platform and job scheduler for supercomputing, and part of IBM’s HPC suites. It is focused mainly on enterprise customers, offering security and fault tolerance.

In a similar way to Slurm, there is a master host, which is the machine where the main piece of code runs and that has the responsibility of coordinating all the other nodes. Additionally, several master host candidates can exist in the system, so one of them will assume control in case of any failure. The rest of the nodes in the cluster can be considered client hosts or server hosts, where clients can only submit jobs and servers can also run them. Depending on its state, a server host can be an execution host if it is running a task at the moment or a submission host if it is idle [21].

Jobs are managed using job queues. These queues can be system-wide or can deliver tasks to a group of nodes only. Each queue may have different job scheduling and control policies. Jobs are assigned transparently to the user, who just has to choose the queue to submit to, and LSF will assign the best resources available for the execution of the task. A diagram of the different components of the LSF architecture can be found in Figure 17.

Figure 17. Spectrum LSF Architecture

Each one of the hosts runs various daemon processes, depending on its role in the system. The master host runs all the functionality to monitor the system and schedule the different tasks, while the server hosts run the processes necessary to receive their jobs and return the results. Client hosts do not run any daemon, since it is not necessary.

Regarding security, LSF offers different roles that a user can have [22]. Every user is allowed to submit jobs, but the available resources will depend on the user’s assignment. The primary administrator is the one with permissions over the whole cluster. It can control all jobs, regardless of who submitted them, and is the only one who can change configuration files. There are further administrator roles, scoped by where they have permission: cluster administrator, queue administrator, host group administrator and user group administrator. None of these has permission to change any configuration file, as this privilege is only given to the primary administrator.

For authentication, the LSF security model tracks user accounts internally by default. A user account defined in LSF includes a password to provide authentication and an assigned role, such as administrator, to provide authorization. There is also the option to rely on external authentication systems if desired, such as Kerberos, Active Directory or LDAP. A diagram of the security model can be seen in Figure 18.

Figure 18. IBM Spectrum LSF Security Model

2.3.4. Univa Grid Engine

Figure 19. Univa Grid Engine

Univa Grid Engine is a batch-queuing system, forked from Sun Grid Engine. That happened after Univa acquired the software from Oracle in October 2018 [23]. From the time Oracle bought Sun until the product was given to Univa, it was called Oracle Grid Engine. Its purpose is to manage workloads automatically, maximizing resource sharing and accelerating the execution of any application, container or service. It can be deployed in many environments, such as on-premise, cloud, hybrid cloud or cloud-native HPC systems.

Univa Grid Engine allows workloads to be shared across machines in a data centre with the objective of optimizing the usage of the computing infrastructure. The scheduling policies can be applied to any work submitted, making sure that high-priority tasks are finished on time while at the same time keeping the utilization of the computing nodes as high as possible [24]. Another feature is the possibility to select resources at a granular level. This way, it supports complex topologies with multiple different CPUs, GPUs and network interfaces, optimising resource allocation. It also offers high scalability, with a single Grid Engine cluster able to contain up to 10,000 nodes and to run 200 million jobs per month. In a single on-premise environment, it has scaled up to 20,000 cores and, in a cloud environment, to 1 million cores.


2.4. FlexMPI tool

As stated in the previous section, MPI is the main standard used to implement applications that execute on high performance computing (HPC) clusters. Applications built to run on an HPC cluster tend to be parallel, typically with a high degree of parallelization. Regarding their capability to vary the number of processes used at a time, applications can be classified into four categories: rigid, mouldable, malleable, or evolving. Rigid and mouldable applications have in common that their number of processes remains fixed for the whole execution; the difference is that in mouldable applications this number is set at application start-up. Malleable and evolving applications may both vary the number of processes during execution time; the difference is that while the latter are autonomous and make the changes on their own when more processors are available, malleable applications make these changes under the control of an external Resource Management System (RMS). This makes malleable applications more flexible and efficient, as the RMS may take into account the global state of the cluster, including other applications and the priority of each, in order to set the policy for each of them. Although different RMSs offer malleable capabilities, MPI does not support this natively.

Regarding the design of dynamic reconfiguration techniques for malleable MPI applications, the complexity arises because it is not enough to simply modify the number of processes when resources become available; it is also necessary to take performance into account. Reconfiguration can actually decrease the application performance, not only because of the overhead of the action itself, but also because of an increase in communication and synchronization overheads. Complexity also grows when running on clusters with heterogeneous computing nodes.
FlexMPI is an MPI extension which supports malleability and implements performance-aware dynamic reconfiguration for iterative MPI applications, working as a library on top of the MPICH [25] implementation. It uses completion time as the performance objective, and it automatically reconfigures the application to use the number of processes necessary to achieve the target execution time. This reconfiguration takes place each time low performance is detected in the application, and it is based on user-given constraints. The prediction model decides the number of processes and the processors used, deriving the number of dynamically created processes from the efficiency constraint and, from the cost constraint, selecting the processors with the lowest cost (USD per CPU time) that satisfy the performance needs.

The development of FlexMPI is focused on single program multiple data (SPMD) applications, which tend to be iterative. This kind of application executes the same code in different processes, each of them with its own subset of the data. The common structure of these applications is an initialization section where the data is partitioned, followed by an iterative section where the processes operate in parallel, communicating with each other to find a global solution. As it is implemented as a library on top of MPICH, FlexMPI is fully compatible with all the features of the MPI-3 standard. The running environment of a FlexMPI application consists of the FlexMPI library, the MPI user application, the Performance API (PAPI) and MPI libraries, the user-given performance objective and performance constraints, and the resource management system, as Figure 20 shows.

Figure 20: FlexMPI application running environment

During execution, FlexMPI is organized in different modules that are responsible for different parts of the functionality. The first one is the monitoring module, which receives the performance metrics to be aggregated in each group of iterations of the application. Then, in each sampling interval, which consists of 100 iterations, the performance module receives the gathered metrics. This module uses the data received to track the performance of the applications and to decide whether it is necessary to reconfigure the

application by either adding or removing processes. The computational prediction model estimates the number of processes and the computing power, in FLOPS, required to satisfy the objective. With this prediction, the performance module maps the processes to the available processors, taking into account their number and type and the performance constraints. The dynamic process management module is the one implementing the process creation and removal functionalities and is responsible for rescheduling the processes according to the mapping.

Each time a reconfiguration is done, the data distribution between processes changes, which might lead to imbalance. To deal with this problem, the load balance module computes the new workload distribution based on the computing power of the nodes that are allocated to the application after every reconfiguration process. The last module is the data redistribution module, which is in charge of mapping and redistributing the data between processes following the workload distribution. After reconfiguration has finished, the application resumes its execution. Figure 21 shows the relationship of all the modules with each other and with the FlexMPI applications.

Figure 21: Workflow related to FlexMPI

3. Environment Description

In this section, the environment where the project has been carried out will be described, analysing both the development environment and the socio-economic environment.

3.1. Development environment

In this section, the decisions made about the different aspects of the environment will be stated. The topics described here are the programming languages chosen for developing the code, as well as the different tools and libraries used for simplifying development and adding the desired functionality. The project is made of two connected parts: the GUI (Graphical User Interface) developed for the visualization of the monitored data, and the controller, which manages the FlexMPI applications. Different options have been chosen for each part, so they will be explained separately.

3.1.1. GUI

The GUI is the main component developed during this project. It allows the user to visualize all the metrics generated about the performance of the applications running in the cluster, and to interact with the system to solve problems when they appear, such as contention in computing nodes.

For the programming language, the choice for the GUI was Java. It is a general-purpose object-oriented programming language, developed by Sun Microsystems, which was later bought by Oracle [26]. Oracle provides a free implementation, and there is also the OpenJDK implementation, which is open source. Java is one of the most used programming languages [27], having a big collection of libraries and tools. It is easy to learn and use. Additionally, even if it is not as fast as lower-level languages, it is still one of the most efficient languages [28]. This is more than enough for a program that will not require high performance, since most of its execution will consist of listening for events and network calls, without a high load. In exchange for its lower performance, it offers helpful characteristics: it is highly portable since it runs in a virtual machine, error verification at compilation and execution time makes it easier to debug, and the garbage collector removes the need to manage memory manually. It also provides all the functionality needed for threads and sockets, necessary for the connection with the rest of the system.

The reasons to choose this programming language have been:

• Easy to learn and use.
• More-than-enough performance.
• High portability.
• Wide usage, support and community.
• Support for the necessary functionality.

Along with the programming language, some tools and libraries can be chosen in order to ease the development and the implementation of some of the functionalities. When working with Java, the most common practice is to use an IDE (Integrated Development Environment), since it helps with many of the tasks of the development process. For this reason, the only software used will be the Eclipse IDE [29]. This tool offers many advantages when working with Java, which will be discussed next.

Eclipse is an IDE currently developed by the Eclipse Foundation. It is a multiplatform application distributed under a free license (Eclipse Public License, EPL), which allows free usage. It offers a wide variety of features, including code autocompletion, debugging tools and utilities for refactoring. In addition, many plugins that extend its functionality can be obtained through its plugin marketplace.

Despite these many functionalities, the main objective of using a specific development tool is to increase productivity. For that reason, having the shortest possible learning curve is a good criterion when choosing which tools to use, especially in one-person projects. Eclipse is the most used IDE for Java development, and specifically it is the one used in all the UC3M lectures that work with Java. This, in addition to its functionality and free licensing, makes it the best choice, since there will be no need to learn new configurations that could slow down development, compared to other options such as NetBeans (which is free) [30] or IntelliJ IDEA (which is paid software) [31].

The main functionalities dependent on libraries are the GUI components and the plotting of the different metrics. For the implementation of the GUI the options considered have been Apache Pivot and Java Swing, and for the chart plotting JFreeChart and JChart2D.
Apache Pivot is an open-source platform for building installable Internet applications [32], maintained by the Apache Software Foundation. It was considered as the first option to develop the application interface, but it was discarded in favour of Swing.

Java Swing is a GUI widget toolkit developed by Oracle for Java, and it is part of Oracle’s JFC (Java Foundation Classes), a framework for building portable GUIs in Java [33]. Both toolkits offer everything needed for the project, Swing being the chosen one for a few reasons:

• Better documentation and community.
• Ease of use.
• Developed by the language maintainer, thus no need for external libraries.

JFreeChart is a free Java chart library [34], distributed under the GNU LGPL (Lesser General Public License). It has support for many output types, including Swing, image files and vector graphics file formats.

JChart2D is a minimalistic chart library. It is also free software, distributed under the GNU LGPL. Although it has less functionality than other charting libraries, it is focused on reducing the overhead of plotting real-time data, allowing dynamic display at run time. It is also compatible with Swing, is simple to use and has straightforward documentation. Both solutions would be valid for the project, JChart2D being the choice for the following reasons:

• It offers simplicity for the implementation of real-time chart visualization.
• It has exactly the functionality needed, without extra characteristics that are not needed.
• It is focused on reducing the overhead of using it, which improves the overall performance of the application.

3.1.2. Controller

The controller is the component that connects all the other parts. It is in charge of receiving all the messages and sending them to their destination. Since this component only communicates with the GUI over the network, there is no need to use the same language, as the requirements here differ. This component needs high performance, as its main task is to process large amounts of data quickly and send them where necessary. To cover this performance requirement, the C programming language was chosen for the development.

C is a general-purpose programming language mainly oriented towards systems programming. It is one of the main languages used for the implementation of most of the operating systems in use nowadays (Linux/Unix, Windows, MacOS), and also in the development of applications on embedded systems and supercomputers. It is cross-platform, as most systems have a C compiler, and it is known for its high performance compared to other programming languages. The reasons for this performance are the low-level access to memory that the language provides, the efficient mapping between C code and machine instructions, and the minimal runtime system used for executing compiled applications. These characteristics often cause it to be referred to as a medium-level language. The higher performance of the language also makes it more complex than others, leaving, for example, memory management as the total responsibility of the developer, compared to other languages that have a garbage collector.

This was the language of choice in previous stages of the project, so it is the natural option given that this new component will be integrated with the rest of the system. Given that the size of this component is relatively small, the higher complexity of using this language has low relevance compared to the benefits. The reasons to choose this programming language are:

• It has been used in previous stages of the project.
• High performance and portability.
• Compatibility with the necessary libraries.
• Widely used in systems programming.

When working with C, especially in not-so-large projects, it is common to use a code editor instead of an IDE. There is a wide variety of solutions for this, but the chosen one has been Visual Studio Code [35]. For compiling the code, the choice has been GCC (GNU Compiler Collection) [36].

Visual Studio Code is a code editor developed by Microsoft. It is open source and free to use, distributed under an MIT license. It is compatible with the three most widespread operating systems (Linux, Windows and MacOS) and has support for most programming languages. It is highly customizable, offering many functionalities, and it also supports plugins and extensions to add functionality or support for more languages.

GCC is a compiler system produced by the GNU Project. It was originally developed only for the C language and named GNU C Compiler, but it was later renamed GNU Compiler Collection when support for other languages was implemented. It is distributed under the GNU GPL License (GNU General Public License) and is open source, as is common practice in the GNU Project. It is the most used C compiler and has become the de facto industry standard when compiling for GNU/Linux systems. It also supports most of the processor families used nowadays.

3.1.3. Common environment

One of the advantages of using development tools that are available on all major operating systems has been the possibility to work with all of them. The code has been developed using Windows, Linux and MacOS throughout different phases of the project. Regarding the GUI, since one of the objectives was to make it multiplatform, this has been useful in order to test that the application works on all the systems. The controller is a piece of software meant to work only under Linux/Unix systems, so even though development has been done on all systems, testing has only been carried out under Linux, using a virtual machine when the development machine was not running the needed OS.

To manage the usage of different systems, the tools used are ssh and sshfs: the former allows connecting remotely to a different computer's shell to run any command, and the latter uses ssh to mount a remote computer's file system in order to work locally as if the files were on our own computer. scp has also been used, as it allows files to be transferred securely and easily from one computer to another. As these tools are only available for Unix-like systems, on Windows the solution was to use PuTTY [37] and WinSCP [38], applications that offer the functionality of ssh and scp with a graphical interface that is really convenient to use. Finally, testing has been carried out both locally on the development machine and on the University cluster, Tucán. This cluster also runs a Linux distribution.

3.2. Socio-economic environment

HPC is one of the hot topics in computer engineering nowadays, as it has become necessary to process all the data generated by all kinds of new technologies, such as artificial intelligence or the growing number of IoT devices. It is also a key tool in scientific research, allowing different physical or biological phenomena to be simulated in a way that cuts costs significantly and drastically reduces the time needed to reach a relevant discovery. Just as with computing power in general, the performance achieved by HPC systems is growing fast year after year, as seen in Figure 22 [39], and the community around HPC is also growing exponentially, as Figure 23 shows [40].

Figure 22: HPC Performance Development


Figure 23: HPC user base growth

As the topic gains relevance in the eyes of governments and big companies, the investments in the area grow. The European Union has created the European High-Performance Computing Joint Undertaking [41] with a budget of €1 billion for developing top-level supercomputers between 2018 and 2026, with an additional €400 million from private members. In contrast, the budget of the previous period was €700 million [42], half as much.

This tool is aimed at system administrators. The objective of the project is to offer a lightweight alternative for monitoring the performance of applications running on HPC systems. This way, costs can be cut both by having a light tool and by improving the efficiency of the running applications with its functionality. Being an academic project, both free and open source, there is no intention of gaining any economic benefit. The main benefit sought is related to the academic environment. This means that the project is more successful the more usage it gets, along with the references it obtains in different academic fields, as well as by achieving collaboration projects with different universities and institutions. Nevertheless, in this kind of project there is the possibility to obtain a benefit through support contracts. An example of that is Red Hat, a company based on offering a Linux distribution, which makes the majority of its profit with support subscriptions, with licenses that can cost US$1,299 with a year of premium support or up to US$3,096 with all the possible add-ons included [43]. Proof of the potential of this kind of business model is the purchase of Red Hat by IBM for around $34 billion last year [44].

Given the high energy consumption of an HPC system, executing parallel applications can be expensive. It is for this reason that efficiency is really important, since a small improvement can mean a big difference in cost. Therefore, a free and open tool can benefit many actors both in industry and in the academic environment, especially in fields related to services and infrastructure.

4. Description of the Proposed Architecture

In this section the designed architecture will be described, showing the different components that form it and how they relate to each other.

4.1. Overview

The system works in a distributed way, enabling the execution of different components on different computers. On one side, the GUI is intended to be launched on the local computer of the user. The GUI will connect to the cluster and receive the communications to show them to the user. The rest of the system is meant to be executed on the cluster, although that is not strictly necessary for all the components. In the middle, the controller is the component in charge of communicating with all the different parts, sending the information about the metrics from the applications to the GUI, and the commands from the GUI to the applications. A diagram of the components and their relationships can be seen in Figure 24.

Figure 24: Diagram of FlexMPI components

In this project, the components of the system that have been developed are the GUI and some elements of the controller. The communications between both components have also been part of the work. As seen in Figure 24, not every part of the system communicates directly with every other. As the arrows in the diagram show, the only interaction the GUI has with the rest of the system is with the controller. This component is responsible for centralizing all the communications. It receives the metrics of the applications and sends them to the GUI to be plotted. It also receives the commands from the GUI and redirects them to the corresponding application.

Of course, that is a simple general overview of the major components of the system. Each one of the components consists of different threads, each with a specific responsibility. The three major parts into which the system can be divided are the application statistics service, the application command sending and the contention monitor. A more in-depth overview of the different threads can be seen in Figure 25, and they will be explained throughout the following sections.

Figure 25: Overview of the threads of the system

4.2. Components

The system consists of three main components. The GUI has the function of showing metrics and information to the user, and of sending commands to the controller. The controller is the component that connects to the GUI and to the applications, relaying metrics and commands between them. Lastly, the applications also execute part of the functionality, sending the metrics when necessary and listening for commands. Each component will be described in this section, showing some pseudocode to illustrate the logic of different parts of these components.

4.2.1. GUI

As stated before, the GUI is the component responsible for showing the metrics and sending the commands from the user to the rest of the system. It runs several threads to avoid the possibility of the system blocking, as well as functions launched on specific events. The main thread of the GUI is used to receive the connection from the controller, and also to receive contention data and the registration of new applications. The logic of this thread looks like the following:

Algorithm 1 Main thread of GUI

1: init ()
2: while running do
3:     message = receiveControllerMessage ()
4:     if message == REGISTER_MESSAGE then
5:         runInitialControllerRegistration (message)
6:     else if message == APP_MESSAGE && controller is registered then
7:         runAppRegistration (message)
8:     else if message == MONITOR_MESSAGE && controller is registered then
9:         runMonitorRegistration (message)
10:    end if
11: end while
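The dispatch logic of Algorithm 1 can be sketched in Python. The message codes (REGISTER, APP, MONITOR) and the handler return values are illustrative stand-ins, not the actual FlexMPI constants:

```python
# Hypothetical sketch of the GUI main loop's dispatch step (Algorithm 1).
def dispatch(message, state):
    """Route one controller message to the matching registration handler."""
    code = message.split(":", 1)[0]
    if code == "REGISTER":                      # initial controller registration
        state["controller_registered"] = True
        return "controller"
    if not state.get("controller_registered"):  # ignore anything sent too early
        return None
    if code == "APP":                           # new application registration
        return "app"
    if code == "MONITOR":                       # contention monitor registration
        return "monitor"
    return None
```

Keeping the dispatch in a single function makes it easy to ignore application or monitor registrations that arrive before the controller itself has connected.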

For sending the commands to each application, asynchronous events are used, as is common practice for GUI components. There are also two threads associated with each application, but they are only executed when the statistics service is activated, since they are used to gather and process the application metrics. The first thread just waits for a new message with statistics information, parses it and stores it in a circular buffer. The functionality of the second thread is to read the

buffer in order to process the data, plot the metrics into the graph when the user displays it, or save them to disk if that option is activated. This is done so that, if the frequency of statistics messages becomes high at some point, the socket can be read at a matching frequency without the thread being stuck processing the data. The logic of the statistics collector thread is very simple. It is described in this pseudocode:

Algorithm 2 Statistics collector thread

1: init ()
2: while running do
3:     message = receiveStatisticsMessage ()
4:     if message == STATISTICS_MESSAGE then
5:         saveToBuffer (message)
6:     end if
7: end while
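A minimal Python sketch of this collector, assuming a "STATS" message prefix and an arbitrary buffer size (both illustrative, not the actual FlexMPI values):

```python
from collections import deque

# Sketch of the statistics collector (Algorithm 2): each received statistics
# message is appended to a bounded circular buffer so the socket can be
# drained quickly, leaving the heavy parsing to the buffer reader thread.
STATS_BUFFER = deque(maxlen=1024)   # circular: oldest entries are dropped

def collect(message):
    """Store a statistics message; ignore anything that is not statistics."""
    if message.startswith("STATS:"):
        STATS_BUFFER.append(message)
        return True
    return False
```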

The buffer reader thread is the one taking and processing the information. When it is executed, it first initializes the necessary structures and then loops reading the data from the buffer. Aside from saving the data to disk when necessary, it is also possible to analyse the metrics in order to execute actions or raise alerts. The pseudocode for this thread is shown below. The last thread in the GUI is a single one that is launched when the monitor command is registered. This thread parses the information about the nodes that are running in the cluster and generates a grid with them.

Algorithm 3 Buffer reader thread

1: init ()
2: while running do
3:     message = waitForBufferRead ()
4:     parseMetrics (message)
5:     plotMetrics (message)
6:     if writeToDisk == True && bufferMoreThan50%Full then
7:         writeToDisk ()
8:     end if
9:     action = checkDataForActions ()
10:    if action != null then
11:        execute (action)
12:    end if
13: end while
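The reader side can be sketched as follows; the 50% flush threshold mirrors the pseudocode, while the colon-separated metric format is an assumption:

```python
from collections import deque

# Sketch of the buffer reader (Algorithm 3): drain the shared circular buffer,
# parse each metrics line, and report whether a disk flush is due.
def drain(buffer, write_to_disk=False):
    """Parse every buffered metrics line; return (parsed rows, flush flag)."""
    # Flush when persistence is on and the buffer is more than half full,
    # matching the pseudocode's bufferMoreThan50%Full check.
    flush = bool(write_to_disk and buffer.maxlen
                 and len(buffer) > buffer.maxlen // 2)
    parsed = [[float(v) for v in raw.split(":")] for raw in buffer]
    buffer.clear()
    return parsed, flush
```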

After the first part, it stays listening for contention alerts, creating colour alerts and automatically executing actions if necessary to solve the notified problem. The pseudocode of this thread is the following:

Algorithm 4 Monitor listener thread

1: init ()
2: // Parse nodes in cluster
3: message = getRegistrationMessage ()
4: nodesList = parse (message)
5: createNodesGrid (nodesList)
6: while running do
7:     message = receiveContentionMessage ()
8:     node, info = parse (message)
9:     node.update (info)
10:    if node.autoSolve == True then
11:        solveContention (node)
12:    else
13:        showColourAlert (node)
14:    end if
15: end while

4.2.2. Controller

In the same way as the GUI, the controller executes different threads for different functionalities. Although there are more threads that might be executed for other purposes, only the main thread and the ones related to the interaction with the GUI will be described here. The main thread is just responsible for initializing all the necessary variables and threads, and then waiting for user input from the terminal. The pseudocode of this thread is the following:

Algorithm 5 Main thread of controller

1: init ()
2: parseParameters ()
3: createGUIConnectionThread ()
4: initializeApplications ()
5: while running do
6:     readTerminalInput ()
7:     processTerminalInput ()
8: end while

After the main thread, the next thread to be executed is the GUI listener thread. This thread is responsible for connecting the controller to the GUI. After that, it loops waiting for commands coming from the GUI to the applications. The logic of this thread is as follows:

Algorithm 6 GUI Listener thread

1: guiPort, guiAddress = getGUIPortAndAddress ()
2: registerToGUI (guiPort, guiAddress)
3: while running do
4:     command, appId = receiveGUICommandMessage ()
5:     sendMessageToApp (command, appId)
6: end while
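A sketch of the forwarding step, assuming the command layout of Section 4.3.4 ("&lt;app id&gt; @ &lt;command&gt;...") and a simple app-to-port table standing in for the controller's internal structures:

```python
# Hypothetical sketch of the controller's GUI listener (Algorithm 6): a
# command arriving from the GUI carries the target application's ID, which
# selects the socket/port on which the command is forwarded.
def route_command(message, app_ports):
    """Return (destination port, payload) for the addressed application."""
    header, payload = message.split("@", 1)   # "<app id> @ <command>..."
    app_id = header.strip()
    return app_ports[app_id], payload.strip()
```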

After the connection with the GUI, it is time to initialize the applications. Each application has a matching thread in the controller. This thread holds the responsibility of registering each application with the GUI and of monitoring the app status. When the statistics service is active for an app, it is from this thread that the GUI receives the execution metrics. Different states of the application are also monitored here, such as whether the application has terminated. This is the pseudocode of this thread:

Algorithm 7 App management thread in controller

1: init_app ()
2: connectToAppInCluster ()
3: registerAppToGUI ()
4: while running do
5:     message = receiveApplicationInformation ()
6:     if parse (message) == APPLICATION_TERMINATED then
7:         exitThread ()
8:     else if parse (message) == DIFFERENT_POSSIBLE_MESSAGES then
9:         executeNecessaryActions ()
10:    else if parse (message) == STATISTICS_MESSAGE then
11:        parseStatistics (message)
12:        if isGUIWaitingForStatistics == True then
13:            sendStatistics ()
14:        end if
15:    end if
16: end while

The last thread to be launched is the one dealing with the contention monitoring. Similar to the other threads described, this thread first executes some set-up instructions and afterwards repeats the same tasks in order inside a loop. This is the pseudocode for this thread:

Algorithm 8 Contention monitor thread

1: connectToMonitor ()
2: connectToGUI ()
3: while running do
4:     message = receiveContentionMessageFromMonitor ()
5:     sendMessageToGUI (message)
6:     processMessage (message)
7: end while

4.2.3. Application

The logic inside the applications that run using FlexMPI is simple, and it is intended to add the lowest possible overhead to the application itself. As the applications run an iterative program, the only extra logic is to check every 100 iterations whether a command has been received and process it, or to send back to the monitor the metrics about the status of the application if necessary. This logic works the following way:

Algorithm 9 Application running FlexMPI

1: init ()
2: for each iteration ∈ appIterations do
3:     executeAppParallelComputing ()
4:     command = checkForReceivedCommand ()
5:     if command != null then
6:         processCommand (command)
7:     end if
8:     if statisticsServiceActive then
9:         sendStatistics ()
10:    end if
11: end for
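The per-iteration hook can be sketched like this; `poll_command` and `send_statistics` are hypothetical stand-ins for the FlexMPI calls, and the every-100-iterations check mirrors the text above:

```python
# Sketch of the FlexMPI hook inside the application loop (Algorithm 9): the
# command check only runs every 100 iterations to keep the added overhead low.
def run_iterations(n, poll_command, send_statistics, stats_active=False):
    """Run n iterations; return how many commands were handled."""
    handled = 0
    for i in range(1, n + 1):
        # ... executeAppParallelComputing() would run here ...
        if i % 100 == 0:                 # cheap: one check per 100 iterations
            command = poll_command()
            if command is not None:
                handled += 1             # processCommand(command) in the tool
        if stats_active:
            send_statistics(i)           # report metrics back to the monitor
    return handled
```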

4.3. Interfaces definition

As the system is composed of different independent components, the way to interconnect them is through the exchange of messages over the network. All these communications are done using UDP sockets. The advantage of UDP sockets compared to TCP sockets is that they are more lightweight [45]. Although TCP sockets offer interesting features such as reliability and ordering, the lower overhead of UDP is more relevant here. Since the majority of messages carry application metrics, the traffic is comparable to audio or video streaming, where it is not that important if one of the messages is lost. Ordering is also not relevant, since the metrics are plotted using the timestamp as one of the parameters received. In this section, the different interfaces of the components for each functionality will be described, showing diagrams of how the communications work. All the message strings consist of different values separated by a colon “:”. The at character “@” is also used to separate some information in certain messages.
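A generic splitter for this wire format might look as follows (a sketch, not the actual GUI parser):

```python
# Sketch of a generic parser for the FlexMPI-GUI wire format: fields are
# separated by ":" and, in some messages, groups are separated by "@".
def split_message(message):
    """Split a message into @-groups of :-separated, stripped fields."""
    return [[field.strip() for field in group.split(":") if field.strip()]
            for group in message.split("@")]
```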

4.3.1. Controller-GUI connection

This is the initial interaction between the controller and the GUI, where the controller connects to the GUI in order to later register the applications. Before, the GUI stays passively listening to a previously chosen socket. This message is used basically to register in the GUI the address and port that will be listening for the commands sent by the user. This command has the following format:

&lt;connection code&gt;: 0

The only important part is the first parameter, which indicates that this is a connection message, so the GUI stores the information about where to send the messages to the controller if necessary. The rest is irrelevant, and it is just meant to keep consistency with the rest of the messages. Extra information for more functionality could be added to this message in later versions. The flow diagram representing the controller connection process is shown in Figure 26.


Figure 26: Flow diagram of controller-GUI connection

4.3.2. Application registration

The second interaction that the components of the system will have is the application initialization and registration in the GUI. In this process, the controller launches an application and then informs the GUI that the application has been initialized. The format of this message is the following:

APP: &lt;app name&gt;: &lt;timestamp&gt; @ &lt;format filename&gt; @ &lt;app id&gt; @ &lt;command port&gt;

In this message, the first parameter (“APP”) indicates that it is a message with information about a new application, the second parameter being the application name and the third one the timestamp when the application was launched. This timestamp is used to differentiate between applications that have the same name, as well as providing data about the moment the application was launched. The next parameter, “format filename”, is the name of the XML file where the format of the metrics for that application is described. This file allows each application to easily have different metrics, keeping the statistics messages smaller. The “app id”, as its name states, is the ID of the application as stored in the controller, which will be used to identify the application when the commands are sent by the user. Lastly, the “command port” is the port number where the GUI will send the commands for the application. When this message is received, the data is parsed by the GUI and the necessary threads and GUI components are initialized and created. A flow chart of this process can be seen in Figure 27.

Figure 27: Flow diagram of application registration
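Parsing the registration message into the named fields described above can be sketched as follows; the field order comes from the message format, while the dataclass is an illustrative structure:

```python
from dataclasses import dataclass

# Sketch of parsing the application-registration message of Section 4.3.2.
@dataclass
class AppRegistration:
    name: str
    timestamp: str
    format_filename: str   # XML file describing the metrics format
    app_id: str
    command_port: int      # where the GUI sends commands for this app

def parse_app_registration(message):
    """Parse 'APP: <name>: <ts> @ <format file> @ <app id> @ <port>'."""
    head, fmt, app_id, port = [p.strip() for p in message.split("@")]
    code, name, timestamp = [p.strip() for p in head.split(":")]
    assert code == "APP", "not an application registration message"
    return AppRegistration(name, timestamp, fmt, app_id, int(port))
```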

4.3.3. Monitor registration

After launching and registering the applications, the next step is to connect the contention monitor. The format of this message string is similar to the previous ones:

MONITOR: &lt;node 1&gt;: … : &lt;node n&gt;

After the first part of the message, which states that it is the monitor registration, the following parameters correspond to the nodes found in the cluster. Using these names, the GUI represents the nodes in a grid to show the problems when they are notified by the controller. When the monitor connection message is received, a thread is created to listen for those alerts. Along with the thread, a socket is initialized, and the port number is sent back to the controller so that the contention notifications are sent to that port. The flow diagram for this process is described in Figure 28.

Figure 28: Flow diagram of contention monitor thread connection
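A sketch of parsing this message, keeping the node order from the message as the stable grid order:

```python
# Sketch of the monitor-registration handling: the node names in the message
# fix the grid order, which the GUI keeps unchanged for the whole execution.
def parse_monitor_registration(message):
    """Parse 'MONITOR: <node 1>: ... : <node n>' into an ordered node list."""
    code, *nodes = [p.strip() for p in message.split(":")]
    assert code == "MONITOR", "not a monitor registration message"
    return nodes   # grid order == message order
```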

4.3.4. Command sending

Once an application is registered in the GUI, it is possible to send different commands to the app. The different commands will be described later in the report, but they all have the same format:

&lt;app id&gt; @ &lt;command code&gt;: &lt;parameter 1&gt;: … : &lt;parameter n&gt;

The app ID is the ID of the application as received in the registration message. This ID is used by the controller to know where to redirect the command. The command code corresponds to the action that has to be executed, and the parameters are optional and vary among the different commands. Every command is sent using a button, which launches an asynchronous event that parses the data and sends the message to the controller in a non-blocking way. Before sending the command, it is checked that it is correct and not redundant (like activating statistics when they are already active). If everything is correct the command is sent. The flow diagram of this process is described in Figure 29.

Figure 29: Flow diagram of command sending
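The button-handler logic can be sketched as follows; the command code names and the redundancy check are illustrative assumptions:

```python
# Sketch of the GUI-side command path (Section 4.3.4): the handler builds
# "<app id> @ <command code>: <parameters>" and refuses redundant commands,
# such as activating statistics when they are already active.
def build_command(app_id, code, params, app_state):
    """Return the command string to send, or None if it would be redundant."""
    if code == "STATS_ON" and app_state.get("stats_active"):
        return None                      # redundant: statistics already active
    tail = ":".join([code] + [str(p) for p in params])
    return f"{app_id} @ {tail}"
```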

4.3.5. Application metrics

When the statistics service is active, the GUI receives the metrics of each application on a port specific to that application. For that reason, there is no need for application-specific data in the message. The metrics arrive to the GUI with the following format:

&lt;value 1&gt;: … : &lt;value n&gt;: &lt;timestamp&gt;

In this string, the different values correspond to the different metrics that are configured to be received. The last value is the timestamp, which is always added by default regardless of the other metrics, since it is indispensable for plotting the graphs. Once the metrics arrive at the GUI, the values are processed. Then, alerts are shown if necessary, and if the graph is open, the metrics are plotted. The diagram is shown in Figure 30.

Figure 30: Flow diagram of metrics retrieval process
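A sketch of the metrics parsing, with the metric names assumed to come from the application's XML format file:

```python
# Sketch of metrics parsing (Section 4.3.5): the last field is always the
# timestamp; the remaining fields are matched positionally against the
# metric names configured for the application.
def parse_metrics(message, metric_names):
    """Return ({metric name: value}, timestamp) for one metrics message."""
    *values, timestamp = message.split(":")
    assert len(values) == len(metric_names), "message/format mismatch"
    return dict(zip(metric_names, map(float, values))), float(timestamp)
```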

4.3.6. Contention notification

The last interaction between the controller and the GUI is the notification of contention in some node of the cluster. These notifications arrive at the thread that is created after the monitor is registered in the GUI. This is the format of the contention message string:

&lt;node name&gt;: &lt;app 1&gt;: &lt;processes of app 1&gt;: … : &lt;app n&gt;: &lt;processes of app n&gt; @ &lt;metrics&gt;

In the message, the node name refers to the node that is suffering the contention. The node name is followed by the applications that are running in that node, with the number of processes that each one is executing in the node. The last part is the information about the metrics that are being monitored to detect contention. These metrics are the percentages of CPU, memory, cache and network that are being used by the node. This communication has two parts; the first one, where the notification arrives, is described in Figure 31. The second part of the communication is for solving the contention problem, and it varies depending on whether it is in auto mode or the solution is applied manually.

Figure 31: Flow diagram of contention message received
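Parsing this message can be sketched as follows, based on the layout described above:

```python
# Sketch of parsing the contention notification: node name, then alternating
# app / process-count pairs, then the node-level metrics after the "@".
def parse_contention(message):
    """Return (node name, {app: process count}, [metric values])."""
    head, metrics = [p.strip() for p in message.split("@")]
    fields = [p.strip() for p in head.split(":")]
    node, pairs = fields[0], fields[1:]
    apps = {pairs[i]: int(pairs[i + 1]) for i in range(0, len(pairs), 2)}
    return node, apps, [float(m) for m in metrics.split(":")]
```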

The process for solving the contention manually or automatically is almost the same. It consists in removing an application from the problematic node and launching its processes on a different node with less load. The only difference between the manual and the auto mode is the way the application to be removed is selected. The application is selected by the user in manual mode, while in auto mode the GUI itself selects the application based on a set of rules. The diagram for this process is shown in Figure 32.

Figure 32: Flow diagram of solving contention problem
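The rule set used by the auto mode is not detailed here, so the sketch below assumes one simple, plausible rule: move the application with the most processes on the congested node, so the migration frees the most load.

```python
# Illustrative auto-mode selection rule (an assumption, not the thesis's
# actual rule set): evict the app holding the most processes on the node.
def pick_app_to_move(apps_on_node):
    """apps_on_node: {app name: process count} -> app to relocate."""
    return max(apps_on_node, key=apps_on_node.get)
```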

4.4. Requirements analysis

This section shows the software requirements that have been obtained for the project. They define how the system will work after its implementation. These requirements were defined throughout the whole project, as they changed in different phases as the objectives of the project itself changed. Those shown here are the definitive version. The template that will be used is the following, in Table 1.

Identifier

Title

Description

Priority

Necessity

Table 1: Requirement Definition Template

• Identifier: FR-XX for functional requirements and NFR-XX for non-functional requirements. XX will be the number of the requirement.
• Title: Brief definition of the requirement.
• Description: Explanation of the requirement.
• Priority: Importance of the requirement. Can be high, medium or low.
• Necessity: Relevance for the project. Can be necessary, desired or optional.

4.4.1. Functional requirements

Identifier: FR-01

Title Connection time.

Description The controller has to connect to the GUI on launch, so the GUI should be launched before the controller and wait for connection.

Priority High

Necessity Necessary

Table 2: Functional Requirement FR-01

Identifier: FR-02

Title Applications view.

Description All the applications shall appear on a list in the GUI, so the user can find each of them easily.

Priority Low

Necessity Optional

Table 3: Functional Requirement FR-02

Identifier: FR-03

Title Application identification.

Description Each application shall be individually identified.

Priority High

Necessity Necessary

Table 4: Functional Requirement FR-03


Identifier: FR-04

Title Control processes.

Description The user shall be able to launch and delete processes using the GUI to send the commands.

Priority Medium

Necessity Necessary

Table 5: Functional Requirement FR-04

Identifier: FR-05

Title Application termination.

Description The user shall be able to terminate an application using the GUI to send the command.

Priority High

Necessity Necessary

Table 6: Functional Requirement FR-05

Identifier: FR-06

Title Policies control.

Description The user shall be able to launch different policies for an app using the GUI to send the commands.

Priority Medium

Necessity Necessary

Table 7: Functional Requirement FR-06


Identifier: FR-07

Title Start statistics service.

Description The user shall be able to start the statistics service using the GUI to send the command.

Priority High

Necessity Necessary

Table 8: Functional Requirement FR-07

Identifier: FR-08

Title Stop statistics service.

Description The user shall be able to stop the statistics service using the GUI to send the command.

Priority Low

Necessity Desired

Table 9: Functional Requirement FR-08

Identifier: FR-09

Title Individual metrics.

Description The user shall be able to request individual metrics using the GUI to send the command.

Priority Low

Necessity Optional

Table 10: Functional Requirement FR-09


Identifier: FR-10

Title Data persistence.

Description Metrics received about applications have to be stored persistently.

Priority Low

Necessity Optional

Table 11: Functional Requirement FR-10

Identifier: FR-11

Title Graphs representation.

Description The graphs plotted have to be based on time.

Priority High

Necessity Necessary

Table 12: Functional Requirement FR-11

Identifier: FR-12

Title Metrics choice.

Description The user shall be able to visualize the different metrics of their choice, either together or individually.

Priority High

Necessity Necessary

Table 13: Functional Requirement FR-12


Identifier: FR-13

Title Metrics configuration.

Description The metrics that will be received shall be configurable by the user before launch.

Priority Medium

Necessity Desired

Table 14: Functional Requirement FR-13

Identifier: FR-14

Title Visual alerts.

Description There shall be visual alerts when notifications arrive from the system to the GUI.

Priority Medium

Necessity Desired

Table 15: Functional Requirement FR-14

Identifier: FR-15

Title Contention information.

Description There shall be a section in the GUI to show whether a node has contention.

Priority High

Necessity Necessary

Table 16: Functional Requirement FR-15


Identifier: FR-16

Title Apps in nodes information.

Description The apps running in each node should be visible when receiving a contention alert.

Priority Medium

Necessity Necessary

Table 17: Functional Requirement FR-16

Identifier: FR-17

Title Solve contention manually.

Description There shall be an option to solve contention manually by choosing one of the apps in the node.

Priority High

Necessity Necessary

Table 18: Functional Requirement FR-17

Identifier: FR-18

Title Solve contention automatically.

Description There shall be an option to solve contention for a node automatically.

Priority Medium

Necessity Desired

Table 19: Functional Requirement FR-18


Identifier: FR-19

Title Distributed system.

Description The GUI and FlexMPI shall also work on separate systems.

Priority High

Necessity Necessary

Table 20: Functional Requirement FR-19

4.4.2. Non-Functional Requirements

Identifier: NFR-01

Title FlexMPI integration.

Description The GUI shall work with FlexMPI.

Priority High

Necessity Necessary

Table 21: Non-Functional Requirement NFR-01

Identifier: NFR-02

Title GUI independence.

Description The GUI shall be independent of the rest of the system.

Priority High

Necessity Necessary

Table 22: Non-Functional Requirement NFR-02

Identifier: NFR-03

Title Controller reliability.

Description The controller shall not stop on disconnection with the GUI.

Priority High

Necessity Necessary

Table 23: Non-Functional Requirement NFR-03


Identifier: NFR-04

Title GUI reliability.

Description The GUI shall not stop if there is a problem delivering a message.

Priority High

Necessity Necessary

Table 24: Non-Functional Requirement NFR-04

Identifier: NFR-05

Title GUI buttons.

Description The GUI has to offer buttons for all the functionality.

Priority High

Necessity Necessary

Table 25: Non-Functional Requirement NFR-05

4.5. System design

Being a complex system with different components, there are different parts to describe. In this section the design of the different components and the system itself will be defined.

4.5.1. Container classes

A system running FlexMPI can be divided into nodes, applications and processes. These three different units are each modelled with a different container class. The classes are named containers because they are developed to contain the information about each entity. These three objects are related to each other as shown in Figure 33. The processes are related to the applications with a many-to-one relationship, while the applications and the nodes have a many-to-many relationship, where an app can be running on several nodes and a node can host more than one app at the same time. Besides the description of each class, a class diagram describing the different containers and the relationships between them can be found in Figure 34.

Figure 33: Relationship between nodes, apps and processes

The most basic container class is the process, which is inherited from the first phase of the project. Originally, the metrics for each application arrived per process, and they were stored individually in order to plot them separately. When starting the second phase, the metrics were changed to arrive aggregated by application, which helped to reduce the network usage by sending shorter messages.

To adapt the GUI to the new system, the simplest option was to keep things basically as they were implemented, keeping the process container class. The difference regarding the GUI is that each app appears to have just one process, although the actual number of processes is still available with this change. Keeping the design this way allows sending data per process again, if desired, with little change to the system. The process container stores the different metrics that arrive for an application, and it is where the GUI extracts the metrics to plot them in the chart, using a thread-safe buffer. This class also has the responsibility of writing to disk the metrics of the process it is related to, when that functionality is active. The processes are related to their parent application. The application container holds the different processes that belong to each application, as well as all the information about the app. The information about an application includes the application name and ID, the names of the metrics that are being monitored for this application, the sockets used to communicate with the controller, and the buttons and labels used to send the application commands and show the information.

Figure 34: Class diagrams of container classes


The responsibility of this class is to manage all the necessary aspects of each application. For that reason, the status of the application is also managed, such as whether the statistics service has been activated or whether the application has been terminated. The single buffer where the unprocessed metrics are stored also belongs to the application container class. The last container class is the node container, which was created to integrate the GUI with the new contention monitoring functionality. It was added as part of the second phase of development, since the information about the nodes was not necessary before that phase. Being the newest, this class is very simple. It just stores the node name, and the data about the status of the node when the last contention problem notification arrived. Together with the metrics, the applications that were running in the node at the time of the notification are also stored.

4.5.2. Buttons handlers and threads

The nodes, applications and processes are managed by several classes. The system can be divided into thread classes and button handler classes. Thread classes are responsible for managing the logic related to each one of the implemented functionalities, while button handler classes manage the events when buttons are clicked. In addition, both threads and handlers encapsulate the different GUI layouts. These layouts can be called when clicking a button that opens a new window, or when a new message arrives and has to be plotted or shown to the user. A diagram stating the relationship between all the components can be found following this description in Figure 35. The whole program is built around the MonitorParent class, which is in the centre of the diagram. This class executes the main function and is the entry point of the application. All the configurable values are located in this class, and it is responsible for managing the main window of the GUI, with the applications list. These applications are managed throughout the code using a hash table. This hash table uses a hash of the application name as the key and stores ApplicationContainer objects. The choice of a hash table was based on its higher efficiency when both saving and searching for an application, which is O(1). Once the main window is initialized, the responsibility of the MonitorParent class is to wait for messages from the controller. The messages that arrive here can be for the initial registration of the controller, for the registration of an application or for the registration of the contention monitor. When the controller is registered, the data received is simply stored and there are no further actions involved. When the contention monitor is registered, the nodes are parsed, and a new thread is initialized to listen for contention messages. Each parsed node involves the initialization of a NodeContainer instance.
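In Python terms, the registry described above behaves like a dictionary keyed by the hash of the application name; ApplicationContainer is reduced to a stub for illustration:

```python
# Sketch of the application registry: a dict keyed by the hash of the
# application name, giving O(1) insertion and lookup as in the design above.
class ApplicationContainer:
    """Illustrative stub of the real container class."""
    def __init__(self, name, app_id):
        self.name, self.app_id = name, app_id

apps = {}

def register(app):
    apps[hash(app.name)] = app        # O(1) insert keyed by name hash

def lookup(name):
    return apps.get(hash(name))       # O(1) search, None if unknown
```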
Lastly, receiving an application message involves the initialization of an ApplicationContainer instance that will be displayed and stored in the hash table. Along with the object, two threads are created but not started, StatCollector and BuffReader (on the top right of the diagram), which begin their execution when the statistics service is activated and which receive the app metrics and extract and parse them. Another thread class exists, ProcessReader. Although deprecated, since metrics do not arrive per process in the final version, this thread was the one populating in real time each one of the charts that were open at any moment. It still runs, but only once per application, since it is simpler to keep the original design and it allows obtaining per-process information again in future versions if desired. Regarding the buttons, there are five different classes that manage the usage of these buttons, and all the related classes are located in the bottom half of the diagram. The first

one is AppButtonHandler, which holds the functionality of showing the application commands window. There is one instance of this class for each application. Another object that appears once per application is CommandsButtonHandler, which is the class that executes all the different instructions to send the commands for an application to the controller. ProcessButtonHandler, while originally appearing once per application process, is currently also initialized once for each application, and it is the class that launches the charting window and initializes the thread that plots the metrics in the chart. Additionally, there are two button handlers related to the node contention: NodeButtonHandler and SolveNodeConflictButtonHandler. The former opens the window to see the information about a specific node, and the latter executes the code that sends the command to the controller to automatically solve the contention problem. A class diagram of all the classes mentioned in this section can be found in Figure 35.

Figure 35: Threads and handlers class diagram

4.5.3. GUI views design

The GUI has different sections that show the distinct information and options to the user. In this section each window will be explained and shown with an example. It is important to note that the data shown in the examples should not be taken into consideration, as it has been sent only with the purpose of simulating the real messages and showing the windows. Each one of the sections is found in a different panel, which can be shown or hidden. This allows the user to organize the elements on the screen as desired, as well as to hide the unnecessary ones. There is just one element that is always visible, which is the main view. An example of this panel is shown in Figure 36. The primary section of this panel is the view of the different applications that are registered. They are identified with their name and the application ID, in case there is more than one with the same name.

Figure 36: Main GUI panel

For each application, there are three different buttons. These allow to open the commands panel and the metrics visualization window for that app. The last button is used to request extra metrics information out of the normal sending interval.


The last components of the main view of the GUI are a small text panel at the bottom to print information such as new applications, and the left button that opens the node contention view. The node view is very simple, and it can be seen in Figure 37. It simply creates a grid-like layout where all nodes are shown as buttons. Each button is labelled with the name of the node, and the order remains unchanged during the whole execution, to simplify locating each node if necessary. As it is possible to see in the figure, the labels of the nodes can appear in different colours. When a node appears in red, it means that a contention problem has appeared, and that it has not been solved yet. This way the user is notified about a node when it requires attention, and it is easier to visually detect possible issues in the cluster nodes.

Figure 37: GUI nodes grid view

When a contention message is received, its values are parsed to be included in the contention message view. This window is opened by clicking on one of the node buttons and is shown in Figure 38.


Figure 38: Node information view

If there is no information about the node, the window simply shows a message stating that the node has not had problems yet. If there is previous data about the node, the information in this view is divided into two sections. The first shows the metrics sent from the controller to the GUI, along with the time at which these metrics were received. The second shows the applications running on the node, each with two buttons that open the application metrics visualization window or move the application to a different node to solve the problem. The window to visualize the metrics of an application is also simple, with just the necessary components. As seen in Figure 39, its main part is the chart where the metrics are plotted. The X axis of this chart is always time, and the Y axis is the value of the metrics. The scale of the Y axis changes depending on the values being plotted, and the metrics are updated in real time.

Figure 39: Application metrics visualization window in GUI

The other section of this window is the metric selector on the left side. All the metrics received appear there, each with a checkbox. By checking and unchecking them, the user can change which metrics are shown on the chart. This keeps the view clean when there are many metrics, and allows visualizing only the necessary ones at any moment. It also helps because different metrics do not share the same scale, so metrics with small values must be plotted alone to be readable. In order to see the metrics, the command activating the statistics service must be sent first. This command can be sent from the per-application command panel, where the user can access all the possible commands for that application. This window, shown in Figure 40, is also easy to understand. It shows a button for each supported command, with text boxes and selectors to insert the input when a command requires it. On a button press, the command is sent directly to the controller after verifying the parameters.

Figure 40: Application commands view
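The per-metric checkboxes described above can be wired as in this illustrative sketch (names are hypothetical, not from the actual codebase): the chart would query the selection model before replotting, so unchecked metrics simply stop being drawn.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import javax.swing.JCheckBox;

// Illustrative sketch of the metric selector logic: the chart only
// plots metrics whose checkbox is currently selected.
public class MetricSelector {
    private final Set<String> visibleMetrics = new LinkedHashSet<>();

    // Builds one checkbox per metric; toggling it updates the visible set.
    public JCheckBox checkBoxFor(String metric) {
        JCheckBox box = new JCheckBox(metric, true);
        visibleMetrics.add(metric);
        box.addItemListener(e -> {
            if (box.isSelected()) {
                visibleMetrics.add(metric);
            } else {
                visibleMetrics.remove(metric);
            }
        });
        return box;
    }

    // The chart component would query this before replotting a series.
    public boolean isVisible(String metric) {
        return visibleMetrics.contains(metric);
    }
}
```

Because Swing fires `ItemListener` events on both user clicks and programmatic state changes, the visible set stays consistent however the checkbox is toggled.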

4.6. Project planning

In this section, the development of the project is briefly described, stating the two main phases in which it was carried out, together with the Gantt charts representing the planning of each phase. This type of chart was used because it is "one of the most widely used management tools for project scheduling and control" [46]. The first phase of the project corresponds to the work done during a three-month research grant at the Carlos III University. This grant had the objective of developing a GUI to work with the then-existing version of FlexMPI. The second phase took place three years later, with the objective of adapting the GUI to the changes made to the system in the meantime, as well as adding further functionality both to the GUI and to the controller.

4.6.1. Phase 1

This phase was the first one, carried out as part of the research grant. Given the initial lack of familiarity with the system, the first tasks were aimed at learning the existing codebase and planning the whole development. After that, the development consisted of three sprints of different lengths, according to the complexity of each task. The last part of this phase consisted of testing, both functional and performance. As a research grant, this phase of the project was defined with clear goals. The aim was to have a functional GUI to control the applications running on FlexMPI, as well as to visualize the metrics of each application. These metrics were received per process, showing different graphs for each of the processes the application was running. Another functionality included during this phase was storing the application statistics to disk, keeping the data persistently so that it can be analysed after execution. The last goal of this phase was to choose the technologies to be used for the development of the project. Although some technologies were already predefined, several alternatives had to be evaluated for different aspects of the work in order to choose the most suitable ones. Regarding the methodology applied in this phase, it was based on agile methods. Applying agile methodologies encourages a more flexible and rapid approach to software development, making it faster and less prone to rework, since problems are detected in earlier phases. During this part of the project, the work was done on site, at a workstation provided by the University in an office. That organization made it possible to easily

have daily communication with the tutor. For that reason, during this period the development was divided into short sprints, each with a specific objective decided in advance to keep the work focused on that goal. In addition to the sprints, daily meetings were scheduled to review whether the work done the previous day was correct. This practice speeds up development and ensures correct work, as daily meetings and reviews make it easier to detect possible problems; the sooner problems are detected, the faster they can be solved.

Figure 41: Gantt chart for first part of Phase 1

Some small testing periods were also included at the end of sprints 1 and 3, to reduce the number of problems dragged into later stages of the development. The Gantt chart has been split in two parts to make it easier to view; the parts can be seen in Figure 41 and Figure 42.

Figure 42: Gantt chart for second part of Phase 1

4.6.2. Phase 2

Although Phase 2 was developed over a longer period of time, it consists of fewer parts, as the new functionalities were specific and clearly defined beforehand. Being developed years after the first phase, the GUI had to catch up with the several changes made to the rest of the system in the meantime, in addition to gaining some extra functionality. FlexMPI had remained under development, so while the result of the first phase was a fully functional GUI integrated with the system, FlexMPI continued to evolve afterwards. For that reason, the first part of this phase consisted of updating the GUI to work with the latest version of FlexMPI, adapting it to the new logic and the new communication protocol. As an example of these changes, the GUI was set up to receive metrics aggregated per application instead of per process. Aside from these updates, the main change in the system was the addition of a monitor detecting contention in the nodes of the cluster. This made it possible to also add contention information to the GUI, in order to visualize the performance problems happening in each node. In addition to the visualization options, the other goal was to use this information to solve the problems that appear throughout execution. These problems can be solved either manually by the user or automatically by a simple algorithm, which was also developed during this phase. Agile methodologies were applied in this phase as well. This time the work was not developed at the university but remotely, so daily meetings were not an option.
Instead, meetings with the tutor were scheduled weekly, with additional online communications. The second part of this phase involved adding the contention monitor to the GUI. This functionality was developed and integrated with the rest of the system after the first part was finished, so an upgrade was necessary to integrate it into the GUI. The charts for this phase are shown in Figure 43 and Figure 44.


Figure 43: Gantt chart for first part of Phase 2

Figure 44: Gantt chart for second part of Phase 2

4.7. Budget

In this section, the budget for the project is described. The cost sources are staff, hardware and software. The cost of using the cluster is not considered, as its usage is negligible compared to the total usage by the university.

4.7.1. Staff costs

The work was developed by a team of two members: David Expósito Singh, tutor supervising this final degree project, and Federico Goldfryd Sprukt, the student in charge of the design, planning and development of the project. The salary of each member was retrieved from Glassdoor [47]: for the tutor, that of a senior software engineer, and for the student, that of a junior software engineer. These annual salaries can easily be converted to hourly salaries by dividing by the total number of hours worked in a year. Once the hourly rate is calculated, the Social Security contributions the university has to pay [48] must be added to obtain the total cost of an hour of work. This contribution is 23.6% of the salary. The different amounts, from the base salary to the hourly cost including taxes, are shown in Table 26.

Staff                     Role     Annual salary   Hourly salary   Social security   Hourly cost (salary + taxes)
David Expósito Singh      Tutor    42,143 €        25 €            5.87 €            30.88 €
Federico Goldfryd Sprukt  Student  23,077 €        15 €            3.52 €            18.53 €

Table 26: Salaries description

The work of this project was carried out in two phases, the first spanning 70 working days and the second 86 working days. In the first phase, the tutor worked 2 hours per week and the student 4 hours per day. In the second, the tutor worked 1 hour per week and the student 2 hours per day. Taking into account the cost per hour of each team member, the total cost is shown in Table 27.

Staff                     Phase 1 hours   Phase 1 cost   Phase 2 hours   Phase 2 cost   Total hours   Total cost
David Expósito Singh      28 h            864.64 €       17 h            524.96 €       45 h          1,389.60 €
Federico Goldfryd Sprukt  280 h           5,188.40 €     172 h           3,187.16 €     452 h         8,375.56 €
Total                     308 h           6,053.04 €     189 h           3,712.12 €     497 h         9,765.16 €

Table 27: Total costs description
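The figures in Table 27 follow directly from multiplying hours worked by the hourly costs of Table 26; a short ad hoc snippet (the helper is illustrative, not part of the project code) can verify the arithmetic:

```java
// Verifies the staff-cost arithmetic: hours worked times the hourly
// cost (salary + taxes) from Table 26.
public class StaffCosts {
    static double cost(int hours, double hourlyCost) {
        return hours * hourlyCost;
    }

    public static void main(String[] args) {
        double phase1 = cost(28, 30.88) + cost(280, 18.53); // tutor + student, Phase 1
        double phase2 = cost(17, 30.88) + cost(172, 18.53); // tutor + student, Phase 2
        System.out.printf("Phase 1: %.2f €, Phase 2: %.2f €, total: %.2f €%n",
                phase1, phase2, phase1 + phase2);
    }
}
```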

4.7.2. Hardware costs

This section covers the costs of the hardware devices used in the development of the project. Each computer has an amortization period of 8 years, and the cost charged is proportional to the time it was used for this project. The information about the devices and their use is described in Table 28.

Product                        Price    Monthly cost   Months used   Total cost
University office workstation  1000 €   10.42 €        4 months      41.68 €
MacBook Pro 2019               1550 €   16.15 €        4 months      64.60 €
Asus GL552-VW                  699 €    7.3 €          8 months      58.4 €
Total hardware cost                                                  164.68 €

Table 28: Hardware costs breakdown
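The amortization rule above (the price spread over 8 years of monthly instalments, charged only for the months of use) can be sketched as follows; the helper names are ad hoc, not from the project:

```java
// Illustrative amortization helper: monthly cost = price / (years * 12),
// rounded to cents, then multiplied by the months of use.
public class Amortization {
    static double monthlyCost(double price, int years) {
        return Math.round(price / (years * 12) * 100.0) / 100.0;
    }

    static double chargedCost(double price, int years, int monthsUsed) {
        return Math.round(monthlyCost(price, years) * monthsUsed * 100.0) / 100.0;
    }

    public static void main(String[] args) {
        // 1000 € workstation, 8-year amortization, used 4 months
        System.out.printf("%.2f €%n", chargedCost(1000, 8, 4));
    }
}
```

The same helper with a 5-year period reproduces the software figures of the next section.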

4.7.3. Software costs

The software costs correspond to the licenses paid in order to use the different programs, including operating systems, development environments and any other kind of application. These costs are calculated using an amortization period of 5 years, counting the proportional time of usage. The information regarding these costs can be found in Table 29.

Product                        Price   Monthly cost   Months used   Total cost
Microsoft Windows 10 Pro       259 €   4.32 €         4 months      17.28 €
Microsoft Office Professional  579 €   9.65 €         6 months      57.9 €
Ubuntu 16.04                   0 €     0 €            8 months      0 €
Eclipse                        0 €     0 €            8 months      0 €
Visual Studio Code             0 €     0 €            8 months      0 €
VirtualBox                     0 €     0 €            4 months      0 €
Total software cost                                                 75.18 €

Table 29: Software costs breakdown

4.7.4. Total budget

The economic benefits of the project will come from its future commercial application, so for now a low benefit margin of 15% is considered. The risk is set at 6%, given that the project is directed by the Computer Architecture Department of the Polytechnic School of the University Carlos III of Madrid. The total budget, including taxes, is defined in the following table.

Concept               Cost
Staff                 9,765.16 €
Hardware              164.68 €
Software              75.18 €
Subtotal              10,005.02 €
Risk (6%)             600.30 €
Benefit (15%)         1,500.75 €
Total before V.A.T.   12,106.07 €
V.A.T. (21%)          2,542.27 €
Total                 14,648.34 €

To conclude, the total budget of the project amounts to FOURTEEN THOUSAND SIX HUNDRED AND FORTY-EIGHT EUROS AND THIRTY-FOUR CENTS: 14,648.34 € (V.A.T. included).
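The budget chain (risk and benefit computed over the subtotal of the three concept costs, then V.A.T. applied to the result) can be recomputed with a short snippet; the cent-rounding helper is ad hoc, not part of any project code:

```java
// Recomputes the budget chain from the three concept costs.
public class Budget {
    // Ad hoc helper: round a monetary amount to euro cents.
    static double round2(double v) { return Math.round(v * 100.0) / 100.0; }

    public static void main(String[] args) {
        double subtotal = round2(9765.16 + 164.68 + 75.18); // staff + hardware + software
        double risk = round2(subtotal * 0.06);
        double benefit = round2(subtotal * 0.15);
        double beforeVat = round2(subtotal + risk + benefit);
        double vat = round2(beforeVat * 0.21);
        System.out.printf("Subtotal: %.2f €, total: %.2f €%n",
                subtotal, round2(beforeVat + vat));
    }
}
```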

5. Evaluation

This section describes the testing performed to evaluate the execution of the system and the fulfilment of the requirements. The tests are divided into functional tests and performance tests. The functional tests check the functional requirements, ensuring that all the logic of the application executes as required. The performance tests check that the system executes within the accepted constraints, both in execution time and in the amount of resources, such as memory, that it uses.

5.1. Platform description

This subsection describes the platform used for testing, so that the results can be better understood and reproduced if necessary. Different hardware was used. For the functional tests, the controller ran on an Asus GL557VW and the GUI both on that laptop and on a MacBook Pro, to test network communications. For the performance tests, the controller and the applications were executed on the Tucan cluster to obtain more precise results, while the GUI was executed on both laptops. In the cluster, two nodes were used, in addition to the frontend node, which is necessary to communicate with systems outside the cluster itself. Both nodes have the same specifications: an Intel Xeon E5405 CPU with 8 cores running at 2.00 GHz, 8 GB of memory and 2 TB of HDD storage. The Asus GL557VW has an Intel Core i5-6300HQ CPU with 4 cores running at 2.30 GHz, 8 GB of memory and a 128 GB SSD. The MacBook Pro has an Intel Core i5-7360U CPU with 2 cores running at 2.3 GHz, 8 GB of memory and a 128 GB SSD. Regarding network communications, all connections between the computers used Wi-Fi N, with a 100 Mb/s connection for traffic outside the local network. The connection between the laptops and Tucan was handled through a VPN in order to access the network of the cluster: Tunnelblick 3.8.0 on the MacBook Pro, and the system's built-in tool on the Asus GL557VW. As for operating systems, the MacBook Pro ran macOS 10.14.5, the Asus GL557VW ran Ubuntu 18.04.3 LTS, and Tucan ran Ubuntu 16.04 LTS. Finally, the Java Runtime Environment used to execute the GUI was OpenJDK 11.0.4.

5.2. Traceability matrix

To easily track that each requirement is covered by at least one test, a traceability matrix is used. The matrix has a row for each test and a column for each requirement, identified by their corresponding identifiers. An X marks each cell where a test covers the corresponding requirement. If the tests have enough coverage, no column should be left without at least one X, meaning that every requirement has at least one test checking its implementation. The traceability matrix for this project is shown in Figure 45.

Figure 45: Traceability Matrix
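The coverage property the matrix encodes, no requirement column without an X, can be expressed as a small check; the boolean matrix below is a made-up example, not the one in Figure 45:

```java
// Checks the traceability-matrix coverage property: every requirement
// column must be covered by at least one test row.
public class Traceability {
    // matrix[test][requirement] == true means that test covers it.
    static boolean allRequirementsCovered(boolean[][] matrix) {
        int requirements = matrix[0].length;
        for (int r = 0; r < requirements; r++) {
            boolean covered = false;
            for (boolean[] testRow : matrix) {
                covered |= testRow[r];
            }
            if (!covered) {
                return false; // a requirement has no test checking it
            }
        }
        return true;
    }
}
```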

5.3. Functional tests

Functional tests are designed to ensure the implementation of all the defined functional requirements. Each requirement should be covered by at least one test, whose objective is to check whether or not the program fulfils the related requirements. Each functional test is defined using a table; the template used for these tests can be seen in Table 30.

Identifier

Description

Objective

Related requirements

Expected result

Obtained result

Table 30: Functional Test definition template

• Identifier: FT-XX, where XX is the unique number of the test.
• Description: Brief definition of the test.
• Objective: Explanation of what the test checks.
• Related requirements: Requirements that are proven by the test.
• Expected result: Behaviour the program is expected to show.
• Obtained result: Behaviour actually observed when executing the test.

FT-01

Description GUI initialization.

Objective Check that the GUI executes correctly without the controller running.

Related requirements FR-01

Expected result The GUI launches correctly.

Obtained result The GUI opens and shows the main window.

Table 31: Functional Test 01

FT-02

Description Launch the controller with the GUI open.

Objective Check that the controller connects to the GUI.

Related requirements FR-01

Expected result The controller starts execution and connects to the GUI.

Obtained result The controller is launched, and the GUI shows a new connection.

Table 32: Functional Test 02

FT-03

Description Starting applications.

Objective Check that the GUI receives the information about the new application and shows it in the list of the main window.

Related requirements FR-02

Expected result The application appears in the main window.

Obtained result The GUI receives the new application message and shows it in the main window list.

Table 33: Functional Test 03

FT-04

Description Showing several applications.

Objective Check that the GUI shows all the applications in the list, enabling scrolling when there are too many.

Related requirements FR-02

Expected result All the applications started are displayed in the window and the scroll is activated when they do not fit the size.

Obtained result All the applications are displayed in the main window list. When there are more applications, there is a scroll option.

Table 34: Functional Test 04

FT-05

Description Applications are uniquely identified.

Objective Check that all applications are uniquely identified, even the ones with the same name.

Related requirements FR-03

Expected result Application names are joined with their individual IDs to identify them uniquely.

Obtained result Applications appear with their ID attached, so when the name is the same the IDs are still different.

Table 35: Functional Test 05

FT-06

Description Remove processes.

Objective Check that the process removal command works.

Related requirements FR-04

Expected result After sending the command, the process is removed by the controller.

Obtained result The controller receives the command and reduces the number of processes.

Table 36: Functional Test 06


FT-07

Description Create processes.

Objective Check that the process addition command works.

Related requirements FR-04

Expected result After sending the command, the process is created by the controller.

Obtained result The command is received and the process spawned.

Table 37: Functional Test 07

FT-08

Description Stop applications.

Objective Check that the application stopping command works.

Related requirements FR-05

Expected result After sending the command, the controller stops the app execution.

Obtained result The controller receives the message and the application stops execution.

Table 38: Functional Test 08

FT-09

Description Change policy.

Objective Check that the policy command works.

Related requirements FR-06

Expected result After sending the command, the controller changes the execution policy.

Obtained result The policy is changed after receiving the message.

Table 39: Functional Test 09

FT-10

Description Start statistics service in GUI.

Objective Check that the GUI executes the necessary actions when statistics are started.

Related requirements FR-07

Expected result The statistics button shows that the service is active, and the necessary components are initialized.

Obtained result The statistics button shows they are active, and the visualization window opens.

Table 40: Functional Test 10

FT-11

Description Start statistics service in the controller.

Objective Check that starting the statistics service sends the command correctly to the controller.

Related requirements FR-07

Expected result After clicking the button, the controller receives the command and starts sending the metrics.

Obtained result The controller receives the message and starts the service.

Table 41: Functional Test 11

FT-12

Description Statistics reception.

Objective Check that the GUI receives and processes the metrics correctly.

Related requirements FR-07

Expected result When the statistics service is active, the GUI receives the data and parses it.

Obtained result The metrics received are displayed and the chart is populated with the last metrics.

Table 42: Functional Test 12


FT-13

Description Stop statistics in GUI.

Objective Check that when deactivating the statistics, the GUI executes the necessary actions.

Related requirements FR-08

Expected result The statistics button shows it is disabled.

Obtained result The GUI shows that the statistics service is disabled for the application.

Table 43: Functional Test 13

FT-14

Description Stop statistics in controller.

Objective Check that the controller receives the command and processes it properly.

Related requirements FR-08

Expected result The controller receives the command and stops sending metrics information.

Obtained result The controller stops sending statistics.

Table 44: Functional Test 14


FT-15

Description Retrieve individual metrics out of schedule.

Objective Check that the functionality of requesting metrics outside the scheduled interval works.

Related requirements FR-09

Expected result When the button is clicked, the controller sends the metrics for the application directly, and they are received and processed by the GUI correctly.

Obtained result The controller receives the message and sends the metrics instantly even when the service is disabled. The metrics are received and processed by the GUI.

Table 45: Functional Test 15

FT-16

Description Persistent storage of metrics.

Objective Check that the GUI stores in a file the data received.

Related requirements FR-10

Expected result When the buffer is full, the metrics should be dumped to a file that will be created in case it is missing.

Obtained result A file uniquely identified is created, and the metrics are written inside.

Table 46: Functional Test 16

FT-17

Description Metrics plotted in chart.

Objective Check that the statistics are plotted properly and based on time.

Related requirements FR-11

Expected result When the statistics service is active, the chart for the application should show the metrics received in a time-based chart.

Obtained result The chart shows the metrics that have arrived, with the time of each measurement.

Table 47: Functional Test 17

FT-18

Description Disable plotting a specific metric.

Objective Check that it is possible to remove a metric from the chart.

Related requirements FR-12

Expected result When unchecking the corresponding checkbox, the values of the metric stop being displayed in the chart.

Obtained result When the metric is unchecked, it stops being displayed on the chart.

Table 48: Functional Test 18

FT-19

Description Enable plotting a specific metric.

Objective Check that it is possible to add a metric to the chart.

Related requirements FR-12

Expected result When checking the corresponding checkbox, the values of the metric start being displayed in the chart.

Obtained result When the metric is checked again, it is instantly displayed in the chart.

Table 49: Functional Test 19


FT-20

Description Metrics configuration.

Objective Check that the metrics to be received can be changed using the XML file.

Related requirements FR-13

Expected result The metrics shown are the ones described in the XML.

Obtained result The metrics shown in the chart correspond with the ones in the XML. When the XML is modified and the GUI relaunched, the metrics change and keep matching the file.

Table 50: Functional Test 20

FT-21

Description Contention alerts processing.

Objective Check that the GUI receives and processes contention alerts correctly.

Related requirements FR-14, FR-15

Expected result When the contention message is received, the corresponding node button changes to red.

Obtained result The contention message arrives, and the corresponding node changes to red.

Table 51: Functional Test 21

FT-22

Description Contention information display.

Objective Check that the correct contention information about the node is displayed in the information window.

Related requirements FR-15, FR-16

Expected result When a contention message is received, the node values and applications are shown in the node window.

Obtained result When clicking the node button, the window with the node information shows the contents of the last message.

Table 52: Functional Test 22

FT-23

Description Manually fix contention choosing the application.

Objective Check that the move application button works, changing the application processes to a different node.

Related requirements FR-17

Expected result When the button is clicked, the GUI sends the controller the corresponding messages, creating the necessary processes in a new node and removing them from the current one.

Obtained result The GUI sends two messages, to create and remove the same amount of processes in different nodes for the selected application. The controller receives the messages and does both actions.

Table 53: Functional Test 23


FT-24

Description Manually fix contention using the auto solve button.

Objective Check that clicking the auto solve button moves the application with the lowest number of processes to a different node.

Related requirements FR-18

Expected result When the button is clicked, the GUI sends the messages to move the application with the fewest processes from that node to another one.

Obtained result The GUI sends two messages, to create and remove the same amount of processes in different nodes for the application with the lowest number of processes. The controller receives the messages and does both actions.

Table 54: Functional Test 24

FT-25

Description Activate automatic contention management for a node.

Objective Check that the GUI activates the automatic contention management for a node.

Related requirements FR-18

Expected result When the button is clicked and automatic management is enabled, the node button appears in green to show that it is being automatically managed.

Obtained result The node colour turns green, showing that it is being automatically managed.

Table 55: Functional Test 25

FT-26

Description Deactivate automatic contention management for a node.

Objective Check that the GUI deactivates the automatic contention management for a node.

Related requirements FR-18

Expected result When the button is clicked while automatic management is enabled, the node button turns black again to show that it is no longer being automatically managed.

Obtained result The node colour returns to black, showing that automatic management has been deactivated.

Table 56: Functional Test 26

FT-27

Description Automatic management for a node.

Objective Check that the automatic management of a node works correctly when a contention message arrives.

Related requirements FR-18

Expected result When the GUI receives a contention message for a node that is automatically managed, there is no notification; the application with the fewest processes is moved to a different node by sending the command to the controller.

Obtained result The message is processed and an application is moved to another node without any notification.

Table 57: Functional Test 27

FT-28

Description Connection from different computers.

Objective Check that the GUI and the controller can communicate when executed in different systems.

Related requirements FR-19

Expected result When the GUI and the controller are launched in different systems with the IP address properly set, they should connect and work the same as in the same computer.

Obtained result When the controller is launched, the GUI receives the connection message and shows it.

Table 58: Functional Test 28

5.4. Performance tests

These tests evaluate the performance of the system, regarding both the GUI and the controller, which are the components of the system targeted by this project. The measurements involve memory and CPU usage for both parts. To measure these values, different numbers of applications are launched, starting with 1 and doubling the amount up to 16 simultaneous applications. Resource usage is measured for each component without any load, to find the minimum usage, and again with the applications running, with the statistics service both enabled and disabled. The performance of the applications is also evaluated: one instance of the same application is launched several times, each time with a different number of processes, in the range between 1 and 16 following the same doubling pattern. The measurement in this case is the execution time of the application, giving information about how the number of processes affects it. All these tests were carried out using 2 computing nodes of the Tucan cluster, so that the metrics are closer to a real-life scenario.

5.4.1. Controller performance tests

The controller was tested running on Tucan, using htop [49] to capture the memory usage of the application in KB and its CPU usage as a percentage. Being written in C with a focus on performance, good values were expected. The measurements were taken with the statistics service both on and off, which changed the results. Several measurements were taken for each scenario and then averaged. Regarding CPU, the usage always stayed at 0%, which gives no relevant information. That is probably related to the efficiency of the program, and also to the fact that it was executed on the cluster, which is more powerful than a normal computer. Memory usage shows more variation depending on the number of applications executed. Right after launch, the base memory consumption was 2000 KB. The highest memory usage, measured when running 16 applications with the statistics active, was 3328 KB. A graph showing the variation of the results can be found in Figure 46.

[Bar chart: controller memory usage in KB, with statistics off and on, for 0, 1, 2, 4, 8 and 16 applications; values range from the 2000 KB baseline to a 3328 KB maximum.]

Figure 46: Evolution of memory usage of the controller

Looking at the graph, the beginning of a superlinear trend in memory consumption can be observed. Using quadratic regression, the predicted number of applications needed to reach a usage of 1 GB would be 611. This number is far above real needs, since the number of applications executed at the same time in an HPC cluster never gets that high.
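The extrapolation idea can be sketched as follows: fit the quadratic that passes exactly through three measured points and evaluate it at larger application counts. The sample points below are hypothetical, not the actual measurements, and the class name is ad hoc:

```java
// Illustrative sketch of quadratic extrapolation: Lagrange interpolation
// through three (applications, memory) points, evaluated further out.
public class QuadraticFit {
    // Returns p(x) for the unique quadratic through the three points.
    static double interpolate(double[] xs, double[] ys, double x) {
        double result = 0.0;
        for (int i = 0; i < 3; i++) {
            double term = ys[i];
            for (int j = 0; j < 3; j++) {
                if (j != i) {
                    term *= (x - xs[j]) / (xs[i] - xs[j]);
                }
            }
            result += term;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] apps = {1, 4, 16};          // hypothetical sample sizes
        double[] memKb = {2300, 2360, 3300}; // hypothetical memory readings
        // Extrapolate the fitted curve to 64 simultaneous applications.
        System.out.printf("%.0f KB%n", interpolate(apps, memKb, 64));
    }
}
```

A least-squares fit over all measured points, as used in the text, would be more robust than an exact three-point fit, but the extrapolation step is the same.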

5.4.2. GUI performance tests

Being written in Java, the GUI is not expected to match the performance of the controller, as this language is not as efficient as C. Another reason for being less memory efficient is the graphical interface itself, which generally uses more memory than a terminal application. This component of the system was executed on the MacBook Pro, using the system activity monitor to measure the results. Regarding CPU, consumption was still almost negligible, but in this case some values were higher than 0%, although usage never rose above 1% during the tests. The usage values are shown in Figure 47.

[Bar chart: CPU utilization in % with the statistics service off and on, for 0, 1, 2, 4, 8 and 16 applications; values range from 0.1% to 0.8%.]

Figure 47: Evolution of the CPU usage of the GUI

The memory consumption of the GUI was higher than in the controller tests. The base memory usage, when no application was registered, was 100 MB, an order of magnitude higher than the controller. The maximum memory usage registered was 450 MB, with 16 applications running and the statistics service active for all of them. The graph with the memory usage can be seen in Figure 48. The high memory consumption when the statistics are active corresponds to the usage when the windows that plot the statistics are open. Each open window uses between 25 MB and 30 MB of memory, which is released back to the system over time after the window is closed. This is not easy to control, since it is not possible to know exactly when the memory used by a closed window is released; the option taken has been to measure the consumption with all the windows open, although when the number of applications grows, not all the windows will be observed at the same time.

[Bar chart: memory usage in MB with the statistics service off and on, for 0, 1, 2, 4, 8 and 16 applications; values range from 100 MB to 450 MB.]

Figure 48: Evolution of the memory usage of the GUI

Using quadratic regression to predict the consumption with more applications, the results are that it is possible to execute between 35 and 59 applications using 1 GB of memory, and between 61 and 87 using 2 GB. Given that the GUI will be running on a normal computer, that consumption should not be problematic, and even the lowest estimate of applications that can run with 1 GB of memory, with all the statistics on and the windows open, should be enough for the great majority of use cases.

5.4.3. Applications performance tests

To test the performance of the applications, the program used with FlexMPI was Jacobi. Jacobi is an iterative application used for FlexMPI testing purposes that is designed to be CPU intensive, which makes it a good option to test parallelization. The execution time was measured using the command line utility time, which outputs the time taken by a command when it finishes. Starting with just a single process, the time taken to finish was 8 minutes and 55 seconds. When the number of processes is set to 16, the execution time is reduced to just 47 seconds. The evolution is very close to directly proportional and can be observed in Figure 49.

[Chart: execution time (minutes:seconds) for 1, 2, 4, 8 and 16 processes: 8:55, 4:37, 2:41, 1:47 and 0:47.]

Figure 49: Evolution of the execution time for an application
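From these measurements, the standard speedup and parallel-efficiency metrics can be derived directly; the short sketch below computes them from the two reported times (8:55 = 535 s with 1 process, 47 s with 16 processes).

```java
// Speedup S = T1/Tp and parallel efficiency E = S/p, computed from the
// measured Jacobi times reported in the text.
public class Speedup {
    static double speedup(double t1, double tp) { return t1 / tp; }
    static double efficiency(double t1, double tp, int p) { return speedup(t1, tp) / p; }

    public static void main(String[] args) {
        // 535 s / 47 s ≈ 11.4x speedup, ≈ 0.71 efficiency on 16 processes
        System.out.printf("speedup=%.1f efficiency=%.2f%n",
                speedup(535, 47), efficiency(535, 47, 16));
    }
}
```

An efficiency around 0.71 at 16 processes is consistent with the "very close to directly proportional" behaviour described above.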

6. Conclusions and Future Work

This section states the conclusions after the development of the project, and how the different goals have been achieved. It also describes the different ways in which the system could be improved in the future, and which of them may be more relevant.

6.1. Conclusions

The main goal of this project was to develop an application with a graphical user interface to allow the visualization of data and the control of an existing system, integrating the GUI with FlexMPI. That is the result of the development carried out during this project, which consisted of implementing the GUI application completely, but also making the necessary changes in the existing part of the system so that both components can interact seamlessly. All the other goals follow from this one and complement it.

With both systems communicating, the first and most important functionalities to implement were the interactions between them. The GUI interacts with the controller by sending all the commands that allow controlling FlexMPI, using the different buttons available throughout the menus. On the other side, the controller interacts with the GUI by delivering information about the statistics of the applications, which is later displayed for the user in a chart so it can be observed in a user-friendly way. This allows an easy visualization of the different metrics that the applications are generating.

Along with these metrics about each application, the information about the nodes was added in the second phase. This goal aimed to allow the user of the GUI to monitor the health of all the nodes that execute the applications, detecting node contention, since the status of a node is key for the performance of the applications that are running on it. To show the information about the nodes, a new panel was added to the GUI, with a grid-like distribution of all the nodes so each one can be easily found. Nevertheless, the goal was not only showing the nodes, but also implementing the option of solving node contention when it shows up, and even leveraging the GUI itself to solve these problems when they arrive. This was implemented by adding, for each node, the option of moving an application out of it, either letting the user choose which app to move or allowing the system to choose it. In addition, the automatic contention-fixing function permits the user to enable it for a node, so the GUI itself will process each contention message and send the corresponding commands to the controller to fix it.

Finally, the last of the goals was to implement persistent storage for all the metrics collected from the applications, so they can be analysed more in depth after the execution. To achieve this goal, the GUI has a logging system that stores the metrics received for each one of the applications in separate files.

6.2. Future work

Although all the proposed goals have been achieved, it is clear that this application has many possible ways of improving. There are two main fields where the space for improvement is most relevant.

The first of these paths would continue the development around the persistence of data. On the one hand, the data about the nodes and the contention issues is currently not stored at all, and having this information for later analysis could lead to better results. On the other hand, another interesting way of improving the system would be to incorporate functionalities for analysing the data after execution without having to use external tools. This functionality is provided by most of the existing monitoring programs and makes them more valuable to their users. It is important because some analyses cannot be carried out in real time, as they would take too long to be useful, but they can provide important insight about how to improve the execution of the applications in later iterations, and about the reasons for any performance problems found.

The second big path to explore regarding the improvement of the GUI would be automation. At the moment, the algorithm that handles contention problems is a simple solution: with each error message, it moves the application with the lowest number of processes in that node. This algorithm can surely be improved in order to find the actions that increase performance and reduce problems the most. Another place for automation is the application statistics themselves, which have received little analysis in this project. Analysing the metrics from the applications and the data about the nodes could allow setting up different actions to improve the efficiency of the overall system.

Of course, there are more things to improve. Another important improvement would be the UX, which is usually not taken much into account in this type of application. Lastly, simplifying the configuration to make it easier for the user would also be a good idea. Nevertheless, it is the first two paths that can lead to the greatest increase in the relevance of this tool.

7. References

[1] Inside HPC, “What is high performance computing?,” Inside HPC, [Online]. Available: https://insidehpc.com/hpc-basic-training/what-is-hpc. [Accessed 20 09 2018].

[2] Wikimedia Foundation, “Message Passing Interface,” Wikimedia Foundation, 29 08 2019. [Online]. Available: https://en.wikipedia.org/wiki/Message_Passing_Interface. [Accessed 15 06 2019].

[3] The Open MPI Project, “Open MPI: Open Source High Performance Computing,” The Open MPI Project, 20 05 2019. [Online]. Available: https://www.open-mpi.org. [Accessed 15 06 2019].

[4] M. A. Heroux, J. Carter, R. Thakur, J. Vetter, L. C. McInnes, J. Ahrens and J. R. Neely, “ECP Software Technology Capability Assessment Report,” vol. 07, 2018.

[5] P. Kogge, S. Borkar, D. Campbell, W. Carlson, W. Dally, M. Denneau, P. Franzon, W. Harrod, J. Hiller, S. Karp, D. Klein and R. Lucas, “ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems,” Defense Advanced Research Projects Agency Information Processing Techniques Office (DARPA IPTO), Technical Report, vol. 15, 2008.

[6] Arm, “Arm MAP,” Arm, 2019. [Online]. Available: https://developer.arm.com/tools-and-software/server-and-hpc/arm-architecture-tools/arm-forge/arm-map. [Accessed 10 06 2019].

[7] Amazon, “AWS CloudWatch,” Amazon, 2019. [Online]. Available: https://aws.amazon.com/cloudwatch/. [Accessed 10 06 2019].

[8] Ganglia, “What is Ganglia?,” Ganglia, 07 03 2018. [Online]. Available: http://ganglia.info. [Accessed 10 06 2019].

[9] HPCToolkit, “HPCToolkit Overview,” HPCToolkit, 22 10 2018. [Online]. Available: http://hpctoolkit.org/overview.html. [Accessed 11 06 2019].

[10] Barcelona Supercomputing Center, “Paraver: a flexible performance analysis tool,” Barcelona Supercomputing Center, 2019. [Online]. Available: https://tools.bsc.es/paraver. [Accessed 12 06 2019].

[11] Barcelona Supercomputing Center, “Extrae,” Barcelona Supercomputing Center, 2019. [Online]. Available: https://tools.bsc.es/extrae. [Accessed 12 06 2019].

[12] Technical University of Munich, “About Periscope,” [Online]. Available: https://periscope.in.tum.de. [Accessed 12 06 2019].

[13] Jülich Supercomputing Center, “About Scalasca,” Jülich Supercomputing Center, 23 03 2015. [Online]. Available: http://scalasca.org/about/about.html. [Accessed 14 06 2019].

[14] M. Geimer, F. Wolf, B. J. N. Wylie, E. Ábrahám, D. Becker and B. Mohr, “The Scalasca performance toolset architecture,” Concurrency and Computation: Practice and Experience, vol. 22, no. 6, pp. 702–719, 2010.

[15] University of Southern California, “Running a Job on HPC using Slurm,” University of Southern California, [Online]. Available: https://hpcc.usc.edu/support/documentation/slurm/. [Accessed 22 06 2019].

[16] SchedMD, “Slurm Overview,” 24 04 2019. [Online]. Available: https://slurm.schedmd.com/overview.html. [Accessed 23 06 2019].

[17] Adaptive Computing, “TORQUE Resource Manager,” Adaptive Computing, 2019. [Online]. Available: http://www.adaptivecomputing.com/products/torque/. [Accessed 23 06 2019].

[18] Wikimedia Foundation, “TORQUE,” 31 07 2018. [Online]. Available: https://en.wikipedia.org/wiki/TORQUE. [Accessed 23 06 2019].

[19] Adaptive Computing, “TORQUE Resource Manager Data Sheet,” Adaptive Computing, 2018. [Online]. Available: http://www.adaptivecomputing.com/wp-content/uploads/2018/07/TORQUE-Resource-Manager-Data-Sheet.pdf. [Accessed 23 06 2019].

[20] IBM, “IBM Closes on Acquisition of Platform Computing,” IBM, 09 01 2012. [Online]. Available: https://www-03.ibm.com/press/us/en/pressrelease/36372.wss. [Accessed 21 07 2019].

[21] IBM, “Inside an LSF cluster,” IBM, 2019. [Online]. Available: https://www.ibm.com/support/knowledgecenter/SSETD4_9.1.3/lsf_foundations/chap_lsf_cluster.html. [Accessed 24 06 2019].

[22] IBM, “LSF Security Model,” IBM, 2019. [Online]. Available: https://www.ibm.com/support/knowledgecenter/SSETD4_9.1.3/lsf_foundations/security_lsf_overview.html. [Accessed 24 06 2019].

[23] Oracle, “Grid Engine,” Oracle, 22 10 2018. [Online]. Available: https://www.oracle.com/technetwork/oem/grid-engine-166852.html. [Accessed 23 06 2019].

[24] Univa, “Univa Grid Engine Datasheet,” Univa, 2019. [Online]. Available: http://www.univa.com/resources/files/gridengine.pdf. [Accessed 23 06 2019].

[25] W. Gropp, E. Lusk, N. Doss and A. Skjellum, “A high-performance, portable implementation of the MPI message passing interface standard,” Parallel Computing, Elsevier, vol. 22, no. 6, pp. 789-828, 1996.

[26] Oracle, “About Sun,” Oracle, 2010. [Online]. Available: https://www.oracle.com/sun/. [Accessed 15 05 2019].

[27] Tiobe, “Tiobe Index,” 05 2019. [Online]. Available: https://www.tiobe.com/tiobe-index/. [Accessed 15 05 2019].

[28] R. Pereira, M. Couto, F. Ribeiro, R. Rua, J. Cunha, J. P. Fernandes and J. Saraiva, “Energy Efficiency Across Programming Languages: How Do Energy, Time, and Memory Relate?,” in Proceedings of the 10th ACM SIGPLAN International Conference on Software Language Engineering, Vancouver, BC, Canada, ACM, 2017, pp. 256-267.

[29] Eclipse Foundation, “About Us,” [Online]. Available: https://www.eclipse.org/org/. [Accessed 15 05 2019].

[30] Apache Foundation, “Apache Netbeans,” Apache Foundation, 27 04 2019. [Online]. Available: https://netbeans.apache.org/. [Accessed 22 06 2019].

[31] Jetbrains, “IntelliJ IDEA,” Jetbrains, 2019. [Online]. Available: https://www.jetbrains.com/idea/. [Accessed 22 06 2019].

[32] Apache Foundation, “About Apache Pivot,” Apache Foundation, [Online]. Available: https://pivot.apache.org/about.html. [Accessed 15 05 2019].

[33] Oracle, “Swing,” Oracle, [Online]. Available: https://docs.oracle.com/javase/8/docs/technotes/guides/swing/. [Accessed 16 05 2019].

[34] Andreas Viklund, “JFreeChart,” JFree, 2017. [Online]. Available: http://www.jfree.org/jfreechart/. [Accessed 16 05 2019].

[35] Microsoft, “Visual Studio Code,” Microsoft, 2019. [Online]. Available: https://code.visualstudio.com. [Accessed 16 05 2019].

[36] The GNU Foundation, “GCC,” The GNU Foundation, 03 05 2019. [Online]. Available: https://gcc.gnu.org. [Accessed 16 05 2019].

[37] Bitvise, “Download PuTTY,” Bitvise, [Online]. Available: https://www.putty.org. [Accessed 01 06 2019].

[38] WinSCP, “Introducing WinSCP,” 2019. [Online]. Available: https://winscp.net/eng/docs/introduction. [Accessed 01 06 2019].

[39] Top500, “Performance Development,” Top500, 2018. [Online]. Available: https://www.top500.org/statistics/perfdevel/. [Accessed 18 05 2019].

[40] Case Western Reserve University, “HPC Statistics,” Case Western Reserve University, 2014. [Online]. Available: https://sites.google.com/a/case.edu/hpc-upgraded-cluster/cluster-faq/hpc-statistics. [Accessed 18 05 2019].

[41] European Union, “EuroHPC,” European Union, 04 2019. [Online]. Available: https://ec.europa.eu/digital-single-market/en/eurohpc-joint-undertaking. [Accessed 17 05 2019].

[42] ES Horizonte 2020, “Asociaciones Público-Privadas,” [Online]. Available: https://eshorizonte2020.es/mas-europa/grandes-iniciativas/asociaciones-publico-privadas-ppps. [Accessed 17 05 2019].

[43] Red Hat, “Buy Red Hat Enterprise Linux Server,” Red Hat, 2019. [Online]. Available: https://www.redhat.com/en/store/red-hat-enterprise-linux-server. [Accessed 04 06 2019].

[44] Red Hat, “IBM TO ACQUIRE RED HAT, COMPLETELY CHANGING THE CLOUD LANDSCAPE AND BECOMING WORLD’S #1 HYBRID CLOUD PROVIDER,” Red Hat, 28 10 2018. [Online]. Available: https://www.redhat.com/en/about/press-releases/ibm-acquire-red-hat-completely-changing-cloud-landscape-and-becoming-worlds-1-hybrid-cloud-provider. [Accessed 04 06 2019].

[45] Wikipedia, “User Datagram Protocol,” Wikipedia, 27 07 2019. [Online]. Available: https://en.wikipedia.org/wiki/User_Datagram_Protocol#Comparison_of_UDP_and_TCP. [Accessed 30 07 2019].

[46] R. Klein, Scheduling of resource-constrained projects, Boston: Kluwer Academic, 2000.

[47] Glassdoor, “Glassdoor salaries,” Glassdoor, [Online]. Available: https://www.glassdoor.com/Salaries/madrid-salary-SRCH_IL.0,6_IM1030.htm. [Accessed 08 2019].

[48] Seguridad Social, “Bases y tipos de cotización 2019,” Seguridad Social, 2019. [Online]. Available: http://www.seg-social.es/wps/portal/wss/internet/Trabajadores/CotizacionRecaudacionTrabajadores/36537. [Accessed 08 2019].

[49] H. Muhammad, “htop,” [Online]. Available: https://hisham.hm/htop/. [Accessed 2019].

[50] Glassdoor, “Senior Software Engineer salaries in Madrid,” Glassdoor, [Online]. Available: https://www.glassdoor.com/Salaries/madrid-senior-software-engineer-salary-SRCH_IL.0,6_IM1030_KO7,31.htm. [Accessed 08 2019].

[51] Microsoft, “Windows 10 Pro,” Microsoft, 2019. [Online]. Available: https://www.microsoft.com/es-es/p/windows-10-pro/df77x4d43rkt/48DN?icid=Cat_Windows_mosaic_linknav_Pro_090117-en_US&activetab=pivot%3aoverviewtab. [Accessed 08 2019].

[52] Microsoft, “Microsoft Office Professional,” Microsoft, 2019. [Online]. Available: https://www.microsoft.com/es-es/p/office-profesional-2019/cfq7ttc0k7c5?activetab=pivot%3aoverviewtab. [Accessed 08 2019].

Appendix A: User manual

This is a brief guide explaining how to work with the GUI, from setting it up to using all its functionality.

First steps

Before doing anything else, it is necessary to set up the GUI in the user’s computer, and make sure it will be able to execute properly and connect with the controller to interact with the system.

Step 1: Install and setup

There is no installation process for the GUI. The only requirement is to have Java set up on the computer where the software is going to be used; proper execution has been tested with both Java 8 and Java 11. The FlexMPI GUI comes packed as a JAR file, named FlexMPI_GUI.jar. This file can be copied anywhere in the filesystem, and it has to be given execution privileges if it is to be run directly. Together with it comes the metrics format file, the XML file that the program uses to detect the format of the metrics that are going to be received. This file has to be placed in a folder called xml, in the same directory as the JAR file. There is an example of this in Figure 50.

Figure 50: FlexMPI GUI directory example

Step 2: Run

Running the GUI is really simple. If the JAR file has been granted execution privileges, it is as simple as double-clicking on the file. The other option is to run it from the terminal: once in the file's directory, the command java -jar FlexMPI_GUI.jar will execute the GUI. The main window of the GUI will appear empty, without any application registered yet, as in Figure 51.

Figure 51: FlexMPI GUI main window after launch

Step 3: Connect controller

While this manual is related to the usage of the GUI and does not cover the controller and the rest of FlexMPI, this section briefly explains how to launch the controller so that it connects to the GUI, since otherwise the GUI is useless. It is important that the GUI is executed before the controller is launched. To execute the controller, it is necessary to know the IP and the port where the GUI is listening. If not changed manually, the default port is 6660. With this data, the controller is launched from its corresponding directory with the following command, where the -GUI flag and its parameters are added to any other desired flags:

./controller -GUI <GUI IP> <GUI port> <controller port>

As for the parameters, <GUI IP> and <GUI port> are respectively the address and the port to connect with the GUI. <controller port> is the port number that the controller should use to receive the GUI messages, and the only constraint when choosing it is that it has to be a free port. Typically, the chosen port will be 6661.
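As a rough illustration of the GUI side of this connection, the following is a minimal sketch of a UDP listener bound to the default port 6660. The single-message flow, method names and message text are assumptions for illustration, not FlexMPI's actual code.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;

// Minimal sketch of how the GUI can listen for controller messages over UDP.
// Port 6660 is the GUI's default listening port described in the manual.
public class GuiListener {
    // Block until one datagram arrives on the given socket and return its text.
    static String receiveOnce(DatagramSocket socket) throws Exception {
        byte[] buf = new byte[4096];
        DatagramPacket packet = new DatagramPacket(buf, buf.length);
        socket.receive(packet); // blocks until the controller sends data
        return new String(packet.getData(), 0, packet.getLength());
    }

    public static void main(String[] args) throws Exception {
        try (DatagramSocket socket = new DatagramSocket(6660)) {
            System.out.println("Received: " + receiveOnce(socket));
        }
    }
}
```

UDP fits this setup because each metrics message is small and self-contained, so an occasional lost datagram only delays the next update.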

Using the GUI

Once everything is set up, it is time to actually use the tool. This section of the guide helps in getting familiar with the different components and functionalities that the GUI offers.

Main Window

All the applications executed appear in the main window of the GUI. This window offers a fast view of the status of the system, as well as access to the different functionalities. An example of this window can be found in Figure 52, with the different components highlighted and numbered.

Figure 52: FlexMPI GUI main window components

1. Application name, together with the app ID.
2. FLOPS and CPU time, which are shown whenever an application's statistics service is active.
3. Options button, which opens the options panel for that application.
4. Show info button, which opens the visualization panel for that application.
5. Request button, which requests a metrics message outside the default interval.
6. Show nodes button, which opens the nodes panel where nodes and contention alerts are shown.
7. Panel that shows some messages from the GUI.

Commands Panel

This panel contains most of the functionality for sending commands to a connected application. It is opened using the Options button from the main window for the application that is going to be messaged. Each option is shown highlighted and numbered in Figure 53.

Figure 53: Commands panel components

1. Get counters
2. Add/Remove process
3. Load balance
4. Start/stop statistics service
5. Terminate app

Statistics View

To watch the statistics, the first step is to activate the statistics service by clicking button 4 from the command panel. When the statistics service is active, it will be seen as in Figure 54.

Figure 54: Commands panel with statistics service active

To open the statistics window for an application, it is necessary to click button 4 (Show info) in the main window for that application, which will open the corresponding panel. The window that gets opened is shown in Figure 55.

Figure 55: Application processes and statistics messages view

This window has two sections. The button 0 is the one that shows the metrics plot; it corresponds to process number 0 of this application. Although at the moment the metrics are received aggregated per application, if that changed there would be more buttons in that panel. The bottom text panel shows the different metrics that have been received for that application. Finally, the code that appears on top of the window is the hash code of the application name, and it is used to name the file where the statistics are stored persistently. Pressing the process 0 button opens the plotting panel. This panel, shown in Figure 56, has on the left side the different metrics, to select which ones are going to be plotted. The main part of the panel is the chart, where the metrics are plotted over time. The colour corresponding to each metric can be found below, next to the metric name.

Figure 56: Metrics plot panel
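As a sketch of the file-naming scheme described above, the hypothetical snippet below derives a log file name from the hash code of the application name. The logs/ directory and the .log suffix are illustrative assumptions; only the use of the name's hash code comes from the GUI's behaviour.

```java
// Derive a stable per-application log file name from the hash code of the
// application name, as the GUI does for persistent statistics storage.
// The "logs/" directory and ".log" suffix are illustrative assumptions.
public class StatsLogName {
    static String logFileFor(String appName) {
        return "logs/" + appName.hashCode() + ".log";
    }

    public static void main(String[] args) {
        System.out.println(logFileFor("jacobi"));
    }
}
```

Because String.hashCode() is deterministic, the same application name always maps to the same file across executions.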

Nodes View and Management

The big button on the left of the main window (number 6 in Figure 52) opens the nodes panel. This panel, shown in Figure 57, contains a button for each existing node, organized in a grid-like layout so that each node can be found easily.

Figure 57: Nodes view panel

When a contention message arrives, the corresponding button changes its colour to red, as in Figure 58, notifying that there is an issue with that node. It is important to remark that if the node contains an application with the auto-solve option activated, the GUI will solve the issue without user interaction, and thus will not show any alert.

Figure 58: Nodes view panel with contention alert


Clicking on any of the buttons opens the information panel for that node. In the node information frame, the GUI shows the metrics that arrived with the contention alert, along with the applications that were running in that node at the time of the error. An example of this panel is shown in Figure 59.

Figure 59: Node information panel

As seen in the example, the metrics are shown in the top half of the panel. These metrics correspond to the values for the node when the alert arrived, and at least one of them should look problematic. Below the metrics, there is the list of the applications that are running in the node, with two buttons for each app. The first button, “Show info”, has the same functionality as the one in the main window: it opens the statistics window for that application, to easily check whether there is any performance drop. In case lower performance is detected, the second button moves that application to a different node, so the problems on the affected node are reduced. Finally, the two buttons at the bottom of the panel automatically solve the problem. The first one does it a single time, while the second one activates the automatic service. In both cases the system automatically chooses the application to move from that node to a different one, but with the second one, each time a new contention message arrives it is automatically handled by the GUI. The way to know that a node is being automatically fixed is that the corresponding button turns green.
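The selection rule used by the automatic mode, moving the application with the fewest processes on the contended node, can be sketched as follows; the App class and its fields are hypothetical illustrations, not the GUI's actual data structures.

```java
import java.util.Comparator;
import java.util.List;

// Sketch of the automatic contention handling: when a node reports
// contention, the application with the fewest processes on that node is
// chosen as the one to migrate elsewhere.
public class ContentionSolver {
    static class App {
        final String name;
        final int processes;
        App(String name, int processes) { this.name = name; this.processes = processes; }
    }

    // Pick the app with the lowest process count; a contention alert implies
    // that at least one application is running on the node.
    static App pickAppToMove(List<App> appsOnNode) {
        return appsOnNode.stream()
                .min(Comparator.comparingInt((App a) -> a.processes))
                .orElseThrow(IllegalStateException::new);
    }

    public static void main(String[] args) {
        List<App> node = List.of(new App("jacobi", 8), new App("epigraph", 2));
        System.out.println(pickAppToMove(node).name); // prints "epigraph"
    }
}
```

Moving the smallest application keeps the migration cheap, which is also why the thesis notes this simple heuristic leaves room for smarter policies.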
