FLOOD PREDICTION MODEL SIMULATION WITH HETEROGENEOUS TRADE-OFFS IN HIGH PERFORMANCE COMPUTING FRAMEWORK

Antoni Portero, Radim Vavrik, Stepan Kuchar, Martin Golasowski, Vit Vondrak
IT4Innovations, VSB-Technical University of Ostrava
Ostrava, Czech Republic
Email: [email protected]

Simone Libutti, Giuseppe Massari, William Fornaciari
Politecnico di Milano
Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB)
Milano, Italy
Email: [email protected]

Proceedings 29th European Conference on Modelling and Simulation ©ECMS
Valeri M. Mladenov, Petia Georgieva, Grisha Spasov, Galidiya Petrova (Editors)
ISBN: 978-0-9932440-0-1 / ISBN: 978-0-9932440-1-8 (CD)

KEYWORDS

Hydrology model simulation; High Performance Computing systems; disaster management; reliability models; computing resource management; multi- and many-core systems; parallel systems

ABSTRACT

In this paper, we propose a safety-critical system with run-time resource management that is used to operate an application for flood monitoring and prediction. This application can run with different Quality of Service (QoS) levels depending on the current hydrometeorological situation. The system operation can follow two main scenarios: standard or emergency operation. Standard operation is active when no disaster occurs, but the system still executes short-term prediction simulations and monitors the state of river discharge and precipitation intensity. Emergency operation is active when an emergency situation is detected or predicted by the simulations. Resource allocation can be used either for decreasing power consumption and minimizing the required resources in standard operation, or for increasing precision and decreasing response times in emergency operation. This paper shows that it is possible to describe different optimal points at design time and use them to adapt to the current quality of service requirements at run-time.

INTRODUCTION

Urgent computing (Beckman 2008) prioritises and provides immediate access to supercomputers and grids for emergency computations such as severe weather prediction during disaster situations. For these urgent computations, late results are useless results, because the decisions based on them have to be made before the disaster happens and endangers human lives or property. As the High Performance Computing (HPC) community builds increasingly realistic models, applications that need on-demand computation are emerging. Looking into the future, we might imagine event-driven and data-driven HPC applications running on demand to predict any kind of event, e.g. running air pollution dispersion models in real time to predict hazardous substance dispersion caused by an accident. As we build confidence in these emerging computations, they will move from the scientists' workbench into critical decision-making paths, but HPC clusters will still be needed to execute and analyse these simulations in a short time frame, and they will have to provide the results reliably based on the specified operational requirements. The intent of this paper is to describe the characteristics and constraints of disaster management (DM) applications for industrial environments.

We therefore propose an HPC system running with a Run-Time Resource Manager (RTRM) that monitors the health of the platform. Our in-house disaster management application, called Floreon+ (Martinovič, Kuchař, Vondrák, Vondrák, Šír & Unucka 2010), runs on this system. The application can run with different QoS levels depending on the situation. With the current situation in mind, the resource allocation can be used either for decreasing power consumption and minimizing the required resources in standard operation, or for increasing precision and decreasing response times in emergency operation. Power consumption is a critical consideration in high performance computing systems, and it is becoming the limiting factor for building and operating future petascale and exascale systems. When studying the power consumption of existing systems running HPC workloads, we find that power, energy and performance are closely related, leading to the possibility of optimizing energy without sacrificing performance.

This paper shows that it is possible to describe different optimal points at design time and use them to adapt to the current QoS requirements at run-time. The proposed process can then be used to prepare an operational environment with high availability and varying energy cost and model precision levels. We adapt the precision of the model by changing its input parameters and configuration, e.g. changing the number of Monte-Carlo samples used for uncertainty modelling. The change of precision alters the model execution time and/or the number of required computational resources, effectively changing the energy cost of the simulation. The decision to focus on precision or energy cost is taken by the RTRM, which computes the best trade-off for the current QoS.

This paper is divided into seven sections. A section containing the current state of the art follows this introduction. In section three, we introduce our driving disaster management example in the form of a rainfall-runoff uncertainty model. Section four describes the run-time resource management (RTRM) system that monitors the sensors and knobs of the platform. Section five shows a possible way of deploying the RTRM in an HPC environment. Section six presents the optimal computation points obtained at design time for the described solution deployed on a single computing node and on multiple nodes. These optimal points are used by the RTRM at run-time, and the results show the performance and energy efficiency of the deployed solution. The final section provides a summary of the presented research and possible future work.

THE STATE OF THE ART

An elastic computing cloud (Galante & de Bona 2012) can provide thousands of virtual machine images rapidly and cost-effectively. Unfortunately, one of the main drawbacks of elastic computing is limited application scalability due to constrained communication bandwidth. Another problem is the feasibility of allocating resources and the infrastructure size that can be used on demand in case of emergency; sometimes, it cannot be determined fast enough. Therefore, one of the goals of future HPC systems is to develop automated frameworks that use power and performance knobs to make application-aware energy optimizations during execution, using techniques like dynamic voltage and frequency scaling (DVFS) (Iraklis Anagnostopoulos 2013) to reduce the speed (clock frequency) in exchange for reduced power. The power gating technique then allows for a further reduction of power consumption by shutting off the current to blocks of the circuit that are not in use, in cases where diverse computations have different power requirements. For example, when a CPU is waiting for resources or data, its frequency can be reduced to lower power consumption with minimal performance impact (Tiwari, Laurenzano, Peraza, Carrington & Snavely 2012).

While use-case scenarios classify the application's behaviour based on the different ways the system can be used in its overall context, system scenarios classify the behaviour based on the multi-dimensional cost trade-off during the implementation (Gheorghita, Vandeputte, Bosschere, Palkovic, Hamers, Vandecappelle, Mamagkakis, Basten, Eeckhout, Corporaal & Catthoor 2009).

RTRM: Run-Time Resource Manager

The RTRM (Run-Time Resource Manager) framework (Bellasi, Massari & Fornaciari 2015) is the core of a highly modular and extensible run-time resource manager which provides support for easy integration and management of multiple applications competing for the usage of one or more shared many-core computation devices. The framework design, which exposes different plug-in interfaces, provides support for pluggable policies for both resource scheduling and the management of application coordination and reconfiguration.

With the RTRM, it is possible to add suitable instrumentation to support Design-Space Exploration (DSE) techniques, which can be used to profile application behaviour, either to optimize applications at design time or to support the identification of optimal QoS requirement goals as well as their run-time monitoring. Suitable platform abstraction layers, built on top of GNU/Linux kernel interfaces, allow easy porting of the framework to different platforms and its integration with specific execution environments.

The configuration of a run-time tunable application is defined by a set of parameters. Some of them impact the application behaviour (e.g. the uncertainty of rainfall-runoff models can produce different results for different scenarios), while others have a direct impact on the amount of required resources (e.g. the number of Monte-Carlo samples in uncertainty modelling and the time between simulation batches lead to different requirements for allocating system resources).

FLOOD MONITORING AND PREDICTION MODEL

The studied flood prediction model is a modular part of the Floreon+ system (Martinovič et al. 2010). The main objective of the Floreon+ system is to create a platform for the integration and operation of monitoring, modelling, prediction and decision support for disaster management. The modularity of the Floreon+ system allows for simple integration of different thematic areas, regions and data. The central thematic area of the project is flood monitoring and prediction.
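The precision knob discussed above, the number of Monte-Carlo samples, trades accuracy against compute cost. A minimal sketch (in Python; the discharge statistics and sample counts are invented for illustration and are not taken from the Floreon+ system) shows how the confidence interval of an uncertainty estimate narrows roughly as 1/sqrt(n) while the cost of the simulation grows linearly with n:

```python
import random

def run_uncertainty_model(discharge_mean, discharge_stddev, n_samples, seed=0):
    """Toy Monte-Carlo uncertainty estimate of peak river discharge.

    Illustrative only: mimics the 'number of samples' knob the paper
    uses to trade model precision against execution time and energy.
    """
    rng = random.Random(seed)
    samples = [rng.gauss(discharge_mean, discharge_stddev) for _ in range(n_samples)]
    mean = sum(samples) / n_samples
    # The 95% confidence half-width shrinks as 1/sqrt(n): more samples
    # give higher precision at a proportionally higher compute cost.
    variance = sum((s - mean) ** 2 for s in samples) / (n_samples - 1)
    half_width = 1.96 * (variance / n_samples) ** 0.5
    return mean, half_width

# Standard operation: few samples, cheap. Emergency: many samples, precise.
coarse = run_uncertainty_model(100.0, 20.0, n_samples=100)
fine = run_uncertainty_model(100.0, 20.0, n_samples=10000)
```

With a hundred times more samples, the interval half-width drops by about a factor of ten, which is the quantitative basis for treating the sample count as a QoS knob.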
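The design-time/run-time split behind the RTRM approach can be illustrated with a small selector over precomputed operating points. The point table and its energy/time figures below are hypothetical placeholders, not measured values from the paper; real points would come from design-space exploration of the rainfall-runoff model:

```python
# Hypothetical design-time operating points for the simulation:
# (Monte-Carlo samples, CPU cores, energy in joules, response time in
# seconds). The numbers are invented for illustration.
OPERATING_POINTS = [
    {"samples": 100,   "cores": 4,  "energy_j": 50.0,  "time_s": 120.0},
    {"samples": 1000,  "cores": 8,  "energy_j": 180.0, "time_s": 90.0},
    {"samples": 10000, "cores": 16, "energy_j": 600.0, "time_s": 60.0},
]

def select_point(mode, deadline_s):
    """Pick an operating point the way a run-time manager might:
    minimise energy in standard operation, minimise response time in
    emergency operation, both subject to the simulation deadline."""
    feasible = [p for p in OPERATING_POINTS if p["time_s"] <= deadline_s]
    if not feasible:
        raise ValueError("no operating point meets the deadline")
    if mode == "standard":
        return min(feasible, key=lambda p: p["energy_j"])
    return min(feasible, key=lambda p: p["time_s"])
```

The actual RTRM applies richer policies and monitors the platform continuously, but the essential mechanism is the same: optimal points are characterised once at design time and then selected at run-time according to the current QoS requirements.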
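The DVFS knob mentioned in the state of the art is exposed on GNU/Linux through the cpufreq sysfs interface (e.g. /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies). A brief sketch, with helper names of our own choosing, parses the space-separated kHz list that file exposes, so the logic can be exercised without root access or specific hardware:

```python
# Standard Linux cpufreq sysfs location for CPU 0; reading it requires
# a kernel with the cpufreq subsystem enabled.
CPUFREQ_DIR = "/sys/devices/system/cpu/cpu0/cpufreq"

def parse_available_frequencies(text):
    """Parse the space-separated kHz list from
    scaling_available_frequencies into a sorted list of integers."""
    return sorted(int(f) for f in text.split())

def lowest_frequency(text):
    """Frequency a power-aware manager could select while a CPU is
    waiting for resources or data, per the DVFS discussion above."""
    return parse_available_frequencies(text)[0]
```

Writing the chosen value back to scaling_setspeed (with the userspace governor active) is how a manager would actually apply the reduced clock frequency; that step is omitted here since it needs privileged access.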