<<

HiPro-CodeGen Automatic Programming for Parallel Numerical Simulations

Liao Li, Jing Cuiping, Wang Wei, Zhang Aiqing and Yang Zhang Institute of Applied Physics and Computational Mathematics, No. 2, East Fenghao Road, Beijing, China

Keywords: High Performance Computing, Generation, Graphical Programming, Parallel Programming, Component-based Programming.

Abstract: HiPro-CodeGen is a code generation engine designed for numerical simulation development. Its central objective is to produce a parallel software framework with standard structure for an application developed on JASMIN, a domain-specific computational framework. The unique parallel part and all interfaces of the application are generated and implementation of sequential subroutines is the only part of the code left to be written manually for a programmer. The design and implementation of the code generation engine is introduced which combines numerical mathematics with component-based programming to create ontological models for parallel simulations. A hybrid programming method is proposed on the work mechanism of the engine which combines graphical and textual approaches to hide parallel programming and object-oriented programming from developers. A real application is presented to show the effectiveness and efficiency of the engine.

1 INTRODUCTION difficulties on parallel programming embarrass the development of high performance numerical Numerical simulation is an important method to simulations. explore the evolution of physical systems by To ease the coding issues many studies on scientists in current age (Mo, Pei, 2009). With the graphical programming and automated code development of high performance computing, generation techniques have been given on simulation is leading to high accuracy and high developing parallel applications. Boris etc. develop confidence in scientific study. Developing a an integrated development environment for visual scientific simulation covers implementing scientific parallel programming (Boris, 2006). It is not methodology, algorithms, and programming models, effective to develop complicated numerical and virtually always each advance has been simulations because of limitation on graphical accompanied by increases in the complexity of the operations. MatLab also gives some parallelism underlying software. At the same time, the computer capabilities with toolbox while industry has continued to create ever larger and the scalability and the performance of the program more complex hardware in an attempt to satisfy the are limited by its parallel computing mechanism. increasing demand for simulation capabilities Experience says that programming in graphic (Benjamin, 2006). These architectures tend to form completely is embarrassed on solving the exacerbate the complexity of software running on problems especially developing numerical these systems. Modern programmers may repeatedly simulations because of the complexities of rewrite applications to expose incrementally more simulations and the limitations of the graphical parallelism for each next generation of hardware. It programming technology. Liao etc. propose a hybrid requests a programmer be knowledgeably aware of programming approach which combines graphical new software and hardware. Admittedly, this is a and textual programming to solve the problem by demanding task for current programmers most of shielding parallel programming details and object- which are engaged in researching on physics and oriented programming languages for developing mathematics, because they have been trained in scientific applications based on JASMIN. sequential programming. We are facing a fact that

Li L., Cuiping J., Wei W., Aiqing Z. and Zhang Y.. 125 HiPro-CodeGen - Automatic Programming for Parallel Numerical Simulations. DOI: 10.5220/0005089101250131 In Proceedings of the 9th International Conference on Software Engineering and Applications (ICSOFT-EA-2014), pages 125-131 ISBN: 978-989-758-036-9 Copyright c 2014 SCITEPRESS (Science and Technology Publications, Lda.) ICSOFT-EA2014-9thInternationalConferenceonSoftwareEngineeringandApplications

In this paper we present the design and a patch domain which is part of the global implementation of the code generation engine which region. Programmers must implement some supply the automatic programming capabilities strategy procedures which are implicitly called given in HiPro (Liao,2013), originally named with by JASMIN. IDE-JASMIN. The content of this paper is organized . Integrator component: a parallel computation as follows. Section 2 analysis the features of parallel mode defined in JASMIN. It is invoked in a programs developed on JASMIN. In section 3, the level strategy class and implicitly calls design of the code generation engine which utilizing procedures defined in its registered patch the features to create topological model is presented strategy class. in detail. In section 4, we show the advantages of . Numerical kernel: A kind of subroutine to using HiPro-CodeGen to develop a computational implement part of numerical computational fluid dynamic parallel numerical simulation program method. It is a sequential program. in the system. Then we present our conclusion and point to directions for future work. 2.1 Data Structure and Numerical Algorithm 2 FEATURE MODEL OF A Generally numerical simulation is solving a system PARALLEL NUMERICAL of PDEs or ODEs with numerical methods on a discrete grid. JASMIN supports development of SIMULATIONS structured grid application of two types including rectilinear and deforming grid (Mo, 2009). Figure 1 JASMIN abstracts a general framework from shows the two-dimensional example grids often- numerical simulation domain and provides large used in numerical simulations. Data parallelism is granularity software reuse in scientific high- the main form of parallelization of computing. In the performance computing (Mo, 2010). The framework kind of parallelism, as is showed in Figure 2, the tends to isolate the increased complexity of the region is decomposed into many domains during the software and operating environment from the process of the computation. Each domain is programmers and utilize state of art technologies to extended with ghost zones to satisfy the need of extract the maximum possible performance. It numerical scheme. In JASMIN, three kinds of facilitates problem solving with encapsulation of coordinate system are available: Cartesian, parallel computing and communication through cylindrical and spherical coordinates. Grid type and patch-based data structures and component-based coordinate system determine the mesh description mechanism which shield data decomposition and mode in a program. communication from the users. Generally numerical A physical variable, which also named with field simulations share many common features. These are variable, is defined on the special position of the utilized in JASMIN to provide a more normative architecture for developing a parallel program. We define a feature model to illustrate a numerical simulation developed based on JASMIN. It is used to direct the design of the hybrid graphical programming interface. The model includes five Figure 1: Rectilinear and deforming grid used in structured parts which are described in the following appliations. subsections. Before we introduce the model, to describe the construction features of a numerical simulation, we give some key concepts: . Level strategy class: strategy class which is responsible for dealing with the computation on global region. Programmers must implement some strategy procedures which are implicitly called by JASMIN. . Patch strategy class: strategy class specially serves for an integrator component. It is responsible for dealing with the computation on Figure 2: Domain decomposition for data parallelism.

126 HiPro-CodeGen-AutomaticProgrammingforParallelNumericalSimulations

implement his computational method in sequential numerical subroutines while parallelism is

Cell Node Side-X Side-Y Particle accomplished in integrator components. The separation makes hybrid programming of combining Figure 3: Discretization types of variables in a numerical graphical and textual programming feasible. simulation. 2.2 Application Architecture and discretization grid, such as cell, node, side or face Program Organization etc., as illustrated in Figure 3. The numerical algorithm usually involves time Domain experts develop parallel numerical iterations with given initial conditions, i.e. time applications by assembling configured parallel integration algorithm. A complex algorithm is integrator components in a flowchart of time broken down into smaller and manageable parts. integrator algorithm. For each component carrying These parts are arranged to be executed adhering to on computation task a patch strategy must be some rules. Figure 4 shows the flowchart of a implemented and the object of it be registered to the general time integration algorithm. Generally the component. Some virtual interfaces must be algorithm involves four key modules which are implemented in C++ language and in which, data of organized in a loop and the user provides parameters field variables is fetched via patch object and passed to control the stop of it. All modules should be to a numerical kernel invoked. Therefore parallelism implemented according to the concrete computation. for a computational module is achieved by connecting the component with real numerical Start kernels. These components provide interfaces or ports to be invoked in level strategy. As is shown in Initialization figure 5, the application architecture model includes: Continue ? . Main program . A level strategy class which deals with

Evaluate time step assembling components. . Patch strategy classes. For each component Advance 1 step some specific procedures must be implemented in a patch strategy class to invoke sequential Accept solution numerical subroutines to accomplish numerical computation. The number of the patch strategy classes is determined by the number of the End components created. . Sequential numerical subroutines Figure 4: A general flowchart presenting time integration algorithm. Programs and Classes Component Patch Strategy Subroutines of Common Objectes Classes Archetecture

Generally developers are used to programming TypeDefinations.h subprog‐1 numerical subprograms in , which is a Comp‐1 ICPComp‐1 Main Program.C subprog‐2 conventional textual, linear programming language Comp‐2 ICPComp‐2 that is familiar to programmers who engaged in the subprog‐3 AppLevelStrategy . . sequential numerical simulation development. Comp‐n Component architecture have been used for high VariableManager ICPComp‐n subprog‐n performance scientific computing for years(Parker, 2002). Parallel integrator components are proposed AppFort.h to encapsulate parallel computing details on patch- Figure 5: Architecture of an application and invoking based data structure (Mo, 2009). Parallelism is model. obtained by creating integrator components to encapsulate sequential numerical subroutines. Components encapsulate much of the complexity of 3 DESIGN OF HiPro-CodeGen parallelism inside a black box such as invoking MPI or OpenMP interfaces and utilizing multi or many- To develop a parallel program of numerical cores CPU in modern machines. We support user to simulation the user must write program to:

127 ICSOFT-EA2014-9thInternationalConferenceonSoftwareEngineeringandApplications

. Define variables. numerical subroutines. . Code sequential computation kernels. The type definition of rectilinear grid variable for . Create parallel integrator components. a patch domain in C and C++ is . Assemble components in the time integration algorithm to construct a parallel program. Typedef struct { . Deal with physical model described in an input int ilo[NDIM]; file. int ihi[NDIM]; . Output physical variables for visual analysis. double xlo[NDIM]; double dx[NDIM]; . Create a main procedure to organize issues } geom_t listed above. HiPro is designed to offer graphical The definition of that for deforming grid is programming capability to develop parallel similar with it which only needs the first two numerical simulations. The system implements a parameters. Comparative definition of them in graphical domain-specific programming language Fortran shares the same form using derived types. combined with textual editor to support parallel We leave it out here for simplicity. The type programming. The idea of parallel design combined definition for ghosts is with sequential coding is achieved in the visual programming environment. Typedef struct { According to the application model described in int ilo[NDIM]; section 2, the programming task is classified into int ihi[NDIM]; two types: configured on GUI and processed int btype; automatically. Upon this we shield most of them as int location; best as we can in the programming system. int condition; Therefore to construct a numerical simulation } ghost_t program only the first four things of those which are For physical variables, we give a set of domain- mentioned at the beginning of this section left to the specific data types listed in Table 1. programmer to deal with. The other issues are . Domain-specific data type: a kind of data type handled automatically. This mechanism releases the used for describing the discretization form of a user from complex programming. HiPro-CodeGen is physical variable specially defined in a the core of HiPro system which is responsible for numerical simulation. generating codes. Next we introduce the working mechanism in HiPro-CodeGen according to the four Table 1: Domain-specific data types. tasks separately. Type Properties 3.1 Define Variables Cell Element type, Depth, Ghost width Node Element type, Depth, Ghost width Variables in a scientific numerical simulation are Side-X Element type, Depth, Ghost width classified into three categories according to their Side-Y Element type, Depth, Ghost width functionality and source, they are: grid variables, Group Element type, Depth, Ghost width, Number of physical variables and grid independent ones. The Cell groups classification is used to standardize interface Group Element type, Depth, Ghost width, Number of definition for a procedure and build type match Node groups filtering when the procedure is called. Three variable databases are created to store the variables. They are Table 2: Supplementary data type. physical variable DB, general variable DB and input Type Properties parameter DB. Generic Element type, Depth A grid variable gives the definition of a domain. As noticed, no database is created for grid variables. To describe the general variables we supplement In fact, grid ones are created automatically. There the type list with a special one, as is shown in the are two build-in grid variables in the system. One is Table 2. Upon this we extend data types to cover all used to describe the patch domain and the other is variables in JASMIN. These extensions are used to give the information of ghost zones. This is necessary to support full application development in vitally important for the system design because we scientific and engineering computing. use them to normalize the interface definition of

128 HiPro-CodeGen-AutomaticProgrammingforParallelNumericalSimulations

Programmer defines a variable through graphical code of interface definition is generated and inserted user interfaces interactively while the system will into the beginning of the program file automatically. invisibly generates the corresponding source code of This approach frees users from error-prone and it. For instance, the code of a physical variable of tedious type definitions and variable declarations. Cell type for a two dimensional application is given Figure 6 shows an example code generated for a as below: subroutine. The code will be updated and replace the older one when the user modify the definition integer :: gw_varName(0:1) through GUI. ElementType [,IntentType]:: varName (geom%ilo(0)-gw_varName(0) : 3.3 Create Parallel Integrator geom%ihi(0)+gw_varName(0), geom%ilo(1)-gw_varName(1) : Components geom%ihi(1)+gw_varName(1)) A type library of component is provided in the 3.2 Program Sequential Numerical system. Each type of components performs some Subroutines kind of well defined task and has specially designed GUI. Available types of integrator components and In the system numerical subroutines are classified their functional properties are listed in Table 3. The into three categories: property specifies acceptable categories of . (Sub I) Execute computation on a domain. subroutines and whether this kind of component . (Sub II) Fill physical boundary. needs to register patch data to accomplish memory . (Sub III) Process grid independent variables management or data communication.

Table 3: Integrator component types and functional properties. Type Properties Initialize Sub I, RegisterPDI Numerical Sub I+II, RegisterPDI Dt Sub I, RegisterPDI Reduce Sub I, RegisterPDI Memory RegisterPDI Copy RegisterPDI ParticleComm RegisterPDI

In fact, creating and using a parallel integrator component in HiPro needs only three simple steps: . Create a component from an appropriate component model; . Configure necessary properties of it; . If necessary, select a proper numerical

subroutine and pass actual arguments to it. Figure 6: Generated code for a numerical kernel interface in Fortran 90. 3.4 Assembling Components in Requirements of dummy parameters are different Computational Flow Charts according to the classification. For example, no Main concern of writing a program is how to dummy parameter of physical variable type is compose these components together to form permitted in the third category. A parameter of applications. A flow chart is the graphical geom_t is absolutely necessary in the first and representation of control flow of an algorithm (Tia, second category and ghost_t is mandatory in the 2004). second one. There must be one and only one . Predefined flowchart: expressing the general parameter of such type. The classification is used to numerical computation execution process of a build interface requirement match for integrator time integration algorithm. component definition. A predefined flowchart, as is shown in figure 7 We use Fortran 2003 as the programming representing the time integrator algorithm is used to language for coding numerical subroutines. The

129 ICSOFT-EA2014-9thInternationalConferenceonSoftwareEngineeringandApplications

fit the need of rapid numerical integration algorithm a one-to-one relationship between the subroutines implementation. The main flowchart is composed of and the components. One exception is that a some semi-predefined flowcharts which includes subroutine of category Sub II must be assigned to a initalizeLevelData, getLevelDt, advanceLevel and component together with a subroutine of category acceptTimeDependentSolution. They are actually the Sub I. The other exception is that a component of member procedures of the level strategy class. A Memory type must be created to allocate or flowchart is semi-predefined means that the deallocate memory for physical variables. interface of it is built-in defined and the body of it need to be implemented by the user. In HiPro, Table 4: Subroutines and integrator components created components are inserted into the flowchart to define for the simulation. the implementation of the procedure. Loops, Sequential Subroutines(Category) Component(Type) conditional branches and other control structures can initset(Sub I) initComp(Initialize) be used to configure the execution logic of the getdt(Sub I) dtComp(Dt) components. On the right of the Figure 7, part of setphy(Sub II) fluxComp(Numerical) advanceLevel is illustrated, where some integrator getflux(Sub I) components are created and used to organize the updateconser(Sub I) connComp(Numerical) actual computation. acceptsolution(Sub I) updateComp(Copy) allocComp(Memory)

Figure 8: Screenshot of physical variables created in HiPro for solving the linear advection equation.

Figure 7: Assembling integrator components to implement a procedure for the level strategy class.

Flowcharts and kernels as well as other data from graphical input are stored into a database, which are then feed into code generation engine to accomplish automatic code generation using predefined code templates.

4 APPLICATION Figure 9: The four implemented semi-predefined We give a test on developing a simulation program flowcharts. for solving linear advection equation: Figure 8 shows the screenshot of physical u  (a  u)  0 variables created in HiPro for the simulation. Four t semi-predefined flowcharts with components The initial value is given by and u(x,0)  u0 (x) inserted are shown in figure 9. As is presented in the boundary condition is set as u(x,t)  u (t) . Table 5, in this application it consists of 17 C++ The computation is decoupled into six numerical program files, 2 files coded in C language, 7 Fortran subroutines. Six components are created to files and 1 CMake file. Among 1947 lines of source implement parallel computing and communication. code about 6 percent of them are coded by hand. The subroutines with category specification and components are listed in Table 4. Generally there is

130 HiPro-CodeGen-AutomaticProgrammingforParallelNumericalSimulations

Table 5: Program files and ratio of auto-generated for the development for tens of years and is quite familiar to simulation of solving linear advection equation. them. Ratio of Practice demonstrates that non-programmers can Program files Language Lines auto- create fairly complex programs with little training. It generated greatly reduces programming complexities and make Main.C C++ 237 100% numerical application development easy and fast. VariableManager.h C++ 59 100% Real world applications show that HiPro-CodeGen VariableManager.C C++ 152 100% ensures high-performance and high-quality for LinAdvLevelStrategy.h C++ 92 100% scientific numerical simulation. LinAdvLevelStrategy.C C++ 99 100% ICPInitComp.h C++ 33 100% ICPInitComp.C C++ 58 100% ACKNOWLEDGEMENTS ICPDtComp.h C++ 42 100% ICPDtComp.C C++ 90 100% ICPFluxComp.h C++ 40 100% This work is under the auspices of the National ICPFluxComp.C C++ 106 100% Natural Science Foundation of China (61033009), ICPConnComp.h C++ 40 100% National Basic Key Research Special Fund ICPConnComp.C C++ 90 100% (2011CB309702) and National High Research and ICPUpdateComp.h C++ 40 100% Development Program of China (2012AA01A309). ICPUpdateComp.C C++ 86 100% ICPAllocComp.h C++ 26 100% ICPAllocComp.C C++ 35 100% REFERENCES LinAdvFort.h C 33 100% TypeDefs.h C 26 100% Benjamin A. A., Robert A.,David E. B.,etc. 2006. A typedef.f90 Fortran 29 100% component architecture for high-performance initset.f90 Fortran 42 50% scientific computing. The International Journal of getdt.f90 Fortran 23 70% High Performance Computing Applications,20(2):162- 202 setphy.f90 Fortran 59 50% Boris S.,Alexey S.,Vera I. Integrated Development getflux.f90 Fortran 64 50% Environment for Visual parallel programming. In updateconser.f90 Fortran 60 60% Proceedings of the 10th Conference of Fruct acceptsolution.f90 Fortran 40 70% Association. CMakeLists.txt CMake 246 100% Liao L., Zhang A.Q., Yang Z., etc. IDE-JASMIN:An Interactive Graphical Approach for Parallel Programming in Scientific Computing. 8th International Conference on Software Engineering 5 CONCLUSIONS and Applications. Mo Z.Y., Pei W.B., 2009. Scientific computing Parallel programming usually involves tedious application codes. Physics (in Chinese). amount of coding and often error-prone for domain Mo Z.Y., Zhang A.Q., 2010. JASMIN: A parallel software infrastructure for scientific computing. Front. Comput. experts. HiPro-CodeGen, a powerful automatic code Sci. China. generation engine, is developed to generate major Mo Z.Y., Zhang A.Q., 2009. User’s guide for JASMIN, and tedious part of a JASMIN application. Plenty of Technical Report. https://www.iapcm.ac.cn/jasmine. knowledge and technologies are shielded from the Parker, S.G., 2002. A component-based architecture for programmers, including: parallel multi-physics PDE simulation. In Proceedings . Object-oriented programming of the International Conference on Computational . C++ language Science-Part III. Springer-Verlag. . Hybrid programming with C++ and Fortran Pei W.B., Zhu S.P., 2009. Scientific computing in Laser . Component-based programming Fusion. Physics (in Chinese), 38(8): 559-568. Tia W. 2004.The SFC Editor a graphical tool for It frees users from unnecessary exposure to algorithm development. Jounal of Computing Science complex features of programming issues of in Colleges. languages and JASMIN interfaces together with the application code organization. Users only need to write body code for numerical kernels with well- defined interface through GUI in Fortran which has been used for scientific computing program

131