New Implementations for the Simultaneous-FETI Method Roberto Molina, François-Xavier Roux
Total Page:16
File Type:pdf, Size:1020Kb
New implementations for the Simultaneous-FETI method Roberto Molina, François-xavier Roux To cite this version: Roberto Molina, François-xavier Roux. New implementations for the Simultaneous-FETI method. International Journal for Numerical Methods in Engineering, Wiley, 2019, 118 (9), pp.519-535. 10.1002/nme.6024. hal-02294320 HAL Id: hal-02294320 https://hal.archives-ouvertes.fr/hal-02294320 Submitted on 23 Sep 2019 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. ORIG I NAL AR TI CLE New IMPLEMENTATIONS FOR THE Simultaneous-FETI method. Roberto Molina1∗ | FRançois-XaVIER Roux2∗ 1LaborATOIRE Jacques LOUIS Lions, Sorbonne In THIS work, WE PRESENT ALTERNATIVE IMPLEMENTATIONS FOR THE Université, Paris, 75005, FRANCE 2DTIS/ONERA, Université PARIS SaclaY, Simultaneous-FETI (S-FETI) method. DeVELOPED IN RECENT F-91123 Palaiseau, FRANCE Years, THIS METHOD HAS SHOWN TO BE VERY ROBUST FOR HIGHLY Correspondence HETEROGENEOUS problems. HoweVER, THE MEMORY COST IN S- Roberto Molina, Applied Mathematics FETI IS GREATLY INCREASED AND CAN BE A LIMITATION TO ITS use. Department, National LaborATORY FOR ScientifiC Computing - LNCC, Petrópolis, RJ, Our MAIN OBJECTIVE IS TO REDUCE THIS MEMORY USAGE with- 25651-070, BrAZIL OUT LOSING SIGNIfiCANT TIME consumption. The ALGORITHM IS Email: [email protected] BASED ON THE EXPLOITATION OF THE SPARSITY PATTERNS FOUND ON Present ADDRESS THE BLOCK OF SEARCH directions, ALLOWING TO STORE LESS VECTORS ∗National LaborATORY FOR ScientifiC Computing - LNCC, Petrópolis, RJ, PER ITERATION IN COMPARISON TO A FULL STORAGE scheme. In ad- 25651-070, BrAZIL dition, DIFFERENT VARIATIONS FOR THE S-FETI METHOD ARE ALSO yDTIS/ONERA, Université PARIS SaclaY, F-91123 Palaiseau, FRANCE proposed, INCLUDING ANOTHER TREATMENT FOR THE POSSIBLE de- PENDENCIES BETWEEN DIRECTIONS AND THE USE OF THE Lumped FUNDING INFORMATION The AUTHOR IS SUPPORTED BY CONICYT/Chile PRECONDITIONER. SeVERAL TESTS WILL BE PERFORMED IN ORDER TO ESTABLISH THE IMPACT OF THE MODIfiCATIONS PRESENTED IN THIS WORK COMPARED TO THE ORIGINAL S-FETI algorithm. KEYWORDS DOMAIN decomposition, FETI, S-FETI, SCIENTIfiC computation, HPC, HETEROGENEOUS PROBLEMS 1 | INTRODUCTION In THE CONTEXT OF NUMERICAL RESOLUTION OF PARTIAL DIFFERENTIAL equations, THERE HAS BEEN AND STILL EXISTS A PERMANENT INCREASE IN THE SIZE AND COMPLEXITY OF REAL ENGINEERING applications. This PHENOMENON IS PRESENT IN PROBLEMS COMING Abbreviations: DDM, Domain Decomposition Methods.; FETI, Finite Element TEARING AND Interconnecting. ∗ Equally CONTRIBUTING authors. 1 2 MOLINA ET AL. FROM DIFFERENT fiELDS SUCH AS flUID dynamics, electromagnetism, STRUCTURAL analysis, etc. EvEN IF A LOT HAS ALREADY BE done, THERE IS STILL ROOM FOR IMPROVEMENT FOR THE EFfiCIENT EXPLOITATION OF MASSIVELY PARALLEL SYSTEMS TO SOLVE THESE problems. If WE look, FOR EXAMPLE IN THE FRAMEWORK OF SOLID mechanics, THE COMPLEXITY OF COMPOSITE MATERIALS LEADS TO SIMULATIONS IN DIFFERENT SCALES ON THE SAME NUMERICAL model, WHICH IS A NUMERICALLY CHALLENGING situation. Such PROBLEMS CAN LEAD TO ILL CONDITIONED SYSTEMS WITH DIMENSION RANGING FROM SEVERAL MILLIONS TO HUNDREDS OF millions. In ORDER TO USE THESE NEW ARCHITECTURES IN THE CONTEXT OF PDE solutions, THE DOMAIN DECOMPOSITION METHODS HAVE BEEN A LARGELY USED OPTION [1],[2],[3]. The fiNITE ELEMENT TEARING AND INTERCONNECTING METHOD (FETI) [4] AND THE BALANCED DOMAIN DECOMPOSITION METHOD (BDD) [5] ARE AMONG THE MOST POPULAR non-oVERLAPPING DOMAIN DECOMPOSITION methods. These non-oVERLAPPING METHODS ARE VERY WELL SUITED FOR SUPERCOMPUTERS OR COMPUTER CLUSTERS BECAUSE THEY ARE BASED ON THE SOLUTION OF LOCAL PROBLEMS EASILY PARALLELIZABLE ON A multi-core COMPUTE node. In PARTICULAR, THE FETI METHOD COMBINES THE SOLUTION OF THESE LOCAL PROBLEMS WITH THE ITERATIVE SOLUTION OF INTERFACE MATCHING CONDITIONS IN ORDER TO fiND THE GLOBAL solution. The FETI METHODS HAVE BEEN LARGELY DEVeloped, LEADING TO FASTER AND MORE ROBUST VERSIONS [6]. HoweVER, THERE ARE SOME NUMERICALLY STIFF PROBLEMS WHERE FETI DOES NOT PERFORM WITH THE SAME ROBUSTNESS AS usual. WE HAVe, FOR Example, THE CASES WITH EXTREME HETEROGENEITIES ALONG THE INTERFACE WHERE THE FETI OPERATOR IS DEfined, i.e., THE INTERSECTION OF DOMAINS MADE IN THE decomposition. This LEAD TO VERY ill-conditioned PROBLEMS THAT FETI WITH THE CLASSIC PRECONDITIONERS DOES NOT ASSURES A FAST CONVergence. FOR THESE STIFFER TYPE OF PROBLEMS THE Simultaneous-FETI (S-FETI) METHOD WAS DEVeloped, SHOWING VERY GOOD performances, SEE [7] AND [8]. It IS BASED ON THE CONSTRUCTION OF SEVERAL SEARCH directions, TO BE USED SIMULTANEOUSLY IN THE ITERATIVE SOLVER FOR THE INTERFACE problem. This CONSTRUCTION ADDS SOME EXTRA TIME AND MEMORY CONSUMPTION COST FOR S-FETI IN COMPARISON WITH THE CLASSIC FETI method. The REDUCTION IN THE TIME COST OF THE EXTRA COMPUTATIONS OF S-FETI IS TREATED IN [8]. HoweVER, THIS METHOD ALSO INCLUDES THE STORAGE OF A MUCH LARGER NUMBER OF SEARCH DIRECTIONS AT EACH ITERATION AND SIGNIfiCANT MEMORY ISSUES DERIVE FROM this. In THE CASE OF LARGER 3D problems, THIS MAY BE A LIMITATION IN THE USE OF THIS method. In THIS PAPER, WE PROPOSE DIFFERENT STRATEGIES TO GENERALIZE THE APPLICATION OF THE S-FETI METHOD TO A LARGER CLASS OF PROBLEMS BY REDUCING THE MEMORY use. WE fiRST EXTEND THE PRESENTATION OF S-FETI TO GENERATE EVEN MORE SEARCH DIRECTIONS PER ITERation. WE INCLUDE THE USE OF THE ALTERNATIVE Lumped PRECONDITIONER INSTEAD OF THE USUAL Dirichlet. WE SHOW AN ALTERNATIVE WAY OF SORTING THE SEARCH DIRECTIONS AND WE INTRODUCE A NEW IMPLEMENTATION THAT REDUCES THE MEMORY CONSUMPTION BY STORING THESE VECTORS IN A SPARSE form, i.e, BEFORE THE PROJECTION AND FULL REORTHOGONALIZATION processes. This PAPER WILL BE ORGANIZED AS follows. First, WE RECALL THE DEfiNITIONS OF FETI AND S-FETI. WE PROPOSE then, A NEW WAY OF SORTING THE SEARCH DIRECTIONS IN EACH STEP OF THE ITERATIVE method. Next, WE SHOW IN DETAIL THE NEW SPARSE STORAGE IMPLEMENTATION FOR THE S-FETI METHOD TO fiNALLY MAKE SOME NUMERICAL VALIDATION OF ALL THE IDEAS PROPOSED IN here. 1.1 | FETI DefiNITION LET US CONSIDER A LINEAR MECHANICAL PROBLEM DEfiNED IN A GLOBAL DOMAIN Ω. LET US ALSO CONSIDER THE DISCRETIZATION OF IT BY SOME fiNITE ELEMENT method, LEADING TO A SYMMETRIC AND POSITIVE DEfiNITE SYSTEM OF EQUATIONS Ku = f . The DOMAIN IS ¹sº NOW PARTITIONED INTO Ns non-oVERLAPPING SUBDOMAINS Ω , s = 1;:::; Ns AND NEW LOCAL Neumann PROBLEMS ARE SET up. The FETI METHOD INTRODUCES A LagrANGE MULTIPLIERS fiELD λ THAT CONNECTS THESE subdomains. The FOLLOWING SYSTEMS ARE MOLINA ET AL. 3 THEN FORMED T T K ¹sºu¹sº = f ¹sº + t ¹sº B¹sº λ N Õs (1) B¹sºt ¹sºu¹sº = 0 s=1 WHERE t ¹sº : Ω¹sº ! @Ω¹sº ARE TRACE OPERATORS (restrictions OF A fiELD INTO THE LOCAL boundary) AND B¹sº : @Ω¹sº ! Γ IS ¹ º A SIGNED BOOLEAN ASSEMBLY OPERATOR ON THE GLOBAL INTERFACE Γ, WITH Γ = [ Γ sq WITH Γ¹sqº = @Ω¹sº \ @Ω¹qº. The 1≤s<q ≤Ns ¹sº fiRST LINE REPRESENTS THE LOCAL EQUILIBRIUM IN EVERY SUBDOMAIN Ω , s = 1;:::; Ns AND THE SECOND IS THE CONTINUITY OF THE SOLUTION THROUGH EACH INTERFACE Γ¹sqº BETWEEN EACH NEIGHBORING PAIR OF subdomains, Ω¹sº AND Ω¹qº. The NEXT PART CONSISTS IN REFORMULATING THE LOCAL EQUILIBRIUM IN TERMS OF THE INTERFACE unknowns, BY USING THE Schur COMPLEMENTS IN EACH subdomain. The CLASSICAL NOTATION ARE USED −1 + T ¹sº ¹sº − ¹sº ¹sº ¹sº ¹sº ¹sº ¹sº ¹sº ¹sº ¹ ¹sºº Sbb = Kbb Kbi Kii Ki b ; F = t K t ; R = k er K (2) ¹sº ¹ ¹sºº+ THE SUBSCRIPTS i AND b STAND FOR INTERNAL AND BOUNDARY DEGREES OF FREEDOM RESPECTIVELY. F = Sbb IS THE DUAL OF THE Schur COMPLEMENT AND R ¹sº IS A BASIS OF RIGID BODY MODES OF k er ¹K ¹sºº. With THESE CONSIDERations, PLUS THE CONTINUITY condition, THE CLASSIC FETI SYSTEM IS FORMED ! ! ! FG λ d = (3) GT 0 α e WHERE T e = −(:::; f ¹sº R ¹sº;::: ºT ; G = ¹:::; B¹sºt ¹sºR ¹sº;::: º Õ T Õ + T F = B¹sºF ¹sºB¹sº ; d = − B¹sºt ¹sºK ¹sº f ¹sº (4) s s The OPERATOR F IS NOT EXPLICITLY assembled, IN PRACTICE TO MULTIPLY IT BY A VECTOR, A LOCAL Neumann PROBLEM IS SOLVED IN EACH SUBDOMAIN AND THE JUMP OF THE LOCAL SOLUTIONS ACCROSS EACH THROUGH EACH INTERFACE Γ¹sqº IS computed. The USE OF THE pseudo-inVERSE IN THE DEfiNITION OF F ¹sº IS ASSOCIATED TO THE CONSTRAINT GT λ = e WHICH ENFORCES THE GLOBAL EQUILIBRIUM ON THE INTERFACE FOR THE flOATING subdomains. This CONDITION IS HANDLED BY AN initialization-projection STRATEGY WITH THE DEfiNITION OF T −1 λ0 = AG¹G AGº e Initialization (5) − P = I − AG¹GT AGº 1GT Projection (6) T T SO THAT G λ0 = e AND G P = 0. The SOLUTION IS THEN OBTAINED AS λ = λ0 + P λ˜ WHERE λ˜ IS FOUND VIA THE SOLUTION OF T T P FP λ˜ = P ¹d − F λ0º (7) 4 MOLINA ET AL. This EQUATION IS SOLVED VIA A PROJECTED Conjugate GrADIENT method, WHERE THE FOLLOWING CLASSIC PRECONDITIONER IS USED Õ ¹ º ¹ º ¹ ºT S˜ = B˜ s S˜ s B˜ s (8) s ˜¹sº ¹sº ¹sº ¹ ¹sºº SeVERAL CHOICES OF S CAN BE MADE TO DEfiNE THE PRECONDITIONER. Sbb , Kbb AND di ag Kbb DEfiNE THE Dirichlet, Lumped AND Superlumped PRECONDITIONERS RESPECTIVELY. The MATRIX A INTRODUCED IN 5 AND 6 CAN BE TAKEN AS THE IDENTITY OR ANY OF THE PRECONDITIONER PREVIOUSLY DEfined. In PRACTICE IT IS NOT NECESSARY TO TAKE THE Dirichlet OR THE Lumped PRECONDITIONER FOR A, THE Super Lumped WORKS WELL AND IS MUCH EASIER TO implement.