<<

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN Int. J. Numer. Meth. Engng. 44, 281—303 (1999)

DOMAIN DECOMPOSITION METHODS FOR PARALLEL SOLUTION OF SHAPE SENSITIVITY ANALYSIS PROBLEMS

MANOLIS PAPADRAKAKIS* AND YIANNIS TSOMPANAKIS Institute of Structural Analysis and Seismic Research, National Technical University of Athens, Athens 15773, Greece

ABSTRACT This paper presents the implementation of advanced domain decomposition techniques for parallel solution of large-scale shape sensitivity analysis problems. The methods presented in this study are based on the FETI method proposed by Farhat and Roux which is a dual domain decomposition implementation. Two variants of the basic FETI method have been implemented in this study: (i) FETI-1 where the rigid-body modes of the floating subdomains are computed explicitly. (ii) FETI-2 where the local problem at each subdomain is solved by the PCG method and the rigid-body modes are computed explicitly. A two-level iterative method is proposed particularly tailored to solve re-analysis type of problems, where the dual domain decomposition method is incorporated in the preconditioning step of a subdomain global PCG implementation. The superiority of this two-level iterative solver is demonstrated with a number of numerical tests in serial as well as in parallel computing environments. Copyright ( 1999 John Wiley & Sons, Ltd.

KEY WORDS: shape sensitivity analysis; domain decomposition methods; preconditioned conjugate gradient; multiple right-hand sides; reanalysis problems

INTRODUCTION Shape optimization aims to improve the shape of a structure, defined with a number of parameters called design variables, by minimizing an objective function subject to certain constraint functions. The shape optimization algorithm proceeds with the following steps: (i) a finite element mesh is generated, (ii) displacements, stresses, frequencies, etc. are evaluated depending on the type of optimization problem, (iii) the gradients, or the sensitivities of the functions are computed by perturbing each design variable by a small amount, (iv) the optimiza- tion problem is solved and the new shape of the structure is defined. These steps are repeated until convergence has occurred. The most time-consuming part of a gradient-based optimization process is devoted to the sensitivity analysis phase. For this reason several techniques have been developed for the efficient calculation of the sensitivities in an optimization problem. The semi-analytical and the finite-difference approaches are the two most widely used types of sensitivity analysis techniques. From the algorithmic point of view the semi-analytical technique results in a typical linear solution problem with multiple right-hand sides in which the stiffness

* Correspondence to: Manolis Papadrakakis, Department of Civil Engineering, National Technical University, Zografou Campus, GR 157 73 Athens, Greece. E-mail: [email protected]

Contract/grant sponsor: European Union; Contract/grant number: HC&M/9203390 CCC 0029—5981/99/020281—23$17.50 Received 24 June 1996 Copyright ( 1999 John Wiley & Sons, Ltd. Revised 20 February 1998 282 M. PAPADRAKAKIS AND Y. TSOMPANAKIS matrix remains the same, while the finite-difference technique results in a typical re-analysis problem in which the stiffness matrix is modified due to the perturbations of the design variables. Shape optimization problems are usually computationally intensive tasks, where 60—90 per cent of the computations are spent for the solution of equilibrium equations required for the finite element analysis and sensitivity analysis. Although it is widely recognized that hybrid solution methods, based on a combination of direct and iterative solvers for solving linear finite element equations, outperform their direct counterparts, in sequential as well as parallel comput- ing environments, little effort has been devoted until now to their implementation in the field of structural optimization. In a recent paper by Papadrakakis et al., the performance of various iterative solution methods, based on PCG and Lanczos algorithms, in sequential computing environment was demonstrated and compared with the conventional direct skyline solver in a number of topology and shape optimization problems. In the present study two variants of the basic FETI method of Farhat and Roux are implemented for solving sensitivity analysis problems using the semi-analytical approach, while an innovative two-level parallel solution method is proposed for solving sensitivity analysis problems using the global finite-difference approach. In the two variants of the basic FETI method the rigid-body modes of the floating subdomains are computed explicitly, while in the second variant a PCG iterative solver with a strong preconditioner is also used for the solution of the local problem in each subdomain. These two variants have a beneficial effect on the robustness of basic FETI method for ill-conditioned problems, while the second variant operates under reduced storage requirements. In the present study a two-level iterative method is proposed specially tailored for solving reanalysis type of problems in general, a special case of which is the problem arising when the global finite- difference sensitivity analysis approach is used. For this type of problems the two variants of the basic FETI method are incorporated in the preconditioning step of a global subdomain imple- mentation of the PCG method. This two-level subdomain implementation is applicable in serial and parallel computing environments resulting in a drastic improvement of computing time compared to the conventional one-level methods.

SHAPE SENSITIVITY ANALYSIS Sensitivity analysis is the most important and time-consuming part of gradient-based structural optimization. Several techniques have been developed which can be mainly distinguished by their numerical efficiency and their implementation aspects. The primary objective of sensitivity analysis is to compute the of the displacement field with respect to perturbations of the primary design variables. The methods for sensitivity analysis can be divided into discrete and variational methods. In the variational approach the sensitivity coefficients can be determined by applying basic variational theorems. In this case, the sensitivities are given as boundary and surface integrals which are solved after the structure has been discretized, whereas in the discrete approach they are evaluated using the finite element equations. The implementation for dis- crete methods is simpler than the one for variational techniques. A further classification of the discrete methods is the following: (i) Global finite-difference method where the derivatives needed for the solution of the optimization problem are computed numerically. (ii) Semi-analytical method where the calculation of the sensitivities is performed via analytical and numerical expressions. (iii) Analytical method where the derivatives of the objective and constraint functions are obtained analytically.

Copyright ( 1999 John Wiley & Sons, Ltd. Int. J. Numer. Meth. Engng. 44, 281—303 (1999) DOMAIN DECOMPOSITION METHODS FOR PARALLEL SOLUTION 283

The decision on which method to implement depends strongly on the type of problem, the organization of the computer program and the access to the source code. The implementation of analytical and semi-analytical methods is more complex and requires access to the source code, whereas when a finite-difference method is applied the formulation is much simpler. Depending on the type of problem and the approach used, the sensitivity analysis together with the finite element analysis can take 60—90 per cent of the total computational effort required to solve the whole optimization problem. An efficient and reliable sensitivity analysis module that can take full advantage of the innovative computer architectures with parallel processing capabilities could therefore result in a considerable reduction of the overall computational effort of the optimization procedure. In the present study the emphasis is given on the application of efficient domain decomposition methods in parallel computing environments for solving the systems of the algebraic equations encountered in the two most commonly used types of sensitivity analysis, namely the global finite difference and the semi-analytical approaches.

¹he Global Finite-Difference (GFD) approach The GFD approach provides a simple way of computing the sensitivity coefficients. This method requires the solution of the linear system of equations Ku"f, where u is the displacement vector, for the original design variables s, and for each perturbed design variable N" #* * sI sI sI, where sI is the magnitude of the perturbation, usually taken in the range of 10\—10\ of the value of the design variable. The design sensitivities for the displacements * * u/ sI are computed using a forward difference scheme: *u u(s #*s )!u(s ) *u *s + " I I I / I * * (1) sI sI #* where u(sI sI) is evaluated by solving the following reanalysis type of problem: #* #* " #* K(sI sI)u(sI sI) f (sI sI) (2)

The Semi-Analytical (SA) approach The SA approach is based on the chain rule differentiation of the finite element equations Ku"f :

*u *K * f # " K * * u * (3) sI sI sI which when rearranged results in

*u K "f * * I (4) sI where

*f *K f *" ! u I * * (5) sI sI

Copyright ( 1999 John Wiley & Sons, Ltd. Int. J. Numer. Meth. Engng. 44, 281—303 (1999) 284 M. PAPADRAKAKIS AND Y. TSOMPANAKIS

* * * * fI* represents a pseudo-load vector. The derivatives of K/ sI and f / sI are computed for each #* #* design variable by recalculating the new values of K(sI sI) and f (sI sI) for a small * perturbation sI of the design variable sI, while the stiffness matrix remains unchanged through- out the whole sensitivity analysis.

The Conventional Semi-Analytical (CSA) approach. In the CSA approach, the values of the derivatives in equation (3) are calculated by applying the forward-difference approximation

*K K(s #*s )!K(s ) *K *s + " I I I / I * * (6) sI sI * f f (s #*s )!f (s ) *f *s + " I I I / I * * (7) sI sI For maximum efficiency of the semi-analytical approach only those elements which are affected * * by the perturbation of a certain design variable are involved into the calculations of f/ sI and * * K/ sI.

The *Exact+ Semi-Analytical (ESA) approach. The CSA approach may suffer some drawbacks in particular types of shape optimization problems. This is due to the fact that in the numerical differentiation of the element stiffness matrix with respect to shape design variables the compo- nents of the pseudo-load vector associated with the rigid-body rotation do not vanish. The solution suggested by Olhoff et al. alleviates the problem by performing an ‘exact’ numerical differentiation of the elemental stiffness matrix as follows:

*k L *k *a " + H * *a * (8) sI H H sI where n is the number of elemental nodal coordinates affected by the perturbation of the design a variable sI and H is the nodal co-ordinate of the element which can be either an x-or a y-co-ordinate. This means that by perturbing a design variable all nodes of an element on the perturbed boundary are perturbed as well and the summation is carried out only for the *a * perturbed nodal co-ordinates. The term H/ sI is computed using the forward-difference * *a scheme while the k/ H is computed by differentiating the element stiffness matrix expression.

SOLUTION METHODS IN SENSITIVITY ANALYSIS PROBLEMS The implementation of hybrid solution schemes in structural optimization, which are based on a combination of direct and preconditioned iterative solvers, has not yet received the attention it deserves from the research community, even though in finite element linear solution problems and particularly when dealing with large-scale applications their efficiency is well documented. In a recent study by Papadrakakis et al., a class of efficient hybrid solution methods was applied in the context of shape sensitivity analysis and topology optimization problems in sequential computing environment.

Copyright ( 1999 John Wiley & Sons, Ltd. Int. J. Numer. Meth. Engng. 44, 281—303 (1999) DOMAIN DECOMPOSITION METHODS FOR PARALLEL SOLUTION 285

In this work domain decomposition methods are applied for solving the sensitivity analysis part of shape optimization problems, in both sequential and parallel computing environments, after being properly modified to address the special features of the particular type of problems at hand. The most computational intensive part of sensitivity analysis, using either semi-analytical or finite-difference sensitivity analysis approaches, is the solution of finite element equilibrium equations (4) or (2), respectively. In the first case, the coefficient matrix remains constant and only the right-hand-side vector is changing, which is the typical case for solving linear systems with multiple right-hand sides, while in the second case the coefficient matrix is slightly modified therefore a typical reanalysis problem needs to be solved.

Solving sensitivity analysis problems with the semi-analytical approach One of the main shortcomings of iterative solution methods is encountered when a sequence of right-hand sides has to be processed. In such cases direct methods possess a clear advantage over iterative methods since the most computationally intensive part, associated with the factorization of the stiffness matrices, is not repeated and only a backward and forward substitution is required for each subsequent right-hand side. The Lanczos method has been used in the past for treating a sequence of right-hand sides. An efficient implementation of Lanczos method was proposed by Papadrakakis and Smerou which handles all approximations to the solution vectors simulta- neously without the necessity for storing the tridiagonal matrix and the orthonormal basis. This approach, however, cannot be implemented in problems with multiple right-hand sides that are not known at the beginning of the iterative procedure. The most efficient method for this type of problems is the dual domain decomposition method FETI with the projection—re-orthogonaliz- ation scheme for handling problems with multiple or repeated right-hand sides. Subsequently, the basic FETI method, its two variants and the re-orthogonalization procedure will be briefly presented.

The basic FETI method. The FETI method proposed by Farhat and Roux is considered to be very promising method for solving large-scale problems in both shared and distributed memory computer architectures due to its very satisfactory numerical and parallel scalability features. This method operates on totally disconnected subdomains, while the governing equilibrium equations are derived by invoking stationarity of the energy functional subject to displacement constraints which enforce the compatibility conditions on the subdomain interface. The aug- mented equations are solved for the Lagrange multipliers after eliminating the unknown displace- ments. The resulting interface problem is in general indefinite due to the presence of floating subdomains which do not have enough prescribed displacements to eliminate the local rigid-body modes. The application of the FETI method results in the following interface problem:

KC BC2 uC f C " (9) CBC 0 DCj D C 0 D where KCuC"f C are the unassembled equations for all subdomains and BC"[B() B() )))B(Q)] are signed Boolean matrices which localize the subdomain displacements on the interface and ‘s’ is the total number of subdomains. The vector of Lagrange multipliers j represents the

Copyright ( 1999 John Wiley & Sons, Ltd. Int. J. Numer. Meth. Engng. 44, 281—303 (1999) 286 M. PAPADRAKAKIS AND Y. TSOMPANAKIS interaction forces between the subdomains along their common boundary that impose the continuity of the structure. The system of equation (9) is indefinite but has a unique solution provided that the global problem is adequately restrained. Care has to be taken for the handling of the floating subdo- mains which are characterized by a singular stiffness matrix K(H). The system of equation (9) may be explicitly written for fixed subdomains as u(H)"K(H)\( f (H)!B(H)2j) (10) and for floating subdomains as u(H)"K(H)>( f (H)!B(H)2j)#R(H)c(H) (11) where K(H)> is a generalized inverse of K(H) for floating subdomains, R(H) corresponds to the rigid-body modes of the floating subdomain j, and c(H) specifies a linear combination of these. If the stiffness matrix K(H) of a floating subdomain is partitioned as K(H) K(H) (H)"   K C (H)2 (H)D (12) K K (H) (H)> (H) where K has full rank, then K and R are given by \ \ K(H) 0 !K(H) K(H) K(H)>"  and R(H)"   (13) C 00D C I D The additional equations required for determining c(H) are provided by the zero energy condition of the rigid-body modes R(H)2K(H)u(H)"0 (14) The combination of the compatibility conditions of equation (9) for the displacements u(H) on the interface d.o.f. of the subdomains with equations (10), (11) and (14) gives the following interface problem: F !G j f ' ' " H (15) C ! 2 DCc D C D G' 0 fA where

Q > 2 Q > " + (H) (H) (H) c" c() )))c(QD) 2 " + (H) (H) (H) F' B K B , [ ] , fH B K f H H 2 2 " () ) () ))) (QD) ) (QD) " () ) () ))) (QD) ) (QD) 2 G' [B R B R ], fA [ f R f R ] and sD is the total number of floating subdomains. The solution of this indefinite problem can be performed by a Preconditioned Conjugate Projected Gradient (PCPG) algorithm. The precon- ditioning matrix adopted in this study is the lumped-type preconditioner which does not require any additional storage, involves only matrix vector products on the interface level and often outperforms the optimal but expensive Dirichlet preconditioner which requires additional stor- age equivalent to the storage of the factorized stiffness matrices of each subdomain. 

Copyright ( 1999 John Wiley & Sons, Ltd. Int. J. Numer. Meth. Engng. 44, 281—303 (1999) DOMAIN DECOMPOSITION METHODS FOR PARALLEL SOLUTION 287

Two variants of the basic FETI method. In the basic FETI method the solution of the local problem of equations (10) or (11) is required associated with both internal and interface d.o.f. of each subdomain. The solution of this problem by a direct Cholesky solver is most advantageous for two reasons: (i) A solution of equations (10) or (11) has to be performed at each PCPG (H)>+ (H), iteration for calculating, via fast forward and backward substitutions, the product K d K " + (H), ( j 1,...,s),where d K is the direction vector in mth iteration. (ii) The rigid-body modes R(H) of the floating subdomains are computed as a by-product of the factorization procedure according to equation (13). An alternative way of treating the solution of the local problems is to use a PCG solver where the preconditioning matrix is computed by an incomplete, or even a complete factorization of the stiffness matrix but stored in single precision arithmetic.  In this approach the mixed precision PCG implementation proposed in Reference 10 with a strong preconditioner is adopted in which all computations are performed in single precision arithmetic except for the matrix—vector multiplication, occurring during the recursive evaluation of the residual vector, which is per- formed in double precision arithmetic. The stiffness matrix is stored in double precision arithmetic but in compact form which demands only a small fraction of the memory needed by the skyline storage scheme especially for large-scale 3-D problems. The storage requirements of this mixed precision PCG are at most half the storage of the direct Cholesky (in the case of a complete factorized preconditioner since it is stored in single precision arithmetic) plus the memory for the compact stored stiffness matrix and the four auxiliary vectors of the PCG method, thus resulting in a great reduction of the memory demands. This implementation proved to be a robust and reliable solution procedure even for handling large and ill-conditioned problems, while it is computer storage-effective. It was also demonstrated to be more cost-effective than double precision arithmetic calculations for the same storage demands. By adopting this alternative treatment for the solution of the local problem at each subdomain the rigid-body modes cannot be obtained as a by-product of the factorization procedure as previously described for the basic FETI method. Bitzarakis et al.  have proposed an analyti- cal handling of the rigid-body modes for all types of finite elements which permits the incorpora- tion of the PCG algorithm with a strong preconditioner for the solution of the local problem in the basic FETI method. Under this formulation the floating subdomains are fixed with con- straints equal to the number of local singularities which is less or equal to 3 for 2-D problems and is less or equal to 6 for 3-D ones. For a 2-D problem discretized with plane /strain finite elements, the rigid-body modes associated with 2 translations and 1 rotation produce the following displacements at node i of subdomain j: (H)" 2 RG [R R R] (16) with ! 1 0 yG " " " R C0D, R C1D, R C xG D 0 0 0 (H) where xG, yG are the co-ordinates of node i. The rigid body modes matrix R required for the computation of G' is formed by (H)" (H) (H) (H) 2 R [R R ... RL ] (17)

Copyright ( 1999 John Wiley & Sons, Ltd. Int. J. Numer. Meth. Engng. 44, 281—303 (1999) 288 M. PAPADRAKAKIS AND Y. TSOMPANAKIS

(H) where n is the total number of nodes of the subdomain. Similar expressions for RG may be defined for plate, shell and 3-D elasticity problems. Whenever the boundary conditions of the structure impose a restriction in some of the translational/rotational movements of a floating subdomain, then only the remaining ‘active’ rigid-body modes of this subdomain need to be calculated and used in the computations. The detection of the correct number of rigid-body modes, which corresponds to the size of the null space of the stiffness matrix of floating subdomains, is a very important issue for the FETI method. The analytical handling of the rigid-body modes requires a slightly complicated algorithm in the case of partially supported subdomains. In the rather extreme case of subdomains that contain non-rigid-body zero-energy modes the presented procedure is not directly applicable and needs further refinement. The benefits of using the analytical calculation of the rigid-body modes are twofold. The first is related to the accuracy of the computations involved in the evaluation of the rigid-body modes and their linear combinations c, while the second is associated with the implementation of a reduced storage solver for the solution of the local problem. During the factorization procedure of the basic FETI method the restraints are imposed on the last degrees of freedom of the floating subdomains and as a result the computation of the rigid-body modes is infected by round-off errors which become more pronounced for ill-conditioned problems. Thus, the explicit calcu- lation of the rigid-body modes results to a more stabilized procedure which becomes more evident in large-scale 3-D and ill-conditioned problems.

¹he re-orthogonalization procedure for treating repeated right-hand sides. A re-orthogonaliz- ation procedure has been proposed recently by Farhat et al. for extending the PCG method to problems with multiple and/or repeated right-hand sides based on the K-conjugate property of " 2 " O the search directions (dK dKKdG 0 for m i). The implementation of the re-orthogonalization technique is impractical when applied to the full problem Ku(H)"f (H) due to excessive storage requirements. However, this methodology has been efficiently combined with the FETI method where the size of the interface problem can be order(s) of magnitude less than the size of the global problem. Thus, the cost of re-orthogonalization is negligible compared to the cost of the solution of the local problems associated with the matrix-vector products of F' with the interface direction vectors of the FETI method, while the additional memory requirements are not excessive. The modified search direction of the PCPG algorithm is given by

K d2F d  " ! + G ' K> dK> dK> 2 dG (18) G dG F'dG  " " which enforces explicitly the orthogonality condition dK>F'dG 0, i 1, . . . , m. The initial j(G>) estimate  of the solution vector of the subsequent right-hand side (H>) (H>) 2 [ f H f A ] of equation (15) is given by

j(H>)" #   DIx x (19) 2 " 2 (H>)!  " 2 \ (H>) where DIF'DIx DI ( f H F'x ) and x G'(G' G') fA .

Copyright ( 1999 John Wiley & Sons, Ltd. Int. J. Numer. Meth. Engng. 44, 281—303 (1999) DOMAIN DECOMPOSITION METHODS FOR PARALLEL SOLUTION 289

Solving sensitivity analysis problems with the global finite-difference approach The hybrid solution schemes proposed in Reference 2 for treating reanalysis type of problems, based on the global formulation and solution of the problem of equation (4), proved to be very efficient compared with the standard skyline solver in a sequential computing environment. Their parallel implementation, however, is hindered by the inherent scalability difficulties encountered during the preconditioning step which incorporates forward and backward substitutions of a fully factorized stiffness matrix. In order to alleviate this deficiency the Global Subdomain Implementation (GSI) of a subdomain-by-subdomain PCG algorithm is implemented in this study on the global stiffness matrix. The dominant matrix-vector operations of the stiffness and the preconditioning matrices are performed in parallel on the basis of the same multi-element group partitioning of the entire domain. In order to exploit the parallelizable features of the GSI(PCG) method and to take advantage of the efficiency of a fully factorized preconditioning matrix, the following two-level methodology is proposed based on the combination of the global subdomain implementation and the FETI method. The GSI(PCG) method is employed, using a multi-element group partitioning of the entire finite element domain, in which the solution required during the preconditioning step is performed by the FETI method, or any of its variants, operating on the same mesh partitioning of the GSI(PCG) method. In the proposed methodology the preconditioning step of the GSI(PCG) method " \ zK> CI rK> (20) is performed by FETI. For the solution of this problem two methodologies, namely the GSI(PCG)-FETI and the GSI(NCG)-FETI are proposed. The second approach is based on a Neumann series expansion of the preconditioning step.

The GSI(PCG)-FETI method. In the GSI(PCG)-FETI method the iterations are performed on the global level with the GSI(PCG) method, using an incomplete Cholesky factorization of the stiffness matrix as preconditioner. Thus, the incomplete factorization of the stiffness matrix #* 2" #* ! K K can be written as LDL K K E, where E is an error matrix which does not have to be formed. Matrix E is usually defined by the computed positions of ‘small’ elements in ¸ which do not satisfy a specified magnitude criterion and therefore are discarded. For the typical reanalysis problem #* " (K K)u f (21) matrix E is taken as *K, so that the preconditioning matrix becomes the complete factorized " initial stiffness matrix: CI K. Therefore, the solution of the preconditioning step of the GSI(PCG) algorithm, which has to be performed at each GSI(PCG) iteration, can be effortlessly executed, once K is factorized, by a forward and backward substitution. With the parallel implementation of the two-level GSI(PCG)-FETI method the precondition- ing step can be solved in parallel by the interface dual method for treating the repeated solutions required in equation (20), using the same decomposition of the domain employed by the external GSI(PCG) method. The procedure continues this way for every re-analysis problem, while the direction vectors of FETI are being re-orthogonalized in order to further decrease the number of PCPG iterations of the interface problem within the preconditioning step. The solution of equation (20) is performed nG * nP times via the FETI method, where nG and nP correspond to the number of GSI(PCG) iterations and the number of reanalysis steps, respectively.

Copyright ( 1999 John Wiley & Sons, Ltd. Int. J. Numer. Meth. Engng. 44, 281—303 (1999) 290 M. PAPADRAKAKIS AND Y. TSOMPANAKIS

The GSI(NCG)-FETI method. The combination of Neumann series expansion and PCG method on the global level for the solution of re-analysis problems in shape and topology optimization was investigated in a previous study. In this work the Neumann series expansion is used to improve the quality of the preconditioning step of the two-level method by computing a better approximation to the inverse preconditioning matrix. The preconditioning matrix is now #* defined as the complete stiffness matrix (K K), but the solution for zK> of equation (20) is performed approximately using a truncated Neumann series expansion. Thus, the preconditioned vector zK> of equation (20) is obtained at each iteration by " # !1* \ !1 zK> (I K0 K) K0 rK> (22) where the term in parenthesis can be expressed in a Neumann expansion giving " ! # ! #))) \ zK> (I P P P )K rK> (23) " \* with P K K. The preconditioned residual vector of equation (23) can now be represented by the following series: "  !  #  !  #))) zK> z z z z (24) with  " \ z K rK> (25)  " \ *  " z K ( KzG\), i 1,2,..., (26) The incorporation of the Neumann series expansion in the preconditioned step of the PCG algorithm can be seen from two different perspectives. From the PCG point of view, an improvement of the quality of the preconditioning matrix is achieved by computing a better " #* \ approximation to the solution of u (K K) f during the preconditioning step, than the \ one provided by the preconditioning matrix K . From the point of view of the Neumann series expansion, the inaccuracy entailed by the truncated series is alleviated by the conjugate gradient iterative procedure. In the present study the following parallel implementation of the two-level GSI(NCG)-FETI method for solving equation (21) is used: A GSI(NCG) method is performed as described in the previous section, in which the preconditioning step is performed according to equation (24). Equations (25) and (26) can be solved in a parallel computing environment by the FETI method utilizing the same decomposition of the domain adopted for the external GSI(NCG) method. The procedure continues this way for every reanalysis problem, while the direction vectors of FETI are being re-orthogonalized in order to keep the number of PCPG iterations of the interface problem small. The solution of equations (25) and (26) is a typical multiple right hand-side ; ; problem and it is performed nH nG nP times via the FETI method, where nH, nG, nP correspond to the number of terms in the Neumann series expansion, the number of GSI(NCG) iterations and the number of re-analysis steps, respectively.

NUMERICAL TESTS Before presenting the performance of the methods discussed for the solution of shape optimiza- tion problems it would be appropriate to demonstrate the performance of the two variants of the

Copyright ( 1999 John Wiley & Sons, Ltd. Int. J. Numer. Meth. Engng. 44, 281—303 (1999) DOMAIN DECOMPOSITION METHODS FOR PARALLEL SOLUTION 291

Figure 1. 3-D cantilever beam

basic FETI method for solving 2-D and 3-D ill-conditioned problems. All computational results reported in this section were run on a SG Power Challenge XL shared memory machine with R8000 processors. The three versions of the FETI method which are tested and compared are the following: The basic FETI method where the computation of the rigid-body modes of the floating subdomains is performed as a by-product of the factorization procedure. The FETI-1, where the computation of the rigid-body modes is performed explicitly via equations (16) and the position of the pseudo- boundary conditions is located by an automatic procedure that avoids close positioning of the restrained d.o.f. The FETI-2, where the rigid-body modes are computed explicitly via equations (16) and the local problem is solved via the PCG algorithm where the preconditioner is the completely factorized subdomain stiffness matrix stored in single precision arithmetic. In all test cases the basic FETI method and its variants are applied with re-orthogonalization unless otherwise stated and the lumped-type preconditioner is used for the PCPG algorithm.

Linear solution test examples

3-D cantilever beam problem. The three versions of FETI are applied first to the 3-D cantilever beam of Figure 1 which is discretized with 31 104 solid-cube elements resulting in 105 000 d.o.f. In order to deteriorate the conditioning of the problem the Poisson ratio is taken l"0)499. The characteristic d.o.f. for 6, 12, 18, 24 and 36 subdomains are depicted in Table I. The convergence tolerance for this example was taken as 3;10\. Table I shows the performance of the methods in 6 processors when the structure is divided with one way dissection into 6, 12, 18, 24 and 36 subdomains. The global problem size is fixed and the number of subdomains is increased in an effort to reduce the cost of the local factorizations and solutions and reduce storage requirements. It can be observed that an improvement in the computing time is accomplished by the FETI-2 variant. This improvement becomes more evident when no re-orthogonalization is performed. It was found that using more subdivisions at each subdomain is equally advantageous for the basic FETI method and for the variant FETI-2. It can also be seen that the benefits from the reduced computer storage demands of the FETI-2 method are still significant when more subdomains are allocated at each processor, while the gains in computational efficiency remain almost the same.

Copyright ( 1999 John Wiley & Sons, Ltd. Int. J. Numer. Meth. Engng. 44, 281—303 (1999) 292 M. PAPADRAKAKIS AND Y. TSOMPANAKIS

Table I. 3-D cantilever beam: Characteristic d.o.f. and performance of the methods in six processors

Performance of the methods with re-orthogonalization

Number of Interf Local PCPG Local Total Storage subdomains d.o.f. d.o.f. Method Iter time (s) time (s) time (s) (Mb)

FETI 117 25 2,176 2,282 682 6 3,705 18,525 FETI-1 117 25 2,174 2,280 682 FETI-2 118 26 2,072 2,178 391 FETI 154 94 1,490 1,660 509 12 8,151 8,892 FETI-1 154 94 1,489 1,659 509 FETI-2 151 92 1,379 1,548 312 FETI 178 158 1,059 1,290 393 18 12,597 6,669 FETI-1 178 158 1,057 1,288 393 FETI-2 178 158 982 1,212 263 FETI 216 259 942 1,276 351 24 17,043 5,187 FETI-1 212 254 925 1,254 350 FETI-2 213 255 864 1,196 254 FETI 312 694 991 1,763 361 36 25,935 2,964 FETI-1 298 661 922 1,660 355 FETI-2 299 663 867 1,609 292

Figure 2. (a) 2-D plane strain problem; (b) 8 subdomains with optimal aspect ratio

2-D plane strain problem. The basic FETI method and its variants are also applied to the 2-D plane strain problem of Figure 2(a) which is discretized with 20 385 elements resulting in 41 188 d.o.f. as shown in Figure 2(b). In order to increase the conditioning of the problem the Poisson ratio is taken as l"0)4999, while the convergence tolerance for this example is taken as 10\.

Copyright ( 1999 John Wiley & Sons, Ltd. Int. J. Numer. Meth. Engng. 44, 281—303 (1999) DOMAIN DECOMPOSITION METHODS FOR PARALLEL SOLUTION 293

Table II. 2-D plane strain problem: number of iterations of basic FETI and FETI-1 for the optimal decomposition to 8 subdomains

Tolerance 10E!1 10E!2 10E!3 10E!4 10E!5 10E!6

Method With re-orthogonalization FETI 206 245 278 *** FETI-1 206 245 277 297 325 339 No re-orthogonalization FETI 1,165 1,648 2,324 *** FETI-1 824 1,146 1,512 1,717 2,131 2,356

* No convergence

Figure 3. 2-D plane strain cantilever beam with different support conditions

This problem is subdivided using TOPDOMDEC to 8 subdomains with optimal aspect ratio resulting in 1254 interface d.o.f., as shown in Figure 2(b). The performance of the methods is presented in Table II. It can be observed that the analytical computation of the rigid-body modes in FETI-1 is beneficial even for applications with optimal subdomain aspect ratio, since the basic FETI method cannot manage to achieve half the required tolerance even with re-orthogonaliz- ation. Without re-orthogonalization the basic FETI presents a much slower convergence rate than FETI-1 variant. The improved convergence behaviour of FETI-1 variant depicted in Table II is related to the accuracy with which the rigid-body modes and the linear combination of these are computed. In the case of the basic FETI method the computation of the rigid-body modes is infected by the round-off errors which are developed during the factorization of the stiffness matrix of the floating subdomains. This infection becomes more pronounced for ill-conditioned problems and the improvement with the FETI-1 variant becomes much more evident when no re-orthogonaliz- ation is performed.

2-D plane strain cantilever beam problem. The basic FETI method and its variant are also applied to the 2-D cantilever beam of Figure 3 which is discretized with 5000 elements with 10 200 d.o.f. Two test cases were considered for this example with two types of supports, as shown in Figures 3(a) and 3(b), in order to investigate their influence on the convergence behaviour of the methods. For the first test case the beam is optimally subdivided into 8 subdomains with 514 interface d.o.f. In Figure 3(b) the two supports are very close together in order to demonstrate the

Copyright ( 1999 John Wiley & Sons, Ltd. Int. J. Numer. Meth. Engng. 44, 281—303 (1999) 294 M. PAPADRAKAKIS AND Y. TSOMPANAKIS

Table III. 2-D cantilever beam—Test case 1: number of iterations of basic FETI and FETI-1

Poisson ratio 0)30)49 0)4999

Method distant supports FETI 51 116 * FETI-1 51 116 231 close supports FETI 57 ** FETI-1 57 118 240

* No convergence

Table IV. 2-D cantilever beam—Test case 2: number of iterations of basic FETI and FETI-1

Subdomains 2 4 8

Method l"0)4999 FETI **241 FETI-1 60 116 240 l"0)499999 FETI **316 FETI-1 65 172 314

*No convergence effect of large rigid-body movements associated with small strains on the convergence behaviour of the FETI method. As can be seen in Table III the location of the supports affects the convergence behaviour of the methods. Furthermore, the basic FETI method is much more sensitive to the position of the restraints since the influence of the infected rigid-body modes by the round-off errors appears to be more decisive for badly restrained structures. Table IV depicts the performance of the methods for the cantilever beam of Figure 4 for different number of subdivisions. The results confirm the superiority of FETI-1 over the basic FETI method which is affected by the accuracy with which the rigid-body modes are computed during the factorization of the subdomain stiffness matrices. This accuracy is reduced as the size of the subdomains become larger.

Shape optimization test examples Two benchmark shape optimization test examples are used to demonstrate the efficiency of the proposed methods using the shape optimization code ADOPT. The first example is a long, slender domain which has a rather dense global stiffness matrix with narrow bandwidth, whereas the global stiffness matrix of the second example has a sparse pattern with a relatively large

Copyright ( 1999 John Wiley & Sons, Ltd. Int. J. Numer. Meth. Engng. 44, 281—303 (1999) DOMAIN DECOMPOSITION METHODS FOR PARALLEL SOLUTION 295

Figure 4. 2-D plane strain cantilever beam with different number of subdivisions bandwidth. The performance of the proposed solution methods is investigated and compared first in serial computing mode with the conventional direct skyline and PCG, NCG and Lanczos solvers proposed in Reference 2. NCG is a PCG algorithm in which the preconditioning step is performed via a Neumann series expansion of the stiffness matrix. Furthermore, the parallel performance of basic FETI method and its variants is investigated in both types of sensitivity analysis problems, while the two-level PCG method is applied for the GFD sensitivity analysis test cases. The convergence tolerance for all solution methods was taken as 10\. For these test examples plane stress conditions and isotropic material properties were assumed (elastic modulus E"210 000 N/mm and Poisson’s ratio l"0)3). The following abbreviations are used: Direct is the conventional skyline direct solver; PCG(t) and Lanczos(t) are the PCG and Lanczos solvers, respectively, with the preconditioning matrix produced via a complete, or an incomplete Cholesky factorization controlled by the rejection parameter t. A value of t between 0 and 1 corresponds to an incomplete Cholesky precondi- tioner, while t"0 gives the complete factorized matrix. NCG-i is the NCG solver with i terms of the Neumann series expansion. Finally, the two variants of the two-level methodology, combined with the FETI-2 variant, namely the GSI(PCG)-FETI-2 and GSI(NCG)-FETI-2 methods, are compared with the other solvers, both in serial and parallel computing modes, in the case of GFD sensitivity analysis problems.

Connecting rod problem. The problem definition is given in Figure 5. The linearly varying line load between key points 4 and 6 has a maximum value of p"500 N/mm. The objective is to p "  minimize the volume of the structure, subject to an equivalent stress limit of ! 1200 N/mm . The design model, which makes use of symmetry, consists of 12 key points, 4 primary design variables (7, 10, 11, 12) and 6 secondary design variables (7, 8, 9, 10, 11, 12). The stress constraint is imposed as a global constraint for all the Gauss points and as key point constraint for the key points 2, 3, 4, 5, 6 and 12. The movement directions are indicated by the dashed arrows. Key points 8 and 9 are linked to 7 so that the shape of the arc is preserved throughout the optimization procedure. The problem is analysed with a fine mesh having 39 200 d.o.f. resulting in a dense global stiffness matrix with relatively narrow bandwidth. The characteristic d.o.f. for 4 and 8 subdomains, as depicted in Figure 6, are given in Table V. The ESA and the GFD

Copyright ( 1999 John Wiley & Sons, Ltd. Int. J. Numer. Meth. Engng. 44, 281—303 (1999) 296 M. PAPADRAKAKIS AND Y. TSOMPANAKIS

Figure 5. Connecting rod: (a) initial shape; (b) final shape

Figure 6. Connecting rod: subdivision in 4 and 8 subdomains sensitivity analysis methods are used to compute the sensitivities with perturbation value *s"10\. Table VI demonstrates the performance of the methods operated in a sequential mode for the case of ESA sensitivity analysis. In the basic FETI method and its variants the operations are

Copyright ( 1999 John Wiley & Sons, Ltd. Int. J. Numer. Meth. Engng. 44, 281—303 (1999) DOMAIN DECOMPOSITION METHODS FOR PARALLEL SOLUTION 297

Table V. Connecting rod: characteristic d.o.f. for 4 and 8 subdomains

Subdomains 4 8

Total d.o.f. 39,200 39,200 Internal d.o.f.* 10,038 5,118 Interface d.o.f. 474 1,098

* Of the larger subdomain

Table VI. Connecting rod: performance of the methods in sequential mode with ESA sensitivity analysis

Method (4 subdomains- Time Storage 1 processor) (s) (Mbytes)

Direct skyline 186 47 NCG-1 220 35 Lanczos (1E!9) 209 16 PCG (0) 205 34 PCG (1E!9) 224 15 FETI 257 14 FETI-1 234 14 FETI-2 198 8

Table VII. Connecting rod: performance of the methods in parallel mode with ESA sensitivity analysis

Time Storage Right-handsides 12345 (s) (Mbytes)

Method Iterations (4 processors) FETI 26 12 10 10 8 86 14 FETI-2 26 12 10 10 8 69 8

Method Iterations (8 processors) FETI 36 16 15 13 11 59 9 FETI-2 36 16 15 13 11 48 6

carried out in 4 subdomains. Following the results depicted in Table I it is expected that the performance of the basic FETI method and its variants will be further improved for an increased number of subdomains. Table VII demonstrates the performance of the basic FETI method and FETI-2 variant, for the case of ESA sensitivity analysis, operated on parallel computing mode in

Copyright ( 1999 John Wiley & Sons, Ltd. Int. J. Numer. Meth. Engng. 44, 281—303 (1999) 298 M. PAPADRAKAKIS AND Y. TSOMPANAKIS

Table VIII. Connecting rod: performance of the methods in serial mode with GFD sensitivity analysis

Method (4 subdomains- Time Storage 1 processor) (s) (Mbytes)

Direct skyline 844 47 NCG-1 335 36 PCG (0) 341 34 PCG (1E!9) 374 15 FETI 1,085 14 FETI-2 823 8 GSI(PCG)-FETI-2 368 9 GSI(NCG)-FETI-2 360 10

Table IX. Connecting rod: performance of the methods in parallel mode with GFD sensitivity analysis

Right-hand time Storage sides 1 2345(s)(Mbytes)

Method (4 processors) Iterations FETI 26 26 26 26 26 349 14 FETI-2 26 26 26 26 26 264 8 GSI(PCG)-FETI-2 26 14 12 10 98876 1249 GSI(NCG)-FETI-2 26 14 12 10 98876 12210 Method (8 processors) Iterations FETI 36 36 36 36 36 247 9 FETI-2 36 36 36 36 36 185 6 GSI(PCG)-FETI-2 36 17 15 12 11 10 8 8 7 88 7 GSI(NCG)-FETI-2 36 17 14 12 11 10 9 8 7 87 8

4 and 8 processors using 4 and 8 subdomains, respectively. Tables VIII and IX depict the performance of the methods, for the case of GFD sensitivity analysis, operated on sequential and parallel computing modes, respectively. In Tables VII and IX the iteration history is also depicted for five right-hand sides which correspond to the initial finite element solution and the sensitivity analysis for the four design variables of the problem. In the case of the two level methods the number of iterations corresponds to the number of FETI iterations for the two global iterations required for convergence.

Square plate problem with a central cut-out. The problem definition of this example is given in Figure 7 where, due to symmetry, only a quarter of the plate is modelled. The plate is under biaxial tension with one side loaded with a distributed loading p"0)1538 N/mm and the other side loaded only with half of this value, as shown in Figure 7. The objective is to minimize the p " )  volume of the structure subject to an equivalent stress limit of ! 7 0 N/mm . The design model consists of 8 key points and 5 primary design variables (2, 3, 4, 5, 6) which can move along

Copyright ( 1999 John Wiley & Sons, Ltd. Int. J. Numer. Meth. Engng. 44, 281—303 (1999) DOMAIN DECOMPOSITION METHODS FOR PARALLEL SOLUTION 299

Figure 7. Square plate: (a) initial shape; (b) final shape

Figure 8. Square plate: subdivision in 4 and 8 subdomains radial lines. The movement directions are indicated by the dashed arrows. The stress constraints are imposed as a global constraint for all the Gauss points and as key point constraints for the key points 2, 3, 4, 5, 6 and 8. The problem is analysed with a fine mesh with 38 800 d.o.f. resulting in a sparse global stiffness matrix with relatively large bandwidth. The characteristic d.o.f. for 4 and 8 subdomains, as depicted in Figure 8, are given in Table X. The ESA and the GFD methods are used to compute the sensitivities with *s"10\. Table XI demonstrates the performance of the methods operated in a sequential mode for the case of ESA sensitivity analysis. In the basic FETI method and its variants the operations are carried out in 4 subdomains. Table XII demonstrates the performance of the basic FETI method

Copyright ( 1999 John Wiley & Sons, Ltd. Int. J. Numer. Meth. Engng. 44, 281—303 (1999) 300 M. PAPADRAKAKIS AND Y. TSOMPANAKIS

Table X. Square plate: characteristic d.o.f. for 4 and 8 subdomains

Subdomains 4 8

Total d.o.f. 38,800 38,800 Internal d.o.f.* 9,738 5,122 Interface d.o.f. 998 2,290

* Of the larger subdomain

Table XI. Square plate: performance of the methods in sequential mode with ESA sensitivity analysis

Method (4 subdomains- Time Storage 1 processor) (s) (Mbytes)

Direct skyline 502 95 Lanczos (0) 524 63 Lanczos (1E!9) 417 29 PCG (0) 514 61 PCG (1E!9) 425 27 FETI 486 43 FETI-1 466 43 FETI-2 414 26

Table XII. Square plate: performance of the methods in parallel mode with ESA sensitivity analysis

Right-hand Time Storage sides 123456 (s) (Mbytes)

method (4 processors) Iterations FETI(no reorth) 67 60 65 51 53 53 420 41 FETI 33 16 13 10 9 8 150 43 FETI-2(no reorth) 67 60 65 51 53 53 306 24 FETI-2 33 16 13 10 9 8 120 26 method (8 processors) Iterations FETI(no reorth) 271 266 253 219 199 204 398 20 FETI 64 24 18 14 11 11 92 23 FETI-2(no reorth) 269 267 253 220 199 205 290 13 FETI-2 64 24 18 14 11 11 70 16

and the FETI-2 variant, for the case of ESA sensitivity analysis, operated on parallel computing mode in 4 and 8 processors using 4 and 8 subdomains, respectively. The benefits from the use of the re-orthogonalization are also evident both in terms of PCPG iterations and computing time. Tables XIII and XIV depict the performance of the methods, for the case of GFD sensitivity

Copyright ( 1999 John Wiley & Sons, Ltd. Int. J. Numer. Meth. Engng. 44, 281—303 (1999) DOMAIN DECOMPOSITION METHODS FOR PARALLEL SOLUTION 301

Table XIII. Square plate: performance of the methods in serial mode with GFD sensitivity analysis

Method Time Storage (4 subdomains) (s) (Mbytes)

Direct skyline 2,790 95 NCG-1 732 65 PCG (0) 745 61 PCG (1E!9) 714 27 FETI 2,108 43 FETI-2 1,782 26 GSI(PCG)-FETI-2 795 27 GSI(NCG)-FETI-2 779 29

Table XIV. Square plate: performance of the methods in parallel mode with GFD sensitivity analysis

Right-hand Time Storage sides 1 23456(s)(Mbytes)

Method (4 processors) Iterations FETI 33 33 33 33 33 33 667 43 FETI-2 33 33 33 33 33 33 574 26 GSI(PCG)-FETI-2 33 17 13 12 10 10 8 8 7 7 6 256 27 GSI(NCG)-FETI-2 33 17 14 12 11 11 9 7 7 6 6 249 29 Method (8 processors) Iterations FETI 64 64 64 64 64 64 396 23 FETI-2 64 64 64 64 64 64 332 16 GSI(PCG)-FETI-2 64 24 19 17 14 12 11 9 9 8 7 165 17 GSI(NCG)-FETI-2 64 24 18 16 14 13 11 10 9 8 7 163 18

analysis, operated on sequential and parallel computing modes, respectively. In Tables XII and XIV the iteration history is also depicted for six right-hand sides, which correspond to the initial finite element solution and the sensitivity analysis for the five design variables of the problem. In the case of the two level methods the number of iterations corresponds to the number of FETI iterations for the two global iterations required for convergence.

DISCUSSION OF THE RESULTS AND CONCLUSIONS The variant FETI-1, where the rigid-body modes are computed explicitly, increases the robust- ness of the basic FETI method for ill-conditioned problems. Although re-orthogonalization ameliorates this deficiency of the basic FETI method, it is still present in 3-D large-scale and/or 2-D ill-conditioned problems, where the basic FETI method was found incapable to converge to stricter tolerances for a class of ill-conditioned problems. The second variant FETI-2, where the local problem is solved via the PCG algorithm, retains the characteristics of the first variant and

Copyright ( 1999 John Wiley & Sons, Ltd. Int. J. Numer. Meth. Engng. 44, 281—303 (1999) 302 M. PAPADRAKAKIS AND Y. TSOMPANAKIS operates on reduced storage requirements which are particularly important when solving large- scale 3-D problems. The benefits from the reduced computer storage demands of the FETI-2 variant are retained when more subdivisions are allocated at each processor since the beneficial effect of solving local problems with smaller bandwidths is equally exploited by the basic FETI method and by FETI-2 variant. In the case of sensitivity analysis with the ‘exact’ semi-analytical approach, the two FETI variants outperform the basic FETI method, both in sequential and in parallel computing modes, as well as the direct skyline solver, the global PCG and Lanczos methods in sequential computing mode. In the connecting rod problem with a small bandwidth, the direct skyline solver is slightly faster than the FETI-2 variant with re-orthogonalization, but requires 6 times more computer storage. In the square plate problem with a relatively sparse stiffness matrix the FETI-2 method with re-orthogonalization is 1)25 times faster than the direct skyline solver and requires 6 times less computer storage. For this example the global PCG and Lanczos methods with incomplete Cholesky factorization preconditioners outperform the skyline solver, both in terms of computing time and storage. In all cases the re-orthogonalization procedure proved to be extremely beneficial to the performance of the domain decomposition methods. In the case of sensitivity analysis with the global finite-difference approach, the superiority of the hybrid solution methods is more pronounced. The global PCG and Lanczos methods are 2)5 and 4 times faster than the direct solver in the dense and sparse example, respectively, and reduce storage requirements by 65 per cent in both cases. The proposed two-level methods, namely the GSI(PCG)-FETI-2 and GSI(NCG)-FETI-2 (with two terms in the Neumann series expansion), appear to perform equally well, in the sequential mode, compared to the global PCG and Lanczos methods with incomplete Cholesky factorization preconditioners, while they outperform the direct skyline solver and the one-level FETI and its two variants by a factor ranging from 2.3 to 3.5 for both examples. In the parallel computing mode the superiority of the two-level methods is retained over the one-level domain decomposition methods by a factor of more than 2 in both examples considered. The benefits of using the FETI-2 variant instead of the basic FETI method is even more pronounced in this case. It has to be pointed out that the overall optimization time with the global finite-difference sensitivity analysis is drastically reduced using the parallel version of the proposed two-level GSI(PCG)-FETI methods. More specifically, using the direct skyline solver, the optimization time with the ‘exact’ semi-analytical sensitivity analysis is 5 times faster than the corresponding time required by the global finite-difference sensitivity analysis, whereas using the parallel two-level PCG methods this ratio is reduced to 2. It is anticipated that for large-scale three- dimensional problems and for more design variables the superiority of the proposed methods will become even more evident. Therefore by using the two-level methods in a parallel com- puting environment, the performance of GFD sensitivity analysis can become competitive to the SA type of sensitivity analysis particularly in large-scale 3-D shape optimization problems. Bearing in mind the simplicity and ease of the implementation of the GFD compared to that of the SA, the former approach could be a prospective alternative to other sensitivity analysis approaches.

ACKNOWLEDGEMENTS This work has been supported by HC & M/9203390 project of the European Union. The authors wish to thank E. Hinton and J. Sienz for providing the shape optimization code ADOPT. The

Copyright ( 1999 John Wiley & Sons, Ltd. Int. J. Numer. Meth. Engng. 44, 281—303 (1999) DOMAIN DECOMPOSITION METHODS FOR PARALLEL SOLUTION 303 authors are also grateful to D. Harbis for his assistance with the linear solution test examples that are presented in this work.

REFERENCES 1. C. Farhat and F.-X. Roux, ‘A method of finite element tearing and interconnecting and its parallel solution algorithm’, Int. J. Numer Meth. Engng., 32, 1205—1227 (1991). 2. M. Papadrakakis, Y. Tsompanakis, E. Hinton and J. Sienz, ‘Advanced solution methods in topology optimization and shape sensitivity analysis’, J. Engng. Comput., 13(5), 57—90 (1996). 3. M. Papadrakakis, ‘Domain decomposition techniques for computational structural mechanics’, in M. Papadrakakis (ed.), Parallel Solution Methods in , Wiley, New York, 1997, pp. 87—141. 4. J. Sienz, ‘Integrated structural modelling, adaptive analysis and shape optimization’, Ph.D. ¹hesis, Dept. of Civil Eng., Univ. of Wales, Swansea, UK, 1994. 5. K. U. Bletzinger, S. Kimmich and E. Ramm, ‘Efficient modeling in shape optimal design’, Comput. Systems Engng., 2(5/6), 483—495 (1991). 6. N. Olhoff, J. Rasmussen and E. Lund, ‘Method of exact numerical differentiation for error estimation in finite element based semi-analytical shape sensitivity analyses’, Special Report No. 10, Institute of Mechanical Engineering, Aalborg University, Aalborg, DK, 1992. 7. M. Papadrakakis and S. Smerou, ‘A new implementation of the Lanczos method in linear problems’, Int. J. Numer. Meth. Engng., 29, 141—159 (1990). 8. C. Farhat and F.-X. Roux, ‘Implicit parallel processing in structural mechanics’, Comput. Mech. Adv., 2, 1—124 (1994). 9. M. Papadrakakis and S. Bitzarakis, ‘A dual substructure method for solving large-scale problems on parallel computers’, Report 95-1, Institute of Structural Analysis and Seismic Research, NTUA, Athens, Greece, 1995. 10. N. Bitoulas and M. Papadrakakis, ‘An optimised computer implementation of the incomplete Cholesky factoriz- ation’, Comput. Systems Engng., 5(3), 265—274 (1994). 11. S. Bitzarakis, M. Papadrakakis and A. Kotsopoulos, ‘Parallel solution techniques in computational structural mechanics’, Comput. Meth. Appl. Mech. Eng., 148, 75—104 (1997). 12. C. Farhat, L. Crivelli and F.-X. Roux, ‘Extending substructure based iterative solvers to multiple load and repeated analyses’, Comput. Meth. Appl. Mech. Engng., 195—209 (1994). 13. M. Sharp and C. Farhat, ‘TOPDOMDEC—A totally object oriented program for visualisation, domain decomposi- tion and parallel processing’, ºser’s Manual, PGSoft and University of Colorado, Boulder, USA, 1994. 14. E. Hinton and J. Sienz, ‘Studies with a robust and reliable structural shape optimization tool’, in B. H. V. Topping (ed.), Developments in Computational ¹echniques for Structural Engineering, CIVIL-COMP Press, Edinburgh, 1995, pp. 343—358.

Copyright ( 1999 John Wiley & Sons, Ltd. Int. J. Numer. Meth. Engng. 44, 281—303 (1999)