Korea-Australia Rheology Journal, Vol.25, No.1, pp.55-64 (2013) www.springer.com/13367 DOI: 10.1007/s13367-013-0006-9

An efficient iterative scheme for the highly constrained augmented Stokes problem for the numerical simulation of flows in porous media

Wook Ryol Hwang1,*, Oliver G. Harlen2,§ and Mark A. Walkley3

1 School of Mechanical Engineering, Research Center for Aircraft Parts Technology (ReCAPT), Gyeongsang National University, Gajwa-dong 900, Jinju, 660-701, Korea
2 Department of Applied Mathematics, University of Leeds, Leeds, LS2 9JT, United Kingdom
3 School of Computing, University of Leeds, Leeds, LS2 9JT, United Kingdom

(Received November 24, 2012; final revision received January 21, 2013; accepted January 28, 2013)

In this work, we present a new efficient iterative solution technique for the large sparse matrix systems that arise in the mixed finite-element formulation of flow simulations in porous media with complex 3D architectures in a representative volume element. We investigate augmented Stokes flow problems with the periodic boundary condition and the immersed solid body as constraints, which mathematically form a class of highly constrained saddle point problems. By solving the generalized eigenvalue problem based on block reduction of the discrete systems, we investigate the structure of the solution space and its subspaces and propose the exact form of the block preconditioner. An exact Schur complement using the fundamental solution is proposed to implement the block-preconditioning problem with constraints. Additionally, the algebraic multigrid and the diagonally scaled conjugate gradient methods are applied to the preconditioning sub-block systems and a Krylov subspace method (MINRES) is employed as the outer solver. We report the performance of the present solver through example problems in 2D and 3D, in comparison with the approximate Schur complement method. We show that the number of iterations required for convergence is independent of the problem size, which implies that the performance of the present iterative solver is close to O(N).

Keywords: flow in porous media, representative volume element (RVE), iterative solver, block preconditioning, algebraic multigrid method

* Corresponding author: [email protected]
§ Co-corresponding author: [email protected]
© 2013 The Korean Society of Rheology and Springer

1. Introduction

In this work, we consider a fast and efficient iterative solution technique for the numerical simulation of flows in porous media with complex micro-architectures, in order to investigate flow behaviors such as the permeability, the mobility of fluids with shear-dependent viscosity or the flow resistance of viscoelastic fluids. Such flows have various industrial applications: e.g. liquid molding in composite manufacturing, packed-bed reactors in chemical engineering, secondary oil recovery in the petroleum industry and various filters in the automobile industry. Because the structure is repeated, it is necessary to introduce a representative volume element containing a small number of microstructures with periodic boundary conditions for effective numerical simulations. Good examples are the works of one of the authors' group (Wang and Hwang, 2008; Liu and Hwang, 2009; Hwang and Advani, 2010; Liu and Hwang, 2012), in which the authors modeled 2D and 3D structures in bi-periodic or tri-periodic unit cells containing fibers or fiber tows to predict the permeability of complex porous microstructures for application to composite manufacturing.

There are two necessary ingredients in dealing with this class of porous media flows: one is the treatment of the periodic boundary condition and the other is the introduction of solid bodies within the flow. As we will introduce later, the mathematical treatment of these two leads to a highly constrained flow problem, the so-called augmented Stokes problem, particularly with the mixed formulation of the finite-element method. A suitably preconditioned iterative scheme is essential for successful flow simulations of large-scale problems, as the use of a direct solver is impractical or even impossible for large 3D problems. In this work, we aim to develop a new efficient iterative solution technique for the highly constrained large sparse matrix system using specific choices of block preconditioning, tailored to flow simulations of porous media within a unit cell.

To introduce the problem, let us first consider the standard Stokes problem in a domain $\Omega$. Since fluid inertia is often negligible on the scale of interest for flows in porous media, Stokes flow is usually assumed:

$$\nabla\cdot\boldsymbol{\sigma} = 0, \qquad \nabla\cdot\mathbf{u} = 0, \tag{1}$$

subjected to the following boundary conditions:

$$\mathbf{u} = \bar{\mathbf{u}} \ \text{on } \partial\Omega_u \quad\text{and}\quad \mathbf{t} = \bar{\mathbf{t}} \ \text{on } \partial\Omega_t. \tag{2}$$

The stress is $\boldsymbol{\sigma} = -p\mathbf{I} + 2\eta\mathbf{D}$ with the pressure $p$, the identity tensor $\mathbf{I}$, the viscosity $\eta$ and the rate-of-deformation tensor $\mathbf{D} = (1/2)\left(\nabla\mathbf{u} + (\nabla\mathbf{u})^{T}\right)$. Suppose that the domain boundary $\Gamma\,(=\partial\Omega)$ is composed of the Dirichlet-type boundary $\partial\Omega_u$ and the Neumann-type boundary $\partial\Omega_t$, with $\partial\Omega = \partial\Omega_u \cup \partial\Omega_t$. From the standard Galerkin approximation with the velocity and the pressure as the primitive variables, one obtains the following weak form for this problem: Find $(\mathbf{u}, p)$ such that

$$-\int_\Omega p\,(\nabla\cdot\mathbf{v})\,d\Omega + \int_\Omega 2\eta\,\mathbf{D}[\mathbf{u}]:\mathbf{D}[\mathbf{v}]\,d\Omega = \int_{\partial\Omega_t} \bar{\mathbf{t}}\cdot\mathbf{v}\,d\Gamma, \tag{3}$$

$$\int_\Omega q\,(\nabla\cdot\mathbf{u})\,d\Omega = 0 \tag{4}$$

for all the admissible weighting functions $(\mathbf{v}, q)$. With a suitable combination of discretized spaces for the velocity and the pressure, the discrete finite element matrix system for Eqs. (3) and (4) can be written in block-matrix form as

$$\begin{bmatrix} K & G^{T} \\ G & 0 \end{bmatrix}\begin{pmatrix} \tilde{u} \\ \tilde{p} \end{pmatrix} = \begin{pmatrix} \tilde{f} \\ 0 \end{pmatrix}, \qquad A_{\mathrm{Stokes}} = \begin{bmatrix} K & G^{T} \\ G & 0 \end{bmatrix}. \tag{5}$$

The variable $\tilde{u}$ is a collective unknown for the discrete velocity variables, $\tilde{p}$ for the discrete pressure variables, and $\tilde{f}$ is the work-equivalent nodal force due to the traction boundary condition. (Symbols with a tilde indicate discretized variables throughout this work.)

The linear system shown in Eq. (5) is already highly constrained and can be classified as a saddle point problem, which means that the matrix $A_{\mathrm{Stokes}}$ is symmetric but indefinite: it has both positive and negative eigenvalues, and both positive and negative pivots during elimination (Strang, 2007). (We will consider additional constraints to the discrete Stokes problem in Eq. (5) in this work, and therefore the class of flow problems of interest here may best be called the highly constrained augmented Stokes problem.) This complication originates from the presence of the incompressibility constraint. Although the basic methods for solving large sparse indefinite problems are the minimum residual (MINRES) and the generalized minimum residual (GMRES) methods, these iterative methods do not behave satisfactorily, or do not converge at all, without a suitable choice of preconditioner. An extremely efficient, indeed $O(N)$, iterative scheme for the discrete Stokes problem of Eq. (5) has been proposed by Silvester and Wathen (1994). This approach formulates a block-structured preconditioner and uses appropriate techniques for each block system such that it can be solved with optimal efficiency. They employed the outer MINRES iteration along with inner iterations of the algebraic multigrid and conjugate gradient (CG) methods.

It is worthwhile to briefly introduce their Krylov subspace/multigrid method, as we will extend it in this work to a much more complex, highly constrained system with immersed solid bodies or the periodic boundary condition. For the discrete Stokes problem, Silvester and Wathen (1994) introduced a finite element mass matrix as a block preconditioner for the pressure unknowns and the discrete Laplacian as a preconditioning block for the velocity unknowns. The mass matrix has been shown to be spectrally equivalent to the exact Schur complement $S = G K^{-1} G^{T}$ and it can be solved by a diagonally scaled CG method within a fixed number of iterations (Elman et al., 2005). Also, the discrete Laplacian can be optimally inverted by the multigrid method for the preconditioning problem of the velocity unknowns. They showed that this combination of approximate preconditioners clusters the eigenvalue spectrum independently of the mesh size, and thereby the convergence of the Krylov subspace outer iterative scheme (MINRES) can be guaranteed within a fixed number of iterations. The CPU time as well as the memory usage is observed to scale linearly with the number of degrees of freedom. That is, the ultimate $O(N)$ performance of the solution scheme has been established.

In this work, we aim to develop a new efficient iterative solution technique for the highly constrained large sparse matrix system using specific choices of block preconditioning. Augmented Stokes flow problems with the periodic boundary condition and the immersed solid body as additional constraints are investigated. By solving the generalized eigenvalue problem in block matrix form, we propose an exact form of the block preconditioner to achieve the $O(N)$ performance of the iterative solution technique. An exact Schur complement using the fundamental solution is proposed to implement the block preconditioner with the constraints.

The paper is organized as follows. In Sec. 2, we introduce the mathematical framework for the two highly constrained Stokes problems by presenting the weak form and the structure of the block matrices in their discretized form. In Sec. 3, we investigate the solution space of the discretized weak form by solving the generalized eigenvalue problem with block reduction and then propose an exact form of the block preconditioner that guarantees a fixed number of iterations for convergence. In Sec. 4, we introduce the implementation techniques for this preconditioned iterative method, particularly the exact Schur complement method using the fundamental solution. Finally, the performance of the present solver is presented through example problems in 2D and 3D, in comparison with the approximate Schur complement method. We show that the number of iterations required for convergence is independent of the problem size, which implies that the performance of the present iterative solver is close to $O(N)$.
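The effect of such a block preconditioner can be illustrated on a small algebraic example. The sketch below (Python with NumPy; the sizes and the random matrices are assumptions standing in for finite-element blocks) forms the saddle-point matrix of Eq. (5), preconditions it with diag(K, S) using the exact Schur complement S = G K^{-1} G^T rather than the mass-matrix approximation, and lists the distinct eigenvalues. For this idealized choice the spectrum is known to collapse to the three values 1 and (1 ± √5)/2, so MINRES would terminate in at most three iterations in exact arithmetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_u, n_p = 12, 4                      # illustrative sizes, not taken from the paper

# Random symmetric positive-definite K and full-row-rank G
# (hypothetical stand-ins for the finite-element blocks of Eq. (5)).
B = rng.standard_normal((n_u, n_u))
K = B @ B.T + n_u * np.eye(n_u)
G = rng.standard_normal((n_p, n_u))

# Saddle-point matrix of Eq. (5).
A = np.block([[K, G.T], [G, np.zeros((n_p, n_p))]])

# Ideal block-diagonal preconditioner with the exact Schur complement.
S = G @ np.linalg.solve(K, G.T)       # S = G K^{-1} G^T
P = np.block([[K, np.zeros((n_u, n_p))],
              [np.zeros((n_p, n_u)), S]])

# Eigenvalues of the preconditioned matrix P^{-1} A (real, since P is SPD).
eigs = np.linalg.eigvals(np.linalg.solve(P, A)).real
distinct = np.unique(np.round(eigs, 6))
print(distinct)
```

Replacing S by a spectrally equivalent matrix, as in the mass-matrix approach described above, spreads these three values into three clusters instead of three points.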


2. Highly Constrained Stokes Flow Problems

2.1. The augmented Stokes flow problem with periodic boundary conditions

The first of the two highly constrained Stokes problems of interest in this work is the flow with the periodic boundary condition. Let us first consider a simple 2D problem in a rectangular domain $[0,L]\times[0,H]$ with the periodic boundary condition in the horizontal direction, such that the velocity on the left boundary is the same as that on the right boundary: i.e.,

$$\mathbf{u}(0,y) = \mathbf{u}(L,y), \qquad y\in[0,H]. \tag{6}$$

To combine the periodic boundary condition with the weak form, one usually introduces a Lagrangian multiplier $\boldsymbol{\lambda}$ on the left boundary $\partial\Omega_{\mathrm{left}}$, i.e. $\boldsymbol{\lambda}\in L^{2}(\partial\Omega_{\mathrm{left}})$, and the periodic boundary condition can be expressed as an additional constraint. Then the weak form for the Stokes flow can be rewritten as: Find $(\mathbf{u}, p, \boldsymbol{\lambda})$ such that

$$-\int_\Omega p\,(\nabla\cdot\mathbf{v})\,d\Omega + \int_\Omega 2\eta\,\mathbf{D}[\mathbf{u}]:\mathbf{D}[\mathbf{v}]\,d\Omega + \int_{\partial\Omega_{\mathrm{left}}} \boldsymbol{\lambda}\cdot\left(\mathbf{v}(0,y) - \mathbf{v}(L,y)\right)d\Gamma = 0, \tag{7}$$

$$\int_\Omega q\,(\nabla\cdot\mathbf{u})\,d\Omega = 0, \tag{8}$$

$$\int_{\partial\Omega_{\mathrm{left}}} \boldsymbol{\mu}\cdot\left(\mathbf{u}(0,y) - \mathbf{u}(L,y)\right)d\Gamma = 0 \tag{9}$$

for all the admissible weighting functions $(\mathbf{v}, q, \boldsymbol{\mu})$. Comparing Eq. (3) with Eq. (7), one can identify the Lagrangian multiplier for the periodicity with the traction force. Introducing approximate interpolations for the velocity, the pressure and the Lagrangian multiplier, Eqs. (7)-(9) can be written as the matrix equation:

$$\begin{bmatrix} K & G^{T} & \Gamma^{T} \\ G & 0 & 0 \\ \Gamma & 0 & 0 \end{bmatrix}\begin{pmatrix} \tilde{u} \\ \tilde{p} \\ \tilde{\lambda} \end{pmatrix} = \begin{pmatrix} \tilde{f} \\ 0 \\ 0 \end{pmatrix}, \qquad A = \begin{bmatrix} K & G^{T} & \Gamma^{T} \\ G & 0 & 0 \\ \Gamma & 0 & 0 \end{bmatrix}. \tag{10}$$

The dimensions and entries of the block matrices $K$, $G$ and $\Gamma$ are determined by the choice of the spatial discretization, and we chose a quadrilateral element with the bi-quadratic velocity and the discontinuous pressure interpolations (Q2–P1 Crouzeix-Raviart element). Construction of the block matrices $K$ and $G$ is obvious in the standard Galerkin formalism and therefore we only consider the entries of the block matrix $\Gamma$. Among several possible interpolation schemes for the Lagrangian multiplier $\boldsymbol{\lambda}$, we consider only two: (i) interpolation with the Delta function at every node (nodal collocation) and (ii) a linear continuous interpolation. (The implementation with the weak form of the integrals in Eqs. (7) and (9) on the periodic boundary is called the mortar element method (Laursen, 2002).) To convey the idea easily, we selected a simple 2D model mesh as shown in Fig. 1.

Fig. 1. A 2D model mesh (left) and local node numbering in an element (right).

Nodal Collocation. Using the nodal collocation, i.e. collocation at all nodes, the boundary integral in Eq. (9) can be written as

$$\int_{\partial\Omega_{\mathrm{left}}} \boldsymbol{\mu}\cdot\left(\mathbf{u}(0,y) - \mathbf{u}(L,y)\right)d\Gamma = \sum_{k=1}^{5} \boldsymbol{\mu}_k\cdot\left(\mathbf{u}_k - \mathbf{u}_{k+30}\right) \quad\text{for all } k. \tag{11}$$

Refer to Fig. 1 for the example nodal numbering scheme used in Eq. (11). In this case the block matrix $\Gamma$, which is a $10\times 70$ matrix for this specific problem, can be expressed in symbolic form as follows:

$$\Gamma = \begin{bmatrix} I_{10\times 10} & O_{10\times 50} & -I_{10\times 10} \end{bmatrix}, \tag{12}$$

where the sub-block matrix $I_{10\times 10}$ indicates the identity matrix of size $10\times 10$ and $O_{10\times 50}$ is the null block matrix of the corresponding dimension. Considering the following products of the matrix $\Gamma$,

$$\Gamma\Gamma^{T} = 2I_{10\times 10} \quad\text{and}\quad \Gamma^{T}\Gamma = \begin{bmatrix} I_{10\times 10} & O_{10\times 50} & -I_{10\times 10} \\ O_{50\times 10} & O_{50\times 50} & O_{50\times 10} \\ -I_{10\times 10} & O_{10\times 50} & I_{10\times 10} \end{bmatrix}, \tag{13}$$

we notice that the matrix $\Gamma\Gamma^{T}$ is a full-rank matrix, whereas $\Gamma^{T}\Gamma$ is rank-deficient.

Linear Continuous Interpolation. A standard mortar element discretization is much more involved in this case. Introducing the linear interpolation of the Lagrangian multiplier along the 1D element boundary (see Fig. 2),

$$\boldsymbol{\lambda} = N_\lambda\tilde{\lambda} = \begin{pmatrix} \phi_1 & \phi_2 \end{pmatrix}\begin{pmatrix} \lambda_1 \\ \lambda_2 \end{pmatrix},$$

the integral in Eq. (9) along the element boundary can be written as follows:

$$\int_{\Gamma_{\mathrm{left}}} \boldsymbol{\mu}\cdot\left(\mathbf{u}(0,y) - \mathbf{u}(L,y)\right)d\Gamma = \sum_{e=1}^{2} (\tilde{\mu}_e)^{T}\left[\int_{\Gamma_{e,\mathrm{left}}} N_\lambda^{T}\,[N]\,d\Gamma\right]\left((\tilde{u})_{e,\mathrm{left}} - (\tilde{u})_{e,\mathrm{right}}\right). \tag{14}$$

The symbol $[N]$ is the interpolation function for the velocity unknowns and the subscript 'e' indicates a variable defined within an element.

Fig. 2. An example 1-D element for the linear interpolation of the Lagrangian multipliers.

Using the local numbering in Fig. 2, the block matrix $\Gamma$ in the specific mesh of Fig. 1 can be expressed in symbolic form as

$$\Gamma = \begin{bmatrix} U_{3\times 5} & O_{3\times 5} & O_{3\times 5} & -U_{3\times 5} & O_{3\times 5} \\ O_{3\times 5} & U_{3\times 5} & O_{3\times 5} & O_{3\times 5} & -U_{3\times 5} \end{bmatrix}, \tag{15}$$

where the sub-block matrix $U_{3\times 5}$ is

$$U_{3\times 5} = \begin{bmatrix} 1/3 & 2/3 & 0 & 0 & 0 \\ 0 & 2/3 & 2/3 & 2/3 & 0 \\ 0 & 0 & 0 & 2/3 & 1/3 \end{bmatrix}.$$

Again considering the two products of the block matrix $\Gamma$,

$$\Gamma\Gamma^{T} = \begin{bmatrix} 2U^{2}_{3\times 3} & 0 \\ 0 & 2U^{2}_{3\times 3} \end{bmatrix} \quad\text{and}\quad \Gamma^{T}\Gamma, \tag{16}$$

the matrix $\Gamma\Gamma^{T}$ is found to be a full-rank matrix, whereas $\Gamma^{T}\Gamma$ is rank-deficient.

2.2. The augmented Stokes flow problem with immersed solid bodies

The second highly constrained Stokes problem of interest in this study is the Stokes flow with immersed solid bodies. To model this problem, we use the so-called rigid-ring (or rigid-shell in 3D) description, one of the fictitious domain methods, such that the interior of the solid body is considered as a part of the fluid with the same constitutive equation as the fluid domain, with the zero velocity condition on the solid boundary (Wang and Hwang, 2008; Liu and Hwang, 2009; Liu and Hwang, 2012). As fluid inertia is neglected in the Stokes flow, the zero velocity condition on the solid boundary ensures vanishing velocity inside the body. There are three advantages of the rigid-shell description in the simulation of flow in porous media. First, one does not have to consider the interface conditions between solid and fluid, as the entire problem is essentially a fluid problem. Second, the immersed body is easy to discretize using its boundary information only and, as will be shown later, one needs just points on the solid boundary. Third, as the entire problem is a fluid flow problem, one can use a regular mesh, which facilitates a simple implementation of the mortar element technique for the periodic boundary condition of the representative volume element.

The zero boundary condition on the solid boundary $\partial B$ can be expressed as

$$\mathbf{u} = 0 \quad\text{on } \partial B. \tag{17}$$

As was done with the periodic boundary condition, we define the Lagrangian multiplier $\boldsymbol{\lambda}$ on $\partial B$, and the zero velocity condition on the boundary can be treated as a constraint in the weak form in exactly the same way as before: Find $(\mathbf{u}, p, \boldsymbol{\lambda})$ such that

$$-\int_\Omega p\,(\nabla\cdot\mathbf{v})\,d\Omega + \int_\Omega 2\eta\,\mathbf{D}[\mathbf{u}]:\mathbf{D}[\mathbf{v}]\,d\Omega + \int_{\partial B} \boldsymbol{\lambda}\cdot\mathbf{v}\,d\Gamma = \int_{\Gamma_t} \bar{\mathbf{t}}\cdot\mathbf{v}\,d\Gamma, \tag{18}$$

$$\int_\Omega q\,(\nabla\cdot\mathbf{u})\,d\Omega = 0, \tag{19}$$

$$\int_{\partial B} \boldsymbol{\mu}\cdot\mathbf{u}\,d\Gamma = 0 \tag{20}$$

for all the admissible weighting functions $(\mathbf{v}, q, \boldsymbol{\mu})$. We employ point collocation to implement the integral over the solid boundary in Eqs. (18) and (20). For example, the integral in Eq. (20) can be approximated as

$$\int_{\partial B} \boldsymbol{\mu}\cdot\mathbf{u}\,d\Gamma \approx \sum_{k=1}^{M} \boldsymbol{\mu}_k\cdot\mathbf{u}(\mathbf{x}_k), \tag{21}$$

where $M$, $\mathbf{x}_k$ and $\boldsymbol{\mu}_k$ are the number of collocation points on $\partial B$, the position of the k-th collocation point and the collocated Lagrangian multiplier at $\mathbf{x}_k$, respectively. The resulting matrix equation has exactly the same form as that of the periodic boundary condition in Eq. (10). Though not presented here, the products of the corresponding off-diagonal sub-block matrix $\Gamma$ have the same characteristics: the matrix $\Gamma\Gamma^{T}$ is a full-rank matrix, whereas $\Gamma^{T}\Gamma$ is rank-deficient.

3. Block-Preconditioning Strategy and the Generalized Eigenvalue Problem

For the solution of the matrix equation in Eq. (10), we propose a block preconditioner $P$ similar to (or motivated by) the Schur complement in the discrete Stokes problem: i.e.,

$$P = \mathrm{diag}(K, S, T), \qquad S = G K^{-1} G^{T}, \qquad T = \Gamma K^{-1}\Gamma^{T}. \tag{22}$$
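As a quick numerical check of the properties of Γ used here, the following sketch (Python with NumPy) builds the nodal-collocation matrix of Eq. (12), confirms that ΓΓ^T equals twice the identity while Γ^TΓ is rank-deficient, and forms the block T = Γ K^{-1} Γ^T that enters the preconditioner of Eq. (22). The matrix K is a random symmetric positive-definite stand-in, an assumption made only so that the example runs; it is not a finite-element stiffness matrix.

```python
import numpy as np

# Gamma = [ I  O  -I ] with the block sizes of Eq. (12).
n_b, n_mid = 10, 50
Gamma = np.hstack([np.eye(n_b), np.zeros((n_b, n_mid)), -np.eye(n_b)])
n_u = Gamma.shape[1]                       # 70 velocity unknowns

GGt = Gamma @ Gamma.T                      # 10 x 10, expected to equal 2 I
GtG = Gamma.T @ Gamma                      # 70 x 70, expected to be rank-deficient
rank_GtG = np.linalg.matrix_rank(GtG)
print(np.allclose(GGt, 2.0 * np.eye(n_b)), rank_GtG, n_u)

# Hypothetical SPD stand-in for the velocity block K (assumption).
rng = np.random.default_rng(1)
B = rng.standard_normal((n_u, n_u))
K = B @ B.T + n_u * np.eye(n_u)

# Constraint block of the preconditioner, T = Gamma K^{-1} Gamma^T.
T = Gamma @ np.linalg.solve(K, Gamma.T)
T_sym = 0.5 * (T + T.T)                    # remove round-off asymmetry
min_eig_T = np.linalg.eigvalsh(T_sym).min()
print(np.allclose(T, T.T), min_eig_T > 0.0)
```

Because Γ has full row rank and K is positive-definite, T comes out symmetric and positive-definite, which is the property that makes it usable as a preconditioning block.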


$\Gamma$, denoted by $N(\Gamma)$, satisfies $\Gamma\tilde{u} = 0$ and has the dimension of $(n_u - n_\lambda)$.
(iii) The dimension of the subspace $V_h^{IV}$ is no larger than $n_\lambda$ or $n_p$, which means that $\dim\left(N(G)^{c}\cap N(\Gamma)^{c}\right) \le \min(n_p, n_\lambda)$.
(iv) The number of unknowns related to the constraint (the periodic boundary condition or the zero velocity condition) can safely be considered much smaller than the number of primitive variables $\tilde{u}$ and $\tilde{p}$, since the space on which the constraint is imposed is one order lower in dimension than those of $\tilde{u}$ and $\tilde{p}$: i.e., $n_\lambda \ll n$


We have four distinct eigenvalues $(-1, 0, 1, 2)$, whose multiplicity is $\dim V_h^{IV}$, since the Schur complement matrix $T$ is symmetric and positive-definite.

In summary, the preconditioned matrix $P^{-1}A$ has only six eigenvalues $\left(-1,\ (1-\sqrt{5})/2,\ 0,\ 1,\ (1+\sqrt{5})/2,\ 2\right)$ and therefore, once $K^{-1}$, $S^{-1}$ and $T^{-1}$ are computed exactly by any means, the iterative scheme converges to the exact solution within at most six iterations, independent of the problem size. Further, if the solution methods used to obtain $K^{-1}$, $S^{-1}$ and $T^{-1}$ have $O(N)$ performance, the iterative scheme for the entire problem will show the ultimate $O(N)$ performance.

4. Implementation Techniques

Although the preconditioner $P$ in Eq. (22) is theoretically optimal, one cannot employ it as is, since the Schur complements $S = G K^{-1} G^{T}$ and $T = \Gamma K^{-1}\Gamma^{T}$ involve the inverse of $K$, which is prohibitively expensive to compute. Therefore one needs a further approximation of the preconditioning matrix $P$. In this section, we seek implementation techniques for the preconditioning problem. As the preconditioners $K$ and $S$, which are related to the velocity and the pressure unknowns respectively, are the same as in the Stokes flow problem, we can follow the approach outlined for Stokes systems (Silvester and Wathen, 1994; Elman et al., 2005; Hwang et al., 2011a). For the approximation of the preconditioning block matrix $K$, we employ the discrete Laplace operator $\hat{K}$, which can be inverted by an algebraic multigrid (AMG) V-cycle to provide a fast and efficient solution. Details of the analyses on the choice of the discrete Laplace operator have been presented in the authors' previous work (Hwang et al., 2011a). The block preconditioner $S$, the Schur complement matrix of the Stokes problem, is spectrally equivalent to a mass matrix $M$ in the pressure space (Elman et al., 2005). Therefore we adopt the mass matrix $M$ in the discrete pressure space as the approximation of the Schur complement $S$ in the present study; the mass matrix can always be solved easily within a fixed number of iterations with the diagonally scaled CG, independent of the number of unknowns.

From the above, the approximation of the preconditioner, denoted by $\hat{P}$, can be expressed as

$$\hat{P} = \mathrm{diag}(\hat{K}, M, T). \tag{28}$$

The last remaining problem is the sub-preconditioning problem with $T$, which can be written as

$$Tz = r, \quad\text{with } T = \Gamma K^{-1}\Gamma^{T}. \tag{29}$$

The symbols $z$ and $r$ represent the preconditioned residual and the residual vector $(r = b - Ax)$. The difficulty with Eq. (29) is that the block matrix $\Gamma$ is not a square matrix and its inverse is not obvious. In this work, we propose an exact solution method for Eq. (29) using the fundamental solution and present the performance of our scheme in comparison with the approximation method for the matrix $\Gamma$ proposed by Elman (1996).

4.1. Implementation of block preconditioning with an approximate solution of the Schur complement

We first present the approximation method for the solution of Eq. (29), which was originally developed by Elman (1996) and is called the 'BFBt' preconditioner. It is relatively easy to implement, but the number of iterations required for convergence scales with $O(\sqrt{N})$ and therefore the CPU time scales with $O(N\sqrt{N})$. Here we present a slightly modified implementation scheme to show the procedure clearly; it consists of the three steps below.

Step 1 [Solution of $\Gamma y = r$] This can be done by taking $y = \Gamma^{T}y^{*}$. First solve for $y^{*}$ from

$$\Gamma\Gamma^{T}y^{*} = r, \tag{30}$$

and find $y$ from $y = \Gamma^{T}y^{*}$. As illustrated in Eqs. (13) and (16), the matrix $\Gamma\Gamma^{T}$ is small and easily invertible for both constrained problems. With the nodal collocation in particular, the matrix $\Gamma\Gamma^{T}$ is simply twice the identity matrix.

Step 2 [Solution of $K^{-1}x = y$] The solution $x$ can be obtained by a simple, though huge, matrix-vector multiplication:

$$x = Ky. \tag{31}$$

Step 3 [Solution of $\Gamma^{T}z = x$] This problem can be transformed into a trivial problem by multiplying both sides by $\Gamma$, since $\Gamma\Gamma^{T}$ is easily invertible:

$$\Gamma\Gamma^{T}z = \Gamma x. \tag{32}$$

The whole solution process requires two matrix inversions of the form $\Gamma\Gamma^{T}x = b$ and three matrix-vector multiplications, of which one is huge ($x = Ky$) and the other two are small. Though it looks concise and clear, the final solution of Eq. (32) is not exact but only an approximation. The reason is that $x$ in Eq. (32) does not reside in the range of $\Gamma^{T}$ in general, so that the equation in the third step, $\Gamma^{T}z = x$, does not have an exact solution. Therefore, one may expect only a minor improvement in iterative performance and, as will be seen later, the $O(N)$ performance cannot be achieved.

4.2. Implementation of block preconditioning with the fundamental solutions

In the present work, we propose an exact solution of Eq. (29) using the fundamental solutions. With this method we start by rewriting Eq. (29) as follows: Find $z$ satisfying


$$\Gamma^{T}z = Ky, \tag{33}$$

subjected to the constraint of

$$\Gamma y = r. \tag{34}$$

In this method, we represent the solution $z$ in terms of $y$, which resides in the column space of $K^{-1}\Gamma^{T}$, or $\mathrm{range}(K^{-1}\Gamma^{T})$. Let $\gamma_i$ be the i-th column vector of $\Gamma^{T}$, which is an $n_u\times n_\lambda$ matrix:

$$\Gamma^{T} = \begin{bmatrix} \gamma_1 & \gamma_2 & \cdots & \gamma_{n_\lambda} \end{bmatrix}. \tag{35}$$

We notice that $n_\lambda \ll n_u$ in the constrained problems of this study, since the Lagrangian multiplier is defined on a space whose dimension is one order lower than that of the velocity unknowns. Note also that the $\gamma_i$ vectors are linearly independent of each other, as the matrix $\Gamma^{T}$ is of full rank. That is, $\mathrm{rank}(\Gamma^{T}) = n_\lambda$. Now one can express the term $\Gamma^{T}z$ in Eq. (33) as a linear combination of the $\gamma_i$ vectors with the constant coefficients $z_i$:

$$\Gamma^{T}z = \sum_{i=1}^{n_\lambda} z_i\gamma_i. \tag{36}$$

Let $y_i^{*}$ be the fundamental solution of the problem

$$Ky_i^{*} = \gamma_i, \qquad (i = 1,\ldots,n_\lambda). \tag{37}$$

The name 'fundamental solution' seems proper for two reasons: one is that $y_i^{*}$ is the solution for each column vector of $\Gamma^{T}$, and the other is that the solution of Eq. (33) can be represented as a linear combination of the $y_i^{*}$ such that

$$y = \sum_{i=1}^{n_\lambda} z_i y_i^{*}. \tag{38}$$

Finally one can get the exact solution of Eq. (33) subjected to the constraint of Eq. (34) by solving the linear system below:

$$Tz = r, \quad\text{with } T = \Gamma Y \text{ and } Y = \begin{bmatrix} y_1^{*} & y_2^{*} & \cdots & y_{n_\lambda}^{*} \end{bmatrix}. \tag{39}$$

The matrix $T$ is exactly the Schur complement defined in Eq. (22) and is an $n_\lambda\times n_\lambda$ square matrix whose components are simply inner products of $\gamma_i$ and $y_j^{*}$: i.e.

$$T_{ij} = \gamma_i\cdot y_j^{*}, \qquad (i,j = 1,\ldots,n_\lambda). \tag{40}$$

From Eq. (40) one identifies that the matrix $T$ is an unsymmetric and almost full matrix. We have several remarks on this method.

(i) This scheme is based on the fact that the number of Lagrangian multipliers is much smaller than that of the velocity unknowns, which is valid for the periodic boundary condition and for the immersed solid body problem with the rigid-ring or rigid-shell description.
(ii) The most time-consuming step is Eq. (37), the solution for the complete set of fundamental solutions $y_i^{*}$. However, one can construct the AMG matrix only once and use it repeatedly for all right-hand side vectors $\gamma_i$ with the split AMG method. This yields a significant saving in computation time and, moreover, the AMG method has $O(N)$ performance. (The memory usage and CPU time scale linearly with the number of unknowns.)
(iii) For the problems of interest in this work, the matrix $\Gamma$ does not change even in time-dependent problems or problems with nonlinear material properties (e.g. shear-thinning viscosity), which means that one can construct the Schur complement $T$ once and use it throughout.
(iv) As every fundamental solution $y_i^{*}$ is independent, it is not necessary to build the matrix $Y$, which is a huge matrix, explicitly. In practice, one can introduce a temporary vector as a fundamental solution and use it to build the j-th column of the matrix $T$ by using Eq. (40).
(v) As the Schur complement matrix $T$ is completely full, there is no advantage in using sparse matrix storage and related solution techniques in solving Eq. (39). We use a simple LU decomposition provided in LINPACK. The LU factors can be reused for all repeated solves, which yields a significant reduction in computation time as well. Therefore, this method does not require any additional storage other than the matrix $T$ itself.

5. Numerical Examples

The first test problem is a 3D cubic channel Stokes flow of size $1\times 1\times 1$ with a pressure drop in one direction, where the periodic boundary condition is applied. We tested the two interpolation schemes for the Lagrangian multipliers, the nodal collocation and the linear continuous interpolation. Three different finite element meshes have been employed from coarse to fine: $5\times 5\times 5$, $10\times 10\times 10$ and $20\times 20\times 20$. Note that the number of unknowns increases by the factor of 64. In all the computational results presented here, the number of V-cycles in the AMG is set to six, and the convergence criterion is that the norm of the residual vector falls to $10^{-7}$ relative to the norm of the original right-hand side vector $b$, i.e. $\|r\|/\|b\| \le 10^{-7}$. Performance tests were carried out using a single processor of an SMP workstation with two dual-core CPUs (Intel Xeon 5140, 2.33 GHz) and 32 GB memory.

Plotted in Fig. 4 are the convergence behaviors of the iterative scheme based on the approximation of the Schur complement discussed in Sec. 4.1. In the results of both the nodal collocation and the linear continuous interpolation of the Lagrangian multipliers, the number of iterations for the


convergence is found to increase with the number of unknowns. More specifically, as the number of unknowns increases from 4493 ($5\times 5\times 5$) to 31783 ($10\times 10\times 10$) and 238763 ($20\times 20\times 20$), the number of iterations increases only mildly, from 59 to 107 and 147, respectively. The number of iterations is found to scale with $O(N^{1/4})$ and the CPU time will scale with $O(N^{5/4})$, which is somewhat faster convergence than the result presented by Elman (1996).

Fig. 4. (Color online) Convergence behaviors of the iterative technique for the 3D Stokes problem with the periodic boundary condition implemented with the approximate Schur complement scheme for both nodal collocation and linear continuous interpolation of the Lagrangian multipliers. 'p1n' indicates results from the nodal collocation and 'p1c' is for the linear continuous interpolation.

Having validated the correctness of our code, we tested the proposed exact Schur complement method using the fundamental solution; the results are presented in Fig. 5. Plotted in Fig. 5 are the convergence behaviors for the same 3D Stokes problem in a cubic channel with the periodic boundary condition, implemented with the exact Schur complement scheme using the fundamental solution for the linear continuous interpolation of the Lagrangian multipliers. Fig. 5 shows monotonic exponential convergence (a nearly straight line) and that the number of iterations of the outer MINRES algorithm required for convergence is roughly constant irrespective of the problem size. As was already shown in Hwang et al. (2011a) and Eq. (28), the AMG algorithm for the preconditioning of the velocity unknowns $(\hat{K}z = r)$ has $O(N)$ expense in both memory and CPU time, and the preconditioning for the pressure variable $(Mz = r)$ with the diagonally scaled CG converges within a single iteration for this specific choice of the pressure discretization. The overall performance can therefore be expected to be close to $O(N)$ in both memory usage and computation time. It is only close to the ultimate $O(N)$ convergence because there is one exception in the proposed iterative scheme: the solution of the preconditioning problem in Eq. (39), which involves the LU decomposition and the related full matrix construction. (The huge problem of Eq. (37) involves the inverse of the matrix $K$, but we employ the AMG as mentioned in Sec. 4.2, which has $O(N)$ performance by itself.) However, since the Lagrangian multiplier is defined in a space one order lower than those of the velocity and the pressure, the increase in memory and CPU time in the preconditioning for the Lagrangian multiplier variables can be expected to be minor. In summary, the iterative solution technique with the fundamental solution for the exact Schur complement shows nearly $O(N)$ convergence, because the outer MINRES iteration converges within a fixed number of iterations and, among the three block preconditioning schemes, the two large problems related to the velocity and pressure unknowns have $O(N)$ convergence.

Fig. 5. (Color online) Convergence behaviors of the iterative technique for 3D Stokes flow with the periodic boundary condition implemented with the exact Schur complement scheme using the fundamental solution for the linear continuous interpolation of the Lagrangian multipliers.

Finally we present the convergence behavior of the Stokes flow problem with an immersed solid body in Fig. 6. A circle of radius 0.15 is centered in a domain of size $1\times 1$ in the 2D problem and a sphere of radius 0.15 is centered in a domain of size $1\times 1\times 1$ in the 3D problem. The particle boundary is discretized with uniformly distributed (collocation) points, and the non-trivial task of obtaining uniformly spaced points on a spherical surface has been performed by using the spiral point-set method (Saff and Kuijlaars, 1997). The numbers of collocation points were 21 with a $20\times 20$ mesh in 2D and 81 with a $10\times 10\times 10$ mesh in 3D, and they were chosen to scale with the number of elements for larger or smaller problems. The same pressure difference boundary condition as before is applied, such that the circle (or sphere) is an obstacle for flow separation. In the 2D problem (Fig. 6a), one can again observe monotonic exponential

convergence along with a fixed number of iterations for both the 10×10 and 20×20 meshes, indicating a mesh-independent number of outer iterations, as in the previous periodic boundary problems. However, we observed convergence stalling around a residual value of 10^{-5}, though the global convergence behavior is somewhat satisfactory. The origin of the convergence stalling needs further investigation. From our previous experience with the Stokes flow, this phenomenon might be related to the modification of the singularity behavior by introducing the (non-singular) preconditioner (Hwang et al., 2011a), where the pressure specification used to remove the singularity under all-Dirichlet boundary conditions invokes adverse effects such as convergence stalling (a delay in the convergence rate), counter to common wisdom. Similarly, Elman et al. (2005) reported that rank deficiency for an enclosed flow does not prevent convergence to a consistent solution for Stokes and Navier-Stokes problems. The number of collocation points that is optimal for the direct solver might impose additional constraints on the matrix system, making it an over-constrained problem.

Fig. 6. (Color online) Convergence behaviors for (a) 2D and (b) 3D Stokes flow with an immersed solid body, implemented with the exact Schur complement scheme using the fundamental solution.

6. Conclusions

In this work, we have presented a new efficient iterative scheme for large-scale 3D simulations of flows in porous media, based on the optimal block preconditioning and the combination of the Krylov-subspace/AMG method and the exact Schur complement method with the fundamental solution. The use of the exact Schur complement can be justified by the fact that the Lagrangian multipliers are defined in a space one order lower than those of the primitive variables. This is particularly true for the periodic boundary constraint and the zero-velocity condition on the immersed solid body boundary, which are the two main ingredients for successful flow simulations in porous media with the representative volume element (unit cell). The block preconditioner has been proposed by solving the generalized eigenvalue problem, which results in only six different eigenvalues of the preconditioned matrix system. Further, we proposed the exact Schur complement method using the fundamental solution. We illustrated the performance of the present solver through example problems in 2D and 3D, in comparison with the approximate Schur complement method. We reported that the number of iterations to reach convergence is independent of the problem size, which implies that the performance of the present iterative solver is close to O(N), since the sub-block preconditioning schemes related to the pressure and velocity unknowns have O(N) convergence.

Further development of iterative solution techniques of this class is necessary, particularly for flow simulations with complex fluids such as viscoelastic fluids, particle suspensions, droplet emulsions and/or fiber suspensions, in which one always wants to solve large-scale 3D flow problems to understand hydrodynamic interactions and particle-particle or droplet-droplet interactions in various ranges. The next step would be the development of a similar iterative solution method for massive 3D simulations of particle or fiber suspensions (e.g., Hwang et al., 2011a). The periodic boundary conditions and the treatment of the immersed solid body in this work can be extended to particle suspension flow simulations without any significant modifications.
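
The spiral point-set method cited in Section 5 (Saff and Kuijlaars, 1997) can be illustrated with a short sketch. The paper does not list its own implementation, so the code below is only an assumed reading of the generalized spiral construction: the function name `spiral_points`, the spacing constant 3.6, and the placement of the first and last points at the poles are assumptions for illustration. The example uses the values quoted in the text (81 points, radius 0.15, sphere centered in a 1×1×1 domain).

```python
import math


def spiral_points(n, radius=1.0, center=(0.0, 0.0, 0.0)):
    """Return n points spread roughly uniformly over a sphere surface.

    Assumed form of the generalized spiral: heights are equally spaced
    between the poles and the azimuth advances by 3.6 / sqrt(n * (1 - h^2)).
    """
    if n < 2:
        raise ValueError("need at least two points (the two poles)")
    points = []
    phi = 0.0
    for k in range(1, n + 1):
        h = -1.0 + 2.0 * (k - 1) / (n - 1)  # cos(theta), from -1 up to +1
        theta = math.acos(h)                # polar angle
        if k == 1 or k == n:
            phi = 0.0                       # azimuth is arbitrary at a pole
        else:
            step = 3.6 / math.sqrt(n * (1.0 - h * h))
            phi = (phi + step) % (2.0 * math.pi)
        x = center[0] + radius * math.sin(theta) * math.cos(phi)
        y = center[1] + radius * math.sin(theta) * math.sin(phi)
        z = center[2] + radius * math.cos(theta)
        points.append((x, y, z))
    return points


# Values quoted in the text: 81 collocation points, radius 0.15, unit cube.
collocation = spiral_points(81, radius=0.15, center=(0.5, 0.5, 0.5))
print(len(collocation), collocation[0], collocation[-1])
```

Because the heights are equally spaced, each point represents a band of equal area on the sphere, which is why the spacing looks nearly uniform without any iterative relaxation.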

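The scaling quoted in Section 5 for the approximate Schur complement method can be checked directly from the reported data (59, 107 and 147 iterations for 4493, 31783 and 238763 unknowns). The sketch below fits a power law in log-log space. The helper name `fit_power_law` is hypothetical, and the work estimate assumes that each outer iteration costs O(N), which is an assumption of this sketch and not a measured result.

```python
import math

# Data quoted in Section 5: number of unknowns and outer iterations.
unknowns = [4493, 31783, 238763]
iterations = [59, 107, 147]


def fit_power_law(xs, ys):
    """Least-squares fit of ys ~ c * xs**p in log-log space; returns (p, c)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    p = sxy / sxx
    c = math.exp(my - p * mx)
    return p, c


exponent, prefactor = fit_power_law(unknowns, iterations)
# Assumption: one outer iteration costs O(N), so total work ~ N**(1 + exponent).
work_exponent = 1.0 + exponent
print(f"iterations ~ N^{exponent:.2f}, estimated work ~ N^{work_exponent:.2f}")
```

The fitted exponent comes out at roughly 0.23, which is consistent with the O(N^{1/4}) iteration growth and the O(N^{5/4}) CPU-time estimate stated in the text.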

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2011-0031383 and 2010-0021614).

References

Elman, H.C., 1996, Preconditioning for the steady-state Navier-Stokes equations with low viscosity, SIAM J. Sci. Comput. 20, 1299-1316.
Elman, H.C., D.J. Silvester, and A.J. Wathen, 2005, Finite Elements and Fast Iterative Solvers with Applications in Incompressible Fluid Dynamics, Oxford University Press, New York.
Hughes, T.J.R., 1987, The , Prentice-Hall Inc., New Jersey.
Hwang, W.R. and S.G. Advani, 2010, Numerical simulations of Stokes-Brinkman equations for permeability prediction of dual-scale fibrous porous media, Physics of Fluids 22, 113101.
Hwang, W.R., M.A. Walkley, and O.G. Harlen, 2011a, A fast and efficient iterative scheme for viscoelastic flow simulations with the DEVSS finite element method, J. Non-Newtonian Fluid Mech. 166, 354-362.
Hwang, W.R., W.R. Kim, E.S. Lee and J.H. Park, 2011, Direct simulations of orientational dispersion of platelet particles in a viscoelastic fluid subjected to a partial drag flow for effective coloring of polymers, Korea-Australia Rheol. J. 23, 173-181.
Koay, C.G., 2011, Analytically exact spiral scheme for generating uniformly distributed points on the unit sphere, J. Comput. Sci. 2, 88-91.
Laursen, T.A., 2002, Computational Contact and Impact Mechanics, Springer, Berlin.
Saff, E.B., and A.B.J. Kuijlaars, 1997, Distributing many points on a sphere, Mathematical Intelligencer 19, 5-11.
Silvester, D.J. and A.J. Wathen, 1994, Fast iterative solution of stabilized Stokes systems. Part II: Using general block preconditioner, SIAM J. Numer. Anal. 31, 1352-1367.
Strang, G., 2007, Computational Science and Engineering, Wellesley, Cambridge.
Wang, J.F., and W.R. Hwang, 2008, Permeability prediction of fibrous porous media in a bi-periodic domain, J. Compos. Mater. 43, 909-929.
Liu, H.L., and W.R. Hwang, 2009, Transient filling simulations in unidirectional fibrous porous media, Korea-Australia Rheol. J. 21, 71-79.
Liu, H.L., and W.R. Hwang, 2012, Permeability prediction of fibrous porous media with complex 3D architectures, Compos. Part A 43, 2030-2038.
