RESEARCH STATEMENT

Xuemin Tu

1 Overview

My general research interest is in developing efficient numerical algorithms and applying them to real-life problems. In particular, I have been working on particle filters, domain decomposition algorithms, and nonlinear multigrid methods. I have helped develop special finite element methods for non-convex domains. I have also worked on biomathematical projects such as the effect of cannibalism on flour beetle population dynamics and blood flow in stenotic collapsible tubes.

2 Previous Research Experiences and Accomplishments

2.1 Domain Decomposition Algorithms and Parallel Computation

Usually the first step in solving a linear elliptic partial differential equation (PDE) numerically is its discretization. Finite difference, finite element, or other discretization methods reduce the original PDE to an often huge and ill-conditioned linear system of algebraic equations

Au = f. (1)

Limited by the memory and speed of computers, traditional direct solvers often cannot handle such large linear systems. Iterative methods, such as Krylov space methods, can also need thousands of iterations to obtain accurate solutions because of the large condition numbers of such systems. If we can find a matrix $P$ such that $P^{-1}A$ has a smaller condition number than $A$, and applying $P^{-1}$ to a vector is much cheaper than applying $A^{-1}$, we can then solve

$$P^{-1} A u = P^{-1} f \tag{2}$$

instead of (1). This needs far fewer iterations with Krylov space methods because of the much smaller condition number of $P^{-1}A$. We call the matrix $P^{-1}$ a preconditioner of $A$.

Domain decomposition methods provide efficient preconditioners that can be accelerated by Krylov space methods. They have become popular in applications in computational fluid dynamics, structural mechanics, electromagnetics, constrained optimization, etc. The basic idea of domain decomposition methods is to split the original huge problem into many small problems which can be handled by direct solvers, then to solve these smaller problems a number of times and accelerate the solution of the original problem with Krylov space methods. The preconditioner of a domain decomposition method can often be written as:

$$P^{-1} = \sum_{i=1}^{N} R_i^T D_i A_i^{-1} D_i R_i + R_0^T A_0^{-1} R_0.$$

Here, we have decomposed the original domain $\Omega$ into subdomains $\Omega_i$, $i = 1, \dots, N$. $A_i$ is the local problem on subdomain $\Omega_i$ and $A_0$ is a coarse problem. The $R_i$ are restriction operators and the $D_i$ are certain matrices of weights.

There are two main classes of domain decomposition methods: overlapping Schwarz methods and iterative substructuring methods. One well-known family of iterative substructuring domain decomposition methods is the Balancing Domain Decomposition by Constraints (BDDC) algorithms, which were introduced and analyzed in [3, 14]. In BDDC algorithms, we first reduce the original system to a subdomain interface system (Schur complement). The local components of the BDDC preconditioners are the local Schur complements of the subdomains, and the coarse component is given in terms of a set of primal constraints chosen for each pair of adjacent subdomains. We can obtain condition number bounds of the form

$$\kappa(P_{BDDC}^{-1} A) \le C \left(1 + \log\frac{H}{h}\right)^2. \tag{3}$$

Here $H$ is the diameter and $h$ is the typical mesh size of the subdomains, and $C$ is a constant independent of $H$ and $h$. Combining this estimate with the convergence analysis of Krylov space methods, we can conclude that the convergence rate of BDDC is independent of the number of subdomains but depends weakly on the problem size of each subdomain. These are efficient and scalable algorithms.

For parallel computation, we can assign one or several subdomains to individual processors and the coarse problem to one processor or to each processor. In each iteration, the subdomain local problems of different processors are solved in parallel. The sizes of the subdomain local and coarse problems are much smaller than that of the original problem. Therefore, we can obtain very good parallel efficiency with BDDC algorithms. The key issue in BDDC algorithms is how to choose the primal constraints which define the coarse problems.
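The additive structure of such a preconditioner (local solves plus one coarse solve) can be illustrated on a small model problem. The following is only a sketch under stated assumptions: it uses a two-level overlapping Schwarz preconditioner with $D_i = I$ for the 1D Poisson matrix, not a BDDC preconditioner, and the grid size, number of subdomains, overlap, and all function names are hypothetical choices made for illustration.

```python
import numpy as np


def laplacian_1d(n):
    """Tridiagonal (2, -1) matrix: 1D Poisson on n interior nodes, scaled by h^2."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def restriction(indices, n):
    """R_i: 0/1 matrix that extracts the listed global unknowns."""
    R = np.zeros((len(indices), n))
    R[np.arange(len(indices)), indices] = 1.0
    return R


def coarse_restriction(n, n_sub):
    """R_0: coarse piecewise-linear hat functions sampled at the fine nodes."""
    x = np.arange(1, n + 1) / (n + 1)
    H = 1.0 / n_sub
    nodes = np.arange(1, n_sub) * H
    return np.maximum(0.0, 1.0 - np.abs(x[None, :] - nodes[:, None]) / H)


def two_level_schwarz_inverse(A, n_sub, overlap):
    """Assemble P^{-1} = sum_i R_i^T A_i^{-1} R_i + R_0^T A_0^{-1} R_0 (D_i = I)."""
    n = A.shape[0]
    P_inv = np.zeros_like(A)
    bounds = np.linspace(0, n, n_sub + 1).astype(int)
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        idx = np.arange(max(lo - overlap, 0), min(hi + overlap, n))
        R = restriction(idx, n)
        A_i = R @ A @ R.T                      # local problem on subdomain i
        P_inv += R.T @ np.linalg.solve(A_i, R)
    R0 = coarse_restriction(n, n_sub)
    A0 = R0 @ A @ R0.T                         # Galerkin coarse problem
    P_inv += R0.T @ np.linalg.solve(A0, R0)
    return P_inv


def condition_number(M):
    """Ratio of extreme eigenvalues; valid here because they are real and positive."""
    eig = np.linalg.eigvals(M).real
    return eig.max() / eig.min()


A = laplacian_1d(63)
P_inv = two_level_schwarz_inverse(A, n_sub=8, overlap=2)
kappa_A = condition_number(A)
kappa_pre = condition_number(P_inv @ A)
print(f"cond(A) = {kappa_A:.1f}, cond(P^-1 A) = {kappa_pre:.1f}")
```

In a parallel setting each local solve in the loop would run on its own processor; the dense assembly of $P^{-1}$ here is only for inspecting the spectrum of a tiny problem.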
I have extended the BDDC algorithms to several new applications and will discuss them in some detail in Section 2.1.1. Domain decomposition methods not only provide efficient preconditioners, but can also be coupled directly to the discretization in some cases. In collaboration with Dr. Maksymilian Dryja of Warsaw University, Poland, we have developed a domain decomposition discretization for parabolic problems that is suitable for parallel computation; see [5].

2.1.1 Extensions of BDDC Algorithms

I have extended the BDDC algorithms to scalar elliptic equations with two kinds of discretizations, namely mixed and hybrid finite element discretizations, which have many applications, such as flow in porous media. These discretizations give systems of equations of the following form:

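$$\begin{bmatrix} A & B^T \\ B & 0 \end{bmatrix} \begin{bmatrix} u \\ p \end{bmatrix} = \begin{bmatrix} F_1 \\ F_2 \end{bmatrix}. \tag{4}$$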

The system matrix of (4) is symmetric indefinite with the matrix A symmetric, positive definite. The hybrid finite element discretization is equivalent to a nonconforming finite element method. We can reduce the original saddle point problem to a positive definite system for the pressure p by introducing Lagrange multipliers on the interface of the subdomains and by eliminating the velocity u in each subdomain. I then use the BDDC preconditioner to solve the interface problem for the Lagrange multipliers, which can be interpreted as an approximation to the trace of the pressure.
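The algebra behind reducing a saddle point system of the form (4) to a positive definite system for $p$ can be checked on a small dense example. This is a simplified sketch, not the subdomain-wise hybrid procedure with Lagrange multipliers described above: it eliminates $u$ globally, and the blocks $A$, $B$, $F_1$, $F_2$ are random placeholder data (an assumption made purely for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
n_u, n_p = 6, 3

# Hypothetical data: a random SPD block A and a random full-rank block B.
M = rng.standard_normal((n_u, n_u))
A = M @ M.T + n_u * np.eye(n_u)
B = rng.standard_normal((n_p, n_u))
F1 = rng.standard_normal(n_u)
F2 = rng.standard_normal(n_p)

# Full saddle point matrix of (4): symmetric but indefinite.
K = np.block([[A, B.T], [B, np.zeros((n_p, n_p))]])
eig_K = np.linalg.eigvalsh(K)

# Eliminate u = A^{-1}(F1 - B^T p); substituting into B u = F2 gives S p = g.
S = B @ np.linalg.solve(A, B.T)
g = B @ np.linalg.solve(A, F1) - F2
eig_S = np.linalg.eigvalsh(S)

# Solve the reduced positive definite system, then recover u.
p = np.linalg.solve(S, g)
u = np.linalg.solve(A, F1 - B.T @ p)
reference = np.linalg.solve(K, np.concatenate([F1, F2]))

print("eigenvalues of K have both signs:", eig_K.min() < 0 < eig_K.max())
print("smallest eigenvalue of S:", eig_S.min())
```

The reduced matrix $S = B A^{-1} B^T$ is symmetric positive definite whenever $A$ is and $B$ has full row rank, which is what makes conjugate gradient type methods applicable after the elimination.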

By enforcing a suitable set of constraints, I obtain the same convergence rate as for the conforming finite element case (3); see [20].

Using the mixed formulation, I obtain a saddle point problem which is closely related to that arising from the incompressible Stokes equations [13]. I first reduce the original problem to an interface problem with the solution in a benign space, which is a subspace in which the BDDC preconditioned operator is positive definite. I choose edge/face constraints to force the iterates into the benign space. The conjugate gradient method can therefore be used to accelerate the convergence. I have proved a condition number bound as in (3) for this BDDC algorithm; see [19].

We have also considered BDDC algorithms for some symmetric indefinite and nonsymmetric positive definite problems. This is joint work with Dr. Jing Li of Kent State University. For the symmetric indefinite case, we consider the following system of linear equations

$$Au = (K - \sigma^2 M)u = f, \tag{5}$$

where $K$ is the stiffness matrix, $M$ is the mass matrix, and $\sigma$ is a scalar. This system arises from a finite element discretization of the Helmholtz equation on bounded interior domains or from inverse iterations for eigenvalue problems. Our BDDC algorithm is motivated by the dual-primal finite element tearing and interconnecting algorithm (FETI-DP) for solving time-harmonic wave propagation problems (FETI-DPH), which was first proposed by Farhat and Li in [6]. The FETI-DPH method has been shown to be parallel scalable by extensive experiments and has been applied to the simulation of elastic waves in structural dynamics problems and to the simulation of sound waves in acoustic scattering problems. A key component of FETI-DPH, and of our BDDC algorithm, is the incorporation of plane waves into the coarse level problem to enhance the convergence rate. These plane waves represent exact solutions of the partial differential equation in free space.
This idea was first introduced by Farhat, Macedo, and Lesoinne [7] with the FETI-H algorithm for solving the Helmholtz equation. We use GMRES iterations for the BDDC preconditioned system. Under the condition that the diameters of the subdomains are small enough, we prove that the convergence rate of the GMRES iteration depends polylogarithmically on the dimension of the individual subdomain problems and that it improves as the subdomain diameters decrease. We also establish the spectral equivalence between the proposed BDDC algorithms and the FETI-DPH algorithms for solving (5). Therefore, a convergence analysis of the FETI-DPH algorithms is also obtained; see [12].

The systems of linear equations arising from the finite element discretization of advection-diffusion equations are nonsymmetric, but usually positive definite. We proposed a BDDC preconditioner for such systems. A preconditioned GMRES iteration is used to solve a Schur complement system of equations for the subdomain interface variables. A key component of this BDDC algorithm is a set of constraints, related to the flux across the subdomain interface, which is incorporated in the coarse level problem to enhance the convergence rate. A convergence rate estimate for the GMRES iteration is established, under the condition that the diameters of the subdomains are small enough. It is independent of the number of subdomains and grows only slowly with the subdomain problem size. Numerical experiments for several two-dimensional advection-diffusion problems illustrate the fast convergence of this algorithm; see [24].
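The phrase "nonsymmetric, but usually positive definite" can be made concrete with a one-dimensional model. The sketch below is an illustration under assumptions of my own choosing: it uses central finite differences for $-\varepsilon u'' + b u'$ on $(0,1)$ rather than the finite element discretization of [24], and the values of `n`, `eps`, and `b` are arbitrary. The advection part is skew-symmetric, so the symmetric part of the matrix is just the diffusion part.

```python
import numpy as np


def advection_diffusion_1d(n, eps, b):
    """Central differences for -eps*u'' + b*u' on (0,1), zero Dirichlet data."""
    h = 1.0 / (n + 1)
    diffusion = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) * eps / h**2
    advection = (np.eye(n, k=1) - np.eye(n, k=-1)) * b / (2.0 * h)
    return diffusion + advection


A = advection_diffusion_1d(n=50, eps=0.01, b=1.0)

# A is not symmetric, so conjugate gradients does not apply directly ...
is_symmetric = np.allclose(A, A.T)

# ... but x^T A x = x^T ((A + A^T)/2) x > 0, because the symmetric part is SPD.
symmetric_part = 0.5 * (A + A.T)
min_eig_symmetric_part = np.linalg.eigvalsh(symmetric_part).min()

print("symmetric:", is_symmetric)
print("smallest eigenvalue of the symmetric part:", min_eig_symmetric_part)
```

This is why a method such as GMRES, which does not require symmetry, is the natural Krylov accelerator in this setting.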

2.1.2 Three-level BDDC Algorithms

One of the shortcomings of the BDDC methods is that the coarse problem needs to be generated and factored by a direct solver at the beginning of the computation. The number of primal constraints selected for each subdomain must be large enough to make sure that the preconditioned system has a condition number bound as in (3). The coarse component can therefore become a bottleneck if the number of subdomains is very large. Motivated by this, I have developed two three-level BDDC algorithms to remove this difficulty. I group several subdomains together into a subregion and then use the BDDC idea recursively for the coarse problem. I first reduce the original coarse problem to a subregion interface problem by eliminating the subregion interior variables independently. In the three-level BDDC algorithms, I do not solve the subregion interface problem exactly, but treat it with one iteration of the BDDC preconditioner. This means that I only need to solve several local subregion problems and one coarse problem on the subregion level in each iteration. We can assume that all these problems are small enough to be solved by direct solvers. I also show that the condition number of the three-level preconditioned BDDC operator is bounded by

$$\kappa \le C \left(1 + \log\frac{\hat{H}}{H}\right)^2 \left(1 + \log\frac{H}{h}\right)^2, \tag{6}$$

where $\hat{H}$, $H$, and $h$ are the typical diameters of the subregions, subdomains, and the finite element mesh, respectively, and $C$ is a constant independent of $\hat{H}$, $H$, and $h$. In order to remove the additional factor $\left(1 + \log\frac{\hat{H}}{H}\right)^2$ in (6), I can use a Chebyshev iteration to accelerate the three-level BDDC method. With this acceleration, the condition number bound is

$$\kappa \le C\, C(k) \left(1 + \log\frac{H}{h}\right)^2, \tag{7}$$

where $C(k)$ is a constant which depends on the eigenvalues of the preconditioned coarse problem, the two parameters chosen for the Chebyshev iteration, and $k$, the number of Chebyshev iterations. $C(k)$ goes to 1 as $k$ goes to $\infty$. $H$ and $h$ are the same as before.

I first obtained these results for two-dimensional problems with primal vertex constraints; see [22]. I then extended these algorithms to the three-dimensional case with constraints on the averages over subdomain edges. The new constraints lead to a considerably more complicated coarse problem and the need for new technical tools in the analysis. The same condition number bounds, (6) and, with Chebyshev acceleration, (7), were obtained for the three-dimensional case in [21].

In collaboration with Dr. Hyea Hyun Kim of Chonnam National University, Korea, we have extended the three-level BDDC algorithms to mortar finite element discretizations with geometrically nonconforming subdomain partitions; see [11]. Mortar finite elements allow for different meshes in different subdomains. The optimal approximation property is obtained by enforcing the mortar matching continuity condition across the subdomain boundaries. Because of the nonmatching meshes and the geometrically nonconforming partition, the coarse problems in this case are even larger than for standard finite element discretizations.

In the BDDC algorithms for saddle point problems [13, 19], the number of selected primal constraints in each subdomain must be large enough to ensure not only a good condition number bound, but also that the iterates stay in a benign space. Moreover, the coarse problem formed by these constraints is no longer positive definite. All these facts make it even more important to have a good inexact solver for the coarse problems in those two-level BDDC algorithms for saddle point problems.
In [23], I analyzed such coarse problems and introduced a three-level BDDC algorithm, which can be applied to the two algorithms of [13, 19]. I also obtained the same condition number estimate as in (6) for the mixed formulation of scalar elliptic problems. Numerical experiments illustrate that two-level BDDC algorithms can fail because of insufficient memory while the three-level algorithms still work well on the same computer.
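To get a feeling for how mildly the bounds (3) and (6) grow, the polylogarithmic factors can simply be tabulated. The short sketch below is an illustration only: it drops the unknown constant $C$, assumes the natural logarithm, and uses hypothetical function names and ratios.

```python
import math


def two_level_factor(H_over_h):
    """The factor (1 + log(H/h))^2 appearing in bound (3), without the constant C."""
    return (1.0 + math.log(H_over_h)) ** 2


def three_level_factor(Hhat_over_H, H_over_h):
    """The product of the two factors in bound (6), without the constant C."""
    return (1.0 + math.log(Hhat_over_H)) ** 2 * two_level_factor(H_over_h)


# Columns: ratio, two-level factor, three-level factor with both ratios equal.
for ratio in (4, 16, 64):
    print(ratio, round(two_level_factor(ratio), 2),
          round(three_level_factor(ratio, ratio), 2))
```

The extra factor in the three-level bound depends only on the number of subdomains across a subregion, which is the factor that the Chebyshev acceleration is designed to remove.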

2.2 Nonlinear Multigrid Methods

Newton's method, combined with multigrid to approximately solve the linear systems, is a standard approach for the solution of the algebraic systems arising from the discretization of nonlinear partial differential equations. The full approximation scheme (FAS) is an alternative approach which never requires the full linearization of the nonlinear operator. Both approaches use multiple discretization grids to accelerate the convergence. Despite superficial differences between the two approaches, they actually define a class of iterative methods in which the only essential difference is how and when the linearization is done. In addition, the convergence of these methods may be accelerated using common techniques. It is important to understand these relationships in order to select the method most appropriate for a modern computer system; this will depend on the mathematical properties of the problem as well as the memory bandwidth/latency and floating point speed of the computer system. We have studied simple and full FAS versions with V or W cycles and tested different smoothers. We have also considered Krylov space type methods to accelerate FAS and the use of FAS as a nonlinear preconditioner for quasi-Newton methods. All these algorithms are implemented in PETSc [1] and give very good results for our test problems in [16]. This is joint work with Dr. Barry Smith of Argonne National Laboratory.
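The structure of an FAS cycle can be sketched on a small model problem. The code below is a minimal two-grid illustration written for this purpose and is not the PETSc implementation of [16]: the model problem $-u'' + u^3 = f$, the grid size, the smoother counts, and all function names are assumptions, and the coarsest grid is solved by Newton's method purely for convenience.

```python
import numpy as np


def apply_operator(u, h):
    """N(u)_i = (2u_i - u_{i-1} - u_{i+1}) / h^2 + u_i^3 with zero boundary values."""
    padded = np.concatenate(([0.0], u, [0.0]))
    return (2.0 * padded[1:-1] - padded[:-2] - padded[2:]) / h**2 + u**3


def nonlinear_gauss_seidel(u, f, h, sweeps):
    """Smoother: one scalar Newton step per unknown; no global Jacobian is formed."""
    u = u.copy()
    n = len(u)
    for _ in range(sweeps):
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            residual = (2.0 * u[i] - left - right) / h**2 + u[i] ** 3 - f[i]
            u[i] -= residual / (2.0 / h**2 + 3.0 * u[i] ** 2)
    return u


def restrict_full_weighting(r):
    """Weights (1/4, 1/2, 1/4) centered at every second fine node."""
    return 0.25 * r[0:-2:2] + 0.5 * r[1:-1:2] + 0.25 * r[2::2]


def prolong_linear(e_coarse):
    """Linear interpolation from the coarse grid to the fine grid."""
    padded = np.concatenate(([0.0], e_coarse, [0.0]))
    e = np.zeros(2 * len(e_coarse) + 1)
    e[1::2] = e_coarse
    e[0::2] = 0.5 * (padded[:-1] + padded[1:])
    return e


def solve_coarse(v, rhs, H, newton_steps=8):
    """Coarsest-grid solve by Newton's method (a choice made for this sketch)."""
    m = len(v)
    L = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / H**2
    for _ in range(newton_steps):
        v = v - np.linalg.solve(L + np.diag(3.0 * v**2), apply_operator(v, H) - rhs)
    return v


def fas_two_grid_cycle(u, f, h, pre=2, post=2):
    u = nonlinear_gauss_seidel(u, f, h, pre)
    u_c = u[1::2].copy()                                   # inject the fine iterate
    r_c = restrict_full_weighting(f - apply_operator(u, h))
    rhs_c = apply_operator(u_c, 2.0 * h) + r_c             # FAS coarse right-hand side
    v = solve_coarse(u_c, rhs_c, 2.0 * h)
    u = u + prolong_linear(v - u_c)                        # coarse-grid correction
    return nonlinear_gauss_seidel(u, f, h, post)


def run_fas_demo(n=31, cycles=15):
    """Solve with f chosen so that the continuous solution is sin(pi x)."""
    h = 1.0 / (n + 1)
    x = np.arange(1, n + 1) * h
    f = np.pi**2 * np.sin(np.pi * x) + np.sin(np.pi * x) ** 3
    u = np.zeros(n)
    history = [np.linalg.norm(f - apply_operator(u, h))]
    for _ in range(cycles):
        u = fas_two_grid_cycle(u, f, h)
        history.append(np.linalg.norm(f - apply_operator(u, h)))
    return x, u, history
```

Calling `run_fas_demo()` returns the grid, the computed solution, and the residual history. Note that the coarse problem is posed for the full approximation `v`, not for a correction, which is what distinguishes FAS from Newton-multigrid.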

2.3 Enhanced Singular Function Mortar Finite Element Methods for Nonconvex Domains

This project concerned how to obtain second order discretizations (first order in the energy norm and second order in the $L^2$ norm) for the Poisson problem on non-convex domains. We also worked on how to obtain second order accurate tensor coefficients corresponding to the non-convex corner. We have developed three families of finite element methods that combine mortar finite elements and singular functions. The results of our numerical experiments show that all three families achieve second order accuracy. We also analyzed the $H^1$ and $L^2$ errors of these methods using functional analysis and finite element theory. A complete theory was developed for one of the methods. For the other two methods, the theory is under development, supported by very promising preliminary numerical results; see [25, 15]. This is joint work with Dr. Marcus Sarkis of Worcester Polytechnic Institute.

2.4 Simulation of Blood Flow in Stenotic Collapsible Tubes

In this project, I parallelized a Fortran code which simulates the viscous flow in a stenotic elastic tube with large wall deformation and collapse. The fluid is governed by the Navier-Stokes equations and the wall is modeled using a thin-shell theory. The incremental boundary iteration method is used to handle the fluid-wall interactions, and the Navier-Stokes equations are discretized by an unstructured generalized finite difference method. The SIMPLER method with direct solvers (LAPACK) is used for both the pressure and velocity equations; see [18].

I have parallelized the code using MPI on an IBM SP2. The original direct linear solver needs a very large amount of memory, which is a severe limitation for 3D problems, as noted in Section 2.1. Moreover, it is not easy to parallelize. Therefore, we replaced it with more efficient and more easily parallelized domain decomposition solvers. In addition to several simple iterative solvers, I have tested GMRES iterations with an Additive Schwarz preconditioner and with versions of the Restricted Schwarz preconditioner with one or two mesh cells of overlap. GMRES with right preconditioning provides a good solver for this problem, and the running time could be reduced by a factor of eight.

2.5 Effect of Cannibalism on Flour Beetle Population Dynamics

In this project, we constructed two kinds of population models for flour beetles: age-structured and size-structured population models. The asymptotic behavior of the solutions of the population models with cannibalism was studied by analyzing the bifurcation diagrams of their equilibria and by numerical simulation. The effect of cannibalism among individuals on the population dynamics was studied, and some dynamical characteristics of the size-structured population model with cannibalism, due to the nonlinear growth, were shown. A complete theory was developed for the existence of steady states of the size-structured model (hyperbolic PDEs) and for the stability of the steady states (using bifurcation and semigroup theory). We established a numerical simulation method to study the dynamic behavior of these models and to explain the population dynamics from the new viewpoint of "energy". This is joint work with Dr. Haiyang Huang of Beijing Normal University, China; see [10, 9, 8].

3 Current and Future Research Directions

3.1 Particle Filters and Oceanography Applications

We consider the evolution of a sequence of state variables $x_n$, $n = 1, 2, \dots$, given by

$$x_n = f_n(x_{n-1}, \xi_{n-1}),$$

where $\xi_n$, $n = 1, 2, \dots$, is an independent and identically distributed noise sequence. We would like to estimate the state variables $x_n$ based on the measurements $y_i$, $i = 1, \dots, n$, given by

$$y_n = h_n(x_n, \eta_n),$$

where $\eta_n$ is an independent and identically distributed measurement noise sequence. This requires the construction of the probability density function of the posterior distribution $p(x_n \mid y_1, \dots, y_n)$. If we assume that the functions $f_n$ and $h_n$ are linear and that the posterior distribution is Gaussian, the Kalman filter gives the optimal solution to this problem. However, this assumption does not hold in many applications.

A particle filter is a sequential importance sampling type algorithm that generates samples from the posterior distribution without the restriction to linear functions or Gaussian distributions. However, most particles will have negligible weights after a few iterations. A resampling technique has to be used after each iteration to eliminate particles with small weights. This, however, concentrates the particles around those with large weights and leads to sample impoverishment, i.e., a loss of diversity among the particles. Some techniques have been proposed to solve the problem of sample impoverishment. For example, the resample-move algorithms add Markov chain Monte Carlo (MCMC) steps after resampling. In this project, we introduce a new approach that produces particles with large weights, so that resampling is often not necessary or, if needed, does not lead to sample impoverishment. This algorithm avoids the time-consuming MCMC steps. Our new particle filters have been applied to several test problems and have given good results.

The long-term goal of this project is to apply our efficient particle filters to oceanography. We would like to work on the ecosystem model in [17] with biological data from the Hawaii Ocean Time series (HOT) site. This is joint work with Dr. Alexandre Chorin of the University of California at Berkeley, and colleagues in the College of Oceanic and Atmospheric Sciences at Oregon State University.
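For readers unfamiliar with the baseline, the standard sequential importance sampling filter with resampling (often called the bootstrap particle filter) can be sketched as follows. This is the textbook scheme whose weight degeneracy and sample impoverishment are discussed above, not the new approach of this project. The scalar linear Gaussian test model, the effective-sample-size threshold, and all names and parameter values are assumptions chosen for illustration.

```python
import numpy as np


def effective_sample_size(weights):
    """1 / sum(w_i^2): equals N for uniform weights and 1 for a single dominant particle."""
    return 1.0 / np.sum(weights**2)


def bootstrap_particle_filter(observations, n_particles, init, transition,
                              likelihood, rng, ess_fraction=0.5):
    """Propagate, reweight by the likelihood, and resample when the weights degenerate."""
    particles = init(n_particles, rng)
    weights = np.full(n_particles, 1.0 / n_particles)
    means, ess_history = [], []
    for y in observations:
        particles = transition(particles, rng)             # sample x_n given x_{n-1}
        weights = weights * likelihood(y, particles)       # importance weights
        weights /= weights.sum()
        ess = effective_sample_size(weights)
        means.append(np.sum(weights * particles))          # posterior mean estimate
        ess_history.append(ess)
        if ess < ess_fraction * n_particles:               # multinomial resampling
            idx = rng.choice(n_particles, size=n_particles, p=weights)
            particles = particles[idx]
            weights = np.full(n_particles, 1.0 / n_particles)
    return np.array(means), np.array(ess_history)


def simulate_linear_gaussian(n_steps, rng, a, q, r):
    """Hypothetical test model: x_n = a x_{n-1} + q xi_n,  y_n = x_n + r eta_n."""
    x = np.zeros(n_steps)
    y = np.zeros(n_steps)
    state = rng.normal(0.0, 1.0)
    for n in range(n_steps):
        state = a * state + q * rng.normal()
        x[n] = state
        y[n] = state + r * rng.normal()
    return x, y


def run_demo(seed=0, n_steps=100, n_particles=500, a=0.9, q=0.5, r=0.5):
    rng = np.random.default_rng(seed)
    x, y = simulate_linear_gaussian(n_steps, rng, a, q, r)
    means, ess = bootstrap_particle_filter(
        y, n_particles,
        init=lambda n, g: g.normal(0.0, 1.0, size=n),
        transition=lambda p, g: a * p + q * g.normal(size=p.shape),
        likelihood=lambda obs, p: np.exp(-0.5 * ((obs - p) / r) ** 2),
        rng=rng)
    return x, means, ess
```

Calling `run_demo()` returns the true states, the filtered means, and the effective sample size at every step. The resampling branch is where duplicated particles, and hence the loss of diversity, enter.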

3.2 Domain Decomposition Methods

When we apply BDDC algorithms to different applications, the crucial part is choosing the right type of primal constraints, and enough of them, to form the coarse problems. We have seen this in Section 2.1.1. On the other hand, there are now computer systems with more than 100,000 powerful processors, which allow very large and detailed simulations. (Usually we do not assign a local component to more than one processor because of the communication cost in domain decomposition algorithms.) With hundreds of thousands of subdomains, the coarse components will be the bottleneck, as previously noted, in many applications, including Stokes, Helmholtz, advection-diffusion, and elasticity problems. I am planning to extend our three-level BDDC algorithms to these applications. We have succeeded in some cases, discussed in Section 2.1.2.

BDDC algorithms have been extended to saddle point problems [13, 19] and advection-diffusion problems [24], and we are now ready to extend them to the incompressible Navier-Stokes equations.

In collaboration with Dr. Clark Dohrmann of Sandia National Laboratories, Dr. Jing Li of Kent State University, and Dr. Olof Widlund of the Courant Institute of Mathematical Sciences, we are interested in exploring the application of the new generation of overlapping Schwarz methods, which use coarse spaces borrowed from iterative substructuring methods, to the incompressible Navier-Stokes equations. These new overlapping Schwarz methods have been applied successfully to irregular domains in [4] and to almost incompressible elasticity in [2].

3.3 Nonlinear Multigrid Methods

Nowadays, floating point operations are cheap, but moving floating point numbers from main memory to the CPUs and moving the results back is expensive. FAS marches through the grid solving small nonlinear systems associated with one or a small number of unknowns using Newton's method. We then perform a number of floating point operations and only need to store the vectors; the Jacobian entries we compute never get stored in main memory. (Note that with multigrid the smoothing takes up almost all of the time, so to get high floating point performance we just need very efficient smoothers; the rest of the code matters much less.) With FAS properly implemented, we have the possibility of going from 10 percent of machine floating point peak to 60+ percent. We will study more acceleration techniques for FAS and also connect it with what we have learned from domain decomposition methods. This is joint work with Dr. Barry Smith of Argonne National Laboratory.

3.4 Biomathematical Projects

Based on our extensive work on domain decomposition methods and nonlinear multigrid algorithms, our plan is to carry out detailed simulations for some biomathematical projects, such as the blood flow in stenotic collapsible tubes discussed in Section 2.4. We believe that parallel computers and scalable parallel domain decomposition algorithms will allow us to do larger and faster simulations of those real-life problems.

References

[1] Satish Balay, Kris Buschelman, William D. Gropp, Dinesh Kaushik, Matt Knepley, Lois Curfman McInnes, Barry F. Smith, and Hong Zhang. PETSc home page. http://www.mcs.anl.gov/petsc, 2001.

[2] Clark Dohrmann and Olof Widlund. An overlapping Schwarz algorithm for almost incompressible elasticity. Technical Report TR2008-912, Department of Computer Science, Courant Institute, May 2008.

[3] Clark R. Dohrmann. A preconditioner for substructuring based on constrained energy minimization. SIAM J. Sci. Comput., 25(1):246–258, 2003.

[4] Clark R. Dohrmann, Axel Klawonn, and Olof B. Widlund. Domain decomposition for less regular subdomains: overlapping Schwarz in two dimensions. SIAM J. Numer. Anal., 46(4):2153–2168, 2008.

[5] Maksymilian Dryja and Xuemin Tu. A domain decomposition discretization of parabolic problems. Numer. Math., 107(4):625–640, October 2007.

[6] Charbel Farhat and Jing Li. An iterative domain decomposition method for the solution of a class of indefinite problems in computational structural dynamics. Appl. Numer. Math., 54:150–166, 2005.

[7] Charbel Farhat, Antonini Macedo, and Michel Lesoinne. A two-level domain decomposition method for the iterative solution of high frequency exterior Helmholtz problems. Numer. Math., 85(2):283–308, 2000.

[8] Haiyang Huang and Xuemin Tu. The regulation effect of cannibalism in the dynamics of the population model. Journal of Beijing Normal University (Natural Science), 36(6):1–6, 2000.

[9] Haiyang Huang and Xuemin Tu. The effect of cannibalism in the dynamics of the size-structured population model. Journal of Beijing Normal University (Natural Science), 37(5):580–585, 2001.

[10] Haiyang Huang and Xuemin Tu. Existence and stability of multiple equilibria in a size-structured cannibalism population model. Journal of Beijing Normal University (Natural Science), 38(1):1–6, 2002.

[11] Hyea Hyun Kim and Xuemin Tu. A three-level BDDC algorithm for mortar discretization. Technical Report LBNL-62791, Lawrence Berkeley National Laboratory, June 2007.

[12] Jing Li and Xuemin Tu. Convergence analysis of a balancing domain decomposition method for solving interior Helmholtz equations. Technical Report LBNL-62618, Lawrence Berkeley National Laboratory, May 2007.

[13] Jing Li and Olof B. Widlund. BDDC algorithms for incompressible Stokes equations. SIAM J. Numer. Anal., 44(6):2432–2455, 2006.

[14] Jan Mandel and Clark R. Dohrmann. Convergence of a balancing domain decomposition by constraints and energy minimization. Numer. Linear Algebra Appl., 10(7):639–659, 2003.

[15] Marcus Sarkis and Xuemin Tu. Singular function mortar finite element methods. Comput. Methods Appl. Math., 3(1):202–218 (electronic), 2003. Dedicated to Raytcho Lazarov.

[16] Barry Smith and Xuemin Tu. Newton-multigrid, the full approximation scheme, and their acceleration. in preparation.

[17] Y. H. Spitz, J. R. Moisan, and M. R. Abbott. Configuring an ecosystem model using data from the Bermuda Atlantic Time Series (BATS). Deep-Sea Research II, 48:1733–1768, 2001.

[18] Dalin Tang, Chun Yang, Shunichi Kobayashi, and David N. Ku. Generalized finite difference method for 3-D viscous flow in stenotic tubes with large wall deformation and collapse. Appl. Numer. Math., 38(1-2):49–68, 2001.

[19] Xuemin Tu. A BDDC algorithm for a mixed formulation of flows in porous media. Electron. Trans. Numer. Anal., 20:164–179, 2005.

[20] Xuemin Tu. A BDDC algorithm for flow in porous media with a hybrid finite element discretization. Electron. Trans. Numer. Anal., 26:146–160, 2007.

[21] Xuemin Tu. Three-level BDDC in three dimensions. SIAM J. Sci. Comput., 29(4):1759–1780, 2007.

[22] Xuemin Tu. Three-level BDDC in two dimensions. Internat. J. Numer. Methods Engrg., 69:33–59, 2007.

[23] Xuemin Tu. A three-level BDDC algorithm for saddle point problems. Submitted to Numer. Math., 2008.

[24] Xuemin Tu and Jing Li. A balancing domain decomposition method by constraints for advection-diffusion problems. Commun. Appl. Math. Comput. Sci., 3:25–60, 2008.

[25] Xuemin Tu and Marcus Sarkis. Singular function enhanced mortar finite element. In Domain decomposition methods in science and engineering, pages 475–482 (electronic). Natl. Auton. Univ. Mex., México, 2003.
