
THE UNIVERSITY OF CHICAGO

FAST NUMERICAL METHODS AND BIOLOGICAL PROBLEMS

A DISSERTATION SUBMITTED TO THE FACULTY OF THE DIVISION OF THE PHYSICAL SCIENCES IN CANDIDACY FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

DEPARTMENT OF COMPUTER SCIENCE

BY PETER BRUNE

CHICAGO, ILLINOIS
AUGUST 2011

ABSTRACT

This thesis encompasses a number of efforts towards the development of fast numerical methods and their applications, particularly in light of the simulation of biochemical systems. Scientific computing as a discipline requires considerations from computer science, namely those of algorithmic efficiency and software automation. Also required is knowledge of applied mathematics in order to bridge the gap between computer science and the specific application. This thesis spans these fields, with the study and implementation of optimal numerical techniques encompassing two chapters, and the development and application of numerical techniques to biochemical problems encompassing another two. The first of these efforts is the construction of robust, optimal geometric unstructured multigrid methods in the face of difficult problem and mesh conditions. The second is the construction of optimal discrete spaces for problems arising in quantum mechanics and electronic structure calculations. The third and fourth are the development of fast and flexible methods for nanoscale implicit solvent electrostatics. The development of fast and parallel methods for an important quantity of interest in classical density functional theory calculations is discussed. Also, the derivation and implementation of a finite element method for improved solvent models using automated scientific computing tools is described.

TABLE OF CONTENTS

ABSTRACT

LIST OF TABLES

LIST OF FIGURES

ACKNOWLEDGEMENTS

Chapter

1 ROADMAP
  1.1 Geometric Unstructured Multigrid
  1.2 Exponential Meshes
  1.3 Classical Density Functional Theory
  1.4 Nonlocal Bioelectrostatics
  1.5 Disciplines
    1.5.1 Computer Science
    1.5.2 Computational and Applied Mathematics
    1.5.3 Finite Element Methods

2 MULTILEVEL METHODS
  2.1 Multigrid
  2.2 FEM & Multilevel Methods
    2.2.1 Efficiency and High Performance Computing
    2.2.2 Feasibility and Low Performance Computing
    2.2.3 Coarsening and Multigrid Components
  2.3 Vertex Selection
  2.4 Multigrid Mesh Conditions
    2.4.1 Function-Based Coarsening
    2.4.2 Graph Coarsening Algorithm
    2.4.3 Mesh Boundaries
  2.5 Remeshing
    2.5.1 Remeshing Options
    2.5.2 Simple Vertex Removal
    2.5.3 Quality Measure Possibilities
    2.5.4 Anisotropic Quality Measures
    2.5.5 Remeshing the Boundary
    2.5.6 Mesh Cleanup
  2.6 Adaptive Refinement
  2.7 Interpolation Operators
    2.7.1 Construction by Traversal
  2.8 Experiments
    2.8.1 Experimental Setup
    2.8.2 Test Problems
    2.8.3 Mesh Quality Experiments
    2.8.4 Mesh Grading Experiments
    2.8.5 Multigrid Performance
    2.8.6 Anisotropic Multigrid Performance
  2.9 Outlook

3 EXPONENTIAL MESHES AND QUANTUM MECHANICS
  3.1 Approximation of Exponential Decay
    3.1.1 Optimal Spacing
    3.1.2 The Higher Order Case
  3.2 Mesh Representation
    3.2.1 Simple Radial Meshes
  3.3 Quantum Mechanics
    3.3.1 The Spectrum
    3.3.2 Solving the Schrödinger Equation
  3.4 General-Dimensional Finite Elements
    3.4.1 Tensor Product Geometries and Elements
    3.4.2 Barycentric Coordinates
    3.4.3 Tensor Product Simplices
    3.4.4 The Finite Element Basis
    3.4.5 Lagrange Elements
    3.4.6 Quadrature on Tensor Product Cells
  3.5 Experiments
    3.5.1 The Hydrogen Atom
    3.5.2 The Hydrogen Spectrum
  3.6 Outlook

4 EFFICIENT CLASSICAL DENSITY FUNCTIONAL THEORY
    4.0.1 Contribution
  4.1 Problem Setup and Parameters
    4.1.1 The Hard Wall Case
    4.1.2 Reduced Model for the Ion Channel
    4.1.3 Numerical Approaches
  4.2 Parallel Algorithm
    4.2.1 GPU Computation
    4.2.2 The Algorithm
    4.2.3 Optimization
  4.3 Initial Results
  4.4 Fast Algorithm for the Reference Density
    4.4.1 Parameter Gridding
    4.4.2 Error Estimation
    4.4.3 Complexity of Approximation
    4.4.4 Results
  4.5 Outlook

5 FINITE ELEMENT METHODS FOR NONLOCAL ELECTROSTATICS
  5.1 Biochemical Continuum Electrostatics
    5.1.1 Domain of Interest
  5.2 Models and Formulations
    5.2.1 The Fourier-Lorentzian Model
    5.2.2 Parameters of the FL Model
    5.2.3 Discretization of the FL Model
  5.3 FEM Approach
    5.3.1 Formulation Details
    5.3.2 Code Generation and Automation
    5.3.3 Fast Solvers
  5.4 Atomic Meshing
  5.5 Experiments
  5.6 Outlook
    5.6.1 Model Improvements

Appendix

A MULTIGRID CODE OVERVIEW
  A.1 Multigrid Infrastructure
    A.1.1 MultigridProblem
    A.1.2 Interpolation
  A.2 Coarsening Infrastructure
    A.2.1 CoarsenedHierarchy

B OPTIMAL GRIDS FOR OTHER FUNCTIONS
  B.1 Geometric Grids
  B.2 Grids for Algebraic Functions

C MOLECULAR MESHING
    C.0.1 Surface Meshing
    C.0.2 Inputs
    C.0.3 Algorithm

REFERENCES

LIST OF TABLES

2.1 Hierarchy quality metrics for the Pacman mesh.
2.2 Hierarchy quality metrics for the Fichera mesh.
2.3 Calculated C_l and C_h for the Pacman mesh hierarchy.
2.4 Calculated C_l and C_h for the Fichera mesh hierarchy.
2.5 Multigrid performance on the Pacman and Fichera problems.
2.6 Anisotropic problem multigrid performance.
2.7 Galerkin vs. rediscretization multigrid for the sinker problem.

3.1 Relative L2 errors.
3.2 Number of unknowns.

4.1 Table of species parameters.
4.2 Error and refinement in R on the hard wall.

5.1 Constants used in various studies of the nonlocal model.
5.2 Free energies of solvation for example ions.

A.1 Solver parameters.
A.2 Multigrid cycle type options.
A.3 Multigrid type.
A.4 Available smoother types.
A.5 Smoother to be used on the coarsest level.
A.6 Linear solver type.

LIST OF FIGURES

2.1 Multigrid cycle types.
2.2 Coarsening of a graph.
2.3 Interior and boundary coarsening.
2.4 Vertex-removing contractions.
2.5 Quality measure components.
2.6 Thin, wedge, sliver, and flat elements.
2.7 Coarsened aneurysm mesh.
2.8 Example of singular refinement.
2.9 Outer and inner traversals.
2.10 Various depth breadth-first searches.
2.11 Non-nested meshes.
2.12 Generated hierarchy of the Pacman mesh.
2.13 Generated hierarchy of the Fichera mesh.
2.14 Anisotropic mesh sequence.
2.15 Solution and mesh hierarchy for the viscous sinker problem.

3.1 Optimal exponential meshes.
3.2 The prism element.
3.3 Simple radial mesh.
3.4 2D example sparse grids.
3.5 A hierarchical basis over the interval.
3.6 Dubiner-type quadrature on the triangle.
3.7 Grundmann-Möller-type quadrature on the triangle.
3.8 Uniform lattice type quadrature on the triangle.
3.9 The recovered spectrum of the hydrogen atom.

4.1 The hard wall test problem.
4.2 The ion channel test problem.
4.3 Schematic of the GPU's processor and memory layout.
4.4 Diagram of the on-GPU part of the algorithm.
4.5 CPU and GPU performance for calculating ρ_ref.

5.1 A domain with water, protein, and ions.
5.2 Charge distributions.
5.3 Frequency response showing polar and non-polar regimes.
5.4 Realspace interpretation with correlation distance λ.
5.5 An overview of the tools used.
5.6 View of the entire meshed atom domain.
5.7 Zoomed center of an atomic mesh showing center and grading.

B.1 Exponential, geometric, and singular grids.

C.1 Simple meshes of benzene and trypsin.

ACKNOWLEDGEMENTS

I would like to thank my advisors and mentors for their advice, guidance, and support throughout my graduate studies. My advisor, Ridgway Scott, gave me both enough guidance to make interesting contributions and enough freedom to make interesting discoveries during the course of my graduate studies. I'm also very grateful to my co-advisor Todd Dupont for his always insightful advice and support. I'd also like to thank my mentor, Matthew Knepley, for providing at every opportunity interesting problems, insight, and a great introduction to the world of scientific computing. Thanks go to Dexuan Xie at the University of Wisconsin at Milwaukee, who has been an insightful and enthusiastic collaborator and contributed to the work in Chapter 5. I'd also like to thank my former advisor through my first year, Rob Kirby, for starting me off on the right track in my studies. I was very fortunate to be able to travel and collaborate a great deal during the course of my graduate studies. I have a number of individual groups and institutions to thank for this. In particular, I would like to thank Ola Skavhaug and Anders Logg at Simula Research Laboratory in Oslo, Norway and Johan Hoffmann and Johan Jansson at the Royal Institute of Technology in Stockholm, Sweden for allowing me to work on development and application of the FEniCS tools over the summers of 2008 and 2009. I also extend my gratitude to the people at the Institute for Mathematics and its Applications for the welcoming and engaging academic atmosphere and support during my year-long visit. I would also like to thank the staff, professors, and students in the University of Chicago Computer Science department for providing and being part of a great environment in which to study and work. The discussions at the various department

social gatherings allowed me to expand my horizons to various parts of computer science that I would otherwise have been blind to. My parents and brother have always shown interest in and support for my studies, and my father was a particular help in providing advice on academic matters. My lovely girlfriend, Kate Franklin, has stood by me through these last few years of my graduate studies, and has always been supportive. I also acknowledge partial support from NSF grant DMS-0920960.

CHAPTER 1

ROADMAP

In the following thesis, I focus on the development of efficient numerical methods and their application to the simulation of biochemical systems. This effort is composed of four projects: unstructured multigrid methods, optimal meshes for exponentially decaying functions, classical density functional theory, and finite element methods for nonlocal electrostatics.

1.1 Geometric Unstructured Multigrid

I began investigating geometric multigrid methods as a Givens Associate in the Mathematics and Computer Science division at Argonne National Laboratory under the direction of Matthew Knepley. Efficiently creating a series of quality-controlled coarse meshes for use with fast multilevel methods presents many algorithmic challenges. I focused on computational geometry methods [102] and algorithms for efficient quality-controlled node-nested coarsening of highly graded initial meshes. The final implementation made use of FEniCS extended to use the multigrid framework from PETSc. I have shown that the resulting preconditioners behave surprisingly well on problems that are often challenging for multilevel methods. We were able to precondition highly graded problems with corner singularities [35]. An initial foray into coarsening of highly anisotropic meshes with jump coefficients, both in terms of preexisting theoretical results [157] and practical concerns, also appears to be promising. This work is contained in Chapter 2.

1.2 Exponential Meshes

Meshes refined for the solution of problems that contain geometric singularities are a well-known case for a priori refinement in general dimension [12]. However, we are concerned with the optimal representation of a different type of function: one that has exponential decay away from some small set of centers. This work was done in conjunction with my advisor, L. Ridgway Scott. We have shown that the optimal grading for functions of this sort follows a simple nonlinear ordinary differential equation that results in a mesh with nearly regular grading at the centers that rapidly falls off outside of a certain region. This opens up the possibility of representing such functions in arbitrary dimensions quite efficiently on semi-structured finite element meshes. I developed a finite element framework for discretizing and solving the eigenproblem arising from the high-dimensional Schrödinger equation on meshes based upon this grading in general dimensions. We were able to evaluate this method in the context of very small quantum systems. This work is contained in Chapter 3.

1.3 Classical Density Functional Theory

Classical density functional theory is a general modeling technique for fluids consisting of several interacting components, such as a solvent and ions. This effort was in collaboration with Matthew Knepley, Dmitry Karpeev, and Dirk Gillespie in the Department of Molecular Biophysics and Physiology at Rush Medical Center in Chicago on using classical density functional theory (DFT) to model the selectivity properties of biological ion channels [88]. The resulting software is built on top of PETSc. My contribution to this effort was the efficient computation of a particular term representing a local equilibrium ion concentration. This term requires an integral operator to be applied pointwise by a convolution kernel, requiring, in the

naïve case, an inverse Fourier transform of the whole domain per gridpoint. This was dominating the simulation time. I tackled this bottleneck using two very different tactics. At first, it seemed that it would be sufficient to accelerate the kernel application using a GPU-based parallel method. I created a CUDA implementation of the kernel application and created a model of its performance for the sake of tuning. However, as the required problem size grew, it became necessary to approximate the reference density rather than compute it outright. I was able to restructure the integral kernel and derive an error-controlled fast approximation technique with logarithmic complexity in the range of an easily bounded physical parameter of the model. This has allowed us to begin studying and evaluating reduced models of ion-channel-like pores. This work is contained in Chapter 4.

1.4 Nonlocal Bioelectrostatics

The most common model of implicit solvent biomolecular systems has a stark jump from the dielectric constant in protein, which is small, to the dielectric constant in bulk water, which is large. This model has been shown to neglect alignment correlation between neighboring water molecules and structuring of water by charges in the protein. Nonlocal implicit solvent models that account for this correlation allow for physical properties of solvated molecules, like the solvation free energy, to be efficiently and accurately computed. However, the nonlocal term turns the dielectric constant in water into an integral operator applied to the electric field [75, 126]. Luckily, this integral operator is the Green's function of a simple PDE [149], allowing for treatment of the nonlocal model with mixed finite difference or finite element methods. I have been working with Dexuan Xie in the Department of Mathematical Sciences at the University of Wisconsin at Milwaukee on finite element formulations for nonlocal electrostatics. I have developed software based upon FEniCS in order

to quickly test the models. My efforts have included improving the formulation of the mixed equations, devising simple unstructured meshing techniques for small molecules, and building test problems based upon experimental or numerical tests of the method. This work is contained in Chapter 5.

1.5 Disciplines

Numerical methods and their applications require the combination of insights from several fields. The study of computer science, namely in terms of efficiency and expressiveness of computational models and languages, must be combined with the study of computational science, in terms of discretizations of mathematical models, error analysis, and the understanding of areas of application and their mathematical underpinnings. These themes will be present throughout the thesis.

1.5.1 Computer Science

Computer science generally provides important insight to numerous other disciplines with respect to the computational and algorithmic efficiency of methods for the construction, the accurate discretization, and the efficient solution, parallelization, and automation of a problem in scientific computing. This requires not only knowledge of the mathematics of the computation, but also of how the more "computer science" areas of mathematics relate to the solution of continuum problems. In particular, studies such as those in this thesis are enabled by the existence of well-tested libraries informed by software engineering principles and guidance. Libraries like PETSc [108] and FEniCS [56] allow for quick use by practitioners in particular problem areas. Domain specific languages and code generation using them are also a long-standing development in scientific computing and come in many forms, from libraries that

implicitly impose a programming style on their users, to languages or language extensions made explicitly for the completion of some task, such as the discretization of a partial differential equation as done by the code generation tools in FEniCS.

1.5.2 Computational and Applied Mathematics

Computational mathematics lives at the intersection of applied mathematics, computer science, and science. Computational mathematics, along with the related fields of numerical analysis and scientific computing, strives to build schemes for solving mathematical or physical problems of interest that are efficient, robust, and numerically sound, and works to improve the efficiency, robustness, and accuracy of the schemes used for the solution of numerical problems. For instance, the multigrid chapter of this work looks at discretizations of the domain (referred to from here on as meshes) that efficiently and accurately capture some singularity in the solution of a partial differential equation. These meshes make it difficult to use efficient solvers, namely geometric multilevel methods, on the particular problems of interest. Therefore, we strive to develop methods that may robustly bridge the gap between an optimal multilevel solver and an optimal unstructured mesh for a given finite element problem. We also explore efficient discretizations with regard to general dimensional meshes for problems where the solution necessarily decays exponentially. These problems require grading that is neither regular nor singularly refined. The analysis of these grids and their extension to higher dimensions is discussed. We will show how these may be applied to a particular set of problems related to quantum mechanics as an efficient way of computing quantum-chemical energies. We look at the problems of classical density functional theory, where accurate models for the interaction of electrostatics and chemistry require the calculation of quantities for which the straightforward algorithms require a huge amount of work even

for small problems. We propose two solutions to this: an accelerated solution using the latest hardware available in desktop computers, and a fast approximate solution based on an approximation-theoretic analysis of a parameter-driven approximation to the operator used in the problem. Finally, we look at the problem of feasible computation of an interesting model of electrostatics in implicit solvation calculations at the protein scale. This model allows for much more accurate simulation of atoms or molecules immersed in water. However, a cursory approach to this problem consists of both sparse and dense numerical computation. We derive a finite element approximation for this problem that is purely differential and sparse and which solves many previous problems with the discretization of the model. We show that this discretization matches with other techniques for discretizing the model, and discuss future possibilities.

1.5.3 Finite Element Methods

The finite element method (FEM) [30] is widely used for the solution of partial differential equations (PDEs). Standard FEM utilizes polynomial basis functions over subdivisions of a domain, referred to as cells. Those basis functions are used both as the approximation space for the solution and as the space of test functions for which the weak form of the equation is satisfied. This enables a wide range of linear and nonlinear equations to be discretized and solved using the finite element method. FEM techniques are generally applicable in a wide number of problem domains. This thesis has two application domains which use finite element formulations directly. The solution of simple electronic structure problems using finite elements is explored in Chapter 3. A finite element formulation of nonlocal electrostatics for implicit solvation is derived in Chapter 5. Fast solution techniques for the finite element method in difficult application cases are discussed in Chapter 2.
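As a concrete illustration of this workflow (not drawn from the thesis itself), the sketch below assembles and solves a Poisson problem with the legacy FEniCS/DOLFIN Python interface used throughout this work; the mesh resolution, source term, and polynomial degree are arbitrary choices for the example.

```python
# Hedged sketch: a standard Poisson example in the legacy FEniCS (DOLFIN) Python API.
# The mesh resolution, source term, and polynomial degree are illustrative choices only.
from dolfin import (UnitSquareMesh, FunctionSpace, TrialFunction, TestFunction,
                    DirichletBC, Constant, Function, dot, grad, dx, solve)

mesh = UnitSquareMesh(32, 32)              # subdivision of the domain into cells
V = FunctionSpace(mesh, "Lagrange", 1)     # piecewise-linear polynomial basis

u = TrialFunction(V)                       # approximation space
v = TestFunction(V)                        # test space for the weak form
f = Constant(1.0)                          # example right-hand side

a = dot(grad(u), grad(v)) * dx             # bilinear form, analogous to (D phi_i, phi_j)
L = f * v * dx                             # load functional, analogous to (f, phi_j)
bc = DirichletBC(V, Constant(0.0), "on_boundary")

u_h = Function(V)
solve(a == L, u_h, bc)                     # assembles the linear system and solves it
```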

CHAPTER 2

MULTILEVEL METHODS

The development and implementation of techniques for geometric unstructured multigrid involves the incorporation of interesting insight from a number of fields, including computational geometry, fast solvers, and adaptive finite element methods. The goals for this project were to expand the applicability of both theory and methods for multigrid methods into problems where this was difficult before. The use of software designed for the automatic and general discretization of partial differential equations was a key factor in this effort. The initial implementation was done using the Sieve library [86] and had initial success with a prototypical version of the work [34]. However, using the multigrid library for general problems was still difficult. Attempts to reconcile the use of flexible mesh libraries with the use of generalized and flexible finite element computation were initially stymied by programming difficulties, so the project was rebooted to use the FEniCS tools. This provided a very flexible geometry and mesh interface as well as very flexible tools for the discretization and solution of finite element equations. We explore further possibilities for the marriage of fast solvers and generalized finite element computation in the future work section. This chapter will describe the theoretical motivations, algorithmic and implementation challenges, and results of this effort. Multigrid methods and their use with unstructured finite element problems will be introduced. Then, an introduction to the necessary steps taken to make adaptive finite element methods work with multigrid will be given. Then, the details of the algorithm and implementation for guaranteed-quality construction of a series of coarse grids for use with multigrid

methods will be discussed. Finally, the experimental evidence that this was indeed accomplished will be presented and discussed.

2.1 Multigrid

Multigrid methods allow for the rapid iterative solution of numerical discretizations of a wide range of partial differential equations. A number of detailed introductions [32,72,143] to the techniques of multigrid methods describe the basic and advanced theory, implementation, and practice. Therefore, we will only briefly describe the construction of multigrid methods here. Given an operator A that arises from the discretization of a linear system, the goal is then to solve, for x,

Ax = b   (2.1)

given some vector b and matrix A. In the context of this work, b and A come from some finite element discretization, so for some set of compactly supported basis functions {φ_i}, we construct

A_ij = (Dφ_i, φ_j)   (2.2)

and

b_j = (f, φ_j)   (2.3)

for some differential operator D and function f. One rudimentary way of solving equations such as these is by use of a simple iterative method known as a smoother. We define a smoother S and a solution procedure consisting of, at iteration i with approximate solution x^i, determining the next approximate solution x^{i+1} by

x^{i+1} = S_b x^i   (2.4)

such that in the limit of N → ∞

x^N = A^{-1} b.   (2.5)

The theory of iterative methods is well developed and elaborated upon in several fine sources [66, 118]. This includes classical methods such as Jacobi, SOR, and Richardson iterations, as well as the observed behavior of relatively more modern methods such as incomplete factorizations and Krylov methods. These methods tend to be able to resolve the high-energy components of the error A^{-1}b − x^i very quickly, but leave the "smooth" parts of the error intact significantly longer. This makes classical smoothers very simple to implement, but slow methods for the solution of linear systems. Multigrid turns this problem on its head, using these "slow" solution technologies in a clever way to solve the equations rapidly. The major insight of multigrid methods is to resolve this smooth error on a discretization where it is rough, allowing for much more rapid convergence of the method. Therefore, the most basic multigrid iteration, the V-cycle, may be defined in three phases:

• pre-smoothing

• coarse grid correction

• post-smoothing

In this scheme, the pre- and post-smoothings are some fixed number of applications of the smoothing procedure. The coarse grid correction involves projecting the residual to a coarser subspace, smoothing on that subspace, and projecting the correction back to the finer subspace. In the case of the V-cycle, the coarse grid

correction phase consists of a round of pre-smoothing, recursive coarse grid correction, and post-smoothing on the coarser subspace. This recursive procedure may be extended to any number of levels, and the process is repeated until convergence. There may be more complex ways of scheduling these components, such as the W-cycle, which includes two coarse-grid corrections between the pre- and post-smooths. There is also the full multigrid method, which adds a fine level at the end of each cycle, building a good initial guess to the fine problem progressively by using the coarse problems. These cycles are shown in Figure 2.1.


Figure 2.1: Various multigrid cycle types shown in scheduling order. Dashed lines correspond to grid levels. Presmooths in blue, postsmooths in red. Projection of the residual shown as down arrows, projection of the correction shown with up arrows. V-cycle on the left, W-cycle in the middle, Full V-cycle on the right.
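The following is a minimal NumPy sketch of the V-cycle just described, with a damped Jacobi smoother standing in for the smoothing procedure; the operator hierarchy A and interpolation operators P are assumed to be given, and nothing here reflects the particular coarsening strategy developed later in this chapter.

```python
# Hedged sketch of a V-cycle with a damped-Jacobi smoother, using dense NumPy operators.
# A[l] is the operator on level l, P[l] interpolates from level l-1 to level l;
# level 0 is the coarsest grid. All inputs are assumptions of this sketch.
import numpy as np

def jacobi(A, x, b, sweeps=2, omega=2.0 / 3.0):
    """A few damped Jacobi sweeps: cheap, but mainly damps the high-frequency error."""
    D = A.diagonal()
    for _ in range(sweeps):
        x = x + omega * (b - A @ x) / D
    return x

def v_cycle(level, x, b, A, P):
    """One V-cycle: pre-smoothing, coarse grid correction, post-smoothing."""
    if level == 0:
        return np.linalg.solve(A[0], b)      # direct solve on the coarsest level
    x = jacobi(A[level], x, b)               # pre-smoothing
    r = b - A[level] @ x                     # residual
    rc = P[level].T @ r                      # restrict the residual to the coarser space
    ec = v_cycle(level - 1, np.zeros_like(rc), rc, A, P)
    x = x + P[level] @ ec                    # prolong the correction and apply it
    return jacobi(A[level], x, b)            # post-smoothing
```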

In the case of elliptic partial differential equations, "textbook" multigrid efficiency is O(n) for n unknowns in the system. This is independent of the increase in the number of DoFs in the system, and the associated increase in the condition number, due to refinement. In this thesis we focus on geometric multigrid methods. Geometric multigrid methods construct the coarse space through geometric means. With respect to the finite element method, this implies the construction of a series of meshes covering the domain at various resolutions. The alternative is algebraic multigrid (AMG) [135]. This family of procedures builds the coarse set of unknowns not by constructing a coarse space based upon a coarse mesh M^c, but based completely upon inspection of and heuristics from the operator A. The end goal of this procedure is to construct coarse operators that assume little or nothing about the source of the matrix. However, the theoretical justification for such procedures typically assumes fairly idealized situations compared to the existing theory for geometric multigrid methods.

2.2 FEM & Multilevel Methods

AMG is particularly appealing to the standard computational science practitioner in any number of fairly typical areas of application. Huge gains in performance on a wide array of problems are effectively achieved by simply turning on AMG in a number of popular linear algebra packages. Unstructured FEM problems with complex domains or complex formulations may be treated easily with algebraic multigrid methods in ways that are very difficult using provably optimal geometric methods. These include the case where there has been extreme refinement or coarsening on a complex shaped domain. Being able to handle this in a robust way with geometric methods is a major goal of this effort. However, there is also a wide class of problems on which AMG either fails entirely or requires expert tuning to work. The very "black box" and matrix-oriented nature of AMG algorithms also limits their applicability to only fully-assembled, linear finite element equations and methods of solution. Therefore, we focus on optimal geometric multigrid methods for problems with complex domains. The eventual goal of the work is to develop strategies for geometric unstructured multigrid methods that work equally well or better than algebraic methods on problems that are hard for geometric multigrid, but that are also usable in the cases where algebraic multigrid methods cannot be used.

2.2.1 Efficiency and High Performance Computing

The efficiency of numerical methods for partial differential equations is a significant issue at large computing scales. Being able to solve PDEs amenable to multilevel methods is often an integral part of large-scale computation. This is fairly easy to integrate when the problem is well structured, but in the case of multilevel methods on large-scale unstructured grids, more complex techniques, such as domain decomposition methods [132], must be marshalled in order to parallelize the problem. However, even these methods have some serial inefficiencies at the local scale, so the notion of a fast solver for the local problems still makes a lot of sense. There is a fairly vast literature on large-scale parallel multigrid methods. These methods are often simple and structured. However, the use of both algebraic and geometric multigrid methods with finite element problems in the high-performance computing context is a constant problem being studied. The point at which the computation breaks even might be high, given the cost of additional assembly of the coarse space. Therefore, precomputation of a series of coarse, provably optimal spaces for this type of problem reaps massive benefits. We intend to be informed by the domain decomposition literature as well in how to construct efficient multilevel methods. In the future, this work could potentially be taken to the highly parallel case either through direct parallelization or through its use for the creation of a coarse space within domain decomposition.

2.2.2 Feasibility and Low Performance Computing

Another major win is for intermediate-size systems, where a direct inversion of the operator might take prohibitively long or require prohibitive amounts of memory on a laptop, and for which the condition number is too large to use a standard iterative solver. For someone trying to run a small convergence study for a simple system, this is both the most useful and most flummoxing regime. By intermediate-size systems we mean

problems in the hundreds of thousands of degrees of freedom range, which are the typical practitioner problem sizes for those trying to solve finite element systems who do not own a cluster or have access to a supercomputer. These problems are quite important. A few hundred thousand unknowns may correspond to an entirely reasonable discretization of even a fairly complex system. While many problems of this size may not even be feasible without some fast solver technology, we are also interested in what the break-even point for fast solvers might be. While the preprocessing steps required for the use of multilevel methods, even very simple ones, may dwarf the actual computation time for a number of problems, they may often make up their ground when the problem size becomes nontrivial. The break-even point for nonlinear problems of this size is going to be much lower, and the break-even point for time-dependent problems of this size lower still. These sorts of problems motivate our study into how we may flexibly construct such solvers in an efficient way. The break-even point for a single solve may be too high to consider these methods, especially when algebraic techniques may be used. However, in many cases, quick solutions are not applicable, and problem feasibility hinges on the availability of a fast solver for the problem. The feasibility of small-scale computational projects is also greatly improved by software that may automatically allow for various sorts of exotic discretizations to be used. These include higher-order discretizations, which make the condition number of the resulting system potentially much worse, but may also actually have significantly better performance when used in conjunction with finite element methods [91].

2.2.3 Coarsening and Multigrid Components

The components of the unstructured multigrid method whose construction we will describe are:

1. Coarse Vertex Selection

13 2. Remeshing

3. Interpolation

4. Linear Algebra

Each of these components may be treated reasonably independently. However, there are some key interplays. For optimal multigrid convergence, we require quality guarantees on the meshes; the same guarantees are needed for the interpolation operators to be constructed efficiently. The coarse vertex selection and meshing options are all chosen such that this is possible.

2.3 Vertex Selection

For node-nested coarsening of an unstructured mesh, we separate the process of mesh coarsening into two stages: coarse vertex selection and remeshing. The problem of coarse vertex selection on a mesh M involves creating some subset V^c of the vertices V of M that is, by some measure, coarse. Several techniques have been described for coarse vertex selection on unstructured simplicial meshes. The most widely used of these methods are Maximum Independent Set (MIS) algorithms [1,39,71]. These choose a coarse subset of the vertices of the mesh such that no two vertices in the subset are connected by a mesh edge. The resulting set of vertices is then remeshed.

Given some mesh M, we define the graph G_M = (V, E), which consists of the edges E and vertices V of M. The MIS algorithms may then be described as finding some set of coarse vertices V_c^{MIS} such that

(v_a, v_b) ∉ E for all pairs of vertices (v_a, v_b) ∈ V_c^{MIS}.   (2.6)

This may be implemented by a simple greedy algorithm where at each stage a vertex is selected for inclusion in V_c^{MIS} and its neighbors are removed from potential

inclusion. It should also be noted that MIS type methods are used as the coarse-space selection procedure in classical AMG methods, where the replacement for G as stated above, G_A, is induced by the nonzero entries of the stiffness matrix A [135]. There are a couple of issues with this method. The first is that there is no way to determine the size of the mesh resulting from coarsening. This is quite important if we want to be able to use the strategy for generalized computation, as higher-order finite element discretizations require much more aggressive coarsening in order to have optimal multigrid characteristics. The other is that there are no real guarantees on the spatial distribution of vertices in the resulting point set. It has been shown that MIS methods may, for very simple example meshes, degrade the mesh quality quite rapidly [102]. This may be shown in the case where one has a grid, and by chance one picks every other vertex in one direction and every third in the other direction. The aspect ratio will double at each coarsening step. Other methods for choosing the coarse vertex selection have been proposed to mitigate this shortcoming, often based upon some notion of local increase in length scale. For instance, the method in [112] tries to pick some set of vertices such that the average edge length throughout the mesh is increased by a given factor. Other methods of interest include coarsening methods meant to semi-coarsen a mesh with anisotropic refinement, in which the eigenvectors and values of some local distance metric are used to determine what direction to coarsen in [101]. There have been developments of methods much better suited to generalized geometric multigrid methods. These methods tend to be built around some notions of what multigrid methods require for optimal convergence in the unstructured case. We restate the requirements of multigrid methods here in order to frame the issue.
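A minimal sketch of the greedy MIS selection described above follows; the adjacency structure and visiting order are illustrative assumptions, since the basic method makes no guarantee about either.

```python
# Hedged sketch of greedy maximal-independent-set selection on a vertex graph.
# `adjacency` maps each vertex to the set of its neighbors (mesh edges, or the
# nonzero pattern of A in the AMG setting); the visiting order is arbitrary here.
def greedy_mis(adjacency):
    state = {}                                  # vertex -> "included" / "excluded"
    for v in adjacency:
        if v in state:
            continue                            # already excluded by a selected neighbor
        state[v] = "included"
        for n in adjacency[v]:
            state.setdefault(n, "excluded")     # neighbors may not also be selected
    return {v for v, s in state.items() if s == "included"}

# Example: a chain 0-1-2-3-4 yields an independent set such as {0, 2, 4}.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(greedy_mis(chain))
```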

2.4 Multigrid Mesh Conditions

There is some justification for extending the classical multigrid convergence results to the case of non-nested meshes [23, 24, 98]. Results of optimal multigrid convergence have been extended to non-quasi-uniform [156] and degenerate meshes [157] in two dimensions. Three dimensional non-nested multigrid has been proven to work in the quasi-uniform case [125]. We will use the mesh quality conditions required for the non-quasi-uniform case [156]. While the example mesh hierarchies used in this paper might be quasi-uniform when considered alone, we may not assume quasi-uniformity independent of the size of the finest mesh in the asymptotic limit of refinement. It should be noted that the non-quasi-uniform theory does not exist for three dimensions, but the quasi-uniform theory [125] serves as a guide to how the method should perform.

For any mesh M and any cell τ in M, let r_in(τ) be τ's incircle radius and h(τ) its longest edge, and define the aspect ratio as

AR(τ) = h(τ) / (2 r_in(τ)).   (2.7)
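As a small illustration, the aspect ratio (2.7) of a single triangle can be computed as follows, using the identity that the incircle radius equals the area divided by the semi-perimeter; the vertex coordinates in the example are arbitrary.

```python
# Hedged sketch: aspect ratio (2.7) of one triangle, with the incircle radius computed
# as area / semi-perimeter. The example coordinates are illustrative only.
import math

def triangle_aspect_ratio(p0, p1, p2):
    edges = [math.dist(p0, p1), math.dist(p1, p2), math.dist(p2, p0)]
    h = max(edges)                                    # longest edge h(tau)
    s = 0.5 * sum(edges)                              # semi-perimeter
    area = 0.5 * abs((p1[0] - p0[0]) * (p2[1] - p0[1])
                     - (p2[0] - p0[0]) * (p1[1] - p0[1]))
    r_in = area / s                                   # incircle radius r_in(tau)
    return h / (2.0 * r_in)

print(triangle_aspect_ratio((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)))  # ~2.41 for this triangle
```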

The quality of approximation of the finite element function space [129] is dependent on the mesh quality, which can be correlated with the aspect ratio. The quality of the initial solution is therefore dependent on this, and we may assume that the competent practitioner giving us an initial mesh will have taken this into consideration. In addition, and of more concern to the numerical methods involved, the matrix condition number is also quite dependent on the mesh quality, which is well-known classically [7]. Therefore, degradation of the mesh quality on the coarser levels will spell doom for the convergence properties of multigrid. For some constant C_AR, we must require that the aspect ratio of any given cell in {M^0, ..., M^n} satisfy

AR(τ) ≤ C_AR.   (2.8)

The other half of the criteria are the local comparability conditions. If we have two meshes, M^k and M^{k−1}, we define, for some cell τ in M^k,

S_τ^k = {T : T ∈ M^{k−1}, T ∩ τ ≠ ∅},   (2.9)

which defines the set of cells in a mesh M^{k−1} overlapping a single cell τ in the next coarser mesh M^k. We can state the local comparability conditions as

sup_{τ ∈ M^k} |S_τ^k| ≤ C_o for some C_o ≥ 0,   (2.10a)

sup_{τ ∈ M^k} sup_{T ∈ S_τ^k} (h_τ / h_T) ≤ C_l for some C_l ≥ 1.   (2.10b)

(2.10a) implies that each cell intersects a bounded number of cells in the next finer mesh, and (2.10b) that the length scale differences of overlapping cells are bounded. We will also state here, for the sake of completeness, the assumption from the standard proofs of multigrid convergence necessary for the algorithmic efficiency of the method. Define

|M| = # of cells in mesh M

Then, for some C_m ≥ 2,

|M^k| > C_m |M^{k+1}|.   (2.11)

The implication is that at each coarser mesh there must be a sufficient decrease in the number of cells. A geometric decrease in the number of cells over the course of coarsening is necessary in order to have an O(n) method as each V-cycle must be O(n).
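A hedged sketch of checking the two conditions that are cheapest to verify for a given hierarchy, the aspect ratio bound (2.8) and the geometric decrease in cell count (2.11), is shown below; the per-cell aspect ratios and the thresholds are assumed inputs, and the overlap conditions (2.10) are omitted since they require geometric intersection queries.

```python
# Hedged sketch: verify the aspect ratio bound (2.8) and the geometric decrease in
# cell count (2.11) for a hierarchy ordered from finest to coarsest. The per-cell
# aspect ratios are assumed precomputed and the constants are illustrative.
def check_hierarchy(aspect_ratios_per_level, C_AR=8.0, C_m=2.0):
    ok_ar = all(max(ars) <= C_AR for ars in aspect_ratios_per_level)
    sizes = [len(ars) for ars in aspect_ratios_per_level]
    ok_decrease = all(sizes[k] > C_m * sizes[k + 1] for k in range(len(sizes) - 1))
    return ok_ar, ok_decrease

# Example with three synthetic levels of 1000, 240, and 60 cells.
levels = [[1.8] * 1000, [2.3] * 240, [2.9] * 60]
print(check_hierarchy(levels))   # (True, True)
```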

2.4.1 Function-Based Coarsening

The coarse vertex selection method we will focus on for this was developed by Miller et al. [102] and is referred to as function-based coarsening. The method begins by defining some spacing function Sp(v) over the vertices v ∈ V of M. Sp(v) is defined as some local length scale consisting of some set of spheres at the vertices that do not overlap. The natural choice for Sp(v) is the nearest neighbor (NN(v)) distance [102]. We choose to approximate this by the nearest mesh neighbor distance, the shortest adjacent edge. This makes more sense given the conditioning and approximation issues related to finite element computation, as for vertices with adjacent cells with large angles the nearest neighbor and the nearest mesh neighbor may exist at totally different length scales from one another. The length scale captured in an approximation sense will always be the nearest mesh neighbor, so we shall use that, as it is the increase in that which is required by geometric multigrid methods. This is the first of many modifications we will make to the method in order to make it more amenable to finite element computation. β is the multiplicative coarsening parameter. β can be considered the minimum factor by which Sp(v) is increased at some v shared between the fine and coarse mesh during the coarsening process. Here we choose β > β_0, where β_0 = √2 in 2D and β_0 = √3 in 3D, by the simple fact that β = β_0 + ε for small ε would reproduce the repeated structured coarsening of an isotropic, structured, shape-regular mesh where the length scale is increased by two in every direction. This may be visualized by considering the length of the diagonal across a square and a cube in 2D and 3D respectively. β may also be tuned, changing the size of the set of resulting coarse vertices, in a problem-specific fashion to account for mesh and function-space properties, such as polynomial order. The size of the mesh may be carefully controlled to satisfy the needs of the multigrid method based upon this tuning. We say that a coarse subset of the vertices of M, V_c^{FBC}, satisfies the spacing condition if

β(Sp(v_i) + Sp(v_j)) < dist(v_i, v_j) for all pairs (v_i, v_j) ∈ V_c^{FBC}.   (2.12)

After determining V_c^{FBC}, M^c may be created by some remeshing process. A hierarchy of meshes may then be created by reapplying the algorithm to each newly-minted coarse mesh with constant β and some recalculated Sp in turn to create a yet coarser mesh. This may be done until the mesh fits some minimum size requirement, or until the creation of a desired number of meshes for the hierarchy. More options and detail about this step may be found in [137]. The original authors [102] had fast numerical methods in mind when proving quality bounds for function-based coarsening, and a number of the properties required of the meshes are spelled out in the original work. For one, the maximum aspect ratio of the resulting mesh hierarchy may be bounded by some constant, satisfying (2.8).
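Before stating the remaining bounds, the sketch below illustrates, under simplifying assumptions, the spacing quantities just introduced: Sp(v) approximated by the shortest adjacent mesh edge, as chosen above, and a direct pairwise check of the spacing condition (2.12) for a candidate coarse vertex set.

```python
# Hedged sketch of the spacing quantities used by function-based coarsening.
# Sp(v) is approximated by the shortest adjacent mesh edge, and condition (2.12)
# is then checked pairwise for a candidate coarse set. Inputs are assumptions.
import math

def spacing(vertices, edges):
    """Sp(v): length of the shortest mesh edge adjacent to each vertex."""
    sp = {v: math.inf for v in vertices}
    for a, b in edges:
        d = math.dist(vertices[a], vertices[b])
        sp[a] = min(sp[a], d)
        sp[b] = min(sp[b], d)
    return sp

def satisfies_spacing(coarse, vertices, sp, beta):
    """Pairwise check of (2.12): beta * (Sp(vi) + Sp(vj)) < dist(vi, vj)."""
    coarse = list(coarse)
    for i, vi in enumerate(coarse):
        for vj in coarse[i + 1:]:
            if beta * (sp[vi] + sp[vj]) >= math.dist(vertices[vi], vertices[vj]):
                return False
    return True

# Tiny illustration on the four corners of a unit square joined by its four sides.
verts = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
sp = spacing(verts, edges)
print(satisfies_spacing({0, 2}, verts, sp, beta=0.6))   # True for this toy example
```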

We also have that, with h_i(x) being the length-scale of the cell overlapping the point x, for all x ∈ Ω:

(1/I) h_{i+1}(x) ≤ h_i(x) ≤ I h_{i+1}(x) for some I > 0.   (2.13)

Note that this condition implies the second of the local comparability conditions (2.10b). By combining the length scale bound with the aspect ratio bound, one may infer that only a certain number of fine cells may be in the neighborhood of a coarse cell, bounding the number of overlapping cells (2.10a). Finally, for the coarsest mesh

M^n and some small constant b,

|M^n| ≤ b.   (2.14)

This implies that the coarsening procedure will be able to create a constant-sized coarse mesh. Because of these conditions and their parallels with the conditions on the multigrid hierarchy, this method for coarsening is particularly appealing. Parts of the function-based coarsening idea have been incorporated into other methods for mesh coarsening by Ollivier-Gooch [105]. Some similarities between the method we propose here and this previous work are that both use traversal of the mesh and local remeshing in order to coarsen meshes that exhibit complex features. The function-based coarsening method constructs the conflict graph

G_C = (V, E_C),   (2.15)

where

E_C = {(v_i, v_j) : β(Sp(v_i) + Sp(v_j)) > dist(v_i, v_j)}.   (2.16)

This graph is then coarsened with an MIS approach as shown above.
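For illustration only, the conflict graph of (2.15)-(2.16) can be built by the brute-force pairwise loop sketched below, which makes the quadratic worst case discussed in the next paragraph explicit; the vertex coordinates and spacing values are assumed inputs.

```python
# Hedged sketch: conflict graph G_C of (2.15)-(2.16) by an all-pairs loop, which
# exposes the O(|V|^2) worst case noted below. Inputs are assumptions of the sketch.
import math
from itertools import combinations

def conflict_graph(vertices, sp, beta):
    """Edges of G_C: pairs whose beta-scaled spacing spheres would overlap."""
    conflict_edges = set()
    for vi, vj in combinations(vertices, 2):
        if beta * (sp[vi] + sp[vj]) > math.dist(vertices[vi], vertices[vj]):
            conflict_edges.add((vi, vj))
    return set(vertices), conflict_edges

# Tiny example: three collinear points; with beta = 1.5 every pair conflicts.
verts = {0: (0.0, 0.0), 1: (0.4, 0.0), 2: (2.0, 0.0)}
sp = {0: 0.4, 1: 0.4, 2: 1.6}
print(conflict_graph(verts, sp, beta=1.5))
```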

Note that in the limit of extreme grading or large β, the graph G_C can grow to be of size O(|V|^2). The computation could be sped up by building the graph only as needed. Our method avoids this by localizing and simplifying the notion of the spacing function and the choice of vertex comparisons. We modify function-based coarsening in a way that reliably guarantees optimal complexity regardless of mesh non-quasi-uniformity as discussed in the adaptive finite element section, without dependence on the mesh and the parameter β.

2.4.2 Graph Coarsening Algorithm

We describe here a greedy algorithm for determining an approximation V_c^{GCA} to the set of coarse vertices V_c^{FBC} satisfying a weakened notion of the spacing condition based upon G_M. Talmor [137] proposed using shortest-weighted-path distance determined by traversing along mesh edges to accomplish this.

(c) Subalgorithm on v. (d) Subalgorithm on v1.

Figure 2.2: This shows coarsening of a patch of GM . SP is defined, multiplied by β, and the subalgorithm is invoked for two vertices, v and v1 shown in Figure 2.2(c) and Figure 2.2(d) in dark grey with vertices violating the graph spacing condition shown in light gray. shortest-weighted-path approach, we choose to use the edge connectivity to progres- sively transform GM into some final GM˜ c, which approximates the connectivity of c the coarse mesh, M . It should be noted that, beyond the initial GM formed at the start of the algorithm, GM does not necessarily satisfy the condition that its edges represent the edges of a valid simplicial mesh. We begin this process by modifying condition 2.12 to be

β(Sp(v1) + Sp(v2)) < dist(v1, v2) for all (v1, v2) ∈ E of GM . (2.17)

This restricts the spacing condition to take into account only distance between

21 vertices connected by graph edges rather than all pairs of vertices. This then involves not pairwise, but mesh dependent comparisons between the vertices by the same logic that led to us restricting Sp(v) to mesh adjacency. This is an important simplification for complex domains, where the mesh may represent complex features of the domain such as holes, surfaces, and interfaces with different local characteristics on each side that may be hard to encode when the vertices are considered as a point cloud. In our model, the edges of the mesh are seen as encoding the topological connectivity of the domain. Spacing only applies to topologically connected areas of the mesh as encoded in GM . It is known that the edge-path based distance de(v, w) for vertices v and w follows the inequality

|v − w| ≤ de(v, w) ≤ C|v − w| (2.18) for some constant C. In the case of our edge-based approximation, a large C would imply disagreement with the full FBC algorithm. It would also imply small angles for edges within some constant factor of one another, and therefore vanishing rin compared to the edge-lengths. Therefore, whatever we assume as a constant with regard to the aspect ratio bound (2.8) is going to control the size of C, and therefore the quality of the approximation to the full FBC algorithm. Define

F (v) = unknown, included, or excluded for all v ∈ V.

The algorithm starts with F (v) = unknown for all v ∈ V . Vertices that must be included for some reasons such as the boundary issues described in Section 2.4.3, are set to be included automatically and visited first. The remaining v ∈ V are then visited in some arbitrary order. If F (v) = excluded, the iteration skips the vertex. If F (v) = unknown, F (v) is changed to included.

22 When F (v) is set to included, a subalgorithm is invoked that, given GM and v some particular vertex v, transforms GM to some intermediate GM . This subal- gorithm corresponds to coarsening the area around v until the spacing function is satisfied for all N v (v). Each w ∈ NG (v) are tested to see if the edge (v, w) GM M violates (2.17). In the case that the condition is violated and F (w) = unknown, w is removed from GM and F (w) is changed to excluded. A removed w’s neighbors in G are added to N (v) by adding edges in G between v and all u ∈ N (w) M GM M GM if u 6∈ N (v) already. This may also be considered as an edge contraction from w GM to v.

The outcome of this subalgorithm, Gv , has that all w ∈ N v (v) such that M GM F (w) = unknown obey (2.17). There is the possibility that for some v1 ∈ N v (v), GM F (v1) = included and (2.17) is not satisfied. This may arise due to the necessary inclusion of boundary vertices due to conditions described in Section 2.4.3. After v the subalgorithm terminates, one is left with GM . The algorithm then visits some v1 v v1 (Figure 2.2(d)), which, if included, will create some GM by modification of GM . Once every vertex is visited, the whole algorithm terminates. At this point we may define

GCA Vc = {v : v ∈ V,F (v) = included}.

Despite the fact that we are considering coarsening and remeshing as a prepro- cessing step, we still have the goal that the algorithm should be O(n) with n as the number of DoFs in the discretized system. For piecewise linear finite elements, the number of DoFs in the system is exactly the number of vertices, |V |, in M. For all reasonably well-behaved meshes, we can additionally assume that |E| = O(|V |) and |M| = O(|V |). This implies that n = O(|V |). The complexity of a single invocation of the subalgorithm may be O(|V | + |E|) if some large fraction of the mesh vertices are contracted to some v in one step. However, the aggregate number of operations

23 taken to reach GM˜ c must be the order of the size of the original graph GM . This aggregate complexity bound is independent of the order in which v are visited. While here we keep the assumed order in which the vertices are visited arbitrary, we see in Section 2.4.3 that specifying some restrictions on the order may be used to preserve mesh features.

Theorem 2.4.1. Given a graph derived from a mesh GM , the graph coarsening GCA algorithm will create Vc in O(|V | + |E|) time.

Proof. The fact that the complexity of the algorithm depends on |V | at least linearly is apparent from the fact that every vertex is visited once. In order to show that the entire algorithm is O(|V | + |E|) we must establish that each edge e is visited at most a constant number of times independent of the size of the graph. As vertex v is visited and F (v) is set to included, the subalgorithm inspects each e = (v, n) for n ∈ N (v) to see if they satisfy the spacing condition with v. e GM is either deleted from GM if n and v do not satisfy the spacing condition, or left in place if the spacing condition is satisfied or if F (n) = included. Edges that are deleted are not ever considered again. Therefore, we must focus on edges that survive comparison. Suppose that an edge e = (v0, v1) is considered a second time by the subalgorithm at the visit to vertex v0. As this is the second visit to consider e, F (v1) must be included as in the first consideration of e necessarily happened during the visit to v1. As both endpoints of e have now been visited, there is no way e may be considered a third time. As each vertex is visited once and the distance across each edge in GM is considered no more than twice, the algorithm runs in O(|V | + |E|) time.

2.4.3 Mesh Boundaries

Preservation of the general shape of the domain is important to the performance of the multigrid method as the coarse problem should be an approximation of the fine 24 problem. In the worst case a coarser mesh could become topologically different from the finer mesh leading to very slow convergence or even divergence of the iterative solution. If this is to be an automatic procedure, then some basic criteria for the shapes of the sequence of meshes must be enforced. Therefore, the vertex selection and remeshing algorithms must be slightly modified in order to take into account the mesh boundaries. First, we must define the features, in 2D and 3D, which require preservation. We choose these features by use of appropriate-dimensional notions of the curvature of the boundary. Techniques like these are widely used in computational geometry and computer graphics [52,117]. In 2D, the features we choose to explicitly preserve are the corners of the mesh. In the coarsening procedure, the user may already know what particular mesh features need to be protected. If this is not given, a reasonable automated heuristic is simple to define; we consider any boundary vertex with an angle differing from a straight line more than CK to be a corner. For the sake of this π work we assume CK = 3 . In 3D, the discrete curvature at a boundary vertex is computed as the difference between 2π and the sum of the boundary facet angles tangent to that vertex. Vertices where the absolute value of the discrete curvature is greater than CK are forced to be included. High-dihedral-angle edges, corresponding to ridges on the surface of the domain, must also be preserved. We consider any boundary edge with the absolute value of the dihedral angle greater than CK to be a boundary ridge. Our approach to protecting the boundary during the coarse vertex selection pro- cedure is to separate vertex selection into two stages, one for the interior, and one for the boundary. In the interior stage, the boundary vertices are marked as included automatically, and therefore any interior vertices violating the spacing condition with respect to a boundary vertex are removed from GM (Figure 2.3(a)). All boundary vertices now respect the spacing condition with respect to all remaining interior ver- tices, making the second stage, boundary vertex selection, entirely independent of

25 (a) interior coars- (b) boundary (c) final graph ening coarsening

Figure 2.3: Interior and boundary coarsening stages. Dark gray vertices are included automatically; Light gray circles violate the spacing condition. the previous coarsening of the interior. The boundary vertex selection procedure then operates independently of this, producing the completely coarsened graph G M˜c (Figure 2.3(c)). In 3D this process is repeated once again. During the boundary coarsening procedure, vertices lying on edges that have been identified as ridges are automatically included. The ridge vertices are then coarsened in turn. Corner ver- tices are automatically included during all stages of vertex selection, as shown in Figure 2.3(b).

2.5 Remeshing

After some VC has been picked, one must create a new mesh, MC that contains only the vertices in VC and is “coarser” than the initial mesh. There are several options for this process, however, for the sake of this problem we have picked one that fits our general strategy: local mesh modification and traversal. This section describes our options for performing this operation in a way that satisfies the needs of the multilevel methods in a robust and general fashion. We choose a process relying on removal of individual vertices, discuss how the mesh quality may be enforced, and discuss issues that arise at the boundary.

26 2.5.1 Remeshing Options

One way of doing this would be using remeshing software such as Triangle [127], Tetgen [130], Netgen [121], or the like. These pieces of software may be used to create a constrained Delaunay triangulation of a domain specified by some set of vertices and a boundary. This type of input to mesh generators may be easily created from the coarsened set of vertices, as the lower-dimensional coarsening is generally easier to do than the higher dimensional coarsening, allowing the boundary to be remeshed by a local flipping or contraction scheme, and then the interior remeshed. The major disadvantage of using a mesh generator is that some set of Steiner points must often be inserted in order to make the meshing process possible [103,116]. This is a problem if we want to keep the meshes node-nested, as the Steiner points must typically be inserted without regard to the previous nodes in the mesh. This may be necessary for the algorithm to complete in 3D. Even in 2D, the notions of mesh quality built into these packages may not be sufficient for use with numerical methods. The other major disadvantage to this is that the mesh generator’s input may require some a priori knowledge of the topology of the domain. This includes any holes on the absolute interior of the domain. If these are not either identified or spec- ified, they may be meshed over, changing the topology of the domain and potentially making the resulting preconditioner less effective or not effective. Therefore, for a given mesh, it is best to be able to handle the process of remeshing without knowing the global topology of the domain. This caused problems in the initial implementation of this method for coarsen- ing, which relied on the above-mentioned Triangle and TetGen software packages to remesh. Initial efforts to locate and mark holes in the domain such that they could be pointed out to the meshing software were tedious in 2D and nearly impossible in 3D, where a “hole” in the domain could be of various overall topology rather than just an inclusion in the mesh. 27 In order to do this, processes of local remeshing were used instead. The problem is perhaps even easier than that of local remeshing, as the node-nestedness limits the problem to simply that of removing some set of the vertices.

2.5.2 Simple Vertex Removal

Simple vertex removal schemes for numerical methods have been proposed in the past. Some roughly approximate the reversal of a refinement process; others use edge contraction. Edge contraction may be used to great effect when the remeshing has to be done quickly to generate a series of coarse meshes with reasonable but not perfect quality. The scheme we propose here creates coarse meshes primarily by edge contraction. This has been studied in the multigrid context before, but is worth elaborating on in terms of how the eventual multigrid method came together. In addition, we introduce a framework for studying how the quality of the resulting meshes may be controlled. Guaranteed quality control gives us the properties we need for the eventual multigrid solution to be optimal. We define two conditions for the mesh creation scheme. First, we must pick a new configuration $T$ tetrahedralizing the space formerly occupied by the link of $v$ such that

$\max_{\tau \in \mathrm{link}(v)} Q(\tau) \le C_Q$  (2.19)

which is the quality enforcement condition, and

$\min_{T_{new}} \max_{\tau \in T_{new}} Q^*(\tau) \le C_{Q^*}$  (2.20)

which is the quality optimization condition, where the enforced quality metric $Q$ is not necessarily the same as the optimization quality metric $Q^*$. The procedure for

remeshing then becomes the process of choosing some new set of cells $T$ that fill the link of the vertex being removed such that the quality enforcement condition is satisfied and the quality optimization condition is minimized. We consider the set of potential retetrahedralizations formed by a simple process: given a vertex $v$ with link cells $\tau \in \mathrm{link}(v)$ and some link vertex $n \in N(v)$, the process forms the cells

$\{\tau_{new}\} = \{(\tau - v + n)\}$ for $\tau$ such that $n \notin \tau$  (2.21)

Figure 2.4: Contractions removing v (red). From left to right: Good aspect-ratio contraction, poor aspect-ratio contraction, and inverted contraction (invalid triangles in orange).

This set of retetrahedralizations is the set of feasible edge collapses. For an edge collapse to be feasible, it must both satisfy the quality constraint and be properly oriented in space. Orientation may be tested by checking whether the sign of the geometric Jacobian determinant of any tetrahedron changes between $\tau$ and $\tau_{new}$. This condition may be stated as:

$|J_\tau|\,|J_{\tau_{new}}| > 0$  (2.22)

for $|J_\tau|$ the determinant of $J_\tau$, the geometric Jacobian of the cell $\tau$, and $|J_{\tau_{new}}|$ the determinant of $J_{\tau_{new}}$, the geometric Jacobian of $\tau$ with $v$ replaced in-order by $n$, ensuring

that the cell facet across from $v$ remains oriented the same way. The geometric Jacobian of a cell is the linear map from the reference element. Note that enforcing this ensures that the mesh is never inverted at any stage of the algorithm.
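As a concrete illustration, the sketch below checks the orientation condition (2.22) for a candidate contraction. The coordinate array, the cell tuples, and the function names are hypothetical, not the actual coarsening implementation described in Appendix A.

import numpy as np

def jacobian_det(cell, coords):
    # Determinant of the geometric Jacobian: the linear map from the
    # reference simplex, with columns given by edge vectors from vertex 0.
    p = coords[list(cell)]
    return np.linalg.det((p[1:] - p[0]).T)

def contraction_is_oriented(link_cells, v, n, coords):
    # Candidate contraction: replace v by its link neighbor n in every link
    # cell not containing n.  The contraction is admissible only if no cell
    # flips orientation, i.e. |J_tau| * |J_tau_new| > 0 for each cell (2.22).
    for cell in link_cells:
        if n in cell:
            continue  # this cell collapses and disappears
        new_cell = tuple(n if w == v else w for w in cell)  # in-order replacement
        if jacobian_det(cell, coords) * jacobian_det(new_cell, coords) <= 0.0:
            return False
    return True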

2.5.3 Quality Measure Possibilities

There has been a fair amount of study on what constitutes a reasonable quality measure with respect to meshes used with numerical methods. A few interesting ones are reviewed in [106] and a much more extensive and brutal assessment is in [129]. A fair amount of the work, both by practitioners of computational geometry and numerical methods, has focused on purely computational geometry notions such as the circumcircle, incircle, and the like. There have also been attempts to quantify quality by looking at properties of the element Jacobian [58]. Here we investigate the computational geometry approach with respect to the multigrid mesh constraints. Listed below are properties of a particular cell that may be considered in the computation of a quality measure. They are based upon local features of a given mesh, generalize to simplices of any dimension, and have an easily understood, differential equation-independent interpretation. Conditions that don't generalize include the minimum angle θ in a triangle, although the dihedral angles are a similar type of measure in 3D. For general dimensional simplices, we may define

• the incircle radius rin(τ)

• the circumcircle radius rci(τ)

• the maximum length-scale h

• the minimum edge-length hmin


Figure 2.5: Components of the quality measures for the case of triangles.

Here "circle" refers to the two-dimensional case, and we will use this language in the general case as well; in 3D, "circle" should be read as "sphere." We consider the relations between these various quantities in order to compare and contrast traditional measures of mesh quality. For instance, it is obvious that $r_{ci}(\tau) \ge r_{in}(\tau)$, $2 r_{ci}(\tau) \ge h$, and $h \ge h_{min}$. Some fairly commonly used quality measures are essentially aspect ratios, which may be expressed as

1. The aspect ratio $Q_{AR} = \dfrac{r_{ci}(\tau)}{r_{in}(\tau)}$

2. Another formulation of the aspect ratio, $Q_{AR} = \dfrac{h}{2\, r_{in}(\tau)}$

3. The circumradius to longest edge measure, $Q_{LE} = \dfrac{r_{ci}(\tau)}{h}$

4. The circumradius to shortest edge measure, $Q_{SE} = \dfrac{r_{ci}(\tau)}{h_{min}(\tau)}$

A number of popular meshing tools use the circumradius to longest edge ratio [127]. The circumradius to shortest edge ratio is considered the most "natural" [128] by meshing practitioners concerned with Delaunay refinement algorithms because of its intrinsic minimization of the circumradius, which is inherently intertwined with the Delaunay condition.

For our particular application, we initially consider the aspect ratio. We want an inverse factor of the inradius $r_{in}(\tau)$, as we want it maximized. The inradius may be thought of as the minimum length scale of a cell independent of the edge lengths. Using it helps control the smallest angle or dihedral angle in the cell, as large dihedral angles necessarily place facets near facets, causing the incircle to shrink. The second option is particularly appealing in the case of isotropic refinement of graded meshes, as each step conserves and optimizes the isotropic quality constraints as one continues to coarsen.
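These quantities are cheap to compute for a single cell. Below is a minimal sketch for triangles using the classical identities $r_{in} = A/s$ and $r_{ci} = abc/(4A)$; the function name and the returned dictionary are illustrative only.

import numpy as np

def triangle_quality_measures(p0, p1, p2):
    # Edge lengths, inradius and circumradius of the triangle (p0, p1, p2).
    a = np.linalg.norm(p1 - p2)
    b = np.linalg.norm(p2 - p0)
    c = np.linalg.norm(p0 - p1)
    s = 0.5 * (a + b + c)                      # semi-perimeter
    area = np.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))
    r_in = area / s                            # incircle radius
    r_ci = a * b * c / (4.0 * area)            # circumcircle radius
    h, h_min = max(a, b, c), min(a, b, c)
    return {"Q_AR":  r_ci / r_in,              # measure 1
            "Q_AR2": h / (2.0 * r_in),         # measure 2
            "Q_LE":  r_ci / h,                 # measure 3
            "Q_SE":  r_ci / h_min}             # measure 4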

2.5.4 Anisotropic Quality Measures

However, for the case where the mesh has been generated with some anisotropic refinement, aspect ratio measures are not a good way of thinking about the quality of the resulting mesh for numerical methods, as $Q$ for the progressively refined initial meshes would blow up. This leads us to measures that are not in line with the traditional isotropic measures dependent on the largest length scales in the mesh. In fact, using these measures on extremely anisotropic meshes often impedes coarsening entirely, as no valid contractions exist with $Q_{AR}$ as the enforced quality measure. The difficulties of anisotropic coarsening are also of concern in the highly graded isotropic case. While remeshing, it may be necessary to create an anisotropic cell due to coarsening order, which will disappear as isotropy is restored by repeated coarsening. Failure to account for this type of issue leads to "runs," which appear when a whole series of vertices are contracted onto the same one due to repeated coarsening in one direction caused by the ordering of the vertices of the mesh. This case may be avoided by allowing for anisotropic coarsening. The measure we propose here for use with anisotropic problems is a simple modification of the aspect ratio. This modification still behaves roughly like the aspect ratio in the case of an isotropic mesh, i.e. a mesh such that

$\dfrac{h}{h_{min}} < C$ for a fixed $C > 1$.  (2.23)

We replace h with hmin, giving us

$Q_{AI}(\tau) = \dfrac{h_{min}(\tau)}{r_{in}(\tau)}.$  (2.24)

The behavior of this measure is markedly different in the anisotropic case. Instead of penalizing the difference in length scale, it penalizes the difference between the minimum edge length and the minimum length scale as represented by the inradius. The problem with the original aspect ratio is that, for meshes with edges of very different lengths, it needlessly penalizes elements that are well-shaped but long and skinny. For instance, in two dimensions, elements consisting of long and skinny right triangles should be considered well shaped, but elements with large angles should be considered of poor quality. This is also expressed anecdotally in the names used by the meshing community for long and skinny tetrahedra of various sorts [106]: "thin" and "wedge" tetrahedra, those with at least one edge on the order of the incircle radius, are acceptable anisotropic cells; "sliver" and "flat" tetrahedra, with vanishing incircle but nearly isotropic edges, are not acceptable. Therefore, the incircle radius is a necessary measure of quality, but we do not want to punish anisotropy.

Figure 2.6: Thin, wedge, sliver, and flat elements with incircles displayed in blue. Note the relationship between the minimum edge length and the incircle radius in each case.

Another benefit of this scheme exists even in the isotropic case. We may need, in the process of coarsening, to take a step that is "effectively" anisotropic given the fine and coarse scales, but that will be made isotropic at some later time. Having the minimum distance rather than the maximum distance be the notion of quality gives us a better intermediate view of mesh quality than enforcing effective isotropy during the entire process of coarsening. Given the progressive transformations that make up the remeshing procedure, this makes sense. Note that this also has an interesting connection to the spacing functions described above. The spacing function we use is different from the typical local feature size as classically described in the computational geometry literature, and for good reason. The local feature size we use here is the minimum edge length, and using the minimum edge length as something we want to minimize at each step keeps the local feature size, as defined by the shortest edge, to a minimum.

2.5.5 Remeshing the Boundary

The boundary of a mesh provides some special challenges for these types of approaches. While we still want to preserve the enforced quality condition, the optimization condition may be modified into some notion over the boundary rather than over the interior. If we do not preserve the boundary, we get "tearing" of curved sections of the boundary, where some curvature that is under the corner or ridge threshold is effectively flipped across an edge. This is both aesthetically displeasing and represents a potentially significant change in the shape of the mesh. Volume preservation is an interesting way to address this: the chosen contraction is the one that minimizes the change in mesh volume. This may easily be used as the optimization criterion on the boundary, and experiments show that it keeps the mesh boundary from being "torn" during the coarsening process. Therefore, the optimization criterion on the boundary becomes

$Q^*(T_{new}) = |\mathrm{Vol}(T_{new}) - \mathrm{Vol}(T_{old})|$  (2.25)

over the entire new configuration $T_{new}$ given the old configuration $T_{old}$.
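A sketch of this criterion, assuming cells are stored as tuples of vertex indices into a coordinate array, is given below; the helper names are hypothetical.

import numpy as np
from math import factorial

def simplex_volume(cell, coords):
    # Unsigned volume of a d-simplex: |det J| / d!.
    p = coords[list(cell)]
    return abs(np.linalg.det((p[1:] - p[0]).T)) / factorial(p.shape[1])

def boundary_volume_change(old_cells, new_cells, coords):
    # Boundary optimization criterion (2.25): absolute change in the volume
    # covered by the local configuration before and after the contraction.
    vol_old = sum(simplex_volume(c, coords) for c in old_cells)
    vol_new = sum(simplex_volume(c, coords) for c in new_cells)
    return abs(vol_new - vol_old)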

Figure 2.7: Coarsened aneurysm mesh with (left) and without (right) volume opti- mization. Note the surface tearing on the right.

In fact, it is possible that this kind of criterion could, for the most part, replace guarding the ridges of the mesh. However, in the pathological case where contraction across the ridge is not possible, it would not be enough to protect the ridge from being torn.

2.5.6 Mesh Cleanup

A remaining issue with this scheme is that its most pathological cases arise when the mesh is nearly a regular right-angle grid. This seems to be due to the fact that the nearly-coplanar faces of the mesh restrict the possibilities for contraction by creating situations where most if not all contractions will create a singular tetrahedron, which is explicitly forbidden both from the orientation and quality measure viewpoint for any reasonable quality measure. We note that a number of the pathological cases, typically encountered at the interface between coarsened and non-coarsened parts of the mesh, may be cleared up by some quality-improving steps taken before or after the contraction of a vertex to a neighbor.

It is also reasonable, after taking a contraction step, to look at how any locally non-optimal tetrahedra created in that process may be cleaned up. This cleanup may be restricted to some distance around the removed vertex, say one or two tetrahedra. If we consider distance one, any additional modifications must be restricted to the link itself. If we consider distance two, then we may also flip across edges or facets on the surface of the link into the next set of tetrahedra.

Therefore, we may describe the cleanup as investigating whether any simple flip will improve the maximum optimization quality measure of the involved tetrahedra. This may be done by checking all edges and facets in the link to see if any are amenable to this kind of mesh improvement. By a flip, we mean a simple, local topological mesh change: one takes some simple motif, consisting of a small number of cells with a defined connectivity, and changes it to another motif according to simply defined feasibility rules. In two dimensions, only one type of flip is necessary, and the mesh modification may consist entirely of flipping across an edge. In 3D, the most general flips are those taking three tetrahedra tangent to an edge to two tetrahedra across a face and vice versa. These may be made more robust by adding additional flips to get out of pathological situations, such as the flip of an edge when four tetrahedra share four coplanar vertices [93]. The success rate may also be improved by perturbing the input such that the resulting tetrahedralization is valid both in the original and modified geometries. However, the emphasis on the resulting mesh being Delaunay is somewhat of a distraction from the needs of numerical methods,

which require quality measures that are often at odds with the Delaunay condition, which tends only to restrict the largest length scales of a given cell and falls down in terms of enforcing quality conditions on the boundary of meshes. It should be noted that it is much more likely that a given doublet across a facet may be flipped at all than it is that an edge may be flipped into a facet. This is because the number of cells tangent to an edge is variable and usually more than the three required for a flip, whereas any given mesh face has exactly the requisite number of tetrahedra tangent to it to be flipped. However, valid flips are still somewhat rare, and a fairly high relative computational cost may be assigned to searching for flips that are both valid and optimize the mesh.

Cleanup has the added benefit that a better-shaped mesh makes subsequent vertex removals involving better-shaped tetrahedra more likely; the number of times a given vertex must be retried is therefore generally reduced by the optimization stage. However, we have seen that the mesh cleanup stage can only improve things so far, and that the costs outweigh the benefits.

2.6 Adaptive Refinement

Adaptive finite element methods (AFEM) provide a way towards efficient solution of a given differential equation to a specified accuracy. As a large number of partial differential equations have solutions with a small region contributing most of the error and a large region of relative calm, the goal of a posteriori AFEM is to hone the discretization in on the region contributing to the overall error and to guide the solution process to a specified accuracy on a final mesh with high grading.

A fairly typical scheme for constructing a series of meshes for geometric multigrid methods involves uniform refinement from some coarsest starting mesh. However, the goals of refinement for the purpose of resolving discretization or modelling error

and refinement for the purposes of creating a multigrid hierarchy can easily be at odds with each other if care is not taken. If a mesh has been refined to satisfy some analytical or adaptively determined error control, then it may have vastly different length scales in different parts of the mesh. It is because of this that the combination of refinement to resolve physics and refinement for the purpose of multigrid is fraught with peril. If one refines for the purpose of creating a multigrid hierarchy from a coarse mesh with some grading to handle the approximation issues, the refinement of the finest mesh will not reflect the error properly. Conversely, if some stages of the refinement for accuracy are used for the multigrid hierarchy, one may be unable to guarantee that the meshes satisfy the quality and size constraints required by multigrid. Because of this it is very difficult to make the two concepts, adaptive refinement and geometric multigrid, go hand in hand as a single adaptive meshing strategy that satisfies the needs of both methods in the general case.

In typical applications in computational science, one is often given a mesh that has already been adapted to the physics of the problem and asked to solve some series of linear equations repeatedly on this static mesh. Two examples where this happens quite frequently are optimization problems and the solution of nonlinear equations by methods requiring the equations to be linearized and solved repeatedly. In this case, the only available geometric multigrid methods are based upon coarsening, and huge advantages may be reaped from precomputation of a series of coarser spaces that effectively precondition the problem. This is not to say that combinations of the two are not used in practice with great success [11, 20]. However, in some ways the process of coarsening is both theoretically and practically more appealing in many situations, motivating our study of it.

The need for a mesh refined to resolve error can be demonstrated in some fairly straightforward cases. The need for a priori grading around a reentrant corner to resolve pollution effects [8] is a well-studied classical phenomenon [2, 25, 96] in

adaptive finite element computation. Multigrid computations on reentrant corner problems have been analyzed on simple meshes in the two dimensional case with shape-regular refinement [43] or structured grading [82]. A mesh arising from these requirements has disproportionately many of its vertices concentrated around the reentrant corner, such as the refined mesh shown in Figure 2.8. Around a reentrant corner of interior angle $\theta \ge \pi$ we can define a factor

$\mu \ge \dfrac{\pi}{\theta}.$

Then, given some constants $C_a > 0$ and $C_b < \infty$ related to the maximum length scale in the mesh, a mesh that will optimally resolve the error induced by the reentrant corner, for $h$ as the length scale of the cell containing any point a distance $r$ away from the corner, will satisfy:

$C_a r^{1-\mu} \le h \le C_b r^{1-\mu}$

Figure 2.8: Example of the singular refinement around a reentrant corner as required from (2.6). Both meshes have roughly 500 vertices.
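A refinement predicate based on this grading relation is simple to state in code. The sketch below assumes $\mu = \pi/\theta$ (so, for the Pacman domain with $\theta = 1.8\pi$, $\mu = 5/9$) and an arbitrary constant $C$; the function names are illustrative and not the exact generator used to produce Figure 2.8.

import numpy as np

def target_length_scale(r, mu, C=1.0):
    # Ideal cell size a distance r from the reentrant corner: h(r) ~ C r^(1 - mu).
    return C * r ** (1.0 - mu)

def needs_refinement(centroid, cell_h, corner, mu, C=1.0):
    # Split any cell whose length scale exceeds the target at its distance
    # from the corner; repeated application yields the graded meshes.
    r = max(np.linalg.norm(centroid - corner), 1e-12)
    return cell_h > target_length_scale(r, mu, C)

# e.g. for the Pacman domain (interior angle theta = 1.8 * pi):
mu_pacman = np.pi / (1.8 * np.pi)   # = 5/9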

2.7 Interpolation Operators

Another essential component of multigrid methods is the fast construction of interpolation operators between the function spaces at each level. The interpolation operators used in multigrid methods involving the spaces $\{V_0 \ldots V_N\}$ induced by the meshes $\{M_0 \ldots M_N\}$ (fine to coarse) are

• restriction: $R^i_{i+1}$ s.t. $f \in V_i$, $R^i_{i+1} f \in V_{i+1}$

• prolongation: $P^i_{i-1}$ s.t. $f \in V_i$, $P^i_{i-1} f \in V_{i-1}$

In practice, the projectors between discrete function spaces are matrices mapping each DoF in one function space to some linear combination of DoFs in another. For the sake of this discussion, we will consider the construction of the prolongation operators $P^i_{i-1}$ and use $R^{i-1}_i = (P^i_{i-1})^T$. This is a very typical strategy in the construction of multilevel schemes.

Here we restrict ourselves to standard linear interpolation using evaluations of nodal finite element functions. However, this is far from the only choice. Multigrid methods, both algebraic and geometric, depend either partially or wholly on the quality of the constructed coarse spaces. The interpolation may be constructed through various methods, including aggregation, smoothed aggregation, algebraic methods with assumptions of finite element origins for the operators, and in ways that are energy stable. The fast construction of these operators, however, may be difficult. While it is possible to have reasonable, general $O(n \log(n))$ algorithms for the construction of the interpolation operators by some fast search and location framework, we believe that a potentially much more general framework may be constructed from traversal.

2.7.1 Construction by Traversal

The construction of the prolongation operator $P^c_f$ by traversal across the meshes $M^c$ and $M^f$ has several advantages over other ways of constructing it. Firstly, the inherent locality afforded by traversal enables one to easily extend the algorithm to handle special cases, a few of which will be elaborated upon here. In order to build an easily extendable local traversal algorithm, we break the traversal-based location up into two stages:

1. geometry location

2. DoF location

The first stage involves the matching of mesh features to mesh features. This resembles a tandem traversal of the coarse and fine meshes. This stage may be accurate or inaccurate, and serves to let the topological connectivity of the mesh accelerate the matching of coarse and fine mesh features in a way that going directly to the specific location and properties of nodes of the mesh would not allow. The second stage consists of the location of fine DoFs, the association of those unknowns with a particular set of coarse DoFs, and the construction of entries in the prolongation operator. This looks like a smaller variant of the geometry traversal, with the location of a particular DoF (for the sake of this section, a node for nodal finite elements) being what is finally determined.

We make a couple of assumptions for the sake of this argument. The first is the nodal assumption stated above, meaning that in order to project a function $f(x)$ onto the function space spanned by some set of basis functions $V^f = \{\psi^f_i\}$, we may construct the coefficients

$f_i = f(x_i) \cdot \psi^f_i(x_i).$  (2.26)

Figure 2.9: The goal of the outer (a) and inner (b) traversals: locating an associated cell for a fine topological feature, and a fine degree of freedom (both in red).

Therefore, the standard linear interpolator between $V^f$ and $V^c = \{\psi^c_j\}$ may be constructed as

$I_{ij} = \psi^c_j(x_i) \cdot \psi^f_i(x_i).$  (2.27)

One can, for the sake of simplicity, associate every $\psi^f_i$ with a single cell $\tau^f_i$. The choice of cell is arbitrary among the cells the unknown is supported on. This makes the problem equivalent to finding, for some set of $x \in \Omega$, $T^c(x)$, defined as the cell in $M^c$ that $x$ is located in. We also define $\tilde{T}^c(x)$, which is some cell in $M^c$ that is, in some sense, near $x$. In addition to the points $x_i$ associated with all $\psi^f_i$, we have the midpoint $x_{\tau^f_i}$ of every cell $\tau^f_i \in M^f$.

Outer Loop

The outer loop locates $T^c(x_{\tau^f})$ for each $\tau^f \in M^f$, and the inner loop determines $T^c(x_i)$ for all $x_i$ associated with $\psi^f_i$ supported on $\tau^f_i$. Once $T^c(x_i)$ is resolved for all $i$, one may construct the nodal interpolation operator.

The outer loop consists of breadth-first searches (BFSes) on the graph of the neighbor relations of the cells of $M^c$ in order to determine $T^c(x_{\tau_k})$ for each $\tau_k \in M^f$. This is implemented with a simple first-in-first-out (FIFO) queue in which the neighbors of a given cell are pushed onto the back of the queue when it is visited. Enqueued and visited cells are marked as such and not enqueued again. We say that two cells $\tau^c_0$ and $\tau^c_1$ are neighbors if they share a vertex, edge, or facet. The BFS for $\tau_k$ starts with the cell $\tilde{T}^c(x_{\tau_k})$. A typical visitation order for the BFS is shown in Figure 2.10(a).
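The point-location primitive underlying both loops is an ordinary BFS over the cell-neighbor graph. A minimal sketch follows, assuming hypothetical neighbors and contains callbacks that expose the coarse mesh connectivity and a point-in-cell test.

from collections import deque

def locate_cell(x, start_cell, neighbors, contains):
    # BFS over the coarse cell-neighbor graph: find the cell containing the
    # point x, starting from a nearby guess cell.  `neighbors(c)` yields the
    # cells sharing a vertex, edge, or facet with c; `contains(c, x)` is a
    # point-in-cell test (e.g. via barycentric coordinates).
    queue, visited = deque([start_cell]), {start_cell}
    while queue:
        cell = queue.popleft()            # FIFO order gives breadth-first search
        if contains(cell, x):
            return cell
        for nb in neighbors(cell):
            if nb not in visited:
                visited.add(nb)
                queue.append(nb)
    return None                           # x lies outside the coarse mesh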

Inner Loop

As $T^c(x_{\tau^f_k})$ is established for each cell $\tau^f_k$, the inner loop is invoked. This loop consists of a BFS for each $\psi^f_i$ associated with $\tau^f_k$ and determines $T^c(x_i)$ by a BFS starting from $\tilde{T}^c(x_i) = T^c(x_{\tau^f_k})$.

The last ingredient in the algorithm is how to determine $\tilde{T}^c(x_{\tau^f_k})$. One simple way of doing this is setting $\tilde{T}^c(x_{\tau^f_k}) = T^c(x_{\tau^f_m})$ for any cell $\tau^f_m$ that neighbors $\tau^f_k$. This notion of locality may be exploited by setting the value of $\tilde{T}^c(\tau^f_n)$ for all neighbors $\tau^f_n$ of a cell $\tau^f$ upon determining $T^c(\tau^f)$. When $T^c(\tau^f)$ for any $\tau^f \in M^f$ is determined, the connectivity of $M^c$ and $M^f$ may be effectively traversed in tandem in order to extend the search over the whole meshed domain.

This process may be both started and made more efficient by exploiting the node-nested properties of the meshes. The meshes $M^f$ and $M^c$ share some set of vertices $V^{(f,c)}$ due to our node-nested coarsening procedure. Define $\tau^f_v$ to be a cell in mesh $M^f$ that is adjacent to some $v \in V^{(f,c)}$ and $\tau^c_v$ to be an arbitrary cell in $M^c$ adjacent to that same $v$. Then, one may initialize $\tilde{T}^c(x_{\tau^f_v})$ to be $\tau^c_v$.

Figure 2.10: (a) The patch covered by breadth-first searches of various depths. (b) The pessimistic size of this patch as estimated by our complexity heuristic.

Complexity Heuristic

An observation about the complexity of this algorithm may be made assuming that the conditions in Section 2.4 are satisfied by the hierarchy of meshes. The complexity is bounded by the total number of BFSes done during the course of the algorithm multiplied by the worst-case complexity of an invocation of the BFS. It is easy to see that for $|M^f|$ cells in the fine mesh and $n$ unknowns, the geometric search must be done $|M^f| + n$ times. In order to bound the complexity of a given BFS by a constant, we must show how many steps of traversal one must search outwards in order to locate some $T^c(\tau^f)$ given $\tilde{T}^c(\tau^f)$. This may be accomplished by showing that the length scale the search has to cover on the fine mesh may only contain a given number of coarse mesh cells.

We know that $\mathrm{dist}(x_{\tau^f_0}, x_{\tau^f})$ is going to be less than some maximum search radius $r_{\tau^f} = h_{\tau^f} + h_{\tau^f_0}$ given that they are adjacent. We may also put a lower limit on the minimum length scale of cells in $M^c$ that overlap $\tau^f$ and $\tau^f_0$ by the constant

$C_0$ in (2.10b) and by the aspect ratio limit in (2.8) as $h^c_{min} = \dfrac{C_0 h_{\tau^f}}{C_{AR}}$. This gives us that the number of cells that may fit between $\tilde{T}^c(\tau^f)$ and $T^c(\tau^f)$ is at most $\left\lceil \dfrac{r_{\tau^f}}{h^c_{min}} \right\rceil$, which is independent of overall mesh size. Note that the distance between the center of a cell $\tau$ and any nodal point $x_i$ of a $\psi_i$ supported on $\tau$ is always less than or equal to $h_\tau$. This extends the constant-size traversal bound to the inner loop also, making the entire algorithm run in $O(|M^f| + n)$ time. In practice, on isotropic meshes $\tilde{T}^c(\tau^f)$ and $T^c(\tau^f)$ are almost always the same cell or one cell apart, meaning that the topological search is an efficient method for building the interpolation operators.

Boundary Caveats

The boundary of the domain in the unstructured case may change in shape rather significantly from the previous configuration, even when using protective measures as described in the remeshing section. This requires us to be able to determine reasonable values for nodal unknowns lying outside the domain spanned by the coarse mesh. This is done by modifying the procedure above to be an approximate search when necessary. This approximate search may be incorporated into the above framework with some modest modifications, the core of which is searching for the nearest discernible point within some approximate cell rather than directly locating points. Points that lie inside an easily located cell are handled by the algorithm above.

An extension of the algorithm that has proven necessary on complex meshes is to allow for interpolation to $x_i$ not located within any cell of the coarse mesh. For example, the Pacman mesh (Figure 2.12), when coarsened, will have vertices that lie outside the coarser mesh on the round section. To handle this, the outer loop BFSes are replaced by a more complex arrangement in which the BFSes search for the nearest cell rather than an exact location. This nearest cell then becomes $T(x_\tau)$. The inner


Figure 2.11: Example where the meshes are not nested: (a) the boundary problem, (b) the simple boundary solution, projection of nodes to the nearest interface.

loop is replaced by a similar procedure, which then projects any $x_i$ that could not be exactly located to some $\tilde{x}_i$ on the surface of the nearest cell and changes (2.27) to use $\psi^c_j(\tilde{x}_i)$ instead of $\psi^c_j(x_i)$.

2.8 Experiments

The experiments aim to show that our concern with the theoretical aspects of geometric multigrid is justified. These are simple problems on meshes with singular a priori refinement tuned to resolve error around the reentrant corner. We show that the coarsened mesh sequence satisfies the hierarchy conditions, and then seek to show that textbook multigrid efficiency is recovered.

Given the a priori refinement required for this, we may also describe what an "ideal" set of meshes for this problem would look like. This is an interesting additional experiment, as it allows us to investigate how close we get to an ideal series of meshes for the reentrant corner problems. We show that the coarsened hierarchy is somewhere near ideal in the test problems we consider.

We will also justify the method against the other major option for unstructured multigrid problems: algebraic multigrid (AMG). While we acknowledge that we cannot beat AMG with respect to the cost of constructing the hierarchies, as we must do a lot more work in that process, we will compare the asymptotic performance on the above named problems and demonstrate the advantage of geometric methods: that we may apply them generally and in problem cases where algebraic multigrid methods do not work. To do this, we will elaborate on the flexibility of the method, both in terms of the geometries we may treat and in terms of the differential equations we may solve with the method. The combination of our generalized treatment of the geometry with our general (nodal) interpolant treatment allows us to solve other interesting problems, including those that have a dual variable. Algebraic multigrid methods are shown to fail on these problems, and we show that we may treat problems of general interest with the method.

Enabling all of this is an implementation whose interface is described in full in Appendix A. This is the second implementation of the method, and is much improved over the first [34] in terms of flexibility.

2.8.1 Experimental Setup

The problem setup was done using extensions to the DOLFIN finite element framework. This extension consisted of two parts:

• the interface to the coarsening infrastructure

• the interface to the multigrid infrastructure.

These two parts are roughly decoupled, as the coarsened mesh or sequence of meshes generated from the coarsening infrastructure may be used with any application, including refinement-and-coarsening based adaptive finite elements. A full description of the software interfaces to these components is given in Appendix A. The multigrid infrastructure may also be used with any series of meshes, such as ones created by regular refinement or repeated generation from a geometric description.

The meshes were generated automatically for a given size using custom mesh generators, written as an extension to DOLFIN, that repeatedly refine cells not satisfying the refinement criterion until a fixed point or some maximum (and excessive) number of iterations of refinement is reached. This was used to generate the Pacman and Fichera corner meshes for the asymptotic studies of multigrid performance. The refinement algorithm used in DOLFIN is a standard Rivara-type [113] edge splitting.

The PCMG framework allows for the construction and composition of multilevel preconditioners within the PETSc Krylov solver (KSP) framework and allows for interesting extensions to the multigrid implementation, including transparent incorporation of multiple cycle types, such as V and W cycles, novel grid sequencing such as full multigrid and kaskade schemes, and additive or multiplicative application of the multigrid preconditioner. For the sake of these experiments, we choose an ILU(0) type smoother [40, 78] that tends to work quite well on the sort of problems we will be handling here [51].
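For orientation, the sketch below expresses a solver configuration of this kind as standard PETSc options set through petsc4py. It is not the exact configuration code used here (the number of levels and the interpolation operators are attached programmatically through the PCMG interface), but the option names are the standard PETSc ones.

from petsc4py import PETSc

opts = PETSc.Options()
opts["ksp_type"] = "gmres"             # outer Krylov method
opts["ksp_rtol"] = 1e-12               # relative tolerance used in the tests
opts["pc_type"] = "mg"                 # PCMG multigrid preconditioner
opts["pc_mg_cycle_type"] = "v"         # V-cycles
opts["mg_levels_ksp_type"] = "richardson"
opts["mg_levels_ksp_max_it"] = 3       # three pre/post smooths per level
opts["mg_levels_pc_type"] = "ilu"      # ILU(0) smoother
opts["mg_coarse_ksp_type"] = "preonly"
opts["mg_coarse_pc_type"] = "lu"       # direct solve on the coarsest mesh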

2.8.2 Test Problems

The test problems are the a priori refined Fichera corner and Pacman problems. The Pacman mesh is the unit circle with a tenth removed. The refinement criterion is as defined above. These problems are the types of problems that are most difficult

for standard multigrid techniques, and therefore the ability to create an optimal multigrid method on them is of great interest. The Fichera corner mesh is the double-unit cube with one octant removed. This leaves a right-angle corner at the origin. The proper grading for this particular case is elusive, as the singularity is weaker at the corner than it is at the edges. For the sake of the problem, we will grade at the corner as we do at the edges. The partial differential equation being solved is simply the Poisson problem:

$(\nabla u, \nabla v) = (f, v)$  (2.28)

for all $v$ in $V$.
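A minimal DOLFIN/UFL sketch of this discretization is shown below. The unit square stand-in mesh, the right hand side, and the homogeneous Dirichlet condition are assumptions for illustration; the experiments use the a priori refined Pacman and Fichera meshes described above.

from dolfin import *

mesh = UnitSquareMesh(64, 64)             # stand-in for the refined test mesh
V = FunctionSpace(mesh, "CG", 1)          # piecewise linear elements
u, v = TrialFunction(V), TestFunction(V)
f = Constant(1.0)                         # stand-in right hand side

a = inner(grad(u), grad(v)) * dx          # (grad u, grad v)
L = f * v * dx                            # (f, v)
bc = DirichletBC(V, Constant(0.0), "on_boundary")
A, b = assemble_system(a, L, bc)          # system handed to the PCMG-preconditioned solver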

2.8.3 Mesh Quality Experiments

In order to motivate the use of the method with the multigrid solver, one must check whether our implementation of the method actually lives up to the mesh quality metrics stated as requirements for guaranteed multigrid performance. This study was done using large initial Pacman and Fichera corner meshes coarsened using the same algorithm used for the multigrid studies in Section 2.8.5. The measurements of mesh quality we have taken correspond to the bounds on the meshes in the multigrid requirements: the number of vertices and cells in the mesh, the worst aspect ratio in each mesh, the maximum number of cells in a mesh overlapping any given cell in the next coarser mesh, and the maximum change in local length-scale between a mesh and the next coarser mesh at any point in the domain. The meshes are created by repeated application of the coarsening algorithm with β = 1.5 in 2D and β = 1.8 in 3D.

Table 2.1: Hierarchy quality metrics for the Pacman mesh.

Pacman Mesh, β = 1.5

level   cells    vertices   max AR(τ)   sup |S_τ|   max h_{k-1}(x)/h_k(x)
0       70740    35660      3.39        –           –
1       22650    11474      7.64        14          9.48
2       7562     3858       7.59        15          6.23
3       2422     1254       4.63        15          4.19
4       811      428        5.52        15          5.32
5       257      143        5.94        15          8.51
6       86       52         7.76        12          4.94

Here the supremum of $|S_\tau|$ is taken over $\tau \in M_k$ and the maximum of $h_{k-1}(x)/h_k(x)$ over $x \in \Omega$.

Figure 2.12: Generated hierarchy of the Pacman mesh.

Table 2.2: Hierarchy quality metrics for the Fichera mesh.

Fichera Mesh, β = 1.8, C_AR = 60

level   cells     vertices   max AR(τ)   sup |S_τ|   max h_{k-1}(x)/h_k(x)
0       373554    72835      4.41        –           –
1       49374     9120       59.3        197         6.63
2       11894     2269       59.2        131         6.70
3       3469      693        59.3        94          6.68
4       914       208        58.8        121         6.61
5       182       52         49.5        94          6.87

As above, the supremum of $|S_\tau|$ is taken over $\tau \in M_k$ and the maximum of $h_{k-1}(x)/h_k(x)$ over $x \in \Omega$.

Figure 2.13: Generated hierarchy of the Fichera mesh.

In 2D (Table 2.1) the aspect ratio of the resulting cells stays within acceptable limits during the entire process. The slight increase in the aspect ratio is expected for the coarsening of highly graded meshes, as the coarser versions must necessarily remove vertices lying between refined and coarse regions of the non-quasi-uniform mesh. However, the associated increase in aspect ratio is kept to a minimum by the enforcement of the Delaunay condition. In 3D (Table 2.2), we see consistent decrease of the mesh size and increase in the length scale. The maximum aspect ratio stays around CAR for most of the levels. Further work on our incredibly simple remeshing scheme should be able to improve

this. However, we do not see successive degradation of the quality of the series of meshes as the coarsening progresses. We can assume that the quality constraints are reasonably met in both 2D and 3D.

2.8.4 Mesh Grading Experiments

We noted in Section 2.6 that we can quantify how well our coarsening strategy does for a priori refined meshes of both the Fichera corner and Pacman types as described above. To refresh, we are looking at the constants in (2.6). This relation proves to be much more interesting when considered in the multigrid context. If we use the algorithm from the previous section to coarsen, we have an interesting relation on the length-scales (assuming the isotropic case). We may say that for the “ideally” coarsened set of coarse meshes we have

$C_a r^{1-\mu} \beta^l \le h_l \le C_b r^{1-\mu} \beta^l.$

Note that we coarsen using the upper bound. Therefore, in the ideal case we would have h for τ in the coarser meshes roughly following

$C_b r^{1-\mu} \beta^l \approx h$

for all $\tau$ on a given coarse mesh level $l$. This provides us an interesting metric, in this particular case, for how our coarsening algorithm performs with respect to the idealized situation on these reentrant corner problems. As we have some initial straying away from the upper bound (towards an assumed lower bound) in the process of refinement, we may look at how the upper and lower bounds shift with respect to this to determine how well we have done in the process of choosing a coarse vertex set and coarsening. Consider the quantity, for a cell $\tau$ in mesh level $l$,

$\tilde{C}_\tau = \dfrac{h}{r^{1-\mu} \beta^l}.$  (2.29)

This gives us a "closeness" to the ideal situation of a particular cell lying a distance $r$ away from the reentrant corner. Then, we investigate

$C_a = \min_\tau(\tilde{C}_\tau)$  (2.30)

and

$C_b = \max_\tau(\tilde{C}_\tau)$  (2.31)

for each mesh in the sequence. This will tell us exactly how optimal our coarsening strategy is in two respects. Firstly, it tells us how well our coarsening strategy maintains the shape of the mesh from mesh to mesh considering the idealized situation. Secondly, it tells us how "tight" each increase in the length scale is. What this means is that, because we remove colliding vertices, we may increase the overall mesh length scale by some factor somewhat greater than β at each step. This metric allows us, for the reentrant corner cases, to quantify both of these issues.
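A sketch of this measurement, assuming cells stored as vertex-index tuples into a coordinate array, is given below; the function and argument names are illustrative.

import numpy as np

def tightness_constants(cells, coords, corner, mu, beta, level):
    # C_a = min and C_b = max of C_tau = h / (r^(1-mu) * beta^l)  (2.29)
    # over the cells of one mesh in the coarsened hierarchy.
    values = []
    for cell in cells:
        p = coords[list(cell)]
        h = max(np.linalg.norm(p[i] - p[j])            # longest edge of the cell
                for i in range(len(p)) for j in range(i))
        r = max(np.linalg.norm(p.mean(axis=0) - corner), 1e-12)
        values.append(h / (r ** (1.0 - mu) * beta ** level))
    return min(values), max(values)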

Table 2.3: Calculated $C_a$ and $C_b$ for the Pacman mesh hierarchy.

|V|     C_a       C_b       C_b/C_a
20414   1.37e-2   2.82e-2   2.04
6574    1.18e-2   5.41e-2   5.41
2153    1.59e-2   5.68e-2   5.68
729     1.93e-2   7.48e-2   7.48
238     2.44e-2   7.81e-2   7.81
80      2.68e-2   9.05e-2   9.05

Table 2.4: Calculated $C_a$ and $C_b$ for the Fichera mesh hierarchy.

|V|     C_a       C_b       C_b/C_a
72835   7.49e-2   1.23e-1   1.65
9272    3.42e-2   4.56e-1   13.37
2582    3.22e-2   3.09e-1   9.60
977     2.49e-2   2.20e-1   8.82
378     1.57e-2   1.73e-1   10.98
182     9.00e-3   1.20e-1   13.41

We describe the ratio of the two constants, $C_b/C_a$, as the tightness of the refinement scheme. We see that the initial meshes, both for the Pacman and Fichera problems, have fairly small ratios between the constants, and that the initial coarsening causes a loosening of the constants. However, after the initial loosening, the constant range stays fairly fixed. This shows that we are not that far off from an "ideal" set of meshes for these particular idealized test cases. The next experiments show that these meshes work quite well for multigrid methods, as we can predict from this experiment and the quality experiments.

2.8.5 Multigrid Performance

Table 2.5: Multigrid performance on the 2D (Pacman) and 3D (Fichera) problems. l is the number of mesh levels. MG is the number of multigrid V-cycles. AMG is the number of AMG V-cycles. ILU is the number of ILU-preconditioned GMRES iterations. The $L_2$ error is computed from the difference with the exact solution for the Pacman test problem.

(a) Pacman Problem Performance

DoFs     l   MG   AMG   ILU    $\|E\|_2$
762      3   6    9     76     9.897e-6
1534     4   6    10    129    3.949e-6
3874     5   7    11    215    9.807e-7
9348     5   7    12    427    2.796e-7
23851    6   7    13    750    1.032e-7
35660    7   8    13    1127   8.858e-8
78906    7   7    14    1230   3.802e-8
139157   8   8    14    3262   1.535e-8
175602   8   8    14    3508   3.440e-9

(b) Fichera Problem Performance

DoFs     l   MG   AMG   ILU
2018     2   7    8     46
3670     2   8    8     60
9817     3   8    8     112
27813    4   9    9     207
58107    5   9    9     385
107858   6   9    9     543
153516   6   9    9     440
206497   7   9    9     616
274867   8   9    9     652

The performance of multigrid based upon the mesh series created by this algorithm on the two test examples was measured using the DOLFIN [97] finite element software, modified to use the PCMG multigrid preconditioner framework from PETSc [9]. The operators were discretized using piecewise linear triangular and tetrahedral finite elements. The resulting linear systems were then solved to a relative tolerance of $10^{-12}$. The solver for the standard iterative case, shown in the ILU columns as a control, was ILU-preconditioned GMRES. We also compare to the Hypre algebraic multigrid package. In the multigrid case we chose to use GMRES preconditioned with V-cycle multigrid. Three pre- and post-smooths using ILU as the smoother were performed on all but the coarsest level, for which a direct LU solve was used. For this to make sense, the coarsest mesh must have a small, nearly constant number of vertices; a condition easily satisfied by the mesh creation routine. We coarsen until the coarsest mesh has around 200 vertices in 2D or 300 vertices in 3D.

These experiments show that the convergence of the standard iterative methods becomes more and more arduous as the meshes become more and more singularly refined. The singularity in 2D is much stronger, due to the sharper reentrant angle, than it is in 3D, so the more severe difference in performance between multigrid and ILU is to be expected. This is despite the increase in condition number due to the increase in dimension [12]. We see that the number of multigrid cycles levels out to a constant for both 2D (Table 2.5(a)) and 3D (Table 2.5(b)). We also see that a steadily increasing number of multigrid levels is automatically generated and that the method continues to behave as expected as the coarsening procedure is continually applied. AMG and GMG have similar performance characteristics in both 2D and 3D.

2.8.6 Anisotropic Multigrid Performance

We define the Poisson problem with a right hand side function living on an internal interface; that is:

$(\nabla u, \nabla v) + (\alpha u, v) = (f, v)_\Gamma$  (2.32)
$u(x, y) = 0$ if $x = 0$ or $x = 1$  (2.33)

for all $v \in V$. $f$ is a function living on the interface $\Gamma$. This will create a kink at the interface. This screened Poisson equation has a solution that dies off exponentially away from the interface. We have that anisotropic refinement in the direction normal

to $\Gamma$ is required to resolve this kink. We follow the example of [157] and grade parametrically from the normal, regular mesh as $x = \left(\frac{2l}{N}\right)^{1.5}$ for a vertex $l$ cell layers away from the interface, for $N$ layers outwards. This grading results in meshes that resemble the finest mesh in Figure 2.14. This experiment was performed as the others above were, using rediscretization multigrid along with a series of coarse meshes generated from the coarsening procedure.
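Assuming the grading formula is read as $x = (2l/N)^{1.5}$ (the reconstruction above), the layer offsets are trivial to generate; the sketch below is illustrative only.

def layer_offsets(N):
    # Offsets of the graded vertex layers normal to the interface,
    # x_l = (2 * l / N)**1.5 for l = 0 .. N.
    return [(2.0 * l / N) ** 1.5 for l in range(N + 1)]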

Figure 2.14: Anisotropic mesh and the sequence of resulting coarse meshes using the algorithm. Note the automated mixture of isotropic coarsening and pseudocoarsen- ing.

Table 2.6: Anisotropic problem multigrid performance.

n     l   MG   ILU
50    3   5    76
100   4   7    102
150   4   9    120
200   5   9    138
250   6   11   149
300   7   12   159

Our convergence to a steady iteration count is therefore less clean than it is with isotropic elements. The theoretical results say that meshes like the ones we generate should be able to precondition both problems with an interface like this and problems with jump coefficients. This is left for future work.

2.9 Outlook

We have had some initial success at robust coarsening for general multilevel problems. This success has allowed us to do general geometric multigrid for difficult problems with corner singularities, satisfying both the theoretical needs of adaptive finite element methods and multilevel methods. Further exploration of the remeshing infrastructure may go several ways. The remeshing may be tuned to treat problems with difficult-to-handle internal features generally, and further areas of application that require this abound. A software effort to make this into a general tool is underway.

Unlike AMG, GMG may be used in a fairly straightforward way to solve a variety of difficult equations. The FEniCS tools enable us to quickly try different equations. Of great interest is the variable viscosity Stokes equation. We have made an initial foray into the preconditioning of this equation, defined as the equation for the velocity and pressure pair $(u, p)$:

$(\mu \nabla u, \nabla v) + (\nabla \cdot v, p) = (f, v)$
$(\nabla \cdot u, q) = 0$

for all $v \in V_v$ and all $q \in V_p$, with $u, v: \mathbb{R}^d \to \mathbb{R}^d$ and $p, q: \mathbb{R}^d \to \mathbb{R}$. The jump viscosity problem of this sort has

$\mu(x) = \begin{cases} M & x \in \Omega_1 \\ 1 & x \in \Omega_2 \end{cases}$

for $M \gg 1$. This problem is notoriously hard to precondition. Because the system has no diagonal in the pressure block ($V_p$), AMG-type methods typically converge to an incorrect solution on it. We use the geometric multigrid construction on the full problem at each level. Our test problem is the viscous sinker, in which one has

$f(x) = \begin{cases} 1 & x \in \Omega_1 \\ 0 & x \in \Omega_2 \end{cases}$

such that there is a body force on the more viscous region. The domain is a square Ω1 within some outer domain Ω2. There is recirculation due to the sinking viscous region.
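A hedged DOLFIN/UFL sketch of the sinker setup is given below. The Taylor–Hood discretization, the unit square stand-in mesh, and the particular subdomain expressions for µ and f are assumptions for illustration, not the exact experimental setup.

from dolfin import *

mesh = UnitSquareMesh(64, 64)                          # stand-in mesh
P2 = VectorElement("CG", mesh.ufl_cell(), 2)           # Taylor-Hood velocity
P1 = FiniteElement("CG", mesh.ufl_cell(), 1)           # Taylor-Hood pressure
W = FunctionSpace(mesh, P2 * P1)

inside = "(fabs(x[0] - 0.5) < 0.25 && fabs(x[1] - 0.5) < 0.25)"
mu = Expression(inside + " ? M : 1.0", M=1000.0, degree=0)       # jump viscosity
f = Expression(("0.0", inside + " ? -1.0 : 0.0"), degree=0)      # body force on the sinker

(u, p) = TrialFunctions(W)
(v, q) = TestFunctions(W)
a = mu * inner(grad(u), grad(v)) * dx + div(v) * p * dx + div(u) * q * dx
L = inner(f, v) * dx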

Figure 2.15: Solution and mesh hierarchy for the viscous sinker problem.

For the 128×128 initial grid with four levels of a priori refinement around the edge of the sinker, we increase $M$ and see if the preconditioning holds as the coefficient jump increases. We see that it does fairly well using Galerkin multigrid, but rediscretization multigrid sees a steady increase in the number of iterates.

M        GMG (Galerkin)   GMG (redisc.)
1        5                13
10       6                14
100      7                20
1000     7                36
100000   9                –

Table 2.7: Galerkin vs. Rediscretization Multigrid for the Sinker Problem.

In order to be able to generalize to non-nested internal subdomains, we have begun the study of stable or energy-minimizing interpolants. An energy-minimizing interpolant $P^{i-1}_i$ is such that the projection of the coarse basis functions onto the fine space has small energy, e.g. $\|A P^c_f \phi^c\|$ is small for all coarse basis functions $\phi^c$. These types of interpolants have been studied in the multigrid context before [31, 81, 147]. Other possibilities are to extend the geometric notions of stable interpolation by either some subdomain-matching projection, or a weighted Scott-Zhang [124] interpolant chosen to minimize the energy. The software effort can be generalized and provided as a project-agnostic coarsening component as well as a series of hooks to PETSc's multigrid facilities. This work is underway.

CHAPTER 3

EXPONENTIAL MESHES AND QUANTUM MECHANICS

The approximation of functions of a particular sort, more specifically those that are nonsingular but die off exponentially away from some small set of centers, is of importance in a number of interesting areas of application. We contrast this with the approximation of other types of functions, such as those that have geometric decay. The approximation of these functions by piecewise polynomial function spaces is discussed and explored. In particular, the problem of error-equilibrating approximation spaces for these problems is explored.

Solutions from a number of application areas which behave like this also live in some high dimensional space, and therefore suffer from the curse of dimensionality: the amount of work necessary to represent functions in high dimensions rapidly becomes intractable. This intractability has been approached in various ways in the past, and our approach is contrasted with these. We discuss the construction of FEM function spaces that enable approximation of high-dimensional functions. In particular, we discuss how one would generalize meshing to radial functions in general dimensions in a flexible way. Along with this is a set of ideas and a prototype implementation for generalized finite elements. The problems of meshes, quadratures, and function spaces are discussed, as well as their interconnection with regard to organizational issues.

We consider the problems of electronic structure in configuration space: for a system of $N$ electrons one has functions living in $\mathbb{R}^{3N}$. The approximation

of these high dimensional functions requires specially constructed function spaces in high dimensions. We show, for a simple example problem, that we can approximate the Schrödinger equation in an adaptive fashion. We show that the eigenvalue problem for the hydrogen atom has interesting behavior with respect to our approximations, and discuss the ramifications of this with respect to the flexible solution of larger quantum systems.

3.1 Approximation of Exponential Decay

We limit ourselves to the approximation of exponentially dying functions by piecewise polynomial function spaces in the radial direction away from the origin. As we are considering just the behavior in one dimension, our analysis may begin with one-dimensional functions that die exponentially. Basic error estimates for the representation of such functions give the error over the interval $[r_0, r_1]$ for an order $p$ polynomial approximation as

$E_{r_0,r_1}(x) = \frac{1}{(p+1)!} f^{(p+1)}(\tilde{x}) \prod_{i=0}^{p} (x - x_i)$  (3.1)

for Lagrange nodes $x_0 \ldots x_p$ within the interval. We will use this type of error analysis to develop optimal grid spacings for a particular type of function. This type of function is one where

$f(x) = O(e^{-x}).$  (3.2)

These functions are exponentially decaying. Contrast this with functions that have algebraic decay; namely that

$f(x) = O\!\left(\frac{1}{|x|}\right).$  (3.3)

Note the difference in behavior both as $x \to 0$ and as $x \to \infty$. We see that the class of functions following (3.3) may have a singularity at the origin, and decays significantly less fast at the extremes than (3.2). Therefore, the type of mesh used to approximate it is necessarily much different. The proper gridpoints for functions dying algebraically will satisfy

$x_i - x_{i-1} = h\, x_{i-1}^{\mu}$ for some $\mu > 1$,  (3.4)

and functions behaving like the logarithm, namely

$f(x) = O(\log(x)),$  (3.5)

are properly captured by geometric meshes. These meshes are used extensively in finite element computations for resolving singularities of various sorts because they are very easily generated with nested refinements. They are defined as

$x_i = \gamma^{i-1} h$  (3.6)

for some smallest length scale $h$ and multiplicative factor $\gamma$. This enables refinement by either $\gamma$ or $h$. Note that using $\gamma^{1/j}$ as the factor will produce $j$-nested refinement. Meshes graded in this fashion are used to resolve highly singular functions. Highly graded meshes of this sort, and their numerical properties, have been previously analyzed [12]. These examples are explored further in Appendix B.

3.1.1 Optimal Spacing

The optimal grid for an equation like this will equilibrate the error in each cell, allowing for the error to be resolved and managed in a controllable way a priori. The construction of a finite, piecewise-polynomial function space for which this happens

is possible assuming some basic properties of the function we are estimating. Suppose we want to optimally represent the function

$f(x) = e^{-x}$  (3.7)

on the one-dimensional interval $[0, X]$ by defining $0 = x_0 < x_1 < x_2 < \ldots < x_N = X$. The question is how to equilibrate the error on each interval in order to resolve this exponentially-dying function. One initial approach is to estimate the error at the midpoint $x_{i+\frac12} = (x_i + x_{i+1})/2$ of the interval in the piecewise linear case as

$u(x_{i+\frac12}) - (u(x_{i+1}) + u(x_i))/2 = \frac{1}{8}(x_{i+1} - x_i)^2\, u''((x_i + x_{i+1})/2) + O((x_{i+1} - x_i)^4).$  (3.8)

We introduce the effective length-scale parameter $h$, which gives an error $\epsilon \le C h^2$ for first-order piecewise polynomials, for some constant $C$. Assuming $C$ may be subsumed into the equations, we have

$h = \frac{\sqrt{2}}{2}\,(x_{i+1} - x_i)\, e^{-(x_i + x_{i+1})/4}.$  (3.9)

The question is then how this may be realized. It should be noted that this leads to the difference method

$\frac{x_{i+1} - x_i}{h} = \sqrt{2}\, e^{(x_i + x_{i+1})/4}$  (3.10)

which may be solved as

$x' = \sqrt{2}\, e^{x/2}$  (3.11)

creating the "optimal" first order mesh for a given number of points $N$. This may be solved as an ordinary differential equation by a shooting method over the parameter $h$. This both allows us to have a mesh of a given size and some estimate of the error.
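A minimal sketch of this shooting procedure is given below. It replaces the implicit difference relation by an explicit step evaluated at $x_i$ and bisects on $h$ until $x_N$ reaches $X$; the function name and tolerances are illustrative only.

import numpy as np

def exponential_mesh(X, N, tol=1e-10):
    # Generate N+1 points on [0, X] from the relation
    # (x_{i+1} - x_i)/h = sqrt(2) * exp((x_i + x_{i+1})/4) by shooting on h.
    def march(h):
        x = np.zeros(N + 1)
        for i in range(N):
            # explicit step; the exponent is capped only to avoid overflow
            # while bracketing h (accuracy there does not matter)
            x[i + 1] = x[i] + h * np.sqrt(2.0) * np.exp(min(x[i] / 2.0, 50.0))
        return x
    lo, hi = 0.0, 1.0
    while march(hi)[-1] < X:              # bracket the target endpoint
        hi *= 2.0
    while hi - lo > tol:                  # bisect on the length parameter h
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if march(mid)[-1] < X else (lo, mid)
    x = march(0.5 * (lo + hi))
    x[-1] = X                             # snap the last point to the endpoint
    return x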

3.1.2 The Higher Order Case

We may define the error due to interpolation of $e^{-r}$ by some higher-order piecewise polynomial as

$\epsilon_{p+1} = \frac{1}{(p+1)!} f^{(p+1)}(\tilde{x})\,(x_{i+1} - x_i)^{p+1}$  (3.12)
$= \frac{(-1)^{p+1}}{(p+1)!}\, e^{-\tilde{x}}\,(x_{i+1} - x_i)^{p+1}$  (3.13)

for some $\tilde{x}$ in the interval $[x_i, x_{i+1}]$, with $h$ as the resolution parameter for the error $\epsilon_{p+1} = C h^{p+1}$. We may, from now on, neglect the alternating sign. We make the assumption that $\tilde{x} = \frac{x_i + x_{i+1}}{2}$. We see that

$h^{p+1} = \frac{(x_{i+1} - x_i)^{p+1}}{(p+1)!}\, e^{-\frac{x_i + x_{i+1}}{2}}$  (3.14)

$\frac{h}{x_{i+1} - x_i} = \sqrt[p+1]{\frac{1}{(p+1)!}}\; e^{-\frac{x_i + x_{i+1}}{2(p+1)}}.$  (3.15)

Note that in the case where $p = 1$ this gives us

$\frac{h}{x_{i+1} - x_i} = \frac{\sqrt{2}}{2}\, e^{-\frac{x_i + x_{i+1}}{4}}$  (3.16)

which was the original derivation. Now, to get the difference method, we have

$\frac{x_{i+1} - x_i}{h} = \sqrt[p+1]{(p+1)!}\; e^{\frac{x_i + x_{i+1}}{2(p+1)}}$  (3.17)

once again yielding the difference equation

$x' = \sqrt[p+1]{(p+1)!}\; e^{x/(p+1)}$  (3.18)

Figure 3.1: Optimal exponential meshes for a variety of length parameter h and polynomial orders p. The number of points in each grid is n.

Note that to reach a given accuracy, the number of points decreases a great deal with degree, as expected. However, the "knee" of the function, roughly the region between $x = 1$ and $x = 10$, comes to contain more and more of the points as the order is increased, meaning that the origin and far end are well represented even by piecewise linears, but the intermediate ranges require refinement. The knee is also pushed outwards by $p$ refinement, meaning that the exponential die-off is delayed by an increase in polynomial order. This is very different behavior than is seen with other classes of functions.

3.2 Mesh Representation

Figure 3.2: The prism element and slicing the array of barycenters to produce a facet.

These meshes expand outwards from a central origin. There are two approaches we may take, both of which work in this case. In both cases we start with a central convex volume, which is either a simplex or a hypercube; any volume that generalizes to arbitrary dimension would work. From that central volume we build shells outwards by repeatedly expanding the surface of the central volume outwards by some spacing. In terms of scaling, if we use the hypercube we have $O(2^d)$ cells in the mesh. This is markedly better than the $O(n^d)$ one has with a regular grid, but is still not optimal. The simplex, however, has $O(d)$ cells in the final mesh. These meshes may be easily constructed out of prism elements in $d$ dimensions.

Our example meshes for this problem are inspired by work on meshes built to represent novel and multi-dimensional finite element computations in a way that

allows for generalized methods and geometries. These mesh technologies were inspired by the Sieve [87], which used algebraic geometry and topology concepts to define operations over the mesh, as well as by the progenitors of the Sieve interface. The data structures required for implementing such meshes in the general setting are discussed here. We use the Python programming language to realize these data structures, and show how a number of interesting techniques that are often quite difficult to implement are simplified by thinking about the structure of non-affine elements, like quadrilateral and hexahedral finite elements, in the tensor product context. We define some basic concepts from the Sieve mesh paradigm, and use this to describe a finite element scheme for these problems in general dimensions. The Sieve mesh provides an interface style that allows for an interesting query language on the mesh. It allows finite element computation with general geometry and general discretization. What we describe is a specialized recreation of the Sieve-type mesh in Python.

Some basic definitions are required here. A mesh entity is any shape in the mesh of any dimension. This includes vertices of dimension 0, cells of dimension $d$, or anything in between. We define the facets of a cell as the lower-dimensional mesh entities that surround the cell. We define a cell as a multidimensional array of vertices that make up the corners of that cell. The axes of this array correspond to each of the simplicial spaces in the tensor-product cell. The facets of the cell then become slices of this array. Any slice of this array becomes a valid mesh entity. This is apparent from the fact that the removal of a vertex from a simplex induces the facet of that simplex opposite from that vertex.

A mesh then becomes a collection of cells, with shared vertices. The other shared mesh entities between the cells are then induced from the shared vertices. Orientation of a particular mesh entity against an adjacent cell is also inferred from how the barycentric coordinates map into one another. This allows a number of difficult

problems in finite element computation, including the orientation of cells and facets, to be decided based upon the nesting of the tensor product barycentric coordinates. Given a set of tensor product barycentric coordinates on a cell, it is straightforward to determine whether those coordinates lie in the interior of the cell or on some mesh entity in the cover of that cell. Any zero entries in the tensor product barycentric coordinates correspond to rows or columns that may be sliced out of the vertex array; the resulting mesh entity is then the appropriate location for the barycentric coordinates. In fact, this is the same problem as translating the barycentric coordinates over a cell onto a facet. The only issue is determining the permutation of the barycentric coordinates, which may be done easily by investigating the permutation of the corners and matching barycentric coordinates together properly. The orientation of this permutation may be used to determine the orientation of the cell with respect to a facet. We define a section to be data living over this structure. Every piece of data is assigned a mesh entity and a coordinate on that mesh entity in terms of the entity’s tensor product barycentric coordinates. The data may be of different shapes (scalars, vectors, tensors) or types (integers, floats, etc.). For instance, it is very useful to have a global numbering over degrees of freedom; this may be done with a twin integer section to some DoF section. This framework allows us to quickly set up a generalized discrete function space on a mesh. Upon investigating each cell, one may

1. determine the barycentric coordinates of each DoF in the local function space

2. determine the coordinates’ mesh entity in the cover recursively

3. allocate data on that mesh entity at the appropriate coordinates

4. append any facets created to the cover

Note that we only construct subfacets when necessary with respect to the function spaces represented on the given mesh. This on-the-fly building of the mesh constructs only the mesh entities necessary for representing the field on the mesh. This allows the complexity of the mesh and of the function spaces used in the finite element computations to mimic each other exactly, even in the higher-dimensional case. The restrict closure operation therefore involves going to every entity in the cover of a cell and recursively translating all of the unknowns on the entity into the cell’s coordinates. Once all the unknowns are in the cell’s coordinates, they may be matched up with basis functions with barycentric coordinates on the cell and evaluated in the standard finite element framework by integration on the cell.
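To make the cell-as-array idea concrete, the following minimal Python/NumPy sketch stores a prism cell (triangle tensor interval) as a 2D array of vertex numbers, one axis per factor simplex, and obtains facets by slicing. The vertex numbering and the helper names are illustrative assumptions, not the thesis code.

import numpy as np

# Prism cell S_2 (x) I: rows index the triangle's vertices,
# columns index the interval's endpoints; entries are global vertex numbers.
prism = np.array([[0, 3],
                  [1, 4],
                  [2, 5]])

def facet_opposite_triangle_vertex(cell, i):
    # removing row i (a triangle vertex) induces the opposite quadrilateral facet
    return np.delete(cell, i, axis=0)

def end_cap(cell, j):
    # fixing one interval endpoint (a column) induces a triangular end-cap facet
    return cell[:, j]

quad = facet_opposite_triangle_vertex(prism, 0)   # vertices 1, 4, 2, 5
tri  = end_cap(prism, 0)                          # vertices 0, 1, 2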

3.2.1 Simple Radial Meshes

Figure 3.3: Simple radial mesh in 2D. The surface of the initial simplex is in red.

We describe simple meshes with a single radial center in d dimensions. We consider some (d − 1)-dimensional mesh of a surface surrounding the origin, the simplest being the surface of the d-dimensional regular simplex. This simplex may be constructed simply and recursively by placing a regular simplex of barycentric radius (d − 1)/d at −1/d on the x_d axis; the d-dimensional simplex is then induced by connecting this to a vertex at 1 along the x_d axis. This leaves the barycenter at the origin. The interior of this simplex is either left as-is, or barycentrically subdivided into d + 1 interior simplices, leaving a vertex at the barycenter. This allows for a vertex at the origin, which is important for representing functions that have some singularity in themselves or their derivatives at the origin. We then choose to extrude prisms from the faces of this simplex. The simplex is scaled to some inner radius, an outer simplex is created by scaling the initial simplex to some larger radius, and the corresponding faces of the inner and outer simplices are connected by prism cells. Note that each prism will have d adjacent prisms across (d − 1)-dimensional faces shaped like I ⊗ S_{d−2}. To visualize this, one can think of the 3D example (Figure 3.2), where a quadrilateral separates two adjacent cells. The above procedure may be repeated for any number of layers; the outer boundary of the resulting mesh will be a simplex. In practice we use the layer spacing arising from the optimal spacings derived in Section 3.1.
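A minimal NumPy sketch of the vertex construction is given below. The recursion produces a regular d-simplex with unit circumradius and barycenter at the origin (one standard choice), and the uniform layer radii are placeholders for the optimal spacings of Section 3.1; prism connectivity between consecutive layers is omitted.

import numpy as np

def regular_simplex(d):
    # vertices of a regular d-simplex, barycenter at the origin, circumradius 1
    if d == 0:
        return np.zeros((1, 0))
    base = regular_simplex(d - 1)
    verts = np.zeros((d + 1, d))
    verts[:d, :d - 1] = np.sqrt(1.0 - 1.0 / d**2) * base  # shrunk base facet
    verts[:d, d - 1] = -1.0 / d                           # base facet at -1/d on x_d
    verts[d, d - 1] = 1.0                                 # apex at +1 on x_d
    return verts

def radial_layers(d, radii):
    # scaled copies of the simplex surface; consecutive copies would be joined
    # by prism cells S_{d-1} (x) I
    surface = regular_simplex(d)
    return [r * surface for r in radii]

layers = radial_layers(3, radii=[1.0, 2.0, 4.0, 8.0])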

3.3 Quantum Mechanics

Electronic structure calculations allow the energetics of quantum systems to be modeled. These computations enable the quantitative estimation of chemical properties of atoms and molecules. These properties are derived from the solution of the Schrödinger equation, which gives both the electron distribution and the energy of the system. Consider a general form for the n-body time-independent Schrödinger equation:

−∆Ψ(x_1, ..., x_n) + V(x_1, ..., x_n) Ψ(x_1, ..., x_n) = E Ψ(x_1, ..., x_n)   (3.19)

with V(x_1, ..., x_n) = \frac{1}{|x_1 − x_2|} + ... + \frac{1}{|x_{n−1} − x_n|} − V_{field}, with all x_i in R^3 (realspace). These terms correspond to the Coulomb interactions between the electrons; V_{field} can be thought of as the Coulomb interaction of a single electron with the atom centers.

In an effort to forestall the curse of dimensionality, a large class of methods project all r_i onto a single realspace field, and then the resulting problem is treated as a nonlinear eigenvalue problem.

3.3.1 The Spectrum

The eigenspace of the time-independent Schrödinger equation consists of two distinct regimes: a positive real one and a negative real one. These regimes correspond to the free and bound states of the electrons in the system; the bound states have negative energy and the free states have positive energy. The lowest of the bound states is the ground state of the system, the lowest possible energy state, with the rest of the states being “excited”. Both the ground and excited states correspond to eigenstates of the above equations. Our experiments will examine how both the ground and excited states are captured by these approximation techniques.

3.3.2 Solving the Schrödinger Equation

Solutions of the Schrödinger equation are an industry unto themselves. There is a wide variety of methods for the solution of the full and approximated equations, used by practitioners in a variety of situations.

Realspace Methods

The vast majority of approaches to the Schrödinger equation model the high-dimensional interaction of the electrons in some averaged, realspace way, neglecting some order of electron correlation. An entirely different set of approaches to this problem comes with the density functional theory (DFT) methods, which

treat the electron interaction as a functional of the overall density of the electrons rather than electron-electron interactions.

Basis Sets

The bases used in standard computational chemistry methods are often some set of functions with a peak at the atomic centers. Standard choices include Slater orbitals [131], corresponding to one-electron wave-functions, or Gaussians [110]. Individual electron orbitals are then considered to be some combination of these one-electron functions. These tools have been used to great effect in electronic structure calculations for some time, but they do not constitute a robust approximation space. Recently, the numerical analysis community has moved to create methods for quantum chemistry. These may be based upon finite elements [36], extended finite elements (XFEM) using a finite element basis in addition to Slater orbitals [107], wavelets [111], or modified radial basis functions [41]. The notion of a true approximation space is therefore becoming more important in computational chemistry, especially for harder problems where the electron cloud may be divorced from the atom centers.

Configuration Space Methods

However, the construction of methods for the solution of the Schrödinger equation in full configuration space, that is, for functions in R^{3N} for N electrons, is difficult, but seemingly possible for small systems. There are interesting approaches that reduce the dimensionality and then use regular meshes in this reduced configuration space [158]. However, the most popular methods of all for these problems are sparse grids, which are more easily used adaptively.

Sparse Grids


Figure 3.4: 2D example sparse grids with two different truncations of the hierarchical space.

The notion of sparse grids was introduced by Smolyak [133] as a way of representing functions living in certain Sobolev spaces. Given some assumptions on the functions being integrated or approximated, these approximation schemes allow for the accurate representation of high-dimensional functions with a small number of unknowns compared to the naïve case. A good introduction is provided in [61]. Approximation spaces and quadrature rules on the interval I allow for the representation or integration of functions on that interval in one dimension. Suppose we have some approximation space V_I spanned by a set of n basis functions on the interval,

φ_i^I(x) for all i ∈ [0..n].   (3.20)

One could build a simple high-dimensional approximation space by taking the tensor product of this space d times, so that the associated set of basis functions spanning V_{I^d} is

φ_α(x_1, ..., x_d) = \prod_i φ^I_{α_i}(x_i)   (3.21)

for some multi-index α = (α_1, ..., α_d) with each α_i ∈ {0...n}. Note that the number of these basis functions is O(n^d), which is exceedingly expensive even for d of 2 or 3. Now, suppose that instead of just a single interval space, we had a multilevel set of finite function spaces V_I on the interval. A sparse grid consists of a hierarchical set of basis functions [155], such as plane waves or nested hat functions, over an interval. The classical nested hat functions of this type are shown in Figure 3.5.

Figure 3.5: A hierarchical basis over the interval built upon hat functions.

Then, consider truncated tensor products of these spaces. The simplest truncation is that the space is built as

V_n^S = \sum_α V^{α_1} ⊗ ... ⊗ V^{α_d}   (3.22)

over all α such that \sum_i α_i < n. Others, such as the more severe truncation shown on the right in Figure 3.4, are also possible. This truncation leads to spaces where the approximation of mixed derivatives dies off quickly. The use of sparse grids for small quantum-mechanical systems has been explored in a reasonably large body of work [60, 68, 77, 154]. The solutions to hydrogen and helium systems under conditions that are hard to handle using regular techniques have been attempted with good results. Theoretical justification for sparse grids in electronic structure was established by Yserentant [153], who showed that the regularity of the solution to the Schrödinger equation increases as the number of electrons increases. This provides justification for the technique as a way forward. The uses of sparse grids are not limited to quantum mechanics. They may be used to compress data for visualization [141], and to solve a number of necessarily high-dimensional problems efficiently, for instance those in finance [62]. Our goal is to study other low-complexity structures for discretizing the configuration space problem. It is our intention to treat the linear eigenvalue problem directly by sparsely meshing the space, paying careful attention a priori to the shape of the resulting eigenfunctions and treating atomic and molecular electron interaction carefully. These problems are discretized on the grids described in the previous sections using dimension-generalized finite element approaches.
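A small Python illustration of the truncation in (3.22): counting multi-indices α with \sum_i α_i < n against the full tensor-product count n^d. The function names and the particular sizes printed are illustrative assumptions.

from itertools import product

def truncated_count(n, d):
    # multi-indices alpha in {0, ..., n-1}^d with sum(alpha) < n
    return sum(1 for alpha in product(range(n), repeat=d) if sum(alpha) < n)

def full_count(n, d):
    return n**d

for d in (2, 3, 4):
    print(d, truncated_count(8, d), full_count(8, d))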

3.4 General-Dimensional Finite Elements

We define a family of simple, general-dimensional finite elements. These finite elements interlock with the meshing technology described in Section 3.2. However, we describe the local finite element spaces and the meshing separately, with the finite element section describing the mathematical notions used, and the meshing section describing implementation details.

3.4.1 Tensor Product Geometries and Elements

The simplex in d dimensions is a well-defined geometrical and topological object [73]. It is the simplest topological object in any dimension. It has d + 1 vertices, d(d+1)/2 edges, and so on, with \binom{d+1}{e+1} e-dimensional facets. We of course know the vertex V, the interval I, the triangle S_2, and the tetrahedron S_3 quite well as the 0D, 1D, 2D, and 3D simplices.

3.4.2 Barycentric Coordinates

We introduce, for the sake of exposition, the notion of the barycentric coordinates. Each barycentric coordinate has the properties that

b_i(x_j) = δ_{ij}   (3.23)

and that, for x within the simplex,

\sum_i b_i(x) = 1.   (3.24)

The barycentric coordinates on the simplex are merely the piecewise linear functions well known in finite element computation. Given this set of barycentric coordinates on the d-simplex, we may construct an equivalent set of barycentric coordinates on the tensor product cell. This construction may be done by taking the power set of barycentric coordinates on each subsimplex as the barycentric coordinates of the complete cell. Such elements transform in a general nonlinear fashion; one may think of the local element Jacobian as a composition of the Jacobians for each space at a given point.
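A minimal sketch of tensor-product barycentric coordinates for the prism S_2 ⊗ I. The reference triangle with vertices (0,0), (1,0), (0,1), the reference interval [0,1], and the function names are assumptions for illustration; a zero entry in either factor's coordinates indicates that the point lies on the facet opposite that vertex, as described in Section 3.2.

def triangle_barycentrics(x, y):
    # piecewise-linear barycentric coordinates on the reference triangle
    return (1.0 - x - y, x, y)

def interval_barycentrics(z):
    # barycentric coordinates on the reference interval [0, 1]
    return (1.0 - z, z)

def prism_barycentrics(x, y, z):
    # one tuple of coordinates per factor simplex of the tensor-product cell
    return triangle_barycentrics(x, y), interval_barycentrics(z)

tri_b, int_b = prism_barycentrics(0.25, 0.0, 1.0)  # lies on a facet in both factors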

3.4.3 Tensor Product Simplices

We define a tensor product cell as being, for some set of base simplices Si,

C = ⊗iSi. (3.25)

For instance, the prism element described above in d-dimensional space is

P_d = S_{d−1} ⊗ I.   (3.26)

The d-dimensional hypercube would additionally be

C_d = ⊗_{i=1}^{d} I.   (3.27)

The connection between tensor product elements and tensor product geometries is not often well stated. The standard uses of tensor product technologies for high dimensional problems are often limited to simple cases where the overall geometry is isomorphic to some cube. By constructing our elements out of these more general formulations, we may take intuitive notions in lower dimensions and use them to construct appropriate spaces in higher dimensions.

3.4.4 The Finite Element Basis

There are many options for what kind of element to use here. The barycentric-coordinate-centric construction of these cells invites a number of interesting options. For the sake of these experiments we develop Lagrange-type elements for these geometries. The two most attractive options are the Bernstein polynomials [99], which have a natural mapping into the tensor-product barycentric space in terms of polynomial order, and the standard Lagrange polynomials, which map naturally into the barycentric framework geometrically, based upon the positions of the nodes.

3.4.5 Lagrange Elements

One may define, trivially, the right reference simplex as having the barycentric coordinates ξ_i for i ∈ {1, ..., d + 1}, where

ξ_i = x_i,  i ≤ d;   ξ_i = 1 − \sum_{j=1}^{d} x_j,  i = d + 1.   (3.28)

The barycentric coordinates of the reference tensor product simplex are merely the power set of these. For nodal basis functions, we may define the nodes naturally in terms of the barycentric coordinates, and the nodal Lagrange basis functions may then be defined on these. We review the simplest procedure for constructing an order-n nodal approximation space: inversion of the Vandermonde matrix. The Vandermonde matrix V may be expressed as

V = \begin{pmatrix} ψ_0(x_0) & ... & ψ_m(x_0) \\ ... & ... & ... \\ ψ_0(x_m) & ... & ψ_m(x_m) \end{pmatrix}   (3.29)

for some set of basis functions {ψ} of size m. On the reference element we can just use the monomials of total order at most n per simplex, with the power set taken:

φ_α = \prod_i x_i^{α_i} for some |α| ≤ n.   (3.30)

In order for this to be invertible, there must be a one-to-one correspondence between the nodes and the monomials. Note that limiting the order per simplex as in (3.30) gives the monomial space over which the standard Lagrange elements are unisolvent on the simplex. The nodal polynomials are discovered by inverting the Vandermonde matrix; that is, we solve

V a^i = b^i   (3.31)

for all i, where b^i_j = δ_{ij}. That is, we solve for the monomial coefficients such that only one nodal value is nonzero. We can do this in general dimension.
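A minimal NumPy sketch of this procedure on the 2D reference triangle. The equispaced nodes, total-degree monomials, and function name are illustrative choices, not the thesis code; the general-dimensional, tensor-product version proceeds the same way per factor simplex.

import numpy as np

def lagrange_nodal_basis(n):
    # nodes: lattice points (i/n, j/n) with i + j <= n on the reference triangle
    nodes = [(i / n, j / n) for i in range(n + 1) for j in range(n + 1 - i)]
    # monomials x^a y^b with total order a + b <= n, one per node
    powers = [(a, b) for a in range(n + 1) for b in range(n + 1 - a)]
    V = np.array([[x**a * y**b for (a, b) in powers] for (x, y) in nodes])
    # column i of V^{-1} holds the monomial coefficients of nodal basis function i
    return np.linalg.inv(V), nodes, powers

coeffs, nodes, powers = lagrange_nodal_basis(2)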

3.4.6 Quadrature on Tensor Product Cells

Figure 3.6: Dubiner type quadrature on the triangle.

Figure 3.7: Grundmann-M¨oller type quadrature on the triangle.

Figure 3.8: Uniform lattice type quadrature on the triangle.

We want to be able to construct quadratures on these cells in order to be able to project functions into the space and assemble mass and stiffness matrices over the space. In this section we describe quadratures constructed over these spaces, issues of stability, and what may be done in order to alleviate the complexity and numerical stability issues involved. The problems in Section 3.3 require us to be able to evaluate

the basis functions against an arbitrary potential field, so we must assemble using quadrature. Collapsed quadrature rules (Figure 3.6) [50] take some quadrature rule on the d-hypercube, typically a tensor product of optimal interval rules, and “collapse” one of the facets into a point, giving a rule on the simplex. These rules are easily implemented given any one-dimensional rule, such as the Chebyshev rules, and may be easily scaled up in dimension. However, there are significant disadvantages to this construction beyond two or three dimensions. As the number of points grows as O(n^d) for n quadrature points in the one-dimensional rule, the construction quickly becomes untenable. Of course, the optimal one-dimensional rules have n points for polynomial degree 2n + 1. Another possible construction that is less efficient per order in low dimensions, but scales much better, is the Grundmann-Möller rules (Figure 3.7) [69]. These rules have a number of nice features with respect to their construction and higher-dimensional efficiency; they are constructed based upon combinatorial methods on the n-simplex. Unfortunately, while they are, in some sense, near-optimal in terms of the number of points on the simplex versus order, they have the drawback that they always have some set of negative coefficients for n > 1. In turn, they have non-optimal Lebesgue constants, and therefore the integration of singular functions like the potential fields is fraught with peril. Given the problems with both of these “optimal” methods, we choose to use the simplest lattice method possible (Figure 3.8), which will have more points in low dimensions, but won’t have the stability issues of the optimal rules, and won’t have the exponentially increasing number of points of the collapsed rule. Instead, it will have O(n^d/d!) points, which is far superior, even though n must be a constant factor larger.
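A minimal sketch of a collapsed rule in 2D, built from a 1D Gauss-Legendre rule. The reference triangle with vertices (0,0), (1,0), (0,1), the particular collapse map, and the function name are assumptions for illustration.

import numpy as np

def collapsed_triangle_rule(n):
    # 1D Gauss-Legendre rule mapped from [-1, 1] to [0, 1]
    x, w = np.polynomial.legendre.leggauss(n)
    x, w = 0.5 * (x + 1.0), 0.5 * w
    pts, wts = [], []
    for ui, wu in zip(x, w):
        for vi, wv in zip(x, w):
            # collapse the square onto the triangle: (u, v) -> (u, v*(1 - u)),
            # picking up the Jacobian factor (1 - u)
            pts.append((ui, vi * (1.0 - ui)))
            wts.append(wu * wv * (1.0 - ui))
    return np.array(pts), np.array(wts)

pts, wts = collapsed_triangle_rule(4)
assert abs(wts.sum() - 0.5) < 1e-12   # weights sum to the triangle's area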

3.5 Experiments

We demonstrate the applicability of the methods described above. We will test the approximability of exponential functions in high dimensions. We use the grading from Section 3.1 and variable order finite elements in order to show that we may resolve e−r in high dimension. We aim to see how well spaces of multiple orders n hold up to the increase in dimension d.

n ↓ d →     2       3       4       5
1         0.244   0.405   0.477   0.551
2         0.055   0.169   0.307   0.396
3         0.057   0.093   0.167   0.279

Table 3.1: Relative L2 errors with respect to polynomial order and dimension, using the numbers of degrees of freedom from Table 3.2.

n ↓ d →     2      3      4      5
1          34     45     56     67
2         130    215    321    448
3         289    635   1106   1764

Table 3.2: Number of unknowns in the approximation spaces with respect to polynomial order and dimension.

We use the generalized radial meshes described in Section 3.2.1, with 10 levels of prism elements on the outside, out to a radius of 20 at the simplex tips. We see that there is significant degradation with dimension, but that higher-order elements quickly make up this gap. Therefore, we expect that adaptive refinement in the order could be used to resolve the high-dimensional Schrödinger problem.

3.5.1 The Hydrogen Atom

The simplest atomic system is that of the Hydrogen atom. We aim to resolve the wavefunction of a single electron around a single atomic nucleus assumed here to be at the origin. The associated differential equation is therefore

−∆ψ − \frac{1}{|x|} ψ = Eψ.   (3.32)

The well-known behavior of the energies of the quantum states of hydrogen is, in atomic units:

E_n = −\frac{1}{n^2}.   (3.33)

The associated eigenfunctions may be split, with 0 ≤ l ≤ n − 1 and −l ≤ m ≤ l, as

Ψ_{nlm} = R_{nl} Y_{lm}   (3.34)

with R_{nl} being the normalized function consisting of e^{−r/n} multiplied by a generalized

Laguerre polynomial dependent on n and l, and Y_{lm} being a spherical harmonic. This gives us a very nice model problem with which to test the behavior of our methods. We will show the necessary conditions for approximating the ground-state and higher wavefunctions of hydrogen on these grids, and show why initial attempts at using first-order finite elements have problems resolving the energies well. The particular issue is in the angular component of the solution, and capturing this accurately on these grids requires adaptivity in the polynomial order.

3.5.2 The Hydrogen Spectrum


Figure 3.9: The recovered spectrum of the hydrogen atom for second through fifth order.

We see that we can capture the ground state increasingly well on the various grids. What is also exciting is that we may capture the excited states increasingly well. The improvement of the excited states separately from the ground state is due to the fact that higher order in the angular direction allows the angular part of the wavefunction, the spherical harmonics, to be increasingly well approximated.

3.6 Outlook

We have shown that we are able to study the adaptive behavior of simple finite element realizations of the time-independent Schrödinger equation eigenproblem. This problem, plus the techniques developed to study it, lead in several interesting directions. The high-dimensional problems in [60, 158] and the like would be a good starting point for extending the method to the problems of interest, namely those of high-dimensional quantum chemistry. These problems have two or more electrons and have solutions in R^6 or higher. The challenges include creating good meshes for the high-dimensional problems, where we really should match the cusps in the electron-electron potentials. Consider the problem of solving for ψ with

−∆ψ + \left( \frac{1}{|x_1 − x_2|} − \frac{2}{|x_1|} − \frac{2}{|x_2|} \right) ψ = Eψ   (3.35)

which corresponds to the spin-independent helium problem. The singularity is now three-dimensional; the whole set x_1 = x_2 is singular. This is still a subfacet of the whole configuration space if matched properly. One approach to this high-dimensional meshing problem that is under consideration is the use of tensor products of regular meshes. This would make it easy to handle the cusps, and would still be reasonably efficient while leaving open the possibility of adaptivity. Other potential extensions include giving up on the high-dimensional problem and using the analysis of the error for low-dimensional methods. This is also quite an interesting possibility, as adaptive finite elements for molecular problems are often refined to capture the electrostatic potential. The different sort of refinement necessary for the electronic structure problem is something that we could emphasize as an interesting point of argument in the future; the refinement strategies described here are immediately applicable to these problems. The work on flexible mesh representation also has a number of interesting directions in which it may go. The construction of interesting meshes with various sorts of cells is a difficult software problem. The h-p finite element community has dedicated a fair amount of time to this problem [10], but many solutions are still inflexible. Having a

flexible geometry and function space representation library, as was developed for this work, is a first step towards creating intuitive methods for problems with various element geometries. Perhaps the most interesting way we may extend this is to integrate additional insight from other ways of representing high-dimensional functions, namely sparse grids. Sparse grids may be related, in terms of regularity assumptions, to the serendipity elements [139]. Therefore, a natural next step is to build “sparse” element spaces using serendipity-like elements. These element spaces may either be of Lagrange sort, where the number of nodes is truncated, or of Bernstein sort, where the mixed polynomial order is truncated.

CHAPTER 4 EFFICIENT CLASSICAL DENSITY FUNCTIONAL THEORY

Methods for the efficient calculation of physical quantities of electrostatic systems on the protein scale require models of fluids consisting of several interacting components, such as a solvent and ions. Classical density functional theory (DFT) is a generalized framework for considering the behavior of such fluids that takes into account a number of different physical interactions in an overall energetic sense. Classical DFT models provide qualitatively accurate implicit models for interesting complex fluids. The explicit treatment of ions in the same problems of interest has been handled previously using grand canonical Monte Carlo methods [28, 79], which allow for explicit treatment of systems with an undetermined number of particles. However, implicit models based upon generalized energetics allow the problem to be treated without many of the feasibility issues of Monte Carlo simulations. We consider the implicit calculation of electrostatics at the nanoscale. The goal is to capture the concentration profiles of these systems in the high-concentration limit. Poisson-Boltzmann-type formalisms work well when the interaction of the particles is purely electrostatic. However, in the limit of high ion concentrations this has proven to be an insufficient model of the ion densities: as the concentrations increase locally, the hard-sphere repulsion of the ions themselves becomes a dominant energetic effect. Classical density functional theories are able to handle hard-sphere repulsion, and allow for a more complete notion of how the species interact than standard Poisson-Boltzmann-type equations.

Previous numerical studies of ion-channel-like systems using the Poisson-Nernst-Planck formalism [27], for which Poisson-Boltzmann is the steady-state solution, have been shown to exhibit qualitatively wrong behavior. There have been recent attempts to meld Poisson-Nernst-Planck formulations with classical DFT formulations [64], which was the starting point for the work described below. Classical density functional theories have been developed in order to resolve these shortcomings in a generalized way. At a broad level, classical density functional theories encapsulate the interplay between the repulsion and attraction between ions in solution, represented implicitly as densities. What these formulations end up resembling is a generalized Poisson-Boltzmann-like system where, given chemical potentials µ_i[ρ, φ] for all ion species i and some electrostatic potential φ,

−∆φ = \sum_i e µ_i.   (4.1)

When the system is in equilibrium, ∇µ_i = 0 for all i. In the bulk fluid [114] and reference density [65] formulations, one may split µ_i into

µ_i = µ_i^{bath} − µ_i^{ext} − µ_i^{ex}   (4.2)

where µ_i^{bath} is the chemical potential of the ion species in the bath, which may be determined analytically. The rest of the terms may be seen as an expansion off of this. µ_i^{ext} is concentration-independent and arises from an externally imposed field. The µ_i^{ex} term is the focus of this chapter, and consists of several terms dealing with the hard-sphere (HS), screening (SC), and electrostatic contributions:

µ_i^{ex} = µ_i^{HS} + µ_i^{SC} − z_i e φ.   (4.3)

The hard-sphere part of the potential depends on the ion radii R_i and is quickly numerically computable in the periodic case using fast Fourier transform-based convolutions.

The concern of this chapter is the fast computation of µ_i^{SC}, which is computed as

µ_i^{SC} = µ^{ES,ref} − \sum_j \int (c_{ij}^{(2)}(x, x') + φ_{ij}(x, x')) ∆ρ_j(x') dx'   (4.4)

where c_{ij} + φ_{ij} are pairwise interactions dependent on the radii and charges of the ions. ∆ρ_i = ρ_i − ρ_i^{ref} is the difference between the actual ion density and that of the reference fluid, which has electrostatic screening effects averaged out.

X ρi(x)Ri 1 R(x) = + (4.5) X 2Γ(x) ρi(x) which takes into account the average radius of an ion at a given point, as well as 1 the screening length 2Γ(x). Γ(x) is the mean spherical approximation [26] screening parameter. This parameter depends nonlinearly on the local ion densities. There has been some recent effort to make the parameter analytical rather than numerical [63].

We assume that max_x R(x) ≈ λ_D, the Debye length:

λ_D = \sqrt{ \frac{ε_r ε_0 k T}{ \sum_{i=0}^{N} ρ_i q_i^2 } }   (4.6)

which is a reasonable approximation to the MSA screening length 1/(2Γ) for small ion concentrations such as those in the bulk. We may also say that

\min_x R(x) ≥ \min_i R_i.   (4.7)

4.0.1 Contribution

The contribution of this thesis is to introduce two ways in which ρ^{ref} may be efficiently computed. The original work on this subject [88] hinted that this computation was the one stumbling block to having an efficient method for this problem at sizes of interest. The problem may be phrased as the application of a kernel

ρ^{ref}(x) = Kρ̄ = \int_{R^3} \frac{θ(|x − x'| − R(x))}{\frac{4}{3}π R^3(x)} ρ̄(x') dx'   (4.8)

with θ being the Heaviside function and ρ̄ a locally charge-averaged version of ρ. The original attempt at tackling this computation in realspace relied on spherical quadratures. However, this was fraught with problems given the added complexity caused by the varying window size, and the spherical quadratures often ended up having O(1) error. Spectral quadratures were then investigated for this problem. Efficient computation in the case of constant R is possible using fast Fourier transforms. The Fourier-space representation of the kernel application is

Kρ̄(x) = \int_{R^3} e^{iξ·x} \hat{K}^x(ξ) dξ = \int_{R^3} e^{iξ·x} \, 3\left( \frac{−\cos(|ξ|R(x))}{(|ξ|R(x))^2} + \frac{\sin(|ξ|R(x))}{(|ξ|R(x))^3} \right) \hat{\bar{ρ}} \, dξ.   (4.9)

It should be noted that this is not a convolution, as the Fourier-space representation of the kernel depends on both ξ and x; the result calculated at a given x by convolution is only valid at that x. We will refer to this as a pseudo-convolution. The efficient computation of this quantity is our goal.
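For constant R the pseudo-convolution reduces to an ordinary convolution and can be applied with FFTs; this is the verifiable test case used later. A minimal NumPy sketch on a periodic cube is given below. The grid shape, box length, function name, and the omitted physical normalization are assumptions for illustration.

import numpy as np

def reference_density_constant_R(rho_bar, L, R):
    # rho_bar: periodic n x n x n array; L: box length; R: constant radius
    n = rho_bar.shape[0]
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    kR = np.sqrt(kx**2 + ky**2 + kz**2) * R
    with np.errstate(divide="ignore", invalid="ignore"):
        K_hat = 3.0 * (np.sin(kR) / kR**3 - np.cos(kR) / kR**2)
    K_hat[0, 0, 0] = 1.0   # limit of the kernel as |xi| -> 0
    return np.real(np.fft.ifftn(K_hat * np.fft.fftn(rho_bar)))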

4.1 Problem Setup and Parameters

We restrict ourselves to domains that are periodic in all three dimensions. This makes the application of Fourier-space methods for the reference density and several other of the associated terms straightforward. These domains correspond to several test problems, and to physical systems of interest. The systems studied here correspond to the behavior of the ion species against a hard wall, as a test, and to a reduced model of an ion channel. The particular ion channel being explored is a Ca^{2+}-selective channel; it is believed that this selectivity may be explained by its effective radius and the hard-sphere interactions with other species in the channel. In all, we care about the behavior of four species in the ion channel problem. The parameters for these species as used in these experiments are:

Table 4.1: Species parameters.

                 Sodium   Chlorine   Calcium   Oxygen
charge (e)         1+       1-         2+       0.5-
radius (Å)        1.0      1.81       0.99      1.4
ρ_bath (Mol)      0.1      0.1        1e-4       –

The other physical parameter used in the model is the temperature, T = 305 K.

4.1.1 The Hard Wall Case


Figure 4.1: The hard wall test problem has an excluded region (gray) and a region of ion accessibility (white) and the specified dimensions.

The main test problem for this method is the interaction of two equally charged ion species of slightly different radius against a hard wall. This test problem only considers a saline solution, so the chlorine and sodium ions are the only ones used here. The hard-wall case allows for the one-dimensional model to be used in order to test the full 3D code. This test case is defined as being a 2 nm by 2 nm by 6 nm periodic box of water with an exclusion region representing a hard wall at the half-way point. The behavior of the ion species at the wall may be compared with the behavior of the ion species in an equivalent one-dimensional model, and is highly dependent on the hard-sphere repulsions at the wall surface.

4.1.2 Reduced Model for the Ion Channel


Figure 4.2: The ion channel test problem.

The surface of an ion channel molecule is quite complex and has only recently been resolved with any reasonable detail from the crystallography perspective. A reduced model [53] is quite useful for studying the selectivity of a channel given ions of various widths and charges. This reduced model has only a few parameters:

1. The bath width, height, and depth

2. The ion channel radius and length

3. The manifold radius and length

Using only these parameters, it may be shown that the ion channel displays selectivity: certain ions, even with bath concentrations much lower than those of others, preferentially inhabit the channel. This fact is what makes biological ion channels function as they do, and the ability to display this with a reduced model is fairly profound. In particular, one sees the preferential occupation of the channel by Ca^{2+}, even at the much smaller concentrations shown in Table 4.1. Oxygen is treated as an ion species, but this is a simplification: the actual protein geometry of the channel includes relatively free-floating residues containing strongly partially-charged oxygen atoms. There are eight such residues, so the density may be normalized at each step; the oxygen “ions” are restricted to being inside the channel, and their distribution rearranges in response to the presence of other, unconstrained ions in the channel.

4.1.3 Numerical Approaches

As the reference density is a model approximation that merely serves as the basis for a perturbative expansion, a great degree of latitude may be given to its numerical approximation. Therefore, we explore the space of potential approximations and simplifications and use them to approximate the quantity numerically. The naïve approach was tried initially, and while it worked for the particular problem, it took weeks to converge due to the reference density calculation.

Dense Linear Algebra

We must note that, in the limit R(x) → λ_D ≈ L for domain length L, the realspace operator in the calculation described above will be dense. The Fourier-space computation of a single point of the kernel is, as well, a dense computation. Therefore, analogies between fast dense linear algebra and the application of this kernel are valid. Dense linear algebra is especially amenable to optimization based upon memory locality, as shown by optimized libraries [150] and the application of novel shared-memory architectures to the problem. The first attempt at computing the reference density used this type of architecture to compute the field, and is described in Section 4.4. In particular we look into approaches for optimizing dense computations in the general-purpose GPU computation context. The GPU optimization of dense generalized linear algebra has been studied [15, 95]. By looking at techniques for tuning the problem to particular hardware, we can show a vast initial improvement in performance on the GPU architecture that gets steadily better as we optimize further.

Pseudodifferential Operators

There is a natural relation between the calculation of the application of this and similar kernels and the theory and computation of pseudo-differential operators (PDOs) [138,142]. A PDO L of order k is a differential operator for some multi-index α, |α| = k such that the operator may be applied as either

L u(x) = \sum_α a_α(x) D^α u(x)   (4.10)

with

D^α = \prod_{i=1}^{d} \left( −i \frac{∂}{∂x_i} \right)^{α_i}   (4.11)

or

L u(x) = \int_{R^3} S(x, ξ) \hat{u}(ξ) e^{iξ·x} dξ   (4.12)

in Fourier space, where S(x, ξ) = \sum_i^d a_i(x) ξ^{α_i} is known as the symbol of the operator. These operators may be considered a generalization of differential operators, and their theoretical study is of great importance for PDEs. In terms of numerical analysis, the Green's functions of various differential operators are often expressed as negative-order PDOs for the sake of the application of fast solvers or the development of fast methods for these problems. Note the similarity between these operators and the operator we are trying to apply above: the equivalent notion of the symbol of our operator is the kernel described in (4.9). A number of fast methods for PDOs of various types have been explored in the literature. For the constant-coefficient case, fast Green's-function transforms like the Fast Multipole Method (FMM) [67], the wavelet transform [22], and Ewald summation [45]

have proven to be their own industry. Fast transforms of various sorts have also been developed for the computation of classes of PDOs without assuming constant coefficients in the operator. These include the use of special bases, manipulation of the symbol [47], and radial/angular splitting [13]. The simplest methods [92] are based upon some, potentially approximate, splitting of the symbol into α_i(x) and β_i(ξ), computing the application as

\sum_i α_i(x) \int_{R^3} e^{−iξ·x} β_i(ξ) dξ   (4.13)

which makes the problem solvable by the application of a small number of convolutions multiplied by realspace functions. One may also look into other ways of decomposing the kernel based upon its dependence on x and ξ. We have decided to exploit the particular structure of the kernel above in a way inspired by fast methods for PDOs but distinct from them. We can note that the structure of the integral we have is similar to the application of a continuous histogram. These types of applications will inform our fast method for the problem.

4.2 Parallel Algorithm

The efficient numerical calculation of classical density functional theory was explored previously by our coauthors [88]. They were able to outline efficient algorithms for the computation of all relevant quantities for the problem, as well as create a fairly complete software package built on top of the PETSc library [108] to put the model into practice. The major hurdle that needed to be overcome was the efficient calculation of the reference density. There are several terms in the model that require the use of convolutions. However, these have homogeneous radii across the domain and did not create any computational difficulty. The introduction of the parameter dependent on the local

concentrations caused this convolution-based approach to fall down. The extremely fragile numerical methods required to make this complicated coupled system converge therefore had to be tested over weeks, rather than anything resembling a reasonable testing cycle. This made the development of the method as a reliable contender for classical DFT calculations somewhat difficult. The first attack came from the massively parallel side of things; the second attack was analytical and is laid out in the next section. The need for a massively parallel algorithm for a problem on such a small domain is a bit counterintuitive. As we have noted, the complete summation of the problem requires O(n^2) complexity for n = (L_x L_y L_z)/(h_x h_y h_z) unknowns, with domain sizes corresponding to the L and grid spacings corresponding to the h. This is a prohibitive cost considering the rest of the algorithms, drastically limiting the resolution of the entire problem despite being just one term. However, modern desktop computers contain hardware able to handle such highly structured computations quickly and straightforwardly up to a fairly large domain size. Modern GPU hardware allows us to bridge the gap between small serial runs and very large problems requiring supercomputers. Computing using GPUs allows for massively parallel computation without major infrastructure investment. In addition, the GPU algorithm allows us to sanity-check the better-scaling approximation algorithms we develop for this field. Our integral operator application is essentially the classic problem of parallel matrix-vector multiplication. In fact, the action of the kernel K̂^x(ξ) may be expressed as a matrix-vector product, and the matrix-vector product (matvec) is a very common target for serious and often architecture-specific tuning. Ways of optimizing dense matrix operations on the GPU have been explored as the computational platform has gained momentum in the sciences. There are some advantages we have over the standard dense matvec problem which allow our scaling to be better. The key difference for our particular problem

is that the dense matrix representing the kernel need never be explicitly assembled. This eliminates much of the memory traffic that occurs in a matvec, so that only the vector quantities ρ̄ˆ, R(x), and the final answer ρ^{ref} are ever transferred between the GPU and the CPU. This gives us a vast advantage in terms of the amount of computation we have per unit of data. The transfer time between the GPU and CPU is typically a huge bottleneck in making computation on the GPU efficient, especially in the case where there is substantially more data than computation. This is thankfully not the case in the computation we are attempting here. However, our disadvantage is that entry-specific state must be computed on the GPU during the unassembled application process. Examples of this state include the values of ξ at each gridpoint. This requires integer operations on the GPU, which are expensive compared to floating point operations and should be avoided. Being able to amortize this computation requires notions of spatial locality in both the x and ξ variables, making the problem more nontrivial than it appears on the surface.

4.2.1 GPU Computation


Figure 4.3: Schematic of the GPU’s processor and memory layout.

The Graphics Processing Unit (GPU) architecture is similar to that of the classical Single-Instruction-Multiple-Data (SIMD) machine that was popular as the model for supercomputers at the dawn of high performance computing. These machines had a number of parallel processors working in lock-step, doing the same operations on different inputs. The GPU generalizes this model by having scheduling and data-layout services that make a lot of concurrency issues transparent. In particular, our target API is the CUDA [104] architecture, which is built to target the NVIDIA line of graphics processors. We could have targeted the more generalized OpenCL [134] framework, but have chosen to neglect it for the time being; a lot of the same lessons apply, as the two are similar in layout. It should be noted that CUDA and OpenCL are still very much in development, and that their ease of use has increased greatly in the past year. That being said, the description below continues to apply.

Programming Model

If we look at the layout of the hardware (Figure 4.3) in terms of the organization of the processing units, each GPU is organized into streaming processors (SPs). Each SP is a member of a streaming multiprocessor (SM). The threads in an SM each perform the exact same operation at each step. In the present generation hardware, each SM consists of 16 SPs. Each SM also features a few special function units specialized for trigonometric operations and double precision arithmetic. This maps to the programming interface by having threads and warps. A thread may be thought of as a single process, and a warp may be thought of as a bundle of processes that are scheduled together. The practitioner doesn’t directly handle warps, but they are handled in a more abstract sense through threadblocks, which are effectively scheduled concurrently on the same SM. The scheduler places the warps on individual SMs and the program is executed much as it might be on a serial machine, with interruption due to any number of reasons prompting the next set of ready threads to run.

Memory Layout

The memory layout, from a programmer's perspective, consists of shared memory, global memory, constant memory, and texture memory. For the sake of this problem we will focus on the global and shared memory (Figure 4.3). Shared memory is accessible from within a single threadblock. These memories are small but low-latency, and allow for the sharing of data between the threads of a threadblock. They may be considered the analog of a cache in parallel architectures where local memory management isn't as explicitly the responsibility of the programmer. Global memory is randomly accessible from any thread. This memory is large and relatively high-latency. Entire warps may batch aligned memory requests, which improves the average access time by allowing for large numbers of memory accesses in parallel. The performance characteristics that this model exposes allow the user to achieve either terrible or wonderful performance.

4.2.2 The Algorithm


Figure 4.4: Diagram of the on-GPU part of the algorithm. Note the division of ρ̄ˆ among the threads, the per-threadblock partial summation to ρ^{ref}_{(X,b)}, and the final complete summation per segment of realspace, ρ^{ref}_{(X,Ξ)}.

The algorithm proceeds in two definite steps, which may be roughly described as a Fourier transform and an all-to-all reduction. The integration of these two steps with the GPU computing model is our main concern here; it is a balancing act between the amount of caching we may do and the amount of precomputation we may do. The first step is to compute the vector ρ̄ˆ(ξ) from the vector ρ̄(x) by a fast Fourier transform. This may be reliably completed either on the CPU or on the GPU. Doing it on the CPU side allows for larger problem sizes, as both ρ̄(x) and ρ̄ˆ(ξ) must be stored at the same time and the global memory of the GPU is often smaller than the CPU-side memory. In either case we eventually have ρ̄ˆ on the GPU. Doing it on the CPU also means that ρ̄ˆ(ξ) may be transferred to the GPU and used in pieces in the second stage of the algorithm. Some additional precomputation may also be done and saved in the shared memory per threadblock in order to further speed up the algorithm. The second part of the algorithm proceeds in two distinct stages, accomplished by the invocation of two different kernels. The two stages are scheduled one after another and act on rectangular blocks of realspace, Ω_X (Figure 4.4, dark red). This process is repeated until ρ^{ref}(x) is calculated for the whole of Ω. The first stage is kernel application. Given some subdomain Ω̂_Ξ of Ω̂ (Figure 4.4, blue) that fits in global memory, the product K̂^x ρ̄ˆ(ξ) for each ξ ∈ Ξ is computed for each x in Ω_X, a subdomain of Ω that also fits into memory. The organization of the first stage is that each threadblock has some subregion B of Ω̂ (Figure 4.4, in shades of blue). The per-entry application is then done by striding the threads in the threadblock (Figure 4.4, in yellow) over the entries in Ω_X and having them compute the kernel application for each b ∈ B in lockstep, forming ρ^{ref}_{(X,B)}. This choice was made because the case where each thread in the block accesses the same value in shared memory at the same time is optimized in hardware. The second stage is a reduction. The per-block ρ^{ref}_{(X,B)} are contracted over the blocks to form ρ^{ref}_{(X,Ξ)}, representing the partial solution for Ω_X and Ω̂_Ξ. The final reduction sums the various ρ^{ref}_{(X,Ξ)} entries into the complete ρ^{ref}. As this data is now most likely off the GPU, it is probably best done on the CPU in order not to incur any additional data transfer. However, if Ω_X = Ω, the case where all realspace data fits on the GPU, this consideration is unnecessary.
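A serial NumPy sketch of this blocked structure is given below: the outer loop plays the role of the realspace blocks Ω_X, the inner loop the Fourier blocks, and the per-block partial sums stand in for the per-threadblock accumulators. The block sizes, the flattened point lists, the function name, and the omitted normalization are illustrative assumptions.

import numpy as np

def pseudo_convolution_blocked(rho_bar_hat, xi, x_pts, R, x_block=64, xi_block=512):
    # rho_bar_hat: (m,) Fourier coefficients; xi: (m, 3) frequencies;
    # x_pts: (n, 3) realspace points; R: (n,) screening radii
    n = x_pts.shape[0]
    rho_ref = np.zeros(n)
    for xs in range(0, n, x_block):                       # realspace block
        xe = min(xs + x_block, n)
        acc = np.zeros(xe - xs)
        for fs in range(0, xi.shape[0], xi_block):        # Fourier-space block
            fe = min(fs + xi_block, xi.shape[0])
            kR = np.linalg.norm(xi[fs:fe], axis=1)[None, :] * R[xs:xe, None]
            with np.errstate(divide="ignore", invalid="ignore"):
                K = 3.0 * (np.sin(kR) / kR**3 - np.cos(kR) / kR**2)
            K = np.where(kR == 0.0, 1.0, K)               # |xi| = 0 limit
            phase = np.exp(1j * x_pts[xs:xe] @ xi[fs:fe].T)
            acc += np.real((K * phase) @ rho_bar_hat[fs:fe])
        rho_ref[xs:xe] = acc
    return rho_ref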

4.2.3 Optimization

The reduction stage has been shown to contribute less than 1% of the runtime, so its optimization is less important than that of the first part. The optimizations below involve observations about what the GPU does well and poorly, tweaks meant to mitigate shortcomings of the architecture, and a performance model for the algorithm that provides intuition as to how it might perform.

We precompute the per-dimension frequencies ξ_i and |ξ_i| for each point. This reduces the use of integer math and of the square root function, which results in significant performance gains. These values may be reused for several applications of the kernel for different Ω_{X_i}. There is an obvious tradeoff between storing these frequencies and the amount of ρ̄ˆ that can be accommodated at the same time; however, tuning shows that there is a definite advantage to this precomputation, which we explain below. To tune, we have the following variables. We suppose that there are N gridpoints. We have broken the computation into N_kernels = N^2 / (N_Ξ N_X) invocations of the kernel, each covering N_Ξ entries in Fourier space for N_X entries in realspace. Each kernel launch has N_blocks threadblocks with N_threads threads each, with N_ξ entries of ρ̄ˆ in each block B. The accumulation phase uses N_accum accumulators over each block. Using these, the per-threadblock flops may be described as:

• 122 N_ξ N_X for kernel application

• 18 N_ξ for precomputation

This estimated number of flops for kernel application uses 24 flops for sin() and cos() calls, which we arrive at using a simple benchmark. The required global memory use per kernel, in terms of 32-bit floats, is:

• N_X to store R

• N_Ξ to store ρ̄ˆ

• N_blocks · N_X to store ρ^{ref}_{(B,X)} for all threadblocks

The required shared memory use per threadblock is:

• N_ξ for ρ̄ˆ

• 3 N_ξ for precomputation

The required data traffic from the CPU to the GPU per kernel is merely the size of the data:

• N_X to transfer R to the GPU

• N_Ξ to transfer ρ̄ˆ to the GPU

• N_X to transfer ρ^{ref}_{(X,Ξ)} back

And the required global memory accesses per threadblock are

• N_ξ accesses to ρ̄ˆ

• 3 N_ξ accesses to the precomputed values

• N_X writes to ρ^{ref}_{(B,Ξ)}

It’s a natural assumption that we want NΞ and NX to be sufficient to fill global memory. However, we want the number of blocks to be minimized as they increase storage by a large factor in order to increase concurrency, meaning we want to max- imize Nξ for a given size of NΞ. Nξ is limited by the size of the local memories. It is a standard observation that one wants a huge number of threads for GPU computation in order to get efficient scheduling. However, the number of threads is limited to less than NX. We also gain from the precomputation step when we 104 increase NX. This is because the cost of precomputation is amortized for as many entries it may be used for.

Therefore, the seemingly reasonable approach is to minimize N_blocks and maximize N_threads, N_X, N_ξ, and N_Ξ. At this point, these heuristics have been used to hand-tune the kernel. The results are shown below.
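The counts above can be collected into a small cost model for hand-tuning; the sketch below is illustrative only (the constants 122 and 18 and the float counts mirror the estimates in the text, and the function name is not from the thesis code).

def kernel_cost_model(N, N_X, N_Xi, N_blocks, N_xi):
    flops_per_block = 122 * N_xi * N_X + 18 * N_xi   # kernel application + precompute
    global_floats = N_X + N_Xi + N_blocks * N_X      # R, rho-hat, per-block partials
    traffic_floats = 2 * N_X + N_Xi                  # CPU <-> GPU transfers
    n_kernels = N**2 / (N_Xi * N_X)                  # invocations covering the N^2 work
    return flops_per_block, global_floats, traffic_floats, n_kernels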

4.3 Initial Results

An easily verifiable test problem for this algorithm is the case of constant R_{SC}, which may be computed efficiently on the CPU using the convolution theorem. The test system is an Intel Core 2 Duo E8400 3.0 GHz with 4 GB of RAM. The GPU is an NVIDIA GTX285 [70] with 1 GB of global memory and 240 SP cores, organized into SMs of 8 cores with 16 kB of shared memory each.

Figure 4.5: CPU and GPU performance for calculating ρref .

The kernel application was implemented in CUDA using PyCUDA [85], a Python wrapper for the CUDA language that has a large number of built-in utilities for basic operations and program setup. This allowed for easy prototyping of the system. PyCUDA acts as a just-in-time (JIT) compiler, which means that many kernels with different parameters for the sizes of the caches used for ρ̄ˆ and the precomputed frequencies may be generated and tweaked automatically, allowing us to explore the performance characteristics of the overall computation. It should be noted in Figure 4.5 that the complexity of the two algorithms is the same. However, the GPU implementation allows for a speedup of two orders of magnitude, allowing much larger problems to be treated. For the n = 32 per-dimension grid, these calculations yield a flop rate of approximately 124 GF. This card has a theoretical peak, neglecting the special function unit, of approximately 660 GF, which clearly shows that there is room for improvement. We are currently investigating an out-of-core solution for the GPU, in which the reduction is moved from the GPU to the CPU, possibly resulting in much higher throughput. The throughput on the GPU is already quite impressive compared to the CPU. However, there may be other exploitable features of this computation that we have not studied, including dividing up the operations in more interesting ways between the CPU and GPU. We also note that the fact that the GPU computation must be done in single precision gives us a persistent error of 10^{-5} for all problem sizes. This error persists when using stable summation methods, such as Kahan summation [84], and double-precision accumulators. Therefore, the fast direct approach approximates the quantity to some error determined by the numerical precision available. This is fine, as the reference density is a model parameter. However, the complexity is still daunting.

4.4 Fast Algorithm for the Reference Density

The parallel Fourier-space method based upon techniques similar to fast dense linear algebra was able to make the problem tractable for reasonable problem sizes, but quickly ran into the same O(n^2) curse as the serial implementation. Further experiments required bigger grids in order to resolve the channel sufficiently, so a fast approach became necessary. By fast here we mean O(n log(n)), given our reliance on convolution algorithms with this complexity. We show in the following section how the structure of this operator application, as inspired by methods for approximating PDOs, leads to such an algorithm. Additionally, the approximation may be made more accurate than the present CUDA direct implementation simply by the fact that all its calculations are done in double precision rather than the single precision required for full parallel performance from the GPU. However, as the quantity is already an approximate model parameter, both approaches are valid in this respect.

4.4.1 Parameter Gridding

The dependence of the kernel on both ξ and x, and its ramifications for the applicability of fast convolution algorithms, seem daunting until one considers the fact that the only dependence on x in the kernel is contained in the numerical parameter R(x). Recall from earlier that the quantity R(x) was derived from the MSA approximation and may be easily bounded in magnitude between the smallest ion radius and (4.6). This fact provides us with an interesting option, in which we create some parameter grid in R(x) such that for all representative values of R(x) some error bounds are satisfied. This leads us to a simple change of variables for the kernel; namely, we choose some R^* with

K̂^x(ξ) → K̂^{R^*}(ξ) = 3\left( −\frac{\cos(|ξ|R^*)}{(|ξ|R^*)^2} + \frac{\sin(|ξ|R^*)}{(|ξ|R^*)^3} \right)   (4.14)

when R^* = R(x). Then, we may calculate the application of the convolution kernel for each R^* such that some R(x) = R^*. Note that if we had unique values of R(x) for all x ∈ Ω, the algorithm would still be of the same complexity as before. However, this is where the fact that we may approximate the reference density comes in. In practice we may take some discrete set {R_i} with R_{min} ≤ ... < R_i < R_{i+1} < ... ≤ R_{max} and use piecewise-polynomial interpolation between R_i and R_{i+1} for any R(x) with R_i < R(x) < R_{i+1}. We define n_R = |{R_i}|. This reduces the algorithm to n_R applications of \int_{R^3} e^{iξ·x} K̂^{R_i}(ξ) dξ, which is just a single inverse transform for each i.

The algorithm is therefore to determine all $R_i$, and in each window calculate the application of $K_{R_i}$ and $K_{R_{i+1}}$. Then, for each $x$ in $\Omega$, we set

$$\rho^{\mathrm{ref}}(x) = \frac{R_{i+1} - R(x)}{R_{i+1} - R_i}\,K_{R_i} + \frac{R(x) - R_i}{R_{i+1} - R_i}\,K_{R_{i+1}} \qquad (4.15)$$

or, more simply, the linear interpolation between the two computed values of $R$ based upon $R(x)$. The ramifications for the calculation of the quantity cannot be overstated. This reduces the overall runtime from $O(n^6)$ to $O(n^3\log(n)\,n_R)$. Our goal from now on is to determine the required $n_R$ from the analytical properties of $\rho$ and $R(x)$ by some estimate of the error based upon these physical quantities. This means that the complexity of the algorithm is dependent on the range of the parameters as well as the size of the domain.
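A schematic NumPy sketch of this parameter-gridding idea is given below; the kernel symbol follows (4.14), but the array shapes, the box dimensions, and the interpolation bookkeeping are illustrative assumptions rather than the production implementation.

import numpy as np

def k_hat(xi_mag, R):
    # Fourier symbol (4.14) for a fixed screening radius R*
    xR = xi_mag * R
    with np.errstate(divide="ignore", invalid="ignore"):
        val = 3.0 * (-np.cos(xR) / xR**2 + np.sin(xR) / xR**3)
    return np.where(xR == 0.0, 1.0, val)      # limit of the symbol as |xi| R -> 0

def reference_density(rho_bar, R_of_x, R_grid, box):
    # Apply the kernel once per R_i by FFT, then interpolate linearly in R, eq. (4.15).
    n = rho_bar.shape
    freqs = [2.0 * np.pi * np.fft.fftfreq(n[d], d=box[d] / n[d]) for d in range(3)]
    kx, ky, kz = np.meshgrid(*freqs, indexing="ij")
    xi_mag = np.sqrt(kx**2 + ky**2 + kz**2)
    rho_hat = np.fft.fftn(rho_bar)

    # one inverse transform per R_i:  O(n^3 log(n) n_R) overall
    conv = np.array([np.fft.ifftn(k_hat(xi_mag, R) * rho_hat).real for R in R_grid])

    i = np.clip(np.searchsorted(R_grid, R_of_x) - 1, 0, len(R_grid) - 2)
    w = (R_of_x - R_grid[i]) / (R_grid[i + 1] - R_grid[i])
    lo = np.take_along_axis(conv, i[None], axis=0)[0]
    hi = np.take_along_axis(conv, (i + 1)[None], axis=0)[0]
    return (1.0 - w) * lo + w * hi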

4.4.2 Error Estimation

We may define $E^{[R_0,R_1]}(x)$, the error of the piecewise linear approximation to the reference density at a particular point $x$ where $R_0 \leq R(x) \leq R_1$, in terms of the standard polynomial interpolation error: at some $\tilde{R} \in [R_0, R_1]$,

$$E^{(R_0,R_1)}(x) = \frac{1}{2}(R_0 - R(x))(R_1 - R(x))\,\frac{\partial^2}{\partial R^2}\int_{\mathbb{R}^3}\frac{\theta(|x - x'| - \tilde{R})\,\bar{\rho}(x')}{4\pi\tilde{R}^3}\,dx', \qquad (4.16)$$

which may be immediately simplified to a form involving $h_R = R_1 - R_0$:

$$E^{(R_0,R_1)}(x) \approx E^{(h_R)}(x) = \frac{h_R^2}{8}\,\frac{\partial^2}{\partial R^2}\int_{\mathbb{R}^3}\frac{\theta(|x - x'| - \tilde{R})\,\bar{\rho}(x')}{4\pi\tilde{R}^3}\,dx'. \qquad (4.17)$$

We have that the distributional second derivative of the kernel in terms of $R$ is

$$\frac{\partial^2}{\partial R^2}\,\frac{\theta(|x - x'| - R)}{4\pi R^3} = \frac{\partial_R\,\delta(|x - x'| - R)}{4\pi R^3} + \frac{6\,\delta(|x - x'| - R)}{4\pi R^4} + \frac{12\,\theta(|x - x'| - R)}{4\pi R^5}. \qquad (4.18)$$

Note that this now is dependent upon $\nabla\rho$ in the direction of the outward normal of the sphere of radius $R$ centered at $x$. This gives us, for the sphere's normal vector $\vec{n}_r(r') = \frac{r - r'}{|r - r'|}$,

$$E^{h_R}(x) = \frac{h_R^2}{8}\int_{\mathbb{R}^3}\left(\frac{\delta(|x - x'| - R)}{4\pi R^3}\,\nabla\bar{\rho}(x')\cdot\vec{n}_r(r') + \frac{6\,\delta(|x - x'| - R)}{4\pi R^4}\,\bar{\rho}(x') + \frac{12\,\theta(|x - x'| - R)}{4\pi R^5}\,\bar{\rho}(x')\right)dx'. \qquad (4.19)-(4.21)$$

We split the error into $E^{h_R}(x) = E_1^{h_R}(x) + E_2^{h_R}(x)$, where $E_1$ is the component of the error dependent on $\nabla\rho$ and $E_2$ depends only on $\rho$ itself. We can define

$$E_1^{h_R}(x) = \frac{h_R^2}{8}\int_{\mathbb{R}^3}\frac{\delta(|x - x'| - \tilde{R})}{4\pi\tilde{R}^3}\,\nabla\bar{\rho}(x')\cdot\vec{n}_r(r')\,dx'. \qquad (4.22)$$

At this point we must begin taking our statement of the error at a particular point and generalizing it to norms. We want our bound to be in terms of the 2-norm of the error at every point, as if the entire domain's $R(x)$ were covered within a given interval. We transform this by applying Young's inequality. First, we define $r$, $p$, and $q$ such that

$$1 + \frac{1}{r} = \frac{1}{p} + \frac{1}{q}.$$

Young's inequality implies that

$$\|f * g\|_r \leq \|f\|_p\,\|g\|_q \qquad (4.23)$$

for such $p$, $q$, and $r$. We choose $r = 2$, $p = 1$ and $q = 2$. Another possible choice, which does not change the analysis very much, is $r = \infty$, $p = 1$ and $q = \infty$. The norm applied to the factor containing a delta function must be the $L^1$ norm, as that is the only norm under which such an integral is finite. It also gives us a natural sense of what this quantity means: a measure of the volume of the convolution function multiplied by some notion of the magnitude of $\nabla\rho$. In some ways, both the $L^2$ and $L^\infty$ norms are reasonable approaches, and both have reasonable physical interpretations as well.

We will show that both produce similar results when applied numerically. However, in the following analysis we will go through with the $L^2$ version, but note that the complexity of the derivation is all in the $L^1$ portion, so the $L^\infty$ formulation goes through the same way.

Applying this to $E_1$ gives us that

$$\|E_1^{h_R}(x)\|_2 \leq \frac{h_R^2}{8}\,\frac{4\pi\tilde{R}^2}{4\pi\tilde{R}^3}\,\|\nabla\bar{\rho}\|_2. \qquad (4.24)$$

Next, we have $E_2$, the component dependent on $\rho$ alone. This component reads

$$E_2^{h_R}(x) = \frac{h_R^2}{8}\int_{\mathbb{R}^3}\left(\frac{6\,\delta(|x - x'| - \tilde{R})}{4\pi\tilde{R}^4} + \frac{12\,\theta(|x - x'| - \tilde{R})}{4\pi\tilde{R}^5}\right)\bar{\rho}(x')\,dx'. \qquad (4.25)$$

If we apply Young's inequality, we get similarly that

$$\|E_2^{h_R}\|_2 \leq \frac{h_R^2}{8}\,\|\bar{\rho}\|_2\left(\frac{24\pi\tilde{R}^2}{4\pi\tilde{R}^4} + \frac{16\pi\tilde{R}^3}{4\pi\tilde{R}^5}\right) = \frac{h_R^2}{8}\,\frac{10\,\|\bar{\rho}\|_2}{\tilde{R}^2}. \qquad (4.26)$$

So, by applying the triangle inequality in order to bound the total error in terms of $\|E_1\|_2$ and $\|E_2\|_2$, we have

$$\|E^{h_R}\|_2 \leq \|E_1^{h_R}\|_2 + \|E_2^{h_R}\|_2 \leq \frac{h_R^2}{8}\left(\frac{\|\nabla\bar{\rho}\|_2}{\tilde{R}} + \frac{10\,\|\bar{\rho}\|_2}{\tilde{R}^2}\right). \qquad (4.27)$$

Therefore, if we have some acceptable error bound $\epsilon$ and want to determine $h_R$ from it, we can set

$$\epsilon \approx \frac{h_R^2}{8}\left(\frac{\|\nabla\bar{\rho}\|_2}{R} + \frac{10\,\|\bar{\rho}\|_2}{R^2}\right). \qquad (4.28)$$

This gives us

$$h_R = R\sqrt{\frac{8\epsilon}{R\,\|\nabla\bar{\rho}\|_2 + 10\,\|\bar{\rho}\|_2}}. \qquad (4.29)$$

4.4.3 Complexity of Approximation

The complexity of using this method is dependent on $R_{\min} = \min_{x\in\Omega} R(x)$ and $R_{\max} = \max_{x\in\Omega} R(x)$. We may write an iterative algorithm for finding the number of levels required in the computation as

$$R_{i+1} - R_i = R_i\sqrt{\frac{8\epsilon}{R_i\,\|\nabla\bar{\rho}\|_2 + 10\,\|\bar{\rho}\|_2}} \qquad (4.30)$$

$$R_{i+1} = R_i\left(1 + \sqrt{\frac{8\epsilon}{R_i\,\|\nabla\bar{\rho}\|_2 + 10\,\|\bar{\rho}\|_2}}\right). \qquad (4.31)$$

For a range of $R_{\min}$ to $R_k \geq R_{\max}$, this gives us

$$\frac{R_k}{R_{\min}} \geq \left(1 + \sqrt{\frac{8\epsilon}{R_{\max}\,\|\nabla\bar{\rho}\|_2 + 10\,\|\bar{\rho}\|_2}}\right)^{n_R}, \qquad (4.32)$$

meaning that the number of levels required to reach error $\epsilon$ in this approximation of the operator application will require $n_R$ calculations at different fixed $R$ values, for

$$n_R \leq \frac{\log(R_k) - \log(R_{\min})}{\log\left(1 + \sqrt{\dfrac{8\epsilon}{R_{\max}\,\|\nabla\bar{\rho}\|_2 + 10\,\|\bar{\rho}\|_2}}\right)}. \qquad (4.33)$$
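A small sketch of how the recurrence (4.31) and the bound (4.33) might be evaluated in practice is given below; the norm values and the R range are hypothetical inputs rather than data from the actual solver.

import numpy as np

def r_levels(R_min, R_max, eps, grad_rho_norm, rho_norm):
    # Build the R_i ladder from (4.31) until R_max is covered.
    R = [R_min]
    while R[-1] < R_max:
        Ri = R[-1]
        R.append(Ri * (1.0 + np.sqrt(8.0 * eps / (Ri * grad_rho_norm + 10.0 * rho_norm))))
    return np.array(R)

def n_r_bound(R_min, R_max, eps, grad_rho_norm, rho_norm):
    # Upper bound (4.33) on the number of levels.
    q = np.sqrt(8.0 * eps / (R_max * grad_rho_norm + 10.0 * rho_norm))
    return (np.log(R_max) - np.log(R_min)) / np.log(1.0 + q)

rho_norm, grad_rho_norm = 1.0, 5.0      # hypothetical norms of rho-bar and its gradient
eps = 0.01 * rho_norm                   # tolerance; in practice eps = eps_rel * ||rho-bar||_2
levels = r_levels(0.574, 0.973, eps, grad_rho_norm, rho_norm)
print(len(levels), n_r_bound(0.574, 0.973, eps, grad_rho_norm, rho_norm))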

$R_{\min}$ may be bounded below by the minimum ion radius, and $R_{\max}$ above by the domain size. In practice, we may specify an acceptable relative $L^2$ error

$$\epsilon = \epsilon_{\mathrm{rel}}\,\|\bar{\rho}\|_2. \qquad (4.34)$$

4.4.4 Results

The method has been tested using the DFT software [88] developed by the authors in conjunction with Dirk Gillespie and Bob Eisenberg at Rush Medical Center, built using the PETSc numerical linear algebra library. FFTW [59] is used to calculate the convolution parts of the potentials. This implementation allows for the calculation of the classical density functional theory ion densities arising in periodic domains by FFT methods.

The solve consists of a nonlinear Picard iteration for which all the components of the solution must be recalculated at each stage. At each iteration, minimization with a quadratic line search is used to stabilize the solver due to the high degree of nonlinearity of the equations. The relevant quantities, including $\rho^{\mathrm{ref}}$, must therefore be computed several times per iteration.

We define the problem consisting of a solvent against a hard wall with two species of oppositely charged ions, corresponding to sodium and chlorine, in solution as described in Section 4.1.1. For the sake of these experiments the numbers of gridpoints in each direction are $n_x = 81$ (this is the direction orthogonal to the hard wall), $n_y = 21$, and $n_z = 21$. The ion species behavior by the wall is shown to be distinctly dependent on charged hard-sphere interactions. Accordingly, the above-described model is easily tested in both one dimension and three dimensions in this type of geometry. It therefore makes a good test problem for the method and the accuracy of the full three-dimensional model, as it may be reduced to the one-dimensional model and calculated practically to the asymptotic limit.

It has already been verified in the earlier work that the method works with laboriously but exactly calculated $\rho^{\mathrm{ref}}$ and with $\rho^{\mathrm{ref}}$ calculated in single precision on GPU hardware. However, that approach still exhibits the non-optimal complexity from before; it merely becomes infeasible much less quickly due to parallelism. As the single-precision version is also an approximation of the quantity, we know that an approximation of this quantity resolves the problem sufficiently.

The range of $R$ in the calculation for the hard wall case varies, to some extent, at each nonlinear step as $R(x)$ is recalculated depending on the local densities. This is important in that we will see the complexity vary as the screening radius changes. However, for the sake of the experiments here we take the error at the final step of the nonlinear solve. This makes sense as the extreme concentrations in the channel "deepen" as the solve transitions from the Poisson-Boltzmann-like initial guess to the classical DFT solution, causing the lower limit of $R$ to be approached.
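A generic sketch of a damped Picard iteration with a quadratic line search of the kind described above is given here; the update map and residual function are placeholders supplied by the caller, not the actual classical DFT equations.

import numpy as np

def damped_picard(update, residual, x0, tol=1e-8, max_it=200):
    # Each step computes a full Picard update (recomputing rho_ref, etc.) and then
    # picks a damping parameter by fitting a quadratic to the residual norm.
    x = x0
    for _ in range(max_it):
        d = update(x) - x
        alphas = np.array([0.0, 0.5, 1.0])
        r = np.array([np.linalg.norm(residual(x + a * d)) for a in alphas])
        c = np.polyfit(alphas, r, 2)                     # quadratic model of the residual
        alpha = np.clip(-c[1] / (2.0 * c[0]), 0.1, 1.0) if c[0] > 0 else 1.0
        x = x + alpha * d
        if np.linalg.norm(residual(x)) < tol:
            break
    return x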

Table 4.2: Error and refinement in R on the hard wall.

    eps_rel    ||E||_2 / ||rho||_2    R_min    R_max    n_R
    0.1        0.000705929            0.574    0.973    3
    0.01       7.69773e-05            0.574    0.973    9
    0.001      9.97085e-06            0.574    0.973    28
    0.0001     1.16836e-06            0.574    0.973    87

If we look at what happens with the hard wall calculation and the resulting complexity of the approximation for a particular specified $\epsilon$, we see that we overestimate the error by a fair amount. However, the specified error and the actual error track each other proportionally as the tolerance is tightened. The error for a specified $\epsilon_{\mathrm{rel}}$ is calculated from the difference between the calculated result and the calculated result with a specified $\epsilon_{\mathrm{rel}}$ of one tenth of the previous value. Note that while we overestimate the error in the hard wall case, we may obtain error proportional to that of the direct method with just a few convolutions instead of the direct, full-complexity solve.

4.5 Outlook

The calculations on interesting geometries are now possible, and the model has started to be explored. We were able to run simple wall and ion-channel examples and are able to begin exploring the rest of the model now that this complexity bottleneck has been mitigated. The time savings turned out to be immense. The calculations, which could take weeks for even moderately sized problems using a serial implementation of the direct calculation, came to take hours at most after the application of both the fast and direct algorithms.

The remaining difficulties are not in the efficiency of any given part of the algorithm, but in the convergence of such a complex nonlinear system. The convergence of the system appears to be made significantly more difficult by the inclusion of the oxygen species, which causes the overall solver to stall or diverge. Removal of these from consideration allows for the system to converge, but at the expense of model accuracy, as the selectivity properties of the ion channels are no longer observed.

CHAPTER 5

FINITE ELEMENT METHODS FOR NONLOCAL ELECTROSTATICS

The problem of computing electrostatic quantities of interest for the sake of studying electrochemical effects at the nanoscale often hinges on the computation of electrostatic energies. The most studied of these nanoscale systems are solute-solvent interactions, with the solvent being water or salty water. This encompasses a wide range of problems, including those that are biochemical in nature.

Molecular dynamics (MD), the full representation of all molecules and atoms in the system and the time integration of their trajectories, may be used to calculate the energies of such systems. This is done by perturbing the charge distribution in such a way that the difference in energies may be derived [148]. While these approaches are successful, there are a few major limitations to their applicability. The issues of scale quickly catch up to MD approaches; both the amount of time and the number of particles in a particular system may quickly make the computation infeasible on even the largest and most specialized computers.

Implicit solvation techniques overcome these issues by representing the solvent as a continuum whose properties merely become parameters of the classical equations of electrostatics. One also generally computes the steady state solution of the system, meaning that time integration is no longer an issue.

However, this is not the end of the story. The most popular models of implicit solvation often have notable disagreement with fully atomistic MD calculations and with experiment [80]. This behooves researchers to look for efficiently computable extensions of the implicit solvent models that better capture real behavior.

In particular, the Fourier-Lorentzian solvent model [17] has been shown to improve prediction of solvation properties of simple systems quite remarkably over the standard implicit solvation models. Here we take a finite element approach to discretizing this model. We use automated scientific computing tools from the FEniCS project [56] to be able to rapidly study the space of parameters and adaptations of the model. This is contrasted with former studies of the model.

In this chapter, we review the literature on implicit solvation generally and the Fourier-Lorentzian model specifically, with special attention paid to the variety of justifications, parameterizations, and solution techniques available for the model. We describe our implementation, the test problems and problem domains we are interested in, the performance of the method, and the outlook for continued study of nonstandard implicit solvation with automated scientific computing.

5.1 Biochemical Continuum Electrostatics

The classical electrostatic potential in a given configuration of charges and dielectric media may be expressed easily as a second-order partial differential equation. One has some charge distribution $\rho(x)$, some relative permittivity function $\epsilon(x)$, and the permittivity of free space $\epsilon_0$. The goal of the problem is to determine the electrostatic potential $\phi(x)$ by solving

$$-\epsilon_0\nabla\cdot\epsilon(x)\nabla\phi(x) = \rho(x). \qquad (5.1)$$

This is the standard classical Poisson model solved in linear response theory calculations of biochemical systems. The implicit energetic treatment of $n$ bulk species of ions with bulk concentrations $\rho_i^{\mathrm{bulk}}$ and charges $q_i$ may be added, yielding the Poisson-Boltzmann equation

$$-\epsilon_0\nabla\cdot\epsilon(x)\nabla\phi(x) = \rho(x) + \sum_i^{n}\rho_i^{\mathrm{bulk}}\,q_i\,e^{-q_i\phi(x)/kT}. \qquad (5.2)$$

The Poisson-Boltzmann equation models the screening of molecular charges by free ions in the solvent. It is therefore an improved model of the interaction of the solvent, now with mobile charges in it, and the solute which structures those charges. This will produce a layered screening around a charge, as observed classically [4]. However, the small-scale properties of the polar solvent itself, for which the charge mobility is not nearly as large, are still neglected entirely. This is the motivation for the extended models of the properties of the polar solvent, expressed through the dielectric $\epsilon$.

5.1.1 Domain of Interest

Figure 5.1: A domain with water, protein, and ions.

The tried-and-true simplified model of proteins in water is a bidomain system with two subdomains, protein and water. The two domains have markedly different electrostatic properties. We will refer to the domains as $\Sigma$ for the solvent and $\Omega$ for the protein, with $\Gamma$ being the interface between the two (Figure 5.1). The solvent region $\Sigma$ generally is treated as having the bulk dielectric properties of water throughout it; that is, $\epsilon(x) \approx 80$ for $x \in \Sigma$. In the vast majority of such

simulations, the solvent region extends to infinity if the method allows it, or to a particular cutoff distance. The protein domain $\Omega$ is modeled in classical electrostatics simulations as a region where the solvent is excluded and consequently as a comparatively low-dielectric medium. There are debates as to how one estimates what exactly the dielectric effect inside the protein region is [122]; however, most simulations use a constant small dielectric constant of around $\epsilon(x) \approx 2$ for $x \in \Omega$.

The other major feature of the protein region is the presence of fixed charges. These charges correspond to partial charges arising from the molecular structure of the protein and to induced charges in the protein. The vast majority of models for these charges conceptualize them as fixed singular Coulomb charges at the atomic centers. From there, there are many ways of desingularizing the fields that allow for the estimation of their effect to varying extents.

Accounting for the effect of the fixed charges in finite element formulations of continuum electrostatics [42] may be done in several ways. A number of formulations for these models, whether finite element, finite difference, or other, require some regularization of the potential field due to the singular nature of the density and the resulting singular potential. Most of these attempt to "split" the potential into the singular component and one or more nonsingular components [37]. The resulting potential fields are then reconciled by enforcing the dielectric jump properties of the potential. Given a dielectric function $\epsilon$ with a jump across an interface with normal $n$, this implies that one must enforce

$$\frac{\partial(\phi_{\mathrm{mol}} + \phi_{\mathrm{har}})}{\partial n} = 0. \qquad (5.3)$$

We explicitly represent the charges as delta functions at the atomic centers. We must beware, as this puts "singular" point charges on the grid. However, we may use the standard variational regularization in order to place the charges onto the

grid without being exceedingly singular. This is due to model-specific issues with the splitting as discussed in Section 5.2.3.

The generalized Born model [16] is another way of attempting to de-singularize the field by changing the definition of $\rho(x)$. These models place the charge entirely on the exterior of the atom or molecule by defining an effective embedding distance for the charge and extrapolating in order to place an effective charge on the surface. Similar approaches are used in the integral equation literature in order to put all the charge on the surface, from which Green's function approaches such as boundary elements or Ewald summation may be applied. The apparent surface charge (ASC) [38] formulation does exactly this by solving for the surface charge that would account for the potential field created from the Coulomb field, and then using that surface charge in order to solve for the effect in the solvent region. The resulting system is then easily treated using boundary element method (BEM) approaches [14].

Figure 5.2: Generalized Born (surface), delta function, and finite-element basis function regularized delta function charge distributions.

The definition of the protein region varies from formulation to formulation. For mere spherical ions, we use the optimized radii from Åqvist [3]. For more complex shapes, the surface may be the Van der Waals (VdW) surface, the solvent-accessible surface (SAC), or the solvent excluded surface (SEC). All of these are based on the abstraction that there is a stark interface between the protein and the solvent. However, the difference is in where the surface lies and how that surface is defined.

The VdW surface is by far the simplest of these models and is merely the union of spheres of different radii, dependent on the atom type, centered at the atom centers of the molecule. The radii are the standard Van der Waals radii of those atoms [29]. The solvent accessible surface [44, 94] also models the effect of the radius of the solvent, and is defined by increasing the VdW radius by the effective radius of the solvent. The solvent excluded surface [120], by contrast, is defined by rolling a sphere of the size of the solvent across the surface of the molecule.

For the sake of these experiments we use VdW surfaces. This is due to the fact that using both an improved model for the solvent and an improved notion of the surface may be seen as "double counting" in this instance. As our dielectric model now explicitly takes into account the disruption of the water network due to the charge distribution of the protein, a lot of the effects that these corrections on the VdW surface try to capture are already accounted for.

5.2 Models and Formulations

The standard Poisson model of electrostatics assumes that the dielectric response at every point in the solvent is entirely local; that is, that the polar solvent molecules reorient independently to screen the field. This neglects the fact that there is inherently nonlocal structure to water, including but not limited to the hydrogen bonds that form between water molecules in liquid form. Because of this, the standard classical models of protein electrostatics described above fail to describe many situations accurately [89].

The goal of having an improved electrostatics model is to better represent the effects of the water network. In ligand-solvent situations, these manifest themselves as structuring of the polar solvent due to charges on the protein surface [140]. MD simulations with explicit water and experimental observation show that water forms interesting structures when exposed to rapidly changing electrostatic fields.

These new configurations often have wildly different dielectric behavior than water in the bulk. Water in the bulk has very high permittivity compared to most ligands and proteins, and heavily screens the electric field created by charges. However, the effect of those charges on the permittivity of water is often neglected.

Various models have been proposed to overcome these shortcomings. These include notions of density of the species in the solvent [74] and the relationship of the dielectric density [6] to the strength of the electrostatic field. However, of particular interest to us is the response of the dielectric medium to the frequency of the electrostatic potential.

5.2.1 The Fourier-Lorentzian Model


Figure 5.3: Frequency response showing polar and non-polar regimes.

The Fourier-Lorentzian (FL) model [17] of nonlocal electrostatics was introduced [48, 90] in order to incorporate dispersive effects into the model of the dielectric medium. There have been a multitude of studies using the model given particular situations and test problems of ions and suites of small molecules and proteins ( [18, 115, 126, 136, 149] to name a few).

The two intuitive explanations for the nonlocal model live in frequency space and realspace. Both may provide interesting insights into how one might interpret the nonlocal operator and how it affects the hydrogen bond network of the surrounding water. An interesting observation by Scott et al. [126] is that there is a direct analog between Debye's temporal dielectric ansatz and often-used spatial dielectric ansatzes. Debye's model [46] is:

$$\epsilon_\nu = \epsilon_\infty + \frac{\epsilon_w - \epsilon_\infty}{1 + \tau^2\nu^2}, \qquad (5.4)$$

where $\tau$ is the dielectric relaxation time. The dielectric relaxation time gives one the amount of time the solvent takes to screen the field. Therefore, for larger temporal field frequency $\nu$, the relaxation time will inhibit screening. A similar ansatz may be made with respect to the spatial frequency of the field, with

$$\mathcal{E}_\xi = \epsilon_\infty + \frac{\epsilon_w - \epsilon_\infty}{1 + \lambda^2|\xi|^2}, \qquad (5.5)$$

where $\xi$ is the spatial frequency. This leaves us with a dielectric pseudodifferential operator with symbol $\mathcal{E}_\xi$,

$$\mathcal{E}\phi(x) = \int_{\mathbb{R}^3} e^{i\xi\cdot x}\,\mathcal{E}_\xi\,\hat{\phi}(\xi)\,d\xi. \qquad (5.6)$$

Here $\lambda$ is the correlation length-scale of the polar component of the permittivity. This therefore assumes that as the spatial frequency of the potential increases, the ability of the structure of the solvent to realign to screen the field will be more and more hindered. The interpretations and implications of $\lambda$ are discussed in Section 5.2.2.

$\epsilon_\infty$ in both (5.4) and (5.5) is known as the optical limit of the permittivity. The high-frequency behavior of water has been experimentally well shown to rapidly decrease with increased field frequency; $\epsilon_\infty$ is the limit of this. In (5.5) we may take the assumption that $\epsilon_w$ is the relative dielectric constant of the medium with both polar and non-polar effects included, while $\epsilon_\infty$ is the dielectric constant with only the non-polar effects, such as bond stretching and perturbations of the electronic structure, included. Both of these exist on a much smaller scale

than the polar effects (which are at an even smaller scale than the ionic screening effects in salty water), so they are treated locally in the model.


Figure 5.4: Realspace interpretation with correlation distance λ.

The second interpretation of the model is motivated by a realspace, nonlocal modeling of the correlations. If we take the inverse Fourier transform of (5.6), we have

$$\int_{\mathbb{R}^3}\mathcal{E}_\xi\,\widehat{\nabla\phi}\,e^{-ix\cdot\xi}\,d\xi = \int_{\mathbb{R}^3}\nabla\phi\left(\epsilon_\infty\,\delta + \frac{\epsilon_w - \epsilon_\infty}{\lambda^2}\,H(x, x')\right)dx', \qquad (5.7)$$

with

$$H(x, x') = \frac{e^{-\frac{|x - x'|}{\lambda}}}{4\pi|x - x'|}. \qquad (5.8)$$

This real-space Green's function for the nonlocal electrostatics operator is a Yukawa kernel. The Yukawa kernel is often used in models of systems whose interactions decay exponentially with distance. Therefore, the whole model can be written

$$-\nabla\cdot\left(\epsilon_\infty\nabla\phi + \frac{\epsilon_w - \epsilon_\infty}{\lambda^2}\tilde{P}\right) = \frac{\rho}{\epsilon_0} \qquad (5.9)$$

for $\tilde{P} = \int_{\mathbb{R}^3} H(x, x')\,\nabla\phi(x')\,dx'$.

In this particular case, the exponential correlation is in the reaction field potential. In the standard theory of dielectrics, one has the displacement field $D$, with $\nabla\cdot D = \rho$, defined from the electrostatic field $E$ and the polarization field $P = P_{\mathrm{local}} + \tilde{P}$:

$$D = \epsilon_0 E + P_{\mathrm{local}} + \tilde{P}, \qquad (5.10)$$

where $\epsilon_0 E + P_{\mathrm{local}} = \epsilon_0\epsilon_\infty E$ by the standard dielectric model assuming only the local contributions. This cleanly splits the effect of the dielectric medium into local and nonlocal terms, with the nonlocal term expressed as an integral equation.

The physicality of this, in particular the physicality of the exponential correlation of the reaction field, has been questioned [5], and the frequency response behavior is known to be an ansatz. However, it is a good starting point for the study of improved electrostatics models and provides a basis for further inquiry.

5.2.2 Parameters of the FL Model

The model parameters for the nonlocal model are subject to some interpretation, and require explanation. Here we will try to piece together the various interpretations of the model and how these play out in the parameterization of the model, with focus on what they mean for the physical interpretation of the model.

Table 5.1: Constants used in various studies of the nonlocal model.

    Study                          λ          ε_∞
    Dogonadze, Kornyshev [48]      11.7 Å     1.8
    Rubenstein, Sherman [115]      3-5 Å      6
    Basilevsky, Parsons [19]       4.83 Å     1.78
    Hildebrandt, Weggler [149]     20 Å       1.8

This model has a few parameters that need to be explained in full. While the parameter $\epsilon_w$ is the well-known relative dielectric constant of bulk water, which is around 80 at biological temperatures, a number of the other parameters of the model require some explanation. We will neglect explanations of $\epsilon_p$, as it does not really factor into the experiments and its discussion is not part of the model.

The length-scale parameter $\lambda$ has had the most debate as to what it means. The early studies considered it to be the distance between the centers of hydrogen-bonded water molecules, 3-5 Å, or the degree to which the dipoles may reorient, around 11 Å. More recent work on the model has reinterpreted this to be an empirical length of correlation, and much larger: on the order of 20 Å instead.

The other model parameter requiring explanation is $\epsilon_\infty$. This parameter is the permittivity of water at the high-frequency limit. There have been numerous experimental measurements of the response of water to various frequencies [83], and numerous values for $\epsilon_\infty$ have been assumed in studies [152]; the limiting value is usually taken to be around 2. Another interesting multiparameterization of the model was done by Basilevsky and Parsons [18].

5.2.3 Discretization of the FL Model

(5.9) has an easy-to-derive Green's function, allowing it to be used with boundary element methods (BEM) quite easily. BEM, when it may be used, is a quite effective method for the fast solution of a problem. The above system has been tested in a few boundary element studies [55, 76], solving for the coupled field as one boundary element equation. However, it has also been noted that the integral kernel $H$ is the Green's function of the well-known screened Poisson equation. Namely, we can define $B$ such that

$$\int_{\mathbb{R}^3} B\,H(x, x')\,\phi(x')\,dx' = \phi(x), \qquad (5.11)$$

with

$$B\phi = -\Delta\phi + \frac{\phi}{\lambda^2}. \qquad (5.12)$$

This turns the overall system into a system of two equations for the potential $\phi$ and the nonlocal reaction field $\tilde{P}$:

$$-\left(\epsilon_\infty\Delta\phi + \frac{\epsilon_w - \epsilon_\infty}{\lambda^2}\nabla\cdot\tilde{P}\right) = \frac{\rho}{\epsilon_0} \qquad (5.13)$$
$$\left(-\Delta + \frac{1}{\lambda^2}\right)\tilde{P} = \nabla\phi. \qquad (5.14)$$

Note that (5.14) is a vector Helmholtz equation with the opposite sign; the operator is damped by the parameter $\lambda$ instead, and the solutions are real. There is often confusion between this equation and the Helmholtz equation in the literature, so we must be specific. It is a much easier problem to solve than the Helmholtz equation, and requires none of the specialized techniques.

Weggler et al. further refined this formulation. They wrote a coupled system of potentials and split out the molecular component of the potential field. The resulting system, neglecting jump conditions, is upper-triangular. In their particular formulation, the curl-free nature of the potential was important. They were also forced to introduce an artificial jump condition assuming $\epsilon = \epsilon_\infty$ on the solvent

side of the boundary. Starting back from (5.13), we show that another path of reformulation allows for a number of these difficulties to be overcome fairly naturally.

5.3 FEM Approach

We return to consideration of the mixed formulation of the equations in (5.9). Some difficulties arise when one has a mixture of integral and differential equations in the model. In addition, the auxiliary field $\tilde{P}$ is a vector field rather than the scalar potential of which it is the gradient. It was noted by Dexuan Xie [151] that the derivative may be removed by using the fact that $H$ is a convolution operator, under which derivatives commute, giving us:

$$\tilde{\phi} = \nabla\cdot\int_{\mathbb{R}^3} H(x, x')\,\nabla\phi(x')\,dx' \qquad (5.15)$$
$$= \int_{\mathbb{R}^3}\Delta H(x, x')\,\phi(x')\,dx' \qquad (5.16)$$
$$= \int_{\mathbb{R}^3}\left(\delta(|x - x'|)\,\phi(x') - \frac{1}{\lambda^2}H(x, x')\,\phi(x')\right)dx'. \qquad (5.17)$$

This leads us to the derivativeless reformulation of the entire operator, which is

$$-\epsilon_\infty\Delta\phi + \frac{\epsilon_w - \epsilon_\infty}{\lambda^2}\phi - \frac{\epsilon_w - \epsilon_\infty}{\lambda^4}\int_{\mathbb{R}^3}H(|x - x'|)\,\phi\,dx' = \frac{\rho}{\epsilon_0}. \qquad (5.18)$$

We then leverage the same Green's function trick on the derivativeless formulation, leading to the system of equations:

$$-\epsilon_\infty\Delta\phi + \frac{\epsilon_w - \epsilon_\infty}{\lambda^2}\phi - \frac{\epsilon_w - \epsilon_\infty}{\lambda^4}\tilde{\phi} = \frac{\rho}{\epsilon_0} \qquad (5.19)$$
$$\left(-\Delta + \frac{1}{\lambda^2}\right)\tilde{\phi} = \phi. \qquad (5.20)$$

5.3.1 Formulation Details

One may define the weak form suitable for finite element simulation here as follows. Given the potential and auxiliary field pair $(\phi, \tilde{\phi}) \in V \otimes V$, with some scalar-valued function space $V$, we define the following four bilinear forms

 w−∞ (∞∇u(x), ∇v(x)) + (u(x), v(x)) if x ∈ Σ a(u(x), v(x)) = λ2 (5.21) (p∇u(x), ∇v(x)) if x ∈ Ω ( −  ) b(˜u(x), v(x)) = w ∞ (˜u(x), v(x)) (5.22) λ4 c(u(x), v˜(x)) =(u, v) (5.23) 1 d(˜u(x), v˜(x)) =(∇u˜(x), ∇v˜(x)) + u˜(x)˜v(x). (5.24) λ2 with the resulting system being solved for (φ, φ˜);

$$a(\phi, v) - b(\tilde{\phi}, v) = (\rho, v) \qquad (5.25)$$
$$-c(\phi, \tilde{v}) + d(\tilde{\phi}, \tilde{v}) = 0 \qquad (5.26)$$

for all $(v, \tilde{v}) \in V \otimes V$. We may of course assume that the potential goes to zero in the limit of the radius, e.g.

φ(x) = 0 for |x| → ∞.

However, we choose a large enough cutoff such that imposing homogeneous Dirichlet boundary conditions has negligible impact on the solution on the molecule. For the sake of small molecules and atoms this may be anywhere from 60 to 80 Å.

The discretization of the charge distribution $\rho(x)$ is done using the ansatz that all molecular charges are point charges living at the nuclei and are either the ionization charge in the ion case or the partial charges in the molecular case. Therefore, we have, for $N$ atoms:

$$\rho(x) = \sum_{i=0}^{N} q_i\,\delta(x - x_i).$$

As the standard electrostatic interface conditions are automatically satisfied in the weak-form case, the formulation becomes much simpler if we are willing to represent the charge distribution as part of the right-hand side rather than altering the formulation to desingularize it, as we noted in Section 5.1.1. We take

$$(\rho, v) = \sum_{i=0}^{N} q_i\,v(x_i) \quad \text{for all } v \in V.$$

For the sake of simplicity we choose $V$ as a space of piecewise linear basis functions defined on a tetrahedral mesh.
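In the DOLFIN interface, one way this right-hand side might be realized is through the PointSource class, as in the sketch below; the mesh, the charges, and their coordinates are purely illustrative, and applying the sources to one component of the mixed space requires some additional care in a full implementation.

from dolfin import (UnitCubeMesh, FunctionSpace, TestFunction, Constant,
                    assemble, PointSource, Point, dx)

mesh = UnitCubeMesh(16, 16, 16)
V = FunctionSpace(mesh, "Lagrange", 1)        # scalar P1 space, as in the text
v = TestFunction(V)

b = assemble(Constant(0.0) * v * dx)          # load vector with the right layout

# hypothetical partial charges q_i at atomic centers x_i
charges = [(1.0, Point(0.5, 0.5, 0.5)), (-0.5, Point(0.25, 0.5, 0.5))]
for q, x in charges:
    PointSource(V, x, q).apply(b)             # adds q_i * v(x_i) to the load vector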

We introduce the discrete function space $V_h$ consisting of some set of test and trial functions $\{\phi_i \text{ for } i \in [0, N]\}$. For the sake of these experiments we use the standard piecewise linear finite element basis functions defined on tetrahedra. We construct the matrix considering index ranges $D$, the unknowns corresponding to basis functions of the primary field, and $\tilde{D}$, the unknowns corresponding to basis functions of the auxiliary field:

$$A_{ij} = \begin{cases} a(\phi_i, \phi_j) & \text{if } i, j \in D \\ b(\phi_i, \phi_j) & \text{if } i \in D,\ j \in \tilde{D} \\ c(\phi_i, \phi_j) & \text{if } i \in \tilde{D},\ j \in D \\ d(\phi_i, \phi_j) & \text{if } i, j \in \tilde{D} \end{cases} \qquad (5.27)$$

and the vector

$$b_j = \begin{cases} (\rho, \phi_j) & \text{if } j \in D \\ 0 & \text{otherwise.} \end{cases} \qquad (5.28)$$

The development of a realized finite element simulation for this model was made much more rapid using automated scientific computing tools.

5.3.2 Code Generation and Automation


Figure 5.5: An overview of the code generation, finite element, and fast solver suite of tools used for this problem.

Automated tools for scientific computing allow for complex systems to be rapidly converted into simulation code and run. The advantage of these types of technologies is that a lot of the tedious work may be offloaded to well-tested algorithms, and the space of a model may be explored quickly and without a lot of the mistakes that accompany numerical programming. In particular, turning the set of bilinear forms used above into a fast finite element simulation may either be painstakingly programmed by hand, in no way resembling the mathematics it encodes, or it may be encoded using a domain-specific language. UFL [145] is a domain-specific language built on top of Python for this very purpose. For instance, (5.21) – (5.26) can be written in UFL as

element = FiniteElement("Lagrange", tetrahedron, 1)
mixed_element = element*element
v, v_tilde = TestFunctions(mixed_element)
u_reac, u_tilde_reac = TrialFunctions(mixed_element)
u_mol = Coefficient(element)
u_tilde_mol = Coefficient(element)
# rho was not declared in the original listing; it is assumed here to be a coefficient
rho = Coefficient(element)
eps_protein = Constant(tetrahedron)
eps_inf = Constant(tetrahedron)
eps_s = Constant(tetrahedron)
lmbda = Constant(tetrahedron)

# Note that these helper names do not line up one-to-one with (5.21)-(5.24):
# c below plays the role of b in (5.22), d plays c in (5.23), and b plays d in (5.24).
def a(w1, w2):
    return eps_inf*inner(grad(w1), grad(w2))*dx(0) + \
        ((eps_s - eps_inf) / lmbda**2)*w1*w2*dx(0) + \
        eps_protein*inner(grad(w1), grad(w2))*dx(1)

def b(w1, w2):
    return inner(grad(w1), grad(w2))*dx + \
        (1. / lmbda**2)*w1*w2*dx

def c(w1, w2):
    return w1*w2*((eps_s - eps_inf) / lmbda**4)*dx

def d(w1, w2):
    return w1*w2*dx

a_1 = a(u_reac, v) - c(u_tilde_reac, v)
a_2 = -d(u_reac, v_tilde) + b(u_tilde_reac, v_tilde)
a = a_1 + a_2
L = rho*v*dx

which corresponds very closely to the mathematical definitions. This code is read by the FFC [57] package, which compiles it into fast quadrature-based evaluation of the matrix. The C++ interface definition used for this is known as UFC [144]. UFC is supported by the DOLFIN [49] finite element package. DOLFIN allows for easy definition of the differential form through the UFC-generated code, and easy interfaces to set up particular meshes, boundary conditions, and other problem parameters.

5.3.3 Fast Solvers

Systems of the size resulting from the above formulation are difficult to solve directly. A fast solver solves such a system in time proportional to the number of unknowns in the system. DOLFIN is equipped with an interface to the PETSc [108] linear algebra framework. This suite of solvers allows for efficient, iterative solution of large linear and nonlinear systems. The solution to the matrix equation

$$Ax = b \qquad (5.29)$$

consists of basis function coefficients $x$ for both $\phi$ and $\tilde{\phi}$ and is solved for using PETSc. Fast solution may be achieved fairly easily using algebraic multigrid techniques (AMG), such as the Hypre [54] package, which may be automatically included in PETSc's functionality through use of a configure option.
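A minimal sketch of handing the assembled system to PETSc with Hypre's BoomerAMG as preconditioner through DOLFIN is given here; the function assumes a matrix A, vector b, and mixed space W have already been assembled as above, and that PETSc was configured with Hypre.

from dolfin import KrylovSolver, Function

def solve_nonlocal(A, b, W):
    # GMRES outer iteration preconditioned by algebraic multigrid (Hypre BoomerAMG)
    u = Function(W)                               # coefficients for (phi, phi_tilde)
    solver = KrylovSolver("gmres", "hypre_amg")
    solver.parameters["relative_tolerance"] = 1e-8
    solver.solve(A, u.vector(), b)
    return u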

5.4 Atomic Meshing

In order to test the methods against the solvation energies produced by the exact solutions of the nonlocal model for Born spheres, we must be able to progressively create meshes with interior and exterior sections that are strongly graded towards the center in order to properly resolve the atomic singularity.

Figure 5.6: View of the entire meshed atom domain.

The domain we use for an atom is a bidomain sphere with some $r_{\mathrm{atom}}$ and $r_{\mathrm{cutoff}}$, where the entire mesh has radius $r_{\mathrm{atom}} + r_{\mathrm{cutoff}}$. The problem calls for having a constant dielectric function inside $r_{\mathrm{atom}}$, and the nonlocal model applying outside of $r_{\mathrm{atom}}$ and until $r_{\mathrm{cutoff}}$. This allows us to create quality meshes of arbitrary inner and outer radius quickly, and to see that the model converges as the meshes are refined in the radius.

We want a quality spherical mesh with differentiated inner and outer regions consisting of $n_{\mathrm{in}}$ spherical shells in the atom and $n_{\mathrm{out}}$ spherical shells outside it. This may be achieved by moving the vertices of a regular tetrahedralized grid of size $2(n_{\mathrm{in}} + n_{\mathrm{out}}) + 1$ in each direction. This is done by taking the square shells out from the center vertex and moving them to be at a given radius.

Meshes are created by warping a regular tetrahedralized cube that has been mirror-flipped so that no near-singular tetrahedra appear on the corners of the individual subcubes. Such tetrahedra will occur if the vertex warping pushes a corner vertex too much towards its opposite facet, which will happen if just the regular grid is warped. This is fixed by making sure the central diagonal of the standard six-cell tetrahedralization of the cube is always aligned with the radial direction.

Figure 5.7: Zoomed center of an atomic mesh showing center and grading.

The mesh on the exterior is graded with radial spacing $h = Cr^2$ for $C$ related to $r_{\mathrm{cutoff}}$. This allows for a high concentration of points at the center of the mesh and very few points out where the potential field is necessarily very close to zero. These meshes have increasing aspect ratio as they go outwards, but the elements are still well-shaped and nearly right tetrahedra. If anything, the angular directions are over-refined in this formulation. To determine $r_{\mathrm{atom}}$, we use the Åqvist radii [3], as used in the previous studies of the nonlocal models. Any cutoff distance $r_{\mathrm{cutoff}}$ on the order of 50 Å away from the atom center seems to work.
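A small sketch of how such shell radii might be generated, with uniform shells inside the atom and the $h = Cr^2$ grading outside, is shown below; the constant C and the shell counts are illustrative.

import numpy as np

def shell_radii(r_atom, r_cutoff, n_in, C=0.05):
    # uniform shells inside the atom, h = C * r**2 grading outside
    radii = list(np.linspace(0.0, r_atom, n_in + 1)[1:])
    r = r_atom
    while r < r_atom + r_cutoff:
        r = r + C * r * r                       # graded radial step
        radii.append(min(r, r_atom + r_cutoff))
    return np.array(radii)

print(shell_radii(r_atom=1.5, r_cutoff=50.0, n_in=8)[:5])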

5.5 Experiments

The goal of our initial test experiments is to determine the solvation free energies of small ions. The solvation free energy is the energetic difference between a molecule in and out of water and has two components, the electrostatic component and the enthalpic component. We neglect the enthalpic component and assume that the energy is dominated by electrostatics. For reference, the electrostatic potential energy may be defined as

$$G_{\mathrm{elec}} = \int_{\mathbb{R}^3}\frac{1}{2}\phi(x)\rho(x)\,dx. \qquad (5.30)$$

As we assume the charge distribution $\rho$ is fixed, the difference in energies between vacuum and solvent is merely going to involve the potential $\phi$ calculated with the solvent present and $\phi_0$ without it. Therefore, we define the free energy of solvation as

$$\Delta G_{\mathrm{solv}} = \int_{\mathbb{R}^3}\frac{1}{2}\rho(x)\left(\phi(x) - \phi_0(x)\right)dx.$$

If we assume that the fixed charges are delta-function points, at some set of points

$\{x_i\}$ with charges $q_i$, this reduces to

$$\Delta G_{\mathrm{solv}} = \sum_{i=0}^{N}\frac{1}{2}\,q_i\left(\phi(x_i) - \phi_0(x_i)\right).$$

If the value in kJ/mol is desired, one may write this expression as

$$\frac{N_A q_0^2}{1000\,e_0}\sum_i\frac{1}{2}\,q_i\left(\phi(x_i) - \phi_0(x_i)\right).$$

The chosen validation of the method involves the solvation free energies of a number of ion species. This is the same set of experiments chosen by previous investigators of the nonlocal models using both exact answers and finite difference methods. There is experimental data for the solvation free energy of a number of ion species, and we choose a small number of small ions to test against. Experimental observations [100] allow us to compare to the actual values of the solvation energies of these ions. These experiments mimic previous experiments using the nonlocal model with a different formalism [149]. We choose to use their parameter values for the sake of comparison.
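A short sketch of evaluating this quantity once the two potential fields are in hand is given below; phi and phi0 stand for functions that can be evaluated at a point (DOLFIN Functions are callable in this way), and the conversion factor and charge data are left as illustrative inputs.

def solvation_energy(charges, centers, phi, phi0, to_kj_per_mol=1.0):
    # Delta G_solv = sum_i 1/2 q_i (phi(x_i) - phi_0(x_i)), optionally rescaled to kJ/mol
    dG = 0.0
    for q, x in zip(charges, centers):
        dG += 0.5 * q * (phi(x) - phi0(x))
    return to_kj_per_mol * dG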

Table 5.2: Free energies of solvation for example ions. ∆G_solv^exp are experimental values from [100]. ∆G_solv^FEM are the energies as calculated with the finite element model.

    Ion      ∆G_solv^exp    ∆G_solv^FEM
    Li+      -475           -538
    Na+      -365           -370
    K+       -295           -286
    Mg2+     -1830          -2240
    Ca2+     -1505          -1370

These results correspond well to the previous investigation.

5.6 Outlook

We have completed a study of improved models of polar solvents and the reasons for pursuing them. The derivation of the nonlocal model has been explored, and its interpretation has been discussed. A number of solution techniques that have been tried in the past for it have been touched upon. However, the finite element treatment derived in this thesis and the associated work has several advantages.

We have shown that, using automated finite element techniques, an efficient finite element formulation and simulation suite for the model is easily realized. The computation of quantities of interest for small ions has been completed using this simulation method. This matches quite well with other similar results, and the generalized study of the method may be continued.

Immediate future work comes in two major forms. The first of these is extension of the applicability of our code, made possible by generalizing to more interesting domains. In particular, we want to extend our package to discretize protein geometries given standard inputs of these geometries. An initial attempt at this has been completed but still must be tested. This effort is described in full in Appendix C. The second of these is the use of more general models, and the inclusion of other effects in the model.

5.6.1 Model Improvements

The automated nature of the solution techniques used here allows for a number of model improvements to be quickly assimilated into the software suite. These include a variety of different forms of the dielectric response, including nonlinear formulations [119, 123]. Another direction would be the further parameterization of the higher frequency and charge-dependent effects, which have been a proposed addition to the model [18]. These could be compared to the nonlocal model quickly, as they all may be supported in the finite element framework naturally.

The other model improvement would be to study the combination of salty water with the nonlocal model of permittivity. This would involve incorporation of a Poisson-Boltzmann-like nonlinear implicit ion response (5.2). In either case, the FEM discretization of the model has opened up a wide number of possibilities for future inquiry, and the model space is wide open thanks to this effort.

APPENDIX A

MULTIGRID CODE OVERVIEW

The code for using multilevel preconditioners with the DOLFIN library has several components, listed here for the sake of future use. These components may be used separately; the coarsening may be adapted as needed by adaptive finite elements, and the multigrid methods may be used with any set of meshes without regard to how they were generated.

A.1 Multigrid Infrastructure

A.1.1 MultigridProblem

MultigridProblem::MultigridProblem(const Form& a,
                                   const vector >& a_c,
                                   const Form& L,
                                   const vector >& bcs,
                                   const vector > >& bcs_c)

The main interface for the linear algebra part of the code is MultigridProblem, whose constructor, shown above, mimics the VariationalProblem interface from the DOLFIN library.

• a is the bilinear form to be solved

• a_c is a vector of pointers to levels of preconditioning forms

• L is the linear form of the equation to be solved

• bcs are the boundary conditions of the equation

• bcs_c are the boundary conditions for the preconditioner problems

The following parameters allow one to change the behavior of the multigrid solver

Table A.1: Solver parameters.

    parameter              value (* def.)    description
    relative tolerance     1e-6*             The relative residual tolerance
    absolute tolerance     1e-15*            The absolute residual tolerance
    divergence limit       1e4*              The divergence limit
    maximum iterations     1000*             The maximum outer cycles
    monitor convergence    True*             Print the convergence history
    down smooths           3*                The number of smoothing steps in the downward sweep
    up smooths             3*                The number of smoothing steps in the upward sweep

Table A.2: Multigrid cycle type options.

    parameter     value (* def.)    description
    cycle type    V*                V cycle
                  W                 W cycle

Table A.3: Multigrid type.

    parameter            value (* def.)      description
    multigrid type       multiplicative*     Use multiplicative preconditioning
                         additive            Use additive preconditioning
                         full                Use full multigrid
                         kaskade             Use kaskade (single shot) multigrid
    interpolation type   domain matching     Use pointwise interpolation
                         pointwise           Use approximate pointwise interpolation
    matrix type          rediscretization    Assemble the matrix on each level
                         galerkin            Construct coarse operators by interpolation

Table A.4: Available smoother types.

    parameter    value (* def.)    description
    smoother     ilu*              Use incomplete LU factorization smoother
                 jacobi            Use Jacobi smoothing
                 sor               Use SOR smoothing
                 icc               Use incomplete Cholesky factorization

Table A.5: Smoother to be used on the coarsest level.

    parameter           value (* def.)    description
    coarsest smoother   ...               In addition to smoother options
                        lu*               Invert the coarsest level using LU
                        cholesky          Invert using Cholesky
                        hypre             Invert using algebraic multigrid (Hypre)

Table A.6: Linear solver type.

    parameter      value (* def.)    description
    outer solver   gmres*            Use GMRES outer iteration
                   fgmres            Use flexible GMRES with restart
                   lgmres            Use augmented restarted GMRES
                   bicg              Use biconjugate gradient
                   cg                Use conjugate gradient
                   richardson        Use the Richardson iteration
                   minres            Use the minimum residual method
                   preonly           Just use multigrid

A.1.2 Interpolation

The interpolation operator construction functions defined in Section 2.7 are distinct from the interpolation functionality built into DOLFIN due to the need for explicit construction of the interpolation matrix. They have an interface that takes the form

141 void domain_matching_interpolant(PETScMatrix &I, const FunctionSpace &F_f, const FunctionSpace &F_c, const PETScMatrix * A = NULL);

• I is the interpolation matrix, passed by reference, to be created

• F_f is the fine function space instance

• F_c is the coarse function space instance

• A is the fine operator, which is used to determine boundary conditions as a hack.

The interpolation functions all share this set of arguments, and the present two options available through the MultigridProblem interface may be augmented with energy-minimizing interpolants and subdomain-matching interpolants in the future.

A.2 Coarsening Infrastructure

A.2.1 CoarsenedHierarchy

The function to create a coarse hierarchy is just coarsen_hierarchy.

void coarsen_hierarchy(const Mesh & mesh,
                       double coarsen_factor,
                       uint n_coarse_meshes,
                       std::vector > & new_meshes,
                       string quality_enforcement = "anisotropic",
                       double C_enf = 20.,
                       string quality_optimization = "aspect_ratio",
                       string boundary_optimization = "volume");

void coarsen_hierarchy(const Mesh & mesh,
                       double coarsen_factor,
                       uint max_coarse_meshes,
                       uint n_coarsest_vertices,
                       std::vector > & new_meshes,
                       string quality_enforcement = "anisotropic",
                       double C_enf = 20.,
                       string quality_optimization = "aspect_ratio",
                       string boundary_optimization = "volume");

The difference between these two interfaces is that the first allows for a fixed number of meshes to be created, while the second allows for a minimum number of vertices to be selected as well.

• coarsen_factor is β, the multiplicative coarsening parameter.

• n_coarse_meshes or max_coarse_meshes is the (maximum) number of coarse meshes to be created

• n_coarsest_vertices is the minimum number of coarse vertices in the coarsest mesh

• new_meshes is the list of pointers to the new meshes created

• quality_enforcement and quality_optimization are the quality parameters

  – aspect_ratio uses the standard aspect ratio, (2.7)
  – anisotropic uses the anisotropic quality measure, (2.24)

• C_enf is the enforced quality bound.

• boundary_optimization is the boundary quality optimization; either volume or none

APPENDIX B

OPTIMAL GRIDS FOR OTHER FUNCTIONS

We have contrasted the exponential grids required for resolving functions resembling $e^{-x}$. We have described other grids commonly used in finite element computation, but have not shown the sorts of functions approximated by them. We have also described algebraic functions and the optimal grids for them, but the derivation is enlightening. The error analysis works in reverse. We take a popular grid grading and describe functions optimally discretized on it. We also work through the error analysis for a set of functions of great interest in many areas of application.

B.1 Geometric Grids

Suppose we have grids graded, for some initial length scale h

$$x_i = h\gamma^i \qquad (B.1)$$

for some $\gamma > 1$. These are the geometrically graded meshes, and are used in a great deal of finite element computation in order to reach scales many orders of magnitude below the scale of the mesh without having extreme refinement everywhere. We may say that

$$x_i - x_{i-1} = h\gamma^{i-1}(\gamma - 1) \qquad (B.2)$$

or, alternatively,

$$x_i - x_{i-1} = x_{i-1}(\gamma - 1). \qquad (B.3)$$

Plugging this into the piecewise-linear error analysis, we have that

$$\frac{2h^2}{x_{i-1}^2(\gamma - 1)^2} = f_g''(x_{i-1}). \qquad (B.4)$$

This is satisfied, up to the constant $h^2$, for $f_g(x) = \frac{2\log(x)}{(\gamma - 1)^2}$. Therefore, logarithmic functions are optimally represented on these grids. Note that $f_g$ is singular as $x \to 0$, and that it rapidly becomes smooth as $x \to \infty$.

B.2 Grids for Algebraic Functions

We now consider algebraic functions, namely the example function

$$f_a(x) = \frac{1}{x} \qquad (B.5)$$

for $x > 0$. The approximation of these functions is important in many physical processes, and therefore the optimal grids for them are taken to consist of some set of points $\{x_i\}$, for some error parameter $h$, satisfying

$$h^2 = \frac{1}{2}(x_i - x_{i-1})^2 f_a''\!\left(\frac{x_i + x_{i-1}}{2}\right) \qquad (B.6)$$
$$= \frac{(x_i - x_{i-1})^2}{\left(\frac{x_i + x_{i-1}}{2}\right)^3} \qquad (B.7)$$
$$\frac{x_i - x_{i-1}}{h} = \left(\frac{x_{i-1} + x_i}{2}\right)^{3/2} \qquad (B.8)$$
$$x' = x^{3/2}. \qquad (B.9)$$

The solution to this differential equation gives mesh spacing at a given $x$ on the order of $h(x) = h\,x^{3/2}$. This is similar to, but more severe than, the a priori graded refinement used around reentrant corners in finite element calculations.
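A quick sketch of generating such a grid by stepping the spacing relation forward is shown here; the starting point, upper limit, and h are illustrative choices.

import numpy as np

def singular_grid(x0, x_max, h):
    # Grid equidistributing the linear interpolation error of f(x) = 1/x:
    # local spacing grows like h * x**1.5.
    pts = [x0]
    while pts[-1] < x_max:
        pts.append(pts[-1] + h * pts[-1] ** 1.5)
    return np.array(pts)

grid = singular_grid(0.01, 1000.0, 0.05)
print(len(grid))        # number of points needed to cover [x0, x_max]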

[Figure B.1 legend: Exponential, n = 30; Geometric (1.25), n = 35; Singular, n = 176.]

Figure B.1: Exponential, geometric (γ = 1.25), and singular grids at h = 0.05.

We see in Figure B.1 that both the geometric and singular meshes are very different from the exponential meshes used in Chapter 3.

APPENDIX C

MOLECULAR MESHING

Figure C.1: Simple meshes of a benzene molecule and trypsin protein (.pdb from [146]).

Here we discuss the creation of general molecular meshes for use with this simulation. This work has been partially completed at the time of writing. We consider schemes for molecular meshing that proceed in two steps: the definition of the surface, and the creation of the mesh within the surface. The surface splits the two domains, protein and water. We also impose a bounding box some distance away from the molecular surface in order to be able to compute on it efficiently. The benefit of this approach is that we may black-box one, the other, or both of these operations. For instance, here we choose to generate the surface ourselves in order to be able to pick a resolution. However, we could just as easily take the surface from a specialized surface meshing tool and mesh the volume separately. We will discuss these options in the surface meshing section.

One additional requirement we impose on the mesh is that there must be vertices at the atomic nuclei. This is purely a practical consideration given that, in order to avoid searching the mesh repeatedly to determine the closest feature to the atomic nuclei, we may just name a vertex that corresponds.

C.0.1 Surface Meshing

The problem of creating molecular meshes is much more involved than that of creating meshes for simple geometries. We choose to use a very simple scheme for creating a surface mesh, combined with an off-the-shelf Delaunay tetrahedral mesh generator, in order to be able to generalize the method. This also has the upside that we may use quality surface meshes generated externally and tetrahedralize them ourselves. Other solutions will be tested in future endeavors.

We have a few simple requirements for molecular meshes for these problems. The first of these requirements is that the mesh must correspond, within some specified minimum mesh size in comparison to the atomic radii, to the molecular surface as defined by the Van der Waals or solvent accessible surface. However, the difference between these two in the nonlocal setting is much less clear than it is with a sudden coefficient drop. Another requirement is that a vertex in the mesh must correspond with the center of each atom. As we use mesh-based regularization of the central charges of the individual atoms rather than some splitting, we make the simple requirement that each atom center correspond to a mesh vertex for the sake of having some easily controllable location within each atom.

A very simple scheme for molecular meshing that is O(n) for n atoms in the molecule may be employed by having an implicit computational grid upon which the atoms have been located. The exterior of the molecular surface may then be discovered by traversal upon this grid.

Now, the grid could be used for the process of meshing, but one would have to be very careful about inverted tetrahedra when the simplicial mesh is derived from the grid. Because of this, we choose to use the tetrahedral mesh generator TetGen [130] to generate the mesh instead. Using the surface and the total size of the proposed domain as inputs to the mesh generator, one may create domains of general size for reasonably sized molecules suitable for use with the finite element method.

C.0.2 Inputs

The inputs to the algorithm are actually the files used to set up a molecular dynamics simulation: a protein database file (.pdb), used to specify atom coordinates and bonds, and a protein structure file (.psf), used to specify charges and other force-field-dependent quantities, generated by NAMD. The .pdb files are available on the Internet through the Protein Data Bank (PDB) [21], and the partial charges are provided by the CHARMM [33] force field built using the NAMD package [109]. Given this input, we should be able to mesh a molecular surface quickly in a way that may be generalized to relatively large molecules from the PDB with ease.

In addition, the inputs require some definition of the atomic radii, for which we use the VdW radii [29]. The mesher also allows for the initial resolution to be chosen in terms of the number of divisions of the smallest atomic radius for the surface mesh, as noted earlier. This is a simple way of enabling coarse and fine versions of the same mesh to be created, and allows for simplified versions of the various simulations to be quickly built and run.

C.0.3 Algorithm

The meshing algorithm we use starts with an implicit grid that covers the domain of interest out to a certain radius. This implicit grid is traversed outward from the gridpoints closest to the atom centers (these gridpoints are moved to the atom centers in post-processing). The grid spacing is defined as being some factor smaller than the smallest atom radius. This enables the entire molecule to be resolved with a fairly well specified accuracy. In this traversal, gridpoints are marked as inside if they are inside any atom's radius, and as surface if they are a next-door neighbor of any inside gridpoint. The surface is defined by simply taking the exterior of the standard tetrahedralization of only the inside and surface vertices, ensuring that the surface is watertight and of reasonable quality. These surface vertices are then moved onto the molecular surface and smoothed.

This surface, the locations of the atom center vertices, and the bounding box out some radius from the molecule are then sent to a tetrahedral mesh generator in order to be turned into a mesh. This mesh will be graded from interior to exterior length scales by any reasonable quality-constrained mesh generator. The output of this process is a simplicial mesh with lists of cells that are on the protein interior and facets that make up the barrier between the two regions. This mesh is entirely useful for the type of calculations we have completed with the atoms.
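A simplified sketch of the interior/surface marking step on an implicit grid is given below; the outward traversal, the projection and smoothing of surface vertices, and the hand-off to TetGen are not shown, and the coordinates, radii, and box are placeholders.

import numpy as np

def mark_grid(atom_xyz, atom_r, spacing, box_min, box_max):
    # Mark implicit-grid points as inside (2) if within any atomic radius,
    # surface (1) if a face-neighbor of an inside point, otherwise outside (0).
    axes = [np.arange(box_min[d], box_max[d] + spacing, spacing) for d in range(3)]
    X, Y, Z = np.meshgrid(*axes, indexing="ij")
    inside = np.zeros(X.shape, dtype=bool)
    for (x, y, z), r in zip(atom_xyz, atom_r):
        inside |= (X - x) ** 2 + (Y - y) ** 2 + (Z - z) ** 2 <= r ** 2

    marks = np.where(inside, 2, 0)
    neighbor = np.zeros_like(inside)
    for axis in range(3):
        for step in (-1, 1):
            # np.roll wraps at the box edges, which is harmless when the box
            # is padded well beyond the molecule
            neighbor |= np.roll(inside, step, axis=axis)
    marks[np.logical_and(neighbor, ~inside)] = 1
    return marks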

REFERENCES

[1] Adams, M., and Demmel, J. W. Parallel multigrid solver for 3D unstructured finite element problems. In Supercomputing '99: Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM) (New York, NY, USA, 1999), ACM, pp. 27+.

[2] Apel, T., and Milde, F. Comparison of several mesh refinement strategies near edges. Communications In Numerical Methods In Engineering 12 (1996), 373–381.

[3] Åqvist, J. Ion-water interaction potentials derived from free energy perturbation simulations. The Journal of Physical Chemistry 94, 21 (Oct. 1990), 8021–8024.

[4] Attard, P. Ion condensation in the electric double layer and the corresponding Poisson-Boltzmann effective surface charge. The Journal of Physical Chemistry 99, 38 (Sept. 1995), 14174–14181.

[5] Attard, P., Wei, D., and Patey, G. N. Critical comments on the nonlocal dielectric function employed in recent theories of the hydration force. Chemical Physics Letters 172, 1 (Aug. 1990), 69–72.

[6] Azuara, C., Lindahl, E., Koehl, P., Orland, H., and Delarue, M. PDB Hydro: incorporating dipolar solvents with variable density in the Poisson-Boltzmann treatment of macromolecule electrostatics. Nucleic Acids Research 34, suppl 2 (July 2006), W38–W42.

[7] Babuska, I., and Aziz, A. K. On the angle condition in the finite element method. SIAM Journal on Numerical Analysis 13, 2 (1976), 214–226.

[8] Babuška, I., Kellogg, R. B., and Pitkäranta, J. Direct and inverse error estimates for finite elements with mesh refinements. Numerische Mathematik 33, 4 (December 1979), 447–471.

[9] Balay, S., Buschelman, K., Gropp, W. D., Kaushik, D., Knepley, M. G., McInnes, L. C., Smith, B. F., and Zhang, H. PETSc Web page, 2010. http://www.mcs.anl.gov/petsc.

[10] Bangerth, W., and Kayser-Herold, O. Data structures and requirements for hp finite element software. ACM Trans. Math. Softw., in print (2008).

[11] Bank, R. A posteriori error estimates, adaptive local mesh refinement and multigrid iteration. In Multigrid Methods II, W. Hackbusch and U. Trottenberg, Eds., vol. 1228 of Lecture Notes in Mathematics. Springer Berlin / Heidelberg, 1986, pp. 7–22.

[12] Bank, R. E., and Scott, L. R. On the conditioning of finite element equations with highly refined meshes. SIAM J. Numer. Anal. 26, 6 (1989), 1383–1394.

[13] Bao, G., and Symes, W. W. Computation of pseudo-differential operators. SIAM Journal on Scientific Computing 17, 2 (1996), 416–429.

[14] Bardhan, J. P. Numerical solution of boundary-integral equations for molecular electrostatics. The Journal of Chemical Physics 130, 9 (2009), 094102+.

[15] Barrachina, S., Castillo, M., Igual, F. D., Mayo, R., and Quintana-Ortí, E. S. Evaluation and tuning of the level 3 CUBLAS for graphics processors. In Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on (April 2008), pp. 1–8.

[16] Bashford, D., and Case, D. A. Generalized Born models of macromolecular solvation effects. Annual Review of Physical Chemistry 51, 1 (2000), 129–152.

[17] Basilevsky, M. V., and Chuev, G. N. Nonlocal solvation theories. In Continuum Solvation Models in Chemical Physics: From Theory to Applications, B. Mennucci and R. Cammi, Eds. John Wiley and Sons, 2007.

[18] Basilevsky, M. V., and Parsons, D. F. An advanced continuum medium model for treating solvation effects: Nonlocal electrostatics with a cavity. The Journal of Chemical Physics 105, 9 (1996), 3734–3746.

[19] Basilevsky, M. V., and Parsons, D. F. Nonlocal continuum solvation model with oscillating susceptibility kernels: A nonrigid cavity model. The Journal of Chemical Physics 108, 21 (1998), 9114–9123.

[20] Bastian, P., and Wieners, C. Multigrid methods on adaptively refined grids. Computing in Science and Engg. 8 (Nov. 2006), 44–54.

[21] Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T., Weissig, H., Shindyalov, I. N., and Bourne, P. E. The Protein Data Bank. Nucleic Acids Research 28 (2000), 235–242.

[22] Beylkin, G., Coifman, R., and Rokhlin, V. Fast wavelet transforms and numerical algorithms I. Comm. Pure Appl. Math. 44, 2 (1991), 141–183.

[23] Bittencourt, M. L., Douglas, C. C., and Feijóo, R. A. Non-nested and non-structured multigrid methods applied to elastic problems part II: The three-dimensional case, 1998.

[24] Bittencourt, M. L., Douglas, C. C., and Feijóo, R. A. Adaptive non-nested multigrid methods. Engineering Computations 19, 2 (2002), 158–176.

[25] Blum, H., and Rannacher, R. Extrapolation techniques for reducing the pollution effect of reentrant corners in the finite element method. Numerische Mathematik 52 (1988), 539–564.

[26] Blum, L. Mean spherical model for asymmetric electrolytes. Molecular Physics (November 1975), 1529–1535.

[27] Boda, D., Gillespie, D., Nonner, W., Henderson, D., and Eisenberg, B. Computing induced charges in inhomogeneous dielectric media: Application in a Monte Carlo simulation of complex ionic systems. Physical Review E 69, 4 (Apr 2004).

[28] Boda, D., Henderson, D., and Busath, D. D. Monte Carlo study of the effect of ion and channel size on the selectivity of a model calcium channel. The Journal of Physical Chemistry B 105, 47 (Nov. 2001), 11574–11577.

[29] Bondi, A. van der Waals volumes and radii. The Journal of Physical Chem- istry 68, 3 (Mar. 1964), 441–451.

[30] Brenner, S. C., and Scott, R. The Mathematical Theory of Finite Element Methods, 3rd ed. Springer, Nov. 2010.

[31] Brezina, M., Mandel, J., and Vanek, P. Energy optimization of algebraic multigrid bases. Tech. rep., University of Colorado at Denver, Denver, CO, USA, 1998.

[32] Briggs, W. L., Henson, V. E., and McCormick, S. F. A Multigrid Tutorial, 2nd ed. SIAM: Society for Industrial and Applied Mathematics, July 2000.

[33] Brooks, B. R., Bruccoleri, R. E., Olafson, B. D., States, D. J., Swaminathan, S., and Karplus, M. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. 4, 2 (Feb. 1983), 187–217.

[34] Brune, P. Enabling unstructured multigrid under the Sieve framework. Master's thesis, University of Chicago, Chicago, Illinois, 2008.

[35] Brune, P. R., Knepley, M. G., and Scott, L. R. Unstructured geometric multigrid in two and three dimensions on complex and graded meshes. Submitted (2011).

[36] Bylaska, E. J., Holst, M., and Weare, J. H. Adaptive finite element method for solving the exact Kohn-Sham equation of density functional theory. Journal of Chemical Theory and Computation 5, 4 (2009), 937–948.

[37] Cai, Q., Wang, J., Zhao, H. K., and Luo, R. On removal of charge singularity in Poisson–Boltzmann equation. The Journal of Chemical Physics 130, 14 (2009).

[38] Cammi, R., and Tomasi, J. Remarks on the use of the apparent surface charges (ASC) methods in solvation problems: Iterative versus matrix-inversion procedures and the renormalization of the apparent charges. J. Comput. Chem. 16, 12 (1995), 1449–1458.

[39] Chan, T., Smith, B., and Zou, J. Multigrid and domain decomposition methods for unstructured meshes, 1994.

[40] Chan, T. F., and van der Vorst, H. A. Approximate and incomplete factorizations. In ICASE/LARC Interdisciplinary Series In Science and Engineering (1994), vol. 20, pp. 167–202.

[41] Chen, J., Hu, W., and Puso, M. Orbital HP-clouds for solving Schrödinger equation in quantum mechanics. Computer Methods in Applied Mechanics and Engineering 196, 37-40 (2007), 3693–3705. Special Issue Honoring the 80th Birthday of Professor Ivo Babuška.

[42] Chen, L., Holst, M., and Xu, J. The finite element approximation of the nonlinear Poisson-Boltzmann equation. ArXiv e-prints (Jan. 2010).

[43] Chen, L., and Zhang, C. A coarsening algorithm on adaptive grids by newest vertex bisection and its applications. Journal of Computational Mathematics 28, 6 (2010), 767–789.

[44] Connolly, M. L. Solvent-accessible surfaces of proteins and nucleic acids. Science 221, 4612 (Aug. 1983), 709–713.

[45] Darden, T., York, D., and Pedersen, L. Particle mesh Ewald: An N·log(N) method for Ewald sums in large systems. The Journal of Chemical Physics 98, 12 (June 1993), 10089–10092.

[46] Debye, P. Polar molecules. Angewandte Chemie 42, 41 (1929), 995.

[47] Demanet, L., and Ying, L. Discrete symbol calculus. SIAM Review (Jul 2008).

[48] Dogonadze, R. R., and Kornyshev, A. A. Polar solvent structure in the theory of ionic solvation. J. Chem. Soc., Faraday Trans. 2: 70 (1974), 1121–1132.

[49] DOLFIN, 2006. URL: http://www.fenicsproject.org/dolfin/.

[50] Dubiner, M. Spectral methods on triangles and other domains. J. Sci. Comput. 6, 4 (1991), 345–390.

[51] Dupont, T., Kendall, R. P., and Rachford, H. H. An approximate factorization procedure for solving self-adjoint elliptic difference equations. SIAM Journal on Numerical Analysis 5 (Sept. 1968), 559–573.

[52] Dyn, N., Hormann, K., Kim, S.-J., and Levin, D. Optimizing 3D triangulations using discrete curvature analysis. In Mathematical Methods for Curves and Surfaces. Vanderbilt University, Nashville, TN, USA, 2001, pp. 135–146.

[53] Eisenberg, B. Crowded charges in ion channels. Advances in Chemical Physics (Sept. 2010).

[54] Falgout, R. D., and Yang, U. M. Hypre: a library of high performance preconditioners. In Preconditioners, Lecture Notes in Computer Science (2002), pp. 632–641.

[55] Fasel, C., Rjasanow, S., and Steinbach, O. A boundary integral formulation for nonlocal electrostatics. In Numerical Mathematics and Advanced Applications, K. Kunisch, G. Of, and O. Steinbach, Eds. Springer Berlin Heidelberg, Berlin, Heidelberg, 2008, ch. 13, pp. 117–124.

[56] FEniCS, 2006. URL: http://www.fenicsproject.org/.

[57] FFC, 2007. URL: http://www.fenicsproject.org/ffc/.

[58] Freitag, L. A., and Knupp, P. M. Tetrahedral mesh improvement via optimization of the element condition number. Int. J. Numer. Meth. Engng. 53, 6 (2002), 1377–1391.

[59] Frigo, M., and Johnson, S. G. FFTW: an adaptive software architecture for the FFT. Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on 3 (May 1998), 1381–1384 vol.3.

[60] Garcke, J., and Griebel, M. On the computation of the eigenproblems of hydrogen and helium in strong magnetic and electric fields with the sparse grid combination technique, 2000.

[61] Garcke, J. Sparse grid tutorial, August 2006.

[62] Gerstner, T. Sparse Grid Quadrature Methods for Computational Finance. Habilitation, Institute for Numerical Simulation, University of Bonn, 2007.

[63] Gillespie, D. Toward making the mean spherical approximation of primitive model electrolytes analytic: An analytic approximation of the MSA screening parameter. The Journal of Chemical Physics 134, 4 (Jan. 2011).

[64] Gillespie, D., Nonner, W., and Eisenberg, R. S. Coupling Poisson-Nernst-Planck and density functional theory to calculate ion flux. Journal of Physics: Condensed Matter 14, 46 (2002), 12129–12145.

[65] Gillespie, D., Nonner, W., and Eisenberg, R. S. Density functional theory of charged, hard-sphere fluids. Physical review. E, Statistical, nonlinear, and soft matter physics 68, 3 Pt 1 (September 2003).

[66] Golub, G. H., and Van Loan, C. F. Matrix Computations, 3rd ed. The Johns Hopkins University Press, Oct. 1996.

[67] Greengard, L. Fast algorithms for classical physics. Science (1994).

[68] Griebel, M., and Hamaekers, J. Tensor product multiscale many-particle spaces with finite-order weights for the electronic Schrödinger equation. Tech. rep., Institute for Numerical Simulation, University of Bonn, 2009.

[69] Grundmann, A., and Moller, H. M. Invariant integration formulas for the n-simplex by combinatorial methods. SIAM Journal on Numerical Analysis 15, 2 (1978), 282–290.

[70] GeForce GTX285. http://www.nvidia.com/object/product_geforce_gtx_285_us.html, June 2010.

[71] Guillard, H. Node-nested multi-grid method with Delaunay coarsening. Research Report RR-1898, INRIA, 1993.

[72] Hackbusch, W. Multi-Grid Methods and Applications (Springer Series in Computational Mathematics). Springer, Dec. 2010.

[73] Hatcher, A. Algebraic Topology. Cambridge University Press, 2002.

[74] Hess, B., Holm, C., and van der Vegt, N. Modeling multibody effects in ionic solutions with a concentration dependent dielectric permittivity. Phys. Rev. Lett. 96, 14 (Apr 2006), 147801.

[75] Hildebrandt, A., Blossey, R., Rjasanow, S., Kohlbacher, O., and Lenhof, H. P. Novel formulation of nonlocal electrostatics. Physical Review Letters 93, 10 (Sept. 2004).

[76] Hildebrandt, A., Blossey, R., Rjasanow, S., Kohlbacher, O., and Lenhof, H.-P. Electrostatic potentials of proteins in water: a structured continuum approach. Bioinformatics 23, 2 (Jan. 2007), e99–e103.

[77] Hilgenfeldt, S. Numerical solution of the stationary Schrödinger equation using finite element methods on sparse grids.

[78] Il’in, V. P. Iterative Incomplete Factorization Methods. World Scientific Pub Co Inc, July 1992.

[79] Im, W. A grand canonical Monte Carlo Brownian dynamics algorithm for simulating ion channels. Biophysical Journal 79, 2 (Aug. 2000), 788–801.

[80] Jean-Charles, A., Nicholls, A., Sharp, K., Honig, B., Tempczyk, A., Hendrickson, T. F., and Still, W. C. Electrostatic contributions to solvation energies: comparison of free energy perturbation and continuum calculations. Journal of the American Chemical Society 113, 4 (Feb. 1991), 1454–1455.

[81] Jones, J. E., and Vassilevski, P. S. AMGe based on element agglomeration. SIAM Journal on Scientific Computing 23, 1 (2001), 109–133.

[82] Jung, M. Some multilevel methods on graded meshes. Journal of Computational and Applied Mathematics 138, 1 (January 2002), 151–171.

[83] Kaatze, U. Complex permittivity of water as a function of frequency and temperature. Journal of Chemical & Engineering Data 34, 4 (Oct. 1989), 371–374.

[84] Kahan, W. Pracniques: further remarks on reducing truncation errors. Commun. ACM 8, 1 (Jan. 1965), 40+.

[85] Klöckner, A., Pinto, N., Lee, Y., Catanzaro, B., Ivanov, P., and Fasih, A. PyCUDA: GPU run-time code generation for high-performance computing, Nov 2009.

[86] Knepley, M., and Karpeev, D. A flexible representation for computational meshes. Research Report, Argonne National Lab, 2008.

[87] Knepley, M. G., and Karpeev, D. A. Flexible representation of computational meshes. Technical Report ANL/MCS-P1295-1005, Argonne National Laboratory, October 2005.

[88] Knepley, M. G., Karpeev, D. A., Davidovits, S., Eisenberg, R. S., and Gillespie, D. An efficient algorithm for classical density functional theory in three dimensions: ionic solutions. The Journal of chemical physics 132, 12 (March 2010).

[89] Koehl, P., Orland, H., and Delarue, M. Beyond the Poisson-Boltzmann model: Modeling biomolecule-water and water-water interactions. Phys. Rev. Lett. 102, 8 (Feb 2009), 087801.

[90] Kornyshev, A. A., Rubinshtein, A. I., and Vorotyntsev, M. A. Model nonlocal electrostatics. I. Journal of Physics C: Solid State Physics 11, 15 (Aug. 1978), 3307+.

[91] Köster, M., and Turek, S. A note on optimal multigrid convergence for higher order FEM. Tech. rep., University of Dortmund, 1995.

[92] Lamoureux, M., and Margrave, G. An Introduction to Numerical Methods of Pseudodifferential Operators. In Proceedings of the CIME Workshop on Pseudodifferential Operators, Quantization, and Signals (2008), pp. 79–133.

[93] Ledoux, H., Gold, C. M., and Baciu, G. Flipping to robustly delete a vertex in a Delaunay tetrahedralization. In ICCSA (1) (2005), pp. 737–747.

[94] Lee, B., and Richards, F. M. The interpretation of protein structures: Estimation of static accessibility. Journal of Molecular Biology 55, 3 (Feb. 1971), 379–IN4.

[95] Li, Y., Dongarra, J., and Tomov, S. A note on auto-tuning GEMM for GPUs. In Computational Science ICCS 2009, G. Allen, J. Nabrzyski, E. Seidel, G. D. Albada, J. Dongarra, and P. M. A. Sloot, Eds., vol. 5544. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009, ch. 89, pp. 884–892.

[96] Liao, X., and Nochetto, R. H. Local a posteriori error estimates and adaptive control of pollution effects. Numerical Methods for Partial Differential Equations 19, 4 (2003), 421–442.

[97] Logg, A., and Wells, G. N. DOLFIN: Automated finite element computing. ACM Transactions on Mathematical Software 37, 2 (2010), 20:1–20:28.

[98] Löhner, R., and Morgan, K. An unstructured multigrid method for elliptic problems. International Journal for Numerical Methods in Engineering 24, 1 (1987), 101–115.

[99] Lorentz, G. G. Bernstein Polynomials, 2nd ed. American Mathematical Society, Oct. 1997.

[100] Marcus, Y. Thermodynamics of solvation of ions. Part 5.-Gibbs free energy of hydration at 298.15 K. Faraday Trans. 87, 18 (1991), 2995–2999.

[101] Mesri, Y., and Guillard, H. An automatic mesh coarsening technique for three dimensional anisotropic meshes. Rapport de recherche RR-6344, INRIA, 2007.

[102] Miller, G. L., Talmor, D., and Teng, S.-H. Optimal coarsening of unstructured meshes. J. Algorithms 31, 1 (1999), 29–65.

[103] Murphy, M., Mount, D. M., and Gable, C. W. A point-placement strategy for conforming Delaunay tetrahedralization. In Proceedings of the Eleventh Annual Symposium on Discrete Algorithms (2000), vol. 11, pp. 67–74.

[104] NVIDIA. NVIDIA CUDA Programming Guide 2.0, 2008.

[105] Ollivier-Gooch, C. Coarsening unstructured meshes by edge contraction. Int. J. Numer. Meth. Engng. 57, 3 (2003), 391–414.

[106] Parthasarathy, V. A comparison of tetrahedron quality measures. Finite Elements in Analysis and Design 15, 3 (Jan. 1994), 255–261.

[107] Pask, J. E., and Sterne, P. A. Finite element methods in ab initio electronic structure calculations. Modelling and Simulation in Materials Science and Engineering 13, 3 (2005), R71–R96.

[108] PETSc, 2006. URL: http://www.mcs.anl.gov/petsc/.

[109] Phillips, J. C., Braun, R., Wang, W., Gumbart, J., Tajkhorshid, E., Villa, E., Chipot, C., Skeel, R. D., Kalé, L., and Schulten, K. Scalable molecular dynamics with NAMD. Journal of Computational Chemistry 26, 16 (Dec. 2005), 1781–1802.

[110] Pople, J., and Hehre, W. Computation of electron repulsion integrals involving contracted Gaussian basis functions. Journal of Computational Physics 27, 2 (May 1978), 161–168.

[111] Dai, D.-Q., Han, B., and Jia, R.-Q. Galerkin analysis for Schrödinger equation by wavelets. Journal of Mathematical Physics 45 (2004), 855–869.

[112] Rey, B., Mocellin, K., and Fourment, L. A node-nested Galerkin multigrid method for metal forging simulation. Computing and Visualization in Science 11, 1 (January 2008), 17–25.

[113] Rivara, M. C. Mesh refinement processes based on the generalized bisection of simplices. SIAM Journal on Numerical Analysis 21, 3 (1984).

[114] Rosenfeld, Y. Free energy model for inhomogeneous fluid mixtures: Yukawa-charged hard spheres, general interactions, and plasmas. The Journal of Chemical Physics 98, 10 (1993), 8126–8148.

[115] Rubinstein, A., and Sherman, S. Evaluation of the influence of the internal aqueous solvent structure on electrostatic interactions at the protein-solvent interface by nonlocal continuum electrostatic approach. Biopolymers 87, 2-3 (2007), 149–164.

[116] Ruppert, J., and Seidel, R. On the difficulty of tetrahedralizing 3-dimensional non-convex polyhedra. In Proceedings of the fifth annual symposium on Computational geometry (New York, NY, USA, 1989), SCG '89, ACM, pp. 380–392.

[117] Rusinkiewicz, S. Estimating curvatures and their derivatives on triangle meshes. 3D Data Processing Visualization and Transmission, International Symposium on 0 (2004), 486–493.

[118] Saad, Y. Iterative Methods for Sparse Linear Systems, 2nd ed. Society for Industrial and Applied Mathematics, Apr. 2003.

[119] Sandberg, L., Casemyr, R., and Edholm, O. Calculated hydration free energies of small organic molecules using a nonlinear dielectric continuum model. The Journal of Physical Chemistry B 106, 32 (Aug. 2002), 7889–7897.

[120] Sanner, M. F., Olson, A. J., and Spehner, J.-C. Reduced surface: An efficient way to compute molecular surfaces. Biopolymers 38, 3 (Dec. 1996), 305–320.

[121] Schöberl, J. Netgen - an advancing front 2D/3D-mesh generator based on abstract rules.

[122] Schutz, C. N., and Warshel, A. What are the dielectric ”constants” of proteins and how to validate electrostatic models? Proteins 44, 4 (September 2001), 400–417.

[123] Scott, L. R. Nonstandard dielectric response. Tech. rep., The University of Chicago, September 2010.

[124] Scott, L. R., and Zhang, S. Finite element interpolation of nonsmooth functions satisfying boundary conditions. Mathematics of Computation 54, 190 (1990), 483–493.

[125] Scott, L. R., and Zhang, S. Higher-dimensional nonnested multigrid methods. Mathematics of Computation 58, 198 (1992), 457–466.

[126] Scott, R., Boland, M., Rogale, K., and Fernandez, A. Continuum equations for dielectric response to macro-molecular assemblies at the nano scale. Journal of Physics A: Mathematical and General (Oct. 2004), 9791–9803.

[127] Shewchuk, J. R. Triangle: engineering a 2D quality mesh generator and Delaunay triangulator. In Applied Computational Geometry: Towards Geometric Engineering, M. C. Lin and D. Manocha, Eds., vol. 1148 of Lecture Notes in Computer Science. Springer-Verlag, May 1996, pp. 203–222. From the First ACM Workshop on Applied Computational Geometry.

[128] Shewchuk, J. R. Tetrahedral mesh generation by Delaunay refinement. In Proceedings of the fourteenth annual symposium on Computational geometry (New York, NY, USA, 1998), SCG ’98, ACM, pp. 86–95.

[129] Shewchuk, J. R. What is a good linear element? - interpolation, conditioning, and quality measures. In 11th International Meshing Roundtable (2002), pp. 115–126.

[130] Si, H. TetGen website, 2007. http://tetgen.berlios.de/index.html.

[131] Slater, J. C. Atomic shielding constants. Physical Review Online Archive (Prola) 36, 1 (July 1930), 57–64.

[132] Smith, B., Bjørstad, P., and Gropp, W. Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations. Cambridge University Press, March 2004.

[133] Smolyak, S. A. Quadrature and interpolation formulas for the classes W^a_s and E^a_s. Dokl. Akad. Nauk SSSR 131 (1960), 1028–1031. Russian; Engl. transl.: Soviet Math. Dokl. 1:384–387, 1963.

[134] Stone, J. E., Gohara, D., and Shi, G. OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in Science & Engineering 12, 3 (May 2010), 66–73.

[135] Stüben, K. A review of algebraic multigrid. J. Comput. Appl. Math. 128, 1-2 (Mar. 2001), 281–309.

[136] Sutmann, G. Computer simulation of the non-local dielectric function of polar liquids: boundary conditions revisited. Molecular Physics 96 (1999), 1781–1788.

[137] Talmor, D. Well-Spaced Points for Numerical Methods. PhD thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania, 1997.

[138] Taylor, M. E. Pseudodifferential operators. Princeton University Press, Princeton, N.J., 1981.

[139] Taylor, R. L. On completeness of shape functions for finite element analysis. International Journal for Numerical Methods in Engineering 4, 1 (1972), 17–22.

[140] Teeter, M. M. Water-protein interactions: Theory and experiment. Annual Review of Biophysics and Biophysical Chemistry 20, 1 (1991), 577–600.

[141] Teitzel, C., Hopf, M., and Ertl, T. Scientific visualization on sparse grids. Tech. rep., Universität Erlangen-Nürnberg, Lehrstuhl für Graphische Datenverarbeitung (IMMD IX), 2000.

[142] Treves, F. Introduction to pseudodifferential and Fourier integral operators. Plenum Press, New York, 1980.

[143] Trottenberg, U., Oosterlee, C. W., and Schüller, A. Multigrid, 1st ed. Academic Press, Dec. 2000.

[144] UFC, 2009. URL: http://www.fenicsproject.org/ufc/.

[145] UFL, 2009. URL: http://www.fenicsproject.org/ufl/.

[146] Walter, J., Steigemann, W., Singh, T. P., Bartunik, H., Bode, W., and Huber, R. On the disordered activation domain in trypsinogen: chemical labelling and low-temperature crystallography. Acta Crystallographica Section B 38, 5 (May 1982), 1462–1472.

[147] Wan, W. L., Chan, T. F., and Smith, B. An energy-minimizing interpolation for robust multigrid methods. SIAM J. Sci. Comput. 21 (Dec. 1999), 1632–1649.

[148] Warshel, A., Sussman, F., and King, G. Free energy of charges in solvated proteins: microscopic calculations using a reversible charging process. Biochemistry 25, 26 (Dec. 1986), 8368–8372.

[149] Weggler, S., Rutka, V., and Hildebrandt, A. A new numerical method for nonlocal electrostatics in biomolecular simulations. Journal of Computational Physics (Feb. 2010).

[150] Whaley, R. C. Automated empirical optimizations of software and the ATLAS project. Parallel Computing 27, 1-2 (January 2001), 3–35.

[151] Xie, D., Jiang, Y., Brune, P., and Scott, L. R. A fast solver for the nonlocal dielectric continuum model. In Preparation (2011).

[152] Yada, H., Nagai, M., and Tanaka, K. The intermolecular stretching vibration mode in water isotopes investigated with broadband terahertz time-domain spectroscopy. Chemical Physics Letters 473, 4-6 (2009), 279–283.

[153] Yserentant, H. On the regularity of the electronic Schrödinger equation in Hilbert spaces of mixed derivatives. Numer. Math. 98, 4 (2004), 731–759.

[154] Yserentant, H. Sparse grid spaces for the numerical solution of the electronic Schrödinger equation. Numerische Mathematik 101, 2 (Aug. 2005), 381–389.

[155] Yserentant, H. On the multi-level splitting of finite element spaces. Numer. Math. 49 (1986), 379–412.

[156] Zhang, S. Optimal-order nonnested multigrid methods for solving finite element equations II: On non-quasi-uniform meshes. Mathematics of Computation 55, 192 (1990), 439–450.

[157] Zhang, S. Optimal-order nonnested multigrid methods for solving finite element equations III: On degenerate meshes. Mathematics of Computation 64, 209 (1995), 23–49.

[158] Zheng, W. Numerical solutions of the Schrödinger equation for the ground lithium by the finite element method. Applied Mathematics and Computation 153, 3 (June 2004), 685–695.
