<<

1

From Fundamental First-Principle Calculations to NanoEngineering Applications: Review of the NESSIE project James Kestyn, Eric Polizzi

Abstract—This paper outlines how modern first-principle cal- and modeling techniques are indeed largely inadequate to cope culations can adequately address the needs for ever higher with the new generation of challenges encountered in large- levels of numerical accuracy and high-performance in large-scale scale nanoengineering applications including systems with electronic structure simulations, and pioneer the fundamental study of quantum many-body effects in a large number of many thousand atoms. Atomistic simulations must adapt to emerging nanomaterials. leverage the current needs in scalability by capitalizing on the massively parallel capabilities of modern high-performance Index Terms—DFT, TDDFT, electronic structure, first- principle, real-space , real-time propagation, excited-state, computing (HPC) platforms. Additionally, well-established plasmonic, FEAST, NESSIE public or commercial software packages were originally in- tended to investigate the basic electronic structure properties of materials using ground-state calculations. They possess only I.INTRODUCTION limited capabilities for performing excited-state calculations HE TECHNOLOGY for electronic devices has been on that can efficiently model and predict quantum many-body T a rapidly rising trajectory since the 1960s. The main effects in emerging nanomaterials. The ability to capture these factor in this development has been the ability to fabricate fundamental nanophysics effects is increasingly important for ever smaller silicon CMOS devices (‘Moore’s Law’), with exploring and prototyping new revolutionary functional mate- today’s device sizes in the nanometer range. The ability to rials in nanotechnology. There is an urgent need in nanoengi- control electronic materials and understand their properties neering for new quantum-based transformative solutions that has been a driving force for technological breakthroughs. The will play a role in future electronics including plasmonics, emergence of new nanoscale materials and devices, whose phononics and excitonics. Future breakthrough could enable operating principles rely entirely on quantum effects, necessi- disruptive technologies to compete directly with CMOS or be tates a fundamental and comprehensive understanding of the integrated into existing systems to increase throughput and nanoscale physics of systems. First principle calculations offer decrease power dissipation. To this end, new one and two- a unique approach to study materials that start directly from the dimensional nanostructures (e.g. graphene, carbon nanotubes, mathematical equations describing the physical laws and do MoS2, layered transition metal dichalcogenides) have been the not require any empirical parameters aside from fundamental center of large research efforts, with first-principle atomistic constants. They are known as electronic structure calculations simulations playing a significant role. Plasmonic devices, that when applied to the configuration of electrons in a molecule or rely on collective many-body effects, have also shown promise solid which determine most of the physical properties of matter as high frequency analog sensors to be used in bio-medical through chemical bonding. Fundamentals laws governing the applications and telecommunications. physics have been known since the beginning of the 20th century with the development of quantum mechanics. The This paper presents the entire computational process needed difficulty, then, does not lie in formulating the problem, but to bring fundamental first-principle calculations up to the actually solving it. Atom-by-atom large-scale first-principle level where they can significantly impact innovations in na- calculations have become critical for supplementing the ex- noengineering. As depicted in Figure 1, modern first principle perimental investigations and obtaining detailed electronic calculations must be capable of: (i) achieving both accuracy arXiv:2002.06732v1 [cond-mat.mtrl-sci] 17 Feb 2020 structure properties and reliable characterization of emerging and scalability; (ii) allowing for the study emerging many- nanomaterials. These simulations are essential to assist the body functionality; and (iii) fully taking advantage of recent every day work of numerous engineers and scientists and can advanced in algorithms development for high-performance universally impact a wide range of disciplines (engineering, computing (HPC). The basics of first principle modeling physics, chemistry, and biology) that span technological fields are first summarized in Section II. From ground-state to of computing, sensing and energy. excited-state calculations, the NESSIE modeling framework In spite of the enormous progress that has been made is then introduced in Section III. NESSIE’s accuracy and in the last few decades, the room for improvement in first- scalability are described in detail using various simulation principle calculations is still significant. Traditional numerical results. The paper ends by discussing plasmonics in Section IV Department of Electrical and Computer Engineering, University of Mas- as a nanoengineering application of large-scale first-principle sachusetts, Amherst, MA, 01003 USA e-mail: [email protected] calculations. 2

Fig. 1. Requirement in modern first-principle calculations at a glance.

II.FIRST-PRINCIPLE MODELING:FROM PHYSICSTO computational material science for decades, since it provides ALGORITHMS (in principle) an exact method for calculating the ground-state density and energy of a system of interacting electrons using a The field of first-principle modeling can be broadly sep- non-linear single electron Schrodinger-like¨ equation associated arated into three categories: (i) physical models to reduce with exchange-correlation (XC) functionals. In practice, the the complexity of the full many-body solution while keeping reliability of DFT depends on the numerical approximations much of the important physics. The choice of a physical used for the XC terms that range from the simplest local model is often motivated by the objectives of the simulations; density approximation (LDA) or the generalized gradient (ii) discretization and mathematical models that transform approximation (GGA), to more advanced (hybrid) schemes physical equations into the language of linear algebra; and which are still the subject of active research efforts [5]–[8]. (iii) computing and numerical algorithms to solve the resulting Solutions of the DFT/Kohn-Sham problem are routinely used problems. In order to improve on current software imple- in the calculations of many ground-state properties including: mentation by fully capitalizing on modern HPC computing total energy and ionization potential, crystal-atomic structure, platforms, it is essential to revisit all the various stages of ionic forces, vibrational frequencies, and phonon bandstructure the electronic structure modeling process which are briefly via pertubation theory. summarized in the following. Although DFT cannot fundamentally provide information on excited-states and many-body properties, the Kohn-Sham A. Physical Modeling eigenvectors are often needed by more advanced techniques: A direct numerical treatment of a full many-body e.g. either Green’s function-based [9] (e.g. GW, Bethe- Schrodinger¨ equation leads to a deceptively simple linear Salpeter) or time-dependent density-based (i.e. TDDFT [10], eigenvalue problem, which is well known to be intractable [11]) approaches. The pros and cons of these approaches are because of its exponential growing dimension with the number discussed in Ref. [12]. TDDFT, proposed by Runge and Gross of particles. This limitation has historically motivated the need [13], continues to gain popularity as one of the most numer- for lower levels of sophistication in the description of the elec- ically affordable many-body techniques capable of providing tronic structure using a single electron picture approximation fairly accurate results. TDDFT has been successfully applied where the size of the Hamiltonian operator ends up scaling to calculate many physical observables of the time-dependent linearly with the number of electrons. First-principle electronic Hamiltonian, such as excitation energies and complex permit- structure calculations are usually performed within the single- tivities, as well as non-linear phenomena. It is often used to electron picture [1], [2] using either quantum chemistry (i.e. obtain the absorption spectra of complex molecular systems. post Hartree-Fock) methods or, as an alternative to wave While the design of advanced time-dependent XC functionals function based methods, Density Functional Theory (DFT) is still a challenging task [14], ALDA (Adiabatic LDA) for associated with the Kohn-Sham equations [3], [4]. Although TDDFT has been found to perform extremely well on a wide DFT does not allow for systematic accuracy as traditional variety of systems by capturing many nanoscopic effects (such quantum chemistry techniques would, it is the method of as plasmonic effects) which, in turn, can be quantitatively choice when dealing with moderate sized systems containing compared with the experimental data. more than a handful of atoms. DFT has been widely used in TDDFT calculations can be performed in frequency or real- 3 time domain. The real-time TDDFT technique is a relatively systems where the computational domain must be made recent approach introduced by Yabana and Bertsch in [15], much larger than the molecular size to ensure interac- [16], and it has become an important focus of the TDDFT tions due to periodicity are negligible. Additionally, they research activities. It has notably been integrated into the often make use of pseudopotentials to mimic the effects software packages Octopus [17], [18], NWChem [19], and of core electrons, which do not directly participate in GPAW [20] for the study of molecular systems. In essence, chemical bonding and would otherwise necessitate a very spectroscopic information can be obtained using the standard large number of plane waves due to their high-frequency formalism of dipole time-response from weak short-polarized variations. impulses in any given direction of the system, and which • LCAO benefits from a large collection of local basis sets requires all the occupied single electron wave functions to that has been improved and refined throughout the years be propagated (non-linearly) in time. The imaginary part of by the quantum chemistry community to obtain high- the dipole’s Fourier transform provides the dipole strength level of accuracy in simulations. However, LCAO bases function. The absorption spectrum is then obtained along may suffer from numerical truncation errors of finite with the expected “true many-body” excited energy levels. In expansions, and the solutions cannot be universally and contrast to the numerical models derived from the TDDFT systematically improved towards convergence. linear response theory in frequency domain [10], [21]–[23], • Real-space mesh techniques provide a natural way of the real-time TDDFT approach is better suited for achieving quantifying atomic information by employing universal linear parallel scalability and it can also address any form of local mathematical approximations. They can easily han- non-linear responses, including ion dynamics [24]. dle the treatment of various boundary conditions, such as Dirichlet (for the confined directions), periodic or absorbing (for transport simulations). Similarly to plane B. Mathematical Modeling and Discretization wave schemes, however, the high-level of refinement Although, first-principle calculations have provided a prac- needed to capture the core electrons may be problematic. tical (i.e. numerical tractable) path for solving the electronic In all cases, the level of approximation in the discretization structure problem, they have also introduced new numerical stage, is bounded by the capabilities of the numerical algo- challenges. Within the single electron picture, the resulting rithms for solving the resulting system matrices. In modern eigenvalue problem becomes fully non-linear since the Hamil- nanoelectronic applications, one aims at fully utilizing the tonian operator depends on all the occupied eigenfunctions power of modern HPC architectures to tackle large-scale finite (i.e. H({ψ})ψ = Eψ). In practice, this non-linear eigenvector systems by exploiting parallelism at multiple levels. In this problem is commonly addressed using direct minimization context, real-space mesh techniques offer the most significant schemes or self-consistent field methods (SCF) wherein a advantages. They produce very sparse matrices that can take series of linear eigenvalue problems (i.e. Hψ = Eψ), needs to advantage of recent advances made in O(N) linear scaling be solved iteratively until convergence. Computing the electron methods and domain decomposition techniques. density at a given iteration step becomes one of the most time-consuming and challenging part of the DFT electronic structure calculations. Successfully reaching convergence by performing SCF iterations is of paramount importance to C. Computing first-principle electronic structure calculations software. Real- Much of the progress in this field is directly tied to time TDDFT comes also with its own set of mathematical advancements in algorithmic research allowing larger and and numerical challenges for performing the time-propagation, more complex systems to be simulated. Within the SCF-DFT those will be discussed further in Section III. procedure, computing the electron density by solving the linear To perform the numerical calculations, the mathematical and symmetric eigenvalue problem at each iteration becomes models need first to be discretized by expanding the wave the major computational challenge. The characterization of functions over a set of basis functions. One can identify three complex systems and nanostructures of current technological main discretization techniques that have been widely used over interests, requires the repeated computations of many tens the past four decades by both the quantum chemistry and the of thousands of eigenvectors, for eigenvalue systems that solid-state physics communities [1]: (i) the linear combination can have sizes in the tens of millions. It is important to of atomic orbitals (LCAO) (along with the dominant use mention that Green’s function-based formalism can alterna- of Gaussian local basis sets), (ii) the plane wave expansion tively be used for computing directly the electron density scheme, and (iii) the real-space mesh techniques [25]–[37] (using efficient evaluations of the diagonal elements of the (also loosely called numerical grids) based on finite difference Green’s function along a complex contour, e.g. [38]–[40]). method (FDM), finite element method (FEM), spectral element However, this method gives rise to difficulties in algorithmic or wavelets methods. Each of these approaches have pros and complexity (i.e. O(N 2) for 3D systems), parallel scalability cons. and accuracy. In that regard, it is difficult to bypass the wave • Plane waves have traditionally been used within the solid- function formalism, and progress in large-scale electronic state physics community because their natural periodic structure calculations can then be tied together with advances nature can be easily applied to crystal structures. How- in numerical algorithms for addressing the eigenvalue problem, ever, this can be cumbersome when dealing with finite in particular. 4

Traditional methods for solving the eigenvalue problem computing platforms using state of the art parallel algo- (including Arnoldi, Lanczos methods, or other Davidson- rithms/solvers. Jacobi techniques [41], [42]) and related packages [43], are The next sections a step-by-step description of largely unable to cope with these challenges. In particular, NESSIE’s modeling framework applied to the benzene they suffer from the orthogonalization of a very large basis molecule as an example. when many eigenpairs are computed. In this case, a divide- and-conquer approach that can compute wanted eigenpairs by A. An All-Electron HPC Framework parts becomes mandatory, since ’windows’ or ’slices’ of the spectrum can be computed independently of one another and In NESSIE, the equations are discretized using FEM with orthogonalization between eigenvectors in different slices is no quadratic (P2) or cubic (P3) order, along with a muffin- longer necessary. These issues have motivated the development tin domain-decomposition (DD) technique. The latter has of a new family of eigensolver based on contour integration been proposed as early as the 1930’s [53] to specifically techniques [44]–[47] such as the FEAST eigensolver [48], address a multi-center atomic system. The whole simulation [49]. FEAST is an optimal accelerated subspace iterative domain is separated into multiple atom-centered regions (i.e. technique for computing interior eigenpairs making use of a ra- muffins) and one large interstitial region. Without any loss tional filter to approximate the spectral projector [50]. FEAST of generality, Figure 2 illustrates the essence of the muffin-tin can be applied for solving both standard and generalized forms domain decomposition using FEM and applied to the Benzene of the Hermitian or non-Hermitian problems [71]. Once a molecule. The 3D finite-element muffin-tin mesh can be built given search interval is selected, FEAST’s main computational in two steps: (i) a 3D atom-centered mesh which is highly task consists of solving a set of independent linear systems refined around the nucleus to capture the core states, and (ii) a along a complex contour. Not only does the FEAST algorithm much coarser 3D interstitial mesh that connects all the muffins feature some remarkable and robust convergence properties (generated in NESSIE using the Tetgen software [54], [55]). [50], [51], it can exploit natural parallelism at three different For the atom-centered mesh, which is common to all atoms levels (L1, L2 or L3): (L1) search intervals can be treated of the same atomic number, it is convenient to use successive separately (no overlap), (L2) linear systems can be solved layers of polyhedra similar to the ones proposed in [37]. This independently across the quadrature nodes of the complex discretization provides both tetrahedra of good quality, and an contour, and (L3) each complex linear system with multiple arbitrary level of refinement i.e. the distance between layers right-hand-sides can be solved in parallel. Parallel resources can be arbitrarily refined while approaching the nucleus (this can be placed at all three levels simultaneously in order to is known as a h-refinement for FEM). achieve scalability and optimal use of the computing platform. The muffin-tin decomposition can bring flexibility in the discretization step (different basis-sets can also be used inde- pendently to describe the different regions), reduce the main III.FIRST-PRINCIPLE CALCULATIONS USING NESSIE computational efforts within the interstitial region alone, and A first-principle simulation software must be capable of should also guaranteed maximum linear parallel scalability addressing all the modern challenges summarized in Figure 1. performances. It is important to note that, independently of One of the major goal in modern first-principle calculations the type of atoms, the outer layer of the muffin is consistently is to develop numerical algorithms and simulation software providing the same (relatively small) number of connectivity for electronic structure that can scale the system size to thou- nodes nj with the interstitial mesh at the muffin edges (i.e. sands of atoms, without resorting to additional approximations nj = 98, or 218 nodes respectively using quadratic P2 or beyond the DFT and TDDFT physical models. The target cubic P3 FEM). Consequently, the size of the system matrix computing architecture is usually comprised of thousands of in the interstitial region stays independent of the size of the processor cores and contains multiple hierarchical levels of atom-centered regions, and the approach can then ideally parallelism. with full potential (all-electron). The NESSIE project [52] is an electronic structure code Once the “Schrodinger”¨ eigenvalue problem (i.e. Hψ = that uses a real-space finite element (FEM) discretization and Eψ) is reformulated using domain decomposition strategies, domain decomposition to perform all-electron ground-state the resulting (and still exact) problem now takes a a non- DFT and real-time excited-state TDDFT calculations. The linear form in the interstitial region (i.e. HI (E)ψI = EψI , code is written to take advantage of multi-level parallelisms to since the boundary conditions at the interfaces with the muffins target systems containing many distributed-memory compute are energy dependent). As originally pointed out by Slater in nodes. Custom numerical algorithms have been developed for 1937 while introducing the muffin-tin augmented plane wave the eigenvalue problems and linear systems representing the (APW) method, this non-linear eigenvalue problem gives rise major linear algebra operations within the software. NESSIE’s to an energy dependent secular equation which cannot be capabilities can be separated into three main categories: (i) ac- handled by traditional eigenvalue algorithms. Although solving curate large-scale full core potential DFT calculations using such non-linear problem explicitly is not impossible [56], [57], real space FEM discretization and domain-decomposition; it remains practically challenging and it is still the subject (ii) TDDFT real-time propagation for efficient spectroscopic of active research efforts [58], [59]. Therefore, the main- calculations allowing the study of many-body effects; and stream approaches to all-electron (i.e. full-potential) electronic (iii) massively parallel implementation on modern high-end structure calculations in the solid-state physics community, 5

muffin to be factorized and solve independently. When the muffin-tin decomposition is applied to a given linear system, the resulting linear system in the interstitial domain (a.k.a., the Schur complement) remains linear. Figure 3 illustrates how the muffin-DD can be used to optimally solve the FEAST’s linear systems. The details of the muffin-DD strategy imple- mentation have been provided in Ref. [69]. In comparison with linearization techniques discussed above, the set of ‘pivot energies’ used to evaluate the interstitial Hamiltonian system are now explicitly provided by the FEAST algorithm (i.e. they correspond to quadrature nodes in the complex plane) and they guarantee global convergence toward the correct solutions (i.e. no approximation needed). Since the complexity of interstitial system scales linearly with the number of atoms while in- cluding non-locality only at the interfaces with the muffins, one can also demonstrate that this all-electron framework is (paradoxically) capable of better scalability performances than pseudopotential approaches on parallel architectures.

Fig. 2. Using a muffin-tin domain-decomposition method, the whole simula- tion domain is separated into multiple atom-centered regions (i.e. muffins) and one large interstitial region. The top figure represents a 2D section of local FEM discretization using a coarse interstitial mesh (represented only partially here) connecting all of the atoms of a Benzene molecule. The bottom figures represent a finer mesh for the atomic (muffin) regions suitable to capture the highly localized core states around the nuclei. have been mostly relying on approximations, such as direct Fig. 3. In NESSIE, the muffin-DD solver operates in three stages: (i) the atoms are distributed throughout the MPI processors and factorized at a given linearization techniques, which have been improved through- energy pivot (i.e. shift value provided by FEAST). The boundary conditions out the years (e.g. LAPW, LMTO, LAPW+lo, etc.) [60]– (i.e. self-energy) are then derived at the interfaces of the muffins. (ii) the [64]. Alternatively, linear eigenvalue problems can directly be resulting interstitial problem (Schur complement) is solved in parallel; and (iii) knowing the exact solution at the muffin interfaces, the solution within obtained from pseudopotential approximation techniques [65]– each atom is retrieved in parallel. [68] that eliminate the core states by introducing smooth but non-local potentials in muffin-like atom-centered regions. The FEAST solver has been recently undergone a significant In NESSIE, an exact strategy has been introduced for per- upgrade to support the MPI-MPI-MPI distributed parallel forming all-electron electronic structure calculations within a programming model in v4.0 [85] (where the last ’MPI’ refers parallel computing environment [69], [70]. The approach relies to the linear system solves at level L3). Furthermore, NESSIE on the shift-and-invert capability of eigenvalue algorithms such can take advantage of the first two MPI levels of parallelism as FEAST, which leads to formulating well-defined linear offered by FEAST assuming that the eigenvalue spectrum is systems.0 Domain decomposition methods have been well distributed among the compute nodes. At the second level studied and are a natural framework for addressing large sparse L2, FEAST can naturally distribute all the linear systems linear systems generated from real-space meshes. They are associated with a given search interval (typically less than 10). often associated with the use of distributed-memory numerical At the first level L1, FEAST enables ‘spectrum slicing’ where algorithms to address the data distribution using the Message all the intervals of interests are solved in parallel. The use of Passing Interface (MPI) paradigm. Consequently, the solution spectrum slicing is essential to address a major bottleneck in of the FEAST’s linear systems can be fully parallelized using large-scale DFT calculations concerning the computation and MPI since the muffin-tin decomposition naturally allows each storage of the DFT wave functions. The storage requirement, 6

k in particular, keeps increasing linearly with the number of output density ρout computed from the wave functions. With electrons in the nanostructures. Even with simplified physical simple mixing, the input electron density for the next iteration model such as DFT, it becomes particularly difficult to scale can be computed as, the electronic structure problem for systems containing more ρk+1 = (1 − β)ρk + βρk , than a few hundred electrons without the ability to perform in in out (2) spectrum slicing. This technique is illustrated for benzene in where the parameter β is usually chosen less than 1/2. This, Figure 4, using three elliptical contours for FEAST, one that however, will converge very slowly. In order to increase the computes the core states and two others that computes half convergence rate, more sophisticated methods have been de- of the valence states. In larger systems, each contour often veloped for solving this fixed-point problem. Newton methods contains hundreds of eigenvalues. cannot be used in electronic structure since it is impractical to construct the Jacobian matrix. Other quasi-Newton methods Core States Valence States 3 have been developed in the 1960s, notably by Anderson [72]

2 and Broyden [73], which do not require the Jacobian or

1 Hessian. These techniques were later refined in the 1980s in the context of SCF iteration by Pulay [74], [75] and have 0 since been expanded upon [76]–[79]. They are also related -1 Contour Full Contour to Krylov methods and GMRES [80]–[82]. For electronic

Imaginary Energy (eV) -2 Eigenvalue structure calculations these iterative techniques are usually -3 -268 -264 -260 -24 -20 -16 -12 -8 -4 0 referred to as Direct Inversion of the Iterative Subspace (DIIS) Real Energy (eV) Real Energy (eV) methods. The general idea is to build the input electron Fig. 4. Splitting the full search interval into three separate contours density as a linear combination of past densities. One can then for benzene. The lowest energy contour captures the core states while the construct the input mixing density, other two target valence electrons. Each half-contour has eight quadrature nodes shown as circles. Symmetry allows the FEAST algorithm to only k+1 0 1 k ρˆ = c0ρ + c1ρ + ··· + ckρ , (3) perform computations for the upper upper half of the contour (solving eight in in in in independent linear systems per interval). and the output mixing density,

k+1 0 1 k ρˆout = c0ρout + c1ρout + ··· + ckρout, (4) B. DFT Ground-State Calculations and Scalability from the previous input and output densities. The input for the The DFT/Kohn-Sham problem can be expressed as: next iteration is chosen as a linear combination of the mixing densities:  2  ¯h ρk+1 = (1 − β)ˆρk+1 + βρˆk+1, − ∇2 + v [n](r) ψ = E ψ , in in out (5) 2m KS i i i where β is again referred to as the mixing parameter. This Ne X 2 approach can be truncated in order to keep the density sub- with n(r) = 2 |ψi(r)| , (1) space size small. The inclusion of more densities in the mixing i=1 subspace can result in better convergence, but has diminishing where the Kohn-Sham potential, vKS[n] = vH [n] + vXC [n] + returns. Keeping a history of ten to twenty input and output vext, is composed of the Hartree potential vH solution of densities seems to be more than sufficient. Better performance the Poisson equation, the XC potential, and other external can be obtained by choosing a larger value of β, but it may also potential vext including the ionic core potential. The Ne lowest result in instability. However, improvement in convergence can occupied electronic states {ψ } are needed to com- i (i=1,...,Ne) be obtained by progressively increasing the β parameter along pute the electron density (the factor 2 stands for the electron the SCF iterations. spin). Formally, the system (1) forms a non-linear eigenvector The coefficients {c0, ..., ck} of (3) and (4) are the same problem which is commonly addressed using a self-consistent for both the input and output mixing subspaces. They are field method (SCF) wherein a series of linear eigenvalue computed by solving a k × k linear system with one right- problems need to be solved iteratively until convergence. The hand-side, naive approach which consists of updating the input electron M c = r , (6) density at each SCF iteration directly from the output electron k×k k×1 k×1 density, results in large oscillations between SCF iterations. where the ith element of r depends on the difference between This approach is very unlikely to converge as the initial the output and input densities of the current iteration k and a guess for the density is usually far from the ground-state previous iteration i, solution. Instead, traditional SCF methods employ successive Z k  k i  approximation iterates of a fixed point mapping to generate ri = ρout,in × ρout,in − ρout,in dΩ, (7) the new input electron (a.k.a., ’mixing’ techniques). Ω p p p 1) Discussions on Convergence: Using a mixing technique, with ρout,in = (ρout − ρin), and each element Mij of matrix two different electron densities are considered to construct the M takes into account the densities at iterations i and j: input density at the (k + 1)th SCF iteration: the input density Z  k i  h k j i ρk used to construct the Kohn Sham Hamiltonian and the Mij = ρout,in − ρout,in × ρout,in − ρout,in dΩ. (8) in Ω 7

The effect of beta mixing ratio β and the number of density Method/Energy(eV) E1 EHOMO Etot mixing subspaces kept in memory can bee seen in Figure 5 NWChem 6-311g* [19] −266.35 −6.40 −6262.27 NWChem cc-pvqz [19] −266.41 −6.52 −6263.65 while analyzing the convergence of benzene. P2-FEM [37] −264.66 −6.54 −6226.57 P3-FEM [37] −266.38 −6.53 −6262.57 Simple Mixing Anderson Mixing Anderson Mixing with Beta=0.1 P4-FEM [37] −266.44 −6.53 −6263.78 100 40 50 90 35 FHI-AIMS [37], [83] −266.44 −6.53 −6263.83 80 40 30 NESSIE P2-FEM −266.39 −7.04 −6244.45 70 60 25 30 NESSIE P3-FEM −266.49 −6.55 −6263.41 50 20 20 40 15

# SCF Iteration TABLE I # SCF Iteration # SCF Iteration 30 10 20 10 DFT ENERGYRESULTSFORBENZENE (INCLUDING THE FIRST CORE 10 5 EIGENVALUE, THE HOMO LEVEL AND TOTAL ENERGY) OBTAINED USING 0 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0 0.2 0.4 0.6 0.8 1 0 4 8 12 16 20 24 28 32 DIFFERENT ALL-ELECTRON FIRST-PRINCIPLESOFTWAREANDBASIS Beta Beta # Subspaces FUNCTIONS, BUTTHESAME LDA APROACH [84] TOMODELTHE XC TERM. Fig. 5. The effect of SCF parameters on the convergence of benzene. The convergence criteria is satisfied when the relative error is below 10−10 for the total energy. For values of β = 0.1 and β > 0.5 the simple mixing (left) does not converge. With Anderson mixing (middle and right) the total energy converges consistently much faster, and only a few mixing subspaces can be spectrum slicing approach presented in Figures 3 and 4. For kept in memory. example, using only two search intervals (one for the core and one for the valence states, i.e. L1=2 in FEAST), eight contour In NESSIE, the mixing scheme can take advantage of one points per interval (so eight linear systems in total, i.e. L2=8 important feature of the FEAST eigensolver. As the density in FEAST), a molecule like benzene with twelve atoms (i.e. begins to converge, the previous eigenvector subspace solution L3=12 in FEAST) can effectively scale up to 2×8×12 = 192 can be used as a very good initial guess for solving the current MPI processes on HPC platforms. eigenvalue problem. Since the eigenvalue convergence criteria In general, the three levels of parallelism of FEAST can is set to only slightly exceed the current SCF convergence, work together to minimize time spent in all stages of the FEAST must only perform a single subspace iteration on algorithm. The third level L3 can be used to reduce the average and, with parallelism, solve a single linear system per memory per node and to decrease the solution time of both diagonalization. the linear system factorization and solve. The second level L2 The other major numerical operation in ground-state DFT, has close to ideal scaling, and, if fully utilized, can reduce is the computation of the Hartree potential - the potential the algorithmic complexity to solving a single complex linear corresponding to a classical charge distribution - through system per FEAST iteration. The first level L1, in turn, allows the solution to the Poisson equation. Once the Dirichlet for the computation of a very large number of eigenvalues boundary conditions have been determined at the edge the of by subdividing the full search interval. The simulation results simulation domain, this amounts to solving a real symmetric- obtained in Ref. [85] have demonstrated that the NESSIE’s positive-definite linear system with one right-hand-side. The muffin-tin approach associated with the FEAST eigensolver, Poisson equation gives then rise to a much less expensive is ideally suited for achieving both strong and weak scalability linear systems than the ones obtained with the eigenvalue on high-end HPC platforms. Some results on weak scalability computation. The computation of the boundary conditions for (i.e. the number of MPI processes increases proportionally Poisson uses the integral form of the Poisson equation and with the number of atoms) are reported in Figure 6. These 2 scales as O(Ns /p) where Ns is the number of surface nodes results outline, in particular, the efficiency of the muffin-tin (which stays relatively small using a coarse FEM mesh far DD solver presented in Figure 3 in comparison with other from the atomistic region), and p is the total number of MPI ‘black-box’ sparse parallel direct solvers. processes. Selected DFT/LDA ground-state simulation results for ben- zene obtained using NESSIE and other all-electron first- C. Real-time TDDFT Excited States Calculations principle software, are reported in Table I. NESSIE results In TDDFT theory, all the occupied Ne ground-state wave are in excellent agreement with other approaches. In addition, functions Ψ = {ψ1, ψ2, . . . , ψNe } solutions of the Kohn- a real-space mesh discretization such as FEM can easily Sham system for the ground-state problem (1), are used as be refined either by adding more local mesh nodes or by initial conditions for solving a time-dependent Schrodinger-¨ increasing the accuracy of the basis functions. Consequently, type equation (∀i = 1,...,Ne): the numerical solutions can systematically converge toward the  2  exact solutions at the level of the physical model (LDA in the ∂ ¯h 2 ı¯h ψi(r, t) = − ∇ + vKS[n](r, t) ψi(r, t), example). ∂t 2m 2) Discussions on Scalability: In Table I, the NESSIE FEM Ne X 2 discretization leads to system matrices of size 2, 454 for the with n(r, t) = 2 |ψi(r, t)| , (9) atom-centered mesh (i.e. a single muffin) using P2, or 8, 155 i=1 using P3. The interstitial size matrix varies from 20, 653 for where the electron density of the interacting system can then P2 to 69, 305 for P3. These system matrices can effectively be obtained at any given time from the time-dependent Kohn- be handled in parallel using the muffin-tin decomposition and Sham wave functions. In principle, TDDFT can be used to 8

#MPI 9 17 25 33 41 This represents the starting point for splitting methods #Atoms 54 102 150 198 246 [90], Magnus expansion [91] or other spectral decompo- System size 211K 392K 575K 758K 942K sitions [92], [93]. If ∆t is small enough, one can usually assume that H(t + ∆t/2) ' H(t). Alternatively, corrector-predictor schemes can be used to evaluate the Hamiltonian at (t + ∆t/2). The non-linear nature of the time propagation arises from the Kohn-Sham VKS potential that needs to be reevaluated at each time step to form the new Hamiltonian. The Hartree potential VH and exchange and correlation potential VXC are both functionals of the electron density. The time dependent Hartree potential is solution of the Poisson equation, while the XC term must be computed with time instantaneous electron density (using the adiabatic approximation) and the same Fig. 6. L3 weak scaling of factorization and solve stages for a single approximation used in ground-state calculations (such as LDA, FEAST iteration using 16 contour points (L2=1 here) and 600 right-hand- GGA, etc.). sides (i.e. search interval with up to 600 states). The matrix size is increased The CN scheme for TDDFT is particularly effective within proportionally with the number of MPI processes (using 12 cores per MPI of Haswell E5-2680v3 – so 492 cores in total for the 246 atom systems). The the NESSIE framework, since the linear system solves along results show that the muffin-DD solver in Figure 3 outperforms both MKL- the time steps in (11) can take advantage of the highly scalable Cluster-Pardiso [86] and MUMPs solvers [87]. Overall timings can be easily muffin-tin real-space domain decomposition solver shown in reduced by a factor 16 using L2=16 MPI parallelism (using 7872 cores in total). Finally, timings can further improve by increasing the number of MPI Figure 3. As a result, the approach can be parallelized at two processes for a fixed atom size system (strong scalability). different MPI levels: (i) by propagating independent chunks of the occupied wavefunctions along ∆t; and (ii) by solving the linear system in parallel. calculate any time dependent observable as a functional of the Spectral-based schemes for the integral-approach (12) are electron density. In (9), the Kohn-Sham potential becomes a known to be robust and accurate permitting larger time steps functional of the time-dependent density: than other usual integration schemes. They are, however, rarely used in practice for large-scale simulations since a direct v (n(r, t)) = v (r, t) + v (n(r, t)) + v (n(r, t)), KS ext H XC diagonalization of the evolution operator (12) would require where it is common practice to consider a local dependency solving hundreds to thousands eigenvalue problems along the time-domain (i.e. one large-scale eigenvalue problem per on time for the XC potential term vXC (a.k.a., the adiabatic approximation). time step). In addition to CN, NESSIE includes an efficient 1) Real-Time Propagation: Assuming a constant time step spectral-based approach which relies on the efficiency of the FEAST eigensolver [92]. First, good approximations of the ∆t, the integral form of (9) introduces the time-ordered exponential in (12) can be obtained with a partial spectral evolution operator Uˆ(t + ∆t, t), such that: decomposition using an eigenvector subspace four to five times

Ψ(t + ∆t) = Uˆ(t + ∆t, t)Ψ(t), the number of propagated states ψj. Since the latter are low- ( ) energy states, this truncated spectral basis is typically sufficient ı Z t+∆t with Uˆ(t + ∆t, t) = T exp − dτH(τ) . to accurately expand the solutions. Second, FEAST can reuse ¯h t the eigenvector subspace computed in the current time-step (10) as very good initial guess for the next one. As a result, only one or two subspace iterations are usually sufficient to obtain There exists a large number of efficient numerical methods convergence. While the linear systems arising in CN need to for solving this real-time propagation problem [88] which can be solved one after another along small time-intervals, a par- broadly be classified into two categories: allel FEAST implementation permits the solution of a single (i) PDE-based such as the Crank-Nicolson (CN) scheme [89] linear system by larger time intervals. Although the FEAST where linear system is notably more computationally demanding i.e. −1 including a lot of extended states, linear parallel scalability can  i  Uˆ(t + ∆t, t) = 1 + ∆tH(t + ∆t/2) × still be naturally achieved using multiple search intervals and 2 more parallel computing power. It is worth mentioning that the   i same strategy would not be possible with the techniques used 1 − ∆tH(t + ∆t/2) . (11) 2 in TDDFT linear response theory in frequency domain (e.g. This is an implicit scheme that requires solving a linear using Casida equation [10]), where the demand in extended system at each-time step. states is even higher and represents the bottleneck of their (ii) Integral-based that acts directly on the evolution operator, cubic arithmetic complexity. Consequently, if one can keep such the mid-point exponential rule i.e. up with the demand in parallel computing power, direct diagonalizations for the real-time TDDFT formalism using Uˆ(t + ∆t, t) = exp{−ı∆tH(t + ∆t/2)}. (12) FEAST become a viable high-performance alternative to other 9 schemes, potentially capable of both higher-scalability and Finally, the imaginary part of the dynamic polarizability better accuracy for obtaining linear and non-linear responses. provides the photo-absorption cross section [96], which is 2) Dipole-time Response: In principle, all time dependent related the probability that a photon passing through the observables are functionals of the density, however, in practice atomistic system is absorbed. A measure of the strength of the functional form is rarely known. A very useful case of this interaction (a.k.a., the oscillator strength) can be computed TDDFT is related to spectroscopy, where the absorption and from the polarizability as follows [97], [98]: emission spectrum corresponding to electronic excitations can 4πω σ(ω) = =(α(ω)). (17) be derived directly from the induced dipole moment d(t). The c latter is related to the response of the system to an applied Once the oscillator strength ω is plotted in function of the electric field E: frequency ω (or equivalently the absorption energy), it pro- Z t vides the absorption spectrum of the system. As an example, d(t) = α(t − t0)E(t0)dt0, (13) −∞ Figure 7 shows the variations of the induced dipole moment for benzene obtained after three distinct impulse excitations where α is defined as the dynamic polarizability. In general, polarized in the x, y and z directions, as well as the corre- E and d are vectors quantities with x, y and z components sponding three absorption spectra. and α is a tensor. In computational spectroscopy, one often 3) Resonances and response density: The peaks in the considers the response of the system in a given direction µ absorption spectrum correspond to specific quantum many- (e.g. x, y or z) associated with an excitation polarized in the body excitations (such as plasmon, band-band, etc.). The same direction. The induced dipole moment in (13) can be electron dynamics for a specific peak can be investigated calculated as a measure of how far the electron density n(r, t) further by computing and then visualizing the response density has moved away from its ground-state value n0 along a given in 4D. Such simulations aim at providing more details on the direction µ [17], [94], [95]: electron dynamics of the particular resonances with relevant Z information about their nature. The response density δn(ω, r) dµ(t) = − (µ − r0)(n(r, t) − n0(r)) dr, (14) Ω is the change in electron density due to an excitation at frequency ω (i.e. charge oscillations at ω). One possible way where r stands for the molecular center of mass. As a result, 0 to visualize δn(ω, r) is by applying a sinusoidal excitation at for an isotropic material, it is possible to compute the dynamic a given frequency of interest, waiting for the induced dipole polarizability by inverting (13). In frequency domain (after moment to reach a steady state where it oscillates at ω, and Fourier transform), the expression becomes: plotting the 4D data when the dipole reaches a maximum and d(ω) a minimum [99]. Another more efficient approach consists α(ω) = . (15) E(ω) of computing δn(ω, r) directly following the same procedure used for deriving the dipole moment in (14) and (16), which In practice, it is necessary to introduce artificial damping leads to: signal into the computed dipole moment before taking its Z T Fourier transform: −γt iωt δn(r, ω) = (n(r, t) − n0(r)) e )e dt. (18) Z T 0 −γt iωt d(ω) = d(t) × e e dt, (16) In practice, there is no need to store all the n(r, t) functions 0 that would be too prohibitive. Once the peaks/resonances of where γ is a damping coefficient. This damping term is interests (i.e. {ωj}) have been identified in the absorption used to mimic the system relaxation effect since the TDDFT spectrum (for a given polarized excitation), one can proceed by simulations presented here do not explicitly account for energy running a new time-dependent TDDFT calculation to compute dissipation (i.e. a system would physically emit energy as the response density δn(r, ωj) (all frequency {ωj} at once) photons and relax back to the ground-state after being excited). using an on-the-fly Fourier transform of the time varying In time-dependent simulations, any external electric field electron density. Results from this approach are shown in E(t) may be considered. It is common practice, however, Figure 8 for few selected peaks in the absorption spectrum of to either use a step potential u(t) or an impulse excitation benzene when the molecule is excited with an impulse electric δ(t) both along a given direction [16]. Table II presents the field polarized in the z direction (as shown in Figure 7). resulting expressions for the dynamic polarizability α(ω) after 4) Discussion on Accuracy and Reliability: The direction Fourier transforms of these particular electric fields. independent absorption spectrum which is computed as the average of the spectra in x, y and z directions, can be directly E(t) α(ω) and quantitatively compared with the experimental data, if

E0 × δ(t) −d(ω)/E0 available. Figure 9 compares the experimental absorption spectra of various molecules with the NESSIE’s first-principle E0 × u(t) −iω × d(ω)/E0 simulation results (obtained at T=0K). In general, the TDDFT TABLE II simulation results compare remarkably well with experimental POLARIZABILITYFOREXCITATIONSINTHEFORMOFANIMPULSE POTENTIALANDASTEPPOTENTIAL (E0 STANDSFORTHEAMPLITUDEOF data for a large number of atomistic systems. THE ELECTRIC FIELD). While the choice of the XC functional can significantly impact the reliability of the DFT ground-state results, a simple 10

Fig. 7. The left plots represent the relative orientation of the benzene molecule if the applied electric field was polarized from left to right. The middle plots show the induced dipole moment for benzene after an impulse excitation polarized in the x (top), y (middle) and z (bottom) direction (time steps of 5as are used). The corresponding absorption spectra are shown in the right plots.

basis and more advanced XC term (B3LYP) (except for the CO molecule).

H2 CH4 CO C6H6 NWChem-ccpvtz/LDA 12.31 10.29 8.28 7.10 NWChem-ccpvtz/B3LYP 12.90 10.75 8.55 7.18 Fig. 8. 4D isosurface plots of the response electron density for four different NESSIE-P2/LDA 11.19 9.40 8.40 6.90 peaks chosen from the absorption spectrum of z-polarized excited benzene Experiment 11.19 9.70 8.55 6.90 (electric field going from left to right). From left to right, four frequency ω values are represented: 6.91eV, 10.04eV, 13.61eV, and 16.20eV. TABLE III COMPARISON OF REAL-TIME TDDFTNESSIE AND NWCHEM SIMULATION RESULTS, WITH THE EXPERIMENTAL DATA FOR THE LOWEST EXCITATION ENERGIES (INEV), AS REPORTED IN [114]. adiabatic LDA (ALDA) approximation for TDDFT appears to be sufficient for a wide variety of systems. Using NESSIE, it is also interesting to note that the choice of P2 vs P3 FEM basis functions, does have only a minimal impact on 5) Discussion on large-scale real-time TDDFT simulations: the accuracy of the absorption spectrum [70]. This is clearly The computing challenges end up being very similar between not the case for DFT ground-state calculations as reported in ground-state DFT and excited-state real-time TDDFT calcula- Table I, where good accuracy would require an appropriate tions. There are only two main operations to consider: (i) solv- level of refinement for FEM (using at least cubic P3 FEM). ing a Hamiltonian linear system with multiple right hand sides As a result, the real-time TDDFT framework appears to be using in particular the muffin-tin technique in Figure 3; and resilient to some approximations (such as the choice of XC (ii) solving a linear system for the Poisson equation (using term, or basis functions) as long as they are not too far off local XC). For DFT, these two steps have to be repeated self- and stay consistent throughout the time propagation. consistently until convergence, while for TDDFT, they need to Beside offering the opportunity to perform X-Ray spec- be repeated at each time step of the time-propagation (using troscopy, there are many other advantages for considering either a Crank-Nicolson or a spectral decomposition scheme). a full real-space all-electron treatment in simulations. In In comparison to other approaches, the muffin-tin solver has contrast to other approaches, a full-core real-space potential demonstrated great efficiency to achieve strong and weak offers numerical consistency while performing simulations in scalability [85] (see Figure 6). The scalability bottleneck of time-domain. Transferability issues are indeed likely happen the muffin-tin solver would eventually come from solving the with the use of pseudopotentials that are generated for time- interstitial Hamiltonian system in parallel using MPI (Schur independent calculations, or in turn, with the use of LCAO complement in step 2 of Figure 3). In practice, our strong and basis which cannot offer the same reliability to capture both weak scalability results show that up to one thousand atoms confined and extended states (although LCAO bases can be (corresponding to ∼ 1M size interstitial matrix), this system “augmented” in time-dependent simulations). Comparisons can be efficiently solved using a standard direct ’black-box’ between NESSIE and LCAO approaches are reported in parallel sparse system solver (such as cluster MKL-PARDISO Table III. Although NESSIE is using both a low level of [86]). Reaching the milestone of ten thousand atoms and real-space approximation (P2-FEM) and a rather simple XC beyond, is still the subject of active research efforts that term (LDA), the results compare relatively well with the investigate new directions in numerical linear algebra such experimental data. These results are actually much better than as the use of hybrid parallel solvers with customized low- the ones obtained with the NWChem software using the ccpvtz communication preconditioners. 11

Fig. 9. Comparison between the computed absorption spectrum for various molecules using NESSIE (with ALDA and P2 FEM) and multiple set of experimental values for H2 (Exp-A [100], B [101] C [102] and D [103]), CO [104], H2O (Exp-A [105], B [106] C [107] and D [108]), CH4 (Exp-A [109], and B [110]), SiH4 [111], and C6H6 (Exp-A [112], and B [113])

IV. NANOPLASMONIC APPLICATIONS Since the reported preliminary work on short CNT [99] Nanoplasmonics is a field that has grown rapidly in the last (about 5 unit-cells), NESSIE has been upgraded to simulate few years [115], [116], and it already offers numerous appli- large-scale atomistic systems by taking advantage of new high cations to electronics and photonics. In the visible and near performance computing techniques such as the muffin-DD IR range, nano-antennas [117] and nanoparticles have provided solver presented and discussed in Figures 2 and 3. This all- drastically enhanced coupling to electromagnetic waves [118], electron real-space and real-time TDDFT framework is now [119]. A 2008 comment in Nature Nanotechnology [117] capable to simulate very large structures (up to tens of unit stated “molecular components promise to revolutionize the cells for CNT – from hundred atoms to few thousands), and electronics industry, but the vision of devices built from quan- lead to more relevant predicted data of the plasmonic effects tum wires and other nanostructures remains beyond present for 1D systems. As an example, Table IV summarizes the main day technology. Making such devices will require an extremely NESSIE parameters and simulation results while considering detailed knowledge of the properties of these components such increasingly longer (3,3)-CNTs. These results show that the as the dynamics of charge carriers, electron spins and various position of the plasmon excitation peak (lowest excitation excitations, on nanometer length scales and subpicosecond energy/frequency i.e. Ep, fp) keeps shifting with longer CNTs. timescales at very high (up to THz) frequencies”. Reference The corresponding absorption spectra are provided in Fig- [117] points out that single-wall carbon nanotube (SWCNT) ure 10. resonators would constitute a unique THz ultra-compact circuit 5-CNT 10-CNT 20-CNT 40-CNT element, which might for example be used to control a THz Length (nm) 1.26 2.49 4.99 9.99 oscillator source. Similar opportunities exist in the IR and #Atoms 78 138 258 498 408 768 1488 2928 visible ranges. One concludes that discovery of new plasmonic System #Electrons materials is mandatory for the future expansion of the field of size muffin 2065 2065 2065 2065 size inters. 149499 126431 233696 446559 nanoplasmonics. Mesh size total 302925 397877 741182 1426125 In Ref. [99], NESSIE has been used to provide evidence E1 (eV) −267.73 −268.05 −268.07 −268.28 of the plasmon resonances (collective electron excitations) EHOMO (eV) −4.996 −4.823 −5.000 −4.955 DFT Etot (eV) −67713 −128825 −251351 −496359 in a number of representative short 1D finite carbon-based ∆t (fs) 0.01 0.01 0.01 0.01 nanostructures using real-time TDDFT simulations. The sim- Total T (fs) 15 30 45 65

RT-CN #Time-steps 1500 3000 4500 6500 ulated systems ranged from small molecules such as C2H2 Ep (eV) 1.62 1.24 0.82 0.525 to various carbon nanostructures that are equivalent to 1D fp (THz) 394.13 299.83 198.27 126.94 6 conductors with finite lengths, including: carbon chains, nar- vp (10 m/s) 0.98 1.49 1.98 2.54 TDDFT row armchair and zigzag graphene nanoribbons (i.e. acenes TABLE IV and PPP), and short carbon nanotubes (CNT). NESSIE all- NESSIE’S MAIN PARAMETERS AND SIMULATION RESULTS OBTAINED electron TDDFT/ALDA’s model was able to accurately capture USINGTHEALL-ELECTRON/DFT/TDDFT/ALDA FRAMEWORKAPPLIED the bright components of the spectra which account for the TO INCREASINGLY LONGER (3,3)-CNTS.THEREAL-SPACE MESH USES A P2-FEM DISCRETIZATIONWHILETHEREAL-TIMEAPPROACHUSESA CN plasmonic excitation. The chief signature of 1D plasmons is (RT-CN) PROPAGATION SCHEME.THESIMULATIONSWEREEXECUTEDON a high-frequency excitation that is inversely proportional to XSEDE-COMET [125]. the length of the conductor. In particular, it was shown that metallic 1D CNTs can be well described with the Tomonaga- Luttinger theory [99], [120]–[124]. The plasmon velocity is In Table IV, the plasmon velocity is obtained with the expected to reach an asymptotic value (up to 3 to 5 times reasonable assumption that the plasmon (collective electron the single particle Fermi velocity) when the simulations are cloud) must travel back and forth the full length L of the extended to tens of unit cells, such as very long CNT that nanotube to complete a single oscillation (i.e. vp = 2Lfp). become relevant for THz spectroscopy. This is further supported by the 4-dimensional isosurface plots 12

modeling approach is tailored to optimally take advantage of the full capability of the state-of-the-art FEAST eigensolver that can achieve significant parallel scalability on modern HPC architectures. With the success in meeting these challenges, NESSIE is currently able to extend the first-principle simula- tions to very large atomistic structures (i.e. many thousands electrons at the level of all-electron/DFT/TDDFT/ALDA the- ory). The modeling framework opens new perspectives for ad- dressing the numerical challenges in TDDFT excited-state cal- culations to operate the full range of electronic spectroscopy, and study the nanoscopic many-body effects in arbitrary com- plex molecules and finite-size large-scale nanostructures. It is expected that the NESSIE software and associated numerical components can become a new valuable new tool for the scientific community, to investigate the fundamental electronic properties of numerous nanostructured materials.

Fig. 10. Computed absorption spectra for three (3,3)-CNTS presented in ACKNOWLEDGMENT Table IV (10, 20 and 40 units cells). The selected energy range outlines the lowest excitation peak which keeps shifting left (red-shift) with longer This work was also supported by the National Science Foun- nanostructures. The corresponding 4D isosurface plots of the response electron dation, under grants #CCF-1510010 and SI2-SSE#1739423. density (on top) provide visual information about the dynamic of these plasmonic excitations. The CNT calculations used the Extreme Science and Engi- neering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562. of the response density δn(ω, r) (18) in Figure 10, which have been calculated for the specific plasmon resonances. The REFERENCES plasmon velocity increases to 2.54 times the Fermi velocity (here at 106 m/s) for the 40 unit cells (3,3)-CNT, and it is [1] R. M. Martin, Electronic Structure: Basic Theory and Practical Methods, expected to level off if we keep increasing the length of the Cambridge University Press, (2004). [2] J. Kohanoff, Electronic Structure Calculations for Solids and Molecules: CNT. Theory and Computational Methods, Cambridge University Press (2006). [3] P. Hohenberg and W. Kohn, Inhomogeneous Electron Gas, Phys. Rev. V. CONCLUSION 136, B864 - B871 (1964). [4] W. Kohn and L. J. Sham, Self-Consistent Equations Including Exchange Modern first-principle calculations aim at bringing compu- and Correlation Effects, Phys. Rev. 140, A1133 - A1138 (1965). tational activities up to the level where they can significantly [5] K. Burke, Perspective on density functional theory, J. Chem. Phys. 136, 150901, (2012). impact innovations in electronic nanomaterials and devices [6] L. J. Sham, Theoretical and Computational Development Some efforts research. Nanostructures with many atoms and electrons can beyond the local density approximation, International Journal of Quantum only be treated by addressing the efficiency and scalability Chemistry Vol. 56, 4 , pp 345-350 (2004). [7] V. I. Anisimov, Strong Coulomb Correlations in Electronic Structure of algorithms on modern computing platforms with multiple Calculations: Beyond the Local Density Approximation, Gordon and hierarchical levels of parallelism. These goals can be achieved Breach Science Publisher, (2000). using an efficient modeling framework that can perform real- [8] E. M. Stoudenmire, L. O. Wagner, S. R. White, and K. Burke, One- dimensional Continuum Electronic Structure with the Density Matrix space DFT ground-state calculations, and real-time TDDFT Renormalization Group and Its Implications For Density Functional excited-state calculations. The latter can be used to study var- Theory, Phys. Rev. Lett. 109, I5, 056402 (2012). ious relevant quantum many-body effects (such as plasmonic [9] W. G. Aulbur, L. Jonsson,¨ J. W. Wilkins, Quasiparticle Calculations in effects) by performing electronic spectroscopy. Spectroscopic Solids. Solid State Physics 54: 1218, (2000). [10] C. A. Ullrich, Time-Dependent Density-Functional Theory: Concepts techniques are among the most fundamental probes of matter: and Applications. Oxford University Press, 2012. incoming radiation perturbs the sample and the response to this [11] M. A. L. Marques, N. T. Maitra, F. M. S. Nogueira, E. K. U. Gross, perturbation is measured. The system is inherently excited in A. Rubio Fundamentals of Time-Dependent Density Functional Theory, Lecture Notes in Physics vol. 837, Springer-Verlag, Berlin Heidelberg this process and hence, a calculation of ground-state properties (2012) is insufficient to interpret the response of the system. TDDFT [12] G. Onida, L. Reining, A. Rubio, Electronic excitations: density- has had considerable success modeling the interaction of functional versus many-body Greens-function approaches , Rev. Mod. Phys. 74, pp 601-658 (2002). electromagnetic fields with matter, and obtaining spectroscopic [13] E. Runge, E. K. U. Gross, Density-Functional Theory for Time- information with absorption and emission spectra. Dependent Systems, Phys. Rev. Let., 52, pp997-1000, (1984). The paper discussed the NESSIE software which has been [14] K. Burke, J. Werschnik, E. K. U. Gross, Time-dependent density func- tional theory: Past, present, and future, J. Chem. Phys. 123, 062206 fundamentally designed to take advantage of parallel opti- (2005) mization at various levels of the entire modeling process. [15] K. Yabana, G. F. Bertsch, Application of the time-dependent local density NESSIE benefits from the linear scaling capabilities of real- approximation to optical activity, Phys. Rev. A60 1271 (1999). [16] K. Yabana, T. Nakatsukasa, J.-I. Iwata, and G. F. Bertsch, Real-time, space mesh techniques and domain decomposition methods real-space implementation of the linear response time-dependent density- to perform all-electron (full core potential) calculations. The functional theory, Phys. Stat. Sol. (b) 243, No. 5 , pp11211138 (2006). 13

[17] X. Andrade, J. Alberdi-Rodriguez, D. A. Strubbe, M. J. T. Oliveira, [46] A. Imakura, L. Du, and T. Sakurai, A block Arnoldi-type contour integral F. Nogueira, A. Castro, J. Muguerza, A. Arruabarrena, S. G. Louie, spectral projection method for solving generalized eigenvalue problems, A. Aspuru-Guzik, A. Rubio, and M. A. L. Marques,Time-dependent Applied Mathematics Letters, 32 (2014), pp. 22–27. density-functional theory in massively parallel computer architectures: [47] A. P. Austin and L. N. Trefethen, Computing eigenvalues of real the octopus project, J. Phys.: Cond. Matt. 24 233202 (2012). symmetric matrices with rational filters in real arithmetic, SIAM Journal [18] Octopus http://www.tddft.org/programs/octopus/ on Scientific Computing, 37 (2015), pp. A1365–A1387. [19] NWChem http://www.nwchem-sw.org/index.php/Main Page [48] E. Polizzi Density-matrix-based algorithm for solving eigenvalue prob- [20] GPAW https://wiki.fysik.dtu.dk/gpaw/ lems, Phys. Rev. B, 79, p115112, (2009). [21] M.E. Casida, Time-dependent density-functional response theory for [49] The FEAST eigenvalue solver, http://www.feast-solver.org molecules, in Recent Advances in Density Functional Methods, Part I, [50] P. Tang, E. Polizzi, FEAST as a Subspace Iteration EigenSolver edited by D.P. Chong (Singapore, World Scientific, 1995), p 155. Accelerated by Approximate Spectral Projection, SIAM Journal on Matrix [22] E. Lorin de la Grandmaison, S. B. Gowda, Y. Saad, M. L. Tiago, and Analysis and Applications, (SIMAX), 35, 354 (2014). J. R. Chelikowsky. Efficient computation of the coupling matrix in time- [51] S. Guttel,¨ E. Polizzi, P. T. Tang, G. Viaud, Optimized Quadrature Rules dependent density functional theory. Computer Physics Communications, and Load Balancing for the FEAST Eigenvalue Solver, SIAM Journal on 167:7–22, (2005). Scientific Computing (SISC), 37 (4), pp2100-2122 (2015) [23] W. R. Burdick, Y. Saad, L. Kronik, Manish Jain, and James Che- [52] NESSIE Software http://www.nessie-code.org likowsky.Parallel implementations of time-dependent density functional [53] J. C. Slater, Wave Functions in a Periodic Potential, Phys. Rev., 51, theory. Computer Physics Communications, 156:22–42, (2003). pp846-851, (1937). [24] S. Meng, E. Kaxiras, Real-time, local basis-set implementation of time- [54] H. Si, TetGen, a Delaunay-Based Quality Tetrahedral Mesh Generator, dependent density functional theory for excited state dynamics simula- ACM Trans. Math. Softw. 41, 2, Art. 11 (2015). tions, J. of Chem. Phys. 129, 054110 (2008). [55] TetGen http://wias-berlin.de/software/tetgen [25] T. Beck, Real-space mesh techniques in density-functional theory, Rev. [56] B. N. Harmon and D. D. Koelling, Technique for rapid solution of the of Modern Phys., 72, 1041, (2000). APW secular equation, J. Phys. C: Solid-state Phys., 7, p210, (1974). [26] P. F. Batcho, Computational method for general multicenter electronic [57] E. Sjostedt¨ L. Nordstrom,¨ Efficient solution of the non-linear augmented structure calculations, Phys. Rev. B, 61, 7169, (2000). plane wave secular equation, J. Phys.: Condens. Matter, 14, pp12485- [27] S. R. White, J. W. Wilkins, M. P. Teter, Finite Element method for 12494, (2002). electronic structure, Phys. Rev. B, 39, 5819 (1989). [58] W-J Beyn, An integral method for solving nonlinear eigenvalue [28] J. R. Chelikowsky, N. Troullier, Y. Saad, Finite-Difference- problems, Linear Algebra and its Applications, V. 436, I. 10, 3839, (2012). Pseudopotential method: electronic structure calculations without a basis, [59] B. Gavin, A. Miedlar, E. Polizzi, FEAST eigensolver for nonlinear Phys. Rev Let. 72, 1240, (1994). eigenvalue problems, Journal of Computational Science, V. 27, 107, [29] E. Tsuchida, M. Tsukada, Electronic-structure calculations based on (2018). finite-element method, Phys. Rev. B, 52, 5573, (1995). [60] O. K. Andersen, Linear methods in band theory, Phys. Rev. B, 12, [30] N. A. Modine, G. Zumbach, E. Kaxiras Adaptive-coordinate real-space pp3060-3083, (1975). electronic structure calculations for atoms, molecules, and solids, Phys. [61] D. J. Singh, Ground-state properties of lanthanum: Treatment of extend- Rev. B, 55, 10289, (1997). core states, Phys. Rev. B, 43, p6388, (1991). [31] E. L. Briggs, D. J. Sullivan, J. Bernholc, Real-space multigrid-based [62] E. Sjostedt,¨ L. Nordstrom,¨ and D. J. Singh, An alternative way of lin- approach to large-scale electronic structure calculations, Phys. Rev. B, earizing the augmented plane-wave method, Solid state communications, 54, 14362, (1996). 114, pp15-20, (2000). [32] M. Heisakanen, T. Torsti, M. J. Puska, and R. M. Nieminen, Multigrid [63] G. K. H. Madsen, P. Blaha, K. Schwarz, E. Sjostedt,¨ L. Nordstrom,¨ method for electronic structure calculations, Phys. Rev. B, 63, 245106, Efficient linearization of the augmented plane-wave method, Phys. Rev. (2001). B, 64, p195134, (2001) [33] S. Goedecker, Linear scaling electronic structure methods, Rev. of [64] D. J. Singh, L. Nordstrom, Planewaves, Pseudopotentials, and the Modern Phys., 71, 1085, (1999). LAPW Method, Springer, 2nd edition, (2006). [34] J. E. Pask, B. M. Klein, P. A. Sterne, C. Y. Fong, Finite-element methods [65] H. Hellmann, A New Approximation Method in the Problem of Many in electronic structure theory, Comput. Phys. Comm., 135,1, (2001). Electrons, J. Chem. Phys.,3, p61, (1935). [35] J.R. Chelikowsky, The pseudopotential-Density Functional method [66] J. C. Phillips and L. Kleinman, New Method for Calculating Wave applied to nanostructures, J. Phys. D 33, R33 (2000). Functions in Crystals and Molecules, Phys. Rev., 116, p287, (1959). [36] T. Torsti, T. Eirola, J. Enkovaara, T. Hakala, P. Havu, V. Havu, T. Hnmaa, [67] L. Kleinman and D. M. Bylander, Efficacious Form for Model Pseu- J. Ignatius,-A M. Lyly, I. Makkonen, T. Rantala, K. Ruotsalainen, E. nen,- dopotentials, Phys. Rev. Lett., 48, p1425, (1982) A H. Saarikoski, M. J. Puska, Three real-space discretization techniques [68] P. E. Blochl,¨ Projector augmented-wave method, Phys. Rev. B, 50, in electronic structure calculations, Physica Status Solidi (b) 243, No. 5, p17953, (1994) 1016 1053 (2006). [69] A. Levin, D. Zhang, E. Polizzi, FEAST fundamental framework for [37] L. Lehtovaara, V. Havu, and M. Puska, All-electron density functional electronic structure calculations: Reformulation and solution of the theory and time-dependent density functional theory with high-order finite muffin-tin problem, Comput. Phys. Comm. V183 I11, pp2370–2375, elements J. Chem. Phys. 131, p054103 (2009) (2012) [38] S. Baroni and P. Giannozzi, Towards very large scale electronic structure [70] J. Kestyn, Parallel Algorithms for Time Dependent Density Functional calculations Europhys, Lett. 17, 547 (1992). Theory in Real-space and Real-time, Doctoral Dissertations, UMass [39] L. Lin, C. Yang, J. Meza, J. Lu, L. Ying, W. E, SelInvAn Algorithm for Amherst, (2018). Selected Inversion of a Sparse Symmetric Matrix ACM Trans. on Math. [71] J. Kestyn, E. Polizzi, P.T. Tang FEAST eigensolver for non-Hermitian Softw. (TOMS), V.37 I4, no40 (2011). problems, SIAM Journal on Scientific Computing, 38, S772–S799, (2016) [40] D. Zhang and E. Polizzi, Linear scaling techniques for first-principle [72] D. G. Anderson, Iterative procedures for nonlinear integral equations. calculations of large nanowire devices 2008 NSTI Nanotechnology Journal of the ACM (JACM) 12, 4, 547-560, (1965). Conference and Trade Show. Technical Proceedings, Vol. 1 pp12-15 [73] C. G. Broyden, A class of methods for solving nonlinear simultaneous (2008). equations. Mathematics of computation 19, 92 (1965), 577593. [41] G. Golub, H. A. van der Vorst. Eigenvalue computation in the 20th [74] P. Pulay, Convergence acceleration of iterative sequences. the case of century, J. of Comput. and Appl. Math., V. 123, pp. 35-65 (2000). SCF iteration, Chem. Phys. Let. 73 (2): 393398 (1980). [42] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe and H. van der Vorst, Templates [75] P. Pulay, Improved scf convergence acceleration. Journal of Computa- for the solution of Algebraic Eigenvalue Problems: A Practical Guide, tional Chemistry 3, 4, 556-560 (1982). Society for Industrial and Applied Mathematics, Philadelphia, PA, (2000). [76] K. N. Kudin, G. E. Scuseria, and E. Cances. A black-box self-consistent [43] http://www.netlib.org/utk/people/JackDongarra/la-sw.html field convergence algorithm: One step closer. The Journal of chemical [44] T. Sakurai and H. Sugiura, A projection method for generalized eigen- physics 116, 19, 8255-8261, (2002). value problems using numerical integration, Journal of Computational [77] C. Yang, W. Gao, J. C. Meza On the convergence of the self-consistent and Applied Mathematics, 159 (2003), pp. 119–128. field iteration for a class of non-linear eigenvalue problems SIAM J. [45] T. Sakurai and H. Tadano, CIRR: a Rayleigh-Ritz type method with Matrix Anal. Appl., V. 30 (4), pp. 1773-1788 (2009). contour integral for generalized eigenvalue problems, Hokkaido Mathe- [78] H. F. Walker, and P. Ni. Anderson acceleration for fixed-point iterations. matical Journal, 36 (2007), pp. 745–757. SIAM Journal on Numerical Analysis 49, 4, 1715-1735, (2011). 14

[79] B. Gavin, E. Polizzi, Non-linear eigensolver-based alternative to [106] B-M. Cheng, C-Y. , Chung, M. Bahou, Y-P. Lee, L.C. Lee, R. van traditional SCF methods, J. Chem. Phys. 138, 194101 (2013) Harrevelt, and M. C. van Hemert, Quantitative spectroscopic and [80] V. Eyert, A comparative study on methods for convergent acceleration theoretical study of the optical absorption spectra of h2o, hod, and d2o in of iterative vector sequences, Journal of Computational Physics 124, the 125-145 nm region. The Journal of chemical physics 120, 1, 224-229 pp271-285 (1996). (2004). [81] Y. Saad and M. H. Schultz. GMRES: A generalized minimal residual [107] D. H. Katayama, R. E. Huffman, C. L. and OBryan, Absorption and algorithm for solving nonsymmetric linear systems. SIAM Journal on photoioniza- tion cross sections for h2o and d2o in the vacuum ultraviolet. scientific and statistical computing 7, 3, 856-869, (1986) The Journal of Chemical Physics 59, 8, 4309-4319, (1973). [82] Y. Saad, J. R. Chelikowsky, and S. M. Shontz. Numerical methods for [108] W.F. Chan, G. Cooper, C. E. and Brion, The electronic spectrum of electronic structure calculations of materials. SIAM review 52, 1, 3-54, water in the discrete and continuum regions. absolute optical oscillator (2010). strengths for photoabsorption (6-200 eV). Chemical physics 178, 1-3, [83] FHI-aims, https://aimsclub.fhi-berlin.mpg.de/ 387-400, (1993). [84] J. P. Perdew, A. Zunger Self-interaction correction to density- functional [109] F. Z. Chen, C. Y. R. Wu. Temperature-dependent photoabsorption approximations for many-electron systems, Phys. Rev. B 23, 10, 5048, cross sections in the VUV-UV region. I. Methane and ethane. Journal of (1981). Quantitative Spectroscopy and Radiative Transfer 85, 2, 195-209, (2004). [85] J. Kestyn, V. Kalantzis, E. Polizzi, Y. Saad, PFEAST: A High Per- [110] K. Kameta, N. Kouchi, M. Ukai, and Y. Hatano. Photoabsorption, formance Eigenvalue Solver Using Distributed-Memory Linear Solvers, photoionization, and neutral-dissociation cross sections of simple hydro- Proc Int Conf High Perform Comput Networking, Storage Anal. 2016:16, carbons in the vacuum ultraviolet range. Journal of Electron Spectroscopy (2016). and Related Phenomena 123, 2-3, 225-238 (2002). [86] Intel Math Kernel Library. http://software.intel.com/en-us/intel-mkl. [111] G. Cooper, G. R. Burton, W. F. Chan, and C. E. Brion. Absolute [87] P. R. Amestoy, I. S. Duff, J. L’Excellent, and J. Koster, MUMPS: a oscillator strengths for the photoabsorption of silane in the valence and general purpose distributed memory sparse solver, in Applied Parallel Si 2p and 2s regions (7.5-350 ev). Chemical physics 196, 1-2, 293-306, Computing. New Paradigms for HPC in Industry and Academia, Springer, (1995). pp. 121–130, (2000). [112] A. Dawes, N. Pascual, S. V. Hoffmann, N. C. Jones, and N. J. [88] A. Castro, M. A. L. Miguel, A. Rubio, Propagators for the time- Mason. Vacuum ultraviolet photoabsorption spectroscopy of crystalline dependent Kohn-Sham equations, J. Chem. Phys. 121, 3425 (2004). and amorphous benzene. Physical Chemistry Chemical Physics 19, 40, [89] J. Crank, P. Nicolson, A practical method for numerical evaluation of 2754427555 (2017). solutions of partial differential equations of the heat conduction type, [113] E. E. Rennie, C. A. F. Johnson, J. E. Parker, D. M. P. Holland, D. Proc. Camb. Phil. Soc. 43 (1): 5067 (1947). A. Shaw, and M. A. Hayes. A photoabsorption, photodissociation and [90] T.Y. Mikhailova and V.I. Pupyshev, Symmetric approximations for the photoelectron spectroscopy study of c6h6 and c6d6, Chemical physics evolution operator, Physics Letters A 257, 1-2 p16 (1999). 229, 1, 107-123, (1998). [91] On the exponential solutions of differential equations for a linear [114] K. Lopata, and N. Govind, Modeling Fast Electron Dynamics with operator Commun. Pure Appl. Math. VII (1954), 649673. (1954). Real-Time Time-Dependent Density Functional Theory: Application to [92] Z. Chen, E. Polizzi Spectral-based propagation schemes for time- Small Molecules and Chromophores, J. Chem. Theory Comput., 7 (5), dependent quantum systems with application to carbon nanotubes, Phys. pp 13441355 (2011). Rev. B, Vol. 82, 205410 (2010). [115] M.I. Stockman, Nanoplasmonics: the physics behind the applications, [93] A. Russakoff, Y. Li, S. He, and K. Varga, Accuracy and computa- Phys. Today 64 3944 (2011). tional efficiency of real-time subspace propagation schemes for the time- [116] N.J. Halas, Connecting the dots: reinventing optics for nanoscale dependent density functional theory, The Journal of Chemical Physics dimensions, Proc. Natl. Acad. Sci. 106, 3463 (2009). 144, 204125 (2016). [117] L. Novotny, and N. van Hulst, Antennas for light, Nature Photonics [94] X. Andrade, S. Botti, M. A. L. Marques, and A. Rubio, Time-dependent 5, 83 (2011). density functional theory scheme for efficient calculations of dynamic [118] H. Cang, A. Labno, C. Lu, X. Yin, M. Liu, C. Gladden, Y. Liu and X. (hyper)polarizabilities, J. Chem. Phys. 126, 184106 (2007). Zhang, Probing the electromagnetic field of a 15 nanometre hotspot by [95] Y. Takimoto, A Real-time Time-dependent Density Functional Theory single molecule imaging, Nature 469, 385 (2011). Method for Calculating Linear and Nonlinear Dynamic Optical Response. [119] J.B. Khurgin, How to deal with the loss in plasmonics and metamate- PhD thesis, University of Washington, (2008). rials, Nature Nanotechnology 10, 26 (2015). [96] V. Astapenko, Interaction of Ultrashort Electromagnetic Pulses with [120] C. Kittel, Introduction to Solid State Physics 8th edn, New York: Matter. Springer, (2013). Wiley; (2005). [97] R. C. Hilborn, Einstein coefficients, cross sections, f values, dipole [121] S. Tomonaga, Remarks on Blochs method of sound waves applied to moments, and all that. American Journal of Physics 50, 11, 982-986, many-fermion problems, Prog. Theor. Phys. 5, 54469 (1950). (1982). [122] J. Luttinger, An exactly soluble model of a many-fermion system, J. [98] Z. Chen, Computational All-Electron Time-Dependent Density Func- Math. Phys. 4, 115462 (1963). tional Theory in Real-Space and Real-Time: Applications to Molecules [123] V.K. Deshpande, M. Bockrath, L. Glazman and A. Yacoby A, Electron and Nanostructures, Doctoral Dissertations, UMass Amherst, (2013). liquids and solids in one dimension, Nature 464 209 (2010). [99] E. Polizzi, S. Yngvesson, Universal nature of collective plasmonic [124] C. Kane, L. Balents and P.A. Fisher, Coulomb interactions and excitations in finite 1-D carbon-based nanostructures, Nanotechnology, mesoscopic effects in carbon nanotubes, Phys. Rev. Lett. 79 5086 (1997). 26, p325201 (2015). [125] J. Towns, T. Cockerill, M. Dahan, I. Foster, K. Gaither, A. Grimshaw, V. [100] T. E. Sharp, Potential-energy curves for molecular hydrogen and its Hazlewood, S. Lathrop, D. Lifka, G. D. Peterson, R. Roskies, J. R. Scott, ions. Atomic Data and Nuclear Data Tables 2, 119-169 (1970). N. Wilkins-Diehr, XSEDE: Accelerating Scientific Discovery, Computing [101] G. Herzberg. Molecular spectra and molecular structure. vol. 1: in Science & Engineering, vol.16, no. 5, pp. 62-74, Sept.-Oct. (2014). Spectra of diatomic molecules. New York: Van Nostrand Reinhold, 1950, 2nd ed. (1950). [102] O. W. Richardson. Molecular hydrogen and its spectrum, vol. 23. Yale University Press, (1934). [103] C. Backx, G. R. Wight, and M. J. Van der Wiel, Oscillator strengths (10-70 ev) for absorption, ionization and dissociation in h2, hd and d2, obtained by an electron-ion coincidence method. Journal of Physics B: Atomic and Molecular Physics 9, 2, 315 (1976). [104] W. F. Chan, G. Cooper, C.E. and Brion, Absolute optical oscillator strengths for discrete and continuum photoabsorption of carbon monoxide (7200 ev) and transition moments for the x 1σ+ a 1π system. Chemical James Kestyn James Kestyn received his PhD in Electrical Engineering at the physics 170, 1, 123-138 (1993). University of Massachusetts, Amherst in 2018. His graduate studies focused [105] C-Y. Chung, E. P. Chew, B-M. Cheng, M. Bahou, and Y-P. Lee, on electronic structure calculations, where he helped to develop the multi- Temperature dependence of absorption cross-section of h2o, hod, and d2o level parallel approach used in NESSIE, and on numerical linear algebra, in the spectral region 140-193 nm. Nuclear Instruments and Methods in developing the non-Hermitian extension of the FEAST eigenvalue solver. He Physics Research Section A: Accelerators, Spectrometers, Detectors and has held internships at Intel and Samsung and has since been working on Associated Equipment 467 15721576, (2001). computational electromagnetics at Stellar Science Ltd Co. 15

Eric Polizzi Eric Polizzi is Professor in the Department of Electrical and Computer Engineering and the Department of Mathematics and Statistics at the University of Massachusetts, Amherst. He received an education in theoretical and computational physics, and a PhD in Applied Mathematics (2001) from the University of Toulouse, France. Prior to joining UMass in 2005, he served as a postdoctoral research associate in Electrical Engineering (2002-2003) and as a senior research scientist in Computer Sciences (2003- 2005), both at Purdue University. Prof. Polizzi is conducting interdisciplinary research activities at the intersection between advanced mathematical tech- niques, parallel numerical algorithms, and computational nanosciences. These activities can be broadly divided into three projects: the DFT/TDDFT NESSIE code, the parallel linear system solver SPIKE, and the eigenvalue solver FEAST.