QUANTUM ESPRESSO: a Modular and Open-Source Software Project for Quantum Simulations of Materials



Paolo Giannozzi,1,2 Stefano Baroni,1,3 Nicola Bonini,4 Matteo Calandra,5 Roberto Car,6 Carlo Cavazzoni,7,8 Davide Ceresoli,4 Guido L. Chiarotti,9 Matteo Cococcioni,10 Ismaila Dabo,11 Andrea Dal Corso,1,3 Stefano de Gironcoli,1,3 Ralph Gebauer,1,12 Uwe Gerstmann,13 Christos Gougoussis,5 Anton Kokalj,14 Michele Lazzeri,5 Layla Martin Samos Colomer,1 Nicola Marzari,4 Francesco Mauri,5 Stefano Paolini,3,9 Alfredo Pasquarello,15 Lorenzo Paulatto,1,3 Carlo Sbraccia,∗,1 Sandro Scandolo,1,12 Gabriele Sclauzero,1,3 Ari P. Seitsonen,5 Alexander Smogunov,12 Paolo Umari,1 and Renata M. Wentzcovitch,10

1 CNR-INFM Democritos National Simulation Center, 34100 Trieste, Italy
2 Dipartimento di Fisica, Università degli Studi di Udine, via delle Scienze 208, 33100 Udine, Italy
3 SISSA – Scuola Internazionale Superiore di Studi Avanzati, via Beirut 2-4, 34014 Trieste Grignano, Italy
4 Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
5 Institut de Minéralogie et de Physique des Milieux Condensés, Université Pierre et Marie Curie, CNRS, IPGP, 140 rue de Lourmel, 75015 Paris, France
6 Department of Chemistry, Princeton University, Princeton NJ 08544, USA
7 CINECA National Supercomputing Center, Casalecchio di Reno, 40033 Bologna, Italy
8 CNR-INFM S3 Research Center, 41100 Modena, Italy
9 SPIN s.r.l., via del Follatoio 12, 34148 Trieste, Italy
10 Department of Chemical Engineering and Materials Science, University of Minnesota, 151 Amundson Hall, 421 Washington Avenue SE, Minneapolis MN 55455, USA
11 Université Paris-Est, CERMICS, Projet Micmac ENPC-INRIA, 6-8 avenue Blaise Pascal, 77455 Marne-la-Vallée Cedex 2, France
12 The Abdus Salam International Centre for Theoretical Physics, Strada Costiera 11, 34014 Trieste Grignano, Italy
13 Theoretische Physik, Universität Paderborn, D-33098 Paderborn, Germany
14 Jožef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia
15 Ecole Polytechnique Fédérale de Lausanne (EPFL), Institute of Theoretical Physics, and Institut Romand de Recherche Numérique en Physique des Matériaux (IRRMA), CH-1015 Lausanne, Switzerland

QUANTUM ESPRESSO is an integrated suite of computer codes for electronic-structure calculations and materials modeling, based on density-functional theory, plane waves, and pseudopotentials (norm-conserving, ultrasoft, and projector-augmented wave). QUANTUM ESPRESSO stands for opEn Source Package for Research in Electronic Structure, Simulation, and Optimization. It is freely available to researchers around the world under the terms of the GNU General Public License. QUANTUM ESPRESSO builds upon newly-restructured electronic-structure codes that have been developed and tested by some of the original authors of novel electronic-structure algorithms and applied in the last twenty years by some of the leading materials modeling groups worldwide. Innovation and efficiency are still its main focus, with special attention paid to massively-parallel architectures, and a great effort being devoted to user friendliness. QUANTUM ESPRESSO is evolving towards a distribution of independent and inter-operable codes in the spirit of an open-source project, where researchers active in the field of electronic-structure calculations are encouraged to participate in the project by contributing their own codes or by implementing their own ideas into existing codes.

∗ Present address: Constellation Energy Commodities Group, 7th Floor, 61 Aldwich, London, WC2B 4AE, United Kingdom

I. INTRODUCTION

The combination of methodological and algorithmic innovations and ever-increasing computer power is delivering a simulation revolution in materials modeling, starting from the nanoscale up to bulk materials and devices [1]. Electronic-structure simulations based on density-functional theory (DFT) [2–4] have been instrumental to this revolution, and their application has now spread outside a restricted core of researchers in condensed-matter theory and quantum chemistry, involving a vast community of end users with very diverse scientific backgrounds and research interests. Sustaining this revolution and extending its beneficial effects to the many fields of science and technology that can capitalize on it represents a multifold challenge. In our view it is also a most urgent, fascinating and fruitful endeavor, able to deliver new forms for scientific exploration and discovery, where a very complex infrastructure (made of software rather than hardware) can be made available to any researcher, and whose capabilities continue to increase thanks to the methodological innovations and computing power scalability alluded to above.

Over the past few decades, innovation in materials simulation and modeling has resulted from the concerted efforts of many individuals and groups worldwide, often of small size. Their success has been made possible by a combination of competences, ranging from the ability to address meaningful and challenging problems, to a rigorous insight into theoretical methods, ending with a marked sensibility to matters of numerical accuracy and algorithmic efficiency. The readiness to implement new algorithms that utilize novel ideas requires total control over the software being used; for this reason, the physics community has long relied on in-house computer codes to develop and implement new ideas and algorithms. Transitioning these development codes to production tools is nevertheless essential, both to extensively validate new methods and to speed up their acceptance by the scientific community. At the same time, the dissemination of codes has to be substantial, to justify the learning efforts of PhD students and young postdocs who would soon be confronted with the necessity of deploying their competences in different research groups. In order to sustain innovation in numerical simulation, we believe there should be little, if any, distinction between development and production codes; computer codes should be easy to maintain, to understand by different generations of young researchers, to modify, and extend; they should be easy to use by the layman, as well as general and flexible enough to be enticing for a vast and diverse community of end users. One easily understands that such conflicting requirements can only be tempered, if anything, within organized and modular software projects.

Software modularity also comes as a necessity when complex problems in complex materials need to be tackled with an array of different methods and techniques. Multiscale approaches, in particular, strive to combine methods with different accuracy and scope to describe different parts of a complex system, or phenomena occurring at different time and/or length scales. Such approaches will require software packages that can perform different kinds of computations on different aspects of the same problem and/or different portions of the same system, and that allow for interoperability or joint usage of the different modules. Different packages should at the very least share the same input/output data formats; ideally they should also share a number of mathematical and application libraries, as well as the internal representation of some of the data structures on which they operate. Individual researchers or research groups find it increasingly difficult to meet all these requirements and to continue to develop and maintain in-house software projects of increasing complexity. Thus, different and possibly collaborative solutions should be sought.

A successful example comes from the software for simulations in quantum chemistry, that has often (but not always) evolved towards commercialization: the development and maintenance of most well-known packages is devolved to non-profit [5–8] or commercial companies [9–12]. The software is released for a fee under some proprietary license that may impose several restrictions to the availability of sources (computer code in a high-level language) and to what can be done with the software. This model has worked well, and is also used by some of the leading development groups in the condensed-matter electronic-structure community [13, 14], while some proprietary projects allow for some free academic usage of their products [14–19]. A commercial endeavor also brings the distinctive advantage of a professional approach to software development, maintenance, documentation, and support.

We believe however that a more interesting and fruitful alternative can be pursued, and one that is closer to the spirit of science and scientific endeavor, modeled on the experience of the open-source software community. Under this model, a large community of users has full access to the source code and the development material, under the coordination of a smaller group of core developers. In the long term, and in the absence of entrenched monopolies, this strategy could be more effective in providing good software solutions and in nurturing a community engaged in providing those solutions, as compared to the proprietary software strategy. In the case of software for scientific usage, such an approach has the additional, and by no means minor, advantage of being in line with the tradition and best practice of science, which require reproducibility of other people’s results, and where collaboration is no less important than competition.

In this paper we will briefly describe our answer to the above-mentioned problems, as embodied in our QUANTUM ESPRESSO project (indeed, QUANTUM ESPRESSO stands for opEn Source Package for Research in Electronic Structure, Simulation, and Optimization). First, in Sec. 2, we describe the guiding lines of our effort. In Sec. 3, we give an overview of the current capabilities of QUANTUM ESPRESSO. In Sec. 4, we provide a short description of each software component presently distributed within QUANTUM ESPRESSO. Finally, Sec. 5 describes current developments and offers a perspective outlook. The Appendix sections discuss some of the more specific technical details of the algorithms used, that have not been documented elsewhere.

II. THE QUANTUM ESPRESSO PROJECT

QUANTUM ESPRESSO is an integrated suite of computer codes for electronic-structure calculations and materials modeling based on density-functional theory, plane-wave basis sets and pseudopotentials to represent electron-ion interactions. QUANTUM ESPRESSO is free, open-source software distributed under the terms of the GNU General Public License (GPL) [20].

The two main goals of this project are to foster methodological innovation in the field of electronic-structure simulations and to provide a wide and diverse community of end users with highly efficient, robust, and user-friendly software implementing the most recent innovations in this field. Other open-source projects [21–25] exist, besides QUANTUM ESPRESSO, that address electronic-structure calculations and various materials simulation techniques based on them. Unlike some of these projects, QUANTUM ESPRESSO does not aim at providing a single monolithic code able to perform several different tasks by specifying different input data to a same executable. Our general philosophy is rather that of an open distribution, i.e. an integrated suite of codes designed to be interoperable, much in the spirit of a Linux distribution, and thus built around a number of core components designed and maintained by a small group of core developers, plus a number of auxiliary/complementary codes designed, implemented, and maintained by members of a wider community of users. The distribution can even be redundant, with different applications addressing the same problem in different ways; at the end, the sole requirements for codes to be added to the distribution are that: i) they are distributed under the same GPL licence agreement [20] as other QUANTUM ESPRESSO components; ii) they are fully interoperable with the other components. Of course, they need to be scientifically sound, verified and validated. External contributors are encouraged to join the QUANTUM ESPRESSO project, if they wish, while maintaining their own individual distribution and advertisement mode for their software (for instance, by maintaining individual web sites with their own brand names [26]). To facilitate this, a web service called qe-forge [27], described in the next subsection, has been recently put in place.

Interoperability of different components within QUANTUM ESPRESSO is granted by the use of common formats for the input, output, and work files. In addition, external contributors are encouraged, but not by any means forced, to use the many numerical and application libraries on which the core components are built. Of course, this general philosophy must be seen more as an objective to which a very complex software project tends, rather than a starting point.

One of the main concerns that motivated the birth of the QUANTUM ESPRESSO project is high performance, both in serial and in parallel execution. High serial performance across different architectures is achieved by the systematic use of standardized mathematical libraries (BLAS, LAPACK [28], and FFTW [29]) for which highly optimized implementations exist on many platforms; when proprietary optimizations of these libraries are not available, the user can compile the library sources distributed with QUANTUM ESPRESSO. Optimal performance in parallel execution is achieved through the design of several parallelization levels, using sophisticated communications algorithms, whose implementation often does not need to concern the developer, being embedded in appropriate software layers. As a result the performance of the key engines (PWscf and CP, see below) may scale on massively parallel computers up to hundreds or thousands of processors respectively.

The distribution is organized into a basic set of modules, libraries, installation utilities, plus a number of directories, each containing one or more executables, performing specific tasks. The communications between the different executables take place via data files. We think that this kind of approach lowers the learning barrier for those who wish to contribute to the project. The codes distributed with QUANTUM ESPRESSO, including many auxiliary codes for the postprocessing of the data generated by the simulations, are easy to install and to use. The GNU configure and make utilities ensure a straightforward installation on many different machines. Applications are run through text input files based on Fortran namelists, that require the users to specify only an essential but small subset of the many control variables available; a specialized graphical user interface (GUI) that is provided with the distribution facilitates this task for most component programs. It is foreseen that in the near future the design of special APIs (Application Programming Interfaces) will make it easier to glue different components of the distribution together and with external applications, as well as to interface them to other, custom-tailored, GUIs and/or command interpreters.

The QUANTUM ESPRESSO distribution is written, mostly, in Fortran-95, with some parts in C or in Fortran-77. Fortran-95 offers the possibility to introduce advanced programming techniques without sacrificing performance. Moreover Fortran is still the language of choice for high-performance computing and it allows for easy integration of legacy codes written in this language. A single source tree is used for all architectures, with C preprocessor options selecting a small subset of architecture-dependent code. Parallelization is achieved using the Message-Passing paradigm and calls to standard MPI (Message Passing Interface) [30] libraries. Most calls are hidden in a few routines that act as an intermediate layer, accomplishing e.g. the tasks of summing a distributed quantity over processors, of collecting distributed arrays or distributing them across processors, and of performing parallel three-dimensional Fast Fourier Transforms (FFT). This allows new modules and functionalities to be developed straightforwardly and transparently while preserving the efficient parallelization backbone of the codes.

A. QE-forge

The ambition of the QUANTUM ESPRESSO project is not limited to providing highly efficient and user-friendly software for large-scale electronic-structure calculations and materials modeling. QUANTUM ESPRESSO aims at promoting active cooperation between a vast and diverse community of scientists developing new methods and algorithms in electronic-structure theory and of end users interested in their application to the numerical simulation of materials and devices.

As mentioned, the main source of inspiration for the model we want to promote is the successful cooperative experience of the GNU/Linux developers’ and users’ community. One of the main outcomes of this community has been the incorporation within the GNU/Linux operating system distributions of third-party software packages, which, while being developed and maintained by autonomous, and often very small, groups of users, are put at the disposal of the entire community under the terms of the GPL. The community, in turn, provides positive feedback and extensive validation by benchmarking new developments, reporting bugs, and requesting new features. These developments have largely benefited from the SourceForge code repository and software development service [31], or from other similar services, such as RubyForge, Tigris.org, BountySource, BerliOS, JavaForge, and GNU Savannah.

Inspired by this model, the QUANTUM ESPRESSO developers’ and users’ community has set up its own web portal, named qe-forge [27]. The goal of qe-forge is to complement the traditional web sites of individual scientific software projects, which are passive instruments of information retrieval, with a dynamical space for active content creation and sharing. Its aim is to foster and simplify the coordination and integration of the programming efforts of heterogeneous groups and to ease the dissemination of the software tools thus obtained.

qe-forge provides, through a user-friendly web interface, an integrated development environment, whereby researchers can freely upload, manage and maintain their own software, while retaining full control over it, including the right of not releasing it. The services so far available include source-code management software (CVS or SVN repository), mailing lists, public forums, bug tracking facilities, download space, and wiki pages for projects’ documentation. qe-forge is expected to be the main tool by which QUANTUM ESPRESSO end users and external contributors can maintain QUANTUM ESPRESSO-related projects and make them available to the community.

III. SHORT DESCRIPTION OF QUANTUM ESPRESSO

QUANTUM ESPRESSO implements a variety of methods and algorithms aimed at a chemically realistic modeling of materials from the nanoscale upwards, based on the solution of the density-functional theory (DFT) [2, 3] problem, using a plane-wave (PW) basis set and pseudopotentials (PP) [32] to represent electron-ion interactions.

The codes are constructed around the use of periodic boundary conditions, which allows for a straightforward treatment of infinite crystalline systems, and an efficient convergence to the thermodynamic limit for aperiodic but extended systems, such as liquids or amorphous materials. Finite systems are also treated using supercells; if required, open-boundary conditions can be used through the use of the density-countercharge method [33]. QUANTUM ESPRESSO can thus be used for any crystal structure or supercell, and for metals as well as for insulators. The atomic cores can be described by separable [34] norm-conserving (NC) PPs [35], ultra-soft (US) PPs [36], or by projector-augmented wave (PAW) sets [37]. Many different exchange-correlation functionals are available in the framework of the local-density (LDA) or generalized-gradient approximation (GGA) [38], plus advanced functionals like Hubbard U corrections and a few meta-GGA [39] and hybrid functionals [40–42]. The latter is an area of very active development, and more details on the implementation of hybrid functionals and related Fock-exchange techniques are given in Appendix F.

The basic computations/simulations that can be performed include:

• calculation of the Kohn-Sham orbitals and energies [43] for isolated or extended/periodic systems, and of their ground-state energies;

• complete structural optimizations of the microscopic (atomic coordinates) and macroscopic (unit cell) degrees of freedom, using Hellmann-Feynman forces [44, 45] and stresses [46];

• ground states for magnetic or spin-polarized systems, including spin-orbit coupling [47] and non-collinear magnetism [48, 49];

• ab-initio molecular dynamics (MD), using either the Car-Parrinello Lagrangian [50] or the Hellmann-Feynman forces calculated on the Born-Oppenheimer (BO) surface [51], in a variety of thermodynamical ensembles, including NPT variable-cell [52, 53] MD;

• density-functional perturbation theory (DFPT) [54–56], to calculate second and third derivatives of the total energy at any arbitrary wavelength, providing phonon dispersions, electron-phonon and phonon-phonon interactions, and static response functions (dielectric tensors, Born effective charges, IR spectra, Raman tensors);

• location of saddle points and transition states via transition-path optimization using the nudged elastic band (NEB) method [57–59];

• ballistic conductance within the Landauer-Büttiker theory using the scattering approach [60];

• generation of maximally localized Wannier functions [61, 62] and related quantities;

• calculation of nuclear magnetic resonance (NMR) and electronic paramagnetic resonance (EPR) parameters [63, 64];

• calculation of K-edge X-ray absorption spectra [65].

Other more advanced or specialized capabilities are described in the next sections, while ongoing projects (e.g. time-dependent DFT, many-body perturbation theory) are mentioned in the last section. Selected applications were described in Ref. [66]. Several utilities for data postprocessing and interfacing to advanced graphic applications are available, allowing e.g. the calculation of scanning tunneling microscopy (STM) images [67], the electron localization function (ELF) [68], Löwdin charges [69], the density of states (DOS), and planar [70] or spherical averages of the charge and spin densities and potentials.

A. Data file format

The interoperability of different software components within a complex project such as QUANTUM ESPRESSO relies on the careful design of file formats for data exchange. A rational and open approach to data file formats is also essential for interfacing applications within QUANTUM ESPRESSO with third-party applications, and more generally to make the results of lengthy and expensive computer simulations accessible to, and reproducible by, the scientific community at large. The need for data file formats that make data exchange easier than it is now is starting to be widely appreciated in the electronic-structure community. This problem has many aspects and likely no simple, "one-size-fits-all", solution. Data files should ideally be

• extensible: one should be able to add some more information to a file without breaking all codes that read that file;

• self-documenting: it should be possible to understand the contents of a file without too much effort;

• efficient: with data size in the order of GBytes for large-scale calculations, slow or wasteful I/O should be avoided.

The current trend in the electronic-structure community seems to be the adoption of one of the following approaches:

• structured file formats, notably Hierarchical Data Format (HDF) [71] and network Common Data Form (netCDF) [72], that have been widely used for years in other communities;

• file formats based on the Extensible Markup Language (XML) [73].

It is unlikely that a common, standardized data format will ever prevail in our community. We feel that we should focus, rather than on standardization, on an approach that allows an easy design and usage of simple and reliable converters among different data formats. Prompted by these considerations, QUANTUM ESPRESSO developers have opted for a simple solution that tries to combine the advantages of both the above-mentioned approaches. A single file containing all the data of a simulation is replaced by a data directory, containing several files and subdirectories, much in the same way as it is done in the Mac OS X operating system. The “head” file contains data written with ordinary Fortran formatted I/O, identified by XML tags. Only data of small size, such as atomic positions, parameters used in the calculation, one-electron and total energies, are written in the head file. Data of potentially large size, such as PW coefficients of Kohn-Sham orbitals, charge density, potentials, are present as links to separate files, written using unformatted Fortran I/O. Data for each k-point is written to a separate subdirectory. A lightweight library called iotk, standing for Input/Output ToolKit [74], is used to read and write the data directory.

Another problem affecting interoperability of PW-PP codes is the availability of data files containing atomic PPs, one of the basic ingredients of the calculation. There are many different types of PPs and many different codes generating PPs (see e.g. Refs. [75–77]), each one with its own format. Again, the choice has fallen on a simple solution that makes it easy to write converters from and to the format used by QUANTUM ESPRESSO. Each atomic PP is contained in a formatted file (efficiency is not an issue here), described by an XML-like syntax. The resulting format has been named Unified Pseudopotential File (UPF). Several converters from other formats to the UPF format are available in QUANTUM ESPRESSO.

IV. PARALLELIZATION

Keeping pace with the evolution of high-end supercomputers is one of the guiding lines in the design of QUANTUM ESPRESSO, with a significant effort being dedicated to porting it to the latest available architectures. This effort is motivated not only by the need to stay at the forefront of architectural innovation for large to very-large scale materials science simulations, but also by the speed at which hardware features specifically designed for supercomputers find their way into commodity computers.

The architecture of today’s supercomputers is characterized by multiple levels and layers of parallelism: the bottom layer is the one affecting the instruction set of a single core (simultaneous multithreading, hyperthreading); then one has parallel processing at processor level (many CPU cores inside a single processor sharing caches) and at node level (many processors sharing the same memory inside the node); at the top level, many nodes are finally interconnected with a high-performance network. The main components of the QUANTUM ESPRESSO distribution are designed to exploit this highly structured hardware hierarchy. High performance on massively parallel architectures is achieved by distributing both data and computations in a hierarchical way across available processors, ending up with multiple parallelization levels [78] that can be tuned to the specific application and to the specific architecture. This remarkable characteristic makes it possible for the main codes of the distribution to run in parallel on most or all parallel machines with very good performance in all cases.

More in detail, the various parallelization levels are geared into a hierarchy of processor groups, identified by different MPI communicators. In this hierarchy, groups implementing coarser-grained parallel tasks are split into groups implementing finer-grained parallel tasks. The first level is image parallelization, implemented by dividing processors into n_image groups, each taking care of one or more images (i.e. a point in the configuration space, used by the NEB method). The second level is pool parallelization, implemented by further dividing each group of processors into n_pool pools of processors, each taking care of one or more k-points (see Appendix D). The third level is plane-wave parallelization, implemented by distributing real- and reciprocal-space grids across the n_PW processors of each pool. The final level is task group parallelization [79], in which processors are divided into n_task task groups of n_FFT = n_PW / n_task processors, each one taking care of different groups of electron states to be Fourier-transformed, while each FFT is parallelized inside a task group. A further parallelization level, linear-algebra, coexists side by side with plane-wave parallelization, i.e. they take care of different sets of operations, with different data distribution. Linear-algebra parallelization is implemented both with custom algorithms and using ScaLAPACK [80], which on massively parallel machines yields much superior performance. Table I contains a summary of the five levels currently implemented. With the recent addition of the two last levels, most parallelization bottlenecks have been removed, both computations and data structures are fully distributed, and scalability on parallel machines is only limited by the physical sizes of the system being simulated. Scalability does not yet extend to tens of thousands of processors as in the specially crafted code QBox [81], but excellent scalability on up to 4800 processors has been demonstrated (see Fig. 1) even for cases where coarse-grained parallelization doesn’t help.

TABLE I: Summary of parallelization levels in QUANTUM ESPRESSO.

group           distributed quantities                                  communications  performance
image           NEB images                                              very low        linear CPU scaling, fair to good load balancing; does not distribute RAM
pool            k-points                                                low             almost linear CPU scaling, fair to good load balancing; does not distribute RAM
plane-wave      plane waves, G-vector coefficients, R-space FFT arrays  high            good CPU scaling, good load balancing, distributes most RAM
task            FFT on electron states                                  high            improves load balancing
linear algebra  subspace Hamiltonians and constraints matrices          very high       improves scaling, distributes more RAM

[Figure 1 plot, "Scalability for > 1000 processors": CPU time (s) versus number of CPUs, with curves for PSIWAT, CNT (1) and CNT (2).]

FIG. 1: Scalability as a function of the number of processors for large-scale calculations. PSIWAT: gold surface covered by thiols in interaction with water, 10.59 × 20.53 × 32.66 Å^3 cell, 587 atoms, 2552 electrons, PWscf code with n_pool = 4 and n_tg = 4 on a Cray XT 4. CNT: porphyrin-functionalized nanotube, 1532 atoms, 5232 electrons; case (1), PWscf on a Cray XT 4; case (2), CP on a Cray XT3. CPU times for PSIWAT and CNT (2) were rescaled so that they fall in the same range as for CNT (1).

[Figure 2 plot, "CP scalability": CPU time (s) versus number of CPUs, with curves for BG/P and Altix using 1, 4 and 8 task groups.]

FIG. 2: CPU time t_cpu (s) per electronic time step (CP code) on an IBM Blue Gene/P (BG/P) and an SGI Altix for a fragment of an Aβ-peptide in water containing 838 atoms and 2311 electrons in a 22.1 × 22.9 × 19.9 Å^3 cell, as a function of the number of processors, for different numbers n_task of task groups.

Scalability is thus guaranteed for large-scale simulations. This being said, it is obvious that the size and specific nature of the specific application set quite natural limits to the maximum number of processors up to which the performance of the various codes is expected to scale. For instance, the number of images in a NEB calculation sets a natural limit to the level of image groups, or the number of electronic bands sets a limit for the parallelization of the linear algebra operations. Moreover some numerical algorithms scale better than others. For example, the use of norm-conserving pseudopotentials allows for a better scaling than ultrasoft pseudopotentials for a same system, because a larger plane-wave basis set and larger real- and reciprocal-space grids are required in the former case. On the other hand, using ultrasoft pseudopotentials is generally faster because the use of a smaller basis set is obviously more efficient, even though the overall parallel performance may be not as good.

The efforts of the QUANTUM ESPRESSO developers’ team are not limited to the performance on massively parallel architectures. Simulations on systems containing several hundreds of atoms are by now quite standard (see Fig. 2 for an example). Special attention is also paid to optimizing the performance for simulations of intermediate size (on systems comprising from several tens to a few hundreds of inequivalent atoms), to be performed on medium-size clusters, readily available to many groups [82]. In particular, the QUANTUM ESPRESSO developers’ team is now working to better exploit new hardware trends, particularly in the field of multicore architectures. By the time this paper is printed, mixed OpenMP-MPI parallelization will be available for enhanced performance on modern multicore CPUs. Looking ahead, future developments will likely focus on hybrid systems with hardware accelerators (GPUs and cell co-processors).

V. QUANTUM ESPRESSO PACKAGES

The complete QUANTUM ESPRESSO distribution is relatively large: about 240,000 lines of code, excluding copies of the external libraries. With such a sizable code base, modularization becomes necessary. QUANTUM ESPRESSO is presently divided into several executables, performing different types of calculations, although some of them have overlapping functionalities. Typically there is a single set of functions/subroutines or a single Fortran 90 module that performs each specific task (e.g. matrix diagonalizations, or potential updates), but there are still important exceptions to this rule, reflecting the different origin and different styles of the original components. QUANTUM ESPRESSO has in fact been built out of the merge and re-engineering of different packages that were independently developed for several years. In the following, the main components are briefly described.

A. PWscf

PWscf implements an iterative approach to reach self-consistency, using at each step iterative diagonalization techniques, in the framework of the plane-wave pseudopotential method. An early version of PWscf is described in Ref. [83]. Both separable NC-PPs and US-PPs are implemented; recently, the projector augmented-wave method [37] has also been added, largely following the lines of Ref. [84] for its implementation. In the case of US-PPs, the electronic wave functions can be made smoother at the price of having to augment their square modulus with additional contributions to recover the actual physical charge densities. For this reason, the charge density is more structured than the square of the wavefunctions, and requires a larger energy cutoff for its plane wave expansion (typically, 6 to 12 times larger; for a NC-PP, a factor of 4 would be mathematically sufficient). Hence, different real-space Fourier grids are introduced: a "soft" one that represents the square of electronic wave functions, and a "hard" one that represents the charge density [82, 85]. The augmentation terms can be added either in reciprocal space (using an exact but expensive algorithm) or directly in real space (using an approximate but faster algorithm that exploits the local character of the augmentation charges).

PWscf can use the established LDA and GGA exchange-correlation functionals, including spin-polarization within the scheme proposed in Ref. [86], and can treat non-collinear magnetism [48, 49] as e.g. induced by relativistic effects (spin-orbit interactions) [87, 88] or by complex magnetic interactions (e.g. in the presence of frustration). DFT + Hubbard U calculations [89] are implemented for a simplified ("no-J") rotationally invariant form [90] of the Hubbard term. Other advanced functionals include TPSS meta-GGA [39], functionals with finite-size corrections [91], and the PBE0 [40] and B3LYP [41, 42] hybrids.

Self-consistency is achieved via the modified Broyden method of Ref. [92], with some further refinements that are detailed in the Appendix. The sampling of the Brillouin Zone (BZ) can be performed using either special [93, 94] k-points provided in input or those automatically calculated starting from a uniform grid. Crystal symmetries are automatically detected and exploited to reduce computational costs, by restricting the sampling of the BZ to the irreducible wedge alone. When only the Γ point (k = 0) is used, advantage is taken of the real character of the orbitals, so that just half of the Fourier components need to be stored. BZ integrations in metallic systems are dealt with using smearing/broadening techniques, such as Fermi-Dirac, Gaussian, Methfessel-Paxton [95], and Marzari-Vanderbilt cold smearing [96]; the tetrahedron method [97] is also implemented. Note that finite-temperature effects on the electronic properties can be easily accounted for by using the Fermi-Dirac smearing not as a mathematical device, but as a practical way of implementing the Mermin density-functional approach [98].

Structural optimizations are performed using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [99–101] or damped dynamics; these can involve both the internal, microscopic degrees of freedom (i.e. the atomic coordinates) and/or the macroscopic ones (shape and size of the unit cell). The calculation of minimum-energy paths, activation energies, and transition states uses the Nudged Elastic Band (NEB) method [57]. Potential energy surfaces as a function of suitably chosen collective variables can be studied using Laio-Parrinello metadynamics [102].

Microcanonical (NVE) MD is performed on the BO surface, i.e. achieving electron self-consistency at each time step, using the Verlet algorithm [103]. Canonical (NVT) dynamics can be performed using velocity rescaling, or Andersen's or Berendsen's thermostats [104]. Constant-pressure (NPT) MD is performed by adding additional degrees of freedom for the cell size and volume, using either the Parrinello-Rahman Lagrangian [105] or the invariant Lagrangian of Wentzcovitch [53].

The effects of finite macroscopic electric fields on the electronic structure of the ground state can be accounted for either through the method of Refs. [106, 107] based on the Berry phase, or (for slab geometries only) through a sawtooth external potential [108, 109]. A quantum fragment can be embedded in a complex electrostatic environment that includes a model solvent [110] and a counterion distribution [111], as is typical of electrochemical systems.

B. CP

The CP code is the massively-parallel module for Car-Parrinello ab-initio MD. CP can use both NC PPs [112] and US PPs [85, 113]. In the latter case, the electron density is augmented through a Fourier interpolation scheme in real space ("box grid") [82, 85] that is particularly efficient for large scale calculations. CP implements the same functionals as PWscf, with the exception of hybrid functionals; a simplified one-electron self-interaction correction (SIC) [114] is also available. The Car-Parrinello Lagrangian can be augmented with Hubbard U corrections [115], or Hubbard-based penalty functionals to impose arbitrary oxidation states [116].

Since the main applications of CP are for large systems without translational symmetry (e.g. liquids, amorphous materials), Brillouin zone sampling is restricted to the Γ point of the supercell, allowing for real instead of complex wavefunctions. Metallic systems can be treated in the framework of "ensemble DFT" [117].

In the Car-Parrinello algorithm, microcanonical (NVE) MD is performed on both electronic and nuclear degrees of freedom, treated on the same footing, using the Verlet algorithm. The electronic equations of motion are accelerated through a preconditioning scheme [118]. Constant-pressure (NPT) MD is performed using the Parrinello-Rahman Lagrangian [105] and additional degrees of freedom for the cell. Nosé-Hoover thermostats [119] and Nosé-Hoover chains [120] allow one to perform simulations in the different canonical ensembles.

CP can also be used to directly minimize the energy functional to self-consistency while keeping the nuclei fixed, or to perform structural minimizations of nuclear positions, using the "global minimization" approaches of Refs. [121, 122], and damped dynamics or conjugate-gradients on the electronic or ionic degrees of freedom. It can also perform NEB and metadynamics calculations.

Finite homogeneous electric fields can be accounted for using the Berry's phase method, adapted to systems with the Γ point only [106]. This advanced feature can be used in combination with MD to obtain the infrared spectra of liquids [106, 123], the low- and high-frequency dielectric constants [106, 124] and the coupling factors required for the calculation of vibrational properties, including infrared, Raman [125–127], and hyper-Raman [128] spectra.

C. PHonon

The PHonon code implements density-functional perturbation theory (DFPT) [54–56] for the calculation of second- and third-order derivatives of the energy with respect to atomic displacements and to electric fields. The global minimization approach [129, 130] is used for the special case of normal modes in finite (molecular) systems, where no BZ sampling is required, while a self-consistent procedure [55] is used in the general case, with the distinctive advantage that the response to a perturbation of any arbitrary wavelength can be calculated with a computational cost that is proportional to that of the unperturbed system. Thus, the response at any wavevector, including very long-wavelength ones, can be inexpensively calculated. This latter approach, and the technicalities involved in the calculation of effective charges and interatomic force constants, are described in detail in Refs. [55, 131].

Symmetry is fully exploited in order to reduce the amount of computation. Lattice distortions transforming according to irreducible representations of small dimensions are generated first. The charge-density response to these lattice distortions is then sampled at a number of discrete k-points in the BZ, which is reduced according to the symmetry of the small group of the phonon wavevector q. The grid of q needed for the calculation of interatomic force constants reduces to one wavevector per star: the dynamical matrices at the other q vectors in the star are generated using the symmetry operations of the crystal. This approach allows us to speed up the calculation without the need to store too much data for symmetrization.

The calculation of second-order derivatives of the energy also works for US PPs [132, 133] and for all GGA flavors [134, 135] used in PWscf and in CP. The extension of PHonon to PAW [136], to noncollinear magnetism and to fully relativistic US PPs which include spin-orbit coupling [137] will be available by the time this paper is printed.

Advanced features of the PHonon code include the calculation of third-order derivatives of the energy, such as electron-phonon or phonon-phonon interaction coefficients. Electron-phonon interactions are straightforwardly calculated from the response of the self-consistent potential to a lattice distortion. This involves a numerically-sensitive "double-delta" integration at the Fermi energy, which is performed using interpolations on a dense k-point grid. Interpolation techniques based on Wannier functions [138] will speed up these calculations considerably. The calculation of the anharmonic force constants from third-order derivatives of the electronic ground-state energy is described in Ref. [139] and is performed by a separate code called d3. Static Raman coefficients are calculated using the second-order response approach of Refs. [140, 141]. Both third-order derivatives and Raman-coefficients calculations are currently implemented only for NC PPs.

D. atomic

The atomic code performs three different tasks: i) solution of the self-consistent all-electron radial Kohn-Sham equations (with a Coulomb nuclear potential and spherically symmetric charge density); ii) generation of NC PPs, of US PPs, or of PAW data-sets; iii) test of the above PPs and data-sets. The three tasks can be either separately executed or performed in a single run. Three different all-electron equations are available: i) the non relativistic radial Kohn-Sham equations, ii) the scalar relativistic approximation to the radial Dirac equations [142], iii) the radial Dirac-like equations derived within relativistic density functional theory [143, 144]. For i) and ii) atomic magnetism is dealt with within the local spin density approximation, i.e. assuming an axis of magnetization. The atomic code uses the same exchange and correlation energy routines as PWscf and can deal with the same functionals.

The code is able to generate NC PPs directly in separable form (also with multiple projectors per angular momentum channel) via the Troullier-Martins [145] or the Rappe-Rabe-Kaxiras-Joannopoulos [146] pseudization. US PPs can be generated by a two-step pseudization process, starting from a NC PP, as described in Ref. [147], or using the solutions of the all-electron equation and pseudizing the augmentation functions [85]. The latter method is also used for the PAW data-set generation. The generation of fully relativistic NC and US PPs including spin-orbit coupling effects is also available. Converters are available to translate pseudopotentials encoded in different formats (e.g. according to the Fritz-Haber [75] or Vanderbilt [76] conventions) into the UPF format adopted by QUANTUM ESPRESSO.

Transferability tests can be made simultaneously for several atomic configurations with or without spin-polarization, by solving the non relativistic radial Kohn-Sham equations generalized for separable nonlocal PPs and for the presence of an overlap matrix.

E. PWcond

The PWcond code implements the scattering approach proposed by Choi and Ihm [60] for the study of coherent electron transport in atomic-sized nanocontacts within the Landauer-Büttiker theory. Within this scheme the linear response ballistic conductance is proportional to the quantum-mechanical electron transmission at the Fermi energy for an open quantum system consisting of a scattering region (e.g., an atomic chain or a molecule with some portions of left and right leads) connected ideally from both sides to semi-infinite metallic leads. The transmission is evaluated by solving the Kohn-Sham equations with the boundary conditions that an electron coming from the left lead and propagating rightwards gets partially reflected and partially transmitted by the scattering region. The total transmission is obtained by summing all transmission probabilities for all the propagating channels in the left lead. As a byproduct of the method, the PWcond code provides the complex band structures of the leads, that is the Bloch states with complex k_z in the direction of transport, describing wave functions exponentially growing or decaying in the z direction. The original method formulated with NC PPs has been generalized to US PPs both in the scalar relativistic [148] and in the fully relativistic forms [149].

F. GIPAW

The GIPAW code allows for the calculation of physical parameters measured by i) nuclear magnetic resonance (NMR) in insulators (the electric field gradient (EFG) tensors and the chemical shift tensors), and by ii) electronic paramagnetic resonance (EPR) spectroscopy for paramagnetic defects in solids or in radicals (the hyperfine tensors and the g-tensor). The code also computes the magnetic susceptibility of nonmagnetic insulators. The code is based on the PW-PP method, and uses many subroutines of PWscf and of PHonon. The code is currently restricted to NC PPs. All the NMR and EPR parameters depend on the detailed shape of the electronic wave-functions near the nuclei and thus require the reconstruction of the all-electron wave-functions from the PP wave-functions. For the properties defined at zero external magnetic field, namely the EFG and the hyperfine tensors, such reconstruction is performed as a post-processing step of a self-consistent calculation using the PAW reconstruction, as described for the EFG in Ref. [150] and for the hyperfine tensor in Ref. [151]. The g-tensor, the NMR chemical shifts and the magnetic susceptibility are obtained from the orbital linear response to an external uniform magnetic field. In the presence of a magnetic field the PAW method is no longer gauge- and translationally invariant. Gauge and translational invariances are restored by using the gauge including projector augmented wave (GIPAW) method [63, 64] both i) to describe in the PP Hamiltonian the coupling of orbital degrees of freedom with the external magnetic field, and ii) to reconstruct the all-electron wave-functions, in the presence of the external magnetic field. In addition, the description of a uniform magnetic field within periodic boundary conditions is achieved by considering the long wave-length limit of a sinusoidally modulated field in real space [152, 153]. The NMR chemical shifts are computed following the method described in Ref. [63], the g-tensor following Ref. [154] and the magnetic susceptibility following Refs. [63, 152]. Recently, a "converse" approach to calculate chemical shifts has also been introduced [155], based on recent developments on the Berry-phase theory of orbital magnetization; since it does not require a linear-response calculation, it can be straightforwardly applied to arbitrarily complex exchange-correlation functionals, and to very large systems, albeit at computational costs that are proportional to the number of chemical shifts that need to be calculated.

G. XSPECTRA

The XSPECTRA code allows for the calculation of K-edge X-ray absorption spectra (XAS). The code calculates the XAS cross-section including both dipolar and quadrupolar matrix elements. The code uses the self-consistent charge density produced by PWscf and acts as a post-processing tool. The all-electron wavefunction is reconstructed using the PAW method and its implementation in the GIPAW code. The presence of a core-hole in the final state of the X-ray absorption process is simulated by using a pseudopotential for the absorbing atom with a hole in the 1s state. The calculation of the charge density is performed on a supercell with one absorbing atom. From the self-consistent charge density, the X-ray absorption spectra are obtained using the Lanczos method and a continued fraction expansion [65, 156]. The advantage of this approach is that once the charge density is known it is not necessary to calculate empty bands to describe very high energy features of the spectrum. Correlation effects can be simulated in a mean-field way using the Hubbard U correction [90] that has been included in the XSPECTRA code in Ref. [157]. Currently the code is limited to collinear magnetism. Its extension to noncollinear magnetism is under development.

H. Wannier90

Wannier90 [26, 158] is a code that calculates maximally-localized Wannier functions in insulators or metals, according to the algorithms described in Refs. [61, 62], and a number of properties that can be conveniently expressed in a Wannier basis. The code is developed and maintained independently by a Wannier development group [158] and can be taken as a representative example of the philosophy described earlier, where a project maintains its own individual distribution but provides full interoperability with the core components of QUANTUM ESPRESSO, in this case PWscf or CP. These codes are in fact used as "quantum engines" to produce the data onto which Wannier90 operates. The need to provide transparent protocols for interoperability has in turn facilitated the interfacing of wannier90 with other quantum engines [14, 21], fostering a collaborative engagement with the broader electronic-structure community that is also in the spirit of QUANTUM ESPRESSO.

Wannier90 requires as input the scalar products between wavefunctions at neighboring k-points, where these latter form uniform meshes in the Brillouin zone. Often, it is also convenient to provide scalar products between wavefunctions and trial, localized real-space orbitals; these are used to guide the localization procedure towards a desired, physical minimum. As such, the code is not tied to a representation of the wavefunctions in any particular basis: for PWscf and CP a postprocessing utility is in charge of calculating these scalar products using the plane-wave basis set of QUANTUM ESPRESSO and either NC-PPs or US-PPs. Whenever Γ sampling is used, the simplified algorithm of Ref. [159] is adopted.

Besides calculating maximally localized Wannier functions, the code is able to construct the Hamiltonian matrix in this localized basis, providing a chemically accurate, and transferable, tight-binding representation of the electronic structure of the system. This, in turn, can be used to construct Green's functions and self-energies for ballistic transport calculations [160, 161], to determine the electronic structure and density-of-states of very large scale structures [161], to interpolate accurately the electronic band structure (i.e. the Hamiltonian) across the Brillouin zone [161, 162], or to interpolate any other operator [162]. These latter capabilities are especially useful for the calculation of integrals that depend sensitively on a submanifold of states; common examples come from properties that depend sensitively on the Fermi surface, such as electronic conductivity, electron-phonon couplings, Knight shifts, or the anomalous Hall effect. A related by-product of Wannier90 is the capability of downfolding a selected, physically significant manifold of bands into a minimal but accurate basis, to be used for model Hamiltonians that can be treated with complex many-body approaches.

I. PostProc

The PostProc module contains a number of codes for postprocessing and analysis of data files produced by PWscf and CP. The following operations can be performed:

• Interfacing to graphical and molecular graphics applications. Charge and spin density, potentials, ELF [68] and STM images [67] are extracted or calculated and written to files that can be directly read by most common plotting programs, like xcrysden [163] and VMD [164].

• Interfaces to other codes that use DFT results from QUANTUM ESPRESSO for further calculations, such as: pw2wannier90, an interface to the wannier90 library and code [26, 158] (also included in the QUANTUM ESPRESSO distribution); pw2casino.f90, an interface to the casino quantum Monte Carlo code [165]; wannier_ham.f90, a tool to build a tight-binding representation of the KS Hamiltonian to be used by the dmft code [166] (available at the qe-forge site); pw_export.f90, an interface to the GW code SaX [167]; pw2gw.f90, an interface to code DP [168] for dielectric property calculations, and to code EXC [169] for excited-state properties.

• Calculation of various quantities that are useful for the analysis of the results. In addition to the already mentioned ELF and STM, one can calculate projections over atomic states (e.g. Löwdin charges [69]), DOS and Projected DOS (PDOS), planar and spherical averages, and the complex macroscopic dielectric function in the random-phase approximation (RPA).

J. PWgui

PWgui is the graphical user interface (GUI) for the PWscf, PHonon, d3 and atomic codes as well as for some of the main post-processing routines (e.g. pp.x and projwfc.x). PWgui is an input file builder whose main goal is to lower the learning barrier for the newcomer, who has to struggle with the input syntax. Its event-driven mechanism automatically adjusts the display of required input fields (i.e. enables certain sets of widgets and disables others) to the specific cases selected. It enables a preview of the format of the (required) input file records for a given type of calculation. The input files created by PWgui are syntactically correct (although they can still be physically meaningless). It is possible to upload previously generated input files for syntax checking and/or to modify them. It is also possible to run calculations from within PWgui. In addition, PWgui can also use the external xcrysden program [163] for the visualization of molecular and/or crystal structures from the specified input data and for the visualization of properties (e.g. charge densities or STM images).

As the QUANTUM ESPRESSO codes evolve, the input file syntax expands as well. This implies that PWgui has to be continuously adapted. To deal effectively with this issue, PWgui uses the GUIB concept [170]. GUIB builds on the consideration that the input files for numerical simulation codes have a rather simple structure and it exploits this simplicity by defining a special meta-language with two purposes: the first is to define the input-file syntax, and the second is to simultaneously automate the construction of the GUI on the basis of such a definition.

A similar strategy has been recently adopted for the description of the QUANTUM ESPRESSO input file formats. A single definition/description of a given input file serves i) as documentation per se, ii) as PWgui help documentation, and iii) as a utility to synchronize PWgui with up-to-date input file formats.
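The idea of a single definition of the input syntax that drives several tools can be illustrated with a minimal Python sketch. It is not the GUIB meta-language and not PWgui code: the SCHEMA layout and the check and render helpers are hypothetical names introduced here for illustration, and only a small subset of pw.x namelist variables (calculation, prefix, ibrav, nat, ntyp, ecutwfc, ecutrho) and of the allowed values of calculation is listed.

```python
# Hypothetical single definition of a few pw.x namelist variables.
# The layout of SCHEMA is an assumption made for this sketch; it is
# not the GUIB meta-language.
SCHEMA = {
    "control": {
        "calculation": {"type": str, "allowed": {"scf", "nscf", "relax", "md"}},
        "prefix": {"type": str},
    },
    "system": {
        "ibrav": {"type": int},
        "nat": {"type": int},
        "ntyp": {"type": int},
        "ecutwfc": {"type": float},
        "ecutrho": {"type": float},
    },
}


def check(values):
    """Return a list of syntax problems found in `values` (empty if none)."""
    problems = []
    for namelist, variables in values.items():
        if namelist not in SCHEMA:
            problems.append(f"unknown namelist &{namelist}")
            continue
        for name, value in variables.items():
            rule = SCHEMA[namelist].get(name)
            if rule is None:
                problems.append(f"unknown variable {name} in &{namelist}")
            elif not isinstance(value, rule["type"]):
                problems.append(f"{name} should be {rule['type'].__name__}")
            elif "allowed" in rule and value not in rule["allowed"]:
                problems.append(f"{name}={value!r} is not an allowed value")
    return problems


def render(values):
    """Write the namelists in Fortran namelist style, in SCHEMA order."""
    lines = []
    for namelist in SCHEMA:
        if namelist not in values:
            continue
        lines.append(f"&{namelist}")
        for name, value in values[namelist].items():
            text = f"'{value}'" if isinstance(value, str) else str(value)
            lines.append(f"  {name} = {text}")
        lines.append("/")
    return "\n".join(lines)


example = {
    "control": {"calculation": "scf", "prefix": "si"},
    "system": {"ibrav": 2, "nat": 2, "ntyp": 1,
               "ecutwfc": 25.0, "ecutrho": 100.0},
}
print(check(example))
print(render(example))
```

The same dictionary is used both to validate and to write the namelists, which is the point of the single-definition strategy: a syntactically valid file can still be physically meaningless, as noted above. The output is only the namelist part of an input file; the cards that a real pw.x run also needs (atomic species, positions, k-points) are left out of the sketch.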

FIG. 3: Snapshot of the PWgui application. Left: PWgui's main window; right: preview of specified input data in text mode.

VI. PERSPECTIVES AND OUTLOOK

Further developments and extensions of QUANTUM ESPRESSO will be driven by the needs of the community using it and working on it. Many of the soon-to-come additions will deal with excited-state calculations within time-dependent DFT (TDDFT [171, 172]) and/or many-body perturbation theory [173]. A new approach to the calculation of optical spectra within TDDFT has been recently developed [174], based on a finite-frequency generalization of density-functional perturbation theory [54, 55], and implemented in QUANTUM ESPRESSO. Another important development presently under way is an efficient implementation of GW calculations for large systems (whose size is of the order of a few hundreds inequivalent atoms) [175]. The implementation of efficient algorithms for calculating correlation energies at the RPA level is also presently under way [176–178]. It is foreseen that by the time this paper appears, many of these developments will be publicly released.

It is hoped that many new functionalities will be made available to QUANTUM ESPRESSO users by external groups who will make their own software compatible/interfaceable with QUANTUM ESPRESSO. At the time of the writing of the present paper, third-party scientific software available to the QUANTUM ESPRESSO users' community includes: yambo, a general-purpose code for excited-state calculations within many-body perturbation theory [179]; casino, a code for electronic-structure quantum Monte Carlo simulations [165]; want, a code for the simulation of ballistic transport in nanostructures, based on Wannier functions [180]; xcrysden, a molecular graphics application, especially suited for periodic structures [163]. The qe-forge portal is expected to boost the production and availability of third-party software compatible with QUANTUM ESPRESSO. Among the projects already available, or soon-to-be available, on qe-forge, we mention: SaX [167], an open-source project implementing state-of-the-art many-body perturbation theory methods for excited states; dmft [166], a code to perform Dynamical Mean-Field Theory calculations on top of a tight-binding representation of the DFT band structure; qha, a set of codes for calculating thermal properties of materials within the quasi-harmonic approximation [181]; pwtk, a fully functional Tcl scripting interface to PWscf [182].

Efforts towards better interoperability with third-party software will be geared towards releasing accurate specifications for data structures and data file formats and providing interfaces to and from other codes and packages used by the scientific community. Further work will also be devoted to the extension to the US-PPs and PAW schemes of the parts of QUANTUM ESPRESSO that are now limited to NC-PPs.

The increasing availability of massively parallel machines will likely lead to an increased interest towards large-scale calculations. The ongoing effort in this field will continue. Special attention will be paid to the requirements imposed by the architecture of the new machines, in particular multicore CPUs, for which a mixed OpenMP-MPI approach seems to be the only viable solution yielding maximum performances. Grid computing and the commoditization of computer clusters will also lead to great improvements in high-throughput calculations for materials design and discovery.

VII. ACKNOWLEDGMENTS

The QUANTUM ESPRESSO project is an initiative of the CNR-INFM DEMOCRITOS National Simulation Center in Trieste (Italy) and its partners, in collaboration with MIT, Princeton University, the University of Minnesota, the Ecole

Polytechnique Fédérale de Lausanne, the Université Pierre et Let us compare the DFT energy calculated in the standard Marie Curie in Paris, the Jožef Stefan Institute in Ljubljana, way: and the S3 research center in Modena. Many of the ideas 2 embodied in the QUANTUM ESPRESSO codes have ﬂour- X Z ¯h E = ψ∗(r) − ∇2 + V (r) ψ (r)dr ished in the very stimulating environment of the International i 2m ext i School of Advanced Studies, SISSA, where Democritos is i (out) hosted, and have beneﬁted from the ingenuity of generations +EHxc[ρ ], (4) of students and young postdocs. SISSA and the National Su- percomputing Center CINECA are currently providing valu- where EHxc is the Hartree and exchange-correlation en- able support to the QUANTUM ESPRESSO project. ergy, with the Harris-Weinert-Foulkes functional form, which doesn’t use ρ(out):

E0 = VIII. APPENDIX X Z ¯h2 ψ∗(r) − ∇2 + V (r) + V (in)(r) ψ (r)dr i 2m ext i This appendix contains the description of some algorithms i used in QUANTUM ESPRESSO that have not been docu- Z − ρ(in)V (in)(r) + E [ρ(in)] (5) mented elsewhere. Hxc

A. Self-consistency

The problem of finding a self-consistent solution to the Kohn-Sham equations can be recast into the solution of a nonlinear problem

$$x = F[x], \qquad x = (x_1, x_2, \ldots, x_N), \tag{1}$$

where vector $x$ contains the $N$ Fourier components or real-space values of the charge density $\rho$ or the Kohn-Sham potential $V$ (the sum of Hartree and exchange-correlation potentials); $F[x^{(in)}]$ is a functional of the input charge density or potential $x^{(in)}$, yielding the output vector $x^{(out)}$ via the solution of Kohn-Sham equations. A solution can be found via an iterative procedure. PWscf uses an algorithm based on the modified Broyden method [92] in which $x$ contains the components of the charge density in reciprocal space. Mixing algorithms typically find the optimal linear combination of a few $x^{(in)}$ from previous iterations that minimizes some suitably defined norm $||x^{(out)} - x^{(in)}||$, vanishing at convergence, which we will call in the following "scf norm".

Ideally, the scf norm is a measure of the self-consistency error on the total energy. Let us write an estimate of the latter for the simplest case: an insulator with NC PPs and simple LDA or GGA. At a given iteration we have

$$\left[ -\frac{\hbar^2}{2m}\nabla^2 + V_{ext}(\mathbf{r}) + V^{(in)}(\mathbf{r}) \right] \psi_i(\mathbf{r}) = \epsilon_i \psi_i(\mathbf{r}), \tag{2}$$

where $\epsilon_i$ and $\psi_i$ are Kohn-Sham energies and orbitals respectively, $i$ labels the occupied states, $V_{ext}$ is the sum of the PPs of atomic cores (written for simplicity as a local potential), and the input Hartree and exchange-correlation potential $V^{(in)}(\mathbf{r}) = V_{Hxc}[\rho^{(in)}(\mathbf{r})]$ is a functional of the input charge density $\rho^{(in)}$. The output charge density is given by

$$\rho^{(out)}(\mathbf{r}) = \sum_i |\psi_i(\mathbf{r})|^2. \tag{3}$$

Both forms of the energy, $E$ and $E'$, are variational, i.e. the first-order variation of the energy with respect to the charge density vanishes, and both converge to the same result when self-consistency is achieved. Their difference can be approximated by the following expression, in which only the dominant Hartree term is considered:

$$E - E' \simeq \frac{1}{2} \int \frac{\Delta\rho(\mathbf{r})\,\Delta\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\, d\mathbf{r}\, d\mathbf{r}' = \frac{1}{2} \int \Delta\rho(\mathbf{r})\,\Delta V_H(\mathbf{r})\, d\mathbf{r} \tag{6}$$

where $\Delta\rho = \rho^{(out)} - \rho^{(in)}$ and $\Delta V_H$ is the Hartree potential generated by $\Delta\rho$. Moreover it can be shown that, when exchange and correlation contributions to the electronic screening do not dominate over the electrostatic ones, this quantity is an upper bound to the self-consistent error incurred when using the standard form for the DFT energy. We therefore take this term, which can be trivially calculated in reciprocal space, as our squared scf norm:

$$||\rho^{(out)} - \rho^{(in)}||^2 = \frac{4\pi e^2}{\Omega} \sum_{\mathbf{G}} \frac{|\Delta\rho(\mathbf{G})|^2}{G^2}, \tag{7}$$

where $\mathbf{G}$ are the vectors in reciprocal space and $\Omega$ is the volume of the unit cell.

Once the optimal linear combination of $\rho^{(in)}$ from previous iterations (typically 4 to 8) is determined, one adds a step in the new search direction that is, in the simplest case, a fraction of the optimal $\Delta\rho$ or, taking advantage of some approximate electronic screening [183], a preconditioned $\Delta\rho$. In particular, the simple, Thomas-Fermi, and local Thomas-Fermi mixing described in Ref. [183] are implemented and used.

The above algorithm has been extended to more sophisticated calculations, in which the $x$ vector introduced above may contain additional quantities: for DFT+U, occupancies of atomic correlated states; for meta-GGA, kinetic energy density; for PAW, the quantities $\sum_i \langle\psi_i|\beta_n\rangle\langle\beta_m|\psi_i\rangle$, where the $\beta$ functions are the atomic-based projectors appearing in the PAW formalism. The scf norm is modified accordingly so as to include the additional variables in the estimated self-consistency error.
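As an illustration of Eq. (7), the squared scf norm can be evaluated with a single FFT of the density difference. The following is a minimal NumPy sketch, not the PWscf implementation. The function name `scf_norm_squared`, the restriction to an orthorhombic cell, the normalization of the Fourier components (taken here as the grid average of $\Delta\rho(\mathbf{r})e^{-i\mathbf{G}\cdot\mathbf{r}}$), the default $e^2 = 1$, and the omission of the $\mathbf{G} = 0$ term are all assumptions made for this example.

```python
import numpy as np

def scf_norm_squared(rho_in, rho_out, cell_lengths, e2=1.0):
    """Squared scf norm in the spirit of Eq. (7).

    rho_in, rho_out : real 3D arrays on a regular grid spanning the cell.
    cell_lengths    : (Lx, Ly, Lz) of an orthorhombic cell (assumption).
    e2              : value of e^2 in the chosen units (assumption: 1).
    """
    drho = np.asarray(rho_out, dtype=float) - np.asarray(rho_in, dtype=float)
    omega = float(np.prod(cell_lengths))
    # Assumed convention: drho(G) = (1/N) sum_r drho(r) exp(-i G.r)
    drho_g = np.fft.fftn(drho) / drho.size
    # Reciprocal-lattice vector components along each axis: 2*pi*k/L
    axes = [2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
            for n, length in zip(drho.shape, cell_lengths)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    g2 = gx**2 + gy**2 + gz**2
    mask = g2 > 0.0  # skip G = 0, where 1/G^2 is undefined
    return 4.0 * np.pi * e2 / omega * np.sum(np.abs(drho_g[mask])**2 / g2[mask])
```

For a density difference $\Delta\rho = A\cos(2\pi x/L)$ in a cubic cell of side $L$, only the two components $G = \pm 2\pi/L$ contribute, and under the normalization assumed above the sketch returns $A^2/(2\pi L)$.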

B. Iterative diagonalization

During self-consistency one has to solve the generalized eigenvalue problem for all $N$ occupied states

$$H\psi_i = \epsilon_i S\psi_i, \qquad i = 1, \ldots, N \tag{8}$$

in which both $H$ (the Hamiltonian) and $S$ (the overlap matrix) are available as operators (i.e. $H\psi$ and $S\psi$ products can be calculated for a generic state $\psi$). Eigenvectors are normalized according to the generalized orthonormality constraints $\langle\psi_i|S|\psi_j\rangle = \delta_{ij}$. This problem is solved using iterative methods. Currently PWscf implements a block Davidson algorithm and an alternative algorithm based on band-by-band minimization using conjugate gradient.

1. Davidson

One starts from an initial set of orthonormalized trial orbitals $\psi_i^{(0)}$ and of trial eigenvalues $\epsilon_i^{(0)} = \langle\psi_i^{(0)}|H|\psi_i^{(0)}\rangle$. The starting set is typically obtained from the previous scf iteration, if available, and if not, from the previous time step, or optimization step, or from a superposition of atomic orbitals. We introduce the residual vectors

$$g_i^{(0)} = (H - \epsilon_i^{(0)} S)\psi_i^{(0)}, \tag{9}$$

a measure of the error on the trial solution, and the correction vectors $\delta\psi_i^{(0)} = D g_i^{(0)}$, where $D$ is a suitable approximation to $(H - \epsilon_i^{(0)} S)^{-1}$. The eigenvalue problem is then solved in the $2N$-dimensional subspace spanned by the reduced basis set $\phi^{(0)}$, formed by $\phi_i^{(0)} = \psi_i^{(0)}$ and $\phi_{i+N}^{(0)} = \delta\psi_i^{(0)}$:

$$\sum_{k=1}^{2N} (H_{jk} - \epsilon_i S_{jk})\, c_k^{(i)} = 0, \tag{10}$$

where

$$H_{jk} = \langle\phi_j^{(0)}|H|\phi_k^{(0)}\rangle, \qquad S_{jk} = \langle\phi_j^{(0)}|S|\phi_k^{(0)}\rangle. \tag{11}$$

Conventional algorithms for matrix diagonalization are used in this step. A new set of trial eigenvectors and eigenvalues is obtained:

$$\psi_i^{(1)} = \sum_{j=1}^{2N} c_j^{(i)} \phi_j^{(0)}, \qquad \epsilon_i^{(1)} = \langle\psi_i^{(1)}|H|\psi_i^{(1)}\rangle \tag{12}$$

and the procedure is iterated until a satisfactory convergence is achieved. Alternatively, one may enlarge the reduced basis set with the new correction vectors $\delta\psi_i^{(1)} = D g_i^{(1)}$, solve a $3N$-dimensional problem, and so on, until a prefixed size of the reduced basis set is reached. The latter approach is typically slightly faster at the expense of a larger memory usage.

The operator $D$ must be easy to estimate. A natural choice in the PW basis set is a diagonal matrix, obtained keeping only the diagonal term of the Hamiltonian:

$$\langle\mathbf{k}+\mathbf{G}|D|\mathbf{k}+\mathbf{G}'\rangle = \frac{\delta_{\mathbf{G}\mathbf{G}'}}{\langle\mathbf{k}+\mathbf{G}|H - \epsilon S|\mathbf{k}+\mathbf{G}\rangle} \tag{13}$$

where $\mathbf{k}$ is the Bloch vector of the electronic states under consideration, $|\mathbf{k}+\mathbf{G}\rangle$ denotes PWs, and $\epsilon$ is an estimate of the highest occupied eigenvalue. Since the Hamiltonian is a diagonally dominant operator and the kinetic energy of PWs is the dominant part at high $G$, this simple form is very effective.

2. Conjugate-Gradient

The eigenvalue problem of Eq. (8) can be recast into a sequence of constrained minimization problems:

$$\min\left[ \langle\psi_i|H|\psi_i\rangle - \sum_{j\le i} \lambda_j \left( \langle\psi_i|S|\psi_j\rangle - \delta_{ij} \right) \right], \tag{14}$$

where the $\lambda_j$ are Lagrange multipliers. This can be solved using a preconditioned conjugate gradient algorithm with minor modifications to ensure constraint enforcement. The algorithm described here was inspired by the conjugate-gradient algorithm of Ref. [184], and is similar to one of the variants described in Ref. [185].

Let us assume that eigenvectors $\psi_j$ up to $j = i-1$ have already been calculated. We start from an initial guess $\psi^{(0)}$ for the $i$-th eigenvector, such that $\langle\psi^{(0)}|S|\psi^{(0)}\rangle = 1$ and $\langle\psi^{(0)}|S|\psi_j\rangle = 0$. We introduce a diagonal precondition matrix $P$ and auxiliary functions $y = P^{-1}\psi$ and solve the equivalent problem

$$\min\left[ \langle y|\tilde{H}|y\rangle - \lambda\left( \langle y|\tilde{S}|y\rangle - 1 \right) \right], \tag{15}$$

where $\tilde{H} = PHP$, $\tilde{S} = PSP$, under the additional orthonormality constraints $\langle y|PS|\psi_j\rangle = 0$. The starting gradient of Eq. (15) is given by

$$g^{(0)} = (\tilde{H} - \lambda\tilde{S})\, y^{(0)}. \tag{16}$$

By imposing that the gradient is orthogonal to the starting vector, $\langle g^{(0)}|\tilde{S}|y^{(0)}\rangle = 0$, one determines the value of the Lagrange multiplier:

$$\lambda = \frac{\langle y^{(0)}|\tilde{S}\tilde{H}|y^{(0)}\rangle}{\langle y^{(0)}|\tilde{S}^2|y^{(0)}\rangle}. \tag{17}$$

The remaining orthonormality constraints are imposed on $P g^{(0)}$ by explicit orthonormalization (e.g. Gram-Schmidt) to the $\psi_j$. We introduce the conjugate gradient $h^{(0)}$, which for the first step is set equal to $g^{(0)}$ (after orthonormalization), and the normalized direction $n^{(0)} = h^{(0)}/\langle h^{(0)}|\tilde{S}|h^{(0)}\rangle^{1/2}$. We search for the minimum of $\langle y^{(1)}|\tilde{H}|y^{(1)}\rangle$ along the direction $y^{(1)}$, defined as [184]:

$$y^{(1)} = y^{(0)}\cos\theta + n^{(0)}\sin\theta. \tag{18}$$

This form ensures that the constraint on the norm is correctly enforced. The calculation of the minimum can be analytically performed and yields

$$\theta = \frac{1}{2}\,\mathrm{atan}\left( \frac{a^{(0)}}{\epsilon^{(0)} - b^{(0)}} \right), \tag{19}$$

where $a^{(0)} = 2$
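The Davidson iteration of Eqs. (9)-(12) can be sketched in a few lines of NumPy. This is an illustrative toy, not the PWscf block Davidson routine. It assumes that $S$ is the identity, uses a small dense real symmetric matrix in place of the Hamiltonian operator, restarts from the $2N$-dimensional subspace at every step, and orthonormalizes the reduced basis by QR so that the reduced overlap matrix of Eq. (11) becomes the identity. The names `davidson_step` and `davidson`, the tolerance, and the 1e-3 guard on small denominators are hypothetical choices.

```python
import numpy as np

def davidson_step(H, psi):
    """One restarted Davidson iteration (Eqs. 9-12), assuming S = identity.

    H   : (n, n) real symmetric matrix standing in for the Hamiltonian.
    psi : (n, N) orthonormal trial orbitals stored as columns.
    Returns new eigenvalue estimates, new orbitals, and the largest
    residual norm of the *input* orbitals.
    """
    n_bands = psi.shape[1]
    eps = np.einsum("gi,gi->i", psi, H @ psi)       # trial eigenvalues
    g = H @ psi - psi * eps                         # residuals, Eq. (9)
    # Diagonal preconditioner modelled on Eq. (13), with the highest
    # trial eigenvalue as the reference energy
    denom = np.diag(H)[:, None] - eps.max()
    denom = np.where(np.abs(denom) < 1e-3, 1e-3, denom)
    dpsi = g / denom                                # correction vectors
    # Reduced basis {psi, dpsi}, orthonormalized by QR
    phi, _ = np.linalg.qr(np.hstack([psi, dpsi]))
    h_red = phi.T @ H @ phi                         # reduced H, Eq. (11)
    eps_new, c = np.linalg.eigh(h_red)              # reduced problem, Eq. (10)
    psi_new = phi @ c[:, :n_bands]                  # new orbitals, Eq. (12)
    return eps_new[:n_bands], psi_new, np.linalg.norm(g, axis=0).max()

def davidson(H, psi0, tol=1e-10, max_iter=100):
    """Iterate davidson_step until the residual norm drops below tol."""
    psi = psi0
    for _ in range(max_iter):
        eps, psi, resid = davidson_step(H, psi)
        if resid < tol:
            break
    return eps, psi
```

Enlarging the basis to $3N$, $4N$, ... vectors before restarting, as described in the text, would trade memory for fewer iterations; the restarted variant is used here only to keep the sketch short.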

C. Wavefunction extrapolation

In molecular dynamics runs and in structural relaxations, extrapolations are employed to generate good initial guesses for the wavefunctions at time $t + dt$ from wavefunctions at previous time steps. The extrapolation algorithms used are similar to those described in Ref. [184]. The alignment procedure, needed when wavefunctions are the results of a self-consistent calculation, is as follows. The overlap matrix $O_{ij}$ between wavefunctions at consecutive time steps:

$$O_{ij} = \langle\psi_i(t+dt)|S(t+dt)|\psi_j(t)\rangle, \tag{23}$$

can be used to generate the unitary transformation $U$ [186] that aligns $\psi(t+dt)$ to $\psi(t)$: $\psi_i^{\parallel}(t+dt) = \sum_j U_{ij}\psi_j(t+dt)$. Since $O$ is not unitary, it needs to be made unitary via e.g. the unitarization procedure

$$U = (O^\dagger O)^{-1/2} O^\dagger. \tag{24}$$

The operation above is performed using a singular value decomposition: let the overlap matrix be $O = vDw$, where $v$ and $w$ are unitary matrices and $D$ is a diagonal non-negative definite matrix, whose eigenvalues are close to 1 if the two sets of wavefunctions are very similar. The needed unitary transformation is then simply given by $U \simeq w^\dagger v^\dagger$. This procedure is simpler than the original proposal and prevents the alignment algorithm from breaking in the occasional situation where, due to level crossing in the band structure between subsequent time steps, one or more of the eigenvalues of the $D$ matrix vanish.

D. Symmetry

Symmetry is exploited almost everywhere, with the notable exception of CP. The latter is devised to study aperiodic systems or large supercells where symmetry is either absent or of little use even if present.

The unsymmetrized charge density, labeled by the superscript (ns), is computed as a weighted sum over the k-points of the irreducible Brillouin zone (IBZ):

$$\rho^{(ns)}(\mathbf{r}) = \sum_i \sum_{\mathbf{k}\in IBZ} w_{\mathbf{k}} |\psi_{i,\mathbf{k}}(\mathbf{r})|^2. \tag{25}$$

The factors $w_{\mathbf{k}}$ ("weights") are proportional to the number of vectors in the star (i.e. inequivalent k vectors among all the $\{R\mathbf{k}\}$ vectors generated by the point-group rotations) and are normalized to 1: $\sum_{\mathbf{k}\in IBZ} w_{\mathbf{k}} = 1$. Weights can either be calculated or deduced from the literature on the special-point technique [93, 94]. The charge density is then symmetrized as:

$$\rho(\mathbf{r}) = \frac{1}{N_s} \sum_{\hat{S}} \hat{S}\rho^{(ns)}(\mathbf{r}) = \frac{1}{N_s} \sum_{\hat{S}} \rho^{(ns)}(R^{-1}\mathbf{r} - \mathbf{f}) \tag{26}$$

where the sum runs over all $N_s$ symmetry operations.

The symmetrization technique can be extended to all quantities that are expressed as sums over the BZ. Hellmann-Feynman forces $\mathbf{F}_s$ on atom $s$ are thus calculated as follows:

$$\mathbf{F}_s = \frac{1}{N_s} \sum_{\hat{S}} \hat{S}\mathbf{F}_s^{(ns)} = \frac{1}{N_s} \sum_{\hat{S}} R\,\mathbf{F}^{(ns)}_{\hat{S}^{-1}(s)}, \tag{27}$$

where $\hat{S}^{-1}(s)$ labels the atom into which the $s$-th atom transforms (modulo a lattice translation vector) after application of $\hat{S}^{-1}$, the symmetry operation inverse of $\hat{S}$. In a similar way one determines the symmetrized stress, using the rule for matrix transformation under a rotation:

$$\sigma_{\alpha\beta} = \frac{1}{N_s} \sum_{\hat{S}} \sum_{\gamma,\delta=1}^{3} R_{\alpha\gamma} R_{\beta\delta}\, \sigma^{(ns)}_{\gamma\delta}. \tag{28}$$

The PHonon package supplements the above technique with a further strategy. Given the phonon wave-vector $\mathbf{q}$, the small group of $\mathbf{q}$ (the subgroup $\hat{S}_{\mathbf{q}}$ of crystal symmetry operations that leave $\mathbf{q}$ invariant) is identified and the reducible representation defined by the $3N_{at}$ atomic displacements along the cartesian axes is decomposed into $n_{irr}$ irreducible representations (irreps) $\gamma_j^{(\mathbf{q})}$, $j = 1, \ldots, n_{irr}$. The dimensions of the irreducible representations are small, with $\nu_j \le 3$ in most cases, up to 6 in some special cases (zone-boundary wave-vectors $\mathbf{q}$ in nonsymmorphic groups). Each irrep, $j$, is therefore defined by a set of $\nu_j$ linear combinations of atomic displacements that transform into each other under the symmetry operations of the small group of $\mathbf{q}$. In the self-consistent solution of the linear response equations, only perturbations associated with a given irrep need to be treated together and different irreps can be solved independently. This feature is exploited to reduce the amount of memory required by the calculation and is suitable for coarse-grained parallelization and for execution on a GRID infrastructure [187].

The wavefunction response, $\Delta\psi^{(j,\alpha)}_{\mathbf{k}+\mathbf{q},i}(\mathbf{r})$, to displacements along irrep $j$, $\gamma^{(\mathbf{q})}_{j,\alpha}$ (where $\alpha = 1, \ldots, \nu_j$ labels different partners of the given irrep), is then calculated. The lattice-periodic unsymmetrized charge response, $\Delta\rho^{(ns)}_{\mathbf{q},j,\alpha}(\mathbf{r})$, has the form:

$$\Delta\rho^{(ns)}_{\mathbf{q},j,\alpha}(\mathbf{r}) = e^{-i\mathbf{q}\cdot\mathbf{r}}\, 4 \sum_i \sum_{\mathbf{k}\in IBZ(\mathbf{q})} w_{\mathbf{k}}\, \psi^*_{\mathbf{k},i}(\mathbf{r})\, \Delta\psi^{(j,\alpha)}_{\mathbf{k}+\mathbf{q},i}(\mathbf{r}), \tag{29}$$

where the notation $IBZ(\mathbf{q})$ indicates the IBZ calculated assuming the small group of $\mathbf{q}$ as symmetry group, and the weights $w_{\mathbf{k}}$ are calculated accordingly. The symmetrized charge response is calculated as

$$\Delta\rho_{\mathbf{q},j,\alpha}(\mathbf{r}) = \frac{1}{N_s(\mathbf{q})} \sum_{\hat{S}_{\mathbf{q}}} e^{-i\mathbf{q}\cdot\mathbf{f}} \sum_{\beta=1}^{\nu_j} D(\hat{S}_{\mathbf{q}})_{\beta\alpha}\, \Delta\rho^{(ns)}_{\mathbf{q},j,\beta}(R^{-1}\mathbf{r} - \mathbf{f}) \tag{30}$$

where $D(\hat{S}_{\mathbf{q}})$ is the matrix representation of the action of the symmetry operation $\hat{S}_{\mathbf{q}} \equiv \{R|\mathbf{f}\}$ for the $j$-th irrep $\gamma_j^{(\mathbf{q})}$. At the end of the self-consistent procedure, the force constant matrix $C_{s\alpha,t\beta}(\mathbf{q})$ (where $s, t$ label atoms, $\alpha, \beta$ cartesian coordinates) is calculated. Force constants at all vectors in the star of $\mathbf{q}$ are then obtained using symmetry:

$$C_{s\alpha,t\beta}(R\mathbf{q}) = \sum_{\gamma,\delta} R_{\alpha\delta} R_{\beta\gamma}\, C_{\hat{S}^{-1}(s)\delta,\,\hat{S}^{-1}(t)\gamma}(\mathbf{q}), \tag{31}$$

where $\hat{S} \equiv \{R|\mathbf{f}\}$ is a symmetry operation of the crystal group but not of the small group of $\mathbf{q}$.

E. Fock exchange

Hybrid functionals are characterized by the inclusion of a fraction of exact (i.e. non-local) Fock exchange in the definition of the exchange-correlation functional. For a periodic system, the Fock exchange energy per unit cell is given by:

$$E_x = -\frac{e^2}{N} \sum_{\mathbf{k}v} \sum_{\mathbf{k}'v'} \int \frac{\psi^*_{\mathbf{k}v}(\mathbf{r})\,\psi_{\mathbf{k}'v'}(\mathbf{r})\,\psi^*_{\mathbf{k}'v'}(\mathbf{r}')\,\psi_{\mathbf{k}v}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\, d\mathbf{r}\, d\mathbf{r}', \tag{32}$$

where an insulating and non-magnetic system is assumed for simplicity. Integrals and wave-function normalizations are defined over the whole crystal volume, $V = N\Omega$ ($\Omega$ being the unit cell volume), and the summations run over all occupied bands and all $N$ k-points defined in the BZ by Born-von Kármán boundary conditions. The calculation of this term is performed exploiting the dual-space formalism: auxiliary codensities, $\rho^{\mathbf{k}',v'}_{\mathbf{k},v}(\mathbf{r}) = \psi^*_{\mathbf{k}',v'}(\mathbf{r})\,\psi_{\mathbf{k},v}(\mathbf{r})$, are computed in real space and transformed to reciprocal space by FFT, where the associated electrostatic energies are accumulated. The application of the Fock exchange operator to a wavefunction involves additional FFTs and real-space array multiplications. These basic operations need to be repeated for all the occupied bands and all the points in the BZ grid. For this reason the computational cost of the exact exchange calculation is very high, at least an order of magnitude larger than for non-hybrid functional calculations.

In order to limit the computational cost, an auxiliary grid of q-points in the BZ, centered at the $\Gamma$ point, can be introduced and the summation over $\mathbf{k}'$ be limited to the subset $\mathbf{k}' = \mathbf{k} + \mathbf{q}$. Of course convergence with respect to this additional parameter needs to be checked, but often a grid coarser than the one used for computing densities and potentials is sufficient.

The direct evaluation of the Fock energy on regular grids in the BZ is however problematic due to an integrable divergence that appears in the $\mathbf{q} \to 0$ limit. This problem is addressed by resorting to a procedure, first proposed by Gygi and Baldereschi [188], where an integrable term that displays the same divergence is subtracted from the expression for the exchange energy and its analytic integral over the BZ is separately added back to it. Some care must still be taken [176] in order to estimate the contribution of the $\mathbf{q} = 0$ term in the sum, which contains a 0/0 limit that cannot be calculated from information at $\mathbf{q} = 0$ only. This term is estimated [176] assuming that the grid of q-points used for evaluating the exchange integrals is dense enough that a coarser grid, including only every second point in each direction, would also be equally accurate. Since the limiting term contributes to the integral with different weights in the two grids, one can extract its value from the condition that the two integrals give the same result. This procedure removes an error proportional to the inverse of the unit cell volume $\Omega$ that would otherwise appear if this term were simply neglected.
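The wavefunction alignment step of Eqs. (23)-(24) reduces to one singular value decomposition. The following is a minimal NumPy sketch under simplifying assumptions: real orbitals stored as columns and $S$ equal to the identity. The function name `align_orbitals` is hypothetical.

```python
import numpy as np

def align_orbitals(psi_new, psi_old):
    """Rotate the orbitals at t+dt so that they best match those at t.

    psi_new, psi_old : (n, N) real arrays, one orthonormal orbital per column.
    Returns the aligned orbitals and the singular values of the overlap.
    """
    overlap = psi_new.T @ psi_old        # O_ij = <psi_i(t+dt)|psi_j(t)>, Eq. (23)
    v, d, w = np.linalg.svd(overlap)     # O = v @ diag(d) @ w
    u = w.T @ v.T                        # U = w^dagger v^dagger (real case)
    # psi_par_i = sum_j U_ij psi_j(t+dt); with orbitals as columns this
    # is a right-multiplication by U^T
    return psi_new @ u.T, d
```

If the new orbitals are an exact rotation of the old ones, all singular values equal 1 and the aligned set coincides with the old one. Singular values well below 1 would signal the level crossings mentioned in the text; the SVD form still returns a unitary $U$ in that case.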

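The stress symmetrization rule of Eq. (28) is an average of $R\,\sigma^{(ns)}R^T$ over the symmetry operations. The following is a short NumPy sketch, assuming rotation matrices given in cartesian coordinates. The function names and the example group (the four proper rotations about the z axis) are illustrative choices, not taken from the code base.

```python
import numpy as np

def symmetrize_stress(sigma_ns, rotations):
    """Average R sigma R^T over the given rotations, as in Eq. (28)."""
    rots = np.asarray(rotations, dtype=float)   # shape (Ns, 3, 3)
    sigma_ns = np.asarray(sigma_ns, dtype=float)
    # sigma_ab = (1/Ns) sum_S sum_{g,d} R_ag R_bd sigma_ns_gd
    return np.einsum("sag,sbd,gd->ab", rots, rots, sigma_ns) / len(rots)

def c4_rotations_about_z():
    """Example group: rotations by 0, 90, 180 and 270 degrees about z."""
    r90 = np.array([[0.0, -1.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0]])
    return [np.linalg.matrix_power(r90, k) for k in range(4)]
```

For this example group the symmetrized tensor has equal xx and yy components and vanishing off-diagonal terms, as expected for a four-fold axis along z.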
[1] N. Marzari, MRS Bulletin 31, 681 (2006).
[2] R. G. Parr and W. Yang, Density Functional Theory of Atoms and Molecules (Oxford University Press, 1989).
[3] R. M. Dreizler and E. K. U. Gross, Density Functional Theory (Springer-Verlag, 1990).
[4] R. Martin, Electronic Structure: Basic Theory and Practical Methods (Cambridge University Press, Cambridge, UK, 2004).

[5] E. J. Baerends, J. Autschbach, A. Bérces, F. Bickelhaupt, C. Bo, P. M. Boerrigter, L. Cavallo, D. P. Chong, L. Deng, R. M. Dickson, et al., ADF2008.01, SCM, Theoretical Chemistry, Vrije Universiteit, Amsterdam, The Netherlands, URL http://www.scm.com.
[6] R. Dovesi, B. Civalleri, R. Orlando, C. Roetti, and V. R. Saunders, Ab initio quantum simulation in solid state chemistry, in Reviews in Computational Chemistry, Ch. 1, Vol. 21, K. B. Lipkowitz, R. Larter, T. R. Cundari editors, John Wiley and Sons, New York (2005), URL http://www.crystal.unito.it.
[7] G. Karlström, R. Lindh, P.-A. Malmqvist, B. O. Roos, U. Ryde, V. Veryazov, P.-O. Widmark, M. Cossi, B. Schimmelpfennig, P. Neogrady, et al., Comp. Mater. Sci. 28, 222 (2003), URL http://www.teokem.lu.se/molcas.
[8] R. Ahlrichs, F. Furche, C. Hättig, W. M. Klopper, M. Sierka, and F. Weigend, TURBOMOLE, URL http://www.turbomole-gmbh.com.
[9] M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, J. A. Montgomery, Jr., T. Vreven, K. N. Kudin, J. C. Burant, et al., Gaussian 03, Revision C.02, Gaussian, Inc., Wallingford, CT, 2004, URL http://www.gaussian.com.
[10] Materials Studio, URL http://accelrys.com/products/materials-studio.
[11] Jaguar: rapid ab initio electronic structure package, Schrödinger, LLC., URL http://www.schrodinger.com/.
[12] Spartan, Wavefunction, Inc., URL http://www.wavefun.com/products/spartan.html.
[13] G. Kresse and J. Furthmuller, Comp. Mater. Sci. 6, 15 (1996), URL http://cmp.univie.ac.at/vasp.
[14] URL http://www.castep.org.
[15] The CPMD consortium page, coordinated by M. Parrinello and W. Andreoni, Copyright IBM Corp 1990-2008, Copyright MPI für Festkörperforschung Stuttgart 1997-2001, URL http://www.cpmd.org.
[16] E. J. Bylaska, W. A. de Jong, N. Govind, K. Kowalski, T. P. Straatsma, M. Valiev, D. Wang, E. Apra, T. L. Windus, J. Hammond, et al., Nwchem, a computational chemistry package for parallel computers, version 5.1, Pacific Northwest National Laboratory, Richland, Washington 99352-0999, USA, URL http://www.emsl.pnl.gov/docs/nwchem/nwchem.html.
[17] R. A. Kendall, E. Apra, D. E. Bernholdt, E. J. Bylaska, M. Dupuis, G. I. Fann, R. J. Harrison, J. Ju, J. A. Nichols, J. Nieplocha, et al., Comput. Phys. Commun. 128, 260 (2000).
[18] M. W. Schmidt, K. K. Baldridge, J. A. Boatz, S. T. Elbert, M. S. Gordon, J. H. Jensen, S. Koseki, N. Matsunaga, K. A. Nguyen, S. J. Su, et al., J. Comput. Chem. 14, 1347 (1993).
[19] M. S. Gordon and M. W. Schmidt, Advances in electronic structure theory: Gamess a decade later, in Theory and Applications of Computational Chemistry, the first forty years, C. E. Dykstra, G. Frenking, K. S. Kim, G. E. Scuseria editors, ch. 41, pp 1167-1189, Elsevier, Amsterdam, 2005, URL http://www.msg.chem.iastate.edu/GAMESS/GAMESS.html.
[20] URL http://www.gnu.org/licenses/.
[21] X. Gonze, J.-M. Beuken, R. Caracas, F. Detraux, M. Fuchs, G.-M. Rignanese, L. Sindic, M. Verstraete, G. Zerah, F. Jollet, et al., Comp. Mater. Sci. 25, 478 (2002), URL http://www.abinit.org.
[22] J. VandeVondele, M. Krack, F. Mohamed, M. Parrinello, T. Chassaing, and J. Hutter, Comput. Phys. Commun. 167, 103 (2005), URL http://cp2k.berlios.de.
[23] S. R. Bahn and K. W. Jacobsen, Comput. Sci. Eng. 4, 56 (2002), URL http://www.fysik.dtu.dk/campos.
[24] J. J. Mortensen, L. B. Hansen, and K. W. Jacobsen, Phys. Rev. B 71 (2005), URL https://wiki.fysik.dtu.dk/gpaw.
[25] T. D. Crawford, C. D. Sherrill, E. F. Valeev, J. T. Fermann, R. A. King, M. L. Leininger, S. T. Brown, C. L. Janssen, E. T. Seidl, J. P. Kenny, et al., J. Comput. Chem. 28, 1610 (2007), URL http://www.psicode.org.
[26] A. A. Mostofi, J. R. Yates, Y.-S. Lee, I. Souza, D. Vanderbilt, and N. Marzari, Comput. Phys. Commun. 178, 685 (2008).
[27] URL http://qe-forge.org.
[28] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, et al., LAPACK Users' Guide (Society for Industrial and Applied Mathematics, Philadelphia, PA, 1999), 3rd ed., ISBN 0-89871-447-8 (paperback).
[29] M. Frigo and S. G. Johnson, Proceedings of the IEEE 93, 216 (2005), special issue on "Program Generation, Optimization, and Platform Adaptation".
[30] Message Passing Interface Forum, Int. J. Supercomputer Applications 8 (3/4) (1994).
[31] URL http://en.wikipedia.org/wiki/SourceForge.
[32] W. E. Pickett, Comput. Phys. Rep. 9, 115 (1989).
[33] I. Dabo, B. Kozinsky, N. E. Singh-Miller, and N. Marzari, Phys. Rev. B 77, 115139 (2008).
[34] L. Kleinman and D. M. Bylander, Phys. Rev. Lett. 48, 1425 (1982).
[35] D. R. Hamann, M. Schlüter, and C. Chiang, Phys. Rev. Lett. 43, 1494 (1979).
[36] D. Vanderbilt, Phys. Rev. B 41, 7892 (1990).
[37] P. E. Blöchl, Phys. Rev. B 50, 17953 (1994).
[38] J. P. Perdew, J. A. Chevary, S. H. Vosko, K. A. Jackson, D. J. Singh, and C. Fiolhais, Phys. Rev. B 46, 6671 (1992).
[39] J. Tao, J. P. Perdew, V. N. Staroverov, and G. E. Scuseria, Phys. Rev. Lett. 91, 146401 (2003).
[40] J. P. Perdew, M. Ernzerhof, and K. Burke, J. Chem. Phys. 105, 9982 (1996).
[41] A. D. Becke, J. Chem. Phys. 98, 1372 (1993).
[42] P. J. Stephens, F. J. Devlin, C. F. Chabalowski, and M. J. Frisch, J. Phys. Chem. 98, 11623 (1994).
[43] W. Kohn and L. J. Sham, Phys. Rev. 140, A1133 (1965).
[44] H. Hellmann, Einführung in die Quantenchemie (Deuticke, 1937).
[45] R. P. Feynman, Phys. Rev. 56, 340 (1939).
[46] O. H. Nielsen and R. M. Martin, Phys. Rev. B 32, 3780 (1985).
[47] P. Pyykkö, Chem. Rev. 88, 563 (1988).
[48] T. Oda, A. Pasquarello, and R. Car, Phys. Rev. Lett. 80, 3622 (1998).
[49] R. Gebauer and S. Baroni, Phys. Rev. B 61, R6459 (2000).
[50] R. Car and M. Parrinello, Phys. Rev. Lett. 55, 2471 (1985).
[51] R. M. Wentzcovitch and J. L. Martins, Sol. St. Commun. 78, 831 (1991).
[52] M. Parrinello and A. Rahman, Phys. Rev. Lett. 45, 1196 (1980).
[53] R. M. Wentzcovitch, Phys. Rev. B 44, 2358 (1991).
[54] S. Baroni, P. Giannozzi, and A. Testa, Phys. Rev. Lett. 58, 1861 (1987).
[55] S. Baroni, S. de Gironcoli, A. Dal Corso, and P. Giannozzi, Rev. Mod. Phys. 73, 515 (2001).
[56] X. Gonze, Phys. Rev. A 52, 1096 (1995).
[57] G. Henkelman and H. Jónsson, J. Chem. Phys. 111, 7010 (1999).
[58] W. E, W. Ren, and E. Vanden-Eijnden, Phys. Rev. B 66, 052301 (2002).
[59] K. J. Caspersen and E. A. Carter, P. Natl. Acad. Sci. USA 102, 6738 (2005).
[60] H. J. Choi and J. Ihm, Phys. Rev. B 59, 2267 (1999).
[61] N. Marzari and D. Vanderbilt, Phys. Rev. B 56, 12847 (1997).
[62] I. Souza, N. Marzari, and D. Vanderbilt, Phys. Rev. B 65, 035109 (2001).
[63] C. J. Pickard and F. Mauri, Phys. Rev. B 63, 245101 (2001).
[64] C. J. Pickard and F. Mauri, Phys. Rev. Lett. 91, 196401 (2003).
[65] M. Taillefumier, D. Cabaret, A. M. Flank, and F. Mauri, Phys. Rev. B 66, 195107 (2002).
[66] S. Scandolo, P. Giannozzi, C. Cavazzoni, S. de Gironcoli, A. Pasquarello, and S. Baroni, Z. Kristallogr. 220, 574 (2005).
[67] J. Tersoff and D. R. Hamann, Phys. Rev. B 31, 805 (1985).
[68] A. D. Becke and K. E. Edgecombe, J. Chem. Phys. 92, 5397 (1990).
[69] A. Szabo and N. Ostlund, Modern Quantum Chemistry (Dover, 1996).
[70] A. Baldereschi, S. Baroni, and R. Resta, Phys. Rev. Lett. 61, 734 (1988).
[71] The HDF Group, URL http://www.hdfgroup.org.
[72] URL http://www.unidata.ucar.edu/software/netcdf.
[73] URL http://www.quantum-simulation.org.
[74] G. Bussi, URL http://www.s3.infm.it/iotk.
[75] M. Fuchs and M. Scheffler, Comp. Phys. Comm. 119, 67 (1999).
[76] URL http://www.physics.rutgers.edu/~dhv/uspp/.
[77] OPIUM – pseudopotential generation project, URL http://opium.sourceforge.net.
[78] P. Giannozzi and C. Cavazzoni, Nuovo Cimento C (in press, 2009).
[79] J. Hutter and A. Curioni, ChemPhysChem 6, 1788 (2005).
[80] L. S. Blackford, J. Choi, A. Cleary, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, et al., SC Conference 0, 5 (1996).
[81] F. Gygi, E. W. Draeger, M. Schulz, B. R. de Supinski, J. A. Gunnels, V. Austel, J. C. Sexton, F. Franchetti, S. Kral, C. W. Ueberhuber, et al., IBM J. Res. Dev. 52, 137 (2008), URL http://eslab.ucdavis.edu/software/qbox/index.htm.
[82] P. Giannozzi, F. de Angelis, and R. Car, J. Chem. Phys. 120, 5903 (2004).
[83] A. Dal Corso, A pseudopotential plane waves program (pwscf) and some case studies, in Lecture Notes in Chemistry, Vol. 67, C. Pisani editor, Springer Verlag, Berlin (1996).
[84] G. Kresse and D. Joubert, Phys. Rev. B 59, 1758 (1999).
[85] K. Laasonen, A. Pasquarello, R. Car, C. Lee, and D. Vanderbilt, Phys. Rev. B 47, 10142 (1993).
[86] J. A. White and D. M. Bird, Phys. Rev. B 50, 4954 (1994).
[87] A. Dal Corso and A. Mosca Conte, Phys. Rev. B 71, 115106 (2005).
[88] A. Mosca Conte, Ph.D. thesis, SISSA/ISAS, Trieste, Italy (2007), URL http://www.sissa.it/cm/thesis/2007/moscaconte.pdf.
[89] V. I. Anisimov, J. Zaanen, and O. K. Andersen, Phys. Rev. B 44, 943 (1991).
[90] M. Cococcioni and S. de Gironcoli, Phys. Rev. B 71, 035105 (2005).
[91] H. Kwee, S. Zhang, and H. Krakauer, Phys. Rev. Lett. 100, 126404 (2008).
[92] D. D. Johnson, Phys. Rev. B 38, 12807 (1988).
[93] D. J. Chadi and M. L. Cohen, Phys. Rev. B 8, 5747 (1973).
[94] H. J. Monkhorst and J. Pack, Phys. Rev. B 13, 5188 (1976).
[95] M. Methfessel and A. Paxton, Phys. Rev. B 40, 3616 (1989).
[96] N. Marzari, D. Vanderbilt, A. de Vita, and M. C. Payne, Phys. Rev. Lett. 82, 3296 (1999).
[97] P. E. Blöchl, O. Jepsen, and O. Andersen, Phys. Rev. B 49, 16223 (1994).
[98] N. D. Mermin, Phys. Rev. 137, A1441 (1965).
[99] R. Fletcher, Practical Methods of Optimization (John Wiley and Sons, 1987).
[100] S. R. Billeter, A. J. Turner, and W. Thiel, Phys. Chem. Chem. Phys. 2, 2177 (2000).
[101] S. R. Billeter, A. Curioni, and W. Andreoni, Comp. Mater. Sci. 27, 437 (2003).
[102] C. Micheletti, A. Laio, and M. Parrinello, Phys. Rev. Lett. 92, 170601 (2004).
[103] L. Verlet, Phys. Rev. 159, 98 (1967).
[104] M. P. Allen and D. J. Tildesley, Computer Simulations of Liquids (Clarendon Press, 1986).
[105] M. Bernasconi, G. L. Chiarotti, P. Focher, S. Scandolo, E. Tosatti, and M. Parrinello, J. Phys. Chem. Solids 56, 501 (1995).
[106] P. Umari and A. Pasquarello, Phys. Rev. Lett. 89, 157602 (2002).
[107] I. Souza, J. Íñiguez, and D. Vanderbilt, Phys. Rev. Lett. 89, 117602 (2002).
[108] K. Kunc and R. Resta, Phys. Rev. Lett. 51, 686 (1983).
[109] J. Tobik and A. Dal Corso, J. Chem. Phys. 120, 9934 (2004).
[110] D. A. Scherlis, J.-L. Fattebert, F. Gygi, M. Cococcioni, and N. Marzari, J. Chem. Phys. 124, 074103 (2006).
[111] I. Dabo, E. Cances, Y. Li, and N. Marzari (2009), URL http://arxiv.org/abs/0901.0096v2.
[112] C. Cavazzoni and G. L. Chiarotti, Comput. Phys. Commun. 123, 56 (1999).
[113] A. Pasquarello, K. Laasonen, R. Car, C. Lee, and D. Vanderbilt, Phys. Rev. Lett. 69, 1982 (1992).
[114] M. d'Avezac, M. Calandra, and F. Mauri, Phys. Rev. B 71, 205210 (2005).
[115] H.-L. Sit, M. Cococcioni, and N. Marzari, J. Electroanal. Chem. 607, 107 (2007).
[116] H.-L. Sit, M. Cococcioni, and N. Marzari, Phys. Rev. Lett. 97, 028303 (2006).
[117] N. Marzari, D. Vanderbilt, and M. C. Payne, Phys. Rev. Lett. 79, 1337 (1997).
[118] F. Tassone, F. Mauri, and R. Car, Phys. Rev. B 50, 10561 (1994).
[119] D. J. Tobias, G. J. Martyna, and M. L. Klein, J. Phys. Chem. 97, 12959 (1993).
[120] G. J. Martyna, M. L. Klein, and M. Tuckerman, J. Chem. Phys. 97, 2635 (1992).
[121] M. C. Payne, M. P. Teter, D. C. Allen, T. A. Arias, and J. D. Joannopoulos, Rev. Mod. Phys. 64, 1045 (1992).
[122] D. Marx and J. Hutter, Modern methods and algorithms of quantum chemistry, John von Neumann Institute for Computing, FZ Jülich, pp. 301–449 (2000).
[123] V. Dubois, P. Umari, and A. Pasquarello, Chem. Phys. Lett. 390, 193 (2004).
[124] F. Giustino and A. Pasquarello, Phys. Rev. Lett. 95, 187402 (2005).
[125] P. Umari and A. Pasquarello, Diam. Relat. Mater. 14, 1255 (2005).
[126] P. Umari and A. Pasquarello, Phys. Rev. Lett. 95, 137401 (2005).

[127] L. Giacomazzi, P. Umari, and A. Pasquarello, Phys. Rev. Lett. 95, 075505 (2005).
[128] P. Umari and A. Pasquarello, Phys. Rev. Lett. 98, 176402 (2007).
[129] P. Giannozzi and S. Baroni, J. Chem. Phys. 100, 8537 (1994).
[130] X. Gonze, Phys. Rev. B 55, 10337 (1997).
[131] X. Gonze and C. Lee, Phys. Rev. B 55, 10355 (1997).
[132] A. Dal Corso, A. Pasquarello, and A. Baldereschi, Phys. Rev. B 56, R11369 (1997).
[133] A. Dal Corso, Phys. Rev. B 64, 235118 (2001).
[134] F. Favot and A. Dal Corso, Phys. Rev. B 60, 11427 (1999).
[135] A. Dal Corso and S. de Gironcoli, Phys. Rev. B 62, 273 (2000).
[136] A. Dal Corso, submitted (2009).
[137] A. Dal Corso, Phys. Rev. B 76, 054308 (2007).
[138] F. Giustino, M. L. Cohen, and S. G. Louie, Phys. Rev. B 76, 165108 (2007).
[139] M. Lazzeri and S. de Gironcoli, Phys. Rev. B 65, 245402 (2002).
[140] M. Lazzeri and F. Mauri, Phys. Rev. Lett. 90, 036401 (2003).
[141] M. Lazzeri and F. Mauri, Phys. Rev. B 68, 161101(R) (2003).
[142] D. D. Koelling and B. N. Harmon, J. Phys. C: Solid State Phys. 10, 3107 (1977).
[143] A. H. MacDonald and S. H. Vosko, J. Phys. C: Solid State Phys. 12, 2977 (1979).
[144] A. K. Rajagopal and J. Callaway, Phys. Rev. B 7, 1912 (1973).
[145] N. Troullier and J. L. Martins, Phys. Rev. B 43, 1993 (1991).
[146] A. M. Rappe, K. M. Rabe, E. Kaxiras, and J. D. Joannopoulos, Phys. Rev. B 41, 1227 (1990).
[147] G. Kresse and J. Hafner, J. Phys.-Condens. Mat. 6, 8245 (1994).
[148] A. Smogunov, A. Dal Corso, and E. Tosatti, Phys. Rev. B 70, 045417 (2004).
[149] A. Dal Corso, A. Smogunov, and E. Tosatti, Phys. Rev. B 74, 045429 (2006).
[150] M. Profeta, F. Mauri, and C. J. Pickard, J. Am. Chem. Soc. 125, 541 (2003).
[151] C. G. van de Walle and P. E. Blöchl, Phys. Rev. B 47, 4244 (1993).
[152] F. Mauri and S. G. Louie, Phys. Rev. Lett. 76, 4246 (1996).
[153] F. Mauri, B. Pfrommer, and S. G. Louie, Phys. Rev. Lett. 77, 5300 (1996).
[154] C. J. Pickard and F. Mauri, Phys. Rev. Lett. 88, 086403 (2002).
[155] T. Thonhauser, D. Ceresoli, A. Mostofi, N. Marzari, R. Resta, and D. Vanderbilt, submitted to Phys. Rev. Lett. (2009), URL http://arxiv.org/abs/0709.4429v1.
[156] C. Gougoussis, M. Calandra, A. Seitsonen, and F. Mauri, First principles calculation of X-ray absorption spectra in an ultrasoft pseudopotential scheme: from SiO2 to high-Tc compounds, submitted.
[157] C. Gougoussis, M. Calandra, A. Seitsonen, C. Brouder, A. Shukla, and F. Mauri, Phys. Rev. B 79, 045118 (2009).
[158] URL http://www.wannier.org/.
[159] P. Silvestrelli, N. Marzari, D. Vanderbilt, and M. Parrinello, Solid State Commun. 107, 7 (1998).
[160] A. Calzolari, I. Souza, N. Marzari, and M. B. Nardelli, Phys. Rev. B 69, 035108 (2004).
[161] Y.-S. Lee, M. B. Nardelli, and N. Marzari, Phys. Rev. Lett. 95, 076804 (2005).
[162] J. R. Yates, X. Wang, D. Vanderbilt, and I. Souza, Phys. Rev. B 75, 195121 (2007).
[163] A. Kokalj, J. Mol. Graph. Model. 17, 176 (1999), URL http://www.xcrysden.org/.
[164] W. Humphrey, A. Dalke, and K. Schulten, J. Mol. Graph. Model. 14, 33 (1996).
[165] URL http://www.tcm.phy.cam.ac.uk/~mdt26/casino2.html.
[166] D. Korotin, A. V. Kozhevnikov, S. L. Skornyakov, I. Leonov, N. Binggeli, V. I. Anisimov, and G. Trimarchi, Eur. Phys. J. B 65, 91 (2008), URL http://[email protected].
[167] L. Martin-Samos and G. Bussi, Comp. Phys. Comm. (2009), URL http://sax-project.org.
[168] URL http://dp-code.org.
[169] URL http://www.bethe-salpeter.org.
[170] A. Kokalj, Comp. Mater. Sci. 28, 155 (2003), URL http://www-k3.ijs.si/kokalj/guib/.
[171] E. Runge and E. K. U. Gross, Phys. Rev. Lett. 52, 997 (1984).
[172] M. A. L. Marques, C. L. Ullrich, F. Nogueira, A. Rubio, K. Burke, and E. K. U. Gross, eds., Time-Dependent Density Functional Theory, vol. 706 of Lecture notes in Physics (Springer-Verlag, Berlin, Heidelberg, 2006), DOI-10.1007/3-540-35426-3-17.
[173] G. Onida, L. Reining, and A. Rubio, Rev. Mod. Phys. 74, 601 (2002).
[174] D. Rocca, R. Gebauer, Y. Saad, and S. Baroni, J. Chem. Phys. 128, 154105 (2008).
[175] P. Umari, G. Stenuit, and S. Baroni, Phys. Rev. B, in press (2009), arXiv/0811.1453.
[176] H.-V. Nguyen and S. de Gironcoli, submitted to Phys. Rev. B (2009), URL http://arxiv.org/abs/0902.0889.
[177] H.-V. Nguyen and S. de Gironcoli, submitted to Phys. Rev. B (2009), URL http://arxiv.org/abs/0902.0883.
[178] H.-V. Nguyen, Ph.D. thesis, SISSA (2008), URL http://www.sissa.it/cm/thesis/2008/VietHuyNguyen_PhDthesis.pd%f.
[179] A. Marini, C. Hogan, M. Grüning, and D. Varsano, Comp. Phys. Commun., in press (2009), URL http://www.yambo-code.org.
[180] A. Calzolari, I. Souza, N. Marzari, and M. B. Nardelli, Phys. Rev. B 69, 035108 (2004), URL http://www.wannier-transport.org.
[181] URL http://[email protected].
[182] A. Kokalj, pwtk: a Tcl scripting interface to PWscf, soon to be available on http://qe-forge.net.
[183] D. Raczkowski, A. Canning, and L. W. Wang, Phys. Rev. B 64, R121101 (2001).
[184] T. A. Arias, M. C. Payne, and J. D. Joannopoulos, Phys. Rev. B 45, 1538 (1992).
[185] A. Qteish, Phys. Rev. B 52, 14497 (1995).
[186] C. A. Mead, Rev. Mod. Phys. 64, 51 (1992).
[187] R. di Meo, A. Dal Corso, P. Giannozzi, and S. Cozzini, Proceedings of the COST School, Trieste (2009), to be published.
[188] F. Gygi and A. Baldereschi, Phys. Rev. B 34, 4405 (1986).