<<

Quasiparticle scattering, lifetimes, and spectra using the GW approximation

by

Derek Wayne Vigil-Fowler

A dissertation submitted in partial satisfaction of the

requirements for the degree of

Doctor of Philosophy

in

Physics

in the

Graduate Division

of the

University of California, Berkeley

Committee in charge:

Professor Steven G. Louie, Chair
Professor Feng Wang
Professor Mark D. Asta

Summer 2015

Quasiparticle scattering, lifetimes, and spectra using the GW approximation

© 2015

by Derek Wayne Vigil-Fowler

Abstract

Quasiparticle scattering, lifetimes, and spectra using the GW approximation

by

Derek Wayne Vigil-Fowler

Doctor of Philosophy in Physics

University of California, Berkeley

Professor Steven G. Louie, Chair

Computer simulations are an increasingly important pillar of science, along with experiment and traditional pencil-and-paper theoretical work. Indeed, the development of the approximations and methods needed to accurately calculate the properties of the range of materials from molecules to nanostructures to bulk materials has been a great triumph of the last 50 years and has led to an increased role for computation in science. The need for quantitatively accurate predictions of material properties has never been greater, as technologies such as computer chips and photovoltaics require rapid advancement in the control and understanding of the materials that underlie these devices. As more accuracy is needed to adequately characterize, e.g., the energy conversion processes in these materials, improvements on old approximations continually need to be made. Additionally, in order to be able to perform calculations on bigger and more complex systems, algorithmic development needs to be carried out so that newer, bigger computers can be maximally utilized to move science forward.

In this work we discuss our endeavors to improve approximations and algorithms to answer the challenge of better describing material properties. After an introduction to define and discuss all the important concepts that appear later, we first discuss the calculation of so-called satellite properties in the photoemission spectra (PES) of doped graphene. While the GW approximation accurately produces the quasiparticle properties across a range of materials from nanostructures and molecules to bulk semiconductors and metals, it does not accurately produce the satellite properties seen in PES experiments. We find that a more advanced treatment of the Green’s function, the cumulant expansion, is needed to adequately describe the satellite properties of doped graphene on SiC. In addition to this more advanced Green’s function treatment, a novel technique is devised for including the screening due to the SiC substrate on which the doped graphene is placed. This more advanced treatment of the substrate is also crucial for obtaining agreement with experiment. Next, we show how the cumulant expansion can be used to accurately predict the ARPES spectra of bulk Si and the time-domain capacitance spectra of two-dimensional electron gases (2DEGs) in semiconductor quantum wells, with both the quasiparticle and satellite features given correctly (unlike in GW theory, in which only the quasiparticle properties are predicted accurately). We then discuss carrier lifetimes from the GW approximation in bulk Si and GaAs, showing how theory can provide access to detailed microscopic information that could be of use in designing more efficient photovoltaics. In chapter 5, we discuss the effect of the pseudopotential approximation on excited-state GW calculations. Finding a small amount of error due to the use of nodeless pseudowavefunctions when using pseudopotentials, we are able to understand the tendency of GW calculations that use pseudowavefunctions to overestimate the band gap in many common semiconductors.
We quantify this error and suggest improved techniques for applications where this error is too large. In the last section on research, we discuss the effect of self-consistency in GW calculations. Chapter 7 is on computational algorithm development, and there we discuss some algorithmic advances made in improving the BerkeleyGW code. A technique for better distributing the data during the calculation of the inverse dielectric matrix is discussed and shown to give very good performance improvements, especially for the large systems that are becoming increasingly common. Other small improvements that allow for a more accurate calculation of quasiparticle lifetimes are also discussed. Finally, a few appendices are included for completeness.

To Rafael,

I cannot wait to meet you.

Contents

List of Figures

List of Abbreviations

Acknowledgments

I Background

1 Introduction
   1.1 What is many-body perturbation theory (MBPT)?
   1.2 The many-body hamiltonian
      1.2.1 The Born-Oppenheimer Approximation
      1.2.2 Symmetries: translational and point-group
      1.2.3 Plane waves and Pseudopotentials
   1.3 Mean-field theories
      1.3.1 Hartree theory
      1.3.2 Hartree-Fock theory
      1.3.3 Density functional theory
      1.3.4 Single Slater determinants and interactions
   1.4 The GW approximation
   1.5 Quasiparticle lifetimes
   1.6 Beyond GW: the cumulant expansion

II Research

2 Plasmon Satellites in Doped Graphene ARPES
   2.1 Summary
   2.2 Background
   2.3 Methods
      2.3.1 Mean-field calculation
      2.3.2 GW calculation
      2.3.3 GW+C calculation
      2.3.4 Linear-bands model calculation
   2.4 Results
      2.4.1 Suspended graphene
      2.4.2 Graphene on silicon carbide
   2.5 Conclusions

3 Other Studies of Plasmon Satellites: Bulk Si and 2DEGs
   3.1 Summary
   3.2 Introduction
   3.3 Results
      3.3.1 Bulk Si
      3.3.2 2DEGs in semiconductor quantum wells
   3.4 Discussion

4 Carrier lifetimes due to electron-electron scattering
   4.1 Summary
   4.2 Introduction
   4.3 Results
      4.3.1 Bulk Si
      4.3.2 Bulk GaAs
   4.4 Discussion

5 Pseudowavefunctions in the GW approximation
   5.1 Summary
   5.2 Background
   5.3 Methods
   5.4 Results

6 Self-consistency in GW calculations
   6.1 Summary
   6.2 Introduction
   6.3 Computational Details
   6.4 Results

III Computational Methods

7 Improved Accuracy and Scaling of Full-Frequency GW calculations
   7.1 Parallel frequencies
      7.1.1 BerkeleyGW’s calculation of ε⁻¹
      7.1.2 Why you should calculate ε⁻¹(ω) at multiple frequencies in parallel
   7.2 X+COR vs. COH+SEX divisions of Σ
   7.3 Principal value integral evaluation of Σ

IV Appendix

8 COHSEX Derivation
   8.1 Derivation of COHSEX Self Energy

9 Doped Graphene Refinement

Bibliography

List of Figures

2.1 Contour plot: ab initio/linear bands GW/GW+C for isolated graphene
2.2 Spectral functions: ab initio/linear bands GW/GW+C for isolated graphene
2.3 Im Σ for linear bands GW/GW+C for isolated graphene
2.4 Scaling plots: ab initio/linear bands GW/GW+C for isolated graphene
2.5 Contour plot: ab initio/linear bands GW/GW+C for graphene on SiC
2.6 Spectral functions: ab initio/linear bands GW/GW+C for graphene on SiC
2.7 ab initio GW+C contour plot and experiment for graphene on SiC

3.1 ab initio GW, GW+C contour plots and experiment for bulk Si
3.2 ab initio GW, GW+C spectral function plots for 2DEG
3.3 ab initio GW, GW+C quasiparticle, satellite peak positions vs. experiment

4.1 Carrier lifetimes in Si with electron-electron, electron-phonon contributions
4.2 Brillouin-zone resolved carrier lifetimes in bulk Si
4.3 Impact ionization rate in bulk GaAs

5.1 Exchange charge density in atomic Si
5.2 Exchange potential in atomic Si
5.3 Product of exchange charge density and exchange potential in atomic Si

7.1 Depiction of calculation of ε⁻¹ as matrix multiplication
7.2 Distribution of dielectric matrix and bands
7.3 Demonstration of matrix communication scheme
7.4 Demonstration of parallel frequencies scheme

List of Abbreviations

ARPES  Angle-Resolved Photoemission Spectroscopy
DFT  Density Functional Theory
DOS  Electronic Density of States
PDOS  Projected Density of States or Partial Density of States
PES  Photoemission Spectroscopy
UPS  Ultraviolet Photoemission Spectroscopy
XPS  X-ray Photoemission Spectroscopy

Acknowledgments

The process of obtaining my Ph.D has been a long, challenging and beautiful experience, and I could never have gotten through it without a wealth of help along the way. First of all, I would like to thank all of my group members for their help through the years. When I first joined the group, Jack Deslippe, Manish Jain, and Georgy Samsonidze were kind and helpful guides as I learned about the theory and practical implementation of density functional theory and many-body perturbation theory. They gave of their time unbegrudgingly and their instruction helped set me on the path to being an efficient practitioner of computational condensed matter physics. Next, my physical insight and research rigor was built painstakingly with the help of mentor, teacher, and friend Dr. Johannes Lischner. Without a doubt, the way in which I conduct my research on a day-to-day basis owes most to Johannes, for it is he who taught me the need for simple test systems and physical models, for changing things slowly and testing after every change you make, and for paying attention to all the small details of the calculation. Johannes has a gift for boiling things down to their simplest, most important physical aspects, and watching how he does this has contributed greatly to my development as a physicist. Johannes also has a great sense of humor and smile, both of which I greatly enjoyed during my time working with him. My two years working with Dr. Lischner changed the way my brain worked, for the better, and for that I will always be grateful to him. Finally, in my last stage in the group, I’ve worked with several people in a collaborative way that befits someone who has learned a thing or two during the beginning of the PhD and is ready to start becoming a real, grown-up scientist (tongue firmly in cheek). Dr.
Marco Bernardi showed me how to think about how carriers lose energy after being excited by light, and that completely changed my view of my own field and knowledge, showing me the way to marry my interest in alternative energy to my already-developed expertise.

His ability to think about things in a simple physical way and be a volcano of ideas is simply impressive, and I owe him a great debt of gratitude for sharing his knowledge during our collaborations. I’m very proud of the work we’ve done together and look forward to working with Marco often in the future. My work on the effect of pseudowavefunctions on GW calculations was a pleasure in large part due to my collaboration with Dr. Brad Malone, now of Dupont Pioneer. Brad has a way of asking simple, but important questions that really help further understanding and find good directions for new avenues of inquiry for research projects. Also, Brad is the king of converging calculations and the entire GW community owes him a debt of gratitude for the simple, physical convergence scheme that he developed during his time at Berkeley. I have tried to carry on his tradition now that he has left the field. Besides all of this, Brad is just a funny, nice, and overall great human being. I had so much fun working with him that it makes me smile and miss it as I type these words and remember what it was like. The field will miss Dr. Malone. During a late project, not actually included in this thesis, I also worked closely with Brad Barker to understand phenomena in materials that have large spin-orbit coupling. He quickly got me up to speed on the subject with simple and intuitive explanations, for which I am grateful. Throughout my Ph.D, although I had no research projects with them, I worked with several people on code development and had important discussions about physics that were as influential as, if not more than, some of my projects. Named above already are Jack Deslippe, Manish Jain, and Georgy Samsonidze. They all gave me great help during my beginning stages at Berkeley, and Jack has continued to provide guidance on how to implement high-performance code for scientific computing, including recommending the Blue Waters Fellowship to me.
I am very grateful to Jack for all his help through the years. I also learned many things from the applied mathematicians who were part of the SciDAC collaboration, including Lex Kemper, Lin Lin, Fang Liu, and Chao Yang. From them I got a better idea of the universe of ideas available to people trying to scale to big computers and systems, and learned that I should basically always talk to them before I try to implement a harebrained idea of my own. From Steve’s group, I learned a lot from Jamal Mustafa about how to code properly and shared many good conversations about how to push the state of the art in computational physics to better accuracy, cleaner code, and bigger scales. I hope to work with Jamal in the future, as I think we are of very similar temperament and mindset. Plus, we got married one month apart and are having children one month apart (he beat me on both counts), so it seems we’re awfully similar in how we go about things. This acknowledgments section simply would not be complete without my dear friend and office mate, Felipe Homrich da Jornada, a tornado of ideas if there ever was one. If you have not met Felipe, it’s almost impossible to describe just how amazing a person he is. In addition to having a razor sharp intellect that is both creative and analytical (these do NOT always go together!), Felipe is probably the kindest person I’ve ever known. I have learned so much from conversations that started casually in the office and quickly developed with equations being written on the blackboard and computational schemes envisaged. I will always cherish these brainstorming sessions for what they were: the budding of my own independence as a researcher and scientific thinker. Additionally, Felipe is a great person to share research results with. He will always give good, tough feedback that helps fill in any gaps in understanding of the data at hand. He also does so with a smile.
In this, he has learned from the ways of the master, our advisor Steven Louie, who seems preternaturally kind and detailed at the same time. Felipe is also a great human being outside of science and understands that scientists are human beings too, with lives, struggles, and problems. His kind understanding was of great value to me as I navigated the rough and tumble world of academic science with dreams, ideas, and problems that lie completely outside of my professional life. I firmly believe that such realities must be acknowledged by the academy if it wants to retain talent and recruit people from a broader range of backgrounds than it currently does. Felipe is truly a beacon in this regard, as in all others previously discussed. Besides all of my fellow group members, I am grateful to my advisor Steven Louie (a group member, but of a different type). As mentioned before, Steve has an unusual eye for detail that is quite frankly astounding and the feedback he gives is almost universally useful to finding new avenues for understanding in the project at hand. His encyclopedic knowledge of nearly every result from “the old days”, as he is fond of saying, is simply incredible, but even more impressive are the simple, powerful pictures (or models, if you prefer) he has built up to understand a truly vast array of phenomena that define his research area in condensed matter physics. I can think of no better person from whom to learn my profession, with all of its tricky details that end up mattering a lot, and I feel grateful every day for getting to work in Steve’s group. Steve’s advice on making good, clear figures has also been an invaluable contribution to my scientific tool set which I hold in my head at all times, including when reading others’ papers and watching their presentations. Steve also gives all of his feedback with a smile and a laugh, which is a most wonderful thing in the academic sphere, where such comments are often given much more harshly.
Finally, Steve has always been there for me when I was applying for various fellowships and grants, and when I was trying to figure out my exact plan for exiting Berkeley. He has written me numerous recommendation letters and given me advice that has proven invaluable to my process of growing into the scientist I am today and moving on to my next place of employment, the National Renewable Energy Laboratory in Golden, CO. In short, Steve has been a great advisor and I am grateful for my time spent in his group. Throughout my research career at Berkeley I have been blessed to have the support of my now wife, Meg Vigil-Fowler. Simply put, without her I am not sure I would’ve finished my Ph.D here at Berkeley. She helped me when I was at the lowest of lows, struggling with depression and OCD, and helped me become a better, happier person despite the fact that

I did not always deserve all that she gave me. Life is hard, research is hard, and to have Meg to come home to every day, knowing that I could unwind, have a nice dinner, enjoy warm conversation, watch a show, and just relax was priceless in my ability to cope with the stressors of the Ph.D. She also helped me see when stress was getting to me and when I needed to look past the present moment to the calmer horizon. Without her soothing presence, I do question whether someone as volatile as the younger version of myself could’ve made it through this program. Perhaps I would’ve, but I doubt I would have been nearly as productive and creative without the wealth of happiness that my life with Meg has brought me. I love you dear, more than I could ever express. Finally, throughout my life, I have been supported by many teachers, mentors, and family members whom I need to thank. First and foremost, my parents have always supported me in my life, no matter what it was I was doing and even when I did not fully appreciate all they did for me. My mom, especially, has always cultivated a spirit of inquiry and lifelong learning, always having a shelf full of books that she was reading. She always believed in my abilities and saw that I had a special talent at physics, coaxing me along even when I questioned whether I should instead do something that satisfied my need to help in the world more directly. Her coaxing at several crossroads in my life helped me end up here, and for that I am always grateful, as I truly believe I am meant to be a scientist and that my skills as a scientist also help me in my other life endeavors. It goes without saying that all the years of selfless parental love made me the person I am today and for that I will never be able to repay you, Mom and Bob, but I hope my thanks and continual drive to spend time with you shows how much you mean to me. I had a mentor in high school, John MccConnell, who is a truly fantastic guy.
John is a retired physicist who worked at Ames Laboratory in Iowa and at Los Alamos National Laboratory. Also important to his biography is that he grew up on a farm in Nebraska and the gumption he built up in his upbringing there is like nothing I’ve ever seen in anyone else. John is a high energy, kind, and brilliant guy from whom I have learned very much. Indeed, during my junior year in high school when I was really looking to challenge myself and push myself to the next level of achievement, John was exactly the person I needed, encouraging me to keep pushing harder because there was a whole bigger world of science outside my hometown. He helped me run a science camp for kids after my junior year in high school, mentoring me in the process in a multi-faceted way. He taught me how to teach science to kids. He taught me physics that I did not know at the time. He taught me how to build things and fiddle around to make things work, even when they don’t seem to want to cooperate. Most importantly, he taught me that I would need toughness, grit and drive in order to succeed in science and in life. Ever since, I have never looked for the easy way out and have instead looked to build myself up with ever greater challenges, picking myself up when I fail and trying harder. I will be eternally indebted to John for this needed push at such a critical time in my life. We’ve collaborated many more times over the years and have been fast friends ever since. Thanks so much for everything John. In college, my most influential professor was my advanced lab professor, Dave Hertzog. Professor Hertzog was a bit of a perfectionist who could rub students who were not ready to be pushed hard the wrong way. Indeed, when my first lab report came back plastered in red ink, my first reaction was to be upset and taken aback. But then I read Professor Hertzog’s comments and realized everything he was saying was legitimate and I needed to improve.
The rest of his course was quite similar, with Professor Hertzog demanding a high quality of work and a continual effort on my part to continue improving. I emerged as a student who appreciated the experimental realization of so many of the ideas that I had been learning in my classes, an experience that was exhilarating. I think this course, along with my REU the previous summer at the University of Rochester, were what really hooked me into the scientific path once and for all. Besides all the great science, Professor Hertzog was also very influential in that he was the first person to emphasize to me the importance of a good and clear presentation, with well-arranged slides. In a sense, I always hear his voice in my head (along with Steve’s and many others), coaching me to present things cleanly, with a minimum of clutter and maximum of clarity. I hope such classes are taken by budding physicists everywhere. Even if one becomes a theorist, I think these classes are perhaps the most important of any in the undergraduate curriculum. It just really helps to make the connection of class material to real measurements. To all my other teachers who helped me get here, I thank you.

Part I

Background

1 Introduction

As I have meandered through my PhD here in the Louie group at Berkeley, I have slowly come to understand the meaning of “many-body perturbation theory,” or MBPT. This is a good thing since it is my acclaimed area of expertise! I have also learned a good deal of other things, such as the meaning of “electronic structure” and its intricacies, how to code well in serial and parallel, the meaning and trickiness of the term Fermi level, density functional theory (DFT), pseudopotential theory, and many other smaller things. In this introduction I hope to give a very accessible and simple view on some of these topics, presented in a straightforward manner appropriate for any beginners who might be entering the group. I will be pedantic because I think that is helpful in such a complicated field and because I spent many years feeling inadequate for my lack of ability to grasp what seemed to be very basic and fundamental concepts in the field. I later found out that everyone felt this way and so I feel it is best to not give any impression of some grand understanding, but instead present things as simply and clearly as I can, with many helpful details provided along the way. All the technical aspects of the theory can be found in standard textbooks in the field and interested readers can peruse those to find all the nitty-gritty details left out in my simple coverage of the topics at hand. I hope it will be of use to anyone who might stumble across it.

1.1 What is many-body perturbation theory (MBPT)?

A tautological answer to the question posed in this section’s title is that many-body perturbation theory is a perturbation theory that allows you to deal with multiple bodies, or particles. But what does that mean in practice? And how does it compare to perturbation theory in single-particle quantum mechanics? Let’s start with the latter question. In textbook single-particle quantum mechanics you use perturbation theory to incorporate somewhat small contributions to the single-particle hamiltonian at an approximate level. For now, let’s consider only static, i.e. not time-dependent, perturbation theory in single-particle quantum mechanics. That is what is in mind in the discussion that follows (we turn to time-dependent perturbation theory in single-particle quantum mechanics later on). The reason you have to do it at an approximate level is because the hamiltonian is in general too difficult to solve with this additional, perturbative term. The way you include the smaller contributions to the hamiltonian in perturbation theory is to start with a relatively simple single-particle hamiltonian of the variety that you can actually solve analytically, e.g. the hydrogen atom, and then use the eigenfunctions and eigenenergies of that simple system to express changes in the eigenfunctions and eigenenergies due to the new term in the hamiltonian. The key point to get right in this process is that the original, simple single-particle hamiltonian must give a much larger contribution to the hamiltonian than does the additional perturbative term. Trying to do perturbation theory for a term in the hamiltonian that is comparable in size to other contributions will lead to results of dubious quality and meaning [35].
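To make the importance of a small perturbative term concrete, here is a minimal numerical sketch (my own illustrative example, not taken from this thesis): a 2x2 hamiltonian is split into an exactly solvable diagonal part H0 plus a weak off-diagonal perturbation V, and second-order perturbation theory for the ground-state energy is compared against exact diagonalization.

```python
import numpy as np

# Unperturbed hamiltonian H0 (diagonal, exactly solvable) and a small
# perturbation V. Purely illustrative numbers.
H0 = np.diag([1.0, 3.0])
V = 0.05 * np.array([[0.0, 1.0],
                     [1.0, 0.0]])

# Exact eigenvalues of the full hamiltonian H0 + V (ascending order)
exact = np.linalg.eigvalsh(H0 + V)

# Second-order perturbation theory for the ground state:
# E0 ~ E0^(0) + <0|V|0> + |<1|V|0>|^2 / (E0^(0) - E1^(0))
E0_pt = H0[0, 0] + V[0, 0] + V[0, 1]**2 / (H0[0, 0] - H0[1, 1])

print(exact[0], E0_pt)  # close agreement when V is small
```

Making V comparable in size to the level spacing of H0 (say, a prefactor of 1.0 instead of 0.05) quickly destroys the agreement, which is exactly the "dubious quality and meaning" warned about above.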
So, getting the proper division of the hamiltonian right is a critical aspect of being able to successfully use perturbation theory in single-particle quantum mechanics, even if this basic detail is somewhat obscured in standard textbooks since the author is guiding the student through standard problems (such as the hydrogen atom) for which the decomposition of the hamiltonian into terms of decreasing magnitude has already been done. In many-body perturbation theory, the equivalent task of partitioning the hamiltonian is an active area of research, especially for difficult materials systems containing localized electrons. Moving on to many-body perturbation theory, what is the equivalent of the large contribution to the hamiltonian of single-particle quantum mechanics, on top of which we add smaller contributions perturbatively? First of all, we know that the kinetic energies of distinct particles are not coupled, so solving this part of the hamiltonian is quite straightforward. Hence, the sum of all the single-particle kinetic energies seems like a good starting point for a simple, large contribution to the hamiltonian to which we’ll add other contributions (and indeed it is). However, the problem is that there are all those pesky coulomb interactions between charged particles (ions and electrons) in a solid. These two-body coulomb interactions are what make the full many-body hamiltonian impossible to solve exactly and constitute much of the challenge for condensed matter physicists. How can these interactions be dealt with in a tractable way from which we can get a simple physical understanding of the system under study?

The answer for many-body perturbation theory lies in a concept known as a mean field. A mean field, simply put, is an average potential with which all electrons interact and which is due to the presence of all the other electrons in the system. It is an approximation to the true, complicated potential that the electrons in a system would see due to all the other electrons in the system if one were able to sum up all the coulomb contributions at all the different points in the solid. Thus, the big challenge of many-body perturbation theory is to judiciously pick this mean field so that it is both tractable to compute and is a faithful representation of the largest contribution to the many-body hamiltonian. The requirement of tractability allows the mean-field calculation to be performed and starting eigenfunctions and eigenenergies to be obtained, with which many-body perturbation theory can be performed. The requirement of this mean field being a faithful representation of the largest contribution to the many-body hamiltonian ensures that the quantities computed from many-body perturbation theory will be meaningful. Without a quality mean field, as in singling out the largest part of the single-particle hamiltonian in single-particle quantum mechanics, many-body perturbation theory is doomed. Of course, there is much more to many-body perturbation theory than merely picking a mean field that is both tractable and faithful. Indeed, it is obvious that in order to yield useful information about the effects due to interactions in addition to those included in the chosen mean field, so-called many-body effects, the theory must be quite complicated. However, I think it is important for beginners in the field to know that the starting point of any exercise in many-body perturbation theory is to have a tractable, faithful mean field as a counterpart to the commonly known large contributions to single-particle hamiltonians (e.g. the coulomb potential for the hydrogen atom) in single-particle quantum mechanics.
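Schematically, the partitioning described above can be written as follows (a sketch in Hartree atomic units; the notation is mine, not taken from this chapter): the pairwise coulomb terms of the electronic hamiltonian are traded for a one-body mean-field potential plus a residual interaction,

```latex
H = \sum_i \left[ -\tfrac{1}{2}\nabla_i^2 + v_{\mathrm{ext}}(\mathbf{r}_i) \right]
    + \tfrac{1}{2}\sum_{i \neq j} \frac{1}{|\mathbf{r}_i - \mathbf{r}_j|}
  = \sum_i \underbrace{\left[ -\tfrac{1}{2}\nabla_i^2
        + v_{\mathrm{ext}}(\mathbf{r}_i)
        + v_{\mathrm{MF}}(\mathbf{r}_i) \right]}_{h_{\mathrm{MF}}}
    + \Delta H ,
```

where $\Delta H = H - \sum_i h_{\mathrm{MF}}$ is the residual interaction to be treated perturbatively. A "tractable, faithful" mean field is one for which $h_{\mathrm{MF}}$ is cheap to diagonalize and $\Delta H$ is genuinely small.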
Knowing this allows two things to be made clear: 1) the mapping between the important concepts from the simpler perturbation theory in single-particle quantum mechanics and the corresponding concepts in many-body perturbation theory, 2) the fact that in single-particle quantum mechanics, due to the simplicity of the interactions, the largest contributions to the hamiltonian are often known exactly, while in many-body quantum mechanics finding a quality mean field can constitute an active area of research in its own right. Understanding both of these points is important in order for one to be an able practitioner of many-body perturbation theory. The first point is critical for basic conceptual understanding of what is done in many-body perturbation theory, while the second point must always be kept in mind so that the many-body perturbation theory calculations performed are meaningful. After that long preamble, what is many-body perturbation theory? (Note: I will refer to many-body perturbation theory as MBPT from here on out; I wrote it out previously to distinguish between MBPT and perturbation theory in single-particle quantum mechanics) The simplest answer is that it is a quantum field theory for systems with many strongly-interacting particles. The fact that MBPT is a quantum field theory is obscured by the apparent difference in the expressions in MBPT textbooks from those in standard quantum field theory textbooks. To understand why MBPT does not always look recognizable as quantum field theory in action, we need to first revisit what quantum field theory is. Quantum field theory starts with the notion of the ground state of some non-interacting system(s) of particles, and builds a machinery by which you can create excitations from this ground state via interactions between particles [84].
Generally, quantum field theory textbooks are oriented towards particle physics, so the non-interacting ground state is a state with no particles in the system and the non-interacting particles are truly free particles with only a kinetic energy term in their non-interacting hamiltonians. The excitations from the ground state are actual particles being created through various interactions. This is a natural setup for particle physics. However, for condensed matter physics, the appropriate choice for the non-interacting ground state is instead the state with all the particles in the system arranged such that the system is in its lowest energy configuration (this is of course why the lowest energy configuration is called the ground state of the system under study). Additionally, the non-interacting particles in condensed matter physics are not truly free particles, but are instead particles that have a kinetic energy term but also interact with the mean field due to the presence of all the other particles in the system. The particles are “non-interacting” in the sense that they do not directly interact with any of the other particles, but only through some average potential. The non-interacting ground state is the lowest energy configuration of the set of particles under consideration interacting through the chosen average potential. The excitations for the quantum field theory of condensed matter systems are, then, any changes in the configuration of the system that lead to a higher energy state. Such changes can come in the form of excitation of electrons from lower energy levels to higher energy levels, or the creation of lattice vibrations, oscillations of electron density, spin waves, etc. The former excitations are referred to as “quasiparticles” and are generally fermionic, while the latter are referred to as “collective excitations” and are bosonic.
The superset of all such excitations in condensed matter systems is referred to as “elementary excitations.” [79] Thus, we see that the reason for the confusing state of affairs, in which it is often hard to see that MBPT is really just quantum field theory, is largely due to the fact that in particle physics, the focus of most quantum field theory textbooks, a truly free particle picture is sufficient and the excitations of the ground state are actually particles, while in condensed matter physics the “non-interacting” particles actually interact via a mean field and the excitations are not true particles. The complexity in particle physics comes from all the different particles and the complicated forms of their interactions, while in condensed matter physics the challenge comes from the fact that the particles are close together and strongly interacting, which leads to many types of elementary excitations that interact in interesting ways. A final remark on how MBPT is different from the quantum field theory used in particle physics. In many standard textbook treatments of quantum field theory for particle physics the weakness of the interactions between particles allows the use of first-order perturbation theory to get various important results, e.g. the Bhabha scattering cross section. However, in MBPT, the particles are strongly interacting and doing first-order perturbation theory is, in general, insufficient to get reasonable results. Instead, perturbation theory must be performed to infinite order in MBPT, which leads to the machinery of Dyson’s equation (to be discussed in more detail later). Ok, we’ve now understood why MBPT is just quantum field theory applied to condensed matter physics.
In the process we also highlighted a difference between MBPT and perturbation theory in single-particle quantum mechanics (let’s call it single-particle perturbation theory, or SPPT): the output of MBPT is the excitations of the system from the ground state, while SPPT yields a better description of the ground state and excited states of the system under study (a more complete picture is possible in SPPT due to the simplicity of the single-particle problem relative to the many-body problem). This difference is important to keep in mind when becoming familiar with many-body perturbation theory. MBPT also has the same difference with so-called model hamiltonian approaches in many-body physics, which try to solve for the exact ground state and wavefunctions for a hamiltonian with restricted interactions acting in a restricted Hilbert space. It is also important to know where MBPT and SPPT meet. Where the two meet is time-dependent SPPT, which involves excitations of the single particle under study to higher states via time-dependent potentials, usually due to electromagnetic (EM) fields. That MBPT meets SPPT when considering SPPT for time-dependent EM fields is not terribly surprising, since EM fields are composed of photons and SPPT is implicitly including many-body interactions when interactions with light are considered. Another way to view the difference between MBPT and SPPT without time-dependent interactions is that the former gives dynamics, while the latter deals with statics. Excitations are fundamentally dynamic, while the ground state is firmly a static concept. This again makes it unsurprising that MBPT and SPPT meet at time-dependent SPPT. Having established that MBPT gives excitations from the ground state and is a dynamic theory, we can expect to see many time- and frequency-dependent quantities appear in the theory. This is indeed the case, as you have the time-dependent Green’s function, self energy, susceptibility, etc. that appear in the theory.
This is in contrast to static theories of the ground state, such as density functional theory (DFT) or the Hartree-Fock approximation (HFA), which are mean-field theories involving static interaction potentials. Indeed, these theories yield the mean-field starting point needed for performing MBPT calculations. The only thing we need now to get started doing MBPT calculations, besides all the formal machinery and expressions I’ve omitted, is to select the inter-particle interactions that we think are important and perform our infinite-order MBPT calculation using this interaction. This interaction will explicitly involve multiple particles, unlike the mean-field potential. What governs our choice of interaction? We again want to faithfully represent the inter-particle interaction as best as possible by including the most important interactions, and we also need an interaction that is tractable, i.e. we need to be able to do the infinite-order summation required in MBPT. These are two quite strict requirements that are not easy to meet for any arbitrary interaction between particles. Hence, standard interactions, such as those given by Hartree theory or Hartree-Fock theory, are most often used in MBPT since they are both tractable and treat the most important physics. Most of our work as MBPT practitioners is to apply these tried-and-true theories to increasingly large, complex and realistic systems, and to understand the theories so we can make needed modifications, such as including the effect of the substrate on electron energy levels and dynamics. Working out completely new theories is in general very hard, so clever modifications are usually the best use of one’s time. It is my hope that some of what I have written above is at least somewhat useful to someone new to the subject of MBPT.
It took me a long time and many readings of the standard MBPT textbooks, such as Mattuck [79], Fetter and Walecka [28], Mahan [75], and Ziman [103], as well as the Hedin-Lundqvist paper [43], to piece together the relation between MBPT and quantum field theory as it is standardly taught, the relation between perturbation theory in single-particle quantum mechanics and MBPT, the fact that to get scattering away from the ground state you have to have time-dependent interactions (so that dynamics ∼ excitations ∼ scattering), the fact that MBPT involves perturbation series that are summed to infinite order because the interactions in condensed matter systems are strong enough to require such a treatment in order to get trustworthy results, and many of the other intricacies of MBPT. I hope the above description makes this understanding a bit easier for future students. Given this basic understanding of many-body perturbation theory, I will now proceed to cover some of the usual material on the full many-body Hamiltonian, mean-field theories and many-body perturbation theory, albeit in an abbreviated fashion since such things have been covered very well in many of the standard references in the field.

1.2 The many-body hamiltonian

The many-body hamiltonian for a collection of nuclei and their corresponding electrons is given by [77]

$$\hat{H} = -\sum_i \frac{\hbar^2}{2m}\nabla_i^2 - \sum_{i,I} \frac{Z_I e^2}{|\mathbf{r}_i - \mathbf{R}_I|} + \frac{1}{2}\sum_{i \neq j} \frac{e^2}{|\mathbf{r}_i - \mathbf{r}_j|} - \sum_I \frac{\hbar^2}{2M_I}\nabla_I^2 + \frac{1}{2}\sum_{I \neq J} \frac{Z_I Z_J e^2}{|\mathbf{R}_I - \mathbf{R}_J|} \qquad (1.1)$$

where e is the quantum of charge, ħ is Planck’s constant, m is the mass of the electron, r_i is the position of electron i, R_I, Z_I and M_I are the position, charge and mass of nucleus I, respectively, and ∇_i (∇_I) is the gradient operator for electron i (nucleus I). The first term gives the kinetic energy of the electrons, the second term the attractive coulomb interaction between the electrons and nuclei, the third term the repulsive coulomb interaction between electrons, the fourth term the kinetic energy of the nuclei, and the fifth term the repulsive coulomb interaction between nuclei. Almost all effort in understanding the behavior of various condensed matter systems is focused on the third term above, for reasons that will become obvious shortly. The hamiltonian in (1.1) is obviously intractable in solid state systems, involving on the order of Avogadro’s number (∼10²³) of particles. Even if one could solve it exactly, the many-body wavefunction that would result would be so complicated as to be of little use in understanding the system at hand. Clearly some simplifications need to be made if we are to make headway in understanding the behavior of solids from quantum mechanical principles alone (or first principles, to use the common phrase).

1.2.1 The Born-Oppenheimer Approximation

The first important approximation is based on the insight that the nuclei, being much more massive than the electrons in the system, move much more slowly than the electrons. The consequence of this is that the nuclear kinetic energy is small and can therefore be neglected in the hamiltonian (i.e., we get rid of the fourth term in (1.1)). This is the so-called Born-Oppenheimer or adiabatic approximation. The effect of the nuclei on the electrons in the system is as an external potential, and we can take this into account by considering the nuclear coordinates as parameters in equation (1.1) [77]. Practically, what this means is that, if we are unsure about the positions of the nuclei in a given condensed matter system, we simply initialize them to some values (often based on experiment, or some intuition about the structure of the system at hand), compute the fifth term in (1.1) based on this initial guess, solve the electronic hamiltonian (the first, second, and third terms in (1.1)) for this set of nuclear coordinates, combine the electronic and nuclear contributions to get the total energy, and then update the nuclear positions to a new set that we suspect will lower the total energy (based on some algorithm - there are many). The set of nuclear coordinates for which the total energy is lowest is the theorist’s answer for the material structure. Now, there are often many approximations that go into this process, so this structure may not exactly match experiment, but this is in principle the process that must be carried out. Before we move onto further approximations that help make the many-body hamiltonian manageable, a quick mention of how the fifth term in (1.1) is computed: it is through the technique of Ewald summations, which are actually quite fun. See appendix F in Martin’s book for a clear exposition of this method [77].
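The relax-and-recompute loop described above can be sketched with a toy one-dimensional energy surface. The Morse-like potential, the step size, and the convergence threshold below are all invented for illustration; real structural relaxation uses the computed total energy and forces, not an analytic formula:

```python
import numpy as np

def total_energy(R):
    """Toy Born-Oppenheimer energy surface for one nuclear coordinate:
    a Morse-like form standing in for E_electronic(R) + E_nuclear(R),
    with its minimum placed at R = 2.0 (arbitrary units)."""
    return (1.0 - np.exp(-(R - 2.0)))**2

def relax(R0, step=0.1, tol=1e-8, max_iter=10000):
    """Steepest-descent relaxation: at each iteration, 'solve the
    electronic problem' at fixed R (here: evaluate total_energy),
    estimate the force, and move the nucleus downhill."""
    R = R0
    h = 1e-6  # finite-difference step for the force
    for _ in range(max_iter):
        force = -(total_energy(R + h) - total_energy(R - h)) / (2 * h)
        if abs(force) < tol:   # forces vanish at the relaxed structure
            break
        R += step * force
    return R

R_eq = relax(R0=3.0)   # converges to the minimum near R = 2.0
```

The same pattern (energy/force evaluation inside an outer geometry-update loop) underlies real relaxation algorithms; only the inner evaluation is vastly more expensive.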

1.2.2 Symmetries : translational and point-group

Having computed the contribution to the total energy coming from nuclei-nuclei repulsion, we are left with only the electronic hamiltonian

$$\hat{H}_{el} = -\sum_i \frac{\hbar^2}{2m}\nabla_i^2 - \sum_{i,I} \frac{Z_I e^2}{|\mathbf{r}_i - \mathbf{R}_I|} + \frac{1}{2}\sum_{i \neq j} \frac{e^2}{|\mathbf{r}_i - \mathbf{r}_j|} \qquad (1.2)$$

This remains a considerable problem to solve, with ∼10²³ particles to consider still for solid state systems. The key ingredient that allows us to reduce the problem size by many orders of magnitude (∼23 orders of magnitude for bulk systems) is the use of symmetry, especially discrete translational symmetry. I will describe these simplifications very briefly below, as they are covered very well in many standard textbooks. The brief coverage here is not meant to downplay the importance of these topics, but instead is a reflection of the quality of coverage elsewhere. Indeed, without the reductions allowed by symmetry the many-body problem in solids would very much still be intractable. The idea that symmetry allows us to reduce our problem size by ∼23 orders of magnitude is truly revolutionary, and it is worth taking a moment to pause and appreciate this beautiful help that nature has given us. The effect of the discrete translational symmetry present in crystalline systems is distilled in Bloch’s theorem, which says that upon moving between different identical copies of the fundamental unit cell in a crystal, the electron wavefunction changes by only a phase factor [77]. The exact value of this phase is determined by the quantum number associated with translational symmetry - the wavevector - and the vector connecting the different unit cells. Since this change between different unit cells is quite small and easily quantified, what this important theorem tells us is that we need only solve our problem in the unit cell of the crystal under study. This achieves the aforementioned massive reduction in problem size for crystalline systems. Thank you, Felix Bloch! And thank you, nature. Further reduction of the problem size can be made by making use of the point-group symmetries of the system under study. While of somewhat less importance than Bloch’s
theorem in reducing the problem size, point-group symmetries do reduce the amount of work theorists have to do by roughly an order of magnitude in systems with significant symmetry [77].
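Bloch’s theorem is easy to verify numerically for a model wavefunction. The sketch below builds ψ_k(x) = e^{ikx} u(x) with an arbitrary cell-periodic u(x); the lattice constant, wavevector, and form of u are invented for illustration. Translating by one lattice vector multiplies ψ by the pure phase e^{ika}, leaving |ψ| unchanged:

```python
import numpy as np

a = 1.7   # lattice constant (arbitrary units)
k = 0.6   # crystal momentum (arbitrary value in the first Brillouin zone)

def u(x):
    """A cell-periodic function, u(x + a) = u(x), standing in for the
    periodic part of the Bloch wavefunction. Kept positive so we can
    safely divide by psi below."""
    return 2.0 + np.cos(2 * np.pi * x / a) + 0.3 * np.sin(4 * np.pi * x / a)

def psi(x):
    """Bloch wavefunction psi_k(x) = exp(i k x) u(x)."""
    return np.exp(1j * k * x) * u(x)

x = np.linspace(0.0, a, 50, endpoint=False)
# Bloch's theorem: psi(x + a) = exp(i k a) * psi(x) for every x.
phase = psi(x + a) / psi(x)
```

The ratio `phase` is the same constant e^{ika} at every sampled point, which is exactly the statement that only the phase, not the magnitude, of the wavefunction changes between unit cells.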

1.2.3 Planewaves and Pseudopotentials

We have obtained a substantial reduction in problem complexity and size via the Born-Oppenheimer approximation and the use of symmetry. Now we can legitimately think about solving the hamiltonian in (1.2). The remaining challenges include how to treat the troublesome electron-electron repulsion term and how to treat the interaction of the electrons with the nuclei (the kinetic energy is quite easy to treat, as mentioned earlier). The next section on mean-field theories will tackle the complexities of the electron-electron interactions, so we focus here on the electron-nuclei interactions. The problem of how to treat the electron-nuclei interactions is closely related to the question of which basis to use in representing the electronic wavefunction (all practical electronic structure calculations use some basis to represent the electronic wavefunction). The wavefunctions of valence electrons vary rapidly as a function of space in the vicinity of the nucleus, so common bases used in toy problems in textbooks, such as planewaves or real-space points, would require the use of many basis vectors in order to adequately capture this variation. The solution of the many-body problem using such a large basis would be almost impossible, even on modern computers. One solution to this problem is to use basis functions that are suited to the rapid variations present around the nucleus, such as a combination of spherical Bessel functions and spherical harmonics. This is the approach of so-called “all-electron” methods, which include all of the electrons of the system explicitly in the calculation and hence have to deal with the complexity of the wavefunction near the nuclear cores. Such methods include the linearized muffin-tin orbital (LMTO) and LAPW (linearized augmented plane waves) methods [77].
Another approach, first tried by Enrico Fermi [27, 44], refined by Antoncik, Phillips and Kleinman [3, 85], and implemented in a practical, general way by Cohen, Heine, and Hamann [19, 39], is to recognize the fact that the electrons in the inner shells of atoms are tightly bound to the nucleus and are hence relatively inert, so that the nucleus and core electrons can be grouped together as an “ion core” with which the outer, valence electrons interact. Since the valence electrons are the only electrons included explicitly in the hamiltonian and the core electrons are included in an effective “pseudopotential” along with the nuclei, the radial wavefunctions of the valence electrons will be nodeless (since they will be the lowest energy eigenstates of the hamiltonian with the pseudopotential in place of the true electron-nuclei interaction). These smooth, nodeless valence wavefunctions can be expanded in simple bases such as plane waves with a reasonably sized basis set.

This methodology, the so-called planewave pseudopotential (PW-PP) method, has a long history in condensed matter physics and is to this day one of the most common methods of performing electronic structure calculations. The simplicity of planewaves makes the implementation of new physical ideas and methods quite straightforward and less prone to errors (in derivations and coding) than other, more complicated bases. Indeed, new methods or physical approximations are often first attempted with plane waves and pseudopotentials, and many methods are primarily available in the PW-PP methodology due to the ease of convergence, scalability, and simplicity of the underlying expressions [10, 9, 34, 33, 17, 78]. However, the use of pseudopotentials is a significant approximation that fails for some atoms in its original form and requires innovations on top of the simple formulation discussed above in order to be able to give reliable results for all atoms in the periodic table. Specifically, the pseudopotential approximation has three sources of error relative to including all of the electrons explicitly in the calculation: 1) freezing the core electrons at the atomic level and not allowing them to relax in the solid state environment, 2) partitioning the electron-electron interaction into core and valence terms artificially and representing the effect of the core only through the pseudopotential, and 3) using smooth pseudowavefunctions in the place of the true all-electron wavefunction [32, 66]. For ground state calculations, these sources of error are generally quite small (although the partitioning can be problematic for systems where the core and valence charge densities overlap significantly [72]) and the PW-PP approximation is quite accurate. However, for MBPT calculations, these approximations become more severe due to the dependence of the quantities computed (e.g.
the bare exchange) on the details of the electronic wavefunction, and the PW-PP approximation is somewhat less reliable for MBPT calculations (although still quite reliable on the whole - the relative error due to the PW-PP approximation is still quite small for MBPT calculations for many systems, just larger than for ground state calculations). This will be discussed in detail later, as it is a main topic of this thesis. Let’s now briefly discuss how pseudopotentials are actually generated, for reference. After generating the reference all-electron wavefunctions and energies for a given atom, it is a simple three-step process (which is performed for each angular-momentum channel). First, a set of conditions is chosen for getting a suitable radial pseudowavefunction from the corresponding all-electron radial wavefunction (the angular part of the wavefunction is taken to be a spherical harmonic; since the ionic core potential is approximately spherically symmetric, the separation of the full wavefunction into a product of a radial wavefunction and spherical harmonic is employed). Such conditions include matching the value of the wavefunction and its derivative at a given radius r_c (usually chosen to be slightly past the outermost peak of the AE wavefunction), including the same amount of charge inside the chosen radius for your pseudowavefunction as is in the AE wavefunction (norm conservation), having the same eigenvalues for the AE wavefunction and pseudowavefunction, and various conditions to improve the smoothness of the pseudowavefunction. The second step is then to invert the radial Schrodinger equation to get the pseudopotential for the given angular momentum channel (the pseudopotential is angular-momentum dependent). Finally, the pseudopotential is unscreened by subtracting off the contribution due to the valence electrons, leaving only the contribution due to the ion cores [77].
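The second step, inverting the radial Schrodinger equation, can be illustrated with a closed-form example. Below, the hydrogen 1s radial function u(r) = r e^{-r} (atomic units, eigenvalue ε = −0.5 Ha) stands in for a nodeless pseudowavefunction; inverting −u″/2 + V u = εu gives V(r) = ε + u″/(2u), which should recover the Coulomb potential −1/r. Real pseudopotential generators perform this same inversion on the constructed pseudowavefunction, channel by channel; the grid and function here are purely illustrative:

```python
import numpy as np

# Atomic units (hbar = m = e = 1). Nodeless model radial function
# u(r) = r * exp(-r): the hydrogen 1s function u = r*R(r), with
# known eigenvalue eps = -0.5 Ha.
eps = -0.5
r = np.linspace(0.5, 8.0, 400)
h = r[1] - r[0]

u = r * np.exp(-r)
# Central finite-difference second derivative (np.roll wraps at the
# ends, so the two endpoints are dropped below).
u2 = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / h**2

# Invert the radial Schrodinger equation: V(r) = eps + u''/(2u).
V = eps + u2 / (2 * u)
V_inner, r_inner = V[1:-1], r[1:-1]   # discard the wrapped endpoints
```

On the interior grid points `V_inner` matches −1/r to finite-difference accuracy, recovering the potential that produced the input wavefunction and eigenvalue.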
There are some details of PP generation that are omitted in the above coverage, but the most important details are included. More details can be found in the book by Martin [77]. PP generation has been called a dark art by some of those in the field because generating a quality PP is generally not a completely straightforward process. The conditions that are applied in order to get a pseudowavefunction from an AE wavefunction mean that there is flexibility in defining the PP, which can be very helpful in getting the smoothest or most accurate PP. Conversely, it can create a large parameter space through which one must search in order to get a desirable representation of the interaction with the ion cores. Also, depending on the conditions put on the pseudowavefunction during PP generation, the equations may not in general be solvable for certain choices of parameters. This adds another level of frustration, when certain parts of parameter space are unavailable and it is unclear a priori why this is so. This is an active challenge to high-throughput calculations based on the PW-PP methodology and presents a challenge to future researchers. Some work in the group has been done, e.g. by Brad Malone and Zhenglu Li, to create scripts that generate pseudopotentials for a range of parameters and attempt to recommend the best possible one. This work will likely need to be expanded in the coming years to meet the demand.

1.3 Mean-field theories

As discussed above, mean-field theories, such as Hartree theory, Hartree-Fock theory, and density functional theory (DFT), are theories for the static ground state of various condensed matter systems, and they represent the interactions between particles in terms of an average potential, or mean field. They share the property of being self-consistent field theories, in which an initial guess is made for the wavefunctions or density of electrons in the system, the relevant equations are solved, the wavefunctions and densities are updated, and this process is continued until self-consistency is reached. They also share the property that the fundamental equations defining each theory are obtained by minimizing the theory’s total energy with respect to variations in the wavefunctions. All the fundamental equations for single-particle orbitals in the different theories resemble the single-particle Schrodinger equation, but with more complicated interactions than are usually seen in single-particle quantum mechanics. This resemblance is both comforting and deceptive, as we know and love the Schrodinger equation, but the superficial resemblance of all the equations can mask the differences in the theories and the assumptions made in the derivation of each theory’s field equation. Indeed, all of these theories are derived based on different assumptions, and consideration of the assumptions made in the derivations and subsequent approximations of important quantities (especially in the case of DFT) can yield considerable insight into the successes and failures of these theories. We will discuss each one in succession, before focusing on some of the details of density functional theory that will be relevant for some of the research presented later.
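The self-consistency cycle common to all these theories can be sketched in a few lines. The toy model below is a 1D lattice with a made-up mean-field term g·n(x) standing in for the Hartree (and exchange-correlation) potentials; the grid size, electron count, coupling, and mixing fraction are all illustrative choices, but the guess → solve → update → mix loop is the real pattern:

```python
import numpy as np

# Toy self-consistent-field loop: N electrons on a 1D lattice whose
# mean-field hamiltonian depends on the density it produces.
M, N, g = 40, 4, 0.5                      # grid points, electrons, coupling
T = 2 * np.eye(M) - np.eye(M, k=1) - np.eye(M, k=-1)  # kinetic energy
x = np.linspace(-1.0, 1.0, M)
V_ext = 5.0 * x**2                        # external (ionic) potential

n = np.full(M, N / M)                     # initial guess for the density
for _ in range(500):
    H = T + np.diag(V_ext + g * n)        # build the mean-field hamiltonian
    eigs, phi = np.linalg.eigh(H)         # solve the single-particle problem
    n_new = (phi[:, :N] ** 2).sum(axis=1) # density from occupied orbitals
    if np.max(np.abs(n_new - n)) < 1e-10: # self-consistency reached
        break
    n = 0.5 * n + 0.5 * n_new             # linear mixing for stability
```

The linear mixing in the last line is the simplest of many update schemes; production codes use more sophisticated mixers, but the fixed-point structure of the loop is the same.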

1.3.1 Hartree theory

The Hartree field equations are obtained by assuming that the many-body wavefunction can be expressed as a product of single-particle orbitals and then minimizing the total energy with respect to variations in the single-particle orbitals assuming such a wavefunction [55]

$$\left[-\frac{\hbar^2}{2m}\nabla^2 + V_{ion}(\mathbf{r}) + e^2 \sum_{j \neq i} \int d^3r' \, \frac{|\phi_j(\mathbf{r}')|^2}{|\mathbf{r}' - \mathbf{r}|}\right] \phi_i(\mathbf{r}) = \epsilon_i \phi_i(\mathbf{r}) \qquad (1.3)$$

where we have written the interaction of an electron with all the ion cores in the system as

$$V_{ion}(\mathbf{r}) = -e^2 \sum_I \frac{Z_I}{|\mathbf{r} - \mathbf{R}_I|} \qquad (1.4)$$

where Z_I is now the charge of the ion (nuclear charge plus tightly bound electron charge), assuming we are using pseudopotentials (for all-electron calculations it is still just the nuclear charge). The assumed form of the many-body wavefunction in Hartree theory is in direct contradiction to the Pauli exclusion principle and the corresponding need for anti-symmetrization of electron wavefunctions, so it is not surprising that Hartree theory does not in general give great results for most systems and is mostly of interest for historical reasons and as the simplest possible approximation for the many-body wavefunction.

1.3.2 Hartree-Fock theory

The Hartree-Fock field equations are obtained by assuming that the many-body wavefunction can be expressed as an anti-symmetrized product of single-particle orbitals and then minimizing the total energy assuming such a wavefunction [55]

$$\left[-\frac{\hbar^2}{2m}\nabla^2 + V_{ion}(\mathbf{r}) + e^2 \sum_j \int d^3r' \, \frac{|\phi_j(\mathbf{r}')|^2}{|\mathbf{r}' - \mathbf{r}|}\right] \phi_i(\mathbf{r}) - e^2 \sum_j \int d^3r' \, \frac{\phi_j^*(\mathbf{r}')\phi_i(\mathbf{r}')}{|\mathbf{r}' - \mathbf{r}|} \, \phi_j(\mathbf{r}) = \epsilon_i \phi_i(\mathbf{r}) \qquad (1.5)$$

The Hartree-Fock approximation (HFA) is a significant improvement over Hartree theory, since it contains the effect of the Pauli exclusion principle explicitly. Indeed, the HFA is the basis for many higher accuracy methods in the field of quantum chemistry. For solids, however, the HFA is problematic since it only contains exchange effects but no correlation effects and no screening. This can lead to unphysical results in solids, such as the density of states being zero at the Fermi level for the homogeneous electron gas [7].

1.3.3 Density functional theory

Density functional theory is a bit more complicated than either Hartree theory or Hartree-Fock theory in terms of its underlying theoretical underpinnings. I will not attempt to go into the intricacies in any detail, but instead I will just give a quick overview. First of all, Hohenberg and Kohn [45] showed using two simple proofs that:

1) The total energy is a functional of the density. To derive this result, Hohenberg and Kohn used a simple proof by contradiction to show that the external potential (e.g. the potential due to all the ion cores) uniquely determines the density. Since the external potential and the number of particles determine the many-body wavefunction, the total energy, which is a functional of the many-body wavefunction, must also be a functional of the density.

2) The total energy functional has its minimum at the correct, physical density of the system under study. To show this, just note that, by the first theorem, if a system has a different density than the true, physical density, then it has a different many-body wavefunction. Since this is not the ground-state many-body wavefunction, the energy of this state must be higher.

These are some very nice theorems, proven very simply. However, they give no insight as to what the nature of the energy functional might be. Further work is needed to make these theorems useful. The next key breakthrough came from Kohn and Sham [58], who proposed that the density for a system of interacting particles can be obtained by solving auxiliary equations for a corresponding system of non-interacting particles [77]. Since the prescription for such a mapping is not unique, and in fact it is not known whether such a mapping exists in general, this proposal is an ansatz (the Kohn-Sham ansatz). This simplifies the problem, because the many-body wavefunction of a set of non-interacting particles can be written as
a single Slater determinant of the N lowest-energy eigenstates of the single-particle equation containing the potential V_XC[n(r)] (see the next section for why this is). V_XC[n(r)] is known as the exchange-correlation potential and contains all the complicated many-body effects. With this ansatz, the Kohn-Sham equation is obtained simply as a minimum of the total energy with respect to variations in the single-particle wavefunctions

$$\left[-\frac{\hbar^2}{2m}\nabla^2 + V_{ion}(\mathbf{r}) + e^2 \sum_j \int d^3r' \, \frac{|\phi_j(\mathbf{r}')|^2}{|\mathbf{r}' - \mathbf{r}|} + V_{XC}[n(\mathbf{r})]\right] \phi_i(\mathbf{r}) = \epsilon_i \phi_i(\mathbf{r}), \qquad (1.6)$$

which is updated self-consistently along with the density, which is defined as the sum of the absolute squares of the occupied orbitals. The exchange-correlation potential is defined as the functional derivative of the exchange-correlation energy functional E_XC[n(r)] with respect to the density. Density functional theory, or DFT, performed using the Kohn-Sham framework is known as Kohn-Sham DFT. The challenge in the Kohn-Sham framework is to find an adequate expression for E_XC[n(r)]. Some expressions for E_XC[n(r)] have been obtained by parameterizing total energy vs. density data from Quantum Monte Carlo calculations on the uniform electron gas to various functional forms of the density, including local and semi-local functionals. The resulting functionals work quite well for systems in which the electrons are not overly localized, as in many metals and main-group semiconductors such as Si or GaAs. For systems with localized electrons, these functionals do not work very well due to the problem of self-interaction that plagues such functionals (see details in Martin [77]). More advanced schemes, such as hybrid functionals or orbital-dependent interactions, are needed to handle such systems. Incidentally, Burke and co-workers [97] have shown that in 1D the Kohn-Sham exchange-correlation potential always exists for systems with reasonable physical parameters, even for systems with localized electrons. If these results extend to 3D, then it is possible that, with a good enough E_XC[n(r)], Kohn-Sham DFT will one day be useful on systems with localized electrons as well, without further modification. That would be nice.
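As a concrete example of a local functional, the exchange part of the LDA has the closed Dirac/Slater form E_x[n] = −(3/4)(3/π)^{1/3} ∫ n(r)^{4/3} d³r, whose functional derivative gives the exchange potential V_x(r) = −(3n(r)/π)^{1/3} (the correlation part, which is what is parameterized from Quantum Monte Carlo data, is omitted here). The sketch below checks the analytic derivative against a numerical one on an arbitrary density grid:

```python
import numpy as np

# LDA exchange in Hartree atomic units (Dirac/Slater form):
#   E_x[n] = Cx * integral of n^(4/3),  Cx = -(3/4)(3/pi)^(1/3)
#   V_x(n) = dE_x/dn = (4/3) * Cx * n^(1/3) = -(3n/pi)^(1/3)
Cx = -(3.0 / 4.0) * (3.0 / np.pi) ** (1.0 / 3.0)

def e_x_density(n):
    """Exchange energy per unit volume at local density n."""
    return Cx * n ** (4.0 / 3.0)

def v_x(n):
    """Exchange potential: functional derivative of E_x wrt the density."""
    return (4.0 / 3.0) * Cx * n ** (1.0 / 3.0)

# Check V_x against a numerical derivative of the energy density.
n = np.linspace(0.1, 2.0, 20)   # arbitrary range of densities
h = 1e-6
v_numeric = (e_x_density(n + h) - e_x_density(n - h)) / (2 * h)
```

Because the functional is local, the functional derivative reduces to an ordinary derivative of the energy density with respect to n, which is what the finite-difference check exploits.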

A final note as we close our discussion of mean-field theories. The ε_i that appear in all the equations above are Lagrange multipliers that arise during the variational minimization of the total energy, and there is no formal justification for equating them with the energies of the single-particle states φ_i(r). Nonetheless, it is common practice to interpret ε_i as the energies of the single-particle states, and the results are generally reasonable, especially for band dispersions in solids. Band gaps, however, are not well-reproduced due to the well-known derivative discontinuity of DFT [77]. This is why, for band gaps, we need a more advanced theory, like hybrid functionals, or, even better, the GW approximation. After a short interlude about Slater determinants and particle interactions, we will turn to the GW approximation, which has been used for the bulk of the results in this thesis.

1.3.4 Single Slater determinants and particle interactions

I do not think it is ever adequately discussed why one can write the many-body wavefunction of a system of non-interacting particles as a single Slater determinant and this is guaranteed to minimize the total energy, while for interacting particles this is not the case. So, I will endeavor to do so here. For the non-interacting case, the particles do not interact with each other, so the potential the particles feel can only depend on the position of the particle in space. This means that we can simply solve the single-particle Schrodinger equation for the first N eigenvalues and eigenstates for our N-electron system subject to the potential V(r) and then form a Slater determinant |Ψ_0⟩ from these lowest energy states. The key point is that, for a one-body potential V(r), the addition of more Slater determinants to this determinant cannot possibly lower the energy of the system of electrons. To see why the addition of more Slater determinants does not lower the total energy, let’s think about the problem systematically. Let’s imagine, without loss of generality, that we form our new single Slater determinants |Ψ_jk⟩ by replacing one of the N lowest energy eigenstates φ_j (j ≤ N) with a state of higher energy φ_k (k > N). Now, when you calculate the matrix elements of the Hamiltonian between |Ψ_0⟩ and these new, slightly different Slater determinants, the result will always be zero because there will always be an integral over a product φ_k*(r)φ_j(r) (this result can be obtained by following the usual Hartree-Fock evaluation of the total energy, but using different Slater determinants for the bra and ket). So, the total energy, which is just a sum over all the matrix elements of the Hamiltonian, will just be a sum of the energies of the individual Slater determinants.
Since the other Slater determinants are known to have higher energy for the given Hamiltonian under study, their addition to |Ψ_0⟩ increases the total energy. So, for the non-interacting problem |Ψ_0⟩ really does give the ground state. When instead you have a two-body potential, like the Coulomb interaction between electrons, this argument no longer holds. Instead, for a two-body potential the matrix elements between |Ψ_0⟩ and the different |Ψ_jk⟩ can in general be non-zero (orbital orthogonality no longer kills the coupling between different Slater determinants, due to the presence of the Coulomb interaction between particles in different orbitals). Since you have coupling between the different Slater determinants, the many-body wavefunctions that include the Slater determinants formed from single-particle states of higher energy can possibly lower the total energy! For example, if you solve the Hartree-Fock equations you will get the single Slater determinant of lowest possible energy |Ψ_0⟩ by forming a Slater determinant from the N lowest eigenstates (as discussed above). However, the true many-body wavefunction is likely composed of this Slater determinant along with admixtures of the |Ψ_jk⟩, since these admixtures can lower the total energy of the N-electron system.
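The orthogonality argument above can be checked numerically for a small one-body hamiltonian. By the Slater-Condon rules, the matrix element of a one-body operator between |Ψ_0⟩ and a determinant differing in one orbital reduces to the single matrix element ⟨φ_j|h|φ_k⟩, which vanishes for j ≠ k in the eigenbasis of h. The 6-orbital random hamiltonian below is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric one-body hamiltonian on a 6-orbital basis.
A = rng.standard_normal((6, 6))
h = (A + A.T) / 2
eps, phi = np.linalg.eigh(h)   # eigenvalues and orthonormal eigenvectors

# <Psi_0| H |Psi_jk> for determinants differing in one orbital reduces
# to <phi_j| h |phi_k>; in the eigenbasis of h this is
# eps_k * <phi_j|phi_k> = 0 for j != k.
coupling = phi.T @ h @ phi                          # matrix of <phi_j|h|phi_k>
off_diag = coupling - np.diag(np.diag(coupling))    # should vanish
```

The diagonal of `coupling` recovers the eigenvalues, while every off-diagonal element (the coupling between determinants) is zero to machine precision, which is exactly why the single determinant of the N lowest eigenstates is the non-interacting ground state.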

1.4 The GW approximation

Having reviewed mean-field theory, let's now discuss many-body perturbation theory, especially the GW approximation. I do not wish to re-hash what is covered much better in the quality many-body theory books cited above, but I will again present the information that is important for this thesis as clearly as possible. In many-body perturbation theory (MBPT) the poles of the Green's function $G(r, r', \omega)$ (also known as the propagator) give the energies for excitation of particles from the ground state. The Green's function in the time domain gives the amplitude for the process by which a particle is created at one point and one time and is later destroyed at a different point and time. These facts come out of the machinery of MBPT or QFT.

So, how do we obtain the Green's function for the systems in which we are interested? We use exactly the process discussed in the first section of this chapter. Before we get into the equations, a discussion of this process in words is in order. First, a mean-field calculation is performed to obtain single-particle eigenfunctions and eigenenergies. These are used to build the non-interacting Green's function $G_0(r, r', \omega)$. Since the Green's function tells us how particles propagate in our system, it also gives us information about how the system polarizes, as polarization occurs when charges of opposite sign propagate to different parts of the solid. The quantity that encodes this polarization is known as the polarizability $\chi(r, r', \omega)$, and when we are talking about the polarization from the propagation of non-interacting particles, the relevant quantity is known as the irreducible polarizability $\chi_0(r, r', \omega)$. We can then use this irreducible polarizability to construct the inverse dielectric matrix, which tells us how the Coulomb interaction between particles is weakened by this polarization (or screening).
Finally, with this suitably weakened Coulomb interaction, we can obtain a modified Green's function, $G(r, r', \omega)$, by allowing the originally non-interacting particles to interact weakly via the screened Coulomb interaction. This is the so-called interacting Green's function. The effect of these relatively weak interactions with other particles can be grouped into a quantity known as the self energy $\Sigma(r, r', \omega)$. Because the interactions in solids are not very weak, the equation that relates the renormalized Green's function $G$ to the non-interacting

Green's function $G_0$, a perturbation series known as Dyson's equation, must be summed to infinite order.

In equations, after performing our mean-field calculation to obtain mean-field wavefunctions $\psi_{nk}(r)$ and energies $\epsilon_{nk}$, we construct the non-interacting Green's function as follows [43]

$$G_0(r, r', \omega) = \sum_{nk} \frac{\psi_{nk}(r)\,\psi^*_{nk}(r')}{\omega - \epsilon_{nk} + i\delta_{nk}}, \qquad (1.7)$$

where we are now assuming the system under study is a solid, so we have an additional Bloch index $k$. From this we construct the irreducible polarizability as the convolution $G_0 * G_0$ [88]

$$\chi_0(r, r', \omega) = \frac{1}{2}\sum_n^{\mathrm{occ}}\sum_{n'}^{\mathrm{emp}} \psi^*_{n}(r)\,\psi_{n'}(r)\,\psi^*_{n'}(r')\,\psi_{n}(r')\left[\frac{1}{\epsilon_{nk+q} - \epsilon_{n'k} - \omega - i\delta} - \frac{1}{\epsilon_{nk+q} - \epsilon_{n'k} + \omega + i\delta}\right] \qquad (1.8)$$

We then construct the dielectric matrix in the linearized time-dependent Hartree (LTH) approximation (better known by the terrible name the random phase approximation, or RPA) [43]

$$\epsilon(r, r', \omega) = \delta(r, r') - \int dr''\, v(r, r'')\,\chi_0(r'', r', \omega), \qquad (1.9)$$

invert it, and use the inverse dielectric matrix to get the screened Coulomb interaction

$$W_0(r, r', \omega) = \int dr''\, \epsilon^{-1}(r, r'', \omega)\, v(r'', r') \qquad (1.10)$$

Using the screened Coulomb interaction and the non-interacting Green's function, we can obtain the self energy as

$$\Sigma(r, r', \omega) = \frac{i}{2\pi}\int_{-\infty}^{\infty} d\omega'\, G_0(r, r', \omega - \omega')\, W_0(r, r', \omega')\, e^{-i\delta\omega'}. \qquad (1.11)$$

This expression for the self energy is called the GW approximation and was originally derived by Hedin [40]. Solving Dyson's equation [75] for the interacting Green's function

$$G = G_0 + G_0 \Sigma G, \qquad (1.12)$$

gives

$$G^{-1} = G_0^{-1} - \Sigma. \qquad (1.13)$$

Projecting this equation onto the single-particle state $|nk\rangle$ we get

$$G^{-1}_{nk} = \langle nk|\,G^{-1}\,|nk\rangle = G^{-1}_{0,nk} - \Sigma_{nk} = \omega - \epsilon_{nk} - \Sigma_{nk}, \qquad (1.14)$$

which yields a renormalized, or interacting, Green's function given by

$$G_{nk}(\omega) = \frac{1}{\omega - \epsilon_{nk} - \Sigma_{nk}(\omega)} \qquad (1.15)$$

So, once the matrix element of the self energy is obtained for a given state, one can get the interacting Green's function for that state in the GW approximation. By solving for the poles of $G$ one gets the energies $E_{nk}$ of the quasiparticles in the system

$$E_{nk} = \epsilon_{nk} + \mathrm{Re}\,\Sigma_{nk}(E_{nk}) \qquad (1.16)$$

The above equation is called the quasiparticle equation, and solving it is often referred to as "solving Dyson's equation" because its solution gives the pole of the interacting Green's function on the LHS of Dyson's equation. From $G$ one can also get the spectral function $A$

$$A_{nk}(\omega) = \frac{1}{\pi}\left|\mathrm{Im}\,G_{nk}(\omega)\right| = \frac{1}{\pi}\,\frac{\left|\mathrm{Im}\,\Sigma_{nk}(\omega)\right|}{\left[\omega - \epsilon_{nk} - \mathrm{Re}\,\Sigma_{nk}(\omega)\right]^2 + \left[\mathrm{Im}\,\Sigma_{nk}(\omega)\right]^2} \qquad (1.17)$$

The spectral function is the link between what we calculate theoretically and what is measured experimentally, since the signal intensity in an angle-resolved photoemission (ARPES) experiment is proportional to the spectral function (the proportionality constant is a one-electron matrix element). This fact is derived in very straightforward fashion on p. 158 of the Hedin-Lundqvist review [43].
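Equations (1.15) and (1.17) are easy to illustrate numerically. The sketch below uses a made-up, frequency-independent self energy (not GW output): a constant $\Sigma = \Delta - i\Gamma$ turns the spectral function into a Lorentzian centered at $\epsilon_{nk} + \mathrm{Re}\,\Sigma$ with half-width $|\mathrm{Im}\,\Sigma|$.

```python
import numpy as np

eps_nk = 1.00                 # mean-field energy (eV), illustrative
sigma = 0.30 - 0.05j          # model constant self energy (eV)

omega = np.linspace(-1.0, 3.0, 4001)

# Interacting Green's function, Eq. (1.15), and spectral function, Eq. (1.17).
G = 1.0 / (omega - eps_nk - sigma)
A = np.abs(G.imag) / np.pi

peak = omega[np.argmax(A)]
weight = A.sum() * (omega[1] - omega[0])   # crude sum-rule check

print(peak)     # ~1.30 eV = eps_nk + Re(Sigma)
print(weight)   # ~1 (slightly less, since the Lorentzian tails are cut off)
```

A frequency-dependent $\Sigma(\omega)$ with sharp structure would instead produce satellite features in $A(\omega)$, which is the situation discussed in the following sections.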

1.5 Quasiparticle lifetimes

The non-interacting Green's function from equation (1.7) can be written in some single-particle basis as

$$G_{0,nk}(\omega) = \frac{1}{\omega - \epsilon_{nk} + i\delta_{nk}}. \qquad (1.18)$$

Fourier transforming equation (1.18) we get

$$G_{0,nk}(t) = e^{-i\epsilon_{nk}t}. \qquad (1.19)$$

This is a standard result. What we see from equation (1.19) is that non-interacting particles do not decay as a function of time, i.e. the Green's function has the same amplitude at all times. This makes sense in light of the prior discussion about interactions being needed to induce scattering and particle decay. We also see that if the single-particle energy $\epsilon_{nk}$ has a complex component, then the particle will decay with a lifetime that is inversely proportional to the imaginary component. Inspecting equation (1.15), we see that the self energy, which is complex, will give interacting particles a lifetime that is inversely proportional to $\mathrm{Im}\,\Sigma(E_{nk})$. This again makes sense in light of our prior discussion about interactions being needed to induce scattering and particle decay: the self energy contains the many-body interactions, so its inclusion in the Green's function via Dyson's equation gives the particles in the system (which are now interacting with one another) finite lifetimes. We will use the fact that $\mathrm{Im}\,\Sigma(E_{nk})$ gives the inverse lifetime to compute quasiparticle lifetimes due to electron-electron interactions in the GW approximation, and use the computed lifetimes to study hot carrier relaxation and impact ionization. Note that in practice we use the imaginary part of the self energy evaluated at the mean-field energy,

$\mathrm{Im}\,\Sigma(\epsilon_{nk})$, to get the quasiparticle lifetimes. This is to save computational expense.
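In convenient units, the conversion from $\mathrm{Im}\,\Sigma$ to a lifetime is a one-liner. The sketch below assumes the common convention $\tau = \hbar/(2|\mathrm{Im}\,\Sigma|)$, where the factor of 2 reflects the decay of the probability $|\psi|^2$ rather than the amplitude; conventions in the literature differ by this factor.

```python
HBAR_EV_FS = 0.6582  # hbar in eV * fs

def lifetime_fs(im_sigma_ev):
    """Quasiparticle lifetime in fs from Im Sigma in eV: tau = hbar / (2 |Im Sigma|)."""
    return HBAR_EV_FS / (2.0 * abs(im_sigma_ev))

# A state with |Im Sigma| = 50 meV lives for a few femtoseconds.
print(lifetime_fs(0.05))   # ~6.6 fs
```

Larger $|\mathrm{Im}\,\Sigma|$ (more phase space for scattering) means a shorter-lived quasiparticle, which is the microscopic content of the hot-carrier relaxation studies discussed later.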

1.6 Beyond GW: the cumulant expansion

In this section we go through the derivation of the expressions used to calculate the improved Green's function from the cumulant expansion (improved relative to the GW Green's function due to the inclusion of more interactions between particles). This follows the derivation in the review of Almbladh and Hedin [2], but fills in many details that are important for conceptual understanding. This derivation was done in conjunction with my colleague Dr. Johannes Lischner.

The basic idea of the cumulant expansion is to write the Green’s function Gn(t) for a hole in the state n as

$$G_n(t) = i\Theta(-t)\,e^{-i\epsilon_n t + C_n(t)} = G_{0,n}(t)\,e^{C_n(t)}, \qquad (1.20)$$

where $G_{0,n}(t) = i\Theta(-t)\,e^{-i\epsilon_n t}$ is the non-interacting Green's function, $C_n(t)$ is the cumulant, and $\epsilon_n$ is the mean-field energy for the state $n$ (we assume the Green's function is diagonal in some single-particle basis). We have suppressed the wavevector label for notational simplicity. $\Theta(t)$ is the Heaviside step function. We then write Dyson's equation in the time domain

$$i\frac{\partial}{\partial t}G_n(t) - \epsilon_n G_n(t) - \int_{-\infty}^{\infty}\Sigma_n(t-\tau)\,G_n(\tau)\,d\tau = 0, \qquad (1.21)$$

or

$$i\frac{\partial}{\partial t}\ln\left[e^{i\epsilon_n t}G_n(t)\right] = \int_{-\infty}^{\infty}\Sigma_n(t-\tau)\,G_n(\tau)\,d\tau\; G_n^{-1}(t). \qquad (1.22)$$

Everything we've done up to now is exact. We now make the crucial ansatz/approximation of this derivation (there are similar approximations in all other derivations), which is to replace the Green's functions on the right-hand side with $G_{0,n}(t) = i\Theta(-t)\,e^{-i\epsilon_n t}$. Because we are considering holes we have $k < k_F$ and $t < 0$. With all of this, we get

$$i\frac{\partial}{\partial t}\ln\left[e^{i\epsilon_n t}G_n(t)\right] = \int_{-\infty}^{0}\Sigma_n(t-\tau)\,e^{i\epsilon_n(t-\tau)}\,d\tau = \int_{t}^{\infty}\Sigma_n(\tau)\,e^{i\epsilon_n\tau}\,d\tau, \qquad (1.23)$$

where we have used the variable change $\tau \to t - \tau$. Integrating with respect to $t$ (for the RHS, recall that $G_n(0) = i$ at zero temperature), we get

$$i\ln\left[\frac{e^{i\epsilon_n t}G_n(t)}{i}\right] = i\ln\frac{G_n(t)}{G_{0,n}(t)} = \int_0^t dt'\int_{t'}^{\infty}\Sigma_n(\tau)\,e^{i\epsilon_n\tau}\,d\tau, \qquad (1.24)$$

or (recall that there is an implicit $\Theta(-t)$ that we have been carrying around above),

$$G_n(t) = G_{0,n}(t)\exp\left[-i\int_0^t dt'\int_{t'}^{\infty}\Sigma_n(\tau)\,e^{i\epsilon_n\tau}\,d\tau\right]. \qquad (1.25)$$

With this result, we can read off the cumulant function Cn(t) as

$$C_n(t) = -i\int_0^t dt'\int_{t'}^{\infty}\Sigma_n(\tau)\,e^{i\epsilon_n\tau}\,d\tau. \qquad (1.26)$$

This result for the cumulant function $C_n(t)$ can be derived in other ways, e.g. diagrammatically [41] or via the heuristic approach of equating the first term in the Dyson series with the cumulant function [5]. The approach presented here is quite similar to that of Aryasetiawan et al. [5], and I think the derivation by Aryasetiawan and collaborators actually has this derivation in mind, as it presents a more heuristic approach. However, I think that derivation ends up being quite confusing, as the lack of rigor leads to some ambiguity about the limits of integration, which are of fundamental importance in obtaining and understanding the final result. That is why I have followed the approach of Almbladh and Hedin up to this point to get the expression for the cumulant function. Having done this in a rigorous fashion, I will now proceed to manipulate this expression for $C_n(t)$ further in order to get something that can be computed simply from the output of a standard GW calculation of the full frequency-dependent self energy $\Sigma_n(\omega)$. This part of the derivation follows the review of Aryasetiawan most closely, since I think it is done most clearly there. We first plug the spectral representation of the self energy

$$\Sigma_n(t) = \frac{i}{\pi}\left[\Theta(-t)\int_{-\infty}^{\mu} d\omega\, e^{-i\omega t}\,\mathrm{Im}\,\Sigma_n(\omega) + \Theta(t)\int_{\mu}^{\infty} d\omega\, e^{-i\omega t}\,\mathrm{Im}\,\Sigma_n(\omega)\right] \qquad (1.27)$$

into (1.26) and do the time integrals to get two terms (we use the fact that $t < 0$ in determining the limits of integration below)

$$\begin{aligned}
C_{n,<}(t) &= -i\int_0^t dt'\int_{t'}^{\infty} d\tau\, e^{i\epsilon_n\tau}\,\frac{i}{\pi}\,\Theta(-\tau)\int_{-\infty}^{\mu} d\omega\, e^{-i\omega\tau}\,\mathrm{Im}\,\Sigma_n(\omega) \\
&= \frac{1}{\pi}\int_{-\infty}^{\mu} d\omega\,\mathrm{Im}\,\Sigma_n(\omega)\int_0^t dt'\int_{t'}^{0} d\tau\, e^{i(\epsilon_n-\omega)\tau} \\
&= \frac{1}{\pi}\int_{-\infty}^{\mu} d\omega\,\mathrm{Im}\,\Sigma_n(\omega)\int_0^t dt'\,\frac{1}{i(\epsilon_n-\omega)}\left[1 - e^{i(\epsilon_n-\omega)t'}\right] \\
&= \frac{1}{\pi}\int_{-\infty}^{\mu} d\omega\,\mathrm{Im}\,\Sigma_n(\omega)\left[\frac{t}{i(\epsilon_n-\omega)} + \frac{1}{[i(\epsilon_n-\omega)]^2} - \frac{e^{i(\epsilon_n-\omega)t}}{[i(\epsilon_n-\omega)]^2}\right] \\
&= -\frac{it}{\pi}\int_{-\infty}^{\mu} d\omega\,\frac{\mathrm{Im}\,\Sigma_n(\omega)}{\epsilon_n-\omega} - \frac{1}{\pi}\int_{-\infty}^{\mu} d\omega\,\frac{\mathrm{Im}\,\Sigma_n(\omega)}{(\epsilon_n-\omega)^2} + \frac{1}{\pi}\int_{-\infty}^{\mu} d\omega\,\frac{\mathrm{Im}\,\Sigma_n(\omega)}{(\epsilon_n-\omega)^2}\,e^{i(\epsilon_n-\omega)t}
\end{aligned} \qquad (1.28)$$

$$\begin{aligned}
C_{n,>}(t) &= -i\int_0^t dt'\int_{t'}^{\infty} d\tau\, e^{i\epsilon_n\tau}\,\frac{i}{\pi}\,\Theta(\tau)\int_{\mu}^{\infty} d\omega\, e^{-i\omega\tau}\,\mathrm{Im}\,\Sigma_n(\omega) \\
&= \frac{1}{\pi}\int_{\mu}^{\infty} d\omega\,\mathrm{Im}\,\Sigma_n(\omega)\int_0^t dt'\int_0^{\infty} d\tau\, e^{i(\epsilon_n-\omega)\tau} \\
&= \frac{1}{\pi}\int_{\mu}^{\infty} d\omega\,\mathrm{Im}\,\Sigma_n(\omega)\int_0^t dt'\,\frac{1}{i(\epsilon_n-\omega+i\delta)}\left[1 - 0\right] \\
&= -\frac{it}{\pi}\int_{\mu}^{\infty} d\omega\,\frac{\mathrm{Im}\,\Sigma_n(\omega)}{\epsilon_n-\omega+i\delta}
\end{aligned} \qquad (1.29)$$

where the $i\delta$'s have been introduced in $C_{n,>}$ in order to ensure the convergence of the integrals ranging to $\tau = \infty$. Fourier transformation of $\Sigma_n(t)$ in (1.27) gives

$$\Sigma_n(\omega) = \int_{-\infty}^{\mu} d\omega'\,\frac{\mathrm{Im}\,\Sigma_n(\omega')/\pi}{\omega-\omega'-i\delta} - \int_{\mu}^{\infty} d\omega'\,\frac{\mathrm{Im}\,\Sigma_n(\omega')/\pi}{\omega-\omega'+i\delta}, \qquad (1.30)$$

which allows us to simplify the expression for $C_n(t)$ by noting the following identities (which follow directly from (1.30))

$$-it\,\Sigma_n(\epsilon_n) = -\frac{it}{\pi}\int_{-\infty}^{\mu} d\omega\,\frac{\mathrm{Im}\,\Sigma_n(\omega)}{\epsilon_n-\omega-i\delta} + \frac{it}{\pi}\int_{\mu}^{\infty} d\omega\,\frac{\mathrm{Im}\,\Sigma_n(\omega)}{\epsilon_n-\omega+i\delta} \qquad (1.31)$$

$$\partial_\omega\Sigma_n(\epsilon_n) = -\frac{1}{\pi}\int_{-\infty}^{\mu} d\omega\,\frac{\mathrm{Im}\,\Sigma_n(\omega)}{(\epsilon_n-\omega-i\delta)^2} + \frac{1}{\pi}\int_{\mu}^{\infty} d\omega\,\frac{\mathrm{Im}\,\Sigma_n(\omega)}{(\epsilon_n-\omega+i\delta)^2} \qquad (1.32)$$

so that

$$C_n(t) = C_{n,<}(t) + C_{n,>}(t) = -it\,\Sigma_n(\epsilon_n) + \partial_\omega\Sigma_n^h(\epsilon_n) + C_{n,S}(t) \equiv C_{n,qp}(t) + C_{n,S}(t) \qquad (1.33)$$

$$C_{n,qp}(t) = -it\,\Sigma_n(\epsilon_n) + \partial_\omega\Sigma_n^h(\epsilon_n) \qquad (1.34)$$

$$C_{n,S}(t) = \int_{-\infty}^{\mu} d\omega\,\frac{\Gamma_n(\omega)}{(\epsilon_n-\omega-i\delta)^2}\,e^{i(\epsilon_n-\omega)t} \qquad (1.35)$$

$$\Gamma_n(\omega) = \frac{\mathrm{Im}\,\Sigma_n(\omega)}{\pi} \qquad (1.36)$$

The term that involves the derivative of $\Sigma$ with respect to frequency carries an "h" superscript because it contains only the contribution to the spectral representation of the self energy from frequencies up to the chemical potential, i.e. the hole contribution to the self energy. $C_{n,qp}(t)$ and $C_{n,S}(t)$ are the quasiparticle and satellite contributions to the cumulant, respectively. The $-i\delta$ is added to the satellite contribution since $t < 0$, including $t = -\infty$. Further defining

$$\partial_\omega\Sigma_n^h(\epsilon_n) = \gamma_n + i\alpha_n \qquad (1.37)$$

$$\Sigma_n(\epsilon_n) = \Delta\epsilon_n + i\delta_n, \qquad (1.38)$$

we find that the spectral function as a function of frequency $A_n(\omega)$ is given by

$$A_{n,qp}(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} dt\, e^{i\omega t}\, e^{-i\epsilon_n t + C_{n,qp}(t)} = \frac{e^{-\gamma_n}}{\pi}\,\frac{\delta_n\cos\alpha_n - (\omega-E_n)\sin\alpha_n}{(\omega-E_n)^2 + \delta_n^2} \qquad (1.39)$$

$$A_n(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} dt\, e^{i\omega t}\, e^{-i\epsilon_n t + C_n(t)} = A_{n,qp}(\omega) + \frac{1}{2\pi}\int_{-\infty}^{\infty} dt\, e^{i\omega t}\, e^{-i\epsilon_n t + C_{n,qp}(t)}\left[e^{C_{n,S}(t)} - 1\right] \qquad (1.40)$$

where $E_n = \epsilon_n + \Delta\epsilon_n$. Note that to derive this result we had to extend the cumulant to positive times using the fact that $C_n(-t) = C_n^*(t)$ (which can be seen from equation (1.26)). The splitting of the spectral function into quasiparticle and satellite contributions in the second line of (1.40) is physically appealing and is helpful when one wants to analyze these contributions to the spectral function separately, e.g. when dealing with extrinsic losses [38]. Indeed, using the identity

$$h(\omega) = \int dt\, e^{i\omega t} f(t)\,g(t) = \int dt\, e^{i\omega t}\int\frac{d\omega'}{2\pi}\,e^{-i\omega' t} f(\omega')\int\frac{d\omega''}{2\pi}\,e^{-i\omega'' t} g(\omega'') = \int\frac{d\omega'}{2\pi}\, f(\omega')\,g(\omega-\omega'), \qquad (1.41)$$

we get the first-order satellite contribution to be

$$A_{n,sat}(\omega) = A_n(\omega) - A_{n,qp}(\omega) \approx \frac{1}{2\pi}\int_{-\infty}^{\infty} dt\, e^{i\omega t}\, e^{-i\epsilon_n t + C_{n,qp}(t)}\, C_{n,S}(t)$$

$$= \int d\omega'\, A_{n,qp}(\omega')\, C_{n,S}(\omega-\omega') \qquad (1.42)$$

However, this separation is not necessary and can lead to numerical artifacts, such as a negative value of the spectral function in small frequency regions due to inexact cancellation between the quasiparticle and satellite contributions. The most numerically stable way to compute the total spectral function is to use the expression in the first line of (1.40), which does not involve splitting the spectral function into different contributions. Before we move on to Fourier transforming the expression for the satellite contribution to the cumulant function to frequency space, we note that the use of the hole part of the self energy in the frequency derivative term is quite difficult in a practical calculation, as it would require the calculation of the self energy over a large range of frequencies. Instead, in our work we have simply used the total self energy, i.e. with both particle and hole contributions.

This term is actually not very important because $\alpha_n \approx 0$ and $\gamma_n$ just provides an overall scaling. The reason that $\alpha_n \approx 0$ is that, near the quasiparticle energy, the imaginary part of the self energy should be small (not much decay) and thus its variation with frequency should also be small. As a practical matter, one can adjust $\gamma_n$ to achieve normalization. However, in our work we used a slight modification of the formalism presented here [53], in which normalization is not a problem. As a final activity, let's Fourier transform the satellite cumulant to frequency space, which is useful when decomposing the cumulant into quasiparticle and satellite contributions. We write

$$C_{n,S}(\omega) = \frac{1}{2\pi}\left[\int_{-\infty}^{0} dt\, e^{i\omega t}\, C_{n,S}(t) + \int_{0}^{\infty} dt\, e^{i\omega t}\, C_{n,S}(t)\right], \qquad (1.43)$$

separating the Fourier transform into the $t < 0$ and $t > 0$ contributions, since the pole structure is different for the two terms (since $C_n(-t) = C_n^*(t)$). Evaluating the first term we get

$$\begin{aligned}
C_{n,S}(\omega) &= \frac{1}{2\pi}\int_{-\infty}^{0} dt\, e^{i\omega t}\int_{-\infty}^{\infty} d\omega'\,\frac{\Gamma_n(\omega')\,\Theta(\mu-\omega')}{(\epsilon_n-\omega'-i\delta)^2}\, e^{i(\epsilon_n-\omega')t} \\
&= \frac{1}{2\pi i}\int_{-\infty}^{\infty} d\omega'\,\frac{\Gamma_n(\omega')\,\Theta(\mu-\omega')}{(\epsilon_n-\omega'-i\delta)^2\,(\omega+\epsilon_n-\omega'-i\delta)}
\end{aligned} \qquad (1.44)$$

The integral in (1.44) has a double pole and a single pole, both in the lower half plane. The integral for $t > 0$ has both of its poles in the same places, but in the upper half plane. Thus it does not matter which way we close the contour, and we choose to close in the lower half plane, where the contribution from the $t > 0$ integral is zero. Recall that the integral over a single pole is given by

$$\oint dz\,\frac{f(z)}{z-z_0} = 2\pi i\, f(z_0), \qquad (1.45)$$

while that over a double pole is given by

$$\oint dz\,\frac{f(z)}{(z-z_0)^2} = 2\pi i\, f'(z_0). \qquad (1.46)$$

Remembering to include an extra minus sign since we close in the lower half plane (clockwise contour), our final result is

$$\begin{aligned}
C_{n,S}(\omega) &= \left.\frac{\Gamma_n(\omega')\,\Theta(\mu-\omega')}{(\epsilon_n-\omega'-i\delta)^2}\right|_{\omega'=\omega+\epsilon_n} - \left.\frac{\partial}{\partial\omega'}\left[\frac{\Gamma_n(\omega')\,\Theta(\mu-\omega')}{\omega+\epsilon_n-\omega'-i\delta}\right]\right|_{\omega'=\epsilon_n} \\
&= \frac{\Gamma_n(\omega+\epsilon_n)\,\Theta(\mu-\omega-\epsilon_n)}{\omega^2} - \frac{\Gamma_n'(\epsilon_n)\,\Theta(\mu-\epsilon_n)}{\omega} - \frac{\Gamma_n(\epsilon_n)\,\Theta(\mu-\epsilon_n)}{\omega^2} \\
&= \frac{\Gamma_n(\omega+\epsilon_n)\,\Theta(\mu-\omega-\epsilon_n) - \omega\,\Gamma_n'(\epsilon_n) - \Gamma_n(\epsilon_n)}{\omega^2}
\end{aligned} \qquad (1.47)$$

This expression for $C_{n,S}(\omega)$ can be used in the decomposition of the spectral function into quasiparticle and satellite contributions.
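The content of these formulas can be illustrated with the exactly solvable model of a dispersionless state coupled to a single plasmon mode, for which the cumulant can be written down in closed form and the spectral function is a Poisson series of satellites. The sketch below uses invented parameters and, for simplicity, the $t > 0$ (electron) convention rather than the hole convention used above; the cumulant is chosen so that the satellites sit at multiples of the plasmon energy below the quasiparticle peak.

```python
import numpy as np

eps, w_p, a, eta = 0.0, 1.0, 0.5, 0.05   # energies in eV; eta is a broadening

# Cumulant for a dispersionless state coupled to one plasmon of energy w_p
# with strength a; exp(C(t)) generates a Poisson series of satellites.
t = np.linspace(0.0, 400.0, 2**13)        # time grid, units of hbar/eV
C = a * (np.exp(1j * w_p * t) - 1.0 - 1j * w_p * t)
G = -1j * np.exp(-1j * eps * t + C - eta * t)

# Spectral function A(w) = (1/pi)|Im G(w)| by direct Fourier transform.
dt = t[1] - t[0]
w = np.linspace(-3.5, 1.5, 501)
Gw = (np.exp(1j * np.outer(w, t)) * G).sum(axis=1) * dt
A = np.abs(Gw.imag) / np.pi

# Quasiparticle peak at eps + a*w_p; satellites at multiples of w_p below,
# with Poisson weights exp(-a) a^m / m!, so satellite/QP height ratio ~ a.
print(w[np.argmax(A)])
```

In a GW spectral function for the same model there would be only one satellite, displaced too far from the quasiparticle peak, which is exactly the failure the cumulant expansion repairs.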

Part II

Research

Plasmon Satellites in Doped Graphene ARPES

2.1 Summary

We present spectral functions of doped graphene, in isolation and on a silicon carbide substrate, from ab initio GW and GW plus cumulant (GW+C) theory. In the vicinity of the Dirac points, both theories predict two intense quasiparticle bands and two weaker satellite bands. We find that the separation of quasiparticle and satellite bands is smaller in the ab initio GW+C theory, in good agreement with experimental photoemission data. While the energy dispersions and intensities from ab initio GW+C theory agree well with experiment for the upper quasiparticle and satellite bands, we find discrepancies for the intensities of the lower quasiparticle band and the lower satellite band, which we attribute to extrinsic losses. For comparison with previous calculations, we also carry out GW and GW+C theory calculations with a linear-bands model starting point.

2.2 Background

Since its isolation in 2004 [80], graphene and its many extraordinary properties have been studied intensively. Graphene is of particular interest because of the linear dispersion of its valence and conduction bands near the Dirac points in the Brillouin zone, which causes its electrons to behave like massless relativistic particles with a chiral character. The electronic band structure of electron-doped graphene on a silicon carbide (SiC) substrate was measured by Bostwick and coworkers [14, 89, 15] using angle-resolved photoemission spectroscopy (ARPES). Surprisingly, they found a more complicated electronic structure than the expected Dirac cones. In particular, in the vicinity of the Dirac points, they observed additional satellite bands, which they attributed to plasmaron excitations, strongly coupled hole-plasmon states [15].

Such plasmaron states had been predicted originally in the three-dimensional electron gas by Lundqvist [73] based on GW theory calculations. In the GW approach[40], only the first term of the diagrammatic expansion of the electron self energy in terms of the screened Coulomb interaction W is retained and the self energy is computed as the product of W and the interacting electron Green’s function G. While the GW method gives accurate quasiparticle properties, such as band gaps, for many materials[47], it fails for the description of plasmon satellite features in many systems [68, 38, 67]. An accurate description of plasmon satellite features can be obtained via the GW plus cumulant (GW+C) method[41]. This method is based on a cumulant expansion of the electron Green’s function and provides the exact solution for the model problem of a core electron interacting with [62]. For many materials, the GW+C approach gives good agreement with experimentally observed satellite features[68, 38, 67, 6]. It was therefore surprising when Polini and coworkers[87] and Hwang and Das Sarma [46] reported plasmon satellite features based on GW calculations for doped graphene that were in good agreement with the experimental observations of Bostwick et al. [15, 98]. In a recent paper [68], we have shown that this agreement is somewhat fortuitous and results from a cancellation of errors: the overestimation of the quasiparticle-satellite separation in GW theory is compensated by an underestimation of the plasmon frequency resulting from the use of a too large dielectric constant of the SiC substrate in these calculations. We also showed that a GW+C calculation in conjunction with a careful treatment of the substrate gives agreement for the spectral function at the Dirac point. In this chapter we expand upon our earlier results on doped graphene, providing spectral functions from ab initio GW and ab initio GW+C theory for doped graphene in isolation and on a SiC substrate. 
The rest of this chapter is organized as follows. In section 2.3 we describe the details of our calculation, including the special techniques used to achieve the fine k-point sampling needed for converged results and the treatment of the SiC substrate from first principles. In section 2.4 we present spectral functions for isolated doped graphene and doped graphene on a SiC substrate, and compare our results to the experimental ARPES spectra of Bostwick and coworkers [15]. We also compare our ab initio results to results obtained by using a simplified linear-bands Hamiltonian. In section 2.5 we summarize our results and present our conclusions.

2.3 Methods

In this section, we describe our calculations of spectral functions for suspended graphene and graphene on a SiC substrate. We first carry out density-functional theory (DFT) calculations to obtain an accurate mean-field starting point for the subsequent many-body Green's function theory calculations. Next, we compute self energies and spectral functions using the ab initio GW and GW+C theories. Finally, we describe our spectral function calculations using a simple linear-bands model starting point.

2.3.1 Mean-field calculation

We carried out DFT calculations on suspended graphene using a plane-wave basis and norm-conserving pseudopotentials as implemented in the PARATEC package [81]. We employed the local density approximation to the exchange-correlation energy, a plane-wave cutoff of 45 Ry, a separation of 19 bohr between periodically repeated graphene sheets and the experimental lattice constant of graphene [18]. For the self-energy calculations, we obtained DFT wave functions and energies on a 72 × 72 k-point grid in the Brillouin zone.

In photoemission experiments, graphene is usually placed on a substrate. For example, in a recent experiment by Bostwick and coworkers [15], hydrogenated SiC was used as the substrate. To capture the effect of the substrate, we carried out a calculation of graphene on a 10-atomic-layer-thick slab of 4H-SiC. Because SiC and graphene have different lattice constants, calculations must be carried out in a graphene supercell. In this work, we employ a 2 × 2 graphene supercell and a √3 × √3 SiC supercell. This requires an eight percent compression of the SiC [96]. We have verified that this only leads to a small change of the SiC dielectric constant. Larger supercells, such as a 13 × 13 graphene supercell [29], require less compression of the SiC, but quasiparticle calculations on such a large system are currently not feasible. Using the DFT-D functional implemented in the Quantum ESPRESSO program package [25] to describe van der Waals interactions [36, 8], we determined a separation of the graphene from the SiC substrate of 3.86 Å. In this calculation, we used a plane-wave cutoff of 45 Ry and a separation of 54 bohr between the periodically repeated graphene sheets. Because of the weak hybridization between the graphene sheet and the SiC substrate, it is possible to calculate their non-interacting susceptibilities separately (see next section).
We therefore carried out DFT calculations of (i) only the 2 × 2 graphene sheet without the substrate and (ii) only the SiC substrate without the graphene sheet. In (i), we used a 36 × 36 k-point grid, corresponding to a 72 × 72 grid of the graphene primitive cell. In (ii), we used a 3 × 3 k-point grid, corresponding to a 5 × 5 grid of the SiC primitive cell.

2.3.2 GW calculation

The photoelectron current in an ARPES experiment with monochromatic photons of frequency $\nu$ and polarization $\hat{e}_\nu$ is given by [20]

$$I(k, \omega, \hat{e}_\nu, \nu) = \sum_n^{\mathrm{occ}} I_0(n, k, \omega, \hat{e}_\nu, \nu)\, f(\omega)\, A_{nk}(\omega), \qquad (2.1)$$

where $k$ and $\omega$ are the momentum and binding energy of an electron in band $n$, $f(\omega)$ is the Fermi-Dirac distribution and $I_0$ includes the absorption cross section of the incident photons. Also, $A_{nk}(\omega) = \frac{1}{\pi}|\mathrm{Im}\,G_{nk}(\omega)|$ denotes the spectral function, with $G_{nk}(\omega)$ being the interacting one-particle Green's function.

One usually obtains Gnk(ω) by solving the Dyson equation

$$G_{nk}^{-1}(\omega) = G_{nk,0}^{-1}(\omega) - \left[\Sigma_{nk}(\omega) - V_{nk}^{xc}\right], \qquad (2.2)$$

where $G_{nk,0}(\omega)$ and $V_{nk}^{xc}$ denote the mean-field Green's function and exchange-correlation potential, respectively. Also, $\Sigma_{nk}(\omega)$ is the one-electron self energy. Within the GW approximation [40, 43], the self energy is computed as the product of the interacting Green's function $G$ and the screened Coulomb interaction $W$. In Eq. (2.2), we assumed that the self energy is diagonal in the basis of DFT orbitals.

The energies of quasiparticle excitations, Enk, are given by the poles of the interacting Green’s function and can be obtained by solving the quasiparticle equation

$$E_{nk} = \epsilon_{nk} + \Sigma_{nk}(E_{nk}) - V_{nk}^{xc}, \qquad (2.3)$$

where $\epsilon_{nk}$ denotes the mean-field orbital energy.

When dopant electrons (or holes) are added to graphene, a new collective excitation, the carrier plasmon, with a characteristic square-root-of-$q$ dispersion relation is observed. The coupling of this plasmon to the Dirac fermions gives rise to very sharp features in the self energy as a function of frequency. Specifically, the combination of the linear dispersion of the Dirac fermions and the $\omega_{pl}(q) \propto \sqrt{|q|}$ dispersion of the carrier plasmon leads to a van Hove-like divergence in the imaginary part of the self energy when the group velocities of a hole and the carrier plasmon are equal [87, 46].

To compute the self energy, two summations over all k-points in the Brillouin zone are required: one in the calculation of the non-interacting susceptibility matrix $\chi_0(q, \omega)$, another in the calculation of the self energy itself. The fineness of the k-point grids determines the energy resolution of the self energy. To resolve the sharp features of the self energy of doped graphene, we found that extremely fine k-point grids are needed in the vicinity of the Dirac points. Specifically, for isolated graphene we used a 720 × 720 k-point sampling in this region of the Brillouin zone to obtain a converged self energy.

We achieve this fine sampling by interpolating the DFT energies from a coarse 72 × 72 k-point grid onto the fine grid. The contribution to the non-interacting frequency-dependent susceptibility matrix $\chi_0(q, \omega)$ from the intraband transitions is then calculated by carrying out summations over the fine k-point grid, assuming the matrix elements are slowly varying and need not be interpolated. For suspended graphene, we calculate $\chi_0(q, \omega)$ using a 5 Ry plane-wave cutoff, 70 empty states and a sampling of the real frequency axis with fine steps of 0.05 eV up to 6 eV, and then coarser steps up to 100 eV. A broadening of $\eta = 0.075$ eV was employed.
Next, we calculate the inverse dielectric matrix $\epsilon^{-1}(q, \omega) = [1 - v(q)\chi_0(q, \omega)]^{-1}$, where $v(q)$ denotes the truncated bare Coulomb interaction [48]. Finally, we obtain the self energy by summing over contributions in the same way as was done for the susceptibility.

Because of the large unit cell size, the evaluation of the self energy of graphene on a SiC substrate is computationally challenging. To include the effect of the hydrogenated SiC substrate on doped graphene from first principles in a computationally tractable way, we assumed that graphene hybridizes only weakly with the substrate [102]. Then the non-interacting susceptibility of the full system can be obtained as the sum of the susceptibilities of doped graphene and the SiC slab, calculated separately. For the doped graphene sheet, we carry out DFT calculations on a coarse 36 × 36 k-point grid and then interpolate the DFT energies to a fine 360 × 360 grid. For the SiC contribution, we obtain the frequency-dependent non-interacting susceptibility on a 3 × 3 k-point grid (using 270 empty states). We then interpolate these susceptibilities onto the fine grid and add them to the graphene contributions. Note that it is important to include the frequency-dependent screening response of the substrate in order to obtain an accurate Fermi velocity of graphene on SiC. The carrier plasmon is a low-energy excitation and is not affected by the use of the static screening response of the substrate.

To improve the mean-field Green's function, which is used to calculate the self energy, all DFT energies are shifted by a constant $\Delta_{nk}$, which is determined by requiring that $\epsilon_{nk} + \Delta_{nk}$ equals the quasiparticle energy $E_{nk}$ obtained by solving the quasiparticle equation [42]. For the quasiparticle energies, this procedure is equivalent to the so-called on-shell approximation, where the quasiparticle energy shift is obtained by evaluating the self energy at the DFT energy.
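The difference between the on-shell evaluation of the self energy and the full solution of the quasiparticle equation can be illustrated with a toy frequency-dependent self energy. All numbers below are invented for illustration (a real calculation uses the GW $\Sigma_{nk}(\omega)$):

```python
# Toy model: Re Sigma varies linearly with frequency near the DFT energy.
eps_dft = 1.00     # DFT eigenvalue (eV), illustrative
vxc = -10.00       # mean-field exchange-correlation matrix element (eV)

def re_sigma(w):
    """Model Re Sigma_nk(w) in eV; the slope controls the QP renormalization."""
    return -10.40 + 0.20 * (w - eps_dft)

# On-shell approximation: evaluate Sigma at the DFT energy.
E_on_shell = eps_dft + re_sigma(eps_dft) - vxc          # 0.60 eV

# Self-consistent solution of E = eps + Re Sigma(E) - Vxc by fixed-point
# iteration (converges here because |d Sigma / d w| < 1).
E = eps_dft
for _ in range(200):
    E = eps_dft + re_sigma(E) - vxc                     # -> 0.50 eV

print(E_on_shell, E)
```

The two answers differ by an amount controlled by the frequency slope of $\Sigma$; when that slope is small, the on-shell shift used here is an excellent and much cheaper approximation.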
All self-energy calculations were carried out using the BerkeleyGW program package [11].

2.3.3 GW+C calculation

While GW theory yields accurate quasiparticle properties, such as band gaps or band widths, for many semiconductors and insulators [43, 47], much less is known about its accuracy for plasmon satellite properties. For the special case of a dispersionless hole interacting with plasmons, the exact spectral function can be obtained by means of a cumulant expansion [62]. It exhibits an infinite series of plasmon satellite peaks. The first satellite peak is separated from the quasiparticle peak by the plasmon energy. In contrast, the spectral function from GW theory has only a single plasmon satellite peak, and the separation of this peak from the quasiparticle peak is significantly overestimated [62].

For systems with non-flat bands the cumulant expansion no longer gives the exact solution, but still significantly improves the description of plasmon satellite properties compared to GW theory [6, 38, 68]. Recently, an improved cumulant expansion, the generalized cumulant expansion, was proposed by Kas, Rehr and Reining [53]. We will use this method, which includes both electron and hole contributions to the spectral function, in our calculations. In the generalized cumulant expansion, the interacting retarded Green's function is given by

$$G_{nk}(t) = -i\Theta(t)\, e^{-iE_{nk}^{HF}t + C_{nk}(t)}, \qquad (2.4)$$

where $E_{nk}^{HF} = \epsilon_{nk} + \Sigma_{nk}^{X} - V_{nk}^{xc}$, with $\Sigma_{nk}^{X}$ denoting the static Hartree-Fock self energy. Also, $C_{nk}(t)$ is the cumulant function given by

$$C_{nk}(t) = \frac{1}{\pi}\int d\omega\, \left|\mathrm{Im}\,\Sigma_{nk}(\omega + E_{nk})\right|\, \frac{e^{-i\omega t} + i\omega t - 1}{\omega^2}, \qquad (2.5)$$

where $\Sigma_{nk}(\omega)$ is a suitable self energy. Given a certain approximation to the self energy, in our case the GW approximation, the cumulant expansion gives an improved Green's function through Eqs. (2.4) and (2.5). We refer to this method as GW+C in the rest of this chapter.

To gain physical understanding, it is often useful to separate the cumulant function into a satellite contribution [containing the exponential in Eq. (2.5)] and a quasiparticle contribution (containing the other two terms). Consequently, the GW+C spectral function can be expressed as the sum of a quasiparticle contribution $A_{nk}^{qp}(\omega)$ and an infinite series of plasmon satellite contributions $A_{nk}^{m,sat}(\omega)$ (denoting the contribution involving the creation, or "shake-up", of $m$ plasmons) [6]. The quasiparticle contribution is given by

$$A_{nk}^{qp}(\omega) = \frac{e^{-\gamma_{nk}}}{\pi}\,\frac{\Gamma_{nk}\cos\alpha_{nk} - (\omega - E_{nk})\sin\alpha_{nk}}{(\omega - E_{nk})^2 + \Gamma_{nk}^2}, \qquad (2.6)$$

with $\Gamma_{nk} = |\mathrm{Im}\,\Sigma_{nk}(E_{nk})|$ and $\gamma_{nk} + i\alpha_{nk} = \partial\Sigma_{nk}(E_{nk})/\partial\omega$.

The first satellite contribution is a convolution of the cumulant function and $A_{nk}^{qp}$ according to

$$A_{nk}^{1,sat}(\omega) = \int d\omega'\, C_{nk}(\omega - \omega')\, A_{nk}^{qp}(\omega'). \qquad (2.7)$$

Higher-order satellite contributions are obtained by convolving multiple cumulant functions with $A_{nk}^{qp}$.
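Eq. (2.7) is a plain convolution and can be sketched in a few lines. The inputs below are stand-ins for illustration only (a Lorentzian quasiparticle peak and a model frequency-space cumulant peaked at $-\omega_{pl}$), not actual GW output:

```python
import numpy as np

dw = 0.005
w = np.arange(-6.0, 6.0, dw)

# Model quasiparticle peak: Lorentzian at E_qp with width gamma_qp.
E_qp, gamma_qp = 0.0, 0.05
A_qp = (gamma_qp / np.pi) / ((w - E_qp)**2 + gamma_qp**2)

# Model frequency-space cumulant: total weight a, peaked one plasmon
# energy (w_pl) below zero, given a small width of 0.1 eV here.
w_pl, a = 1.0, 0.4
C = a * (0.1 / np.pi) / ((w + w_pl)**2 + 0.1**2)

# First satellite, Eq. (2.7): A^{1,sat}(w) = int dw' C(w - w') A_qp(w').
A_sat = np.convolve(C, A_qp, mode="same") * dw

print(w[np.argmax(A_sat)])   # ~ -1.0: the satellite sits w_pl below the QP peak
```

The satellite inherits the quasiparticle lineshape, shifted by the plasmon energy and scaled by the coupling weight; convolving again with the cumulant generates the second satellite, and so on.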

2.3.4 Linear-bands model calculation

We also carried out Green's function calculations for the linear-bands model of graphene. In this model, one assumes that the band structure of graphene consists of only two linear bands at the K and K' points in the Brillouin zone. Because of its simplicity the linear-bands model has been widely used [46, 87, 15]. However, it is important to keep in mind the limitations of this model Hamiltonian approach: i) neglect of deviations from linearity of the bands, which become important at higher doping levels, ii) neglect of the anisotropy of the bands, iii) neglect of other bands (for example, the lower lying σ bands) in the calculation of the polarizability and the self energy, iv) the need for a momentum cutoff kc, v) the need to choose an unrenormalized Fermi velocity, and vi) the use of a purely two-dimensional Coulomb interaction. To include screening contributions from non-Dirac bands, we added an analytically parametrized constrained RPA dielectric function, calculated within DFT by excluding pz bands [99], to the linear-bands polarizability. When graphene is placed on a substrate, the substrate screens the interaction between electrons in the graphene sheet. To include these complicated screening processes in the linear-bands model, the bare two-dimensional Coulomb interaction is usually divided by a constant κ, which is interpreted as a substrate dielectric constant [15, 87, 46]. In the simplest approach, one takes κ = (ǫ∞ + 1)/2, with ǫ∞ being the bulk long-wavelength zero-frequency limit of the electronic dielectric function of the substrate. In another calculation κ was treated as a fitting parameter [15]. In a previous study [68], we determined an appropriate value of κ for graphene on a SiC substrate from first-principles calculations. For small q, we found κ ≈ 2.2, which we will use in the linear-bands model calculations in this chapter.
We then calculate the GW self energy and spectral function of the linear-bands model [87, 101]. We employ the DFT-LDA value for the Fermi velocity vF = 0.85 × 10^6 m/s and a wave vector cutoff kc corresponding to a bandwidth of D = vF kc = 7 eV [94]. Finally, we evaluate the spectral function within GW+C theory.
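As a quick consistency check (not part of the original text), one can verify the momentum cutoff implied by these numbers, restoring the factor of ħ in D = ħ vF kc:

```python
# Hypothetical back-of-the-envelope check of the momentum cutoff k_c implied
# by the quoted Fermi velocity and bandwidth, D = hbar * v_F * k_c.
hbar = 1.054571817e-34        # Planck constant / 2 pi, in J s
eV = 1.602176634e-19          # J per eV
v_F = 0.85e6                  # Fermi velocity, m/s (DFT-LDA value quoted above)
D = 7.0 * eV                  # bandwidth, J

k_c = D / (hbar * v_F)        # cutoff in 1/m
print(round(k_c * 1e-10, 2))  # in inverse angstroms
```

The resulting cutoff, on the order of 1 Å⁻¹, is comparable to the size of the graphene Brillouin zone, as expected for a bandwidth of several eV.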

2.4 Results

In this section we describe our results for spectral functions of suspended graphene and graphene on a SiC substrate. For each system we present results for two different doping levels: a high doping level with ǫF − ǫD = 1.0 eV (ǫF and ǫD denote the mean-field Fermi energy and the mean-field Dirac point energy, respectively), corresponding to a charge density of n = 12.4 × 10^13 cm^−2, and a lower doping level with ǫF − ǫD = 0.6 eV (corresponding to n = 3.7 × 10^13 cm^−2). Spectral functions were calculated using four different methods: ab initio GW and GW+C theory, and GW and GW+C theory with a linear-bands model starting point. We compare our results to experimental ARPES spectra for graphene on a SiC substrate [15].

2.4.1 Suspended graphene

Figure 2.1(a) shows the spectral functions of suspended graphene for the high doping level, ǫF − ǫD = 1.0 eV, from ab initio GW theory along the K − Γ direction in the Brillouin zone. All wave vectors are measured relative to the Dirac point. The spectrum exhibits two intense quasiparticle bands, which are degenerate at the Dirac point, k/kF = 0. We also find two weaker satellite bands resulting from the shake-up of plasmons. At the Dirac point, the two satellite bands are degenerate. The upper satellite band merges with the upper quasiparticle band at the Fermi level. This is a characteristic feature of two-dimensional metallic systems and was also observed in calculations on the two-dimensional electron gas [67, 22, 51]. We also observe an interesting behavior where the upper satellite band crosses the lower quasiparticle band: the lower quasiparticle band exhibits a kink and is significantly broadened. Fig. 2.3(a) shows the imaginary part of the ab initio GW self energy evaluated "on the shell", i.e. at the DFT energies. For the lower band, ImΣnk(ǫnk) has a maximum at k/kF ≈ 0.6, which results from the increased number of hole-plasmon decay channels at the crossing with the lower satellite band. Because of causality requirements, the real and imaginary parts of the self energy are related via a Kramers-Kronig transformation, and the peak in the imaginary part of Σ induces a kink in the real part of Σ and in the quasiparticle band. Similar kinks in the quasiparticle bands of graphene arise from electron-phonon interactions [16, 14]. The lower satellite band disperses away from the Fermi level as k increases and finally merges with the lower quasiparticle band. Figure 2.1(b) shows the spectral functions from ab initio GW+C theory. Again, we find intense quasiparticle bands, which are somewhat broader than in the ab initio GW theory. The satellite bands are also broader, and their separation from the quasiparticle bands is reduced compared to ab initio GW theory, making it difficult to discern them from the quasiparticle bands even near the Dirac point.
To understand the differences between the results of ab initio GW and ab initio GW+C theory, we present spectral functions at k/kF = 0 and k/kF = 0.4 for the upper and lower Dirac bands in Figs. 2.2(a), (c) and (d), respectively. Fig. 2.2(a) shows that ab initio GW+C theory gives a much broader satellite peak than ab initio GW theory. The separation of the satellite peak from the quasiparticle peak is 0.70 eV in ab initio GW+C theory compared to 0.85 eV in ab initio GW theory. The origin of this discrepancy can be traced back to a spurious plasmaron solution of the quasiparticle equation in ab initio GW theory, see Fig. 2.2(b). Here, we computed the GW+C self energy via $\Sigma^{GW+C}_{n\mathbf{k}}(\omega) = \omega - \epsilon_{n\mathbf{k}} - G^{-1}_{n\mathbf{k}}(\omega) + V^{xc}_{n\mathbf{k}}$, where $G_{n\mathbf{k}}$ denotes the Green's function from ab initio GW+C theory. While the quasiparticle equation in ab initio GW+C theory exhibits only a single solution, corresponding to the standard quasiparticle band, ab initio GW theory yields a second solution at −1.80 eV corresponding to a well-defined plasmaron excitation.


Figure 2.1: Spectral functions for suspended graphene with electron doping corresponding to a charge density n = 12.4 × 10^13 cm^−2 from (a) ab initio GW, (b) ab initio GW+C, (c) linear-bands model GW, and (d) linear-bands model GW+C theories along the K − Γ direction in the Brillouin zone. All energies are given with respect to the interacting Fermi energy.

The spectral functions from ab initio GW and ab initio GW+C theory for the upper Dirac band at k/kF = 0.4 [see Fig. 2.2(c)] look qualitatively similar to the Dirac point result. We observe separate quasiparticle and satellite peaks. Compared to the Dirac point result, the satellite peaks are somewhat reduced in height and their separation from the quasiparticle peak is smaller.

For the lower Dirac band at k/kF = 0.4 [see Fig. 2.2(d)], ab initio GW theory exhibits only a very weak satellite peak at −2.10 eV, while in the ab initio GW+C theory we obtain a single, asymmetric, broad peak. Figure 2.1(c) shows the spectral functions from GW with a linear-bands model starting point for ǫF − ǫD = 1 eV. The linear-bands GW result looks qualitatively similar to the ab initio GW result, Fig. 2.1(a). It exhibits two intense quasiparticle bands, an upper satellite band which merges with the upper quasiparticle band at the Fermi level, and a lower satellite band. But there are also important differences. Specifically, the quasiparticle bands in the linear-bands GW theory have a smaller linewidth than in the ab initio GW theory, see Fig. 2.3(a). In particular, at the Dirac point, the linear-bands GW theory yields a vanishing linewidth, while the ab initio GW theory predicts a finite value. Both deviations from linearity in the band structure (for example, resulting from trigonal warping) and the numerical broadening used in the ab initio calculation can lead to an increase of the linewidth at the Dirac point. Considering the good agreement of previous ab initio GW linewidths with experiment [82] and the relatively small value of the numerical broadening, we believe that the deviations from linearity of the bands are the most important factor. Also, in contrast to the ab initio result, the lower satellite band does not merge with the lower quasiparticle band in the linear-bands GW theory, but disperses parallel to it. Finally, Fig. 2.1(d) shows the spectral functions from GW+C with a linear-bands model starting point for ǫF − ǫD = 1 eV. Again, the results are similar to the ab initio GW+C results, but with reduced linewidths. The spectrum also exhibits a second satellite band, resulting from the shake-up of two plasmons.

In Figs. 2.3(b), (c) and (d), we compare spectral functions from the linear-bands GW and the linear-bands GW+C theory at k/kF = 0 and k/kF = 0.4 for the upper and lower Dirac bands, respectively. At the Dirac point, the linear-bands GW theory gives a separation of 0.95 eV between the quasiparticle and satellite peaks, 0.10 eV larger than the ab initio value. For the linear-bands GW+C spectral function, a second satellite peak is found at −2.30 eV. The absence of this second peak in the ab initio GW+C theory is a consequence of the larger quasiparticle linewidths [see Fig. 2.3(a)]: the quasiparticle linewidths influence the widths of the satellite peaks, because the satellite contributions to the spectral function are obtained by convolving the cumulant function with the quasiparticle contribution, see Eq. (2.7). The false prediction of a second satellite peak in the linear-bands GW+C theory demonstrates the importance of an ab initio approach providing accurate quasiparticle linewidths.


Figure 2.2: Spectral functions for suspended graphene with electron doping corresponding to a charge density n = 12.4 × 10^13 cm^−2 from ab initio GW (red curve) and ab initio GW+C (blue curve) theories (a) at the Dirac point (k/kF = 0), (c) at k/kF = 0.4 for the upper Dirac band, and (d) at k/kF = 0.4 for the lower Dirac band. (b) shows the solution of the quasiparticle equation in ab initio GW and ab initio GW+C theory at the Dirac point. Arrows indicate positions of satellite peaks in (a), (c), and (d) and of the plasmaron solution in (b). All energies are given with respect to the interacting Fermi energy.


Figure 2.3: (a) On-shell imaginary part of the self energy for suspended graphene with electron doping corresponding to a charge density of n = 12.4 × 10^13 cm^−2: upper Dirac band (triangles) and lower Dirac band (circles) from ab initio GW theory (red) and GW theory with a linear-bands model starting point (blue). (b), (c) and (d): Spectral functions of suspended graphene with the same doping level as (a) from GW (blue curves) and GW+C (red curves) theories with a linear-bands model starting point at the Dirac point (b), at k/kF = 0.4 for the upper Dirac band (c), and the lower Dirac band (d). All energies are given with respect to the interacting Fermi energy.

Next, we present our results for the lower doping level, ǫF − ǫD = 0.6 eV (corresponding to a charge density n = 3.7 × 10^13 cm^−2). Within the linear-bands model, it has been shown that the spectral functions of doped graphene exhibit a scaling behavior [87]: the spectra for different doping levels look the same when all wave vectors are divided by the Fermi wave vector kF and all energies are divided by ǫF. The scaling behavior was also found in the experimental ARPES studies by Bostwick and coworkers [15]. Fig. 2.4(a) shows that this scaling transformation of the energy and wave vector axes indeed brings the Dirac point ab initio GW spectral functions from the two doping levels into good agreement. Fig. 2.4(b) shows that the ab initio GW+C spectral functions also exhibit the scaling property. For comparison, Figs. 2.4(c) and (d) show the scaled spectral functions from the linear-bands GW and GW+C theories, respectively.
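The scaling argument can be illustrated with a toy model (not a calculation from this work): if every energy scale entering the spectral function is proportional to ǫF, then ǫF·A plotted against ω/ǫF is independent of the doping level.

```python
import numpy as np

# Toy Lorentzian "spectral function" whose peak position and width are both
# proportional to eps_F; the prefactor eps_F mimics the eps_F * A_nk scaling
# used in Fig. 2.4.  All numbers are illustrative, not from the calculations.
def toy_spectrum(omega, eps_F):
    E = -0.5 * eps_F          # feature at a fixed fraction of eps_F
    G = 0.1 * eps_F           # width proportional to eps_F
    return (eps_F / np.pi) * G / ((omega - E) ** 2 + G ** 2)

x = np.linspace(-2.0, 0.5, 500)           # omega / eps_F axis
high = toy_spectrum(x * 1.0, eps_F=1.0)   # high doping (eps_F = 1.0 eV)
low = toy_spectrum(x * 0.6, eps_F=0.6)    # low doping (eps_F = 0.6 eV)

print(np.allclose(high, low))             # the scaled curves collapse
```

In the real calculations the collapse is only approximate, since some scales (for example the interband screening) do not rescale with ǫF.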

2.4.2 Graphene on silicon carbide

Figure 2.5(a) shows the spectral functions of graphene on a SiC substrate from ab initio GW theory for the high doping level, ǫF − ǫD = 1.0 eV, along the K − Γ direction in the Brillouin zone. The spectrum is qualitatively similar to the isolated graphene result [Fig. 2.1(a)]: we find two intense quasiparticle bands and two weaker plasmon satellite bands, with the upper satellite band merging with the upper quasiparticle band at the Fermi level. When the upper satellite band crosses the lower quasiparticle band, the lower quasiparticle band is broadened and exhibits a kink. Because of the additional screening from the substrate, the frequency of the carrier plasmon is reduced, resulting in a reduction of the quasiparticle-satellite separation. At the Dirac point, we find a separation of 0.66 eV, compared to 0.85 eV for isolated graphene. Figure 2.5(b) shows the spectral functions from ab initio GW+C theory. Again, the additional correlation effects included in the cumulant expansion lead to a broadening of both quasiparticle and satellite bands and to a further reduction of the quasiparticle-satellite separation. At the Dirac point, we find a satellite-quasiparticle separation of 0.40 eV, compared to 0.70 eV for isolated graphene. Figs. 2.6(a), (c) and (d) compare spectral functions from the ab initio GW and the ab initio GW+C theory at k/kF = 0 and k/kF = 0.4 for the upper and lower Dirac bands, respectively. Figure 2.5(c) shows the spectral functions from the linear-bands GW theory. Again, we find that the lower satellite band disperses parallel to the lower quasiparticle band. Fig. 2.5(d) shows the spectral functions from the linear-bands GW+C theory. Because of the underestimation of the quasiparticle linewidth in the linear-bands model, a second satellite band can again be observed.

Figure 2.4: Spectral functions of isolated graphene for two different doping levels (n = 12.4 × 10^13 cm^−2 for the higher doping level and 3.7 × 10^13 cm^−2 for the lower doping level) at the Dirac point from ab initio GW (a), ab initio GW+C (b), linear-bands GW (c), and linear-bands GW+C (d) theories. All energies are divided by the non-interacting Fermi energy ǫF and all wave vectors are divided by the Fermi wave vector kF.

Figure 2.5: Spectral functions for graphene on a SiC substrate with electron doping corresponding to a charge density n = 12.4 × 10^13 cm^−2 from (a) ab initio GW, (b) ab initio GW+C, (c) linear-bands model GW, and (d) linear-bands model GW+C theories along the K − Γ direction in the Brillouin zone. All energies are given with respect to the interacting Fermi energy.

Figure 2.6: (a), (c) and (d): Spectral functions of graphene on a SiC substrate with electron doping corresponding to a charge density of n = 12.4 × 10^13 cm^−2 from ab initio GW (blue curves) and GW+C (red curves) theories at the Dirac point (a), at k/kF = 0.4 for the upper Dirac band (c), and the lower Dirac band (d). (b): Comparison of scaled ab initio spectral functions at the Dirac point with experimental ARPES data [15]. All energies are given with respect to the interacting Fermi energy.

In Figure 2.7(a), we compare our results from ab initio GW+C theory to the experimental ARPES spectrum of Bostwick and coworkers [15]. Because experimental data is not available for the high doping level used in our calculation, we compare our scaled theoretical spectral functions to the scaled experimental data [86]. We find good agreement between theory and experiment for the intense upper quasiparticle band and the weak upper satellite band. In the experimental spectrum, the lower satellite band is strong, while the lower quasiparticle band is only observed in the vicinity of the Dirac point. In contrast, the ab initio GW+C theory yields an intense lower quasiparticle band and a weaker lower satellite band. Figure 2.6(b) compares the scaled ab initio GW and GW+C spectral functions at the Dirac point to the experimental ARPES data. Both for the quasiparticle-satellite separation and for the width of the two peaks, ab initio GW+C theory agrees much better with experiment than ab initio GW theory. In Fig. 2.7(b), we also compare the linear-bands GW+C results to experiment. While the linear-bands results are qualitatively similar to the ab initio results, the positions of the bands can be more easily observed because of the unphysically small linewidths. We attribute the differences between our calculations and the experimental spectrum to extrinsic and interference effects.
These effects, which are very important in photoemission from bulk solids [38, 68], result from inelastic scattering processes that the photoelectron undergoes on its way to the detector and transfer spectral weight from the quasiparticle peak into the satellite peaks. In Fig. 2.7(b), we have indicated the positions of the quasiparticle and satellite bands from linear-bands GW+C theory by dashed lines. It can be seen that a transfer of spectral weight from the lower quasiparticle band to the lower satellite band would improve the agreement between theory and experiment. Further work is therefore needed to include extrinsic and interference effects in the current theory.

2.5 Conclusions

We have presented spectral functions from ab initio GW and GW+C theory for doped graphene in isolation and on a silicon carbide substrate. We find that ab initio GW+C theory improves the description of plasmon satellite features compared to ab initio GW theory. Specifically, we find good agreement with the experimental angle-resolved photoemission spectra of Bostwick and coworkers [15] for the upper quasiparticle band and the upper satellite band. In contrast to the experimental spectrum, we find an intense lower quasiparticle band and a relatively weak lower satellite band. We attribute this discrepancy to extrinsic and interference effects, which are not included in the present theory. We also carried out GW and GW+C calculations with a linear-bands model starting point. Using a substrate dielectric constant derived from ab initio calculations, we find good overall agreement with the first-principles results. However, the calculations based on the linear-bands model exhibit significantly sharper linewidths than the ab initio results and, as a consequence, a second plasmon satellite is observed in the linear-bands GW+C spectral functions in the vicinity of the Dirac point.

Figure 2.7: (a): Comparison of ab initio GW+C spectral functions to the experimental ARPES spectrum obtained by Bostwick and coworkers [15]. Experiment is on the left, theory on the right. (b): Comparison of GW+C spectral functions with a linear-bands starting point to the experimental ARPES spectrum. Again, experiment is on the left, theory on the right. Dashed lines indicate the positions of the quasiparticle and satellite bands in the linear-bands GW+C theory. All energies are given with respect to the interacting Fermi energy.

3 Other Studies of Plasmon Satellites: Bulk Si and 2DEGs

3.1 Summary

In this section we review two other cases in which the cumulant expansion gives better agreement with experiment for satellite properties than the GW approximation, namely the ARPES spectra of bulk Si and the time-domain capacitance spectra of semiconductor quantum wells. In both of these systems the quasiparticle-satellite separation is again overestimated in the GW calculations, and GW theory erroneously predicts the existence of a plasmaron. The cumulant expansion again gets the quasiparticle-satellite separation correct and predicts the existence of a plasmon satellite due to relatively weak electron-plasmon interactions.

3.2 Introduction

The tunneling current I between two different systems, with DOSs given by gl(ǫ) and gr(ǫ), when a bias V is applied across them is given by a formalism due to Bardeen [55] as

$$I(V) = \int_{\epsilon_F}^{\epsilon_F + eV} d\epsilon\, g_r(\epsilon + eV)\, g_l(\epsilon)\, |M(\epsilon)|^2, \qquad (3.1)$$

where ǫF is the Fermi energy, e is the charge of the electron, and M(ǫ) is the tunneling matrix element. In time-domain capacitance spectroscopy one measures the tunneling between a three-dimensional electrode and a two-dimensional electron gas (2DEG) [23]. If we assume the DOS of the electrode and the tunneling matrix elements are slowly varying functions of energy, then the derivative of the tunneling current gives the DOS of the 2DEG. We will compare the DOS obtained in this fashion to the theoretical many-body density of states, which is given as a sum over the spectral functions for all states in the system (see equation (1.17)). We use a 2DEG model Hamiltonian with a modification to take into account the screening in the system [70]. For bulk Si, we will compare ARPES spectra to the corresponding theoretical quantity in exactly the same fashion as for doped graphene. For details of the calculations performed and the experimental setups, see our original papers on the 2DEGs [70] and on bulk Si [69].
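The statement that the derivative of the tunneling current recovers the 2DEG DOS can be checked numerically. In the sketch below the electrode DOS g_r and matrix element M are set to constants, and the 2DEG DOS g_l is a made-up toy model, not data from this work.

```python
import numpy as np

# With g_r and |M|^2 constant, Eq. (3.1) reduces to
# I(V) = integral from eps_F to eps_F + eV of g_l(eps) d(eps),
# so dI/dV is proportional to g_l(eps_F + eV).
eps_F = 0.0

def g_l(e):
    # toy 2DEG DOS: constant background plus a "satellite" bump (made up)
    return 1.0 + 0.3 * np.exp(-((e - 0.8) ** 2) / 0.01)

def current(v):
    e = np.linspace(eps_F, eps_F + v, 2001)
    return np.sum(g_l(e)) * (e[1] - e[0])   # simple Riemann sum

V = np.linspace(0.01, 1.5, 300)
I = np.array([current(v) for v in V])
dIdV = np.gradient(I, V)

# dI/dV tracks the 2DEG DOS at eps_F + eV
err = np.max(np.abs(dIdV[1:-1] - g_l(eps_F + V[1:-1])))
print(err < 0.05)
```

The "satellite" bump in the toy DOS shows up directly in dI/dV, which is the sense in which the measured conductance maps out the many-body density of states.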

3.3 Results

3.3.1 Bulk Si

We plot the GW and GW+C simulated ARPES spectra along with experiment in figure 3.1(a)-(c) for the full energy range, including the valence bands and plasmon satellite bands. We see that the valence bands are well reproduced by both the GW and GW+C theories, as expected. However, the plasmon satellite bands appear at too low an energy (farther below the valence bands) in the GW theory, while their location is well reproduced by the GW+C theory. In 3.1(d)-(f) we plot the same spectra, but in the energy range of only the plasmon satellites. From these plots we can see that, in addition to getting the location of the plasmon satellites wrong, the GW theory also gets the intensities wrong, with the intense lowest satellite band at Γ being the most notable example. This intense plasmon satellite band comes from the same source as in doped graphene: the incorrect prediction of another solution to Dyson's equation by the GW theory [69]. The GW+C theory, on the other hand, does not predict another solution to Dyson's equation and gets the intensity of the satellite bands correct.

3.3.2 2DEGs in semiconductor quantum wells

In figure 3.2 we compare the spectral functions predicted by the GW and GW+C theories for several values of the Fermi wavevector. We see, as in doped graphene and bulk Si, that the plasmon satellite is farther below the quasiparticle peak in the GW theory than in the GW+C theory. Additionally, the satellite peak is more pronounced in the GW theory, which again is due to the incorrect prediction by the GW theory of another solution to Dyson's equation [70]. In figure 3.3, we plot the quasiparticle and satellite peak band edges in the GW and GW+C theories versus experiment as a function of doping. Unsurprisingly, we again see that GW+C theory predicts the experimental quasiparticle-satellite separation quite accurately at all dopings, while the GW theory overestimates this separation.

Figure 3.1: (a): Experimental ARPES spectra. (b) GW simulated ARPES spectra. (c) GW+C simulated ARPES spectra. (d), (e), and (f): same as (a), (b), (c), but in the energy range of the plasmon satellite (-35 to -15 eV). All energies are given with respect to the valence band maximum.

Figure 3.2: Ab initio spectral functions from GW (dashed blue line) and GW+C (solid red line) at (a) k = 0.2kF, (b) k = 0.4kF, (c) k = 0.6kF, and (d) k = 0.8kF for doping n = 5 × 10^10 cm^−2. All energies are given with respect to the interacting Fermi energy.

Figure 3.3: Quasiparticle and satellite peak positions for GW, GW+C, and experiment as a function of doping.

3.4 Discussion

At this point the lessons from our work on doped graphene, bulk Si, and 2DEGs in semiconductor quantum wells should be clear: while GW theory gives the location and linewidth of the quasiparticle bands accurately, it incorrectly predicts another solution of Dyson's equation for the plasmon satellite and overestimates the quasiparticle-satellite peak separation. The cumulant expansion also predicts the location and linewidth of the quasiparticle bands accurately, but it does not give a spurious extra solution to Dyson's equation for the plasmon satellite and accurately predicts the quasiparticle-satellite peak separation. Thus, when calculating satellite properties due to electron-electron interactions, the cumulant expansion is preferable to the GW approximation. Both theories work when calculating quasiparticle properties, but the GW theory takes less work and is preferable if one only cares about quasiparticle properties.

4 Carrier lifetimes due to electron-electron scattering

4.1 Summary

In this section we discuss the calculation of carrier lifetimes in bulk Si and GaAs due to electron-electron scattering in the GW approximation. We discuss another contribution to the carrier lifetimes, from electron-phonon scattering, for context and completeness.

4.2 Introduction

The calculation of carrier lifetimes is important when considering harvesting the energy of hot carriers before they cool to the band edges in photovoltaic devices. To get accurate carrier lifetimes, the contribution due to electron-phonon scattering is needed in addition to the contribution from electron-electron interactions that can be computed from the GW approximation. This can be done by using density functional perturbation theory to calculate the electron-phonon coupling and then using this coupling to calculate the electron-phonon contribution to the carrier lifetime [12]. The resulting lifetime, τel−ph, can be combined with the lifetime due to electron-electron interactions, τel−el, to get the total lifetime τtot using Matthiessen's rule:

$$\tau_{tot}^{-1} = \tau_{el-el}^{-1} + \tau_{el-ph}^{-1}. \qquad (4.1)$$
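Combining channels via Eq. (4.1) amounts to adding scattering rates; the two input lifetimes in this sketch are hypothetical illustrative values, not results from the calculations below.

```python
# Matthiessen's rule, Eq. (4.1): scattering rates (inverse lifetimes) add.
def matthiessen(tau_el_el, tau_el_ph):
    return 1.0 / (1.0 / tau_el_el + 1.0 / tau_el_ph)

# Hypothetical lifetimes in femtoseconds; the faster (phonon) channel dominates
tau_tot = matthiessen(tau_el_el=500.0, tau_el_ph=100.0)
print(round(tau_tot, 1))
```

Note that the total lifetime is always shorter than the shortest contributing lifetime, which is why the electron-phonon channel dominates near the band edge where the electron-electron rate vanishes.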

The carrier lifetime due to the electron-electron interaction is important in its own right, because the only way an electron can scatter to another state, considering only electron-electron interactions, is via impact ionization. Impact ionization is the inverse of the usual Auger process, i.e. in impact ionization a high energy conduction band electron scatters to a lower energy state with the promotion of an electron from the valence band to the conduction band. Impact ionization is of interest for photovoltaics as well, since it is a mechanism to get multiple carriers from a single photon, thus increasing the photovoltaic efficiency.

4.3 Results

4.3.1 Bulk Si

In figure 4.1(a) we plot the imaginary part of the self energy due to electron-electron interactions and electron-phonon interactions, as well as the DOS, as a function of energy. As expected, the imaginary part of the self energy due to electron-phonon interactions follows the DOS (a higher density of states gives more possible states for an electron to scatter into when interacting with a phonon; all states can be accessed by phonon scattering due to their low energy). Also, the imaginary part of the self energy due to electron-electron interactions only becomes non-zero roughly 1.5 eV away from the band edge. This makes sense in light of our previous discussion, because an electron would have to promote another carrier across the gap (∼1.1 eV in Si) via impact ionization in order to scatter and have a finite lifetime (non-zero ImΣ). The reason that the threshold is 1.5 eV and not 1.1 eV is that momentum conservation also has to hold for the scattering process. In figure 4.1(b) we plot the carrier lifetimes (or relaxation times, if one is thinking in a Boltzmann transport framework) due to electron-electron interactions and electron-phonon interactions, as well as the total lifetime. The total lifetime within a few eV of the band edge is dominated, unsurprisingly, by the electron-phonon contribution. This is the range relevant for hot carriers in photovoltaic devices. In figure 4.1 we aggregate all of the lifetimes throughout the Brillouin zone (BZ) and plot them as a function of energy. We can also look at the lifetimes resolved in the Brillouin zone. In figure 4.2 we plot the BZ-resolved carrier lifetimes. We will discuss below how this microscopic information can be used to give further insight into the design of photovoltaic cells.
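The connection between ImΣ and the plotted lifetimes is τ = ħ/(2|ImΣ|); the ImΣ value in this sketch is a hypothetical illustration, not a number read off the figures.

```python
# Convert an on-shell |Im Sigma| to a carrier lifetime via
# tau = hbar / (2 |Im Sigma|).  The input value is illustrative only.
hbar_eV_fs = 0.6582119569      # hbar in eV * fs
im_sigma = 0.01                # example |Im Sigma| in eV

tau_fs = hbar_eV_fs / (2.0 * im_sigma)
print(round(tau_fs, 1))        # lifetime in fs
```

A self-energy broadening of tens of meV thus corresponds to lifetimes of tens of femtoseconds, setting the scale for the plots in figure 4.1(b).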

4.3.2 Bulk GaAs

In figure 4.3(a) we plot the impact ionization rate in bulk GaAs calculated in the GW approximation as a function of energy, along with a fit via a simple Keldysh model [56] (which assumes a direct band gap and parabolic dispersion). Again, we see an onset at some small energy above the band gap, since a carrier has to be excited from the valence band across the gap while conserving momentum. The Keldysh fit captures the behavior reasonably well.
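A Keldysh-type fit of the kind used in figure 4.3 can be sketched as a least-squares problem. The threshold, prefactor, and synthetic "data" below are made-up numbers (real GW rates would be read in instead), so the fit simply recovers the input prefactor.

```python
import numpy as np

# Keldysh-type form near threshold: R(E) = P * ((E - E_th) / E_th)**2 for
# E > E_th.  P_true and E_th are hypothetical illustrative values.
P_true, E_th = 2.0, 1.5                  # prefactor (1/fs) and threshold (eV)
E = np.linspace(1.6, 4.0, 25)
R = P_true * ((E - E_th) / E_th) ** 2    # synthetic rate "data"

# Linear least squares for P at a fixed threshold
basis = ((E - E_th) / E_th) ** 2
P_fit = float(np.sum(basis * R) / np.sum(basis ** 2))
print(round(P_fit, 3))
```

With noisy first-principles rates, the same one-parameter least-squares step gives the best-fit prefactor at each trial threshold, and the threshold itself can be scanned.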

Figure 4.1: (a) The imaginary part of the self energy due to electron-electron and electron-phonon interactions, as well as the density of states, as a function of energy. (b) The carrier lifetimes, or relaxation times, due to electron-electron and electron-phonon contributions, and the total lifetime. All energies are given with respect to the middle of the band gap.

4.4 Discussion

The microscopic information presented in this chapter has only come within reach of ab initio calculations with the increased computing power of the last decade. The results presented on bulk Si and GaAs are surely only the beginning of a much wider exploration of what can be learned about device design by considering such microscopic information. For example, by considering in which direction of the BZ the carrier lifetimes are the longest, we learn in which crystallographic direction we can grow thin-film solar cells for maximum efficiency [12]. Additionally, by combining the carrier lifetimes with the radiative lifetimes at the band edge, one can compute the diffusion length, a key determinant of photovoltaic efficiency. That is the next step in this process of using microscopic information to guide the design of better photovoltaics. I for one am excited about it!

Figure 4.2: Brillouin-zone-resolved carrier lifetimes and ImΣ in bulk Si.

Figure 4.3: Impact ionization rate in bulk GaAs, with a fit via the simplified Keldysh model. Energies are relative to the conduction band minimum.

Pseudowavefunctions in the GW approximation

5.1 Summary

We perform ab initio GW calculations on a variety of atomic and bulk systems in order to understand the error caused by the use of pseudowavefunctions in GW calculations. For clarity, we first focus on atomic Si, analyzing the contributions to the bare exchange in depth since the bare exchange is the largest contribution to the error from pseudowavefunctions. Calculations with pseudowavefunctions are found to overestimate contributions to the bare exchange when the peak separation between the innermost and outermost peak in the AE calculation is large for the given contribution, and vice-versa. We confirm that these results hold in a range of atomic systems, and use this information to understand the overestimate of the band gap in bulk systems when using pseudowavefunctions. The peak separation in bulk systems is found to be large, giving an overestimate of the magnitude of the bare exchange for all states when using pseudowavefunctions. This overestimate is larger for the valence states, which gives the increase in the band gap when using pseudowavefunctions in GW calculations.

5.2 Background

The electronic bandstructure is a crucial quantity in understanding a variety of physical properties of crystalline systems, including optical absorption. Attempts to calculate bandstructures accurately date back decades, to the early works of Slater, Wigner, and Seitz [92, 100]. The use of the plane-wave pseudopotential (PW-PP) methodology was recognized early as an efficient, simple method for calculating electronic bandstructures, as well as a range of other properties [27, 44, 3, 85, 19, 39]. Indeed, new methods or physical approximations are often first attempted with plane waves and pseudopotentials, and many methods are primarily available in the PW-PP methodology due to the ease of convergence, scalability, and simplicity of the underlying expressions [10, 9, 34, 33, 17, 78]. This was the case for the first accurate, fully ab initio calculations of electronic bandstructures [47, 31], which were done using many-body perturbation theory, specifically the GW approximation [40]. These early PW-PP GW calculations found very good agreement with experiment for the band gaps of a wide variety of materials. However, later calculations using all-electron (AE) methodologies found a discrepancy with these PW-PP GW results, with the AE GW calculations generally showing smaller band gaps than PW-PP GW calculations [4, 61, 59, 26]. PW-PP GW calculations performed with shallow core states included explicitly in the calculation seemed to agree with the AE GW calculations [93]. Questions remain about this issue, though, because the convergence of both the AE and PW-PP calculations was questionable, especially in light of recent results showing the need to include many bands in order to converge GW calculations [76, 57].
More recent AE calculations from Li and collaborators claim to have addressed the convergence issue, and then proceed to decompose the error in the band gap from PW-PP GW calculations into three contributions: 1) relaxation of the core electrons, 2) core-valence partitioning, and 3) pseudowavefunctions [32, 66]. The first two contributions are understandable physically, as they result from the separation of the electrons into inert core states and active valence states, with the core states frozen at the atomic level and only included implicitly through the pseudopotential. However, the physical reason that using pseudowavefunctions gives an error in the band gap is not clear, and has not been discussed thoroughly. The pseudowavefunctions do not have nodes, and this clearly has an effect on the results of Li et al. [66], but it is unclear why. In this chapter, we investigate the error caused by the use of pseudowavefunctions in GW calculations of the band gap by performing PW-PP calculations with and without the shallow core states explicitly included in the ground-state calculation, performing the GW calculations using only the valence states. The valence wavefunctions from the ground-state calculations with the shallow core states included are practically equivalent to AE valence wavefunctions. We first analyze the Si atom, focusing on the bare exchange since it is the largest source of error and has a simple dependence on the pseudowavefunctions used to compute it. We discuss the source of error in the bare exchange when using pseudowavefunctions and how it varies based on the relative position of the outermost peak of the AE wavefunction. We then confirm that our results also hold for the Ar, Ga, and As atoms, and use these results to explain the error in the band gap due to pseudowavefunctions for bulk Si, Ar, and GaAs.

5.3 Methods

The bare exchange for an atomic state ψn is given by

\langle \psi_n | \Sigma_x | \psi_n \rangle = - \sum_{n'}^{occ} \int d^3r \, \rho_{n,n'}(r) \int d^3r' \, \frac{\rho^*_{n,n'}(r')}{|r - r'|}    (5.1)

where ρ_{n,n'}(r) = ψ*_n(r) ψ_{n'}(r) is the exchange charge density associated with a given contribution n' to the bare exchange of the state ψ_n. As will be shown, it is necessary to look at specific contributions n' to the total bare exchange of the state ψ_n in order to understand what is causing the difference between PW-PP and AE calculations.

We can rewrite equation (5.1) by defining an exchange potential Vn,n′ as

V_{n,n'}(r) = \int d^3r' \, \frac{\rho^*_{n,n'}(r')}{|r - r'|}    (5.2)

and exploiting the spherical symmetry of atomic systems to obtain

\langle \psi_n | \Sigma_x | \psi_n \rangle = - \sum_{n'}^{occ} \int d\Omega \, dr \, (r \, \rho_{n,n'}(r)) (r \, V_{n,n'}(r))    (5.3)

In the figures presented in this chapter, we include the radial phase-space factor r in equation (5.3) in order to give the proper weight at each point. We also perform the angular integrals in order to eliminate dependence on these variables and simplify the discussion.

Ground-state density functional theory calculations with the Perdew-Zunger LDA functional [83] were carried out using the Quantum ESPRESSO PWscf code [30, 25]. In the calculations where the shallow core states were kept in the pseudopotential, the wavefunctions were expanded up to a kinetic energy cutoff of 60 Ry for Si, Ga, and As and 80 Ry for Ar. A cutoff of 700 Ry (900 Ry) was used for calculations that explicitly included the shallow core states. These values of the kinetic energy cutoff are very large and give errors of less than 10 meV in the self energy. A box size of 13.5 bohr was used for all atomic calculations. We refer to the calculations with only the valence electrons as PS (for pseudo) in this work, to denote that it is the usual PW-PP calculation. We refer to the calculations that also include the shallow core states as AE, since they are essentially all-electron calculations. The 1s electron is very deeply bound in energy and space and does not affect our conclusions. For the calculations on the open-shell atoms (Si, Ga, and As) we equally occupy the p-orbitals to avoid issues related to degenerate ground states. GW calculations were performed using the BerkeleyGW package [21, 11]. For the atomic calculations, we focus only on the bare exchange, so the only relevant convergence parameter is the kinetic energy cutoff, which we chose to be the same as for our ground-state calculations. We use spherical truncation of the coulomb interaction with a cutoff radius of 6.75 bohr [48].
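As a concrete illustration of the structure of equations (5.1)-(5.3), the following sketch evaluates the analogous radial integrals for a hydrogen 1s density on a grid, for which the exact Coulomb self-interaction integral is 5/8 hartree. This is a toy quadrature check, not one of the Si calculations described above:

```python
import numpy as np

# Radial grid (atomic units)
r = np.linspace(1e-4, 20.0, 20000)
dr = r[1] - r[0]

# Toy exchange charge density: hydrogen 1s with itself, rho = |psi_1s|^2
rho = np.exp(-2.0 * r) / np.pi

# Spherically symmetric analogue of Eq. (5.2):
# V(r) = 4*pi * [ (1/r) * int_0^r rho(r') r'^2 dr' + int_r^inf rho(r') r' dr' ]
inner = np.cumsum(rho * r**2) * dr
outer = np.cumsum((rho * r)[::-1])[::-1] * dr
V = 4.0 * np.pi * (inner / r + outer)

# Radial analogue of Eq. (5.3): 4*pi * int (r*rho)(r*V) dr.
# The bare exchange contribution would carry an overall minus sign.
Ex = 4.0 * np.pi * np.sum((r * rho) * (r * V)) * dr
# Exact value of the 1s self-interaction integral: 5/8 hartree
```

The (r*rho)(r*V) form makes the radial phase-space factor explicit, which is exactly the weighting used in the figures of this chapter.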

5.4 Results

We now consider the bare exchange in atomic Si in order to clearly understand the effect of using pseudowavefunctions in place of AE wavefunctions in a simple model system. In Fig. 5.1 we show the exchange charge density as a function of the radial distance from the nucleus, for n = n' = 3s in the top panel and for n = n' = 3p in the bottom panel. We see that the AE and PS exchange charge densities agree at large r, while showing some discrepancy for small r, as expected. The locations of the peaks of the AE exchange charge densities are indicated with arrows. The inner peak for the n = n' = 3s exchange charge density,

ρ_{3s,3s}, is farther from the origin than for ρ_{3p,3p} due to the requirement of orthogonality with the 1s state. The outer peak for ρ_{3s,3s} is closer to the origin than for ρ_{3p,3p}, which is reasonable given that the 3s state is of lower energy than the 3p state. The inner-outer peak separation for ρ_{3p,3p} is 1.56 bohr, while for ρ_{3s,3s} it is 1.22 bohr. This peak separation, l_sep, determines whether the PS calculation over- or underestimates the bare exchange contribution, as will be discussed below.

In Fig. 5.2 the exchange potential is plotted. Both V_{3s,3s} and V_{3p,3p} show almost negligible differences between the AE and PS calculations, with very small differences only at small r, where the potential is near zero. This small difference in the exchange potential, in spite of a relatively significant difference in the exchange charge density, results from the integral over the coulomb kernel in Eq. (5.2) and the phase-space factor, both of which suppress differences between the AE and PS calculations. The combination of this small difference in the exchange potential and the fact that the exchange potential goes to zero at small r, where the AE and PS exchange charge densities are most different, is responsible for the relatively small absolute differences between the AE and PS bare exchange. Note that though these absolute differences are small in magnitude when compared to the magnitude of the bare exchange, they are the biggest component of the error in the self energy, which in semiconductors leads to a noticeable error in the band gap when using pseudopotentials. In Fig. 5.3 the product of the exchange charge density and the exchange potential, ρV, is plotted versus radius. This is the quantity that is integrated in order to get the given bare exchange contribution (cf. Eq. (5.3)). For small r, the AE ρV has a peak and is larger than the PS ρV. This peak is significant for ρ_{3s,3s}V_{3s,3s}, while for ρ_{3p,3p}V_{3p,3p} it is smaller. This is because the peaks in the exchange charge densities are closer together for ρ_{3s,3s} than for ρ_{3p,3p}, which leads to more interaction between the exchange charge densities in these two regions. At some intermediate r there is a crossover, at which point the PS ρV is larger than the AE ρV. Finally, at large r the AE and PS ρV are the same.
What determines whether the PS bare exchange contribution is larger than the AE bare exchange contribution is whether the extra weight in the PS ρV in the intermediate r region outweighs the extra weight in the AE ρV in the small r region.

[Plot: r·ρ_{n,n'} (bohr^-2) versus radius (bohr), two panels.]

Figure 5.1: The exchange charge density. The top panel is for n = n' = 3s and the bottom panel is for n = n' = 3p. See equations (5.1)-(5.3) for the meaning of n and n'. AE results are shown with solid lines, while the PW-PP results are shown with dotted lines. Arrows indicate the peak locations of the exchange charge densities.

[Plot: r·V_{n,n'} (eV·bohr) versus radius (bohr), two panels.]

Figure 5.2: The exchange potential. Labeling and panels same as in Fig. 5.1.

[Plot: r²·ρ_{n,n'}·V_{n,n'} (eV·bohr^-1) versus radius (bohr), two panels; right axis: running integral of the difference (AE minus PS) in eV.]

Figure 5.3: The product of the exchange charge density and the exchange potential. Labeling and panels same as in Fig. 5.1. The lines with points are the running integral of the difference of the product of the exchange charge density and exchange potential between the AE and PS calculations, as indicated on the right vertical axis.

For the n = n' = 3s case, the PS bare exchange contribution is smaller due to the larger small-r peak in the AE ρV, while for n = n' = 3p the PS bare exchange contribution is larger due to the smaller peak in the AE ρV at small r. To see this more clearly, we note that taking the difference between the AE and PS ρV and then integrating will give the difference between the AE and PS bare exchange contributions. If this integral is instead performed up to a given radius r, the result is the difference in the bare exchange contributions up to that r. We plot this quantity in Fig. 5.3. From this we can clearly see that the peak at small r for n = n' = 3s is large enough that it outweighs the intermediate r region, leading to a positive difference (AE minus PS) in the bare exchange contributions. For n = n' = 3p the peak at small r is not large enough and is outweighed by the intermediate r region, leading to a very small negative difference in the bare exchange contributions. Table 5.1 shows the contributions to the bare exchange for atomic Si, as well as the total bare exchange. In addition to the contributions for n = n' = 3s (n = n' = 3p) being larger (smaller) in magnitude for the AE calculation, as discussed, we see that the n = 3s, n' = 3p contribution is smaller in magnitude for the AE calculation. This can be understood in a similar way as the other contributions. The exchange charge density ρ_{3s,3p} takes both positive and negative sign at different points in real space, due to the angular dependence of the p-orbital. When the integrals in (5.1) are performed, if there are regions of alternating sign, then this will decrease the magnitude of the final result due to cancellation between regions of different sign. This can be seen in Table 5.1 by comparing the magnitude of the n = 3s, n' = 3p contribution to that of the n = n' = 3s or 3p contributions.
The region near the origin is more sensitive to this cancellation effect, since points there are closer to other points with differing sign than are points farther from the origin. This leads to a smaller peak in ρV at the origin for n = 3s, n' = 3p for the AE calculation and, by the above discussion, an AE bare exchange contribution magnitude that is smaller than for the PS calculation. Consideration of the angular dependence of the exchange charge density may be helpful in understanding results in molecular systems [54]. For bulk systems, such an analysis is likely to be too complicated to be fruitful. Table 5.1 also shows results for atomic Ga, As, and Ar, with trends in the bare exchange contributions that are identical to those seen in Si. Given that Ga and As have shallow d-electrons and Ar is much more localized than any of the other atoms, this shows that the trends discussed are robust across very different atomic environments. Note that though there are clear and consistent trends in the bare exchange contributions, the total bare exchange is not as consistent and indeed hides many of the trends. For example, in Si the total bare exchange for both the 3s and 3p states is smaller in magnitude for the AE calculation, despite the fact that the n = n' = 3s contribution is larger for the AE calculation. This is not the case in As, where the greater degree of localization leads the n = n' = 3s AE contribution to be large enough in magnitude that the 3s total bare exchange is larger for

Table 5.1: Bare exchange contributions and total bare exchange for atomic Si, Ga, As, and Ar. All energies are given in eV. l0 is the distance of the outer node of the 3p-state of each atom from the origin. It is a characteristic length scale for each atom and is given in bohr. l_sep is the distance between the inner and outer AE wavefunction peaks, so l_sep/l0 gives the scaled separation of the inner and outer AE wavefunction peaks.

                      l_sep/l0      AE        PS     % diff (AE-PS)
Si atom (l0 = 1.80)
  n = n' = 3s           0.68     -11.34    -11.30       -0.35
  n = n' = 3p           0.87      -9.50     -9.51        0.11
  n = 3s, n' = 3p       0.76      -2.15     -2.21        2.7
  <3s|Σ|3s>                      -13.49    -13.51        0.15
  <3p|Σ|3p>                      -11.65    -11.72        0.60
Ga atom (l0 = 2.04)
  n = n' = 3s           0.58     -10.45    -10.43       -0.19
  n = n' = 3p           0.78      -7.91     -7.92        0.13
  n = 3s, n' = 3p       0.65      -1.86     -1.90        2.2
  <3s|Σ|3s>                      -12.31    -12.33        0.16
  <3p|Σ|3p>                       -9.77     -9.82        0.51
As atom (l0 = 1.69)
  n = n' = 3s           0.63     -12.73    -12.66       -0.55
  n = n' = 3p           0.76     -10.456   -10.464       0.08
  n = 3s, n' = 3p       0.69      -2.33     -2.39        2.6
  <3s|Σ|3s>                      -15.06    -15.05       -0.07
  <3p|Σ|3p>                      -12.786   -12.854       0.55
Ar atom (l0 = 1.10)
  n = n' = 3s           0.75     -17.48    -17.36       -0.69
  n = n' = 3p           0.82     -15.99    -16.00        0.06
  n = 3s, n' = 3p       0.79      -3.48     -3.61        3.7
  <3s|Σ|3s>                      -20.96    -20.97        0.05
  <3p|Σ|3p>                      -19.47    -19.61        0.72

Table 5.2: LDA energy ε_lda, exchange-correlation potential Vxc, bare exchange Σx, correlation self energy Σc, renormalization factor Znk, and quasiparticle energy E_GW at the VBM (Γv) and the conduction band X-point (Xc) in Si. ∆ gives the difference between the values at Xc and Γv; for E_GW, ∆ is near the fundamental gap in Si and has been studied in past work. All energies are given in eV.

          ε_lda     Vxc      Σx       Σc     Znk    E_GW
AE G0W0
  Γv       7.44   -11.45   -12.55    0.28   0.79    6.79
  Xc       7.99    -9.14    -5.04   -4.12   0.79    7.97
  ∆        0.55     2.31    -7.49   -4.40           1.18
PS G0W0
  Γv       6.06   -11.26   -12.70    0.27   0.79    5.14
  Xc       6.67    -9.08    -5.08   -4.12   0.79    6.58
  ∆        0.61     2.18    -7.62   -4.39           1.44

the AE calculation. Without the detailed picture above, these numbers seem mysterious and the trends in the total bare exchange difficult to understand. For our calculations on bulk Si, Ar, and GaAs we used a dielectric matrix kinetic energy cutoff of 60 Ry (25 Ry), 1500 (400) empty states in the calculation of the dielectric matrix, and 1500 (600) states in the self energy calculation for the AE (PS) calculation. A BZ sampling of 8x8x8 was used. We use the Hybertsen-Louie plasmon-pole model [47] for the frequency dependence of the dielectric matrix, using the valence charge density to construct the model. In all cases we checked against full-frequency results and found a difference of only a few tens of meV. The total convergence error of our calculations, including contributions from the use of the plasmon-pole model, the k-point sampling, the wavefunction and dielectric matrix cutoffs, and the number of bands in the dielectric matrix and self energy calculations, is 50 meV. Experimental lattice constants of 5.43 Å, 5.31 Å, and 5.65 Å are used for Si, Ar, and GaAs for ease of comparison with past calculations. Our results for bulk Si are shown in Table 5.2. The first thing to note is that the bare exchange is smaller for both valence and conduction states for the calculation with the AE wavefunctions. This is in accordance with the previous discussion, as the inner-outer peak separation is larger in bulk Si than in atomic Si, so we would expect the bare exchange to be smaller for the calculation with the AE wavefunctions. For example, for the VBM exchange charge density ρ_{vbm,vbm}, l_sep/l0 = 0.95, which is bigger than any of the values in the atomic calculations.
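As a quick consistency check on Table 5.2, the quasiparticle energies follow from the standard linearized one-shot relation E_GW = ε_lda + Znk(Σx + Σc - Vxc). The snippet below verifies this against the tabulated values, to within the 0.01 eV rounding of the table:

```python
# Columns: eps_lda, Vxc, Sigma_x, Sigma_c, Z_nk, E_GW (all in eV, from Table 5.2)
table_5_2 = {
    "AE Gamma_v": (7.44, -11.45, -12.55,  0.28, 0.79, 6.79),
    "AE X_c":     (7.99,  -9.14,  -5.04, -4.12, 0.79, 7.97),
    "PS Gamma_v": (6.06, -11.26, -12.70,  0.27, 0.79, 5.14),
    "PS X_c":     (6.67,  -9.08,  -5.08, -4.12, 0.79, 6.58),
}

for label, (eps, vxc, sx, sc, z, egw) in table_5_2.items():
    # Linearized one-shot quasiparticle energy
    egw_pred = eps + z * (sx + sc - vxc)
    assert abs(egw_pred - egw) < 0.02, label
```

Running the check confirms that all four rows are internally consistent, so the ~0.26 eV difference in the ∆ of E_GW between the AE and PS calculations traces back to the tabulated Σx and ε_lda differences.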
The bare exchange overestimate is larger for the valence states since the bare exchange is larger for those states, and this leads to the opening of the gap from the self energy. Turning to other contributions, there is almost no difference in the dynamical self energy Σc: the value at Xc is -4.12 eV for both calculations, while at Γv its value is

0.28 eV (0.27 eV) for the AE (PS) calculation. This is reasonable since Si's core states are quite deep. This is opposite to what is found by Gómez-Abal and collaborators [32], who find that the self energy difference is in the dynamical self energy. This is likely caused by a lack of basis-set convergence in their FLAPW GW calculations, as discussed by Kresse et al. [57]. We indeed find that when we lower our convergence parameters we are able to obtain numbers similar to those of Gómez-Abal et al. The other differences in the gap come from the LDA eigenvalues and the exchange-correlation potential. The LDA eigenvalues show a large spread in the literature for both PW-PP and AE calculations, so we do not discuss this further. The exchange-correlation potential is uniformly larger for the calculation with the AE wavefunctions than for the calculation with the PS wavefunctions. This is likely because there is a strong interaction in the small r region for the AE calculation, since in LDA the exchange-correlation potential at a point is determined directly from the charge density at that point. So, there is an overestimate of the interaction in the small r region, since the charge density is quite high there. We can see from Fig. 5.2 that the peaks in the exchange charge density get significantly smoothed out when the integrations over r' are performed to get the exchange potential, as discussed previously. We find similar results in Ar and GaAs, showing that our explanation works across a variety of systems. In summary, we have carried out first-principles GW calculations on atomic Si, Ga, As, and Ar, as well as their bulk counterparts. By focusing on individual contributions to the bare exchange in the atomic case, this work has shown the source of the error from the use of pseudowavefunctions.
In particular, the finding that states with a larger inner-outer peak separation in their AE wavefunction have an overestimated PS bare exchange, and vice-versa, immediately explains the overestimation of the bare exchange for calculations with pseudowavefunctions and the corresponding overestimate of the gap. Our results may be used to further understand previously published results on the effect of the PW-PP approximations in bulk [66, 32] and molecular systems [54]. Critically, we believe these findings can be used to develop better norm-conserving pseudopotentials that more accurately reproduce the AE results for the bare exchange, which is crucial given the widespread use of the PW-PP methodology across many fields of research.

Self consistency in GW calculations

6.1 Summary

Here we briefly discuss research that we did to assess whether self-consistently iterating the GW equations gives better results than simply using the mean-field wavefunctions and energies. In particular, we looked at the effect of updating just the mean-field energies in both the Green's function and the dielectric matrix, as well as the more complicated quasiparticle self-consistent GW, or QSGW, scheme, which updates both the wavefunctions and the energies [26]. We found that, if the mean-field wavefunctions are of high quality, updating the energies gives worse agreement with experiment, because the screened interaction is described more poorly. However, if the mean-field wavefunctions are not of high quality, then updating the wavefunctions and energies can be effective in getting better results, but updating just the energies is not. The QSGW method is quite expensive, however, so more affordable methods, such as the static COHSEX method [52, 50], will likely give the same benefits at a much lower cost.

6.2 Introduction

Almost all GW calculations are performed non-self-consistently, using the wavefunctions and energies from some mean-field calculation with no modifications or updates.

Calculations done in this way are referred to as G0W0 calculations, where the zero subscript denotes that the given quantities are computed from the mean-field energies and wavefunctions, with no updates. Coming from the perspective of self-consistent mean-field theories such as DFT or the HFA, it is tempting to think that self-consistency is always needed and will improve the quality of the answer. Unfortunately, the situation is more complicated in the case of the GW approximation (GWA).

That is because the GWA is an approximation to a full set of self-consistent equations, called the Hedin equations, which, in addition to the two-point Green's function and screened coulomb interaction, also contain a three-point function called the vertex function Γ [40]. The full approximation is referred to as the GWΓ approximation. The vertex function contains important excitonic effects, which have the effect of reducing the band gaps that enter into the expressions for the polarizability and screened coulomb interaction. Unfortunately, the vertex function is too complex to calculate from first principles, so it is usually neglected. Herein lies the reason why doing self-consistency in both G and W is not advisable. If one is not including the vertex function, then the excitonic effects are lost in the screened coulomb interaction, so there is an underestimate of the screening due to the overestimate of the band gap. Since decreased screening opens band gaps (with the zero screening of the HFA being the far limit), self-consistency in GW calculations tends to give overestimated band gaps. Somewhat counterintuitively, using the mean-field energies from DFT, which underestimate the band gap, to construct the screened coulomb interaction W actually gives better results, because the DFT energies better reproduce the combination of quasiparticle energies and excitonic effects that would enter W than do the quasiparticle energies alone. The only time that full self-consistency in G and W might make sense is if the mean-field wavefunctions are of poor quality and updating the wavefunctions self-consistently is needed to get reasonable results. However, I'll argue below that in that case it probably makes more sense to improve the quality of the mean field, since self-consistency is expensive and a perturbative improvement to the mean field is unlikely to work when the mean field is truly of poor quality.
Note that updating only the Green's function is a reasonable thing to do and has been shown to give improvements over the standard G0W0 approach [91, 90]. Such calculations are referred to as GW0 calculations, with the zero subscript removed from the Green's function since it is being updated with new energies. We do not discuss such calculations in this work. In the QSGW scheme, a static exchange-correlation potential V^{XC}_{QSGW} is constructed from the matrix elements of the self energy operator

V^{XC}_{QSGW} = \frac{1}{2} \sum_{ij} | \psi_i \rangle \left\{ [\mathrm{Re}\,\Sigma(\epsilon_i)]_{ij} + [\mathrm{Re}\,\Sigma(\epsilon_j)]_{ij} \right\} \langle \psi_j | ,    (6.1)

and then this potential is used in a DFT calculation in the place of the usual LDA or PBE exchange-correlation potential. From this, new wavefunctions and energies are obtained, which are used to perform another GW calculation, and this process is iterated until convergence is achieved. Here ψi and εi are the usual mean-field wavefunctions and energies, and Re Σ denotes the hermitian part of the self energy operator (= 1/2(Σ + Σ†)). This hermitian part is additionally evaluated at the energies of the rows and columns, and the two results are averaged. This maintains the hermiticity of the operator, so that it can serve as a mean-field potential in a density functional theory calculation. A few notes about the QSGW method: 1) It is very expensive. In a usual GW calculation, one only calculates the diagonal matrix elements of the self energy operator. In the QSGW scheme, one needs to also calculate the off-diagonal matrix elements, as well as iterate the process. This leads to an increase in cost by a factor of N²_mtx · N_iter / N_mtxel, where N_mtx is the rank of the self energy matrix needed to adequately represent the potential V^{XC}_{QSGW}, N_iter is the number of iterations needed to reach convergence, and N_mtxel is the number of matrix elements you want to calculate. Let's say you are doing a calculation on Si to get the valence bands and lowest four conduction bands. Even in Si, a very simple system that can be adequately described with a 32x32 self energy matrix and four iterations, this factor is already 512. So, unless it gives very impressive results, it probably does not warrant the cost. 2) It is meant to alleviate the dependence of the results of GW calculations on the mean-field starting point.
The fact that the GW results can depend on the mean-field starting point is not terribly shocking, since the mean-field energies and wavefunctions appear in all the GW expressions, and perturbation theory necessarily works by calculating results away from some starting point. For many simple materials, such as Si and other standard semiconductors, this dependence on the starting point is quite small, as there is little difference between GW calculations performed with LDA wavefunctions and energies and GW calculations performed with other functionals' wavefunctions and energies. For more complicated systems with correlations, such as transition metal oxides, the dependence on the mean-field starting point is very severe, which would indicate that QSGW might be a good choice for such systems. However, in my view this is unlikely to be the case, as QSGW moves perturbatively away from some mean-field starting point, say LDA, and constructs its own mean field. To me it seems this process is very unlikely to yield impressive results for a system where the starting mean field is of very bad quality, such as the transition metal oxides, since a perturbative process is very unlikely to get to the right mean field from a poor starting mean field. Instead, improvements in the original, non-perturbative mean field are needed, such as the LDA+U or LDA+DMFT methods. In light of this, it seems that the use case for QSGW should be quite small, limited to cases where the wavefunctions are described poorly by standard mean fields, but not so poorly that they cannot be reached by the perturbative QSGW process. 3) The wavefunctions that are obtained from the QSGW method are not the true quasiparticle wavefunctions.
The true quasiparticle wavefunctions would be obtained by diagonalizing the many-body hamiltonian, including the difference of the off-diagonal self energy operator and the exchange-correlation potential, at a range of frequencies, and finding the frequencies for which one of the rows/columns of the diagonalized many-body hamiltonian matrix has the same value as the input frequency. The state corresponding to this row/column of the diagonalized hamiltonian matrix for the given frequency is the quasiparticle state for that frequency (or energy). One can see immediately that getting the true quasiparticle wavefunction in the GW approximation is quite expensive, and, practically, this is never done. The QSGW scheme, though expensive, is much less expensive than the process just described. The tradeoff is that the QSGW scheme still does not get the quasiparticle wavefunction completely accurately. Eigenvalue self-consistency is performed simply by doing a mean-field calculation, then calculating the polarizability and self energy to get quasiparticle energies, which can then be used to update the polarizability and one-particle Green's function given in equations (1.7) and (1.8). Eigenvalue self-consistency (ESC) is generally what researchers are doing when they say they are doing "self-consistent GW", since updating the wavefunctions is quite expensive (requiring the expensive diagonalization of the frequency-dependent many-body hamiltonian mentioned above). I personally find this use of the term "self-consistent GW" when one actually means eigenvalue self-consistent GW (ESC-GW) to be misleading, because those coming from self-consistent mean-field theory calculations will usually assume the wavefunctions are also being updated.
I think this leads many "blackbox" users of common GW codes to think that they are doing a more impressive calculation than they actually are, and to not understand why full self-consistency in G and W might not be a good thing if only the eigenvalues are being updated. It also confuses GW beginners. ESC-GW does not require the calculation of the off-diagonal elements of the self energy operator, so its cost is just that of a standard G0W0 calculation times N_update · N_iter / N_mtxel, where N_update is the number of eigenvalues calculated to update the Green's function and polarizability. This factor is usually relatively small. In both the QSGW and eigenvalue self-consistency schemes, the updates to the energies are done only for states up to a given cutoff (say, 32, in the example of Si), and for the rest of the energies a simple scheme such as a scissor shift is used to correct the energies of the states above this cutoff. In the QSGW scheme the wavefunctions above this cutoff are not updated. This is reasonable because the higher-energy states are closer to the continuum and thus feel the details of the potential less, so it is less important to represent the wavefunctions of these states accurately.
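The symmetrized construction in Eq. (6.1) can be sketched with toy matrices. The random Σ(ε_i) below are placeholders for real self-energy matrix elements; the point is only that the resulting potential comes out hermitian:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # toy basis of four states

# sigma[i] stands in for the matrix <psi_j|Sigma(eps_i)|psi_k>, i.e. the
# self energy evaluated at state i's energy; random placeholder values here.
sigma = rng.standard_normal((n, n, n)) + 1j * rng.standard_normal((n, n, n))

def hermitian_part(M):
    # "Re Sigma" in the sense of Eq. (6.1): (Sigma + Sigma^dagger) / 2
    return 0.5 * (M + M.conj().T)

# Eq. (6.1): V_ij = 1/2 { [Re Sigma(eps_i)]_ij + [Re Sigma(eps_j)]_ij }
V = np.empty((n, n), dtype=complex)
for i in range(n):
    for j in range(n):
        V[i, j] = 0.5 * (hermitian_part(sigma[i])[i, j]
                         + hermitian_part(sigma[j])[i, j])
```

Averaging the evaluation frequencies of the row and column states is exactly what keeps V hermitian, so that it can be fed back into a mean-field calculation as a static potential.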

6.3 Computational Details

All of the calculations in this section are underconverged by modern standards, as this is a preliminary work. Nonetheless, given that greater convergence will lead to larger band gaps and HOMO-LUMO gaps, we can already draw conclusions based on what we have found. The mean-field calculations in this work were done using the PARATEC code [81], while the excited-state calculations were done using the BerkeleyGW code [11, 21]. We perform calculations on bulk Si and silane. For both calculations we use the local density approximation of DFT for the mean-field calculations and the Hybertsen-Louie generalized plasmon-pole model for the frequency dependence of the inverse dielectric matrix [47]. For Si, we use a 4x4x4 Brillouin zone (BZ) sampling and a 30 Ry wavefunction cutoff for our ground-state calculation. For the excited-state calculation, we use a 4x4x4 BZ sampling, a 10 Ry screened coulomb cutoff, 32 states in the polarizability and coulomb hole summations, and a 32x32 representation of the QSGW exchange-correlation potential. For silane, we use a 100 Ry wavefunction cutoff for the mean-field calculation. For the excited-state calculation, we use a 10 Ry screened coulomb cutoff, 102 states in the polarizability and coulomb hole summations, and a 102x102 representation of the QSGW exchange-correlation potential. Additionally, in this work, for computational simplicity, we construct the mean-field hamiltonian including the QSGW potential and diagonalize it non-self-consistently to get updated wavefunctions and energies. The full QSGW method uses this new potential in a self-consistent calculation to get a new charge density, and then new wavefunctions and energies are obtained from a non-self-consistent calculation. Our results, however, are in good agreement with previous results in the literature, so skipping the self-consistent step does not seem to be of great importance.
Nonetheless, this simplification should be kept in mind in considering our results below.

6.4 Results

In Table 6.1 we give our results for the direct and Γ−X gaps of bulk silicon and the HOMO and LUMO levels of silane from our G0W0, ESC-GW, and QSGW calculations. We compare to the experimental values and also to the off-diagonal COHSEX (od-COHSEX) method for improving the mean field [50]. In bulk silicon we see the trend discussed in the introduction, namely that self-consistency causes the band gaps to be overestimated. The gaps will only be further overestimated for a fully converged calculation, although some of the overestimate is due to the use of the PW-PP approximation, as discussed in chapter 5. This agrees with previous researchers' results [60]. The LUMO level in silane is known to be above the vacuum level (= 0) from QMC calculations [37]. For silane, we see that the agreement of the HOMO (with experiment) and especially the LUMO (with QMC calculations) energy levels is improved when using the QSGW and od-COHSEX methods, whereas eigenvalue self-consistency does very little to help the agreement of the LUMO level. This is because the LDA LUMO wavefunction is far too localized, so some method of mean-field improvement is needed [50]. So, in the case of silane, QSGW helps improve the results, but at a cost far higher than the od-COHSEX method, which gives similar results. ESC-GW is of little help in either case and should be avoided for the reasons discussed previously.

Table 6.1: Band gaps in bulk Si and HOMO, LUMO levels in silane for G0W0, ESC-GW, od-COHSEX, and QSGW calculations, as well as experiment. All values in eV.

Method of computation   G0W0     QSGW     ESC-GW   od-COHSEX [50]   expt [74, 1, 63, 49]
bulk Si
  Direct                 3.39     3.46     3.42      3.32             3.35
  Γ−X                    1.33     1.39     1.36      1.45             1.32
silane
  HOMO                 -12.32   -12.58   -12.43    -12.49           -12.6
  LUMO                   0.84     0.31     0.90     -0.01

Part III

Computational Methods

Improved Accuracy and Scaling of Full-Frequency GW calculations

In my time in the group I have become very thoroughly acquainted with the BerkeleyGW codes that calculate the dielectric matrix and the self energy - the epsilon and sigma codes, in the group's parlance. This is for three reasons: 1) my first project was to test the efficacy of the QSGW method, which required me to dig around in the sigma and epsilon codes; 2) when I joined the group, there was a set of very skilled and prodigious programmers who had contributed many improvements to the BerkeleyGW code - Dr. Jack Deslippe, Dr. Manish Jain, Dr. David Strubbe and Dr. Georgy Samsonidze - and I admired these people and their skills greatly and set out to emulate them; and 3) for my calculations on doped graphene we needed the full-frequency dielectric matrix and self energy. Additionally, we had to modify the epsilon and sigma codes to read in external data for our refinement technique and to read in the substrate dielectric matrix for our substrate treatment, detailed earlier. This required a lot of modifications of the code, which in turn required understanding it in detail. So, these three factors are what led me to become an expert in the BerkeleyGW epsilon and sigma codes, something about which I am very happy, as it allows me to perform calculations I would not otherwise be able to do by modifying the code quickly on the fly. It's a bit by chance that I am an expert in these particular parts of the code, but I think I would've been digging into the code in one way or another due to the interest sparked by my interactions with the aforementioned coding gurus. I recommend this activity to any group member, as it helps you understand what you are actually calculating and because sometimes you need new physics or faster calculations, and changing the code is the only choice. To be specific, let me detail how this exact necessity arose for our calculations on doped graphene on SiC.
These calculations were very expensive, and the calculations of ImΣ were not accurate enough for our purposes due to the use of a small broadening. Additionally, we found that the COH+SEX division of the self energy was not as well-behaved as the COR+X division, since in the former the two frequency-dependent terms are treated on unequal footing: the SEX term involves a simple evaluation of the inverse dielectric matrix, while the COH term requires an integral over the inverse dielectric matrix and some energy denominators. This leads to an imperfect cancellation of the COH and SEX terms, which was especially important for the calculation of ImΣ, which is supposed to be zero near the Fermi level. The problem is especially stark when you use a numerical broadening to evaluate the integral needed for the COH term. The COR+X division has only one frequency-dependent term - the COR term - and so does not suffer from this problem. The large expense of these calculations motivated me to develop the calculation of the dielectric matrix for multiple frequencies in parallel (instead of doing the different frequencies serially), while the inaccuracies in ImΣ led me to develop the principal-value (PV) integral approach for calculating the self energy and my colleague, Dr. Johannes Lischner, to implement the COR+X division of the self energy in the BerkeleyGW code. This is in addition to the modifications we made in the code to read in the necessary data for the refinement treatment and to read in the substrate dielectric matrix. These two additions did not make it into the code because they were a bit hacky, although the latter was relatively close to being able to be put into the code. Since this work, however, Felipe Jornada has developed some nice scripts for adding the dielectric matrix from the substrate, and it is a more efficient way to do things.
He has also developed a method for treating the substrate (the in-plane substrate approximation, or IPSA [95]) that is more affordable than the method we used and captures the necessary physics, and is hence likely to become the standard in such calculations. Long live incremental improvement! I have recently finished my implementation of the calculation of multiple frequencies in parallel, and it greatly improves the performance and scaling of the full-frequency epsilon code. After some more testing and the creation of memory tests to determine how many frequencies to do in parallel, this will become the default in the code. My original implementation of the PV evaluation of the self energy and Johannes' implementation of the correlation self energy are the basis of what's currently in the code, although Jack Deslippe made their implementation cleaner and improved the numerical stability of the PV implementation with the input of our applied math collaborators on the SciDAC team [71]. I provide this timeline so that future generations will get a taste of how things get into the code and how it often happens from a process of needing new or more accurate physics. Knowing how to code efficiently and accurately is crucial! I will now detail the conceptual framework of my scheme for calculating multiple frequencies of epsilon in parallel, which will of course require a discussion of how it is done in serial. I will also show the tremendous performance increase on a couple of test systems, namely CO and silicon. I will also briefly discuss the PV approach for evaluating the self energy, from my simple point of view (one can see the much longer and more sophisticated view in the paper written about the subject [71]). Finally, I'll finish with some discussion of the relative merits of the X+COR and COH+SEX divisions of the self energy.

7.1 Parallel frequencies

As Steven Louie would say, "back in the old days" we did not compute the frequency-dependent dielectric matrix because it was too expensive, and a simple plasmon-pole model, such as the one invented by Hybertsen and Louie [47], was sufficient to get good electronic bandstructures for simple, relatively homogeneous systems such as Si. However, many new systems of current interest - e.g. molecules, hybrid organic-inorganic perovskites, and oxides - show much more complex frequency dependence of the dielectric response than the simple, "standard" semiconductors of yesterday. Additionally, if one is interested in quasiparticle lifetimes due to electron-electron interactions, then plasmon-pole models are not sufficient because they assume zero lifetime in their formulation. The BerkeleyGW code was optimized over the years, especially by Jack Deslippe, with the calculation of the static dielectric matrix in mind. Although the "elements" communication scheme for the polarizability summation (keyword gcomm_elements in the epsilon.inp file) was specifically designed for the frequency-dependent case, the calculation of the frequency dependence was essentially bolted onto the old, static code, especially for the inversion of the dielectric matrix. Generally, loops over frequency were added to the static code, so that the same operations done in the static code were just done repeatedly for the frequency-dependent case. In the static case the times to do the polarizability sum and matrix inversion are dwarfed by the time to calculate the matrix elements, so the speed and scaling of the code were largely determined by the matrix element calculation (which need only be done once for all frequencies). However, in the frequency-dependent case, the polarizability sum and the matrix inversion are dominant, so these steps need to be performed optimally. The aforementioned addition of loops over frequency leads to suboptimal speed and scaling in the frequency-dependent case.
In this section, I will detail the working of the BerkeleyGW epsilon code so as to point out where problems occur in the original parallelization scheme for the frequency-dependent case. I will then discuss how this problem was solved.

7.1.1 BerkeleyGW's calculation of ǫ⁻¹

Alright, first things first: BerkeleyGW is a very well parallelized code, thanks to the efforts of many former group members, most especially Jack Deslippe. This is what makes it so speedy and scalable. However, the price of this speed and scalability is quite a bit of complexity that can take a while to wrap one's head around. I know it took me a long time to understand the parts of the code with which I have worked (the sigma and epsilon codes, along with a lot of what is in the Common directory). The epsilon code is definitely one of the most complicated because it actually has two different parallelizations going on simultaneously, and they meet at the calculation of the polarizability. The switch between the two different parallelization schemes takes some thought and work, and we'll discuss this in detail below.

Before we do that, let's re-write the usual expression for the polarizability χ_{GG′}(ω) as

χ_{GG′}(ω) = Σ_{v,c} M_{G,vc} M*_{vc,G′} f_{vc}(ω),    (7.1)

where here I have suppressed the k and q indices because they are not important for this discussion and suppressing them clarifies the underlying expressions. M_{G,vc} is the usual matrix element between states |v⟩, |c⟩ for a given g-vector G. See the 2012 BerkeleyGW paper if you do not know what this quantity is [21]. The functions f_{vc}(ω) are the usual energy denominators that come into the polarizability (again, see the BerkeleyGW paper if you don't know what these are). Ignoring frequency for the moment, we see that on the left we have a matrix with indices G, G′, while on the right we have a sum over products of matrix elements for all vc pairs. So, we have to perform the sum over the vc pairs for all G, G′. This seems like a daunting task, given that the numbers of valence bands, conduction bands, and g-vectors are all quite large. Additionally, the numbers of valence bands, conduction bands, and g-vectors each scale with system size, so this is an N⁴ step. We clearly need to do this summation intelligently. I have written equation (7.1) in a suggestive way, which helps show the way forward. We can see that the v,c indices on the matrix elements are the same, so we are really performing a gigantic matrix multiplication! I believe this was first recognized by Jack Deslippe. At the very least, he greatly optimized this operation and wrote it up in the 2012 BerkeleyGW paper [21]. We can represent equation (7.1) as in figure 7.1. Ok, so we have cast the polarizability sum as a gigantic matrix multiplication, so what do we need to do it quickly and at scale? The answer is of course that we have to distribute many quantities across processors and make judicious use of libraries, in this case LAPACK, to do the linear algebra operations efficiently. Let's now talk about the two levels of parallelization/distribution that appear in the epsilon code.
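To make the matrix-multiplication structure of equation (7.1) concrete, here is a toy NumPy sketch in which the explicit sum over vc pairs is checked against a single matrix product per frequency (NumPy's `@` stands in for the zgemm call; all sizes are illustrative):

```python
import numpy as np

# Toy sizes; in a real calculation NG, Nv, Nc are all much larger.
rng = np.random.default_rng(0)
NG, Nv, Nc, Nfreq = 8, 3, 4, 2
M = rng.standard_normal((NG, Nv * Nc)) + 1j * rng.standard_normal((NG, Nv * Nc))
f = rng.standard_normal((Nv * Nc, Nfreq))   # energy denominators f_vc(w)

# Naive: explicit sum over vc pairs for each (G, G') -- the N^4 step.
chi_naive = np.zeros((NG, NG, Nfreq), dtype=complex)
for w in range(Nfreq):
    for vc in range(Nv * Nc):
        chi_naive[:, :, w] += np.outer(M[:, vc] * f[vc, w], M[:, vc].conj())

# Recast as one matrix multiplication per frequency (what zgemm does):
chi_gemm = np.stack([(M * f[:, w]) @ M.conj().T for w in range(Nfreq)], axis=-1)

assert np.allclose(chi_naive, chi_gemm)
```

The energy denominator is folded into one of the two matrix-element factors before the multiplication, exactly as in the pseudocode later in this section.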
First of all, note that all the data structures needed for distribution of the dielectric matrix and matrix elements are set up at the beginning of the code, in the epsilon_main.f90 and input.f90 files. The wavefunctions are also distributed to different processors at the beginning, when the wavefunctions are read (which occurs in input.f90). With this in mind, the two different parallelizations are: 1) at the beginning of the epsilon code the matrix elements are calculated for all vc pairs, and these vc pairs are distributed to different processors; and 2) at the end of the code the dielectric matrix is inverted in parallel, which is achieved by breaking it up into blocks and distributing the blocks to different processors. The relevant routines for setting up the needed data structures are create_pools (called in input.f90, which determines the band distribution and associated data structures) and setup_blacs (called in epsilon_main.f90, which determines the distribution of the dielectric matrix and the associated data structures).

Figure 7.1: Pictorial representation of (7.1). The 'T' that appears above M (i.e., M^T) means transpose conjugate.

We can, again, represent all of this pictorially, which is done in figure 7.2, assuming that we have 16 processors in our calculation. What can we see from this figure? First of all, we see that we distribute all the vc pairs to different processors for the computation of the matrix elements. Each processor owns all the g-vectors for all the matrix elements of the vc pairs that it owns. This of course means that each processor must own all the g-vectors of the wavefunctions for all the vc pairs that it owns; otherwise a given processor could not compute the matrix elements for all g-vectors for the vc pairs it owns. The reasons that we distribute over vc pairs and not over g-space for the matrix elements are: 1) there are many vc pairs, which scale as N², so this offers a larger degree of parallelism than if we parallelized over g-vectors, which scale as N; and 2) we compute the matrix elements using FFTs, and parallel FFTs are not very efficient, so parallelizing over g-vectors would give bad performance. Another important thing to note here is that each processor owns the wavefunctions for all k-points for the vc pairs that it owns and computes the needed matrix elements for all k-points needed in the polarizability sum.

Figure 7.2: Pictorial representation of the distribution of the matrix elements and dielectric matrix amongst 16 processors.

So, in all figures in this section, keep in mind that both the M and M^T matrices in figure 7.2 have size N_G ∗ N_cvk = N_G ∗ N_cv ∗ N_k, where N_G is the number of g-vectors, N_cv is the number of cv pairs, and N_k is the number of k-points in the little group of the q-point under consideration (there's a huge outer loop over q-points for all steps in the computation of the inverse dielectric matrix). Second, we see that we distribute the dielectric matrix in blocks to different processors. So, we are distributing the polarizability, dielectric matrix, and inverse dielectric matrix over g-space. This would seem somewhat incompatible with what happens in the matrix element calculation and, indeed, it is. To make the switch between the parallelization of the matrix elements and the parallelization of the inversion of the dielectric matrix, we are going to need some communication. There are two communication schemes for this switch between the two parallelization schemes: the "matrix" communication scheme (gcomm_matrix in epsilon.inp) and the previously mentioned "elements" communication scheme (gcomm_elements). The matrix scheme is the default in BerkeleyGW, but is not optimal for full-frequency calculations. Let's discuss each scheme in detail now. I will include pseudocode with some commentary, as the full-frequency code seems quite big and complicated, but really it is doing some relatively simple steps, with reasoning behind each.

Matrix Communication Scheme. In this scheme, the code loops through the blocks of the polarizability owned by different processors. For a given block, each processor performs the matrix multiplication for the states that it owns and for the G, G′ of the current block of the polarizability. This is depicted in figure 7.3, where lines have been added on the right to show how processors perform the needed multiplication for only the subset of the g-vectors that they own that correspond to the G, G′ of the current block of the polarizability.

Figure 7.3: Pictorial representation of the distribution of the matrix elements and dielectric matrix amongst 16 processors, with additional lines added on the right to show how the "matrix" communication scheme works.

The result of this matrix multiplication is then reduced over all processors to the processor that owns that block of the polarizability. This has to be done for each frequency, so the amount of communication performed in this step is proportional to N_G² ∗ N_f, where N_f is the number of frequencies in the calculation.

PSEUDOCODE BEGIN
for ipe=1,nprocessors
  for ispin=1,nspin
    itot=0
    for irk=1,nkpt_ibz
      for it=1,nkpt_star
        for ig_row=1,ng_row_own
          temp_phase_row(ig_row)=global_phase(ig_row,ipe,it,irk)
          temp_index_row(ig_row)=global_index(ig_row,ipe,it,irk)
        for ig_col=1,ng_col_own
          temp_phase_col(ig_col)=global_phase(ig_col,ipe,it,irk)
          temp_index_col(ig_col)=global_index(ig_col,ipe,it,irk)
        for iv=1,nv_own
          for ic=1,nc_own
            mytot=itot+ic
            for ifreq=1,nfreq
              energy_denom(ifreq)=f(iv,ic,ifreq)
            for ig_row=1,ng_row_own
              mtxel_temp_row(ig_row)=mtxel_global(temp_index_row(ig_row),ic,iv,ispin,irk)*temp_phase_row(ig_row)
            for ifreq=1,nfreq
              mtxel_temp_row_with_energy_denom(:,mytot,ifreq)=mtxel_temp_row(:)*energy_denom(ifreq)
            for ig_col=1,ng_col_own
              mtxel_temp_col(mytot,ig_col)=conjg( mtxel_global(temp_index_col(ig_col),ic,iv,ispin,irk)*temp_phase_col(ig_col) )
          itot=itot+nc_own
    for ifreq=1,nfreq
      zgemm(mtxel_temp_row_with_energy_denom(:,:,ifreq),mtxel_temp_col(:,:),chilocal(:,:,ifreq))
    mpi_reduce(chilocal(:,:,:),chilocal_spin(:,:,:,ispin),ipe-1)
for ispin=1,nspin
  for ifreq=1,nfreq
    pol%chi(ifreq,:,:,ispin)=chilocal_spin(:,:,ifreq,ispin)
PSEUDOCODE END

PSEUDOCODE COMMENTARY: What we do here is loop through the blocks of the polarizability matrix and then through k-points in the IBZ and the k-points in the star (or little group) of the k-points in the IBZ. For each k-point, each processor grabs the necessary phase and indexing arrays from previously calculated global phase and indexing arrays. This is done for the range of g-vectors for the current block we are working on, for both rows and columns in the polarizability. Then, each processor loops through the valence and conduction bands that it owns and calculates the energy denominators using the mean-field eigenvalues read in from the wavefunction file (or eqp.dat). Each processor also grabs the matrix elements for all the vc pairs it owns and for the needed range of g-vectors for the given block of the polarizability that we are working on, and appends the needed phases. The energy denominator gets built into the matrix element array that has the polarizability matrix row index as its first index, i.e. the matrix M in the expression χ = MM^T, which becomes χ(ω) = M(ω)M^T (I have switched which matrix is transposed relative to figure 7.1; it is immaterial which matrix is transposed, as it is a matter of the definition of M). Each processor then performs the zgemm (matrix multiplication) for the vc pairs it owns and for all k-points for the current block of the dielectric matrix for all frequencies. The resulting contribution to the polarizability for the current block of the dielectric matrix is reduced/summed across all processors to the processor that owns that block of the dielectric matrix. Finally, the temporary array is copied to a permanent array. I have not discussed spin above because how it is handled is straightforward.
The actual code looks messy because there are things I have not shown above, like what happens when you use the complex version of the code and need to calculate both the advanced and retarded dielectric matrices, or when you use the linear energy denominator technique. But these are just extra details that are not important for understanding the logic of the code. Note that we use temporary arrays for the phase, index, energy denominator, and matrix element data to take advantage of memory locality and not touch memory every time we do a zgemm. This leads to better performance. If you need more information, take CS267 with Jim Demmel!

Elements Communication Scheme. In this scheme, the code loops over k-points, then conduction bands, and then valence bands. For a given k-point, conduction band, and valence band, the processor that owns the given vc pair distributes the matrix element (for all g-vectors and all k-points in the little group) and the energy denominator (for all frequencies) to all other processors. Then, all the processors calculate the contribution to their part of the polarizability coming from this vc pair using the matrix element and energy denominator they received. The amount of communication in this scheme is proportional to N_G ∗ N_k ∗ N_v ∗ N_c. In terms of figure 7.2, what is basically happening is that the code is looping through different columns/rows of the first/second matrix on the right-hand side, and the processor that owns the given column/row under consideration broadcasts this data to the other processors, which then use that data to compute the contribution from those cv pairs to their block of the polarizability. After looping through all k-points, valence bands, and conduction bands, all processors will have received all matrix elements and from this data computed the corresponding contribution to their block of the polarizability matrix.
Note that, in reality, the code does not send whole columns/rows to other processors but instead sends the data for one k-point, one conduction band, and one valence band at a time, to minimize memory overhead.

PSEUDOCODE BEGIN
allocate(cond_band_counter(nval))
for irk=1,nkpt_ibz
  for ispin=1,nspin
    chilocal=cplx(0d0,0d0)
    cond_band_counter(:)=1
    for ic=1,ncond
      itot=0
      for iv=1,nval
        if(I_own_vc_pair)
          mtxel_temp(:)=mtxel_global(:,cond_band_counter(iv),iv,ispin,irk)
          for ifreq=1,nfreq
            energy_denom(ifreq)=f(iv,cond_band_counter(iv),ifreq)
          cond_band_counter(iv)=cond_band_counter(iv)+1
        mpi_bcast(mtxel_temp)
        mpi_bcast(energy_denom)
        for it=1,nkpt_star
          itot=itot+1
          for ig_row=1,ng_row_own
            mtxel_temp_row_with_energy_denom(ig_row,itot,:)=mtxel_temp(index(ig_row,it,irk))*phase(ig_row,it,irk)*energy_denom(:)
          for ig_col=1,ng_col_own
            mtxel_temp_col(ig_col,itot)=conjg( mtxel_temp(index(ig_col,it,irk))*phase(ig_col,it,irk) )
      for ifreq=1,nfreq
        zgemm(mtxel_temp_row_with_energy_denom(:,:,ifreq),mtxel_temp_col(:,:),chilocal(:,:,ifreq))
    for ifreq=1,nfreq
      pol%chi(ifreq,:,:,ispin)=pol%chi(ifreq,:,:,ispin)+chilocal(:,:,ifreq)
PSEUDOCODE END

PSEUDOCODE COMMENTARY: What we do here is loop over k-points in the IBZ and then vc pairs. If a given processor owns a vc pair, then it computes the matrix element and energy denominator for that vc pair and sends them to all other processors. Then each processor loops through all k-points in the star of the current k-point in the IBZ and appends the requisite phases to the matrix elements, and also builds the energy denominator into the array that has the polarizability matrix row index as its first index. This is all done inside the loops over valence bands and k-points in the star of the current k-point in the IBZ. We then exit these two loops, and each processor performs the zgemms for all frequencies to get the contribution to the G, G′ chunk of the polarizability it owns, using the matrix elements of all valence bands for all k-points in the star of the current k-point in the IBZ and the current conduction band, adding it to the previous value and storing the result in a temporary array chilocal (thus summing over all the contributions from the conduction bands via the loop over conduction bands). We then exit the conduction band loop and add this contribution to the polarizability for the current k-point (irk) - which includes contributions from all valence bands, conduction bands, and k-points in the star of the current k-point - to each processor's chunk of the polarizability matrix, this time using a permanent array pol%chi. By looping over k-points in the IBZ we get all the contributions to the chunk of the polarizability matrix that each processor owns.
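The logic of the elements scheme - each vc pair's column of M is broadcast by its owner, and every processor accumulates into its own block of χ - can be mimicked in a serial toy sketch. This is pure NumPy with the "ranks" simulated in a loop; no MPI is involved, and the sizes are illustrative:

```python
import numpy as np

# Simulated "elements" scheme: each rank owns a contiguous block of
# G-rows of chi; for each vc pair, the (simulated) broadcast column
# of M is used by every rank to update its own block.
rng = np.random.default_rng(1)
NG, Nvc, nproc = 8, 6, 4
M = rng.standard_normal((NG, Nvc))
f = rng.standard_normal(Nvc)                  # static energy denominators
rows = np.array_split(np.arange(NG), nproc)   # G-row block per rank

chi_blocks = [np.zeros((len(r), NG)) for r in rows]
for vc in range(Nvc):
    col = M[:, vc]            # in real code: MPI_Bcast from the owner of vc
    for rank, r in enumerate(rows):
        chi_blocks[rank] += np.outer(col[r] * f[vc], col)

chi = np.vstack(chi_blocks)
assert np.allclose(chi, (M * f) @ M.T)        # matches the full gemm result
```

In the real code the accumulation per rank is itself batched into zgemms over all valence bands and star k-points for the current conduction band, rather than one outer product per vc pair.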

Again, the actual code looks messy because there are things I have not shown above.

So, when do we use a given communication scheme? The ratio of communication done in the matrix scheme to that done in the elements scheme is given by

R = (N_G² ∗ N_f) / (N_G ∗ N_k ∗ N_v ∗ N_c)
  = (N_G ∗ N_f) / (N_k ∗ N_v ∗ N_c)
  ≈ N_f / (N_k ∗ N_v),    (7.2)

where the last step uses the fact that N_G and N_c both scale linearly with system size. So, if N_f < N_k ∗ N_v (N_f > N_k ∗ N_v), then the matrix (elements) scheme should be used. If you keep the same k-point sampling along each periodic direction, then as you go to lower dimensions the elements scheme will start to perform better relative to the matrix scheme, with zero dimensions (molecules, quantum dots, etc.) being ideal for the elements communication scheme (since N_k = 1 in this case). Also, as you add more frequencies, the elements communication scheme gets better relative to the matrix scheme. Finally, for the static case (N_f = 1) the matrix communication scheme is always the better choice.
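The rule of thumb in equation (7.2) can be wrapped in a tiny helper. This is a hypothetical convenience function, not a BerkeleyGW routine; it uses the middle (unapproximated) form of the ratio:

```python
def choose_gcomm(NG, Nf, Nk, Nv, Nc):
    """Pick the communication scheme from the ratio in eq. (7.2):
    R = (NG^2 * Nf) / (NG * Nk * Nv * Nc) = NG * Nf / (Nk * Nv * Nc).
    R < 1 favors the matrix scheme; R > 1 favors the elements scheme."""
    R = (NG * Nf) / (Nk * Nv * Nc)
    return "gcomm_matrix" if R < 1.0 else "gcomm_elements"

# Static bulk calculation (Nf = 1): the matrix scheme wins.
assert choose_gcomm(NG=1000, Nf=1, Nk=64, Nv=4, Nc=1000) == "gcomm_matrix"
# Molecule (Nk = 1) with many frequencies: the elements scheme wins.
assert choose_gcomm(NG=1000, Nf=200, Nk=1, Nv=4, Nc=1000) == "gcomm_elements"
```

The example parameter values are made up purely to illustrate the two regimes described in the text.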

7.1.2 Why you should calculate ǫ⁻¹(ω) at multiple frequencies in parallel

We discussed the two different communication schemes that are used to switch between the band parallelization of the matrix element computation and the g-vector parallelization of the dielectric matrix inversion, and found that for full-frequency calculations the elements communication scheme can sometimes be preferable to the matrix communication scheme. That being said, regardless of the communication scheme used in the polarizability sum, the full-frequency calculation of the inverse dielectric matrix is not really optimized, for a reason that was not mentioned in the previous discussion: the code inverts the distributed matrix one frequency at a time using all available processors. This is problematic because the library that is called to invert the dielectric matrix in parallel, ScaLAPACK, does not scale past a few hundred processors and, more importantly, for a given number of processors that is appropriate for the calculation of the matrix elements (which scales as N²) this number of processors will always be too large for ScaLAPACK to use all of the processors to efficiently invert the dielectric matrix. This leads to many processors being wasted during the inversion of the dielectric matrix, and this is compounded when you have to invert the dielectric matrix for multiple frequencies (which is currently done serially). To see the problem, go to chapter 5 of the original ScaLAPACK paper [13], where they talk about the performance of ScaLAPACK. In this section it says that for computational efficiency, the ideal dimensions of the sub-matrix held by each processor are 1000x1000. So, for maximal computational efficiency, we should use (N_G/1000)² processors in our inversion of the dielectric matrix. To see why this is problematic, note that the number of g-vectors for a bulk Si (molecular CO) calculation with a reasonable cutoff of 10 Ry is 137 (1237).
So, for bulk Si we should be using just 1 processor and for molecular CO we should be using 1 or 2 processors! In a typical BerkeleyGW calculation, where we would use at least hundreds of processors for the calculation of the matrix elements, we have far more processors than we need for the matrix inversion. The additional processors offer essentially no speedup and can be considered wasted processors for this part of the calculation. If there are many frequencies for which we need to invert the dielectric matrix, then the slowness/non-scalability of this part of the calculation becomes evident. This was fine in the past, when the calculation of the matrix elements swamped all other contributions to the calculation time, but for full-frequency calculations the matrix inversions can start to become a bottleneck. Note that a 10 Ry cutoff is somewhat modest and bulk Si and molecular CO are both quite small systems, so for most bigger systems one would need more than a processor or two to invert the dielectric matrix efficiently. For example, for C60 - a buckyball - the number of atoms is 30 times larger than in CO, so the number of g-vectors will also be roughly 30 times larger, or roughly 30,000. In this case the number of processors needed for efficient inversion of the dielectric matrix is (30000/1000)² = 900. However, the number of processors that one would need to calculate the matrix elements would also increase for C60 relative to CO, from something like 100 processors to 100 ∗ 30² = 90,000 processors. So, again the matrix element calculation needs far more processors than does the matrix inversion. So, how do we alleviate this problem? Given the title of this section, you probably know the answer: calculate many frequencies in parallel. But why does this alleviate the problem? The answer lies in reducing communication and memory accesses, and in speeding up the ScaLAPACK inversion. To see this, see figure 7.4.
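Before going on, the processor-count arithmetic above can be collected into a small rule-of-thumb helper (hypothetical, not part of BerkeleyGW); the asserts reproduce the bulk Si, CO, and C60 estimates from the text:

```python
def ideal_inversion_procs(NG, ideal_block=1000):
    """Rule of thumb from the ScaLAPACK performance discussion: aim
    for roughly an ideal_block x ideal_block piece of the NG x NG
    matrix per processor, i.e. about (NG / ideal_block)^2 processors,
    but always at least one."""
    return max(1, round((NG / ideal_block) ** 2))

assert ideal_inversion_procs(137) == 1       # bulk Si at a 10 Ry cutoff
assert ideal_inversion_procs(1237) == 2      # molecular CO at a 10 Ry cutoff
assert ideal_inversion_procs(30000) == 900   # rough C60 estimate
```

Comparing these numbers to the hundreds or thousands of processors needed for the matrix elements makes the mismatch explicit.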
The basic idea is this: we leave the calculation of the matrix elements exactly as it was before, but we give each processor a larger chunk of the polarizability/dielectric matrix, while giving it only a subset of the frequencies. The processors that hold the same frequencies form an MPI group - the

Figure 7.4: Pictorial representation of the distribution of the matrix elements and dielectric matrix amongst 16 processors with 4 frequencies calculated in parallel.

frequency communication group - and they work together to do the polarizability sum and invert the dielectric matrix. The fact that we do multiple frequencies in parallel and give each processor a bigger chunk of the dielectric matrix helps the ScaLAPACK bottleneck in two ways: 1) each processor in the frequency communication group has a larger chunk of the dielectric matrix, so we are closer to the 1000x1000 ideal submatrix size at which ScaLAPACK performs efficiently, and 2) we perform multiple matrix inversions in parallel, so fewer (and possibly no) processors are wasted waiting for a ScaLAPACK inversion that could go just as quickly with only a few processors. If you re-read the description of the elements communication scheme, you will see that nothing else is needed besides the creation of the MPI frequency communication groups and a change in the ScaLAPACK layout to achieve this improvement in performance. This is because we can still loop through k-points, conduction bands, and valence bands, distributing them to all processors and letting them work on the chunk of the polarizability that they own. In this case, each processor will own a bigger chunk in (G, G'), but only a subset of the frequencies. With these small changes in mind, the elements scheme works exactly as described above. The matrix communication scheme, however, requires some extra communication to work when each processor owns a larger chunk in (G, G') but only a subset of the frequencies. The problem is that in the matrix scheme you rely on the fact that, amongst all processors involved in the polarizability sum, every vc-pair is owned by one of the processors.
But it was already said that the matrix element computation remains exactly as before, with no alterations made, so the vc-pairs are spread across all processors, not just the processors in one's frequency communication group. This is done to give the maximum amount of parallelism to, and do no redundant work during, the matrix element computation. The price is that matrix elements have to be communicated between the different frequency communication groups. To do this we set up matrix element communication groups between the processors that have the same rank within their respective frequency communication groups. These processors send their matrix elements to one another, with the end result being that every frequency communication group ends up owning all the matrix elements needed to perform the polarizability sum. Note that the number of processors in the matrix element communication groups is equal to the number of frequencies done in parallel, N_{f,par}, which in general is going to be a small fraction of the total number of processors. Thus, the amount of time spent communicating matrix elements is quite small, scaling as N_G*N_{f,par}. Conversely, there is a very large benefit in the polarizability sum from this frequency parallelization, in addition to the benefit during the matrix inversion. Since the number of frequencies held by each communication group is reduced by the number of frequencies done in parallel - let's call this number N_{f,par} - the amount of communication done during the polarizability sum goes down to N_G^2*N_f/N_{f,par}. This actually reduces the benefit of using the elements communication scheme for full-frequency calculations, especially if you are not calculating a very large number of frequencies, as is the case when you use contour deformation to evaluate the self energy.
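The two group structures described above can be sketched in plain Python, with no actual MPI involved; with mpi4py, the same layout would come from two `comm.Split` calls with swapped color/key roles. The function name and the 16-processor example are illustrative only:

```python
# Pure-Python sketch (no MPI) of the two communicator layouts described in
# the text: ranks are split into frequency communication groups, and ranks
# occupying the same position within their frequency group form a
# matrix-element communication group.

def build_groups(n_procs, n_freq_groups):
    assert n_procs % n_freq_groups == 0
    group_size = n_procs // n_freq_groups
    # Frequency communication groups: contiguous blocks of ranks that hold
    # the same subset of frequencies.
    freq_groups = [list(range(g * group_size, (g + 1) * group_size))
                   for g in range(n_freq_groups)]
    # Matrix-element communication groups: one rank from each frequency
    # group, matched by rank-within-group, so each has only N_f,par members.
    elem_groups = [[g * group_size + r for g in range(n_freq_groups)]
                   for r in range(group_size)]
    return freq_groups, elem_groups

# The 16-processor, 4-frequency layout of figure 7.4:
freq, elem = build_groups(16, 4)
print(freq)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
print(elem)  # [[0, 4, 8, 12], [1, 5, 9, 13], [2, 6, 10, 14], [3, 7, 11, 15]]
```

Because each matrix-element group has only N_{f,par} members, the extra matrix-element exchange couples far fewer ranks than the polarizability sum itself.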
It is likely that, going forward, the method of choice for full-frequency calculations using BerkeleyGW will combine the contour deformation technique, the matrix communication scheme, and the parallel frequencies technique. Note that the parallel frequencies scheme also improves the performance of the zgemms and the array allocations (as you can see in the pseudocode above), likely because zgemm works better with bigger matrices and we have fewer but bigger array allocations, which is better for memory locality. Alright, that was a lot of explanation - let's now get to some numbers that show the performance of the parallel frequencies scheme. From Table 7.1 we see good speedup for all parts of the calculation of the inverse dielectric matrix for both bulk Si and the more challenging molecular CO. In particular, the performance is very good for bigger systems, as represented here by molecular CO. The saturation of ScaLAPACK gives a basically linear improvement in the dielectric matrix inversion time for bulk Si. This makes sense, since the rank of the dielectric matrix for Si is quite small, so ScaLAPACK does not have enough work to do.

7.2 X+COR vs. COH+SEX divisions of Σ

The derivation of the COH+SEX division of the self energy is given in the appendix. When dividing the self energy into bare exchange + correlation, the so-called X+COR method,

Table 7.1: Run times, in seconds, for various parts of the calculation of the inverse dielectric matrix using the parallel frequencies scheme with 1, 2, and 8 frequencies done in parallel.

                                   number of frequencies in parallel
                                        1         2         8
  bulk Si
    polarizability sum - total       13.12      8.93      4.40
    polarizability sum - prep        10.75      7.08      3.23
    polarizability sum - zgemm        2.17      1.75      1.14
    polarizability sum - comm         0.20      0.10      0.027
    epsilon inversion                 0.74      0.26      0.064
  molecular CO
    polarizability sum - total        9.31      6.89      2.13
    polarizability sum - prep         1.27      1.01      0.66
    polarizability sum - zgemm        1.85      1.60      0.90
    polarizability sum - comm         6.18      4.27      0.57
    epsilon inversion                 5.275     2.60      0.93

the correlation self energy

\langle nk|\Sigma_c(E)|nk\rangle = \frac{1}{\pi} \sum_{n'} \sum_{qGG'} M_{nn'}(k,q,G)\, M^*_{nn'}(k,q,G')\, v(q+G')
\times \int_0^\infty dE'\, \frac{\mathrm{Im}[\epsilon^{-1}_{GG'}]^{TO}(q,E')}{\epsilon_{n'k-q} - E + E'\,\mathrm{sgn}(\epsilon_{n'k-q} - \mu) + i\delta},   (7.3)

is much more straightforwardly derived than the two terms in the COH+SEX method. The "TO" in the above equation stands for time-ordered. Indeed, since we calculate the time-ordered self energy, the spectral resolution of the screened Coulomb interaction W that we employ should also be time-ordered. All of the difficulties in the derivation of the COH+SEX self energy, and the appearance of the difference of the retarded and advanced dielectric matrices, can be traced back to the breaking up of the frequency-dependent term into two terms. For the X+COR case, all one needs to do is plug (8.8) into (8.1), and the bare exchange and correlation terms are immediately evident. Thus, the X+COR formulation is formally much simpler.

The other benefit of using the X+COR formulation is that it is better behaved numerically, since you are not adding two terms of opposite sign. In the COH+SEX formulation, the two terms are not even on equal footing: the COH term involves an integral over the inverse dielectric matrix that has a broadening, while the SEX term involves a simple evaluation of the inverse dielectric matrix. So, the two terms are treated unequally. In particular, when the imaginary part of the self energy goes to zero at the Fermi level, this adding of two energy-dependent terms of opposite sign can be quite problematic. We found the

X+COR formulation to be much better for this purpose, although when the principal value technique is used, both formulations provide reliable and stable results, since no broadening is being used [71]. However, the COH+SEX formulation requires two dielectric matrices (retarded and advanced) while the X+COR formulation requires only one (time-ordered), so the X+COR formulation uses half the computational time and memory of the COH+SEX formulation. The COH+SEX formulation is important for historical and physical reasons, but for practical calculations the X+COR formulation is usually preferable.
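The cancellation issue can be illustrated with a toy double-precision example that has no GW content at all - recovering a small number as the sum of two large terms of opposite sign loses most of the significant digits, while a direct evaluation does not:

```python
# Toy floating-point illustration of the cancellation problem: a small net
# result obtained as the sum of two large opposite-sign terms (the COH+SEX
# situation) versus a single direct evaluation (the X+COR situation).

big = 1.0e15             # stands in for the large COH and SEX pieces
small = 3.14159          # stands in for the small net self energy

coh_like = big + small   # already rounded to the nearest double near 1e15
sex_like = -big

indirect = coh_like + sex_like   # difference of two large terms
direct = small                   # single well-conditioned evaluation
print(direct, indirect)          # the indirect route keeps only a few digits
```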

7.3 Principal value integral evaluation of Σ

As discussed previously, in evaluating the self energy in (7.3) a small, finite broadening δ is generally used in order to obtain a numerically smooth Σ. However, we have found that a better approach is to instead choose a broadening in the calculation of the polarizability χ that is sufficient to give a smooth Σ, and take the broadening in (7.3) to zero. When this is done, Im Σ goes to zero at the Fermi level, as it should. This is especially important for doped graphene, because the features in the self energy are very sharp and the quantitative accuracy of the spectral function is diminished by having a finite lifetime at the Fermi level, as is the case in standard treatments using a broadening. To take the broadening to zero we use the well-known identity:

\lim_{\delta\to 0^+} \frac{1}{x + i\delta} = P\frac{1}{x} - i\pi\delta(x),   (7.4)

where P denotes the principal value and δ(x) is the Dirac delta function. Equation (7.4) is only meaningful under an integral. The delta function part of (7.4) is straightforward to evaluate. For the principal value part we use a technique in which we Taylor expand ǫ^{-1}(ω) in each frequency interval on our frequency grid and evaluate the integral in each interval analytically [24]:

\int_0^\infty dE'\, \frac{\mathrm{Im}\,\epsilon^{-1}_{GG'}(q,E')}{E_p - E'} = \sum_{n=1}^{n_f} \int_{E_n}^{E_{n+1}} dE'\, \frac{\mathrm{Im}\,\epsilon^{-1}_{GG'}(q,E_n)}{E_p - E'}
= \sum_{n=1}^{n_f} \mathrm{Im}\,\epsilon^{-1}_{GG'}(q,E_n)\, \log\left|\frac{E_p - E_n}{E_p - E_{n+1}}\right|.   (7.5)

The E_n are the energies on our frequency grid, n_f is the number of frequencies on our frequency grid, and E_p is the pole energy from equation (7.3). In our original work [68], our implementation of (7.5) also contained a first derivative in the expansion of ǫ^{-1} in the pole region, but we have found that it makes little difference.
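The piecewise-constant evaluation in (7.5) can be sketched in a few lines. The names are mine; `im_eps` stands in for Im ǫ^{-1}_{GG'}(q,E_n) sampled on the grid `energies`:

```python
import math

# Sketch of the principal-value evaluation in Eq. (7.5): hold the integrand
# at its value at the left edge of each frequency interval and integrate
# 1/(E_p - E') over the interval analytically, which gives the logarithm.

def pv_integral(im_eps, energies, e_pole):
    """Principal value of int dE' im_eps(E') / (e_pole - E')."""
    total = 0.0
    for n in range(len(energies) - 1):
        total += im_eps[n] * math.log(
            abs((e_pole - energies[n]) / (e_pole - energies[n + 1])))
    return total
```

For a constant integrand the sum telescopes to the exact principal value, and the pole E_p may sit anywhere inside the grid (as long as it does not coincide with a grid point) - no broadening δ is required.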

Part IV

Appendix

COHSEX Derivation

8.1 Derivation of COHSEX Self Energy

Before we start the derivation, let me first say that, if you are looking for good references as background before reading, I highly recommend G. Mahan's "Many-Particle Physics", 2nd or 3rd edition, sections 2.1-2.3 and 2.9a. The whole of chapter 2 is excellent and relevant to our work, but only these sections are essential for understanding Green's functions and the concepts of time-ordered, retarded, and advanced. The Hedin and Lundqvist article also has many important definitions and results, most of which are cited below with page numbers, so you can read further. For an excellent exposition of linear response and why ǫ^{-1} can be written in terms of a density-density correlation function, see Ulrich Rössler's "Solid-State Theory", ch. 2, especially pp. 22-32. However, don't be intimidated by these references - dive in and see how far you can follow! I owe an enormous hat tip to Johannes Lischner for helping me understand how this derivation works. Thanks JL!

Let’s insert into the definition of the self-energy operator in the GW approximation:

\Sigma(r,r',E) = \frac{i}{2\pi} \int_{-\infty}^{\infty} dE'\, G(r,r',E-E')\, W(r,r',E')\, e^{-i\delta E'}   (8.1)

the spectral representation of the time-ordered Green's function to get

\Sigma(r,r',E) = \frac{i}{2\pi} \int_{-\infty}^{\infty} dE'\, W(r,r',E')\, e^{-i\delta E'} \left[ \frac{1}{\pi} \int_{-\infty}^{\mu} dE'' \frac{|\mathrm{Im}\,G(r,r',E'')|}{E - E' - E'' - i\delta} + \frac{1}{\pi} \int_{\mu}^{\infty} dE'' \frac{|\mathrm{Im}\,G(r,r',E'')|}{E - E' - E'' + i\delta} \right]   (8.2)

Because of the sign of the exponential, we close the contour in the lower half plane for the E' integral. For the first term in (8.2), which we call Σ^{(1)}, we first perform the E' integral. This term has a contribution due to the poles of G and a contribution due to the poles of W. In doing the E' integral below, we will leave out the contribution to Σ^{(1)} from the poles of W. We will return to this contribution once we discuss the spectral resolution of W below. We will refer to the portion of Σ^{(1)} from the poles of G as Σ^{(1G)} and the portion from the poles of W as Σ^{(1W)}. The second term in (8.2), which we call Σ^{(2)}, has only a contribution due to the poles of W. Doing the E' integral and, again, focusing only on the poles coming from G, we get

\Sigma^{(1G)}(r,r',E) = \int_{-\infty}^{\mu} dE''\, \frac{1}{\pi} |\mathrm{Im}\,G(r,r',E'')|\, \frac{-i}{2\pi} \int_{-\infty}^{\infty} dE'\, e^{-i\delta E'}\, \frac{W(r,r',E')}{E' - (E - E'') + i\delta}
= \int_{-\infty}^{\mu} dE''\, \frac{1}{\pi} |\mathrm{Im}\,G(r,r',E'')|\, \frac{-i}{2\pi}\, (-2\pi i)\, W(r,r',E-E'')
= -\frac{1}{\pi} \int_{-\infty}^{\mu} dE''\, |\mathrm{Im}\,G(r,r',E'')|\, W(r,r',E-E'')   (8.3)

The minus sign in the first line of the above expression comes from the change in sign of the denominator, while the minus sign in the middle line comes from the fact that the contour is traversed clockwise. The time-ordered Green’s function is given by

G(r,r',\omega) = \sum_{nk} \frac{\phi_{nk}(r)\,\phi^*_{nk}(r')}{\omega - E_{nk} + i\delta_{nk}}   (8.4)

where δ_{nk} is a positive (negative) infinitesimal for states above (below) the Fermi energy. Then, using the standard identity from complex analysis

\frac{1}{\omega - E \pm i\delta} = P\frac{1}{\omega - E} \mp i\pi\delta(\omega - E)   (8.5)

we have that

|\mathrm{Im}\,G(r,r',\omega)| = \pi \sum_{nk} \phi_{nk}(r)\,\phi^*_{nk}(r')\, \delta(\omega - E_{nk})   (8.6)

Putting this into the above expression for Σ^{(1G)} we get

\Sigma^{(1G)}(r,r',E) = -\sum_{nk}^{occ} \phi_{nk}(r)\,\phi^*_{nk}(r')\, W(r,r',E - E_{nk})   (8.7)

Let us now turn to Σ^{(2)} and Σ^{(1W)}, which both come from the poles of W. We would like to use the spectral resolution of W in these terms. We will derive the spectral resolution shortly (see equations (8.9)-(8.12)), but for now we just write it down for discussion. From Hedin and Lundqvist (HL) p. 41, we see that, for systems with time-reversal symmetry (all of the systems we want to treat in BGW), we have for the time-ordered screened Coulomb interaction W

W(r,r',E) = v(r,r') + \int_0^\infty dE'\, \frac{2E'\, B(r,r',E')}{E^2 - E'^2}   (8.8)

This comes from the definition of the time-ordered dielectric matrix ǫ^{-1} on p. 33 of HL and the spectral resolution derivation in Appendix C of HL. Note that E' has a small negative imaginary part (see Appendix C). The question, then, is: what is the function B(r,r',E') in equation (8.8)? One obvious choice is -1/π times the imaginary part of the time-ordered screened interaction, but it turns out this is not correct. What one needs in fact is -1/π times [W^R(r,r',E') - W^A(r,r',E')]/2i, where R (A) denotes retarded (advanced). To see this, we derive the spectral resolutions for the time-ordered, retarded, and advanced screened Coulomb interactions. Using the definitions of the time-ordered inverse dielectric function ǫ^{-1} and the time-ordered screened Coulomb interaction W on p. 33 and p. 36, we write (ħ = 1)

W(r,r',t'-t) = \int d^3r''\, v(r,r'')\, \epsilon^{-1}(r'',r',t'-t)
= \int d^3r''\, v(r,r'') \left[ \delta(r'',r')\,\delta(t'-t) - i \int d^3r'''\, v(r'',r''')\, \langle N|T[\tilde\rho(r''',t')\,\tilde\rho(r',t)]|N\rangle \right]
= v(r,r') - i \int d^3r'' d^3r'''\, v(r,r'')\, v(r'',r''')\, \langle N|T[\tilde\rho(r''',t')\,\tilde\rho(r',t)]|N\rangle   (8.9)

Note that the right-hand side of (8.9) depends only on t' - t after you write out the time-evolution operators that act on the density operators and insert a complete set of many-body states, as is done in appendix C. Fourier transforming both sides of (8.9) and using the result from appendix C, we get

W(r,r',E) = v(r,r') + \int d^3r'' d^3r'''\, v(r,r'')\, v(r'',r''') \sum_s \rho_s(r''')\,\rho_s(r') \left[ \frac{1}{E - \epsilon_s + i\delta} - \frac{1}{E + \epsilon_s - i\delta} \right]   (8.10)

Defining

f_s(r,r') = \int d^3r'' d^3r'''\, v(r,r'')\, v(r'',r''')\, \rho_s(r''')\,\rho_s(r')   (8.11)

then we get

W(r,r',E) = v(r,r') + \sum_s f_s(r,r') \left[ \frac{1}{E - \epsilon_s + i\delta} - \frac{1}{E + \epsilon_s - i\delta} \right]   (8.12)

To see that equations (8.8) and (8.12) are equivalent note that, as mentioned in appendix

C, the many-body excitation energies ǫ_s form a continuum and are defined to be greater than zero. Combining the two energy denominators in (8.12) and taking the continuum limit, we get equation (8.8) with f_s(r,r') → B(r,r',E). Similar steps for the retarded and advanced interactions give

W^R(r,r',E) = v(r,r') + \sum_s f_s(r,r') \left[ \frac{1}{E - \epsilon_s + i\delta} - \frac{1}{E + \epsilon_s + i\delta} \right]   (8.13)

W^A(r,r',E) = v(r,r') + \sum_s f_s(r,r') \left[ \frac{1}{E - \epsilon_s - i\delta} - \frac{1}{E + \epsilon_s - i\delta} \right]   (8.14)

Using the identity in (8.5) we get

[W^R(r,r',E) - W^A(r,r',E)]/2i = \sum_s f_s(r,r') \left[ -2\pi i\, \delta(E - \epsilon_s) + 2\pi i\, \delta(E + \epsilon_s) \right]/2i
= \pi \sum_s f_s(r,r') \left[ -\delta(E - \epsilon_s) + \delta(E + \epsilon_s) \right]   (8.15)

Since we will be inserting this into equation (8.8) for B(r,r',E'), and the E' integral only goes over positive values, we can drop the second delta function in (8.15) (recall that ǫ_s > 0). So, putting -1/π times (8.15) into (8.8) and recalling that E' has a small negative imaginary component, we get

W(r,r',E) = v(r,r') + \sum_s f_s(r,r') \int_0^\infty dE'\, \frac{2(E' - i\delta)\,\delta(E' - \epsilon_s)}{E^2 - (E' - i\delta)^2}
= v(r,r') + \sum_s f_s(r,r')\, \frac{2(\epsilon_s - i\delta)}{E^2 - (\epsilon_s - i\delta)^2}
= v(r,r') + \sum_s f_s(r,r') \left[ \frac{1}{E - \epsilon_s + i\delta} - \frac{1}{E + \epsilon_s - i\delta} \right]   (8.16)

Seeing that (8.12) and (8.16) are the same, we conclude that, indeed, the correct function to put in for B in (8.8) is -1/π times [W^R(r,r',E') - W^A(r,r',E')]/2i. We will refer to the combination [W^R(r,r',E') - W^A(r,r',E')]/2i as Im W(r,r',E'). Now we are finally ready to evaluate Σ^{(2)} and Σ^{(1W)}. We return to equation (8.2). Since the contribution due to the poles of G has already been dealt with above, we can set δ to zero in both terms and recombine them to get a single E'' integral from -∞ to ∞

\Sigma^{(2+1W)}(r,r',E) = \frac{i}{2\pi^2} \int_{-\infty}^{\infty} dE'\, W(r,r',E') \int_{-\infty}^{\infty} dE''\, \frac{|\mathrm{Im}\,G(r,r',E'')|}{E - E' - E''}   (8.17)

The notation in (8.17) may be slightly incorrect/misleading, as one may get the impression that there are poles along the real axis. This is not true, however, as all poles from G have been included in Σ^{(1G)}. Keeping this notational matter in mind, we proceed. If we put the spectral resolution of W into (8.17), the v(r,r') term gives zero because its E' integrand has no poles in the lower half plane (recall we are closing the contour in the lower half plane). Dropping the e^{-iδE'}, since it is only needed for determining where we close the contour, the second term gives

\Sigma^{(2+1W)}(r,r',E) = -\frac{i}{2\pi^3} \int_{-\infty}^{\infty} dE' \int_{-\infty}^{\infty} dE'' \int_0^{\infty} dE'''\, \frac{2(E''' - i\delta)\, \mathrm{Im}\,W(r,r',E''')}{E'^2 - (E''' - i\delta)^2} \times \frac{|\mathrm{Im}\,G(r,r',E'')|}{E - E' - E'' + i\delta}
= -\frac{1}{\pi^2} \int_{-\infty}^{\infty} dE'' \int_0^{\infty} dE'''\, \mathrm{Im}\,W(r,r',E''')\, \frac{|\mathrm{Im}\,G(r,r',E'')|}{E - E''' - E'' + i\delta}   (8.18)

Finally, we insert (8.6) into (8.18) to get

\Sigma^{(2+1W)}(r,r',E) = -\frac{1}{\pi} \sum_{nk} \phi_{nk}(r)\,\phi^*_{nk}(r') \int_0^{\infty} dE'''\, \frac{\mathrm{Im}\,W(r,r',E''')}{E - E''' - E_{nk} + i\delta}   (8.19)
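As a sanity check on the algebra connecting the spectral form (8.8) with the pole forms (8.12) and (8.16), the underlying partial-fraction identity is easy to spot-check numerically (toy numbers, any E and ǫ_s > 0):

```python
# Numerical spot-check of the partial-fraction identity
#   2(eps - i*d) / (E^2 - (eps - i*d)^2) = 1/(E - eps + i*d) - 1/(E + eps - i*d)
# used above to pass between the spectral and pole forms of W.
E, eps, d = 1.7, 0.9, 1e-3
z = eps - 1j * d
lhs = 2 * z / (E**2 - z**2)
rhs = 1 / (E - eps + 1j * d) - 1 / (E + eps - 1j * d)
print(abs(lhs - rhs))  # agreement to machine precision
```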

Switching to more standard notation, from now on we refer to Σ^{(1G)} as Σ_SEX and Σ^{(2+1W)} as Σ_COH, where SEX denotes Screened EXchange and COH denotes COulomb Hole. This is standard terminology from the HL paper. To summarize our results thus far, we write these contributions to Σ below:

\Sigma_{SEX}(r,r',E) = -\sum_{nk}^{occ} \phi_{nk}(r)\,\phi^*_{nk}(r')\, W(r,r',E - E_{nk})   (8.20)

\Sigma_{COH}(r,r',E) = -\frac{1}{\pi} \sum_{nk} \phi_{nk}(r)\,\phi^*_{nk}(r') \int_0^{\infty} dE'''\, \frac{\mathrm{Im}\,W(r,r',E''')}{E - E''' - E_{nk} + i\delta}   (8.21)

We are very close now. All that remains is to take the Fourier transform of (8.20) and (8.21), and take matrix elements. Using the following definition

W(r,r',E) = \sum_{qGG'} e^{i(q+G)\cdot r}\, W_{GG'}(q,E)\, e^{-i(q+G')\cdot r'}   (8.22)

we get

\langle nk|\Sigma_{SEX}(E)|nk\rangle = -\sum_{n'k'}^{occ} \sum_{qGG'} W_{GG'}(q, E - E_{n'k'})\, \langle nk|e^{i(q+G)\cdot r}|n'k'\rangle\, \langle n'k'|e^{-i(q+G')\cdot r'}|nk\rangle   (8.23)

The matrix element in (8.23) is given by

\langle nk|e^{i(q+G)\cdot r}|n'k'\rangle = \sum_{G''G'''} c^*_{nk}(G'')\, c_{n'k'}(G''') \int d^3r\, e^{i(k'+q-k+G+G'''-G'')\cdot r}   (8.24)

where the c_{nk}'s are the Fourier coefficients of u_{nk}, the periodic part of the Bloch function. This expression can only be non-zero if k' = k - q. With this, we have

\langle nk|\Sigma_{SEX}(E)|nk\rangle = -\sum_{n'}^{occ} \sum_{qGG'} W_{GG'}(q, E - E_{n'k-q})\, \langle nk|e^{i(q+G)\cdot r}|n'k-q\rangle\, \langle n'k-q|e^{-i(q+G')\cdot r'}|nk\rangle   (8.25)

We now define M_{nn'}(k,q,G) to be the matrix elements appearing in (8.25), i.e.

M_{nn'}(k,q,G) \equiv \langle nk|e^{i(q+G)\cdot r}|n'k-q\rangle   (8.26)

Using (8.26) and the following equality, which can be seen from equation (8.9),

W_{GG'}(q,E) = \epsilon^{-1}_{GG'}(q,E)\, v(q+G')   (8.27)

equation (8.25) becomes

\langle nk|\Sigma_{SEX}(E)|nk\rangle = -\sum_{n'}^{occ} \sum_{qGG'} M_{nn'}(k,q,G)\, M^*_{nn'}(k,q,G')\, v(q+G')\, \epsilon^{-1}_{GG'}(q, E - E_{n'k-q})   (8.28)

Following identical steps for the COH term, we get

\langle nk|\Sigma_{COH}(E)|nk\rangle = -\frac{1}{\pi} \sum_{n'} \sum_{qGG'} M_{nn'}(k,q,G)\, M^*_{nn'}(k,q,G')\, v(q+G') \int_0^{\infty} dE'\, \frac{\mathrm{Im}\,\epsilon^{-1}_{GG'}(q,E')}{E - E' - E_{n'k-q} + i\delta}   (8.29)

A few final remarks. The Im symbol in (8.29) has the same meaning for ǫ^{-1} as it did for W, i.e. Im ǫ^{-1}(r,r',E') = [(ǫ^{-1})^R(r,r',E') - (ǫ^{-1})^A(r,r',E')]/2i. Note that the formula for Σ_SEX in the 2011 BGW paper is incorrect, as it is the time-ordered ǫ^{-1} that should appear there, not the retarded one. Also, the differences in the above formulas for the COH and SEX terms relative to the BGW paper can be traced back to the definition of the matrix elements in that paper. I think the author of that paper had in mind a definition of the screened interaction as in (8.22), but with the first exponential having a negative sign and the second a positive sign, just the opposite of what is above. I think this accounts for the difference.
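As an illustration of the selection rule, here is a toy one-dimensional evaluation of the matrix element in (8.26): once k' = k - q is enforced, (8.24) collapses to a single sum over G''' of c*_{nk}(G+G''') c_{n'k-q}(G'''). The coefficient vectors below are random stand-ins, not real Bloch wavefunctions:

```python
import numpy as np

# Toy 1D evaluation of the plane-wave matrix element (8.26).  After the
# selection rule k' = k - q, Eq. (8.24) collapses to
#   M_nn'(k, q, G) = sum over G''' of conj(c_nk(G + G''')) * c_n'k-q(G''').
# Coefficients live on an integer G grid and vanish off it.

rng = np.random.default_rng(0)
n_g = 8
c_nk = rng.standard_normal(n_g) + 1j * rng.standard_normal(n_g)
c_nk /= np.linalg.norm(c_nk)            # normalize like a Bloch state

def matrix_element(c_left, c_right, G):
    m = 0.0 + 0.0j
    for g3 in range(n_g):
        if 0 <= g3 + G < n_g:           # coefficients vanish outside the grid
            m += np.conj(c_left[g3 + G]) * c_right[g3]
    return m

# With G = 0 and identical states this is just the squared norm:
print(matrix_element(c_nk, c_nk, 0))    # (1+0j) up to rounding
```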

Doped Graphene Refinement

In doped graphene the self energy has very sharp features that must be resolved [87, 46], and this requires a very fine k-point sampling. This is because the Dirac cone is localized in reciprocal space, and a fine k-point sampling is needed to sufficiently sample the states near the Fermi level in doped graphene. A coarse k-grid will not include enough states very near the Fermi level, and these are the states that dominate the physics in doped graphene. The required sampling is around 720x720 k-points, which is much finer than is feasible for an ab initio GW calculation. However, we take advantage of the fact that this fine k-point sampling is only needed to resolve features coming from around the Dirac cone. We use the following expression for the dynamical polarizability [11]

\chi_{GG'}(q,E) = \sum_n^{occ} \sum_{n'}^{emp} \sum_k M_{nn'}(k,q,G)\, M^*_{nn'}(k,q,G')
\times \frac{1}{2} \left[ \frac{1}{\epsilon_{nk+q} - \epsilon_{n'k} - E - i\delta} + \frac{1}{\epsilon_{nk+q} - \epsilon_{n'k} + E + i\delta} \right],   (9.1)

where ǫ_{nk+q}, ǫ_{n'k} are the mean-field energies from DFT, δ is a positive infinitesimal, and the quantities M_{nn'}(k,q,G) ≡ ⟨nk|e^{i(q+G)·r}|n'k-q⟩ are matrix elements. When calculating the polarizability above, we linearly interpolate the mean-field energies from our 72x72 k-grid to the 720x720 grid for k-points near the Dirac point, since these will be the points that contribute to epsilon for small |q|. We assume the associated matrix elements to be slowly varying, so they are moved outside the sum over k for points near the Dirac point and hence on our interpolated k-grid. The sum over all the relevant energy denominators on our fine k-grid is then performed. This yields a dielectric matrix with a very sharp carrier plasmon peak. We only do this refinement for small |q| points, since these are the points for which the group velocities of the carrier plasmon and of the electrons in graphene are similar. At such points there is a high density of states for creation of a plasmon by the photoexcited electron [87]. In evaluating the dynamical self energy in (7.3), we interpolate the heads of the dielectric matrix, which we have on our 72x72 grid, for small |q| onto our 720x720 grid. We also interpolate the mean-field energies to the finer grid in this step. We again assume that the matrix elements are slowly varying and take them outside the sum over the fine q-grid. For a given k-point, we perform the needed frequency integrals and sum over the q-points in this interpolated region to get the needed contribution to Σ. For all large q-point contributions to a given k-point, we use neither the refined heads of epsilon nor this refinement treatment of the self energy, since it is unnecessary.
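A one-dimensional toy of the energy-interpolation step: a Dirac-cone band ǫ(k) = v|k| is linearly interpolated from a coarse grid onto a 10x finer one. The velocity and grids below are invented numbers, not the actual graphene parameters; the point is that linear interpolation is essentially exact away from the cone apex:

```python
import numpy as np

# 1D toy of the energy-interpolation step: mean-field energies known on a
# coarse k-grid are linearly interpolated onto a 10x finer grid near the
# Dirac point, here placed at k = 0.
v = 1.0                                 # toy band velocity
k_coarse = np.linspace(-0.5, 0.5, 72)   # coarse grid; 0 falls between points
e_coarse = v * np.abs(k_coarse)         # Dirac cone

k_fine = np.linspace(-0.5, 0.5, 721)    # 10x refinement
e_interp = np.interp(k_fine, k_coarse, e_coarse)

# Linear interpolation is exact on the linear flanks of the cone; the only
# error is the rounding of the kink at the apex, at the scale of half a
# coarse-grid spacing.
err = np.abs(e_interp - v * np.abs(k_fine))
print(err.max())
```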
In expression (7.3), µ is the chemical potential, ǫ^{-1} is the inverse dielectric matrix obtained by inverting the dielectric matrix built from the polarizability in (9.1), and Im ǫ^{-1} ≡ [(ǫ^R)^{-1} - (ǫ^A)^{-1}]/2i, where ǫ^R and ǫ^A are the retarded and advanced dielectric matrices, respectively. There are two important points about our interpolation procedure for the self energy. The first point is that, in interpolating the frequency-dependent dielectric matrix, we had to devise a scheme for interpolating a function of frequency and q-point onto a finer q-point grid. The second point is that, to interpolate the dielectric matrix to very small q-points, we needed some input about the q → 0 behavior of the dielectric matrix, since the smallest q-point on our grid for which we have the dielectric matrix is (1/72, 0, 0). Well-known schemes exist for interpolating quantities that have just a single value at a given point in the BZ, such as bandstructures [64, 65]. To our knowledge, there is no well-known method for interpolating quantities that are functions of frequency in the BZ. Our method partitions the frequency region in which we have our dielectric matrix into two regions: a low-energy region that contains the carrier and π plasmon peaks, and a high-energy region that contains the σ plasmon peak and other smooth features. We freeze the high-energy region to its value at one of the corners of our q-point interpolation region. The dielectric matrix in this frequency range should not vary strongly with q-point. Additionally, the physics of doped graphene is not really affected by what is in this frequency range of the dielectric matrix. These two points justify the freezing at higher frequencies. In the low-frequency region, we rely on the fact that we are interpolating a function with two peaks, from the carrier and π plasmons.
Specifically, we go through the following six-step process in order to interpolate the head of the frequency-dependent inverse dielectric matrix: 1) For a given set of four q-points at the corners of a square (in lattice coordinates) in the Brillouin zone, we construct a much finer q-grid within this square (10x10). 2) For each point on this finer q-grid, we bilinearly interpolate the peak position and height of the carrier and π plasmon peaks from their values on the corners of the square. 3) For a given q-point on the corner of the BZ square, we calculate difference vectors between the peak height and position at that q-point and their interpolated values at the q-point on the fine grid:

\Delta = p_{interp} - p_q.   (9.2)

In the above equation, p_q = (\omega_{q,peak}, \epsilon^{-1}_{00}(q, \omega_{q,peak})) is the vector location of the peak (carrier or π plasmon) in the (ω, ǫ^{-1})-plane for the q-point on the BZ square under consideration, and p_{interp} is the same quantity for the peaks at the interpolated q-point. 4) To get the vector location of all other frequency points of the head of the inverse dielectric matrix, we linearly interpolate the difference vectors for the carrier and π plasmon peaks with respect to frequency:

p'_q = p_q + \Delta_c\, \frac{\omega_\pi - \omega}{\omega_\pi - \omega_c} + \Delta_\pi\, \frac{\omega - \omega_c}{\omega_\pi - \omega_c}   (9.3)

ω_c and ω_π are the carrier and π-plasmon peak locations, respectively. After this, there is a minor step where we have to interpolate back to our original frequency grid. 5) We use bilinear interpolation to average the heads of the frequency-dependent dielectric matrix from the four corners of our square that result from the above procedure. 6) We sum over the contributions from all the q-points in our dense grid to get the contribution from the given square in the BZ. For the region around q → 0, we interpolated the carrier plasmon peak onto a finer q-grid by using the long-wavelength analytic expression for the imaginary part of the head of the inverse dielectric matrix [87]:

\mathrm{Im}\,\epsilon^{-1}_{0,0}(q,\omega) = C\sqrt{|q|}\, \delta(\omega - \alpha\sqrt{|q|}) + \delta\mathrm{Im}\,\epsilon^{-1}_{0,0}(q,\omega),   (9.4)

where the first term corresponds to the carrier plasmon, and we determine the constants C and α from the head of the inverse dielectric matrix for the smallest q-point on our grid. The δIm ǫ^{-1}_{0,0}(q,ω) term contains the rest of the head of the inverse dielectric matrix and is kept constant in this interpolation of the q → 0 region, since the physics here is dominated by the carrier plasmon.
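A sketch of how C and α in (9.4) could be fixed from the smallest q-point on the grid; the peak position and weight below are invented values for illustration, not graphene data:

```python
import math

# Fix the constants in Eq. (9.4) from the smallest |q| on the grid: the
# carrier-plasmon peak position there gives alpha, its spectral weight
# gives C, and the sqrt(|q|) model then predicts the pole for smaller |q|.

q_min = 1.0 / 72        # smallest |q| on the coarse grid
omega_peak = 0.30       # carrier-plasmon peak position there (toy value)
weight = 0.05           # integrated peak weight there (toy value)

alpha = omega_peak / math.sqrt(q_min)
C = weight / math.sqrt(q_min)

def plasmon_pole(q):
    """Predicted position and weight of the carrier-plasmon term in (9.4)."""
    return alpha * math.sqrt(q), C * math.sqrt(q)
```

Both quantities scale as the square root of |q|, so quartering |q| halves the predicted pole position and weight, while the remainder δIm ǫ^{-1} is held fixed as described above.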

Bibliography

[1] S. Adachi. Handbook on Physical Properties of Semiconductors, volume 3. Kluwer Academic, 2004.

[2] C.-O. Almbladh and L. Hedin. Handbook of Synchrotron Radiation, volume 1. E. E. Koch (North-Holland, Amsterdam), 1983.

[3] E. Antoncik. A new formulation of the method of nearly free electrons. Czech. J. Phys., 4:439, 1954.

[4] B. Arnaud and M. Alouani. All-electron projector-augmented-wave GW approximation: Application to the electronic properties of semiconductors. Phys. Rev. B, 62:4464, 2000.

[5] F. Aryasetiawan and O. Gunnarsson. The GW method. Reports on Progress in Physics, 61(3):237, 1998.

[6] F. Aryasetiawan, L. Hedin, and K. Karlsson. Phys. Rev. Lett., 77:2268, 1996.

[7] Neil W. Ashcroft and N. David Mermin. Solid State Physics. Brooks Cole, 1976.

[8] V. Barone, M. Casarin, D. Forrer, M. Pavone, M. Sambi, and A. Vittadini. J. Comp. Chem., 30:934, 2009.

[9] S. Baroni, S. de Gironcoli, A. Dal Corso, and P. Giannozzi. Phonons and related crystal properties from density-functional perturbation theory. Rev. Mod. Phys., 73:515, 2001.

[10] Stefano Baroni, Paolo Giannozzi, and Andrea Testa. Green’s-function approach to linear response in solids. Phys. Rev. Lett., 58:1861, 1987.

[11] http://www.berkeleygw.org.

[12] Marco Bernardi, Derek Vigil-Fowler, Johannes Lischner, Jeffrey B. Neaton, and Steven G. Louie. Ab Initio study of hot carriers in the first picosecond after sun- light absorption in silicon. Phys. Rev. Lett., 112:257402, Jun 2014.

[13] L.S. Blackford, J. Choi, A. Cleary, E. D’Azevedo, J. Demmel, I. Dhillon, J. Don- garra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R.C. Whaley. ScaLAPACK Users’ Guide, volume 1. Society for Industrial and Applied Mathematics, 1997.

[14] A. Bostwick, T. Ohta, T. Seyller, K. Horn, and E. Rotenberg. Nature Physics, 3:36, 2007.

[15] A. Bostwick, F. Speck, T. Seyller, K. Horn, M. Polini, R. Asgari, A. H. MacDonald, and E. Rotenberg. Science, 328:999, 2010.

[16] M. Calandra and F. Mauri. Phys. Rev. B, 76:205411, 2007.

[17] R. Car and M. Parrinello. Unified approach for molecular dynamics and density- functional theory. Phys. Rev. Lett., 55:2471, 1985.

[18] A. H. Castro Neto, F. Guinea, N. M. R. Peres, K. S. Novoselov, and A. K. Geim. The electronic properties of graphene. Rev. Mod. Phys., 81:109, 2009.

[19] M.L. Cohen and V. Heine. The fitting of pseudopotentials to experimental data and their subsequent application, volume 24 of Solid State Physics, page 37. Academic, New York, 1970.

[20] A. Damascelli, Z. Hussain, and Z.-X. Shen. Rev. Mod. Phys., 75:473, 2003.

[21] Jack Deslippe, Georgy Samsonidze, David A. Strubbe, Manish Jain, Marvin L. Cohen, and Steven G. Louie. BerkeleyGW: A massively parallel computer package for the calculation of the quasiparticle and optical properties of materials and nanostructures. Computer Physics Communications, 183(6):1269-1289, 2012.

[22] O. E. Dial, R. C. Ashoori, L. N. Pfeiffer, and K. W. West. Phys. Rev. B, 85:081306(R), 2012.

[23] O.E. Dial, R.C. Ashoori, L.N. Pfeiffer, and K.W. West. High-resolution spectroscopy of two-dimensional electron systems. Nature, 448:176–179, July 2007.

[24] C. A. Emeis, L. J. Oosterhoff, and G. De Vries. Numerical evaluation of Kramers-Kronig relations. Proc. R. Soc. London, A297:54, 1967.

[25] http://www.quantum-espresso.org/.

[26] Sergey V. Faleev, Mark van Schilfgaarde, and Takao Kotani. All-electron self-consistent GW approximation: Application to Si, MnO, and NiO. Phys. Rev. Lett., 93:126406, 2004.

[27] E. Fermi. Displacement by pressure of the high lines of the spectral series. Nuovo Cimento, 11:157, 1934.

[28] A.L. Fetter and J.D. Walecka. Quantum Theory of Many-Particle Systems. Dover Publications, 2003.

[29] I. Forbeaux, J.-M. Themlin, and J.-M. Debever. Heteroepitaxial graphite on 6H-SiC(0001): interface formation through conduction-band electronic structure. Phys. Rev. B, 58:16396, 1998.

[30] Paolo Giannozzi, Stefano Baroni, Nicola Bonini, Matteo Calandra, Roberto Car, Carlo Cavazzoni, Davide Ceresoli, Guido L Chiarotti, Matteo Cococcioni, Ismaila Dabo, Andrea Dal Corso, Stefano de Gironcoli, Stefano Fabris, Guido Fratesi, Ralph Gebauer, Uwe Gerstmann, Christos Gougoussis, Anton Kokalj, Michele Lazzeri, Layla Martin-Samos, Nicola Marzari, Francesco Mauri, Riccardo Mazzarello, Stefano Paolini, Alfredo Pasquarello, Lorenzo Paulatto, Carlo Sbraccia, Sandro Scandolo, Gabriele Sclauzero, Ari P Seitsonen, Alexander Smogunov, Paolo Umari, and Renata M Wentzcovitch. QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials. Journal of Physics: Condensed Matter, 21(39):395502, 2009.

[31] R. W. Godby, M. Schlüter, and L. J. Sham. Trends in self-energy operators and their corresponding exchange-correlation potentials. Phys. Rev. B, 36:6497, 1987.

[32] Ricardo Gómez-Abal, Xinzheng Li, Matthias Scheffler, and Claudia Ambrosch-Draxl. Influence of the core-valence interaction and of the pseudopotential approximation on the electron self-energy in semiconductors. Phys. Rev. Lett., 101:106404, 2008.

[33] X. Gonze. First-principles responses of solids to atomic displacements and homogeneous electric fields: Implementation of a conjugate-gradient algorithm. Phys. Rev. B, 55:10337, 1997.

[34] X. Gonze and J.-P. Vigneron. Density-functional approach to nonlinear-response coefficients of solids. Phys. Rev. B, 39:13120–13128, Jun 1989.

[35] D.J. Griffiths. Introduction to Quantum Mechanics, 2nd edition. Pearson Prentice Hall, 2004.

[36] S. Grimme. J. Comp. Chem., 27:1787, 2006.

[37] Jeffrey C. Grossman, Michael Rohlfing, Lubos Mitas, Steven G. Louie, and Marvin L. Cohen. High accuracy many-body calculational approaches for excitations in molecules. Phys. Rev. Lett., 86:472–475, Jan 2001.

[38] M. Guzzo, G. Lani, F. Sottile, P. Romaniello, M. Gatti, J.J. Kas, J.J. Rehr, M.G. Silly, F. Sirotti, and L. Reining. Valence electron photoemission spectrum of semiconductors: Ab initio description of multiple satellites. Phys. Rev. Lett., 107:166401, 2011.

[39] D. R. Hamann, M. Schlüter, and C. Chiang. Norm-conserving pseudopotentials. Phys. Rev. Lett., 43:1494, 1979.

[40] L. Hedin. Phys. Rev., 139:A796, 1965.

[41] L. Hedin. Phys. Scr., 21:477, 1980.

[42] L. Hedin. J. Phys.: Condens. Matter, 11:R489, 1999.

[43] L. Hedin and S. Lundqvist. Effects of electron-electron and electron-phonon interactions on the one-electron states of solids. In Frederick Seitz, David Turnbull, and Henry Ehrenreich, editors, Advances in Research and Applications, volume 23 of Solid State Physics, pages 1–181. Academic Press, 1970.

[44] H. Hellmann. A new approximation method in the problem of many electrons. J. Chem. Phys., 3:61, 1935.

[45] P. Hohenberg and W. Kohn. Inhomogeneous electron gas. Phys. Rev., 136:B864–B871, Nov 1964.

[46] E.H. Hwang and S. Das Sarma. Quasiparticle spectral function in doped graphene: Electron-electron interaction effects in ARPES. Phys. Rev. B, 77:081412(R), 2008.

[47] M. S. Hybertsen and S. G. Louie. Electron correlation in semiconductors and insula- tors: Band gaps and quasiparticle energies. Phys. Rev. B, 34:5390, 1986.

[48] S. Ismail-Beigi. Truncation of periodic image interactions for confined systems. Phys. Rev. B, 73(23):233103, 2006.

[49] Uichi Itoh, Yasutake Toyoshima, Hideo Onuki, Nobuaki Washida, and Toshio Ibuki. Vacuum ultraviolet absorption cross sections of SiH4, GeH4, Si2H6, and Si3H8. The Journal of Chemical Physics, 85(9):4867–4872, 1986.

[50] Manish Jain, Jack Deslippe, Georgy Samsonidze, Marvin L. Cohen, James R. Chelikowsky, and Steven G. Louie. Improved quasiparticle wave functions and mean field for G0W0 calculations: Initialization with the COHSEX operator. Phys. Rev. B, 90:115148, Sep 2014.

[51] R. Jalabert and S. Das Sarma. Phys. Rev. B, 40:9723, 1989.

[52] Wei Kang and Mark S. Hybertsen. Enhanced static approximation to the electron self-energy operator for efficient calculation of quasiparticle energies. Phys. Rev. B, 82:195108, Nov 2010.

[53] J. J. Kas, J. J. Rehr, and L. Reining. Cumulant expansion of the retarded one-electron Green function. Phys. Rev. B, 90:085112, Aug 2014.

[54] Amandeep Kaur, Erik R. Ylvisaker, Deyu Lu, Tuan Anh Pham, Giulia Galli, and Warren E. Pickett. Spectral representation analysis of dielectric screening in solids and molecules. Phys. Rev. B, 87:155144, 2013.

[55] E. Kaxiras. Atomic and Electronic Structure of Solids. Cambridge University Press, 2003.

[56] L.V. Keldysh. Ionization in the field of a strong electromagnetic wave. JETP, 20:1307, 1965.

[57] Jiří Klimeš, Merzuk Kaltak, and Georg Kresse. Predictive GW calculations using plane waves and pseudopotentials, Aug 2014.

[58] W. Kohn and L. J. Sham. Self-consistent equations including exchange and correlation effects. Phys. Rev., 140:A1133–A1138, Nov 1965.

[59] T. Kotani and M. van Schilfgaarde. All-electron GW approximation with the mixed basis expansion based on the full-potential LMTO method. Solid State Communications, 121(9-10):461–465, 2002.

[60] Takao Kotani, Mark van Schilfgaarde, and Sergey V. Faleev. Quasiparticle self-consistent GW method: A basis for the independent-particle approximation. Phys. Rev. B, 76:165106, Oct 2007.

[61] Wei Ku and Adolfo G. Eguiluz. Band-gap problem in semiconductors revisited: Effects of core states and many-body self-consistency. Phys. Rev. Lett., 89:126401, 2002.

[62] D.C. Langreth. Singularities in the x-ray spectra of metals. Phys. Rev. B, 1:471, 1970.

[63] P. Lautenschlager, M. Garriga, L. Viña, and M. Cardona. Temperature dependence of the dielectric function and interband critical points in silicon. Phys. Rev. B, 36:4821–4830, Sep 1987.

[64] G. Lehmann, P. Rennert, M. Taut, and H. Wonn. phys. stat. sol., 37:K27, 1970.

[65] G. Lehmann and M. Taut. phys. stat. sol., 54:469, 1972.

[66] X.-Z. Li, R. Gómez-Abal, H. Jiang, C. Ambrosch-Draxl, and M. Scheffler. Impact of widely used approximations to the G0W0 method: an all-electron perspective. New Journal of Physics, 14(2):023006, 2012.

[67] J. Lischner, D. Vigil-Fowler, and S. G. Louie. Phys. Rev. B, 89:125430, 2014.

[68] J. Lischner, D. Vigil-Fowler, and S.G. Louie. Phys. Rev. Lett., 110:146801, 2013.

[69] Johannes Lischner, G. K. Pálsson, Derek Vigil-Fowler, S. Nemsak, J. Avila, M. C. Asensio, C. S. Fadley, and Steven G. Louie. Satellite band structure in silicon caused by electron-plasmon coupling. Phys. Rev. B, 91:205113, May 2015.

[70] Johannes Lischner, Derek Vigil-Fowler, and Steven G. Louie. Satellite structures in the spectral functions of the two-dimensional electron gas in semiconductor quantum wells: A GW plus cumulant study. Phys. Rev. B, 89:125430, Mar 2014.

[71] Fang Liu, Lin Lin, Derek Vigil-Fowler, Johannes Lischner, Alexander F. Kemper, Sahar Sharifzadeh, Felipe H. da Jornada, Jack Deslippe, Chao Yang, Jeffrey B. Neaton, and Steven G. Louie. Numerical integration for ab initio many-electron self energy calculations within the GW approximation. J. Comp. Phys., 286:1–13, 2015.

[72] Steven G. Louie, Sverre Froyen, and Marvin L. Cohen. Nonlinear ionic pseudopotentials in spin-density-functional calculations. Phys. Rev. B, 26:1738–1742, Aug 1982.

[73] B. I. Lundqvist. Phys. Kondens. Mater., 6:193, 1967.

[74] O. Madelung. Semiconductors : Basic Data. Springer-Verlag, 1996.

[75] G.D. Mahan. Many-Particle Physics (Physics of Solids and Liquids), 3rd edition. Springer, 2000.

[76] Brad D. Malone and Marvin L. Cohen. Quasiparticle semiconductor band structures including spin-orbit interactions. Journal of Physics: Condensed Matter, 25(10):105503, 2013.

[77] R.M. Martin. Electronic Structure, volume 1. Cambridge University Press, 2004.

[78] D. Marx and J. Hutter. Ab initio molecular dynamics: Theory and implementation, volume 1 of Modern Methods and Algorithms of Quantum Chemistry, page 301. John von Neumann Institute of Computing, 2000.

[79] R.D. Mattuck. A guide to Feynman Diagrams in the Many-Body Problem, 2nd edition. Dover Publications, 1992.

[80] K.S. Novoselov, A.K. Geim, S.V. Morozov, D. Jiang, Y. Zhang, S.V. Dubonos, I.V. Grigorieva, and A.A. Firsov. Electric field effect in atomically thin carbon films. Science, 306:666, 2004.

[81] http://www.nersc.gov/projects/paratec.

[82] Cheol-Hwan Park, Feliciano Giustino, Catalin D. Spataru, Marvin L. Cohen, and Steven G. Louie. First-principles study of electron linewidths in graphene. Phys. Rev. Lett., 102:076803, 2009.

[83] J. P. Perdew and Alex Zunger. Self-interaction correction to density-functional ap- proximations for many-electron systems. Phys. Rev. B, 23:5048, 1981.

[84] M.E. Peskin and D.V. Schroeder. An Introduction to Quantum Field Theory. Westview Press, 1995.

[85] J.C. Phillips and L. Kleinman. New method for calculating wave functions in crystals and molecules. Phys. Rev., 116:287, Oct 1959.

[86] The experimental data was obtained for a doping level corresponding to a charge density of n = −4.0 × 10^13 cm^−2. We have scaled the experimental spectrum using the energy difference from the interacting Fermi energy to the interacting Dirac point energy.

[87] M. Polini, R. Asgari, G. Borghi, Y. Barlas, T. Pereg-Barnea, and A.H. MacDonald. Plasmons and the spectral function of graphene. Phys. Rev. B, 77:081411(R), 2008.

[88] Martin M. Rieger, L. Steinbeck, I.D. White, H.N. Rojas, and R.W. Godby. The GW space-time method for the self-energy of large systems. Comp. Phys. Comm., 117:211, 1999.

[89] E. Rotenberg, A. Bostwick, T. Ohta, J.L. McChesney, T. Seyller, and K. Horn. Origin of the energy bandgap in epitaxial graphene. Nat. Mater., 7:258, 2008.

[90] Georgy Samsonidze, Cheol-Hwan Park, and Boris Kozinsky. Insights and challenges of applying the GW method to transition metal oxides. Journal of Physics: Condensed Matter, 26(47):475501, 2014.

[91] M. Shishkin and G. Kresse. Self-consistent GW calculations for semiconductors and insulators. Phys. Rev. B, 75:235102, Jun 2007.

[92] J.C. Slater. The electronic structure of metals. Rev. Mod. Phys., 6:209, 1934.

[93] Murilo L. Tiago, Sohrab Ismail-Beigi, and Steven G. Louie. Effect of semicore orbitals on the electronic band gaps of Si, Ge, and GaAs within the GW approximation. Phys. Rev. B, 69:125212, 2004.

[94] Bruno Uchoa, Ling Yang, S.-W. Tsai, N. M. R. Peres, and A. H. Castro Neto. Theory of scanning tunneling spectroscopy of magnetic adatoms in graphene. Phys. Rev. Lett., 103:206804, Nov 2009.

[95] Miguel M. Ugeda, Aaron J. Bradley, Su-Fei Shi, Felipe H. da Jornada, Yi Zhang, Diana Y. Qiu, Wei Ruan, Sung-Kwan Mo, Zahid Hussain, Zhi-Xun Shen, Feng Wang, Steven G. Louie, and Michael F. Crommie. Giant bandgap renormalization effects in a monolayer transition metal dichalcogenide semiconductor. Nat. Mater., 13, 2014.

[96] F. Varchon, R. Feng, J. Hass, X. Li, N. Nguyen, C. Naud, P. Mallet, J.Y. Veuillen, C. Berger, E.H. Conrad, and L. Magaud. Electronic structure of epitaxial graphene layers on SiC: Effect of the substrate. Phys. Rev. Lett., 99:126805, 2007.

[97] Lucas O. Wagner, Thomas E. Baker, E. M. Stoudenmire, Kieron Burke, and Steven R. White. Kohn-Sham calculations with the exact functional. Phys. Rev. B, 90:045109, Jul 2014.

[98] A. L. Walter, A. Bostwick, K.-J. Jeon, F. Speck, M. Ostler, T. Seyller, L. Moreschini, Y. J. Chang, M. Polini, R. Asgari, A. H. MacDonald, K. Horn, and E. Rotenberg. Phys. Rev. B, 84:085410, 2011.

[99] T. O. Wehling, E. Şaşıoğlu, C. Friedrich, A. I. Liechtenstein, M. I. Katsnelson, and S. Blügel. Phys. Rev. Lett., 106:236805, 2011.

[100] E.P. Wigner and F. Seitz. On the constitution of metallic sodium. Phys. Rev., 43:804, 1933.

[101] B. Wunsch, T. Stauber, F. Sols, and F. Guinea. New J. Phys., 8:318, 2006.

[102] J. Yan, K. S. Thygesen, and K. W. Jacobsen. Phys. Rev. Lett., 106:146803, 2011.

[103] J.M. Ziman. Elements of Advanced Quantum Theory. Cambridge University Press, 1969.