IAS/PARK CITY MATHEMATICS SERIES Volume 23

Mathematics and Materials

Mark J. Bowick, David Kinderlehrer, Govind Menon, Charles Radin, Editors

American Mathematical Society | Institute for Advanced Study | Society for Industrial and Applied Mathematics

Rafe Mazzeo, Series Editor. Mark J. Bowick, David Kinderlehrer, Govind Menon, and Charles Radin, Volume Editors.

IAS/Park City Mathematics Institute runs mathematics education programs that bring together high school mathematics teachers, researchers in mathematics and mathematics education, undergraduate mathematics faculty, graduate students, and undergraduates to participate in distinct but overlapping programs of research and education. This volume contains the lecture notes from the Graduate Summer School program.

2010 Mathematics Subject Classification. Primary 82B05, 35Q70, 82B26, 74N05, 51P05, 52C17, 52C23.

Library of Congress Cataloging-in-Publication Data
Names: Bowick, Mark J., editor. | Kinderlehrer, David, editor. | Menon, Govind, 1973– editor. | Radin, Charles, 1945– editor. | Institute for Advanced Study (Princeton, N.J.) | Society for Industrial and Applied Mathematics.
Title: Mathematics and materials / Mark J. Bowick, David Kinderlehrer, Govind Menon, Charles Radin, editors.
Description: [Providence] : American Mathematical Society, [2017] | Series: IAS/Park City mathematics series ; volume 23 | "Institute for Advanced Study." | "Society for Industrial and Applied Mathematics." | "This volume contains lectures presented at the Park City summer school on Mathematics and Materials in July 2014." – Introduction. | Includes bibliographical references.
Identifiers: LCCN 2016030010 | ISBN 9781470429195 (alk. paper)
Subjects: LCSH: Statistical mechanics–Congresses. | Materials science–Congresses. | AMS: Statistical mechanics, structure of matter – Equilibrium statistical mechanics – Classical equilibrium statistical mechanics (general). msc | Partial differential equations – Equations of mathematical physics and other areas of application – PDEs in connection with mechanics of particles and systems. msc | Statistical mechanics, structure of matter – Equilibrium statistical mechanics – Phase transitions (general). msc | Mechanics of deformable solids – Phase transformations in solids – Crystals. msc | Geometry – Geometry and physics (should also be assigned at least one other classification number from Sections 70–86). msc | Convex and discrete geometry – Discrete geometry – Packing and covering in n dimensions. msc | Convex and discrete geometry – Discrete geometry – Quasicrystals and aperiodic tilings. msc
Classification: LCC QC174.7 .M38 2017 | DDC 530.13–dc23
LC record available at https://lccn.loc.gov/2016030010

Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting for them, are permitted to make fair use of the material, such as to copy select pages for use in teaching or research. Permission is granted to quote brief passages from this publication in reviews, provided the customary acknowledgment of the source is given. Republication, systematic copying, or multiple reproduction of any material in this publication is permitted only under license from the American Mathematical Society. Permissions to reuse portions of AMS publication content are handled by Copyright Clearance Center's RightsLink service. For more information, please visit: http://www.ams.org/rightslink. Send requests for translation rights and licensed reprints to [email protected]. Excluded from these provisions is material for which the author holds copyright. In such cases, requests for permission to reuse or reprint material should be addressed directly to the author(s). Copyright ownership is indicated on the copyright page, or on the lower right-hand corner of the first page of each article within proceedings volumes.

© 2017 by the American Mathematical Society. All rights reserved. The American Mathematical Society retains all rights except those granted to the Government. Printed in the United States of America. The paper used in this book is acid-free and falls within the guidelines established to ensure permanence and durability. Visit the AMS home page at http://www.ams.org/

Contents

Preface

Introduction

Veit Elser, Three Lectures on Statistical Mechanics
Lecture 1. Mechanical foundations
Lecture 2. Temperature and entropy
Lecture 3. Macroscopic order

Bibliography

Henry Cohn, Packing, Coding, and Ground States

Preface

Lecture 1. Sphere packing
1. Introduction
2. Motivation
3. Phenomena
4. Constructions
5. Difficulty of sphere packing
6. Finding dense packings
7. Computational problems

Lecture 2. Symmetry and ground states
1. Introduction
2. Potential energy minimization
3. Families and universal optimality
4. Optimality of simplices

Lecture 3. Interlude: Spherical harmonics
1. Fourier series
2. Fourier series on a torus
3. Spherical harmonics

Lecture 4. Energy and packing bounds on spheres
1. Introduction
2. Linear programming bounds
3. Applying linear programming bounds
4. Spherical codes and the kissing problem
5. Ultraspherical polynomials

Lecture 5. Packing bounds in Euclidean space
1. Introduction
2. Poisson summation
3. Linear programming bounds
4. Optimization and conjectures

Bibliography

Alpha A. Lee and Daan Frenkel, Entropy, Probability and Packing

Introduction

Lecture 1. Introduction to thermodynamics and statistical physics
1. Classical equilibrium thermodynamics
2. Statistical physics of entropy
3. From entropy to thermodynamic ensembles
4. Exercises

Lecture 2. Thermodynamics of phase transitions
1. Thermodynamics of phase equilibrium
2. Thermodynamic integration
3. The chemical potential and Widom particle insertion
4. Exercises

Lecture 3. Order from disorder: Entropic phase transitions
1. Hard-sphere freezing
2. Role of geometry: The isotropic-nematic transition
3. Depletion interaction and the entropy of the medium
4. Attractive forces and the liquid phase
5. Exercises

Lecture 4. Granular entropy
1. Computing the entropy
2. Is this "entropy" physical?
3. The Gibbs paradox

Bibliography

Michael P. Brenner, Ideas about Self Assembly

Introduction

Lecture 1. Self assembly: Introduction
1. What is self-assembly
2. Statistical mechanical preliminaries

Lecture 2. Self assembly with identical components
1. Introduction
2. The (homogeneous) polymer problem
3. Cluster statistical mechanics

Lecture 3. Heterogeneous self assembly
1. Heteropolymer problem
2. The yield catastrophe
3. Colloidal assembly

Lecture 4. Nucleation theory and multifarious assembly mixtures
1. Nucleation theory
2. Magic soup

Bibliography

P. Palffy-Muhoray, M. Pevnyi, E. Virga, and X. Zheng, The Effects of Particle Shape in Orientationally Ordered Soft Materials

Introduction

Lecture 1. Soft condensed matter and orientational order
1. Soft condensed matter
2. Position and orientation
3. Orientational order parameters

Lecture 2. The free energy
1. Helmholtz free energy
2. Trial free energy
3. Configurational partition function
4. Pairwise interactions
5. Soft and hard potentials
6. Mean-field free energy
7. Density functional theory

Lecture 3. Particle shape and attractive interactions
1. Polarizability of a simple atom
2. Dispersion interaction
3. Polarizability of non-spherical particles

Lecture 4. Particle shape and repulsive interactions
1. Onsager theory
2. Excluded volume for ellipsoids
3. Phase separation
4. Minimum excluded volume of convex shapes
5. Systems of hard polyhedra

Summary
Bibliography

Roman Kotecký, Statistical Mechanics and Nonlinear Elasticity

Introduction

Lecture 1. Statistical mechanics
Statistical mechanics of interacting particles
Lattice models of nonlinear elasticity
Ising model

Lecture 2. Phase transitions
Existence of the free energy
Concavity of the free energy
Peierls argument

Lecture 3. Expansions
The high temperature expansion
Intermezzo (cluster expansions)
Proof of cluster expansion theorem

Lecture 4. Gradient models of random surface
Quadratic potential
Convex potentials
Non-convex potentials

Lecture 5. Nonlinear elasticity
Free energy
Macroscopic behaviour from microscopic model
Main ingredients of the proof of quasiconvexity of W

Bibliography

Peter Bella, Arianna Giunti, and Felix Otto, Quantitative Stochastic Homogenization: Local Control of Homogenization Error through Corrector
1. A brief overview of stochastic homogenization, and a common vision for quenched and thermal noise
2. Precise setting and motivation for this work
3. Main results
4. Proofs
Bibliography

Preface

The IAS/Park City Mathematics Institute (PCMI) was founded in 1991 as part of the "Regional Geometry Institute" initiative of the National Science Foundation. In mid-1993 the program found an institutional home at the Institute for Advanced Study (IAS) in Princeton, New Jersey. The IAS/Park City Mathematics Institute encourages both research and education in mathematics and fosters interaction between the two. The three-week summer institute offers programs for researchers and postdoctoral scholars, graduate students, undergraduate students, high school teachers, undergraduate faculty, and researchers in mathematics education. One of PCMI's main goals is to make all of the participants aware of the total spectrum of activities that occur in mathematics education and research. We wish to involve professional mathematicians in education and to bring modern concepts in mathematics to the attention of educators. To that end, the summer institute features general sessions designed to encourage interaction among the various groups. In-year activities at the sites around the country form an integral part of the High School Teachers Program. Each summer a different topic is chosen as the focus of the Research Program and Graduate Summer School. Activities in the Undergraduate Summer School deal with this topic as well. Lecture notes from the Graduate Summer School are being published each year in this series. The first twenty-three volumes are:

• Volume 1: Geometry and Quantum Field Theory (1991)
• Volume 2: Nonlinear Partial Differential Equations in Differential Geometry (1992)
• Volume 3: Complex Algebraic Geometry (1993)
• Volume 4: Gauge Theory and the Topology of Four-Manifolds (1994)
• Volume 5: Hyperbolic Equations and Frequency Interactions (1995)
• Volume 6: Probability Theory and Applications (1996)
• Volume 7: Symplectic Geometry and Topology (1997)
• Volume 8: Representation Theory of Lie Groups (1998)
• Volume 9: Arithmetic Algebraic Geometry (1999)
• Volume 10: Computational Complexity Theory (2000)
• Volume 11: Quantum Field Theory, Supersymmetry, and Enumerative Geometry (2001)
• Volume 12: Automorphic Forms and their Applications (2002)
• Volume 13: Geometric Combinatorics (2004)
• Volume 14: Mathematical Biology (2005)
• Volume 15: Low Dimensional Topology (2006)
• Volume 16: Statistical Mechanics (2007)


• Volume 17: Analytic and Algebraic Geometry: Common Problems, Different Methods (2008)
• Volume 18: Arithmetic of L-functions (2009)
• Volume 19: Mathematics in Image Processing (2010)
• Volume 20: Moduli Spaces of Riemann Surfaces (2011)
• Volume 21: Geometric Group Theory (2012)
• Volume 22: Geometric Analysis (2013)
• Volume 23: Mathematics and Materials (2014)

Volumes are in preparation for subsequent years. Some material from the Undergraduate Summer School is published as part of the Student Mathematical Library series of the American Mathematical Society. We hope to publish material from other parts of the IAS/PCMI in the future. This will include material from the High School Teachers Program and publications documenting the interactive activities that are a primary focus of the PCMI. At the summer institute late afternoons are devoted to seminars of common interest to all participants. Many deal with current issues in education; others treat mathematical topics at a level which encourages broad participation. The PCMI has also spawned interactions between universities and high schools at a local level. We hope to share these activities with a wider audience in future volumes.

Rafe Mazzeo
Director, PCMI
March 2016

Introduction

This volume contains lectures presented at the Park City summer school on "Mathematics and Materials" in July 2014. The central theme is a description of material behavior that is rooted in statistical mechanics. While many presentations of mathematical problems in materials science begin with continuum mechanics, these lectures present an alternate view. A rich variety of material properties is shown to emerge from the interplay between geometry and statistical mechanics. The school included approximately eighty graduate students and forty researchers from many areas of mathematics and the sciences. This interdisciplinary spirit is reflected in a diverse set of perspectives on the order-disorder transition in many geometric models of materials, including nonlinear elasticity, sphere packings, granular materials, liquid crystals, and the emerging field of synthetic self-assembly. The lecturers for the school, and the topics of their lectures, were as follows:

(1) Michael Brenner, School of Engineering and Applied Sciences, Harvard University: Ideas about self-assembly.
(2) Henry Cohn, Microsoft Research: Packing, coding and ground states.
(3) Veit Elser, Department of Physics, Cornell University: Three lectures on statistical mechanics.
(4) Daan Frenkel, Department of Chemistry, University of Cambridge: Entropy, probability and packing.
(5) Richard D. James, Department of Aerospace Engineering and Mechanics, University of Minnesota: Phase transformations, hysteresis and energy conversion – the role of geometry in the discovery of materials.
(6) Robert V. Kohn, Courant Institute, New York University: Wrinkling of thin elastic sheets.
(7) Roman Kotecký, Mathematics Institute, University of Warwick: Statistical mechanics and nonlinear elasticity.
(8) Peter Palffy-Muhoray, Department of Chemical Physics, Kent State University: The effects of particle shape in orientationally ordered soft materials.

In addition, L. Mahadevan (Harvard University) and Felix Otto (MPI, Leipzig) were in residence for the program as Clay Senior Scholars, and gave well-received public lectures. All the lectures in this volume contain unique pedagogical introductions to a variety of topics of current interest. Several lectures touch on the interplay between discrete geometry (especially packing) and statistical mechanics. These problems have an immediate mathematical appeal and are of increasing importance in applications, but are not as widely known as they should be to mathematicians with

an interest in materials science. Both Elser and Frenkel present elegant introductions to statistical mechanics from the physicist's perspective, with an emphasis on the interplay between entropy and packings. This theme is repeated in Cohn's lectures, which reveal the role of unexpected mathematical tools in simply stated problems about symmetric ground states. Similarly, Brenner uses discrete geometry and statistical mechanics to model exciting new experiments on synthetic self-assembly. Palffy-Muhoray uses statistical mechanics to derive several models for liquid crystals, exploring again the interplay between shape and statistical mechanics. Kotecký's lecture contains a mathematical introduction to statistical mechanics, with a focus on the foundations of nonlinear elasticity. The volume also includes an account of recent work on correctors in stochastic homogenization by Otto and his co-workers. Regrettably, this volume does not contain the texts of excellent lectures by James on solid-solid phase transitions, and by Kohn on the elasticity of thin sheets. We express our thanks to the PCMI steering committee, especially Richard Hain, John Polking and Ronald Stern, for their support of the program; to Catherine Giesbrecht and Dena Vigil for invaluable help with organization; and to Rafe Mazzeo for his assistance in bringing this volume to publication. Finally, we express our thanks to the lecturers, students and researchers for their enthusiastic participation in the summer school.

Mark Bowick, David Kinderlehrer, Govind Menon, and Charles Radin

IAS/Park City Mathematics Series Volume 23, 2014

Three Lectures on Statistical Mechanics

Veit Elser

LECTURE 1

Mechanical foundations

Laws, hypotheses, and models

It is widely held that thermodynamics is built upon a foundation of laws that go beyond the standard laws of mechanics. This is not true. Even the microscopic level of thermodynamic description, called statistical mechanics, requires only the application of strict mechanical principles. The "laws" that appear to be new and indispensable to statistical mechanics are really hypotheses about the mathematical consequences of the laws of mechanics. One such consequence, the ergodic hypothesis, has been rigorously demonstrated only for some simple model systems. However, its validity is not in doubt for the broad range of systems where it applies, so much so that it is treated as an actual law, perhaps best known as the principle of equal a priori probability. We should really think of this law as a model for the statistical properties of mechanical systems. This model need not always apply, but when it does it is tremendously useful. Rather than define this hypothesis/model in the abstract, we illustrate it with a simple example.

Soft billiards

Consider two identical mass m particles moving in a two-dimensional world and interacting by a potential that only depends on the distance between their centers. Because the ergodic hypothesis only applies to bounded systems we place the particles in a box. To minimize the effects of the shape of the box, we let the world be a flat torus, in other words, a square with opposite edges identified. The mechanical description of this system is simplified by working not with the positions $\mathbf{r}_1$ and $\mathbf{r}_2$ of the particles, but instead, their centroid $\mathbf{R} = (\mathbf{r}_1 + \mathbf{r}_2)/2$ and relative position $\mathbf{r} = \mathbf{r}_1 - \mathbf{r}_2$. The equations of motion for $\mathbf{R}$ and $\mathbf{r}$ are independent, the equations for $\mathbf{R}$ being that of a particle of mass $2m$ subject to no forces. On the other hand, the equations for $\mathbf{r}$ are more interesting and describe a single particle of "reduced" mass $\mu = m/2$ subject to a potential $U(\mathbf{r})$ fixed in the torus. We will consider an especially simple potential that takes only two values: $U(\mathbf{r}) = U_0$ when $|\mathbf{r}| < b$, and $U(\mathbf{r}) = 0$ otherwise.

Department of Physics, Cornell University
E-mail address: [email protected]

© 2017 American Mathematical Society


Figure 1. Two renderings of a trajectory in soft-billiards: on the torus (left panel), and in a periodic crystal (right panel). The shaded circular regions are at higher potential energy U0 relative to zero potential energy elsewhere.

The character of the motion depends on the relative magnitudes of $U_0$ and the total energy $E$ of the equivalent single particle. Shown is the case $0 < U_0 < E$.

Figure 2. Left: Stroboscopic rendering of the trajectory in Figure 1. The particle has lower kinetic energy in the circular region and moves slower there. Deflections, caused by the momentum-kicks at the potential discontinuity, are analogous to Snell's law in optics. Middle: Same as the image on the left, but with tenfold increase in time span (also lower stroboscopic rate). Right: Tenfold increase in time span over middle image.

To see how the ergodic hypothesis can explain both of these observations, we need to take a step back and recall how our model is described in the Hamiltonian formalism of mechanics. Our model has two degrees of freedom corresponding to the x and y coordinates of the particle (these have the topology of angular variables for our periodic boundary conditions). Associated with each of these is a conjugate momentum, $p_x$ and $p_y$, and the combined space of coordinates and momenta is called phase space. The equations of motion follow from the Hamiltonian, which in our case is

$$H(x, y, p_x, p_y) = \frac{p_x^2}{2\mu} + \frac{p_y^2}{2\mu} + U(x, y), \qquad U(x, y) = \begin{cases} U_0, & x^2 + y^2 < b^2 \\ 0, & \text{otherwise.} \end{cases}$$

Given some property $\theta(x, y, p_x, p_y)$ that depends on the Hamiltonian variables we can form averages in the sense of dynamics as follows:

(0.1)  $$\overline{\theta} = \frac{1}{T}\int_0^T dt\, \theta(x(t), y(t), p_x(t), p_y(t)),$$

where the time evolution of the variables is determined by the Hamiltonian and a choice of initial conditions. The weak form of the ergodic hypothesis asserts that in the limit of large T any such dynamical average can alternatively be computed as a phase space average with respect to some distribution $\rho$,

$$\langle\theta\rangle = \int dx\, dy\, dp_x\, dp_y\, \rho(x, y, p_x, p_y)\, \theta(x, y, p_x, p_y),$$

thus sidestepping the intractability of chaotic time evolution. The ergodic hypothesis does not have much value to physicists unless the distribution $\rho$ is known rather precisely. Remarkably, there is a single very simple distribution that is believed to apply to almost all bounded systems whose dynamics is sufficiently chaotic. To formulate this distribution we need to acknowledge

that even in the presence of chaos there will be conserved quantities that the distribution must respect. Typically, as in our example once we eliminated the trivial centroid motion, only the energy is conserved. The strong form of the ergodic hypothesis, or how the hypothesis is usually interpreted by physicists, asserts that $\rho$ is supported on a connected level set of the conserved quantities and is otherwise uniform. In the case of energy – the Hamiltonian – as the only conserved quantity, we then have

(0.2)  $$\rho(x, y, p_x, p_y) = \rho_0\, \delta(H(x, y, p_x, p_y) - E),$$

where E is the value of the conserved energy, $\delta$ is the Dirac delta-function, and $\rho_0$ is a normalization constant. The statement

(0.3)  $$\overline{\theta} = \langle\theta\rangle,$$

with $\rho$ in the phase space average given by (0.2), is how we will interpret the ergodic hypothesis from now on. A more descriptive term for the same thing is the principle of equal a priori probability: over the course of time, the system visits all states consistent with the conserved quantities with equal frequency. We are now ready to apply the ergodic hypothesis to find the frequency with which our particle visits a particular point $(x_0, y_0)$ on the torus. For this we calculate the phase space average of

$$\theta = \delta(x - x_0)\,\delta(y - y_0),$$

and find,

$$\langle\theta\rangle = \rho_0 \int_{-\infty}^{\infty} dp_x \int_{-\infty}^{\infty} dp_y\, \delta(H(x_0, y_0, p_x, p_y) - E) = \rho_0 \int_0^{\infty} 2\pi p\, dp\, \delta\big(p^2/(2\mu) - K(x_0, y_0)\big),$$

where we have transformed to polar coordinates in the momentum integrals and defined the local kinetic energy

$$K(x_0, y_0) = E - U(x_0, y_0).$$

Performing the final radial integral we obtain

$$\langle\theta\rangle = 2\pi\rho_0\mu,$$

provided $K(x_0, y_0) > 0$. This result is remarkable mostly because it is independent of the point $(x_0, y_0)$, in agreement with the counter-intuitive findings of our numerical experiment. Before we jump to the conclusion that the uniform sampling of position is a general feature of chaotic systems, we should calculate the corresponding phase space average for a soft billiards system in a dimension other than $d = 2$. The result we obtain,

$$\langle\theta\rangle \propto K(x_0, y_0, \ldots)^{d/2 - 1},$$

shows that $d = 2$ is special for its sampling uniformity. On the other hand, the conclusion that in dimensions $d > 2$ the dynamics samples points at a higher rate where its kinetic energy (speed) is greater is even more at odds with our intuition!
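The $K^{d/2-1}$ dependence of the momentum-space shell can be probed directly. Below is a minimal Monte Carlo sketch (an illustration, not from the lectures), in units where $\mu = 1/2$ so that the kinetic energy is simply $p^2$: it counts sampled momenta whose kinetic energy falls in a thin window above K, and compares the count ratio at two values of K with the predicted $(K_2/K_1)^{d/2-1}$.

```python
# Monte Carlo estimate of the momentum-shell weight at kinetic energy K.
# Units: mu = 1/2, so p^2/(2 mu) = p^2. Illustrative sketch only.
import numpy as np

rng = np.random.default_rng(1)

def shell_weight(K, d, dK=0.01, samples=2_000_000, p_max=2.0):
    """Count uniform random momenta with kinetic energy in [K, K + dK]."""
    p = rng.uniform(-p_max, p_max, size=(samples, d))
    kin = (p * p).sum(axis=1)
    return np.count_nonzero((kin > K) & (kin < K + dK))

for d in (2, 3, 4):
    w1, w2 = shell_weight(0.5, d), shell_weight(2.0, d)
    # measured ratio vs the predicted (K2/K1)^(d/2 - 1); d = 2 is flat
    print(d, w2 / w1, 4.0 ** (d / 2 - 1))
```

For $d = 2$ the ratio is 1 regardless of K, which is the uniform-sampling result derived above.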

Quantum states

Physicists have a much more explicit concept of the "states" in a mechanical system than the phase-space formulation of the ergodic principle suggests in the abstract. That's because the true laws of mechanics are the laws of quantum mechanics, where the states of a closed system are discrete and can be counted. Statistical mechanics is usually understood to be applicable only when a system has very many degrees of freedom. But as we saw in the soft billiards system, this is not a requirement at all. However, we now show that this characterization is correct after all, if "degrees of freedom" are reinterpreted as quantum mechanical states. Consider a rectangular region of extent $[x_0, x_0 + \Delta x] \times [y_0, y_0 + \Delta y]$ in the soft billiards system, and assume the potential $U(x, y)$ is constant in this region. The solutions of the Schrödinger equation for a particle confined to this region are products of sinusoids,

$$\Psi(x, y) = \sin\big((x - x_0)p_x/\hbar\big)\,\sin\big((y - y_0)p_y/\hbar\big),$$

where $\hbar$ is Planck's constant divided by $2\pi$ and the momenta are required to have discrete values determined by the vanishing boundary condition (on the rectangular region of interest):

$$\Delta x\, p_x/\hbar = \pi n_x, \qquad n_x = 1, 2, 3, \ldots$$

$$\Delta y\, p_y/\hbar = \pi n_y, \qquad n_y = 1, 2, 3, \ldots$$

Classical mechanics is the $\hbar \to 0$ asymptotic limit of quantum mechanics. For any range of the momenta, say $dp_x$ and $dp_y$, that we might care to resolve in classical mechanics, there are very many quantum states (allowed values of $n_x$ and $n_y$):

$$\frac{\Delta x\, dp_x}{h/2} \cdot \frac{\Delta y\, dp_y}{h/2}.$$

Given a similarly finite extent to which the kinetic energy can be resolved, $K < p^2/(2\mu) < K + dK$,


Figure 3. A configuration of the freely-jointed polymer model with n = 6 rigid struts. The polymer-end on the left is fixed to the origin while the end on the right is moved with constant velocity v by an external mechanism. The fluctuating force F acting on the external mechanism is parallel to the end strut.

This includes the classical hard billiards problem: a particle confined inside a region of uniform potential. Choosing the potential inside equal to zero so the kinetic energy equals the total energy E, we write this generalization as

$$dN \sim c\, V E^{d/2 - 1}\, dE,$$

where c involves the mass of the particle, Planck's constant, the dimension, but no other characteristics of the region. The significance of this asymptotic result should not be overlooked, since not only is calculating the chaotic billiards trajectory of a classical particle intractable, so too is calculating the quantum spectrum of energy levels in an arbitrary region. In the next section we will need the integrated form, an asymptotic level counting formula:

(0.6)  $$N(E) \propto V E^{d/2}.$$

The constant of proportionality is omitted, as it only depends on fixed constants. This is just the leading term of an asymptotic series, but the only one that matters in the classical limit. The correction terms depend on characteristics of the billiards region other than its volume, and are the subject of Mark Kac's classic article [3] on being able to "hear the shape of a drum".
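The scaling (0.6) can be checked by brute-force counting. The sketch below (illustrative only) enumerates the levels of a particle in a d-dimensional unit box, in units where $\hbar = 2m = 1$ so that $E_{\mathbf{n}} = \pi^2(n_1^2 + \cdots + n_d^2)$, and shows $N(E)/E^{d/2}$ settling toward a constant.

```python
# Brute-force check of Weyl's law N(E) ~ V E^{d/2} for a unit box billiard.
import itertools
import math

def count_states(E_max, d, n_cut=40):
    """Count box energy levels E_n = pi^2 (n_1^2 + ... + n_d^2) below E_max."""
    count = 0
    for n in itertools.product(range(1, n_cut + 1), repeat=d):
        if sum(k * k for k in n) * math.pi ** 2 <= E_max:
            count += 1
    return count

d = 3
for E in (2000.0, 4000.0, 8000.0):
    N = count_states(E, d)
    print(E, N, N / E ** (d / 2))   # last column levels off at a constant
```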

Kinetic elasticity

There is probably no better example of how quantum states assert themselves in classical mechanics than the kinetic polymer model to which we now turn. Figure 3 shows one configuration of the polymer: equal length rigid struts, free to pivot in the plane about joints which carry all the mass. This is also a "microscopic" model in the sense that there are no phenomenological forces at a smaller scale, such as the friction we would expect in a macroscopic joint. For the sake of tractability, we allow the masses and struts to behave as phantom entities that may freely pass through each other. Apart from the latter, this is a reasonably realistic model of a polymer in vacuum, as bond-angle forces — completely absent here — are often significantly weaker than the forces that fix the bond lengths. We will add the effects of a solvent environment to the model in a future lecture.


Figure 4. Time series of the instantaneous power, $p(t) = \mathbf{F}(t) \cdot \mathbf{v}$, delivered by the polymer to the external mechanism.

The quantum correspondence is seen in the behavior of the model when subject to an adiabatic process. We will examine the process where an external mechanism fixes one end of the polymer and moves the other with a constant velocity. A polymer of n struts therefore has n − 2 degrees of freedom. We can think of the moving end as an infinite (macroscopic) mass whose velocity $\mathbf{v}$ is unaffected by the motion of the polymer. The last strut of the polymer exerts a time-dependent force $\mathbf{F}(t)$ on the macroscopic mass (parallel to the strut). The polymer thus delivers instantaneous power $p(t) = \mathbf{F}(t) \cdot \mathbf{v}$ to the macroscopic mass, which is the negative of the power delivered to the polymer by the mass. Figure 4 shows a time series of p(t) for a 6-strut polymer during a period of time where the moving end has moved only a small fraction of one strut-length. The erratic nature of this function reflects, of course, the highly chaotic dynamics of the polymer. We will perform a number of numerical experiments with the 6-strut polymer. In reporting the results, we use the strut length for our unit of length. The adiabatic process is begun with zero separation of the ends and randomly generated initial conditions consistent with one end fixed and the other moving with velocity $\mathbf{v}$. Our unit of speed is the root-mean-square speed of the masses at the start of the process. Likewise, our energy unit is the total energy — entirely kinetic — of the polymer at the start. The instantaneous energy of the polymer changes as a result of the fluctuating power:

(0.7)  $$\Delta E = -\int_0^t \mathbf{F}(t') \cdot \mathbf{v}\, dt' = -\int_{\mathbf{r}(0)}^{\mathbf{r}(t)} \mathbf{F} \cdot d\mathbf{r}.$$

What makes a process adiabatic is slowness in the changes of the external parameters. In the adiabatic limit the position of the moving end $\mathbf{r}(t)$ has hardly changed over a span of time during which many fluctuations in $\mathbf{F}(t)$ have occurred.


Figure 5. Four energy vs. extension curves, E(x), for the n = 6 polymer model when the speed of the moving end is one-tenth the root-mean-square speed of the masses at x = 0. The thick curve is the function $\exp(x^2/12)$.

This motivates us to rewrite (0.7) as

$$E(x) - E(0) = -\int_0^x \overline{F}(x')\, dx',$$

where $x = |\mathbf{v}|t = vt$ is the separation of the ends, and a time average defines a position-dependent force:

$$\overline{F}(x)\, dx = \overline{\mathbf{F}(t) \cdot d\mathbf{r}}.$$

Our numerical experiments are strictly mechanical: we do not perform any actual averages. Instead, we record the instantaneous energy E of the polymer, at time t and end-to-end distance x = vt, as the function E(x). This function will have random features that vary from one set of initial conditions to another. Figure 5 shows four E(x) curves obtained when the moving end has speed v = 0.1. While there is much randomness, there is also a clear trend: pulling on the polymer tends to increase its energy. In Figure 6 we see the results of four more experiments, but with v = 0.01. Two things have changed at the slower speed: fluctuations have been suppressed by about a factor of 10, the same as the reduction in speed, and there is better evidence of a well defined average energy. Superimposed on these curves is the simple function $\exp(x^2/12)$, whose significance will be made clear below. Since the equations of mechanics do not distinguish between the future and past directions of time, we know exactly what will happen in an experiment where the ends of the polymer are brought together from a stretched state: the energy will decrease along the same curve we found when it was stretched. If the mass at the end of the last strut was made finite, so its velocity could change, it would respond (as a new degree of freedom) to the rest of the polymer much as a mass attached to an elastic spring. From the quadratic behavior of E(x) at small x we see that this spring, at small extension, has a linear force law just as a conventional spring.


Figure 6. Same as Figure 5, but with the rate of extension slower by a factor of 10.

What is curious about this particular spring, however, is the absence of any potential terms in its microscopic Hamiltonian for the storage of energy. Although classical Hamiltonian mechanics does address the effects of adiabatic change when it is imposed on periodic motion, for our chaotic polymer a more powerful tool is needed. That tool is the very simple behavior of a quantum system in an adiabatic process. The rule for quantum systems is that a state will evolve so as to remain in an instantaneous energy eigenstate even when a parameter x in the (quantum) Hamiltonian is varied, provided the change is slow. The energy of the quantum system thus varies exactly as the energy E(x) of a particular energy eigenstate. Intractability or "non-integrability" of the classical equations of motion is, ironically, good in this respect because it means the corresponding quantum energy levels do not cross when a parameter is changed. The quantum analog of our polymer system, when prepared in the Nth energy state for x = 0, would still be in the Nth level when x is slowly changed to some other value. The energy levels of the quantum analog of our polymer model are within our reach once we realize this model is a multi-dimensional billiards for which Weyl's asymptotic result (0.6) applies, where d = n − 2 is the number of degrees of freedom. The only remaining hurdle is determining V(x), the dependence of the configuration space volume on the polymer's end-to-end separation x. Putting that aside for now, from (0.6) we obtain

$$E(N, x) \propto (N/V(x))^{2/(n-2)},$$

and since N is constant in an adiabatic process,

$$E(x) = E(0)\left(\frac{V(0)}{V(x)}\right)^{2/(n-2)}.$$

The calculation of V(x) is a much studied problem in statistics. Parametrizing the polymer by the angles $\theta_1, \ldots, \theta_n$ of the struts relative to the axis of extension, an explicit formula takes the form

$$V(x) = \int_0^{2\pi} d\theta_1 \cdots \int_0^{2\pi} d\theta_n\, \delta(\cos\theta_1 + \cdots + \cos\theta_n - x)\,\delta(\sin\theta_1 + \cdots + \sin\theta_n).$$

Up to normalization, V(x) is just the probability that a random walk of n unit steps arrives at a particular point whose distance from the origin is x. For large n, and x growing at most as $\sqrt{n}$, we can use the central limit theorem to get the estimate

(0.8)  $$V(x) \propto \exp(-x^2/n),$$

and thus

$$E(x) = E(0)\,\exp\left(\frac{2x^2}{n(n-2)}\right).$$

Central limit convergence is surprisingly rapid, as the comparison of this formula with the n = 6 numerical experiment in Figure 6 shows.
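The random-walk picture behind (0.8) is also easy to test numerically. The following Monte Carlo sketch (an illustration, not one of the lecture's experiments) samples n = 6 isotropic unit steps, converts the histogram of end-to-end distances into a density per unit area, and compares with $\exp(-x^2/n)$:

```python
# Monte Carlo check of V(x) ∝ exp(-x^2/n): end-to-end density (per unit
# area) of a random walk of n isotropic unit steps in the plane.
import numpy as np

rng = np.random.default_rng(0)
n, samples = 6, 1_000_000
theta = rng.uniform(0.0, 2.0 * np.pi, size=(samples, n))
r = np.hypot(np.cos(theta).sum(axis=1), np.sin(theta).sum(axis=1))

hist, edges = np.histogram(r, bins=30, range=(0.0, 3.0), density=True)
x = 0.5 * (edges[:-1] + edges[1:])
density = hist / (2.0 * np.pi * x)   # probability per unit area ~ V(x)
for xi, di in zip(x, density / density[0]):
    print(f"{xi:5.2f}  {di:8.4f}  {np.exp(-xi * xi / n):8.4f}")
```

Even at n = 6 the two columns track each other closely, illustrating the rapid central limit convergence noted above.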

Statistical equilibrium

In mechanics we use the term equilibrium to describe motion at its very simplest, where all the variables are time independent. Statistical mechanics has its own notion of equilibrium, and time is again at the center of the definition. Time is the relevant quantity, both for the ergodic hypothesis and for a process to be adiabatic. A phase space average, by definition, is time independent. But it estimates the time averages that arise in our study of physical phenomena only when those averaging times are sufficiently large. In the soft billiards model "sufficient" is translated to "many deflections of the particle by the potential have occurred". Adiabatic processes transform a system reversibly from one set of parameters to another set. But again, this is true only when the process is carried out over a long enough time. The function E(x) for the energy of the polymer model, and its derivative giving the elastic force, is only well defined when the rate of extension is slow. The long time-average in this case is that of the fluctuating force generated by the polymer and acting on the external mechanism. Only when diverse polymer configurations are sampled in the time taken for x to move through a small range does the force-average correspond to a true position-dependent force. Equilibrium, in statistical mechanics, is not so much a state or behavior that a system might settle into, but a statement about observations or processes being carried out at the appropriate temporal scale. The conditions for equilibrium are also what make the subject difficult. In extreme but by no means exotic cases, the ergodic hypothesis is known to fail, and it might not be possible to exercise adiabatic control. Consider the soft billiards model but with an attracting circular potential and negative energy: $U_0 < E < 0$.

(The KAM theorem applies to Hamiltonians with smooth potential functions, and therefore not to billiards.) On the other hand, an arbitrarily small protrusion on the wall of this integrable billiard can be sufficiently randomizing to restore the ergodic hypothesis. The time scale for "statistical equilibrium", by construction, can thereby be made arbitrarily long. Long time scales were also in evidence in our polymer model experiments. The long undulations that differentiate the four curves in Figure 6 suggest that the polymer has very slow modes of oscillation whose period is still beyond the time scale of the extension. Slow modes are a general feature of systems with many degrees of freedom; they represent the main mechanism whereby a system can be said to be "out of equilibrium". Slow modes and results such as the KAM theorem are what dim the hopes of proving a general form of the strong ergodic hypothesis. Statistical mechanics, in response, has adopted the "law" of equal a priori probabilities as a model, with the understanding that this model may not apply to all systems.

Problems for study

Snell's law for particles

A mass m particle in two dimensions moves in a potential that has only two values, as in the soft billiards system. Its speed therefore takes two values: a high speed $v_1$ and a low speed $v_2$. Suppose the particle starts in the high speed region and is incident on a region of high potential with straight boundary, where its speed will be slow. Using momentum conservation, only in the force-free direction parallel to the boundary, show that the angles of incidence of the particle satisfy $v_1 \sin\theta_1 = v_2 \sin\theta_2$. Here $\theta_1$ and $\theta_2$ are the angles subtended by the trajectory and the perpendicular to the boundary; $\theta_1 = \theta_2 = 0$ corresponds to normal incidence and no deflection. When $\theta_1$ exceeds a particular value this Snell's law for particles has no solution and the particle is reflected back into the high speed region. Find this maximum angle and determine the law of reflection, again using momentum conservation.
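A small numerical companion to this problem (a sketch, not part of the original problem set; it assumes unit mass and a potential step of height $U_0$): the function below computes the refracted angle from energy and parallel-momentum conservation, and returns None in the total-reflection regime.

```python
# Particle Snell's law at a straight potential step: v1 sin(theta1) = v2 sin(theta2).
import math

def refract(v1, theta1, U0, m=1.0):
    """Angle inside the high-potential region, or None for total reflection."""
    px = m * v1 * math.sin(theta1)      # momentum parallel to the boundary (conserved)
    K2 = 0.5 * m * v1 * v1 - U0         # kinetic energy inside the step
    if K2 <= (0.5 / m) * px * px:       # no perpendicular kinetic energy left
        return None
    v2 = math.sqrt(2.0 * K2 / m)
    return math.asin(px / (m * v2))

v1, U0 = 1.0, 0.3
v2 = math.sqrt(v1 * v1 - 2.0 * U0)
for theta1 in (0.1, 0.4, 0.68, 1.0):
    theta2 = refract(v1, theta1, U0)
    if theta2 is None:
        print(theta1, "total reflection")
    else:
        print(theta1, v1 * math.sin(theta1), v2 * math.sin(theta2))  # equal columns
```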

Tracer particle analysis of soft billiards

The ergodic hypothesis is difficult to prove, even for simple systems. This exercise should at least make the hypothesis plausible for the soft billiards system. Instead of following a single very complex trajectory, we will analyze simple families of trajectories over a limited time. The family we have in mind is best described as a set of tracer particles initially arranged with uniform density $\rho$ along the y-axis. All particles are in the low potential region and have speed $v_1$ and velocity in the direction of the positive x-axis. First show that the total time spent by all the tracer particles crossing a circular region of radius b in the low potential region is

$$T_1 = (\rho/v_1)\,\pi b^2.$$

Next suppose the parallel streaming tracer particles encounter a circular region of high potential, where their speed slows to $v_2$. The time spent by all the tracer particles crossing this region can be written as

$$T_2 = \int_{-b}^{+b} (\rho\, dy)\, T(y),$$

where T(y) is the time spent in the region by a tracer particle with initial offset y from the center of the circle. Use Snell's law to show that

$$T(y) = \begin{cases} (2b/v_2)\sqrt{1 - (v_1/v_2)^2 (y/b)^2}, & |y| < (v_2/v_1)\, b \\ 0, & \text{otherwise.} \end{cases}$$
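As a quick sanity check of this expression (a sketch with assumed values $\rho = 1$ and $b = 1$), numerical quadrature of the $T_2$ integral already reproduces $T_1 = \pi b^2/v_1$:

```python
# Midpoint-rule quadrature of T2 = ∫ T(y) dy, compared with T1 = pi b^2 / v1.
import math

def T2(v1, v2, b=1.0, steps=200_000):
    y_max = (v2 / v1) * b                 # offsets beyond this are reflected
    total, dy = 0.0, 2.0 * y_max / steps
    for i in range(steps):
        y = -y_max + (i + 0.5) * dy
        total += (2.0 * b / v2) * math.sqrt(1.0 - (v1 / v2) ** 2 * (y / b) ** 2) * dy
    return total

v1, v2 = 1.0, 0.6
print(T2(v1, v2), math.pi / v1)   # the two values agree
```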

Now evaluate the integral for $T_2$ and observe that it exactly equals $T_1$.

Weyl's law

Derive (0.5) by repeating the calculation of the result (0.4) for a billiards in d dimensions.

Central limit analysis of polymer

Apply the central limit theorem to a random walk of n unit steps, isotropically distributed in two dimensions, to derive the large n estimate (0.8).

Slow modes of a polymer

The slowest mechanical modes of a system bound the time scale of adiabatic processes. In a gas the slow modes are sound, the slowest being coherent oscillatory motion on large scales, where the density in one half grows at the expense of the other half. Speculate what might be the slowest modes in the kinetic polymer system.

LECTURE 2

Temperature and entropy

The thermodynamic limit

In the first lecture we saw that statistical mechanics provides a quantitative description of mechanical phenomena where time has been completely eliminated. This is a great benefit in systems whose dynamics is chaotic, and conversely, the statistical description very much relies on chaos for the foundational model to be valid. This part of the subject applies just as much to systems of few degrees of freedom as it does to systems with many. Another level of modeling applies when systems become very large, the regime of phenomena in the thermodynamic limit. As the name suggests, it is only in this limit that the concept of temperature makes sense. It is also in this thermodynamic limit, of systems with a well defined temperature, that entropy may be defined as a commodity that is interchangeable with energy.

A model for the number of states

An interesting mathematical quantity we can define for an arbitrary mechanical system (for which the ergodic hypothesis holds) is the number-of-states function:

(2.1)  $$\Omega(E) = \int dq_1 \cdots dp_1 \cdots\, \delta(H(q_1, \ldots, p_1, \ldots) - E).$$

This is the phase space integral of the uniform distribution promised by the ergodic hypothesis. With proper normalization its value is unity; without this normalization the integral $\Omega(E)$ is interpreted as the number of states of the system with energy in a fixed range of arbitrarily narrow width about E. To get our bearings we will calculate the number-of-states function for a system of n identical and weakly interacting particles in a three dimensional box of volume V: the ideal gas. In the limit of weak interactions the energy is just the kinetic energy of the particles. However, the interactions cannot be switched off completely because then each particle's energy is fixed by the initial conditions, contrary to the ergodic hypothesis. Ignoring the ergodicity restoring interactions in H, the position integrals in (2.1) give a factor V for each particle, and the 3n momentum integrals give the volume of a spherical shell in 3n dimensions of radius proportional to $\sqrt{E}$ and thickness proportional to $1/\sqrt{E}$. Combining the numerical factors into a single constant C,

$$\Omega_{\mathrm{gas}}(E) = C V^n E^{(3n-2)/2} \sim C V^n E^{3n/2},$$

where the last step is appropriate when $n \gg 1$. To motivate our general model for the number-of-states function, we note that the logarithm of $\Omega_{\mathrm{gas}}(E)$ is, up to logarithmic factors, proportional to the number

of particles in the limit of large n:

(2.2)  $$\log \Omega_{\mathrm{gas}}(E) \sim n\left(\log V + \frac{3}{2}\log E\right).$$

This linear behavior with n is the same as the behavior of V and E in the thermodynamic limit, where the volume and energy per particle are held fixed. The intuition behind the linear behavior of $\log\Omega$, for general macroscopic systems, is that these systems can usually be partitioned into identical and nearly independent subsystems of some fixed size. Since $\Omega$ for independent subsystems is multiplicative, it will behave as the power of the number of subsystems. System properties that grow in proportion to the number of degrees of freedom, such as V, E and $\log\Omega$, are called extensive. Taking the energy derivative of (2.2) we obtain a quantity that behaves, in the thermodynamic limit, as the ratio of two extensive quantities:

(2.3)  $$\frac{d}{dE}\log \Omega_{\mathrm{gas}}(E) = \frac{3}{2}\,\frac{n}{E}.$$

Quantities that are fixed in the thermodynamic limit, such as density and now the quantity above, are called intensive. For a general macroscopic system we define the following intensive quantity:

(2.4)  $$\beta(E) = \frac{d}{dE}\log \Omega(E).$$

In the case of the weakly interacting gas of particles in three dimensions, $\beta(E)$ is 3/2 times the inverse mean kinetic energy per particle. An equivalent and more illuminating restatement of (2.4), that the energy-derivative of the logarithm of the number-of-states function is intensive, is the following:

(2.5)  $$\Omega(E + E') = \Omega(E)\exp\big(\beta(E)\, E'\big).$$

We are justified in keeping just the first two terms of the Taylor series for $\log\Omega(E)$ provided the energy fluctuations $E'$ we consider are bounded as we take the thermodynamic limit (since then $E'/E \to 0$). On the other hand, the value of the bound on $E'$ is arbitrary, and so the change in the number-of-states function can be substantial when it exceeds the energy scale defined by $\beta^{-1}$.

Temperature

Up to now we have discussed the number-of-states function in mathematical terms, without a physical context. We will arrive at an interpretation of the β-function by considering two weakly interacting macroscopic systems. By "weakly interacting" we mean that the Hamiltonian for the joint system is well approximated by the sum $H \approx H_1 + H_2$, where the two parts have no variables in common and the neglected terms allow for the exchange of energy between the parts. Two subsystems having this description are said to be in thermal contact. By the ergodic hypothesis, the joint phase space distribution of the system, for total energy $E = E_1 + E_2$, is

$$\rho = \rho_0\, \delta(H_1 + H_2 - E_1 - E_2) = \rho_0 \int dE'\, \delta(H_1 - E_1 - E')\,\delta(H_2 - E_2 + E'),$$

where $\rho_0$ is the normalization constant. When the integration variable $E'$ is not macroscopic in scale, the joint distribution $\rho$ can be thought of as the product of two distributions, one for system 1 with macroscopic energy $E_1$ and energy fluctuation $+E'$, the other for system 2 with macroscopic energy $E_2$ and energy fluctuation $-E'$. Integrating $\rho$ over phase space and using (2.5), we obtain

$$1 = \rho_0 \int dE'\, \Omega_1(E_1)\exp\big(\beta_1(E_1)\, E'\big)\, \Omega_2(E_2)\exp\big(-\beta_2(E_2)\, E'\big) \propto \int dE' \exp\big((\beta_1(E_1) - \beta_2(E_2))\, E'\big).$$

When $\beta_1 - \beta_2 > 0$, the phase space distribution favors arbitrarily large positive fluctuations $E'$, that is, the transfer of energy from system 2 to system 1. The values $E_1$ and $E_2$ for the two system energies, even in an average sense, are therefore suspect. For such large $E'$ our model for the number-of-states function breaks down, and we should say that the macroscopic energy of system 1 has increased and that of system 2 has decreased. The expansions of the number-of-states functions should then be about energies $E_1' > E_1$ and $E_2' < E_2$, with $\beta_1(E_1') < \beta_1(E_1)$ and $\beta_2(E_2') > \beta_2(E_2)$, and therefore

$$\beta_1(E_1) - \beta_2(E_2) > \beta_1(E_1') - \beta_2(E_2').$$

The new difference of β functions, if still positive, is smaller and will favor positive energy fluctuations to a lesser extent than the original choice of average system energies. Continuing in this way, we see that there exists a special partitioning of the energy as $E = E_1^* + E_2^*$ such that

(2.6)  $$\beta_1(E_1^*) = \beta_2(E_2^*),$$

where neither sign of energy fluctuation is favored. Only this partitioning of the total energy establishes average energies for the two subsystems. We can ask what would happen if the two systems considered above were initially isolated and prepared with energies such that condition (2.6) was not satisfied and then brought into thermal contact with each other. The analysis above shows that the joint number-of-states function in that case favors a redistribution of energy such that condition (2.6) is restored. What actually happens, in physical terms, is that a macroscopic quantity of energy is transferred from the system with small β to the system with large β. Macroscopic energy transfer without changes in macroscopic parameters, such as volume, is called heat. The transfer of heat ceases once the β values of the contacting systems are equal. Temperature is operationally defined as the property that two systems must have in common for there to be no transfer of energy (heat). Statistical mechanics defines, quantitatively, the absolute temperature T as

$$k_B T = 1/\beta(E),$$

where Boltzmann's constant $k_B$ serves to convert the conventional Kelvin (K) units of temperature to units of energy (J):

$$k_B = 1.3806488 \times 10^{-23}\ \mathrm{J/K}.$$

The transfer of heat in the direction of small β to large β is consistent with everyday experience, of heat flowing from hot to cold (large T to small T). Referring to (2.5) we can also give the Boltzmann constant a microscopic interpretation. Suppose we have a macroscopic system at temperature T = 1000 K; by transferring the microscopic energy $E' = k_B T \approx 10^{-20}$ J to the system in the form of heat, its number-of-states is increased by the factor $e = 2.718\ldots$.

The Boltzmann distribution

Suppose we have two systems in thermal contact: a truly macroscopic system and a much smaller system. The macroscopic system is so much larger that any energy it exchanges with the smaller system is effectively microscopic; its role is simply to establish a temperature T. If we are primarily interested in the small system, and the ergodic hypothesis gives us a uniform distribution for both systems in thermal contact, what is the marginal phase space distribution of the small system? Our model for the number-of-states function of a macroscopic system (2.5) provides the answer to this question. Let $H'(q_1', \ldots, p_1', \ldots)$ be the Hamiltonian of the macroscopic system, sometimes referred to as the thermal reservoir. This Hamiltonian is weakly coupled to the small system Hamiltonian $H(q_1, \ldots, p_1, \ldots)$. We write the phase space distribution of the weakly coupled systems, with total (macroscopic) energy E, as an integral over energy fluctuations $E'$, just as we did above in the discussion of temperature:

$$\rho(q_1, \ldots, p_1, \ldots;\, q_1', \ldots, p_1', \ldots) = \rho_0 \int dE'\, \delta(H' - E - E')\,\delta(H + E').$$

Integrating over just the phase space variables of the macroscopic system and using (2.5), we obtain the marginal distribution:

$$\rho(q_1, \ldots, p_1, \ldots) = \rho_0 \int dE'\, \Omega(E)\exp\big(\beta(E)\, E'\big)\, \delta(H + E') \propto \exp\big(-\beta H(q_1, \ldots, p_1, \ldots)\big).$$

The only property of the macroscopic system that has survived is its temperature. This distribution, named after Boltzmann, is far from uniform. Its accuracy, for energy fluctuations potentially spanning many orders of magnitude, is limited only by the degree to which the thermal reservoir has more degrees of freedom. The "Boltzmann factor", usually written $\exp(-\Delta E/k_B T)$, represents the reduction in the number-of-states function of the reservoir when it gives up energy $\Delta E$ to the smaller system it is in thermal contact with.

Thermal averages

For the rest of this lecture we always consider systems in contact with a thermal reservoir. The system Hamiltonian will be called H, and its quantum energy levels $E_N(x)$ may depend on an external parameter x. We choose to work with quantum states instead of classical Hamiltonian variables because it is through the former that we are able to define the force associated with an adiabatic change in x. By writing averages as explicit sums over energy levels we also emphasize the fact that our system can be truly microscopic. We write the Boltzmann probability of energy state N as

$$p_N = \exp(-\beta E_N(x))/Z,$$

where the normalization factor

$$Z(\beta, x) = \sum_N \exp(-\beta E_N(x))$$

is called the partition function. The average of the Hamiltonian, in the thermal or Boltzmann distribution, has a neat expression in terms of Z:

(2.7)  $$\langle H\rangle = \sum_N p_N E_N = \frac{1}{Z}\sum_N \exp(-\beta E_N)\, E_N = -\frac{\partial}{\partial\beta}\log Z.$$

Nearly the same kind of expression gives us the thermal average of the force:

(2.8)  $$\langle F\rangle = \sum_N p_N \left(-\frac{dE_N}{dx}\right) = \frac{1}{Z}\sum_N \exp(-\beta E_N)\left(-\frac{dE_N}{dx}\right) = \beta^{-1}\frac{\partial}{\partial x}\log Z.$$

Entropy

We can relate the thermal averages for energy and force with the help of another thermodynamic quantity, the entropy. Up to the choice of units, the entropy in statistical mechanics is defined exactly as in information theory:

(2.9)  $$S = k_B \sum_N p_N \log(1/p_N).$$

Multiplying S by the absolute temperature T of the thermal reservoir we obtain something having units of energy. To arrive at an interpretation of this energy, we substitute the Boltzmann probabilities for $p_N$:

$$S = k_B \sum_N \frac{\exp(-\beta E_N)}{Z}\,(\log Z + \beta E_N)$$

$$= k_B \log Z + \langle H\rangle/T.$$

After multiplying by T, taking the derivative with respect to the external parameter x and using (2.8), and rearranging, we obtain:

$$\frac{\partial\langle H\rangle}{\partial x} = T\,\frac{\partial S}{\partial x} - \langle F\rangle.$$

The final step is to integrate this between two values of the external parameter:

$$\langle H\rangle(x_2) - \langle H\rangle(x_1) = T\big(S(x_2) - S(x_1)\big) - \int_{x_1}^{x_2} \langle F\rangle(x)\, dx.$$

In standard thermodynamic notation this takes the form

(2.10)  $$\Delta U = T\Delta S + W,$$

where U represents the "internal energy" of the system and W is the work performed on the system (by the external mechanism that caused the change in x). Because the temperature is held fixed by contact with the thermal reservoir, the process described by (2.10) is called isothermal. Whereas the ΔU and W terms of the thermodynamic relation (2.10) are clearly energies, the relationship between energy and the entropy term is more mysterious. In thermodynamics, the product $T\Delta S$ is often written as Q, the transfer of energy in the form of heat. Statistical mechanics provides a mechanistic basis for heat, and from the definition of entropy (2.9) we see that the outcome of the transfer of heat energy from the reservoir must be a change in the probabilities $p_N$. Since the Boltzmann probabilities are the marginal distribution of the joint system, to understand these changes we need to consider changes in the reservoir as well. To help us track the trail of energy in the isothermal process, we note that the premise of "weak coupling", between system and reservoir, implies not only that the variables in the two parts are independent, but that the phase-space probability distribution of the joint system is (again approximately) the product of two independent distributions. The entropy of the joint system is therefore the sum of the entropies of the system and reservoir:

$$S + S_{\mathrm{res}} = S_0.$$

The joint entropy $S_0$ is constant because the change in x imposed from the outside is adiabatic. From this we conclude $\Delta S = -\Delta S_{\mathrm{res}}$. Now the reservoir has a uniform distribution over all its states by the ergodic hypothesis, and that implies its entropy is related to the reservoir number-of-states function by

$$S_{\mathrm{res}} = k_B \log \Omega_{\mathrm{res}}(E + E'),$$

where, as before, E is the macroscopic energy that establishes the temperature T and $E'$ is the much smaller energy transferred from the system to the reservoir. Using our model (2.5) for the reservoir number-of-states function, we obtain

$$\Delta S_{\mathrm{res}} = E'/T,$$

and finally

$$Q = T\Delta S = -T\Delta S_{\mathrm{res}} = -E'.$$

We now have a complete accounting of all the energies in the thermodynamic relation (2.10): the change in the system internal energy U is caused both by the input of energy in the form of work W by an external mechanism, and by the flow of energy (heat) $E'$ into the reservoir.
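The bookkeeping identities used above, (2.7) and $S = k_B\log Z + \langle H\rangle/T$, can be verified numerically in a few lines. Here is a minimal sketch with a made-up spectrum and units where $k_B = 1$ (purely illustrative):

```python
# Consistency check of <H> = -d(log Z)/d(beta) and S = log Z + beta <H>.
import numpy as np

E = np.array([0.0, 0.7, 1.1, 2.3, 3.8])   # hypothetical energy levels
beta = 1.5                                 # inverse temperature, k_B = 1

Z = np.exp(-beta * E).sum()
p = np.exp(-beta * E) / Z                  # Boltzmann probabilities

H_avg = (p * E).sum()                      # <H> by direct summation

# <H> as -d(log Z)/d(beta), via a centered finite difference
h = 1e-6
H_deriv = -(np.log(np.exp(-(beta + h) * E).sum())
            - np.log(np.exp(-(beta - h) * E).sum())) / (2 * h)

S_info = (p * np.log(1.0 / p)).sum()       # S from (2.9), k_B = 1
S_thermo = np.log(Z) + beta * H_avg        # S = log Z + <H>/T

print(H_avg, H_deriv)    # these agree
print(S_info, S_thermo)  # and so do these
```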

Thermal elasticity

It might be a good idea to review the polymer model of Lecture 1 because it will serve as our main example of the concepts just introduced. The energy levels of this model have the large N asymptotic form

$$E_N(x) \propto (N/V(x))^{2/d},$$

where d = n − 2 is the number of degrees of freedom for a polymer of n struts and the configuration space volume for polymer extension x is approximately

$$V(x) \approx V(0)\exp\left(-\frac{(x/l)^2}{n}\right),$$

in which we restored the strut length l. We will use the proportionality symbol to hide constants that do not depend on parameters such as x and β.

A polymer in solution is perhaps not as convincing an example of weak coupling to a thermal reservoir as a bottle of gas making thermal contact with its environment only through the walls of the bottle. The merits of our model on a phenomenological level would depend on whether the presence of the solvent introduces configuration-dependent forces, or if this only modifies the parameters, such as the masses, already in the model. Weak coupling would certainly be valid for a polymer in vacuum and interacting only with ambient thermal blackbody radiation. Whether the environment is a solvent or blackbody photons, we will assume that weak coupling applies and the only property of this environment that matters is its temperature. Our first task is to calculate the partition function Z:

(2.11)  $$Z = \int_0^\infty dN\, \exp(-\beta E_N).$$

Already we have made two approximations. First, in replacing the sum over N by an integral, we have declared that discreteness of the energy levels is insignificant by our choice of β being not too large. More specifically, we limit ourselves to temperatures such that $\Delta E/k_B T \ll 1$, where ΔE is a typical level spacing. Second, the upper limit of our integral clearly extends beyond the range of validity of the model. By keeping β above some minimum value we can ensure the integral cuts off beyond the point where this is a problem. Our partition function thus will be valid in a range of β bounded both above and below. Using the relation

$$N \propto V(x)\, E_N^{d/2},$$

we can change variables in the integral (2.11):

$$Z(\beta, x) \propto V(x) \int_0^\infty dE\, E^{d/2-1} \exp(-\beta E) \propto V(x)\,\beta^{-d/2}.$$

The average energy and force then follow from (2.7) and (2.8):

$$\langle H\rangle = (d/2)\, k_B T,$$

$$\langle F\rangle = k_B T\, \frac{1}{V}\frac{dV}{dx} \approx -\frac{2 k_B T}{n l^2}\, x.$$

Our result for the average energy depends only on the exponent in Weyl's law for our billiards Hamiltonian. The case of billiards in flat space is also covered by the equipartition theorem, which applies to quadratic Hamiltonians and asserts that the average energy is simply $(1/2) k_B T$ per positive eigenvalue of the quadratic form. A flat space billiard in dimension d has a quadratic kinetic energy with exactly d positive eigenvalues. The fact that the configuration space of our polymer is curved does not change the average energy from the flat space value. The average force generated by the polymer and acting on its moveable end has exactly the form of a Hookean spring, with stiffness proportional to the absolute temperature. A hot polymer makes a stiffer spring because there is a steeper entropic penalty for being extended when the temperature is high. We can see this also by evaluating the entropy difference,

$$T\Delta S = T\big(S(x) - S(0)\big) = -\frac{k_B T}{n l^2}\, x^2,$$

and noting that $\Delta U = 0$ (since $\langle H\rangle$ is independent of x) in (2.10):

$$0 = -\frac{k_B T}{n l^2}\, x^2 + W.$$

Thus all the work performed by the external mechanism on extending the polymer is cancelled by the loss in entropy.
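As a cross-check of the force law (a sketch assuming $k_B = 1$ and strut length $l = 1$), one can differentiate $\log Z$ numerically and compare with the Hookean form $-2k_B T x/(n l^2)$:

```python
# Thermal force from <F> = T d(log Z)/dx, with Z ∝ V(x) beta^{-d/2}
# and V(x) = exp(-x^2/n). Units: k_B = 1, l = 1.
import math

n, d = 6, 4            # struts and degrees of freedom (d = n - 2)
T = 1.0
beta = 1.0 / T

def log_Z(x):
    # log of Z(beta, x) up to an additive constant
    return -x * x / n - 0.5 * d * math.log(beta)

h = 1e-6
for x in (0.5, 1.0, 2.0):
    F_numeric = T * (log_Z(x + h) - log_Z(x - h)) / (2 * h)
    F_hooke = -2.0 * T * x / n
    print(x, F_numeric, F_hooke)   # the two columns agree
```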

Free energy
Statistical mechanics makes yet another contact with thermodynamics through the Helmholtz free energy, defined as

F = −β^{−1} log Z = ⟨H⟩ − TS.

Both ⟨H⟩ and TS (through its change Q = TΔS) have macroscopic, thermodynamic interpretations. In statistical mechanics these quantities are defined microscopically through the probabilities of the individual energy levels, p_N. The connection between the two points of view takes the form of a minimum principle. Suppose we did not know that the probabilities of a system in contact with a thermal reservoir have the Boltzmann form. Instead we propose that the probabilities are determined by the property that they optimize something. Maximizing the entropy gives the uniform distribution, while minimizing the energy just gives the lowest energy state(s). As something in between, we try minimizing the free energy F with respect to the probabilities. Because the probabilities sum to 1, this is a constrained minimization problem that we solve by the method of Lagrange multipliers. Thus we minimize the function

F − λ Σ_N p_N = Σ_N p_N (E_N + β^{−1} log p_N − λ)

with respect to the p_N subject to no constraint and general Lagrange multiplier λ, and then solve for λ such that the p_N sum to 1. The result of this easy exercise is that the p_N have exactly the Boltzmann form. The appeal of optimization principles is that they provide a basis for intuition. In the case of the free energy minimization principle we see how temperature tips the scale in favor of energy or entropy. At low temperatures the principle emphasizes energy, and this is what a system will minimize. When the temperature is high, entropy gains the upper hand.
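The minimization can also be carried out numerically. In this sketch (ours, not from the notes), a handful of made-up energy levels and a softmax parameterization keep the p_N on the probability simplex; a generic optimizer then lands on the Boltzmann distribution:

    import numpy as np
    from scipy.optimize import minimize

    # Free energy minimization sketch: hypothetical energy levels, k_B = 1.
    E_N = np.array([0.0, 0.7, 1.1, 2.0, 3.2, 4.5])
    beta = 1.3

    def F(theta):
        p = np.exp(theta - theta.max()); p /= p.sum()   # softmax: p_N > 0, sum = 1
        return p @ E_N + (p @ np.log(p)) / beta         # <H> - TS

    theta = minimize(F, np.zeros_like(E_N)).x
    p = np.exp(theta - theta.max()); p /= p.sum()
    boltzmann = np.exp(-beta * E_N); boltzmann /= boltzmann.sum()
    print(np.abs(p - boltzmann).max())                  # essentially zero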

Thermal equilibrium
Although statistical mechanics is the study of mechanics from which time has been eliminated, we should not overlook the significance of time scales. In the previous Lecture we saw that a process is only adiabatic when it is carried out on a time scale that is long on the scale of the system's dynamics. Another time scale is relevant when our system is in contact with a thermal reservoir. This scale is set by the rate at which energy can flow between system and reservoir. As we get ever closer to the ideal of weak coupling, this rate of thermal equilibration goes to zero and the time scale diverges. In this Lecture we found that the force generated by a freely-jointed polymer in contact with a thermal reservoir differs from what it was in isolation. In isolation the polymer energy increases when extended, while in a solvent at temperature T its energy stays constant at a value set by T. When the coupling to the solvent is weak, it may be possible to extend the polymer quickly and not maintain thermal equilibrium with the solvent. In that case the polymer will gain energy, as in the adiabatic process. Processes are often described as adiabatic for just this reason: there is insufficient time to transfer energy to and from the thermal environment.

Problems for study

Naturalness of logarithms
Does physics "know" about the number e, the base of the natural logarithms? Information theory, by convention, defines entropy with the base 2 logarithm. Is the entropy of statistical physics more natural, as an outgrowth of natural law? Not unrelated is the Boltzmann distribution, in which e also figures prominently.

Ideal gas as multi-dimensional billiards
An ideal gas of n identical point masses in a volume V is equivalent to a billiard (1 particle) in 3n dimensions and volume V^n. Use this equivalence and Weyl's asymptotic law for the Nth energy level to calculate the number-of-states function Ω_gas(E). As another application of Weyl's law, show that the energy of the gas satisfies

E(V) = E(V₀) (V₀/V)^{2/3}

when the volume of the gas is changed adiabatically (V₀ is an arbitrary reference volume).

Slightly non-ideal gas
The hard-sphere gas model is a multi-dimensional billiard, just like the ideal gas, the only difference being that the configuration space is restricted by the constraint that all particles have separation at least 2b, where b is the hard-sphere radius. The volume of the billiard is therefore not equal to V^n, but something smaller. Find an approximation for the reduced billiard volume that applies when the total volume covered by spheres is much smaller than V.

Ideal gas in contact with thermal reservoir
Revisit the ideal gas system, but now in contact with a thermal reservoir at temperature T. Use the energy levels from the second problem to calculate the partition function, and then (2.7) to find the average energy (a numerical cross-check follows below). Is your answer consistent with the equipartition theorem? The analog of the external parameter x, by which we defined force in the polymer model, is the volume V occupied by the gas. When the gas is in the energy state E_N its pressure is

p = −∂E_N/∂V.

Find the thermal average ⟨p⟩ in analogy with the calculation of average force for the polymer. Does the result surprise you? Compare the formula for the pressure of the gas at temperature T with the formula for the gas in isolation, that is, when the gas is in a particular energy state. Select the energy state so the two gases have the same pressure p₀ at the reference volume V₀.
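For readers who want a numerical cross-check of the reservoir problem: once you have a number-of-states function of the form Ω(E) ∝ E^γ (the exponent γ below is a placeholder, since finding it is part of the exercise), the average energy at temperature T follows by the same change of variables used for the polymer. A minimal sketch with k_B = 1:

    import numpy as np

    # Cross-check sketch for the reservoir problem.  gam is a placeholder
    # exponent: substitute the one your Omega_gas(E) gives.
    gam, beta = 15.0, 0.5
    E = np.linspace(1e-6, 500.0, 200001)
    w = E**(gam - 1) * np.exp(-beta * E)       # level density times Boltzmann weight
    print(np.sum(E * w) / np.sum(w), gam / beta)   # average energy = gam * k_B * T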

Isothermal compression of the ideal gas
An ideal gas in contact with a thermal reservoir at temperature T is compressed from volume V₁ to volume V₂. Calculate the change in internal energy, the change in entropy, and the work performed on the gas using

W = −∫_{V₁}^{V₂} p dV.

Check that the general thermodynamic relation (2.10) holds.

Cosmic microwave background
The universe is said to be filled with microwave radiation at absolute temperature T = 2.73 K. How is it possible to say a system as large as the universe is at some temperature when nothing bigger exists that could be performing the role of thermal reservoir? Also, the temperature of this radiation is claimed to have been much higher in the distant past — how is that possible?

LECTURE 3
Macroscopic order

Macroscopic manifestations of microscopic order
In addition to providing a microscopic foundation for thermodynamics, statistical mechanics is also instrumental in showing how physics on the micro-scale is translated into macroscopic phenomena. The classic example of elementary interactions transcending many orders of magnitude in scale is the origin of crystal facets, and the related fact that crystals scatter radiation much like a macroscopic mirror but according to rules derived directly from long range geometric order in the atomic structure. We will examine this example by way of a simplified model. Statistical mechanics serves us in two ways: to explain the mechanism of microscopic order, and also to set limits on the degree to which microscopic order translates to order on the macro-scale.

Hard spheres: microscopic order
In materials such as silicon, the origin of crystallinity is directly linked to the bonding geometry of the constituent atoms. Atomic order in these materials comes about through the minimization of energy. In its crystalline form, the energy of silicon is about 5 eV per atom lower than it is when the atoms form a gas. Silicon atoms in contact with a reservoir whose thermal energy scale k_BT is several times smaller than 5 eV (say a reservoir at 1,000 K) will have such a high probability of being in the unique crystalline configuration that there is no further role for statistical mechanics. Order, even on a microscopic scale, can happen through entropy as well. The most studied example of this phenomenon, called order by disorder, is the hard sphere system. In this model there is no energy scale: the energy of any configuration of nonintersecting spheres is the same, which for convenience we take to be zero. The noble gas atoms (helium, neon, etc.) are well modeled by this system, because the pair potential energy rises sharply below a certain distance; at low temperatures such close approaches are simply excluded, as they would be at any temperature in the hard sphere system. We will show the results of some numerical experiments on the hard disk system in two dimensions to explain the mechanism of order by disorder. Figure 1 shows the initial positions of 63 hard disks in a square box. They are separated by small gaps from each other and the walls of the box. After giving them random velocities we run the equations of motion and see what happens. Packed as they are in a tight square lattice, one might expect the disks to ceaselessly rattle around, maintaining their average positions. One of the disks has been removed to check whether this hypothesis is robust. If the square crystal maintains its integrity, then the missing disk, or "vacancy", should diffuse around without disturbing the crystal as a whole.


Figure 1. Initial positions of 63 hard disks. Upon being given random velocities the disks rearrange themselves over time into more loosely packed configurations, such as the one shown in Figure 2.

But as Figure 2 shows, this is not what happens. After a relatively short time, measured in numbers of collisions with neighboring disks, the square crystal structure settles into a more loosely packed structure, such as the one shown in Figure 2. We can interpret the disintegration of the square lattice over time as a manifestation of free energy minimization with respect to the static probabilities of statistical mechanics. In the case of hard spheres (disks), free energy is minimized, at any temperature, by maximizing the entropy. When the disks are packed as a square lattice with small gaps, there is very little "free volume" of movement for each disk: the entropy is very small. By adopting a different kind of order, and favoring configurations that have a greater capacity for disorder, the entropy is higher and the free energy is lower. Although the disks in Figure 2 appear disordered, they too possess crystalline order. A hexagonal lattice is evident when we aggregate positions over time, as shown in Figure 3. That the hexagonal crystal must prevail above some value of the density of disks is reasonable, because exactly at the maximum packing density the hexagonal arrangement is the only allowed configuration of disks. The question for statistical mechanics to answer is whether this hexagonal order persists even at densities below this maximum density of "close-packed" disks. A simple plausibility argument, for the existence of crystalline order below the close-packing density, goes like this. Let v = V/n be the volume per disk for a system of n disks, where "volume" in this case means the total area of the system.

Figure 2. A typical configuration of 63 hard disks has nine rows of seven disks in a hexagonal arrangement.

At close packing a system of unit-radius disks has v = v_c = 2√3. What could the behavior of the total configuration space volume V_n of a system of n disks look like, as v tends to v_c? This quantity is a function of the parameter v, vanishes when v = v_c, and scales as volume in 2n dimensions. A reasonable candidate (for n large, so boundary effects are small) is

(3.1) V_n ∼ (c (v − v_c))^n,

the only unknown being the constant c. We can even estimate c by interpreting c(v − v_c) as the free volume available to one disk when its neighbors are fixed at their average positions. This gives c = 1. For a better estimate of c we would have to consider correlations in the motions of the disks. But whatever the true value of c, the fact that the asymptotic form (3.1) makes reference to a specific crystalline configuration (and applies for v > v_c) is consistent with the proposition that there is order even below the close-packed density. Statistical mechanics plays a greater role in determining the crystalline structure of hard spheres in three dimensions because geometry by itself is inconclusive. In three dimensions there is not a unique densest packing but an infinite set of packings differing in the sequence in which three kinds of hexagonally ordered layers are stacked. As shown in Figure 4, above any one layer the next layer must be of one of the other two types. Referring to the three kinds of layer as A, B and C, there are two stacking sequences that prevail in actual crystals: the fcc or face-centered-cubic sequence ABCABC... and the hcp or hexagonally-close-packed sequence ABAB....

Figure 3. Scatter plot, over a period of many collision times, of the distribution of disk centers. The averages of these distributions form a hexagonal lattice.

It is widely believed that the asymptotic form (3.1) holds for hard spheres [5], and it is only the constant c that distinguishes the different stacking-sequence structures. Extensive numerical calculations were necessary to resolve differences among these constants, and it is now believed that the fcc and hcp sequences give the extremes in the spectrum of values. The configuration space volumes for these extremes are still very close [6]:

c_fcc/c_hcp ≈ 1.00116.

Since the entropy of the hard sphere system is just k_B times the logarithm of V_n (plus a term that depends only on the temperature), the two sphere packings have free energy difference

(F_fcc − F_hcp)/n = −k_B T log(c_fcc/c_hcp) ≈ −0.00116 k_BT.

We can compare this purely entropic contribution to the free energy difference to any energetic contributions we neglected when modeling our noble gas atoms as hard spheres. Given the small size of the entropic contribution, the selection between the fcc and hcp crystal forms is most likely determined by energy. A more difficult question is determining the largest volume per disk (or sphere) below which crystalline order first appears. This is a much studied problem, and one well beyond the scope of these lectures.

Figure 4. Layer stacking in close-packed spheres. The large spheres show a single hexagonal layer, of type A. The layer above this layer is again hexagonal, but there is a choice of centering the spheres on the small light colored spheres, type B, or the small dark spheres, type C.

We can at least develop an intuitive understanding of such order-disorder phase transitions by once again considering free volumes. In the disordered or gas phase, the disks are only weakly correlated. Even so, a typical disk will be surrounded by a number of nearby disks that strongly limit the size of its free volume. The entropy of these disordered configurations is diminished as a result of these "caging" effects. To partially mitigate the entropic penalty of caging, disks may coordinate on a set of mutually beneficial average positions. While constraining the average positions to a crystal introduces an entropic penalty, the gain in free volume enabled by these average positions can result in a net entropy gain. The density where this social contract among disks first takes effect marks the phase transition to the ordered, or solid, phase.
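The figures in this lecture come from integrating the equations of motion, but the equilibrium statistics can be sampled more simply: for hard disks every non-overlapping configuration has the same Boltzmann weight, so Metropolis Monte Carlo reduces to proposing small random displacements and rejecting any move that causes an overlap or leaves the box. A minimal sketch (ours, not the code behind the figures; it uses 64 disks on a slightly loosened square lattice, and you can delete one to mimic the vacancy of Figure 1):

    import numpy as np

    # Metropolis Monte Carlo for hard disks in a square box.  Every legal
    # (non-overlapping, in-box) move is accepted, since all configurations
    # of hard disks have the same energy.
    rng = np.random.default_rng(1)
    L, R, step = 16.4, 1.0, 0.1
    side = 8                                  # 64 disks with small gaps
    xy = np.array([[1.025 + 2.05*i, 1.025 + 2.05*j]
                   for i in range(side) for j in range(side)])

    def legal(k, pos):
        if np.any(pos < R) or np.any(pos > L - R):
            return False                      # disk would poke through a wall
        d2 = np.sum((xy - pos)**2, axis=1)
        d2[k] = np.inf                        # ignore the disk being moved
        return np.all(d2 >= (2*R)**2)

    for t in range(100000):
        k = rng.integers(len(xy))
        trial = xy[k] + rng.uniform(-step, step, 2)
        if legal(k, trial):
            xy[k] = trial
    # xy now samples the uniform distribution over packed configurations;
    # aggregated snapshots should show the hexagonal order of Figure 3.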

Hard spheres: macroscopic order
In our argument for microscopic crystalline order in the hard sphere system we considered the limit v → v_c as n was held fixed. When addressing macroscopic properties we are actually interested in the opposite limit: v fixed and n → ∞. Does crystallinity survive in the thermodynamic limit? Numerical experiments with disks cast doubt on the existence of macroscopic order. Even when the gaps between disks are small in the perfectly ordered arrangement, there can be large net motions of disks when gaps are slightly compressed or expanded over large regions. In large systems we often observe that rows of disks deviate from straight lines by several disk diameters, sometimes opening gaps wherein the hexagonal microscopic order is lost. To address the large scale motions of disks and their threat to macroscopic order, we construct a model for just these degrees of freedom. We start with a small system, such as the one in Figure 2, whose crystal structure is under our control by means of boundary conditions. Let r denote the average positions in the crystal with perfect hexagonal symmetry and volume V. We can displace the average positions by a small linear transformation

r′ = r + u(r) = r + A·r

by applying the same displacement to the boundary of our system. From the work performed by the boundary on the disks we can determine, in principle, the free energy change associated with any linear distortion of the crystal structure. The four degrees of freedom of the matrix A decompose into expansion, shear and rotation modes:

A = [ a₀ + a₁   a₂ + a₃ ]
    [ a₂ − a₃   a₀ − a₁ ].

The parameter a₀ corresponds to uniform expansion (or compression for negative a₀), a₁ and a₂ are the two orthogonal modes of shear, and a₃ generates a rotation. By general principles we can argue that the expansion of the free energy to second order takes the form

(3.2) F(A)/V − F(0)/V = −2p a₀ + κ₁ a₀² + κ₂ (a₁² + a₂²) + ···.

The rotation parameter a₃ does not appear at all because the free energy is unchanged when the system is rotated. Only the expansion parameter may appear at first order, because a linear term in a₁ or a₂ is inconsistent with hexagonal symmetry having the lowest free energy (at constant volume). The coefficient of the a₀ term, proportional to the volume derivative of the free energy, is just the pressure (or rather its analog in two dimensions). Finally, the two shear mode stiffnesses are equal by the assumed hexagonal symmetry of the crystal. To model the macroscopic system we interpret the free energy change (3.2) of the small system with constant distortion A as the local change in the free energy density of a large system with variable A. Since

a₀ = (∂_x u_x + ∂_y u_y)/2

a₁ = (∂_x u_x − ∂_y u_y)/2

a₂ = (∂_x u_y + ∂_y u_x)/2,

the free energy of the large system takes the following form as a functional of the displacement field u(r):

(3.3) F(u) = ∫ d^D r ( −p ∇·u + C_{ijkl} (∂_i u_j)(∂_k u_l) ).

Although we derived this for the case of disks in dimension D = 2, in other dimensions (and different packing geometries) only the details of the elasticity tensor C change. We can think of the integrated free energy density (3.3) as the Hamiltonian of a new system whose variables are the displacement field u(r). This Hamiltonian has no momentum variables, and it is not possible, or even correct, to try to derive equations of motion for the displacement field. The correct interpretation of (3.3) is that it provides an efficient computation of the Boltzmann distribution for systems so large that the displacement field has become a complete specification of the system's degrees of freedom, the microscopic details having been absorbed by the definitions of the pressure and the elasticity tensor. Since the field u(r) vanishes on the boundary of the macroscopic system, the pressure term in (3.3) integrates to zero. The squared-gradient form that remains appears in many contexts, not just hard spheres, and is our focus in the next section.

Order in the height model
The height model captures the key elements of long range order in a variety of systems. When applied to macroscopic order in the hard sphere system, the "height" corresponds to a single component of the elastic displacement field. In the next section we will see that it can also represent something much more abstract. The height model has only one interesting parameter: the dimension of space D. When D = 2 the variables of the height model, h(r), might represent the actual height of a surface above points r in the plane. The case D = 1 models a random walk, when the single space coordinate is reinterpreted as time and h(t) represents the position of the walker at time t. In fact, we can think of the height model in D dimensions as a generalization of the random walk, where D is not the dimension of the space that is being walked within, but the dimension of the entity that is "walking". We construct new variables for the height model from the amplitudes h_k of plane-wave modes:

(3.4) h(r) = Σ_k e^{ik·r} h_k.

Because the height is real-valued, the complex amplitudes satisfy the constraint h_{−k} = h_k^*. The mode wave-vectors k are chosen with each component an integer multiple of 2π/L, so the height is a periodic function on a hyper-cubic domain of volume V = L^D. We exclude the k = 0 mode because it does not contribute to the free energy of the height model. Since the model is only meant to address macroscopic properties, we place an upper cutoff k_max = k₀ on the magnitudes of the wave-vectors. The exclusion of k = 0 means that there is effectively also a lower cutoff, k_min = 2π/L. We are not so much concerned with the precise values of these cutoffs as with the more salient fact that only the lower cutoff scales with the system size L. In our analysis of the height model we will encounter sums, over the finite set of all wave-vectors, of functions that are insensitive to the discreteness of this set. We can therefore approximate such sums by integrals:

Σ_k (···) ≈ ∫_{k_min < |k| < k_max} ρ d^D k (···),

where ρ = V/(2π)^D is the density of modes.

The probability distribution of the height at a single point r₀ is

p(h₀) = ⟨δ(h(r₀) − h₀)⟩.

The angle brackets denote the average with respect to the Boltzmann distribution for the height model Hamiltonian H, the squared-gradient form, which in the mode variables reads H = κV Σ_{k⁺} k² |h_k|², the sum running over one representative of each (k, −k) pair. Using the Fourier representation of the delta function, we can express the probability distribution as

p(h₀) = (1/Z(0)) ∫_{−∞}^{+∞} (dq/2π) e^{−iq h₀} Z(q),

where

Z(q) = ∫_C Π_{k⁺} dh_k exp( Σ_{k⁺} [ 2iq Re(e^{ik·r₀} h_k) − βκV k² |h_k|² ] ),

and the product is over just one representative of each (k, −k) pair. By rotating the phase of the complex integration variable h_k we see that the integral does not depend on the position r₀, in agreement with the translational invariance of the probability distribution. Upon performing the Gaussian integrals for each k⁺ we obtain

Z(q) = Π_{k⁺} (π/(βκV k²)) exp(−q²/(βκV k²)),

and from that the Fourier representation of the probability:

p(h₀) = ∫_{−∞}^{+∞} (dq/2π) e^{−iq h₀} exp( −(q²/2) Σ_k 1/(βκV k²) ).

The final form of the distribution is Gaussian,

p(h₀) = (1/√(2πσ²)) exp(−h₀²/(2σ²)),

where the width σ takes three asymptotic forms, depending on dimension, when the sum over modes,

Σ_k 1/k² ≈ (V/(2π)^D) Ω_D ∫_{k_min}^{k_max} k^{D−3} dk,

is evaluated in the asymptotic limit k_min ≪ k_max:

(3.5) σ² ∝ L/(βκ),          D = 1
      σ² ∝ log(k₀L)/(βκ),   D = 2
      σ² ∝ k₀^{D−2}/(βκ),   D > 2.



Figure 5. Periodic (top) and random (bottom) sequences of two lengths a and b.

The proportionality hides numerical factors (2π, spherical surface areas Ω_D) to emphasize the dependence on the system size L. Since D = 1 corresponds to the ordinary random walk, it comes as no surprise that we recover the well known σ ∝ √L growth of the root-mean-square "height" with the "time" L of the walk. The important lesson here is that above D = 2 the "order" in the height is perfect in the sense that the width σ is bounded — stays microscopic — as we take the limit L → ∞. The ordered, system-wide value of the height in our model was arbitrarily set to zero in our mode expansion (3.4) when we omitted the k = 0 mode. Because the Hamiltonian does not depend on this mode, height fluctuations that do not grow with L would have been obtained for any value of the ordered height in dimension D > 2. This fact makes the height model one of the simplest examples of spontaneous symmetry breaking. The symmetry being broken is the uniform translation of the height over the entire system. Above any point, the height has a fixed distribution in the limit of infinite system size. The mean of these heights can be determined even from a single configuration of the surface h(r), since, as further analysis shows, the fluctuations at different points are only weakly correlated. As a result, the value of the height spontaneously selected by the system can be determined with arbitrary precision, and from a single observation, as the system size increases. Taking the height variable to be a component of the elastic displacement field u(r) for hard disks or spheres, we see that true long range order exists only in the case of spheres in D = 3. For disks in D = 2 the amplitudes of thermal displacement fluctuations grow logarithmically with system size, and therefore the degree of order is borderline.
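The three behaviors in (3.5) can be checked directly, since σ² is just a lattice sum over the allowed wave-vectors. The following sketch (ours, not from the notes; it sets βκ = 1 and k₀ = π) evaluates the mode sum for growing L in D = 1, 2, 3:

    import numpy as np
    from itertools import product

    # Direct evaluation of sigma^2 = (1/(beta*kappa*V)) * sum_k 1/k^2,
    # the mode sum behind (3.5), with beta*kappa = 1 and cutoff k0 = pi.
    def sigma2(L, D, k0=np.pi):
        mmax = int(k0 * L / (2 * np.pi))
        total = 0.0
        for m in product(range(-mmax, mmax + 1), repeat=D):
            k2 = (2 * np.pi / L)**2 * sum(i * i for i in m)
            if 0 < k2 <= k0**2:
                total += 1.0 / k2
        return total / L**D                  # V = L^D

    for D in (1, 2, 3):
        print(D, [round(sigma2(L, D), 3) for L in (8, 16, 32)])
    # D = 1 roughly doubles with L, D = 2 grows like log L, D = 3 levels off.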

Random tilings
The height model makes a surprising appearance in discrete models of solid ordering. Although the main application is to order in two and three dimensions, the general idea can be explained with a simple model in one dimension. Suppose we have a solid in one dimension composed of two rigid motifs (e.g. molecules) of length a and b. When these alternate in the structure, as shown in Figure 5, the resulting structure has period a + b. Diffraction experiments provide a direct probe of this order. For simplicity, suppose the motifs are bonds of two lengths between identical atoms and that these atoms scatter radiation. Let X be the set of atom positions; we are interested in experiments where the size N of this set is very large. The structure factor, defined by

S(q) = Σ_{x∈X} exp(iqx),

gives the amplitude of radiation scattered by the entire system.




Figure 6. Projection of a lattice path onto the line r_∥, forming the sequence of lengths sin α and cos α.

Here q is the momentum change of the radiation in the scattering process, determined both by the wavelength and the scattering angle¹. When q is an integer multiple of 2π/(a+b) the terms in the sum repeat, and S(q) is proportional to N, the number of terms in the sum. This scaling of the structure factor — the phenomenon of Bragg peaks at special q's — is the signature of periodic order. Figure 5 also shows a more random sequence of the two motifs. What can we say about the behavior of the structure factor for these? We will find that, with relatively weak assumptions about the degree of disorder, even these structures exhibit Bragg peaks. A systematic way to analyze the diffraction properties of general sequences of the two lengths is to embed the structure in two dimensions, as shown in Figure 6. Let a = cos α and b = sin α be the two lengths. We are interested in the case where a and b are incommensurate; if they had a common measure c, then we would trivially get a Bragg peak at every q that is a multiple of 2π/c. The construction shown in Figure 6 is to project edges of the square graph onto a line making angle α with the horizontal axis. Any sequence of lengths a and b thus corresponds to a path through the square lattice. The line onto which we project may only pass through a single lattice point, the origin, since otherwise a and b would be commensurate. Similarly, from any projected lattice point on the line we can reconstruct a unique lattice point. Let r be a general point of the square lattice. We can express r as the sum of its projection on the line, r_∥, and its orthogonal complement, r_⊥. Also, let q be a general point of the lattice dual to the square lattice — another square lattice, scaled by 2π. Thus for any r and q we have

(3.6) exp(iq · r) = 1.

We can decompose any q into the same pair of orthogonal spaces as r and from (3.6) infer the relationship

exp(iq_∥ · r_∥) = exp(−iq_⊥ · r_⊥).

¹In D = 1 there are only two "angles", corresponding to the sign of q.

Using the square lattice construction we can identify Bragg peaks of the original two-length sequence in one dimension and calculate their strength. Let R be a particular set of lattice points forming a path on the lattice, whose projections, the set R_∥, correspond to the atom positions in the diffraction experiment. By the incommensurate property of the projection, there is a bijection between the elements of R_∥ and the elements of R_⊥, the projection of the lattice path onto the orthogonal space. Let q_∥ be the projection of some dual lattice vector and consider the structure factor with this as the momentum change in the diffraction experiment:

(3.7) S(q_∥) = Σ_{r_∥ ∈ R_∥} exp(iq_∥ · r_∥) = Σ_{r_⊥ ∈ R_⊥} exp(−iq_⊥ · r_⊥).

From the second line we can see how the terms in the sum can be made to combine to give a structure factor that grows as N, the number of terms in the sum. Most directly, we can constrain the lattice paths to always lie in a strip of finite extent in r_⊥ and consider dual lattice vectors for which q_⊥ is small. Such dual lattice vectors always exist, because the projection subspaces are incommensurate. With these bounds on r_⊥ and q_⊥ in place, we see that the structure factor does indeed grow as N, and there is a Bragg peak at q_∥ in the diffraction experiment. Constraining the lattice path to be bounded in r_⊥ seems unphysical, since the atoms arranged in r_∥ do not have access to the geometrical construction that reveals their r_⊥. Is there a statistical mechanism that achieves the same thing? Consider a long lattice path comprising N_a horizontal and N_b vertical edges. The average slope of the path will match that of the r_∥ subspace when N_b/N_a ≈ tan α. Assuming the two motifs in our 1-dimensional solid have this relative concentration, a simple model might be that all their arrangements are energetically so similar on the thermal energy scale k_BT that they occur with equal probability. This still leaves open the possibility of large-scale concentration fluctuations within the material, whose effect on the diffraction we turn to next. The method of analysis follows closely our analysis of long range order in the hard sphere system. The macroscopic region occupied by the solid is partitioned into microscopic domains characterized by local variations in the slope of the lattice path. In each domain we calculate the free energy and its dependence on the local slope. This then becomes the free energy density of a macroscopic model that is integrated to give the probability of arbitrary paths, now described by smooth curves r_⊥(r_∥). Consider a microscopic domain of size Δr_∥ in which the r_⊥ projection changes by Δr_⊥. By geometry, these are related to the numbers of horizontal and vertical edges in the path, n_a and n_b:

n_a = Δr_∥ cos α − Δr_⊥ sin α

n_b = Δr_⊥ cos α + Δr_∥ sin α.

The free energy is just −T times the entropy of paths with this mixture of the two kinds of edges:

(3.8) F(Δr_∥, Δr_⊥) = −β^{−1} log [ (n_a + n_b)! / (n_a! n_b!) ]
          ∼ −β^{−1} [ n_a log((n_a + n_b)/n_a) + n_b log((n_a + n_b)/n_b) ].

In the second expression we used Stirling's formula for the factorials, since our domain can have many edges and still be microscopic. This free energy has a regular Taylor series in Δr_⊥, the local deviation of the slope from the average slope of the macroscopic system. To second order we obtain

(3.9) F(Δr_∥, Δr_⊥) = Δr_∥ [ f₀ + f₁ (Δr_⊥/Δr_∥) + (1/2) f₂ (Δr_⊥/Δr_∥)² + ··· ],

where only the coefficient of the quadratic term will have any bearing on long range order:

f₂ = (β (sin α + cos α) sin α cos α)^{−1}.

The positivity of f₂ (for 0 < α < π/2) can be traced to the convexity of the mixture entropy (3.8). We see that the structure of (3.9) is the volume of the domain in the r_∥ subspace times an expansion in the gradient of the macroscopic function r_⊥(r_∥):

(3.10) F(r_⊥) = ∫ dr_∥ [ f₀ + f₁ ∂_∥ r_⊥ + (1/2) f₂ (∂_∥ r_⊥)² + ··· ].

As in the case of hard spheres, we treat the free energy functional (3.10) as a Hamiltonian for the macroscopic degrees of freedom, the function r_⊥(r_∥). For periodic boundary values the linear gradient term integrates to zero and we obtain another instance of the height model. The main result for the height model is that the heights have a distribution p(r_⊥) above every r_∥ whose width grows with the system size only when the dimension D is one or two. Previously we argued that the structure factor (3.7) would scale with the system size (Bragg peak behavior) when the distribution of r_⊥ was bounded. While this is not the case for our D = 1 solid of two lengths, analogues of this model in three dimensions produce Bragg peaks because their r_⊥ distribution is independent of system size. Figure 7 shows a tiling model for the marginal case, D = 2. By projecting square facets forming a surface in the cubic lattice we generate tilings of the plane, r_∥, by three kinds of parallelograms. In the most symmetrical case, when the orthogonal height space r_⊥ is parallel to a 3-fold axis of the cubic lattice, the parallelograms are congruent 60°-rhombi. This geometry, for a statistical model in the plane, would appear to be the most natural by having just a single structural motif. In addition to lacking perfect long range order as a result of a logarithmically growing height distribution, this model also suffers from the defect that the height distribution cannot be inferred from the rhombus vertices: the vertices all collapse onto a simple hexagonal lattice, and the formation of Bragg peaks is trivial. One could avoid this with incommensurate projection spaces, as in the right panel of Figure 7, but then the corresponding tiles are considerably less natural. The minimum dimension for long range order (D = 3), incommensurate projection subspaces, and high symmetry all come together in the six dimensional hyper-cubic lattice [7]. A tiling of space by two "golden rhombohedra" is constructed in direct analogy with the D = 1 and D = 2 constructions described above.

Figure 7. Tilings of the plane by three kinds of parallelogram are projections of surfaces formed from the three kinds of square facets in a cubic lattice. In the most symmetrical projection, on the left, the parallelograms are congruent 60°-rhombi, whose vertices always lie on a hexagonal lattice.

Figure 8 shows how exactly two tile shapes emerge when the three dimensional r_∥ subspace is chosen so the six edges of the hyper-cubic lattice project to the six 5-fold axes of the regular icosahedron. Because the orthogonal r_⊥ subspace also has three dimensions, three "height" variables are required to describe how the three dimensional hyper-surface of face-connected 3-facets meanders through the hyper-cubic lattice. As in the one dimensional model analyzed earlier, the hyper-surface r_⊥(r_∥) on macroscopic scales has a squared-gradient free energy density in the most random scenario microscopically, where all tile arrangements have equal probabilities. And because the heights r_⊥ have long range order in three dimensional squared-gradient models, diffraction from such a random tiling structure will exhibit Bragg peaks, much like a crystal [8]. The icosahedron-symmetric positions q_∥ of the Bragg peaks (projected from the hyper-cubic dual lattice vectors q) place this structure in the quasicrystal class.

Figure 8. The 3-facets of the six dimensional hyper-cubic lattice project to “golden rhombohedra” when the projection subspace is invariant with respect to the icosahedral group. Edges of the lattice project to the six 5-fold axes of the regular icosahedron, and all triples of distinct edges form a rhombohedron congruent to one of the two shown.
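The Bragg-peak argument for random sequences is easy to test numerically. In the sketch below (our own construction, not from the notes), a random shuffle of horizontal and vertical edges with the Fibonacci composition 144:89 approximates the slope tan α = 1/φ; the path is projected to get atom positions, and |S(q)|/N is compared at a projected dual lattice vector and at a generic q:

    import numpy as np

    # Structure factor of a random two-length sequence (sketch).
    rng = np.random.default_rng(2)
    phi = (1 + np.sqrt(5)) / 2
    alpha = np.arctan(1 / phi)
    a, b = np.cos(alpha), np.sin(alpha)        # the two motif lengths

    steps = np.array([0]*144 + [1]*89)         # 0: length a, 1: length b
    rng.shuffle(steps)
    x = np.cumsum(np.where(steps == 0, a, b))  # projected atom positions r_par

    def S_over_N(q):
        return np.abs(np.exp(1j * q * x).sum()) / len(x)

    # q_par of the dual lattice vector 2*pi*(13, 8), whose q_perp is small
    q_bragg = 2*np.pi * (13*np.cos(alpha) + 8*np.sin(alpha))
    print(S_over_N(q_bragg))                   # order one: a surviving Bragg peak
    print(S_over_N(1.0))                       # generic q: down by about 1/sqrt(N)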

Problems for study

Free volumes of nearly close-packed spheres
Compare the free volumes of spheres in the limit of close packing for the fcc and hcp structures. As with disks, fix the surrounding spheres at their average positions and approximate the spherical surfaces of constraint by planes. Use symmetry to argue the two free volumes are equal and thereby avoid having to calculate them.

Isotropic elasticity of hexagonally packed disks
The two shear degrees of freedom of a material in two dimensions are described by the distortion matrix

A = [ a₁   a₂ ]
    [ a₂  −a₁ ].

Rotating the material by 60° gives a different distortion

A′ = [ a₁′   a₂′ ]
     [ a₂′  −a₁′ ].

Determine a₁′ and a₂′ in terms of a₁ and a₂, and use this result to argue that if the shear elastic free energy of hexagonally packed disks is κ₁a₁² + κ₂a₂² for general a₁ and a₂, then κ₁ = κ₂. Hexagonally packed disks are thus elastically isotropic: the free energy makes no reference to the crystal axes.

Height model
Fill in all the missing steps in the analysis of the height model.

Ising model with many ground states
In some solids the only degrees of freedom that have significant entropy at low temperature are the electron magnetic moments, or spins. Often the Hamiltonian of such solids can be modeled by spin variables that take two values, s = ±1, corresponding to the magnetic moment along a particular axis in units of ℏ/2. Consider a two dimensional solid where the spins are arranged on a hexagonal lattice and have the Ising Hamiltonian

H = J Σ_{(ij)} s_i s_j,

where the sum is over all adjacent pairs of spins and the coupling J is positive (antiferromagnetism). To get the lowest possible energy, or ground state, we would like adjacent pairs of spins to have opposite sign. But this is impossible, because the adjacency graph for the spins is not bipartite. Show that this system has in fact many ground states, and that these are in 2-to-1 correspondence with the 60°-rhombus tilings of the plane.
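A brute-force enumeration makes the degeneracy concrete. The sketch below assumes the intended lattice is the triangular lattice (each spin with six neighbors, the standard non-bipartite case; this reading is an assumption on our part); it counts ground states of a small periodic patch, leaving the correspondence with rhombus tilings to the reader:

    from itertools import product

    # Brute-force ground states of the antiferromagnetic Ising model on a
    # small periodic triangular lattice, J = 1.
    Lx = Ly = 3
    sites = [(i, j) for i in range(Lx) for j in range(Ly)]
    index = {s: n for n, s in enumerate(sites)}
    edges = set()
    for (i, j) in sites:
        for di, dj in ((1, 0), (0, 1), (1, -1)):   # the three bond directions
            n1, n2 = index[(i, j)], index[((i + di) % Lx, (j + dj) % Ly)]
            edges.add((min(n1, n2), max(n1, n2)))

    best, count = None, 0
    for s in product((-1, 1), repeat=len(sites)):
        E = sum(s[u] * s[v] for u, v in edges)
        if best is None or E < best:
            best, count = E, 1
        elif E == best:
            count += 1
    print(best, count)   # ground state energy and its (large) degeneracy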

Bibliography

[1] Ya. G. Sinai, On the foundations of the ergodic hypothesis for a dynamical system of statistical mechanics, Sov. Math. Dokl. 4, 1818-1822 (1963). MR0214727
[2] H. Weyl, On the asymptotic distribution of eigenvalues, Nachr. Königl. Ges. Wiss. Göttingen, 110-117 (1911).
[3] M. Kac, Can one hear the shape of a drum?, Amer. Math. Monthly 73, 1-23 (1966). MR0201237
[4] M. V. Berry, Regularity and chaos in classical mechanics, illustrated by three deformations of a circular 'billiard', Eur. J. Phys. 2, 91-102 (1981). MR634612
[5] F. H. Stillinger Jr., Z. W. Salsburg, and R. L. Kornegay, Rigid disks at high density, J. Chem. Phys. 43, 932-943 (1965). MR0184667
[6] V. Elser, Phonon contribution to the entropy of hard sphere crystals, Phys. Rev. E 89, 052404 (2014).
[7] P. Kramer and R. Neri, On periodic and non-periodic space fillings of E^m obtained by projection, Acta Cryst. A 40, 580-587 (1984). MR768042
[8] V. Elser, Comment on "Quasicrystals: a new class of ordered structures", Phys. Rev. Lett. 54, 1730-1730 (1985).


Packing, Coding, and Ground States

Henry Cohn

IAS/Park City Mathematics Series Volume 23, 2014



Preface
In these lectures, we'll study simple models of materials from several different perspectives: geometry (packing problems), information theory (error-correcting codes), and physics (ground states of interacting particle systems). These perspectives each shed light on some of the same problems and phenomena, while highlighting different techniques and connections. One noteworthy phenomenon is the exceptional symmetry that is found in certain special cases, and we'll examine when and why it occurs. The overall theme of the lectures is thus order vs. disorder. How much symmetry can we expect to see in optimal geometric structures? The style of these lecture notes is deliberately brief and informal. See Conway and Sloane's book Sphere packings, lattices and groups [28] for far more information about many of the mathematical objects we'll discuss, as well as the references cited in the notes for omitted details. I've included a dozen exercises for the reader, which cover things I think it's most useful to do for oneself. The exercises vary in difficulty, from routine verifications to trickier computations. There's no need to solve them if you are willing to take a few things on faith, but I highly recommend engaging actively with this material, and the exercises would be a good way to get started. These notes are based on my PCMI lectures from 2014 and were written before Viazovska [82] found a remarkable solution to the sphere packing problem in R^8 using linear programming bounds. The only updates to reflect this development are a few footnotes. See also [14] for an exposition.

Acknowledgments I am grateful to Matthew de Courcy-Ireland for serving as the teaching assistant for this course and for providing feedback on the lecture notes.

Microsoft Research New England, One Memorial Drive, Cambridge, MA 02142 E-mail address: [email protected]

© 2017 American Mathematical Society


LECTURE 1
Sphere packing

1. Introduction
The sphere packing problem asks for the densest packing of congruent spheres in R^n. In other words, how can we cover the greatest fraction of space using congruent balls that do not overlap (i.e., that have disjoint interiors)? The density is the fraction of space covered. Finding the densest sphere packing sounds simple, but it turns out to be a surprisingly deep and subtle problem. Before we dive into the sphere packing problem, it's worth thinking about how to write down a rigorous definition. Although pathological packings may not have well-defined densities, everything we could reasonably hope for is true: we can define the optimal density by taking a suitable limit, and there is a packing that achieves this density. Specifically, given a packing P, a point x ∈ R^n, and a positive real number r, let

Δ_{r,x}(P) = vol(B_r(x) ∩ P) / vol(B_r(x))

be the fraction of the ball B_r(x) of radius r centered at x that is covered by P. If we define the optimal packing density Δ_n in R^n by

Δ_n = limsup_{r→∞} sup_P Δ_{r,0}(P),

then there exists a single packing P for which

lim_{r→∞} Δ_{r,x}(P) = Δ_n

uniformly for all x ∈ R^n. See [39] for a proof. What are the optimal sphere packings in low dimensions? In one dimension, we have the interval packing problem on the line, which is trivial. In two dimensions, the answer is pretty clearly the hexagonal packing, with each disk surrounded by six others.

However, proving optimality takes a genuine idea. For example, one can show that the Voronoi cells (the sets of points closer to each sphere center than to the others) in the hexagonal packing are as small as possible in any packing. See [76] for the first proof of optimality, [65, 37] for subsequent proofs, and [40] for a particularly short proof.


In three dimensions, the sphere packing problem is much more difficult. There is a natural guess for the solution, namely stacking hexagonal layers as densely as possible, so that each is nestled into the gaps in the neighboring layers. Such a packing is known to be optimal, via an elaborate proof [41] that depends on computer calculations. The original proof was so long and complex that it was difficult to check carefully, but it has recently been verified at the level of formal logic [42]. In four or more dimensions, the optimal sphere packing density is not known, although there are upper and lower bounds.¹

Exercise 1.1. How can hexagonal layers be stacked to form dense packings in R^3? Show that there are uncountably many different ways to do so, even if you consider two packings the same when they are related by a rigid motion of space. Can you extend this analysis to R^4? Which packings can you get by stacking optimal three-dimensional packings as densely as possible? How many can you find? How dense are they? What about R^5? R^6? How high can you go? Feel free to give up after four dimensions, but the further you go, the more interesting phenomena you'll run into. By R^10, this iterated stacking process will no longer produce the densest possible sphere packings, but nobody knows whether it fails before that. See [27] for more details on what happens in dimensions two through ten.

2. Motivation
There are several reasons why we should care about sphere packing. One is that it's a natural geometric problem: humanity ought to know the answer to such a simple and natural question. Another reason is that the problem has interesting solutions. Sometimes it's difficult to judge how interesting a problem is in the abstract, before taking a look at the phenomena that occur. Sphere packing is full of rich, intricate structures that are themselves of intrinsic interest, and this makes the problem far more appealing than it would have been if the answers had been less exciting. A third reason to care about sphere packing is that it is a toy model of granular materials. Of course no real material consists of identical perfect spheres, and the sphere packing problem also neglects forces and dynamics. However, sphere packing is at least a first step towards understanding the density of an idealized material. (See [54] for a statistical physics perspective on packing.) The most important practical reason to study sphere packing is also one of the most surprising reasons: high-dimensional sphere packings are essential for communication over noisy channels, as we'll spend the rest of this section understanding. This is really an assertion about information theory, Claude Shannon's great discovery from his famous 1948 paper A mathematical theory of communication [72]. Sphere packing per se again deals with an idealized scenario, but it illustrates some of the fundamental principles underlying information theory. The setting works as follows. Suppose we are sending messages over some communication channel. We will represent the signals by points in a bounded subset of R^n, say the ball of radius R about the origin (the precise subset is not so

¹Viazovska [82] has recently solved the sphere packing problem in R^8 using linear programming bounds, which led to a solution in R^24 as well by Cohn, Kumar, Miller, Radchenko, and Viazovska [21].

important). In this model, each coordinate represents some measurement used to describe the signal. For example, for a radio signal we could measure the amplitude at different frequencies. There is no reason to expect the number of measurements to be small, and realistic channels can involve hundreds or even thousands of coordinates. Thus, the signal space for the channel will be high-dimensional. Note that this dimensionality has nothing to do with the physical space we are working in; instead, it simply represents the number of independent measurements we make on the signals. Each signal will be an individual transmission over the channel at a given time, and we will send a stream of signals as time passes. Of course, the big difficulty with communication is noise: if we send a signal s, then the received signal r at the other end will generally not be exactly equal to s. Instead, it will have been perturbed by channel noise. In a useful channel, the noise level will be fairly low, and we can expect that |r − s| < ε for some fixed ε (the noise level of the channel). Thus, we can imagine an open error ball of radius ε about each signal sent, which shows how it could be received after adding noise:

[diagram: the sent signal s with its error ball of radius ε, containing a possible received signal r]

This is a simplistic model of noise, since we assume that the noise has no directionality or structure, and that every perturbation up to radius ε could plausibly occur but nothing beyond that limit. In practice, engineers use more sophisticated noise models; for example, cell phones have to take into account all sorts of other phenomena, such as interference from reflected signals. However, our basic noise model is a good illustration of the essential principles. How can we arrange our communications so as to remove the effects of noise? We will build a vocabulary S ⊆ R^n of signals and only send signals in S. This is called an error-correcting code. If two distinct signals s₁, s₂ ∈ S satisfy |s₁ − s₂| < 2ε, then the received signal could be ambiguous:

[diagram: overlapping error balls about s₁ and s₂, with a received signal r in the overlap]

Therefore, we will keep all signals in S at least 2ε apart, so that the error balls are disjoint:

[diagram: disjoint error balls about s₁ and s₂, whose centers are 2ε apart]

This is exactly the sphere packing problem. We want the signal set S to be as large as possible, since having more signals available increases the rate at which we can transmit information, but the ε-balls about the signals in S are not allowed to overlap. How large can we make S subject to this constraint? Recall that the only available subset of R^n in our model is the ball of radius R. Thus, the question becomes how many ε-balls we can pack into a ball of radius R + ε. (The radius is R + ε, rather than R, because the error balls can stick out over the edge.) That's a finite version of sphere packing, and we recover the usual version in all of R^n in the limit when R is much larger than ε. That limit is exactly the situation we expect, since the channel is not very useful if ε is on the same scale as R. It is remarkable that although high-dimensional packing sounds utterly abstract and impractical, it turns out to be particularly important for applications. In these lectures we will focus on the theory behind sphere packing, rather than the applications, but it is helpful to keep in mind that a high-dimensional packing is a tool for communicating over a noisy channel.
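A toy simulation shows the decoding principle at work. In this sketch (illustrative only; the codebook is a hypothetical chunk of a scaled integer grid in R^4, a poor packing but a valid code), signals at mutual distance at least 2ε are always recovered by nearest-codeword decoding when the noise is bounded by ε:

    import numpy as np

    rng = np.random.default_rng(3)
    eps = 0.4
    # hypothetical codebook: 3^4 points of the integer grid, scaled to spacing 2*eps
    code = np.array(list(np.ndindex(3, 3, 3, 3)), dtype=float) * (2 * eps)

    for _ in range(1000):
        s = code[rng.integers(len(code))]
        noise = rng.normal(size=4)
        noise *= rng.uniform(0.0, eps) / np.linalg.norm(noise)   # |noise| < eps
        decoded = code[np.argmin(np.linalg.norm(code - (s + noise), axis=1))]
        assert np.allclose(decoded, s)   # min distance 2*eps: decoding never fails
    print("1000 noisy signals decoded correctly")

A denser packing, such as the checkerboard lattice discussed later, would pack more codewords into the same ball at the same minimum distance, which is precisely the sense in which better packings mean higher transmission rates.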

3. Phenomena
Relatively little is understood about the sphere packing problem. One might hope for a systematic solution that works in every dimension, but that just doesn't seem possible. Instead, each dimension has its own idiosyncrasies. Getting a feeling for how R^8 differs from R^7 or R^9 is part of the charm of the subject, but these differences mean the packing problem is much more subtle than it sounds. In two or three dimensions, we can rely on our spatial intuition and summarize the procedure as "just do the obvious thing," but there is no obvious thing to do in R^n. Good constructions are known in low dimensions, and there is little doubt that humanity has found the optimal density through at least the first eight dimensions. However, we have absolutely no idea what the best high-dimensional packings look like. For example, we do not know whether to expect them to be ordered and crystalline, or disordered and pseudorandom. Many researchers expect disorder, perhaps on the grounds that this is the default when there is no reason to expect order. However, we lack the theoretical tools to analyze this question. All we know in general are upper and lower bounds for the optimal density, and these bounds are distressingly far apart. For example, in R^36 they differ by a multiplicative factor of 58: if you take the densest known packing in R^36, then the best we can say is that you couldn't fit in any more than 58 times as many spheres if you rearranged them. The ratio of the upper and lower bounds in R^n grows exponentially as n → ∞. At first, this gap sounds absurd. How could our bounds possibly be off by an exponential factor? One way to think about it is that volume scales exponentially in high dimensions, because the volume of a hypercube of side length ℓ in R^n is ℓ^n, which is exponential in n. If you take a packing in R^n and move the sphere centers 1% further apart, then you lower the density by a factor of 1.01^n. In low dimensions this factor is insignificant, but in high dimensions it is enormous. Thus, even a little bit of uncertainty in the sphere locations translates to an exponential uncertainty in the density. On a scale from one to infinity, a million is small, but we know almost nothing about sphere packing in a million dimensions. The best we can say is that the optimal density is at least a little larger than 2^{−1000000}. More generally, the following greedy argument gives a surprisingly easy lower bound of 2^{−n} in R^n. Consider a saturated packing in R^n, i.e., a packing such that no further spheres can be added without overlap. Such packings certainly exist, because one can

Figure 1. Any point not covered by the double-radius spheres could be used as the center of a new sphere (shaded above).

obtain a saturated packing by iteratively adding spheres as close to the origin as possible. Alternatively, there are saturated packings on flat tori because there is room for only finitely many spheres, and unrolling such a packing yields a saturated periodic packing in Euclidean space.

Proposition 3.1. Every saturated sphere packing in R^n has density at least 2^{−n}.

Proof. No point in R^n can have distance at least 2 from all the sphere centers in a saturated packing with unit spheres, because we could center a new sphere at such a point without creating any overlap (see Figure 1). In other words, doubling the radius of the spheres in a saturated packing would cover space completely. Doubling the radius increases the volume by a factor of 2^n, and so the original spheres must occupy at least a 2^{−n} fraction of R^n. Thus, every saturated packing has density at least 2^{−n}. □
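The greedy bound is easy to probe numerically in two dimensions. The sketch below approximates a saturated packing of unit disks on a torus by random sequential addition (only approximately saturated, since we stop after a fixed number of consecutive failed insertions) and compares the resulting density with 2^{−n} = 1/4:

    import numpy as np

    rng = np.random.default_rng(4)
    L = 30.0
    centers = np.empty((0, 2))
    fails = 0
    while fails < 5000:                          # stop after many consecutive failures
        p = rng.uniform(0, L, 2)
        d = centers - p
        d -= L * np.round(d / L)                 # periodic (toroidal) separation
        if len(centers) and (d**2).sum(axis=1).min() < 4.0:
            fails += 1                           # would overlap an existing disk
        else:
            centers = np.vstack([centers, p])
            fails = 0
    print(len(centers) * np.pi / L**2, 2.0**-2)  # roughly 0.5 versus the bound 0.25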

In R^1 there are saturated packings with density arbitrarily close to 1/2, but that is the only case in which Proposition 3.1 is sharp, because the bound is sharp exactly when the double-radius balls can tile R^n. One way to improve it is to prove a lower bound for how inefficient a sphere covering in R^n must be. For example, using the Coxeter-Few-Rogers theorem on sphere covering [29] improves the bound to e^{−3/2} n · 2^{−n} asymptotically. At first 2^{−n} sounds like a rather weak bound, which must be far from the truth. However, nobody has been able to obtain any exponential improvement to it, and perhaps it is closer to the truth than one would guess. In any case, it is nearly all we know regarding density lower bounds in high dimensions. A long sequence of improvements ground to a halt with Ball's bound of 2(n−1) · 2^{−n} in 1992 [7], before progress began again nearly twenty years later. Vance proved a lower bound asymptotic to 6n/e · 2^{−n} in 2011 [79], which improves on Ball's bound because e < 3, and Venkatesh followed that with a much larger constant-factor improvement as well as a bound proportional to n log log n · 2^{−n} for a certain sparse sequence of dimensions [80]. This last bound is particularly exciting because it is the first superlinear improvement on 2^{−n}, but on an exponential scale all of these improvements are small. For comparison, the best upper bound known is 2^{−(0.5990...+o(1))n}, due to Kabatiansky and Levenshtein [43] in 1978, with a constant-factor improvement by Cohn and Zhao [25] in 2014.

Note that the greedy argument is nonconstructive, and the same is true of the improvements mentioned above. For large n, all known bounds anywhere near 2^{−n} are nonconstructive, while every packing anyone has described explicitly is terrible in high dimensions. For example, one natural attempt is to center spheres of radius 1/2 at the integer lattice points Z^n. That yields a packing of density

π^{n/2} / ((n/2)! 2^n),

and the factorial in the denominator ruins the density. (Note that when n is odd, (n/2)! means Γ(n/2 + 1).) There are some patterns in low dimensions, but they quickly stop working. For example, natural generalizations of the face-centered cubic packing from R^3 work well in R^4 and R^5, but not in higher dimensions, as we will see in the next section. In R^10, the best packing known is based on a periodic arrangement with 40 spheres in each fundamental cell [28, p. 140]. Crystalline packings work beautifully in low dimensions, but they become increasingly difficult to find in high dimensions. Perhaps they just aren't optimal? It's natural to speculate about amorphous packings, but nobody really knows. In high dimensions, we can analyze only random or typical packings, and we simply do not know how close they are to the very best. One philosophical quandary is that too much structure seems to make high-dimensional packings bad, but the known lower bounds all rely on some sort of heavy structure. Vance's and Venkatesh's techniques give the best density, but they involve the most structure, namely lattices with nontrivial symmetry groups acting on them. The trade-off is that structure seemingly hurts density but helps in analyzing packings. The most remarkable packings are the E₈ root lattice in R^8 and the Leech lattice Λ₂₄ in R^24. They are incredibly symmetrical and dense packings of spheres, and they must be optimal, although this has not yet been proved.² What makes them exciting is that they turn out to be connected with many areas in mathematics and physics, such as string theory, hyperbolic geometry, and finite simple groups. See [28] and [34] for more information about these wonderful objects, as well as the next section for a construction of E₈.

4. Constructions
How can we form a sphere packing? The simplest structure we could use is a lattice, the integer span of n linearly independent vectors in R^n. In other words, given a basis v₁, ..., v_n, we center the spheres at the points

{a₁v₁ + a₂v₂ + ··· + a_nv_n | a₁, ..., a_n ∈ Z}.

The packing radius of a lattice is half the shortest nonzero vector length, since that is the largest radius for which the spheres do not overlap. Given a lattice basis v₁, ..., v_n, the corresponding fundamental cell is the parallelotope

{x₁v₁ + x₂v₂ + ··· + x_nv_n | x₁, ..., x_n ∈ [0, 1)}.

The translates of the fundamental cell by lattice vectors tile space. In a lattice packing, there is one sphere per translate of the fundamental cell, and the density is the volume ratio of the sphere and cell. More generally, we

²Until very recently, in [82] for $n = 8$ and [21] for $n = 24$.

More generally, we could form a periodic packing, which is the union of finitely many translates of a lattice. Equivalently, there can be several spheres per cell, which are then translated throughout space by the lattice vectors. There is no reason to believe that one sphere per cell is the best choice, and indeed periodic packings offer considerably more flexibility. One confusing issue is that physicists use the term "lattice" to mean periodic packing, while they call lattices "Bravais lattices." We will stick with the standard mathematical terminology.

There is no reason to believe that periodic packings achieve the greatest possible density. This is an open question above three dimensions, and it is plausibly false in high dimensions. However, periodic packings always come arbitrarily close to the optimal density. To see why, consider an optimal packing, and imagine intersecting it with a large box. If we try to repeat the part in the box periodically through space, then the only place overlap could occur is along the boundary of the box. We can fix any problems by removing the spheres next to the boundary. Shaving the packing in this way produces a periodic packing without overlap, at the cost of slightly lowering the density. The decrease in density becomes arbitrarily small if we use a sufficiently large box, and thus periodic packings come arbitrarily close to the optimal packing density.

By contrast, lattices probably do not approach the optimal density in high dimensions. The problem is that unlike periodic packings, lattices have limited flexibility. A lattice is completely determined by a basis, and thus a lattice in $\mathbb{R}^n$ can be specified by $n^2$ parameters (in fact, fewer if we take the quotient by rigid motions). Quadratically many parameters just don't give enough flexibility to fill all the gaps in an exponential amount of space. It's natural to guess that when $n$ is large enough, no lattice packing in $\mathbb{R}^n$ is ever saturated, but this conjecture remains out of reach.

The best sphere packings currently known are not always lattice packings ($\mathbb{R}^{10}$ is the first case in which lattices seem to be suboptimal), but many good packings are. The simplest lattice is $\mathbb{Z}^n$, but it is a lousy packing when $n > 1$, as discussed above. Instead, the "checkerboard" packing
\[ D_n = \{(x_1, \ldots, x_n) \in \mathbb{Z}^n \mid x_1 + \cdots + x_n \text{ is even}\} \]
is better for $n \geq 3$. In fact, $D_3$, $D_4$, and $D_5$ are the best packings known in their dimensions, and provably the best lattice packings (see [28] for more information). However, they are suboptimal for $n \geq 6$.

What goes wrong for $n \geq 6$ is that the holes in $D_n$ grow larger and larger. A hole in a lattice $\Lambda$ in $\mathbb{R}^n$ is a point in $\mathbb{R}^n$ that is a local maximum for distance from the nearest point in $\Lambda$. There are two classes of holes in $D_n$ for $n \geq 3$, represented by $(1, 0, \ldots, 0)$, which is at distance 1 from $D_n$, and $(1/2, 1/2, \ldots, 1/2)$, which is at distance
\[ \sqrt{\left(\tfrac{1}{2}\right)^2 + \cdots + \left(\tfrac{1}{2}\right)^2} = \sqrt{\frac{n}{4}}. \]

More generally, the translates of these points by $D_n$ are also holes, as are the translates of $(1/2, 1/2, \ldots, 1/2, -1/2)$. When $n > 4$ we call $(\pm 1, 0, \ldots, 0)$ a shallow hole in $D_n$ and $(1/2, \ldots, 1/2)$ a deep hole, because $\sqrt{n/4} > 1$. When $n$ is large, the depth $\sqrt{n/4}$ of a deep hole is enormous. For comparison, note that the spheres in the $D_n$ packing have radius $\sqrt{2}/2$, because the nearest lattice points are $(0, 0, \ldots, 0)$ and $(1, 1, 0, \ldots, 0)$, at distance $\sqrt{2}$. When $n$ is large, the holes are much larger than the spheres in the packing, and $D_n$ is not even saturated, let alone an optimal packing.

This transition occurs at dimension eight, and something wonderful happens right at the transition point. When $n = 8$, the radius $\sqrt{n/4}$ of a deep hole equals the distance $\sqrt{2}$ between adjacent lattice points. Thus, we can slip another copy of $D_8$ into the holes, which doubles the packing density, and the new spheres fit perfectly into place. The resulting packing is called the $E_8$ root lattice. This construction of $E_8$ appears asymmetric, with two different types of spheres, namely the original spheres and the ones that were added. However, they are indistinguishable, because $E_8$ is a lattice and thus all the spheres are equivalent under translation.
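To make the construction concrete, here is a short computational check (a sketch, not part of the text): in the usual coordinates, the shortest nonzero vectors of $E_8 = D_8 \cup (D_8 + (1/2, \ldots, 1/2))$ have length $\sqrt{2}$, and there are 240 of them.

```python
import itertools

# Enumerate the shortest vectors of E8 = D8 ∪ (D8 + (1/2, ..., 1/2)).
shortest = []
# Integer part: a squared-length-2 integer vector must have entries in {-1, 0, 1}.
for v in itertools.product((-1, 0, 1), repeat=8):
    if sum(v) % 2 == 0 and sum(x * x for x in v) == 2:
        shortest.append(v)  # the 112 vectors of shape (±1, ±1, 0, ..., 0)
# Glue part: v in D8 + (1/2, ..., 1/2) means v - (1/2, ..., 1/2) lies in D8,
# so v = (±1/2, ..., ±1/2) with an even number of minus signs; squared length 8/4 = 2.
for v in itertools.product((-0.5, 0.5), repeat=8):
    if v.count(-0.5) % 2 == 0:
        shortest.append(v)  # the 128 half-integer vectors
print(len(shortest))  # 240, the kissing number in R^8
```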

Exercise 4.1. Check that $E_8$ is in fact a lattice.

The $E_6$ and $E_7$ lattices are certain cross sections of $E_8$. The $E_6$, $E_7$, and $E_8$ lattices are the densest lattice packings in $\mathbb{R}^6$ through $\mathbb{R}^8$, and they are almost certainly the densest sphere packings.

The Leech lattice $\Lambda_{24}$ in $\mathbb{R}^{24}$ is similar in spirit, but with a more elaborate construction. See [34] for an elegant treatment of the Leech lattice, as well as the theory of root lattices.

The kissing number in $\mathbb{R}^n$ is the greatest number of spheres that can touch a central sphere, if they all have the same size and cannot overlap except tangentially. It is known to be 6 in $\mathbb{R}^2$, 12 in $\mathbb{R}^3$, 24 in $\mathbb{R}^4$, 240 in $\mathbb{R}^8$, and 196560 in $\mathbb{R}^{24}$, but is not known in any other dimensions. The case of $\mathbb{R}^2$ is easy, but $\mathbb{R}^3$ is not [68], and $\mathbb{R}^4$ is yet more difficult [57]. Surprisingly, $\mathbb{R}^8$ and $\mathbb{R}^{24}$ are quite a bit simpler than $\mathbb{R}^3$ or $\mathbb{R}^4$ are [61, 52], and we will settle them in the fourth lecture.

Exercise 4.2. What are the shortest nonzero vectors in $D_n$? In $E_8$? This will give optimal kissing configurations in $\mathbb{R}^3$, $\mathbb{R}^4$, and $\mathbb{R}^8$.

Exercise 4.3. The vertices of a cross polytope centered at the origin in $\mathbb{R}^n$ consist of $n$ pairs of orthogonal vectors of the same length (it's a generalized octahedron). Show how to decompose the vertices of a hypercube in $\mathbb{R}^4$ into two cross polytopes. Find a symmetry of the hypercube that interchanges them.

Exercise 4.4. Show how to decompose the minimal vectors in $D_4$ into three disjoint cross polytopes, and find a symmetry of $D_4$ that cyclically permutes these cross polytopes.

This symmetry is called triality, and it makes $D_4$ more symmetrical than any of its siblings. When $n \neq 4$, the symmetries of $D_n$ are simply permutations and sign changes of the coordinates, while $D_4$ has all those plus triality.

5. Difficulty of sphere packing

Why is the sphere packing problem hard? There are several reasons for this. One is that there are many local optima. For example, among lattices in $\mathbb{R}^8$, there are 2408 local maxima for density [75]. This number seems to grow rapidly in high dimensions, and it means the structure of the space of packings is complicated.

There is lots of space to move in, with complicated geometrical configurations, and it is difficult to rule out implausible configurations rigorously. To get a feeling for the difficulties, it is useful to think about the geometry of high dimensions. Let’s start by looking at the n-dimensional cube

\[ \{(x_1, \ldots, x_n) \mid |x_i| \leq 1 \text{ for all } i\} \]
of side length 2. It has $2^n$ vertices $(\pm 1, \ldots, \pm 1)$, each at distance $\sqrt{1^2 + \cdots + 1^2} = \sqrt{n}$ from the center. When $n = 10^6$, the number of vertices is absurdly large, and they are each 1000 units from the center, despite the fact that the side length is only 2. These facts are amazingly different from our intuition in low dimensions. I like to imagine the vertices as vast numbers of tiny fingers stretching out from the center of the cube. I find it difficult to imagine that the result is convex, but somehow a million dimensions has enough space to accommodate such a convex body. The reason why cubes pack much better than spheres is that the vertices stick out far enough to fill in all the gaps.

One of the most important insights in high-dimensional geometry is the following principle: almost all the volume of a high-dimensional body is concentrated near its boundary. To see why, imagine shrinking such a body by 1%, leaving just a thin fringe near the boundary. The volume of the shrunken copy is lower by a factor of $(99/100)^n$, which tends exponentially to zero as $n \to \infty$. Thus, virtually all of the volume lies in that boundary fringe. There is of course nothing special about 1%. The appropriate shrinkage scale in $\mathbb{R}^n$ to capture a constant fraction of the volume is on the order of $1/n$, because $(1 - c/n)^n$ converges to $e^{-c}$.

Boundaries, where all the interaction takes place, become increasingly important as dimension rises. This helps explain the difficulty of sphere packing, because avoiding overlap is all about interaction along boundaries. This principle reverses our intuition from low dimensions. We typically think of boundaries as small and exceptional, but in high dimensions there's practically nothing but boundary, and this changes everything. For example, suppose we are analyzing a numerical algorithm that uses many variables and thus operates in a high-dimensional space. If it works efficiently throughout a certain region except near the boundary, then that sounds good until we realize that almost all the region is near the boundary.
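These volume fractions are easy to check numerically; here is a tiny sketch (not part of the text) illustrating both shrinkage scales.

```python
# Fraction of volume remaining after shrinking a body in R^n by a factor 1 - eps.
for n in (10, 100, 10_000, 10**6):
    print(n, 0.99 ** n, (1 - 1.0 / n) ** n)
# The 1% shrinkage factor (99/100)^n tends to 0 exponentially,
# while shrinking by 1/n leaves a fraction approaching 1/e ≈ 0.3679.
```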

6. Finding dense packings

In this section we'll examine how record-setting sphere packings can be found. The high- and low-dimensional cases are handled very differently in practice. First, we'll look at the averaging techniques used in high dimensions, and then we'll briefly discuss how computer searches can be used in low dimensions.

The key technique used in the most recent papers in high dimensions [79, 80] is the Siegel mean value theorem [74], which lets us average suitable functions over the space of lattices. To carry out such an averaging we need a probability measure, and indeed there is a canonical probability measure on lattices with fixed determinant (i.e., fundamental cell volume). Specifically, it's the unique $\mathrm{SL}_n(\mathbb{R})$-invariant probability measure on this space. The existence of an $\mathrm{SL}_n(\mathbb{R})$-invariant measure follows from general results on Haar measure [58], but it takes a calculation to show that it has finite volume and can thus be normalized to yield a probability measure.

Once we have this probability measure on lattices, we can ask various statistical questions. For example, what does the average pair correlation function look like? In other words, what can we say about the average number of neighbors at each distance in a random lattice? The Siegel mean value theorem says that these pair correlations are exactly the same as for a Poisson distribution (i.e., uniformly scattered points). More precisely, it says that for a sufficiently well-behaved function $f : \mathbb{R}^n \to \mathbb{R}$ with $n > 1$, the average of
\[ \sum_{x \in \Lambda \setminus \{0\}} f(x) \]
over all lattices $\Lambda$ of determinant 1 equals
\[ \int_{\mathbb{R}^n} f(x)\,dx. \]
Intuitively, averaging over a random lattice blurs the sum into an integral.

The reason why the Siegel mean value theorem holds is that there is enough symmetry to rule out any other possible answer. Specifically, by linearity the answer must be $\int f\,d\mu$ for some measure $\mu$ on $\mathbb{R}^n \setminus \{0\}$ that is invariant under $\mathrm{SL}_n(\mathbb{R})$. There is only one such measure up to scaling when $n > 1$ (given a few mild hypotheses), and some consistency checks determine the constant of proportionality. The meta principle here is that averaging over all possible structures is the same as having no structure at all. Of course this is not always true. It generally depends on invariance under the action of a large enough group, and $\mathrm{SL}_n(\mathbb{R})$ is more than large enough.

It is not hard to deduce lower bounds for sphere packing density from the Siegel mean value theorem. The following proposition is far from the state of the art, but it illustrates the basic technique.

Proposition 6.1. The sphere packing density in $\mathbb{R}^n$ is at least $2 \cdot 2^{-n}$.

Proof. Let $B$ be a ball of volume 2 centered at the origin. For a random lattice of determinant 1, the expected number of nonzero lattice points in $B$ is $\mathrm{vol}(B) = 2$, by applying the Siegel mean value theorem to the characteristic function of $B$. These lattice points come in pairs (negatives of each other), so the number is always even. Since the average number is 2 and some lattices have many, other lattices must have none. Such a lattice gives a packing with one copy of $B/2$ per unit volume and density
\[ \mathrm{vol}(B/2) = \frac{\mathrm{vol}(B)}{2^n} = 2 \cdot 2^{-n}, \]
as desired. □

Vance's key idea [79] builds on the extra factor of 2 that arises because lattice vectors occur in pairs of the same length. What if we impose additional symmetry? The intuition is that the average number of neighbors remains the same, but now they occur in bigger clumps, and so the chances of no nearby neighbors go up. Vance used lattices with quaternion algebras acting on them, and Venkatesh [80] obtained even stronger results by using cyclotomic fields.

Is this the best we can do? Only certain symmetry groups work here: we need a big centralizer to get enough invariance for the Siegel mean value theorem proof, and only division algebras will do. Cyclotomic fields are the best division algebras for this purpose [56]. Other sorts of groups will distort the pair correlation function away from Poisson statistics, but that could be good or bad. The area is wide open, and it is unclear which sorts of constructions might help.

In low dimensions, one can obtain much better results through numerical searches by computer. Several recent papers [45, 55, 44] have taken this approach and recovered the densest lattices known in up to 20 dimensions. So far the computer searches have not yielded anything new, but they seem to be on the threshold of doing so. Can we push the calculations further, to unknown territory? What about periodic packings?

7. Computational problems

Lattices may sound down to earth, but they are full of computational difficulties. For example, given a lattice basis it is hard to tell how dense the corresponding sphere packing is. The difficulty is that to compute the density, we need to know both the volume of a fundamental cell and the packing radius of the lattice. The former is just the absolute value of the determinant of a basis matrix, which is easy to compute, but computing the packing radius is not easy.

We can see why as follows. Recall that the packing radius is half the shortest nonzero vector length in the lattice. The problem is that the basis vectors may not be the shortest vectors in the lattice, because some linear combination of them could be much shorter. There are exponentially many linear combinations that could work, and there is no obvious way to search efficiently. In fact, computing the shortest vector length is NP-hard [1]. In other words, many other search problems can be reduced to it. No proof is known that it cannot be solved efficiently (this is the famous problem of whether P = NP), but that is almost certainly the case.

There are good algorithms for "lattice basis reduction," such as the LLL algorithm [51, 60], and they produce pretty short vectors. These vectors are generally far from optimal, but they are short enough for some applications, particularly in relatively low dimensions.

Shortest vector problems and their relatives come up in a surprising range of topics. One beautiful application is cryptography. We'll briefly discuss the Goldreich-Goldwasser-Halevi cryptosystem [38], which turns out to have weaknesses [59] but is a good illustration of how lattice problems can be used to build cryptosystems. It's a public key cryptosystem, in which the public key is a basis for a high-dimensional lattice, while the private key is a secret nearly orthogonal basis for the same lattice, which makes it easy to find the nearest lattice point to any given point in space (while this problem should be hard for anyone who does not know the secret basis). We encode messages as lattice points. Anyone can encrypt a message by adding a small random perturbation, thereby moving it off the lattice. Decryption requires finding the nearest lattice point, which has no obvious solution without the private key. As mentioned above, this system is not as secure as it was intended to be [59], but there are other, stronger lattice-based systems. See [63] for a survey of recent work in this area.

Recognizing algebraic numbers is a rather different sort of application. The number
\[ \alpha = -7.82646099323767402929927644895 \]
is a 30-digit approximation to a root of a fifth-degree polynomial equation. Which equation is it? Of course there are infinitely many answers, but Occam's razor suggests we should seek the simplest one. One interpretation of "simplest" is that the coefficients should be small. For comparison, 0.1345345345345345345345345345345 is clearly an approximation to $1/10 + 345/9990$, and no other answer is nearly as satisfying. To identify the number $\alpha$ given above, let $C = 10^{20}$ (chosen based on the precision of $\alpha$), and look at the lattice generated by the vectors

\[ v_0 = (1, 0, 0, 0, 0, 0, C), \]

\[ v_1 = (0, 1, 0, 0, 0, 0, C\alpha), \]
\[ v_2 = (0, 0, 1, 0, 0, 0, C\alpha^2), \]
\[ v_3 = (0, 0, 0, 1, 0, 0, C\alpha^3), \]
\[ v_4 = (0, 0, 0, 0, 1, 0, C\alpha^4), \]
\[ v_5 = (0, 0, 0, 0, 0, 1, C\alpha^5). \]
The lattice vectors are given by
\[ a_0 v_0 + \cdots + a_5 v_5 = \left( a_0, a_1, a_2, a_3, a_4, a_5, C \sum_{i=0}^{5} a_i \alpha^i \right) \]
with $a_0, \ldots, a_5 \in \mathbb{Z}$. Such a vector is small when the coefficients $a_i$ are small and the sum $\sum_{i=0}^{5} a_i \alpha^i$ is tiny, since $C$ is huge. Thus, finding a short vector amounts to finding a polynomial $\sum_{i=0}^{5} a_i x^i$ with small coefficients such that $\alpha$ is nearly a root. If we search for a short vector using the LLL algorithm, we find
\[ (71, -5, 12, -19, 13, 2, 0.000004135\ldots). \]
This tells us that
\[ 71 - 5\alpha + 12\alpha^2 - 19\alpha^3 + 13\alpha^4 + 2\alpha^5 \approx 0. \]
(More precisely, it is about $0.000004135/C \approx 4 \cdot 10^{-26}$.) In fact, this is the equation I used to generate $\alpha$. More generally, we can use lattices to find integral linear relations between any real numbers, not just powers of $\alpha$. I find it really remarkable that the same sort of mathematics arises in this problem as in communication over a noisy channel.
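For readers who want to experiment, the same relation can be recovered with an integer-relation algorithm. The sketch below (not from the text) uses PSLQ, a close relative of the LLL-based method described above, as implemented in the mpmath library.

```python
from mpmath import mp, mpf, pslq

mp.dps = 30  # work at 30 digits, matching the precision of alpha
alpha = mpf('-7.82646099323767402929927644895')

# Search for small integers a_0, ..., a_5 with a_0 + a_1*alpha + ... + a_5*alpha^5 ≈ 0.
relation = pslq([alpha ** i for i in range(6)], maxcoeff=1000)
print(relation)  # expect [71, -5, 12, -19, 13, 2], up to an overall sign
```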

LECTURE 2
Symmetry and ground states

1. Introduction

One of the beautiful phenomena in sphere packing is the occurrence of spontaneous order. There seems to be no reason to expect that an optimal sphere packing should be highly structured, but this happens time and again, with the precise structure being difficult to predict a priori.

These questions of order vs. disorder fit into a broader context. Where do symmetry and structure come from? László Fejes Tóth played an important role in formulating and attracting attention to this question. He drew a distinction between the systematology of the regular figures, which amounts to classifying the possible symmetries that could occur, and the genetics of the regular figures, which studies when and why they do occur. He sought to explain the genetics of the regular figures via optimization principles, and he made considerable progress towards this goal. In his vision [36, p. x], "regular arrangements are generated from unarranged, chaotic sets by the ordering effect of an economy principle, in the widest sense of the word."

Typically the optimization problem has certain symmetries, but it is far from obvious when its solutions will inherit these symmetries. Steiner trees are an attractive illustration of this issue. What is the minimal-length path connecting the vertices of a square? One obvious guess is an X, which inherits all the symmetries of the square:

However, the optimal solution turns out to look like this, or its rotation by $90^\circ$:

There is partial symmetry breaking, in that the set of all solutions is of course invariant under the full symmetry group of the square, but each individual solution is invariant under just a subgroup. This behavior occurs generically for optimization problems. For example, in the sphere packing problem the full symmetry group of the optimization problem consists of all rigid motions of Euclidean space, while each optimal sphere packing

will be invariant under a much smaller subgroup, consisting of just a discrete set of motions. The difficulty lies in predicting what that subgroup will be. Which materials crystallize beautifully, and which remain amorphous?

From this perspective, we would like to understand which optimization problems admit highly symmetrical solutions, such as lattices or regular polytopes. Can we explain why $E_8$ and the Leech lattice are so much more symmetrical than the best packing known in $\mathbb{R}^{10}$?

2. Potential energy minimization

There's no hope of developing a comprehensive theory of symmetry in optimization problems, because optimization is just too broad a topic. If you choose an arbitrary function to optimize, then you can make literally anything happen to the optima. To make progress, we must restrict the class of functions under consideration. In this lecture we will take a look at point particles with pairwise forces acting on them. Given a collection of particles interacting according to some potential function, what do they do?

For example, the Thomson problem deals with charged particles on the surface of the unit sphere $S^2$ in $\mathbb{R}^3$. Each pair of particles at Euclidean distance $r$ has potential energy $1/r$, and the total potential energy is the sum over all the pairs. The simplest question is what the ground states are. In other words, what are the minimal-energy configurations? They describe the behavior of the system at zero temperature. This is a simple question, but the ground states in the Thomson problem are far from obvious, and in fact not fully known in general.

More generally, we can ask about dynamics or the behavior at positive temperature. These questions are more subtle, and we will generally restrict our attention to ground states. After all, if we can't even understand the ground states, then there is little hope of analyzing anything more involved than that.

Before we restrict our attention to ground states, though, it's worth putting everything in the context of Gibbs measures. They are a canonical way of putting a probability measure on the states of a system based on nothing except their energies and the system's temperature. Of course one can't possibly capture the behavior of every system based on so little information, but Gibbs measures do a good job of describing a system that is in equilibrium with a heat bath (a neighboring system that is so much larger that its temperature is unaffected by the smaller system).

For simplicity, imagine that our system has only $n$ states, labeled 1 through $n$, where state $i$ has energy $E_i$. (To handle continuous systems we can simply replace sums over states with integrals.) If we are given the average energy $E$ of the system, we determine the corresponding probability distribution on the states by finding probabilities $p_1, \ldots, p_n$ so that $\sum_i p_i E_i = E$ and the entropy $-\sum_i p_i \log p_i$ is maximized, where we interpret $0 \log 0$ as 0. In other words, the system is as disordered as possible, subject to having a certain average energy. It is not difficult to solve this optimization problem via Lagrange multipliers, and the result is that $\log(1/p_i) = \alpha + \beta E_i$ for some constants $\alpha$ and $\beta$. Thus, we can write
\[ p_i = \frac{e^{-\beta E_i}}{Z}, \]
where the partition function $Z = \sum_i e^{-\beta E_i}$ ensures that $\sum_i p_i = 1$ (it is also $e^{\alpha}$). Such a probability distribution is called a Gibbs distribution.

In physics terms, $\beta$ turns out to be proportional to the reciprocal of temperature. As the temperature tends to zero, $\beta$ tends to infinity and the Gibbs distribution becomes concentrated on the ground states. As the temperature tends to infinity, $\beta$ tends to zero and the Gibbs distribution becomes equidistributed among all the states.

One question we have not yet addressed is why $-\sum_i p_i \log p_i$ deserves the name entropy. In fact, it is an excellent measure of disorder, essentially because it measures how surprising the probability distribution is on average. Consider how surprised we should be by an event of probability $p$.
Call this surprise function $S(p)$, and think of it as a measure of how much you learn from seeing this event happen. (Information theory makes this intuition precise.) Clearly $S$ should be a decreasing function: the higher the probability is, the less surprising it is and the less you learn from seeing it happen. Furthermore, we should have $S(pq) = S(p) + S(q)$. In other words, the amount you learn from independent events is additive. This makes good sense intuitively: if you learn one bit of information from a coin flip, then you learn two bits from two independent coin flips. These conditions uniquely determine the function $S$ up to a constant factor, as $S(p) = -\log p$. Now the entropy is $\sum_i p_i S(p_i)$, and this quantity measures disorder by telling us how surprised we'll be on average by the outcome.

Part of the beauty of mathematics is that concepts are connected in ways one would never guess. Gibbs measures are not just a construction from statistical physics, but rather occur throughout mathematics. For example, Dyson recognized that they describe eigenvalues of random matrices [33], as follows. Haar measure gives a canonical probability measure on the unitary group $U(n)$. What does a random $n \times n$ unitary matrix chosen from this distribution look like? It has $n$ eigenvalues $z_1, \ldots, z_n$ on the unit circle, and the Weyl integral formula tells us that the probability density function for these eigenvalues is proportional to
\[ \prod_{i<j} |z_i - z_j|^2. \]
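Returning to the Gibbs distribution $p_i = e^{-\beta E_i}/Z$ defined above, the two temperature limits are easy to see numerically; here is a minimal sketch (not from the text) for a hypothetical three-state system.

```python
import math

energies = [0.0, 1.0, 2.0]  # a made-up three-state system

def gibbs(beta):
    weights = [math.exp(-beta * E) for E in energies]
    Z = sum(weights)                # the partition function
    return [w / Z for w in weights]

print(gibbs(0.01))  # beta -> 0 (high temperature): nearly equidistributed
print(gibbs(20.0))  # beta -> infinity (zero temperature): concentrated on the ground state
```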

3. Families and universal optimality

Given that we are going to study particles interacting via pairwise potential functions, what do we hope to learn from it? There are many possibilities:

(1) We may care about the ground states for their own sake, as part of pure mathematics or physics (see [11] for many examples in physics).
(2) We may seek a highly uniform point distribution so that we can discretize the ambient space.
(3) We may wish to construct error-correcting codes by letting the codewords repel each other, so that they become well separated.
(4) We may seek well-distributed sample points for numerical integration.

To account for these and other goals, we will have to look at a broad range of potential functions. There are also many spaces we could work in, such as spheres, projective spaces, Grassmannians, Euclidean spaces, hyperbolic spaces, and even discrete spaces such as the Hamming cube $\{0,1\}^n$. All of these possibilities are interesting, but in this lecture we will focus on spheres. (For comparison, [26] and [22] examine spaces that are rather different from spheres.)

Thus, we will focus on the question of what energy minima on spheres look like for a variety of potential functions. As we vary the potential function, how do the optimal configurations change? They vary in some family, and we would like to understand these families. Note that our perspective here is broader than is typical for physics, where the potential function is usually fixed in advance.

The simplest case is that the optimal configurations never vary, at least for reasonable potential functions, such as inverse power laws.¹ For example, 4 points on $S^2$ always form a regular tetrahedron. Abhinav Kumar and I named this property universal optimality [18]. More generally, we can ask for a parameter count for the family, which is 0 for universal optima. As we vary the potential function (say, among all smooth functions), what is the dimension of the space of configurations attained as ground states? There is little hope of proving much about this quantity in general. However, we can try to estimate it from numerical data [8].

These parameter counts can be difficult to predict, because they take into account how well the number of points accommodates different sorts of symmetry. For example, 44 points on $S^2$ vary in a one-parameter family near the putative Coulomb minimizer when we perturb the potential function, while 43 points vary in a 21-parameter family. See Figure 1 for an illustration. What this means is that the 44-point configuration is nearly determined by symmetry, with just one degree of freedom remaining to be specified by the choice of potential function, while the 43-point configuration is far more complex.

To give a precise definition of universal optimality, we must specify the class of potential functions. For a finite subset $\mathcal{C} \subset S^{n-1}$ and a function $f : (0, 4] \to \mathbb{R}$, we define the energy of $\mathcal{C}$ with respect to the potential function $f$ to be
\[ E_f(\mathcal{C}) = \frac{1}{2} \sum_{\substack{x, y \in \mathcal{C} \\ x \neq y}} f\left(|x - y|^2\right). \]
The factor of $1/2$ simply corrects for counting each pair twice and is not important.

¹Of course it is impossible for a configuration of more than one point to be a ground state for literally every potential function, since minimizing $f$ is the same as maximizing $-f$. We must restrict the class of potential functions at least somewhat.

[Figure 1 panels: left, 43 points, 21 parameters, Klein four-group symmetry (fixed point = double circle, orbit black); right, 44 points, 1 parameter, cubic symmetry (square faces shaded).]

Figure 1. Putative ground states for Coulomb energy on $S^2$, and the number of parameters for the families they lie in.

The use of squared Euclidean distance similarly doesn't matter in principle, since the squaring could be incorporated into the potential function, but it turns out to be a surprisingly useful convention.

A function $f$ is completely monotonic if it is infinitely differentiable and

\[ (-1)^k f^{(k)} \geq 0 \]
for all $k \geq 0$ (i.e., its derivatives alternate in sign, as in inverse power laws). We say $\mathcal{C}$ is universally optimal if it minimizes $E_f(\mathcal{C})$ for all completely monotonic $f$, compared with all $|\mathcal{C}|$-point configurations on $S^{n-1}$.

It's not obvious that completely monotonic functions are the right class of functions to use, but they turn out to be. The fact that $f$ is decreasing means the force is repulsive, and convexity means the force grows stronger at short distances. Complete monotonicity is a natural generalization of these conditions, and the results and examples in [18] give evidence that it is the right generalization (see pages 101 and 107–108). Note in particular that inverse power laws are completely monotonic, so universal optima must minimize energy for all inverse power laws.

In the circle $S^1$, there is a universal optimum of each size, namely the regular polygon. This is not as straightforward to prove as it sounds, but it follows from Theorem 1.2 in [18], which we will state as Theorem 3.3 in the fourth lecture. In $S^2$, the complete list of universal optima with more than one point is as follows:

(1) Two antipodal points (2 points)
(2) Equilateral triangle on equator (3 points)
(3) Regular tetrahedron (4 points)
(4) Regular octahedron (6 points)
(5) Regular icosahedron (12 points)

See Figure 2. Universal optimality again follows from Theorem 1.2 in [18] (after special cases were proved in [83, 46, 2, 47, 3]), while completeness follows from a theorem of Leech in [50].
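Energies of explicit configurations are easy to compute directly from the definition of $E_f$. The following is a small sketch (not part of the text), using the Coulomb potential written in terms of squared distance.

```python
import numpy as np

def energy(points, f):
    """E_f(C): sum f(|x - y|^2) over unordered pairs x != y."""
    C = np.asarray(points, dtype=float)
    total = 0.0
    for i in range(len(C)):
        for j in range(i + 1, len(C)):  # each unordered pair once, so no 1/2 factor
            total += f(np.sum((C[i] - C[j]) ** 2))
    return total

coulomb = lambda r: r ** -0.5  # f(|x - y|^2) = 1/|x - y|

# The regular octahedron, a universal optimum for 6 points on S^2.
octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
print(energy(octahedron, coulomb))  # 12/sqrt(2) + 3/2 ≈ 9.9853
```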


Figure 2. Platonic solids whose vertices form universally optimal codes: the tetrahedron (4 vertices), octahedron (6 vertices), and icosahedron (12 vertices).

The cube and regular dodecahedron are conspicuously missing from this list. The cube cannot be universally optimal, because rotating one face moves its corners further from those of the opposite face, and the dodecahedron fails similarly. Square and pentagonal faces are not particularly favorable shapes for energy minimization, although cubes and dodecahedra can occur for unusual potential functions [20].

Five points are the first case without universal optimality, and they are surprisingly subtle. There are two natural ways to arrange the particles: we could include the north and south poles together with an equilateral triangle on the equator (a triangular bipyramid), or the north pole together with a square at constant latitude in the southern hemisphere (a square pyramid). The square pyramid lies in a one-parameter family, where the latitude of the square depends on the choice of potential function. By contrast, the triangular bipyramid is in equilibrium for every potential function, but it becomes an unstable equilibrium for steep inverse power laws (a numerical comparison of the two configurations is sketched below, after the footnote).

Conjecture 3.1. For each completely monotonic potential function, either the triangular bipyramid or a square pyramid minimizes energy for 5 points in $S^2$.

This conjecture really feels like it ought to be provable. Specifying five points on $S^2$ takes ten degrees of freedom, three of which are lost if we take the quotient by symmetries. Thus, we are faced with a calculus problem in just seven variables. However, despite a number of partial results [32, 69, 10, 70], no complete solution is known.

The known universal optima in spheres are listed in Table 1. Each of them is an exciting mathematical object that predates the study of universal optimality. For example, the 27 points in $\mathbb{R}^6$ correspond to the classical configuration of 27 lines on a cubic surface. One way of thinking about universal optimality is that it highlights similarities between various exceptional structures and helps characterize what's so special about them. See [18] for descriptions of these objects and how they are related. We'll discuss the proof techniques in the fourth lecture, while [18] contains detailed proofs.

One important source of universal optima is regular polytopes, the higher-dimensional generalizations of the Platonic solids. As in three dimensions, only some of them are universally optimal, specifically the ones with simplicial facets.²

²Surprisingly, the minimal vectors of $D_4$ are not universally optimal [15], despite their beauty and symmetry. They are the vertices of a regular polytope with octahedral facets, called the regular 24-cell.
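As a numerical illustration of Conjecture 3.1 (a sketch, not from the text), one can compare the Coulomb energy of the triangular bipyramid with the best square pyramid found by scanning over the latitude of the square:

```python
import numpy as np

def coulomb_energy(points):
    P = np.asarray(points, dtype=float)
    E = 0.0
    for i in range(len(P)):
        for j in range(i + 1, len(P)):
            E += 1.0 / np.linalg.norm(P[i] - P[j])
    return E

# Triangular bipyramid: both poles plus an equilateral triangle on the equator.
bipyramid = [(0, 0, 1), (0, 0, -1)] + \
            [(np.cos(2*np.pi*k/3), np.sin(2*np.pi*k/3), 0) for k in range(3)]

# Square pyramid: north pole plus a square at latitude z in the southern hemisphere.
def square_pyramid(z):
    r = np.sqrt(1 - z*z)
    return [(0, 0, 1)] + [(r*np.cos(np.pi*k/2), r*np.sin(np.pi*k/2), z) for k in range(4)]

best = min(coulomb_energy(square_pyramid(z)) for z in np.linspace(-0.99, 0, 1000))
print(coulomb_energy(bipyramid), best)  # ≈ 6.4747 vs ≈ 6.4837: the bipyramid wins here
```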

Table 1. Known universal optima with $N$ points on $S^{n-1}$.

  n                N               Description
  2                N               N-gon
  n                N ≤ n + 1       simplex (generalized tetrahedron)
  n                2n              cross polytope (generalized octahedron)
  3                12              icosahedron
  4                120             regular 600-cell
  5                16              hemicube
  6                27              Schläfli graph
  7                56              equiangular lines
  8                240             E8 root system
  21               112             isotropic subspaces
  21               162             strongly regular graph
  22               100             Higman-Sims graph
  22               275             McLaughlin graph
  22               891             isotropic subspaces
  23               552             equiangular lines
  23               4600            kissing configuration of next line
  24               196560          Leech lattice minimal vectors
  q(q³+1)/(q+1)    (q+1)(q³+1)     isotropic subspaces (q is a prime power)

The shortest vectors in the $E_8$ lattice (called the $E_8$ root system) also form a universally optimal configuration, as do the shortest vectors in the Leech lattice. It is difficult to depict high-dimensional objects on a two-dimensional page, but Figure 3 shows how the $E_8$ root system appears when viewed from random directions. It is so regular and symmetrical that even these random views display considerable structure. For comparison, Figure 4 shows similar projections of a random point configuration.

In up to 24 dimensions, all of the known universal optima are regular polytopes or cross sections of the $E_8$ or Leech configurations. However, the last line of Table 1 shows that there are more examples coming from finite geometry. It's not plausible that Table 1 is the complete list of universal optima, and in fact [8] constructs two conjectural examples (40 points in $\mathbb{R}^{10}$ and 64 points in $\mathbb{R}^{14}$), but it seems difficult to find or analyze further universal optima. The gap between 8 and 21 dimensions in Table 1 is puzzling. Are the dimensions in between not favored by universal optimality, or do we just lack the imagination to construct new universal optima in these dimensions?

4. Optimality of simplices

It is not difficult to explore energy minimization via numerical optimization, but it is far from obvious how to prove anything about it. Developing proof techniques will occupy most of the remaining lectures, and we will start here by analyzing regular simplices, i.e., configurations of equidistant points.

In particular, we will study the spherical code problem: how can we maximize the closest distance between $N$ points on the unit sphere $S^{n-1}$?

Figure 3. Four views of the $E_8$ root system, after orthogonal projection onto randomly chosen planes.

Figure 4. A random 240-point configuration in $S^7$, orthogonally projected onto randomly chosen planes.

This is an important problem in both geometry and information theory.³ It is a version of the sphere packing problem in spherical geometry, i.e., for spherical caps on the surface of a sphere. Furthermore, it is a degenerate case of energy minimization. If we look at the limit of increasingly steep potential functions, then asymptotically only the minimal distance matters and we obtain an optimal spherical code.

³If we represent radio signals by vectors in $\mathbb{R}^n$ by measuring the amplitudes at different frequencies, then the squared vector length is proportional to the power of the radio signal. If we transmit a constant-power signal, then we need an error-correcting code on the surface of a sphere, i.e., a spherical code.

When $N \leq n + 1$, we will see shortly that the optimal solution is a regular simplex. In other words, the points are all equidistant from each other, forming an $n$-dimensional analogue of the equilateral triangle or regular tetrahedron. The cutoff at $n + 1$ simply reflects the fact that $\mathbb{R}^n$ cannot contain more than $n + 1$ equidistant points.

Let $\langle x, y \rangle$ denote the inner product of $x$ and $y$. Inner products can be used to measure distances on the unit sphere, since
\[ |x - y|^2 = \langle x - y, x - y \rangle = |x|^2 + |y|^2 - 2\langle x, y \rangle = 2 - 2\langle x, y \rangle \]
when $|x| = |y| = 1$. Thus, maximizing the distance $|x - y|$ is equivalent to minimizing the inner product $\langle x, y \rangle$.

Note that if $x_1, \ldots, x_N$ are unit vectors forming the vertices of a regular simplex centered at the origin, then all the inner products between them must be $-1/(N-1)$. To see why, observe that $x_1 + \cdots + x_N = 0$ and hence
\[ 0 = |x_1 + \cdots + x_N|^2 = N + \sum_{i \neq j} \langle x_i, x_j \rangle, \]
while all the $N(N-1)$ inner products in this sum are equal. This calculation already contains all the ingredients needed to prove that regular simplices are optimal spherical codes:

Proposition 4.1. If $N \leq n + 1$, then the unique optimal $N$-point spherical code in $S^{n-1}$ is the regular simplex centered at the origin.

Of course it is unique only up to rigid motions.

Proof. Suppose $x_1, \ldots, x_N$ are points on $S^{n-1}$. The fundamental inequality we'll use is
\[ \left| \sum_{i=1}^{N} x_i \right|^2 \geq 0. \]
Using $|x_i| = 1$, this inequality expands to
\[ N + \sum_{i \neq j} \langle x_i, x_j \rangle \geq 0, \]
which amounts to
\[ \frac{1}{N(N-1)} \sum_{i \neq j} \langle x_i, x_j \rangle \geq -\frac{1}{N-1}. \]
In other words, the average inner product is at least $-1/(N-1)$, and hence the maximal inner product (which corresponds to the minimal distance) must be at least that large. Equality holds iff all the inner products are the same and $\sum_i x_i = 0$. This condition is equivalent to all the points being equidistant with centroid at the origin, which can be achieved iff $N \leq n + 1$. □

Exercise 4.2. Prove that regular simplices are universally optimal, and more generally that they minimize $E_f$ for every decreasing, convex potential function $f$.

Our discussion here may give the impression that the existence of regular simplices is trivial, while their optimality is a little more subtle. This impression is reasonable for Euclidean space, but in projective spaces or Grassmannians the existence of regular simplices is far more mysterious. See, for example, [22].
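It is easy to check the inner products of a regular simplex numerically. The sketch below (not from the text, with an arbitrary choice of $N$) builds $N$ unit vectors by projecting the standard basis of $\mathbb{R}^N$ onto the hyperplane of zero coordinate sum.

```python
import numpy as np

N = 5
# Project the standard basis onto the hyperplane { x : coordinates sum to 0 };
# the rows then have centroid 0 and equal pairwise inner products.
E = np.eye(N) - np.ones((N, N)) / N
X = E / np.linalg.norm(E, axis=1, keepdims=True)  # rescale rows to unit vectors
G = X @ X.T  # Gram matrix of all pairwise inner products
print(np.round(G[0, 1], 12), -1 / (N - 1))  # both equal -0.25, as Proposition 4.1 predicts
```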

The inequality
\[ \left| \sum_{i=1}^{N} x_i \right|^2 \geq 0 \]
is useful for analyzing simplices, but it is not obvious at a glance what its significance is or how it fits into a broader theory. It turns out to be a special case of Delsarte's linear programming bounds, which are also equivalent to the nonnegativity of the structure factor in statistical physics. In the upcoming lectures, we'll look at these connections. The fundamental theme will be geometrical constraints on correlation functions.

LECTURE 3
Interlude: Spherical harmonics

Spherical harmonics are a spherical generalization of Fourier series and a fundamental tool for understanding particle configurations on the surface of a sphere. Despite their importance in mathematics, they are not nearly as well known as Fourier series are, so this lecture will be devoted to the basic theory. We'll begin with a quick review of Fourier series, to establish notation and fundamental concepts, and then we'll do the same things in higher dimensions. Our discussion will start off in a rather elementary fashion, but then gradually increase in sophistication. We won't go through complete proofs of basic facts such as convergence of Fourier series under the $L^2$ norm, but we will at least see an outline of what is true and why, to a level of detail at which the proofs could be completed using standard facts from introductory graduate classes.

1. Fourier series

We will identify the circle $S^1$ with the quotient $\mathbb{R}/2\pi\mathbb{Z}$ via arc length (i.e., the quotient of the real line in which we wrap around after $2\pi$ units). In other words, a function on the circle is the same as a function on $\mathbb{R}$ with period $2\pi$. We know from basic analysis that every sufficiently nice function $f$ from $\mathbb{R}/2\pi\mathbb{Z}$ to $\mathbb{C}$ can be expanded in a Fourier series
\[ f(x) = \sum_{k \in \mathbb{Z}} a_k e^{ikx}. \tag{1.1} \]
Of course we could replace the complex exponentials with trigonometric functions by writing $e^{ikx} = \cos kx + i \sin kx$, but the exponentials will be more pleasant. The coefficients $a_k$ are determined by orthogonality via
\[ a_k = \frac{1}{2\pi} \int_0^{2\pi} f(x) e^{-ikx}\,dx, \]
because we can interchange the sum (1.1) with the integral and apply
\[ \frac{1}{2\pi} \int_0^{2\pi} e^{i(k-\ell)x}\,dx = \begin{cases} 1 & \text{if } k = \ell, \text{ and} \\ 0 & \text{otherwise.} \end{cases} \tag{1.2} \]
The right setting for Fourier series is the space of square-integrable functions on $S^1$, i.e.,
\[ L^2(S^1) = \left\{ f : \mathbb{R}/2\pi\mathbb{Z} \to \mathbb{C} \;\middle|\; \int_0^{2\pi} |f(x)|^2\,dx < \infty \right\}. \]
This is a Hilbert space under the inner product $\langle \cdot, \cdot \rangle$ defined by
\[ \langle f, g \rangle = \frac{1}{2\pi} \int_0^{2\pi} f(x) \overline{g(x)}\,dx, \]

which corresponds to the $L^2$ norm $\|\cdot\|_2$ defined by
\[ \|f\|_2^2 = \langle f, f \rangle = \frac{1}{2\pi} \int_0^{2\pi} |f(x)|^2\,dx. \]

The exponential functions are orthonormal in $L^2(S^1)$: if $f_k$ is the function defined by $f_k(x) = e^{ikx}$, then (1.2) amounts to
\[ \langle f_k, f_\ell \rangle = \begin{cases} 1 & \text{if } k = \ell, \text{ and} \\ 0 & \text{otherwise.} \end{cases} \]
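As a quick numeric sanity check (not from the text), the defining integral for $a_k$ can be discretized on a uniform grid; for a trigonometric polynomial the resulting Riemann sum is exact up to rounding.

```python
import numpy as np

M = 1024
x = 2 * np.pi * np.arange(M) / M
f = 3 * np.exp(2j * x) - 1j * np.exp(-5j * x)  # test function with a_2 = 3, a_{-5} = -i

for k in (2, -5, 0):
    a_k = np.mean(f * np.exp(-1j * k * x))  # discretizes (1/2π) ∫ f(x) e^{-ikx} dx
    print(k, np.round(a_k, 10))             # recovers 3, -1j, and 0
```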

Furthermore, these functions form an orthonormal basis of $L^2(S^1)$. We can express this fact algebraically as follows. If $V_k$ consists of the complex multiples of the function $f_k$, then
\[ L^2(S^1) = \widehat{\bigoplus_{k \in \mathbb{Z}}} V_k. \]
(Here $\oplus$ is the orthogonal direct sum. The hat indicates a Hilbert space completion; without the hat, the direct sum would contain only sums of finitely many exponentials.) In other words, the partial sums of the Fourier series of an $L^2$ function converge to that function under the $L^2$ norm. However, it's important to keep in mind that they needn't converge pointwise.

The most important property of the decomposition
\[ L^2(S^1) = \widehat{\bigoplus_{k \in \mathbb{Z}}} V_k \]
is that it is compatible with the symmetries of $S^1$ (i.e., the rigid motions that preserve $S^1$), as we will see shortly. Recall that the symmetry group $O(2)$ of $S^1$ consists of rotations and reflections that fix the center of the circle, with the subgroup $SO(2)$ consisting of just the rotations. The notation is based on the fact that these symmetries can be written in terms of orthogonal matrices, but we do not need that perspective here.

Each symmetry $g$ of $S^1$ acts on functions $f : S^1 \to \mathbb{C}$ by sending $f$ to the function $gf$ defined by $(gf)(x) = f(g^{-1}x)$. The inverse ensures that the associative law $(gh)f = g(hf)$ holds. For motivation, recall that moving the graph of a function $f(x)$ one unit to the right amounts to graphing $f(x - 1)$, not $f(x + 1)$. Similarly, the graph of $gf$ is simply the graph of $f$ transformed according to $g$.

Under this action, $L^2(S^1)$ is a representation of the group $O(2)$. In other words, the group $O(2)$ acts on $L^2(S^1)$ by linear transformations. In fact, it is a unitary representation, which means that symmetries of $S^1$ preserve the $L^2$ norm. We would like to decompose $L^2(S^1)$ into irreducible representations of $O(2)$ or $SO(2)$. In other words, we would like to break it apart into orthogonal subspaces preserved by these groups, with the subspaces being as small as possible.

For the rotation group $SO(2)$, we're already done. In the $\mathbb{R}/2\pi\mathbb{Z}$ picture, rotations of $S^1$ correspond to translations of $\mathbb{R}$. The exponential functions are already invariant: if we translate $x \mapsto e^{ikx}$ by $t$, we get

\[ e^{ik(x-t)} = e^{-ikt} e^{ikx}, \]
which is the original function $x \mapsto e^{ikx}$ multiplied by the constant $e^{-ikt}$. In other words, $V_k$ is itself a representation of $SO(2)$, and
\[ L^2(S^1) = \widehat{\bigoplus_{k \in \mathbb{Z}}} V_k \]
is the complete decomposition of $L^2(S^1)$ under this group action. Each summand must be irreducible, since it's one-dimensional. There are many ways to restate this decomposition, such as:

(1) The Fourier basis simultaneously diagonalizes the translation operators on $L^2(\mathbb{R}/2\pi\mathbb{Z})$ (i.e., rotations of $L^2(S^1)$).
(2) The exponential functions are simultaneous eigenfunctions for the translation operators.

It turns out that the reason why the Fourier decomposition is particularly simple, with one-dimensional summands, is that the rotation group $SO(2)$ is abelian.

But what about the full symmetry group $O(2)$? It is generated by $SO(2)$ and any one reflection, because all the reflections are conjugate by rotations. In the $\mathbb{R}/2\pi\mathbb{Z}$ picture, we can use the reflection $x \mapsto -x$. The nonconstant exponential functions are not preserved by this reflection, because it takes $x \mapsto e^{ikx}$ to $x \mapsto e^{-ikx}$. In other words, it interchanges $k$ with $-k$.

However, this is no big deal. Instead of keeping the representations $V_k$ and $V_{-k}$ separate, we combine them to form $W_k = V_k \oplus V_{-k}$ when $k > 0$ (while we take $W_0 = V_0$). Now $W_k$ is the span of $x \mapsto e^{ikx}$ and $x \mapsto e^{-ikx}$, or equivalently $x \mapsto \cos kx$ and $x \mapsto \sin kx$ if we expand $e^{\pm ikx} = \cos kx \pm i \sin kx$. These spaces $W_k$ are preserved by $O(2)$, because this group is generated by $SO(2)$ and $x \mapsto -x$. Thus, the decomposition of $L^2(S^1)$ into irreducible representations of $O(2)$ is
\[ L^2(S^1) = \widehat{\bigoplus_{k \geq 0}} W_k. \]
This decomposition is just slightly more complicated than the one for $SO(2)$, because $\dim W_k = 2$ when $k > 0$.

Another way to think of this equation is as the spectral decomposition of the Laplacian operator $d^2/dx^2$. Specifically,
\[ \frac{d^2}{dx^2} e^{ikx} = -k^2 e^{ikx}. \]
Thus, $W_k$ is the eigenspace with eigenvalue $-k^2$. The Laplacian plays a fundamental role, since it is invariant under the action of $O(2)$. (In other words, translating or reflecting a function commutes with taking its Laplacian.) In fact, the Laplacian generates the algebra of isometry-invariant differential operators on $S^1$, but that's going somewhat far afield from anything we will need.

2. Fourier series on a torus

The $S^1$ theory generalizes pretty straightforwardly if we think of $S^1$ as a one-dimensional torus. We can view a higher-dimensional flat torus as $\mathbb{R}^n/\Lambda$, where $\Lambda$ is a lattice in $\mathbb{R}^n$. In other words, we simply take a fundamental cell for $\Lambda$ and wrap around whenever we cross the boundary. When we looked at $S^1$, we wrote it as $\mathbb{R}^1/\Lambda$ with $\Lambda = 2\pi\mathbb{Z}$, and it's worth keeping this example in mind.

We can decompose $L^2(\mathbb{R}^n/\Lambda)$ into exponential functions in exactly the same way as we did for $S^1$. It works out particularly simply since $\mathbb{R}^n/\Lambda$ is an abelian group. To write this decomposition down, we need to figure out which exponential functions are periodic modulo $\Lambda$.

Suppose $y \in \mathbb{R}^n$, and consider the exponential function $x \mapsto e^{2\pi i \langle x, y \rangle}$ from $\mathbb{R}^n$ to $\mathbb{C}$. Here $\langle \cdot, \cdot \rangle$ denotes the usual inner product on $\mathbb{R}^n$ (not the inner product on functions used in the previous section). This formula defines a function on $\mathbb{R}^n/\Lambda$ if and only if it is invariant under translation by vectors in $\Lambda$.

What happens if we translate the function $x \mapsto e^{2\pi i \langle x, y \rangle}$ by a vector $z$? It gets multiplied by $e^{-2\pi i \langle z, y \rangle}$, and so it is always an eigenfunction of the translation operator. Furthermore, it is invariant under translation by vectors in $\Lambda$ if and only if $y$ satisfies $e^{2\pi i \langle z, y \rangle} = 1$ for all $z \in \Lambda$, which is equivalent to $\langle z, y \rangle \in \mathbb{Z}$ for all $z \in \Lambda$. Let
\[ \Lambda^* = \{ y \in \mathbb{R}^n \mid \langle z, y \rangle \in \mathbb{Z} \text{ for all } z \in \Lambda \} \]
be the dual lattice to $\Lambda$. Thus, the exponential functions that are periodic modulo $\Lambda$ are parameterized by $\Lambda^*$.

Exercise 2.1. Prove that $\Lambda^*$ is a lattice. Specifically, prove that if $v_1, \ldots, v_n$ is any basis of $\Lambda$, then $\Lambda^*$ has $v_1^*, \ldots, v_n^*$ as a basis, where these vectors are the dual basis vectors satisfying
\[ \langle v_i, v_j^* \rangle = \begin{cases} 1 & \text{if } i = j, \text{ and} \\ 0 & \text{otherwise.} \end{cases} \]
Deduce also that $(\Lambda^*)^* = \Lambda$. (A computational sketch appears at the end of this section.)

Let $V_y$ be the complex multiples of $x \mapsto e^{2\pi i \langle x, y \rangle}$. Then
\[ L^2(\mathbb{R}^n/\Lambda) = \widehat{\bigoplus_{y \in \Lambda^*}} V_y, \]
which is the decomposition into irreducible representations under the translation action.

When $n = 1$, the lattice $\Lambda$ is determined up to scaling. In the previous section we took $\Lambda = 2\pi\mathbb{Z}$, in which case $\Lambda^* = (2\pi)^{-1}\mathbb{Z}$. The elements of $\Lambda^*$ are $(2\pi)^{-1}k$, where $k$ is an integer, and $V_{(2\pi)^{-1}k}$ is spanned by $x \mapsto e^{ikx}$. Thus, we recover exactly the same theory as in the previous section, except that we now write $V_{(2\pi)^{-1}k}$ instead of $V_k$. It's arguably prettier to take $\Lambda = \mathbb{Z}$ and use the functions $x \mapsto e^{2\pi i kx}$, but this is a matter of taste.

The higher-dimensional analogue of the $O(2)$ theory is a little more subtle. The map $x \mapsto -x$ is always a symmetry of $\mathbb{R}^n/\Lambda$, and taking it into account means combining $V_y$ with $V_{-y}$ as before. Generically, all the symmetries of $\mathbb{R}^n/\Lambda$ are generated by translations and $x \mapsto -x$. However, particularly nice lattices may have further symmetries. If $G$ is the automorphism group of the lattice itself, then the full group of isometries of $\mathbb{R}^n/\Lambda$ is the semidirect product of $G$ with the additive group $\mathbb{R}^n/\Lambda$. What effect this has on the decomposition of $L^2(\mathbb{R}^n/\Lambda)$ depends on the representation theory of $G$. However, for many purposes this is not important, and the decomposition under translations alone will suffice.
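Here is a small numeric sketch for Exercise 2.1 (not from the text; the example basis is made up): since the dual basis must satisfy $\langle v_i, v_j^* \rangle = \delta_{ij}$, its matrix is the inverse transpose of the basis matrix.

```python
import numpy as np

B = np.array([[2.0, 0.0],
              [1.0, 3.0]])      # rows: a (hypothetical) basis v_1, v_2 of a lattice
B_dual = np.linalg.inv(B).T     # rows: the dual basis v_1*, v_2*
print(B @ B_dual.T)             # identity matrix, i.e., <v_i, v_j*> = delta_ij
# Applying the same operation twice returns the original basis: (Λ*)* = Λ.
print(np.allclose(np.linalg.inv(B_dual).T, B))  # True
```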

3. Spherical harmonics

If we think of $S^1$ as a one-dimensional sphere, rather than a one-dimensional torus, then it is less clear how to generalize Fourier series. Instead of exponential functions, we'll have to use spherical harmonics.

The symmetry group of the unit sphere
\[ S^{n-1} = \{ x \in \mathbb{R}^n \mid |x|^2 = 1 \} \]
is the orthogonal group $O(n)$, which consists of $n \times n$ orthogonal matrices. As before, $L^2(S^{n-1})$ is a Hilbert space under the inner product
\[ \langle f, g \rangle = \int_{S^{n-1}} f(x) \overline{g(x)}\,dx, \]
where the integral is taken with respect to the surface measure on $S^{n-1}$, and $L^2(S^{n-1})$ is a unitary representation of $O(n)$. We would like to decompose it into irreducible representations of $O(n)$.

To get a handle on $L^2(S^{n-1})$, we will study the polynomials on $S^{n-1}$. Let $P_k$ be the subset of $L^2(S^{n-1})$ consisting of polynomials on $\mathbb{R}^n$ of total degree at most $k$. (Strictly speaking, it consists of the restrictions of these polynomials to $S^{n-1}$, since two different polynomials can define the same function on the unit sphere.) Then $P_0 \subseteq P_1 \subseteq P_2 \subseteq \ldots$, and each $P_k$ is a representation of $O(n)$. To see why, note that rotating or reflecting a polynomial gives another polynomial of the same degree; in fact, this is true for any invertible linear transformation.

Let $W_0 = P_0$, and for $k > 0$ let $W_k$ be the orthogonal complement of $P_{k-1}$ in $P_k$. Then $W_k$ is a representation of $O(n)$, because $P_{k-1}$ and $P_k$ are representations and the inner product in $L^2(S^{n-1})$ is $O(n)$-invariant. Iterating this decomposition shows that
\[ P_k = W_0 \oplus W_1 \oplus \cdots \oplus W_k. \]
Furthermore, $\bigcup_k P_k$ is dense¹ in $L^2(S^{n-1})$, and hence
\[ L^2(S^{n-1}) = \widehat{\bigoplus_{k \geq 0}} W_k. \]
We have thus decomposed $L^2(S^{n-1})$ into finite-dimensional representations of $O(n)$. In fact they are irreducible, as we will see in the next lecture, but that fact is by no means obvious.

This decomposition may sound abstract, but it's actually quite elementary, since it is simply given by polynomials. Let's check that it agrees with what we did for $S^1$. Polynomials on $\mathbb{R}^2$ can be written in terms of the coordinate variables $x$ and $y$, and in the $\mathbb{R}/2\pi\mathbb{Z}$ picture we have $x = \cos\theta$ and $y = \sin\theta$ with $\theta \in \mathbb{R}/2\pi\mathbb{Z}$. Thus, $P_k$ consists of polynomials of degree at most $k$ in the functions $\theta \mapsto \cos\theta$ and $\theta \mapsto \sin\theta$. If we write $\cos\theta = (e^{i\theta} + e^{-i\theta})/2$ and $\sin\theta = (e^{i\theta} - e^{-i\theta})/(2i)$, then we find that the elements of $P_k$ involve powers of $e^{i\theta}$ ranging from $-k$ to $k$, and every such power is in $P_k$. In other words,

\[ P_k = V_{-k} \oplus V_{-(k-1)} \oplus \cdots \oplus V_{k-1} \oplus V_k \]

¹Continuous functions are dense in $L^2(S^{n-1})$, and the Stone-Weierstrass theorem tells us that polynomials are dense in the space of continuous functions.

in the notation from §1. In particular, the orthogonal complement $W_k$ of $P_{k-1}$ in $P_k$ is indeed $V_{-k} \oplus V_k$ when $k > 0$, which agrees with our previous construction.

Returning to $L^2(S^{n-1})$, we call the elements of $W_k$ spherical harmonics of degree $k$. Note that the word "harmonic" generalizes the term from music theory for a note whose frequency is an integer multiple of the base frequency; this term literally describes $W_k$ when $n = 2$, and it is applied by analogy in higher dimensions.

Writing $W_k$ down explicitly is a little subtle, because two different polynomials on $\mathbb{R}^n$ can restrict to the same function on $S^{n-1}$. For example, $x_1^2 + \cdots + x_n^2$ and 1 are indistinguishable on the unit sphere. To resolve this ambiguity, we will choose a canonical representative for each equivalence class.

Lemma 3.1. For each polynomial on $\mathbb{R}^n$, there is a unique harmonic polynomial on $\mathbb{R}^n$ with the same restriction to $S^{n-1}$.

Recall that harmonic means $\Delta g = 0$, where
\[ \Delta = \frac{\partial^2}{\partial x_1^2} + \cdots + \frac{\partial^2}{\partial x_n^2} \]
is the Laplacian on $\mathbb{R}^n$. If $f$ is a harmonic polynomial with $f|_{S^{n-1}} = g|_{S^{n-1}}$, then $f$ is called a harmonic representative for $g$.

The main fact we'll need about harmonic functions is the maximum principle [4, p. 7]: the maximum of a harmonic function on a domain $D$ cannot occur in the interior of $D$ (instead, it must occur on the boundary). Of course, multiplying the function by $-1$ shows that the same is true for the minimum.

Proof. Uniqueness follows immediately from the maximum principle: if

$g_1|_{S^{n-1}} = g_2|_{S^{n-1}}$ with both $g_1$ and $g_2$ harmonic, then $g_1 - g_2$ is a harmonic function that vanishes on $S^{n-1}$. It must therefore vanish inside the sphere as well (since its minimum and maximum over the ball must be attained on the sphere), which implies that $g_1 = g_2$ because they are polynomials.

Proving existence of a harmonic representative is only slightly trickier. Let $Q_k$ denote the space of polynomials of degree at most $k$ on $\mathbb{R}^n$. Note that the difference between $P_k$ and $Q_k$ is that $P_k$ consists of the restrictions to $S^{n-1}$, and thus $P_k$ is the quotient of $Q_k$ by the polynomials whose restrictions vanish. Multiplication by $x_1^2 + \cdots + x_n^2 - 1$ maps $Q_{k-2}$ injectively to $Q_k$, and its image vanishes on $S^{n-1}$, so

\[ \dim P_k \leq \dim Q_k - \dim Q_{k-2}. \]

On the other hand, $\Delta$ maps $Q_k$ to $Q_{k-2}$, and hence
\[ \dim \ker \Delta|_{Q_k} \geq \dim Q_k - \dim Q_{k-2} \geq \dim P_k. \]
By uniqueness, the restriction map from $\ker \Delta|_{Q_k}$ to $P_k$ is injective, and thus the inequality $\dim \ker \Delta|_{Q_k} \geq \dim P_k$ implies that each polynomial in $P_k$ must have a harmonic representative (and $\dim \ker \Delta|_{Q_k} = \dim P_k$). □

Another way to understand spherical harmonics is as eigenfunctions of the spherical Laplacian $\Delta_{S^{n-1}}$, which acts on $C^2$ functions on the sphere (i.e., twice continuously differentiable functions). The right setting for this operator is the theory of Laplace-Beltrami operators in Riemannian geometry, but we can give a quick, ad hoc definition as follows. Given a function $f$ on $S^{n-1}$, extend it to a radially constant function $f_{\mathrm{radial}}$ on $\mathbb{R}^n \setminus \{0\}$. Then we define $\Delta_{S^{n-1}}$ by
\[ \Delta_{S^{n-1}} f = \left( \Delta f_{\mathrm{radial}} \right)\big|_{S^{n-1}}. \]

In other words, $\Delta_{S^{n-1}} f$ measures the Laplacian of $f$ when there is no radial change. It is often notationally convenient to extend the operator $\Delta_{S^{n-1}}$ to apply to functions $f : \mathbb{R}^n \setminus \{0\} \to \mathbb{R}$, rather than just functions defined on the unit sphere. We can do so by rescaling everything to the unit sphere. More precisely, to define $\Delta_{S^{n-1}} f$ at the point $x$, we consider the function $g : S^{n-1} \to \mathbb{R}$ defined by $g(y) = f(|x|y)$, and we let $\Delta_{S^{n-1}} f(x) = \Delta_{S^{n-1}} g(x/|x|)$. The advantage of being able to apply $\Delta_{S^{n-1}}$ to functions on $\mathbb{R}^n \setminus \{0\}$ is that it becomes the angular part of the Euclidean Laplacian in spherical coordinates:

Exercise 3.2. Prove that if $r$ denotes the distance to the origin and $\partial/\partial r$ is the radial derivative, then for every $C^2$ function $f : \mathbb{R}^n \to \mathbb{R}$,
\[ \Delta f = \frac{\partial^2 f}{\partial r^2} + \frac{n-1}{r} \frac{\partial f}{\partial r} + \frac{1}{r^2} \Delta_{S^{n-1}} f \tag{3.1} \]
when $r \neq 0$.

If $f$ is homogeneous of degree $k$, then (3.1) becomes
\[ \Delta f = \frac{k(k-1)f}{r^2} + \frac{(n-1)kf}{r^2} + \frac{1}{r^2} \Delta_{S^{n-1}} f. \]

Then $\Delta f = 0$ is equivalent to $\Delta_{S^{n-1}} f = -k(k + n - 2)f$. In other words, harmonic functions that are homogeneous of degree $k$ are eigenfunctions of the spherical Laplacian with eigenvalue $-k(k + n - 2)$. We will see shortly that the spherical harmonics in $W_k$ are all homogeneous of degree $k$, and thus that the spaces $W_k$ are the eigenspaces of $\Delta_{S^{n-1}}$.

First note that the Euclidean Laplacian maps homogeneous polynomials of degree $k$ to homogeneous polynomials of degree $k - 2$. Thus, every harmonic polynomial is the sum of homogeneous harmonics. In terms of spherical harmonics, $P_k$ is the sum of the eigenspaces of $\Delta_{S^{n-1}}$ with eigenvalues $-\ell(\ell + n - 2)$ for $\ell = 0, 1, \ldots, k$. These eigenspaces are orthogonal, because the spherical Laplacian is symmetric:

Lemma 3.3. For $C^2$ functions $f$ and $g$ on $S^{n-1}$,

\[ \langle f, \Delta_{S^{n-1}} g \rangle = \langle \Delta_{S^{n-1}} f, g \rangle. \]

Proof. This identity is well-known for the Laplace-Beltrami operator, but verifying it using our ad hoc definition takes a short calculation. Replace $f$ and $g$ with their radial extensions to $\mathbb{R}^n \setminus \{0\}$, and let $\Omega = \{x \in \mathbb{R}^n \mid 1/2 \leq |x| \leq 2\}$. Equation (3.1) implies that
\[ \left( \langle f, \Delta_{S^{n-1}} g \rangle - \langle \Delta_{S^{n-1}} f, g \rangle \right) \int_{1/2}^{2} \omega_n r^{n-3}\,dr = \int_{\Omega} (f \Delta g - g \Delta f), \]
where the integral over $\Omega$ is with respect to Lebesgue measure and $\omega_n$ is the surface area of $S^{n-1}$. In this equation, $\omega_n r^{n-3}$ combines the volume factor from spherical coordinates with the $1/r^2$ factor multiplying $\Delta_{S^{n-1}} f$ in (3.1).

Now Green's identity tells us that
\[ \int_\Omega \left( f\,\Delta g - g\,\Delta f \right) = \int_{\partial\Omega} \left( f\,\frac{\partial g}{\partial n} - g\,\frac{\partial f}{\partial n} \right), \]
where $\partial/\partial n$ denotes the normal derivative and the integral over $\partial\Omega$ is with respect to surface measure. It vanishes because $\partial f/\partial n = \partial g/\partial n = 0$ by construction. □

Because $P_k$ is the sum of the eigenspaces of $\Delta_{S^{n-1}}$ with eigenvalues $-\ell(\ell+n-2)$ for $\ell = 0, 1, \dots, k$ and these eigenspaces are orthogonal, the orthogonal complement of $P_{k-1}$ in $P_k$ must be the $-k(k+n-2)$ eigenspace. Thus, $W_k$ consists of the harmonic polynomials that are homogeneous of degree $k$. This gives a rather concrete, if cumbersome, description of the space of spherical harmonics. By contrast, people sometimes make spherical harmonics look unnecessarily exotic by writing them in spherical coordinates as eigenfunctions of the Laplacian.
Exercise 3.4. Compute the homogeneous harmonic polynomials explicitly when $n = 2$, and check that this computation agrees with our earlier analysis of $S^1$.
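For a quick sanity check on this exercise (a sketch of my own, not the text's solution), one can verify symbolically that the real and imaginary parts of $(x+iy)^k$ are harmonic; on the unit circle they restrict to $\cos k\theta$ and $\sin k\theta$.

```python
import sympy as sp

# My own check for Exercise 3.4 (n = 2): Re(x + iy)^k and Im(x + iy)^k are
# homogeneous harmonic polynomials of degree k.
x, y = sp.symbols('x y', real=True)
for k in range(1, 6):
    re, im = sp.expand((x + sp.I*y)**k).as_real_imag()
    assert sp.expand(sp.diff(re, x, 2) + sp.diff(re, y, 2)) == 0
    assert sp.expand(sp.diff(im, x, 2) + sp.diff(im, y, 2)) == 0
print("harmonic for k = 1..5")
```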

We will see in the next lecture that $W_k$ is an irreducible representation of $O(n)$. Thus, we have found the complete decomposition of $L^2(S^{n-1})$ into irreducible representations, as well as the spectral decomposition of the Laplacian. The biggest conceptual difference from $S^1$ is that the space $W_k$ of degree $k$ spherical harmonics has much higher dimension than 2 in general, but that's not an obstacle to using this theory.

LECTURE 4

Energy and packing bounds on spheres

1. Introduction
In this lecture, we will use spherical harmonics to prove bounds for packing and energy minimization on spheres.¹ Our technique will be essentially the same as in the proof of Proposition 4.1 from the second lecture, but the bounds will be more sophisticated algebraically and much more powerful. By the end of the lecture we will be able to solve the kissing problem in $\mathbb{R}^8$ and $\mathbb{R}^{24}$, as well as analyze almost all of the known cases of universal optimality. In the next lecture we will tackle Euclidean space using much the same approach, but the analytic technicalities will be greater and it will be useful to have looked at the spherical case first.
Everything we do will be based on studying the distances that occur between pairs of points. Motivated by error-correcting codes, we call a finite subset $\mathcal{C}$ of $S^{n-1}$ a code. The distance distribution of a code measures how often each pairwise distance occurs. For $-1 \le t \le 1$, define the distance distribution $A$ of $\mathcal{C}$ by
\[ A_t = \#\left\{ (x,y) \in \mathcal{C}^2 \mid \langle x, y \rangle = t \right\}, \]
where $\langle \cdot, \cdot \rangle$ denotes the usual inner product on $\mathbb{R}^n$. Recall that $|x-y|^2 = 2 - 2\langle x, y \rangle$ when $x$ and $y$ are unit vectors; thus, $A_t$ counts the number of pairs at distance $\sqrt{2-2t}$, but inner products are a more convenient way to index these distances. In physics terms [77, p. 63], the distance distribution is equivalent to the pair correlation function, although it is formulated a little differently.
We can express the energy for a pair potential function $f$ in terms of the distance distribution via
\[ (1.1) \qquad \sum_{\substack{x,y \in \mathcal{C} \\ x \ne y}} f(|x-y|^2) = \sum_{-1 \le t < 1} f(2-2t)\,A_t. \]
(In the sum on the right, there are uncountably many values of $t$, but only finitely many of the summands are nonzero. Note that the restriction to $t < 1$ is to avoid self-interactions; it corresponds to $x \ne y$ on the left.) Thus, figuring out which energies can be attained amounts to understanding what the possible pair correlation functions are. Which constraints must they satisfy?
We have made an important trade-off here. The dependence of energy on the distance distribution is as simple as possible, because the right side of (1.1) is a linear function of the variables $A_t$. However, the nonlinearity in this problem cannot simply disappear. Instead, it reappears in the question of which distance distributions occur for actual point configurations.

¹Analogous techniques work in various other settings, such as projective spaces or Grassmannians.
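As a concrete illustration, the following sketch (my own, using the regular tetrahedron in $S^2$ as a sample code) computes a distance distribution and checks the energy identity (1.1):

```python
import numpy as np
from collections import Counter

# Distance distribution of the regular tetrahedron in S^2 and identity (1.1).
C = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
f = lambda d2: 1.0 / d2                     # a sample pair potential, applied to |x-y|^2

ip = np.round(C @ C.T, 12)
A = Counter(t for row in ip for t in row)   # A_t = #{(x,y) : <x,y> = t}

lhs = sum(f(np.sum((x - y)**2)) for x in C for y in C if not np.allclose(x, y))
rhs = sum(f(2 - 2*t) * a for t, a in A.items() if t < 1)
print(lhs, rhs)                             # both 4.5, as (1.1) predicts
```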


There are some obvious constraints for an $N$-point code: $A_t \ge 0$ for all $t$, $A_1 = N$, and $\sum_t A_t = N^2$. They follow trivially from the definition
\[ A_t = \#\left\{ (x,y) \in \mathcal{C}^2 \mid \langle x, y \rangle = t \right\}. \]

Another obvious constraint is that $A_t$ must be an integer for each $t$, but we will generally ignore this constraint, because optimization theory does not handle integrality constraints as seamlessly as it handles inequalities.
There are also less obvious constraints, such as
\[ \sum_t A_t t \ge 0. \]
To see why this inequality holds, note that
\[ \sum_t A_t t = \sum_{x,y \in \mathcal{C}} \langle x, y \rangle, \]
because $A_t$ counts how often $t$ occurs as an inner product between points in $\mathcal{C}$. Thus,
\[ \sum_t A_t t = \sum_{x,y \in \mathcal{C}} \langle x, y \rangle = \left\langle \sum_{x \in \mathcal{C}} x,\; \sum_{y \in \mathcal{C}} y \right\rangle = \left| \sum_{x \in \mathcal{C}} x \right|^2 \ge 0. \]
Recall that this is the inequality we used to analyze simplices at the end of the second lecture.
Delsarte discovered an infinite sequence of linear inequalities generalizing this one.² The factor of $t$ above is replaced with certain special functions, namely Gegenbauer or ultraspherical polynomials, which are a family $P_k^n$ of polynomials in one variable with $\deg(P_k^n) = k$. The Delsarte inequalities then say that whenever $A$ is the distance distribution of a configuration in $S^{n-1}$,
\[ (1.2) \qquad \sum_t A_t P_k^n(t) \ge 0 \]

for all $k$. In particular, $P_1^n(t) = t$, from which we recover the previous inequality, and $P_0^n(t) = 1$, while the higher-degree polynomials depend on $n$.
The Delsarte inequalities are far from a complete characterization of the distance distributions of codes. However, they are particularly beautiful and important constraints on these distance distributions. An equivalent reformulation of (1.2) is that for every finite set $\mathcal{C} \subset S^{n-1}$,
\[ \sum_{x,y \in \mathcal{C}} P_k^n(\langle x, y \rangle) \ge 0. \]
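This positivity is easy to observe numerically. The sketch below is my own, assuming SciPy's `scipy.special.gegenbauer`; the Gegenbauer polynomials $C_k^{(n-2)/2}$ are positive multiples of $P_k^n$, which does not affect the signs of the sums.

```python
import numpy as np
from scipy.special import gegenbauer

n, N = 5, 30
rng = np.random.default_rng(1)
X = rng.normal(size=(N, n))
X /= np.linalg.norm(X, axis=1, keepdims=True)    # a random code on S^{n-1}

for k in range(1, 8):
    C = gegenbauer(k, (n - 2) / 2)               # positive multiple of P_k^n
    print(k, C(X @ X.T).sum())                   # each double sum is >= 0
```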

We will return in §5 to what ultraspherical polynomials are and why they have this property. In the meantime, we will treat them as a black box while we explore how the Delsarte inequalities are used to prove bounds.

²Delsarte's initial discovery was in a discrete setting [30], but analogous techniques apply to spheres [31, 43].

2. Linear programming bounds
The energy of a code is given by the linear function
\[ \frac{1}{2} \sum_{-1 \le t < 1} f(2-2t)\,A_t \]
of its distance distribution, and the Delsarte inequalities
\[ \sum_t A_t P_k^n(t) \ge 0 \]
are linear in $A$ as well. The linear programming bounds minimize the energy subject to these linear constraints.³ Because of the linearity, these bounds are particularly well behaved and useful. The only computational difficulty is that there are infinitely many variables $A_t$.
Let's write down the linear programming bounds more precisely. To begin with, we are given the dimension $n$, the number $N$ of points, and the potential function $f$. Then the linear programming bounds attempt to choose $A_t$ for $-1 \le t \le 1$ so as to minimize
\[ \frac{1}{2} \sum_{-1 \le t < 1} A_t\,f(2-2t) \]
subject to

\[ A_1 = N, \]
\[ A_t \ge 0 \quad \text{for } -1 \le t \le 1, \]
\[ \sum_t A_t = N^2, \quad \text{and} \]
\[ \sum_t A_t P_k^n(t) \ge 0 \quad \text{for all } k \ge 1. \]
This optimization problem gives us a lower bound for the energy of codes in $S^{n-1}$, because every code has a corresponding distance distribution. However, there is no reason to expect the bound to be sharp in general: the optimal choice of $A_t$ will usually not even be integral, let alone come from an actual code. Of course one could improve the bound by imposing integrality, but then the optimization problem would become far less tractable. In particular, it would no longer be a convex optimization problem.
Linear programming bounds are well suited to computer calculations, but they have not yet been fully optimized. Any given case can be solved numerically, but the general pattern is unclear. In particular, for most $n$, $N$, and $f$ we do not know the optimal solution.
In practice, it is useful to apply linear programming duality, in which we try to prove bounds on energy by taking linear combinations of the constraints. If we multiply the Delsarte inequalities
\[ \sum_t A_t P_k^n(t) \ge 0 \]
by constants $h_k$ and then sum over $k$, we obtain the following theorem.

³Recall that "linear programming" means optimizing a linear function subject to linear constraints. There are efficient algorithms to solve finite linear programs.
Theorem 2.1 (Yudin [83]). Suppose $h = \sum_k h_k P_k^n$ with $h_k \ge 0$ for $k \ge 1$, and suppose $h(t) \le f(2-2t)$ for $t \in [-1, 1)$. Then every $N$-point configuration $\mathcal{C}$ on $S^{n-1}$ satisfies
\[ \sum_{\substack{x,y \in \mathcal{C} \\ x \ne y}} f(|x-y|^2) \ge N^2 h_0 - N h(1). \]

The auxiliary function $h$ is generally a polynomial, in which case $h_k = 0$ for all sufficiently large $k$, but convergence of $\sum_k h_k P_k^n$ on $[-1,1]$ suffices. (It turns out that $|P_k^n| \le P_k^n(1)$ on $[-1,1]$, and hence the convergence is automatically absolute and uniform.)
Proof. We have
\[ \sum_{\substack{x,y \in \mathcal{C} \\ x \ne y}} f(|x-y|^2) \ge \sum_{\substack{x,y \in \mathcal{C} \\ x \ne y}} h(\langle x, y \rangle) \quad \text{(because } f(2-2t) \ge h(t) \text{ pointwise)} \]

\[ = \sum_{x,y \in \mathcal{C}} h(\langle x, y \rangle) - N h(1) = N^2 h_0 - N h(1) + \sum_{k \ge 1} h_k \sum_{x,y \in \mathcal{C}} P_k^n(\langle x, y \rangle) \ge N^2 h_0 - N h(1), \]
as desired. □
Note that the proof rests on the fundamental inequality
\[ \sum_{x,y \in \mathcal{C}} P_k^n(\langle x, y \rangle) \ge 0. \]
The proof technique might seem extraordinarily wasteful, since it involves throwing away many terms in our sum. However, $P_k^n(\langle x, y \rangle)$ averages to zero over the whole sphere when $k \ge 1$, which suggests that the double sums
\[ \sum_{x,y \in \mathcal{C}} P_k^n(\langle x, y \rangle) \]
may not be so large after all when $\mathcal{C}$ is well distributed over the sphere.
Theorem 2.1 tells us that to prove a lower bound for $f$-energy, all we need is a lower bound $h$ for the potential function $f$ such that $h$ has non-negative ultraspherical coefficients. Such an auxiliary function is a convenient certificate for a lower bound. Outside of a few special cases, nobody knows the optimal $h$ for a given $f$. However, numerical optimization is an effective way to compute approximations to it. One can use more sophisticated techniques such as sums of squares and semidefinite programming, but even the most straightforward approach works well in practice: let $h$ be a polynomial of degree $d$, and instead of imposing the inequality $h(t) \le f(2-2t)$ for all $t$, impose it just at finitely many locations (chosen fairly densely in $[-1,1)$, of course). Then we are left with a finite linear program, i.e., a linear optimization problem with only finitely many variables and constraints, which is easily solved numerically using standard software. The resulting auxiliary function $h$ might not satisfy $h(t) \le f(2-2t)$ everywhere, but any violations will be small, and we can eliminate them by adjusting the constant term $h_0$ without substantially changing the energy bound.
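Here is a bare-bones version of that discretized dual program (a sketch of my own, not the authors' code, assuming SciPy; the parameters $n = 3$, $N = 4$, $f(r) = 1/r$, and degree 6 are illustrative choices, and Gegenbauer polynomials stand in for $P_k^n$ up to positive scaling, which does not affect the constraints):

```python
import numpy as np
from scipy.optimize import linprog
from scipy.special import gegenbauer

n, N, d = 3, 4, 6
f = lambda r: 1.0 / r                                   # potential, applied to squared distance
G = [gegenbauer(k, (n - 2) / 2) for k in range(d + 1)]  # positive multiples of P_k^n

ts = np.linspace(-1.0, 0.999, 400)                      # sample points in [-1, 1)
A_ub = np.array([[Gk(t) for Gk in G] for t in ts])      # constraints h(t_i) <= f(2 - 2 t_i)
b_ub = f(2 - 2 * ts)

# maximize N^2 h_0 - N h(1), i.e. minimize -N^2 c_0 + N sum_k c_k G_k(1)
cost = np.array([N * Gk(1.0) for Gk in G])
cost[0] -= N**2                                         # G_0 = 1, so h_0 = c_0
bounds = [(None, None)] + [(0, None)] * d               # c_0 free, c_k >= 0 for k >= 1
res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(-res.fun)                                         # ~4.5
```

For these parameters the printed bound is approximately 4.5, matching the energy of the regular tetrahedron; because the constraints are only imposed at sample points, the certificate is approximate, exactly as discussed above.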

3. Applying linear programming bounds
Linear programming bounds are behind almost every case in which universal optimality, or indeed any sharp bound on energy, is known. As mentioned above, they are generally far from sharp, but for certain codes they miraculously give sharp bounds. This is the case for all the universal optima listed in Table 1 from the second lecture.
When could the bound be sharp for a configuration $\mathcal{C}$? Equality holds in Theorem 2.1 iff every term we throw away in the proof is actually already zero. Inspecting the proof leads to the following criteria:
Lemma 3.1. The energy lower bound in Theorem 2.1 is attained by a code $\mathcal{C}$ if and only if $f(|x-y|^2) = h(\langle x, y \rangle)$ for all $x, y \in \mathcal{C}$ with $x \ne y$, and
\[ \sum_{x,y \in \mathcal{C}} P_k^n(\langle x, y \rangle) = 0 \]
for all $k \ge 1$ for which $h_k > 0$.
The first condition says that $h(t) = f(2-2t)$ whenever $t = \langle x, y \rangle$ with $x, y \in \mathcal{C}$ and $x \ne y$. Because $h(t) \le f(2-2t)$ for all $t$, the functions $h$ and $f$ cannot cross. Instead, they must agree to order at least 2 whenever they touch.
In practice, sharp bounds are usually obtained in the simplest possible way based on this tangency constraint. We choose $h$ to be a polynomial of as low a degree as possible subject to agreeing with $f$ to order 2 at each inner product that occurs between distinct points in $\mathcal{C}$. This specifies a choice of $h$, but it is not obvious that it has any of the desired properties. For example, the inequality $h(t) \le f(2-2t)$ might be violated in between the points at which we force equality, and there is no obvious reason to expect the ultraspherical coefficients $h_k$ to be nonnegative. This construction of $h$ is generally far from optimal when it works at all, but for particularly beautiful codes it does remarkably well at proving sharp bounds.
For example, let's show that regular simplices are universally optimal, which was Exercise 4.2 from the second lecture. Recall that for $N \le n+1$, the $N$-point regular simplex $\mathcal{C}$ in $S^{n-1}$ has all inner products equal to $-1/(N-1)$.
Proposition 3.2. For $N \le n+1$, the $N$-point regular simplex is universally optimal in $S^{n-1}$.
We'll describe the proof in terms of linear programming bounds, but one could reword it to use just the inequality
\[ \left| \sum_{x \in \mathcal{C}} x \right|^2 \ge 0 \]
(as was intended in Exercise 4.2 from the second lecture).

Proof. We will show that the simplex in fact minimizes energy for every decreasing, convex potential function $f$, which is an even stronger property than universal optimality.
Let $h(t)$ be the tangent line to $f(2-2t)$ at $t = -1/(N-1)$; in other words,
\[ h(t) = f\big(2 + 2/(N-1)\big) - 2 f'\big(2 + 2/(N-1)\big)\big(t + 1/(N-1)\big). \]

This function is the lowest-degree polynomial that agrees with $f(2-2t)$ to order 2 at all the inner products occurring in the regular simplex, which makes it a special case of the construction outlined above.
Because $f$ is convex, $h(t) \le f(2-2t)$ for all $t$. Thus, the first inequality we need for $h$ does in fact hold. To check the nonnegativity of the ultraspherical coefficients (aside from the constant term), note that the first two ultraspherical polynomials are 1 and $t$. If we express $h(t)$ in terms of this basis, then the coefficient of $t$ is $-2f'\big(2 + 2/(N-1)\big)$, which is nonnegative since $f$ is decreasing. Thus, $h$ satisfies the hypotheses of Theorem 2.1.
Furthermore, $h(t) = f(2-2t)$ when $t = -1/(N-1)$ by construction, and
\[ \sum_{x,y \in \mathcal{C}} \langle x, y \rangle = \left| \sum_{x \in \mathcal{C}} x \right|^2 = 0. \]
These are the conditions for a sharp bound in Lemma 3.1, and so we conclude that our energy bound is equal to the energy of the regular simplex. Hence regular simplices minimize energy for all decreasing, convex potential functions, and in particular they are universally optimal. □
Codes with more inner products are more complicated to handle, but in any given case one can figure out whether this approach works. If one analyzes the technique in sufficient generality, it proves the following theorem, which extends a theorem of Levenshtein [53].
Theorem 3.3 (Cohn and Kumar [18]). Every $m$-distance set that is a spherical $(2m-1)$-design is universally optimal.
Here an $m$-distance set is a set in which $m$ distances occur between distinct points, and a spherical $k$-design is a finite subset $D$ of the sphere $S^{n-1}$ such that for every polynomial $p \colon \mathbb{R}^n \to \mathbb{R}$ of total degree at most $k$, the average of $p$ over $D$ is equal to its average over the entire sphere $S^{n-1}$. In other words, averaging at the points of $D$ is an exact numerical integration formula for polynomials up to degree $k$, which means these points are exceedingly well distributed over the sphere.
This theorem suffices to handle every known universal optimum on the surface of a sphere (see Table 1 in the second lecture) except the regular 600-cell, which is dealt with in §7 of [18]. Surely that's not the only exception, but it is unclear where to find other universal optima that go beyond Theorem 3.3.

4. Spherical codes and the kissing problem
Recall that the spherical code problem asks whether $N$ points can be arranged on $S^{n-1}$ so that no two are closer than angle $\theta$ to each other along the great circle connecting them. In other words, the minimal angle between the points is at least $\theta$. This is a packing problem: how many spherical caps of angular radius $\theta/2$ can we pack on the surface of a sphere?
The most famous special case is the kissing problem discussed in the first lecture. Given a central unit ball, the kissing problem asks how many non-overlapping unit balls can be arranged tangent to it. Equivalently, the points of tangency should form a spherical code with minimal angle at least 60° (see Figure 1).
Linear programming bounds apply to this problem. In fact, packing problems were the original application for these bounds [30], before Yudin applied them to energy minimization [83].


Figure 1. An angle of 60° or more between tangent spheres is equivalent to avoiding overlap between them.
Theorem 4.1. Suppose $h = \sum_k h_k P_k^n$ with $h_k \ge 0$ for $k \ge 0$ and $h_0 > 0$, and suppose $h(t) \le 0$ for $t \in [-1, \cos\theta]$. Then every code $\mathcal{C}$ in $S^{n-1}$ with minimal angle at least $\theta$ satisfies
\[ |\mathcal{C}| \le h(1)/h_0. \]
Proof. We have
\[ |\mathcal{C}|\,h(1) \ge \sum_{x,y \in \mathcal{C}} h(\langle x, y \rangle) = \sum_k h_k \sum_{x,y \in \mathcal{C}} P_k^n(\langle x, y \rangle) \ge |\mathcal{C}|^2 h_0. \qquad \square \]
As in the case of energy minimization, this bound is generally not sharp, but on rare occasions we are lucky enough to get a sharp bound. The most famous case is the kissing problem in $\mathbb{R}^8$ and $\mathbb{R}^{24}$, which was solved independently by Levenshtein [52] and by Odlyzko and Sloane [61]. In particular, the kissing number is 240 in $\mathbb{R}^8$ and 196560 in $\mathbb{R}^{24}$, as achieved by the $E_8$ lattice and the Leech lattice.
It is not so difficult to prove these upper bounds using Theorem 4.1. In particular, we take
\[ h(t) = (t+1)(t+1/2)^2 t^2 (t-1/2) \]
in the $\mathbb{R}^8$ case, and
\[ h(t) = (t+1)(t+1/2)^2(t+1/4)^2 t^2 (t-1/4)^2 (t-1/2) \]
in the $\mathbb{R}^{24}$ case. (The roots correspond to the inner products that occur in the kissing configurations.) Checking that these polynomials satisfy the hypotheses of Theorem 4.1 and prove sharp bounds is a finite calculation. Of course presenting it this way makes the proof look like a miracle, and explaining it conceptually requires a deeper analysis [53].
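As a sanity check (my own sketch, not the authors' computation), one can confirm numerically that the degree-6 polynomial above yields exactly 240 in $\mathbb{R}^8$: the coefficient $h_0$ is the mean of $h$ with respect to the measure $(1-t^2)^{(n-3)/2}\,dt$, which appears in §5 below.

```python
import numpy as np
from scipy.integrate import quad

n = 8
h = lambda t: (t + 1) * (t + 0.5)**2 * t**2 * (t - 0.5)
w = lambda t: (1 - t**2)**((n - 3) / 2)     # projected surface measure on [-1, 1]

num, _ = quad(lambda t: h(t) * w(t), -1, 1)
den, _ = quad(w, -1, 1)
h0 = num / den                              # coefficient of P_0^n = 1
print(h(1) / h0)                            # 240.0, the kissing number of E8
```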

5. Ultraspherical polynomials
So far, we have treated ultraspherical polynomials as a black box and taken the Delsarte inequalities on faith. In this section, we will finally examine where these polynomials come from and why the inequalities hold.

One simple (albeit unmotivated) description is that ultraspherical polynomials for $S^{n-1}$ are orthogonal polynomials with respect to the measure $(1-t^2)^{(n-3)/2}\,dt$ on $[-1,1]$. In other words,
\[ \int_{-1}^{1} P_k^n(t)\,P_\ell^n(t)\,(1-t^2)^{(n-3)/2}\,dt = 0 \]
for $k \ne \ell$. Equivalently, $P_k^n$ is orthogonal to all polynomials of degree less than $k$ with respect to this measure, because all such polynomials are linear combinations of $P_0^n, \dots, P_{k-1}^n$. We'll see shortly where the measure comes from and why this orthogonality characterizes the ultraspherical polynomials, but first let's explore its consequences.
Orthogonality uniquely determines the ultraspherical polynomials up to scaling (and the scaling is irrelevant for our purposes, as long as we take $P_k^n(1) > 0$ so as not to flip the Delsarte inequality). Specifically, we just apply Gram-Schmidt orthogonalization to $1, t, t^2, \dots$, which gives an algorithm to compute these polynomials explicitly. It's not the most efficient method, but it works.
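Here is that algorithm in a few lines (a sketch of my own using SymPy; the normalization $P_k^n(1) = 1$ is a choice, since the text leaves the scaling free):

```python
import sympy as sp

# Gram-Schmidt on 1, t, t^2, ... against the weight (1 - t^2)^{(n-3)/2}.
t = sp.symbols('t')
n = 8                                       # example: ultraspherical polynomials for S^7
w = (1 - t**2)**sp.Rational(n - 3, 2)
inner = lambda p, q: sp.integrate(p * q * w, (t, -1, 1))

P = []
for k in range(4):
    p = t**k - sum((inner(t**k, q) / inner(q, q)) * q for q in P)
    P.append(sp.expand(p))
P = [sp.expand(p / p.subs(t, 1)) for p in P]   # normalize so P_k(1) = 1
print(P)   # [1, t, 8*t**2/7 - 1/7, 10*t**3/7 - 3*t/7]
```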

Although orthogonality may sound like an arcane property of a sequence of polynomials, it has many wonderful and surprising consequences. For example, it implies that $P_k^n$ has $k$ distinct roots in $[-1,1]$. To see why, suppose $P_k^n$ changed sign at only $m$ points $r_1, \dots, r_m$ in $[-1,1]$, with $m < k$. Then the product $P_k^n(t)(t-r_1)\cdots(t-r_m)$ would never change sign on $[-1,1]$, and hence
\[ \int_{-1}^{1} P_k^n(t)(t-r_1)\cdots(t-r_m)(1-t^2)^{(n-3)/2}\,dt \ne 0, \]
which contradicts orthogonality (which holds because $(t-r_1)\cdots(t-r_m)$ has degree less than $k$). Thus, $m = k$ and $P_k^n$ has $k$ distinct roots in $[-1,1]$, which means it's a highly oscillatory function.
Although ultraspherical polynomials can be characterized via orthogonality, it's not really a satisfactory explanation of where they come from. To explain that, we will use spherical harmonics. Recall that as a representation of $O(n)$, we can decompose $L^2(S^{n-1})$ as
\[ L^2(S^{n-1}) = \bigoplus_{k \ge 0} W_k, \]
where $W_k$ consists of degree $k$ spherical harmonics.
We can obtain ultraspherical polynomials by studying the evaluation map: let $x \in S^{n-1}$, and consider the linear map that takes $f \in W_k$ to $f(x)$. By duality for finite-dimensional vector spaces, this map must be the inner product with some unique element $w_{k,x}$ of $W_k$, called a reproducing kernel. That is,

\[ f(x) = \langle w_{k,x}, f \rangle \]
for all $f \in W_k$. Note that here $\langle \cdot, \cdot \rangle$ denotes the inner product on $L^2(S^{n-1})$. We will use the same notation for both this inner product and the standard inner product on $\mathbb{R}^n$; to distinguish between them, pay attention to which vector spaces their arguments lie in.
The function $w_{k,x}$ on $S^{n-1}$ has considerable structure. For example, it is invariant under all symmetries of $S^{n-1}$ that fix $x$:

Lemma 5.1. If $T$ is an element of $O(n)$ such that $Tx = x$, then $T w_{k,x} = w_{k,x}$.

Proof. This lemma follows easily from the invariance of the inner product on $W_k$ under $O(n)$. We have $\langle w_{k,x}, f \rangle = \langle T w_{k,x}, f \rangle$ for all $f \in W_k$, because
\[ \langle w_{k,x}, f \rangle = f(x) = f(Tx) = (T^{-1} f)(x) = \langle w_{k,x}, T^{-1} f \rangle = \langle T w_{k,x}, f \rangle, \]
and hence $w_{k,x} = T w_{k,x}$. □

Equivalently, $w_{k,x}(y)$ can depend only on the distance between $x$ and $y$, and therefore it must be a function of $\langle x, y \rangle$ alone. We define $P_k^n$ by
\[ w_{k,x}(y) = P_k^n(\langle x, y \rangle). \]

The reproducing kernel $w_{k,x}$ is a polynomial of degree $k$ in several variables, because it is a spherical harmonic in $W_k$, and thus $P_k^n$ must be a polynomial of degree $k$ in one variable. (Technically this definition is off by a constant factor from the special case $P_1^n(t) = t$ mentioned earlier, but we could easily rectify that by rescaling so that $P_k^n(1) = 1$.)
We have finally explained where ultraspherical polynomials come from. They describe reproducing kernels for the spaces $W_k$, and the importance of reproducing kernels is that they tell how to evaluate spherical harmonics at points.
The drawback of the reproducing kernel definition is that it does not make it clear how to compute these polynomials in any reasonable way. In principle one could choose a basis for the homogeneous harmonic polynomials of degree $k$, integrate over the sphere to obtain the inner products of the basis vectors in $W_k$, write down the evaluation map explicitly relative to this basis, and solve simultaneous linear equations to obtain the reproducing kernel. However, that would be unpleasantly cumbersome. The beauty of the orthogonal polynomial characterization of ultraspherical polynomials is that it is much more tractable, but we must still see why it is true.
First, observe that $w_{k,x}$ and $w_{\ell,x}$ are orthogonal in $L^2(S^{n-1})$ for $k \ne \ell$, since they are spherical harmonics of different degrees. Thus,
\[ (5.1) \qquad \int_{S^{n-1}} P_k^n(\langle x, y \rangle)\,P_\ell^n(\langle x, y \rangle)\,d\mu(y) = 0, \]
where $\mu$ is surface measure. We can now obtain the orthogonality of the ultraspherical polynomials from the following multivariate calculus exercise:
Exercise 5.2. Prove that under orthogonal projection from the surface of the sphere onto a coordinate axis, the measure $\mu$ projects to a constant times the measure $(1-t^2)^{(n-3)/2}\,dt$ on $[-1,1]$. (See [13, p. 2434] for a simple solution.)
If we apply this orthogonal projection onto the axis between the antipodal points $\pm x$, then (5.1) becomes
\[ \int_{-1}^{1} P_k^n(t)\,P_\ell^n(t)\,(1-t^2)^{(n-3)/2}\,dt = 0, \]
as desired.
As a side comment, we can now see that $W_k$ is an irreducible representation of $O(n)$. If it broke up further, then each summand would have its own reproducing kernel, which would yield two different polynomials of degree $k$ that would be orthogonal to each other as well as to lower degree polynomials. That's impossible, since the space of polynomials of degree at most $k$ has dimension too low to contain so many orthogonal polynomials.

All that remains to prove is the Delsarte inequalities. The key observation is that $P_k^n(\langle x, y \rangle)$ can be written as the inner product of two vectors in $W_k$ depending only on $x$ and $y$, namely the reproducing kernels:
Lemma 5.3. For all $x, y \in S^{n-1}$ and $k \ge 0$,
\[ P_k^n(\langle x, y \rangle) = \langle w_{k,x}, w_{k,y} \rangle. \]

Proof. Recall that the reproducing kernel property means $\langle w_{k,x}, f \rangle = f(x)$ for all $f \in W_k$. In particular, taking $f = w_{k,y}$ yields $\langle w_{k,x}, w_{k,y} \rangle = w_{k,y}(x)$. Now $w_{k,y}(x) = P_k^n(\langle x, y \rangle)$ implies that
\[ P_k^n(\langle x, y \rangle) = \langle w_{k,x}, w_{k,y} \rangle, \]
as desired. □
Corollary 5.4. For every finite subset $\mathcal{C} \subset S^{n-1}$ and $k \ge 0$,
\[ \sum_{x,y \in \mathcal{C}} P_k^n(\langle x, y \rangle) \ge 0. \]
Proof. We have
\[ \sum_{x,y \in \mathcal{C}} P_k^n(\langle x, y \rangle) = \sum_{x,y \in \mathcal{C}} \langle w_{k,x}, w_{k,y} \rangle = \left| \sum_{x \in \mathcal{C}} w_{k,x} \right|^2 \ge 0, \]
as desired. □
This argument is a perfect generalization of $\left|\sum_{x \in \mathcal{C}} x\right|^2 \ge 0$, except instead of summing the vectors $x$, we are summing vectors $w_{k,x}$ in the Hilbert space $W_k$. One interpretation is that $x \mapsto w_{k,x}$ maps $S^{n-1}$ into a sphere in the higher-dimensional space $W_k$, and we're combining the trivial inequality
\[ \left| \sum_{x \in \mathcal{C}} w_{k,x} \right|^2 \ge 0 \]
with that nontrivial mapping. When $n = 2$, the space $W_k$ has dimension 2 for $k \ge 1$, and so up to scaling we are mapping $S^1$ to itself. This map wraps $S^1$ around itself $k$ times, while the analogues for $n \ge 3$ are more subtle.
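Numerically, Lemma 5.3 manifests as positive semidefiniteness of the matrix $\big(P_k^n(\langle x_i, x_j \rangle)\big)_{i,j}$, since it is a Gram matrix of the vectors $w_{k,x_i}$. A small check (my own sketch, again using SciPy's Gegenbauer polynomials as a positive multiple of $P_k^n$):

```python
import numpy as np
from scipy.special import gegenbauer

n, k, N = 5, 4, 40
rng = np.random.default_rng(0)
X = rng.normal(size=(N, n))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # random points on S^{n-1}

C = gegenbauer(k, (n - 2) / 2)          # positive multiple of P_k^n
G = C(X @ X.T)                          # Gram matrix of the w_{k,x_i}, up to scaling
print(G.sum())                          # Corollary 5.4: >= 0
print(np.linalg.eigvalsh(G).min())      # >= 0 up to rounding: positive semidefinite
```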

It's natural to wonder whether ultraspherical polynomials span all the functions $P$ satisfying
\[ \sum_{x,y \in \mathcal{C}} P(\langle x, y \rangle) \ge 0 \]
for all $\mathcal{C}$. In fact, they do not. Pfender has constructed further such functions and used them to obtain improvements on linear programming bounds [64]. However, the numerical improvements so far have been relatively modest.
Instead, Schoenberg proved that the ultraspherical polynomials span the space of positive-definite kernels [66], i.e., functions $P$ such that for all $x_1, \dots, x_N \in S^{n-1}$, the $N \times N$ matrix with entries $P(\langle x_i, x_j \rangle)$ is positive semidefinite. The reason why ultraspherical polynomials are positive-definite is Lemma 5.3: the matrix with entries $\langle w_{k,x_i}, w_{k,x_j} \rangle$ is a Gram matrix and is thus positive semidefinite. Positive-definite kernels play an important role in representation theory, which contributes to the importance of ultraspherical polynomials.
As a final comment, everything we have done in this lecture has been restricted to analyzing pairwise distance distributions. It's natural to ask what happens if one looks at triples of points instead of pairs, or even larger subconfigurations. The Delsarte inequalities can be generalized to semidefinite constraints on these higher-order correlation functions, and thus we can obtain semidefinite programming bounds [67, 5, 49], which are a powerful and important extension of linear programming bounds. For reasons that have not yet been understood, these higher-order bounds seem less fruitful for obtaining sharp bounds, but several sharp cases are known [6, 24] and others presumably remain to be discovered.

LECTURE 5

Packing bounds in Euclidean space

1. Introduction
In this lecture we will study linear programming bounds for the sphere packing problem in Euclidean space. The basic principles are closely analogous to those we saw for spherical codes in the fourth lecture. However, the way the bounds behave in Euclidean space is far more mysterious. They almost certainly solve the sphere packing problem in $\mathbb{R}^8$ and $\mathbb{R}^{24}$, by matching the densities of the $E_8$ and Leech lattices, but nobody has been able to prove it.
We will focus on the sphere packing problem, rather than energy minimization. Everything we will do works just as well in the latter case (see §9 of [18]), but sphere packing already illustrates the essential features of these bounds.
To begin, let's review the statement and proof of linear programming bounds for spherical codes, i.e., Theorem 4.1 from the last lecture:
Theorem 1.1. Suppose $h = \sum_k h_k P_k^n$ with $h_k \ge 0$ for $k \ge 0$ and $h_0 > 0$, and suppose $h(t) \le 0$ for $t \in [-1, \cos\theta]$. Then every code $\mathcal{C}$ in $S^{n-1}$ with minimal angle at least $\theta$ satisfies

\[ |\mathcal{C}| \le h(1)/h_0. \]
Proof. We have
\[ |\mathcal{C}|\,h(1) \ge \sum_{x,y \in \mathcal{C}} h(\langle x, y \rangle) = \sum_k h_k \sum_{x,y \in \mathcal{C}} P_k^n(\langle x, y \rangle) \ge |\mathcal{C}|^2 h_0. \qquad \square \]
How could we generalize this argument? First, we need functions on Euclidean space that can play the same role as ultraspherical polynomials. In particular, we need an analogue of the positivity property
\[ \sum_{x,y \in \mathcal{C}} P_k^n(\langle x, y \rangle) \ge 0. \]
As it turns out, the Euclidean functions are considerably more familiar, namely exponentials $x \mapsto e^{2\pi i \langle t, x \rangle}$. If we apply them to two points via $(x,y) \mapsto e^{2\pi i \langle t, x-y \rangle}$, then for every finite subset $\mathcal{C}$ of $\mathbb{R}^n$,
\[ (1.1) \qquad \sum_{x,y \in \mathcal{C}} e^{2\pi i \langle t, x-y \rangle} = \left| \sum_{x \in \mathcal{C}} e^{2\pi i \langle t, x \rangle} \right|^2 \ge 0. \]
As in the third lecture, these functions have representation-theoretic origins, but we will not take up that subject here.
In the same way we previously made use of nonnegative linear combinations of ultraspherical polynomials, we will now need to use nonnegative linear combinations of exponentials. The natural setting for linear combinations of exponentials is

the Fourier transform. Define the Fourier transform $\hat{f}$ of an integrable function $f \colon \mathbb{R}^n \to \mathbb{R}$ by
\[ \hat{f}(t) = \int_{\mathbb{R}^n} f(x)\,e^{-2\pi i \langle t, x \rangle}\,dx. \]
If $f$ is continuous and $\hat{f}$ is integrable as well, then Fourier inversion tells us that
\[ f(x) = \int_{\mathbb{R}^n} \hat{f}(t)\,e^{2\pi i \langle t, x \rangle}\,dt. \]

In other words, the Fourier transform $\hat{f}$ gives the coefficients needed to express $f$ as a continuous linear combination of exponentials. Thus, we will be particularly interested in functions $f$ for which $\hat{f} \ge 0$.
If $\hat{f}(t) \ge 0$ for all $t$, then Fourier inversion implies that
\[ \sum_{x,y \in \mathcal{C}} f(x-y) \ge 0 \]
whenever $\mathcal{C}$ is a finite subset of $\mathbb{R}^n$, because
\[ \sum_{x,y \in \mathcal{C}} f(x-y) = \int_{\mathbb{R}^n} \hat{f}(t) \left| \sum_{x \in \mathcal{C}} e^{2\pi i \langle t, x \rangle} \right|^2 dt \]
by (1.1). Thus, functions with nonnegative Fourier transforms have exactly the property we need to generalize the Delsarte inequalities to Euclidean space.
However, using these functions to prove sphere packing bounds requires some finesse. In the spherical case, we looked at the double sum
\[ \sum_{x,y \in \mathcal{C}} h(\langle x, y \rangle) \]
and bounded it on both sides to get
\[ |\mathcal{C}|\,h(1) \ge \sum_{x,y \in \mathcal{C}} h(\langle x, y \rangle) = \sum_k h_k \sum_{x,y \in \mathcal{C}} P_k^n(\langle x, y \rangle) \ge |\mathcal{C}|^2 h_0. \]
In Euclidean space, the corresponding double sum would be
\[ \sum_{x,y \in \mathcal{C}} f(x-y), \]
where $\mathcal{C}$ is a dense sphere packing, or rather the set of sphere centers in such a packing. Unfortunately, there's an obvious problem with this approach: $\mathcal{C}$ will be infinite and the double sum will diverge. For example, if $\mathcal{C}$ is a lattice, then every term in the sum occurs infinitely often, because there are infinitely many ways to write each lattice vector as a difference of lattice vectors.
Can we somehow renormalize the double sum and use it to complete the proof? The answer is yes if we're careful; see the proof of Theorem 3.3 in [25], which controls the sum over a packing by subtracting a uniform background distribution of equal density. However, in this lecture we'll take an arguably more fundamental approach using the Poisson summation formula.

2. Poisson summation
Poisson summation is a remarkable duality between summing a function over a lattice and summing its Fourier transform over the dual lattice. We'll take a somewhat cavalier attitude towards analytic technicalities: we will manipulate sums and integrals however we like, and include enough hypotheses to justify these manipulations. Specifically, we will deal with what we'll call admissible functions $f \colon \mathbb{R}^n \to \mathbb{R}$, those for which $|f(x)| = O\big((1+|x|)^{-n-\varepsilon}\big)$ and $|\hat{f}(t)| = O\big((1+|t|)^{-n-\varepsilon}\big)$ for some $\varepsilon > 0$. This decay rate is fast enough for sums over lattices to converge with room to spare. In practice, we can simply read "admissible" as "sufficiently rapidly decreasing and smooth for everything to work."
Theorem 2.1 (Poisson summation). If $f \colon \mathbb{R}^n \to \mathbb{R}$ is an admissible function and $\Lambda$ is a lattice in $\mathbb{R}^n$, then
\[ \sum_{x \in \Lambda} f(x) = \frac{1}{\operatorname{vol}(\mathbb{R}^n/\Lambda)} \sum_{t \in \Lambda^*} \hat{f}(t). \]
Here $\operatorname{vol}(\mathbb{R}^n/\Lambda)$ is the volume of a fundamental cell of $\Lambda$, i.e., the determinant of $\Lambda$, and
\[ \Lambda^* = \{ t \in \mathbb{R}^n \mid \langle x, t \rangle \in \mathbb{Z} \text{ for all } x \in \Lambda \} \]
is the dual lattice (see Exercise 2.1 in the third lecture).
Proof. The key idea is to prove an even more general formula, by looking at the Fourier expansion of the periodization of $f$ under $\Lambda$. Let
\[ F(y) = \sum_{x \in \Lambda} f(x+y), \]
so that $F$ is periodic modulo $\Lambda$. We can expand $F$ as a Fourier series
\[ F(y) = \sum_{t \in \Lambda^*} c_t\,e^{2\pi i \langle t, y \rangle} \]
for some coefficients $c_t$, where $\Lambda^*$ occurs because it specifies the exponentials that are periodic modulo $\Lambda$ (see §2 in the third lecture). Let $D$ be a fundamental domain for $\Lambda$. By orthogonality,
\[ c_t = \frac{1}{\operatorname{vol}(D)} \int_D F(y)\,e^{-2\pi i \langle t, y \rangle}\,dy \]
\[ = \frac{1}{\operatorname{vol}(\mathbb{R}^n/\Lambda)} \int_D F(y)\,e^{-2\pi i \langle t, y \rangle}\,dy \]
\[ = \frac{1}{\operatorname{vol}(\mathbb{R}^n/\Lambda)} \int_D \sum_{x \in \Lambda} f(x+y)\,e^{-2\pi i \langle t, y \rangle}\,dy \]
\[ = \frac{1}{\operatorname{vol}(\mathbb{R}^n/\Lambda)} \int_D \sum_{x \in \Lambda} f(x+y)\,e^{-2\pi i \langle t, x+y \rangle}\,dy \qquad (x \in \Lambda \text{ and } t \in \Lambda^*) \]
\[ = \frac{1}{\operatorname{vol}(\mathbb{R}^n/\Lambda)} \int_{\mathbb{R}^n} f(y)\,e^{-2\pi i \langle t, y \rangle}\,dy \qquad (\text{translates of } D \text{ tile } \mathbb{R}^n) \]
\[ = \frac{1}{\operatorname{vol}(\mathbb{R}^n/\Lambda)}\,\hat{f}(t). \]

In other words, the Fourier coefficients $c_t$ of the periodization of $f$ are simply proportional to $\hat{f}(t)$, with constant of proportionality $1/\operatorname{vol}(\mathbb{R}^n/\Lambda)$.

Thus,
\[ \sum_{x \in \Lambda} f(x+y) = \frac{1}{\operatorname{vol}(\mathbb{R}^n/\Lambda)} \sum_{t \in \Lambda^*} \hat{f}(t)\,e^{2\pi i \langle t, y \rangle}, \]
and setting $y = 0$ yields Poisson summation. □
The more general formula
\[ \sum_{x \in \Lambda} f(x+y) = \frac{1}{\operatorname{vol}(\mathbb{R}^n/\Lambda)} \sum_{t \in \Lambda^*} \hat{f}(t)\,e^{2\pi i \langle t, y \rangle} \]
is important in its own right, not just as a tool for proving Poisson summation. At first it looks considerably more general than Poisson summation, but it is simply Poisson summation applied to the function $x \mapsto f(x+y)$ in place of $f$.
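Poisson summation is easy to test numerically. The sketch below is my own check in $\mathbb{R}^1$, using the Gaussian $e^{-\pi x^2}$, which equals its own Fourier transform:

```python
import numpy as np

# Poisson summation for f(x) = exp(-pi x^2) on the lattice c*Z in R^1,
# whose dual lattice is (1/c)*Z and whose covolume is c.
f = lambda x: np.exp(-np.pi * x**2)     # equal to its own Fourier transform
c = 2.0
k = np.arange(-50, 51)

lhs = f(c * k).sum()                    # sum of f over the lattice
rhs = f(k / c).sum() / c                # (1/covolume) * sum of fhat over the dual
print(lhs, rhs)                         # both ~1.000007
```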

3. Linear programming bounds
We can now state and prove the linear programming bounds for Euclidean sphere packings.
Theorem 3.1 (Cohn and Elkies [16]). Let $f \colon \mathbb{R}^n \to \mathbb{R}$ be an admissible function with $f(x) \le 0$ for $|x| \ge 2$, $\hat{f}(t) \ge 0$ for all $t$, and $\hat{f}(0) > 0$. Then the sphere packing density in $\mathbb{R}^n$ is at most
\[ \frac{\pi^{n/2}}{(n/2)!} \cdot \frac{f(0)}{\hat{f}(0)}. \]
As before, $(n/2)!$ means $\Gamma(n/2+1)$ when $n$ is odd. The factor of $\pi^{n/2}/(n/2)!$ is the volume of a unit ball. It occurs because we are looking at packing density, rather than just the number of balls per unit volume in space.
We will prove Theorem 3.1 using Poisson summation [16]. Several other proofs are known, but they are longer [12] or more delicate [25]. One advantage of the proof in [25] is that it weakens the admissibility hypothesis, so that we can use a more robust space of functions; however, it obscures when a sharp bound can be obtained.
Before we turn to the proof, let's compare Theorem 3.1 with Theorem 1.1, its spherical analogue. One difference is that the Euclidean case involves a function of $n$ variables, as opposed to one variable in the spherical case. However, this difference is illusory: we might as well radially symmetrize $f$ in the Euclidean case (since both the hypotheses and the bound are radially symmetric), after which it becomes a function of one variable. Table 1 gives a dictionary with which these theorems can be compared. They really are fully analogous, with the biggest discrepancy being that we use inner products to measure distances in the spherical case but Euclidean distance in the Euclidean case.
Proof. As a warm-up, let's prove the linear programming bounds for lattice packings. Suppose $\Lambda$ is a lattice packing with unit balls (since we can specify the packing radius without loss of generality). In other words, the minimal vector length of the lattice $\Lambda$ is at least 2. By Poisson summation,
\[ \sum_{x \in \Lambda} f(x) = \frac{1}{\operatorname{vol}(\mathbb{R}^n/\Lambda)} \sum_{t \in \Lambda^*} \hat{f}(t). \]

Table 1. A dictionary for comparing linear programming bounds on spheres and in Euclidean space.

space | $S^{n-1}$ | $\mathbb{R}^n$
function | $h$ | $f$
transform | $h_k$ | $\hat{f}(t)$
balls don't overlap | $t \in [-1, \cos\theta]$ | $|x| \ge 2$
value at distance zero | $h(1)$ | $f(0)$
bound | $h(1)/h_0$ | $f(0)/\hat{f}(0)$

We will apply the contrasting inequalities $f(x) \le 0$ (for $|x| \ge 2$) and $\hat{f}(t) \ge 0$ to this identity. We have
\[ f(0) \ge \sum_{x \in \Lambda} f(x) \]
because $f(x) \le 0$ for $|x| \ge 2$, while
\[ \sum_{t \in \Lambda^*} \hat{f}(t) \ge \hat{f}(0) \]
because $\hat{f}(t) \ge 0$ for all $t$. Thus,
\[ f(0) \ge \frac{\hat{f}(0)}{\operatorname{vol}(\mathbb{R}^n/\Lambda)}. \]
The number of balls per unit volume in the packing is $1/\operatorname{vol}(\mathbb{R}^n/\Lambda)$, and its density is therefore $1/\operatorname{vol}(\mathbb{R}^n/\Lambda)$ times the volume of a unit ball. Thus, the density is at most
\[ \frac{\pi^{n/2}}{(n/2)!} \cdot \frac{f(0)}{\hat{f}(0)}, \]
as desired. So far, we have done nothing but apply the given inequalities to both sides of Poisson summation.
Handling general packings will require a little more work, but nothing too strenuous. Without loss of generality, we can restrict our attention to periodic packings, since they come arbitrarily close to the optimal packing density. In other words, we can suppose our packing consists of $N$ translates of a lattice $\Lambda$, namely

\[ \Lambda + y_1, \dots, \Lambda + y_N. \]
Now the number of balls per unit volume in the packing is $N/\operatorname{vol}(\mathbb{R}^n/\Lambda)$, and the condition that they should not overlap says that $|x + y_j - y_k| \ge 2$ for $x \in \Lambda$ as long as $x \ne 0$ or $j \ne k$.
A little manipulation based on the translated Poisson summation formula
\[ \sum_{x \in \Lambda} f(x+y) = \frac{1}{\operatorname{vol}(\mathbb{R}^n/\Lambda)} \sum_{t \in \Lambda^*} \hat{f}(t)\,e^{2\pi i \langle t, y \rangle} \]
shows that
\[ \sum_{x \in \Lambda} \sum_{j,k=1}^{N} f(x + y_j - y_k) = \frac{1}{\operatorname{vol}(\mathbb{R}^n/\Lambda)} \sum_{t \in \Lambda^*} \hat{f}(t) \left| \sum_{j=1}^{N} e^{2\pi i \langle y_j, t \rangle} \right|^2. \]

The inequalities on $f$ and $\hat{f}$ show that the left side is at most $N f(0)$ and the right side is at least $N^2 \hat{f}(0)/\operatorname{vol}(\mathbb{R}^n/\Lambda)$. It follows that
\[ \frac{\pi^{n/2}}{(n/2)!} \cdot \frac{N}{\operatorname{vol}(\mathbb{R}^n/\Lambda)} \le \frac{\pi^{n/2}}{(n/2)!} \cdot \frac{f(0)}{\hat{f}(0)}, \]
which completes the proof. □
In this proof, the inequality
\[ \sum_{x \in \Lambda} \sum_{j,k=1}^{N} f(x + y_j - y_k) \ge \frac{N^2 \hat{f}(0)}{\operatorname{vol}(\mathbb{R}^n/\Lambda)} \]
for functions satisfying $\hat{f} \ge 0$ plays the role of the Delsarte inequalities. Note that the left side is essentially summing $f$ over the distance distribution of the packing, but renormalized so that the distances do not occur infinitely often. In physics terms [77, p. 72], this inequality says that the structure factor (the Fourier transform of $g_2 - 1$, where $g_2$ is the pair correlation function) is nonnegative. The structure factor plays an important role in the theory of scattering, and its nonnegativity is a fundamental constraint on the pair correlations that can occur in any material.
The Poisson summation approach to linear programming bounds generalizes naturally to the Selberg trace formula [71, 35]. Specifically, one can use a pretrace formula to prove density bounds in hyperbolic space [25]. However, several things that are known in the Euclidean case remain mysterious in hyperbolic geometry. In particular, the bounds based on the pretrace formula have been proved only for periodic packings, which are not known to come arbitrarily close to the optimal density in hyperbolic space, and it is not known how to decrease the hypotheses on the auxiliary function $f$ along the lines of the proof for Euclidean space in [25].

4. Optimization and conjectures
As in the spherical case, nobody knows how to choose the optimal auxiliary function in the linear programming bounds. It is not difficult to obtain a trivial bound as follows:

Exercise 4.1. Let $\chi_B$ be the characteristic function of the unit ball centered at the origin in $\mathbb{R}^n$. Show that the convolution $f = \chi_B * \chi_B$ satisfies the hypotheses of Theorem 3.1 and yields an upper bound of 1 for the sphere packing density.
Despite the triviality of the bound, this function is of some interest [73], but better constructions are needed if we are to prove nontrivial bounds. See, for example, §6 of [16] for constructions based on Bessel functions.
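A quick numerical illustration of this exercise in $\mathbb{R}^1$ (my own sketch, not a full solution): there $B = [-1,1]$, the convolution is the tent function $f(x) = \max(0, 2-|x|)$, its Fourier transform is $(\sin(2\pi t)/(\pi t))^2 \ge 0$, and the bound is $2 \cdot f(0)/\hat{f}(0) = 1$.

```python
import numpy as np

# Exercise 4.1 in R^1: f = chi_B * chi_B is the tent function on [-2, 2].
x, dx = np.linspace(-2, 2, 40001, retstep=True)
fx = np.maximum(0.0, 2.0 - np.abs(x))

t = np.linspace(-3, 3, 25)
fhat = np.array([np.sum(fx * np.cos(2 * np.pi * tt * x)) * dx for tt in t])
print(fhat.min() >= -1e-8)         # nonnegative Fourier transform
vol_ball = 2.0                     # pi^{n/2}/(n/2)! for n = 1
print(vol_ball * 2.0 / fhat[12])   # t[12] = 0, f(0) = 2: trivial density bound 1.0
```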

The behavior of the optimized linear programming bound in $\mathbb{R}^n$ as $n \to \infty$ is unclear. Cohn and Zhao [25] showed that it is at least as good as the Kabatiansky-Levenshtein bound of $2^{-(0.5990\ldots+o(1))n}$, while Torquato and Stillinger [78] showed that it can be no better than $2^{-(0.7786\ldots+o(1))n}$. In particular, it comes nowhere near the density of $2^{-(1+o(1))n}$ attained by the best sphere packings currently known, although it might come much closer than the Kabatiansky-Levenshtein bound does. Aside from these constraints, the asymptotics are a mystery.
We will focus instead on how the bounds behave in low dimensions, by which I mean "not tending to infinity" rather than low in the everyday sense.

Figure 1. The logarithm of sphere packing density as a function of dimension. The upper curve is the linear programming bound, while the lower curve is the best packing currently known. Vertical lines mark conjectured equality above one dimension.
Linear programming bounds are nearly the best bounds known in four or more dimensions (there is a small improvement based on incorporating one more term from Poisson summation [48]). As shown in Figure 1, they are not so far from the truth in eight or fewer dimensions, but they gradually drift away from the current record densities in high dimensions. Note that the jaggedness of the record densities reflects their subtle dependence on the dimension.
The most remarkable feature of Figure 1 is that the curves appear to touch in eight and twenty-four dimensions. If true, this would settle the sphere packing problem in those dimensions, without the difficulties that plague three dimensions.
Conjecture 4.2 (Cohn and Elkies [16]). The linear programming bounds for sphere packing density in $\mathbb{R}^n$ are sharp when $n = 2$, 8, or 24.
Equality holds to at least fifty decimal places [23], but no proof is known.¹ It is furthermore conjectured that the linear programming bounds for energy are sharp, which would lead to universal optimality in Euclidean space [18, §9].
Examining the proof of Theorem 3.1 shows that the auxiliary function $f$ proves a sharp bound for a lattice $\Lambda$ iff $f(x) = 0$ for all $x \in \Lambda \setminus \{0\}$ and $\hat{f}(t) = 0$ for all $t \in \Lambda^* \setminus \{0\}$. In other words, all we have to do is to ensure that $f$ and $\hat{f}$ have certain roots without developing any unwanted sign changes. That sounds like a manageable problem, but unfortunately it seems difficult to control the roots of a function and its Fourier transform simultaneously.

¹This conjecture has since been proved for $n = 8$ in [82] and $n = 24$ in [21].

Linear programming bounds seem not to be sharp in $\mathbb{R}^n$ except when $n = 1$, 2, 8, or 24. We can't rule out the possibility of sharp bounds in other dimensions, but nobody has been able to identify any plausible candidates. The $n = 1$ case follows from Exercise 4.1, but sharpness is not known even for $n = 2$, let alone 8 or 24. That makes it seem all the more mysterious: eight or twenty-four dimensions could be truly deep, but two dimensions cannot transcend human understanding.
The strongest evidence for the sharpness of these bounds is the numerics, but there are also analogies with related bounds that are known to be sharp in these dimensions, such as those for the kissing problem [52, 61]. It is worth noting that the kissing bounds are not just sharp, but sharp to an unnecessary degree. Because the kissing number is an integer, any bound with error less than 1 could be truncated to the exact answer, but that turns out not to be necessary. In particular, the bound in Theorem 1.1 is generally not an integer, but for the kissing problem it miraculously turns out to be integral in $\mathbb{R}^8$ and $\mathbb{R}^{24}$: it is exactly 240 in $\mathbb{R}^8$ and exactly 196560 in $\mathbb{R}^{24}$. This unexpected exactness raises the question of whether sharp bounds hold also for quantities such as packing density, where integrality does not apply, and indeed they do seem to.
The computations behind Figure 1 are based on numerical optimization within a restricted class of auxiliary functions. Specifically, they use functions of the form $f(x) = p(|x|^2)\,e^{-\pi|x|^2}$, where $p$ is a polynomial of one variable. Such functions are relatively tractable, while being dense among all reasonable radial functions. To carry out explicit computations, it is convenient to write $p$ in terms of an eigenbasis of the Fourier transform:
Exercise 4.3. Let
\[ \mathcal{P}_k = \left\{ \text{functions } x \mapsto p(|x|^2)\,e^{-\pi|x|^2} \text{ on } \mathbb{R}^n \;\middle|\; p \text{ is a polynomial and } \deg(p) \le k \right\}. \]
Prove that $\mathcal{P}_k$ is closed under the Fourier transform. What are the eigenvalues of the Fourier transform on $\mathcal{P}_k$? Show that the polynomials $p$ corresponding to an eigenbasis are orthogonal with respect to a certain measure on $[0, \infty)$, and compute that measure.
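For intuition, here is a one-dimensional check of my own (not a solution to the exercise): $g(x) = (x^2 - \frac{1}{4\pi})\,e^{-\pi x^2}$ lies in $\mathcal{P}_1$ and is an eigenfunction of the Fourier transform with eigenvalue $-1$, which the following verifies numerically.

```python
import numpy as np

# Check that g(x) = (x^2 - 1/(4 pi)) exp(-pi x^2) satisfies ghat = -g in R^1.
x, dx = np.linspace(-8, 8, 80001, retstep=True)
g = (x**2 - 1 / (4 * np.pi)) * np.exp(-np.pi * x**2)

t = np.array([0.0, 0.3, 0.7, 1.2])
ghat = np.array([np.sum(g * np.cos(2 * np.pi * tt * x)) * dx for tt in t])
gval = (t**2 - 1 / (4 * np.pi)) * np.exp(-np.pi * t**2)
print(np.max(np.abs(ghat + gval)))   # ~0: the Fourier transform is -g
```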

The most general approach to optimizing the linear programming bounds over $\mathcal{P}_k$ is to impose the sign conditions on $f$ and $\hat{f}$ via sums of squares and then optimize using semidefinite programming [62]. This technique will produce the best possible polynomial $p$ of any given degree. (Another approach is to force roots for $f$ and $\hat{f}$ and then optimize the root locations [16].) However, we cannot obtain a sharp bound by using polynomials, because they have only finitely many roots. The best we can do is to approximate the optimal bound. By contrast, in the spherical case we can prove sharp bounds using polynomials. That is one reason why linear programming bounds are so much more tractable for spheres than they are in Euclidean space.
Numerical computations have thus far shed little light on the high-dimensional asymptotics of linear programming bounds. These bounds are difficult to compute precisely in high dimensions, because such computations seem to require using high-degree polynomials. Computing the linear programming bound in $\mathbb{R}^{1000}$ to fifteen decimal places would be an impressive benchmark, which might be possible but would not be easy. Even just a few decimal places would be interesting, as would an order of magnitude estimate in $\mathbb{R}^{10000}$.

Table 2. Numerically computed Taylor series coefficients of the hypothetical sphere packing functions in $\mathbb{R}^n$, normalized so $f(0) = \hat{f}(0) = 1$.

n | function | order | coefficient | conjecture
8 | $f$ | 2 | $-2.7000000000000000000000000000\ldots$ | $-27/10$
8 | $\hat{f}$ | 2 | $-1.5000000000000000000000000000\ldots$ | $-3/2$
24 | $f$ | 2 | $-2.6276556776556776556776556776\ldots$ | $-14347/5460$
24 | $\hat{f}$ | 2 | $-1.3141025641025641025641025641\ldots$ | $-205/156$
8 | $f$ | 4 | $4.2167501240968298210998965628\ldots$ | ?
8 | $\hat{f}$ | 4 | $-1.2397969070295980026220596589\ldots$ | ?
24 | $f$ | 4 | $3.8619903167183007758184168473\ldots$ | ?
24 | $\hat{f}$ | 4 | $-0.7376727789015322303799539712\ldots$ | ?

Even without being able to prove that the linear programming bounds are sharp in $\mathbb{R}^8$ and $\mathbb{R}^{24}$, they can be combined with further arguments to prove optimality among lattices:
Theorem 4.4 (Cohn and Kumar [19]). The Leech lattice is the unique densest lattice in $\mathbb{R}^{24}$.
See also the exposition in [17]. Aside from $\mathbb{R}^{24}$, the optimal lattices are known only in up to eight dimensions [9, 81].
Cohn and Miller observed that the hypothetical auxiliary functions proving sharp bounds in $\mathbb{R}^8$ and $\mathbb{R}^{24}$ have additional structure, which has not yet been explained. The patterns are prettiest if we rescale the function $f$ and its input so that $f(0) = \hat{f}(0) = 1$, in which case the linear programming bounds amount to minimizing the radius $r$ such that $f(x) \le 0$ for $|x| \ge r$ (see Theorem 3.2 in [16]). Then the quadratic Taylor coefficients appear to be rational numbers:
Conjecture 4.5 (Cohn and Miller [23]). The quadratic Taylor coefficients of the optimal radial functions $f$ and $\hat{f}$, normalized as above with $f(0) = \hat{f}(0) = 1$, are rational numbers when $n = 8$ or $n = 24$, as shown in Table 2.
Because $f$ and $\hat{f}$ are even functions, the odd-degree Taylor coefficients vanish. The fourth-degree coefficients shown in Table 2 remain unidentified. If all the coefficients could be determined, it would open the door to solving the sphere packing problem in $\mathbb{R}^8$ and $\mathbb{R}^{24}$.

Bibliography

[1] M. Ajtai, The shortest vector problem in L2 is NP-hard for randomized reductions, in Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, pp. 10–19, Association for Computing Machinery, New York, 1998.
[2] N. N. Andreev, An extremal property of the icosahedron, East J. Approx. 2 (1996), 459–462. MR1426716
[3] N. N. Andreev, Location of points on a sphere with minimal energy, Proc. Steklov Inst. Math. 219 (1997), 20–24. MR1642295
[4] S. Axler, P. Bourdon, and W. Ramey, Harmonic function theory, second edition, Graduate Texts in Mathematics 137, Springer-Verlag, New York, 2001. MR1805196
[5] C. Bachoc and F. Vallentin, New upper bounds for kissing numbers from semidefinite programming, J. Amer. Math. Soc. 21 (2008), 909–924. MR2393433
[6] C. Bachoc and F. Vallentin, Optimality and uniqueness of the (4, 10, 1/6) spherical code, J. Combin. Theory Ser. A 116 (2009), 195–204. MR2469257
[7] K. Ball, A lower bound for the optimal density of lattice packings, Internat. Math. Res. Notices 1992, 217–221. MR1191572
[8] B. Ballinger, G. Blekherman, H. Cohn, N. Giansiracusa, E. Kelly, and A. Schürmann, Experimental study of energy-minimizing point configurations on spheres, Experiment. Math. 18 (2009), 257–283. MR2555698
[9] H. F. Blichfeldt, The minimum values of positive quadratic forms in six, seven and eight variables, Math. Z. 39 (1935), 1–15. MR1545485
[10] A. V. Bondarenko, D. P. Hardin, and E. B. Saff, Mesh ratios for best-packing and limits of minimal energy configurations, Acta Math. Hungar. 142 (2014), 118–131. MR3158856
[11] M. Bowick and L. Giomi, Two-dimensional matter: order, curvature and defects, Adv. in Phys. 58 (2009), 449–563.
[12] H. Cohn, New upper bounds on sphere packings II, Geom. Topol. 6 (2002), 329–353. MR1914571
[13] H. Cohn, Order and disorder in energy minimization, Proceedings of the International Congress of Mathematicians, Hyderabad, August 19–27, 2010, Volume IV, pages 2416–2443, Hindustan Book Agency, New Delhi, 2010. MR2827978
[14] H. Cohn, A conceptual breakthrough in sphere packing, Notices Amer. Math. Soc. 64 (2017), 102–115. MR3587715
[15] H. Cohn, J. H. Conway, N. D. Elkies, and A. Kumar, The D4 root system is not universally optimal, Experiment. Math. 16 (2007), 313–320. MR2367321
[16] H. Cohn and N. D. Elkies, New upper bounds on sphere packings I, Ann. of Math. (2) 157 (2003), 689–714. MR1973059
[17] H. Cohn and A. Kumar, The densest lattice in twenty-four dimensions, Electron. Res. Announc. Amer. Math. Soc. 10 (2004), 58–67. MR2075897
[18] H. Cohn and A. Kumar, Universally optimal distribution of points on spheres, J. Amer. Math. Soc. 20 (2007), 99–148. MR2257398
[19] H. Cohn and A. Kumar, Optimality and uniqueness of the Leech lattice among lattices, Ann. of Math. (2) 170 (2009), 1003–1050. MR2600869
[20] H. Cohn and A. Kumar, Algorithmic design of self-assembling structures, Proc. Natl. Acad. Sci. USA 106 (2009), 9570–9575.
[21] H. Cohn, A. Kumar, S. D. Miller, D. Radchenko, and M. Viazovska, The sphere packing problem in dimension 24, Ann. of Math. (2) 185 (2017), 1017–1033.
[22] H. Cohn, A. Kumar, and G. Minton, Optimal simplices and codes in projective spaces, Geom. Topol. 20 (2016), 1289–1357. MR3523059


[23] H. Cohn and S. D. Miller, Some properties of optimal functions for sphere packing in dimensions 8 and 24, preprint, 2016, arXiv:1603.04759.
[24] H. Cohn and J. Woo, Three-point bounds for energy minimization, J. Amer. Math. Soc. 25 (2012), 929–958. MR2947943
[25] H. Cohn and Y. Zhao, Sphere packing bounds via spherical codes, Duke Math. J. 163 (2014), 1965–2002. MR3229046
[26] H. Cohn and Y. Zhao, Energy-minimizing error-correcting codes, IEEE Trans. Inform. Theory 60 (2014), 7442–7450. MR3285724
[27] J. H. Conway and N. J. A. Sloane, What are all the best sphere packings in low dimensions?, Discrete Comput. Geom. 13 (1995), 383–403. MR1318784
[28] J. H. Conway and N. J. A. Sloane, Sphere packings, lattices and groups, third edition, Grundlehren der Mathematischen Wissenschaften 290, Springer, New York, 1999. MR1662447
[29] H. S. M. Coxeter, L. Few, and C. A. Rogers, Covering space with equal spheres, Mathematika 6 (1959), 147–157. MR0124821
[30] P. Delsarte, Bounds for unrestricted codes, by linear programming, Philips Res. Rep. 27 (1972), 272–289. MR0314545
[31] P. Delsarte, J. M. Goethals, and J. J. Seidel, Spherical codes and designs, Geom. Dedicata 6 (1977), 363–388. MR0485471
[32] P. D. Dragnev, D. A. Legg, and D. W. Townsend, Discrete logarithmic energy on the sphere, Pacific J. Math. 207 (2002), 345–358. MR1972249
[33] F. J. Dyson, A Brownian-motion model for the eigenvalues of a random matrix, J. Math. Phys. 3 (1962), 1191–1198. MR0148397
[34] W. Ebeling, Lattices and codes: a course partially based on lectures by Friedrich Hirzebruch, third edition, Advanced Lectures in Mathematics, Springer Spektrum, Wiesbaden, 2013. MR2977354
[35] J. Elstrodt, F. Grunewald, and J. Mennicke, Groups acting on hyperbolic space: harmonic analysis and number theory, Springer-Verlag, Berlin, 1998. MR1483315
[36] L. Fejes Tóth, Regular Figures, Pergamon Press, Macmillan, New York, 1964. MR0165423
[37] L. Fejes Tóth, Lagerungen in der Ebene auf der Kugel und im Raum, second edition, Springer, Berlin, 1972. MR0353117
[38] O. Goldreich, S. Goldwasser, and S. Halevi, Public-key cryptosystems from lattice reduction problems, in Advances in Cryptology – CRYPTO '97, Lecture Notes in Computer Science, volume 1294, pp. 112–131, Springer-Verlag, Berlin, 1997. MR1630399
[39] H. Groemer, Existenzsätze für Lagerungen im Euklidischen Raum, Math. Z. 81 (1963), 260–278. MR0163222
[40] T. C. Hales, Cannonballs and honeycombs, Notices Amer. Math. Soc. 47 (2000), 440–449. MR1745624
[41] T. C. Hales, A proof of the Kepler conjecture, Ann. of Math. (2) 162 (2005), 1065–1185. MR2179728
[42] T. Hales, M. Adams, G. Bauer, T. D. Dang, J. Harrison, L. T. Hoang, C. Kaliszyk, V. Magron, S. McLaughlin, T. T. Nguyen, Q. T. Nguyen, T. Nipkow, S. Obua, J. Pleso, J. Rute, A. Solovyev, T. H. A. Ta, N. T. Tran, T. D. Trieu, J. Urban, K. Vu, and R. Zumkeller, A formal proof of the Kepler conjecture, Forum Math. Pi 5 (2017), e2, 29 pp.
[43] G. A. Kabatiansky and V. I. Levenshtein, Bounds for packings on a sphere and in space, Probl. Inf. Transm. 14 (1978), 1–17.
[44] Y. Kallus, Statistical mechanics of the lattice sphere packing problem, Phys. Rev. E 87 (2013), 063307, 5 pp.
[45] Y. Kallus, V. Elser, and S. Gravel, Method for dense packing discovery, Phys. Rev. E 82 (2010), 056707, 14 pp.
[46] A. V. Kolushov and V. A. Yudin, On the Korkin-Zolotarev construction, Discrete Math. Appl. 4 (1994), 143–146. MR1273240
[47] A. V. Kolushov and V. A. Yudin, Extremal dispositions of points on the sphere, Anal. Math. 23 (1997), 25–34. MR1630001
[48] D. de Laat, F. M. de Oliveira Filho, and F. Vallentin, Upper bounds for packings of spheres of several radii, Forum Math. Sigma 2 (2014), e23, 42 pp. MR3264261
[49] D. de Laat and F. Vallentin, A semidefinite programming hierarchy for packing problems in discrete geometry, Math. Program. 151 (2015), Ser. B, 529–553. MR3348162

[50] J. Leech, Equilibrium of sets of particles on a sphere, Math. Gaz. 41 (1957), 81–90. MR0086325
[51] A. K. Lenstra, H. W. Lenstra, Jr., and L. Lovász, Factoring polynomials with rational coefficients, Math. Ann. 261 (1982), 515–534. MR682664
[52] V. I. Levenshtein, On bounds for packings in n-dimensional Euclidean space, Soviet Math. Dokl. 20 (1979), 417–421. MR529659
[53] V. I. Levenshtein, Designs as maximum codes in polynomial metric spaces, Acta Appl. Math. 29 (1992), 1–82. MR1192833
[54] H. Löwen, Fun with hard spheres, in K. R. Mecke and D. Stoyan, eds., Statistical physics and spatial statistics: the art of analyzing and modeling spatial structures and pattern formation, Lecture Notes in Physics 554, Springer, New York, 2000, pp. 295–331. MR1870950
[55] É. Marcotte and S. Torquato, Efficient linear programming algorithm to generate the densest lattice sphere packings, Phys. Rev. E 87 (2013), 063303, 9 pp.
[56] G. Minton, unpublished notes, 2011.
[57] O. Musin, The kissing number in four dimensions, Ann. of Math. (2) 168 (2008), 1–32. MR2415397
[58] L. Nachbin, The Haar integral, D. Van Nostrand Company, Inc., Princeton, NJ, 1965. MR0175995
[59] P. Nguyen, Cryptanalysis of the Goldreich-Goldwasser-Halevi cryptosystem from Crypto '97, in Advances in Cryptology – CRYPTO '99, Lecture Notes in Computer Science, volume 1666, pp. 288–304, Springer-Verlag, Berlin, 1999.
[60] P. Q. Nguyen and B. Vallée, eds., The LLL algorithm: survey and applications, Springer-Verlag, Berlin, 2010. MR2722178
[61] A. M. Odlyzko and N. J. A. Sloane, New bounds on the number of unit spheres that can touch a unit sphere in n dimensions, J. Combin. Theory Ser. A 26 (1979), 210–214. MR530296
[62] P. A. Parrilo, Semidefinite programming relaxations for semialgebraic problems, Math. Program. 96 (2003), Ser. B, 293–320. MR1993050
[63] C. Peikert, A decade of lattice cryptography, Found. Trends Theor. Comput. Sci. 10 (2014), 283–424. MR3494162
[64] F. Pfender, Improved Delsarte bounds for spherical codes in small dimensions, J. Combin. Theory Ser. A 114 (2007), 1133–1147. MR2337242
[65] C. A. Rogers, The packing of equal spheres, Proc. London Math. Soc. (3) 8 (1958), 609–620. MR0102052
[66] I. J. Schoenberg, Positive definite functions on spheres, Duke Math. J. 9 (1942), 96–108. MR0005922
[67] A. Schrijver, New code upper bounds from the Terwilliger algebra and semidefinite programming, IEEE Trans. Inform. Theory 51 (2005), 2859–2866. MR2236252
[68] K. Schütte and B. L. van der Waerden, Das Problem der dreizehn Kugeln, Math. Ann. 125 (1953), 325–334. MR0053537
[69] R. E. Schwartz, The five-electron case of Thomson's problem, Exp. Math. 22 (2013), 157–186. MR3047910
[70] R. E. Schwartz, The triangular bi-pyramid minimizes a range of power law potentials, preprint, 2015, arXiv:1512.04628.
[71] A. Selberg, Harmonic analysis and discontinuous groups in weakly symmetric Riemannian spaces with applications to Dirichlet series, J. Indian Math. Soc. (N.S.) 20 (1956), 47–87. MR0088511
[72] C. E. Shannon, A mathematical theory of communication, Bell System Tech. J. 27 (1948), 379–423 and 623–656. MR0026286
[73] C. L. Siegel, Über Gitterpunkte in convexen Körpern und ein damit zusammenhängendes Extremalproblem, Acta Math. 65 (1935), 307–323. MR1555407
[74] C. L. Siegel, A mean value theorem in geometry of numbers, Ann. of Math. (2) 46 (1945), 340–347. MR0012093
[75] M. Dutour Sikirić, A. Schürmann, and F. Vallentin, Classification of eight-dimensional perfect forms, Electron. Res. Announc. Amer. Math. Soc. 13 (2007), 21–32. MR2300003
[76] A. Thue, Om nogle geometrisk-taltheoretiske Theoremer, Forhandlingerne ved de Skandinaviske Naturforskeres 14 (1892), 352–353.
[77] S. Torquato, Random heterogeneous materials: microstructure and macroscopic properties, Interdisciplinary Applied Mathematics 16, Springer-Verlag, New York, 2002. MR1862782



IAS/Park City Mathematics Series Volume 23, 2014

Entropy, Probability and Packing

Alpha A Lee, Daan Frenkel

Introduction

The origin of the concept of time is lost in the mists of time itself. All definitions of time tend to be self-referential. However, thermodynamics and, subsequently, statistical physics have given us a better understanding of the arrow of time. These deep insights originated from a rather unexpected direction: the science of steam engines. The work of Carnot and Clausius on the efficiency of ideal heat engines resulted in the formulation (by Clausius) of the Second Law of Thermodynamics. In the form that Clausius gave it, the Second Law of Thermodynamics introduces a quantity termed Entropy. This Entropy has interesting properties: it is a non-decreasing function of time and, for a closed system, it is maximal at equilibrium. Hence when we speak of entropy as 'the arrow of time', we actually turn things around and imply that time is a monotonically non-decreasing function of entropy: it is the increase of entropy that distinguishes the past from the future. The thermodynamic formulation of the Second Law of Thermodynamics is fascinating, and it is useful as it gives us a quantitative understanding of macroscopic thermomechanical processes such as engines. However, the Thermodynamic Second Law does not provide us with a microscopic interpretation of entropy. Statistical mechanics, as created by Boltzmann, Gibbs and Planck, changed all that. In recent decades our intuitive understanding of entropy has increased substantially, partly due to our ability to perform extensive numerical simulations. Those simulations allow us to compute the entropy of complex systems, and relate entropy to molecular configurations. In these lectures we will discuss how the modern understanding of entropy sheds light on the physics of Soft Matter and Granular Media.

Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK (AAL) Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK (DF) E-mail address: [email protected] (DF) DF acknowledges support from the European Research Council [Advanced Grant 227758] and the Engineering and Physical Sciences Research Council [Programme Grant EP/I001352/1].

© 2017 American Mathematical Society


LECTURE 1
Introduction to thermodynamics and statistical physics

In this lecture, we first review the basis of the classical "laws" of thermodynamics, which describe experimental observations on macroscopic systems. Thermodynamics predates the atomic picture of matter, and does not provide any microscopic interpretation of the thermodynamic quantities. To bridge the macro- and microscales, we will introduce the framework of statistical mechanics and thermodynamic ensembles. Unlike at the macroscale, defining temperature at the microscale turns out to be a non-trivial exercise, and we will discuss different formulations of temperature and whether or not temperature can be negative. Finally, we will discuss the origin of the arrow of time, and how entropy is compatible with the time-reversibility of classical Newtonian dynamics.

1. Classical equilibrium thermodynamics

The era of the industrial revolution and steam engines motivated the development of thermodynamics — the science of how thermal energy is converted to mechanical work. Seminal studies in the early and mid 19th century by Sadi Carnot, Rudolf Clausius and Josiah Gibbs, and by Walther Nernst in the early 20th century, led to the formulation of three "laws" of thermodynamics (and a "Zeroth Law"), which summarised the experimental observations of the time (for a light-hearted summary attributed to Ginsberg [17], see Figure 1). Despite their axiomatic appearance, it is important to bear in mind that these "laws" are purely phenomenological — the microscopic interpretation of concepts such as temperature and free energy lies outside the classical framework.

1.1. The Zeroth Law: The concept of temperature

The zeroth law of thermodynamics is an observation about the transitivity of thermodynamic equilibrium. It states that:

Zeroth Law of Thermodynamics: If two systems, A and B, are separately in thermal equilibrium with a third system, C, then A and B are also in thermal equilibrium with each other, and there is no heat flow between A and B when they are in contact.

The zeroth law is, in effect, a two-part statement. The first is a definition of thermal equilibrium — thermal equilibrium between two systems is attained when there is no energy exchange (heat flow) between them when they are in contact. The second is a statement about the transitive nature of thermal equilibrium. This property of transitivity is the basis of thermometry: we can devise a probe (a thermometer) and define a temperature readout based on material properties of the thermometer, such that if we know that the thermometer is in thermal equilibrium


with two objects independently, then the two objects will have the same temperature. Conversely, by the zeroth law, there is no heat flow between two objects with the same temperature.

Figure 1. The three laws of thermodynamics in a casino. The law that corresponds to each statement is left as an exercise for the reader.

1.2. The First Law: Heat is work, and work is heat

The first law of thermodynamics expresses the empirical observation that energy is conserved, even though it can be converted into various forms. The intuition is that any change in the internal energy of a system is the sum of the heat transferred to/from the system and the work done on/by the system. A crucial observation is that heat can be converted to work, and vice versa. Therefore, although energy is conserved, heat and work themselves are not independently conserved. A precise statement of the first law for an infinitesimal process performed on the system is:

(1.1) $dE = đq + đw$,

where dE is the change in internal energy of the system, đq is the heat transferred to the system and đw is the mechanical work performed on the system. Note that the notation đ denotes an inexact differential — heat and work cannot be expressed in terms of an antiderivative for the purpose of integral calculations; i.e. their values cannot be inferred just from the initial and final states of a given system. The energy, however, is an exact differential, and the change in energy going from one state to another depends only on the initial and final states, but not on the path that the system takes.

The first law rules out the dream of developing perpetual motion machines that permanently produce work without any input of energy — a common scientific fantasy from Hellenistic times to the 19th century (and, surprisingly, still alive and kicking). Energy can neither be created nor destroyed: it can only be transformed from one form to another. Hence, it is impossible for a machine to do work indefinitely without energy input.

1.3. The Second Law: Heat flows from hot to cold

Another type of perpetual motion machine attempts to convert heat from a single heat bath (i.e. a large reservoir in equilibrium) into work, which does not contradict the first law as long as the sum of heat and work is conserved. A concrete example of this is the Zeromotor, hypothesised by John Gamgee in 1880: a reservoir of ammonia, which boils at −33 °C, would be vaporised due to heat flow from the surroundings. This produces a vapour pressure of 4 atmospheres that would drive a piston. As the vapour expands it would also cool, and Gamgee reasoned that it should condense and be returned to the reservoir for the next cycle. Perpetual motion of that kind would, indeed, completely solve the world's energy problem. This is too good to be true. The pitfall lies in the empirical observation that heat can never flow spontaneously (i.e. without the expenditure of work) from a cold reservoir to a warmer reservoir. In the case of the Zeromotor, the spontaneous condensation of ammonia gas back to the liquid state would require heat to flow spontaneously from the colder gas to the hotter surroundings. This motivates the second law:

Second Law of Thermodynamics: Heat flows spontaneously only from a hotter object to a colder object.

This statement is actually a bit more subtle than it seems because, before we have defined temperature, we can only distinguish hotter and colder by looking at the direction of heat flow. What the second law says is that it is never possible to make heat flow spontaneously in the "wrong" direction. This, in effect, defines an arrow of time. Heuristically, the second law reveals that heat and work are different "qualities" of energy — we can convert work to heat as we please, but the second law puts a fundamental restriction on how heat can be converted to work. There are more abstract formulations of the second law, most notably in terms of a quantity known as entropy. We will look deeper into the ramifications of the second law and the concept of entropy by introducing the concept of a reversible heat engine.

Reversible heat engines

A reversible thermodynamic process is one that can be run backward by simply reversing the inputs and the outputs. It is the analogue of a frictionless system in mechanics. Since time-reversibility implies equilibrium, a reversible thermodynamic process is at equilibrium at each stage of its operation and thus is quasi-static. Well before the First Law of Thermodynamics was known, Sadi Carnot applied the concept of a reversible process to a simple model of a heat engine which exchanges heat between two reservoirs kept at two different temperatures (see Figure 2). The complete analysis of the Carnot engine only became possible with the arrival of the first and second laws. Note that a minimal heat engine requires two thermal reservoirs — the continuous extraction of work from only one reservoir would contradict the second law. During one cycle of the Carnot engine, i.e. the sequence of steps that is completed when the engine is returned to its original state, this engine takes in an amount of heat $q_H$ from a hot reservoir (e.g. coal that is burning), converts part of it into work w (pushing the piston) and delivers the remaining amount of heat $q_C$ to a cold reservoir (the surrounding environment). The reverse process is that, by performing an amount of work w, we can take an amount of heat $q_C$ from the cold reservoir and deliver an amount of heat $q_H$ to the hot reservoir.


Figure 2. The Carnot engine operates between temperatures $T_H$ and $T_C$ with no other heat exchanges.

Reversible engines are idealisations because in any real engine there will be additional heat losses due to friction. However, the ideal reversible engine can be approximated arbitrarily closely by a real engine if, at every stage, the real engine is sufficiently close to equilibrium. As the engine is returned to its original state at the end of one cycle, its internal energy E has not changed. Hence, the first law tells us that

(1.2) $\Delta E = q_H - (w + q_C) = 0$.

The figure of merit for heat engines is the efficiency, defined as the amount of work that one can extract per amount of heat taken in, viz.

(1.3) $\eta = \dfrac{w}{q_H}$.

At first, one might think that η depends on the precise design of our reversible engine. However, the second law actually implies:

Carnot's Theorem: Given the temperatures of the two reservoirs, no engine can be more efficient than the reversible Carnot engine.

This can be proved via the following argument. Since a Carnot engine is reversible, it can be run backward as a refrigerator. We use the non-Carnot engine to run the Carnot engine backward (see Figure 3). Denoting the heat exchanges of the non-Carnot and Carnot engines by $q_H$, $q_C$, $q'_H$ and $q'_C$, respectively, the net effect of the combined system is to transfer heat equal to $q_H - q'_H = q_C - q'_C$ from the hot reservoir to the cold reservoir. As heat flows from hot to cold, $q_H - q'_H \geq 0$. The same amount of work, w, is involved in both engines, hence we must have

(1.4) $\dfrac{w}{q_H} \leq \dfrac{w}{q'_H} \implies \eta \leq \eta_c$.

Carnot's theorem was an important discovery at the time — it shows that thermodynamics places an upper bound on the efficiency of steam engines.

The thermodynamic temperature scale

We have shown that all cyclic and reversible engines, independent of the material used, design and construction, have the same efficiency, which depends only on the two reservoir temperatures. This conveniently allows us to define a temperature scale.


Figure 3. A Carnot engine run in reverse by another heat engine.

To do this, we consider two Carnot engines connected back-to-back (see Figure 4). Expressing the heat flow as a function of the efficiency of the engine, we obtain

(1.5) Q2 = Q1(1 − η(T1,T2)),

(1.6) Q3 = Q2(1 − η(T2,T3)) = Q1(1 − η(T1,T2))(1 − η(T2,T3)), and, considering the two engines as a combined system,

(1.7) Q3 = Q1(1 − η(T1,T3)).

Comparing the two expressions for Q3 yields the functional relation

(1.8) (1 − η(T1,T2))(1 − η(T2,T3)) = 1 − η(T1,T3)

As the relation must hold for all $T_1$, $T_2$ and $T_3$, it follows that the bivariate function $f(x,y) = 1 - \eta(x,y)$ takes the form

(1.9) $f(x,y) = \dfrac{g(x)}{g(y)}$.

Now, we can define a new thermodynamic temperature scale (with slight abuse of notation) $T = g(T)$. In this scale, the efficiency is given by

(1.10) $\eta(T_1, T_2) = 1 - \dfrac{T_1}{T_2}$.

Moreover, in this universal temperature scale,

(1.11) $\dfrac{q_1}{T_1} = \dfrac{q_2}{T_2}$.
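For instance, an engine running between a cold reservoir at $T_1 = 300\,$K and a hot reservoir at $T_2 = 500\,$K has, at best, $\eta = 1 - 300/500 = 0.4$: no more than 40% of the heat drawn from the hot reservoir can be converted into work. Equation (1.11) fixes the heats themselves: if such an engine absorbs $q_2 = 500\,$J per cycle, it performs $w = 200\,$J of work and rejects $q_1 = 300\,$J, so that $q_1/T_1 = q_2/T_2 = 1\,$J/K.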

Therefore, during a complete cycle, the difference between $q_1/T_1$ and $q_2/T_2$ is zero. Recall that at the end of a cycle the internal energy of the system has not changed. Now Equation (1.11) tells us that there is also another conserved quantity, which we will call "entropy". The entropy, S, is unchanged when we restore the system to its original state. In the language of thermodynamics, we call S a state function. We do not know what S is, but we do know how to compute its change. In the above example, the change in S was given by $dS = (q_1/T_1) - (q_2/T_2) = 0$.


Figure 4. Two Carnot engines connected back to back.

In general, the change in entropy of a system due to the reversible addition of an infinitesimal amount of heat $đq_{rev}$ from a reservoir at temperature T is given by

(1.12) $dS = \dfrac{đq_{rev}}{T}$.

The most famous (though not the most intuitively obvious) statement of the second law of thermodynamics is that any spontaneous change in a closed system (i.e. a system that exchanges neither heat nor particles with its environment) can never lead to a decrease in the entropy. Hence, at equilibrium, the entropy of a closed system is at a maximum. Can we understand this? Well, let us first consider a system with an energy E, volume V and number of particles N that is in equilibrium. Let us denote the entropy of this system by $S_0(E,V,N)$. At equilibrium, all spontaneous changes that can happen have happened. Now suppose we want to change something in this system; for instance, we decrease the volume of the system by one half. As the system was at equilibrium, this change does not occur spontaneously. Hence, in order to effect this change, we must perform a certain amount of work w (for instance, by placing a piston in the system and moving it). Let us perform this work reversibly, in such a way that E, the total energy of the system, stays constant (as does N). The first law tells us that we can only keep E constant if, while we do the work, we allow an amount of heat q to flow out of the system, such that q = w. Equation (1.12) tells us that when an amount of heat q flows out of the system, the entropy S of the system must decrease. Let us denote the entropy of this constrained state by $S_1(E,V_1,N) < S_0(E,V,N)$. As this argument is quite general, we have indeed shown that any spontaneous change in a closed system leads to an increase in the entropy. Hence, at equilibrium, the entropy of a closed system is at a maximum.

Interpretations of entropy in terms of heat engines have their limits. After all, what is entropy? How can one interpret it microscopically? This key question cannot be answered with the tools of classical thermodynamics — we will introduce the framework of statistical mechanics in the next section.

1.4. The Third Law: Journey to absolute zero

The third law is much less intuitive than the other three. In its original form it states:

The entropy of a system cannot be reduced to its absolute-zero value in a finite number of operations.

However, the more conventional formulation is:

Third Law of Thermodynamics: The entropy of any perfectly ordered substance at zero absolute temperature is a universal constant and can be taken as zero.

Since the microscopic interpretation of entropy will become apparent in the subsequent section, we will not discuss the third law in detail.

2. Statistical physics of entropy

To understand what entropy means microscopically, we need to derive the framework of statistical mechanics. Let us take a step back, and imagine that we can track the positions and momenta of each particle. This gives us the phase space of the system. Suppose the system is completely isolated from its surroundings, such that its total energy is constant. In this case, not all regions of phase space can be explored, as some regions (with large momenta, for instance) would correspond to higher energy configurations. The volume of phase space can be seen as the continuum limit of the number of degenerate energy eigenstates if we consider the system using quantum mechanics. For the systems that are of interest to statistical mechanics, i.e. systems with $O(10^{23})$ particles, the degeneracy of energy levels is astronomically large — in fact, the word "astronomical" is misplaced: the numbers involved are so large that, by comparison, the total number of particles in the universe is utterly negligible. In what follows, we denote by Ω(E,V,N) the volume of phase space¹ that can be explored by a system of N particles with total energy E confined in a volume V. We now express the basic assumption of statistical mechanics as follows:

Ergodic Hypothesis: A system with fixed N, V and E is equally likely to be found in any region of the allowed phase space.

¹More precisely, $\Omega = \int dr^N\, dp^N\, \delta(E - H(r^N, p^N))$, where $H(r^N, p^N)$ is the Hamiltonian of the system of N particles.

Much of statistical mechanics follows from this simple (but highly non-trivial) assumption, known as the "ergodic hypothesis". To see how the ergodic hypothesis leads to statistical mechanics, let us first consider a system with total energy E that consists of two weakly interacting sub-systems. In this context, "weakly interacting" means that the sub-systems can exchange energy, but we can still write the total energy of the system as the sum of the energies $E_1$ and $E_2$ of the sub-systems. There are many ways in which we can distribute the total energy over the two sub-systems such that $E_1 + E_2 = E$. For a given choice of $E_1$, the total phase space volume of the system is $\Omega_1(E_1)\times\Omega_2(E_2)$. Note that the total phase space volume is not the sum but the product of the volumes of the individual systems.² In what follows, it will be convenient to have a measure of the phase space volumes of the sub-systems that scales linearly with the system size (an extensive quantity). A logical choice is to take the (natural) logarithm, giving

(1.13) $\log\Omega(E_1, E - E_1) = \log\Omega_1(E_1) + \log\Omega_2(E - E_1)$.

We assume that sub-systems 1 and 2 can exchange energy. As the two systems do not exchange particles and do not perform work on one another, the exchange of energy corresponds to the transfer of heat. What is the most likely distribution of the energy? We know that every point in the allowed phase space is equally likely; however, the volume of the allowed phase space that corresponds to a given distribution of the energy over the sub-systems depends very strongly on the value of $E_1$. We wish to know the most likely value of $E_1$, i.e. the one that maximises $\log\Omega(E_1, E - E_1)$. The condition for this maximum is that

(1.14) $\left(\dfrac{\partial\log\Omega(E_1, E - E_1)}{\partial E_1}\right)_{N,V,E} = 0$,

or, in other words,

(1.15) $\left(\dfrac{\partial\log\Omega_1(E_1)}{\partial E_1}\right)_{N,V} = \left(\dfrac{\partial\log\Omega_2(E_2)}{\partial E_2}\right)_{N,V}$.

Introducing the shorthand notation

(1.16) $\beta(E,V,N) \equiv \left(\dfrac{\partial\log\Omega(E,V,N)}{\partial E}\right)_{N,V}$,

Equation (1.15) can be written as

(1.17) β(E1,V1,N1)=β(E2,V2,N2). Clearly, if initially we put all energy in system 1 (say), there will be energy transfer from system 1 to system 2 until Equation (1.15) is satisfied. From that moment on, there is no net energy flow from one sub-system to the other, and we say that the two sub-systems are in thermal equilibrium. This implies that

²This can be seen by considering
$\Omega_{tot}(E) = \int dr_1^N\, dp_1^N\, dr_2^N\, dp_2^N\; \delta(E - H_1(r_1^N, p_1^N) - H_2(r_2^N, p_2^N))$;
for a fixed energy $E_1$ of sub-system 1, the corresponding contribution is
$\int dr_1^N\, dp_1^N\, dr_2^N\, dp_2^N\; \delta(E_1 - H_1(r_1^N, p_1^N))\,\delta(E - E_1 - H_2(r_2^N, p_2^N)) = \Omega_1(E_1)\,\Omega_2(E - E_1)$.

the condition $\beta(E_1,V_1,N_1) = \beta(E_2,V_2,N_2)$ must be equivalent to the statement that the two sub-systems have the same temperature. $\log\Omega$ is an extensive state function (of E, V and N), just like the thermodynamic entropy S. Moreover, when thermal equilibrium is reached, $\log\Omega$ of the total system is at a maximum, again just like S. This suggests that S is simply proportional to $\log\Omega$,

(1.18) $S = k_B\log\Omega$,

where $k_B$ is Boltzmann's constant, which, in SI units, has the value $1.38065\times 10^{-23}\,$J K$^{-1}$. This constant of proportionality cannot be derived; it follows from comparison with experiment. Thus, in the statistical picture, the second law of thermodynamics is not at all mysterious: it simply states that, in thermal equilibrium, the system is most likely to be found in the state that has the largest number of realisations.

An objection to entropy and the second law, known as the Loschmidt paradox, is as follows: Newton's equations of motion are time-reversible. Therefore, assuming that the laws of classical mechanics hold, it should not be possible to deduce an irreversible process. The key to "solving" this paradox is to bear in mind that the second law must be interpreted as a probabilistic law — precisely stated, it should read "with high probability, entropy does not spontaneously decrease and heat does not flow from cold to hot." Suppose we start with particles confined within a small region of phase space (e.g. a confined gas). By the ergodic hypothesis, once we remove the confinement, particles can explore all regions of the phase space that are allowed by conservation of energy. Therefore, at long times, there is no reason for the particles to localise themselves in the initial, small region. Rather, they will tend to be as spread out as possible, and this is precisely entropy maximisation (see Figure 5).

Figure 5. A schematic diagram showing entropy maximisation. The high dimensional phase space is projected onto a 2D surface for ease of visualisation. $\Omega_0$ and $\Omega_1$ denote the phase space volumes of the initial and final states.

Using this mental picture, the probability of the system spontaneously returning to the confined state (and breaking the second law) is simply given by

(1.19) $p = \dfrac{\Omega_0}{\Omega_1} \sim O(e^{-N})$.

The last scaling follows from the exponential growth of the accessible phase space volume with the number of particles. Therefore, the second law holds with very high probability.
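The equal-β condition (1.15) and the identification (1.18) can be made tangible with a small numerical experiment. The following sketch is our own illustration, not part of the lectures: a toy model of two weakly coupled collections of two-level units (with arbitrary sizes N₁, N₂ and total energy E), for which $\Omega_i(E_i)$ is a binomial coefficient, so the most probable energy partition and finite-difference β's can be computed exactly.

import numpy as np
from scipy.special import gammaln

# Two weakly coupled sub-systems of two-level units (level spacing 1):
# Omega_i(E_i) = C(N_i, E_i) microstates when E_i units are excited.
N1, N2, E = 400, 600, 300            # illustrative sizes and total energy

def log_omega(N, E):
    # log C(N, E) via log-gamma, to avoid overflow
    return gammaln(N + 1) - gammaln(E + 1) - gammaln(N - E + 1)

E1 = np.arange(1, E)                 # possible energies of sub-system 1
logW = log_omega(N1, E1) + log_omega(N2, E - E1)   # log[Omega1 * Omega2]

star = int(E1[np.argmax(logW)])      # most probable energy partition

# beta_i = d log Omega_i / dE_i, central finite differences (cf. (1.16))
beta1 = (log_omega(N1, star + 1) - log_omega(N1, star - 1)) / 2
beta2 = (log_omega(N2, E - star + 1) - log_omega(N2, E - star - 1)) / 2
print(f"most probable E1 = {star}")                  # ~ E * N1/(N1+N2)
print(f"beta1 = {beta1:.4f}, beta2 = {beta2:.4f}")   # equal at the maximum

# the distribution of E1 is sharply peaked: width ~ sqrt(N)
p = np.exp(logW - logW.max()); p /= p.sum()
width = np.sqrt(np.sum(p * (E1 - star) ** 2))
print(f"relative width of E1 distribution: {width / star:.4f}")

The printed β's agree at the most probable $E_1$, and the relative width of the energy distribution shrinks as $1/\sqrt{N}$, which is the microscopic origin of sharp thermodynamic equilibrium.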

2.1. Definition of temperature

The next thing to note is that thermal equilibrium between sub-systems 1 and 2 implies that $\beta_1 = \beta_2$. In thermodynamics, we have another way of expressing the same thing: we say that two bodies that are brought into thermal contact are in equilibrium if their temperatures are the same (c.f. the zeroth law). This suggests that β must be related to the absolute temperature. The definition of temperature in classical thermodynamics is

(1.20) $\dfrac{1}{T} = \left(\dfrac{\partial S}{\partial E}\right)_{N,V}$.

If we use the same definition here, we find that

(1.21) $\beta = \dfrac{1}{k_B T}$.

It is of course a bit unfortunate that we cannot simply say that $S = \log\Omega$. The reason is that, historically, thermodynamics preceded statistical thermodynamics. In particular, the absolute thermodynamic temperature scale (see Section 1.3) contained an arbitrary constant that, in the 19th century, was chosen such that a one degree Kelvin change matched a one degree Celsius change. If we were to introduce an absolute temperature scale now, we could choose it such that $S = \log\Omega$; the new absolute temperature $T'$ would then be equal to $k_B T$. However, this would create many practical problems because, in those units, room temperature would be of the order of $5\times 10^{-21}\,$J: that is, entropy would be dimensionless, but temperature would have the dimensions of energy. Few people would be happy with such a temperature scale, so we leave things as they are.

2.2. Can the absolute temperature be negative?

In light of the formulation of temperature using statistical physics, a natural question is whether the absolute Kelvin scale can go negative — i.e. can the phase space volume available to a system decrease as we increase its total energy? For classical systems, common intuition suggests that it cannot. However, for systems with a finite number of energy levels (e.g. a system of spins in a magnetic field), where the energy is bounded, the answer to this question is much less trivial. To gain some intuition, we consider a concrete example of a two-level system of distinguishable particles. The ground state has energy $E_0 = 0$ and the excited state has energy $E_1 = \epsilon$ (see Figure 6). This model is physically analogous to a non-interacting spin system in an applied magnetic field, where the spins can align parallel or anti-parallel to the field. Consider a system with N particles. If the total energy E = 0, all the particles are in the ground state and there is only one possible configuration, thus $\Omega = 1$. Now, when the energy is increased to $E = \epsilon$, there is one particle in the excited state. The particle in the excited state can be any of the N particles, thus $\Omega = N$. Therefore, an increase in energy corresponds to an increase in Ω, and the temperature is positive. Now, consider the opposite limit. If $E = (N-1)\epsilon$, all particles are in the excited state except one, and reasoning as before we have $\Omega = N$. However, when $E = N\epsilon$, there is only one configuration, corresponding to all particles in the excited state, and $\Omega = 1$. This shows that an increase in energy leads to a decrease in Ω, and therefore a negative temperature.


Figure 6. The two-level system under consideration. When the total energy $E > N\epsilon/2$, the population of the excited state is greater than that of the ground state.

More precisely, if $N_0$ particles are in the ground state and $N_1$ are in the excited state, the total number of configurations is given by

(1.22) $\Omega = \dfrac{N!}{N_0!\,N_1!}$.

Substituting Stirling's approximation $\log N! \approx N\log N - N$ for $N \gg 1$ into Equation (1.18), we have

(1.23) $S/k_B = N\log N - N_0\log N_0 - N_1\log N_1$.

The total energy of the system is given by $E = N_1\epsilon$; using the definition $E_M \equiv N\epsilon$, we can rewrite Equation (1.23) as

(1.24) $\dfrac{S(E)}{k_B N} = -\dfrac{E}{E_M}\log\dfrac{E}{E_M} - \left(1 - \dfrac{E}{E_M}\right)\log\left(1 - \dfrac{E}{E_M}\right)$,

hence $\partial S(E)/\partial E = 1/T_B < 0$ for $E > E_M/2$. A simple approximation to the entropy function (obtained by fitting a parabola through the minima and the maximum of Equation (1.24)) is given by

(1.25) $\dfrac{S(E)}{k_B N} \approx \log 2\left[1 - \left(\dfrac{2E}{E_M} - 1\right)^2\right]$.

Using this approximation, the temperature reads

(1.26) $T = \left(\dfrac{\partial S}{\partial E}\right)^{-1} \approx \dfrac{\epsilon}{k_B\log 16}\,\dfrac{1}{1 - \dfrac{2E}{E_M}}$,

which indeed is negative for $E > E_M/2$. Negative temperature is attained when the occupancy of the excited state is greater than that of the ground state (known as population inversion in the laser literature).

Gibbs Entropy

It turns out that Equation (1.18), the so-called Boltzmann entropy, is not the only way to define an entropy function. An alternative, proposed by Gibbs, considers the number of configurations (or total phase space volume) with energy less than or equal to E as the central quantity. Thus

(1.27) $S_G = k_B\log(\lambda\omega)$,

where λ is an unimportant constant and

(1.28) $\dfrac{\partial\omega}{\partial E} = \Omega(E)$.

It should be stressed that, in 1902 when Gibbs proposed this definition of entropy, he was thinking only of classical systems. For such systems, the Gibbs and Boltzmann definitions of entropy are equivalent in the thermodynamic limit. However, this equivalence breaks down for systems with an energy that is bounded from above. In the Gibbs picture, the temperature is given by

(1.29) $T_G \equiv \left(\dfrac{\partial S_G}{\partial E}\right)^{-1} = \dfrac{1}{k_B}\,\dfrac{\omega}{\Omega} > 0$,

as $\omega, \Omega > 0$. Thus negative temperature is eliminated by redefining the entropy. Recently, it has been proposed that Gibbs' definition of entropy is, in fact, more 'correct' than the one attributed to Boltzmann [5]. This proposal has led to active discussions and debates in the literature [10, 6, 15, 7, 11].

For classical macroscopic systems with many degrees of freedom, Ω(E) increases steeply with E, and thus $\omega(E) \approx \Omega(E)$, as $\omega(E)$ is dominated by contributions near E. However, for systems with bounded energy, the two definitions differ significantly. To illustrate this, we go back to the simple two-state system introduced in the previous section (c.f. Figure 6). Integrating Equation (1.28), we have

(1.30) $\omega(E) - \omega(0) = \displaystyle\int_0^E \Omega(E')\,dE' = \displaystyle\int_0^E e^{S(E')/k_B}\,dE'$.

Substituting Equation (1.25) into (1.30) and noting that the ground state is non-degenerate (i.e. $\Omega(0) = 1$),

(1.31) $\omega(E) = 1 + \displaystyle\int_0^E \exp\left[N\log 2\left(1 - \left(\dfrac{2E'}{E_M} - 1\right)^2\right)\right]dE' = 1 + \sqrt{\dfrac{\pi}{N\log 2}}\; 2^{N-2} E_M\left[\operatorname{erf}\left(\sqrt{N\log 2}\right) + \operatorname{erf}\left(\left(\dfrac{2E}{E_M} - 1\right)\sqrt{N\log 2}\right)\right]$.

Thus, as $N \to\infty$ (neglecting all the sub-linear terms in N),

(1.32) $\log\omega \sim N\log 2 + \log\left[\operatorname{erf}\left(\sqrt{N\log 2}\right) + \operatorname{erf}\left(\left(\dfrac{2E}{E_M} - 1\right)\sqrt{N\log 2}\right)\right]$.

Noting that

(1.33) $\operatorname{erf}(x) \sim 1 - \dfrac{e^{-x^2}}{x\sqrt{\pi}}, \quad x \to\infty$,

the asymptotic behaviour of Equation (1.32) can be analysed by considering two distinct cases. For $E < E_M/2$,

(1.34) $\operatorname{erf}\left(\sqrt{N\log 2}\right) + \operatorname{erf}\left(\left(\dfrac{2E}{E_M} - 1\right)\sqrt{N\log 2}\right) \sim \dfrac{\exp\left[-N\log 2\left(\dfrac{2E}{E_M} - 1\right)^2\right]}{\left(1 - \dfrac{2E}{E_M}\right)\sqrt{\pi N\log 2}}$.

Hence, in the thermodynamic limit,

(1.35) $S_G = N k_B\log 2\left[1 - \left(\dfrac{2E}{E_M} - 1\right)^2\right]$,

agreeing with the Boltzmann picture. However, for $E > E_M/2$,

(1.36) $\operatorname{erf}\left(\sqrt{N\log 2}\right) + \operatorname{erf}\left(\left(\dfrac{2E}{E_M} - 1\right)\sqrt{N\log 2}\right) \sim 2$,

hence

(1.37) $S_G \sim N k_B\log 2 \implies T_G \to\infty$,

and for $N < \infty$, $0 < T_G < \infty$. Therefore, in the regime of population inversion (i.e. $E > E_M/2$), the Boltzmann temperature is negative but $T_G$ is positive and finite. One can easily construct a classical system (for example a perfect gas) with the same value of $T_G$. Should one therefore conclude that our two-level system with an inverted population can be in thermal equilibrium with a perfect gas? Obviously not. We would expect the two-level system to lose energy to the gas, raising the (Boltzmann) entropy of both. In the conventional picture, of course, a population-inverted state has a negative Boltzmann temperature and is always hotter than a normal system with a positive Boltzmann temperature, so one would always expect heat transfer to take place. Therefore, the Gibbs temperature fails the zeroth law of thermodynamics — equal Gibbs temperature does not imply thermal equilibrium, and a "Gibbs thermometer" would be (practically) useless.

In summary: whilst there are infinitely many ways to define functions that have some of the properties of entropy, the important properties that such definitions must satisfy are those that allow us to make the link to thermodynamics in the limit of large system sizes.
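The divergence between the two definitions is easy to see numerically. The sketch below is our own illustration (parameters arbitrary, with ε = $k_B$ = 1): it evaluates $\Omega(E) = \binom{N}{E}$ and $\omega(E) = \sum_{E'\leq E}\Omega(E')$ exactly for the two-level system and prints finite-difference Boltzmann and Gibbs temperatures on either side of $E_M/2$.

import numpy as np
from scipy.special import gammaln

N = 1000                 # number of two-level particles (epsilon = k_B = 1)
E = np.arange(N + 1)

log_Omega = gammaln(N + 1) - gammaln(E + 1) - gammaln(N - E + 1)

# omega(E) = sum of Omega(E') for E' <= E, accumulated in log space
log_omega = np.logaddexp.accumulate(log_Omega)

def temperature(logS, E0):
    # T = (dS/dE)^(-1) with S = log(number of states), central difference
    dSdE = (logS[E0 + 1] - logS[E0 - 1]) / 2
    return 1.0 / dSdE

for E0 in (N // 4, 3 * N // 4):        # below and above E_M / 2
    TB = temperature(log_Omega, E0)    # Boltzmann temperature
    TG = temperature(log_omega, E0)    # Gibbs temperature
    print(f"E = {E0}: T_B = {TB:+.3e}, T_G = {TG:+.3e}")
# Below E_M/2 the two nearly coincide; above it T_B turns negative,
# while T_G stays positive and grows astronomically with N.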

3. From entropy to thermodynamic ensembles

In the previous section, we considered the thermodynamic behaviour of an isolated system, where the total energy is conserved. However, most experiments are not carried out in a thermos flask. We very often consider systems that are in contact with a much larger heat reservoir. We can still consider the total energy of system plus heat bath to be fixed, but allow exchanges between our small system and the bath so as to keep the global quantities (energy, volume, number of particles) constant. We can then allow heat flow to occur between the system and the bath such that the total energy is constant, or we allow the system to exchange volume with the bath (e.g. a piston that can freely expand) such that the total volume is constant. How this exchange occurs microscopically is not important to us, and what precisely constitutes the bath is also unimportant. The elegance of this formalism is that it is completely independent of those factors. The defining feature of a bath is that it is sufficiently large that it can transfer energy, volume or particles to the system without changing its temperature, pressure or chemical potential. Physically, the surroundings (air, the container etc.) contain far more degrees of freedom than our system under consideration and thus often act as an ideal bath. The ensemble of states that the system under study can occupy, either in isolation or in contact with a bath, is clearly very large. Importantly, the ensemble of states depends on the nature of the contact with the bath: is it isolated? Can it exchange energy? Can it exchange volume? Can it exchange particles? These different conditions give rise to different ensembles, some of which we briefly discuss.

3.1. Micro-canonical ensemble

The simplest ensemble is the micro-canonical ensemble, where we keep the number of particles N, the volume of the system V, and the energy of the system E fixed. This system is completely isolated from its surroundings. We already know that the entropy of a system in the micro-canonical ensemble is given by $S = k_B\log\Omega(N,V,E)$.

3.2. Canonical ensemble

A very common ensemble is the canonical ensemble, where we keep the number of particles N, the volume of the system V, and the temperature of the system T fixed. This is sometimes denoted in the literature as the (N,V,T) ensemble. Fixing the temperature means that we allow heat flow between the system and the reservoir. The sum of the energies of the system and the bath, E, is fixed. Now suppose that the system is prepared in a specific state $s = (r^N, p^N)$, where $r^N$ and $p^N$ denote the positions and momenta of the N particles, respectively. The energy of the system is given by the Hamiltonian $H = H(s)$. Whilst the system is at a point s in phase space, the bath can access the entire phase space with energy $E - H(s)$. Therefore, denoting the phase space volume of the bath as $\Omega_B$, the probability that the small system is in state s is thus

(1.38) $p(s) = \dfrac{\Omega_B(E - H(s))}{\int ds'\,\Omega_B(E - H(s'))}$.

Noting that

(1.39) $\log\Omega_B(E - H(s)) = \log\Omega_B(E) - H(s)\dfrac{\partial}{\partial E}\log\Omega_B(E) + \cdots = \log\Omega_B(E) - \dfrac{H(s)}{k_B T} + \cdots$,

and neglecting higher order terms (as they become negligible in the thermodynamic limit), Equation (1.38) can be rewritten as

(1.40) $p(s) = \dfrac{e^{-H(s)/k_B T}}{\int ds'\, e^{-H(s')/k_B T}}$.
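As a deliberately minimal illustration of Equation (1.40), the following sketch (our own toy example: a single two-level system with an assumed level spacing ε = 1) evaluates the Boltzmann probabilities and checks numerically that the mean energy follows from $-\partial\log Z/\partial\beta$ and that the energy variance equals $-\partial\langle H\rangle/\partial\beta$, the fluctuation identity derived in Equation (1.45) below.

import numpy as np

# Canonical statistics of one two-level system: H(s) in {0, eps}.
eps, kB = 1.0, 1.0
levels = np.array([0.0, eps])

def log_Z(beta):
    # partition function Z = sum_s exp(-beta * H(s)), kept in log form
    return np.logaddexp(-beta * levels[0], -beta * levels[1])

def mean_E(beta, h=1e-6):
    # <H> = -d log Z / d beta, by central finite difference
    return -(log_Z(beta + h) - log_Z(beta - h)) / (2 * h)

beta = 1.0 / (kB * 0.5)                      # temperature T = 0.5 (arbitrary)
p = np.exp(-beta * levels - log_Z(beta))     # Boltzmann probabilities (1.40)
print("p(ground), p(excited):", p)           # sums to 1

# energy variance two ways: directly, and as -d<H>/dbeta
var_direct = np.sum(p * levels**2) - mean_E(beta) ** 2
h = 1e-4
var_deriv = -(mean_E(beta + h) - mean_E(beta - h)) / (2 * h)
print(f"sigma_H^2 direct:      {var_direct:.6f}")
print(f"sigma_H^2 from d/dbeta: {var_deriv:.6f}")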

The probability distribution for s is related to the probability distribution for the energy E by

(1.41) $\rho(E) = \displaystyle\int ds\,\delta(E - H(s))\,p(s) = \displaystyle\int ds\,\delta(E - H(s))\,\dfrac{e^{-E/k_B T}}{Z} = \dfrac{\Omega(E)\, e^{-E/k_B T}}{Z} = \dfrac{e^{-(E - TS(E))/k_B T}}{Z}$,

where

(1.42) $Z = \displaystyle\int ds'\, e^{-H(s')/k_B T}$

is known as the partition function, and we have made use of the definition of entropy to rewrite $\Omega(E) = e^{S/k_B}$. Equation (1.41) shows that the most probable energy E is given by the minimiser of

(1.43) $A = E - TS$,

a quantity known as the Helmholtz free energy. The average value of the energy is given by

(1.44) $\langle H\rangle = \displaystyle\int ds\, H(s)\,p(s) = -\dfrac{\partial}{\partial\beta}\log Z$,

where $\beta \equiv 1/(k_B T)$. To relate the most probable energy and the average energy, we need to know the standard deviation of the energy distribution. This is simply given by

(1.45) $\sigma_H^2 = \langle H^2\rangle - \langle H\rangle^2 = \dfrac{\partial^2\log Z}{\partial\beta^2} = -\dfrac{\partial\langle H\rangle}{\partial\beta}$.

The right hand side is an extensive function (as $\langle H\rangle$ is extensive), hence $\sigma_H \sim \sqrt{N}$ and $\sigma_H/\langle H\rangle \sim 1/\sqrt{N}$. Thus, in the thermodynamic limit where $N \to\infty$, the relative fluctuations in the energy decrease to zero, and the most probable and the average energy coincide. Therefore, we can equate

(1.46) $A = -\dfrac{1}{\beta}\log Z = -k_B T\log Z$.

3.3. Generalised ensembles and the Legendre transform

The form of the Helmholtz free energy

(1.47) $A(N,V,T(S)) = E(N,V) - S\,\dfrac{\partial E}{\partial S}$

is not a coincidence. In fact, these kinds of relations are known in classical thermodynamics as Legendre transforms. This can be stated in more general terms: suppose we have a function of two variables f(x, y). By the chain rule, we can write

(1.48) $df = \dfrac{\partial f}{\partial x}dx + \dfrac{\partial f}{\partial y}dy \equiv p\,dx + q\,dy$.

The key observation is that if we subtract d(qy) from both sides, we obtain

(1.49) $d(f - qy) = p\,dx - y\,dq$.

Thus we have a new function $\Phi(x,q) = f - qy$, with $p = \partial\Phi/\partial x$ and $y = -\partial\Phi/\partial q$. Therefore, the Legendre transform allows us to derive a relation between x and q, given f(x, y).
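A short worked example may make this concrete: take $f(x,y) = x^2 + y^2$, so that $q = \partial f/\partial y = 2y$, i.e. $y = q/2$. Then $\Phi(x,q) = f - qy = x^2 - q^2/4$, and indeed $\partial\Phi/\partial x = 2x = p$ and $\partial\Phi/\partial q = -q/2 = -y$. The thermodynamic case $f = E(S,V)$ with $y = S$ and $q = T$ reproduces $\Phi = E - TS = A$, with $S = -\partial A/\partial T$.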

The isobaric-isothermal ensemble

Using the Legendre transform technique, we can derive the free energy when the volume is allowed to fluctuate but the pressure is fixed. To do this, we note that

(1.50) $\left(\dfrac{\partial E}{\partial V}\right)_S = \left(\dfrac{\partial A}{\partial V}\right)_T = -p$,

and as such

(1.51) $G(N, p(V), T(S)) = E(N,V) - S\left(\dfrac{\partial E}{\partial S}\right)_V - V\left(\dfrac{\partial E}{\partial V}\right)_S = E - TS + pV$,

where G is called the Gibbs free energy. The Gibbs free energy is an important quantity in chemistry and biochemistry, as chemical reactions usually take place at ambient pressure rather than at constant volume. The quantity $H = E + pV$ is known as the enthalpy, and at constant temperature

(1.52) $\Delta G = \Delta H - T\Delta S$.

Physically, the change in enthalpy is the heat absorbed ($\Delta H > 0$) or released ($\Delta H < 0$) by the system. Forming stronger bonds instead of weaker ones releases heat, and thus the enthalpy change is negative. On the other hand, the entropy is a measure of the number of realisations of the system (the meaning of this will be explored in Lecture 3). Thermodynamically, the system tends to minimise its Gibbs free energy, and therefore if the system loses realisations (negative entropy change), it must form stronger bonds (negative enthalpy change), such that $\Delta G < 0$. Analogous to the canonical ensemble, the partition function in the isobaric-isothermal ensemble is defined as

(1.53) $Z_p = \displaystyle\int ds\, dV\, e^{-\beta(H(s) + pV)} = \displaystyle\int dV\, Z(N,V,T)\, e^{-\beta pV}$.

The extra integration over volume is necessary as we also allow the volume to fluctuate, and, similar to Equation (1.46), we have

(1.54) $G = -k_B T\log Z_p$.

Note that we have argued here by analogy to classical thermodynamics. The microscopic derivation is rather more lengthy and is left for the diligent reader.

The grand canonical ensemble

Another commonly used ensemble is the grand canonical ensemble. In the grand canonical ensemble, we keep the volume constant, but allow both energy transfer and mass transfer. A quantity that controls whether the particles prefer to be in the thermodynamic reservoir or in the system is the energy it takes to transfer a particle from the system to the bath. This is given by the chemical potential μ, defined as

(1.55) $\mu = \left(\dfrac{\partial E}{\partial N}\right)_S = \left(\dfrac{\partial A}{\partial N}\right)_T$.

Now, using the Legendre transform again, the grand canonical free energy is given by

(1.56) $\Lambda(\mu(N), V, T(S)) = E(N,V) - S\left(\dfrac{\partial E}{\partial S}\right)_V - N\left(\dfrac{\partial E}{\partial N}\right)_S = E - TS - N\mu$.

From thermodynamics, it is clear that $\Lambda(\mu(N), V, T(S)) = -pV$. By analogy, the partition function is defined as the trace of $e^{-\beta(H(s) - \mu N)}$ over all fluctuating quantities, i.e.

(1.57) $Z_\mu = \displaystyle\sum_{N=0}^{\infty} e^{\beta N\mu}\, Z(N,V,T)$,

and

(1.58) $pV = k_B T\log Z_\mu$.
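As a quick consistency check (a standard computation, included here for illustration), consider the ideal gas, for which $Z(N,V,T) = V^N/N!$ in the units used above. Then $Z_\mu = \sum_N e^{\beta\mu N} V^N/N! = \exp\left(e^{\beta\mu}V\right)$, so Equation (1.58) gives $pV = k_B T\, e^{\beta\mu}V$, while the average particle number is $\langle N\rangle = \partial\log Z_\mu/\partial(\beta\mu) = e^{\beta\mu}V$. Eliminating $e^{\beta\mu}$ recovers the ideal gas law $pV = \langle N\rangle k_B T$.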

4. Exercises

(1) Consider the coupled heat pump and heat engine in Figure 7. Derive a


Figure 7. A prototypical refrigerator.

relation that relates $q'_C$ to $q_H$ and the temperatures of the reservoirs. (This is a prototypical refrigerator.)

(2) (a) If we have a system that has

$\Omega(E) = 2 + e^{-\frac{E}{2\epsilon}}\left[\dfrac{1}{4}\sin\dfrac{2E}{\epsilon} + \dfrac{1}{2}\left(1 - \cos\dfrac{2E}{\epsilon}\right)\right]$,

where ε is an energy scale, show that a single Boltzmann or Gibbs temperature can correspond to many values of E.
(b) Is the zeroth law still satisfied with this particular Ω(E)?
(c) Can you come up with a physical system with this form of Ω(E)?

(3) Using the definition of the Gibbs free energy $G = H - TS$, the definition of enthalpy $H = U + pV$ and the Gibbs-Duhem equation $G = N\mu$, derive the thermodynamic relation $N\,d\mu = -S\,dT + V\,dp$.

LECTURE 2
Thermodynamics of phase transitions

Many-body systems can exist in different states, the most common of which are the vapour, liquid and crystalline states. The transitions between these phases are not gradual — for instance, at the transition, the solid phase and the liquid phase have different densities and very different structures. Such an abrupt transition between two phases is called a first-order phase transition. In this lecture, we examine the thermodynamics and statistical mechanics of first-order phase transitions.

1. Thermodynamics of phase equilibrium

Let us begin by reviewing the thermodynamics of phase equilibrium. Physical intuition suggests that the condition for coexistence of two or more phases 1, 2, ··· is that the intensive quantities in the phases must be equal: i.e. the pressures of all coexisting phases must be equal ($p_1 = p_2 = \cdots = p$), as must be the temperature ($T_1 = T_2 = \cdots = T$) and the chemical potentials of each species α ($\mu_{\alpha,1} = \mu_{\alpha,2} = \cdots = \mu_\alpha$). These conditions follow directly from the second law of thermodynamics. This can be seen by considering a closed system of N particles in a volume V with a total energy E (see Figure 1). Suppose that this system consists of two distinct phases 1 and 2 that are in equilibrium. Phase 1 consists of $N_1$ particles in a volume $V_1$, with a total energy $E_1$. Phase 2 consists of $N_2 = N - N_1$ particles in volume $V_2 = V - V_1$ and energy $E_2 = E - E_1$. Note that we ignore any contributions of the interface between 1 and 2 and retain only the extensive properties N, V and E. This is allowed in the thermodynamic limit because extensive properties vary linearly with the number of particles, whereas the corresponding interfacial properties (e.g. the surface energy) vary as $N^{2/3}$ (in three dimensions). Then the ratio of surface-to-bulk properties scales as $N^{-1/3}$, which goes to zero in the thermodynamic limit ($N \to\infty$). The second law of thermodynamics states that, at equilibrium, the total entropy ($S = S_1 + S_2$) is at a maximum. Hence

(2.1) $dS = dS_1 + dS_2 = 0$.

However, by using the chain rule,

(2.2) $dS = \dfrac{\partial S}{\partial E}\,dE + \dfrac{\partial S}{\partial V}\,dV + \dfrac{\partial S}{\partial N}\,dN = \dfrac{dE}{T} + \dfrac{p}{T}\,dV - \dfrac{\mu}{T}\,dN$.

As $dE_1 = -dE_2$, $dV_1 = -dV_2$ and $dN_1 = -dN_2$, maximum total entropy implies

(2.3) $\left(\dfrac{1}{T_1} - \dfrac{1}{T_2}\right)dE_1 + \left(\dfrac{p_1}{T_1} - \dfrac{p_2}{T_2}\right)dV_1 - \left(\dfrac{\mu_1}{T_1} - \dfrac{\mu_2}{T_2}\right)dN_1 = 0$,

thus


Figure 1. Thermodynamic conditions for phase coexistence are equal pressures, temperatures and chemical potentials.

(2.4) $T_1 = T_2, \quad p_1 = p_2, \quad \mu_1 = \mu_2$.

How do we compute the point where phases coexist? Let us consider the case where we have an analytical approximation of the Helmholtz free energy of both phases. For simplicity, we assume that we are dealing with a one-component system. The question then is, given A(N,V,T) for both phases, how do we find the equilibrium volume of each phase? It turns out that there is a nice graphical method to do this (see Figure 2). For a given temperature, we plot $A_1(N,V,T)$ and $A_2(N,V,T)$ as functions of V while keeping N and T constant. We note that the pressure at constant temperature is given by

(2.5) $p = -\left(\dfrac{\partial A}{\partial V}\right)_T$,

thus equal pressures imply equal slopes. The second criterion we require to uniquely find the equilibrium volumes is obtained by noting that the chemical potential must be the same (we are fixing the temperature). Now, the key observation is that if we draw a line that is tangent to both curves, the intercept of the line at V = 0 is

(2.6) $A_1 - \left(\dfrac{\partial A_1}{\partial V}\right)V_1 = A_1 + pV_1 = N\mu_1$,

thus the line that is tangent to both curves locates the equilibrium volumes of the phases.
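The double-tangent construction is straightforward to automate. The sketch below is our own illustration, not from the text: it takes the van der Waals Helmholtz free energy per particle (a model choice) below its critical temperature, where A(v) is non-convex, and solves the two tangency conditions, equal slope (equal pressure, Equation (2.5)) and equal intercept at V = 0 (equal chemical potential, Equation (2.6)), for the coexisting volumes.

import numpy as np
from scipy.optimize import fsolve

# Van der Waals Helmholtz free energy per particle (k_B = 1); below
# T_c = 8a/(27b) = 8/3 for these parameters, A(v) is non-convex and the
# double tangent joins the liquid and vapour branches.
a, b, T = 3.0, 1.0 / 3.0, 2.0

def A(v):    return -T * np.log(v - b) - a / v
def dAdv(v): return -T / (v - b) + a / v**2       # note p = -dA/dv

def conditions(x):
    v1, v2 = x
    equal_slope = dAdv(v1) - dAdv(v2)                          # p_1 = p_2
    equal_intercept = (A(v1) - dAdv(v1) * v1) - (A(v2) - dAdv(v2) * v2)
    return [equal_slope, equal_intercept]                      # mu_1 = mu_2

v_liq, v_gas = fsolve(conditions, x0=[0.48, 4.5])   # guesses on each branch
p_coex = -dAdv(v_liq)
mu_coex = A(v_liq) + p_coex * v_liq                 # the N*mu intercept (2.6)
print(f"v_liq = {v_liq:.4f}, v_gas = {v_gas:.4f}")
print(f"coexistence pressure = {p_coex:.4f}, mu = {mu_coex:.4f}")

The root finder needs reasonable starting volumes on the liquid and vapour branches; the equal-intercept condition is exactly the $N\mu$ intercept of Figure 2.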

2. Thermodynamic integration

The double-tangent construction described in Figure 2 is useful if we actually know the free energy of the phases. Unfortunately, it is not possible to measure the free energy (or entropy) directly. What we can measure are averages of mechanical quantities. Those quantities, such as the pressure or the dielectric constant, are averages of functions of the coordinates and momenta of the molecules in the system. If we denote such a mechanical quantity by X(s), then from Equation (1.40) the average of X that can be measured in an experiment at constant N, V and T is

(2.7) $\langle X\rangle_{N,V,T} = \dfrac{\int ds\, X(s)\, e^{-\beta H(s)}}{\int ds\, e^{-\beta H(s)}}$,

where H(s) is the Hamiltonian of the system.


Figure 2. The double tangent construction for phase coexistence.

However, the entropy, the free energy and related quantities are not simply averages of functions that depend on the system's phase-space coordinates. Rather, they are directly related to the volume of phase space that is accessible to a system. This is why A and, for that matter, S or G, cannot be measured directly. That said, we can compute A analytically in certain limits (e.g. the ideal gas). Therefore, the strategy is to start with a system whose free energy we know. Next, we change the system slowly, e.g. by heating it up, or by compressing it, until the system is brought to the state where the free energy is not yet known. By keeping track of the exchange of heat, mass and work during the process, we should be able to find the new free energy.

Suppose we are trying to compute the free energy of a gas. We can measure its pressure $p = -(\partial A/\partial V)_{N,T}$ as a function of V. At sufficiently large volume $V_0$, the gas behaves as an ideal gas and $A(V_0) = A_{id}(V_0)$, thus

(2.8) $A(V) = A_{id}(V_0) + \displaystyle\int_{V_0}^{V}\left(\dfrac{\partial A}{\partial V'}\right)dV' = A_{id}(V_0) - \displaystyle\int_{V_0}^{V} dV'\, p(V')$.

To know the free energy as a function of temperature, we need to measure the internal energy E as a function of temperature (e.g. via calorimetric measurements). From the definition of the free energy $A = E - TS$, we have

(2.9) $E = \left(\dfrac{\partial(A/T)}{\partial(1/T)}\right)_{N,V}$.

If we know the free energy at a particular temperature $T_0$, we can integrate and obtain

(2.10) $\dfrac{A(T)}{T} = \dfrac{A(T_0)}{T_0} + \displaystyle\int_{1/T_0}^{1/T} d\!\left(\dfrac{1}{T'}\right)\left(\dfrac{\partial(A/T')}{\partial(1/T')}\right)_{N,V} = \dfrac{A(T_0)}{T_0} + \displaystyle\int_{1/T_0}^{1/T} d\!\left(\dfrac{1}{T'}\right) E(N,V,T')$.


Figure 3. Computing protein-substrate binding free energy by thermodynamic perturbation.

2.1. Perturbation theory

Aside from experimental measurements, statistical mechanics tells us that the free energy is simply the logarithm of the partition function. If we can compute the partition function of the system that we are interested in, statistical mechanics will tell us what the free energy is. However, the partition function is non-trivial to compute for any realistic system — it is a 3N-dimensional integral over the entire configuration space. Suppose, however, that we are lucky and know the partition function of a "close enough" system. We would hope that we can play the same game as thermodynamic integration, and exploit our knowledge of a similar system to our advantage. We will begin by generalising thermodynamic integration to the case where the Hamiltonian depends on a parameter λ, which is completely general for now but will be made concrete below. Now, $A = A(N,V,T,\lambda)$, and

(2.11) $\left(\dfrac{\partial A}{\partial\lambda}\right)_{N,V,T} = -\dfrac{\beta^{-1}}{Z}\left(\dfrac{\partial Z(N,V,T,\lambda)}{\partial\lambda}\right)_{N,V,T} = \dfrac{\int ds\,\frac{\partial H}{\partial\lambda}\, e^{-\beta H(s,\lambda)}}{Z} = \left\langle\dfrac{\partial H}{\partial\lambda}\right\rangle_\lambda$,

where $\langle\cdots\rangle_\lambda$ denotes the ensemble average keeping λ constant. Equation (2.11) is very useful in molecular simulations. Suppose we want to calculate the free energy change of a substrate binding to a protein. Let us parameterise the protein-substrate interaction in the Hamiltonian by λ, say

(2.12) $H(\lambda) = \lambda H_{protein-substrate} + (1 - \lambda)H_0$,

where $H_0$ is the Hamiltonian for the non-interacting protein and substrate, and $H_{protein-substrate}$ is the full Hamiltonian (see Figure 3). We can compute the average energy at each λ by running a simulation, and numerically calculate the binding free energy ΔA via

(2.13) $\Delta A = \displaystyle\int_0^1 d\lambda\,\left\langle\dfrac{\partial H}{\partial\lambda}\right\rangle_\lambda$.

We note that in practical simulations there are a lot of other intricacies, concerning sampling error etc., which we have not touched on.
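Equation (2.13) can be verified end-to-end on a one-dimensional toy problem where all the integrals can also be done by brute-force quadrature. In the sketch below (our own construction, with made-up parameters) the reference potential is harmonic, $U_0 = \tfrac{1}{2}kx^2$, and the system of interest adds a quartic term, $U_1 = U_0 + cx^4$; ΔA is computed once by integrating $\langle\partial U/\partial\lambda\rangle_\lambda$ over λ, and once directly from the ratio of partition functions.

import numpy as np

beta, k, c = 1.0, 1.0, 0.5          # inverse temperature, potential params
x, dx = np.linspace(-8.0, 8.0, 4001, retstep=True)   # configuration grid

def U(lam):
    # U_lambda = (1 - lam)*U0 + lam*U1 = 0.5*k*x^2 + lam*c*x^4
    return 0.5 * k * x**2 + lam * c * x**4

def Z(lam):
    return np.sum(np.exp(-beta * U(lam))) * dx

def dU_avg(lam):
    # <dU/dlambda>_lambda = <c x^4>_lambda, cf. Equation (2.11)
    w = np.exp(-beta * U(lam))
    return np.sum(c * x**4 * w) * dx / Z(lam)

# thermodynamic integration over lambda, Equation (2.13)
lams, dlam = np.linspace(0.0, 1.0, 51, retstep=True)
vals = np.array([dU_avg(l) for l in lams])
dA_TI = (0.5 * vals[0] + vals[1:-1].sum() + 0.5 * vals[-1]) * dlam

# brute-force comparison: A1 - A0 = -(1/beta) log(Z1/Z0)
dA_exact = -np.log(Z(1.0) / Z(0.0)) / beta

print(f"Delta A via thermodynamic integration:  {dA_TI:.6f}")
print(f"Delta A via direct partition functions: {dA_exact:.6f}")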

Our generalised thermodynamic integration (2.11) is not only a useful computational tool, but also a handy analytical technique. Let us assume that the interaction energy of a simpler reference system is denoted by $U_0$, whose partition function $Z_0$ is a quantity on which we have an analytical handle, while the potential energy function of the system of interest is denoted by $U_1$. In order to compute the free energy difference between the known reference system and the system of interest, we use, as before, a linear parameterisation of the potential energy function, $U_\lambda = (1-\lambda)U_0 + \lambda U_1$. Equation (2.11) gives $A_1 - A_0 = \int_0^1 d\lambda\,\langle U_1 - U_0\rangle_\lambda$. Now, let us evaluate the second derivative of the free energy A(λ) with respect to λ:

(2.14) $\dfrac{\partial^2 A}{\partial\lambda^2} = \left\langle\dfrac{\partial^2 U}{\partial\lambda^2}\right\rangle_\lambda - \beta\left[\left\langle\left(\dfrac{\partial U}{\partial\lambda}\right)^2\right\rangle_\lambda - \left\langle\dfrac{\partial U}{\partial\lambda}\right\rangle_\lambda^2\right] = -\beta\left[\left\langle(U_1 - U_0)^2\right\rangle - \left\langle U_1 - U_0\right\rangle^2\right] \leq 0.$

Therefore, as the second derivative is non-positive,

(2.15) $\left.\dfrac{\partial A}{\partial\lambda}\right|_\lambda \leq \left.\dfrac{\partial A}{\partial\lambda}\right|_{\lambda=0} \implies A_1 \leq A_0 + \langle U_1 - U_0\rangle_0$.

This variational principle for the free energy is known as the Gibbs-Bogoliubov inequality. It implies that we can compute an upper bound to the free energy of the system of interest from a knowledge of the average of $U_1 - U_0$ evaluated for the reference system. Of course, the usefulness of Equation (2.15) depends crucially on the quality of the choice of reference system. A good reference system is not necessarily one that is close in free energy to the system of interest, but one for which the fluctuations in the potential energy difference $U_1 - U_0$ are small.
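Continuing the same one-dimensional toy problem (again our own illustration), the Gibbs-Bogoliubov bound can be checked directly: the reference free energy plus $\langle U_1 - U_0\rangle_0$ must lie above the true $A_1$ for every coupling strength.

import numpy as np

beta, k = 1.0, 1.0
x, dx = np.linspace(-8.0, 8.0, 4001, retstep=True)
U0 = 0.5 * k * x**2                        # harmonic reference
w0 = np.exp(-beta * U0)                    # unnormalised reference weights

def free_energy(U):
    return -np.log(np.sum(np.exp(-beta * U)) * dx) / beta

A0 = free_energy(U0)
for c in (0.1, 0.5, 2.0):                  # a few anharmonic strengths
    U1 = U0 + c * x**4
    A1 = free_energy(U1)
    # Gibbs-Bogoliubov upper bound, Equation (2.15)
    bound = A0 + np.sum((U1 - U0) * w0) / np.sum(w0)
    print(f"c = {c}: A1 = {A1:.4f} <= A0 + <U1-U0>_0 = {bound:.4f}")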

Crystals with long-ranged interaction

To illustrate the power of thermodynamic integration, we consider particles in the solid state with long-ranged interparticle interactions $V(|r_i - r_j|)$ and Hamiltonian

(2.16) $H = \dfrac{1}{2}\displaystyle\sum_{i\neq j}^{N} V(|r_i - r_j|)$.

Computing the partition function for this Hamiltonian is analytically impossible, as the long-ranged interaction makes the integrals intractable. However, our strategy is to exploit the fact that we know the system is crystalline. Therefore, to a first approximation, we can consider the particles as tied to their equilibrium positions by a harmonic force, and use this as a reference system, i.e.

(2.17) $H_0 = \dfrac{1}{2}k\displaystyle\sum_{i=1}^{N}(r_i - R_i)^2$,

where k is the effective spring constant, a parameter that we will compute at the end of the calculation. $R_i$ denotes the position of the $i$-th lattice site. As the particles are uncoupled, we can write the N-particle partition function as

(2.18) $Z_0 = \left(\displaystyle\int_{-\infty}^{\infty} e^{-\alpha r^2}\, dr\right)^{3N} = \left(\dfrac{\pi}{\alpha}\right)^{3N/2}$,

with $\alpha = \beta k/2$; thus the free energy of the reference system is given by

(2.19) $A_0 = \dfrac{3}{2}N k_B T\log\dfrac{\alpha}{\pi}$.

Noting that, by the equipartition theorem, we have

(2.20) $\left\langle\dfrac{1}{2}k\displaystyle\sum_{j=1}^{N}(r_j - R_j)^2\right\rangle_0 = \dfrac{3}{2}N k_B T$,

we hence obtain

(2.21) $\langle H - H_0\rangle_0 = \left\langle\dfrac{1}{2}\displaystyle\sum_{i\neq j} V(|r_i - r_j|)\right\rangle_0 - \dfrac{3}{2}N k_B T = \dfrac{1}{2}\displaystyle\int dr\, dr'\,\rho_0^{(2)}(r, r')\,V(|r - r'|) - \dfrac{3}{2}N k_B T$,

where $\rho_0^{(2)}(r, r')$ is the probability of simultaneously finding a particle at r and one at r' in the reference system. As particles in the reference system are uncorrelated, the joint probability is the product of the local densities of particles, i.e., $\rho_0^{(2)}(r, r') = \rho_0^{(1)}(r)\,\rho_0^{(1)}(r')$. By noting the probability distribution in the canonical ensemble, we have

(2.22) $\rho_0^{(1)}(r) = \left(\dfrac{\alpha}{\pi}\right)^{3/2}\displaystyle\sum_{i=1}^{N} e^{-\alpha|r - R_i|^2}$.

Now, the Gibbs-Bogoliubov inequality provides an upper bound to the free energy:

(2.23) $A \leq \dfrac{3}{2}N k_B T\left[\log\dfrac{\alpha}{\pi} - 1\right] + \dfrac{1}{2}\displaystyle\int dr\, dr'\,\rho_0^{(1)}(r)\,\rho_0^{(1)}(r')\,V(|r - r'|) \equiv \tilde A(\alpha)$.

To determine the closest approximation to the true free energy A, we minimise the RHS of (2.23) with respect to α, i.e.

(2.24) $\alpha^*(\rho, T) = \operatorname{argmin}_\alpha \tilde A(\alpha)$,

where ρ is the density, and the approximate free energy is

(2.25) $\tilde A(\rho, T) = \tilde A(\alpha^*(\rho, T))$.

We leave the computation of $\tilde A$ for a particular interaction potential as an exercise.

3. The chemical potential and Widom particle insertion

When we introduced the grand canonical ensemble, we defined the concept of the chemical potential of a species. But what is the chemical potential — what is its physical meaning? To understand the meaning of a macroscopic quantity, it is often useful to consider how one would determine it on a microscopic scale. To determine the chemical potential, we will discuss a scheme proposed by Widom that has subsequently been used to measure (computationally) the chemical potential μ of a species in a pure liquid or in a mixture. Consider the definition of the chemical potential μ of a species α. From thermodynamics, we know that μ can be defined as

(2.26) $\mu = \left(\dfrac{\partial A}{\partial N}\right)_{V,T}$.

We can rewrite this using the finite difference approximation:

(2.27) $\mu = -k_B T\lim_{\Delta N\to 0}\dfrac{\log Z(N + \Delta N, V, T) - \log Z(N,V,T)}{\Delta N} = -k_B T\log\dfrac{Z(N+1,V,T)}{Z(N,V,T)}, \quad N \to\infty.$

Now, we wish to separate the chemical potential into a part due to ideal gas behaviour, and an excess part. The partition function of an ideal gas is simply

(2.28) $Z_{id}(N,V,T) = \dfrac{V^N}{N!}$,

where the factor of N! accounts for the fact that the particles are indistinguishable — we will revisit this in the last lecture. (There is also a factor of $\Lambda_{DB}^{3N}$, where $\Lambda_{DB}$ is the thermal de Broglie length, due to the integration over momentum degrees of freedom. Without loss of generality we will set it to 1.) As such,

(2.29) $\mu_{id} = -k_B T\log\dfrac{Z_{id}(N+1,V,T)}{Z_{id}(N,V,T)} = -k_B T\log\dfrac{V}{N+1}$.

Thus the excess part of the chemical potential, $\mu_{ex} = \mu - \mu_{id}$, is given by

(2.30) $\mu_{ex} = -k_B T\log\dfrac{Z(N+1,V,T)\,(N+1)}{V\,Z(N,V,T)}$;

now, recalling that $Z(N,V,T) = \int dr^N\, e^{-\beta U(r^N)}/N!$, the excess chemical potential reads

(2.31) $\mu_{ex} = -k_B T\log\dfrac{\int dr^{N+1}\, e^{-\beta U(r^{N+1})}}{V\int dr^N\, e^{-\beta U(r^N)}}$.

If we define $\Delta U(r^N; r_{N+1}) \equiv U(r^{N+1}) - U(r^N)$ (in our notation $r^N \equiv (r_1, \cdots, r_N)$, whilst $r_{N+1}$ is the position vector of the $(N+1)$-th particle),

(2.32) $\mu_{ex} = -k_B T\log\dfrac{\int dr_{N+1}\int dr^N\, e^{-\beta U(r^N) - \beta\Delta U(r^N; r_{N+1})}}{V\int dr^N\, e^{-\beta U(r^N)}} = -k_B T\log\left[\dfrac{1}{V}\displaystyle\int dr_{N+1}\dfrac{\int dr^N\, e^{-\beta U(r^N)}\, e^{-\beta\Delta U(r^N; r_{N+1})}}{\int dr^N\, e^{-\beta U(r^N)}}\right] = -k_B T\log\left[\dfrac{1}{V}\displaystyle\int dr_{N+1}\left\langle e^{-\beta\Delta U(r^N; r_{N+1})}\right\rangle_N\right]$.

In a homogeneous liquid, the system is translationally invariant; hence ΔU should not depend on $r_{N+1}$, thus

(2.33) $\mu_{ex} = -k_B T\log\left\langle e^{-\beta\Delta U}\right\rangle_N$.

Equation (2.33) provides us with a way to understand the chemical potential in terms of the work needed to insert an additional particle at a random position in a system where N particles are already present (hence the term "particle insertion"). This interpretation of the chemical potential provides us with both a numerical method to compute the chemical potential via simulations, and an analytical handle.
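The following sketch shows what a Widom measurement looks like in practice. It is our own minimal illustration, not code from the lectures: a small periodic Lennard-Jones fluid in reduced units, sampled with crude single-particle Metropolis moves and one ghost-particle insertion per sweep. All parameters (N, ρ, T, cutoff, step size, run length) are arbitrary choices, and a production calculation would need far more care with equilibration and sampling error, as the text warns.

import numpy as np

rng = np.random.default_rng(0)

# Widom insertion for a small periodic Lennard-Jones fluid (reduced units).
N, rho, T = 32, 0.5, 2.0
L = (N / rho) ** (1 / 3)                     # box length
beta = 1.0 / T
rc = 2.5                                     # potential cutoff

def pair_energy(r2):
    # truncated Lennard-Jones; r2 is an array of squared distances
    inv6 = 1.0 / r2**3
    return 4.0 * (inv6**2 - inv6)

def energy_of(pos, others):
    d = others - pos
    d -= L * np.round(d / L)                 # minimum image convention
    r2 = np.sum(d * d, axis=1)
    r2 = r2[r2 < rc * rc]
    return np.sum(pair_energy(r2))

x = rng.uniform(0, L, size=(N, 3))           # crude initial configuration

widom = []
for sweep in range(2000):
    for i in range(N):                       # Metropolis single-particle moves
        trial = (x[i] + rng.uniform(-0.15, 0.15, 3)) % L
        mask = np.arange(N) != i
        dE = energy_of(trial, x[mask]) - energy_of(x[i], x[mask])
        if dE < 0 or rng.random() < np.exp(-beta * dE):
            x[i] = trial
    if sweep > 500:                          # discard equilibration
        ghost = rng.uniform(0, L, 3)         # random test-particle insertion
        widom.append(np.exp(-beta * energy_of(ghost, x)))

mu_ex = -T * np.log(np.mean(widom))          # Equation (2.33), k_B = 1
print(f"estimated mu_ex = {mu_ex:.3f}")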

Virial expansion of an imperfect gas

To illustrate how Equation (2.33) can be used as an analytical tool, we use the particle insertion method to calculate the correction to the ideal gas equation of state due to intermolecular interactions. We express the equation of state as an ascending power series in the density ρ,

(2.34) $Z \equiv \dfrac{p}{\rho k_B T} = 1 + B_2\rho + B_3\rho^2 + \cdots$,

where the $B_i$ are called the virial coefficients. At low densities, $Z \to 1$, and we recover ideal gas behaviour. The goal here is to relate the virial coefficients to the interparticle interactions. To this end, we first use thermodynamics to relate the pressure to the chemical potential. The Gibbs-Duhem relation gives us (see Exercise 3 of Lecture 1)

(2.35) $-S\,dT + V\,dp - N\,d\mu = 0$,

thus

(2.36) $\dfrac{1}{k_B T}\left(\dfrac{\partial p}{\partial\rho}\right)_T = \dfrac{\rho}{k_B T}\left(\dfrac{\partial\mu}{\partial\rho}\right)_T$.

As $\mu_{id} = k_B T\log\rho$,

(2.37) $\dfrac{\rho}{k_B T}\left(\dfrac{\partial\mu_{id}}{\partial\rho}\right) = 1$,

and using the virial expansion (2.34), we can write

(2.38) $\dfrac{1}{k_B T}\dfrac{\partial p}{\partial\rho} = 1 + 2B_2\rho + 3B_3\rho^2 + \cdots$.

Thus, noting μex = μ − μid, Equation (2.36) yields

(2.39) $\dfrac{\rho}{k_B T}\dfrac{\partial\mu_{ex}}{\partial\rho} = 2B_2\rho + 3B_3\rho^2 + \cdots$.

This gives us a limiting relation between μex and B2

(2.40) $\lim_{\rho\to 0}\mu_{ex} = 2k_B T B_2\rho$.

Next, we would like to make use of the Widom method to compute $\mu_{ex}$ as $\rho \to 0$. We consider the random insertion of a test particle in a very dilute gas. Because the gas is so dilute, we can safely ignore the possibility that the inserted particle will simultaneously interact with more than one other particle. We assume that the interaction between two particles becomes negligible beyond some finite distance $R_{max}$.¹ We can decompose the total volume into a volume $V_{int}$ in which interaction with another particle is possible, and the remainder, $V - V_{int}$, where the inserted particle does not interact at all with the rest of the system. As the inserted particle can interact with at most one of the particles in the system, the volume $V_{int}$ is equal to the sum of all the interaction zones of the particles that are already present,

(2.41) $V_{int} = N\,\dfrac{4\pi}{3}R_{max}^3 \equiv N v_{int}$,

where $v_{int}$ denotes a single interaction volume. We have used the fact that at very low densities the interaction volumes almost never overlap.

¹This is a key assumption of the virial expansion formalism. For long-ranged interactions like the Coulomb interaction, this formalism fails completely.

Now we are in a position to compute the expectation value

(2.42) ⟨e^{−βΔU}⟩_N.

With a probability (V − V_int)/V, the test particle will be inserted into a region where ΔU = 0. With a probability V_int/V, the insertion will happen in one of the "interaction zones". The average value of exp(−βΔU) inside an interaction zone is

(2.43) ⟨e^{−βΔU}⟩_int = (1/v_int) ∫_{v_int} dr e^{−βφ(r)},

where φ(r) is the interaction potential. Here, r denotes the distance between the position of the test particle and the particle at the centre of the interaction zone. Combining the above results, we obtain

(2.44) ⟨e^{−βΔU}⟩ = [(V − V_int)/V] e⁰ + (V_int/V) × (1/v_int) ∫_{v_int} dr e^{−βφ(r)}.

Using the fact that V_int/v_int = N, we find that

⟨e^{−βΔU}⟩ = 1 + ρ ∫_{v_int} dr (e^{−βφ(r)} − 1)
(2.45) ≈ 1 + ρ ∫_V dr (e^{−βφ(r)} − 1).

In the last step we have extended the integration over all space, as φ(r) ≈ 0 outside the interaction zone. Thus

μ_ex = −k_B T log [ 1 + ρ ∫_V dr (e^{−βφ(r)} − 1) ]
(2.46) = k_B T ρ ∫_V dr (1 − e^{−βφ(r)}), ρ → 0.

Hence, by making use of (2.40), we arrive at a relation between the second virial coefficient and the interaction potential:

(2.47) B_2 = (1/2) ∫_V dr (1 − e^{−βφ(r)}).
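For a spherically symmetric potential, Equation (2.47) reduces to B_2 = 2π ∫₀^∞ (1 − e^{−βφ(r)}) r² dr, which is straightforward to evaluate by quadrature. Below is a minimal sketch for a square-well potential; the choice of potential, its parameters and the cutoff radius are illustrative choices of ours.

import numpy as np
from scipy.integrate import quad

def square_well(r, sigma=1.0, lam=1.5, eps=1.0):
    # Hard core of diameter sigma with an attractive well of depth eps
    # extending out to lam * sigma (an illustrative model potential).
    if r < sigma:
        return np.inf
    if r < lam * sigma:
        return -eps
    return 0.0

def B2(phi, beta, r_max=10.0, breakpts=(1.0, 1.5)):
    # B2 = 2 pi * int_0^inf (1 - exp(-beta*phi(r))) r^2 dr, Eq. (2.47).
    f = lambda r: (1.0 - np.exp(-beta * phi(r))) * r ** 2
    val, _ = quad(f, 0.0, r_max, points=list(breakpts))
    return 2.0 * np.pi * val

# Sanity check: for pure hard spheres (eps = 0) this returns
# B2 = (2 pi / 3) sigma^3, i.e. four times the particle volume.
print(B2(lambda r: square_well(r, eps=0.0), beta=1.0))  # ~2.0944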

4. Exercises

(1) We would like to calculate the free energy of a charged colloidal crystal interacting via a Yukawa potential V(r) = ε e^{−κr}/(κr) via a variational approach.
(a) Recall the upper bound to the free energy

F(α) = (3/2) N k_B T [log(α/π) − 1] + (1/2) ∫ dr dr′ ρ_0(r) ρ_0(r′) V(|r − r′|);

show that

F(α) = (3/2) N k_B T [log(α/π) − 1] + (1/2) (1/(2π)³) ∫ dk |ρ̂_0(k)|² V̂(k),

where the Fourier transform is defined as f̂(k) = ∫ dr e^{−ik·r} f(r).

(b) Show that

|ρ̂_0(k)|² = N e^{−k²/(2α)} Σ_{j=1}^{N} e^{−ik·R_j}.

(c) Hence or otherwise, show that

βF(α*)/N = (e^{1/(2α*)}/(4t)) Σ_j (1/x_j) [ e^{−x_j} erfc( (1 − α* x_j)/√(2α*) ) − e^{x_j} erfc( (1 + α* x_j)/√(2α*) ) ] + (3/2) [ log(α*κ²/π) − 1 ],

where α* = α/κ², x_j = κR_j, t = k_B T/ε and erfc(x) is the complementary error function. The sum extends over all the shells of a Bravais lattice which are at a distance x_j from the origin of coordinates.
(d) Find f(T, ρ) = min_{α*} F(α*), and compare it with numerical simulations [13] (see also Ref. [3]).

(2) Consider an interaction potential of the type v(r) = −1/r^α. Show that the second virial coefficient for a system in D dimensions is bounded if and only if D < α. Thus the virial approach fails for strongly correlated systems (e.g. a Coulomb gas).

(3) Show that the Helmholtz free energy can be written as

A = −k_B T log ( V^N / ⟨e^{βU}⟩_{N,V,T} ).

Why is this NOT a viable computational method? (Hint: can you easily sample high-energy states?)

LECTURE 3
Order from disorder: Entropic phase transitions

The second law of thermodynamics tells us that any spontaneous change in an isolated system results in an increase in the entropy, S. In this sense, all spontaneous transformations of one phase into another are entropy driven. However, this is not what the term "entropic phase transitions" is meant to describe. It is more common to consider the behaviour of a system that is not isolated, but can exchange energy with its surroundings. In that case, as we saw in Lecture 1, the second law of thermodynamics implies that the system will tend to minimise its Helmholtz free energy A = E − TS, where E is the internal energy of the system and T the temperature. Clearly, a system at constant temperature can lower its free energy in two ways: either by increasing the entropy S, or by decreasing the internal energy E. In order to gain a better understanding of the factors that influence phase transitions, we must look at the statistical mechanical expression for entropy. We should remind ourselves that the entropy of an isolated system of N particles in volume V at an energy E is given by

(3.1) S = k_B log Ω,

where k_B, the Boltzmann constant, is simply a constant of proportionality, and Ω is the volume of the phase space that is accessible to the system. The usual interpretation of Equation (3.1) is that Ω, the number of accessible states of a system, is a measure of the "disorder" in that system: the larger the disorder, the larger the entropy. This interpretation of entropy suggests that a phase transition from a disordered to a more ordered phase can only take place if the loss in entropy is compensated by the decrease in internal energy. This statement is completely correct, provided that we use Equation (3.1) to define the amount of disorder in a system. However, we also have an intuitive idea of order and disorder: we consider crystalline solids "ordered" and isotropic liquids "disordered". This intuitive picture suggests that a spontaneous phase transition from the fluid to the crystalline state can only take place if freezing lowers the internal energy of the system sufficiently to outweigh the loss in entropy: i.e. the ordering transition is "energy driven", and the particles stick together in spite of entropy when the temperature is low enough. In many cases, this is precisely what happens. It would, however, be a mistake to assume that our intuitive definition of order always coincides with the one based on Equation (3.1). In fact, the aim of this lecture is to show that many "ordering" transitions that are usually considered to be energy-driven may, in fact, be entropy driven. We stress that the idea of entropy-driven phase transitions is an old one. However, it has only become clear during the past few years that such phase transformations may not be interesting exceptions, but the rule!


In order to observe “pure” entropic phase transitions, we should consider sys- tems for which the internal energy is a function of the temperature, but not of the density. Using elementary statistical mechanics, it is easy to show that this condition is satisfied for classical hard-core systems. Whenever these systems order at a fixed density and temperature, they can only do so by increasing their entropy (because, at constant temperature, their internal energy is fixed). Such systems were initially studied in computer simulations. But, increasingly, experimentalists, in particular colloid scientists, have succeeded in making real systems that behave very nearly as ideal hard-core systems. Hence, the phase transitions we discuss below can, and in many cases do, occur in nature.

1. Hard-sphere freezing

The most famous, and for a long time controversial, example of an entropy-driven ordering transition is the freezing transition in a system of hard spheres. This transition was predicted by Kirkwood in the early fifties on the basis of an approximate theoretical description of the hard-sphere model. As this prediction was quite counter-intuitive and not based on any rigorous theoretical results, it was met with widespread skepticism until Alder, Wainwright, Wood and Jacobson performed numerical simulations of the hard-sphere system that showed direct evidence for this freezing transition [1, 18]. Even then, the acceptance of the idea that freezing could be an entropy-driven transition came only slowly. However, by now, the idea that hard spheres undergo a first-order freezing transition¹ is generally accepted.

Indeed, we now know a great deal about the phase behaviour of hard spheres. Since the work of Hoover and Ree, we know the location of the thermodynamic freezing transition, and we now also know that the face-centred cubic phase is more stable than the hexagonal close-packed phase, be it by only 10⁻³ k_B T per particle. In order to understand the freezing transition of hard spheres, we need a theoretical description that allows us to compute the thermodynamic properties of both the liquid and the solid state. At present, such information is usually obtained from computer simulations. However, it is possible to construct relatively simple approximations for the free energy and the pressure of both the liquid and the crystalline states. With this information, we can then determine the point of phase coexistence, i.e. the point where the solid and the liquid phase have the same pressure and the same chemical potential. The determination of this coexistence point is usually performed graphically, using the double-tangent construction.

1.1. The liquid phase

Let us first consider the liquid. If we have an expression that allows us to estimate the pressure p of the hard-sphere liquid as a function of the density ρ, then we can compute the Helmholtz free energy A and the chemical potential. To begin, we recall the Widom particle-insertion method to compute the excess chemical potential: μ_ex = −k_B T log ⟨e^{−βΔU}⟩. If the particle that we add is a point particle, then every particle that is present occupies a volume (4π/3)R³ where the new particle cannot be inserted, and the probability of inserting such a small particle

¹The order of a phase transition is determined by the lowest derivative of the free energy with respect to a thermodynamic variable that is discontinuous at the transition. The solid/liquid phase transition is first order as the volume V = ∂G/∂p is discontinuous.

successfully is simply 1 − (4π/3)ρR³. Now, consider the insertion of an infinitesimally small particle of radius ε. The exclusion volume when the small particle and a large particle are in contact is given by V_e = (4π/3)(R + ε)³. Thus the probability p_0 to insert such a small particle successfully is

(3.2) p_0 = 1 − ρ (4π/3)(R + ε)³,

where we have exploited the fact that the particle is small, and therefore the exclusion volumes do not overlap. The work done to insert a particle of radius ε is given by the excess chemical potential, and is

(3.3) w_0(ε) = −k_B T log [ 1 − ρ (4π/3)(R + ε)³ ].

Next, we consider the probability to insert a large particle. If the added particle is much larger than all other particles, we can use macroscopic arguments to estimate this work. For very large particles, the work is dominated by the pV work (= (4π/3)r³p). For smaller particles, terms of order r² and r may also contribute. We thus assume that w(r) is of the form

(3.4) w(r) = a_0 + a_1 r + (1/2) a_2 r² + (4π/3) p r³,

where p is the pressure. The constants a_0, a_1 and a_2 are not known a priori and will later be eliminated. The key insight is that w(r) ~ w_0(r) as r → 0. Therefore, matching up to the second derivative of Equations (3.3) and (3.4) at r = 0, we have

(3.5) βa_0 = −log(1 − η), βa_1 = 4πR²ρ/(1 − η), βa_2 = [4πR²ρ/(1 − η)]² + 8πRρ/(1 − η),

where

(3.6) η = (4π/3) ρR³

is the packing fraction. Thus

(3.7) βμ_ex ≡ βw(R) = −log(1 − η) + 6η/(1 − η) + 9η²/(2(1 − η)²) + βPη/ρ.

Using the thermodynamic relation (2.36), we have

(3.8) 1 + βρ ∂μ_ex/∂ρ = β ∂p/∂ρ,

and differentiating Equation (3.7) with respect to ρ gives

(3.9) βρ ∂μ_ex/∂ρ = βη ∂p/∂ρ + η(η² + η + 7)/(1 − η)³.

We can eliminate the derivative of μ_ex and obtain

(3.10) β ∂p/∂ρ = 1/(1 − η) + η(η² + η + 7)/(1 − η)⁴.

Integrating both sides with respect to ρ, noting that p = 0 when ρ = 0, we arrive at an expression for the pressure,

(3.11) p = k_B T ρ (1 + η + η²)/(1 − η)³,

and the excess chemical potential,

(3.12) βμ_ex = η(14 − 13η + 5η²)/(2(1 − η)³) − log(1 − η).

The total chemical potential is the sum of the ideal gas contribution and the excess chemical potential. Noting that the ideal gas chemical potential is given by βμ_id = log ρ, where ρ is the density, we arrive at

(3.13) βμ = log(6η/π) + η(14 − 13η + 5η²)/(2(1 − η)³) − log(1 − η).

Note that we have set the particle diameter σ = 1 — the value of σ does not change the coexistence pressure as long as it is set consistently for the liquid state calculation and the solid state calculation. Changing σ amounts to adding a constant to both the solid and liquid chemical potentials, which does not change the point where the two chemical potentials cross.

1.2. The solid phase

Here we use a very simple and widely used model developed by Lennard-Jones and Devonshire, known as the cell theory. We know that in a crystal, every particle lives in a cell defined by its neighbours. At close packing, the volume of the cell in which a particle can move vanishes. However, if we expand the crystal a little, then every particle can move in its own "cell". To estimate the accessible volume, we note that the volume v per particle in a simple crystal is equal to the volume V of the crystal divided by the number of particles (this is the volume of the so-called Wigner-Seitz cell). However, most of this volume is not accessible to the particle. In particular, at close packing, the volume per particle is v_0 = V_cp/N and the accessible volume is zero. If we now expand the crystal, the volume that is accessible to a particle has the same shape as the Wigner-Seitz cell, but its volume is

(3.14) v_free = c (r_nn − σ)³,

where r_nn is the nearest-neighbour distance in the expanded crystal and σ, the hard-sphere diameter, is the nearest-neighbour distance at close packing; c is a geometrical factor. We assume that the free volume has the same shape as the Wigner-Seitz cell of the crystal lattice (face-centred cubic lattice), in which case c = 4√2. We now use the fact that

(3.15) r_nn = σ (V/V_0)^{1/3},

and hence

(3.16) v_free = 4√2 σ³ [ (V/V_0)^{1/3} − 1 ]³.

We now assume that all particles can move independently in their own cell. This is, of course, an oversimplification. But, with this assumption, we can write down a very simple expression for the configurational part of the partition function of the crystal:

(3.17) Z_cell = v_free^N,


Figure 1. Location of the freezing transition of hard spheres, using the scaled particle equation of state for the liquid and the cell model for the solid (see text). The chemical potential μ is plotted against the reduced pressure (πR³/6)βP. The coexistence pressure is in (fortuitously) good agreement with the result from computer simulation, P_coex = 6.06 in the pressure units used in this figure.

and hence

(3.18) A_cell = −N β^{−1} log v_free = −3N k_B T { log[ (V/V_0)^{1/3} − 1 ] + (1/3) log(4√2) }.

As in the case of the liquid, we have left out constants that depend on temperature only and constants that are the same for the solid and the liquid (note again that we have consistently set σ = 1 in the solid phase and the liquid phase). The pressure is given by

(3.19) p = −(∂A_cell/∂V)_T ≈ (N β^{−1}/V_0) (V/V_0)^{−2/3} / [ (V/V_0)^{1/3} − 1 ].

Using the relation Nμ = A + pV, we obtain the chemical potential for the solid

(3.20) βμ_cell = −log { 4√2 [ (V/V_0)^{1/3} − 1 ]³ } + 1/(1 − (V_0/V)^{1/3}).
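With βμ(p) available for both branches, the coexistence point can also be located numerically rather than graphically. The sketch below is our own construction (with k_B T = σ = 1 and ρ_cp σ³ = √2 for the fcc solid): it solves for equal pressure and chemical potential with scipy, and its output can be compared with the value P_coex = 6.06 quoted in Figure 1.

import numpy as np
from scipy.optimize import fsolve

SQRT2 = np.sqrt(2.0)

def liquid(eta):
    # SPT liquid: returns (beta*p*sigma^3, beta*mu), Eqs. (3.11), (3.13).
    rho = 6.0 * eta / np.pi
    bp = rho * (1.0 + eta + eta ** 2) / (1.0 - eta) ** 3
    bmu = (np.log(rho) - np.log(1.0 - eta)
           + eta * (14.0 - 13.0 * eta + 5.0 * eta ** 2) / (2.0 * (1.0 - eta) ** 3))
    return bp, bmu

def solid(v):
    # Cell-model solid, v = V/V0: Eqs. (3.19), (3.20); N/V0 = sqrt(2).
    bp = SQRT2 * v ** (-2.0 / 3.0) / (v ** (1.0 / 3.0) - 1.0)
    bmu = (-np.log(4.0 * SQRT2 * (v ** (1.0 / 3.0) - 1.0) ** 3)
           + 1.0 / (1.0 - v ** (-1.0 / 3.0)))
    return bp, bmu

def eqs(x):
    eta, v = x
    bp_l, bmu_l = liquid(eta)
    bp_s, bmu_s = solid(v)
    return [bp_l - bp_s, bmu_l - bmu_s]

eta_c, v_c = fsolve(eqs, [0.49, 1.35])
bp, _ = liquid(eta_c)
print("coexistence: eta_liq=%.3f, V/V0=%.3f, (pi/6)*beta*p = %.2f"
      % (eta_c, v_c, np.pi * bp / 6.0))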

To investigate the possibility of a hard-sphere phase transition, we plot the chemical potential as a function of pressure. Figure 1 shows the result of this calculation. The chemical potentials for the solid and the liquid cross in the μ-p plane, showing that there is a first-order phase transition between the fluid phase and the solid phase as the pressure is increased.

The key to understanding why entropic freezing occurs is to note that entropy is not simply a measure of disorder, but a measure of "freedom of movement".

Figure 2. In experiments by Pusey and van Megen [14] on suspensions of hard colloidal particles, "hard-sphere" freezing was observed in the density regime where simulations had predicted the occurrence of an entropic freezing transition. Reprinted by permission from Macmillan Publishers Ltd.: P. N. Pusey and W. van Megen, Phase behaviour of concentrated suspensions of nearly hard colloidal spheres, Nature 320, 1986.

In a gas, molecules can move freely. In a crystalline solid, the molecules are confined, but they can still move locally about their lattice positions. However, in a jammed configuration, the molecules are stuck (low entropy). Therefore, to maximise freedom, the system will spontaneously arrange itself in a crystalline configuration to avoid being jammed; hence the occurrence of a phase transition. Macroscopic ordering thus actually corresponds to an increase in freedom and disorder on the microscale. Hard-sphere crystallisation has been observed in experiments with colloids [14], where the interactions can be tuned such that the colloids behave as almost perfect hard spheres (see Figure 2).

2. Role of geometry: The isotropic-nematic transition

In the previous section, we saw that entropy drives translational ordering in hard spheres. A natural question to ask is how the geometry and anisotropy of the molecules play a role in entropic phase transitions, and in particular whether entropy can produce orientational ordering.

Following Onsager, we consider thin hard rods of length L and diameter D. In the liquid crystal terminology, we are interested in the transition from the isotropic fluid phase, where the molecules are translationally and orientationally disordered, to the nematic phase. In the latter phase, the molecules are translationally disordered, but their orientations are, on average, aligned. In the same spirit as the previous section, we will analytically compute the (approximate) free energy for the isotropic and nematic phases, and see whether they cross.


Figure 3. Excluded volume of rod-like particles. (a) When a particle pointing along u is fixed, the centre of mass of the other particle, pointing along u′, cannot enter the shaded parallelepiped region. (b) The parallelepiped viewed from the direction of u. (c) The parallelepiped viewed from the direction of u × u′.

2.1. Isotropic phase

To compute the thermodynamics of the isotropic phase, we consider it as a gas of hard rods. The pressure of an imperfect gas can be approximated by the virial expansion (c.f. Equation (2.34)). For hard particles, the second virial coefficient is

(3.21) B_2 = (1/2) ∫_V (1 − e^{−βφ(r)}) dr = (1/2) V_excl,

where V_excl is the excluded volume of a pair of particles. Now, consider two infinitely long rods (L/D → ∞), pointing in directions u and u′. If the position of one rod is fixed, the centre of mass of the second rod cannot enter a certain region (as shown in Figure 3). The volume of that region is the excluded volume. Geometrically (see Figure 3), the excluded region is a parallelepiped; thus for two rods that make an angle θ (0 ≤ θ ≤ π),

(3.22) V_excl(θ) = 2DL² |u × u′| = 2L²D sin θ,

where we have ignored the contributions of the end points, as these are a factor D/L smaller and we consider the limit D/L → 0. Averaging the excluded volume over all angles yields

(3.23) ⟨V_excl⟩ = 2B_2 = 2L²D (1/2) ∫₀^π dθ sin²θ = πL²D/2,

and hence

(3.24) B_2 = πL²D/4.

The peculiar feature of the Onsager model is that all higher virial coefficients vanish in the limit L/D → ∞, hence

(3.25) Z = 1 + πL²Dρ/4.

For what follows, it is convenient to choose B_2 as our unit of volume. In these units, the number density is

(3.26) ρB_2 ≡ c,

which defines the reduced density c. In terms of c:

(3.27) Z = 1 + c.

If we use units of k_B T/B_2 for the pressure, we find

(3.28) P′ = c(1 + c),

where P′ ≡ pB_2/k_B T. We express the chemical potential in units of k_B T: μ′ ≡ μ/k_B T. The ideal part of the chemical potential of this gas is

(3.29) μ′_id = log(c/4π).

Note that there is an extra factor of 4π in the chemical potential — adding a constant to the chemical potential makes no difference for the location of phase transitions (as long as it is done consistently). The factor 1/(4π) in Equation (3.29) follows from the fact that the number density is a function of both the position and the orientation. For an isotropic phase every orientation is equally likely, and hence the orientational distribution is

(3.30) f(cos θ, φ) = 1/(4π).

Later, when we consider the nematic phase, this angular distribution will not be uniform. The excess chemical potential is 2B_2ρ = 2c, hence

(3.31) μ′ = log(c/4π) + 2c.

2.2. Nematic phase

In the nematic phase, the particle density is still uniform in space, but the orientational distribution is not. We characterise the orientation of a molecule by a unit vector ω and the orientational distribution by f(ω). On average, molecules are aligned parallel (or, equally likely, antiparallel) to some preferred direction n, called the nematic "director". We denote the angle that a given molecule makes with the z-axis by θ. In the nematic phase, the orientational distribution of the molecules has cylindrical symmetry, i.e. it depends on θ but not on the azimuthal angle φ. The single-particle density is ρ(r, ω) = ρ_0 f(ω), where ρ_0 is the average number density and f(ω) is the normalised orientational distribution function.

At this stage, we do not yet know what the orientational distribution is, apart from the fact that it is cylindrically symmetric and that it does not distinguish between "head" and "tail", i.e. f(θ) = f(π − θ). In what follows, we will only consider the range of values |θ| ≤ π/2. To facilitate the calculations we will assume a trial distribution f(θ) that is sharply peaked around θ = 0. The approach will now be as follows: initially, we will fix the single-particle distribution and compute the Helmholtz free energy A for that distribution. Then we vary f(θ) to minimise A. To make life easy, we will use a Gaussian trial distribution,

(3.32) f(ω) ≈ c e^{−aθ²},

where the width of the distribution is determined by the adjustable constant a, while c is a normalisation constant.² As the distribution is sharply peaked around θ = 0, we have that θ ≪ 1 and hence that sin θ ≈ θ. In addition, we can replace the upper limit of the integration (θ = π/2) by infinity. Normalisation then requires that

(3.33) ∫₀^∞ dθ 2πθ c e^{−aθ²} = 1,

or c = a/π. We note that in reality the orientational distribution function has two peaks: one at θ = 0 and one at θ = π. However, we make use of the symmetry of the distribution around θ = π/2.

To compute the Helmholtz free energy, we consider a reference system of rods with an orientational distribution (a/π) e^{−aθ²}. The free energy of such an "orientationally ordered ideal gas" is:

(3.34) A_ref = k_B T ∫ dr ∫ dω ρ(r, ω) [ log ρ(r, ω) − 1 ].

Using

(3.35) ∫ dr ρ_0 = N

and

(3.36) ∫₀^∞ dθ 2πθ (a/π)(aθ²) e^{−aθ²} = 1,

we get

(3.37) A_ref = N k_B T [ log(ρ_0 a/π) − 1 ] − N k_B T.

To compute the full chemical potential and the corresponding free energy, we must include the part due to interactions, i.e. μ = μ_id + μ_ex. We once again use the Widom expression μ_ex = −k_B T log ⟨e^{−βΔU}⟩ to compute the excess chemical potential. For a given orientational distribution function we can compute μ_ex from particle insertion, using

(3.38) ⟨e^{−βΔU}⟩ = 1 − ρ_0 ⟨V_excl⟩ ≈ e^{−ρ_0 ⟨V_excl⟩},

where ⟨V_excl⟩ is the average volume excluded by the rods in the system to the rod that is inserted (and the approximation is justified as ρ_0⟨V_excl⟩ ≪ 1; see below for further discussion). We now make an assumption that is similar in spirit to the cell model for crystals, namely that, whilst inserting one rod, we keep the orientation

²This is in the same spirit as the Gibbs-Bogoliubov variational theory. In general, we know that the true orientational distribution function f(θ) is the minimiser of the free energy. Now, for simplicity, we pick a distribution f̂(θ; c_1, c_2, ···), where c_1, c_2, ··· are parameters, and instead minimise the free energy with respect to those parameters to find the closest approximation to f(θ) (in this simple example we choose a one-parameter family of distributions).

(3.44) p =3ρ0kBT, and ρ3(L2D)2 (3.45) μ = k T log 0 +3k T. B 16 B  Now we use the reduced units defined in the previous section: c ≡ ρB2, P ≡ pB /k T , μ ≡ μ/k T .Thenwehave: 2 B B c3 (3.46) μ =log +3, π2 and (3.47) P  =3c. To determine the coexistence with the isotropic phase, we will have to determine thepointwheretheμ vs P  curves of the two phases intersect (see Figure 4). At first sight it may seem strange that the hard rod system can increase its entropy by going from a disordered fluid phase to an orientationally ordered phase. Indeed, due to the orientational ordering of the system, the orientational entropy of the system decreases. However, this loss in entropy is more than compensated by the increase in translational entropy of the system: the available space for any one rod increases as the rods become more aligned. In fact, we shall see this mechanism returning time-and-again in ordering transitions of hard-core systems: on the one hand the entropy decreases because the density is no longer uniform in orientation or position, but on the other hand the entropy increases much more because the free-volume per particle is larger in the ordered phase compared to the disordered phase. LECTURE 3. ORDER FROM DISORDER: ENTROPIC PHASE TRANSITIONS 145


Figure 4. Approximate location of the isotropic-nematic transition in a fluid of thin hard rods (Onsager model). The figure shows the chemical potentials μ′ of the isotropic and nematic phases as a function of the pressure P′. The calculation has been performed using the approximations described in the text. At low pressures, the isotropic phase is stable, but at higher pressures the fluid undergoes an orientational ordering transition. This simple model predicts that the densities of the isotropic and nematic phases at coexistence are c_I = 3.45 and c_N = 5.12, respectively. The "exact" (i.e. numerical) results are c_I = 3.29 and c_N = 4.22.

The "exact" (but numerical) answers for the densities of the isotropic and nematic phases at coexistence are ρ_I B_2 = 3.2906 and ρ_N B_2 = 4.2230. The important point to note is that the transition takes place when ρ_I B_2 = O(1), that is, when the volume of the system is comparable to N⟨V_excl⟩. However, at that density the volume occupied by the particles (V_occ = NπLD²/4) is negligible compared to V: i.e. the volume fraction of the rods is negligible. When the rods are compressed to higher volume fractions, other phases form (smectic liquid crystal and 3D crystal). There is another "characteristic" density that is important for rod-like colloids, namely the density where the rods start to hinder each other's rotation. This happens at a much lower density, namely when ρL³ = O(1). Hence, when a rod-like liquid is compressed, the first thing that happens is that the dynamics becomes non-ideal (ρL³ = O(1)). After that, the orientations order (ρB_2 = O(1)), and finally the positions order (ρLD² = O(1)).

3. Depletion interaction and the entropy of the medium

One of the most surprising effects of the solvent on the interaction between colloids is the so-called depletion interaction. Consider a mixture of large hard spheres (colloids) and small hard spheres (solvent/polymer coils). When two large spheres are far apart, the small spheres can easily fit in the gap that is formed by the two large spheres. However, when the closest distance between the large spheres is less

Figure 5. Schematic illustration of the depletion interaction. When the gap between the large spheres cannot accommodate the smaller spheres, the system will push the two larger spheres together to grant more freedom to the smaller spheres.

than the diameter of the small spheres, the small spheres can no longer fit into the gap. We have established in the last few sections that Nature wants to maximise the freedom of particles. Therefore, a natural response is to squeeze the two large spheres together so as to reduce the volume that is inaccessible to the small spheres (see Figure 5). Although this sacrifices the freedom of the large spheres, there are many more small spheres around, and therefore the freedom gained is more than the freedom lost.

3.1. A simple toy model

To explore this idea further, we first study a simple model system. Let us consider a d-dimensional cubic lattice with at most one particle allowed per square (Figure 6). Apart from the fact that no two particles can occupy the same square face, there is no interaction between the particles. For a lattice of N sites, the grand-canonical partition function is:

(3.48) Ξ = Σ_{{n_i}} exp( βμ_c Σ_i n_i ).

The sum is over all allowed sets of occupation numbers {ni} (ni =1, 0 for occupied and empty sites, respectively) and μc is the chemical potential of the “colloidal” particles. Next, we include small “solvent” particles that are allowed to sit on the links of the lattice (see Fig. 6). These small particles are excluded from the edges of a cube that is occupied by a large particle. For a given configuration {ni} of the large particles, one can then exactly calculate the grand canonical partition function of the small particles. Let M = M({ni}) be the number of free spaces accessible to the small particles. Then clearly:

(3.49) Ξ_small({n_i}) = Σ_{l=0}^{M} [ M!/(l!(M − l)!) ] z_s^l = (1 + z_s)^{M({n_i})},

Figure 6. Two-dimensional lattice model of a hard-core mixture of “large” colloidal particles (black squares) and “small” solvent particles (white squares). Averaging over the solvent degrees of freedom results in a net attractive interaction (depletion interac- tion) between the “colloids”.

where z_s ≡ exp(βμ_s) is the fugacity of the small particles. M can be written as

(3.50) M({n_i}) = dN − 2d Σ_i n_i + Σ_{⟨ij⟩} n_i n_j,

where d is the dimensionality of the system, dN is the number of links on the lattice, and the second sum is over nearest-neighbour pairs and comes from the fact that when two large particles touch, the number of sites excluded for the small particles is 4d − 1, not 4d. Whenever two large particles touch, we have to correct for this over-counting of excluded sites. The total grand-partition function for the "mixture" is:

(3.51) Ξ_mixture = Σ_{{n_i}} exp[ (βμ_c − 2d log(1 + z_s)) Σ_i n_i + log(1 + z_s) Σ_{⟨ij⟩} n_i n_j ],

where we have omitted a constant factor (1 + z_s)^{dN}. Now we can bring this equation into a more familiar form by using the standard procedure to translate a lattice-gas model into a spin model. We define spins s_i such that 2n_i − 1 = s_i, or n_i = (s_i + 1)/2. Then we can write Eqn. (3.51) as

(3.52) Ξ_mixture = Σ_{{s_i}} exp[ Σ_i ((βμ_c − d log(1 + z_s))/2) s_i + Σ_{⟨ij⟩} (log(1 + z_s)/4) s_i s_j + Const. ].

This is simply the expression for the partition function of an Ising model in a magnetic field of strength H = (μ_c − d log(1 + z_s)/β)/2 and an effective nearest-neighbour attraction with an interaction strength log(1 + z_s)/(4β). There is hardly

(3.53) (1 + z_s)^d = z_c,

where z_c = exp(βμ_c) is the fugacity of the large particles. Several points should be noted. First of all, in this simple lattice model, summing over all "solvent" degrees of freedom resulted in an effective attractive nearest-neighbour interaction between the hard-core "colloids". Secondly, below its critical temperature, the Ising model exhibits spontaneous magnetisation. In the mixture model, this means that, above a critical value of the solvent chemical potential, there will be a phase transition in which a phase with low n_c (a dilute colloidal suspension) coexists with a phase with high n_c (a concentrated suspension). Hence, this model system with purely repulsive hard-core interactions can undergo a de-mixing transition. This de-mixing is purely entropic.
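In two dimensions we can make this quantitative: the mapping gives an effective coupling βJ = log(1 + z_s)/4, and demixing (at zero field, Equation (3.53)) sets in when βJ exceeds Onsager's exact critical coupling βJ_c = (1/2) log(1 + √2) for the square-lattice Ising model. The short sketch below (our own) inverts this condition for the critical solvent fugacity.

import math

# Effective Ising coupling from Eq. (3.52): beta*J = log(1 + z_s) / 4.
# Demixing requires beta*J > beta*J_c = 0.5*log(1 + sqrt(2)) (Onsager, 2D).
betaJ_c = 0.5 * math.log(1.0 + math.sqrt(2.0))
z_s_crit = math.exp(4.0 * betaJ_c) - 1.0   # = (1 + sqrt(2))**2 - 1
print(z_s_crit)  # ~4.83: above this solvent fugacity the colloids demix
# At the transition, the zero-field condition (3.53) fixes the colloid
# fugacity: z_c = (1 + z_s_crit)**2 in d = 2.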

3.2. Integrating out solvent degrees of freedom

Going beyond the toy model, we would like to calculate the depletion interaction in a realistic system. To do this, we need to develop the thermodynamic framework whereby we systematically average out the solvent degrees of freedom. Consider a system with N_c hard particles (e.g. colloids) in a volume V at temperature T. The solvent is held at constant chemical potential μ_s, but the number of solvent molecules N_s fluctuates. The "semi-grand" partition function of such a system is given by

(3.54) Ξ(N_c, μ_s, V, T) ≡ Σ_{N_s=0}^{∞} exp(βμ_s N_s) Q(N_c, N_s, V, T).

The canonical partition function Q(N_c, N_s, V, T) is given by

(3.55) Q(N_c, N_s, V, T) = [ q_id,c(T)^{N_c} q_id,s(T)^{N_s} / (N_c! N_s!) ] ∫ dr^{N_c} dr^{N_s} exp[−βU(r^{N_c}, r^{N_s})],

where q_id,α is the kinetic and intra-molecular part of the partition function of a particle of species α. These terms are assumed to depend only on temperature, and not on the inter-molecular interactions (sometimes this is not true, e.g. in the case of polymers). In what follows, we will drop the factors q_id,α (more precisely, we will account for them in the definition of the chemical potential: i.e. μ_α ⇒ μ_α + k_B T log q_id,α).

The interaction potential U(r^{N_c}, r^{N_s}) can always be written as U_cc + U_ss + U_sc, where U_cc is the direct colloid-colloid interaction (i.e. U(r^{N_c}, r^{N_s}) for N_s = 0), U_ss is the solvent-solvent interaction (i.e. U(r^{N_c}, r^{N_s}) for N_c = 0), and U_sc is the solvent-colloid interaction U(r^{N_c}, r^{N_s}) − U_cc(r^{N_c}) − U_ss(r^{N_s}). With these definitions, we can write

(3.56) Q(N_c, N_s, V, T) = (1/N_c!) ∫ dr^{N_c} exp[−βU_cc] (1/N_s!) ∫ dr^{N_s} exp[−β(U_ss + U_sc)],

and hence

(3.57) Ξ(N_c, μ_s, V, T) = (1/N_c!) ∫ dr^{N_c} exp[−βU_cc] { Σ_{N_s=0}^{∞} [exp(βμ_s N_s)/N_s!] ∫ dr^{N_s} exp[−β(U_ss + U_sc)] }.

We can rewrite this in a slightly more transparent form. We define

(3.58) Q_s(N_s, V, T) ≡ (1/N_s!) ∫ dr^{N_s} exp[−βU_ss]

and

(3.59) Ξ_s(μ_s, V, T) ≡ Σ_{N_s=0}^{∞} exp(βμ_s N_s) Q_s(N_s, V, T).

Then

(3.60) Ξ(N_c, μ_s, V, T) = (1/N_c!) ∫ dr^{N_c} exp[−βU_cc] Σ_{N_s=0}^{∞} exp(βμ_s N_s) Q_s(N_s, V, T) ⟨exp[−βU_sc]⟩_{N_c,N_s,V,T}
= [ Ξ_s(μ_s, V, T)/N_c! ] ∫ dr^{N_c} exp[−βU_cc] ⟨exp[−βU_sc]⟩_{μ_s,V,T},

where

(3.61) ⟨exp[−βU_sc]⟩_{μ_s,V,T} ≡ Σ_{N_s=0}^{∞} exp(βμ_s N_s) Q_s(N_s, V, T) ⟨exp[−βU_sc]⟩_{N_c,N_s,V,T} / Ξ_s(μ_s, V, T).

We can see that the effective colloid-colloid interaction is

(3.62) U_cc^eff(r^{N_c}) ≡ U_cc(r^{N_c}) − k_B T log ⟨exp[−βU_sc(r^{N_c})]⟩_{μ_s,V,T}.

We refer to U_cc^eff(r^{N_c}) as the potential of mean force. Note that the potential of mean force depends explicitly on the temperature and on the chemical potential of the solvent. A perhaps curious property of the potential of mean force is that even when the colloid-solvent and solvent-solvent interactions are pairwise additive, the potential of mean force is not. However, we should bear in mind that even the "normal" potential energy is not pairwise additive; that is why pair potentials that describe intermolecular interactions in the gas phase cannot be used to model simple liquids.³ However, in many cases, we can make very reasonable estimates of the potential of mean force.

³For example, the interaction between two like charges in a medium with dielectric constant ε is given by Coulomb's law, βv(r) = l_B/r, where l_B = e²/(4πεε_0 k_B T) is the thermal Bjerrum length. However, the potential of mean force between two charges in a dilute binary electrolyte is approximately given by βv(r) = l_B e^{−r/l_D}/r, where l_D = 1/√(4πc_0 l_B) is the Debye screening length and c_0 is the number density of ions. Therefore, the long-ranged Coulomb interaction between two charges turns into an exponentially decaying potential of mean force when there are other charges around.

3.3. Asakura-Oosawa model

Though mathematically elegant, the potential of mean force is an almost intractable object. To make further progress, we focus specifically on hard-sphere colloids and polymer coils which, to a first approximation, can be modelled as smaller hard spheres. Furthermore, we need to introduce some simplifying assumptions. First, we ignore the interactions between the polymer coils. This is justified because of the large size difference between the polymer and the colloid. Second, we ignore the interactions between colloids, and only consider the case of two colloids with fixed positions. This assumption neglects the many-body nature of the force due to the exclusion volumes of, say, three colloids overlapping with each other (it can be made rigorous by a cluster expansion argument). Therefore, the only interactions that remain are those between the colloids and the polymer coils (hard-core exclusion interactions). The partition function is now much simpler, and reads

Ξ = Σ_{N_p=0}^{∞} [e^{βN_pμ_p}/N_p!] Z(N_p, V, T; D; R_c, R_p)
= Σ_{N_p=0}^{∞} [e^{βN_pμ_p}/N_p!] ∫_V dr^{N_p} e^{−βU_cp(r^{N_p}; D; R_c, R_p)}
(3.63) = Σ_{N_p=0}^{∞} [e^{βN_pμ_p}/N_p!] Π_{i=1}^{N_p} ∫_V dr_i e^{−βU_cp(r_i; D; R_c, R_p)},

where D is the separation between the large colloids, R_c is the radius of the colloid and R_p is the radius of the polymer. Now, the integrand e^{−βU_cp(r_i; D; R_c, R_p)} is 0 when the polymer overlaps with a colloid, and 1 otherwise, hence

(3.64) ∫_V dr_i e^{−βU_cp(r_i; D; R_c, R_p)} = V_eff(R_c, R_p, D),

where V_eff(R_c, R_p, D) is the volume accessible to the polymer. Thus

(3.65) Ξ = Σ_{N_p=0}^{∞} (V_eff(R_c, R_p, D) e^{βμ_p})^{N_p}/N_p! = exp( V_eff(R_c, R_p, D) e^{βμ_p} ).

Therefore, the depletion force is

(3.66) f_dep = k_B T (∂ log Ξ/∂D)_{μ_p,V,T} = k_B T z_p ∂V_eff(R_c, R_p, D)/∂D,

where z_p ≡ e^{βμ_p} is the fugacity of the polymer. By simple geometry, we have

(3.67) V_eff(D; R_c, R_p) = V − (8π/3)(R_c + R_p)³ + (4π/3)(R_c + R_p)³ [ 1 − 3D/(4(R_c + R_p)) + (1/16)(D/(R_c + R_p))³ ], 2R_c ≤ D ≤ 2(R_c + R_p),

where the last term is the overlap volume of the two spherical depletion zones of radius R_c + R_p drawn around the colloids; beyond the depletion range, V_eff = V − (8π/3)(R_c + R_p)³ and the force vanishes.
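Equation (3.67) gives the depletion force in closed form via (3.66). The sketch below is ours, with illustrative parameters; it differentiates the overlap term analytically (the D-independent pieces of V_eff drop out of the force), and the force is negative, i.e. the colloids are pulled together within the depletion range.

import numpy as np

def dVeff_dD(D, Rc, Rp):
    # Derivative of the overlap term in Eq. (3.67) with respect to D,
    # valid for 2*Rc <= D <= 2*(Rc + Rp); zero beyond the depletion range.
    R = Rc + Rp
    if D >= 2.0 * R:
        return 0.0
    return (4.0 * np.pi / 3.0) * R ** 3 * (-3.0 / (4.0 * R)
                                           + 3.0 * D ** 2 / (16.0 * R ** 3))

def f_dep(D, Rc, Rp, z_p, kT=1.0):
    # Depletion force, Eq. (3.66): f = kT * z_p * dVeff/dD.
    return kT * z_p * dVeff_dD(D, Rc, Rp)

print(f_dep(D=2.0, Rc=1.0, Rp=0.2, z_p=0.5))   # attraction at colloid contact

As a check, dV_eff/dD vanishes at D = 2(R_c + R_p), so the force goes smoothly to zero at the edge of the depletion range.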


4. Attractive forces and the liquid phase

In this lecture, we have seen a few different types of phase transitions. Even for purely repulsive interactions, rich phase behaviour can arise from entropy alone. However, in the hard-sphere model only a fluid-solid transition is seen, without an intermediate liquid phase. Perhaps, then, a degree of attractive interaction is needed to generate the liquid phase? This poses an obvious question: why and when do liquids exist? We are so used to the occurrence of phenomena such as boiling and freezing that we rarely pause to ask ourselves if things could have been different. Yet the fact that liquids must exist is not obvious a priori. This point was made in an essay by V. F. Weisskopf [16]:

...The existence and general properties of solids and gases are relatively easy to understand once it is realized that atoms or molecules have certain typical properties and interactions that follow from quantum mechanics. Liquids are harder to understand. Assume that a group of intelligent theoretical physicists had lived in closed buildings from birth such that they never had occasion to see any natural structures. Let us forget that it may be impossible to prevent them to see their own bodies and their inputs and outputs. What would they be able to predict from a fundamental knowledge of quantum mechanics? They probably would predict the existence of atoms, of molecules, of solid crystals, both metals and insulators, of gases, but most likely not the existence of liquids.

Weisskopf's statement may seem a bit bold. Surely, the liquid-vapour transition could have been predicted a priori? This is a hypothetical question that can never be answered. But, as we shall discuss below, in colloidal systems there may exist an analogous phase transition that has not yet been observed experimentally and that was found in simulation before it had been predicted.

To set the stage, let us first consider the question of the liquid-vapour transition. In his 1873 thesis, van der Waals gave the correct explanation for a well known, yet puzzling feature of liquids and gases, namely that there is no essential distinction between the two: above a critical temperature T_c, a vapour can be compressed continuously all the way to the freezing point. Yet below T_c, a first-order phase transition separates the dilute fluid (vapour) from the dense fluid (liquid). The liquid-vapour transition is due to a competition between short-ranged repulsion and longer-ranged attraction. From the work of Longuet-Higgins and Widom, we now know that the van der Waals model (molecules are described as hard spheres with an infinitely weak, infinitely long-ranged attraction) is even richer than originally expected: it exhibits not only the liquid-vapour transition but also crystallisation (see Figure 7).

Figure 7. Phase diagram of a system of hard spheres with a weak, long-ranged attraction (the "true" van der Waals model). The density is expressed in units of σ⁻³, where σ is the hard-core diameter. The "temperature" τ is expressed in terms of the van der Waals a-term: τ = k_B T v_0/a, where v_0 is the volume of the hard spheres. V, L and S denote the vapour, liquid and solid phases, respectively. CP and TP denote the critical point and triple point.

The liquid-vapour transition is possible between the critical point and the triple point, and in the van der Waals model the temperature of the critical point is about a factor of two larger than that of the triple point. There is, however, no fundamental reason why this transition should occur in every atomic or molecular substance, nor is there any rule that forbids the existence of more than one fluid-fluid transition. Whether a given compound will have a liquid phase depends sensitively on the range of the intermolecular potential: as this range is decreased, the critical temperature approaches the triple-point temperature, and when T_c drops below the latter, only a single stable fluid phase remains.

In mixtures of spherical colloidal particles and non-adsorbing polymer, the range of the attractive part of the depletion interaction can be varied by changing the size of the polymers (c.f. Equation (3.68)). Experiment, theory and simulation all suggest that when the width of the attractive well becomes less than approximately one third of the diameter of the colloidal spheres, the colloidal 'liquid' phase disappears (see Figure 8).

However, when the attraction becomes very short-ranged (less than 5% of the hard-core diameter), a first-order iso-structural solid-solid transition appears in the solid phase [4]. We can rationalise the origin of this transition by comparing two situations: one is the expanded solid close to melting, the other is the dense solid near close packing. To make life simple, let us assume every particle moves independently in a "cell" of radius a formed by its neighbours. For sufficiently short-ranged potentials with lengthscale δ, the solid can be expanded to a density where a is much larger than the width of the attractive well. In that case, a given

Figure 8. Phase diagrams (pressure against temperature) of a system of spherical particles with a variable-ranged attraction; panel A shows vapour, liquid and solid phases, panel B fluid and solid only, and panel C a fluid plus two distinct solids. As the range of attraction decreases, the liquid-vapour curve moves into the meta-stable regime. For very short-ranged attraction (less than 5% of the hard-core diameter), a first-order iso-structural solid-solid transition appears in the solid phase. It should be stressed that phase diagrams of type B are common for colloidal systems, but rare for simple molecular systems. A possible exception is C60. Phase diagrams of type C have, thus far, not been observed in colloidal systems. Nor had they been predicted before the simulations appeared (this suggests that Weisskopf was right).

particle can have at most three neighbours within the range of its attractive well when it rattles in its cell, although the average number will be far less. In contrast, once the density of the solid is so high that a < δ, every particle interacts with all its nearest neighbours simultaneously. This leads to a fairly abrupt lowering of the potential energy of the system. At low temperatures, this decrease of the energy on compression outweighs the loss of entropy that is caused by the decrease of the free volume. Thus the Helmholtz free energy will exhibit an inflection point, resulting in a first-order transition to a "collapsed" solid.

In fact, we can get an analytical handle on the solid-solid transition by an analysis similar to Equation (2.23). Here, for simplicity, we will choose the cell model (outlined in Section 1.2) as the reference system. The upper bound to the free energy is given by the Gibbs-Bogoliubov equation

(3.69) A = A_id + A_cell + (1/2) ∫ dr dr′ ρ_0(r′) ρ_0(r) V(|r − r′|),

where A_cell is the free energy of the cell model, A_id = N k_B T log(V/V_0) is the ideal gas contribution (neglecting all V-independent constants), ρ_0(r) is the density of particles in the cell model, and V(r) is the interaction potential. At low temperatures, the atoms are almost completely localised at their lattice sites. Thus we can approximate

(3.70) ρ_0(r) = Σ_{i=1}^{N} δ(r − R_i),

where the R_i are the equilibrium positions of the lattice. Thus

(3.71) (1/2) ∫ dr dr′ ρ_0(r) ρ_0(r′) V(|r − r′|) = (1/2) Σ_{i≠j} V(|R_i − R_j|) = (N/2) Σ_i V(r_i),

where we have used the fact that all sites of the crystal are equivalent, hence the sum is simply N times the sum when we start from a particular atom. We consider a particular form of the potential,

(3.72) V(r) = −U_0 (σ/r)ⁿ, r > σ,

and with that

(N/2) Σ_i V(r_i) = −(U_0 N/2) Σ_i (σ/r_i)ⁿ = −(U_0 N/2) (σ/a)ⁿ Σ_i (a/r_i)ⁿ
(3.73) = −(U_0 N/2) (σ/a)ⁿ M_n,

where a is the lattice constant, and M_n is known as the Madelung constant. Noting that σ/a = (V/V_0)^{−1/3}, we have

(3.74) A/(N k_B T) = log(V/V_0) − 3 log[ (V/V_0)^{1/3} − 1 ] − (ε/2) M_n (V/V_0)^{−n/3},

where ε = U_0/(k_B T). Figure 9 shows that there is indeed an expanded solid phase and a contracted solid phase for short-ranged interaction potentials.
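A quick way to see the transition is to evaluate (3.74) on a grid and look for a loss of convexity, which is exactly where the double-tangent construction becomes non-trivial. In the sketch below (ours) we approximate the Madelung sum by its nearest-neighbour shell, M_n ≈ 12 for an fcc crystal at large n, and use the parameters n = 10, ε = 1.5 that label Figure 9.

import numpy as np

def free_energy(v, n=10, eps=1.5, M_n=12.0):
    # Equation (3.74) with v = V/V0; M_n ~ 12 approximates the fcc
    # Madelung constant by the nearest-neighbour shell (large n).
    return (np.log(v) - 3.0 * np.log(v ** (1.0 / 3.0) - 1.0)
            - 0.5 * eps * M_n * v ** (-n / 3.0))

v = np.linspace(1.02, 10.0, 2000)
A = free_energy(v)
curvature = np.gradient(np.gradient(A, v), v)
# A concave (curvature < 0) window signals the two coexisting solids:
print(v[curvature < 0.0][[0, -1]] if np.any(curvature < 0.0) else "convex")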

5. Exercises

(1) By extending the scaled particle theory framework, show that the equation of state for a 2D hard-disc system is given by

βP/ρ = 1/(1 − η)²,

where η = πr²ρ is the 2D packing fraction, with ρ being the area density.

(2) In this problem we consider a toy model for an entropic phase transition due to "bond disorder" (this is an important concept in understanding the self-assembly of DNA-coated colloids). Suppose we have a suspension of colloids and polymer strands. The colloid-polymer interaction is strong, but the two ends of the polymer must be attached to two different colloids. For simplicity, we assume that there is no restriction on the number of polymer strands that each colloid can be attached to. Furthermore, we assume that the polymer strands are small in size compared to the colloid. We would like to treat this using a lattice model.


Figure 9. A solid-solid transition appears for short-ranged interaction potentials: the plot shows A/(N k_B T) against V/V_0 for no interaction and for n = 10, ε = 1.5. The double-tangent construction can be used to find the equilibrium volumes of the expanded and contracted phases (indicated with an arrow).

(a) Consider the colloids to be on a d-dimensional cubic lattice, where each lattice site can only accommodate one colloid. Show that the colloid partition function is given by

Ξ = Σ_{{n_i}} exp( βμ_c Σ_i n_i ),

where μ_c is the chemical potential of the colloids and {n_i} are the occupation numbers of the sites.

(b) Assume that the polymer chains can only link up nearest-neighbour colloid pairs, and that there are no free polymer strands. Show that the total number of ways to distribute N_p polymer chains amongst P nearest-neighbour pairs is

W = (N_p + P − 1)! / (N_p! (P − 1)!).

(c) Assuming that the polymer does not compete for space with the colloids, show that the polymer partition function is given by

Ξ_polymer = (1 − f)^{−P},

where f ≡ e^{βμ_p} is the fugacity of the polymer.

(d) Hence or otherwise show that the total partition function is given by

Ξ_tot = C Σ_{{n_i}} exp[ βμ_c Σ_i n_i − log(1 − f) Σ_{⟨ij⟩} n_i n_j ],

where C is a constant.

(e) This is again the antiferromagnetic Ising model, and we expect a demixing phase transition for d > 2 (we need f < 1, otherwise the polymer bath is unstable with respect to the system). Explain this result. Note that in the model there is nothing to stop the polymers localising themselves to link colloids into dimers. (Hint: how does the formation of colloid clusters change the number of ways the polymer strands can distribute themselves?)

LECTURE 4
Granular entropy

In the previous lectures we have applied statistical physics to microscopic systems in order to extract macroscopic properties. One of the most fundamental tenets that allows us to bridge the micro and macro worlds is the ergodic hypothesis. It states that in a closed system, all states with the same energy are equally likely to be occupied. We have never proved this statement, but experimentally it seems to hold for most matter (a major exception being a glass, a subject we will not go into).

In this lecture we would like to make a "leap of faith", and consider granular media and the physics of packing macroscopic objects using statistical physics. Unlike atoms or colloids, grains in a powder are "frozen" in position at ambient temperature. Due to frictional dissipation and their large masses, the thermal kinetic energy is negligible compared to the gravitational energy; thus the external bath temperature is effectively zero. That is not to say that the powder is always stationary. A way to perturb it is to tap it or shake it. Each tap pumps energy into the system and rejigs the grains. Due to inelastic collisions the kinetic energy is totally dissipated after each tap, and the system is again frozen in one of its many stable configurations.

Imagine tapping it many times: does the powder converge to any universal state? Common experience would suggest so. If we take a bottle of powder and tap it or shake it, the volume does not change much. In fact, detailed measurements show that the packing fraction of randomly packed hard spheres is always η_rcp ≈ 0.64 (those experiments can be done on a very macroscopic level — you can try it by pouring/shaking/kneading ball bearings in a flask!). However, η_rcp is not the densest packing of hard spheres. The densest close-packed structure is the face-centred cubic, which has packing fraction η_fcc = π/√18 ≈ 0.74.

The natural question is: what gives rise to this universality? This universal packing fraction was suggested to be the endpoint of a metastable branch of hard-sphere packings where the rate of disappearance of accessible free volume diverges (the stable branch of hard-sphere packing being the one that leads to the face-centred cubic crystal) [12]. However, the true answer is that we still do not know. Notwithstanding this, the seemingly universal response of grains and powders prompted a "statistical mechanical" framework which exploits the fact that the number of grains in a powder is large (although not on the scale of Avogadro's number). Sir Sam Edwards [9, 8] first suggested that one can define a quantity analogous to entropy,

(4.1) S_powder = log N,

where N is the total number of stable packings that the system has, subject to a fixed volume. The "ergodic hypothesis" would therefore be that all stable packings are achieved with equal probability.

1. Computing the entropy

To make life simpler, we will use a soft-core potential which is continuous and differentiable:

(4.2) U(r) = ε(1 − r/σ)^α, r < σ; U(r) = 0, r > σ.

The potential U(r) tends to the hard-sphere potential in the limit ε → ∞. Now, consider a collection of N particles starting from random positions. We can use the steepest descent algorithm to get to the nearest energy minimum. If the minimum has total potential energy U_tot = 0, it is also an acceptable minimum for a hard-sphere system, whereas if U_tot > 0 the minimum will not carry over to the hard-sphere limit. If we keep preparing random starting conditions, we can computationally "count" the number of distinct energy-minimising configurations with U_tot = 0, as every scaled coordinate belongs to the basin of attraction of one and only one energy minimum. (Note that the set of all starting conditions that converge to the i-th energy minimum is known as the basin of attraction of that minimum.) There is one further ambiguity — mechanical stability does not exclude some particles being able to move locally ("rattlers" or "floaters") in an otherwise rigid framework. We will consider two configurations to be equivalent if they only differ by the displacement of those rattling or floating particles.

This brute-force enumeration is a workable algorithm, and from the number of times we have sampled a particular minimum we can work out the volume of its basin of attraction. However, computationally this approach is only feasible for ∼ 12 particles! An alternative approach is to take advantage of the fact that every initial starting condition necessarily converges to one and only one energy minimum. Thus the total volume of configuration space is simply the sum of the volumes of all basins of attraction, i.e.

(4.3) 𝒱 = Σ_{i=1}^{N} v_i.

For a system of N distinguishable particles in a d-dimensional volume V, we have¹ 𝒱 = V^N. Now, the sum can be rewritten as

(4.4) Σ_{i=1}^{N} v_i = N (1/N) Σ_{i=1}^{N} v_i ≡ N ⟨v⟩,

where ⟨v⟩ is the average volume of a basin of attraction. Now the crucial observation is that if we can efficiently compute ⟨v⟩ by sampling, we can obtain an estimate for

N 1 N (4.4) v = N v ≡ N v , i N i i=1 i=1 where v is the average volume of a basin of attraction. Now the crucial observation is that if we can efficiently compute v by sampling, we can obtain an estimate for

¹For a system of hard particles, the volume of configuration space needs to account for hard-core exclusion, hence log 𝒱 = N log V − A_ex(φ), where A_ex(φ) is the excess free energy of the packing at packing fraction φ.
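The brute-force enumeration described above is easy to sketch for a handful of soft discs. The following toy implementation is ours: the box size, tolerances and the use of a scipy quench in place of true steepest descent are all illustrative choices, rattlers are ignored, and the structural label is deliberately crude (it does not remove rigid translations of a packing). At this density the minima are strained (U_tot > 0); below jamming one would additionally apply the U_tot = 0 filter from the text.

import numpy as np
from scipy.optimize import minimize

L, N, SIGMA = 2.6, 8, 1.0   # periodic box, particle number (toy sizes)

def energy(x):
    # U = sum over pairs of (1 - r/sigma)^2 for r < sigma (alpha = 2),
    # with minimum-image periodic boundary conditions.
    pos = x.reshape(N, 2)
    U = 0.0
    for i in range(N - 1):
        d = pos[i + 1:] - pos[i]
        d -= L * np.round(d / L)
        r = np.sqrt((d * d).sum(axis=1))
        U += ((1.0 - r[r < SIGMA] / SIGMA) ** 2).sum()
    return U

rng = np.random.default_rng(0)
labels = set()
for _ in range(100):
    res = minimize(energy, rng.uniform(0, L, 2 * N), method='CG')
    # Crude structural label: minimum energy plus sorted, coarsely
    # rounded coordinates.
    pos = np.sort(np.round(res.x.reshape(N, 2) % L, 1), axis=0)
    labels.add((round(res.fun, 6),) + tuple(pos.ravel()))
print(len(labels), "distinct minima sampled out of 100 quenches")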

N without direct enumeration, using

(4.5) N = 𝒱/⟨v⟩.

A way to compute the average volume is to convert the problem into a free energy calculation. To see how the two are related, we consider the Hamiltonian

(4.6) H_0(r^N) = 0 for r^N ∈ v_i, and H_0(r^N) = ∞ for r^N ∉ v_i.

The volume of the basin of attraction, v_i, can be written as a "partition function",

(4.7) Z = v_i = ∫ dr^N e^{−H_0}.

Therefore, the "free energy" is simply log v_i. To compute the free energy, we can use the thermodynamic perturbation method (see Section 2.1), where we consider our system as a perturbation to a well-known, analytically solvable model. In this case, we will use the Einstein crystal model which we have discussed in Section 2.1. To recap, we consider

(4.8) H = H_0 + λ(r^N − r_i^N)²,

where r_i^N is the coordinate of the i-th energy minimum. In the limit λ → ∞, H_0 is unimportant and Z → (π/λ)^{Nd/2}. We can treat H_0 as a perturbation, and thus the "free energy" is

(4.9) log v_i = (Nd/2) log(π/λ_max) + ∫_0^{λ_max} dλ ⟨(r^N − r_i^N)²⟩_λ, λ_max → ∞.

Numerically computing the integral is a non-trivial task, as ⟨(r^N − r_i^N)²⟩_λ is a quantity that is computationally intensive to obtain, and we have to truncate the range of integration at some finite λ (for an actual implementation see [19]).

It seems that we have a method to compute ⟨v⟩ and hence N and the granular entropy. Unfortunately, there is a problem: in order to compute ⟨v⟩ we should perform an unbiased sampling of all energy minima. However, in practice, we will sample different minima with a probability P_s(v) that is proportional to the basin volume v; thus the unbiased probability P_u(v) is:
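Equation (4.9) can be checked on a basin whose volume is known exactly. For a hypercubic basin [−1/2, 1/2]^{Nd} centred on the minimum, the λ-average factorises into one-dimensional integrals, so the whole thermodynamic integration can be done by quadrature. The sketch below (ours; the truncation λ_max is an arbitrary choice) should return a value close to the exact answer log v_i = 0.

import numpy as np
from scipy.integrate import quad

def msd_1d(lam):
    # <x^2> for one coordinate confined to [-1/2, 1/2] with weight e^{-lam x^2}.
    num, _ = quad(lambda x: x * x * np.exp(-lam * x * x), -0.5, 0.5)
    den, _ = quad(lambda x: np.exp(-lam * x * x), -0.5, 0.5)
    return num / den

def log_basin_volume(n_dof, lam_max=1e4):
    # Equation (4.9): (Nd/2) log(pi/lam_max) + integral of <(r - r_i)^2>.
    integral, _ = quad(lambda lam: n_dof * msd_1d(lam), 0.0, lam_max, limit=500)
    return 0.5 * n_dof * np.log(np.pi / lam_max) + integral

print(log_basin_volume(n_dof=6))   # exact answer: log(1) = 0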

(4.10) P_u(v) = [P_s(v)/v] / ⟨1/v⟩_s,

and thus the unbiased average is given by

(4.11) ⟨v⟩_u = ∫_0^∞ dv v P_u(v) = 1/⟨1/v⟩_s,

where ⟨·⟩_s denotes averages over the sampled distribution, and ⟨·⟩_u denotes averages over the unbiased distribution. Although Equation (4.11) mathematically corrects the biasing, in practice it is still problematic, because the average of 1/v is dominated by small basin volumes that are barely sampled, and it turns out that there are far more small basins than large ones! Hence, whilst the method is correct in principle, it performs poorly in practice because of poor statistics. A way to get around this is to exploit the fact that (at least in numerical experiments) the probability distribution of basin volumes is a rather nice function of v. Therefore, we can fit the numerically obtained probabilities to a

2. Is this "entropy" physical?

The problem with different basin volumes indicates that not all basins are the same. Surely, if the system explores the whole configuration space, it is more likely to end up in an energy minimum with a larger basin of attraction. To take this into account, we revisit the Boltzmann entropy formula but express it in terms of probabilities of states. Suppose we have prepared $K$ replicas of our granular powder with random starting conditions, and there are $M$ energy minima. If we know that the $i$th minimum has been attained by $w_i$ replicas, the number of ways this distribution can be realised (noting that each replica is independent and distinguishable) is given by

(4.12)  $\Omega = \frac{K!}{\prod_{i=1}^{M} w_i!}$.

Now, the effective Boltzmann entropy for this system is

(4.13)  $S = \log \Omega \approx K \log K - \sum_{i=1}^{M} w_i \log w_i = K \log K - \sum_{i=1}^{M} K p_i \log(K p_i) = -K \sum_{i=1}^{M} p_i \log p_i$,

where in the second step we have used Stirling's approximation, and we have defined $p_i \equiv w_i/K$ as the probability of the system being in the $i$th minimum. Therefore, the entropy per replica is given by

(4.14)  $S^* \equiv -\sum_{i=1}^{M} p_i \log p_i$.

This entropy is actually easier to evaluate numerically as no unbiasing is needed. To see this, we write $p_i = v_i/\mathcal{V}$, thus

(4.15)  $S^* = -\sum_{i=1}^{M} \frac{v_i}{\mathcal{V}} \log \frac{v_i}{\mathcal{V}} = -\frac{1}{\mathcal{V}} \sum_{i=1}^{M} v_i \log v_i + \log \mathcal{V} = -\langle \log v \rangle_s + \log \mathcal{V}$,

where in the last step we have used Equation (4.10), the relationship between sampled and unbiased distributions. A crucial property of any entropy function is extensivity — it must scale linearly with $N$, the number of particles in the system. Figure 1 shows that Equation (4.14) is not extensive! The scaling is not linear, and the best fit line does not pass through the origin.

Figure 1. The entropy function $\hat{S}^*$ is extensive (open squares and stars) whilst $S^*$ is not (open circles and crosses). The numerical simulation is done with a binary mixture of hard discs with diameter ratio $\sigma_1/\sigma_2 = 1.4$ (open circles and squares) and $\sigma_1/\sigma_2 = 1.4$ (crosses and stars) to avoid crystallisation. The volume fraction is kept at $\phi = 0.88$. The figure is taken from [2].

This indicates that something is pathologically wrong. However, it turns out that if we redefine the entropy as

(4.16)  $\hat{S}^*(N) = -\sum_{i=1}^{M} p_i \log \frac{p_i}{N!}$,

then $\hat{S}^*$ is extensive (see Figure 1).
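A toy version of this bookkeeping is easy to set up numerically. The sketch below, with an assumed set of unequal basin weights, estimates $S^*$ of Equation (4.14) from the replica counts $w_i$ and then applies the $N!$ correction of Equation (4.16); the Dirichlet weights and the values of $K$, $M$ and $N$ are all hypothetical.

    import numpy as np
    from scipy.special import gammaln

    # Estimate S* (Eq. 4.14) and the corrected entropy (Eq. 4.16) from the
    # frequencies w_i with which K replicas land in each of M minima.
    def entropies(counts, N):
        K = counts.sum()
        p = counts[counts > 0] / K
        S_star = -np.sum(p * np.log(p))      # Eq. (4.14)
        S_hat = S_star + gammaln(N + 1)      # Eq. (4.16): -sum_i p_i log(p_i/N!)
        return S_star, S_hat

    rng = np.random.default_rng(1)
    N, M, K = 12, 5000, 10**5                 # hypothetical sizes
    weights = rng.dirichlet(np.full(M, 0.5))  # unequal basin weights (assumed)
    counts = rng.multinomial(K, weights)      # K independent replica quenches
    print(entropies(counts, N))
    # (The plug-in estimate is biased when many minima go unsampled.)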

3. The Gibbs paradox

This factor of $N!$ is not fortuitous. We encountered it in Lecture 2 when we tried to calculate the partition function of an ideal gas, and we argued that this factor is due to indistinguishability of particles. Gibbs originally proposed this modification to make the entropy extensive. In his own words:

"...Again, when such gases have been mixed, there is no more impossibility of the separation of the two kinds of molecules in virtue of their ordinary motion in the gaseous mass without any especial external influence, than there is of the separation of a homogeneous gas into the same two parts into which it has once been divided, after these have once been mixed."

Gibbs' statement is almost impenetrable! Many subsequent authors seem to believe that it is only resolvable via quantum mechanics, due to symmetries in the wave function with respect to particle exchange. However, as we will see below, a better interpretation of this $N!$ is that we do not want to distinguish the particles in the calculation, rather than that the particles are inherently indistinguishable.

Suppose we have two systems, one containing $N_1$ particles in volume $V_1$, and the other containing $N_2$ particles in volume $V_2$, with both systems having the same temperature and pressure. Those particles are all different, but sufficiently similar such that their physical interactions are the same (say their size or shape is ever-so-slightly different; the key point is that they are perfectly distinguishable, and we can track each and every one of them). If the system is an ideal gas, the partition functions for systems 1 and 2 are given by

(4.17)  $q_{1,2}(N_{1,2}, V_{1,2}, T) = \int_V d\mathbf{r}^{N_{1,2}}\, e^{-\beta H} = \int d\mathbf{r}^{N_{1,2}} = V_{1,2}^{N_{1,2}}$.

Now, we bring the two systems into contact such that mass transfer can take place. As they have the same temperature and pressure, it follows that there is no net flux of matter or energy. The total partition function is now

(4.18)  $Z_{tot} = q_1(N_1, V_1, T)\, q_2(N_2, V_2, T)\, g(N_1, N_2)$,

where the extra factor $g(N_1, N_2)$ denotes the number of ways we can partition the distinguishable particles between the two subsystems. Combinatorics tells us

(4.19)  $g(N_1, N_2) = \frac{(N_1 + N_2)!}{N_1!\, N_2!}$.

Equilibrium is obtained by maximising $Z_{tot}$ with respect to $N_1$, keeping $N = N_1 + N_2$ constant. Therefore, a necessary and sufficient condition for equilibrium is

(4.20)  $\left.\frac{\partial \log Z_{tot}}{\partial N_1}\right|_N = \frac{\partial \log(q_1/N_1!)}{\partial N_1} - \frac{\partial \log(q_2/N_2!)}{\partial N_2} = 0$,

where we have used the fact that $dN_1 = -dN_2$. Therefore, we are forced to conclude that if we want to define a free energy such that its extremum yields the condition for thermodynamic equilibrium, the partition function must be

(4.21)  $Z_{1,2} = \frac{1}{N_{1,2}!} \int_V d\mathbf{r}^N e^{-\beta H}$.

Note that we can rearrange Equation (4.18) into a more transparent form:

(4.22)  $\frac{Z_{tot}}{N!} = \frac{q_1(N_1, V_1, T)}{N_1!} \cdot \frac{q_2(N_2, V_2, T)}{N_2!}$.

The exact same argument can be applied to the derivation of the Boltzmann entropy, as entropy maximisation is the condition for thermal equilibrium in the $(N, V, E)$ ensemble. The number of states, $\Omega$, satisfies

(4.23)  $\frac{\Omega_{tot}(N, V, E)}{(N_1 + N_2)!} = \frac{\Omega_1(N_1, V_1, E)}{N_1!} \cdot \frac{\Omega_2(N_2, V_2, E)}{N_2!}$,

and thus

(4.24)  $S = k_B \log \frac{\Omega}{N!}$.

This explains why the factor of $N!$ is needed in Equation (4.16) to make the granular entropy extensive. Of course, if we are fixing $N$, the factor of $N!$ amounts to a constant addition to the free energy or the entropy. However, this factor becomes important when we are comparing systems with different numbers of particles.

The key question posed at the beginning of this lecture is whether statistical physics allows us to extract macroscopic properties of granular materials. We have computed a granular entropy that fulfils extensivity and accounts for the unequal volumes of the basins of attraction, but does this tell us anything useful? We do not know the answer to this question, and this is very much an active area of research.
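Before leaving this lecture, here is a quick numerical illustration of the $N!$ argument above (an addition to these notes, not from the original text). For an ideal gas, where $q_i = V_i^{N_i}$, one can verify directly that the corrected free energy of Equation (4.22) is maximised at equal densities, as Equation (4.20) requires:

    import numpy as np
    from scipy.special import gammaln

    # Scan N1 at fixed N = N1 + N2 and locate the maximum of
    # log(q1/N1!) + log(q2/N2!) for ideal-gas subsystems (Eq. 4.22).
    V1, V2, N = 300.0, 700.0, 1000
    N1 = np.arange(1, N)
    N2 = N - N1
    logZ = N1 * np.log(V1) - gammaln(N1 + 1) + N2 * np.log(V2) - gammaln(N2 + 1)
    n1 = N1[np.argmax(logZ)]
    print("optimal N1 =", n1)                      # 300 = N V1/(V1+V2)
    print("densities  :", n1 / V1, (N - n1) / V2)  # equal at equilibrium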

Bibliography

1. B. J. Alder and T. E. Wainwright, Phase transition for a hard sphere system, The Journal of Chemical Physics 27 (1957), 1208.
2. D. Asenjo, F. Paillusson, and D. Frenkel, Numerical calculation of granular entropy, Physical Review Letters 112 (2014), 98002.
3. Marc Baus and Carlos F. Tejero, Equilibrium statistical physics: phases of matter and phase transitions, Springer, 2007. MR2450355
4. P. Bolhuis and D. Frenkel, Prediction of an expanded-to-condensed transition in colloidal crystals, Physical Review Letters 72 (1994), 2211.
5. J. Dunkel and S. Hilbert, Consistent thermostatistics forbids negative absolute temperatures, Nature Physics 10 (2014), 67.
6. J. Dunkel and S. Hilbert, Reply to Frenkel and Warren [arxiv: 1403.4299 v1], arXiv preprint arXiv:1403.6058 (2014).
7. J. Dunkel and S. Hilbert, Reply to Schneider et al. [arxiv: 1407.4127 v1], arXiv preprint arXiv:1408.5392 (2014).
8. S. F. Edwards, The flow of powders and of liquids of high viscosity, Journal of Physics: Condensed Matter 2 (1990), SA63.
9. S. F. Edwards and R. B. Oakeshott, Theory of powders, Physica A: Statistical Mechanics and its Applications 157 (1989), 1080. MR1004774
10. D. Frenkel and P. B. Warren, Gibbs, Boltzmann, and negative temperatures, American Journal of Physics 83 (2015), 163.
11. S. Hilbert, P. Hänggi, and J. Dunkel, Thermodynamic laws in isolated systems, Physical Review E 90 (2014), 62116.
12. R. D. Kamien and A. J. Liu, Why is random close packing reproducible?, Physical Review Letters 99 (2007), 155501.
13. E. J. Meijer and D. Frenkel, Melting line of Yukawa system by computer simulation, The Journal of Chemical Physics 94 (1991), 2269.
14. P. N. Pusey and W. van Megen, Phase behaviour of concentrated suspensions of nearly hard colloidal spheres, Nature 320 (1986), 340–342.
15. U. Schneider, S. Mandt, A. Rapp, S. Braun, H. Weimer, I. Bloch, and A. Rosch, Comment on "consistent thermostatistics forbids negative absolute temperatures", arXiv preprint arXiv:1407.4127 (2014).
16. V. F. Weisskopf, About liquids, Transactions of the New York Academy of Sciences 38 (1977), 202.
17. Arthur W. Wiggins, The joy of physics, Prometheus Books, 2011.
18. W. W. Wood and J. D. Jacobson, Preliminary results from a recalculation of the Monte Carlo equation of state of hard spheres, The Journal of Chemical Physics 27 (1957), 1207.
19. N. Xu, D. Frenkel, and A. J. Liu, Direct determination of the size of basins of attraction of jammed solids, Physical Review Letters 106 (2011), 245502.


Ideas about Self Assembly

Michael P. Brenner


Introduction

Self-assembly is a subject that has gained prominence in recent years due to advances in materials science and materials fabrication, as well as advances in biology. The goal of these lectures is to give an overview of our research aimed at uncovering the mathematical foundations of self-assembly. These lectures are designed for the PCMI summer school and as such, their main job is to be pedagogical. On the other hand, the most difficult and in many ways the most interesting part of being a mathematical scientist is to figure out what questions one should ask. In that spirit, you should read these lectures as a story of our own struggles in trying to frame the right questions. The organizing theme is the distinction between heterogeneous and homogeneous assembly: whereas traditional assembly methods in materials science consider structures that are made out of many copies of a single component, here we focus on what happens when the structures are composed of many different components. This leads to qualitatively new questions and ways of making materials – and allows formulating questions that transcend the many specific experimental systems that researchers have been studying. A mathematical modeler can always productively make models of specific systems, each of which has its own interesting peculiarities. But our goal is to find ideas that apply to an entire array of problems, transcending specific details – yet being detailed enough that the results can be applied to specific experiments, and suggest new experiments or ways of looking at the major issues. The subject we will be discussing will also give you an opportunity to review and practice statistical mechanics; another way to view the first two lectures is as a set of practice problems in statistical mechanics.

Before we begin, I also want to emphasize that the ideas and the perspective that I am going to report are collaborative work with many people, particularly Vinny Manoharan, Guangnan Meng, Zorana Zeravcic, Arvind Murugan, Natalie Arkus, Miranda Holmes-Cerfon, Sahand Hormoz and Stanislas Leibler. The ideas outlined in these lectures have been published in a series of papers: Lecture 2 [15, 1, 2, 12]; Lecture 3 [15, 19]; and Lecture 4 [14]. Sarah Kostinski was the teaching assistant for these lectures. Miranda Holmes-Cerfon wrote part of Section 2.1, outlining how moments of inertia arise from integrating over constraint manifolds within partition functions.

School of Engineering and Applied Sciences and Kavli Institute for Bionano Science and Technology, Harvard University, Cambridge, MA 02138, USA
E-mail address: [email protected]
The author gratefully acknowledges support from the National Science Foundation through DMR-143596 and the Harvard MRSEC, and the Simons Foundation.

© 2017 American Mathematical Society


LECTURE 1
Self assembly: Introduction

1. What is self-assembly?

The usual motivation for self-assembly begins by observing that there are enormous differences between the assembly mechanisms of structures made by humans and those made by biological systems. Whereas humans make devices and structures by putting things exactly where they need to go, biological systems have the ability to assemble things spontaneously. They do this by using very complicated parts which bind to each other in ways that are hard to understand, as well as highly regulated genetics which are very difficult to unravel. On the other hand, a holy grail is clearly to discover how biology does this, both so that we can understand it, as well as re-create it ourselves.

It is instructive to look through several examples from biology. Microtubules are tubular structures built from two proteins, α- and β-tubulin, which bind to each other as a dimer. In solution these dimers spontaneously form long filaments, which then associate with each other to form the tubule. Another of my favorite examples is the ribosome. The ribosome is the protein complex that creates proteins, and consists of two large subunits. The smaller subunit is composed of about 20 different proteins and a long RNA strand. In the 1960s, it was demonstrated by Nomura and colleagues [17] that these components, when put in solution by themselves, could spontaneously assemble into functional ribosomes. Presumably, the assembly of both microtubules and the ribosome must occur with high yield. If not, there would be many wasted components, and given the importance and prevalence of these two structures within a cell, it would be surprising if the yield were not high. There is also much evidence that mutations in individual proteins in either of these structures degrade yield. Biology is littered with examples like this, of complex structures that spontaneously assemble. In reality, the situation for obtaining high yield is even worse than described: the assembly of these structures does not occur in isolation, but in a soup that is densely filled with other assembling structures. One would imagine that the errors induced by having the wrong things assemble are high.

There is a large literature on self assembly. Since this is a set of lectures at a summer school and not a review article, I am not going to attempt to summarize this literature. Broadly speaking, the literature mainly refers to systems where a small number of constituent components assemble into large structures. Learning how to predict what macroscopic structure forms from a set of components with prescribed interactions is a fascinating subject, and there is much that can be learned from making mathematical models of these processes.


For the first three lectures, we will be concerned with examples that are in thermodynamic equilibrium. Although the motivating experiments are not necessarily carried out in equilibrium, the mathematical formalism governing equilibrium structures is well defined, and we can use this to learn basic principles. This first lecture will give a preliminary introduction to statistical mechanics. Lecture 2 will discuss the assembly of structures made out of identical components, focusing on both polymer assembly and the assembly of clusters of colloidal spheres. Lecture 3 asks how the assembly can be controlled by choosing interactions between the components. The motivation here is that the only way to control the output of a self assembling system is to encode the structures that should be built in the components themselves, by choosing the interactions appropriately. This can be done by using heterogeneity, making the components different from each other, and forcing them to interact in ways that pick out the desired structures. In order for this to be an interesting exercise, one must choose interactions from a set that is realizable in experiments. In these lectures we will be considering short ranged, specific interactions, as can be realized by using a library of stickers coating the assembling objects.

Recent technology has invented several ways of creating these libraries of stickers, using DNA, different shapes, etc. We would like to highlight two important examples from the experimental literature, in which nanostructures are either coated with DNA [13] or entirely composed of DNA [18, 8]. For example, a plethora of shapes have been robustly self-assembled using a sea of short DNA strands ("bricks") [18, 8]. Similar efforts are underway using rationally-designed proteins [9, 10] or colloidal particles [3] with specific interactions as building blocks. Both the colloidal and the DNA structures are interesting in that they are created out of objects that are mostly different from each other. In the DNA structures, they are all different from each other: every DNA strand is different from every other DNA strand, and if it were not, assembly simply would not occur reliably. For the nanospheres, only a few different types of spheres have been used up until this point, but the possibility of further heterogeneity clearly exists.

The broad goal of the last two lectures is to explore the design problem suggested by these technologies: how do we choose components, or interactions between components, to create structures with high yield? What are the limits to what can be built? Is it possible to design specific components to create an arbitrary (large) structure with high yield? Or are there limits? If there are limits, what are the ways in which these limits can be surmounted? In order to begin to answer these questions, we will begin by addressing the problem of assembly of identical objects. This will give us a point of comparison when we start making the structures heterogeneous, and also will provide a setting to review some basic statistical mechanics. In the rest of this first lecture, we will review aspects of statistical mechanics that are needed for our explorations.

2. Statistical mechanical preliminaries

Let us consider an object described by coordinates $x(t)$; these coordinates could describe the position in the 3-dimensional space in which it lives, the rotational state of the object, as well as any internal degrees of freedom. Let us suppose the coordinates obey the dynamical rule

(1.1)  $\zeta \frac{dx}{dt} = -\nabla U + \xi$.

Here $\zeta$ is the drag coefficient, $U = U(x)$ is the potential and $-\nabla U$ is the force on the particle. The term $\xi$ is random uncorrelated noise, which obeys

$\langle \xi_i(t)\, \xi_j(t') \rangle = A\, \delta_{i,j}\, \delta(t - t')$,

where the brackets denote an ensemble average over realizations of the noise. The different components of the noise are uncorrelated with each other, and the noise is uncorrelated in time. The constant $A = 2D\zeta^2$, where $D = k_B T/\zeta$ is the diffusion constant. To describe the motion of the particle we must adopt a probabilistic description, which is given by the Fokker-Planck equation of this Langevin equation. This is

(1.2)  $\partial_t P = \nabla \cdot \left( \frac{1}{\zeta} \nabla U\, P + D \nabla P \right)$,

where $P = P(x, t)$ is the probability distribution of the particle locations at time $t$. In the absence of a force field ($-\nabla U$), the Fokker-Planck equation is the diffusion equation, and thereby describes the Gaussian spreading of the probability distribution of the particle. With a force field, however, if the potential energy function has minima in some configurations, it is possible to find the steady state distribution. The equilibrium probability distribution is obtained by setting $\partial_t P = 0$, which implies

(1.3)  $P = C e^{-\beta U}$,

where $C$ is a normalization constant, and $\beta = 1/(\zeta D)$. Einstein established the relationship that

$\beta = \frac{1}{k_B T}$,

where $k_B$ is Boltzmann's constant. Thus $\beta$ is proportional to the inverse temperature. Meanwhile $C$ must be normalized so that the total probability of all configurations is unity. Namely

$C = \frac{1}{Z} = \frac{1}{\int e^{-\beta U(x)}\, d^d x}$,

where the integral is over the entire configuration space that is accessible. We have defined $C = Z^{-1}$; $Z$ is typically called the partition function. Note that we have made the integration over a $d$-dimensional configuration space. This includes both the translational and rotational configuration space of the object, as well as any internal degrees of freedom. The probability distribution in Eqn. (1.3) plays a central role in statistical mechanics and is called the Boltzmann Distribution.

For many interacting particles, we can compute the partition function with the same procedure. Let us first imagine that the number of particles is fixed. Then we can use equations 1.1 and 1.2 as written, but instead interpret $x$ as the $3N$-dimensional vector giving the coordinates of the particle configuration. Similarly, $U = U(x)$ also contains the interactions between the particles. In the problems we will consider in these lectures, the total potential energy is caused by pair potentials between the different objects. Namely

$U = \sum_{i,j} v(r_i - r_j)$,

where $v(r)$ is the pair potential and the sum is over all particle pairs. The formula for the probability of a configuration is similar, except we have to compute the normalization integral over the entire configuration space, which for $N$ particles is $3N$-dimensional.

2.1. Ensembles and free energies

We now summarize some basic nomenclature, which will play a role in what follows: The canonical ensemble fixes $N$, the number of particles, and so carries out the integral at fixed dimension.
Inside of this integral there is a lot of physics: if we consider particles interacting with short ranged interactions, then the integral contains configurations in which (i) there is a cluster of $N$ particles attached to each other; (ii) there is a cluster of $N-1$ particles, and one free monomer that is not interacting with the cluster; (iii) a cluster of $N-2$ particles, and one dimer of two particles; and so on. Each of these structures has some probability of forming. If the bonds between particles are strong, then it is to be expected that the most likely configuration is the one with all $N$ particles in a cluster.

In the grand canonical ensemble, the number of particles is not fixed. If we let $Z_N$ be the partition function computed above with the number of particles fixed, we can compute the partition function when the number of particles is not fixed using

$Z_g = \sum_{N=0}^{\infty} e^{\beta\mu N} Z_N$,

where $\mu$ is the chemical potential. The idea here is that unbound components have a translational free energy (entropy) $\mu$, so that there is an energetic cost $-\mu$ for taking a particle out of solution and attaching it to a structure with $N$ particles. One can also think about $\mu$ as a Lagrange multiplier that is enforcing a constraint that the average concentration of particles in the system is a constant. We will encounter the grand canonical ensemble whenever we study systems at fixed concentration. An important approximation is the relationship between the concentration of the system and the chemical potential: to find this, we note that in the grand canonical ensemble, the average number of particles $\bar{N}$ is given by

(1.4)  $\bar{N} = \frac{\sum_{n=0}^{\infty} n\, e^{\beta\mu n} Z_n}{Z_g}$.

Now at low concentrations, the average number of particles is dominated by the number of monomers; the entropy of individual monomers is high enough that it is costly for the particles to associate into clusters. This means

(1.5)  $\bar{N} \approx N_{monomer} = \frac{e^{\beta\mu} Z_1}{Z_g}$.

Since $Z_0 = 1$, we can further approximate $Z_g \approx 1$. This means that $\bar{N} \approx Z_1 e^{\beta\mu}$, but we also have that $Z_1 = V$. This implies that

$c = \frac{\bar{N}}{V} = e^{\beta\mu}$

in the limit of low concentrations.
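A short numerical check of this relation (an addition for these notes, with ideal-gas partition functions $Z_n = V^n/n!$ as the assumed model) confirms that the grand-canonical average density equals $e^{\beta\mu}$ at low concentration:

    import numpy as np
    from scipy.special import gammaln

    # Grand canonical average N for an assumed ideal-gas model Z_n = V^n/n!,
    # evaluated term by term as in Eq. (1.4).
    V, beta_mu = 50.0, -6.0                  # illustrative values
    n = np.arange(0, 500)
    log_terms = n * (beta_mu + np.log(V)) - gammaln(n + 1)
    w = np.exp(log_terms - log_terms.max())
    Nbar = (n * w).sum() / w.sum()
    print("c = Nbar/V  :", Nbar / V)
    print("e^{beta mu} :", np.exp(beta_mu))  # identical for this toy model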

LECTURE 2
Self assembly with identical components

1. Introduction

Lecture 1 provided an introduction to self assembly, and also some statistical mechanical preliminaries. Let us begin by recapping the main statistical mechanical ideas:

(1) The stationary distribution of the Fokker-Planck equation leads to the Boltzmann distribution, and associated partition function $Z$. Namely, for $N$ particles,

$Z_N = \int e^{-\beta U(x)}\, d^{3N}x$.

(2) We are interested in computing the partition function (and hence the probability distribution) for structures that interact with each other through short ranged interactions. The form of the probability distribution depends on what the constraints are. There are two important sets of constraints:
(a) Fixed $N$. This is the canonical ensemble. Here we must sum over all sets of structures that can be made with these $N$ objects.
(b) Fixed concentration. This is the grand canonical ensemble. We enforce this by introducing the chemical potential, which is a Lagrange multiplier that enforces the constraint of constant average particle number $\bar{N}$. Physically, the chemical potential represents the entropic cost of taking a monomer out of solution. We showed in Lecture 1 the critical relation

$c = \frac{\bar{N}}{V} = e^{\beta\mu}$,

where $c$ is the particle concentration.

This second lecture will consist of two parts. First, we will work out the solution to the polymer problem, when all of the monomers are identical. Then we will discuss the corresponding problem with colloidal spheres, considering the statistical mechanics of clusters of identical colloidal particles of up to ∼ 10 particles. This is the moral analogue of the homopolymer problem, except that it comes with significant mathematical complexities. A primary difficulty is the task of enumerating all possible states of the system. Whereas in the polymer problem we can immediately make a complete list of the possible structures that can form, this enumeration is much more complicated for sphere packings, and the first part of this lecture addresses that mathematical problem. The critical issue is that for colloidal clusters, above $N = 5$, there are multiple structures at each $N$, and the partition function must sum over all of these structures. There is typically a massive degeneracy in the ground state, with multiple structures having the maximal number of contacts. The rotational

and translational entropies then become critical in fixing the probability distributions of the final state.

2. The (homogeneous) polymer problem

We would like to find the probability distribution of the lengths of polymers, which are composed of sets of monomers. We will assume that the monomers bind to each other with a binding energy $-E$, and the chemical potential of each monomer is $\mu$: this means that a monomer in solution has a free energy (entropy) of $\mu$. We must pay this much energy to take the particle out of solution. When the monomer binds to the polymer this energy is lost, so the cost of binding a monomer to the polymer is $-(E - \mu)$. Intuitively, if this energy difference is negative then it is favorable for monomers to grow into a long polymer; if it is positive then they will not. In this first example, we assume that each monomer has exactly two binding sites.

How to calculate the probability distribution? In the last lecture, we argued that the partition function has the form

(2.1)  $Z = \sum_{n=0}^{\infty} e^{n\beta\mu} Z_n$,

where

(2.2)  $Z_n = \int d^{3n}x\, e^{-\beta U}$,

where $U(x)$ is the interaction potential, the sum of pair potentials

$U = \sum_{i,j} v(|r_i - r_j|)$.

Let us suppose that the minimum of $v(r)$ occurs at $r = a + \epsilon$, where $a$ is the size of the monomer. We choose $\epsilon \ll a$, so that the range of the potential is much smaller than the particle size. To analyze this problem, we first note that there are several types of degrees of freedom: translation (3 degrees of freedom), rotation (3 degrees of freedom) and the internal degrees of freedom ($3n - 6$). These different motions decouple in the integral, and thus the partition function is composed of the product of three terms

$Z_0 = Z_{rot} Z_{trans} Z_{vib}$.

The reason that integrating out translations and rotations modifies the partition function is that there is a Jacobian factor in the coordinate transforms between the different coordinates.

2.1. Evaluating partition functions

To evaluate the partition functions¹, let us suppose that we have a system with a certain set of reaction coordinates $q(x)$, where $x \in \mathbb{R}^n$, $q \in \mathbb{R}^m$. This is given by

Here ρ∞ is the stationary distribution / Boltzmann distribution. The surface Σ = {x : q(x) − z =0} is a manifold, so it is common to evaluate (2.3) by changing variables to those lying along the manifold. These are often called “collective” variables in statistical mechanics.

¹This section was written with Miranda Holmes-Cerfon.

Suppose we identify a set of collective variables $\{s_i\}$, $i = 1 \ldots n - m$, that parameterize $\Sigma$. The variables perpendicular to the manifold can be parameterized by the $q_i$ themselves, i.e. we can choose variables $q_i' = q_i(x)$. The volume element in configuration space transforms as $dx \to |\det G|^{-1/2}\, d\sigma_\Sigma\, dq'$, where $d\sigma_\Sigma$ is the surface area element on the manifold (a function of the $s_i$), $G = (\nabla q)^T (\nabla q)$, and $\nabla q = (\nabla q_1, \nabla q_2, \ldots, \nabla q_m)$. The integral (2.3) becomes

$Z = \int \rho^\infty(x)\, \delta(q' - z)\, |\det G|^{-1/2}\, d\sigma_\Sigma\, dq' = \int \rho^\infty(x)\, |\det G|^{-1/2}\, d\sigma_\Sigma$.

This shows that evaluating (2.3) is not the same thing as integrating over the particular manifold in the delta-function; there is an additional factor of $|\det G|^{-1/2}$ that accounts for the "squishiness" of the manifold: the relative infinitesimal space between the manifolds when the configuration space is foliated by them.

To evaluate our partition function, we must break the $3N$-dimensional integral into two parts: one part over the internal degrees of freedom, and another part over the 3 rotational and 3 translational degrees of freedom. If we denote the internal coordinates by $z$ (a vector of length $3N - 6$), then the coordinates of the entire object will be $x = R(\theta, \phi, \psi) z + \mu$, where $R$ is a rotation matrix of e.g. the 3 Euler angles $\theta, \phi, \psi$, and $\mu$ is a constant vector representing translating all components by the same amount. To find the effect of rotation and translation on the integrals, we simply need to evaluate the matrix $G$ associated with $\theta, \phi, \psi$ for rotation, and $\mu$ for translations, and carry out the integral over these variables.

2.1.1. Translational partition function. Clearly, the translation variables lead to a $G$ which is the identity matrix, which implies that the translational partition function gives a factor of the volume.

2.1.2. Rotational partition function. For the rotational partition function, the result is nontrivial. The rotational partition function comes about because, once the number of bonds in a given structure is fixed, we need to consider the integral over the configuration space of the structure with these bond constraints enforced. This includes all rotations of the object. Using the Euler angles introduced above, the algebra shows that this will lead to a factor which is the square root of the determinant of the moment of inertia tensor of the object.

2.1.3. Dimer. Let's work this out explicitly for a 2D dimer, in which the objects are to be separated by some amount $z$. The dimer has two particles $x = (x_1, x_2)$, which interact with energy $U(x)$. For simplicity, let's say that the potential energy is $U(x) = \frac{1}{2} k (|x_1 - x_2| - z)^2$. The partition function is then

(2.4)  $Z = \int e^{-\frac{\beta k}{2}(|x_1 - x_2| - z)^2}\, dx$.

To evaluate this, let's change variables as above to $x_c, y_c, \theta$, which describe the centre of mass and the overall rotation of the dimer. The final variable we choose to be $q = |x_1 - x_2|$. Note that $q = \sqrt{2I}$, where $I$ is the moment of inertia ($I = \sum_i |x_i - x_c|^2$, where $x_c$ is the centre of mass). The variable change can be written explicitly as $x = (x_c + \frac{q}{2}\cos\theta,\; y_c + \frac{q}{2}\sin\theta,\; x_c - \frac{q}{2}\cos\theta,\; y_c - \frac{q}{2}\sin\theta)$. One can calculate that the Jacobian of this transformation is $q$, so the corresponding volume element is $dx = q\, dx_c\, dy_c\, d\theta\, dq$.

Equation (2.4) becomes

$Z = \int e^{-\frac{\beta k}{2}(q-z)^2}\, q\, dx_c\, dy_c\, d\theta\, dq = \underbrace{\left(\int q\, d\theta\right)}_{Z_{rot}} \underbrace{\left(\int e^{-\frac{\beta k}{2}(q-z)^2}\, dq\right)}_{Z_{vibr}} \underbrace{\left(\int dx_c\, dy_c\right)}_{Z_{trans}}.$

The partition function factors into rotational, vibrational, and translational parts respectively, where each is an integral over the respective degrees of freedom. We include the factor $q$ in the rotational partition function because this comes from the rotational degrees of freedom: it is the surface area of the space of rotations. We can now easily evaluate

$Z_{rot} = 2\pi q = 2\pi \sqrt{2}\, \sqrt{I}$,
$Z_{vibr} = (2\pi k_B T / k)^{1/2}$,

$Z_{trans} = V$ (the volume of space).

This result can be generalized: for a cylinder of diameter $d$ with monomer length $\ell$, this is

$Z_{rot} = (8\pi^2)^{3/2}\, \sqrt{\frac{d^2 N}{8}\left(\frac{\ell^2 N^3}{12}\right)^2} \sim N^{7/2}$.

In general,

(2.5)  $Z_{rot} = \frac{\sqrt{|I|}}{\sigma}$,

where $\sigma$ is the symmetry number of the structure. This is the size of the automorphism group, of rotations and reflections, that maps the structure onto itself. Although this does not play any role for the polymer problem, we will see that it plays an important role for the cluster problem.

2.1.4. Vibrational partition function. In the dimer calculation, we saw one other partition function that came up: the vibrational partition function, which is over the binding energy itself. It is worth redoing this calculation for a more general potential. Let us consider a single bond between two monomers. If the monomers have coordinates along the $x$ axis of $x_1$ and $x_2$, then let $r = x_2 - x_1$. The part of the partition function that corresponds to this motion is then

$\int_{-L}^{L} dr\, e^{-\beta v(r)}$,

where $L$ is the size of the segment we are integrating over (connected to the volume of the box). To compute this, we expand $v(r)$ around its minimum point. Namely $v(r) = v(a + \epsilon) + v''(a + \epsilon)\,\delta^2/2 + \ldots$, implying that the integral becomes

$\int_{-L}^{L} d\delta\, e^{-\beta v(r^*)}\, e^{-\beta v''(r^*)\delta^2/2}$.

Here $r^* = a + \epsilon$. We can now carry out the gaussian integral that remains, obtaining

(2.6)  $e^{-\beta v(r^*)} \sqrt{\frac{2\pi k_B T}{v''(r^*)}} = e^{-\beta v(r^*)}\, Z_{vib}$.

Here we have introduced the so-called vibrational partition function, $Z_{vib}$, for a single bond. If we have $N - 1$ bonds, then we need to compute the product of

$N - 1$ such integrals, so that we end up with

$Z_{vib}^{N-1}\, e^{-\beta(N-1)v(r^*)}$.

In this case we can combine the vibrational partition function and the bond energy into a single term, with $E = v(r^*) - k_B T \log(Z_{vib})$. This is a special property of the polymer problem, and arises because the vibrational integral is identical over every bond in the structure. In the cluster problem we address below, this no longer holds.

2.1.5. Total partition function. The rotational and translational entropy modify the shape of the length distribution for polymers. The partition function is:

(2.7)  $Z_N = (8\pi^2)^{3/2}\, \sqrt{\frac{d^2 N}{8}\left(\frac{\ell^2 N^3}{12}\right)^2}\; e^{-\beta(N-1)E} = Z_0(N)\, e^{-\beta(N-1)E}$.

Here we have used the fact that there are $N - 1$ bonds in a chain of $N$ monomers. If we instead fix the chemical potential, the partition function is

(2.8)  $Z = \sum_{N=0}^{\infty} Z_N e^{N\beta\mu} = \sum_{N=0}^{\infty} Z_0(N)\, e^{-\beta(N-1)E}\, e^{N\beta\mu}$.

This series converges when $e^{\beta(\mu - E)} < 1$. Given the $N$ dependence of $Z_0$, it can be directly summed. The most important result, though, is the probability of seeing a polymer with $N$ monomers:

(2.9)  $P_N \sim e^{\beta(N\mu - (N-1)E)}$.

Note that when $E \gg \mu$, the monomers are happier to be in solution than they are to stick to the polymer; in this limit, the polymer length distribution decays with $N$. In contrast, if the binding energy between the monomers is stronger than the chemical potential of the monomers, then longer polymers have higher prevalence. This means that with identical monomers, we are not given any control knob with which we can control the length distribution of the polymers.
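The two regimes are easy to see numerically. The sketch below (an illustration added to these notes) normalises Equation (2.9) over lengths up to a cutoff $N_{max}$; the cutoff matters when $e^{\beta(\mu - E)} \ge 1$, where the untruncated series diverges. All parameter values are arbitrary.

    import numpy as np

    # Length distribution of Eq. (2.9), P_N ~ exp[beta(N mu - (N-1) E)],
    # normalised over N = 1..Nmax. Parameter values are illustrative.
    def length_distribution(beta, mu, E, Nmax=200):
        N = np.arange(1, Nmax + 1)
        logP = beta * (N * mu - (N - 1) * E)
        P = np.exp(logP - logP.max())
        return N, P / P.sum()

    beta = 1.0
    for mu, E in [(1.0, 2.0), (2.0, 1.0)]:
        N, P = length_distribution(beta, mu, E)
        print(f"mu={mu}, E={E}: mean length = {(N * P).sum():.1f}")
    # In one regime the distribution decays with N; in the other it grows,
    # piling up at the cutoff: there is no knob that selects a length.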

3. Cluster statistical mechanics

Let us now carry out the same set of calculations, but for clusters of spheres. The main difference here is that the geometry of the clusters complicates all of the calculations in the example above; this causes the phenomenology to be a bit different. Moreover, we will consider a set of spheres motivated by experiments on colloidal spheres interacting with depletion forces, in which there are short ranged, non-specific interactions between the spheres. This means that as many spheres can stick to each other as permitted by geometry; there is no restriction on valence as in the case of the polymer problem. To focus on aspects that are only geometry dependent, we proceed here by focusing on the statistical mechanics of clusters of exactly $N$ spheres, i.e. we are not fixing the concentration as in the example above. The results here can be put together to form the fixed concentration examples.

For a given number of particles, $N$, we need to compute the relative probabilities that the different clusters form. Let us consider a particular cluster of $N$ particles, which we denote with the index $S$. Then the probability $P_S$ that this cluster forms is proportional to $P_S \sim e^{-\beta M_S E}\, Z_S^{rot} Z_S^{trans} Z_S^{vib}$, where $M_S$ is the number of contacts

that form, and $Z_S^{rot}, Z_S^{trans}, Z_S^{vib}$ are the rotational, translational and vibrational partition functions. The total probability that $S$ forms is given by

(2.10)  $P_S = \frac{e^{-\beta M_S E}\, Z_S^{rot} Z_S^{trans} Z_S^{vib}}{\sum_{S'} e^{-\beta M_{S'} E}\, Z_{S'}^{rot} Z_{S'}^{trans} Z_{S'}^{vib}}$.

Thus to compute equilibrium probabilities we need to enumerate all of the structures that form and compute their entropies. In the identical particle experiment, all of the particles stick to each other, so we expect that the partition function will be dominated by clusters with the maximal number of contacts. This is at least the number of contacts needed for the structure to be rigid, namely with at least $M_S = 3N - 6$ contacts. As with the polymer case, the calculation involves computing the vibrational partition function and the rotational partition function, both of which differ somewhat from the homopolymer problem.
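In code, Equation (2.10) is a one-liner once the ingredients are tabulated. The numbers below are placeholders, not real cluster data; the point is only the bookkeeping (the translational factor $V$ is common to all clusters of the same $N$ and cancels in the ratio).

    import numpy as np

    # Relative cluster probabilities from Eq. (2.10), given the number of
    # contacts M_S and precomputed rotational/vibrational factors.
    def cluster_probabilities(beta, E, M, Zrot, Zvib):
        logw = -beta * E * np.asarray(M, dtype=float) \
               + np.log(Zrot) + np.log(Zvib)
        w = np.exp(logw - logw.max())
        return w / w.sum()

    M = [12, 12, 11]                  # hypothetical contact counts
    Zrot = np.array([0.2, 1.5, 0.9])  # sqrt(|I|)/sigma per structure (made up)
    Zvib = np.array([3.0, 2.5, 4.0])  # vibrational factors (made up)
    print(cluster_probabilities(beta=1.0, E=-5.0, M=M, Zrot=Zrot, Zvib=Zvib))
    # With E < 0 (attractive contacts), the 11-contact structure is strongly
    # suppressed relative to the two 12-contact ground states.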

3.1. Vibrational partition function

In the homopolymer problem, we could absorb the vibrational partition function into the definition of $E$; this is because every bond in the polymer made the same contribution to the overall partition function. This condition is not met for finite clusters. The reason for this can be seen by the following calculation. Let the interaction energy $U(\mathbf{r})$ correspond to a given cluster with coordinates $\mathbf{r}$. Here, $\mathbf{r}$ is a $3N$-dimensional vector, characterizing the coordinates of each of the $N$ particles. Let us suppose that the equilibrium configuration for a given particle cluster is given by $\mathbf{r}_0$. Then we can write

(2.11)  $U(\mathbf{r}) = U(\mathbf{r}_0 + \delta\mathbf{r}) = U_0 + \frac{1}{2}\sum_{i,j}^{3N} \frac{\partial^2 U}{\partial r_i \partial r_j}\, \delta r_i\, \delta r_j$.

By diagonalizing the Hessian $H_{ij} = \partial_{r_i} \partial_{r_j} U$, we can find the eigenvalues $\lambda_\alpha$, and thus decompose the energy as

(2.12)  $U(\mathbf{r}) = U_0 + \frac{1}{2} \sum_{\alpha=1}^{3N} \lambda_\alpha q_\alpha^2$,

where the $q_\alpha$ are the displacements in the eigendirections. For each eigendirection, the partition function is

(2.13)  $Z_{v,\alpha} \sim \int_{-\infty}^{\infty} e^{-\frac{1}{2}\beta \lambda_\alpha q_\alpha^2}\, dq_\alpha = \sqrt{\frac{2\pi}{\beta\lambda_\alpha}}$.

Thus the total vibrational partition function is

(2.14)  $Z_S^{vib} = c_v \prod_{\alpha}^{3N-6} \sqrt{\frac{2\pi}{\beta\lambda_\alpha}}$.

What is different here from the exercise above is that the multiple structures at a fixed $N$ all have different vibrational partition functions, and so they play a role in determining the relative probabilities of the structures. Using these formulae, we computed the landscape of interaction energies of identical particles through $N = 9$.
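The sketch below carries out this computation for the simplest nontrivial case: three harmonically bonded particles at the vertices of a triangle in 2D (so there are $2N - 3 = 3$ zero modes rather than the six of the 3D formula). The potential and parameters are stand-ins, not the depletion interaction of the experiments.

    import numpy as np

    # Vibrational factor (Eqs. 2.13-2.14, with beta = 1 and c_v = 1) from
    # the Hessian of a toy pair energy: a triangle with harmonic bonds.
    def energy(x, k=1.0, a=1.0):
        x = x.reshape(-1, 2)
        U = 0.0
        for i in range(len(x)):
            for j in range(i + 1, len(x)):
                r = np.linalg.norm(x[i] - x[j])
                U += 0.5 * k * (r - a) ** 2
        return U

    def hessian(x0, h=1e-5):      # finite-difference second derivatives
        n = len(x0)
        H = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                xpp = x0.copy(); xpp[i] += h; xpp[j] += h
                xpm = x0.copy(); xpm[i] += h; xpm[j] -= h
                xmp = x0.copy(); xmp[i] -= h; xmp[j] += h
                xmm = x0.copy(); xmm[i] -= h; xmm[j] -= h
                H[i, j] = (energy(xpp) - energy(xpm)
                           - energy(xmp) + energy(xmm)) / (4 * h * h)
        return H

    x0 = np.array([0.0, 0.0, 1.0, 0.0, 0.5, np.sqrt(3) / 2])  # minimum
    lam = np.linalg.eigvalsh(hessian(x0))
    vib = lam[lam > 1e-4]        # drop 2 translations + 1 rotation (2D)
    print("vibrational eigenvalues:", np.round(vib, 3))
    print("Z_vib =", np.prod(np.sqrt(2 * np.pi / vib)))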

3.2. Rotational partition function

As in the polymer problem above, the rotational partition function is proportional to $\sqrt{|I|}$, where $I$ is the moment of inertia tensor of the structure. But in addition, there is another important factor, namely

(2.15)  $Z_{rot} \sim \frac{\sqrt{|I|}}{\sigma}$,

where $\sigma$ is the symmetry factor of the structure. The symmetry factor is the size of the automorphism group of the structure: it is the number of symmetries the structure has under rotations and reflections. Given that the particles are identical, we should only consider the integral over rotations modulo these symmetries. Some structures that form have a high degree of symmetry, corresponding to a high $\sigma$; this suppresses the entropy of these structures. It is worth remarking that the symmetry number correction factor is entirely classical: the origin of the factor is clearly exposed if we label the particles. Imagine that we enumerate all of the different ways of assembling our labelled particles to create the structure in question. If a rotation or reflection symmetry exists, this lowers the number of possible labellings, since it will render those that are related by rotation/reflection equivalent. So in a sense we can view the effect of the symmetry factor as a way to account for the different numbers of pathways that build different structures.
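For the rigid clusters considered here, $\sigma$ can be counted from the contact network: it is the number of particle relabellings that leave the adjacency matrix unchanged. The brute-force sketch below (an added illustration, practical only for small $n$) does this for the octahedron, whose symmetry group has order 48.

    import numpy as np
    from itertools import permutations

    # Symmetry number sigma of Eq. (2.15), counted as the number of
    # permutations p with A[p,:][:,p] == A. Brute force: O(n!).
    def symmetry_number(A):
        A = np.asarray(A)
        n = len(A)
        return sum(1 for p in permutations(range(n))
                   if np.array_equal(A[np.ix_(p, p)], A))

    # Octahedron contact graph (n = 6, 12 contacts): every vertex touches
    # all others except its antipode (the pairing below is one choice).
    A = 1 - np.eye(6, dtype=int)
    for i, j in [(0, 5), (1, 4), (2, 3)]:
        A[i, j] = A[j, i] = 0
    print(symmetry_number(A))   # 48, the order of the octahedral group O_h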

3.3. Cluster enumeration

Given the formalism described above, the critical step for calculating statistical mechanical probabilities is to enumerate the different geometries of clusters that can form. For each cluster, we can use the formalism to compute its partition function. We therefore developed a method [2] to enumerate all clusters of $n$ particles which have at least $3n - 6$ internal contacts. This is the minimum number of the maximum number of contacts that can form in a cluster of $n$ particles, given rigidity constraints. Our method combines graph theoretic enumeration of adjacency matrices with basic geometry, and allows analytically solving for clusters of $n \le 10$ particles satisfying minimal rigidity constraints. The procedure for enumerating packings has two steps. First we use graph theory to construct all possible $n$-particle configurations: we first enumerate all adjacency matrices, and then restrict to those that are non-isomorphic to each other and obey certain rigidity constraints: each particle should have at least three contacts, and the total number of contacts should be at least $3n - 6$. Once we have the adjacency matrices obeying these constraints, we use geometry to determine which configurations correspond to minimally rigid packings.

To make this procedure tractable, we distinguish between two types of packings: iterative packings, in which all possible $m$-particle subsets with $\ge 3m - 6$ contacts also correspond to minimally rigid packings, and non-iterative packings, or seeds. The majority of packings at small $n$ are iterative. Table 1 shows the growth of adjacency matrices with $n$; only a very small number are non-iterative. Iterative packings can be directly solved for the structure of the packing [2]. The limiting step in our procedure is solving for new seeds at each $n$. In principle this method can produce the complete set of such packings. In practice our enumeration represents a lower bound on the number of clusters at each $n$; we use numerical values for

n  | A's           | Non-Isomorphic A's | Minimally rigid A's | Iterative A's | Non-Iterative A's
1  | 1             | 1                  | 1                   | 1             | 0
2  | 2             | 2                  | 1                   | 1             | 0
3  | 8             | 4                  | 1                   | 1             | 0
4  | 64            | 11                 | 1                   | 1             | 0
5  | 1,024         | 34                 | 1                   | 1             | 0
6  | 32,768        | 156                | 4                   | 3             | 1
7  | 2,097,152     | 1,044              | 29                  | 26            | 3
8  | 268,435,456   | 12,346             | 438                 | 437           | 1
9  | 6.8719 · 10^10 | 274,668           | 13,828              | 13,823        | 5
10 | 3.5184 · 10^13 | 12,005,168        | 750,352             | 750,226       | 126

Table 1. The growth of adjacency matrices with n. The number of adjacency matrices (constructed by [11]) decreases rapidly as isomorphism and rigidity constraints are imposed. Iterative and non-iterative are defined in the text. The classification of whether an A is iterative or not is shown here after all rules for n − 1 particles are applied; thus the non-iterative column shows n-particle non-iterative structures only, and does not include non-iterative structures of fewer than n particles.

the coordinates, and in rare cases round-off error may cause some packings to be missed. Figure 1 shows the new seeds that form for $n \le 10$ particles. The first new seed is the octahedron, which exists at $n = 6$. By $n = 10$ there are many different new seeds. Finally, Table 2 gives the number of packings that were identified by our study. As $n$ increases the number of packings increases rapidly. In order to compute the equilibrium probabilities of identical sphere clusters we need to compute the partition functions over all of these structures. For $n \le 9$ the ground state degeneracy increases exponentially with $n$. However, for $n > 9$ the degeneracy decreases due to the formation of structures with greater than $3n - 6$ contacts. Interestingly, for $n = 10$, and possibly at $n = 11$ and $n = 12$, the ground states of this system are subsets of hexagonal close packed crystals. We show also that at $n = 12$ and $n = 13$ the ground states are not icosahedra. We relate our results to the structure and thermodynamics of suspensions of colloidal particles with short-ranged attractions.
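The combinatorial first step of this enumeration is compact enough to sketch here, for $n = 6$ (an added illustration). The geometric step, deciding which adjacency matrices are actually realisable as sphere packings, is the hard part and is not implemented; it is what cuts the candidates down to the four minimally rigid packings in Table 1.

    from itertools import combinations, product

    # Enumerate all 2^15 adjacency matrices for n = 6 and apply the
    # combinatorial rigidity filters: >= 3n - 6 contacts in total and at
    # least three contacts per particle. (No isomorphism reduction or
    # geometric realisability check is done here.)
    n = 6
    pairs = list(combinations(range(n), 2))
    total = candidates = 0
    for bits in product((0, 1), repeat=len(pairs)):
        total += 1
        if sum(bits) < 3 * n - 6:
            continue
        deg = [0] * n
        for (i, j), b in zip(pairs, bits):
            deg[i] += b
            deg[j] += b
        if min(deg) >= 3:
            candidates += 1
    print(total, candidates)   # 32768 matrices in all, as in Table 1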

3.4. Probabilities

With the list of clusters in hand, we can now use equilibrium statistical mechanics to compute the probabilities of each of the clusters. The most important point is that since all maximal contact structures with $n \le 9$ have the same number of contacts ($3n - 6$), the differences in occupancy between these different structures are purely entropic, arising from the different rotational and translational partition functions. We evaluated these partition functions and compared the results with experiments, with excellent agreement [12]. The main qualitative point from these calculations

and experiments is that, since entropies determine the occupancies, the lowest probability structures are those with the highest symmetry. More detail about these points and the details of the probability distributions can be found in the original papers. Figure 2 shows the equilibrium probabilities of the different structures for $N = 6, 7, 8$, compared with experiments [12].

Figure 1. New seeds. All new seeds of $n \le 10$ particles, shown in both sphere and point/line representation. There exists only one packing for each $n \le 5$ particles, and each can be constructed iteratively from a dimer; thus there exist no new seeds for $n \le 5$. $n = 6$ is the first instance of a new seed. The set of new seeds reported for $n = 10$ is putative and thus represents a lower bound. New seeds with a ∗ appearing to the right correspond to minima of the second moment out of all packings for that $n$. It can be seen here that, for the packings we have analyzed, the minimum of the second moment happens to correspond to a seed, for $n > 5$.

n  | Total Packings | New Seeds | Non-Rigid Packings | Chiral | Total States
2  | 1              | 0         | 0                  | 0      | 1
3  | 1              | 0         | 0                  | 0      | 1
4  | 1              | 0         | 0                  | 0      | 1
5  | 1              | 0         | 0                  | 0      | 1
6  | 2              | 1         | 0                  | 0      | 2
7  | 5              | 1         | 0                  | 1      | 6
8  | 13             | 1         | 0                  | 3      | 16
9  | 50             | 4         | 1                  | 27     | 77
10 | 223            | 8         | 4                  | 170    | 393

Table 2. Packings. Total number of packings found. We distinguish between chiral structures and packings, such that a left- and right-handed packing is considered to be one packing with two distinct states (for n ≤ 10 this is the only type of chiral packing encountered). The number of packings having chiral counterparts is included in the column marked 'Chiral.' The total number of states per n is equal to the number of packings plus the number of chiral structures. This is included in the table, along with the number of packings corresponding to new seeds and to non-rigid structures.

Figure 2. From [12]. Comparison between experimental and theoretical values of probability $P$, at $N = 6$, 7, and 8. Structures that are difficult to differentiate experimentally have been binned together at $N = 7$ and $N = 8$ to compare to theory. The calculated probabilities for the unbinned states are shown by the light gray bars, and binned probabilities are shown in dark gray. The dots indicate the experimental measurements, with 95% confidence intervals given by the error bars. Renderings and point groups, in Schönflies notation, are given for each structure. The number in the subscript of each symbol indicates the order of the highest rotational symmetry axis, and the letter indicates the symmetry group. High symmetry structures (those in $D$, $T$, and $O$ groups) occur at low probability. Structures in $C_1$ and $C_2$ groups occur in chiral pairs.

LECTURE 3
Heterogeneous self assembly

1. Heteropolymer problem

Our previous lectures have given a backdrop against which we can finally explain what we mean by the phrase "self assembly". One might say that the polymers in the previous calculation self assemble, in that they form large structures based on the interactions between the components. But there is no element of control in the construction. We have only a single parameter to tune, $E - \mu$, and it gives us only two possible choices for the length distribution of the polymers. But what if we want a different distribution? More to the point, what if we would like a sea of monomers that picks out a particular length for the structure? Similarly, in the colloid example, the probabilities with which each of the respective structures occur are fixed. What if we want to change the distribution to, for example, favor one structure over another? Clearly this cannot be done with our simple monomers or particles. We need more of a substrate to code for the structures that we want.

The approach we will follow now is to consider a heterogeneous alphabet. What we mean by this is that when we want to construct a structure of size $N$, we will use $N$ different components to do so. These components will interact with each other in different ways, i.e. have different binding energies with each other. We will then ask whether it is possible to create the desired structure with high yield. To do this, we need to consider the yield

(3.1)  $Y = \frac{Z_{desired}}{Z_{desired} + Z_{junk}}$.

The goal is to choose the interactions between the components to minimize the sum over the "junk"; the reason that this could be difficult is that although each individual term in the sum over junk is smaller than $Z_{desired}$, there are potentially a large number of terms in the sum, especially in the limit when the structure becomes asymptotically large.

1.1. The polymer problem, revisited

The work in this section was done with Arvind Murugan and James Zou [15]. We consider heteropolymers assembled in a pot with $N$ species of monomers $a_i$, $i = 1 \ldots N$. Each monomer $a_i$ has a distinct left and right end through which it can bind to the complementary side of any other monomer $a_j$, with binding energy $g_{ij} < 0$ if $a_i$ is to the left of $a_j$ (denoted $a_i - a_j$). We assume that the monomers prefer to glue in the sequence $a_1 - a_2 - \ldots - a_N$, and hence $g_{i,i+1} = -E$ is more negative than the other values of $g_{ij}$. Thus $g_{ij}$ has the structure shown in Eq. 3.2, where the $\epsilon_{ij}$ denote the weaker cross-talking energies:


(3.2)  $g = -\begin{pmatrix} \epsilon_{1,1} & E & \epsilon_{1,3} & \cdots & \epsilon_{1,N} \\ \epsilon_{2,1} & \epsilon_{2,2} & E & \cdots & \epsilon_{2,N} \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ \vdots & \vdots & & \ddots & E \\ \epsilon_{N,1} & \epsilon_{N,2} & \epsilon_{N,3} & \cdots & \epsilon_{N,N} \end{pmatrix}$

General constraints on the design and structure of macromolecules limit the extent to which one can minimize the crosstalking interactions $\epsilon_{ij}$. Let us assume that the $\epsilon_{ij}$ are drawn randomly and independently from a general distribution $\rho(w)$. Our goal is to discover when heterogeneous assembly (with high yield) can take place, as a function of the matrix $g$, namely $E$ and $\rho(w)$.

We begin by computing the partition function in the limit that the concentrations of the individual components are equal. Competing structures can be classified by their length $l$ and the number $k$ of weak bonds contained in them. The free energy of such a linear structure is

$F = -l\mu + (l - k - 1)E + w_1 + \ldots + w_k \equiv -l\mu + (l - k - 1)E + \Omega_k$,

where the $w_i$ are $k$ weak cross-talk energies, drawn independently from the distribution $\rho(w)$. Let $\nu_n(k, l)$ be the total number of such linear structures of size $l$ with $k$ weak bonds. (We will not need its exact form here.) The partition function is a sum over structures of all lengths $l$ with varying $k$,

(3.3)  $Z \sim \sum_l e^{l\beta\mu - (l-1)\beta E} \sum_{k=0}^{l-1} \nu_n(k, l)\, e^{-\beta\Omega_k + k\beta E}$,

where $\Omega_k = w_1 + w_2 + \ldots + w_k$ is the sum of the $k$ random cross-talking interactions found in the structure (each chosen from the distribution $\rho(w)$). Note that the new aspect of the heterogeneous problem, relative to the homopolymer problem, occurs in the sum over $k$; we will see that this is where the action is. To understand this formula, we need to make comments on two aspects, $\nu_n(k, l)$ and $\Omega_k$.

1.1.1. Computing $\nu_n(k, l)$. Although $\nu(k, m)$ can be computed exactly for fixed $k, m$, it is intuitive to choose an approximation:

(3.4)  $\nu_{t,n}(k, m) \approx \binom{m-1}{k} \left( n - \frac{m}{k+1} \right)^{k+1}$.

The combinatorial factor $\binom{m-1}{k}$ accounts for choosing the $k$ locations of cross-talking interactions in a structure of length $m$. This divides the length $m$ into $k + 1$ fragments, each of which is a contiguous piece of the desired structure (of length $n$). Given that we are assuming completely heterogeneous structures, the composition of each of the $k + 1$ fragments is uniquely determined by the identity of (say) the first component of each fragment. The number of such choices for each fragment is $N$ − (fragment length). While we would need to sum over all fragment lengths to find $\nu(k, m)$, a useful approximation is to assume all segments to be of the average length $\frac{m}{k+1}$.

1.1.2. Computing $\Omega_k$. To understand how to handle $\Omega_k$, it is best to average over all realizations of the random cross-talk. We rely on the following simple identity for the average of the exponential of a sum of identical random variables,

(3.5)  $\langle e^{-\Omega_{k-1}} \rangle_\rho = \langle e^{-w_1 - w_2 - \ldots - w_{k-1}} \rangle_\rho = \langle e^{-w} \rangle_\rho^{k-1}$.

Hence we define the exponential average $\tilde{w}$ of the distribution $\rho$,

(3.6)  $e^{-\tilde{w}} \equiv \langle e^{-w} \rangle_\rho = \int dw\, \rho(w)\, e^{-w}$,

which is sensitive to the largest cross-talk (most negative $w$) in the distribution. We can rewrite $Z$ using $\tilde{w}$,

(3.7)  $Z \sim \sum_l e^{l\beta\mu - (l-1)\beta E} \sum_{k=0}^{l-1} \nu_n(k, l)\, e^{-\beta k(\tilde{w} - E)}$,

from which we identify the parameter $g = \tilde{w} - E > 0$, the difference between the cross-talking energy and the strong bond energy.

1.2. Constraints on the energy matrix

Intuitively, if the strong binding energy $E$ is close enough to the weak bonds in the distribution $\rho(w)$, then we would imagine that the qualitative behavior of the model will be similar to the homopolymer problem, namely the polymers will either grow or shrink with $N$ depending on the relative values of $E$ and the chemical potential $\mu$. Something different will happen when the correct binding energy $E$ is large enough relative to the incorrect binding energy. When does something happen?

First we give an intuitive argument, and then suggest how this can be formalized. The probability of breaking any particular strong bond in the desired structure is proportional to $e^{\beta E}$. Any of the other $n - 1$ components can glue to the newly exposed site through a weak bond, giving an energy gain of $-\tilde{w}$. Hence, the probability $p$ of breaking the desired structure at a particular bond and gluing weakly to one of the $n - 1$ components is given by $p \sim e^{\beta E} \times (n - 1) \times e^{-\beta\tilde{w}} = (n - 1)e^{-\beta g}$. Note that the first component that binds weakly to the exposed site uniquely determines the remaining structure. This is because our chosen concentrations $e^{\beta\mu} \approx e^{-\beta E}$ are low enough to ensure that additional components glue on to the first component only through strong bonds and thus complete the structure uniquely. A break in the structure could happen at any of the $n$ bonds in the desired structure. To prevent a break from occurring at any of the $n$ sites, we need $p < 1/n$. Hence yield can be high only if $(n - 1)e^{-\beta g} < 1/n$, which gives the constraint:

(3.8)  $\beta g > \beta g_c \approx A \ln n + B$.

If $g < g_c(n)$, cross-talking structures proliferate and the yield is low; if $g > g_c(n)$, structures with weak bonds are energetically suppressed and good yield may be obtained. Thus the condition $g > g_c(n)$ sets an upper limit on the size $n$ of structures, in terms of the crosstalk.
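The following rough numerical version of this estimate (added here; it is not the calculation of [15]) sums Equation (3.7) with the approximation (3.4) over linear structures up to length $n$, and forms the yield of Equation (3.1). The parameter choices, including taking $\mu$ slightly above $E$ so that full-length chains are favoured, are illustrative; the qualitative point is the crossover as $g$ clears the $\sim 2\ln n$ threshold.

    import numpy as np
    from math import comb, exp

    # Yield Y = Z_desired / Z_total from Eq. (3.7) with nu from Eq. (3.4),
    # truncated to linear structures of length <= n. Illustrative only.
    def yield_estimate(n, beta, mu, E, g):
        Z_des = Z_tot = 0.0
        for l in range(1, n + 1):
            for k in range(0, l):
                if l == n and k == 0:
                    nu = 1.0                   # the desired structure itself
                else:
                    frag = n - l / (k + 1)
                    if frag <= 0:
                        continue
                    nu = comb(l - 1, k) * frag ** (k + 1)   # Eq. (3.4)
                w = nu * exp(beta * (l * mu - (l - 1) * E - k * g))
                Z_tot += w
                if l == n and k == 0:
                    Z_des = w
        return Z_des / Z_tot

    n, beta, E = 20, 1.0, 10.0
    mu = E + 0.5       # assumed: full-length chains favoured over fragments
    for g in (4.0, 2 * np.log(n), 12.0):
        print(f"g = {g:5.2f}   Y ~ {yield_estimate(n, beta, mu, E, g):.3g}")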

2. The yield catastrophe

We will now restrict ourselves to $g > g_c$, and examine how the maximum yield depends on the size and structural complexity of the desired structures. We assume

Figure 1. (A, B) Yield falls (lower dots) with (A) increasing number of arms and (B) size, if the chemical potentials of different components are equal. The yield improvement due to optimized unequal potentials is greater for larger and highly branched structures. The resulting optimized yield (upper dots) is relatively independent of shape and size. (The number of distinct components is assumed to equal the structure size, $m = n$. All shapes in (A) have $n = m = 25$ components with $g = 12 k_B T$. In (B), to compare different $n = m$ fairly, we scaled $g = 3 + 2\ln n$ to keep $g$ a constant amount over the $2\ln n$ threshold.)

that the concentrations of the components match their stoichiometry in the desired structure, so that the chemical potentials are all equal, $\mu_i = \mu$. Fig. 1A shows the yield (red curve) as a function of $\mu$, for a linear structure of size $n = 8$. There is always a value of $\mu$ where the yield is maximized: low chemical potentials favor incomplete structures, while large aggregate structures held together by weak bonds form at high chemical potentials.

The lower data points in Fig. 1A show how this maximum yield depends on $n$, the size of the linear structure, for structures where all the components are distinct, i.e. the number $m$ of distinct components used is also the size $n$ of the structure. To fairly compare yields for structures of different $n$, we need to increase the difference $g$ between strong and weak bond energies as $A \ln(n) + G$, so as to stay above the crosstalk threshold of Eqn. (3.8) by a fixed amount $G$. We set $A \approx 2.0$, as in Eqn. (3.8) applied to linear structures. Strikingly, the yield degrades exponentially with increasing $n$, indicating that by $n \approx 35$ the maximum yield is at most about 1%. This occurs for a simple reason: the number of competing structures increases dramatically with increasing size; although each individual competing structure has higher energy, the combinatorial explosion of possibilities strongly limits the yield.

This yield catastrophe occurs not only with increasing size of structure, but also with increasing structural complexity. We show this by developing a methodology for addressing linearly branched structures. Such calculations can be tedious, since the partition function must be summed over all competing structures of varying shapes and sizes, and the list can be quite large even for a simple structure. However, these calculations can be dramatically simplified using rules adapted from those of Feynman diagrams. (Feynman methods have been used before in the computation of partition functions of polymers [4].) In our context, Feynman rules give us a one-step method of summing over structures of all sizes that are consistent with a given topology.

Using the Feynman-rule method (detailed in [15]), we computed the yield of branched structures for fixed n, but with increasing numbers of arms (Fig. 1B). As above, we keep the chemical potentials of each component stoichiometrically matched to those in the desired structure, and find the value of μ that optimizes yield. The yield decreases exponentially with the number of arms. Again, this is for the same basic reason as the linear polymer yield degradation: the number of competing structures increases dramatically with increasing number of arms. This yield catastrophe presents a fundamental limit: if the concentrations of components are stoichiometrically matched to the structure, then there is a limit to how large and how complex a structure can be robustly assembled.

2.1. Fixing the yield catastrophe

The calculations resulting in the yield catastrophe have assumed that all components are supplied at the same chemical potential. While this is typical of DNA experiments, it is natural to wonder if the situation would be improved if components have unequal supply (or chemical potentials). Naively, one might expect wastage of components that are supplied in excess of others with which they form a complex. This intuition is valid only in the limited circumstances where the reaction can be driven to completion and all components are assembled into the desired structure with no incorrect structures left over (i.e., 100% yield). However, incorrect structures are inevitable in most models of assembly and constitute a form of wastage. Hence it is conceivable that yield can be improved by balancing the wastage in free components against that in incorrect structures.

We studied this question numerically, varying the μ_i independently to optimize yield for each structure, using gradient descent on our analytical constructions for partition functions. Strikingly, the optimal values of the chemical potentials μ_i are highly nonuniform across a structure, with exterior pieces having much higher μ_i than interior pieces. Moreover, using the optimal μ_i leads to a nearly complete recovery from the yield catastrophe. Fig. 1 shows the effect on our two model calculations: Fig. 1A (black dots) shows that the optimized μ_i's lead to a polymer yield that is independent of the length of the polymer n. Moreover, Fig. 1B (black dots) shows that the optimized yield is independent of the number of arms in the branched structures. We have found that these results are robust, based on numerical solutions of model problems.
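A minimal numerical sketch of such an optimization, in the same toy fragment model introduced above (per-component potentials and the crude finite-difference ascent are our own illustration, not the calculation of [15]):

```python
import numpy as np

def yield_linear(mu, beta_E=10.0):
    """Toy yield of a linear target with per-component potentials mu[i]
    (units of kT); competitors are contiguous fragments [i..j] only."""
    n = len(mu)
    c = np.concatenate(([0.0], np.cumsum(mu)))            # prefix sums of mu
    logw = np.array([c[j + 1] - c[i] + (j - i) * beta_E
                     for i in range(n) for j in range(i, n)])
    w = np.exp(logw - logw.max())
    full = np.exp(c[n] + (n - 1) * beta_E - logw.max())   # complete structure
    return full / w.sum()

# crude finite-difference gradient ascent on the mu_i
n, mu, step, eps = 12, np.full(12, -10.0), 0.3, 1e-4
base = yield_linear(mu)
for _ in range(300):
    grad = np.array([(yield_linear(mu + eps * np.eye(n)[k]) - base) / eps
                     for k in range(n)])
    mu += step * grad
    base = yield_linear(mu)
print(base, mu.round(2))
```

Even in this toy model, interior components sit in more competing fragments than end components, so the ascent tends to shift the relative μ_i in the nonuniform direction the text describes.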

3. Colloidal assembly

The ideas in this section summarize work of Zorana Zervacic [19]. We now ask whether the same type of encoding rule leads to robust assembly of colloidal clusters. As with the polymers, we do this by making every particle in the target structure different, with interparticle interactions chosen to favor the desired local configuration in its target structure. The interactions between different particle types are coded into an interaction matrix Î (or alphabet), specifying the interaction energy between every pair of particles [7]. We study the assembly yield of arbitrary structures by choosing the interaction energy so that the desired structure is the ground state. This can be done uniquely for an isolated system of N spherical particles with isotropic interactions as follows: start with the adjacency matrix Â, which is the N × N matrix having an element A_ij = 1 if particles i and j are in contact and A_ij = 0 otherwise. We choose Î directly from Â, by mapping non-zero elements of Â to favorable interactions in Î and zero elements to unfavorable interactions. Every contact in the desired structure has a bond energy −ε, while every other interaction has a higher energy ε. This interaction matrix represents maximal interaction specificity and is called the maximal alphabet.

When a structure has a unique adjacency matrix, this procedure guarantees that the desired structure has the maximal number of contacts, and is therefore the unique ground state. But if a structure has no mirror symmetries, then its “chiral partner”, obtained as the object's mirror reflection through an arbitrary mirror plane, cannot be made to coincide with the original object through proper rotations or translations. The chiral partners are therefore distinct assemblies of particles, though each particle shares the same neighbors in both (and therefore the chiral partners have the same Â). When a structure is built out of different types of particles, it generically has no mirror symmetries, even if the geometrical shape of the structure does. Consequently, both chiral partners are ground states, and in this paper we identify both as being the desired structure. For equilibrium yield, this difference is not consequential, but we will see at the end of the paper that the simultaneous assembly of both chiral partners can lead to kinetic effects relevant for the yield.

The fundamental question is to find Eqn. (3.1), and in particular to compute Z_junk, the partition function of the states with higher energy than the ground state. As in the heteropolymer problem, we seek to understand how many excited states there are and of what types. This requires identifying and enumerating the excited states of the system. To understand this, let us first consider the excited states of a colloidal cluster C, e.g. from the set of clusters we enumerated and used in Lecture 2. In particular, a local minimum (LM) state is a stable configuration of N particles and must have at least one bond fewer than C. Each LM is characterized by the number of broken bonds compared to C, B_LM, each bond costing an energy ε. As an example, Fig. 2 shows the energy landscape with the two lowest energy local minima that arise for the maximal alphabet of one of the N = 7 clusters. Each of these local minima has B_LM = 1. Kinetic landscapes of this type for a few of the clusters with N = 6 and 7 show that both the number of LMs and B_LM are quite variable between different cluster geometries.
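In code, the maximal-alphabet construction is a one-line mapping from Â to Î; a minimal sketch (uniform ±ε energies as in the text; the function name and toy target are ours):

```python
import numpy as np

def maximal_alphabet(A, eps=1.0):
    """Build the 'maximal alphabet' interaction matrix from a 0/1 adjacency
    matrix A: contacts of the target get energy -eps, all other pairs +eps."""
    I = np.where(np.asarray(A) == 1, -eps, eps)
    np.fill_diagonal(I, eps)      # no self-interaction term (assumption)
    return I

# a 4-particle chain as a toy target structure
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
print(maximal_alphabet(A))
```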

3.1. Defects

From the examples in Fig. 2, we see that in the case of small clusters, all LMs are obtained by permutations of two particles. Considering an arbitrary structure, the interactions given by Î imply that the permutation of far away particles i and j would break all their bonds with the rest of the structure. With these observations, we define a local defect as a permutation of two particles i, j that are in contact or share at least one neighbor. The energy of such a defect is determined by the local environment of particles i, j: the number of broken bonds is

(3.9) b_local defect = #NN(i) + #NN(j) − 2 · #NN(i, j),

where #NN(i) is the number of nearest neighbors of particle i, and #NN(i, j) the number of nearest neighbors shared by i and j, including the bond between i and j.
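A direct implementation of this counting rule, as a sketch (the adjacency-matrix convention follows Â above; the function name is ours):

```python
import numpy as np

def defect_cost(A, i, j):
    """Eq. (3.9): bonds broken by permuting particles i and j. The
    shared-neighbor count includes the i-j bond itself when present."""
    A = np.asarray(A)
    shared = int(A[i] @ A[j]) + int(A[i, j])
    return int(A[i].sum() + A[j].sum() - 2 * shared)

# toy check on a 4-particle chain: swapping the end pair breaks one bond
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
print(defect_cost(A, 0, 1))   # -> 1
```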

Figure 2. Energy landscape for an N = 7 cluster designed using the maximal alphabet. Only the lowest energy LMs are shown, both missing one bond compared to the ground state. Both can be obtained by permuting two particles in the ground state. #BB* is the minimal number of bonds that need to be broken for a transition between different states. #PW is the number of distinct pathways by which the transition can be achieved. For example, to transition from the ground state to the top local minimum one needs to break at least two bonds. One of the four pathways is to break the bonds between the particle pairs red–purple and front-yellow–purple, and then smoothly exchange the positions of the purple and blue particles before reconnecting the purple with the front-yellow.

When both i, j are positioned deep inside the bulk of the structure, we will call it a bulk defect. Bulk defects tend to have high energies, as there are many nearest neighbors in the bulk. Surface defects correspond to either or both of i, j being on the surface of the structure; these typically have fewer broken bonds and lower energy. Continuing the classification, the structure might have ridges and sharp apexes, leading to line and point defects, respectively.

Any low-energy local minimum is obtained as a configuration of a particular set of local defects. We neglect configurations where defects overlap, because as N grows the number of such configurations is negligible compared to the number of configurations with well-separated defect locations. In this limit, the energy of a configuration of defects is just the sum of the defects' individual energies. With this construction of the local minima, we can now compute the yield of a structure C as follows:

(3.10) Y_C^eq = 1 / ( 1 + Σ_m f(m) N_m e^{−βE_m} ).

Here, the sum in the denominator is over the different defect states, with m = 1 having the lowest energy. E_m is the energy of the defect state, which is given by E_m = b_local defect · ε. Similarly, N_m is the number of local defects of type m, and f(m) is the entropic correction for this defect type. The entropy arises from the fact that the defects are floppy structures, since they are missing bonds, and therefore their vibrational entropy will be larger than that of the ground state.

Note that there are two ways to enumerate the local defects: we can either systematically flip particles in the ground state structure, or we can consider every defect type and explicitly enumerate these. As an example, if we had a face-centered-cubic lattice, the bulk defects require breaking 16 bonds. Clearly every internal lattice site could be flipped, and so the number of such defects scales with the system size. On the other end of the spectrum, a point defect on a cube can only occur at the corners of the cube. The equation above predicts that 3 bonds will be broken, and the cube has 8 corners, so N₁ = 8.
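Once the defect types are tabulated, Eq. (3.10) is direct to evaluate; a sketch with toy inputs (the N₁ = 8, E₁ = 3ε entry mirrors the cube point-defect count just described; f and β are set to illustrative values):

```python
import numpy as np

def equilibrium_yield(defects, beta):
    """Eq. (3.10): Y = 1 / (1 + sum_m f(m) N_m exp(-beta E_m)).
    defects is a list of (N_m, E_m, f_m) triples."""
    junk = sum(f * N * np.exp(-beta * E) for (N, E, f) in defects)
    return 1.0 / (1.0 + junk)

eps = 1.0
print(equilibrium_yield([(8, 3 * eps, 1.0)], beta=2.0))
```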


Figure 3. Maximal equilibrium yield Y_max extracted from simulations (see SI Text) vs. the cluster size N, for all the alphabets of all the N = 6, 7 and 8 clusters, for maximal alphabets of all the N = 9 clusters, and for maximal alphabets of a subset of the N = 10 clusters. Big data points correspond to maximal alphabets, and small to all the other alphabet sizes. In general, maximal alphabets give the biggest yield.

3.2. Numerical simulations

We have tested this theory by carrying out numerical simulations, using dissipative particle dynamics (DPD) [6, 5], and measuring the equilibrium yield of various structures as a function of temperature. Our simulation contains N colloidal spheres of diameter D, with an interaction range of 1.05D (this range corresponds roughly to that of DNA-coated 1 μm particles [16]). The colloids are immersed in a DPD solvent of smaller particles. Colloids are modeled as spheres interacting through a 48−96 Lennard-Jones potential if they interact favorably, and through the repulsive part of that potential if they interact unfavorably. Simulations are run for a range of temperatures with a volume fraction of colloids φ_coll = 1/30, and a larger volume fraction of solvent φ_sol ≈ 0.2.

We have carried out simulations of two types of structures. First, we considered the clusters outlined in Lecture 2. For each cluster that was enumerated therein, we design the energy matrix so that the structure in question is the ground state. Fig. 3 shows the maximum of the yield as a function of N, for all of the structures of N = 6, 7, 8, 9 clusters. The decay of the yield with N can be quantitatively understood by enumerating the defect states as outlined above. Second, we have considered large complicated structures, and demonstrate in that case too that the theory works quantitatively. For more details, see the original manuscript [19].

LECTURE 4 Nucleation theory and multifarious assembly mixtures

Up until this point, we have discussed equilibrium properties of self assembled structures. We have shown that heterogeneous components allow significant control over the probability distribution of structures that form in equilibrium. We now turn to the subject of kinetics, and provide a few examples of what can be accomplished by controlling kinetics. We begin with the simplest type of extension: we choose interactions to design for the nucleation of structures. We will see that with heterogeneous components, there are interesting possibilities for kinetic control of nucleation [14].

1. Nucleation theory

Our discussion of equilibrium structures did not consider the time that it takes for a given structure to form. To motivate this subject, consider the fact that it is possible to supercool a substance below its nominal phase boundary. For example, liquid water can be cooled to a temperature that is significantly below 0°C without turning to ice. This is not an equilibrium effect, since in equilibrium we know that water should turn to ice in this regime. Instead, it arises because of the kinetics of transforming one phase to another. To understand this, let us consider the free energy change of converting a sphere of radius R to ice from water. This is given by

(4.1) ΔF = (μ_ice − μ_water) (4/3)πR³ + 4πγR².

Here μ_ice and μ_water are the chemical potentials of ice and water, respectively. When we are in a region of the phase diagram where ice is the free energy minimum, we have μ_ice < μ_water, so the first term in Eqn. 4.1 is negative. The second term represents the surface energy, and reflects the fact that the molecules on the outside of the sphere would be in contact with both ice and water if the phase change were to occur. This costs energy, and so this term is positive. A notional plot of this free energy change is in Fig. 1: at a critical radius R = R*, the free energy ΔF has a maximum. Any sphere with radius above R* will grow without bound, because its free energy decreases monotonically with size. Physically, the penalty of the surface is outweighed by the gain in free energy that is obtained by adding more molecules to the bulk. We can compute the critical radius as

R* = 2γ/|Δμ|,


Figure 1. Notional plot of free energy ΔF versus nucleation radius R. The dashed line denotes the nucleation radius, which is the critical radius above which the initial seed must be for the free energy to favor a growing crystal.

The corresponding free energy barrier at this radius is

(4.2) ΔF* = (16π/3) γ³/(Δμ)².

Now, if the initial bulk solution has no supercritical seeds in it, we need to ask what is the chance that such a seed forms spontaneously from thermal fluctuations. The rate at which this can happen can be derived from the Fokker-Planck equation,

(4.3) ∂_t P = ∂_r ( (dΔF/dr) P + D ∂_r P ),

and the answer is the classical Arrhenius formula,

(4.4) t_seed = τ₀ e^{βΔF*},

where τ₀ is a microscopic timescale and β is the inverse temperature, as before.

Exercise: Derive the Arrhenius formula directly from the Fokker-Planck equation. Hint: Consider a steady state solution with P(r = 0, t) = P₀ and P(r = 2R*, t) = 0. Then compute the flux J that passes from r = 0 → 2R*. The rate is then J/P₀.
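A numerical check of this construction is straightforward. With the hint's boundary conditions, the steady state of Eq. (4.3) gives rate J/P₀ = 1/∫₀^{2R*} e^{ΔF(r)} dr, which should track the Arrhenius factor e^{−ΔF*}. A sketch in toy units (D = kT = 1 and γ = |Δμ| = 1 are our assumptions):

```python
import numpy as np

dmu, gamma = 1.0, 1.0   # hypothetical values of |delta mu| and gamma
F = lambda r: -dmu * (4.0 / 3.0) * np.pi * r**3 + 4.0 * np.pi * gamma * r**2
Rstar = 2.0 * gamma / dmu

r = np.linspace(0.0, 2.0 * Rstar, 200001)
rate = 1.0 / np.trapz(np.exp(F(r)), r)   # J / P0 for the steady state
print(rate, np.exp(-F(Rstar)))           # same exponential (Arrhenius) scaling
```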

2. Magic soup

With nucleation theory as an introduction, we now turn to discuss a variant of nucleation theory that is possible using heterogeneous components. This idea was developed in collaboration with Arvind Murugan, Zorana Zervacic, and Stanislas Leibler [14].

Consider N different components that interact with each other through local interactions. We discussed in Lecture 3 how to design the components to robustly assemble into a single desired structure. The interactions are characterized by an N × N energy matrix U^J, where U^J_ij is the binding strength of component i to component j. We construct U^J so that every contact of the desired structure has a fixed large binding energy U^J_ij = −E, whereas every non-desired contact has a binding energy ε that is chosen from a distribution ρ(ε).

Now we would like to use this framework to program multiple structures. Namely, can we use this formalism to store many different structures in the same set of components? We are guaranteed that we cannot do this by having each of the stored structures be a global free energy minimum: at best they can each be local minima, with high energy barriers between them. For this to be useful, we also require that the monomer bath is itself stable, so that we can start out with all of the components in the bath at fixed concentrations. We then need a simple way to trigger the formation of the structures.

To program multiple structures, we need to choose the energy matrix U so that the components are capable of different final states. The simplest way to do this is to choose U_ij = −E if components i and j bind strongly in any of the different structures, and to choose non-desired contacts from ρ(ε). The full matrix U then has the potential of storing each structure U^J as a local energy minimum; however, this is not guaranteed. The energy landscape could contain structures that are chimeras, hybrids of pieces of structures from multiple energy minima. Such chimeras could corrupt individual stored structures, making them unstable.

For a bath of components to usefully code for a multiplicity of structures, two conditions must be met. First, the monomer bath of individual, unaggregated components must be stable. The ability to controllably stimulate individual structures requires that homogeneous nucleation of the individual structures cannot occur on the timescale of the experiment. Second, we need a useful trigger for practically causing a particular structure to form; such triggers could be nucleation seeds, or actionable modifications of bond strengths or chemical potentials. The question is whether a parameter regime exists in the model that can satisfy both of these criteria, which, as we will see, are somewhat antagonistic.

Monomer Bath Stability. Under what conditions is the monomer bath of individual components stable? Let us first consider a system which stores a single memory, and in which each component has the same chemical potential μ. Suppose that the coordination number of the components in the target structure is z. Then the free energy of the ground state is F = −(z/2)EV + μV + γS, where V, S are the volume and surface area of the structure, and γ measures the energy penalty for components on the boundary. If we assume that V = c_V r³ and S = c_S r², where r is the linear dimension of the target structure, then the nucleation barrier occurs at the radius

(4.5) r* = 2γc_S / ( 3c_V ( (z/2)E − μ ) ),

and the free energy barrier is F* = γc_S r*²/3. From our discussion of nucleation theory, we then know that the time to spontaneously nucleate a stable structure satisfies log(t*) ∼ γc_S r*²/(3k_BT).

There is thus an exponential, antagonistic relationship between t* and r*. Increasing the stability of the monomer bath by increasing t* requires increasing the nucleation radius r*. Hence more information is required to recover a stored structure.

Qualitatively, the classical nucleation argument still holds when multiple structures are stored in the bath, though there is a shift in thresholds. This can be understood as due to an effective decrease in the chemical potential: when there are m different stored structures made out of N components, a given potential nucleation seed could have multiple components that could strongly bind to it, thereby increasing the effective concentration of components that can bind from the bath. If we denote by ν(m, N) the typical number of components that can occupy a given position in a target structure with strong bonds to all neighbors, then μ → μ − ln ν(m, N). This means that for a given μ we expect the system to be more unstable (with smaller t*) as the number of stored structures increases.

Memory Capacity. How many different structures can be programmed into individual components before the chimeric structures proliferate? Consider a desired structure growing from a seed, with single components attaching to its boundary. If the coordination number is z, then each new component typically forms bonds with z/2 other components on the boundary. Each boundary component binds strongly to about m components, corresponding to each of the m different structures that have been programmed into the interactions. Let us assume that the set of strongly-binding species for each boundary component is an independent random set of size m, drawn from the N possible species. The intersection of z/2 such sets will have typical size N(m/N)^{z/2} = m^{z/2}/N^{z/2−1}. The number of components that satisfy this constraint is O(1) when m = m_c ∼ N^{(z−2)/z}. For m > m_c, many different species can attach strongly to the boundary of a growing seed, resulting in a proliferation of chimeric structures. Hence, in a typical three dimensional, densely packed structure, with z = 6, the number of structures that can be stored in N different components is ∼ N^{2/3}. If the contact graph is two dimensional, we expect z = 4 and hence m_c ∼ √N. The memory capacity for stored structures decreases with decreasing coordination number.

These basic notions can be tested in numerical simulations and experiments. In [14] we carried out Monte Carlo simulations of a lattice model, validating the theoretical arguments we have outlined here.
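The intersection estimate m^{z/2}/N^{z/2−1} can be checked by direct sampling of random sets; a sketch (the function name, sizes, and trial count are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_intersection(N, m, z, trials=2000):
    """Average number of species lying in all z/2 random m-subsets of N
    species: a proxy for the number of components that can attach strongly
    at a boundary site when m structures are stored."""
    hits = 0
    for _ in range(trials):
        common = set(rng.choice(N, size=m, replace=False))
        for _ in range(z // 2 - 1):
            common &= set(rng.choice(N, size=m, replace=False))
        hits += len(common)
    return hits / trials

# for z = 6 the mean should track m**3 / N**2, crossing O(1) near m ~ N**(2/3)
N, z = 1000, 6
for m in (50, 100, 200):
    print(m, mean_intersection(N, m, z), m ** (z / 2) / N ** (z / 2 - 1))
```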

Bibliography

1. N. Arkus, V. Manoharan, and M. Brenner, Minimal energy clusters of hard spheres with short range attractions, Phys. Rev. Lett. 103 (2009), 118303.
2. ———, Deriving finite sphere packings, SIAM J. Disc. Math. 25 (2011), no. 4, 1860–1901. MR2873224
3. Paul L. Biancaniello, Anthony J. Kim, and John C. Crocker, Colloidal interactions and self-assembly using DNA hybridization, Phys. Rev. Lett. 94 (2005), no. 5, 058302.
4. P. G. de Gennes, Statistics of branching and hairpin helices for the dAT copolymer, Biopolymers 6 (1968), no. 5, 715–729.
5. R. D. Groot and P. B. Warren, Dissipative particle dynamics: Bridging the gap between atomistic and mesoscopic simulation, The Journal of Chemical Physics 107 (1997), 4423–4435.
6. P. J. Hoogerbrugge and J. M. V. A. Koelman, Simulating microscopic hydrodynamic phenomena with dissipative particle dynamics, Europhysics Letters 19 (1992), 155–160.
7. S. Hormoz and M. P. Brenner, Design principles for self-assembly with short-range interactions, PNAS 108 (2011), no. 13, 5193–5199.
8. Y. Ke, L. L. Ong, W. M. Shih, and P. Yin, Three-dimensional structures self-assembled from DNA bricks, Science 338 (2012), no. 6111, 1177–1183.
9. Neil P. King, William Sheffler, Michael R. Sawaya, Breanna S. Vollmar, John P. Sumida, Ingemar Andre, Tamir Gonen, Todd O. Yeates, and David Baker, Computational design of self-assembling protein nanomaterials with atomic level accuracy, Science 336 (2012), no. 6085, 1171–1174.
10. Yen-Ting Lai, Neil P. King, and Todd O. Yeates, Principles for designing ordered protein assemblies, Trends in Cell Biology (2012).
11. B. McKay, Practical graph isomorphism, Congressus Numerantium 30 (1981), 45–87. MR635936
12. G. Meng, N. Arkus, M. Brenner, and V. Manoharan, The free-energy landscape of clusters of attractive hard spheres, Science 327 (2010), no. 5965, 560–563.
13. C. A. Mirkin, R. L. Letsinger, R. C. Mucic, and J. J. Storhoff, A DNA-based method for rationally assembling nanoparticles into macroscopic materials, Nature 382 (1996), no. 6592, 607–609.
14. A. Murugan, Z. Zervacic, M. P. Brenner, and S. Leibler, Multifarious assembly mixtures: Systems allowing retrieval of diverse stored structures, arXiv:1408.6893 (2014).
15. A. Murugan, J. Zou, and M. P. Brenner, Incorrect usage analysis: A principle for robust self assembly of heterogeneous structures, preprint (2014).
16. W. B. Rogers and J. C. Crocker, Direct measurements of DNA-mediated colloidal interactions and their quantitative modeling, PNAS 108 (2011), no. 38, 15687–15692.
17. P. Traub and M. Nomura, Studies on the assembly of ribosomes in vitro, Cold Spring Harbor Symposia on Quantitative Biology 34 (1969), no. 0, 63–67.
18. Bryan Wei, Mingjie Dai, and Peng Yin, Complex shapes self-assembled from single-stranded DNA tiles, Nature 485 (2012), no. 7400, 623–626.
19. Z. Zervacic, V. Manoharan, and M. P. Brenner, Size limits of self assembled colloidal structures made using specific interactions, preprint (2014).


The Effects of Particle Shape in Orientationally Ordered Soft Materials

P. Palffy-Muhoray, M. Pevnyi, E. G. Virga, and X. Zheng

IAS/Park City Mathematics Series Volume 23, 2014


Introduction

Our intent in these notes is to summarize some of the salient ideas presented in the four lectures given by one of us (PPM) with the help, as a teaching assistant, of another (MYP) at PCMI 2014. The lectures focus on how the shapes of constituent particles affect the behavior of orientationally ordered soft condensed matter systems. The systems considered are, almost exclusively, liquid crystals. Both in the original lectures and in these notes, we adhere to simple elementary concepts and approximate descriptions with the hope of providing insights less easily accessible via more formal and rigorous approaches.

© 2017 American Mathematical Society


LECTURE 1 Soft condensed matter and orientational order

1. Soft condensed matter

Condensed matter refers to systems where the number density ρ of particles of size l is on the order of l^{−3}. As the name implies, soft materials are indeed ‘soft’; that is, they are easily deformed. Can ‘easily deformed’ be better defined? One useful and popular definition is through Goldstone's Theorem [16, 17], which asserts that a broken continuous symmetry must give rise to low energy excitations (Nambu-Goldstone bosons in quantum field theory). Materials whose phase represents broken continuous symmetry may therefore be designated as soft. A simple example may be a system with rod-like particles – a liquid crystal – in which, at high temperatures, the rods are oriented randomly, while at low temperatures, they prefer to be parallel. If such a system, let us say spherical in shape, were to be cooled from the high temperature disordered phase to the low temperature ordered one in a region of empty space (without any fields), then the direction of average orientation of the particles in the ordered phase could be in any direction at all. The points on a sphere represent the possible directions of average orientation – this is the continuous symmetry (SO(3)) which is broken by the system choosing one particular direction/point. Since all orientations are equivalent, the energy of the system can only depend on the relative orientation of particles with respect to each other. Since the ground state is one where, on average, particles are oriented parallel to each other, the energy of the system near the ground state must depend on the gradient of the order parameter. If the order parameter is S, the energy density is expected to have the form

(1.1) E = (1/2) K (∇S)².

If S = S₀ + A cos qx, then

(1.2) E = (1/2) K A² q² sin² qx,

where λ = 2π/q is the wavelength of the distortion. This form of the energy is a signature of Goldstone modes, where long wavelength distortions have vanishingly small energy. This is the meaning of the term ‘soft’ in the context of Goldstone's Theorem. Since many systems have energy of this form, it is interesting to ask: what system is not soft? Systems where the order parameter couples to some external field, such as ferroelectrics, where the electric polarization couples to lattice strains [8, 9, 10]. Here the energy depends on the angle between lattice vectors and the polarization, not only on the polarization gradient; these are not soft. On the other hand, solid crystal lattices are soft, since phonon energies depend on the distance

between neighboring atoms, and the order parameter S is the displacement of atoms from their equilibrium positions [1, Chap. 22].

Soft systems with excitation energy of the form given in Eq. (1.1) have an interesting property: they cannot be ordered in one or two dimensions. Consider a system with a distortion of the form

(1.3) S = S₀ + Σ_i A_i cos(q_i · r);

the total energy of the system is

(1.4) (1/4) V K Σ_i A_i² q_i²,

where V is the volume occupied by the system, and, if the distortions are the results of thermal fluctuations, the equipartition theorem gives, for the mean squared fluctuation amplitude of one mode,

(1.5) (1/4) V K ⟨A_i²⟩ q_i² = (1/2) kT,

(1.6) ⟨A_i²⟩ = 2kT / (V K q_i²).

Summing over all the modes in n dimensions,

(1.7) ⟨(S − S₀)²⟩ ∼ ∫ kT/(V K q²) dⁿq.

If n = 1, then

(1.8) ⟨(S − S₀)²⟩ ∼ −(1/q)|_{q_min}^{q_max} = (1/(2π))(L − l) ∼ L,

where L is the sample size, and in large samples, fluctuations diverge in 1D. This is the Peierls instability [44, Chap. 5.3]. If n = 2, then

(1.9) ⟨(S − S₀)²⟩ ∼ ln q|_{q_min}^{q_max} = ln(q_max/q_min) ∼ ln L,

and in large samples, fluctuations diverge logarithmically in 2D. This is the Hohenberg-Mermin-Wagner instability [21, 37]. If n = 3, then

(1.10) ⟨(S − S₀)²⟩ ∼ q|_{q_min}^{q_max} = q_max − q_min ∼ 1/l − 1/L,

and fluctuations no longer diverge. Thus soft condensed matter can only be ordered in 3D and higher.
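These scalings are easy to confirm numerically. A sketch (units with kT/VK = 1; the cutoffs q_min = 2π/L and q_max = 2π/l follow the text):

```python
import numpy as np

# Mode sum of Eq. (1.7): integral of q**(n-1) / q**2 from q_min to q_max.
# It grows like L for n = 1, like ln L for n = 2, and saturates for n = 3.
l = 1.0
for L in (1e2, 1e4, 1e6):
    q = np.geomspace(2 * np.pi / L, 2 * np.pi / l, 200001)
    vals = [np.trapz(q ** (n - 1) / q ** 2, q) for n in (1, 2, 3)]
    print(f"L = {L:.0e}:  n=1: {vals[0]:.3g}  n=2: {vals[1]:.3g}  n=3: {vals[2]:.3g}")
```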

2. Position and orientation

Particles are bodies which may be thought of as sets of points. (These may be all the points ‘inside’ of a particle, or they may be the centers of the constituent atoms or molecules, or other sets defining the object.) If the positions of all of the points of a particle are specified, then the configuration (position and orientation) of the particle in space is completely determined. In the case of rigid bodies, where the distances between points do not change, it is convenient to specify the positions of points in a body-fixed coordinate system; these positions never change. In addition, to completely determine the configuration of the body, one only needs to specify the absolute positions of three points which determine the body-fixed frame. So if the absolute position of one of the points (say of the center of mass or volume) is known, together with any two (non-collinear) body-fixed vectors, then the position and orientation of the body is determined. The orientation of a rigid body is thus completely determined by two (non-collinear) body-fixed unit vectors. It is amusing to note that two finite solid objects cannot have the same position, but they can have the same orientation; objects with nonzero mass cannot change their position if their linear momentum is zero, but they can change their orientation if their angular momentum is zero [25].

We are interested in constructing ‘orientation descriptors’: quantities which uniquely specify orientation and remain unchanged under the allowed symmetry operations on the body in question. The ensemble averages of these orientation descriptors are the orientational order parameters, and play a key role in describing the behavior of orientationally ordered systems. One useful quantity in this context is the unit vector l̂, constructed from the positions of two points, say r₁ and r₂:

(1.11) l̂ = (r₁ − r₂) / |r₁ − r₂|,

as shown in Fig. 1.


Figure 1. The unit vector l̂ specifying the direction of the line joining two points in the body-fixed frame.

This unit vector l̂ specifies the direction of the line joining the two points; this could be one axis of the body-fixed frame. To completely specify orientation, another vector is needed. However, for simplicity, we restrict our attention here to cylindrically symmetric bodies, and, if the two points at r₁ and r₂ are on the symmetry axis, then l̂ is sufficient to determine the orientation of the body. If a single vector is sufficient to determine orientation, the body is uniaxial; if two vectors are needed, then it is biaxial.

The unit vector l̂ along the symmetry axis is a good orientation descriptor of the body provided the body has no other symmetries. If the body is also centrosymmetric, then l̂ is no longer a good orientation descriptor, since it changes sign under centroinversion. In this case an appropriate orientation descriptor would be the dyad

(1.12) l̂l̂ = [ l_x²  l_x l_y  l_x l_z ;  l_y l_x  l_y²  l_y l_z ;  l_z l_x  l_z l_y  l_z² ],

which is invariant under centroinversion. It is useful to make this tensor traceless by subtracting one third of its trace times the identity, and to normalize it so that, in diagonal form, the largest diagonal element is 1. This gives

(1.13) σ = (1/2)(3 l̂l̂ − I),

where I is the identity, or, in indicial form,

(1.14) σ_αβ = (1/2)(3 l_α l_β − δ_αβ).

This is a widely used orientation descriptor for a cylindrically symmetric object with centroinversion.

Determining the orientation descriptors (ODs) for objects with different symmetries is a challenging task; ODs may be as simple as two body-fixed unit vectors for objects with no symmetry, and as complicated as a number of higher-rank tensors generated from these two vectors. The proper approach for obtaining ODs is via group theory. In many cases, however, it is possible to obtain ODs through a simple pedestrian approach, which we illustrate with the example below. Consider the right equilateral triangular prism, shown in Fig. 2.


Figure 2. Right equilateral triangular prism.

In order to construct ODs, two vectors need to be identified. Vectors can be constructed from the centroid to the centers of the faces. One of the two required vectors could be a unit vector along the straight line from the centroid to the center of one of the rectangular faces. However, there are three equivalent rectangular faces, so there is no unique way to assign this vector. We therefore construct not one, but three equivalent unit vectors, â₁, â₂ and â₃. Similarly, we construct the two equivalent unit vectors b̂₁ and b̂₂. Next, we attempt to construct the simplest combinations of these vectors which are invariant under the symmetry operations of the object – threefold rotation about the vertical axis, and reflection about the horizontal midplane. The sum of the â's is invariant under threefold rotations about the vertical axis; however,

(1.15) â₁ + â₂ + â₃ = 0,

and so this quantity cannot serve as an OD. Similarly, the sum of the b̂'s is invariant under reflection about the midplane, but again

(1.16) b̂₁ + b̂₂ = 0.

It is therefore necessary to look at higher order quantities to form the ODs. The dyad

(1.17) (1/2)(b̂₁b̂₁ + b̂₂b̂₂)

is invariant under reflection, and does not vanish, so it can serve as one OD. We next construct, similarly, a dyad from the â's which is invariant under threefold rotations, but find that

(1.18) (1/3)(â₁â₁ + â₂â₂ + â₃â₃) = (1/2)(I − b̂₁b̂₁),

so the dyad from the â's carries no additional information. Forming mixed dyads does not help; for example,

(1.19) â₁b̂₁ + â₂b̂₁ + â₃b̂₁ + â₁b̂₂ + â₂b̂₂ + â₃b̂₂ = 0.

Continuing to third rank, we find that

(1.20) (1/3)(â₁â₁â₁ + â₂â₂â₂ + â₃â₃â₃)

is invariant under threefold rotation and does not vanish; it can therefore serve as an OD. Although higher order terms could be constructed, the two ODs, the second rank tensor (1/2)(b̂₁b̂₁ + b̂₂b̂₂) and the third rank tensor (1/3)(â₁â₁â₁ + â₂â₂â₂ + â₃â₃â₃), determine the orientation. Since b̂₂ = −b̂₁, we have (1/2)(b̂₁b̂₁ + b̂₂b̂₂) = b̂₁b̂₁ = b̂₂b̂₂ = b̂b̂. This simplification, however, does not work for third- or higher rank tensors. We note that if the triangular cross-section were not equilateral, then a single unique â vector could be identified (say along the line from the centroid to the center of the face with the smallest area), and it could serve as one OD, together with b̂b̂.

Applying this procedure to a regular tetrahedron, we let ĉ_i be the unit vector parallel to the line from the center of the tetrahedron to the center of face i. The OD is, interestingly, not a fourth-, but a third-rank tensor,

(1.21) (1/4)(ĉ₁ĉ₁ĉ₁ + ĉ₂ĉ₂ĉ₂ + ĉ₃ĉ₃ĉ₃ + ĉ₄ĉ₄ĉ₄),

in agreement with the literature [2, 11]. If the tetrahedron is deformed by scaling the length in the direction of one edge, due to the broken symmetry, the ODs are a vector and a second- and third-rank tensor [34].

The above described procedure can be formalized; we introduce it here to provide an elementary strategy to determine ODs without invoking the full formalism of group theory.
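These invariances are easy to verify numerically. A minimal sketch, assuming concrete unit vectors for an upright prism (the specific coordinates are our choice, not from the text):

```python
import numpy as np

# a-hats point from the centroid toward the three rectangular faces,
# b-hats toward the two triangular faces
ang = 2 * np.pi / 3
a = [np.array([np.cos(k * ang), np.sin(k * ang), 0.0]) for k in range(3)]
b = [np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, -1.0])]

print(np.allclose(sum(a), 0), np.allclose(sum(b), 0))   # Eqs. (1.15), (1.16)

# third-rank OD of Eq. (1.20) and its invariance under a threefold rotation
T = sum(np.einsum('i,j,k->ijk', v, v, v) for v in a) / 3
R = np.array([[np.cos(ang), -np.sin(ang), 0.0],
              [np.sin(ang),  np.cos(ang), 0.0],
              [0.0, 0.0, 1.0]])
TR = np.einsum('ip,jq,kr,pqr->ijk', R, R, R, T)
print(np.allclose(T, TR), not np.allclose(T, 0))        # invariant and nonzero
```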

3. Orientational order parameters

Orientational order parameters are, by definition, the ensemble averages of orientation descriptors. For nematic liquid crystals, consisting of rod- or disc-like molecules/particles, the order parameter is the Q-tensor,

(1.22) Q = ⟨σ⟩ = ⟨(1/2)(3 l̂l̂ − I)⟩,

or, more explicitly, in indicial notation,

(1.23) Q_αβ = ⟨(1/2)(3 l_α l_β − δ_αβ)⟩.

Since Q is traceless and symmetric, it may be diagonalized. Expressing l̂ in terms of the usual polar and azimuthal angles (θ, φ),

(1.24) l̂ = (sin θ cos φ, sin θ sin φ, cos θ),

and, in the diagonal frame,

(1.25) Q = diag( −(1/2)(S − P), −(1/2)(S + P), S ),

where

(1.26) S = ⟨(1/2)(3 cos² θ − 1)⟩,

and

(1.27) P = ⟨(3/2) sin² θ cos 2φ⟩.

The eigenvalues are scalars, giving a measure of the extent of alignment; the corresponding eigenvectors indicate the directions of alignment.

If S = P = 0, the system is disordered, isotropic; there are no nondegenerate eigenvectors.

If S ≠ 0 and P = 0, the system is ordered (nematic), uniaxial; there is one nondegenerate eigenvector indicating the alignment direction.

If S ≠ 0 ≠ P, the system is ordered (nematic), biaxial; there are three nondegenerate eigenvectors indicating alignment directions.

Frequently, nematic liquid crystal phases are uniaxial. In this case,

(1.28) Q = diag( −(1/2)S, −(1/2)S, S ).

It is then convenient to write the Q-tensor as

(1.29) Q = S (1/2)(3 n̂n̂ − I).

Here n̂ is the eigenvector associated with the unique eigenvalue S. The unit vector n̂ is called the nematic director; its sign is undetermined. It indicates the direction of average alignment of particles in the system. The eigenvalue S = ⟨P₂(cos θ)⟩, where P₂ is the Legendre polynomial of degree 2, is the scalar order parameter, with −1/2 ≤ S ≤ 1.
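As a concrete illustration of Eq. (1.22), the following sketch estimates Q, S, and the director from a sample of unit vectors (the sampling and function names are our own):

```python
import numpy as np

def q_tensor(L):
    """Q-tensor of Eq. (1.22): the average of (3 l l^T - I)/2 over the
    rows of an (M, 3) array of unit vectors."""
    return 1.5 * np.einsum('mi,mj->ij', L, L) / len(L) - 0.5 * np.eye(3)

rng = np.random.default_rng(1)
iso = rng.normal(size=(100000, 3))
iso /= np.linalg.norm(iso, axis=1, keepdims=True)   # isotropic sample
aligned = np.tile([0.0, 0.0, 1.0], (100, 1))        # perfect alignment

for L in (aligned, iso):
    w, v = np.linalg.eigh(q_tensor(L))
    # expect S = 1 for the aligned sample and S ~ 0 for the isotropic one
    print("S =", round(w[-1], 3), " director =", v[:, -1].round(3))
```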

LECTURE 2 The free energy

We want to describe the behavior of systems consisting of large numbers of particles. Classical mechanics describes the behavior of few particles. To obtain the equilibrium configuration of a system, the potential energy should be minimized. To describe the dynamics, one needs to consider kinetic as well as potential energy, and friction. Statistical mechanics describes the behavior of many particles. To obtain the equilibrium configuration of a system, the free energy should be minimized. To describe the dynamics, one needs to consider kinetic energy as well as the free energy, and dissipation.

There are similarities between potential energy and thermodynamic potentials, such as the free energy. The potential energy U of a particle, at any time, is just a number. What is useful is to know the potential energy subject to some constraint (say, the position r of the particle). Then we can obtain the force acting on the particle from the gradient (derivative w.r.t. r) of U; these forces change r and drive the dynamics. Similarly, the free energy F is just a number; what is useful is to know the free energy subject to some constraint (say, the value of the order parameter, such as magnetization M). Then we can obtain the thermodynamic force acting on the system from derivatives of F with respect to M; these thermodynamic forces change M and drive the dynamics.

There are a number of thermodynamic potentials; different ones are minimized in equilibrium under different conditions. The one we will be focusing on in this lecture is the Helmholtz free energy F, defined as

(2.1) F = E − TS, where E is the average total energy of the system, T is the (absolute) temperature, and S is the entropy. The first and second laws of thermodynamics tell us that at constant temperature T and volume V , the change in the Helmholtz free energy dF plus the work done by the system dW cannot be positive. That is,

(2.2) dF + dW ≤ 0.

It follows that if the system does no work, the Helmholtz free energy can only decrease. In equilibrium, therefore, at constant T and V, the Helmholtz free energy is a minimum. Other thermodynamic potentials are:

• Gibbs free energy G = E − TS + pV, where p is the pressure. In equilibrium at constant T and p, the Gibbs free energy G is a minimum.


• Enthalpy H = E + pV. In equilibrium at constant S and p, the enthalpy H is a minimum.
• Helmholtz free energy with fields, F = E − TS − (1/2)(D · E)V, where D is the electric displacement and E is the electric field. At constant T, V, and E, the Helmholtz free energy is a minimum.
• Helmholtz free energy with fields, F = E − TS + (1/2)(D · E)V. At constant T, V, and D, the Helmholtz free energy is a minimum.

1. Helmholtz free energy

We consider a dynamical system with n degrees of freedom in the Hamiltonian formalism. Thus, we let q = (q₁, ..., q_n) ∈ R^n denote collectively the generalized coordinates and q̇ = (q̇₁, ..., q̇_n) correspondingly the generalized velocities of the system. First, we construct the Lagrangian function

(2.3) L(q, q̇) := K(q, q̇) − U(q),

where K is the kinetic energy and U is the potential energy, taken to be a function of q only. Second, we define the conjugate momenta p = (p₁, ..., p_n) as

(2.4) p_j := ∂L/∂q̇_j.

Third, we employ (2.4) to change variables, q̇ → p, so that the total energy of the system, the Hamiltonian H := K + U, becomes a function of (q, p). Then the equations that govern the time evolution of the system in the variables (q, p) are

(2.5a) q̇ = ∂H/∂p,
(2.5b) ṗ = −∂H/∂q.

If Ω ⊂ R^n denotes the configuration space, that is, the set of all admissible q, then Ω × R^n ⊂ R^{2n}, which is the ambient space where the evolution governed by Eqs. (2.5a) and (2.5b) takes place, is called the phase space.

Now, following Gibbs [15, Chap. I], imagine a great many replicas of the system described by the same Hamiltonian H, all evolving according to Eqs. (2.5a) and (2.5b), but differing in the occupation of the phase space, that is, with respect to configurations and momenta, as a result, for example, of different initial conditions. Such an ensemble of systems is distributed in the phase space with a probability density ϱ : Ω × R^n → R⁺, which is stationary in time.

Different ensembles differ in their probability density ϱ. For a canonical ensemble,¹ ϱ is assumed to be given by

(2.6) ϱ(q, p) := (1/Z) e^{−H(q,p)/kT},

where k is the Boltzmann constant, T is the absolute temperature, and

(2.7) Z := ∫_{R^n} dp ∫_Ω dq e^{−H(q,p)/kT},

which is called the partition function, is defined so as to ensure that ϱ is normalized to unity. A qualitative property emerges immediately from Eq. (2.6): regions of the

¹In which the number of particles is fixed and the system is thought of as being in contact with a thermostat at a given temperature.

phase space with higher total energy are exponentially less likely to be occupied by the systems in the canonical ensemble than regions with lower total energy. The Helmholtz free energy associated with a canonical ensemble of systems is defined by

(2.8) F := −kT ln Z.

2. Trial free energy

So far the whole phase space Ω × R^n has been considered to be accessible to all systems in the ensemble. We wonder now what would be the effect on our discussion of constraints that prevent each system from occupying certain regions of Ω × R^n. Let A ⊂ Ω × R^n be the accessible region in the phase space, so that the evolution of every system in the canonical ensemble will be confined to A. This amounts to saying that the probability density of any admissible trial, which we denote by ϱ_trial, must vanish identically away from A. Correspondingly, the trial partition function, Z_trial, becomes

(2.9) Z_trial := ∫_A e^{−H(q,p)/kT} dp dq,

and the trial free energy, F_trial, is given by

(2.10) Ftrial := −kT ln Ztrial.

Were A to depend on a parameter, say t, F_trial would depend on t too, and a ‘dynamics’ of A(t) would be identified by the ‘trajectories’ in the phase space that make F_trial decrease.

3. Configurational partition function

Here and in the following, the partition function Z plays a central role; we shall endeavor to compute it in a number of significant cases. One way to simplify this task is reducing Z to a purely configurational quantity, by integrating out the momenta component of H in Eq. (2.7). For H quadratic in p, this amounts to extracting a factor (depending on T) from the right side of Eq. (2.7), which in turn, by Eq. (2.8), only affects F by an inessential additive constant, which shall be disregarded. Thus, from now on, Z will only depend on its configurational component and its definition Eq. (2.7) will be replaced by

(2.11) Z := ∫_Ω e^{−U(q)/kT} dq.

4. Pairwise interactions

Computing the Helmholtz free energy F is in general a formidable task. We see now how such a task can be accomplished for a special, albeit common class of systems, though a price still needs to be paid due to a number of approximations, which we deem not too severe. We begin by imagining that a dynamical system consists of N identical, possibly deformable particles,² whose configurations in space are described by m parameters,

²Thus, the particles we are considering are not mass-points.

q = (q₁, ..., q_m), living in the single-particle configuration space Ω₁, q ∈ Ω₁. Thus, in the notation of Section 1,

(2.12) Ω = Ω₁^N and q = (q₁, ..., q_N),

so that n = mN. For example, in the simplest case of identical rigid particles, m = 6 and the q_i can be chosen so as to represent both the translational degrees of freedom and the rotational degrees of freedom. We further assume that the N particles only interact in pairs, so that

(2.13) U(q) = (1/2) Σ_{i≠j}^N U(q_i, q_j),

where U : Ω₁ × Ω₁ → R is a function symmetric under exchange of its arguments,

1 N (2.13) U(q)= U(q , q ), 2 i j i= j=1 where U :Ω1 × Ω1 → R is a function symmetric under exchange of its arguments,

(2.14) U(qi, qj )=U(qj , qi), which represents the interaction energy of any two particles.

5. Soft and hard potentials

U(q₁, q₂) comprises the whole energy involved in the interaction between particles 1 and 2. It includes both slowly varying, long-range potentials, typically responsible for the attraction between particles, and rapidly varying, short-range potentials, typically responsible for the repulsion between particles, often steric in nature. We call soft the former component of U and hard the latter. In general, U can be decomposed into the sum of its attractive (soft) and repulsive (hard) components as

(2.15) U(q₁, q₂) = U^{(A)}(q₁, q₂) + U^{(R)}(q₁, q₂).

Though, to some extent, such a decomposition is arbitrary, the distinctive feature of U^{(R)}, which will be essential in the following, is its abrupt divergence when the interacting particles tend to come in contact with one another. For definiteness, we shall assume that U^{(R)} is arbitrarily close to zero when the interacting particles are not in contact, and diverges to +∞ as they touch. By combining Eq. (2.15), Eq. (2.13), and Eq. (2.11), we can write Eq. (2.8) in the form

(2.16) F = −kT ln [ (1/N!) ∫_{Ω₁^N} e^{−(1/2kT) Σ_{i≠j}^N U^{(R)}(q_i,q_j)} e^{−(1/2kT) Σ_{i≠j}^N U^{(A)}(q_i,q_j)} dq₁ dq₂ ⋯ dq_N ].

A classical justification for introducing the correction factor 1/N! in Eq. (2.16) is to account for the indistinguishability of the particles comprising the system, also called the ‘correct Boltzmann counting’. It has long been known³ that failing to introduce this correction factor would make the theory vulnerable to Gibbs' paradox for the entropy of mixing ideal gases, which is still a debated issue [4, 52, 55]. It has also been argued that the correction factor 1/N! should also be included in Eq. (2.16) for distinguishable particles [24, 51, 54]. We shall not dwell any further

³Apparently, to Gibbs himself, as effectively argued by Jaynes [24].

on this (still disputed) issue, but we shall simply adopt for F the form in Eq. (2.16). Expanding the integrand, we give this latter the following, more explicit form

(2.17) F = −kT ln [ (1/N!) ∫_{Ω₁} e^{−(1/2kT) Σ_{j=2}^N U^{(R)}(q₁,q_j)} e^{−(1/2kT) Σ_{j=2}^N U^{(A)}(q₁,q_j)} dq₁ × ∫_{Ω₁} e^{−(1/2kT) Σ_{j≠2}^N U^{(R)}(q₂,q_j)} e^{−(1/2kT) Σ_{j≠2}^N U^{(A)}(q₂,q_j)} dq₂ × ⋯ × ∫_{Ω₁} e^{−(1/2kT) Σ_{j=1}^{N−1} U^{(R)}(q_N,q_j)} e^{−(1/2kT) Σ_{j=1}^{N−1} U^{(A)}(q_N,q_j)} dq_N ].

We now concentrate on the first integrand on the right side of Eq. (2.17),

(2.18) I₁ := e^{−(1/2kT) Σ_{j=2}^N U^{(R)}(q₁,q_j)} e^{−(1/2kT) Σ_{j=2}^N U^{(A)}(q₁,q_j)},

and rearrange it in the following equivalent form,⁴

(2.19) I₁ = √( Π_{j=2}^N (1 − a_j^{(1)}) ) e^{−(1/2kT) Σ_{j=2}^N U^{(A)}(q₁,q_j)},

where

(2.20) a_j^{(1)} := 1 − e^{−U^{(R)}(q₁,q_j)/kT}.

The properties of U^{(R)} guarantee that a_j^{(1)} is a small quantity for all configurations accessible to the pair of particles (q₁, q_j), and so

(2.21) √( Π_{j=2}^N (1 − a_j^{(1)}) ) = Π_{j=2}^N √( 1 − a_j^{(1)} ) ≈ √( 1 − Σ_{j=2}^N a_j^{(1)} ).

This approximation is solely based on the properties of the repulsive potential; another approximation, instrumental to the computation of sums, will be introduced in the following section for attractive and repulsive potentials alike.

6. Mean-field free energy

Assuming that there is a large number N of particles in the system and that they are distributed in Ω₁ with number density ρ, both sums

(1/2) Σ_{j=2}^N U^{(A)}(q₁, q_j) and Σ_{j=2}^N a_j^{(1)},

featuring in Eq. (2.19), once use is made of Eq. (2.21), can be given approximate forms that depend on the unknown distribution ρ:

(2.22) (1/2) Σ_{j=2}^N U^{(A)}(q₁, q_j) ≈ (1/2) ∫_{Ω₁} ρ(q) U^{(A)}(q₁, q) dq =: E₁(q₁),

⁴We note here that since the value of the repulsive potential U^{(R)}(q_i, q_j) in the exponent of the first term on the r.h.s. of Eq. (2.18) is either 0 or ∞, the choice of the prefactor 1/2 is somewhat arbitrary. We have chosen to retain it since it arises naturally in our (nearly) parallel treatment of the attractive and repulsive parts of the potential, and also because it recovers the classic van der Waals equation of state for a uniform dilute system of hard spherical particles (cf. Eq. (2.32)). This prefactor can also be obtained via a novel cluster approximation for repulsive interactions [41].

(2.23) Σ_{j=2}^N a_j^{(1)} ≈ ∫_{Ω₁} ρ(q) ( 1 − e^{−U^{(R)}(q₁,q)/kT} ) dq =: V₁(q₁).

Equations Eq. (2.22) and Eq. (2.23) represent our mean-field approximation, which rests essentially on the assumption that all particles are distributed in Ω₁ with one and the same density ρ. We shall call E₁ the single-particle effective potential and V₁ the excluded volume fraction. In the mean-field approximation, all integrals in Eq. (2.17) become equal and they can be estimated in terms of E₁ and V₁ only; we readily obtain that

(2.24) F = −kT ln [ (1/N!) ( ∫_{Ω₁} √(1 − V₁(q)) e^{−E₁(q)/kT} dq )^N ].

It should be recalled here that for a deformable particle the configuration space Ω₁ will also allow for the configurational degrees of freedom in addition to the usual translational and orientational ones. It should also be noted that the free energy in Eq. (2.24) is indeed a functional of the single-particle density ρ, which features in both E₁ and V₁ through Eq. (2.22) and Eq. (2.23). In a way, for a prescribed ρ, we could also consider F in Eq. (2.24) as a trial free energy of the type considered in Section 2.

Example. To illustrate Eq. (2.24), we compute F for an isotropic system of hard spherical particles of radius R interacting with some long-range potential U^{(A)}(r) depending only on the distance r between their centers. Letting ρ := N/V be the number density of particles, considered uniform in space, we readily see from Eq. (2.22) that

(2.25) E₁ = −ρa,

where

(2.26) a := −2π ∫_R^∞ r² U^{(A)}(r) dr > 0.

Moreover, since the excluded volume of two equal spheres is 8 times their volume, it follows from Eq. (2.23) that

(2.27) V₁ = 8ρv₀,

where v₀ = (4π/3)R³. Making use of Eq. (2.25) and Eq. (2.27) in Eq. (2.24), we give the latter the form

(2.28) F = −kT ln [ (1/N!) ( V √(1 − 8ρv₀) e^{ρa/kT} )^N ].

By Stirling's approximation, ln N! = N ln N − N + O(ln N), and so Eq. (2.28) can also be written as

(2.29) F ≈ −kTN [ ln( (V/N) √(1 − 8ρv₀) ) + ρa/kT ] − kTN.

Since both T and N are fixed in the canonical ensemble, the additive constant in Eq. (2.29) can be safely disregarded. For sufficiently dilute systems, ρv₀ ≪ 1, so that ln √(1 − 8ρv₀) ≈ −4ρv₀, and Eq. (2.28) finally becomes

(2.30) F ≈ −NkT ln( 1/ρ − 4v₀ ) − Nρa.

Since the pressure p in the system can be derived from F through the thermodynamic relation

(2.31) p = −∂F/∂V = (ρ²/N) ∂F/∂ρ,

we readily arrive from Eq. (2.30) at

(2.32) (p + ρ²a)(1 − 4ρv₀) = ρkT,

which is the van der Waals equation of state.
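This last chain of steps is easy to verify symbolically; a sketch using sympy (not part of the original notes):

```python
import sympy as sp

rho, kT, a, v0, N = sp.symbols('rho kT a v0 N', positive=True)
F = -N * kT * sp.log(1 / rho - 4 * v0) - N * rho * a    # Eq. (2.30)
p = rho**2 / N * sp.diff(F, rho)                        # Eq. (2.31)
# (p + rho^2 a)(1 - 4 rho v0) - rho kT should vanish identically, Eq. (2.32)
print(sp.simplify((p + rho**2 * a) * (1 - 4 * rho * v0) - rho * kT))
```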

7. Density functional theory

As already remarked, the mean-field free energy F in Eq. (2.24) is indeed a functional of the single-particle density ρ. The theory that we now present has the objective of computing F systematically for a wide class of systems of interacting particles.

In Section 6 we have already succeeded in computing F from Eq. (2.24); the reason for that success is that the particles in the system were uniformly distributed in space. Here, to reproduce somehow that fortunate juncture, we split the system in a great number, say M, of subsystems, each with N_i particles, which can be regarded as uniformly distributed around the point q^(i) in the single-particle configuration space Ω₁, within a small region Ω₁^(i) with measure Δω_i. Setting ρ_i := ρ(q^(i)), and thinking of ρ as a prescribed function over the whole of Ω₁, we have that

(2.33) Ω₁ = ⋃_{i=1}^M Ω₁^(i) and N_i = ρ_i Δω_i, with Σ_{i=1}^M N_i = N.

In the following, we shall assume that the population N_i of each subsystem is very large, even though each member Ω₁^(i) of the partition of Ω₁ must have a very small measure to make the partition effective in computing integrals over Ω₁. Differently said, we assume that ρ is a function varying slowly in Ω₁, and so are also E₁ and V₁, once evaluated on such a ρ through Eq. (2.22) and Eq. (2.23), respectively. Since F is additive, we can write it as

(2.34) F[ρ] = Σ_{i=1}^M F_i[ρ_i],

where, by Eq. (2.24),

(2.35) F_i[ρ_i] := −kT ln [ (1/N_i!) ( ∫_{Ω₁^(i)} √(1 − V₁(q)) e^{−E₁(q)/kT} dq )^{N_i} ].

By combining Eq. (2.35) and Eq. (2.34), we arrive at

(2.36) F[ρ] = −kT ln Π_{i=1}^M (1/N_i!) [ (N_i/ρ_i) √(1 − V₁^(i)) e^{−E₁^(i)/kT} ]^{N_i},

where use has been made of Eq. (2.33) and the definitions

(2.37) V₁^(i) := V₁(q^(i)), E₁^(i) := E₁(q^(i)).

By Stirling's approximation,

(2.38) ln( (1/N_i!) N_i^{N_i} ) ≈ N_i,

and so Eq. (2.36) becomes

(2.39) F[ρ] ≈ −kT Σ_{i=1}^M N_i ( ln( √(1 − V₁^(i)) / ρ_i ) − E₁^(i)/kT ) − kTN.

Disregarding the last addendum, as it does not depend on ρ, and using again Eq. (2.33), we finally arrive at

(2.40) F[ρ] ≈ kT ∫_{Ω₁} ρ ln ρ dq − (kT/2) ∫_{Ω₁} ρ ln(1 − V₁) dq + ∫_{Ω₁} ρ E₁ dq,

where E₁ and V₁ are given by Eq. (2.22) and Eq. (2.23), respectively. Clearly, in seeking to minimize F[ρ], by Eq. (2.33), ρ must be subjected to the constraint

(2.41) ∫_{Ω₁} ρ dq = N.

A further approximation of F can be introduced by assuming that V₁ ≪ 1. Then Eq. (2.40) simplifies to

(2.42) F[ρ] ≈ kT ∫_{Ω₁} ρ ln ρ dq + (kT/2) ∫_{Ω₁} ρ V₁ dq + ∫_{Ω₁} ρ E₁ dq.

The functionals in Eq. (2.40) and Eq. (2.42) will form the basis of our further development.

An example: A nematic liquid crystal with attractive and repulsive interactions

Nematic liquid crystals are usually described either in terms of attractive interactions, as in Maier-Saupe theory [35], or in terms of repulsive interactions, as in Onsager theory [39]. Here we illustrate the usefulness of the Helmholtz free energy formalism derived above for describing nematic liquid crystals in such a way that both attractive and repulsive interactions are included in a single consistent model.

We use here the free energy expression of Eq. (2.42) augmented by a term enforcing the density constraint of Eq. (2.41). We write the positional and orientational coordinates explicitly, so

(2.43) F[ρ] = kT ∫ ρ(r₁, l̂₁) ln ρ(r₁, l̂₁) d³r₁ d²l̂₁ + λ ∫ ρ(r₁, l̂₁) d³r₁ d²l̂₁ + (kT/2) ∫ ρ(r₁, l̂₁) ∫ ρ(r₂, l̂₂) ( 1 − e^{−U^{(R)}(r₁,l̂₁,r₂,l̂₂)/kT} ) d³r₂ d²l̂₂ d³r₁ d²l̂₁ + (1/2) ∫ ρ(r₁, l̂₁) ∫ ρ(r₂, l̂₂) U^{(A)}(r₁, l̂₁, r₂, l̂₂) d³r₂ d²l̂₂ d³r₁ d²l̂₁,

where r denotes position and l̂ denotes orientation. The second term on the r.h.s. is the density constraint term with the Lagrange multiplier λ, the third term is the excluded volume fraction term, while the fourth is the mean field energy. It is useful to define the local number density

(2.44) ρ₀(r) = ∫ ρ(r, l̂) d²l̂,

and the local orientational distribution function

(2.45) P(r, l̂) = ρ(r, l̂) / ρ₀(r).

The attractive interaction. The key quantity representing long-range attractive interactions in mean field theory is the single particle effective potential,

(2.46) E₁(r₁, l̂₁) = (1/2) ∫ ρ(r₂, l̂₂) U^{(A)}(r₁, l̂₁, r₂, l̂₂) d³r₂ d²l̂₂.

Maier-Saupe theory is based on anisotropic London dispersion, discussed in Lecture 3. The interaction potential is of the form

(2.47) U^{(A)} = −( U₀ / |r₁ − r₂|⁶ ) ( a′ + b′ σ_αβ^{(1)} σ_αβ^{(2)} ) for |r₁ − r₂| ≥ r₀, and U^{(A)} = 0 for |r₁ − r₂| < r₀,

where σ^{(i)}_{αβ} is the orientation descriptor (Eq. (1.23)) of particle i, r_i is the position of particle i, and U_0, a′ and b′ are parameters of the potential. This is a long-range attractive potential which is a minimum if the particles are parallel. It follows that the single particle effective potential is

(2.48)  E_1(l̂) = ∫ ρ(r_2, l̂_2) U^{(A)}(r_1, l̂_1, r_2, l̂_2) d³r_2 d²l̂_2 = −ρ_0 v_0 U_0 ( a′ + b′ S P_2(cos θ) ),

where P_2(cos θ) = ½(3cos²θ − 1), cos θ = l̂ · n̂ with n̂ the nematic director, and S = ⟨P_2(cos θ)⟩ as before. Equation (2.48) is obtained under the assumption that both the local number density ρ_0 and the local orientational distribution function P are uniform in the whole space; v_0 is an effective particle volume related to the cutoff radius r_0 (see also Lecture 3).

The repulsive interaction. The key quantity representing hard core repulsive interactions is the excluded volume,

(2.49)  V_exc(l̂_1, l̂_2) = ∫ ( 1 − e^{−U^{(R)}(r_1, l̂_1, r_2, l̂_2)/kT} ) d³r_2,

and, in mean field theory, the excluded volume fraction

(2.50)  V_1(l̂) = ∫ ρ(r_2, l̂_2) ( 1 − e^{−U^{(R)}(r_1, l̂_1, r_2, l̂_2)/kT} ) d³r_2 d²l̂_2.

Onsager theory is based on hard-core repulsion. In his original work, Onsager [39] considered a system of long rigid cylinders of length l and diameter d, as shown in Fig. 1. The excluded volume for these is of the approximate form

(2.51)  V_exc = 8v_0 ( 1 + (l/πd) sin θ_12 ),

where θ_12 is the angle between the symmetry axes, and v_0 is the volume of each cylinder (see Eq. (30) of [39]). In this work, we consider ellipsoids with semi-axis lengths a and b, as shown in Fig. 2. The excluded volume in this case is, to a good approximation [49],

(2.52)  V_exc = v_0 ( c − d′ P_2(cos θ_12) ).


Figure 1. Two hard cylinders for excluded volume calculation.


Figure 2. Two hard ellipsoids for excluded volume calculation.

The dependence of V_exc on θ_12 in Eq. (2.51) and in Eq. (2.52) is quite similar; the form in Eq. (2.52), however, turns out to be far more tractable subsequently. The excluded volume fraction is of the form

(2.53)  V_1(l̂) = ρ_0 v_0 ( c − d′ S P_2(cos θ) ),

where cos θ = l̂ · n̂, n̂ is the nematic director, and S = ⟨P_2(cos θ)⟩.

Model with attractive and repulsive interactions. On substituting the mean field potential and excluded volume fraction into the Helmholtz free energy, we obtain from Eq. (2.42) that

(2.54)  F = kT ∫ ρ_0 P(l̂_1) ln( ρ_0 P(l̂_1) ) d³r_1 d²l̂_1 + λ ∫ ρ_0 P(l̂_1) d³r_1 d²l̂_1
          + (kT/2) ∫ ρ_0 P(l̂_1) { ρ_0 v_0 ( c − d′ S P_2(cos θ) ) } d³r_1 d²l̂_1
          + ½ ∫ ρ_0 P(l̂_1) { −ρ_0 v_0 U_0 ( a′ + b′ S P_2(cos θ) ) } d³r_1 d²l̂_1.

This expression for the free energy consistently includes both long-range attractive and hard core repulsive interactions. We note that the last two terms can be combined; then

(2.55)  F = kT ∫ ρ_0 P(l̂_1) ln( ρ_0 P(l̂_1) ) d³r_1 d²l̂_1 + λ ∫ ρ_0 P(l̂_1) d³r_1 d²l̂_1
          + (kT/2) ∫ ρ_0 P(l̂_1) ρ_0 v_0 { A − B S P_2(cos θ) } d³r_1 d²l̂_1,

where

(2.56)  A = c − U_0 a′/kT

and

(2.57)  B = d′ + U_0 b′/kT;

A includes the isotropic and B the anisotropic contributions from both attractive and repulsive parts of the interaction potential. Minimizing the free energy with respect to the orientational distribution P(l̂) gives

(2.58)  P(θ) = e^{φBS P_2(cos θ)} / ∫ e^{φBS P_2(cos θ)} d(cos θ),

where φ = ρ_0 v_0 is the volume fraction. Substituting into Eq. (2.55), the free energy density F (per unit volume) becomes

(2.59)  F v_0 / kT = φ ln φ + ½ φ² A + ½ φ² B S² − φ ln ∫_0^1 e^{φBS P_2(x)} dx,

and minimizing it w.r.t. S, we obtain the self-consistent equation for S,

(2.60)  S = ∫_0^1 P_2(x) e^{φBS P_2(x)} dx / ∫_0^1 e^{φBS P_2(x)} dx.

Solving the self-consistent equation gives the order parameter S.

If we set c = d′ = 0, only the attractive contributions remain; we recover Maier-Saupe theory. The order parameter as a function of normalized temperature is then shown in Fig. 3. At low temperatures, the system is in a stable nematic phase; as the temperature is increased, it makes a discontinuous transition to the isotropic phase. Here kT* = (1/5) ρ_0 v_0 U_0 b′; the transition occurs at T_NI = 1.101 T*, where S_NI = 0.43. In addition to the stable nematic phase, solutions of the self-consistent equation exist for unstable and metastable phases, as indicated. Above T† = 1.114 T*, only the isotropic solution exists.

If we set a′ = b′ = 0, only steric interactions remain; we then obtain our modified Onsager theory. The order parameter as a function of the dimensionless density ρ_n = ρ_0/ρ_0*, where ρ_0* v_0 d′ = 5, is shown in Fig. 4. The transition occurs at ρ_0,NI = ρ_0*/1.101, and S_NI = 0.43.

If we keep both attractive and repulsive interactions, we have a combined Maier-Saupe and modified Onsager theory. The order parameter in this case is shown as a function of both temperature and density in Fig. 5. To identify the solutions which minimize the free energy, the free energy must be calculated. The Helmholtz free energy, from Eq. (2.59), is shown as a function of both temperature and density in Fig. 6. The scaling on the vertical axis was chosen to amplify details for small values of the free energy.
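The self-consistent equation (2.60) is easily solved numerically. The following minimal sketch (ours, not part of the original notes) iterates the fixed point on a grid in x = cos θ; the coupling values and the starting guess S = 0.6, which picks out the nematic branch, are illustrative choices:

    # fixed point of Eq. (2.60); for Maier-Saupe (c = d' = 0), phiB = 5 T*/T,
    # so the transition T = 1.101 T* corresponds to phiB = 5/1.101 = 4.541
    import numpy as np

    n = 2000
    x = (np.arange(n) + 0.5) / n              # midpoint grid on (0, 1)
    P2 = 0.5 * (3.0 * x**2 - 1.0)

    def order_parameter(phiB, S=0.6, iters=2000):
        for _ in range(iters):
            w = np.exp(phiB * S * P2)
            S = (P2 * w).mean() / w.mean()    # midpoint rule for both integrals
        return S

    for phiB in [4.0, 5.0 / 1.101, 5.0, 6.0]:
        print(f"phiB = {phiB:.3f}   S = {order_parameter(phiB):.3f}")
    # at phiB = 4.541 the nematic branch gives S ~ 0.43, the quoted S_NI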


Figure 3. Order parameter vs. temperature. Maier-Saupe theory.


Figure 4. Order parameter vs. density. Modified Onsager model.

At any temperature and pressure, the stable equilibrium solution corresponds to the lowest free energy surface.

Figure 5. Order parameter as function of dimensionless temperature and density (T′ = U_0 b′/kd′, ρ_n = (1/5) ρ_0 v_0 d′). Combined Maier-Saupe and modified Onsager theory.

Figure 6. Free energy density as function of dimensionless temperature and density (T′ = U_0 b′/kd′, ρ_n = (1/5) ρ_0 v_0 d′). Combined Maier-Saupe and modified Onsager model.

LECTURE 3 Particle shape and attractive interactions

As discussed in Lecture 1, the behavior of bulk condensed matter depends on interparticle interactions, which can be regarded as consisting of attractive and repulsive contributions. In this lecture, we focus on the attractive part of the interactions, its origins and its connection with particle shape.

In many systems, the interaction between two spherical particles is adequately described by the Lennard-Jones potential

(3.1)  U_LJ = 4ε [ (σ_0/r)^{12} − (σ_0/r)^6 ],

which features a single potential well. In Eq. (3.1), ε is the depth of the well, and σ_0 is the distance between particle centers when the potential is zero. U_LJ is shown in Fig. 1.


Figure 1. Lennard-Jones potential. The distance r_m at which U_LJ attains its minimum is r_m/σ_0 = 2^{1/6}.
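A two-line numerical check of Eq. (3.1) and of the minimum quoted in the caption; this sketch is ours, with ε = σ_0 = 1 as arbitrary units:

    import numpy as np

    def u_lj(r, eps=1.0, sigma0=1.0):
        """Lennard-Jones pair potential, Eq. (3.1)."""
        x = (sigma0 / r) ** 6
        return 4.0 * eps * (x * x - x)

    r = np.linspace(0.9, 3.0, 200001)
    i = np.argmin(u_lj(r))
    print(r[i], 2.0 ** (1.0 / 6.0))   # both ~1.12246: minimum at r_m = 2^(1/6) sigma0
    print(u_lj(r[i]))                 # well depth ~ -1, i.e. -eps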

A variety of different mechanisms contribute to the attraction of particles in condensed matter systems; the resulting attractive force is referred to as the van der Waals force. The van der Waals force between two particles consists of three main contributions:


• the interaction between two permanent electric dipoles (Keesom force [26, 27, 28, 29]),
• the interaction between one permanent and one induced dipole (Debye force [6, 7]),
• the interaction between two induced dipoles (London dispersion force [33]).

The Keesom force operates only if both particles have permanent electric dipole moments, and it is only attractive for certain particle orientations. Hydrogen bonds are due to Keesom forces [22]. The Debye force is due to a permanent electric dipole inducing a dipole in a polarizable particle; the resulting force is always attractive. Here we focus on London dispersion, which is always present; it is also the dominant interaction for non-polar particles. Since it holds ordinary soft matter together, it is interesting to inquire about its particulars.

The London dispersion interaction proceeds by the following steps:
(1) a quantum fluctuation in a non-polar particle creates a spontaneous short-lived dipole moment,
(2) the field of this spontaneous dipole polarizes the second non-polar particle,
(3) the field of the second dipole interacts with the dipole moment of the first.

Though all dipolar fluctuations in the particle inducing the interaction average to zero, the resulting interaction force does not. The interaction between the particles is always attractive; the electric fields are non-radiative, and the process may be viewed as virtual photon exchange. A number of interesting questions arise, which we discuss below. We use elementary classical physics to make estimates. The first question is: how polarizable are the particles?

1. Polarizability of a simple atom

The electric dipole moment p of a collection of charges q_i at locations r_i is defined as

(3.2)  p = Σ_i q_i r_i.

It is independent of the choice of origin if the net charge is zero. If a non-polar particle is placed in an electric field E, it will become polarized; the polarizability tensor α is defined by

(3.3)  p = α E.

We consider a simple model of a spherical atom, where the nucleus with positive charge is surrounded by a spherical electron cloud of radius R and uniform charge density q_e ρ, where q_e is the charge of an electron, and ρ is the number density of electrons. Inside the electron cloud, the electric field at position r from the center, due only to the electrons, is given by Gauss' law [23, Sec. 1.3],

(3.4)  E_i = ( q_e ρ (4/3)πr³ / 4πε_0 r² ) r̂ = ( q_e ρ / 3ε_0 ) r,

where ε_0 is the permittivity of free space. If such an atom is placed in a static external electric field E, the nucleus and the center of the electron cloud will separate until the net electric field at the site of the nucleus is zero, that is, when E_i + E = 0.

Assuming that the electron cloud is not deformed by the external field, this gives at once that

(3.5)  E = −( q_e ρ / 3ε_0 ) r.

Since q_e < 0, the nucleus is displaced along the applied field relative to the electron cloud's center (see Fig. 2). The total charge of the electron cloud is q_e ρ V, where V is the volume of the atom; since, by the atom's neutrality, the charge at the nucleus is −q_e ρ V, it follows that

(3.6)  p = −q_e ρ V r = 3ε_0 V E.

The dipole moment is parallel to the polarizing field E, and the static scalar polarizability is

(3.7)  α = 3ε_0 V.

This is a remarkable result – the polarizability is independent of the charge on the nucleus or the number of electrons; it only depends on the atomic volume!


Figure 2. Schematic showing the displacement of the electron cloud away from the nucleus.

Eq. (3.5) shows that the displacement, relative to the nucleus, of the electrons, r_e = −r, is proportional to the external electric field E, and the restoring force on each electron, which establishes equilibrium, is

(3.8)  F = −q_e E = −( q_e² ρ / 3ε_0 ) r_e,

which is proportional and opposite to the displacement r_e, so the electrons behave as though they were attached to a spring with spring constant

(3.9)  k = q_e² ρ / 3ε_0.

The frequency dependent polarizability can be obtained from solving the equation of motion for one electron, in one space dimension,

(3.10)  m_e ẍ + βẋ + ( q_e² ρ / 3ε_0 ) x = q_e E_0 e^{−iωt},

where m_e is the mass of an electron, and β > 0 is a damping constant. Letting x = x_0 e^{−iωt} gives

(3.11)  ( −ω² − iωβ′ + (1/3)( q_e² ρ / ε_0 m_e ) ) x = ( q_e / m_e ) E_0 e^{−iωt},

where β′ = β/m_e. Since the dipole moment p = V q_e ρ x = αE, the polarizability is

(3.12)  α = ε_0 V ω_p² / ( (1/3)ω_p² − ω² − iβ′ω ),

where

(3.13)  ω_p² = q_e² ρ / ε_0 m_e

defines the plasma frequency. We note that for hydrogen, ω_p² = 5.0917 × 10^{33} s^{−2}, which corresponds to light with wavelength 26 nm, in the far ultraviolet. For frequencies well below the plasma frequency, and in the case of small damping, α ≈ 3ε_0 V as in the static case.
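The hydrogen numbers just quoted are easy to reproduce. In the following sketch (ours) we take one electron spread uniformly over a sphere of one Bohr radius, which is the natural reading of the model, and recover ω_p² and the corresponding wavelength up to rounding of the constants:

    import math

    e, eps0, me = 1.602176634e-19, 8.8541878128e-12, 9.1093837015e-31
    c, a0 = 2.99792458e8, 5.29177210903e-11   # speed of light, Bohr radius

    rho = 1.0 / ((4.0 / 3.0) * math.pi * a0**3)   # electron number density
    wp2 = e**2 * rho / (eps0 * me)                # Eq. (3.13)
    wp = math.sqrt(wp2)
    print(f"wp^2   = {wp2:.3e} s^-2")             # ~5.1e33 s^-2, as quoted
    print(f"lambda = {2.0 * math.pi * c / wp * 1e9:.1f} nm")   # ~26 nm, far UV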

2. Dispersion interaction

London dispersion is due to the formation of a spontaneous dipole in one particle, whose field polarizes a second particle. The field at point r of an oscillating dipole p at the origin is [23, Sec. 9.2]

(3.14)  E(r, t) = (1/4πε_0) [ (3r̂r̂ − I)( 1/r³ − ik/r² ) + (I − r̂r̂) k²/r ] p e^{i(k·r − ωt)},

where k = 2π/λ = ω/c, λ is the wavelength, and c is the speed of light.

How large is p? This cannot be answered purely classically, since the origins of the quantum fluctuations which give rise to the polarization are in the uncertainty principle. On estimating the polarization from the uncertainty in position of the electron cloud, or alternately, through dimensional analysis, one obtains

(3.15)  ⟨p²⟩ = hνα,

where h is Planck's constant, and ν is the resonant frequency of the atom or molecule. Typical resonant frequencies of atoms are well below the plasma frequency, so the corresponding wavelengths are considerably longer than typical interatomic distances. Consequently, in Eq. (3.14), kr ≪ 1, so the first term dominates. We note that the last term, with 1/r dependence, describes radiation; the first term is a non-radiating term. Knowing the dipole field and the polarizability, it is straightforward to calculate the magnitude of the induced dipole, and its electric potential. The energy of the system of two particles follows; it is

(3.16)  U = −(3/4) hν α² / ( (4πε_0)² R⁶ ),

where R is the separation of the centers of the particles. Typical resonant frequencies for atoms are ∼ 1.5 × 10^{15} Hz (λ ∼ 200 nm), in the ultraviolet, but again, these are non-radiating modes.

A remarkable consequence of this form of the interaction is the following. For two spherical atoms having radius r, separated by a distance R, as shown in Fig. 3, and interacting via London dispersion, the energy is

(3.17)  U = −(3/4) hν (4πε_0 r³)² / ( (4πε_0)² R⁶ ) = −(3/4) hν (r/R)⁶,

where we have used the result that α ≈ 3ε_0 V = 4πε_0 r³. If ν = 1.5 × 10^{15} Hz, then hν ≈ 200 k_B T_rm, where k_B is Boltzmann's constant and T_rm ≈ 300 K is room temperature. If the particles are close, then R ≈ 2r, and

(3.18)  U ≈ −k_B T_rm,

regardless of particle size. If the interaction strength is comparable to k_B T_rm, this implies that at the temperatures where we live, phase transitions abound; solids can melt, liquid-vapor interfaces can exist, and complex behavior of materials can occur.
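The orders of magnitude in this argument can be checked directly; the sketch below (ours) evaluates hν/k_B T_rm and Eq. (3.17) at contact, R = 2r:

    h, kB = 6.62607015e-34, 1.380649e-23
    nu, Trm = 1.5e15, 300.0

    print(h * nu / (kB * Trm))      # ~240: hv is ~200 kB*Trm, as stated
    U = -0.75 * h * nu * 0.5 ** 6   # Eq. (3.17) with r/R = 1/2
    print(U / (kB * Trm))           # ~ -2.8: of order -kB*Trm, Eq. (3.18)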


Figure 3. Two atoms interacting via London dispersion.

3. Polarizability of non-spherical particles

As indicated in the preceding arguments, it is clear that polarizability is a key aspect of particles in determining the attractive part of interparticle interactions. The polarizability tensor of spherical particles is, in our simple model,

(3.19)  α = ε_0 V [ ω_p² / ( (1/3)ω_p² − ω² − iβ′ω ) ] I,

where we have indicated that the induced polarization is along the polarizing electric field. Now we ask: what is the form of the polarizability tensor if the particles are non-spherical?

Here it is useful to think of molecules or particles consisting of many atoms. For simplicity, we assume that the composition is uniform; that is, that all the constituent atoms are identical. Consider two bodies, consisting of spherical atoms, in a uniform external field E, as shown in Fig. 4.


Figure 4. Particles consisting of identical spherical particles in a uniform field.

If the number of atoms is very large, that is, in the continuum limit, the polarization inside the bodies can be shown to be uniform [23, Sec. 4.5], so long as the particle shape is ellipsoidal. In spherical particles, the polarization inside is parallel to the polarizing electric field E, but for non-spherical ellipsoids, the polarization is not parallel to the field, due to broken symmetry. Here each atom feels not only the external field, but also the electric field of the neighboring dipoles. This effect can be taken into account as follows. The potential φ_d, at point r, of an electric dipole p at the source point r′ is given by

(3.20)  φ_d(r) = p · ∇′ ( 1 / 4πε_0 |r − r′| ),

where ∇′ denotes the gradient with respect to r′. The electric polarization P is, by definition, the dipole density; that is,

(3.21)  p = ∫_volume P d³r,

where P is uniform in space. The electric potential in the body at the point r due to the presence of dipoles at r′ is just the sum of the contributions from the neighbors,

(3.22)  φ_d(r) = ∫_volume P · ∇′ ( 1 / 4πε_0 |r − r′| ) d³r′ = ∮_surface ( P · N̂ / 4πε_0 |r − r′| ) dA′,

where N̂ is the outward surface normal. This is just the potential due to surface charges with surface charge density P · N̂. The key point is that the field inside depends strongly on the particle shape. This is shown in Fig. 5.


Figure 5. Polarization induced surface charges on ellipsoidal particles.

So the electric field inside the ellipsoids is the external polarizing field E, plus the depolarizing field due to the surface charges originating from the polarization. The depolarizing field can be obtained at once from the potential;

(3.23)  E_dep = −∇φ_d(r) = −N P / ε_0,

where the depolarizing factor tensor N is given by

(3.24)  N = ∇ ∮_surface ( N̂ / 4π|r − r′| ) dA′,

where r is the position of a point inside the volume, and r′ is a point on the surface. For an ellipsoid, the depolarizing factor tensor is a constant, independent of r. We note that the result in Eq. (3.23) is only valid for ellipsoids.

Next, we calculate the polarizability of such bodies, taking into account the effects of shape. As before, we can write down the equation of motion of one electron, where we include the depolarizing field in addition to the polarizing field E. This is

(3.25)  m_e r̈ + βṙ + kr = q_e E_0 e^{−iωt} − ( q_e / ε_0 ) N P.

Noting that P = ρ q_e r, substitution gives

(3.26)  [ ( −ω² − iωβ′ + ω_0² ) I + ω_p² N ] P = ω_p² ε_0 E,

where ω_0 = √(k/m_e) is the natural oscillation frequency of the electron. For frequencies well below the plasma frequency, the first term on the l.h.s. may be ignored. This is usually the case for London dispersion. In this case, the polarizability for ellipsoidal particles becomes

(3.27)  α = ε_0 V N^{−1}.

Closed form results for the polarizability are only available for ellipsoids; information about the shape of the ellipsoid is contained in the depolarizing factor tensor.

The depolarizing factor tensor

In spite of its apparent simplicity, the evaluation of the expression for the depolarizing tensor Eq. (3.24) is in general prohibitively difficult. The geometry is shown in Fig. 6.


Figure 6. Geometry for evaluating the depolarizing factor tensor for an ellipsoid.

The first solution was provided by Poisson, who studied magnetic systems. The expression for the demagnetizing field has the same structure, and the demagnetizing and depolarizing tensors of the same body are the same. Maxwell writes [36]:

The mathematical theory of magnetic induction was first given by Poisson in Mémoires de l'Institut, 1824. ... The case of an ellipsoid placed in a field of uniform and parallel magnetic force has been solved in a very ingenious manner by Poisson. If V is the potential at the point (x, y, z) due to the gravitation of a body of any form of uniform density ρ, then −dV/dx is the potential of the magnetism of the same body if uniformly magnetized in the direction of x with the intensity I = ρ.

Poisson's trick is to make use of the observation that (in our electric case) if an ellipsoid with uniform positive charge density +ρ_e and center at ½Δr and an identical ellipsoid with uniform negative charge density −ρ_e and center at −½Δr were superimposed, one would have uniform polarization P = ρ_e Δr inside, and surface charge density ρ_e Δr · N̂, as we have in our problem. If the potential V_+ at r due to the ellipsoid with positive charge density +ρ_e and center at ½Δr is V(r − ½Δr), the potential V_− at r due to the ellipsoid with negative charge density −ρ_e and center at −½Δr is −V(r + ½Δr). The potential due to their superposition is

(3.28)  V_tot = V(r − ½Δr) − V(r + ½Δr) ≈ −∇V(r) · Δr.

If the electric potential V(r) of an ellipsoid with uniform charge density is known, we can calculate at once the potential of a uniformly polarized ellipsoid. We can then evaluate the resulting depolarizing field, and thus arrive at an explicit expression for the depolarizing factor tensor N.

The gravitational potential of an ellipsoidal body with uniform mass density ρ_m has been calculated by Thomson [30]. If the shape of the body is represented by

(3.29)  x²/a² + y²/b² + z²/c² ≤ 1,

then, defining

(3.30)  q_0 = ∫_0^∞ ds / √( (a² + s)(b² + s)(c² + s) ),

and

(3.31)  L = −abc ∂q_0/∂a² = (abc/2) ∫_0^∞ ds / ( (a² + s) √( (a² + s)(b² + s)(c² + s) ) ),

(3.32)  M = −abc ∂q_0/∂b²,

(3.33)  N = −abc ∂q_0/∂c²,

the gravitational potential in the ellipsoid is, up to an additive constant,

(3.34)  V(r) = G (ρ_m/2) ( Lx² + My² + Nz² ),

where G is the gravitational constant. Using this result, in our electric case, the electric potential due to uniform charge density ρ_e is

(3.35)  V(r) = −( ρ_e / 2ε_0 ) ( Lx² + My² + Nz² ),

where we have changed the sign, since like charges repel. We can write this as

(3.36)  V(r) = −( ρ_e / 2ε_0 ) r · N r,

where N is a symmetric tensor whose meaning is not yet determined. Using Poisson's trick, we have that the electric potential in a uniformly polarized ellipsoid is

(3.37)  φ(r) = −∇V(r) · Δr = ( ρ_e / ε_0 ) r · N Δr = ( 1/ε_0 ) r · N P,

and the depolarizing field is

(3.38)  E_dep = −∇φ(r) = −( 1/ε_0 ) N P.

Comparison with Eq. (3.23) shows that N is indeed the depolarizing factor tensor. Explicitly, then, we have shown that the depolarizing factor tensor for an ellipsoid with semi-axis lengths a, b and c is

(3.39)  N = diag( L, M, N ),

with L, M and N given by Eqs. (3.31)–(3.33) in terms of the integral q_0 of Eq. (3.30). Some properties of the depolarizing factor tensor are as follows:

• tr N = 1; that is, N_xx + N_yy + N_zz = 1,
• for a sphere, N_xx = N_yy = N_zz = 1/3,
• for a long needle along the z-axis, N_xx = N_yy = 1/2, N_zz = 0,
• for a thin disc normal to the z-axis, N_xx = N_yy = 0, N_zz = 1.
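These properties are straightforward to verify numerically. The sketch below (ours, assuming scipy is available) evaluates the depolarizing factors through the explicit integral form of Eqs. (3.31)–(3.33):

    import math
    from scipy.integrate import quad

    def depolarizing_factors(a, b, c):
        """N_xx, N_yy, N_zz for an ellipsoid with semi-axes a, b, c; Eq. (3.39)."""
        def n(p):
            f = lambda s: 1.0 / ((s + p * p) *
                                 math.sqrt((s + a * a) * (s + b * b) * (s + c * c)))
            val, _ = quad(f, 0.0, math.inf)
            return 0.5 * a * b * c * val
        return n(a), n(b), n(c)

    print(depolarizing_factors(1, 1, 1))       # sphere: (1/3, 1/3, 1/3)
    print(depolarizing_factors(1, 1, 50))      # long needle along z: ~(1/2, 1/2, 0)
    print(depolarizing_factors(10, 10, 0.1))   # thin disc normal to z: ~(0, 0, 1)
    print(sum(depolarizing_factors(3, 2, 1)))  # trace = 1 for any ellipsoid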

The polarizability tensor for non-spherical (ellipsoidal) particles can now be evaluated explicitly for all frequencies; from Eq. (3.26) we have

(3.40)  α = ε_0 V diag( ω_p²/( N_xx ω_p² − ω² − iωβ′ + ω_0² ), ω_p²/( N_yy ω_p² − ω² − iωβ′ + ω_0² ), ω_p²/( N_zz ω_p² − ω² − iωβ′ + ω_0² ) ).

If the frequencies are small compared to the plasma frequency, we recover Eq. (3.27), which for spheres becomes

(3.41)  α = 3ε_0 V I.

In many situations, the electrons can be regarded as being free inside the ellipsoid; in this case, ω_0 = 0. This is true for conduction electrons in metallic nanoparticles, and for electrons inside the spherical electron cloud of a simple atom. If this is the case, and if the particles have cylindrical symmetry, we obtain

(3.42)  α = ε_0 V diag( ω_p²/( N_⊥ ω_p² − ω² − iωβ′ ), ω_p²/( N_⊥ ω_p² − ω² − iωβ′ ), ω_p²/( N_∥ ω_p² − ω² − iωβ′ ) ).

For spherical particles, we obtain at once Eq. (3.19). For metallic nanoparticles, Eq. (3.42) indicates the frequencies where the absorption is large; it describes well, for example, the shape dependence of the color of colloidal Au nanoparticle suspensions (Fig. 7) as function of their aspect ratio [12]; the spectra are shown in Fig. 8. Similar results hold for rodlike dye molecules, such as rylene [32].

Our main interest in this lecture is to determine how the attractive interaction via London dispersion depends on particle shape. Recalling Eq. (3.16), we write it as

(3.43)  U = −( hν / ( 40 (4πε_0)² R⁶ ) ) ( α_1 : α_2 + 3 tr α_1 tr α_2 ),

which is the form appropriate for two particles with polarizability tensors α_1 and α_2, under the assumption that the particles are uniformly distributed in space (see equation (4.3.5) of [50]).

Figure 7. Au nanorods suspended in toluene.


Figure 8. Absorption spectra of Au nanorods with different aspect ratios (approximately 2.75, 3 and 4.2).

By use of the expression in Eq. (3.27) for both polarizabilities, we obtain for the interaction potential between two ellipsoidal particles with cylindrical symmetry

(3.44)  U = −(1/12) hν ( V² / ( (4π)² R⁶ ) ) [ ( 2/N_⊥ + 1/N_∥ )² + (1/5)( 1/N_∥ − 1/N_⊥ )² P_2(cos θ) ],

where P_2 is the Legendre polynomial of degree 2 and θ is the angle between the symmetry axes of the particles. Note that for both rod- and plate-like shapes, the energy is minimized if the particles are parallel. An expression of this form is the basis for the celebrated Maier-Saupe theory of nematic liquid crystals [35]; here we have indicated explicitly, through the depolarizing factor tensor, the shape dependence of the interaction.

In summary, the ubiquitous attractive force of the London dispersion interaction is proportional to the particle polarizabilities. The polarizabilities are shape dependent; for ellipsoidal particles, they depend on the depolarizing factor tensor. This has been derived explicitly by Poisson, using an ingenious scheme. For non-ellipsoidal particles, there are no closed form results. In soft matter, the interaction strengths can be comparable to thermal excitations at room temperature, making phase transitions possible and enabling related complex behavior.

LECTURE 4 Particle shape and repulsive interactions

Atoms repel each other at short distances. The repulsive force originates from the quantum mechanical Pauli exclusion principle [19, Chap. 5], [43], according to which fermions – particles with half-integer spins, such as electrons – cannot simultaneously occupy the same quantum state. This repulsive effect is approximated in the Lennard-Jones potential,

(4.1)  U_LJ = 4ε [ (σ_0/r)^{12} − (σ_0/r)^6 ],

by the (σ_0/r)^{12} term on the r.h.s. Here σ_0 is a characteristic length comparable to the particle diameter. The Buckingham potential

(4.2)  U_B = 4ε [ e^{−r/σ_0} − (σ_0/r)^6 ]

is a modification of the Lennard-Jones potential. It has been argued [3] that the exponential decay better approximates the quantum mechanical expression originating from the overlap of electron orbitals. The essential feature of both expressions is that the energy increases rapidly as the interparticle distance decreases. For simplicity, we shall use the approximation here that the particles are perfectly rigid and impenetrable; in this case the repulsive potential is given by

(4.3)  U^{(R)}(r_12) = 0 if r_12 ≥ r_0,  and  U^{(R)}(r_12) = ∞ if r_12 < r_0.


Figure 1. Repulsive potential.

Our particles in general are not spherical, and U^{(R)}(r_12) depends on the direction of the interparticle vector r_12 = r_2 − r_1 as well as its magnitude. In this case we need to evaluate the excluded volume V_exc given by

(4.4)  V_exc(l̂_1, l̂_2) = ∫ ( 1 − e^{−U^{(R)}(r_1, l̂_1, r_2, l̂_2)/kT} ) d³r_2.


If the expression for the excluded volume is known explicitly, then the free energy including the excluded volume can be evaluated, and the equilibrium state can be obtained by minimizing the free energy.

1. Onsager theory

The first successful theory of hard particle interactions was proposed by Lars Onsager in a landmark paper in 1949 [39]. He considered long hard cylinders with only hard core interactions between them, and proposed an expression for the free energy of the form

(4.5)  F = kT ∫ f(l̂) ln f(l̂) dl̂ + ½ ρkT ∫∫ V_exc(l̂, l̂′) f(l̂) f(l̂′) dl̂ dl̂′.

Here ρ is the number density of particles, f(l̂) is the orientational distribution function, and l̂ is a unit vector along the axis of a cylinder, as shown in Fig. 1 of Lecture 2. Onsager showed that, in the limit of infinitely long cylinders, the excluded volume, to a good approximation, is given by

(4.6)  V_exc(l̂, l̂′) = 8v_0 ( 1 + (l/πd) sin θ_12 ),

where v_0, l and d are the volume, length and width of the cylinder, respectively. Minimizing the free energy with respect to the distribution function f(l̂) gives the self-consistent equation

(4.7)  f(l̂) = e^{−ρ ∫ V_exc(l̂, l̂′) f(l̂′) dl̂′} / ∫ e^{−ρ ∫ V_exc(l̂, l̂′) f(l̂′) dl̂′} dl̂,

where f(l̂) appears on both sides. This equation is difficult to solve, at least in part because the kernel sin θ_12 cannot be separated into factors depending on l̂ and l̂′ individually; solutions of the theory therefore can only be obtained numerically. Two important results from Onsager theory are:

(1) above a critical volume fraction φ_c = 4.4858 d/l, the system is orientationally ordered; it is nematic, and
(2) above a critical (average) volume fraction φ_2 = 3.3399 d/l, the system phase separates into an isotropic and a nematic phase.

This means that at low concentrations, φ < φ_2, the system is isotropic. Then, just above φ_2, the system phase separates into a nematic and an isotropic phase with very different concentrations; the volume fraction in the nematic phase is φ ≥ φ_c. We note that the temperature does not appear in Eq. (4.7), and so the behavior of the system is independent of temperature.

If we set the attractive interaction U^{(A)} = 0 in our free energy in Eq. (2.40), assume that the density is sufficiently low that we can expand the logarithm to first order, and assume that the number density is uniform in space so that we can write ρ(r, l̂) = ρ_0 P(l̂), then we have the free energy density

(4.8)  F = ∫ P(l̂_1) ln P(l̂_1) d²l̂_1 + λ ∫ P(l̂_1) d²l̂_1 + (ρ_0/2) ∫ P(l̂_1) ∫ P(l̂_2) V_exc d²l̂_2 d²l̂_1,

where

(4.9)  V_exc(l̂_1, l̂_2) = ∫ ( 1 − e^{−U^{(R)}(r_1, l̂_1, r_2, l̂_2)/kT} ) d³r_2,

which is the same as Onsager's Eq. (4.5), except for our constraint term with the Lagrange multiplier. Minimizing the free energy density F with respect to the orientational distribution P(l̂_1) gives the self-consistent equation

(4.10)  P(l̂_1) = e^{−ρ_0 ∫ V_exc(l̂_1, l̂_2) P(l̂_2) d²l̂_2} / ∫ e^{−ρ_0 ∫ V_exc(l̂_1, l̂_2) P(l̂_2) d²l̂_2} d²l̂_1.

Since Onsager's expression for long cylinders is intractable, we consider instead ellipsoidal particles.

2. Excluded volume for ellipsoids

The excluded volume for ellipsoidal particles is not known exactly, even if the particle shapes are ellipsoids of revolution. The distance of closest approach of two ellipsoids must be determined numerically [59]. However, a simple approximate expression can be readily obtained by a simple linear interpolation. If V_∥ and V_⊥ are the excluded volumes when the two identical ellipsoids are parallel and perpendicular, respectively, we assume that the excluded volume for arbitrary orientations can be written as

(4.11)  V_exc(l̂_1, l̂_2) = V_⊥ + ( V_∥ − V_⊥ )( l̂_1 · l̂_2 )².

Recalling from Eq. (1.13) that

(4.12)  σ = ½( 3 l̂ l̂ − I ),

it follows that

(4.13)  ( l̂_1 · l̂_2 )² = (1/9)( 4 σ_1 : σ_2 + 3 ),

and since V_∥ = (4/3)πld², and, for long ellipsoids, V_⊥ = (1/3)π(l + d)² d, the excluded volume is

(4.14)  V_exc(l̂_1, l̂_2) = ( 2V_⊥ + V_∥ )/3 + (4/9)( V_∥ − V_⊥ ) σ_1 : σ_2,

or

(4.15)  V_exc(l̂_1, l̂_2) = v_0 ( c′ + (2/3) d′ σ_1 : σ_2 ),

or

(4.16)  V_exc(l̂_1, l̂_2) = v_0 ( c′ + d′ P_2(cos θ_12) ),

where v_0 = (π/6) l d² is the volume of one particle, and

(4.17)  c′ = (4/3)( l/d + 4 + d/l ),

and

(4.18)  d′ = −(4/3)( l/d − 1 )² (d/l).

A comparison of the dependence of excluded volume on orientation for cylindrical rods in Eq. (4.6) and for ellipsoids in Eq. (4.16) is shown in Fig. 2, where the reduced excluded volume

V′_exc(θ_12) = ( V_exc(θ_12) − V_exc(0) ) / ( V_exc(π/2) − V_exc(0) )

is plotted for the two cases.


Figure 2. Comparison of the dependence of excluded volume on orientation for cylinders and ellipsoids.
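The two curves of Fig. 2 can be regenerated from Eq. (4.6) and Eq. (4.16). The sketch below (ours) prints the reduced excluded volumes, with l/d = 5 an arbitrary illustrative aspect ratio; in reduced form the cylinder curve is sin θ_12 and the ellipsoid curve is sin² θ_12, which makes the qualitative difference visible in Fig. 2 explicit:

    import numpy as np

    theta = np.linspace(0.0, np.pi / 2, 7)
    ld = 5.0                                            # aspect ratio l/d (illustrative)

    v_cyl = 8.0 * (1.0 + (ld / np.pi) * np.sin(theta))  # Eq. (4.6), in units of v0
    p2 = 0.5 * (3.0 * np.cos(theta) ** 2 - 1.0)
    c1 = (4.0 / 3.0) * (ld + 4.0 + 1.0 / ld)            # c', Eq. (4.17)
    d1 = -(4.0 / 3.0) * (ld - 1.0) ** 2 / ld            # d', Eq. (4.18)
    v_ell = c1 + d1 * p2                                # Eq. (4.16), in units of v0

    reduced = lambda v: (v - v[0]) / (v[-1] - v[0])
    for t, rc, re in zip(theta, reduced(v_cyl), reduced(v_ell)):
        print(f"theta12 = {t:4.2f}   cylinders {rc:5.3f}   ellipsoids {re:5.3f}")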

Averaging over the orientations of the second particle, from Eq. (1.22) we obtain, in general,

(4.19)  ⟨ V_exc(l̂_1, l̂_2) ⟩ = v_0 ( c′ + (2/3) d′ σ_1 : Q ),

and in a uniaxial system,

(4.20)  ⟨ V_exc(l̂_1, l̂_2) ⟩ = v_0 ( c′ + d′ S P_2(l̂_1 · n̂) ).

The behavior of the system of hard ellipsoids has been discussed in Lecture 2. Here we consider an interesting aspect of orientationally ordered systems: phase separation in a system consisting only of a single component.

3. Phase separation

We start by considering Eq. (2.59), which we reproduce here for convenience:

(4.21)  F v_0 / kT = φ ln φ + ½ φ² A + ½ φ² B S² − φ ln ∫_0^1 e^{φBS P_2(x)} dx.

We recall that this is the free energy density for systems with both attractive and repulsive interactions. We note that repulsive interactions are the key to phase separation; i.e., solely attractive interactions in a one-component system do not lead to phase separation, but solely repulsive interactions do. If two phases coexist in equilibrium, they must have, in each phase, equal pressure Π and equal chemical potential μ. The chemical potential is given by

(4.22)  μ = ∂(F v_0)/∂φ,

and the pressure is given by

(4.23)  Π = −F + φ ∂F/∂φ.

Finding the volume fractions φ of two coexisting phases is equivalent to the so-called double tangent construction. In this process, one looks for a straight line which is tangent at two points to the curve of free energy density F vs. volume fraction φ, so that these points satisfy the criteria of having equal slope (equal chemical potential) and being on the same line (equal pressure). Ordinarily, for compositions between the two tangent points, the system phase separates, since the free energy of the phase separated system, as indicated by the straight line, is lower than that for the uniform system, as indicated by the free energy curve (see, for example, Fig. 5 below).

It is convenient to introduce the scaled volume fraction ψ = Bφ; in terms of this, the free energy density can be written as

(4.24)  F B v_0 / kT = ψ ln ψ + ½ ψ² r + ½ ψ² S² − ψ ln ∫_0^1 e^{ψS P_2(x)} dx,

where we have a single control parameter r = A/B. We have omitted a term linear in ψ; this does not change the behavior, since it just contributes an additive constant to μ and leaves the pressure unchanged.

Accurate construction of the double tangent can be challenging if the dependence of the free energy density on volume fraction is nearly linear. In addition, we have three separate curves of F vs. φ, corresponding to the three solutions of the self-consistent equation Eq. (2.60), with S > 0, S = 0, and S < 0. We therefore employ here another strategy: we plot the scaled pressure

(4.25)  Π B v_0 / kT = ψ − ½ ψ² ( S² − r )

versus the scaled chemical potential

(4.26)  μ/kT = ln ψ + ψr − ln ∫_0^1 e^{ψS P_2(x)} dx + 1,

using the scaled volume fraction ψ as a parameter. At coexistence, curves cross, indicating the equality of pressure and chemical potential. Fig. 3 shows a typical scenario. Here the red curve corresponds to the nematic phase, and the green to the isotropic phase. As the parameter r is increased, a crossover and a swallowtail catastrophe develop. Pressure versus chemical potential is plotted in Fig. 4 for different values of the parameter r.

In Fig. 3, three intersections indicate three pairs of compositions where the chemical potentials and pressures are the same for each pair. The corresponding double tangent constructions, made after the compositions were identified, are shown in Fig. 5. Of the three double tangent constructions, only one minimizes the free energy. The phase diagram can be determined from the intersections shown in Fig. 4; it is depicted in Fig. 6.


Figure 3. Pressure vs. chemical potential (here r = 0.80), showing coexistence as a swallowtail catastrophe.

Figure 4. Pressure vs. chemical potential for different values of r.

As the value of the control parameter r is increased above 0.781, the system undergoes phase separation for a large range of compositions.
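The crossing construction of Figs. 3 and 4 can be reproduced with a few lines of numerics. The sketch below (ours) solves the self-consistency condition for S(ψ) by fixed-point iteration and then evaluates Eqs. (4.25) and (4.26) on the isotropic (S = 0) and nematic branches; r = 0.80 and the grid of ψ values are illustrative:

    import numpy as np

    n = 2000
    x = (np.arange(n) + 0.5) / n                    # midpoint grid on (0, 1)
    P2 = 0.5 * (3.0 * x**2 - 1.0)
    r = 0.80

    def S_nematic(psi, S=0.6, iters=2000):
        for _ in range(iters):
            w = np.exp(psi * S * P2)
            S = (P2 * w).mean() / w.mean()
        return S

    def pressure_mu(psi, S):
        Z = np.exp(psi * S * P2).mean()
        Pi = psi - 0.5 * psi**2 * (S**2 - r)        # Eq. (4.25), in units kT/(B v0)
        mu = np.log(psi) + psi * r - np.log(Z) + 1  # Eq. (4.26), in units kT
        return Pi, mu

    for psi in np.linspace(4.6, 7.0, 7):
        S = S_nematic(psi)
        print(psi, S, pressure_mu(psi, S), pressure_mu(psi, 0.0))
    # where the parametric (mu, Pi) curves of the two branches cross,
    # the two phases coexist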

4. Minimum excluded volume of convex shapes

As can be seen in the examples above, the excluded volume of two particles plays a key role in the behavior of soft condensed matter systems, a role similar to that of the interaction energy of two particles. Here we consider aspects of the excluded volume for convex objects.


Figure 5. Three double tangent constructions. Only one minimizes the free energy.


Figure 6. Phase diagram showing phase separation in a single component system.

The excluded volume is the volume that one particle makes inaccessible to another, as indicated in Fig. 7. As B is moved around A, maintaining its orientation, with the distance between centers minimized without overlap, its center traces out the surface of the excluded volume (indicated with a dashed line). The excluded volume is formally defined by Eq. (4.4).


Figure 7. Illustration of the volume excluded by A to the centroid of B. From "The excluded volume of hard sphero-zonotopes," Bela M. Mulder, Molecular Physics, 2005, Taylor & Francis Ltd., reprinted by permission of the publisher, http://www.tandfonline.com.

We assume that particles are sets of points, and that a particle is characterized by its shape, defined as a bounded, regularly closed region in 3D. We next introduce some useful notation and some definitions. A homothetic transformation of the shape A is a shape-retaining transformation via a dilation (enlargement or shrinking) of A by a factor λ. Under a homothetic transformation, the volume in 3D changes by a factor of λ³,

(4.27)  V[λA] = λ³ V[A].

Central inversion of the shape A is to reflect each point through the centroid r_{A₀}:

(4.28)  A* = { r*_A = r_{A₀} + ( r_{A₀} − r_A ) | r_A ∈ A }.

If A and B are two shapes, their Minkowski sum is

(4.29)  A ⊕ B = { r_A + r_B | r_A ∈ A, r_B ∈ B }.

This is illustrated in Fig. 8.


Figure 8. Illustration of the Minkowski sum of the bodies A and B.

Next, we introduce the Brunn-Minkowski inequality [48],

(4.30)  V[A ⊕ B] ≥ ( V[A]^{1/3} + V[B]^{1/3} )³,

where the equality holds if A and B are convex and homothetic, and Mulder's equality [38],

(4.31)  V_exc{A, B} = V[A ⊕ B*],

which is suggested by Fig. 7.

A theorem. With the above results, we can prove a theorem regarding the minimum excluded volume of convex bodies. First, we note that the excluded volume of a body A and a second body A^{λv} of arbitrary shape and volume v = λ³ V[A], with λ > 0, is given by Mulder's equality:

(4.32)  V_exc{A, A^{λv}} = V[A ⊕ A^{λv∗}].

Next, we use the Brunn-Minkowski inequality to obtain

(4.33)  V[A ⊕ A^{λv∗}] ≥ ( V[A]^{1/3} + V[A^{λv}]^{1/3} )³.

But clearly

(4.34)  ( V[A]^{1/3} + V[A^{λv}]^{1/3} )³ = (1 + λ)³ V[A],

and so

(4.35)  V_exc{A, A^{λv}} ≥ (1 + λ)³ V[A].

The equality holds if A and A^{λv∗} are convex and homothetic, but not otherwise. Since for A convex V_exc{A, λA*} = (1 + λ)³ V[A], again by the Brunn-Minkowski inequality, we have therefore proved that the excluded volume of a convex body A and an arbitrarily shaped body B of volume v is a minimum if B is an inverted image of A, scaled so as to have volume v. Details and discussion can be found in [42].

One consequence of the theorem is that the excluded volume of identical cylindrically symmetric particles is a minimum when the symmetry axes of the particles are antiparallel. This suggests that ensembles of such hard particles may experience the same geometric frustration as spins in the Ising model for antiferromagnetism on certain lattices.
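The two ingredients of the proof are easy to visualize in the plane, where the Brunn-Minkowski inequality reads V[A ⊕ B]^{1/2} ≥ V[A]^{1/2} + V[B]^{1/2}. The sketch below (ours) builds the Minkowski sum of two convex polygons by merging their edge vectors in angular order, and checks both the strict inequality and the homothetic equality case; the example polygons are arbitrary:

    import math

    def area(poly):
        """Shoelace area of a convex polygon given as a CCW list of vertices."""
        return 0.5 * sum(x1 * y2 - x2 * y1
                         for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))

    def minkowski_sum(P, Q):
        """Minkowski sum of convex CCW polygons: merge edge vectors by angle."""
        def edges(R):
            return [(x2 - x1, y2 - y1)
                    for (x1, y1), (x2, y2) in zip(R, R[1:] + R[:1])]
        out, x, y = [], 0.0, 0.0   # absolute position is irrelevant for the area
        for ex, ey in sorted(edges(P) + edges(Q),
                             key=lambda e: math.atan2(e[1], e[0])):
            out.append((x, y))
            x, y = x + ex, y + ey
        return out

    A = [(0, 0), (2, 0), (2, 1), (0, 1)]            # a 2 x 1 rectangle
    B = [(0, 0), (1, 0), (0.5, 1)]                  # a triangle
    print(math.sqrt(area(minkowski_sum(A, B))),
          math.sqrt(area(A)) + math.sqrt(area(B)))  # strict: ~2.345 > ~2.121

    B2 = [(0.5 * x, 0.5 * y) for x, y in A]         # homothetic copy of A
    print(math.sqrt(area(minkowski_sum(A, B2))),
          math.sqrt(area(A)) + math.sqrt(area(B2))) # equality: both ~2.121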

5. Systems of hard polyhedra Systems of hard convex polyhedra have come under increased scrutiny in recent years, producing some unexpected results. In hard particle systems interacting solely via excluded volume effects, only entropy enters the free energy. Entropy is usually associated with disorder, but it can also produce order. For example, it was recently shown by the group of Sharon Glotzer [20] that tetrahedra with only excluded volume interactions can form quasicrystals; the quasicrystalline order is produced by entropy alone. Phase separation of hard particles of different sizes, due to entropic effects alone has been shown in an exact model by Frenkel and Louis [13]; orientational order of hard rods in the Onsager model is due to entropy [56], as is the formation of layers – quasi-long-range 1D order – in smectic crystals [57]. In all these cases, the creation of order of one type allows for greater disorder of another – for example, in smectics, orientational order of long rods allows increased translational freedom and disorder, while the positional order of smectic layers in one dimension allows greater translational freedom in the other two. 248 PALFFY-MUHORAY, ET AL., THE EFFECTS OF PARTICLE SHAPE

Glotzer’s group studied a wide range of polyhedral shapes, and found a wide variety of entropically produced phases [5] as shown in Fig. 9.

Figure 9. Phases of hard polyhedra determined via computer simulations. From P. F. Damasceno, M. Engel, and S. C. Glotzer, "Predictive self-assembly of polyhedra into complex structures," Science, 337(6093):453–457, 2012. Reprinted with permission of AAAS.

Shape-dependent, entropically produced order is becoming a widely used tool in the production of new nanostructured materials.

Summary

In these lectures we have considered the effects of the geometric shapes of par- ticles on their collective behavior in soft matter systems. We have focused on orien- tational order, and provided prescriptions for constructing orientation descriptors and order parameters. We provided a simple derivation of the density functional form of the Helmholtz free energy, encompassing both long-range attractive and short-range repulsive parts of the potential within a consistent framework. A de- tailed example of the application of this formalism was provided describing nematic phases due to both attractive (Maier-Saupe) and repulsive (Onsager) interactions as well as phase separation. The origins and shape dependence of the anisotropic long-range attractive London dispersion interaction were discussed in terms of the polarizability and depolarizing factor tensors. The shape dependence of anisotropic repulsive interactions was discussed in terms of excluded volume effects. It is the humble hope of the authors that the material in these lectures will be of some use to graduate students in mathematics towards understanding orienta- tionally ordered soft materials.


Bibliography

[1] N. W. Ashcroft and N. D. Mermin, Solid state physics. New York: Holt, Rinehart and Winston, 1976.
[2] H. R. Brand, H. Pleiner, and P. E. Cladis, Flow properties of the optically isotropic tetrahedratic phase. Eur. Phys. J. E, 7:163–166, 2002.
[3] R. A. Buckingham, The classical equation of state of gaseous helium, neon and argon. Proc. R. Soc. London A, 168(933):264–283, 1938.
[4] D. S. Corti, Comment on "The Gibbs paradox and the distinguishability of identical particles" by M. A. M. Versteegh and D. Dieks [Am. J. Phys. 79, 741–746 (2011)]. Am. J. Phys., 80(2):170–173, 2012.
[5] P. F. Damasceno, M. Engel, and S. C. Glotzer, Predictive self-assembly of polyhedra into complex structures. Science, 337(6093):453–457, 2012.
[6] P. Debye, Die van der Waalsschen Kohäsionskräfte. Phys. Z., 21:178, 1920.
[7] P. Debye, Die van der Waalsschen Kohäsionskräfte. Phys. Z., 22:302, 1921.
[8] A. F. Devonshire, XCVI. Theory of barium titanate – Part I. Phil. Mag. Series 7, 40(309):1040–1063, 1949.
[9] A. F. Devonshire, CIX. Theory of barium titanate – Part II. Phil. Mag. Series 7, 42(333):1065–1079, 1951.
[10] A. F. Devonshire, Theory of ferroelectrics. Adv. Phys., 3(10):85–130, 1954.
[11] L. G. Fel, Tetrahedral symmetry in nematic liquid crystals. Phys. Rev. E, 52:702–717, 1995.
[12] J. P. Fontana, Self-assembly and characterization of anisotropic metamaterials. PhD dissertation, Kent State University, Kent, OH, 2011.
[13] D. Frenkel and A. A. Louis, Phase separation in binary hard-core mixtures: An exact result. Phys. Rev. Lett., 68:3363–3365, 1992.
[14] P. Fromherz, Liquid crystals and biological structures. Angew. Chem., 92(4):329–330, 1980.
[15] J. W. Gibbs, Elementary principles in statistical mechanics. Charles Scribner's Sons, New York, 1902. Digitally reprinted by Cambridge University Press, Cambridge, in 2010.
[16] J. Goldstone, Field theories with "superconductor" solutions. Il Nuovo Cimento, 19(1):154–164, 1961. MR0128374
[17] J. Goldstone, A. Salam, and S. Weinberg, Broken symmetries. Phys. Rev., 127:965–970, 1962.
[18] J. W. Goodby, Liquid crystals and life. Liquid Crystals, 24(1):25–38, 1998.
[19] D. J. Griffiths, Introduction to quantum mechanics. Prentice Hall, 2004.
[20] A. Haji-Akbari, M. Engel, A. S. Keys, X. Zheng, R. G. Petschek, P. Palffy-Muhoray, and S. C. Glotzer, Disordered, quasicrystalline and crystalline phases of densely packed tetrahedra. Nature, 462:773–777, 2009.
[21] P. C. Hohenberg, Existence of long-range order in one and two dimensions. Phys. Rev., 158:383–386, 1967.
[22] J. N. Israelachvili, Intermolecular and surface forces. Academic Press, San Diego, third edition, 2011.
[23] J. D. Jackson, Classical electrodynamics. New York: Wiley, 1999. MR0436782
[24] E. T. Jaynes, The Gibbs paradox. In G. J. Erickson, C. R. Smith, and P. O. Neudorfer, editors, Maximum entropy and Bayesian methods, pages 1–22. Kluwer Academic, Dordrecht, 1992.
[25] T. Kane and M. P. Scher, A dynamical explanation of the falling cat phenomenon. Int. J. Solids Struct., 5(7):663–670, 1969.
[26] W. H. Keesom, Proc. R. Acad. Sci., 18:636, 1915.
[27] W. H. Keesom, Proc. R. Acad. Sci., 23:939, 1920.
[28] W. H. Keesom, Phys. Z., 22:643, 1921.
[29] W. H. Keesom, Die van der Waalsschen Kohäsionskräfte. Phys. Z., 22:129, 1921.


[30] W. T. Kelvin, P. G. Tait, G. H. Darwin, and F. P. Whitman, Treatise on natural philosophy. Cambridge: At the University Press, 1883.
[31] H.-S. Kwok, S. Naemura, H. L. Ong, and S. Kobayashi, Progress in liquid crystal science and technology: in honor of Shunsuke Kobayashi's 80th birthday. Series on liquid crystals: vol. 4. Singapore: World Scientific, 2013.
[32] Y. Li, W. Xu, S. Di Motta, F. Negri, D. Zhu, and Z. Wang, Core-extended rylene dyes via thiophene annulation. Chem. Commun., 48:8204–8206, 2012.
[33] F. Z. London, Phys. Z., 63:245, 1930.
[34] T. C. Lubensky and L. Radzihovsky, Theory of bent-core liquid-crystal phases and phase transitions. Phys. Rev. E, 66:031704, 2002. MR1934928
[35] W. Maier and A. Saupe, A simple molecular theory of the nematic liquid-crystalline state. Z. Naturforschung, 13a:564–566, 1958.
[36] J. C. Maxwell, A treatise on electricity and magnetism, volume 2. Oxford: At the Clarendon Press, 1873. MR1669161
[37] N. D. Mermin and H. Wagner, Absence of ferromagnetism or antiferromagnetism in one- or two-dimensional isotropic Heisenberg models. Phys. Rev. Lett., 17:1133–1136, 1966.
[38] B. M. Mulder, The excluded volume of hard sphero-zonotopes. Mol. Phys., 103:1411–1424, 2005.
[39] L. Onsager, The effects of shape on the interaction of colloidal particles. Annals of the New York Academy of Sciences, 51(4):627–659, 1949.
[40] P. Palffy-Muhoray, The diverse world of liquid crystals. Physics Today, 60(9):54, 2007.
[41] P. Palffy-Muhoray, E. G. Virga, and X. Zheng, Cluster expansion for repulsive interactions. (to appear).
[42] P. Palffy-Muhoray, E. G. Virga, and X. Zheng, The minimum excluded volume of convex shapes. J. Phys. A: Math. Theor., 47(41):415205, 2014. MR3265431
[43] W. Pauli, Über den Zusammenhang des Abschlusses der Elektronengruppen im Atom mit der Komplexstruktur der Spektren (On the connexion between the completion of electron groups in an atom with the complex structure of spectra). Z. Phys., 31:765–783, 1925.
[44] R. E. Peierls, Quantum theory of solids. Oxford classic texts in the physical sciences. Oxford: Clarendon, 1955.
[45] F. Reinitzer, Beiträge zur Kenntniss des Cholesterins. Monatshefte für Chemie und verwandte Teile anderer Wissenschaften, 9(1):421–441, 1888.
[46] T. Sanchez and Z. Dogic, Engineering oscillating microtubule bundles. In W. F. Marshall, editor, Cilia, Part A, volume 524 of Methods in enzymology, pages 205–224. Academic Press, 2013.
[47] T. Sanchez, D. Welch, D. Nicastro, and Z. Dogic, Cilia-like beating of active microtubule bundles. Science, 333(6041):456–459, 2011.
[48] R. Schneider, Convex bodies: the Brunn-Minkowski theory. Encyclopedia of mathematics and its applications: v. 44. Cambridge; New York: Cambridge University Press, 1993. MR1216521
[49] P. Sheng, Hard-rod model of the nematic-isotropic phase transition. In E. B. Priestley, P. J. Wojtowicz, and P. Sheng, editors, Introduction to liquid crystals, pages 59–70. New York: Plenum Press, 1975.
[50] A. J. Stone, The theory of intermolecular forces, volume 32 of The international series of monographs on chemistry. Clarendon Press, Oxford, 1996.
[51] R. H. Swendsen, Statistical mechanics of classical systems with distinguishable particles. J. Stat. Phys., 107(5–6):1143–1166, 2002. MR1901515
[52] R. H. Swendsen, How physicists disagree on the meaning of entropy. Am. J. Phys., 79(4):342–348, 2011.
[53] D. van der Beek, T. Schilling, and H. N. W. Lekkerkerker, Gravity-induced liquid crystal phase transitions of colloidal platelets. J. Chem. Phys., 121(11):5423–5426, 2004.
[54] N. G. van Kampen, The Gibbs paradox. In W. E. Parry, editor, Essays in theoretical physics, pages 303–312. Pergamon, Oxford, 1984.
[55] M. A. M. Versteegh and D. Dieks, The Gibbs paradox and the distinguishability of identical particles. Am. J. Phys., 79(7):741–746, 2011.
[56] G. J. Vroege and H. N. W. Lekkerkerker, Phase transitions in lyotropic colloidal and polymer liquid crystals. Rep. on Prog. in Phys., 55(8):1241, 1992.
[57] X. Wen and R. B. Meyer, Model for smectic-A ordering of parallel hard rods. Phys. Rev. Lett., 59:1325–1328, 1987.

[58] G. Zanchetta, F. Giavazzi, M. Nakata, M. Buscaglia, R. Cerbino, N. A. Clark, and T. Bellini, Right-handed double-helix ultrashort DNA yields chiral nematic phases with both right- and left-handed director twist. Proc. Nat. Acad. Sci., 107(41):17497–17502, 2010. [59] X. Zheng, W. Iglesias, and P. Palffy-Muhoray, Distance of closest approach of two arbitrary hard ellipsoids. Phys. Rev. E, 79:057702, 2009.

Statistical Mechanics and Nonlinear Elasticity

Roman Kotecký

IAS/Park City Mathematics Series Volume 23, 2014


Introduction

Our goal in these lectures will be to examine the following question: to what extent can nonlinear elasticity be based directly on statistical mechanics?

We will begin by inspecting the method of statistical mechanics with an emphasis on results that can be substantiated in a mathematically rigorous way. We will see that even claims that are very natural from the point of view of physics present an important open problem from the mathematical perspective. However, there is a class of models for which our rigorous understanding is quite satisfactory. We will spend some time appraising the main ideas in the simple case of the Ising model. In the third lecture we will present the method of abstract cluster expansion, which is very useful for a precise perturbative study of various systems. Returning to our main question, we show in the fourth lecture the insufficiency of the cluster expansion approach for gradient models, and formulate the principles of a remedy in terms of the renormalisation group approach. The final lecture is devoted to the explanation of the results concerning nonlinear elasticity. It is shown how the macroscopic variational principle can be derived in terms of large deviations of a gradient model Gibbs measure.

Department of Mathematics, University of Warwick, UK and Center for Theoretical Study, Charles University, Prague, Czech Republic
E-mail address: [email protected]
Supported by the grant GAČR P201/12/2613

© 2017 American Mathematical Society


LECTURE 1 Statistical mechanics

Statistical mechanics of interacting particles

A really fundamental theory would start from statistical mechanics of interacting particles (atoms/molecules). Consider a system of N particles of mass m whose configurations are characterised by their positions r_1, ..., r_N and momenta p_1, ..., p_N. The energy (Hamiltonian) of such a system is

H_N(p_1, ..., p_N, r_1, ..., r_N) = Σ_{i=1}^{N} p_i²/2m + Σ_{i,j=1, i≠j}^{N} U(|r_i − r_j|).

Here U is a suitable pair potential (a realistic interaction should feature a strong short range repulsion and a decaying long range attraction; a typical example would be the Lennard-Jones potential, U(r) ∼ −(α/r)^6 + (α/r)^{12}, as shown in the figure).

A statistical mechanics description of a physical system begins with a choice of a suitable probability distribution. For a system of N particles in a fixed volume V at (inverse) temperature β it is the canonical ensemble with the probability density

p(p_1, ..., p_N, r_1, ..., r_N) ∼ (1/N!) exp{ −βH_N(p_1, ..., p_N, r_1, ..., r_N) }.

An alternative is the grandcanonical ensemble, where the number of particles is not fixed and its probability distribution is determined by a parameter z called fugacity:

P_{β,V,z}(N) = [ (z^N/N!) ∫_{R^{3N} × V^N} e^{−βH_N} Π_i d³p_i d³r_i / h^{3N} ] / Z_G(β, z, V).

The normalization factor is the grandcanonical partition function

Z_G(β, z, V) = Σ_N (z^N/N!) ∫_{R^{3N} × V^N} e^{−βH_N} Π_i d³p_i d³r_i / h^{3N} = Σ_N (z^N/N!) (2πm/βh²)^{3N/2} ∫_{V^N} e^{−β Σ_{i≠j} U(|r_i − r_j|)} Π_i d³r_i.

Here, the factor (2πm/βh²)^{3N/2} is coming from the Gaussian integration over momenta, with the Planck constant h being the only reminder of the fact that the present description is actually a classical approximation of a more fundamental quantum mechanical setting. The grandcanonical ensemble should, in principle,

yield an adequate description of gas, liquid, as well as crystal in the corresponding regions of parameters. Thus, for example, introducing the density ρ(β, z) by

ρ(β, z) = lim_{V ↗ R³} (1/|V|) Σ_{N=0}^{∞} N P_{β,V,z}(N) = lim_{V ↗ R³} (1/|V|) z (∂/∂z) log Z_G(β, z, V),

we are expecting that at some z = z_t(β), the density changes discontinuously, passing from the gas to the liquid phase. However, this turns out to be an entirely open problem:

Prove that, for a suitable range of β's (sufficiently large), there exists z_t(β) such that the function ρ(β, z) has a discontinuity at z = z_t(β).

In spite of being explicitly on the table for at least 50 years, essentially the only rigorous results for genuinely continuous systems (of particles in R³) are for "toy models" by Widom and Rowlinson [WR] and a slightly more realistic model by Lebowitz, Mazel, and Presutti [LMP].

The idea that a phase transition can be described by a theory based only on a unique underlying Hamiltonian that does not a priori distinguish the phases did not come easily.¹ In finite volume, the function ρ_V(β, z) = z (1/|V|) (∂/∂z) log Z_G(β, z, V) is clearly a real analytic function of its variables β and z. In particular, it has no discontinuity; it is just very steep in the neighbourhood of z_t. It would be difficult, however, to introduce phase transitions as points at which the function ρ_V is growing "very rapidly". Conceptually much simpler is to consider a mathematical idealization: to go to infinite volume, the thermodynamic limit V → R³, and instead of rapid changes to look for true discontinuities.

It thus seems to be hopeless to try to construct a model of nonlinear elasticity starting from a true particle system in continuum. The only notable but rather limited exception is the model proposed by Penrose [Pe]. He studies a (two-dimensional) model of hard discs endowed with "poles" allowing one to consider their spatial orientation. The system features an additional potential depending on the mutual orientations of discs, with an additional restriction that allows only configurations with the mutual angle of neighbouring discs lying in a fixed interval. As a result, the particles have a tendency to align in a crystal pattern. Penrose succeeded in proving the existence of the elastic free energy per particle, defined by the thermodynamic limit of the normalised logarithm of partition functions in parallelogram-shaped regions, and in showing its convexity in the lengths of the sides of the parallelogram.

Contrary to the fact that the problem of proving the basic existence of phase transitions for a realistic system of particles in continuum remains open, there is a class of systems, lattice models, for which we have a much more detailed understanding. We will discuss the simplest case representing this class, the Ising model, in the second lecture in some detail. Here, we will first look at a natural formulation of non-linear elasticity by means of a lattice model.

¹Serious doubts existed for a long time. This was witnessed by an anecdotal ballot at the van der Waals Centenary Conference in 1937. The question was "does the partition function contain the information necessary to describe a sharp phase transition?" The outcome of the vote was not very conclusive [BF].

Lattice models of nonlinear elasticity

Think about a reference crystal and its deformation: ϕ(i) ∈ R^d denotes the position of the particle labeled by i ∈ Z^d. For any finite volume Λ ⊂ Z^d and a boundary condition ψ : Z^d \ Λ → R^m, we consider a field of displacements² ϕ : Λ → R^m with a suitable deformation energy. For concreteness, let us think about the two dimensional (d = m = 2) mass-spring model with the energy (Hamiltonian)

H_Λ(ϕ | ψ) = Σ_{{i,j}∩Λ≠∅, |i−j|=1} U_1(ϕ(i) − ϕ(j)) + Σ_{{i,j}∩Λ≠∅, |i−j|=√2} U_2(ϕ(i) − ϕ(j)) = Σ_□ U_□(ϕ).

Here, U_1(ϕ(i) − ϕ(j)) = K_1(|ϕ(i) − ϕ(j)| − a_1)² is the nearest-neighbour pair interaction and, similarly, U_2(ϕ(i) − ϕ(j)) = K_2(|ϕ(i) − ϕ(j)| − a_2)² is the next-nearest-neighbour pair interaction. The value of the field ϕ outside of Λ is fixed to equal ψ, ϕ(i) = ψ(i) for i ∈ Z^d \ Λ. In the last equality we indicated that the energy can be rewritten as a sum of contributions corresponding to all square cells □,

U_□(ϕ) = ½ Σ_{i,j∈□, |i−j|=1} U_1(ϕ(i) − ϕ(j)) + Σ_{i,j∈□, |i−j|=√2} U_2(ϕ(i) − ϕ(j)).
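To make the definitions concrete, here is a direct transcription of this mass-spring energy in code (our sketch; the constants K1, K2, a1, a2, the box size, and the affine boundary condition ψ are arbitrary illustrative choices):

    import itertools, math

    K1, a1 = 1.0, 1.0
    K2, a2 = 0.5, math.sqrt(2.0)
    L = 6                                   # Lambda = {0, ..., L-1}^2

    def psi(i):                             # boundary condition: slight stretch
        return (1.05 * i[0], i[1])

    def energy(phi):
        """H_Lambda(phi | psi): sum over bonds with at least one end in Lambda."""
        def value(i):                       # phi inside Lambda, psi outside
            return phi[i] if i in phi else psi(i)
        E = 0.0
        for i in itertools.product(range(-1, L + 1), repeat=2):
            # each bond counted once via a canonical direction
            for dx, dy, K, a in [(1, 0, K1, a1), (0, 1, K1, a1),
                                 (1, 1, K2, a2), (1, -1, K2, a2)]:
                j = (i[0] + dx, i[1] + dy)
                if (i in phi) or (j in phi):
                    E += K * (math.dist(value(i), value(j)) - a) ** 2
        return E

    phi0 = {i: psi(i) for i in itertools.product(range(L), repeat=2)}
    print(energy(phi0))                     # energy of the uniformly stretched field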

Remarks. (1) The Hamiltonian H is a function of gradients that is invariant with respect to the transformation ϕ → R(τ_a ϕ) defined by R(τ_a ϕ)(i) = R(ϕ(i) + a) for all R ∈ SO(m) and a ∈ ℝ^m. This reflects the condition of frame invariance: the independence of the energy of the observer.

(2) While adding the next-nearest-neighbour term prevents the two-dimensional system from collapsing by flattening along diagonals (in the absence of the next-nearest-neighbour term, the energy of any cell is constant whenever the differences of displacements of nearest neighbours are fixed), we would also like to avoid a lattice folding: for d = 1 with a boundary condition that strongly "compresses the system" (ψ(i) = ci with small c), it would clearly be cheaper to keep the optimal distance between neighbouring particles and fold the line instead. This can be prevented by adding to the Hamiltonian a term that favours positive increments, say, Σ_i (ϕ(i+1) − ϕ(i) − 1)². For d = 2 we could elude a similar folding by adding the term Σ_□ (S_□(ϕ) − 1)², where the sum is over all square cells and
\[
S_\square(\varphi)=\frac12\Bigl[\det\bigl(\varphi(i_3)-\varphi(i_1),\varphi(i_4)-\varphi(i_1)\bigr)+\det\bigl(\varphi(i_3)-\varphi(i_2),\varphi(i_4)-\varphi(i_2)\bigr)\Bigr]
\]
represents the oriented area of the deformed cell (ϕ(i₁), ϕ(i₂), ϕ(i₃), ϕ(i₄)). As a result, the added term prevents folding by favouring local deformations with positive area.

²We generalise slightly by taking the values in ℝ^m. The case of elasticity corresponds to m = d.

[Figure: a deformed cell (ϕ(i₁), ϕ(i₂), ϕ(i₃), ϕ(i₄)) with a negative area; the positively and negatively oriented triangles are marked + and −.]
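The oriented area S_□ is equally easy to compute; the sketch below follows the two-determinant formula above (the corner convention—i₁ bottom-left, i₂ top-left, i₃ bottom-right, i₄ top-right in the reference cell—is our reading of the figure):

```python
import numpy as np

def oriented_area(p1, p2, p3, p4):
    """Oriented area S of the deformed cell (phi(i1), ..., phi(i4)),
    using the two-determinant formula from the text; corner convention:
    i1 bottom-left, i2 top-left, i3 bottom-right, i4 top-right."""
    det = lambda u, v: u[0] * v[1] - u[1] * v[0]
    return 0.5 * (det(p3 - p1, p4 - p1) + det(p3 - p2, p4 - p2))

p1, p2, p3, p4 = map(np.array, [(0., 0.), (0., 1.), (1., 0.), (1., 1.)])
print(oriented_area(p1, p2, p3, p4))   # +1.0 for the undeformed unit cell
print(oriented_area(p3, p4, p1, p2))   # -1.0 for a folded (reflected) cell
```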

(3) In the mass-spring model, the potentials U₁ and U₂ grow like |∇ϕ(i)|² as |∇ϕ(i)| → ∞. This is far from the realistic Lennard-Jones potential that vanishes at infinity. Actually, such a growth could be viewed as just a technical way to deal with metastability. One might expect that an equilibrium state in such a system actually mimics a metastable state at a short time scale for a more realistic system with a potential vanishing at infinity.
Even if we were able to control the behaviour of a model with Lennard-Jones potential at a phase coexistence, we would not expect that, under a small boundary deformation, the system in the bulk will comply with the boundary for extremely long time intervals; as the potential vanishes at large distances, the system will eventually prefer to break into disjoint pieces, each close to a stress-free ideal configuration. A correct theory would have to invoke dynamics and time evolution, distinguishing short and long time behaviour. While in the long run the system will strive towards a "true equilibrium", on a relatively short time scale it will be described in terms of a metastable state. However, a rigorous understanding of the phenomenon of metastability is even more involved than the control of phase coexistence.

An immediate generalisation that we will consider later is to take the elastic energy
\[
H_\Lambda(\varphi\mid\psi)=\sum_{j\in\mathbb Z^d:\,\tau_j(A)\cap\Lambda\neq\emptyset}U(\varphi_{\tau_j(A)})
\]
given in terms of a finite range interaction U : (ℝ^m)^A → ℝ, with A ⊂ ℤ^d finite, diam A = R₀, with the assumption of frame indifference implying that U depends only on the gradients,
\[
U(\varphi_A)=U(R\varphi_A+a)\ \ \forall R\in SO(m),\ a\in\mathbb R^m\quad\Longrightarrow\quad U=U(\nabla\varphi),
\]
as well as a suitable growth condition. Our big aim can now be formulated as follows: Determine the behaviour of the system at non-vanishing temperature in large finite volumes Λ ⊂ ℤ^d, with the probability distribution of the vector field ϕ given as a gradient Gibbs measure μ_{Λ,ψ}(dϕ) on (ℝ^m)^Λ (equipped with the Borel σ-algebra):
\[
(1.1)\qquad \mu_{\Lambda,\psi}(\mathrm d\varphi)=\frac{\exp\{-\beta H_\Lambda(\varphi\mid\psi)\}}{Z_{\Lambda,\psi}}\prod_{i\in\Lambda}\mathrm d\varphi(i).
\]

Here, Z_{Λ,ψ} is the normalisation (partition function)
\[
(1.2)\qquad Z_{\Lambda,\psi}=\int_{(\mathbb R^m)^\Lambda}\exp\{-\beta H_\Lambda(\varphi\mid\psi)\}\prod_{i\in\Lambda}\mathrm d\varphi(i).
\]
We will study this class of models in the last lecture. As was said above, a lot is known for lattice systems, especially lattice systems where the field ϕ(i) takes values in a compact or even finite set. The fact that our random field is non-compact brings additional complications with which we will deal later. But now, let us summarise the basic facts for systems with ϕ(i) in a finite set by looking mostly at the simplest one: the Ising model.

Ising model

Here, we have a spin s(i) ∈ {−1, 1}, s ∈ Ω = {−1,1}^{ℤ^d}, instead of a displacement ϕ(i). For a finite set Λ ⊂ ℤ^d we consider the Hamiltonian with a boundary condition s̄:
\[
(1.3)\qquad H_\Lambda(s\mid\bar s)=-\sum_{\substack{\{i,j\}\subset\Lambda\\ |i-j|=1}}s(i)s(j)-\sum_{\substack{i\in\Lambda,\ j\in\Lambda^c\\ |i-j|=1}}s(i)\,\bar s(j)-h\sum_{i\in\Lambda}s(i).
\]

Notice that even though the Hamiltonian H_Λ(s|s̄) is defined for any s, s̄ ∈ Ω, it depends only on the restriction of s to Λ, s_Λ = (s(i), i ∈ Λ), and on s̄ only through the restriction s̄_{Λᶜ}. The corresponding partition function is
\[
(1.4)\qquad Z_{\Lambda,\bar s}(\beta,h)=\sum_{s_\Lambda\in\Omega_\Lambda}e^{-\beta H_\Lambda(s\mid\bar s)},
\]
and the measure μ^{β,h}_{Λ,s̄}(·) on the finite set Ω_Λ = {−1,1}^Λ is determined in terms of probabilities of spin configurations s_Λ ∈ Ω_Λ:
\[
\mu^{\beta,h}_{\Lambda,\bar s}(s_\Lambda)=\frac{e^{-\beta H_\Lambda(s\mid\bar s)}}{Z_{\Lambda,\bar s}(\beta,h)}.
\]

The finite volume free energy, f_{Λ,s̄}(β,h), and the magnetization, m_{Λ,s̄}(β,h), are then introduced, in terms of the partition function Z_{Λ,s̄}(β,h), as
\[
f_{\Lambda,\bar s}(\beta,h)=-\frac{1}{\beta|\Lambda|}\log Z_{\Lambda,\bar s}(\beta,h)
\]
and
\[
(1.5)\qquad m_{\Lambda,\bar s}(\beta,h)=-\frac{\mathrm d}{\mathrm dh}f_{\Lambda,\bar s}(\beta,h)\equiv\frac{1}{|\Lambda|\,Z_{\Lambda,\bar s}(\beta,h)}\sum_{s_\Lambda}\Bigl(\sum_{i\in\Lambda}s(i)\Bigr)e^{-\beta H_\Lambda(s\mid\bar s)},
\]
where |Λ| is the number of points in the volume Λ. Occasionally we will also use the notation E_{μ^{β,h}_{Λ,s̄}}(·) = ⟨·⟩^{β,h}_{Λ,s̄} for the expectation
\[
\langle\cdot\rangle^{\beta,h}_{\Lambda,\bar s}=\frac{1}{Z_{\Lambda,\bar s}(\beta,h)}\sum_{s_\Lambda}(\cdot)\,e^{-\beta H_\Lambda(s\mid\bar s)}.
\]
Our task now is to clarify how phase transition phenomena can be described. As mentioned above, they can be truly revealed only in the thermodynamic limit. There are two alternative, but linked, characterisations:

(1) Nonanalyticity of the limiting free energy f(β,h) = lim_{Λ↗ℤ^d} f_{Λ,s̄}(β,h).
(2) Nonuniqueness of the limiting Gibbs measures lim_{Λ↗ℤ^d} μ^{β,h}_{Λ,s̄}(·).
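Both characterisations refer to the thermodynamic limit, but the finite-volume quantities (1.4)–(1.5) are directly computable on tiny boxes. A minimal brute-force sketch (the box size, β, and the + boundary condition are illustrative choices):

```python
import itertools, math

def ising_plus_bc(L=3, beta=0.6, h=0.0):
    """Brute-force Z_{Lambda,+}(beta,h) of (1.4) and the magnetization
    m_{Lambda,+}(beta,h) of (1.5) for an L x L box in Z^2 with + boundary."""
    sites = [(x, y) for x in range(L) for y in range(L)]
    Z, num = 0.0, 0.0
    for conf in itertools.product([-1, 1], repeat=len(sites)):
        s = dict(zip(sites, conf))
        H = -h * sum(conf)
        for (x, y) in sites:
            for j in ((x + 1, y), (x, y + 1)):            # bonds inside the box, once each
                if j in s:
                    H -= s[(x, y)] * s[j]
            for j in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if j not in s:                             # bond to a + boundary spin
                    H -= s[(x, y)]
        w = math.exp(-beta * H)
        Z += w
        num += sum(conf) * w
    return Z, num / (Z * len(sites))

print(ising_plus_bc())
```

Even on a 3 × 3 box the + boundary visibly biases ⟨s(i)⟩ at h = 0, a finite-volume shadow of the phase coexistence discussed below.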

Before delving into a rigorous discussion of those points, let us try to get some heuristic understanding. Consider first the situation with the free boundary condition: the energy is
\[
H_\Lambda(s)=-\sum_{\substack{\{i,j\}\subset\Lambda\\ |i-j|=1}}s(i)s(j)-h\sum_{i\in\Lambda}s(i),
\]
and the probability distribution is

\[
\mu^{\beta,h}_\Lambda(s_\Lambda)=\frac{e^{-\beta H_\Lambda(s)}}{Z_\Lambda(\beta,h)},
\]
with the partition function
\[
Z_\Lambda(\beta,h)=\sum_{s_\Lambda\in\Omega_\Lambda}e^{-\beta H_\Lambda(s)}.
\]

Let us try to understand which are the main contributions to Z_Λ(β,h) and thus which are the typical configurations of the Gibbs measure μ^{β,h}_Λ(s_Λ) for large Λ, in dependence on the values of the parameters β and h.
For β = ∞ (zero temperature) the answer is very easy. We just notice that only ground configurations minimising the energy contribute. Using g_Λ(∞,h) to denote the set of ground configurations, g_Λ(∞,h) = {s ∈ Ω_Λ : H_Λ(s) = min_{s̃∈Ω_Λ} H_Λ(s̃)}, we have g_Λ(∞,h) = {+_Λ} whenever h > 0. Here +_Λ is the configuration that takes the value +1 for each i ∈ Λ. Similarly, g_Λ(∞,h) = {−_Λ} for h < 0 and g_Λ(∞,0) = {−_Λ, +_Λ}. Given that β = ∞, all other configurations are suppressed and we get
\[
\mu^{\infty,h}_\Lambda(s_\Lambda)=\chi(h)\,\delta_+(s_\Lambda)+(1-\chi(h))\,\delta_-(s_\Lambda).
\]
Here, χ is the step function
\[
\chi(h)=\begin{cases}1&\text{if }h>0,\\ 0&\text{if }h<0,\\ 1/2&\text{for }h=0,\end{cases}
\]
and δ_+ denotes the delta measure supported by the ground configuration +_Λ, δ_+(+_Λ) = 1, and similarly for δ_− supported by the configuration −_Λ.

For the free energy
\[
f_\Lambda(\infty,h)=-\lim_{\beta\to\infty}\frac{1}{\beta}\frac{1}{|\Lambda|}\log Z_\Lambda(\beta,h)=\frac{H_\Lambda(s)}{|\Lambda|}\quad\text{for any }s\in g_\Lambda(\infty,h),
\]
we get f_Λ(∞,h) = H_Λ(+_Λ)/|Λ| for h > 0, f_Λ(∞,h) = H_Λ(−_Λ)/|Λ| for h < 0, and f_Λ(∞,0) = (1/(2|Λ|))(H_Λ(+_Λ) + H_Λ(−_Λ)).

LECTURE 2 Phase transitions

What happens for small non-vanishing temperatures (large β < ∞)? Now the excitations (configurations whose energy is not minimal) are not totally suppressed. Nevertheless, we will argue that the main contributions come from configurations that are just small (low energy) excitations of ground configurations. We start the rigorous discussion of phase transitions by proving three basic facts: the existence of the limiting free energy and its independence of the boundary condition, its concavity in the external field h, and the Peierls argument.

Existence of the free energy

The limit f(β,h) = lim_{Λ_n↗ℤ^d} f_{Λ_n,s̄}(β,h) exists and does not depend on Λ_n and s̄, for any Λ_n ↗ ℤ^d in the van Hove sense, |∂Λ_n|/|Λ_n| → 0. Here ∂Λ denotes the outer boundary of Λ, ∂Λ = {i ∈ Λᶜ : ∃ j ∈ Λ, |i − j| = 1}.

Proof. Consider first the partition function Z_Λ(β,h) with the free boundary condition and take a sequence of cubic volumes Λ_N of side 2^N, |Λ_N| = 2^{Nd}, N = 1,2,.... For M < N, we split Λ_N into 2^{(N−M)d} disjoint translates of Λ_M. Ignoring the interactions between distinct translates (each ignored bond changes the exponent by at most β), we get
\[
Z_{\Lambda_N}(\beta,h)=Z_{\Lambda_M}(\beta,h)^{2^{(N-M)d}}\exp\bigl\{\beta\,O\bigl(2^{(N-M)d}|\partial\Lambda_M|\bigr)\bigr\}.
\]
Dividing the logarithm of both sides above by |Λ_N| and using the bound |∂Λ_M| ≤ 2d·2^{M(d−1)}, we have
\[
\Bigl|\frac{\log Z_{\Lambda_N}(\beta,h)}{|\Lambda_N|}-\frac{\log Z_{\Lambda_M}(\beta,h)}{|\Lambda_M|}\Bigr|\le 2\beta d\,2^{-M},
\]
proving that the sequence log Z_{Λ_N}(β,h)/|Λ_N| is Cauchy and thus the limit f(β,h) = −lim_{N→∞}(1/β) log Z_{Λ_N}(β,h)/|Λ_N| exists.
Consider now a sufficiently large Λ and a maximal number of disjoint translates of Λ_M that can be fully included in Λ. Their number is at most |Λ|/|Λ_M|. Using Λ′ to denote their union, ignoring again the bonds between them and also ignoring all edges in Λ \ Λ′ (the number of the latter is bounded by |∂Λ| d |Λ_M| since |Λ \ Λ′| ≤ |∂Λ||Λ_M|), we get
\[
\frac{\log Z_\Lambda(\beta,h)}{|\Lambda|}=\frac{\log Z_{\Lambda_M}(\beta,h)}{|\Lambda_M|}+\beta\,O\Bigl(\frac{|\Lambda_M|\,|\partial\Lambda|}{|\Lambda|}\Bigr)+O\bigl(2\beta d\,2^{-M}\bigr),
\]
yielding that
\[
(2.1)\qquad f(\beta,h)=-\lim_{\Lambda_n\nearrow\mathbb Z^d}\frac{1}{\beta}\frac{\log Z_{\Lambda_n}(\beta,h)}{|\Lambda_n|}
\]
for any van Hove sequence Λ_n. Finally, for any boundary condition s̄ we have
\[
Z_\Lambda(\beta,h)\,e^{-\beta|\partial\Lambda|}\le Z_{\Lambda,\bar s}(\beta,h)\le Z_\Lambda(\beta,h)\,e^{\beta|\partial\Lambda|},
\]
yielding the full claim. □

Concavity of the free energy

The free energy f(β,h) is concave and symmetric in h (and also concave in T = 1/β) and for any s̄ ∈ Ω we have:
\[
(2.2)\qquad -\partial^-f(\beta,h)\le\liminf_{n\to\infty}\frac{1}{|\Lambda_n|}\sum_{i\in\Lambda_n}\langle s(i)\rangle^{\beta,h}_{\Lambda_n,\bar s}\le\limsup_{n\to\infty}\frac{1}{|\Lambda_n|}\sum_{i\in\Lambda_n}\langle s(i)\rangle^{\beta,h}_{\Lambda_n,\bar s}\le-\partial^+f(\beta,h),
\]
with ∂⁻ and ∂⁺ denoting the left and right derivative with respect to h, respectively.

Proof. The concavity and symmetry of f(β,h) follow from the convexity and symmetry of log Z_Λ(β,h), checked by an explicit calculation expressing its second derivative as a variance of the random variable S_Λ = S_Λ(s) = Σ_{i∈Λ} s(i):
\[
\frac{\partial^2\log Z_\Lambda(\beta,h)}{\partial h^2}=\beta^2\Bigl(\langle S_\Lambda^2\rangle^{\beta,h}_\Lambda-\bigl(\langle S_\Lambda\rangle^{\beta,h}_\Lambda\bigr)^2\Bigr)\ge 0.
\]

To prove (2.2), we use the fact that f(β,h) = lim_{n→∞} f_{Λ_n,s̄}(β,h) and
\[
\frac{1}{|\Lambda_n|}\sum_{i\in\Lambda_n}\langle s(i)\rangle^{\beta,h}_{\Lambda_n,\bar s}=-\frac{\partial f_{\Lambda_n,\bar s}(\beta,h)}{\partial h}.
\]
Using ∂ to denote the partial derivative with respect to h, it suffices to show that
\[
(2.3)\qquad \partial^+f(\beta,h)\le\liminf_{n\to\infty}\partial f_{\Lambda_n,\bar s}(\beta,h)\le\limsup_{n\to\infty}\partial f_{\Lambda_n,\bar s}(\beta,h)\le\partial^-f(\beta,h).
\]
To prove the upper bound, we observe that, by concavity,
\[
\partial f_{\Lambda_n,\bar s}(\beta,h)\le\frac{f_{\Lambda_n,\bar s}(\beta,h)-f_{\Lambda_n,\bar s}(\beta,h-\delta)}{\delta}
\]
for any δ > 0. Thus
\[
\limsup_{n\to\infty}\partial f_{\Lambda_n,\bar s}(\beta,h)\le\frac{f(\beta,h)-f(\beta,h-\delta)}{\delta},
\]
leading to the upper bound in (2.3) by taking the limit δ → 0. The lower bound follows similarly. □
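The variance formula can be cross-checked numerically: a finite-difference second derivative of log Z_Λ in h stays nonnegative (a minimal sketch on a 2 × 2 box; β, h, and the step are arbitrary choices):

```python
import itertools, math

def logZ(beta, h, L=2):
    """log Z_Lambda(beta, h) for an L x L Ising box, free boundary condition."""
    sites = [(x, y) for x in range(L) for y in range(L)]
    edges = [(a, b) for a in sites for b in sites
             if a < b and abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1]
    Z = 0.0
    for conf in itertools.product([-1, 1], repeat=len(sites)):
        s = dict(zip(sites, conf))
        H = -sum(s[a] * s[b] for a, b in edges) - h * sum(conf)
        Z += math.exp(-beta * H)
    return math.log(Z)

beta, h, step = 0.5, 0.1, 1e-4
dd = (logZ(beta, h + step) - 2 * logZ(beta, h) + logZ(beta, h - step)) / step**2
print(dd >= 0, dd)   # nonnegative: log Z_Lambda is convex in h
```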

Peierls argument

For all d ≥ 2 there exist constants ε > 0 and β₀ < ∞ such that ⟨s(i)⟩^{β,0}_{Λ,+} ≥ ε uniformly in Λ ⊂ ℤ^d, |Λ| < ∞, and i ∈ Λ, as long as β ≥ β₀.

Proof. For simplicity, we will provide the proof only for d = 2, even though it can easily be extended to d > 2. First, we rewrite the partition function Z_{Λ,+}(β,0) as a sum over collections of geometrical objects, so-called Peierls contours—components of the boundary separating regions of pluses and minuses.
To be precise, let s_Λ be a configuration in a finite set Λ, and consider its extension to ℤ² by setting s(i) = +1 whenever i ∉ Λ. For each nearest neighbour bond {i,j} in ℤ² for which s(i)s(j) = −1 we consider the dual edge {i,j}*, the edge in (ℤ²)* = ℤ² + (½,½) that is orthogonal to i − j and bisects the segment from i to j at its center.
Consider now the union ∂(s_Λ) ⊂ ℝ² of all those dual edges for which s(i) ≠ s(j). The contours corresponding to s_Λ are now defined as the connected components of ∂(s_Λ). Viewing contours as subsets of ℝ², we say that γ ⊂ ℝ² is a contour in Λ if there is a configuration s_Λ such that γ is one of the contours corresponding to s_Λ. Notice that there is a one-to-one correspondence between configurations s_Λ and sets Γ_Λ of mutually disjoint contours in Λ, given that the boundary condition is fixed.
Observing that the number of edges in ∂(s_Λ) is just the sum of the lengths of the contours γ ∈ Γ_Λ, and recalling that the unit edges in ∂(s_Λ) are precisely those edges that are dual to nearest neighbour edges {i,j} for which s(i) ≠ s(j), we get that the energy H_Λ(s | +) of a configuration with the set of contours Γ_Λ differs from the energy of the plus configuration +_Λ by twice the overall length of the contours:
\[
H_\Lambda(s\mid+)=H_\Lambda(+\mid+)+2\sum_{\gamma\in\Gamma_\Lambda}|\gamma|,
\]
where |γ| denotes the length of the contour γ. Writing the energy H_Λ(+|+) explicitly, H_Λ(+|+) = −|E_Λ|, where E_Λ is the set of all edges {i,j} with at least one endpoint in Λ, we get for the partition function (1.4) with vanishing h and with + boundary condition,
\[
Z_{\Lambda,+}(\beta,0)=e^{\beta|E_\Lambda|}\sum_{\Gamma_\Lambda}\prod_{\gamma\in\Gamma_\Lambda}\exp\{-2\beta|\gamma|\}.
\]
Here the sum is over all collections of disjoint contours in Λ.
Let Z^i_{Λ,+} be the contribution to the partition function of all those configurations s for which s(i) = −1. For each such configuration, consider a cycle λ surrounding i, defined from the configuration of contours Γ_Λ as follows. Let C be the component of ℝ² \ ∪_{γ∈Γ_Λ}γ containing the site i. Consider the unique unbounded component O of the complement in ℝ² of its closure C̄. Let λ be the intersection of their boundaries, λ = ∂C ∩ ∂O. It is easy to show that λ ⊂ γ for some γ ∈ Γ_Λ, that λ is a cycle (each of its vertices is contained in exactly two edges), and that λ encircles the site i (λ splits ℝ² into two components and i belongs to the bounded one). We get
\[
Z^i_{\Lambda,+}\le e^{\beta|E_\Lambda|}\sum_{\lambda\sim i}\exp(-2\beta|\lambda|)\ {\sum_{\Gamma_\Lambda}}'\ \prod_{\gamma\in\Gamma_\Lambda}\exp\{-2\beta|\gamma|\}.
\]
Here the first sum runs over cycles λ surrounding i (written λ ∼ i), and the second (primed) sum runs over all sets of contours Γ_Λ such that the union λ ∪ ⋃_{γ∈Γ_Λ}γ is a collection of contours corresponding to some spin configuration. Clearly,
\[
e^{\beta|E_\Lambda|}\ {\sum_{\Gamma_\Lambda}}'\ \prod_{\gamma\in\Gamma_\Lambda}\exp\{-2\beta|\gamma|\}\le Z_{\Lambda,+}.
\]
Indeed, even without specifying the details of the restriction on the terms contributing to the primed sum, relaxing it we get the full sum (of positive terms) that corresponds to Z_{Λ,+}. As a result,
\[
Z^i_{\Lambda,+}\le\sum_{\lambda\sim i}\exp(-2\beta|\lambda|)\,Z_{\Lambda,+}.
\]

[Figure 1. A contribution to Z^i_{Λ,+} with the circuit λ. The area of minuses is filled in gray and includes the dark grey component C.]

The probability that the spin at a given site i is −1 is thus bounded as
\[
P^{\beta,0}_{\Lambda,+}(s(i)=-1)\le\frac{Z^i_{\Lambda,+}}{Z_{\Lambda,+}}\le\sum_{\lambda\sim i}\exp(-2\beta|\lambda|).
\]
Hence, the bound ⟨s(i)⟩^{β,0}_{Λ,+} ≥ ε is implied by P^{β,0}_{Λ,+}(s(i) = −1) ≤ (1−ε)/2, which follows once we show that
\[
\sum_{\lambda\sim i}\exp(-2\beta|\lambda|)\le\frac{1-\varepsilon}{2}.
\]
To bound the sum Σ_{λ∼i} exp(−2β|λ|), we first omit the implicit condition that the cycle λ is contained in Λ. This will secure uniformity of our final estimate in Λ and i. The number of cycles λ of length |λ| = n encircling a fixed site i can be bounded by (n/2)·3ⁿ ≤ 4ⁿ. Indeed, a vertical half-line starting at i intersects λ at a horizontal edge at distance at most n/2, and the number of cycles starting at that edge can be bounded by the number of self-avoiding walks starting at the same edge, which is at most 3^{n−1}.

In consequence, we get
\[
\sum_{\lambda\sim i}\exp(-2\beta|\lambda|)\le\sum_{n=4}^{\infty}4^n e^{-2\beta n}\le\frac{1-\varepsilon}{2}
\]
once β is sufficiently large, yielding the needed bound
\[
(2.4)\qquad \langle s(i)\rangle^{\beta,0}_{\Lambda,+}\ge\varepsilon.\qquad\square
\]
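Numerically, the geometric series with the counting bound 4ⁿ already pins down a workable β₀ (a sketch only; the cutoff nmax and the trial values of β are arbitrary):

```python
import math

def peierls_sum(beta, nmax=2000):
    """Upper bound sum_{n>=4} 4^n e^{-2 beta n} on the probability of a
    minus spin; a geometric series, summed numerically for transparency."""
    r = 4.0 * math.exp(-2.0 * beta)
    return sum(r ** n for n in range(4, nmax))

# Find where the bound drops below 1/2 (so that epsilon > 0 in (2.4)).
for beta in (0.7, 0.8, 0.9, 1.0):
    print(beta, peierls_sum(beta))   # the bound crosses 1/2 near beta ~ 0.93
```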

Now we are prepared for the proof of the existence of a phase transition for d ≥ 2, β large, and h = 0, in both formulations: nonanalyticity of the limiting free energy f(β,h) as well as nonuniqueness of the limiting Gibbs measures.
Indeed, using the last inequality in (2.2) with the boundary condition + in combination with (2.4), we get −∂⁺f(β,0) ≥ ε and thus ∂⁺f(β,0) ≤ −ε. On the other hand, combining the first inequality in (2.2), taken this time with the boundary condition −, with the inequality ⟨s(i)⟩^{β,0}_{Λ,−} ≤ −ε (a symmetric version of (2.4) obtained by flipping all spins including the boundary condition), we get ∂⁻f(β,0) ≥ ε. As a result, the left and the right derivatives at h = 0 differ and thus there is a discontinuity in the derivative of the free energy f(β,h) at h = 0 for sufficiently large β.
For the other formulation, we first have to clarify the limiting process yielding, as the limit of the measures μ^{β,0}_{Λ_n,+}, a measure μ^β_+ on Ω = {−1,1}^{ℤ^d} endowed with the Borel σ-algebra generated by the compact product topology on Ω. Without going into details, we just notice that for the existence of a limiting measure it is sufficient to verify that the limit lim_{n→∞} μ^{β,0}_{Λ_n,+}(f) exists for any cylindrical function, i.e. any function f : Ω → ℝ that depends only on the restriction of s to a finite set A ⊂ ℤ^d (f(s) = f(s̃) whenever s_A = s̃_A), and to define μ^β_+(f) as this limit. In particular, it is enough to control the limits of the mean values E_{μ^{β,0}_{Λ_n,+}}(∏_{i∈A}s(i)) = ⟨∏_{i∈A}s(i)⟩^{β,0}_{Λ_n,+} of the correlations for any finite A ⊂ ℤ^d,
\[
(2.5)\qquad E_{\mu^\beta_+}\Bigl(\prod_{i\in A}s(i)\Bigr)=\lim_{n\to\infty}E_{\mu^{\beta,0}_{\Lambda_n,+}}\Bigl(\prod_{i\in A}s(i)\Bigr).
\]
It can be shown that these limits exist, but even without this information we can consider any limit through a subsequence (observing that the space Ω is compact). In view of (2.4) and (2.5), we necessarily have E_{μ^β_+}(s(i)) ≥ ε. On the other hand, any measure μ^β_− obtained with the − boundary condition satisfies E_{μ^β_−}(s(i)) ≤ −ε. Having two measures with different expectations of the same random variable, we conclude that these two measures differ and thus that there are at least two distinct infinite volume Gibbs measures. We say that at h = 0 (and β sufficiently large) two phases coexist: the plus phase, whose probabilistic distribution is governed by μ^β_+, and the minus phase, governed by μ^β_−.
There is an alternative way of characterising the limiting measures on the space Ω, employing the so-called DLR equations (DLR stands for Dobrushin, Lanford, and Ruelle [Do68, LR]). Namely, we say that a measure μ on Ω is a Gibbs measure if it satisfies the equation
\[
\mu(A)=\int\mu^{\beta,0}_{\Lambda,s}(A)\,\mathrm d\mu(s)
\]
for any finite Λ ⊂ ℤ^d and any event A ⊂ {−1,1}^Λ (that can be identified with a subset of Ω given as {s ∈ Ω : s_Λ ∈ A}). An alternative way of formulating this is to say that the conditional probability with respect to the measure μ of the event {s : s_Λ = σ_Λ}, knowing that s_{Λᶜ} = s̄_{Λᶜ}, is μ^{β,0}_{Λ,s̄}(σ_Λ).
It can be shown that any weak limit of measures μ^{β,0}_{Λ_n,σ_n} with boundary conditions σ_n outside Λ_n, Λ_n ↗ ℤ^d, is a Gibbs measure. In addition, the closed (in the weak topology) convex hull of the set of all such limits coincides with the set of all Gibbs measures¹.
Notice that due to the plus-minus symmetry, we know a priori that the transition happens at h = 0. Any modification of the Hamiltonian (1.3) breaking this symmetry will change this fact and will result in the need of a different method of proof.
As an example, we can consider the simplest such modification (in the case d = 2), consisting in adding to the Hamiltonian (1.3) the term −δ Σ_{i,j,k} s(i)s(j)s(k), δ > 0, where the sum is over all triplets i, j, k ∈ ℤ² such that j₁ = i₁, j₂ = i₂ + 1, k₁ = i₁ + 1, and k₂ = i₂. While it is easy to verify that at zero temperature the transition is just shifted to h = −δ, at nonzero temperatures it will occur at a particular value h = h(β), where h(β) grows with decreasing β (growing temperature 1/β). To understand the reason for that, consider the contributions of the lowest excitations (one flipped spin) to the ground configurations + and − at the value h = −δ. The change of energy when flipping one + to − in the pure + configuration is 4 + 2h + 6δ = 4 + 4δ, while if one − is flipped to + in the pure − configuration, the energy changes by 4 − 2h − 4δ = 4 − 2δ. Correspondingly, the contribution of this excitation in the + state in volume Λ is |Λ|e^{−4β−4βδ} (we have |Λ| positions at which the excitation can be placed), while for the − state it is the bigger |Λ|e^{−4β+2βδ}, favouring the − state at nonzero temperature and causing the coexistence line to shift to h(β) > −δ. We say that at h = −δ only the − phase is stable at non-vanishing temperature. We should note that turning this reasoning into a rigorous description requires a careful discussion, summarised in the technique called the Pirogov–Sinai theory [PS].
In any case, the reasoning above was based purely on the evaluation of the energy of the lowest excitations, which determined which phase is stable for given parameters. However, in some cases the energy alone is not sufficient to determine which phase is stable; the number of excitations with the lowest energy (entropy) is decisive. The simplest such case is the Blume–Capel model with spin s(i) ∈ {−1, 0, 1} and with the Hamiltonian
\[
H_\Lambda(s)=\sum_{\substack{\{i,j\}\subset\Lambda\\ |i-j|=1}}\bigl(s(i)-s(j)\bigr)^2+h\sum_{i\in\Lambda}s(i)+\lambda\sum_{i\in\Lambda}s(i)^2.
\]
Here, for h = λ = 0, all three phases −1, 0, +1 are stable at β = ∞, while for β finite only the phase 0 is stable. The reason lies in the fact that while the energy cost of the lowest excitation is the same (the corresponding Boltzmann factor being e^{−4β}) for all three phases, the main contribution to the + or − phase is a single flip to 0, whereas for the phase 0 there are two possible relevant flips, either to +1 or to −1. Thus the state 0 is favoured, with the lowest excitation contribution 2|Λ|e^{−4β} exceeding the contribution |Λ|e^{−4β} for either the + or the − state.

¹For a detailed account in full generality see [Ge, FV].

Finally, there are cases where the stability of a phase relies on a competition between the energy and the entropy of relevant states. Such an order/disorder transition occurs for the Potts model with spin s(i) ∈ {1, 2, ..., q} and with the Hamiltonian
\[
H_\Lambda(s)=-\sum_{\substack{\{i,j\}\subset\Lambda\\ |i-j|=1}}\bigl(\delta_{s(i),s(j)}-1\bigr).
\]
Here, for low temperatures, we still have coexistence of q ordered phases in which nearest neighbours are typically occupied by the same spin, in a similar way as in the Ising model. In addition, there exists a temperature β_t at which all q ordered phases coexist with a disordered phase in which nearest neighbour spins typically differ. Heuristically, this additional transition occurs for β_t chosen so that the contributions of the two kinds of phases are equal. Namely, the contribution e⁰ = 1 of the ordered state matches the contribution q^{|Λ|}e^{−dβ_t|Λ|} of the disordered state, yielding thus β_t ∼ (log q)/d. Again, a rigorous evaluation of β_t and a proof of the order/disorder coexistence needs enhanced techniques, based either on the Pirogov–Sinai theory or on so-called reflection positivity [KS].

LECTURE 3 Expansions

The high temperature expansion

Our remaining task for the Ising model is to show the analyticity of the free energy f(β,h) in the region of high temperatures. While we have proven the discontinuity of the derivative ∂f(β,h)/∂h at h = 0 for β ≥ β₀, we will now show the existence of β₁ < β₀ such that f(β,h) is analytic in the set R_HT = {(β,h) : β < β₁}. We start by using the equalities e^{βs(i)s(j)} = cosh(β) + s(i)s(j) sinh(β) and e^{βhs(i)} = cosh(βh) + s(i) sinh(βh) to rewrite the Gibbs factor with free boundary condition.

Pulling out the factors cosh(β)^{|E(Λ)|} and cosh(βh)^{|Λ|}, we get
\[
\frac{\exp\{-\beta H_\Lambda(s)\}}{\cosh(\beta)^{|E(\Lambda)|}\cosh(\beta h)^{|\Lambda|}}=\prod_{\{i,j\}\in E(\Lambda)}\bigl(1+s(i)s(j)\tanh(\beta)\bigr)\prod_{i\in\Lambda}\bigl(1+s(i)\tanh(\beta h)\bigr)=
\]
\[
=\sum_{B\subset E(\Lambda)}\tanh(\beta)^{|B|}\prod_{\{i,j\}\in B}s(i)s(j)\ \sum_{J\subset\Lambda}\tanh(\beta h)^{|J|}\prod_{i\in J}s(i).
\]
The first sum is over all subsets B of E(Λ), the set of all edges (pairs of nearest neighbours) in Λ, while the second sum is over all subsets J ⊂ Λ. Let V(B) be the set of all endpoints of the edges from B and V_odd(B) the set of those contained in an odd number of edges from B. Observing that after summing over all configurations s_Λ only the terms with an even number of s(i)'s for each i in the product above do not vanish, we get
\[
\frac{Z_\Lambda(\beta,h)}{\cosh(\beta)^{|E(\Lambda)|}\cosh(\beta h)^{|\Lambda|}}=2^{|\Lambda|}\sum_{B\subset E(\Lambda)}\tanh(\beta)^{|B|}\tanh(\beta h)^{|V_{\mathrm{odd}}(B)|}.
\]
Introducing the shorthand¹
\[
(3.1)\qquad \widetilde Z_\Lambda(\beta,h)=\frac{Z_\Lambda(\beta,h)}{\bigl(2\cosh(\beta h)\bigr)^{|\Lambda|}\cosh(\beta)^{|E(\Lambda)|}},
\]

¹The free energy does not depend on the boundary condition; we have chosen the free boundary condition here.

we get
\[
(3.2)\qquad \widetilde Z_\Lambda(\beta,h)=\sum_{B\subset E(\Lambda)}\tanh(\beta)^{|B|}\tanh(\beta h)^{|V_{\mathrm{odd}}(B)|},
\]
where the sum is over all subsets of E(Λ). Notice that the contributions tanh(βh) stem from the terms s(i) tanh(βh) that do not vanish after summing over all configurations s_Λ only if the site i is also contained in an odd number of terms of the form s(i)s(j) tanh(β). This explains the form of the factor tanh(βh)^{|V_odd(B)|}.
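Since (3.2) is an exact identity, it can be verified by brute force on a small box (a minimal sketch; the 2 × 2 box and the values of β, h are arbitrary choices):

```python
import itertools, math

def check_high_T_identity(beta=0.4, h=0.3, L=2):
    """Check (3.2): the normalised partition function equals the sum over
    edge subsets B of tanh(beta)^|B| tanh(beta*h)^|V_odd(B)|."""
    sites = [(x, y) for x in range(L) for y in range(L)]
    edges = [(a, b) for a in sites for b in sites
             if a < b and abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1]
    # Left side: Z_Lambda / ((2 cosh(beta h))^|Lambda| cosh(beta)^|E|), cf. (3.1)
    Z = 0.0
    for conf in itertools.product([-1, 1], repeat=len(sites)):
        s = dict(zip(sites, conf))
        H = -sum(s[a] * s[b] for a, b in edges) - h * sum(conf)
        Z += math.exp(-beta * H)
    lhs = Z / ((2 * math.cosh(beta * h)) ** len(sites) * math.cosh(beta) ** len(edges))
    # Right side of (3.2): sum over all subsets B of the edge set
    rhs = 0.0
    for k in range(len(edges) + 1):
        for B in itertools.combinations(edges, k):
            deg = {v: 0 for v in sites}
            for a, b in B:
                deg[a] += 1; deg[b] += 1
            n_odd = sum(1 for v in sites if deg[v] % 2 == 1)
            rhs += math.tanh(beta) ** len(B) * math.tanh(beta * h) ** n_odd
    print(lhs, rhs)   # the two numbers agree

check_high_T_identity()
```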

[Figure 1. A high temperature polymer (for d = 2) with the weight tanh(β)^{10} tanh(βh)^{4}. The vertices from V_odd(B) are denoted by thick dots.]

The expression (3.2) can be rewritten as
\[
(3.3)\qquad \widetilde Z_\Lambda(\beta,h)=\sum_{\mathcal B}\prod_{B\in\mathcal B}w(B),
\]
where the sum is over collections 𝓑 of connected (when viewed as collections of edges of a subgraph of (ℤ^d, E(ℤ^d))) sets B ⊂ E(Λ) such that distinct elements of 𝓑 are disconnected. The elements B ∈ 𝓑 are called (high-temperature) polymers and we denote by 𝓟 the set of all of them. The weight w(B) is, for any B ∈ 𝓟, defined by
\[
(3.4)\qquad w(B)=\tanh(\beta)^{|B|}\tanh(\beta h)^{|V_{\mathrm{odd}}(B)|}.
\]

Two polymers B₁ and B₂ are incompatible, B₁ ∼ B₂, once their union is connected. Notice that polymers from 𝓟 are considered with their position on ℤ^d; equivalent graphs B at different positions are considered to be distinct polymers. The set 𝓑 can thus be viewed as an independent set of vertices in a graph whose vertices are polymers, with edges connecting incompatible polymers.
The formula (3.3) gives us an expansion of Z̃_Λ(β,h) in powers of tanh(β) and tanh(βh), but using it naively we will hardly see the proportionality of log Z̃_Λ(β,h) to |Λ| that would yield a useful expansion of f(β,h). Taking, for example, the lowest term w(B) = tanh(β) tanh(βh)², corresponding to B consisting of a single edge, we get in the lowest order Z̃_Λ(β,h) ∼ 1 + d|Λ|w(B) + ... (we use that w(B) does not depend on the position of B, with the factor d stemming from the d possible directions of the edge forming B). This does not look like a term whose logarithm would be proportional to |Λ|. We can try to improve this by taking disjoint pairs, triplets, etc. of such terms at different positions. Ignoring the condition of disjointness, we get
\[
\widetilde Z_\Lambda(\beta,h)\sim 1+d|\Lambda|\,w(B)+\binom{d|\Lambda|}{2}w(B)^2+\binom{d|\Lambda|}{3}w(B)^3+\cdots=(1+w(B))^{d|\Lambda|}+\ldots.
\]

This looks better (log Z̃_Λ(β,h) is proportional to |Λ|), but we have ignored the disjointness of polymers. Thus we need to subtract the corresponding terms. This will lead to corrections in orders w(B)² and higher. Cluster expansion is a method that allows for a systematic treatment of the arising series for the logarithm of the partition function. It applies not only to the Ising model at high temperature but to a whole large class of models, including the interacting particle system we mentioned in the first lecture.

Intermezzo (cluster expansions)

As usual, the proofs turn out to be more straightforward once we restrict ourselves to the abstract bones of the claim².

Consider a countable graph G = (V, E) (without self-loops). We call its vertices v ∈ V abstract polymers and say that two vertices v, v′ ∈ V are incompatible if {v, v′} ∈ E (no self-loops: only distinct vertices may be incompatible). Further, we assume that a function w : V → ℂ assigns a weight to each abstract polymer. Whenever L ⊂ V is finite, we take G[L] to be the induced subgraph of G spanned by L. Define
\[
(3.5)\qquad Z_L(w)=\sum_{I\subset L}\prod_{v\in I}w(v),
\]
with the sum running over all independent sets I of vertices in L (no two vertices in I are connected by an edge)—in other words, over all collections I of compatible abstract polymers.
The partition function Z_L(w) is an entire function in w = {w(v)}_{v∈L} ∈ ℂ^{|L|} and Z_L(0) = 1. Hence, it is nonvanishing in some neighbourhood of the origin w = 0, and its logarithm is, on this neighbourhood, an analytic function yielding the convergent Taylor series
\[
\log Z_L(w)=\sum_{X\in\mathcal X(L)}a_L(X)\,w^X.
\]
Here, 𝓧(L) is the set of all multi-indices X : L → {0, 1, ...} and w^X = ∏_v w(v)^{X(v)}. Inspecting the Taylor formula for a_L(X) in terms of the corresponding derivatives of log Z_L(w) at the origin w = 0, it is easy to show that the coefficients a_L(X) actually do not depend on L: a_L(X) = a_{supp X}(X), where supp X = {v ∈ V : X(v) ≠ 0}. As a result, one gets the existence of coefficients a(X) for each X ∈ 𝓧 = {X : V → {0,1,...}, |X| = Σ_{v∈V} X(v) < ∞} such that
\[
(3.6)\qquad \log Z_L(w)=\sum_{X\in\mathcal X(L)}a(X)\,w^X
\]
for every finite L ⊂ V (with convergence on a small neighbourhood of the origin depending on L).
Notice that a(X) ∈ ℝ for all X (consider Z_L(w) with real w) and a(X) = 0 whenever G[supp X] is not connected (just notice that, by definition, Z_{supp X}(w) = Z_{L₁}(w) Z_{L₂}(w) once supp X = L₁ ∪ L₂ with no edges between L₁ and L₂). What is the domain of convergence?
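Definition (3.5) is a finite sum over independent sets and easy to enumerate for a toy incompatibility graph (a minimal sketch; the graph and the weights are illustrative):

```python
import itertools, math

def Z_L(vertices, edges, w):
    """Partition function (3.5): sum over independent sets I of the
    incompatibility graph of the products of weights w(v), v in I."""
    E = {frozenset(e) for e in edges}
    total = 0.0
    for k in range(len(vertices) + 1):
        for I in itertools.combinations(vertices, k):
            if all(frozenset(p) not in E for p in itertools.combinations(I, 2)):
                total += math.prod(w[v] for v in I)
    return total

# Toy incompatibility graph: a path a - b - c; only {a, c} are compatible.
V, E = ['a', 'b', 'c'], [('a', 'b'), ('b', 'c')]
w = {'a': 0.1, 'b': 0.2, 'c': 0.1}
print(Z_L(V, E, w))   # 1 + w_a + w_b + w_c + w_a*w_c = 1.41
```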

²Our approach here is mainly inspired by [Do96], as is also [M-S], whose treatment we are following quite closely.

For each finite L ⊂ V, consider the polydiscs D_{L,R} = {w : |w(v)| ≤ R(v) for v ∈ L} with a set of radii R = {R(v); v ∈ V}. The condition most natural for the inductive proof (leading at the same time to the strongest claim) turns out to be the Dobrushin condition:

There exists a function r : V → [0,1) such that, for each v ∈ V,
\[
(*)\qquad R(v)\le r(v)\prod_{v'\in\mathcal N(v)}\bigl(1-r(v')\bigr).
\]
Here N(v) = {v′ ∈ V : {v,v′} ∈ E} denotes the set of abstract polymers incompatible with v. Saying that X ∈ 𝓧 is a cluster if the graph G[supp X] is connected, we can summarise the cluster expansion claim for an abstract polymer model in the following way:

Theorem (Cluster expansion). There exists a function a : 𝓧 → ℝ that is nonvanishing only on clusters, such that for any sequence of radii R satisfying the condition (∗) with a sequence {r(v)}, the following holds true:

(i) For every finite L ⊂ V and any polymer weight w ∈ D_{L,R}, one has Z_L(w) ≠ 0 and
\[
\log Z_L(w)=\sum_{X\in\mathcal X(L)}a(X)\,w^X;
\]
(ii) \(\displaystyle\sum_{X\in\mathcal X:\,\mathrm{supp}\,X\ni v}|a(X)|\,|w|^X\le-\log\bigl(1-r(v)\bigr).\)

Deferring the proof to the end of the section, let us come back to the high temperature expansion and verify that the condition (∗) is satisfied for the weights (3.4). Actually, we will verify a slightly stronger condition:

There exists a function b : V → [0,∞) such that, for each v ∈ V,
\[
(**)\qquad R(v)e^{b(v)}+\sum_{v'\in\mathcal N(v)}R(v')e^{b(v')}\le b(v).
\]
Indeed, having R satisfying (∗∗) with a function b and taking r(v) = 1 − exp(−R(v)e^{b(v)}), one has
\[
\text{RHS of }(*)=\Bigl(1-\exp\bigl(-R(v)e^{b(v)}\bigr)\Bigr)\prod_{v'\in\mathcal N(v)}\exp\bigl(-R(v')e^{b(v')}\bigr)\ge
\]
\[
\ge\Bigl(1-\exp\bigl(-R(v)e^{b(v)}\bigr)\Bigr)e^{-b(v)}\exp\bigl(R(v)e^{b(v)}\bigr)=\Bigl(\exp\bigl(R(v)e^{b(v)}\bigr)-1\Bigr)e^{-b(v)}\ge R(v).
\]
The first inequality follows from (∗∗) and for the second we used that e^x − 1 ≥ x.
Let us choose b(B) = |B| and let e·tanh(β₁) = e^{−τ}. Then, for β ≤ β₁, w(B)e^{b(B)} ≤ e^{−τ|B|}. The set B can be viewed as the set of edges of a connected subgraph of (ℤ^d, E(ℤ^d)). To evaluate the number of such connected graphs we can use:

Lemma 3.1. In a connected graph of maximal degree k, the number of connected subsets of n of its edges containing a fixed edge e is bounded by k^{2n}.

Proof of Lemma 3.1. Just notice that for any connected subgraph there is a path starting at a given edge, visiting all of its edges, and using each edge at most twice (doubling each edge, this amounts to the Euler theorem asserting the existence of an Euler path once the degree of each vertex is even). Its length is thus at most 2n. Since at every vertex the path can continue in at most k ways, we get the upper bound k^{2n}. □

Since the degree of the graph (ℤ^d, E(ℤ^d)) is 2d, we get
\[
(3.7)\qquad \sum_{B:\,B\ni e}e^{-\tau|B|}\le\sum_{n=1}^{\infty}(\kappa e^{-\tau})^n=\frac{\kappa e^{-\tau}}{1-\kappa e^{-\tau}}\le 1/2
\]
with κ = κ(d) = (2d)², once 3κe^{−τ} ≤ 1. With R(B) = e^{−(τ+1)|B|} this implies the condition (∗∗). Thus, the claim of the cluster expansion theorem can be applied to the high temperature expansion once β ≤ β₁ with
\[
\tanh(\beta_1)=\frac{1}{3\kappa e}.
\]
In particular, we have
\[
(3.8)\qquad \log\widetilde Z_\Lambda(\beta,h)=\sum_{X\in\mathcal X(\Lambda)}a(X)\,w^X
\]
with 𝓧(Λ) denoting the set of all multi-indices on polymers in Λ. Using V(X) = ∪_{B∈supp X} V(B), let us define the pressure of the high temperature polymer gas by the explicit formula
\[
(3.9)\qquad p(\beta,h)=\sum_{X:\,V(X)\ni i}\frac{a(X)}{|V(X)|}\,w^X
\]
for any fixed i ∈ ℤ^d (by translation invariance of the contributing terms, the choice of i is irrelevant). The function p(β,h) is analytic on R_HT, since it is given by a series converging, in view of claim (ii) of the cluster expansion theorem, absolutely and uniformly on R_HT, with terms that are analytic functions (powers of tanh(β) and tanh(βh)).
We will argue that, for real β and h, the limiting free energy (2.1) can be explicitly expressed in terms of the function p(β,h). Namely,
\[
(3.10)\qquad -\beta f(\beta,h)=p(\beta,h)+\log\bigl(2\cosh(\beta h)\bigr)+d\log\cosh(\beta).
\]
Indeed, combining (3.8) with (3.9), we get
\[
(3.11)\qquad \log\widetilde Z_\Lambda(\beta,h)-p(\beta,h)|\Lambda|=\sum_{X\in\mathcal X(\Lambda)}a(X)w^X-\sum_{i\in\Lambda}\ \sum_{X:\,V(X)\ni i}\frac{a(X)}{|V(X)|}w^X=-\sum_{X\notin\mathcal X(\Lambda)}\frac{|\Lambda\cap V(X)|}{|V(X)|}\,a(X)w^X.
\]
With the help of the cluster expansion theorem and the bound (3.7), the right hand side can be bounded by
\[
(3.12)\qquad \sum_{X\notin\mathcal X(\Lambda)}|a(X)w^X|\le\sum_{i\in\partial\Lambda}\ \sum_{X:\,V(X)\ni i}|a(X)w^X|\le\sum_{i\in\partial\Lambda}\ \sum_{B:\,V(B)\ni i}\ \sum_{X:\,\mathrm{supp}\,X\ni B}|a(X)w^X|\le 2\sum_{i\in\partial\Lambda}\ \sum_{B:\,B\ni e}e^{-\tau|B|}\le|\partial\Lambda|.
\]
Combining now the definition (2.1) with (3.1), and using (3.11) with the bound (3.12), we get the claim (3.10).

Proof of cluster expansion theorem

In addition to the properties formulated earlier (a(X) ∈ ℝ and a(X) = 0 whenever X is not a cluster), the coefficients a(X) have alternating signs: (−1)^{|X|+1}a(X) ≥ 0. To prove this claim, one first notices an equivalent formulation:

Lemma (alternating signs). For every finite L ⊂ V, all coefficients of the expansion of −log Z_L(−|w|) in powers |w|^X are nonnegative.

Indeed, the equivalence with the alternating signs property follows by observing that, due to (3.6), one has
\[
-\log Z_L(-|w|)=-\sum_{X\in\mathcal X(L)}a(X)(-1)^{|X|}|w|^X
\]
(and every X has supp X ⊂ L for some finite L).

Proof of the Lemma, by induction on |L|. Using the shorthand Z*_L = Z_L(−|w|), we notice that
\[
Z^*_\emptyset=1\ \text{with}\ -\log Z^*_\emptyset=0,\quad\text{and}\quad Z^*_{\{v\}}=1-|w(v)|\ \text{with}\ -\log Z^*_{\{v\}}=\sum_{n=1}^{\infty}\frac{|w(v)|^n}{n}.
\]
Using N(v) to denote the set of vertices v′ ∈ V adjacent in the graph G to the vertex v, for w small and L̄ = L ∪ {v}, one has from the definition Z*_{L̄} = Z*_L − |w(v)| Z*_{L\N(v)}, yielding
\[
-\log Z^*_{\bar L}=-\log Z^*_L-\log\Bigl(1-|w(v)|\frac{Z^*_{L\setminus\mathcal N(v)}}{Z^*_L}\Bigr)
\]
(we consider w for which all concerned Taylor expansions for log Z*_W with W ⊂ L̄ converge). The first term on the RHS has nonnegative coefficients by the induction hypothesis. Taking into account that −log(1−z) has only nonnegative coefficients and that
\[
\frac{Z^*_{L\setminus\mathcal N(v)}}{Z^*_L}=\exp\Bigl\{\sum_{X\in\mathcal X(L)\setminus\mathcal X(L\setminus\mathcal N(v))}|a(X)|\,|w|^X\Bigr\}
\]
also has only nonnegative coefficients, the whole expression on the RHS necessarily has only nonnegative coefficients. □

Proof of Cluster Expansion Theorem. Again by induction on |L|, we prove (i) and (ii)_L, obtained from (ii) by restricting the sum to X ∈ 𝓧(L). Assuming Z_L ≠ 0 and
\[
\sum_{X\in\mathcal X(L):\,\mathrm{supp}\,X\cap\mathcal N(v)\neq\emptyset}|a(X)|\,|w|^X\le-\sum_{v'\in\mathcal N(v)}\log\bigl(1-r(v')\bigr),
\]
obtained by iterating (ii)_L, we use
\[
Z_{\bar L}=Z_L\Bigl(1+w(v)\frac{Z_{L\setminus\mathcal N(v)}}{Z_L}\Bigr)
\]
and the bound
\[
\Bigl|1+w(v)\frac{Z_{L\setminus\mathcal N(v)}}{Z_L}\Bigr|\ge 1-|w(v)|\exp\Bigl\{\sum_{X\in\mathcal X(L)\setminus\mathcal X(L\setminus\mathcal N(v))}|a(X)|\,|w|^X\Bigr\}\ge 1-|w(v)|\prod_{v'\in\mathcal N(v)}\bigl(1-r(v')\bigr)^{-1}\ge 1-r(v)>0
\]
to conclude that Z_{L̄} ≠ 0. To verify (ii)_{L̄}, we write
\[
\sum_{X\in\mathcal X(\bar L):\,\mathrm{supp}\,X\ni v}|a(X)|\,|w|^X=-\log Z^*_{\bar L}+\log Z^*_L=-\log\Bigl(1-|w(v)|\frac{Z^*_{L\setminus\mathcal N(v)}}{Z^*_L}\Bigr)\le-\log\bigl(1-r(v)\bigr).\qquad\square
\]

LECTURE 4 Gradient models of random surface

Let us return now to the gradient Gibbs measure μ_{Λ,ψ}(dϕ) as defined in (1.1) in the first lecture. We will first restrict ourselves to the case of a scalar field with m = 1. This is a model of a random surface: the field ϕ(i) is interpreted as the height of the surface above the lattice site i ∈ ℤ^d, with fixed heights ψ at the boundary:

[Figure: a random surface ϕ over Λ pinned to the boundary heights ψ.]

Here, we will confine our attention to the case of a nearest neighbour interaction with a boundary condition,
\[
(4.1)\qquad H_\Lambda(\varphi\mid\psi)=\sum_{\substack{\{i,j\}\cap\Lambda\neq\emptyset\\ |i-j|=1}}U(\varphi(i)-\varphi(j)),
\]
where the interaction potential U is a real function, U : ℝ → ℝ, and, again, the value of the field ϕ outside of Λ is fixed to equal ψ. The finite volume Gibbs measures are explicitly defined by
\[
(4.2)\qquad \mu_{\Lambda,\psi}(\mathrm d\varphi)=\frac{\exp\{-\beta H_\Lambda(\varphi\mid\psi)\}}{Z_{\Lambda,\psi}}\prod_{i\in\Lambda}\mathrm d\varphi(i).
\]

This model has been extensively studied, especially in the case of a strictly convex interaction U, and there exist several excellent reviews, for example the review article by T. Funaki [Fu] or the book by S. Sheffield [Sh]. Here, we will just briefly mention some basic facts (following mainly [Fu], where detailed citations to the original works can be found) that will play a role in the subsequent exposition.

Quadratic potential

First of all, the limiting Gibbs measures are easily constructed in the particular case of a quadratic potential, U(s) = s²/2, also known as the massless Gaussian free field (GFF). Using the fact that all finite volume Gibbs measures are Gaussian, the limiting Gibbs measures are necessarily also Gaussian and can be explicitly constructed.


To state the result, we need a few basic notions. First, let us introduce the discrete Laplacian Δ on ℤ^d as the operator
\[
\Delta\varphi(i)=\sum_{j\in\mathbb Z^d:\,|i-j|=1}\bigl(\varphi(j)-\varphi(i)\bigr).
\]
A function ψ : ℤ^d → ℝ is harmonic if Δψ(i) = 0 for each i ∈ ℤ^d. Finally, the Green function G(i−j) = (−Δ)^{−1}(i,j) of the Laplacian Δ is explicitly given by
\[
G(j)=\frac{1}{2(2\pi)^d}\int_{(-\pi,\pi]^d}\frac{e^{-\mathrm i(j,p)}}{\sum_{\ell=1}^d(1-\cos p_\ell)}\,\mathrm dp.
\]
It turns out that G(j) < ∞ only if d ≥ 3. In this case, we have:

Proposition 4.1. Let d ≥ 3 and let ψ be a harmonic function. Then the weak limit (on ℝ^{ℤ^d} endowed with the product topology) of the measures μ_{Λ,ψ} as Λ ↗ ℤ^d exists. It is the Gaussian measure μ_ψ that is uniquely determined by its mean and covariance,
\[
E_{\mu_\psi}(\varphi(i))=\psi(i)\quad\text{and}\quad E_{\mu_\psi}\bigl(\varphi(i)\varphi(j)-\psi(i)\psi(j)\bigr)=G(i-j)\quad\text{for any }i,j\in\mathbb Z^d.
\]
Remarks. (1) Thus, for d ≥ 3, there are infinitely many distinct Gibbs measures in the thermodynamic limit, since there are infinitely many distinct harmonic functions. In particular, all linear functions ψ_t(i) = (t,i) with a fixed t ∈ ℝ^d are harmonic. The corresponding Gibbs measure μ_{ψ_t} is characterized by the tilt t: it describes a random surface whose mean is a plane with the inclination determined by the normal vector t,
\[
E_{\mu_{\psi_t}}(\varphi(i))=(t,i).
\]
(2) On the other hand, for d = 1, 2, the limit does not exist. This can be seen by observing that, say, the Gibbs state μ_{Λ_N,ψ₀} in a cube Λ_N = [−N,N]^d with the zero boundary condition ψ₀(i) ≡ 0 is the Gaussian field with vanishing mean and diverging covariance G_{Λ_N}(i,j) = (−Δ_{Λ_N})^{−1}(i,j). Here Δ_{Λ_N} is the |Λ_N| × |Λ_N| matrix corresponding to the Laplacian Δ restricted to Λ_N with the zero boundary condition, yielding G_{Λ_N}(0,0) ∼ N for d = 1 and G_{Λ_N}(0,0) ∼ log N for d = 2.
Nevertheless, if we consider the induced measure on the gradients ∇_ℓϕ(i) = ϕ(i + e_ℓ) − ϕ(i), i ∈ ℤ^d, ℓ = 1,2,...,d, with e_ℓ denoting the unit vector in the direction ℓ, we get E_{μ_{Λ_N,ψ_t}}(∇_ℓϕ(i)) = ∇_ℓψ_t(i) and
\[
(4.3)\qquad E_{\mu_{\Lambda_N,\psi_t}}\bigl[(\nabla_\ell\varphi(i)-\nabla_\ell\psi_t(i))(\nabla_k\varphi(j)-\nabla_k\psi_t(j))\bigr]=\nabla_{i,\ell}\nabla_{j,k}G_{\Lambda_N}(i-j)
\]
(here ∇_{i,ℓ} denotes the gradient with respect to the variable i in the direction ℓ, and similarly for ∇_{j,k}). The right hand side above has a finite limit for any d ≥ 1, allowing eventually to introduce consistently the thermodynamic limit of the measure of the field of gradients ∇ϕ = {∇_ℓϕ(i); i ∈ ℤ^d, ℓ = 1,...,d}—the gradient Gibbs measure μ^∇_t. It turns out that the resulting gradient Gibbs measure μ^∇_t is tempered (finite second moments of the field gradients), shift invariant¹, and ergodic under spatial shifts, with correlation decay given by a version of (4.3) with G(i−j) ∼ 1/|i−j|^{d−2}.
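As a sanity check on the explicit formula for G, a rough midpoint-rule quadrature for d = 3 (a minimal sketch; the grid size n is just an accuracy knob):

```python
import numpy as np

def green_G0(d=3, n=40):
    """Midpoint-rule estimate of G(0) = (2(2pi)^d)^{-1} * integral over
    (-pi,pi]^d of dp / sum_l (1 - cos p_l); the p = 0 singularity of the
    integrand is integrable for d >= 3."""
    h = 2 * np.pi / n
    p = -np.pi + h * (np.arange(n) + 0.5)          # midpoints avoid p = 0
    grids = np.meshgrid(*([p] * d), indexing='ij')
    denom = sum(1.0 - np.cos(g) for g in grids)
    return (h ** d / (2 * (2 * np.pi) ** d)) * np.sum(1.0 / denom)

print(green_G0())   # roughly 0.25, close to the known value ~0.2527 for d = 3
```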

¹The invariance of the measure μ^∇_{ψ_t} under spatial shifts is quite natural once we notice that E_{μ^∇_{ψ_t}}(∇_ℓϕ(i)) = E_{μ^∇_{ψ_t}}(∇_ℓϕ(j)) for any i, j ∈ ℤ^d, since ∇_ℓψ_t(i) = (t, e_ℓ) = ∇_ℓψ_t(j).

(3) In addition to the Gibbs measures in the thermodynamic limit, the quadratic potential also allows for an explicit computation of the surface tension—an interface free energy corresponding to a random surface with a linear slope boundary condition. The surface tension σ(t) thus measures the cost per unit area (in terms of the free energy) of a random surface whose mean inclination is given by the tilt t. Explicitly, it is defined in terms of the partition function with the boundary condition ψ_t,
\[
Z_{\Lambda_N,\psi_t}=\int_{(\mathbb R^m)^{\Lambda_N}}\exp\{-\beta H_{\Lambda_N}(\varphi\mid\psi_t)\}\prod_{i\in\Lambda_N}\mathrm d\varphi(i),
\]
by the limit
\[
\sigma(t)=-\lim_{N\to\infty}\frac{1}{\beta}\frac{1}{(2N+1)^d}\log Z_{\Lambda_N,\psi_t}.
\]
The existence of the limit is easy to verify with the help of a subadditivity argument. In the case of the quadratic potential, U(s) = s²/2, one explicitly gets
\[
\sigma(t)=\sigma(0)+\frac12|t|^2
\]
with the Euclidean norm |t|. Notice that, contrary to the Ising model (and in general models with compact spins), whose free energy does not depend on the boundary condition (recall the claim about the existence of the free energy from the second lecture), here the free energy—the surface tension—depends on the boundary condition. This is a direct consequence of the fact that the field ψ_t on the boundary of the cube Λ_N attains unbounded values, proportional to N.

Convex potentials

Main results concerning the gradient Gibbs measures and the surface tension for the quadratic potential can be extended to a class of strictly convex potentials U in (4.1). In particular, assuming smoothness, symmetry, and strict convexity of the potential U (i.e., U ∈ C²(ℝ), U(−s) = U(s) for s ∈ ℝ, and U″(s) ∈ (c₋, c₊) for any s ∈ ℝ with fixed c₊ > c₋ > 0), it was shown in [FS] that σ(t) is a convex function of the tilt t. Actually, this can be strengthened to a proof [DGI, GOS] of strict convexity of σ(t). Finally, Funaki and Spohn proved the uniqueness of the gradient Gibbs measure corresponding to a given tilt t. More explicitly, they have shown that for each tilt t there is a unique tempered, shift invariant, and ergodic gradient Gibbs measure.
When constructing an infinite volume gradient Gibbs measure by a limit of finite volume measures μ_{Λ_N,ψ_t}, one is confronted with a difficulty in proving that the limiting gradient Gibbs measure is actually shift invariant. This problem was solved in [FS] by what is now known as the Funaki–Spohn trick. Instead of enforcing the tilt by the boundary condition as in μ_{Λ_N,ψ_t}, one considers a shift invariant measure on a discrete torus T_N (this would automatically imply E(∇_ℓϕ(i)) = 0, corresponding to a vanishing tilt)—with the slope t enforced by replacing the potential U(∇_ℓϕ(i)) by a modified potential U(∇_ℓϕ(i) + (t,e_ℓ)). More explicitly, we consider a gradient field ∇ϕ̃ on the torus T_N = (ℤ/Nℤ)^d with a probability measure μ̃^∇_{T_N,t} defined so that the law of ∇ϕ̃ is determined as {∇_ℓϕ̃(i) = ∇_ℓϕ(i) + (t,e_ℓ) = ∇_ℓϕ(i) + t_ℓ; i ∈ T_N, ℓ = 1,...,d}, where ∇ϕ is distributed according to the probability measure μ^∇_{T_N,t} defined by
\[
\mu^\nabla_{T_N,t}(\mathrm d\varphi)=\frac{1}{Z_N(t)}\exp\Bigl\{-\beta\sum_{\substack{i\in T_N\\ \ell=1,2,\dots,d}}U^t\bigl(\nabla_\ell\varphi(i)\bigr)\Bigr\}\,\lambda(\mathrm d\varphi).
\]
Here, λ(dϕ) is the (N^d − 1)-dimensional Hausdorff measure on the space ℝ^{T_N}/ℝ of fields ϕ satisfying the condition
\[
\sum_{i\in T_N}\varphi(i)=0,
\]
U^t(∇_ℓϕ(i)) = U(∇_ℓϕ(i) + t_ℓ), and Z_N(t) is the normalization.
Given that the measure μ^∇_{T_N,t} is automatically translation invariant, and taking into account that the sum of the gradients ∇_{e_ℓ}ϕ(i) over any closed loop consisting of edges (i, i + e_ℓ) (including loops wrapped around the torus) necessarily vanishes, one observes that E_{μ^∇_{T_N,t}}(∇_ℓϕ(i)) = 0 and thus E_{μ̃^∇_{T_N,t}}(∇_ℓϕ̃(i)) = t_ℓ for any i ∈ T_N and ℓ = 1,...,d. Finally, and this is a crucial step, one has to check that the family of measures {μ̃^∇_{T_N,t}; N ∈ ℕ} is tight and that the limiting measure μ (with the limit taken possibly over a subsequence N_j) is indeed a Gibbs measure satisfying the DLR equations with respect to the original potential U(∇ϕ(i)).

Let us mention that in the case of a convex potential a number of additional results were proven (by Deuschel, Funaki, Sheffield, Giacomin, and many others), concerning the decay of correlations, concentration properties, limit theorems under appropriate scalings, etc.

Non-convex potentials

Considering a model with the Hamiltonian (4.1) but with a non-convex potential U, we are entering a completely new territory. In particular, one cannot expect either the uniqueness of gradient Gibbs measures corresponding to a fixed tilt t or the strict convexity of the surface tension σ(t). Unfortunately, the available techniques to investigate the non-convex case are rather limited, and we will restrict our discussion here to explaining only some basic ideas. Let us begin with non-uniqueness of gradient Gibbs measures.

Phase transition for a non-convex potential.

Essentially the only available technique to prove the existence of a phase transition in this case is the so-called method of reflection positivity. We have no space here to enter into an explanation of this technique (see, e.g., the recent review [Bi]). Let us just say that it has a very severe technical restriction that does not allow one to treat any potential that is not symmetric; we need the condition U(s) = U(−s) even for the modified potential when applying the Funaki–Spohn trick. Thus, one has to devise a model for which one expects a transition to happen at zero tilt (the boundary condition ψ₀).

In the only paper [BK] where, up to now, the existence of a phase transition for a gradient Gibbs measure has actually been proven, a potential of the following form was considered:

[Figure: a non-convex potential U(s) with a steep narrow depression around the origin inside a wide shallow well.]

Heuristically, what one expects is that at low temperatures (β large) the energy wins: a typical configuration in the Gibbs measure will feature fields with ∇ϕ(i) very close to 0, pushed there by the steep depression around the origin in the pictured potential—the contribution of the Boltzmann factor e^{−βU(∇ϕ(i))} is negligible if ∇ϕ(i) is away from the region covered by the depression. The Gibbs measure represents a low temperature ordered phase. On the other hand, at high temperatures (β small) the entropy wins: the influence of the Boltzmann factor is not so important, and integrating over large values of the gradient in a space of high dimension brings a significant contribution—typical values of ∇ϕ(i) are spread over a much bigger interval. The Gibbs measure represents a high temperature disordered phase. One could expect that there exists a particular temperature β_t—a point of phase coexistence—where these two possibilities coexist. Indeed, using the technique of reflection positivity it was possible to prove that this is the case [BK]. To simplify the technical difficulties, the particular potential for which the phase coexistence was proven was chosen in the following explicit form:
\[
(4.4)\qquad e^{-U(s)}=p\,e^{-\kappa_O s^2/2}+(1-p)\,e^{-\kappa_D s^2/2}.
\]

This is actually (for a particular choice of κ_O, κ_D, and p) exactly the potential pictured above. The parameter p plays the role of β here.

Theorem 4.1. For each ε > 0 there exist a constant c = c(ε) > 0 and, if κ_O ≥ cκ_D, a number p_t ∈ (0,1) such that, for the interaction U with p = p_t, there are two distinct, infinite-volume, shift-ergodic gradient Gibbs measures μ_ord and μ_dis of zero tilt, for which
\[
\mu_{\mathrm{ord}}\Bigl(|\nabla_\ell\varphi(i)|\ge\frac{\alpha}{\sqrt{\kappa_O}}\Bigr)\le\varepsilon+\frac{1}{4\alpha^2}\quad\text{for all }\alpha>0,
\]
and
\[
\mu_{\mathrm{dis}}\Bigl(|\nabla_\ell\varphi(i)|\le\frac{\alpha}{\sqrt{\kappa_D}}\Bigr)\le\varepsilon+c_1\alpha^{1/4}\quad\text{for all }\alpha>0.
\]
Here c₁ is a constant of order unity.

For d = 2 the value p_t can be determined explicitly by a duality argument,
\[
\frac{p_t}{1-p_t}=\Bigl(\frac{\kappa_D}{\kappa_O}\Bigr)^{1/4}.
\]
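Solving the duality relation for p_t is a one-liner (the values of κ_O, κ_D below are illustrative placeholders):

```python
# Transition point from the d = 2 duality relation
# p_t / (1 - p_t) = (kappa_D / kappa_O)^{1/4}.
kappa_O, kappa_D = 100.0, 1.0           # illustrative ratio kappa_O >> kappa_D
r = (kappa_D / kappa_O) ** 0.25
p_t = r / (1.0 + r)
print(p_t)   # ~0.240 for this ratio
```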

While it can be proven (see the next lecture) that the surface tension σ is in general a convex function of the tilt t, it is less clear under what conditions it is strictly convex.

Strict convexity of the surface tension.

We would not anticipate any phase transitions for β small, and thus it is natural to expect that σ(t) is strictly convex in this regime. Indeed, this (and much more, including the uniqueness of the ergodic shift-invariant Gibbs measure) was proven for a large class of models in [CDM] and [CD]. Models considered in those papers feature a potential of the form U = U₀ + V, where U₀ is strictly convex and the perturbation V is small. Once the perturbation and the temperature are sufficiently small (an example of the needed condition is the assumption that the second derivative of V is negative and √β‖V″‖_{L¹(ℝ)} is small enough), the surface tension is proven to be strictly convex. These results apply also to the potential U of the form (4.4), with the condition of smallness of β replaced by smallness of p, p ≪ p_t. We cannot expect strict convexity when p = p_t, due to an anticipated linearity of σ over a certain interval. Nevertheless, for p above p_t we would still presume strict convexity, at least for small tilts. The idea is that for large β (here, p close to 1) and κ_O > κ_D, the dominant contribution comes from the factor e^{−κ_O s²/2} governing the depression in the potential U.
One could expect that proving the latter claim should not be difficult using a form of low temperature cluster expansion. Let us explain why it is not so simple. We shall restrict ourselves to inspecting the simplest case, with potential U = U₀ + V with quadratic U₀, U₀(s) = s²/2, and with a small perturbation V assuming, say, that V(s) ≥ −s²/h² for some big constant h and that its first and second derivatives are uniformly bounded. To be sure that V will not directly influence the behaviour around the origin (the minimum of U₀), we can even assume that V vanishes, say, on the interval [−1,1]. Consider now the surface tension
\[
\sigma(t)=-\lim_{N\to\infty}\frac{1}{\beta}\frac{1}{N^d}\log Z_N(t),
\]
where the partition function Z_N(t) is defined on the torus employing the Funaki–Spohn trick,
\[
(4.5)\qquad Z_N(t)=\int\exp\Bigl\{-\beta\sum_{\substack{i\in T_N\\ \ell=1,2,\dots,d}}U\bigl(\nabla_\ell\varphi(i)+t_\ell\bigr)\Bigr\}\,\lambda(\mathrm d\varphi)
\]
(renaming ϕ̃ as ϕ). Rescaling the field ϕ(i) → ϕ(i)/√β and using the explicit quadratic form of U₀ as well as the condition Σ_{i∈T_N} ϕ(i) = 0, we get
\[
(4.6)\qquad Z_N(t)=e^{-\frac{\beta}{2}N^d|t|^2}\,Z^{(0)}_N\int\exp\Bigl\{-\beta\sum_{\substack{i\in T_N\\ \ell=1,2,\dots,d}}V\Bigl(\tfrac{1}{\sqrt\beta}\nabla_\ell\varphi(i)+t_\ell\Bigr)\Bigr\}\,\nu(\mathrm d\varphi),
\]
where ν is the Gaussian measure
\[
(4.7)\qquad \nu(\mathrm d\varphi)=\frac{1}{Z^{(0)}_N}\exp\Bigl\{-\frac12\sum_{\substack{i\in T_N\\ \ell=1,2,\dots,d}}\bigl(\nabla_\ell\varphi(i)\bigr)^2\Bigr\}\,\lambda(\mathrm d\varphi).
\]

Thus, finally, we get the surface tension as the leading contribution |t|²/2 with the addition of a term that could be expected to be just a small correction,
\[
(4.8)\qquad \sigma(t)=\frac12|t|^2-\lim_{N\to\infty}\frac{1}{\beta}\frac{1}{N^d}\log\mathcal Z_N(t).
\]
Here
\[
(4.9)\qquad \mathcal Z_N(t)=\int\exp\Bigl\{-\beta\sum_{\substack{i\in T_N\\ \ell=1,2,\dots,d}}V\Bigl(\tfrac{1}{\sqrt\beta}\nabla_\ell\varphi(i)+t_\ell\Bigr)\Bigr\}\,\nu(\mathrm d\varphi).
\]
Now, the hope is to investigate the asymptotics of the correction term with the help of cluster expansion. Let us begin by rewriting the integrand as
\[
\exp\Bigl\{-\beta\sum_{i\in T_N}\sum_{\ell=1}^d V\Bigl(\tfrac{1}{\sqrt\beta}\nabla_\ell\varphi(i)+t_\ell\Bigr)\Bigr\}=\prod_{i\in T_N}\Bigl(1+\Bigl[\exp\Bigl\{-\beta\sum_{\ell=1}^d V\Bigl(\tfrac{1}{\sqrt\beta}\nabla_\ell\varphi(i)+t_\ell\Bigr)\Bigr\}-1\Bigr]\Bigr)=
\]
\[
=\sum_{A\subset T_N}\prod_{i\in A}\Bigl[\exp\Bigl\{-\beta\sum_{\ell=1}^d V\Bigl(\tfrac{1}{\sqrt\beta}\nabla_\ell\varphi(i)+t_\ell\Bigr)\Bigr\}-1\Bigr]=\sum_{A\subset T_N}K_A(\nabla\varphi),
\]
with the sum over all subsets A ⊂ T_N. Using the assumptions on the perturbation V, it is possible to show that
\[
\sup_{z\in\mathbb R^d}\Bigl|e^{-\beta\sum_{\ell=1}^d V(z_\ell/\sqrt\beta+t_\ell)}-1\Bigr|\,e^{-|z|^2/h^2}\le\rho
\]
with a small ρ (once t is sufficiently small and β sufficiently large). Notice that we use here the fact that for a small t the left hand side vanishes for z such that 2 max_ℓ |z_ℓ| ≤ √β. As a result, evaluating the size of the terms K_A(∇ϕ) in terms of a norm ‖·‖_h with the weight factor of the form exp{−(1/h²)Σ_{i∈A}Σ_{ℓ=1}^d |∇_ℓϕ(i)|²}, we get ‖K_A‖_h ≤ ρ^{|A|} and thus
\[
|K_A(\nabla\varphi)|\le\rho^{|A|}\exp\Bigl\{\frac{1}{h^2}\sum_{i\in A}\sum_{\ell=1}^d|\nabla_\ell\varphi(i)|^2\Bigr\}.
\]
Observe that the positive term occurring in the exponent with a tiny prefactor 1/h² is of the same form as the negative terms in the definition (4.7) of the measure ν. This would allow one to proceed with the integration and makes it thinkable to begin with a "polymer representation"
\[
(4.10)\qquad \mathcal Z_N(t)=\sum_{A\subset T_N}\int K_A(\nabla\varphi)\,\nu(\mathrm d\varphi)
\]
to get a good control of log 𝒵_N(t) with the help of cluster expansion. However, there is a problem with this scenario. For the cluster expansion to really work, we would have to start with (abstract) polymers that are the connected components of the set A. Just recall the formula (3.3) that was the basis for the high temperature cluster expansion for the Ising model. Mimicking this, let us split the weights K_A(∇ϕ) = ∏_{X∈𝒜} K_X(∇ϕ) into a product over 𝒜, the collection of all connected components X of the set A. But this is just a factoring of the integrand. The integral ∫K_A(∇ϕ)ν(dϕ) does not split into the product ∏_{X∈𝒜} ∫K_X(∇ϕ)ν(dϕ). There would be some possibility to consider an enhanced version of "cluster expansion with interaction" if the correlations between the terms K_X(∇ϕ), immersed in the field governed by the measure ν, were decaying exponentially quickly. However, we are dealing here with polymers "floating" in the zero mass Gaussian free field ν with its very slow decay of correlations (cf. (4.3)). This is the main obstacle preventing the use of a modification of standard cluster expansions.
Nevertheless, there is a way out, employing a multiscale renormalization group strategy that, in a gross simplification, amounts to executing the integration in steps, "integrating subsequent modes of decay one after another". Indeed, this approach was used in [AKM] to prove that, for a class of potentials U = U₀ + V as above, at low temperatures and for small tilts, the surface tension σ(t) is indeed strictly convex. This approach is technically rather involved and we have no space here to go into any details. It might actually seem too involved for solving such a straightforward problem as proving strict convexity of the surface tension in a situation where it clearly should take place. However, we believe that the techniques introduced in [AKM] will also be a solid base for further exploration of the properties of gradient Gibbs measures—uniqueness, decay of correlations, etc.

LECTURE 5 Nonlinear elasticity

Finally, let us return to our main aim of investigating the random vector field ϕ distributed according to the gradient Gibbs measure (1.1), and to the link with a macroscopic variational picture¹. We will argue that the standard variational formulation of nonlinear elasticity coincides, in a suitable scaling limit, with the large deviation principle for the corresponding Gibbs measure.
First, it will be technically simpler to relax slightly the strict boundary condition in (1.1) and to replace it by a somewhat softer "clamped" version. We thus consider the measure
\[
(5.1)\qquad \mu^{(\psi)}_\Lambda(\mathrm d\varphi)=\frac{\exp\{-\beta H_\Lambda(\varphi)\}\,\mathbb 1^{(\psi)}_\Lambda(\varphi)}{Z^{(\psi)}_\Lambda}\prod_{i\in\Lambda}\mathrm d\varphi(i),
\]
with the normalisation
\[
(5.2)\qquad Z^{(\psi)}_\Lambda=\int_{(\mathbb R^m)^\Lambda}\exp\{-\beta H_\Lambda(\varphi)\}\,\mathbb 1^{(\psi)}_\Lambda(\varphi)\prod_{i\in\Lambda}\mathrm d\varphi(i)
\]
and the Hamiltonian
\[
(5.3)\qquad H_\Lambda(\varphi)=\sum_{j\in\mathbb Z^d:\,\tau_j(A)\subset\Lambda}U(\varphi_{\tau_j(A)}).
\]
Here 𝟙^{(ψ)}_Λ(ϕ) is the indicator of the set {ϕ ∈ (ℝ^m)^Λ : ‖ϕ − ψ‖_{ℓ^∞(∂_{R₀}Λ)} ≤ 1}, with ∂_{R₀}Λ denoting the strip ∂_{R₀}Λ = {i ∈ Λ : dist(i, ∂Λ) ≤ R₀}. (Notice the different position of the index ψ in (1.1) and (5.1), indicating the change in the boundary condition.)

Free energy

Let us begin by defining the free energy measuring the response of the system to a homogeneous external deformation. To this end, we consider an affine boundary condition F : ℝ^d → ℝ^m and the corresponding partition function Z^{(F)}_Λ.

[Figure: the field ϕ clamped near ∂Λ to an affine deformation F.]

¹This lecture is based on the paper [KL] and presents a brief and simplified account of it.


We will assume that the Hamiltonian H (resp. the interaction U) satisfies the following growth condition: there exist p ≥ 1 and constants c and C such that
\[
c\,|\nabla\varphi(0)|^p\le U(\varphi_A)\le C\Bigl(\sum_{i\in Q}|\nabla\varphi(i)|^p+1\Bigr),
\]
where Q ⊂ ℤ^d is a cube (of side R₀) containing A.

In this setting it is easy to show the existence of the limiting free energy.

Theorem 5.1 (Existence of the free energy). The free energy
\[
W(F)=-\lim_{n\to\infty}|\Lambda_n|^{-1}\log Z^{(F)}_{\Lambda_n}
\]
exists and does not depend on the sequence Λ_n (assuming the van Hove thermodynamic limit, |∂Λ_n|/|Λ_n| → 0). The free energy W is a continuous quasiconvex function of F satisfying the growth condition |W(F)| ≤ C(1 + ‖F‖^p).

While the existence of the limiting free energy follows by standard subadditivity arguments (here the relaxed boundary condition makes the argument very straightforward), the quasiconvexity proof is more subtle, and it is one of the main results of [KL]. Let us recall that quasiconvexity of a map W : M^{m×d} → ℝ (with M^{m×d} denoting the set of m×d matrices encoding affine maps F) is the condition stating that the homogenized deformation is optimal for a homogeneous boundary condition. Explicitly,
\[
(5.4)\qquad \int_\Omega W\bigl(F+\nabla v(x)\bigr)\,\mathrm dx\ge W(F)\,|\Omega|
\]
for any open and bounded set Ω ⊂ ℝ^d with |∂Ω| = 0, any F ∈ M^{m×d}, and any v ∈ W^{1,∞}_0(Ω) such that the integral on the left hand side exists. This is the property that is crucial from the point of view of variational calculus, as it implies (under additional technical conditions) that the functional ∫_Ω W(∇v(x))dx is sequentially weakly lower semicontinuous on W^{1,p}(Ω). This fact allows one to use the direct method for constructing minimizers of this functional on W^{1,p}(Ω).
Notice that for m = 1 or d = 1, quasiconvexity implies convexity. This means, in particular, that the surface tension σ(t) is automatically convex, as alluded to in the preceding lecture. On the other hand, once m, d ≥ 2, a free energy that is quasiconvex need not necessarily be convex. This is a standard fact in variational calculus, but it comes as a surprise from the point of view of statistical mechanics.

Possible non-convexity of W.

Possible non-convexity of W . It is thus useful to show that to get a non-convex free energy directly from a suitably chosen Hamiltonian is indeed quite straightforward. For simplicity, let us consider × the case m = d = 2. An example of nonconvex function on M 2 2 is det(F ). Indeed, 10 α 0 for F = and F = with α ≥ 0, α =1,weget 1 0 α 2 01 1 1 1+α 2 1 1 det( 2 F1 + 2 F2)= 2 > 2 det(F1)+ 2 det(F2)=α. LECTURE 5. NONLINEAR ELASTICITY 293

Consider now the free energy W stemming from a Hamiltonian H_Λ as above. Let us modify the Hamiltonian as follows:
\[
H^*_\Lambda(\varphi)=H_\Lambda(\varphi)+M\sum_{\square\subset\Lambda}S_\square(\varphi),
\]
with the sum over all square cells (plaquettes) consisting of four neighbouring vertices □ = (i₀,i₁,i₂,i₃) = (i₀, i₀+(1,0), i₀+(1,1), i₀+(0,1)) contained in Λ, and
\[
S_\square(\varphi)=\frac12\Bigl[\det\bigl(\varphi(i_1)-\varphi(i_0),\varphi(i_2)-\varphi(i_0)\bigr)+\det\bigl(\varphi(i_1)-\varphi(i_3),\varphi(i_2)-\varphi(i_3)\bigr)\Bigr].
\]
Recall that S_□(ϕ) is just the oriented area of the quadrangle (ϕ(i₀), ϕ(i₁), ϕ(i₂), ϕ(i₃)), and the sum Σ_{□⊂Λ} S_□(ϕ) thus depends only on the values of ϕ on the inner boundary of Λ and represents the area of the image ϕ(Λ). Given that any contributing configuration satisfies the condition ‖ϕ − F‖_{ℓ^∞(∂_{R₀}Λ)} ≤ 1, the above sum equals |F(Λ)| + O(|∂Λ|). As a result, H*_Λ(ϕ) = H_Λ(ϕ) + M det(F)|Λ| + O(|∂Λ|), and the corresponding free energy is W*(F) = W(F) + M det(F).
Assuming that b ≤ W(F) ≤ B for any F in the unit ball, ‖F‖ ≤ 1, it is easy to see that W* is nonconvex for M large. Indeed, considering the affine maps F₁ and F₂ as above with α < 1, we get
\[
W^*\bigl(\tfrac12F_1+\tfrac12F_2\bigr)-\tfrac12\bigl(W^*(F_1)+W^*(F_2)\bigr)\ge-(B-b)+M\Bigl(\Bigl(\frac{1+\alpha}{2}\Bigr)^2-\alpha\Bigr)>0
\]
once M > 4(B−b)/(1−α)². The free energy W* corresponding to the Hamiltonian H* is thus non-convex.
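A numeric illustration of the final inequality (here b, B, α are placeholder values, and W itself enters only through the bounds b ≤ W ≤ B):

```python
import numpy as np

# Midpoint inequality behind the non-convexity of W*(F) = W(F) + M det(F).
alpha, b, B = 0.5, -1.0, 1.0                  # illustrative values
M = 4 * (B - b) / (1 - alpha) ** 2 + 1.0       # just above the threshold
F1 = np.diag([1.0, alpha])
F2 = np.diag([alpha, 1.0])
mid = 0.5 * (F1 + F2)
gap = -(B - b) + M * (np.linalg.det(mid) - alpha)
print(np.linalg.det(mid), alpha, gap)          # det(mid) > alpha and gap > 0
```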

Macroscopic behaviour from microscopic model. Our next task is to link the microscopic description in terms of a gradient Gibbs measure with a macroscopic setting. We begin by considering lattice sets of the form $\Omega_\varepsilon = \frac1\varepsilon\big(\varepsilon\mathbb{Z}^d\cap\Omega\big) \equiv \mathbb{Z}^d\cap\frac1\varepsilon\Omega$, and consider the Gibbs measure

$\mu_{\Omega_\varepsilon,\psi_{u,\varepsilon}}(d\varphi)$ with an appropriate discretization $\psi_{u,\varepsilon}$ of the boundary condition $u$. Here, we need a link between a macroscopic boundary condition $u$ on $\mathbb{R}^d\setminus\Omega$ and a lattice boundary condition $\psi$ on $\mathbb{Z}^d\setminus\Omega_\varepsilon$. A natural requirement is to take $u\leftrightarrow\psi$ so that their gradients (the continuous gradient $\nabla u(x)$ for $u$ and the discrete gradient $\nabla\psi(i)$ for $\psi$) coincide. For example, we can take $u\mapsto\psi_{u,\varepsilon}(i) = \frac1\varepsilon\fint_{\varepsilon i + Q(\varepsilon)} u(y)\,dy$ for each $i\in\mathbb{Z}^d$. On the other hand, for any $\psi:\mathbb{Z}^d\to\mathbb{R}^m$ we can consider a canonical interpolation $u_{\psi,\varepsilon}: x\mapsto u_{\psi,\varepsilon}(x)$ coinciding (up to a rescaling that keeps the gradients matching) with $\psi$ at all lattice points: $u_{\psi,\varepsilon}(\varepsilon i) = \varepsilon\psi(i)$.

Large deviations. The principal question now is: What is the asymptotic behaviour of $\mu_{\Omega_\varepsilon,\psi_{u,\varepsilon}}(d\varphi)$ as $\varepsilon\to 0$? It turns out that we can formulate a large deviation principle that links large deviations of the Gibbs measure $\mu^{(\psi_{u,\varepsilon})}_{\Omega_\varepsilon}$ with the macroscopic variational problem, in terms of a functional defined with the free energy $W$ that we already have in place,
(5.5) $W(F) = -\lim_{\varepsilon\to 0}|\Omega_\varepsilon|^{-1}\log Z^{(F)}_{\Omega_\varepsilon}.$
In an informal and vague formulation it says: For $\varphi$ close to $\varphi_{v,\varepsilon}$ with a fixed $v\in W_0^{1,p}(\Omega) + u$, it is
$$\mu^{(\psi_{u,\varepsilon})}_{\Omega_\varepsilon}(\varphi) \sim \exp\Big\{-\varepsilon^{-d}\int_\Omega W(\nabla v(x))\,dx\Big\}.$$
More precisely, using $\Pi_\varepsilon$ for the map assigning to any $\varphi\in(\mathbb{R}^m)^{\mathbb{Z}^d}$ the canonical interpolation, $\Pi_\varepsilon(\varphi) = v_{\varphi,\varepsilon}$, we have [KL]:

Theorem 5.2 (Large Deviation Principle). Let $p > 1$. Then the measures $\mu^{(\psi_{u,\varepsilon})}_{\Omega_\varepsilon}$ satisfy the large deviation principle with the rate $\varepsilon^{-d}$ and the rate functional
$$I(v) = \int_\Omega W(\nabla v(x))\,dx - \min_{w\in W_0^{1,p}(\Omega)+u}\int_\Omega W(\nabla w(x))\,dx.$$
Namely:
a) For any $C\subset W_0^{1,p}(\Omega) + u$ closed in the weak topology, we have
$$\limsup_{\varepsilon\to 0}\varepsilon^d\log\mu^{(\psi_{u,\varepsilon})}_{\Omega_\varepsilon}\big(\Pi_\varepsilon^{-1}(C)\big) \le -\inf_{v\in C} I(v).$$
b) For any $O\subset W_0^{1,p}(\Omega) + u$ open in the weak topology, we have
$$\liminf_{\varepsilon\to 0}\varepsilon^d\log\mu^{(\psi_{u,\varepsilon})}_{\Omega_\varepsilon}\big(\Pi_\varepsilon^{-1}(O)\big) \ge -\inf_{v\in O} I(v).$$
Let us stress that the rate functional $I(v)$ is exactly the functional that is used in the variational formulation of nonlinear elasticity. But while the "stored energy function" $W$ there has to be chosen by some a priori phenomenological reasoning, here it is derived from the underlying microscopic model.

Main ingredients of the proof of quasiconvexity of W. We have no space here to go through all the relevant and very technical details of the proofs from [KL] and will restrict ourselves to just a sketch of the main steps in the proof of the quasiconvexity of $W$. Let us begin by introducing the notation
(5.6) $Z_{\Omega_\varepsilon}(\mathcal{A}) = \int_{(\mathbb{R}^m)^{\Omega_\varepsilon}}\exp\{-\beta H_{\Omega_\varepsilon}(\varphi)\}\,1\!\!1_{\mathcal{A}}(\varphi)\prod_{i\in\Omega_\varepsilon}d\varphi(i)$
for any event $\mathcal{A}\subset(\mathbb{R}^m)^{\Omega_\varepsilon}$. Thus, in particular,
$$Z^{(\psi)}_{\Omega_\varepsilon} = Z_{\Omega_\varepsilon}\big(\{\varphi:\ \|\varphi - \psi\|_{\ell^\infty(\partial_{R_0}\Omega_\varepsilon)}\le 1\}\big).$$

The main technical tool is a rather nontrivial lemma about interpolation that allows one, roughly speaking, to switch between $Z_{\Omega_\varepsilon}(\{\|\varphi - \varphi_{v,\varepsilon}\|^p_{\ell^p(\Omega_\varepsilon)}\le\kappa\varepsilon^{-d-p}\})$ and (asymptotically as $\varepsilon\to 0$) $Z_{\Omega_\varepsilon}(\{\|\varphi - \varphi_{v,\varepsilon}\|^p_{\ell^p(\Omega_\varepsilon)}\le\kappa\varepsilon^{-d-p}\}\cap\{\|\varphi - \varphi_{v,\varepsilon}\|_{\ell^\infty(\partial_{R_0}\Omega_\varepsilon)}\le 1\})$. With its help, one can obtain several equivalent limits for the free energy. Namely, we can replace (5.5) by either
$$W(F) = -\lim_{\varepsilon\to 0}|\Omega_\varepsilon|^{-1}\log Z_{\Omega_\varepsilon}\big(\{\|\varphi - F\|^p_{\ell^p(\Omega_\varepsilon)}\le\kappa\varepsilon^{-d-p}\}\cap\{\|\varphi - F\|_{\ell^\infty(\partial_{R_0}\Omega_\varepsilon)}\le 1\}\big)$$
or
$$W(F) = -\lim_{\kappa\to 0}\limsup_{\varepsilon\to 0}|\Omega_\varepsilon|^{-1}\log Z_{\Omega_\varepsilon}\big(\{\|\varphi - F\|^p_{\ell^p(\Omega_\varepsilon)}\le\kappa\varepsilon^{-d-p}\}\big).$$
To prove the quasiconvexity of $W$, we first show that for any $v\in W_0^{1,p}(\Omega) + F$ we have
$$Z_{\Omega_\varepsilon}\big(\{\|\varphi - \varphi_{v,\varepsilon}\|^p_{\ell^p(\Omega_\varepsilon)}\le\kappa\varepsilon^{-d-p}\}\big) \ge \exp\Big\{-\varepsilon^{-d}\int_\Omega W(\nabla v(x))\,dx - \delta\Big\}.$$
Equipped with the equivalent definitions of $W$, one can do it in the following steps:
• Cover $\Omega_\varepsilon$ by a triangulation $\{T_j\}$ and approximate $v$ by a piecewise linear $w$ consistent with $\{T_j\}$;
• Use the inclusion
$$\bigcap_j\big(\{\|\varphi - \varphi_{w,\varepsilon}\|^p_{\ell^p(T_j)}\le\kappa\varepsilon^{-d-p}\}\cap\{\|\varphi - \varphi_{w,\varepsilon}\|_{\ell^\infty(\partial_{R_0}T_j)}\le 1\}\big) \subset \{\|\varphi - \varphi_{v,\varepsilon}\|^p_{\ell^p(\Omega_\varepsilon)}\le\kappa\varepsilon^{-d-p}\};$$
• Finally, use
$$-\log Z_{T_j}\big(\{\|\varphi - \varphi_{w,\varepsilon}\|^p_{\ell^p(T_j)}\le\kappa\varepsilon^{-d-p}\}\cap\{\|\varphi - \varphi_{w,\varepsilon}\|_{\ell^\infty(\partial_{R_0}T_j)}\le 1\}\big) \sim W(\nabla w)\,|T_j|.$$
On the other hand,
$$Z_{\Omega_\varepsilon}\big(\{\|\varphi - \varphi_{v,\varepsilon}\|^p_{\ell^p(\Omega_\varepsilon)}\le\kappa\varepsilon^{-d-p}\}\big) \sim Z_{\Omega_\varepsilon}\big(\{\|\varphi - \varphi_{v,\varepsilon}\|^p_{\ell^p(\Omega_\varepsilon)}\le\kappa\varepsilon^{-d-p}\}\cap\{\|\varphi - F\|_{\ell^\infty(\partial_{R_0}\Omega_\varepsilon)}\le 1\}\big) \le Z_{\Omega_\varepsilon}\big(\{\|\varphi - F\|_{\ell^\infty(\partial_{R_0}\Omega_\varepsilon)}\le 1\}\big) \sim \exp\{-\varepsilon^{-d}|\Omega|\,W(F)\}.$$
Comparing these two bounds we get (5.4). □

Bibliography

[AKM] S. Adams, R. Kotecký, and S. Müller, Strict Convexity of the Surface Tension for Non-convex Potentials. In preparation.
[BF] M. Born and K. Fuchs, The Statistical Mechanics of Condensing Systems. Proceedings of the Royal Society of London, Series A, Mathematical and Physical Sciences 166 (1938), 391–414.
[Bi] M. Biskup, Reflection Positivity and Phase Transitions in Lattice Spin Models. Lecture Notes in Mathematics 1970 (2009), 1–86. MR2581604
[BK] M. Biskup and R. Kotecký, Phase coexistence of gradient Gibbs states. Probability Theory and Related Fields 139 (2007), 1–39. MR2322690
[CD] C. Cotar and J.-D. Deuschel, Decay of covariances, uniqueness of ergodic component and scaling limit for a class of ∇φ systems with non-convex potential. Annales de l'Institut Henri Poincaré - Probabilités et Statistiques 48 (2012), 819–853. MR2976565
[CDM] C. Cotar, J.-D. Deuschel, and S. Müller, Strict Convexity of the Free Energy for a Class of Non-Convex Gradient Models. Communications in Mathematical Physics 286 (2009), 359–376. MR2470934
[Do68] R. L. Dobrushin, Gibbsian random fields for lattice systems with pairwise interactions. Funct. Anal. Appl. 2 (1968), 292–301; The problem of uniqueness of a Gibbs random field and the problem of phase transition. Funct. Anal. Appl. 2 (1968), 302–312; Gibbsian random fields. The general case. Funct. Anal. Appl. 3 (1969), 22–28. MR0250630
[Do96] R. L. Dobrushin, Estimates of Semi-invariants for the Ising Model at Low Temperatures. Amer. Math. Soc. Transl. (2) 177 (1996), 59–81. MR1409170
[DGI] J.-D. Deuschel, G. Giacomin, and D. Ioffe, Large deviations and concentration properties for ∇ϕ interface models. Probab. Theory Relat. Fields 117 (2000), 49–111. MR1759509
[Fu] T. Funaki, Stochastic Interface Models. Lecture notes, Graduate School of Mathematical Sciences, The University of Tokyo, http://www.ms.u-tokyo.ac.jp/~funaki/
[FS] T. Funaki and H. Spohn, Motion by mean curvature from the Ginzburg-Landau ∇ϕ interface model. Commun. Math. Phys. 185 (1997), 1–36. MR1463032
[FV] S. Friedli and Y. Velenik, Equilibrium Statistical Mechanics of Classical Lattice Systems: a Concrete Introduction, a book in progress, http://www.unige.ch/math/folks/velenik/smbook/index.html
[Ge] H.-O. Georgii, Gibbs Measures and Phase Transitions, Studies in Mathematics 9, Walter de Gruyter, Berlin-New York, 1988. MR956646
[GOS] G. Giacomin, S. Olla, and H. Spohn, Equilibrium fluctuations for the ∇ϕ interface model. Ann. Probab. 29 (2001), 1138–1172. MR1872740
[KL] R. Kotecký and S. Luckhaus, Nonlinear Elastic Free Energies and Gradient Young-Gibbs Measures. Communications in Mathematical Physics 326 (2014), 887–917. MR3173410
[KP] R. Kotecký and D. Preiss, Cluster expansions for abstract polymer models. Communications in Mathematical Physics 103 (1986), 491–498. MR832923
[KS] R. Kotecký and S. B. Shlosman, First-order transitions in large entropy lattice models. Communications in Mathematical Physics 83 (1982), 493–515. MR649814
[LMP] J. L. Lebowitz, A. Mazel, and E. Presutti, Rigorous Proof of a Liquid-Vapor Phase Transition in a Continuum Particle System. Physical Review Letters 80 (1998), 4701–4704; Liquid-vapor phase transitions for systems with finite range interactions. Journal of Statistical Physics 94 (1999), 955–1025. MR1694123


[LR] O. E. Lanford and D. Ruelle, Observables at infinity and states with short range correlations in statistical mechanics. Communications in Mathematical Physics 13 (1969), 194–215. MR0256687
[M-S] S. Miracle-Solé, On the convergence of cluster expansions. Physica A 279 (2000), 244–249. MR1797141
[Pe] O. Penrose, Statistical Mechanics of Nonlinear Elasticity. Markov Processes Relat. Fields 8 (2002), 351–364 and 12 (2006), p. 169. MR1924944
[PS] S. Pirogov and Ya. G. Sinai, Phase diagrams of classical lattice systems. Theoretical and Mathematical Physics 25 (1975), 1185–1192; 26 (1976), 39–49. MR0676316
[Sh] S. Sheffield, Random Surfaces. Astérisque 304 (2005), 175 pp. MR2251117
[WR] B. Widom and J. S. Rowlinson, New model for the study of liquid-vapor phase transitions. The Journal of Chemical Physics 52 (1970), 1670.



Quantitative Stochastic Homogenization: Local Control of Homogenization Error through Corrector

Peter Bella, Arianna Giunti, Felix Otto

Abstract: This note addresses the homogenization error for linear elliptic equations in divergence form with random stationary coefficients. The homogenization error is measured by comparing the quenched Green's function to the Green's function belonging to the homogenized coefficients, more precisely, by the (relative) spatial decay rate of the difference of their second mixed derivatives. The contribution of this note is purely deterministic: It uses the expanded notion of corrector, namely the couple of scalar and vector potentials $(\phi,\sigma)$, and shows that the rate of sublinear growth of $(\phi,\sigma)$ at the points of interest translates one-to-one into the decay rate.

1. A brief overview of stochastic homogenization, and a common vision for quenched and thermal noise
Heterogeneous materials typically have the behavior of a homogeneous material on scales that are large with respect to the characteristic length scale of the heterogeneity, provided that the "type of heterogeneity" is the same all over the medium. Here, we think of material properties like conductivity or elasticity that are typically described by an elliptic differential operator in divergence form $-\nabla\cdot a\nabla$, where the heterogeneity resides in the uniformly elliptic coefficient field $a$. Homogenization then refers to the fact that on large scales, the solution operator corresponding to $-\nabla\cdot a\nabla$ behaves like the solution operator corresponding to $-\nabla\cdot a_h\nabla$ with a constant coefficient $a_h$. This type of homogenization is well-understood in the case of a periodic medium, where, for instance, $a_h$ can be inferred from a cell problem and the homogenization error can be expanded to any order in the ratio of the period to the macroscopic scale of interest. However, the case where only the statistical specification of the coefficient field is known might be more relevant in practice. In this case, one thinks of an ensemble $\langle\cdot\rangle$ of coefficient fields and one speaks of stochastic homogenization. For homogenization to occur, the statistics have to be translation invariant, meaning that the joint distribution of the shifted coefficient field is identical to that of the original coefficient field, a property called stationarity in the parlance of stochastic processes. For homogenization to be effective, the statistics of the coefficient field have to decorrelate (more precisely, become independent) over large distances.
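For orientation, let us record the standard cell-problem formulas from the periodic case just alluded to (they are classical and serve here only as a point of comparison): if $a$ is $[0,1)^d$-periodic, the periodic corrector $\chi_i$ and the homogenized coefficient $a_h$ are determined by
$$-\nabla\cdot a(y)\big(e_i + \nabla\chi_i(y)\big) = 0 \quad\text{on the torus},\qquad a_h e_i = \int_{[0,1)^d} a(y)\big(e_i + \nabla\chi_i(y)\big)\,dy.$$
In the random case treated below, the corrector equation (2) plays the role of the cell problem, but it is posed on the whole space.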

Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22, 04103 Leipzig, Germany

© 2017 American Mathematical Society


Hence stochastic homogenization relies on the separation of scales between the correlation length and the macroscopic scale of interest. From a qualitative point of view, stochastic homogenization has been rigorously understood since the seminal works of Kozlov [17] and of Papanicolaou & Varadhan [21]: Stochastic homogenization takes place when the ensemble is stationary and ergodic, the latter being a purely qualitative way of imposing decorrelation over large distances. Stochastic analysis has a finer, still qualitative, view on stochastic homogenization, where $-\nabla\cdot a\nabla$ is seen as the generator of a (reversible) diffusion process. In a discrete medium this leads to the picture of a "random walk in random environment", which amounts to the superposition of thermal noise (the random walk) and quenched noise (the random jump rates). Here, the relevant qualitative question is that of a "quenched invariance principle": On large scales and for large times (on a parabolic scale), the random walker behaves like a Brownian motion (with covariance given by $a_h$) for almost every realization of the random environment. Surprisingly, first full results in that direction came quite a bit later [22]. Stochastic analysis is still mostly interested in qualitative results, but pushing the frontier in terms of models, for instance towards degenerate diffusions like random walks on (supercritical) percolation clusters or towards diffusions with random drifts leading to non-reversible random walks. Numerical analysis has another, naturally more quantitative view on stochastic homogenization. As opposed to periodic homogenization, there is no cell problem to extract the homogenized coefficient $a_h$. Hence one has to resort to an artificial "representative volume": On such a cube (let us adopt three-dimensional language), one samples a realization of the medium according to the given statistical specifications and then solves three boundary value problems for $-\nabla\cdot a\nabla u$, corresponding to prescribing different slopes of $u$. In the case of a conducting medium, where the coefficient field $a$ corresponds to the heterogeneous conductance, $u$ corresponds to the electrical potential, so that the boundary conditions impose an average potential gradient, that is, an electrical field in one of the three coordinate directions. One then monitors the average electrical current $a\nabla u$; this (linear) relation between average electrical field and average electrical current yields an approximation to the homogenized coefficient $a_h$ as a linear map. Clearly, this approximate homogenized coefficient $a_{h,L}$ will be closer to the true homogenized coefficient $a_h$ the larger the linear size $L$ of the cube is, where the relevant (small) nondimensional parameter is the ratio of correlation length to $L$. Intuitively, there are two error sources: On the one hand, $a_{h,L}$ is a random quantity, since it depends on the given realization $a$ of the coefficient field on the cube, so that there is a random error coming from the fluctuations of $a_{h,L}$, which can be reduced by repeated sampling. On the other hand, the very concept of the representative volume element perturbs the statistics; for instance, in the case of periodic boundary conditions, the concept introduces artificial long-range correlations, a systematic error that is not affected by repeated sampling. Clearly, it is of interest to understand — which for mathematicians means to prove — how both errors scale in $L$. This natural and very practical question turns out to be difficult to analyze (rigorously).
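To make the two error sources of the representative volume method concrete, here is a minimal numerical sketch (our own illustration, not part of the text; the coefficient model, i.i.d. uniform values on unit cells, and all function names are hypothetical choices). We use the one-dimensional shortcut where the boundary value problem is solvable in closed form, so that $a_{h,L}$ is simply the harmonic mean of the sampled coefficient:

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_coefficient(L, lam=0.2):
        # One realization of the coefficient field on a segment of length L:
        # i.i.d. values on unit cells (unit correlation length), uniform in [lam, 1].
        return rng.uniform(lam, 1.0, size=L)

    def rve_homogenized_coefficient(L):
        # Representative volume method in d = 1: impose a unit average potential
        # gradient on (0, L) and monitor the average current a*u'.  For -(a u')' = 0
        # the current q is constant, so u' = q/a, and the boundary condition
        # \int_0^L u' dx = L forces q = harmonic mean of a, which is a_{h,L}.
        a = sample_coefficient(L)
        return 1.0 / np.mean(1.0 / a)

    for L in [10, 100, 1000, 10000]:
        samples = [rve_homogenized_coefficient(L) for _ in range(200)]
        print(f"L = {L:6d}:  mean a_hL = {np.mean(samples):.4f},  std = {np.std(samples):.4f}")

The printed standard deviation displays the random error shrinking with $L$ (and reducible by repeated sampling); in $d > 1$ the same loop would call a finite element solve of the $d$ cell problems in place of the harmonic mean, and there the choice of boundary conditions also produces the systematic error discussed above.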
Shortly after the qualitative theory was introduced, Yurinskiĭ [23] produced the first quantitative result motivated by the above questions. He in fact used ingredients from stochastic analysis (the picture of diffusions in a random medium to understand sensitivities) and from regularity theory (Nash's upper heat kernel bounds), but only obtained sub-optimal results in terms of exponents. After using (qualitative) stochastic homogenization to understand the large-scale correlation structure of a statistical mechanics model (fluctuating surfaces) in [19], Naddaf & Spencer [20] realized that tools from statistical mechanics (spectral gap estimate) can be used to obtain quantitative results in stochastic homogenization (for discrete media). With Meyers' perturbation of the Calderon-Zygmund estimate they introduced a second tool from elliptic regularity theory into the field, which allowed them to obtain optimal variance estimates in the case of small ellipticity ratio (small contrast media); this type of result was subsequently extended by Conlon and coworkers, see e. g. [6]. In [13, 14], Gloria and the last author used the same tool from statistical mechanics but yet another ingredient from elliptic regularity theory (Caccioppoli's estimate, to obtain optimal spatially averaged bounds on the gradient of the quenched Green's function) to obtain the first optimal error estimate on the representative volume method also for large ellipticity ratios. In [18], Marahrens and the last author used the concentration of measure property coming from the logarithmic Sobolev inequality to study the (random part of the) homogenization error itself, in the form of optimal estimates on the variance of the quenched Green's function. Using Green's function estimates and two-scale expansion, Gloria, Neukamm, and the last author [10] compared the heterogeneous and the corrected homogeneous solution. While in [10] the error is measured in $H^1$ and averaged both over the domain and the ensemble, Gu and Mourrat [16] recently combined probabilistic techniques with Green's function estimates to obtain a pointwise bound for solutions of both elliptic and parabolic equations. Since then, there has been a flurry of activities, which will be partially addressed in the next two sections, with the work of Armstrong & Smart [2] playing a central role. For instance, by now it is already understood that the error in the representative volume method is to leading order Gaussian [12]. We do not even mention the numerous activities in stochastic homogenization of non-divergence form equations, like fully nonlinear equations or Hamilton-Jacobi equations.

In both qualitative and quantitative homogenization, both from the PDE and the stochastic analysis point of view, the notion of the corrector, a random function $\phi_i$ for every coordinate direction $i = 1,\ldots,d$, is central. There is a very geometric and deterministic view of the corrector: Given a realization $a$ of the coefficient field, $x\mapsto x_i + \phi_i(x)$ provides $a$-harmonic coordinates. This is also its main merit from the almost-sure stochastic analysis point of view: Seen in these coordinates, the diffusion turns into a martingale. The corrector is also natural from the numerical analysis point of view: In the representative volume method, one actually solves for an approximate version of the corrector. Last but not least, in the original (very functional analytic) PDE approach to stochastic homogenization the corrector is central: Using stationarity, one lifts the equation for the corrector to the probability space, solves it by Lax-Milgram, and expresses $a_h$ in terms of it. Like the work [8] on higher-order Liouville principles, this note demonstrates the usefulness of the vector potential $\sigma$ of the corrector, an object known in periodic homogenization and recently introduced into random homogenization by Gloria, Neukamm, and the last author in [9].

We have a common vision for a regularity theory of random elliptic operators as considered in this note and for stochastic partial differential equations (SPDE). In other words, we would like to capitalize more on the similarities between quenched and thermal noise. At first glance, these problems seem very different, since in the first case one is interested in the emergence of a generic large-scale regularity due to cancellations, while in the second case one wants to preserve a small-scale regularity despite the rough forcing. However, in both cases the key is to understand how sensitively the solution $\phi$ of an elliptic or parabolic equation (nonlinear to be of interest in the case of SPDE with additive noise, while in the case of stochastic homogenization the interesting effect is already present for a linear equation) depends on the data, be it the coefficients in the case of stochastic homogenization or the right hand side in the case of a random forcing, in which case this derivative can be associated to the Malliavin derivative. In this sensitivity analysis, one typically has to control the size of a functional derivative, that is, the functional derivative of some nonlinear functional of the solution (a norm, say) with respect to the data, which in the case of the data being the coefficients is a highly nonlinear mapping even for a linear equation. Hence for a given realization of the data, one has to control the norm of a linear form on infinitesimal perturbations of the data. Since one is dealing with random stationary data, the appropriate measure of the size of the infinitesimal perturbations is best captured by an $L^2$-type norm, the specific structure of which depends on the assumption on the noise: an ordinary $L^2$-norm in case of white-noise forcing, or a more nonlocal (and thus weaker) norm in case of stochastic homogenization if one wants to cover also cases where the covariances of the coefficient field have a slow (that is, non-integrable) decay, as used in [9]. Even if the problem is a nonlinear one, the sensitivity estimates require a priori estimates for linear elliptic or parabolic equations, albeit with non-constant coefficients that a priori are just uniformly elliptic — it is here where all the help of classical regularity theory is needed. Once the appropriate, purely deterministic sensitivity estimates are established, it is the principle of concentration of measure that provides the stochastic estimates on the random solution itself. In a work in preparation with Hendrik Weber, the last author is applying this philosophy to the fully non-linear parabolic equation $\phi + \partial_t\phi - \partial_x^2\pi(\phi) = \xi$, forced by space-time white noise $\xi$, to establish Hölder-$\frac12$ bounds with exponential moments in probability.

2. Precise setting and motivation for this work
While the contribution of this note is purely deterministic, and the main result will be stated without reference to probabilities in the next section, the motivation is probabilistic and will be given now. In elliptic homogenization, one is interested in uniformly elliptic coefficient fields $a$ in $d$-dimensional space $\mathbb{R}^d$, where uniform ellipticity means that there exists a (once and for all fixed) constant $\lambda > 0$ such that
(1) $\forall x\in\mathbb{R}^d,\ \xi\in\mathbb{R}^d:\quad \xi\cdot a(x)\xi \ge \lambda|\xi|^2,\quad |a(x)\xi| \le |\xi|.$
To such a coefficient field $a$ we associate an elliptic operator in divergence form $-\nabla\cdot a\nabla$. For simplicity (in order to avoid dealing with the correctors of the dual equation), we shall assume that $a(x)$, and thus the corresponding operator, is symmetric. We note that statements and proofs remain valid in the case of systems with the above strong ellipticity property, but for simplicity, we shall stick to scalar notation as in (1), where we wrote $\xi\in\mathbb{R}^d$ instead of a tensor-valued object.

In stochastic homogenization, one is interested in ensembles of uniformly elliptic coefficient fields, that is, probability measures on this space. We denote by $\langle\cdot\rangle$ the corresponding expectation and use the same symbol to refer to the ensemble. Minimal requirements for homogenization are stationarity and ergodicity, where both notions refer to the action of the translation group $\mathbb{R}^d$ on the space of uniformly elliptic coefficient fields. Stationarity means that the distribution of the random field $a$ is invariant under shifts $a(z+\cdot)$ for any shift vector $z\in\mathbb{R}^d$. Ergodicity means that shift-invariant random variables, that is, functionals $a\mapsto\zeta(a)$ that satisfy $\zeta(a(z+\cdot)) = \zeta(a)$ for all shifts $z$, must be (almost surely) constant. Under these assumptions, the classical theory of (qualitative) stochastic homogenization introduced by Kozlov [17] and by Papanicolaou & Varadhan [21] establishes the (almost sure) existence of sublinear correctors. More precisely, for any coordinate direction $i = 1,\ldots,d$ and a given realization $a$ of the coefficient fields, the corrector $\phi_i = \phi_i(a)$ modifies the affine coordinate $x\mapsto x_i$ to an $a$-harmonic coordinate $x\mapsto x_i + \phi_i(x)$, that is,

(2) −∇ · a(ei + ∇φi)=0.

In order to be rightfully named a corrector, $\phi_i$ should be dominated by $x_i$, that is, have sublinear growth, at least in the $L^2$-averaged sense of

(3) $\lim_{R\uparrow\infty}\frac1R\Big(\fint_{B_R}\Big(\phi_i - \fint_{B_R}\phi_i\Big)^2 dx\Big)^{1/2} = 0.$
Under the assumptions of stationarity and ergodicity for $\langle\cdot\rangle$, the classical theory constructs $d$ functions $\phi_1(a,x),\ldots,\phi_d(a,x)$ such that (2) and (3) are satisfied for $\langle\cdot\rangle$-a. e. $a$.

The actual construction of $\phi_i$ for a fixed coordinate direction $i = 1,\ldots,d$ proceeds as follows: In a first step, one constructs a random vector field $f_i(a,x)\in\mathbb{R}^d$ that is stationary in the sense of $f_i(a(z+\cdot), x) = f_i(a, x+z)$, of expected value $\langle f_i\rangle = e_i$ and of finite second moments $\langle|f_i|^2\rangle < \infty$, and that is curl-free, $\partial_j f_{ik} - \partial_k f_{ij} = 0$, and satisfies the divergence condition $\nabla\cdot(a f_i) = 0$. Using the stationarity to replace spatial derivatives by "horizontal derivatives", the existence and uniqueness of these harmonic 1-forms follows from Lax-Milgram. The second step is to consider the random field $\phi_i(a,x)$ which satisfies $\nabla\phi_i = f_i - \langle f_i\rangle = f_i - e_i$ — and thus satisfies (2) — and is (somewhat arbitrarily) made unique by the anchoring $\phi_i(x=0) = 0$ and thus generically non-stationary. One then appeals to ergodicity (von Neumann's version combined with a maximal function estimate) to show with the help of Poincaré's inequality that $\langle\nabla\phi_i\rangle = 0$ translates into (3). Incidentally, the homogenized coefficients are then given by $a_h e_i = \langle a(e_i + \nabla\phi_i)\rangle = \langle a f_i\rangle$ and are easily shown to satisfy (in the symmetric case) the same ellipticity bounds as the original ones, c. f. (1),
(4) $\forall\xi\in\mathbb{R}^d:\quad \xi\cdot a_h\xi \ge \lambda|\xi|^2,\quad |a_h\xi| \le |\xi|.$
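As a sanity check on these formulas (a classical computation, added here for illustration): in $d = 1$, the corrector equation (2) reads $(a(1+\phi'))' = 0$, so the flux $q := a(1+\phi')$ is constant. Stationarity with $\langle\phi'\rangle = 0$ gives $1 = \langle 1+\phi'\rangle = q\langle a^{-1}\rangle$, hence
$$a_h = \langle a(1+\phi')\rangle = q = \langle a^{-1}\rangle^{-1},$$
the harmonic mean of $a$, which in general lies strictly below the arithmetic mean $\langle a\rangle$: it is the constant current, not the constant field, that is compatible with the equation.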

We now make the point that next to the scalar potential $\phi_i$, it is also natural to consider a vector potential, in the parlance of three-dimensional vector calculus. Indeed, the above (purely functional analytic) first step constructs the harmonic stationary 1-form $f_i$. This means that next to the stationary closed 1-form $f_i$, there is the stationary closed $(d-1)$-form $q_i := a f_i$. In the language of a conduction model, where the tensor $a$ is the conductivity, $f_i$ corresponds to the electric field whereas $a f_i$ corresponds to the current density. Hence next to considering a (non-stationary) 0-form $\phi_i$ with $d\phi_i = f_i - \langle f_i\rangle$, it is natural to also consider a (non-stationary) $(d-2)$-form $\sigma_i$ with $d\sigma_i = q_i - \langle q_i\rangle$, where $d$ denotes the exterior derivative. In Euclidean coordinates, a $(d-2)$-form $\sigma$ is represented by a skew-symmetric tensor $\{\sigma_{jk}\}_{j,k=1,\ldots,d}$, that is, $\sigma_{jk} + \sigma_{kj} = 0$; then $d\sigma_i = q_i - \langle q_i\rangle$ translates into $\nabla\cdot\sigma_i = q_i - \langle q_i\rangle$, where $(\nabla\cdot\sigma_i)_j := \partial_k\sigma_{ijk}$, using Einstein's convention of summation over repeated indices. In view of the definitions $q_i = a f_i$, $f_i = e_i + \nabla\phi_i$, and $a_h e_i = \langle a(e_i + \nabla\phi_i)\rangle$, this implies

(5) ∇·σi = a(ei + ∇φi) − ahei and σi is skew.

The merit of this vector potential $\sigma_i$ has been recognized in the case of periodic homogenization and lies in a good representation of the homogenization error: When comparing a solution of $-\nabla\cdot a\nabla u = f$ to a solution of the corresponding homogenized problem $-\nabla\cdot a_h\nabla v = f$, more precisely to the "corrected" solution $v + \phi_i\partial_i v$ (with summation convention), one obtains for the homogenization error $w := u - (v + \phi_i\partial_i v)$ the following simple equation: $-\nabla\cdot a\nabla w = \nabla\cdot((\phi_i a - \sigma_i)\nabla\partial_i v)$. It is $\sigma$ that allows one to bring the r. h. s. into divergence form, which makes simple energy estimates possible. In [9], we follow the classical arguments to show existence of a sublinear $\sigma$ under the mere assumptions of stationarity and ergodicity. More precisely, given a coordinate direction $i = 1,\ldots,d$, we show by a suitable choice of gauge that there exists a skew-symmetric tensor field $\sigma_i(a,x)$ such that its gradient $\nabla\sigma_i$ is stationary, of mean zero, of finite second moments, and such that $\nabla\cdot\sigma_i = q_i - \langle q_i\rangle$. We then appeal to the same arguments as for $\phi_i$ to conclude
(6) $\lim_{R\uparrow\infty}\frac1R\Big(\fint_{|x|\le R}\Big|\sigma_i - \fint_{|x|\le R}\sigma_i\Big|^2 dx\Big)^{1/2} = 0$
for almost every realization $a$ of the ensemble $\langle\cdot\rangle$. The contribution of this note is purely deterministic in the following sense: We will consider a fixed coefficient field $a$ which is uniformly elliptic in the sense of (1) and a constant coefficient $a_h$, also elliptic in the sense of (4). We then assume that for every coordinate direction $i = 1,\ldots,d$, there exists a scalar field $\phi_i$ with (2) and a skew-symmetric tensor field $\sigma_i$ with (5). Our main assumption will be a quantification of the sublinear growth (3) and (6), roughly in the form of

(7) $\lim_{R\uparrow\infty}\frac{1}{R^{1-\alpha}}\Big(\fint_{B_R}\Big|(\phi,\sigma) - \fint_{B_R}(\phi,\sigma)\Big|^2 dx\Big)^{1/2} = 0$
for some exponent $\alpha > 0$ and for $\langle\cdot\rangle$-almost every $a$. Obviously, this quantification of the sublinear growth of fields that have a stationary gradient of vanishing expectation can only be true for a quantified ergodicity of $\langle\cdot\rangle$. The discussion and proof of criteria under which assumption on $\langle\cdot\rangle$ the strengthened sublinearity (7) holds is not part of this note, but of ongoing work. We just mention here that for $d > 2$ and under the assumption that $\langle\cdot\rangle$ satisfies a spectral gap estimate with respect to Glauber dynamics, it is shown in [15], extending the arguments of [11] from the case of a discrete medium to a continuum medium, that a stationary corrector $\phi_i$ exists, which in particular implies (7) for the $\phi_i$-part for every $\alpha < 1$. We also mention that (7) is expected to hold for some $\alpha > 0$ under much weaker assumptions on

$\langle\cdot\rangle$, as is already suggested by [2], and more explicitly in [1], and will be formulated as needed here in an updated version of [9], where the rate of decay of correlations of $a$ will be related to $\alpha$.
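For the reader's convenience, we sketch the short computation behind the divergence-form equation for the homogenization error mentioned above (it reappears, in a localized version, as Step 3 of the proofs below). With $w = u - (v + \phi_i\partial_i v)$ and using (5),
$$a\nabla w = a\nabla u - \partial_i v\,a(e_i + \nabla\phi_i) - \phi_i a\nabla\partial_i v = a\nabla u - a_h\nabla v - \partial_i v\,(\nabla\cdot\sigma_i) - \phi_i a\nabla\partial_i v.$$
Taking the divergence, the first two terms cancel since $-\nabla\cdot a\nabla u = f = -\nabla\cdot a_h\nabla v$, while for skew $\sigma_i$ one has $\nabla\cdot(\nabla\cdot\sigma_i) = \partial_j\partial_k\sigma_{ijk} = 0$ and $\nabla\zeta\cdot(\nabla\cdot\sigma) = -\nabla\cdot(\sigma\nabla\zeta)$, so that $\nabla\cdot(\partial_i v\,\nabla\cdot\sigma_i) = -\nabla\cdot(\sigma_i\nabla\partial_i v)$ and altogether $-\nabla\cdot a\nabla w = \nabla\cdot\big((\phi_i a - \sigma_i)\nabla\partial_i v\big)$.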

3. Main results
In this section, we state and comment on our main results, namely Theorem 2 and Corollary 3. As mentioned in the introduction, our standing assumption is the uniform ellipticity of the fixed variable coefficient $a$ on $\mathbb{R}^d$ and of the constant coefficient $a_h$, c. f. (1) and (4), respectively, and the existence of the scalar and vector potentials $(\phi,\sigma) = \{(\phi_i,\sigma_i)\}_{i=1,\ldots,d}$ with (2) & (5). Our result on the homogenization error, Theorem 2 and Corollary 3, relies and expands on a large-scale regularity result established in [9] and stated in Theorem 1. It falls into the realm of Campanato iteration for the $C^{1,\alpha}$-Schauder theory for elliptic equations in divergence form. In the framework of this theory, the $C^{1,\alpha}$-Hölder semi-norm is expressed in terms of a Campanato norm that monitors the decay of the (spatially averaged) energy distance to affine functions as the radius becomes small. In Theorem 1 below, affine functions in the Euclidean sense are replaced by $a$-linear functions, the space of which is spanned by the constants and the $d$ corrected coordinates $x_i + \phi_i$. This intrinsic "excess decay" (in the parlance of De Giorgi's regularity theory for minimal surfaces) kicks in on scales where the corrector pair $(\phi,\sigma)$ is sufficiently sublinear, c. f. (8). This is not surprising since in view of (2) & (5), which we rewrite as $(a - a_h)e_i = \nabla\cdot\sigma_i - a\nabla\phi_i$, $(\phi,\sigma)$ is an averaged measure of the distance of the variable coefficient $a$ to the constant coefficient $a_h$, whose harmonic functions of course feature excess decay. In this sense, Theorem 1 is a perturbation result around constant coefficients. In the context of stochastic homogenization, Theorem 1 amounts to a large-scale regularity result for $a$-harmonic functions, since in view of (3) and (6), the smallness (8) is expected to kick in above a certain length scale $r_* < \infty$.

The idea of perturbing around the homogenized coefficient $a_h$ in a Campanato-type iteration to obtain a large-scale regularity theory (Schauder and eventually Calderon-Zygmund) is due to Avellaneda & Lin, who carried this program out in the periodic case [3]. Recently, Armstrong & Smart [2] showed that this philosophy extends to the random case, a major insight in stochastic homogenization. This in turn inspired [9] to make an even closer connection by introducing the intrinsic excess and by using the vector potential $\sigma$ to establish its decay.

Theorem 1 (Gloria, Neukamm, O.). Let $\alpha\in(0,1)$. Then there exists a constant $C = C(d,\lambda,\alpha)$ with the following property: Suppose that the corrector has only a mild linear growth at the point $x = 0$ in the sense that there exists a radius $r_*$ such that
(8) $\Big(\fint_{B_r}\Big|(\phi,\sigma) - \fint_{B_r}(\phi,\sigma)\Big|^2\Big)^{1/2} \le \frac rC \quad\text{for } r \ge r_*.$
Then for any radii $r_* \le r \le R$ and every $a$-harmonic function $u$ in $\{|x|\le R\}$ we have
(9) $\Big(\inf_{\xi\in\mathbb{R}^d}\fint_{B_r}|\nabla u - \xi_i(e_i + \nabla\phi_i)|^2\Big)^{1/2} \le C\Big(\frac rR\Big)^\alpha\Big(\inf_{\xi\in\mathbb{R}^d}\fint_{B_R}|\nabla u - \xi_i(e_i + \nabla\phi_i)|^2\Big)^{1/2}.$

Here comes the first main result on the homogenization error. It relates the homogenization error to the amount of sublinear growth of the corrector couple $(\phi,\sigma)$.

Theorem 2. Suppose that the corrector grows sublinearly at the two points $x\in\{0, x_0\}$ in the sense that there exist an exponent $\alpha\in(0,1)$ and a radius $r_* < \infty$ such that
(10) $\Big(\fint_{B_r(x)}\Big|(\phi,\sigma) - \fint_{B_r(x)}(\phi,\sigma)\Big|^2\Big)^{1/2} \le r_*^{\alpha}\, r^{1-\alpha} \quad\text{for } r \ge r_*.$
For a square integrable vector field $g$ we compare $\nabla u$ defined through
(11) $-\nabla\cdot a\nabla u = \nabla\cdot g$
with $\partial_i v(e_i + \nabla\phi_i)$ (using Einstein's convention of summation over repeated indices), where $\nabla v$ is defined through

(12) $-\nabla\cdot a_h\nabla v = \nabla\cdot\big(g_i(e_i + \partial_i\phi)\big).$
Provided $\operatorname{supp} g\subset B_{r_*}(0)$ and $|x_0| \ge 4r_*$ we have
(13) $\Big(\fint_{B_{r_*}(x_0)}|\nabla u - \partial_i v(e_i + \nabla\phi_i)|^2\Big)^{1/2} \le C(d,\lambda,\alpha)\,\frac{\ln\frac{|x_0|}{r_*}}{\big(\frac{|x_0|}{r_*}\big)^{d+\alpha}}\Big(\int|g|^2\Big)^{1/2}.$

Estimate (13) also holds without the logarithmic factor. To see this, in its proof just apply Theorem 1 with a Hölder exponent strictly larger than the exponent $\alpha$ from (10), so that (60) becomes the sum of a geometric series. Theorem 2 compares the Helmholtz projection $T$ based on $a$ with the Helmholtz projection $T_h$ based on $a_h$ and the multiplication operator $M := \mathrm{id} + \nabla\phi$. Loosely speaking, it states that $T \approx M T_h M^*$. We post-process Theorem 2 to get a measure of the homogenization error on the level of the Green's functions in Corollary 3, our second main result. More precisely, we compare the "quenched" Green's function $G(x,y)$ and the Green's function (or rather fundamental solution) $G_h(x)$ belonging to the homogenized coefficient $a_h$, characterized by

(14) $-\nabla_x\cdot a(x)\nabla_x G(x,y) = \delta(x-y) \quad\text{and}\quad -\nabla\cdot a_h\nabla G_h = \delta.$
Let us make two remarks on the existence of the quenched Green's function: 1) In case of $d = 2$, the definition of $G$ is at best ambiguous. However, Corollary 3 only involves gradients of the Green's function, which are unambiguously defined and for instance constructed via approximation through a massive term or through Dirichlet boundary conditions. 2) De Giorgi's counterexample [7] implies that in the case of systems, there are uniformly elliptic coefficient fields that do not admit a Green's function. However, in [5] we show that under the mere assumption of stationarity of an ensemble $\langle\cdot\rangle$ of coefficients, the Green's function $G(a,x,y)$ exists for $\langle\cdot\rangle$-a. e. $a$. We thus will not worry about existence in this note.

The corollary compares $G(x,y)$ to $G_h(x-y)$ on the level of the mixed second derivatives $\nabla_x\nabla_y G(x,y)$ (interpreted as a 1-1 tensor) and $-\nabla^2 G_h(x-y)$, where the mixed derivative of the homogenized Green's function is corrected in both variables, leading to the expression $-\partial_i\partial_j G_h(x-y)(e_i + \nabla\phi_i(x))\otimes(e_j + \nabla\phi_j(y))$. The corollary monitors the rate of decay of this difference in an almost pointwise way, just locally averaged over $x\approx x_0$ and $y\approx 0$, and shows that, up to a logarithm, the rate of decay is $|x_0|^{-d-\alpha}$, which is by $|x_0|^{-\alpha}$ stronger than the rate of decay $|x_0|^{-d}$ of $-\nabla^2 G_h(x_0)$. The main insight is thus that this relative error of $|x_0|^{-\alpha}$ is dominated by the sublinear growth rate of the corrector couple $(\phi,\sigma)$, where it is only necessary to control that growth at the two points of interest, that is, $0$ and $x_0$. In this sense, Corollary 3 expresses a local one-to-one correspondence between the sublinear growth of the corrector and the homogenization error.

Corollary 3. Suppose for some exponent $\alpha\in(0,1)$ and some radius $r_*$ the corrector couple $(\phi,\sigma)$ satisfies (10) at the two points $x = 0, x_0$ of distance $|x_0| \ge 4r_*$. Then we have (using summation convention)

$$\Big(\fint_{B_{r_*/2}(x_0)}\fint_{B_{r_*/2}(0)}\big|\nabla_x\nabla_y G(x,y) + \partial_i\partial_j G_h(x-y)(e_i + \nabla\phi_i(x))\otimes(e_j + \nabla\phi_j(y))\big|^2\,dy\,dx\Big)^{1/2} \le C(d,\lambda,\alpha)\,\frac{\ln\frac{|x_0|}{r_*}}{\big(\frac{|x_0|}{r_*}\big)^{d+\alpha}}.$$

In order to pass from Theorem 2 to Corollary 3, we need the following statement on families (rather, ensembles) of $a$-harmonic functions, which is of independent interest and motivated by [4].

Lemma 4. For some radius $R$, we consider an ensemble $\langle\cdot\rangle$ (unrelated to the one coming from homogenization) of $a$-harmonic functions $u$ in $\{|x|\le 2R\}$. Then we have
(15) $\Big\langle\int_{|x|\le\frac R2}|\nabla u|^2\,dx\Big\rangle \lesssim \sup_F\big\langle|Fu|^2\big\rangle,$
where the supremum runs over all linear functionals $F$ bounded in the sense of
(16) $|Fu|^2 \le \int_{|x|\le R}|\nabla u|^2\,dx,$
and where here and in the proof $\lesssim$ means $\le C$ with a generic $C = C(d,\lambda)$. We note that the statement of Lemma 4 is trivial for an ensemble $\langle\cdot\rangle$ supported on a single function. Hence Lemma 4 expresses some compactness of ensembles of $a$-harmonic functions.

4. Proofs
Proof of Theorem 2. Throughout the proof, $\lesssim$ denotes $\le C$, where $C$ is a generic constant that only depends on the dimension $d$, the ellipticity ratio $\lambda > 0$, and the exponent $\alpha\in(0,1)$. By a rescaling of space we may assume w. l. o. g. that $r_* = 1$; by homogeneity, we may w. l. o. g. assume $(\int|g|^2)^{1/2} = 1$. We set for abbreviation $R := \frac14|x_0| \ge 1$. We first list and motivate the main steps in the proof.

Step 1. In the first step, we upgrade the excess decay (9) in the sense that we replace the optimal slope $\xi$ on the l. h. s. by a fixed slope that does not depend on $r$, but is the optimal slope for some radius $r_0$ of order one. More precisely, suppose that (10) holds for a point $x$, say $x = 0$. Then there exists a radius $1 \le r_0 \lesssim 1$ such that for all radii $R \ge r_0$ and $a$-harmonic functions $u$ in $\{|x|\le R\}$, the optimal slope on scale $r_0$,
$$\xi := \operatorname{argmin}_\xi\int_{B_{r_0}}|\nabla u - \xi_i(e_i + \nabla\phi_i)|^2,$$
is such that for $r\in[r_0, R]$:

(17) $\Big(\int_{B_r}|\nabla u - \xi_i(e_i + \nabla\phi_i)|^2\Big)^{1/2} \lesssim \Big(\frac rR\Big)^{\frac d2+\alpha}\Big(\int_{B_R}|\nabla u|^2\Big)^{1/2}.$
In addition, we have

(18) $|\xi| \lesssim \Big(\fint_{B_r}|\nabla u|^2\Big)^{1/2} \lesssim \Big(\fint_{B_R}|\nabla u|^2\Big)^{1/2}.$

Since $r_0 \lesssim 1$, the second inequality holds, possibly with a worse constant, for all $1 \le r \le R$.

Step 2. We have the following estimates on $\nabla u$ and $\nabla v$:

(19) $\Big(\int|\nabla u|^2\Big)^{1/2} \lesssim 1, \qquad \sup_{|x|\ge 2}\big(|x|^d|\nabla v(x)| + |x|^{d+1}|\nabla^2 v(x)|\big) \lesssim 1.$

Moreover, $u$ and $v$ have a vanishing "constant invariant",
(20) $\int\nabla\eta_r\cdot a\nabla u = \int\nabla\eta_r\cdot a_h\nabla v = 0,$
and, thanks to the special form of the r. h. s. of (12), identical "linear invariants" (for $k = 1,\ldots,d$),
(21) $\int\nabla\eta_r\cdot\big((x_k + \phi_k)a\nabla u - u\,a(e_k + \nabla\phi_k)\big) = \int\nabla\eta_r\cdot\big(x_k a_h\nabla v - v\,a_h e_k\big),$
for all $r \ge 1$, where $\eta_r(x) = \eta(\frac xr)$ and $\eta$ is a cut-off function for $\{|x|\le 1\}$ in $\{|x|\le 2\}$. We speak of invariants since for two $a$-harmonic functions $u$ and $\tilde u$ (in our case $\tilde u = 1$ for the constant invariant and $\tilde u = x_k + \phi_k$ for the linear invariant) defined in $\{|x|\ge 1\}$, the value of the boundary integral $\int_{\partial\Omega}\nu\cdot(\tilde u\,a\nabla u - u\,a\nabla\tilde u)$ does not depend on the open set $\Omega$ provided the latter contains $B_1$.

Step 3. We consider the homogenization error
(22) $w := u - (v + \tilde\phi_i\partial_i v),$
where $\tilde\phi$ denotes the following blended version of the corrector:

(23) $\tilde\phi := (1-\eta)\Big(\phi - \fint_{B_1(0)}\phi\Big) + \eta\Big(\phi - \fint_{B_1(x_0)}\phi\Big),$
where $\eta$ is a cut-off function for $\{|x-x_0|\le R\}$ in $\{|x-x_0|\le 2R\}$. Then we have
(24) $-\nabla\cdot a\nabla w = \nabla\cdot h \quad\text{for } |x|\ge 2,$
where $\nabla w$ and $h$ satisfy the following estimates:

(25) $\Big(\int_{|x|\ge 2}|\nabla w|^2\Big)^{1/2} \lesssim 1$
and
(26) $\Big(\int_{|x|\ge r}|h|^2\Big)^{1/2} \lesssim \frac{1}{r^{\frac d2+\alpha}} \quad\text{for } r \ge 2,$

(27) $\Big(\int_{|x-x_0|\le r}|h|^2\Big)^{1/2} \lesssim \frac{r^{\frac d2}}{R^{d+\alpha}} \quad\text{for } 1 \le r \le R.$
Moreover, $w$ has an asymptotically vanishing constant invariant,

(28) $\lim_{r\uparrow\infty}\int\nabla\eta_r\cdot a\nabla w = 0,$
and asymptotically vanishing linear invariants,

(29) $\lim_{r\uparrow\infty}\int\nabla\eta_r\cdot\big((x_k + \phi_k)a\nabla w - w\,a(e_k + \nabla\phi_k)\big) = 0 \quad\text{for } k = 1,\ldots,d.$

Step 4. Extension into the origin. There exists $\bar w$ (with square integrable gradient), $\bar h$, and $\bar f$ defined on all of $\mathbb{R}^d$ such that
(30) $-\nabla\cdot a\nabla\bar w = \nabla\cdot\bar h + \bar f,$
while
(31) $\nabla\bar w = \nabla w$ in $\{|x|\ge 4\}$, $\operatorname{supp}\bar h\subset\{|x|\ge 2\}$, $\operatorname{supp}\bar f\subset\{|x|\le 4\}$.
The r. h. s. $\bar f$ and $\bar h$ satisfy the estimates
(32) $\Big(\int\bar f^2\Big)^{1/2} \lesssim 1$
and
(33) $\Big(\int_{|x|\ge r}|\bar h|^2\Big)^{1/2} \lesssim \frac{1}{r^{\frac d2+\alpha}} \quad\text{for } r \ge 2,$

(34) $\Big(\int_{|x-x_0|\le r}|\bar h|^2\Big)^{1/2} \lesssim \frac{r^{\frac d2}}{R^{d+\alpha}} \quad\text{for } 1 \le r \le R.$
Moreover, we have vanishing constant and linear invariants:
(35) $\int\bar f = 0, \qquad \int\big((e_k + \nabla\phi_k)\cdot\bar h - (x_k + \phi_k)\bar f\big) = 0 \quad\text{for } k = 1,\ldots,d.$

The estimates (32)–(34) ensure that the integrals in (35) converge absolutely.

Step 5. From the equation (30), the vanishing invariants (35), and the estimates (32) & (33) it follows that
(36) $\Big(\int_{|x|\ge R}|\nabla\bar w|^2\Big)^{1/2} \lesssim \frac{\ln R}{R^{\frac d2+\alpha}}.$

Step 6. From the same ingredients as in Step 5 and in addition (34), one obtains the following localized version of (36):
$$\Big(\int_{|x-x_0|\le 1}|\nabla\bar w|^2\Big)^{1/2} \lesssim \frac{\ln R}{R^{d+\alpha}}.$$

Step 7. Conclusion.

Argument for Step 1. We first argue that there exists a radius $1 \le r_0 \sim 1$ such that
(37) $\fint_{B_r}|\xi_i(e_i + \nabla\phi_i)|^2 \sim |\xi|^2 \quad\text{for } r \ge r_0.$
Indeed, (37) easily follows from the sublinear growth (10) of $\phi$ at the point $0$. The upper bound is a consequence of Caccioppoli's estimate for the $a$-harmonic function $x_i + (\phi_i - \fint_{B_{2r}}\phi_i)$, c. f. (2), leading to
$$\Big(\fint_{B_r}|\xi_i(e_i + \nabla\phi_i)|^2\Big)^{1/2} \lesssim \frac1r\Big(\fint_{B_{2r}}\Big|\xi_i\Big(x_i + \Big(\phi_i - \fint_{B_{2r}}\phi_i\Big)\Big)\Big|^2\Big)^{1/2} \lesssim |\xi|\Big(1 + \frac1r\Big(\fint_{B_{2r}}\Big|\phi - \fint_{B_{2r}}\phi\Big|^2\Big)^{1/2}\Big).$$
The lower bound is a consequence of Jensen's inequality once we introduce a cut-off function $\eta$ of $B_{\frac12}$ in $B_1$ and set $\eta_r(x) = \eta(\frac xr)$:
$$\Big(\int_{B_r}|\xi_i(e_i + \nabla\phi_i)|^2\Big)^{1/2} \ge \Big(\int\eta_r|\xi_i(e_i + \nabla\phi_i)|^2\Big)^{1/2} \ge \frac{1}{(\int\eta_r)^{\frac12}}\Big|\int\eta_r\,\xi_i(e_i + \nabla\phi_i)\Big| = \frac{1}{(\int\eta_r)^{\frac12}}\Big|\xi\int\eta_r - \int\xi_i\Big(\phi_i - \fint_{B_r}\phi_i\Big)\nabla\eta_r\Big| \ge |\xi|\Big(\frac1C r^{\frac d2} - C r^{\frac d2-1}\Big(\fint_{B_r}\Big|\phi_i - \fint_{B_r}\phi_i\Big|^2\Big)^{\frac12}\Big).$$

For an $a$-harmonic function $u$ in $B_R$ and a radius $r_0 \le r \le R$ we consider
(38) $\xi_r := \operatorname{argmin}_\xi\fint_{B_r}|\nabla u - \xi_i(e_i + \nabla\phi_i)|^2$
and claim that
(39) $|\xi_r - \xi_R| \lesssim \Big(\inf_\xi\fint_{B_R}|\nabla u - \xi_i(e_i + \nabla\phi_i)|^2\Big)^{1/2}.$
Indeed, by the triangle inequality in $\mathbb{R}^d$ and since $\alpha > 0$, it is enough to show for $r_0 \le r \le r' \le R$ with $r' \le 2r$ that
$$|\xi_r - \xi_{r'}| \lesssim \Big(\frac rR\Big)^\alpha\Big(\inf_\xi\fint_{B_R}|\nabla u - \xi_i(e_i + \nabla\phi_i)|^2\Big)^{1/2},$$
which thanks to Theorem 1 follows from
$$|\xi_r - \xi_{r'}|^2 \lesssim \inf_\xi\fint_{B_r}|\nabla u - \xi_i(e_i + \nabla\phi_i)|^2 + \inf_\xi\fint_{B_{r'}}|\nabla u - \xi_i(e_i + \nabla\phi_i)|^2,$$
which by definition (38) in turn follows from

$$|\xi_r - \xi_{r'}|^2 \lesssim \fint_{B_r}|\nabla u - \xi_{r,i}(e_i + \nabla\phi_i)|^2 + \fint_{B_r}|\nabla u - \xi_{r',i}(e_i + \nabla\phi_i)|^2.$$

The latter finally follows from the triangle inequality in $L^2(B_r)$ and from (37) applied to $\xi = \xi_r - \xi_{r'}$.

Now (17) is an easy consequence of (39) (with (r, R) replaced by (r0,r)) and (9):

$$\Big(\int_{B_r}|\nabla u - \xi_i(e_i + \nabla\phi_i)|^2\Big)^{1/2} \overset{(39)}{\lesssim} \Big(\inf_{\tilde\xi}\int_{B_r}|\nabla u - \tilde\xi_i(e_i + \nabla\phi_i)|^2\Big)^{1/2} \overset{(9)}{\lesssim} \Big(\frac rR\Big)^{\frac d2+\alpha}\Big(\int_{B_R}|\nabla u|^2\Big)^{1/2}.$$

We finally turn to the argument for (18). For this purpose we first note that for all radii r ≥ r0

(40) $|\xi_r| + \Big(\inf_\xi\fint_{B_r}|\nabla u - \xi_i(e_i + \nabla\phi_i)|^2\Big)^{1/2} \lesssim \Big(\fint_{B_r}|\nabla u|^2\Big)^{1/2}.$
Indeed, that the second l. h. s. term is dominated by the r. h. s. is obvious. For the first l. h. s. term we note that by (37), the triangle inequality in $L^2(B_r)$, and the definition (38) of $\xi_r$,

$$|\xi_r| \lesssim \Big(\fint_{B_r}|\xi_{r,i}(e_i + \nabla\phi_i)|^2\Big)^{1/2} \le \Big(\inf_\xi\fint_{B_r}|\nabla u - \xi_i(e_i + \nabla\phi_i)|^2\Big)^{1/2} + \Big(\fint_{B_r}|\nabla u|^2\Big)^{1/2}.$$

Equipped with (40), and more importantly (39) and (9), we may now tackle (18). For the first estimate in (18), we appeal to the triangle inequality, (39) with (r, R) replaced by (r0,r), and (40):

$$|\xi_{r_0}| \le |\xi_r| + |\xi_r - \xi_{r_0}| \overset{(39)}{\lesssim} |\xi_r| + \Big(\inf_\xi\fint_{B_r}|\nabla u - \xi_i(e_i + \nabla\phi_i)|^2\Big)^{1/2} \overset{(40)}{\lesssim} \Big(\fint_{B_r}|\nabla u|^2\Big)^{1/2}.$$
For the second estimate in (18), we use the triangle inequality in $L^2(B_r)$ and in $\mathbb{R}^d$, (39), (9), and (40):

$$\Big(\fint_{B_r}|\nabla u|^2\Big)^{1/2} \overset{(38)}{\le} |\xi_R| + |\xi_R - \xi_r| + \Big(\inf_\xi\fint_{B_r}|\nabla u - \xi_i(e_i + \nabla\phi_i)|^2\Big)^{1/2} \overset{(39)}{\lesssim} |\xi_R| + \max_{\rho\in\{r,R\}}\Big(\inf_\xi\fint_{B_\rho}|\nabla u - \xi_i(e_i + \nabla\phi_i)|^2\Big)^{1/2} \overset{(9)}{\lesssim} |\xi_R| + \Big(\inf_\xi\fint_{B_R}|\nabla u - \xi_i(e_i + \nabla\phi_i)|^2\Big)^{1/2} \overset{(40)}{\le} \Big(\fint_{B_R}|\nabla u|^2\Big)^{1/2}.$$

Argument for Step 2. The first estimate in (19) is an immediate consequence of the definition (11) and the energy estimate. We now turn to the second estimate in (19). We first note that the r. h. s. $\tilde g := g_i(e_i + \partial_i\phi)$ of (12) can be bounded with the help of Caccioppoli's estimate:

$$\int|\tilde g| \le \Big(\int|g|^2\Big)^{1/2}\Big(\int_{B_1}|e_i + \nabla\phi_i|^2\Big)^{1/2} \lesssim \Big(\int_{B_2}\Big|x + \Big(\phi - \fint_{B_2}\phi\Big)\Big|^2\Big)^{1/2} \lesssim 1.$$

Since, moreover, $\tilde g$ is supported in $B_1$, we obtain the following representation for $|x| \ge 2$:
$$\nabla v(x) = \int\nabla^2 G_h(x-y)\tilde g(y)\,dy, \qquad \nabla^2 v(x) = \int\nabla^3 G_h(x-y)\tilde g(y)\,dy,$$
in terms of the constant-coefficient Green's function $G_h$. The second estimate in (19) now follows using the homogeneity of $G_h$. We now turn to the invariants. The first identity (20) follows immediately from integration by parts of the equations (11) & (12), using the fact that the respective right hand sides are supported in $B_1$. We now turn to the second identity (21), which also follows from integration by parts, but this time relying on the Green formulas

$$\nabla\cdot\big[(x_k + \phi_k)(a\nabla u + g) - u\,a(e_k + \nabla\phi_k)\big] = (e_k + \nabla\phi_k)\cdot g,$$

$$\nabla\cdot\big[x_k\big(a_h\nabla v + g_i(e_i + \partial_i\phi)\big) - v\,a_h e_k\big] = e_k\cdot\big[g_i(e_i + \partial_i\phi)\big],$$
which follow from (11) & (12) in conjunction with (2), and the pointwise identity

$$(e_k + \nabla\phi_k)\cdot g = e_k\cdot\big[g_i(e_i + \partial_i\phi)\big].$$

Argument for Step 3. We start by establishing the formula (24) with
(41) $h := (\tilde\phi_i a - \tilde\sigma_i)\nabla\partial_i v + \partial_i v\Big[\Big(\fint_{B_1(0)}\phi_i - \fint_{B_1(x_0)}\phi_i\Big)a - \Big(\fint_{B_1(0)}\sigma_i - \fint_{B_1(x_0)}\sigma_i\Big)\Big]\nabla\eta,$
where, in line with (23), we have set

(42) $\tilde\sigma := (1-\eta)\Big(\sigma - \fint_{B_1(0)}\sigma\Big) + \eta\Big(\sigma - \fint_{B_1(x_0)}\sigma\Big).$
Indeed, from definition (22) & (23) we obtain

(43) $\nabla w = \nabla u - \Big[\partial_i v\Big(e_i + \nabla\phi_i + \Big(\fint_{B_1(0)}\phi_i - \fint_{B_1(x_0)}\phi_i\Big)\nabla\eta\Big) + \tilde\phi_i\nabla\partial_i v\Big]$
and thus, using $\nabla\cdot a(e_i + \nabla\phi_i) = 0$, c. f. (2),
$$-\nabla\cdot a\nabla w = -\nabla\cdot a\nabla u + \nabla\partial_i v\cdot a(e_i + \nabla\phi_i) + \nabla\cdot\Big[\partial_i v\Big(\fint_{B_1(0)}\phi_i - \fint_{B_1(x_0)}\phi_i\Big)a\nabla\eta + \tilde\phi_i a\nabla\partial_i v\Big].$$

Using the identity $a(e_i + \nabla\phi_i) = a_h e_i + \nabla\cdot\sigma_i$, c. f. (5), we have $\nabla\partial_i v\cdot a(e_i + \nabla\phi_i) = \nabla\cdot a_h\nabla v + \nabla\partial_i v\cdot(\nabla\cdot\sigma_i)$. Using that the r. h. s. of (11) & (12) are supported in $B_1$, we thus obtain in $\{|x|\ge 1\}$
$$-\nabla\cdot a\nabla w = \nabla\partial_i v\cdot(\nabla\cdot\sigma_i) + \nabla\cdot\Big[\partial_i v\Big(\fint_{B_1(0)}\phi_i - \fint_{B_1(x_0)}\phi_i\Big)a\nabla\eta + \tilde\phi_i a\nabla\partial_i v\Big].$$

It remains to substitute $\sigma$ by $\tilde\sigma$ and to bring the related part of the above r. h. s. into divergence form. Indeed, by definition (42),

$$\nabla\cdot\sigma_i = \nabla\cdot\tilde\sigma_i - \nabla\cdot\Big(\eta\Big(\fint_{B_1(0)}\sigma_i - \fint_{B_1(x_0)}\sigma_i\Big)\Big) = \nabla\cdot\tilde\sigma_i - \Big(\fint_{B_1(0)}\sigma_i - \fint_{B_1(x_0)}\sigma_i\Big)\nabla\eta,$$
and by the identities

(44) $\nabla\zeta\cdot(\nabla\cdot\sigma) = \nabla\cdot(\zeta\,\nabla\cdot\sigma) = -\nabla\cdot(\sigma\nabla\zeta) \quad\text{for skew }\sigma,$
we obtain, as desired,

$$\nabla\partial_i v\cdot(\nabla\cdot\sigma_i) = -\nabla\cdot\big[\tilde\sigma_i\nabla\partial_i v\big] - \nabla\cdot\Big[\partial_i v\Big(\fint_{B_1(0)}\sigma_i - \fint_{B_1(x_0)}\sigma_i\Big)\nabla\eta\Big].$$

In order to estimate the contribution to $\nabla w$ and $h$ that comes from the difference $\fint_{B_1(0)} - \fint_{B_1(x_0)}$ of the average values, we now argue that
(45) $\Big|\fint_{B_1(0)}(\phi,\sigma) - \fint_{B_1(x_0)}(\phi,\sigma)\Big| \lesssim R^{1-\alpha}.$

To keep notation light, we write $\phi$ instead of $(\phi,\sigma)$. Let us first argue how to reduce (45) to
(46) $\Big|\fint_{B_r(x)}\phi - \fint_{B_1(x)}\phi\Big| \lesssim r^{1-\alpha} \quad\text{for } r \ge 1,\ x\in\{0, x_0\}.$

Indeed, (45) follows from (46) and our assumption (10) via the string of inequalities
$$\Big|\fint_{B_1(0)}\phi - \fint_{B_1(x_0)}\phi\Big| \le \Big(\fint_{B_{2R}(\frac{x_0}{2})}\Big|\phi - \fint_{B_1(0)}\phi\Big|^2\Big)^{1/2} + \Big(\fint_{B_{2R}(\frac{x_0}{2})}\Big|\phi - \fint_{B_1(x_0)}\phi\Big|^2\Big)^{1/2}$$
$$\lesssim \Big(\fint_{B_{4R}(0)}\Big|\phi - \fint_{B_1(0)}\phi\Big|^2\Big)^{1/2} + \Big(\fint_{B_{4R}(x_0)}\Big|\phi - \fint_{B_1(x_0)}\phi\Big|^2\Big)^{1/2}$$
$$\lesssim \Big(\fint_{B_{4R}(0)}\Big|\phi - \fint_{B_{4R}(0)}\phi\Big|^2\Big)^{1/2} + \Big(\fint_{B_{4R}(x_0)}\Big|\phi - \fint_{B_{4R}(x_0)}\phi\Big|^2\Big)^{1/2} + \Big|\fint_{B_{4R}(0)}\phi - \fint_{B_1(0)}\phi\Big| + \Big|\fint_{B_{4R}(x_0)}\phi - \fint_{B_1(x_0)}\phi\Big| \overset{(10),(46)}{\lesssim} R^{1-\alpha}.$$

For (46), we focus on $x = 0$ and note that by a decomposition into dyadic radii, it is enough to show
$$\Big|\fint_{B_{r'}}\phi - \fint_{B_r}\phi\Big| \lesssim r^{1-\alpha} \quad\text{for } 2r \ge r' \ge r \ge 1.$$

This estimate follows from a similar string of inequalities as the one before:
$$\Big|\fint_{B_{r'}}\phi - \fint_{B_r}\phi\Big| \le \Big(\fint_{B_r}\Big|\phi - \fint_{B_{r'}}\phi\Big|^2\Big)^{1/2} + \Big(\fint_{B_r}\Big|\phi - \fint_{B_r}\phi\Big|^2\Big)^{1/2} \lesssim \Big(\fint_{B_{r'}}\Big|\phi - \fint_{B_{r'}}\phi\Big|^2\Big)^{1/2} + \Big(\fint_{B_r}\Big|\phi - \fint_{B_r}\phi\Big|^2\Big)^{1/2}$$
and an application of (10). We now turn to (25). We start from the formula (43), which we rewrite as

$$\nabla w = \nabla u - \partial_i v\Big[e_i + \nabla\phi_i + \Big(\fint_{B_1(0)}\phi_i - \fint_{B_1(x_0)}\phi_i\Big)\nabla\eta\Big] - \Big[\Big(\phi_i - \fint_{B_1(0)}\phi_i\Big) + \eta\Big(\fint_{B_1(0)}\phi_i - \fint_{B_1(x_0)}\phi_i\Big)\Big]\nabla\partial_i v.$$
From the estimate (45) on averages and the estimate (19) on $v$ we thus obtain for $|x| \ge 2$
(47) $|\nabla w| \lesssim |\nabla u| + |x|^{-d}\big(|\mathrm{id} + \nabla\phi| + R^{1-\alpha}|\nabla\eta|\big) + |x|^{-(d+1)}\Big(\Big|\phi - \fint_{B_1(0)}\phi\Big| + R^{1-\alpha}\eta\Big).$
Taking the $L^2(\{|x|\ge 2\})$-norm, we see that the $\nabla u$-term is bounded according to (19). By the choice of the cut-off function $\eta$, the function $|x|^{-d}|\nabla\eta| + |x|^{-(d+1)}\eta$ is supported in $\{|x-x_0|\le 2R\}$ and bounded by $R^{-(d+1)}$, so that the contribution of this term to $(\int_{|x|\ge 2}|\nabla w|^2)^{1/2}$ is estimated by $R^{-\frac d2-\alpha}$. We turn to the term involving $\phi$ and will for later purposes show the slightly more general statement for $r \ge 1$

(48) $\Big(\int_{|x|\ge r}\Big(|x|^{-(d+1)}\Big|(\phi,\sigma) - \fint_{B_1(0)}(\phi,\sigma)\Big|\Big)^2\Big)^{1/2} \lesssim \frac{1}{r^{\frac d2+\alpha}},$
which, restricting to $\phi$ in our notation, follows by dyadic summation from

$$\Big(\int_{r\le|x|\le 2r}\Big|\phi - \fint_{B_1(0)}\phi\Big|^2\Big)^{1/2} \lesssim r^{\frac d2+1-\alpha},$$
which trivially follows from

$$\Big(\fint_{B_r(0)}\Big|\phi - \fint_{B_1(0)}\phi\Big|^2\Big)^{1/2} \lesssim r^{1-\alpha}.$$
The last estimate is a combination of our assumption (10) with (46). We now turn to the estimate of the $\nabla\phi$-term in (47). By dyadic summation, it is enough to show

$$\Big(\int_{r\le|x|\le 2r}|\mathrm{id} + \nabla\phi|^2\Big)^{1/2} \lesssim r^{\frac d2},$$
which follows from Caccioppoli's estimate and assumption (10):

$$\Big(\fint_{B_{2r}}|\mathrm{id} + \nabla\phi|^2\Big)^{1/2} \lesssim \frac1r\Big(\fint_{B_{4r}}\Big|x + \Big(\phi - \fint_{B_{4r}}\phi\Big)\Big|^2\Big)^{1/2} \lesssim 1 + \frac1r\Big(\fint_{B_{4r}}\Big|\phi - \fint_{B_{4r}}\phi\Big|^2\Big)^{1/2}.$$

We now turn to estimate (26). For this purpose, we rewrite the definition (41) of h as

$$h = \Big[\Big(\phi_i - \fint_{B_1(0)}\phi_i\Big)a - \Big(\sigma_i - \fint_{B_1(0)}\sigma_i\Big)\Big]\nabla\partial_i v + \eta\Big[\Big(\fint_{B_1(0)}\phi_i - \fint_{B_1(x_0)}\phi_i\Big)a - \Big(\fint_{B_1(0)}\sigma_i - \fint_{B_1(x_0)}\sigma_i\Big)\Big]\nabla\partial_i v + \partial_i v\Big[\Big(\fint_{B_1(0)}\phi_i - \fint_{B_1(x_0)}\phi_i\Big)a - \Big(\fint_{B_1(0)}\sigma_i - \fint_{B_1(x_0)}\sigma_i\Big)\Big]\nabla\eta.$$
Inserting, as above, the estimate (19) on $v$ and the estimate (45) on the averages, we obtain for $|x| \ge 2$
(49) $|h| \lesssim |x|^{-(d+1)}\Big|(\phi,\sigma) - \fint_{B_1(0)}(\phi,\sigma)\Big| + R^{1-\alpha}\big(|x|^{-(d+1)}\eta + |x|^{-d}|\nabla\eta|\big).$
Since by definition of $\eta$, the function $|x|^{-(d+1)}\eta + |x|^{-d}|\nabla\eta|$ is supported in $\{|x-x_0|\le 2R\}$ and bounded by $R^{-(d+1)}$, the contribution of the second r. h. s. term in (49) to $(\int_{|x|\ge r}|h|^2)^{1/2}$ vanishes for $r > 6R$ and is bounded by $R^{\frac d2}R^{-(d+\alpha)} \lesssim r^{-(\frac d2+\alpha)}$ for $r \le 6R$. The first r. h. s. term in (49) was treated in (48). We finally turn to the last estimate (27). For this purpose, we note that because of the choice of $\eta$, on $B_R(x_0)$ the definition (41) of $h$ turns into

$$h = \Big[\Big(\phi_i - \fint_{B_1(x_0)}\phi_i\Big)a - \Big(\sigma_i - \fint_{B_1(x_0)}\sigma_i\Big)\Big]\nabla\partial_i v,$$
so that by the estimate (19) on $v$ we have for $1 \le r \le R$

$$\Big(\int_{B_r(x_0)}|h|^2\Big)^{1/2} \lesssim R^{-(d+1)}\Big(\int_{B_r(x_0)}\Big|(\phi,\sigma) - \fint_{B_1(x_0)}(\phi,\sigma)\Big|^2\Big)^{1/2},$$
which by estimate (46) on averages turns into
$$\Big(\int_{B_r(x_0)}|h|^2\Big)^{1/2} \lesssim R^{-(d+1)}\,r^{\frac d2}\Big(\Big(\fint_{B_r(x_0)}\Big|(\phi,\sigma) - \fint_{B_r(x_0)}(\phi,\sigma)\Big|^2\Big)^{1/2} + r^{1-\alpha}\Big),$$
so that the desired estimate in the strengthened form of
$$\Big(\int_{B_r(x_0)}|h|^2\Big)^{1/2} \lesssim \frac{r^{\frac d2+1-\alpha}}{R^{d+1}}$$
now follows from assumption (10).

We finally turn to the asymptotic invariants (28) & (29). Since we may assume $r \ge 6R$, we may ignore the presence of $\eta$ in the definition (22) of $w$ (and assume w. l. o. g. that $\fint_{B_1(0)}(\phi,\sigma) = 0$ for notational simplicity). Hence we have
(50) $a\nabla w = a\nabla u - \big(\partial_i v\,a(e_i + \nabla\phi_i) + \phi_i a\nabla\partial_i v\big) \overset{(5)}{=} a\nabla u - \big(a_h\nabla v + \partial_i v\,\nabla\cdot\sigma_i + \phi_i a\nabla\partial_i v\big).$
Using the formula
(51) $\int\nabla\eta_r\cdot(\zeta\,\nabla\cdot\sigma + \sigma\nabla\zeta) = 0 \quad\text{for skew }\sigma,$
which follows from the last identity in (44) (and even holds if $\zeta$ a priori is not defined in $\{|x|\le r\}$, since it can be arbitrarily extended), we derive
(52) $\int\nabla\eta_r\cdot a\nabla w = \int\nabla\eta_r\cdot\big(a\nabla u - a_h\nabla v - (\phi_i a - \sigma_i)\nabla\partial_i v\big),$
which by the identity (20) of the invariants for $u$ and $v$ collapses into
$$\int\nabla\eta_r\cdot a\nabla w = -\int\nabla\eta_r\cdot\big((\phi_i a - \sigma_i)\nabla\partial_i v\big).$$

Using the estimates (19) on $v$, this yields
$$\Big|\int\nabla\eta_r\cdot a\nabla w\Big| \lesssim r^{-(d+2)}\int_{B_{2r}}|(\phi,\sigma)| \lesssim r^{-2}\Big(\fint_{B_{2r}}|(\phi,\sigma)|^2\Big)^{1/2}.$$
Together with our assumption (10), this implies (28). We conclude this step with the argument for (29). From the identity (50) we deduce
$$(x_k + \phi_k)a\nabla w - w\,a(e_k + \nabla\phi_k) = \big[(x_k + \phi_k)a\nabla u - u\,a(e_k + \nabla\phi_k)\big] - \big[x_k a_h\nabla v - v\,a_h e_k\big] - \phi_k a_h\nabla v - (x_k + \phi_k)\big(\partial_i v\,\nabla\cdot\sigma_i + \phi_i a\nabla\partial_i v\big) + \phi_i\partial_i v\,a(e_k + \nabla\phi_k) + v\,\nabla\cdot\sigma_k.$$
Using that the linear invariants of $u$ and $v$ coincide, c. f. (21), and the formula (51), only the following terms survive after application of $\int\nabla\eta_r\cdot\,$:
$$-\phi_k a_h\nabla v + \sigma_i\nabla\big((x_k + \phi_k)\partial_i v\big) - (x_k + \phi_k)\phi_i a\nabla\partial_i v + \phi_i\partial_i v\,a(e_k + \nabla\phi_k) - \sigma_k\nabla v$$
$$= \partial_i v\,(\phi_i a + \sigma_i)(e_k + \nabla\phi_k) - (\phi_k a_h + \sigma_k)\nabla v - (x_k + \phi_k)(\phi_i a - \sigma_i)\nabla\partial_i v.$$
Together with the estimates (19) on $v$, this implies, by Caccioppoli's estimate,
$$\Big|\int\nabla\eta_r\cdot\big((x_k + \phi_k)a\nabla w - w\,a(e_k + \nabla\phi_k)\big)\Big| \lesssim r^{-(d+1)}\int_{B_{2r}}|(\phi,\sigma)|\big(1 + |\mathrm{id} + \nabla\phi|\big) + r^{-(d+2)}\int_{B_{2r}}|(\phi,\sigma)||x + \phi|$$
$$\lesssim \frac1r\Big(\fint_{B_{2r}}|(\phi,\sigma)|^2\Big)^{1/2}\Big(1 + \Big(\fint_{B_{2r}}|\mathrm{id} + \nabla\phi|^2\Big)^{1/2} + \frac1r\Big(\fint_{B_{2r}}|x + \phi|^2\Big)^{1/2}\Big) \lesssim \frac1r\Big(\fint_{B_{2r}}|(\phi,\sigma)|^2\Big)^{1/2}\Big(1 + \frac1r\Big(\fint_{B_{4r}}|\phi|^2\Big)^{1/2}\Big).$$

By assumption (10), this yields (29).

Argument for Step 4. Select a cut-off function $\eta$ for $\{|x|\ge 4\}$ in $\{|x|\ge 2\}$ and set
$$\bar w := \eta(w - w_0), \qquad \bar h := \eta h - (w - w_0)a\nabla\eta, \qquad \bar f := -\nabla\eta\cdot(a\nabla w + h),$$
where $w_0$ is the average of $w$ on the annulus $\{2\le|x|\le 4\}$. By the choice of $\eta$, (31) is clearly satisfied. Since
$$\nabla\bar w = \eta\nabla w + (w - w_0)\nabla\eta$$
and thus $a\nabla\bar w + \bar h = \eta(a\nabla w + h)$, we learn from (24) that also (30) holds. The estimate (32) on $\bar f$ follows from the estimates (25) and (26) on $w$ and $h$. As for estimate (33) on $\bar h$: for the contribution $\eta h$ to $\bar h$, the estimate immediately translates from (26); for the contribution $-(w - w_0)a\nabla\eta$ we note that it is supported in $\{2\le|x|\le 4\}$ and estimated by $|w - w_0|$, so that the desired estimate follows from Poincaré's inequality with mean value zero on the annulus $\{2\le|x|\le 4\}$. For estimate (34) on $\bar h$, we note that on $B_R(x_0)$, $\bar h$ coincides with $h$, so that it follows immediately from (27). We now turn to the invariants (35). For the constant invariant we note that we obtain from the equation (30) for $r \ge 2$
$$\int\bar f = \int\eta_r\bar f \overset{(30)}{=} \int\nabla\eta_r\cdot(a\nabla\bar w + \bar h) \overset{(31)}{=} \int\nabla\eta_r\cdot(a\nabla w + h).$$
Hence the first identity in (35) follows from (28) provided we have $\lim_{r\uparrow\infty}\int\nabla\eta_r\cdot h = 0$. The latter is a consequence of (26):
$$\Big|\int\nabla\eta_r\cdot h\Big| \lesssim \frac1r\int_{r\le|x|\le 2r}|h| \le r^{\frac d2-1}\Big(\int_{|x|\ge r}|h|^2\Big)^{1/2}.$$

We now turn to the linear invariants in (35). We start with the following identity, which follows from the equations for $\bar w$ and $\phi_k$ and the support properties:
$$\int\eta_r\big(-(e_k + \nabla\phi_k)\cdot\bar h + (x_k + \phi_k)\bar f\big) \overset{(30)}{=} \int\big(\nabla(\eta_r(x_k + \phi_k))\cdot a\nabla\bar w + (x_k + \phi_k)\nabla\eta_r\cdot\bar h\big) \overset{(2)}{=} \int\nabla\eta_r\cdot\big((x_k + \phi_k)(a\nabla\bar w + \bar h) - \bar w\,a(e_k + \nabla\phi_k)\big) \overset{(31)}{=} \int\nabla\eta_r\cdot\big((x_k + \phi_k)(a\nabla w + h) - w\,a(e_k + \nabla\phi_k)\big).$$

Hence the linear invariants follow from (29) once we show
(53) $\int\big|-(e_k + \nabla\phi_k)\cdot\bar h + (x_k + \phi_k)\bar f\big| < \infty$
and
(54) $\lim_{r\uparrow\infty}\int(x_k + \phi_k)\nabla\eta_r\cdot h = 0.$

The limit (54) follows from the estimate
(55) $\Big|\int(x_k + \phi_k)\nabla\eta_r\cdot h\Big| \lesssim \Big(r^{\frac d2} + \Big(\int_{|x|\le 2r}|\phi_k|^2\Big)^{1/2}\Big)\frac1r\Big(\int_{|x|\ge r}|h|^2\Big)^{1/2} \overset{(26)}{\lesssim} \frac{1}{r^\alpha}\Big(1 + \frac1r\Big(\fint_{B_{2r}}|\phi_k|^2\Big)^{1/2}\Big),$
together with the observation that the term $\frac1r(\fint_{B_{2r}}|\phi_k|^2)^{1/2}$ is of higher order: since w. l. o. g. we may assume $\fint_{B_1}\phi_k = 0$, in view of (46) we may appeal to our assumption (10). We now turn to (53); in view of the square integrability of $\bar f$ and $\bar h$ (established above) and the local square integrability of $x_k + \phi_k$ and its gradient, it remains to show

(56) $\int_{|x|\ge 2}\big|(e_k + \nabla\phi_k)\cdot h\big| < \infty.$
To this purpose, we divide into dyadic annuli and use Caccioppoli's estimate:
$$\int_{r\le|x|\le 2r}|(e_k + \nabla\phi_k)\cdot h| \le \Big(\int_{B_{2r}}|e_k + \nabla\phi_k|^2\Big)^{1/2}\Big(\int_{|x|\ge r}|h|^2\Big)^{1/2} \lesssim \frac1r\Big(\int_{B_{4r}}|x_k + \phi_k|^2\Big)^{1/2}\Big(\int_{|x|\ge r}|h|^2\Big)^{1/2} \lesssim \Big(r^{\frac d2} + \Big(\int_{B_{4r}}|\phi_k|^2\Big)^{1/2}\Big)\frac1r\Big(\int_{|x|\ge r}|h|^2\Big)^{1/2}.$$
We now appeal to the same argument as for (55) to see
$$\int_{r\le|x|\le 2r}|(e_k + \nabla\phi_k)\cdot h| \lesssim \frac{1}{r^\alpha}.$$
Summation over dyadic annuli yields (56).

Argument for Step 5. We give an argument by duality and therefore consider, for an arbitrary square-integrable vector field $\tilde h$ supported in $\{|x|\ge R\}$, the finite energy solution of
(57) $-\nabla\cdot a\nabla\tilde w = \nabla\cdot\tilde h.$
Since $\bar w$ is a finite energy solution of (30), we have
(58) $-\int\nabla\bar w\cdot\tilde h = -\int\nabla\bar w\cdot a\nabla\tilde w = \int\big(\nabla\tilde w\cdot\bar h - \tilde w\bar f\big).$

Recall Step 1 and consider
(59) $\xi := \operatorname{argmin}_\xi\int_{B_{r_0}}|\nabla\tilde w - \xi_k(e_k + \nabla\phi_k)|^2 \quad\text{and}\quad c := \fint_{B_4}\big(\tilde w - \xi_k(x_k + \phi_k)\big).$
By the vanishing invariants (35) we may post-process (58) to
$$-\int\nabla\bar w\cdot\tilde h = \int\Big(\big(\nabla\tilde w - \xi_k(e_k + \nabla\phi_k)\big)\cdot\bar h - \big(\tilde w - \xi_k(x_k + \phi_k) - c\big)\bar f\Big).$$

By the support conditions (31) on $\bar f$ and $\bar h$, this implies
$$\Big|\int\nabla\bar w\cdot\tilde h\Big| \le \sum_{n=1}^\infty\int_{2^n\le|x|\le 2^{n+1}}|\nabla\tilde w - \xi_k(e_k + \nabla\phi_k)||\bar h| + \int_{|x|\le 4}|\tilde w - \xi_k(x_k + \phi_k) - c||\bar f|$$
$$\le \sum_{n=1}^\infty\Big(\int_{B_{2^{n+1}}}|\nabla\tilde w - \xi_k(e_k + \nabla\phi_k)|^2\Big)^{1/2}\Big(\int_{|x|\ge 2^n}|\bar h|^2\Big)^{1/2} + \Big(\int_{B_4}|\tilde w - \xi_k(x_k + \phi_k) - c|^2\Big)^{1/2}\Big(\int|\bar f|^2\Big)^{1/2}.$$
Inserting the estimates (32) and (33) on $\bar f$ and $\bar h$ and using Poincaré's inequality on $B_4$ (in view of the definition (59) of $c$), we obtain

(60) $\Big|\int\nabla\bar w\cdot\tilde h\Big| \lesssim \sum_{n=1}^\infty\Big(\frac{1}{2^n}\Big)^{\frac d2+\alpha}\Big(\int_{B_{2^n}}|\nabla\tilde w - \xi_k(e_k + \nabla\phi_k)|^2\Big)^{1/2}.$
We now distinguish the cases $2^n \le R$ and $2^n \ge R$. In case of $2^n \le R$, since by assumption $\tilde h$ is supported in $\{|x|\ge R\}$, $\tilde w$ is $a$-harmonic in $B_R$, c. f. (57). We thus may appeal to (17) in Step 1 and obtain, in view of the definition of $\xi$ in (59),

(61) $\Big(\int_{B_{2^n}}|\nabla\tilde w - \xi_k(e_k + \nabla\phi_k)|^2\Big)^{1/2} \lesssim \Big(\frac{2^n}{R}\Big)^{\frac d2+\alpha}\Big(\int_{B_R}|\nabla\tilde w|^2\Big)^{1/2}.$
In case of $2^n \ge R$, we use (18) in Step 1 (and once more Caccioppoli's estimate in conjunction with assumption (10)) to obtain

$$\Big(\fint_{B_{2^n}}|\nabla\tilde w - \xi_k(e_k + \nabla\phi_k)|^2\Big)^{1/2} \lesssim \Big(\fint_{B_{2^n}}|\nabla\tilde w|^2\Big)^{1/2} + |\xi|\Big(\fint_{B_{2^n}}|\mathrm{id} + \nabla\phi|^2\Big)^{1/2} \lesssim \Big(\fint_{B_{2^n}}|\nabla\tilde w|^2\Big)^{1/2} + \Big(\fint_{B_R}|\nabla\tilde w|^2\Big)^{1/2}.$$
Using the energy inequality for (57) in the form

$$\Big(\fint_{B_r}|\nabla\tilde w|^2\Big)^{1/2} \lesssim r^{-\frac d2}\Big(\int|\nabla\tilde w|^2\Big)^{1/2} \lesssim r^{-\frac d2}\Big(\int|\tilde h|^2\Big)^{1/2},$$
we obtain in either case
$$\Big(\int_{B_{2^n}}|\nabla\tilde w - \xi_k(e_k + \nabla\phi_k)|^2\Big)^{1/2} \lesssim \begin{cases}\big(\frac{2^n}{R}\big)^{\frac d2}\big(\frac{2^n}{R}\big)^\alpha & \text{for } 2^n \le R,\\[2pt] \big(\frac{2^n}{R}\big)^{\frac d2} & \text{for } 2^n \ge R,\end{cases}\ \cdot\ \Big(\int|\tilde h|^2\Big)^{1/2}.$$
Inserting this into (60), we obtain

$$\Big|\int\nabla\bar w\cdot\tilde h\Big| \lesssim \frac{\ln R}{R^{\alpha+\frac d2}}\Big(\int|\tilde h|^2\Big)^{1/2}.$$
Since the only constraint on $\tilde h$ was that it is supported in $\{|x|\ge R\}$, we obtain (36).

Argument for Step 6. In view of (36) in Step 5, it is enough to show

(62) $\Big(\int_{B_1(x_0)}|\nabla\bar w|^2\Big)^{1/2} \lesssim \frac{\ln R}{R^{d+\alpha}} + \frac{1}{R^{\frac d2}}\Big(\int_{B_R(x_0)}|\nabla\bar w|^2\Big)^{1/2},$
where w. l. o. g. we may assume that $R = 2^N$ is dyadic. To this purpose, we decompose the r. h. s. $\bar h$ of (30) into $(\bar h_n)_{n=0,\ldots,N+1}$ such that $\bar h_0$ is supported in $B_1(x_0)$, $\bar h_n$ is supported in the annulus $\{2^{n-1}\le|x-x_0|\le 2^n\}$ for $n = 1,\ldots,N$, and $\bar h_{N+1}$ is supported in the exterior domain $\{|x-x_0|\ge R\}$. For $n = 0,\ldots,N$ let $\bar w_n$ denote the finite energy solution of

(63) $-\nabla\cdot a\nabla\bar w_n = \nabla\cdot\bar h_n;$

for $n = N+1$, $\bar w_{N+1}$ denotes the finite energy solution of
$$-\nabla\cdot a\nabla\bar w_{N+1} = \nabla\cdot\bar h_{N+1} + \bar f.$$
By uniqueness of finite energy solutions of (30), $\sum_{n=0}^{N+1}\nabla\bar w_n = \nabla\bar w$, so that by the triangle inequality
(64) $\Big(\int_{B_1(x_0)}|\nabla\bar w|^2\Big)^{1/2} \le \sum_{n=0}^{N+1}\Big(\int_{B_1(x_0)}|\nabla\bar w_n|^2\Big)^{1/2}.$
In the sequel, we will estimate the contributions individually. We start with the intermediate $n = 1,\ldots,N$: Since by construction of $\bar h_n$ and by (63), $\bar w_n$ is $a$-harmonic in $B_{2^{n-1}}(x_0)$, we have by Step 1 (the second inequality in (18), with the origin replaced by $x_0$) and the energy estimate for (63):
$$\Big(\int_{B_1(x_0)}|\nabla\bar w_n|^2\Big)^{1/2} \lesssim \Big(\frac{1}{2^n}\Big)^{\frac d2}\Big(\int_{B_{2^{n-1}}(x_0)}|\nabla\bar w_n|^2\Big)^{1/2} \lesssim \Big(\frac{1}{2^n}\Big)^{\frac d2}\Big(\int_{B_{2^n}(x_0)}|\bar h|^2\Big)^{1/2}.$$
In the case of $n = 0$, we obtain likewise, by just the energy estimate,
$$\Big(\int_{B_1(x_0)}|\nabla\bar w_0|^2\Big)^{1/2} \lesssim \Big(\int_{B_1(x_0)}|\bar h|^2\Big)^{1/2}.$$
Hence in both cases we obtain, thanks to (34), that

$$(65)\qquad \Big(\int_{B_1(x_0)}|\nabla\bar w_n|^2\Big)^{\frac12} \lesssim \frac{1}{R^{d+\alpha}}.$$
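Note that (65) will be summed over the $N+1$ values $n=0,\dots,N$; since $R=2^N$, this count is precisely what produces the logarithm in (62). A one-line bookkeeping (elementary arithmetic, added for clarity):

$$\sum_{n=0}^{N}\frac{1}{R^{d+\alpha}} = \frac{N+1}{R^{d+\alpha}} \lesssim \frac{\ln R}{R^{d+\alpha}}.$$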

We finally turn to $n=N+1$ and obtain, as for $n=1,\dots,N$, by Step 1

$$\Big(\int_{B_1(x_0)}|\nabla\bar w_{N+1}|^2\Big)^{\frac12} \lesssim \Big(\frac1R\Big)^{\frac d2}\Big(\int_{B_R(x_0)}|\nabla\bar w_{N+1}|^2\Big)^{\frac12}.$$

But now we use the triangle inequality in the form

$$\Big(\int_{B_R(x_0)}|\nabla\bar w_{N+1}|^2\Big)^{\frac12} \le \Big(\int_{B_R(x_0)}|\nabla\bar w|^2\Big)^{\frac12}+\Big(\int\big|\nabla(\bar w-\bar w_{N+1})\big|^2\Big)^{\frac12},$$

and the energy estimate for $-\nabla\cdot a\nabla(\bar w-\bar w_{N+1}) = \nabla\cdot\big(I(B_R(x_0))\bar h\big)$ to conclude

$$\Big(\int\big|\nabla(\bar w-\bar w_{N+1})\big|^2\Big)^{\frac12} \le \Big(\int_{B_R(x_0)}|\bar h|^2\Big)^{\frac12} \overset{(34)}{\lesssim} \frac{1}{R^{\frac d2+\alpha}}.$$

Collecting these estimates on the contribution of $n=N+1$, we obtain

$$(66)\qquad \Big(\int_{B_1(x_0)}|\nabla\bar w_{N+1}|^2\Big)^{\frac12} \lesssim \frac{1}{R^{d+\alpha}}+\Big(\frac1R\Big)^{\frac d2}\Big(\int_{B_R(x_0)}|\nabla\bar w|^2\Big)^{\frac12}.$$

Inserting (65) and (66) into (64), we obtain (62).

Argument for Step 7. Note that by (31) and (43) we have in $B_1(x_0)$, where the cut-off satisfies $\eta\equiv1$,

$$\nabla w = \nabla u-\Big[\partial_iv\,(e_i+\nabla\phi_i)+\Big(\phi_i-\fint_{B_1(x_0)}\phi_i\Big)\nabla\partial_iv\Big].$$

In fact, the extra term $\big(\phi_i-\fint_{B_1(x_0)}\phi_i\big)\nabla\partial_iv$ is of higher order:

$$\Big(\int_{B_1(x_0)}\Big|\Big(\phi_i-\fint_{B_1(x_0)}\phi_i\Big)\nabla\partial_iv\Big|^2\Big)^{\frac12} \lesssim \frac{1}{R^{d+1}},$$

which follows immediately from the assumption (10) and from the pointwise estimate (19) on $v$.
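To spell out the higher-order bound just claimed (a minimal sketch; the two ingredient bounds are our reading of (19) and (10)): assuming (19) yields the pointwise estimate $\sup_{B_1(x_0)}|\nabla\partial_iv|\lesssim R^{-(d+1)}$ and (10) yields $\big(\int_{B_1(x_0)}|\phi_i-\fint_{B_1(x_0)}\phi_i|^2\big)^{\frac12}\lesssim1$, one simply combines the two:

$$\Big(\int_{B_1(x_0)}\Big|\Big(\phi_i-\fint_{B_1(x_0)}\phi_i\Big)\nabla\partial_iv\Big|^2\Big)^{\frac12} \le \sup_{B_1(x_0)}|\nabla\partial_iv|\,\Big(\int_{B_1(x_0)}\Big|\phi_i-\fint_{B_1(x_0)}\phi_i\Big|^2\Big)^{\frac12} \lesssim \frac{1}{R^{d+1}}.$$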

Proof of Corollary 3. As in the proof of Theorem 2, we may assume $r_*=1$. We consider $g$, $u$, and $v$ as in the statement of Theorem 2 and note that we obtain from (11) the Green's function representation

$$u(x) = -\int\nabla_yG(x,y)\cdot g(y)\,dy,$$

whereas (12) yields

$$v(x) = \int\nabla G_h(x-y)\cdot\big(e_i+\nabla\phi_i(y)\big)\,g_i(y)\,dy = \int\partial_jG_h(x-y)\,\big(e_j+\nabla\phi_j(y)\big)\cdot g(y)\,dy.$$

By differentiation in $x$ this implies

$$\nabla u(x) = -\int_{B_1(0)}\nabla_x\nabla_yG(x,y)\,g(y)\,dy,$$

$$\partial_iv(x) = \int_{B_1(0)}\partial_i\partial_jG_h(x-y)\,\big(e_j+\nabla\phi_j(y)\big)\cdot g(y)\,dy,$$

so that (13) takes on the form

$$(67)\qquad \Big(\int_{B_1(x_0)}\Big|\int_{B_1(0)}\Big(\nabla_x\nabla_yG(x,y)+\partial_i\partial_jG_h(x-y)\,\big(e_i+\nabla\phi_i(x)\big)\otimes\big(e_j+\nabla\phi_j(y)\big)\Big)g(y)\,dy\Big|^2dx\Big)^{\frac12} \lesssim \frac{\ln|x_0|}{|x_0|^{d+\alpha}}\Big(\int_{B_1(0)}|g|^2\,dy\Big)^{\frac12}.$$

We now argue that in (67) we may replace $\partial_i\partial_jG_h(x-y)$ by $\partial_i\partial_jG_h(x)$. Indeed, because of $|x_0|\ge4$, $|x-x_0|\le1$, and $|y|\le1$, we have for the constant-coefficient

Green's function $|\partial_i\partial_jG_h(x-y)-\partial_i\partial_jG_h(x)| \lesssim \frac{1}{|x_0|^{d+1}}$. In addition, we have by the argument from Step 1 in the proof of Theorem 2 and our assumption (10) that $\int_{B_1(x_0)}|e_i+\nabla\phi_i(x)|^2\,dx \lesssim 1$ as well as $\int_{B_1(0)}|e_j+\nabla\phi_j(y)|^2\,dy \lesssim 1$. We therefore obtain by the Cauchy-Schwarz inequality in $y$

$$\Big(\int_{B_1(x_0)}\Big|\int_{B_1(0)}\big(\partial_i\partial_jG_h(x-y)-\partial_i\partial_jG_h(x)\big)\big(e_i+\nabla\phi_i(x)\big)\otimes\big(e_j+\nabla\phi_j(y)\big)\,g(y)\,dy\Big|^2dx\Big)^{\frac12} \lesssim \frac{1}{|x_0|^{d+1}}\Big(\int_{B_1(0)}|g|^2\,dy\Big)^{\frac12}.$$

Hence (67) upgrades to

$$(68)\qquad \Big(\int_{B_1(x_0)}\Big|\int_{B_1(0)}\Big(\nabla_x\nabla_yG(x,y)+\partial_i\partial_jG_h(x)\,\big(e_i+\nabla\phi_i(x)\big)\otimes\big(e_j+\nabla\phi_j(y)\big)\Big)g(y)\,dy\Big|^2dx\Big)^{\frac12} \lesssim \frac{\ln|x_0|}{|x_0|^{d+\alpha}}\Big(\int_{B_1(0)}|g|^2\,dy\Big)^{\frac12}.$$
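The replacement performed above rests on nothing more than the mean value theorem applied to the constant-coefficient Green's function; a minimal sketch, assuming the standard third-order decay $|\nabla^3G_h(z)|\lesssim|z|^{-(d+1)}$:

$$\big|\partial_i\partial_jG_h(x-y)-\partial_i\partial_jG_h(x)\big| \le |y|\sup_{0\le t\le1}\big|\nabla\partial_i\partial_jG_h(x-ty)\big| \lesssim \frac{1}{|x_0|^{d+1}},$$

where we used $|y|\le1$ and $|x-ty|\ge|x_0|-2\ge\frac12|x_0|$, which follows from $|x-x_0|\le1$ and $|x_0|\ge4$.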

We now apply Lemma 4 to the family $u(y) = \nabla_xG(x,y)-\partial_i\partial_jG_h(x)\,(e_i+\nabla\phi_i(x))\,(y_j+\phi_j(y))$ of $\mathbb R^d$-valued maps defined for $y\in B_1(0)$ and parameterized by the point $x\in B_1(x_0)$; i.e., we interchange the roles of $x$ and $y$. We note that these maps are component-wise $a$-harmonic on $\{|y|\le1\}$ because of (2) and because of the $y$-derivative of (14), in conjunction with the symmetry of the Green's function, which follows from the symmetry of $a$ (for nonsymmetric $a$ one would apply Lemma 4 to the adjoint problem). The ensemble average on this family is given by the spatial average $\langle\cdot\rangle = \fint_{B_1(x_0)}\cdot\,dx$. The role of the linear functionals in the statement of Lemma 4 is played by $Fu := \int_{B_1(0)}\nabla_yu(y)\,g(y)\,dy$, where we restrict to $g$ that are normalized, $\big(\int_{B_1(0)}|g|^2\,dy\big)^{\frac12}=1$. Hence we learn from Lemma 4 that (68) implies

$$\Big(\fint_{B_1(x_0)}\int_{B_{1/2}(0)}\big|\nabla_x\nabla_yG(x,y)+\partial_i\partial_jG_h(x)\,(e_i+\nabla\phi_i(x))\otimes(e_j+\nabla\phi_j(y))\big|^2\,dy\,dx\Big)^{\frac12} \lesssim \frac{\ln|x_0|}{|x_0|^{d+\alpha}}.$$

By the preceding argument, we may substitute $\partial_i\partial_jG_h(x)$ again by $\partial_i\partial_jG_h(x-y)$.
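This back-substitution is admissible because, by the same difference bound as above, it costs at most $|x_0|^{-(d+1)}$, which is dominated by the right-hand side; indeed, assuming as usual that the Hölder exponent satisfies $\alpha\le1$,

$$\frac{1}{|x_0|^{d+1}} \le \frac{\ln|x_0|}{|x_0|^{d+\alpha}} \quad\text{for }|x_0|\ge4.$$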

Proof of Lemma 4. Using translation and scaling invariance, an elementary covering argument shows that it is enough to establish (15) with the radius $\frac R2$ replaced by $\frac{R}{2\sqrt d}$. Therefore, it suffices to show the result with the inner ball $\{|x|<\frac{R}{2\sqrt d}\}$ replaced by the cube $(-\frac R2,\frac R2)^d$ and the outer ball $\{|x|<R\}$ replaced by the corresponding doubled cube; after rescaling, we work below with the cubes $(-\frac\pi4,\frac\pi4)^d$ and $(-\frac\pi2,\frac\pi2)^d$.

The proof essentially amounts to a generalization of Caccioppoli's estimate for an $a$-harmonic function $u$. Recall that the latter states

$$(71)\qquad \int_{(-\frac\pi4,\frac\pi4)^d}|\nabla u|^2\,dx \lesssim \inf_{c\in\mathbb R}\int_{(-\frac\pi2,\frac\pi2)^d}|u-c|^2\,dx,$$

which we may re-express in terms of the Fourier cosine series

$$(72)\qquad \mathcal Fu(k) := \sqrt{\tfrac{2^d}{\pi^d}}\int_{(-\frac\pi2,\frac\pi2)^d}u(x)\,\Pi_{i=1}^d\cos(k_ix_i)\,dx \quad\text{for }k\in\mathbb Z^d\setminus\{0\}$$

as $\int_{(-\frac\pi4,\frac\pi4)^d}|\nabla u|^2\,dx \lesssim \sum_{k\in\mathbb Z^d\setminus\{0\}}|\mathcal Fu(k)|^2$. The generalization of (71) we need is that for any even $n\in\mathbb N$ we have

$$(73)\qquad \int_{(-\frac\pi4,\frac\pi4)^d}|\nabla u|^2\,dx \lesssim \sum_{k\in\mathbb Z^d\setminus\{0\}}\frac{1}{|k|^{2n}}|\mathcal Fu(k)|^2,$$

which amounts to replacing the $L^2$-norm on the r.h.s. of (71) by the negative $\dot H^{-n}$-norm. For the remainder of the proof, $\lesssim$ means up to a constant that also depends on $n$; but this will not matter, since we will presently fix $n$ in terms of $d$.

Before we give the argument for (73), let us argue how to conclude. We first note that for $k\in\mathbb Z^d\setminus\{0\}$ the linear functional $\mathcal Fu(k)$ has the boundedness property (70): for any $c\in\mathbb R$,

$$\mathcal Fu(k) = \sqrt{\tfrac{2^d}{\pi^d}}\int_{(-\frac\pi2,\frac\pi2)^d}\big(u(x)-c\big)\,\Pi_{i=1}^d\cos(k_ix_i)\,dx,$$

so that choosing $c = \fint_{(-\frac\pi2,\frac\pi2)^d}u$, we may apply Hölder's inequality and Poincaré's inequality on $(-\frac\pi2,\frac\pi2)^d$ to infer

$$|\mathcal Fu(k)| \le \sqrt{\tfrac{2^d}{\pi^d}}\Big(\int_{(-\frac\pi2,\frac\pi2)^d}|u(x)-c|^2\,dx\Big)^{\frac12}\Big(\int_{(-\frac\pi2,\frac\pi2)^d}\Pi_{i=1}^d|\cos(k_ix_i)|^2\,dx\Big)^{\frac12} \le \sqrt{\tfrac{2^d}{\pi^d}}\Big(\frac\pi2\Big)^{\frac d2}\Big(\int_{(-\frac\pi2,\frac\pi2)^d}|\nabla u(x)|^2\,dx\Big)^{\frac12} = \Big(\int_{(-\frac\pi2,\frac\pi2)^d}|\nabla u(x)|^2\,dx\Big)^{\frac12},$$

where we used that the Poincaré constant of the $d$-dimensional cube $(-\frac\pi2,\frac\pi2)^d$ equals $1$, and that at least one $k_i\ne0$. Hence, after taking the ensemble average, we may reformulate (73) as

$$\Big\langle\int_{(-\frac\pi4,\frac\pi4)^d}|\nabla u(x)|^2\,dx\Big\rangle \lesssim \sum_{k\in\mathbb Z^d\setminus\{0\}}\frac{1}{|k|^{2n}}\big\langle|\mathcal Fu(k)|^2\big\rangle.$$

Now picking an even $n\in\mathbb N$ with $n>\frac d2$, so that $\sum_{k\in\mathbb Z^d\setminus\{0\}}\frac{1}{|k|^{2n}}\lesssim1$, we obtain (69).

We now turn to the argument for (73) and introduce the abbreviation $\|\cdot\|$ for the $L^2\big((-\frac\pi2,\frac\pi2)^d\big)$-norm. The main ingredient is the following interpolation inequality, valid for any function $v$ of zero spatial average:

$$(74)\qquad \|\eta^nv\| \lesssim \|\eta^{n+1}\nabla v\|^{\frac{n}{n+1}}\Big(\sum_{k\in\mathbb Z^d\setminus\{0\}}\frac{1}{|k|^{2n}}|\mathcal Fv(k)|^2\Big)^{\frac{1}{2(n+1)}} + \Big(\sum_{k\in\mathbb Z^d\setminus\{0\}}\frac{1}{|k|^{2n}}|\mathcal Fv(k)|^2\Big)^{\frac12},$$

where $\eta$ is a cut-off function for $(-\frac\pi4,\frac\pi4)^d$ in $(-\frac\pi2,\frac\pi2)^d$ with

$$(75)\qquad |\nabla\eta| \lesssim 1.$$
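For completeness, the summability $\sum_{k\ne0}|k|^{-2n}\lesssim1$ used in the choice of $n$ above follows from a dyadic count of lattice points (a standard fact, added here for the reader's convenience):

$$\sum_{k\in\mathbb Z^d\setminus\{0\}}\frac{1}{|k|^{2n}} \le \sum_{j=0}^\infty\#\big\{k\in\mathbb Z^d:2^j\le|k|<2^{j+1}\big\}\,2^{-2nj} \lesssim \sum_{j=0}^\infty2^{(d-2n)j} < \infty \quad\text{for }2n>d.$$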

Note that (74) couples the degree of negativity of the r.h.s. norm to the degree of degeneracy of the cut-off $\eta^n$. If we plug the standard Caccioppoli estimate, in its refined form based on (75),

$$\|\eta^{n+1}\nabla u\| \lesssim \inf_{c\in\mathbb R}\|(u-c)\nabla\eta^{n+1}\| \lesssim \inf_{c\in\mathbb R}\|\eta^n(u-c)\|,$$

into (74) for $v = u-c$ and use Young's inequality, we obtain (73). In preparation for its proof, we rewrite (74) without the Fourier transform, appealing to the representation of the Laplacian $-\Delta_N$ with Neumann boundary conditions through the Fourier cosine series, $\mathcal F(-\Delta_N)w(k) = |k|^2\mathcal Fw(k)$:

$$\Big(\sum_{k\in\mathbb Z^d\setminus\{0\}}\frac{1}{|k|^{2n}}|\mathcal Fv(k)|^2\Big)^{\frac12} = \|w\| \quad\text{where }(-\Delta_N)^{\frac n2}w = v.$$

For (74) it thus suffices to show, for an arbitrary function $w$,

$$\|\eta^n\Delta^{\frac n2}w\| \lesssim \|\eta^{n+1}\nabla\Delta^{\frac n2}w\|^{\frac{n}{n+1}}\,\|w\|^{\frac{1}{n+1}} + \|w\|.$$

By iterated application of Young's inequality, it is easily seen that this family of interpolation estimates, indexed by even $n$, follows from the following two-tier family of interpolation inequalities, indexed by $m\in\mathbb N$:

$$\|\eta^{2m}\Delta^mw\| \lesssim \|\eta^{2m+1}\nabla\Delta^mw\|^{\frac12}\,\|\eta^{2m-1}\nabla\Delta^{m-1}w\|^{\frac12} + \|\eta^{2m-1}\nabla\Delta^{m-1}w\|$$

and

$$\|\eta^{2m-1}\nabla\Delta^{m-1}w\| \lesssim \|\eta^{2m}\Delta^mw\|^{\frac12}\,\|\eta^{2m-2}\Delta^{m-1}w\|^{\frac12} + \|\eta^{2m-2}\Delta^{m-1}w\|.$$

Obviously, this two-tier family reduces to the two estimates

$$\|\eta^{2m}\Delta v\| \lesssim \|\eta^{2m+1}\nabla\Delta v\|^{\frac12}\,\|\eta^{2m-1}\nabla v\|^{\frac12} + \|\eta^{2m-1}\nabla v\|,$$
$$\|\eta^{2m-1}\nabla v\| \lesssim \|\eta^{2m}\Delta v\|^{\frac12}\,\|\eta^{2m-2}v\|^{\frac12} + \|\eta^{2m-2}v\|,$$

which by Young's inequality follow from

$$\|\eta^{2m}\Delta v\| \lesssim \big(\|\eta^{2m+1}\nabla\Delta v\|+\|\eta^{2m}\Delta v\|\big)^{\frac12}\,\|\eta^{2m-1}\nabla v\|^{\frac12},$$
$$\|\eta^{2m-1}\nabla v\| \lesssim \big(\|\eta^{2m}\Delta v\|+\|\eta^{2m-1}\nabla v\|\big)^{\frac12}\,\|\eta^{2m-2}v\|^{\frac12}.$$

Thanks to (75), these last two estimates follow immediately from integration by parts (the cut-off $\eta$ suppresses boundary terms), the Cauchy-Schwarz inequality, and the triangle inequality.
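To illustrate the final remark, here is the integration by parts behind the second of these two estimates (a minimal sketch; the first is analogous):

$$\|\eta^{2m-1}\nabla v\|^2 = \int\eta^{4m-2}|\nabla v|^2 = -\int\eta^{4m-2}v\,\Delta v-(4m-2)\int\eta^{4m-3}v\,\nabla\eta\cdot\nabla v$$
$$\le \|\eta^{2m}\Delta v\|\,\|\eta^{2m-2}v\|+(4m-2)\,\sup|\nabla\eta|\,\|\eta^{2m-1}\nabla v\|\,\|\eta^{2m-2}v\|,$$

so that by (75), abbreviating $X=\|\eta^{2m-1}\nabla v\|$, $A=\|\eta^{2m}\Delta v\|$, and $B=\|\eta^{2m-2}v\|$, we arrive at $X^2\lesssim(A+X)B$, which is the claimed estimate $X\lesssim(A+X)^{\frac12}B^{\frac12}$.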


PUBLISHED TITLES IN THIS SERIES

23 Mark J. Bowick, David Kinderlehrer, Govind Menon, and Charles Radin, Editors, Mathematics and Materials, 2017
22 Hubert L. Bray, Greg Galloway, Rafe Mazzeo, and Natasa Sesum, Editors, Geometric Analysis, 2016
21 Mladen Bestvina, Michah Sageev, and Karen Vogtmann, Editors, Geometric Group Theory, 2014
20 Benson Farb, Richard Hain, and Eduard Looijenga, Editors, Moduli Spaces of Riemann Surfaces, 2013
19 Hongkai Zhao, Editor, Mathematics in Image Processing, 2013
18 Cristian Popescu, Karl Rubin, and Alice Silverberg, Editors, Arithmetic of L-functions, 2011
17 Jeffery McNeal and Mircea Mustaţă, Editors, Analytic and Algebraic Geometry, 2010
16 Scott Sheffield and Thomas Spencer, Editors, Statistical Mechanics, 2009
15 Tomasz S. Mrowka and Peter S. Ozsváth, Editors, Low Dimensional Topology, 2009
14 Mark A. Lewis, Mark A. J. Chaplain, James P. Keener, and Philip K. Maini, Editors, Mathematical Biology, 2009
13 Ezra Miller, Victor Reiner, and Bernd Sturmfels, Editors, Geometric Combinatorics, 2007
12 Peter Sarnak and Freydoon Shahidi, Editors, Automorphic Forms and Applications, 2007
11 Daniel S. Freed, David R. Morrison, and Isadore Singer, Editors, Quantum Field Theory, Supersymmetry, and Enumerative Geometry, 2006
10 Steven Rudich and Avi Wigderson, Editors, Computational Complexity Theory, 2004
9 Brian Conrad and Karl Rubin, Editors, Arithmetic Algebraic Geometry, 2001
8 Jeffrey Adams and David Vogan, Editors, Representation Theory of Lie Groups, 2000
7 Yakov Eliashberg and Lisa Traynor, Editors, Symplectic Geometry and Topology, 1999
6 Elton P. Hsu and S. R. S. Varadhan, Editors, Probability Theory and Applications, 1999
5 Luis Caffarelli and Weinan E, Editors, Hyperbolic Equations and Frequency Interactions, 1999
4 Robert Friedman and John W. Morgan, Editors, Gauge Theory and the Topology of Four-Manifolds, 1998
3 János Kollár, Editor, Complex Algebraic Geometry, 1997
2 Robert Hardt and Michael Wolf, Editors, Nonlinear partial differential equations in differential geometry, 1996
1 Daniel S. Freed and Karen K. Uhlenbeck, Editors, Geometry and Quantum Field Theory, 1995

Articles in this volume are based on lectures presented at the Park City summer school on "Mathematics and Materials" in July 2014. The central theme is a description of material behavior that is rooted in statistical mechanics. While many presentations of mathematical problems in materials science begin with continuum mechanics, this volume takes an alternate approach. All the lectures present unique pedagogical introductions to the rich variety of material behavior that emerges from the interplay of geometry and statistical mechanics. The topics include the order-disorder transition in many geometric models of materials, including nonlinear elasticity, sphere packings, granular materials, liquid crystals, and the emerging field of synthetic self-assembly. Several lectures touch on discrete geometry (especially packing) and statistical mechanics. The problems discussed in this book have an immediate mathematical appeal and are of increasing importance in applications, but they are not as widely known as they should be to mathematicians interested in materials science. The volume will be of interest to graduate students and researchers in analysis and partial differential equations, continuum mechanics, , discrete geometry, and mathematical physics.
