DEGREE PROJECT IN ENGINEERING PHYSICS, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2020

Building Neural Network Potentials for Lennard-Jones and Aluminium systems

ANTHONY SALIOU

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ENGINEERING SCIENCES
TRITA-SCI-GRU 2020:312

www.kth.se
Kungliga Tekniska Högskolan

Building neural network potentials for Lennard-Jones and Aluminium systems

Anthony Saliou

Master Thesis for the MSc in Engineering Physics

KTH, Royal Institute of Technology Stockholm, Sweden

March 2020 – July 2020

Abstract

This thesis gathers different works that fall within the framework of the machine learning approach in materials science, which is a very active field of research. More particularly, the objective is to build a Neural Network Potential from ab initio Molecular Dynamics simulation data.

We first introduce the basics of Molecular Dynamics (MD) and the role it plays in current materials science, condensed matter physics and chemistry. Some tools of MD have been used to compare different situations and to confront the simulation results with the laws of physics. Furthermore, we compare two simulation methods: classical MD using LAMMPS and the ab initio approach using VASP.

Then, we dig into the science of Neural Networks (NNs): What are they? How do they work? Through a few examples we build consistent datasets and train NNs with appropriate and optimized structures to predict quantities. The obtained results are compared with those from the literature. In particular, we conduct a machine learning approach to study the relationship between diffusion and entropy within a Lennard-Jones (LJ) system.

Finally, thanks to the knowledge acquired in both MD and NNs, we are able to implement a set of Python modules to predict configurational energies from atomic positions within a material using the Behler-Parrinello approach. We assess the implemented method for the Lennard-Jones system from a dataset produced by LAMMPS and apply it to pure aluminium from VASP trajectories.

Acknowledgements

This year 2020 was not the easiest in terms of working conditions. However, I am very grateful to have done this internship at the SIMAP Laboratory in Grenoble during this difficult period. Even though most of the work was done at home, the time I spent there was awesome and the welcome was very warm. During these five months, we kept very good and close contact with the team, with regular video conferences and chat rooms.

The first thanks should go to the SIMAP Laboratory, which has hosted me during this period and has been of very good support throughout those months. Moreover, grateful thanks are addressed to Constellium, without whom this work would not have been possible. Finally, I am thankful to the Grenoble Institute on Artificial Intelligence MIAI.

I would like to directly thank Noel, my tutor at the laboratory, for all the support and help he has provided me during those five months. It was a very precious time. From a professional point of view, Noel has supported me all along, sharing his experience and bringing a lot of knowledge to me. I was quite new to the research area and working with him was a very good way to discover how to find my role and place in it. On a human level, he is a colleague I will, for sure, keep in close contact with in the future, for all the perspective and positivity he brings to me.

I am also thankful to Cecilia and Sebastien, who were part, with Noel and me, of the simulation team. Together we have built a good working environment. It was a real motivation to carry on the work with people working alongside, with similar issues, and to share our own experiences.

Moreover, a big thanks must be addressed to my friend Paul, who has endured a lot of my questions concerning Neural Networks. He was a huge help in understanding some concepts and technical issues.

A huge thanks also needs to be addressed to my mom, my dad and my sister, who have hosted and supported me during most of these 5 months. It was not always easy to have someone back at home after a few years away and I am grateful for it.

The final thanks is addressed to my girlfriend, Lisa, with whom I exchanged mutual support in these difficult months. Thank you for all that you gave, give and will give to me.

Contents

1 Introduction 6

2 Theoretical overviews of Molecular Dynamics and Neural Networks 8
2.1 Molecular Dynamics simulations 8
2.1.1 Classical Molecular Dynamics with LAMMPS 8
2.1.2 ab initio with VASP 20
2.1.3 Simulation results and comparisons 23
2.2 Constructing a neural network potential (NNP) 28
2.2.1 A neural network (NN) 28
2.2.2 Multi-layer perceptrons (MLP) 29
2.2.3 Constructing a NN 35
2.2.4 The Behler-Parrinello method 42

3 Study of diffusion-entropy relation within Lennard-Jones systems 51
3.1 Building consistent LAMMPS simulations 51
3.2 First approach of the relation diffusion vs entropy 52
3.3 Behaviour along characteristic curves 55
3.4 Conclusion on the entropy-diffusion relation 58

4 Construction of a high-dimensional neural network in the Behler-Parrinello approach 59
4.1 Lennard-Jones NNP 59
4.1.1 Constructing a dataset 59
4.1.2 Building the High-Dimensional NNP (HDNNP) 64
4.1.3 Optimizing the HDNNP 69
4.1.4 Structure comparisons 69
4.1.5 Comparison results and conclusion 70
4.1.6 Predicting the configurational energy 71
4.2 Aluminium NNP 72
4.2.1 Training of the NN 72
4.2.2 Energy trajectories 74

5 Conclusion 79

A LAMMPS and VASP files 80
A.1 Examples of LAMMPS output files 80
A.2 LAMMPS simulation code for enthalpy dynamics 81
A.3 VASP input files 84

List of Figures

2.1 Lennard-Jones pair potential 9
2.2 Evolution of total energy in the production phase for liquid Al at T = 1200K and ρ = 0.052 Å⁻³ 15
2.3 Distribution of total energy values for liquid Al at T = 1200K and ρ = 0.052 Å⁻³ 15
2.4 Radial correlation function in liquid Al at T = 1200K and ρ = 0.0154 Å⁻³ 16
2.5 Radial distribution function in liquid Al at T = 1200K and ρ = 0.0154 Å⁻³ 16
2.6 Mean squared displacement in liquid Al at T = 1200K and ρ = 0.0154 Å⁻³ 17
2.7 Energy distribution with 864 atoms 18
2.8 Energy distribution with 6912 atoms 19
2.9 Enthalpy evolutions for Aluminium 19
2.10 MD simulation led to determine the melting temperature Tmelt of a material 20
2.11 g(r) for liquid Al at 1300K 23
2.12 (a) g(r) for liquid Al at 650K; (b) Zoom on the second g(r) peak for liquid Al at 650K 24
2.13 R²(t) for liquid Al at (a) 650K; (b) 1300K 24
2.14 Comparison of pair correlation functions (a) gAlAl(r); (b) gAlNi(r); (c) gNiNi(r) 26
2.15 Comparison of diffusion coefficient 27
2.16 Coordination numbers from (a) classical MD; (b) ab initio 27
2.17 General approach of building NNP with Machine Learning. Taken from Artrith's presentation at Aalto University [17] 28
2.18 (a) Scheme of a neuron; (b) Mathematical description of a neuron in a NN 29
2.19 30
2.20 (a) Example of a MLP; (b) Mathematical description of a MLP 30
2.21 (a) Hyperbolic tangent; (b) Sigmoid function; (c) Both functions 32
2.22 (a) ReLU function; (b) Softplus function; (c) Both functions 33
2.23 Example of an overfitted NN 34
2.24 A_ex^ML prediction using (a) MLPRegressor; (b) keras 40
2.25 Reduced excess Helmholtz free energy per temperature against density in a LJ system 41
2.26 Lennard-Jones system phase diagram taken from [22] 41
2.27 Atomic environment. Fig 2 in Behler 2015 [19] 43
2.28 High-dimensional NNP structure: from atomic positions inputs to structure energy output 44
2.29 Plot of the cut-off function for different cut-off radii rc 45
2.30 Gaussian symmetry functions G_i^2 with several (a) widths; (b) shifts 46
2.31 Periodic radial symmetry function G_i^3 for different periods 47
2.32 Atomic neighbouring within a structure 47
2.33 Angular part G_i^θ with orientation (a) λ = −1; (b) λ = +1 48
2.34 Angular symmetry function G_i^5 with orientation (a) λ = −1; (b) λ = +1 49
3.1 Gaussian fit of the pressure 52
3.2 Diffusion-entropy relation on isobar P* = 0 54
3.3 Diffusion vs entropy (a) ρ* = 0.6; (b) ρ* = 0.8; (c) ρ* = 1.0; (d) ρ* = 1.2 55
3.4 (a) D* = 0.05 exp(S); (b) D* = 0.6 exp(0.8S) approximation for different ρ* 56
3.5 Diffusion vs entropy (a) T* = 0.6; (b) T* = 2; (c) T* = 4; (d) T* = 6 56
3.6 (a) D* = 0.05 exp(S); (b) D* = 0.6 exp(0.8S) approximation for different T* 57
3.7 Diffusion vs entropy (a) P* = 0; (b) P* = 0.5; (c) P* = 2; (d) P* = 5 57
3.8 (a) D* = 0.05 exp(S); (b) D* = 0.6 exp(0.8S) approximation for different P* 58
4.1 Choice of G_i^2 functions for LJ NNP on (a) η; (b) Rs 61
4.2 Choices of radial symmetry functions for LJ NNP 62
4.3 Choice of G_i^5 functions for LJ NNP for (a) λ = −1; (b) λ = +1 63
4.4 Choices of angular symmetry functions for LJ NNP 64
4.5 Example of the training structure of an atomic NN for N = 4 66
4.6 Energy predictions using NN structure from Behler-Parrinello approach 69
4.7 Prediction of energy trajectory of the LJ state (0.8, 2.0) using a 32-20-20-1 NNP 72
4.8 HDNNP energy trajectory (a) (0.8, 0.2); (b) (0.8, 0.8); (c) (0.8, 2.0) 75
4.9 Choices of (a) radial; (b) angular symmetry functions for Al HDNNP 76
4.10 Energy trajectory for Al at (a) 300K; (b) 800K; (c) 1700K; (d) 3100K 78

Chapter 1

Introduction

Materials science is a very dynamic research field that has known many revolutions over the last decades. The development of new materials, composites for example, is of very great interest from an economic point of view and for industry. Moreover, from a fundamental point of view, many open questions remain and one of them, the so-called "inverse problem", is still far from being solved. As a consequence, the field tends to develop very quickly.

First studied from a macroscopic point of view thanks to experiments and classical mechanics, materials can now be understood at very different scales. Theories and computers joined forces after the Second World War at Los Alamos, allowing scientists to operate at the microscopic and atomistic scales through simulations. Theories and experiments are not opposed anymore and count as a whole alongside simulations to validate accurate models (Allen and Tildesley [1]). By the end of the twentieth century, LAMMPS was developed as a simulation code to carry out, among many other software packages, simulations at the atomic scale based on Molecular Dynamics (MD). Most of the thermodynamic properties of a material can be computed with LAMMPS and from there, a whole new dimension was given to materials science. In MD, the interactions between atoms are described by external potentials based either empirically on physical and chemical principles, or semi-empirically on quantum principles of the electronic structure bonding.

A bit later, ab initio simulations arrived on the market in 1985 [2]. This method uses the principles of quantum mechanics, especially the Schrödinger equation, i.e. the formulation of the Hamiltonian H, to describe the interactions between atoms. It allows one to compute the energy of the atoms in a material and therefore its thermodynamic properties. By taking into account the electronic effects between atoms, this ab initio method is expected to give a better description of material structures at the atomic scale. However, it still remains very expensive, computationally speaking, with a limitation to roughly 500 simulated atoms as far as MD is concerned.

A more recent revolution took place a few decades ago, known as the fourth paradigm of science [3], with the renewed interest in tools borrowed from Artificial Intelligence such as Artificial Neural Networks (ANNs). Molecular dynamics simulations can provide a lot of data, given enough time. Since a large amount of data has been obtained from previous simulations, scientists came up with the idea that the time spent to get those data can now be recycled to perform brand new simulations, but faster. The energy results from various simulations based on different potentials could allow the definition of general potentials that can describe every local atomic environment. ANNs are very promising for the future and are already taking a major place in the recent progress of materials science.

Simulations, either from classical MD or from ab initio calculations, can quickly become expensive in terms of CPU time. There lies the strength of Machine Learning techniques, especially for inverse problems. The key purpose of this thesis is to use trained ANNs to predict the energy from atomic configurations, so we do not have to run whole simulations to obtain energies, for both classical MD and ab initio simulations.

To do so, we will use the Behler-Parrinello (BP) method [16], involving the creation of atomic neural networks (one NN for each element in the material). From these atomic NNs, a high-dimensional neural network potential (HDNNP) can be built in order to predict the system energy from all the atomic positions within the material. To test this method, we build two datasets, one for a Lennard-Jones system built using classical MD and a second for liquid Aluminium built using ab initio calculations. All the obtained results will be checked against the simulated energy trajectories, allowing us to conclude on the validity of the NNPs built in this way.

This work will begin with a few words on Molecular Dynamics simulations, how they work and the basics of the LAMMPS and VASP modules. To do so, both methods are used and compared in order to properly understand the way they work. Then, the concept of neural networks and the BP method are depicted. At this step, we will construct and test a NN in Python, before applying it to a Lennard-Jones liquid in order to study the diffusion-entropy relation.

The last and main step will then be discussed: the construction of the NNP for LJ and aluminium systems in the liquid and solid states. Using the LAMMPS and VASP modules (for the LJ and aluminium systems, respectively), atomic positions and configurational energies can be obtained for several thermodynamical states. Then, according to the BP method, symmetry functions are computed to describe the local atomic structure and used to train the atomic NNs conveniently. Finally, HDNNPs are built for both systems and the predicted energy trajectories can be compared to the actual ones to validate the modelled HDNNPs.

Chapter 2

Theoretical overviews of Molecular Dynamics and Neural Networks

2.1 Molecular Dynamics simulations

In order to get familiar with the basics of MD, we began with a few simulation works to manipulate the quantities of interest and learn how to extract the thermodynamic and structural properties from them. The limitations of the technique will also be briefly discussed.

2.1.1 Classical Molecular Dynamics with LAMMPS

The Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) was released in 1995 [5] by Sandia National Laboratories. Written in C++, LAMMPS has many features, like potentials, atom types, ensembles and many others, that allow one to build very diversified and complex simulations with atoms, molecules, charges and granular objects. It has its own syntax and logic that we will quickly explore in the following subsection, before moving on to an example of a simulation we performed to get to grips with this powerful tool.

Basics of classical Molecular Dynamics

Molecular dynamics is a computational technique in which we numerically follow the time evolution of a set of atoms according to the laws of classical mechanics. In particular, the dynamics of each atom i is given by Newton's second law:

F_i = m_i a_i ≡ m_i \frac{d^2 r_i}{dt^2},    (2.1.1)

where m_i is the mass of atom i, F_i the force acting on atom i and a_i its acceleration.

As molecular dynamics deals with the collective behaviour of an assembly of N atoms, its properties are mainly understood in the framework of statistical mechanics. Still, one may wonder how relevant it is to use the equations of classical mechanics when one knows that, at the atomic scale, the Schrödinger equation is expected to describe particle behaviour. Indeed, the classical approximation can be justified when Λ ≪ a [4], where Λ is the de Broglie wavelength defined as

Λ = \sqrt{\frac{2π\hbar^2}{m k_B T}},    (2.1.2)

m and T being respectively the atomic mass and the temperature. It follows that the classical approximation is well justified for heavy atoms and high temperatures, while the method can be quite limited for light systems and low temperatures.
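As a quick sanity check of this criterion, the short Python snippet below (a minimal sketch; the numerical values for aluminium are illustrative and not taken from the simulations of this work) evaluates Λ from Eq. 2.1.2 for Al at 1200 K and compares it to a typical interatomic distance a ≈ 2.7 Å.

import numpy as np

hbar = 1.054571817e-34              # J.s
kB = 1.380649e-23                   # J/K
m_Al = 26.98 * 1.66053906660e-27    # kg

T = 1200.0                          # K
Lambda = np.sqrt(2.0 * np.pi * hbar**2 / (m_Al * kB * T))   # Eq. 2.1.2
a = 2.7e-10                         # typical nearest-neighbour distance in liquid Al (m)

print(f"Lambda = {Lambda * 1e10:.3f} A, a = {a * 1e10:.1f} A, Lambda/a = {Lambda / a:.3f}")
# Lambda is much smaller than a, so the classical approximation is well justified here.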

Forces calculations

On the left hand side (LHS) of Eq. 2.1.1 we have the force F_i. Forces are known to be the negative gradient of a potential energy surface (PES) V(r) ≡ V(r_1, r_2, ..., r_N), where N is the total number of atoms in our system and r_i corresponds to the coordinates of atom i. Thus, the force is given as below:

F_i = −∇_{r_i} V(r_1, r_2, ..., r_N).    (2.1.3)

However, to obtain the force, one needs to know this PES. An efficient and common way to choose the potential V(r) is to define it with pairwise interactions that we sum over all atoms:

V(r) = \sum_i \sum_{j>i} φ(|r_i − r_j|) ≡ \sum_i \sum_{j>i} φ(r_{ij}).    (2.1.4)

Defining the potential requires a description of the pair interaction φ(r_{ij}). The Lennard-Jones (LJ) potential is the most commonly used pair potential to describe pair interactions within a system. It is defined according to two parameters: ε, the potential depth, and σ, the potential zero-point:

φ_{LJ}(r) = 4ε\left[\left(\frac{σ}{r}\right)^{12} − \left(\frac{σ}{r}\right)^{6}\right].    (2.1.5)

This potential is plotted for different values of the parameters ε and σ in Fig. 2.1.

Figure 2.1: Lennard-Jones pair potential
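As an illustration of Eq. 2.1.5, the short function below (a minimal sketch in reduced LJ units; the parameter values are arbitrary examples) evaluates φ_LJ(r) and can be used to reproduce a curve such as the one in Fig. 2.1.

import numpy as np

def lj_potential(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair potential, Eq. 2.1.5."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)

r = np.linspace(0.9, 3.0, 200)          # distances in units of sigma
phi = lj_potential(r)
print(f"minimum {phi.min():.3f} at r = {r[phi.argmin()]:.3f}")
# the minimum is -epsilon, located at r = 2^(1/6) sigma ~ 1.12 sigma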

Equation of motion: numerical integration

Recalling Eq. 2.1.1, now that we have the force at a time t, we have the acceleration a_i(t) of atom i. The last step is to integrate it with respect to time in order to get the atomic position r_i(t). Then, we are able to move forward to the time t + Δt.

A way to do so with good precision is to expand the position to fourth order with respect to the timestep Δt, forward and backward:

r(t + Δt) = r(t) + v(t)Δt + \frac{1}{2} a(t)Δt^2 + \frac{1}{6} b(t)Δt^3 + O(Δt^4),
r(t − Δt) = r(t) − v(t)Δt + \frac{1}{2} a(t)Δt^2 − \frac{1}{6} b(t)Δt^3 + O(Δt^4),    (2.1.6)

where v, a and b denote respectively the first, second and third time derivatives of r.

Summing these two equations, we get:

r(t + Δt) = 2r(t) − r(t − Δt) + a(t)Δt^2 + O(Δt^4),    (2.1.7)

which is completely independent of the velocity. Thus, we have a good way to obtain the positions from the PES, recalling Eq. 2.1.1. Moreover, the velocity can be recovered from:

v(t) = \frac{r(t + Δt) − r(t − Δt)}{2Δt}.    (2.1.8)

The main drawback of Eq. 2.1.7 is that it requires two sets of consecutive configurations to start. A better way to obtain positions and velocities at timestep t + Δt is the Velocity Verlet algorithm, which requires only one initial configuration. This algorithm couples Taylor expansions and Eq. 2.1.1, allowing us to compute the desired quantities successively. Compared to the first method, it has an accuracy in O(Δt^2), which is worse than O(Δt^4) but remains sufficient since we are working with statistical quantities. In brief, it can be summed up as:

r(t + Δt) = r(t) + v(t)Δt + \frac{1}{2} a(t)Δt^2,
v(t + Δt/2) = v(t) + \frac{1}{2} a(t)Δt,
a(t + Δt) = −\frac{1}{m} ∇V(r(t + Δt)),
v(t + Δt) = v(t + Δt/2) + \frac{1}{2} a(t + Δt)Δt.    (2.1.9)

This algorithm is the one used in LAMMPS and has many strengths. First, it is symplectic: it conserves the internal energy. Second, it is time-reversible, as expected. Indeed, solving the Velocity Verlet algorithm is nothing more than solving the primitive Euler system given below, which is time-reversible (plus computing the velocity at time t + Δt/2):

v(t) = v(0) + \int_0^t a(t') dt',
r(t) = r(0) + \int_0^t v(t') dt'.    (2.1.10)

This makes another point in favor of this method: it is not computationally expensive. A last remark is that it may appear that we need 9N memory slots to store the position, velocity and acceleration values for each direction x, y and z and each atom. However, these quantities are never needed simultaneously.
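To make the algorithm of Eq. 2.1.9 more concrete, here is a minimal Python sketch of one Velocity Verlet step; the force(r) routine returning the forces (for instance minus the gradient of the LJ potential) is an assumption of the example, not LAMMPS code, and the harmonic force used in the usage example is purely illustrative.

import numpy as np

def velocity_verlet_step(r, v, f, m, dt, force):
    """One Velocity Verlet step (Eq. 2.1.9). r, v, f are (N, 3) arrays, m is (N, 1)."""
    v_half = v + 0.5 * (f / m) * dt           # v(t + dt/2)
    r_new = r + v_half * dt                   # r(t + dt) = r + v dt + a dt^2 / 2
    f_new = force(r_new)                      # forces from the PES at the new positions
    v_new = v_half + 0.5 * (f_new / m) * dt   # v(t + dt)
    return r_new, v_new, f_new

# illustrative usage with a simple harmonic force F = -k r
force = lambda r: -1.0 * r
m = np.ones((2, 1))
r = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
v = np.zeros((2, 3))
f = force(r)
for _ in range(100):
    r, v, f = velocity_verlet_step(r, v, f, m, 0.01, force)
print(r[0])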

A LAMMPS script

In order to illustrate the way LAMMPS works and can be used in materials science, there is nothing better than taking a look at a simple script, step by step.

# Initialisation
echo            both
units           metal
atom_style      atomic

The echo command has 4 possible values: none, screen, log or both. It indicates how the execution should be shown to the user, either on the screen, in a log file or on both.

Then, we define the units the code should deal with and the objects we are manipulating. Here we have Aluminium atoms, but one can also choose molecules in cgs units, for example.

variable        x equal 6
variable        y equal $x
variable        z equal $x
variable        Tbegin equal 1
variable        Tmax equal 3500
variable        Tliquid equal 1200
variable        Pamb equal 1.013
variable        N_production_step equal 50000
variable        distri_root equal 87287
variable        lattice_size equal 4.05
variable        scale equal 1.04
variable        box_size equal ${lattice_size}*${scale}

We define here the variables we need. Every variable can be called using ${variable}. The lattice size depends on the type of atom and the scale is chosen to fix the atomic density within the box.

lattice         fcc ${box_size}
region          box block 0 $x 0 $y 0 $z

create_box      1 box
create_atoms    1 box

mass            1 26.981539

The lattice is chosen in the face-centered cubic configuration. Then, the box of size (x, y, z) is created, with atoms of type 1 (Al) in it.

timestep        0.001

At every timestep a new configuration is computed from the last configuration and the potential. Forces on atoms and new positions are computed, and so are the temperature, the pressure, the energy, etc.

pair_style      eam/alloy
pair_coeff      * * AlO.eam.alloy Al O

For a LJ system we would have:

pair_style      lj/cut 4            # cut-off at 4*sigma
pair_coeff      1 1 1.0 1.0 4       # epsilon, sigma = 1

With these commands we define the potential that describes the interaction between atoms.

neighbor        1.5 bin
neigh_modify    delay 10

These two parameters rule the way neighbor lists are built for each atom. Here, an atom is counted as a neighbor if it lies within a radius of rc + 1.5, with rc the cut-off radius. The list of neighbors is updated every 10 timesteps.

velocity        all create ${Tbegin} ${distri_root}

Velocities are initialized at the temperature Tbegin according to the random seed distri_root.

thermo_style    custom step temp press vol pe ke etotal enthalpy
thermo          10


Here, we set the quantities we want to output every 10 timesteps during the simulation.

# Thermalisation

reset_timestep  0

fix             1 all nvt temp ${Tliquid} ${Tliquid} ${Tdamp}
run             10000
unfix           1

fix             1 all nvt temp ${Tmax} ${Tmax} ${Tdamp}
run             10000
unfix           1

fix             1 all nvt temp ${Tmax} ${Tliquid} ${Tdamp}
run             50000
unfix           1

fix             1 all npt temp ${Tliquid} ${Tliquid} ${Tdamp}
run             10000

To identify the timesteps correctly we reset the counter to zero. Then, we make our system evolve in temperature in the ensemble we want.

# Production

dump            vitesse all custom 20 dump.vit id type vx vy vz

dump            position all custom 20 dump.pos id type x y z

The dump command generates a dump file, here with the velocities of the atoms for the first command and their positions for the second. Other quantities like forces, stresses or orientations of the dipole moment of an atom can also be dumped. The value 20 is the number of timesteps between two successive outputs.

reset_timestep  0

compute         RDF all rdf 100
fix             RDF2 all ave/time 10 5000 50000 c_RDF[*] file RDF.dat mode vector

compute         MSD all msd
fix             MSD2 all ave/time 50 1 50 c_MSD[4] file MSD.dat

run             ${N_production_step}

Finally, we compute new quantities here: the Radial Distribution Function (RDF), sampled every 10 timesteps and averaged over 5000 samples before being written to the RDF.dat file, and the Mean Squared Displacement (MSD), written to the MSD.dat file.

Running a LAMMPS simulation

The first simulations were made with pure Aluminium (Al) using LAMMPS. The atoms are placed in a cubic box of volume V = x × y × z × scale³ chosen by the user, in a face-centered cubic (fcc) configuration. Interactions between atoms are described by an embedded atom model (EAM) potential, which is quite appropriate for metallic systems.


Simulations are split into three phases; as an example, we look at supercooled Al:

1. Initialisation: the system is prepared, positions and velocities are fixed at t = 0; then the temperature increases from a very low value (Tbegin = 1K) to Tmax = 3500K so that the Al has completely melted. Finally, the material is cooled down to a final temperature T ≡ Tliquid;

2. Thermalisation: the liquid equilibrates in the canonical ensemble (NVT, with N the number of particles, V the volume and T the temperature);

3. Production: output parameters (pressure, volume, energies, enthalpy, etc.) are computed in the NPT ensemble.

Running the previous file allowed us to study supercooled aluminium.

LAMMPS output files

From the previous LAMMPS simulation, we have been able to obtain a set of data files, each containing information about the evolution of our system. Each of them has a different format that needs to be understood in order to use these data in a convenient way.

The log.lammps file

The whole script and its execution are displayed in this file. Every calculation, value and execution step is printed, so every step of the simulation can be read and studied.

Moreover, it contains every output of every function called during the simulation, especially the thermo_style details.

These are printed as follows:

Step Temp Press      Volume    PotEng     KinEng     TotEng     Enthalpy
0    1    -91000.485 16610.653 -2878.1794 0.11155151 -2878.0679 -3821.5204
[...]

A longer example is given in Appendix A.

At every timestep (chosen by the user), the requested quantities are computed and gathered in the form of a table. The values are given for the entire system. Note that since we have chosen an NVT ensemble, the volume is constant.
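As an illustration of how such a thermodynamic table can be post-processed, the sketch below extracts the columns of the first thermo block of a log.lammps file into a NumPy array; it assumes the column layout requested by the thermo_style command above and is deliberately simplistic (a single, uninterrupted thermo block).

import numpy as np

def read_thermo(logfile="log.lammps"):
    """Return (header, data) from the first thermo block of a LAMMPS log file."""
    header, rows = None, []
    with open(logfile) as f:
        for line in f:
            cols = line.split()
            if cols[:2] == ["Step", "Temp"]:     # start of the thermo table
                header = cols
                continue
            if header is not None:
                if not cols:
                    break
                try:
                    rows.append([float(c) for c in cols])
                except ValueError:               # first non-numeric line ends the block
                    break
    return header, np.array(rows)

header, data = read_thermo()
print("mean total energy:", data[:, header.index("TotEng")].mean())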

The .dump files

Dump files can be filled with positions, velocities, forces... These quantities are computed per atom (and not for the whole system as was the case in the log file). As a consequence, quantities are displayed per atom, for every timestep:

ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
864
ITEM: BOX BOUNDS pp pp pp
0.0000000000000000e+00 2.5271999999999998e+01
0.0000000000000000e+00 2.5271999999999998e+01
0.0000000000000000e+00 2.5271999999999998e+01
ITEM: ATOMS id type x y z
469 1 4.04342 3.41491 2.46877
728 1 1.49523 3.92532 2.74118
[...]

More data are given as an example in Appendix A.

Here, we have 864 atoms of type 1. Thus, for each timestep, we have 864 rows, with first the identification number of the atom, then its type and finally the requested quantities, here the spatial coordinates x, y and z of the atom.

Within a simulation, atoms move in the box and so the calculations for a given atom are processed by different processors. Since the different processors write in the dump file independently, the atom order changes from one timestep to another.
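Since the atom order changes from one timestep to another, a convenient post-processing trick is to sort each frame by the atom id. The sketch below reads one frame of the dump.pos file produced above; it assumes the 'id type x y z' column layout requested by the dump command.

import numpy as np

def read_dump_frame(f):
    """Read one frame from an open LAMMPS dump file and return (step, atoms sorted by id)."""
    assert f.readline().startswith("ITEM: TIMESTEP")
    step = int(f.readline())
    f.readline()                              # ITEM: NUMBER OF ATOMS
    natoms = int(f.readline())
    for _ in range(4):                        # ITEM: BOX BOUNDS + three box lines
        f.readline()
    f.readline()                              # ITEM: ATOMS id type x y z
    atoms = np.array([f.readline().split() for _ in range(natoms)], dtype=float)
    return step, atoms[atoms[:, 0].argsort()]

with open("dump.pos") as f:
    step, atoms = read_dump_frame(f)
print(step, atoms.shape)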

The RDF and MSD files

From the example script we took, two more files have been generated: RDF.dat and MSD.dat.

RDF.dat has 4 columns. The first refers to the number of the timestep (1, 2, 3, 4, ...). The time is obtained by multiplying it by the chosen timestep value. The remaining three columns respectively correspond to:

• RDF [1] : mean distances of neighboring atoms;

• RDF [2] : pair correlation function g(r);

• RDF [3] : radial distribution function f(r) = ρ(T) 4π g(r) r².

Implicitly, the number of closest neighbors, also called the coordination number, is given by these data as:

N_coord = \int_0^{r_{min}} f(r) dr = \int_0^{r_{min}} ρ(T) 4π g(r) r^2 dr,    (2.1.11)

where r_min refers to the first minimum of the pair correlation function g(r).
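Eq. 2.1.11 can be evaluated numerically from the columns of RDF.dat; the sketch below integrates f(r) up to the first minimum of g(r) with a simple trapezoidal rule. The toy g(r) is only there to make the example runnable, and the crude location of r_min (first minimum after the main peak) is an assumption of this sketch.

import numpy as np

def coordination_number(r, g, rho):
    """N_coord = integral of rho * 4*pi * g(r) * r^2 from 0 to r_min (Eq. 2.1.11)."""
    f = rho * 4.0 * np.pi * g * r ** 2
    ipeak = g.argmax()                          # main peak of g(r)
    imin = ipeak + g[ipeak:].argmin()           # crude estimate of the first minimum
    # trapezoidal integration up to r_min
    return float(np.sum(0.5 * (f[1:imin + 1] + f[:imin]) * np.diff(r[:imin + 1])))

# toy pair correlation function with a peak near 2.7 A and a dip near 3.6 A
r = np.linspace(0.1, 6.0, 300)
g = 1.0 + 2.0 * np.exp(-((r - 2.7) / 0.25) ** 2) - 0.4 * np.exp(-((r - 3.6) / 0.3) ** 2)
print(coordination_number(r, g, rho=0.052))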

The MSD.dat file is quite similar, with the same first column and a second column that contains the Mean Squared Displacement (MSD), computed as follows for a system of N_atoms atoms:

Δr^2(t) = \frac{1}{N_{atoms}} \sum_i [r_i(t) − r_i(0)]^2 ≡ R^2(t).    (2.1.12)

It allows a direct computation of the diffusion coefficient D, defined as

D = \lim_{t→∞} \frac{Δr^2(t)}{6t}.    (2.1.13)

More quantities can be obtained either directly in the dump files by LAMMPS or with a few external calculations, using the appropriate dump files.
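In practice, Eq. 2.1.13 is evaluated by fitting the slope of the MSD in its linear (long-time) regime; the sketch below does this with a least-squares fit, assuming time and MSD arrays such as those stored in MSD.dat (the toy data are only there to make the example runnable).

import numpy as np

def diffusion_coefficient(t, msd, t_min):
    """D = slope / 6 of the MSD in the linear regime t > t_min (Eq. 2.1.13)."""
    mask = t > t_min
    slope, _ = np.polyfit(t[mask], msd[mask], 1)
    return slope / 6.0

# toy MSD: short transient followed by a linear regime with D = 1.0 A^2/ps
t = np.linspace(0.0, 50.0, 500)
msd = 6.0 * 1.0 * t + 2.0 * (1.0 - np.exp(-t))
print(diffusion_coefficient(t, msd, t_min=10.0))   # ~1.0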

Quantities determination and interpretation

Now that we have computed all the data we needed, let us take a look at these as functions of the radius and the time. The validity of a simulation can be assessed by a few checks on these quantities. For this purpose, an MD simulation for Al with the AlO EAM potential available in LAMMPS was run, as given in the previous subsection. These simulations have been performed exactly according to the previously described file, with 864 atoms, a timestep of 0.001 ps and a fixed density of 0.052 Å⁻³. The temperature goes from 1K to 3500K before the liquid Al is cooled down to 1200K.


Production data are sampled every 20 timesteps for the dump files and every 10 timesteps for the thermodynamic output.

From Noether's theorem, in an isolated system the total energy, expressed as the sum of the kinetic and potential energies, must remain constant during the whole production process. In our situation, the canonical ensemble exchanges energy with the thermostat and so physical fluctuations happen. However, the values must be distributed around a mean value, according to a Gaussian law.

The simulations described above have been run and allowed us to plot Fig. 2.2 and Fig. 2.3.

Figure 2.2: Evolution of total energy in the production phase for liquid Al at T = 1200K and ρ = 0.052 Å⁻³.

Figure 2.3: Distribution of total energy values for liquid Al at T = 1200K and ρ = 0.052 Å⁻³

Here, the energy behaves completely as expected, and so do the temperature, the pressure, the enthalpy, etc.

Let us now take a look at the radial functions g(r) and f(r), shown respectively in Fig. 2.4 and 2.5. Fig. 2.4 displays both the data obtained from the LAMMPS simulation and those obtained experimentally [6], [7].

Figure 2.4: Radial correlation function in liquid Al at T = 1200K and ρ = 0.0154 Å⁻³

Figure 2.5: Radial distribution function in liquid Al at T = 1200K and ρ = 0.0154 Å⁻³

Let us now take a look at these functions. To do so, we will consider one atom a as our reference.

• The pair correlation function g(r) describes the probability of presence of an atom at a distance r from atom a. Here, the radius that maximises the function is around 2.7 Å. This means that atom a should have most of its neighbors in a sphere of radius a bit larger than r = 2.7 Å. Since it is the first peak of g(r), these neighbors correspond to the closest neighbors.


The height and width of the peak characterize the local structure around our atom. The higher and sharper the peak, the more stable the local structure.

Note that it was experimentally shown that a peak around 3 is expected when the temperature of a material is equal to the melting temperature.

• The radial distribution function f(r) is a density function. Integrating it between 0 and r_min, the radius of the first minimum, gives us a value of the coordination number N_coord = 11.5 for our atom a. It is quite close to 12, which is the value expected for a fcc structure like the one we are working with.

Figure 2.6: Mean squared displacement in liquid Al at T = 1200K and ρ = 0.0154 Å⁻³

In a liquid, the MSD is a linear function of time, as shown in Fig. 2.6: the diffusion coefficient D is almost constant. That is exactly what we recover here, and the diffusion coefficient can be computed according to Eq. 2.1.13. Here we have D_{T=1200K} = 0.9995 Å².ps⁻¹.

Manipulation of simulations

Above, we have only considered one simulation at a given temperature with a given number of atoms. However, the dynamics of a system can also be studied with respect to the number of atoms, the density, the volume or yet the temperature.

From 864 atoms to 6912 atoms

By increasing the number of atoms within our simulation, we increase the total energy of the simulation (since the total energy is an extensive quantity). However, the energy per atom must remain the same. Otherwise, it means that finite-size effects exist in the simulation. According to statistical physics, microscopic and macroscopic scales are highly related since:

\frac{ΔE}{E} ∝ \frac{1}{\sqrt{N}}.    (2.1.14)

This result states that statistical fluctuations of the energy are limited by the size of the system. The bigger the system is, the smaller the fluctuations should be.


Let us take a look at the energy distribution in two simulations with respectively 864 and 6912 atoms, still in liquid Aluminium, but at a temperature of 1250K. The energy distributions are given in Fig. 2.7 and Fig. 2.8 on the next page.

Both distributions are Gaussian, as expected, with different mean values and standard deviations. Indeed, from simple statistics on the values we get:

E_{0,N=864} = −2.98749 eV,  σ_{N=864} = 0.00651 eV  ⟹  σ/E_0 ≈ 0.22%,
E_{0,N=6912} = −2.98673 eV, σ_{N=6912} = 0.00236 eV  ⟹  σ/E_0 ≈ 0.08%.    (2.1.15)

Here, by multiplying the size of the system by 8, we have divided the fluctuations by almost √8 ≈ 2.8. Eq. 2.1.14 is (happily) verified in these simulations. The difference in energy between the two system sizes is smaller than the fluctuations and therefore we can conclude that there is no size effect in the simulation for this physical quantity.
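The scaling of Eq. 2.1.14 can be checked directly on the numbers of Eq. 2.1.15; the snippet below simply recomputes the relative fluctuations and their ratio.

E864, s864 = -2.98749, 0.00651       # eV, values of Eq. 2.1.15 for N = 864
E6912, s6912 = -2.98673, 0.00236     # eV, values of Eq. 2.1.15 for N = 6912

rel864 = abs(s864 / E864)            # ~0.22 %
rel6912 = abs(s6912 / E6912)         # ~0.08 %
print(f"{rel864:.2%}  {rel6912:.2%}  ratio = {rel864 / rel6912:.2f}")
# the ratio ~2.8 is indeed close to sqrt(6912/864) = sqrt(8) ~ 2.83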

Figure 2.7: Energy distribution with 864 atoms.

Temperature evolution of the enthalpy

The enthalpy is an interesting quantity since it is expected to undergo a discontinuity as a function of the temperature when the system undergoes a phase change.

To have a look at this, we build the following simulation for Al, in which the temperature increases from 300K, starting from a perfect fcc crystal, to 1650K by incremental steps of 50K. At every step, 50 000 timesteps of 0.001 ps are run so that the system equilibrates well and the quantities can be obtained precisely. Simulations are performed with N_atoms = 864. The LAMMPS script is given in Appendix A. In Figure 2.9, we clearly see that the curve breaks at a temperature a bit higher than T = 900K. It corresponds to the melting of the system.

In order to get a better measurement of the melting temperature, we run the exact same simulation but in the opposite direction, starting with the last liquid configuration: the temperature decreases by steps of 50K, from 1650K to 300K. This time, the break appears at the very end of the curve, around 350K. From experiments, we know that Aluminium melts around 933.5K. As the melting temperature of the EAM model simulated here might be located between the enthalpy jumps of the solid and liquid branches, this result shows that the EAM model underestimates the melting temperature of aluminium.

Figure 2.8: Energy distribution with 6912 atoms.

Figure 2.9: Enthalpy evolutions for Aluminium

On the left part of Fig. 2.9, we have the solid branch while on the right part we have the liquid branch. Indeed, when increasing the temperature, the material melts when it reaches the temperature Tmelt. However, if we decrease the temperature and expect the material to go from a liquid state to a solid state, the enthalpy does not break: the liquid state persists. The material is said to be undercooled or supercooled.

Nevertheless, such a process is not a good way to obtain the melting temperature of a material precisely. A far better alternative is to simulate two boxes, respectively filled with solid and liquid material, with a contact interface. Waiting for the system to equilibrate will finally give a good value for the melting temperature, at which liquid and solid can coexist. Such a method is illustrated in Fig. 2.10.


By doing so with LAMMPS we have managed to obtain a melting temperature around 928K which is quite close to the actual melting temperature.

[Schematic: Phase 1, liquid at T1 ≥ Tmelt and solid at T2 ≤ Tmelt with two separated interfaces; Phase 2, opening of the frontier between liquid and solid; Phase 3, mixed liquid-solid state at T1 = T2 = Tmelt.]

Figure 2.10: MD simulation led to determine the melting temperature Tmelt of a material

A brief conclusion on these results

A few examples of what can be done and studied with MD simulations using LAMMPS have been shown here. With very simple work we saw many facets of this tool, regarding structural and dynamical studies of materials. It allowed us to manipulate the quantities, parameters and functions that characterize a material and that will be useful for the following work.

Moreover, one must recall that simulations must be considered alongside experiment and theory. These three components are complementary and are needed to get a good understanding of materials from an atomic to a macroscopic scale.

2.1.2 ab initio with VASP

The Vienna Ab initio Simulation Package (VASP) is, according to its name, the module used to perform atomic-scale materials modelling with an ab initio approach.

The ab initio designation comes from the fact that such simulations are based on first principles. Under VASP, all the calculations are made using approximations of the many-body Schrödinger equation, using density functional theory (DFT) or the Hartree-Fock (HF) approximation. The DFT approach has been very popular in solid-state physics since it was developed in the 1970s. As a consequence, we will introduce it with a bit of theory before studying the VASP input and output files.

Basics of ab initio calculations: the DFT example

Let us assume the Born-Oppenheimer approximation:

• Ions are fixed;

• Only valence electrons are taken into account.

Moreover, we will work in the single-electron approximation, assuming that electron-electron interactions are somehow screened by the ion-electron interaction. Thus, we have N single-electron equations, N referring to the number of electrons. Under these hypotheses we only have an electronic Hamiltonian given as:

\hat{H}_e = K_e + V_{ext} + V_{int} ≡ K_e + \sum_{I,i} V_I(r_i − R_I) + \frac{1}{2} \sum_{i,j} \frac{e^2}{|r_j − r_i|},    (2.1.16)


where RI are the coordinates of the ion I, ri the coordinates of electron i and Ke refers to the kinetic term of the Hamiltonian defined as:

K_e = −\frac{\hbar^2}{2m_e} \sum_i ∇_i^2,    (2.1.17)

where m_e is the mass of the electron and ∇ refers to the gradient. Quantum mechanics ensures that the electronic energy reads

E_e = \frac{⟨ϕ|\hat{H}_e|ϕ⟩}{⟨ϕ|ϕ⟩},    (2.1.18)

and that the electronic density can be written as

n(r) = ⟨ϕ| \sum_i δ(r − r_i) |ϕ⟩,    (2.1.19)

where |ϕ⟩ is the wave function of the considered electrons.

Then, once the electronic density has been obtained, interactions are computed according to Density Functional Theory (DFT). Two examples of functionals that can be used in VASP are the Local Density Approximation (LDA) and the Generalized Gradient Approximation (GGA), which is an extension of the LDA including the density derivative in the approximation.

The general form of the LDA is, for a spin-unpolarized system:

E_{xc}^{LDA}[n] = \int n(r) ε_{xc}(n(r)) dr,    (2.1.20)

where x stands for exchange and c for correlation. ε_{xc} is the exchange-correlation energy per particle and n the electronic density. The exchange-correlation energy is nothing more than the sum of the exchange energy and the correlation energy:

E_{xc} = E_x + E_c.    (2.1.21)

For a homogeneous electron gas, the exchange energy is given analytically according to a pointwise approach [8]:

E_x^{LDA}[n] = −\frac{3}{4}\left(\frac{3}{π}\right)^{1/3} \int n(r)^{4/3} dr.    (2.1.22)

The correlation energy per electron ε_c has analytical expressions in the low- and high-density limits, given respectively in terms of the dimensionless Wigner-Seitz parameter r_s (linked to the electronic density by \frac{4}{3}π r_s^3 = \frac{1}{n}):

ε_c(n) = \frac{1}{2}\left(\frac{g_0}{r_s^{3/2}} + \frac{g_1}{r_s^{2}} + \cdots\right),    (2.1.23)

and

ε_c(n) = A \ln(r_s) + B + r_s(C \ln(r_s) + D).    (2.1.24)

Then, all the parameters can be fitted to a certain accuracy against, for example, quantum Monte Carlo simulations [9],[10]. The correlation energy is finally given by:

E_c = \int n(r) ε_c(n(r)) dr.    (2.1.25)
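As a small numerical illustration of Eq. 2.1.22, the sketch below evaluates the LDA exchange energy per electron, ε_x(n) = −(3/4)(3/π)^{1/3} n^{1/3}, together with the Wigner-Seitz parameter r_s; everything is expressed in Hartree atomic units and the density value is an arbitrary example.

import numpy as np

def lda_exchange_per_electron(n):
    """LDA exchange energy per electron (Hartree), from Eq. 2.1.22."""
    return -0.75 * (3.0 / np.pi) ** (1.0 / 3.0) * n ** (1.0 / 3.0)

def wigner_seitz_radius(n):
    """r_s defined by (4/3) * pi * r_s^3 = 1/n."""
    return (3.0 / (4.0 * np.pi * n)) ** (1.0 / 3.0)

n = 0.01   # electrons per bohr^3, arbitrary example
print(lda_exchange_per_electron(n), wigner_seitz_radius(n))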

Looking back at the calculation of the density: one of the main issues is now to obtain the electronic wave function, and this can be very tedious. Moreover, we recall that we have here 3N equations to solve: N for N electrons (whose resolution is simplified by the single-electron approximation), in a 3-dimensional space.

However, Hohenberg and Kohn proved two very important theorems in 1964 [11]:

Theorem 1 For any given electronic system under an external potential Vext(r), this external potential is uniquely defined by the electronic density of the ground state n(r).

In other terms, the density is uniquely defined by the external potential.

Theorem 2 A universal energy functional E[n], depending only on the electronic density n(r), can be determined for any electronic system. The exact ground state energy corresponds to the global minimum of this functional.

From these two theorems, it comes as a fact that n(r) is indeed the basic variable of the problem. Thus, these results allow us to simplify the problem. Since we have three directions, we now have 3 variables to work with instead of 3N.

Approximations are made to the wave function so that the electronic density can be obtained. Then the energy can be computed from this density and simulations can give very accurate results. Still, the wave function calculation remains tedious and this method can involve significant CPU time. This is the price to pay for relevant results, as we will discuss in §2.1.3.

The VASP scripts

VASP works in a quite different way from LAMMPS. In order to launch a simulation, you need four input files:

• INCAR file: the central input file, containing the parameters that define the calculation: the type of calculation, the cut-offs... It has a lot of convenient default parameters;

• KPOINTS file: specifies the k-point mesh used for the calculations. A typical example of a KPOINTS file is given in Appendix A;

• POSCAR file: contains all the cell vectors and atomic positions. For Al, an example is given in Appendix A;

• POTCAR file: must contain the pseudopotential information for every atomic species (in the same order as the species are given in the POSCAR file).

In terms of running, VASP and LAMMPS work the same way. The key difference lies in the way interactions are dealt with.

VASP output files

There exist a few output files produced by VASP containing the different relevant quantities. As the method computes the quantities at every timestep (called an ionic step), the files quickly become very large. Let us give a few examples of these files, which will be of use in our work:

• OUTCAR file: the largest file, containing the run information (similar to the log file in LAMMPS), information about the electronic steps (Fermi energy, Kohn-Sham eigenvalues, ...), atomic positions, forces, configurational stress tensors and energies...

• CONTCAR file: similar to the POSCAR file, giving the geometry data at the end of the run: lattice parameter, ionic positions, velocities...


• XDATCAR file: gives the ionic positions at every timestep;

• WAVECAR file: contains the coefficients of the wave function.

Other outputs can be obtained with VASP, especially to get the charges and moments of the cell at every ionic step: the OSZICAR, DOSCAR and CHGCAR files.

2.1.3 Simulation results and comparisons

Now that we have seen the basics of LAMMPS and VASP, both algorithms can be compared. To do so, we will first consider pure Aluminium and then an alloy with 80% Aluminium (Al) and 20% Nickel: Al80Ni20.

Pure aluminium

In order to compare both potentials, we performed a few simulations of liquid Al at different temperatures. We here present the results for 650K and 1300K, chosen arbitrarily to give a good overview of the obtained results.

For the LAMMPS simulation we simulate a box of N = 256 Al atoms, initially randomly organized, with the AlO EAM potential. The simulation box has a lattice parameter equal to 4.05 Å and we take a scale parameter of 1.05. The dynamics is computed according to the Verlet algorithm, with a timestep of 1 fs. Thermalisation lasts 80000 timesteps while the production is run for 50000 timesteps in the canonical (NVT) ensemble.

The VASP calculations use the LDA functional with PAW potentials, beginning with a disordered configuration of 256 atoms. The timestep is also set to 1 fs and calculations are performed with the Verlet algorithm for 40000 thermalisation steps and 80000 production steps, in the canonical (NVT) ensemble. At 1300K, the simulation is run directly, while at 650K, in the undercooled case, the liquid starts at 1500K and is cooled down to 650K. See [13] for more details about the VASP simulations.

Pair correlation function g(r)

Figure 2.11: g(r) for liquid Al at 1300K

The main structure appears identical for both methods. The nearest atom has the highest probability to be found at a radius of 2.75 Å. The other peaks are quite similar at both temperatures, but the ab initio curves tend to decay faster.

The ab initio method describes narrower and higher peaks than the classical MD approach. As a consequence, the structure is expected to be more defined: atoms are more organised within the system. Note that the height of the peaks increases as the temperature decreases, but the difference between both methods remains constant.

Figure 2.12: (a) g(r) for liquid Al at 650K; (b) Zoom on the second g(r) peak for liquid Al at 650K

Another difference appears at low temperature: a shoulder is obtained from the ab initio calculations, which is characteristic of an undercooled state. Indeed, in liquid Al at low temperature, atoms form a local icosahedral structure. Such an arrangement creates perturbations within the system, which will prevent it from forming a crystal. This is a very subtle point that cannot be seen from classical MD and that highlights the importance of using ab initio techniques for precise calculations.

MSD and diffusion coefficient


Figure 2.13: R2(t) for liquid Al at (a) 650K; (b) 1300K

Let us first take a look at the shape of these curves. They can be split into two parts: a first one corresponding to a ballistic regime and a second one where the MSD is a linear function of time.

In the ballistic regime, an atom moves freely from its initial position, with a t² behaviour, i.e. a slope of 2. After a certain time, it is slowed down as it encounters other atoms. Then the MSD slows down; the slope decreases and the MSD becomes linear with time, as we are in a liquid.

As the temperature increases, the slopes in the non-ballistic part (corresponding to the diffusion coefficient according to Eq. 2.1.13) increase, which is expected. As examples, we have obtained for three different temperatures:

• T = 650K:  D_classical = 0.169 Å².ps⁻¹,  D_ab initio = 0.194 Å².ps⁻¹;

• T = 1300K: D_classical = 0.932 Å².ps⁻¹,  D_ab initio = 1.215 Å².ps⁻¹;

• T = 1600K: D_classical = 1.556 Å².ps⁻¹,  D_ab initio = 1.787 Å².ps⁻¹.

Defining a relative error as ε = ΔD/D, the average error between those two methods is ε = 16.4%, which is significant.

Conclusion of comparison of the methods

A comparison of the ab initio results with experiment was done by Jakse et al. [13]. The excellent match ensures that ab initio calculations give results quite close to reality. As a consequence, we will take the ab initio results as the reference in the following.

Still, both methods give interesting results that can be exploited to understand material behaviour at the atomic scale. Calculations with LAMMPS are far less expensive than with VASP and can be used to study some materials in a first approach and on a large scale (in time and size). A deeper understanding of the atomic behaviour will however require an ab initio investigation. In the next part we will extend our comparison to a more complex system: an alloy.

Al80Ni20 alloy

Now we can focus on a more complex system: an Al80Ni20 alloy. As a reference, we will use the results obtained by N. Jakse and A. Pasturel, 2015 [12].

The same procedure is reproduced, so that quantities can be consistently compared. Simulations are performed with LAMMPS, using the EAM potential for B2-NiAl proposed by Y. Mishin et al., 2002 [14]. The liquid is melted at Tmax = 2000K (the melting temperature of AlNi is around 1320K) before being cooled down to a temperature T in the range [600, 1800]K. In order that the cooling rate remains constant at 3.3 × 10¹² K.s⁻¹ like in [12], the number of cooling steps is a function of the temperature such that:

N_{steps} = \frac{T_{max} − T}{3.3 × 10^{12} × 10^{−15}} ≈ 3 × 10^2 (T_{max} − T).

Note that the 10⁻¹⁵ factor comes from the timestep of 0.001 ps. Moreover, we made sure that for every final temperature, the density fits the one obtained experimentally from [15], so that we get the exact same thermodynamic states (ρ, T) as in [12].
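A quick arithmetic check of this relation (a trivial sketch with the cooling rate and timestep quoted above):

cooling_rate = 3.3e12        # K/s
dt = 1.0e-15                 # s, i.e. the 0.001 ps timestep
Tmax = 2000.0                # K

for T in (820.0, 1020.0, 1320.0):
    n_steps = (Tmax - T) / (cooling_rate * dt)
    print(T, round(n_steps))
# about 3e2 * (Tmax - T) steps, e.g. ~357,600 steps for the quench down to 820 K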

We have made 3 simulations, at 820, 1020 and 1320K, with 256 atoms and a timestep of 1 fs. The number of production steps is given above and fixed by the cooling rate chosen in [12].

Partial pair correlation functions

Let us first take a look at the g_XY(r) functions, where X and Y denote the elements we are considering. g_XY(r) characterizes the probability of finding an atom of type Y at a distance r from an atom of type X (and vice versa).

In Fig. 2.14, the black and red curves are respectively shifted upwards by 2 and 1 units for readability. The data produced with LAMMPS do not allow us to reach radii as large as the VASP data, which is only a technical aspect of the domain decomposition parallelisation of the LAMMPS code. Still, these data are sufficient to compare both algorithms.



Figure 2.14: Comparison of pair correlation function (a) gAlAl(r); (b) gAlNi(r); (c) gNiNi(r)

Both methods show similar results in Fig. 2.14 (a) and (b): a well defined first peak and a second peak that is split in the case of classical MD, the difference being more pronounced at higher temperature. This actually makes sense since at higher temperature, the structure tends to be much less defined.

Things become clearly worse when it comes to the Ni-Ni correlation in Fig. 2.14 (c). According to the ab initio simulations, the second peak is the highest, while it is the first peak for classical MD. It means that ab initio predicts Ni atoms to be surrounded by a very small number of Ni atoms (less than 1 on average), a feature that is reproduced by the classical potential but with a shorter Ni-Ni bond length (see Fig. 2.16).

Diffusion coefficient

Plotting the diffusion coefficient against 1000/T gives us Fig. 2.15.

Tendencies are the same and both methods show that Ni diffuses less than Al at every temperature. We recover here the break from the Arrhenius law that we expected from theory.

We have seen from the plots of g(r) that LAMMPS and VASP predict different structures, leading to discrepancies in the diffusion coefficients. However, a complete understanding of the link between structure and dynamics is an open issue that we will not deal with in this thesis.

Coordination number

Finally, we can take a look at the coordination numbers for every element relative to another. On the left are the results obtained using classical MD while on the right we have the results from ab initio simulations. Note that we define NAl ≡ NAlNi + NAlAl and NNi ≡ NNiAl + NNiNi.


Figure 2.15: Comparison of diffusion coefficient


Figure 2.16: Coordination numbers from (a) classical MD; (b) ab initio.

While the coordination numbers are quite similar for Ni-Ni, they differ for Al-Ni (and so Ni-Al, by symmetry). However, the biggest discrepancies appear for Al-Al: N_{AlAl,MD} = 9.6 ≠ N_{AlAl,ab initio} = 10.2. This can be a consequence of the use of the AlO potential with LAMMPS. Every potential describes a specific environment and the one we used here may not be the best to describe the actual structure.

A short conclusion on the comparisons

Classical MD simulations allow us to predict the general behaviour of atoms in a system with respect to ab initio: the shape of the partial pair correlation functions, the tendency of the diffusion, and a good approximation of the coordination numbers.

Classical MD with LAMMPS thus remains very interesting as a first approach to study many systems, using an appropriate potential with the large box sizes and long time spans that are often required.


2.2 Constructing a neural network potential (NNP)

Machine Learning techniques occupy an important place in research these days. These tools, including neural networks, have allowed the development of very promising works in the last few years, especially in the construction of Neural Network Potentials, which is one of the purposes of this thesis.

As a consequence, we will begin our discussion on NNPs with a quick overview of neural networks before seeing how these potentials can be computed with the method proposed by Behler and Parrinello in 2007 [16].

To illustrate NNPs and the way we use Machine Learning to build a PES, we refer to Fig. 2.17, taken from Artrith's presentation [17].

Figure 2.17: General approach of building NNP with Machine Learning. Taken from Artrith's presentation at Aalto University [17].

2.2.1 A neural network (NN)

Let us first begin by clearly defining a NN: a circuit of neurons in which the information propagates from one neuron to every other neuron connected to it.

In Artificial Neural Networks (ANNs), neurons are represented by nodes and the links between these nodes correspond to functions that can be either linear or not. There are many ways to design NNs and almost every structure can be described by this Machine Learning technique.

As with other Machine Learning techniques, there is a learning component in the process, as shown above, which requires a lot of data. Indeed, as a ML technique, the purpose of this tool is to predict an output for a given input. Thus, to train the network, one needs large amounts of data with both input and output quantities. Once the network is trained, it has, like the human brain, gained experience and is able to predict the output for a given input.

Artificial neurons work just like human ones and can be mathematically modeled. These are explicitly compared in Fig. 2.18. The biological neuron has a body with connections on the left end, a core called the axon and axon terminals on the right end. Such a structure is recovered when describing a neuron mathematically. The information propagates from left to right. In Fig. 2.18(b), the x_i correspond to the input signals, each weighted by w_i. Then, these signals are summed and propagate through the neuron according to an "activation function" f that gives the output signal of the neuron. From there, a mathematical description can be obtained. All the input signals are linearly combined before being activated by f to give the output as:

y = f(u) = f\left(\sum_{i=1}^{N_{inputs}} w_i x_i\right).    (2.2.1)

Alongside the weights w_i, it is common practice to add constant terms b_i, called biases, to shift the activation function. This gives the neuron more flexibility, and the output of an artificial neuron is then given in Eq. 2.2.2.



Figure 2.18: (a) Scheme of a neuron; (b) Mathematical description of a neuron in a NN

From now on, we will use the term weights to refer to the sets {w_i, b_i}_i.

y = f(u) = f\left(\sum_{i=1}^{N_{inputs}} w_i x_i + b_i\right).    (2.2.2)

Note that positive weights enhance connections while negative weights tend to inhibit them. Most activation functions are chosen to have a range in either [0, 1] or [−1, 1] and modulate the amplitude of the output.
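A minimal sketch of Eq. 2.2.2 for a single artificial neuron, using the hyperbolic tangent as activation function (the input, weight and bias values are arbitrary):

import numpy as np

def neuron(x, w, b, f=np.tanh):
    """Output of one artificial neuron: y = f(sum_i w_i x_i + b) (Eq. 2.2.2)."""
    return f(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])      # input signals
w = np.array([0.1, 0.4, -0.3])      # weights
b = 0.2                             # bias, shifts the activation function
print(neuron(x, w, b))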

A neural network (either biological or artificial depending on the type of neuron it is made of) consists of a network of such neurons. Artificial networks can "learn" from datasets just like species learn from experience. By training an ANN with large enough datasets, the weights and biases can adapt (we talk about trainable weights) and the output can be treated as a prediction.

Many types of ANNs exist, allowing predictive modeling on numerical datasets, image classification, adaptive control... They are powerful tools in a world where relationships between inputs and outputs are, most of the time, non-linear and complex.

2.2.2 Multi-layer perceptrons (MLP)

The MLP is a very common class of neural network, if not the most classical. It can be built and exploited easily, while being endowed with a great flexibility. Such NNs are built from successive layers, each made of an arbitrary number of nodes that we call neurons. In particular, these compose a class of feed-forward ANNs.

According to the Universal Approximation Theorem, any continuous function that maps from R^n into R, for n ∈ N, can be approximated modulo a certain accuracy (or so-called 'loss') by a multi-layer perceptron with one hidden layer. In other terms, a neural network can be considered as a universal function approximator in a certain domain of functions. A key point of this theorem is that it applies to one hidden layer with a suitable activation function, like the sigmoid or the hyperbolic tangent. As a consequence, MLPs with more than one layer can also be used to approximate functions, on the sole condition that at least one layer is activated with one of these suitable activation functions.

MLPs are mainly used for regression problems. They are well suited to numerical databases and so are convenient to predict energy values from, for example, temperature and density. As a consequence, such NNs can be used to obtain the energy and forces (needed to build the potential) from data like the positions of the atoms in a given material. In Fig. 2.20, the MLP is made of one input layer that takes two inputs, two hidden layers made of 5 and 4 neurons, and an output layer that returns one output. The nodes are represented by circles while the arrows denote the propagation of the information within the network.



Figure 2.19: (a) Example of a MLP; (b) Mathematical description of a MLP.

The layers

An MLP is a feed-forward Artificial Neural Network (ANN). Its scheme is quite intuitive: successive layers made of neurons and connected by links. Information moves forward through the network from one layer to another under a condition of "activation" of the layer. While propagating within the network, this information undergoes a few calculations, from one neuron to another and so on, from one layer to another.

More than being feed-forward NNs, MLPs are also fully-connected NNs, which means that all the outputs of the layer l − 1 will correspond to the inputs of the layer l and so on. In practice, each layer has an activation function f^l (which will be the activation function of each neuron that composes it) and a set of weights {w_{ij}^l, b_i^l}_{ij}, where l denotes the layer, i refers to the neurons of the layer and j to the inputs (or outputs of the previous layer). Note that our index j here corresponds exactly to the index i in Eq. 2.2.2.

Most of the time, the input and output layers are linear, without any activation function (recalling that only one is needed for the Universal Approximation Theorem to apply), and only the weights of the hidden layers are trained to obtain the best predictions. Therefore, known parameter values are sent to the input layer, undergo mathematical operations within the hidden layers involving the trained weights, and return the output as the prediction of the NN.

Information propagation

According to this scheme, the MLP appears to be a composition of individual neurons. Indeed, it has almost the same description: the NN has activation functions and weights that allow one to compute the outputs from given inputs.

An MLP is a fully-connected NN. Thus, the behaviour is very similar to that of a simple neuron. Regarding Fig. 2.20(b), the information propagates from left to right and the weights adapt as the network is trained to produce good predictions as outputs. w_{ij}^l corresponds to the weight of the connection between the neuron i of layer l and the neuron j of layer l − 1, and b_i^l is the bias of the i-th neuron of layer l.

Here, we consider the input layer as an identity layer such that x_i^0 = u_i^0 = y_i^0, for i = 1, 2. The input values directly enter the NN. For every layer l, the output of the i-th neuron is given according to Eq. 2.2.2.

Thus:

y_i^l = f^l(u_i^l) = f^l\left( \sum_{j=1}^{N_{l-1}} w_{ij}^l \, y_j^{l-1} + b_i^l \right),    (2.2.3)

where N_l refers to the size of the l-th layer, i.e. the number of neurons in the layer.

As a reminder, the input of layer l is the output of layer l − 1. Therefore, the output y_1^3 of the network in Fig. 2.20(b) can be expressed as a function of the inputs (x_1^0, x_2^0) as follows:

y_i^3 = f^3\left( \sum_{j=1}^{4} w_{ij}^3 \, f^2\left( \sum_{k=1}^{5} w_{jk}^2 \, f^1\left( \sum_{l=1}^{2} w_{kl}^1 x_l^0 + b_k^1 \right) + b_j^2 \right) + b_i^3 \right).    (2.2.4)

Activation functions

As already discussed, the activation function modulates the amplitude of the output. In other terms, it maps an interval into another. There exists a huge variety of activation functions, each with its own perks and drawbacks. In the following we will introduce a few common and useful activation functions, either linear or not. Note that activation functions must be in accordance with the Universal Approximation Theorem, according to which the activation function must be: continuous, non-constant, monotonically increasing and bounded.

Hyperbolic functions (tanh and logistic function)

The hyperbolic tangent tanh is a very common activation function. It projects R on the interval [−1, 1].

The logistic function, also called sigmoid, is quite similar to the tanh function. It projects R onto [0, 1] and is defined as:

\mathrm{sigmoid}(x) = \frac{1}{1 + \exp(-x)}.    (2.2.5)

Both functions are plotted in Fig. 2.21.

Rectified linear unit (ReLU) and softplus functions

The ReLU function is a bit different from the two discussed above. It maps R to R_+ as follows:

ReLU(x) = max(0, x). (2.2.6)

Last but not least, the softplus function also maps R to R_+, but more smoothly than ReLU does.



Figure 2.20: (a) Hyperbolic tangent; (b) Sigmoid function; (c) Both functions

It is defined as

\mathrm{softplus}(x) = \ln(1 + \exp(x)).    (2.2.7)

Both functions can be compared in Fig. 2.22.

Most activation functions map R^n into smaller intervals like R_+^n, for n ∈ N, [0, 1] or [−1, 1], etc. This is done by linear and non-linear functions. In particular, such projections and the use of non-linear functions allow one to deal with nontrivial problems with a small number of nodes. Note that, accordingly, in order not to restrict the output of a NN, the output layer often has a linear activation function.
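For reference, a short NumPy sketch of the activation functions mentioned above (illustrative only):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # Eq. 2.2.5, range (0, 1)

def relu(x):
    return np.maximum(0.0, x)         # Eq. 2.2.6, range [0, +inf)

def softplus(x):
    return np.log1p(np.exp(x))        # Eq. 2.2.7, smooth version of ReLU

x = np.linspace(-3.0, 3.0, 7)
print(np.tanh(x), sigmoid(x), relu(x), softplus(x))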

Training of the NN

We have formalised the propagation of information within a NN. However, this is not yet sufficient to understand how training works. To train a NN, a dataset is read by the NN. From this dataset, it recognizes the inputs and the outputs. Then, from these inputs, it can compute an output value and compare it to the expected value of the database, generating an error called loss. Most of the time, the loss is associated with a cost function Γ. A standard choice of cost function is the mean-squared error (MSE), defined as

\Gamma \equiv \Gamma(w, X, y) = \frac{1}{2N} \sum_{i=1}^{N} (y_i - y_i^*)^2,    (2.2.8)

where w denotes the weights, X and y the inputs and outputs of the NN, and N the number of samples in the training data; y^* denotes the NN prediction. This function is computed over every sample in the dataset, which accounts for one epoch.

For each training epoch, the cost function is then minimized by an optimization algorithm, which computes new weights w_{ij}^l. With the new weights, a new loss is computed, and so on.



Figure 2.21: (a) ReLU function; (b) Softplus function; (c) Both functions

The training ends either when the maximum number of training epochs (chosen by the user) is reached, or when an early stopping function stops the training if, for example, the loss does not improve after a certain number of epochs.

Overfitting while training

When training a NN it is very important to make sure we are not overfitting it. A NN is said to be overfitted when it has learned so much on a dataset that it is not able to predict correctly for data outside this set.

Let us take an example: we consider a dataset from which we take 80% to train the NN (training set) and keep 20% away (validation set). At each training iteration (or epoch), both sets are passed within the network and we measure the error (or loss) on these sets to finally obtain Fig. 2.23.

First, the loss decreases for both sets, converging to a small value. However, after a certain number of epochs, the validation loss begins to increase: this is overfitting. The NN is no longer able to predict values outside its training set since it has learned too much on it. The NN is, indeed, too constrained.

L2 regularization

It is common to consider an L2 weight regularization penalty α, also called weight decay. Indeed, the regularization term enters the cost function as:

Cost function = Loss + Regularization term.    (2.2.9)

It is called weight decay because it prevents the weight values from becoming very large. It is assumed that smaller weight values correspond to simpler models; as a consequence, it also limits overfitting to a certain extent. It can be mathematically expressed as:

\Gamma = \Gamma_0 + \frac{\alpha}{2N} \|w\|_2^2 \equiv \frac{1}{2N} \sum_{i=1}^{N} (y_i - y_i^*)^2 + \frac{\alpha}{2N} \sum_{j=1}^{N_\mathrm{weights}} |w_j|^2.    (2.2.10)

Figure 2.22: Example of an overfitted NN

Here N denotes the size of the dataset and Nweights the total number of weights. Γ0, defined in 2.2.8, corresponds to the unregularized cost function.

Gradient descent algorithm

Gradient descent is an optimization algorithm that aims to minimize the cost function in order to obtain relevant predictions. The term gradient refers to the direction taken by the function of interest (the cost function Γ) with respect to the evolving parameters (the weights w). To minimize the cost function, we want to move the weights in the direction opposite to the gradient of this cost function with respect to the weights. This can be mathematically expressed at a step k by Eq. 2.2.11:

w_{k+1} = w_k - \eta \, \nabla_{w_k} \Gamma(w),    (2.2.11)

where η corresponds to the step size of the optimization process, also called the learning rate.
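A bare-bones sketch of this update rule (illustrative names only):

import numpy as np

def gradient_descent_step(w, grad, eta=0.001):
    # one step of Eq. 2.2.11: move the weights against the gradient of the cost
    return w - eta * grad

w = np.array([0.5, -0.2])
grad = np.array([0.1, -0.4])   # gradient of the cost with respect to w
print(gradient_descent_step(w, grad))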

There exist different types of gradient descent: Batch Gradient Descent, Stochastic Gradient Descent and Mini-batch Gradient Descent, depending on the frequency of the weight updates and using different update methods.

From a NN point of view, the gradient descent algorithm is applied using what we call an optimizer. There exists a large variety of optimizers, based on the different gradient descent methods. Their purpose is to update the weights so as to minimize the loss function.

Adam optimization algorithm

In this work we have mainly worked with Adam (Adaptive Moment Estimation), a very popular optimizer presenting a good compromise between efficiency and CPU time. It combines two different algorithms: AdaGrad (Adaptive Gradient Algorithm) and RMSProp (Root Mean Square Propagation) [21].

The Adam optimization algorithm is a stochastic gradient descent procedure. It has 4 parameters: η the learning rate; β1 (default value of 0.9) the exponential decay of the first moment estimate; β2 (default value of 0.999) the exponential decay of the second moment estimate, which should be close to 1 to avoid issues with a sparse gradient; ε a very small number to prevent any division by zero (∼ 10^{-8}).

Mathematically, if we denote the gradient g_t = \nabla_{w_t} \Gamma(w), the Adam optimizer computes, at each time step (or epoch here):

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t,
v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2,    (2.2.12)

where m_t and v_t are estimates of the first and second moments (resp. mean and uncentered variance) of the gradient.

At initialization, m_t and v_t are set to zero, which creates a bias. This is counteracted by using bias-corrected first and second moment estimates:

\hat{m}_t = \frac{m_t}{1 - \beta_1^t}; \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}.    (2.2.13)

Finally the weights are updated according to the Adam update rule as follows:

w_{t+1} = w_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon} \, \hat{m}_t.    (2.2.14)
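As an illustration, a minimal NumPy sketch of one Adam update, following Eqs. 2.2.12–2.2.14 (all names are illustrative):

import numpy as np

def adam_step(w, grad, m, v, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # t is the 1-based step counter used for the bias correction
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    m_hat = m / (1.0 - beta1 ** t)            # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)            # bias-corrected second moment
    w = w - eta * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = np.array([0.5, -0.2]), np.zeros(2), np.zeros(2)
grad = np.array([0.1, -0.4])
w, m, v = adam_step(w, grad, m, v, t=1)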

2.2.3 Constructing a NN

Many tools exist for dealing with NNs within Python. One of the most common is scikit-learn, while one of the most flexible is TensorFlow and especially its module Keras. During this thesis, we have investigated both of these Python modules in order to find the best suited for our purpose.

The process of creating and using a NN can be divided into three big steps: the creation of the model with its hyperparameters, the fitting or training using a dataset, and the prediction. In this section we will explicitly discuss these three steps and see the role of the NN hyperparameters, before comparing the two ML methods mentioned earlier.

NN hyperparameters

A NN involves a few hyperparameters that we will present briefly in this paragraph. This list is non-exhaustive and each module has its own parameters to set (with, of course, different names most of the time), depending on its own flexibility. We will get into more detail for each module in the next subsection.

• solver or optimizers: denotes which optimization algorithm to use;

• init weights: sets the NN initial weights;

• loss: loss function to use in the NN training;

• alpha or regularizer: the L2 regularization term;

• hidden layer sizes: the sizes of the hidden layers of the NN;

• max iter or epochs: maximum number of training iterations/epochs;

• tolerance: convergence condition, the training stops if the loss does not improve by more than tolerance after patience iterations;


• activation function: activation function for the layer;

• random state: allows one to fix the random seed so results are reproducible;

• shuffle: choose whether data should be shuffled or not;

• validation fraction: percentage of the training data that is used for the validation in the model fitting;

• batch size: size of the data batches. A batch corresponds to the number of data samples (or rows) that are processed before the weights are updated.

In addition to these hyperparameters, optimizers also have their own hyperparameters, as already discussed in §3.2.4.

Preprocessing the data

Training the NN is not as simple as feeding the whole dataset to the NN. It needs to be processed a bit so that the training can be done efficiently. In particular, it is important when training a NN to avoid overfitting, which corresponds to the inability of a NN to predict values out of the range of values it has been trained on.

Splitting the dataset

In this work, we distinguish three types of datasets: the training set, the validation set and the test set. These are all portions of the whole dataset and contain NN inputs and outputs. Note that if the dataset is not already shuffled, these sets must be shuffled so that all sets cover the whole range of data, to avoid overfitting.

• Training set: used to directly train the NN. The NN learns from the data of this set;

• Validation set: used to measure the prediction regarding data not used for the training. It is used within the training as a reference to make the model converge or not;

• Test set: used to check the prediction made by our NN. Quite similar to the validation set, we do not include it within the NN so we can check the prediction made once the training is achieved.

The sets can be split using the function train_test_split from sklearn.model_selection:

from sklearn.model_selection import train_test_split

# splitting into training and test sets; a common proportion is 80/20 percent
train, test = train_test_split(dataset, test_size=0.2, shuffle=True)
# splitting the training set into training and validation sets,
# if the user wants to enforce the validation set
train, val = train_test_split(train, test_size=0.2, shuffle=True)

Note that a validation fraction hyperparameter exists for most of the NN models. However, it can be convenient to choose a validation set explicitly, to compare different models for example. This can be done as above, or also by fixing the random seed if both models use the same random process (most of Python's functions rely on the numpy.random process).

In the following, considering a set, for example the train set, we will denote by train_X and train_y respectively the input and output data of the NN.

Scaling the dataset

Another important step before training a NN is to rescale the dataset. Unscaled data can be detrimental to the predictions, since the NN will not be able to distinguish one row from another.


From the sklearn.preprocessing library, the StandardScaler function allows one to correctly scale the data by removing the series mean and scaling with respect to the variance:

\mathrm{StandardScaler}(x) = \frac{x - m(x)}{\sigma(x)},    (2.2.15)

where m refers to the mean and σ to the standard deviation, while x is an input series of the dataset. Either operation can be disabled at the user's choice.
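As an illustration, a minimal sketch of its use (assuming the train_X and test_X arrays introduced above):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()                       # with_mean=True, with_std=True by default
train_X_scaled = scaler.fit_transform(train_X)  # fit the mean and variance on the training set only
test_X_scaled = scaler.transform(test_X)        # reuse the same scaling for the test set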

Other scaling techniques can be used and we will discuss a few of them when constructing a NNP.

Scikit-learn and MLPRegressor approach

Constructing a NN can be done quite easily using scikit-learn, especially in regression problems like ours.

In particular, there exists a class MLPRegressor, which can be imported from sklearn.neural_network, completely built for such issues. Note that the default loss of MLPRegressor is the MSE (Mean-Squared Error).

From there, the model can be computed, fitted to the training data and validated on the validation set.

Building and compiling the model

First, we build the model using the different hyperparameters:

from sklearn.neural_network import MLPRegressor

mlpr = MLPRegressor(solver='adam', alpha=alpha_l2, batch_size=batch_size, shuffle=shuffle,
                    hidden_layer_sizes=hidden_layer_sizes, max_iter=max_epochs,
                    activation=activation_function, random_state=None, early_stopping=True,
                    validation_fraction=0.2, tol=min_delta, n_iter_no_change=patience)

Training the model

We can now train/fit the model with the data by simply using mlpr.fit(train_X, train_y).

Predicting with the model

Finally, predictions on a set test_X are obtained using mlpr.predict(test_X).

TensorFlow and Keras approach

TensorFlow is known to be a strong tool to build Deep Neural Networks, based on Keras, a Python deep learning API. As a consequence, we want to use it to build the same NN as seen with scikit-learn, before extending it to construct a NNP. In September 2019, TensorFlow was upgraded to 2.0, changing most of its commands and making it more intuitive to use and quite close to Keras. Such a module allows one to build simple, efficient and very flexible NNs, as we will see in this part.

Building the model

More flexibility involves more parameters. As a consequence, building a NN using Keras is different from Scikit-learn. Before compiling the model we must define the layers of our NN. To do so we use tf.keras, the high-level API used to build and train deep learning models. From there, we can construct the different layers of our NN. The parameters of each layer are chosen to be in accordance with the ones implemented under scikit-learn.

• Input layer: tf.keras.layers.DenseFeatures(input_columns), where input_columns is a TensorFlow numeric column object obtained with feature_column.numeric_column, since our database is made only of numerical values;

• nth hidden layer: layers.Dense(size_n, activation=activation_function, kernel_regularizer=regularizers.l2(alpha_l2)), using the module layers from tensorflow.keras; regularizers is imported from tensorflow.keras and regularizers.l2 corresponds to the L2 norm penalty already discussed in §3.2. Dense is used to build a regular, densely connected layer. According to the previous discussion, we build two layers of size 5 and 4 this way;

• Output layer: layers.Dense(1), a linear, one-dimensional output combining the last hidden layer with the optimized weights and biases.

When using Dense layers, the weights are initialized by default with a Glorot uniform distribution while the biases are initialized to zeros. The Glorot uniform distribution is such that, for a layer l with a number of inputs N_{l-1} and outputs N_l, the weights w^l are taken as

w^l \in \mathcal{U}\left( -\sqrt{\frac{6}{N_{l-1} + N_l}}, \; +\sqrt{\frac{6}{N_{l-1} + N_l}} \right),    (2.2.16)

where \mathcal{U} refers to the uniform distribution. The initialization can be changed within keras simply by using the variables kernel_initializer and bias_initializer.

Now that we have built our layers and stored them in a list NN_layers, we can build the model. Keras has two main model categories: Sequential and Functional. The Sequential model corresponds to a linear, layer-by-layer model with one input and one output. The Functional model (or API) allows one to connect each layer to any other layer, resulting in far more complex networks, which we will discuss while constructing our high-dimensional NNP. In this first example, the Sequential model is sufficient: model = tf.keras.Sequential(NN_layers).
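Putting these pieces together, here is a minimal sketch of the layer construction (the sizes, alpha_l2 and activation_function values are the illustrative ones used above; for brevity the DenseFeatures input layer is replaced by a plain input_shape argument):

import tensorflow as tf
from tensorflow.keras import layers, regularizers

alpha_l2 = 1e-5
activation_function = 'tanh'
NN_layers = [
    layers.Dense(5, activation=activation_function, input_shape=(2,),
                 kernel_regularizer=regularizers.l2(alpha_l2)),   # hidden layer n°1
    layers.Dense(4, activation=activation_function,
                 kernel_regularizer=regularizers.l2(alpha_l2)),   # hidden layer n°2
    layers.Dense(1),                                              # linear output layer
]
model = tf.keras.Sequential(NN_layers)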

We now need to compile our model and make sure every parameter is coherent with the ones seen in §3.2. The parameters are not implemented in keras exactly as they were in scikit-learn: the model is parametrized by general functions that must themselves be parametrized.

In order to compile the model we have to correctly parametrize our optimizer. To do so, we choose the adam optimizer and use the same parametrization as already seen (the default parameters of this optimizer in MLPRegressor): optimizer_adam = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, amsgrad=False).

Henceforth, model.compile(optimizer=optimizer_adam, loss='mean_squared_error', metrics=['MSE']) will compile our model.

Training the model

Before fitting the model, one important thing needs to be defined: the so-called Early Stopping. It allows the fitting of the model to stop when a monitored quantity (for us, the loss) has stopped improving.

We define it as EarlyStopping = tf.keras.callbacks.EarlyStopping(monitor='loss', min_delta=1e-05, patience=10, mode='auto', baseline=None, restore_best_weights=False), where min_delta corresponds to the minimum change in the loss expected to consider that it has improved, and patience the number of iterations during which it must not have improved before the fit is stopped (in accordance with the tol parameter of the MLPRegressor function).

Finally the fit is made using both the training and validation sets: model.fit(train_ds, validation_data=val_ds, epochs=max_epochs, verbose=0, callbacks=[EarlyStopping]). It stops when the loss has converged, just as with scikit-learn.

Predicting with the model

The prediction command is model.predict(test_X).

Keras validation

As seen above, constructing a NN with Keras can quickly become messy if one does not clearly understand every step. While Scikit-learn is quite easy to use, it can be limited in terms of flexibility. Still, they can both be used to build simple NNs. In this part we will build a given NN with both methods to compare them and to make sure our models behave exactly as we want them to. The hyperparameters are set as follows in both models:

• solver='adam': with parameters beta_1=0.9, beta_2=0.999, a numerical stability term of epsilon=3e-08 and a constant learning rate of 0.001;

• alpha=1e-5: the L2 regularization term ;

• hidden layer sizes = (5,4): two hidden layers of 5 and 4 neurons ;

• max iter=100000: maximum number of iteration if the error does not converge ;

• tol=1e-5 : convergence condition, the NN stops training if the loss does not improve by more than tol after 10 iterations;

• activation=’tanh’: activation function for the layers,

• random state=1: fix the random seed so results are reproducible (every shuffling will be identical for each method);

• shuffle = True: data are shuffled;

• validation fraction=0.2: 0.2 of the training data are used for the validation of the model fit;

• batch size = 32: default value for keras; 32 data rows are processed before every weights updates.

The structure of the NN is the one used in [23]: it has two hidden layers of size 5 and 4, on the same pattern as the one given in Fig. 2.20.

Comparison with an example

As an example, we will work with a database based on the values obtained by Johnson et al. [22]. According to Table 2 and equation (5) of the latter, it was possible to build a database of density ρ and temperature T as a 2-dimensional input x and Helmholtz free energy A as an output y, containing 11338 values for each parameter (discretization of the 180 obtained rows). Such a database is large enough to train, validate and test a NN. Thus, a NN can be developed in order to get predictions of A from a (ρ, T) input, which we will call a macroscopic feature. The paper from C. Desgranges and J. Delhommelle [23] was used as a reference to validate the so-built NN.


               Nb epochs   MSE                      Execution time
MLPRegressor   70          0.0016986252430028345    9.13219308853149 s
keras          344         0.0005154545651748776    132.039839267730 s

Table 2.1: Convergence results of the NNs on the Helmholtz free energy example

The database is based on a lot of MD simulations at different temperatures and densities. According to Eq (5) from [22], the Helmholtz free energy in natural units (A∗ = A/Nε, with ε the Lennard-Jones parameter) is given as :

A^* = \sum_{i=1}^{8} \frac{a_i \rho^{*i}}{i} + \sum_{i=1}^{6} b_i G_i,    (2.2.17)

where a_i, G_i and b_i are given in Tables 5, 6 and 7 of Johnson's paper [22]. These parameters are all density and temperature dependent. Moreover, the effective quantity we are interested in is the reduced excess Helmholtz free energy, i.e. the Helmholtz free energy plus the ideal energy given by A_ideal = ρ(ln(T) − 1). Therefore we will work with A_ex = A + A_ideal in our database.

NB: All the data are expressed in natural units. As a consequence we will assume every quantity to be in natural units and drop the superscript "∗" in the following.

After splitting the dataset into training, validation and test sets, the data are scaled and our two NNs are built and fitted. In Fig. 2.24 the predictions of each model are plotted together with the reference values. These results are quite satisfactory: both methods converge to small losses. The training results are summed up in Table 2.1. The execution time refers to the time needed to build, compile and fit the model.


Figure 2.23: A_ex^ML prediction using (a) MLPRegressor; (b) keras

When training a NN, the weights are updated. As a consequence, to investigate the comparison of these methods a bit deeper, we can take a look at the weights obtained for each method. In particular, Eq. (1) in [23] gives us the expression of A_ex^ML as a function of the NN weights:

A_\mathrm{ex}^\mathrm{ML} = b_3 + \sum_{l=1}^{4} W(3,4,l,1) \tanh\left( b_2 + \sum_{j=1}^{5} W(2,3,j,l) \tanh\left[ b_1 + W(1,2,1,j)\,T + W(1,2,2,j)\,\rho \right] \right).    (2.2.18)

As a consequence, A_ex^ML β ≡ A_ex^ML / T can be plotted for different temperatures, similarly to Fig. 3 in [23], in Fig. 2.24. The energy is directly computed from Eq. 2.2.18.


Figure 2.24: Reduced excess Helmholtz free energy per temperature against density in a LJ system

For the highest temperature, the predicted reduced energies are identical. When the temperature diminishes, discrepancies appear. This is a direct consequence of the state of the LJ system at low temperatures: for low and high densities, the system is less stable. This can be directly read from the phase diagram given as Fig. 1 in [22].

Figure 2.25: Lennard-Jones system phase diagram taken from [22].

As a consequence of such system instabilities, the data are limited in this (ρ, T) area, leading to less accurate predictions. Still, the results obtained are satisfactory and the keras results are closer to the ones obtained by Desgranges and Delhommelle [23], even though the loss is higher than with MLPRegressor. The CPU time is also higher for keras. However, this is a known drawback of Keras and is the price to pay for all the perks the module offers [24].

2.2.4 The Behler-Parrinello method

In the last subsection we have discussed the description of a material with so-called macroscopic features. With the Behler-Parrinello approach we are able to get a more precise description of a material, using microscopic features based on the atomic positions within the material.


First proposed by J. Behler and M. Parrinello in 2007 [16], the Behler-Parrinello method aims to represent Potential Energy Surfaces (PESs) using an ANN. Such an ANN is expected to predict the energy from the atom positions within a given configuration. Then, from the predicted energy, the energy gradients and forces can be computed and the potential formed. Even though this method is expected to give more precise results, it has a cost: we need to build a neural system more complex than the one in Fig. 2.18(b). In this subsection, we will go through the definition of the microscopic features and the way the neural system must be created to be efficient in understanding material behaviours.

The choice of NNPs to build PESs is well motivated, as they require much less CPU time than DFT calculations. Moreover, the energy can be obtained from reference data with a great accuracy, and such NN energy expressions are unbiased, meaning that they can apply to all types of bonding.

However, one must recall that NNPs require a lot of training points and have very limited extrapolation capabilities. As a consequence, they need to be built carefully and clearly understood in order to give accurate results.

The NN structure for High-Dimensional Potentials

In Behler 2007 [16], the "high-dimensional" NNP is defined as a NNP that can be used to deal with thousands of atoms and all of their degrees of freedom (radial, angular). This way, very complex systems can be dealt with, with atoms of different types in large quantities. Until now, we have only discussed what we can call conventional NNPs (cf Fig. 2.20). From now on we will see what it takes to build a high-dimensional NNP.

Single feed-forward NN

First, a single feed-forward NN would not be efficient enough to predict the energy of a given configuration: the large number of degrees of freedom would lead to a very large number of adjustable weights and the fitting would become tedious. Moreover, such calculations would be computationally expensive.

Another point is quoted in [19] and has been investigated since 1998 [18]: in some materials, bonds should be completely equivalent, meaning that exchanging their positions must result in the same energy. However, a single feed-forward NN does not guarantee this symmetry. In other words, the order of the atoms matters in the input of the NN, which should not necessarily be the case.

Finally, with a single NN there is no way to add any atom to the system without changing all the weights.

A more complex structure

Thus, the proposition made to counteract all these flaws is to compute the total energy of the system from atomic energies, themselves predicted by individual atomic NNs. We would therefore have the total energy of the system E as:

E = \sum_{i=1}^{N_\mathrm{atoms}} E_i,    (2.2.19)

where N_atoms is the number of atoms composing the system and E_i corresponds to the individual atomic energy of atom i. These individual atomic energies correspond to the energy of an atom in a local environment (or configuration) up to a cut-off radius R_c, so only the relevant energy is kept. This is shown in Fig. 2.27.


Figure 2.26: Atomic environment. Fig. 2 in Behler 2015 [19].

The strategy here is to use NNs to predict the individual energies E_i^NN in order to obtain the total energy of a given configuration E^NN. However, the MSE (Mean Squared Error) functional we want to minimize is:

\Gamma = \frac{1}{N_\mathrm{struct}} \sum_{\sigma=1}^{N_\mathrm{struct}} \left[ E_\sigma^\mathrm{ref} - E_\sigma^\mathrm{NN} \right]^2,    (2.2.20)

where N_struct refers to the total number of local structures (or configurations) we have in our database. As a consequence, the calculation of E^NN must be included within the NN.

Note: According to Eq. 2.2.20, the MSE only depends on the energy. However, to get a better weight training and thus better predictions, the atomic forces can be used in the MSE calculation. Indeed, Eq. 2.2.20 can be replaced by the following MSE formula [19]:

\Gamma = \frac{1}{N_\mathrm{struct}} \sum_{\sigma=1}^{N_\mathrm{struct}} \left[ \left( E_\sigma^\mathrm{ref} - E_\sigma^\mathrm{NN} \right)^2 + \frac{\beta}{3N_\mathrm{atoms}} \sum_{j=1}^{3N_\mathrm{atoms}} \left( F_{j,\sigma}^\mathrm{ref} - F_{j,\sigma}^\mathrm{NN} \right)^2 \right],    (2.2.21)

where N_atoms refers to the number of atoms within the system and α and β are weights quantifying the respective importance of the energy and force predictions in the MSE calculation. Even more precision can be obtained using the configurational stresses S, which enter the cost function as [31]:

\Gamma = \frac{1}{N_\mathrm{struct}} \sum_{\sigma=1}^{N_\mathrm{struct}} \left[ \alpha \left( E_\sigma^\mathrm{ref} - E_\sigma^\mathrm{NN} \right)^2 + \frac{\beta}{3N_\mathrm{atoms}} \sum_{j=1}^{3N_\mathrm{atoms}} \left( F_{j,\sigma}^\mathrm{ref} - F_{j,\sigma}^\mathrm{NN} \right)^2 + \frac{\gamma}{6} \sum_{k=1}^{6} \left( S_{k,\sigma}^\mathrm{ref} - S_{k,\sigma}^\mathrm{NN} \right)^2 \right],    (2.2.22)

but once again we will not dig further in this direction.

Let us consider a material composed of a total of N atoms of two different types a and b. The whole set of atoms will be denoted as {a_1, a_2, a_3, ..., b_{N−2}, b_{N−1}, b_N}. The structure chosen to perform energy predictions from atomic positions is given in Fig. 2.28.

Our system takes as input an entire configuration of N atoms {a_i}_{i=1...N} with their spatial coordinates. Then, these coordinates are transformed by symmetry functions, which we will discuss in the next subsection. From these symmetry functions, which describe the local environment of each atom a_i, the atomic energy is predicted by a NN trained for each type of atom (or chemical element). Finally, the predicted energies are summed according to Eq. 2.2.19. By doing this, the output of the system is the total energy and the function defined in Eq. 2.2.20 is minimized.

Note: it is important to notice that each atomic NN is element dependent. Indeed, there is one unique NN for each type of atom a and b within a material. This unique NN is used to predict the energy of every atom of the corresponding type.



Figure 2.27: High-dimensional NNP structure : from atomic positions inputs to structure energy output for every atom of type a and b.
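To make this architecture concrete, here is a minimal Keras Functional-API sketch for a single-element system: one shared atomic sub-network maps each atom's symmetry-function vector to an atomic energy, and the atomic energies are summed into the configurational energy. All names and sizes are illustrative and this is not the exact implementation used later in this work:

import tensorflow as tf
from tensorflow.keras import layers

n_atoms, n_sym = 256, 8           # atoms per configuration, symmetry functions per atom

# shared atomic NN: the same weights are applied to every atom of the given element
atomic_nn = tf.keras.Sequential([
    layers.Dense(25, activation='tanh', input_shape=(n_sym,)),
    layers.Dense(25, activation='tanh'),
    layers.Dense(1),
])

inputs = layers.Input(shape=(n_atoms, n_sym))                   # G(a_1), ..., G(a_N)
atomic_energies = layers.TimeDistributed(atomic_nn)(inputs)     # E(a_1), ..., E(a_N)
total_energy = layers.Lambda(lambda e: tf.reduce_sum(e, axis=1))(atomic_energies)  # Eq. 2.2.19
model = tf.keras.Model(inputs, total_energy)
model.compile(optimizer='adam', loss='mse')                     # minimizes Eq. 2.2.20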

Symmetry functions

The structure and the energy of a system must remain invariant under rotation or translation of the system. However, Cartesian coordinates are not. Still, simulations only provide Cartesian coordinates computed for the whole system using periodic boundary conditions (pbc). As a consequence, we need to use symmetry functions to symmetrize our local structure within a cut-off radius R_c. These functions are functions of the distance r and the angle θ between atoms, which are invariant under rotation and translation.

In this part we will discuss the symmetry functions that are proposed by Behler [16]. These are shown in a general way and will be discussed in the more specific cases of Lennard-Jones system and aluminium in Chapter 4.

Cut-off function fc(r)

In order to only consider atoms whose interaction affects the i-th atom, we have to define a cut-off radius R_c beyond which the other interactions will be set to zero (since negligible).

Mathematically, to do so, we consider a cut-off function fc(rij) defined as :

f_c(r_{ij}) = \begin{cases} 0.5\left[ \cos\left( \frac{\pi r_{ij}}{R_c} \right) + 1 \right] & \text{if } r_{ij} \le R_c, \\ 0 & \text{if } r_{ij} > R_c, \end{cases}    (2.2.23)

where r_ij refers to the distance between an atom i and an atom j. Let us choose an atom a. The closer an atom is to a, the stronger its effect on this atom. If the atom is at a distance beyond R_c, it does not affect atom a.
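A minimal NumPy sketch of this cut-off function (illustrative only):

import numpy as np

def f_cutoff(r, rc):
    # Behler cut-off function of Eq. 2.2.23: smooth decay to zero at r = rc
    return np.where(r <= rc, 0.5 * (np.cos(np.pi * r / rc) + 1.0), 0.0)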

From this cut-off function, several types of many-body symmetry functions can be built. In his approach to constructing NNPs, Behler gives a list of symmetry functions that can be used [20], sorted into two categories: the radial symmetry functions and the angular symmetry functions.

Radial symmetry functions


Figure 2.28: Plot of the cut-off function for different cut-off radii r_c.

The first radial symmetry function given by Behler is the simplest one, given as the sum of the cut-off functions over the neighboring atoms:

G_i^1 \equiv \sum_{j=1}^{N_\mathrm{atoms}} f_c(r_{ij}).    (2.2.24)

A better alternative is to consider the product of these cut-off functions with Gaussian factors. Thus, such a Gaussian radial symmetry function will be defined according to a Gaussian width η and a shift R_s, as in Eq. 2.2.25.

G_i^2 \equiv \sum_{j=1}^{N_\mathrm{atoms}} \exp\left[ -\eta (r_{ij} - R_s)^2 \right] f_c(r_{ij}).    (2.2.25)

Taking a look at the term inside the sum for different parameters gives Fig. 2.30. Another alternative is to take a periodic radial symmetry function that can be defined using a cosine and a period parameter κ:

G_i^3 \equiv \sum_{j=1}^{N_\mathrm{atoms}} \cos(\kappa r_{ij}) f_c(r_{ij}).    (2.2.26)

Here, the parameter is the period of the cosine. A family of functions can easily be plotted for one neighbor.
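As an illustration, a short sketch of the two radial symmetry functions, reusing the f_cutoff function sketched above (all parameter values are illustrative):

import numpy as np

def g2_radial(r_ij, eta, r_s, rc):
    # Gaussian radial symmetry function of Eq. 2.2.25, summed over the neighbours j of atom i
    return np.sum(np.exp(-eta * (r_ij - r_s) ** 2) * f_cutoff(r_ij, rc))

def g3_radial(r_ij, kappa, rc):
    # periodic radial symmetry function of Eq. 2.2.26
    return np.sum(np.cos(kappa * r_ij) * f_cutoff(r_ij, rc))

r_ij = np.array([0.9, 1.1, 1.6, 2.4])   # distances to the neighbours of atom i
print(g2_radial(r_ij, eta=4.0, r_s=1.0, rc=3.5), g3_radial(r_ij, kappa=1.0, rc=3.5))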

Angular symmetry functions

The local structure cannot entirely be defined using only the radial distribution; one also needs the angular distribution of the neighbouring atoms. Fig. 2.32 depicts the local environment of an atom and the associated variables. Once again, in order to take the whole local environment into account, a sum is computed over all neighbors. The angular distribution is obtained by using periodic functions of θ_jik. Thus, Behler's proposals for such functions are given in Eqs. 2.2.27 and 2.2.28:

G_i^4 \equiv 2^{1-\zeta} \sum_{j \ne i} \sum_{k > j} (1 + \lambda \cos\theta_{jik})^{\zeta} \exp\left[ -\eta (r_{ij}^2 + r_{ik}^2 + r_{jk}^2) \right] f_c(r_{ij}) f_c(r_{ik}) f_c(r_{jk}).    (2.2.27)


Figure 2.29: Gaussian symmetry functions G_i^2 with several (a) widths; (b) shifts

G_i^5 \equiv 2^{1-\zeta} \sum_{j \ne i} \sum_{k > j} (1 + \lambda \cos\theta_{jik})^{\zeta} \exp\left[ -\eta (r_{ij}^2 + r_{ik}^2) \right] f_c(r_{ij}) f_c(r_{ik}).    (2.2.28)

Both functions are quite similar. Still, G_i^5 contains a bit more information in the sense that it also considers a triplet if atoms j and k are separated by a distance greater than R_c. Such triplets are completely relevant to take into account as long as atoms j and k stand within the sphere of radius R_c.

The angular part of these functions is the same and is defined as:

G_i^{\theta} \equiv 2^{1-\zeta} (1 + \lambda \cos\theta_{jik})^{\zeta}.    (2.2.29)

Such angular functions can be plotted for different normalization (or resolution) parameters ζ and orientation parameters λ. As one sees in Fig. 2.33, all the angular information is covered by these functions. The product of the radial and angular parts can also be plotted for several parameters if we fix, for example, the distance r_ij = r_jk = r_ik at 1/√3. Moreover, we fix the Gaussian width at η = 0.001.
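As an illustration, a minimal sketch of the angular symmetry function of Eq. 2.2.28, again reusing the f_cutoff function sketched above (names and values are illustrative):

import numpy as np

def g5_angular(r_ij, r_ik, theta_jik, zeta, lam, eta, rc):
    # angular symmetry function of Eq. 2.2.28, summed over the (j, k) neighbour pairs of atom i
    angular = 2.0 ** (1.0 - zeta) * (1.0 + lam * np.cos(theta_jik)) ** zeta
    radial = np.exp(-eta * (r_ij ** 2 + r_ik ** 2)) * f_cutoff(r_ij, rc) * f_cutoff(r_ik, rc)
    return np.sum(angular * radial)

r_ij = np.array([1.0, 1.2]); r_ik = np.array([1.1, 1.4]); theta = np.array([1.0, 2.1])
print(g5_angular(r_ij, r_ik, theta, zeta=2.0, lam=1.0, eta=0.001, rc=3.5))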


Figure 2.30: Periodic radial symmetry function G_i^3 for different periods


Figure 2.31: Atomic neighbouring within a structure

Increasing ζ makes the curves thinner, while increasing R_c makes them smaller, which makes complete sense.

Building a feature

All the mentioned symmetry functions are defined by a few parameters, and each set of parameters defines a function. Therefore, the local environment of an atom can be described using as many symmetry functions as we need. Identifying the number of symmetry functions needed and the associated parameters can be challenging and must be done very carefully, so that the whole local environment is accounted for and the energy can be precisely predicted. Both radial and angular features should be taken to cover all the information needed to describe the local structure correctly.

As a consequence, for every studied case, the set of functions {G_i^k}_k must be carefully thought out: the functions (defined by their parameters) must not leave aside any radial or angular piece of information. Moreover, it was observed empirically that a local environment can be well described using 1/3 of radial symmetry functions and 2/3 of angular symmetry functions.

Note that there is no unique choice of function set; as long as a set describes the environment well enough, it is well suited. In §4 we will see how to build a function set in an example case.


(a)

(b)

θ Figure 2.32: Angular part Gi with orientation (a) λ = −1; (b) λ = +1

Computing the forces

Now that the local environment of an atom can be described, atomic energies can be predicted with our atomic NNs, and so can the configurational energy, according to Eq. 2.2.19.

In Molecular Dynamics, interactions are described by a force field, which is by definition the gradient of the potential energy surface :

F = -\nabla E,    (2.2.30)

where E ≡ E(r) corresponds to the PES, i.e. the potential energy of the system for any atomic configuration r = (r_1, r_2, ..., r_{N_atoms}), where N_atoms is the total number of atoms.

Figure 2.33: Angular symmetry function G_i^5 with orientation (a) λ = −1; (b) λ = +1

Since the energy is given as the sum of atomic energies E_i, which are obtained considering the local atomic environment within a sphere of radius R_c, the gradient can be expanded so that the forces acting on an atom k in the direction α, for α = (x, y, z), are given as:

F_{k,\alpha} = - \sum_{i=1}^{N_\mathrm{atoms}} \frac{\partial E_i}{\partial r_{k,\alpha}} = - \sum_{i=1}^{N_k + 1} \frac{\partial E_i}{\partial r_{k,\alpha}},    (2.2.31)

where N_k denotes the number of neighbours of atom k within the sphere of radius R_c, including atom k.

Making the forces available is of major interest in MD simulations. Since the NNPs have a well-defined functional form, the analytic derivatives are readily available. Moreover, as the energy is a NN prediction, its derivatives with respect to the symmetry functions can be computed using back-propagation within the NN, and the symmetry function derivatives with respect to the Cartesian coordinates can be computed analytically.


Thus, the forces can be expressed and obtained according to Eq. 2.2.32:

F_{k,\alpha} = - \sum_{i=1}^{N_k} \frac{\partial E_i}{\partial r_{k,\alpha}} = - \sum_{i=1}^{N_k} \sum_{s=1}^{N_G} \frac{\partial E_i}{\partial G_{i,s}} \frac{\partial G_{i,s}}{\partial r_{k,\alpha}},    (2.2.32)

where N_G stands for the number of symmetry functions.

Note: we have here discussed the case of forces. However, the stresses can also be obtained using this method [31]:

S_{\alpha\beta} = - \frac{1}{V} \sum_{i=1}^{N_k} \frac{\partial E_i}{\partial \varepsilon_{\alpha\beta}} = - \frac{1}{V} \sum_{i=1}^{N_k} \sum_{s=1}^{N_G} \sum_{\gamma=1}^{N_k} \frac{\partial E_i}{\partial G_{i,s}} \frac{\partial G_{i,s}}{\partial r_{\gamma,\alpha}} \frac{\partial r_{\gamma,\alpha}}{\partial \varepsilon_{\alpha\beta}},    (2.2.33)

where ε_{αβ} are the components of the strain tensor (the same for each atom within a given configuration) and V the volume of the system. Still, we will not follow such a path in this thesis and will stay on the prediction of the PES.

Chapter 3

Study of diffusion-entropy relation within Lennard-Jones systems

In this part our purpose is to understand the behaviour of the diffusion relative to the entropy of a Lennard-Jones system, by exploiting the NN we have built and trained to predict the Helmholtz free energy in §3.4.5.

This will be done by running LAMMPS Molecular Dynamics (MD) simulations and collecting the data for different values of temperature T∗, density ρ∗ and pressure P∗, the ∗ referring to reduced units. These simulations will be run with a Lennard-Jones potential, with a cut-off radius r_c = 4σ and Lennard-Jones parameters set at ε = σ = 1. These simulations will give us the value of the diffusion D∗ and the potential energy U_pot ≡ U at different thermodynamic states.

Moreover, the trained NN allows us to compute the Helmholtz free energy A^ML (modulo A_ideal) for a given couple (ρ∗, T∗).

With this free energy and the energy obtained from the simulation we will be able to get the entropy S for different thermodynamic states and compare it to the diffusion. According to Eq.(3) from Desgranges [23] for a given configuration i, the entropy is given as :

S_i = \frac{U_i - A^\mathrm{ML}(\rho_i^*, T_i^*)}{T_i^*}.    (3.0.1)

3.1 Building consistent LAMMPS simulations

In order to get the most precise results, we need to build simulations that give the same results as those obtained by Johnson [22]. To do so, we will compare the obtained pressures and energies, using NVT simulations.

The system is overheated to be completely melted and then is cooled down to a state (ρ_i∗, T_i∗). Then, it is thermalized to stabilize before the production begins. To obtain the exact same results as Johnson, we have chosen to do the simulation using a cut-off radius of r_c = 4σ and to shift the Lennard-Jones potential from its origin. The point is now to check that the obtained energies are the same as Johnson's and that, in the production stage, the thermodynamic parameters remain steady and their distribution follows a Gaussian law, as expected from statistical mechanics.

Checking the energy

The energy obtained from the simulations will therefore have to be shifted back, by adding firstly the potential shift given as

\phi_\mathrm{shift} = \frac{1}{2} N_\mathrm{atom,LJ} U_\mathrm{LJ}(r_c) = \frac{1}{2} \frac{4\pi}{3} \rho^* r_c^3 \, U_\mathrm{LJ}(r_c),    (3.1.1)

where Natom,LJ corresponds to the number of atoms in a Lennard-Jones potential sphere of radius rc and secondly the potential correction mentioned in Eq.(3) in Johnson [22] :

" 9  3# 8 ∗ σ σ Ulrc = πρ − 3 . (3.1.2) 9 rc rc

Note that since the Lennard-Jones potential is valid for a pair of atoms, we have a factor 1/2 in φ_shift, avoiding double counting of the interaction.

Thus we have: U ≡ U_simulation + U_lrc + φ_shift.

Using this, we have been able to build consistent simulations in terms of energy : the obtained energy is coherent with the values obtained by Johnson.
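As an illustration, a small Python sketch of these corrections in reduced LJ units (ε = σ = 1); the function name and the explicit form of U_LJ(r_c) are assumptions made for this example:

import math

def lj_energy_corrections(rho, rc, sigma=1.0, epsilon=1.0):
    # LJ potential evaluated at the cut-off radius
    u_lj_rc = 4.0 * epsilon * ((sigma / rc) ** 12 - (sigma / rc) ** 6)
    # potential shift of Eq. 3.1.1 (factor 1/2 to avoid double counting)
    phi_shift = 0.5 * (4.0 * math.pi / 3.0) * rho * rc ** 3 * u_lj_rc
    # long-range correction of Eq. 3.1.2
    u_lrc = (8.0 / 9.0) * math.pi * rho * ((sigma / rc) ** 9 - 3.0 * (sigma / rc) ** 3)
    return phi_shift, u_lrc

phi_shift, u_lrc = lj_energy_corrections(rho=0.8, rc=4.0)
# U = U_simulation + u_lrc + phi_shift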

Parameters oscillations

Figure 3.1: Gaussian fit of the pressure

Now, we can check the fluctuations of the internal quantities at the equilibrium state. These must oscillate around a constant value, following a Gaussian distribution according to statistical physics. As an example, Fig. 3.1 shows the Gaussian fit of the pressure at equilibrium for a test simulation.

Here we can verify that the pressure oscillates following a Gaussian distribution around a mean value of −0.08127.

This is one example among many similar ones. Combining the different checks, we have been able to build simulations that are completely consistent with those run by Johnson in 1993 and that represent an actual liquid state.

3.2 First approach of the relation diffusion vs entropy

According to works from Rosenfeld [25] and Jakse & Pasturel [26], there exists a strong relation between entropy and diffusion in fluids and liquid metals. Indeed, both articles mention A universal scaling law for atomic diffusion in condensed matter, Dzugutov 1996 [27], which states a dimensionless form of the diffusion coefficient as a function of the two-body excess entropy S_2 of the liquid phase, which represents up to 80 to 90% of the liquid's entropy:

D^* \approx 0.049 \exp(S_2) \approx 0.049 \exp(S),    (3.2.1)

where S is the excess entropy, containing in principle all contributions beyond the two-body one S_2. Both articles take different approaches to verify this relation.

Jakse & Pasturel approach The dimensionless diffusion coefficient is directly computed using uncorrelated binary collisions de- scribed by the Enskog theory such that :

1 D∗ 1 πk T  2 = , where Γ = 4σ2g(σ)ρ B , D Γσ2 m i.e. 1 D∗ 1 1  m  2 = 4 , (3.2.2) D 4 σ g(σ)ρ πkBT where m is the atomic mass set to 1 here, ρ the atomic density and σ the hard-sphere diameter. All the parameters needed to compute D∗ are obtained from the MD simulations.

Note that in reduced units we have kB = 1.
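A small sketch of this reduction, with the quantities D, ρ, T and g(σ) taken from the MD simulations (the function name is illustrative):

import math

def d_star_enskog(D, rho, T, g_sigma, sigma=1.0, m=1.0, kB=1.0):
    # dimensionless diffusion coefficient of Eq. 3.2.2 (Enskog reduction, kB = 1 in reduced units)
    return D * 0.25 / (sigma ** 4 * g_sigma * rho) * math.sqrt(m / (math.pi * kB * T))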

Rosenfeld approach

The Rosenfeld approach is divided into two parts. First, the dimensionless diffusion is defined differently, computed according to:

\frac{D^*}{D} = \rho^{1/3} \left( \frac{m}{k_B T} \right)^{1/2}.

This dimensionless coefficient was shown to behave according to:

D^* \approx 0.6 \exp(0.8 S)    (3.2.3)

for all strongly coupled simple fluids (−S ≳ 1). Then the Dzugutov relation is recovered using another dimensionless diffusion coefficient such that:

\frac{D^+}{D^*} = \frac{2}{\sqrt{\pi}} \left( \frac{\pi}{6} \right)^{4/3} \left( \frac{a}{\sigma} \right)^4 \frac{1}{g(\sigma)} \propto \exp(0.2 S),

where a = \left( \frac{3}{4\pi\rho} \right)^{1/3} is the Wigner-Seitz radius. Finally, D^+ \approx 0.05 \exp(S), according to Eq. (25) in Rosenfeld 1999 [25].

Once again, all the quantities needed to compute these dimensionless coefficients are obtained from the MD simulations.

Note: as a consequence of the previous equations we have

\frac{D^+}{D} = \frac{2}{\sqrt{\pi}} \left( \frac{\pi}{6} \right)^{4/3} \left( \frac{a}{\sigma} \right)^4 \frac{1}{g(\sigma)} \, \rho^{1/3} \left( \frac{m}{k_B T} \right)^{1/2} = 2 \left( \frac{\pi}{6} \right)^{4/3} \frac{a^4 \rho^{1/3}}{\sigma^4 g(\sigma)} \left( \frac{m}{\pi k_B T} \right)^{1/2}.

Using the expression of the Wigner-Seitz radius, we get

\frac{D^+}{D} = 2 \left( \frac{\pi}{6} \right)^{4/3} \left( \frac{3}{4\pi} \right)^{4/3} \frac{1}{\sigma^4 g(\sigma) \rho} \left( \frac{m}{\pi k_B T} \right)^{1/2} = 2 \left( \frac{3}{6 \times 4} \right)^{4/3} \frac{1}{\sigma^4 g(\sigma) \rho} \left( \frac{m}{\pi k_B T} \right)^{1/2},

and so

\frac{D^+}{D} = 2 \left( \frac{1}{8} \right)^{4/3} \frac{1}{\sigma^4 g(\sigma) \rho} \left( \frac{m}{\pi k_B T} \right)^{1/2}.

Finally we have

\frac{D^+}{D} = \frac{1}{8} \frac{1}{\sigma^4 g(\sigma) \rho} \left( \frac{m}{\pi k_B T} \right)^{1/2},    (3.2.4)

which is half the relation obtained from the Jakse & Pasturel approach.

Obtained results

To study both approaches we will try to recover equations 3.2.1 and 3.2.3 from the data of our simulations. The procedure is the following: from the LAMMPS simulations described above, for each chosen thermodynamic state, we determine the potential energy U, the pair correlation function g(r), for which we determine its maximum g(σ), and the mean square displacement, from which we get the diffusion coefficient. The important point here is that we can calculate what we call an "exact" entropy from Eq. 3.0.1, using the simulated U and the predicted free energy A^ML from the trained NN. This removes, in this analysis, the uncertainty due to the approximate nature of the entropy encountered in previously published works on this issue. The reduced diffusion coefficient is determined from the diffusion coefficient of the same simulation, so that we can explore the diffusion-entropy universal law in a consistent way.

Running the simulations and computing the Helmholtz free energy for different thermodynamic states from the NN on the isobar P∗ = 0 allowed us to obtain Fig. 3.2:

Figure 3.2: Diffusion-entropy relation on isobar P ∗ = 0.

We directly observe the decreasing tendency for each curve. In the Rosenfeld approach, the curves are almost the same while a small difference persists. However, the results are consistent along the isobar and we can here confirm that diffusion behaves as a negative exponential function of the entropy.


3.3 Behaviour along characteristic curves

Now that we have validated our simulations, we want to see how the diffusion evolves as a function of the entropy along curves of constant density ρ∗, temperature T∗ and pressure P∗. To do so, simulations are run and the results are summed up in the following subsections.

Running these simulations allows us to move along the phase diagram given in Fig. 2.26.

In this diagram we recover the liquid states above the curve and the solid below. On the right, at ”high” density (i.e. ρ∗ ≥ 1) we have the undercooling area where the solid state can be obtained at ”high” temperature (T ∗ ≥ 3).

Note: in the following parts, D* Dzugutov will refer to D∗ = 0.05 exp(S) and D* approx Rosenfeld to D+ = 0.6 exp(0.8S).

Isochores

As an example we have chosen four densities and decided to plot the diffusion vs the entropy for ρ∗ = 0.6, 0.8, 1 and 1.2, with a constant number of atoms equal to 864. For each density, the plots are obtained at different temperatures T∗ = 0.6, 0.8, 1, 2, 4 and 6.

For the two lowest densities, the results are quite consistent. However, at high density, things become messier, especially for the smallest entropies, i.e. small temperatures. Indeed, this makes sense: according to the phase diagram, such parameters correspond to the undercooling area. In this area, we have a solid state and the Helmholtz free energy is extrapolated. Therefore, the entropy is not well estimated and, as a consequence, the relation between diffusion and entropy breaks down.


Figure 3.3: Diffusion vs entropy (a) ρ∗ = 0.6; (b) ρ∗ = 0.8; (c)ρ∗ = 1.0; (d) ρ∗ = 1.2

As the density increases, both diffusion and entropy decrease (so −S increases). There is no surprise in that, since a denser state is more constrained and therefore the structure is better defined and less homogeneous.

Figure 3.4: (a) D∗ = 0.05 exp(S); (b) D∗ = 0.6 exp(0.8S) approximation for different ρ∗

Isotherms


Figure 3.5: Diffusion vs entropy (a) T ∗ = 0.6; (b) T ∗ = 2; (c) T ∗ = 4; (d) T ∗ = 6

Following the exact same pattern as in the previous subsection, we can plot the curves on isotherms by varying the density. Above are plotted the results for T∗ = 0.6, 2, 4 and 6.

These plots are obtained with densities ρ∗ = 0.6, 0.8, 1 and 1.2. We see that Fig. 3.5(a) corresponds to a solid state for 3 out of 4 points, so the exponential law cannot be seen. The other figures show discrepancies, but the exponential behaviour is explicitly visible. These discrepancies seem to increase with temperature. This might be explained by the fact that Johnson's database, on which we have trained our neural network algorithms, has more data at low temperature.



Figure 3.6: (a) D∗ = 0.05 exp(S); (b) D∗ = 0.6 exp(0.8S) approximation for different T ∗

Increasing the temperature makes both diffusion and entropy increase. This is in opposition with the case seen with the isochores: increasing temperature should lead to a liquid state that is more likely to be disorganized.

Isobars

Now, fixing P∗ for the production and choosing different temperatures T∗ (or densities ρ∗) can give us the relation between diffusion and entropy along isobars. These are plotted in Fig. 3.7.

Note that the case P∗ = 0 differs from the others, as simulations at this pressure are very delicate. We have to consider a range of temperatures T∗ ∈ [0.6; 1.2], while the other pressures were studied with T∗ ∈ [1; 4].


Figure 3.7: Diffusion vs entropy (a) P ∗ = 0; (b) P ∗ = 0.5; (c) P ∗ = 2; (d) P ∗ = 5

Along isobars, the results are very consistent. The curves fit very nicely at P∗ = 0, confirming once again that our simulations and calculations are good. The other plots are also good, even if some discrepancies appear at high temperature again. The exponential law seems to break down as the entropy goes to zero, i.e. as we go towards the solid state. Increasing the pressure reduces the diffusion and the entropy. The relation between both is plotted in Fig. 3.8. A higher pressure corresponds to more constraints on the structure. Therefore, the atoms reorganize and the structure is better defined, similarly to the isochore case.

Figure 3.8: (a) D∗ = 0.05 exp(S); (b) D∗ = 0.6 exp(0.8S) approximation for different P ∗

3.4 Conclusion on the entropy-diffusion relation

Most of the curves we have seen tend to confirm the expected relation between diffusion and entropy, whether along isotherms, isobars or isochores. It turns out that the most determinant parameter is the density, as we might have expected from the phase diagram. The universal entropy-diffusion laws proposed by Rosenfeld and Dzugutov, and later refined by Jakse and Pasturel, seem to hold in a range of moderate densities in the liquid state.

This study allowed us to see that the exponential law breaks in the undercooling area. Moreover, at high temperature, discrepancies are systematic.

Chapter 4

Construction of a high-dimensional neural network in the Behler-Parrinello approach

4.1 Lennard-Jones NNP

From here, we have all the elements and the methods needed to build a NNP for a Lennard-Jones (LJ) system. In this section we will describe the whole process we have followed to build this first NNP.

Let us first recall the purpose of such an NNP: to use a trained NN to predict the configurational energy of a system from its atomic positions. This matters since simulations can be very expensive in terms of CPU time.

4.1.1 Constructing a dataset

In order to cover a significant part of the phase diagram (cf. Fig. 2.26), including the dense liquid and solid phases, we have simulated different states with T∗ ∈ [0.001, 3.801] and ρ∗ ∈ [0.5, 1.2], with temperature step ∆T∗ = 0.2 and density step ∆ρ∗ = 0.05, and computed the data for 30 different thermodynamic states. These simulations were run with a cut-off radius Rc = 3.5σ in LJ units, in a box of 256 atoms.

Dump and log files were generated. Production files were then created from the log files, containing all the data on the energy and the stress. Using the position and force data from the dump files, we created the dataset files for the training of the NN and built an NNP as described below.

Sampling configurations

Our NNP must be representative of every possible configuration and energy. Therefore, it is convenient to have a very large dataset. However, one must remain careful not to overfit the NN while training it.

From previous results, for example Fig. 2.2, we know that the energy distribution within a stationary material is Gaussian around a mean value. If we considered every configuration in our training dataset, our NN would tend to predict energies very close to the mean value, since it would have seen mostly data in that energy range.

To counteract this, configurations are sampled based on the strength of the atomic interaction for an arbitrary atom i. For each configuration, the magnitude of the total force Fi applied on atom i is computed and compared to a threshold α chosen arbitrarily. A dynamic timestep τ then allows us to sample the configurations regularly in any case.

Algorithm 1: Sampling configurations regarding the atomic interactions.
Data: all system configurations
Result: sampled configurations
α       # force threshold to sample the data
τmax    # maximum delay before sampling a new configuration, even if the forces are not above the threshold
τ = 0   # dynamic timestep for the sampling
for each configuration do
    while τ < τmax do
        Compute Fi = sqrt(Fi,x² + Fi,y² + Fi,z²)
        if Fi ≤ α then
            if floor(α/Fi) ≤ τmax then
                τ = τ + floor(α/Fi)
            else
                τ = τ + τmax
            end
        else if Fi > α then
            Sample the configuration; τ = 0
        end
    end
    Sample the configuration; τ = 0
end

Two parameters are fixed externally: α and τmax. The idea behind the sampling algorithm is that, since we must cover all the information but not too much of it at the centre of the Gaussian (cf. Fig. 2.2), we only keep configurations with sufficiently high energies, i.e. sufficiently strong interactions. Thus, we compare the force magnitude to a threshold α. Still, the central energies must not be forgotten; these are therefore sampled regularly using the timestep τmax. By default we set α equal to 3 times the mean force exerted on the i-th atom over all configurations, and τmax to 10.

This algorithm is written explicitly above and is based on the work of A. Pukrittayakamee et al., 2009 [28].
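As an illustration, below is a minimal Python sketch of this sampling logic. The helper get_force_on_atom, returning the force vector on the reference atom of a configuration, is a hypothetical placeholder; this is a simplified reading of Algorithm 1, not the exact thesis implementation.

import math

def sample_configurations(configurations, get_force_on_atom, alpha, tau_max=10):
    """Return the indices of the configurations kept by the force-based sampling."""
    sampled = []
    tau = 0  # dynamic timestep
    for idx, conf in enumerate(configurations):
        fx, fy, fz = get_force_on_atom(conf)      # total force on the reference atom i
        f_i = math.sqrt(fx**2 + fy**2 + fz**2)
        if f_i > alpha:
            sampled.append(idx)                   # strong interaction: always keep
            tau = 0
        else:
            # weak interaction: increase the delay, faster for weaker forces
            tau += min(math.floor(alpha / f_i), tau_max) if f_i > 0 else tau_max
            if tau >= tau_max:
                sampled.append(idx)               # keep one configuration regularly anyway
                tau = 0
    return sampled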

Compute the features {G_i^k}

At this point, we have a dataset containing the positions, the energies, the forces and the stresses of many configurations. Recalling Fig. 2.28, the next step is to compute the symmetry functions in order to train each atomic NN.

Choosing the symmetry functions

We have already discussed how important it is to choose the symmetry functions appropriately. Furthermore, we know that we must consider enough functions so that the local environment of every atom is described well enough. To do so, we plot the symmetry functions for different parameters in Fig. 4.1.


Figure 4.1: Choice of G2 functions for the LJ NNP, varying (a) η; (b) Rs

Drawing all these radial functions on the same plot allows us to see that the whole radial information is covered by these functions, from 0 to the LJ radius of 3.5σ. From an angular point of view, the same work can be done, and the chosen parameters give us the functions drawn in Fig. 4.3. As previously, we can plot all these angular functions on a single graph in Fig. 4.4. This way we see that most of the angular information is covered.

Based on previous work, for example [30], we have decided to work with two types of symmetry functions: one radial, G2, and one angular, G5. With these two functions, the whole local environment can be described using suitable parameters.

We have chosen to consider a total of 32 symmetry functions: 12 radial and 20 angular (in order to respect an approximate proportion of 1/3 vs 2/3). This choice is motivated by the fact that we must choose enough functions for the NN to predict the energy correctly, while angular symmetry functions can be quite expensive in terms of CPU time, so we limit their number. Still, such a description reproduces the local environment of the atoms well.


Figure 4.2: Choices of radial symmetry functions for LJ NNP

Rc(σ)   Rs(σ)   η(σ^-2)
2.5     2.22    14
2.5     1.85    14
2.5     1.48    14
2.5     1.11    14
2.5     0.74    14
2.5     0.37    14
2.5     0       14
2.5     0       5
2.5     0       2
2.5     0       0.8
2.5     0       0.3
2.5     0       0.01

Table 4.1: List of G2 function parameters for the LJ features

Tables 4.1 and 4.2 summarize the parameters used for the symmetry functions.

All these parameters have been chosen to correctly reproduce the local environment of each atom within the material. While the simulations were run with a cut-off radius of 3.5σ in LJ units, we have chosen smaller cut-off radius values for our symmetry functions. Within the newly chosen radii, Fig. 4.2 shows that the radial symmetry functions take non-zero values within the range r ∈ [0, 2.5σ]. Fig. 4.4 directly shows that the angular symmetry functions cover all the angular information over the range θjik ∈ [0°, 360°].

Computing the feature values

From the simulation data we have the Euclidean coordinates xi, yi, zi of each atom i within the simulation box.


Figure 4.3: Choice of G5 functions for the LJ NNP for (a) λ = −1; (b) λ = +1

The distance r_{ij} between two atoms i and j is given by

r_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2},    (4.1.1)

and within a triplet, the angle θ_{jik} is defined according to

\cos(\theta_{jik}) = \frac{r_{ij}^2 + r_{ik}^2 - r_{jk}^2}{2\, r_{ij} r_{ik}}.    (4.1.2)

Eqs. 4.1.1 and 4.1.2, together with the definitions of G2 and G5, allow us to obtain, for each atom of any configuration, a set of 32 values that we will call a feature. For an atom i we denote the associated feature G(ai).
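To make the feature computation concrete, here is a minimal numpy sketch of the G2 and G5 symmetry functions following Behler's definitions, evaluated for a single atom i from an (N, 3) array of positions. Periodic images and the exact thesis implementation details are left out, so this is an illustrative sketch only.

import numpy as np

def cutoff(r, rc):
    """Cosine cutoff function f_c(r), zero beyond rc."""
    return np.where(r <= rc, 0.5 * (np.cos(np.pi * r / rc) + 1.0), 0.0)

def g2(positions, i, eta, rs, rc):
    """Radial symmetry function G2 for atom i."""
    rij = np.linalg.norm(positions - positions[i], axis=1)
    rij = np.delete(rij, i)  # exclude the atom itself
    return np.sum(np.exp(-eta * (rij - rs) ** 2) * cutoff(rij, rc))

def g5(positions, i, eta, zeta, lam, rc):
    """Angular symmetry function G5 for atom i (no r_jk term, unlike G4)."""
    total = 0.0
    neighbours = [j for j in range(len(positions)) if j != i]
    for a, j in enumerate(neighbours):
        for k in neighbours[a + 1:]:
            v_ij = positions[j] - positions[i]
            v_ik = positions[k] - positions[i]
            r_ij, r_ik = np.linalg.norm(v_ij), np.linalg.norm(v_ik)
            cos_theta = np.dot(v_ij, v_ik) / (r_ij * r_ik)
            total += ((1.0 + lam * cos_theta) ** zeta
                      * np.exp(-eta * (r_ij ** 2 + r_ik ** 2))
                      * cutoff(r_ij, rc) * cutoff(r_ik, rc))
    return 2.0 ** (1.0 - zeta) * total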


Figure 4.4: Choices of angular symmetry functions for LJ NNP

The training file

The main purpose of our work is to predict the energy from the positions. Still, it is also common to consider forces and stresses in this process, to get a better precision on the prediction. As a consequence, we have built an interface that the user can use to select the quantities to include in the training file.

The relevant quantities, namely the energy, the force vectors and the stress tensors, are respectively extracted from the production and dump files for classical MD simulations, and from the OUTCAR file for ab initio simulations. Finally, the NN is trained with a .csv file which has between 33 and 42 columns for each atom i (the 12 + 20 feature coordinates G(ai), one energy E ≡ Econfiguration, 3 forces (Fx,i, Fy,i, Fz,i) and 6 stresses (σxx,i, σyy,i, σzz,i, σxy,i, σxz,i, σyz,i)). Note that the energy E is a configurational energy (which is the quantity we want to predict); thus, every atom of a given configuration has the same E.
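As an illustration of this file layout, here is a minimal sketch of how such a .csv could be written; the column names, helper function and file name are illustrative assumptions, not the exact interface implemented in the thesis code.

import csv

def write_training_file(path, rows, with_forces=False, with_stresses=False):
    """Write one row per atom: 32 features, the configurational energy, and optional extras."""
    header = [f"G{k}" for k in range(1, 33)] + ["E"]
    if with_forces:
        header += ["Fx", "Fy", "Fz"]
    if with_stresses:
        header += ["sxx", "syy", "szz", "sxy", "sxz", "syz"]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

# Example: one atom with 32 (dummy) feature values and the configurational energy only.
row = [0.0] * 32 + [-6.02]
write_training_file("lj_training.csv", [row])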

We train the NN on a set built by considering 360 configurations (of N = 256 atoms each) of a LJ system in each of the thermodynamic states previously mentioned. Thus, we have 360 × 30 = 10800 configurations.

From the sampling we have applied, 9035 configurations have been kept, for a total of 9035*256 = 2312960 atomic features.

4.1.2 Building the High-Dimensional NNP (HDNNP)

Recalling the structure given in Fig. 2.28, we here want to build an HDNNP with only one type of atom a, for a LJ system. Thus, every atomic neural network will be identical. The atomic positions pass through the HDNNP so that the symmetry functions can be computed and the atomic energies predicted. Finally, the configurational energy is predicted.

Atomic Neural Networks

We have already discussed the structure of the atomic NNs, which are expected to be MLPs. Since these are all identical in our work, they all share the same weights.


η(σ^-2)   ζ    Rc(σ)   λ
0.01      1    1.6     1
0.01      1    2.5     1
0.01      2    1.6     1
0.01      2    2.5     1
0.01      4    1.6     1
0.01      4    2.5     1
0.01      16   1.6     1
0.01      16   2.5     1
0.01      64   1.6     1
0.01      64   2.5     1
0.01      1    1.6     -1
0.01      1    2.5     -1
0.01      2    1.6     -1
0.01      2    2.5     -1
0.01      4    1.6     -1
0.01      4    2.5     -1
0.01      16   1.6     -1
0.01      16   2.5     -1
0.01      64   1.6     -1
0.01      64   2.5     -1

Table 4.2: List of G5 function parameters for the LJ features

Each atomic NN must have 6 sets of weights of respective sizes (32, h1), (h1,), (h1, h2), (h2,), (h2, 1) and (1,), if we consider an atomic NN as an MLP with two hidden layers of sizes h1 and h2. There are two sets for each weighted layer, corresponding to the weights w and the biases b.

Structure of the atomic NN

The atomic NN must be trained using the cost function defined in Eq. 2.2.20. As a consequence, it must be trained using a whole configuration, to predict the configurational energy.

Our NN must take as input the features of an entire configuration and return the predicted energy, obtained using a set of weights of sizes (32, h1), (h1,), (h1, h2), (h2,), (h2, 1) and (1,).

To do so, we build a structure that formally reproduces the one shown in Fig. 2.28, with N inputs corresponding to our atoms, 2 successive hidden layers that predict the N atomic energies, and finally an output layer in which the atomic energies are summed to give the configurational energy. The key point in the training is that the NN should have only two hidden layers for each atomic input. It is exactly the same as building N subnetworks (one per atom) where the hidden layers of each subnetwork share the same weights as the others.

An example of such a structure can be plotted using Keras. Here we considered 4 atoms and used two hidden layers of size 5 each. In such an NN structure, all atomic energies are predicted with the same weights (since the feature inputs are processed by the same layers). They are gathered as inputs of the last layer, which is a simple unweighted sum as defined in Eq. 2.2.19. Finally, the weights are updated with respect to the configurational energy and the MSE is obtained according to Eq. 2.2.20.


Figure 4.5: Example of the training structure of an atomic NN for N = 4

Note that there is no direct way to obtain the atomic energies from the training of the NN with this method.

Keras implementation

To build such a structure, we used a different approach from the one shown in §3.4. As already discussed, Keras models can be built in two ways: with the Sequential model or with the Functional API. The first work was done using the Sequential model: we stacked layers and the information propagated through them. To design the model of Fig. 4.5 we need more flexibility, which can be obtained with the Functional API.

Let us first create the inner layers of our atomic NN. This can be done independently of the inputs and outputs as follows:

layer_h1 = Dense(hidden_layer_sizes[0], input_shape=(N_features,), activation=activation_function, name='atom_h1')
layer_h2 = Dense(hidden_layer_sizes[1], activation=activation_function, name='atom_h2')
layer_out = Dense(1)

Here, N_features refers to the total number of features (32 in our case) and hidden_layer_sizes to the sizes of the hidden layers of the MLP. The last layer is a Dense layer of size 1, since we predict a single value for each atomic NN: the atomic energy Ei.

Note that, since we use Dense layers, the weights are initialized with a Glorot uniform distribution and the biases to zero.

Now that the layers have been created, every atomic input must propagate through them. For each atom i we can then create a subnetwork by linking the layers as follows:

subnet_in = Input(shape=(N_features,))
subnet_h1 = layer_h1(subnet_in)
subnet_h2 = layer_h2(subnet_h1)
subnet_out = layer_out(subnet_h2)

Then, we need to store the input and output layers of each subnet in order to build the full model:

sub_networks_in.append(subnet_in)
sub_networks_out.append(subnet_out)

From there, we have 256 subnetworks with their respective inputs and outputs, all sharing the same layers. These must now be connected to build the full structure of Fig. 4.5. First we define the system output as the sum of the subnet outputs:

out = add(sub_networks_out)

Then, we simply build the model from the subnet inputs and the summed output, and compile it with adam as optimizer and the MSE as the loss:

merged_model = Model(sub_networks_in, out)
merged_model.compile(optimizer=optimizer_adam, loss='mean_squared_error')

Finally, we can save the model in order to load back the weights or the model characteristics whenever needed:

merged_model.save(model_label)

Training of the model

Now that the model is built, the procedure is exactly the same as in §3.4. We define our EarlyStopping based on the val_loss quantity, the loss on the validation set (equal to 20% of the training set in our case), using:

callbacks = [EarlyStopping(monitor='val_loss', min_delta=1e-5, patience=10, mode='min')]

Such a callback stops the training if val_loss does not improve by at least 10^-5 for 10 consecutive epochs. Finally, we fit the model using our hyperparameters:

fit = merged_model.fit(train_X, train_y, validation_split=0.2, batch_size=32, shuffle=True, epochs=2000, verbose=2, callbacks=callbacks)

Predictions can then be made using:

predict_y = merged_model(test_X)
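As noted earlier, this training only exposes the sum of the atomic energies. One possible workaround, given here as a hedged sketch rather than the thesis implementation, is to reassemble the shared layers (layer_h1, layer_h2 and layer_out defined above) into a single-atom model after training, so that per-atom energies can be evaluated:

from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

# Single-atom model reusing the trained, shared layers.
atom_in = Input(shape=(N_features,))
atom_energy = layer_out(layer_h2(layer_h1(atom_in)))
atomic_model = Model(atom_in, atom_energy)

# E_i for one atom, given its feature vector of shape (1, N_features):
# e_i = atomic_model.predict(features_of_atom_i)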

Preconditioning the dataset

We have the dataset and the NN. The next step is to train the atomic NNs to get the weights, in order to predict the atomic energies and, from them, the potential energy.

To do so, we want to precondition the data so that the atomic NNs can clearly differentiate the features and make accurate predictions. There are different ways to precondition the data before passing it through the NN [29]; we briefly introduce them in the following.

For every preconditioning method, Gi,s corresponds to a given symmetry function (in our case, s ∈ [1, 32]) for the atom i.

Scaling


G_{i,s}^{scale} = \frac{2(G_{i,s} - G_s^{min})}{G_s^{max} - G_s^{min}} - 1,    (4.1.3)

where G_s^{min} and G_s^{max} refer to the minimum and maximum of G_{i,s} over all atoms and configurations of the dataset.

Shifting

G_{i,s}^{shift} = G_{i,s} - \bar{G}_s,    (4.1.4)

where \bar{G}_s is the mean of G_{i,s} over all atoms and configurations of the dataset.

Standardization

G_{i,s}^{stand} = \frac{G_{i,s} - \bar{G}_s}{\sigma(G_s)},    (4.1.5)

where \sigma(G_s) is the standard deviation of G_{i,s} over all atoms and configurations of the dataset.

In our case we mainly work with the scaling preconditioning, since it is the one proposed by Behler [19]. Once the input file has been created, the data enters a preprocessing channel, where the features (i.e. the inputs) are scaled while the outputs remain unchanged. Right after this, the configurations are split into training and test sets (80% / 20%) and the training can be carried out.
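A minimal numpy sketch of this scaling step, applied column-wise to a feature matrix X of shape (n_atoms, 32), could look as follows; guarding against constant columns is an assumption of this sketch, not part of Eq. 4.1.3.

import numpy as np

def scale_features(X):
    """Scale each feature column to [-1, 1] following Eq. 4.1.3."""
    g_min = X.min(axis=0)
    g_max = X.max(axis=0)
    span = np.where(g_max > g_min, g_max - g_min, 1.0)  # avoid division by zero
    return 2.0 * (X - g_min) / span - 1.0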

NN training and accuracy

Everything is now available for the NN to be trained. Let us take a first look at an NN built for N = 256 atoms, in accordance with the simulations we have run. Still, it is possible to change this value.

Atomic NN parameters

The parameters used for the training are the following:

batch_size = 32 ; shuffle = True
alpha_l2 = 1e-5               # L2-norm regularization
hidden_layer_sizes = (5, 5)   # NN hidden layers
max_epochs = 1500             # maximum number of epochs if no early stopping occurs
activation_function = 'tanh'  # layer activation function
min_delta = 1e-5 ; patience = 10  # early stopping parameters

Loss convergence and prediction

The NN can be trained on the dataset discussed in §5.1.3. Fig. 4.6 gives a first impression of whether the NN converges, i.e. whether it is able to make a good energy prediction.

In this plot, the blue line corresponds to the actual energy values while the red dots are the energy predictions. Since the predictions lie on the blue line, the prediction is considered consistent, up to a sufficiently small error. In the next section we will compare the losses of different NN structures to see how many layers, and of which sizes, give the best prediction, i.e. the smallest loss.


Figure 4.6: Energy predictions using the NN structure from the Behler-Parrinello approach

4.1.3 Optimizing the HDNNP

Now that we know how to train the atomic NNs consistently, we want to find which NN structure leads to the most accurate energy predictions, i.e. the smallest loss.

As already discussed, an NN is defined by its weights and therefore by its number of parameters, which depends on the size of its layers. In a first approach, we have worked with a 32-5-5-1 structure: an input layer of size 32, a first hidden layer of size 5, a second hidden layer of size 5 and an output layer of size 1. This structure has a total of 32 × 5 + 5 + 5 × 5 + 5 + 5 + 1 = 201 parameters. Such a configuration of the NN gives us values of the MSE for the training set and the validation set.
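The parameter counts quoted below in Table 4.3 can be reproduced with a short helper; this is a sketch counting the weights and biases of a dense 32-h1-h2-1 MLP, including the bias of the single output neuron.

def n_parameters(n_in, h1, h2):
    """Number of weights and biases of an n_in-h1-h2-1 MLP."""
    return n_in * h1 + h1 + h1 * h2 + h2 + h2 * 1 + 1

print(n_parameters(32, 5, 5))    # 201
print(n_parameters(32, 20, 20))  # 1101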

By building NNs with different structures, it is possible to see which structure is the most efficient and appropriate for building the NNP.

4.1.4 Structure comparisons

In order to properly compare the results obtained with different NN structures, we need to get reproducible results, i.e. we need to fix all the random seeds used in our program. This can be done with the following sequence:

# Fix the seed value
seed_value = 3

# 1. Set the `PYTHONHASHSEED` environment variable to a fixed value
import os
os.environ['PYTHONHASHSEED'] = str(seed_value)

# 2. Set the `python` built-in pseudo-random generator to a fixed value
import random as rn
rn.seed(seed_value)

# 3. Set the `numpy` pseudo-random generator to a fixed value
import numpy as np
np.random.seed(seed_value)

# 4. Set the `tensorflow` backend random generator
import tensorflow as tf
tf.random.set_seed(seed_value)

Then we have to decide the procedure according to which we will compare the different NNs. These must be compared in terms of energy loss, so we will see which NN provides the best loss. As a reference, it is common to aim for an RMSE of 0.5 meV per atom for most materials. Since we study the generic Lennard-Jones model, we can choose a real system such as Argon, which gives good results with ε/kB = 119 K, i.e. ε = 0.0097 eV. This corresponds to an MSE of 0.25 meV² per atom. Thus a good RMSE to aim for is RMSE = 0.5 meV/ε ≈ 0.5 per atom, and therefore the MSE we aim for is MSE = 0.25 per atom.

Not every structure may be able to reach such a loss. In order to compare them and see which ones give the best losses, we train them for a sufficient number of epochs, with EarlyStopping hyperparameters set to stop the training once the loss becomes stable within a sufficiently small tolerance. This helps to limit the CPU time.

In order to prevent overfitting, a manual check is made, and the best value of the loss before overfitting sets in is kept.

The hyperparameters chosen to find the best structure are the following:

batch_size = 32 ; shuffle = True
alpha_l2 = 1e-5               # L2-norm regularization
max_epochs = 20000            # maximum number of epochs if no early stopping occurs
activation_function = 'tanh'  # layer activation function
min_delta = 1e-5 ; patience = 1000  # early stopping parameters

The training is made using a database built from the first 120 configurations of the LJ simulations we ran, for each density and temperature. After sampling, 1321 configurations remain, for a total of 338176 atoms.

4.1.5 Comparison results and conclusion

The results of the different tests are given in Table 4.3, built on the same pattern as Table 2 of [30]. The MSEs and RMSEs (Root Mean Squared Errors) are given in LJ units. Different NN structures lead to different accuracies. As a consequence of the results in Table 4.3, we go forward using an NN with a 32-20-20-1 structure.

Carrying on the training of these NNs does not lead to better losses. However, recalling the expected RMSE of 0.5 per atom, we clearly have a good precision here (we remind the reader that the simulations were run with N = 256 atoms). These errors are comparable with those obtained by Artrith [30] for TiO2, for instance, around 5 meV per atom (from the calculations discussed in §5.4).


NN structure   Nb of parameters   Nb epochs   MSE (train)   MSE (val)   RMSE (train)   RMSE (val)
32-5-5-1       201                7071        4.14356       4.84136     2.03557        2.20031
32-10-10-1     451                20000       1.59686       1.78452     1.26367        1.33586
32-15-15-1     751                9456        1.54976       2.29951     1.24489        1.51641
32-20-20-1     1101               10676       1.23094       1.38608     1.10948        1.17732
32-30-30-1     1951               2099        9.73608       13.49774    3.12027        3.67393
32-40-40-1     3001               1717        11.17976      9.05565     3.34361        3.00926
32-50-50-1     4251               3761        9.32164       14.29718    3.05313        3.78116

Table 4.3: Comparison of MSE with different NN structures

Structure       32-5-5-1   32-10-10-1   32-15-15-1   32-20-20-1   32-30-30-1   32-40-40-1   32-50-50-1
RMSE per atom   0.00795    0.00494      0.00486      0.00433      0.01219      0.01306      0.01193

Table 4.4: RMSE of trained NN on the LJ dataset

4.1.6 Predicting the configurational energy

Now that we have trained our NN, the next and last step is to use it to correctly predict the energy of an atomic configuration. In this subsection we first look at energy predictions on a simulation used to build the training dataset, and then process new simulations to see the capacity of our HDNNP to predict the potential energies along the phase-space trajectory produced in an MD simulation.

Simulation used in training

To check our results, we work on a file generated from an external simulation. We consider a LJ system in the state (ρ, T) = (0.8, 2.0), which corresponds to a high-temperature liquid. The energy trajectory is plotted in Fig. 4.7. We recall that for a LJ system, energies are given in units of ε and radii in units of σ.

From the simulation we ran, we consider the first 70 configurations (corresponding to 7000 timesteps). Within these 70 configurations, 15 are part of the training dataset. The energy trajectory can be plotted as the so-called MD reference energy, and the NN can be used to predict the energies of these configurations, which we call the NN prediction. The results are given below. The two curves are quite close, with errors of order 10^-3 as seen in Table 4.4. At some configurations, both curves overlap perfectly. This comes from the fact that the set of data we are predicting also contains some data the NN was trained on.

Outside these shared configurations, the NN predictions get quite close to the actual energies and the trends are exactly the same for both curves. An important point is that, with a large enough dataset, there will be more and more overlaps and the predictions will tend to become exact for each configuration. This is explicitly shown by Behler in Fig. 7 of his 2015 article [19].

Independent simulations

A final test for our HDNNP is to try to predict the energy of unknown configurations. To see whether it is able to do so, we run new simulations of three different LJ states (ρ, T): (0.8, 0.2), (0.8, 0.8) and (0.8, 2.0) using LAMMPS, i.e. classical MD simulations, similar to those used to build the training dataset.


Figure 4.7: Prediction of energy trajectory of the LJ state (0.8, 2.0) using a 32-20-20-1 NNP.

The plots of Fig. 4.8 show the predictions made for each thermodynamic state. This time, the curves do not necessarily cross, and indeed do not have to (except if the prediction is perfect, i.e. if the configuration is known by the NN). Even though the curves do not fit perfectly, the energy values are close, with an average error of around 5%.
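As an illustration, the per-atom RMSE and the average relative error between the NN predictions and the MD reference could be computed as in the short sketch below; the energy arrays are placeholders, not the thesis data.

import numpy as np

# Placeholder configurational energies along a trajectory (LJ units).
e_ref = np.array([-1281.4, -1280.9, -1282.3])  # MD reference
e_nn = np.array([-1275.0, -1287.1, -1278.0])   # NN prediction

rmse_per_atom = np.sqrt(np.mean((e_nn - e_ref) ** 2)) / 256
mean_rel_error = np.mean(np.abs((e_nn - e_ref) / e_ref))
print(f"RMSE per atom: {rmse_per_atom:.4f} (LJ units)")
print(f"Average relative error: {100 * mean_rel_error:.1f} %")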

This illustrates that we have been able to create an HDNNP that predicts the energy of a LJ system over a range of states with ρ ∈ [0.50, 1.20] and T ∈ [0.001, 3.801].

4.2 Aluminium NNP

As an extension of the results obtained with the Lennard-Jones potential, we want to use an HDNNP to build a potential for aluminium (Al), based on ab initio data.

To do so, we have run ab initio simulations of aluminium at different temperatures. The liquid aluminium is undercooled down to a temperature T, and the position and energy data are computed.

From there, similarly to the LJ case of §5, we have been able to build a dataset to train the atomic NNs, extract the weights and build an HDNNP. The results obtained are presented below.

4.2.1 Training of the NN

Data from simulations at T = 300, 500, 1000, 1500, 3100, 4500 and 8000 K are gathered, and the NN can be trained in order to build energy trajectories as discussed in the previous section. The simulations are run in exactly the same way as in Chapter 2, Section 2.1.3, except for the temperature T = 300 K, for which the initial configuration is a perfect fcc lattice. For T = 300, 500 and 1000 K the simulations are run at ambient pressure, while the others are run at high pressures (see Jakse and Bryk [32] for more details on high-pressure simulations).

With N = 256 atoms per simulation, we have built a full dataset of 579072 atoms.


Rc(Å)   Rs(Å)   η(Å^-2)
6.8     6.105   1.85
6.8     5.087   1.85
6.8     4.070   1.85
6.8     3.052   1.85
6.8     2.035   1.85
6.8     1.017   1.85
6.8     0       1.85
6.8     0       0.66
6.8     0       0.26
6.8     0       0.1
6.8     0       0.04
6.8     0       0.0013

Table 4.5: List of G2 function parameters for the Al features

Note that, for CPU time reasons, not all the simulations are processed with the same initial number of configurations, and since we sample configurations based on the strength of the force between atoms, the atoms are not equally distributed over the temperatures. Indeed, following the sampling we have:

• 300K: 39990 initial configurations, 342 sampled, 87552 atoms;

• 500K: 39990 initial configurations, 344 sampled, 88064 atoms;

• 1000K: 37000 initial configurations, 327 sampled, 83712 atoms;

• 1500K: 39990 initial configurations, 346 sampled, 88576 atoms;

• 3100K: 39990 initial configurations, 348 sampled, 89088 atoms;

• 4500K: 37000 initial configurations, 307 sampled, 78592 atoms;

• 8000K: 29000 initial configurations, 248 sampled, 63488 atoms.

Feature choices

For each configuration we have computed 22 Behler features: 12 radial symmetry functions G2 and 10 angular symmetry functions G5. This choice was made to find a good compromise between the feature computation time and the number of features required to make the NNs converge. The parameters of these symmetry functions are given in Tables 4.5 and 4.6. They have been chosen following the same process as for the LJ case, taking regular steps in each parameter in order to cover all the information on the local structure.

For Al, the lattice parameter is equal to 4.05 Å (fcc); thus, one can see in Fig. 4.9 that, with these parameters, all the local structure is described by these 22 (12 + 10) symmetry functions.

Training parameters and results

Now that the training dataset has been computed, the NN can be generated and trained. According to the optimization discussed in §5.3, we chose to build a NN with the following parameters:

batch_size = 32 ; shuffle = True
alpha_l2 = 1e-5                # L2-norm regularization
hidden_layer_sizes = (20, 20)  # NN hidden layers
max_epochs = 20000             # maximum number of epochs if no early stopping occurs
activation_function = 'tanh'   # layer activation function
min_delta = 1e-5 ; patience = 1000  # early stopping parameters


η(Å^-2)   ζ    Rc(Å)   λ
0.0013    1    4.4     1
0.0013    2    4.4     1
0.0013    4    4.4     1
0.0013    16   4.4     1
0.0013    64   4.4     1
0.0013    1    4.4     -1
0.0013    2    4.4     -1
0.0013    4    4.4     -1
0.0013    16   4.4     -1
0.0013    64   4.4     -1

Table 4.6: List of G5 function parameters for the Al features

NN structure   Nb of parameters   MSE (train)   MSE (val)   RMSE (train)   RMSE (val)
32-20-20-1     1101               3.11514       3.50430     1.76498        1.87198

Table 4.7: Training for the Al HDNNP

The results of the training are presented on Table 4.7.

According to this table, the RMSE per atom is 0.0068945 eV/atom, i.e. 6.9 meV per atom, which is once again consistent with Artrith's work [30] and its average RMSE of 7 meV per atom.

4.2.2 Energy trajectories

Now that we have our trained NN, we build trajectories in the same way as in §5.6.2. 200 configurations are simulated using VASP, i.e. ab initio simulations, for Al at T = 300, 800, 1700 and 3100 K. Using the NN to predict the energies of these configurations leads to Fig. 4.10. Even though the curves do not fit perfectly, the error made on the energy prediction remains within a range of 7 meV per atom, which is, as already discussed, quite consistent. The trends of the reference energies and the predictions are very similar, and at higher energy the fit appears to be better even though the dataset is smaller.

In comparison with the results obtained for the Lennard-Jones system, the fits are good. An important point to recall is that the training dataset for Al contains 579072 atoms, while it is 4 times larger for the LJ system. As a consequence, we can expect very good results by producing larger datasets of ab initio data for Al. Such datasets can be computationally expensive to obtain. However, we have seen that, once obtained and passed through an appropriate NN, they can be used to predict potential energies very efficiently. As a consequence, a lot of time can be saved using these NNPs, once sufficiently large datasets have been produced to train them.

This work on Al is completely new in its consideration of very high pressures. It is at its beginning and still needs to be refined, especially regarding the features. Moreover, it can also be greatly improved by covering more energy states. Still, the results discussed here are very promising.



Figure 4.8: HDNNP energy trajectory (a) (0.8, 0.2); (b) (0.8, 0.8); (c) (0.8, 2.0)



Figure 4.9: Choices of (a) radial; (b) angular symmetry functions for the Al HDNNP





Figure 4.10: Energy trajectory for Al at (a) 300K; (b) 800K; (c) 1700K; (d) 3100K

Chapter 5

Conclusion

Throughout this thesis, we have presented the different concepts we wanted to work with. From the basics of Molecular Dynamics to the construction of High-Dimensional Neural Network Potentials, passing through an understanding of Neural Networks, we have seen how Artificial Intelligence can be of great interest in material sciences.

Simulation plays a major role in physics and material sciences nowadays, alongside experiment and theory. Still, it can remain very expensive in terms of CPU time. As discussed, a good way to avoid doing too many expensive calculations is to use a trained NN to predict the energy from the atomic positions (or features). That is exactly what the Behler-Parrinello method aims at.

Through our work, we have been able to simulate different systems (Lennard-Jones and aluminium) using different methods (respectively LAMMPS and VASP). We have then used these simulations to build large datasets in order to train a NN structure made of atomic NNs. From these atomic NNs, we have been able to construct HDNNPs that predict configurational energies from atomic positions.

HDNNPs are very powerful tools, with their perks and flaws. Their flexibility is impressive and they can be generalized to many systems. While we have only discussed Lennard-Jones and aluminium in this work, the method used here can be generalized to other elements and to more complex structures, as discussed in the literature (with two or more components, for example). This work serves as an introduction to all the concepts needed to properly understand how HDNNPs are built and how they can be optimized.

With this thesis, we have tried to provide a code with the basic tools and all the necessary keys to build the Potential Energy Surfaces of different systems, so that the forces can now be obtained from the predicted energies and the parameters of the atomic NNs, including gradients and weights. Once the forces are obtained, the last step will be to implement the newly built PES into LAMMPS, in C++, in order to get fast and precise MD simulations based on ab initio data. Now that the mechanics of the method has been clearly understood for single-component systems, further work will be to extend it to alloys and multi-component systems in a more general approach.

Appendix A

LAMMPS and VASP files

A.1 Examples of LAMMPS output files

Here is given an example of a production file produced in LAMMPS.

Step Temp Press Volume PotEng KinEng TotEng Enthalpy
0 1 -91000.485 16610.653 -2878.1794 0.11155151 -2878.0679 -3821.5204
10 0.93710517 -90997.599 16610.653 -2878.1721 0.10453549 -2878.0676 -3821.4902
20 0.76909282 -90989.947 16610.653 -2878.1527 0.085793462 -2878.0669 -3821.4102
30 0.55017731 -90980.159 16610.653 -2878.1274 0.061373107 -2878.066 -3821.3078
40 0.3483151 -90971.475 16610.653 -2878.1041 0.038855074 -2878.0653 -3821.217
50 0.22129066 -90966.505 16610.653 -2878.0894 0.024685307 -2878.0648 -3821.165
60 0.1966304 -90966.281 16610.653 -2878.0864 0.021934417 -2878.0644 -3821.1624
70 0.26319739 -90969.974 16610.653 -2878.0934 0.029360064 -2878.0641 -3821.2003
80 0.37946521 -90975.404 16610.653 -2878.1059 0.042329915 -2878.0636 -3821.2561
90 0.49395924 -90980.07 16610.653 -2878.118 0.055101897 -2878.0629 -3821.3038
100 0.56725011 -90982.218 16610.653 -2878.1253 0.063277604 -2878.062 -3821.3252
[...]

Below is given an example of position data obtained from a dump file from LAMMPS.

ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
864
ITEM: BOX BOUNDS pp pp pp
0.0000000000000000e+00 2.5271999999999998e+01
0.0000000000000000e+00 2.5271999999999998e+01
0.0000000000000000e+00 2.5271999999999998e+01
ITEM: ATOMS id type x y z
469 1 4.04342 3.41491 2.46877
728 1 1.49523 3.92532 2.74118
133 1 2.37237 0.794748 2.79189
197 1 4.40895 1.76801 0.459679
371 1 1.56634 2.37603 0.389284
364 1 4.28392 5.82181 3.03532
694 1 2.7793 2.60523 4.75217
855 1 3.11678 4.95582 0.756791
446 1 0.384608 1.11677 4.92562
477 1 8.18088 4.8461 1.36756
727 1 5.85493 4.1182 0.595712
583 1 6.47858 1.80297 2.292
475 1 9.21342 2.59358 3.24227
518 1 8.11331 2.29301 0.169327
704 1 7.09993 4.31039 3.73634
766 1 5.2386 0.195997 3.46391
186 1 7.41078 1.41507 4.73801
180 1 11.0776 4.30466 1.86244
387 1 10.5822 1.51577 0.45725
421 1 10.7795 5.58666 3.96351
706 1 11.5348 2.73085 3.7813
321 1 14.3091 3.67225 1.49926
678 1 14.3782 1.75524 3.62069
343 1 13.5297 5.1417 4.52763
700 1 11.1463 0.0749875 2.89656

[...]

A.2 LAMMPS simulation code for enthalpy dynamics

# Initialisation
echo both
units metal
atom_style atomic

variable x equal 6
variable y equal $x
variable z equal $x
variable Tbegin equal 1
variable Tmax equal 3500
variable Tliq equal 1200
variable Pamb equal 1.013
#variable N_restart equal 10000
variable N_prod_steps equal 50000
variable Tdamp equal 0.2
variable Pdamp equal 0.10
#variable Pdamp equal 0.3
variable distri_root equal 87287
variable lattice_size equal 4.05
variable scale equal 1.05
variable box_size equal ${lattice_size}*${scale}

lattice fcc ${box_size}
region box block 0 $x 0 $y 0 $z
create_box 2 box
create_atoms 1 box
mass 1 26.981539
mass 2 15.999

timestep 0.001
pair_style eam/alloy
pair_coeff * * AlO.eam.alloy Al O
neighbor 1.5 bin
neigh_modify delay 10
velocity all create ${Tbegin} ${distri_root}

thermo_style custom step temp press vol pe ke etotal enthalpy
thermo 10

variable n loop 28
label loop

# Thermalisation
reset_timestep 0
fix 1 all nvt temp ${Tliq} ${Tliq} ${Tdamp}
run 10000
unfix 1
fix 1 all nvt temp ${Tmax} ${Tmax} ${Tdamp}
run 10000
unfix 1
fix 1 all nvt temp ${Tmax} ${Tliq} ${Tdamp}
run 50000
unfix 1
fix 1 all nvt temp ${Tliq} ${Tliq} ${Tdamp}
run 10000

# Production
dump vitesse all custom 20 dump.vit id type vx vy vz
dump position all custom 20 dump.pos id type x y z
reset_timestep 0
compute RDF all rdf 100
fix RDF2 all ave/time 10 5000 50000 c_RDF[*] file RDF.dat mode vector
compute MSD all msd
fix MSD2 all ave/time 10 1 10 c_MSD[4] file MSD.dat

#restart ${N_restart} restart.*
run ${N_prod_steps}
undump position
undump vitesse

variable Tliq equal ${Tliq}-50
next n
jump SELF loop


A.3 VASP input files

Example of the content of a KPOINTS file:

Automatic mesh
0
G (M)
4 4 4
0. 0. 0.

Example of the content of a POSCAR file for Al:

fcc: Al
4.05
0.5 0.5 0.0
0.0 0.5 0.5
0.5 0.0 0.5
Al
1
Selective Dyn
Cartesian
0 0 0 T T T

Bibliography

[1] Allen, M.P. and Tildesley, D.J., Computer Simulation of liquids. Oxford: Clarendon Pr 1987.

[2] Car, R. and Parrinello, M., Unified Approach for Molecular Dynamics and Density-Functional The- ory. Phys. Rev. Lett. 55, 22, p2471–2474, Nov 1985, American Physical Society. doi: 10.1103/Phys- RevLett.55.2471

[3] Tony Hey, Kristin Michele Tolle, Stewart Tansley, The Fourth Paradigm: Data-intensive Scientific Discovery. Microsoft Research, 1st October 2009. ISBN: 978-0-9825442-0-4.

[4] J.P. Hansen and I.R. McDonald, Theory of simple liquids, 2nd Ed., Academic, 1986. A classic book on liquids, with a chapter devoted to computer simulation.

[5] S. Plimpton, Fast Parallel Algorithms for Short-Range Molecular Dynamics, J Comp Phys, 117, 1-19 (1995). http://lammps.sandia.gov

[6] http://res.tagen.tohoku.ac.jp/ waseda/scm/LIQ/gr/Al 750 gr.txt, Structural Characterization of Materials Liquid Database, Institute of Advanced Materials Processing, Tohoku University, Japan.

[7] Mauro N A, Bendert J C, Vogt A J, Gewin J M, and Kelton K F 2011. J. Chem. Phys. doi: 135 044502

[8] Parr, Robert G; Yang, Weitao (1994). Density-Functional Theory of Atoms and Molecules. Oxford: Oxford University Press. ISBN 978-0-19-509276-9.

[9] Teepanis Chachiyo (2016), Communication: Simple and accurate uniform electron gas cor- relation energy for the full range of densities. J. Chem. Phys. 145 (2): 021101. Bib- code:2016JChPh.145b1101C. doi:10.1063/1.4958669. PMID 27421388.

[10] S. H. Vosko, L. Wilk and M. Nusair (1980), Accurate spin-dependent electron liquid correlation energies for local spin density calculations: a critical analysis. Can. J. Phys. 58 (8): 1200–1211. Bibcode:1980CaJPh..58.1200V. doi:10.1139/p80-159.

[11] Hohenberg, P. and Kohn, W., Inhomogeneous Electron Gas, Nov. 1964. PhysRev.136.B864–B871, American Physical Society. doi: 10.1103/PhysRev.136.B864

[12] N. Jakse and A. Pasturel, Correlation between dynamic slowing down and local icosahedral ordering in undercooled liquid Al80Ni20 alloy, SIMAP, UMR CNRS 5266, Grenoble Université Alpes, France. The Journal of Chemical Physics 143, 084508 (2015); doi: 10.1063/1.4929481.

[13] Jakse, N., Pasturel, A. Liquid Aluminum: Atomic diffusion and viscosity from ab initio molecular dynamics. Sci Rep 3, 3135 (2013). https://doi.org/10.1038/srep03135

[14] Y. Mishin et al., Embedded-atom potential for B2-NiAl, School of Computational Sciences, George Mason University, Faifax, Virginia 22030. Physical Review B, Volume 65, 224114 (2002); doi : 10.1103/PhysRevB.65.224114

[15] M. Kehr, M. Schick, W. Hoyer and I. Egry, High Temp. - High Pressures 37, 361 (2008).

[16] J. Behler and M. Parrinello, Generalized Neural-Network Representation of High-Dimensional Potential-Energy Surfaces, Department of Chemistry and Applied Biosciences, ETH Zurich, USI-Campus, Via Giuseppe Buffi 13, CH-6900 Lugano, Switzerland; doi: 10.1103/PhysRevLett.98.146401.

[17] Nong Artrith, Development of Efficient and Accurate MLPs for the Simulations of Com- plex Materials. Columbia University, Center for Functional Nanomaterials. Given at Aalto, Finland, May 06, 2019. URL: http://ml4ms2019.aalto.fi/wp-content/uploads/2019/05/ 2019-05-06-ANN-potential-final-Artrith_med.pdf

[18] H. Gassner, M. Probst, A. Lauenstein, K. Hermansson, J. Phys. Chem. A 1998, 102, 4596.

[19] J. Behler, Constructing High-Dimensional Neural Network Potentials: A Tutorial Review, International Journal of Quantum Chemistry; doi: 10.1002/qua.24890

[20] J. Behler, Atom-centered symmetry functions for constructing high-dimensional neural network potentials, The Journal of Chemical Physics; doi:10.1063/1.3553717

[21] D. Kingma, J. Lei Ba, Adam : A Method for Stochastic Optimization, conference paper at ICLR 2015; arXiv:1412.6980

[22] J. Karl Johnson, John A. Zollweg, Keith E. Gubbins, The Lennard-Jones equation of state revis- ited, School of Chemical Engineering, Cornell University, Ithaca, NY, 14853, USA, 1993. Molecular Physics, Volume 78, doi: 10.1080/00268979300100411

[23] C. Desgranges and J. Delhommelle, Crystal nucleation along an entropic pathway: Teaching liquids how to transition, Department of Chemistry, University of North Dakota, Grand Forks, North Dakota 58202, USA, 2018. Physical Review E, Volume 98, doi:10.1103/PhysRevE.98.063307

[24] Shaohuai Shi, Qiang Wang, Pengfei Xu, Xiaowen Chu, Benchmarking State-of-the-Art Deep Learning Software Tools, 2017. Department of Computer Science, Hong Kong Baptist University. arXiv:1608.07249v7

[25] Y. Rosenfeld, A quasi-universal scaling law for atomic transport in simple fluids. Nuclear Research Centre Negev, PO Box 9001, Beer-Sheva 84190, Israel, 1999. J. Phys.: Condens. Matter 11 (1999) 5415–5427. PII: S0953-8984(99)02929-X

[26] N. Jakse and A. Pasturel, Excess Entropy Scaling Law for Diffusivity in Liquid Metals. Sci Rep 6, 20689 (2016). https://doi.org/10.1038/srep20689

[27] M. Dzugutov, A universal scaling law for atomic diffusion in condensed matter. Nature 381, 137–139 (1996). https://doi.org/10.1038/381137a0

[28] A. Pukrittayakamee, M. Malshe, M. Hagan, L. M. Raff, R. Narulkar, S. Bukkapatnum, and R. Komanduri, Simultaneous Fitting of a Potential Energy Surface and its Corresponding Force Fields using Feedforward Neural Networks, The Journal of Chemical Physics, vol. 130, p. 134101, Apr. 2009. doi: 10.1063/1.3095491.

[29] John-Anders Stende, Constructing high-dimensional neural network potentials for molecular dy- namics, Thesis for the degree of Master of Science, Faculty of Mathematics and Natural Sciences University of Oslo, September 2017

[30] N. Artrith, B. Hiller and J. Behler, Neural Network Potentials for Metals and Oxides - First applications to Copper Clusters at Zinc Oxide; Lehrstuhl für Theoretische Chemie, Ruhr-Universität Bochum, 44780 Bochum, Germany. Phys. Status Solidi B 250, No. 6, 1191-1203 (2013). doi: 10.1002/pssb.201248370


[31] Mário R. G. Marques, Jakob Wolff, Conrad Steigemann and Miguel A. L. Marques, Neural network force fields for simple metals and semiconductors: construction and application to the calculation of phonons and melting temperatures. Phys. Chem. Chem. Phys., 2019, 21, 6506. doi: 10.1039/c8cp05771k

[32] Noël Jakse and Taras Bryk, Pressure evolution of transverse collective excitations in liquid Al along the melting line. June 2019. J. Chem. Phys. 151, 034506 (2019). https://doi.org/10.1063/1.5099099
