Data-Driven Learning and Prediction of Inorganic Crystal Structures
Total Page:16
File Type:pdf, Size:1020Kb
Faraday Discussions Cite this: Faraday Discuss.,2018,211,45 View Article Online PAPER View Journal | View Issue Data-driven learning and prediction of inorganic crystal structures ab cd a Volker L. Deringer, * Davide M. Proserpio, Gabor´ Csanyi´ and Chris J. Pickardef Received 16th February 2018, Accepted 6th March 2018 DOI: 10.1039/c8fd00034d Crystal structure prediction algorithms, including ab initio random structure searching (AIRSS), are intrinsically limited by the huge computational cost of the underlying quantum-mechanical methods. We have recently shown that a novel class of machine Creative Commons Attribution 3.0 Unported Licence. learning (ML) based interatomic potentials can provide a way out: by performing a high- dimensional fit to the ab initio energy landscape, these potentials reach comparable accuracy but are orders of magnitude faster. In this paper, we develop our approach, dubbed Gaussian approximation potential-based random structure searching (GAP- RSS), towards a more general tool for exploring configuration spaces and predicting structures. We present a GAP-RSS interatomic potential model for elemental phosphorus, which identifies and correctly “learns” the orthorhombic black phosphorus (A17) structure without prior knowledge of any crystalline allotropes. Using the tubular structure of fibrous phosphorus as an example, we then discuss the limits of free This article is licensed under a searching, and discuss a possible way forward that combines a recently proposed fragment analysis with GAP-RSS. Examples of possible tubular (1D) and extended (3D) hypothetical allotropes of phosphorus as found by GAP-RSS are discussed. We believe Open Access Article. Published on 12 April 2018. Downloaded 9/28/2021 2:16:12 AM. that in the future, ML potentials could become versatile and routine computational tools for materials discovery and design. Introduction Exploring and cataloguing new crystal structures is one of the principal tasks in chemistry. Decades of careful experimental work are collected in the Inorganic Crystal Structure Database (ICSD)1 and the Cambridge Structural Database (CSD),2 which have been impressive examples of “big data” since long before the aDepartment of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK. E-mail: [email protected] bDepartment of Chemistry, University of Cambridge, Cambridge CB2 1EW, UK cDipartimento di Chimica, Universita` degli Studi di Milano, Milano, Italy dSamara Center for Theoretical Materials Science (SCTMS), Samara State Technical University, Samara 443100, Russia eDepartment of Materials Science and Metallurgy, University of Cambridge, Cambridge CB3 0FS, UK fAdvanced Institute for Materials Research, Tohoku University, 2-1-1 Katahira, Aoba, Sendai, 980-8577, Japan This journal is © The Royal Society of Chemistry 2018 Faraday Discuss.,2018,211,45–59 | 45 View Article Online Faraday Discussions Paper phrase became fashionable. While structural space has traditionally been explored by synthesis, more recent work has shown that ab initio structure searching can play an important and complementary role in this regard. Computational tools, including genetic algorithms,3–5 particle swarm optimisa- tion,6,7 or ab initio random structure searching (AIRSS),8,9 can predict structures that are (sometimes) far from what chemical intuition would suggest. Many of these predictions have subsequently been validated by experiments10–14 or have given rise to databases of their own.15 We only mention in passing the prediction of organic molecular crystal structures, which has likewise seen fundamental breakthroughs.16–19 Despite their predictive power, these structure searching methods are nor- mally driven by quantum-mechanical density-functional theory (DFT) computa- tions, and therefore they are limited to systems with relatively small unit cells. The allotropes of elemental phosphorus,20–22 which are the topic of the present paper, directly illustrate the problem at hand. The thermodynamically stable “black” (orthorhombic) form, as well as the high-pressure As-type (rhombohedral) allotrope, exhibit simple crystal structures with only one symmetry-independent atom each (Table 1). Such systems are easily amenable to ab initio crystal struc- ture prediction, and various, especially layered, hypothetical allotropes have been 23–32 33 Creative Commons Attribution 3.0 Unported Licence. proposed in recent years. On the other hand, “violet” (Hittorf’s) phosphorus has a notoriously complex structure that was solved more than 100 years aer the initial synthesis34 and contains no fewer than 21 symmetry-independent atoms in the unit cell. Systems of this size have hitherto been out of reach for ab initio crystal structure prediction. It has recently been suggested that machine learning (ML) based interatomic potentials could help with this long-standing issue.35–38 Such potentials are tted to reference databases of DFT energies and forces and, once generated, they allow one to perform atomistic simulations with close to DFT quality but at a compu- This article is licensed under a tational cost that is orders of magnitude lower.39 Indeed, we have recently shown Open Access Article. Published on 12 April 2018. Downloaded 9/28/2021 2:16:12 AM. Table 1 Experimentally known crystal structures of phosphorus.20 Z is the number of atoms in the conventional unit cell; Z0 gives the number of symmetry-inequivalent atoms in the cell, and is therefore a measure of structural complexity. The molecular (“white”) and amorphous (“red”) forms are omitted from this study for simplicityb Space group ZZ0 Source Ref. Black P Cmca 8 1 High-pressure synthesis 43 As-type Pa R3m 2 1 From black P (p > 5 GPa) 44 Simple cubic P Pm3m 1 1 From black P (p > 11 GPa) 44 Hittorf’sP P2/c 84 21 Slow cooling from Pb melt 34 Fibrous P P1 42 21 Resublimation with I catalyst 45 a Occasionally referred to as “blue P” in recent literature,46 especially in the monolayer form. b To explore whether our GAP would be able to nd molecular (“white”) P, we carried out À a set of preliminary searches at low density (1.0 g cm 3). The resulting structures did include distorted P4 units, but also other small fragments, and we expect that the GAP will need to have “seen” these in iterative training to distinguish them more clearly. We also expect that fully capturing the intricate structural details of white P, including its packing variants,74 will require additional reference data; this is the topic of ongoing work. Regarding amorphous forms, our searches are restricted to relatively small periodic unit cells which cannot fully represent the amorphous allotrope(s). 46 | Faraday Discuss.,2018,211,45–59 This journal is © The Royal Society of Chemistry 2018 View Article Online Paper Faraday Discussions that an ML-based Gaussian approximation potential (GAP), initially developed for amorphous carbon,40 can be used to identify crystalline allotropes.36 This has added around 150 entries to the Samara Carbon Allotrope Database (SACADA, ref. 15). Very recently, the use of ML to speed up global searching and crystal structure optimization itself has been reported,41,42 but we focus here on the use of explicit interatomic potentials (“force elds”) for this purpose. What is more, in recent work,37 we have argued that RSS can be used to construct the interatomic potential from scratch, exploring and tting a complex potential energy surface (PES) at the same time. This points towards a more general “data-driven” strategy for atomistic materials modelling. In this Discussion paper, we aim to take new steps in this direction and further develop our emerging method (which we call “GAP-RSS”). Aer briey summa- rising its components, we present results for elemental phosphorus, generating and applying the rst GAP-RSS potential for this material. Our protocol “discovers” the crystal structure of black P during the iterations and, by construction, adds it to the reference database without prior knowledge of any existing allotropes. We then use this potential for some exercises in crystal structure prediction: we show that a brous allotrope of P (Table 1) appears to be prohibitively hard to nd in free searches, and we outline the use of an alternative Creative Commons Attribution 3.0 Unported Licence. approach, showing exemplary predicted structures in 1D and 3D. We also discuss open questions and expected future directions. Methodology The protocol for GAP-RSS potential tting and structure searching consists of three components: single-point DFT computations, GAP tting to an updated reference database, and structural optimisation using GAP. We give details of these in sequence, and an overview is provided in Fig. 1. This article is licensed under a DFT computations These provide the input data for GAP tting: initially, single-point DFT compu- Open Access Article. Published on 12 April 2018. Downloaded 9/28/2021 2:16:12 AM. tations are done for randomised atomic congurations, later, for intermediates or local minima of GAP-RSS searches. We obtain these data using standard DFT procedures, with high numerical accuracy to minimise noise in the input for tting. In this work, we used the PBEsol functional,47 which has been validated for black P before,48 and on-the-y ultra-so pseudopotentials as implemented in CASTEP.49 Reciprocal space was sampled on dense k-point meshes (maximum À spacing 0.02 A˚ 1). The cut-off energy for plane-wave expansions was 600 eV, and an extrapolation scheme was used to counteract nite-basis errors.50 In Fig. 1, all parts that involve single-point DFT computations (viz. the construction of the reference database) are enclosed by dashed lines. GAP tting With reference data available, an interatomic potential is tted to these using GAP. The framework was introduced in 2010 (ref. 51) and has since been used to generate potentials for diverse molecular and solid-state materials.40,52–54 Ahigh-dimensional t to reference energy and force data is performed based on structural similarity or kernel functions, comparing atomic environments one by one.