AFLOW-XtalFinder: a reliable choice to identify crystalline prototypes

David Hicks,1, 2 Cormac Toher,1, 2 Denise C. Ford,1, 2 Frisco Rose,1, 2 Carlo De Santo,1, 2 Ohad Levy,1, 2, 3 Michael J. Mehl,1, 2 and Stefano Curtarolo1, 2, ∗ 1Department of Mechanical Engineering and Materials Science, Duke University, Durham, North Carolina 27708, USA 2Center for Autonomous Materials Design, Duke University, Durham, North Carolina 27708, USA 3Department of Physics, NRCN, P.O. Box 9001, Beer-Sheva 84190, Israel (Dated: October 12, 2020) The accelerated growth rate of repository entries in crystallographic makes it arduous to identify and classify their prototype structures. The open-source AFLOW-XtalFinder package was developed to solve this problem. It symbolically maps structures into standard designations following the AFLOW Prototype Encyclopedia and calculates the internal degrees of freedom consistent with the International Tables for Crystallography. To ensure uniqueness, structures are analyzed and compared via symmetry, local atomic geometries, and crystal mapping techniques, simultaneously grouping them by similarity. The software i. distinguishes distinct crystal prototypes and decorations, ii. determines equivalent spin configurations, iii. reveals compounds with similar properties, and iv. guides the discovery of unexplored materials. The operations are accessible through a Python module ready for workflows, and through command line syntax. All the 4+ million compounds in the AFLOW.org repositories are mapped to their ideal prototype, allowing users to search entries via symbolic structure-type. Furthermore, 15,000 unique structures — sorted by prevalence — are extracted from the AFLOW-ICSD catalog to serve as future prototypes in the Encyclopedia.

Scientists have been struggling for decades to iden- comparison. For instance, XTALCOMP is coupled tify prototypes (e.g. Strukturbericht series [1] and Pear- with the XTALOPT infrastructure for identifying dis- son’s Handbook [2]) and duplicates in crystallographic tinct materials generated via their evolutionary algo- databases; and to label structures in a concise way to rithm [15]. Despite the considerable number of platforms, recognize (and enable searching by) structure-types. The none are suitable for autonomous prototype detection. recent rapid growth of online repositories has worsened Crystallographic symmetry is neglected in Structure the problem [3]. Distinguishing distinct crystalline com- Matcher, XTALCOMP, and SPAP; while STRUCTURE- pounds is becoming increasingly difficult, leading to repe- TIDY, CRYCOM, and COMPSTRU rely on external sym- tition of previously studied materials, hindering database metry packages. Additionally, most tools only feature variety — biasing data-driven analyses and machine single pairwise comparisons (with the exception of Struc- learning methods [4,5] — and wasting valuable com- ture Matcher) and others require additional inputs (e.g. putational and experimental resources. The multitude space group, Wyckoff positions, and unit cell choice). of crystal geometries make by-hand detection of proto- Aside from technical functionality, the codes do not of- types and repeated entries intractable. A major com- fer built-in methods to compare structures to existing plication for finding structure-types is the non-standard crystallographic libraries and material repositories. To representation of crystals. Determination of unique crys- promote materials discovery, routines must analyze com- tallographic structures is obfuscated by i. unit cell repre- pounds with respect to established prototypes to iden- sentations and ii. origin choices. While standard forms tify new structure-types. This would enable the ex- exist — such as Niggli [6] and Minkowski [7] unit cells — pansion of prototype libraries — such as the AFLOW the conversion procedures are highly sensitive to numer- Prototype Encyclopedia (or Prototype Encyclopedia for ical tolerance values and can cast similar structures into brevity) [16, 17] — fueling generation of unique com- differing descriptions [8,9]. Additionally, lattice stan- pounds via prototype decoration. Comparing compounds dardization techniques do not address differences in ori- to those in materials databases can prevent duplication. gin choices. The lack of commensurate representations Moreover, the properties of database entries can be used impedes the search for prototypes and inhibits mappings to estimate those of similar uncalculated compounds, ex- between similar crystals and their corresponding proper- ploiting the structure-property relationship of materials. ties. Clearly, an automatic and reliable large-scale method for arXiv:2010.04222v1 [cond-mat.mtrl-sci] 8 Oct 2020 To overcome non-standard descriptions, crystal com- discerning unique crystallographic structures is therefore parison tools have been developed to identify similar crucial for the materials science community. structures. Programs such as Structure Matcher [10], AFLOW-XtalFinder (AFLOW crystal finder, XTALCOMP [9], SPAP [11], CMPZ [12], CRYCOM [8], XtalFinder for brevity) addresses many of the pre- STRUCTURE-TIDY [13], and COMPSTRU [14] are viously mentioned issues in a high-throughput fashion. available with varying objectives related to structure The primary objective of XtalFinder is to identify/- classify the prototypes of materials and relate them via structural similarity metrics. To accomplish this, XtalFinder determines the ideal prototype designation ∗ [email protected] of crystal structures, consistent with the International

AAAB83icbVDLSgMxFL1TX7W+qi7dBIvgqsxIoborunFZwT6gM5RMmmlDk8yQZIQy9DfcuFDErT/jzr8x085CWw8EDufcyz05YcKZNq777ZQ2Nre2d8q7lb39g8Oj6vFJV8epIrRDYh6rfog15UzSjmGG036iKBYhp71wepf7vSeqNIvlo5klNBB4LFnECDZW8n2BzUSJTLQb82G15tbdBdA68QpSgwLtYfXLH8UkFVQawrHWA89NTJBhZRjhdF7xU00TTKZ4TAeWSiyoDrJF5jm6sMoIRbGyTxq0UH9vZFhoPROhncwz6lUvF//zBqmJroOMySQ1VJLloSjlyMQoLwCNmKLE8JklmChmsyIywQoTY2uq2BK81S+vk+5V3WvUbx4atdZtUUcZzuAcLsGDJrTgHtrQAQIJPMMrvDmp8+K8Ox/L0ZJT7JzCHzifPy4dkcw=

AAACAHicbVDLSsNAFJ34rPUVdeHCzWARXJVECuqu6MZlBfuAJpTJ9KYdOpOEmYlQQjb+ihsXirj1M9z5N07aLLT1wMDhnHuZc0+QcKa043xbK6tr6xubla3q9s7u3r59cNhRcSoptGnMY9kLiALOImhrpjn0EglEBBy6weS28LuPIBWLowc9TcAXZBSxkFGijTSwjz1IFOOGZp4geixFpqYizwd2zak7M+Bl4pakhkq0BvaXN4xpKiDSlBOl+q6TaD8jUjPKIa96qYKE0AkZQd/QiAhQfjY7IMdnRhniMJbmRRrP1N8bGRHKxArMZBFSLXqF+J/XT3V45WcsSlINEZ1/FKYc6xgXbeAhk0A1nxpCqGQmK6ZjIgnVprOqKcFdPHmZdC7qbqN+fd+oNW/KOiroBJ2ic+SiS9REd6iF2oiiHD2jV/RmPVkv1rv1MR9dscqdI/QH1ucPQQ6XgA== AAACL3icbVBNSwMxEM3Wr7p+rXr0EiyKp7JbCuqtVhCPFewHdEvJptM2NJtdkqxQlv4jL/6VXkQU8eq/MG33oK0vzPB4M5NkXhBzprTrvlm5tfWNza38tr2zu7d/4BweNVSUSAp1GvFItgKigDMBdc00h1YsgYQBh2Ywup3Vm08gFYvEox7H0AnJQLA+o0Qbqevc+QEMmEiJlGQ8STmen4l9g89xyQRgk0Ls+3Z1RfFB9LLBrlNwi+4ceJV4GSmgDLWuM/V7EU1CEJpyolTbc2PdMbdpRjlMbD9REBM6IgNoGypICKqTzved4DOj9HA/kiaExnP190RKQqXGYWA6Q6KHark2E/+rtRPdv+qkTMSJBkEXD/UTjnWEZ+bhHpNANR8bQqhk5q+YDokkVBuLbWOCt7zyKmmUil65eP1QLlSqmR15dIJO0QXy0CWqoHtUQ3VE0TOaonf0Yb1Yr9an9bVozVnZzDH6A+v7BxLLo5o= AAACA3icbVDLSgMxFM3UV62vUXe6CQ6CC6mTUlB3RTcuK9gHtMOQSdM2NJkZkoxQhoIbf8WNC0Xc+hPu/Bsz7Qjaekguh3PuJbkniDlT2nW/rMLS8srqWnG9tLG5tb1j7+41VZRIQhsk4pFsB1hRzkLa0Exz2o4lxSLgtBWMrjO/dU+lYlF4p8cx9QQehKzPCNZG8u2DesVP0eRMwO7pz3GyipBvO27ZnQIuEpQTB+So+/ZntxeRRNBQE46V6iA31l6KpWaE00mpmygaYzLCA9oxNMSCKi+d7jCBx0bpwX4kzQ01nKq/J1IslBqLwHQKrIdq3svE/7xOovsXXsrCONE0JLOH+gmHOoJZILDHJCWajw3BRDLzV0iGWGKiTWwlEwKaX3mRNCtlVC1f3lad2lUeRxEcgiNwAhA4BzVwA+qgAQh4AE/gBbxaj9az9Wa9z1oLVj6zD/7A+vgGWCeUIg== AAAC8HicbVLNb9MwFHcCjFE+1sGRi0XFxCEKcWi7htMYF45Fotukpooc96Wz5nzIdqaVKH8FFw4gxJU/hxv/DU4a8dHtyX7f79n+PceF4Ep73i/LvnX7zs7d3Xu9+w8ePtrr7z8+UXkpGcxYLnJ5FlMFgmcw01wLOCsk0DQWcBpfvG3ip5cgFc+zD3pdwCKlq4wnnFFtXNG+tRPGsOJZRaWk67oSTNS9UMOVrloeJ5WgMYi6xgf4tdlhSvW5TKs3x2GUTodhREgYgVk1DsOtyoJKmqo/pdTB8UvDWMPMsdqIq6gitYM/boSx/I3l39DukooS/rYbuaPJ2MGee0iCRoyDiYMD4gaB7zTFBybHc/2R3wT9V8M2dURaa0za9pAtu4dH/YHnei3h6wrplAHqaBr1f4bLnJUpZJoJqtSceIVemG6aMwEGw1JBQdkFXcHcqBlNQS2qdmA1fm48S5zk0uxM49b7b0VlYFPrNDaZDdxqO9Y4b4rNS51MFhXPilJDxjYHJaXAOsfN9PGSS2BarI1CmeTmrpidmykxbf5Iz4BAtp98XTnxXTJ0g/fDwdFxB8cueoqeoReIoEN0hN6hKZohZqXWJ+uL9dWW9mf7m/19k2pbXc0T9B/ZP34Dq2XeaQ== AAACLXicbVBNSwMxEM3Wr1q/qh69BIvgQcquFNRbUUGPFe0HdEvJptM2NMkuSVYoS/+QF/+KCB4q4tW/Ydouoq0DgTdv5k1mXhBxpo3rjp3M0vLK6lp2PbexubW9k9/dq+kwVhSqNOShagREA2cSqoYZDo1IAREBh3owuJrU64+gNAvlgxlG0BKkJ1mXUWIs1c5f+wH0mEyIUmQ4Sugo5wti+kokolIaYd//ye9vbHqC/QL2vCkPspPK2vmCW3SngReBl4ICSqPSzr/6nZDGAqShnGjd9NzItOw0wygHu0OsISJ0QHrQtFASAbqVTK8d4SPLdHA3VPZJg6fsb0VChNZDEdjOyep6vjYh/6s1Y9M9byVMRrEBSWcfdWOOTYgn1uEOU0ANH1pAqGJ2V0z7RBFqrME5a4I3f/IiqJ0WvVLx4q5UKF+mdmTRATpEx8hDZ6iMblEFVRFFT+gFjdG78+y8OR/O56w146SaffQnnK9vpz2nuA== AAACLXicbVBNSwMxEM3Wr1q/qh69BIvgQcquFKq3ooJ6q2g/oFtKNp22odnskmSFsvQPefGviOChIl79G2bbRbR1IPDmzbzJzPNCzpS27YmVWVpeWV3Lruc2Nre2d/K7e3UVRJJCjQY8kE2PKOBMQE0zzaEZSiC+x6HhDS+TeuMRpGKBeNCjENo+6QvWY5RoQ3XyV64HfSZiIiUZjWM6zrk+0QPpx8FtaYxd9ye/vzbpCXYLuOxMeRDdVNbJF+yiPQ28CJwUFFAa1U7+1e0GNPJBaMqJUi3HDnXbTNOMcjA7RApCQoekDy0DBfFBtePptWN8ZJgu7gXSPKHxlP2tiImv1Mj3TGeyupqvJeR/tVake2ftmIkw0iDo7KNexLEOcGId7jIJVPORAYRKZnbFdEAkodoYnDMmOPMnL4L6adEpFc/vSoXKRWpHFh2gQ3SMHFRGFXSDqqiGKHpCL2iC3q1n6836sD5nrRkr1eyjP2F9fQOoVqe5 AAACLnicbVBNSwMxEM36WevXqkcvwSJ4kLKrBe2tKKLeKtoP6JaSTadtaDa7JFmhLP1FXvwrehBUxKs/w7RdRFsHAm/em5nMPD/iTGnHebXm5hcWl5YzK9nVtfWNTXtru6rCWFKo0JCHsu4TBZwJqGimOdQjCSTwOdT8/vlIr92DVCwUd3oQQTMgXcE6jBJtqJZ94fnQZSIhUpLBMKHDrBcQ3ZNBoq8LQ+x5P/ntpUkPsZfD7nFxLIBop30tO+fknXHgWeCmIIfSKLfsZ68d0jgAoSknSjVcJ9JNM00zysEsESuICO2TLjQMFCQA1UzG5w7xvmHauBNK84TGY/Z3R0ICpQaBbypHu6tpbUT+pzVi3TltJkxEsQZBJx91Yo51iEfe4TaTQDUfGECoZGZXTHtEEqqNw1ljgjt98iyoHuXdQr54U8iVzlI7MmgX7aED5KITVEJXqIwqiKIH9ITe0Lv1aL1YH9bnpHTOSnt20J+wvr4BPIyn/Q== AAACLnicbVBLSwMxEM76rPW16tFLsAgepOxKqXoriuixPvqAbinZdNqGZrNLkhXK0l/kxb+iB0FFvPozTNtFtHUg8M33zUxmPj/iTGnHebXm5hcWl5YzK9nVtfWNTXtru6rCWFKo0JCHsu4TBZwJqGimOdQjCSTwOdT8/vlIr92DVCwUd3oQQTMgXcE6jBJtqJZ94fnQZSIhUpLBMKHDrBcQ3ZNB0rspDrHn/eS3lyY9xF4Ou8XiWADRTvtads7JO+PAs8BNQQ6lUW7Zz147pHEAQlNOlGq4TqSbZppmlINZIlYQEdonXWgYKEgAqpmMzx3ifcO0cSeU5gmNx+zvjoQESg0C31SOdlfT2oj8T2vEunPSTJiIYg2CTj7qxBzrEI+8w20mgWo+MIBQycyumPaIJFQbh7PGBHf65FlQPcq7hfzpdSFXOkvtyKBdtIcOkIuOUQldoTKqIIoe0BN6Q+/Wo/VifVifk9I5K+3ZQX/C+voGOtOn/A== AAACLnicbVBLSwMxEM7WV62vqkcvwSJ4kLJbKtZbUXwcK9oHdEvJptM2NJtdkqxQlv4iL/4VPQgq4tWfYdouoq0DgW++b2Yy83khZ0rb9quVWlhcWl5Jr2bW1jc2t7LbOzUVRJJClQY8kA2PKOBMQFUzzaERSiC+x6HuDc7Hev0epGKBuNPDEFo+6QnWZZRoQ7WzF64HPSZiIiUZjmI6yrg+0X3px/SyNMKu+5PfXpn0CLs5XCgcTwQQnaSvnc3ZeXsSeB44CcihJCrt7LPbCWjkg9CUE6Wajh3qlpmmGeVglogUhIQOSA+aBgrig2rFk3NH+MAwHdwNpHlC4wn7uyMmvlJD3zOV493VrDYm/9Oake6WWjETYaRB0OlH3YhjHeCxd7jDJFDNhwYQKpnZFdM+kYRq43DGmODMnjwPaoW8U8yf3hRz5bPEjjTaQ/voEDnoBJXRNaqgKqLoAT2hN/RuPVov1of1OS1NWUnPLvoT1tc3G0yn6Q== AAACLXicbVBLSwMxEM76rPVV9eglWAQPUnaloN6KFfRY0T6gu5RsOq3BJLskWWFZ+oe8+FdE8FARr/4N03YRXwOBb76ZbzLzhTFn2rju2JmbX1hcWi6sFFfX1jc2S1vbLR0likKTRjxSnZBo4ExC0zDDoRMrICLk0A7v6pN6+x6UZpG8MWkMgSBDyQaMEmOpXuncD2HIZEaUIukoo6OiL4i5VSIT9eoI+/5Xfn1h00Psl7F3NOVB9nNZr1R2K+408F/g5aCM8mj0Ss9+P6KJAGkoJ1p3PTc2gZ1mGOVgd0g0xITekSF0LZREgA6y6bUjvG+ZPh5Eyj5p8JT9rsiI0DoVoe2crK5/1ybkf7VuYgYnQcZknBiQdPbRIOHYRHhiHe4zBdTw1AJCFbO7YnpLFKHGGly0Jni/T/4LWkcVr1o5vaqWa2e5HQW0i/bQAfLQMaqhS9RATUTRA3pCY/TqPDovzpvzPmudc3LNDvoRzscnkwenrA== 2 symbolic prototype finder a b self-consistent crystal ideal prototype designation prototype finder mP4 scan scan P 21/m # 11 A 2 em ✏sym B 2 em 0 0.25 0.5 map to label + symbolic/numeric dof (ITC)

label : AB mP4 11 e e params : a, b/a, c/a, ,x1,z1,x2,z2 mP4 mC4 oI4 tI4 hR6 cF8 values :5.586, 0.719, 0.698, 91.992, SG #11 SG #12 SG #71 SG #139 SG #166 SG #225 0.252, 0.234, 0.751, 0.261

FIG. 1. Self-consistent symbolic prototype finder. (a) The prototype for an input structure, e.g. AlCl (ICSD #56541), is identified by analyzing its symmetry. Classification of the prototype may change depending on the symmetry tolerance (sym): mP4, SG #11 (0 < sym ≤ 0.25 A);˚ mC4, SG #12 (0.26 ≤ sym ≤ 0.27 A);˚ oI4, SG #71 (0.34 ≤ sym ≤ 0.36 A);˚ tI4, SG #139 (0.37 ≤ sym ≤ 0.44 A);˚ hR6, SG #166 (0.45 ≤ sym ≤ 0.49 A);˚ and cF8, SG #225 (0.51 ≤ sym ≤ 1.0 A).˚ An adaptive routine is employed for tolerance regions with incommensurate symmetry descriptions (gray arrows for 0.27 < sym < 0.34 A˚ and 0.49 < sym < 0.51 A),˚ ensuring self-consistent prototype/symmetry designations. (b) The structure is then mapped into its prototype label and symbolic and numeric internal degrees of freedom (dof), consistent with the International Tables for Crystallography (ITC). Structures in this representation can be generated with the symbolic prototype generator.

Tables for Crystallography (ITC)[18]. Any structure tures are generally classified in terms of their symme- in this representation can be automatically generated try characteristics. For example, the rocksalt prototype via a new symbolic prototype generator. Similarity has a face-centered cubic lattice and 8 atoms in the between structures is analyzed on multiple fronts. conventional cell (i.e. Pearson symbol of cF8), space Crystallographic structures are first compared by sym- group F m3¯m (#225), and Wyckoff positions 4 a m3¯m metry (isopointal analysis), leveraging a robust software and 4 b m3¯m. Determining this information for any arbi- implementation, AFLOW-SYM, which calculates self- trary structure is often a challenge: numerical noise in the consistent symmetry descriptions freeing the user from atomic positions inhibits detection of crystal isometries, tolerance adjustments [19]. Local atomic geometries requiring by-hand modification of tolerance thresholds. are also computed to match neighborhoods of atoms in Furthermore, consistency between real- and reciprocal- crystals (isoconfigurational snapshots). Finally, crystal space symmetries is often overlooked, and yet it is im- similarity is resolved by rigorous structure mapping perative for reliable ab initio simulations. Thus, accurate procedures (complete isoconfigurational analysis) and prototype detection relies on robust symmetry analyses. quantified via a misfit criterion [20]. The prototype finder accommodates automatic workflows, with functionality XtalFinder employs a self-consistent mechanism to find to analyze multiple materials/structures simultaneously the ideal prototype of a given structure. The space group, via multithreading. Features are provided to identify Pearson symbol, and occupied Wyckoff positions are cal- crystallographic structures, distinct materials, atom culated via the AFLOW-SYM routines [19]. The proto- decorations, and spin configurations. Methods are also type classification is sensitive to the symmetry tolerance included to compare compounds/prototypes to the (sym). For example, the AlCl structure (ICSD #56541, AFLOW.org repository and AFLOW prototype libraries. DFT-relaxed) in Figure1(a) can be classified as one of Every entry in the AFLOW.org repository has been six different prototypes as a function of sym: i. mP4, mapped to its prototype label, enabling users to search SG #11 (0 <  0.25 A);˚ ii. mC4, SG #12 (0.26 sym ≤ ≤ the database by structure-type. The XtalFinder code —  0.27 A);˚ iii. oI4, SG #71 (0.34  0.36 A);˚ sym ≤ ≤ sym ≤ written in C++ — is part of the AFLOW (Automatic iv. tI4, SG #139 (0.37  0.44 A);˚ v. hR6, SG ≤ sym ≤ flow) framework [21–24] and is open-source under the #166 (0.45 sym 0.49 A);˚ and vi. cF8, SG #225 GNU-GPL license. For seamless integration into different ≤ ≤ (0.51 sym 1.0 A).˚ For certain tolerance values — work environments, this functionality is accessible via ≤ ≤ e.g. 0.27 < sym < 0.34 A˚ and 0.49 < sym < 0.51 A—˚ the command-line and a Python module. incommensurate symmetry descriptions are calculated. Problem of the ideal prototype. Prototype struc- To overcome this, the symmetry tolerance is automat- 3 ically changed, scanning tighter and looser tolerances resentation also easily distinguishes isopointal and iso- around the initial value, to find consistent symmetry de- configurational prototypes. Two compounds with simi- scriptions at a new sym. This autonomous approach en- lar labels are isopointal (i.e. same symmetry), and are sures prototype classifications are correct and compati- isoconfigurational if their parameters are the same (i.e. ble against all symmetry descriptors (e.g. space group, equivalent geometric configurations) [25]. Moreover, the Wyckoff position, lattice type, Brillouin zone, etc.). representation reveals the degrees of freedom that can The default symmetry tolerance value for classifying be altered, while preserving the underlying symmetry. prototypes in XtalFinder is proportional to the minimum This is useful for showing continuous structure transi- min interatomic distance (dnn /100). The tolerance is thus tions within the same symmetry-type and performing system-specific, and it has been shown to be consistent symmetry-constrained structure relaxations [26]. Lastly, with experimentally resolved symmetries [19]. Neverthe- with this format, structures are now easily regenerated less, the tolerance can also be adjusted by the user, and with the AFLOW software. is guaranteed to return a commensurate designation due Symbolic prototype generator. Structures repre- to the adaptive prototype protocol shown in Figure1(a). sented in the ideal prototype designation can be created Once the symmetry attributes of the crystal are calcu- and decorated with any atomic elements via a new sym- lated, XtalFinder automatically maps the structure to bolic prototype generator, enabling automatic materials its AFLOW prototype label and symmetry-based degrees design. A procedure — introduced in Refs. [16, 17]— of freedom (Figure1(b)), i.e. lattice parameters/angles has been extended to create all possible prototype struc- and non-fixed Wyckoff coordinates [16, 17]. These desig- tures, going beyond those previously described in the nations are commensurate with the ITC cell choices and Prototype Encyclopedia. Given a crystal’s composition, Wyckoff positions; the de facto standard for crystallogra- Pearson symbol, space group, and occupied Wyckoff po- phy. The label specifies the stoichiometry and symmetry sitions, the generator determines the degrees of freedom of the structure in underscore-separated fields. The fields in symbolic notation (i.e. a, b/a, c/a, α, β, γ, x, y, indicate the following (example system: esseneite struc- and z) that must be specified, based on the ITC con- ture, ABC6D2 mC40 15 e e 3f f [16]) ventions [18]. Feeding in the ideal prototype label and • first field: the reduced stoichiometry based on al- degrees of freedom to the symbolic generator will pro- phabetic ordering of the compound, e.g. a quater- duce the corresponding geometry file, substituting the nary with stoichiometry ABC6D2, appropriate degrees of freedom with the input values. • second field: the Pearson symbol, e.g. mC40, Prototypes, including those in the Prototype Encyclope- • third field: the space group number, e.g. space dia, no longer need to be tabulated (hard-coded) in the group #15, AFLOW software, and are now created on-the-fly. With • fourth field: the Wyckoff letter(s) of the first atomic this prototype generator, AFLOW is capable of creating site, e.g. site A: one Wyckoff position with letter structures to span all regions of crystallographic space. e, Structures are generated with the following pro- • fifth field: the Wyckoff letter(s) of the second totype command syntax: --proto=label --params= atomic site, e.g. site B: one Wyckoff position with parameter_1,parameter_2,.... Here, the label is the letter e, ideal prototype label, e.g. AB mP4 11 e e as shown in • sixth field: the Wyckoff letter(s) of the third atomic Figure1(b), and parameter_1,parameter_2,... are the site, e.g. site C: three Wyckoff positions with let- comma-separated values for the prototype’s degrees of ters f, and freedom, e.g. 5.586, 0.719, 0.698, 91.992, 0.252, 0.234, • seventh field: the Wyckoff letter(s) of the fourth 0.751, and 0.261 as shown in Figure1(b). By default, atomic site, e.g. site D: one Wyckoff position with structures are generated with fictitious species in alpha- letter f. betical order (i.e. A, B, C, D, etc.). Users can over- The prototype parameters specify the degrees of free- ride this order by specifying other permutations after dom allowed by the symmetry of the structure. For the the prototype label (separated by a period), i.e. -- esseneite structure, there are 18 parameters: a, b/a, c/a, proto=label.BAC...; a useful feature for controlling the β, y1, y2, x3, y3, z3, x4, y4, z4, x5, y5, z5, x6, y6, and atomic site decorations. Specific elements can be deco- z6. The first three variables are the lattice parameters rated onto the prototype by appending the element ab- — with b and c represented in relation to a — the fourth breviations to the command in colon-separated alphabet- variable is the lattice angle β, and the subsequent vari- ical order, e.g. --proto=label:Ag:Cu:Zr. The genera- ables are the Wyckoff coordinates (fractional) that are tor checks for any inconsistencies with the provided label not fixed by symmetry. The sequence of the Wyckoff and/or parameter values, terminating prematurely with parameters is based on the alphabetic ordering of the a message listing possible fixes to the command. The Wyckoff letters, followed by alphabetic ordering of the generator supports multiple geometry file formats, in- species. Additional information regarding the label and cluding VASP (POSCAR)[27], FHI-AIMS [28], Quantum parameters are discussed in the Refs. [16, 17]. ESPRESSO [29], ABINIT [30], ELK [31], and CIF. Swap- Mapping structures into this format characterizes pro- ping the command --proto=label with --aflow_proto totypes in a concise and descriptive manner. The rep- =label, will build an aflow.in file, AFLOW’s input file 4

(using a standard set of DFT parameters by default [32]), further compared by inspecting arrangements of atoms, automating ab initio simulations of these compounds. i.e. local atomic geometries. Local geometry analy- The generator can also print the symbolic representa- ses have been fruitful in providing structural descrip- tion of the lattice and Wyckoff positions. Adding the tors and similarity metrics between crystals of different option --add_equations to the prototype command re- types [39, 40]. However, the positions of these environ- turns both a numerical and symbolic version of the geom- ments are often neglected, precluding the determination etry file, and the option --equations_only only prints of one-to-one mappings between similar crystals. Nev- the symbolic version. Symbolic geometry files can be ertheless, the analysis quickly identifies local geometries printed with respect to the conventional cell (ITC) or and is employed here to analyze structures beyond sym- symbolically transformed into the primitive cell (using metry considerations (i.e. isoconfigurational versus iso- the SymbolicC++ open-source software [33]). By default, pointal [35]). AFLOW provides the primitive cell, since fewer-atom unit Rather than determine the complete local atomic ge- cells are more computationally efficient. ometry for each atom, XtalFinder builds a reduced rep- With a robust prototype classifier and generator in resentation: neighborhoods comprised of only the least place, comparison of prototypes is required to i. iden- frequently occurring atom (LFA) type(s). The local LFA tify unique structure-types and ii. group similar ones to- geometry analysis provides the connectivity for a sub- gether. The prototype label and parameters alone can- set of atoms (i.e. LFA-type) to discern if patterns are not establish structural similarity due to variations in present in both structures, regardless of cell choice and the choice of lattice and origin, potentially affecting both crystal orientation. This description is preferred over the the label (e.g. Wyckoff letters) and the parameters (e.g. full local geometry because it is i. computationally less lattice and non-fixed Wyckoff parameters). Therefore, expensive to calculate and ii. generally less sensitive to XtalFinder offers three levels of comparison: symmetry, coordination cutoff tolerances. The latter is attributed local atomic geometry, and complete crystal geometry. to the fact that LFA geometries are more sparse. They are described in the following three subsections. An example of a local LFA geometry is shown in Fi- Isopointal structures: symmetry comparison. gure2(b). A local LFA a tomic geometry (AG) is a set Symmetry analyses of crystals are required to identify of vectors connecting a central atom (c) to its closest structures of the same symmetry-type. The isometries of neighbors: crystals (e.g. rotations, roto-inversions, screw axes, and min AGc dic i atomi LFA(s) , (1) glide planes) are calculated via the routines of AFLOW- ≡ { } ∀ | ∈ { } min SYM [19] to determine the space group and occupied where dic is the minimum distance vector to the i atom Wyckoff positions (Figure2(a)). Results from AFLOW- — restricted to LFA-type(s) only — and is calculated via SYM are robust against numerical tolerance issues and the method of images for periodic systems [41]: are consistent with experimentally determined symme- min dic = min( min (xi xc + naa + nbb + ncc) ). (2) tries in comparison to other symmetry software [19]. i na,nb,nc || − ||

Crystals are isopointal if they have commensurate Here, na, nb, and nc are the lattice dimensions along the space groups (equivalent or enantiomorphic pairs) and lattice vectors a, b, and c; and xi and xc are the Carte- Wyckoff positions [35]. Wyckoff positions are compatible sian coordinates of the i and c (center) atoms, respec- if they have the same multiplicity and similar site sym- min tively. A coordination shell with a thickness of dic /10 metry designations. Due to different setting and origin captures other atoms of the same type to control numer- choices for the conventional cell, a strict site symmetry ical noise in the atomic coordinates (a similar tolerance match is insufficient. For instance, the Wyckoff positions metric is defined in AFLOW-SYM, i.e. loose preset tole- with multiplicity 2 in space group #47 (P mmm) — four rance value [19]). This cutoff value yields expected co- 2mm (letters i-l), four m2m (letters m-p), and four mm2 ordination numbers for well-known systems and is com- (letters q-t) — form a Wyckoff set and are related via an parable to results provided by other atom environment automorphism of the space group operations [18, 36, 37]. calculators [39, 40]. If there is only one LFA type — e.g. Depending on the assignment of the lattice parameters Si in α-cristobalite (SiO2)[42] — then the distance to (a, b, and c) and origin choice, different — and potentially the closest neighbor of that LFA type is calculated. If equivalent — Wyckoff decorations are possible. Conse- there are multiple LFA types — e.g. four for the qua- quently, XtalFinder tests permutations of the site sym- ternary Heusler (as illustrated in Figure2(b)) — then metry symbol to expose positions that may be within the the minimum distances to each LFA type are computed. same Wyckoff set [38]. The local atomic geometry is calculated for each atom The symmetry calculation is performed automatically, of the LFA type(s) in the unit cell, resulting in a list i.e. it does not require input from the user. Options are of atomic geometries ( AGc ). Therefore, α-cristobalite available to ignore symmetry and force geometric com- has a set of four Si LFA{ geometries} (one for each Si in parison of structures, which can identify crystals associ- the unit cell: AGSi,1, AGSi,2, AGSi,3, AGSi,4 ) and the ated via symmetry subgroups. quaternary Heusler{ has a set of four LFA geometries} (one Isoconfigurational snapshots: local geometry for each element type: AGAu, AGLi, AGMg, AGSn , re- comparison. Beyond isopointal analyses, structures are spectively). { } 5 duplicate reduction a symmetry comparison b local geometry comparison C4 S2

C4 S2

0 10 010 0 10 010 local LFA geometry C4 = 100. . . S2 = 100 C4 = 100. . . S2 = 100 00011 0 1 00011 0 1

AAACXHicbVBNS8NAEN3Grxq/qoIXL4tF0YMlUUEvQtGLRwWrQlPCZjtpl242YXciltCf54/w4tGLV727/VD8GtjlvTfzmOFFmRQGPe+p5ExNz8zOlefdhcWl5ZXK6tqNSXPNocFTmeq7iBmQQkEDBUq4yzSwJJJwG/XOh/3be9BGpOoa+xm0EtZRIhacoZXCSuieh8XRgJ7SQEKMu0EEHaGKhKEWDwOXenSH7vv282gQuHSMvqj3Sf0hDUC1P52BFp0u7rlhperVvFHRv8CfgCqZ1GVYeQnaKc8TUMglM6bpexm2CqZRcAkDN8gNZIz3WAeaFiqWgGkVoyAGdNsqbRqn2j6FdKR+dxQsMaafRHbS3tk1v3tD8b9eM8f4pFUIleUIio8XxbmkmNJhqrQtNHCUfQsY18LeSnmXacbRZv/vFhuM/zuGv+DmoOYf1g6ujqr1s0lEZbJJtsgu8ckxqZMLckkahJNH8kreyHvp2Zl2Fpyl8ahTmnjWyY9yNj4AK5evDA== 001 AAACXHicbVBNS8NAEN3Grxq/qoIXL4tF0YMlUUEvQtGLRwWrQlPCZjtpl242YXciltCf54/w4tGLV727/VD8GtjlvTfzmOFFmRQGPe+p5ExNz8zOlefdhcWl5ZXK6tqNSXPNocFTmeq7iBmQQkEDBUq4yzSwJJJwG/XOh/3be9BGpOoa+xm0EtZRIhacoZXCSuieh8XRgJ7SQEKMu0EEHaGKhKEWDwOXenSH7vv282gQuHSMvqj3Sf0hDUC1P52BFp0u7rlhperVvFHRv8CfgCqZ1GVYeQnaKc8TUMglM6bpexm2CqZRcAkDN8gNZIz3WAeaFiqWgGkVoyAGdNsqbRqn2j6FdKR+dxQsMaafRHbS3tk1v3tD8b9eM8f4pFUIleUIio8XxbmkmNJhqrQtNHCUfQsY18LeSnmXacbRZv/vFhuM/zuGv+DmoOYf1g6ujqr1s0lEZbJJtsgu8ckxqZMLckkahJNH8kreyHvp2Zl2Fpyl8ahTmnjWyY9yNj4AK5evDA== 001

AAACW3icbVBNS8NAEN2mfsavqnjyslgUvZSkCnoRil48KloVmlo220m7uNmE3YlYQv+df8KDVw9e9Qe4rVH8Ghh47808ZnhhKoVBz3ssOeWJyanpmVl3bn5hcamyvHJpkkxzaPJEJvo6ZAakUNBEgRKuUw0sDiVchbfHo/nVHWgjEnWBgxTaMespEQnO0Eqdyo173snrQ3pIAwkRbgch9ITKY4Za3A9d6tEt6tv2aBC4Bfqi3if1RzQA1f00Blr0+rjjdipVr+aNi/4FfgGqpKjTTuU56CY8i0Ehl8yYlu+l2M6ZRsElDN0gM5Ayfst60LJQsRhMOx/nMKSbVunSKNG2FdKx+t2Rs9iYQRzaTftn3/yejcT/Zq0Mo4N2LlSaISj+cSjKJMWEjkKlXaGBoxxYwLgW9lfK+0wzjjb6f6/YYPzfMfwFl/Wav1urn+1VG0dFRDNknWyQbeKTfdIgJ+SUNAknD+SFvJK30pNTdlxn/mPVKRWeVfKjnLV3w8yu4w== AAACW3icbVBNS8NAEN2mfsavqnjyslgUvZSkCnoRil48KloVmlo220m7uNmE3YlYQv+df8KDVw9e9Qe4rVH8Ghh47808ZnhhKoVBz3ssOeWJyanpmVl3bn5hcamyvHJpkkxzaPJEJvo6ZAakUNBEgRKuUw0sDiVchbfHo/nVHWgjEnWBgxTaMespEQnO0Eqdyo173snrQ3pIAwkRbgch9ITKY4Za3A9d6tEt6tv2aBC4Bfqi3if1RzQA1f00Blr0+rjjdipVr+aNi/4FfgGqpKjTTuU56CY8i0Ehl8yYlu+l2M6ZRsElDN0gM5Ayfst60LJQsRhMOx/nMKSbVunSKNG2FdKx+t2Rs9iYQRzaTftn3/yejcT/Zq0Mo4N2LlSaISj+cSjKJMWEjkKlXaGBoxxYwLgW9lfK+0wzjjb6f6/YYPzfMfwFl/Wav1urn+1VG0dFRDNknWyQbeKTfdIgJ+SUNAknD+SFvJK30pNTdlxn/mPVKRWeVfKjnLV3w8yu4w== @ A @ A @ A @ A

space group ¯ space group ¯

FmAAACA3icbVBNS8MwGE7n15xfVW96CRbBg4y2U/Q4FMTjBPcBaxlplm1hSVuSVBhl4MW/4sWDIl79E978N6ZdD7r5kLw8PM/7krxPEDMqlW1/G6Wl5ZXVtfJ6ZWNza3vH3N1rySgRmDRxxCLRCZAkjIakqahipBMLgnjASDsYX2d++4EISaPwXk1i4nM0DOmAYqS01DMPbrgXIJHWphx6p/mxsuq65z3Tsqt2DrhInIJYoECjZ355/QgnnIQKMyRl17Fj5adIKIoZmVa8RJIY4TEakq6mIeJE+mm+wxQea6UPB5HQN1QwV39PpIhLOeGB7uRIjeS8l4n/ed1EDS79lIZxokiIZw8NEgZVBLNAYJ8KghWbaIKwoPqvEI+QQFjp2Co6BGd+5UXScqtOrerenVn1qyKOMjgER+AEOOAC1MEtaIAmwOARPINX8GY8GS/Gu/Exay0Zxcw++APj8weOxJTf 3m # 225 FmAAACA3icbVBNS8MwGE7n15xfVW96CRbBg4y2U/Q4FMTjBPcBaxlplm1hSVuSVBhl4MW/4sWDIl79E978N6ZdD7r5kLw8PM/7krxPEDMqlW1/G6Wl5ZXVtfJ6ZWNza3vH3N1rySgRmDRxxCLRCZAkjIakqahipBMLgnjASDsYX2d++4EISaPwXk1i4nM0DOmAYqS01DMPbrgXIJHWphx6p/mxsuq65z3Tsqt2DrhInIJYoECjZ355/QgnnIQKMyRl17Fj5adIKIoZmVa8RJIY4TEakq6mIeJE+mm+wxQea6UPB5HQN1QwV39PpIhLOeGB7uRIjeS8l4n/ed1EDS79lIZxokiIZw8NEgZVBLNAYJ8KghWbaIKwoPqvEI+QQFjp2Co6BGd+5UXScqtOrerenVn1qyKOMjgER+AEOOAC1MEtaIAmwOARPINX8GY8GS/Gu/Exay0Zxcw++APj8weOxJTf 3m # 225

Wyckoff Mg 4 am3¯m Wyckoff Cl 4 am3¯m

positions AAACUnicbVJLSwMxEE7ru76qHr0Ei+Kp7KqgR9GLF7GCfUC3lNl0WoNJdkmyQln2NwrixR/ixYOatqto64SEj5lvJjNfEsaCG+t5r4Xi3PzC4tLySml1bX1js7y13TBRohnWWSQi3QrBoOAK65Zbga1YI8hQYDN8uBzFm4+oDY/UnR3G2JEwULzPGVjn6pZ5EOKAqxS0hmGWCjpeWSmQYO+1TK8HGT2gJ24DdYcMQtDpcSZpEPxwbr4p4SwFVS+v3S1XvKo3NjoL/BxUSG61bvk56EUskagsE2BM2/di23HVLGcCXYuJwRjYAwyw7aACiaaTjiXJ6L7z9Gg/0m4rS8fe3xkpSGOGMnTM0RRmOjZy/hdrJ7Z/1km5ihOLik0u6ieC2oiO9KU9rpFZMXQAmOauV8ruQQOz7hVKTgR/euRZ0Diq+sfVo9uTyvlFLscy2SV75JD45JSckytSI3XCyBN5Ix/ks/BSeC+6XzKhFgt5zg75Y8W1L1WrsHw= ¯ positions O4bm3m NaAAACU3icbVFNSwMxEM2uX7V+VT16CRbFU9m1gh7FXjyJglWhW8psOq3RJLskWaEs+x9F8OAf8eJB07qKWidMeMy8mcy8xKngxgbBi+fPzM7NL1QWq0vLK6trtfWNK5NkmmGbJSLRNzEYFFxh23Ir8CbVCDIWeB3ft8b56wfUhifq0o5S7EoYKj7gDKwL9Wp3UYxDrnLQGkZFLujkFNVIgr3VMm+Jgu7SA+dA3SWjGHTeLCSNom/OGXxx4mkOqn7ZvFerB41gYnQahCWok9LOe7WnqJ+wTKKyTIAxnTBIbdd1s5wJdDNmBlNg9zDEjoMKJJpuPtGkoDsu0qeDRDtXlk6iPytykMaMZOyY4zXM39w4+F+uk9nBUTfnKs0sKvb50CAT1CZ0LDDtc43MipEDwDR3s1J2CxqYdd9QdSKEf1eeBlf7jbDZ2L84qB+flHJUyBbZJnskJIfkmJySc9ImjDySV/LuEe/Ze/N9f/aT6ntlzSb5Zf7KBwA0r+I= 4 bm3¯m coordination analysis

c Xref geometric structure comparison

Xtest lattice and origin search

lattice deviation coordinate displacement failure

dmap dmap

FIG. 2. Symmetry, local atomic geometry, and geometric structure comparisons in AFLOW-XtalFinder. (a) Crystal isometries are calculated internally with AFLOW-SYM. The space groups and occupied Wyckoff positions are compared, revealing isopointal structures. (b) The local least-frequently occurring atom (LFA) geometries are computed and compared between structures. An example local LFA geometry is shown for the quaternary Heusler structure [34] (2-D projection), highlighting the closest neighbors (via solid lines) for each LFA type to the central Mg atom (purple). Shaded concentric circles indicate the tolerance threshold for capturing atoms in the coordination shell with a thickness of 10% of the distance from the central and connected atom. Local geometry vectors are compared against local geometries in other structures to determine mapping potential. (c) Two structures (Xref and Xtest) are mapped onto one another by expanding Xtest into a supercell and exploring commensurate lattice and origin choices with respect to Xref . The yellow lattice (highlighted by the green box) is a potential match with Xref . Xtest is transformed into the new representation (Xetest), and the structures are quantitatively compared via the misfit criteria. The structures are evaluated via their lattice deviation (latt), coordinate displacement (coord), map and figure of failure (fail). Distances between mapped atoms (d ) that are less than half the atom’s nearest neighbor (dnn/2) are accounted for in the coordinate displacement (green dashed lines and arrows), while larger distances are described in the figure of failure (red dashed lines and arrows). 6

To investigate structural compatibility, local atomic The grid dimensions span between na,b,c na,b,c to geometry lists for compounds are compared. In gen- account for different orientations/rotations− between→ the eral, the local geometry comparisons err on the side of structures. To optimize the lattice search, translation caution. For instance, comparing the cardinality of the vectors are explored in a grid comprised of only the LFA- coordination is often too strict. Despite a more sparse type in Xtest (since they are the minimal set of atoms geometry space, slight deviations in position can move exhibiting crystal periodicity). In addition to verifying atoms outside the coordination shell threshold, changing crystal periodicity, candidate lattice vectors must be sim- the atom cardinality and neglecting potential matches. ilar to those in the Xref lattice based on i. lattice vector Local atomic geometries are thus compatible if i. the moduli (∆l), ii. angles formed between pairs of lattice central atoms are comparable types (i.e. same element vectors (∆θ) and iii. volumes enclosed by three lattice and/or stoichiometric ratio in crystal), ii. the neighbor- vectors (∆V ). The tolerances values (∆l, ∆θ, ∆V ) are hood of surrounding atoms have distances that match chosen based on how much the lattices are allowed to dif- within 20% after normalizing with respect to max(AGc) fer. If the lattices are significantly different, then the lat- (i.e. the largest distance in the local geometry cluster), tice is ignored (see the “lattice deviation” in the “Quan- and iii. the angles formed by two atoms and the center titative similarity measure” subsection). Additionally, atom match within 10 degrees. To further alleviate the as a speed increase, commensurate lattices are sorted by coordination problem, an exact geometry match is not minimum lattice deviation to find matches more quickly. required, i.e. some distances and angles are permitted to Upon finding a similar cell to Xref , Xtest is transformed be missing. Grouping local atomic geometries as compat- into the new lattice representation Xtest and is stored if ible is favored to mitigate false negatives for equivalent the representations have the same number of atoms (and structures. types). e Isoconfigurational structures: Geometric struc- For each prospective unit cell, possible origin choices ture comparison. To resolve a commensurate represen- are explored. The origin of Xref is placed on one of the tation between two structures for geometric comparison, LFA-type atoms, and the origin of Xtest is cycled through one structure — the reference Xref — remains fixed and all atoms of its LFA-type. Given an origin choice, a map- the other structure — the potential duplicate Xtest — is ping procedure is attempted for alle atoms in the unit cell. expanded into a supercell. Lattice vectors are identified The minimum Cartesian distance — via the method of within the supercell and compared against the reference images for periodic systems [41] — is determined for ev- structure. For any similar lattices to Xref , Xtest is trans- ery atom i in Xref to each atom j in Xtest formed into the new lattice representation (Xtest). Ori- gin shifts for this cell are then explored in an attempt to dij = min (xi xj + naa + nebb + ncc) , (4) match atoms. If one-to-one atom mappings existe between na,nb,nc || − || the two structures, then the similarity is quantified with the crystal misfit method (see “Quantitative similarity where na, nb, and nc are the lattice dimensions along measure” subsection) [20]. Misfit values below a given the lattice vectors a, b, and c; and xi and xj are the threshold indicate equivalent structures and the search Cartesian coordinates of the i and j atoms, respectively. Given the set of distances dij , the minimum distance terminates. Alternatively, misfit values larger than the { } threshold are disregarded and the search continues until over all j atoms is identified as the mapping distance, i.e. all lattices and origin shifts are exhausted. The proce- map di min dij , (5) dure is detailed below and an illustration of the process ≡ j { } is depicted in Figure2(c). The lattice search algorithm begins by scaling the vol- map regardless of the element type. Once di is computed umes of the unit cells to compare structures with different for all i, the following conditions are verified: i. one-to- volumes (an option is available to quantify the similar- one mappings (i.e. no duplicate j indices between i in- ity between structures at fixed volumes). Once scaled, dices), and ii. no cross-matching between element types the routine searches for translation vectors by generat- (i.e. cannot map a single element type to multiple types ing a lattice grid of test. The size of the grid is defined X in Xtest). If either condition is violated, the mappings to encompass a sphere with a radius (rgrid) equal to the are ignored and the search continues. maximum lattice vector length of , i.e. Xref Givene a successful mapping, the similarity of the two crystals in the corresponding representations are quan- rgrid max (a, b, c) . (3) ≡ tified, indicating equivalent or unique structures. If no Similar to a procedure described in Ref. [19], the nec- mapping is found for any lattice and origin choice, then essary grid dimensions are given by the set of vectors the structures are considered distinct and are not as- perpendicular to each pair of Xref lattice vectors scaled signed a similarity value. by the grid radius (e.g. n1 = rgrid (b c/ b c )). The Quantitative similarity measure. To compare two scaled vectors are then transformed into× the|| × lattice|| basis crystals in a given representation, a method proposed (L), via n0 = L−1n, and the ceiling of the n0 compo- by Burzlaff and Malinovsky is employed [20]. The simi- 0 nents indicate the grid dimensions: na,b,c = ceil(na,b,c). larity between structures is quantified by a misfit value, 7

, which incorporates differences between lattice vectors positions [10] and coordination characterization func- and atomic coordinates via [20]: tions [11]. XtalFinder employs the crystal misfit criteria to incorporate structural differences between both the  1.0 (1.0  ) (1.0  ) (1.0  ) . (6) ≡ − − latt − coord − fail lattice and atom positions. Differences between common similarity metrics — and their software implementations The misfit quantity is bound between zero and one: — are discussed in more detail in the “Comparison Ac- structures with a value close to zero match and those curacy” subsection. with a value close to one do not match. Special misfit ranges defined by Burzlaff and Malinovsky are adopted Super-type comparisons. To explore new areas of here [20] materials space, the XtalFinder module i. identifies equivalent and unique materials, ii. uncovers common 0 <  match : match, structure-types across different compounds (i.e. proto- ≤ types), iii. determines inequivalent atom decorations match <  family : same family, and (7) ≤ for a given crystal structure, and iv. discerns distinct  <  1: no match. family ≤ magnetic structure configurations. The corresponding The “same family” designation generally corresponds to comparison modes are denoted as material-, structure-, crystals with common symmetry subgroups. Burzlaff and decoration-, and magnetic-type, respectively (Figure3). Each variant uses the underlying procedures discussed in Malinovsky recommend match = 0.1 and family = 0.2 based on definitions from Pearson [43] and Parth´e[44]. In the “Results” section (i.e. symmetry, local atomic geom- etry, and geometric structure comparisons) with different the “Finding match: structural misfit versus calculated restrictions on mapping atom types. property (∆Hatom)” section, heuristic misfit thresholds are identified based on the allowed maximum enthalpy Material-type. Material-type comparisons map atoms differences between similar structures. of the same atomic species (Figure3(a)). For example, The deviation of the lattices, latt, captures the dif- given two ZnS zincblende compounds [45], a material- type comparison maps Zn Zn and S S in the two struc- ference between the lattice face diagonals of Xtest and tures. Therefore, the method→ reveals→ duplicate com- Xref [20] e pounds within a data set. latt 1 (1 D12)(1 D23)(1 D31), (8) Structure-type. Conversely, structure-type compar- ≡ − − − − test ref test ref isons ignore atomic species and map any atom-type with dkl dkl + fkl fkl Dkl || − ref|| ||ref − ||, (9) compatible stoichiometric ratios (Figure3(b)). In the ≡ dkl fkl case of zincblende structures ZnS and SiC, a structure- e || − e || type comparison attempts to map Zn Si and S C, or where fkl and dkl denote the diagonals by adding and Zn C and S Si, since the compounds→ are equicompo-→ subtracting, respectively, the k and l lattice vectors. → → In the lattice search algorithm, ∆l, ∆θ, and ∆V tol- sitional. This mode exposes unique backbone structures and is practical for crystallographic prototyping. Identi- erances are coupled to latt, and are tuned to ensure fying prototypes is also useful for modeling solid solutions latt family. The≤ coordinate deviation — measuring the disparity and disordered materials [46, 47]. between atomic positions in the two structures — is based Decoration-type. The decoration-type (or map map permutation-type) mode determines unique atom on the mapped atom distances (di or dj as com- puted with Equations (4) and (5)) and the atoms’ nearest decorations for a given crystal structure, i.e. inequiv- alent colorings of a structure, where each element is neighbor distances in the respective structures, dnn [20] denoted by a different color (Figure3(c)). Continuing test ref Ne test map N ref map with the zincblende example, the A and B atomic sites i (1 ni ) di + j 1 nj dj coord − − . are equivalent: swapping elements on the sites results in N test N ref ≡ P e (1 ntest) dtest + P 1 nref dref a duplicate compound compared to the original decora- i − ei nn,i j − j nn,j (10) tion. Thus, only one site decoration choice is necessary P P  e to create a distinct compound. Given a compound with N test and N ref are the number of atoms in the two crys- n species, there are n! possible atom permutations. map tals. If d < dnn/2, then a “switch” variable n is set to XtalFinder automatically i. generates compounds with zeroe and the mapped atom distance is included in coord. the different atom decorations for a crystal, ii. compares Otherwise, n is set to one, signifying the mapped atoms the decorations (via a material-type comparison), and are far apart and not considered in coord. These atoms iii. identifies the unique configurations. Atom decora- are represented in the figure of failure, fail [20] tions are only compared if atomic types have the same Wyckoff multiplicity and similar site symmetries (see test ref Ne test N ref subsection “Isopointal structures: symmetry analysis”). i ni + j nj fail . (11) Equivalent decoration groups need to obey Lagrange’s ≡ P N test + NPref e theorem [48]: the order h of subgroup H divides the Other metrics can be used to assess structural similar- group G with order g (i.e. mod(g, h) = 0). Accordingly, ity, including the root meane square (rms) of the atom the numbers of unique and equivalent decorations must 8 super-type comparisons a material-type b structure-type c decoration-type d magnetic-type

AB BA

reveal duplicates identify prototypes unique decorations distinct spin config.

FIG. 3. Available super-type comparison modes. (a) Material-type: maps same element types, revealing duplicate compounds. (b) Structure-type: maps structures regardless of the element types, identifying crystallographic prototypes. (c) Decoration-type: creates and compares all atom decorations for a given structure, determining unique and equivalent decorations (in this case, atom decorations AB and BA match). (d) Magnetic-type: maps compounds by element types and magnetic moments, discerning distinct spin configurations. divide the total number of decorations, i.e. satisfy divi- However, the number of equivalent decorations in each sor theory. The possible equivalent decoration groups — set are not the same, violating Lagrange’s theorem [48]. out of n! – are dictated by its divisors, and are enumer- Furthermore, all misfits values should be the same, since ated below for 2 < n < 5 (elemental compounds, n = 1, the underlying structure is unchanged. The incommen- are excluded): surate groupings are a symptom of only comparing to the 2! = 2 : 2, 1 reference decoration, as opposed to cross-comparing with 3! = 6 : 6, 3, 2, 1 other decorations. 4! = 24 : 24, 12, 8, 6, 4, 3, 2, 1 To remedy incorrect groupings, XtalFinder checks for 5! = 120 : 120, 60, 40, 30, 24, 20, 15, 12, 10, 8, 6, 5, better matches (i.e. potential equivalent decorations 4, 3, 2, 1 with lower misfit values). Therefore, the “duplicate” dec- For example, the possible groupings for a ternary com- orations are compared to the other reference decorations pound (n = 3) are: 6, 3, 2, and 1 unique sets with 1, 2, and regrouped to minimize the misfit value. In this case, 3, and 6 decorations per set, respectively. the subsequent cross-comparisons are performed: Depending on the matching (misfit) tolerance and the • BAC with CAB and BCA, choice of the reference decoration, calculated equivalency • CBA with CAB and BCA, and groups can violate divisor theory. For instance, two dec- • ACB with BCA (not compared with ABC; per- orations can match with a certain misfit; however, a bet- formed previously). ter match with a smaller misfit can exist with another Consequently, the final equivalent decorations are decoration. To combat incorrect groupings, XtalFinder • ABC = CBA ( = 0.0144), executes a consistency check, verifying the groupings are • CAB = BAC ( = 0.0144), and commensurate with the possible divisors. If they are not, • BCA = ACB ( = 0.0144). XtalFinder searches for better matches and regroups the The groupings above satisfy Lagrange’s theorem, and the compatible decorations. equivalent structures in each group have the same misfit For example, ICSD entry BiITe #10500 (original geom- value with respect to their reference decoration. etry) has six possible atom decorations: ABC, BAC, Magnetic-type. Magnetic-type comparisons map CBA, ACB, CAB, and BCA. Since the three equicom- atoms of the same atomic species and similar magnetic positional sites are comprised of the same Wyckoff multi- moments, i.e. analyzes spin configurations (Figure3(d)). plicity and site symmetry (multiplicity 1 and site symme- For instance, given two body-centered cubic chromium try 3m. in space group #156), all structures are placed compounds with antiferromagnetic ordering, the routine in the same initial comparison group, with ABC chosen attempts to map Cr↑ Cr↑ and Cr↓ Cr↓. A magnetic as the reference decoration (since it is the first in the set). moment tolerance threshold→ denotes equivalent→ spin sites; After comparing, the equivalent groups and their misfit where the default tolerance is 0.1µB. The analysis can be values are: performed for both collinear and non-collinear systems. • ABC = BAC ( = 0.0889) = CBA ( = 0.0144), The magnetic-type comparison can be joined with a mag- • CAB = ACB ( = 0.0889), and netic structure generator to create distinct spin configu- • BCA (no equivalent decorations). rations for high-throughput simulation. AAACIXicdVDLSgMxFM3UV62vqks3wSK4Gmak0nZXdOOygn1Ap5RMmmlDk8yQZIQyzK+48VfcuFCkO/FnzLRjUdEDISfn3Mu9OX7EqNKO824V1tY3NreK26Wd3b39g/LhUUeFscSkjUMWyp6PFGFUkLammpFeJAniPiNdf3qd+d17IhUNxZ2eRWTA0VjQgGKkjTQs172I+4lHIkVZKFK4fDLyxXJjmHgc6YnkibnwJE3TYbni2I16reZcQtd2FoBGMag5K6UCcrSG5bk3CnHMidCYIaX6rhPpQYKkptjMK3mxIhHCUzQmfUMF4kQNksUPU3hmlBEMQmmO0HChfu9IEFdqxn1Tme2pfnuZ+JfXj3VQHyRURLEmAi8HBTGDOoRZXHBEJcGazQxBWFKzK8QTJBHWJtSSCWH19/9J58J2q3bjtlppXuVxFMEJOAXnwAU10AQ3oAXaAIMH8ARewKv1aD1bb9Z8WVqw8p5j8APWxyeHwKZK 9 automatic grouping

compounds compare compare local compare geometric isoconfigurational to compare symmetry LFA geometry structure all matched? isopointal near isoconfig.

all matched? yes ( ✏ ✏ match )  match

no

move unmatched ...

...... into new groups done

FIG. 4. Automatic grouping of multiple compounds. Compounds are compared in the following sequence: symmetry, local LFA geometry, and geometric structure. The algorithms determine isopointal, near isoconfigurational, and isoconfigura- tional structures, respectively, and aggregate them into similar sets (enclosed in black solid-lined boxes). Unmatched structures (i.e.  > match) after the initial geometric structure comparison are put into new groups and re-compared until all equivalent structures are grouped. This sequence is the same for material-, structure-, decoration-, and magnetic-type comparisons; how- ever, the criteria for atom mappings differ (see subsection “Super-type comparisons” for details). The symmetry, local LFA geometry, and geometric structure comparisons (blue boxes) are multithreaded for parallel computation.

Multiple comparisons. With the plethora of com- depending on the comparison mode. pounds generated by computational frameworks — Multithreading. To enhance calculation speed, multi- such as AFLOW [21, 49], NoMaD [50], Materials threading capabilities can be employed. The three com- Project [51], High-Throughput Toolkit [52], Materials putationally intensive procedures — calculating the sym- Cloud/AiiDA [53], and OQMD [54] — automatically com- metry, constructing the local LFA geometry, and per- paring structures is necessary for high-throughput classi- forming geometric comparisons — are partitioned onto fication of unique/duplicate compounds and structure- allocated threads, offering significant speed increases for types. For this purpose, we developed an automatic large collections of structures. comparison procedure for multiple crystals (Figure4). Automatic comparisons. There are three built-in Compounds are first grouped into isopointal sets by an- functions to compare multiple structures automatically: alyzing and comparing the symmetries of the structures, i. compare structures provided by a user, ii. compare an aggregating them by stoichiometry, space groups, and input structure to prototypes in AFLOW [16, 17], and iii. Wyckoff sets (calculated via AFLOW-SYM [19]). Next, compare an input structure to entries in the AFLOW.org compounds are further partitioned into near isoconfigu- repository. An overview of each high-throughput method rational sets by determining and comparing the local LFA is discussed below and usage is detailed in the Methods geometries in each structure. Within each near isoconfig- section. urational group, one representative structure — generally the first in the set — is compared to the other struc- Compare user datasets. Users can load crystal geome- tures via geometric comparisons and the misfit values tries and compare them automatically with XtalFinder. are stored. Once the comparisons finish, any unmatched Options to perform both material-type and structure- type comparisons are available to identify unique/du- structures (i.e. misfit values greater than match) are re- organized into new comparison sets. The process repeats plicate compounds or prototypes, respectively. For until all structures have been assembled into matching structure-type comparisons, the unique atom decorations groups or all comparison pairs are exhausted. The three for each representative structure are determined. Once comparison analyses are performed in this order for two the analysis is complete, XtalFinder groups compatible reasons: i. to categorize structural similarity to varying structures together and returns the corresponding misfit degrees (isopointal, near isoconfigurational, and isocon- values. figurational) and ii. to efficiently group compounds to Compare to AFLOW prototypes libraries. Given reduce the computational cost of the geometric structure an input structure, this routine returns similar AFLOW comparison (see “Speed and scaling considerations” in prototype(s) along with their misfit value(s) (Fi- the Discussion). This procedure is the same for material-, gure5(a)). AFLOW contains structural prototypes that structure-, decoration-, and magnetic-type comparisons; can be rapidly decorated for high-throughput materials however, different atom mapping restrictions are applied discovery: 590 in the Prototype Encyclopedia [16, 17] and 1,492 in the High-throughput Quantum Computing 10

compounds similar to the input structure based on TABLE 1. AFLOW.org entries equivalent to an in- put sodium chloride (rocksalt) compound. A list species, stoichiometry, space group, and Wyckoff posi- of equivalent compounds to the Prototype Encyclopedia’s tions. With the AURL from the AFLUX response, struc- rocksalt structure with the default degrees of freedom (la- tures for the entry are retrieved via the REST-API. The bel=AB cF8 225 a b, parameters=5.64 A).˚ The compound most relaxed structure is extracted by default; however, name, auid, misfit (), and enthalpy per atom (Hatom) are options are available to obtain structures at different listed for all similar structures in the database. Volume scal- ab-initio relaxation steps. The set of entries from the ing is suppressed for the comparison to incorporate volume database are then compared to the input structure. Sim- differences. The first 25 and last 2 entries are from AFLOW’s ilar to the AFLOW prototype comparisons, candidate en- ICSD and LIB2 catalogs, respectively. tries are only compared against the input structure, i.e. H compound auid  atom the procedure terminates without regrouping unmatched (eV/atom) entries. ClNa aflow:d241535faf2a4519 0.00514317 -3.39101 With the underlying AFLUX functionality, mate- ClNa aflow:82a178672a734c47 0.00537828 -3.39082 ClNa aflow:1cd71114972d46dd 0.00514250 -3.39078 rial properties can also be extracted, highlighting the ClNa aflow:d0c93a9396dc599e 0.00496982 -3.39075 structure-property relationship amongst similar materi- ClNa aflow:5699b196418c6044 0.00450831 -3.39066 als. For instance, the enthalpy per atom (Hatom) for ClNa aflow:9017f9c64ead22ab 0.00434390 -3.39062 matching database entries are printed by including the ClNa aflow:39ab5e62afdb5ac0 0.00429571 -3.39062 enthalpy_atom API keyword in the query. Any num- ClNa aflow:c16c0f1c061f7d3e 0.00424606 -3.39061 ber or combination of properties can be queried; avail- ClNa aflow:2f4b5e32510830a0 0.00427698 -3.39061 able API keywords are located in Refs. [55, 56]. Table1 ClNa aflow:b5ab343f3a484538 0.00421818 -3.39060 shows the comparison results between a rocksalt NaCl ClNa aflow:cc41860d69de2888 0.00405508 -3.39056 compound and matching DFT-relaxed structures in the ClNa aflow:b2ec4b68e12f3674 0.00404585 -3.39056 AFLOW.org repository along with their misfits and en- ClNa aflow:4f19021768a3118a 0.00399452 -3.39055 ClNa aflow:ec23029a18d3fec9 0.00373730 -3.39049 thalpies per atom. ClNa aflow:a4652bde28e67c3d 0.00339698 -3.39041 This routine reveals equivalent AFLOW.org com- ClNa aflow:18ebb85b07a92f89 0.00386555 -3.39033 pounds, if similar materials exists in the database. As ClNa aflow:1354bbef4edd80b3 0.00383458 -3.39032 such, it can estimate structural properties a priori; before ClNa aflow:182f848dd10cc403 0.00383093 -3.39031 performing any calculations. The estimation is based on ClNa aflow:d996b8d524516c24 0.00380678 -3.39030 the following assumptions: i. the matching AFLOW ma- ClNa aflow:a5755554aaf5d10e 0.00379748 -3.39030 terial resides at a local minimum in the energy landscape ClNa aflow:9466351a9cbac2c9 0.00379835 -3.39030 and ii. the input structure relaxes to the same geometry ClNa aflow:9a28207fd647e477 0.00379460 -3.39029 as that AFLOW compound, given comparable calcula- ClNa aflow:e3e31c4914d59e25 0.00379517 -3.39029 tion parameters. The functionality can explore proper- ClNa aflow:fd711a60dbfba2de 0.00378193 -3.39028 ClNa aflow:55d2cbd0f4018884 0.00405470 -3.39013 ties that are not calculated for a given entry, but are cal- ClNa aflow:3bd528dd9f88be7d 0.00395044 -3.39233 culated for an equivalent entry. For example, compounds ClNa aflow:f4b806d73482566c 0.00345690 -3.39121 in AFLOW’s prototype catalogs (LIB1, LIB2, LIB3, etc.) do not usually have band structure data; however, corre- sponding ICSD entries can be found which do provide library [24]. In this method, AFLOW prototypes are ex- band structure information. Finally, the method can tracted — based on similar stoichiometry, space group, identify compounds that are absent from the database and Wyckoff positions to the input — and compared to and prioritize them for future calculation, enhancing the the user’s structure. Since only matches to the input diversity of the AFLOW.org repositories. are relevant, the procedure terminates before regrouping Using AFLOW-XtalFinder. For ease-of-use, the any unmatched prototypes. The attributes of matched XtalFinder routines are accessible via a command-line prototypes are also returned, including the prototype la- interface and a Python environment (see Methods for bel, mineral name, Strukturbericht designation, and links details). to the corresponding Prototype Encyclopedia webpage. Ideal prototype analysis in AFLOW.org. The ideal The scheme identifies common structure-types with the prototype designations — for both the original and re- AFLOW libraries or — if no matches are found — reveals laxed geometries — have been successfully determined new prototypes. Absent prototypes can be character- for all 4+ million entries in the AFLOW.org repository. ized automatically in the AFLOW standard designation The prototype label, parameter variables, and parame- with XtalFinder’s prototyping tool (discussed in subsec- ter values are incorporated into the AFLOW REST- and tion “Problem of the ideal prototype”). Search-APIs [55, 56]. The corresponding API keywords Compare to AFLOW.org repository. Compounds for the original geometries are are compared to entries in the AFLOW.org repository us- • aflow_prototype_label_orig, ing the AFLOW REST- and AFLUX Search-APIs [55, 56] • aflow_prototype_params_list_orig, and (Figure5(b)). An AFLUX query (i.e. matchbook and • aflow_prototype_params_values_orig. directives) is generated internally and returns database For the DFT-relaxed geometries, the keywords are 11 encyclopedia/online mapping input a matching prototype structure label: A2B3_hR10_167_c_e mineral name: corundum

Strukturbericht: D51 web: aflow.org/CrystalDatabase/ A2B3_hR10_167_c_e.html … b matching entries AFLUX + REST-API entry: misfit:

Al2O3 (ICSD #608993) 0.0003 Al2O3 (ICSD #600672) 0.0021 …

FIG. 5. Encyclopedia/online prototype mapping. An input Al2O3 (corundum) compound is compared to entries in (a) AFLOW Prototype Encyclopedia and (b) the AFLOW.org repository. Potential equivalent entries are retrieved automatically from the respective catalog and compared with XtalFinder. Matching entries and their level of similarity (misfit) are returned.

• aflow_prototype_label_relax, structure. • aflow_prototype_params_list_relax, and Finding match: structural misfit versus calculated • aflow_prototype_params_values_relax. property (∆Hatom). To identify a suitable threshold The prototype keywords enable researchers to search for matching similar structures (match), Figure6 plots for materials by structure. This new feature is useful for the misfit value () between two mapped structures and identifying possible crystal structures given experimen- their difference in enthalpy per atom (∆Hatom). The tal data. For example, with composition, space group, structures in the test set are comprised of DFT-relaxed and occupied Wyckoff information (characteristics often entries from the entire AFLOW-ICSD catalog as of 14 known to experimentalists); users can construct the cor- August 2020 (60,390) [57, 58]. Compounds are grouped responding prototype label(s) and extract all compounds via commensurate atomic elements, stoichiometries, sym- based on the provided structure-type. The keywords are metries, and local LFA geometries. Furthermore, only also used to identify the frequency of certain prototypes compounds calculated with similar ab initio settings are in the AFLOW.org repository. For example, all com- compared together — such as LDAU parameters, kpoint pounds that are isopointal to the corundum prototype per reciprocal atom (KPPRA), and pseudopotentials (see (labels: A2B3 hR10 167 c e and A3B2 hR10 167 e c) Supplementary Information for details) — to prevent ex- can be retrieved for both the original and relaxed geome- traneous enthalpy differences due to differing parame- tries. Moreover, this search capability is used to discern ters. In addition, magnetic systems are excluded since if a structure-type is novel or has been reported previ- the magnetic moment is not incorporated into the misfit ously. value. For these comparisons, the unit cell volumes are The ideal prototype keywords also reveal whether a not rescaled, and the best lattice/origin choices are ex- compound retains the same prototype designation be- plored (minimizing the misfit value) to show better cor- fore and after relaxation. For structures that retain the relation with the enthalpies. After grouping the struc- same prototype label, the parameter values show the tures and identifying one-to-one mappings, misfit values continuous structure transition during relaxation. For for the remaining 6,795 comparison pairs are calculated. structures that transform into different prototypes, the Figures6(a) and (b) show the enthalpy difference ranges symmetry-based designations highlight the symmetries 0 100 meV/atom and 0 10 meV/atom, respectively, that were broken. This can indicate that certain element highlighting− the maximum− enthalpy differences at differ- combinations/arrangements are averse to certain proto- ent misfit values. type structures. More advanced relaxation techniques, In general, the misfit value correlates with the enthalpy e.g. symmetry-constrained relaxations [26], would be difference for  0.1: as the misfit value decreases, the required to restrict the relaxation to a given prototype enthalpy difference≤ also reduces. For  > 0.1, the en-

AAAB/HicbVDLSgMxFM3UV62v0S7dBIvgqs6UgrorunFZwT6gHUomzbSheQxJRhiG9lfcuFDErR/izr8xbWehrQcCh3Pu5Z6cMGZUG8/7dgobm1vbO8Xd0t7+weGRe3zS1jJRmLSwZFJ1Q6QJo4K0DDWMdGNFEA8Z6YSTu7nfeSJKUykeTRqTgKORoBHFyFhp4JZrsz5HZqx4xkn7EhnJpwO34lW9BeA68XNSATmaA/erP5Q44UQYzJDWPd+LTZAhZShmZFrqJ5rECE/QiPQsFYgTHWSL8FN4bpUhjKSyTxi4UH9vZIhrnfLQTs6D6lVvLv7n9RITXQcZFXFiiMDLQ1HCoJFw3gQcUkWwYaklCCtqs0I8RgphY/sq2RL81S+vk3at6terNw/1SuM2r6MITsEZuAA+uAINcA+aoAUwSMEzeAVvzsx5cd6dj+Vowcl3yuAPnM8fBzCVCg== AAAB/HicbVDLSgMxFM3UV62v0S7dBIvgqs5IRd0V3bisYB/QDiWTZtrQPIYkI5Sh/RU3LhRx64e482/MtLPQ1gOBwzn3ck9OGDOqjed9O4W19Y3NreJ2aWd3b//APTxqaZkoTJpYMqk6IdKEUUGahhpGOrEiiIeMtMPxXea3n4jSVIpHM4lJwNFQ0IhiZKzUd8uXsx5HZqR4yknrHBnJp3234lW9OeAq8XNSATkafferN5A44UQYzJDWXd+LTZAiZShmZFrqJZrECI/RkHQtFYgTHaTz8FN4apUBjKSyTxg4V39vpIhrPeGhncyC6mUvE//zuomJroOUijgxRODFoShh0EiYNQEHVBFs2MQShBW1WSEeIYWwsX2VbAn+8pdXSeui6teqNw+1Sv02r6MIjsEJOAM+uAJ1cA8aoAkwmIBn8ArenJnz4rw7H4vRgpPvlMEfOJ8/C/KVDQ== AAACBXicbVC7TsMwFHV4lvIKMMJgUSExlQQVFbYKFsYi0YfURJXjOq1VO45sB1FFZWDhV1gYQIiVf2Djb3DaDNBypCsdnXOv7r0niBlV2nG+rYXFpeWV1cJacX1jc2vb3tltKpFITBpYMCHbAVKE0Yg0NNWMtGNJEA8YaQXDq8xv3RGpqIhu9SgmPkf9iIYUI22krn3goTiW4h5Wzx48jvRA8pST5gnSgo+7dskpOxPAeeLmpARy1Lv2l9cTOOEk0pghpTquE2s/RVJTzMi46CWKxAgPUZ90DI0QJ8pPJ1+M4ZFRejAU0lSk4UT9PZEirtSIB6YzO1TNepn4n9dJdHjupzSKE00iPF0UJgxqAbNIYI9KgjUbGYKwpOZWiAdIIqxNcEUTgjv78jxpnpbdSvniplKqXeZxFMA+OATHwAVVUAPXoA4aAINH8AxewZv1ZL1Y79bHtHXBymf2wB9Ynz+o2pi0 12 ✏ ✏

a =0 =0 narrow enthalpy spread wide enthalpy spread . . 1 2 75 meV/atom ⇡ atom) / (meV atom H

✏ ✏ ✏ b =0 =0 max enthalpy difference additional datapoints . . 1 025 decreases with misfit large jump in max atom)

/ enthalpy difference

5 meV/atom (meV atom H

2 meV/atom

FIG. 6. Enthalpy difference per atom and misfit value between compared structures in the AFLOW-ICSD ref test catalog. The misfit value () and the difference in enthalpy per atom (∆Hatom = Hatom − Hatom) for all AFLOW-ICSD entries with similar parameters are shown above. The plots show the misfit values between 0 − 0.2 with two enthalpy difference ranges (for clarity): (a) 0 − 100 meV/atom and (b) 0 − 10 meV/atom. The plot with the full misfit domain and enthalpy range is shown in the Supplementary Information. Candidate misfit thresholds are chosen based on the acceptable maximum enthalpy deviation between match structures. For example, misfit values below  = 0.025 and  = 0.1 (black vertical lines) are expected to yield enthalpy differences no larger than ∆Hatom ≈ 2 meV/atom and ∆Hatom ≈ 5 meV/atom, respectively (black horizonal lines). As the misfit values increases beyond  > 0.1, the spread of the data points also increases. A large jump in the maximum enthalpy difference occurs at approximately  = 0.125, indicating matched structures near this value and beyond are not guaranteed to have similar enthalpies. thalpy spread widens with the misfit. Some comparison- quantifying similar structures, rather than relating dis- pairs exhibit large misfit values, but have similar en- parate structures. thalpies. This follows intuition since it is possible for Figure6 reveals possible thresholds for  based significantly differing structures to have similar proper- match on the maximum enthalpy difference allowed for mapped ties. The sparsity of the data points for large values structures. For  0.1, the enthalpy differences per atom of  is attributed to the lack of one-to-one mappings as are all below 5 meV/atom,≤ with the exception of one structures become increasingly dissimilar. This suggests comparison-pair. Reducing the misfit cutoff reasonably XtalFinder and the misfit criteria are better suited to guarantees the enthalpy differences will also decrease; e.g. 13

TABLE 2. Functionalities of comparison codes specific to high-throughput analysis and structure prototyping. This tabulation is not exhaustive; many programs offer additional analyses, such as fragment/molecular comparisons, and are outside the scope of this work. †: optional symmetry input. ∗: requires symmetry input. §: Structure Matcher compares to the AFLOW Prototype Encyclopedia partially, as it does not provide the internal degrees of freedom for the prototype. k: Structure Matcher matches magnetic structures with opposite spins (SpinComparator function). AFLOW- Structure STRUCTURE- XtalFinder Matcher XTALCOMP SPAP CMPZ CRYCOM TIDY COMPSTRU source-code available x x x prototyping tools x consider symmetry x x† x∗ x∗ x∗ perform multiple comparisons x x material-type comparisons x x x x x x x x structure-type comparisons x x x x x x x x decoration-type comparisons x magnetic-type comparisons x xk compare to database x compare to prototypes x x§ quantitative similarity metric x x x x x x x enthalpies will be within 1 meV/atom and 2 meV/atom Quantum ESPRESSO [29], ABINIT [30], ELK [31], for misfit values below 3.58 10−3 and 0.025, respec- and CIF; Structure Matcher: POSCAR, CIF, ABINIT, tively. The maximum enthalpy× difference jumps signifi- and a Pymatgen object; XTALCOMP: C++ object (and cantly (to approximately 50 meV/atom) near  = 0.125. POSCARs via online tool); SPAP: POSCAR (and CIFs via Thus, matching structures with misfits beyond  = 0.1 CIF2Cell [62]); CMPZ: KPLOT structure files; CRYCOM: are not guaranteed to exhibit similar enthalpies. This FDAT files (native to the Cambridge Structural Database value is in agreement with Burzlaff and Malinovsky’s (CSD)[63]); STRUCTURE-TIDY: creates structures proposed threshold. By default, XtalFinder employs based on space group input, unit cell parameters, and a threshold of match = 0.1 to ensure similar materials positions of atoms; and COMPSTRU: CIF. match within approximately 5 meV/atom. The thresh- Symmetry analysis. XtalFinder is the only package old is also used for comparing prototypes; two prototypes coupled with an internal symmetry calculator (AFLOW- that match within match = 0.1, are expected to have en- SYM [19]). CRYCOM, STRUCTURE-TIDY, and COMP- thalpies within 5 meV/atom when decorated with the STRU require a symmetry input (space group number) same atomic elements. Users can adjust to stricter (or to perform the comparison, but lack methods to calcu- looser) thresholds for matching; however,  0.1 match  late the symmetry internally. CMPZ allows a symmetry is not guaranteed to yield small enthalpy differences be- input; however it is not required to perform the compar- tween matched structures. ison. Structure Matcher, XTALCOMP, and SPAP do not Functionality differences with other codes. In ad- consider symmetry in their structural analyses. dition to XtalFinder, other structure comparison tools Multiple comparisons. The only packages that offer are available to the materials science community: Struc- comparison of multiple materials in a single command are ture Matcher [10], XTALCOMP [9], SPAP [11], CMPZ [12], XtalFinder and Structure Matcher. Other software, such CRYCOM [8], STRUCTURE-TIDY (via Platon) [13], and as XTALCOMP and SPAP, showcase comparison results COMPSTRU [14]. A summary of the offered function- performed on multiple structures, but multi-comparison alities related to automatic comparisons and structure routines are not available to users. To achieve similar prototyping is indicated in Table2 and described below. functionality, users need to implement external regroup- Source code availability. The source codes are ing procedures. available for the following packages: XtalFinder (via Decoration-type comparisons. XtalFinder is the AFLOW), Structure Matcher (via Pymatgen), and XTAL- only code that automatically determines the unique (and COMP (via XTALOPT). Pre-compiled binaries for SPAP equivalent) atom decorations for a given crystal struc- on different operating systems are available with the CA- ture. With other packages, users must externally gener- LYPSO software [59, 60]. Source codes for the other soft- ate, organize, and compare the subsequent decorations. ware: CMPZ (implemented in KPLOT [61]), CRYCOM, Beyond the lack of routines to generate decorations, STRUCTURE-TIDY, and COMPSTRU (online) are not the codes find incorrect equivalent decoration groups. available. Therefore, the latter packages are not con- In the BiITe (ICSD #10500) example discussed in the venient for merging into user-workflows. “Decoration-type” comparison mode section, Structure Input file formats. The different structure file Matcher’s group_structures function (with ltol=0.2, formats for each comparison code are listed below; stol=0.17, angle_tol=5.0) identifies groupings that vi- XtalFinder: VASP (POSCAR)[27], FHI-AIMS [28], olate divisor theory: 14

• ABC = BAC (rms=0.1196) = CBA Speed and scaling considerations. Comparison (rms=0.0194), speeds were evaluated for packages that could be com- • CAB = ACB (rms=0.1196), and piled locally on a Linux machine: XtalFinder (V3.2), • BCA (no equivalent decorations). Structure Matcher (V2020.4.2), and XTALCOMP (down- A similar discrepancy occurs for the remaining codes loaded from GitHub on 14 Apr. 2020). The bench- depending on the structure input order and compari- marks were run with a single processor on a 2.60 GHz son tolerance(s). XtalFinder checks consistency with Intel(R) Xeon(R) Gold 6142 CPU machine, and the re- Lagrange’s theorem to validate permissible decoration spective default tolerances were used for all codes. Pair- groupings; a burden that falls to users of the other pack- wise comparison times are similar between the packages; ages. on the order of milliseconds. For a 1,494 pairwise com- Furthermore, XtalFinder calculates Wyckoff positions parison test set [65], XtalFinder averaged 282 millisec- a priori to check if decorations are commensurate based onds/comparison, Structure Matcher averaged 689 mil- on symmetry, i.e. mapped positions have the same multi- liseconds/comparison, and XTALCOMP averaged 33 mil- plicity and similar site symmetries. Without this valida- liseconds/comparison. XTALCOMP is the fastest, but tion, positions with differing site symmetries can be mis- at the cost of limited scope and functionality: XTAL- taken as equivalent. For example, the GaPu compound COMP does not scale volumes and quits immediately if (ICSD #103930) has three Wyckoff positions: Ga 8 h the volumes/lattice vectors are different sizes. Therefore, m.2m, Pu 4 d 4m2, and Pu 4 e 4mm (before and after XTALCOMP finds fewer matches (18), while XtalFinder relaxation). From− symmetry, the Ga and Pu sites can- and Structure Matcher find more (approximately 450). not be swapped to yield degenerate compounds. Despite For large, skewed cells, XtalFinder can be slower since the symmetry restrictions, Structure Matcher incorrectly it does not convert to Minkowski, Niggli, and/or primi- groups the decorations (group_structures) into equiv- tive cells by default to preserve the input representations alent bins using their default tolerance values (i.e. ltol (unlike Structure Matcher and XTALCOMP). To increase =0.2, stol=0.3, angle_tol=5). XtalFinder — with its speed in XtalFinder, lattice transformations are available symmetry analysis coupled with mapping routines — cor- with the relevant options for Minkowski (--minkowski), rectly distinguishes these decorations and is the only vi- Niggli (--niggli), and primitive (--primitive) reduc- able option to establish consistent unique/duplicate dec- tions. orations for crystalline prototypes. For multiple comparisons, XtalFinder scales more ef- Database comparisons. XtalFinder is the only mod- ficiently with the number of compounds when compared ule that features a method for comparing input struc- to other software. XtalFinder groups structures into near tures to a database of materials, namely AFLOW.org. isoconfigurational sets via symmetry and local atomic The API functionality coupled with XtalFinder ensures geometries (both calculated internally), eliminating un- comparisons are performed with the most current ver- necessary mapping comparisons between dissimilar struc- sion of the database, incorporating new materials as they tures. All other codes do not use symmetry or local geom- are calculated. Furthermore, XtalFinder users can com- etry analyses to optimize groupings. Structure Matcher pare structures at various relaxation steps (with the -- — and other straightforward extensions to pairwise com- relaxation_step option). Users of the other packages parisons — only groups by composition and executes need to extract relevant structures (e.g. similar compo- more mapping procedures. For an ensemble of 600 struc- sitions, space group, Wyckoff positions, and local atomic tures modeling a disordered 5-metal carbide [66, 67], geometries at particular relaxation steps) by-hand or XtalFinder partitions immediately into 54 groups via the code auxiliary scripts to perform similar functionality. symmetry and local geometry comparisons, while Struc- Prototype comparisons. XtalFinder compares to ture Matcher puts all 600 structures into one large group. all prototype structures in AFLOW: the Prototype En- Consequently, XtalFinder executes 546 structure map- cyclopedia, the High-Throughput Quantum Comput- pings, and Structure Matcher performs 17,640 mapping ing library, and initial geometries in the AFLOW.org attempts before arriving at the same solution. repository. Similar to the database comparisons, While all benchmarks were performed serially, XtalFinder automatically includes new prototype struc- XtalFinder routines are parallelized and users can spec- tures as they are added to AFLOW. Structure Matcher ify the number of threads for the analyses (--np=x), only compares structures against a static subset of offering additional speed over other packages for large- AFLOW prototypes (i.e. the Prototype Encyclopedia via scale automatic comparisons requiring little or no user AflowPrototypeMatcher [64]). Moreover, XtalFinder input. Therefore, XtalFinder will be more performant, provides the internal degrees of freedom for any struc- especially when comparing more structures. ture (via the prototyping routines); functionality all ex- Comparison accuracy. As shown in Figure6, the isting codes currently lack. To compare to these pro- XtalFinder misfit value decreases with the enthalpy dif- totype representations, users of other packages need to ference between matched compounds, validating its ac- convert the degrees of freedom – including expansion of curacy. Comparisons with Structure Matcher are less the corresponding Wyckoff positions — into a structure accurate — and at times qualitatively incorrect — due file a priori. to conversions of structures to an “average lattice” [10], 15 matching significantly differing lattices with no penalty TABLE 3. Number of unique materials and prototypes ICSD on the rms value. For example, Se ( #104187, space in the AFLOW-ICSD repository. The statistics are or- group #229) and Se (ICSD #57181, space group #166) ganized by number of species, and the counts are shown for are classified as distinct by XtalFinder because the lat- the original and DFT-relaxed entries. tices are considerably dissimilar (latt = 0.15), consistent unique materials prototypes with the space groups. Despite having different sym- species entries metries, Structure Matcher inaccurately finds rms = 0 original relaxed original relaxed between the structures. This distorts their rms value, 1 1,606 538 440 236 196 and it cannot be used to correlate properties of matched 2 22,530 9,050 8,569 3,140 3,017 compounds, e.g. enthalpy. Conversely, XTALCOMP is 3 26,109 17,285 16,725 6,419 6,168 qualitatively accurate, but it lacks a quantitative similar- 4 8,185 6218 6,101 3,962 3,894 ity metric (the return type is a Boolean). Furthermore, 5 1,644 1442 1,426 1,177 1,156 6 291 262 258 246 236 XTALCOMP comparisons neglect volume scaling between 7 25 25 25 25 25 structures, an essential feature for identifying prototypes. XtalFinder is the only comparison software suitable for total 60,390 34,820 33,544 15,205 14,692 quantitatively measuring similarity of materials and pro- totypes. Overall, XtalFinder is optimized for prototype detec- set. This is attributed to the different volumes (e.g. mea- tion and structural comparison within large datasets. In sured temperatures and pressures) of the original geome- addition, it is designed to be accessible to the broader ma- tries, while the DFT-relaxed geometries represent the terials science community for integration into user work- ground state configurations, yielding additional degen- flows. erate compounds. Unique prototypes in the AFLOW-ICSD cata- Overall, the binaries and ternaries have the highest log. With XtalFinder, unique compounds and proto- number of entries, and thus, prototypes. The number types have been identified in the ICSD catalog of the of entries/prototypes drops with species n > 3, follow- AFLOW.org repository. Table3 shows the statistics for ing statistics regarding the complexity of materials [68]. the original (reported by the ICSD [57]) and DFT-relaxed Table4 partitions the prototypes by their symmetry, geometries (via the AFLOW standard [32]) for 60,390 en- i.e. Bravais lattices. The number of lower symmetry tries. Material-type comparisons and suppressing volume prototypes (tri, mcl, and mclc) exceed the higher ones scaling reveal the number of unique compounds. Sub- (cub, fcc, bcc) because lower symmetry classes have ad- sequent structure-type comparisons (allows for volume ditional degrees of freedom, permitting more geometric scaling) determine the number of distinct prototypes. diversity. Similar to Table3, there are generally more The representative compound for each prototype is cho- original prototypes in each lattice type compared to their sen as the entry with the lowest ICSD number, since it relaxed counterparts. However, 347 structures changed is generally the oldest among the compounds (and less lattice symmetry upon relaxation, yielding the following likely to be removed from the ICSD). The unique atom net Bravais lattice type gains/losses: tri (+8), mcl (-17), decorations for each prototype are determined via the mclc (+46), orc (-31), orcc (+48), orcf (+1), orci (+5), decoration-type comparison. Moreover, the prototypes tet (+17), bct (+8), hex (-87), rhl (-7), cub (+2), fcc are cast into the AFLOW prototype designation form, (+4), bcc (+3). In particular, the mclc and orcc Bravais exposing its degrees of freedom. Finally, the prototypes lattices had a considerable influx of prototypes, offset- are compared to the Prototype Encyclopedia [16, 17] to ting the expected reduction of prototypes due to DFT distinguish between existing and new structures. For the geometry optimization. subsequent comparisons, the matching threshold is cho- While AFLOW.org contains a subset of the ICSD sen as match = 0.1 to group similar compounds (and catalog, the highest frequency prototypes are consis- prototypes when decorated with alike atoms) that are tent with those published for the ICSD [69]. In par- expected to have enthalpies differing by approximately ticular, XtalFinder and the ICSD both identify the 5 meV/atom or less (see subsection “Finding match: following structures as some of the most common structural misfit versus calculated property (∆Hatom)” prototypes: Al2MgO4 (spinel, A2BC4 cF56 227 d a e- for details). 001), CaTiO3 (cubic perovskite, AB3C cP5 221 a c b), The analysis shows that the original geometry set in- GdFeO3 (AB3C oP20 62 a cd c), and NaCl (rocksalt, cludes 34,820 unique compounds (57.7% of the total num- AB cF8 225 a b) (see Table5 and the Supplementary In- ber of entries) and 15,205 prototypes (25.2%). Similarly, formation). The criteria for grouping compounds into the DFT-relaxed set contains 33,544 unique compounds structure-types described in Ref. [69] is more relaxed (55.5%) and 14,692 prototypes (24.3%). Based on the than XtalFinder (e.g. larger tolerances for c/a and β symmetry comparisons, there are 8,521 (14.1%) original ranges and user-defined ranges for fractional atomic co- and 8,493 (14.1%) relaxed distinct isopointal structure- ordinates). Consequently, XtalFinder finds more distinct types. In general, the original geometry set has more dis- prototype structures than the 1,600 (as of January 2007) tinct compounds and prototypes than the DFT-relaxed in Ref. [69]. 16

able for improving performance. Built-in methods com- TABLE 4. The prototypes and their symmetries. The AFLOW prototypes are grouped into the 14 Bravais lattices: triclinic pare input structures to the .org repository and (tri), monoclinic, (mcl), base-centered monoclinic (mclc), or- the AFLOW prototype libraries for detecting new com- thorhombic (orc), base-centered orthorhombic (orcc), face- pounds and structure-types. Crystal prototyping tech- centered orthorhombic (orcf), body-centered orthorhom- niques are also introduced to cast structures into a stan- bic (orci), tetragonal (tet), body-centered tetragonal (bct), dard designation, facilitating extensions of the Prototype hexagonal (hex), rhombohedral (rhl), simple cubic (cub), Encyclopedia. A command line and Python interface face-centered cubic (fcc), and body-centered cubic (bcc). are provided for easing incorporation into user-workflows. prototypes Applying the procedures to the AFLOW-ICSD reposi- lattice type original relaxed tory revealed approximately 15,000 prototypes out of over 60,000 ICSD entries, representing over 34,000 unique tri 1,345 1,338 compounds. Subsequent comparisons with the AFLOW mcl 2,266 2,255 prototype libraries exposed new candidate entries for fu- mclc 2,165 2,195 orc 2,665 2,493 ture iterations of the encyclopedia. Overall, XtalFinder orcc 1,093 1,158 serves as a versatile tool for finding prototypes and com- orcf 167 157 paring crystalline geometries. orci 305 292 Command-line interface. The XtalFinder command- tet 938 887 line calls are detailed below. Function descriptions and bct 861 817 options are provided following each command. hex 1,720 1,540 Prototype commands. rhl 996 911 cub 274 265 • aflow --prototype < file fcc 227 211 – Converts a structure (file) into its standard bcc 183 173 AFLOW prototype label. The parameter variables (degrees of freedom) and corresponding values are also listed. Information about the label and param- From this analysis, new candidate prototypes have eters are described in the Refs.[16, 17]. been identified that are missing from the Prototype En- Options specific to this command: cyclopedia (signified by empty rows in the last columns of --setting=1|2|aflow Specify the space group setting for the conven- Table5 and the Supplementary Information). The num-  ber of new prototypes in the original (relaxed) sets with tional cell/Wyckoff positions. The aflow set- more than 10 unique compounds exhibiting the structure ting follows the choices of the Prototype En- are: binaries 31 (33), ternaries 168 (177), quaternaries cyclopedia: axis-b for monoclinic space groups, 40 (42), and quinaries 4 (3); while the unaries, senaries, rhombohedral setting for rhombohedral space and septenaries have 0 (0). This amounts to 243 dis- groups, and origin centered on the inversion tinct crystalline structures that will be incorporated into site for centrosymmetric space groups (default: future installments of the Prototype Encyclopedia [70]. aflow). Some structures in Table5 and the Supplemen- • aflow --proto=

TABLE 5. Most frequent prototypes in the AFLOW-ICSD catalog. The five most common prototypes are shown for unary, binary, ternary, and quaternary compounds as identified via XtalFinder. The original and relaxed geometry sets are shown on the top and bottom portions of the table, respectively. Each prototype is listed with the following information: AFLOW label, number of unique atom decorations, representative compound with its ICSD designation, number of unique compounds exhibiting the structure (along with the count when including duplicate compounds), and matches to existing AFLOW prototypes, if they exist. Empty rows in the AFLOW prototype column reveal new prototypes, which will be included in Part 3 of the AFLOW Prototype Encyclopedia [16, 17]. The complete list of prototypes is provided in the Supplementary Information. # unique representative AFLOW prototype AFLOW label # compounds decors. compd. (ICSD #) (common name) A cF4 225 a 1 Gd (20502) 86 (379) 1, 2,A cF4 225 a (face-centered cubic, A1) A cI2 229 a 1 H (28465) 58 (228) 58, 59,A cI2 229 a (body-centered cubic, A2) A hP2 194 c 1 Be (1425) 54 (252) 115, 117,A hP2 194 c-001 (hexagonal close packed, A3) A cP1 221 a 1 Sb (52227) 13 (16)A cP1 221 a( α-Po, Ah) A tI2 139 a 1 Ga (12174) 11 (32) 303, 304,A tI2 139 a-001 (In, A6) AB cP2 221 b a 1 ClCs (22173) 429 (1026) 61, 1026, 1205,AB cP2 221 b a (CsCl, B2) AB cF8 225 b a 1 INa (44279) 396 (2176) 201, 720, 1009, 1200,AB cF8 225 a b (rocksalt, B1) AB3 cP4 221 a c 2 SiU3 (1890) 308 (905) 25, 26, AB3 cP4 221 a c (Cu3Au, L12) A2B cF24 227 c b 2 Al2Ca (30213) 251 (1258) 182, 183, 1042, A2B cF24 227 d a (cubic laves, C15) AB cF8 216 a c 1 CuI (9098) 124 (557) 218, 1007, 1201,AB cF8 216 c a (zincblende, B3)

original A3BC cP5 221 c a b 6 O3PbTi (1613) 414 (759) T0009, AB3C cP5 221 a c b (cubic perovskite, E21) A2BC cF16 225 c b a 3 Cu2LiSi (15128) 293 (556) T0001, TBCC013,AB2C cF16 225 a c b (Heusler, L21) ABC hP9 189 f bc g 6 RuSiZr (16306) 227 (332) ABC cF12 216 c a b 3 AuMgSn (16475) 188 (287) T0003, ABC cF12 216 b c a (half-Heusler, C1b) ABC oP12 62 c c c 6 CoMoP (2421) 185 (244) T0004 (CoGeMn ICSD:#52968)

A2BC6D cF40 225 c a e b 12 Ba2MnO6W (189) 207 (314) Q0001 (elpasolite) ABCD tP8 129 b c a c 24 AgLaOS (15530) 54 (86) AB3C7D hP24 173 a c b2c b 24 CuLa3S7Si (23519) 50 (82) A3BC6D hR22 167 e a f b 24 Ca3LiO6Ru (50018) 49 (74) A2B12C3D3 cI160 230 a h d c 24 Al2F12Li3Na3 (9923) 41 (219) A2B3C12D3 cI160 230 a c h d-001 (garnet, S14) A cF4 225 a 1 Ce (2284) 68 (383) 1, 2,A cF4 225 a (face-centered cubic, A1) A cI2 229 a 1 H (28465) 50 (231) 58, 59,A cI2 229 a (body-centered cubic, A2) A hP2 194 c 1 Be (1425) 40 (244) 115, 117,A hP2 194 c-001 (hexagonal close packed, A3) A cP1 221 a 1 Sb (52227) 12 (27)A cP1 221 a( α-Po, Ah) A tI2 139 a 1 Ga (12174) 9 (31) 303, 304,A tI2 139 a-001 (In, A6) AB cP2 221 a b 1 CsI (9204) 399 (1041) 61, 1026, 1205,AB cP2 221 b a (CsCl, B2) AB cF8 225 b a 1 INa (44279) 338 (2188) 201, 720, 1009, 1200,AB cF8 225 a b (rocksalt, B1) AB3 cP4 221 a c 2 SiU3 (1890) 308 (915) 25, 26, AB3 cP4 221 a c (Cu3Au, L12) A2B cF24 227 c b 2 Fe2Tb (2351) 248 (1270) 182, 183, 1042, A2B cF24 227 d a (cubic laves, C15) AB cF8 216 c a 1 CuI (9098) 115 (557) 218, 1007, 1201,AB cF8 216 c a (zincblende, B3)

relaxed A3BC cP5 221 c a b 6 O3PbTi (1613) 399 (841) T0009, AB3C cP5 221 a c b (cubic perovskite, E21) A2BC cF16 225 c b a 3 Cu2LiSi (15128) 291 (556) T0001, TBCC013, AB2C cF16 225 a c b (Heusler, L21) AB2C2 tI10 139 a e d 6 CaGe2Ni2 (408) 241 (572) T0011 (As2CePd2 ICSD:#604354) ABC hP9 189 f bc g 6 RuSiZr (16306) 230 (332) ABC cF12 216 c b a 3 AuMgSn (16475) 188 (287) T0003, ABC cF12 216 b c a (half-Heusler, C1b)

A2BC6D cF40 225 c a e b 12 Ba2MnO6W (189) 209 (321) Q0001 (elpasolite) ABCD tP8 129 b c a c 24 AgLaOS (15530) 55 (95) A3BC6D hR22 167 e a f b 24 Ca3LiO6Ru (50018) 45 (69) AB3C7D hP24 173 a c b2c b 24 CuLa3S7Si (23519) 40 (72) A2B12C3D3 cI160 230 a h d c 24 Al2F12Li3Na3 (9923) 37 (218) A2B3C12D3 cI160 230 a c h d-001 (garnet, S14)

species and with commensurate stoichiometric ra- try files to compare, and tios, i.e. material-type comparison, and returns aflow --compare_materials -F=: their level of similarity (misfit value). This method  specify file () contain- identifies unique and duplicate materials. There are ing compounds between delimiters three input types: [VASP_POSCAR_MODE_EXPLICIT]START and aflow --compare_materials=,,...: [VASP_POSCAR_MODE_EXPLICIT]STOP. Additional  append geometry files (,,...) to delimiters will be included in later versions. compare, • aflow --compare_structures aflow --compare_materials -D : spec- – Compares compounds with commensurate stoichio-  ify path to directory () containing geome- metric ratios with no requirement of the atomic 18

species, i.e. structure-type comparison, and returns – Specifies the misfit threshold for matched structures their level of similarity (misfit value). This method (default: match = 0.1). identifies unique and duplicate prototypes. There • --misfit_family=| are three input types: --misfit_family_threshold= aflow --compare_structures=,,...: – Specifies the misfit threshold for structures in the  append geometry files (,,...) to “same family” (default: family = 0.2). compare, • --np=|--num_proc= aflow --compare_structures -D : – Allocate the number of processors/threads for the  specify path to directory () containing task. geometry files to compare, and • --optimize_match aflow --compare_structures -F=: – Explore all lattice and origin choices to find the best  specify file () contain- matching representation, i.e. minimizes misfit value. ing compounds between delimiters • --no_scale_volume [VASP_POSCAR_MODE_EXPLICIT]START and – Suppresses volume rescaling during structure match- [VASP_POSCAR_MODE_EXPLICIT]STOP. Additional ing; identifies differences due to volume expansion or delimiters will be included in later versions. compression of a structure. • aflow --compare2database < file • --ignore_symmetry – Compares a structure (file) to AFLOW database – Neglects symmetry (both space group and Wyckoff entries, returning similar compounds and quantify- positions) for grouping comparisons. ing their levels of similarity (misfit values). Mate- • --ignore_Wyckoff rial properties can be extracted from the database – Neglects Wyckoff symmetry (site symmetry) for fil- (via AFLUX) and printed, highlighting structure- tering comparisons, but considers the space group property relationships. Performs material-type com- number. parisons or structure-type comparisons (by adding • --ignore_local_geometry the --structure_comparison option). Options – Neglects local LFA geometries for filtering compar- specific to this command: isons. --properties= • --minkowski Specify the properties via their API keyword to – Performs a Minkowski lattice transformation [7] on  print the corresponding values with the compar- all structures prior to comparison; offering a speed ison results. increase. --catalog= • --niggli Restrict the database entries to a specific cata- – Performs a Niggli lattice transformation [6] on all  log/library (e.g. ‘lib1’, ‘lib2’, ‘lib3’, ‘icsd’, etc.). structures prior to comparison; offering a speed in- --geometry_file= crease. Compare geometries from a particular DFT • --primitive|--primitivize  relaxation step (e.g. ‘POSCAR.relax1’, – Converts all structures to a primitive form prior to ‘POSCAR.relax2’, ‘POSCAR.static’, etc.). comparison; offering a speed increase. • aflow --compare2prototypes < file • --keep_unmatched – Compares a structure (file) against the AFLOW – Retains misfit information of unmatched structures prototype libraries, returning similar structures and (i.e.  > match). quantifying their levels of similarity (misfit values). • --match_to_aflow_prototypes --catalog= – Identifies matching AFLOW prototypes to the Restrict the prototypes to a specific catalog/li- representative structure. The option does not  brary (e.g. ‘aflow’ or ‘htqc’). apply to --unique_atom_decorations or -- • aflow --isopointal_prototypes < file compare2prototypes (redundant). – Returns prototype labels that are isopointal (i.e. • --magmom=:... similar space group and Wyckoff positions) to the – Specifies the magnetic moment for each structure input structure (file). (collinear or non-collinear) delimited by colons, sig- --catalog= naling a magnetic-type comparison. The option does Restrict the prototypes to a specific catalog/li- not apply to --compare_structures since the atom  brary (e.g. ‘aflow’ or ‘htqc’). type is neglected. XtalFinder supports three input • aflow --unique_atom_decorations < file formats for the magnetic moment: i. explicit dec- – Determines the unique and duplicate atom decora- laration via comma-separated string m1, m2, ...mn tions for a given structure. (m1,x, m1,y, m1,z, m2,x, ...mn,z for non-collinear) ii. Generic options for all comparison commands (unless read from a VASP INCAR, or iii. read from a VASP indicated otherwise): OUTCAR. Additional magnetic moment readers for other ab initio codes will be available in future ver- --misfit_match=| • sions. --misfit_match_threshold= 19

• --add_aflow_prototype_designation • compare_structures_directory(directory, – Casts representative structure into the AFLOW stan- options) dard designation. The option does not apply • compare_structures_file(filename, options) to commands --unique_atom_decorations or -- • compare2database(input_file, options) prototype (redundant). • compare2prototypes(input_file, options) • --remove_duplicate_compounds • get_isopointal_prototypes(input_file, – For structure-type comparisons, duplicate com- options) pounds are identified first (via a material-type com- • get_unique_atom_decorations(input_file, parison without volume scaling), then remaining options) unique compounds are compared, removing dupli- The input fields for the Python functions are as follows: cate bias. • input_file • --print – A string specifying the path to a structure file, e.g. – For comparing two structures, additional compar- input_file=‘/home/user/test.poscar’. ison information is printed, including atom map- • input_files pings, distances between matched atoms, and the – A list of paths (of any size 2) to structure files, transformed structures in the closest matching rep- e.g. input_files=[‘test1.poscar’,≥ ...]. resentation. • directory • --print=text|json – A string specifying the path to directory contain- – For comparing multiple structures, the results are ing structure files, e.g. directory=‘/home/user/ printed to into human-readable text or JSON files, directory’. respectively. By default, XtalFinder writes the out- • filename put to both files. – A string specifying the path to a file con- • --quiet taining structure files separated by a delimiter, – Suppresses the log information for the comparisons. e.g. filename=‘/home/user/list_of_structures • --screen_only .txt’. – Prints the comparison results to the screen and does • options not write to any files. – A string specifying non-default functionality (op- Python environment. In addition to the command- tional), which has the form -- or --=, e.g. “--ignore_symmetry --np=8”. into a variety of workflows. The module mirrors the for- Python module. Below is a Python module for the mat used for AFLOW-SYM [19] and AFLOW-CHULL [72]. XtalFinder functionality. All output is converted into An XtalFinder function is performed on the input(s) and JavaScript Object Notation (JSON) to ease integration the results are returned to an XtalFinder class. The into user workflows. module wraps around a local instance of AFLOW, and import json the path to the AFLOW executable can be specified by: import subprocess XtalFinder(aflow_executable=‘your_executable’). import os By default, the XtalFinder object searches for an AFLOW executable in the PATH. An example Python class XtalFinder: script is shown below, where XtalFinder object is initial- ized and a material-type comparison between two struc- def i n i t (self, aflow executable=’aflow’): ture files (POSCARs) is performed. self.aflow executable = aflow executable from aflow xtal match import XtalFinder from pprint import pprint def aflow command(self, cmd): try: xtal match = XtalFinder(aflow executable=’./aflow’) return subprocess.check output( input files = [‘test1.poscar’,‘test2.poscar’] self.aflow executable + cmd, output = xtal match.compare materials(input files) shell=True pprint(output) ) except subprocess.CalledProcessError: The following Python functions are accessible, corre- print "Error aflow executable not found sponding to the commands described in the previous sec- , at:"+ self.aflow executable tion: → • get_prototype_label(input_file, options) def get prototype label(self, input file , • compare_materials(input_files, options) , options=None): → • compare_materials_directory(directory, fpath = os.path.realpath(input file) options) command =’--prototype’ • compare_materials_file(filename, options) output =’’ • compare_structures(input_files, options) 20

if options: res json = json.loads(output) command +=’’ + options return res json

output = self.aflow command( def compare structures(self, input files , command +’--print=json<’ + fpath , options=None): ) command→ =’--compare structures=’+’,’. , join(input files) res json = json.loads(output) output→ =’’ return res json if options: def compare materials(self, input files , command +=’’ + options , options=None): command→ =’--compare materials=’+’,’. output = self.aflow command( , join(input files) command +’--print=json--screen only output→ =’’ , --quiet’ ) → if options: command +=’’ + options res json = json.loads(output) return res json output = self.aflow command( command +’--print=json--screen only def compare structures directory(self, , --quiet’ , directory, options=None): ) → command→ =’--compare structures-D’+ , directory res json = json.loads(output) output→ =’’ return res json if options: def compare materials directory(self, directory command +=’’ + options , , options=None): command→ =’--compare materials-D’+ output = self.aflow command( , directory command +’--print=json--screen only output→ =’’ , --quiet’ ) → if options: command +=’’ + options res json = json.loads(output) return res json output = self.aflow command( command +’--print=json--screen only def compare structures file(self, filename, , --quiet’ , options=None): ) → command→ =’--compare structures-F=’+ , filename res json = json.loads(output) output→ =’’ return res json if options: def compare materials file(self, filename, command +=’’ + options , options=None): command→ =’--compare materials-F=’+ output = self.aflow command( , filename command +’--print=json--screen only output→ =’’ , --quiet’ ) → if options: command +=’’ + options res json = json.loads(output) return res json output = self.aflow command( command +’--print=json--screen only def compare2database(self, input file, options= , --quiet’ , None): ) → fpath→ = os.path.realpath(input file) command =’--compare2database’ 21

output =’’ res json = json.loads(output) if options: return res json command +=’’ + options AFLOW-XtalFinder JSON output details. The out- output = self.aflow command( put keywords for the XtalFinder functions are listed be- command +’--print=json--screen only low as they appear in the JSON format. The output , --quiet<’ + fpath for multiple comparisons (user defined sets, comparison → ) to AFLOW prototypes, and comparison to AFLOW.org entries), unique atom decorations, and casting into the res json = json.loads(output) AFLOW prototype representation are described. return res json AFLOW prototype designation. • aflow_prototype_label def compare2prototypes(self, input file , – Description: AFLOW label for the structure. , options=None): → – Type: string fpath = os.path.realpath(input file) • aflow_prototype_params_list command =’--compare2prototypes’ – Description: degrees of freedom (variables) in the output =’’ lattice and/or Wyckoff positions for the structure. – Type: array of strings if options: • aflow_prototype_params_values command +=’’ + options – Description: values specifying the degrees of freedom for the structure. output = self.aflow command( – Type: array of floats command +’--print=json--screen only Comparison results. , --quiet<’ + fpath structure_representative ) → • – Description: name of the representative structure for res json = json.loads(output) the prototype structure. Type: string return res json – • stoichiometry Description: def get isopointal prototypes(self, input file , – stoichiometry of the prototype struc- ture. , options=None): Type: array of integers fpath→ = os.path.realpath(input file) – number_of_types command =’--isopointal prototype’ • Description: output =’’ – number of atom types (species) in the prototype structure. Type: integer if options: – number_of_atoms command +=’’ + options • – Description: number of atoms in the unit cell (from output = self.aflow command( the representative structure). Type: integer command +’--print=json<’ + fpath – elements ) • – Description: atomic elements found in this structure res json = json.loads(output) from both the representative and duplicate com- return res json pounds/structures. – Type: array of strings space_group def get unique atom decorations(self, • – Description: space group number for the prototype , input file, options=None): fpath→ = os.path.realpath(input file) structure. Type: integer command =’--unique atom decorations’ – grouped_Wyckoff_positions output =’’ • – Description: Wyckoff positions grouped by atomic if options: species (corresponding to the representative struc- command +=’’ + options ture). – Type: array of Wyckoff objects geomeries_LFA output = self.aflow command( • Description: command +’--print=json<’ + fpath – local atomic geometries comprised of ) LFA types only (corresponding to the representative structure). 22

– Type: array of local_geometry objects – Type: array of floats • property_names • properties_structure_representative – Description: API keywords corresponding to ma- – Description: values of the material properties re- terial properties (available for comparisons to the quested for the representative structure (available AFLOW.org repository only). for comparisons to the AFLOW.org repository only). – Type: array of strings – Type: array of strings • property_units • properties_structures_duplicate – Description: units, if applicable, for material prop- – Description: values of the material properties re- erties (available for comparisons to the AFLOW.org quested for the duplicate structures (available for repository only). comparisons to the AFLOW.org repository only). – Type: array of strings – Type: 2D array of strings • structures_duplicate • properties_structures_family – Description: names of duplicate structures that – Description: values of the material properties re- match with the representative structure, i.e. mis- quested for the same family structures (available for fit is less than match. comparisons to the AFLOW.org repository only). – Type: array of strings – Type: 2D array of strings • misfits_duplicate • number_compounds_matching_representative – Description: values of the misfit between the repre- – Description: number of compounds that match with sentative structure and the duplicate structures. the representative structure via a material-type com- – Type: array of floats parison (only for structure-type comparisons that re- • lattice_deviations_duplicate move duplicate compounds beforehand). – Description: values of the lattice deviation between – Type: integer the representative structure and the duplicate struc- • number_compounds_matching_duplicate tures. – Description: number of compounds that match with – Type: array of floats the duplicates structures via a material-type com- • coordinate_displacements_duplicate parison (only for structure-type comparisons that – Description: values of the coordinate displacement remove duplicate compounds beforehand). between the representative structure and the dupli- – Type: array of integers cate structures. • number_compounds_matching_family – Type: array of floats – Description: number of compounds that match with • failures_duplicate the same family structures via a material-type com- – Description: values of the figure of failure between parison (only for structure-type comparisons that re- the representative structure and the duplicate struc- move duplicate compounds beforehand). tures. – Type: array of integers – Type: array of floats • matching_aflow_prototypes • structures_family – Description: labels of AFLOW crystal proto- – Description: names of structures that are within the types [16, 17] that match with this structure (in- same family as the representative structure, i.e. mis- cluded when using option fit is between match and family. “--add_matching_aflow_prototypes”). – Type: array of strings – Type: array of strings • misfits_family A Wyckoff object contains the following: – Description: values of the misfit between the repre- element sentative structure and the same family structures. • Description: – Type: array of floats – atomic species on Wyckoff site. – Type: string • lattice_deviations_family type – Description: values of the lattice deviation between • Description: the representative structure and the same family – an index corresponding to atomic structures. species, based on alphabetic ordering of element – Type: array of floats name. – Type: integer • coordinate_displacements_family letters – Description: values of the coordinate displacement • Description: between the representative structure and the same – Wyckoff letters for the atomic species. Type: array of strings family structures. – multiplicities – Type: array of floats • – Description: Wyckoff multiplicities for the atomic • failures_family – Description: values of the figure of failure between species. Type: array of integers the representative structure and the same family – site_symmetries structures. • – Description: Wyckoff site symmetries for the atomic 23

species. – Type: array of strings – Type: array of strings • aflow_prototype_params_values_orig A local_geometry object contains the following: – Description: values specifying the degrees of free- • center_element dom of the structure (original geometry). – Description: atomic species at the center of the ge- – Type: array of floats ometry cluster. • aflow_prototype_label_relax – Type: string – Description: the standard prototype label of the • center_type structure (DFT-relaxed geometry). – Description: index corresponding to atomic species – Type: string at the center of the geometry cluster; enumeration • aflow_prototype_params_list_relax is based on alphabetic ordering of element name. – Description: degrees of freedom (variables) in the – Type: integer lattice and/or Wyckoff positions of the structure • neighbor_elements (DFT-relaxed geometry). – Description: atomic elements of neighbors. – Type: array of strings – Type: array of strings • aflow_prototype_params_values_relax • neighbor_distances – Description: values specifying the degrees of free- – Description: distances of the neighbors from the cen- dom of the structure (DFT-relaxed geometry). ter atom – Type: array of floats – Type: array of floats • neighbor_frequencies – Description: coordination of the neighbors at the All crystallographic structure data is freely available corresponding neighbor distance (within 10%). and accessible online through AFLOW.org or program- – Type: array of integers matically via the REST- and AFLUX Search-APIs. The • neighbor_coordinates AFLOW prototype information is provided online at – Description: coordinates of the neighbors that com- http://aflow.org/prototype-encyclopedia, and the corre- prise the local atomic geometry; the origin of the sponding structures can be generated with the AFLOW system resides on the center atom. source code. – Type: 2D array of floats The XtalFinder module is integrated into the AFLOW Permutation results. software (version 3.2 and later). The source code for • atom_decorations_equivalent AFLOW is available at http://aflow.org/install-aflow/ – Description: groupings of equivalent atom decora- and http://materials.duke.edu/AFLOW/, and it is com- tions for the structure. patible with most Linux, macOS, and Microsoft operat- – Type: 2D array of strings ing systems. The multithreaded capabilities require GNU Ideal prototype API keywords. g++-4.4 or later. • aflow_prototype_label_orig Supplementary information. The article is accom- – Description: the standard prototype label of the panied by supplementary information providing i. the structure (original geometry). computational details for the data in Figure6 and ii. – Type: string the full prototype list extracted from the AFLOW-ICSD • aflow_prototype_params_list_orig catalog (a continuation of Table5). – Description: degrees of freedom (variables) in the AFLOW-XtalFinder support. Questions and bug re- lattice and/or Wyckoff positions of the structure ports should be emailed to [email protected] with a sub- (original geometry). ject line containing “XtalFinder”.

[1] P. P. Ewald and C. Hermann, eds., Strukturbericht 1913- [6] P. Niggli, Handbuch der Experimentalphysik, vol. 7 1928 (Akademische Verlagsgesellschaft M. B. H., Leipzig, (Akademische Verlagsgesellschaft, 1928). 1931). [7] H. Minkowski, Geometrie der Zahlen (Teubner-Verlag, [2] P. Villars and L. Calvert, Pearson’s Handbook of Crys- 1896). tallographic Data for Intermetallic Phases (ASM Inter- [8] A. V. Dzyabchenko, Method of crystal-structure simi- national, Materials Park, Ohio, USA, 1991), 2nd edn. larity searching, Acta Crystallogr. Sect. B 50, 414–425 [3] S. Curtarolo, G. L. W. Hart, M. Buongiorno Nardelli, (1994). N. Mingo, S. Sanvito, and O. Levy, The high-throughput [9] D. C. Lonie and E. Zurek, Identifying duplicate crystal highway to computational materials design, Nat. Mater. structures: XTALCOMP, an open-source solution, Com- 12, 191–201 (2013). put. Phys. Commun. 183, 690–697 (2012). [4] A. Ko lcz, A. Chowdhury, and J. Alspector, Data dupli- [10] W. D. Richards, S. Dacek, and S. P. Ong, Pymatgen: cation: an imbalance problem? (2003). Structure Matcher, http://pymatgen.org/_modules/ [5] E. N. Muratov et al., QSAR without borders, Chem. Soc. pymatgen/analysis/structure_matcher. (2011). Rev. 49, 3525–3564 (2020). (accessed January 20, 2020). 24

[11] C. Su, J. Lv, Q. Li, H. Wang, L. Zhang, Y. Wang, a quasiharmonic Debye model, Phys. Rev. B 90, 174107 and Y. Ma, Construction of crystal structure prototype (2014). database: methods and applications, J. Phys.: Condens. [24] S. Curtarolo, W. Setyawan, G. L. W. Hart, M. Jahn´atek, Matter 29, 165901 (2017). R. V. Chepulskii, R. H. Taylor, S. Wang, J. Xue, K. Yang, [12] R. Hundt, J. C. Sch¨on,and M. Jansen, CMPZ - an algo- O. Levy, M. J. Mehl, H. T. Stokes, D. O. Demchenko, and rithm for the efficient comparison of periodic structures, D. Morgan, AFLOW: An automatic framework for high- J. Appl. Crystallogr. 39, 6–16 (2006). throughput materials discovery, Comput. Mater. Sci. 58, [13] L. M. Gelato and E. Parth´e, STRUCTURE TIDY - a 218–226 (2012). computer program to standardize crystal structure data, [25] A strict parameter comparison does not distinguish iso- J. Appl. Crystallogr. 20, 139–143 (1987). configurational structures, e.g. parameters may differ by [14] G. de la Flor, D. Orobengoa, E. Tasci, J. M. Perez-Mato, an origin shift. and M. I. Aroyo, Comparison of structures applying the [26] M.-O. Lenz, T. A. R. Purcell, D. Hicks, S. Curtarolo, tools available at the Bilbao Crystallographic Server, J. M. Scheffler, and C. Carbogno, Parametrically con- Appl. Crystallogr. 49, 653–664 (2016). strained geometry relaxations for high-throughput mate- [15] D. C. Lonie and E. Zurek, XTALOPT: An open-source rials science, npj Comput. Mater. 5, 123 (2019). evolutionary algorithm for crystal structure prediction, [27] G. Kresse and J. Hafner, Ab initio molecular dynamics Comput. Phys. Commun. 182, 372–387 (2011). for liquid metals, Phys. Rev. B 47, 558–561 (1993). [16] M. J. Mehl, D. Hicks, C. Toher, O. Levy, R. M. Hanson, [28] V. Blum, R. Gehrke, F. Hanke, P. Havu, V. Havu, G. L. W. Hart, and S. Curtarolo, The AFLOW Library X. Ren, K. Reuter, and M. Scheffler, Ab initio molecular of Crystallographic Prototypes: Part 1, Comput. Mater. simulations with numeric atom-centered orbitals, Com- Sci. 136, S1–S828 (2017). put. Phys. Commun. 180, 2175–2196 (2009). [17] D. Hicks, M. J. Mehl, E. Gossett, C. Toher, O. Levy, [29] P. Giannozzi, S. Baroni, N. Bonini, M. Calandra, R. Car, R. M. Hanson, G. L. W. Hart, and S. Curtarolo, The C. Cavazzoni, D. Ceresoli, G. L. Chiarotti, M. Cococ- AFLOW Library of Crystallographic Prototypes: Part 2, cioni, I. Dabo, A. Dal Corso, S. de Gironcoli, S. Fabris, Comput. Mater. Sci. 161, S1–S1011 (2019). G. Fratesi, R. Gebauer, U. Gerstmann, C. Gougoussis, [18] T. Hahn, ed., International Tables of Crystallogra- A. Kokalj, M. Lazzeri, L. Martin-Samos, N. Marzari, phy. Volume A: Space-group symmetry (Kluwer Aca- F. Mauri, R. Mazzarello, S. Paolini, A. Pasquarello, demic publishers, International Union of Crystallogra- L. Paulatto, C. Sbraccia, S. Scandolo, G. Sclauzero, A. P. phy, Chester, England, 2002). Seitsonen, A. Smogunov, P. Umari, and R. M. Wentz- [19] D. Hicks, C. Oses, E. Gossett, G. Gomez, R. H. Tay- covitch, QUANTUM ESPRESSO: a modular and open- lor, C. Toher, M. J. Mehl, O. Levy, and S. Curtarolo, source software project for quantum simulations of mate- AFLOW-SYM: platform for the complete, automatic and rials, J. Phys.: Condens. Matter 21, 395502 (2009). self-consistent symmetry analysis of crystals, Acta Crys- [30] X. Gonze, J. M. Beuken, R. Caracas, F. Detraux, tallogr. Sect. A 74, 184–203 (2018). M. Fuchs, G. M. Rignanese, L. Sindic, M. Verstraete, [20] H. Burzlaff and Y. Malinovsky, A Procedure for the Clas- G. Zerah, F. Jollet, M. Torrent, A. Roy, M. Mikami, sification of Non-Organic Crystal Structures. I. Theoret- P. Ghosez, J. Y. Raty, and D. C. Allan, First-principles ical Background, Acta Crystallogr. Sect. A 53, 217–224 computation of material properties: the ABINIT software (1997). project, Comput. Mater. Sci. 25, 478–492 (2002). [21] C. Toher, C. Oses, D. Hicks, E. Gossett, F. Rose, P. Nath, [31] The Elk Code (2020). http://elk.sourceforge.net/. D. Usanmaz, D. C. Ford, E. Perim, C. E. Calderon, [32] C. E. Calderon, J. J. Plata, C. Toher, C. Oses, O. Levy, J. J. Plata, Y. Lederer, M. Jahn´atek, W. Setyawan, M. Fornari, A. Natan, M. J. Mehl, G. L. W. Hart, S. Wang, J. Xue, K. Rasch, R. V. Chepulskii, R. H. Tay- M. Buongiorno Nardelli, and S. Curtarolo, The AFLOW lor, G. Gomez, H. Shi, A. R. Supka, R. Al Rahal Al standard for high-throughput materials science calcula- Orabi, P. Gopal, F. T. Cerasoli, L. Liyanage, H. Wang, tions, Comput. Mater. Sci. 108 Part A, 233–238 (2015). I. Siloi, L. A. Agapito, C. Nyshadham, G. L. W. Hart, [33] Y. Hardy, K. S. Tan, and W.-H. Steeb, Computer Algebra J. Carrete, F. Legrain, N. Mingo, E. Zurek, O. Isayev, with SymbolicC++ (World Scientific, Singapore, 2008). A. Tropsha, S. Sanvito, R. M. Hanson, I. Takeuchi, M. J. [34] http://aflow.org/prototype-encyclopedia/ABCD_ Mehl, A. N. Kolmogorov, K. Yang, P. D’Amico, A. Calzo- cF16_216_c_d_b_a.html. lari, M. Costa, R. De Gennaro, M. Buongiorno Nardelli, [35] J. Lima-de-Faria, E. Hellner, F. Liebau, E. Makovicky, M. Fornari, O. Levy, and S. Curtarolo, The AFLOW and E. Parth´e, Nomenclature of inorganic structure Fleet for Materials Discovery, in Handbook of Materials types. Report of the International Union of Crystallogra- Modeling, edited by W. Andreoni and S. Yip (Springer phy Commission on Crystallographic Nomenclature Sub- International Publishing, Cham, Switzerland, 2018), pp. committee on the Nomenclature of Inorganic Structure 1–28, doi:10.1007/978-3-319-42913-7˙63-1. Types, Acta Crystallogr. Sect. A 46, 1–11 (1990). [22] C. Toher, C. Oses, J. J. Plata, D. Hicks, F. Rose, O. Levy, [36] L. L. Boyle and J. E. Lawrenson, The origin dependence M. de Jong, M. Asta, M. Fornari, M. Buongiorno of Wyckoff site description of a crystal structure, Acta Nardelli, and S. Curtarolo, Combining the AFLOW Crystallogr. Sect. A 29, 353–357 (1973). GIBBS and elastic libraries to efficiently and robustly [37] E. Koch and W. Fischer, Automorphismengruppen von screen thermomechanical properties of solids, Phys. Rev. Raumgruppen und die Zuordnung von Punktlagen zu Mater. 1, 015401 (2017). Konfigurationslagen, Acta Crystallogr. Sect. A 31, 88– [23] C. Toher, J. J. Plata, O. Levy, M. de Jong, M. Asta, 95 (1975). M. Buongiorno Nardelli, and S. Curtarolo, High- [38] Permuting the site symmetry symbol does not always re- throughput computational screening of thermal conductiv- veal Wyckoff positions belonging to the same set since the ity, Debye temperature, and Gr¨uneisenparameter using site symmetry may originate from higher point symme- 25

tries (see example of space group #66 (Cccm) and Wyck- 178–192 (2014). off positions i and k in Ref. [37]). Nevertheless, Wyckoff [56] F. Rose, C. Toher, E. Gossett, C. Oses, M. Buongiorno positions belonging to different sets cannot be matched, Nardelli, M. Fornari, and S. Curtarolo, AFLUX: The which will be revealed via the comparison. LUX materials search API for the AFLOW data reposi- [39] E. Perim, D. Lee, Y. Liu, C. Toher, P. Gong, Y. Li, W. N. tories, Comput. Mater. Sci. 137, 362–370 (2017). Simmons, O. Levy, J. J. Vlassak, J. Schroers, and S. Cur- [57] G. Bergerhoff, R. Hundt, R. Sievers, and I. D. Brown, tarolo, Spectral descriptors for bulk metallic glasses based The inorganic crystal structure data base, J. Chem. Inf. on the thermodynamics of competing crystalline phases, Comput. Sci. 23, 66–69 (1983). Nat. Commun. 7, 12315 (2016). [58] S. Curtarolo, W. Setyawan, S. Wang, J. Xue, K. Yang, [40] N. Zimmermann and A. Jain, Local Structure Order Pa- R. H. Taylor, L. J. Nelson, G. L. W. Hart, S. San- rameters and Site Fingerprints for Quantification of Co- vito, M. Buongiorno Nardelli, N. Mingo, and O. Levy, ordination Environment and Crystal Structure Similar- AFLOWLIB.ORG: A distributed materials properties ity, RSC Adv. 10, 6063–6081 (2020). repository from high-throughput ab initio calculations, [41] M. Hloucha and U. K. Deiters, Fast Coding of the Comput. Mater. Sci. 58, 227–235 (2012). Minimum Image Convention, Mol. Simul. 20, 239–244 [59] Y. Wang, J. Lv, L. Zhu, and Y. Ma, Crystal structure (1998). prediction via particle-swarm optimization, Phys. Rev. B [42] http://aflow.org/prototype-encyclopedia/A2B_ 82, 094116 (2010). tP12_92_b_a.html. [60] Y. Wang, J. Lv, L. Zhu, and Y. Ma, CALYPSO: A [43] W. B. Pearson, The Crystal Chemistry and Physics of method for crystal structure prediction, Comput. Phys. Metals and Alloys (Wiley-Interscience, 1972). Commun. 183, 2063–2070 (2012). [44] E. Parth´e, Elements of Inorganic Structural Chemistry: a [61] R. Hundt, KPLOT: A Program for Plotting and Investi- course on selected topics (K. Sutter Parth´e,Petit-Lancy, gation of Crystal Structures (1979). Switzerland, 1990). [62] T. Bj¨orkman, CIF2Cell: Generating geometries for elec- [45] http://aflow.org/prototype-encyclopedia/AB_cF8_ tronic structure programs, Comput. Phys. Commun. 182, 216_c_a.html. 1183–1186 (2011). [46] P. Avery, C. Toher, S. Curtarolo, and E. Zurek, [63] C. R. Groom, I. J. Bruno, M. P. Lightfoot, and S. C. XtalOpt Version r12: An open-source evolutionary al- Ward, The Cambridge Structural Database, Acta Crys- gorithm for crystal structure prediction, Comput. Phys. tallogr. Sect. B 72, 171–179 (2016). Commun. 237, 274–275 (2019). [64] Pymatgen: AflowPrototypeMatcher, http://pymatgen. [47] C. Oses, C. Toher, and S. Curtarolo, High-entropy ce- org/pymatgen.analysis.aflow_prototypes.html. (ac- ramics, Nat. Rev. Mater. 5, 295–309 (2020). cessed January 20, 2020). [48] J. A. Beachy and W. D. Blair, Abstract Algebra (Wave- [65] The test set is comprised of ICSD unaries (original ge- land Press, Inc., Long Grove, Illinois, 2006). ometries) with 1-105 atoms per unit cell and varying sym- [49] C. Oses, C. Toher, and S. Curtarolo, Data-driven design metries; along with a mix of equivalent and inequivalent of inorganic materials with the Automatic Flow Frame- structures. work for Materials Discovery, MRS Bull. 43, 670–675 [66] K. Yang, C. Oses, and S. Curtarolo, Modeling Off- (2018). Stoichiometry Materials with a High-Throughput Ab- [50] C. Draxl and M. Scheffler, NOMAD: The FAIR concept Initio Approach, Chem. Mater. 28, 6484–6492 (2016). for big data-driven materials science, MRS Bull. 43, 676– [67] P. Sarker, T. Harrington, C. Toher, C. Oses, M. Samiee, 682 (2018). J.-P. Maria, D. W. Brenner, K. S. Vecchio, and S. Cur- [51] A. Jain, S. P. Ong, G. Hautier, W. Chen, W. D. Richards, tarolo, High-entropy high-hardness metal carbides dis- S. Dacek, S. Cholia, D. Gunter, D. Skinner, G. Ceder, covered by entropy descriptors, Nat. Commun. 9, 4980 and K. A. Persson, Commentary: The Materials Project: (2018). A materials genome approach to accelerating materials [68] A. L. Mackay, On complexity, Crystallogr. Rep. 46, 524– innovation, APL Mater. 1, 011002 (2013). 526 (2001). [52] The High-Throughput Toolkit (httk), http://httk. [69] R. Allmann and R. Hinek, The introduction of struc- openmaterialsdb.se/. (accessed January 20, 2020). ture types into the Inorganic Crystal Structure Database [53] G. Pizzi, A. Cepellotti, R. Sabatini, N. Marzari, and ICSD, Acta Crystallogr. Sect. A 63, 412–417 (2007). B. Kozinsky, AiiDA: automated interactive infrastructure [70] Most entries in the Prototype Encyclopedia stem from and database for computational science, Comput. Mater. experimentally observed structures; therefore, we plan Sci. 111, 218–230 (2016). to use the original geometries for prototyping. [54] J. E. Saal, S. Kirklin, M. Aykol, B. Meredig, and [71] http://aflow.org/prototype-encyclopedia/A2B_hP9_ C. Wolverton, Materials Design and Discovery with 189_fg_bc.html. High-Throughput Density Functional Theory: The Open [72] C. Oses, E. Gossett, D. Hicks, F. Rose, M. J. Quantum Materials Database (OQMD), JOM 65, 1501– Mehl, E. Perim, I. Takeuchi, S. Sanvito, M. Schef- 1509 (2013). fler, Y. Lederer, O. Levy, C. Toher, and S. Curtarolo, [55] R. H. Taylor, F. Rose, C. Toher, O. Levy, K. Yang, AFLOW-CHULL: Cloud-Oriented Platform for Autono- M. Buongiorno Nardelli, and S. Curtarolo, A mous Phase Stability Analysis, J. Chem. Inf. Model. 58, RESTful API for exchanging materials data in the 2477–2490 (2018). AFLOWLIB.org consortium, Comput. Mater. Sci. 93,