AFLOW-XtalFinder: a reliable choice to identify crystalline prototypes
David Hicks,1, 2 Cormac Toher,1, 2 Denise C. Ford,1, 2 Frisco Rose,1, 2 Carlo De Santo,1, 2 Ohad Levy,1, 2, 3 Michael J. Mehl,1, 2 and Stefano Curtarolo1, 2, ∗ 1Department of Mechanical Engineering and Materials Science, Duke University, Durham, North Carolina 27708, USA 2Center for Autonomous Materials Design, Duke University, Durham, North Carolina 27708, USA 3Department of Physics, NRCN, P.O. Box 9001, Beer-Sheva 84190, Israel (Dated: October 12, 2020) The accelerated growth rate of repository entries in crystallographic databases makes it arduous to identify and classify their prototype structures. The open-source AFLOW-XtalFinder package was developed to solve this problem. It symbolically maps structures into standard designations following the AFLOW Prototype Encyclopedia and calculates the internal degrees of freedom consistent with the International Tables for Crystallography. To ensure uniqueness, structures are analyzed and compared via symmetry, local atomic geometries, and crystal mapping techniques, simultaneously grouping them by similarity. The software i. distinguishes distinct crystal prototypes and atom decorations, ii. determines equivalent spin configurations, iii. reveals compounds with similar properties, and iv. guides the discovery of unexplored materials. The operations are accessible through a Python module ready for workflows, and through command line syntax. All the 4+ million compounds in the AFLOW.org repositories are mapped to their ideal prototype, allowing users to search database entries via symbolic structure-type. Furthermore, 15,000 unique structures — sorted by prevalence — are extracted from the AFLOW-ICSD catalog to serve as future prototypes in the Encyclopedia.
Scientists have been struggling for decades to iden- comparison. For instance, XTALCOMP is coupled tify prototypes (e.g. Strukturbericht series [1] and Pear- with the XTALOPT infrastructure for identifying dis- son’s Handbook [2]) and duplicates in crystallographic tinct materials generated via their evolutionary algo- databases; and to label structures in a concise way to rithm [15]. Despite the considerable number of platforms, recognize (and enable searching by) structure-types. The none are suitable for autonomous prototype detection. recent rapid growth of online repositories has worsened Crystallographic symmetry is neglected in Structure the problem [3]. Distinguishing distinct crystalline com- Matcher, XTALCOMP, and SPAP; while STRUCTURE- pounds is becoming increasingly difficult, leading to repe- TIDY, CRYCOM, and COMPSTRU rely on external sym- tition of previously studied materials, hindering database metry packages. Additionally, most tools only feature variety — biasing data-driven analyses and machine single pairwise comparisons (with the exception of Struc- learning methods [4,5] — and wasting valuable com- ture Matcher) and others require additional inputs (e.g. putational and experimental resources. The multitude space group, Wyckoff positions, and unit cell choice). of crystal geometries make by-hand detection of proto- Aside from technical functionality, the codes do not of- types and repeated entries intractable. A major com- fer built-in methods to compare structures to existing plication for finding structure-types is the non-standard crystallographic libraries and material repositories. To representation of crystals. Determination of unique crys- promote materials discovery, routines must analyze com- tallographic structures is obfuscated by i. unit cell repre- pounds with respect to established prototypes to iden- sentations and ii. origin choices. While standard forms tify new structure-types. This would enable the ex- exist — such as Niggli [6] and Minkowski [7] unit cells — pansion of prototype libraries — such as the AFLOW the conversion procedures are highly sensitive to numer- Prototype Encyclopedia (or Prototype Encyclopedia for ical tolerance values and can cast similar structures into brevity) [16, 17] — fueling generation of unique com- differing descriptions [8,9]. Additionally, lattice stan- pounds via prototype decoration. Comparing compounds dardization techniques do not address differences in ori- to those in materials databases can prevent duplication. gin choices. The lack of commensurate representations Moreover, the properties of database entries can be used impedes the search for prototypes and inhibits mappings to estimate those of similar uncalculated compounds, ex- between similar crystals and their corresponding proper- ploiting the structure-property relationship of materials. ties. Clearly, an automatic and reliable large-scale method for arXiv:2010.04222v1 [cond-mat.mtrl-sci] 8 Oct 2020 To overcome non-standard descriptions, crystal com- discerning unique crystallographic structures is therefore parison tools have been developed to identify similar crucial for the materials science community. structures. Programs such as Structure Matcher [10], AFLOW-XtalFinder (AFLOW crystal finder, XTALCOMP [9], SPAP [11], CMPZ [12], CRYCOM [8], XtalFinder for brevity) addresses many of the pre- STRUCTURE-TIDY [13], and COMPSTRU [14] are viously mentioned issues in a high-throughput fashion. available with varying objectives related to structure The primary objective of XtalFinder is to identify/- classify the prototypes of materials and relate them via structural similarity metrics. To accomplish this, XtalFinder determines the ideal prototype designation ∗ [email protected] of crystal structures, consistent with the International
label : AB mP4 11 e e params : a, b/a, c/a, ,x1,z1,x2,z2 mP4 mC4 oI4 tI4 hR6 cF8 values :5.586, 0.719, 0.698, 91.992, SG #11 SG #12 SG #71 SG #139 SG #166 SG #225 0.252, 0.234, 0.751, 0.261
FIG. 1. Self-consistent symbolic prototype finder. (a) The prototype for an input structure, e.g. AlCl (ICSD #56541), is identified by analyzing its symmetry. Classification of the prototype may change depending on the symmetry tolerance (sym): mP4, SG #11 (0 < sym ≤ 0.25 A);˚ mC4, SG #12 (0.26 ≤ sym ≤ 0.27 A);˚ oI4, SG #71 (0.34 ≤ sym ≤ 0.36 A);˚ tI4, SG #139 (0.37 ≤ sym ≤ 0.44 A);˚ hR6, SG #166 (0.45 ≤ sym ≤ 0.49 A);˚ and cF8, SG #225 (0.51 ≤ sym ≤ 1.0 A).˚ An adaptive routine is employed for tolerance regions with incommensurate symmetry descriptions (gray arrows for 0.27 < sym < 0.34 A˚ and 0.49 < sym < 0.51 A),˚ ensuring self-consistent prototype/symmetry designations. (b) The structure is then mapped into its prototype label and symbolic and numeric internal degrees of freedom (dof), consistent with the International Tables for Crystallography (ITC). Structures in this representation can be generated with the symbolic prototype generator.
Tables for Crystallography (ITC)[18]. Any structure tures are generally classified in terms of their symme- in this representation can be automatically generated try characteristics. For example, the rocksalt prototype via a new symbolic prototype generator. Similarity has a face-centered cubic lattice and 8 atoms in the between structures is analyzed on multiple fronts. conventional cell (i.e. Pearson symbol of cF8), space Crystallographic structures are first compared by sym- group F m3¯m (#225), and Wyckoff positions 4 a m3¯m metry (isopointal analysis), leveraging a robust software and 4 b m3¯m. Determining this information for any arbi- implementation, AFLOW-SYM, which calculates self- trary structure is often a challenge: numerical noise in the consistent symmetry descriptions freeing the user from atomic positions inhibits detection of crystal isometries, tolerance adjustments [19]. Local atomic geometries requiring by-hand modification of tolerance thresholds. are also computed to match neighborhoods of atoms in Furthermore, consistency between real- and reciprocal- crystals (isoconfigurational snapshots). Finally, crystal space symmetries is often overlooked, and yet it is im- similarity is resolved by rigorous structure mapping perative for reliable ab initio simulations. Thus, accurate procedures (complete isoconfigurational analysis) and prototype detection relies on robust symmetry analyses. quantified via a misfit criterion [20]. The prototype finder accommodates automatic workflows, with functionality XtalFinder employs a self-consistent mechanism to find to analyze multiple materials/structures simultaneously the ideal prototype of a given structure. The space group, via multithreading. Features are provided to identify Pearson symbol, and occupied Wyckoff positions are cal- crystallographic structures, distinct materials, atom culated via the AFLOW-SYM routines [19]. The proto- decorations, and spin configurations. Methods are also type classification is sensitive to the symmetry tolerance included to compare compounds/prototypes to the (sym). For example, the AlCl structure (ICSD #56541, AFLOW.org repository and AFLOW prototype libraries. DFT-relaxed) in Figure1(a) can be classified as one of Every entry in the AFLOW.org repository has been six different prototypes as a function of sym: i. mP4, mapped to its prototype label, enabling users to search SG #11 (0 < 0.25 A);˚ ii. mC4, SG #12 (0.26 sym ≤ ≤ the database by structure-type. The XtalFinder code — 0.27 A);˚ iii. oI4, SG #71 (0.34 0.36 A);˚ sym ≤ ≤ sym ≤ written in C++ — is part of the AFLOW (Automatic iv. tI4, SG #139 (0.37 0.44 A);˚ v. hR6, SG ≤ sym ≤ flow) framework [21–24] and is open-source under the #166 (0.45 sym 0.49 A);˚ and vi. cF8, SG #225 GNU-GPL license. For seamless integration into different ≤ ≤ (0.51 sym 1.0 A).˚ For certain tolerance values — work environments, this functionality is accessible via ≤ ≤ e.g. 0.27 < sym < 0.34 A˚ and 0.49 < sym < 0.51 A—˚ the command-line and a Python module. incommensurate symmetry descriptions are calculated. Problem of the ideal prototype. Prototype struc- To overcome this, the symmetry tolerance is automat- 3 ically changed, scanning tighter and looser tolerances resentation also easily distinguishes isopointal and iso- around the initial value, to find consistent symmetry de- configurational prototypes. Two compounds with simi- scriptions at a new sym. This autonomous approach en- lar labels are isopointal (i.e. same symmetry), and are sures prototype classifications are correct and compati- isoconfigurational if their parameters are the same (i.e. ble against all symmetry descriptors (e.g. space group, equivalent geometric configurations) [25]. Moreover, the Wyckoff position, lattice type, Brillouin zone, etc.). representation reveals the degrees of freedom that can The default symmetry tolerance value for classifying be altered, while preserving the underlying symmetry. prototypes in XtalFinder is proportional to the minimum This is useful for showing continuous structure transi- min interatomic distance (dnn /100). The tolerance is thus tions within the same symmetry-type and performing system-specific, and it has been shown to be consistent symmetry-constrained structure relaxations [26]. Lastly, with experimentally resolved symmetries [19]. Neverthe- with this format, structures are now easily regenerated less, the tolerance can also be adjusted by the user, and with the AFLOW software. is guaranteed to return a commensurate designation due Symbolic prototype generator. Structures repre- to the adaptive prototype protocol shown in Figure1(a). sented in the ideal prototype designation can be created Once the symmetry attributes of the crystal are calcu- and decorated with any atomic elements via a new sym- lated, XtalFinder automatically maps the structure to bolic prototype generator, enabling automatic materials its AFLOW prototype label and symmetry-based degrees design. A procedure — introduced in Refs. [16, 17]— of freedom (Figure1(b)), i.e. lattice parameters/angles has been extended to create all possible prototype struc- and non-fixed Wyckoff coordinates [16, 17]. These desig- tures, going beyond those previously described in the nations are commensurate with the ITC cell choices and Prototype Encyclopedia. Given a crystal’s composition, Wyckoff positions; the de facto standard for crystallogra- Pearson symbol, space group, and occupied Wyckoff po- phy. The label specifies the stoichiometry and symmetry sitions, the generator determines the degrees of freedom of the structure in underscore-separated fields. The fields in symbolic notation (i.e. a, b/a, c/a, α, β, γ, x, y, indicate the following (example system: esseneite struc- and z) that must be specified, based on the ITC con- ture, ABC6D2 mC40 15 e e 3f f [16]) ventions [18]. Feeding in the ideal prototype label and • first field: the reduced stoichiometry based on al- degrees of freedom to the symbolic generator will pro- phabetic ordering of the compound, e.g. a quater- duce the corresponding geometry file, substituting the nary with stoichiometry ABC6D2, appropriate degrees of freedom with the input values. • second field: the Pearson symbol, e.g. mC40, Prototypes, including those in the Prototype Encyclope- • third field: the space group number, e.g. space dia, no longer need to be tabulated (hard-coded) in the group #15, AFLOW software, and are now created on-the-fly. With • fourth field: the Wyckoff letter(s) of the first atomic this prototype generator, AFLOW is capable of creating site, e.g. site A: one Wyckoff position with letter structures to span all regions of crystallographic space. e, Structures are generated with the following pro- • fifth field: the Wyckoff letter(s) of the second totype command syntax: --proto=label --params= atomic site, e.g. site B: one Wyckoff position with parameter_1,parameter_2,.... Here, the label is the letter e, ideal prototype label, e.g. AB mP4 11 e e as shown in • sixth field: the Wyckoff letter(s) of the third atomic Figure1(b), and parameter_1,parameter_2,... are the site, e.g. site C: three Wyckoff positions with let- comma-separated values for the prototype’s degrees of ters f, and freedom, e.g. 5.586, 0.719, 0.698, 91.992, 0.252, 0.234, • seventh field: the Wyckoff letter(s) of the fourth 0.751, and 0.261 as shown in Figure1(b). By default, atomic site, e.g. site D: one Wyckoff position with structures are generated with fictitious species in alpha- letter f. betical order (i.e. A, B, C, D, etc.). Users can over- The prototype parameters specify the degrees of free- ride this order by specifying other permutations after dom allowed by the symmetry of the structure. For the the prototype label (separated by a period), i.e. -- esseneite structure, there are 18 parameters: a, b/a, c/a, proto=label.BAC...; a useful feature for controlling the β, y1, y2, x3, y3, z3, x4, y4, z4, x5, y5, z5, x6, y6, and atomic site decorations. Specific elements can be deco- z6. The first three variables are the lattice parameters rated onto the prototype by appending the element ab- — with b and c represented in relation to a — the fourth breviations to the command in colon-separated alphabet- variable is the lattice angle β, and the subsequent vari- ical order, e.g. --proto=label:Ag:Cu:Zr. The genera- ables are the Wyckoff coordinates (fractional) that are tor checks for any inconsistencies with the provided label not fixed by symmetry. The sequence of the Wyckoff and/or parameter values, terminating prematurely with parameters is based on the alphabetic ordering of the a message listing possible fixes to the command. The Wyckoff letters, followed by alphabetic ordering of the generator supports multiple geometry file formats, in- species. Additional information regarding the label and cluding VASP (POSCAR)[27], FHI-AIMS [28], Quantum parameters are discussed in the Refs. [16, 17]. ESPRESSO [29], ABINIT [30], ELK [31], and CIF. Swap- Mapping structures into this format characterizes pro- ping the command --proto=label with --aflow_proto totypes in a concise and descriptive manner. The rep- =label, will build an aflow.in file, AFLOW’s input file 4
(using a standard set of DFT parameters by default [32]), further compared by inspecting arrangements of atoms, automating ab initio simulations of these compounds. i.e. local atomic geometries. Local geometry analy- The generator can also print the symbolic representa- ses have been fruitful in providing structural descrip- tion of the lattice and Wyckoff positions. Adding the tors and similarity metrics between crystals of different option --add_equations to the prototype command re- types [39, 40]. However, the positions of these environ- turns both a numerical and symbolic version of the geom- ments are often neglected, precluding the determination etry file, and the option --equations_only only prints of one-to-one mappings between similar crystals. Nev- the symbolic version. Symbolic geometry files can be ertheless, the analysis quickly identifies local geometries printed with respect to the conventional cell (ITC) or and is employed here to analyze structures beyond sym- symbolically transformed into the primitive cell (using metry considerations (i.e. isoconfigurational versus iso- the SymbolicC++ open-source software [33]). By default, pointal [35]). AFLOW provides the primitive cell, since fewer-atom unit Rather than determine the complete local atomic ge- cells are more computationally efficient. ometry for each atom, XtalFinder builds a reduced rep- With a robust prototype classifier and generator in resentation: neighborhoods comprised of only the least place, comparison of prototypes is required to i. iden- frequently occurring atom (LFA) type(s). The local LFA tify unique structure-types and ii. group similar ones to- geometry analysis provides the connectivity for a sub- gether. The prototype label and parameters alone can- set of atoms (i.e. LFA-type) to discern if patterns are not establish structural similarity due to variations in present in both structures, regardless of cell choice and the choice of lattice and origin, potentially affecting both crystal orientation. This description is preferred over the the label (e.g. Wyckoff letters) and the parameters (e.g. full local geometry because it is i. computationally less lattice and non-fixed Wyckoff parameters). Therefore, expensive to calculate and ii. generally less sensitive to XtalFinder offers three levels of comparison: symmetry, coordination cutoff tolerances. The latter is attributed local atomic geometry, and complete crystal geometry. to the fact that LFA geometries are more sparse. They are described in the following three subsections. An example of a local LFA geometry is shown in Fi- Isopointal structures: symmetry comparison. gure2(b). A local LFA a tomic geometry (AG) is a set Symmetry analyses of crystals are required to identify of vectors connecting a central atom (c) to its closest structures of the same symmetry-type. The isometries of neighbors: crystals (e.g. rotations, roto-inversions, screw axes, and min AGc dic i atomi LFA(s) , (1) glide planes) are calculated via the routines of AFLOW- ≡ { } ∀ | ∈ { } min SYM [19] to determine the space group and occupied where dic is the minimum distance vector to the i atom Wyckoff positions (Figure2(a)). Results from AFLOW- — restricted to LFA-type(s) only — and is calculated via SYM are robust against numerical tolerance issues and the method of images for periodic systems [41]: are consistent with experimentally determined symme- min dic = min( min (xi xc + naa + nbb + ncc) ). (2) tries in comparison to other symmetry software [19]. i na,nb,nc || − ||
Crystals are isopointal if they have commensurate Here, na, nb, and nc are the lattice dimensions along the space groups (equivalent or enantiomorphic pairs) and lattice vectors a, b, and c; and xi and xc are the Carte- Wyckoff positions [35]. Wyckoff positions are compatible sian coordinates of the i and c (center) atoms, respec- if they have the same multiplicity and similar site sym- min tively. A coordination shell with a thickness of dic /10 metry designations. Due to different setting and origin captures other atoms of the same type to control numer- choices for the conventional cell, a strict site symmetry ical noise in the atomic coordinates (a similar tolerance match is insufficient. For instance, the Wyckoff positions metric is defined in AFLOW-SYM, i.e. loose preset tole- with multiplicity 2 in space group #47 (P mmm) — four rance value [19]). This cutoff value yields expected co- 2mm (letters i-l), four m2m (letters m-p), and four mm2 ordination numbers for well-known systems and is com- (letters q-t) — form a Wyckoff set and are related via an parable to results provided by other atom environment automorphism of the space group operations [18, 36, 37]. calculators [39, 40]. If there is only one LFA type — e.g. Depending on the assignment of the lattice parameters Si in α-cristobalite (SiO2)[42] — then the distance to (a, b, and c) and origin choice, different — and potentially the closest neighbor of that LFA type is calculated. If equivalent — Wyckoff decorations are possible. Conse- there are multiple LFA types — e.g. four for the qua- quently, XtalFinder tests permutations of the site sym- ternary Heusler (as illustrated in Figure2(b)) — then metry symbol to expose positions that may be within the the minimum distances to each LFA type are computed. same Wyckoff set [38]. The local atomic geometry is calculated for each atom The symmetry calculation is performed automatically, of the LFA type(s) in the unit cell, resulting in a list i.e. it does not require input from the user. Options are of atomic geometries ( AGc ). Therefore, α-cristobalite available to ignore symmetry and force geometric com- has a set of four Si LFA{ geometries} (one for each Si in parison of structures, which can identify crystals associ- the unit cell: AGSi,1, AGSi,2, AGSi,3, AGSi,4 ) and the ated via symmetry subgroups. quaternary Heusler{ has a set of four LFA geometries} (one Isoconfigurational snapshots: local geometry for each element type: AGAu, AGLi, AGMg, AGSn , re- comparison. Beyond isopointal analyses, structures are spectively). { } 5 duplicate reduction a symmetry comparison b local geometry comparison C4 S2
C4 S2
0 10 010 0 10 010 local LFA geometry C4 = 100. . . S2 = 100 C4 = 100. . . S2 = 100 00011 0 1 00011 0 1
space group ¯ space group ¯
Fm
Wyckoff Mg 4 am3¯m Wyckoff Cl 4 am3¯m
positions
c Xref geometric structure comparison
Xtest lattice and origin search
lattice deviation coordinate displacement failure
dmap dmap
FIG. 2. Symmetry, local atomic geometry, and geometric structure comparisons in AFLOW-XtalFinder. (a) Crystal isometries are calculated internally with AFLOW-SYM. The space groups and occupied Wyckoff positions are compared, revealing isopointal structures. (b) The local least-frequently occurring atom (LFA) geometries are computed and compared between structures. An example local LFA geometry is shown for the quaternary Heusler structure [34] (2-D projection), highlighting the closest neighbors (via solid lines) for each LFA type to the central Mg atom (purple). Shaded concentric circles indicate the tolerance threshold for capturing atoms in the coordination shell with a thickness of 10% of the distance from the central and connected atom. Local geometry vectors are compared against local geometries in other structures to determine mapping potential. (c) Two structures (Xref and Xtest) are mapped onto one another by expanding Xtest into a supercell and exploring commensurate lattice and origin choices with respect to Xref . The yellow lattice (highlighted by the green box) is a potential match with Xref . Xtest is transformed into the new representation (Xetest), and the structures are quantitatively compared via the misfit criteria. The structures are evaluated via their lattice deviation (latt), coordinate displacement (coord), map and figure of failure (fail). Distances between mapped atoms (d ) that are less than half the atom’s nearest neighbor (dnn/2) are accounted for in the coordinate displacement (green dashed lines and arrows), while larger distances are described in the figure of failure (red dashed lines and arrows). 6
To investigate structural compatibility, local atomic The grid dimensions span between na,b,c na,b,c to geometry lists for compounds are compared. In gen- account for different orientations/rotations− between→ the eral, the local geometry comparisons err on the side of structures. To optimize the lattice search, translation caution. For instance, comparing the cardinality of the vectors are explored in a grid comprised of only the LFA- coordination is often too strict. Despite a more sparse type in Xtest (since they are the minimal set of atoms geometry space, slight deviations in position can move exhibiting crystal periodicity). In addition to verifying atoms outside the coordination shell threshold, changing crystal periodicity, candidate lattice vectors must be sim- the atom cardinality and neglecting potential matches. ilar to those in the Xref lattice based on i. lattice vector Local atomic geometries are thus compatible if i. the moduli (∆l), ii. angles formed between pairs of lattice central atoms are comparable types (i.e. same element vectors (∆θ) and iii. volumes enclosed by three lattice and/or stoichiometric ratio in crystal), ii. the neighbor- vectors (∆V ). The tolerances values (∆l, ∆θ, ∆V ) are hood of surrounding atoms have distances that match chosen based on how much the lattices are allowed to dif- within 20% after normalizing with respect to max(AGc) fer. If the lattices are significantly different, then the lat- (i.e. the largest distance in the local geometry cluster), tice is ignored (see the “lattice deviation” in the “Quan- and iii. the angles formed by two atoms and the center titative similarity measure” subsection). Additionally, atom match within 10 degrees. To further alleviate the as a speed increase, commensurate lattices are sorted by coordination problem, an exact geometry match is not minimum lattice deviation to find matches more quickly. required, i.e. some distances and angles are permitted to Upon finding a similar cell to Xref , Xtest is transformed be missing. Grouping local atomic geometries as compat- into the new lattice representation Xtest and is stored if ible is favored to mitigate false negatives for equivalent the representations have the same number of atoms (and structures. types). e Isoconfigurational structures: Geometric struc- For each prospective unit cell, possible origin choices ture comparison. To resolve a commensurate represen- are explored. The origin of Xref is placed on one of the tation between two structures for geometric comparison, LFA-type atoms, and the origin of Xtest is cycled through one structure — the reference Xref — remains fixed and all atoms of its LFA-type. Given an origin choice, a map- the other structure — the potential duplicate Xtest — is ping procedure is attempted for alle atoms in the unit cell. expanded into a supercell. Lattice vectors are identified The minimum Cartesian distance — via the method of within the supercell and compared against the reference images for periodic systems [41] — is determined for ev- structure. For any similar lattices to Xref , Xtest is trans- ery atom i in Xref to each atom j in Xtest formed into the new lattice representation (Xtest). Ori- gin shifts for this cell are then explored in an attempt to dij = min (xi xj + naa + nebb + ncc) , (4) match atoms. If one-to-one atom mappings existe between na,nb,nc || − || the two structures, then the similarity is quantified with the crystal misfit method (see “Quantitative similarity where na, nb, and nc are the lattice dimensions along measure” subsection) [20]. Misfit values below a given the lattice vectors a, b, and c; and xi and xj are the threshold indicate equivalent structures and the search Cartesian coordinates of the i and j atoms, respectively. Given the set of distances dij , the minimum distance terminates. Alternatively, misfit values larger than the { } threshold are disregarded and the search continues until over all j atoms is identified as the mapping distance, i.e. all lattices and origin shifts are exhausted. The proce- map di min dij , (5) dure is detailed below and an illustration of the process ≡ j { } is depicted in Figure2(c). The lattice search algorithm begins by scaling the vol- map regardless of the element type. Once di is computed umes of the unit cells to compare structures with different for all i, the following conditions are verified: i. one-to- volumes (an option is available to quantify the similar- one mappings (i.e. no duplicate j indices between i in- ity between structures at fixed volumes). Once scaled, dices), and ii. no cross-matching between element types the routine searches for translation vectors by generat- (i.e. cannot map a single element type to multiple types ing a lattice grid of test. The size of the grid is defined X in Xtest). If either condition is violated, the mappings to encompass a sphere with a radius (rgrid) equal to the are ignored and the search continues. maximum lattice vector length of , i.e. Xref Givene a successful mapping, the similarity of the two crystals in the corresponding representations are quan- rgrid max (a, b, c) . (3) ≡ tified, indicating equivalent or unique structures. If no Similar to a procedure described in Ref. [19], the nec- mapping is found for any lattice and origin choice, then essary grid dimensions are given by the set of vectors the structures are considered distinct and are not as- perpendicular to each pair of Xref lattice vectors scaled signed a similarity value. by the grid radius (e.g. n1 = rgrid (b c/ b c )). The Quantitative similarity measure. To compare two scaled vectors are then transformed into× the|| × lattice|| basis crystals in a given representation, a method proposed (L), via n0 = L−1n, and the ceiling of the n0 compo- by Burzlaff and Malinovsky is employed [20]. The simi- 0 nents indicate the grid dimensions: na,b,c = ceil(na,b,c). larity between structures is quantified by a misfit value, 7
, which incorporates differences between lattice vectors positions [10] and coordination characterization func- and atomic coordinates via [20]: tions [11]. XtalFinder employs the crystal misfit criteria to incorporate structural differences between both the 1.0 (1.0 ) (1.0 ) (1.0 ) . (6) ≡ − − latt − coord − fail lattice and atom positions. Differences between common similarity metrics — and their software implementations The misfit quantity is bound between zero and one: — are discussed in more detail in the “Comparison Ac- structures with a value close to zero match and those curacy” subsection. with a value close to one do not match. Special misfit ranges defined by Burzlaff and Malinovsky are adopted Super-type comparisons. To explore new areas of here [20] materials space, the XtalFinder module i. identifies equivalent and unique materials, ii. uncovers common 0 < match : match, structure-types across different compounds (i.e. proto- ≤ types), iii. determines inequivalent atom decorations match < family : same family, and (7) ≤ for a given crystal structure, and iv. discerns distinct < 1: no match. family ≤ magnetic structure configurations. The corresponding The “same family” designation generally corresponds to comparison modes are denoted as material-, structure-, crystals with common symmetry subgroups. Burzlaff and decoration-, and magnetic-type, respectively (Figure3). Each variant uses the underlying procedures discussed in Malinovsky recommend match = 0.1 and family = 0.2 based on definitions from Pearson [43] and Parth´e[44]. In the “Results” section (i.e. symmetry, local atomic geom- etry, and geometric structure comparisons) with different the “Finding match: structural misfit versus calculated restrictions on mapping atom types. property (∆Hatom)” section, heuristic misfit thresholds are identified based on the allowed maximum enthalpy Material-type. Material-type comparisons map atoms differences between similar structures. of the same atomic species (Figure3(a)). For example, The deviation of the lattices, latt, captures the dif- given two ZnS zincblende compounds [45], a material- type comparison maps Zn Zn and S S in the two struc- ference between the lattice face diagonals of Xtest and tures. Therefore, the method→ reveals→ duplicate com- Xref [20] e pounds within a data set. latt 1 (1 D12)(1 D23)(1 D31), (8) Structure-type. Conversely, structure-type compar- ≡ − − − − test ref test ref isons ignore atomic species and map any atom-type with dkl dkl + fkl fkl Dkl || − ref|| ||ref − ||, (9) compatible stoichiometric ratios (Figure3(b)). In the ≡ dkl fkl case of zincblende structures ZnS and SiC, a structure- e || − e || type comparison attempts to map Zn Si and S C, or where fkl and dkl denote the diagonals by adding and Zn C and S Si, since the compounds→ are equicompo-→ subtracting, respectively, the k and l lattice vectors. → → In the lattice search algorithm, ∆l, ∆θ, and ∆V tol- sitional. This mode exposes unique backbone structures and is practical for crystallographic prototyping. Identi- erances are coupled to latt, and are tuned to ensure fying prototypes is also useful for modeling solid solutions latt family. The≤ coordinate deviation — measuring the disparity and disordered materials [46, 47]. between atomic positions in the two structures — is based Decoration-type. The decoration-type (or map map permutation-type) mode determines unique atom on the mapped atom distances (di or dj as com- puted with Equations (4) and (5)) and the atoms’ nearest decorations for a given crystal structure, i.e. inequiv- alent colorings of a structure, where each element is neighbor distances in the respective structures, dnn [20] denoted by a different color (Figure3(c)). Continuing test ref Ne test map N ref map with the zincblende example, the A and B atomic sites i (1 ni ) di + j 1 nj dj coord − − . are equivalent: swapping elements on the sites results in N test N ref ≡ P e (1 ntest) dtest + P 1 nref dref a duplicate compound compared to the original decora- i − ei nn,i j − j nn,j (10) tion. Thus, only one site decoration choice is necessary P P e to create a distinct compound. Given a compound with N test and N ref are the number of atoms in the two crys- n species, there are n! possible atom permutations. map tals. If d < dnn/2, then a “switch” variable n is set to XtalFinder automatically i. generates compounds with zeroe and the mapped atom distance is included in coord. the different atom decorations for a crystal, ii. compares Otherwise, n is set to one, signifying the mapped atoms the decorations (via a material-type comparison), and are far apart and not considered in coord. These atoms iii. identifies the unique configurations. Atom decora- are represented in the figure of failure, fail [20] tions are only compared if atomic types have the same Wyckoff multiplicity and similar site symmetries (see test ref Ne test N ref subsection “Isopointal structures: symmetry analysis”). i ni + j nj fail . (11) Equivalent decoration groups need to obey Lagrange’s ≡ P N test + NPref e theorem [48]: the order h of subgroup H divides the Other metrics can be used to assess structural similar- group G with order g (i.e. mod(g, h) = 0). Accordingly, ity, including the root meane square (rms) of the atom the numbers of unique and equivalent decorations must 8 super-type comparisons a material-type b structure-type c decoration-type d magnetic-type
AB BA
reveal duplicates identify prototypes unique decorations distinct spin config.
FIG. 3. Available super-type comparison modes. (a) Material-type: maps same element types, revealing duplicate compounds. (b) Structure-type: maps structures regardless of the element types, identifying crystallographic prototypes. (c) Decoration-type: creates and compares all atom decorations for a given structure, determining unique and equivalent decorations (in this case, atom decorations AB and BA match). (d) Magnetic-type: maps compounds by element types and magnetic moments, discerning distinct spin configurations. divide the total number of decorations, i.e. satisfy divi- However, the number of equivalent decorations in each sor theory. The possible equivalent decoration groups — set are not the same, violating Lagrange’s theorem [48]. out of n! – are dictated by its divisors, and are enumer- Furthermore, all misfits values should be the same, since ated below for 2 < n < 5 (elemental compounds, n = 1, the underlying structure is unchanged. The incommen- are excluded): surate groupings are a symptom of only comparing to the 2! = 2 : 2, 1 reference decoration, as opposed to cross-comparing with 3! = 6 : 6, 3, 2, 1 other decorations. 4! = 24 : 24, 12, 8, 6, 4, 3, 2, 1 To remedy incorrect groupings, XtalFinder checks for 5! = 120 : 120, 60, 40, 30, 24, 20, 15, 12, 10, 8, 6, 5, better matches (i.e. potential equivalent decorations 4, 3, 2, 1 with lower misfit values). Therefore, the “duplicate” dec- For example, the possible groupings for a ternary com- orations are compared to the other reference decorations pound (n = 3) are: 6, 3, 2, and 1 unique sets with 1, 2, and regrouped to minimize the misfit value. In this case, 3, and 6 decorations per set, respectively. the subsequent cross-comparisons are performed: Depending on the matching (misfit) tolerance and the • BAC with CAB and BCA, choice of the reference decoration, calculated equivalency • CBA with CAB and BCA, and groups can violate divisor theory. For instance, two dec- • ACB with BCA (not compared with ABC; per- orations can match with a certain misfit; however, a bet- formed previously). ter match with a smaller misfit can exist with another Consequently, the final equivalent decorations are decoration. To combat incorrect groupings, XtalFinder • ABC = CBA ( = 0.0144), executes a consistency check, verifying the groupings are • CAB = BAC ( = 0.0144), and commensurate with the possible divisors. If they are not, • BCA = ACB ( = 0.0144). XtalFinder searches for better matches and regroups the The groupings above satisfy Lagrange’s theorem, and the compatible decorations. equivalent structures in each group have the same misfit For example, ICSD entry BiITe #10500 (original geom- value with respect to their reference decoration. etry) has six possible atom decorations: ABC, BAC, Magnetic-type. Magnetic-type comparisons map CBA, ACB, CAB, and BCA. Since the three equicom- atoms of the same atomic species and similar magnetic positional sites are comprised of the same Wyckoff multi- moments, i.e. analyzes spin configurations (Figure3(d)). plicity and site symmetry (multiplicity 1 and site symme- For instance, given two body-centered cubic chromium try 3m. in space group #156), all structures are placed compounds with antiferromagnetic ordering, the routine in the same initial comparison group, with ABC chosen attempts to map Cr↑ Cr↑ and Cr↓ Cr↓. A magnetic as the reference decoration (since it is the first in the set). moment tolerance threshold→ denotes equivalent→ spin sites; After comparing, the equivalent groups and their misfit where the default tolerance is 0.1µB. The analysis can be values are: performed for both collinear and non-collinear systems. • ABC = BAC ( = 0.0889) = CBA ( = 0.0144), The magnetic-type comparison can be joined with a mag- • CAB = ACB ( = 0.0889), and netic structure generator to create distinct spin configu- • BCA (no equivalent decorations). rations for high-throughput simulation.
compounds compare compare local compare geometric isoconfigurational to compare symmetry LFA geometry structure all matched? isopointal near isoconfig.
all matched? yes ( ✏ ✏ match ) match
no
move unmatched ...
...... into new groups done
FIG. 4. Automatic grouping of multiple compounds. Compounds are compared in the following sequence: symmetry, local LFA geometry, and geometric structure. The algorithms determine isopointal, near isoconfigurational, and isoconfigura- tional structures, respectively, and aggregate them into similar sets (enclosed in black solid-lined boxes). Unmatched structures (i.e. > match) after the initial geometric structure comparison are put into new groups and re-compared until all equivalent structures are grouped. This sequence is the same for material-, structure-, decoration-, and magnetic-type comparisons; how- ever, the criteria for atom mappings differ (see subsection “Super-type comparisons” for details). The symmetry, local LFA geometry, and geometric structure comparisons (blue boxes) are multithreaded for parallel computation.
Multiple comparisons. With the plethora of com- depending on the comparison mode. pounds generated by computational frameworks — Multithreading. To enhance calculation speed, multi- such as AFLOW [21, 49], NoMaD [50], Materials threading capabilities can be employed. The three com- Project [51], High-Throughput Toolkit [52], Materials putationally intensive procedures — calculating the sym- Cloud/AiiDA [53], and OQMD [54] — automatically com- metry, constructing the local LFA geometry, and per- paring structures is necessary for high-throughput classi- forming geometric comparisons — are partitioned onto fication of unique/duplicate compounds and structure- allocated threads, offering significant speed increases for types. For this purpose, we developed an automatic large collections of structures. comparison procedure for multiple crystals (Figure4). Automatic comparisons. There are three built-in Compounds are first grouped into isopointal sets by an- functions to compare multiple structures automatically: alyzing and comparing the symmetries of the structures, i. compare structures provided by a user, ii. compare an aggregating them by stoichiometry, space groups, and input structure to prototypes in AFLOW [16, 17], and iii. Wyckoff sets (calculated via AFLOW-SYM [19]). Next, compare an input structure to entries in the AFLOW.org compounds are further partitioned into near isoconfigu- repository. An overview of each high-throughput method rational sets by determining and comparing the local LFA is discussed below and usage is detailed in the Methods geometries in each structure. Within each near isoconfig- section. urational group, one representative structure — generally the first in the set — is compared to the other struc- Compare user datasets. Users can load crystal geome- tures via geometric comparisons and the misfit values tries and compare them automatically with XtalFinder. are stored. Once the comparisons finish, any unmatched Options to perform both material-type and structure- type comparisons are available to identify unique/du- structures (i.e. misfit values greater than match) are re- organized into new comparison sets. The process repeats plicate compounds or prototypes, respectively. For until all structures have been assembled into matching structure-type comparisons, the unique atom decorations groups or all comparison pairs are exhausted. The three for each representative structure are determined. Once comparison analyses are performed in this order for two the analysis is complete, XtalFinder groups compatible reasons: i. to categorize structural similarity to varying structures together and returns the corresponding misfit degrees (isopointal, near isoconfigurational, and isocon- values. figurational) and ii. to efficiently group compounds to Compare to AFLOW prototypes libraries. Given reduce the computational cost of the geometric structure an input structure, this routine returns similar AFLOW comparison (see “Speed and scaling considerations” in prototype(s) along with their misfit value(s) (Fi- the Discussion). This procedure is the same for material-, gure5(a)). AFLOW contains structural prototypes that structure-, decoration-, and magnetic-type comparisons; can be rapidly decorated for high-throughput materials however, different atom mapping restrictions are applied discovery: 590 in the Prototype Encyclopedia [16, 17] and 1,492 in the High-throughput Quantum Computing 10
compounds similar to the input structure based on TABLE 1. AFLOW.org entries equivalent to an in- put sodium chloride (rocksalt) compound. A list species, stoichiometry, space group, and Wyckoff posi- of equivalent compounds to the Prototype Encyclopedia’s tions. With the AURL from the AFLUX response, struc- rocksalt structure with the default degrees of freedom (la- tures for the entry are retrieved via the REST-API. The bel=AB cF8 225 a b, parameters=5.64 A).˚ The compound most relaxed structure is extracted by default; however, name, auid, misfit (), and enthalpy per atom (Hatom) are options are available to obtain structures at different listed for all similar structures in the database. Volume scal- ab-initio relaxation steps. The set of entries from the ing is suppressed for the comparison to incorporate volume database are then compared to the input structure. Sim- differences. The first 25 and last 2 entries are from AFLOW’s ilar to the AFLOW prototype comparisons, candidate en- ICSD and LIB2 catalogs, respectively. tries are only compared against the input structure, i.e. H compound auid atom the procedure terminates without regrouping unmatched (eV/atom) entries. ClNa aflow:d241535faf2a4519 0.00514317 -3.39101 With the underlying AFLUX functionality, mate- ClNa aflow:82a178672a734c47 0.00537828 -3.39082 ClNa aflow:1cd71114972d46dd 0.00514250 -3.39078 rial properties can also be extracted, highlighting the ClNa aflow:d0c93a9396dc599e 0.00496982 -3.39075 structure-property relationship amongst similar materi- ClNa aflow:5699b196418c6044 0.00450831 -3.39066 als. For instance, the enthalpy per atom (Hatom) for ClNa aflow:9017f9c64ead22ab 0.00434390 -3.39062 matching database entries are printed by including the ClNa aflow:39ab5e62afdb5ac0 0.00429571 -3.39062 enthalpy_atom API keyword in the query. Any num- ClNa aflow:c16c0f1c061f7d3e 0.00424606 -3.39061 ber or combination of properties can be queried; avail- ClNa aflow:2f4b5e32510830a0 0.00427698 -3.39061 able API keywords are located in Refs. [55, 56]. Table1 ClNa aflow:b5ab343f3a484538 0.00421818 -3.39060 shows the comparison results between a rocksalt NaCl ClNa aflow:cc41860d69de2888 0.00405508 -3.39056 compound and matching DFT-relaxed structures in the ClNa aflow:b2ec4b68e12f3674 0.00404585 -3.39056 AFLOW.org repository along with their misfits and en- ClNa aflow:4f19021768a3118a 0.00399452 -3.39055 ClNa aflow:ec23029a18d3fec9 0.00373730 -3.39049 thalpies per atom. ClNa aflow:a4652bde28e67c3d 0.00339698 -3.39041 This routine reveals equivalent AFLOW.org com- ClNa aflow:18ebb85b07a92f89 0.00386555 -3.39033 pounds, if similar materials exists in the database. As ClNa aflow:1354bbef4edd80b3 0.00383458 -3.39032 such, it can estimate structural properties a priori; before ClNa aflow:182f848dd10cc403 0.00383093 -3.39031 performing any calculations. The estimation is based on ClNa aflow:d996b8d524516c24 0.00380678 -3.39030 the following assumptions: i. the matching AFLOW ma- ClNa aflow:a5755554aaf5d10e 0.00379748 -3.39030 terial resides at a local minimum in the energy landscape ClNa aflow:9466351a9cbac2c9 0.00379835 -3.39030 and ii. the input structure relaxes to the same geometry ClNa aflow:9a28207fd647e477 0.00379460 -3.39029 as that AFLOW compound, given comparable calcula- ClNa aflow:e3e31c4914d59e25 0.00379517 -3.39029 tion parameters. The functionality can explore proper- ClNa aflow:fd711a60dbfba2de 0.00378193 -3.39028 ClNa aflow:55d2cbd0f4018884 0.00405470 -3.39013 ties that are not calculated for a given entry, but are cal- ClNa aflow:3bd528dd9f88be7d 0.00395044 -3.39233 culated for an equivalent entry. For example, compounds ClNa aflow:f4b806d73482566c 0.00345690 -3.39121 in AFLOW’s prototype catalogs (LIB1, LIB2, LIB3, etc.) do not usually have band structure data; however, corre- sponding ICSD entries can be found which do provide library [24]. In this method, AFLOW prototypes are ex- band structure information. Finally, the method can tracted — based on similar stoichiometry, space group, identify compounds that are absent from the database and Wyckoff positions to the input — and compared to and prioritize them for future calculation, enhancing the the user’s structure. Since only matches to the input diversity of the AFLOW.org repositories. are relevant, the procedure terminates before regrouping Using AFLOW-XtalFinder. For ease-of-use, the any unmatched prototypes. The attributes of matched XtalFinder routines are accessible via a command-line prototypes are also returned, including the prototype la- interface and a Python environment (see Methods for bel, mineral name, Strukturbericht designation, and links details). to the corresponding Prototype Encyclopedia webpage. Ideal prototype analysis in AFLOW.org. The ideal The scheme identifies common structure-types with the prototype designations — for both the original and re- AFLOW libraries or — if no matches are found — reveals laxed geometries — have been successfully determined new prototypes. Absent prototypes can be character- for all 4+ million entries in the AFLOW.org repository. ized automatically in the AFLOW standard designation The prototype label, parameter variables, and parame- with XtalFinder’s prototyping tool (discussed in subsec- ter values are incorporated into the AFLOW REST- and tion “Problem of the ideal prototype”). Search-APIs [55, 56]. The corresponding API keywords Compare to AFLOW.org repository. Compounds for the original geometries are are compared to entries in the AFLOW.org repository us- • aflow_prototype_label_orig, ing the AFLOW REST- and AFLUX Search-APIs [55, 56] • aflow_prototype_params_list_orig, and (Figure5(b)). An AFLUX query (i.e. matchbook and • aflow_prototype_params_values_orig. directives) is generated internally and returns database For the DFT-relaxed geometries, the keywords are 11 encyclopedia/online mapping input a matching prototype structure label: A2B3_hR10_167_c_e mineral name: corundum
Strukturbericht: D51 web: aflow.org/CrystalDatabase/ A2B3_hR10_167_c_e.html … b matching entries AFLUX + REST-API entry: misfit:
Al2O3 (ICSD #608993) 0.0003 Al2O3 (ICSD #600672) 0.0021 …
FIG. 5. Encyclopedia/online prototype mapping. An input Al2O3 (corundum) compound is compared to entries in (a) AFLOW Prototype Encyclopedia and (b) the AFLOW.org repository. Potential equivalent entries are retrieved automatically from the respective catalog and compared with XtalFinder. Matching entries and their level of similarity (misfit) are returned.
• aflow_prototype_label_relax, structure. • aflow_prototype_params_list_relax, and Finding match: structural misfit versus calculated • aflow_prototype_params_values_relax. property (∆Hatom). To identify a suitable threshold The prototype keywords enable researchers to search for matching similar structures (match), Figure6 plots for materials by structure. This new feature is useful for the misfit value () between two mapped structures and identifying possible crystal structures given experimen- their difference in enthalpy per atom (∆Hatom). The tal data. For example, with composition, space group, structures in the test set are comprised of DFT-relaxed and occupied Wyckoff information (characteristics often entries from the entire AFLOW-ICSD catalog as of 14 known to experimentalists); users can construct the cor- August 2020 (60,390) [57, 58]. Compounds are grouped responding prototype label(s) and extract all compounds via commensurate atomic elements, stoichiometries, sym- based on the provided structure-type. The keywords are metries, and local LFA geometries. Furthermore, only also used to identify the frequency of certain prototypes compounds calculated with similar ab initio settings are in the AFLOW.org repository. For example, all com- compared together — such as LDAU parameters, kpoint pounds that are isopointal to the corundum prototype per reciprocal atom (KPPRA), and pseudopotentials (see (labels: A2B3 hR10 167 c e and A3B2 hR10 167 e c) Supplementary Information for details) — to prevent ex- can be retrieved for both the original and relaxed geome- traneous enthalpy differences due to differing parame- tries. Moreover, this search capability is used to discern ters. In addition, magnetic systems are excluded since if a structure-type is novel or has been reported previ- the magnetic moment is not incorporated into the misfit ously. value. For these comparisons, the unit cell volumes are The ideal prototype keywords also reveal whether a not rescaled, and the best lattice/origin choices are ex- compound retains the same prototype designation be- plored (minimizing the misfit value) to show better cor- fore and after relaxation. For structures that retain the relation with the enthalpies. After grouping the struc- same prototype label, the parameter values show the tures and identifying one-to-one mappings, misfit values continuous structure transition during relaxation. For for the remaining 6,795 comparison pairs are calculated. structures that transform into different prototypes, the Figures6(a) and (b) show the enthalpy difference ranges symmetry-based designations highlight the symmetries 0 100 meV/atom and 0 10 meV/atom, respectively, that were broken. This can indicate that certain element highlighting− the maximum− enthalpy differences at differ- combinations/arrangements are averse to certain proto- ent misfit values. type structures. More advanced relaxation techniques, In general, the misfit value correlates with the enthalpy e.g. symmetry-constrained relaxations [26], would be difference for 0.1: as the misfit value decreases, the required to restrict the relaxation to a given prototype enthalpy difference≤ also reduces. For > 0.1, the en-
a =0 =0 narrow enthalpy spread wide enthalpy spread . . 1 2 75 meV/atom ⇡ atom) / (meV atom H
✏ ✏ ✏ b =0 =0 max enthalpy difference additional datapoints . . 1 025 decreases with misfit large jump in max atom)
/ enthalpy difference
5 meV/atom (meV atom H