Arxiv:2010.04222V1 [Cond-Mat.Mtrl-Sci] 8 Oct 2020

AFLOW-XtalFinder: a reliable choice to identify crystalline prototypes David Hicks,1, 2 Cormac Toher,1, 2 Denise C. Ford,1, 2 Frisco Rose,1, 2 Carlo De Santo,1, 2 Ohad Levy,1, 2, 3 Michael J. Mehl,1, 2 and Stefano Curtarolo1, 2, ∗ 1Department of Mechanical Engineering and Materials Science, Duke University, Durham, North Carolina 27708, USA 2Center for Autonomous Materials Design, Duke University, Durham, North Carolina 27708, USA 3Department of Physics, NRCN, P.O. Box 9001, Beer-Sheva 84190, Israel (Dated: October 12, 2020) The accelerated growth rate of repository entries in crystallographic databases makes it arduous to identify and classify their prototype structures. The open-source AFLOW-XtalFinder package was developed to solve this problem. It symbolically maps structures into standard designations following the AFLOW Prototype Encyclopedia and calculates the internal degrees of freedom consistent with the International Tables for Crystallography. To ensure uniqueness, structures are analyzed and compared via symmetry, local atomic geometries, and crystal mapping techniques, simultaneously grouping them by similarity. The software i. distinguishes distinct crystal prototypes and atom decorations, ii. determines equivalent spin configurations, iii. reveals compounds with similar properties, and iv. guides the discovery of unexplored materials. The operations are accessible through a Python module ready for workflows, and through command line syntax. All the 4+ million compounds in the AFLOW.org repositories are mapped to their ideal prototype, allowing users to search database entries via symbolic structure-type. Furthermore, 15,000 unique structures | sorted by prevalence | are extracted from the AFLOW-ICSD catalog to serve as future prototypes in the Encyclopedia. Scientists have been struggling for decades to iden- comparison. For instance, XTALCOMP is coupled tify prototypes (e.g. Strukturbericht series [1] and Pear- with the XTALOPT infrastructure for identifying dis- son's Handbook [2]) and duplicates in crystallographic tinct materials generated via their evolutionary algo- databases; and to label structures in a concise way to rithm [15]. Despite the considerable number of platforms, recognize (and enable searching by) structure-types. The none are suitable for autonomous prototype detection. recent rapid growth of online repositories has worsened Crystallographic symmetry is neglected in Structure the problem [3]. Distinguishing distinct crystalline com- Matcher, XTALCOMP, and SPAP; while STRUCTURE- pounds is becoming increasingly difficult, leading to repe- TIDY, CRYCOM, and COMPSTRU rely on external sym- tition of previously studied materials, hindering database metry packages. Additionally, most tools only feature variety | biasing data-driven analyses and machine single pairwise comparisons (with the exception of Struc- learning methods [4,5] | and wasting valuable com- ture Matcher) and others require additional inputs (e.g. putational and experimental resources. The multitude space group, Wyckoff positions, and unit cell choice). of crystal geometries make by-hand detection of proto- Aside from technical functionality, the codes do not of- types and repeated entries intractable. A major com- fer built-in methods to compare structures to existing plication for finding structure-types is the non-standard crystallographic libraries and material repositories. To representation of crystals. Determination of unique crys- promote materials discovery, routines must analyze com- tallographic structures is obfuscated by i. unit cell repre- pounds with respect to established prototypes to iden- sentations and ii. origin choices. While standard forms tify new structure-types. This would enable the ex- exist | such as Niggli [6] and Minkowski [7] unit cells | pansion of prototype libraries | such as the AFLOW the conversion procedures are highly sensitive to numer- Prototype Encyclopedia (or Prototype Encyclopedia for ical tolerance values and can cast similar structures into brevity) [16, 17] | fueling generation of unique com- differing descriptions [8,9]. Additionally, lattice stan- pounds via prototype decoration. Comparing compounds dardization techniques do not address differences in ori- to those in materials databases can prevent duplication. gin choices. The lack of commensurate representations Moreover, the properties of database entries can be used impedes the search for prototypes and inhibits mappings to estimate those of similar uncalculated compounds, ex- between similar crystals and their corresponding proper- ploiting the structure-property relationship of materials. ties. Clearly, an automatic and reliable large-scale method for arXiv:2010.04222v1 [cond-mat.mtrl-sci] 8 Oct 2020 To overcome non-standard descriptions, crystal com- discerning unique crystallographic structures is therefore parison tools have been developed to identify similar crucial for the materials science community. structures. Programs such as Structure Matcher [10], AFLOW-XtalFinder (AFLOW crystal finder, XTALCOMP [9], SPAP [11], CMPZ [12], CRYCOM [8], XtalFinder for brevity) addresses many of the pre- STRUCTURE-TIDY [13], and COMPSTRU [14] are viously mentioned issues in a high-throughput fashion. available with varying objectives related to structure The primary objective of XtalFinder is to identify/- classify the prototypes of materials and relate them via structural similarity metrics. To accomplish this, XtalFinder determines the ideal prototype designation ∗ [email protected] of crystal structures, consistent with the International <latexit sha1_base64="MnxbZF/3rgP0c52rhfcIlkxMFxs=">AAAB83icbVDLSgMxFL1TX7W+qi7dBIvgqsxIoborunFZwT6gM5RMmmlDk8yQZIQy9DfcuFDErT/jzr8x085CWw8EDufcyz05YcKZNq777ZQ2Nre2d8q7lb39g8Oj6vFJV8epIrRDYh6rfog15UzSjmGG036iKBYhp71wepf7vSeqNIvlo5klNBB4LFnECDZW8n2BzUSJTLQb82G15tbdBdA68QpSgwLtYfXLH8UkFVQawrHWA89NTJBhZRjhdF7xU00TTKZ4TAeWSiyoDrJF5jm6sMoIRbGyTxq0UH9vZFhoPROhncwz6lUvF//zBqmJroOMySQ1VJLloSjlyMQoLwCNmKLE8JklmChmsyIywQoTY2uq2BK81S+vk+5V3WvUbx4atdZtUUcZzuAcLsGDJrTgHtrQAQIJPMMrvDmp8+K8Ox/L0ZJT7JzCHzifPy4dkcw=</latexit> <latexit sha1_base64="vjFOPzdf0hE21Rqx7fzubBeXhHM=">AAACAHicbVDLSsNAFJ34rPUVdeHCzWARXJVECuqu6MZlBfuAJpTJ9KYdOpOEmYlQQjb+ihsXirj1M9z5N07aLLT1wMDhnHuZc0+QcKa043xbK6tr6xubla3q9s7u3r59cNhRcSoptGnMY9kLiALOImhrpjn0EglEBBy6weS28LuPIBWLowc9TcAXZBSxkFGijTSwjz1IFOOGZp4geixFpqYizwd2zak7M+Bl4pakhkq0BvaXN4xpKiDSlBOl+q6TaD8jUjPKIa96qYKE0AkZQd/QiAhQfjY7IMdnRhniMJbmRRrP1N8bGRHKxArMZBFSLXqF+J/XT3V45WcsSlINEZ1/FKYc6xgXbeAhk0A1nxpCqGQmK6ZjIgnVprOqKcFdPHmZdC7qbqN+fd+oNW/KOiroBJ2ic+SiS9REd6iF2oiiHD2jV/RmPVkv1rv1MR9dscqdI/QH1ucPQQ6XgA==</latexit> <latexit sha1_base64="rdnbweuxU/kPx3dT7UzdXM8BZwQ=">AAACL3icbVBNSwMxEM3Wr7p+rXr0EiyKp7JbCuqtVhCPFewHdEvJptM2NJtdkqxQlv4jL/6VXkQU8eq/MG33oK0vzPB4M5NkXhBzprTrvlm5tfWNza38tr2zu7d/4BweNVSUSAp1GvFItgKigDMBdc00h1YsgYQBh2Ywup3Vm08gFYvEox7H0AnJQLA+o0Qbqevc+QEMmEiJlGQ8STmen4l9g89xyQRgk0Ls+3Z1RfFB9LLBrlNwi+4ceJV4GSmgDLWuM/V7EU1CEJpyolTbc2PdMbdpRjlMbD9REBM6IgNoGypICKqTzved4DOj9HA/kiaExnP190RKQqXGYWA6Q6KHark2E/+rtRPdv+qkTMSJBkEXD/UTjnWEZ+bhHpNANR8bQqhk5q+YDokkVBuLbWOCt7zyKmmUil65eP1QLlSqmR15dIJO0QXy0CWqoHtUQ3VE0TOaonf0Yb1Yr9an9bVozVnZzDH6A+v7BxLLo5o=</latexit> <latexit sha1_base64="cinTZGjg3dC8H01fUgn2SvJQwgo=">AAACA3icbVDLSgMxFM3UV62vUXe6CQ6CC6mTUlB3RTcuK9gHtMOQSdM2NJkZkoxQhoIbf8WNC0Xc+hPu/Bsz7Qjaekguh3PuJbkniDlT2nW/rMLS8srqWnG9tLG5tb1j7+41VZRIQhsk4pFsB1hRzkLa0Exz2o4lxSLgtBWMrjO/dU+lYlF4p8cx9QQehKzPCNZG8u2DesVP0eRMwO7pz3GyipBvO27ZnQIuEpQTB+So+/ZntxeRRNBQE46V6iA31l6KpWaE00mpmygaYzLCA9oxNMSCKi+d7jCBx0bpwX4kzQ01nKq/J1IslBqLwHQKrIdq3svE/7xOovsXXsrCONE0JLOH+gmHOoJZILDHJCWajw3BRDLzV0iGWGKiTWwlEwKaX3mRNCtlVC1f3lad2lUeRxEcgiNwAhA4BzVwA+qgAQh4AE/gBbxaj9az9Wa9z1oLVj6zD/7A+vgGWCeUIg==</latexit> <latexit sha1_base64="eU3dZoafB9Ef1Wyz2fMI2Y+QsJk=">AAAC8HicbVLNb9MwFHcCjFE+1sGRi0XFxCEKcWi7htMYF45Fotukpooc96Wz5nzIdqaVKH8FFw4gxJU/hxv/DU4a8dHtyX7f79n+PceF4Ep73i/LvnX7zs7d3Xu9+w8ePtrr7z8+UXkpGcxYLnJ5FlMFgmcw01wLOCsk0DQWcBpfvG3ip5cgFc+zD3pdwCKlq4wnnFFtXNG+tRPGsOJZRaWk67oSTNS9UMOVrloeJ5WgMYi6xgf4tdlhSvW5TKs3x2GUTodhREgYgVk1DsOtyoJKmqo/pdTB8UvDWMPMsdqIq6gitYM/boSx/I3l39DukooS/rYbuaPJ2MGee0iCRoyDiYMD4gaB7zTFBybHc/2R3wT9V8M2dURaa0za9pAtu4dH/YHnei3h6wrplAHqaBr1f4bLnJUpZJoJqtSceIVemG6aMwEGw1JBQdkFXcHcqBlNQS2qdmA1fm48S5zk0uxM49b7b0VlYFPrNDaZDdxqO9Y4b4rNS51MFhXPilJDxjYHJaXAOsfN9PGSS2BarI1CmeTmrpidmykxbf5Iz4BAtp98XTnxXTJ0g/fDwdFxB8cueoqeoReIoEN0hN6hKZohZqXWJ+uL9dWW9mf7m/19k2pbXc0T9B/ZP34Dq2XeaQ==</latexit> <latexit sha1_base64="cHeDYIe0bMM0ldSq3iZ2hjB1AjI=">AAACLXicbVBNSwMxEM3Wr1q/qh69BIvgQcquFNRbUUGPFe0HdEvJptM2NMkuSVYoS/+QF/+KCB4q4tW/Ydouoq0DgTdv5k1mXhBxpo3rjp3M0vLK6lp2PbexubW9k9/dq+kwVhSqNOShagREA2cSqoYZDo1IAREBh3owuJrU64+gNAvlgxlG0BKkJ1mXUWIs1c5f+wH0mEyIUmQ4Sugo5wti+kokolIaYd//ye9vbHqC/QL2vCkPspPK2vmCW3SngReBl4ICSqPSzr/6nZDGAqShnGjd9NzItOw0wygHu0OsISJ0QHrQtFASAbqVTK8d4SPLdHA3VPZJg6fsb0VChNZDEdjOyep6vjYh/6s1Y9M9byVMRrEBSWcfdWOOTYgn1uEOU0ANH1pAqGJ2V0z7RBFqrME5a4I3f/IiqJ0WvVLx4q5UKF+mdmTRATpEx8hDZ6iMblEFVRFFT+gFjdG78+y8OR/O56w146SaffQnnK9vpz2nuA==</latexit> <latexit sha1_base64="TiA9CWcFx0uoceH1+hCdj54qzbk=">AAACLXicbVBNSwMxEM3Wr1q/qh69BIvgQcquFKq3ooJ6q2g/oFtKNp22odnskmSFsvQPefGviOChIl79G2bbRbR1IPDmzbzJzPNCzpS27YmVWVpeWV3Lruc2Nre2d/K7e3UVRJJCjQY8kE2PKOBMQE0zzaEZSiC+x6HhDS+TeuMRpGKBeNCjENo+6QvWY5RoQ3XyV64HfSZiIiUZjWM6zrk+0QPpx8FtaYxd9ye/vzbpCXYLuOxMeRDdVNbJF+yiPQ28CJwUFFAa1U7+1e0GNPJBaMqJUi3HDnXbTNOMcjA7RApCQoekDy0DBfFBtePptWN8ZJgu7gXSPKHxlP2tiImv1Mj3TGeyupqvJeR/tVake2ftmIkw0iDo7KNexLEOcGId7jIJVPORAYRKZnbFdEAkodoYnDMmOPMnL4L6adEpFc/vSoXKRWpHFh2gQ3SMHFRGFXSDqqiGKHpCL2iC3q1n6836sD5nrRkr1eyjP2F9fQOoVqe5</latexit>

Arxiv:2010.04222V1 [Cond-Mat.Mtrl-Sci] 8 Oct 2020

Semantic Skin: from Flat Textual Content to Interconnected Repositories Of

Where Is the Semantic Web? – an Overview of the Use of Embeddable Semantics in Austria

Appendix a the Ten Commandments for Websites

Microformats Cheat Sheet

Microformats: Empowering Your Markup for Web 2.0

Standardization's All Very Well, but What About the Exabytes of Existing

The Global Rock-Art Database

Web Standards.Pdf

Cinii: Bringing Linked Data to Japan's Largest Scholarly Search Engine

Towards a Visual Annotation Tool for End-User Semantic Content Authoring

Microformats.Cheatsheet

Using Microformats and Rdfa As Embeddable Data Formats