OBO: chemistry requirements

Colin Batchelor July 2, 2008

Abstract We list some urgent corrections which are needed to make ChEBI compatible with BFO, RO and the other OBO Foundry ontologies, as well as outlining developments which will minimize future curatorial effort. The urgent corrections are to ensure: is a completeness, that the is a relation is transitive, that the parthood relations are compatible with RO, that granular and determinate parthood are distinguished, that the names of important terms reflect biomedical usage rather than IUPAC’s prescriptions, and that the top-level terms are rearranged to reflect the new parthood relations. The future developments outlined are to replace most of the asserted structure of ChEBI with an inferred structure, and to introduce a cross-product structure compatible with postcomposition of terms.

1 Introduction

Not only are organisms made of chemical entities, but fundamental processes in biological organisms such as the Krebs cycle, DNA transcription, RNA translation and post-translational modification are defined in terms of transformations of chemical entities. It is for this reason that the Open Biomedical Ontologies (OBO) need a biochemical ontology. In order to interoperate effectively with other OBO ontologies, any ontology should

• be is a complete
• have an is a relation that is transitive
• reuse relations from RO where appropriate
• distinguish between granular and determinate parthood (Rogers et al.)
• distinguish between dependent and independent continuants (BFO)

A biochemical ontology that is part of OBO needs to provide a chemical framework for biomedical ontologies such as the molecular function and biological process ontologies in GO, the types of that are implied by the Sequence Ontology, and the chemical classes in the Systems Biology Ontology. In order that the ontologies can be aligned, the biochemical ontology should as far as possible reflect how entities are talked about in the literature. For example, calling a term “nucleus”, which is polysemous, should be deprecated in favour of, for example, “atomic nucleus”.

The key problem with a general-purpose chemical ontology is that the set of entities it will be called upon to describe is combinatorially large: more compounds are synthesizable using readily available materials than there are in the universe by several orders of magnitude. The curation task will therefore always dwarf the curatorial effort. This is why machine-generated identifiers such as the InChI or the SMILES have been so successful in chemistry. However, there is no open algorithm that maps between chemical names and machine-generated identifiers, and this is a task to which a general-purpose chemical ontology would make an important contribution. In order to cope with the sheer size of chemical space, a general-purpose chemical ontology has to be:

• Generous with its inferred structure and extremely parsimonious with its asserted structure.
• A cross-product structure which can be used for recursive postcomposition of terms.

In this squib we present some urgent fixes to ChEBI (Chemical Entities of Biological Interest) which should make it more effectively interoperable with the other OBO Foundry ontologies, and in passing outline how to produce a general-purpose chemical ontology based largely on an inferred structure rather than an asserted structure.

The squib is structured as follows: in section 2 we present an upper-level ontology for chemistry, based on previous work by this author, and discuss the difference between atoms and elements. Section 3 discusses polysemy, subsumption and parthood relations in ChEBI, giving fixes for is a completeness, transitivity and the parthood relations, as well as proposing a new way of writing genus–differentia definitions for molecules. Finally, section 4 has a very brief account of the meaning of amino acid names in both ChEBI and the literature.

2 The upper level

2.1 An upper-level ontology

Currently most terms in ChEBI are descendants of molecular entities CHEBI:23367, defined as

    A molecular entity is any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer etc., identifiable as a separately distinguishable entity.

This has a wide range of descendants, whose identity may or may not be preserved when parts are gained or lost. Chemistry, as the science of transformations at the molecular level, needs an ontology which can distinguish identity-preserving transformations from identity-changing transformations. For example, salts CHEBI:24866, salts being collectives that can lose or gain ions without changing their identity, is a sibling of coordination entities CHEBI:33240, even though if a coordination entity loses a metal ion it changes its identity. There are clearly different parthood relations that apply to these terms and their descendants, but it is impossible to work out what parthood relations apply other than by close inspection of the term. Elsewhere in the ontology, the parts of kanamycin CHEBI:6104 are ingredients of a mixture, though this is nowhere signalled in its ancestry.

There is also no obvious mapping to BFO. Based on its children, molecular structure CHEBI:24431 looks like a snap:IndependentContinuant. However, its definition begins “A description...” Such a non-realist definition is completely inappropriate for an ontology which is to be aligned with realist biomedical ontologies. The classes from BFO that describe molecules of biomedical interest are snap:Object and snap:FiatObjectPart. (We defer the question of what to do with the application and biological role terms to another meeting.)

With this in mind, we present a minimal set of terms for an upper-level chemistry ontology. This has been written up in more detail in Batchelor (2008, attached). None of the terms below should have an is a parent within ChEBI, or be part of anything else.
snap:FiatObjectParts:
• molecular part (in ChEBI as groups (CHEBI:24433), is a molecular entities (CHEBI:23367))

snap:Objects that only have determinate parts:
• molecule (= polyatomic entities (CHEBI:36357), is a molecular entities (CHEBI:23367))
• atom (= atoms (CHEBI:33250), no is a parent)
• subatomic particle (= subatomic particle (CHEBI:36342), no is a parent)

snap:Objects that are collectives:
• salt (= salts CHEBI:24866, is a heteroatomic molecular entities (CHEBI:37577), is a polyatomic entities (CHEBI:36357))
• mixture (not in ChEBI)
• pure substance (not in ChEBI)

I have chosen a set which is pairwise disjoint: everything in ChEBI should fit into one and only one of the above categories. This is why ion and radical do not appear in this list; a species may be both a molecule and an ion, or both an atom and a radical.

Further consequences of this are that molecular entities CHEBI:23367 should be obsoleted, as should terms of the general type Xium molecular entities, which should be replaced by Xium molecule, Xium salt or an appropriately named term describing the macroscopic, pure substance.

2.2 Atoms and elements

The is a children of atoms CHEBI:33250 are main group elements CHEBI:33318, s-block elements CHEBI:33559, metals CHEBI:33521 and nonmetals CHEBI:25585. To handle the “elements” terms first, it is simply not true that a chemical element can stand in an is a relationship with an atom, or that “element” is a synonym for “atom”. You cannot substitute the word “element” for “atom” in a sentence and expect the sentence to still make sense.

    Laser-cooled neutral atoms localized in a deeply confining optical potential satisfy this requirement.

cannot be changed to

    Laser-cooled neutral elements localized in a deeply confining optical potential satisfy this requirement.

The first definition in the IUPAC Gold Book of “chemical element” is:

    1. A species of atoms; all atoms with the same number of protons in the atomic nucleus.

The first part of that sentence implies strongly that “element” does not belong in a chemical or biochemical ontology, just as “species” does not belong in a biological taxonomy; the second part, which seems to mean the mereological sum of all atoms with the same number of protons in the universe, is too eccentric to include in an ontology. The second definition in the IUPAC Gold Book is:

    2. A pure chemical substance composed of atoms with the same number of protons in the atomic nucleus. Sometimes this concept is called the elementary substance as distinct from the chemical element as defined under 1, but mostly the term chemical element is used for both concepts.

We can at least put something that fits this definition in a jar, but it is a collective of atoms, and therefore can neither have an is a relationship to “atoms” nor be a synonym of “atoms”.

What unites all the manifestations of a chemical element, whether it be a meitnerium atom created in a particle accelerator, a lump of solid gold or atmospheric carbon, is the relevant atom itself. We know atoms exist.

Consequence 1: main group elements CHEBI:33318 should be renamed to “main group atoms”, and s-block elements CHEBI:33559 to “s-block atoms”. The same applies to all terms in ChEBI ending in “elements”.

Consequence 2: Terms that have the name of a chemical element should have “atom” added to the end of the name. Likewise “chalcogens” and “pnictogens” should become “chalcogen atom” and “pnictogen atom”.

Secondly, it is not true that all metals are some atom. Metallicity is a property of collectives of atoms or molecules rather than individual atoms. Phrases like “metal–oxygen bonding” are examples of a regular polysemy, where the name of the collective stands for a grain. We should probably tolerate “metal atoms” and “nonmetal atoms” for a biochemical ontology, but their current is a children should become children of “atoms”.

3 Polysemy, subsumption and parthood

3.1 Inferred vs. asserted structure

Asserted relationships are those that have been manually added by a human curator, while inferred relationships have been worked out by a reasoner. A chemical ontology is an ideal place for a rich inferred structure because most of the classifications of chemical compounds are based on their parts. For example, all children of organic functional classes CHEBI:33244 can be defined in terms of parthood. Take imines CHEBI:24783, for example. If a molecule contains the imine substructure, a carbon atom bonded to two carbon atoms and doubly bonded to a nitrogen atom which is itself bonded to a carbon or hydrogen atom, then the molecule is an imine. This doesn’t need a human annotator; it can be done by machine. In OBO format, the assertion can be written like this:

    id: CHEBI:24783
    name: imine
    intersection_of: CHEBI:25700 ! organic molecular entities
    intersection_of: has_determinate_part CHEBI:new ! imine substructure
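The kind of inference described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how ChEBI or any particular reasoner works: the `Definition` class and `infer_classes` function are hypothetical names, `CHEBI:new` is the placeholder identifier used above for the not-yet-existing imine substructure term, and detecting the substructure in a real molecule (a cheminformatics task) is assumed to have been done already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    """A cross-product definition: a genus plus required determinate parts."""
    term: str
    genus: str
    determinate_parts: frozenset


# Hypothetical encoding of the OBO stanza for imine given in the text.
IMINE = Definition(
    term="CHEBI:24783",                          # imine
    genus="CHEBI:25700",                         # organic molecular entities
    determinate_parts=frozenset({"CHEBI:new"}),  # placeholder: imine substructure
)


def infer_classes(asserted_classes, parts, definitions):
    """Return every defined term whose genus and parts are all satisfied."""
    inferred = set()
    for d in definitions:
        if d.genus in asserted_classes and d.determinate_parts <= set(parts):
            inferred.add(d.term)
    return inferred


# An organic molecule with the imine substructure is inferred to be an imine;
# one without the substructure is not.
with_part = infer_classes({"CHEBI:25700"}, {"CHEBI:new"}, [IMINE])
without_part = infer_classes({"CHEBI:25700"}, set(), [IMINE])
```

The curator asserts only the definition; membership of individual compounds then follows from their parts.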

3.2 Chemical nomenclature

An important point to begin with is that chemical names are polysemous in a regular way. Most names can either completely specify the non-hydrogen atoms in a molecule and their connectivity or underspecify them. So “pyridine” can stand for either the molecule pyridine or any member of the effectively infinite set of compounds that contain a ring of carbon and nitrogen atoms connected to each other in the same way as pyridine. This ring is called a pyridine ring. Terms that have names that completely specify a compound will typically have an InChI.

Consequence 1: A term with a name understood in the underspecified sense should never be the is a child of a term with a completely specified name.

Consequence 2: The default should be for the name of a molecule to appear three times in the ontology, twice as descendants of molecule, underspecified and completely specified, and once as a descendant of molecular part.

Consequence 2.1: The completely specified molecule should be an is a child of the underspecified molecule (this can either be asserted or inferred).

Consequence 2.2: The underspecified molecule should have as a determinate part the molecular skeleton. This will then be inherited by its completely specified descendants.

Consequence 3: In terms of genus–differentia definitions, an underspecified molecule chebine should be defined as follows: “A molecule that has as a determinate part the chebine skeleton.” The completely specified molecule’s definition would therefore be something like “A chebine that only has hydrogen atoms attached to its skeleton.”

Consequence 3.1: The 533 non-realist has parent hydride relationships between a molecule and a completely specified molecule can be replaced by is a relationships with their least underspecified parents.

Consequence 3.2: The 3516 non-realist has functional parent relationships between a molecule and a completely specified molecule can be replaced by is a relationships with their least underspecified parents.

IUPAC has set canonical numbering schemes for the atoms in particular molecules. ChEBI already contains these in its structural database, though they are not accessible through the OBO format file. These numbering schemes enable us to write powerful genus–differentia definitions if we allow a three-part relation. This will be briefly described in subsection 3.5.

3.3 is a completeness

There are 1433 out of 19743 terms in ChEBI (7.26%) that are is a orphans. This means that they would need a separate, lexically-based mechanism for reasoning purposes. Table 1 proposes parentage for all but 19 of them; these 19 are listed, and should be straightforward to assign by inspection. 1046 have functional parents, 121 have parent hydrides and 6 have both. By consequences 3.1 and 3.2 in the last section, these 1061 terms could in principle have is a parents added automatically, though care must be taken with, for example, esters having carboxylic acids as is a parents.
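Orphans of this kind can be found mechanically by scanning the OBO file for [Term] stanzas that carry no `is_a:` tag. The following is a minimal sketch, assuming the usual OBO flat-file stanza layout; the function name `find_is_a_orphans` and the three sample stanzas (including the made-up identifier `EX:0001` and the toy `is_a` line for imines) are illustrative assumptions rather than ChEBI content or tooling.

```python
def find_is_a_orphans(obo_text):
    """Return the ids of [Term] stanzas that have no is_a line."""
    orphans = []
    # Everything before the first [Term] is the file header, so skip it.
    for stanza in obo_text.split("[Term]")[1:]:
        term_id = None
        has_parent = False
        for line in stanza.splitlines():
            line = line.strip()
            if line.startswith("["):       # start of another stanza type
                break
            if line.startswith("id:"):
                term_id = line[len("id:"):].strip()
            elif line.startswith("is_a:"):
                has_parent = True
        if term_id is not None and not has_parent:
            orphans.append(term_id)
    return orphans


# Toy input: the stanzas are illustrative only.
SAMPLE = """\
format-version: 1.2

[Term]
id: CHEBI:33250
name: atoms

[Term]
id: CHEBI:24783
name: imines
is_a: CHEBI:25700 ! organic molecular entities

[Term]
id: EX:0001
name: hypothetical term with only a functional parent
relationship: has_functional_parent CHEBI:24783 ! imines
"""

orphans = find_is_a_orphans(SAMPLE)
```

On the toy input, the first and third terms are reported as orphans; the third is the kind of term that consequence 3.2 would give an is a parent automatically.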

3.4 is a transitivity

Having rearranged the top-level terms, there are two remaining difficulties. The first is easiest: “organic functional classes” CHEBI:33244 and “natural product classes” CHEBI:33243 are supertypes, types of types, so they should be obsoleted and their children made children of their parent, “organic molecular entities”.

The other difficulty is that many terms have is a parentage leading back to “application” and/or “biological role” in addition to “molecular structure”. Simply put, an independent continuant cannot be an is a child of a dependent continuant. These relationships will have to be changed to has application and has biological role. There are some ‘mixed’ terms, such as organochlorine acaricide (CHEBI:38657). These are compounds with both structural parentage and application parentage. In this case we recommend a cross-product definition as above. An organochlorine acaricide is an organochlorine that has application acaricide.

There is no obvious semi-automatic fix for the application and biological role relations, but it must be done as a matter of priority in the next few releases, rather than in the next few years.

Proposed classification | Terms
As least-underspecified ancestor of functional parent (1046) | All terms with a has functional parent relationship.
As least-underspecified ancestor of parent hydride (121) | All terms with a has parent hydride relationship.
Obsoletion (3) | ChEBI ontology CHEBI:23091, molecular structure CHEBI:24431, unclassifieds CHEBI:27189
Not chemically distinct (3) | high-density lipoprotein CHEBI:47775, low-density lipoprotein cholesterol CHEBI:47774, very-low-density lipoprotein cholesterol CHEBI:47773
Top-level terms (4) | atoms CHEBI:33250, biological role CHEBI:24432, application CHEBI:33232, subatomic particle CHEBI:36342
is a biological role (2) | nutrient CHEBI:33284, vitamin CHEBI:33229
is a application (1) | label CHEBI:35209
groups CHEBI:24433 (205) | names ending in “group” or “groups”
acyl-CoAs CHEBI:17984 (161) | names ending in “CoA” or “CoAs”
residue CHEBI:new (2) | names ending in “residue” or “residues”
cyclopropanecarboxylic ester CHEBI:new (13) | names ending in “-fluthrin” or “-halothrin” or “-methrin”
macrolides CHEBI:25106 (6) | names with “-mectin” or “milbemycin”
Inspection needed (19) | 1,1’-diethyl-2,2’-cyanine CHEBI:37994, 2-(4-dimethylaminostyryl)-1-ethylpyridinium CHEBI:38008, 4-(dimethylamino)benzenediazonium CHEBI:38898, 4-(4-dimethylaminostyryl)-1-ethylpyridinium CHEBI:38006, cetylpyridinium chloride CHEBI:32915, cetylpyridinium CHEBI:32914, ethylmercurithiosalicylate CHEBI:33215, amylopectin CHEBI:28057, amylose CHEBI:28102, hexanedioyl CHEBI:48082, sulfadiazinate CHEBI:33127, urate anions CHEBI:46818, thermorubin A CHEBI:48480, 3,3’-(biphenyl-4,4’-diyldidiazene-2,1-diyl)bis(4-aminonaphthalene-1-sulfonate) CHEBI:38216, graphene CHEBI:36973, rabeprazole(-1) CHEBI:49199, ammonium ions CHEBI:35274, ipratropium CHEBI:5956, atta(-4) CHEBI:33027

Table 1: is a orphans and suggested parentage
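The lexical rows of Table 1 (names ending in “group”, “CoA”, “-methrin” and so on) are simple enough to express as code. The sketch below is one hedged reading of those rows, not an existing ChEBI procedure: the rule tables and the function name `propose_parent` are hypothetical, `CHEBI:new` marks parents that Table 1 says do not yet exist, and the test names are merely examples of names with the relevant endings.

```python
# (endings, proposed is_a parent), transcribed from the lexical rows of Table 1.
SUFFIX_RULES = [
    (("group", "groups"), "groups CHEBI:24433"),
    (("CoA", "CoAs"), "acyl-CoAs CHEBI:17984"),
    (("residue", "residues"), "residue CHEBI:new"),
    (("fluthrin", "halothrin", "methrin"), "cyclopropanecarboxylic ester CHEBI:new"),
]

# Table 1 says names *with* these strings, so they are matched anywhere in the name.
SUBSTRING_RULES = [
    (("mectin", "milbemycin"), "macrolides CHEBI:25106"),
]


def propose_parent(name):
    """Return the Table 1 parent suggested by a term name, or None."""
    for endings, parent in SUFFIX_RULES:
        if name.endswith(endings):       # str.endswith accepts a tuple
            return parent
    for needles, parent in SUBSTRING_RULES:
        if any(needle in name for needle in needles):
            return parent
    return None                          # no lexical rule applies: inspect by hand


suggestions = {
    name: propose_parent(name)
    for name in ["methyl group", "acetyl-CoA", "permethrin",
                 "emamectin benzoate", "graphene"]
}
```

Names that match no rule come back as `None`, which corresponds to the “Inspection needed” row.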

3.5 Parthood

The is part of relation in ChEBI is incompatible with RO and also disguises the difference between the determinate parthood of an oxygen atom in a molecule (without which it is a different molecule), and the granular parthood of a sodium ion in a lump of rocksalt (without which it is a slightly smaller lump of rocksalt).

Rector et al. (2006) propose a set of relations which capture the different sorts of parthood we see asserted in ChEBI. They can all be written as descendants of the has part relation in RO:

(i) has determinate part. This relationship holds between wholes and parts where removing a determinate part necessarily damages or diminishes the whole. This is particularly relevant to molecules, where loss of atoms converts the molecule into a different molecule.
(ii) has grain. This relationship holds between wholes and parts where removing a grain does not necessarily damage or diminish the whole, as in removing an atom or molecule from a macroscopic lump of something.
(iii) has ingredient. This relationship holds between mixtures and their ingredients, where the mixture is defined by the relative proportions of the ingredients and where removing an ingredient, perhaps by chromatography or centrifugation, necessarily damages or diminishes the whole.

There are 436 is part of relationships in version 46 of ChEBI, in comparison with 533 has parent hydride relationships, 3516 has functional parent relationships, 631 is substituent group from relationships and 26490 is a relationships. This relatively small number of relationships makes it possible to determine how to fix the is part of relationships by inspecting them one by one. We have done this and the results are listed in the Appendix. A brief summary is given here: 4 of them relate molecular structure, subatomic particle, application and biological role to ChEBI ontology. Since the term “ChEBI ontology” does not refer to a chemical entity, it can be obsoleted. Over 100 indicate that an ion is part of a salt, which we consider to be granular parthood. A further 60 indicate presence in a mixture, whether it be a racemate (a mixture of two compounds that only differ in terms of their handedness), a hydrate or a mixture produced by a living organism. Around half indicate determinate parthood. There are six generic relationships between macromolecules and their residues. While in general these are granular relations, it is conceivable that a specific oligonucleotide or oligopeptide might have a specific residue as a determinate part, so we leave the relationship as a has part one for the moment. Appendix A.7 deals with relationships involving , which may actually specify location rather than parthood, and which require the attentions of a domain expert.

There are four relationships involving roles and applications. It is not clear how we can have parthood relationships between dependent continuants. They would need to be rewritten involving independent continuants.

In the attached document, we propose a further three-part relationship, has substituent...at position, which can be combined with the canonical numberings of molecules to write genus–differentia definitions of molecules and relate these to their names.

C has substituent S at position P = [definition] for all c, t, if c instance of C at t then there is some s such that s instance of S at t and s is part of c at position P at t.

So, to take a trivial example, 1,2-dimethylbenzene would be defined as follows in an OBO-like syntax (here CHEBI:32875 is the methyl group).

    name: 1,2-dimethylbenzene
    intersection_of: CHEBI:22712 ! benzenes
    intersection_of: has_substituent CHEBI:32875 at_position 1
    intersection_of: has_substituent CHEBI:32875 at_position 2

Some extra reasoning will be needed here to reconcile this with cardinality. There are three possible dimethylbenzenes (1,2- and 1,3- and 1,4-), which we would have to define as follows:

    name: 1,2-dimethylbenzene
    intersection_of: CHEBI:22712 ! benzenes
    intersection_of: has_determinate_part CHEBI:32875 cardinality 2
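The claim that a cardinality of two on a benzene ring corresponds to exactly three positional definitions can be checked by enumeration. The sketch below makes simplifying assumptions that are not in the text: the ring is modelled as a regular hexagon with positions 1 to 6, the substituents are identical, and two position sets count as the same compound if a rotation or reflection of the ring maps one onto the other. The function names are hypothetical.

```python
from itertools import combinations

RING_SIZE = 6  # benzene ring positions 1..6


def canonical(positions):
    """Smallest image of a position set under the ring's rotations and reflections."""
    images = []
    for shift in range(RING_SIZE):
        for sign in (1, -1):  # 1 = rotation only, -1 = reflection then rotation
            image = tuple(sorted(
                (sign * (p - 1) + shift) % RING_SIZE + 1 for p in positions
            ))
            images.append(image)
    return min(images)


def distinct_patterns(n_substituents):
    """All symmetry-distinct ways of placing identical substituents on the ring."""
    all_sets = combinations(range(1, RING_SIZE + 1), n_substituents)
    return sorted({canonical(s) for s in all_sets})


dimethyl_patterns = distinct_patterns(2)
```

Under these assumptions the enumeration yields the position sets (1, 2), (1, 3) and (1, 4), matching the three dimethylbenzenes named above; a reasoner reconciling `at_position` with `cardinality` would need an equivalent notion of symmetry.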

4 Lexical defaults

One problem that besets efforts to align ChEBI with GO is that the amino acids in ChEBI have been named from a formal nomenclature perspective, whereas the relevant terms in GO have been named according to biological reality. The relevant axes here are the distinction between amino acids as free molecules and as parts (residues) of a macromolecule, and their handedness (chirality). Amino acids and their residues in living systems (with a very few exceptions) all have a given handedness (l-), and this is not explicitly marked in GO, whereas it is in ChEBI. Table 2 lists, for four amino acids, and from the most recent 25 hits in PubMed for “lysine”, “arginine”, “histidine” and “serine” as search terms, the sense in which those words are meant. Overwhelmingly the sense of a given amino acid name is the l-residue, and this is probably the sense in which it is meant in GO.

Type | Lysine | Arginine | Histidine | Serine
l-whole | 2 (5%) | 9 (19%) | 2 (6%) | 0
l-part | 2 (5%) | 8 (17%) | 1 (3%) | 0
l-residue | 16 (37%) | 29 (61%) | 27 (82%) | 36 (100%)
l-location | 2 (5%) | 0 (0%) | 0 (0%) | 0
l-don’t know | 21 (49%) | 1 (2%) | 3 (9%) | 0
don’t know | | | 1 |
l total | 43 | 47 | 33 | 36
d total | 0 | 0 | 0 | 0
? total | 0 | 0 | 1 | 0

Table 2: What does “arginine” mean?

5 Conclusion

We have presented some urgent corrections to ChEBI, which we hope will in the main be relatively straightforward to make in the next few releases. We also hope that this document will be a starting point for discussion about the future direction of ChEBI, and whether it is to remain serving as both a reference and an application ontology, or to fork into a general-purpose chemistry reference ontology and a biochemical application ontology.

A Parthood relations

A.1 Salts

A salt is a mixture of ingredients that cannot exist independently: a macroscopic sample of sodium cations, for example, would explode by Coulombic repulsion. It is for this reason that we propose that the salt term should have as a grain the anion or cation.

halide anions CHEBI:16042 is part of halide salts (CHEBI:33958)
CHEBI:17051 is part of fluoride salts CHEBI:24060
chloride CHEBI:17996 is part of chloride salts CHEBI:23114
bromide CHEBI:15858 is part of bromide salts CHEBI:22925
iodide CHEBI:16382 is part of iodide salts CHEBI:24858
lead(2+) CHEBI:49807 is part of lead diacetate CHEBI:31767
cobalt(2+) CHEBI:48828 is part of cobalt dichloride CHEBI:35696
copper(2+) CHEBI:29036 is part of copper(2+) sulfate CHEBI:23414
sodium(1+) CHEBI:29101 is part of sodium salts CHEBI:26714
potassium(1+) CHEBI:29103 is part of potassium salts CHEBI:26218
anions CHEBI:22563 is part of salts CHEBI:24866
alpha-D,alpha-D-digalacturonate CHEBI:39473 is part of disodium alpha-D,alpha-D-digalacturonate CHEBI:39470
maleate(2-) CHEBI:30780 is part of trimipramine maleate CHEBI:35030
fluoroacetate CHEBI:18172 is part of sodium fluoroacetate CHEBI:38699
pyruvate CHEBI:15361 is part of sodium pyruvate CHEBI:50144
(R)-carnitinamide CHEBI:17159 is part of L-carnitinamide chloride CHEBI:48602
stearate CHEBI:25629 is part of aluminium tristearate CHEBI:37867
salicylate CHEBI:30762 is part of azaheterocycle salicylate salts CHEBI:48884
cholate CHEBI:29747 is part of cholate salt CHEBI:23169
diclofenac(1-) CHEBI:48311 is part of diclofenac epolamine CHEBI:48296
diclofenac(1-) CHEBI:48311 is part of diclofenac potassium CHEBI:4508
diclofenac(1-) CHEBI:48311 is part of diclofenac sodium CHEBI:4509
(1-) CHEBI:49165 is part of montelukast sodium CHEBI:6993
CHEBI:29223 is part of bromate salts CHEBI:22923
nitrite CHEBI:16301 is part of nitrite salts CHEBI:46648
sulfate CHEBI:16189 is part of copper(2+) sulfate CHEBI:23414
hydrogensulfite CHEBI:17137 is part of sodium hydrogensulfite CHEBI:26709

triphenylsulfonium CHEBI:48596 is part of triphenylsulfonium 4-oxo-1-adamantyloxycarbonyldifluoromethanesulfonate CHEBI:48454
triphenylsulfonium CHEBI:48596 is part of triphenylsulfonium 4-oxo-1-adamantyloxycarbonyldifluoromethanesulfonate CHEBI:48456
1-(2-oxo-2-phenylethyl)tetrahydrothiophenium CHEBI:48958 is part of 1-(2-oxo-2-phenylethyl)tetrahydrothiophenium 4-oxo-1-adamantyloxycarbonyldifluoromethanesulfonate CHEBI:48457
tetraborate(-2) CHEBI:38889 is part of disodium tetraborate CHEBI:38892
dichromate(2-) CHEBI:33141 is part of sodium dichromate CHEBI:39483
vanadate(3-) CHEBI:46442 is part of trisodium vanadate CHEBI:35607
CHEBI:17514 is part of cyanide salts CHEBI:36572
cyanate CHEBI:29195 is part of cyanate salts CHEBI:36831
carbonate CHEBI:41609 is part of carbonate salts CHEBI:46721
cations CHEBI:36916 is part of salts CHEBI:24866
anthocyanidin cations CHEBI:16366 is part of anthocyanidins CHEBI:38695
anthocyanin cations CHEBI:35218 is part of anthocyanins CHEBI:38697
cyanidin 3-O-beta-D-galactoside CHEBI:27475 is part of cyanidin 3-O-beta-D-galactoside chloride CHEBI:37664
pelargonidin 3-O-beta-D-glucoside CHEBI:31967 is part of pelargonidin 3-O-beta-D-glucoside chloride CHEBI:36122
cyanin CHEBI:3978 is part of cyanin chloride CHEBI:38021
cyanidin 3-O-rutinoside CHEBI:28064 is part of cyanidin 3-O-rutinoside chloride CHEBI:16726
luteolinidin CHEBI:6584 is part of luteolinidin chloride CHEBI:37648
pelargonidin CHEBI:25683 is part of pelargonidin chloride CHEBI:28510
delphinidin CHEBI:28436 is part of delphinidin chloride CHEBI:38701
ethylmercurithiosalicylate CHEBI:33215 is part of thimerosal CHEBI:9546
kanamycin A CHEBI:17630 is part of kanamycin A sulfate CHEBI:6109
CHEBI:2637 is part of amikacin disulfate CHEBI:2638
CHEBI:32499 is part of amineptine hydrochloride CHEBI:50003
taurocholate CHEBI:36257 is part of sodium taurocholate CHEBI:36276
glycocholate CHEBI:29746 is part of sodium glycocholate CHEBI:36273
4,4’-bis({4-anilino-6-[bis(2-hydroxyethyl)amino]-1,3,5-triazin-2-yl}amino)stilbene-2,2’-disulfonate CHEBI:50012 is part of Calcufluor White CHEBI:50011
quaternary ammonium ions CHEBI:35267 is part of quaternary ammonium salts CHEBI:35273
CHEBI:15355 is part of acetylcholine chloride CHEBI:2417
methacholine CHEBI:6804 is part of methacholine chloride CHEBI:50142
CHEBI:3647 is part of chlorpromazine hydrochloride CHEBI:3649
CHEBI:6807 is part of methadone hydrochloride CHEBI:50140
dodecyl sulfate CHEBI:23872 is part of CHEBI:8984
physostigmine CHEBI:27593 is part of salicylate CHEBI:48883
A 2’-propanoate CHEBI:48913 is part of erythromycin estolate CHEBI:4846
emamectin CHEBI:39230 is part of emamectin benzoate CHEBI:39233
thiamine(1+) diphosphate CHEBI:9532 is part of thiamine(1+) diphosphate chloride (CHEBI:18290)
thiamine(1+) monophosphate CHEBI:9533 is part of thiamine(1+) monophosphate chloride (CHEBI:18338)
acetophenazine CHEBI:2401 is part of acetophenazine dimaleate CHEBI:2402
2-[4-(2-hydroxyethyl)piperazin-1-yl]ethanesulfonate CHEBI:46757 is part of sodium 2-[4-(2-hydroxyethyl)piperazin-1-yl]ethanesulfonate CHEBI:46758
carnitinamide CHEBI:48604 is part of carnitinamine chloride CHEBI:48601
levobunolol CHEBI:6438 is part of levobunolol hydrochloride CHEBI:6439
adamantan-1-aminium CHEBI:48320 is part of hydrochloride CHEBI:2619
CHEBI:3181 is part of bromocriptine methanesulfonate CHEBI:3182
ethidium CHEBI:42478 is part of CHEBI:4883
CHEBI:4466 is part of dexmedetomidine hydrochloride CHEBI:31472
levomedetomidine CHEBI:48557 is part of levomedetomidine hydrochloride CHEBI:48557
eprosartan CHEBI:4814 is part of eprosartan methanesulfonate CHEBI:48409
nitro blue tetrazolium(2+) CHEBI:7586 is part of nitro blue tetrazolium dichloride CHEBI:9505
N,N’-(9-{[4-(dimethylamino)phenyl]amino}acridine-3,6-diyl)bis(3-pyrrolidin-1-ylpropanamide) CHEBI:33534 is part of BRACO-19 CHEBI:39440
1-(2-hydroxyethyl)pyrrolidinium CHEBI:48312 is part of diclofenac epolamine CHEBI:48296
CHEBI:9566 is part of thioridazine hydrochloride CHEBI:48566
sulfadiazinate CHEBI:33127 is part of silver(1+) sulfadiazinate CHEBI:9142
thiamine(1+) CHEBI:18385 is part of thiamine(1+) chloride CHEBI:33283
thiamine(2+) CHEBI:49107 is part of thiamine(2+) dichloride CHEBI:49105
CHEBI:45783 is part of imatinib methanesulfonate CHEBI:31690
CHEBI:3723 is part of citalopram hydrobromide CHEBI:3724
quinapril CHEBI:8713 is part of quinapril hydrochloride CHEBI:8714
CHEBI:3638 is part of chloroquine sulfate CHEBI:50178
CHEBI:3611 is part of chlordiazepoxide hydrochloride CHEBI:3612
3,7-bis(dimethylamino)phenothiazin-5-ium CHEBI:43830 is part of blue CHEBI:6872
dothiepin CHEBI:36798 is part of dothiepin hydrochloride CHEBI:31519
cis-dothiepin CHEBI:36802 is part of cis-dothiepin hydrochloride CHEBI:36804
trans-dothiepin CHEBI:36803 is part of trans-dothiepin hydrochloride CHEBI:36805
quinacrine mustard CHEBI:37595 is part of quinacrine mustard dihydrochloride CHEBI:21182
N-(2-chloroethyl)-N’-(6-chloro-2-methoxyacridin-9-yl)-N-ethylpropane-1,3-diamine CHEBI:37594 is part of ICR-70 CHEBI:21183
trimipramine CHEBI:9738 is part of trimipramine maleate CHEBI:35030
CHEBI:47499 is part of imipramine hydrochloride CHEBI:5882
lofepramine CHEBI:47782 is part of lofepramine hydrochloride CHEBI:31780
CHEBI:36796 is part of duloxetine hydrochloride CHEBI:36808
(S)-duloxetine CHEBI:36795 is part of (S)-duloxetine hydrochloride CHEBI:31526
(R)-duloxetine CHEBI:36797 is part of (R)-duloxetine hydrochloride CHEBI:36806
cyanotriphenylborate(-1) CHEBI:38895 is part of sodium cyanotriphenylborate (CHEBI:34659)
thallium-201 CHEBI:37804 is part of ((201)Tl)thallium monochloride CHEBI:32213
hexafluoroaluminate(3-) CHEBI:39288 is part of trisodium hexafluoroaluminate CHEBI:39289
3,3’-(biphenyl-4,4’-diyldidiazene-2,1-diyl)bis(4-aminonaphthalene-1-sulfonate) CHEBI:38216 is part of Congo Red CHEBI:34653
tetrafluoroborate CHEBI:38899 is part of 4-(dimethylamino)benzenediazonium tetrafluoroborate CHEBI:35095
CHEBI:8069 is part of phenobarbital sodium CHEBI:8070
rabeprazole(1-) CHEBI:49199 is part of rabeprazole sodium CHEBI:8769
tetracyanonickelate(2-) CHEBI:49928 is part of potassium tetracyanonickelate(2-) CHEBI:30071
tetracyanonickelate(4-) CHEBI:30368 is part of potassium tetracyanonickelate(4-) CHEBI:30070
osmiamate CHEBI:35656 is part of potassium osmiamate CHEBI:35657
pentachloridonitridoosmate(2-) CHEBI:35658 is part of potassium pentachloridonitridoosmate(-2) CHEBI:35659
tetraxenonogold(2+) CHEBI:50000 is part of tetraxenonogold bis(undecafluoroantimonate) CHEBI:50001

undecafluorodiantimonate(1-) CHEBI:50002 is part of tetraxenonogold bis(undecafluoroantimonate) CHEBI:50001
hexafluorosilicate(2-) CHEBI:37189 is part of lead hexafluorosilicate CHEBI:37192
iminium ions CHEBI:35286 is part of iminium salts CHEBI:35277
ammonium ions CHEBI:35274 is part of ammonium compounds CHEBI:35276
arsanilate(1-) CHEBI:36048 is part of sodium arsanilate CHEBI:36049
ipratropium CHEBI:5956 is part of ipratropium bromide CHEBI:46659
acebutolol CHEBI:2379 is part of acebutolol hydrochloride CHEBI:2380
CHEBI:9398 is part of tamsulosin hydrochloride CHEBI:9399
paraquat CHEBI:34905 is part of paraquat dichloride CHEBI:28786
diamminesilver(1+) CHEBI:33049 is part of diamminesilver(1+) fluoride CHEBI:32129
enterobactin(6-) CHEBI:38150 is part of ferrienterobactin(3-) CHEBI:28199
gadopentetate CHEBI:35778 is part of gadopentetate dimeglumine CHEBI:31797
atta(4-) CHEBI:33027 is part of [Eu(atta)](-) CHEBI:33025
ep-atta(-4) CHEBI:37593 is part of [Eu(ep-atta)](-) CHEBI:33026
4-(dimethylamino)benzenediazonium CHEBI:38898 is part of 4-(dimethylamino)benzenediazonium tetrafluoroborate CHEBI:35095
cetylpyridinium CHEBI:32914 is part of cetylpyridinium chloride CHEBI:32915
CHEBI:8597 is part of protriptyline hydrochloride CHEBI:8598
4-(4-dimethylaminostyryl)-1-ethylpyridinium CHEBI:38006 is part of 4-(4-dimethylaminostyryl)-1-ethylpyridinium iodide CHEBI:37995
1,1’-diethyl-2,2’-cyanine CHEBI:37994 is part of 1,1’-diethyl-2,2’-cyanine halide CHEBI:38003
2-(4-dimethylaminostyryl)-1-ethylpyridinium CHEBI:38008 is part of 2-(4-dimethylaminostyryl)-1-ethylpyridinium iodide CHEBI:38007

A.2 Hydrated compounds

Lisinopril dihydrate, for example, is a stoichiometric mixture of lisinopril and water. It is notable that water is missing from each of these 16 examples, which means that the relationships are not fully computable as well as being in the wrong direction according to RO. Each of these relationships needs to have its direction reversed and its relation changed to has ingredient. How to specify the stoichiometry is up for discussion.

lisinopril CHEBI:43755 is part of lisinopril dihydrate CHEBI:6503
CHEBI:43968 is part of meropenem trihydrate CHEBI:6770
motexafin gadolinium CHEBI:50161 is part of motexafin gadolinium hydrate CHEBI:50162
strontium dichloride CHEBI:36383 is part of strontium dichloride hexahydrate CHEBI:36385
sodium sulfate CHEBI:32149 is part of sodium sulfate decahydrate CHEBI:32586
disodium tetraborate CHEBI:38892 is part of borax CHEBI:38888
hydrazine CHEBI:15571 is part of hydrazine hydrate CHEBI:35511
sodium acetate CHEBI:32954 is part of sodium acetate trihydrate CHEBI:32138
calcium sulfate CHEBI:31346 is part of calcium sulfate dihydrate CHEBI:32583
aluminium trichloride CHEBI:30114 is part of aluminium trichloride hexahydrate CHEBI:30115
lead acetate CHEBI:31767 is part of lead acetate trihydrate CHEBI:33112
copper(2+) sulfate CHEBI:23414 is part of copper(2+) sulfate pentahydrate CHEBI:31440
zinc sulfate CHEBI:35176 is part of zinc sulfate heptahydrate CHEBI:32312
ipratropium bromide CHEBI:46659 is part of ipratropium bromide hydrate CHEBI:5957
gadodiamide CHEBI:37333 is part of gadodiamide hydrate CHEBI:31642
cetylpyridinium chloride CHEBI:32915 is part of cetylpyridinium chloride monohydrate CHEBI:3566
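The correction proposed for the hydrates can be sketched in Python. This is only an illustration under stated assumptions, not part of any ChEBI tooling: the `Relation` tuple, the predicate strings and the function name `fix_hydrate` are hypothetical, and the identifier used for water (CHEBI:15377) is an assumption, since water does not appear anywhere in this list. Stoichiometry is deliberately left out because how to specify it is still open.

```python
from typing import List, NamedTuple


class Relation(NamedTuple):
    subject: str    # ChEBI identifier of the term the relation starts from
    predicate: str  # relation name, e.g. "is_part_of"
    obj: str        # ChEBI identifier of the term the relation points to


# Assumed identifier for water; it is not listed in this appendix.
WATER = "CHEBI:15377"


def fix_hydrate(rel: Relation) -> List[Relation]:
    """Reverse an 'anhydrous is_part_of hydrate' relation and add the missing water."""
    if rel.predicate != "is_part_of":
        raise ValueError("expected an is_part_of relation, got " + rel.predicate)
    hydrate, anhydrous = rel.obj, rel.subject
    return [
        Relation(hydrate, "has_ingredient", anhydrous),
        Relation(hydrate, "has_ingredient", WATER),
    ]


# lisinopril CHEBI:43755 is part of lisinopril dihydrate CHEBI:6503
fixed = fix_hydrate(Relation("CHEBI:43755", "is_part_of", "CHEBI:6503"))
```

Here `fixed` holds two relations that both start from the hydrate, one pointing at lisinopril and one at water, which is the shape the text asks for.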

A.3 Mixtures

Two things need to happen here. Firstly, the direction of each relationship needs to be reversed and the is part of relation changed to has ingredient. Secondly, any is a relationship that the second term (for example, kanamycin or abamectin) has to another molecular structure term must be deleted and replaced by an is a relationship to mixture CHEBI:new.

kanamycin A CHEBI:17630 is part of kanamycin CHEBI:6104
kanamycin C CHEBI:28185 is part of kanamycin CHEBI:6104
kanamycin B CHEBI:28098 is part of kanamycin CHEBI:6104
all-trans-retinol CHEBI:17336 is part of vitamin A CHEBI:12777
all-trans-retinal CHEBI:17898 is part of vitamin A CHEBI:12777
bacitracin A CHEBI:35862 is part of CHEBI:28669
2,3,5-trimethylphenyl methylcarbamate CHEBI:38893 is part of trimethacarb CHEBI:38569
3,4,5-trimethylphenyl methylcarbamate CHEBI:38894 is part of trimethacarb CHEBI:38569
avermectin B1a CHEBI:29534 is part of abamectin CHEBI:39214
avermectin B1b CHEBI:29537 is part of abamectin CHEBI:39214
milbemycin A3 CHEBI:39228 is part of milbemectin CHEBI:39225
milbemycin A4 CHEBI:39229 is part of milbemectin CHEBI:39225
spinosyn D CHEBI:9232 is part of spinosad CHEBI:39211
spinosyn A CHEBI:9230 is part of spinosad CHEBI:39211
emamectin B1a CHEBI:39231 is part of emamectin CHEBI:39230
emamectin B1b CHEBI:39232 is part of emamectin CHEBI:39230
(1R)-trans-imiprothrin CHEBI:39373 is part of imiprothrin CHEBI:39389
(1R)-cis-imiprothrin CHEBI:39372 is part of imiprothrin CHEBI:39389
(1R)-cis-(alphaS)-cyfluthrin CHEBI:39309 is part of beta-cyfluthrin CHEBI:39314
(1S)-trans-(alphaR)-cyfluthrin CHEBI:39310 is part of beta-cyfluthrin CHEBI:39314
(1R)-trans-(alphaS)-cyfluthrin CHEBI:39312 is part of beta-cyfluthrin CHEBI:39314
(1S)-cis-(alphaR)-cyfluthrin CHEBI:39313 is part of beta-cyfluthrin CHEBI:39314
(1S)-cis-(alphaR)-cypermethrin CHEBI:39335 is part of alpha-cypermethrin CHEBI:39331
(1S)-cis-(alphaR)-cypermethrin CHEBI:39335 is part of beta-cypermethrin CHEBI:39332
(1R)-cis-(alphaS)-cypermethrin CHEBI:39336 is part of alpha-cypermethrin CHEBI:39331
(1R)-cis-(alphaS)-cypermethrin CHEBI:39336 is part of beta-cypermethrin CHEBI:39332
(1R)-cis-(alphaS)-cypermethrin CHEBI:39336 is part of zeta-cypermethrin CHEBI:39334
(1S)-trans-(alphaR)-cypermethrin CHEBI:39337 is part of beta-cypermethrin CHEBI:39332
(1S)-trans-(alphaR)-cypermethrin CHEBI:39337 is part of theta-cypermethrin CHEBI:39333
(1R)-trans-(alphaS)-cypermethrin CHEBI:39338 is part of beta-cypermethrin CHEBI:39332
(1R)-trans-(alphaS)-cypermethrin CHEBI:39338 is part of theta-cypermethrin CHEBI:39333
(1R)-trans-(alphaS)-cypermethrin CHEBI:39338 is part of zeta-cypermethrin CHEBI:39334
(1S)-cis-(alphaS)-cypermethrin CHEBI:39339 is part of zeta-cypermethrin CHEBI:39334
(1S)-trans-(alphaS)-cypermethrin CHEBI:39340 is part of zeta-cypermethrin CHEBI:39334
gamma-cyhalothrin CHEBI:39323 is part of lambda-cyhalothrin CHEBI:39325
(1S)-cis-(alphaR)-cyhalothrin CHEBI:39327 is part of lambda-cyhalothrin CHEBI:39325
(Z)-(1R)-cis-tefluthrin CHEBI:39395 is part of tefluthrin CHEBI:9430
(Z)-(1R)-trans-tefluthrin CHEBI:39396 is part of tefluthrin CHEBI:9430
tocopherols CHEBI:27013 is part of vitamin E CHEBI:33234
(+)-alpha-tocopherol CHEBI:18145 is part of vitamin E CHEBI:33234
thermorubin A CHEBI:48480 is part of thermorubin CHEBI:48516
monosodium aurothiomalate CHEBI:35863 is part of sodium aurothiomalate CHEBI:5516
disodium aurothiomalate CHEBI:35864 is part of sodium aurothiomalate CHEBI:5516
amylopectin CHEBI:28057 is part of starch CHEBI:28017
amylose CHEBI:28102 is part of starch CHEBI:28017
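The two-step correction for mixtures can be sketched as follows. This is a hedged illustration rather than an agreed procedure: the triple layout, the predicate strings, the function name `fix_mixtures` and the caller-supplied set `structure_terms` (the terms regarded as molecular structure terms) are all assumptions, and `EX:structure_term` in the example is a made-up parent. `CHEBI:new` is the placeholder used in the text, not a real identifier. The function is meant to be run on the relationships listed in this section only, since it treats every target of is part of as a mixture.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Placeholder taken from the text; no identifier has been assigned yet.
MIXTURE = "CHEBI:new"


def fix_mixtures(triples: Iterable[Triple], structure_terms: Set[str]) -> List[Triple]:
    """Reverse is_part_of into has_ingredient and re-parent each mixture."""
    triples = list(triples)
    # Within this list, anything that a component "is part of" is a mixture.
    mixtures = {obj for (_, pred, obj) in triples if pred == "is_part_of"}
    fixed: List[Triple] = []
    for subj, pred, obj in triples:
        if pred == "is_part_of":
            # Step 1: reverse the direction and change the relation.
            fixed.append((obj, "has_ingredient", subj))
        elif pred == "is_a" and subj in mixtures and obj in structure_terms:
            # Step 2a: drop is_a links from a mixture to a structure term.
            continue
        else:
            fixed.append((subj, pred, obj))
    # Step 2b: give each mixture one is_a link to the mixture term.
    for mixture in sorted(mixtures):
        fixed.append((mixture, "is_a", MIXTURE))
    return fixed


example = [
    ("CHEBI:17630", "is_part_of", "CHEBI:6104"),  # kanamycin A, kanamycin
    ("CHEBI:28098", "is_part_of", "CHEBI:6104"),  # kanamycin B, kanamycin
    ("CHEBI:6104", "is_a", "EX:structure_term"),  # hypothetical asserted parent
]
result = fix_mixtures(example, {"EX:structure_term"})
```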

A.4 Determinate parthood

These largely seem to be correct, but the all–some direction is wrong. They are also determinate parts: lisinopril ceases to be lisinopril if it loses its L-prolino group, for example. Over 100 of these are “molecular entities” terms.

dicyanoaurate(1-) CHEBI:49491 is part of sodium dicyanoaurate(1-) CHEBI:30058
dicyanoaurate(1-) CHEBI:49491 is part of potassium dicyanoaurate(1-) CHEBI:30057
proton CHEBI:24636 is part of protium CHEBI:29236
triton CHEBI:29234 is part of tritium CHEBI:29238
deuteron CHEBI:29233 is part of deuterium CHEBI:29237
L-gamma-glutamyl group CHEBI:32474 is part of 4-(L-gamma-glutamylamino)butanoic acid CHEBI:49260
L-lysine residue CHEBI:29967 is part of lisinopril CHEBI:43755
L-prolino group CHEBI:32866 is part of lisinopril CHEBI:43755
phenylsulfanyl group CHEBI:48499 is part of methyl N-(tert-butoxycarbonyl)-3-[2-(2,6-dichlorophenyl)-4-(phenylsulfanyl)-1,2,3,4,4a,8a-hexahydro-6-quinolyl]alaninate CHEBI:48492
tert-butoxycarbonyl group CHEBI:48052 is part of methyl N-(tert-butoxycarbonyl)-3-[2-(2,6-dichlorophenyl)-6-quinolyl]alaninate CHEBI:48491
benzyloxy group CHEBI:48508 is part of 6-(benzyloxy)-2-chloroquinoline CHEBI:48488
benzyloxy group CHEBI:48508 is part of 6-(benzyloxy)-2-phenoxyquinoline CHEBI:48487
benzyloxy group CHEBI:48508 is part of ethyl [5-benzyloxy-4’-(trifluoromethyl)biphenyl-3-yl]acetate CHEBI:48662
benzyloxy group CHEBI:48508 is part of ethyl 2-[5-benzyloxy-4’-(trifluoromethyl)biphenyl-3-yl]pentanoate CHEBI:48665
methylsulfanyl group CHEBI:48563 is part of thioridazine CHEBI:9566
campholenic cyclohexenyl group CHEBI:48885 is part of 2-[3-(2,2,3-trimethylcyclopent-3-enyl)cyclohex-3-enyl]propan-2-ol CHEBI:48699
campholenic cyclohexenyl group CHEBI:48885 is part of 2-[4-(2,2,3-trimethylcyclopent-3-enyl)cyclohex-3-enyl]propan-2-ol CHEBI:48696
campholenic cyclohexenyl group CHEBI:48885 is part of [4-(2,2,3-trimethylcyclopent-3-enyl)cyclohex-3-enyl] CHEBI:48692
campholenic cyclohexenyl group CHEBI:48885 is part of [3-(2,2,3-trimethylcyclopent-3-enyl)cyclohex-3-enyl]methanol CHEBI:48693
campholenic cyclohexenyl group CHEBI:48885 is part of 1-[4-(2,2,3-trimethylcyclopent-3-enyl)cyclohex-3-enyl] CHEBI:48682
campholenic cyclohexenyl group CHEBI:48885 is part of 1-[3-(2,2,3-trimethylcyclopent-3-enyl)cyclohex-3-enyl]ethanol CHEBI:48685
hydroxy group CHEBI:43176 is part of CHEBI:44032
glycidyl group CHEBI:24366 is part of glycidyl 2,2-dinitropropyl formal CHEBI:48340
prenyl group CHEBI:26248 is part of abyssinone VI CHEBI:2369
prenyl group CHEBI:26248 is part of abyssinone V CHEBI:2368
2,6-dichlorobenzoyl group CHEBI:48625 is part of N-(2,6-dichlorobenzoyl)-3-[2-(2,6-dichlorophenyl)-6-quinolyl] CHEBI:48459
2,6-dichlorobenzoyl group CHEBI:48625 is part of N-(2,6-dichlorobenzoyl)-3-(2-phenoxy-6-quinolyl)alanine CHEBI:48479
2,6-dichlorobenzoyl group CHEBI:48625 is part of N-(2,6-dichlorobenzoyl)-3-[6-(2,6-dimethoxyphenyl)-2-naphthyl]alanine CHEBI:48463
2,6-dichlorobenzoyl group CHEBI:48625 is part of 2-[(2,6-dichlorobenzoyl)oxy]-3-[2-(2,6-dichlorophenyl)-6-quinolyl]propanoic acid CHEBI:48523
2,6-dichlorobenzoyl group CHEBI:48625 is part of N-(2,6-dichlorobenzyl)-3-[2-(2,6-dichlorophenyl)-6-quinolyl]-N-methylalanine CHEBI:48478
nitro group CHEBI:29785 is part of CHEBI:2948
nitro group CHEBI:29785 is part of 4-nitrophenylalanine CHEBI:48496
oxo group CHEBI:46629 is part of organic oxo compounds CHEBI:36587
alkyl group CHEBI:22323 is part of alkanesulfonic acids CHEBI:47901
aryl group CHEBI:33338 is part of arenesulfonic acids CHEBI:33555
1,4,8,11-tetraazacyclotetradecane CHEBI:37401 is part of (1,4,8,11-tetraazacyclotetradecane)copper(2+) CHEBI:37402
1,4,8,11-tetraazacyclotetradecane CHEBI:37401 is part of (1,4,8,11-tetraazacyclotetradecane)nickel(2+) CHEBI:38076
organyl groups CHEBI:33249 is part of organosulfonic acids CHEBI:33551
triflate group CHEBI:48510 is part of 2-phenoxy-6-quinolyl triflate CHEBI:48485
disulfanediyl group CHEBI:29826 is part of organic disulfides CHEBI:35489
carbon CHEBI:27594 is part of organic molecular entities CHEBI:25700
graphene CHEBI:36973 is part of graphite CHEBI:33418
oxygen-18 CHEBI:33815 is part of ((18)O)water CHEBI:33813
ammonium CHEBI:28938 is part of ammonium salts CHEBI:47704
main group elements CHEBI:33318 is part of main group molecular entities CHEBI:33579
alkali metals CHEBI:22314 is part of alkali metal molecular entities CHEBI:33296
potassium CHEBI:26216 is part of potassium molecular entities CHEBI:26217
sodium CHEBI:26708 is part of sodium molecular entities CHEBI:26712
CHEBI:30145 is part of lithium molecular entities CHEBI:33298
rubidium CHEBI:33322 is part of rubidium molecular entities CHEBI:37126
caesium CHEBI:30514 is part of caesium molecular entities CHEBI:37128
francium CHEBI:33323 is part of francium molecular entities CHEBI:37129
alkaline earth metals CHEBI:22313 is part of alkaline earth molecular entities CHEBI:33299
calcium CHEBI:22984 is part of calcium molecular entities CHEBI:22985
magnesium CHEBI:25107 is part of magnesium molecular entities CHEBI:25108
beryllium CHEBI:30501 is part of beryllium molecular entities CHEBI:33780
strontium CHEBI:33324 is part of strontium molecular entities CHEBI:37131
barium CHEBI:32594 is part of barium molecular entities CHEBI:37133
radium CHEBI:37201 is part of radium molecular entities CHEBI:37201
noble gases CHEBI:33309 is part of noble gas molecular entities CHEBI:33583
helium CHEBI:30217 is part of helium molecular entities CHEBI:33679
neon CHEBI:33310 is part of neon molecular entities CHEBI:36907
argon CHEBI:49475 is part of argon molecular entities CHEBI:36908
krypton CHEBI:49696 is part of krypton molecular entities CHEBI:36909
radon CHEBI:33314 is part of radon molecular entities CHEBI:33314
xenon CHEBI:49957 is part of xenon molecular entities CHEBI:36910
p-block elements CHEBI:33560 is part of p-block molecular entities CHEBI:33675
boron group elements CHEBI:33317 is part of boron group molecular entities CHEBI:33581
boron CHEBI:27560 is part of boron molecular entities CHEBI:22916
aluminium CHEBI:28984 is part of aluminium molecular entities CHEBI:33620
gallium CHEBI:49631 is part of gallium molecular entities CHEBI:37111
indium CHEBI:30430 is part of indium molecular entities CHEBI:37112
thallium CHEBI:30440 is part of thallium molecular entities CHEBI:37110
carbon group elements CHEBI:33306 is part of carbon group molecular entities CHEBI:33582
silicon CHEBI:27573 is part of silicon molecular entities CHEBI:26677
germanium CHEBI:30441 is part of germanium molecular entities CHEBI:33584
tin CHEBI:27007 is part of tin molecular entities CHEBI:27008
lead CHEBI:25016 is part of lead molecular entities CHEBI:33585
pnictogens CHEBI:33300 is part of pnictogen molecular entities CHEBI:33302
nitrogen CHEBI:25555 is part of nitrogen molecular entities CHEBI:25556
phosphorus CHEBI:28659 is part of phosphorus molecular entities CHEBI:26082
arsenic CHEBI:27563 is part of arsenic molecular entities CHEBI:22632
antimony CHEBI:30513 is part of antimony molecular entities CHEBI:36919
bismuth CHEBI:33301 is part of bismuth molecular entities CHEBI:37196
chalcogens CHEBI:33303 is part of chalcogen molecular entities CHEBI:33304
oxygen CHEBI:25805 is part of oxygen molecular entities CHEBI:25806
sulfur CHEBI:26833 is part of sulfur molecular entities CHEBI:26835
selenium CHEBI:27568 is part of selenium molecular entities CHEBI:26628
tellurium CHEBI:30452 is part of tellurium molecular entities CHEBI:33305
polonium CHEBI:33313 is part of polonium molecular entities CHEBI:36917
halogens CHEBI:24473 is part of halogen molecular entities CHEBI:24471
bromine CHEBI:22928 is part of bromine molecular entities CHEBI:22927
chlorine CHEBI:23116 is part of chlorine molecular entities CHEBI:23117
fluorine CHEBI:24061 is part of fluorine molecular entities CHEBI:24062
iodine CHEBI:24859 is part of iodine molecular entities CHEBI:24860
astatine CHEBI:30415 is part of astatine molecular entities CHEBI:37138
sulfinyl group CHEBI:29882 is part of sulfinyl halides CHEBI:50096
transition elements CHEBI:27081 is part of transition element molecular entities CHEBI:33497
lanthanoids CHEBI:33319 is part of lanthanoid molecular entities CHEBI:33775
europium CHEBI:32999 is part of europium molecular entities CHEBI:37266
lanthanum CHEBI:33336 is part of lanthanum molecular entities CHEBI:37215
cerium CHEBI:33369 is part of cerium molecular entities CHEBI:37261
praseodymium CHEBI:49828 is part of praseodymium molecular entities CHEBI:37279
neodymium CHEBI:33372 is part of neodymium molecular entities CHEBI:37280
promethium CHEBI:33373 is part of promethium molecular entities CHEBI:37281
samarium CHEBI:33374 is part of samarium molecular entities CHEBI:37282
gadolinium CHEBI:33375 is part of gadolinium molecular entities CHEBI:35729
terbium CHEBI:33376 is part of terbium molecular entities CHEBI:37284
dysprosium CHEBI:33377 is part of dysprosium molecular entities CHEBI:37295
holmium CHEBI:49648 is part of holmium molecular entities CHEBI:37297
erbium CHEBI:33379 is part of erbium molecular entities CHEBI:37298
thulium CHEBI:33380 is part of thulium molecular entities CHEBI:37299
ytterbium CHEBI:33381 is part of ytterbium molecular entities CHEBI:37300
lutetium CHEBI:33382 is part of lutetium molecular entities CHEBI:37301
scandium CHEBI:33330 is part of scandium molecular entities CHEBI:37202
yttrium CHEBI:33331 is part of yttrium molecular entities CHEBI:37203
actinoids CHEBI:33320 is part of actinoid molecular entities CHEBI:33498
uranium CHEBI:27214 is part of uranium molecular entities CHEBI:33499
actinium CHEBI:33337 is part of actinium molecular entities CHEBI:37216
thorium CHEBI:33385 is part of thorium molecular entities CHEBI:37302
protactinium CHEBI:33386 is part of protactinium molecular entities CHEBI:37303
neptunium CHEBI:33387 is part of neptunium molecular entities CHEBI:37305
plutonium CHEBI:33388 is part of plutonium molecular entities CHEBI:37306
americium CHEBI:33389 is part of americium molecular entities CHEBI:37307
curium CHEBI:33390 is part of curium molecular entities CHEBI:37308
berkelium CHEBI:33391 is part of berkelium molecular entities CHEBI:37309
californium CHEBI:33392 is part of californium molecular entities CHEBI:37310
einsteinium CHEBI:33393 is part of einsteinium molecular entities CHEBI:37311
fermium CHEBI:33394 is part of fermium molecular entities CHEBI:37312
mendelevium CHEBI:33395 is part of mendelevium molecular entities CHEBI:37313
nobelium CHEBI:33396 is part of nobelium molecular entities CHEBI:37314
lawrencium CHEBI:33397 is part of lawrencium molecular entities CHEBI:37315
scandium group elements CHEBI:33335 is part of scandium group molecular entities CHEBI:33773
titanium group elements CHEBI:33345 is part of titanium group molecular entities CHEBI:33768
titanium CHEBI:33341 is part of titanium molecular entities CHEBI:37217
zirconium CHEBI:33342 is part of zirconium molecular entities CHEBI:37218
hafnium CHEBI:33343 is part of hafnium molecular entities CHEBI:37219
rutherfordium CHEBI:33346 is part of rutherfordium molecular entities CHEBI:37220
vanadium group elements CHEBI:33347 is part of vanadium group molecular entities CHEBI:33746
vanadium CHEBI:27698 is part of vanadium molecular entities CHEBI:27275
niobium CHEBI:33344 is part of niobium molecular entities CHEBI:37221
tantalum CHEBI:33348 is part of tantalum molecular entities CHEBI:37222
dubnium CHEBI:33349 is part of dubnium molecular entities CHEBI:37223
chromium group elements CHEBI:33350 is part of chromium group molecular entities CHEBI:37741
chromium CHEBI:28073 is part of chromium molecular entities CHEBI:23237
molybdenum CHEBI:28685 is part of molybdenum molecular entities CHEBI:25370
tungsten CHEBI:27998 is part of tungsten molecular entities CHEBI:33742
seaborgium CHEBI:33351 is part of seaborgium molecular entities CHEBI:37224
manganese group elements CHEBI:33352 is part of manganese group molecular entities CHEBI:33743
manganese CHEBI:18291 is part of manganese molecular entities CHEBI:25154
technetium CHEBI:33353 is part of technetium molecular entities CHEBI:26865
rhenium CHEBI:49882 is part of rhenium molecular entities CHEBI:37225
bohrium CHEBI:33355 is part of bohrium molecular entities CHEBI:37226
iron group elements CHEBI:33356 is part of iron group molecular entities CHEBI:33744
iron CHEBI:18248 is part of iron molecular entities CHEBI:24873
ruthenium CHEBI:30682 is part of ruthenium molecular entities CHEBI:35734
osmium CHEBI:30687 is part of osmium molecular entities CHEBI:37227
cobalt group elements CHEBI:33358 is part of cobalt group molecular entities CHEBI:33767
cobalt CHEBI:27638 is part of cobalt molecular entities CHEBI:33888
rhodium CHEBI:33359 is part of rhodium molecular entities CHEBI:33887
iridium CHEBI:49666 is part of iridium molecular entities CHEBI:37228
meitnerium CHEBI:33361 is part of meitnerium molecular entities CHEBI:37229
nickel group elements CHEBI:33362 is part of nickel group molecular entities CHEBI:33747
nickel CHEBI:28112 is part of nickel molecular entities CHEBI:33748
palladium CHEBI:33363 is part of palladium molecular entities CHEBI:37230
platinum CHEBI:33364 is part of platinum molecular entities CHEBI:33749
darmstadtium CHEBI:33367 is part of darmstadtium molecular entities CHEBI:37231
copper group elements CHEBI:33366 is part of copper group molecular entities CHEBI:33745
copper CHEBI:28694 is part of copper molecular entities CHEBI:23377
silver CHEBI:30512 is part of silver molecular entities CHEBI:33964
gold CHEBI:29287 is part of gold molecular entities CHEBI:33969
roentgenium CHEBI:33368 is part of roentgenium molecular entities CHEBI:37232
platinum group metals CHEBI:33365 is part of platinum group molecular entities CHEBI:27081
f-block elements CHEBI:33562 is part of f-block molecular entities CHEBI:33677
iron-sulfur-molybdenum cluster CHEBI:48976 is part of iron-sulfur-molybdenum cofactor CHEBI:30409
ferricyanide CHEBI:5020 is part of hexacyanoferrate(3-) salts CHEBI:36296
CHEBI:5032 is part of hexacyanoferrate(4-) salts CHEBI:36294
iron-sulfur clusters CHEBI:30408 is part of iron-sulfur proteins CHEBI:35135
s-block elements CHEBI:33559 is part of s-block molecular entities CHEBI:33674
hydrogen CHEBI:49637 is part of hydrogen molecular entities CHEBI:33608
zinc CHEBI:27363 is part of zinc molecular entities CHEBI:27364
cadmium CHEBI:22977 is part of cadmium molecular entities CHEBI:22978
mercury CHEBI:25195 is part of mercury molecular entities CHEBI:25196
zinc group elements CHEBI:33340 is part of zinc group molecular entities CHEBI:33673
ununbium CHEBI:33517 is part of ununbium molecular entities CHEBI:37233
d-block elements CHEBI:33561 is part of d-block molecular entities CHEBI:33676
atoms CHEBI:33250 is part of groups CHEBI:24433
nucleus CHEBI:33252 is part of atoms CHEBI:33250
nucleon CHEBI:33253 is part of nucleus CHEBI:33252
electron CHEBI:10545 is part of atoms CHEBI:33250
electron CHEBI:10545 is part of muonium CHEBI:30213
electron CHEBI:10545 is part of muonide CHEBI:30215
electron CHEBI:10545 is part of positronium CHEBI:30224
antimuon CHEBI:30214 is part of muonide CHEBI:30215
antimuon CHEBI:30214 is part of muonium CHEBI:30213
positron CHEBI:30225 is part of positronium CHEBI:30224
groups CHEBI:24433 is part of polyatomic entities CHEBI:36357
alpha-particle CHEBI:30216 is part of helium-4 CHEBI:30219
helion CHEBI:30220 is part of helium-3 CHEBI:30218

A.5 Determinate parthood, both directions valid

These 14 relationships are trivially true, but can act as the basis for computable definitions (see above). However, there should be a has determinate part relationship coming from the parent, and the is part of relationship should be replaced with the is determinate part of relationship.

thiol group CHEBI:29917 is part of thiols CHEBI:29256
hydroxy group CHEBI:43176 is part of hydroxides CHEBI:24651
carboxy group CHEBI:46883 is part of carboxylic acids CHEBI:33575
carbamoyl group CHEBI:23004 is part of carboxamides CHEBI:37622
formyl group CHEBI:42485 is part of aldehydes CHEBI:17478
nitro group CHEBI:29785 is part of nitro compounds CHEBI:35715
carbonyl group CHEBI:23019 is part of carbonyl compounds CHEBI:36586
urate anions CHEBI:46818 is part of urates CHEBI:46819
diazo group CHEBI:30105 is part of diazo compounds CHEBI:39444
azo group CHEBI:30106 is part of azo compounds CHEBI:37533
borono group CHEBI:38272 is part of boronic acids CHEBI:38269
nitroso group CHEBI:35801 is part of nitroso compounds CHEBI:35800
peroxy group CHEBI:29369 is part of peroxides CHEBI:25940
hydroperoxy group CHEBI:29792 is part of hydroperoxides CHEBI:35923
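The pair of relationships asked for in this section, and one possible reading of how they could serve as computable definitions, can be sketched as below. The predicate strings and both function names are hypothetical, and treating a class as applicable whenever its defining group is present is a simplifying assumption made here for illustration, not something the text specifies.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def both_directions(part: str, whole: str) -> List[Triple]:
    """Replace 'part is_part_of whole' with the two determinate-parthood relations."""
    return [
        (part, "is_determinate_part_of", whole),
        (whole, "has_determinate_part", part),
    ]


def classes_for(groups_present: Set[str], triples: Iterable[Triple]) -> Set[str]:
    """Read has_determinate_part as a definition: a class applies if its group is present."""
    return {
        subj
        for (subj, pred, obj) in triples
        if pred == "has_determinate_part" and obj in groups_present
    }


# thiol group CHEBI:29917 / thiols CHEBI:29256
# carboxy group CHEBI:46883 / carboxylic acids CHEBI:33575
triples = (
    both_directions("CHEBI:29917", "CHEBI:29256")
    + both_directions("CHEBI:46883", "CHEBI:33575")
)
found = classes_for({"CHEBI:29917"}, triples)
```

With only the thiol group present, `found` contains the thiols class and nothing else; the carboxylic acids class is picked up only when the carboxy group is also supplied.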

A.6 Location

These terms are bizarre.

very-low-density lipoprotein triglycerides CHEBI:47776 is part of very-low-density lipoproteins CHEBI:39027
high-density lipoprotein cholesterol CHEBI:47775 is part of high-density lipoproteins CHEBI:39025
low-density lipoprotein cholesterol CHEBI:47774 is part of low-density lipoproteins CHEBI:39026
very-low-density lipoprotein cholesterol CHEBI:47773 is part of very-low-density lipoproteins CHEBI:39027

A.7 Proteins

These 12 relationships are problematic and need closer inspection, in collaboration with domain experts and ontologists, to work out whether they reflect chemical parthood, location, or something else.

(L-cysteinato)(molybdopterin)molybdenum CHEBI:21273 is part of sulfite oxidase CHEBI:49118
bis(molybdopterin dinucleotide)(L-serinato)molybdenum CHEBI:21392 is part of reductase CHEBI:49123
bis(molybdopterin guanine dinucleotide)(L-selenocysteinato)molybdenum, CHEBI:21386 is part of molybdenum formate dehydrogenase CHEBI:49119
protein polypeptide chains CHEBI:16541 is part of proteins CHEBI:36080
flavins CHEBI:30527 is part of flavoproteins CHEBI:5086
hemes CHEBI:30413 is part of hemoproteins CHEBI:35137
ferroheme CHEBI:38573 is part of ferrocytochrome CHEBI:15983
ferroheme b CHEBI:17627 is part of ferrocytochrome b CHEBI:5034
ferriheme CHEBI:38574 is part of ferricytochrome CHEBI:15719
ferriheme b CHEBI:36144 is part of ferricytochrome b CHEBI:5022
heme-thiolate prosthetic group CHEBI:36073 is part of heme-thiolate proteins CHEBI:36074
apolipoproteins CHEBI:39015 is part of lipoproteins CHEBI:6495

A.8 Residues

These can be reversed and replaced by the has part relation.

canonical ribonucleoside residues CHEBI:33792 is part of ribonucleic acids CHEBI:33697
canonical deoxyribonucleoside residues CHEBI:33793 is part of deoxyribonucleic acids CHEBI:16991
canonical nucleoside residues CHEBI:33791 is part of nucleic acids CHEBI:33696
locked nucleic acid residues CHEBI:48011 is part of locked nucleic acids CHEBI:48010
amino-acid residues CHEBI:33708 is part of CHEBI:16670
canonical amino-acid residues CHEBI:33700 is part of protein polypeptide chains CHEBI:16541

A.9 Dependent continuants

nutrient CHEBI:33284 is part of food CHEBI:33290
vitamin CHEBI:33229 is part of food CHEBI:33290
radioactive label CHEBI:35211 is part of radioactive tracer CHEBI:35207
label CHEBI:35209 is part of tracer CHEBI:35204
