Chemistry Requirements
OBO: chemistry requirements

Colin Batchelor

July 2, 2008

Abstract

We list some urgent corrections which are needed to make ChEBI compatible with BFO, RO and the other OBO Foundry ontologies, and outline developments which will minimize future curatorial effort. The urgent corrections are to ensure is_a completeness, that the is_a relation is transitive, that the parthood relations are compatible with RO, that granular and determinate parthood are distinguished, that the names of important terms reflect biomedical usage rather than IUPAC's prescriptions, and that the top-level terms are rearranged to reflect the new parthood relations. The future developments outlined are to replace most of the asserted structure of ChEBI with an inferred structure, and to introduce a cross-product structure compatible with postcomposition of terms.

1 Introduction

Not only are organisms made of chemical entities, but fundamental processes in biological organisms such as the Krebs cycle, DNA transcription, RNA translation and post-translational modification are defined in terms of transformations of chemical entities. It is for this reason that the Open Biomedical Ontologies (OBO) need a biochemical ontology.

In order to interoperate effectively with other OBO ontologies, any ontology should

• be is_a complete
• have an is_a relation that is transitive
• reuse relations from RO where appropriate
• distinguish between granular and determinate parthood (Rogers et al.)
• distinguish between dependent and independent continuants (BFO)

A biochemical ontology that is part of OBO needs to provide a chemical framework for biomedical ontologies such as the molecular function and biological process ontologies in GO, the types of molecule that are implied by the Sequence Ontology, and the chemical classes in the Systems Biology Ontology.
In order that the ontologies can be aligned, the biochemical ontology should as far as possible reflect how entities are talked about in the literature. For example, a polysemous term name such as "nucleus" should be deprecated in favour of an unambiguous one such as "atomic nucleus".

The key problem with a general-purpose chemical ontology is that the set of entities it will be called upon to describe is combinatorially large (more compounds are synthesizable using readily available materials than there are atoms in the universe, by several orders of magnitude), so the curation task will always dwarf the available curatorial effort. This is why machine-generated identifiers such as the InChI or the SMILES have been so successful in chemistry. However, there is no open algorithm that maps between chemical names and machine-generated identifiers, and this is a task to which a general-purpose chemical ontology would make an important contribution.

In order to cope with the sheer size of chemical space, a general-purpose chemical ontology has to:

• be generous with its inferred structure and extremely parsimonious with its asserted structure
• have a cross-product structure which can be used for recursive postcomposition of terms

In this squib we present some urgent fixes to ChEBI (Chemical Entities of Biological Interest) which should make it more effectively interoperable with the other OBO Foundry ontologies, and in passing outline how to produce a general-purpose chemical ontology based largely on an inferred structure rather than an asserted structure.

The squib is structured as follows: in section 2 we present an upper-level ontology for chemistry, based on previous work by this author, and discuss the difference between atoms and elements. Section 3 discusses polysemy, subsumption and parthood relations in ChEBI, giving fixes for is_a completeness, transitivity and the parthood relations, as well as proposing a new way of writing genus-differentia definitions for molecules.
Finally, section 4 gives a very brief account of the meaning of amino acid names in both ChEBI and the literature.

2 The upper level

2.1 An upper-level ontology

Currently most terms in ChEBI are descendants of molecular entities CHEBI:23367, defined as

    A molecular entity is any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer etc., identifiable as a separately distinguishable entity.

This has a wide range of descendants, whose identity may or may not be preserved when parts are gained or lost. Chemistry, as the science of transformations at the molecular level, needs an ontology which can distinguish identity-preserving transformations from identity-changing transformations. For example, salts CHEBI:24866 (salts being collectives that can lose or gain ions without changing their identity) is a sibling of coordination entities CHEBI:33240, even though a coordination complex that loses a metal ion changes its identity. Different parthood relations clearly apply to these terms and their descendants, but it is impossible to work out which ones apply other than by close inspection of each term. Elsewhere in the ontology, the parts of kanamycin CHEBI:6104 are ingredients of a mixture, though this is nowhere signalled in its ancestry.

There is also no obvious mapping to BFO. Based on its children, molecular structure CHEBI:24431 looks like a snap:IndependentContinuant. However, its definition begins "A description..." Such a non-realist definition is completely inappropriate for an ontology which is to be aligned with realist biomedical ontologies.

The classes from BFO that describe molecules of biomedical interest are snap:Object and snap:FiatObjectPart. (We defer the question of what to do with the application and biological role terms to another meeting.) With this in mind, we present a minimal set of terms for an upper-level chemistry ontology.
This has been written up in more detail in Batchelor (2008, attached). None of the terms below should have an is_a parent within ChEBI, or be part of anything else.

snap:FiatObjectParts:

• molecular part (in ChEBI as groups (CHEBI:24433), is_a molecular entities (CHEBI:23367))

snap:Objects that only have determinate parts:

• molecule (= polyatomic entities (CHEBI:36357), is_a molecular entities (CHEBI:23367))
• atom (= atoms (CHEBI:33250), no is_a parent)
• subatomic particle (= subatomic particle (CHEBI:36342), no is_a parent)

snap:Objects that are collectives:

• salt (= salts (CHEBI:24866), is_a heteroatomic molecular entities (CHEBI:37577), is_a polyatomic entities (CHEBI:36357))
• mixture (not in ChEBI)
• pure substance (not in ChEBI)

I have chosen a set which is pairwise disjoint: everything in ChEBI should fit into one and only one of the above categories. This is why ion and radical do not appear in this list; a species may be both a molecule and an ion, or both an atom and a radical.

Further consequences of this are that molecular entities CHEBI:23367 should be obsoleted, as should terms of the general type Xium molecular entities, which should be replaced by Xium molecule, Xium salt or an appropriately named term describing the macroscopic, pure substance.

2.2 Atoms and elements

The is_a children of atoms CHEBI:33250 are main group elements CHEBI:33318, s-block elements CHEBI:33559, metals CHEBI:33521 and nonmetals CHEBI:25585.

To handle the "elements" terms first, it is simply not true that a chemical element can stand in an is_a relationship with an atom, or that "element" is a synonym for "atom". You cannot substitute the word "element" for "atom" in a sentence and expect the sentence to still make sense.

    Laser-cooled neutral atoms localized in a deeply confining optical potential satisfy this requirement.

cannot be changed to

    Laser-cooled neutral elements localized in a deeply confining optical potential satisfy this requirement.
The first definition in the IUPAC Gold Book of "chemical element" is:

    1. A species of atoms; all atoms with the same number of protons in the atomic nucleus.

The first part of that sentence implies strongly that "element" does not belong in a chemical or biochemical ontology, just as "species" does not belong in a biological taxonomy; the second part, which seems to mean the mereological sum of all atoms in the universe with the same number of protons, is too eccentric to include in an ontology. The second definition in the IUPAC Gold Book is:

    2. A pure chemical substance composed of atoms with the same number of protons in the atomic nucleus. Sometimes this concept is called the elementary substance as distinct from the chemical element as defined under 1, but mostly the term chemical element is used for both concepts.

We can at least put something that fits this definition in a jar, but it is a collective of atoms, and therefore can neither have an is_a relationship to "atoms" nor be a synonym of "atoms".

What unites all the manifestations of a chemical element, whether it be a meitnerium atom created in a particle accelerator, a lump of solid gold or atmospheric carbon, is the relevant atom itself. We know atoms exist.

Consequence 1: main group elements CHEBI:33318 should be renamed to "main group atoms", and s-block elements CHEBI:33559 to "s-block atoms". The same applies to all other terms in ChEBI ending in "elements".

Consequence 2: Terms that have the name of a chemical element should have "atom" added to the end of the name. Likewise "chalcogens" and "pnictogens" should become "chalcogen atom" and "pnictogen atom".

Secondly, it is not true that every metal is an atom. Metallicity is a property of collectives of atoms or molecules rather than of individual atoms. Phrases like "metal-oxygen bonding" are examples of a regular polysemy, where the name of the collective stands for a grain.
We should probably tolerate "metal atoms" and "nonmetal atoms" for a biochemical ontology, but their current is_a children should become children of "atoms".

3 Polysemy, subsumption and parthood

3.1 Inferred vs. asserted structure

Asserted relationships are those that have been manually added by a human curator, while inferred relationships have been worked out by a reasoner.
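
The distinction can be illustrated with a minimal sketch in Python. It is not how any OBO reasoner is implemented, only an illustration under stated assumptions: the asserted edges below are hypothetical examples that use the renamed terms proposed in section 2.2, and the only inference rule applied is the transitivity of is_a.

```python
# Hypothetical asserted is_a edges (child, parent). The term names are
# illustrative assumptions following the renaming proposed in section 2.2;
# they are not taken from a ChEBI release.
ASSERTED_IS_A = {
    ("lithium atom", "s-block atom"),
    ("s-block atom", "main group atom"),
    ("main group atom", "atom"),
}


def transitive_closure(edges):
    """Return every (descendant, ancestor) pair implied by transitivity."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for child, parent in list(closure):
            for other_child, grandparent in list(closure):
                # child is_a parent and parent is_a grandparent
                # together imply child is_a grandparent.
                if parent == other_child and (child, grandparent) not in closure:
                    closure.add((child, grandparent))
                    changed = True
    return closure


def inferred_only(edges):
    """Return the pairs a reasoner would add beyond what was asserted."""
    return transitive_closure(edges) - set(edges)


if __name__ == "__main__":
    for child, parent in sorted(inferred_only(ASSERTED_IS_A)):
        print(f"{child} is_a {parent}  (inferred)")
```

Three asserted edges here yield three further inferred ones, which is the sense in which an ontology can be parsimonious with its asserted structure while remaining generous with its inferred structure. A real reasoner would apply many more rules than transitivity alone.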