Converting Chemical Names to Structures with Name=Struct

Abstract

Name=Struct is CambridgeSoft’s comprehensive algorithm for converting English chemical names into chemical structure diagrams. It is designed to be as practical as possible, interpreting chemical names as they are actually used by chemists. In addition to recognizing most of the official rules and recommendations of the International Union of Pure and Applied Chemistry (IUPAC), the International Union of Biochemistry and Molecular Biology (IUBMB), and the Chemical Abstracts Service (CAS), Name=Struct also recognizes the shorthand, slang, and neologisms of everyday usage. It is extremely tolerant of deviations from the “official” rules in regard to spaces, parentheses, and punctuation. Both regular names (“chlorobenzene”) and inverted names (“benzene, chloro-”) are supported. In addition, it has an extensive algorithm for the identification of common “typos” (typing errors, such as “mehtyl”) to increase the odds of generating structures for the names it is given. Name=Struct is available in two forms: a batch application, and an interactive version that is part of CS ChemDraw 8.0 Ultra. With either version, Name=Struct sets the standard for coverage and accuracy in generating structures from chemical names.

1. A description of an earlier version of Name=Struct was published as Brecher, J. “Name=Struct: A Practical Approach to the Sorry State of Real-Life Chemical Nomenclature.” J. Chem. Inf. Comput. Sci. 39, 6, 943-950.

Table of Contents

  • General Nomenclature Issues
  • Capabilities
  • IUPAC and CAS Names
  • Supported Procedures
  • Accuracy
  • Speed
  • Limitations
  • Platform Limitations
  • Chemistry Limitations
  • Structural Limitations
  • Nomenclature Limitations
  • Nomenclature Classes Handled by Name=Struct
  • Chains
  • Rings
  • Natural Products
  • Functional Groups Containing Nitrogen
  • Functional Groups Containing Oxygen
  • Acids
  • Ions and Radicals
  • Salts
  • Isotopes
  • Stereochemistry
  • Miscellaneous

General nomenclature issues

In general, Name=Struct is designed to be as smart as a real chemist: if a human chemist can understand what structure is intended by a given name, then Name=Struct should manage to do so as well.

Chemical names come in many styles. Some names truly do conform to published nomenclature recommendations, most commonly from IUPAC, IUBMB, or CAS. Clearly, Name=Struct needs to recognize these names, but that’s only the start of the problem.

First, each of those organizations has changed its recommendations over time. There is no way to know which version of the recommendations was used to generate any given name, and so Name=Struct must recognize names produced by all versions.

Changes in IUPAC Recommendations (structure diagrams not reproduced)

  • IUPAC 1979 recommendations: (methylthio)cyclohexane and 2-(methylamino)-1-phenylethanol
  • IUPAC 1993 recommendations: (methylsulfanyl)cyclohexane and 2-(methylazanyl)-1-phenylethanol

Second, many chemical names use trivial forms that have long been forbidden by all of those nomenclature bodies. Nonetheless, these trivial names are used frequently enough that most chemists will recognize their meaning, and so Name=Struct should as well.

Trivial and Structural Names (structure diagrams not reproduced)

  • Structures for trivial names generated by Name=Struct: paracetamol, acetaminophen
  • Structural name generated by Struct=Name: N-(4-hydroxyphenyl)acetamide

Finally, even though those organizations have published nomenclature recommendations, the recommendations are extremely complex and difficult to understand. Even the best-intentioned chemist will often produce names that violate the published norms, whether technically or egregiously. As long as the meaning of the name remains clear, Name=Struct should be able to handle it.
To achieve this goal, Name=Struct attempts to be as flexible as possible. Capitalization, font type, and font style are completely ignored. Most punctuation is ignored as well, regardless of whether it is used correctly as per the published recommendations or not. Spelling, similarly, is important only for clarity: Name=Struct will interpret many common misspellings correctly, but proper spelling is much more likely to be interpreted correctly. More recently, extensive typo recognition has been added, increasing the likelihood that names will be interpreted correctly even if they are not technically correct.

Capitalization

Capitalization is completely ignored when interpreting chemical names. “Benzene”, “benzene”, and “BENZENE” clearly all represent the same compound.

There are a few nomenclature rules that call for a given capitalization. Possibly the most common of these is the “n” prefix. When used in lowercase before an alkane, it indicates the straight-chain form (“n-butylamine”). When capitalized, it indicates that a ligand should be attached directly to a nitrogen atom (“N-chloroaniline”). If capitalization is ignored, there is a potential for ambiguity (“n-butylaniline”).

In our experience, this is rarely an issue. Not only are names of this type uncommon, but the intended structure is almost always “the straight chain form on the nitrogen” (“N-(n-butyl)aniline”). It is much more common to find names with completely non-standard capitalization than it is to find ones where the capitalization actually makes a difference. Name=Struct interprets all of the following names identically:

  • n-butylaniline
  • n-Butylaniline
  • n-BUTYLANILINE
  • N-butylaniline
  • N-Butylaniline
  • N-BUTYLANILINE

N-Butylaniline (structure diagram not reproduced)

Punctuation

For the most part, Name=Struct ignores punctuation. It can be present, absent, or incorrect without affecting the interpretation of a chemical name.
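
The capitalization rule described above amounts to comparing names without regard to case. The following is a minimal sketch of that idea only; it is not Name=Struct's implementation, and the function name `case_key` is hypothetical. It assumes that folding every character to a common case is sufficient, which also means the "n-" versus "N-" distinction is deliberately lost, exactly the ambiguity the text discusses.

```python
def case_key(name: str) -> str:
    """Return a case-insensitive comparison key for a chemical name.

    Hypothetical helper for illustration; it only folds case and does
    nothing about punctuation, spacing, or spelling.
    """
    return name.casefold()


# The six capitalization variants listed in the text.
variants = [
    "n-butylaniline",
    "n-Butylaniline",
    "n-BUTYLANILINE",
    "N-butylaniline",
    "N-Butylaniline",
    "N-BUTYLANILINE",
]

# All variants collapse to a single key once case is ignored.
keys = {case_key(v) for v in variants}
```

Because the key discards case entirely, a parser built on it would have to choose one reading for the ambiguous "n-" prefix from context rather than from the capitalization.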
There are a few exceptions to this tolerance of punctuation:

  • Commas must be present in CAS-style inverted names. “Benzene, chloro-” and “chlorobenzene” are interpreted identically. “Benzene chloro” without the comma cannot be interpreted.
  • Parentheses, brackets, and braces should be used for ambiguous names. “(Trichloromethyl)silane” is different from “trichloro(methyl)silane”. They also must be paired and nested appropriately, each open parenthesis with a matching close parenthesis and so on. Otherwise, parentheses, brackets, or braces may be used interchangeably.
  • Some punctuation is required to separate digits. Traditionally, this duty is performed by a comma (“1,1-” vs. “11-”), but Name=Struct will recognize any reasonable separator, for example “1-1-”, as acceptable. Other roles are traditionally performed by periods, colons, and hyphens, but Name=Struct will recognize any reasonable separator there as well.

Spaces are ignored whenever possible, but there are a few cases where the space performs a vital role. This is most commonly important for esters. The following names represent very different structures:

Effect of spaces in related names (structure diagrams not reproduced)

  • methyl ethyl malonate
  • methylethyl malonate
  • methyl ethylmalonate
  • methylethylmalonate

Baseline

Superscripts are occasionally called for by various published nomenclature recommendations. Name=Struct handles them in a way consistent with its handling of other typographical issues. If the superscript consists of a number and immediately follows some other number, the two must be separated in some fashion (parentheses are most often used). Thus a “1” followed by a superscript “3,7” becomes “1(3,7)” and should not be entered as the confusing “13,7”. In all other cases, the superscripted characters should follow the previous characters with no added separation: “N³,N⁴,N⁵-trimethyloctane-3,4,5-triamine”, for example, should be entered as “N3,N4,N5-trimethyloctane-3,4,5-triamine”.
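
The superscript rule above can be written as a small text transformation. This is a sketch under stated assumptions, not part of Name=Struct: it assumes the formatted name is available as a list of (text, is_superscript) runs, and the names `Run` and `flatten_superscripts` are hypothetical. Parentheses are used as the separator because the text says they are the most common choice.

```python
from typing import List, Tuple

# A run of characters plus a flag saying whether it was superscripted.
# This representation is an assumption made for the example.
Run = Tuple[str, bool]


def flatten_superscripts(runs: List[Run]) -> str:
    """Flatten formatted runs into a baseline-only name.

    A superscript that starts with a digit and directly follows a digit
    is wrapped in parentheses; every other run is appended unchanged.
    """
    out = ""
    for text, is_super in runs:
        follows_digit = out[-1:].isdigit()
        starts_with_digit = text[:1].isdigit()
        if is_super and starts_with_digit and follows_digit:
            out += "(" + text + ")"
        else:
            out += text
    return out


# Numeric superscript after a number: parentheses are inserted.
locant = flatten_superscripts([("1", False), ("3,7", True)])

# Numeric superscripts after a letter: no separation is added.
amine = flatten_superscripts([
    ("N", False), ("3", True), (",N", False), ("4", True),
    (",N", False), ("5", True),
    ("-trimethyloctane-3,4,5-triamine", False),
])
```

Here `locant` evaluates to "1(3,7)" and `amine` to "N3,N4,N5-trimethyloctane-3,4,5-triamine", matching the two cases given in the text.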
N3,N4,N5-trimethyloctane-3,4,5-triamine (structure diagram not reproduced)

Fonts

Chemical names are not usually defined by the particular font used to display the name. There is one exception to this rule. Some chemical names are supposed to use Greek characters. Unfortunately, not only is this nuance lost on many people, but it is also impossible in many circumstances, such as in a text-only database. Accordingly, Name=Struct will recognize any reasonable representation of Greek characters. The following names are all interpreted identically:

  • a-methylphenethylamine
  • alpha-methylphenethylamine
  • .alpha.-methylphenethylamine
  • α-methylphenethylamine

α-methylphenethylamine (structure diagram not reproduced)

Italicization

All font styles, including italicization, are completely ignored when interpreting chemical names. This will never introduce any ambiguity.

Spelling

Spelling is critically important. There are many pairs of names that differ only by a single character; compare methylamine and menthylamine:

Similar spellings (structure diagrams not reproduced)

  • methylamine
  • menthylamine

However, in many cases, a human chemist might not even notice an incorrect spelling, and would interpret the name easily. Name=Struct should do the same. Many common misspellings, including “choro”, “cloro”, and “flouro”, are automatically recognized (see “Automatic error recognition” below).

Elision refers to the elimination of one vowel that is directly followed by another, and is technically required or prohibited in many cases: the rules say to use “propanamine” rather than “propaneamine”. Name=Struct does not enforce those rules; vowels may be elided or not without penalty.

Elision (structure diagrams not reproduced)

  • propaneamine
  • propanamine

Automatic error recognition

Because it is designed to interpret real-life usage, Name=Struct also handles
