ChemSpider – Is This the Future of Linked Chemistry on the Internet?

Antony Williams
BAGIM, Boston, August 2010

Our dog has fleas

It’s not an Advantage…

What is the structure of “Advantage”?
  • Audience Participation Time… Where would you look?
  • What would you trust?
  • Where would you look ONLINE?

What is the Structure of Vitamin K?
MeSH: A lipid cofactor that is required for normal blood clotting. Several forms of vitamin K have been identified: VITAMIN K 1 (phytomenadione) derived from plants, VITAMIN K 2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K 3 (menadione). Vitamin K 3 provitamins, after being alkylated in vivo, exhibit the antifibrinolytic activity of vitamin K. Green leafy vegetables, liver, cheese, butter, and egg yolk are good sources of vitamin K.

What is the Structure of Vitamin K1? Wikipedia

What is the Structure of Vitamin K1? CAS’s Common Chemistry

PubChem
“2-methyl-3-(3,7,11,15-tetramethylhexadec-2-enyl)naphthalene-1,4-dione”

Variants of systematic names on PubChem
  • 2-methyl-3-[(E,7R,11R)-3,7,11,15-tetramethyl
  • 2-methyl-3-[(E,7S,11R)-3,7,11,15-tetramethyl
  • 2-methyl-3-[(E,7R,11S)-3,7,11,15-tetramethyl
  • 2-methyl-3-[(E,7S,11S)-3,7,11,15-tetramethyl
  • 2-methyl-3-[(E,11S)-3,7,11,15-tetramethyl
  • 2-methyl-3-[(E)-3,7,11,15-tetramethyl
  • 2-methyl-3-(3,7,11,15-tetramethyl
  • 2-methyl-3-[(E)-3,7,11,15-tetramethyl

Bioassay Data are Associated…

Structures on DailyMed

Lack of Stereochemistry

Does Stereochemistry Matter?

Does one stereocenter matter?
  • Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, Softenon, Thalidomide

Incorrect Structures

Wow!

ChEBI – Manual Curation

The InChI Identifier

Multiple Layers

InChI Strings Hash to InChIKeys

PubChem InChIKeys
  • MBWXNTAXLNYFJB-NKFFZRIASA-N
  • MBWXNTAXLNYFJB-LKUDQCMESA-N
  • MBWXNTAXLNYFJB-UHFFFAOYSA-N
  • MBWXNTAXLNYFJB-FAKCLFGASA-N
  • MBWXNTAXLNYFJB-NIHVXYICSA-N (O-18 label)
  • MBWXNTAXLNYFJB-ODDKJFTJSA-N
  • MBWXNTAXLNYFJB-KSVLJPARSA-N
  • MBWXNTAXLNYFJB-UDCSOKOMSA-N
  • MBWXNTAXLNYFJB-JHBCSKSVSA-N
  • MBWXNTAXLNYFJB-JXAKDHTRSA-N

InChIs
  • InChIs are proliferating across databases
  • InChIs are increasingly used by publishers
  • Single code base – no multi-flavored SMILES
  • InChIs are “incomplete” but very useful…

Vancomycin – Search the Internet
  • Full Skeleton Search: 104 Hits
  • Full Molecule Search: 4 Hits

Is this the structure of Vitamin K1?

Where is chemistry online?
  • Encyclopedic articles (Wikipedia)
  • Chemical vendor databases
  • Metabolic pathway databases
  • Property databases
  • Patents with chemical structures
  • Drug Discovery data
  • Scientific publications
  • Compound aggregators
  • Blogs/Wikis and Open Notebook Science

Linked Data on the Web
Taken from: Rafael Sidis’ Blog

Where Would You Look? What Do You Trust?

Question Everything online: www.dhmo.org

It’s all on Wikipedia…

What’s Methane?

What ELSE is Methane???

The EXPERTS must get it right?!

Wikipedia, C&E News, PubChem

C&E News (from ACS)

Feedback from Steve Ritter
  • “As for where we source our structures, our primary source is the researcher and peer-reviewed papers, because many compounds are novel. ...we always double check them against one or more primary sources, typically Merck Index and SciFinder. Although CAS and C&EN are both part of the ACS Publications Division, we at C&EN still have to pay for our SciFinder access, strangely enough.”
  • “As a rule, we at C&EN don’t use Wikipedia as a primary source for structures or chemical information, and I recommend that policy to anyone.”
  • “It would be nice to have an authoritative web-based source of standard, well-drawn structures for chemists to go to so they can freely cut and paste structures into their papers, PowerPoint presentations, and anything else they might need. Maybe Wikipedia will be that source one day.”

A vision…
  • Authoritative web-based source of standard, well-drawn structures
  • With associated data – spectra, property data, ADME/Tox data, Bioassay data
  • Linked to encyclopedic articles, publications, patents, MSDS/safety sheets
  • Links to chemical vendors
  • Links to property predictions

A Pragmatic Vision: “Build a Structure Centric Community”
  • December 2006 – A hobby project initiated to connect chemistry on the web
  • Integrate chemical structure data on the web
  • Create a “structure-based hub” to information and data
  • Provide access to structure-based “algorithms”
  • Let chemists contribute their own data
  • Allow the community to curate/correct data

What do humans want?
(Image: media.obsessable.com)

As few interfaces as possible

www.chemspider.com

We’re Out to Answer Questions
Questions a chemist might ask…
  • What is the melting point of n-heptanol?
  • What is the chemical structure of Xanax?
  • Chemically, what is phenolphthalein?
  • What are the stereocenters of cholesterol?
  • Where can I find publications about xylene?
  • What are the different trade names for Ketoconazole?
  • What is the NMR spectrum of Aspirin?
  • What are the safety handling issues for Thymol Blue?

Search for a Chemical…by name

Available Information…
  • Linked to vendors, safety data, toxicity, metabolism

Search for a chemical…by structure

Substructure search coming…

Annotating, Cleaning and Growing...

Almost 25 million chemicals from 400 diverse data sources
  • “Diverse” data sources…
  • High Quality through questionable to wrong
  • Rich content of Wikipedia links, YouTube videos and photographs to “Stub Records” containing “just a structure”
  • All records can be further enhanced…25 million compounds need annotation by the masses

Search “Vitamin H”

“Curate” Identifiers
General curation activities
  • Remove incorrect names
  • Correct spellings
  • Remove names with/without stereo compared to the structure
  • Correct registry numbers and other numeric identifiers (Beilstein, EINECS etc)
  • Add multilingual names
  • Add alternative names

Crowdsourced “Annotations”
Registered Users can add
  • Descriptions/Syntheses/Commentaries
  • Links to PubMed articles
  • Links to articles via DOIs
  • Spectral data
  • Crystallographic Information Files
  • Photos
  • MP3 files
  • Videos

Spectra Linked

Link off a structure in ChemSpider
  • Chemical suppliers
  • Other publications
  • Analytical Data
  • Related Reactions
  • Wikipedia
  • Patents
  • “Everything”

Semantic Markup: Project Prospect

Success Depends on Dictionaries

Semantic Linking of Structures
What would you want to link off a structure?
  • Chemical suppliers
  • Other publications
  • Analytical Data
  • Related Reactions
  • Wikipedia
  • Patents
  • “Everything”

“Chemicalizing” Pages

ChemSpider SyntheticPages

ChemSpider Everywhere: What do computers want?
Web services

Web Services

ChemSpider Everywhere
  • Linked from Wikipedia and many Public Databases
  • Linked from Open Notebook Science sites
  • Linked from Blogs using Structure/Spectra EMBED
  • Integrated into structure drawing packages
  • Integrated to software offerings from Thermo, Waters, Agilent, Bruker

Structure Database Lookup

Reaction Database Look-up

There will always be gaps...
What ChemSpider does not deal with, yet...
  • Materials
  • Minerals
  • Polymers
  • Biological macromolecules

ChemSpider Tomorrow
  • 6 months: >1.2M compounds/month
  • 6 months: >800,000 new uniques
  • 6 months: >60 new data sources added
  • Continue the curation effort and keep cleaning
  • Finish depositions – millions left to deposit
  • Integrate RSC content – a massive archive!
  • Integrate RSC publishing workflows and databases
  • Enable the semantic web for chemistry – RDF was layered on last week

The Future of Linked Chemistry on the Internet?
  • I can buy my wife a “methane ring” for Xmas
  • There are more than 10 compounds called Vitamin K1 on PubChem…
  • Most databases online cannot be annotated
  • The public funds the generation of data that is then mis-associated, cannot be used for modeling, for reference, for…
  • Low quality databases become authorities
  • The community accepts the status quo

The PREFERABLE Future of Linked Chemistry on the Internet?
  • Public compound databases federate to build a truly linked environment of validated data!
  • Data validation needs are not ignored
  • Publishers layer on information to make publications discoverable
  • Public-Private databases can be linked
  • Open Data proliferate
  • RDF is everywhere
  • Business models WILL change

Thank you
  • Email: [email protected]
  • Twitter: ChemConnector
  • Blog: www.chemspider.com/blog
  • Personal Blog: www.chemconnector.com
  • SLIDES: www.slideshare.net/AntonyWilliams
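
Worked example for the “PubChem InChIKeys” slide

The ten keys on the “PubChem InChIKeys” slide all start with the same 14 characters and differ only in the second block. A minimal Python sketch of how that can be checked follows. It assumes the standard InChIKey layout (a 14-character connectivity block, a 10-character block for the remaining layers plus flags, and one protonation character); the function names are illustrative and are not part of ChemSpider or PubChem. The `UHFFFAOY` check relies on the convention that this hash appears when an InChI carries no stereo or isotopic layers; treat that as an assumption to verify against the InChI Technical Manual.

```python
from collections import defaultdict

# Keys copied from the "PubChem InChIKeys" slide.
VITAMIN_K1_KEYS = [
    "MBWXNTAXLNYFJB-NKFFZRIASA-N",
    "MBWXNTAXLNYFJB-LKUDQCMESA-N",
    "MBWXNTAXLNYFJB-UHFFFAOYSA-N",
    "MBWXNTAXLNYFJB-FAKCLFGASA-N",
    "MBWXNTAXLNYFJB-NIHVXYICSA-N",  # O-18 label
    "MBWXNTAXLNYFJB-ODDKJFTJSA-N",
    "MBWXNTAXLNYFJB-KSVLJPARSA-N",
    "MBWXNTAXLNYFJB-UDCSOKOMSA-N",
    "MBWXNTAXLNYFJB-JHBCSKSVSA-N",
    "MBWXNTAXLNYFJB-JXAKDHTRSA-N",
]


def split_inchikey(key):
    """Split an InChIKey into (connectivity, detail, protonation) blocks."""
    parts = key.strip().split("-")
    if len(parts) != 3 or [len(p) for p in parts] != [14, 10, 1]:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return tuple(parts)


def group_by_skeleton(keys):
    """Group keys by their first block, i.e. by shared connectivity."""
    groups = defaultdict(list)
    for key in keys:
        skeleton, _, _ = split_inchikey(key)
        groups[skeleton].append(key)
    return dict(groups)


def lacks_stereo_and_isotopes(key):
    """True if the second block starts with the assumed 'no extra layers' hash."""
    _, detail, _ = split_inchikey(key)
    return detail[:8] == "UHFFFAOY"


if __name__ == "__main__":
    for skeleton, members in group_by_skeleton(VITAMIN_K1_KEYS).items():
        print(skeleton, len(members), "records")
    flat = [k for k in VITAMIN_K1_KEYS if lacks_stereo_and_isotopes(k)]
    print("no stereo/isotope layers:", flat)
```

Matching on the first block alone is one plausible reading of the “Full Skeleton Search” on the Vancomycin slide, while matching on the whole key corresponds to the “Full Molecule Search”; the slides do not say how those searches were actually run.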
