Databases and Identification
SESSION III – DATABASES
“Databases and Identification”
Prof. Jacques Vervoort
BRAMA training for technicians – Module I, Rome
For more information see http://fiehnlab.ucdavis.edu

To be a master of spectra you need to be a master of structures in the first place.
[Figure: MS/MS spectrum and structure of Vincristine (nist_msms)]
Complex MS data interpretations only possible with software
MS data obtained by hyphenated techniques (GC-MS, LC-MS)
Mass spectral database search and structure search routinely are used
Mass spectrometers deliver multidimensional data

Be prepared – visualize your structures
Try Marvin Space via Webstart

Organic Chemistry Reminder
Molecular Formula C3H7F
[Figure: mass spectrum and structure of Propane, 2-fluoro- (mainlib)]

Be prepared – Stereoisomers
How many stereoisomers can you expect from glucose (KEGG)?
[Structure drawing: Glucose]
Example calculated with MarvinView (via JAVA Webstart)

Be prepared – Tautomers
How many tautomers can you expect? Important for mass spectral interpretations.
[Structure drawing: Methyl acetate]
Example calculated with MarvinView. Start via WebStart

Be prepared – Resonance (electron shifts)
What are possible resonant structures? Important for mass spectral interpretation (electron impact, electrospray)
[Structure drawing: Phenol]
Example calculated with MarvinView. Start via WebStart

Structure search – know what could be possible
How many compounds (isomer structures) are found in public databases?
http://www.chemspider.com/

Chemical Structure Handling
[Structure drawing: Moronic Acid - CID: 489941]
Most common structure formats you need to know:
SMILES/SMARTS - Simplified Molecular Input Line Entry System
SDF/MOL - Structure Data File
InChI/InChIKey - IUPAC International Chemical Identifier
PDB - Protein Data Bank
CML - Chemical Markup Language
Some problems:
• Data format needs to be based on Open Standard (problem with SMILES, ok with CML)
• Stereo and aromatic bond information needs to be saved (ok with SDF)
• Format needs to be small in space for millions of compounds (ok with SMILES)
• SMILES notation needs to be unique (problem with SMILES)
• Structure representation should be portable and based on Open Standard (ok with CML)

Chemical Structure Identifiers
Structure identifiers are needed for uniquely identifying structures
Important for searching chemical structures in text and databases
[Structure drawing: caffeine]
Structure Name – IUPAC name or common name: 1,3,7-trimethylpurine-2,6-dione
CAS RN – Chemical Abstracts identifier: 58-08-2
PubChem ID – PubChem Compound ID: CID 2519
InChIKey – Short representation of InChI: InChIKey=RYYVLZVUVIJVGH-UHFFFAOYAW
InChI – IUPAC International Chemical Identifier: InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3

SMILES structure format
Positive: Good for storing structures in a single line
Fast text-based search possible; human readable
Negative: Many different SMILES codes exist
SMILES for the same structure can be different (canonical or unique SMILES needed)
Simple SMILES examples shown on the slide: C, CC, CCC, CCCC, CCCCO
[Structure drawing: caffeine]
InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
All of the following SMILES codes represent caffeine:
[c]1([n+]([CH3])[c]([c]2([c]([n+]1[CH3])[n][cH][n+]2[CH3]))[O-])[O-]
CN1C(=O)N(C)C(=O)C(N(C)C=N2)=C12
Cn1cnc2n(C)c(=O)n(C)c(=O)c12
Cn1cnc2c1c(=O)n(C)c(=O)n2C
N1(C)C(=O)N(C)C2=C(C1=O)N(C)C=N2
O=C1C2=C(N=CN2C)N(C(=O)N1C)C
CN1C=NC2=C1C(=O)N(C)C(=O)N2C
Caffeine SMILES. Source: InChI FAQ
Another simple SMILES example shown on the slide: CCCCN

SDF/MOL structure format
Positive: established standard format; good for storing structures safely
can store 3D structure; can store metadata (boiling points, toxicity, mass spectra)
Negative: large file size, needs compression
Example SDF records with one, two and three carbon atoms:
OpenBabel02240823422D
1 0 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0
M END
$$$$
OpenBabel02240823422D
2 1 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0
1 2 1 0 0 0
M END
$$$$
OpenBabel02240823422D
3 2 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0
1 2 1 0 0 0
2 3 1 0 0 0
M END
$$$$
Annotations on the slide: Creator (the OpenBabel header line); Coordinates for 3D (the atom lines); Connection of atoms (the bond lines)

Molecules and mass spectra
Close relationship between molecular structure and mass spectra
Molecular structure is reflected in mass spectral features (peaks, peak heights and peak combinations)
Mass spectra reflect a state of gas phase ion physics and chemistry (rearrangements, fragmentations, bond cleavages)
[Figure: three electron impact (70 eV) mass spectra with structures: tert-Butylaminotrimethylsilane (mainlib); N,N-Diethyl-1,1,1-trimethylsilylamine (mainlib); Silanamine, N,1,1,1-tetramethyl-N-[1-methyl-2-phenyl-2-[(trimethylsilyl)oxy]ethyl]-, [S-(R*,R*)]- (replib). Source: NIST05]

Molecules and mass spectra
Similar structures may or may not have similar mass spectra
[Figure: comparison of the electron impact (70 eV) mass spectra of Silanamine, N,1,1,1-tetramethyl-N-[1-methyl-2-phenyl-2-[(trimethylsilyl)oxy]ethyl]-, [S-(R*,R*)]- and N-Methylphenylethanolamine, bis(trimethylsilyl)-. Source: NIST05; created using structure similarity search in NIST MS Search program]

Molecules and mass spectra
Similar mass spectra may or may not have similar structures
[Figure: comparison of the electron impact (70 eV) mass spectra of 1-Tetradecene and Cyclotetradecane. Source: NIST05; created using spectral similarity search in NIST MS Search program]

Mass spectral databases I
Name            Spectra count   Type
NIST05          200,000         electron impact spectra (EI 70 eV)
Wiley 8         400,000         electron impact spectra (EI 70 eV)
Palisade 600K   600,000         electron impact spectra (EI 70 eV)
NIST MS/MS      5,200           MS/MS (ESI, +/-, 30-100 V CID)
MassFrontier    7,000           MSn, ESI (Spectral Tree Library)
Data quality is important
Annotation with CAS, structure and formula
Link to literature or publication is useful
Currently no large ESI, APPI or APCI libraries available (free or commercial)

Mass spectral databases II
[Figure: mass spectrum and structure of Mirex (riza_web); RI 2583, KEY 1596, CAS 2385-85-5]
Smaller specialized libraries
Pfleger Maurer Weber (Drugs), MS+RI, 70 eV
MassFinder (Volatiles), MS+RI, 70 eV
RIZA DB (Toxicants), MS+RI, 70 eV
Golm DB (primary Metabolites), MS+RI, 70 eV
Fiehnlib (primary Metabolites), MS+RI, 70 eV
MassBank (Metabolites), ESI, MSn, accurate
masses
AAFS (Drugs, Forensic, Toxicology), MS+RI, 70 eV
ChemicalSoft (Drugs), MS/MS, MSE
In case of electron impact (EI), the same GC column (DB-5, RTX-5, DB-1, OV-1) and temperature program must be used for matching retention indices
In case of ESI or APPI spectra (LC-MS), the same mass spectrometer design and setup (triple-quad, ion-trap, TOF, Q-TOF) and collision energy should be used

Searching Molecules on PubChem
18 million compound DB (++)
Go to PubChem Structure Search

CAS SciFinder
• 33 million molecules and 60 million peptides/proteins
• largest reaction DB (14 million reactions) and literature DB
• substructure and similarity search of structures
• a must for chemists and biochemists/biologists
• no bulk download, no good import/export, no link-outs

Structure search in SciFinder
Retrieved 4000 papers (refine search: only MS and MALDI)

Atomic Mass
Correct unit is [u] (unified atomic mass unit) or [Da] (Dalton); see SI units
1 u = 1 Da = 1/12 of the mass of one atom of carbon-12 (12C) = 1.66053886 x 10^-27 kg
C6Cl6: C6 Cl6 p(gss, s/p:40) Chrg 0 R: 1000 Res.Pwr..
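The unit definition above can be made concrete with a short calculation. The following is a minimal Python sketch and not part of the original slides: the function names and the small table of monoisotopic masses are assumptions introduced for illustration (values taken from standard isotope tables), while the u-to-kg factor is the one quoted in the Atomic Mass slide. It parses a simple molecular formula such as C6Cl6 and returns its monoisotopic mass in u and in kg.

```python
import re

# Conversion factor quoted on the Atomic Mass slide: 1 u = 1 Da in kg.
U_IN_KG = 1.66053886e-27

# Monoisotopic masses in u of the lightest stable isotope of each element.
# Assumption: values from standard isotope tables, only a few elements listed.
MONOISOTOPIC_MASS_U = {
    "H": 1.00782503207,
    "C": 12.0,  # exact by definition of the unit
    "N": 14.0030740048,
    "O": 15.99491461956,
    "F": 18.99840322,
    "Cl": 34.96885268,
}


def parse_formula(formula):
    """Turn a flat formula like 'C8H10N4O2' into {'C': 8, 'H': 10, ...}.

    Hypothetical helper: no brackets, charges or isotope labels are supported.
    """
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass_u(formula):
    """Sum the monoisotopic masses of all atoms in the formula (in u)."""
    total = 0.0
    for element, count in parse_formula(formula).items():
        if element not in MONOISOTOPIC_MASS_U:
            raise ValueError(f"no mass stored for element {element!r}")
        total += MONOISOTOPIC_MASS_U[element] * count
    return total


def u_to_kg(mass_u):
    """Convert a mass in u (Da) to kilograms."""
    return mass_u * U_IN_KG


if __name__ == "__main__":
    # Formulas that appear in the slides: hexachlorobenzene, caffeine, 2-fluoropropane.
    for formula in ("C6Cl6", "C8H10N4O2", "C3H7F"):
        mass = monoisotopic_mass_u(formula)
        print(f"{formula}: {mass:.6f} u = {u_to_kg(mass):.6e} kg")
```

The sketch deliberately uses only the lightest isotope of each element; a full isotope pattern, as in the C6Cl6 simulation header above, would also need isotope abundances, which are not included here.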