
SESSION III – DATABASES

“Databases and Identification”

Prof. Jacques Vervoort

BRAMA training for technicians – Module I, Rome For more information see http://fiehnlab.ucdavis.edu

To be a master of spectra you need to be a master of structures in the first place.

[Figure: MS/MS spectrum and structure of vincristine (source: nist_msms)]

• Complex MS data interpretation is only possible with software
• MS data are obtained by hyphenated techniques (GC-MS, LC-MS)
• Mass spectral database search and structure search are used routinely
• Mass spectrometers deliver multidimensional data

Be prepared – visualize your structures

Try MarvinSpace via WebStart

Organic Chemistry Reminder

Molecular Formula

C3H7F

[Figure: EI mass spectrum and structure of propane, 2-fluoro- (source: NIST mainlib)]

Be prepared – Stereoisomers

How many stereoisomers can you expect from glucose (KEGG)?

[Structure: Glucose]
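As a rough check on the glucose question: the upper limit on the number of stereoisomers is 2^n for n stereocentres (fewer when internal symmetry produces meso forms). The sketch below is only an illustration; the stereocentre counts are entered by hand and the function name is made up for this example.

```python
def max_stereoisomers(n_stereocentres: int) -> int:
    """Upper bound 2**n on the number of stereoisomers for n stereocentres."""
    if n_stereocentres < 0:
        raise ValueError("number of stereocentres must be non-negative")
    return 2 ** n_stereocentres


# Open-chain glucose (an aldohexose) has 4 stereocentres (C2 to C5).
open_chain = max_stereoisomers(4)

# The cyclic pyranose form has one more stereocentre at C1 (anomeric carbon).
pyranose = max_stereoisomers(5)

print("open chain:", open_chain)
print("pyranose ring:", pyranose)
```

A structure tool such as MarvinView enumerates the actual stereoisomers; the 2^n rule only tells you how many to expect at most.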

Example calculated with MarvinView (via Java WebStart)

Be prepared – Tautomers

How many tautomers can you expect? Important for mass spectral interpretations.

[Structure: Methyl acetate]

Example calculated with MarvinView (start via WebStart)

Be prepared – Resonance (electron shifts)

What are possible resonance structures? Important for mass spectral interpretation (electron impact, electrospray).

[Structure: Phenol]

Example calculated with MarvinView (start via WebStart)

Structure search – know what could be possible

How many compounds (isomer structures) are found in public databases?

http://www.chemspider.com/

Chemical Structure Handling

[Structure: Moronic acid, CID: 489941]

Most common structure formats you need to know:

• SMILES/SMARTS – Simplified Molecular Input Line Entry System
• SDF/MOL – Structure Data File
• InChI/InChIKey – IUPAC International Chemical Identifier
• PDB – Protein Data Bank
• CML – Chemical Markup Language

Some problems:

• Data format needs to be based on an Open Standard (problem with SMILES, ok with CML)
• Stereo and aromatic bond information needs to be saved (ok with SDF)
• Format needs to be small in space for millions of compounds (ok with SMILES)
• SMILES notation needs to be unique (problem with SMILES)
• Structure representation should be portable and based on an Open Standard (ok with CML)

Chemical Structure Identifiers

[Structure: Caffeine]

Structure identifiers are needed for uniquely identifying structures. They are important for searching chemical structures in text and databases.

• Structure name (IUPAC name or common name): 1,3,7-trimethylpurine-2,6-dione
• CAS RN (Chemical Abstracts identifier): 58-08-2
• PubChem ID (PubChem Compound ID): CID 2519
• InChIKey (short representation of InChI): InChIKey=RYYVLZVUVIJVGH-UHFFFAOYAW
• InChI (IUPAC International Chemical Identifier): InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
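An InChI string is built from layers separated by "/": the version, the molecular formula, and then layers marked by a one-letter prefix (for example c for connections and h for hydrogens). The sketch below splits the caffeine InChI from this slide into those layers. It is plain string handling, not a validating parser, and it assumes the formula is the second field, which does not hold for every InChI.

```python
def split_inchi(inchi: str) -> dict:
    """Split an InChI string into version, formula and prefixed layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    layers = {
        "version": parts[0],
        "formula": parts[1] if len(parts) > 1 else "",
    }
    # Each later layer starts with a one-letter prefix,
    # e.g. "c" = connections, "h" = hydrogen atoms.
    for layer in parts[2:]:
        if layer:
            layers[layer[0]] = layer[1:]
    return layers


caffeine = "InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3"
layers = split_inchi(caffeine)
print("version:    ", layers["version"])
print("formula:    ", layers["formula"])
print("connections:", layers["c"])
print("hydrogens:  ", layers["h"])
```

Pulling out the formula layer this way is handy for a quick cross-check of a database entry against a formula obtained from accurate mass.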

SMILES structure format

Positive:
• Good for storing structures in a single line
• Fast text-based search possible; human readable

Negative:
• Many different SMILES codes exist
• SMILES for the same structure can be different (canonical or unique SMILES needed)

Simple SMILES examples: C, CC, CCC, CCCC, CCCCO, CCCCN

[Structure: Caffeine]
InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3

All these SMILES codes represent caffeine (source: InChI FAQ):
[c]1([n+]([CH3])[c]([c]2([c]([n+]1[CH3])[n][cH][n+]2[CH3]))[O-])[O-]
CN1C(=O)N(C)C(=O)C(N(C)C=N2)=C12
Cn1cnc2n(C)c(=O)n(C)c(=O)c12
Cn1cnc2c1c(=O)n(C)c(=O)n2C
N1(C)C(=O)N(C)C2=C(C1=O)N(C)C=N2
O=C1C2=C(N=CN2C)N(C(=O)N1C)C
CN1C=NC2=C1C(=O)N(C)C(=O)N2C

SDF/MOL structure format

Positive:
• Established standard format; good for storing structures safely
• Can store 3D structure; can store metadata ([…] points, […], mass spectra)

Negative:
• Large file size; needs compression

Examples with one, two and three carbon atoms. In each record the header line names the creator program (here OpenBabel), the atom block holds the coordinates (used for 3D), and the bond block holds the connection of atoms.

 OpenBabel02240823422D

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0
M  END
$$$$

 OpenBabel02240823422D

  2  1  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0
  1  2  1  0  0  0
M  END
$$$$

 OpenBabel02240823422D

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0
  1  2  1  0  0  0
  2  3  1  0  0  0
M  END
$$$$

Molecules and mass spectra

Close relationship between molecular structure and mass spectra

Molecular structure is reflected in mass spectral features (peaks, peak heights and peak combinations)

Mass spectra reflect a state of gas phase ion physics and chemistry (rearrangements, fragmentations, bond cleavages)

[Figure: EI mass spectra and structures of tert-butylaminotrimethylsilane (mainlib), N,N-diethyl-1,1,1-trimethylsilylamine (mainlib) and silanamine, N,1,1,1-tetramethyl-N-[1-methyl-2-phenyl-2-[(trimethylsilyl)oxy]ethyl]-, [S-(R*,R*)]- (replib)]

Electron impact (70 eV) mass spectra; Source: NIST05

Molecules and mass spectra

Similar structures may or may not have similar mass spectra

[Figure: EI mass spectra of silanamine, N,1,1,1-tetramethyl-N-[1-methyl-2-phenyl-2-[(trimethylsilyl)oxy]ethyl]-, [S-(R*,R*)]- and N-methylphenylethanolamine, bis(trimethylsilyl)-, compared]

Electron impact (70 eV) mass spectra; Source: NIST05; created using structure similarity search in the NIST MS Search program

Molecules and mass spectra

Similar mass spectra may or may not come from similar structures

[Figure: EI mass spectra of 1-tetradecene and cyclotetradecane, compared]

Electron impact (70 eV) mass spectra; Source: NIST05; created using spectral similarity search in the NIST MS Search program

Mass spectral databases I

Name            Spectra count   Type
NIST05          200,000         electron impact spectra (EI, 70 eV)
Wiley 8         400,000         electron impact spectra (EI, 70 eV)
Palisade 600K   600,000         electron impact spectra (EI, 70 eV)
NIST MS/MS      5,200           MS/MS (ESI, +/-, 30-100 V CID)
MassFrontier    7,000           MSn, ESI (Spectral Tree Library)

• Data quality is important
• Annotation with CAS number, structure and formula
• A link to literature or publication is useful
• Currently no large ESI, APPI or APCI libraries are available (free or commercial)

Mass spectral databases II

Smaller specialized libraries:
• Pfleger Maurer Weber (Drugs): MS+RI, 70 eV
• MassFinder (Volatiles): MS+RI, 70 eV
• RIZA DB (Toxicants): MS+RI, 70 eV
• Golm DB (primary metabolites): MS+RI, 70 eV
• Fiehnlib (primary metabolites): MS+RI, 70 eV
• MassBank (Metabolites): ESI, MSn, accurate masses
• AAFS (Drugs, Forensic, Toxicology): MS+RI, 70 eV
• ChemicalSoft (Drugs): MS/MS, MSE

[Figure: EI mass spectrum and structure of Mirex from the RIZA library (riza_web); RI 2583, CAS 2385-85-5]

For electron impact (EI) spectra, the same GC column (DB-5, RTX-5, DB-1, OV-1) and temperature program must be used when matching retention indices.

For ESI and APPI spectra (LC-MS), the same mass spectrometer design and setup (triple-quad, ion-trap, TOF, Q-TOF) and the same collision energy should be used.

Searching Molecules on PubChem

18 million compound DB (++)

Go to PubChem Structure Search

CAS SciFinder

• 33 million molecules and 60 million peptides/proteins
• largest reaction DB (14 million reactions) and literature DB
• substructure and similarity search of structures
• a must for chemists and biochemists/biologists
• no bulk download, no good import/export, no link-outs

Structure search in SciFinder

Retrieved 4000 papers

(search refined to MS and MALDI only)

Atomic Mass

The correct unit is [u], the unified atomic mass unit, or [Da], Dalton (see SI units).
1 u = 1 Da = 1/12 of the mass of a carbon-12 (12C) atom = 1.66053886 × 10^-27 kg

[Figure: simulated isotopic pattern of hexachlorobenzene (C6Cl6), charge 0, resolving power 1000; relative abundance vs. m/z, with the main peaks at m/z 281.81, 283.81 (100%), 285.81 and 287.80]

Hexachlorobenzene (C6Cl6):
• average mass: 284.7804 u
• integer mass: 282.0 u
• monoisotopic mass: 281.81312 u
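The three mass values for hexachlorobenzene can be reproduced with a few lines of Python. This is a minimal sketch: the element table holds only C and Cl, and the average atomic masses are older tabulated values chosen here because they reproduce the average mass quoted for C6Cl6. Treat them as assumptions and check a current table for real work.

```python
# Per-element masses in u.
#   mono    = mass of the most abundant isotope (12C, 35Cl)
#   avg     = abundance-weighted average atomic mass (assumed older table values)
#   nominal = integer mass of the most abundant isotope
ELEMENTS = {
    "C":  {"mono": 12.0,        "avg": 12.0107, "nominal": 12},
    "Cl": {"mono": 34.96885268, "avg": 35.4527, "nominal": 35},
}


def masses(formula: dict) -> dict:
    """Return monoisotopic, average and nominal mass for {element: count}."""
    return {
        kind: sum(ELEMENTS[el][kind] * n for el, n in formula.items())
        for kind in ("mono", "avg", "nominal")
    }


hcb = masses({"C": 6, "Cl": 6})
print("monoisotopic mass:", round(hcb["mono"], 5), "u")
print("average mass:     ", round(hcb["avg"], 4), "u")
print("integer mass:     ", hcb["nominal"], "u")
```

The gap of almost 3 u between average and monoisotopic mass is why a database value must be checked before it is compared with a measured m/z.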

Always (always) check molecular masses obtained from databases or publications. For mass spectrometry the monoisotopic mass is used.

InChIKey (hexachlorobenzene): CKAPSXZOOQJIBF-UHFFFAOYAV

Mass Accuracy

Instruments must be calibrated to obtain high mass accuracy. For FT-ICR-MS the mass calibration can be stable over weeks. Mass calibration can also be performed after acquisition if a calibrant was run with the samples. The mass of the electron becomes important at around 500 Da.
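To see why the electron mass matters at around 500 Da, express it as a relative error in ppm at different m/z values. The sketch below uses the electron mass given on this slide; the m/z values are arbitrary illustration points.

```python
ELECTRON_MASS_U = 0.00054858026  # mass of the electron in u


def electron_mass_ppm(mz: float) -> float:
    """Relative error in ppm made by ignoring one electron mass at this m/z."""
    return ELECTRON_MASS_U / mz * 1e6


for mz in (100.0, 500.0, 1000.0):
    print(f"m/z {mz:6.1f}: {electron_mass_ppm(mz):.2f} ppm")
```

At m/z 500 the neglected electron already costs about 1 ppm, which is the full error budget of an FT-ICR or Orbitrap measurement.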

Type              Mass accuracy
FT-ICR-MS         0.1 - 1 ppm
Orbitrap          0.5 - 1 ppm
Magnetic Sector   1 - 2 ppm
TOF-MS            3 - 5 ppm
Q-TOF             3 - 5 ppm
Triple Quad       3 - 5 ppm
Linear IonTrap    50 - 200 ppm (10 ppm in Ultra-Zoom)

ppm = (m_exp - m_calc) / m_exp × 1E+6

m(e-) = 0.00054858026 u = mass of the electron
m(1H) = 1.0078246 u = mass of the hydrogen atom (1H); the proton is lighter by one electron mass

Resolving Power

High resolving power is helpful for separating species with almost the same mass (isobars).

[Figure: spectra recorded at RP = 1700 and at RP = 48,250]
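A quick way to judge what a given resolving power buys is to convert it into a peak width. The sketch below assumes the common definition RP = m / Δm with Δm taken as the peak width at half maximum (FWHM). The m/z value of 868.5 is only an illustration point in the range of the solanine example.

```python
def peak_width(mz: float, resolving_power: float) -> float:
    """Peak width (FWHM) in m/z units, assuming RP = m / delta_m."""
    return mz / resolving_power


for rp in (1700, 48250):
    print(f"RP = {rp:>6}: peak width at m/z 868.5 = {peak_width(868.5, rp):.4f}")
```

Two isobars can only be separated when their mass difference is clearly larger than this width.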

High resolving power cannot be used to distinguish between structural isomers.

Example: C8H10N2O has 100,082,479 isomers.

Example: Solanine (CID=30185)

Isotopic Pattern Generators

Elements can be
a) monoisotopic (F, Na, P, I)
b) polyisotopic (H, C, N, O, S, Cl, Br)

Isotopic pattern generators calculate the isotopic abundances for a given molecular formula. The calculation can be very time-consuming and is often based on fast Fourier transform algorithms.
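The idea behind an isotopic pattern generator can be shown at nominal-mass resolution: the pattern of a molecule is the repeated convolution of the isotope distributions of its atoms. The sketch below does this by direct convolution rather than by the Fourier-based methods mentioned above, and it uses rounded natural abundances for C and Cl only (assumed values), so it is an illustration rather than a replacement for a real generator.

```python
from collections import defaultdict

# Nominal isotope masses with rounded natural abundances (assumed values).
ISOTOPES = {
    "C":  [(12, 0.9893), (13, 0.0107)],
    "Cl": [(35, 0.7576), (37, 0.2424)],
}


def convolve(a: dict, b: dict) -> dict:
    """Combine two {mass: probability} distributions."""
    out = defaultdict(float)
    for mass_a, p_a in a.items():
        for mass_b, p_b in b.items():
            out[mass_a + mass_b] += p_a * p_b
    return dict(out)


def isotope_pattern(formula: dict) -> dict:
    """Nominal-mass isotopic pattern, scaled so the base peak is 100."""
    pattern = {0: 1.0}
    for element, count in formula.items():
        atom = dict(ISOTOPES[element])
        for _ in range(count):
            pattern = convolve(pattern, atom)
    top = max(pattern.values())
    return {m: 100.0 * p / top for m, p in sorted(pattern.items())}


pattern = isotope_pattern({"C": 6, "Cl": 6})
for mass, rel in pattern.items():
    if rel >= 1.0:
        print(mass, round(rel, 1))
```

For hexachlorobenzene the base peak falls at nominal mass 284, not at the monoisotopic peak at 282, which is typical for compounds with many chlorine atoms.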

Charge states

[Figure: isotopic patterns for charge state 1 and charge state 2]

CID: 3081765 MW = 1125.50082 C50H72N13O15P
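The charge state shows up in two places: the position of the peak (m/z) and the spacing between isotope peaks, which shrinks to about 1/z. The sketch below ignores adduct and electron masses (a simplifying assumption) and uses the 13C/12C mass difference as the isotope spacing. The function names are made up for this example.

```python
C13_C12_DIFF_U = 1.0033548  # mass difference between 13C and 12C in u


def mz(mass: float, charge: int) -> float:
    """m/z for a given mass and charge, ignoring adduct and electron masses."""
    return mass / charge


def isotope_spacing(charge: int) -> float:
    """Expected spacing between neighbouring isotope peaks."""
    return C13_C12_DIFF_U / charge


def charge_from_spacing(spacing: float) -> int:
    """Estimate the charge state from an observed isotope peak spacing."""
    return round(C13_C12_DIFF_U / spacing)


mass = 1125.50082  # MW of C50H72N13O15P
for z in (1, 2):
    print(f"z = {z}: m/z = {mz(mass, z):.2f}, spacing = {isotope_spacing(z):.2f}")
```

Reading the spacing off a spectrum and calling `charge_from_spacing` is the usual shortcut: a spacing of 0.5 means charge state 2.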

Different charge states and peak resolutions

[Figure: simulated isotopic patterns of C50H72N13O15P in four panels, resolving power given at FWHM:
• Resolving power 2000, charge state 2: peaks at m/z 562.75, 563.25, 563.76, 564.26
• Resolving power 2000, charge state 1: peaks at m/z 1125.50, 1126.51, 1127.52, 1128.52
• Resolving power 200,000, charge state 2: peaks at m/z 562.75, 563.25, 563.75, 564.25 (spacing 0.5)
• Resolving power 200,000, charge state 1: peaks at m/z 1125.50, 1126.50, 1127.51, 1128.51 (spacing 1.0)]

Example of the isotopic pattern of phosphorylated angiotensin without adduct [M+H]+, simulated by Thermo XCalibur

Molecular Formula Generators

Formula generators are used to create molecular formulae from accurate masses. Input requires 1) accurate isotopic mass (with or without adduct) and 2) error in ppm or mDa (milli Dalton)
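The two inputs (accurate mass and allowed error) are enough for a brute-force formula generator. The sketch below tries every combination of C, H, N and O up to user-chosen limits and keeps those within the ppm window. It applies no chemical plausibility checks, the element limits are arbitrary, and the target is the neutral monoisotopic mass of caffeine (C8H10N4O2), used only as a test case.

```python
from itertools import product

# Monoisotopic masses in u (most abundant isotopes).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def formulas_for_mass(target: float, ppm: float, max_counts: dict) -> list:
    """Return (formula, error_ppm) pairs within +/- ppm of the target mass."""
    tolerance = target * ppm / 1e6
    elements = list(max_counts)
    hits = []
    for counts in product(*(range(max_counts[e] + 1) for e in elements)):
        mass = sum(MONO[e] * n for e, n in zip(elements, counts))
        if abs(mass - target) <= tolerance:
            formula = "".join(
                f"{e}{n if n > 1 else ''}" for e, n in zip(elements, counts) if n
            )
            hits.append((formula, (mass - target) / target * 1e6))
    # Best match (smallest absolute error) first.
    return sorted(hits, key=lambda hit: abs(hit[1]))


hits = formulas_for_mass(194.08038, ppm=5.0,
                         max_counts={"C": 15, "H": 30, "N": 6, "O": 6})
for formula, error in hits:
    print(f"{formula:12s} {error:+.2f} ppm")
```

Widening the ppm window or adding elements such as S and P makes the hit list grow quickly, which is why mass accuracy alone rarely gives a unique formula.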

[Figure: example with MWTWIN, showing the accurate mass and mass error inputs]

The molecular formula space of small molecules calculated by the Seven Golden Rules

Each molecular formula can expand to billions of structural isomers.
Molecular Formula ≠ Molecular Isomer
http://fiehnlab.ucdavis.edu/projects/Seven_Golden_Rules/

Frequency distribution of molecular formulas

Impact of mass accuracy on number of formulas

Mass accuracy and isotopic pattern

[M+H] +

C45H73NO15 MW = 867.49799

Example: ESI-MS (+) of solanine on an LTQ
Resolving power: 1700
Mass accuracy: 46 ppm
Isotopic abundance error: ±1.46%
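To get a feeling for what 46 ppm means for this ion, the sketch below calculates the expected [M+H]+ mass of solanine from the values given in these slides and converts the ppm figure into an absolute mass window. The ppm function divides by the experimental mass, following the definition used in this lecture; no measured value is assumed.

```python
M_SOLANINE = 867.49799        # monoisotopic mass of C45H73NO15 in u
M_HYDROGEN = 1.0078246        # mass of the hydrogen atom (1H) in u
M_ELECTRON = 0.00054858026    # mass of the electron in u


def ppm_error(m_exp: float, m_calc: float) -> float:
    """Mass error in ppm: (m_exp - m_calc) / m_exp * 1E+6."""
    return (m_exp - m_calc) / m_exp * 1e6


# [M+H]+ : add a hydrogen atom, remove one electron.
m_calc = M_SOLANINE + M_HYDROGEN - M_ELECTRON

# Absolute mass deviation that corresponds to 46 ppm at this m/z.
tolerance_da = m_calc * 46.0 / 1e6

print(f"calculated [M+H]+ : {m_calc:.5f}")
print(f"46 ppm window     : +/- {tolerance_da:.4f} Da")
print(f"check             : {ppm_error(m_calc + tolerance_da, m_calc):.1f} ppm")
```

A window of about 0.04 Da admits many candidate formulas, so the isotopic abundance error has to do the remaining filtering.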

Isotopic abundances as orthogonal filter

Tasks

(1) Calculate the number of isomers for C12H12.

(2) Generate the isotopic pattern for a […] and hexachlorobenzene.

(3) Find the molecular formula for the mass spectrum on the next page, using http://www.ch.ic.ac.uk/java/applets/FormToM.html. Use H=24, C=24, O=8 and others=4 in the settings. Include S! Use the isotope generator to check which of the possible formulas best fits the observed pattern.

(4) Find the possible molecule(s) in SciFinder and in the National Library of Medicine. Which one is the most likely?
http://chem.sis.nlm.nih.gov/chemidplus/ (note: write the formula with hyphens and list the element letters alphabetically)
https://scifinder.cas.org/scifinder/login.jsf

[Figure: mass spectrum for task (3); labelled peak intensities: Int=100, Int=9, Int=2, Int=20, Int=2]

Web applications

• Isotope calculator: http://yanjunhua.tripod.com/pattern.htm
• Mass to Formula and Formula to Mass: http://www.ch.ic.ac.uk/java/applets/FormToM.html
• Tutorial GC-MS: http://eu.shimadzu.de/products/chromato/gcms/TutorialGCMS/default.aspx

Databases:
• Dictionary of Natural Products (access is limited because there is no license): http://dnp.chemnetbase.com/dictionary-search.do?method=view&id=2885722&si
• Chemical lookup service: http://cactus.nci.nih.gov/
• SciFinder: needs to be activated through the university library link

Good website for mass spectrometry background:
• “The Expanding Role of Mass Spectrometry in Biotechnology”: http://masspec.scripps.edu/book_toc.php
