DOCUMENT RESUME ED 025 413 SE 004 965 By- Scholtz, R G.. And Others of Feasibility Study of the Development of a Specialized ComputerSystem of Organic Chemical Signatures Spectral Data Illinois Inst. of Tech, Chicago. Spons Agency-National Science Foundation.WashingtON D.C. Repor/ No- IITRI-C6104- 4 Pub Date Apr 68 Contract NSF CS14 Note- 186p. MRS Price MF-S0 75 HC- $9 40 Descriptors- *Chemistry,Information Retrieval, *InformationScience, *Organic Chemistry Identifiers-National Science Foundation This final report of a feasibilitystudy describes the researchperformed in assessing the requirementsfor a chemical signaturefile and search schemefor organic compoundidentification and information retrieval.The research performed to determined feasibility of identifying anunknown compound involved screeningthe compound against a file of chemicalsignatures of knowncompounds. The chemical signatures were obtainedfrom (1) infrared spectrometry(IR), (2) nuclear magnetic resonance (NMR) spectrometry,(3) mass spectrometry(MS), (4) gas chromatography (GC), and (5) ultraviolet (UV)spectrophotometry. A physical stateand element search precededthesignaturesearch.The basicsystem containeddatafor 500 representative pure organiccompounds. The discriminatingability of each signature, from most to least discriminating,follow the order (1) IR, (2)MS, (3) GC, (4) NMR, and (5) UV. The logic and searchprocedures of the experimentalsystem weredesigned for ready conversion to a computersystem. Appendixes contain alisting of the 500 compounds in the data baseand other experimentdetails. OW EDUCATION & WELEARE U.S. DEPARTMENT Of HEALTH, OFFICE OF EDUCATION
REPRODUCED EXALTLY ASRECEIVED FROM THE THIS DOCUMENT HAS SEEN OPINIONS ORIGINATING IT.POINTS Of VIEW OR PERSON OR ORGANIZATION REPRESENT OFFICIAL OFFICEOF EDUCATION STATED DO NO1 NECESSARILY POSITION OR POLICY
Report No.IITRI-C6104-4 (Final Report) FEASIBILITY STUDY OF THEDEVELOPMENT OF A SPECIALIZEDCOMPUTER SYSTEM OF ORGANIC CHEMICALSIGNATURES OF SPECTRAL DATA
National ScienceFoundation Report No. IITRI-C6104-4 (Final Report)
FEASIBILITY STUDY OF THEDEVELOPMENT OF A SPECIALIZED COMPUTERSYSTEM OF ORGANIC CHEMICAL SIGNATURESOF SPECTRAL DATA
April 17, 1967, through April16, 1968 Contract No. NSF-0514 IITRI Project C6104
Prepared by
R. G. Scholz, E. S. Schwartz, and M. E. Williams
of
IIT RESEARCH INSTITUTE Tedhnology Center Chicago, Illinois 60616
for
National Science Foundation 1800 G Street Washington, D. C.
Attention: Mr. T. Quigley
Copy No. cV; April 291 1968
HT RESEARCH INSTITUTE FOREWORD
This is Report No.IITRI-C6104-4 (Final Report) onIITRI Project C6104, entitled"Feasibility Study of theDevelopment of a Specialized ComputerSystem of OrganicChemical Signatures This program isbeing performed for the of Spectral Data." The National ScienceF.:undation under ContractNo. NSF-0514. work reported herein wasperformed from April l 1967, through April 16, 1968.
The project leaderis Dr. Robert G.Scholz, Research Chemist, and the project is underthe administrativeguidance of Dr. Warner M.Linfield, Manager, OrganicChemistry Research. Major contributions tothis report were madeby Eugene S. Schwartz, Senior Scientist, andMiss Martha E. Williams,Manager, Technical Information Research.
The authors acknowledgethe excellent work ofthe following persons whocontributed to this study: M. M. Bogolin, P. A. Demink, B. E.Edwards, J. J. Finn, P. A.Llewellen, B. L. Mankus, H. J.O'Neill and A. J.Starshak. Respectfully submitted, j\ IIT RESEARCH INSTITUTE
R. G. Scholz Research Chemist Organic Chemistry Research
Approved by:
Yvi i4AA/° W. M.Linfield Manager Organic Chemistry Research
Li RGS:dfp
ji
lITRESEARCH INSTITUTE
ii IITRI-C6104-4 FEASIBILITY STUDY OFTHE DEVELOPMENT OF A SPECIALIZEDCCMPUTER SYSTEM OF ORGANIC CHEMICALSIGNATURES OF SPECTRALDATA
ABSTRACT
This final reportdescribes a study ofthe feasibility of information file fororganic chemical signatures. a proposed for a The purpose of thestudy was to assessthe requirements file and a searchscheme which couldbe used for compound identification and informationretrieval. Use of the system could be made bychemists from industry,government and to identify educational institut;.ons. Users would be able compounds and obtainreferences to theprimary data sources compound sample from by supplying eitheranalytical data or a derived. These data wouldthen be submitted which data could be Each to the computerfor screening againstthe master file. compound in the filewill bear theChemical Abstractsregistry number and sourcereferences so that thechemical signature file will be tiedin with the existingNational Chemical Information System andoriginal data sources.
The researchperformed determinedthe feasibility of identifying an unknowncompound by screeningit against a file of chemical signaturesfor known compounds. The chemical signatures were obtainedfrom Infrared (IR)spectrometry, spectrometry nuclear magnetic resonance(NMR) spectrometry, mass (MS), gas chromatography(GC), and ultraviolet(UV) spectro- preceded photometry. A physical stateand element search the seardhes on thefive signatures. Data for 500representative pure organiccompounds were used todevelop the basic system. from existing sources,such as ASTM, Sadtler, Data were obtained Some Varian, and otherGovernment and privatecollections. additional data weregenerated.
The work encompassedselecting the representativecompounds; acquiring source data;analyzing, indexingand cataloging the data; devising afile structure andsearch strategy; and demonstrating feasibilityby operating thedata-matching and -screening system. A series ofexperimental searches wasconducted using the compounds as the database. In these searches 500 representative treated as unknowns 100 compounds allhaving five signatures were and screened againstthe file compounds. Also, eleven compounds analyzed by the fivetechniques and the data of the 100 were file with obtained were screenedagainst the file. An inverted optical coincidence(Termatrex system) was usedto demonstrate the feasibility ofthe identification system.
HT RESEARCHINSTITUTE
iii IITRI-C6104-4 A candidate had to fulfill the following criteria:
1. Same physical state at standard conditions
2. Elements
3. Match on two of the three most intense IR bands, within tolerances, in any order
4. Match on two of the four most intense MS peaks in any order
5. Match on relative retention time, within tolerances, on any one of 2 selected GC columns
6. Match on the most intense peak, within tolerances, in NMR
7. Match on the most intense band position, within tolerances, in UV.
Of the 100 "unknowns" screened using these original matcl criteria, 68 came through as unique candidates while 32 came through with multiple candidates. The large number of multiple candidates is attributed primarily to the lack of data (defaults).
A search made for the 11"unknowns" which were analyzed produced 9 unique candidates,one lost to suspected erroneous data in the file and one with4 candidates, attributed to lack of data.
A test on the 32 compounds having multiple candidates using modified, more rigid match criteria resulted in 24 coming through as unique candidates.
The discriminating ability of each signature, from most to least discriminating, follow the order: IR, MS, GC, NMR and UV.
The logic and the search procedures of the experimental system were designed for ready conversion to a computerized system.
To evaluate the results of these searches a weighting sdheme based on chemical criteria, i.e., relative usefulness of the types of analytical data, was established. Exact, tolerance, and default matches were also considered in the weight assignments. When data for more than one compound survived the search, the differences in ratings clearly identified the better candidate.
lITRESEARCH INSTITUTE
iv IITRI-C6104-4 The study has demonstrated that the composite chemical signature method for screening and matching data is feasible and provides a reliable base for compound identification.
IIT RESEARCH INSTITUTE
v IITRI-C6104-4 TABLE OF CONTENTS Page
Abstract iii
I. Introduction 1
Selection of Compounds 3
III. Acquisition of Data 4
A. General 4 B. Gas Chromatography 11 C. Nuclear Magnetic Resonance 16 D. Infrared 24 E. Mass Spectrometry 26 F. Ultraviolet Spectrophotometry 27 G. Survey of Signature Availability 28 H. Cataloguing Data and Data Master Card Design 31
IV. Testing Signature Concept 32
A. Experimental Design 32 B. Search Procedures 38 C. Search Measures 43 D. Test Results with Initial Match Criteria 47
V. Summary and Conclusions 108
VI. Recommendations 115
Appendix A-500 Compounds in Data Base 119
Appendix B-Availability of Signature Data 152
Appendix C - Data Master Sheet 165 Appendix D - Chemical Signature Search Experiment 167 Appendix E - Chemical Signature Search Record 169
Appendix F - Candidate List 171
Appendix G - Candidate List Rating Table 173
lITRESEARCH INSTITUTE
vi IITRI-C6104-4 LIST OF TABLES
Table Page
1 Compounds Available in Multisignature Combinations 6
2 Distribution of Multisignatures by Functional Group 7
3 4-Signature Combinations by Functional Group 8
4 3-Signature Combinations by Functional Group 9
5 2-Signature Combinations by Functional Group 10
6 Operating Conditions for Gas Chromatographic Retention Data 12
7 Variability of GC Retention Data 14
8 NMR Peak Positions for Diethyl Phthalate 19
9 NMR Peak Positions for Diethyl Phthalate According to Sadtler 20
10 NMR Peak Positions for Diethyl Phthalate According to Varian 21
11 Search Table (CSTABL) 41
12 Signature Availability and Default Ratios 45
13 Signature Discrimination for 100 Known Compounds 48
14 MS Matches on Peak Permutations 52
15 Number of Candidate Compounds Obtained with Different MS Match Criteria 53
16 IR Matches on Peak Permutations 54
17 Number of Different Compounds Obtained with Different IR Match Criteria 55
18 Number of Candidate Compounds Obtained with Different GC Match Criteria 57
19 State and Elements Data Availability 59
20 Selectivity of State and Elements Searches 59 lITRESEARCH INSTITUTE
vii IITRI-C6104-4 LIST OF TABLES(cont.)
Table Page 63 21 Selectivity with DifferentSearch Orders 67 22 Search Selectivity by Stages
23 Search Selectivity ofFunctional Groups 68
24 Data Availability 65 Extraneous CandidateCompounds 70 71 25 Distribution of Candidate Ratings 72 26 Search Precision
27 Compounds with Multiple Candidatesby Functional Group 74
28 Matches Obtained withMulticandidate Test Compounds 75
29 Extraneous Candidate CompoundSelected More than Once 92
30 Candidate Compounds Retrievedin Search for Octane 94 97 31 Comparison Between Storedand Laboratory Data 98 32 Identification of Compounds fromLaboratory Data
33 Candidates Eliminated withModified Match Criteria 101 103 34 Selectivity of Combinationsof Signatures
lITRESEARCH INSTITUTE
viii IITRI-C6104-4 LIST OF FIGURES
Figure Page
1 NMR Spectra of DiethylPhthalate 18
2 Distribution of Signature Matches 50
3 Selectivity with Sequence Orderedby Match Criteria 61
4 Selectivity with Sequence Orderedby Default Ratio 62
5 Selectivity of 7-Stage Searches(1) 64
6 Selectivity of 7-Stage Searches(2) 65
7 Selectivity of 2-Signature Searches Averages of 10 Known Compounds 104
8 Selectivity of 3-Signature Searches Averages of 10 Known Compounds 105
9 Selectivity of 4-SignatureSearches Averages of 10 Known Compounds 106
IIT RESEARCH INSTITUTE
ix IITRI-C6104-4 FEASIBILITY STUDYOF THE DWELOPMENT OF A SPECIALIZEDCOMPUTER SYSTEM DATA OF ORGANIC CHEMICALSIGNATURES OF SPECTRAL
I. INTRODUCTION engaged In the past yearIIT ResearchInstitute has been data in a program tostudy the feasibilityof an analytical information file fororganic compounds. The program was Science carried out underthe sponsorship ofthe National
Foundation. Although there is awealth ofanalytical dataavailable, know where it islocated nor the analyticalchemist does not would be of does he have ready accessto it. This information unknown considerable helpparticularly foridentification of the number of compounds. In the area oforganic chemistry, research in the areasof compounds is steadilyincreasing and analysis is becoming organic synthesis aswell as instrumental complexity of available morediversified. The volume and analytical data aregrowing and evenwell-trained analytical keep pace withthis and organic chemistsfind it difficult to of chemists tohave growth. Thus, there is aneed on the part their analytical a placeWhere they can gofor help with
problems. chemists The usual procedurefor analyticaland organic compound is toobtain trying to confirmthe identity of a spectral data by as manyanalytical techniques asare
available to comparethese data withreference data and lITRESEARCH INSTITUTE
1 IITRI-C6104-4 identify the unknown compound. All too often the chemist is limited by inadequate files. A search would befacilitated by a centralized systemwhich can use either spectraldata or can provide a more complete array ofspectral data to search for and identify the unknowncompound and direct the user to additional analytical, chemicaland physical data. To date, there is no such system. There are available variouscollections of spectral data fromindustrial and governmentorganizations such as Sadtler ResearchLaboratories, Varian Associates: the American Society for Testing andMaterials, the American Petroleum Institute and the NationalBureau of Standards.
Most of these collections arelimited in scope and, more important, are not correlated witheach other. At present a researcher will have to spendconsiderable time and money in an effort to locate a sourcethat has the analytical data on file and Whichwill help identify an unknowncompound and reference the original data. A well-planned andultimately computerized central data file wouldprovide a single location
to which a scientist couldsubmit his data or bring compounds
from which the spectral datacould be obtained. Such a file containing all published standardizeddata could be screened
to identify an unknowncompound. If the data are not available he need not go elsewhere if thefile incorporated data from
other collections. If the data are contained in thefile he will be able to identify thecompound, learn the original
source of the data andbe directed to additionalchemical and
lITRESEARCH INSTITUTE
2 IITRI-C6104-4 physical data pertainingto the compound. The objective of the research studyreported herein was todetermine the capability of a compositesignature to identify anunknown ccmpound. The composite signatureis made up of a minimum number of selected datapoints obtained from 5analytical accomplished by matching techniques. Identification would be the composite signatureof the unknown against afile of composite signatures. In a sequentialsearch, candidate compounds would be selected byobtaining the logicalintersection of the
individual signaturematches. We have prepared adata file and developedthe logical
procedures for searchingthe file. The work involvedacquiring
data for a referencefile of 500 pure organiccompounds, indexing
and tabulating the data,defining the search strategyand finally testing the systemfor its ability toselect the proper the work compounds. This final report onthe project describes
and the results. Recommendations based onthe study are provided. A
comprehensive list showingdata sources, type ofdata available,
costs and otherinformation are also included.
II. SELECTION OF COMPOUNDS Our initial efforts weredirected towards selecting500
compounds for which wecould obtain 2, 3, or 4sets of data
derived from infraredspectrometry, nuclearmagnetic resonance,
mass spectrometryand gas chromatography. We wanted the
lITRESEARCH INSTITUTE
3 IITRI-C6104-4 compounds to represent manycompound types and,in some cases, compounds whose compositesignatures weresimilar to test the selectivityof the sequentialsearch to select the the proper compound. To aid in theselection of compounds Chemical Names was used SOCMA Handbook forCommercial Or anic IUPAC name and CASregistry to identify thecompounds by their identified by our systemthe CAS number. When a compound was registry number wouldaccompany eachcompound allowing the user information about to obtainadditional chemicaland physical organic compounds were the compound. Ultimately, 838 different considered for use. These representedhydrocarbons, terpenes, steroids, heterocyclics, polyfunctionals,aromatic, amino acids,
alkaloids and others. Also, we made thedecision to include ultraviolet as a fifthsignature. From the 838compounds, we hoped to select as many aspossible (up to500) that had 5 compounds, 23 of theoriginal signatures. In the 500 selected 24 compound types wererepresented with onlycarbohydrates
being excluded.
III. ACQUISITION OF DATA
A. General Our plan wes to extractand index datafrom various
literature and commercialsources tobuild our data base.
Initially, since wehad not yet determinedthe minimum amount concerned with of data necessaryfor identification we were getting more data than wethought we might needfor the lITRESEARCH INSTITUTE
4 IITRI-C6104-4 final system. We tried to find allsignatures for all 838 compounds. This proved to be a major problem and moredifficult than had been anticipated. The difficulty lay not onlyin the lack of data but in our method of compoundselection - we were limiting the compounds to several hundred andrestricting ourselves to finding as many signatures as wecould for just these compounds. However, this approach was reasonable atthis stage of the program inasmuch as we were trying tomaintain the diversity of compound types. Of the 838 compounds consideredfor use, 357 had NMR data,
233 had UV data, 478 had MSdata, 476 had IR data and 308had
GC data. Of these, 500 were selectedfor use in our file. The following is a breakdownof the data distributionfor these
500 compounds: 100 had 5 signatures each
174 had 4 signatures eadh
195 had 3 signatures each
31 had 2 signatures each
Table 1 lists the number ofcompounds available in 5-, 4-, 3-,
and 2-signature combinations. Table 2 summarizes the number of signatures available in the23 chemical groups used in the
experiment. Tables 3, 4 and 5 list thecombinations of signatures available in the 23chemical groups in 4-, 3-, and
2-signature combinations.
HT RESEARCHINSTITUTE
5 IITRI-C6104-4 Table 1
COMPOUNDS AVAILABLE INMULTISIGNATURE COMBINATIONS
SIGNATURES NUMBLR
UV/MS/GC/IR/NMR 100 MS/GC/IR/NMR 89 UV/GC/IR/NMR 2 UV/MS/IR/NMR 50 UV/MS/GC/NMR 0 UV/MS/GC/IR 33 UV/MS/GC 3 UV/MS/IR 21 UV/MS/NMR 7 UV/GC/1R 2 UV/GC/NMR 1 UV/IR/NMR 7 MS/GC/IR 62 MS/GC/NMR 4 MS/IR/NMR 85 GC/IR/NMR 3 UV/MS 6 UV/GC 0 UV/IR 1 UV/NMR 0 MS/GC 6 9 MS/IR. MS/NMR 3 GC/IR 3 GC/NMR 0 IR/NMR 3
TOTAL 500
6 IITRI-C6104-4 Tabl e 2
DISTRIBUTION OF MULTISIGNATURES BY FUNCTIONAL GROUP
NUMBER FUNCTIONAL OF COMPOUNDS NUMBER OF SI GNATURES AVAILABLE GROUP I N GROUP FIVE FOUR THREE TWO
ALCOHOLS 50 19 20 10 1 ALDEHYDES 21 8 5 7 1 ALKALOIDS 3 0 0 2 1 AMI DE S/I MI DE S 8 0 1 4 3 AMI NES 18 0 4 13 1 AMI NO AC IDS 4 0 1 2 1 CARBOXYLIC ACIDS 24 0 8 15 1 DI OL S 14 4 9 1 0 ESTERS 56 4 23 26 3 ETHERS 22 2 8 9 3 HALOCARBONS 57 5 22 30 0 HETEROCYCLIC S 22 0 11 11 0 HYDROCARBONS 70 23 22 22 3 KETONES 27 10 9 7 1 NITRI LES 18 3 7 8 0 NI TRO 9 6 1 1 1 POLYP UNC TIONAL S 48 10 14 18 6 STEROIDS 2 0 0 1 1 SULF I DES 8 5 2 1 0 SULF ONE S 1 0 0 0 1 SULF OXE DE S 1 0 1 0 0 TERPENES 11 0 4 5 2 THIOLS 6 1 2 2 1
TOTALS 500 100 174 195 31
7 IITRI-C6104-4 Table 3
4-SIGNATURE CCMBINATIONS BYFUNCTIONAL GROUP
r4'
FUNCTIONAL GROUP
ALCOHOLS 3 1 3 0 13 ALDEHYDES 4 0 0 0 1 0 ALKALOIDS 0 0 0 0 0 AMIDES/IMIDES 0 0 1 0 0 AMINES 1 0 3 0 0 AMINO ACIDS 0 1 0 0 CARBOXYLIC ACIDS 0 0 8 0 0 6 DIOLS 0 0 3 0 ESTERS 19 0 3 0 1 6 0 2 0 0 ETHERS . 0 1 HALOCARBONS 19 0 2 0 0 HETEROCYCLICS 6 0 5 0 5 HYDROCARBONS 7 0 10 0 3 KETONES 4 0 2 0 0 NITRILES 7 0 0 0 0 NITRO 1 0 0 0 1 POLYFUNCTIONALS 6 0 7 0 0 STEROIDS 0 0 0 0 1 SULFIDES 0 0 1 0 0 SULFONES 0 0 0 0 0 SULFOXIDES 1 0 0 0 0 1 TERPENES 3 0 0 0 0 THIOLS 2 0
TOTALS 89 2 50 0 33
8 IITRI.C6104-4 00000N0000000000r400000°1
1 I-100MMOMOM00000M0Mr4r400r4NI2 r-I N i
\1 Or4r4000000000r40r40000000011, M Z
LI N000000r4MWIN000000MOINN 0 M Z
000001000000r40N0I-100000001r.
00H000000000000000000001H
000000000000000000
r-10000000000N.0000000000
M m00.INON00,-10r403000N0000.-1
`8 .-100000000000N000000000
M M 04 A 0 H w 4 um 0 A M 1HZM
cf) U) MWAH MW A 0 OMM4 N 4ge P4 AH Wil :M 6'! 1W MI IITRI-C6104-4 0 0 0 H 0 0 0 0 0 00 00 0 0 0 0 0 0,-1 0I 01
0 .0000000000000000000010
00 0 0 0,-100 ,-100000000000001 8
000 0 0 000000000000001
000 ri0000 000rIOOON 0ririO øs
00ri00000riri000ri00N0000001
I 0 00000000000000000000000 II
.1 0000000000 00000ri00000001 ri
I000000000000000000000001 0
1 riri0000000ri00NO0Or-I0000001 0
cn a, r) u) H O (J) IX (J) UM Z O rzlrl w MHZ 0 Z 140 H (11 E-4 cn cn 2 Ho ovliq (11 QUI u)ra A H 4ti R O 1-1 H.., 0 Cil cn 8 u)ra 4(11 H 0 OU1M o cnix 0 o ra 04 0 E- M 8c4 fzi g M° R 0 A R 13:1 WO .52 0 Ei a, E-4 E-4 44 u rz 10 The list of the 500compounds in the data base shownin
Appendix A contains, inaddition to the IITRI numberassigned to a compound, common namesand the name designated byChemical Abstracts, and the dataavailability for each compound.
B. Gas Chromatography The analytical data ofqualitative value in gaschromato- graphy is retention time. Worker's have been trying tofind the best retention designationthat will be accurate,universally applicable and acceptable. The main difficulty is thelack of universal applicability; eachworker has a privatecollection of columns for analysis andtheir retention data areunrelatable to other workers'data. Most commonly used data arerelative retention values and a recentlyintroduced value called the 1 Kovats' Retention Index. We decided to use therelative retention values because of theirwide use and ease of measure- We ment. We also chose toluene asthe inttrnal standard. believe that for any futurework with this system we would
select another internal standardbecause under the operating
conditions we had selected, theretention distance of toluene
is somewhat short, possiblyresulting in relatively large
errors in measurement. A compound with alonger retention distance, such as hexyl or heptyl acetate, would bebetter
suited for this purpose. The operating conditions chosenfor our data base are
shown in Table 6. Two columns, Carbowax20M and Silicone SE-30 1Kovats, E., Z. Anal. Chem.,181, 351 (1961). lITRESEARCH INSTITUTE
11 IITRI-C6104-4 thermostatted at 160°C were selected. These columns are frequently used by gaschromatographers and offer excellent thermal stability and high-temperatureoperation if needed.
The liquid phases also arereadily obtainable from a multitude of distributors.
Table 6
OPERATING CONDITIONS FOR GASCHROMATOGRAPHIC RETENTION DATA
Condition A Condition B
olumn: Column: 20% SE-30 (Gen. Elec. Co.) 21.9% Carbowax 20M (Carbide 0.5% Polytergent J-300 & Carbon Chem. Co.) 60/70 mesh Celite 545(J-M Co.) 0.5% Polytergent J-300 3/8 in. x 12' Aluminum 60/70 mesh Celite 545(J-M Co.) 160°C 3/8 in. x 12' Aluminum 160°C Solvent: toluene & chloroform Solvent: methylene chloride Conditioned: 48 hr @ 180°C Conditioned: 40 hr @ 170°C 18 hr @ 160°C Carrier gas: Helium Carrier gas: Helium V toluene: 14.6 ml V toluene: 15.9 ml
We obtained gas chromatographicdata from two sources:
1. W. O. McReynolds' publishedcollection2
2. Data generated in our laboratory.
McReynolds' compilation includesrelative retention data
) for about 350 compounds on and Kovats' Retention Indeces(Ix 80 different substrates at twodifferent temperatures. We were
able to use about 180 compounds outof his compilation. Relative
times (Rv ) were calculated fromthe following equation.
R -R . x air Rw= Relative Retention = -R tol air
2McReynolds, W. 0.1 "Gas Chromatographic Retention Data, Preston Technical Abstracts Co., Chicago, In., 1966. lITRESEARCH INSTITUTE
12 IITRI-C6104-4 where Rx, Rt01 and Rair are measured retention distances for the compound, toluene and air,respectively. Although Kovats' retention indices are included in our datafile they were not used in the screening. Hence, no detail will be given concerning its computation except to say it is a numberthat relates the retention time of the unknown to the retentiontimes for the normal hydrocarbons which bracket theunknown.
In order to increase available gaschromatographic data we found it necessary to generatedata. At first, an attempt was made to assess theusefulness of literature data, ASTM published a gas chromatography data compilation(ASTM DS25A) whidh compiles data from the literature through 1965. Unfortunately, as Table 7 illustrates,relative retention distances obtained under similar operating conditions can varywidely making such data unusable for our purposes. Two conclusions drawn from the literature survey are:
1. It is not feasible to use literature sources
for standard gas chromatographic data
2. We were (and would be) restricted to the GCdata
we have on hand or can generateunder the conditions
used by McReynolds. Consequently, we requested and received permissionfrom
the National Science Foundation to generate gaschromatographic
data for 100 to 150 compounds. Mr. W. 0. McReynolds supplied
us with the Carbowax 20Mand Silicone SE-30 columnc he used and we generated data for about 125compounds.
HT RESEARCH INSTITUTE
13 IITRI-C6104-4 Table 7 VARIABILITY OF GC RETENTION DATA
ANI4ONNIMMINIEWMINNIIIM Commuricl Relative RetentionDataa On 20M Relative to Acetone Acetoldehyde 0.68 0.53 0.37 Propionaldehyde 0080 0.86 0.87 0.90 Butyraldehyde 1.40 1.35 1.35 1.34 Ethanol 2.20 1.27 1.40 1.48 Isovaleraldehyde 3.70 2.6 2.06 Formaldehyde 0.60 0.39 On 20M Relative to Toluene Dipropylether 0.20 0.26 Diethylether 0.68 0.11 Methanol 0.35 0.35 n-Amylacetate 2.08 1.82 n-Heptanol 5.73 4.7 n-Hexane 0.09 0.10 n-Decane 0.65 0.70 Acetone 0.29 0.31 0.21 2-Butanone 0.44 0.47 Benzene 0.62 0.64 Ethylbenzene 1.52 1.53 Methylformate 0.19 0.20 n-Butylacetate 0.99 0.94 Acetaldehyde 0.16 0.16 1.2 Butyraldehyde 0.38 0.42 On 20M Relative to Benzene Butylacetate 2.40 2.01 Cyclohexane 0.277 0.228 1-Butanol 3.58 308 Ethanol 0.74 1.29 On SE30 Relative to Cholestane Cholesterol 1.78 1.95 1.94 1.96 1.74 1.2 On SE30 Relative to Pirent Chrysene 2.78 2.81 2.81 2.27 2.08 1.83 On SE30 Relative to Pentane Dimethylbutane 1.35 1.24
0111. aEac column represents a differentliterature source or different column temperature. Some groups have only one entry per column; e.g.,cholesterol shows 6 different sources and/ortemperatures.
14 IITRI-C6104-4 The actual time spentin generating these data was approximately 20 man days. We estimate that, onthe average, wa can generateGC data for 10compounds during an 8-hour working day per instrument.
Our final data filefor gas chromatographycontained data for 308 compounds of which36 had data on SiliconeSE-30 only,
7 had data on Carbowax 20Monly and 265 had data onboth columns. Although wa are unhappyabout the lack of GC data wehave decided to retain it inthe system because of theexcellence of gas chromatography as aqualitative tool. Since there are little or no additional gaschromatographic data available
Which we could incorporateinto our file, any futuredata acquisition would requiregeneration. In order to include gas chromatography in the finalworking system, it will be necessary to standardize theoperating conditions such thatall users of the data retrieval systemwould obtain data oncolumns specially supplied for that purpose. This is a reasonablesolution to the perplexing problem nowexisting among gaschromatographers
regarding variability of"standard" data. The logic underlying
GC's qualitative abilityforecasts greater importance,increased emphasis, and improved data andtechniques for qualitative
analysis by GC. Thus it should be relatively easyto ask the scientific community to generateand contribute data by using
a prescribed setof conditions. Such a proposition would probably be accepted becausescientists have already accepted
HT RESEARCHINSTITUTE
15 IITRI-C6104-4 1 such systems in the areas of NMR, MS, X-ray diffraction, IR, etc.; universally accepted "standard" data that have been compiled are accepted by all for comparison with individual data.
ASTM is approaching the problem of standard GC data with their ASTM DS25A compilation, but, as evidenced by the data in
Table 7, unique data for each compound are not available and won't be until a set of standard conditions is agreed upon by the workers in this area.
C. Nuclear Magnetic Resonance
Probably the most difficult area for indexing data thus far encountered is the area of NMR. A number of systems have been developed by various workers none of Which is either completely satisfactory or widely accepted. The difficulties are inherent in the nature of NMR spectra.
In theory, and frequently in practice, a complete NMR spectrum for a given chemical structure may be generated bya computer. However, the reverse process, generation ofa structure from a spectrum has been too complex for satisfactory handling by computer methods. A skilled practioner may analyze the patterns present in a spectrum almost intuitively, but the rules by which such analyses are made become too complex for practical translation into computer language.
The purpose for which we are indexing these and all other data must be kept in mind, however. Our purpose is not to derive chemical structures from the data but touse the data PITRESEARCH INSTITUTE
16 IITRI-C6104-4 as a "map" to helpidentify unknown compounds and to direct us to the complete spectra ororiginal data source. The methods of indexing must be simple (free ofinterpretive quality) and amenable to computerization. With these goals in mind we can employ indexing methods which are oflittle value in terms of relating to structural characteristics but aresignificant in terms of defining, in an abbreviatedfashion, the uniqueness of the total data. The problems related to various NMRindexing systems may best be discussed with reference toparticular examples. The
NMR spectrum of diethyl phthalate(Figure 1) is a typical example of a relatively simple spectral patternin the format supplied by the Sadtler Research Laboratories, Inc. The molecule is
syn !trir:al and has only four chemically distinctkinds of hydrogen (indicated by a, bi c, and d). The pattern o f a
triplet at 1.35 ppm and a quadruplet at 4.29 ppmwith integral
ratio of 3 to 2 is immediately recognizable ascharacteristic
of an 0CH group. The pattern in the aromatic region(7.47 and 25 7.63 ppm) is typical of adjacent hydrogens on anaromatic ring,
further coupled to more distant aromatic ringhydrogens. It
should be noted here that the Sadtler assignments arechemical
shift values arising from interpretation of the spectrum,and
in the cases of b, c and d do notcorrespond to any actual
peak positions on the spectrum. They represent the centers of
multiplet patterns.
HT RESEARCHINSTITUTE
17 IITRI-C6104-4 PHTHALIC ACID, DIETHYLESTER
Mol. Wt. 222.33 B. P. 298-299°C/735 mm C12H1404 Kingsport, Tenn. Eastman ChemicalProducts, Inc., Source: ASSIGNMENTS cps a 1.35 Filter Bandwidth: 4 b 4.29 Sweep time: 250 sec cps c 7.47 Sweep width: 500t250 d 7.63 Sweep offset: -/225 cps Spectrum amp: 10 Integral amp: 25 Conc. -100mg/0.5m1 CC14
40 10 SI
141
1.1"11
!;1.4 4.0 1141 Si 00
Figure 1
NMR SPECTRAOF DIETHYLPHTHALATE
situation, theinstrumentation To furthercomplicate the evolution towardshigher of NMR isundergoing acontinuing widely usedinstruments atthe present magnetic fields. The most but 100 Mhzinstruments arecommercially time operateat 60 Mhz, studies at220 Mhz havebeen LI available andexperimental
reported.
HTRESEARCHINSTITUTE
18 IITRI-C6104-4 The significance ofthis evolution to our program maybe illustrated by reference tothe spectrum of diethylphthalate at
60 Mhz and at 100 Mhz. Two parametersdetermine the observed patterns in the NMR,the chemical shifttr, expressed in ppm
from trimethylsilane(TMS) and the spin-couplingconstants, J
in hz. The observed value ofgis dependent only on the magnetic field, and When expressedin the dimensionlessterms of ppm is
independent of the fieldstrength of the instrument. The value
of J is constant inhz, and is independentof the magnetic of 4.29 ppm. field. The quartet (b) indiethyl phthalate has a The observed peakpositions at 60 Mhz areshown in Table 8. Simple calculations givethe values at whidhthey would appear
in a spectrum run at100 Mhz. Note that the onlynumber which
is the same in both spectrais the chemical shiftg , which is does not a valueobtained by interpretationof the spectra and represent a peak position. In more complicated caseswhere
multiplet patterns overlap,the difficulties obviouslyincrease
rapidly.
Table 8
NMR PEAK POSITIONS FORDIETHYL PHTHALATE
60 Mhz 100 Mhz ppm hz ppm hz 4 4.29 257.5 4.29 429 Peak 1 4.12 247 4.19 419 2 4.23 254 4.26 426 3 4.35 261 4.33 433 4 4.47 268 4.40 440 = Chemicalshift of multiplet. lITRESEARCH INSTITUTE
19 IITRI-C6104-4 A simple versionof indexing observedband positions is the Sadtler Specfindersystem, a methodapparently derived from in each IR experience. In this systemthe most intense peak in 1 ppm region isrecorded, as well asthe most intense peak phthalate is presented the whole spectrum. The data for diethyl in Table 9 below. The range from 0-14 ppmis required to insure inclusion of allpossible compounds althoughpeaks are far from very rarelyfound beyond 9 ppm. While this system is ideal, from a practicalpoint of view it appearsto be best for
the purposes of thisstudy. The resulting series ofdigits re- present a set of dataWhich is characteristicof a relatively
small number of compounds,and which maynevertheless be readily obtained from rawpublished data. The problem of
peak position shift withchanges in operatingfrequency outlined
in the previous sectionis still present, butis minimized by the fact that the strongestpeak in a multiplet willbe near
its center, and theshift with frequencywill therefore be
small.
Table 9
NMR PEAK POSITIONS FORDIETHYL PHTHALATE ACCORDING TO SADTLER
Strongest Band 3 4 5 6 7 8 9 10 11 12 13 14 1 2 1.35 35 - - 23- -50
A more refined system,developed by VarianAssociates,
consists of recording therelative integratedintensity of the
peaks in each 1 ppm range. For our example ofdiethyl phthalate, lITRESEARCH INSTITUTE
20 IITRI-C6104-4 these data would have the form shownin Table 10. While this form of data recording contains alittle more structural information (i.e., the ratios of thevarious chemically different kinds of protons in the molecule)it is less sensitive for distinguishing between structural isomers sincediethyl isophthalate, diethyl terephthalate, and probably anyethyl esters of trisubstituted benzoicacids would give the same profiles. This is because the profile simply represents aratio of 2 aromatic protons to one ethyl ester group. The shape of the
aromatic signal would change markedly, butit would not be shifted outside of the 7-8 ppm region. An advantage of the system is
that the data may be transferred directlyfrom the spectrometer
to computer storage, but thiswould be useful only in a program
of data generation. Also, for our present purposes, a good deal of the requisite data would be unavailablefrom the literature
except via calculations from theoreticalintegrals since very
little integral data is published. We investigated this method through discussions with Dr. LeRoy Johnsonof Varian Associates
so as to have a completeappraisal of the method on hand. This method of indexing did not appear to bepractical for our
purposes.
Table 10
NMR PEAK POSITIONS FOR DIETHYL PHTHALATE ACCORDING TO VARIAN
pm 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 % - 43 - -29- - 29 - - =IN 011 =IN =IN
lITRESEARCH INSTITUTE
21 IITRI-C6104-4 being An experimentalTermatrex indexof NMR data is compounds were evaluated in an ASTMproject. In this system, number of lines per indexed by the number,position, area and
"cluster" in theirspectra, as well asby the presence or While the absence of certainfunctional groups orelements. it is too system seemsfairly effective onthe scale tried, and the cumbersome for aproject of themagnitude we propose, mentioned method of indexingis open to manyof the objections
above in connectionwith othertechniques. shifts Other workers havedevised indicesbased on chemical techniques all require and spin-multiplicities,but since those prior to storage of considerable interpretationof the spectra further data, we caneliminate thesepossibilities without
discussion. Therefore, the bestcompromise of reasonablydefinitive experimental data which areeasily yet briefdescriptions of the handled by computertechniques is theSpecfinder system. used We have usedthis approach inrecording our data but Some tests, which the first mostintense peak forsearching. using the first and will be discussedlater, were conducted second most intensepeaks. particularly where the A systembased on this approach, would require instrument is connecteddirectly to the computer, (60, 100 or 220Mhz). Data a standardcondition in frequency obtained at differentfrequencies would have tobe interpreted Although before presentation tothe data file forscreening.
lITRESEARCH INSTITUTE
22 IITRI-C6104-4 would prevailwith any file someldhat inconvenient,this is what would not detractfrom the that exists now. Such a compromise application usefulness of thesystem consideringits qualitative
and purpose. the Sadtler andVarian Our main sourcesof data were to these, wehave searched spectra collections. In addition increase the numberof several other sourcesin order to which we have allfive types compounds on ourselected list for Index to NMRLiterature of data. We have usedthe "Formula locate the spectraof a number of Data," VolumesI and II, to the MCA and APIcollections, which compounds. In addition to of theThermodynamics are handledby the DataDistribution Office (and which Research Center atthe Texas A&MResearch Foundation obtained copies ofthe Tiers'Tables, we have onhand), we have privately circulatedby a collectionof chemicalshift values Collection, circulated G. Tiers of3M in 1958,and the Humble in 1959. The Jeolco by N. F.Chamberlain ofHumble Oil Company Sadtler. Collection of spectrahas also beenobtained from chemical shifts Although the currentpractice is to report literature arereported in ppm fromTMS, the datain the older has provided some in a varietyof fortw. This situation but so far thequality interesting exercisesin calculation, conversion to the ppm of the data appearssatisfactory after arecalibrated in scale. For instance,the Humble spectra These values areconverted to mgauss,relative to benzene. the magneticfield strength ppm relativeto TMS bydividing by
lITRESEARCH INSTITUTE
23 IITRI-C6104-4 --""--1li'addinga constant tochange the reference from benzene to
TMS. Many of the spectra are run at30 or 40 Mhz, vihich
magnifies the spin coupling constantsrelative to the chemical
shifts. Further, a variety of solventshave been used, and solvent shifts must therefore be considered. However, qualitative inspection of the data andcomparison of data for
a single compound fromseveral sources indicate that the constancy of the values recorded in oursystem is adequate,
although it leaves something to bedesired in absolute
reproducibility. NMR data derived from literature sourcescomprised 6
percent of our total NMR data input. The majority were obtained
from Sadtler and Varian compilations.
D. Infrared Two basic approaches to indexing IRdata were considered.
These are the methods used by Sadtler andASTM. We decided to
use ASTM's approachwith the exception that we listed thepeaks
in order of decreasing intensity. ASTM's method is to list,
in order of wavelength to the nearest0.1 micron, the most
intense band plus all bands Whoseintensities are 10 percent or
greater of the most intense band. Sadtler, on the other hand, lists the most intense band within each 1micron region from
0 to 15 microns. Using a modified ASTM approach weindexed the three most intense bands (except 3.44Which was so
prevalent as to be relativelynon-discriminating) in order of
decreasing intensity. HT RESEARCHINSTITUTE Ii
24 IITRI-C6104-4 Indexed IR data forabout 700 compoundsfrom our list were obtained from ASTMthrough Dr. Kuentzel. About 1,800 punched
IBM cards wereprovided by Dr. Kuentzeland represented, in compound. many cases,multiple listings of IRdata for a given Because the 1,800ASTM IR pundhed cards werenot interpreted, printouts of the deck wereprepared to facilitatedata extraction from the cards. The ASTM cards recordall significant Because our IR peaks, but theydo not indicateintensities. search routine whichwill be discussed laterrequires data for the three most intensepeaks, it was necessaryto consult the original spectra(mainly Sadtler) todetermine intensities. from the ASTM deck, The intensities wererecorded on the printout complete source and the printoutwith intensities is now a more document for the projectthan the ASTM cards.
The entires representing542 spectra from SadtlerResearch Laboratories were allcharacterized by punches4 and 3 in column
79 and y and 1 incolumn 80. In columns 74through 78 the ASTM
number gave these innumerical order; thus it was amuch simpler
task to check thevalue of the Sadtlerspectrum and the accuracy
of the ASTM listing. Fully 10 percent of theSadtler spectra(1962 edition)
reviewed were poor enough toraise doubts concerningsomeof the acceptability can be absorption peaks. Usually this lack of attributed to concentrationcontrol (the samples weretoo thick techniques. or toothin), to impure samples, orto poor mulling these It is understandablewhy Sadtler haspublished a number of
PITRESEARCH INSTITUTE
25 IITRI-C6104-4 again as "series B." Furthermore, in almost 5 percentof the spectra viewed, ASTM had madean error either in interpreting the
spectrum or in failing to transcribeand keypunch the data
correctly. In either case, strong absorptions (notattributable to the solvent or mulling medium)were not listed, or medium
absorptions were listed incorrectly. Medium strength absorptions that had not been listedwere not counted as a mistake in the
above error analysis. This emphasizes the importance ofhaving trained personnel verifyany data placed in the data bank of the retrieval system.
While the intensities of the 3 mostintense bandswere accorded the intensities,per se, were not used in the screening
process. The intensitieswere used only to rank the three peaks
in order of decreasingintensity.
E. Mass Spectrometry
Indexing mass spectrometral dataproved relatively easy. The ten most intense peakswere indexed along with their
relative intensities. The only standard criteriarestricting the selection of the peaks isthat they be obtained with
electron bombarding energies of 70electronvolts, a widely
accepted practice. The most intense peak (base peak)was given an intensity value of 1000 and the remainderwere given values in percents of 1000. In our actual screening only thefirst 4 peaks (in order of decreasingintensity) were used. The intensity, per se, was not used inthe screening process.
HT RESEARCH INSTITUTE
26 IITRI-C6104-4 taken from two sources: All of the massspectral data were of Mass Spectral 1. ASTM PublicationNo. 356, "Index Pa., 1963. Data, 1st Edition,ASTM, Philadelphia, of Mass 2. Cornu, A., Massot,R., "Compilation 1966. Spectral Data,"Heyden & Son,Ltd., London, about 3,200compounds The ASTM publicationcontains data for compounds. While Cornu andMassot's has datafor about 5,000 compilation are alsofound Most of thecompounds in the ASTM
in the Cornu-Massotcompilation.
F. UltravioletSpectrophotometry through the program webecame About two-thirdsof the way chromatographic data. We concerned aboutthe lack of gas spectral data into decided to incorporateultraviolet (UV) identification of our file toprovide a broaderbasis for data for the838 compounds compounds. Accordingly, spectral three data sources: were soughtusing the following located at AbbottLaboratories, 1. Sadlter's compilation
North Chicago,Illinois Compounds," 2. "Ultraviolet Spectraof Aromatic Wiley & Sons,Inc., R. A. Friedeland M. Orchin, John New York, 1957 Spectra Index 3. "Ultraviolet andVisible Absorption Academic Press, New for 1930-1953,"H. N. Hershenson,
York, 1956.
The third sourceis a literatureindex that provided the major references to theoriginal data. Sadtler's file was HT RESEARCHINSTITUTE
27 IITRI-C6104-4 source of data. Because ASTM's UV spectral files were not
readily available, we did not use that source for UV data. The ASTM file has data for almost 26,000 compounds.
The majority of UV spectra have been obtained in solvents over a wavelength range of 200 to 400 millimicron (mg). The most common solvent was methanol. Hence, data for compounds run in methanol were selected. Our data were reported in the
range of 200 to 400 mg. We recorded only the strongest peak
and excluded extinction coefficients. The strongest absorption band was located to the nearest mg. A tolerance of +2 mg was used in the search. Selection of this tolerance was based on experience in reproducing wavelength readings that are subject to both instrumental and interpretational variations.
A sample that is transparent in the UV region is so indicated in the data file.
G. Survey of Signature Availability
A survey was made by information specialists between June,
1967 and March, 1968 to determine the overall availability of signature data from the major data sources in the country.
The American Society for Testing and Materials (ASTM) is responsible for the identification and indexing of the significant spectral collections throughout the United States. The ASTM index identifies virtually all spectra available from Sadtler
Research Laboratories, National Bureau of Standards (NBS),
Documentation Molecular Spectroscopy (DMS), Thermodynamics
Research Center (TRC) (which includes the American Petroleum lITRESEARCH INSTITUTE
28 IITRI-C6104-4 Institute (API) and ManufacturingChemists Association (MCA) collections), the Coblentz Society, theInfrared Documentation
Center, Japan (IRDC-Japan), andthe open literature, domestic
and foreign. As of March, 1968, ASTM hadin its files information concerning 100,000 IR, 3,200 MS, and25,749 UV spectra. A
subcommittee is working on the indexingscheme for NMR. A
trial NMR indexing scheme has beenprepared by Jonker Business
Machines, Inc. This scheme, organized for use ofthe Termatrex
system, is now under review. ASTM has also prepared an updated
GC Compilation DS25A(approximately 4,000 compounds) Which was published in late 1967.
ASTM expects to add 15,000-20,000IR spectra yearly to its
index. Some 9,000 MS spectra will be addedthis June from
European collections. Sadtler Research Laboratories have32,000 IR spectra,
4,000 NMR spectra, no GC, no MS, and22,000 UV spectra on
organic compounds. All Sadtler spectra are identifiedin the
ASTM indices. It is anticipated that 2,000signatures in IR and NMR will be added annually. Dr. Zwolinski, director ofthe Chemical Thermodynamics Research Center (TRC), states that there arepresently 13,000
complete spectra available at TRC. As of December 31, 1967, this collection consisted of 3,950 IR,1,246 UV, 2,651 MS, and
1,260 NMR spectra. All TRC spectra are identified andindexed
through ASTM. However, TRC expects to have its ownindex
lITRESEARCH INSTITUTE
29 IITRI-C6104-4 " +404,,Prv*IP,'411.4.0014/0.,,,
available in the Summer of 1968. The Coblentz Society, which also has its spectra identified and indexed through ASTM, has 5,000 complete IR spectraavailable.
These are sold by Sadtler Research Laboratories.
A tabulation of major spectra sources appearsin Appendix B.
Eadh table lists the major sources, number of spectraavailable, the form of the spectra, the cost, and the data sources. The tables are accompanied by a listing of other datacompilations.
Cost figures for data acquisition, cannot be estimated adequately on the basis of the procedures used in this research inasmuch as the procedures were experimental, varied and unique for the method of compound selection used. Methods for data acquisition will be different if future researdh is conducted. In general, data acquisition costs whidh will account for a large share of costs in an operatingsystem fall into three categories:
1. Data obtained from published sources and collections Cost of collection Cost of transcription
2. Data measured from published sources
Cost of measurement
Cost of transcription
3. Data obtained from laboratory measurements
Cost of compounds Cost of manual processing Cost of data reduction and transcription, or Cost of automatic processing. lITRESEARCH INSTITUTE
30 IITRI-C6104-4 obtained from acombination The data placedin the file were processing of of the threecategories exceptthat automatic employed in thisphase of the laboratory measurementswas not study.
Design H. Cataloguing Data andData Master Card inclusion into thedata For every compoundselected for compound. file there is a masterdata sheet onfile for that the selection A searchmedhanism presentlyhas as its purpose gheets which represent of one to, at most, afew of such master match the inputdata. The the compound(s)whidh most closely the indexed data data contained onthe 8 1/2 x 11form include plus the originalsources for the analyticaltedhniques involved
of data. project is found A sample ofthe master sheetused in the compound used in in Appendix C. A gheet wasprepared for each includes: the file. The information onthe gheet Compound name - Thecompound will belisted by IUPAC name, andtrivial or trade namesare included for crossreference from CAS registry -The CAS registrynumber, obtained number the SOCNA, Handbookalso aids in identifying the compound is listed GC data - The gaschromatograPhic data R for Carbowax20M and SE-30columns. Both the relativespecific retention
lITRESEARCH INSTITUTE
31 IITRI-C6104-4 volume and the Kovats retention
index appear
MS data - The mass-to-charge ratio islisted above the relative intensities in
decreasing order
IR data - The three most intense infraredpeaks (excluding 3.4) are listed
in order of decreasing peak
transmittance
NMR data - The NMR is listed as it appearsin the Sadtler Specfinder Index. The strongest
peak and solvent are indicated
UV data - The most intense Land is listed tothe nearest millimicron (mg)along with
the solvent used.
Structure, empirical formula,melting point, boiling point, physical state and the original data source are also indicated on the master data sheet.
IV. TESTIVG SIGNATURE CONCEPT
&Igsperimental Design
1. Objectives
A search experiment wasdesigned to determine the feasibility of identifying chemical compounds bymatching their signatures
with those of known compounds. The experiment tested the hypothesis that an unknown compound canbe identified by lITRESZARCH INSTITUTE
32 IITRI-C6104-4 sequentially searching adata fileconsisting of a minimum signature number of significantdata points ineach designated of the datamatches. and obtaining thelogical intersection tolerances, and the By carefulselection of thedata points, the that the intersectionof matching criteria,it was anticipated compounds, the matches wouldcontain a small setof candidate one of whichwould be thedesired compound. provide data for The experiment wasalso designed to match criteria, establishing searchprocedures, formulating power ofthe five and investigatingthe discrimination objectives signatures in thedata base. These auxiliary of the experiment arelisted below. match criteria in (1) To investigatethe following the search ofeach signature:
(a) Data parameters Number of peaks Magnitude of peaks Ranking of peaks
(b) Data tolerances
(c) Conditions of dataacquisition Solvents (whenapplicable) Standard measurementconditions capabilities of (2) To determinethe discrimination
each signaturefor individualcompounds, chemical groups, andall the compoundstogether
UTRESEARCH INSTITUTE
33 IITRI-C6104-4 (3) To measure theselectivity of sequential search
by:
(a) Combining searches
In10combinationstakentwo at a time In10combinationstakenthree at a time In5combinationstakenfour at a time
(b) Determining the minimum numberof signatures required to achieveidentification of an unknown
(4) To develop a measure ofsearch effectiveness that
incorporates:
(a) Rapid screening
(b) The least number of searchoperations
(c) The maximum selectivity
(5) To investigate a weightingsystem for evaluating
and ranking compounds.
2. Test Inputs The experiments consisted of a setof searches with three categories of inputs:
Input 1: Compounds having all five signaturesavailable in the data base were used as"unknowns."
Input 2: Compounds having differentcombinations of signatures were used as unknowns;these data
were not includedin the data base.
Input 3: Experimental data derived fromin-house laboratory measurements werematched against
stored data.
IIT RESEARCHINSTITUTE
34 IITRI-C6104-4 Input 1 data were used to determine the ability of the search tc; make positive identifications, given that the compounds are stored in the data base. Input 2 data were used to test the ability of the search to discriminate against closely related compounds Which had data stored in the data base. Input 3 data were used to test the selectivity of search under non- controlled input conditions that may be encountered in practice.
Input 1 and 2 compounds served essentially the same dis- crimination-testing function. By eliminating an Input 1 compound from a candidate list, an Input 2 condition was established. The remaining candidates are those that would be obtained if the input data were not available in the data base. The Input 3 compounds were used to test the tolerance limits that would be required in an operational system in Which data describing unknown compounds to be identified would be derived from laboratory measurements made externally or at
IITRI.
3. Tests All "unknown" compounds from inputs 1, 2, or 3 were processed in the same manner. Input data were inserted in designated locations on the data sheet shown in Appendix D.
The data base, stored in an inverted-term-Termatrex system (described in Report No. IITRI-C6104-2) was searched, one signature at a time, and the numbers of compounds whose stored data matched the input data accoluing to the match criterion were obtained.
lITRESEARCH INSTITUTE
35 IITRI-C6104-4 To obtain data on matchcriteria and discrimination capa- bilities, the search in each signature wascarried out independently without reference to previoussearches. To obtain data on selectivity and search effectiveness, acumulative search was
then made by selecting the numbersof compounds in the inter-
section of all previous searches withthe currnt search; these data were tabulated on the searchrecord shown in Appendix E. Selected candidate compounds that werethe intersection of
the matdhes in all availablesignatures were entered on the
candidate list shown in Appendix F.
Two series of tests were made. The first test was based
upon the formulation ofinitial match criteria. The second tests
were developed to explorethe effects of modified matchcriteria
on search precision.
a. Initial Match Criteria
The initial match criteria forthe seven steps of the
sequential search were:
1. State: solid/liquid/gas
2. Elements: combinations of significant elements
as coded in column32 of the ASTM punched cards for
infrared spectra
3. Infrared spectroscopy: match on any two of the first three highest peaks of spectrum. The 3.44 peak (C-H band) was omitted from the input data
4. Nuclear magnetic resonance: match on the highest peak; solvents were coded but wheredisregarded
HT RESEARCHINSTITUTE
36 IITRI-C6104-4 5. Mass spectroscopy: match on any two of the four highest peaks; matches were made on the mass numbers
of ranked peaks but not on the relative amplitudes
6. Gas chromatography: match on relative retention time of either one of two columns depending upon
availability of data
7. Ultraviolet spectroscopy: match on the most
intense band.
b. Modified Match Criteria
Following an evaluation of the precision of the search results obtained with the initial match criteria, several modifications of the criteria were made to assess their effect in improving precision.. These modifications iacluded the following criteria used singly and in combinations:
1. State: no change
2. Elements: no change
3. Infrared spectroscopy: no change
4. Nuclear magnetic resonance: match on two
out of the two highest peaks
5. Mass spectroscopy: a) same as initial criteria except the 1/4 and 4/1 input data/stored
data peak matches were deleted; b) match on three
out of four peaks
6. Gas chromatography: match on relative retention
time of both columns A and B
7. Ultraviolet spectroscopy: no change.
HT RESEARCH INSTITUTE
37 IITRI-C6104-4 4. Candidate Ratings Ratings were used to evaluatethe probability that an unknown compound was a compound onthe candidate list in accord- ance with weightsassigned to the signatures andwith the close- ness of the match. The maximum ratings assigned tothe five types of signatures and to the state and theelement searches are:
MS 40
IR 36
NMR 24
GC 18 (9 points per column)
UV 6
Elements 4
State 2
130
To test the tolerance limitsin each signature, the rating scale considered exact matches ascontrasted to matches within the tolerance range. A skip was rated 0, and adefault was rated 1. The rating table for a candidatelist is shown in
Appendix G.
B. Search Procedures
1. Search Logic A composite chemical signatureof a compound consists of
a representationof the significant parameters of itscomponent
signatures. The composite signature for acompound can be HT RESEARCHINSTITUTE
38 IITRI-C6104-4 represented as
Ci = f (Mi, Ri,Ni, Gil Ui.), i =(1, 2, n) where C = compound M = mass spectrometrysignature
R = infraredspectroscopy signature
N = nuclearmagnetic resonancesignature
G = gaschromatography signature
U = ultravioletspectroscopy signature. Each signature, in turn, canbe represented bythe para- meters selectedfor the experiment: a a a 3 4 2 if i : M : a , M : , M M.=f (M 1 2 a 3 a 4 al 1 i 1 i 1 i i i i where M= maximumpeak 1 a= amplitudeof maximum peak =unity 1 a4 and 4th peaks M. : = rankorder of 2nd, 3rd, 3 Al
R. = f (b b , b ), 1 ' 2 3 i i i
= threehighest absorptionbands with transmittance Where b1,2,3 equal to or greaterthan 10% of the mostintense
band.
N= f(h ), i max.
6 = 0 x 10 = chemicalshift. where hmax. max
Gi = f (gAi gBi),
(dimensionless), and A and B Where g = relativeretention time
are columnsdescribed in Table6. HT RESEARCHINSTITUTE
39 IITRI-C6104-4 U. = f
the most intensepeak. where4max The objective of asearch aimed atidentifying an unknown compound is to yield aminimal set of compoundsthat satisfies the Boolean expression
C3. = tcrniin(Cail (c4fa)(CGil
designate the wherefil=logical AND, and thei and j subscripts stored known andinput unknown compoundsrespectively. and E preceded the State and elementssearches, Si i' composite signaturesearch in the experiment.
2. Com uter Search Desi n
A generalizedsearch procedure capableof being implemented
on a computer wasdeveloped to providethe Boolean search and described pre- to serve theobjectives of the experiment as to every viously. A data availabilitycode was assigned unknown compound whoseidentificationwas sought and to every or stored (known) compound. The code indicates the presence
absence of data uponwhich a match can beattempted. The absence
of data for a parameterin an unknown compound,for example, precludes a searchthrough the relevant parametersof the known
compounds. In a computer search,each step would startwith a match compound and on thedata availability codesof an unknown
the candidatecompounds. A search routineCSEAR would be
lITRESEARCH INSTITUTE
40 IITRI-C6104-4 Table 11 SEARCH TABLE (CSTABL)
CSEAR 1 STATE
PARAM 1 Solid/Liquid/Gas
CSEAR 2 ELEMENTS
PARAM 2 As specified by unknown
CSEAR 3 MS
/m (mass with highest amplitude) 1 M (mass with 2nd amplitude) PARAM 3 2 (mass with 3rd amplitude) 3 M (mass with 4th amplitude) 1M4 CSEAR 4 IR
PARAM 4 42 (second peak) 43 (third peaY)
TOLER 4 +0.14
CSEAR 5 NMR
PARAM 5 PPMmax TOLER 5 +0.1 PPM
(PARAM) Solvent (not used at present)
CSEAR 6 GC {Relative Retention Time, Column A PARAM 6 Relative Retention Time, Column B
TOLER 6 (See Section IVD1d)
CSEAR 7 UV
PARAM 7 Strongest Peak
TOLER 7 +2m4 lITRESEARCH INSTITUTE
41 IITRI-C6104-4 called in from a search table CSTABLand the appropriate parameter(s) PARAM and tolerance(s) TOLER would beinitialized
in accordance with Table 11.
The set of compounds that matchedin each stage would be placed in a candidate list togetherwith ratings as per the
rating table. Only the signatures of compounds on thecandidate lists would be seardhed in subsequent stages. Surviving candidates
at the end of all stages would beprinted out in order of
their accumulated ratings. Upon completion of the search, the resultswould fall into
one of four categories:
1. Zero selection - no compounds
2. Plural selection - more than one compound
3. Conditional selection - one compound isolated with less than "perfect" rating; identification
cannot rule out other possibilities
4. Unique selection - positive identification of single compound having a "perfect" rating.
3. Termatrex Experiment The search experiment described above wasimplemented
by using an inverted file with opticalcoincidence (Termatrex
system) to demonstrate the feasibility ofidentification by
sequential signature matching. The logic and search procedures, Which were designed for a computer, werecarried out using
drilled cards and appropriate tally forms.
lITRESEARCH INSTITUTE
42 IITRI-C6104-4 C. Search Measures
1. Availability andDefault Ratios unknown compound by The feasibilityof identifying an sequentially searching adata file consistingof a minimum
number of significantdata points indesignated signatures and on depends on gooddiscrimination in eadhsignature search
fine selectivityin screening thedata base. As used in thisreport, discriminationis the capability well-defined to separate setsof compounds onthe basis of characteristics of asignature. Selectivity is thescreening
capability of asequential search. Precision is the accuracy
of identification. Because data are notuniformly availablefor all compounds must be considered as a in all signatures,data availability selectivity, factor in formulatingmeasures fordiscrimination, actions are and precision. Accordingly, thefollowing search
defined. given signature of a A match occurs whendata stored for a tolerance range ofinput known compound areequal to or in the for a givensignature data. A default occurswhen input data compound does nothave of an unknown areavailable and a stored of stored data data for thatsignature. The nonavailability precludes obtaining amatch but does notrule out the possi- data is a candidate. bility that thecompound with no stored of an A skip occurswhen input data for agiven signature seardh of unknown compound arenot available. In this case, a HT RESEARCH INSTITUTE
43 IITRI-C6104-4 the stored data in the sic;-aturefile is not possible.
The availability ratioof a signature is the ratio of the number of compounds having dataavailable to the total number of compounds in the data base. The default ratio of a signa- ture is the ratio of the numberof compounds not having data to the total number ofcompounds in the data base. These ratios can be expressed mathematically: a =Eff3=E=1- a,where
C is the number of compoundsin the data base
S is the number of compoundswith available data in a given signature D is the number of compounds with nodata in a given signature
a is the availability ratio
is the default ratio.
Table13 lists the availability and thedefault ratios of eadh signature in the 500-compounddata base.
2. Discrimination The discrimination factor of asignature is a function of the number of matches and thedefault ratio inasmuch as the number of possible successfulmatches is reduced by the non-
availability of data. The defaults can be likened to a seaof
uncertainty upon which successfulmatches float. The discrimi-
nation factor, b, is expressed as: =Mxp where M is the number of matchesobtained in a signature seardh.
HT RESEARCH INSTITUTE
44 I1TRI-C6104-4 C=73 C=f Table 12 SIGNATURE AVAILABILITY AND NUMBER OF DEFkULT RATIOS SIGNATURE MS COMPOUNDSNUMBER OF 500(C) SIGNATURESAVAILABLE (S)478 NUMBERDEFAULTS OF (D) 22 AVAILABILITY 0.956RATIO (a) DEFAULT RATIO0.044 (p ) IRNMR 500500 470357 192143 30 0.7140.9400.616 0.0600.3840.286 UVGC TOTALS 2500 500500 1846 233308 654267 0.4660.738 0.5340.262 3. Selectivity The selectivity of asequential search measuresthe si.ev- ing capability of thesearch and deals withthe intersections the ratio of the signature matches. Selectivity is defined as of the number ofsurviving candidates atthe end of a stage to the total number ofcompounds in the database. Inasmudh as compounds with defaults canbe present in thecandidate list, Expressed mathe- an adjustmentfor defaults is not necessary. matically,
2,...n) SelectivitYk= nk = 14-4alis = 1, where subscript k designates aseardh stage
n is themaximum number of stages Mk is the number ofmatches at the endof the k-th stage.
The sequential searchconsisted of seven stages: state, elements, and the fivesignatures (MS, IR, NMR,GC, and UV).
The final search results areindependent of the orderof search
inasmuch as intersection iscommutative. However, the search effectiveness, that is, therapidity of screeningand the total
number of compounds thatmust be seardhed,is affected.
4. Precision The most critical ofthe measures thatdescribe the
results of a sequentialsignature search isprecision. Precision 1 and is the accuracy ofidentification. In the case of Input
Input 3 compounds, perfectprecision would resultin the
lITRESEARCH INSTITUTE
46 IITRI-C6104-4 compound. identification of theinput compoundand only that precision would re- In the caseof Input 2compounds, perfect The elimina- sult in therejection of allcandidate compounds. of more than one tion of the"true" compound(or the presence identification of candidate) in theformer case andthe "false" would decreasethe one or morecompounds in thelatter case precision. is an arbi- Retention of acompound on thecandidate list number ofsignatures trary decisionthat depends onthe minimum threshold in therating scale. required forscreening and on a be increased By use of thesetwo quantities,the precision can discrimination by meansof or decreased. Similarly, enhancing criteria alsoincreases pre- tightening matchtolerances and
cision. selected Precision is measuredby the numberof candidates
in identifying acompound. Match Criteria D. Test Resultswith Initial based upon theinitial The resultsreported below are
match criterialisted in SectionIVA3a.
1. Discrimination number of matdhes,the The range ofmatches, the average and thediscriminationfactor for averagenumber of candidates Input 1 compoundsare shownin each signaturefor the 100 the number ofposi- Table13. The match rangecolumn indicates in eachsignature tive matches(less defaults)that occurred minimum number ofmatches obtained wlth "low"referring to the
lITRESEARCH INSTITUTE
47 IITRI-C6104-4 Table 13 FOR 100 KNOWN COMPOUNDS MATCHRANGE SIGNATURE DISCRIMINATION NUMBER OFAVERAGE NUMBER OF NUMBER OFAVERAGE DISCRIMINATION SIGNATURE (LESS DEFAULTS)LOW 1 HIGH153 MATCHES 49.83 DEFAULTS 22 CANDIDATES 71.83 FACTOR 2.19 MSIR 1 8459 30.3931.22 143 30 174.22 60.39 8.931.82 NMRGCUV 11 8539 36.7711.74 267192 203.74303.77 19.64 4.51 number. in the 100 searchesand "high"referring to the maximum compounds in There was at least onecompound among the 100
each signature that wasthe sole selectedcandidate, exclusive in the MS of defaults. The largest rangeof matches occurred of sig- signature, the smallest rangein GC. The distribution the nature matches isshown in Figure 2. Both the mean and medians are listedtogether with the rangeof matches including
defaults. The effect of thedefaults is clearly seen asthe distri- The butions of the signatures areshifted on theabscissa. distribution is due tothe 34 peak at 351-360matches in the UV compounds that aretransparent to UV.
a. MS Discrimination
The MS search rankedsecond in discriminationamong the seardh five signatures. An average numberof 49.83 matches per With from an availabledata base of 478compounds was found. candidates were found, onthe the 22 defaults, atotal of 71.83 The dis- average, foreadh of the 100 knowninputcOmpounds. crimination factor was2.19.
The criterion ofmatch was equalitybetween at least two in the offlour peaks. Because numerouspermutations were found ranks of the four majorpeaks among differentdata sources for combinations of a singlecompound, any two matchesfrom the 16 input and stored datapeaks were accepted. The rating table match of (Appendix G) takes permutedmatches into account, a match input peak 1 againststored peak 1 ascontrasted wlth a
UT RESEARCHINSTITUTE
49 IITRI-C6104-4 SIG. MS MEAN71.83 MEDIAN48.00 RANGE23 115 30 IR NMR NMRIR 60.39174.22 56.33176.00 144 31 202114 1020 rms04LNIa111) GCUV 303.77203.74 200.40280.77 268193 352231 50 0 MS UV 4030 i....OAIII..) 1020 0 co 0 co 0CN 0 0 0cN 0 0 0:11 0re) DISTRIBUTION OF SIGNATURE MATCHES NUMBER OF MATCHES Figure 2 'of input peak 1 againststored peak 4, forexample. the permuted peaks . e A summary of the MSmatches obtained on 14. of a sample of 25known input compoundsis presented in Table (maximum) number The range indicatesthe low(minimum) and hiah
of candidates obtainedin the searchesfor the 25 compounds. The results of the1 on 1, 3 on 3,and 4 on 4 matches were as expected, namely thatthe average numberof matches was great- match was there est in thesepermutations. Only in the 2 on 2 The 1 on 1 matches a reversalof order with the 2 on1 match. provided the largest averagenumber of matches,followed closely
by the 4 on 4matches. Since there is notolerance on an MSpeak, discrimination smaller) by of the search can beincreased (i.e.,co can be made (1) increasing the numberof peaks for whichmatches are re-
quired, e.g., three outof four; (2) increasingthe number of
peaks searched, e.g.,from four to five orsix, and raising the
match criterionaccordingly; (3) introducingrelative ampli- peak tudes, and (4)eliminating some of the16 combinations of
seardhes now employed, e.g.,input peak 1 mustmatch on any one
stored data peaks 1,2, or 3, but not4. To evaluate the effectof changing thenumber of peaks re-
quired for MS matching,Table 15 was prepared. The table lists criteria the average numberof matches obtainedwith the varying The of 1 out of 4 to 4 outof 4 peaks, exclusiveof defaults. matches on one peakonly, term "exclusive"in Table 15 refers to The two peaks only,three peaks only, andfour peaks only.
HT RESEARCH INSTITUTE
51 IITRI-C6104-4 Table 14
MS MATCHES ON PEAK PERMUTATIONS (25 Input Compounds)
PEAKS RANGE AVERAGE NUMBER INPUT STORED LOW HIGH OF MATCHES
1 2 78 34.76
2 2 37 19.92 1 3 0 68 16.24
4 1 69 14.12
1 0 78 19.20
2 2 37 15.52 2 3 2 68 13.00
4 0 69 12.08
1 0 78 12.20
2 1 37 13.08
3 3 2 68 20.00
4 1 69 15.80
1 0 78 16.40
2 0 37 16.92
4 3 0 68 24.88
4 1 69 29.16
52 IITRI-C6104-4 term "inclusive" refersto every instanceof one, two, three, or four mai:ches;for example the numberof matches on one peak includes the number of matches ontwo, three, and fourpeaks.
Table 15
NUMBER OF CANDIDATECOMPOUNDS OBTAINED WITH DIFFERENT MS MATCHCRITERIA (100 Known Input Compounds)
NUMBER OF MATCH RANGE AVERAGE NUMBER PEAKS ___(INCLUSIVE) OF MATCHES INCLUSIVE MATCHED LOW HIGH EXCLUSIVE 1
1 3 313 103.54 153.37
2a 1 153 40.23 49.83
3 2 38 7.85 9.60
4 1 6 1.75 1.75
aCriterion used for the experiment: matches on2 or more peaks. The average of 49.83 matches(exclusive of defaults) ob-
tained with the criterion ofmatching on 2 out of 4peaks would
be reduced to 9.60 if3 out of 4 peaks were thematch criterion
A 4 out of 4 criterionwould further reduce the averagenumber of candidates per searchto 1.75.
b. IR Discrimination The IR search was the mostdiscriminating of the five
signatures. An average number of30.39 matches per search was found from an available database of 470 compounds. With the
30 IR defaults, a total of60.39 candidates werefound, on the
average, for each ofthe 100 known inputcompounds. The dis-
crimination factor was 1.82. lITRESEARCH INSTITUTE
53 IITRI-C6104-4 compounds, The peak at3.44, characteristicof hydrocarbon and stored data. The 5.8/2 had been eliminatedfrom all input data and was avalid seardh peak, however, wasincluded in the stored data wereequal to value. A match wasregistered if the data. Permutations or withinthe tolerance rangeof the input MS seardh, i.e.,IR in- of ranked peaks wereseardhed as in the 2, and 3, etc. put peak 1 wassearched againststored peaks 1, permuted peaks A listing ofthe IR matdhesobtained on the measured in MS appearsin Table of the same 25compound sample obtained on the 1 on1 and 16. The average numberof matches categories. 2 on 2 peaks werethe highest intheir respective than The number of 3 on2 peak matches wereslightly greater
the 3 on 3 matdhes. Table 16
IR MATCHES ONPEAY PERMUTATIONS (25 InputCompounds)
PEAKS RANGE AVERAGE NUMBER OF MATCHES INPUT STORED LOW HIGH
1 10 71 41.72 22.12 1 2 13 39 3 11 24 18.44
1 1 71 24.36 27.28 2 2 2 57 , 2 67 26.20
1 2 64 26.40 34.56 3 2 4 59 3 9 67 33.92
. _ ii HT RESEARCHINSTITUTE
54 IITRI-C6104-4 Li The number of candidatecompounds obtained with1-, 2-, and 3-peak matches isshown in Table 17. The rarge end exclu- sive-inclusive definitions arethe same as thoseaccompanying
Table 15.
Table 17
NUMBER OF DIFFERENTCOMPOUNDS OBTAINED WITH DIFFERENT IR MATCHCRITERIA (100 Known InputCompounds)
NUMBER OF MATCH RANGE AVERAGE NUMBER PEAKS (INCLUSIVE) OF MATCHES MATCHED LOW HIGH EXCLUSIVE INCLUSIVE
1 38 282 161.28 191.67
2a 2 84 27.05 30.39
3 1 12 3.34 3.34
aCriterion used for theexperiment: matches on2 or more peaks. The 2-peak match criterionresulted in an averagenumber
of 30.39 matches perseardh (not includingdefaults), whereas 191.67 a single peakmatch would have resultedin an average of 3-peak matches. Greater discriminationwould be adhieved by a match criterion with an averageof 3.34 matches persearch.
C. NMR Discrimination
The NMR signature, with adiscrimination factor of8.93,
ranked fourth. An average number of31.22 matches per search from an available data baseof 357 compounds wasfound. To- gether with the 143 NMRdefaults, an average of174.22 matches
per search wasfound. The criterion of search was amatch on When the strongest peakwithin the tolerancelimit of +0.1ppm. HT RESEARCH INSTITUTE
55 IITRI-C6104-4 two peaks were ofthe same magnitude,eadh peak was assigned a different compound numberand the data for allsignatures; were entered for the twonumbers. Crotonaldehyde, for example,had nearly equal NMRpeaks, after rounding, of 2.1 ppmand 2.0 ppm. The former value was assigned IITRI number337; the latter wasassigned 961. In the search for crotonaldehydewith an NMR peak of2.0 ppm, compound
337 was selected, with arating of 130 points. It is interest- ing that since compound961 was within thetolerance range of 2.1 ppm, compound 961 wasalso selected,although its rating was 126, becauseof the tolerance match. The discrimination of theNMR search can beincreased by dhanging the criterion ofmatdh to two or more ofthe most in- tense peaks and by matchingsolvent data. Matching solvent data was not employedin the search, and itwould not improve the discrimination to the degreethat adding additionalpeaks would.
d. GC Discrimination The criterion of the GCsearch was a match withthe input
data of one of two columnswithin the tolerance limit. Because
of the wide range ofrelative retention times,the tolerance
was madeproportional to the time inthe following manner: 0.01-0.99: +0.01
1.0-9.9: +0.1
10-99: +1.
HT RESEARCH INSTITUTE
56 IITRI-C6104-4 The GC signatureprovided the lowest averagenumber of matches per search,11.74, but thenonavailability of data on
192 compounds reduced thediscrimination factor to4.51; GC thus ranked third indiscrimination. Table 18 summarizesthe
GC signature search. The discriminationof the GC signature canbe improved by using Column A alone orColumn B alone insteadof the inclusive
OR criterion usedin the experiment. Even greater discrimina- tion would result fromusing the intersectionof the two columns, as shown inTable 18.
Table 18
NUMBER OF CANDIDATECOMPOUNDS OBTAINED WITH DIFFERENT GC MATCHCRITERIA (100 Known Input Compounds)
NUMBER OF MATCHES AVERAGE MATCH INPUT RANGE NUMBER OF CRITERION DATA LOWHIGH MATCHES
Column A 96 1 26 7.62
Column B 83 1 20 6.32
A and B 79 1 7 1.44 a N or B 100 1 39 11.75
aCriterion used for the experiment.
UTRESEARCHINSTITUTE
57 IITRI-C6104-4 The GC signatureprovided the lowest averagenumber of matches per search,11.74, but thenonavailability of data or
192 compounds reducedthe discriminationfactor to 4.51; GC thus ranked third indiscrimination. Table 18 summarizes tho
GC signature search. The discrimination ofthe GC signature canbe improved ] using Column A alone orColumn B alone insteadof the inclus
OR criterion usedin the experiment. Even greater discrimin tion would result fromusing the intersectionof the two col as shown inTable 18.
Table 18
NUMBER OF CANDIDATECOMPOUNDS OBTAINED WITH DIFFERENT GC MATCHCRITERIA (100 Known Input Compounds)
NUMBER OF MATCHES AVERAGE MATCH INPUT RANGE NUMBER OF CRITERION DATA LOWHIGH MATCHES
Column A 96 1 26 7.62
Column B 83 1 20 6.32
A and B 79 1 7 1.44 a N or B 100 1 39 11.75
aCriterion used for the experiment.
HT RESEARCHINSTITUTE
57 IITRI-C6104- e. UV Discrimination
The UV signature was the least discriminating of the five signatures. Its average output of 36.77 candidates per search adied to the 267 compounds for which UV data were not available resulted in an average of 303.77 candidates per search. The discrimination factor was 19.64.
2. Selectivity
a. State and Element Searches
The distribution of state and elements data for the 500- compound data base and the 100 test compounds is shown in Table
19. The liquid state category eliminated 15 of 100 compounds; the solid stata eliminated 86. The gas category provided unique identification. The selectivity of the two stages is given in
Table 20.
HT RESEARCHINSTITUTE
58 IITRI-C6104-4 Table 19
STATE AND ELEMENTSDATA AVAILABILITY
_ INPUT 1 SEARCH PARAMETER DATA BASE 500 COMPOUNDS 100 COMPOUNDS
Solid 95 14 85 STATE Liquid 387 Gas 18 1
Bromine 25 1 Chlorine 43 7 Fluorine 3 Iodine 9 ELEMENT* Lead 1 Nitrogen 80 9 Oxygen 300 63 Sulfur 16 6 No Hydrogen 7 1 No designation 70 23 _ *Data base and Input 1columns total moreth n 500 and 100 because more than oneelement was listedfor many compounds.
Table 20
SELECTIVITY OF STATE ANDELEMENTS SEARCHES (100 Known InputCompounds)
AVERAGE NUMBER OF STAGE SURVIVING CANDIDATESSELECTIVITY
State 342.43 0.685
Elements 195.11 0.390
lITRESEARCH INSTITUTE
59 IITRI-C6104-4 The effectivenessof the preselectionis indicated clearly compounds that was by the 61 percentreduction in candidate obtained at the endof the elementssearch.
It was anticipatedthat the state andelements searches would serve only as apreselection tool toreduce the number Although this of compounds to beseardhed in thesignatures.
objective was adhieved,it was found thatthe state and ele- of a number of ments searches alsoaided in identification
compounds.
b. Signature Searches experiments to Two sequences weretested in preliminary ordered on test searcheffectiveness. The first sequence was in accord- the basis of thenumber of data pointsto be searched NMR, UV, ance with thematch criteria. The order chosen was crotonalde- GC, IR, and MS. The selectivityfor compound 337, hyde, is Shown inFigure 3. The state andelements searches ordered were omittedfrom Figure 3. The second sequence was with the on the basisof the defaultratio/ the signature chosen fewest defaults beingsearched first. The seardh order for crotonalde- was MS, IR,NMR, GC, and UV. The selectivity hyde via this sequenceis Shown in Figure4. The presence of 143defaults in the NMRdata and 267 in
the UV data accountedfor 80 candidatecompounds that were carried into the thirdstage of search, asindicated in Figure the IR data, re- 3. The 22 and the 30defaults in the MS and spectively, gave rise to nodouble-default candidates, as lITRESEARCH INSTITUTE
60 IITRI-C6104-4 Signature Discrim-ination 337: Crotonaldehyde Selectivity 176 176 - NMR 411) 277 143 411) NMR NMR143 104 UV 10 267 1110 t 80 cs) NMR-UV NMR.UV NMR-EV NMR-UV 205 00 OW 26 i GC 49 11921 9 1110111 1111111111 NMR-UV-GC 1 9 J 44 MOONMR-UV-GC NMR-UV-GC 3 _r____te,-..str____ NMR-UV-GC RMR.UV-GC NMR-UV-GC IR 14 30 an I Is.....01 2 MS 47 69 22 NMR-UV-GC-IR 1 NMR-UV-GC-IR 0 MATCH DEFAULT 00000NMRUV-GC-IRMSSELECTIVITY WITH SEQUENCE Figure 3 ORDERED BY MATCH CRITERIA Discrim-ination 337: Crotonaldehyde Selectivity Signature 47 69 22 47 69 22 MS . IR 14 44 30 MS- MS 6 NMR 176 143 MSIR MI MS.IR GC 205 192 alnMSIRNMR elell al MSIR-NMR2 MS-IR-NMR 4111) 10 277 MS - IR -NMR 1 - GC MN=MS - IR-NMR-GC 0 MATCH DEFAULT UV 1267 41110111111100MS.IRNMR.GC-UV SELECTIVITY WITH SEQUENCE Figure 4 ORDERED BY DEFAULT RATIO indicated in Figure 4. The discrimination ofthe seardhes, of course, is also animportant factor inestablishing the selec- tivity. The selectivity of the twosequences issummarized in
Table 21.
Table 21
SELECTIVITY WITH DIFFERENT SEARCH ORDERS (Signatures Only)
MATCH DEFAULT STAGE CRITERIA RATIO
1 0.354 0.104
2 0.201 0.012 3 0.052 0.006 4 0.006 0.004 5 0.002 0.002
_...... , Inasmuch as the selectivity ofsearch of Table 21 was
characteristic of numerous compounds,the search sequence
based on default ratios wasemployed in the experimentaltests
which were made using a manualTermatrex system. The search
order was: 1) state, 2) elements, 3) MS,4) IR, 5) NMR, 6) GC,
and 7) UV. Selectivity curves of 10 compounds areshown in Figures
5 and 6. The pattern of selectionvaries with each compound. A unique identification was notadhieved until the final stage
of search in four of theillustrated compounds, whereaswith
compound 522: benzonitrile, asingle candidate was selected at
the end of the fourth stage,after the MS and IR searches. For
three of the compounds in Figure6, unique identification was lITRESEARCH INSTITUTE
63 IITRI-C6104-4 : 250 Isopropanol
522: Benzonitrile O.%
10: VaPhthalene
: 111111111111
194: Piperonal
337: Crotonaldebyde
0 o o 0 0 0 0 0 0 oi w 4=. CA) COMPOUNDS CANDIDATE
t79 tPOT93IHII 0 0 o 0.. o 0 o 0 0 o I_I 01 411. (k) n.) COMPOUNDS CANDIDATE
S9 p-dichlorobenzene, forexample, not achieved. Compound 208: elements searches, had 10 candidatesat the end ofthe state and
NMR, GC, and UV. . 3 at the end ofMS, 3 afterIR, and 2 after compounds A summary ofselectivity for the100 known input Table 22. At the in the 7-stagesequential searchis listed in nearly 93 percentof the end of the firstsignature seardh, MS, The IR search 500 compounds inthe data base wereeliminated. For the 100 eliminated up to 98percent of the500 compounds. survived follow- test compounds, anaverage of1.65 candidates
ing the fihal stageof search, UV. and The range columnof Table 22indicates the lowest stages for highest number ofmatches obtainedin the respective differ from those any of the100 test compounds. These ranges considered independentof of Table 13whereeadh signature was
the sequentialsearch. the stage at The two righthand columns ofTable 22 list Six compounds were which uniqueidentification wasobtained. MS. Six- identified at theend of the firstsignature search, cumulative total of22 were teen additionalcompounds for a IR. At the identified followingthe secondsignature search, compounds were conclusion of the7-stage search, 68of the 100
uniquely identified. functional The selectivity ofthe searchesvaried with the of the fivelargest groups as wasexpected. The selectivity compounds is listedin Table 23. groupsrepresented in the test
HT RESEARCHINSTITUTE
66 IITRI-C6104-4 SEARCH SELECTIVITY(100 Known BY Input Compounds) Table 22 STAGES AVERAGE SURVIVINGNO. C ANDIDATERANGE COMPOUNDSWITH ONENUMBER OF STAGE1. State CANDIDATES 342.43 SELECTIVITY 0.685 LOW18 HIGH 387 CANDIDATENEW 0 CUM. 0 4.3.2. MSElementsIR 195.11 37.72 8.96 0.0180.0750.390 161 143387 39 16 60 22 60 6.5.7. GCNMRUV 5.851.651.85 0.0120.00330.0037 11 26 67 142210 446858 SEARCH SELECTIVITY (Known Input Compounds) Table 23 OF FUNCTIONAL GROUPS STAGE ALCOHOLS AVERAGE NUMBERALDEHYDES SURVIVING HYDRO-CARBONS KETONES CANDIDATES POLY-FUNCTIONAL 1.2.3. MSElementsState 340.47210.47 56.05 207.37350.50 33.87 345.56345.56 59.96 357.80227.00 37.20 113.60241.00 14.10 4.6.5. GCNMRIR 13.73 9.681.68 4.125.621.37 14.00 8.913.21 10.80.8.37 1.40 2.701.401.80 No. of TestCompounds7. UV 1.6319 1.12 8 2.6523 1.4010 101.20 The aldehydes had an averageof 1.12 candidates per test com-
pound at the end of the7-stage search followedclosely by the polyfunctionals with an averageof 1.20 candidates. The hydro-
carbons had poorerselectivity than all other groupsexcept in
the state and NMRsearches.
3. Precision
a. Input 1 - 100 Known Com ounds
As noted in Table 24,68 out of the 100 testcompounds
were uniquelyidentified in the 7-stage search. Sixty-five extraneous candidates wereselected for the remaining 32compounds.
The number of extraneouscandidates ranged from one to five,with
an average of2.03 for the multicandidatecompounds (1.65 average
for the 100 test compounds). Only two of the 65 extraneouscompounds had data on all signatures available as Shown inTable 24. Eleven had one
signature missing, 45 had twosignatures missing, and 7 hadthree
signatures missing. The ratings for the 65 extraneouscandidates ranged from 34 to 95 compared tomaximum possible ratings of
117 to 130 as Shown in Table25. The distribution of the number ofcandidates broken down by
functional groups is shown in Table26. The hydrocarbons had 13
multicandidate compounds out of 23tesi,ed. The extraneous
candidates ranged from one to five. The alcohols had 9 out of
19 multicandidate compounds,although only one or two extraneous
candidates were found.
HT RESEARCH INSTITUTE
69 IITRI -C6104 -4 _ ...--=didnEall.1.1.11.1.111.1111111111111111111.111111.111.1111
Tabl e 24
DATA AVAILABILITY COMPOUNDS 65EXTRANEOUSCANDIDATE
SUB- UNAVAILABLE NUMBER OF NUMBER SIGNATURES SIGNATURE S COMPOUNDS TOTAL MS IR NMR GC UV UNAVAILABLE , 2 2 0 ...-.-.....4 X 4 X 4 1 X 3 11
2 X X 1 X X X X 8 2 X X 4 X X 16 45 X X 14
X 1 X X 1 X X 3 X 5 7 X X X
65 Tot al .....,_ _
70 IITRI-C6104-4 Table 25 DISTRIBUTION OFCANDIDATE RATINGS
TRUE COMPOUNDS(32)
Rating No. Compounds matched) 130 18 (All data data) 126 11 (No elements column available) 121 1 (Only one GC only one GCcolumn) 117 2 (No elements data;
EXTRANEOUS COMPOUNDS(65)
60 1 34 1 62 5 35 1 64 2 39 3 65 3 40 1 66 1 42 2 3 43 3 67 70 2 44 2 72 1 45 1 74 1 46 4 75 1 47 1 *77 1 48 1 2 49 2 **78 80 1 53 1 84 1 54 5 86 1 56 1 87 1 57 1 93 1 58 5 95 1 59 1
*5 Signaturesavailable **5 Signaturesavailable for onecompound
71 IITRI-C6104.4 Thiols
Sulfides
Polyfunctionals I-6
Nitro
Nitriles 0 tones Ke I. 0
F IV ON H 4 Hydrocarbons H0 H 0 Halocarbons R Ethers
Esters iv
Diols tv 1-4
Aldehydes 1-,
Alcohols al CA)
M
% 0 01 OD 0 OD hi 0 CD 0 . 0 -+ 0 .0 --, g 0ill Z 0
M M H Pi q W H tli CI' W Prj 0 th) libb til 0 rt 0 H N rt 0 tr 1-3 zi 0 E w
t-t0T93-IIIIII The 32 test compoundswith multicandidates arelisted in
Table 27 by functional group. Table 28 shows the detailedtest results for each of themulticandidate compounds. Notations below the test resultsfor some compounds inTable 28 are
discussed in the followingsection dealing withmodified match
criteria. The letters X and M inTables 28designate"default" (no
data available) and"match," respectively. A and B in the GC The Y/Y row designatethe columns on which amatch was made. designations in the MS andIR rows indicatethe peaks that
matched; the left digitdesignates the input data,the right
digit designates the storeddata. The symbols + and -designate
matches on the tolerancelimits. The "true" inputcompounds
appear at the leftof eadh set of candidatesand are marked
with an asterisk.
lITRESEARCH INSTITUTE
73 IITRI-C6104-4 COMPOUNDS WITH MULTIPLE Table 27CANDIDATES HY FUNCTIONAL GROUP ALCOHOLS 1.2. 279244 : ethanol CAND. 22 KETONES 1.2. 296:305: cyclopentanone2-pentanone CAND. 42 4.3.5. 246761760 : : butanolnonanol2-ethyl-1-hexanol3-heptanol 22 HYDROCARBONS 1. 2: ethane 2 9.8.7.6. 247759278259 : : pentanoloctanolcyclohexanol2-ethyl-1-butanol 3 4.5.3.2. 10:50:51:53: n-decanecycldhexanecis-decalincyclopentane 24343 4=. ALDEHYDES 1. 322: propionaldehyde 2 10. 6.9.8.7. 12:56: 6:9:7: methyln-hexanen-nonanen-heptanen-hexadecane cyclohexane 44 DIOLS 1.3.2. 783:284:282: diethylene1,2-propanediol2,3-butanediol glycol 42 13.12.11. 715:701: 52: cycloheptane3-methylpentane2,3-dimethylbutane 656 ESTERS 1. 425: diethyl malonate 2 POLYFUNCTIONALS 1. 698: 4-hydroxy2-pentanone -4 -methyl - 3 HALOCARBONS 2.1. 819:208: p-dichlorobenzenebutyl benzoate 2 COMPOUNDS: ALCOHOLS(1) MATCHES OBTAINEDWITHMULTICANDIDATE TEST
658: Acetal Alcohol 244*:Ethanol 2 M 2 M State 4 M 4 M Elements 21 40 2/1 3/3 4/4 MS 1/1 2/2 3/3 4/4 1/3- 2/1- 9 1/1 2/2 3/3 36 IR 24 M 24 M NMR B+ 6 M : B 18 M : GC M : A, X 1, M 6 UV Score 67 Score 130 GC: No MatchCol. A 769: 3,3-Dimethyl- 2-butanol Alcohol 246*:Butanol M 2 M 2 State 4 M 4 M Elements 14 40 2/4 3/3 MS 1/1 2/2 3/3 4/4 1/3 2/1- 12 1/1 2/2 3/3 36 IR X 1 M 24 NMR 6 18 M : B M: A, Di : B GC 6 M 6 M UV Score 45 Score 130 GC: No MatdhCol. A MS: 2 of 4 Peaks
Citronellol Octanol 182: 3 Alcohol 278*: 2 2 State 4 4 Elements 19 4/4 40 1/1 4/3 MS 1/1 2/2 3/3 2/1+ 18 1/1 2/2 3/3 36 1/2 IR X 1 24 NMR X 1 18 M : A, M : GC X 1 rvl 6 UV Score 46 Score 130 MS: 2 of 4Peaks
75 IITRI-C6104-4 Table 28.1 (Continued):
MATCHES OBTAINED WITH MULTICANDIDATETEST COMPOUNDS: ALCOHOLS (2)
Alcohol 279*:Nonanol 182:Citronellol
State M 2 M 2 Elements M 4 M 4 MS 1/1 2/2 3/3 4/4 40 1/1 3/3 24 IR 1/1 2/2 3/3 36 1/2 3/1+ 13 NMR M 24 X 1 1 GC M : A, M :B 18 X UV M 6 X 1 Score 130 Score 46 MS: 2 of 4 Peaks .wA--1 776: Cyclopentyl- Alcohol 760*: 3-Heptanol methanol
State M 2 M 2 Elements M 4 M 4 MS 1/1 2/2 3/3 4/4 40 2/3 3/1 13 X IR 1/1 2/2 3/3 36 1 NMR M 24 M+ 20 X 1 GC M : A, M :B 18 UV M 6 M 6 Score 130 Score 47 NMR:1 of 2 Peaks MS: 2 of 4 Peaks m, 761: 2-Ethyl- Alcohol 1-hexanol 182: Citronellol
State M 2 M 2 Elements M 4 M 4 MS 1/1 2/2 3/3 4/4 40 3/1 4/3 7 18 IR 1/1 2/2 3/3 36 1/2 2/1- X NMR M 24 1 X GC M : A, M :B 18 1 X UV M 6 1 Score 130 Score MS: 2 of 4 Peaks
76 IITRI-C6104-4 Milwarib Table 28.1 (Continued): 7 Alcohol MATCHES247*: OBTAINED WITH Pentanol NnLTICANDIDATE TEST 182: Citronellol COMPOUNDS: ALCOHOLS 256: 2,2-Dimethyl-1-butanol (3) ElementsState 1/1 2/2 M 3/3 4/4 40 42 2/3 MM 3/1 13 42 MDI 421 MSNMRIR 1/1 2/2 M 3/3 3624 1/1+ 2/2 X 27 1 2/1- 3/3 Xx M : B 12 91 GCUV 11 : A, M M:BScore: 130 18 6 MS: 2 of 4 Peaks X Score 49 1 GC: No Match Col. M Score Am--wi 35 6 AlcoholState 259*: Cyclohexanol M 2 1 690: 2-Butoxy-ethanol M 42 182: Citronellol M 42 MSElementsIR 1/1 2/2 M 3/3 4/4 4036 4 1/2-1/1 2/3- M 4/4 1915 1/1+ 2/22/4 M 4/1 27 81 NMR : M M : B 1824 M : A+, X M : B- 12 1 XX 1 GCUV M A, M Score 130 6 MS: 2 of 4 Peaks X Score 54 1 MS: 4/12 ofMatch 4 Peaks X Score 44 1 Table 28.1 (Continued): I MATCHES OBTAINED 759*: 2-Ethyl- WITH MULTICANDIDATE 765: 2,Methyl- TEST COMPOUNDS: AICOHOLS770: (4) 2-Heptanol 9 Alcohol 1-butanol 2 1-pentanol - M 2 M 2 MSElementsState 1/1 2/2 11M 3/3 4/4 4036 4 1/11/1 2/3- M 3/23/3 4/4 2828 4 1/21/1 2/2- M 3/4 4/3 2127 4 NMRGCIR 1/1 M : A, 2/2 M 3/3 M : B 2418 M : A, X M : B 18 61 MX DI : B 916 UV M Score 130 6 M Score 87 GC: No Match Col. Score A 70 I CD MATCHES OBTAINED WITH MULTICANDaDATE Table 28.2 TEST COMPOUNDS: DIOLS (1) DiolState 284*: Diethylene glycol M 2 282: 1,2-Propanediol M 2 MSElementsIR 1/1 2/2 M 3/3 4/4 4036 4 1/2+1/1 MM+ 3/33/1 4/4 282011 4 GCNMRUV M : A, MM M :Score B 130 2418 6 M : A+ M Score 77 66 i GC:NMR: 1 of 2 No Match Col. Peaks B Diol ,783*: 2,3-Butanediol 765: 2-Methyl- 1-pentanol ElementsState DIM 3/3 4/4 40 42 2/3 MM 3/1 4/4 17 42 MSNMRIR i 1/11/1 M : 2/22/2 A, M 3/3 M : B 243618 1/1 M : A 2/2 X 28 91 GC 1 M 6 Di 6 UV Score 130 L. GC: No Match Col. Score B 67 I MATCHES OBTAINED WITH MULTICANDIDATE TEST COMPOUNDS:Table 28.2(Continued): DIOLS (2) I 256: 2,2-Dimethyl- 12 DiolState 282*: 1,2-Propanediol M 2 284: Diethylene glycol M 2 765: 2-Methyl-1-pentanol M 42 1-butanol MM 42 MSElements 1/1 2/22/2 M 3/3 4/4 3640 4 1/31/1 2/1- M 3/3 4/4 1228 4 1/1 2/2 2/1 M 4/4 3013 1/1- 2/2+ X 24 1 NMRIR : M 1824 II : A- M- 20 6 M : A+ X 61 M : A- X 6 GC M A, M : B 1 6 M 6 M 6 M ' M UV , Score 44 co Score 130 NMR:GC: 1 of 2 Peaks No Matdh Col. B Score 78 MS:GC: No Match2 of 4 Col.Peaks B Score 62 GC: No Match Col. B MATCHES OBTAINED WITH MISCELLANEOUS GROUPS (1) TableMULTICANDIDATE 28.3 TEST COMPOUNDS: 13 Po1yfunc-1tional methy1-2-pentanone698*: 4-Hydroxy1-4- 2 1 797: Isobutyric acid M 2 402: Ethylidene diacetate 14 2 MSElementsState 1/1 2/2 m/I 3/3 4/4 3640 4 1/1 2/1 M 3/2- 4/3 1119 4 1/1 2/4 NX 22 41 GCNMRIR 1/1 2/2 M M : B 24 96 MX 24 1 X M : B 19 UV M Score 121 m.----- NMR:MS: 21 ofof 42 PeaksPeaks Score 62 MS: 2 of 4 Peaks Score 40 MATCHES OBTAINED WITH MULTICANDIDATE TEST Table 28.3 (Continued): COMPOUNDS: MISCELLANEOUS GROUPS (2) HalocarbonState 208*: p-Didhloro- benzene M 2 chloro-benzene755: 1-Bromo-4- M 42 MSNMRElementsIR 1/1 2/2 MM 3/3 4/4 402436 4 2/2+ 3/3+ MM+ 3/3 4/4 201312 GCUV M : A, M M Score: B 1130 18 6 X Score 53 11 Aldehyde 322*: Propionaldehyde MS: 332: Acrolein2 of 4 Peaks MSElementsStateIR 1/1 2/2 MM 3/3 4/4 3640 42 1/1+ MM 3/33/4 4/1 21 742 GCNMR M : A, M M : B 1824 6 M : A- XX 61 UV M Score 130 MS:GC: No4/1 2Matdh ofMatdh 4 Col.Peaks B Score 42 Table 28.4
MATCHES OBTAINED WITHMULTICAND1DATE TEST COMPOUNDS: ESTERS 661: Isobutyric Ester 425*:Diethyl malonate anhydride
State M 2 M 2 Elements M 4 M 4 MS 1/1 2/2 3/3 4/4 40 2/1 4/4 13 IR 1/1 2/2 3/3 36 1/1-1/2+ 24 NMR M 24 M- 20 X 1 GC M : A, M : B 18 X UV M 6 1 Score 130 Score 65 q , NMR:1 of2 Peaks MS: 2 of4 Peaks .
Benzyl benzoate Ester 819*:Butyl benzoate 818:
State M 21 M 4 Elements M 4 M 24 MS 1/1 2/2 3/3 4/4 40 1/1 3/3 34 IR 1/1 2/2 3/3 36 1/1 2/2 3/3- 24 NMR M 24 M 1 GC M :A, M:B 18 X 4 UV M 6 M+ Score 130 Score 9 NMR:1 of 2 Peaks MS: 2 of 4 Peaks
82 IITRI-C6104-4 MATCHES OBTAINED WITH MULTICANDaDATE Table 28.5 TEST COMPOUNDS: HYDROCARBONS (1) HydrocalbonElementsState 2*: Ethane M 02 Propane M 02 MSNMRIR 1/1 2/2 M 3/33/3 4/4 403624 1/21/1 2/2+2/3 X 2721 1 GCUV M : AI M M :Score: B 126 18 6 MS: 2 of MX4 Peaks Score 58 61 Hydrocarbon 53*: Cis-decalin 2 54: Trans-decalin M 2 MSElementsStateIR 1/1 2/2 M- 3/3 4/4 4036 0 1/1 2/2 - 3/3 4/3 3615 0 GCNMR M : A, M M : B 2418 XM 24 1 UV , M Score 126 6 MS: 2 of M 4 Peaks Score 84 6 MATCHES OBTAINED WITH MULTICANDIDATE Table 28.5 (Continued): TEST COMPOUNDS: 845: Trans-1,3-dimethyl-HYDROCARBONS (2) 20 Hydrocarbon 50*: Cyclopentane , 844: Cis-1,4-dimethyl-cyclohexane 2 cycldhexane 2 MSElementsState 1/1 2/2 M- 3/3 4/4 40 02 M_ 3/3 4/1 09 M_ 3/3 4/1 90 1 GCNMRIR 1/1 M : A 2/2 M 3/3 2436 9 MX 24 11 MXX 24 61 UV M Score 117 6 MS:MS: 4/12 ofMatch 4 Peaks M Score 43 6 MS:MS: 4/12 ofMatdh 4 Peaks M Score 43 MATCHES OBTAINED WITH MULTICANDIDATE TEST Table 28.5 (Continued): CM:POUNDS: HYDROCARBONS (3) Hydrocarbon 51*: Cyclohexane 2 844: Cis-1,4-dimethyl-cyclohexane 845: Trans-1,3-dimethylcyclohexane 1: MSElementsState 1/1 2/2 E- 3/3 4/4 40 0 - 3/3 4/1 90 ._ 3/3 4/1 90 GCNMRIR 1/1 2/2 M 3/3 M : B 3624 9 M+X 20 1 XM+X 20 1 UV M Score 117 6 MS: 4/12 ofMatch 4 Peaks M Score 39 6 MS: 4/12 ofMatch 4 Peaks M Score 39 6 HydrocarbonElementsState 6*: n-Hexane M- 02 1 13: n-Heptadecane M- 02 36: Squalane -M 02 839: 2-Propanethiol M- 02 MSIR 1/11/1 2/2 3/3 4/4 3640 1/11/1 2/2 3/43/3+ 34 1/11/1 2/22/2 3/4 3034 1/2+ 2/32/1 /I 3/3 1717 GCNUR M : A, M 1.1 : B 1824 X 1 XX 1 X 24 1 UV i M Score 126 6 M Score 78 6 M Score 74 6 X Score 62 1 MS:NMR: 21 of 42 Peaks I Table 28.5 (Continued): TEST COMPOUNDS: HYDROCARBONS (4) 7 : n-Heptan! MATCHES OBTAINED WITH MULTICANDIDATE 13: n-Heptadecane 839: 2-Propanethiol 36: Squalane Hydrocarbp0State M- IF 02 144- 02 /4 02 M- 0 MSElementsIR 1/1 2/2 3/33/3 4/4 4036 1/1-1/2 2/22/4 3/33/1 3322 1 1/21/1 2/3 242017 1/11/2 2/22/4 X 3/1 2722 1 GCNMRUV M : A, MM M :Score B 126 2418 6 MXX Score 65 61 MX Score 65 11 MX Score 59 61 Hydrocarbon 9*: n-Nonane 839: 2-Propanethiol MS:NMR: 1 of 2 13:Peaks n-Heptadecane2 of 4 Peaks 36: Squalane MSElementsState 1/1 2/2 II - 3/3 4/4 40 02 1/1 M- 3/3 24 02 1/2 2/1 14- 3/4 27 02 1/2 2/12/2 M- 3/4 27 02 GCNMRIR 1/1 M : A, 2/2 Di 3/3 M : B 182436 1/2 2/3 Di X 2420 1 1/1- 2/2 X 3/3 33 1 1/1 X 11 UV M Score 126 6 MS:NMR: 1 of 2 Peaks 2 of 4 Peaks X Score 72 1 M Score 70 6 M Score 64 6 Table 28.5 (Continued): Hydrocarbon 10*: n-Decane MATCHES OBTAINED WITH M1JLTICANDaDATE TEST 839: 2-Propanethiol COMPOUNDS:13: HYDROCARBONS (5) n-Heptadecane 36: Squalane MSElementsState 1/1 2/2 M- 3/3 4/4 40 02 1/1 M- 3/3 24 02 1/2 2/1 M- 3/4 17 02 1/2 2/1 M- 3/4 17 02 GCNMRIR 1/1 M : A, 2/2 M 3/3 M : B 361824 1/2+ 2/3+ MX 2415 1 1/1 2/2+ X 27 1 1/1 2/2+ X 27 11 UV M Score 126 6 MS:NMR: 1 of 2 Peaks 2 of 4 Peaks X Score 67 1 M Score 54 6 M Score 54 6 Hydrocarbon 12*: n-Hexadecane 13: n-Heptadecane 36: Squalane i 839: 2-Propanethiol MSElementsState 1/1 2/2 M- 3/3 4/4 40 02 1/1 2/2 M- 3/3 4/4 40 02 1/1 2/2 M- 3/3 4/4 40 02 2/1 M- 4/3 12 02 GCNMRIR 1/1 : A, 2/2 M 3/3 M : B 361824 1/1 2/2 X 3/3 36 1 1/1 2/2 X 30 1 1/2+ 2/3 MX 2417 1 UV M DI Score 126 6 M Score 86 6 M Score 80 6 MS:NMR: 21 of 42 Peaks X Score 57 1 Table 28.5 (Continued): COMPOUNDS: HYDROCARBONS (6) 27 Hydrocarbon 56*: Methylcyclo- hexane MATCHES OBTAINED WITH MULTICANDIDATE TEST 836: 4-Methyl-valeronitrile pentanenitrile834: 4-Methyl- dimethylcyclohexane846: Trans-1,4- MSElementsState 1/1 2/2 M - 3/3 4/4 40 02 2/1 M - 3/2 12 02 2/1 M- 3/2 12 02 2/1 M- 3/3 14 02 GCNMRIR 1/1 : A, 2/2 M 3/3 M : B 132436 1/1- 1/2+ 3/3- XM+ 2028 1 1/1- 1/2+ M+X 2024 1 MXx 24 1 UV M M Score 126 6 MS: 2 of 4 Peaks X Score 64 1 MS: 2 of 4 Peaks X Score 60 1 MS:NMR: 21 of 24 Peaks M Score 48 6 717: 2-Methyl- 2-butene 726: Chlorocyclo- hexane methylcyclohexane844: Cis-1,4-di- , 845: Trans-1,3-di-methylcyclohexane . 28 Hydrocarbon 52*: Cycloheptane M 2 M 2 m 2 m_. 2 m 2 MSElementsState 1/1 2/2 - 3/3 4/4 3640 0 1/31/1 - 3/13/2+ 1812 0 1/11/4 - 3/3-3/3 1812 0 1/3 x 3/1 12 01 1/3 x- 3/1 12 01 GCNnRIR 1/1 M : A, M M : B 1824 MX 24 1 MX 24 1 XM 24 61 MX 24 61 UV M Score 126 6 r X Score 58 1 -4 x Score 58 M Score 46 --i 2 of 4 Peaks M Score 46 I MS: 2 of 4 Peaks MS: 21/4 of Match4 Peaks MS: 2 of 4 Peaks 1 MS: MATCHES OBTAINED WITH MULTICANDIDATE Table 28.5 (Continued): TEST COMPOUNDS: HYDROCARBONS 705: 2,2-Dimethyl- (7) In: 2,3,4-Trimethyl- HydrocarboState 701*: 2,3-Dimethyl- butane M 2 836: 4-Methyl-valeronitrile M 2 36: Squalane - 02 13: n-Heptadecane M_ 02 butane M- 02 pentane M_ 0 - 0 , MSElementsIR 1/11/1 2/2 2/2 3/3 3/3 4/4 - 3640 0 1/11/3 2/3- 3/2 4/4 1824 1/11/2 2/2 3/4 1930 1/11/2 2/2 3/4 3018 1/1 2/2 3/3 4/4 X 40 1 1/1 2/2- 3/1- 3/3 X 3/4 1022 1 GCNKR M : A, M Ii:B 2418 XM- 20 1 XX 61 X 61 MX M : B- 6 MX 61 UV M Score 126 6 X Score 66 MS: 2 of 4 Peaks M Score 58 MS:NM: 21 ofof 42 PeaksPeaks M Score 58 i Score 56 MS: 2 of 4 Peaks Score 42 ' -"713: 2,3-Dimethyl- 41: 3-Hexene HydrocarbonState 715*: 3-Methyl- pentane M- 2 pentane M- 02 712: Isopentane M - 02 13: n-Heptadecane M - 02 36: Squalane M- 02 M- 02 FIR1MS Elements 1/1 2/2 3/3 4/4 4036 0 1/21/1 2/22/4 3/3 3626 1/11/4 2/2 3/3 3012 1/11/1 2/2 3/4 2230 1/1 2/2 3/4 3022 1 1/3+ 2/2 3/1 3/2- 2016 6 UVGCNMR M : A, M M : B 2418 6 MmX 24 61 MMX 24 61 MXX 61 XMX 61 M : A XM+ Score 54 19 Score 126 J Score 95 MS:MS: 1/4 2 ofMatch 4 Peaks Score 75 MS: 2 of 4 Peaks Score 62 MS: 2 of 4 Peaks Score 62 MS:NMR: 21 of 42 PeaksPedks 1 Table 28.6 KETONES MATCHES OBTAINED I 305*: Cyclo- WITH MULTICANDaDATE 812: PropylTEST COMPOUNDS:acrylate 31 KetoneElementsState pentanone M 42 MM 421 NSNMRIR 1/11/1 2/22/2 M 3/3 4/4 403624 1/1 : A- X X 3/3 24 61 GCUV M : A, El M :Score B 130 18 6 GC: M No Match Col. X Score B 39 1 _I 32 Ketone 296*: 2-Pentanone 2 1 373: Methacrylic M acid 2 ,1 404: Allyl acetate M 42 384: Isoblityl M formate 42 MSElementsState 1/1 2/2 M 3/3 4/4 4036 4 M 3/3-3/2 4/1 19 74 1/11/2- 2/3 M 4/2 1718 1/11/2- M 3/1- 4/4 1019 1 NMRGCIR 1/1 M : 2/2 A, M 3/3 M : B 2418 1/1+ XM- 20 1 M : A- XX 611 M : A- XX 61 UV M Score 130 6 X Score 54 Score 49 -i No Matdh Col. B Score 43 MS:NMR: 1 of 2 Peaks 4/12 ofMatch 4 Peaks I MS:GC: No 2Matdh of 4 Col.Peaks B GC: In general, it can besaid that hydrocarbons and alcohols posed the greatest problemwith respect to multiple candidates. The reasons for this appear tobe correlated to compound type and lack of data. The summarized results in Tables261 27 and
29 point out the large numberof alcohols and hydrocarbons that came through with multiplecandidates. Tables 24 and 29 show that, in general, significant data wereunavailable particularly in GC and NMR. Of all the signatures these twocould be the most important for characterizingalcohols and hydrocarbons. This correlates wlth the factthat hydrocarbons and alcohols were the worstoffenders among the extraneous candidates selected more than once. The observation one can make isthat a more complete data file, with particular emphasis on GCand NMR which are among the less abundant signatures, wouldgreatly reduce (by perhaps
75 percent) the multiple candidateproblem we experienced. The greatest influence of havingthese data would be greater discrimination in the alcohols andhydrocarbons.
b. Input 2 - Compounds WithoutStored Data Five compounds that do not havesignature data stored in
the data base were tested. These Input 2 compounds are:
68: n-Prophylbenzene
270: m-Cresol
8: Octane
74: Indene
92: Thiophene.
lITRESEARCH INSTITUTE
91 IITRI-C6104-4 L I' EXTRANEOUS CANDILATE COMPOUNDSSELECTED MORE THAN ONCE Table 29 COMPOUND SELECTIONS NUMBER MS UNAVAILABLESIGNATURESIR NMR GC UV' 182: 13:36: CitronellolSqualanen-Heptadecane 75 XX X X 839:845:844: cis-1,4-dimethyl-cycldhexane2-Propanethiol 35 X XX X 256:836:765: trans-1,3-dimethyl-cycldhexane4-Methyl-valeronitrile2-Methyl-1-pentanol2,2-Dimethyl-1-butanol 32 X XX X X precision; i.e., no Compounds 68, 92,and 74 had perfect candidate compoundssurvived. of one The search form-cresol resultedin the candidacy had a rating of compound, 274(2,4-dimethylphenol), which because no GC 28 points. The GC signaturesearch was omitted data were availablefor m-cresol andthe 2,4-dimethylphenol Hence passed through theNMR and the IRsearches by default. signature searches the candidatecompound passedthrough only two 2/2 and the by matching thenegative tolerance onUV and the
4/4 peaks in MS. Compound 274 is anexample of a compoundthat was very because of chemical similar to the unknownand passed through candidate similarity and default. If GC data wereavailable, the homologues, compound surelywould have beenrejected, because readily resolved, sudh as m-cresol and2,4-dimethylphenol, are particularly with the GCoperating parametersused. Although similar enough to these molecules aredifferent, they are considering produce very similarMS, IR, and NMRspectra, and, probably pass our matchcriteria,2,4-dimethylphenol will through as a candidatebased on thesespectral data. The search for n-octane,a hydrocarbon,resulted in
selecting the sixcandidates shown inTable 30 with scores
ranging from 72 to30 out of a possible130. The following screening key factors weresignificant in thefailure of the n-octane: process toreject the sixcandidates accompanying
lITRESEARCH INSTITUTE
93 IITRI-C6104-4 Table 30 CANDIDATE COMPOUNDS RETRIEVED IN SEARCH FOR OCTANE SEARCH a NO. 19: 3 -METHYLHEPTANE COMPOUND NAME STATE ELEMENT SKIP 2/41/1MSRESULTS 2/+21/+1 IR NMR D GC A UV M SCORE 72 25: 2,5 -DIMETHYLHEXANE SKIP 4/24/22/31/1 1/12/32/+2 D -B M 71 763:839: 2 2-PROPANETHIOL -HEXANOL SKIP 2/31/12/31/2 1/+22/+31/1 MD DA MD 6863 13: N-HEPTADECANE SKIP 4/11/22/4 1/13/32/+2 D D M 5630 a D 29:= default; M = match. 2 3.3-TRIMETHYLPENTANE M SKIP 4/41/1 D D D M rejection (1) GC data on Carbowax20M would have caused of 2-propanethiol and2-hexanol
(2) GC data on n-heptadecanewould have caused rejection of this compound
(3) The chemical similarity amongthe four hydro- carbons (all saturatedhydrocarbons, three of which
have eight carbonatoms) would make itdifficult to distinguish on the basis of ourMS, IR, and UV match
criteria. NMR may or may nothave been selective
(4) The MS fragmentationbehavior of hydrocarbons, 2-hexanol, and 2-propanethiolis very similar.
In practice, evenwith entire mass spectrafor
the candidate compounds,it is often difficult to distinguish between them.
The interesting featureof the above examplesis the similarity of the candidates tothe input compound. This proves to be anadvantage because if there are nodata in the
file for an unknown, thecompounds selected give astrong indication of the nature ofthe compound and can be veryhelpful of in identifying thecompound type. Of course, the selection "false" candidatei dependsnot only on matches,but also on defaults; the more dataavailable, the greater theselectivity,
while the lack of data couldpermit more candidates tobe
carried through by default.
If a large number ofdefault compounds arecarried through,
little help might be provided tothe user, since therewould be
lITRESEARCH INSTITUTE
95 IITRI-C6104-4 no reason to suspectthat the default candidates arestructurally related. No inferences shouldbe made on the basis of default matches. Judgments should be made onthe basis of true data matches. Because the Input 2 compounds servethe same purpose as
Input 1 compounds, no further tests weremade on this group, and further evaluation of testresults is included in thediscussion of modified match criteria.
. Input 3 - Compounds wlthLaboratory Measurements
The Input 3 compounds wereused to evaluate measurement repeatability, data standards, andthe tolerance limits that would be necessitated to accommodatevariations in measured data. Eleven compounds were chosen atrandom from the 100
"unknowns." All eleven compounds were analyzedby each of the
five analytical techniques. Only one, diphenyl sulfide,lacked
the relative retention data onthe Carbowax 20M column.
Table 31 lists the eleven compoundsalong wlth the
abstracted data generated in ourlaboratory. Included in the
table are the abstracted data forthe compounds contained in
the file. The laboratory generated data wereabstracted and entered into the data sheets forunknowns. These data were
screened against the file according tothe initial match
criteria described in IVA3a. The results of the screening process aretabulated in
Table 32. Nine of the eleven came through asunique matdhes.
lITRESEARCH INSTITUTE
96 . IITRI-C6104-4 Pebble 31 MS COMPARISON BETWEEN STORED AND LABORRTORY IR DATA NMR (Rel Ret. Time)GC (miL) UV , (Mass) (13m) Param. Stored Lab Stored Lab Unknown A 560:1-Butanethiol Compound PkParam. 321 Stored 274156 Lab 904156 PkParam. 132 Stored 8.27.86.8 Lab7.2*7.86.8 Stored 0.9 Lab0.9 Col BA 0.570.76 0.760.59x NA . E 327:Isobutyraldehyde PkPk 4 231 90274143 47+412743 Pk 321 3.76.85.8 6.83.75.8 0.9 1.0 Col BA 0.330.31 0.330.31 214 214 c 10:Decane PkPk 4 312 43714157 29+415743 Pk 213 6.83.97.2 13.7* 6.87.2 1.3 1.3 Col BA 0.702.7 0.68x2.7 D 184:Menthone PkPk 4 321 112 412969 112 416971+ Pk 312 5.87.36.8 5.87.36.8 NA 0.9 Col BA 6.0 6.1 NA 236 E 685:p-Anisaldehyde PkPk 4 312 135136 5577 136135 5577 PkPk 31 2 8.67.96.2 8.8*8.06.3 3.9 3.8 Col bA NA9.4 45 9.4 274 274 r 323:n-Butyraldehyde PkPk 41 32 92442729 4476+2743 PkPk 1 32 6.83.75.8 10.3* 5.88.7* 1.0 0.9 Col BA 0.420.35 0.410.21 290 290 G 458:Diphenyl Ether PkPk 14 23 170 775143 170 5141+77 Pk 321 8.16.76.3 14.5* 8.16.7* 7.0 7.0 ColCol A B 1754 1747 226 226 H 574:Diphenyl Sulfide PkPk 214 3 185186141 51 185186141 51 Pk 321 14.513.6 6.3 13.614.5 6.8* 7.2 7.2 Col BA 14 NA 14 250 250 I 249:n-Heptanol Pk 124 184 417056 184 417056 Pk 213 9.56.93.0 9.53.16.9 1.3 1.3 Col BA 4.62.5 4.3x2.4 J 540:Nitrobenzene PkPk 34 12 123 517743 123 4357+77 Pk?k 32 1 14.3 7.46.6 14.2 6.67.4 7.6 7.6 Col 3A 17 4.9 17 4.5 260 258 x 528:o-Tolunitrile PkPk 314 2 117116 9050 116117 9036+ PKPk 1 32 13-1 4.66.7 13.1 6.74.5 2.6 2.5 ColCol A B 12 4 12 3.9 228 228 + = No match on any one of four stored peaks.of any one of three stored Pk 43 89 89 peaks. TPNA = Transparent.Not available.x* L..= Outside Outside toleraueetolerance limits limits. V711411111MITIFI, 7 itiessmimerl
IDINTIPICATION or COMPOUNDS PhWiLAMOSATORY Table 32 Cosmound 4 Commoumd X Search _Mpiscrim C. ..und A M+D Select,Cand. 387 387Discrim.M coemound IS 387M+D Select, Cand. 387 387DiscrimM compound c M+D387 Sglect. Cand. 387 387Discrim.M compound D 387M+D Select, Cond. 387 I Discrim.387M compound E M+D387 Select. Cand. 387 Discrim.387M Compound F M+D387 Select,cand. 387 387Discrim.M 387 5elect cand. G387 387DiscrIm.M Compound H M+D387 Select.Cand. 387 387Inserts.M M+D387 Select.Cand. 387 387DlecrimM MiM387 §elipct. Cand. 387 Diecrlm.387M NE2387 Select. cand. 387 MSElementsState 387 773016 387 5216 14 24 300131 28 300153 58 227 1189 108 -48 130 78- 387 9134 300 d939 300119 61 227 25 4 300 10 6 300 4028 227 11 1 138300 14 160300 44 227 1096 300 1417 300 4439 227 18 2 16 1 164623 14 1 300103 49 125300 79 227 1668 2713 7 373527 10 2 1480 5 442780 48 24 GCMilltIR 1042 185107202 12 5011 203193 11 2 49 7 199192 20 56 41 5 272197184 13 2011 4 287154196 1 1042 6 273185202 03 1311 3 280154195 1 33 83 275195176 1 8549 7 352199192 13 1 1412 7 281199155 12 141612 281204159 12 UV 85 352 ti 1 6 273 1 85 352 . Candidate Score candidate Score Candidate Score Candidate Score Candidate Score Candidate Scars co cation(TrueIdentifi- thiol1-Butane-candidate Score 104 aldehydeiso-Butvr-Candidate Score 103 nSqualanen-DecaneCandidate -Septa- decane Score 107 6771 MenthoneCandidate Score 87 aadehydep-Anis-Candidate Score 110 EtherDiphenvl 108 SulfideDiohenyl 104 n-Heptanol 101 benzeneNitro- 107 nitrileo-Tolu- 120 lined)under-compound 2,3,32 -Propane-Tri -methylthiol - - 6729 M = Matches = Matches+Defaults pentane ,.... - ,e,orel/FOOFOilig_
One, n-butyralddhyde, was lost due to a "no-match" on infrared.
A comparison was made of our infrared spectrum with the original data source (Sadtler IR #333). Sadtler's spectrum shows a strong band at 13g Which is totally absent in our
spectrum. We strongly suspect an impurity in Sadtler's material making that Whole spectrum suspect. Since we cannot, at this time, obtain a confirmation on Sadtler's spectrum we must carry
it through as a mismatch.
Another unknown, n-Decane, encountered the same problem as
the hydrocarbons encountered in the Input 3 category Which were discussed earlier. N-Decane came through with 4 additional candidates. These four were the same compounds that came through with other hydrocarbons. These four candidates lacked gas chromatographic data. If gas chromatographic data had been
available these four extraneous candidates would have been
eliminated.
All but one of the proper candidates received a rating of
over 100 out of 130 maximum. None received 130 points. Menthone, though uniquely selected, obtained a ra ,ng of 87. The reason
for this was the lack of both NMR and UV data in the file. if these signatures had been present and there had been matdhes within the tolerance limits, 24 more points would have been
Gbtained. In spite of the lack of these data, methone still came through as a unique candidate.
Two of the eleven searches resulted in a unique selection only after UV had been searched. Prior to the UV search two
HT RESEARCH INSTITUTE
99 IITRI-C6104-4 candidates remained. Although this suggests that UV does have
its ability to be discriminating, a search based on the modified match criteria would probably have eliminated the need for UV.
This supposition was not tested however.
d. Precision with Modified Match Criteria
The results obtained above using the initial match criteria described in Section IVA3a can be considered as a first pass based upon assumptions as to the efficacy of each signature
in identifying an unknown compound. There were no explicit guidelines to determine the number of peaks to be searched in
each signature, the number required to establish a match, and
the tolerance limits that should be established.
Subsequent to an evaluation of the precision of identifica- tion obtained with the initial match criteria, additional tests were made using the modified match criteria described in
Section IVA3b. These tests were made only on the 32 known input compounds for which multiple candidates were obtained. No
effort was made to determine the effect of the modified criteria on the discrimination and selectivity measures derived from the
tests using the initial criteria. The modified criteria would have the effect, in general, of improving both discrimination
and selectivity. The availability and default ratios, however, would also be changed by the requirement for additional data
in the NMR and GC signatures.
The results of imposing the modified match criteria on the
32 test compounds with multiple candidates are recorded beneath
PITRESEARCH INSTITUTE
100 IITRI-C6104-4 been affected compoundsin Table 28. The ratings have not criteria, however. The changed in accordancewith the modified singly number of candidateseliminated by themodified criteria,
and in combination, arelisted in Table33.
Table
CANDIDATES ELIMINATEDWITH MODIFIED MATCHCRITERIA
Number of Candiewites Match Criterion Eliminated
(1) GC: Columns A and B 7
(2) NMR: 2 of 2Peaks
(3) MS: 3 of 4 Peaks 17
(4) MS: 1/4 or 4/1 match eliminated 2 (1) and (2) 1
, 5 (1) and (3) ;
i (2) and (3) 11 i 7 (3) and (4)
!I (1) and (3)and(4) 1
1 1 (2) and (3)and(4) ;
Total 51
eliminated Fifty-one of the65 extraneouscompounds were Of leaving only 8 testcompounds with multiplecandidates. candidate and 6have 2 these 8 compounds,2 have oneextraneous compounds were extraneouscandidates. Seven of the test lITRESEARCH INSTITUTE
101 IITRI-C6104-4 hydrocarbons and one was an alcohol. Each of five of the hydrocarbons had n-heptadecane and squalane as the extraneous candidates. Certainly a more complete data file on these compounds would have resulted in rejection also.
For the 100 test compounds, the modified match criteria resulted in unique identification of 92 of the compounds.
f. Additional Match Modifications
A number of additional match modifications can be con- sidered for the improvement of search precision. These include the number of signatures to be searched, data tolerances, and number of peaks to be searched and matched. As the data base grows and encompasses thousands of compounds, someadditional match criteria may be required to maintain satisfactory precision.
One of the objectives of the search experiment was to determine the minimum number of signatures necessary to achieve identification. Table 22 has indicated that the five signatures each aided in this task. To investigate the precision obtained with various combinations of signatures, ten compounds were selected and matches were made on combinations of 2,3,4, and
5 signatures. Only combinations were tested because the final results are independent of the order of search. The results are summarized in Table 34 and illustrated in Figures7, 8, and
9. In the 2-signature search, the average number of candidates for the 10 test compounds averaged over the 10 combinations of five signatures taken 2 at a time was 49.2. The MS.IR combination
lITRESEARCH INSTITUTE
102 IITRI-C6104-4 Table 34
SELECTIVITY OF COMBINATIONS OF SIGNATURES (10 Known Input Compounds*)
AVERAGE NUMBER OF SURVIVING CANDIDATES NO. SIGN. SIGNATURES 'Stage1 Stage 2 Stage 3 Stage 4 Stage 5 MEAN
MS-IR 73.2 15.3 MS1\1MR 73.2 30.3 MS.GC 73.2 22.1 MS-UV 73.2 45.1 49.2 2 IR-NMR 65.8 32.2 IR-GC 65.8 28.4 IR-UV 65.8 39.2 NMRGC 175.3 55.6 NMR-UV 175.3 109.4 GC-UV 205.3 114.7
MSIR-GC 73.2 15.3 4.0 MSIRUV 73.2 15.3 10.2 MSIRNMR 73.2 15.3 8.8 MS-NMR-GC 73.2 30.3 7.4 13.2 3 MSNMR-UV 73.2 30.3 21.1 MSGCUV 73.2 22.1 11.0 IRNMR-GC 65.8 32.2 12.3 IRNMR-UV 65.8 32.2 20.9 IRGC-UV 65.8 28.4 11.2 NMR-GC-UV 175.3 55.6 25.0
MS-IRNMR-UV 73.2 15.3 8.8 5.4 MSNMR-GCUV 73.2 30.3 7.4 3.6 2.9 3.8 4 IRNMR-GC,MS 65.8 32.2 12.3 IR-GC-UVMS 65.8 28.4 11.2 2.1 NMR-GC-UVIR 175.3 55.6 25.0 5.0
2.9 1.5 1.5 5 IRNMR-GCMSUV65.8 32.2 12.3
*All compounds had only onesurviving candidate in 7-stagesearch. 103 IITRI-C6104-4 L Ii 500o 200300400 "C 1 00 NER NMR UIT UV
AVERAGE OF 10 COMBINATIONS = 49.2 SELECTIVITY OF 2-SIGNATUREAVERAGES SEARCHESOF 10 KNOWN COMPOUNDS Figure 7 SEARCH STAGES 400500300 i1 i i i i 200100 NMR 50 idS MS MS MS ms ND11111 IAS ILIIII IR MIlikIIIE! IR IR GC 0E 4030 lit NKR a 1.0iR GC NKR , NMR GC UV r=1P 20 IR sIR siR UV GC UV 0 C.)P 10 111 Ill UV N MR GC 453 GC 11 2 I AVERAGE J OF 10 COMBINATIONS = 13.2 1 SELECTIVITY OF 3SIGNATUREAVERAGES SEARCHES OF 10 KNOWN COMPOUNDS Figure 8 SEARCH STAGES 400500300 200100 ii ii NMR 50 llir MS rib ws MKIII IR . IR GC 403020 \ NMR NMR GC UV 10 IR . NMR GC UV -1C I 1 45 UV 1111 1 UV . IR 23 AVERAGE OF MS 5 COMBINATIONS MS = 3.8 1 SELECTIVITY OF 4-SIGNATUREAVERAGES SEARCHES OF 10 KNOWN COMPOUNDS SEARCH STAGES Figure 9 with an average of 15.3 candidates percompound was the most
selective. The GCUV combination with an average of114.7 was the leastselective attributed mainly to the largenumber of defaults in both techniques. The MS.IR.GC combination with an averageof 4.0 candidates per compound was the mostselective of the 10 3-signature searches, followed by MSNMRGC with7.4 candidates and MSIRNMR with 8.8. The average of the 10 combinations was
13.2 candidates per zompound. The IRGCUV.MS combination was mostselective among the
4-signature combinations with an average of2.1 candidates per
compound. The IR.NMR.GC.MS combination was secondwith an average of 2.9 candidates. For the 5 combinations, 3.8 was
the average number of candidates. The search on all five signaturesresulted in an average
of 1.5 candidates per compound thus showing onceagain that
each signature contributed to theprecision. Inasmuch as all
of the 10 compounds searched in thecombinatorial tests resulted in unique identifications in the 7-stagesearch, it is concluded
that the state and element searches alsocontributed to the
identification. It can also be concluded that eachof the 7 stages made a
contribution to identification in theexperiment, not all equally,
however. It is possible that themodified match criteria, which eliminated many extraneous candidates,would alter the
results of Table 34 sufficiently to permitelimination of a
lITRESEARCH INSTITUTE
107 IITRI-C6104-4 signature. The selectivity shown inTables 22 and 32 suggests that four signatures would be theminimum number to be searched using the number and ranking ofpeaks as employed in the modified match criteria.
V. SUMMARY AND CONCLUSIONS The objective of the researchstudy reported herein was to determine the capability of acomposite signature to identify an unknowncompound. The composite signature is made up of a minimum number of data points obtainedfrom five analytical techniques. Identification was accomplished bymatching the composite signature of the unknownagainst a file of composite
signatures. In a sequential search, candidates wereselected by obtaining the logical intersectionof the individual signature matdhes. To determine the feasibility ofthe above concept a data
base of 500 representative organic compounds wasestablished and
a search experiment wasconducted. The 500 compounds represented 23 functional chemical groups and wereselected from 838 compounds
for which datawerecollected. The data consisted of five signatures obtained from the followinganalytical tedhniques:
1. Mass spectrometry
2. Infrared spectrometry
3. Nuclear magnetic resonance
4. Gas chromatography
5. Ultraviolet spectrophotometry.
HT RESEARCHINSTITUTE
108 IITRI-C6104-4 The 500 testcompounds wereselected on the basis of compounds had data on maximum dataavailability. One hundred 195 on 3 signatures, and five signatures,174 on 4 signatures, and distribution 31 on 2 signatures. Signature availability among chemical groupsare summarizedin Tables 1 to 5 and a list of the 500compounds appears inAppendix A. The signature data werecollected primarily frompublished used sources. Standard measurementsand classifications were signatures for more than100 when available. Gas chromatography compounds were obtainedfrom laboratorymeasurements. the number The search e.:periment wasdesigned to determine tolerances of data points tobe associated witheach signature, the
that would be requiredto accommodateinstrument variability,
and the matchcriteria that wouldfacilitate positive
identification. The match criteriaused in the first partof the experiment
were asfollows:
MS: Two out of fourof the most intensepeaks bands IR: Two out of threeof the most intense
NMR: The most intensepeak columns: GC: Relative retentiontime of either of two Carbowax 20M orSilicone SE-30
UV: Strongest absorptionband.
A search on state: solid, liquid, or gas,and on elements searches to facilitate was conductedprior to the signature the elimination ofcompounds from thecandidate list.
HT RESEARCHINSTITUTE
109 IITRI-C6104-4 Three categories ofinput compounds weretested. Input 1 consisted of 100 knowncompounds having all fivesignatures,
the data of which wereincluded in the file. Input 2 consisted
of five compounds forwhich data were availablebut not included
in the data file. Input 3 consisted of11 compounds for which
data on each of thefive signatures weregenerated in IITRI laboratories and data inthe file were obtainedfrom published
sources. Several search measures weredefined to aid in theevaluation
of feasibility. Discrimination is the capabilityto separate sets of compounds onthe basis of well-definedcharacteristics capability of a of a signature. Selectivity is fhe screening
sequential search. Precision is the accuracyof identification. Because data were notuniformly available forall compounds in
all signatures, dataavailability was a factorin the measure-
ments. The availability ratioof a signature is theratio of the
number of compoundshaving data to the totalnumber of compounds
in the data base. The default ratio is oneminus the availability 100 test ratio. Signature availabilityand default ratios for compounds are listed inTable 12. A match by default occurs
When input data for agiven signature of anunknown are available The and a stored compounddoes not have data forthe signature. discrimination factor is theproduct of fhe number ofmatches
obtained in a signature searchmultiplied by the defaultratio.
HT RESEARCHINSTITUTE
110 IITRI-C6104-4 Discrimination factors forthe signatures obtained by searching for the 100 Input1 compounds are listed inTable 13.
Infrared was the mostdiscriminating signature with an average of 30.39 matches persearch out of 470 availablesignatures. Mass spectrometry was secondwith an average of 49.83 matches per search out of478 available signatures. Gas chromatography was third with an averageof 11.74 matdhes persearch out of
308 available signatures. Nuclear magnetic resonance was fourth and ultravioletphotospectrometry was the least discriminating. Searches ordered on the basis ofdata availability provided more rapid screeningthan searches based on thenumber
of data points to be matdhed. Accordingly, the order ofsearch
in all the experiments was:
1. State
2. Elements
3. MS
4. IR
5. NMR
6. GC
7. UV
The search experiment wasdesigned for processing on a
computer but was conductedusing an inverted file systemin
which data were drilled on cards(Termatrex system). Matches
were established byoptical coincidence.
lITRESEARCH INSTITUTE
111 IITRI-C6104-4 Ratings were used to evaluatethe probability that an unknown compound was on thecandidate list in accordance with weights assigned to thesignatures and with the closeness of the match. The maximum ratings assigned tothe signatures and to the state and elementssearches were:
MS 40
IR 36
NMR 24
GC 18 (9 points per column)
UV 6
Elements 4
State 2
Total 130
A candidate list ratingtable showing the scoring procedure
appears in Appendix F. The most critical measure inestablishing feasibility is
precision. Of the 100 known Input 1 compounds, none wererejected
in the search process. Using the match criteria listedabove,
68 compounds were uniquelyidentified by the seven-stcge search.
The average number of survivingcandidates for the 100 searches
was 1.65 (see Table22). Sixty-five extraneous candidatesappeared
on the candidatelists of the 32 compounds withmultiple candi- dates, ranging from 1 to 5 extracandidates. The aldehydes
(8 compounds) had the lowest average ofsurviving candidates,
1.12, and the hydrocarbons had thehighest average, 2.65. The
lITRESEARCH INSTITUTE
112 IITRI-C6104-4 alcohols with an average of 1.63candidates ranked second highest. Only 2 of the 65 extraneous compoundshad data on all signatures (see Table 24). Eleven had one signature missing,
45 had 2 signatures missing, and 7had 3 signatures missing. Ratings for the candidates ranged from 34 to95 compared to maximum possible ratings of 117 to 130. The unavailability of
GC and NMR signatures contributedin large measure to the low precision in the alcohols and hydrocarbons. Two compounds appeared as candidates on sevencompound lists. The search for the five Input 2 candidatesresulted in three cases of perfect precision, thatis, no candidate compounds survived. One search resulted in the candidacy (score of 28) of a closely related compound butdata on three
signatures were unavailable. The search for a hydrocarbon
obtained six candidates. Nine of the 11 Input 3 compounds wereuniquely identified.
A hydrocarbon had four extraneouscandidates. Only one compound
was erroneouslyrejected because of failure to match on infrared. It is suspected that the compound'sfile data was in error
because of a significant discrepancy betweenthe published data
and IITRI laboratory data. To improve precision, the followingmodified matdh criteria
were imposed on the 32 Input 1compounds that had multiple
candidates:
HT RESEARCH INSTITUTE
113 IITRI-C6104-4 1. Both olumns ir GC wererequired
2. Three out of four MSpeaks were required, and the matches on permuted inputpeak 1 against stored
peak 4 and input peak4 against stored peak 1 were
eliminated
3. Two out of two NMRpeaks were required. The modified match criteriaeliminated 51 of the 65 extraneous candidates leaving only 8 testcompounds with multiplecandidates.
Of these 8 compounds,2 had one extraneous and6 had 2 extraneous candidates. Seven of the test compounds werehydrocarbons and one was analcohol.
In summary: 92 of 100 known compounds wereuniquely identified, and none of the 100 wereerroneously rejected
2 of 5 "unknown" compounds werenot rejected
when they should have been,but the surviving candidates were closely related
10 of 11 compounds whosesignatures were measured
in IITRI laboratories wereidentified, 9 of them uniquely; one compound waserroneously rejected
because of suspected errorin the published data.
The high precision attainedand the absence of erroneous
rejections, especially Whensignature data were completely
available, indicate thatidentification of organiccompounds
by the matching of compositechemical signatures is feasible. The screening of a large datafile in a sequential searchis
HT RESEARCHINSTITUTE
114 IITRI-C6104-4 effective and isreadily amenable tocomputer operations. closeness of matchesand The ratingsassociated with the signature provided a measurefor fhe weights assignedto each identifications can ranking thecandidates. Multiple candidates provide act as indicatorsof chemical groupsand can, therefore, clues for uniqueidentification. Refinement of the matchcriteria, signatureparameters, and in an tolerance limits canbe utilized toobtain high precision
enlarged data file.
The importance ofobtaining reliabledata from standardized analytical techniques wasrecognized early inthe program and The diversity of the test resultscorroborate this view. lack of coordination among indexing betweentedhniques and the composite signature. them poses manyproblems in developing a described in The establishmentof a data filealong the lines this report can serve asan impetusfor relating thetechniques
and coordinatingthe data sources.
VI. RECOMMENDATIONS of the Our investigationshave given us an awareness shortcomings of existinganalytical datainformation files and computerized file system can the contributionof our proposed weaknesses are: make to overcomingthese. The principal together in any way,and (1) that existingfiles are not tied type of datahave (2) that many of thecompounds for which one not found in data been collected, e.g.,NMR spectra, are
lITRESEARCH INSTITUTE
115 IITRI-C6104-4 collections of other types. This is the reason for the large number of defaults in our presentfile.
The input for such anoperational system should come from existing compilations and an attemptwill be made to coordinate the efforts of those organizationsproducing such compilations so that data gaps(defaults) will be minimized in the future.
Thus, we offer the followingrecommendations pertaining to the development of our system and itsutilization by the scientific community for organic compoundidentification and as an
information file.
1. A computerized systemfor storing, searching, and retrieving compound identification datashould be
established in a subsequent phase ofresearch.
2. The present file of 500 organiccompounds should be expanded to a minimum of 3,000compounds in the second
phase. This and the first recommendationconstitutes a pilot scale study to establish themechanics for
operating a larger file. A computerized system of3,000 compounds would constitute aworkable and useful system.
Actual application of thefile to solving users identification problems and informationneeds can
begin after the pilot study.
3. The mode of operation ofthe system should be centralized with respect to data storing,searching
and retrieving. However, data collecting pointsmight be located near major data sourcessuch as Sadtler,
HT RESEARCH INSTITUTE
116 IITRI-C6104-4 ASTM and NBS. The centralized modefor operating the file will offerbetter control over andcoordination of the data fileand its operations. should 4. Where possible, datafor all signatures be obtained for allcompounds. Our past research pointed out the need forcomplete data files. For the final operating system,this will require generation of some datamainly in gaschromatography, nuclear magnetic resonanceand to a lesser extentin mass spectrometry. Generating these datashould be undertaken by existingdata-generating organizations.
For the pilot study some gaschromatographic data may have to be generated. automatic 5. The investigation ofthe employment of
data reduction anddigitization is recommended. Equipment of this nature willspeed data processingand improve the accuracy of the data. signatures 6. The information file fororganic chemical will obtain data fromongoing data operationssuch as
NBS, ASTM and variousprivately-operated data services.
It is not our objective toprovide complete spectra or
yet another competingdata compilation. Our operation will supply the userwith a compoundidentification and references directing him tothe original orprimary data
source if he wishesto obtain the completespectra. In
this way vere hope toincrease the usefulnessof presently
PITRESEARCH INSTITUTE
117 IITRI-C6104-4 existing data services.
7. Our information file willbe, in essence, a subsystem of American Chemical Society'sdiscipline-based chemical information system. It can be so classifiedbecause of the tie-in of each compoundin our file to Chemical
Abstract Services' informationservice through the registry number. CAS has provided,and promised toprovide in the future,registry numbersfor all compounds used in our file making ourACS subsystem entirely feasible.
8. Although we have included ultravioletspectro- photometric data in our signaturesit is recommended that, in future developmentalwork, the UV signature be excluded in favor of a morecomplete file of signatures for each compound. We feel that while some discrimination was shown by the UVsignature, its elimination can be compensated for by morecomplete data for the other four signaturetechniques.
9. A survey of a representativesampling of potential users should be conducted todetermine the level of interest in and financial support of ourinformation
file and the growth of usage wecould expect for the
operating system.
lITRESEARCH INSTITUTE
118 IITRI-C6104-4 APPENDIX A
500 COMPOUNDS IN DATABASE
KEY
D= Dataavailable
T= Transparent A .= GC Column A dataavailable
B = GC Column Bdata available
AB = GC data forboth c,Aomns available
(Blank) = No data available *Preceding compound namedesignates Input 1 test compoundused in search experiment
RESEARCH INSTITUTE
119 IITRI-C6104-4 Availability of Spectra IITRI Compound MS GC IR NMR No. (CA name) UV
D D D 1 Methane T (same) D AB D D 2 Ethane T (same) D 3 n-Propane T (same) D AB D 4 n-Butane (same) AB D D 6 n-Hexane D (same) AB D D 7 n-Heptane D (same) D D 9 n-Nonane D AB (same) AB D D 10 n-Decane D (same) AB D D 12 n-Hexadecane D
rn 13 n-Heptadecane
B D 14 n-Octadecane T D
16 n-Eicosane T D (same)
17 Isobutane T D (2-Methylpropane) m 18 2-Methylheptane .. D AE
D 19 3-Methylheptane T D AB
. D 25 2,5-Dimethylhexane T D
lITRESEARCH INSTITUTE
120 IITRI-C6104-4 Avaijability of Spectra Compound IITRI MS GC IR NMR No. (CA name) UV
D D 28 2,3,4-Trimethylpentane T
29 2,3,3-Trimethylpentane T D
AB D 34 3,4-Dimethylhexane T D
D 36 Squalane T (2,6,10,15,19,23-Hexamethyl- tetracosane) D D 37 Isobutene D (2-Methylpropene) AB D D 40 Isoprene D (same) AB D D 41 1-Hexene D
A D D 42 trans-2-Hexene D
D 47 Butadiene D (1,3-Butadiene) D D 48 Cyclopropane D (same) A D D 50 Cyclopentane T D
D B D D 51 * Cycldhexane T (same) D AB D D 52 * Cycloheptane T
AB D D 53 cis-Decalin T D (cis-Decahydronaphthalene) D D D 54 trans-Decalin T (trans-Decahydronaphthalene) AB D D 56 Methylcyclohexane T D
lITRESEARCH INSTITUTE
121 IITRI-C6104-4 TITRI Compound Availability of Spectra IR NMR No. (CA name) UV MS GC
D ]I":: 57 Cyclohexene D AB
59 Dicyclopentadiene D AB D (3a,4,7,7a-Tetrahydro-4,7- methanoidene) D 60 * Benzene D D AB D (same) D D 61 * Toluene D D AB (same) D 62 o-Xylene D D AB D (same) D 63 m-Xylene D D AB D (same) D 64 p-Xylene D D AB D (same)
65 Ethylbenzene D D AB D D (same)
66 Vinylbenzene D D AB D (Styrene) wag,* D 67 Isopropylbenzene D D AB D (Cumene)
69 p-Isopropyltoluene D D D D (p-Cymene)
70 Naphthalene D D A D D (same)
71 Anthracene D D (same)
73 Azulene D D
77 Indane D D D D
78 Tetralin D A D D (1,2,314-Tetrahydronaphthalene)
lITRESEARCH INSTITUTE
122 IITRI-C6104-4 Availability of Spectra Compound IITRI UV MS GC IR NMR No. (CA name)
T D D D 79 cis-1,2-Dimethy1cyc lohexane
T D D D 80 trans-1,2-Dimethylc yclohexane
D AB D D 84 Ethylene oxide (same) D AB D D 85 Propylene oxide (same) D AB D D 88 Furan (same) D AB D D 89 Tetrahydrofuran (same) D D D D 90 Pyrrole (Azole) D D D 91 Pyrrolidine
D D D D 95 Pyridine (same) D B D D 96 Piperidine (same) D D D 98 3-Methylpyridine
D D D 99 4-Methylpyridine (4-Pico line) D D D 101 3-Methylpiperidine
D D D D 104 Isoquinoline (same) D D D D 105 Acridine
D D D 106 Indole
IIT RESEARCHINSTITUTE
123 IITRI-C6104-4 Availability of Spectra Compound TITRI UV MS GC IR NMR No. (CA name)
D D 107 Skatole (3-Methylindole) D D 110 Pyrazole
D AB D D 112 Dioxane (p-Dioxane) D D 113 Imidazole
NP D D 125 Tryptophan (same) NP D D 127 Phenylalanine (same) NP D 134 a-Alanine (same) NP D D 135 Glycine
D NP 146 Caffeine (same) D NP 147 Morphine (same) NP 151 Novocain
D D 154 Cholesterol (same)
164 Progesterone (same) D A D D 173 a-Pinene (2-Pinene) D 176 Alleacimene D
D D 177 p-Pinene (2(10) -Pinene)
lITRESEARCH INSTITUTE
124 IITRI-C6104-4 IITRI Compound Availability of Spectra No. (CA name) UV MS GC IR NMR
178 Camphpne D D D (same)
179 a-Terpineol D AB D (p-Menth-1 -en-8-ol)
180 Geraniol D A D D (3,7-Dimethyl-trans-2,6- octadien-l-ol)
181 Citral D D AB D (3,7-Dimethy1-2,6-octadienal)
182 Citronellol (3,7-Dimethy1-6-octen-1-ol)
183 Linalobl D AB D (3,7-Dimethy1-1,6-octadien-3-01)
184 Menthone D AB D (p-Menthan-3-one)
185 Menthol D AB D D (same)
194 Methylene dichloride D AB D D (atchloromethane)
195 Chloroform D D D (same)
196 Carbon tetrachloride D D T (same)
197 Chloroethane D AB D D (same)
198 Trichloroethylene D B D D (same)
200 1,1,2-Trichloroethane D AB D D (same)
201 cis-1,2-Dichloroethylene D AB D D
202 Tetrachloroethylene D D T (same)
lITRESEARCH INSTITUTE
125 IITRI-C6104-4 Availability of Spectra Compound IITRI UV MS GC IR NMR No. (CA name)
D D T 203 Hexachlorobenzene (same) D AB D 204 n-Propyl chloride
AB D D 205 Chlorobenzene D D (same) AB D D 206 o-Dichlorobenzene D D (same) D AB D 207 m-Dichlorobenzene D
AB D D 208 p-Dichlorobenzene D D (same) AB D D 211 Bromoform D (rr.tbramomethane) D AB D D 213 Bromobenzene D (same) D D T 214 Trichlorofluoromethane (same) D T 215 Dichlorodifluoromethane D (same) D T 216 Hexafluorobenzene D
A D D 217 Methyl iodide D (Iodomethane) AB D 222 Allyl chloride D (3-Chloropropene) A D D 223 Benzyldhloride D D (a-Chlorotoluene) D D 225 Benzotrichloride D (afala-Trichlorotoluene) D D 226 Benzal chloride D D (ala-Dichlorotoluene)
lIT RESEARCH INSTITUTE 1
126 IITRI-C6104-4 IITRI Compound Availability of Spectra GC IR NMR No. (CA name) UV MS
231 1,5-Dibromopentane D A (same) D D 235 o-Chlorotoluene D AB (same) D D 236 p-Chlorotoluene D D (same) D 239 3-Bromopropyne D AB D (same) D 240 Bromodhloromethane D A D
241 1-Iodonaphthalene D A (same) D 242 2-Iodopropane D AB D (same) D 243 Methanol T D AB D (same) D 244 Ethanol D AB D (EI:hyl alcohol) D 245 Propanol T D AB D (Propyl alcohol)
246 Butanol T D AB D D (Butyl alcohol)
247 Pentanol T D AB D D (Pentyl alcohol)
249 Heptanol T D AB D D (Heptyl alcohol)
250 Isopropanol T D AB D D (Isopropyl alcohol)
251 Isobutanol T D AB D D (Isobutyl alcohol)
253 2 -Methyl -1 -butanol T D AB D (same)
lITRESEARCH INSTITUTE
127 IITRI-C6104-4 Availability of Spectra IITRI Compound UV MS GC IR NMR No. (CA name)
T D AB D 254 2-Methy1-2-butanol (tert7Pentyl alcohol) T D AB D 255 3-Methy1-2-pentanol
AB D 256 2,2-Dimethy1-1-butanol
AB D 257 2,3-Dimethy1-2-butanol
T D AB D D 258 Cyclopentanol (same) T D AB D D 259 Cyclohaxanol (same) T D AB D 260 cis-2-Methylcyclohexanol
D AB D D 262 2-Propyn-1-ol (same) D AB D D 266 2-Methy1-3-butyn-2-ol (same) D AB D 267 3-Methyl-l-pentyn-3-01 (same) D D AB D D 268 Phenol (same) D D D D 269 o-Cresol (same) D D 271 p-Cresol (same) D D D D 272 a-Naphthol (1-Naphthol) D D 273 13-Naphthol (2-Naphthol) D D 274 2,4-Dimethylphenol (2,4-Xylenol)
lITRESEARCH INSTITUTE
128 IITRI-C6104-4 Li Availability of Spectra Compound IITRI UV MS GC IR NMR No. (CA name)
D 276 2,5-Dimethylphenol D (2,5-Xylenol) D D D 277 3,5-Dimethylphenol D (3,5-Xylenol) D AB D D 278 Octanol T (Octyl alcohol) T D AB D D 279 Nonanol (Nonyl alcdhol) D AB D 281 Ethylene glycol T (same) AB D 282 1,2-Propanediol T D (same) D AB D 283 1,3-Propanediol T
D AB D D 284 * Diethylene glycol T
D AB D 285 1,2-Butanediol T
D AB D 286 1,4-Butanediol T (same) D D 288 Catechol D D (Pyrocatechol) D D 289 Resorcinol D D (same) AB D 291 2,3-Dimethy1-2,3-butanediol D (same) AB D 293 2,4-Pentanediol T D
D AB D D 294 * Acetone D (same) D AB D D 295 * Methyl ethyl ketone D (2-Butanone)
lITRESEARCH INSTITUTE
129 IITRI-C6104-4 Availability of Spectra IITRI Compound UV MS GC IR NMR No. (CA name)
D D AB D D 296 2-Pentanone (same) D AB D D 297 3-Pentanone (same) D AB 298 3-Methy1-2-butanone
D AB D 299 2-Hexanone
D AB D 300 3-Hexanone D
D AB D 301 3-Methy1-2-pentanone
D AB D D 302 2-Heptanone (same) D AB D 304 4-Heptanone D (same) D AB D D 305 Cyclopentanone D (same) D AB D D 306 Cyclohexanone D (same) D AB D 307 3-Buten-2-one (same) D AB D 308 5-Hexen-2-one (same) D AB D 309 3-Methy1-3-buten-2-one
D AB D 310 2,3-Butanedione D
D AB D D 312 * 2,4-Pentanedione D (same) D D A D D 313 * Acetophenone (same)
lITRESEARCH INSTITUTE
130 IITRI-C6104-4 Availability of Spectra IITRI Compound UV MS GC IR NMR No. (CA name)
D AB D D 314 Propiophenone D (same) D D 315 Benzophenone (same) D D 317 Benzi1 (same) D D 318 p-Benzoquinone D D (same) AB D 320 Formaldehyde D D (same) AB D D 321 Acetaldehyde D D (same) AB D D 322 Propionaldehyde D D (same) AB D D 323 Butyraldehyde D D (same) AB D 324 Valeraldehyde D (same)
V AB D 325 Hexanal D
AB D 326 Heptanal D (same) AB D D 327 Isobutyraldehyde D (same) AB D D 328 Isovaleraldehyde D (2-Methylbutyralddhyde) D 331 2-Ethylhexanal D AB (same) D 332 Acrolein D AB (same)
333 Methacrolein (Methacrylaldehyde)
HT RESEARCH INSTITUTE
131 IITRI-C6104-4 Availability of Spectra Compound IITRI UV MS GC IR NMR No. (CA name)
D D 336 2,4-Hexadienal (Sorbaldehyde) D D AB D D 337 * Crotonaldehyde (same) D A D D 339 Benzaldehyde D (same) D AB D D 340 Furfural D (2-Furaldehyde) D AB D D 344 Paraldehyde (same) D D D 347 Formic acid D (same) D D D 349 Propionic acid (same) D D D 350 Butyric acid (same) D D D 351 Valeric acid (same) D D D 352 Hexanoic acid (same) D D 353 Heptanoic acid (same) D D D 357 Benzoic acid D (same) D D 358 Phenylacetic acid D D (same) D D 359 o-Toluic acid D D (same) D D 360 m-Toluic acid D D (same) D D 361 p-Toluic acid D D (same)
HT RESEARCH INSTITUTE
132 IITRI-C6104-4 Availability of Spectra IITRI Compound UV MS GC IR NMR No. (CIA name)
D D 362 p-Naphthoic acid (2-Naphnoic acid) D D 365 Succinic acid (same) D D 367 Adipic acid (same) D D D D 368 o-Phthalic acid (Phthalic acid) D D 369 m-Phthalic acid (Isophthalic acid) D D D D 370 Terephthalic acid (same)
ID D D 373 Methacrylic acid (same) acid 374 2,4,6-Mesitylenecarboxylic
D D 375 Stearic acid (same) D AB D D 379 Methyl formate (Formic acid, methylester) D AB D 381 Propyl formate (Formic acid, propylester) AB D Butyl formate D [. 382 D AB D 384 Isobutyl formate
D AB D D 385 Methyl acetate (Acetic acid, methylester) D AB D D 387 Propyl acetate (Acetic acid, propylester) D AB D D 388 Butyl acetate (Acetic acid, butylester)
lITRESEARCH INSTITUTE
133 IITRI-C6104.4 Availability of Spectra Compound IITRI UV MS GC IR NMR No. (CA name)
D AB D D 389 Isopropyl acetate (Acetic acid,isopropyl ester) D AB D 390 Isobutyl acetate (Acetic acid,isobutyl ester) D AB D D 391 Methyl propionate
D AB D 392 Ethyl prc)ionate (Propionic acid,ethyl ester) D AB D D 393 Propyl propionate (Propionic acid,propyl ester) D AB D 395 Isopropyl propionate
D AB D D 396 Isobutyl propionate
D AB D 398 sec-Butyl acetate (Acetic acid, sec-butylester) D AB 402 Ethylidene diacetate
D AB D D 403 Vinyl acetate
D AB D 404 Allyl acetate
D AB D 405 Vinyl butyrate (Butyric acid, vinylester) D AB D D 407 Methyl methacrylate (Methacrylic acid, methylester)
410 Allyl propionate
D AB D D 411 Isopropenyl acetate
D AB D 412 2-Ethyl-1-hexyl acetate (Acetic acid, 2-ethylhexylester)
UT RESEARCHINSTITUTE
134 IITRI-C6104-4 IrrR1 Compound Availability of Spectra IR NMR No. (CA name) UV MS GC
D 415 Isopentyl propionate D AB D
D 418 Methyl benzoate D D A D (Benzoic acid, methylester) D 419 Ethyl benzoate D D D (Benzoic acid, ethyl ester) D 421 Dimethyl-o-phthalate D D A D (Phthalic acid, dimethylester)
423 Dimethyl-p-phthalate D D A
424 Diethyl-p-phthalate D D
D 425 Diethyl malonate D D AB D (Malonic acid, diethylester)
426 Diethyl succinate D AB D D (Succinic acid, diethylester)
427 Diethyl oxalate D AB D (Oxalic acid, diethyl ester)
432 Ethyl methacrylate D D D (Methacrylic acid, ethylester)
434 Ethyl acrylate D AB D D (Acrylic acid, ethyl ester)
435 Methyl cinnamate D D D D (Cinnamic acid, methyl ester)
438 Methyl-o-toluate (o-Toluic acid, methyl ester)
439 Methyl-p-toluate D D D (2-Toluic acid, methyl ester)
440 Methyl-m-toluate D D D (n=roluic acid, methylester)
441 Dimethyl ether D AB D D (Methyl ether)
lITRESEARCH INSTITUTE
135 IITRI-C6104-4 Comound Availability of Spectra IITRI UV MS GC IR NMR No. (CA name)
D AB D D 443 Diethyl ether (Ethyl ether) D AB 444 Methylpropyl ether
D AB D D 445 Dipropyl ether (Propyl ether) D AB D 446 Butylethyl ether
D AB D D 449 Isopropyl ether (same) D AB D 450 Allylethyl ether
D AB D 452 Ethylvinyl ether (same) AB D 453 Butylvinyl ether (same) D AB D 454 2-Ethyl-1-hexyl vinylether (2-Ethylhexyl vinylether) D AB D D 456 Diallyl ether
D D D 457 Dibenzyl ether D (Benzyl ether) D AB D D 458 Diphenyl ether D (Phenyl ether) D AB D D 459 Anisole D (same) D D D 464 Butylamine (same) D D D 466 Hexylamine (same)
467 Dimethylamine (same)
PITRESEARCH INSTITUTE
136 IITRI-C6104-4 IITRI Compound Availability of Spectra No. (CA name) UV MS GC IR NMR
468 Diethylamine D D D (same)
471 Triethylamine D AB D D (same)
472 Trimethylamine D D D (same)
475 Allylamine D D D (same)
476 Aniline D D D D (same)
477 N-Methylaniline D D (same)
478 N,N-Dimethylaniline D D D D (same)
480 o-Toluidine D D D D (same)
484 m-Phenylenediamine D D D (same)
485 p-Phenylenediamine D D D D (same)
486 Diphenylamine D D D (same)
490 o-Tolidine D D D (3,3'-Dimethylbenzidine)
491 Formamide D D D (same)
492 Acetamide D D D (same)
493 Propionamide (same)
494 Butyramide
lITRESEARCH INSTITUTE
137 IITRI-C6104-4 Availability of Spectra Compound IITRI MS GC IR NMR No. (CA name) UV
D D D 498 Benzamide D
D D D 501 Dimethvlformamide (N/N-Dimethylformamide) D D 515 Methacrylamide (same) D 516 N-Methylformanilide D
D A D D 522 Benzonitrile D (same) AB D D 524 Acetonitrile D (same) AB D D 525 Propionitrile D (same) D D 526 Dicyanopropane D
D AB D D 527 * p-Tolunitrile D
D AB D D 528 * o-Tolunitrile D
AB D 529 Valeronitrile D (same) A D D 530 3-Butenenitrile D (same) D D 531 Acrylonitrile D (same) A 532 Malonitrile D (same) AB D D 534 Nitromethane D D (same) AB D D 535 Nitroethane D D (same)
lITRESEARCH INSTITUTE
138 IITRI-C6104-4 Availability ofSpectra IITRI Compound UV MS GC IR NMR No. (CIA. name)
D D AB D D 536 Nitropropane (1-Nitropropane) D D AB D D 537 2-Nitropropane (same) D D AB D D 540 Nitrobenzene (same) D A D D 542 o-Nitrotoluene
D D AB D D 544 p-Nitrotoluene (same)
545 m-Nitrotoluene (same) D D D 547 1-Nitronaphthalene (same)
557 Propylmercaptan (1-Propanethiol) D AB D D 560 Butylmercaptan (1-Butanethiol) D D AB D D 562 Thiopheno3 (Benzenethiol) D D D D 567 Dimethyl sulfide (Methyl sulfide) D D AB D D 568 Diethylsulfide
D D AB D D 569 Ethyl disulfide (same) D D AB D 572 tert-Butyl sulfide (same) D D AB D T 573 Carbon disulfide (same) D D A D D 574 Diphenyl sulfide (Phenyl sulfide)
lITRESEARCH INSTITUTE
139 IITRI-C6104-4 IITRI Compound Availability of Spectra NMR No. (CLA name) UV MS GC IR
D 575 Phenyl disulfide D D
578 Dimethyl sulfoxide D A D D (Methyl sulfoxide)
587 Dimethyl sulfone
594 2-Chloroethanol D AB D
595 3-Hydroxy-2-butanone D D D
600 2-Methoxyethylvinyl ether D AB D
610 Phthalic anhydride D D (same)
611 Succinic anhydride D D D (same)
613 Acetyl chloride D D
618 Trichloroacetic acid D D D
626 o-Nitrophenol D D A (same)
628 o-Methoxybenzaldehyde D A D D (o-Anisaldehyda)
633 Chloral (same)
655 Tetraethyl lead
658 Acetal D AB D D (Acetaldehyde, diethyl acetal)
659 Ketal D AB (Acetone, dimethyl acetal)
lITRESEARCH INSTITUTE
140 IITRI-C6104-4 IITRI Compound Availability of Spectra No. (CA name) UV MS GC IR NMR
661 Isobutyric anhydride D D (same)
662 Benzoic anhydride D D D D (same)
663 Maleic anhydride D D D D (same)
664 Coumarin D D B D D (same)
665 Chloroacetic acid D D (same)
666 Dichloroacetic acid D D (same)
667 m-Nitrobenzoic acid D D D D (same)
668 Diethoxymethane D AB D D
670 o-Methoxyphenol D D AB D D (same)
671 m-Methoxyphenol D D A D D (same)
672 p-Methoxyphenol D D A D D (same)
673 3-Bromopropionitrile D D
674 3-Bromopropionic acid D D
675 2-Bromobutyric acid D D (same)
676 2-Bromo-4-chlorophenol D D D D
677 p-Bromophenol D D D D (same)
lITRESEARCH INSTITUTE
141 IITRI-C6104-4 IITRI Compound Availability of Spectra No. (CA name) UV MS GC IR NMR
678 p-Chlorobenzonitrile D D
679 m-Chlorophenol D D D D (same)
680 * 1-Chlorophenol D D A D D
681 Salicylaldehyde D D A D D (same)
682 3-Etboxy-4-hydroxybenzaldehyde D D D D (same)
683 2-Furan acrolein D D D
684 * 2,4-Dichlorobenzaldehyde D D A D D
685 p-Anisaldehyde D D A D D (same)
687 p-Chlorobenzaldehyde D D AB D D (same)
688 3-Methoxy-1-butanol AB D D
689 2-Ethoxyethanol D AB D (same)
690 2-Butoxyethanol D AB D (same)
691 2-Methoxy-1-propanol D AB
692 1-Methoxy-2-propanol D AB D (same)
693 2-Ethoxyethyl acetate D AB D (2-Ethoxyethano1, acetate)
695 Dipropyl acetal D AB D (Acetaldehyde, dipropyl acetal)
lITRESEARCH INSTITUTE
142 IITRI-C6104-4 IITRI Compound Availability of Spectra No. (CA name) UV MS GC IR NMR
698 4-Hydroxy-4-methy1-2-pentanone D D B D D (same)
699 2-Methoxyethanol D AB D D (same)
700 bis(2-Etboxyethyl)ether D AB D D
701 2,3-Dimethylbutane T D AB D D (same)
702 o-Diethylbenzene D AB D D
703 p-Diethylbenzene D AB D D (same)
704 m-Diethylbenzene D D AB D D (same)
705 2,2-Dimethylbutane T D AB (same)
706 1-Methylcyclohexene D D D (Methylcyclohexene)
707 4-Viny1-1-cyclohexene D AB (4-Vinylcyclohexene)
711 3-P1-ienylpropene D D D D
712 Isopentane T D D D (2-Methylbutane)
713 2,4-Dimethylpentane T D D D (same)
714 2-Methylpentane T D D D (same)
715 3-Methylpentane T D AB D D (same)
716 2,3-Dimethyl-1-butene D D D
lITRESEARCH INSTITUTE
143 IITRI-C6104-4 IITRI Compound Availability of Spectra IR NMR No. (CA name) UV MS GC
D 717 2-Methy1-2-butene D D
B D D 718 * Biphenyl D D (same) D 720 3,5-Dimethylpyrazole D
D 721 2,6-Dimethylpyridine D D D
D 722 4-VinylPyridine D
D 723 2-Methylfuran D AB D
D 726 Chlorocyclohexane ID D
D D 727 1-Chlorohexane
D 728 1,2-Dichloropropane ID D (same) D 729 1,3-Dichloropropane D AB D (same) D 732 1,2,4-Trichlorobenzene D A D (same) D 733 Bromocyclohexane D D (same)
734 2-Bromobutane D AB D D (same)
735 1-Bromo-3-methylbutane ID D D (same)
736 1,2-Dibromobutane ID D D
737 Bromoethane D AB (same)
UT RESEARCH INSTITUTE
144 IITRI-C6104-4 IITRI Compound Availability of Spectra No. (CA name) UV MS GC IR NMR
738 1-Bromopentane D AB D D (same)
739 1-Bromopropane D AB D D (same)
740 2-Bromoprcpane D AB D D (same)
741 2-Bromo-2-methylpropane D D (t-Butyl bromide)
742 1,2 -Dibromopropane D D (same)
743 1,3-Dibromopropane D D (same)
744 2-Bromo-1-propene D D D
745 2,3-Dibromo-1-propene D D D
746 Iodocyclopentane D D D
747 1-Iodobutane D D D
748 2-Iodobutane D D D
749 Iodoethane D AB D D (same)
750 1-Iodopropane D AB D D (same)
751 Iodobenzene D D D
754 1-Bromo-2-chlorobenzene D D D
755 1-Bromo-4-chlorobenzene D D D
lITRESEARCH INSTITUTE
145 IITRI-C6104-4 Availability of Spectra IITRI Compound UV MS GC IR NMR No. (Ch name)
T D AB D D 757 2-Pentanol (same) T D AB D D 758 2,2-Dimethy1-1-propanol (same) T D AB D D 759 2-Ethyl-1-butanol (same) T D AB D D 760 3-Heptanol
T D AB D D 761 * 2-Ethyl-1-hexanol (same) T D AB D 763 2-Hexanol
T D AB D 764 3-Hexanol
T D AB D 765 2-Methyl-1-pentanol (same) T D AB D 767 4-Methy1-2-pentano1
T D AB D 768 2-Methy1-3-pentanol
T D AB D 769 3,3-Dimethy1-2-butanol
T D AB D 770 2-Heptanol
T D AB D 771 4-Heptanol
T D AB D 772 2127.Dimethy1-1-pentano1
T D AB D D 773 * 2-Octanol (same) D AB D 774 2-Methy1-2-propen-1-ol
HT RESEARCHINSTITUTE
146 IITRI-C6104-4 Availability of Spectra IITRI Compound GC IR NMR No. (CAL name) UV MS
D D 775 206-Dimethylphenol D
776 Cyclopentylmethanol T D
D 777 Benzyl alcohol D A D (same) D D 778 t-Butanol T AB
779 2,5-Hexanediol T D D D
780 2-Methy1-2,4-pentanediol T D AB D (same) D D 781 1,3-Butanediol T D AB (same) D D 783 2,3-Butanediol T D AB
D 785 4-Methy1-2-pentanone D AB D (same) D 786 2-Nonanone D AB D
D 787 Camphor D D AB D (same) D 788 4-Methylcyclohexanone D D D
789 * Butyrophenone D D A
D 791 * Cinnamaldehyde D D AB D (same)
792 o-Tolualdehyde D A
793 2-Ethylbutanal D AB D D
lITRESEARCH INSTITUTE
147 IITRI-C6104-4 IITRI Compound Availability of Spectra No. (aPi name) UV MS GC IR NMR
794 Piperonal D D AB D D (same)
795 Cyclohexanecarboxylic acid D D D (same)
796 2-Methylbutyric acid D D D (same)
797 Isobutyric acid D D D (same)
798 Senecioic acid D D D
799 Pentyl formate D A
800 Hexyl formate D AB D
801 Allyl formate
803 Heptyl acetate D AB D (Acetic acid, heptyl ester)
804 Cyclohexyl acetate D AB D
805 Methyl butyrate D AB D
Propyl butyrate D AB D D (Butyric acid, propyl ester)
807 Isopropyl butyrate . D AB D
808 Butyl butyrate D AB D (Butyric acid, butyl ester)
809 Pentyl butyrate D AB D
810 Isopentyl butyrate D AB D (Butyric acid, isopentyl ester)
lITRESEARCH INSTITUTE
148 IITRI-C6104-4 IITRI Compound Availability of Spectra No. (ak name) UV MS GC IR NMR
811 Butyl isobutyrate D AB D (Isobutyric acid, butyl ester)
812 Propyl acrylate AB D
814 Isopentyl acetate D AB D D
815 Hexyl acetate D AB D D (Acetic acid, hexyl ester)
816 Pentyl propionate D AB D D
817 Ethyl butyrate D AB D D (Butyric acid, ethyl ester)
818 Benzyl benzoate D D D D (Benzoic acid, benzyl ester)
819 * Butyl benzoate D D AB D D (Benzoic acid, butyl ester)
820 Pentyl ether D AB D D
821 Butylmethyl ether D AB D
822 Butyl ether D AB D (same)
823 Hexyl ether D AB D (same)
824 Isobutylvinyl ether D AB D (same)
825 Ethoxybenzene D D
826 p-Ethoxytoluene D D D D
827 NIN-Dimethylcyclobexylamine D D D (same)
HT RESEARCH INSTITUTE
149 IITRI-C6104-4 IITRI Compound vailability of Spectra No. (CA name) UV MS GC IR NMR
828 tert-Butylamine D D (same)
829 sec-Butylamine D D D
830 Isobutyronitrile D AB D D (same)
831 Butyronitrile D AB D D (same)
832 Fumaronitrile D D D
833 Hexanenitrile D AB D D
834 4-Methylpentanenitrile D D D
835 Phenylacetonitrile D A D D (same)
836 Oleic acid D D D (same)
837 m-Tolunitrile D A D D
838 Benzylmercaptan D AB D D (a-Toluenethiol)
839 2-Propanethiol D D D
840 114-Butanedithiol D D D
841 * Butyl sulfide D D AB D D
842 2-Methyl-1-butene D D D
843 cis-1,3-Dimethylcyclohexane T D D
HT RESEARCH INSTITUTE
150 IITRI-C6104-4 IITRI Compound Availabinty of Spectra IR NMR No. (CA name) UV MS pc
D 844 cis-1,4-Dimethylcyclohexane T D
D 845 trans-1,3-Dimethylcyclohexane T D
D 846 trans-1,4-Dimethylcyclohexane T D
850 trans-2-Methylcyclohexanol T D AB
lITRESEARCH INSTITUTE
151 IITRI-C6104-4 APPENDIX B
AVAILABILITY OF SIGNATUREDATA
Table 1Major Spectra Sources: MS Spectra MS Data Compilations
Table 2Major Spectra Sources: IR Spectra IR Data Compilations
Table 3Major Spectra Sources: NMR Spectra NMR Data Compilations
Table 4Major Spectra Sources: GC Spectra GC Data Compilations
Table 5Major Spectra Sources: UV Spectra UV Data Compilations
HT RESEARCHINSTITUTE
152 IITRI-C6104-4 MAJOR SPECTRA MS SPECTRA TABLE 1 SOURCES MAJORASTM SOURCE SPECTRA3,200 spectraAVAILABLE FORMIndexedstrongest OF SPECTRA by peaks $13 for DS-27 COST LiteratureCommercial,SPECTRAIndustrial, SOURCE andstrongest3,200($96) indices IBM cards-peak tik)Ul 9,000collectionadded (European in June) to be CollectionsMiscellaneous .Anmem
MS DATA COMPILATIONS
American Society forTesting & Materials. Mass Spectral Data Index, DS27. 248 pp., 1964. $13. Mass Spectral Data Mass Spectral Data,3,200 cards. $96. DS27-1a. cards. $115. DS27-1b. Mass Spectral Nameand Formula, 3,500 Spectral Data, Cornu, A., and Massot,R. Compilation of Mass Limited, Presses Index of Spectresde Masse. Heyden and Son Universitaires de France,1966. Rosenstock, H. M. National Bureau ofStandards. Garvin, D., and Chemical Kinetics Two National Bureauof Standards DataCenters: 7, No. 1, 31-4,1967. and Mass Spectrometry. J. Chem. Doc. Rosenstock, H. M. (NBS Mass SpectrometryData Center, Information from 1,000basic documents Washington, D. C.). Dec. published since 1955. Inf. Retr. Newsl.2, No. 2, p. 5, 1966.
HT RESEARCHINSTITUTE [.
154 IITRI-C6104-4 MAJOR SPECTRA SOURCES TABLE 2 MAJOR SOURCE SPECTRA AVAILABLE FORM OF SPECTRA IR SPECTRA Complete indexed collection, COST API,SPECTRA MCA, SOURCETRC, DMS, ASTM growth100,000IR - 2i2 potential spectra to 16,L; CodedNo complete spectra spectra paredthetapesadditional.$2,250 9th for -supplement; index IBM 3607090to collection upbeing pre- Available on (model 20) through LiteratureNBS,andIRDC-Japan. Sadtler,Foreign. - Domestic able.10thIndices15,000-20,000/Year Supplement DS-24 through avail- LaboratoriesSadtler Research 32,000 spectra hardCompletefilm.spectra copy spectra oridentified micro- - All Sadtler tional.$4,000. Spec-finder addi- Sadtler spectra. Coblentz Society 5,000 spectra Completeand indexed spectra by ASTM.only. Sold10/sheet. by Sadtler Research Laboratories.UniversityIndustrial andResearch byA11Completeidentified ASTM. Coblentz spectra. andspectra indexed Laboratories.30/sheet. Industrial TRC Approximately4,000 spectra AllTRCfiedSummer, TRC indexand spectra indexed 1968.to be identi- by ASTM. available Researchand University Laboratories. IR DATA COMPILATIONS
American Society for Testing &Materials. Numerical list of abstracted infrared spectraindexed on Wyandotte-ASTMpunched cards, December 1961. Firstsupplement-Jan.1963 Secondsupplement-Jan.1964 Thirdsupplement-Apr.1965 Fourthsupplement-Apr.1967. American Society for Testing &Materials. Serial number list of compound names and references,publidhed IR gpectra 1963. Fourth supplement to serialnumber lists - DS29-S4, April, 1967. American Society for Testing &Materials 1916 Race Street Philadelphia, Pennsylvania 19103 Indexes - 100,000 spectra of organiccompounds.
Anderson Physical Laboratory 609 South Sixth Street Champaign, Illinois Mostly organic IR - 0.7p to3.2p 2 spectra per 81/2" x 11"loose leaf gheet 1,200 Anderson Near InfraredSpectra 500 polymiclear compounds.
Coblentz Society. Pure compounds and commercialproducts. IR only,212 to 30p 5,000 spectra available on8-1/2" x 11" sheets Coblentz Society spectra - obtainedfrom Sadtler Research Laboratories, 3316 Spring Garden Street,Philadelphia 2, Pa. $100 per 1,000 spectra.
Dobriner, K., et al. Infrared Absorption Spectra ofSteroids. InterJcience Publishers, Inc., NewYork. 1953. An Atlas - Vol. I. 308 spectra on Perkin-ElmerModel 21 double beam spec- trometers, NACL or CalciumFluoride prism. Roberts, G., Gallagher, B. S., andJones, R. N. Infrared Absorp- tion Spectra of Steroids - AnAtlas - Vol. II. 1958. 453 gpec- tra.
lITRESEARCH INSTITUTE
156 IITRI-C6104-4 Documentation of MolecularSpectroscopy (DMS). Published jointly by Butterworth'sScientific Publications and Verlag Chemie GmbH.Weinheim/Bergstrosser. Spectra available from: Spex Industries, Inc. 205-02 Jamaica Avenue Hollis 23, New York
Fisk University, Nashville,Tennessee (Infrared Spectroscopy Research Laboratory). Maintains a library of 100volumes on infrared spectroscopy and gas chromatography. Hershenson, H. M. IR Absorption Spectra. Index for 1958-1962. Academic Press. 1964. Jonker Business Machines, Inc.,Gaithersburg, Maryland. ASTM Infrared OpticalCoincidence Index. Data retrieval and correlationfor organics in the following IR collections: API Project #44 1,702 compounds NRC-NBS 2,495 compounds Coblentz 1,859 compounds MCA 170 compounds IRDC-Japan 11197 compounds All hydrocarbons from all other ASTM indexed collections 11828 compounds Korobeinicheva, I. K., Petrov, A.K., and Koptyng, V. A. Spec- tral Atlas for Aromatic andHeterocyclic Compounds No.1, In- frared and UltravioletAbsorption Spectra of PolyfluoroAromatic and Polyfluoro HeterocyclicCompounds. Nauk: Hovosibirsk. 171 pages, 1967. LeCentre d'Information del'Infra-rouge (CIR). Organic compounds and Inorganic compounds. Sources: Sadtler DMS API Research Project#44 Coblentz collections Institute Francois de Petrole,RNU Renault, Rhone Poulenc, St. Gobain. CIR published 200 spectra thendiscontinued publication.
Ministry of Aviation. An index of publishedinfrared spectra. Her Majesty's StationeryOffice, London, 1960.
UT RESEARCH INSTITUTE
157 IITRI-C6104-4 Bureau of Standards. National ResearchCouncil - National Spectra Data Project. 8-1/2" x 11" Mostly organiccompounds - 2,510spectra on sheets and Project terminated andspectra distributedto universities not for profitresearch organizations asof December 1967.
Sadtler ResearchLaboratories, Inc. 3316 Spring GardenStreet Philadelphia, Pennsylvania 19104 IR - 32,000complete spectraorganic compounds Plenum Press. Szymanski/ H. A.(Ed.). Infrared BandHandbook. 1965. Supplements 1-21Plenum Press, 1965. Szymanski, H. A. (Ed.). Infrared Band Handbookand Supplements New York. and InterpretedInfraredSpectra. Plenum Press. 1963. $7.50. Plenum Press. White, R. G. Handbook of IndustrialIR Analysis. 1964. Wright-Patterson Air ForceBase, Ohio. Aliphatic and Aromatic Published as Hydrocarbons, bromohydrocarbons. IR 15,4-35A. WADC TR 57-359,57-413, 58-198 andsupplement 1 to 58-198.
IIT RESEARCHINSTITUTE
158 IITRI-C6104-4 MAJOR SPECTRA NMR SPECTRATABLE 3 SOURCES EAJOR SOURCE SPECTRA4,000 spectraAVAILABLE CompleteFORH OF SPECTRA spectra COST$800 for SPECTRASadtler SOURCE spectra LaboratoriesSadtlerTRC Researdh 1, 260 spectra Complete spectra collection30Vsheet UniversityIndustrialsearch Labs. and Re- -
NMR DATACOMPILATIONS
Shoolery, J. N. (Instrument Johnson, L.F., and ghacca, N. S., NMR SpectraCatalog, Vol.1 Division ofVarianAssociates). and 2,National Press,1962. and Webb, J.S. (Eds.). Formula Howell, M. G.,Kende, A. S., Vol. 2. Plenum Press. New York. Index to NMRLiterature Data, 1956a. $22.50. 518 pp. 1966. CA 62, circulated byESSO Research Chamberlain,privately NMR Data - chemical shiftvalues andcoupling and Engineering. Tables of constants forrepresentativestructures.
Sadtler ResearchLaboratories, Inc. Garden Street 3316 Spring 19104 Philadelphia,Pennsylvania printed form ormicrofilm at 4,000 NPR spectraavailable in a costof $680. Varian Associates, Catalogs, Vol.1 and 2. Varian Spectra of 700 spectraof representtive California. A total code, and by Palo Alto, by afunctional group compounds,indexed by name, chemical shifts.
lITRESEARCH INSTITUTE
160 IITRI-C6104-4 TABLE 4 SOURCES MAJOR SPECTRA -GC DATA SPECTRA COST SPECTRA SOURCE MAJORASTM SOURCE SPECTRA4,000 AVAILABLE FORMIndexed OF compilation 111 CommercialLiteratureIndustrial
1111111111111111111111111.111111.111111111.111111111.1.11.111111111111111111.11111111...... _._-_-_-_ -
GC DATACOMPILATIONS Compilation of Gas forTesting & Materials. American Society $20.00. Chromatographic Data. STP 343. 626 pp. 1963. Schupp, 0. E.). DS25A published1967(see reference,' Preston McReynolds, W. O. Gas ChromatographyRetention Data. 1966. $25. Technical AbstractsCo., Chicago,Illincis1 Co., Preston, S. T., Jr.,and Gill, M. (Preston Tech. Absts. to theliterature on Chicago, Ill.). Bibliography and index gaschromatography. 1966. Compilation of GasChrom- Schupp, O. E., III,and Lewis, J.S. Publ. No. DS25A,2nd Ed., atographic Data. ASTM Data Series ASTM, 732 pp. 4,000 compounds. 1967. Literature. Plenum Signeur, A. V. Guide to GasChromatography Press, New York. 359 pp. 1964. $12.50.
11T RESEARCHINSTITUTE
162 IITRI-C6104-4 - t=1111M7r MAJOR SPECTRA TABLE 5 SOURCES MAJOR SOURCE SPECTRA AVAILABLE FORM OF UV DATA SPECTRA COST SPECTRA SOURCE ASTM 25/749 spectra plete$754collectionadditionalIndices com- CommercialLiteratureIndustrial LI) Sadtler Researdh 22,000 spectra Complete81/2" x 11" spectra or $2/278 LaboratoriesSadtler Research LaboratoriesTRC 11246 spectra microfilmComplete spectra 3WSheet UniversityIndustrial and Labs. UV DATA COMPILATIONS
American Societyfor Testing &Materials 1916 Race Street 19103 Philadelphia, Pennsylvania compounds. Indexed through the7th supplement. 25,749 organic Cost $754 forcomplete index. Spectrometry Group Compounds. Photoelectric Atlas of Organic Spektroskopie. and the Institutfor Spektrochemieand Angewandte Plenum Press, NewYork. 5 volumes. 1966. Numerical List of American Society forTesting & Materials. Spectra Indexed onWyandotte- abstracted Ultravioletand Visible Second supplementto The ASTM punchedcards. March 1963. Abstracted Ultravioletand VisibleSpectra Numerical List of September 1966. Indexed onWyandotte-ASTM PunchedCards. of organic Hiragma, K. Handbook of UV andvisible spectra 616 pp. Data on 8,443 compounds. Plenum Press, NewYork. compounds fromliterature. 1967. Spectra. Hershenson, H. M. Ultraviolet andVisible Absorption York. 1966. Index 1960-1963. Academic Press, New Korobeinicheva, I. K.,Petrov, A. K.,and Koptyng, V.A. Aromatic andHeterocyclic CompoundsNo. 1, Spectral Atlas for Polyfluoro Aro- Ultraviolet AbsorptionSpectra of Infrared and Nauk: Novosibirsk. matic and PolyfluoroHeterocyclic Compounds. 171 pp. 1967. in the UV andVisible Region. Lang, L. (Ed.). Absorption Spectra Academic Press, NewYork. 1966. Visible Region, Lang, L. (Ed.). Absorption Spectrain the UV and Hungarian Academy ofSciences. 1961-1963.
Sadtler ResearchLaboratories, Inc. 3316 Spring GardenStreet Philadelphia, Pennsylvania 19104 Printed form, ormicrofilm availableof complete spectra. 22,000 UV spectraavailable at costof $21278.
PITRESEARCH INSTITUTE
164 IITRI-C6104-4 ,
APPENDIX C
DATA MASTERSHEET
IIT RESEARCHINSTITUTE
165 IITRI-C6104-4 106,423 p-xylene 106.160 CAS REG. NO. COMPOUND NAME MOL. WT.
TRIVIAL AND TRADE NAMES:Xylol
CH STRUCTURE: FORMULA: 810 $11101 13 3°C M.P.:
B.P.:138.4°C STATE: Liquid ANALYTICAL DATA McReynolds, CONDITION A CONDITION B Data Source - GC "GC RetentionData," 1966, AV=1.63 RV=1.56 pp 51 and145 g g 889 1180 I= Ix= x
51 77 27 78 50 92 MS MASS:CHARGE: 91 106 105 39 162 137118 82 77 77 REL. INTENSITY: 624 297 164 DATA SOURCE-APIPro'ect 44,4422
NMR ppm: NON
7 STRONGEST: 2.3 0 1 2 3 4 5 6 CDC1 ...... 05 - - 30 - - SOLVENT: _3 1, 1962,4203 DATA SOURCE-Varian, '7%7(31. TMS) (arranged in order ofincreasing shiftvalue from
IR MICRONS: 12.6, 6.6, 6.9
Phase - liquid Cell -0.01 mm 42276 DATA SOURCE-Sadtler AMII.....ftwriva. UV MILLIMICRONS:
= 2.2 mp, max Solvent - Cyclohexane
DATA SOURCE-Sadtler#2276N
166 1ITRI-C6104-4 APPENDIX D
CHEMICAL SIGNATURESEARCH EXPERIMENT
lITRESEARCH INSTITUTE
167 IITRI-C6104-4 PROJECT C6104
CHEMICAL SIGNATURE SEARCHEXPERIMENT
DATA SHEETS FOR "UNKNOWN"COMPOUNDS
Compound name Number
STATE Solid Liquid Gas
ELEMENTS NOH 0N BrCl
SKIP TRANSPARENT NMR
-0.1 ppm Max. Peak +0.1 ppm
SKIP TRANSPARENT
-2 mg Peak
SKIP NOT POSSIBLE Column A: Time Column B: Time
SKIP Peak 1: -0.1 A Peak 1 +0.1 g Peak 2: -0.1 A Peak 2 +0.1 A Peak 3: -0.1 A Peak 3 +0.1 g
SKIP Peak 1: Peak 3 Peak 2: Peak 4
Coder Date *Magnitude of tolerancebased on a slidingscale. 168 IITRI-C6104-4 APPENDIX E
CHEMICAL SIGNATURESEARCH RECORD
UTRESEARCHINSTITUTE
169 IITRI-C6104-4 PROJECT C6104 ExperimentUnknown CHEMICAL SIGNATURES SEARCH RECORD OperatorDate Routine Parameter Tally Skip MatCh M Search Tally Default D MD MatchCum. ElementsState I uv NP Col. A Col. B AB IR 1-1 1-2 1-3 2-1 2-2 2-3 3-1 3-2 3-3 I 1M I 314 Match3-4 = 2M+ 3M MS NP4-1 1-14-2 1-24-3 1-34-4 1-4 1M 2-1 2M 2-2 3M 2-34M Matcb2-4 = 2M+ 3M+ 4M 3-1 3-2 3-3 REMARKS: APPENDIX F
CANDIDATE LIST
IIT RESEARCHINSTITUTE
171 IITRI-C6104-4 Record 14 (*R14) ExperimentUnknown CANDIDATE LIST AnalystDate Compound State , Element No. NMR L Data Match GC Peaks IR Peaks MS Rank No. Name 1 M M MDMD1 Col 2 Col,DM D M , 1 -1- .-s 1- 1 I 1 i I I I APPENDIX G
CANDIDATE LIST RATING TABLE
IIT RESEARCHINSTITUTE
173 IITRI-C6104-4 CANDIDATE LIST RATING TABLE
SEARCH PARAMETER RATING SEARCH PARAMETER RATING 16 - 2 1-1 STATE 1-2 12 1-3 8 1-4 4 ELEMENTS - 4 2-1 9 + TOLERANCE 20 2-2 12 9 NMR EXACT PEAK 24 2-3 2-4 6
COLUMN A: MS 3,1 4 6 3-2 6 + TOLERANCE 3-3 8 EXACT 9 3-4 6
GC COLUMN B: 4-1 1 6 4-2 2 + TOLERANCE 4-3 3 EXACT 9 4-4 4 MAXIMUM 18 MAXIMUM 40
1 to 1 18 + TOLERANCE 4 15 UV 1 to +1 EXACT PEAK 6 1 to -.f 12 1 to +2 9 1 to -a- 6 Any SKIP 0 1 to +3 3 Any DEFAULT 1
2 to 1 8 , 2 to +1 6 IR 2 to 12 2 to +2 9 2 to -a- 8 2 to +3 6
3 to 1 2 3 to +1 1 3 to -2- 4 3 to +2 3 3 to .5. 6 3 to +3 4 MAXIMUM 130 MAXIMUM 36
174 IITRI-C6104-4 DISTRIBUTION LIST
This report is beingdistributed as follows:
Copy No. Recipient
1-100 National ScienceFoundation 1800 G Street Washington, D. C.
Attention: Mr. T. Quigley
101 IIT ResearchInstitute Section Files
102 IIT ResearchInstitute Editors, M. J. Klein,Main Files
103 IIT ResearchInstitute G. E. Burkholder
104 IIT ResearchInstitnte M. E. Williams
105 IIT ResearchInstitute P. Llewellen
106 IIT ResearchInstitute E. S. Schwartz
107 IIT ResearchInstitute H. J. O'Neill
108 IIT ResearchInstitute B. E. Edwards
109 IIT ResearchInstitute W. M. Linfield
110 IIT ResearchInstitute R. G. Scholz
lITRESEARCH INSTITUTE
175 IITRI-C6104-4