Determined Feasibility of Identifying an Unknown Compound

Home , Octadecane

DOCUMENT RESUME ED 025 413 SE 004 965 By- Scholtz, R G.. And Others of Feasibility Study of the Development of a Specialized ComputerSystem of Organic Chemical Signatures Spectral Data Illinois Inst. of Tech, Chicago. Spons Agency-National Science Foundation.WashingtON D.C. Repor/ No- IITRI-C6104- 4 Pub Date Apr 68 Contract NSF CS14 Note- 186p. MRS Price MF-S0 75 HC- $9 40 Descriptors- *Chemistry,Information Retrieval, *InformationScience, *Organic Chemistry Identifiers-National Science Foundation This final report of a feasibilitystudy describes the researchperformed in assessing the requirementsfor a chemical signaturefile and search schemefor organic compoundidentification and information retrieval.The research performed to determined feasibility of identifying anunknown compound involved screeningthe compound against a file of chemicalsignatures of knowncompounds. The chemical signatures were obtainedfrom (1) infrared spectrometry(IR), (2) nuclear magnetic resonance (NMR) spectrometry,(3) mass spectrometry(MS), (4) gas chromatography (GC), and (5) ultraviolet (UV)spectrophotometry. A physical stateand element search precededthesignaturesearch.The basicsystem containeddatafor 500 representative pure organiccompounds. The discriminatingability of each signature, from most to least discriminating,follow the order (1) IR, (2)MS, (3) GC, (4) NMR, and (5) UV. The logic and searchprocedures of the experimentalsystem weredesigned for ready conversion to a computersystem. Appendixes contain alisting of the 500 compounds in the data baseand other experimentdetails. OW EDUCATION & WELEARE U.S. DEPARTMENT Of HEALTH, OFFICE OF EDUCATION

REPRODUCED EXALTLY ASRECEIVED FROM THE THIS DOCUMENT HAS SEEN OPINIONS ORIGINATING IT.POINTS Of VIEW OR PERSON OR ORGANIZATION REPRESENT OFFICIAL OFFICEOF EDUCATION STATED DO NO1 NECESSARILY POSITION OR POLICY

Report No.IITRI-C6104-4 (Final Report) FEASIBILITY STUDY OF THEDEVELOPMENT OF A SPECIALIZEDCOMPUTER SYSTEM OF ORGANIC CHEMICALSIGNATURES OF SPECTRAL DATA

National ScienceFoundation Report No. IITRI-C6104-4 (Final Report)

FEASIBILITY STUDY OF THEDEVELOPMENT OF A SPECIALIZED COMPUTERSYSTEM OF ORGANIC CHEMICAL SIGNATURESOF SPECTRAL DATA

April 17, 1967, through April16, 1968 Contract No. NSF-0514 IITRI Project C6104

Prepared by

R. G. Scholz, E. S. Schwartz, and M. E. Williams

IIT RESEARCH INSTITUTE Tedhnology Center Chicago, Illinois 60616

for

National Science Foundation 1800 G Street Washington, D. C.

Attention: Mr. T. Quigley

Copy No. cV; April 291 1968

HT RESEARCH INSTITUTE FOREWORD

This is Report No.IITRI-C6104-4 (Final Report) onIITRI Project C6104, entitled"Feasibility Study of theDevelopment of a Specialized ComputerSystem of OrganicChemical Signatures This program isbeing performed for the of Spectral Data." The National ScienceF.:undation under ContractNo. NSF-0514. work reported herein wasperformed from April l 1967, through April 16, 1968.

The project leaderis Dr. Robert G.Scholz, Research Chemist, and the project is underthe administrativeguidance of Dr. Warner M.Linfield, Manager, OrganicChemistry Research. Major contributions tothis report were madeby Eugene S. Schwartz, Senior Scientist, andMiss Martha E. Williams,Manager, Technical Information Research.

The authors acknowledgethe excellent work ofthe following persons whocontributed to this study: M. M. Bogolin, P. A. Demink, B. E.Edwards, J. J. Finn, P. A.Llewellen, B. L. Mankus, H. J.O'Neill and A. J.Starshak. Respectfully submitted, j\ IIT RESEARCH INSTITUTE

R. G. Scholz Research Chemist Organic Chemistry Research

Approved by:

Yvi i4AA/° W. M.Linfield Manager Organic Chemistry Research

Li RGS:dfp

lITRESEARCH INSTITUTE

ii IITRI-C6104-4 FEASIBILITY STUDY OFTHE DEVELOPMENT OF A SPECIALIZEDCCMPUTER SYSTEM OF ORGANIC CHEMICALSIGNATURES OF SPECTRALDATA

ABSTRACT

This final reportdescribes a study ofthe feasibility of information file fororganic chemical signatures. a proposed for a The purpose of thestudy was to assessthe requirements file and a searchscheme which couldbe used for compound identification and informationretrieval. Use of the system could be made bychemists from industry,government and to identify educational institut;.ons. Users would be able compounds and obtainreferences to theprimary data sources compound sample from by supplying eitheranalytical data or a derived. These data wouldthen be submitted which data could be Each to the computerfor screening againstthe master file. compound in the filewill bear theChemical Abstractsregistry number and sourcereferences so that thechemical signature file will be tiedin with the existingNational Chemical Information System andoriginal data sources.

The researchperformed determinedthe feasibility of identifying an unknowncompound by screeningit against a file of chemical signaturesfor known compounds. The chemical signatures were obtainedfrom Infrared (IR)spectrometry, spectrometry nuclear magnetic resonance(NMR) spectrometry, mass (MS), gas chromatography(GC), and ultraviolet(UV) spectro- preceded photometry. A physical stateand element search the seardhes on thefive signatures. Data for 500representative pure organiccompounds were used todevelop the basic system. from existing sources,such as ASTM, Sadtler, Data were obtained Some Varian, and otherGovernment and privatecollections. additional data weregenerated.

The work encompassedselecting the representativecompounds; acquiring source data;analyzing, indexingand cataloging the data; devising afile structure andsearch strategy; and demonstrating feasibilityby operating thedata-matching and -screening system. A series ofexperimental searches wasconducted using the compounds as the database. In these searches 500 representative treated as unknowns 100 compounds allhaving five signatures were and screened againstthe file compounds. Also, eleven compounds analyzed by the fivetechniques and the data of the 100 were file with obtained were screenedagainst the file. An inverted optical coincidence(Termatrex system) was usedto demonstrate the feasibility ofthe identification system.

HT RESEARCHINSTITUTE

iii IITRI-C6104-4 A candidate had to fulfill the following criteria:

1. Same physical state at standard conditions

2. Elements

3. Match on two of the three most intense IR bands, within tolerances, in any order

4. Match on two of the four most intense MS peaks in any order

5. Match on relative retention time, within tolerances, on any one of 2 selected GC columns

6. Match on the most intense peak, within tolerances, in NMR

7. Match on the most intense band position, within tolerances, in UV.

Of the 100 "unknowns" screened using these original matcl criteria, 68 came through as unique candidates while 32 came through with multiple candidates. The large number of multiple candidates is attributed primarily to the lack of data (defaults).

A search made for the 11"unknowns" which were analyzed produced 9 unique candidates,one lost to suspected erroneous data in the file and one with4 candidates, attributed to lack of data.

A test on the 32 compounds having multiple candidates using modified, more rigid match criteria resulted in 24 coming through as unique candidates.

The discriminating ability of each signature, from most to least discriminating, follow the order: IR, MS, GC, NMR and UV.

The logic and the search procedures of the experimental system were designed for ready conversion to a computerized system.

To evaluate the results of these searches a weighting sdheme based on chemical criteria, i.e., relative usefulness of the types of analytical data, was established. Exact, tolerance, and default matches were also considered in the weight assignments. When data for more than one compound survived the search, the differences in ratings clearly identified the better candidate.

lITRESEARCH INSTITUTE

iv IITRI-C6104-4 The study has demonstrated that the composite chemical signature method for screening and matching data is feasible and provides a reliable base for compound identification.

IIT RESEARCH INSTITUTE

v IITRI-C6104-4 TABLE OF CONTENTS Page

Abstract iii

I. Introduction 1

Selection of Compounds 3

III. Acquisition of Data 4

A. General 4 B. Gas Chromatography 11 C. Nuclear Magnetic Resonance 16 D. Infrared 24 E. Mass Spectrometry 26 F. Ultraviolet Spectrophotometry 27 G. Survey of Signature Availability 28 H. Cataloguing Data and Data Master Card Design 31

IV. Testing Signature Concept 32

A. Experimental Design 32 B. Search Procedures 38 C. Search Measures 43 D. Test Results with Initial Match Criteria 47

V. Summary and Conclusions 108

VI. Recommendations 115

Appendix A-500 Compounds in Data Base 119

Appendix B-Availability of Signature Data 152

Appendix C - Data Master Sheet 165 Appendix D - Chemical Signature Search Experiment 167 Appendix E - Chemical Signature Search Record 169

Appendix F - Candidate List 171

Appendix G - Candidate List Rating Table 173

lITRESEARCH INSTITUTE

vi IITRI-C6104-4 LIST OF TABLES

Table Page

1 Compounds Available in Multisignature Combinations 6

2 Distribution of Multisignatures by Functional Group 7

3 4-Signature Combinations by Functional Group 8

4 3-Signature Combinations by Functional Group 9

5 2-Signature Combinations by Functional Group 10

6 Operating Conditions for Gas Chromatographic Retention Data 12

7 Variability of GC Retention Data 14

8 NMR Peak Positions for Diethyl Phthalate 19

9 NMR Peak Positions for Diethyl Phthalate According to Sadtler 20

10 NMR Peak Positions for Diethyl Phthalate According to Varian 21

11 Search Table (CSTABL) 41

12 Signature Availability and Default Ratios 45

13 Signature Discrimination for 100 Known Compounds 48

14 MS Matches on Peak Permutations 52

15 Number of Candidate Compounds Obtained with Different MS Match Criteria 53

16 IR Matches on Peak Permutations 54

17 Number of Different Compounds Obtained with Different IR Match Criteria 55

18 Number of Candidate Compounds Obtained with Different GC Match Criteria 57

19 State and Elements Data Availability 59

20 Selectivity of State and Elements Searches 59 lITRESEARCH INSTITUTE

vii IITRI-C6104-4 LIST OF TABLES(cont.)

Table Page 63 21 Selectivity with DifferentSearch Orders 67 22 Search Selectivity by Stages

23 Search Selectivity ofFunctional Groups 68

24 Data Availability 65 Extraneous CandidateCompounds 70 71 25 Distribution of Candidate Ratings 72 26 Search Precision

27 Compounds with Multiple Candidatesby Functional Group 74

28 Matches Obtained withMulticandidate Test Compounds 75

29 Extraneous Candidate CompoundSelected More than Once 92

30 Candidate Compounds Retrievedin Search for Octane 94 97 31 Comparison Between Storedand Laboratory Data 98 32 Identification of Compounds fromLaboratory Data

33 Candidates Eliminated withModified Match Criteria 101 103 34 Selectivity of Combinationsof Signatures

lITRESEARCH INSTITUTE

viii IITRI-C6104-4 LIST OF FIGURES

Figure Page

1 NMR Spectra of DiethylPhthalate 18

2 Distribution of Signature Matches 50

3 Selectivity with Sequence Orderedby Match Criteria 61

4 Selectivity with Sequence Orderedby Default Ratio 62

5 Selectivity of 7-Stage Searches(1) 64

6 Selectivity of 7-Stage Searches(2) 65

7 Selectivity of 2-Signature Searches Averages of 10 Known Compounds 104

8 Selectivity of 3-Signature Searches Averages of 10 Known Compounds 105

9 Selectivity of 4-SignatureSearches Averages of 10 Known Compounds 106

IIT RESEARCH INSTITUTE

ix IITRI-C6104-4 FEASIBILITY STUDYOF THE DWELOPMENT OF A SPECIALIZEDCOMPUTER SYSTEM DATA OF ORGANIC CHEMICALSIGNATURES OF SPECTRAL

I. INTRODUCTION engaged In the past yearIIT ResearchInstitute has been data in a program tostudy the feasibilityof an analytical information file fororganic compounds. The program was Science carried out underthe sponsorship ofthe National

Foundation. Although there is awealth ofanalytical dataavailable, know where it islocated nor the analyticalchemist does not would be of does he have ready accessto it. This information unknown considerable helpparticularly foridentification of the number of compounds. In the area oforganic chemistry, research in the areasof compounds is steadilyincreasing and analysis is becoming organic synthesis aswell as instrumental complexity of available morediversified. The volume and analytical data aregrowing and evenwell-trained analytical keep pace withthis and organic chemistsfind it difficult to of chemists tohave growth. Thus, there is aneed on the part their analytical a placeWhere they can gofor help with

problems. chemists The usual procedurefor analyticaland organic compound is toobtain trying to confirmthe identity of a spectral data by as manyanalytical techniques asare

available to comparethese data withreference data and lITRESEARCH INSTITUTE

1 IITRI-C6104-4 identify the unknown compound. All too often the chemist is limited by inadequate files. A search would befacilitated by a centralized systemwhich can use either spectraldata or can provide a more complete array ofspectral data to search for and identify the unknowncompound and direct the user to additional analytical, chemicaland physical data. To date, there is no such system. There are available variouscollections of spectral data fromindustrial and governmentorganizations such as Sadtler ResearchLaboratories, Varian Associates: the American Society for Testing andMaterials, the American Petroleum Institute and the NationalBureau of Standards.

Most of these collections arelimited in scope and, more important, are not correlated witheach other. At present a researcher will have to spendconsiderable time and money in an effort to locate a sourcethat has the analytical data on file and Whichwill help identify an unknowncompound and reference the original data. A well-planned andultimately computerized central data file wouldprovide a single location

to which a scientist couldsubmit his data or bring compounds

from which the spectral datacould be obtained. Such a file containing all published standardizeddata could be screened

to identify an unknowncompound. If the data are not available he need not go elsewhere if thefile incorporated data from

other collections. If the data are contained in thefile he will be able to identify thecompound, learn the original

source of the data andbe directed to additionalchemical and

lITRESEARCH INSTITUTE

2 IITRI-C6104-4 physical data pertainingto the compound. The objective of the research studyreported herein was todetermine the capability of a compositesignature to identify anunknown ccmpound. The composite signatureis made up of a minimum number of selected datapoints obtained from 5analytical accomplished by matching techniques. Identification would be the composite signatureof the unknown against afile of composite signatures. In a sequentialsearch, candidate compounds would be selected byobtaining the logicalintersection of the

individual signaturematches. We have prepared adata file and developedthe logical

procedures for searchingthe file. The work involvedacquiring

data for a referencefile of 500 pure organiccompounds, indexing

and tabulating the data,defining the search strategyand finally testing the systemfor its ability toselect the proper the work compounds. This final report onthe project describes

and the results. Recommendations based onthe study are provided. A

comprehensive list showingdata sources, type ofdata available,

costs and otherinformation are also included.

II. SELECTION OF COMPOUNDS Our initial efforts weredirected towards selecting500

compounds for which wecould obtain 2, 3, or 4sets of data

derived from infraredspectrometry, nuclearmagnetic resonance,

mass spectrometryand gas chromatography. We wanted the

lITRESEARCH INSTITUTE

3 IITRI-C6104-4 compounds to represent manycompound types and,in some cases, compounds whose compositesignatures weresimilar to test the selectivityof the sequentialsearch to select the the proper compound. To aid in theselection of compounds Chemical Names was used SOCMA Handbook forCommercial Or anic IUPAC name and CASregistry to identify thecompounds by their identified by our systemthe CAS number. When a compound was registry number wouldaccompany eachcompound allowing the user information about to obtainadditional chemicaland physical organic compounds were the compound. Ultimately, 838 different considered for use. These representedhydrocarbons, terpenes, steroids, heterocyclics, polyfunctionals,aromatic, amino acids,

alkaloids and others. Also, we made thedecision to include ultraviolet as a fifthsignature. From the 838compounds, we hoped to select as many aspossible (up to500) that had 5 compounds, 23 of theoriginal signatures. In the 500 selected 24 compound types wererepresented with onlycarbohydrates

being excluded.

III. ACQUISITION OF DATA

A. General Our plan wes to extractand index datafrom various

literature and commercialsources tobuild our data base.

Initially, since wehad not yet determinedthe minimum amount concerned with of data necessaryfor identification we were getting more data than wethought we might needfor the lITRESEARCH INSTITUTE

4 IITRI-C6104-4 final system. We tried to find allsignatures for all 838 compounds. This proved to be a major problem and moredifficult than had been anticipated. The difficulty lay not onlyin the lack of data but in our method of compoundselection - we were limiting the compounds to several hundred andrestricting ourselves to finding as many signatures as wecould for just these compounds. However, this approach was reasonable atthis stage of the program inasmuch as we were trying tomaintain the diversity of compound types. Of the 838 compounds consideredfor use, 357 had NMR data,

233 had UV data, 478 had MSdata, 476 had IR data and 308had

GC data. Of these, 500 were selectedfor use in our file. The following is a breakdownof the data distributionfor these

500 compounds: 100 had 5 signatures each

174 had 4 signatures eadh

195 had 3 signatures each

31 had 2 signatures each

Table 1 lists the number ofcompounds available in 5-, 4-, 3-,

and 2-signature combinations. Table 2 summarizes the number of signatures available in the23 chemical groups used in the

experiment. Tables 3, 4 and 5 list thecombinations of signatures available in the 23chemical groups in 4-, 3-, and

2-signature combinations.

HT RESEARCHINSTITUTE

5 IITRI-C6104-4 Table 1

COMPOUNDS AVAILABLE INMULTISIGNATURE COMBINATIONS

SIGNATURES NUMBLR

UV/MS/GC/IR/NMR 100 MS/GC/IR/NMR 89 UV/GC/IR/NMR 2 UV/MS/IR/NMR 50 UV/MS/GC/NMR 0 UV/MS/GC/IR 33 UV/MS/GC 3 UV/MS/IR 21 UV/MS/NMR 7 UV/GC/1R 2 UV/GC/NMR 1 UV/IR/NMR 7 MS/GC/IR 62 MS/GC/NMR 4 MS/IR/NMR 85 GC/IR/NMR 3 UV/MS 6 UV/GC 0 UV/IR 1 UV/NMR 0 MS/GC 6 9 MS/IR. MS/NMR 3 GC/IR 3 GC/NMR 0 IR/NMR 3

TOTAL 500

6 IITRI-C6104-4 Tabl e 2

DISTRIBUTION OF MULTISIGNATURES BY FUNCTIONAL GROUP

NUMBER FUNCTIONAL OF COMPOUNDS NUMBER OF SI GNATURES AVAILABLE GROUP I N GROUP FIVE FOUR THREE TWO

ALCOHOLS 50 19 20 10 1 ALDEHYDES 21 8 5 7 1 ALKALOIDS 3 0 0 2 1 AMI DE S/I MI DE S 8 0 1 4 3 AMI NES 18 0 4 13 1 AMI NO AC IDS 4 0 1 2 1 CARBOXYLIC ACIDS 24 0 8 15 1 DI OL S 14 4 9 1 0 ESTERS 56 4 23 26 3 ETHERS 22 2 8 9 3 HALOCARBONS 57 5 22 30 0 HETEROCYCLIC S 22 0 11 11 0 HYDROCARBONS 70 23 22 22 3 KETONES 27 10 9 7 1 NITRI LES 18 3 7 8 0 NI TRO 9 6 1 1 1 POLYP UNC TIONAL S 48 10 14 18 6 STEROIDS 2 0 0 1 1 SULF I DES 8 5 2 1 0 SULF ONE S 1 0 0 0 1 SULF OXE DE S 1 0 1 0 0 TERPENES 11 0 4 5 2 THIOLS 6 1 2 2 1

TOTALS 500 100 174 195 31

7 IITRI-C6104-4 Table 3

4-SIGNATURE CCMBINATIONS BYFUNCTIONAL GROUP

r4'

FUNCTIONAL GROUP

ALCOHOLS 3 1 3 0 13 ALDEHYDES 4 0 0 0 1 0 ALKALOIDS 0 0 0 0 0 AMIDES/IMIDES 0 0 1 0 0 AMINES 1 0 3 0 0 AMINO ACIDS 0 1 0 0 CARBOXYLIC ACIDS 0 0 8 0 0 6 DIOLS 0 0 3 0 ESTERS 19 0 3 0 1 6 0 2 0 0 ETHERS . 0 1 HALOCARBONS 19 0 2 0 0 HETEROCYCLICS 6 0 5 0 5 HYDROCARBONS 7 0 10 0 3 KETONES 4 0 2 0 0 NITRILES 7 0 0 0 0 NITRO 1 0 0 0 1 POLYFUNCTIONALS 6 0 7 0 0 STEROIDS 0 0 0 0 1 SULFIDES 0 0 1 0 0 SULFONES 0 0 0 0 0 SULFOXIDES 1 0 0 0 0 1 TERPENES 3 0 0 0 0 THIOLS 2 0

TOTALS 89 2 50 0 33

8 IITRI.C6104-4 00000N0000000000r400000°1

1 I-100MMOMOM00000M0Mr4r400r4NI2 r-I N i

\1 Or4r4000000000r40r40000000011, M Z

LI N000000r4MWIN000000MOINN 0 M Z

000001000000r40N0I-100000001r.

00H000000000000000000001H

000000000000000000

r-10000000000N.0000000000

M m00.INON00,-10r403000N0000.-1

`8 .-100000000000N000000000

M M 04 A 0 H w 4 um 0 A M 1HZM

cf) U) MWAH MW A 0 OMM4 N 4ge P4 AH Wil :M 6'! 1W MI IITRI-C6104-4 0 0 0 H 0 0 0 0 0 00 00 0 0 0 0 0 0,-1 0I 01

0 .0000000000000000000010

00 0 0 0,-100 ,-100000000000001 8

000 0 0 000000000000001

000 ri0000 000rIOOON 0ririO øs

00ri00000riri000ri00N0000001

I 0 00000000000000000000000 II

.1 0000000000 00000ri00000001 ri

I000000000000000000000001 0

1 riri0000000ri00NO0Or-I0000001 0

cn a, r) u) H O (J) IX (J) UM Z O rzlrl w MHZ 0 Z 140 H (11 E-4 cn cn 2 Ho ovliq (11 QUI u)ra A H 4ti R O 1-1 H.., 0 Cil cn 8 u)ra 4(11 H 0 OU1M o cnix 0 o ra 04 0 E- M 8c4 fzi g M° R 0 A R 13:1 WO .52 0 Ei a, E-4 E-4 44 u rz 10 The list of the 500compounds in the data base shownin

Appendix A contains, inaddition to the IITRI numberassigned to a compound, common namesand the name designated byChemical Abstracts, and the dataavailability for each compound.

B. Gas Chromatography The analytical data ofqualitative value in gaschromatography is retention time. Worker's have been trying tofind the best retention designationthat will be accurate,universally applicable and acceptable. The main difficulty is thelack of universal applicability; eachworker has a privatecollection of columns for analysis andtheir retention data areunrelatable to other workers'data. Most commonly used data arerelative retention values and a recentlyintroduced value called the 1 Kovats' Retention Index. We decided to use therelative retention values because of theirwide use and ease of measure- We ment. We also chose toluene asthe inttrnal standard. believe that for any futurework with this system we would

select another internal standardbecause under the operating

conditions we had selected, theretention distance of toluene

is somewhat short, possiblyresulting in relatively large

errors in measurement. A compound with alonger retention distance, such as hexyl or heptyl acetate, would bebetter

suited for this purpose. The operating conditions chosenfor our data base are

shown in Table 6. Two columns, Carbowax20M and Silicone SE-30 1Kovats, E., Z. Anal. Chem.,181, 351 (1961). lITRESEARCH INSTITUTE

11 IITRI-C6104-4 thermostatted at 160°C were selected. These columns are frequently used by gaschromatographers and offer excellent thermal stability and high-temperatureoperation if needed.

The liquid phases also arereadily obtainable from a multitude of distributors.

Table 6

OPERATING CONDITIONS FOR GASCHROMATOGRAPHIC RETENTION DATA

Condition A Condition B

olumn: Column: 20% SE-30 (Gen. Elec. Co.) 21.9% Carbowax 20M (Carbide 0.5% Polytergent J-300 & Carbon Chem. Co.) 60/70 mesh Celite 545(J-M Co.) 0.5% Polytergent J-300 3/8 in. x 12' Aluminum 60/70 mesh Celite 545(J-M Co.) 160°C 3/8 in. x 12' Aluminum 160°C Solvent: toluene & chloroform Solvent: methylene chloride Conditioned: 48 hr @ 180°C Conditioned: 40 hr @ 170°C 18 hr @ 160°C Carrier gas: Helium Carrier gas: Helium V toluene: 14.6 ml V toluene: 15.9 ml

We obtained gas chromatographicdata from two sources:

1. W. O. McReynolds' publishedcollection2

2. Data generated in our laboratory.

McReynolds' compilation includesrelative retention data

) for about 350 compounds on and Kovats' Retention Indeces(Ix 80 different substrates at twodifferent temperatures. We were

able to use about 180 compounds outof his compilation. Relative

times (Rv ) were calculated fromthe following equation.

R -R . x air Rw= Relative Retention = -R tol air

2McReynolds, W. 0.1 "Gas Chromatographic Retention Data, Preston Technical Abstracts Co., Chicago, In., 1966. lITRESEARCH INSTITUTE

12 IITRI-C6104-4 where Rx, Rt01 and Rair are measured retention distances for the compound, toluene and air,respectively. Although Kovats' retention indices are included in our datafile they were not used in the screening. Hence, no detail will be given concerning its computation except to say it is a numberthat relates the retention time of the unknown to the retentiontimes for the normal hydrocarbons which bracket theunknown.

In order to increase available gaschromatographic data we found it necessary to generatedata. At first, an attempt was made to assess theusefulness of literature data, ASTM published a gas chromatography data compilation(ASTM DS25A) whidh compiles data from the literature through 1965. Unfortunately, as Table 7 illustrates,relative retention distances obtained under similar operating conditions can varywidely making such data unusable for our purposes. Two conclusions drawn from the literature survey are:

1. It is not feasible to use literature sources

for standard gas chromatographic data

2. We were (and would be) restricted to the GCdata

we have on hand or can generateunder the conditions

used by McReynolds. Consequently, we requested and received permissionfrom

the National Science Foundation to generate gaschromatographic

data for 100 to 150 compounds. Mr. W. 0. McReynolds supplied

us with the Carbowax 20Mand Silicone SE-30 columnc he used and we generated data for about 125compounds.

HT RESEARCH INSTITUTE

13 IITRI-C6104-4 Table 7 VARIABILITY OF GC RETENTION DATA

ANI4ONNIMMINIEWMINNIIIM Commuricl Relative RetentionDataa On 20M Relative to Acetone Acetoldehyde 0.68 0.53 0.37 Propionaldehyde 0080 0.86 0.87 0.90 Butyraldehyde 1.40 1.35 1.35 1.34 Ethanol 2.20 1.27 1.40 1.48 Isovaleraldehyde 3.70 2.6 2.06 Formaldehyde 0.60 0.39 On 20M Relative to Toluene Dipropylether 0.20 0.26 Diethylether 0.68 0.11 Methanol 0.35 0.35 n-Amylacetate 2.08 1.82 n-Heptanol 5.73 4.7 n-Hexane 0.09 0.10 n-Decane 0.65 0.70 Acetone 0.29 0.31 0.21 2-Butanone 0.44 0.47 Benzene 0.62 0.64 Ethylbenzene 1.52 1.53 Methylformate 0.19 0.20 n-Butylacetate 0.99 0.94 Acetaldehyde 0.16 0.16 1.2 Butyraldehyde 0.38 0.42 On 20M Relative to Benzene Butylacetate 2.40 2.01 Cyclohexane 0.277 0.228 1-Butanol 3.58 308 Ethanol 0.74 1.29 On SE30 Relative to Cholestane Cholesterol 1.78 1.95 1.94 1.96 1.74 1.2 On SE30 Relative to Pirent Chrysene 2.78 2.81 2.81 2.27 2.08 1.83 On SE30 Relative to Pentane Dimethylbutane 1.35 1.24

0111. aEac column represents a differentliterature source or different column temperature. Some groups have only one entry per column; e.g.,cholesterol shows 6 different sources and/ortemperatures.

14 IITRI-C6104-4 The actual time spentin generating these data was approximately 20 man days. We estimate that, onthe average, wa can generateGC data for 10compounds during an 8-hour working day per instrument.

Our final data filefor gas chromatographycontained data for 308 compounds of which36 had data on SiliconeSE-30 only,

7 had data on Carbowax 20Monly and 265 had data onboth columns. Although wa are unhappyabout the lack of GC data wehave decided to retain it inthe system because of theexcellence of gas chromatography as aqualitative tool. Since there are little or no additional gaschromatographic data available

Which we could incorporateinto our file, any futuredata acquisition would requiregeneration. In order to include gas chromatography in the finalworking system, it will be necessary to standardize theoperating conditions such thatall users of the data retrieval systemwould obtain data oncolumns specially supplied for that purpose. This is a reasonablesolution to the perplexing problem nowexisting among gaschromatographers

regarding variability of"standard" data. The logic underlying

GC's qualitative abilityforecasts greater importance,increased emphasis, and improved data andtechniques for qualitative

analysis by GC. Thus it should be relatively easyto ask the scientific community to generateand contribute data by using

a prescribed setof conditions. Such a proposition would probably be accepted becausescientists have already accepted

HT RESEARCHINSTITUTE

15 IITRI-C6104-4 1 such systems in the areas of NMR, MS, X-ray diffraction, IR, etc.; universally accepted "standard" data that have been compiled are accepted by all for comparison with individual data.

ASTM is approaching the problem of standard GC data with their ASTM DS25A compilation, but, as evidenced by the data in

Table 7, unique data for each compound are not available and won't be until a set of standard conditions is agreed upon by the workers in this area.

C. Nuclear Magnetic Resonance

Probably the most difficult area for indexing data thus far encountered is the area of NMR. A number of systems have been developed by various workers none of Which is either completely satisfactory or widely accepted. The difficulties are inherent in the nature of NMR spectra.

In theory, and frequently in practice, a complete NMR spectrum for a given chemical structure may be generated bya computer. However, the reverse process, generation ofa structure from a spectrum has been too complex for satisfactory handling by computer methods. A skilled practioner may analyze the patterns present in a spectrum almost intuitively, but the rules by which such analyses are made become too complex for practical translation into computer language.

The purpose for which we are indexing these and all other data must be kept in mind, however. Our purpose is not to derive chemical structures from the data but touse the data PITRESEARCH INSTITUTE

16 IITRI-C6104-4 as a "map" to helpidentify unknown compounds and to direct us to the complete spectra ororiginal data source. The methods of indexing must be simple (free ofinterpretive quality) and amenable to computerization. With these goals in mind we can employ indexing methods which are oflittle value in terms of relating to structural characteristics but aresignificant in terms of defining, in an abbreviatedfashion, the uniqueness of the total data. The problems related to various NMRindexing systems may best be discussed with reference toparticular examples. The

NMR spectrum of diethyl phthalate(Figure 1) is a typical example of a relatively simple spectral patternin the format supplied by the Sadtler Research Laboratories, Inc. The molecule is

syn !trir:al and has only four chemically distinctkinds of hydrogen (indicated by a, bi c, and d). The pattern o f a

triplet at 1.35 ppm and a quadruplet at 4.29 ppmwith integral

ratio of 3 to 2 is immediately recognizable ascharacteristic

of an 0CH group. The pattern in the aromatic region(7.47 and 25 7.63 ppm) is typical of adjacent hydrogens on anaromatic ring,

further coupled to more distant aromatic ringhydrogens. It

should be noted here that the Sadtler assignments arechemical

shift values arising from interpretation of the spectrum,and

in the cases of b, c and d do notcorrespond to any actual

peak positions on the spectrum. They represent the centers of

multiplet patterns.

HT RESEARCHINSTITUTE

17 IITRI-C6104-4 PHTHALIC ACID, DIETHYLESTER

Mol. Wt. 222.33 B. P. 298-299°C/735 mm C12H1404 Kingsport, Tenn. Eastman ChemicalProducts, Inc., Source: ASSIGNMENTS cps a 1.35 Filter Bandwidth: 4 b 4.29 Sweep time: 250 sec cps c 7.47 Sweep width: 500t250 d 7.63 Sweep offset: -/225 cps Spectrum amp: 10 Integral amp: 25 Conc. -100mg/0.5m1 CC14

40 10 SI

141

1.1"11

!;1.4 4.0 1141 Si 00

Figure 1

NMR SPECTRAOF DIETHYLPHTHALATE

situation, theinstrumentation To furthercomplicate the evolution towardshigher of NMR isundergoing acontinuing widely usedinstruments atthe present magnetic fields. The most but 100 Mhzinstruments arecommercially time operateat 60 Mhz, studies at220 Mhz havebeen LI available andexperimental

reported.

HTRESEARCHINSTITUTE

18 IITRI-C6104-4 The significance ofthis evolution to our program maybe illustrated by reference tothe spectrum of diethylphthalate at

60 Mhz and at 100 Mhz. Two parametersdetermine the observed patterns in the NMR,the chemical shifttr, expressed in ppm

from trimethylsilane(TMS) and the spin-couplingconstants, J

in hz. The observed value ofgis dependent only on the magnetic field, and When expressedin the dimensionlessterms of ppm is

independent of the fieldstrength of the instrument. The value

of J is constant inhz, and is independentof the magnetic of 4.29 ppm. field. The quartet (b) indiethyl phthalate has a The observed peakpositions at 60 Mhz areshown in Table 8. Simple calculations givethe values at whidhthey would appear

in a spectrum run at100 Mhz. Note that the onlynumber which

is the same in both spectrais the chemical shiftg , which is does not a valueobtained by interpretationof the spectra and represent a peak position. In more complicated caseswhere

multiplet patterns overlap,the difficulties obviouslyincrease

rapidly.

Table 8

NMR PEAK POSITIONS FORDIETHYL PHTHALATE

60 Mhz 100 Mhz ppm hz ppm hz 4 4.29 257.5 4.29 429 Peak 1 4.12 247 4.19 419 2 4.23 254 4.26 426 3 4.35 261 4.33 433 4 4.47 268 4.40 440 = Chemicalshift of multiplet. lITRESEARCH INSTITUTE

19 IITRI-C6104-4 A simple versionof indexing observedband positions is the Sadtler Specfindersystem, a methodapparently derived from in each IR experience. In this systemthe most intense peak in 1 ppm region isrecorded, as well asthe most intense peak phthalate is presented the whole spectrum. The data for diethyl in Table 9 below. The range from 0-14 ppmis required to insure inclusion of allpossible compounds althoughpeaks are far from very rarelyfound beyond 9 ppm. While this system is ideal, from a practicalpoint of view it appearsto be best for

the purposes of thisstudy. The resulting series ofdigits represent a set of dataWhich is characteristicof a relatively

small number of compounds,and which maynevertheless be readily obtained from rawpublished data. The problem of

peak position shift withchanges in operatingfrequency outlined

in the previous sectionis still present, butis minimized by the fact that the strongestpeak in a multiplet willbe near

its center, and theshift with frequencywill therefore be

small.

Table 9

NMR PEAK POSITIONS FORDIETHYL PHTHALATE ACCORDING TO SADTLER

Strongest Band 3 4 5 6 7 8 9 10 11 12 13 14 1 2 1.35 35 - - 23- -50

A more refined system,developed by VarianAssociates,

consists of recording therelative integratedintensity of the

peaks in each 1 ppm range. For our example ofdiethyl phthalate, lITRESEARCH INSTITUTE

20 IITRI-C6104-4 these data would have the form shownin Table 10. While this form of data recording contains alittle more structural information (i.e., the ratios of thevarious chemically different kinds of protons in the molecule)it is less sensitive for distinguishing between structural isomers sincediethyl isophthalate, diethyl terephthalate, and probably anyethyl esters of trisubstituted benzoicacids would give the same profiles. This is because the profile simply represents aratio of 2 aromatic protons to one ethyl ester group. The shape of the

aromatic signal would change markedly, butit would not be shifted outside of the 7-8 ppm region. An advantage of the system is

that the data may be transferred directlyfrom the spectrometer

to computer storage, but thiswould be useful only in a program

of data generation. Also, for our present purposes, a good deal of the requisite data would be unavailablefrom the literature

except via calculations from theoreticalintegrals since very

little integral data is published. We investigated this method through discussions with Dr. LeRoy Johnsonof Varian Associates

so as to have a completeappraisal of the method on hand. This method of indexing did not appear to bepractical for our

purposes.

Table 10

NMR PEAK POSITIONS FOR DIETHYL PHTHALATE ACCORDING TO VARIAN

pm 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 % - 43 - -29- - 29 - - =IN 011 =IN =IN

lITRESEARCH INSTITUTE

21 IITRI-C6104-4 being An experimentalTermatrex indexof NMR data is compounds were evaluated in an ASTMproject. In this system, number of lines per indexed by the number,position, area and

"cluster" in theirspectra, as well asby the presence or While the absence of certainfunctional groups orelements. it is too system seemsfairly effective onthe scale tried, and the cumbersome for aproject of themagnitude we propose, mentioned method of indexingis open to manyof the objections

above in connectionwith othertechniques. shifts Other workers havedevised indicesbased on chemical techniques all require and spin-multiplicities,but since those prior to storage of considerable interpretationof the spectra further data, we caneliminate thesepossibilities without

discussion. Therefore, the bestcompromise of reasonablydefinitive experimental data which areeasily yet briefdescriptions of the handled by computertechniques is theSpecfinder system. used We have usedthis approach inrecording our data but Some tests, which the first mostintense peak forsearching. using the first and will be discussedlater, were conducted second most intensepeaks. particularly where the A systembased on this approach, would require instrument is connecteddirectly to the computer, (60, 100 or 220Mhz). Data a standardcondition in frequency obtained at differentfrequencies would have tobe interpreted Although before presentation tothe data file forscreening.

lITRESEARCH INSTITUTE

22 IITRI-C6104-4 would prevailwith any file someldhat inconvenient,this is what would not detractfrom the that exists now. Such a compromise application usefulness of thesystem consideringits qualitative

and purpose. the Sadtler andVarian Our main sourcesof data were to these, wehave searched spectra collections. In addition increase the numberof several other sourcesin order to which we have allfive types compounds on ourselected list for Index to NMRLiterature of data. We have usedthe "Formula locate the spectraof a number of Data," VolumesI and II, to the MCA and APIcollections, which compounds. In addition to of theThermodynamics are handledby the DataDistribution Office (and which Research Center atthe Texas A&MResearch Foundation obtained copies ofthe Tiers'Tables, we have onhand), we have privately circulatedby a collectionof chemicalshift values Collection, circulated G. Tiers of3M in 1958,and the Humble in 1959. The Jeolco by N. F.Chamberlain ofHumble Oil Company Sadtler. Collection of spectrahas also beenobtained from chemical shifts Although the currentpractice is to report literature arereported in ppm fromTMS, the datain the older has provided some in a varietyof fortw. This situation but so far thequality interesting exercisesin calculation, conversion to the ppm of the data appearssatisfactory after arecalibrated in scale. For instance,the Humble spectra These values areconverted to mgauss,relative to benzene. the magneticfield strength ppm relativeto TMS bydividing by

lITRESEARCH INSTITUTE

23 IITRI-C6104-4 --""--1li'addinga constant tochange the reference from benzene to

TMS. Many of the spectra are run at30 or 40 Mhz, vihich

magnifies the spin coupling constantsrelative to the chemical

shifts. Further, a variety of solventshave been used, and solvent shifts must therefore be considered. However, qualitative inspection of the data andcomparison of data for

a single compound fromseveral sources indicate that the constancy of the values recorded in oursystem is adequate,

although it leaves something to bedesired in absolute

reproducibility. NMR data derived from literature sourcescomprised 6

percent of our total NMR data input. The majority were obtained

from Sadtler and Varian compilations.

D. Infrared Two basic approaches to indexing IRdata were considered.

These are the methods used by Sadtler andASTM. We decided to

use ASTM's approachwith the exception that we listed thepeaks

in order of decreasing intensity. ASTM's method is to list,

in order of wavelength to the nearest0.1 micron, the most

intense band plus all bands Whoseintensities are 10 percent or

greater of the most intense band. Sadtler, on the other hand, lists the most intense band within each 1micron region from

0 to 15 microns. Using a modified ASTM approach weindexed the three most intense bands (except 3.44Which was so

prevalent as to be relativelynon-discriminating) in order of

decreasing intensity. HT RESEARCHINSTITUTE Ii

24 IITRI-C6104-4 Indexed IR data forabout 700 compoundsfrom our list were obtained from ASTMthrough Dr. Kuentzel. About 1,800 punched

IBM cards wereprovided by Dr. Kuentzeland represented, in compound. many cases,multiple listings of IRdata for a given Because the 1,800ASTM IR pundhed cards werenot interpreted, printouts of the deck wereprepared to facilitatedata extraction from the cards. The ASTM cards recordall significant Because our IR peaks, but theydo not indicateintensities. search routine whichwill be discussed laterrequires data for the three most intensepeaks, it was necessaryto consult the original spectra(mainly Sadtler) todetermine intensities. from the ASTM deck, The intensities wererecorded on the printout complete source and the printoutwith intensities is now a more document for the projectthan the ASTM cards.

The entires representing542 spectra from SadtlerResearch Laboratories were allcharacterized by punches4 and 3 in column

79 and y and 1 incolumn 80. In columns 74through 78 the ASTM

number gave these innumerical order; thus it was amuch simpler

task to check thevalue of the Sadtlerspectrum and the accuracy

of the ASTM listing. Fully 10 percent of theSadtler spectra(1962 edition)

reviewed were poor enough toraise doubts concerningsomeof the acceptability can be absorption peaks. Usually this lack of attributed to concentrationcontrol (the samples weretoo thick techniques. or toothin), to impure samples, orto poor mulling these It is understandablewhy Sadtler haspublished a number of

PITRESEARCH INSTITUTE

25 IITRI-C6104-4 again as "series B." Furthermore, in almost 5 percentof the spectra viewed, ASTM had madean error either in interpreting the

spectrum or in failing to transcribeand keypunch the data

correctly. In either case, strong absorptions (notattributable to the solvent or mulling medium)were not listed, or medium

absorptions were listed incorrectly. Medium strength absorptions that had not been listedwere not counted as a mistake in the

above error analysis. This emphasizes the importance ofhaving trained personnel verifyany data placed in the data bank of the retrieval system.

While the intensities of the 3 mostintense bandswere accorded the intensities,per se, were not used in the screening

process. The intensitieswere used only to rank the three peaks

in order of decreasingintensity.

E. Mass Spectrometry

Indexing mass spectrometral dataproved relatively easy. The ten most intense peakswere indexed along with their

relative intensities. The only standard criteriarestricting the selection of the peaks isthat they be obtained with

electron bombarding energies of 70electronvolts, a widely

accepted practice. The most intense peak (base peak)was given an intensity value of 1000 and the remainderwere given values in percents of 1000. In our actual screening only thefirst 4 peaks (in order of decreasingintensity) were used. The intensity, per se, was not used inthe screening process.

HT RESEARCH INSTITUTE

26 IITRI-C6104-4 taken from two sources: All of the massspectral data were of Mass Spectral 1. ASTM PublicationNo. 356, "Index Pa., 1963. Data, 1st Edition,ASTM, Philadelphia, of Mass 2. Cornu, A., Massot,R., "Compilation 1966. Spectral Data,"Heyden & Son,Ltd., London, about 3,200compounds The ASTM publicationcontains data for compounds. While Cornu andMassot's has datafor about 5,000 compilation are alsofound Most of thecompounds in the ASTM

in the Cornu-Massotcompilation.

F. UltravioletSpectrophotometry through the program webecame About two-thirdsof the way chromatographic data. We concerned aboutthe lack of gas spectral data into decided to incorporateultraviolet (UV) identification of our file toprovide a broaderbasis for data for the838 compounds compounds. Accordingly, spectral three data sources: were soughtusing the following located at AbbottLaboratories, 1. Sadlter's compilation

North Chicago,Illinois Compounds," 2. "Ultraviolet Spectraof Aromatic Wiley & Sons,Inc., R. A. Friedeland M. Orchin, John New York, 1957 Spectra Index 3. "Ultraviolet andVisible Absorption Academic Press, New for 1930-1953,"H. N. Hershenson,

York, 1956.

The third sourceis a literatureindex that provided the major references to theoriginal data. Sadtler's file was HT RESEARCHINSTITUTE

27 IITRI-C6104-4 source of data. Because ASTM's UV spectral files were not

readily available, we did not use that source for UV data. The ASTM file has data for almost 26,000 compounds.

The majority of UV spectra have been obtained in solvents over a wavelength range of 200 to 400 millimicron (mg). The most common solvent was methanol. Hence, data for compounds run in methanol were selected. Our data were reported in the

range of 200 to 400 mg. We recorded only the strongest peak

and excluded extinction coefficients. The strongest absorption band was located to the nearest mg. A tolerance of +2 mg was used in the search. Selection of this tolerance was based on experience in reproducing wavelength readings that are subject to both instrumental and interpretational variations.

A sample that is transparent in the UV region is so indicated in the data file.

G. Survey of Signature Availability

A survey was made by information specialists between June,

1967 and March, 1968 to determine the overall availability of signature data from the major data sources in the country.

The American Society for Testing and Materials (ASTM) is responsible for the identification and indexing of the significant spectral collections throughout the United States. The ASTM index identifies virtually all spectra available from Sadtler

Research Laboratories, National Bureau of Standards (NBS),

Documentation Molecular Spectroscopy (DMS), Thermodynamics

Research Center (TRC) (which includes the American Petroleum lITRESEARCH INSTITUTE

28 IITRI-C6104-4 Institute (API) and ManufacturingChemists Association (MCA) collections), the Coblentz Society, theInfrared Documentation

Center, Japan (IRDC-Japan), andthe open literature, domestic

and foreign. As of March, 1968, ASTM hadin its files information concerning 100,000 IR, 3,200 MS, and25,749 UV spectra. A

subcommittee is working on the indexingscheme for NMR. A

trial NMR indexing scheme has beenprepared by Jonker Business

Machines, Inc. This scheme, organized for use ofthe Termatrex

system, is now under review. ASTM has also prepared an updated

GC Compilation DS25A(approximately 4,000 compounds) Which was published in late 1967.

ASTM expects to add 15,000-20,000IR spectra yearly to its

index. Some 9,000 MS spectra will be addedthis June from

European collections. Sadtler Research Laboratories have32,000 IR spectra,

4,000 NMR spectra, no GC, no MS, and22,000 UV spectra on

organic compounds. All Sadtler spectra are identifiedin the

ASTM indices. It is anticipated that 2,000signatures in IR and NMR will be added annually. Dr. Zwolinski, director ofthe Chemical Thermodynamics Research Center (TRC), states that there arepresently 13,000

complete spectra available at TRC. As of December 31, 1967, this collection consisted of 3,950 IR,1,246 UV, 2,651 MS, and

1,260 NMR spectra. All TRC spectra are identified andindexed

through ASTM. However, TRC expects to have its ownindex

lITRESEARCH INSTITUTE

29 IITRI-C6104-4 " +404,,Prv*IP,'411.4.0014/0.,,,

available in the Summer of 1968. The Coblentz Society, which also has its spectra identified and indexed through ASTM, has 5,000 complete IR spectraavailable.

These are sold by Sadtler Research Laboratories.

A tabulation of major spectra sources appearsin Appendix B.

Eadh table lists the major sources, number of spectraavailable, the form of the spectra, the cost, and the data sources. The tables are accompanied by a listing of other datacompilations.

Cost figures for data acquisition, cannot be estimated adequately on the basis of the procedures used in this research inasmuch as the procedures were experimental, varied and unique for the method of compound selection used. Methods for data acquisition will be different if future researdh is conducted. In general, data acquisition costs whidh will account for a large share of costs in an operatingsystem fall into three categories:

1. Data obtained from published sources and collections Cost of collection Cost of transcription

2. Data measured from published sources

Cost of measurement

Cost of transcription

3. Data obtained from laboratory measurements

Cost of compounds Cost of manual processing Cost of data reduction and transcription, or Cost of automatic processing. lITRESEARCH INSTITUTE

30 IITRI-C6104-4 obtained from acombination The data placedin the file were processing of of the threecategories exceptthat automatic employed in thisphase of the laboratory measurementswas not study.

Design H. Cataloguing Data andData Master Card inclusion into thedata For every compoundselected for compound. file there is a masterdata sheet onfile for that the selection A searchmedhanism presentlyhas as its purpose gheets which represent of one to, at most, afew of such master match the inputdata. The the compound(s)whidh most closely the indexed data data contained onthe 8 1/2 x 11form include plus the originalsources for the analyticaltedhniques involved

of data. project is found A sample ofthe master sheetused in the compound used in in Appendix C. A gheet wasprepared for each includes: the file. The information onthe gheet Compound name - Thecompound will belisted by IUPAC name, andtrivial or trade namesare included for crossreference from CAS registry -The CAS registrynumber, obtained number the SOCNA, Handbookalso aids in identifying the compound is listed GC data - The gaschromatograPhic data R for Carbowax20M and SE-30columns. Both the relativespecific retention

lITRESEARCH INSTITUTE

31 IITRI-C6104-4 volume and the Kovats retention

index appear

MS data - The mass-to-charge ratio islisted above the relative intensities in

decreasing order

IR data - The three most intense infraredpeaks (excluding 3.4) are listed

in order of decreasing peak

transmittance

NMR data - The NMR is listed as it appearsin the Sadtler Specfinder Index. The strongest

peak and solvent are indicated

UV data - The most intense Land is listed tothe nearest millimicron (mg)along with

the solvent used.

Structure, empirical formula,melting point, boiling point, physical state and the original data source are also indicated on the master data sheet.

IV. TESTIVG SIGNATURE CONCEPT

&Igsperimental Design

1. Objectives

A search experiment wasdesigned to determine the feasibility of identifying chemical compounds bymatching their signatures

with those of known compounds. The experiment tested the hypothesis that an unknown compound canbe identified by lITRESZARCH INSTITUTE

32 IITRI-C6104-4 sequentially searching adata fileconsisting of a minimum signature number of significantdata points ineach designated of the datamatches. and obtaining thelogical intersection tolerances, and the By carefulselection of thedata points, the that the intersectionof matching criteria,it was anticipated compounds, the matches wouldcontain a small setof candidate one of whichwould be thedesired compound. provide data for The experiment wasalso designed to match criteria, establishing searchprocedures, formulating power ofthe five and investigatingthe discrimination objectives signatures in thedata base. These auxiliary of the experiment arelisted below. match criteria in (1) To investigatethe following the search ofeach signature:

(a) Data parameters Number of peaks Magnitude of peaks Ranking of peaks

(b) Data tolerances

(c) Conditions of dataacquisition Solvents (whenapplicable) Standard measurementconditions capabilities of (2) To determinethe discrimination

each signaturefor individualcompounds, chemical groups, andall the compoundstogether

UTRESEARCH INSTITUTE

33 IITRI-C6104-4 (3) To measure theselectivity of sequential search

by:

(a) Combining searches

In10combinationstakentwo at a time In10combinationstakenthree at a time In5combinationstakenfour at a time

(b) Determining the minimum numberof signatures required to achieveidentification of an unknown

(4) To develop a measure ofsearch effectiveness that

incorporates:

(a) Rapid screening

(b) The least number of searchoperations

(5) To investigate a weightingsystem for evaluating

and ranking compounds.

2. Test Inputs The experiments consisted of a setof searches with three categories of inputs:

Input 1: Compounds having all five signaturesavailable in the data base were used as"unknowns."

Input 2: Compounds having differentcombinations of signatures were used as unknowns;these data

were not includedin the data base.

Input 3: Experimental data derived fromin-house laboratory measurements werematched against

stored data.

IIT RESEARCHINSTITUTE

34 IITRI-C6104-4 Input 1 data were used to determine the ability of the search tc; make positive identifications, given that the compounds are stored in the data base. Input 2 data were used to test the ability of the search to discriminate against closely related compounds Which had data stored in the data base. Input 3 data were used to test the selectivity of search under non- controlled input conditions that may be encountered in practice.

Input 1 and 2 compounds served essentially the same discrimination-testing function. By eliminating an Input 1 compound from a candidate list, an Input 2 condition was established. The remaining candidates are those that would be obtained if the input data were not available in the data base. The Input 3 compounds were used to test the tolerance limits that would be required in an operational system in Which data describing unknown compounds to be identified would be derived from laboratory measurements made externally or at

IITRI.

3. Tests All "unknown" compounds from inputs 1, 2, or 3 were processed in the same manner. Input data were inserted in designated locations on the data sheet shown in Appendix D.

The data base, stored in an inverted-term-Termatrex system (described in Report No. IITRI-C6104-2) was searched, one signature at a time, and the numbers of compounds whose stored data matched the input data accoluing to the match criterion were obtained.

lITRESEARCH INSTITUTE

35 IITRI-C6104-4 To obtain data on matchcriteria and discrimination capabilities, the search in each signature wascarried out independently without reference to previoussearches. To obtain data on selectivity and search effectiveness, acumulative search was

then made by selecting the numbersof compounds in the inter-

section of all previous searches withthe currnt search; these data were tabulated on the searchrecord shown in Appendix E. Selected candidate compounds that werethe intersection of

the matdhes in all availablesignatures were entered on the

candidate list shown in Appendix F.

Two series of tests were made. The first test was based

upon the formulation ofinitial match criteria. The second tests

were developed to explorethe effects of modified matchcriteria

on search precision.

a. Initial Match Criteria

The initial match criteria forthe seven steps of the

sequential search were:

1. State: solid/liquid/gas

2. Elements: combinations of significant elements

as coded in column32 of the ASTM punched cards for

infrared spectra

3. Infrared spectroscopy: match on any two of the first three highest peaks of spectrum. The 3.44 peak (C-H band) was omitted from the input data

4. Nuclear magnetic resonance: match on the highest peak; solvents were coded but wheredisregarded

HT RESEARCHINSTITUTE

36 IITRI-C6104-4 5. Mass spectroscopy: match on any two of the four highest peaks; matches were made on the mass numbers

of ranked peaks but not on the relative amplitudes

6. Gas chromatography: match on relative retention time of either one of two columns depending upon

availability of data

7. Ultraviolet spectroscopy: match on the most

intense band.

b. Modified Match Criteria

Following an evaluation of the precision of the search results obtained with the initial match criteria, several modifications of the criteria were made to assess their effect in improving precision.. These modifications iacluded the following criteria used singly and in combinations:

1. State: no change

2. Elements: no change

3. Infrared spectroscopy: no change

4. Nuclear magnetic resonance: match on two

out of the two highest peaks

5. Mass spectroscopy: a) same as initial criteria except the 1/4 and 4/1 input data/stored

data peak matches were deleted; b) match on three

out of four peaks

6. Gas chromatography: match on relative retention

time of both columns A and B

7. Ultraviolet spectroscopy: no change.

HT RESEARCH INSTITUTE

37 IITRI-C6104-4 4. Candidate Ratings Ratings were used to evaluatethe probability that an unknown compound was a compound onthe candidate list in accordance with weightsassigned to the signatures andwith the closeness of the match. The maximum ratings assigned tothe five types of signatures and to the state and theelement searches are:

MS 40

IR 36

NMR 24

GC 18 (9 points per column)

UV 6

Elements 4

State 2

130

To test the tolerance limitsin each signature, the rating scale considered exact matches ascontrasted to matches within the tolerance range. A skip was rated 0, and adefault was rated 1. The rating table for a candidatelist is shown in

Appendix G.

B. Search Procedures

1. Search Logic A composite chemical signatureof a compound consists of

a representationof the significant parameters of itscomponent

signatures. The composite signature for acompound can be HT RESEARCHINSTITUTE

38 IITRI-C6104-4 represented as

Ci = f (Mi, Ri,Ni, Gil Ui.), i =(1, 2, n) where C = compound M = mass spectrometrysignature

R = infraredspectroscopy signature

N = nuclearmagnetic resonancesignature

G = gaschromatography signature

U = ultravioletspectroscopy signature. Each signature, in turn, canbe represented bythe parameters selectedfor the experiment: a a a 3 4 2 if i : M : a , M : , M M.=f (M 1 2 a 3 a 4 al 1 i 1 i 1 i i i i where M= maximumpeak 1 a= amplitudeof maximum peak =unity 1 a4 and 4th peaks M. : = rankorder of 2nd, 3rd, 3 Al

R. = f (b b , b ), 1 ' 2 3 i i i

= threehighest absorptionbands with transmittance Where b1,2,3 equal to or greaterthan 10% of the mostintense

band.

N= f(h ), i max.

6 = 0 x 10 = chemicalshift. where hmax. max

Gi = f (gAi gBi),

(dimensionless), and A and B Where g = relativeretention time

are columnsdescribed in Table6. HT RESEARCHINSTITUTE

39 IITRI-C6104-4 U. = f

the most intensepeak. where4max The objective of asearch aimed atidentifying an unknown compound is to yield aminimal set of compoundsthat satisfies the Boolean expression

C3. = tcrniin(Cail (c4fa)(CGil

designate the wherefil=logical AND, and thei and j subscripts stored known andinput unknown compoundsrespectively. and E preceded the State and elementssearches, Si i' composite signaturesearch in the experiment.

2. Com uter Search Desi n

A generalizedsearch procedure capableof being implemented

on a computer wasdeveloped to providethe Boolean search and described pre- to serve theobjectives of the experiment as to every viously. A data availabilitycode was assigned unknown compound whoseidentificationwas sought and to every or stored (known) compound. The code indicates the presence

absence of data uponwhich a match can beattempted. The absence

of data for a parameterin an unknown compound,for example, precludes a searchthrough the relevant parametersof the known

compounds. In a computer search,each step would startwith a match compound and on thedata availability codesof an unknown

the candidatecompounds. A search routineCSEAR would be

lITRESEARCH INSTITUTE

40 IITRI-C6104-4 Table 11 SEARCH TABLE (CSTABL)

CSEAR 1 STATE

PARAM 1 Solid/Liquid/Gas

CSEAR 2 ELEMENTS

PARAM 2 As specified by unknown

CSEAR 3 MS

/m (mass with highest amplitude) 1 M (mass with 2nd amplitude) PARAM 3 2 (mass with 3rd amplitude) 3 M (mass with 4th amplitude) 1M4 CSEAR 4 IR

PARAM 4 42 (second peak) 43 (third peaY)

TOLER 4 +0.14

CSEAR 5 NMR

PARAM 5 PPMmax TOLER 5 +0.1 PPM

(PARAM) Solvent (not used at present)

CSEAR 6 GC {Relative Retention Time, Column A PARAM 6 Relative Retention Time, Column B

TOLER 6 (See Section IVD1d)

CSEAR 7 UV

PARAM 7 Strongest Peak

TOLER 7 +2m4 lITRESEARCH INSTITUTE

41 IITRI-C6104-4 called in from a search table CSTABLand the appropriate parameter(s) PARAM and tolerance(s) TOLER would beinitialized

in accordance with Table 11.

The set of compounds that matchedin each stage would be placed in a candidate list togetherwith ratings as per the

rating table. Only the signatures of compounds on thecandidate lists would be seardhed in subsequent stages. Surviving candidates

at the end of all stages would beprinted out in order of

their accumulated ratings. Upon completion of the search, the resultswould fall into

one of four categories:

1. Zero selection - no compounds

2. Plural selection - more than one compound

3. Conditional selection - one compound isolated with less than "perfect" rating; identification

cannot rule out other possibilities

4. Unique selection - positive identification of single compound having a "perfect" rating.

3. Termatrex Experiment The search experiment described above wasimplemented

by using an inverted file with opticalcoincidence (Termatrex

system) to demonstrate the feasibility ofidentification by

sequential signature matching. The logic and search procedures, Which were designed for a computer, werecarried out using

drilled cards and appropriate tally forms.

lITRESEARCH INSTITUTE

42 IITRI-C6104-4 C. Search Measures

1. Availability andDefault Ratios unknown compound by The feasibilityof identifying an sequentially searching adata file consistingof a minimum

number of significantdata points indesignated signatures and on depends on gooddiscrimination in eadhsignature search

fine selectivityin screening thedata base. As used in thisreport, discriminationis the capability well-defined to separate setsof compounds onthe basis of characteristics of asignature. Selectivity is thescreening

capability of asequential search. Precision is the accuracy

of identification. Because data are notuniformly availablefor all compounds must be considered as a in all signatures,data availability selectivity, factor in formulatingmeasures fordiscrimination, actions are and precision. Accordingly, thefollowing search

defined. given signature of a A match occurs whendata stored for a tolerance range ofinput known compound areequal to or in the for a givensignature data. A default occurswhen input data compound does nothave of an unknown areavailable and a stored of stored data data for thatsignature. The nonavailability precludes obtaining amatch but does notrule out the possi- data is a candidate. bility that thecompound with no stored of an A skip occurswhen input data for agiven signature seardh of unknown compound arenot available. In this case, a HT RESEARCH INSTITUTE

43 IITRI-C6104-4 the stored data in the sic;-aturefile is not possible.

The availability ratioof a signature is the ratio of the number of compounds having dataavailable to the total number of compounds in the data base. The default ratio of a signature is the ratio of the numberof compounds not having data to the total number ofcompounds in the data base. These ratios can be expressed mathematically: a =Eff3=E=1- a,where

C is the number of compoundsin the data base

S is the number of compoundswith available data in a given signature D is the number of compounds with nodata in a given signature

a is the availability ratio

is the default ratio.

Table13 lists the availability and thedefault ratios of eadh signature in the 500-compounddata base.

2. Discrimination The discrimination factor of asignature is a function of the number of matches and thedefault ratio inasmuch as the number of possible successfulmatches is reduced by the non-

availability of data. The defaults can be likened to a seaof

uncertainty upon which successfulmatches float. The discrimi-

nation factor, b, is expressed as: =Mxp where M is the number of matchesobtained in a signature seardh.

HT RESEARCH INSTITUTE

44 I1TRI-C6104-4 C=73 C=f Table 12 SIGNATURE AVAILABILITY AND NUMBER OF DEFkULT RATIOS SIGNATURE MS COMPOUNDSNUMBER OF 500(C) SIGNATURESAVAILABLE (S)478 NUMBERDEFAULTS OF (D) 22 AVAILABILITY 0.956RATIO (a) DEFAULT RATIO0.044 (p ) IRNMR 500500 470357 192143 30 0.7140.9400.616 0.0600.3840.286 UVGC TOTALS 2500 500500 1846 233308 654267 0.4660.738 0.5340.262 3. Selectivity The selectivity of asequential search measuresthe si.ev- ing capability of thesearch and deals withthe intersections the ratio of the signature matches. Selectivity is defined as of the number ofsurviving candidates atthe end of a stage to the total number ofcompounds in the database. Inasmudh as compounds with defaults canbe present in thecandidate list, Expressed mathe- an adjustmentfor defaults is not necessary. matically,

2,...n) SelectivitYk= nk = 14-4alis = 1, where subscript k designates aseardh stage

n is themaximum number of stages Mk is the number ofmatches at the endof the k-th stage.

The sequential searchconsisted of seven stages: state, elements, and the fivesignatures (MS, IR, NMR,GC, and UV).

The final search results areindependent of the orderof search

inasmuch as intersection iscommutative. However, the search effectiveness, that is, therapidity of screeningand the total

number of compounds thatmust be seardhed,is affected.

4. Precision The most critical ofthe measures thatdescribe the

results of a sequentialsignature search isprecision. Precision 1 and is the accuracy ofidentification. In the case of Input

Input 3 compounds, perfectprecision would resultin the

lITRESEARCH INSTITUTE

46 IITRI-C6104-4 compound. identification of theinput compoundand only that precision would re- In the caseof Input 2compounds, perfect The elimina- sult in therejection of allcandidate compounds. of more than one tion of the"true" compound(or the presence identification of candidate) in theformer case andthe "false" would decreasethe one or morecompounds in thelatter case precision. is an arbi- Retention of acompound on thecandidate list number ofsignatures trary decisionthat depends onthe minimum threshold in therating scale. required forscreening and on a be increased By use of thesetwo quantities,the precision can discrimination by meansof or decreased. Similarly, enhancing criteria alsoincreases pre- tightening matchtolerances and

cision. selected Precision is measuredby the numberof candidates

in identifying acompound. Match Criteria D. Test Resultswith Initial based upon theinitial The resultsreported below are

match criterialisted in SectionIVA3a.

1. Discrimination number of matdhes,the The range ofmatches, the average and thediscriminationfactor for averagenumber of candidates Input 1 compoundsare shownin each signaturefor the 100 the number ofposi- Table13. The match rangecolumn indicates in eachsignature tive matches(less defaults)that occurred minimum number ofmatches obtained wlth "low"referring to the

lITRESEARCH INSTITUTE

47 IITRI-C6104-4 Table 13 FOR 100 KNOWN COMPOUNDS MATCHRANGE SIGNATURE DISCRIMINATION NUMBER OFAVERAGE NUMBER OF NUMBER OFAVERAGE DISCRIMINATION SIGNATURE (LESS DEFAULTS)LOW 1 HIGH153 MATCHES 49.83 DEFAULTS 22 CANDIDATES 71.83 FACTOR 2.19 MSIR 1 8459 30.3931.22 143 30 174.22 60.39 8.931.82 NMRGCUV 11 8539 36.7711.74 267192 203.74303.77 19.64 4.51 number. in the 100 searchesand "high"referring to the maximum compounds in There was at least onecompound among the 100

each signature that wasthe sole selectedcandidate, exclusive in the MS of defaults. The largest rangeof matches occurred of sig- signature, the smallest rangein GC. The distribution the nature matches isshown in Figure 2. Both the mean and medians are listedtogether with the rangeof matches including

defaults. The effect of thedefaults is clearly seen asthe distri- The butions of the signatures areshifted on theabscissa. distribution is due tothe 34 peak at 351-360matches in the UV compounds that aretransparent to UV.

a. MS Discrimination

The MS search rankedsecond in discriminationamong the seardh five signatures. An average numberof 49.83 matches per With from an availabledata base of 478compounds was found. candidates were found, onthe the 22 defaults, atotal of 71.83 The dis- average, foreadh of the 100 knowninputcOmpounds. crimination factor was2.19.

The criterion ofmatch was equalitybetween at least two in the offlour peaks. Because numerouspermutations were found ranks of the four majorpeaks among differentdata sources for combinations of a singlecompound, any two matchesfrom the 16 input and stored datapeaks were accepted. The rating table match of (Appendix G) takes permutedmatches into account, a match input peak 1 againststored peak 1 ascontrasted wlth a

UT RESEARCHINSTITUTE

49 IITRI-C6104-4 SIG. MS MEAN71.83 MEDIAN48.00 RANGE23 115 30 IR NMR NMRIR 60.39174.22 56.33176.00 144 31 202114 1020 rms04LNIa111) GCUV 303.77203.74 200.40280.77 268193 352231 50 0 MS UV 4030 i....OAIII..) 1020 0 co 0 co 0CN 0 0 0cN 0 0 0:11 0re) DISTRIBUTION OF SIGNATURE MATCHES NUMBER OF MATCHES Figure 2 'of input peak 1 againststored peak 4, forexample. the permuted peaks . e A summary of the MSmatches obtained on 14. of a sample of 25known input compoundsis presented in Table (maximum) number The range indicatesthe low(minimum) and hiah

of candidates obtainedin the searchesfor the 25 compounds. The results of the1 on 1, 3 on 3,and 4 on 4 matches were as expected, namely thatthe average numberof matches was great- match was there est in thesepermutations. Only in the 2 on 2 The 1 on 1 matches a reversalof order with the 2 on1 match. provided the largest averagenumber of matches,followed closely

by the 4 on 4matches. Since there is notolerance on an MSpeak, discrimination smaller) by of the search can beincreased (i.e.,co can be made (1) increasing the numberof peaks for whichmatches are re-

quired, e.g., three outof four; (2) increasingthe number of

peaks searched, e.g.,from four to five orsix, and raising the

match criterionaccordingly; (3) introducingrelative ampli- peak tudes, and (4)eliminating some of the16 combinations of

seardhes now employed, e.g.,input peak 1 mustmatch on any one

stored data peaks 1,2, or 3, but not4. To evaluate the effectof changing thenumber of peaks re-

quired for MS matching,Table 15 was prepared. The table lists criteria the average numberof matches obtainedwith the varying The of 1 out of 4 to 4 outof 4 peaks, exclusiveof defaults. matches on one peakonly, term "exclusive"in Table 15 refers to The two peaks only,three peaks only, andfour peaks only.

HT RESEARCH INSTITUTE

51 IITRI-C6104-4 Table 14

MS MATCHES ON PEAK PERMUTATIONS (25 Input Compounds)

PEAKS RANGE AVERAGE NUMBER INPUT STORED LOW HIGH OF MATCHES

1 2 78 34.76

2 2 37 19.92 1 3 0 68 16.24

4 1 69 14.12

1 0 78 19.20

2 2 37 15.52 2 3 2 68 13.00

4 0 69 12.08

1 0 78 12.20

2 1 37 13.08

3 3 2 68 20.00

4 1 69 15.80

1 0 78 16.40

2 0 37 16.92

4 3 0 68 24.88

4 1 69 29.16

52 IITRI-C6104-4 term "inclusive" refersto every instanceof one, two, three, or four mai:ches;for example the numberof matches on one peak includes the number of matches ontwo, three, and fourpeaks.

Table 15

NUMBER OF CANDIDATECOMPOUNDS OBTAINED WITH DIFFERENT MS MATCHCRITERIA (100 Known Input Compounds)

NUMBER OF MATCH RANGE AVERAGE NUMBER PEAKS ___(INCLUSIVE) OF MATCHES INCLUSIVE MATCHED LOW HIGH EXCLUSIVE 1

1 3 313 103.54 153.37

2a 1 153 40.23 49.83

3 2 38 7.85 9.60

4 1 6 1.75 1.75

aCriterion used for the experiment: matches on2 or more peaks. The average of 49.83 matches(exclusive of defaults) ob-

tained with the criterion ofmatching on 2 out of 4peaks would

be reduced to 9.60 if3 out of 4 peaks were thematch criterion

A 4 out of 4 criterionwould further reduce the averagenumber of candidates per searchto 1.75.

b. IR Discrimination The IR search was the mostdiscriminating of the five

signatures. An average number of30.39 matches per search was found from an available database of 470 compounds. With the

30 IR defaults, a total of60.39 candidates werefound, on the

average, for each ofthe 100 known inputcompounds. The dis-

crimination factor was 1.82. lITRESEARCH INSTITUTE

53 IITRI-C6104-4 compounds, The peak at3.44, characteristicof hydrocarbon and stored data. The 5.8/2 had been eliminatedfrom all input data and was avalid seardh peak, however, wasincluded in the stored data wereequal to value. A match wasregistered if the data. Permutations or withinthe tolerance rangeof the input MS seardh, i.e.,IR in- of ranked peaks wereseardhed as in the 2, and 3, etc. put peak 1 wassearched againststored peaks 1, permuted peaks A listing ofthe IR matdhesobtained on the measured in MS appearsin Table of the same 25compound sample obtained on the 1 on1 and 16. The average numberof matches categories. 2 on 2 peaks werethe highest intheir respective than The number of 3 on2 peak matches wereslightly greater

the 3 on 3 matdhes. Table 16

IR MATCHES ONPEAY PERMUTATIONS (25 InputCompounds)

PEAKS RANGE AVERAGE NUMBER OF MATCHES INPUT STORED LOW HIGH

1 10 71 41.72 22.12 1 2 13 39 3 11 24 18.44

1 1 71 24.36 27.28 2 2 2 57 , 2 67 26.20

1 2 64 26.40 34.56 3 2 4 59 3 9 67 33.92

. _ ii HT RESEARCHINSTITUTE

54 IITRI-C6104-4 Li The number of candidatecompounds obtained with1-, 2-, and 3-peak matches isshown in Table 17. The rarge end exclusive-inclusive definitions arethe same as thoseaccompanying

Table 15.

Table 17

NUMBER OF DIFFERENTCOMPOUNDS OBTAINED WITH DIFFERENT IR MATCHCRITERIA (100 Known InputCompounds)

NUMBER OF MATCH RANGE AVERAGE NUMBER PEAKS (INCLUSIVE) OF MATCHES MATCHED LOW HIGH EXCLUSIVE INCLUSIVE

1 38 282 161.28 191.67

2a 2 84 27.05 30.39

3 1 12 3.34 3.34

aCriterion used for theexperiment: matches on2 or more peaks. The 2-peak match criterionresulted in an averagenumber

of 30.39 matches perseardh (not includingdefaults), whereas 191.67 a single peakmatch would have resultedin an average of 3-peak matches. Greater discriminationwould be adhieved by a match criterion with an averageof 3.34 matches persearch.

C. NMR Discrimination

The NMR signature, with adiscrimination factor of8.93,

ranked fourth. An average number of31.22 matches per search from an available data baseof 357 compounds wasfound. To- gether with the 143 NMRdefaults, an average of174.22 matches

per search wasfound. The criterion of search was amatch on When the strongest peakwithin the tolerancelimit of +0.1ppm. HT RESEARCH INSTITUTE

55 IITRI-C6104-4 two peaks were ofthe same magnitude,eadh peak was assigned a different compound numberand the data for allsignatures; were entered for the twonumbers. Crotonaldehyde, for example,had nearly equal NMRpeaks, after rounding, of 2.1 ppmand 2.0 ppm. The former value was assigned IITRI number337; the latter wasassigned 961. In the search for crotonaldehydewith an NMR peak of2.0 ppm, compound

337 was selected, with arating of 130 points. It is interesting that since compound961 was within thetolerance range of 2.1 ppm, compound 961 wasalso selected,although its rating was 126, becauseof the tolerance match. The discrimination of theNMR search can beincreased by dhanging the criterion ofmatdh to two or more ofthe most intense peaks and by matchingsolvent data. Matching solvent data was not employedin the search, and itwould not improve the discrimination to the degreethat adding additionalpeaks would.

d. GC Discrimination The criterion of the GCsearch was a match withthe input

data of one of two columnswithin the tolerance limit. Because

of the wide range ofrelative retention times,the tolerance

was madeproportional to the time inthe following manner: 0.01-0.99: +0.01

1.0-9.9: +0.1

10-99: +1.

HT RESEARCH INSTITUTE

56 IITRI-C6104-4 The GC signatureprovided the lowest averagenumber of matches per search,11.74, but thenonavailability of data on

192 compounds reduced thediscrimination factor to4.51; GC thus ranked third indiscrimination. Table 18 summarizesthe

GC signature search. The discriminationof the GC signature canbe improved by using Column A alone orColumn B alone insteadof the inclusive

OR criterion usedin the experiment. Even greater discrimination would result fromusing the intersectionof the two columns, as shown inTable 18.

Table 18

NUMBER OF CANDIDATECOMPOUNDS OBTAINED WITH DIFFERENT GC MATCHCRITERIA (100 Known Input Compounds)

NUMBER OF MATCHES AVERAGE MATCH INPUT RANGE NUMBER OF CRITERION DATA LOWHIGH MATCHES

Column A 96 1 26 7.62

Column B 83 1 20 6.32

A and B 79 1 7 1.44 a N or B 100 1 39 11.75

aCriterion used for the experiment.

UTRESEARCHINSTITUTE

57 IITRI-C6104-4 The GC signatureprovided the lowest averagenumber of matches per search,11.74, but thenonavailability of data or

192 compounds reducedthe discriminationfactor to 4.51; GC thus ranked third indiscrimination. Table 18 summarizes tho

GC signature search. The discrimination ofthe GC signature canbe improved ] using Column A alone orColumn B alone insteadof the inclus

OR criterion usedin the experiment. Even greater discrimin tion would result fromusing the intersectionof the two col as shown inTable 18.

Table 18

NUMBER OF CANDIDATECOMPOUNDS OBTAINED WITH DIFFERENT GC MATCHCRITERIA (100 Known Input Compounds)

NUMBER OF MATCHES AVERAGE MATCH INPUT RANGE NUMBER OF CRITERION DATA LOWHIGH MATCHES

Column A 96 1 26 7.62

Column B 83 1 20 6.32

A and B 79 1 7 1.44 a N or B 100 1 39 11.75

aCriterion used for the experiment.

HT RESEARCHINSTITUTE

57 IITRI-C6104- e. UV Discrimination

The UV signature was the least discriminating of the five signatures. Its average output of 36.77 candidates per search adied to the 267 compounds for which UV data were not available resulted in an average of 303.77 candidates per search. The discrimination factor was 19.64.

2. Selectivity

a. State and Element Searches

The distribution of state and elements data for the 500- compound data base and the 100 test compounds is shown in Table

19. The liquid state category eliminated 15 of 100 compounds; the solid stata eliminated 86. The gas category provided unique identification. The selectivity of the two stages is given in

Table 20.

HT RESEARCHINSTITUTE

58 IITRI-C6104-4 Table 19

STATE AND ELEMENTSDATA AVAILABILITY

_ INPUT 1 SEARCH PARAMETER DATA BASE 500 COMPOUNDS 100 COMPOUNDS

Solid 95 14 85 STATE Liquid 387 Gas 18 1

Bromine 25 1 Chlorine 43 7 Fluorine 3 Iodine 9 ELEMENT* Lead 1 Nitrogen 80 9 Oxygen 300 63 Sulfur 16 6 No Hydrogen 7 1 No designation 70 23 _ *Data base and Input 1columns total moreth n 500 and 100 because more than oneelement was listedfor many compounds.

Table 20

SELECTIVITY OF STATE ANDELEMENTS SEARCHES (100 Known InputCompounds)

AVERAGE NUMBER OF STAGE SURVIVING CANDIDATESSELECTIVITY

State 342.43 0.685

Elements 195.11 0.390

lITRESEARCH INSTITUTE

59 IITRI-C6104-4 The effectivenessof the preselectionis indicated clearly compounds that was by the 61 percentreduction in candidate obtained at the endof the elementssearch.

It was anticipatedthat the state andelements searches would serve only as apreselection tool toreduce the number Although this of compounds to beseardhed in thesignatures.

objective was adhieved,it was found thatthe state and ele- of a number of ments searches alsoaided in identification

compounds.

b. Signature Searches experiments to Two sequences weretested in preliminary ordered on test searcheffectiveness. The first sequence was in accord- the basis of thenumber of data pointsto be searched NMR, UV, ance with thematch criteria. The order chosen was crotonalde- GC, IR, and MS. The selectivityfor compound 337, hyde, is Shown inFigure 3. The state andelements searches ordered were omittedfrom Figure 3. The second sequence was with the on the basisof the defaultratio/ the signature chosen fewest defaults beingsearched first. The seardh order for crotonalde- was MS, IR,NMR, GC, and UV. The selectivity hyde via this sequenceis Shown in Figure4. The presence of 143defaults in the NMRdata and 267 in

the UV data accountedfor 80 candidatecompounds that were carried into the thirdstage of search, asindicated in Figure the IR data, re- 3. The 22 and the 30defaults in the MS and spectively, gave rise to nodouble-default candidates, as lITRESEARCH INSTITUTE

60 IITRI-C6104-4 Signature Discrim-ination 337: Crotonaldehyde Selectivity 176 176 - NMR 411) 277 143 411) NMR NMR143 104 UV 10 267 1110 t 80 cs) NMR-UV NMR.UV NMR-EV NMR-UV 205 00 OW 26 i GC 49 11921 9 1110111 1111111111 NMR-UV-GC 1 9 J 44 MOONMR-UV-GC NMR-UV-GC 3 _r____te,-..str____ NMR-UV-GC RMR.UV-GC NMR-UV-GC IR 14 30 an I Is.....01 2 MS 47 69 22 NMR-UV-GC-IR 1 NMR-UV-GC-IR 0 MATCH DEFAULT 00000NMRUV-GC-IRMSSELECTIVITY WITH SEQUENCE Figure 3 ORDERED BY MATCH CRITERIA Discrim-ination 337: Crotonaldehyde Selectivity Signature 47 69 22 47 69 22 MS . IR 14 44 30 MS- MS 6 NMR 176 143 MSIR MI MS.IR GC 205 192 alnMSIRNMR elell al MSIR-NMR2 MS-IR-NMR 4111) 10 277 MS - IR -NMR 1 - GC MN=MS - IR-NMR-GC 0 MATCH DEFAULT UV 1267 41110111111100MS.IRNMR.GC-UV SELECTIVITY WITH SEQUENCE Figure 4 ORDERED BY DEFAULT RATIO indicated in Figure 4. The discrimination ofthe seardhes, of course, is also animportant factor inestablishing the selectivity. The selectivity of the twosequences issummarized in

Table 21.

Table 21

SELECTIVITY WITH DIFFERENT SEARCH ORDERS (Signatures Only)

MATCH DEFAULT STAGE CRITERIA RATIO

1 0.354 0.104

2 0.201 0.012 3 0.052 0.006 4 0.006 0.004 5 0.002 0.002

_...... , Inasmuch as the selectivity ofsearch of Table 21 was

characteristic of numerous compounds,the search sequence

based on default ratios wasemployed in the experimentaltests

which were made using a manualTermatrex system. The search

order was: 1) state, 2) elements, 3) MS,4) IR, 5) NMR, 6) GC,

and 7) UV. Selectivity curves of 10 compounds areshown in Figures

5 and 6. The pattern of selectionvaries with each compound. A unique identification was notadhieved until the final stage

of search in four of theillustrated compounds, whereaswith

compound 522: benzonitrile, asingle candidate was selected at

the end of the fourth stage,after the MS and IR searches. For

three of the compounds in Figure6, unique identification was lITRESEARCH INSTITUTE

63 IITRI-C6104-4 : 250 Isopropanol

522: Benzonitrile O.%

10: VaPhthalene

: 111111111111

194: Piperonal

337: Crotonaldebyde

0 o o 0 0 0 0 0 0 oi w 4=. CA) COMPOUNDS CANDIDATE

t79 tPOT93IHII 0 0 o 0.. o 0 o 0 0 o I_I 01 411. (k) n.) COMPOUNDS CANDIDATE

S9 p-dichlorobenzene, forexample, not achieved. Compound 208: elements searches, had 10 candidatesat the end ofthe state and

NMR, GC, and UV. . 3 at the end ofMS, 3 afterIR, and 2 after compounds A summary ofselectivity for the100 known input Table 22. At the in the 7-stagesequential searchis listed in nearly 93 percentof the end of the firstsignature seardh, MS, The IR search 500 compounds inthe data base wereeliminated. For the 100 eliminated up to 98percent of the500 compounds. survived follow- test compounds, anaverage of1.65 candidates

ing the fihal stageof search, UV. and The range columnof Table 22indicates the lowest stages for highest number ofmatches obtainedin the respective differ from those any of the100 test compounds. These ranges considered independentof of Table 13whereeadh signature was

the sequentialsearch. the stage at The two righthand columns ofTable 22 list Six compounds were which uniqueidentification wasobtained. MS. Six- identified at theend of the firstsignature search, cumulative total of22 were teen additionalcompounds for a IR. At the identified followingthe secondsignature search, compounds were conclusion of the7-stage search, 68of the 100

uniquely identified. functional The selectivity ofthe searchesvaried with the of the fivelargest groups as wasexpected. The selectivity compounds is listedin Table 23. groupsrepresented in the test

HT RESEARCHINSTITUTE

66 IITRI-C6104-4 SEARCH SELECTIVITY(100 Known BY Input Compounds) Table 22 STAGES AVERAGE SURVIVINGNO. C ANDIDATERANGE COMPOUNDSWITH ONENUMBER OF STAGE1. State CANDIDATES 342.43 SELECTIVITY 0.685 LOW18 HIGH 387 CANDIDATENEW 0 CUM. 0 4.3.2. MSElementsIR 195.11 37.72 8.96 0.0180.0750.390 161 143387 39 16 60 22 60 6.5.7. GCNMRUV 5.851.651.85 0.0120.00330.0037 11 26 67 142210 446858 SEARCH SELECTIVITY (Known Input Compounds) Table 23 OF FUNCTIONAL GROUPS STAGE ALCOHOLS AVERAGE NUMBERALDEHYDES SURVIVING HYDRO-CARBONS KETONES CANDIDATES POLY-FUNCTIONAL 1.2.3. MSElementsState 340.47210.47 56.05 207.37350.50 33.87 345.56345.56 59.96 357.80227.00 37.20 113.60241.00 14.10 4.6.5. GCNMRIR 13.73 9.681.68 4.125.621.37 14.00 8.913.21 10.80.8.37 1.40 2.701.401.80 No. of TestCompounds7. UV 1.6319 1.12 8 2.6523 1.4010 101.20 The aldehydes had an averageof 1.12 candidates per test com-

pound at the end of the7-stage search followedclosely by the polyfunctionals with an averageof 1.20 candidates. The hydro-

carbons had poorerselectivity than all other groupsexcept in

the state and NMRsearches.

3. Precision

a. Input 1 - 100 Known Com ounds

As noted in Table 24,68 out of the 100 testcompounds

were uniquelyidentified in the 7-stage search. Sixty-five extraneous candidates wereselected for the remaining 32compounds.

The number of extraneouscandidates ranged from one to five,with

an average of2.03 for the multicandidatecompounds (1.65 average

for the 100 test compounds). Only two of the 65 extraneouscompounds had data on all signatures available as Shown inTable 24. Eleven had one

signature missing, 45 had twosignatures missing, and 7 hadthree

signatures missing. The ratings for the 65 extraneouscandidates ranged from 34 to 95 compared tomaximum possible ratings of

117 to 130 as Shown in Table25. The distribution of the number ofcandidates broken down by

functional groups is shown in Table26. The hydrocarbons had 13

multicandidate compounds out of 23tesi,ed. The extraneous

candidates ranged from one to five. The alcohols had 9 out of

19 multicandidate compounds,although only one or two extraneous

candidates were found.

HT RESEARCH INSTITUTE

69 IITRI -C6104 -4 _ ...--=didnEall.1.1.11.1.111.1111111111111111111.111111.111.1111

Tabl e 24

DATA AVAILABILITY COMPOUNDS 65EXTRANEOUSCANDIDATE

SUB- UNAVAILABLE NUMBER OF NUMBER SIGNATURES SIGNATURE S COMPOUNDS TOTAL MS IR NMR GC UV UNAVAILABLE , 2 2 0 ...-.-.....4 X 4 X 4 1 X 3 11

2 X X 1 X X X X 8 2 X X 4 X X 16 45 X X 14

X 1 X X 1 X X 3 X 5 7 X X X

65 Tot al .....,_ _

70 IITRI-C6104-4 Table 25 DISTRIBUTION OFCANDIDATE RATINGS

TRUE COMPOUNDS(32)

Rating No. Compounds matched) 130 18 (All data data) 126 11 (No elements column available) 121 1 (Only one GC only one GCcolumn) 117 2 (No elements data;

EXTRANEOUS COMPOUNDS(65)

60 1 34 1 62 5 35 1 64 2 39 3 65 3 40 1 66 1 42 2 3 43 3 67 70 2 44 2 72 1 45 1 74 1 46 4 75 1 47 1 *77 1 48 1 2 49 2 **78 80 1 53 1 84 1 54 5 86 1 56 1 87 1 57 1 93 1 58 5 95 1 59 1

*5 Signaturesavailable **5 Signaturesavailable for onecompound

71 IITRI-C6104.4 Thiols

Sulfides

Polyfunctionals I-6

Nitro

Nitriles 0 tones Ke I. 0

F IV ON H 4 Hydrocarbons H0 H 0 Halocarbons R Ethers

Esters iv

Diols tv 1-4

Aldehydes 1-,

Alcohols al CA)

% 0 01 OD 0 OD hi 0 CD 0 . 0 -+ 0 .0 --, g 0ill Z 0

M M H Pi q W H tli CI' W Prj 0 th) libb til 0 rt 0 H N rt 0 tr 1-3 zi 0 E w

t-t0T93-IIIIII The 32 test compoundswith multicandidates arelisted in

Table 27 by functional group. Table 28 shows the detailedtest results for each of themulticandidate compounds. Notations below the test resultsfor some compounds inTable 28 are

discussed in the followingsection dealing withmodified match

criteria. The letters X and M inTables 28designate"default" (no

data available) and"match," respectively. A and B in the GC The Y/Y row designatethe columns on which amatch was made. designations in the MS andIR rows indicatethe peaks that

matched; the left digitdesignates the input data,the right

digit designates the storeddata. The symbols + and -designate

matches on the tolerancelimits. The "true" inputcompounds

appear at the leftof eadh set of candidatesand are marked

with an asterisk.

lITRESEARCH INSTITUTE

73 IITRI-C6104-4 COMPOUNDS WITH MULTIPLE Table 27CANDIDATES HY FUNCTIONAL GROUP ALCOHOLS 1.2. 279244 : ethanol CAND. 22 KETONES 1.2. 296:305: cyclopentanone2-pentanone CAND. 42 4.3.5. 246761760 : : butanolnonanol2-ethyl-1-hexanol3-heptanol 22 HYDROCARBONS 1. 2: ethane 2 9.8.7.6. 247759278259 : : pentanoloctanolcyclohexanol2-ethyl-1-butanol 3 4.5.3.2. 10:50:51:53: n-decanecycldhexanecis-decalincyclopentane 24343 4=. ALDEHYDES 1. 322: propionaldehyde 2 10. 6.9.8.7. 12:56: 6:9:7: methyln-hexanen-nonanen-heptanen-hexadecane cyclohexane 44 DIOLS 1.3.2. 783:284:282: diethylene1,2-propanediol2,3-butanediol glycol 42 13.12.11. 715:701: 52: cycloheptane3-methylpentane2,3-dimethylbutane 656 ESTERS 1. 425: diethyl malonate 2 POLYFUNCTIONALS 1. 698: 4-hydroxy2-pentanone -4 -methyl - 3 HALOCARBONS 2.1. 819:208: p-dichlorobenzenebutyl benzoate 2 COMPOUNDS: ALCOHOLS(1) MATCHES OBTAINEDWITHMULTICANDIDATE TEST

658: Acetal Alcohol 244*:Ethanol 2 M 2 M State 4 M 4 M Elements 21 40 2/1 3/3 4/4 MS 1/1 2/2 3/3 4/4 1/3- 2/1- 9 1/1 2/2 3/3 36 IR 24 M 24 M NMR B+ 6 M : B 18 M : GC M : A, X 1, M 6 UV Score 67 Score 130 GC: No MatchCol. A 769: 3,3-Dimethyl- 2-butanol Alcohol 246*:Butanol M 2 M 2 State 4 M 4 M Elements 14 40 2/4 3/3 MS 1/1 2/2 3/3 4/4 1/3 2/1- 12 1/1 2/2 3/3 36 IR X 1 M 24 NMR 6 18 M : B M: A, Di : B GC 6 M 6 M UV Score 45 Score 130 GC: No MatdhCol. A MS: 2 of 4 Peaks

Citronellol Octanol 182: 3 Alcohol 278*: 2 2 State 4 4 Elements 19 4/4 40 1/1 4/3 MS 1/1 2/2 3/3 2/1+ 18 1/1 2/2 3/3 36 1/2 IR X 1 24 NMR X 1 18 M : A, M : GC X 1 rvl 6 UV Score 46 Score 130 MS: 2 of 4Peaks

75 IITRI-C6104-4 Table 28.1 (Continued):

MATCHES OBTAINED WITH MULTICANDIDATETEST COMPOUNDS: ALCOHOLS (2)

Alcohol 279*:Nonanol 182:Citronellol

State M 2 M 2 Elements M 4 M 4 MS 1/1 2/2 3/3 4/4 40 1/1 3/3 24 IR 1/1 2/2 3/3 36 1/2 3/1+ 13 NMR M 24 X 1 1 GC M : A, M :B 18 X UV M 6 X 1 Score 130 Score 46 MS: 2 of 4 Peaks .wA--1 776: Cyclopentyl- Alcohol 760*: 3-Heptanol methanol

State M 2 M 2 Elements M 4 M 4 MS 1/1 2/2 3/3 4/4 40 2/3 3/1 13 X IR 1/1 2/2 3/3 36 1 NMR M 24 M+ 20 X 1 GC M : A, M :B 18 UV M 6 M 6 Score 130 Score 47 NMR:1 of 2 Peaks MS: 2 of 4 Peaks m, 761: 2-Ethyl- Alcohol 1-hexanol 182: Citronellol

State M 2 M 2 Elements M 4 M 4 MS 1/1 2/2 3/3 4/4 40 3/1 4/3 7 18 IR 1/1 2/2 3/3 36 1/2 2/1- X NMR M 24 1 X GC M : A, M :B 18 1 X UV M 6 1 Score 130 Score MS: 2 of 4 Peaks

76 IITRI-C6104-4 Milwarib Table 28.1 (Continued): 7 Alcohol MATCHES247*: OBTAINED WITH Pentanol NnLTICANDIDATE TEST 182: Citronellol COMPOUNDS: ALCOHOLS 256: 2,2-Dimethyl-1-butanol (3) ElementsState 1/1 2/2 M 3/3 4/4 40 42 2/3 MM 3/1 13 42 MDI 421 MSNMRIR 1/1 2/2 M 3/3 3624 1/1+ 2/2 X 27 1 2/1- 3/3 Xx M : B 12 91 GCUV 11 : A, M M:BScore: 130 18 6 MS: 2 of 4 Peaks X Score 49 1 GC: No Match Col. M Score Am--wi 35 6 AlcoholState 259*: Cyclohexanol M 2 1 690: 2-Butoxy-ethanol M 42 182: Citronellol M 42 MSElementsIR 1/1 2/2 M 3/3 4/4 4036 4 1/2-1/1 2/3- M 4/4 1915 1/1+ 2/22/4 M 4/1 27 81 NMR : M M : B 1824 M : A+, X M : B- 12 1 XX 1 GCUV M A, M Score 130 6 MS: 2 of 4 Peaks X Score 54 1 MS: 4/12 ofMatch 4 Peaks X Score 44 1 Table 28.1 (Continued): I MATCHES OBTAINED 759*: 2-Ethyl- WITH MULTICANDIDATE 765: 2,Methyl- TEST COMPOUNDS: AICOHOLS770: (4) 2-Heptanol 9 Alcohol 1-butanol 2 1-pentanol - M 2 M 2 MSElementsState 1/1 2/2 11M 3/3 4/4 4036 4 1/11/1 2/3- M 3/23/3 4/4 2828 4 1/21/1 2/2- M 3/4 4/3 2127 4 NMRGCIR 1/1 M : A, 2/2 M 3/3 M : B 2418 M : A, X M : B 18 61 MX DI : B 916 UV M Score 130 6 M Score 87 GC: No Match Col. Score A 70 I CD MATCHES OBTAINED WITH MULTICANDaDATE Table 28.2 TEST COMPOUNDS: DIOLS (1) DiolState 284*: Diethylene glycol M 2 282: 1,2-Propanediol M 2 MSElementsIR 1/1 2/2 M 3/3 4/4 4036 4 1/2+1/1 MM+ 3/33/1 4/4 282011 4 GCNMRUV M : A, MM M :Score B 130 2418 6 M : A+ M Score 77 66 i GC:NMR: 1 of 2 No Match Col. Peaks B Diol ,783*: 2,3-Butanediol 765: 2-Methyl- 1-pentanol ElementsState DIM 3/3 4/4 40 42 2/3 MM 3/1 4/4 17 42 MSNMRIR i 1/11/1 M : 2/22/2 A, M 3/3 M : B 243618 1/1 M : A 2/2 X 28 91 GC 1 M 6 Di 6 UV Score 130 L. GC: No Match Col. Score B 67 I MATCHES OBTAINED WITH MULTICANDIDATE TEST COMPOUNDS:Table 28.2(Continued): DIOLS (2) I 256: 2,2-Dimethyl- 12 DiolState 282*: 1,2-Propanediol M 2 284: Diethylene glycol M 2 765: 2-Methyl-1-pentanol M 42 1-butanol MM 42 MSElements 1/1 2/22/2 M 3/3 4/4 3640 4 1/31/1 2/1- M 3/3 4/4 1228 4 1/1 2/2 2/1 M 4/4 3013 1/1- 2/2+ X 24 1 NMRIR : M 1824 II : A- M- 20 6 M : A+ X 61 M : A- X 6 GC M A, M : B 1 6 M 6 M 6 M ' M UV , Score 44 co Score 130 NMR:GC: 1 of 2 Peaks No Matdh Col. B Score 78 MS:GC: No Match2 of 4 Col.Peaks B Score 62 GC: No Match Col. B MATCHES OBTAINED WITH MISCELLANEOUS GROUPS (1) TableMULTICANDIDATE 28.3 TEST COMPOUNDS: 13 Po1yfunc-1tional methy1-2-pentanone698*: 4-Hydroxy1-4- 2 1 797: Isobutyric acid M 2 402: Ethylidene diacetate 14 2 MSElementsState 1/1 2/2 m/I 3/3 4/4 3640 4 1/1 2/1 M 3/2- 4/3 1119 4 1/1 2/4 NX 22 41 GCNMRIR 1/1 2/2 M M : B 24 96 MX 24 1 X M : B 19 UV M Score 121 m.----- NMR:MS: 21 ofof 42 PeaksPeaks Score 62 MS: 2 of 4 Peaks Score 40 MATCHES OBTAINED WITH MULTICANDIDATE TEST Table 28.3 (Continued): COMPOUNDS: MISCELLANEOUS GROUPS (2) HalocarbonState 208*: p-Didhloro- benzene M 2 chloro-benzene755: 1-Bromo-4- M 42 MSNMRElementsIR 1/1 2/2 MM 3/3 4/4 402436 4 2/2+ 3/3+ MM+ 3/3 4/4 201312 GCUV M : A, M M Score: B 1130 18 6 X Score 53 11 Aldehyde 322*: Propionaldehyde MS: 332: Acrolein2 of 4 Peaks MSElementsStateIR 1/1 2/2 MM 3/3 4/4 3640 42 1/1+ MM 3/33/4 4/1 21 742 GCNMR M : A, M M : B 1824 6 M : A- XX 61 UV M Score 130 MS:GC: No4/1 2Matdh ofMatdh 4 Col.Peaks B Score 42 Table 28.4

MATCHES OBTAINED WITHMULTICAND1DATE TEST COMPOUNDS: ESTERS 661: Isobutyric Ester 425*:Diethyl malonate anhydride

State M 2 M 2 Elements M 4 M 4 MS 1/1 2/2 3/3 4/4 40 2/1 4/4 13 IR 1/1 2/2 3/3 36 1/1-1/2+ 24 NMR M 24 M- 20 X 1 GC M : A, M : B 18 X UV M 6 1 Score 130 Score 65 q , NMR:1 of2 Peaks MS: 2 of4 Peaks .

Benzyl benzoate Ester 819*:Butyl benzoate 818:

State M 21 M 4 Elements M 4 M 24 MS 1/1 2/2 3/3 4/4 40 1/1 3/3 34 IR 1/1 2/2 3/3 36 1/1 2/2 3/3- 24 NMR M 24 M 1 GC M :A, M:B 18 X 4 UV M 6 M+ Score 130 Score 9 NMR:1 of 2 Peaks MS: 2 of 4 Peaks

82 IITRI-C6104-4 MATCHES OBTAINED WITH MULTICANDaDATE Table 28.5 TEST COMPOUNDS: HYDROCARBONS (1) HydrocalbonElementsState 2*: Ethane M 02 Propane M 02 MSNMRIR 1/1 2/2 M 3/33/3 4/4 403624 1/21/1 2/2+2/3 X 2721 1 GCUV M : AI M M :Score: B 126 18 6 MS: 2 of MX4 Peaks Score 58 61 Hydrocarbon 53*: Cis-decalin 2 54: Trans-decalin M 2 MSElementsStateIR 1/1 2/2 M- 3/3 4/4 4036 0 1/1 2/2 - 3/3 4/3 3615 0 GCNMR M : A, M M : B 2418 XM 24 1 UV , M Score 126 6 MS: 2 of M 4 Peaks Score 84 6 MATCHES OBTAINED WITH MULTICANDIDATE Table 28.5 (Continued): TEST COMPOUNDS: 845: Trans-1,3-dimethyl-HYDROCARBONS (2) 20 Hydrocarbon 50*: Cyclopentane , 844: Cis-1,4-dimethyl-cyclohexane 2 cycldhexane 2 MSElementsState 1/1 2/2 M- 3/3 4/4 40 02 M_ 3/3 4/1 09 M_ 3/3 4/1 90 1 GCNMRIR 1/1 M : A 2/2 M 3/3 2436 9 MX 24 11 MXX 24 61 UV M Score 117 6 MS:MS: 4/12 ofMatch 4 Peaks M Score 43 6 MS:MS: 4/12 ofMatdh 4 Peaks M Score 43 MATCHES OBTAINED WITH MULTICANDIDATE TEST Table 28.5 (Continued): CM:POUNDS: HYDROCARBONS (3) Hydrocarbon 51*: Cyclohexane 2 844: Cis-1,4-dimethyl-cyclohexane 845: Trans-1,3-dimethylcyclohexane 1: MSElementsState 1/1 2/2 E- 3/3 4/4 40 0 - 3/3 4/1 90 ._ 3/3 4/1 90 GCNMRIR 1/1 2/2 M 3/3 M : B 3624 9 M+X 20 1 XM+X 20 1 UV M Score 117 6 MS: 4/12 ofMatch 4 Peaks M Score 39 6 MS: 4/12 ofMatch 4 Peaks M Score 39 6 HydrocarbonElementsState 6*: n-Hexane M- 02 1 13: n-Heptadecane M- 02 36: Squalane -M 02 839: 2-Propanethiol M- 02 MSIR 1/11/1 2/2 3/3 4/4 3640 1/11/1 2/2 3/43/3+ 34 1/11/1 2/22/2 3/4 3034 1/2+ 2/32/1 /I 3/3 1717 GCNUR M : A, M 1.1 : B 1824 X 1 XX 1 X 24 1 UV i M Score 126 6 M Score 78 6 M Score 74 6 X Score 62 1 MS:NMR: 21 of 42 Peaks I Table 28.5 (Continued): TEST COMPOUNDS: HYDROCARBONS (4) 7 : n-Heptan! MATCHES OBTAINED WITH MULTICANDIDATE 13: n-Heptadecane 839: 2-Propanethiol 36: Squalane Hydrocarbp0State M- IF 02 144- 02 /4 02 M- 0 MSElementsIR 1/1 2/2 3/33/3 4/4 4036 1/1-1/2 2/22/4 3/33/1 3322 1 1/21/1 2/3 242017 1/11/2 2/22/4 X 3/1 2722 1 GCNMRUV M : A, MM M :Score B 126 2418 6 MXX Score 65 61 MX Score 65 11 MX Score 59 61 Hydrocarbon 9*: n-Nonane 839: 2-Propanethiol MS:NMR: 1 of 2 13:Peaks n-Heptadecane2 of 4 Peaks 36: Squalane MSElementsState 1/1 2/2 II - 3/3 4/4 40 02 1/1 M- 3/3 24 02 1/2 2/1 14- 3/4 27 02 1/2 2/12/2 M- 3/4 27 02 GCNMRIR 1/1 M : A, 2/2 Di 3/3 M : B 182436 1/2 2/3 Di X 2420 1 1/1- 2/2 X 3/3 33 1 1/1 X 11 UV M Score 126 6 MS:NMR: 1 of 2 Peaks 2 of 4 Peaks X Score 72 1 M Score 70 6 M Score 64 6 Table 28.5 (Continued): Hydrocarbon 10*: n-Decane MATCHES OBTAINED WITH M1JLTICANDaDATE TEST 839: 2-Propanethiol COMPOUNDS:13: HYDROCARBONS (5) n-Heptadecane 36: Squalane MSElementsState 1/1 2/2 M- 3/3 4/4 40 02 1/1 M- 3/3 24 02 1/2 2/1 M- 3/4 17 02 1/2 2/1 M- 3/4 17 02 GCNMRIR 1/1 M : A, 2/2 M 3/3 M : B 361824 1/2+ 2/3+ MX 2415 1 1/1 2/2+ X 27 1 1/1 2/2+ X 27 11 UV M Score 126 6 MS:NMR: 1 of 2 Peaks 2 of 4 Peaks X Score 67 1 M Score 54 6 M Score 54 6 Hydrocarbon 12*: n-Hexadecane 13: n-Heptadecane 36: Squalane i 839: 2-Propanethiol MSElementsState 1/1 2/2 M- 3/3 4/4 40 02 1/1 2/2 M- 3/3 4/4 40 02 1/1 2/2 M- 3/3 4/4 40 02 2/1 M- 4/3 12 02 GCNMRIR 1/1 : A, 2/2 M 3/3 M : B 361824 1/1 2/2 X 3/3 36 1 1/1 2/2 X 30 1 1/2+ 2/3 MX 2417 1 UV M DI Score 126 6 M Score 86 6 M Score 80 6 MS:NMR: 21 of 42 Peaks X Score 57 1 Table 28.5 (Continued): COMPOUNDS: HYDROCARBONS (6) 27 Hydrocarbon 56*: Methylcyclo- hexane MATCHES OBTAINED WITH MULTICANDIDATE TEST 836: 4-Methyl-valeronitrile pentanenitrile834: 4-Methyl- dimethylcyclohexane846: Trans-1,4- MSElementsState 1/1 2/2 M - 3/3 4/4 40 02 2/1 M - 3/2 12 02 2/1 M- 3/2 12 02 2/1 M- 3/3 14 02 GCNMRIR 1/1 : A, 2/2 M 3/3 M : B 132436 1/1- 1/2+ 3/3- XM+ 2028 1 1/1- 1/2+ M+X 2024 1 MXx 24 1 UV M M Score 126 6 MS: 2 of 4 Peaks X Score 64 1 MS: 2 of 4 Peaks X Score 60 1 MS:NMR: 21 of 24 Peaks M Score 48 6 717: 2-Methyl- 2-butene 726: Chlorocyclo- hexane methylcyclohexane844: Cis-1,4-di- , 845: Trans-1,3-di-methylcyclohexane . 28 Hydrocarbon 52*: Cycloheptane M 2 M 2 m 2 m_. 2 m 2 MSElementsState 1/1 2/2 - 3/3 4/4 3640 0 1/31/1 - 3/13/2+ 1812 0 1/11/4 - 3/3-3/3 1812 0 1/3 x 3/1 12 01 1/3 x- 3/1 12 01 GCNnRIR 1/1 M : A, M M : B 1824 MX 24 1 MX 24 1 XM 24 61 MX 24 61 UV M Score 126 6 r X Score 58 1 -4 x Score 58 M Score 46 --i 2 of 4 Peaks M Score 46 I MS: 2 of 4 Peaks MS: 21/4 of Match4 Peaks MS: 2 of 4 Peaks 1 MS: MATCHES OBTAINED WITH MULTICANDIDATE Table 28.5 (Continued): TEST COMPOUNDS: HYDROCARBONS 705: 2,2-Dimethyl- (7) In: 2,3,4-Trimethyl- HydrocarboState 701*: 2,3-Dimethyl- butane M 2 836: 4-Methyl-valeronitrile M 2 36: Squalane - 02 13: n-Heptadecane M_ 02 butane M- 02 pentane M_ 0 - 0 , MSElementsIR 1/11/1 2/2 2/2 3/3 3/3 4/4 - 3640 0 1/11/3 2/3- 3/2 4/4 1824 1/11/2 2/2 3/4 1930 1/11/2 2/2 3/4 3018 1/1 2/2 3/3 4/4 X 40 1 1/1 2/2- 3/1- 3/3 X 3/4 1022 1 GCNKR M : A, M Ii:B 2418 XM- 20 1 XX 61 X 61 MX M : B- 6 MX 61 UV M Score 126 6 X Score 66 MS: 2 of 4 Peaks M Score 58 MS:NM: 21 ofof 42 PeaksPeaks M Score 58 i Score 56 MS: 2 of 4 Peaks Score 42 ' -"713: 2,3-Dimethyl- 41: 3-Hexene HydrocarbonState 715*: 3-Methyl- pentane M- 2 pentane M- 02 712: Isopentane M - 02 13: n-Heptadecane M - 02 36: Squalane M- 02 M- 02 FIR1MS Elements 1/1 2/2 3/3 4/4 4036 0 1/21/1 2/22/4 3/3 3626 1/11/4 2/2 3/3 3012 1/11/1 2/2 3/4 2230 1/1 2/2 3/4 3022 1 1/3+ 2/2 3/1 3/2- 2016 6 UVGCNMR M : A, M M : B 2418 6 MmX 24 61 MMX 24 61 MXX 61 XMX 61 M : A XM+ Score 54 19 Score 126 J Score 95 MS:MS: 1/4 2 ofMatch 4 Peaks Score 75 MS: 2 of 4 Peaks Score 62 MS: 2 of 4 Peaks Score 62 MS:NMR: 21 of 42 PeaksPedks 1 Table 28.6 KETONES MATCHES OBTAINED I 305*: Cyclo- WITH MULTICANDaDATE 812: PropylTEST COMPOUNDS:acrylate 31 KetoneElementsState pentanone M 42 MM 421 NSNMRIR 1/11/1 2/22/2 M 3/3 4/4 403624 1/1 : A- X X 3/3 24 61 GCUV M : A, El M :Score B 130 18 6 GC: M No Match Col. X Score B 39 1 _I 32 Ketone 296*: 2-Pentanone 2 1 373: Methacrylic M acid 2 ,1 404: Allyl acetate M 42 384: Isoblityl M formate 42 MSElementsState 1/1 2/2 M 3/3 4/4 4036 4 M 3/3-3/2 4/1 19 74 1/11/2- 2/3 M 4/2 1718 1/11/2- M 3/1- 4/4 1019 1 NMRGCIR 1/1 M : 2/2 A, M 3/3 M : B 2418 1/1+ XM- 20 1 M : A- XX 611 M : A- XX 61 UV M Score 130 6 X Score 54 Score 49 -i No Matdh Col. B Score 43 MS:NMR: 1 of 2 Peaks 4/12 ofMatch 4 Peaks I MS:GC: No 2Matdh of 4 Col.Peaks B GC: In general, it can besaid that hydrocarbons and alcohols posed the greatest problemwith respect to multiple candidates. The reasons for this appear tobe correlated to compound type and lack of data. The summarized results in Tables261 27 and

29 point out the large numberof alcohols and hydrocarbons that came through with multiplecandidates. Tables 24 and 29 show that, in general, significant data wereunavailable particularly in GC and NMR. Of all the signatures these twocould be the most important for characterizingalcohols and hydrocarbons. This correlates wlth the factthat hydrocarbons and alcohols were the worstoffenders among the extraneous candidates selected more than once. The observation one can make isthat a more complete data file, with particular emphasis on GCand NMR which are among the less abundant signatures, wouldgreatly reduce (by perhaps

75 percent) the multiple candidateproblem we experienced. The greatest influence of havingthese data would be greater discrimination in the alcohols andhydrocarbons.

b. Input 2 - Compounds WithoutStored Data Five compounds that do not havesignature data stored in

the data base were tested. These Input 2 compounds are:

68: n-Prophylbenzene

270: m-Cresol

8: Octane

74: Indene

92: Thiophene.

lITRESEARCH INSTITUTE

91 IITRI-C6104-4 L I' EXTRANEOUS CANDILATE COMPOUNDSSELECTED MORE THAN ONCE Table 29 COMPOUND SELECTIONS NUMBER MS UNAVAILABLESIGNATURESIR NMR GC UV' 182: 13:36: CitronellolSqualanen-Heptadecane 75 XX X X 839:845:844: cis-1,4-dimethyl-cycldhexane2-Propanethiol 35 X XX X 256:836:765: trans-1,3-dimethyl-cycldhexane4-Methyl-valeronitrile2-Methyl-1-pentanol2,2-Dimethyl-1-butanol 32 X XX X X precision; i.e., no Compounds 68, 92,and 74 had perfect candidate compoundssurvived. of one The search form-cresol resultedin the candidacy had a rating of compound, 274(2,4-dimethylphenol), which because no GC 28 points. The GC signaturesearch was omitted data were availablefor m-cresol andthe 2,4-dimethylphenol Hence passed through theNMR and the IRsearches by default. signature searches the candidatecompound passedthrough only two 2/2 and the by matching thenegative tolerance onUV and the

4/4 peaks in MS. Compound 274 is anexample of a compoundthat was very because of chemical similar to the unknownand passed through candidate similarity and default. If GC data wereavailable, the homologues, compound surelywould have beenrejected, because readily resolved, sudh as m-cresol and2,4-dimethylphenol, are particularly with the GCoperating parametersused. Although similar enough to these molecules aredifferent, they are considering produce very similarMS, IR, and NMRspectra, and, probably pass our matchcriteria,2,4-dimethylphenol will through as a candidatebased on thesespectral data. The search for n-octane,a hydrocarbon,resulted in

selecting the sixcandidates shown inTable 30 with scores

ranging from 72 to30 out of a possible130. The following screening key factors weresignificant in thefailure of the n-octane: process toreject the sixcandidates accompanying

lITRESEARCH INSTITUTE

93 IITRI-C6104-4 Table 30 CANDIDATE COMPOUNDS RETRIEVED IN SEARCH FOR OCTANE SEARCH a NO. 19: 3 -METHYLHEPTANE COMPOUND NAME STATE ELEMENT SKIP 2/41/1MSRESULTS 2/+21/+1 IR NMR D GC A UV M SCORE 72 25: 2,5 -DIMETHYLHEXANE SKIP 4/24/22/31/1 1/12/32/+2 D -B M 71 763:839: 2 2-PROPANETHIOL -HEXANOL SKIP 2/31/12/31/2 1/+22/+31/1 MD DA MD 6863 13: N-HEPTADECANE SKIP 4/11/22/4 1/13/32/+2 D D M 5630 a D 29:= default; M = match. 2 3.3-TRIMETHYLPENTANE M SKIP 4/41/1 D D D M rejection (1) GC data on Carbowax20M would have caused of 2-propanethiol and2-hexanol

(2) GC data on n-heptadecanewould have caused rejection of this compound

(3) The chemical similarity amongthe four hydrocarbons (all saturatedhydrocarbons, three of which

have eight carbonatoms) would make itdifficult to distinguish on the basis of ourMS, IR, and UV match

criteria. NMR may or may nothave been selective

(4) The MS fragmentationbehavior of hydrocarbons, 2-hexanol, and 2-propanethiolis very similar.

In practice, evenwith entire mass spectrafor

the candidate compounds,it is often difficult to distinguish between them.

The interesting featureof the above examplesis the similarity of the candidates tothe input compound. This proves to be anadvantage because if there are nodata in the

file for an unknown, thecompounds selected give astrong indication of the nature ofthe compound and can be veryhelpful of in identifying thecompound type. Of course, the selection "false" candidatei dependsnot only on matches,but also on defaults; the more dataavailable, the greater theselectivity,

while the lack of data couldpermit more candidates tobe

carried through by default.

If a large number ofdefault compounds arecarried through,

little help might be provided tothe user, since therewould be

lITRESEARCH INSTITUTE

95 IITRI-C6104-4 no reason to suspectthat the default candidates arestructurally related. No inferences shouldbe made on the basis of default matches. Judgments should be made onthe basis of true data matches. Because the Input 2 compounds servethe same purpose as

Input 1 compounds, no further tests weremade on this group, and further evaluation of testresults is included in thediscussion of modified match criteria.

. Input 3 - Compounds wlthLaboratory Measurements

The Input 3 compounds wereused to evaluate measurement repeatability, data standards, andthe tolerance limits that would be necessitated to accommodatevariations in measured data. Eleven compounds were chosen atrandom from the 100

"unknowns." All eleven compounds were analyzedby each of the

five analytical techniques. Only one, diphenyl sulfide,lacked

the relative retention data onthe Carbowax 20M column.

Table 31 lists the eleven compoundsalong wlth the

abstracted data generated in ourlaboratory. Included in the

table are the abstracted data forthe compounds contained in

the file. The laboratory generated data wereabstracted and entered into the data sheets forunknowns. These data were

screened against the file according tothe initial match

criteria described in IVA3a. The results of the screening process aretabulated in

Table 32. Nine of the eleven came through asunique matdhes.

lITRESEARCH INSTITUTE

96 . IITRI-C6104-4 Pebble 31 MS COMPARISON BETWEEN STORED AND LABORRTORY IR DATA NMR (Rel Ret. Time)GC (miL) UV , (Mass) (13m) Param. Stored Lab Stored Lab Unknown A 560:1-Butanethiol Compound PkParam. 321 Stored 274156 Lab 904156 PkParam. 132 Stored 8.27.86.8 Lab7.2*7.86.8 Stored 0.9 Lab0.9 Col BA 0.570.76 0.760.59x NA . E 327:Isobutyraldehyde PkPk 4 231 90274143 47+412743 Pk 321 3.76.85.8 6.83.75.8 0.9 1.0 Col BA 0.330.31 0.330.31 214 214 c 10:Decane PkPk 4 312 43714157 29+415743 Pk 213 6.83.97.2 13.7* 6.87.2 1.3 1.3 Col BA 0.702.7 0.68x2.7 D 184:Menthone PkPk 4 321 112 412969 112 416971+ Pk 312 5.87.36.8 5.87.36.8 NA 0.9 Col BA 6.0 6.1 NA 236 E 685:p-Anisaldehyde PkPk 4 312 135136 5577 136135 5577 PkPk 31 2 8.67.96.2 8.8*8.06.3 3.9 3.8 Col bA NA9.4 45 9.4 274 274 r 323:n-Butyraldehyde PkPk 41 32 92442729 4476+2743 PkPk 1 32 6.83.75.8 10.3* 5.88.7* 1.0 0.9 Col BA 0.420.35 0.410.21 290 290 G 458:Diphenyl Ether PkPk 14 23 170 775143 170 5141+77 Pk 321 8.16.76.3 14.5* 8.16.7* 7.0 7.0 ColCol A B 1754 1747 226 226 H 574:Diphenyl Sulfide PkPk 214 3 185186141 51 185186141 51 Pk 321 14.513.6 6.3 13.614.5 6.8* 7.2 7.2 Col BA 14 NA 14 250 250 I 249:n-Heptanol Pk 124 184 417056 184 417056 Pk 213 9.56.93.0 9.53.16.9 1.3 1.3 Col BA 4.62.5 4.3x2.4 J 540:Nitrobenzene PkPk 34 12 123 517743 123 4357+77 Pk?k 32 1 14.3 7.46.6 14.2 6.67.4 7.6 7.6 Col 3A 17 4.9 17 4.5 260 258 x 528:o-Tolunitrile PkPk 314 2 117116 9050 116117 9036+ PKPk 1 32 13-1 4.66.7 13.1 6.74.5 2.6 2.5 ColCol A B 12 4 12 3.9 228 228 + = No match on any one of four stored peaks.of any one of three stored Pk 43 89 89 peaks. TPNA = Transparent.Not available.x* L..= Outside Outside toleraueetolerance limits limits. V711411111MITIFI, 7 itiessmimerl

IDINTIPICATION or COMPOUNDS PhWiLAMOSATORY Table 32 Cosmound 4 Commoumd X Search _Mpiscrim C. ..und A M+D Select,Cand. 387 387Discrim.M coemound IS 387M+D Select, Cand. 387 387DiscrimM compound c M+D387 Sglect. Cand. 387 387Discrim.M compound D 387M+D Select, Cond. 387 I Discrim.387M compound E M+D387 Select. Cand. 387 Discrim.387M Compound F M+D387 Select,cand. 387 387Discrim.M 387 5elect cand. G387 387DiscrIm.M Compound H M+D387 Select.Cand. 387 387Inserts.M M+D387 Select.Cand. 387 387DlecrimM MiM387 §elipct. Cand. 387 Diecrlm.387M NE2387 Select. cand. 387 MSElementsState 387 773016 387 5216 14 24 300131 28 300153 58 227 1189 108 -48 130 78- 387 9134 300 d939 300119 61 227 25 4 300 10 6 300 4028 227 11 1 138300 14 160300 44 227 1096 300 1417 300 4439 227 18 2 16 1 164623 14 1 300103 49 125300 79 227 1668 2713 7 373527 10 2 1480 5 442780 48 24 GCMilltIR 1042 185107202 12 5011 203193 11 2 49 7 199192 20 56 41 5 272197184 13 2011 4 287154196 1 1042 6 273185202 03 1311 3 280154195 1 33 83 275195176 1 8549 7 352199192 13 1 1412 7 281199155 12 141612 281204159 12 UV 85 352 ti 1 6 273 1 85 352 . Candidate Score candidate Score Candidate Score Candidate Score Candidate Score Candidate Scars co cation(TrueIdentifi- thiol1-Butane-candidate Score 104 aldehydeiso-Butvr-Candidate Score 103 nSqualanen-DecaneCandidate -Septa- decane Score 107 6771 MenthoneCandidate Score 87 aadehydep-Anis-Candidate Score 110 EtherDiphenvl 108 SulfideDiohenyl 104 n-Heptanol 101 benzeneNitro- 107 nitrileo-Tolu- 120 lined)under-compound 2,3,32 -Propane-Tri -methylthiol - - 6729 M = Matches = Matches+Defaults pentane ,.... - ,e,orel/FOOFOilig_

One, n-butyralddhyde, was lost due to a "no-match" on infrared.

A comparison was made of our infrared spectrum with the original data source (Sadtler IR #333). Sadtler's spectrum shows a strong band at 13g Which is totally absent in our

spectrum. We strongly suspect an impurity in Sadtler's material making that Whole spectrum suspect. Since we cannot, at this time, obtain a confirmation on Sadtler's spectrum we must carry

it through as a mismatch.

Another unknown, n-Decane, encountered the same problem as

the hydrocarbons encountered in the Input 3 category Which were discussed earlier. N-Decane came through with 4 additional candidates. These four were the same compounds that came through with other hydrocarbons. These four candidates lacked gas chromatographic data. If gas chromatographic data had been

available these four extraneous candidates would have been

eliminated.

All but one of the proper candidates received a rating of

over 100 out of 130 maximum. None received 130 points. Menthone, though uniquely selected, obtained a ra ,ng of 87. The reason

for this was the lack of both NMR and UV data in the file. if these signatures had been present and there had been matdhes within the tolerance limits, 24 more points would have been

Gbtained. In spite of the lack of these data, methone still came through as a unique candidate.

Two of the eleven searches resulted in a unique selection only after UV had been searched. Prior to the UV search two

HT RESEARCH INSTITUTE

99 IITRI-C6104-4 candidates remained. Although this suggests that UV does have

its ability to be discriminating, a search based on the modified match criteria would probably have eliminated the need for UV.

This supposition was not tested however.

d. Precision with Modified Match Criteria

The results obtained above using the initial match criteria described in Section IVA3a can be considered as a first pass based upon assumptions as to the efficacy of each signature

in identifying an unknown compound. There were no explicit guidelines to determine the number of peaks to be searched in

each signature, the number required to establish a match, and

the tolerance limits that should be established.

Subsequent to an evaluation of the precision of identification obtained with the initial match criteria, additional tests were made using the modified match criteria described in

Section IVA3b. These tests were made only on the 32 known input compounds for which multiple candidates were obtained. No

effort was made to determine the effect of the modified criteria on the discrimination and selectivity measures derived from the

tests using the initial criteria. The modified criteria would have the effect, in general, of improving both discrimination

and selectivity. The availability and default ratios, however, would also be changed by the requirement for additional data

in the NMR and GC signatures.

The results of imposing the modified match criteria on the

32 test compounds with multiple candidates are recorded beneath

PITRESEARCH INSTITUTE

100 IITRI-C6104-4 been affected compoundsin Table 28. The ratings have not criteria, however. The changed in accordancewith the modified singly number of candidateseliminated by themodified criteria,

and in combination, arelisted in Table33.

Table

CANDIDATES ELIMINATEDWITH MODIFIED MATCHCRITERIA

Number of Candiewites Match Criterion Eliminated

(1) GC: Columns A and B 7

(2) NMR: 2 of 2Peaks

(3) MS: 3 of 4 Peaks 17

(4) MS: 1/4 or 4/1 match eliminated 2 (1) and (2) 1

, 5 (1) and (3) ;

i (2) and (3) 11 i 7 (3) and (4)

!I (1) and (3)and(4) 1

1 1 (2) and (3)and(4) ;

Total 51

eliminated Fifty-one of the65 extraneouscompounds were Of leaving only 8 testcompounds with multiplecandidates. candidate and 6have 2 these 8 compounds,2 have oneextraneous compounds were extraneouscandidates. Seven of the test lITRESEARCH INSTITUTE

101 IITRI-C6104-4 hydrocarbons and one was an alcohol. Each of five of the hydrocarbons had n-heptadecane and squalane as the extraneous candidates. Certainly a more complete data file on these compounds would have resulted in rejection also.

For the 100 test compounds, the modified match criteria resulted in unique identification of 92 of the compounds.

f. Additional Match Modifications

A number of additional match modifications can be considered for the improvement of search precision. These include the number of signatures to be searched, data tolerances, and number of peaks to be searched and matched. As the data base grows and encompasses thousands of compounds, someadditional match criteria may be required to maintain satisfactory precision.

One of the objectives of the search experiment was to determine the minimum number of signatures necessary to achieve identification. Table 22 has indicated that the five signatures each aided in this task. To investigate the precision obtained with various combinations of signatures, ten compounds were selected and matches were made on combinations of 2,3,4, and

5 signatures. Only combinations were tested because the final results are independent of the order of search. The results are summarized in Table 34 and illustrated in Figures7, 8, and

9. In the 2-signature search, the average number of candidates for the 10 test compounds averaged over the 10 combinations of five signatures taken 2 at a time was 49.2. The MS.IR combination

lITRESEARCH INSTITUTE

102 IITRI-C6104-4 Table 34

SELECTIVITY OF COMBINATIONS OF SIGNATURES (10 Known Input Compounds*)

AVERAGE NUMBER OF SURVIVING CANDIDATES NO. SIGN. SIGNATURES 'Stage1 Stage 2 Stage 3 Stage 4 Stage 5 MEAN

MS-IR 73.2 15.3 MS1\1MR 73.2 30.3 MS.GC 73.2 22.1 MS-UV 73.2 45.1 49.2 2 IR-NMR 65.8 32.2 IR-GC 65.8 28.4 IR-UV 65.8 39.2 NMRGC 175.3 55.6 NMR-UV 175.3 109.4 GC-UV 205.3 114.7

MSIR-GC 73.2 15.3 4.0 MSIRUV 73.2 15.3 10.2 MSIRNMR 73.2 15.3 8.8 MS-NMR-GC 73.2 30.3 7.4 13.2 3 MSNMR-UV 73.2 30.3 21.1 MSGCUV 73.2 22.1 11.0 IRNMR-GC 65.8 32.2 12.3 IRNMR-UV 65.8 32.2 20.9 IRGC-UV 65.8 28.4 11.2 NMR-GC-UV 175.3 55.6 25.0

MS-IRNMR-UV 73.2 15.3 8.8 5.4 MSNMR-GCUV 73.2 30.3 7.4 3.6 2.9 3.8 4 IRNMR-GC,MS 65.8 32.2 12.3 IR-GC-UVMS 65.8 28.4 11.2 2.1 NMR-GC-UVIR 175.3 55.6 25.0 5.0

2.9 1.5 1.5 5 IRNMR-GCMSUV65.8 32.2 12.3

*All compounds had only onesurviving candidate in 7-stagesearch. 103 IITRI-C6104-4 L Ii 500o 200300400 "C 1 00 NER NMR UIT UV

AVERAGE OF 10 COMBINATIONS = 49.2 SELECTIVITY OF 2-SIGNATUREAVERAGES SEARCHESOF 10 KNOWN COMPOUNDS Figure 7 SEARCH STAGES 400500300 i1 i i i i 200100 NMR 50 idS MS MS MS ms ND11111 IAS ILIIII IR MIlikIIIE! IR IR GC 0E 4030 lit NKR a 1.0iR GC NKR , NMR GC UV r=1P 20 IR sIR siR UV GC UV 0 C.)P 10 111 Ill UV N MR GC 453 GC 11 2 I AVERAGE J OF 10 COMBINATIONS = 13.2 1 SELECTIVITY OF 3SIGNATUREAVERAGES SEARCHES OF 10 KNOWN COMPOUNDS Figure 8 SEARCH STAGES 400500300 200100 ii ii NMR 50 llir MS rib ws MKIII IR . IR GC 403020 \ NMR NMR GC UV 10 IR . NMR GC UV -1C I 1 45 UV 1111 1 UV . IR 23 AVERAGE OF MS 5 COMBINATIONS MS = 3.8 1 SELECTIVITY OF 4-SIGNATUREAVERAGES SEARCHES OF 10 KNOWN COMPOUNDS SEARCH STAGES Figure 9 with an average of 15.3 candidates percompound was the most

selective. The GCUV combination with an average of114.7 was the leastselective attributed mainly to the largenumber of defaults in both techniques. The MS.IR.GC combination with an averageof 4.0 candidates per compound was the mostselective of the 10 3-signature searches, followed by MSNMRGC with7.4 candidates and MSIRNMR with 8.8. The average of the 10 combinations was

13.2 candidates per zompound. The IRGCUV.MS combination was mostselective among the

4-signature combinations with an average of2.1 candidates per

compound. The IR.NMR.GC.MS combination was secondwith an average of 2.9 candidates. For the 5 combinations, 3.8 was

the average number of candidates. The search on all five signaturesresulted in an average

of 1.5 candidates per compound thus showing onceagain that

each signature contributed to theprecision. Inasmuch as all

of the 10 compounds searched in thecombinatorial tests resulted in unique identifications in the 7-stagesearch, it is concluded

that the state and element searches alsocontributed to the

identification. It can also be concluded that eachof the 7 stages made a

contribution to identification in theexperiment, not all equally,

however. It is possible that themodified match criteria, which eliminated many extraneous candidates,would alter the

results of Table 34 sufficiently to permitelimination of a

lITRESEARCH INSTITUTE

107 IITRI-C6104-4 signature. The selectivity shown inTables 22 and 32 suggests that four signatures would be theminimum number to be searched using the number and ranking ofpeaks as employed in the modified match criteria.

V. SUMMARY AND CONCLUSIONS The objective of the researchstudy reported herein was to determine the capability of acomposite signature to identify an unknowncompound. The composite signature is made up of a minimum number of data points obtainedfrom five analytical techniques. Identification was accomplished bymatching the composite signature of the unknownagainst a file of composite

signatures. In a sequential search, candidates wereselected by obtaining the logical intersectionof the individual signature matdhes. To determine the feasibility ofthe above concept a data

base of 500 representative organic compounds wasestablished and

a search experiment wasconducted. The 500 compounds represented 23 functional chemical groups and wereselected from 838 compounds

for which datawerecollected. The data consisted of five signatures obtained from the followinganalytical tedhniques:

1. Mass spectrometry

2. Infrared spectrometry

3. Nuclear magnetic resonance

4. Gas chromatography

5. Ultraviolet spectrophotometry.

HT RESEARCHINSTITUTE

108 IITRI-C6104-4 The 500 testcompounds wereselected on the basis of compounds had data on maximum dataavailability. One hundred 195 on 3 signatures, and five signatures,174 on 4 signatures, and distribution 31 on 2 signatures. Signature availability among chemical groupsare summarizedin Tables 1 to 5 and a list of the 500compounds appears inAppendix A. The signature data werecollected primarily frompublished used sources. Standard measurementsand classifications were signatures for more than100 when available. Gas chromatography compounds were obtainedfrom laboratorymeasurements. the number The search e.:periment wasdesigned to determine tolerances of data points tobe associated witheach signature, the

that would be requiredto accommodateinstrument variability,

and the matchcriteria that wouldfacilitate positive

identification. The match criteriaused in the first partof the experiment

were asfollows:

MS: Two out of fourof the most intensepeaks bands IR: Two out of threeof the most intense

NMR: The most intensepeak columns: GC: Relative retentiontime of either of two Carbowax 20M orSilicone SE-30

UV: Strongest absorptionband.

A search on state: solid, liquid, or gas,and on elements searches to facilitate was conductedprior to the signature the elimination ofcompounds from thecandidate list.

HT RESEARCHINSTITUTE

109 IITRI-C6104-4 Three categories ofinput compounds weretested. Input 1 consisted of 100 knowncompounds having all fivesignatures,

the data of which wereincluded in the file. Input 2 consisted

of five compounds forwhich data were availablebut not included

in the data file. Input 3 consisted of11 compounds for which

data on each of thefive signatures weregenerated in IITRI laboratories and data inthe file were obtainedfrom published

sources. Several search measures weredefined to aid in theevaluation

of feasibility. Discrimination is the capabilityto separate sets of compounds onthe basis of well-definedcharacteristics capability of a of a signature. Selectivity is fhe screening

sequential search. Precision is the accuracyof identification. Because data were notuniformly available forall compounds in

all signatures, dataavailability was a factorin the measure-

ments. The availability ratioof a signature is theratio of the

number of compoundshaving data to the totalnumber of compounds

in the data base. The default ratio is oneminus the availability 100 test ratio. Signature availabilityand default ratios for compounds are listed inTable 12. A match by default occurs

When input data for agiven signature of anunknown are available The and a stored compounddoes not have data forthe signature. discrimination factor is theproduct of fhe number ofmatches

obtained in a signature searchmultiplied by the defaultratio.

HT RESEARCHINSTITUTE

110 IITRI-C6104-4 Discrimination factors forthe signatures obtained by searching for the 100 Input1 compounds are listed inTable 13.

Infrared was the mostdiscriminating signature with an average of 30.39 matches persearch out of 470 availablesignatures. Mass spectrometry was secondwith an average of 49.83 matches per search out of478 available signatures. Gas chromatography was third with an averageof 11.74 matdhes persearch out of

308 available signatures. Nuclear magnetic resonance was fourth and ultravioletphotospectrometry was the least discriminating. Searches ordered on the basis ofdata availability provided more rapid screeningthan searches based on thenumber

of data points to be matdhed. Accordingly, the order ofsearch

in all the experiments was:

1. State

2. Elements

3. MS

4. IR

5. NMR

6. GC

7. UV

The search experiment wasdesigned for processing on a

computer but was conductedusing an inverted file systemin

which data were drilled on cards(Termatrex system). Matches

were established byoptical coincidence.

lITRESEARCH INSTITUTE

111 IITRI-C6104-4 Ratings were used to evaluatethe probability that an unknown compound was on thecandidate list in accordance with weights assigned to thesignatures and with the closeness of the match. The maximum ratings assigned tothe signatures and to the state and elementssearches were:

MS 40

IR 36

NMR 24

GC 18 (9 points per column)

UV 6

Elements 4

State 2

Total 130

A candidate list ratingtable showing the scoring procedure

appears in Appendix F. The most critical measure inestablishing feasibility is

precision. Of the 100 known Input 1 compounds, none wererejected

in the search process. Using the match criteria listedabove,

68 compounds were uniquelyidentified by the seven-stcge search.

The average number of survivingcandidates for the 100 searches

was 1.65 (see Table22). Sixty-five extraneous candidatesappeared

on the candidatelists of the 32 compounds withmultiple candidates, ranging from 1 to 5 extracandidates. The aldehydes

(8 compounds) had the lowest average ofsurviving candidates,

1.12, and the hydrocarbons had thehighest average, 2.65. The

lITRESEARCH INSTITUTE

112 IITRI-C6104-4 alcohols with an average of 1.63candidates ranked second highest. Only 2 of the 65 extraneous compoundshad data on all signatures (see Table 24). Eleven had one signature missing,

45 had 2 signatures missing, and 7had 3 signatures missing. Ratings for the candidates ranged from 34 to95 compared to maximum possible ratings of 117 to 130. The unavailability of

GC and NMR signatures contributedin large measure to the low precision in the alcohols and hydrocarbons. Two compounds appeared as candidates on sevencompound lists. The search for the five Input 2 candidatesresulted in three cases of perfect precision, thatis, no candidate compounds survived. One search resulted in the candidacy (score of 28) of a closely related compound butdata on three

signatures were unavailable. The search for a hydrocarbon

obtained six candidates. Nine of the 11 Input 3 compounds wereuniquely identified.

A hydrocarbon had four extraneouscandidates. Only one compound

was erroneouslyrejected because of failure to match on infrared. It is suspected that the compound'sfile data was in error

because of a significant discrepancy betweenthe published data

and IITRI laboratory data. To improve precision, the followingmodified matdh criteria

were imposed on the 32 Input 1compounds that had multiple

candidates:

HT RESEARCH INSTITUTE

113 IITRI-C6104-4 1. Both olumns ir GC wererequired

2. Three out of four MSpeaks were required, and the matches on permuted inputpeak 1 against stored

peak 4 and input peak4 against stored peak 1 were

eliminated

3. Two out of two NMRpeaks were required. The modified match criteriaeliminated 51 of the 65 extraneous candidates leaving only 8 testcompounds with multiplecandidates.

Of these 8 compounds,2 had one extraneous and6 had 2 extraneous candidates. Seven of the test compounds werehydrocarbons and one was analcohol.

In summary: 92 of 100 known compounds wereuniquely identified, and none of the 100 wereerroneously rejected

2 of 5 "unknown" compounds werenot rejected

when they should have been,but the surviving candidates were closely related

10 of 11 compounds whosesignatures were measured

in IITRI laboratories wereidentified, 9 of them uniquely; one compound waserroneously rejected

because of suspected errorin the published data.

The high precision attainedand the absence of erroneous

rejections, especially Whensignature data were completely

available, indicate thatidentification of organiccompounds

by the matching of compositechemical signatures is feasible. The screening of a large datafile in a sequential searchis

HT RESEARCHINSTITUTE

114 IITRI-C6104-4 effective and isreadily amenable tocomputer operations. closeness of matchesand The ratingsassociated with the signature provided a measurefor fhe weights assignedto each identifications can ranking thecandidates. Multiple candidates provide act as indicatorsof chemical groupsand can, therefore, clues for uniqueidentification. Refinement of the matchcriteria, signatureparameters, and in an tolerance limits canbe utilized toobtain high precision

enlarged data file.

The importance ofobtaining reliabledata from standardized analytical techniques wasrecognized early inthe program and The diversity of the test resultscorroborate this view. lack of coordination among indexing betweentedhniques and the composite signature. them poses manyproblems in developing a described in The establishmentof a data filealong the lines this report can serve asan impetusfor relating thetechniques

and coordinatingthe data sources.

VI. RECOMMENDATIONS of the Our investigationshave given us an awareness shortcomings of existinganalytical datainformation files and computerized file system can the contributionof our proposed weaknesses are: make to overcomingthese. The principal together in any way,and (1) that existingfiles are not tied type of datahave (2) that many of thecompounds for which one not found in data been collected, e.g.,NMR spectra, are

lITRESEARCH INSTITUTE

115 IITRI-C6104-4 collections of other types. This is the reason for the large number of defaults in our presentfile.

The input for such anoperational system should come from existing compilations and an attemptwill be made to coordinate the efforts of those organizationsproducing such compilations so that data gaps(defaults) will be minimized in the future.

Thus, we offer the followingrecommendations pertaining to the development of our system and itsutilization by the scientific community for organic compoundidentification and as an

information file.

1. A computerized systemfor storing, searching, and retrieving compound identification datashould be

established in a subsequent phase ofresearch.

2. The present file of 500 organiccompounds should be expanded to a minimum of 3,000compounds in the second

phase. This and the first recommendationconstitutes a pilot scale study to establish themechanics for

operating a larger file. A computerized system of3,000 compounds would constitute aworkable and useful system.

Actual application of thefile to solving users identification problems and informationneeds can

begin after the pilot study.

3. The mode of operation ofthe system should be centralized with respect to data storing,searching

and retrieving. However, data collecting pointsmight be located near major data sourcessuch as Sadtler,

HT RESEARCH INSTITUTE

116 IITRI-C6104-4 ASTM and NBS. The centralized modefor operating the file will offerbetter control over andcoordination of the data fileand its operations. should 4. Where possible, datafor all signatures be obtained for allcompounds. Our past research pointed out the need forcomplete data files. For the final operating system,this will require generation of some datamainly in gaschromatography, nuclear magnetic resonanceand to a lesser extentin mass spectrometry. Generating these datashould be undertaken by existingdata-generating organizations.

For the pilot study some gaschromatographic data may have to be generated. automatic 5. The investigation ofthe employment of

data reduction anddigitization is recommended. Equipment of this nature willspeed data processingand improve the accuracy of the data. signatures 6. The information file fororganic chemical will obtain data fromongoing data operationssuch as

NBS, ASTM and variousprivately-operated data services.

It is not our objective toprovide complete spectra or

yet another competingdata compilation. Our operation will supply the userwith a compoundidentification and references directing him tothe original orprimary data

source if he wishesto obtain the completespectra. In

this way vere hope toincrease the usefulnessof presently

PITRESEARCH INSTITUTE

117 IITRI-C6104-4 existing data services.

7. Our information file willbe, in essence, a subsystem of American Chemical Society'sdiscipline-based chemical information system. It can be so classifiedbecause of the tie-in of each compoundin our file to Chemical

Abstract Services' informationservice through the registry number. CAS has provided,and promised toprovide in the future,registry numbersfor all compounds used in our file making ourACS subsystem entirely feasible.

8. Although we have included ultravioletspectro- photometric data in our signaturesit is recommended that, in future developmentalwork, the UV signature be excluded in favor of a morecomplete file of signatures for each compound. We feel that while some discrimination was shown by the UVsignature, its elimination can be compensated for by morecomplete data for the other four signaturetechniques.

9. A survey of a representativesampling of potential users should be conducted todetermine the level of interest in and financial support of ourinformation

file and the growth of usage wecould expect for the

operating system.

lITRESEARCH INSTITUTE

118 IITRI-C6104-4 APPENDIX A

500 COMPOUNDS IN DATABASE

KEY

D= Dataavailable

T= Transparent A .= GC Column A dataavailable

B = GC Column Bdata available

AB = GC data forboth c,Aomns available

(Blank) = No data available *Preceding compound namedesignates Input 1 test compoundused in search experiment

RESEARCH INSTITUTE

119 IITRI-C6104-4 Availability of Spectra IITRI Compound MS GC IR NMR No. (CA name) UV

D D D 1 Methane T (same) D AB D D 2 Ethane T (same) D 3 n-Propane T (same) D AB D 4 n-Butane (same) AB D D 6 n-Hexane D (same) AB D D 7 n-Heptane D (same) D D 9 n-Nonane D AB (same) AB D D 10 n-Decane D (same) AB D D 12 n-Hexadecane D

rn 13 n-Heptadecane

B D 14 n-Octadecane T D

16 n-Eicosane T D (same)

17 Isobutane T D (2-Methylpropane) m 18 2-Methylheptane .. D AE

D 19 3-Methylheptane T D AB

. D 25 2,5-Dimethylhexane T D

lITRESEARCH INSTITUTE

120 IITRI-C6104-4 Avaijability of Spectra Compound IITRI MS GC IR NMR No. (CA name) UV

D D 28 2,3,4-Trimethylpentane T

29 2,3,3-Trimethylpentane T D

AB D 34 3,4-Dimethylhexane T D

D 36 Squalane T (2,6,10,15,19,23-Hexamethyl- tetracosane) D D 37 Isobutene D (2-Methylpropene) AB D D 40 Isoprene D (same) AB D D 41 1-Hexene D

A D D 42 trans-2-Hexene D

D 47 Butadiene D (1,3-Butadiene) D D 48 Cyclopropane D (same) A D D 50 Cyclopentane T D

D B D D 51 * Cycldhexane T (same) D AB D D 52 * Cycloheptane T

AB D D 53 cis-Decalin T D (cis-Decahydronaphthalene) D D D 54 trans-Decalin T (trans-Decahydronaphthalene) AB D D 56 Methylcyclohexane T D

lITRESEARCH INSTITUTE

121 IITRI-C6104-4 TITRI Compound Availability of Spectra IR NMR No. (CA name) UV MS GC

D ]I":: 57 Cyclohexene D AB

59 Dicyclopentadiene D AB D (3a,4,7,7a-Tetrahydro-4,7- methanoidene) D 60 * Benzene D D AB D (same) D D 61 * Toluene D D AB (same) D 62 o-Xylene D D AB D (same) D 63 m-Xylene D D AB D (same) D 64 p-Xylene D D AB D (same)

65 Ethylbenzene D D AB D D (same)

66 Vinylbenzene D D AB D (Styrene) wag,* D 67 Isopropylbenzene D D AB D (Cumene)

69 p-Isopropyltoluene D D D D (p-Cymene)

70 Naphthalene D D A D D (same)

71 Anthracene D D (same)

73 Azulene D D

77 Indane D D D D

78 Tetralin D A D D (1,2,314-Tetrahydronaphthalene)

lITRESEARCH INSTITUTE

122 IITRI-C6104-4 Availability of Spectra Compound IITRI UV MS GC IR NMR No. (CA name)

T D D D 79 cis-1,2-Dimethy1cyc lohexane

T D D D 80 trans-1,2-Dimethylc yclohexane

D AB D D 84 Ethylene oxide (same) D AB D D 85 Propylene oxide (same) D AB D D 88 Furan (same) D AB D D 89 Tetrahydrofuran (same) D D D D 90 Pyrrole (Azole) D D D 91 Pyrrolidine

D D D D 95 Pyridine (same) D B D D 96 Piperidine (same) D D D 98 3-Methylpyridine

D D D 99 4-Methylpyridine (4-Pico line) D D D 101 3-Methylpiperidine

D D D D 104 Isoquinoline (same) D D D D 105 Acridine

D D D 106 Indole

IIT RESEARCHINSTITUTE

123 IITRI-C6104-4 Availability of Spectra Compound TITRI UV MS GC IR NMR No. (CA name)

D D 107 Skatole (3-Methylindole) D D 110 Pyrazole

D AB D D 112 Dioxane (p-Dioxane) D D 113 Imidazole

NP D D 125 Tryptophan (same) NP D D 127 Phenylalanine (same) NP D 134 a-Alanine (same) NP D D 135 Glycine

D NP 146 Caffeine (same) D NP 147 Morphine (same) NP 151 Novocain

D D 154 Cholesterol (same)

164 Progesterone (same) D A D D 173 a-Pinene (2-Pinene) D 176 Alleacimene D

D D 177 p-Pinene (2(10) -Pinene)

lITRESEARCH INSTITUTE

124 IITRI-C6104-4 IITRI Compound Availability of Spectra No. (CA name) UV MS GC IR NMR

178 Camphpne D D D (same)

179 a-Terpineol D AB D (p-Menth-1 -en-8-ol)

180 Geraniol D A D D (3,7-Dimethyl-trans-2,6- octadien-l-ol)

181 Citral D D AB D (3,7-Dimethy1-2,6-octadienal)

182 Citronellol (3,7-Dimethy1-6-octen-1-ol)

183 Linalobl D AB D (3,7-Dimethy1-1,6-octadien-3-01)

184 Menthone D AB D (p-Menthan-3-one)

185 Menthol D AB D D (same)

194 Methylene dichloride D AB D D (atchloromethane)

195 Chloroform D D D (same)

196 Carbon tetrachloride D D T (same)

197 Chloroethane D AB D D (same)

198 Trichloroethylene D B D D (same)

200 1,1,2-Trichloroethane D AB D D (same)

201 cis-1,2-Dichloroethylene D AB D D

202 Tetrachloroethylene D D T (same)

lITRESEARCH INSTITUTE

125 IITRI-C6104-4 Availability of Spectra Compound IITRI UV MS GC IR NMR No. (CA name)

D D T 203 Hexachlorobenzene (same) D AB D 204 n-Propyl chloride

AB D D 205 Chlorobenzene D D (same) AB D D 206 o-Dichlorobenzene D D (same) D AB D 207 m-Dichlorobenzene D

AB D D 208 p-Dichlorobenzene D D (same) AB D D 211 Bromoform D (rr.tbramomethane) D AB D D 213 Bromobenzene D (same) D D T 214 Trichlorofluoromethane (same) D T 215 Dichlorodifluoromethane D (same) D T 216 Hexafluorobenzene D

A D D 217 Methyl iodide D (Iodomethane) AB D 222 Allyl chloride D (3-Chloropropene) A D D 223 Benzyldhloride D D (a-Chlorotoluene) D D 225 Benzotrichloride D (afala-Trichlorotoluene) D D 226 Benzal chloride D D (ala-Dichlorotoluene)

lIT RESEARCH INSTITUTE 1

126 IITRI-C6104-4 IITRI Compound Availability of Spectra GC IR NMR No. (CA name) UV MS

231 1,5-Dibromopentane D A (same) D D 235 o-Chlorotoluene D AB (same) D D 236 p-Chlorotoluene D D (same) D 239 3-Bromopropyne D AB D (same) D 240 Bromodhloromethane D A D

241 1-Iodonaphthalene D A (same) D 242 2-Iodopropane D AB D (same) D 243 Methanol T D AB D (same) D 244 Ethanol D AB D (EI:hyl alcohol) D 245 Propanol T D AB D (Propyl alcohol)

246 Butanol T D AB D D (Butyl alcohol)

247 Pentanol T D AB D D (Pentyl alcohol)

249 Heptanol T D AB D D (Heptyl alcohol)

250 Isopropanol T D AB D D (Isopropyl alcohol)

251 Isobutanol T D AB D D (Isobutyl alcohol)

253 2 -Methyl -1 -butanol T D AB D (same)

lITRESEARCH INSTITUTE

127 IITRI-C6104-4 Availability of Spectra IITRI Compound UV MS GC IR NMR No. (CA name)

T D AB D 254 2-Methy1-2-butanol (tert7Pentyl alcohol) T D AB D 255 3-Methy1-2-pentanol

AB D 256 2,2-Dimethy1-1-butanol

AB D 257 2,3-Dimethy1-2-butanol

T D AB D D 258 Cyclopentanol (same) T D AB D D 259 Cyclohaxanol (same) T D AB D 260 cis-2-Methylcyclohexanol

D AB D D 262 2-Propyn-1-ol (same) D AB D D 266 2-Methy1-3-butyn-2-ol (same) D AB D 267 3-Methyl-l-pentyn-3-01 (same) D D AB D D 268 Phenol (same) D D D D 269 o-Cresol (same) D D 271 p-Cresol (same) D D D D 272 a-Naphthol (1-Naphthol) D D 273 13-Naphthol (2-Naphthol) D D 274 2,4-Dimethylphenol (2,4-Xylenol)

lITRESEARCH INSTITUTE

128 IITRI-C6104-4 Li Availability of Spectra Compound IITRI UV MS GC IR NMR No. (CA name)

D 276 2,5-Dimethylphenol D (2,5-Xylenol) D D D 277 3,5-Dimethylphenol D (3,5-Xylenol) D AB D D 278 Octanol T (Octyl alcohol) T D AB D D 279 Nonanol (Nonyl alcdhol) D AB D 281 Ethylene glycol T (same) AB D 282 1,2-Propanediol T D (same) D AB D 283 1,3-Propanediol T

D AB D D 284 * Diethylene glycol T

D AB D 285 1,2-Butanediol T

D AB D 286 1,4-Butanediol T (same) D D 288 Catechol D D (Pyrocatechol) D D 289 Resorcinol D D (same) AB D 291 2,3-Dimethy1-2,3-butanediol D (same) AB D 293 2,4-Pentanediol T D

D AB D D 294 * Acetone D (same) D AB D D 295 * Methyl ethyl ketone D (2-Butanone)

lITRESEARCH INSTITUTE

129 IITRI-C6104-4 Availability of Spectra IITRI Compound UV MS GC IR NMR No. (CA name)

D D AB D D 296 2-Pentanone (same) D AB D D 297 3-Pentanone (same) D AB 298 3-Methy1-2-butanone

D AB D 299 2-Hexanone

D AB D 300 3-Hexanone D

D AB D 301 3-Methy1-2-pentanone

D AB D D 302 2-Heptanone (same) D AB D 304 4-Heptanone D (same) D AB D D 305 Cyclopentanone D (same) D AB D D 306 Cyclohexanone D (same) D AB D 307 3-Buten-2-one (same) D AB D 308 5-Hexen-2-one (same) D AB D 309 3-Methy1-3-buten-2-one

D AB D 310 2,3-Butanedione D

D AB D D 312 * 2,4-Pentanedione D (same) D D A D D 313 * Acetophenone (same)

lITRESEARCH INSTITUTE

130 IITRI-C6104-4 Availability of Spectra IITRI Compound UV MS GC IR NMR No. (CA name)

D AB D D 314 Propiophenone D (same) D D 315 Benzophenone (same) D D 317 Benzi1 (same) D D 318 p-Benzoquinone D D (same) AB D 320 Formaldehyde D D (same) AB D D 321 Acetaldehyde D D (same) AB D D 322 Propionaldehyde D D (same) AB D D 323 Butyraldehyde D D (same) AB D 324 Valeraldehyde D (same)

V AB D 325 Hexanal D

AB D 326 Heptanal D (same) AB D D 327 Isobutyraldehyde D (same) AB D D 328 Isovaleraldehyde D (2-Methylbutyralddhyde) D 331 2-Ethylhexanal D AB (same) D 332 Acrolein D AB (same)

333 Methacrolein (Methacrylaldehyde)

HT RESEARCH INSTITUTE

131 IITRI-C6104-4 Availability of Spectra Compound IITRI UV MS GC IR NMR No. (CA name)

D D 336 2,4-Hexadienal (Sorbaldehyde) D D AB D D 337 * Crotonaldehyde (same) D A D D 339 Benzaldehyde D (same) D AB D D 340 Furfural D (2-Furaldehyde) D AB D D 344 Paraldehyde (same) D D D 347 Formic acid D (same) D D D 349 Propionic acid (same) D D D 350 Butyric acid (same) D D D 351 Valeric acid (same) D D D 352 Hexanoic acid (same) D D 353 Heptanoic acid (same) D D D 357 Benzoic acid D (same) D D 358 Phenylacetic acid D D (same) D D 359 o-Toluic acid D D (same) D D 360 m-Toluic acid D D (same) D D 361 p-Toluic acid D D (same)

HT RESEARCH INSTITUTE

132 IITRI-C6104-4 Availability of Spectra IITRI Compound UV MS GC IR NMR No. (CIA name)

D D 362 p-Naphthoic acid (2-Naphnoic acid) D D 365 Succinic acid (same) D D 367 Adipic acid (same) D D D D 368 o-Phthalic acid (Phthalic acid) D D 369 m-Phthalic acid (Isophthalic acid) D D D D 370 Terephthalic acid (same)

ID D D 373 Methacrylic acid (same) acid 374 2,4,6-Mesitylenecarboxylic

D D 375 Stearic acid (same) D AB D D 379 Methyl formate (Formic acid, methylester) D AB D 381 Propyl formate (Formic acid, propylester) AB D Butyl formate D [. 382 D AB D 384 Isobutyl formate

D AB D D 385 Methyl acetate (Acetic acid, methylester) D AB D D 387 Propyl acetate (Acetic acid, propylester) D AB D D 388 Butyl acetate (Acetic acid, butylester)

lITRESEARCH INSTITUTE

133 IITRI-C6104.4 Availability of Spectra Compound IITRI UV MS GC IR NMR No. (CA name)

D AB D D 389 Isopropyl acetate (Acetic acid,isopropyl ester) D AB D 390 Isobutyl acetate (Acetic acid,isobutyl ester) D AB D D 391 Methyl propionate

D AB D 392 Ethyl prc)ionate (Propionic acid,ethyl ester) D AB D D 393 Propyl propionate (Propionic acid,propyl ester) D AB D 395 Isopropyl propionate

D AB D D 396 Isobutyl propionate

D AB D 398 sec-Butyl acetate (Acetic acid, sec-butylester) D AB 402 Ethylidene diacetate

D AB D D 403 Vinyl acetate

D AB D 404 Allyl acetate

D AB D 405 Vinyl butyrate (Butyric acid, vinylester) D AB D D 407 Methyl methacrylate (Methacrylic acid, methylester)

410 Allyl propionate

D AB D D 411 Isopropenyl acetate

D AB D 412 2-Ethyl-1-hexyl acetate (Acetic acid, 2-ethylhexylester)

UT RESEARCHINSTITUTE

134 IITRI-C6104-4 IrrR1 Compound Availability of Spectra IR NMR No. (CA name) UV MS GC

D 415 Isopentyl propionate D AB D

D 418 Methyl benzoate D D A D (Benzoic acid, methylester) D 419 Ethyl benzoate D D D (Benzoic acid, ethyl ester) D 421 Dimethyl-o-phthalate D D A D (Phthalic acid, dimethylester)

423 Dimethyl-p-phthalate D D A

424 Diethyl-p-phthalate D D

D 425 Diethyl malonate D D AB D (Malonic acid, diethylester)

426 Diethyl succinate D AB D D (Succinic acid, diethylester)

427 Diethyl oxalate D AB D (Oxalic acid, diethyl ester)

432 Ethyl methacrylate D D D (Methacrylic acid, ethylester)

434 Ethyl acrylate D AB D D (Acrylic acid, ethyl ester)

435 Methyl cinnamate D D D D (Cinnamic acid, methyl ester)

438 Methyl-o-toluate (o-Toluic acid, methyl ester)

439 Methyl-p-toluate D D D (2-Toluic acid, methyl ester)

440 Methyl-m-toluate D D D (n=roluic acid, methylester)

441 Dimethyl ether D AB D D (Methyl ether)

lITRESEARCH INSTITUTE

135 IITRI-C6104-4 Comound Availability of Spectra IITRI UV MS GC IR NMR No. (CA name)

D AB D D 443 Diethyl ether (Ethyl ether) D AB 444 Methylpropyl ether

D AB D D 445 Dipropyl ether (Propyl ether) D AB D 446 Butylethyl ether

D AB D D 449 Isopropyl ether (same) D AB D 450 Allylethyl ether

D AB D 452 Ethylvinyl ether (same) AB D 453 Butylvinyl ether (same) D AB D 454 2-Ethyl-1-hexyl vinylether (2-Ethylhexyl vinylether) D AB D D 456 Diallyl ether

D D D 457 Dibenzyl ether D (Benzyl ether) D AB D D 458 Diphenyl ether D (Phenyl ether) D AB D D 459 Anisole D (same) D D D 464 Butylamine (same) D D D 466 Hexylamine (same)

467 Dimethylamine (same)

PITRESEARCH INSTITUTE

136 IITRI-C6104-4 IITRI Compound Availability of Spectra No. (CA name) UV MS GC IR NMR

468 Diethylamine D D D (same)

471 Triethylamine D AB D D (same)

472 Trimethylamine D D D (same)

475 Allylamine D D D (same)

476 Aniline D D D D (same)

477 N-Methylaniline D D (same)

478 N,N-Dimethylaniline D D D D (same)

480 o-Toluidine D D D D (same)

484 m-Phenylenediamine D D D (same)

485 p-Phenylenediamine D D D D (same)

486 Diphenylamine D D D (same)

490 o-Tolidine D D D (3,3'-Dimethylbenzidine)

491 Formamide D D D (same)

492 Acetamide D D D (same)

493 Propionamide (same)

494 Butyramide

lITRESEARCH INSTITUTE

137 IITRI-C6104-4 Availability of Spectra Compound IITRI MS GC IR NMR No. (CA name) UV

D D D 498 Benzamide D

D D D 501 Dimethvlformamide (N/N-Dimethylformamide) D D 515 Methacrylamide (same) D 516 N-Methylformanilide D

D A D D 522 Benzonitrile D (same) AB D D 524 Acetonitrile D (same) AB D D 525 Propionitrile D (same) D D 526 Dicyanopropane D

D AB D D 527 * p-Tolunitrile D

D AB D D 528 * o-Tolunitrile D

AB D 529 Valeronitrile D (same) A D D 530 3-Butenenitrile D (same) D D 531 Acrylonitrile D (same) A 532 Malonitrile D (same) AB D D 534 Nitromethane D D (same) AB D D 535 Nitroethane D D (same)

lITRESEARCH INSTITUTE

138 IITRI-C6104-4 Availability ofSpectra IITRI Compound UV MS GC IR NMR No. (CIA. name)

D D AB D D 536 Nitropropane (1-Nitropropane) D D AB D D 537 2-Nitropropane (same) D D AB D D 540 Nitrobenzene (same) D A D D 542 o-Nitrotoluene

D D AB D D 544 p-Nitrotoluene (same)

545 m-Nitrotoluene (same) D D D 547 1-Nitronaphthalene (same)

557 Propylmercaptan (1-Propanethiol) D AB D D 560 Butylmercaptan (1-Butanethiol) D D AB D D 562 Thiopheno3 (Benzenethiol) D D D D 567 Dimethyl sulfide (Methyl sulfide) D D AB D D 568 Diethylsulfide

D D AB D D 569 Ethyl disulfide (same) D D AB D 572 tert-Butyl sulfide (same) D D AB D T 573 Carbon disulfide (same) D D A D D 574 Diphenyl sulfide (Phenyl sulfide)

lITRESEARCH INSTITUTE

139 IITRI-C6104-4 IITRI Compound Availability of Spectra NMR No. (CLA name) UV MS GC IR

D 575 Phenyl disulfide D D

578 Dimethyl sulfoxide D A D D (Methyl sulfoxide)

587 Dimethyl sulfone

594 2-Chloroethanol D AB D

595 3-Hydroxy-2-butanone D D D

600 2-Methoxyethylvinyl ether D AB D

610 Phthalic anhydride D D (same)

611 Succinic anhydride D D D (same)

613 Acetyl chloride D D

618 Trichloroacetic acid D D D

626 o-Nitrophenol D D A (same)

628 o-Methoxybenzaldehyde D A D D (o-Anisaldehyda)

633 Chloral (same)

655 Tetraethyl lead

658 Acetal D AB D D (Acetaldehyde, diethyl acetal)

659 Ketal D AB (Acetone, dimethyl acetal)

lITRESEARCH INSTITUTE

140 IITRI-C6104-4 IITRI Compound Availability of Spectra No. (CA name) UV MS GC IR NMR

661 Isobutyric anhydride D D (same)

662 Benzoic anhydride D D D D (same)

663 Maleic anhydride D D D D (same)

664 Coumarin D D B D D (same)

665 Chloroacetic acid D D (same)

666 Dichloroacetic acid D D (same)

667 m-Nitrobenzoic acid D D D D (same)

668 Diethoxymethane D AB D D

670 o-Methoxyphenol D D AB D D (same)

671 m-Methoxyphenol D D A D D (same)

672 p-Methoxyphenol D D A D D (same)

673 3-Bromopropionitrile D D

674 3-Bromopropionic acid D D

675 2-Bromobutyric acid D D (same)

676 2-Bromo-4-chlorophenol D D D D

677 p-Bromophenol D D D D (same)

lITRESEARCH INSTITUTE

141 IITRI-C6104-4 IITRI Compound Availability of Spectra No. (CA name) UV MS GC IR NMR

678 p-Chlorobenzonitrile D D

679 m-Chlorophenol D D D D (same)

680 * 1-Chlorophenol D D A D D

681 Salicylaldehyde D D A D D (same)

682 3-Etboxy-4-hydroxybenzaldehyde D D D D (same)

683 2-Furan acrolein D D D

684 * 2,4-Dichlorobenzaldehyde D D A D D

685 p-Anisaldehyde D D A D D (same)

687 p-Chlorobenzaldehyde D D AB D D (same)

688 3-Methoxy-1-butanol AB D D

689 2-Ethoxyethanol D AB D (same)

690 2-Butoxyethanol D AB D (same)

691 2-Methoxy-1-propanol D AB

692 1-Methoxy-2-propanol D AB D (same)

693 2-Ethoxyethyl acetate D AB D (2-Ethoxyethano1, acetate)

695 Dipropyl acetal D AB D (Acetaldehyde, dipropyl acetal)

lITRESEARCH INSTITUTE

142 IITRI-C6104-4 IITRI Compound Availability of Spectra No. (CA name) UV MS GC IR NMR

698 4-Hydroxy-4-methy1-2-pentanone D D B D D (same)

699 2-Methoxyethanol D AB D D (same)

700 bis(2-Etboxyethyl)ether D AB D D

701 2,3-Dimethylbutane T D AB D D (same)

702 o-Diethylbenzene D AB D D

703 p-Diethylbenzene D AB D D (same)

704 m-Diethylbenzene D D AB D D (same)

705 2,2-Dimethylbutane T D AB (same)

706 1-Methylcyclohexene D D D (Methylcyclohexene)

707 4-Viny1-1-cyclohexene D AB (4-Vinylcyclohexene)

711 3-P1-ienylpropene D D D D

712 Isopentane T D D D (2-Methylbutane)

713 2,4-Dimethylpentane T D D D (same)

714 2-Methylpentane T D D D (same)

715 3-Methylpentane T D AB D D (same)

716 2,3-Dimethyl-1-butene D D D

lITRESEARCH INSTITUTE

143 IITRI-C6104-4 IITRI Compound Availability of Spectra IR NMR No. (CA name) UV MS GC

D 717 2-Methy1-2-butene D D

B D D 718 * Biphenyl D D (same) D 720 3,5-Dimethylpyrazole D

D 721 2,6-Dimethylpyridine D D D

D 722 4-VinylPyridine D

D 723 2-Methylfuran D AB D

D 726 Chlorocyclohexane ID D

D D 727 1-Chlorohexane

D 728 1,2-Dichloropropane ID D (same) D 729 1,3-Dichloropropane D AB D (same) D 732 1,2,4-Trichlorobenzene D A D (same) D 733 Bromocyclohexane D D (same)

734 2-Bromobutane D AB D D (same)

735 1-Bromo-3-methylbutane ID D D (same)

736 1,2-Dibromobutane ID D D

737 Bromoethane D AB (same)

UT RESEARCH INSTITUTE

144 IITRI-C6104-4 IITRI Compound Availability of Spectra No. (CA name) UV MS GC IR NMR

738 1-Bromopentane D AB D D (same)

739 1-Bromopropane D AB D D (same)

740 2-Bromoprcpane D AB D D (same)

741 2-Bromo-2-methylpropane D D (t-Butyl bromide)

742 1,2 -Dibromopropane D D (same)

743 1,3-Dibromopropane D D (same)

744 2-Bromo-1-propene D D D

745 2,3-Dibromo-1-propene D D D

746 Iodocyclopentane D D D

747 1-Iodobutane D D D

748 2-Iodobutane D D D

749 Iodoethane D AB D D (same)

750 1-Iodopropane D AB D D (same)

751 Iodobenzene D D D

754 1-Bromo-2-chlorobenzene D D D

755 1-Bromo-4-chlorobenzene D D D

lITRESEARCH INSTITUTE

145 IITRI-C6104-4 Availability of Spectra IITRI Compound UV MS GC IR NMR No. (Ch name)

T D AB D D 757 2-Pentanol (same) T D AB D D 758 2,2-Dimethy1-1-propanol (same) T D AB D D 759 2-Ethyl-1-butanol (same) T D AB D D 760 3-Heptanol

T D AB D D 761 * 2-Ethyl-1-hexanol (same) T D AB D 763 2-Hexanol

T D AB D 764 3-Hexanol

T D AB D 765 2-Methyl-1-pentanol (same) T D AB D 767 4-Methy1-2-pentano1

T D AB D 768 2-Methy1-3-pentanol

T D AB D 769 3,3-Dimethy1-2-butanol

T D AB D 770 2-Heptanol

T D AB D 771 4-Heptanol

T D AB D 772 2127.Dimethy1-1-pentano1

T D AB D D 773 * 2-Octanol (same) D AB D 774 2-Methy1-2-propen-1-ol

HT RESEARCHINSTITUTE

146 IITRI-C6104-4 Availability of Spectra IITRI Compound GC IR NMR No. (CAL name) UV MS

D D 775 206-Dimethylphenol D

776 Cyclopentylmethanol T D

D 777 Benzyl alcohol D A D (same) D D 778 t-Butanol T AB

779 2,5-Hexanediol T D D D

780 2-Methy1-2,4-pentanediol T D AB D (same) D D 781 1,3-Butanediol T D AB (same) D D 783 2,3-Butanediol T D AB

D 785 4-Methy1-2-pentanone D AB D (same) D 786 2-Nonanone D AB D

D 787 Camphor D D AB D (same) D 788 4-Methylcyclohexanone D D D

789 * Butyrophenone D D A

D 791 * Cinnamaldehyde D D AB D (same)

792 o-Tolualdehyde D A

793 2-Ethylbutanal D AB D D

lITRESEARCH INSTITUTE

147 IITRI-C6104-4 IITRI Compound Availability of Spectra No. (aPi name) UV MS GC IR NMR

794 Piperonal D D AB D D (same)

795 Cyclohexanecarboxylic acid D D D (same)

796 2-Methylbutyric acid D D D (same)

797 Isobutyric acid D D D (same)

798 Senecioic acid D D D

799 Pentyl formate D A

800 Hexyl formate D AB D

801 Allyl formate

803 Heptyl acetate D AB D (Acetic acid, heptyl ester)

804 Cyclohexyl acetate D AB D

805 Methyl butyrate D AB D

Propyl butyrate D AB D D (Butyric acid, propyl ester)

807 Isopropyl butyrate . D AB D

808 Butyl butyrate D AB D (Butyric acid, butyl ester)

809 Pentyl butyrate D AB D

810 Isopentyl butyrate D AB D (Butyric acid, isopentyl ester)

lITRESEARCH INSTITUTE

148 IITRI-C6104-4 IITRI Compound Availability of Spectra No. (ak name) UV MS GC IR NMR

811 Butyl isobutyrate D AB D (Isobutyric acid, butyl ester)

812 Propyl acrylate AB D

814 Isopentyl acetate D AB D D

815 Hexyl acetate D AB D D (Acetic acid, hexyl ester)

816 Pentyl propionate D AB D D

817 Ethyl butyrate D AB D D (Butyric acid, ethyl ester)

818 Benzyl benzoate D D D D (Benzoic acid, benzyl ester)

819 * Butyl benzoate D D AB D D (Benzoic acid, butyl ester)

820 Pentyl ether D AB D D

821 Butylmethyl ether D AB D

822 Butyl ether D AB D (same)

823 Hexyl ether D AB D (same)

824 Isobutylvinyl ether D AB D (same)

825 Ethoxybenzene D D

826 p-Ethoxytoluene D D D D

827 NIN-Dimethylcyclobexylamine D D D (same)

HT RESEARCH INSTITUTE

149 IITRI-C6104-4 IITRI Compound vailability of Spectra No. (CA name) UV MS GC IR NMR

828 tert-Butylamine D D (same)

829 sec-Butylamine D D D

830 Isobutyronitrile D AB D D (same)

831 Butyronitrile D AB D D (same)

832 Fumaronitrile D D D

833 Hexanenitrile D AB D D

834 4-Methylpentanenitrile D D D

835 Phenylacetonitrile D A D D (same)

836 Oleic acid D D D (same)

837 m-Tolunitrile D A D D

838 Benzylmercaptan D AB D D (a-Toluenethiol)

839 2-Propanethiol D D D

840 114-Butanedithiol D D D

841 * Butyl sulfide D D AB D D

842 2-Methyl-1-butene D D D

843 cis-1,3-Dimethylcyclohexane T D D

HT RESEARCH INSTITUTE

150 IITRI-C6104-4 IITRI Compound Availabinty of Spectra IR NMR No. (CA name) UV MS pc

D 844 cis-1,4-Dimethylcyclohexane T D

D 845 trans-1,3-Dimethylcyclohexane T D

D 846 trans-1,4-Dimethylcyclohexane T D

850 trans-2-Methylcyclohexanol T D AB

lITRESEARCH INSTITUTE

151 IITRI-C6104-4 APPENDIX B

AVAILABILITY OF SIGNATUREDATA

Table 1Major Spectra Sources: MS Spectra MS Data Compilations

Table 2Major Spectra Sources: IR Spectra IR Data Compilations

Table 3Major Spectra Sources: NMR Spectra NMR Data Compilations

Table 4Major Spectra Sources: GC Spectra GC Data Compilations

Table 5Major Spectra Sources: UV Spectra UV Data Compilations

HT RESEARCHINSTITUTE

152 IITRI-C6104-4 MAJOR SPECTRA MS SPECTRA TABLE 1 SOURCES MAJORASTM SOURCE SPECTRA3,200 spectraAVAILABLE FORMIndexedstrongest OF SPECTRA by peaks $13 for DS-27 COST LiteratureCommercial,SPECTRAIndustrial, SOURCE andstrongest3,200($96) indices IBM cards-peak tik)Ul 9,000collectionadded (European in June) to be CollectionsMiscellaneous .Anmem

MS DATA COMPILATIONS

American Society forTesting & Materials. Mass Spectral Data Index, DS27. 248 pp., 1964. $13. Mass Spectral Data Mass Spectral Data,3,200 cards. $96. DS27-1a. cards. $115. DS27-1b. Mass Spectral Nameand Formula, 3,500 Spectral Data, Cornu, A., and Massot,R. Compilation of Mass Limited, Presses Index of Spectresde Masse. Heyden and Son Universitaires de France,1966. Rosenstock, H. M. National Bureau ofStandards. Garvin, D., and Chemical Kinetics Two National Bureauof Standards DataCenters: 7, No. 1, 31-4,1967. and Mass Spectrometry. J. Chem. Doc. Rosenstock, H. M. (NBS Mass SpectrometryData Center, Information from 1,000basic documents Washington, D. C.). Dec. published since 1955. Inf. Retr. Newsl.2, No. 2, p. 5, 1966.

HT RESEARCHINSTITUTE [.

154 IITRI-C6104-4 MAJOR SPECTRA SOURCES TABLE 2 MAJOR SOURCE SPECTRA AVAILABLE FORM OF SPECTRA IR SPECTRA Complete indexed collection, COST API,SPECTRA MCA, SOURCETRC, DMS, ASTM growth100,000IR - 2i2 potential spectra to 16,L; CodedNo complete spectra spectra paredthetapesadditional.$2,250 9th for -supplement; index IBM 3607090to collection upbeing pre- Available on (model 20) through LiteratureNBS,andIRDC-Japan. Sadtler,Foreign. - Domestic able.10thIndices15,000-20,000/Year Supplement DS-24 through avail- LaboratoriesSadtler Research 32,000 spectra hardCompletefilm.spectra copy spectra oridentified micro- - All Sadtler tional.$4,000. Spec-finder addi- Sadtler spectra. Coblentz Society 5,000 spectra Completeand indexed spectra by ASTM.only. Sold10/sheet. by Sadtler Research Laboratories.UniversityIndustrial andResearch byA11Completeidentified ASTM. Coblentz spectra. andspectra indexed Laboratories.30/sheet. Industrial TRC Approximately4,000 spectra AllTRCfiedSummer, TRC indexand spectra indexed 1968.to be identi- by ASTM. available Researchand University Laboratories. IR DATA COMPILATIONS

American Society for Testing &Materials. Numerical list of abstracted infrared spectraindexed on Wyandotte-ASTMpunched cards, December 1961. Firstsupplement-Jan.1963 Secondsupplement-Jan.1964 Thirdsupplement-Apr.1965 Fourthsupplement-Apr.1967. American Society for Testing &Materials. Serial number list of compound names and references,publidhed IR gpectra 1963. Fourth supplement to serialnumber lists - DS29-S4, April, 1967. American Society for Testing &Materials 1916 Race Street Philadelphia, Pennsylvania 19103 Indexes - 100,000 spectra of organiccompounds.

Anderson Physical Laboratory 609 South Sixth Street Champaign, Illinois Mostly organic IR - 0.7p to3.2p 2 spectra per 81/2" x 11"loose leaf gheet 1,200 Anderson Near InfraredSpectra 500 polymiclear compounds.

Coblentz Society. Pure compounds and commercialproducts. IR only,212 to 30p 5,000 spectra available on8-1/2" x 11" sheets Coblentz Society spectra - obtainedfrom Sadtler Research Laboratories, 3316 Spring Garden Street,Philadelphia 2, Pa. $100 per 1,000 spectra.

Dobriner, K., et al. Infrared Absorption Spectra ofSteroids. InterJcience Publishers, Inc., NewYork. 1953. An Atlas - Vol. I. 308 spectra on Perkin-ElmerModel 21 double beam spec- trometers, NACL or CalciumFluoride prism. Roberts, G., Gallagher, B. S., andJones, R. N. Infrared Absorp- tion Spectra of Steroids - AnAtlas - Vol. II. 1958. 453 gpectra.

lITRESEARCH INSTITUTE

156 IITRI-C6104-4 Documentation of MolecularSpectroscopy (DMS). Published jointly by Butterworth'sScientific Publications and Verlag Chemie GmbH.Weinheim/Bergstrosser. Spectra available from: Spex Industries, Inc. 205-02 Jamaica Avenue Hollis 23, New York

Fisk University, Nashville,Tennessee (Infrared Spectroscopy Research Laboratory). Maintains a library of 100volumes on infrared spectroscopy and gas chromatography. Hershenson, H. M. IR Absorption Spectra. Index for 1958-1962. Academic Press. 1964. Jonker Business Machines, Inc.,Gaithersburg, Maryland. ASTM Infrared OpticalCoincidence Index. Data retrieval and correlationfor organics in the following IR collections: API Project #44 1,702 compounds NRC-NBS 2,495 compounds Coblentz 1,859 compounds MCA 170 compounds IRDC-Japan 11197 compounds All hydrocarbons from all other ASTM indexed collections 11828 compounds Korobeinicheva, I. K., Petrov, A.K., and Koptyng, V. A. Spec- tral Atlas for Aromatic andHeterocyclic Compounds No.1, In- frared and UltravioletAbsorption Spectra of PolyfluoroAromatic and Polyfluoro HeterocyclicCompounds. Nauk: Hovosibirsk. 171 pages, 1967. LeCentre d'Information del'Infra-rouge (CIR). Organic compounds and Inorganic compounds. Sources: Sadtler DMS API Research Project#44 Coblentz collections Institute Francois de Petrole,RNU Renault, Rhone Poulenc, St. Gobain. CIR published 200 spectra thendiscontinued publication.

Ministry of Aviation. An index of publishedinfrared spectra. Her Majesty's StationeryOffice, London, 1960.

UT RESEARCH INSTITUTE

157 IITRI-C6104-4 Bureau of Standards. National ResearchCouncil - National Spectra Data Project. 8-1/2" x 11" Mostly organiccompounds - 2,510spectra on sheets and Project terminated andspectra distributedto universities not for profitresearch organizations asof December 1967.

Sadtler ResearchLaboratories, Inc. 3316 Spring GardenStreet Philadelphia, Pennsylvania 19104 IR - 32,000complete spectraorganic compounds Plenum Press. Szymanski/ H. A.(Ed.). Infrared BandHandbook. 1965. Supplements 1-21Plenum Press, 1965. Szymanski, H. A. (Ed.). Infrared Band Handbookand Supplements New York. and InterpretedInfraredSpectra. Plenum Press. 1963. $7.50. Plenum Press. White, R. G. Handbook of IndustrialIR Analysis. 1964. Wright-Patterson Air ForceBase, Ohio. Aliphatic and Aromatic Published as Hydrocarbons, bromohydrocarbons. IR 15,4-35A. WADC TR 57-359,57-413, 58-198 andsupplement 1 to 58-198.

IIT RESEARCHINSTITUTE

158 IITRI-C6104-4 MAJOR SPECTRA NMR SPECTRATABLE 3 SOURCES EAJOR SOURCE SPECTRA4,000 spectraAVAILABLE CompleteFORH OF SPECTRA spectra COST$800 for SPECTRASadtler SOURCE spectra LaboratoriesSadtlerTRC Researdh 1, 260 spectra Complete spectra collection30Vsheet UniversityIndustrialsearch Labs. and Re- -

NMR DATACOMPILATIONS

Shoolery, J. N. (Instrument Johnson, L.F., and ghacca, N. S., NMR SpectraCatalog, Vol.1 Division ofVarianAssociates). and 2,National Press,1962. and Webb, J.S. (Eds.). Formula Howell, M. G.,Kende, A. S., Vol. 2. Plenum Press. New York. Index to NMRLiterature Data, 1956a. $22.50. 518 pp. 1966. CA 62, circulated byESSO Research Chamberlain,privately NMR Data - chemical shiftvalues andcoupling and Engineering. Tables of constants forrepresentativestructures.

Sadtler ResearchLaboratories, Inc. Garden Street 3316 Spring 19104 Philadelphia,Pennsylvania printed form ormicrofilm at 4,000 NPR spectraavailable in a costof $680. Varian Associates, Catalogs, Vol.1 and 2. Varian Spectra of 700 spectraof representtive California. A total code, and by Palo Alto, by afunctional group compounds,indexed by name, chemical shifts.

lITRESEARCH INSTITUTE

160 IITRI-C6104-4 TABLE 4 SOURCES MAJOR SPECTRA -GC DATA SPECTRA COST SPECTRA SOURCE MAJORASTM SOURCE SPECTRA4,000 AVAILABLE FORMIndexed OF compilation 111 CommercialLiteratureIndustrial

1111111111111111111111111.111111.111111111.111111111.1.11.111111111111111111.11111111...... _._-_-_-_ -

GC DATACOMPILATIONS Compilation of Gas forTesting & Materials. American Society $20.00. Chromatographic Data. STP 343. 626 pp. 1963. Schupp, 0. E.). DS25A published1967(see reference,' Preston McReynolds, W. O. Gas ChromatographyRetention Data. 1966. $25. Technical AbstractsCo., Chicago,Illincis1 Co., Preston, S. T., Jr.,and Gill, M. (Preston Tech. Absts. to theliterature on Chicago, Ill.). Bibliography and index gaschromatography. 1966. Compilation of GasChrom- Schupp, O. E., III,and Lewis, J.S. Publ. No. DS25A,2nd Ed., atographic Data. ASTM Data Series ASTM, 732 pp. 4,000 compounds. 1967. Literature. Plenum Signeur, A. V. Guide to GasChromatography Press, New York. 359 pp. 1964. $12.50.

11T RESEARCHINSTITUTE

162 IITRI-C6104-4 - t=1111M7r MAJOR SPECTRA TABLE 5 SOURCES MAJOR SOURCE SPECTRA AVAILABLE FORM OF UV DATA SPECTRA COST SPECTRA SOURCE ASTM 25/749 spectra plete$754collectionadditionalIndices com- CommercialLiteratureIndustrial LI) Sadtler Researdh 22,000 spectra Complete81/2" x 11" spectra or $2/278 LaboratoriesSadtler Research LaboratoriesTRC 11246 spectra microfilmComplete spectra 3WSheet UniversityIndustrial and Labs. UV DATA COMPILATIONS

American Societyfor Testing &Materials 1916 Race Street 19103 Philadelphia, Pennsylvania compounds. Indexed through the7th supplement. 25,749 organic Cost $754 forcomplete index. Spectrometry Group Compounds. Photoelectric Atlas of Organic Spektroskopie. and the Institutfor Spektrochemieand Angewandte Plenum Press, NewYork. 5 volumes. 1966. Numerical List of American Society forTesting & Materials. Spectra Indexed onWyandotte- abstracted Ultravioletand Visible Second supplementto The ASTM punchedcards. March 1963. Abstracted Ultravioletand VisibleSpectra Numerical List of September 1966. Indexed onWyandotte-ASTM PunchedCards. of organic Hiragma, K. Handbook of UV andvisible spectra 616 pp. Data on 8,443 compounds. Plenum Press, NewYork. compounds fromliterature. 1967. Spectra. Hershenson, H. M. Ultraviolet andVisible Absorption York. 1966. Index 1960-1963. Academic Press, New Korobeinicheva, I. K.,Petrov, A. K.,and Koptyng, V.A. Aromatic andHeterocyclic CompoundsNo. 1, Spectral Atlas for Polyfluoro Aro- Ultraviolet AbsorptionSpectra of Infrared and Nauk: Novosibirsk. matic and PolyfluoroHeterocyclic Compounds. 171 pp. 1967. in the UV andVisible Region. Lang, L. (Ed.). Absorption Spectra Academic Press, NewYork. 1966. Visible Region, Lang, L. (Ed.). Absorption Spectrain the UV and Hungarian Academy ofSciences. 1961-1963.

Sadtler ResearchLaboratories, Inc. 3316 Spring GardenStreet Philadelphia, Pennsylvania 19104 Printed form, ormicrofilm availableof complete spectra. 22,000 UV spectraavailable at costof $21278.

PITRESEARCH INSTITUTE

164 IITRI-C6104-4 ,

APPENDIX C

DATA MASTERSHEET

IIT RESEARCHINSTITUTE

165 IITRI-C6104-4 106,423 p-xylene 106.160 CAS REG. NO. COMPOUND NAME MOL. WT.

TRIVIAL AND TRADE NAMES:Xylol

CH STRUCTURE: FORMULA: 810 $11101 13 3°C M.P.:

B.P.:138.4°C STATE: Liquid ANALYTICAL DATA McReynolds, CONDITION A CONDITION B Data Source - GC "GC RetentionData," 1966, AV=1.63 RV=1.56 pp 51 and145 g g 889 1180 I= Ix= x

51 77 27 78 50 92 MS MASS:CHARGE: 91 106 105 39 162 137118 82 77 77 REL. INTENSITY: 624 297 164 DATA SOURCE-APIPro'ect 44,4422

NMR ppm: NON

7 STRONGEST: 2.3 0 1 2 3 4 5 6 CDC1 ...... 05 - - 30 - - SOLVENT: _3 1, 1962,4203 DATA SOURCE-Varian, '7%7(31. TMS) (arranged in order ofincreasing shiftvalue from

IR MICRONS: 12.6, 6.6, 6.9

Phase - liquid Cell -0.01 mm 42276 DATA SOURCE-Sadtler AMII.....ftwriva. UV MILLIMICRONS:

= 2.2 mp, max Solvent - Cyclohexane

DATA SOURCE-Sadtler#2276N

166 1ITRI-C6104-4 APPENDIX D

CHEMICAL SIGNATURESEARCH EXPERIMENT

lITRESEARCH INSTITUTE

167 IITRI-C6104-4 PROJECT C6104

CHEMICAL SIGNATURE SEARCHEXPERIMENT

DATA SHEETS FOR "UNKNOWN"COMPOUNDS

Compound name Number

STATE Solid Liquid Gas

ELEMENTS NOH 0N BrCl

SKIP TRANSPARENT NMR

-0.1 ppm Max. Peak +0.1 ppm

SKIP TRANSPARENT

-2 mg Peak

SKIP NOT POSSIBLE Column A: Time Column B: Time

SKIP Peak 1: -0.1 A Peak 1 +0.1 g Peak 2: -0.1 A Peak 2 +0.1 A Peak 3: -0.1 A Peak 3 +0.1 g

SKIP Peak 1: Peak 3 Peak 2: Peak 4

Coder Date *Magnitude of tolerancebased on a slidingscale. 168 IITRI-C6104-4 APPENDIX E

CHEMICAL SIGNATURESEARCH RECORD

UTRESEARCHINSTITUTE

169 IITRI-C6104-4 PROJECT C6104 ExperimentUnknown CHEMICAL SIGNATURES SEARCH RECORD OperatorDate Routine Parameter Tally Skip MatCh M Search Tally Default D MD MatchCum. ElementsState I uv NP Col. A Col. B AB IR 1-1 1-2 1-3 2-1 2-2 2-3 3-1 3-2 3-3 I 1M I 314 Match3-4 = 2M+ 3M MS NP4-1 1-14-2 1-24-3 1-34-4 1-4 1M 2-1 2M 2-2 3M 2-34M Matcb2-4 = 2M+ 3M+ 4M 3-1 3-2 3-3 REMARKS: APPENDIX F

CANDIDATE LIST

IIT RESEARCHINSTITUTE

171 IITRI-C6104-4 Record 14 (*R14) ExperimentUnknown CANDIDATE LIST AnalystDate Compound State , Element No. NMR L Data Match GC Peaks IR Peaks MS Rank No. Name 1 M M MDMD1 Col 2 Col,DM D M , 1 -1- .-s 1- 1 I 1 i I I I APPENDIX G

CANDIDATE LIST RATING TABLE

IIT RESEARCHINSTITUTE

173 IITRI-C6104-4 CANDIDATE LIST RATING TABLE

SEARCH PARAMETER RATING SEARCH PARAMETER RATING 16 - 2 1-1 STATE 1-2 12 1-3 8 1-4 4 ELEMENTS - 4 2-1 9 + TOLERANCE 20 2-2 12 9 NMR EXACT PEAK 24 2-3 2-4 6

COLUMN A: MS 3,1 4 6 3-2 6 + TOLERANCE 3-3 8 EXACT 9 3-4 6

GC COLUMN B: 4-1 1 6 4-2 2 + TOLERANCE 4-3 3 EXACT 9 4-4 4 MAXIMUM 18 MAXIMUM 40

1 to 1 18 + TOLERANCE 4 15 UV 1 to +1 EXACT PEAK 6 1 to -.f 12 1 to +2 9 1 to -a- 6 Any SKIP 0 1 to +3 3 Any DEFAULT 1

2 to 1 8 , 2 to +1 6 IR 2 to 12 2 to +2 9 2 to -a- 8 2 to +3 6

3 to 1 2 3 to +1 1 3 to -2- 4 3 to +2 3 3 to .5. 6 3 to +3 4 MAXIMUM 130 MAXIMUM 36

174 IITRI-C6104-4 DISTRIBUTION LIST

This report is beingdistributed as follows:

Copy No. Recipient

1-100 National ScienceFoundation 1800 G Street Washington, D. C.

Attention: Mr. T. Quigley

101 IIT ResearchInstitute Section Files

102 IIT ResearchInstitute Editors, M. J. Klein,Main Files

103 IIT ResearchInstitute G. E. Burkholder

104 IIT ResearchInstitnte M. E. Williams

105 IIT ResearchInstitute P. Llewellen

106 IIT ResearchInstitute E. S. Schwartz

107 IIT ResearchInstitute H. J. O'Neill

108 IIT ResearchInstitute B. E. Edwards

109 IIT ResearchInstitute W. M. Linfield

110 IIT ResearchInstitute R. G. Scholz

lITRESEARCH INSTITUTE

175 IITRI-C6104-4