<<

(44) HPP-73 2

Tetrahedron. Vol. 29, pp. 3117 to 3134. Pergamon Press 1973. Printed in Great Britain

APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE-X

INTSUM. A DATA INTERPRETATION AND SUMMARY PROGRAM APPLIED TO THE COLLECTED MASS SPECTRA OF ESTROGENIC STEROIDS†‡

D. H. Smith, B. G. Buchanan, W. C. White, E. A. Feigenbaum, J. Lederberg and Carl Djerassi*

Contribution from the Departments of Chemistry, Computer Science and Genetics, Stanford University, Stanford, California 94305

(Received in USA 17 January 1973; Received in UK for publication 29 May 1973)

Abstract: A method for systematic interpretation and summary of evidence found for all possible mass spectral fragmentations of a molecule or set of related molecules is described. The method is embodied in a computer program (INTSUM) which interprets, in terms of fragmentation processes, mass spectral data collected on known compounds. Utilizing high resolution mass spectra from 47 estrogenic steroids, the method is verified and new findings are discussed. Finally, the method is used to explore the fragmentations of equilenins and several acetate and benzoate ester derivatives.

Interpretation of the mass spectra of known compounds to elucidate mechanisms of fragmentation has been performed manually for many years. The results of such interpretations have resulted in a significant body of empirical rules relating features of molecular structures to the ways in which the structures fragment subsequent to ionization. When more than a few related structures are involved, the task of manual examination of the spectra to determine related modes of fragmentation can become complex and tedious. The advent of high resolution mass spectra to limit compositional ambiguities inherent in low resolution mass spectral data has, to some extent, eased the problem of interpretation. Several techniques have been suggested which are designed to present high resolution spectra in a format that aids interpretation. It can be argued, however, that detailed mechanistic interpretation of data in a single high resolution spectrum, and, particularly, comparison of several spectra of related compounds is still a difficult task utilizing these techniques.

This particular area of data interpretation seems well suited for study by techniques of artificial intelligence using the heuristic search paradigm. The search in this case is over the space of possible fragmentation processes. The heuristics, or rules, used to guide the search are those empirical rules about fragmentation probabilities which are used routinely in manual data interpretation. These rules are sometimes very good, but sometimes inadequate. The use of such judgemental knowledge without guarantees of success characterizes artificial intelligence programs as a whole.

Interpretation of large amounts of data is a task well suited for a computer. It can explore the space of possible interpretations much more systematically and tirelessly than a chemist can. Computer techniques related to this approach have recently been utilized in mechanistic interpretation of individual low resolution mass spectrum/structure pairs.5

A set of 65 complete high resolution mass spectra of estrogenic steroids§ was available to test the performance of the computer program (termed INTSUM) written for data interpretation and summary. The first goal was to verify the performance of the program by comparing its results with manual interpretation, using a set of 47 compounds closely related to those studied previously.6 This verification step offered the advantage of ensuring that generalizations developed previously, utilizing low resolution mass spectra, were correct. This step also offered the possibility of investigation, across a wide variety of compounds, of certain fragmentation processes which were of limited generality when incorporated into a program for automatic structure elucidation. The remainder of the set of compounds was utilized to fulfill the second goal, namely, use of INTSUM to explore fragmentation processes for compounds (equilenins and several acetate and benzoate esters)

†This paper is dedicated to Professor Edgar Lederer of Institut de Chimie des Substances Naturelles, Gif-sur-Yvette, on the occasion of his sixty-fifth birthday.
‡For Part IX, see D. H. Smith, B. G. Buchanan, R. S. Engelmore, H. Adlercreutz and C. Djerassi, J. Am. Chem. Soc. submitted.
§Derivatives of the 1,3,5(10)-estratriene skeleton.


whose mass spectra have been subject to little or no prior investigation.

The program is described in the context of operation with high resolution mass spectral data. It is capable of analysis of low resolution mass spectra, with elemental compositions possible for each fragment ion limited only by the nominal mass and the heteroatom content of the molecule. As with manual interpretation there is a considerable increase in ambiguity of explanatory hypotheses when low resolution spectra are analyzed. The philosophy underlying development of this program, its importance to automatic theory formation, a long-range goal of this research, and an overview of requirements for generality and thoroughness have been presented.

METHOD

Data interpretation and summary. The INTSUM program performs three basic tasks:

(1) Given the basic skeleton (superatom*) common to the set of related compounds, a non-redundant list of all possible fragmentations of this skeleton which result in smaller, unique fragments is produced. This list is called "ALLBREAKS".†
(2) Each structure/spectrum pair is then interpreted in turn as the program seeks evidence for each fragmentation in ALLBREAKS with or without transfer of hydrogens in or out of the charged fragment.
(3) Evidence for all structure/spectrum pairs is collected and correlated. Evidence for common fragmentation modes is grouped together and a summary output is provided.

The summary output shows which fragmentation modes are common to the entire set of molecules and which modes are more dependent on substituent placement, and to what extent they are dependent. The three tasks are described in more detail below. Additional details of the method are available.8

(1) ALLBREAKS

There are several points concerning generation of the ALLBREAKS list which deserve more detailed comment. The program is cognizant of the identities of each atom in the superatom. Thus fragmentation processes which result in fragments formally containing the same numbers of atoms but actually representing unique portions of the original skeleton are saved as separate processes. It is important to note that for an acyclic skeleton, every process of single bond cleavage separates the molecule into two smaller fragments. For wholly cyclic skeletons, however, every process in ALLBREAKS must consist of at least two single bond cleavages, as cleavage of one bond results only in a modified form of the molecular ion. In addition, if a cyclic skeleton is viewed as a graph structure, a fragmentation will be found in the ALLBREAKS list if and only if it begins and ends outside‡ the graph. These considerations are combined if the basic skeleton contains both cycles and chains.

As an example, consider the alkyl-substituted perhydroanthracene skeleton (Scheme 1). The fragmentation of the alkyl chain, formally depicted as process A (15||16), would be included in ALLBREAKS with charge placement on either of the two resultant fragments. Similarly, process B (Scheme 1, 10||12, 9||13) is an allowable process as it divides the structure into two fragments (begins and ends outside the graph structure). Process C (Scheme 1, 13||14, 6||7), however, is not an allowable process. Because it does not begin and end outside the graph, it does not split the skeleton into two fragments.

Scheme 1

The ALLBREAKS list may be extended and/or restricted in a number of ways, completely under the control of the chemist analyzing the data, who can make as many or as few simplifying assumptions as he wishes. The following heuristics (rules) may be used in any combination:

(A) Cleavage of aromatic ring bonds and/or isolated double or triple bonds can be forbidden.
(B) Cleavage of two or more C-C bonds to the same carbon atom can be forbidden.
(C) A minimum number of skeletal atoms in the charged fragment may be specified.
(D) A specified atom or group of atoms may be transferred into or out of the charged fragment. This is subject only to the valence constraint that there must be a sufficient number of atoms to transfer. Transfer processes may be restricted to hydrogen atoms. The source and destination of hydrogen atoms are not specified.§
(E) Loss of and/or fragmentations within substituents on the superatom can be explored.

*As described in previous publications in this series, a superatom is defined as a structural subunit having at least one free valence. A free valence is a bond on an atom in the superatom to which another atom (e.g., hydrogen) or superatom may be connected.
†The program has sufficient flexibility to allow the chemist to input a selected list of possible fragmentations to be investigated rather than considering ALLBREAKS.
‡This concept is defined more precisely in Ref 8.
§It is recognized that many hydrogen transfer processes are relatively site-specific. However, because of formidable synthesis problems, a complete series of specifically deuterated analogs is seldom available, making it pointless to attempt specification without additional data.


(F) Multiple step processes (two-step, three-step, ...) can be considered to a specified level (level two, level three, ...).
(G) Analysis of a set of spectra with respect to a given list of processes is also possible. In this case ALLBREAKS is completely specified by the chemist.

Considering (F) in more detail, unless additional data are available (e.g., metastable defocussing), a multiple step process of level n is not allowable if the same fragment can be generated at level n-1. In other words, all processes are considered as concerted rather than stepwise, if possible.* For example, with reference to Scheme 1, process B with charge retention to the left of B followed by process A with charge retention to the right of A is a legitimate two-step process resulting in a fragment comprising C-10, 14, 13, 8, 7, 6, 5 and 15. Process B with charge retention to the left of B, followed by process A with charge retention to the left of A is not considered a legitimate two-step process as it can be fully explained by the single process A with charge retention to the left of A.

(2) Fragmentation evidence

Data input to the interpretation section of the program consists of structure/spectrum pairs. The structure is specified by the superatom used to create ALLBREAKS followed by modifiers representing substituent placements about the skeleton. For example, an hydroxyl group at carbon 1 (Scheme 1) would be specified as (SUBST OH 1). The mass spectrum consists of a list of elemental compositions with their associated intensities expressed as a percentage of total ionization (%Σ). Each structure/spectrum pair (a single pair may be considered if desired) is analyzed by searching for supporting evidence in the spectrum for each entry in ALLBREAKS. The appropriate elemental composition is sought through ALLBREAKS' knowledge of the identity of skeletal atoms in each proposed fragment and the specification of substituent placements in the input structure. For example, the ion C7H12 would be evidence for fragmentation B (Scheme 1) with charge retention to the right of B in any derivative of that particular superatom unsubstituted at C-1, 2, 3, 4, 9, 11 and 12. With an hydroxyl substituent at C-1, the ion C7H12O would be evidence for process B as described.

This part of the program performs another important task by grouping alternative explanations together. Because the elemental composition of an ion does not specify the portion of the skeleton from which it arose, this ion may be explicable by more than one of the processes in ALLBREAKS. The alternatives may be eliminated if appropriate isotopically labeled or substituent labeled molecules are available. But in general there may be several alternative explanations for an ion, all of which are saved.

(3) Summary output

The program summarizes all the evidence found for each fragmentation process. Each process for which any evidence is found is presented together with an ordered (by %Σ) list of those molecules displaying evidence for the process. This list also contains all alternative explanations for the ions assigned to this process. Spectrum by spectrum output is also provided which includes:

(a) all explanations of each ion in each spectrum (within the restrictions specified for ALLBREAKS);
(b) a list of ions in each spectrum not explained by any process in the ALLBREAKS list;
(c) the percent of total ion current which remains unexplained.†

In addition, fragmentation processes differing only by hydrogen transfers can be grouped together as single processes if desired. These lists provide a check on how well the chosen set of ALLBREAKS has explained the spectrum.

RESULTS AND DISCUSSION

The number of proposed unique fragmentation processes to be considered for a complex molecule is formidable, even when the operation of ALLBREAKS is constrained at the discretion of the chemist, by employing heuristics described previously. This is amply illustrated for the subject of this study, the estrogen superatom (Fig 1). The number of processes (entries in the ALLBREAKS table) for various degrees of restriction are shown in Table 1. Note that the heuristics that the aromatic ring not be cleaved and only H atoms can be transferred are common to all entries in Table 1. These are heuristics which apply to most classes of molecules.‡ In particular, no structurally significant processes involving aromatic ring cleavages or group migration have been noted for estrogens. Loss or fragmentation of substituents (heuristic E)

*This is an application of Occam's Razor. We feel that, confronted with a choice between simple and complex hypotheses, with no additional data, most scientists would choose the simpler one.
†A reviewer suggests that this value may be misleading if a number of low intensity ions at high mass have no explanation. We agree. The facility for examination of the data in this detail is not included in the present program. Manual examination of the spectrum by spectrum output quickly reveals this information.
‡There are, of course, exceptions to these heuristics. Predominantly aromatic compounds frequently undergo aromatic ring cleavage. Also, trimethylsilyl ethers appear particularly prone to undergo group migrations during fragmentation.10 These heuristic limitations can be removed when dealing with a class of molecules wherein these mechanisms of generally lower probability may be operative.


Table 1. The number of unique fragmentation processes for the estrogen superatom. Letter designations of heuristic restrictions to ALLBREAKS common to all entries are defined in the KEY

Columns: Heuristic | Number of processes*, no H transfer | Number of processes*, allow H transfer† (-2, -1, 0, 1, 2)

Two-step processes:
(1) A, D, E, allow up to two-step processes
(2) As (1), but forbid cleavage of two bonds to the same carbon atom
(3) As (2), but forbid fragments containing less than six skeletal carbon atoms

One-step processes:
(4) A, D, E, allow only one-step processes
(5) As (4), but forbid cleavage of two bonds to the same carbon atom
(6) As (5), but forbid fragments containing less than six skeletal carbon atoms

KEY
A: do not cleave aromatic ring bonds.
D: allow only hydrogen atoms to be transferred (no larger groups).
E: do not consider loss or fragmentation of substituents.

*Includes the identity process (molecular ion).
†A (-) sign indicates hydrogen transfer away from the charged fragment. The results in this column are not exact 5x multiples of the previous column because of valence constraints.
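The effect of these restrictions on the size of an ALLBREAKS-style list can be illustrated with a minimal Python sketch. It is not the INTSUM implementation: the function names (`components`, `all_breaks`), the toy six-membered ring and four-atom chain, and the way heuristics B and C are encoded are all assumptions made for illustration. A set of bond cleavages is accepted only if it splits the skeleton into exactly two fragments and no broken bond is superfluous.

```python
from itertools import combinations


def components(atoms, bonds):
    """Return the connected pieces (frozensets of atoms) of a skeleton."""
    neighbours = {atom: set() for atom in atoms}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, found = set(), []
    for start in atoms:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            atom = stack.pop()
            if atom not in comp:
                comp.add(atom)
                stack.extend(neighbours[atom] - comp)
        seen |= comp
        found.append(frozenset(comp))
    return found


def all_breaks(atoms, bonds, max_cuts=2, forbid_same_atom=False, min_charged=1):
    """List (broken bonds, charged fragment) pairs for a toy skeleton."""
    processes = []
    for k in range(1, max_cuts + 1):
        for cut in combinations(bonds, k):
            ends = [atom for bond in cut for atom in bond]
            if forbid_same_atom and len(ends) != len(set(ends)):
                continue  # heuristic B: two cleavages at the same atom
            remaining = [bond for bond in bonds if bond not in cut]
            if len(components(atoms, remaining)) != 2:
                continue  # must split the skeleton into exactly two fragments
            if any(len(components(atoms, remaining + [bond])) != 1 for bond in cut):
                continue  # a broken bond that is not needed makes the cut redundant
            for charged in components(atoms, remaining):
                if len(charged) >= min_charged:  # heuristic C: minimum fragment size
                    processes.append((cut, charged))
    return processes


# Hypothetical skeletons: a four-atom chain and a six-membered ring.
chain = [1, 2, 3, 4]
chain_bonds = [(1, 2), (2, 3), (3, 4)]
ring = list(range(1, 7))
ring_bonds = [(i, i % 6 + 1) for i in ring]

print(len(all_breaks(chain, chain_bonds)))                                    # 6
print(len(all_breaks(ring, ring_bonds)))                                      # 30
print(len(all_breaks(ring, ring_bonds, forbid_same_atom=True)))               # 18
print(len(all_breaks(ring, ring_bonds, forbid_same_atom=True, min_charged=3)))  # 12
```

In this toy case single cleavages suffice for the chain, the ring needs two cleavages per process, and each added restriction shrinks the list, which mirrors the qualitative trend the table is meant to convey.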

Fig 1. The estrogen superatom.

are not considered in this tabulation as the increase in numbers of processes are a function of the size and nature of substituents on each particular molecule.

One of the heuristics with greatest restrictive power is that which forbids cleavage of two C-C bonds to the same carbon atom (compare (1) to (2) and (4) to (5), Table 1). Processes of this type, resulting in charge retention on the portion of the skeleton retaining the C atom to which two bonds were broken, formally result in generation of a charged carbene species, processes which are generally regarded as unfavorable.* Cleavage of two C-C bonds to the same C atom involved in an eliminated neutral species may be a favorable process, however, particularly when the neutral species is carbon monoxide13,14 or another stable neutral molecule.

The optional restriction of specifying a minimum number of skeletal C atoms in the charged fragment (compare (2) to (3) and (5) to (6), Table 1) reduces the complications of having the program consider low mass ions which retain little or no structural information, at least in the case of estrogens.

The question of the level of processes to be considered for a class of molecules is an important one. Consideration of up to two-step processes for the estrogen superatom increased the size of the ALLBREAKS list by about a factor of four (compare (1) to (4), Table 1). This increase occurs despite the restriction (see Method section) that a two-step process is not considered if a single-step process can explain the fragment.

By forbidding cleavage of two C-C bonds to the same carbon atom, the size of the two-step process lists is reduced significantly resulting in about a

*Several instances of formal cleavage of a C-C bond and a C-H bond to the same C atom have been noted in studies of the fragmentation of other classes of steroids.12 Because the program is insensitive to the source of hydrogen atoms, this type of process is formally cleavage of a single C-C bond with hydrogen transfer. Although carbon monoxide expulsion is sometimes noted in the fragmentation of derivatives of estrone (7), there is no definitive evidence for other processes involving cleavage of two C-C bonds to the same C atom for the simple estrogens (below).

factor of two more processes than for the single-step series (compare (2) to (5) and (3) to (6), Table 1). The subsequent discussion will be concerned with up to two-step processes as these serve quite well to explain the majority of processes important to structure elucidation. This is, however, a characteristic of estrogens and may not be true in general.

To provide an indication of the ambiguity of the high resolution data and the extent of alternative explanations for a datum, processes for generation of C13H15 and C13H14 fragment ions in the unsubstituted estrogen superatom (Fig 1) are presented in Scheme 2. These species are the only two possible which contain 13 C atoms for single-step processes involving no hydrogen atom transfers (Table 1, Case 4). Note that processes 37, 38 and 42 would not be considered if cleavage of two carbon-carbon bonds to the same carbon atom is forbidden. Also, the seven processes in Scheme 2 can yield the same ion, C13H14, if a single H atom transfer (-1, Table 1) is allowed in addition to zero H atoms transferred as specified in Scheme 2.

The specification of fragmentation processes produced by the program is highly symbolic (Scheme 2). The program does not attempt to delineate the course of fragmentation in any further detail as it is felt that, without additional data (e.g. deuterium labeling to specify H transfers), such detail would represent only speculation.

For simplicity the set of 65 compounds (see Table 2) was divided into two groups. The first group, comprising 47 compounds, contains a series of compounds substituted in various positions with what might be termed "simple" substituents, for example, hydroxyl, methoxyl, oxo, halogen, alkyl and so forth. These compounds are closely related to those studied previously6 and comprise the set used for verification of the program. The second group (18 compounds) contains those compounds whose spectra have not been subjected to scrutiny, the equilenins,6 and several acetate15 and benzoate esters. This group comprised the set of compounds used to test the program on new data.
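How one ion can acquire several explanations once hydrogen transfers are allowed can be shown with a small Python sketch. It only illustrates the matching idea and is not the program's own code: the process labels `P1` and `P2`, the fragment compositions and the intensities are hypothetical, and ions are reduced to (carbon count, hydrogen count) pairs. A negative transfer removes hydrogen from the charged fragment, following the sign convention of Table 1.

```python
from collections import defaultdict


def explain_ions(spectrum, processes, h_transfers=(0,)):
    """Map each observed ion to the (process, H transfer) pairs that explain it.

    spectrum:  dict of (n_C, n_H) -> intensity in percent of total ionization
    processes: dict of label -> (n_C, n_H) of the fragment with no H transfer
    """
    explanations = defaultdict(list)
    for label, (n_c, n_h) in processes.items():
        for dh in h_transfers:
            ion = (n_c, n_h + dh)
            if ion in spectrum:
                explanations[ion].append((label, dh))
    return dict(explanations)


def percent_explained(spectrum, explanations):
    """Share of the total ion current carried by ions with an explanation."""
    total = sum(spectrum.values())
    explained = sum(v for ion, v in spectrum.items() if ion in explanations)
    return 100.0 * explained / total


# Hypothetical data: two processes whose fragments differ by one hydrogen.
processes = {"P1": (10, 12), "P2": (10, 11)}
spectrum = {(10, 11): 6.0, (7, 7): 2.0}

strict = explain_ions(spectrum, processes)
relaxed = explain_ions(spectrum, processes, h_transfers=(0, -1))

print(strict)    # only P2 explains the C10H11 ion
print(relaxed)   # P1 with one H transferred away is now an alternative
print(percent_explained(spectrum, relaxed))
```

Keeping every alternative, rather than choosing one, matches the stated policy that all explanations for an ion are saved; the unexplained remainder is what the percent-explained figure reports.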

Scheme 2

"Simple" estrogens

The spectra of the 47 compounds were analyzed employing the heuristics summarized in condition (3), Table 1, allowing from -2 to +2 hydrogen transfers. Because previous work6 indicated no significant processes involving substituents and because verification of the program's operation was the primary goal in analysis of the 47 compounds, substituent processes were not considered. The right-hand column of Table 2 indicates the percentage of the total ion current explained by processes in the list of ALLBREAKS. Generally 70-90% of the ions in each spectrum could be explained with the remaining ion current distributed over a variety of low intensity, low mass ions which represent complex fragmentations including loss of substituents. Exceptions are 13, which displays intense loss of the C-6 and C-7 OH substituents as water, and to a lesser extent, 7-ketoestrone (17) which displays an intense series of ions of composition M+·-C„H ;„. 102 (n = 1-3), processes which remain unexplained at this time.

From the standpoint of structure elucidation, the most important processes are those which occur in all or nearly all of the compounds. The data presented in Table 3 represent a portion of the output of the program. This includes processes which comprise ten or more skeletal C atoms and for which evidence was found in the spectra of more than 40 out of the 47 compounds (85% or more).* The intensity data represent results after H transfers were combined. The most frequent H transfers are noted. A symbolic description of each process is included in the Table. The following general comments may be made about these data.

Ambiguity

There is significant ambiguity in the results, with the number of alternative explanations generally increasing as the size of the charged fragment decreases, particularly with eight or fewer skeletal carbon atoms. The complete set of results, only part of which is summarized in Table 3, confirms this observation by specifying the ambiguity of every peak in every spectrum. There are, for example, four explanations for ions comprising eight skeletal C atoms, eight explanations for seven, and eight explanations for six skeletal C atoms. It has been a general assumption of mass spectroscopists that smaller fragments of molecular ions generally specify less information about molecular structure. These results place this assumption on a quantitative footing. In most cases where only a finite number of compounds are available, ambiguity will be the rule rather than the exception. For a relatively complex skeleton such as the estrogen skeleton a series of derivatives representative of substituents at every skeletal atom is seldom available. For many molecules the pattern of substitution will not allow differentiation of all possible processes. Thus a single ion may have alternative explanations. Ambiguities can only be resolved by examination of the results obtained for compounds possessing substituents on skeletal atoms which are

*The program does not impose a lower limit on intensity of the peaks used as evidence. However, although in principle peak intensities vary through a continuum, the sensitivity of the mass spectrometer imposes a lower limit. Ions over a dynamic range of at least 100:1 but usually not more than 200:1 were recorded for all samples. This is a useful intensity range for most investigations of "significant" fragmentation processes, but it must be recognized that the operating conditions of the mass spectrometer create an artificial threshold determining whether a given molecule shows or does not show ions due to a given fragmentation process.

Table 2. The set of sixty-five estrogens

Compound | Name | % explanation*
1 | desoxyestrone | 87
2 | 2-hydroxyestradiol | 79
3 | 2-methoxyestradiol | 78
4 | 2-hydroxyestrone | 78
5 | 2-methoxyestrone | 77
6 | | 75
7 | estrone | 87
8 | | 81
9 | 1- | 77
10 | 16-hydroxyestrone | 81
11 | estrone 3-methyl ether | 77
12 | 17α-methylestradiol | 77
13 | 1-methyl-6α,7α-dihydroxyestrone | 44
14 | 17-vinylestradiol 3-methyl ether | 7S 73
15 | Δ -estrone () | 74
16 | Δ*' -estrone | 74
17 | 7-ketoestrone | 63
18 | 11α- | 77
19 | 17α-ethinylestradiol 3-methyl ether | 71
20 | 11α-hydroxyestradiol | 70
21 | 6-ketoestradiol | 81
22 | 15α-hydroxyestradiol 3-methyl ether | 72
23 | Δ" '-estradiol | 72 67
24 | Δ -estradiol | 76
25 | 16-ketoestradiol | 82
26 | 15α-hydroxyestrone | 74
27 | 11-keto-9β-estrone | 67
28 | estradiol 3-methyl ether | 78
29 | 6-methylestrone | 84
30 | 11α-hydroxyestradiol 3-methyl ether | 71
31 | 11α-hydroxyestrone 3-methyl ether | 70
32 | 1-methylestrone 3-methyl ether | 79
33 | Δ"'-1-methylestrone | 77
34 | 1-methylestrone | 80
35 | Δ°"-1-methylestrone 3-methyl ether | 62
36 | Δ* -6-methylestrone | 74
37 | 1,2-dimethylestrone | 78
38 | Δ*'7 -1,2-dimethylestradiol | 70
39 | Δ" -1-methylestradiol | 72
40 | 16,16-difluoroestrone 3-methyl ether | 75
41 | Δ'' -1,2-dimethylestrone | 72
42 | 13-propyl-18-nor-estrone | 81
43 | 16β-fluoroestrone 3-methyl ether | 78
44 | 16-bromoestrone 3-methyl ether | 83
45 | Δ"'6-estrone 3-methyl ether | 77
46 | Δ"-estrone 3-methyl ether | 71
47 | 1-methyl-2-bromoestradiol | 65
48 | | 90
49 | equilenin 3-methyl ether | 94
50 | 13-ethyl-18-nor-equilenin-14β | 90
51 | 13-ethyl-18-nor-equilenin-14β 3-methyl ether | 90
52 | 13-ethyl-18-nor-equilenin 3-methyl ether | 92
53 | 1-methylestradiol 17-monoacetate | 94
54 | equilenin 3-acetate | 92
55 | Δ6,7-6-methylestradiol 3-benzoate | 97
56 | Δ'"-estradiol 3-methyl ether 17-benzoate | 94
57 | estriol 3,16,17-triacetate | 82
58 | Δ6,7-estrone 3-acetate | 93
59 | estradiol 3-benzoate | 99
60 | 2-methylestrone 3-benzoate | 87
61 | Δ6,7-estrone 3-benzoate | 98
62 | Δ6' -1-methylestradiol 3,17-diacetate | 92
63 | Δ6' -estradiol 3,17-diacetate | 92
64 | Δ'6 l7 -estradiol 3-methyl ether 17-acetate | 97
65 | Δ6 -1,2-dimethylestrone 3-acetate | 91
66 | 16,16-d2-equilenin 3-methyl ether |

*This column presents the amount of data explained by the processes included in ALLBREAKS (100% = total). Processes involving loss of or fragmentation within substituents were considered for 48-65, but not for 1-47.

lost in one process but retained in an alternative process. The alternative explanations are by definition different processes and will differ from one another by at least two skeletal atoms for cyclic skeletons. This method of ambiguity resolution was manually applied to the program's output. Strong weight was attached to processes which were unambiguous explanations for at least some compounds which permit differentiation. This method has the potential drawback that the particular substituent label may itself direct the fragmentation of the molecule along a different pathway. This occurrence is revealed in the spectrum by spectrum outputs which were manually examined in an attempt to avoid this possibility. The results of this procedure are summarized in Table 4.

Verification. The performance of the program is verified and results serve to extend previous knowledge. Mechanisms corresponding to processes 2L (cleavage of the C-5,6 and C-7,8 bonds resulting in expulsion of C-6 and C-7 as ethylene), 10L, 18L, 17L, 7L and 6L have been proposed previously.6 These mechanisms are verified as general to the skeleton, with the exception of 2L. As has been noted previously, this process does not yield a significant ion in several instances, the most notable being C-17 hydroxyl compounds (generally displaying a less abundant ion than corresponding C-17 oxo compounds) and C-11 hydroxyl or oxo compounds, which yield no ions corresponding to process 2L. The C-11 substituted compounds (20, 27, 30 and 31) instead display evidence for loss of a two carbon fragment comprising C-11 and C-12 (process 9L). The resulting ion comprises 13.4 %Σ for 11-keto-9β-estrone (27).* Process 9L may occur to a significant extent in other derivatives also, where it cannot be distinguished from process

*An alternative explanation is loss of C-16 and C-17 with the C-17 oxo group. As this process is not significant in any other estrone-related compounds, it is not a plausible explanation for 27.



Table 3. Processes(a) shown by 85% or more (40/47 or more) of the simple estrogens, including ten or more skeletal carbon atoms

Process label(b) | Observed/total | Retention of n skeletal carbon atoms, n = | Alternative explanations(c) | %Σ range(d) | Most frequent hydrogen transfers
 | | | | 31.4-3.1 |
 | | | | 26.7-0 |
20L | | 14 | 2L/11L | 3.7-0 |
2L/11L | | 14 | 20L | 6.4-0 | 0, -1, -2
18L | | 13 | 19L, 2L/10L | 14.4-0 | -1, -2
19L | | 13 | 18L, 2L/10L | 14.4-0 | -1, -2
2L/10L | 45/47 | 13 | 18L, 19L | 14.7-0 | 0, -1, -2
8L | 45/47 | 12 | 17L, 2L/20L | 10.5-0 | -1, -2
17L | 46/47 | 12 | 8L, 2L/20L | 11.2-0 | 0, -1, -2
2L/20L | 40/47 | 12 | 8L, 17L | 8.3-0 |
7L | 46/47 | 11 | 2L/18L, 2L/19L | 19.8-0 | 0, -1, -2
2L/18L | 40/47 | 11 | 7L, 2L/19L | 19.9-0 | +1, 0, -1
2L/19L | 46/47 | 11 | 7L, 2L/18L | 19.9-0 | +1, 0, -1, -2
16L | 45/47 | 10 | 6L, 2L/17L | 15.0-0 |
6L | 47/47 | 10 | 16L, 2L/17L | 24.6-0.1 |
2L/17L | 45/47 | 10 | 16L, 6L | 24.6-0 | +1, 0, -1

(a) The processes are ordered by decreasing size of the charged fragment.
(b) The process label "L" specifies charge retention on the portion of the molecule which contains the lowest numbered skeletal atom involved in the bond cleavages. The numbering of positions refers to the estrogen skeleton, or superatom. An "H" specifies the reverse. Two-step processes are designated by two labels separated by a "/" mark.
(c) A "no" indicates that there was no alternative process which was a frequent explanation. A "yes" indicates that one or more alternative processes were frequent alternative explanations. Frequent was arbitrarily defined to be more than 50% of the time.
(d) In all cases a relatively smooth decline of %Σ values was observed between the high and low values given in the column. Thus the average value of %Σ for a process is approximately the mid-point of the indicated range.
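The kind of cross-compound summary tabulated here (how many compounds show a process, and over what intensity range) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the program's summary routine: the function name `summarize`, the process labels `P1` and `P2`, and all intensities are hypothetical, while the 85% cut-off and the total of 47 compounds are taken from the text.

```python
def summarize(evidence, n_compounds, min_fraction=0.85):
    """Keep processes seen in enough compounds and report their intensity range.

    evidence: dict of process label -> {compound number: intensity in %Σ}
    """
    rows = []
    for label, by_compound in evidence.items():
        observed = len(by_compound)
        if observed / n_compounds >= min_fraction:
            # Order the compounds by decreasing intensity, as in the summary output.
            ranked = sorted(by_compound.items(), key=lambda item: item[1], reverse=True)
            rows.append({
                "process": label,
                "observed": observed,
                "total": n_compounds,
                "range": (ranked[0][1], ranked[-1][1]),
                "ranked": [compound for compound, _ in ranked],
            })
    return rows


# Hypothetical evidence: P1 is seen in 40 of 47 compounds, P2 in 39 of 47.
evidence = {
    "P1": {c: float(50 - c) for c in range(1, 41)},
    "P2": {c: 5.0 for c in range(1, 40)},
}

for row in summarize(evidence, n_compounds=47):
    print(row["process"], f'{row["observed"]}/{row["total"]}', row["range"])
```

With these made-up numbers only `P1` passes the cut-off, since 40/47 is just above 85% and 39/47 is below it.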

Table 4. Processes, general to the skeleton, which remain after manual removal of ambiguities(a)

Process | Description | Compounds not displaying process
0 | Molecular ion |
10L | Retention of 15 skeletal carbon atoms (Rings A, B, C) | 13
20L* | Retention of 14 skeletal carbon atoms | 13, 26, 43, 44
2L/11L* | | 26, 41
18L | Retention of 13 skeletal carbon atoms | 13, 44
2L/10L* | | 27, 31
17L | Retention of 12 skeletal carbon atoms | 13
7L | Retention of 11 skeletal carbon atoms | 13
6L | Retention of 10 skeletal carbon atoms (Rings A, B) |
14L* | Retention of 9 skeletal carbon atoms | 31
15L | | 27, 31
5L* | Retention of 8 skeletal carbon atoms (including Ring A) |
4L* | Retention of 7 skeletal carbon atoms (including Ring A) | 13

(a) See Table 3 and Scheme 3 for process descriptions.
*Previously unreported processes suggested by INTSUM.

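The manual ambiguity resolution behind this table relies on substituents that one candidate process would retain and another would lose. A minimal Python sketch of that reasoning follows. It is an assumption-laden illustration: the labels `P1` and `P2` and the sets of retained skeletal positions are hypothetical and are not taken from the estrogen numbering or from the processes listed above.

```python
def discriminating_positions(retained_a, retained_b):
    """Skeletal positions kept by exactly one of two alternative processes."""
    return sorted(retained_a ^ retained_b)


def surviving_explanations(alternatives, substituted_atom, ion_keeps_substituent):
    """Keep only the alternatives consistent with where the substituent ended up.

    alternatives: dict of process label -> frozenset of retained skeletal positions
    """
    return [
        label
        for label, retained in alternatives.items()
        if (substituted_atom in retained) == ion_keeps_substituent
    ]


# Two hypothetical processes giving fragments with the same number of skeletal atoms.
alternatives = {
    "P1": frozenset({1, 2, 3, 4, 5, 6, 7, 8, 9, 10}),
    "P2": frozenset({1, 2, 3, 4, 5, 6, 7, 8, 11, 12}),
}

print(discriminating_positions(alternatives["P1"], alternatives["P2"]))
print(surviving_explanations(alternatives, 11, ion_keeps_substituent=True))
print(surviving_explanations(alternatives, 3, ion_keeps_substituent=True))
```

A substituent at a position both processes retain leaves both explanations standing, so only compounds labeled at one of the discriminating positions help, which is why a limited set of derivatives leaves residual ambiguity.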
<^"

Scheme 3. *Most frequent hydrogen transfers.

New fragmentation processes. The output of the program combined with analysis of ambiguities (Table 4) suggests that additional and alternative mechanisms are operative. Evidence supports both 20L and 2L/11L, and 2L/10L as an alternative to process 18L (Table 4). Stepwise processes involving 2L are alternative explanations for other processes in Table 4 as well, but not with sufficient frequency to define them as general. As an example, extensive metastable defocussing experiments performed utilizing estrone methyl ether16 indicate that the fragment of m/e 256 (M+· - C2H4) is a detectable but not extensive contributor to ions resulting primarily from 17L, 7L and 6L, in the first field-free region.

The new processes suggested in Table 4 (14L or 15L, 5L, 4L) serve to define ever decreasing portions of rings B and C. Because of their common occurrence they should facilitate structure elucidation. High resolution mass spectral data will increase the utility of these mechanisms. This region of estrogen spectra is generally quite complex, and there are many doublets unseparable by low resolution techniques. Other studies17,18 have led to the proposal of a mechanism formally corresponding to 15L for generation of fragment ions retaining nine skeletal carbon atoms comprised, in part, of ring A. Process 14L, however, is as plausible an explanation from the data at hand without invoking other considerations of fragmentation probability such as preferential cleavage of benzylic bonds. Process 4L is particularly helpful as it serves to define the substituents on the aromatic ring or at C-6. This process will allow more precise specification of molecular structure, as past efforts could place substituents only somewhere on ring A or B based on mass spectral data alone.

Intensity variation. The processes summarized in Table 4 in general yield some variation in ion abundances as a function of particular substituents at particular positions. In many cases effects of substituents on these general skeletal processes are small. In certain molecules, however, the effect of

fragmentation substituents on is significant. Al- equilenins than for the simple estrogens (Table though the scope of this 4). report does not permit an Process 6L (Ring C and D loss) is insignificant for extensive summary of these variations, the most the equilenins. important are the following. The influence of the It is clear from a 6.7-dihydroxyl careful comparison of Figs 2 functionality was mentioned previ- and 3 in combination with the processes ously, as was the summar- influence of a C-ll hydroxyl or ized in Table 5 that important ions in the spectrum keto functionality. The presence of additional of 49 remain unexplained, 237 and skeletal double m/e mle 211. bonds usually results in diminished The mechanism postulated for genesis of 237, importance of processes mle involving cleavages of 1 1L(Table 5), equivalent to that suggested by these bonds, as expected. Bud- zikiewicz" appears plausible, but cannot be opera- There are instances where the influence of unsat- tive as the 16,16-d; analog ofequilenin methyl ether urations in Rings B, and D would 6 probably not (66) does not lose C-16 as illustrated in Figs 2 and be predicted. Examples are the enhancing effect of 3, where m/e 237 is observed to shift to 239. a double bond on process mle 7L, and a This observation is supported by examination of double bond on process 10L. These observations the intensities displayed by the resulting ion may indicate double bond for isomerization subse- 48-52. Compounds 48 and 49, which possess a quent to (Ref 19, ionization p. 276). Methyl sub- methyl group at display intense ions stitution in Ring A severely impor- at diminishes the M *-C2 of 8-9 and 1-1% S tance of the respectively. How- characteristic Ring D cleavage(process ever, compounds 50-52, which possess 10L, Table 3), presumably an ethyl a result of an increased group at yield of only about I%S for population of molecular ions ions formed by ionization process 1 1L. 
This implies that the angular at the aromatic ring A. Me or Et substituent is lost in the process, along with the ele- ments of Equilenins carbon monoxide. For compounds 50-52. this loss would be equivalent to (ambiguous with) Equilenins, estrogens with an aromatizedRing B, Ring D loss accompanied by loss of an H atom so were treatedas a separate sub-class. The five exam- that it cannot be stated unequivocallythat the pro- ples (48-52) are mentioned in Table together 2 with cess is operative. A process of this type not the percentageof data explained by was ALLBREAKS postulated by the program as it formally (about 90% or more). The improvement involves over the cleavage of two C-C bonds to the same C atom degree of explanation for the simple estrogens is (C-13). Examination of the proposed origin of partially due to of processes mle consideration involv- 211 (Fig 2), process 20L, in light of spectrum ing loss or fragmentation the of of substituents. The mass 16,16-d: -equilenin methyl ether (66, 3) spectra of equilenins see Fig re- have been partially inter- veals that it is operative only to preted previously a minor extent at based on low resolution mass least in the spectrum of 49. The spectral data.6 19 peak remains un- shifted indicating loss of C-16 in the process (mle The processes common to the set of five com- 211, Figs 2, 3). There appears to be no obvious al- pounds are summarized Table 5, along with pro- in ternative explanationthat holds true for all equile- cesses related to specific substituents. There are nins processes studied. There in fact appear to be two distinct common to the equilenins which arenot processes + operative, one generating M -C4 for summarized in the Table due to low ion abun- 48 and 49, which may be ring D loss accompanied dances, which preclude their use for structure by loss of elucidation. the angular Me group, yielding mle 211, Fig 2. 
This process is of lesser The processes importance in the outlined in Table 5 serve quite 13-ethyl-18-norcompounds (the well to origins 50-52 low resolu- indicate of the significant ions in the tion spectrum of is presented 6). spectra of the equilenins 52 inRef The other which have not been dis- process is analogousto 20L cussed previously. example, with loss of two addi- For with reference to tional atoms, is common the spectrum H to 48-52 and results in of equilenin methyl ether, 49, Fig 2, loss of C-16 (results, for 223 arises from 49, in mle 209, Fig 2). m/e process 10Lwithone hydrogen Because loss of the C-13 loss and Me or Et group is impli- m/e 196 from 19L with H loss (supported cated, the process would by the lack of correspond to loss of shift of mle 196 in the spectrum of 18, 16 and 17. Any interpretation 16,16-d: -equileninmethyl 66). mechanistic ether, There are sev- would be speculation additional eral important processes involving without data. substituents for Further deuterium labelling experiments the equilenins. For example are in the important hyd- progress to clarify the origins of these particular rocarbon ions m/e 178 and 179, m/e 165, and m/e ions. 152 and 153 arise from loss of the C-3 methoxyl substituent together with process 20L, 19L, and 8L Acetate and benzoate or 17L, esters respectively (SUB3L processes, Table 5). The previous examples As expected, illustrated several frag- processes 18L, 17L and 7L (Table mentation processes structuralsignificance. 5) all of which of The involve cleavage of a bond adjacent acetate and benzoate esters, however, (Table 2, to aromatic Ring B, are much less important for compounds 53-65) exhibit fragmentationprocesses

C,

C-7,8 C-9,11

C-13, H,O

C-13,

'

H,O

C-13, Applications of artificial intelligence for chemical inference— X 3129

Fig 2. The low resolution mass spectrum (70 ev) ofequilenin 3-methyl ether, 49.

Fig 3. Thelowresolution mass spectrum (70 ev) of 16, 16-d2 -equilenin3-methyl ether, 66. at least in part characteristic of the substituent thereof. Further decomposition of the ions so gen- rather thanthe estrogenskeleton.These derivatives erated is complex and givesrise to many low abun- are thus not as suitable for structural identification dance fragment ions. as the parent sterols. By requesting that the prog- 1-Methylestradiol 17-monoacetate (53) displays ram consider fragmentation processes involving the characteristicfragmentations notedfor the sim- substituents, theseprocessescan be investigated.A ple estrogens, but with diminished intensity. Only brief summary is providedto indicate some of the minor eliminationof acetic acid is noted. A16,17 es- major cleavagesand rearrangements involved. The tradiol 3-methyl ether 17-acetate(64), on the other data on acetate esters support and extend the hand, yields a mass spectrum exhibiting many of findings of a previous study based on lowresolution the features of the C-3-acetyl derivatives men- mass spectra." tioned above. There is a strong ion (base peak) for elimination of ketene followed by processes re- Acetyl derivatives miniscent of estrone 3-methyl ether which, how- The spectra of all C-3 acetates, (54, 57, 58, 62, 63, ever, yield ions of greatly reduced intensity, par- and 65), regardless of whether they are diacetates ticularly Ring D loss. (62, 63) or triacetates(57), are dominatedby loss of ketene from the C-3 acetyl group. This process, Benzoate esters characteristic of acetates derived from aromatic The C-3 benzoates (55, 59, 60 and 61) yield dis- 70 hydroxyl is followed by processes tinctive spectra consisting of the molecular ion, the summarizedpreviously, such as Ring D loss, in the benzoyl ion(C6H S CO+) and thephenyl ion (C6H s +) case of monoacetates 54, 58 and 65. The diacetates which account for 70%2 for comound 55 and 62 and 63 appearto follow loss of ketene with loss 90-95%2 for 59. 
60 and 61. The C-17 benzoate, 56, of the C-17 acetyl function as acetic acid. Other however, displaysan intense ionfrom loss of ben- fragments are generated in further decompositions zoic acid, a process analogousto loss of acetic acid of these ions. (57) fragments in an from the C-17 acetates. This is accompaniedby an analogous manner with loss of ketene accompanied intense benzoyl ion, loss of benzoic acid plus Me by loss of two additionalmolecules of ketene, two radical and other ions of low abundance and obs- molecules of acetic acid and all combinations cure origins.
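The "% Σ" figures quoted for the benzoates can be illustrated with a short calculation. The sketch below assumes that % Σ denotes the share of the summed ion abundance carried by the chosen ions; the function name `percent_sigma` and the peak list are hypothetical and are not taken from INTSUM or from the measured spectra. Only the nominal masses of the benzoyl ion (m/e 105) and the phenyl ion (m/e 77) are fixed by their compositions.

```python
from typing import Iterable, Mapping


def percent_sigma(peaks: Mapping[int, float], selected: Iterable[int]) -> float:
    """Return the share (in %) of the summed ion abundance carried by `selected` m/e values."""
    total = sum(peaks.values())
    if total <= 0:
        raise ValueError("peak list must have a positive total abundance")
    return 100.0 * sum(peaks.get(mz, 0.0) for mz in selected) / total


# Hypothetical peak list (m/e -> abundance in arbitrary units), NOT measured data.
# m/e 105 is the benzoyl ion C6H5CO+ and m/e 77 the phenyl ion C6H5+;
# 374 stands in for a molecular ion and 213 for a minor skeletal fragment.
example_peaks = {374: 25.0, 213: 10.0, 105: 50.0, 77: 15.0}

share = percent_sigma(example_peaks, [374, 105, 77])
print(f"molecular + benzoyl + phenyl ions: {share:.1f}% of the total ion abundance")
```

A value close to 100% for these three ions means that little ion current is left for skeletal fragments, which is the sense in which the benzoates are less suitable for structural identification than the parent sterols.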

Table 5. Fragmentation processes common to equilenins ᵃ

The symbolic descriptions (structure drawings) are not reproduced. Entries shown as "?" could not be recovered from this copy.

Process       Observed/   n skeletal       Ambiguity   Alternative     % Σ range    Most frequent
label         total       carbon atoms                 explanations                 hydrogen transfers
                          retained
0             5/5         18 (molecular ion)
1L            5/5         17
11L           5/5         16
10L           5/5         15
20L           5/5         14
18L           5/5         13
19L           5/5         ?                Yes         18L             4.7-1.8      -1
8L            5/5         ?                Yes         17L             1.7-0.6      0, -1, -2
17L           5/5         ?                Yes         8L              1.4-0.8      +1, 0, -1
7L            5/5         ?                No                          1.2-0.7      +1
SUB3L/20L     5/5         ?                No                          4.7-2.3      0, -1, -2

Table 5 (continued)

Process       Observed/   n skeletal       Ambiguity   Alternative     % Σ range    Most frequent
label         total       carbon atoms                 explanations                 hydrogen transfers
                          retained
?             ?           ?                Yes         SUB3L/19L       6.4-4.4      ?
?             ?           ?                Yes         SUB3L/18L       6.4-4.4      ?
?             ?           ?                Yes         SUB3L/17L       4.0-2.7      0, ?
?             ?           ?                Yes         SUB3L/8L        4.1-2.8      +1
?             ?           ?                Yes         See 20L         1.3-0.8      0
?             ?           ?                Yes         See 20L         11.2-9.1     (see 20L)
OC3*1L/20L    3/3 ᵇ       ?                ?           ?               ?            ?
OC3*1L/18L    3/3 ᵇ       ?                ?           OC3*1L/19L      1.3-0.9      ?
SUB18L        3/3 ᶜ       ?                ?           ?               ?            0
SUB18L/10L    3/3 ᶜ       ?                ?           ?               10.7-9.1     0

ᵃ See Table 4 for explanation of terms. Processes yielding ions of <1% Σ, for all compounds, are not included.
ᵇ These processes involve loss of methyl from a methoxyl substituent, so that only compounds 49, 51 and 52 are included.
ᶜ These processes involve loss of the substituent at C-18, so that only compounds 50, 51 and 52 are included.
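The deuterium-labelling argument applied to the equilenin processes is nominal-mass bookkeeping, and it can be sketched as follows. The sketch assumes that equilenin 3-methyl ether (49) is C19H20O2 (nominal mass 280) and that its 16,16-d2 analog (66) is C19H18D2O2 (282); these compositions and the helper names are illustrative assumptions and are not part of INTSUM. Loss of the angular methyl group together with carbon monoxide (C2H3O, 43 mass units) keeps both labels, so the ion moves from m/e 237 to 239, whereas a loss that removes C-16 with both deuterium atoms leaves m/e 211 unshifted.

```python
from typing import Mapping

# Nominal (integer) atomic masses; "D" is deuterium.
NOMINAL_MASS = {"C": 12, "H": 1, "D": 2, "O": 16}


def nominal_mass(composition: Mapping[str, int]) -> int:
    """Sum the nominal masses of all atoms in an elemental composition."""
    return sum(NOMINAL_MASS[element] * count for element, count in composition.items())


def fragment_mz(molecule: Mapping[str, int], neutral_loss: Mapping[str, int]) -> int:
    """Nominal m/e of the singly charged ion left after removing `neutral_loss`."""
    for element, count in neutral_loss.items():
        if count > molecule.get(element, 0):
            raise ValueError(f"cannot lose {count} x {element} from this molecule")
    return nominal_mass(molecule) - nominal_mass(neutral_loss)


# Assumed compositions (see the paragraph above).
ETHER = {"C": 19, "H": 20, "O": 2}             # compound 49
ETHER_D2 = {"C": 19, "H": 18, "D": 2, "O": 2}  # compound 66

# Loss of angular CH3 plus CO: the labels at C-16 stay in the ion.
me_plus_co = {"C": 2, "H": 3, "O": 1}
print(fragment_mz(ETHER, me_plus_co), fragment_mz(ETHER_D2, me_plus_co))

# Loss of C4H5O that includes C-16: in 66 the neutral carries both D atoms.
ring_d_plus_me = {"C": 4, "H": 5, "O": 1}
ring_d_plus_me_d2 = {"C": 4, "H": 3, "D": 2, "O": 1}
print(fragment_mz(ETHER, ring_d_plus_me), fragment_mz(ETHER_D2, ring_d_plus_me_d2))
```

Comparing the two numbers printed on each line gives the label shift: a difference of 2 means the ion retains C-16, a difference of 0 means C-16 was lost with the neutral fragment.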


CONCLUSIONS

The computer program for data interpretation, INTSUM, has been shown to be a powerful aid to the interpretation of large quantities of high resolution mass spectral data. The program's representation of knowledge of molecular structure and mass spectrometry is sufficiently flexible and general to suggest potential wide applicability. The output of INTSUM is a valuable aid to chemists in determining firm rules of fragmentation which can then be used in studies of related but unknown compounds. The program is presently in routine use in studies of the fragmentation of other classes of compounds, including other steroids and alkaloids.

EXPERIMENTAL

High resolution mass spectra were determined utilizing both AEI-MS9 and Varian-MAT 711 mass spectrometers. The former instrument was operated with an ionizing voltage and current of 70 eV and 500 µA, respectively. Scans were recorded at a scan rate of 34 sec/decade in mass. The latter instrument was operated with an ionizing voltage and current of 70 eV and 1.6 mA, respectively, and scan rates of either 38 or 22 sec/decade. Samples were introduced via the direct insertion probe in both instruments. Data were recorded on-line to a Digital Equipment Corp PDP-11 interfaced directly to the ACME computer facility.

Data on defocused metastable ions* were obtained utilizing the AEI-MS9 and Varian-MAT 711 instruments with ion source conditions and sample introduction as outlined above.

Available samples of estrogens were subjected to mass spectral analysis without further purification. Two samples found to be mixtures were not included in this study.

The INTSUM program is written in the Stanford 360/LISP language and runs in batch mode on the Stanford IBM 360/67 machine or on Stanford Medical School's IBM 360/50 (the ACME facility). On the faster of the two machines, the summary of 47 estrogen spectra took roughly 15 min. Programming details are omitted here for the sake of brevity but can be obtained upon request from the authors.

*These experiments were carried out utilizing ion decompositions in the first field-free region of a double focussing mass spectrometer. See Refs 21-23.

Acknowledgements. Financial support from the National Institutes of Health (RR-612-02) and the Advanced Research Projects Agency (SD-183) is gratefully acknowledged.

REFERENCES

¹ K. Biemann, P. Bommer and D. M. Desiderio, Tetrahedron Letters 1725 (1964)
² A. L. Burlingame and D. H. Smith, Tetrahedron 24, 5749 (1968)
³ R. Venkataraghavan and F. W. McLafferty, Analyt. Chem. 39, 278 (1967)
⁴ E. A. Feigenbaum, "Information Processing 68." North Holland, Amsterdam (1968)
⁵ A. B. Delfino and A. Buchs, Helv. Chim. Acta 55, 2017 (1972)
⁶ C. Djerassi, J. M. Wilson, H. Budzikiewicz and J. W. Chamberlin, J. Am. Chem. Soc. 84, 4544 (1962)
⁷ D. H. Smith, B. G. Buchanan, R. S. Engelmore, A. M. Duffield, A. Yeo, E. A. Feigenbaum, J. Lederberg and C. Djerassi, Ibid. 94, 5962 (1972)
⁸ B. G. Buchanan, E. A. Feigenbaum and N. S. Sridharan, Machine Intelligence 7 (Edited by B. Meltzer and D. Michie). Edinburgh University Press, Edinburgh (1972)
⁹ H. Budzikiewicz, C. Djerassi and D. H. Williams, Mass Spectrometry of Organic Compounds. Holden-Day, San Francisco (1967)
¹⁰ W. B. Weber, R. A. Felix and A. K. Willard, J. Am. Chem. Soc. 92, 1420 (1970)
¹¹ J. A. McCloskey, R. N. Stillwell and A. M. Lawson, Analyt. Chem. 40, 233 (1968)
¹² L. Tokes and C. Djerassi, J. Am. Chem. Soc. 91, 5017 (1969)
¹³ J. H. Beynon, G. R. Lester and A. E. Williams, J. Phys. Chem. 63, 1861 (1959)
¹⁴ J. H. Bowie, D. W. Cameron, R. G. F. Giles and D. H. Williams, J. Chem. Soc. B, 335 (1966)
¹⁵ R. A. Okerholm, S. J. Clark and H. H. Wotiz, Analyt. Biochem. 44, 1 (1971)
¹⁶ D. H. Smith, A. M. Duffield and C. Djerassi, Org. Mass Spectrom. in press.
¹⁷ M. Spiteller-Friedmann and G. Spiteller, Fortschritte der Chemischen Forschung 12, 440 (1969)
¹⁸ V. I. Zaretskii, N. S. Wulfson, V. L. Sadovskaya, S. N. Ananchenko and I. V. Torgov, Tetrahedron 24, 2339 (1968)
¹⁹ H. Budzikiewicz, Biochemical Applications of Mass Spectrometry (Edited by G. R. Waller), p. 251. Wiley, New York (1972)
²⁰ R. H. Shapiro and K. B. Tomer, Org. Mass Spectrom. 2, 579 (1969)
²¹ M. Barber and R. M. Elliott, presented at the Twelfth Annual Conference on Mass Spectrometry and Allied Topics, Montreal, June (1964)
²² K. R. Jennings, J. Chem. Phys. 43, 4176 (1965)
²³ J. H. Beynon, R. A. Saunders and A. E. Williams, Nature 204, 67 (1964)