Mechanistic interpretation of carcinogenicity models «J Comput Aided Mol Des»

Supplementary material »Online Resource«

Some findings relevant to the mechanistic interpretation in the case of predictive models for carcinogenicity based on the counter propagation artificial neural network

Natalja Fjodorova1*, Marjana Novič1

Addresses: 1 National Institute of Chemistry, Hajdrihova 19, SI-1001 Ljubljana, Slovenia

Email: Natalja Fjodorova* - [email protected] ; Marjana Novič- [email protected]

* Corresponding author

The captions of Figures_SI (»Online Resource«):

Figure 1SI. (a)- The top map of model A with distribution of carcinogens (2) and non-carcinogens (1); (b)- The weight map of model A corresponding to MDL descriptor D1 (SdsCH- Sum of all ( = CH – ) E-State values in molecule); (c)- The names and structures of chemicals placed in the influential zone of MDL descriptor D1 with indication of position in Kohonen map.

Figure 2SI. (a)- The top map of model A with distribution of carcinogens (2) and non-carcinogens (1); (b)- The weight map of model A corresponding to MDL descriptor D2 (SdssC_acnt- Count of all ( = C < ) groups in molecule); (c)- The names and structures of chemicals placed in the influential zone of MDL descriptor D2 with indication of position in Kohonen map.

Figure 3SI. (a)- The top map of model A with distribution of carcinogens (2) and non-carcinogens (1); (b)- The weight map of model A corresponding to MDL descriptor D3 (SdsN_acntnt- Count of all ( = N) groups in molecule); (c)- The names and structures of chemicals placed in the influential zone of MDL descriptor D3 with indication of position in Kohonen map.

Figure 4SI. (a)- The top map of model A with distribution of carcinogens (2) and non-carcinogens (1); (b1, b2, b3)- The weight maps of model A corresponding to MDL descriptors D4 (dxp9- Difference simple 9th order path chi indices), D5 (nxch6- Number of 6-membered rings) and D7 (SHCsats- Sum of hydrogen E-State on sp3 C on saturated bond), respectively; (c)- The names and structures of chemicals placed in the influential zones of descriptors D4, D5 and D7 with indication of position in Kohonen map.

Figure 5SI. (a)- The top map of model A with distribution of carcinogens (2) and non-carcinogens (1); (b)- The weight map of model A corresponding to descriptor D6 (Gmin- Smallest atom E-State value in molecule). The position of the most influential zone is marked in the Kohonen maps. The chemicals related to the marked area are listed in the table below the maps.

Figure 6SI. (a)- The top map of model A with distribution of carcinogens (2) and non-carcinogens (1); (b)- The weight map of model A corresponding to MDL descriptor D8 (SHBint2_Acnt- Count of internal hydrogen bonds with 2 skeletal bonds between donor and acceptor); (c)- The names and structures of chemicals placed in the influential zone of MDL descriptor D8 with indication of position in Kohonen map.

The captions of Tables_SI (Online Resource).

Table 1SI Characterization of MDL descriptors used in modelling and their mechanistic interpretation

Table 2SI The structures of chemicals containing two structural alerts (SA27_SA28)

Table 3SI The structures of chemicals containing SA_27

Table 4SI The structures of chemicals containing SA_28

MDL descriptor D1 - SdsCH - Sum of all ( = CH – ) E-State values in molecule

[Panels (a) top map and (b) weight map; axes Nx and Ny. Marked position: nx*ny = 35*15; 35*16]

(c) Position nx*ny = 35*15: Actinomycin D (CASRN 50-76-0); GA; P

Position nx*ny = 35*16: Fumonisin b (CASRN 1116355-83-0); NA; P

Figure 1SI. (a)- The top map of model A with distribution of carcinogens (2) and non-carcinogens (1); (b)- The weight map of model A corresponding to MDL descriptor D1 (SdsCH- Sum of all ( = CH – ) E-State values in molecule); (c)- The names and structures of chemicals placed in the influential zone of MDL descriptor D1 with indication of position in Kohonen map.

MDL descriptor D2 - SdssC_acnt - Count of all ( = C < ) groups in molecule

[Panels (a) top map and (b) weight map; axes Nx and Ny. Marked position: nx*ny = 35*21]

(c) Position nx*ny = 35*21: beta-Carotene (CASRN 7235-40-7); NA

Figure 2SI. (a)- The top map of model A with distribution of carcinogens (2) and non-carcinogens (1); (b)- The weight map of model A corresponding to MDL descriptor D2 (SdssC_acnt- Count of all ( = C < ) groups in molecule); (c)- The names and structures of chemicals placed in the influential zone of MDL descriptor D2 with indication of position in Kohonen map.

MDL descriptor D3 - SdsN_acntnt - Count of all ( = N) groups in molecule

[Panels (a) top map and (b) weight map; axes Nx and Ny. Marked position: nx*ny = 8*35; 7*35]

(c) Position nx*ny = 8*35: Dacarbazine (CASRN 4342-03-4); SA13; P

Position nx*ny = 7*35: Diarylanilide yellow (CASRN 6358-85-6); NA; NP

SA13- Hydrazine, R = any atom/group

Figure 3SI. (a)- The top map of model A with distribution of carcinogens (2) and non-carcinogens (1); (b)- The weight map of model A corresponding to MDL descriptor D3 (SdsN_acntnt- Count of all ( = N) groups in molecule); (c)- The names and structures of chemicals placed in the influential zone of MDL descriptor D3 with indication of position in Kohonen map.

MDL descriptor D4 - dxp9 - Difference simple 9th order path chi indices

[Panels (a) top map and (b1) weight map for D4; axes Nx and Ny. Marked position: nx*ny = 35*2; 35*1]

MDL descriptor D5 - nxch6 - Number of 6-membered rings

MDL descriptor D7 - SHCsats - Sum of hydrogen E-State on sp3 C on saturated bond

[Panels (b2) weight map for D5 and (b3) weight map for D7; axes Nx and Ny. Marked position: nx*ny = 35*2; 35*1]

(c) Position nx*ny = 35*1: beta-Cyclodextrin (CASRN 7585-39-9); NA, NP

Position nx*ny = 35*2: Vinblastine (CASRN 865-21-4); NA, NP

Figure 4SI. (a)- The top map of model A with distribution of carcinogens (2) and non-carcinogens (1); (b1, b2, b3)- The weight maps of model A corresponding to MDL descriptors D4, D5 and D7, respectively; (c)- The names and structures of chemicals placed in the influential zones of descriptors D4, D5 and D7 with indication of position in Kohonen map.

MDL descriptor D6 - Gmin - Smallest atom E-State value in molecule

[Panels (a) top map and (b) weight map; axes Nx and Ny. Marked position: nx*ny = 3*1; 5*1; 5*2]

Location of neurons (nx*ny) | Chemicals that fell in the marked neurons | Carcinogenic potency in top map (P- positive; NP- not positive)
3x1 | Propylene, CASRN 115-07-1 | NP
3x1 | Vinyl bromide, CASRN 593-60-2 | P
5x1 | Dichloroacetylene, CASRN 7572-29-4 | P
5x1 | Furan, CASRN 110-00-9 | P
5x1 | Methyl bromide, CASRN 74-83-9 | P
5x2 | Acrylonitrile, CASRN 107-13-1 | NP
5x2 | Dimethyl hydrazine (DMH), CASRN 57-14-7 | NP

Figure 5SI. (a)- The top map of model A with distribution of carcinogens (2) and non-carcinogens (1); (b)- The weight map of model A corresponding to descriptor D6 (Gmin- Smallest atom E-State value in molecule). The position of the most influential zone is marked in the Kohonen maps. The chemicals related to the marked area are listed in the table above.
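The mix of positive and non-positive chemicals in the marked Gmin zone can be tallied per neuron. The following is a minimal sketch, assuming only the neuron positions and P/NP labels listed for Figure 5SI; the variable names are illustrative and not part of the original work.

```python
from collections import defaultdict

# Rows transcribed from the Figure 5SI table: (chemical, CASRN, (nx, ny), class).
figure_5si_rows = [
    ("Propylene", "115-07-1", (3, 1), "NP"),
    ("Vinyl bromide", "593-60-2", (3, 1), "P"),
    ("Dichloroacetylene", "7572-29-4", (5, 1), "P"),
    ("Furan", "110-00-9", (5, 1), "P"),
    ("Methyl bromide", "74-83-9", (5, 1), "P"),
    ("Acrylonitrile", "107-13-1", (5, 2), "NP"),
    ("Dimethyl hydrazine (DMH)", "57-14-7", (5, 2), "NP"),
]

# Count positives and non-positives on each marked neuron.
by_neuron = defaultdict(lambda: {"P": 0, "NP": 0})
for _name, _casrn, position, cls in figure_5si_rows:
    by_neuron[position][cls] += 1

for position in sorted(by_neuron):
    print(position, by_neuron[position])
```

With these rows, neuron 3x1 holds one chemical of each class, 5x1 holds three positives, and 5x2 holds two non-positives, so the zone is not homogeneous in activity.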

MDL descriptor D8 - SHBint2_Acnt - Count of internal hydrogen bonds with 2 skeletal bonds between donor and acceptor

[Panels (a) top map and (b) weight map; axes Nx and Ny. Marked position: nx*ny = 21*35; 22*35]

(c) Position nx*ny = 21*35: Allantoin (CASRN 97-59-6); NA, NP

Position nx*ny = 22*35: Hexamethylmelamine (CASRN 531-18-0); SA3; P

SA3- N-methylol derivatives; R = any atom/group

Figure 6SI. (a)- The top map of model A with distribution of carcinogens (2) and non-carcinogens (1); (b)- The weight map of model A corresponding to MDL descriptor D8 (SHBint2_Acnt- Count of internal hydrogen bonds with 2 skeletal bonds between donor and acceptor); (c)- The names and structures of chemicals placed in the influential zone of MDL descriptor D8 with indication of position in Kohonen map.

*Notes: Abbreviations in Figures 1SI-6SI:

GA- genotoxic alert;

NA- no alert;

P- positive (carcinogen);

NP- non positive (non-carcinogen).

Table 1SI The characterization of MDL descriptors used in the modelling and their mechanistic interpretation

Columns: MDL descriptor name and type; Characterization of descriptors; Comments

D1 (SdsCH- Sum of all ( = CH – ) E-State values in molecule); atom-type E-State descriptor (Fig. 1SI)
Characterization: The descriptor D1 (SdsCH) is computed as the sum of all ( = CH – ) E-State values. Electrotopological state (E-State) descriptors arise from the electronic environment of each atom, which is determined by its intrinsic electronic properties and the influence of the other atoms in the molecule. They can parametrize hydrogen bonds, molecular polarity, etc.
Comments: Both chemicals placed in the most influential position for descriptor D1 (Fig. 1SI) have many ( = CH – ) fragments in their structures. Both compounds are carcinogens (P). The first chemical, Actinomycin D (CASRN 50-76-0), has a genotoxic alert (GA) according to data obtained from Toxtree. The second chemical, Fumonisin b (CASRN 1116355-83-0), has no alert for carcinogenicity (NA).

D2 (SdssC_acnt- Count of all ( = C < ) groups in molecule) (Fig. 2SI)
Characterization: The descriptor D2 (SdssC_acnt) is the count of all ( = C < ) groups in the molecule and belongs to the atom-type E-State counts.
Comments: The most influential zone for MDL descriptor D2 is located in position nx*ny=35*21 (Fig. 2SI) and corresponds to the non-positive (NP) chemical beta-Carotene (CASRN 7235-40-7) placed in the same area. beta-Carotene has many ( = C < ) groups in its structure, which is why it is placed in the zone corresponding to the largest value of this descriptor. We did not find how this correlates with carcinogenicity. In the present example, beta-Carotene has a high molecular weight, which may be the reason why this compound is not active. Deeper investigation is needed.

D3 (SdsN_acntnt- Count of all ( = N) groups in molecule) (Fig. 3SI)
Characterization: The descriptor D3 (SdsN_acntnt) is the count of all ( = N) groups in the molecule and belongs to the atom-type E-State counts.
Comments: The influential zone for descriptor D3 is located in positions nx*ny=8*35 and 7*35. It corresponds to the positive (P) Dacarbazine with structural alert SA13 (Hydrazine) in position 8*35 and the non-positive (NP) Diarylanilide yellow without carcinogenic alert (NA) in position 7*35 (Fig. 3SI). Indeed, both chemicals shown in Fig. 3SI have many ( = N) groups in their structures. Many descriptors relate directly to, or are associated with, structural alerts, as was reported in papers [33-34]. In our case the descriptor D3 is related to nitroso compounds.

D6 (Gmin- Smallest atom E-State value in molecule) (Fig. 5SI)
Characterization: The descriptor D6 (the global E-State descriptor Gmin) is a measure of the most electrophilic atom in the molecule. Mechanistically, an electrophilic center is important for covalent bond formation with nucleophilic DNA. This is the reason why this descriptor was found among the most important descriptors correlated with carcinogenicity.
Comments: A set of molecular structures in one of the zones of the highest values of D6 (Gmin) is given in the table in Fig. 5SI. The influential zone of this descriptor thus corresponds to a diversity of structures and biological activity. Moreover, the values of this descriptor are spread over the map and have an influence on the chemicals in the studied dataset.

D8 (SHBint2_Acnt- Count of internal hydrogen bonds with 2 skeletal bonds between donor and acceptor) (Fig. 6SI)
Characterization: The descriptor D8 is the count of internal hydrogen bonds with 2 skeletal bonds between donor and acceptor, which is an important feature for covalent bond formation with DNA.
Comments: The influential zone of descriptor D8 covers the negative and positive chemicals represented in Fig. 6SI. The positive chemical Hexamethylmelamine has carcinogenicity alert SA3, also shown in Fig. 6SI.

D4 (dxp9- Difference simple 9th order path chi indices) (Fig. 4SI)
Characterization: Descriptor D4 belongs to the simple connectivity group of the Chi descriptors. This family of descriptors describes the molecular connectivity of the molecule. Structural features such as size, branching, unsaturation, heteroatom content and cyclicity are encoded.
Comments (shared by D4, D5 and D7): The D4 and D5 descriptors encode information on skeletal variation and complexity. Descriptors D4, D5 and D7 were considered together because they have the same influential zone in position nx*ny=35*2; 35*1 and can be described with the same chemicals (Fig. 4SI). The influential zone of the studied descriptors corresponds to not positive (NP) chemicals (non-carcinogens).

D5 (nxch6- Number of 6-membered rings) (Fig. 4SI)
Characterization: D5 belongs to the connectivity subgroup count family.

D7 (SHCsats- Sum of hydrogen E-State on sp3 C on saturated bond) (Fig. 4SI)
Characterization: Descriptor D7 belongs to the hydrogen E-State (HE-State) descriptors for groups. The hydrogen atom E-State characterises the valence state, which involves the relationship between valence-state electronegativity and atom/group molar volume. D7 encodes E-State values for hydrogens on sp3 hybrid carbons bonded only to other sp3 carbon atoms. The electron accessibility of these sp3 hydrogens may relate in some manner to hydrophobic interactions between substrates and DNA, or may have a relation to alkyl chlorides, which are known toxicophores.
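The notion of an influential zone used in the comments above can be illustrated in code: take one descriptor level of the weight map, find the neurons with the largest weights, and list the chemicals placed on those neurons in the top map. The following is a minimal sketch under stated assumptions. The weight values, the chemical placements, and the 0.8 threshold fraction are all hypothetical; the source marks the zones on the maps and does not state a numerical criterion.

```python
# Hypothetical 4 x 3 weight level for one descriptor (rows = ny, columns = nx).
# Values are illustrative only and are not taken from model A.
weight_level = [
    [0.1, 0.2, 0.1, 0.3],
    [0.2, 0.9, 0.8, 0.2],
    [0.1, 0.3, 0.2, 0.1],
]

# Hypothetical top-map placement: (nx, ny) -> list of (chemical, class).
top_map = {
    (2, 2): [("chemical A", "P")],
    (3, 2): [("chemical B", "NP")],
    (1, 3): [("chemical C", "P")],
}


def influential_zone(level, fraction=0.8):
    """Return 1-based (nx, ny) positions whose weight lies at least
    `fraction` of the way from the minimum to the maximum weight.
    The threshold rule is an assumption made for this sketch."""
    flat = [w for row in level for w in row]
    lo, hi = min(flat), max(flat)
    threshold = lo + fraction * (hi - lo)
    return [
        (ix + 1, iy + 1)
        for iy, row in enumerate(level)
        for ix, w in enumerate(row)
        if w >= threshold
    ]


def chemicals_in_zone(zone, placement):
    """List (position, chemical, class) for every chemical on a zone neuron."""
    return [
        (pos, name, cls)
        for pos in zone
        for name, cls in placement.get(pos, [])
    ]


zone = influential_zone(weight_level)
for pos, name, cls in chemicals_in_zone(zone, top_map):
    print(pos, name, cls)
```

In this toy map the zone consists of the two neurons with weights 0.9 and 0.8, and it contains one positive and one non-positive chemical, which mirrors the mixed zones reported for D3 and D8.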

Table 2SI The structures of chemicals containing two structural alerts SA27+SA28 (14 chemicals)

No ID_CPDBAS CASRN Structure P/NP

1 58 3775-55-1 P

2 59 712-68-5 P

3 60 38514-71-5 P

4 62 99-57-0 P

5 63 121-88-0 P

6 64 119-34-6 P

7 66 121-66-4 P

8 396 720-69-4 P

9 943 99-59-2 P

10 957 5307-14-2 NP

11 958 99-56-9 NP

12 959 99-55-8 NP

13 961 100-01-6 NP

14 1236 2871-01-4 NP
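The class balance among the chemicals carrying both alerts can be checked directly from the rows of Table 2SI. The following is a minimal sketch; the rows are transcribed from the table, while the variable names are illustrative.

```python
from collections import Counter

# (ID_CPDBAS, CASRN, class) transcribed from Table 2SI.
table_2si = [
    (58, "3775-55-1", "P"),
    (59, "712-68-5", "P"),
    (60, "38514-71-5", "P"),
    (62, "99-57-0", "P"),
    (63, "121-88-0", "P"),
    (64, "119-34-6", "P"),
    (66, "121-66-4", "P"),
    (396, "720-69-4", "P"),
    (943, "99-59-2", "P"),
    (957, "5307-14-2", "NP"),
    (958, "99-56-9", "NP"),
    (959, "99-55-8", "NP"),
    (961, "100-01-6", "NP"),
    (1236, "2871-01-4", "NP"),
]

# Tally carcinogens (P) and non-carcinogens (NP).
counts = Counter(cls for _id, _casrn, cls in table_2si)
fraction_positive = counts["P"] / len(table_2si)

print(counts["P"], counts["NP"], round(fraction_positive, 2))
```

Of the 14 chemicals, 9 are positive and 5 are non-positive, so about 64% of the chemicals carrying both SA27 and SA28 are carcinogens in this set.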

Table 3SI The structures of chemicals containing SA 27 (75 chemicals)

ID_v5 ID_CPDBAS CASRN Structure P/NP

24 33 3688-53-7 P

66 119 446-86-6 NP

67 120 123524-52-7 NP

97 176 33372-39-3 P

103 193 33229-34-4 NP

141 278 97-00-7 NP

142 280 88-73-3 NP

143 281 100-00-5 NP

248 472 33389-33-2 P

266 496 597-25-1 P

267 497 551-92-8 P

270 502 55738-54-0 P

279 522 29110-68-7 NP

281 528 606-20-2 P

386 727 33389-36-5 P

458 855 298-00-0 NP

479 892 443-48-1 P

488 908 58139-48-3 P

496 936 139-94-6 P

501 945 772-43-0 NP

502 946 92-55-7 NP

503 947 75198-31-1 P

504 948 2122-86-3 P

505 949 36133-88-7 P

507 951 53757-28-1 P

515 960 602-87-9 P

517 962 91-23-6 P

518 963 619-17-0 NP

519 964 98-95-3 P

520 965 94-52-0 NP

521 966 62-23-7 P

525 970 1836-75-5 P

531 978 86-57-7 NP

537 984 613-50-3 NP

538 985 607-35-2 P

627 1100 56-38-2 NP

772 1394 42011-48-3 P

773 1395 1582-09-8 NP

779 1406 99-35-4 NP

804 1458 59820-43-8 NP

Table 4SI The structures of chemicals containing SA28 (52 chemicals)

ID_v5 ID_CPDBAS CASRN Structure P/NP

49 76 61-82-5 P

55 90 62-53-3 NP

79 137 92-87-5 P

82 142 91-76-9 NP

133 262 133-90-4 NP

138 275 101-79-1 P

144 282 5131-60-2 P

145 283 95-83-0 P

146 286 95-74-9 NP

147 287 95-79-4 NP

153 295 106-47-8 NP

181 349 102-50-1 P

182 350 120-71-8 P

192 371 80-08-0 P

207 402 95-80-7 P

218 427 609-20-1 NP

223 432 91-94-1 P

255 482 5803-51-0 P

347 657 67730-11-4 P

348 658 67730-10-3 P

379 718 4463-22-3 NP

381 720 4363-03-5 NP

424 802 108-78-1 P

436 822 59-05-2 NP

463 865 101-14-4 P

464 867 838-88-0 P

492 916 2243-62-1 P

493 921 91-59-8 P

625 1094 101-80-4 P

647 1138 106-50-3 NP

710 1292 127-69-5 NP

721 1314 15721-02-5 NP

742 1344 139-65-1 P

755 1373 396-01-0 NP

776 1398 137-17-7 P
