Diagnosis of Acute Myeloid Leukaemia Using Machine Learning

Athanasios Angelakis
JADS, Eindhoven University of Technology, Den Bosch, Netherlands
[email protected]

Ioanna Soulioti
Department of Biology, University of Athens, Athens, Greece
[email protected]

arXiv:2108.07396v1 [cs.LG] 17 Aug 2021

Abstract

We train a machine learning model on a dataset of 2177 individuals, using as features 26 probe sets and their age, in order to classify whether someone has acute myeloid leukaemia or is healthy. The dataset is multicentric and consists of data from 27 organisations, 25 cities, 15 countries and 4 continents. The accuracy of our model is 99.94% and its F1-score is 0.9996. To the best of our knowledge, the performance of our model is the best in the literature as regards the prediction of AML, whether similar data are used or not. Moreover, there has not been any bibliographic reference associating the 26 probe sets we use as features in our model with acute myeloid leukaemia.

1 Introduction

Acute myeloid leukaemia (AML) [1] is often characterized by non-detectable early symptoms, and its quick prognosis, even in an intensive care unit, could have a huge impact on overall survival [2]. The use of machine learning can be helpful in the diagnosis of this disease and therefore in the creation of a screening tool [3], [4]. Here we focus on the primary diagnosis of AML using the minimum number of probe sets possible in order to achieve excellent performance. In addition, we use age as a feature in our final model, since its prognostic value is high regarding the survival of patients with AML [5]. Another reason we include age is that deep learning work in radiology, in particular in ultrasound with data sets as small as 100 data instances [6], [7], and work with CatBoost [8] show that features coming from different sources can achieve high performance, in both sensitivity and specificity, in binary classification problems.

We first tune a CatBoost model [9] on a curated, publicly available, normalized and batch-corrected Affymetrix microarray gene expression dataset consisting of probe sets of 3374 individuals [3], in order to classify whether an individual has AML or is healthy. The CatBoost library offers the option to return the features' importance for the CatBoost algorithm and also the features' importance for the loss function change. These two sets can differ.

We keep the 100 most important features for each of the above two sets and then take their intersection, which consists of 34 probe sets. The idea of the intersection is that we would like to include features that are of high importance both for the predictability of the CatBoost algorithm and for its loss function change during the training process.

We randomly split the dataset of the 34 probe sets and the 3374 data instances, using 80% for training and 20% for validation. We use 10-fold cross-validation (10CV) [10] in order to tune a CatBoost model on the training set, and then we validate it on the validation set.
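The selection step described above (take the 100 most important features under each of the two importance rankings, then intersect them) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names `top_k` and `top_k_intersection`, the toy importance values and the names `probe_a` and `probe_b` are assumptions. With the CatBoost Python package the two rankings would presumably come from `get_feature_importance` with `type='PredictionValuesChange'` and `type='LossFunctionChange'`; here they are passed in as plain dictionaries so that the sketch has no dependencies.

```python
from typing import Dict, List


def top_k(importance: Dict[str, float], k: int) -> List[str]:
    """Return the k feature names with the largest importance, best first."""
    ranked = sorted(importance, key=lambda name: importance[name], reverse=True)
    return ranked[:k]


def top_k_intersection(pred_importance: Dict[str, float],
                       loss_importance: Dict[str, float],
                       k: int) -> List[str]:
    """Features that are in the top k of BOTH rankings.

    The result keeps the order of the prediction-based ranking.
    """
    in_loss_top = set(top_k(loss_importance, k))
    return [name for name in top_k(pred_importance, k) if name in in_loss_top]


# Toy importances (made-up numbers, for illustration only).
pred_importance = {"234632_x_at": 31.0, "209603_at": 7.5, "230527_at": 6.9,
                   "probe_a": 5.0, "probe_b": 0.4}
loss_importance = {"234632_x_at": 0.020, "230527_at": 0.015, "probe_b": 0.012,
                   "209603_at": 0.004, "probe_a": 0.001}

selected = top_k_intersection(pred_importance, loss_importance, k=3)
print(selected)
```

In the setting of this paper k would be 100 and the intersection would contain the 34 probe sets.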

Table 1: Performance of the dimensionality reduction CatBoost model of the 10CV on the 80% training set and on the 20% validation set of 3374 data instances and 44754 probe sets. The dataset corresponds to U133A, U133B and U133 2.0 microarrays.

Metrics | Validation Set | 10CV
Spec. | 0.9929 | 0.9805
Sens. | 1.0000 | 0.9991
AUC | 0.9965 | 0.9898
F1-score | 0.9964 | 0.9884
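The evaluation protocol named in the caption of Table 1 (a random 80%/20% split, with 10-fold cross-validation on the 80% training part) can be sketched with the standard library alone. The helper names `split_indices` and `k_fold_indices`, the seed and the rounding of the split size are assumptions made for illustration; the paper does not state them.

```python
import random
from typing import Iterator, List, Tuple


def split_indices(n: int, train_fraction: float, seed: int) -> Tuple[List[int], List[int]]:
    """Randomly split range(n) into train and validation index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_fraction)
    return indices[:n_train], indices[n_train:]


def k_fold_indices(indices: List[int], k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (fit, held_out) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]  # k disjoint, nearly equal folds
    for i, held_out in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, held_out


# 3374 data instances, as in the initial dataset; the seed is arbitrary.
train_idx, val_idx = split_indices(n=3374, train_fraction=0.8, seed=0)
print(len(train_idx), len(val_idx))

fold_sizes = [len(held_out) for _, held_out in k_fold_indices(train_idx, k=10)]
print(fold_sizes)
```

Each of the ten folds is held out once while the model is fitted on the other nine; the 20% validation part is never touched during this tuning loop.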

Table 2: Performance of the CatBoost34 model of the 10CV on the 80% training set and on the 20% validation set of 3374 data instances and 34 probe sets. The dataset corresponds to U133A, U133B and U133 2.0 microarrays.

Metrics | Validation Set | 10CV
Spec. | 1.0000 | 0.9929
Sens. | 1.0000 | 0.9926
AUC | 1.0000 | 0.9920
F1-score | 1.0000 | 0.9972
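Both tables are computed on an imbalanced dataset: the Data section reports 2668 AML and 706 healthy instances among the 3374, and the Models section notes that the weight balance parameters of CatBoost are used. One common weighting scheme, shown here as an assumption since the paper does not say which option was chosen, gives each class the weight (size of the largest class) / (size of that class). The function name `balanced_class_weights` is hypothetical; in CatBoost such weights would be supplied through parameters like `class_weights` or `auto_class_weights`.

```python
from typing import Dict


def balanced_class_weights(class_counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each class by (largest class size) / (its own size)."""
    largest = max(class_counts.values())
    return {label: largest / count for label, count in class_counts.items()}


# Class counts of the initial 3374-instance dataset, as reported in the paper.
counts = {"AML": 2668, "healthy": 706}
weights = balanced_class_weights(counts)

for label, weight in weights.items():
    share = 100 * counts[label] / sum(counts.values())
    print(f"{label}: {counts[label]} instances ({share:.2f}%), weight {weight:.3f}")
```

Under this scheme the majority class keeps weight 1 and each healthy instance counts roughly 3.8 times as much, so both classes contribute equally to the training loss in total.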

Figure 1: Datasets and CatBoost models with their harmonic mean of precision and recall. The first integer corresponds to the number of the data instances and the second one corresponds to the number of features.

From these 34 probe sets we keep only those for which we cannot find any bibliographic reference regarding their correlation to AML (Table 9). The only feature correlated to AML that we include in our final machine learning model is the age of each individual.

We randomly split the dataset of 2177 individuals using 80% for training and 20% for validation. We use 10CV in order to tune the CatBoost model on the training set, and then we validate it on the validation set. In Figure 1 we show a diagram of the three models and the corresponding datasets of our approach.

2 Models

The dimensionality reduction CatBoost model has 200 iterations, depth 6 and learning rate 0.1. We randomly split the initial dataset of 3374 data instances and 44754 probe sets. The performance of the tuned model appears in Table 1.

We compute the intersection of the set of the most important features regarding the predictability of CatBoost and the set of the most important features regarding the loss function change during the training process. We set the number of elements of each set to be 100. The intersection has only 34 probe sets. We tune a CatBoost model (CatBoost34) of 200 iterations, depth 5 and learning rate 0.1 on the dataset of 3374 data instances. The results in Table 2 show that using only 34 probe sets our machine learning model is able to achieve great performance.

From the 34 probe sets we exclude all those which bibliographic references correlate to AML, so we keep only the 26 probe sets of Table 9. The tuned CatBoost model which we use for the diagnosis of AML (CatBoost26) has 100 iterations, depth 11 and learning rate 0.1.

In all three models above we use the weight balance parameters of the CatBoost library, since our datasets are imbalanced. Moreover, we keep all their other parameters similar to the default values provided by the CatBoost library.

3 Data

The initial dataset is a curated, publicly available Affymetrix microarray gene expression dataset and it consists of 34 datasets derived from 32 studies [3]. It is an international multicentric dataset, since its data instances come from 27 organisations, 25 cities, 15 countries and 4 continents. The data come from different transcriptomic platforms: the Affymetrix Human Genome U133 Plus 2.0 microarray, the Affymetrix Human Genome U133A microarray and the Affymetrix Human Genome U133B microarray. At first, the dataset consisted of 44754 probe sets and 3374 data instances, which corresponded to 3374 individuals. Of the 3374 data instances, 2668 (79.08%) were labelled as AML and 706 (20.92%) as healthy.

The tuned dimensionality reduction model is applied on this dataset. Of the 34 probe sets {227923_at, 212549_at, 219386_s_at, 207754_at, 208022_s_at, 209543_s_at, 210244_at, 207206_s_at, 210789_x_at, 239766_at, 241688_at, 244719_at, 236952_at, 241611_s_at, 217901_at, 229963_at, 230527_at, 222312_s_at, 214705_at, 203294_s_at, 209603_at, 243659_at, 230753_at, 204777_s_at, 234632_x_at, 217680_x_at, 219513_s_at, 214719_at, 211772_x_at, 207636_at, 243272_at, 214945_at, 226311_at, 242056_at} we keep the 26 for which, to the best of our knowledge, there has not been any reference regarding their correlation to AML yet. Since we also want to use the age of the individuals as a feature in our diagnostic CatBoost model, we drop all the data instances with no age filled in.

The final dataset consists of 2177 data instances and it has 27 features (26 probe sets and the age). Tables 6, 7 and 8 provide detailed information about the dataset, including the number of samples used, the sample source, the sex and the age of the individuals, the organisations which provided the data, the AML subtypes and statistics about the overall survival when available, as well as the total number of AML patients and healthy individuals.

Of the 2177 individuals, 1013 are female (46.53%), 943 are male (43.32%) and for 221 the sex is unknown (10.15%). In addition, 1629 are AML patients (74.83%) and 548 are healthy (25.17%). The mean and the standard deviation of age are 48.87 and 17.01, respectively. The number of data instances per age group in the data set is: 99 [0-19], 217 [20-29], 340 [30-39], 393 [40-49], 487 [50-59], 390 [60-69], 212 [70-79] and 39 [80-89].

We randomly split the final dataset into two sets: training and validation (Table 6, Table 7). The training set consists of 1740 data instances (79.93%) and the validation set of the remaining 437 (20.07%). Since the dataset is relatively small, we use 10-fold cross-validation in order to tune our model. In Figure 2 we observe the feature importance of the 27 features as regards the predictability of the CatBoost model using the 10CV, while in Figure 3 we can see the feature importance of the loss change for each one of the 27 features.

4 Results

In Table 3 we see that our diagnosis model, CatBoost26, performs really well. The confusion matrix, Table 4, shows the true positives (bottom right), the true negatives (top left), the false positives (bottom left) and the false negatives (top right). Here, a positive data instance is one labelled as AML and a negative data instance is one labelled as healthy.
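Specificity, sensitivity, accuracy and the F1-score can all be derived from the four cells of a confusion matrix laid out as described above. The sketch below uses the counts printed in Table 4 (437, 1, 0, 1302) read in that layout; the function name `binary_metrics` is an assumption made for illustration.

```python
from typing import Dict


def binary_metrics(tn: int, fn: int, fp: int, tp: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # recall, true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"specificity": specificity, "sensitivity": sensitivity,
            "accuracy": accuracy, "f1": f1}


# Counts from Table 4: top-left TN, top-right FN, bottom-left FP, bottom-right TP.
metrics = binary_metrics(tn=437, fn=1, fp=0, tp=1302)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts the sketch gives specificity 1.0000, sensitivity 0.9992, accuracy 0.9994 and F1-score 0.9996, in line with the 10CV column of Table 3 and the reported mean accuracy.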

Figure 2: Features’ importance of the predictability of CatBoost diagnosis model.

Figure 3: Features’ importance of the CatBoost diagnosis model.

Table 3: Performance of the CatBoost diagnosis model, CatBoost26, of the 10CV on the 80% training set and on the 20% validation set of 2177 data instances on 26 probe sets and the age.

Metrics | Validation Set | 10CV
Spec. | 1.0000 | 1.0000
Sens. | 1.0000 | 0.9992
AUC | 1.0000 | 0.9988
F1-score | 1.0000 | 0.9996

Table 4: Confusion Matrix of the CatBoost diagnosis model’s performance on the training set.

437 | 1
0 | 1302

The mean area under the curve (AUC) from the 10CV is 0.9988 with standard deviation 0.0023 and 95% confidence interval: [0.9994, 1.000]. The mean accuracy is 0.9994 with standard deviation 0.0011.

From Figures 2 and 3 we observe that the probe set 234632_x_at, which is “cDNA”, is the most important probe set as regards both the predictability of the CatBoost model and the loss function change.

Our CatBoost34 model is transcriptomic platform agnostic [11], since the label indicating whether a data instance comes from the Affymetrix Human Genome U133 Plus 2.0 microarray, the Affymetrix Human Genome U133A microarray or the Affymetrix Human Genome U133B microarray has not been used as a feature. This helps the robustness and universality of our model’s application in the diagnosis of AML. As regards the diagnosis model CatBoost26, all the data instances come from the Affymetrix Human Genome U133 Plus 2.0.

From Figure 2 we observe that the first 8 probe sets have the highest impact on the predictability of CatBoost26, including 6 named genes {GATA3, BEX5, DSG2, SLC46A3, SH2D3A, CEACAM3}, 1 uncharacterized gene {LOC101926907} and 1 cDNA probe set. The first probe set has remarkably high feature importance compared to the others, more than 4 times higher. To the best of our knowledge these genes have not been correlated to AML yet. The gene GATA3 has been correlated to acute lymphoblastic leukemia [52] and to other types of cancer as well, such as breast cancer [53] and bladder cancer [54]; the gene DSG2 is implicated in various kinds of cancer including cervical cancer [56], epithelial-derived carcinomas [57], pancreatic cancer [58], breast cancer [59], colon cancer [60], lung cancer [61], [62], gastric cancer [63], [64], ovarian cancer [65], laryngeal cancer [66] and liver cancer [67]. In addition, SLC46A3 is correlated to liver cancer [68], and BEX5, SH2D3A and CEACAM3 have not been correlated to any type of cancer yet.

In Figure 3 we observe that the first 11 probe sets have the highest importance for the loss function change of CatBoost26, including 10 named genes {GATA3, BEX5, DSG2, SLC46A3, FAM153A, FAM153B, FAM153C, PATL2, CEACAM3, MAL}, 3 uncharacterized genes {LOC101926907, LOC100507387, LOC105377751}, 1 expressed sequence tag and 1 cDNA probe set. The 234632_x_at probe set has at least 4 times higher feature importance than 210789_x_at, while 230527_at is approximately 3 times more important than 210789_x_at. Moreover, the gene MAL has been correlated to gastric cancer [70], breast cancer [71], ovarian cancer [72] and colorectal cancer [73]. The genes {PATL2, FAM153A, FAM153B, FAM153C} have not been correlated to any type of cancer yet.

5 Related Work

The first machine learning approach on a subset of the dataset of 3374 individuals with the 44754 probe sets was presented in [3]. Statistical methods were used in order to reduce the dimensionality of the dataset, which dropped to 984 probe sets. Here we trained the k-NN machine learning model of [3] on the same 80% training set as we did with our dimensionality reduction CatBoost model, using 10CV. The results, Table 5, show that the dimensionality reduction CatBoost model, as well as CatBoost34, outperforms k-NN (Tables 1, 2).

Table 5: Performance of the k-NN model of [3] of the 10CV on 80% training set and on the 20% validation set of 3374 data instances and 984 probe sets.

Metrics | Validation Set | 10CV
Spec. | 0.9716 | 0.9546
Sens. | 0.9925 | 0.9920
AUC | 0.9821 | 0.9788
F1-score | 0.9925 | 0.9899

Using data similar to ours, from the Affymetrix Human Genome U133A microarray, the Affymetrix Human Genome U133 2.0 microarray and Illumina RNA-seq, different machine learning models and statistical learning techniques (k-NN, LASSO, linear discriminant analysis, random forest, linear SVM, polynomial SVM, radial SVM, sigmoid SVM) have been used in [11] in order to predict whether an individual has AML or is healthy. The best accuracy results are 97.6%, 98.0% and 99.1%. These results were achieved by training and validating the LASSO algorithm on the Affymetrix Human Genome U133A microarray, Affymetrix Human Genome U133 2.0 microarray and Illumina RNA-seq datasets, respectively. The first dataset consisted of 2500 data instances, of which 1049 (41.96%) were labelled as AML and 1451 (58.04%) as healthy. The second dataset consisted of 8348 data instances, of which 2588 (31.00%) were labelled as AML and 5760 (69%) as healthy. Finally, the third dataset consisted of 1181 data instances, of which 508 (43.01%) were labelled as AML and 673 (56.99%) as healthy.

The last work related directly to ours is [12], in which a deep neural network (DNN) was trained on microarray data to classify AML from healthy individuals. The corresponding dataset consisted of only 26 data instances. The DNN’s accuracy score was 96.67%.

All methods above use datasets from gene expression profiling (GEP) to diagnose AML. Another approach, on a different type of data such as histopathology slides and using machine learning techniques, has also been tried out, but the accuracy of the corresponding model is around 95% [13].

Using invariant cluster genomic signatures, a machine learning approach has been developed in [14] for the classification of primary and secondary AML, reaching an accuracy score of 97%.

Our method can be used in the classification of diffuse large B-cell lymphoma patients [15] and also in the sub-classification of leukaemia [16], since GEP has been used as the dataset in both cases.

6 Conclusion

We develop a model which, using CatBoost and gene expression profiling data from Affymetrix Human Genome U133A, Affymetrix Human Genome U133B and Affymetrix Human Genome U133 2.0, is able to diagnose with the highest performance in the literature whether an individual has acute myeloid leukaemia or is healthy. We use CatBoost not only as a predictor for our problem, but also as a dimensionality reduction technique. In our approach both machine learning models, CatBoost34 and CatBoost26, outperform other machine learning approaches which use a variety of different classifiers and similar or different datasets.

On the clinical side, for the very first time in the literature we show that it is possible to diagnose AML using probe sets which have not yet been correlated to AML, together with the state-of-the-art gradient boosted tree algorithm CatBoost. It would be of great importance to further investigate the role of these 26 probe sets, not only as regards AML, but also other types of cancer. Machine learning can provide us with insights different from those of conventional approaches. As regards explainability, we hope the scientific community will use the importance of the probe sets shown in Figures 2 and 3 in order to further explain their behavior in AML. In addition, some of the 26 probe sets have not yet been related to known genes.

Acute myeloid leukaemia can appear suddenly in anyone. A screening tool whose sensitivity and specificity are close to 1.00, whose sample source is peripheral blood and whose cost is low would have a tremendous impact on humanity. We hope our approach will inspire others to use machine learning in order to solve cancer problems.

Acknowledgements

We thank Michael Filippakis of University of Piraeus for his feedback and valuable discussions.

Table 6: Number of samples and sample source (peripheral blood (PB), bone marrow (BM)) of the train and validation set of 2177 individuals.

Index | Train #Sam. & Sam. Source | % | Val. #Sam. & Sam. Source | %
0 | 6 BM | 75.00% | 2 BM | 25.00%
1 | 245 BM | 81.67% | 55 BM | 18.33%
2 | 22 PB | 84.62% | 4 PB | 15.38%
3 | 56 (52 BM & 4 PB) | 71.80% | 22 (21 BM & 1 PB) | 28.20%
4 | 412 (379 BM & 33 PB) | 78.48% | 113 (103 BM & 10 PB) | 21.52%
5 | 14 BM | 87.50% | 2 BM | 12.50%
6 | 194 (177 BM & 17 PB) | 77.29% | 57 (54 BM & 3 PB) | 22.71%
7 | 6 PB | 75.00% | 2 PB | 25.00%
8 | 18 PB | 81.82% | 4 PB | 18.18%
9 | 11 PB | 78.57% | 3 PB | 21.43%
10 | 13 PB | 76.47% | 4 PB | 23.53%
11 | 20 PB | 80.00% | 5 PB | 20.00%
12 | 50 PB | 79.37% | 13 PB | 20.63%
13 | 22 (12 BM & 10 PB) | 64.71% | 12 (9 BM & 3 PB) | 35.29%
14 | 11 PB | 91.67% | 1 PB | 8.33%
15 | 1 PB | 50.00% | 1 PB | 50.00%
16 | 11 (9 BM & 2 PB) | 91.67% | 1 BM | 8.33%
17 | 28 PB | 80.00% | 7 PB | 20.00%
18 | 120 BM | 85.71% | 20 BM | 14.29%
19 | 37 PB | 80.43% | 9 PB | 19.57%
20 | 12 (10 BM & 2 PB) | 92.30% | 1 BM | 7.70%
21 | 19 PB | 79.17% | 5 PB | 20.83%
22 | 9 (6 BM & 3 PB) | 75.00% | 3 (2 BM & 1 PB) | 25.00%
23 | 148 BM | 80.87% | 35 BM | 19.13%
24 | 12 PB | 100.00% | - | 0.00%
25 | 3 PB | 100.00% | - | 0.00%
26 | 42 (23 BM & 19 PB) | 93.33% | 3 (2 BM & 1 PB) | 6.67%
27 | 26 PB | 86.67% | 4 PB | 13.33%
28 | 25 PB | 71.43% | 10 PB | 28.57%
29 | 49 PB | 76.56% | 15 PB | 23.44%
30 | 99 PB | 81.82% | 22 PB | 18.18%
31 | - | 0.00% | 1 PB | 100.00%

Table 7: Number of patients per age group of the train and validation set. As regards the train set the mean of age is 48.98 with standard deviation 17.06 and as regards the validation set the mean of age is 48.46 and the standard deviation is 16.79.

Age group | Train set: number of patients | Train set: % | Validation set: number of patients | Validation set: %
0 to 19 | 75 | 4.31% | 24 | 5.5%
20 to 29 | 180 | 10.34% | 37 | 8.49%
30 to 39 | 272 | 15.62% | 68 | 15.6%
40 to 49 | 313 | 17.98% | 80 | 18.35%
50 to 59 | 378 | 21.71% | 109 | 25%
60 to 69 | 319 | 18.32% | 71 | 16.28%
70 to 79 | 171 | 9.82% | 41 | 9.4%
80 to 100 | 33 | 1.9% | 6 | 1.38%

Table 8: References, GEO Accession, Health Status, Origin of Study, AML subtypes of the study and Overall Survival (OS) of the train and validation set of 2177 individuals. The index corresponds to the index of Table 6. References which have not been published yet are marked with NYP.

Index | Reference | GEO Acc. | AML/Healthy | City, Country, Org. | AML subtypes | OS
0 | [17] | GSE10258 | AML | Vienna, Austria, Medical University of Vienna | M1, M5 | n/a
1 | [18], [19] | GSE10358 | AML | St Louis, USA, Washington University School of Medicine | M0, M1, M2, M3, M4, M5, M6, M7 | n/a
2 | [20] | GSE11375 | Healthy | Boston, USA, Massachusetts General Hospital | n/a | n/a
3 | [21], [22] | GSE12417 | AML | Munich, Germany, University of Munich | M0, M1, M2, M4, M5, M6 | Mean: 614.76, Std: 503.59
4 | [23], [24], [25] | GSE14468 | AML | Houston, USA, MD Anderson Cancer Center | M0, M1, M2, M3, M4, M4 eos, M5, M6 | n/a
5 | [26] | GSE14479 | AML | Rotterdam, Netherlands, Erasmus University Medical Center | n/a | n/a
6 | [27] | GSE15434 | AML | New York, USA, Columbia University Medical Center | n/a | n/a
7 | Wu 2012 (NYP) | GSE15932 | Healthy | Hangzhou, China, Second Affiliated Hospital, School of Medicine, Zhejiang University | n/a | n/a
8 | [28] | GSE16028 | Healthy | Basel, Switzerland, F. Hoffmann-La Roche AG | n/a | n/a
9 | Krug 2011 (NYP) | GSE17114 | Healthy | Lisbon, Portugal, Instituto de Medicina Molecular | n/a | n/a
10 | [29] | GSE18123 | Healthy | Boston, USA, Boston Children’s Hospital | n/a | n/a
11 | [30] | GSE18781 | Healthy | Portland, USA, Oregon Health & Science University | n/a | n/a
12 | [31] | GSE19743 | Healthy | Palo Alto, USA, Stanford Genome Technology Center | n/a | n/a
13 | [32] | GSE23025 | AML | Duarte, USA, City of Hope Beckman Research Institute | n/a | n/a
14 | [33] | GSE25414 | Healthy | Barcelona, Spain, Institut de Recerca Hospital Vall d’Hebron | n/a | n/a
15 | [34] | GSE2842 | Healthy | Bolzano, Italy, EURAC | n/a | n/a
16 | [35] | GSE29883 | AML | Berlin, Germany, Charité | t(8;21), t(16;16) | n/a
17 | [36] | GSE36809 | Healthy | Boston, USA, Massachusetts General Hospital | n/a | n/a
18 | [37], [38], [39], [40] | GSE37642 | AML | Munich, Germany, University Hospital Grosshadern, Ludwig-Maximilians-University (LMU) | M0, M1, M2, M3, M4, M5, M6, M7 | Mean: 962.32, Std: 1106.70
19 | [41], [42] | GSE39088 | Healthy | Brussels, Belgium, Université catholique de Louvain | n/a | n/a
20 | Bullinger 2014 (NYP) | GSE39363 | AML | Berlin, Germany, Charité | t(3;3) | n/a
21 | [43] | GSE46449 | Healthy | New York, USA, Columbia University Medical Center | n/a | n/a
22 | [44], [45] | GSE46819 | AML | Berlin, Germany, Charité | t(16;16) | n/a
23 | Leong 2015 (NYP) | GSE68833 | AML | Rockville, USA, NCI | M0, M1, M2, M3, M4, M5, M6, M7 | n/a
24 | [46] | GSE69565 | AML | Singapore, Singapore, Cancer Science Institute of Singapore | n/a | n/a
25 | Meng 2015 (NYP) | GSE71226 | Healthy | Changchun, China, the Department of Cardiology, China-Japan Union Hospital, Jilin University | n/a | n/a
26 | Bohl 2016 (NYP) | GSE84334 | AML | Ulm, Germany, University Hospital of Ulm | n/a | n/a
27 | [47] | GSE84844 | Healthy | Fujisawa, Japan, Takeda Pharmaceutical Company Limited | n/a | n/a
28 | [48] | GSE93272 | Healthy | Fujisawa, Japan, Takeda Pharmaceutical Company Limited | n/a | n/a
29 | [49] | GSE98793 | Healthy | Cambridge, United Kingdom, University of Cambridge | n/a | n/a
30 | [50] | GSE99039 | Healthy | Tel Aviv, Israel, Tel Aviv University | n/a | n/a
31 | Green 2009 (NYP) | GSE14845 | Healthy | Southport, Australia, Griffith Institute for Health & Medical Research | n/a | n/a

Table 9: The 26 probe sets ranked by their feature importance regarding the predictability of CatBoost. For each probe set the table gives its ID, the corresponding gene symbols or NCBI accession numbers, the blood malignancies and/or other types of cancer it is associated with, and general annotations about the probe set and the role of the gene products.

Probe set ID | Gene Symbol/NCBI Accession Number | Blood Malignancies | Other types of cancer | General
234632_x_at | AK026267 | n/a | n/a | cDNA: FLJ22614 fis, clone HSI05089 [51]
209603_at | GATA3 | Acute Lymphoblastic Leukemia (ALL) [52] | Breast Cancer [53], Bladder Cancer [54] | This gene encodes a protein, which plays a role as regulator of T-cell development [51]
230527_at | LOC101926907 | n/a | n/a | Uncharacterized Gene [51]
229963_at | BEX5 | n/a | n/a | The protein encoded by this gene plays a role in neuronal development [55]
217901_at | DSG2 | n/a | Cervical Cancer [56], Epithelial-derived Carcinomas [57], Pancreatic Cancer [58], Breast Cancer [59], Colon Cancer [60], Lung Cancer [61], [62], Gastric Cancer [63], [64], Ovarian Cancer [65], Laryngeal Cancer [66], Liver Cancer [67] | This gene encodes a calcium-binding transmembrane glycoprotein component of desmosomes, which plays a role in cell-cell junctions between epithelial, myocardial, and other types of cells [51]
214719_at | SLC46A3 | n/a | Liver Cancer [68] | This gene encodes a protein, which is involved in transportation of small molecules across membranes [51]
219513_s_at | SH2D3A | n/a | n/a | This gene encodes a protein, which may play a role in JNK activation [69]
210789_x_at | CEACAM3 | n/a | n/a | The protein encoded by this gene is thought to play an important role in controlling human-specific pathogens [51]
204777_s_at | MAL | n/a | Gastric Cancer [70], Breast Cancer [71], Ovarian Cancer [72], Colorectal Cancer [73] | This gene encodes a protein, which plays a central role in the formation, stabilization and maintenance of glycosphingolipid-enriched membrane microdomains [51]
203294_s_at | LMAN1 | n/a | n/a | This gene encodes a protein, which is involved in glycoprotein transportation [51]
230753_at | PATL2 | n/a | n/a | This gene encodes an RNA-binding protein, which plays a role as translational repressor in regulation of maternal mRNAs during oocyte maturation [74]
242056_at | TRIM45 | n/a | Lung Cancer [75], Glioma [76] | The encoded protein acts as a transcriptional repressor of the mitogen-activated protein kinase pathway [51]
217680_x_at | RPL10 | T-cell Acute Lymphoblastic Leukemia (T-ALL) [77], [78] | Ovarian Cancer [79], Pancreatic Cancer [80] | The encoded protein is a component of the 60S ribosome subunit [51]
214945_at | FAM153A & FAM153B & FAM153C & LOC100507387 & LOC105377751 | n/a | n/a | Unknown function/Uncharacterized gene [51]
222312_s_at | AW969803 | n/a | n/a | Expressed sequence tag [51]
214705_at | PATJ | n/a | n/a | This gene encodes a protein, which is located in tight junctions and in the apical membrane of epithelial cells [51]
241688_at | AA677700 | n/a | n/a | Expressed sequence tag [51]
241611_s_at | FNDC3A | Multiple Myeloma [81] | n/a | The protein encoded by this gene plays a role in spermatid-Sertoli adhesion during spermatogenesis [82]
236952_at | AI309861 | n/a | n/a | Expressed sequence tag [51]
207636_at | SERPINI2 | Chronic Lymphocytic Leukemia (CLL) [83] | Pancreatic Cancer [84] | The encoded protein is involved in the regulation of a variety of physiological processes, including coagulation, fibrinolysis, development, malignancy, and inflammation [51]
243659_at | N63876 | n/a | n/a | Expressed sequence tag [51]
226311_at | ADAMTS2 | Mixed Phenotype Acute Leukemias (MPAL) [85] | Gastric Cancer [86], Kidney Cancer [87] | This gene encodes an extracellular metalloproteinase, which plays a significant role in the excision of the N-propeptides of procollagens I-III and type V [51]
211772_x_at | CHRNA3 | T-cell Acute Lymphoblastic Leukemia (T-ALL) [88] | Lung Cancer [89] | The protein encoded by this gene is a ligand-gated ion channel, which plays a role in neurotransmission [51]
244719_at | AA766704 | n/a | n/a | Expressed sequence tag [51]
239766_at | BF507518 | n/a | n/a | Expressed sequence tag [51]
243272_at | LOC101593348 | n/a | n/a | Uncharacterized gene [51]

References

[1] Short NJ, Rytting ME, Cortes JE. Acute myeloid leukaemia. Lancet. 2018, 392(10147):593-606. doi:10.1016/S0140-6736(18)31041-9

[2] Mottal N, Issa N, Dumas PY, et al. Reduce Mortality and Morbidity in Acute Myeloid Leukemia With Hyperleukocytosis With Early Admission in Intensive Care Unit: A Retrospective Analysis. J Hematol. 2020, 9(4):109-115. doi:10.14740/jh691

[3] Roushangar R, Mias GI. Multi-study reanalysis of 2,213 acute myeloid leukemia patients reveals age- and sex-dependent gene expression signatures. Sci Rep. 2019, 9(1):12413. Published 2019 Aug 27. doi:10.1038/s41598-019-48872-0

[4] Abelson S, Collord G, Ng SWK, et al. Prediction of acute myeloid leukaemia risk in healthy individuals. Nature. 2018, 559(7714):400-404. doi:10.1038/s41586-018-0317-6

[5] Mosquera Orgueira A, Peleteiro Raíndo A, Cid López M, et al. Personalized Survival Prediction of Patients With Acute Myeloblastic Leukemia Using Gene Expression Profiling. Front Oncol. 2021, 11:657191. Published 2021 Mar 29. doi:10.3389/fonc.2021.657191

[6] Angelakis A, Gatos I, Theotokas I et al. A deep-learning approach to the significant liver fibrosis binary classification problem using gender, morphologic and haemodynamic measurements derived from B-mode ultrasound images, Insights Imaging. 2018, p.S279, 9(Suppl 1):1. doi:10.1007/s13244-018-0603-8

[7] Angelakis A, Gatos I, Theotokas I et al. Binary Classification of Chronic Liver Disease Patients Using Deep Learning on Morphologic B-Mode and Demographic Data, Journal of Ultrasound in Medicine. 2018, AIUM 2018 Annual Convention, S3, doi:10.1002/jum.14750

[8] Angelakis A., Cats On The Classification Of Benign And Malignant Breast Lesions Using Ultrasound Shear Wave Elastography Features And BI-RADS Score, Journal of Ultrasound in Medicine. 2021, AIUM 2021 Annual Convention, doi:10.1002/jum.15752

[9] Prokhorenkova L, Gusev G, Vorobev A, et al. CatBoost: unbiased boosting with categorical features. Advances in Neural Information Processing Systems. 2018, vol. 31

[10] Kohavi R, A study of cross-validation and bootstrap for accuracy estimation and model selection. Proceedings of the 14th international joint conference on Artificial intelligence. 1995, vol. 2

[11] Warnat-Herresthal S, Perrakis K, Taschler B, et al. Scalable Prediction of Acute Myeloid Leukemia Using High-Dimensional Machine Learning and Blood Transcriptomics. iScience. 2020, 23(1):100780. doi:10.1016/j.isci.2019.100780

[12] Nazari E, Farzin AH, Aghemiri M, Avan A, Tara M, Tabesh H. Deep Learning for Acute Myeloid Leukemia Diagnosis. J Med Life. 2020, 13(3):382-387. doi:10.25122/jml-2019-0090

[13] Kazemi F, Najafabadi TA, Araabi BN. Automatic Recognition of Acute Myelogenous Leukemia in Blood Microscopic Images Using K-means Clustering and Support Vector Machine. J Med Signals Sens. 2016, 6(3):183-193.

[14] Awada H, Durmaz A, Gurnari C, et al. Machine Learning Integrates Genomic Signatures for Subclassification Beyond Primary and Secondary Acute Myeloid Leukemia. [published online ahead of print, 2021 Jun 1]. Blood. 2021, blood.2020010603. doi:10.1182/blood.2020010603

[15] Zhao S, Dong X, Shen W, et al. Machine learning-based classification of diffuse large B-cell lymphoma patients by eight gene expression profiles. Cancer Med. 2016, 5(5):837-852. doi:10.1002/cam4.650

[16] Castillo D, Galvez JM, Herrera LJ, et al. Leukemia multiclass assessment and classification from Microarray and RNA-seq technologies integration at gene expression level. PLoS One. 2019, 14(2):e0212127. Published 2019 Feb 12. doi:10.1371/journal.pone.0212127

[17] Zatkova A, Merk S, Wendehack M, et al. AML/MDS with 11q/MLL amplification show characteristic gene expression signature and interplay of DNA copy number changes. Genes Chromosomes Cancer. 2009, 48(6):510-520. doi:10.1002/gcc.20658

[18] Tomasson MH, Xiang Z, Walgren R, et al. Somatic mutations and germline sequence variants in the expressed tyrosine kinase genes of patients with de novo acute myeloid leukemia. Blood. 2008, 111(9):4797-4808. doi:10.1182/blood-2007-09-113027

[19] Walter MJ, Payton JE, Ries RE, et al. Acquired copy number alterations in adult acute myeloid leukemia genomes. Proc Natl Acad Sci U S A. 2009, 106(31):12950-12955. doi:10.1073/pnas.0903091106

[20] Warren HS, Elson CM, Hayden DL, et al. A genomic score prognostic of outcome in trauma patients. Mol Med. 2009, 15(7-8):220-227. doi:10.2119/molmed.2009.00027

[21] Metzeler KH, Hummel M, Bloomfield CD, et al. An 86-probe-set gene-expression signature predicts survival in cytogenetically normal acute myeloid leukemia. Blood. 2008, 112(10):4193-4201. doi:10.1182/blood-2008-02-134411

[22] Wang YH, Lin CC, Hsu CL, et al. Distinct clinical and biological characteristics of acute myeloid leukemia with higher expression of long noncoding RNA KIAA0125. Ann Hematol. 2021, 100(2):487-498. doi:10.1007/s00277-020-04358-y

[23] Wouters BJ, Löwenberg B, Erpelinck-Verschueren CAJ, et al. Double CEBPA mutations, but not single CEBPA mutations, define a subgroup of acute myeloid leukemia with a distinctive gene expression profile that is uniquely associated with a favorable outcome. Blood. 2009, doi:10.1182/blood-2008-09-179895

[24] Taskesen E, Bullinger L, Corbacioglu A, et al. Prognostic impact, concurrent genetic mutations, and gene expression features of AML with CEBPA mutations in a cohort of 1182 cytogenetically normal AML patients: Further evidence for CEBPA double mutant AML as a distinctive disease entity. Blood. 2011, 117(8):2469-2475. doi:10.1182/blood-2010-09-307280

[25] Taskesen E, Babaei S, Reinders MMJ, de Ridder J. Integration of gene expression and DNA-methylation profiles improves molecular subtype classification in acute myeloid leukemia. BMC Bioinformatics. 2015, 16(4):1-8. doi:10.1186/1471-2105-16-S4-S5

[26] Figueroa ME, Wouters BJ, Skrabanek L, et al. Genome-wide epigenetic analysis delineates a biologically distinct immature acute leukemia with myeloid/T-lymphoid features. Blood. 2009, 113(12):2795-2804. doi:10.1182/blood-2008-08-172387

[27] Klein HU, Ruckert C, Kohlmann A, et al. Quantitative comparison of microarray experiments with published leukemia related gene expression signatures. BMC Bioinformatics. 2009, 10:1-11. doi:10.1186/1471-2105-10-422

[28] Karlovich C, Duchateau-Nguyen G, Johnson A, et al. A longitudinal study of gene expression in healthy individuals. BMC Med Genomics. 2009, 2:1-16. doi:10.1186/1755-8794-2-33

[29] Kong SW, Collins CD, Shimizu-Motohashi Y, et al. Characteristics and Predictive Value of Blood Transcriptome Signature in Males with Autism Spectrum Disorders. PLoS One. 2012, 7(12). doi:10.1371/journal.pone.0049475

[30] Sharma SM, Choi D, Planck SR, et al. Insights in to the pathogenesis of axial spondyloarthropathy based on gene expression profiles. Arthritis Res Ther. 2009, 11(6):1-9. doi:10.1186/ar2855

[31] Zhou B, Xu W, Herndon D, et al. Analysis of factorial time-course microarrays with application to a clinical study of burn injury. Proc Natl Acad Sci U S A. 2010, 107(22):9923-9928. doi:10.1073/pnas.1002757107

[32] Li L, Li M, Sun C, et al. Altered Hematopoietic Cell Gene Expression Precedes Development of Therapy-Related Myelodysplasia/Acute Myeloid Leukemia and Identifies Patients at Risk. Cancer Cell. 2011, 20(5):591-605. doi:10.1016/j.ccr.2011.09.011

[33] Rosell A, Vilalta A, García-Berrocoso T, et al. Brain perihematoma genomic profile following spontaneous human intracerebral hemorrhage. PLoS One. 2011, 6(2). doi:10.1371/journal.pone.0016750

[34] Schmidt S, Rainer J, Riml S, et al. Identification of glucocorticoid-response genes in children with acute lymphoblastic leukemia. Blood. 2006, 107(5):2061-2069. doi:10.1182/blood-2005-07-2853

[35] Lück SC, Russ AC, Botzenhardt U, et al. Deregulated apoptosis signaling in core-binding factor leukemia differentiates clinically relevant, molecular marker-independent subgroups. Leukemia. 2011, 25(11):1728-1738. doi:10.1038/leu.2011.154

[36] Xiao W, Mindrinos MN, Seok J, et al.

2014, 124(8):1304-1311. doi:10.1182/blood-2013-12-540716

[39] Kuett A, Rieger C, Perathoner D, et al. IL-8 as mediator in the microenvironment-leukaemia network in acute myeloid leukaemia. Sci Rep. 2015, 5(December):1-11. doi:10.1038/srep18411

[40] Herold T, Jurinovic V, Batcha AMN, et al. A 29-gene and cytogenetic score for the prediction of resistance to induction treatment in acute myeloid leukemia. Haematologica. 2018, 103(3):456-465. doi:10.3324/haematol.2017.178442

[41] Lauwerys BR, Hachulla E, Spertini F, et al. Down-regulation of interferon signature in systemic lupus erythematosus patients by active immunization with interferon α-kinoid. Arthritis Rheum. 2013, 65(2):447-456. doi:10.1002/art.37785

[42] Ducreux J, Houssiau FA, Vandepapelière P, et al. Interferon α-kinoid induces neutralizing anti-interferon α antibodies that decrease the expression of interferon-induced and B cell activation associated transcripts: Analysis of extended follow-up data from the interferon α kinoid phase I/II study. Rheumatol (United Kingdom). 2016, 55(10):1901-1905. doi:10.1093/rheumatology/kew262

[43] Clelland CL, Read LL, Panek LJ, Nadrich RH, Bancroft C, Clelland JD. Utilization of Never-Medicated Bipolar Disorder Patients
towards Development and Validation of a Pe- A genomic storm in critically injured hu- ripheral Biomarker Profile. PLoS One. 2013, mans. J Exp Med. 2011, 208(13):2581-2590. 8(6):1-11. doi:10.1371/journal.pone.0069082 doi:10.1084/jem.20111354 [44] Opel D, Schnaiter A, Dodier D, et al. Tar- [37] Li Z, Herold T, He C, et al. Iden- geting inhibitor of apoptosis by tification of a 24-gene prognostic signa- Smac mimetic elicits cell death in poor ture that improves the european Leukemi- prognostic subgroups of chronic lymphocytic aNet risk classification of acute myeloid leukemia. Int J Cancer. 2015, 137(12):2959- leukemia: An international collaborative 2970. doi:10.1002/ijc.29650 study. J Clin Oncol. 2013, 31(9):1172-1181. doi:10.1200/JCO.2012.44.3184 [45] Lueck SC, Russ AC, Botzenhardt U, et al. Smac mimetic induces cell death in a [38] Herold T, Metzeler KH, Vosberg S, et al. Iso- large proportion of primary acute myeloid lated trisomy 13 defines a homogeneous AML leukemia samples, which correlates with de- subgroup with high frequency of mutations in fined molecular markers. Oncotarget. 2016, spliceosome genes and poor prognosis. Blood. doi:10.18632/oncotarget.10390 Diagnosis of Acute Myeloid Leukaemia Using Machine Learning

[46] Cao Q, Gearhart MD, Gery S, et al. BCOR [55] Kazi JU, Kabir NN, Rönnstrand L. Brain- regulates myeloid cell proliferation and differ- Expressed X-linked (BEX) proteins in entiation. Leukemia. 2016, 30(5):1155-1165. human cancers. Biochim Biophys Acta doi:10.1038/leu.2016.2 - Rev Cancer. 2015, 1856(2):226-233. doi:10.1016/j.bbcan.2015.09.001 [47] Tasaki S, Suzuki K, Nishikawa A, et al. Mul- tiomic disease signatures converge to cyto- [56] Qin S, Liao Y, Du Q, et al. DSG2 ex- toxic CD8 T cells in primary Sjögren’s syn- pression is correlated with poor progno- drome. Ann Rheum Dis. 2017, 76(8):1458- sis and promotes early-stage cervical can- 1466. doi:10.1136/annrheumdis-2016-210788 cer. Cancer Cell Int. 2020, 20(1):1-13. doi:10.1186/s12935-020-01292-x [48] Tasaki S, Suzuki K, Kassai Y, et al. Multi-omics monitoring of drug response [57] Brennan D, Mahoney MG. Increased in rheumatoid arthritis in pursuit of expression of Dsg2 in malignant skin molecular remission. Nat Commun. 2018, carcinomas: A tissue-microarray based doi:10.1038/s41467-018-05044-4 study. Cell Adhes Migr. 2009, 3(2):148-154. doi:10.4161/cam.3.2.7539 [49] Leday GGR, Vértes PE, Richardson S, et al. Replicable and Coupled Changes in [58] Hütz K, Zeiler J, Sachs L, et al. Loss of Innate and Adaptive Immune Gene Ex- desmoglein 2 promotes tumorigenic behav- pression in Two Case-Control Studies of ior in pancreatic cancer cells. Mol Carcinog. Blood Microarrays in Major Depressive Dis- 2017, 56(8):1884-1895. doi:10.1002/mc.22644 order. Biol Psychiatry. 2018, 83(1):70-80. [59] Davies EL, Cochrane RA, Hiscox S, et al. The doi:10.1016/j.biopsych.2017.01.021 role of desmoglein 2 and E-cadherin in the invasion and motility of human breast can- [50] Shamir R, Klein C, Amar D, et al. cer cells. Int J Oncol. 1997, 11(2):415-419. Analysis of blood-based gene expres- doi:10.3892/ijo.11.2.415 sion in idiopathic Parkinson disease. Neurology. 2017, 89(16):1676-1683. [60] Yang T, Gu X, Jia L, et al. 
DSG2 expres- doi:10.1212/WNL.0000000000004516 sion is low in colon cancer and correlates with poor survival. BMC Gastroenterol. 2021, [51] NCBI Resource Coordinators. Database re- 21(1):1-10. doi:10.1186/s12876-020-01588-2 sources of the National Center for Biotech- nology Information. Nucleic Acids Res. 2016, [61] Cai F, Zhu Q, Miao Y, et al. Desmoglein-2 44(D1):D7-D19. doi:10.1093/nar/gkv1290 is overexpressed in non-small cell lung cancer tissues and its knockdown suppresses NSCLC [52] Hou Q, Liao F, Zhang S, et al. Reg- growth by regulation of p27 and CDK2. J ulatory network of GATA3 in pedi- Cancer Res Clin Oncol. 2017, 143(1):59-69. atric acute lymphoblastic leukemia. doi:10.1007/s00432-016-2250-0 Oncotarget. 2017, 8(22):36040-36053. doi:10.18632/oncotarget.16424 [62] Saaber F, Chen Y, Cui T, et al. Ex- pression of desmogleins 1-3 and their clin- [53] Mehra R, Varambally S, Ding L, et al. Identi- ical impacts on human lung cancer. em- fication of GATA3 as a breast cancer prognos- phPathol Res Pract. 2015, 211(3):208-213. tic marker by global gene expression meta- doi:10.1016/j.prp.2014.10.008 analysis. Cancer Res. 2005, 65(24):11259- 11264. doi:10.1158/0008-5472.CAN-05-2495 [63] Yashiro M, Nishioka N, Hirakawa K. Decreased expression of the adhesion [54] Li Y, Ishiguro H, Kawahara T, et al. Loss of molecule desmoglein-2 is associated GATA3 in bladder cancer promotes cell mi- with diffuse-type gastric carcinoma. gration and invasion. Cancer Biol Ther. 2014, Eur J Cancer. 2006, 42(14):2397-2403. 15(4):428-435. doi:10.4161/cbt.27631 doi:10.1016/j.ejca.2006.03.024 Diagnosis of Acute Myeloid Leukaemia Using Machine Learning

[64] Biedermann K, Vogelsang H, Becker I, et al. by promoter hypomethylation and plat- Desmoglein 2 is expressed abnormally rather inum resistance in epithelial ovarian can- than mutated in familial and sporadic gas- cer. Int J Cancer. 2010, 126(6):1378-1389. tric cancer. J Pathol. 2005, 207(2):199-206. doi:10.1002/ijc.24797 doi:10.1002/path.1821 [73] Kalmár A, Péterfia B, Hollósi P, et [65] Kim J, Beidler P, Wang H, et al. Desmoglein- al. DNA hypermethylation and decreased 2 as a prognostic and biomarker in ovarian mRNA expression of MAL, PRIMA1, PT- cancer. Cancer Biol Ther. 2020, 21(12):1154- GDR and SFRP1 in colorectal adenoma 1162. doi:10.1080/15384047.2020.1843323 and cancer. BMC Cancer. 2015, 15(1):1-14. doi:10.1186/s12885-015-1687-x [66] Cury SS, Lapa RML, de Mello JBH, et al. Increased DSG2 plasmatic lev- [74] Cao Q, Zhao C, Wang C, et al. The Re- els identified by transcriptomic-based current Mutation in PATL2 Inhibits Its secretome analysis is a potential prog- Degradation Thus Causing Female Infer- nostic biomarker in laryngeal carcinoma. tility Characterized by Oocyte Maturation Oral Oncol. 2020, 103(January):104592. Defect Through Regulation of the Mos- doi:10.1016/j.oraloncology.2020.104592 MAPK Pathway. Front Cell Dev Biol. 2021, [67] Han CP, Yu YH, Wang AG, et al. doi:10.3389/fcell.2021.628649 Desmoglein-2 overexpression predicts [75] Peng X, Wen Y, Zha L, et al. TRIM45 poor prognosis in hepatocellular car- Suppresses the Development of Non-small cinoma patients. Eur Rev Med Phar- Cell Lung Cancer. Curr Mol Med. 2019, macol Sci. 2018, 22(17):5481-5489. doi:10.2174/1566524019666191017143833 doi:10.26355/eurrev_201809_15808 [76] Zhang J, Zhang C, Cui J, et al. Trim45 [68] Zhao Q, Zheng B, Meng S, et al. In- functions as a tumor suppressor in the creased expression of SLC46A3 to oppose brain via its e3 ligase activity by sta- the progression of hepatocellular carcinoma bilizing p53 through k63-linked ubiquiti- and its effect on sorafenib therapy. 
Biomed nation. Cell Death Dis. 2017, 8(5):1-11. Pharmacother. 2019, 114(March):108864. doi:10.1038/cddis.2017.149 doi:10.1016/j.biopha.2019.108864 [69] Bateman A, Martin MJ, Orchard S, et al. [77] Raiser DM, Narla A, Ebert BL. The emerg- UniProt: The universal protein knowledge- ing importance of ribosomal dysfunction base in 2021. Nucleic Acids Res. Published in the pathogenesis of hematologic disor- online 2021. doi:10.1093/nar/gkaa1100 ders. Leuk Lymphoma. 2014, 55(3):491-500. doi:10.3109/10428194.2013.812786 [70] Buffart TE, Overmeer RM, Steenbergen RDM, et al. MAL promoter hypermethyla- [78] De Keersmaecker K, Atak ZK, Li N, et tion as a novel prognostic marker in gastric al. Exome sequencing identifies mutation cancer. Br J Cancer. 2008, 99(11):1802-1807. in CNOT3 and ribosomal genes RPL5 doi:10.1038/sj.bjc.6604777 and RPL10 in T-cell acute lymphoblastic leukemia. Nat Genet. 2013, 45(2):186-190. [71] Horne HN, Lee PS, Murphy SK, et al. In- doi:10.1038/ng.2508 activation of the MAL gene in breast can- cer is a common event that predicts bene- [79] Shi J, Zhang L, Zhou D, et al. Biological func- fit from adjuvant chemotherapy. Mol Cancer tion of ribosomal protein L10 on cell behavior Res. 2009, 7(2):199-209. doi:10.1158/1541- in human epithelial ovarian cancer. J Cancer. 7786.MCR-08-0314 2018, 9(4):745-756. doi:10.7150/jca.21614 [72] Lee PS, Teaberry VS, Bland AE, et al. [80] Yang J, Chen Z, Liu N, et al. Riboso- Elevated MAL expression is accompanied mal protein L10 in mitochondria serves as Diagnosis of Acute Myeloid Leukaemia Using Machine Learning

a regulator for ROS level in pancreatic can- [88] Laukkanen S, Liuksiala T, Nykter M, cer cells. Redox Biol. 2018, 19(May):158-165. et al. Identification of Novel Drug Tar- doi:10.1016/j.redox.2018.08.016 gets in T-Cell Acute Lymphoblastic Leukemia. Blood. 2015, 126(23):3646-3646. [81] Manfrini N, Mancino M, Miluzio A, et doi:10.1182/blood.v126.23.3646.3646 al. FAM46C and FNDC3A are multi- ple myeloma tumor suppressors that act [89] Wassenaar CA, Dong Q, Wei Q, et al. Re- in concert to impair clearing of protein lationship between CYP2A6 and CHRNA5- aggregates and autophagy. Cancer Res. CHRNA3-CHRNB4 variation and smok- 2020, 80(21):4693-4706. doi:10.1158/0008- ing behaviors and lung cancer risk. J 5472.CAN-20-1357 Natl Cancer Inst. 2011, 103(17):1342-1346. doi:10.1093/jnci/djr237 [82] Obholz KL, Akopyan A, Waymire KG, et al. FNDC3A is required for adhe- sion between spermatids and Sertoli cells. Dev Biol. 2006, 298(2):498-513. doi:10.1016/j.ydbio.2006.06.054

[83] Farfsing A, Engel F, Seiffert M, et al. Gene knockdown studies revealed CCDC50 as a candidate gene in mantle cell lymphoma and chronic lymphocytic leukemia. Leukemia. 2009, 23(11):2018-2026. doi:10.1038/leu.2009.144

[84] Ozaki K, Nagata M, Suzuki M, et al. Isolation and characterization of a novel human pancreas-specific gene, pancpin, that is down-regulated in pancreatic cancer cells. Genes Chromosom Cancer. 1998, 22(3):179-185. doi:10.1002/(SICI)1098- 2264(199807)22:3<179::AID- GCC3>3.0.CO;2-T

[85] Tota G, Coccaro N, Zagaria A, et al. ADAMTS2 gene dysregulation in T/myeloid mixed phenotype acute leukemia. BMC Can- cer. 2014, 14(1):1-6. doi:10.1186/1471-2407- 14-963

[86] Jiang C, Zhou Y, Huang Y, et al. Overex- pression of ADAMTS-2 in tumor cells and stroma is predictive of poor clinical prognosis in gastric cancer. Hum Pathol. 2019, 84:44-51. doi:10.1016/j.humpath.2018.08.030

[87] Roemer A, Schwettmann L, Jung M, et al. The membrane proteases ADAMs and hepsin are differentially expressed in renal cell carcinoma. Are they potential tumor markers? J Urol. 2004, 172(6 I):2162-2166. doi:10.1097/01.ju.0000144602.01322.49