applied sciences

Article Intelligent Microarray Data Analysis through Non-negative Matrix Factorization to Study Multiple Myeloma Cell Lines

Gabriella Casalino 1,* , Mauro Coluccia 2, Maria L. Pati 2, Alessandra Pannunzio 2 , Angelo Vacca 3, Antonio Scilimati 2,* and Maria G. Perrone 2,*

1 Department of Informatics, University of Bari “Aldo Moro”, Via Orabona 4, 70125 Bari, Italy
2 Department of Pharmacy-Pharmaceutical Sciences, University of Bari “Aldo Moro”, Via Orabona 4, 70125 Bari, Italy; [email protected] (M.C.); [email protected] (M.L.P.); [email protected] (A.P.)
3 Department of Biomedical Sciences and Human Oncology, Internal Medicine Unit G. Baccelli, University of Bari Aldo Moro Medical School, 70125 Bari, Italy; [email protected]
* Correspondence: [email protected] (G.C.); [email protected] (A.S.); [email protected] (M.G.P.); Tel.: +39-0805442203 (G.C.); +39-0805442753 (A.S.); +39-0805442747 (M.G.P.); Fax: +39-0805442724 (A.S.); +39-0805442724 (M.G.P.)

Received: 23 October 2019; Accepted: 10 December 2019; Published: 17 December 2019

Abstract: Microarray data are a kind of numerical non-negative data used to collect expression profiles. Since the number of genes in DNA is huge, these data are usually high dimensional and therefore require dimensionality reduction and clustering techniques to extract useful information. In this paper we use non-negative matrix factorization (NMF) to analyze microarray data, and we also develop an “intelligent” visualization of the results to facilitate analysis by domain experts. For this purpose, a case study based on the analysis of the gene expression profiles (GEPs) representative of human multiple myeloma diseases was carried out on 40 human myeloma cell lines (HMCLs). The aim of the experiments was to study the genes involved in arachidonic acid metabolism in order to detect gene patterns that could be connected to the different gene expression profiles of multiple myeloma. NMF results were verified in six HMCLs by western blotting analysis of the proteins encoded by some of the most abundantly expressed genes. The experiments showed the effectiveness of NMF in intelligently analyzing microarray data.

Keywords: nonnegative matrix factorization; intelligent data analysis; feature extraction; dimensionality reduction; unsupervised learning; human multiple myeloma cell lines; gene expression profile; arachidonic acid metabolism

Appl. Sci. 2019, 9, 5552; doi:10.3390/app9245552; www.mdpi.com/journal/applsci

1. Introduction

The massive use of technology in every domain is leading to an exponential growth of available data. The abundance and complexity of these data very often make them unusable. For this reason, automatic tools are increasingly used to facilitate their management and analysis. However, when fully automatic tools are used without proper technical knowledge, they provide results that are not completely understandable. Intelligent data analysis (IDA) is a methodology to extract useful hidden knowledge from data by involving human expertise in the analysis process [1,2]. It includes different methodologies to analyze real-world problems, such as statistics, artificial intelligence, machine learning, data mining, and data visualization. Domain experts and computer scientists are both necessary during the data analysis process. The former provide the domain expertise that is used to design the research questions that drive the analysis. The latter use automatic techniques to extract unknown patterns from data. However, there is a language gap between these two figures. Data visualization, dimensionality reduction, and clustering techniques simplify the original data, making them more understandable and easier to analyze by non-technicians [3].

In this scenario, non-negative matrix factorization (NMF) was shown to be suitable for IDA [4]. NMF is a low-rank approximation technique that, due to its non-negativity constraint, is able to describe original data as a linear additive combination of hidden factors: i.e., pieces of images in image recognition, topics in text mining, and metagenes (MGs) in microarray analysis [5–8]. NMF allows dimensionality reduction and clustering of original data, both useful for data understanding. On the one hand, clustering methods are used to group data according to similarity criteria [9]. On the other hand, dimensionality reduction methods represent data in a low-dimensional space by discarding noise and worthless information and highlighting noteworthy elements. Both methods reduce the complexity of the original data and bring out hidden structures in them. Unlike similar methods such as singular value decomposition (SVD), principal component analysis (PCA), and k-means, NMF gives easily interpretable results due to its non-negativity constraint [3].

In the last decade, NMF was widely used to analyze biological data [9,10]. Gene expression profiles, coming from microarray chips, are represented as numerical non-negative high-dimensional data. Thus, dimensionality reduction and clustering techniques are commonly used to discard unimportant information and highlight the relevant data.
In this work, 40 human myeloma cell lines (HMCLs) were investigated. In particular, a selection of gene expression profiles (GEPs) representative of human multiple myeloma diseases was analyzed through NMF. The NMF part-based representation simplifies the data and the relationships in them. Moreover, visualizations of the results are used to help the domain experts understand the data.

Multiple myeloma (MM) is still an unmet clinical need, being an incurable malignant disease of plasma cells that affects approximately 25,000 new patients per year in Europe and is characterized by a marked genetic heterogeneity [11,12]. The major obstacle to identifying a proper MM therapy is the wide inter-individual variation in response to the clinically available drugs [11,13–15]. This heterogeneous response to treatment is mostly due to the molecular characteristics of the tumor, i.e., differences in the GEPs involved could determine drug resistance [16–18]. Deciphering key changes in gene expression levels underlying personalized sensitivity to chemotherapy is, therefore, essential to predict the efficacy of anticancer drugs and to prevent delay in the selection of more effective treatment strategies [19].

Cell lines are currently the preferred in vitro pre-clinical tools to screen the effects of new drugs or combinations of drugs. Often, the efficacy of a drug candidate varies depending on the cell-line-based model used for testing, probably also because of the different GEPs of the cells used. The possibility of associating a specific cell line with a particular pathological condition would help to better design pharmaceutical studies, in terms of targets such as receptors, channel proteins or , and biological pathways. It is well known that under some circumstances expressed genes do not produce the corresponding proteins necessary to exert a specific function.
Thus, in continuation of our scientific interests, the expression of genes related to arachidonic acid (AA) metabolism was examined in selected HMCLs, and the presence of the proteins they encode was verified by the western blotting technique. AA metabolism was chosen because its very complex network produces several eicosanoids, which are crucial mediators of several physio-pathological pathways. Prostaglandin E2 (PGE2), one of the known eicosanoids, is one of the mediators of the pro-inflammatory and angiogenesis networks and supports the growth of several solid tumors [20–23]. In addition, PGE2 stimulates gene transcription, influences the mitogenesis of normal human bone cells, and promotes the formation of tumor metastases. Cyclooxygenase (COX) is responsible for PGE2 biosynthesis from AA, which is in turn released from the cell membrane upon mechanical or mitogen stimuli. Both known COX isoforms (COX-1 and COX-2) contribute to PGE2 production, even though COX-1 seems to be its major source in normal tissue. COX-1 is also the target of low-dose acetylsalicylic acid (ASA), the active ingredient of aspirin, which irreversibly inactivates platelet cyclooxygenase-1 by acetylating its S530, preventing thromboxane formation. In fact, patients with newly diagnosed MM treated with lenalidomide, dexamethasone, and thalidomide chemotherapy benefit from aspirin prophylaxis against thrombotic complications [24–26]. Tens of millions of adults worldwide take aspirin to reduce their risk of heart attack or stroke. Studies over the last two decades have suggested that regular use of aspirin may have another important benefit: decreasing the risk of developing or dying from some types of cancer [27,28]. These aspects support the idea that the genes involved in the COX-mediated AA cascade, and the proteins they encode, would be of relevance and worth investigating in multiple myeloma as in other types of cancer.
For this purpose, a microarray data matrix containing the GEPs of 40 HMCLs was used. The experiments described below were designed to study a subset of 64 genes selected because they are involved in arachidonic acid metabolism. In particular, the gene taxonomy proposed in [29] was used to group the 64 genes according to their role in arachidonic acid metabolism. The aim of the analysis was to detect expression patterns, emerging from the interaction of the selected genes, that are possibly connected to different multiple myeloma histotypes, stages, and grades. The NMF part-based representation was used to reduce the dimensionality of the problem and to describe the HMCLs as a composition of metagenes (i.e., groups of genes that emerged automatically from data). Finally, the western blotting technique was used to verify whether the relationships between genes and HMCLs suggested by NMF were actually reflected at the protein level.

2. Materials and Methods

2.1. Materials

Cell culture reagents were purchased from EuroClone (Milan, Italy). Radioimmunoprecipitation assay buffer (RIPA) and protease inhibitor cocktail were obtained from Sigma–Aldrich (Milan, Italy). Anti-β-ACTIN, anti-COX-1 (COX-111), and anti-rabbit secondary peroxidase antibodies were purchased from Thermo Fisher Scientific Italia (Monza, Italy). Anti-17β-HSD4, anti-GGT1, anti-LTA4H, anti-ALOX-15B, anti-ACAA1, anti-CBR1, anti-CYP1A2, anti-PGEs2, and anti-VINCULIN were obtained from Santa Cruz Italia (Milan, Italy). Anti-mouse secondary peroxidase antibody and all reagents for western blotting were purchased from Bio-Rad Laboratories Srl (Milan, Italy). The other chemicals were prepared in the laboratories of A.S. and M.G.P. [30–36].

2.2. Cell Culture

The human NCI-H929, KMS-12-BM, and SK-MM-2 multiple myeloma (MM) cells were obtained from The Leibniz Institute DSMZ (Braunschweig, Germany). Human RPMI-8226, U266 B1, MM1S, HEK-293, and HepG2 cells were purchased from ATCC (Manassas, VA, USA). huCOX-1-IRES-mPGES-1 cells were a gift from William Smith [37]. NCI-H929 cells were grown in RPMI-1640 supplemented with 20% fetal bovine serum, 50 µM β-mercaptoethanol, 1 mM sodium pyruvate, 2 mM glutamine, 100 U/mL penicillin, and 100 µg/mL streptomycin in a humidified incubator at 37 °C with a 5% CO2 atmosphere. KMS-12-BM and SK-MM-2 cells were grown in RPMI-1640 supplemented with 20% fetal bovine serum, 2 mM glutamine, 100 U/mL penicillin, and 100 µg/mL streptomycin in a humidified incubator at 37 °C with a 5% CO2 atmosphere. MM1S cells were grown in RPMI-1640 supplemented with 10% fetal bovine serum, 1 mM sodium pyruvate, HEPES 10 mM, 1.4 M glucose, 2 mM glutamine, 100 U/mL penicillin, and 100 µg/mL streptomycin in a humidified incubator at 37 °C with a 5% CO2 atmosphere. RPMI-8226 and U266 B1 cells were grown in RPMI-1640 supplemented with 10% fetal bovine serum, 2 mM glutamine, 100 U/mL penicillin, and 100 µg/mL streptomycin in a humidified incubator at 37 °C with a 5% CO2 atmosphere. Human HEK-293 embryonal kidney cells were grown in DMEM (Dulbecco’s Modified Eagle Medium) supplemented with 10% fetal bovine serum, 2 mM glutamine, 100 U/mL penicillin, and 100 µg/mL streptomycin in a humidified incubator at 37 °C with a 5% CO2 atmosphere. The human HepG2 hepatocellular carcinoma cells were grown in MEM supplemented with 10% fetal bovine serum, 2 mM glutamine, 100 U/mL penicillin, and 100 µg/mL streptomycin in a humidified incubator at 37 °C with a 5% CO2 atmosphere. HEK-293-COX-1 cells were grown in DMEM supplemented with 10% fetal bovine serum, 2 mM glutamine, 100 U/mL hygromycin B, and 100 µg/mL blasticidin in a humidified incubator at 37 °C with a 5% CO2 atmosphere.

2.3. Western Blotting

The western blotting analysis was conducted by using the methods reported in Perrone et al. with minor modifications [38]. Briefly, all cells were washed twice with 10 mL phosphate-buffered saline (PBS), scraped in 1 mL PBS and centrifuged for 10 min at 1500 rpm, 4 °C. Proteins were extracted from cells by homogenization in cold RIPA buffer (Sigma–Aldrich) containing 1X protease inhibitor cocktail and centrifuged at 14,000 rpm for 10 min at 4 °C. The supernatant was recovered, and the protein concentration was measured using a DC Protein Assay Reagent Kit (BIO-RAD, Hercules, CA, USA). Protein extract (25 µg) was separated on 10% polyacrylamide gel (BIO-RAD) and then transferred onto a polyvinylidene difluoride (PVDF) membrane by a Trans-Blot Turbo Transfer System (BIO-RAD). The membrane was blocked for 1 h at room temperature with blocking buffer (1% blotting grade blocker, 0.1% Tween-20 in Tris-buffered saline, TBS). The membrane was then incubated with either anti-COX-1 (1:500 mouse monoclonal, overnight at 4 °C), anti-LTA4H, anti-PGEs2, anti-CYP1A2, anti-CBR1 (1:200 mouse monoclonal, overnight at 4 °C), anti-ALOX-15-B (1:1000 rabbit monoclonal, 1 h at room temperature), or anti-17β-HSD4, anti-GGT1, anti-ACAA1, anti-vinculin, anti-β-actin (1:1000 mouse monoclonal, 1 h at room temperature) antibodies, diluted in blocking buffer. After the incubation time, the membrane was washed three times with washing buffer (0.1% Tween-20 in Tris-buffered saline, TBS) and incubated with a secondary peroxidase antibody (1:3000 anti-mouse or anti-rabbit) for 1 h at room temperature. After washing, the membrane was treated with enhanced chemiluminescence (ECL, BIO-RAD) according to the manufacturer’s instructions and the blot was visualized by UVITEC Cambridge (LifeTechnologies, Carlsbad, CA, USA).

The expression level was evaluated by densitometric analysis using UVITEC Cambridge software (LifeTechnologies), and the β-actin or vinculin expression level was used as a loading control to normalize the sample values.
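The normalization to a loading control amounts to a per-sample ratio. The following is a minimal sketch under stated assumptions: the function name, the lane labels, and the band intensities are all hypothetical, and the actual computation was performed by the UVITEC software, whose internals are not described here.

```python
def normalize_to_loading_control(target, control):
    """Divide each target band intensity by the loading-control intensity
    measured in the same sample (lane)."""
    return {sample: target[sample] / control[sample] for sample in target}

# Hypothetical densitometric readings in arbitrary units, not measured data.
target_band = {"lane_1": 1200.0, "lane_2": 800.0}
loading_control = {"lane_1": 1000.0, "lane_2": 400.0}

normalized = normalize_to_loading_control(target_band, loading_control)
print(normalized)
```

In this toy case the second lane has the weaker raw band but the higher normalized value, because less total protein was loaded in that lane.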

2.4. Non-Negative Matrix Factorization

NMF is a low-rank approximation technique that decomposes the original data matrix $X \in \mathbb{R}_+^{n \times m}$ into the product of two non-negative matrices $W$ and $H$ ($X \approx WH$, with $W \in \mathbb{R}_+^{n \times k}$ and $H \in \mathbb{R}_+^{k \times m}$), called the basis and encoding matrix, respectively, where $k$ is the factorization rank that indicates the number of hidden genes (metagenes) to extract. An illustrative example is reported in Figure 1. A microarray data matrix with 30 genes (rows) and 10 HMCLs (columns) was decomposed into the matrices $W \in \mathbb{R}_+^{30 \times 2}$ and $H \in \mathbb{R}_+^{2 \times 10}$.

The basis matrix $W$ contains the latent factors that are automatically extracted by the algorithm. In the context of microarray data analysis, these factors are called metagenes: groups of genes that emerge from the computation and are used to describe the original data in a lower-dimensional space. Each column of $W$ represents a metagene that is described by the same genes used in the original data. Each entry $w_{ij}$ is the membership value of the $i$-th gene in the $j$-th metagene. The metagenes represent the relationships among the genes. From the mathematical point of view, these are linear relationships that emerge from data. One gene can be involved in the definition of different bases (metagenes); indeed, these are not exclusive relationships. This reflects the biological nature of the metagenes, which could actually involve different genes depending on their role.

The encoding matrix $H$ represents the original data (HMCLs) in terms of the metagene expression profile. Each entry $h_{ij}$ denotes the importance of the $i$-th metagene in reconstructing the $j$-th HMCL.

The non-negativity constraint of NMF allows describing the HMCLs as a linear combination of these metagenes, weighted by the coefficients in $H$ (part-based decomposition).
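The part-based decomposition can be illustrated on synthetic data. The sketch below is a dependency-free toy version of multiplicative updates for the Euclidean objective; it assumes a random positive starting point rather than the NNDSVD initialization used in this study, and the helper names (`matmul`, `nmf`, `frobenius_error`) and the toy matrices are illustrative assumptions, not the authors' Matlab code.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(x, w, h):
    """Euclidean (Frobenius) distance between X and the product WH."""
    wh = matmul(w, h)
    return sum((p - q) ** 2 for xr, ar in zip(x, wh) for p, q in zip(xr, ar)) ** 0.5

def nmf(x, k, n_iter=200, seed=0, eps=1e-9):
    """Multiplicative updates for min ||X - WH||_F subject to W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    # Assumption: random positive start (the study uses NNDSVD instead).
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h

# Synthetic data with the shape of the Figure 1 example: 30 genes x 10 samples, rank 2.
rng = random.Random(1)
w_true = [[rng.random() for _ in range(2)] for _ in range(30)]
h_true = [[rng.random() for _ in range(10)] for _ in range(2)]
x = matmul(w_true, h_true)

w0, h0 = nmf(x, k=2, n_iter=0)  # initial factors only, for comparison
w, h = nmf(x, k=2)              # factors after 200 updates

# Sample 0 rebuilt as an additive combination of the two metagenes (columns of W).
col0 = [sum(w[g][i] * h[i][0] for i in range(2)) for g in range(30)]
print(frobenius_error(x, w0, h0), frobenius_error(x, w, h))
```

Because the updates only multiply non-negative quantities, `w` and `h` stay non-negative throughout, which is what makes each sample an additive mixture of metagenes rather than a signed combination as in PCA or SVD.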

Figure 1. Non-negative matrix factorization decomposition schema. Each column of X contains a different human myeloma cell line (HMCL) described by the expression profiles of the selected genes; each row represents the expression profile of the corresponding gene through the HMCLs.

Graphical representations of the metagene expression profiles could reveal different metagene behaviors for the considered HMCLs, as in Figure 1. Therefore, these patterns are used to automatically group the HMCLs based on their similarities.

For this study, the standard multiplicative update algorithm [39] based on the Euclidean distance was used. Moreover, since NMF algorithms are strictly dependent on the initial starting point, the non-negative double singular value decomposition (NNDSVD) algorithm was applied for initialization, since it was proven to improve the quality of clustering results [40].

The rank of the factorization is an input parameter of the problem. It is crucial for the results since it indicates the number of metagenes and clusters that should be extracted. The consensus matrix gives a qualitative measurement of the stability of the algorithm. A quantitative measurement of this stability is given by the cophenetic correlation coefficient, which in gene expression analysis is used to quantify the stability of the clustering results [10].

3. Results and Discussion

In this section, we report and discuss the results obtained by NMF on the selected dataset (Section 3.1). Clustering results and the part-based representation of the HMCLs, in terms of the extracted metagenes, are first described and analyzed in Section 3.2. Section 3.3 describes the hidden metagenes and their role in arachidonic acid metabolism; finally, Section 3.4 reports the western blotting analysis used to validate the NMF results.

3.1. Microarray Data

A microarray data matrix containing the gene expression profiles of 40 HMCLs was used to study the genes involved in arachidonic acid metabolism, in order to detect gene expression profile patterns likely connected to the different multiple myeloma types. All the numerical results were obtained by implementing the algorithms in Matlab 8.4 and running them on a machine equipped with an Intel® Core 2 Duo CPU and 8.00 GB of RAM.

Microarray data of multiple myeloma cell lines available on the public database ArrayExpress [41] were analyzed. The microarray data were collected using the Affymetrix GeneChip Human Genome U133 Plus 2.0 [HG-U133-Plus-2]. The analysis was conducted on a subset of the original data, starting from the 40 human myeloma cell lines (HMCLs) that provided a signature for stratification of patient risk [41] and the 64 genes involved in arachidonic acid metabolism, as depicted in Figure 2.

A data matrix was built with 64 rows (genes) and 40 columns (HMCLs) (Figure 3). It was evident that the dimensionality of the data was too high to directly identify any GEP pattern. Hence, a dimensionality reduction was needed to bring out useful information that could be easily visualized through the part-based representation.

Figure 2. Metabolism of arachidonic acid (AA), which is released from the membrane upon mechanical or mitogen stimuli. Eicosanoids are the major products of AA metabolism. In this figure, the biosynthetic pathways of the main eicosanoids, together with the genes coding for the proteins involved, are differently colored: cyclooxygenase (COX) (red), 5-lipoxygenase (5-LOX) (purple), 15-LOX (green), 12-LOX (yellow), cytochrome P (CYP) epoxygenase (orange), CYP ω-hydroxylase (cyan). Modified image from [35].



Figure 3. Visual representation of the data matrix. Different colors are used to represent the gene expression level: yellow shades for high expression, green shades for medium expression, blue shades for low or no expression.

3.2. HMCL Clustering Results

Fifty runs of the NMF algorithm were executed while varying the rank value in the interval 2–10 [42,43], and both the consensus matrices and the cophenetic correlation coefficient were collected from the analysis of the encoding matrix H. Whilst the consensus matrix gives a qualitative measurement of the cluster quality, the cophenetic correlation coefficient is a quantitative measure, and its trend is used to choose the optimal rank (i.e., the number of clusters).

Figure 4 shows the trend of the cophenetic correlation coefficient in the interval 2–10 (each point represents the mean value of the cophenetic correlation coefficient over the fifty runs). From rank five onward we observed a significant reduction of this trend, so five was chosen as the factorization rank.
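How a cophenetic correlation coefficient can be derived from a consensus matrix is sketched below. This is an illustrative assumption about the procedure, not the authors' implementation: the distance between two samples is taken as one minus their consensus value, the samples are clustered with naive average linkage, and the coefficient is the Pearson correlation between the original distances and the merge heights. The function names and the two small consensus matrices are hypothetical.

```python
def average_linkage_cophenetic(dist):
    """Naive average-linkage clustering; returns the cophenetic distance
    (merge height) for every pair of samples."""
    n = len(dist)
    clusters = {i: [i] for i in range(n)}
    coph = [[0.0] * n for _ in range(n)]
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for s in range(len(ids)):
            for t in range(s + 1, len(ids)):
                a, b = clusters[ids[s]], clusters[ids[t]]
                d = sum(dist[p][q] for p in a for q in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, ids[s], ids[t])
        d, i, j = best
        for p in clusters[i]:
            for q in clusters[j]:
                coph[p][q] = coph[q][p] = d
        clusters[i] = clusters[i] + clusters.pop(j)
    return coph

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)  # assumes neither vector is constant

def cophenetic_correlation(consensus):
    n = len(consensus)
    dist = [[1.0 - consensus[a][b] for b in range(n)] for a in range(n)]
    coph = average_linkage_cophenetic(dist)
    pairs = [(a, b) for a in range(n) for b in range(a + 1, n)]
    return pearson([dist[a][b] for a, b in pairs], [coph[a][b] for a, b in pairs])

# Hypothetical consensus matrices for four samples (not data from the study).
crisp = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
blurred = [[1.0, 0.6, 0.4, 0.2], [0.6, 1.0, 0.5, 0.3],
           [0.4, 0.5, 1.0, 0.7], [0.2, 0.3, 0.7, 1.0]]
rho_crisp = cophenetic_correlation(crisp)
rho_blurred = cophenetic_correlation(blurred)
print(rho_crisp, rho_blurred)
```

A consensus matrix made only of zeros and ones yields a coefficient of one, while ambiguous co-clustering lowers it; this is why a drop in the coefficient as the rank grows is read as a loss of clustering stability.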

Figure 4. Cophenetic index values for ranks 2–10, as a function of the factorization rank.

The consensus matrix in Figure 5 represents the HMCLs (rows and columns) grouped according to the similarities of their metagene expression profiles. Five clusters (yellow squares) are depicted, showing some regularities within the groups. The algorithm proved to be stable, converging to the same solution in all 50 runs. This result is shown in Figure 5, where the squares are clearly separated and only yellow and blue values are present.
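The link between run-to-run stability and a two-colored consensus matrix can be made concrete with a small sketch. It assumes, as is common practice but not stated explicitly here, that each sample is assigned to the metagene with the largest coefficient in its column of H; the function names and the label lists are hypothetical.

```python
def cluster_labels(h):
    """Assign each sample (column of H) to the metagene with the largest coefficient."""
    k, m = len(h), len(h[0])
    return [max(range(k), key=lambda i: h[i][j]) for j in range(m)]

def consensus_matrix(label_runs):
    """Entry (a, b) is the fraction of runs in which samples a and b share a cluster."""
    m = len(label_runs[0])
    cons = [[0.0] * m for _ in range(m)]
    for labels in label_runs:
        for a in range(m):
            for b in range(m):
                if labels[a] == labels[b]:
                    cons[a][b] += 1.0
    return [[v / len(label_runs) for v in row] for row in cons]

# Labels from a tiny hypothetical encoding matrix (2 metagenes x 2 samples).
example_labels = cluster_labels([[0.9, 0.1], [0.2, 0.7]])

# Three hypothetical runs that find the same partition under different label names.
stable_runs = [[0, 0, 1, 1, 2], [2, 2, 0, 0, 1], [1, 1, 2, 2, 0]]
stable = consensus_matrix(stable_runs)

# Replacing the third run with a different partition blurs the matrix.
unstable = consensus_matrix(stable_runs[:2] + [[0, 1, 1, 1, 2]])
print(stable[0][1], unstable[0][1], unstable[1][2])
```

When every run yields the same partition, all entries are exactly zero or one regardless of how the clusters are numbered, which corresponds to a consensus plot with only two colors.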

Figure 5. Consensus matrix obtained with 50 runs of non-negative matrix factorization (NMF) and k = 5.

It is known that NMF allows a part-based representation of the original data by a linear additive composition of parts that can be interpreted as building blocks. In this scenario, the ‘blocks’ are the metagenes. Figure 6 shows the composition of the HMCLs in terms of metagenes in each of the five groups returned by the consensus matrix. The whole bar is given by the sum of the single metagene bars, expressed as a percentage.

The emerging patterns are now clearer, thanks to the dimensionality reduction and the clustering. They allow the study of both the metagene expression profiles through the HMCLs and the different behaviors of the metagenes among the clusters. This simplified representation helps the domain experts to better understand the relationships hidden in the data. It is a starting point for conducting the biological analysis.

Figure 6 shows that the HMCLs in the first cluster are characterized by a high expression of the first metagene (dark blue bars), and the HMCLs in the second cluster by a high expression of the third metagene (green bars). The HMCLs in the third cluster are characterized by a high expression of the fourth metagene (yellow bars), and the HMCLs in clusters four and five have a very high expression of metagenes two (light blue) and five (dark yellow), respectively.

Furthermore, it is noteworthy that, even though metagenes two and four are most highly expressed in clusters four and three, respectively, they are expressed in all the HMCLs. Note that, even if this representation has allowed the identification of the most expressed metagenes per HMCL, the analysis has to take into account the less expressed metagenes as well. Indeed, all the extracted metagenes contribute to defining each HMCL, thus they cannot be removed as noise. In fact, the dimensionality reduction has already removed the noise from the data, whilst it has highlighted the relationships among genes and HMCLs. The emerged patterns are then used by the domain experts involved in the data analysis process (biologists and physicians in this case) to better understand their role in arachidonic acid metabolism, as described in Section 3.4. The expression profiles of the single metagenes are depicted (with the same color code used before) in Appendix A, reporting the same behavior summarized in Figure 6.

Figure 6. Metagene expression profile through the HMCLs.
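The percentage composition described for Figure 6 can be illustrated with a minimal sketch. It assumes that each column of the encoding matrix H (metagenes by HMCLs) is rescaled so that its entries sum to 100; the text does not state the exact normalization used, so this is only one plausible reading. The function name `metagene_percentages` and the toy matrix `H_toy` are hypothetical and are not taken from the study data.

```python
import numpy as np


def metagene_percentages(H):
    """Rescale each column of H (metagenes x samples) so that it sums to 100.

    Assumption: the stacked bars are obtained by dividing every sample's
    metagene coefficients by their sum. Columns that sum to zero are left at 0.
    """
    H = np.asarray(H, dtype=float)
    totals = H.sum(axis=0, keepdims=True)
    safe_totals = np.where(totals == 0, 1.0, totals)  # avoid division by zero
    return 100.0 * H / safe_totals


# Hypothetical toy encoding matrix: 3 metagenes (rows) x 4 samples (columns).
H_toy = np.array([
    [4.0, 1.0, 0.0, 2.0],
    [1.0, 3.0, 1.0, 2.0],
    [0.0, 1.0, 3.0, 4.0],
])

pct = metagene_percentages(H_toy)
# Index of the most expressed metagene for each sample (0-based).
dominant = pct.argmax(axis=0)
```

In this sketch, `pct` gives the height of each colored segment of a sample's bar, and `dominant` identifies the metagene that characterizes the sample most strongly; the remaining segments are kept, in line with the remark that less expressed metagenes should not be discarded as noise.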

Moreover, visual analysis allows us to detect some subgroups within the groups. Some HMCLs of the first cluster, namely XG24, XG3, XG2, U266, LP1, XG12, XG13, and XG10, have a medium expression of metagene 3 (the green one), unlike the other HMCLs of the same group. In the second cluster, the HMCLs XG4, XG6, and NAN6 use metagene 4 (the yellow bars), and in the same cluster we recognize another subgroup, composed of the HMCLs XG6, L363, KMS12BM, MDN, NAN1, and NAN6, that has a medium expression of metagene 2 (light blue). Differences among HMCLs in the same group are the object of study for the domain experts.
At this point, it appeared interesting to analyze the genes forming each metagene. Since our analysis started from Moreaux's classification of HMCLs [41], it seemed important to compare his results with our clusters. According to Moreaux's study, the 40 HMCLs could be clustered into six groups using unsupervised hierarchical clustering, whereas our clustering led to five clusters. This difference was predictable: Moreaux's analysis was conducted on a large set of genes, while our analysis is focused on a smaller subset, the genes involved in arachidonic acid metabolism.
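A comparison between two groupings of the same cell lines, such as the one discussed above, is often summarized with a contingency table. The sketch below is only illustrative: the helper `contingency_table` and the label lists are hypothetical placeholders and do not reproduce either the reference classification or the NMF clusters of this study.

```python
from collections import Counter


def contingency_table(labels_a, labels_b):
    """Count how many items fall in each (group in A, group in B) pair."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    counts = Counter(zip(labels_a, labels_b))
    rows = sorted(set(labels_a))
    cols = sorted(set(labels_b))
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


# Hypothetical labels for six items (not the real HMCL assignments).
reference_groups = ["g1", "g1", "g2", "g2", "g3", "g3"]
nmf_clusters = [1, 1, 1, 2, 2, 2]

rows, cols, table = contingency_table(reference_groups, nmf_clusters)
```

Each row of `table` shows how one reference group is distributed over the NMF clusters; a row with a single non-zero entry indicates a group that is preserved intact, whereas a row spread over several columns indicates a group that is split.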

3.3. Metagene Analysis
The encoding matrix H was used to cluster the HMCLs and to extract the metagene expression profiles, whereas the basis matrix W contains the metagenes, represented as sets of genes with different weights. We extracted five different metagenes. Figure 7 depicts the top 20 genes involved in each metagene. The genes have been ordered and visualized according to their influence (expressed as a percentage) on each metagene. For a broader analysis, twenty genes are reported even though we observed the highest weights for the first five or six genes. These graphs give the domain experts more insight and help to drive the biological analysis of the HMCLs.
As an example, analyzing Figure 5 we know that the HMCL KMM1 is mostly composed of the expression of metagene 1, partly of metagene 2, and minimally of metagene 4. Figure 7 specifies the genes that contribute to the definition of each metagene. The genes PTGES2, HSD17B4, LTA4H, ACAA, CBRI, and PPARD are the heaviest in the definition of metagene 1, so the expression of these genes highly affects the HMCL KMM1; the genes MGST2, GGT1, SCP2, FPR2, PTGES3, PLAG10, and CYP2W1 from metagene 2 only partly influence KMM1; and finally, the genes CYSLTR1, CBR1, CYP2J2, EHHADH, CYP1A2, and ACOX3 from metagene 4 only slightly condition KMM1. It is possible to conduct a similar analysis on the other cell lines, looking for a relationship among the genes and the cellular functions they are involved in.
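The ranking of genes within a metagene can be sketched as follows. This is a minimal illustration that assumes the "influence (expressed as a percentage)" is the weight of a gene in a column of W divided by the sum of that column; the paper does not spell out the formula, so this is an assumption. The function `top_genes`, the matrix `W_toy`, and the gene labels are hypothetical.

```python
import numpy as np


def top_genes(W, gene_names, k, n_top=20):
    """Return the n_top genes of metagene k with their percentage weights.

    Assumption: the percentage of a gene is its weight in column k of W
    (genes x metagenes) divided by the column total.
    """
    W = np.asarray(W, dtype=float)
    column = W[:, k]
    total = column.sum()
    pct = 100.0 * column / total if total > 0 else np.zeros_like(column)
    # Stable sort on the negated values keeps the input order among ties.
    order = np.argsort(-pct, kind="stable")[:n_top]
    return [(gene_names[i], float(pct[i])) for i in order]


# Hypothetical toy basis matrix: 4 genes (rows) x 2 metagenes (columns).
W_toy = np.array([
    [6.0, 0.0],
    [3.0, 1.0],
    [1.0, 4.0],
    [0.0, 5.0],
])
genes = ["geneA", "geneB", "geneC", "geneD"]

ranked = top_genes(W_toy, genes, k=0, n_top=2)
```

With real data, `n_top=20` would correspond to the twenty genes reported per metagene, and the rapid drop in percentage after the first few entries would reflect the observation that the highest weights belong to the first five or six genes.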


Figure 7. Composition of the metagenes MG1, MG2, MG3, MG4, and MG5 in terms of the original genes.

The taxonomy in [24] can give more insight into the role of the genes in each metagene. For example, the genes in the first metagene mostly belong to the cyclooxygenases and related enzymes involved in eicosanoid biosynthesis in humans, enzymes involved in eicosanoid catabolism in humans, eicosanoid receptors, and transcription factors in humans. Similar analyses could be conducted on the other metagenes.

3.4. Proofs of Concepts: Western Blotting Analysis
With the aim of verifying the correctness of the NMF results, the expression of some proteins encoded by the top twenty genes of the identified metagenes was determined in six representative HMCLs (U266, SKMM2, KMS12BM, NCIH929, RPMI8226, MM1S) by using the western blotting technique [44–47]. One or two HMCLs were selected from each cluster, based on their different responses to the clinically used drugs. In Figure 5, the expression of some specific proteins, analyzed by western blotting and involved in the eicosanoid biosynthesis related to the arachidonic acid pathway of various human myeloma cell lines, is depicted. The expression values of each protein were assigned as listed in Figure 8.


Figure 8. Protein expression in all studied cell lines. Expression values are reported with respect to the HEK293 wild type or HepG2 cell line, used as a positive control as suggested by the antibody data sheet, to which the value 1 has been arbitrarily assigned. Color codes are the same used for metagenes 1–5 depicted in Appendix A. All data are related to HEK293-COX-1; the value 1 was assigned to the expression of COX-1. NF = not found.

U266-B1 is a B lymphocyte plasmacytoma cell line belonging to cluster 1 and characterized by the expression of MG1-2-3-5 (as depicted by the colored boxes under the cell line name, Figure 8). HSD17B4, the gene coding for the protein HSD417B, was expressed at 8% in MG1, at 3% in MG3, at 5% in MG4, and at 4% in MG5 (gene percentages are indicated in the colored boxes, Figures 7 and 8), but its expression (0.37) in U266-B1 was due only to MG1, 3, and 5. In fact, U266-B1 (cluster 1, Figure 5) was not represented by MG4, and HSD17B4 was not present in the top 20 genes of MG2.
In U266-B1, COX-1 was not detected by western blot analysis (Figure 6), thus it was supposed that PTGS1 is downregulated in this cell line.
GGT1, expressed at 8% in MG2, encodes the corresponding protein, which was found with a value of 0.18 in U266-B1. Comparable expression values, 0.17 and 0.26, were found for LTA4H and ALOX15B, respectively. LTA4H was represented by all five MGs, with a preponderant contribution of MG1, in which LTA4H was expressed at 7% of all genes. ALOX15B was found in MG1, 3, 4, and 5, but its expression was higher in MG3 (7%).
Among the analyzed proteins, PGES2 and ACAA1 were highly expressed in U266-B1, with values of 19.9 and 2.17, respectively. PGES2 was found in MG1, 2, and 5, with no involvement of MG3 and 4, which do not compose the U266-B1 cell line. ACAA1 derives from MG1, 3, and 5, with no contribution of MG2 and 4. CBR1 and CYP1A2 were not detected in U266-B1, even though their corresponding genes were represented in MGs 1–5. Their absence could be explained by the downregulation of CBR1 and CYP1A2.
SK-MM-2 and KMS-12-BM, both belonging to cluster 2 (Figure 5), had a different metagene composition, as depicted by the colored boxes under the cell line name (Figure 8). SK-MM-2, a B lymphocyte plasmacytoma cell line, was mostly composed of metagenes 3 and 1, with a small contribution from metagene 2. From western blotting analysis (Figure 9) of this cell line, it was evident that all the proteins encoded by the genes of metagenes 2 and 3 were expressed, except for CYP1A2.
KMS-12-BM, a multiple myeloma cell line, had an almost similar contribution from all metagenes with the exception of metagene 5. In this case, PGES2 was the most highly expressed protein, while no expression was detected for COX-1, CBR1, and CYP1A2 (Figure 9).
NCI-H929, another multiple myeloma cell line, belongs to cluster 3, with a marked expression of genes of metagene 4 and, in part, of metagenes 1, 3, and 5. In this cell line too, the expression of PGES2 was high (439) with respect to the positive control. CBR1 and CYP1A2 were not found in U266-B1, while a good expression of ALOX15B, HSD417B, ACAA1, and COX-1 was observed. In this cell line, the COX-1 expression can be considered good given that the reference cell line was stimulated to express COX-1.

RPMI-8226, belonging to cluster 4, was composed of the expression of genes belonging to metagene 2. Concerning HSD17B4, ACAA1, and ALOX15B, since their encoding genes were not present in the top 20 genes representing MG2, we did not expect to find them. Surprisingly, western blotting experiments (Figure 9) revealed the expression of HSD17B4, ACAA1, and ALOX15B with values of 0.44, 0.27, and 0.48, respectively. Their presence is explained by the fact that their encoding genes are actually present but with a percentage of expression less than 1%, and are therefore not represented in Figure 8.
The last, but not least important, cluster was represented by the cell line MM1S, with a marked component of metagene 5 but a low contribution of metagenes 1, 2, and 5. This is the only HMCL used in which CYP1A2 was expressed, and with a value comparable to the positive control. Similar expression values were found for the other proteins, with the exception of GGT1 and ACAA1.

Figure 9. Expression of some representative proteins involved in the eicosanoid biosynthetic pathways of six human myeloma cell lines analyzed by western blotting. HEK-293 and HepG2 were used as positive controls for 17-β-HSD4, LTA4H, PGES2, ALOX-15B, GGT1 and ACAA1 expressions, respectively. HEK293-COX-1 cells treated with 10 µg/mL tetracycline for 24 h were used as a positive control for COX-1 expression. β-actin and vinculin were used as loading controls for 17-β-HSD4, COX-1, GGT1, LTA4H, ALOX-15B, and ACAA1, CBR1, CYP1A2, PGES2 respectively, except for the MM1S cell line where vinculin was used as the loading control for all the examined proteins. The image is the result of different western blotting experiments that were lined up in one single image.
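Expression values reported relative to a positive control set to 1 can be illustrated with a common densitometry convention: each band is first divided by its loading control, and the result is then divided by the same ratio computed for the positive control lane. This is an assumption made for illustration only; the text does not describe how the values in Figure 8 were computed. The function `relative_expression` and all intensities below are hypothetical.

```python
def relative_expression(band, loading, control_band, control_loading):
    """Loading-normalized band intensity relative to a positive control.

    Assumption: value = (band / loading) / (control_band / control_loading),
    so the positive control itself scores exactly 1. A band of None stands
    for a protein that was not found (NF) and is passed through unchanged.
    """
    if band is None:
        return None
    if loading <= 0 or control_band <= 0 or control_loading <= 0:
        raise ValueError("reference intensities must be positive")
    return (band / loading) / (control_band / control_loading)


# Hypothetical intensities in arbitrary densitometry units.
sample_value = relative_expression(50.0, 100.0, 200.0, 100.0)
control_value = relative_expression(200.0, 100.0, 200.0, 100.0)
missing_value = relative_expression(None, 100.0, 200.0, 100.0)
```

Under this convention a value below 1 means the protein is less abundant than in the positive control lane, a value above 1 means it is more abundant, and an undetected band is kept as a missing value rather than forced to zero.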


4. Conclusions
Herein, a case study showing the use of NMF as a tool to intelligently analyze microarray data was proposed. In particular, the aim of the experiments was to study the genes involved in arachidonic acid metabolism, in order to detect gene patterns and relationships among the HMCLs that could be related to the different gene expression profiles of multiple myeloma. For this purpose, numerical experiments were carried out on real microarray gene expression data from 40 human multiple myeloma cell lines, selecting a subset of genes related to arachidonic acid metabolism and applying the NMF algorithm to the resulting genes–HMCLs matrix. The experiments showed the effectiveness of NMF in intelligently analyzing microarray data to support the domain experts' activities.
Through NMF we were able to select, using the genes involved in the metabolism of arachidonic acid, a certain number of multiple myeloma cell lines. This characterization has a high value from a biological point of view because it allows experiments to be performed on the most appropriate cell line. For example, the information obtained from this study led to the selection of NCIH929 and RPMI-8226 as the two most appropriate cell lines to investigate structure–inhibitory activity relationships of a set of novel compounds and the cyclooxygenase (COX) catalytic activity. Moreover, NMF results have been verified by western blotting analysis, in six HMCLs, of proteins expressed by some of the most abundantly expressed genes. Our data confirm the correctness of the NMF outcomes for the six HMCLs.
Further experiments are ongoing to increase the number of HMCLs and to correlate them with patients' disease profiles, in order to verify the presence of patterns that link different stages and grades of multiple myeloma, and the responses to the provided drug therapies.

Author Contributions: G.C. performed computational studies; M.L.P. performed biological studies; M.C. and A.P. designed and supervised G.C.'s work; A.V. provided the MM1S cell line and contributed to the writing of the article; A.S. and M.G.P. conceived the idea and supervised the entire work.

Funding: This work was supported by First AIRC Grant-MFAG2015 (Project Id. 17566).

Acknowledgments: Gabriella Casalino is a member of the INdAM research group GNCS (Gruppo Nazionale per il Calcolo Scientifico) of Istituto Nazionale di Alta Matematica Francesco Severi, P.le Aldo Moro, Roma, Italy.

Conflicts of Interest: The authors declare no conflict of interest.

Appendix A
Metagene expression profiles through the HMCLs are reported in Figures A1–A5.


Figure A1. Metagene 1 expression profile through the HMCLs.

Figure A2. Metagene 2 expression profile through the HMCLs.

Figure A3. Metagene 3 expression profile through the HMCLs.

Figure A4. Metagene 4 expression profile through the HMCLs.

Figure A5. Metagene 5 expression profile through the HMCLs.

References

1. Berthold, M.; Hand, D.J. Intelligent Data Analysis: An Introduction; Springer: New York, NY, USA; Secaucus, NJ, USA, 2007.
2. Berthold, M.R.; Borgelt, C.; Hoppner, F.; Klawonn, F. Guide to Intelligent Data Analysis: How to Intelligently Make Sense of Real Data, 1st ed.; Springer: Berlin/Heidelberg, Germany, 2010.
3. Nieto, J.J.; Torres, A.; Georgiou, D.N.; Karakasidis, T.E. Fuzzy polynucleotide spaces and metrics. Bull. Math. Biol. 2006, 68, 703–725.
4. Casalino, G.; Del Buono, N.; Mencar, C. Non-Negative Matrix Factorizations for Intelligent Data Analysis. In Non-Negative Matrix Factorization Techniques, Signals and Communication Technology; Naik, G.R., Ed.; Springer: Berlin/Heidelberg, Germany, 2016; ISBN 978-3-662-48330-5.
5. Casalino, G.; Gillis, N. Sequential dimensionality reduction for extracting localized features. Pattern Recognit. 2017, 63, 15–29.
6. Boccarelli, A.; Esposito, F.; Coluccia, M.; Frassanito, M.A.; Vacca, A.; Del Buono, N. Improving knowledge on the activation of bone marrow fibroblasts in MGUS and MM disease through the automatic extraction of genes via a nonnegative matrix factorization approach on gene expression profiles. J. Transl. Med. 2018, 16, 217–233.

7. Esposito, F.; Gillis, N.; Del Buono, N. Orthogonal joint sparse NMF for microarray data analysis. J. Math. Biol. 2019, 79, 223.
8. Casalino, G.; Castiello, C.; Del Buono, N.; Mencar, C. A framework for intelligent Twitter data analysis with non-negative matrix factorization. Int. J. Web Inf. Syst. 2018, 14, 334–356.
9. Filippone, M.; Camastra, F.; Masulli, F.; Rovetta, S. A survey of kernel and spectral methods for clustering. Pattern Recognit. 2008, 41, 176–190.
10. Brunet, J.P.; Tamayo, P.; Golub, T.R.; Mesirov, J.P. Metagenes and molecular pattern discovery using matrix factorization. Proc. Natl. Acad. Sci. USA 2004, 101, 4164–4169.
11. Marin, J.J.; Briz, O.; Monte, M.J.; Blazquez, A.G.; Macias, R.I. Genetic variants in genes involved in mechanisms of chemoresistance to anticancer drugs. Curr. Cancer Drug Targets 2012, 12, 402–438.
12. Lombardi, L.; Poretti, G.; Mattioli, M.; Fabris, S.; Agnelli, L.; Bicciato, S.; Lambertenghi-Deliliers, G. Molecular characterization of human multiple myeloma cell lines by integrative genomics: Insights into the biology of the disease. Genes Chromosomes Cancer 2007, 46, 226–238.
13. Richardson, P.G.; Sonneveld, P.; Schuster, M.W.; Irwin, D.; Stadtmauer, E.A.; Facon, T.; Harousseau, J.L.; Ben-Yehuda, D.; Lonial, S.; San Miguel, J.F.; et al. Safety and efficacy of bortezomib in high-risk and elderly patients with relapsed multiple myeloma. Br. J. Haematol. 2007, 137, 429–435.
14. Fonseca, R.; Bergsagel, P.L.; Drach, J.; Shaughnessy, J.; Gutierrez, N.; Stewart, A.K.; Morgan, G.; Van Ness, B.; Chesi, M.; Minvielle, S.; et al. International Myeloma Working Group molecular classification of multiple myeloma: Spotlight review. Leukemia 2009, 23, 2210–2221.
15. Vangsted, A.; Klausen, T.W.; Vogel, U. Genetic variations in multiple myeloma II: Association with effect of treatment. Eur. J. Haematol. 2012, 88, 93–117.
16. Kumar, S.; Rajkumar, S.V. Many facets of bortezomib resistance/susceptibility. Blood 2008, 112, 2177–2178.
17. Rajkumar, S.V.; Dimopoulos, M.A.; Palumbo, A.; Blade, J.; Merlini, G.; Mateos, M.V.; Kumar, S.; Hillengass, J.; Kastritis, E.; Richardson, P.; et al. International Myeloma Working Group updated criteria for the diagnosis of multiple myeloma. Lancet Oncol. 2014, 15, 538–548.
18. Rajkumar, S.V. Myeloma today: Disease definitions and treatment advances. Am. J. Hematol. 2016, 91, 90–100.
19. Mitra, A.K.; Harding, T.; Mukherjee, U.K.; Jang, J.S.; Li, Y.; HongZheng, R.; Jen, J.; Sonneveld, P.; Kumar, S.; Kuehl, W.M.; et al. A gene expression signature distinguishes innate response and resistance to proteasome inhibitors in multiple myeloma. Blood Cancer J. 2017, 7, e581.
20. Williams, T.J.; Peck, M.J. Role of prostaglandin-mediated vasodilatation in inflammation. Nature 1977, 270, 530–532.
21. Pai, R.; Soreghan, B.; Szabo, I.L.; Pavelca, M.; Baatar, D.; Tarnawski, A.S. Prostaglandin E2 transactivates EGF receptor: A novel mechanism for promoting colon cancer growth and gastrointestinal hypertrophy. Nat. Med. 2002, 8, 289–293.
22. Salcedo, R.; Zhang, X.; Young, H.A.; Michael, N.; Wasserman, K.; Ma, W.H. Angiogenic effects of prostaglandin E2 are mediated by up-regulation of CXCR4 on human microvascular endothelial cells. Blood 2003, 102, 1966–1977.
23. Perrone, M.G.; Scilimati, A.; Simone, L.; Vitale, P. Selective COX-1 inhibition: A therapeutic target to be reconsidered. Curr. Med. Chem. 2010, 17, 3769–3805.
24. Palumbo, A.; Cavo, M.; Bringhen, S.; Zamagni, E.; Romano, A.; Patriarca, F. Aspirin, Warfarin, or Enoxaparin thromboprophylaxis in patients with multiple myeloma treated with Thalidomide: A Phase III, Open-Label, Randomized Trial. J. Clin. Oncol. 2011, 29, 986–993.
25. Baz, R.; Li, L.; Kottke-Marchant, K.; Srkalovic, G.; McGowan, B.; Yiannaki, E. The Role of Aspirin in the Prevention of Thrombotic Complications of Thalidomide and Anthracycline-Based Chemotherapy for Multiple Myeloma. Mayo Clin. Proc. 2005, 80, 1568–1574.
26. Zonder, J.A.; Barlogie, B.; Durie, B.G.; McCoy, J.; Crowley, J.; Hussein, M.A. Thrombotic complications in patients with newly diagnosed multiple myeloma treated with lenalidomide and dexamethasone: Benefit of aspirin prophylaxis. Blood 2006, 108, 403–404.
27. Siegel, R.L.; Miller, K.D.; Jemal, A. Cancer statistics, 2015. CA Cancer J. Clin. 2015, 65, 5–29.

28. Cuzick, J.; Thorat, M.A.; Bosetti, C.; Brown, P.H.; Burn, J.; Cook, N.R.; Ford, L.G.; Jacobs, E.J.; Jankowski, J.A.; La Vecchia, C.; et al. Estimates of benefits and harms of prophylactic use of aspirin in the general population. Ann. Oncol. 2015, 26, 47–57. [CrossRef][PubMed] 29. Perrone, M.G.; Malerba, P.; Uddin, M.J.; Vitale, P.; Panella, A.; Crews, B.C.; Daniel, C.K.; Ghebreselasie, K.; Nickels, M.; Tantawy, M.N.; et al. PET radiotracer [18F]-P6 selectively targeting COX-1 as a novel biomarker in ovarian cancer: Preliminary investigation Eur. J. Med. Chem. 2014, 80, 562–568. 30. Vitale, P.; Perna, F.M.; Perrone, M.G.; Scilimati, A. Screening on the use of Kluyveromyces marxianus CBS 6556 growing cells as enantioselective biocatalyst for ketones reduction. Tetrahedron Asymmetry 2011, 22, 1985–1993. [CrossRef] 31. Catalano, A.; Carocci, A.; Corbo, F.; Franchini, C.; Muraglia, M.; Scilimati, A.; De Bellis, M.; De Luca, A.; Camerino Conte, D.; Sinicropi, M.S.; et al. Constrained analogues of tocainide as potent skeletal muscle sodium channel blockers towards the development of antimyotonic agents. Eur. J. Med. Chem. 2008, 43, 2535–2540. [CrossRef] 32. Di Nunno, L.; Vitale, P.; Scilimati, A.; Simone, L.; Capitelli, F. Stereoselective dimerization of 3-arylisoxazoles to cage-shaped bis-beta-lactams syn 2,6-diaryl-3,7-diazatricyclo[4.2.0.02,5]- octan-4,8-diones induced by hindered lithium amides. Tetrahedron 2007, 63, 12388–12395. [CrossRef] 33. Vitale, P.; Scilimati, A. Functional 3-Arylisoxazoles and 3-Aryl-2-isoxazolines from reaction of aryl nitrile oxides and enolates: Synthesis and reactivity. Synthesis 2013, 45, 2940–2948. [CrossRef] 34. Perrone, M.G.; Santandrea, E.; Bleve, L.; Vitale, P.; Colabufo, N.A.; Jockers, R.; Milazzo, F.M.; Sciarroni, A.F.; Scilimati, A. Stereospecific synthesis and bio-activity of novel beta3-adrenoceptor agonists and inverse agonists. Bioorg. Med. Chem. 2008, 16, 473–2488. [CrossRef][PubMed] 35. 
Perrone, M.G.; Santandrea, E.; Scilimati, A.; Tortorella, V.; Capitelli, F.; Bertolasi, V. Baker’s yeast-mediated reduction of ethyl 2-(4-chlorophenoxy)-3-oxoalkanoates suitable intermediates for potential PPARalpha ligands. Tetrahedron Asymmetry 2004, 15, 3501–3510. [CrossRef] 36. Perrone, M.G.; Santandrea, E.; Dell’Uomo, N.; Giannessi, F.; Milazzo, F.M.; Sciarroni, A.F.; Scilimati, A.; Tortorella, V. Synthesis and Biological Evaluation of New Clofibrate Analogues as Potential PPARalpha Agonists. Eur. J. Med. Chem. 2005, 40, 143–154. [CrossRef][PubMed] 37. Yuan, C.; Smith, W.L. A Cyclooxygenase-2-dependent Prostaglandin E2 Biosynthetic System in the Golgi Apparatus. J. Biol. Chem. 2015, 290, 5606–5620. [CrossRef] 38. Pati, M.L.; Vitale, P.; Ferorelli, S.; Iaselli, M.; Miciaccia, M.; Boccarelli, A.; Di Mauro, G.D.; Fortuna, C.G.; Souza Domingos, T.F.; Rodrigues Pereira da Silva, L.C.; et al. Translational impact of novel widely pharmacological characterized mofezolac-derived COX-1 inhibitors combined with bortezomib on human multiple myeloma cell lines viability. Eur. J. Med. Chem. 2019, 164, 59–76. [CrossRef] 39. Lee, D.D.; Seung, H. Learning the parts of objects by nonnegative matrix factorization. Nature 1999, 401, 788–791. [CrossRef] 40. Boutsidis, C.; Gallopoulos, E. SVD-based initialization: A head start for nonnegative matrix factorization. Pattern Recognit. 2008, 41, 1350–1362. [CrossRef] 41. Moreaux, J.; Klein, B.; Bataille, R.; Descamps, G.; Maïga, S.; Hose, D. A high-risk signature for patients with multiple myeloma established from the molecular classification of human myeloma cell lines. Haematologica 2011, 96, 574–582. [CrossRef] 42. Buczynski, M.W.; Dumlao, D.S.; Dennis, E.A. An integrated omics analysis of eicosanoid biology. J. Lipid Res. 2009, 50, 1505. [CrossRef] 43. Liu, W.; Yuan, K.; Ye, D. Reducing microarray data via nonnegative matrix factorization for visualization and clustering analysis. J. Biomed. Inform. 2008, 41, 602–606. 
[CrossRef] 44. Park, J.; Bae, E.K.; Lee, C.; Choi, J.H.; Jung, W.J.; Ahn, K.S.; Yoon, S.S. Establishment and characterization of bortezomib-resistant U266 cell line: Constitutive activation of NF-κB-mediated cell signals and/or alterations of ubiquitylation-related genes reduce bortezomib-induced apoptosis. BMB Rep. 2014, 47, 274–279. [CrossRef] [PubMed] 45. Chauhan, D.; Singh, A.V.; Ciccarelli, B.; Richardson, P.G.; Palladino, M.A.; Anderson, K.C. Combination of proteasome inhibitors bortezomib and NPI-0052 trigger in vivo synergistic cytotoxicity in multiple myeloma. Blood 2008, 111, 1654–1664. [CrossRef][PubMed] Appl. Sci. 2019, 9, 5552 18 of 18

46. Ling, S.C.; Lau, E.K.; Al-Shabeeb, A.; Nikolic, A.; Catalano, A.; Iland, H.; Horvath, N.; Ho, P.J.; Harrison, S.; Fleming, S.; et al. Response of myeloma to the proteasome inhibitor bortezomib is correlated with the unfolded protein response regulator XBP-1. Haematologica 2012, 97, 64–72. [CrossRef][PubMed] 47. Maïga, S.; Gomez-Bougie, P.; Bonnaud, S.; Gratas, C.; Moreau, P.; Le Gouill, S.; Pellat-Deceunynck, C.; Amiot, M. Paradoxical effect of lenalidomide on cytokine/growth factor profiles in multiple myeloma. Br. J. Cancer 2013, 108, 1801–1806. [CrossRef][PubMed]

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).