Meta-Analysis of Human Gene Expression in Response To
Total Page:16
File Type:pdf, Size:1020Kb
Wang et al. BMC Systems Biology (2018) 12:3 DOI 10.1186/s12918-017-0524-z RESEARCH ARTICLE Open Access Meta-analysis of human gene expression in response to Mycobacterium tuberculosis infection reveals potential therapeutic targets Zhang Wang1†, Seda Arat1,2†, Michal Magid-Slav1* and James R. Brown1* Abstract Background: With the global emergence of multi-drug resistant strains of Mycobacterium tuberculosis, new strategies to treat tuberculosis are urgently needed such as therapeutics targeting potential human host factors. Results: Here we performed a statistical meta-analysis of human gene expression in response to both latent and active pulmonary tuberculosis infections from nine published datasets. We found 1655 genes that were significantly differentially expressed during active tuberculosis infection. In contrast, no gene was significant for latent tuberculosis. Pathway enrichment analysis identified 90 significant canonical human pathways, including several pathways more commonly related to non-infectious diseases such as the LRRK2 pathway in Parkinson’sdisease,and PD-1/PD-L1 signaling pathway important for new immuno-oncology therapies. The analysis of human genome-wide association studies datasets revealed tuberculosis-associated genetic variants proximal to several genes in major histocompatibility complex for antigen presentation. We propose several new targets and drug-repurposing opportunities including intravenous immunoglobulin, ion-channel blockers and cancer immuno-therapeutics for development as combination therapeutics with anti-mycobacterial agents. Conclusions: Our meta-analysis provides novel insights into host genes and pathways important for tuberculosis and brings forth potential drug repurposing opportunities for host-directed therapies. Keywords: Tuberculosis, Host-direct therapies, Gene expression signature, Parkinson’s disease, Drug repurposing Background develop latent TB (LTB) infection, in which they are The causative agent of tuberculosis (TB), the bacterium asymptomatic with no clinical evidence of disease for Mycobacterium tuberculosis (Mtb), infects about one third years or decades [4]. During latent infection, Mtb is of the world’s population. In 2012, an estimated 8.6 mil- phagocytosed by macrophages, which trigger host im- lion individuals progressed to active disease and 1.3 mil- mune response involving the recruitment of additional lion died [1]. The prolonged duration of Mtb infection as macrophages and monocytes that ultimately form an or- well as the alarming emergence of multi-drug resistant ganized structure called granuloma [5]. Mtb is dormant strains makes the development of new and effective anti- and non-replicative within granuloma which suppresses tubercular therapeutics a global health priority [2, 3]. the immediate threat of active infection while evading The severity of TB is largely dependent upon the further immune response [6]. Approximately 5–10% of modulation of human cellular and immunity pathways LTB patients will go on to develop active pulmonary TB by the Mtb pathogen. The majority of infected patients (PTB), in which Mtb returns to the replication mode and provokes an active host immune response. When * Correspondence: [email protected]; [email protected] Mtb † the granuloma breaks down, sequestered can be re- Equal contributors leased into the airway and becomes transmissible to 1Computational Biology, Target Sciences, GlaxoSmithKline (GSK) R & D, Collegeville, PA 19426, USA other hosts. Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Wang et al. BMC Systems Biology (2018) 12:3 Page 2 of 18 Currently, the standard TB therapy involves a regimen based genome array platform; 2) there was at least one of four antibiotics taken during an initiation phase of control group and patient group in the dataset, with the two months and a continuation phase of 4–7 months [7, control group consisting of only healthy subjects, and the 8]. Despite a high efficacy at the onset of administration, patient group consisting of patients only infected by Mtb the confounded intervention and prolonged treatment without other diseases such as HIV; 3) each patient and period present challenges for patient compliance and po- control group had at least three samples. tentially foster the emergence of multidrug-resistance of Raw gene expression data, study design table and anno- Mtb. Thus, it is imperative to develop more effective tation table of each dataset were obtained from the GEO therapeutic approaches such as host-directed therapies. database and processed using ArrayStudio v8.0 (OmicSoft, Targeting the host has its advantages in potentially being USA). All datasets retrieved are microarray datasets ex- less susceptible to the drug resistance problem, with cept for GSE41055, which is an exon array and was ex- greater opportunities for repositioning known drugs to cluded from further analysis. Several datasets (GSE42834, new indications. GSE56153, GSE31348 and GSE36238) were further ex- Recent genome-wide studies of host-pathogen interac- cluded due to both a noisy kernel density plot and low tions provide new insights into human genes and path- within group pairwise correlation (correlation cutoff 0.9 as ways modulated by pathogen infection and proliferation. suggested in [15], Additional file 1), but were included as Multiple studies have generated extensive human micro- independent datasets for validation purposes. After quality array gene expression datasets for subjects infected by filtering, nine microarray datasets (GSE19435, GSE19439, Mtb with the overall objective of developing diagnostic GSE19444, GSE28623, GSE29536, GSE34608, GSE54992, biomarkers [9–11]. For example, by comparing 14 public GSE62525 and GSE65517) from either whole blood or human gene expression datasets for LTB, PTB and other peripheral blood mononuclear cells (PBMCs) were respiratory diseases, Sweeney et al. identified a three-gene retained for further analysis. set that was robustly diagnostic for PTB [12]. However, to We also searched for available datasets of individual hu- our knowledge, no systematic search for potential host man immune cells under in vitro Mtb challenge. We identi- therapeutic targets against TB has been published. fied three datasets of human dendritic cells (GSE34151, Previously, we applied integrated analysis to human GSE360 and GSE53143) and four datasets of THP-1 hu- gene expression data in response to respiratory bacterial man leukemia monocytic cells (GSE17477, GSE29628, and viral infections in order to identify novel host targets GSE51029 and GSE57028), and processed them separately. and potential drug repurposing opportunities [13, 14]. In By comparing gene expression profiles derived from this study, we performed a statistical meta-analysis on PBMCs or whole blood to those of immune cells, we could human gene expression data available for Mtb infection evaluate how well blood gene expression patterns recapitu- to identify common human transcriptomic signatures late those of immune cells during Mtb infection. characteristic of TB. The advantage of meta-analysis lies Quality Control (QC) analysis was performed for each of in its capability to alleviate potential study-specific biases the nine datasets as described previously [13]. Briefly, sam- and enhance statistical power to identify robust expres- ples that were outliers in at least two of the four assess- sion signature through increased sample size. We lever- ments: 1) kernel density; 2) Principal Component Analysis aged other evidence such as human genetic disease (PCA); 3) Median Absolute Deviation (MAD) score and; 4) associations and drug-repurposing analyses to prioritize within group pairwise correlation, were excluded from individual targets and compounds. We identified mul- downstream analyses. For example, samples GSM484458 tiple host genes and pathways that were significantly al- and GSM484465 in GSE19439, and GSM851876 and tered in TB and propose several potential drug GSM851889 in GSE34608 were excluded because they candidates for repurposing. were outliers in both MAD score and pairwise correlation (Additional file 2). Samples GSM484369 from GSE19435, Methods and GSM484609 and GSM484629 from GSE19444, were Data sources, filtering and selection outliers in both pairwise correlation and kernel density. Human microarray gene expression datasets in response Samples irrelevant to our study design (such as samples to Mtb infection were retrieved from the National Center from sarcoidosis patients in GSE34608) were also excluded for Biotechnology Information’s (NCBI) Gene Expression from each dataset. In total, 106 control, 76 LTB and 131 Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/ PTB samples from the nine datasets passed the QC criteria geo/), by searching ‘Tuberculosis,’ ‘Homo sapiens’ and ‘ex- for statistical meta-analysis (Table 1). pression profiling by array’. GEO datasets