Regulator Combinations Identify Systemic Sclerosis Patients with More Severe Disease
Yue Wang, … , Monique Hinchcliff, Michael L. Whitfield

JCI Insight. 2020. https://doi.org/10.1172/jci.insight.137567.

Research, In-Press Preview, Dermatology

Systemic sclerosis (SSc) is a heterogeneous autoimmune disorder that results in skin fibrosis, autoantibody production and internal organ dysfunction. We previously identified four ‘intrinsic’ subsets of SSc based upon skin gene expression that are found across organ systems. Gene expression regulators that underlie the SSc intrinsic subsets, or are associated with clinical covariates, have not been systematically characterized. Here we present a computational framework to calculate the activity scores of gene expression regulators and identify their associations with SSc clinical outcomes. We find regulator activity scores can reproduce the intrinsic molecular subsets with distinct sets of regulators identified for inflammatory, fibroproliferative and normal-like samples. Regulators most highly correlated with modified Rodnan skin score (MRSS) also varied by intrinsic subset. We identify a subgroup of fibroproliferative/inflammatory SSc patients with more severe pathophenotypes. We further identify a subgroup of SSc patients that had higher MRSS and increased likelihood of interstitial lung disease. Using an independent cohort, we show this group was most likely to show forced vital capacity decline over a period of 36–54 months. Our results demonstrate an association between the activation of regulators, gene expression subsets and clinical variables that can identify SSc patients with more severe disease.

Find the latest version: https://jci.me/137567/pdf

Yue Wang¹, Jennifer M. Franks¹, Monica Yang², Diana M. Toledo¹, Tammara A. Wood¹, Monique Hinchcliff² and Michael L. Whitfield¹,*

¹Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire 03756, USA.
²Department of Internal Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, 60611, USA

Corresponding Author:
Michael L. Whitfield, Ph.D.
Department of Biomedical Data Science
Geisel School of Medicine at Dartmouth
1 Medical Center Drive
Lebanon, NH 03756
[email protected]
Phone: 603-650-1109, Fax: 603-650-1188

Conflict of interest statement
The authors have declared that no conflict of interest exists.

Abstract

Systemic sclerosis (SSc) is a heterogeneous autoimmune disorder that results in skin fibrosis, autoantibody production and internal organ dysfunction. We previously identified four ‘intrinsic’ subsets of SSc based upon skin gene expression that are found across organ systems. Gene expression regulators that underlie the SSc intrinsic subsets, or are associated with clinical covariates, have not been systematically characterized. Here we present a computational framework to calculate the activity scores of gene expression regulators and identify their associations with SSc clinical outcomes. We find regulator activity scores can reproduce the intrinsic molecular subsets with distinct sets of regulators identified for inflammatory, fibroproliferative, limited and normal-like samples. Regulators most highly correlated with modified Rodnan skin score (MRSS) also varied by intrinsic subset. We identify subgroups of fibroproliferative and inflammatory SSc patients with more severe pathophenotypes, such as higher MRSS and increased likelihood of interstitial lung disease (ILD). Using an independent cohort, we show the group with more severe ILD was more likely to show forced vital capacity decline over a period of 36–54 months.
Our results demonstrate an association between the activation of regulators, gene expression subsets and clinical variables that can identify SSc patients with more severe disease.

Keywords: Systemic sclerosis, Biomarkers, Pulmonary fibrosis, forced vital capacity decline, Computational biology

Introduction

Systemic sclerosis (SSc) is a heterogeneous autoimmune disease that results in the production of autoantibodies, skin fibrosis, and internal organ involvement (1); the pattern and severity of skin and organ involvement vary across patients. Clinically, SSc is divided into two subtypes based on the extent of skin involvement: limited cutaneous (lc) and diffuse cutaneous (dc) SSc (2). The lungs, heart, kidney, and other organs may also be involved (1, 3, 4). We have previously identified SSc ‘intrinsic’ subsets (fibroproliferative, inflammatory, limited, and normal-like) based upon skin gene expression (5-7) that may predict response to therapy (8, 9). Analysis of skin gene expression across cohorts identified interactions between immune and stromal cells that may act as key drivers of SSc pathogenesis in patients with a permissive genetic background (10, 11).

Regulators, acting at either the transcriptional or post-transcriptional level, control the expression of their target gene networks; they thus contribute to different biological phenotypes and can be proxies for tightly coordinated and regulated pathways (12). However, the regulators that underlie SSc and their association with clinical phenotypes have not been systematically investigated. The goals of these analyses were two-fold: first, to identify regulators of gene expression, such as transcription factors (TFs) and microRNAs (miRNAs), that are enriched in the SSc intrinsic subsets; and second, to identify regulators that could identify SSc patients with more severe skin and lung disease.
Our reasoning is that the regulators, and the network of genes that they control, may be informative of pathological processes acting in SSc.

Herein, we constructed a computational framework to systematically examine the activity of 836 regulators across 431 SSc skin and 35 SSc blood samples using publicly available gene expression data (see Figure 1 for an overview). We characterized each regulator’s target genes and their expression profile to infer the regulator activity scores for each SSc sample using the BASE algorithm (13). Regulator activity scores were correlated with the modified Rodnan Skin Score (MRSS), a common measure of SSc severity, to identify those with activity scores highly associated with severity of skin disease, particularly in fibroproliferative and inflammatory patients. We then built an interaction network using regulators that were significantly associated with MRSS to give a comprehensive picture of the regulatory interactions within the SSc intrinsic subsets. Then, novel subgroups within the fibroproliferative or inflammatory intrinsic subset were identified by using a combination of the activity scores of two MRSS-correlated regulators. Further, we found a novel group of SSc samples containing patients with higher MRSS and significant decline in forced vital capacity (FVC) over 36 months of follow-up.

Results

Regulator activity scores reproduce the SSc intrinsic subsets

We analyzed the regulator activity scores in four independent, publicly available SSc datasets of Milano et al., Pendergrass et al., Hinchcliff et al., and Assassi et al. (5-7, 14). For each dataset, we first calculated sample-specific regulator activity scores using BASE (13) by integrating gene expression profiles and regulator target gene sets (see Methods).
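
The idea of scoring a regulator per sample and then relating that score to MRSS can be illustrated with a minimal Python sketch. It is a simplified stand-in and not the algorithm of reference 13: here the activity score is simply the mean standardized expression of a regulator's target genes minus that of the non-target genes. The Pearson correlation is also an assumption, since the text here does not state which correlation measure was used. All function names, gene names and values below are hypothetical toy examples.

```python
from math import sqrt
from statistics import mean, pstdev


def zscores(values):
    """Standardize one gene's expression values across samples."""
    mu, sd = mean(values), pstdev(values)
    # A gene with no variation carries no information; score it as 0.
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def activity_scores(expr, targets):
    """Per-sample score: mean z-score of targets minus mean z-score of non-targets.

    expr    : dict mapping gene name -> list of expression values, one per sample
    targets : set of gene names assumed to be controlled by one regulator
    NOTE: simplified illustration only, not the published scoring method.
    """
    z = {gene: zscores(vals) for gene, vals in expr.items()}
    inside = [g for g in z if g in targets]
    outside = [g for g in z if g not in targets]
    if not inside or not outside:
        raise ValueError("need at least one target and one non-target gene")
    n_samples = len(z[inside[0]])
    return [
        mean(z[g][s] for g in inside) - mean(z[g][s] for g in outside)
        for s in range(n_samples)
    ]


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical toy data: four genes measured in four samples.
toy_expr = {
    "G1": [1.0, 2.0, 3.0, 4.0],  # target gene
    "G2": [2.0, 3.0, 5.0, 6.0],  # target gene
    "G3": [5.0, 5.0, 4.0, 4.0],  # non-target gene
    "G4": [3.0, 2.0, 3.0, 2.0],  # non-target gene
}
toy_targets = {"G1", "G2"}
toy_mrss = [5, 10, 20, 30]  # made-up skin scores for the same four samples

toy_scores = activity_scores(toy_expr, toy_targets)
print("activity scores:", [round(s, 3) for s in toy_scores])
print("correlation with MRSS:", round(pearson(toy_scores, toy_mrss), 3))
```

In this toy example the target genes rise across samples together with the made-up MRSS values, so the score and MRSS are positively correlated. With real data, one such score would be computed for every regulator in every sample before testing for association.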
We applied intrinsic gene analysis to calculate a within-between score (15) to select regulators that showed the most consistent activity scores within an SSc intrinsic subset, but had the most variable activity scores across all subsets. The 270 regulators that had a false discovery rate (FDR) less than 2% in at least three datasets were considered to be “intrinsic SSc regulators” and included activator proteins; CCAAT/enhancer-binding proteins; members of the E2F family, ETS family, STAT family, and GATA family; glucocorticoid receptors; interferon regulatory factors; RUNX1-related core binding factors; B- and T-cell-related transcription factors; and numerous miRNAs (Supplementary Table S1).

We organized the samples and regulators in each dataset by hierarchical clustering of the activity scores with the 270 regulators. Broadly, we found that samples in the fibroproliferative subset clustered together and displayed activation of key regulators of cell proliferation, such as members of the E2F, MYC, and ETS families (Figure 2 and Supplementary Figures S1 to S4). Target genes of these fibroproliferative cluster regulators are highly enriched in cell cycle and DNA replication pathways (Supplementary Table S2 and Figure S5; corrected P-value < 0.05), consistent with the activation of biological processes enriched in the fibroproliferative subset. Immune-related proteins, such as those from the STAT family, RUNX1-related core binding factors, and nuclear factor of activated T cells (NFAT), are enriched in the inflammatory subset (Figure 2 and Supplementary Figures S1 to S4). Their target genes are significantly involved in immune-related and signal transduction pathways such as the B-/T-cell receptor signaling pathway, Th17-cell differentiation, and the TGF-beta signaling pathway (Supplementary Table S2 and Figure S6; corrected