bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Full title: Defining tissue- and disease-associated macrophages using a 2 transcriptome-based classification model 3 4 Short title: macIDR: A transcriptome-driven classifier 5 6 Authors 1,† 2,3,† 1 7 Hung-Jen Chen , Andrew Y.F. Li Yim , Guillermo R. Griffith , Wouter J. de 4 2 6,# 2,& 8 Jonge , Marcel M.A.M. Mannens , Enrico Ferrero , Peter Henneman , Menno P.J. de 1,5,&,* 9 Winther 10 11 Affiliations 12 13 1 Department of Medical Biochemistry, Experimental Vascular Biology, Amsterdam University Medical 14 Centers, University of Amsterdam, Amsterdam, the Netherlands. 15 2 Genome Diagnostics Laboratory, Department of Clinical Genetics, Amsterdam University Medical 16 Centers, University of Amsterdam, Amsterdam, the Netherlands. 17 3 Epigenetics Discovery Performance Unit, GlaxoSmithKline, Stevenage, United Kingdom. 18 4 Tytgat Institute for Liver and Intestinal Research, Amsterdam University Medical Centers, University of 19 Amsterdam, Amsterdam, the Netherlands. 20 5 Institute for Cardiovascular Prevention (IPEK), Munich, Germany 21 6 Computational Biology, Target Sciences, GlaxoSmithKline, Stevenage, United Kingdom. 22 # Present Address: Autoimmunity Transplantation and Inflammation Bioinformatics, Novartis Institutes for 23 Biomedical Research, Basel, Switzerland. 24 *Corresponding author: [email protected] (MPJW) 25 †HJC and AYFLY contributed equally to this work. 26 &PH and MPJW are joint senior authors. 27 28

Page 1 of 23

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

29 Abstract 30 31 Macrophages are heterogeneous multifunctional leukocytes which are regulated in a tissue- and disease- 32 specific context. Many different studies have been published using in vitro macrophage models to study 33 disease. Here, we aggregated public expression data to define consensus expression profiles for eight 34 commonly-used in vitro macrophage models. Altogether, we observed well-known but also novel markers 35 for different macrophage subtypes. Using these data we subsequently built the classifier macIDR, capable 36 of distinguishing macrophage subsets with high accuracy (>0.95). This classifier was subsequently applied 37 to transcriptional profiles of tissue-isolated and disease-associated macrophages to specifically define 38 macrophage characteristics in vivo. Classification of these in vivo macrophages showed that alveolar 39 macrophages displayed high resemblance to interleukin-10 activated macrophages, whereas 40 macrophages from patients with chronic obstructive pulmonary disease patients displayed a drop in 41 interferon-γ signature. Adipose tissue-derived macrophages were classified as unstimulated macrophages, 42 but resembled LPS-activated macrophages more in diabetic-obese patients. Finally, rheumatoid arthritic 43 synovial macrophages showed characteristics of both interleukin-10 or interferon-γ signatures. Altogether, 44 our results suggest that macIDR is capable of identifying macrophage-specific changes as a result of 45 tissue- and disease-specific stimuli and thereby can be used to better define and model populations of 46 macrophages that contribute to disease. 47

Page 2 of 23

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

48 MAIN TEXT 49 50 Introduction 51 Macrophages are multifunctional innate immune cells that play a central role in the spatiotemporal 52 regulation of tissue homeostasis between pro-inflammatory defense and anti-inflammatory tissue repair. 53 Dysregulation of macrophages has been implicated in a variety of disorders. As in vivo macrophages are 54 often difficult to obtain and study, in vitro peripheral blood monocyte derived macrophages (MDMs) have 55 been used extensively as model systems for assessing the transcriptional and functional regulation in 56 response to various stimuli. 57 To mimic in vivo macrophages encountering different microenvironmental signals (1, 2), MDMs, 58 differentiated with for example macrophage- (M-CSF), or granulocyte-macrophage stimulating factor (GM- 59 CSF), are activated in vitro with bacterial lipopolysaccharides (LPS) or Th1 cytokine interferon gamma 60 (IFNγ) to generate pro-inflammatory macrophages (M1). Anti-inflammatory macrophages (M2) are often 61 generated by activating the cells with Th2 cytokines, interleukin-4 (IL4), or other anti-inflammatory stimuli, 62 such as interleukin-10 (IL10), tumor growth factor (TGF) and glucocorticoids (1-6). By applying these pro- 63 or anti-inflammatory stimuli to the MDMs, researchers sought to obtain proper models for studying the 64 transcriptomic alteration associated with inflammation, infection, wound healing, and tumor growth. 65 However, compared to in vitro MDM activation, in vivo macrophage activation represents a complex and 66 dynamic process driven by multiple local factors. The most comprehensive study to date on 67 expression profiling of in vitro macrophages was performed by Xue et al., where the authors activated 68 MDMs under various conditions and identified distinct stimulus-specific transcriptional modules (5). Their 69 results suggested a broader view of how macrophages react upon divergent stimulation, thereby extending 70 the classical dichotomous pro- and anti-inflammatory model to an activation spectrum. While there have 71 been attempts to summarize published studies in an effort to attain consensus (4), a proper integrative 72 analysis has thus far not been performed. Furthermore, it remains unclear to what extent the in vitro 73 macrophage-models resemble their in vivo counterparts and which specific subtypes associate with 74 disease. 75 In this study, we integrated 206 microarray and RNA-sequencing (RNA-seq) datasets from 19 76 different studies (7-24) (Table 1) to systematically characterize eight different human MDM activation 77 states. First, we identified consistently differentially expressed (cDEGs) by comparing activated with 78 unstimulated macrophages using a random effects meta-analysis (25). Next, we implemented penalized 79 multinomial logistic regression to construct a classification model, macIDR (macrophage identifier), 80 capable of distinguishing specific MDM activation states, independent of the macrophage differentiation 81 factor used. Finally, we used macIDR to project in vivo tissue- and disease-associated macrophages onto 82 the eight in vitro MDMs (26-34) and to identify expression signatures derived from the tissues and specific 83 for the patient groups.

Page 3 of 23

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

84 85 86 Results

87 Meta-analysis defines both well-known and novel transcriptional markers of macrophage 88 activation states

89 We searched the public repositories Gene Expression Omnibus and ArrayExpress for MDMs differentiated 90 with macrophage colony-stimulating-factor (M-CSF) and activated with commonly used stimuli. To be 91 included in the analysis, each activated MDM alongside an unstimulated control had to be represented by 92 at least 4 different studies, with each study containing at least 2 biological replicates per activation state. In 93 total, we assembled a cohort for eight macrophage activation states: unstimulated macrophages (M0) and

94 macrophages activated by: short exposure (2 to 4 hours) to LPS (M-LPSearly) or long exposure (18 to 24

95 hours) to LPS (M-LPSlate), LPS with IFNγ (M-LPS+IFNγ), IFNγ (M-IFNγ), IL-4 (M-IL4), IL-10 (M-IL10), and 96 dexamethasone (M-dex) (Table 1). To find consistent transcriptional differences, we performed a random 97 effects meta-analysis where we calculated the standardized effect size per study by comparing each 98 macrophage activation state with M0. 99 We identified consistent differentially expressed genes (cDEGs) that were previously observed to 100 be characteristic for certain in vitro macrophage subsets (Table S1). For example, interleukin-1 beta 101 (IL1B), C-C Motif Chemokine Ligand 17 (CCL17) and Cluster of Differentiation 163 (CD163) were

102 consistently upregulated when comparing M-LPSlate, M-IL4, and M-dex with M0, respectively (4). Notably, 103 we also identified several novel genes that were not widely considered as activated macrophage markers. 104 Consistent upregulation of interleukin 7 receptor (IL7R) and CD163 Molecule Like 1 (CD163L1) was

105 observed when comparing M-LPSearly and M-IL10 with M0, respectively. On the other hand, consistent 106 downregulation of Nephroblastoma Overexpressed (NOV, also known as CCN3) and Adenosine A2b

107 receptor (ADORA2B) was observed when comparing M-IFNγ, M-LPSlate, and M-LPS+IFNγ with M0, 108 respectively.

109 Correlation and enrichment analyses display classical pro- and anti-inflammatory clustering

110 Pairwise correlation analysis of the standardized effect sizes across studies showed that M-LPSearly, M-

111 LPSlate, M-IFNγ, and M-LPS+IFNγ formed one cluster, whereas M-IL4, M-IL10 and M-dex formed a second 112 cluster (Fig. 1A). Other activation states could not be discerned easily, which was attributed to study- 113 specific effects. This observation agrees with previous studies where transcriptional alterations induced 114 with these conventional pro- and anti-inflammatory stimuli were found to cluster according to the M1-M2 115 dichotomy (2, 5). Further sub-clustering appeared to divide the macrophages according to the stimuli. The 116 separation between the M1 and the M2 was also apparent as the first principal component (PC1), which 117 displayed a clear separation between M1 and M2 associated stimuli on the left and right, respectively (Fig. 118 1B). Relative to the M1 subsets, the M2 subsets appeared to display a stronger diversity along PC2, 119 suggesting a more distinctive transcriptional programming of the individual subsets. 120 Next, we performed canonical pathway analysis using the Ingenuity Pathway Analysis (IPA) 121 software package. While the clusters were slightly different among individual macrophage subsets, the

Page 4 of 23

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

122 overall separation between M1 and M2 was still apparent with hierarchical clustering revealing two sets of 123 pathways that appeared to be responsible for the separation (Fig. 1C and Table S1). Pathways known for 124 their pro- and anti-inflammatory responses, such as interferon signaling and LXR/RXR activation, 125 displayed a clear and distinct pattern for the M1 and M2 subsets, respectively.

126 A multinomial elastic net classifier distinguishes macrophage activation states with high accuracy

127 Subsequently, we sought to perform feature selection to identify genes capable of distinguishing 128 macrophage activation states. We merged the data from different microarray and RNA-seq platforms and 129 performed multinomial elastic net regression on the 5986 overlapping genes present in all datasets. The 130 expression data was randomly split into a training (2/3) and a test (1/3) set whereupon the training set was 131 used to build a model through ten-fold cross-validation. We repeated the cross-validation procedure 500 132 times and took the median thereof to stabilize the log odds ratios (35). Genes were considered stable 133 predictors if their log odds ratio was non-zero in more than 50% of the iterations (Fig. S1). Subsequent 134 classification was performed using the median log odds ratio per gene across all iterations (Fig. 2A). 135 Altogether, our classifier was composed of 97 median-stabilized predictor genes, and was compiled as an 136 R package called macIDR (https://github.com/ND91/macIDR). 137 To validate our model, we tested macIDR against the previously withheld test set, which included 138 a newly-generated RNA-seq experiment containing all included activation states. Classification of the test 139 set revealed an accuracy above 0.95 with both high sensitivity (>0.98) and specificity (> 0.83; Table 2). In 140 total, 75 out of 79 test samples were correctly classified (Fig. 2B). Notably, for three of the four 141 misclassified samples, the second-best prediction was the subset as reported by the authors. Investigation 142 of the four misclassified samples revealed that most errors were made regarding M0: two M0 datasets 143 were classified as M-IL10 (GSM151655) and M-IFNγ (GSM1338795), and one M-dex dataset (d54D) as

144 M0. The fourth misclassification pertained a M-LPSlate (GSM464241) dataset classified as M-LPS+IFNγ 145 (Fig. 2C). 146 Pathway analysis of the predictor genes revealed a clear enrichment for inflammatory pathways, 147 such as TNFα signaling, inflammatory response and interferon gamma signaling, confirming the 148 importance of inflammation regulation in macrophage activation (Fig. 2D, Table S3). A follow-up 149 transcription factor motif analysis on the promoters of the predictor genes showed significant enrichment 150 for macrophage transcription factors, the E26 transformation-specific PU.1 (Spi1) and SpiB (Fig. 2E).

151 MacIDR generalizes to granulocyte-macrophage colony-stimulating-factor differentiated 152 macrophages

153 Besides M-CSF, granulocyte-macrophage colony-stimulating-factor (GM-CSF) is often used to differentiate 154 monocytes to macrophages, where it is thought to evoke a more pro-inflammatory phenotype (36-38). We 155 investigated whether macIDR was capable of discerning GM-CSF-differentiated MDMs (GM-MDMs) 156 exposed to various stimuli. In total, we obtained 31 datasets from three microarray studies (5, 16, 39)

157 where GM-MDMs were activated with LPS (GM-LPSearly, GM-LPSlate), IFNγ (GM-IFNγ), and IL4 (GM-IL4), 158 or remained unstimulated (GM0). These studies were selected based on their similarity in activation

Page 5 of 23

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

159 duration with the M-CSF differentiated macrophages used in the training set. We observed that all GM- 160 MDMs were classified as their M-CSF counterparts (Fig. 3B) despite the fact that some predictive genes

161 were absent from two studies (Table S4). Our results indicate that the predictor genes for M-LPSearly, M-

162 LPSlate, M-IFNγ and M-IL4 are representative for the activation regardless of whether M-CSF or GM-CSF 163 was used for differentiation.

164 Non-macrophage cells classify as M0 whereas monocyte-derived dendritic cells classify as M-IL4

165 To understand the limitations of macIDR, we investigated its performance on non-MDM cells. Datasets 166 were obtained of monocyte-derived dendritic cells (MoDC) (5, 39), various T lymphocyte subtypes (T), B 167 lymphocytes (B), neutrophils (NP), and natural killer cells (NK) (16, 40), as well as fibroblast-like 168 synoviocytes (FLS) (27). 169 The MoDCs were primarily recognized as M-IL4 regardless of subsequent LPS maturation or other

170 pro-inflammatory stimulation (Fig. 3C). Spurious classification was ruled out as GM0 and GM-LPSlate 171 datasets from the same studies were correctly classified as described previously. Investigation of the M- 172 IL4 predictor genes revealed that concordant expression of monoamine oxidase A (MAOA) and lower 173 discordant expression of C-X-C motif chemokine 5 (CXCL5) of MoDCs were likely the main contributing 174 factors towards the M-IL4 classification (Table S4). By contrast, we found that the FLS, T, B, NP and NK 175 cells were all classified as M0 (Fig. 3A), suggesting that the M0 class is used as a label for expression 176 profiles of non-monocyte-derived cells. By looking at the distribution of the log odds we observed that true 177 M0 classifications generally displayed higher log odds than the M0 classifications of most non-macrophage 178 cells, but found this not to be definite (Fig. 3D). The increased variance for the M0 signal observed for GM- 179 MDMs and MoDCs was due to the different GM-MDMs activation and DC maturation states, as GM0 and 180 up to some extent DC0, depicted M0 log odds similar to true M0 samples.

181 Alveolar macrophages from COPD and smoking individuals show a reduced M-IFNγ signal

182 We next attempted to classify macrophages derived from patient tissues to study their semblance to in 183 vitro generated MDMs. We investigated alveolar macrophages (AMs) obtained through bronchioalveolar 184 lavage from smoking individuals, COPD patients, asthma patients (31), and healthy control (HCs). Overall, 185 we found that the AMs were classified primarily as M-IL10 (Fig. 4A and Table S10), which appears to be 186 driven by the MARCO signal. This corroborates the observation that lung tissue, specifically AMs, display 187 significantly higher gene expression of MARCO relative to surrounding cells and other tissues (41, 42). 188 Moreover, MARCO was found to be necessary in AMs for mounting a proper defense response (41-43). 189 Further comparisons of the AMs derived from different patient groups, we found that the macrophage 190 signal could be stratified according to health status where COPD- and smoker-derived AMs displayed a 191 higher M-IL4, M-IL10 and M-dex signal and a reduced M-IFNγ and M-LPS+IFNγ signal compared to AMs 192 obtained from HCs. This observation indicates a stronger M2- and lesser M1-like phenotype, which was 193 also noted in previous studies (5, 31). This difference in classification signal was found to be driven by a 194 decreased log odds for CXCL9 (M-IFNγ and M-LPS+IFNγ) and CXCL5 (M-IL4) and increased log odds for 195 TNFAIP6 (M-IL10), and ADORA3 (M-dex; Fig. S2).

Page 6 of 23

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

196 Furthermore, while no clear differences were observed when comparing AMs from asthma 197 patients to HCs, AMs from steroid-sensitive asthma patients displayed a decreased M-dex signal and

198 increased M-LPSearly, M-IFNγ and M-LPS+IFNγ signal relative to steroid resistant asthma AMs (Fig. 4A). 199 The apparent difference in steroid sensitivity appeared to be caused by a difference in TNF and CXCL9 200 signals contributing to the M-IFNγ and M-LPS+IFNγ classification respectively (Fig. S2). This observation 201 corroborates previous studies where IFNγ signaling was found to suppress glucocorticoid-triggered 202 transcriptional remodeling in macrophages leading to the macrophage-dependent steroid-resistance, 203 thereby reflecting a higher level of IFNγ in steroid-resistant relative to steroid-sensitive asthma patients.

204 Adipose macrophages show a M0 and M-IL4 classification

205 We next investigated visceral adipose tissue macrophages (ATM) derived from diabetic and non-diabetic 206 obese patients (34). Classification analysis suggested visceral ATMs showed most similarity with M0 207 followed by M-IL4 (Fig. 4B). While we were not capable of defining a set of genes responsible for the M0 208 classification, we observed that the concordant expression of MAOA and C-C chemokine ligand 18 209 (CCL18) contributed the most to the M-IL4 signal (Fig. S3). MAOA encodes a norepinephrine degradation 210 and is expressed more in sympathetic neuron-associated macrophages isolated from the 211 subcutaneous adipose tissue. In these cells, MAOA’s norepinephrine clearance activity has been linked to 212 obesity (44). Interestingly, when comparing ATMs from diabetic obese with non-diabetic obese patients,

213 we observed a stronger signal of M-IL10 and M-LPSearly driven by CCL18 and TNF respectively (Fig. S3). 214 Notably, CCL18 expression in both visceral and subcutaneous adipose tissue has been associated with 215 insulin-resistant obesity (45, 46). Furthermore, several studies have demonstrated that ATMs could be 216 divided into CD11C+CD206+ and CD11C-CD206+ subpopulations (47, 48). Specifically, an increased 217 density of the IL10 and TNFα-secreting CD11C+CD206+ ATMs in adipose tissue was associated with

218 insulin resistance (47), which coincides with the observed increase of M-IL10 and M-LPSearly signals while 219 the M-IL4 signal remained unaltered. Altogether, this observation shows the diverse roles of macrophages 220 in obesity and highlights the complex crosstalk between neural signaling, immune system and metabolism.

221 Synovial macrophages from rheumatoid arthritis patients display similarity to M- IFNγ and M-IL10

222 Finally, we analyzed synovial macrophages (SMs) from RA patients and MDMs from HCs (26-29). 223 Whereas unstimulated MDMs were successfully classified as M0, SMs from RA patients were classified as 224 either M-IL10 or M-IFNγ (Fig. 4C). Specifically, RA-derived synovial macrophages (RA-SMs) from 3 225 studies (26, 28, 29) were classified as M-IL10, whereas samples from one study (27) were classified as M- 226 IFNγ. Comparison of the classification signal of the RA-SMs with the HC MDMs displayed a higher signal 227 for M-IFNγ and M-IL10 (Fig. 4C). Concordantly, a previous study reported an increased gene expression 228 of IFNG and IL10 in RA synovial fluid mononuclear cells compared with PBMCs from both RA patients and 229 HCs (49). Further investigation of the M-IFNγ and M-IL10 classification revealed dominant signals for M- 230 IFNγ predictor gene C-X-C Motif Chemokine Ligand 9 (CXCL9) and for M-IL10 predictor gene Macrophage 231 Receptor With Collagenous Structure (MARCO; Fig. S4). This observation agrees with previous studies 232 where elevated gene and expression of CXCL9 was found in the synovium of RA patients

Page 7 of 23

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

233 compared with that of osteoarthritis patients (50, 51). Similarly, an increased presence of MARCO was 234 detected in the inflamed joints, particularly in RA patients (52). 235 236 237 Discussion 238 In this study, we performed a macrophage characterization study by integrating public datasets of eight in 239 vitro macrophage activation states. Our meta-analysis returned both well-known and novel markers for 240 activated macrophages. At a genome-wide level, we observed separation according to the conventional 241 pro- and anti-inflammatory macrophages. We subsequently built a classification model capable of 242 discriminating macrophage activation states based on their transcriptomic profile and made this available 243 as an R package called macIDR. By applying macIDR to in vivo macrophages, we projected the latter onto 244 the eight in vitro macrophage models providing insights in how disease and tissue of origin affected the 245 predicted composition. 246 Previous macrophage characterization studies focused primarily on gathering large cohorts. Xue et 247 al. adopted an inclusive strategy by categorizing genes into different activation states through self- 248 organizing maps and correlation analyses (5). Instead, we sought to find consensus from published data 249 by implementing a descriptive and an exclusive strategy representing the meta-analysis and the elastic net 250 classification analysis respectively. Where the meta-analysis identified genes that were consistently 251 differentially expressed across studies when comparing stimulated with unstimulated MDMs, elastic net 252 classification analysis represented a rigorous feature selection approach that yielded predictor genes 253 capable of classifying the eight in vitro MDMs with high accuracy. We implemented an additional layer of 254 robustness by performing repeated cross-validation to ensure that the final output of the elastic net 255 regression was stable. 256 Many of the observed cDEGs and predictor genes have been recognized as bona fide markers for

257 different activation states, such as TNF (M-LPSearly), IDO1 (M-LPS+IFNγ), CXCL9 (M-IFNγ), and ADORA3 258 (M-dex) (4) (Table S1 and Fig 2A). Pathway and transcription factor motif analyses of the predictor genes 259 revealed enrichment for inflammatory pathways and macrophage transcription factors PU.1 and SpiB. As 260 both PU.1 and SpiB are key transcription factors that drive macrophage differentiation (53), our 261 observation suggests that activation is determined by the regulation of macrophage-specific inflammation. 262 To test the limits of the classification model, we classified non-MDM cells. Classification of MoDCs 263 indicated that they were most similar to M-IL4, even after treatment with pro-inflammatory agents such as 264 LPS. This observation is likely due to the method to generate MoDCs, where monocytes are differentiated 265 with GM-CSF and IL4. Notably, while the MoDCs were classified primarily as M-IL4, some MoDCs

266 matured with LPS for 24 hours displayed a mildly increased signal towards M-LPS+IFNγ and M-LPSlate. 267 Similarly, immature MoDCs (DC0) displayed a slightly higher response for M0. All other non-macrophage 268 cells were classified as M0. While the log odds of proper M0 classifications were slightly higher than 269 improper M0 classifications, we acknowledge that no clear threshold could be defined. We are unsure why 270 M0 was predicted as a class for non-MDM and non-MoDC datasets, but we speculate that it might be 271 related to the M0 class being represented by most predictor genes relative to the other classes. We 272 therefore recommend potential users of macIDR to determine a priori that their dataset of interest

Page 8 of 23

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

273 represents macrophages either experimentally or using in silico cell-composition estimation methods, such

274 as CIBERSORT (54) or xCell (55). 275 As macIDR was capable of properly classifying differentially differentiated in vitro MDMs, we 276 applied it to in vivo macrophages with the goal of extracting subpopulation information. When comparing 277 tissue-derived macrophages among different patient groups and healthy donors, we observed differences 278 in predictions, such as reduced signals of M-IFNγ and M-LPS+IFNγ and increased signals of M-IL4, M- 279 IL10 and M-dex when comparing AMs from COPD patients with HCs. Moreover, we observed that the 280 tissue of origin had a large impact on the macrophage classification with in vivo macrophages obtained 281 from HCs not always being classified as M0 as was observed for the AMs. Since some in vivo 282 macrophages obtained from healthy tissue were classified as in vitro activated MDMs, unstimulated in vitro 283 MDMs likely do not reflect the basal state of all tissue-resident macrophages underpinning the importance 284 of how multiple factors in the microenvironment shape the transcriptome. Our results suggest that for 285 some in vitro models, using activated MDMs might achieve a more comparable phenotype to the in vivo 286 tissue macrophages that express the tissue transcriptomic signatures. 287 Unlike the AMs, no transcriptomic data was available of SMs from healthy donors. We were 288 therefore unable to conclude whether the M-IL10 and M-IFNγ predictions observed for the samples from 289 RA patients were tissue-specific or disease-associated. Though samples from different studies were 290 classified as M-IL10 or M-IFNγ, these two activation states appeared to be the highest two predicted 291 classes for SMs from RA patients among all recruited datasets. Notably, M-IL10 and M-IFNγ represent 292 predictions on opposite sides of the conventional inflammatory spectrum. Characterization studies on RA- 293 SMs suggested that they represent multiple subpopulations, such as the CD163+ anti-inflammatory tissue- 294 resident macrophages and the S100A8/9+ pro-inflammatory macrophages recruited from peripheral 295 monocytes (56). 296 As in vivo macrophages likely represent a more heterogeneous population compared to the in vitro 297 MDMs used in building the classifier, prediction of macrophage mixtures using bulk RNA-seq or microarray 298 returns signals from multiple different subsets. The classification therefore mainly reflects altered subset 299 proportions due to disease progression or association. It is likely that some in vivo macrophage 300 subpopulations do not share transcriptomic signatures with any of the eight in vitro MDMs preventing the 301 exact characterization thereof. Future studies using single cell RNA-sequencing should aim at defining 302 novel additional in vivo macrophage subsets. Nonetheless, we were capable of extracting signals from the 303 in vivo macrophage classifications that not only corroborated previous findings, but also provided novel 304 features for future research. It is essential to analyze macrophage subsets within the context of their in 305 vivo environment and therefore we provide this quantitative method to aid researchers in better defining 306 and modelling macrophages in tissue and disease. 307 308 Materials and Methods

309 Data selection

310 Datasets were found through an extensive search of the National Center for Biotechnology Information 311 (NCBI) Gene Expression Omnibus (GEO) and European Bioinformatics Institute (EBI) ArrayExpress (AE).

Page 9 of 23

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

312 For our search, we used the keywords “(macrophage) OR (monocyte) OR (MDM) OR (HBDM) OR 313 (MoDM) OR (MAC) OR (dendritic cell) AND "Homo sapiens” [porgn:__txid9606]”. Our search yielded 1,851 314 and 175 experiments for GEO and AE respectively at the time of writing (May 2018). 315 The initial screen was limited to studies that investigated primary macrophages and excluded the 316 stem cell derived macrophages or immortalized cell lines. Then we categorized macrophage subsets 317 based on the stimuli and the treatment time. For each subset, we sought to obtain at least 4 studies 318 including at least 2 biological replicates. Further, as a background control, only studies including 319 unstimulated control macrophages were selected. After this screening, we investigated macrophages 320 stimulated with control medium, LPS (either for 2 to 4 hours or 18 to 24 hours), LPS with IFNγ, IFNγ, IL4, 321 IL10 or dexamethasone for 18 to 24 hours. Microarray datasets generated on platforms other than 322 Illumina, Affymetrix or Agilent were excluded to ensure comparability. As we only investigated genes that 323 were measured in every single study, datasets that displayed limited overlap in the measured genes with 324 the other studies were removed. The in vivo macrophage datasets were samples obtained from clinical 325 specimen. In total, we obtained 206 datasets belonging to 19 studies for the meta-analysis and 326 classification.

327 Meta-analysis

328 A random effects meta-analysis was performed on the normalized data using the GeneMeta (v1.52.0) (57) 329 package, which implements the statistical framework outlined by Choi et al. (25). In short, the standardized 330 effect size (Cohen d adjusted using Hedges and Olkin’s bias factor) and the associated variance were 331 calculated for each study by comparing each activated macrophage with the unstimulated macrophage 332 within each study. The standardized effect sizes were then compared across studies by means of random 333 effects model to correct for the inter-study variation, thereby yielding a weighted least squares estimator of 334 the effect size and its associated variance. The estimator of the effect size was then used to calculate the 335 Z-statistic and the p-value, which was corrected for multiple testing using the Benjamini-Hochberg 336 procedure. We modified the GeneMeta functions to incorporate the shrunken sample variances obtained 337 from limma (58) for calculating the standardized effect sizes.

338 Classification

339 Raw microarray and RNA-seq data were log2 transformed where necessary after which the data was 340 (inner) merged and randomly divided into a training (2/3) and test set (1/3). As the test set should remain 341 hidden from the training set, raw microarray and RNA-seq data was used instead of normalized data to 342 prevent data leakage (59). Elastic net regression was subsequently performed using the R glmnet (v2.0) 343 (60) package. As penalized regression approaches are sensitive to the magnitude of each feature, we 344 investigated the use of standardization. As the lowest deviance appeared to be higher in the inter-fold 345 standardized training set relative to raw data, we did not include any standardization (Fig. S1). We also 346 investigated the optimal alpha for minimizing the deviance by means of an initial grid search approach, 347 which was found to be 0.8. 348 The training set was subjected to ten-fold cross-validation for tuning the penalty regularization 349 penalty parameter lambda. This process was subsequently repeated 500 times to stabilize the Page 10 of 23

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

350 randomness introduced during the splitting step for cross-validation (35). We considered genes to be 351 stable classifiers if they displayed a non-zero log odds ratio in at least 50% of the 500 iterations. The final 352 log odds ratio was selected by taking the median of each stable predictor gene across the 500 iterations. 353 We subsequently validated our classifier on the withheld test set. 354 In addition to the training and test data, we downloaded and imported additional datasets from 355 GM-MDMs, non-macrophage cells, and in vivo macrophages. Subsequent classification was performed 356 using the macIDR package. Unlike the studies included for training and testing, some of the included 357 studies were performed on platforms that did not measure the expression of some of the predictor genes. 358 To that end, the relative log odds ratios were calculated, which represent the log odds ratio present relative 359 to the total log odds ratio had all predictor genes been present.

360 Human monocyte-derived macrophage differentiation and stimulation

361 Buffy coats from three healthy anonymous donors were acquired from the Sanquin blood bank in 362 Amsterdam, the Netherlands. Monocytes were isolated through density centrifugation using Lymphoprep™ 363 (Axis-Shield) followed by human CD14 magnetic beads purifcation with the MACS® cell separation 364 columns (Miltenyi). The resulting monocytes were seeded on 24-well tissue culture plates at a density of 365 0.8 million cells/well. Cells were subsequently differentiated to macrophages for 6 days in the presence of 366 50ng/mL human M-CSF (Miltenyi) with Iscove's Modified Dulbecco's Medium (IMDM) containing 10% heat- 367 inactivated fetal bovine serum, 1% Penicillin/Streptomycin solution (Gibco) and 1% L-glutamine solution 368 (Gibco). The medium was renewed on the third day. After differentiation, the medium was replaced by 369 culture medium without M-CSF and supplemented with the following stimuli: nothing, 10ng/mL LPS 370 (Sigma, E. coli E55:O5), 10ng/mL LPS plus 50ng/mL IFNγ (R&D), 50ng/mL IFNγ, 50ng/mL IL4

371 (PeproTech), 50ng/mL IL10 (R&D), 100nM dexamethasone (Sigma) for 24 hours. LPSearly macrophages 372 were first cultured with culture medium for 21 hours and then stimulated with 10ng/mL LPS for 3 hours 373 prior to harvest.

374 RNA isolation and sequencing library preparation and analyses

375 Total RNA was isolated with Qiagen RNeasy Mini Kit per the manufacturer's recommended protocol. RNA 376 sequencing libraries were prepared using the standard protocols of NuGEN Ovation RNA-Seq System V2 377 kit. Size-selected cDNA library samples were sequenced on a HiSeq 4000 sequencer (Illumina) to a depth 378 of 16M per sample according to the 50 bp single-end protocol at the Amsterdam University Medical 379 Centers, location Vrije Universiteit medical center. Raw fastq files were subsequently processed in the 380 same manner as the public datasets to maintain consistency. 381 382 383 References and Notes 384 385 1. C. D. Mills, K. Kincaid, J. M. Alt, M. J. Heilman, A. M. Hill, M-1/M-2 macrophages and the Th1/Th2 386 paradigm. J Immunol 164, 6166-6173 (2000). 387 2. F. O. Martinez, S. Gordon, The M1 and M2 paradigm of macrophage activation: time for 388 reassessment. F1000Prime Rep 6, 13 (2014). 389 3. P. J. Murray, Macrophage Polarization. Annu Rev Physiol 79, 541-566 (2017).

Page 11 of 23

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

390 4. P. J. Murray et al., Macrophage activation and polarization: nomenclature and experimental 391 guidelines. Immunity 41, 14-20 (2014). 392 5. J. Xue et al., Transcriptome-based network analysis reveals a spectrum model of human 393 macrophage activation. Immunity 40, 274-288 (2014). 394 6. D. Y. Vogel et al., Human macrophage polarization in vitro: maturation and activation methods 395 compared. Immunobiology 219, 695-703 (2014). 396 7. J. Fuentes-Duculan et al., A subpopulation of CD163-positive macrophages is classically activated 397 in psoriasis. J Invest Dermatol 130, 2412-2422 (2010). 398 8. K. Schroder et al., Conservation and divergence in Toll-like receptor 4-regulated gene expression 399 in primary human versus mouse macrophages. Proc Natl Acad Sci U S A 109, E944-953 (2012). 400 9. M. E. Benoit, E. V. Clarke, P. Morgado, D. A. Fraser, A. J. Tenner, Complement protein C1q 401 directs macrophage polarization and limits inflammasome activity during the uptake of apoptotic 402 cells. J Immunol 188, 5682-5693 (2012). 403 10. S. Chandriani et al., Endogenously expressed IL-13Ralpha2 attenuates IL-13-mediated responses 404 but does not activate signaling in human lung fibroblasts. J Immunol 193, 111-119 (2014). 405 11. F. O. Martinez, S. Gordon, M. Locati, A. Mantovani, Transcriptional profiling of the human 406 monocyte-to-macrophage differentiation and polarization: new molecules and patterns of gene 407 expression. J Immunol 177, 7303-7311 (2006). 408 12. E. Derlindati et al., Transcriptomic analysis of human polarized macrophages: more than one role 409 of alternative activation? PLoS One 10, e0119751 (2015). 410 13. A. W. Jubb, R. S. Young, D. A. Hume, W. A. Bickmore, Enhancer Turnover Is Associated with a 411 Divergent Transcriptional Response to Glucocorticoid in Mouse and Human Macrophages. J 412 Immunol 196, 813-822 (2016). 413 14. J. Steiger et al., Imatinib Triggers Phagolysosome Acidification and Antimicrobial Activity against 414 Mycobacterium bovis Bacille Calmette-Guerin in Glucocorticoid-Treated Human Macrophages. J 415 Immunol 197, 222-232 (2016). 416 15. Y. Fujiwara et al., Guanylate-binding protein 5 is a marker of interferon-gamma-induced classically 417 activated macrophages. Clin Transl Immunology 5, e111 (2016). 418 16. M. Riera-Borrull et al., Palmitate Conditions Macrophages for Enhanced Responses toward 419 Inflammatory Stimuli via JNK Activation. J Immunol 199, 3858-3869 (2017). 420 17. J. Tsang et al., HIV-1 infection of macrophages is dependent on evasion of innate immune cellular 421 activation. AIDS 23, 2255-2263 (2009). 422 18. L. Przybyl et al., CD74-Downregulation of Placental Macrophage-Trophoblastic Interactions in 423 Preeclampsia. Circ Res 119, 55-68 (2016). 424 19. R. Byng-Maddick et al., Tumor Necrosis Factor (TNF) Bioactivity at the Site of an Acute Cell- 425 Mediated Immune Response Is Preserved in Rheumatoid Arthritis Patients Responding to Anti- 426 TNF Therapy. Front Immunol 8, 932 (2017). 427 20. E. Surdziel et al., Multidimensional pooled shRNA screens in human THP-1 cells identify 428 candidate modulators of macrophage polarization. PLoS One 12, e0183679 (2017). 429 21. S. H. Park et al., Type I interferons and the cytokine TNF cooperatively reprogram the 430 macrophage epigenome to promote inflammatory activation. Nat Immunol 18, 1104-1116 (2017). 431 22. H. Zhang et al., Functional analysis and transcriptomic profiling of iPSC-derived macrophages and 432 their application in modeling Mendelian disease. Circ Res 117, 17-28 (2015). 433 23. A. J. Martins et al., Environment Tunes Propagation of Cell-to-Cell Variation in the Human 434 Macrophage Gene Network. Cell Syst 4, 379-392 e312 (2017). 435 24. S. Realegeno et al., S100A12 Is Part of the Antimicrobial Network against Mycobacterium leprae 436 in Human Macrophages. PLoS Pathog 12, e1005705 (2016). 437 25. J. K. Choi, U. Yu, S. Kim, O. J. Yoo, paper presented at the ISMB (Supplement of Bioinformatics), 438 2003. 439 26. A. Yarilina, K. H. Park-Min, T. Antoniv, X. Hu, L. B. Ivashkiv, TNF activates an IRF1-dependent 440 autocrine loop leading to sustained expression of chemokines and STAT1-dependent type I 441 interferon-response genes. Nat Immunol 9, 378-387 (2008). 442 27. S. You et al., Identification of key regulators for the migration and invasion of rheumatoid 443 synoviocytes through a systems approach. Proc Natl Acad Sci U S A 111, 550-555 (2014). 444 28. K. Kang et al., Interferon-gamma Represses M2 Gene Expression in Human Macrophages by 445 Disassembling Enhancers Bound by the Transcription Factor MAF. Immunity 47, 235-250 e234 446 (2017). 447 29. D. L. Asquith et al., The liver X receptor pathway is highly upregulated in rheumatoid arthritis 448 synovial macrophages and potentiates TLR-driven cytokine release. Ann Rheum Dis 72, 2024- 449 2031 (2013).

Page 12 of 23

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

450 30. P. G. Woodruff et al., A distinctive alveolar macrophage activation state induced by cigarette 451 smoking. Am J Respir Crit Care Med 172, 1383-1392 (2005). 452 31. R. Shaykhiev et al., Smoking-dependent reprogramming of alveolar macrophage polarization: 453 implication for pathogenesis of chronic obstructive pulmonary disease. J Immunol 183, 2867-2883 454 (2009). 455 32. A. M. Madore et al., Alveolar macrophages in allergic asthma: an expression signature 456 characterized by heat shock protein pathways. Hum Immunol 71, 144-150 (2010). 457 33. E. Goleva et al., Corticosteroid-resistant asthma is associated with classical antimicrobial 458 activation of airway macrophages. J Allergy Clin Immunol 122, 550-559 e553 (2008). 459 34. E. Dalmas et al., T cell-derived IL-22 amplifies IL-1beta-driven inflammation in human adipose 460 tissue: relevance to obesity and type 2 diabetes. Diabetes 63, 1966-1977 (2014). 461 35. N. Meinshausen, P. Bühlmann, Stability selection. Journal of the Royal Statistical Society: Series 462 B (Statistical Methodology) 72, 417-473 (2010). 463 36. A. J. Fleetwood, T. Lawrence, J. A. Hamilton, A. D. Cook, Granulocyte-macrophage colony- 464 stimulating factor (CSF) and macrophage CSF-dependent macrophage phenotypes display 465 differences in cytokine profiles and transcription factor activities: implications for CSF blockade in 466 inflammation. J Immunol 178, 5245-5252 (2007). 467 37. J. A. Hamilton, Colony-stimulating factors in inflammation and autoimmunity. Nat Rev Immunol 8, 468 533-544 (2008). 469 38. D. C. Lacey et al., Defining GM-CSF- and macrophage-CSF-dependent macrophage responses 470 by in vitro models. J Immunol 188, 5752-5765 (2012). 471 39. R. Vento-Tormo et al., IL-4 orchestrates STAT6-mediated DNA demethylation leading to dendritic 472 cell differentiation. Genome Biol 17, 4 (2016). 473 40. S. Tasaki et al., Multi-omics monitoring of drug response in rheumatoid arthritis in pursuit of 474 molecular remission. Nat Commun 9, 2755 (2018). 475 41. C. Wu, X. Jin, G. Tsueng, C. Afrasiabi, A. I. Su, BioGPS: building your own mash-up of gene 476 annotations and expression profiles. Nucleic Acids Res 44, D313-316 (2016). 477 42. A. Palecanda et al., Role of the scavenger receptor MARCO in alveolar macrophage binding of 478 unopsonized environmental particles. J Exp Med 189, 1497-1506 (1999). 479 43. M. Arredouani et al., The scavenger receptor MARCO is required for lung defense against 480 pneumococcal pneumonia and inhaled particles. J Exp Med 200, 267-272 (2004). 481 44. R. M. Pirzgalska et al., Sympathetic neuron-associated macrophages contribute to obesity by 482 importing and metabolizing norepinephrine. Nat Med 23, 1309-1318 (2017). 483 45. D. Eriksson Hogling et al., Adipose and Circulating CCL18 Levels Associate With Metabolic Risk 484 Factors in Women. J Clin Endocrinol Metab 101, 4021-4029 (2016). 485 46. O. T. Hardy et al., Body mass index-independent inflammation in omental adipose tissue 486 associated with insulin resistance in morbid obesity. Surg Obes Relat Dis 7, 60-67 (2011). 487 47. J. M. Wentworth et al., Pro-inflammatory CD11c+CD206+ adipose tissue macrophages are 488 associated with insulin resistance in human obesity. Diabetes 59, 1648-1656 (2010). 489 48. D. L. Morris, K. Singer, C. N. Lumeng, Adipose tissue macrophages: phenotypic plasticity and 490 diversity in lean and obese states. Curr Opin Clin Nutr Metab Care 14, 341-346 (2011). 491 49. A. Bucht et al., Expression of interferon-gamma (IFN-gamma), IL-10, IL-12 and transforming 492 growth factor-beta (TGF-beta) mRNA in synovial fluid cells from patients in the early and late 493 phases of rheumatoid arthritis (RA). Clin Exp Immunol 103, 357-367 (1996). 494 50. P. Ruschpler et al., High CXCR3 expression in synovial mast cells associated with CXCL9 and 495 CXCL10 expression in inflammatory synovial tissues of patients with rheumatoid arthritis. Arthritis 496 Res Ther 5, R241-252 (2003). 497 51. D. D. Patel, J. P. Zachariah, L. P. Whichard, CXCR3 and CCR5 ligands in rheumatoid arthritis 498 synovium. Clin Immunol 98, 39-45 (2001). 499 52. N. Seta et al., Expression of host defense scavenger receptors in spondylarthropathy. Arthritis 500 Rheum 44, 931-939 (2001). 501 53. D. E. Zhang, C. J. Hetherington, H. M. Chen, D. G. Tenen, The macrophage transcription factor 502 PU.1 directs tissue-specific expression of the macrophage colony-stimulating factor receptor. Mol 503 Cell Biol 14, 373-381 (1994). 504 54. A. M. Newman et al., Robust enumeration of cell subsets from tissue expression profiles. Nat 505 Methods 12, 453-457 (2015). 506 55. D. Aran, Z. Hu, A. J. Butte, xCell: digitally portraying the tissue cellular heterogeneity landscape. 507 Genome Biol 18, 220 (2017). 508 56. M. Kurowska-Stolarska, S. Alivernini, Synovial tissue macrophages: friend or foe? RMD Open 3, 509 e000527 (2017). 510 57. L. Lusa, R. Gentleman, M. Ruschhaupt. (2018). Page 13 of 23

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

511 58. M. E. Ritchie et al., limma powers differential expression analyses for RNA-sequencing and 512 microarray studies. Nucleic Acids Res 43, e47 (2015). 513 59. S. Kaufman, S. Rosset, C. Perlich, paper presented at the Proceedings of the 17th ACM SIGKDD 514 international conference on Knowledge discovery and data mining, San Diego, California, USA, 515 2011. 516 60. J. Friedman, T. Hastie, R. Tibshirani, Regularization Paths for Generalized Linear Models via 517 Coordinate Descent. Journal of Statistical Software 33, 1-22 (2010). 518 519 520 521 Acknowledgments 522 523 General: We are thankful to all authors and participants of the studies included in this research for 524 sharing their transcriptomic data. This study makes use of data generated by the Blueprint Consortium. A 525 full list of the investigators who contributed to the generation of the data is available from www.blueprint- 526 epigenome.eu. Funding for the project was provided by the European Union's Seventh Framework 527 Programme (FP7/2007-2013) under grant agreement no 282510 – BLUEPRINT 528 Funding: This work was supported by the European Commission (SEP-210163258). 529 GlaxoSmithKline provided support in the form of salary for authors AYFLY. GlaxoSmithKline and Novartis 530 provided support in the form of salary for authors AYFLY and EF. GlaxoSmithKline and Novartis had no 531 additional role in the study design, data collection and analysis, decision to publish, or preparation of the 532 manuscript. 533 Author contributions: HJC performed the literature search. HJC and GG cultured the cells, 534 isolated the RNA for the RNA sequencing analysis. AYFLY downloaded, cleaned and performed the in 535 silico analyses of the public and in-house generated data. HJC, AYFLY, PH and MPJW decided on the 536 inclusion criteria for the public data. HJC and AYFLY drafted the manuscript. PH and MPJW conceived the 537 study and participated together with EF, WJJ and MMAMM in the design and coordination of the study. All 538 authors read and approved the final manuscript. 539 Competing interests: GlaxoSmithKline provided support in the form of salaries for author AYFLY. 540 GlaxoSmithKline and Novartis provided support in the form of salaries for author EF. WJJ was financially 541 supported by GlaxoSmithKline, Maed Johnsson, and Schwabe (aforementioned funds were unrelated to 542 this study) at the time of writing this manuscript. 543 Data and materials availability: The RNA-seq data generated and analyzed during the current 544 study are available in the ArrayExpress repository E-MTAB-7572. All public datasets analyzed during the 545 current study are available in the Gene Expression Omnibus and ArrayExpress repositories. Studies from 546 the GEO repository include: GSE18686, GSE19765, GSE30177, GSE47538, GSE5099, GSE57614, 547 GSE61880, GSE79077, GSE85346, GSE99056, GSE100382, GSE55536, GSE80727, GSE82227, 548 GSE99056, GSE75938, GSE46903, GSE93776, GSE10500, GSE49604, GSE97779, and GSE13896. 549 Studies from the ArrayExpress repository include: E-MEXP-2032, E-MTAB-3309, E-MTAB-5095, E-MTAB- 550 5913, and E-MEXP-3890. The scripts for performing the analyses are stored in the GitHub repository at 551 https://github.com/ND91/PRJ0000004_MACMETA. The macIDR package is stored in the GitHub 552 repository at https://github.com/ND91/macIDR. 553 554

Page 14 of 23

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

555 Figures and Tables 556

A B

0.6

MDM MDM Study M-LPSearly M-LPSlate M-LPS+IFNγ M-IFNγ 0.3 M-IL4 M-IL10 M-dex

Study 1 GSE100382 0.0 0.8 GSE19765 GSE30177 0.6 GSE99056 PC2 (20%) 0.4 E-MEXP2032 BLUEPRINT 0.2 GSE18686 0 E-MTAB5095 -0.3 GSE79077 -0.2 GSE82227 GSE85346 E-MTAB5913 GSE5099 GSE55536 -0.6 GSE57614 GSE47538 -0.25 0.00 0.25 0.50 E-MTAB3309 PC1 (25.7%) GSE80727 MDM Study γ GSE61880 M-LPSearly M-LPS+IFN M-IL4 M-dex M-LPSlate M-IFNγ M-IL10

C

Role of IL-17F in Allergic Inflammatory Airway Diseases Role of RIG1-like Receptors in Antiviral Innate Immunity Interferon Signaling Type I Diabetes Mellitus Signaling Dendritic Cell Maturation TREM1 Signaling Neuroinflammation Signaling Pathway Inflammasome pathway Role of Pattern Recognition Receptors in Recognition of Bacteria and Viruses Osteoarthritis Pathway Gαi Signaling p53 Signaling iNOS Signaling Production of Nitric Oxide and Reactive Oxygen Species in Macrophages Toll-like Receptor Signaling MIF-mediated Glucocorticoid Regulation MIF Regulation of Innate Immunity Retinoic acid Mediated Apoptosis Signaling Sirtuin Signaling Pathway VDR/RXR Activation IL-9 Signaling p38 MAPK Signaling LPS/IL-1 Mediated Inhibition of RXR Function Aryl Hydrocarbon Receptor Signaling Tryptophan Degradation to 2-amino-3-carboxymuconate Semialdehyde Role of CHK in Cell Cycle Checkpoint Control Small Cell Lung Cancer Signaling IL-1 Signaling IL-6 Signaling Hypoxia Signaling in the Cardiovascular System Oxidative Phosphorylation EIF2 Signaling tRNA Charging LXR/RXR Activation PPAR Signaling D-myo-inositol (1,4,5)-trisphosphate Degradation Superpathway of D-myo-inositol (1,4,5)-trisphosphate Metabolism GDNF Family Ligand-Receptor Interactions IGF-1 Signaling Glutaryl-CoA Degradation Isoleucine Degradation I PPAR α/RXR α Activation Dopamine Receptor Signaling M-IL10 M-IL4 M-dex M-LPSlate M-LPSearly M-LPS+IFN M-IFN M-IL10 M-IL4 M-dex M-LPSlate M-LPSearly M-LPS+IFN M-IFN

γ γ -6-4-20246 γ γ 557 activation z score 558 Fig. 1. Summary meta-analysis. (A) Heatmap of the Cohen d pairwise Spearman correlation coefficients. 559 (B) Principal component analysis of the Z-values obtained from the meta-analysis. (C) Heatmap of the 560 canonical pathways with the intensity representing the activation z score. Two most defining clusters have 561 been enlarged and annotated on the right. 562

Page 15 of 23

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

median-stabilized log(OR) A -1 -0.5 0 0.5 1

M0 M-LPSearly M-LPSlate M-LPS+IFNγ M-IFNγ M-IL4 M-IL10 M-dex EBI3 CCL18 MAOA IDO1 TNFAIP6 IL7R TNFRSF21 CD9 CLEC10A CCL22 CTNNAL1 G0S2 SLC2A5 NAMPT ENPP2 RPL37 S100A9 PLTP SOCS3 RPL23A THBD PTX3 TNF CYTH4 CXCL10 EXOG MX2 SERPINA1 STAB1 FCMR TIMP3 CD101 PLEK H1F0 IL1B OLFML2B SEMA4D HLA-DOA CLIC1 ITGB7 RPL6 AVPI1 SGK1 IFI44L RNASE1 JUND SERPING1 BCL2A1 ASAP2 EMP1 MMP19 RGS1 CD52 BMP6 A2M MS4A6A CXCL1 SORL1 FCGBP IER2 CCL5 MT1G ALOX5AP SLC39A8 AMIGO2 SLAMF1 TFPI PALLD GPR153 NCAPH APBB1IP FOS RPL9 FKBP5 CD47 GCLM DDIT4 RHOBTB1 ALOX15B TPST1 FSCN1 RAB3IL1 USP49 JAKMIP2 CXCL9 VSIG4 ADORA3 HMGB2 HTR2B TAF10 FUCA1 CAMK1 TNFSF15 GM2A TREM2 MARCO CXCL5

B D reported enriched pathway groups 0246810121416 M-dex 1 5 HALLMARK TNFA SIGNALING VIA NFKB (M5890) HALLMARK INFLAMMATORY RESPONSE (M5932) M-IL10 6 inflammatory response (GO:0006954) HALLMARK INTERFERON GAMMA RESPONSE (M5913) M-IL4 11 IL-17 signaling pathway (hsa04657) M-IFNγ 8 leukocyte cell-cell adhesion (GO:0007159) positive regulation of activity (GO:0051345) M-LPS+IFNγ 7 regulation of blood coagulation (GO:0030193) Complement and coagulat on cascades (hsa04610) M-LPSlate 6 1 regulated exocytosis (GO:0045055) leukocyte aggregat on (GO:0070486) M-LPSearly 6 Osteoclast differentiation (hsa04380) HALLMARK KRAS SIGNALING UP (M5953) M0 26 1 1 low-density lipoprotein particle receptor activity (GO:0005041) γ γ 60S ribosomal subunit, cytoplasmic (CORUM:308) maintenance of location (GO:0051235) M0 M-IL4

M-dex cellular metal ion homeostasis (GO:0006875) M-IL10

M-IFN -log(p-value) positive regulation of hemopoiesis (GO:1903708)

M-LPSlate -log(q-value)

M-LPSearly regulation of osteoblast differentiation (GO:0045667) M-LPS+IFN predicted regulation of innate immune response (GO:0045088) C E reported predicted consensus motf name p-value q-value % target % background GSE18686: GSM464241 GSE55536: GSM1338795 A A GG G SpiB (GSE56857) 1.00E-05 0.0033 16.67% 4.64% TAA GTA G C C G GT CT T CT CTA CGT CGT A C TA C CC G TGA 1.00 T CAGAA AG PU.1(GSE21512) 1.00E-04 0.0144 21.88% 8.79% T C C GA C 0.75 GC T AGCT ACTGA CTGA CGT ACG ATGT AGC TA T AA A C RUNX1 (GSE29180) 1.00E-03 0.0612 21.88% 10.06% T GG C G G CT 0.50 GGCTACT AGTCA GTCA C TAGTCA G TACT A G T Chop (GSE35681) 1.00E-03 0.0708 8.33% 2.02% G A G

GAA 0.25 CTCGTACGTA C TCG TACT ACGTCGTGCT ACA G PU.1:IRF8 (GSE66899) 1.00E-02 0.1069 9.38% 2.78% G CT C A TC A TCT CGT CG GCCTAC TCGCA C TT GT T GAA A GA G 0.00 GAA TGA A A T At 4(GSE35681) 1.00E-02 0.3344 8.33% 2.78% C CT C

TT GSE57614: GSM1515655 Own: d54D GTGCTA GC AGCAGCTA CTGA G AGTC ACGT AGA NFkB-p50,p52 (Schreiber_et_al.) 1.00E-02 0.3344 7.29% 2.26% GA A GT 1.00 CA AT TA A T A A G G TTG CC TCT T CT CCA G AG A T TGGGGCAC GCCC CG TCA ELF5 (GSE30407) 1.00E-02 0.3344 23.96% 14.08% G

A

GC T ACGCCT T ACA 0.75 TAATACGTGCGTATGAGCGT AG AG GT TC IRF3 (GSE67343) 1.00E-02 0.3344 10.42% 4.16% GC CC A ACGC G CTTGCGAACGTA ACAGCA GA G A 0.50 TTT T TTCT G A AT T A STAT4(GSE22104) 1.00E-02 0.3344 19.79% 10.93% CC G G G

CG A C A G T CG A C TCA GT CGT G CT 0.25 TATGTCATCAGTCAA C AC ELF3 (GSE64557) 1.00E-02 0.3344 20.83% 11.80% GG A GCA T G

C T CT T T T G CGT 0.00 TAAGGAAA γ γ γ γ T CC TATA-Box 1.00E-02 0.3344 23.96% 14.31% 0 y 0 e 4 x C T A GG GA 1 ex rly G C T A A N N CT AT T AG TG GCA CG CG CGT C CT A M late d M0 a -IL de CATAAAGG TT IF M-IL4 Slat -IF M M-IL M- Se P +IFN M-IL10M- G PSearlLPS M-IFN P S M TC EWS:ERG-fusion (SRA014231) 1.00E-02 0.3344 17.71% 9.58% L - PS+ CT C

CT G A CG A CG A GT A GT A AC A A GGTG M M-L G M- M-L ACTTTCCT A 563 M-L M-LP 564 Fig. 2. Classification results. (A) Heatmap of the median-stabilized log odds ratios per macrophage 565 activation state for each of the 97 predictor genes. (B) Confusion matrix representing the number of 566 correctly classified samples (entries on the diagonal) versus the misclassified samples (entries on the off- 567 diagonal). Classes on the y-axis represents the reported class while classes on the x-axis represent the 568 predicted class. (C) Bar plots of the misclassified samples depicting the classification signal on a scale of 0 569 to 1 where the class with the largest signal represents the predicted class. Blue bars represent the 570 incorrectly predicted class and orange bars represent the reported class. (D) Pathway overrepresentation 571 analysis of the predictor genes ranked by p-value. (E) Motif overrepresentation analysis of the predictor 572 genes ranked by p-value. Columns represent the consensus sequence, the motif name, the p-value, the

Page 16 of 23

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

573 BH-adjusted p-value (q-value), the percentage of provided genes with the motif, and the percentage genes 574 in the background with the motif. 575

576 577 Fig. 3. Classification of cells not included in training set. (A) Boxplots representing the classification 578 signal on a scale of 0 to 1 where classes with the largest signal represents the predictions. Colors 579 represent GM-CSF differentiated macrophages (GM-MDMs), monocyte-derived dendritic cells (MoDCs), 580 fibroblast-like synoviocytes (FLS), B lymphocytes (B), T lymphocytes (T), natural killer cells (NK), and 581 neutrophils (NP). (B) GM-MDMs and (C) MoDCs colored by stimulation. (D) Visualization of the log odds 582 distribution of the M0 signal for the M0s from training and test set and the non-macrophage immune cells. 583 Further separation across different stimulations for the GM-MDMs and the MoDCs was plotted below. 584

A Alveolar macrophages

GSE13896 GSE2125 GSE22528 GSE7368

10.0 1.00 Relativelogoddsratio

7.5 0.75

5.0 0.50 Totallogodds 2.5 0.25

B Adipose tissue macrophages Alveolar macrophages Healthy control (AM) Smoker Steroid resistant asthma GSE54350

Relativelogoddsratio Chronic obstructive pulmonary disease Asthma 1.00 Steroid sensitive asthma

6.0 0.75 Adipose tissue macrophages 0.50 Diabetic obese Non-diabetic obese 4.0

Totallogodds 0.25 2.0 Synovial macrophages 0.00 Healthy control (MDM) Rheumatoid arthritis

C Synovial macrophages

GSE10500 GSE49604 GSE97779 E-MEXP3890 10.0 1.00 Relativelogoddsratio

7.5 0.75

5.0 0.50

2.5 0.25 Totallogodds

0.0 0.00

γ γ γ γ y γ γ x e γ γ x 4 0 0 rl te e rly t 4 0 e M0 late L 1 M M0 a la N N IL4 d M0 la N L 1 S IFN IL e F - S IF L -d P - M-I - M-dex +IFN M-IL4 M-dex S M M- Sea P - M-I -I M PSearly L M M PSearly S M-IFN M-IL10 LPS M-I M-IL10 L M M L - L LP - PS+IF LP - PS+IFN - M M-LPSlateLP - M L - M L M M- - M - M - 585 M-LPS+IFN M M M

Page 17 of 23

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

586 Fig. 4. Classification of in vivo macrophages. Summarized classification results per dataset with cross- 587 bars representing the mean and the standard errors of the log odds colored by the macrophage in vivo 588 type. Dots above represent the log odds ratio (log(OR)) relative to the sum of the log odds ratios if all 589 predictor genes were measured. (A) Alveolar macrophages obtained from smoking individuals, chronic 590 obstructive pulmonary disease (COPD), asthma patients, as well as healthy controls. (B) Adipose tissue 591 macrophages obtained from diabetic obese and non-diabetic obese patients. (C) Synovial macrophages 592 obtained from rheumatoid arthritis (RA) patients and MDMs from healthy controls (HCs). 593 594 Table 1. Included datasets. An overview of the datasets and the associated studies included in the in the 595 meta-analysis and the classification analysis.

Reference Type Dataset ID Purpose Fuentes-Duculan et al. Microarray GSE18686 Meta-analysis, training & test 2010 Schroder et al. 2012 Microarray GSE19765 Meta-analysis, training & test Benoit et al. 2012 Microarray GSE30177 Meta-analysis, training & test Chandriani et al. 2014 Microarray GSE47538 Meta-analysis, training & test Martinez et al. 2005 Microarray GSE5099* Meta-analysis, training & test Derlindati et al. 2014 Microarray GSE57614 Meta-analysis, training & test Jubb et al. 2016 Microarray GSE61880 Meta-analysis, training & test Steiger et al. 2016 Microarray GSE79077 Meta-analysis, training & test Fujiwara et al. 2016 Microarray GSE85346 Meta-analysis, training & test Corbi et al. 2017 Microarray GSE99056 Meta-analysis, training & test Tsang et al. 2009 Microarray E-MEXP-2032 Meta-analysis, training & test Przybyl et al. 2016 Microarray E-MTAB-3309 Meta-analysis, training & test Turner et al. 2017 Microarray E-MTAB-5095 Meta-analysis, training & test Surdziel et al. 2017 Microarray E-MTAB-5913 Meta-analysis, training & test RNAseq BLUEPRINT Meta-analysis, training & test Park et al. 2017 RNAseq GSE100382 Meta-analysis, training & test Zhang et al. 2015 RNAseq GSE55536 Meta-analysis, training & test Martins et al. 2017 RNAseq GSE80727 Meta-analysis, training & test Realegeno et al. 2016 RNAseq GSE82227 Meta-analysis, training & test Own RNAseq E-MTAB-7572 Test Riera-Borull et al. 2017 Microarray GSE99056 GM-CSF verification GM-CSF verification, non-MDM Vento-Tormo et al. 2016 Microarray GSE75938 verification GM-CSF verification, non-MDM Xue et al. 2014 Microarray GSE46903 verification GM-CSF verification, non-MDM Tasaki et al. 2018 Microarray GSE93776 verification Yarilina et al. 2008 Microarray GSE10500 Synovial macrophage test GM-CSF verification, Synovial You et al. 2014 Microarray GSE49604 macrophage test Shaykhiev et al. 2009 Microarray GSE13896 Alveolar macrophage test Woodruff et al. 2005 Microarray GSE2125 Alveolar macrophage test Madore et al. 2010 Microarray GSE22528 Alveolar macrophage test Goleva et al. 2008 Microarray GSE7368 Alveolar macrophage test

Page 18 of 23

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Dalmas et al. 2014 Microarray GSE54350 Adipose tissue macrophage test Kang et al. 2017 Microarray GSE97779 Synovial macrophage test Asquith et al. 2013 Microarray E-MEXP-3890 Synovial macrophage test 596 *The GSE5099 dataset was composed of two Affymetrix microarray datasets: U133A and U133B. Due to 597 the limited overlap in genes between U133B with the rest, we only included the U133A dataset. 598 599 Table 2. Classification testing. A confusion matrix representing the classifier performance on the test 600 set. TP: True positives, FP: False positives, TN: True negatives, FN: False negatives, TNR: True negative 601 rate/specificity, TPR: True positive rate/sensitivity. Accurac Macrophage TP FP TN FN TNR TPR y M0 26 1 50 2 0.98 0.93 0.96

M-LPSearly 6 0 73 0 1.00 1.00 1.00

M-LPSlate 6 0 72 1 1.00 0.86 0.99 M-LPS+IFNγ 7 1 71 0 0.99 1.00 0.99 M-IFNγ 8 1 70 0 0.99 1.00 0.99 M-IL4 11 0 68 0 1.00 1.00 1.00 M-IL10 6 1 72 0 0.99 1.00 0.99 M-dex 5 0 73 1 1.00 0.83 0.99 602

Page 19 of 23

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

603 Supplementary Materials 604 605 Supplemental methods:

606 Microarray data preparation

607 All analyses were performed in the R statistical environment (v3.5.0). Download of the raw and 608 processed GEO and AE microarray was performed using the GEOquery (v2.48.0) and the ArrayExpress 609 (v1.40.0) packages respectively. Raw data was normalized in a platform-specific fashion: Affymetrix 610 microarrays were normalized using the rma function from the affy (v1.54.0) and oligo (v1.40.2) packages, 611 whereas Illumina and Agilent microarrays were normalized using the neqc function from the limma

612 (v3.31.14) package. Quality control of the log2 transformed expression values was performed using 613 WGCNA (v1.51) and arrayQualityMetrics (v3.32.0) to remove unmeasured samples, genes, and studies of 614 insufficient quality. Microarray probes were reannotated to the ID according to the annotation files 615 on Bioconductor. Probes that associated to multiple Entrez IDs were removed and multiple probes 616 associating to the same Entrez ID were summarized by taking the median.

617 RNA sequencing data preparation

618 Public raw sequencing reads were sourced from the NCBI Sequence Read Archive (SRA) and 619 converted to fastq files using the fastq-dump function from the SRA-tools package (v2.9.0). All raw fastq 620 files were first checked for quality using FastQC (v0.11.7) and MultiQC (v1.4). The sequencing reads were 621 aligned against the GRCh38 using STAR (v2.5.4). Post-alignment processing was 622 performed using SAMtools (v1.7) after which reads overlapping gene features obtained from Ensembl 623 (v91) were counted using the featureCounts program in the Subread (v1.6.1) package. Gene annotations 624 were converted from Ensembl IDs to Entrez IDs using biomaRt.

625 macIDR package

626 The macIDR package for R comprises a set of functions for performing macrophage classification 627 analyses and visualization thereof using the log odds ratios learned from the aggregated datasets and can 628 be downloaded from https://github.com/ND91/macIDR. Included in this package are the log odds ratios of 629 the 500 repetitions and the logistic regression function necessary to perform classification. By default, the 630 log odds ratios for each predictor gene were defined as described before, namely by taking the median of 631 the log odds ratios for the stable genes across all 500 iterations. We acknowledge that some users might 632 prefer different methods and have therefore provided the option for users to define their own stability 633 threshold and method for aggregation when defining the input log odds ratios for classification analysis. 634 The logistic regression approach implemented in macIDR ignores missing values as we reasoned that 635 missing predictor genes should not prevent classification. Nonetheless, we provide functions such that the 636 user can assess which genes and what relative proportion of log odds ratio is missing by calculating the 637 relative log odds ratios. To that end, we leave it up to the user’s discretion to decide their own threshold for 638 continuing with the classification analysis.

Page 20 of 23

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

639 Functional analyses

640 Pairwise gene correlations were generated by calculating the Spearman correlation to minimize 641 the effect of outliers. Pathway and regulator analyses were performed using the Canonical Pathways and 642 Ingenuity Upstream Regulator found in the Ingenuity Pathway Analysis software package (QIAGEN Inc., 643 https://www.qiagenbioinformatics.com/products/ingenuity-pathway-analysis) and Metascape 644 (http://metascape.org/gp/index.html). Transcription factor motif analysis was performed by using HOMER 645 (v4.10) using the following parameters: -start -200 -end 100 -len 8, 10, 12. 646 647 Supplemental figures: 648

649 650 Fig. S1. Median-stabilization. Plot representing the log odds ratio distribution as obtained from the 500 651 iterations of ten-fold cross-validation. Red and blue represent stable predictor genes with positive and 652 negative coefficients respectively. Grey represents unstable predictor genes.

Page 21 of 23

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

653

654 655 Fig. S2. Log odds per gene for alveolar macrophage classification. Plots of the log odds per gene for 656 the studies facetted by the different macrophage class for studies: (a) GSE13896, (b) GSE2125, (c) 657 GSE22528, and (d) GSE7368. 658

659 660 661 Fig. S3. Log odds per gene for adipose tissue macrophage classification. Plots of the log odds per 662 gene for the studies facetted by the different macrophage class for study: GSE54350. 663

664 665 Fig. S4 Log odds per gene for the synovial macrophage classification. Plots of the log odds per gene

Page 22 of 23

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

666 for the studies facetted by the different macrophage class for studies: (a) GSE10500, (b) GSE49604, (c) 667 GSE97779, and (d) E-MEXP3890. 668 669 Supplemental tables: 670 671 Table S1 meta-analysis results. A table containing the unbiased estimator of the effect size and the 672 variance as calculated by the meta-analysis ranked by the p-value. Separate tabs contain the results for 673 the different comparisons of the meta-analysis. Columns “Entrez” and “Gene” represent the Entrez gene 674 ID and the HGNC symbol respectively. The column “mu” represents the unbiased estimator of the effect 675 size and the “mu_var” represents the unbiased estimator of the variance. The columns “Z”, “Z_pval”, and 676 “Z_pval”BH” represent the Z-statistic, the associated p-value and the Benjamini-Hochberg adjusted p- 677 value. 678 679 Table S2 meta-analysis pathway analysis. A table containing the aggregated results of the enriched 680 canonical pathways for each comparison made for the meta-analysis. 681 682 Table S3 predictor gene pathway analysis. A table containing the aggregated results of the enriched 683 canonical pathways from IPA (Tab 1) and pathway analysis from metascape (Tab 2) for all predictor 684 genes. 685 686 Table S4 Log odds for the verification datasets. An Excel file where each tab represents the 687 classification signal as depicted in log odds for each individual study. Each tab contains a table 688 representing the log odds separated by predictor genes to depict the contribution of each gene to the 689 classification. Each row represents a predictor gene for a particular class and each column represents a 690 sample. Missing values for particular rows indicates that these genes were not present in the study. The 691 last three columns represent the Entrez gene ID, HGNC gene symbol and the class to which the weights 692 belong.

Page 23 of 23

M0 M-LPSearly M-LPSlate M-LPS+IFNγ HTR2B TNFSF15 IDO1 EBI3 TREM2 PTX3 GM2A TNFAIP6 CXCL9 CAMK1 FUCA1 TAF10 TNF IL7R CXCL10 FCMR STAB1 MT1G TIMP3 MX2 TNFAIP6 CD101 USP49 SORL1 SERPINA1 RNASE1 IFI44L JAKMIP2 MT1G CXCL1 H1F0 A2M RAB3IL1 CXCL10 IDO1 DHRS3 USP13 MSH2 IFI44L S1PR4 SERPING1 RNASET2 EBI3 SHCBP1 CXCL10 RNF19A VKORC1 MIS18BP1 MAN1C1 PPIB CYSLTR1 TESC HSD17B7 SGK1 CXCL9 CXCR4 TNPO1 CD52 AQP9 VSIG4 INHBA AVPI1 APBB1 CXCL9 HLA-G TREM2 HLA-DMB ITGB7 DECR2 RPL6 MS4A6A FPR3 CLIC1 DDX6 VAV3bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not MMP19 certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. CXCL1 BMP6 FOS ASAP2 TFPI FCGBP EMP1 CD47 FKBP5 RPL9 IER2 CD9 IDO1 SLC39A8 AMIGO2 SORL1 CLEC10A SLAMF1 HMGB2 CCL18 CXCL10 TNFRSF21 EBI3 -0.4 -0.2 0.0 0.2 -0.2 0.0 0.2 0.4 -0.2 -0.1 0.0 0.1 0.2 0.3 -0.25 0.00 0.25 Log(OR) Log(OR) Log(OR) Log(OR) M-IFNγ M-IL4 M-IL10 M-dex

MAOA CXCL9 MARCO ADORA3 CCL18 ENPP2 VSIG4 EXOG CCL22 NAMPT CLEC10A TFPI HLA-DOA CTNNAL1 CCL18 DDIT4 SEMA4D PALLD S100A9 GCLM NCAPH MBD4 RPL37 APBB1IP ALOX15B CXCR4 PLTP GPR153 TPST1 FXYD5 SOCS3 S100A8 RHOBTB1 CYFIP2 THBD CXCL8 ORC6 CD74 RPL23A ICAM1 FN1 MS4A4A CYP27A1 CD47 BEX3 MNDA SDS BCL2A1 FARP1 HK3 SLAMF1 MSC CXCL10 IL1B RGS1 ALOX15B EBI3 JUND ALOX5AP FSCN1 OLFML2B IL1B MT1G PLEK PTX3 IDO1 TNFAIP6 CCL5 SLC2A5 MARCO IDO1 CCL18 HTR2B G0S2 CXCL5 CCL5 CYTH4 -0.25 0.00 0.25 0.50 0.75 1.00 -0.4 0.0 0.4 0.8 -0.2 0.0 0.2 -0.25 0.00 0.25 0.50 0.75 Log(OR) Log(OR) Log(OR) Log(OR)

Stable Negative Stable Positive Unstable a GSE13896

M0 M-LPSearly M-LPSlate M-LPS+IFNγ M-IFNγ M-IL4 M-IL10 M-dex

10

5 log odds

0

-5

TNF A2M EBI3 IDO1 TFPI FOS RPL6 RPL9 EBI3 IDO1 IER2 IL7R CD9 EBI3 IDO1 MX2 IDO1 IL1B IDO1 PLTP IL1B PLEK TFPI BMP6 CD47 FCMR GM2A IFI44LMT1G TIMP3 AVPI1 PTX3 SGK1 CD52 CLIC1 IFI44L ITGB7 MT1G EMP1 H1F0 EXOG G0S2 PTX3 RGS1 MAOA CCL5 CD47 MT1G THBD CCL5 GCLM JUND VSIG4 ASAP2 CAMK1CCL18CD101 FKBP5FUCA1 HTR2B SORL1STAB1TAF10 TREM2 CXCL1 CXCL1 CXCL9 FCGBP SORL1 USP49 CXCL9 CXCL9 FSCN1 CCL18 CCL22 CXCL5 HTR2B NCAPH PALLD CCL18 ENPP2 NAMPT RPL37 SOCS3 CCL18 CYTH4 DDIT4 TPST1 AMIGO2 CXCL10 RNASE1SLAMF1 HMGB2 CXCL10 JAKMIP2 CXCL10 MMP19 SLC2A5 GPR153 MARCO BCL2A1 MARCO RPL23A S100A9 SLC39A8 TNFSF15 TNFAIP6 RAB3IL1 TNFAIP6 CLEC10A MS4A6A HLA-DOA SEMA4D APBB1IP CLEC10A TNFAIP6 ADORA3 SERPINA1SERPING1TNFRSF21 CTNNAL1 OLFML2B ALOX5AP ALOX15B RHOBTB1

Group Healthy Control Chronic Obstructive Pulmonary Disease Smoker b GSE2125

M0 M-LPSearly M-LPSlate M-LPS+IFNγ M-IFNγ M-IL4 M-IL10 M-dex

8

4 log odds

0

TNF A2M EBI3 IDO1 TFPI FOS RPL6 RPL9 EBI3 IDO1 IER2 IL7R CD9 EBI3 IDO1 MX2 IDO1 IL1B IDO1 PLTP IL1B PLEK TFPI BMP6 CD47 FCMR GM2A IFI44LMT1G TIMP3 AVPI1 PTX3 SGK1 CD52 CLIC1 IFI44L ITGB7 MT1G EMP1 H1F0 EXOG G0S2 PTX3 RGS1 MAOA CCL5 CD47 MT1G THBD CCL5 GCLM JUND VSIG4 ASAP2 CAMK1CCL18CD101 FKBP5FUCA1 HTR2B SORL1STAB1TAF10 TREM2 CXCL1 CXCL1 CXCL9 FCGBP SORL1 USP49 CXCL9 CXCL9 FSCN1 CCL18 CCL22 CXCL5 HTR2B NCAPH PALLD CCL18 ENPP2 NAMPT RPL37 SOCS3 CCL18 CYTH4 DDIT4 TPST1 AMIGO2 CXCL10 RNASE1SLAMF1 HMGB2 CXCL10 JAKMIP2 CXCL10 MMP19 SLC2A5 GPR153 MARCO BCL2A1 MARCO RPL23A S100A9 SLC39A8 TNFSF15 TNFAIP6 RAB3IL1 TNFAIP6 CLEC10A MS4A6A HLA-DOA SEMA4D APBB1IP CLEC10A TNFAIP6 ADORA3 SERPINA1SERPING1TNFRSF21 CTNNAL1 OLFML2B ALOX5AP ALOX15B RHOBTB1

Group Healthy Control Asthma c GSE22528

M0 M-LPSearly M-LPSlate M-LPS+IFNγ M-IFNγ M-IL4 M-IL10 M-dex 8

6

4

2 log odds

0

-2

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not -4 certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

TNF A2M EBI3 IDO1 TFPI FOS RPL6 RPL9 EBI3 IDO1 IER2 IL7R CD9 EBI3 IDO1 MX2 IDO1 IL1B IDO1 PLTP IL1B PLEK TFPI BMP6 CD47 FCMR GM2A IFI44LMT1G TIMP3 AVPI1 PTX3 SGK1 CD52 CLIC1 IFI44L ITGB7 MT1G EMP1 H1F0 EXOG G0S2 PTX3 RGS1 MAOA CCL5 CD47 MT1G THBD CCL5 GCLM JUND VSIG4 ASAP2 CAMK1CCL18CD101 FKBP5FUCA1 HTR2B SORL1STAB1TAF10 TREM2 CXCL1 CXCL1 CXCL9 FCGBP SORL1 USP49 CXCL9 CXCL9 FSCN1 CCL18 CCL22 CXCL5 HTR2B NCAPH PALLD CCL18 ENPP2 NAMPT RPL37 SOCS3 CCL18 CYTH4 DDIT4 TPST1 AMIGO2 CXCL10 RNASE1SLAMF1 HMGB2 CXCL10 JAKMIP2 CXCL10 MMP19 SLC2A5 GPR153 MARCO BCL2A1 MARCO RPL23A S100A9 SLC39A8 TNFSF15 TNFAIP6 RAB3IL1 TNFAIP6 CLEC10A MS4A6A HLA-DOA SEMA4D APBB1IP CLEC10A TNFAIP6 ADORA3 SERPINA1SERPING1TNFRSF21 CTNNAL1 OLFML2B ALOX5AP ALOX15B RHOBTB1

Group Healthy Control Asthma d GSE7368

M0 M-LPSearly M-LPSlate M-LPS+IFNγ M-IFNγ M-IL4 M-IL10 M-dex

6

3 log odds

0

-3

TNF A2M EBI3 IDO1 TFPI FOS RPL6 RPL9 EBI3 IDO1 IER2 IL7R CD9 EBI3 IDO1 MX2 IDO1 IL1B IDO1 PLTP IL1B PLEK TFPI BMP6 CD47 FCMR GM2A IFI44LMT1G TIMP3 AVPI1 PTX3 SGK1 CD52 CLIC1 IFI44L ITGB7 MT1G EMP1 H1F0 EXOG G0S2 PTX3 RGS1 MAOA CCL5 CD47 MT1G THBD CCL5 GCLM JUND VSIG4 ASAP2 CAMK1CCL18CD101 FKBP5FUCA1 HTR2B SORL1STAB1TAF10 TREM2 CXCL1 CXCL1 CXCL9 FCGBP SORL1 USP49 CXCL9 CXCL9 FSCN1 CCL18 CCL22 CXCL5 HTR2B NCAPH PALLD CCL18 ENPP2 NAMPT RPL37 SOCS3 CCL18 CYTH4 DDIT4 TPST1 AMIGO2 CXCL10 RNASE1SLAMF1 HMGB2 CXCL10 JAKMIP2 CXCL10 MMP19 SLC2A5 GPR153 MARCO BCL2A1 MARCO RPL23A S100A9 SLC39A8 TNFSF15 TNFAIP6 RAB3IL1 TNFAIP6 CLEC10A MS4A6A HLA-DOA SEMA4D APBB1IP CLEC10A TNFAIP6 ADORA3 SERPINA1SERPING1TNFRSF21 CTNNAL1 OLFML2B ALOX5AP ALOX15B RHOBTB1

Group Steroid Resistant Asthma Steroid Sensitive Asthma bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

GSE54350

M0 M-LPSearly M-LPSlate M-LPS+IFNγ M-IFNγ M-IL4 M-IL10 M-dex

4

2 log odds 0

-2

TNF A2M EBI3 IDO1 TFPI FOS RPL6 RPL9 EBI3 IDO1 IER2 IL7R CD9 EBI3 IDO1 MX2 IDO1 IL1B IDO1 PLTP IL1B PLEK TFPI BMP6 CD47 FCMR GM2A IFI44LMT1G TIMP3 AVPI1 PTX3 SGK1 CD52 CLIC1 IFI44L ITGB7 MT1G EMP1 H1F0 EXOG G0S2 PTX3 RGS1 MAOA CCL5 CD47 MT1G THBD CCL5 GCLM JUND VSIG4 ASAP2 CAMK1CCL18CD101 FKBP5FUCA1 HTR2B SORL1STAB1TAF10 TREM2 CXCL1 CXCL1 CXCL9 FCGBP SORL1 USP49 CXCL9 CXCL9 FSCN1 CCL18 CCL22 CXCL5 HTR2B NCAPH PALLD CCL18 ENPP2 NAMPT RPL37 SOCS3 CCL18 CYTH4 DDIT4 TPST1 AMIGO2 CXCL10 RNASE1SLAMF1 HMGB2 CXCL10 JAKMIP2 CXCL10 MMP19 SLC2A5 GPR153 MARCO BCL2A1 MARCO RPL23A S100A9 SLC39A8 TNFSF15 TNFAIP6 RAB3IL1 TNFAIP6 CLEC10A MS4A6A HLA-DOA SEMA4D APBB1IP CLEC10A TNFAIP6 ADORA3 SERPINA1SERPING1TNFRSF21 CTNNAL1 OLFML2B ALOX5AP ALOX15B RHOBTB1

Group Diabetic Obese Non-Diabetic Obese A GSE10500 M0 M-LPSearly M-LPSlate M-LPS+IFNγ M-IFNγ M-IL4 M-IL10 M-dex

5.0

2.5 log odds

0.0

-2.5

IDO1 TFPI FOS TNF IDO1 IER2 IL7R CD9 H1F0 IDO1 MX2 IDO1 IL1B IDO1 IL1B TFPI BMP6 CD47 FCMR GM2A IFI44LMT1G TIMP3 PTX3 RPL6 RPL9 SGK1 CD52 CLIC1 IFI44L ITGB7 MT1G EMP1 EXOG G0S2 PTX3 RGS1 MAOA CCL5 CD47 MT1G PLTP THBD CCL5 DDIT4 GCLM JUND PLEK VSIG4 ASAP2 CAMK1CCL18CD101 FKBP5FUCA1 HTR2B SORL1STAB1TAF10 CXCL1 CXCL1 CXCL9FCGBP SORL1 CXCL9 MMP19 CXCL9 FSCN1 CCL18 CCL22 CXCL5 HTR2B NCAPH PALLD CCL18 ENPP2 NAMPT RPL37 SOCS3 CCL18 TPST1 AMIGO2 CXCL10 RNASE1SLAMF1 HMGB2 CXCL10 CXCL10 SLC2A5 MARCO BCL2A1 MARCO S100A9 SLC39A8 TNFAIP6 JAKMIP2 TNFAIP6 CLEC10A HLA-DOA SEMA4D CLEC10A OLFML2B TNFAIP6 ADORA3ALOX15B SERPINA1SERPING1TNFRSF21 CTNNAL1 ALOX5AP RHOBTB1

Group Healthy Control Rheumatoid Arthritis

B GSE49604 M0 M-LPSearly M-LPSlate M-LPS+IFNγ M-IFNγ M-IL4 M-IL10 M-dex 10

5 log odds

0

A2M EBI3 IDO1 TFPI FOS TNF EBI3 IDO1 IER2 IL7R CD9 EBI3 IDO1 MX2 IDO1 IL1B IDO1 IL1B TFPI BMP6 CD47 FCMR GM2A IFI44LMT1G TIMP3 AVPI1 PTX3 RPL6 RPL9 SGK1 CD52 CLIC1 IFI44L ITGB7 MT1G EMP1 H1F0 EXOG G0S2 PTX3 RGS1 CCL5 CD47 MT1G PLTP THBD CCL5 DDIT4 GCLM JUND PLEK VSIG4 ASAP2 CAMK1CCL18CD101 FKBP5FUCA1 HTR2B SORL1STAB1TAF10 TREM2 CXCL1 CXCL1 CXCL9 SORL1 USP49 CXCL9 MMP19 CXCL9 FSCN1 CCL18 CCL22 CXCL5 HTR2B MAOA PALLD CCL18 ENPP2 RPL37 SOCS3 CCL18 CYTH4 TPST1 AMIGO2 CXCL10 SLAMF1 HMGB2 CXCL10 FCGBP CXCL10 MS4A6A SLC2A5 GPR153 MARCONCAPH BCL2A1 MARCO NAMPT RPL23A S100A9 RNASE1SLC39A8 TNFSF15 TNFAIP6 JAKMIP2 RAB3IL1 TNFAIP6 CLEC10A HLA-DOA SEMA4D APBB1IP CLEC10A TNFAIP6 ADORA3ALOX15B SERPINA1SERPING1TNFRSF21 CTNNAL1 OLFML2B ALOX5AP RHOBTB1

Group Healthy Control Rheumatoid Arthritis

C GSE97779 M0 M-LPSearly M-LPSlate M-LPS+IFNγ M-IFNγ M-IL4 M-IL10 M-dex

8

4 log odds

0

bioRxiv preprint doi: https://doi.org/10.1101/532986; this version posted January 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

A2M EBI3 IDO1 TFPI FOS TNF EBI3 IDO1 IER2 IL7R CD9 EBI3 IDO1 MX2 IDO1 IL1B IDO1 IL1B TFPI BMP6 CD47 FCMR GM2A IFI44LMT1G AVPI1 PTX3 RPL6 RPL9 SGK1 CD52 CLIC1 IFI44L ITGB7 MT1G EMP1 H1F0 G0S2 PTX3 RGS1 CCL5 CD47 MT1G PLTP THBD CCL5 DDIT4 GCLM JUND PLEK ASAP2 CCL18CD101 FKBP5FUCA1 HTR2B SORL1STAB1TAF10 TIMP3 TREM2 CXCL1 CXCL1 CXCL9 SORL1 USP49 CXCL9 CXCL9 EXOG FSCN1 CCL18 CCL22 CXCL5 HTR2B MAOA PALLD CCL18 ENPP2 RPL37 SOCS3 CCL18 CYTH4 TPST1 VSIG4 AMIGO2 CAMK1 CXCL10 SLAMF1 HMGB2 CXCL10 FCGBP CXCL10 MMP19MS4A6A SLC2A5 GPR153 MARCONCAPH BCL2A1 MARCO NAMPT RPL23A S100A9 RNASE1SLC39A8 TNFSF15 TNFAIP6 JAKMIP2 RAB3IL1 TNFAIP6 CLEC10A HLA-DOA SEMA4D APBB1IP CLEC10A TNFAIP6 ADORA3ALOX15B SERPINA1SERPING1TNFRSF21 CTNNAL1 OLFML2B ALOX5AP RHOBTB1

Group Healthy Control Rheumatoid Arthritis

D E-MEXP3890 M0 M-LPSearly M-LPSlate M-LPS+IFNγ M-IFNγ M-IL4 M-IL10 M-dex

5 log odds

0

A2M EBI3 IDO1 TFPI FOS TNF EBI3 IDO1 IER2 IL7R CD9 EBI3 IDO1 MX2 IDO1 IL1B IDO1 IL1B TFPI BMP6 CD47 FCMR GM2A IFI44LMT1G AVPI1 PTX3 RPL6 RPL9 SGK1 CD52 CLIC1 IFI44L ITGB7 MT1G EMP1 H1F0 EXOG G0S2 PTX3 RGS1 CCL5 CD47 MT1G PLTP THBD CCL5 DDIT4 GCLM JUND PLEK ASAP2 CCL18CD101 FKBP5FUCA1 HTR2B SORL1STAB1TAF10 TIMP3 TREM2 CXCL1 CXCL1 CXCL9 SORL1 USP49 CXCL9 CXCL9 FSCN1 CCL18 CCL22 CXCL5 HTR2B MAOA PALLD CCL18 ENPP2 RPL37 SOCS3 CCL18 CYTH4 TPST1 VSIG4 AMIGO2 CAMK1 CXCL10 SLAMF1 HMGB2 CXCL10 FCGBP CXCL10 MMP19MS4A6A SLC2A5 GPR153 MARCONCAPH BCL2A1 MARCO NAMPT RPL23A S100A9 RNASE1SLC39A8 TNFSF15 TNFAIP6 JAKMIP2 RAB3IL1 TNFAIP6 CLEC10A HLA-DOA SEMA4D APBB1IP CLEC10A TNFAIP6 ADORA3ALOX15B SERPINA1SERPING1TNFRSF21 CTNNAL1 OLFML2B ALOX5AP RHOBTB1

Group Rheumatoid Arthritis