Supplementary Data

Th2 and non-Th2 molecular phenotypes of asthma using sputum transcriptomics in UBIOPRED

Chih-Hsi Scott Kuo1,2, Stelios Pavlidis3, Matthew Loza3, Fred Baribaud3, Anthony Rowe3, Ioannis Pandis2, Ana Sousa4, Julie Corfield5, Ratko Djukanovic6, Rene Lutter7, Peter J. Sterk7, Charles Auffray8, Yike Guo2, Ian M. Adcock1† & Kian Fan Chung1†* on behalf of the U-BIOPRED consortium project team#

1Airways Disease, National Heart & Lung Institute, Imperial College London, & Biomedical Research Unit, Royal Brompton & Harefield NHS Trust, London, United Kingdom; 2Department of Computing & Data Science Institute, Imperial College London, United Kingdom; 3Janssen Research and Development, High Wycombe, Buckinghamshire, United Kingdom; 4Respiratory Therapeutic Unit, GSK, Stockley Park, United Kingdom; 5AstraZeneca R&D Molndal, Sweden and Areteva R&D, Nottingham, United Kingdom; 6Faculty of Medicine, Southampton University, Southampton, United Kingdom; 7Faculty of Medicine, University of Amsterdam, Amsterdam, Netherlands; 8European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, Université de Lyon, France.
†Contributed equally
#Consortium project team members are listed under Supplementary Materials
*To whom correspondence should be addressed: [email protected]

List of the U-BIOPRED Consortium project team members

Uruj Hoda & Christos Rossios, Airways Disease, National Heart & Lung Institute, Imperial College London, UK & Biomedical Research Unit, Royal Brompton & Harefield NHS Trust, London, UK; Elisabeth Bel, Faculty of Medicine, University of Amsterdam, Amsterdam, Netherlands; Navin Rao, Janssen Research and Development, High Wycombe, Buckinghamshire, United Kingdom; David Myles, Respiratory Therapy Area Unit, GlaxoSmithKline, Stockley Park, UK; Chris Compton, Discovery Medicine, GlaxoSmithKline, Stockley Park, UK; Marleen Van Geest, AstraZeneca R&D Molndal, Sweden; Peter Howarth & Graham Roberts, Faculty of Medicine, Southampton University, Southampton, UK and NIHR Southampton Respiratory Biomedical Research Unit, University Hospital Southampton, Southampton, UK; Diane Lefaudeux, European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, Université de Lyon, France; Bertrand De Meulder, European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, Université de Lyon, France; Aruna T Bansal, Acclarogen Ltd, St John's Innovation Centre, Cambridge, CB4 0WS, UK; Richard Knowles, Knowles Consulting, Stevenage Bioscience Catalyst, Gunnels Wood Road, Stevenage SG1 2FX, UK; Damijn Erzen, Boehringer Ingelheim Pharma, Germany; Scott Wagers, BioSci Consulting, Maasmechelen, Belgium; Norbert Krug, Immunology, Allergology and Clinical Inhalation, Fraunhofer Institute for Toxicology and Experimental Medicine, Hannover, Germany; Tim Higenbottam, Corporate Clinical Development, Chiesi Pharmaceutics Ltd, Cheadle, UK.
Current address: Allergy Therapeutics, West Sussex, UK; John Matthews, Genentech Inc, 1 DNA Drive, South San Francisco, CA 94080-4990, USA; Veit Erpenbeek, Translational Medicine - Respiratory Profiling, Novartis Institutes for BioMedical Research, Basel, Switzerland; Leon Carayannopoulos, Merck Inc., Kenilworth, New Jersey, USA; Amanda Roberts, UBIOPRED Patient Input Platform, ELF, Sheffield, UK; David Supple, UBIOPRED Patient Input Platform, ELF, Sheffield, UK; Pim deBoer, UBIOPRED Patient Input Platform, ELF, Sheffield, UK; Massimo Caruso, Department of Clinical and Experimental Medicine Hospital University, University of Catania, Italy; Pascal Chanez, Département des Maladies Respiratoires, Laboratoire d'immunologie, Aix Marseille Université Marseille, France; Sven-Erik Dahlen, The Centre for Allergy Research, The Institute of Environmental Medicine, Karolinska Institute, Stockholm, Sweden; Ildikó Horváth, Department of Pulmonology, Semmelweis University, Budapest, Hungary; Norbert Krug, Fraunhofer Institute for Toxicology and Experimental Medicine Hannover, Germany; Jacek Musial, Dept. of Medicine, Jagiellonian University Medical College, Krakow, Poland; Thomas Sandström, Dept of Medicine, Respiratory and Allergy unit, University Hospital, SE 901 85 Umeå, Sweden.

Methods

Study design

This study analysed data from the recently reported U-BIOPRED cohort (1). A total of 104 participants (Supplementary Table S1) with moderate-to-severe asthma and 16 healthy non-asthma volunteers (HV) from the U-BIOPRED cohort underwent sputum cell profile analysis (1). Pre-bronchodilator spirometry, exhaled nitric oxide (FeNO), skin prick tests, serum total IgE, serum periostin, and differential blood count were measured. The study was approved by the Ethics Committees of the recruiting centres. All participants gave written informed consent. The data and bioinformatic analyses are described below.
Validation of the transcriptomic-associated clusters was performed using sputum transcriptomic data from the ADEPT asthma cohort (2).

Microarray analysis of sputum transcriptome

Sputum was induced by inhalation of hypertonic saline solution, and sputum plugs were collected, from which sputum cells and sputum supernatants were obtained as previously described (3). Cell pellets were stored in RNA stabilization buffer (Norgen Biotek, Thorold, Canada). RNA integrity (RIN >6) was assessed with an Agilent Bioanalyser (Agilent, Santa Clara, Calif). Expression profiling was performed using Affymetrix U133 Plus 2.0 microarrays (Affymetrix, Santa Clara, Calif). Raw data were quality assessed and pre-processed by the robust multi-array average (RMA) method (4): the raw probe-set intensities were log2-transformed and normalized. Probes of low expression were filtered by robust multi-array signal analysis (values <5) and also for batch/technical effects. A regression-based method (R package limma) was used to identify differentially expressed genes (DEGs) with respect to the groups of interest; batch/technical effects, age, sex and administration of oral corticosteroids were adjusted for as covariates in the linear model. The Benjamini and Hochberg false discovery rate (FDR) method was applied to adjust p-values for multiple testing.

SomaLogic Proteomic Technique

The SOMAscan proteomic assay is an array-based method that measures 1,096 proteins in each assay run; the technique has been described comprehensively elsewhere (5, 6). All proteomic measurements of sputum supernatants were performed by SomaLogic Inc. (Boulder, CO), blinded to all subjects’ clinical and transcriptomic data. Briefly, every protein measured in the assay has its own fluorophore-tagged SOMAmer (DNA) as a targeted reagent. SOMAmers that are in complexes with their cognate proteins are captured by automated partitioning steps.
Using a custom Agilent hybridization chip, designed as an antisense probe array that specifically hybridizes to the SOMAmers, the measurement of proteins was transformed into the measurement of the fluorescent intensity of the hybridized SOMAmers. Protein concentrations were originally reported in relative fluorescence units (RFU) and were log10-transformed before statistical analysis to reduce heteroscedasticity.

Pathway analysis of transcriptomic features

We analysed 508 differentially expressed genes (DEGs) from a comparison of the three groups of the U-BIOPRED cohort (Fig 1A, B; Supplementary Table S1). We defined a sputum eosinophil count ≥1.5% as eosinophilic and a neutrophil count ≥74% as neutrophilic, while pauci-granulocytic and mixed-granulocytic samples had counts below both and above both of these thresholds, respectively (1). Three sets of DEGs from pairwise contrasts of the sputum eosinophilic (EOS) and non-eosinophilic (non-EOS) phenotypes and healthy volunteers (HV) were analysed in order to obtain disease driver genes. A filtering criterion of false discovery rate (FDR) <0.05 and log2 fold change >0.5 was applied.

Computational and statistical analyses

Datasets were uploaded and curated in the tranSMART system (7). Statistical analysis was performed using the R environment for statistical computing. The false discovery rate was used to correct for multiple testing. Hierarchical clustering based on Euclidean distance was used for cluster exploration, and a resampling-based technique was used to optimize the cluster number. A supervised learning algorithm, the shrunken centroid method (8), was applied to the cluster findings to determine predictive signatures for each cluster, and feature reduction methods were implemented along with the learning algorithm to obtain a sparse model that facilitates interpretation. The Kruskal-Wallis test or ANOVA was used for multiple-group comparison of continuous variables.
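The FDR and fold-change filter described under pathway analysis can be illustrated with a minimal sketch. It is not the code used in this study, which was run in R; Python is used here only for illustration. The function names `benjamini_hochberg` and `select_degs`, the gene labels and all example values are hypothetical. The text does not state whether the log2 fold change threshold was applied to signed or absolute values, so the `absolute` switch is an assumption.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(
    genes: Sequence[str],
    p_values: Sequence[float],
    log2_fc: Sequence[float],
    fdr_cutoff: float = 0.05,
    lfc_cutoff: float = 0.5,
    absolute: bool = False,  # assumption: not specified in the text
) -> List[str]:
    """Keep genes with adjusted p < fdr_cutoff and log2 fold change > lfc_cutoff."""
    fdr = benjamini_hochberg(p_values)
    selected = []
    for gene, q, lfc in zip(genes, fdr, log2_fc):
        effect = abs(lfc) if absolute else lfc
        if q < fdr_cutoff and effect > lfc_cutoff:
            selected.append(gene)
    return selected


# Hypothetical example values, for illustration only.
example_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
example_p = [0.001, 0.008, 0.039, 0.041, 0.60]
example_lfc = [1.2, -0.9, 0.9, -0.8, 2.0]

if __name__ == "__main__":
    print(benjamini_hochberg(example_p))
    print(select_degs(example_genes, example_p, example_lfc))
    print(select_degs(example_genes, example_p, example_lfc, absolute=True))
```

With `absolute=False` only up-regulated genes pass the fold-change threshold, whereas `absolute=True` also retains down-regulated genes. Which reading matches the analysis reported here is not stated in the text.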
All categorical variables were analyzed using Fisher’s exact test, and a p-value <0.05 was considered statistically significant.

Optimal cluster number determination

In order to cluster asthma subjects using transcriptomic features, we first determined the optimal cluster number from these 508 DEGs. Consensus clustering, a resampling technique that takes into account the cluster consensus across multiple runs of a clustering algorithm, was used to address the issue of optimal cluster number (9-11). This method analyzes the distribution of cluster consensus among the N subjects based on an (N × N) matrix whose entries are the proportion of clustering runs in which two subjects are clustered together. The optimal cluster number is therefore determined by finding a cluster number K where consensus matrix histogram