INVESTIGATING THE MECHANISMS OF ANKYLOSING SPONDYLITIS-ASSOCIATED GENETIC VARIANTS

Jessica Whyte LLB/BSc, BSc (Hon), GDLP

Submitted in fulfilment of the requirements for the degree of

Doctor of Philosophy

School of Biomedical Sciences

Faculty of Health

Queensland University of Technology

2021

Keywords

Ankylosing spondylitis, spondyloarthritis, genetics, expression, epigenetics, immunology, lymphocytes, autoimmunity, statistics.

ANZSRC

060404 Epigenetics, 45%

060405 Gene Expression, 45%

110706 Immunogenetics, 10%

FoR

0604 Genetics, 45%

1107 Immunology, 45%

0104 Statistics, 10%

Investigating the mechanisms of Ankylosing Spondylitis-associated genetic variants i


Abstract

Ankylosing spondylitis (AS) is an inherited chronic immune-mediated disease with a complex genetic and immunological basis. Early identification of the heritability of AS in both twin and population-based studies spurred research into disease-associated genetic polymorphisms, and 117 genetic loci have now been associated with AS. However, these loci cumulatively account for only a minority of the heritability of AS (<28%). Understanding of the genetic contribution to AS is further limited because many of these variants are non-coding or intergenic with unknown function, which suggests that they may act through epigenetic effects. Profiling epigenetics, such as DNA methylation and RNA transcription, may inform on the functional impact of AS-associated genetic variants on the immune system and disease. Unfortunately, a lack of research into DNA methylation and transcription in AS has left the field stagnant for some time. Further research is needed to improve understanding of the fundamental basis of AS and to facilitate the identification of novel pathways for drug treatment.

This thesis sought to examine the functional impact of disease-associated genetic variants by examining changes in DNA methylation and transcription related to AS.

Chapter 1 will review the literature on the structure of the immune system, current diagnosis and treatment of ankylosing spondylitis, and current research into the genotypic, transcriptional and methylation changes underlying ankylosing spondylitis.

Chapter 2 will outline the ethical, experimental, and statistical methodology used in this study for the remaining chapters.


Chapter 3 contains the results of the DNA methylation data analysis of both individual CpG positions and regions, followed by pathway analysis of the genes associated with these changes in methylation and specifically with AS-associated loci.

Chapter 4 addresses the transcriptional data analysis, specifically investigating changes in gene expression, both broadly and within AS-associated genes, and the pathways associated with these changes in expression.

Chapter 5 discusses the integrated analysis of the data generated in Chapters 3 and 4 with the AS-associated genetic loci, first through quantitative trait loci (QTL) analysis and then as an integrated signature across all three datasets (DNA methylation, transcription, and genotype).

The results of these chapters implicate two new genes, ETS2 and RNASET2, for examination in the pathogenesis of AS. The gene expression and DNA methylation changes associated with known AS-associated genetic loci occurred in a highly cell type-specific manner and were indicative of increased inflammatory pathway activity in most cell types. Integrating the three datasets produced a better signature for disease, but also highlighted the complexity that remains unresolved in the pathogenesis of AS.

In summary, this thesis outlines the experimental and statistical approaches used to investigate changes in DNA methylation and gene expression in individual immune cell types within the context of AS, and highlights the association of these changes with AS genetic loci. It provides both an insight into the cell type-specific changes driven by these genetic loci and a basis on which future research can build.

Statement of Original Authorship

The work contained in this thesis has not been previously submitted to meet requirements for an award at this or any other higher education institution. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made. I have clearly stated the contribution of others to my thesis as a whole, including statistical assistance, survey design, data analysis, significant technical procedures, professional editorial advice, financial support and any other original research work used or reported in my thesis.

Signature: QUT Verified Signature

Date: 25/05/2021


Publications included in this thesis

None.

Other publications during candidature

Whyte JM, Ellis JJ, Brown MA, Kenna TJ. Best practices in DNA methylation: lessons from inflammatory bowel disease, psoriasis, and ankylosing spondylitis. Arthritis Res Ther 2019; 21(1): 133.

Published conference abstracts

J. Whyte, L. Bradbury, J. Phipps, S. Song, M. Clout, T. Kenna, M. Brown. DNA methylation is distinct between AS cases and controls and in different peripheral blood mononuclear cell types. International Journal of Rheumatic Diseases. 22 (S3) 40-226.


Contributions to thesis

Professor Matt Brown and Associate Professor Tony Kenna contributed substantially to project conception and design, interpretation of research data and critical revision of written work.

Sample collection and Ethics

Research ethics, participant recruitment, collection and documentation were contributed to substantially by Rheumatology Nurse Practitioner Linda Bradbury, research nurses Kelly Hollis and Julie Phipps, Professor Brown, and laboratory administrator Kim Gardner. Kerrie McAloney, coordinator for the QTwin registry directed by Professor Nick Martin, assisted with the ethics and recruitment of HLA-B*27 positive healthy volunteers.

Sample Processing

Technicians at the Translational Research Institute Flow Cytometry Core Facility ran prepared cells on the Astrios machine for FACS using a gating strategy developed by Associate Professor Kenna and Jessica Whyte. Erika De Guzman performed the genotyping on the Illumina CoreExome and the bisulfite conversion of DNA, and ran the Illumina MethylationEPIC BeadChip on the Illumina iScan system. Lisa Anderson assisted with the RNAseq library preparation clean-up and library pooling. Lisa Anderson and Sahana Manoli ran prepared libraries on the Illumina NextSeq during pilot studies, and on the Illumina NovaSeq for the RNAseq libraries.

Data quality control and processing

Dr Zhixiu Li and Dr Jonathan Ellis performed the genotype calling and quality control. Dr Ellis assisted with STAR alignment with a standard pipeline. Dr Peter Sternes assisted with running the Matrix eQTL analysis and generation of QQ and boxplots of the outputs. Dr Aimee Hanson provided example code for the permutation testing outlined in Chapter 2.

Work by the student

All other work described within this thesis was performed by the student, Jessica Whyte. This included aspects of project design, ethics application and PBMC processing, as well as all FACS processing, FACS analysis, DNA and RNA isolation, sample randomization, library preparation, DNA methylation analysis, RNAseq analysis, DIABLO analysis and generation of all figures and tables.

Financial Contributions

This research was supported by an Australian Government Research Training Stipend, and through the financial support of Professor Brown and Associate Professor Kenna.

Professor Brown provided financial support for the registration cost of attending the Asia Pacific League of Associations for Rheumatology (APLAR) Conference 2019.

Research involving Human or Animal subjects

Human research ethics approval for the use of the blood samples collected within this study was granted by the following research ethics committees:

Metro South Hospital and Health Service Human Research Ethics Office (approval no. HREC/05/QPAH/221)

Princess Alexandra Hospital Human Research Ethics Committee (approval no. 2005/221)

Queensland University of Technology Human Research Ethics Committee (approval no. 1600000162)

QIMR Berghofer Human Research Ethics Committee (approval nos. P193 and P455)


Acknowledgements

A PhD document is a slice of life, built on the support and work of a small village. It is hard to encapsulate all of the ways I have been supported in this journey when so many have contributed to this work being possible.

The backbone of support for this PhD and the people who enabled me to begin this journey in the first place were Matt Brown and Tony Kenna. It has been an enriching four years, and you have both supported me every step of the way. Despite the delays and the moves (was it something I said?), you have provided me with a wealth of experience. I appreciated that I could rely on the both of you, and the certainty that when you said you would email back a draft it would be provided by that date. It may seem small, but during my most anxious times it was something I could rely upon. When I started this journey with you both I had no immunology knowledge and had never heard of ankylosing spondylitis; now I can say with great confidence that I have a working knowledge of both. Thank you for the time, the knowledge, and the patience you have both extended to me these past four years.

To my wonderful parents, since I was little you have given me room to grow and support to fall back on. You have looked after me throughout this long journey and your unwavering support has made this possible. My achievements have been built on the life lessons you have taught me, including perseverance, hard work but also acceptance of things that are beyond our ability to change. I love you both dearly.

To Patrick, our relationship has flowed alongside the journey of my PhD and I honestly cannot imagine what it would have been like without you. You make me so happy and you have held me together these past few months.


A special thank you must be made to Dr Aimee Hanson! Aimee, you taught me how to collect PBMCs, how to set up my HPC, how to code, and you have provided so much support and happiness throughout this journey. I am so glad to have done my PhD at the same time as you, and I know you will make strides in your future career in “sunny” England!

To the other members of Brown lab, both past and present, you have all been a great group to work with. I adored the morning teas and the puzzles, which saved my sanity on more than one occasion. Kate, you are responsible for my podcast addiction, and my wine subscription. You are the best! Thank you to the ATGC crew for helping me with so much, and to Peter and Jonathan for answering my many, many, many emails.

I have such admiration and appreciation for our Rheumatology Nurse Practitioner Linda Bradbury, research nurses Kelly Hollis and Julie Phipps, and laboratory administrator Kim Gardner. So much organization went into collecting these samples, and honestly it all goes on behind closed doors. Without you there would be no thesis.

Finally, I need to thank my grammar girls. It’s been 12 years of support and zaniness. Even when you knew nothing about what I was doing you guys cheered me on. Thank you for letting me rant when it probably made no sense to you.

In keeping with tradition, a quote to work by:

Do, or do not. There is no try.

-Yoda


Table of Contents

Keywords
Abstract
Statement of Original Authorship
Acknowledgements
Table of Contents
List of Figures
List of Tables
List of Abbreviations
Chapter 1: Introduction
  Immune system
  Antigen processing and presentation
  Spondyloarthritis
  Ankylosing Spondylitis
  Impact of AS on Quality of life
  Diagnosis
  Treatment
  Genetics of ankylosing spondylitis
  HLA-B*27
  Arthritogenic peptide
  ER stress response
  Heavy chain homodimers
  Immunodeficiency
  ERAP1 and ERAP2
  Interleukin 23
  Osteoimmunology
  Missing heritability and functionality
  Cell types in Ankylosing Spondylitis
  DNA methylation in AS
  Differential gene expression
  RNA expression in AS
  Multi-omics integrated analysis
  Thesis outline
  Aims
  Significance
Chapter 2: Materials and Methods
  Participants
  Genotyping
  PBMC processing
  Fluorescence activated cell sorting
  DNA and RNA extraction
  DNA methylation methods
  RNAseq methods
  Integrated Analysis
Chapter 3: DNA methylation in AS
  Introduction
  DNA methylation in AS
  Insights from genetically related diseases
  Measuring DNA methylation
  Chapter Outline
  Results
  Quality assessment of DNA methylation data
  Differential methylation analysis
  Differentially methylated regions
  Pathway analysis
  DMP in AS-associated loci regions
  Discussion
  Conclusion
Chapter 4: RNA expression in AS
  Introduction
  Chapter outline
  Results
  Sequencing metrics
  Differential expression analysis
  Pathway analysis
  Discussion
  Conclusions
Chapter 5: Integrated analysis
  Introduction
  Chapter Overview
  Results
  DIABLO signature for AS
  Discussion
  Conclusion
Chapter 6: Final Discussion
  Overview of project findings
  Project Significance
  Future directions
  Concluding remarks
Bibliography
Appendices
  Appendix A AS-associated genetic loci and their associated genes (Genome wide association study (GWS) or cross-disease analysis (CDA))
  Appendix B Heat scree plot code
  Appendix C DMP Permutation code
  Appendix D Pilot study comparing library preparation methods
  Appendix E Demographics for γδ T-cell and NK cell samples
  Appendix F Online supplementary Data
    (1) DMP results for individual cell types
    (2) DEG results for individual cell types
    (3) Cis-QTL results for both methylation and gene expression


List of Figures

Figure 1.1. Lineage for major immune cell types originating from haematopoietic stem cells.
Figure 1.2 Antigen presentation by HLA class I and II.
Figure 1.3. Examples of radiographic imaging for the spinal vertebrae in a healthy individual and an AS individual with a 'bamboo' spine phenotype (full ankylosis of the spine) 19.
Figure 1.4 Graphical representation of the CpG naming conventions based on proximity to CpG Island.
Figure 1.5 The process of DNA methylation addition, maintenance and removal.
Figure 1.6 Illustration of the cis and trans relationship between SNPs with DNA methylation and gene expression.
Figure 2.1 FACS gating strategy used for sorting cell subsets.
Figure 2.2 Probe attrition throughout the process.
Figure 3.1. PCA plot of all samples labelled by cell type in PC 1 and 2.
Figure 3.2 Heat scree plot visualising each factor's significance in explaining the variation within each PC.
Figure 3.3 PCA plots of each cell type coloured by disease status.
Figure 3.4. Boxplot of the most significant DMP identified for CD4+T-cells (cg03740323)
Figure 3.5 Volcano plot of CD4+T-cell DMPs.
Figure 3.6 CD8+T-cell DMP cg09997271 had the lowest p-value.
Figure 3.7 Volcano plot of CD8+T-cell DMPs.
Figure 3.8 Volcano plot of CD14+monocyte DMP
Figure 3.9 Volcano plot of γδ T-cell differentially methylated positions.
Figure 3.10 Volcano plot of NK cell DMP
Figure 3.11 Plot of the significant CD4+T-cell DMR in the TMEM204 transcript
Figure 3.12 The most significant DMR in CD8+T-cells.
Figure 3.13 DMR containing the most significant DMP for CD8+T-cells (cg09997271) within the PDE4A promoter region.
Figure 3.14 Gviz plot of the NK cell DMR overlapping the promoter region of AKAP8L (shown as reverse strand). The region is near to WIZ.
Figure 3.15. GO terms identified using the hypermethylated DMP (left) and hypomethylated DMP (right) in CD4+T-cell.
Figure 3.16 Cnet plots of CD4+T-cell associated KEGG terms for hypermethylated DMP (left) and hypomethylated DMP (right).
Figure 3.17 Cnetplots for the 5 most significant GO terms associated with CD8+T-cell hypermethylated (left) and hypomethylated (right) genes.
Figure 3.18 Cnetplot for GO terms associated with NK cell hypermethylated (left) and hypomethylated (right) DMPs.
Figure 4.1 Alignment statistics from STAR for number and percentage of reads aligned for each sample.
Figure 4.2 PCA plot for RNAseq counts showing cell type within the first and second PCs.
Figure 4.3. Heat scree plot for technical and biological variables influence on variation within each PC.
Figure 4.4 Volcano plot of the DEG associated with CD4+T-cells
Figure 4.5 Volcano plot of CD8+T-cell DEG
Figure 4.6 Heatmap of the 20 most significant DEG in CD8+T-cells forward strand (A) and reverse strand (B) coloured by fold change in expression levels.
Figure 4.7 Volcano plot of the DEG associated with CD14+monocytes
Figure 4.8 Volcano plot of γδ T-cell associated DEG
Figure 4.9 Heatmaps of γδ T-cell 20 most significant DEG for forward (A) and reverse (B) strand reads.
Figure 4.10 Volcano plot of NK cell associated DEG
Figure 4.11 Cnetplot of CD4+T-cell reverse strand-associated GO terms.
Figure 4.12. Cnetplot of the 5 most statistically significant GO terms associated with CD14+monocyte DEG in forward (left) and reverse (right) strand reads.
Figure 4.13 Cnet plot of CD14+monocyte reverse strand read associated KEGG terms.
Figure 4.14 Cnet plot of NK cell reverse strand associated GO terms
Figure 5.1 QQ plot of the CD4+T-cell cis (local) and trans (distal) mQTL results.
Figure 5.2 cis-mQTLs associated with the ETS1 region. Overlayed for CD4+T-cells (blue), CD8+T-cells (orange) and CD14+monocytes (green).
Figure 5.3 QQ plot of eQTL for CD4+T-cells indicating local (cis) and distal (trans) associations.
Figure 5.4 CD4 T-cell expression and methylation values for the ERAP2 SNP genotype (5:96211741:T:G).
Figure 5.5 Gviz plot of the cis-mQTL and eQTL across CD4+T-cells (blue), CD8+T-cells (orange) and CD14+monocytes (green) within the ERAP1/ERAP2 locus.
Figure 5.6 cis-mQTL and cis-eQTLs in CD4 and CD8 T-cells associated with RNASET2.
Figure 5.7 Gviz plot for cis-mQTL and eQTL within the NPEPPS/TBX21 region. Shown are CD4+T-cells (blue), CD8+T-cells (orange), NK cells (red) and CD14+monocytes (green)
Figure 5.8 PLS-DA results for CD4+T-cells, showing the classification error rate across components (A) and the PLS-DA plot for components 1 and 2 (B).
Figure 5.9 sPLS-DA output for CD4+T-cells indicating the balanced error rate over number of selected features (A) and the sPLS-DA plot for components 1 and 2 (B).
Figure 5.10 sPLS-DA plot of DNA methylation (top) and gene expression (bottom) within each cell type.
Figure 5.11 Classification error rate across 10 components for each cell type (A) and the correlation between the different data types (B).
Figure 5.12 Plot of the variables from each 'block' (data type) that contributed to component 2 in CD4+T-cells, and the circos plot showing the correlations between these variables.
Figure 5.13 Plot of the variables from each 'block' (data type) that contributed to component 2 in CD8+T-cells, and the circos plot showing the correlations between these variables (bottom right).
Figure 5.14 Clustered image map of each cell type from the final DIABLO model. The disease groups are shown as rows and variables as columns, with the dataset of origin indicated.


List of Tables

Table 1.1 Average frequency of PBMC cell subtypes in the blood of healthy human donors 3.
Table 1.2 Modified New York criteria for diagnosis of AS 42.
Table 1.3 A list of drugs currently available or undergoing Phase III trials for the treatment of AS.
Table 1.4 Summary of design and outcomes of DNA methylation studies in AS.
Table 1.5 A summary of previous studies investigating gene expression in AS.
Table 2.1 Cohort statistics for clinical and biological parameters.
Table 2.2 Primer sets for HLA-B*27 typing using PCR.
Table 2.3 Sample numbers for each cell type within the five sample plates processed.
Table 2.4 The number of samples for DNA methylation from each cell type post-QC.
Table 2.5 Adapter sequences used for the Total RNAseq libraries.
Table 2.6 The number of RNAseq samples post quality control.
Table 3.1 Sample numbers for each cell type for AS and healthy individuals (HC).
Table 3.2 Number of probes within each cell type below various significance cut-offs.
Table 3.3. Number of DEG and permuted FDR for each cell type at various p-value cut-offs.
Table 3.4 CD4+T-cell 20 most significant differentially methylated positions.
Table 3.5 CD8+T-cell 20 most significant DMP.
Table 3.6 CD14+monocyte 20 most significant DMP.
Table 3.7. γδ T-cells 20 most significant differentially methylated positions.
Table 3.8 NK cell 20 most significant DMP.
Table 3.9 The 10 most significant DMR associated with AS in CD4+T-cells.
Table 3.10 The 10 most significant DMR identified in CD8+T-cells.
Table 3.11 10 CD14+monocyte DMR with the lowest Stouffer value ranked by minimum FDR
Table 3.12 10 most significant DMR associated with AS in γδ T-cells.
Table 3.13 10 most significant DMR associated with AS in NK cells.
Table 3.14 Enrichment of DMP in different window sizes around AS-associated loci.
Table 3.15. The number of probes available at the AS-associated loci within the window sizes examined.
Table 3.16 The percent of AS-associated loci within each window that had no DMP.
Table 4.1 Sample numbers for AS individuals and healthy controls post QC.
Table 4.2. DEG numbers at different levels of significance for each cell type in the forward strand dataset.
Table 4.3. DEG numbers at different levels of significance for each cell type in the reverse strand dataset.
Table 4.4 Permuted FDR and number of DEGs at various p-value cut-offs for each cell type.
Table 4.5 Forward strand read results for all cell types for genes implicated at loci previously associated with AS.
Table 4.6 Reverse strand results for all cell types for genes implicated at loci previously associated with AS.
Table 4.7. Top 20 differentially expressed genes in CD4+T-cells forward strand.
Table 4.8 Top 20 differentially expressed genes in CD4+T-cells reverse strand.
Table 4.9 CD8+T-cell forward strand top 20 differentially expressed genes in AS individuals compared to healthy controls.
Table 4.10 CD8+T-cell reverse strand top 20 differentially expressed genes in AS individuals compared to healthy controls
Table 4.11 CD14+monocytes forward strand top 20 differentially expressed genes in AS individuals compared to healthy controls.
Table 4.12 CD14+monocytes reverse strand top 20 differentially expressed genes in AS individuals compared to healthy controls.
Table 4.13. γδ T-cell forward strand top 20 differentially expressed genes.
Table 4.14 γδ T-cell reverse strand top 20 differentially expressed genes.
Table 4.15 NK cell forward strand top 20 differentially expressed genes.
Table 4.16 NK cell reverse strand top 20 differentially expressed genes.
Table 5.1 The number of cis-mQTL results by cell type with the most significant cis-mQTL shown.
Table 5.2 qvalue for top cis-mQTL associated with the lead SNP for the AS-associated genes listed.
Table 5.3 The number of trans-mQTL results for each cell subset with the most significant trans-mQTL shown.
Table 5.4 cis-eQTL results by cell type indicating the number of unique genes and those associated within the HLA region.
Table 5.5 Qvalues for individual cell types cis-eQTL with the lead SNPs and the gene associated.
Table 5.6 cis-eQTL associations for lead SNPs that are not with the gene thought to be for that SNP
Table 5.7 Trans-eQTL results by cell type indicating total mQTL and number of mQTL with unique genes.
Table 5.8 Number of healthy and AS individuals with data available per cell type.
Table 5.9 Classification error rates of each cell type individual datasets for PLSDA (centroids distance).
Table 5.10 Balanced error rate (BER) and number of correct assignations for the test cohorts.


List of Abbreviations

APC  antigen presenting cell
AS  ankylosing spondylitis
ASAS  assessment of spondyloarthritis international society
ASDAS  ankylosing spondylitis disease activity score
axSpA  axial spondyloarthritis
β2-m  β2-microglobulin
BASDAI  Bath ankylosing spondylitis disease activity index
BASFI  Bath ankylosing spondylitis functional index
bDMARD  biological disease modifying anti-rheumatic drugs
CNV  copy number variation
CpG  cytosine-phosphate-guanine paired bases
CRP  C-reactive protein
DAMP  damage associated molecular pattern
DC  dendritic cell
DEG  differentially expressed gene
DMP  differentially methylated position
DMR  differentially methylated region
DNMT  DNA methyltransferase
ER  endoplasmic reticulum
ERAP  endoplasmic reticulum aminopeptidase
ESR  erythrocyte sedimentation rate
eQTL  expression quantitative trait loci
FACS  fluorescence activated cell sorting
FDR  false discovery rate
GO  gene ontology
GRS  genetic risk score
HLA  human leukocyte antigen
IBD  inflammatory bowel disease
IFN-γ  interferon-γ
IL  interleukin
KEGG  Kyoto encyclopaedia of genes and genomes
KIR  killer-cell immunoglobulin-like receptors
LD  linkage disequilibrium
lncRNA  long non-coding RNA
MAF  minor allele frequency
MFI  mean fluorescence index
miRNA  microRNA
mRNA  messenger RNA
mQTL  methylation quantitative trait loci
MRI  magnetic resonance imaging
NGS  next-generation sequencing
NK cell  natural killer cell
nrAxSpA  non-radiographic axial spondyloarthritis
NSAID  non-steroidal anti-inflammatory drug
PAMP  pathogen associated molecular pattern
PBMC  peripheral blood mononuclear cell
PC  principal component
PCA  principal components analysis
PLS  partial least squares
pSpA  peripheral spondyloarthritis
RIN  RNA integrity number
RNAseq  RNA sequencing
SF  synovial fluid
SLE  systemic lupus erythematosus
SNP  single nucleotide polymorphism
SpA  spondyloarthritis
SVA  surrogate variable analysis
TCR  T-cell receptor
TNF-α  tumour necrosis factor-alpha
VST  variance stabilising transformation



Chapter 1: Introduction

Ankylosing spondylitis (AS) is a common immune-mediated inflammatory arthritis that affects approximately one in two hundred Australians. Early studies identified the strong genetic basis of AS, and technological advancements, alongside changes in study design and sample size, have steadily increased the number of AS-associated genetic loci identified. While these loci have provided some information on the pathways involved in AS, they do not explain the majority of AS heritability. Identification of loci alone does not provide insight into how these loci alter gene function. The mechanism through which these genetic changes alter phenotype may be direct, as non-synonymous mutations, or indirect, through effects on regulatory functions.

Understanding the fundamental biology underlying the development and progression of AS is key to developing better treatment and diagnostics for individuals affected by this disease. Outlined below are the immune system concepts implicated in AS, AS diagnosis and treatment, and the current research into the genetics and epigenetics of AS.

Immune system

The immune system comprises two broad arms: innate and adaptive immunity. Innate immunity is the first line of defence against pathogens and consists of physical, chemical and cellular responses. Physical and chemical barriers exist throughout the human body to prevent access by pathogens; prime examples of these barriers are the gut mucosa and the epidermis. When these barriers fail, additional layers of innate response may be implemented.


Innate cell types can respond to damage associated molecular patterns (DAMPs) and pathogen associated molecular patterns (PAMPs), enabling them to recognise pathogens and undertake inflammatory responses. Phagocytic cells engulf and digest pathogens and cellular debris. Whilst most cells are capable of phagocytosis, ‘professional’ phagocytic cells include macrophages, neutrophils, and mature dendritic cells (DCs). Additionally, innate cell types can respond to recognition of DAMPs and PAMPs by displaying antigen to adaptive cell types. These innate cells are known as antigen presenting cells (APCs) and include monocytes and natural killer (NK) cells (lineage for these cell types is shown in Figure 1.1).

Figure 1.1. Lineage for major immune cell types originating from haematopoietic stem cells.

Monocytes are classified as classical (CD14++CD16-), intermediate (CD14++CD16+), or non-classical (CD14+CD16++) 2. Classical monocytes are the major monocyte subset in blood (~90%). Monocytes are the cellular precursors to DCs, tissue macrophages and osteoclasts (cells responsible for bone resorption). NK cells are primarily involved in the direct killing of virally infected cells and in producing interferon-γ (IFN-γ), a cytokine that activates macrophage phagocytosis of microbes 3. APCs are constantly sampling their surroundings and internal peptides. On recognition of antigens, these cells activate and migrate to the lymph node where they display antigen to T-cells and activate adaptive immune responses.

Innate immunity provides a rapid response to immune challenges, but innate responses are constrained by the germline encoded receptors. Adaptive immunity complements innate immunity by providing a broader scope for antigen recognition through somatic recombination. Humoral adaptive immunity involves antibody mediated neutralization of pathogens by B-cells, and cellular immunity involves T-cell mediated activation of immune responses.

T-cells are defined by the expression of a T-cell receptor (TCR), either αβ or γδ, and co-expression of a CD3 co-receptor. All T-cells originate from a bipotent progenitor in the thymus 4. γδ T-cells are the first T-cells to emerge during development, and initially comprise the bulk of T-cells 5. Over time the balance shifts to αβ T-cells in the blood and lymphoid tissue, while γδ T-cells can comprise between 10-100% of tissue resident T-cells in adults 6. The most common T-cells in peripheral blood, αβ T-cells are further classified by the CD4 and CD8 membrane co-receptors that assist T-cell receptors to bind to human leukocyte antigen (HLA).

Unlike αβ T-cells, γδ T-cells span the adaptive and innate immune systems. γδ T-cells can recognise non-peptide antigens without stimulation through the use of pattern recognition receptors which recognise DAMPs and PAMPs 7,8. In response to these challenges, γδ T-cells are potent producers of the inflammatory cytokines IFN-γ, interleukin-17A (IL-17A) and tumour necrosis factor-alpha (TNF-α) 9. As with αβ T-cells, inflammatory cytokine expression is used to categorise the function of these cells. Unfortunately, there is a lack of knowledge of the antigens recognised by γδ T-cells, or whether antigens are in fact presented to γδ T-cells in a classical sense as with αβ T-cells. Mucosal-associated invariant T (MAIT) cells are another T-cell type that exhibits different characteristics to conventional T-cells, expressing only a single T-cell receptor chain which recognises bacteria-derived vitamin B metabolites 10. Both innate and adaptive immune cells are highly dynamic, responding to immune challenges and disease through shifts in cellular composition and function (Table 1.1) 11-13.

Table 1.1 Average frequency of PBMC cell subtypes in the blood of healthy human donors 3.

Cell Type         % PBMC in Healthy Donors
T-cells           48-77
γδ T-cells        2-5
CD4+T-cells       26-48
CD8+T-cells       12-33
B-cells           6-16
NK Cells          6-35
CD14+Monocytes    6-9

Antigen processing and presentation

APCs constantly sample the internal and external environment and present peptide antigens to T and B cells, as a process of surveillance for anomalies such as pathogens, cellular damage, and unregulated cellular behaviour. Peptides are displayed to T-cells via three antigen presentation pathways: exogenous, endogenous, and cross presentation (shown in Figure 1.2). Endogenous peptides can be viral or from the cell itself; therefore, several layers of regulation surround the process of antigen presentation. Endogenous peptides are first trimmed to 9-17 amino acids by the proteasome, after which these peptides can be transported into the endoplasmic reticulum (ER) by the transporter-associated protein (TAP). Once within the ER, amino peptidases, particularly the endoplasmic reticulum amino peptidase 1 (ERAP1), trim the peptides further to 9 amino acids, the optimal length for HLA peptide presentation.

Trimmed peptides can be loaded onto the HLA class I complex. The classical HLA class I receptors encompass the allelic forms HLA-A, HLA-B and HLA-C. HLA class I is formed in the ER by an HLA heavy chain and β2-microglobulin complex folded around an endogenous peptide. Once folded, this tri-molecular complex is transported to the cell surface and displayed to CD8+T-cells. In the correct environmental cytokine conditions and co-receptor recognition, HLA-peptide recognition triggers proliferation and differentiation of CD8+T-cells into cytotoxic T-cells capable of releasing cytotoxic proteins to directly kill compromised cells 3.

Exogenous peptides are sampled from the environment through endocytosis and digested within the endosome. HLA class II receptors include HLA-DPA1, HLA-DPB1, HLA-DQA1, HLA-DQB1, HLA-DRA and HLA-DRB1. HLA class II molecules consist of an alpha and beta chain that are assembled within the ER and stabilised by the invariant chain (Ii). This complex is transported through the Golgi to the MHC class II compartment, where the acidic pH cleaves Ii to the residual class II-associated Ii peptide (CLIP). The HLA class II-CLIP complex is transported to the cytosol, where the CLIP is exchanged for antigenic peptide. This peptide-MHCII complex is transported to the cell surface and presented to CD4+T-cells. On recognition of the exogenous peptide-HLA complex, CD4+T-cells secrete cytokines that bind to macrophages and activate B-cells. CD4+T-cells are involved in activating or suppressing the response of other immune cells through cytokine signalling (helper T-cells). CD4+T-cells involved in self-tolerance through suppression of self-antigen recognising T-cells are classified as regulatory T-cells (Tregs), and are identified by the expression of FoxP3 14-16.

Figure 1.2 Antigen presentation by HLA class I and II. Peptides are displayed to T-cells via three antigen presentation pathways: exogenous (left), endogenous (middle) and cross presentation (right).

HLA class I receptors can also present exogenous peptides through cross-presentation 17,18. Cross-presentation can occur through two primary pathways: the vacuolar pathway and the endosome-to-cytosol pathway. In the vacuolar pathway, antigen processing by lysosomal proteases and loading onto HLA class I receptors all occur within the endo/lysosomal compartment. Alternatively, endosome-to-cytosol processing involves the transport of internalised proteins to the cytosol, where they are degraded by the proteasome and puromycin-sensitive aminopeptidase (NPEPPS) and transported into the ER or to antigen containing endosomes for loading onto HLA class I. There is some evidence that cross-presentation can occur without endocytosis, through the transport of pre-processed antigens or preloaded HLA class I from a donor cell through cell-cell contact. The physiological significance of this process and its functional role remain to be elucidated 17.

HLA-peptide complex display alone is not sufficient to prime T-cells. Specific environmental cytokine conditions and co-receptor recognition must occur alongside HLA-peptide complex recognition to activate T-cells. This checkpoint ensures that endogenous peptides, such as self-peptides, do not activate T-cells. Complementary to these environmental requirements for activation, NK cells have developed to detect ‘escape’ from proper HLA development, which can indicate incorrect cellular development or viral downregulation. Notably, memory T-cells are not as reliant on cytokine stimulation for activation as their naive counterparts 19.

Killer cell immunoglobulin-like receptors (KIRs) are expressed on both NK cells and subsets of T-cells, and bind to HLA class I, whilst the CD94 receptor, which is expressed by NK cells, is specific for binding HLA-E 20. Proper recognition of HLA expression on a target cell inhibits the effector functions of NK cells; however, a lack of expression results in inflammatory signalling and a direct cytotoxic response by the NK cell 21. These checkpoints ensure that the activation of T-cells is highly regulated.

Immune homeostasis relies upon the balance between tolerance towards self and protection from pathogens. When the delicate balance of tolerance and protection shifts, the immune system that is designed to protect from external challenges can itself cause harm.

Spondyloarthritis

Spondyloarthritis is an umbrella term for a family of chronic rheumatic diseases, including axial spondyloarthritis (axSpA), peripheral spondyloarthritis (pSpA), psoriatic arthritis, reactive arthritis and enteropathic arthritis (complicating inflammatory bowel disease). Spondyloarthritis affects the entheses, the sites of insertion of muscle to bone, generally within high stress regions in the peripheral (e.g. hands) and/or axial (spine) joints. The diseases are typically divided into pSpA or axSpA, based on the sites of inflammation; however, individuals may have both peripheral and axial disease. A category of undifferentiated spondyloarthritis is used to classify individuals who exhibit features of spondyloarthritis but fall short of reaching the classification threshold for a specific disease subgroup.

AxSpA includes, but is not limited to, ankylosing spondylitis (AS), the most common form of spondyloarthritis. The Assessment of Spondyloarthritis International Society (ASAS) classification criteria for axSpA include both clinical (such as HLA-B*27 status) and imaging criteria. Non-radiographic axial spondyloarthritis (nr-axSpA) involves inflammatory changes in the joint identifiable by magnetic resonance imaging (MRI) but not radiographic changes that are identifiable by x-ray. The presence of radiographic lesions, particularly of the spine and sacroiliac joints, is used to distinguish AS from nr-axSpA. Some individuals initially diagnosed with nr-axSpA may progress to radiographic disease. Spondyloarthritis affects approximately 1% of American-Caucasian adults aged 20-65 years, and similar prevalence is observed across all Caucasian populations 22.

Ankylosing Spondylitis

AS is a chronic inflammatory arthritis that primarily affects the spine and sacroiliac joints of the pelvis. Initial stages of AS are characterised by inflammation followed by degradation at the spine and sacroiliac joints, which may progress to uncontrolled bone growth and eventually ankylosis (fusion) at these joints (illustrated in Figure 1.3) 23,24.

Whilst the primary sites of inflammation in AS are the entheses, regions of similar mechanical stress are often affected as well, including the ciliary body (controls the shape of the lens in the eye), the ileocecal sphincter (controls movement of acidic fluid from the small intestine to the large intestine), and in rare cases the aortic valve (controls the major blood flow to the heart) or the ascending aorta. Extra-articular manifestations of AS may include acute anterior uveitis (~30% of patients), psoriasis (~10% of patients), inflammatory bowel disease (IBD) (primarily Crohn’s disease) (~10% of patients), osteoporosis, and cardiovascular complications, primarily ascending aortitis (<1% of patients) 25-28. Spondyloarthritis generally is associated with increased cardiovascular disease, and AS individuals have specifically been found to have increased risk of ischemic heart disease, a narrowing of the blood vessels also known as coronary heart disease, and stroke 29. Arthritis conditions complicating psoriasis and IBD are closely related clinically with AS, and often co-occur in patients and families 26.

The prevalence of AS is highly correlated with the prevalence of HLA-B*27 in populations, with a variable prevalence from 0.02% in Sub-Saharan Africa (where HLA-B*27 is rare) to 1.6% in Northern Arctic communities 30-32. Clinically, AS is reported 2-3 times more often in males than females, and men have greater radiographic progression over time 33. Disease onset typically occurs prior to the age of 45 years; however, the average delay in diagnosis is 8 years after the onset of symptoms 34,35.

Figure 1.3. Examples of radiographic imaging for the spinal vertebrae in a healthy individual and an AS individual with a 'bamboo' spine phenotype (full ankylosis of the spine) 19.

Impact of AS on Quality of life

AS impacts affected individuals’ quality of life globally. As the disease progresses, inflammation and changes in the bone morphology cause pain and limit mobility. Limitations in mobility can prevent individuals with AS from keeping physically active, travelling or participating in their normal work 36. Individuals with AS have higher levels of absenteeism, but also higher levels of presenteeism, where an individual is present at work but has reduced productivity 37. This may be due to higher reported levels of fatigue, sleep disruption and depression 38. The impact on work productivity and the cost of treatments for both AS and any associated comorbidities has a financial impact on individuals and their families. This is in addition to the emotional burden as the disease shifts the focus of the relationship to the disease 38,39. Overall, the impact of AS on individual quality of life is multifaceted and far-reaching. Delayed diagnosis can contribute to the impact of AS on an individual’s quality of life.

Diagnosis

AS is diagnosed using the modified New York criteria, which utilise both clinical criteria and x-ray imaging (Table 1.2) 40. Clinical criteria include limited lumbar spine motion, reduced chest expansion, and persistent lower back pain. Radiographic progression of AS is measured using the modified New York criteria for AS, which incorporates cervical and lumbar spine radiographs 41. As this methodology is specific to individuals with radiographic progression, it is not as effective for the diagnosis of early-stage AS. C-reactive protein (CRP) and erythrocyte sedimentation rate (ESR) are used alongside the modified New York criteria for AS as suggestive clinical markers due to their association with disease activity measures. The usefulness of these markers is limited as they are affected by a range of factors, and are not always indicative of disease or disease severity 42. BASDAI (Bath ankylosing spondylitis disease activity index) measures the subjective impact of AS on individuals using a visual analogue scale for items such as stiffness and fatigue. Impact is scored between 0 (no disease activity) and 10. The AS disease activity score (ASDAS) incorporates three BASDAI measures with a quantitative measurement (ESR or CRP). A similar method to measure functional impact is the Bath AS functional index (BASFI).
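As an illustration of how a 0-10 BASDAI score is obtained, the sketch below averages six patient-reported items. The six-item structure and the averaging rule (the two morning stiffness items are averaged first, then combined with the other four) follow the commonly published form of the index and are not spelled out in this chapter, so they should be read as an assumption. The function and parameter names are illustrative only.

```python
def basdai(fatigue, spinal_pain, joint_pain, enthesitis,
           stiffness_severity, stiffness_duration):
    """Return a BASDAI score between 0 (no disease activity) and 10.

    Each argument is a patient-reported item on a 0-10 scale.
    Assumed scoring rule: the two morning stiffness items are averaged,
    and that mean is averaged with the remaining four items.
    """
    items = (fatigue, spinal_pain, joint_pain, enthesitis,
             stiffness_severity, stiffness_duration)
    for value in items:
        if not 0 <= value <= 10:
            raise ValueError("each BASDAI item must lie between 0 and 10")

    # The two stiffness items contribute a single averaged component.
    mean_stiffness = (stiffness_severity + stiffness_duration) / 2
    return (fatigue + spinal_pain + joint_pain + enthesitis + mean_stiffness) / 5
```

For example, item scores of 4, 6, 2, 4, 6 and 2 give a stiffness component of 4 and an overall score of 4.0.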

Table 1.2 Modified New York criteria for diagnosis of AS 43.

Modified New York Criteria, 1984
Definite if one radiological criterion and at least one clinical criterion.

Clinical Criteria (at least one):
- Low back pain of at least 3 months' duration improved by exercise and not relieved by rest
- Limitation of the lumbar spine in sagittal and frontal planes
- Chest expansion decreased relative to normal values for age and sex

Radiological Criteria:
- Bilateral sacroiliitis grade 2 to 4
- Unilateral sacroiliitis grade 3 or 4

Grading of Radiographs:
0 Normal
1 Suspicious
2 Minimal sacroiliitis
3 Moderate sacroiliitis
4 Ankylosis
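The modified New York criteria in Table 1.2 amount to a simple decision rule: one radiological criterion (bilateral sacroiliitis of grade 2 to 4, or unilateral sacroiliitis of grade 3 or 4) together with at least one clinical criterion. A minimal sketch of that rule is given below. The class and function names are illustrative, and representing the radiograph as one grade per sacroiliac joint is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class ClinicalFindings:
    """Clinical criteria from Table 1.2 (field names are illustrative)."""
    chronic_low_back_pain: bool    # >= 3 months, improved by exercise, not relieved by rest
    limited_lumbar_motion: bool    # limitation in sagittal and frontal planes
    reduced_chest_expansion: bool  # relative to normal values for age and sex


def radiological_criterion_met(left_grade: int, right_grade: int) -> bool:
    """Bilateral sacroiliitis grade 2-4, or unilateral sacroiliitis grade 3-4."""
    for grade in (left_grade, right_grade):
        if not 0 <= grade <= 4:
            raise ValueError("sacroiliitis grade must be between 0 and 4")
    bilateral = left_grade >= 2 and right_grade >= 2
    unilateral = left_grade >= 3 or right_grade >= 3
    return bilateral or unilateral


def is_definite_as(clinical: ClinicalFindings, left_grade: int, right_grade: int) -> bool:
    """Definite AS: one radiological criterion plus at least one clinical criterion."""
    any_clinical = (clinical.chronic_low_back_pain
                    or clinical.limited_lumbar_motion
                    or clinical.reduced_chest_expansion)
    return any_clinical and radiological_criterion_met(left_grade, right_grade)
```

Because the rule requires established radiographic sacroiliitis, an individual with clinical features but grade 0-1 radiographs is not classified as definite AS, which is the limitation for early disease discussed in this section.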

Diagnosis remains an issue in AS, as current methods rely on broad clinical markers, which can be caused by other diseases or factors, and radiographic progression, which occurs in the later stages of disease. Increasing understanding of the fundamental biological basis of AS provides an avenue for the potential development of new diagnostic methods, such as genetic risk scores (GRS), which use known genetic risk loci to calculate the likelihood that an individual will develop disease, progress, or respond to treatment. Diseases such as type 1 diabetes, rheumatoid arthritis and IBD have illustrated the viability and potential benefits of the GRS approach 44-46. Currently a GRS for AS is under development by our group 47. Improving diagnosis is vital to reduce the time from symptom onset to diagnosis and to enable early treatment of individuals with AS.
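One common way to construct a GRS is as a weighted sum over risk loci, where each locus contributes its effect-allele count (0, 1 or 2) multiplied by the natural logarithm of its odds ratio. The sketch below illustrates that construction only; it is an assumption about the general approach, not a description of the score under development. The locus names and odds ratios are hypothetical placeholders and are not estimates from any AS study.

```python
import math

# Hypothetical loci and odds ratios, for illustration only.
EXAMPLE_ODDS_RATIOS = {"locus_A": 1.50, "locus_B": 1.20, "locus_C": 0.80}


def genetic_risk_score(effect_allele_counts, odds_ratios):
    """Weighted sum of effect-allele counts, weighted by log odds ratio.

    effect_allele_counts: dict mapping locus name to 0, 1 or 2 copies.
    odds_ratios: dict mapping locus name to the per-allele odds ratio.
    """
    score = 0.0
    for locus, odds_ratio in odds_ratios.items():
        count = effect_allele_counts[locus]
        if count not in (0, 1, 2):
            raise ValueError(f"allele count for {locus} must be 0, 1 or 2")
        # Protective alleles (odds ratio < 1) contribute negative weights.
        score += count * math.log(odds_ratio)
    return score


example_genotype = {"locus_A": 2, "locus_B": 1, "locus_C": 0}
example_score = genetic_risk_score(example_genotype, EXAMPLE_ODDS_RATIOS)
```

A raw score of this kind is only meaningful relative to a reference population; converting it into a probability of disease requires calibration against cases and controls, which this sketch does not attempt.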

Treatment

Exercise is a fundamental and cost-effective treatment for AS 48-51. Physical therapy and exercise are used to maintain mobility of the spine and pelvis, and to develop functional capacity around any limitations. Medication is used to suppress inflammation, either with non-steroidal anti-inflammatory drugs (NSAIDs) or, in individuals with more severe disease, biological disease modifying anti-rheumatic drugs (bDMARDs) (summarised in Table 1.3). Anti-TNF-α was the first approved bDMARD in AS, and is the most common treatment, with 5 drugs approved for use in AS: adalimumab, golimumab, infliximab, certolizumab and etanercept 52-56. However, clinical trials have found that anti-TNF-α therapy is ineffective in 20-40% of AS cases, and it is uncertain whether anti-TNF-α reduces radiographic progression 53,57,58.

Novel treatments targeting alternative inflammatory pathways identified in genetic studies of AS have been developed, directed at the IL-17, IL-23 and JAK/STAT signalling pathways. Treatment efficacy is evaluated using criteria developed by ASAS/European League Against Rheumatism (EULAR) alongside disease activity measures such as BASDAI. For consistency, drug efficacy is discussed in terms of ASAS20 changes. An ASAS20 response is defined as an improvement of 20% in at least 3 of the 4 core areas of patient global, pain, inflammation and function, with no worsening in the remaining domain 59. Three anti-IL-17 antibodies have been trialled in AS. Secukinumab, an anti-IL-17 monoclonal antibody, has been approved by the FDA for use in AS, and has an ASAS20 response rate of 61.1% compared to a placebo response of 28.4% 60,61. The newer ixekizumab was shown in phase III trials to achieve an ASAS20 response at week 16 of 69% when given every two weeks, and 64% when given every four weeks, compared to an adalimumab group with an ASAS20 of 59% at week 16 62. Bimekizumab, a dual IL-17A and IL-17F monoclonal antibody, showed efficacy for spondyloarthritis in Phase II trials, with an ASAS20 of 58% at week 12 63,64. Interestingly, two IL-23 inhibitors, ustekinumab and risankizumab, both failed to show efficacy above placebo in AS 65-67. Three drugs, tofacitinib, upadacitinib and filgotinib, targeting IL-23/IL-17 signalling through inhibition of JAK1 and JAK3, are currently in phase III trials 68-71. In Phase II trials tofacitinib had an ASAS20 of 80.8% at week 12 compared to placebo (41.2%), and filgotinib had an ASAS20 of 76% at 12 weeks 70,72. Tofacitinib has been efficacious in psoriasis and rheumatoid arthritis 73-75.
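The ASAS20 definition used for the response rates above can be written as a short check over the four core domains. The sketch below assumes each domain is scored on a 0-10 scale where lower is better. It also requires an absolute change of at least 1 unit alongside the 20% relative change, and treats worsening symmetrically; both are assumptions taken from the commonly cited form of the criteria rather than from the wording of this chapter. The domain keys and function name are illustrative.

```python
DOMAINS = ("patient_global", "pain", "inflammation", "function")


def asas20_response(baseline, follow_up, rel_change=0.20, min_abs_change=1.0):
    """Return True if the change from baseline meets the ASAS20 definition.

    baseline, follow_up: dicts mapping each domain to a 0-10 score
    (lower scores indicate less disease impact).
    """
    improved = []
    worsened = []
    for domain in DOMAINS:
        change = baseline[domain] - follow_up[domain]  # positive = improvement
        # Assumed threshold: at least 20% of baseline and at least 1 unit.
        threshold = max(rel_change * baseline[domain], min_abs_change)
        if change >= threshold:
            improved.append(domain)
        elif -change >= threshold:
            worsened.append(domain)
    # At least 3 of 4 domains improved, and no worsening in any remaining domain.
    return len(improved) >= 3 and not worsened
```

A trial's ASAS20 response rate is then the proportion of participants in an arm for whom this check is true at the specified week.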

Table 1.3 A list of drugs currently available or undergoing Phase III trials for the treatment of AS.

Drug           Type of Inhibitor                    Target            Status
Adalimumab     Monoclonal antibody                  TNF-α             In use
Golimumab      Monoclonal antibody                  TNF-α             In use
Infliximab     Monoclonal antibody                  TNF-α             In use
Certolizumab   Monoclonal antibody                  TNF-α             In use
Etanercept     Soluble TNF receptor                 TNF-α receptor    In use
Secukinumab    Monoclonal antibody                  IL-17             In use
Ixekizumab     Monoclonal antibody                  IL-17A            Efficacy in phase III trial (under review with FDA)
Bimekizumab    Monoclonal antibody                  IL-17A/IL-17F     Efficacy in Phase 2, currently in Phase II trials
Ustekinumab    Monoclonal antibody                  IL-12/IL-23p40    Some efficacy in phase III trial, did not meet endpoints
Risankizumab   Monoclonal antibody                  IL-23p19          Some efficacy in phase II trial, did not meet endpoints
Tofacitinib    Synthetic small molecule inhibitor   JAK1              Efficacy in phase II trials, undergoing phase III
Filgotinib     Synthetic small molecule inhibitor   JAK3              Efficacy in phase II trials
Upadacitinib   Synthetic small molecule inhibitor   JAK1              Efficacy in Phase 2/3 trials

It is important to note that some biologics may be chosen for individuals due to their effect in the treatment of comorbidities, such as psoriasis and Crohn’s disease, and some may be contraindicated for this reason 76-78. Secukinumab failed as a treatment for Crohn’s disease, and is therefore not recommended for individuals with this comorbidity. In AS, bDMARD efficacy is affected by the delay in diagnosis, with studies indicating that the longer the delay between symptom onset and initiation of therapy, the greater the rate of radiographic progression 57. Mouse models indicate that the initial spike in inflammation drives intervertebral disk destruction, which may explain why anti-TNF-α treatment is most effective prior to extensive spinal damage occurring 79,80. In addition to the timing of treatment, environmental factors such as smoking can affect bDMARD and DMARD efficacy. High levels of smoking are associated with earlier onset of symptoms, more severe disease measurements and poorer prognosis in AS 81,82. Smoking affects CRP and ESR levels even after cessation 83.

In addition to bDMARDs, drugs to treat peripheral arthritis or extra-articular manifestations may be prescribed. Glucocorticoids are recommended for the treatment of tissue-localised inflammation and suppress the immune system through suppression of nuclear factor κB (NF-κB) transcriptional pathways and inhibition of inflammatory cytokines.

Current treatments provide pain relief and may slow the radiographic progression of AS; however, they do not address the underlying causes of AS and can neither prevent nor reverse radiographic changes. All current treatments in AS have been repurposed from other related diseases, such as RA and IBD. A lack of understanding of the fundamental changes underlying AS has hindered the development of drugs specific to this disease and efforts to diagnose and treat patients early in disease. Greater understanding of the biological changes in AS would allow for more targeted therapies for the treatment of AS.

Genetics of ankylosing spondylitis

Research into disease-associated genetic polymorphisms in AS was driven by early identification of the high level of heritability in studies of twins, and in unrelated individuals 84-86. AS has a measured heritability of more than 90% 84,85. HLA-B*27, an HLA class I receptor, was quickly identified as associated with AS, and alone accounts for 20.1% of AS heritability 86,87. Non-HLA-B*27 alleles have also been associated with AS both as risk alleles (e.g. HLA-A*02) and protective alleles (e.g. HLA-B*07) 88,89. Advances in gene coverage and reductions in the cost of genotyping have led to increasing numbers of genetic association studies, which have steadily increased the number of disease-associated loci identified. A recent large scale study examined five genetically and clinically associated chronic immune-mediated diseases (AS, psoriasis, Crohn’s disease, ulcerative colitis and primary sclerosing cholangitis) on a single genotyping chip 89. This study identified 17 new AS-associated loci, bringing the current number of known AS-associated loci to 115. The shared genetic burden between AS, IBD and psoriasis was identified as being due to pleiotropy, the same genes giving rise to different phenotypes, rather than genetic heterogeneity. Shared genes from these diseases include HLA, IL23R, DNA methyltransferase 3A (DNMT3A), DNMT3B, DNMT3L, and several genes involved in the JAK-STAT pathway 89-92.

HLA-B*27

There are four core theories regarding the association of HLA-B*27 with AS: arthritogenic peptide display, misfolding causing ER stress response, cell surface free heavy chain homodimer recognition, and HLA-B*27 as an immunodeficiency gene. It is important to note the complex association of HLA-B*27 subtypes with AS, as some are associated with risk of AS (HLA-B*2702, HLA-B*2704, HLA-B*2705, HLA-B*2707, HLA-B*2708) but several are protective (HLA-B*2706, HLA-B*2709) 93. The functional differences between these allelic subtypes are subtle but have aided research into the mechanism through which HLA-B*27 contributes to AS pathogenesis.

Arthritogenic peptide

The arthritogenic peptide theory posits that microbial peptides mimicking, or resembling, self-peptides bind to HLA-B*27 and trigger the expansion of self-reactive CD8+T-cells. Evidence for this theory includes the expansion of self-reactive T-cells at sites of inflammation in spondyloarthritis, and the enrichment of specific TCRβ motifs in the CD8+T-cells of HLA-B*27 positive AS individuals 94,95. To date no common antigen has been identified to drive CD8+T-cell expansion. Certain transcripts have been identified as more abundant in AS-associated HLA-B*27 subtypes, but these are not exclusive to AS-associated subtypes nor are they bound by all the AS-associated subtypes that have been examined 96. One explanation may be that a threshold amount of antigen is required for activation of self-reactive T-cells. This would correlate with the slower peptide trimming rate of AS protective ERAP1 variants compared to wild-type forms 97. More data on the peptide repertoire of HLA-B*27 subtypes are needed to properly address the arthritogenic peptide theory.

ER stress response

A second theory is based on the relatively slow folding rate of HLA-B*27 and its association with beta 2-microglobulin (β2m) 98. This slow folding causes HLA-B*27 to accumulate in the ER, which in turn induces ER stress and increased production of inflammatory cytokines, specifically IL-23 99. This theory is supported primarily by animal models that exhibit disease characteristics of AS and exhibit ER stress 100-102. These models incorporate multiple copies of HLA-B*27 and their effect can be ameliorated by additional copies of the human β2m gene. The additional copies of human β2-microglobulin reduce colitis but result in an increase in the severity of arthritis, indicating that arthritis severity is not triggered by the unfolded protein response 103. There is little evidence that such an accumulation occurs in humans, nor do individuals with AS risk associated ERAP variants and HLA-B*27 have increased ER stress response markers 104,105. Overall, these data suggest that HLA-B*27 and ERAP1 act through mechanisms other than the ER stress response to promote AS pathogenesis.

Heavy chain homodimers

The ability of the HLA-B*27 free heavy chain to form homodimers is thought to trigger chronic inflammation as it is capable of binding to killer cell immunoglobulin-like receptor 3DL2 (KIR3DL2), which classical HLA-B*27 does not 106,107. KIR3DL2 binds HLA-B*27 homodimers with greater affinity than other HLA class I molecules, and binding stimulates KIR3DL2 T-cell proliferation and the production of IL-17 (an inflammatory cytokine) 108,109. A recent paper reported no homodimer formation in the AS risk associated allele HLA-B*2703, but did report homodimer formation in the non-associated HLA-B*2709 allele form, effectively disproving this hypothesis 110.

There is no direct evidence of pathogenicity in humans in relation to homodimer formation 111.

Immunodeficiency

An overarching model for AS pathogenesis is mucosal immunodeficiency. This theory is based on the prevalence of subclinical gut inflammation in AS (~70% of AS patients), and the number of IBD related genes associated with AS 112-116. The exact mechanism behind these shared genes is unclear; however, many are involved in microbial sensing and response, such as IL23, TLR4, NOD2 and CARD9, or a loss of immune function such as with IL-10. The SKG mouse model of AS does not develop disease in microbe-free environments, suggesting that the microbiome is necessary for initiation of disease 117,118. It is hypothesised that AS-associated genes contribute to pathogenesis either directly, through permeability of the gut to bacteria, and/or indirectly, through a heightened response to the invading microbes 119. AS individuals have been shown to have impaired gut vascular barriers and gut epithelial barriers, which allows bacteria and bacterial products to translocate into the bloodstream from the gut 120. AS patients have a significantly different terminal ileum microbiome, with specific increases in the abundance of five bacterial families compared to healthy controls (Lachnospiraceae, Ruminococcaceae, Rikenellaceae, Porphyromonadaceae, and Bacteroidaceae) and decreases in two (Veillonellaceae and Prevotellaceae) 121.

It is likely that these changes in gut morphology and gut microbiome are influenced by host genetics 122. There is some suggestive research for an association between HLA-B*27 cells and lower levels of bacterial clearance in mouse fibroblasts, human monocytes and human epithelial cell lines; however, this effect is not consistent between different cell lines or bacterial species 123-126. Further studies on the role of host genetics in mucosal immunodeficiency are required to determine whether these changes are consequential or causative.

ERAP1 and ERAP2

ERAP1 and ERAP2 are associated with both HLA-B*27 and HLA-B*40 positive AS, and AS patients exhibit enhanced ERAP1 activity 127,128. After initial peptide trimming to 9-17 amino acids by the proteasome, ERAP1 is responsible for trimming peptides to an optimal length of 9 amino acids for HLA presentation 129,130. ERAP2 has overlapping functionality with ERAP1 but has been shown to have different specificities based on peptide length and sequence 130. It is theorised that AS-associated SNPs alter peptide trimming preferences and affect the peptides displayed by HLA receptors 97,131. Since functional reduction of ERAP1 and ERAP2 does not affect ER stress markers, surface free heavy chain availability or inflammatory cytokine levels, it is believed that this association of ERAP1 and ERAP2 supports the arthritogenic peptide theory 97,105,132.

Interleukin 23

IL-23, an inflammatory cytokine, is known to modulate several inflammatory pathways including regulation of IL-17 T-cell populations and autoreactive B cell selection 133,134. DCs release IL-23 alongside other inflammatory cytokines upon pathogen recognition, and these cytokines go on to activate T-cells, mast cells and neutrophils 135. Binding of IL-23 by the IL-23 receptors on these cells triggers STAT3 translocation to the nucleus to initiate further inflammatory cytokine expression, including IL-17. In mice, IL-23 overexpression alone is enough to induce the ‘dual’ bone physiology of bone erosion and uncontrolled bone growth seen in AS 136. AS individuals have elevated levels of IL-23 responsive IL-17 secreting CD4+T-cells, although this is not consistently observed 137. Elevated IL-23 and IL-17 secretion has been shown in AS individuals 9,109,137. Several genes involved in the JAK-STAT signalling pathway are implicated in AS, including JAK2, tyrosine kinase 2 (TYK2), STAT3, TNFAIP3, TBX21, and IL6 90,138,139. Cumulatively, the data suggest a role for the IL-23 signalling pathway in the inflammatory progression of AS.

Osteoimmunology

Strong evidence implicates the immune system and the microbiome in AS pathogenesis, yet the interaction between these processes and bone changes in AS remains elusive. The key cells involved in bone formation and resorption are osteoblasts and osteoclasts, respectively. Osteoclast precursors are broadly considered to be classical monocytes, although all monocytes can differentiate into osteoclasts.

The Wnt signalling pathway, a bone remodelling pathway, has been implicated in AS through GWAS and in differential expression studies 97,140-143. IL-17A stimulates RANKL expression and inhibition of the Wnt signalling pathway, which in turn inhibits osteoblast activity 144. Increased serum levels of Dickkopf-1 (DKK1), an inhibitor of the Wnt signalling pathway, are associated with the likelihood of new syndesmophyte formation in AS individuals 145-147. Further evidence is required to determine how changes in inflammation in AS trigger changes in bone formation.

Missing heritability and functionality

Despite the numerous loci implicated in AS pathogenesis, there is no clear explanation for how inflammation induces the ‘dual’ bone phenotype of AS, or for the mechanism through which many of these loci contribute to AS pathogenesis. Cumulatively, AS-associated genetic loci only account for a moderate portion of the total heritability of AS (~28%) 89. Potential reasons for this ‘missing heritability’ include large numbers of variants of smaller effect yet to be identified, rare variants being missed by available genotyping arrays, copy number variation (CNV), insertion/deletion events, gene-gene interactions, and epigenetic factors 148. Epigenetics refers to functional modifications to DNA other than base sequence coding, and includes histone modifications, DNA methylation, and non-coding RNA interactions with transcriptional and translational machinery. Part of this ‘missing heritability’ could be explained by examining epigenetic factors such as DNA methylation and transcription.

Cell types in Ankylosing Spondylitis

The primary site of disease origin in AS remains contested. No robust comparison between circulating immune cells (PBMCs) and tissue resident cells has been performed, nor has the originating tissue for disease been identified. Our laboratory leveraged publicly available cell type-specific epigenetic, gene and protein expression data to interrogate which cell types these AS-associated genetic loci operate in 149. These included reference genome and cell line information for epigenome data, core histone modifications (H3K4me1, H3K4me3, H3K27me3, H3K36me3 and H3K9me3), and gene expression RNAseq data from the integrated NIH Roadmap compendium (57 cell types) and the BLUEPRINT of Haematopoietic Epigenomes project (36 cell types). These approaches demonstrated that AS-associated loci were enriched in gut and immune cell types, particularly in CD4+T-cells, CD8+T-cells, NK cells, CD14+monocytes, and regulatory T-cells. A limitation of this study is the lack of publicly available data for lower prevalence cellular populations, such as γδ T-cells and MAIT cells, which have previously been identified as associated with AS through mouse and cellular assays 150-152.

Cell types identified using this approach have been shown to shift to more inflammatory profiles in AS individuals, but many do not have changes in the composition of these cell types 153,154. Among T-cells, there are increased numbers of T helper cells, IL-17 producing T-cells, and IL-23 positive γδ T-cells 9,155,156. Monocytes have impaired osteoclastogenesis, the transition of monocytes to osteoclasts 157-165. The numbers of CD8+T-cells and NK cells have been suggested as predictive of therapeutic outcome in AS 166. These data suggest that numerous cell types are affected in AS and may be involved in pathogenesis. Further research is required to determine specifically whether AS-associated loci are acting broadly on all cell types, or in a cell type-specific manner.

DNA methylation in AS

DNA methylation has been investigated in disease contexts to understand changes in transcriptional control, to determine the effects of genetic variants on function, and as a potential biomarker for disease outcome or severity. DNA methylation refers to the addition of a methyl group (CH3) to a cytosine to form 5-methylcytosine (5mC). Methylation occurs predominantly on cytosine-phosphate-guanine paired bases (CpGs).

Figure 1.4 Graphical representation of the CpG naming conventions based on proximity to CpG Island.

The majority of methylated CpGs occur in CpG islands, dense CpG regions of DNA between 300 and 3000 bp in length. CpG sites located within 2 kb upstream or downstream of a CpG island are defined as CpG shores, and CpG sites within a further 2 kb beyond these shores are defined as CpG shelves (shown graphically in Figure 1.4). Regions outside this 4 kb stretch are referred to as the ‘open sea’. DNA methylation can occur in cytosines outside CpG sites, frequently in embryonic stem cells, oocytes, and some neuronal subsets, but whether these are recognised by cellular machinery in the same way as CpG methylation is unclear 167.
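The naming convention above can be sketched as a small distance rule. This is an illustrative Python sketch only, assuming a single island given as hypothetical (start, end) coordinates; the function name and constants are not taken from any annotation package, and array manifests may apply additional rules.

```python
from typing import Tuple

SHORE_BP = 2_000  # shores: within 2 kb of either island edge
SHELF_BP = 4_000  # shelves: the next 2 kb beyond each shore


def cpg_context(position: int, island: Tuple[int, int]) -> str:
    """Classify a CpG by its distance to one CpG island (hypothetical helper)."""
    start, end = island
    if start <= position <= end:
        return "island"
    # Distance to the nearest island edge, upstream or downstream.
    distance = start - position if position < start else position - end
    if distance <= SHORE_BP:
        return "shore"
    if distance <= SHELF_BP:
        return "shelf"
    return "open sea"
```

For an island spanning positions 10,000 to 11,000, a CpG at 9,000 would be labelled a shore, one at 7,000 a shelf, and one at 16,000 open sea.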

DNA methylation is a relatively stable chemical marker that is maintained through mitosis (Figure 1.5) 168. It is moderately heritable between generations, as maternal exposure to environmental factors can affect embryos in utero and, for female embryos, also affect their gametes, as female reproductive cells are fixed at birth 169. Cells undergo two cycles of demethylation during reproduction; it is therefore unclear whether these changes can be transmitted beyond these generations 168,169.

Although initially considered a ‘switch’ for gene activation or silencing, the exact effect of DNA methylation on gene expression is highly context dependent: promoter DNA methylation is associated with gene silencing, whereas DNA methylation in the effector region is associated with gene activation 170. Local DNA sequence is the primary determinant of DNA methylation state, and SNPs can alter DNA methylation patterns 171. Disease-associated SNPs can therefore alter DNA methylation patterns to affect gene expression and cellular function, either in cis (at the gene itself), or in trans (indirectly, often distantly).


Figure 1.5 The process of DNA methylation addition, maintenance and removal. The de novo establishment of DNA methylation is carried out by DNMT2, DNMT3A, DNMT3B and DNMT3L. Once established, maintenance is required to prevent loss of methylation either through “passive” spontaneous deamination, or actively by the ten eleven translocation (TET) enzymes using successive oxidation steps 1.


There have been 6 published studies on DNA methylation in AS, all published within the last 6 years (Table 1.4). The first DNA methylation study in AS was published in 2014 172. The authors examined the suppressor of cytokine signalling 1 (SOCS1) pathway because of the increased cytokine signalling exhibited by AS patients 172. Serum cell-free DNA was used. DNA methylation was undetectable in healthy controls, and the paper acknowledged that the increased SOCS1 methylation in AS cases was likely due to inflammation-driven cellular apoptosis. SOCS1 methylation of serum cell-free DNA was associated with advanced sacroiliitis and increased inflammatory cytokines.

Two studies were published using age- and sex-matched controls, each examining a single gene promoter region. The first examined DNMT1 due to the role of this enzyme in the maintenance of DNA methylation 173. AS patients had increased DNMT1 promoter methylation compared with healthy controls. The second examined BCL11B, which was previously identified as differentially expressed in the whole blood of AS patients 174,175. BCL11B promoter methylation was increased in AS patients and correlated with lower BCL11B expression. Interestingly, this was more pronounced in HLA-B*27+ AS patients compared to HLA-B*27- patients. Neither DNMT1 nor BCL11B methylation was correlated with disease activity scores.

The first published multigene study on DNA methylation in AS was carried out in 2017 in a Han Chinese cohort of 5 AS patients with grade 4 bilateral sacroiliitis (complete fusion of both sacroiliac joints) and 5 age- and sex-matched controls 176. There were 1,915 differentially methylated CpGs identified using the Illumina Human Methylation 450K Beadchip. The most significant differentially methylated position (DMP) was in HLA-DQB1, an MHC class II gene responsible for exogenous peptide display, which had previously been associated with AS radiographic severity and age of onset, but not within Han Chinese individuals. This finding was validated with RT-qPCR, but the study failed to account for HLA-B*27, which is in strong LD with HLA-DQB1 177. When examining which gene ontology (GO) terms were overrepresented for the genes associated with the differentially methylated CpGs, terms involving antigen presentation by MHC class II and the ER membrane were identified.

In 2019, two studies were published on the same cohort of 99 AS individuals and 99 healthy individuals, seeking to examine IRF8 and IL12B respectively 178,179. Although the sites examined were adjacent to the promoters of these genes and both were hypermethylated, the hypermethylated CpGs in IL12B were associated with increased IL12B mRNA expression, and the hypermethylated CpGs in IRF8 were associated with decreased IRF8 mRNA expression.

In the same year, the second genome-wide methylation study in AS was published using the Illumina MethylationEPIC Beadchip array, which covers >800,000 CpG sites 180. This study compared the whole blood of 24 individuals with AS and 12 osteoarthritis controls. The intention of the study was to identify DNA methylation changes in AS associated with HLA-B*27 status. Methylation of a single CpG associated with HCP5 was increased in HLA-B*27 positive individuals compared to HLA-B*27 negative individuals. HCP5 encodes TCBA, PLD6 and a lncRNA within the MHC region. The CpG site identified within HCP5 contains a SNP that is in linkage disequilibrium (LD) with HLA-B*27.

Overall, little is known about the changes in DNA methylation associated with AS, especially compared to the breadth of studies in the related diseases of IBD and psoriasis (reviewed in 1). Monogenic approaches, small sample sizes, and a lack of control for HLA-B*27 status, smoking status, and age have impeded the utility of previous studies. Questions remain about the role of DNA methylation in AS as a mediator of the functional impact of genetic variants, and about the genes that are affected.


Table 1.4 Summary of design and outcomes of DNA methylation studies in AS.

| Paper | Ref. | Case | Control | Sample | Method | Focus | DEG | Pathways |
|---|---|---|---|---|---|---|---|---|
| Lai et al. (2014) | 172 | 43 | 6 | Cell-free DNA from serum | RT-qPCR | SOCS-1 | 1 | Not determined |
| Aslani et al. (2016) | 173 | 40 | 40 | PBMC | RT-qPCR | DNMT1 | 1 | Not determined |
| Karami et al. (2017) | 174 | 50 | 50 | PBMC | RT-qPCR | BCL11B | 1 | Not determined |
| Hao et al. (2017) | 176 | 10 | 10 | PBMC | Illumina Infinium Human Methylation 450K Beadchip | | 1,915 | Antigen processing and presentation via MHC class II; MHC II protein complex; Integral to luminal side of ER membrane |
| Chen et al. (2019) | 178 | 99 | 99 | PBMC | MethylTarget | IRF8 | 1 | Not determined |
| Zhang et al. (2019) | 179 | 99 | 99 | PBMC | MethylTarget | IL12B | 1 | Not determined |
| Coit et al. (2019) | 180 | 24 | 12 OA | Whole Blood | Illumina Infinium Human MethylationEPIC Beadchip | | 67 | GTPase activator activity; GTPase regulator activity; Potassium ion binding |


Differential gene expression

Transcriptomics encompasses the variety of RNA types that allow cells to dynamically alter their processes in response to internal and external stimuli. RNA subtypes are categorised by size and function (coding or non-coding). In eukaryotes (such as humans), messenger RNA (mRNA) is responsible for coding protein and is identified by the presence of a poly-adenosine (poly-A) tail. Additional complexity within protein coding regions is achieved through alternative splicing, a mechanism of removing sections of RNA (introns) and re-arranging the remaining sections (exons). This recombination allows multiple potential transcripts to be created from a single stretch of DNA, which can vary in prevalence and function. Non-coding RNA is segregated into several categories including ribosomal RNA (rRNA), microRNA (miRNA) and long non-coding RNA (lncRNA). Size is used to define subsets, as RNA can range from miRNA (<25 bp) to lncRNA (>5 kb) in length.

The pivotal role of non-coding RNA in the regulation of cellular function emerged within the last 50 years. Genetic variants are defined by their effect on transcripts, based on whether they preserve the amino acid sequence (synonymous mutations) or alter the coding of individual amino acids (non-synonymous mutations). Non-synonymous mutations can maintain function with the altered amino acid (missense mutations) or result in the early termination of a transcript (nonsense mutations). SNPs that affect binding of transcription factors or transcription start sites can alter the level of expression, or prevent expression entirely, of both nearby (cis-eQTL) and network-associated (trans-eQTL) genes 181.
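The distinction between synonymous, missense and nonsense changes can be illustrated with a short Python sketch. It assumes the standard genetic code, DNA codons written on the coding strand, and a single-codon comparison; the function name and the "stop-lost" label are illustrative additions rather than terms used in this thesis.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_variant(ref_codon: str, alt_codon: str) -> str:
    """Label a coding change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"   # amino acid sequence preserved
    if alt_aa == "*":
        return "nonsense"     # premature stop codon introduced
    if ref_aa == "*":
        return "stop-lost"    # not discussed in the text; included for completeness
    return "missense"         # a different amino acid is encoded
```

For example, GAA to GAG keeps glutamate (synonymous), GAA to GTA substitutes valine (missense), and TGG to TGA introduces a stop codon (nonsense).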

Technologies to investigate RNA as a potential diagnostic and therapeutic target have shifted from monogenic studies of coding RNAs to transcriptome-wide studies of both coding and non-coding RNA. Monogenic approaches such as RT-PCR are still used for known candidate genes, as they are relatively cheap and easy to implement. Microarrays were the first technology to enable rapid high-throughput measurement of transcriptional data, using probes based on known genomic sequences. Microarrays and PCR-based methods are limited to known transcripts and, in comparison to sequencing methods, have a limited dynamic range (an upper limit of detection).

RNAseq is a next-generation sequencing technique that allows the simultaneous comparison of multiple samples (through barcode adapters), direct sequencing of RNA, and the absolute quantification of RNA transcripts 182. This method also enables the examination of non-coding and novel transcripts, which may be of interest. Additionally, it provides sequence information which may reveal SNPs that have altered transcripts. RNAseq protocols are defined by the method of RNA capture incorporated alongside library preparation, with specific methods used to capture and prepare miRNA, mRNA and total RNA. Methods to capture mRNA transcripts select for poly-A tails, and the columns utilised in mRNA and total RNA preparation do not capture any transcripts shorter than 200 nucleotides. It is therefore important for studies to consider which type of RNA transcript is required to answer the biological question under investigation.

RNA expression in AS

Unlike DNA methylation, differential RNA expression has been widely investigated in AS, particularly in the context of coding genes (summarised in Table 1.5). While there have been numerous studies examining gene expression in AS, comparing their results has been difficult due to the use of different sample types, techniques, and analyses. Most studies of gene expression in AS have used microarray gene expression profiling, with only four studies using RNAseq, all examining blood. Blood, either whole blood or PBMCs, is used most commonly due to the known link between immune cell subsets and AS, and its relative accessibility compared to bone and joints. Similar pathways were identified in studies examining blood samples: T-cell selection and activation, inflammatory signalling, Wnt signalling, and TGFβ/BMP signalling. These pathways match those identified through genetic association studies in AS. In contrast, the pathways identified from synovial or hip joint biopsies are varied, ranging from Wnt signalling, T-cell and B-cell activation, to angiogenesis. It is likely that this is due to the high levels of variability in the cells obtained from these types of tissue biopsies.

Meta-analysis has been a popular tool to attempt to bridge the differences in genes identified, and to bolster the sample sizes available, as many published studies are too small to identify changes in expression genome wide. There are almost 20 studies that have used meta-analysis with publicly available AS data, and almost a third of these have used the same microarray study from 2011 (Table 1.5). Meta-analysis can provide a mechanism to overcome sample size issues and to re-evaluate data using new techniques but is hindered by the scarcity of publicly available data for AS. This has led to studies using mixed sample types, further confusing changes due to disease with cell type-based changes in gene expression.

Unpublished work by our lab examined changes in gene expression between the PBMCs of healthy controls and AS cases using RNAseq 183-187. When compared to previous microarray studies, 214 genes were differentially expressed in both; however, only 141 genes had congruent changes in expression. Most of the differentially expressed genes (DEG) identified by RNAseq had low fold changes and may have been missed by the less sensitive technique of microarray. In addition, variation between studies could be due to differences in cellular composition between blood and tissue. Studies in the related disease of IBD have shown that using mixed cell types for analysis can mask cell-specific changes 188. Implementation of single cell transcriptomics methods has highlighted differences between cells within populations that were previously assumed to be homogeneous, let alone between cell types known to be functionally divergent. There is a clear need for differential expression studies in AS that examine specific cell types in a genome-wide approach with sufficient power to identify significant changes in gene expression.


Table 1.5 A summary of previous studies investigating gene expression in AS.

| Paper | Ref | Sample | Case | Control | Method | DEG | Top DEG | Top Pathways Identified |
|---|---|---|---|---|---|---|---|---|
| Gu (2002) | 189 | PBMC | 7 SpA | 20 | Microarray, RT-PCR | 7 | CXCR4; IFN-γ; IL-7Rα | Not Determined |
| Rihl (2008) | 190 | Synovial biopsy | 3 SpA | 3 | Microarray, RT-PCR | 47 | JAK3; IL-17; IL-7 | Not Determined |
| Smith (2008) | 191 | Macrophages | 8 AS | 9 | Microarray, RT-PCR | 3 | GBP5; RARRES3; TNFAIP6 | Not Determined |
| Gu (2009) | 187 | PBMC | 44 AS | 46 | Microarray, RT-PCR | 6 | RGS1; SOCS3 | Not Determined |
| Sharma (2009) | 186 | Whole blood | 18 axSpA | 25 | Microarray | 107 | IL-1R; NLPR2; TREM1 | Negative regulation of Wnt/catenin; Bone remodelling; IL-1 pathway |
| Haroon (2010) | 192 | PBMC (pre- and post-anti-TNF-α) | 16 AS | 0 | Microarray, qRT-PCR | 1,418 | LIGHT; IFNAR; IL17R | Not Determined |
| Duan (2010) | 185 | PBMC | 18 AS | 18 | Microarray, qRT-PCR | 452 | CD69; NR4A2; TNFAIP3 | Juvenile rheumatoid arthritis; Rheumatoid arthritis; JAK/STAT signalling pathway |
| Assassi (2011) | 184 | Whole blood | 16 AS | 14 | Microarray, qRT-PCR | 83 | TLR4; TLR5 | NF-κβ signalling; DC maturation; TLR pathways |
| Pimental-Santos (2011) | 175 | Whole blood | 18 AS | 18 | Microarray, qPCR | 221 | BCL11B; DNMT1; CLEC4D | Negative regulation of adaptive immune response; Thymic T-cell selection; Bone matrix biosynthesis |
| Xu (2012) | 193 | Hip joint ligament | 18 AS | 6 | Microarray | 519 | B4GALT3; RBP5 | B-cell receptor signalling; TCR signalling pathway; Regulation of the actin skeleton |
| Thomas (2013) | 194 | Synovial biopsy | 8 SpA, AS | 7 | Microarray, qRT-PCR | 416 | DKK3; MMP3; PTGER4 | B-cell activation; IFN-γ response; Myeloid cell activation |
| Yeremenko (2013) | 195 | Synovial biopsy | 9 SpA | 63 | Microarray, qRT-PCR | 64 | ACTA1; ACTN2 | TGFβ/BMP signalling; Wnt signalling |
| Li (2014) | 196 | MSC | 12 AS | 12 | Microarray, qRT-PCR | 676 | MAPK11; TRAF3; IFNAR1; IL6 | Immune system process; Immune system development; Regulation of cell communication |
| Talpin (2014) | 197 | Monocyte-derived DC | 9 SpA | 10 | Microarray | 81 | ADAMTS15; CITED2; F13A1 | Wnt signalling pathway |
| Chen (2015) | 198 | PBMCs | 12 AS | 18 (10 OA) | RT-PCR | 3 | BMP-2; BMP-4; BMP7 | Not Determined |
| Aslani (2016) | 173 | PBMC | 40 AS | 40 | RT-PCR | 1 | DNMT1 | Not Determined |
| Almasi (2016) | 199 | PBMC | 40 AS | 70 (20 RA) | RT-PCR | 2 | TLR4; TLR5 | Not Determined |
| Vecellio (2016) | 200 | PBMC | 9 AS | 0 | qRT-PCR | 1 | RUNX3 | Not Determined |
| Xie (2016) | 201 | MSC | 12 AS | 12 | Microarray (lncRNA & miRNA), qRT-PCR | 1,185 | lnc-ZNF354A-1; lnc-LIN54-1; lnc-FRG2C-3 | TGF-β signalling; Focal adhesion; Calcium signalling pathway |
| Dolcino (2017) | 202 | Whole blood (pre- and post-anti-TNF-α) | 10 AS | 10 | Affymetrix HG-U133A 2.0 gene Chip | 740 | STAT1; TNFRSF25; IKBKB | FAS signalling; Wnt signalling |
| Huang (2017) | 203 | PBMC | 38 AS | 32 | RT-PCR | 4 | miR-29a; DKK1; RUNX2 | Wnt signalling |
| Layh-Schmitt (2017) | 204 | iPSC, MSC | 2 axSpA | 1 | RNAseq | 27 | TNFRSF1; STAT3 | Not Determined |
| Roozbehkia (2017) | 205 | PBMCs | 16 AS | 10 | RT-PCR | 4 | Myd88; NF-κβ; MAPK14 | Not Determined |
| Wang (2017) | 206 | PBMC (pre- and post-anti-TNF-α) | 19 AS | 0 | RNAseq | 574 | IL6R; NOTCH1; CXCR1; TNFRSF1A | Haematopoietic cell lineage; Intestinal immune network for IgA; Influenza A |
| Zhang (2017) | 207 | Hip joint ligament | 20 AS | 34 | Microarray, RT-PCR | 1,257 | miR-17-5p; miR-27b-3p | G-protein receptor signalling; Angiogenesis |
| Xu (2018) | 208 | Whole blood | 3 AS | 3 | RNAseq | 503 | KIR2DL3; GPR162 | Haematopoietic cell lineage; Phagosome |
| Xu (2019) | 209 | Whole blood | 3 AS | 3 | RNAseq | 1,444 | MANSC1; DNMT1 | DNA repair; T-cell receptor signalling |
| Kook (2019) | 210 | Serum | 65 AS | 39 | miRNA microarray | 12 | miR-214 | Not Determined |


Multi-omics integrated analysis

The measurements mentioned above (genetics, DNA methylation and gene expression) are all co-regulated. Changes in genetic sequence can affect both DNA methylation and expression, either sequentially, simultaneously, or individually. Integrating genotype can provide additional information on heterogeneity within a sample population, and on the effect that these genotypes have on phenotype. Combining ‘omics’ techniques may compensate for missing information in one dataset, reduce the number of false positives, and elucidate changes that affect biological systems on multiple levels 211.

Combined analysis of ‘omic measurements can be done by testing a step-wise hypothesis of action (multi-staged analysis), or a hypothesis of combinational effect on phenotype (meta-dimensional analysis) 212. This approach interrogates whether ‘omic datasets are affected by genotype directly, or whether observed changes are due to other variables. Genetic loci associated with expression or methylation measurements are known as expression quantitative trait loci (eQTL) or methylation quantitative trait loci (mQTL), respectively (Figure 1.6). This has proven effective in the related diseases of IBD and psoriasis, where integrated analysis has identified mechanisms of effect for known disease-associated variants. Focusing on loci identified in genetic association studies provides a means to narrow study focus to that which is achievable within current sample sizes.
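As a minimal illustration of what a QTL association measures, the Python sketch below regresses a molecular trait (expression or methylation) on allele dosage coded 0, 1 or 2. It is a sketch under stated assumptions only: the function name is hypothetical, the additive coding is an assumption, and practical analyses would also adjust for covariates and correct for multiple testing, none of which is shown here.

```python
from math import sqrt
from typing import Sequence, Tuple


def qtl_association(dosages: Sequence[float], trait: Sequence[float]) -> Tuple[float, float]:
    """Return (slope per allele, Pearson r) of a trait on allele dosage."""
    n = len(dosages)
    if n != len(trait) or n < 3:
        raise ValueError("need matched genotype and trait values for at least 3 samples")
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in trait)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or trait has no variation")
    # Slope: change in the trait per additional copy of the allele.
    return sxy / sxx, sxy / sqrt(sxx * syy)
```

A non-zero slope for a SNP and a nearby gene would be reported as a candidate eQTL; the same calculation with CpG methylation values as the trait gives a candidate mQTL.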

Further integration approaches have emerged recently for identifying molecular signatures across multiple ‘omic techniques. Concatenation combines data from different ‘omics techniques into a single large dataset. Multiple models are developed within each dataset before attempting to combine these models into a single model to explain the phenotype of interest. This approach is a reductionist one and is unable to fully account for how individual data types may affect or interact with each other.

Figure 1.6 Illustration of the cis (<500 kb apart on the same chromosome) and trans (>500 kb apart, either on the same or on different chromosomes) relationship between SNPs and DNA methylation or gene expression.
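The cis/trans distinction in Figure 1.6 reduces to a distance rule, sketched below in Python. The function name is hypothetical, and treating a distance of exactly 500 kb as trans is an assumption, since the caption does not define that boundary case.

```python
CIS_WINDOW_BP = 500_000  # threshold taken from the Figure 1.6 caption


def qtl_type(snp_chrom: str, snp_pos: int, feature_chrom: str, feature_pos: int) -> str:
    """Label a SNP-feature pair (feature = CpG or gene) as cis or trans."""
    same_chromosome = snp_chrom == feature_chrom
    if same_chromosome and abs(snp_pos - feature_pos) < CIS_WINDOW_BP:
        return "cis"
    # Different chromosome, or the same chromosome but too far apart.
    return "trans"
```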

DIABLO (Data Integration Analysis for Biomarker discovery using a Latent component method for Omics studies) attempts to “maximize the common or correlated information between multiple datasets whilst identifying in an optimal manner the key ‘omics variables (mRNA, CpGs, proteins, metabolites, etc.) that explain and reliably classify disease sub-groups or phenotypes of interest.” 213.

DIABLO builds upon projection to latent structure (PLS) models to model and maximise the correlation between pairs of pre-specified ‘omics datasets and unravel relationships between these datasets. Known relationships between datatypes, such as DNA methylation and gene expression, are leveraged to interrogate changes that occur in biological pathways. The same N individuals are used for each ‘omic approach sampled. Where summary data are intended to be compared, different models are required 214. Unfortunately, to date there have been no studies in AS that integrated genotype with expression or methylation in a multigene manner, nor has any study examined the relationship between all three. Therefore, integrated analysis remains a relative unknown in AS.


Thesis outline

This thesis investigated the role of DNA methylation and gene expression changes in AS pathogenesis, and whether these changes were specific to individual immune cell subsets from peripheral blood: CD4+T-cells, CD8+T-cells, γδ T-cells, NK cells and CD14+monocytes. This information was then used to interrogate the pathways enriched for these changes, and the association of these changes with AS-associated genetic loci.

Aims

Aim 1: To identify changes in DNA methylation in individual cell subsets associated with AS (CD4+T-cell, CD8+T-cell, γδ T-cells, NK cells and CD14+monocytes).

Aim 2: To identify changes in transcription, both coding and non-coding, in individual cell subsets that are associated with AS (CD4+T-cell, CD8+T-cell, γδ T-cells, NK cells and CD14+monocytes).

Aim 3: To investigate whether the previously identified changes in DNA methylation and expression are associated with known AS-associated loci, and together form a signature for AS.


Significance

Despite progress in the knowledge of genetic loci associated with AS, the functional impact of many loci remains elusive. A scarcity of data on the functional basis of AS has proven to be a barrier to evidence-based improvements in the diagnosis and treatment of AS. Difficulties in identifying the function of individual loci due to their location within non-coding or gene-dense regions may be overcome by examining epigenetic changes in relation to these loci. Transcriptomics has been examined in AS to a certain extent with some promising results; however, the uptake of novel methods such as RNAseq has been slow, and many studies have been limited by the use of heterogeneous sample types and/or small sample sizes. The same limitations have also hindered the few studies examining DNA methylation in AS. It is apparent that DNA methylation and transcription are altered in AS; however, the pathways that are affected and the extent to which these are associated with genetic variants are unknown. This work provides the first cell type-specific DNA methylation and expression data in AS. It is also the first analysis in AS to integrate genotype, DNA methylation and expression, either as QTL associations or as an integrated signature of disease. This study provides a new integrated insight into the fundamental biological basis of AS that may inform future translation into diagnostic or treatment options.


Chapter 2: Materials and Methods

This chapter outlines the methodology used in this thesis to achieve the aims and outcomes outlined in Chapter 1, from cohort selection to statistical analysis. A single cohort was used for this thesis, so the methods used to investigate the research aims are discussed together.

Participants

Peripheral blood samples were obtained from 51 clinical patients satisfying the modified New York criteria for AS, and from 49 age-, sex- and HLA-B*27 status (positive or negative)-matched healthy controls. One individual was re-classified midway through the study from a healthy individual to non-radiographic axial spondyloarthritis. All individuals were adults over the age of 18 years. Healthy controls were obtained from both unaffected first-degree relatives of HLA-B*27 positive AS patients not participating in this research, and unrelated healthy individuals obtained through collaboration with the Queensland Institute for Medical Research (QIMR) Berghofer Twin study. These cohorts were included to maximise the likelihood of obtaining HLA-B*27 positive individuals. HLA-B*27 status was only recorded as presence (positive) or absence (negative) of any HLA-B*27 allotype. Individuals who had a history of AS, radiographic changes on x-ray, or reported lower back pain for more than the past 3 months were excluded from the healthy control cohort. All participants provided written informed consent, and the study was approved by the Queensland University of Technology (QUT), Metro South and QIMR Berghofer human research ethics committees under the ‘Genetics of Ankylosing Spondylitis study’ protocol (HREC/05/QPAH/221).


Individuals on glucocorticoid therapy or who had smoked within the 5 years prior to sample collection were excluded due to the effects of these exposures on DNA methylation and gene expression. Use of NSAIDs was allowed as these have minimal effects on PBMC gene expression 194. All participants were of self-reported European ancestry.

Venous blood was collected at either the Princess Alexandra Hospital or distributed Queensland Medical Laboratory (QML) Pathology sites. Blood was collected into serum separator clot activator tubes, lithium/heparin tubes and ethylenediaminetetraacetic acid (EDTA) tubes for the purposes of isolating serum, PBMC and DNA, respectively. Ethnicity, age, gender, height, weight, smoking status, medication use, and medical history were recorded for each participant. In addition, AS patients had year of disease symptom onset, BASDAI, CRP, ESR, bDMARD use and year of first bDMARD use recorded. Statistical significance for continuous variables was calculated using the Wilcoxon non-parametric rank test, while significance for discrete variables was determined using a chi-squared (χ2) test.
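To make the test for discrete variables concrete, the Python sketch below computes a Pearson chi-squared statistic for a 2x2 table, using the sex counts reported in Table 2.1 (39 male and 12 female AS individuals; 19 male and 30 female controls). It is an illustration only: the function name is hypothetical, no continuity correction is applied, and whether the original analysis used one is not stated, so the p-value need not match the table exactly.

```python
from math import erfc, sqrt
from typing import Tuple


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared test (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    # Expected counts under independence of row and column variables.
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-squared distribution with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Rows: AS, healthy controls; columns: male, female (counts from Table 2.1).
statistic, p_value = chi_squared_2x2(39, 12, 19, 30)
```

With these counts the statistic is about 14.6 on one degree of freedom.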

Sex and BMI were significantly different between cohorts (Table 2.1). BASDAI correlated significantly with whether an individual with AS was undergoing anti-TNF-α treatment, and was therefore considered to be confounded with treatment status. Only 10 AS individuals had a BASDAI >4 (considered to be active disease), and of these 8 were treatment-naive. In addition, the BASDAI scores, a subjective measure of disease activity, did not directly correlate with the CRP and ESR results for these individuals, as only the 6 individuals with a BASDAI >6 had CRP and ESR measurements above the normal range.


Table 2.1 Cohort statistics for clinical and biological parameters.

Parameter                     AS individuals     Healthy controls    P-value
Sex, male                     39 (76.5%)         19 (38.8%)          0.0002
Sex, female                   12 (23.5%)         30 (61.2%)
Age (median) (range)          47 (18-70)         37 (18-79)          NS
HLA-B*27 status, positive     39 (76.5%)         35 (71.4%)          NS
HLA-B*27 status, negative     12 (23.5%)         14 (28.6%)
BMI (kg/m2) (median)*         27.7 (20-47.9)     25.3 (17.4-56.2)    0.02
Smoking, n, never             35 (68.6%)         34 (69.4%)          NS
Smoking, n, ever              16 (31.4%)         15 (28.6%)
Iritis, n (%)                 20 (39.2%)         0 (0%)
Psoriasis, n (%)              10 (19.6%)         3 (6.1%)
IBD, n (%)                    11 (21.6%)         1 (2%)
Anti-TNF-α, n (%)             32 (62.7%)         NA
BASDAI (median) (range)       1.3 (0.1-8.1)      NA
CRP (mg/L) (median)           5.4 (1.5-66)       NA
ESR (mm/hr) (median)$         14 (2-73)          NA

* BMI information was not available for 12 individuals
$ ESR was not determined for the date of bleeding for one AS individual
NS: Not significant (p-value>0.05)

Genotyping

DNA was extracted from EDTA tubes using the QIAsymphony PAXgene Blood DNA kit per manufacturer’s instructions (QIAGEN). All samples were genotyped using the Illumina CoreExome microarray SNP genotyping chip (Illumina, California). Data were initially processed from the Illumina iScan using the Illumina Genome Studio software (Illumina, California). Samples with a missing rate greater than 5%, a Hardy-Weinberg equilibrium p-value ≤10^-6, or that grouped separately from the European Caucasian ethnic population on principal component analysis (PCA) were excluded from analysis. PCA was carried out using the mixOmics package 214.
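The missing-rate and Hardy-Weinberg filters can be illustrated with a short base R sketch. It assumes genotypes coded as allele counts (0/1/2), applies the missing-rate threshold per sample and the Hardy-Weinberg threshold per SNP (the conventional arrangement, assumed here), and uses a simple 1-df chi-squared test rather than whatever test Genome Studio or downstream tools applied. The data are simulated and the function name hwe_p is hypothetical.

```r
# Genotypes coded as 0/1/2 allele counts; NA = missing call.
# Rows are samples, columns are SNPs (simulated, hypothetical data).
set.seed(42)
geno <- matrix(rbinom(200 * 50, size = 2, prob = 0.3), nrow = 200)
geno[sample(length(geno), 100)] <- NA

# Per-sample missing rate; the 5% threshold follows the text
sample_missing <- rowMeans(is.na(geno))
keep_samples <- sample_missing <= 0.05

# 1-df chi-squared test of Hardy-Weinberg equilibrium for one SNP
hwe_p <- function(g) {
  g <- g[!is.na(g)]
  n <- length(g)
  obs <- c(sum(g == 0), sum(g == 1), sum(g == 2))
  q <- mean(g) / 2                       # frequency of the counted allele
  expd <- n * c((1 - q)^2, 2 * q * (1 - q), q^2)
  stat <- sum((obs - expd)^2 / expd)
  pchisq(stat, df = 1, lower.tail = FALSE)
}

snp_hwe <- apply(geno[keep_samples, , drop = FALSE], 2, hwe_p)
keep_snps <- snp_hwe > 1e-6              # threshold from the text
```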

Downstream analysis used genotypes for the 115 loci previously associated with AS in European-Caucasian populations (full list in Appendix A). HLA-B*27 status was imputed using HLA*IMP software 215,216, and verified by PCR using publicly available sequences for exon 3 primers and a β-globin control primer (Table 2.2). AS individuals were confirmed by Pathology Queensland.

Table 2.2 Primer sets for HLA-B*27 typing using PCR.

B27-specific exon 3 primers (135bp product)
E91s      GGGTCTCACACCCTCCAGAAT
E136as    CGGCGGTCCAGGAGCT

β-globin PCR control primers (536bp product)
KM29      GGTTGGCCAATCTACTCCCAGG
RS42      GCTCACTCAGTGTGGCAAAG

PBMC processing

Venous samples were processed on the same day as collection using methods routinely used in our laboratory 90. Samples were processed using Ficoll density gradient either manually or with SepMate tubes (StemCell Technologies, Canada). 1mL of plasma was taken for storage at -80 °C. PBMC samples were stored in vials of approximately 1 x 10^7 cells with 5% DMSO/95% FBS (foetal bovine serum) cryomedia in liquid nitrogen (≤ -150 °C). Serum separator clot activator tubes were centrifuged at 3000 x g for 15 minutes and serum was aliquoted into 1mL volumes prior to storage at -80 °C.


Fluorescence activated cell sorting

FACS antibodies

Anti-CD3 PECF594 (UCHT1), anti-γδ TCR fluorescein isothiocyanate (FITC) (B1), anti-CD4 phycoerythrin (PE) (RPA-T4), anti-CD14 Pacific Blue (M5E2), and anti-CD16 Allophycocyanin-Cy7 (APC-Cy7) (3G8) antibodies were purchased from BioLegend (San Diego, CA). Anti-CD8 Px10-Cyan7 (Px10-Cy7) (HIT8a) and anti-CD56 APC (MEM-188) antibodies were purchased from BD Biosciences (USA). HLA-B*27 status was tested using the pan anti-human HLA-B27 FITC (HLA-ABC-m3) (Abcam, UK) with an IgG FITC isotype control 217,218. Live/dead staining was carried out with either LIVE/DEAD Aqua (BioLegend) or fixable viability stain 440UV (BD Biosciences).

Figure 2.1 FACS gating strategy used for sorting cell subsets. CD3 positive cells were separated into γδTCR+ and γδTCR-CD4+/ γδTCR- CD8+ cells. CD3 negative cells were split into CD14+ monocytes and NK cells.

FACS sorting

PBMC vials from the same venous draw were transferred from liquid nitrogen to -80 °C for at least 24 hours before use. Samples were then defrosted and placed into recovery media (20% FBS/80% RPMI). Cells were washed in PBS and resuspended in 100μL PBS for staining. PBMC viability was determined using a haemocytometer cell count with trypan blue; average viability was above 80%. Cells were stained with a live/dead dye for 5 minutes at room temperature prior to other antibodies. Cells were then incubated with the remaining antibodies at 4 °C for 20 minutes, washed in MACS buffer, and resuspended in MACS buffer for use on the FACS machine. Sorting was carried out on a MoFlo Astrios cell sorter (Beckman Coulter, Indiana) using the gating strategy shown in Figure 2.1. Purity was assessed post sorting on the same machine and was greater than 90%. Where gating was unclear, purity was prioritised over yield.

FACS statistics

FACS data were analysed using the FlowJo v10.4 software (FlowJo, LLC) to gate and extract statistics. Cell count and maximum fluorescence index (MFI) comparisons were performed in SPSS Statistics. All data are shown as mean ± standard error of the mean (S.E.M). Comparisons of average cell frequencies were performed using a paired sample t-test, and CD3 MFI comparisons using an independent sample t-test.

DNA and RNA extraction

DNA and RNA were simultaneously extracted from the isolated cell subsets immediately after sorting using the AllPrep DNA/RNA Mini kit (QIAGEN, USA) as per the manufacturer’s instructions. The AllPrep kit enables simultaneous extraction of DNA and RNA from the same PBMCs for use in both downstream RNA sequencing and DNA methylation analysis. This enables direct comparison of the gene expression and DNA methylation profiles of the same cell subset from a single individual. RNA samples were DNase treated on column or in solution using the RNase-free DNase (QIAGEN, USA) per manufacturer protocol. DNA sample quantity was determined using the Qubit dsDNA HS Assay kit (Thermo Fisher Scientific) per protocol. DNA samples that were less than 2.5 ng/μl were concentrated using a SpeedVac (Thermo Fisher Scientific). RNA quantity was measured using the Qubit RNA HS assay (Thermo Fisher Scientific), and RNA quality was assessed using the Agilent High Sensitivity RNA ScreenTape (Agilent Technologies). Low RNA concentration samples were concentrated using the NucleoSpin RNA Clean-up XS kit (Macherey-Nagel) per manufacturer’s protocol.

DNA methylation methods

Bisulfite conversion and MethylationEPIC BeadChip

A total of 250ng of extracted DNA was bisulfite converted using the EZ DNA Methylation kit (Zymo Research), and 5 representative samples from each plate were tested for conversion using methylation-specific PCR. Converted DNA was analysed using the Illumina Infinium MethylationEPIC BeadChip (Illumina) per manufacturer’s protocol and imaged using the Illumina iScan system (Illumina). The final number of samples per cell type for each plate run is shown in Table 2.3. To prevent confounding of technical and biological variables, samples were randomised across plates and rows by disease status (AS diagnosed individual or healthy control), anti-TNF-α treatment, sex, and cell type.

Table 2.3 Sample numbers for each cell type within the five sample plates processed.

Cell type          A     B     C     D     E     Total
CD4 T-cells        21    20    25    22    17    104*
CD8 T-cells        19    20    23    22    13    97
γδ T-cells         11    11    2     8     7     39
NK cells           17    17    21    21    12    88
CD14 Monocytes     20    20    25    23    15    103*

* A CD4+T-cell and a CD14+monocyte control were run across each plate.


DNA methylation Processing and Quality control

All statistical analysis was carried out in R (version 3.6.0) and RStudio (version 1.1.463) 219,220. Packages were downloaded through the Bioconductor (version 3.9) or CRAN repositories. Analysis was assisted by use of the TRI high performance computing (HPC) system “Trident”.

Raw signal intensities were extracted from .idat files using Minfi (version 1.30.0) 221. Probes were annotated using the IlluminaHumanMethylationEPICanno.ilm10b2.hg19 package (version 0.6.0) through the Minfi getAnnotation function. Samples with more than 5% of sites with a detection p-value greater than 0.05 were removed (0 samples). Probes with a detection p-value greater than 0.01 in 50% of samples were removed (104 probes) (Figure 2.2). Sex was determined using the getSex function and compared against clinical records to check for sample contamination. Sex chromosome related probes were then removed prior to further analysis (18,191 probes). Probes that have been identified as potentially cross-hybridising to either CpG or non-CpG sites (44,210 probes) were removed 222.
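The detection p-value filters described above amount to two passes over a probes-by-samples matrix. The following base R sketch shows the logic on simulated values; it does not call Minfi, and the matrix det_p and the other object names are assumptions made for illustration.

```r
# Simulated detection p-values: rows are probes, columns are samples (hypothetical)
set.seed(7)
det_p <- matrix(runif(1000 * 20, min = 0, max = 0.005), nrow = 1000,
                dimnames = list(paste0("probe", 1:1000), paste0("sample", 1:20)))
det_p[1:5, 1:12] <- 0.2   # five probes that fail in 12 of 20 samples

# Samples: remove if more than 5% of sites have detection p > 0.05
sample_fail_rate <- colMeans(det_p > 0.05)
keep_samples <- sample_fail_rate <= 0.05

# Probes: remove if detection p > 0.01 in at least 50% of the retained samples
probe_fail_rate <- rowMeans(det_p[, keep_samples, drop = FALSE] > 0.01)
keep_probes <- probe_fail_rate < 0.5

filtered <- det_p[keep_probes, keep_samples, drop = FALSE]
dim(filtered)
```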

Figure 2.2 Probe attrition throughout the process.


Beta values were calculated from intensity values for each probe using the ratio of methylated (M) and unmethylated (U) signals (β = M/(M + U + 100)). These scores have a bimodal distribution, with values constrained between 0 and 1 and peaks reflecting the unmethylated and methylated states. The two probe types utilised by the Illumina Infinium arrays have different distributions, Type II being broader than Type I. Normalisation of these distributions is carried out to prevent this difference affecting the analysis. Two methods were compared for this process: beta-mixture quantile (BMIQ) normalisation implemented through the wateRmelon package (version 1.28.0) 223,224, and the quantile normalisation (QN) method implemented through the Minfi package. BMIQ was selected as it best minimised the difference in distribution within each individual cell type. Post-normalisation, probes that were incomplete in a cell subset were removed, as were the legacy probes that have been removed from the Illumina manifest v1.0 B4 (215 probes).
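The beta value formula given above can be written as a one-line function. This is a minimal sketch with hypothetical intensities; the function name beta_value is an assumption and the offset of 100 is the one stated in the text.

```r
# Beta value from methylated and unmethylated intensities,
# with the offset of 100 given in the text
beta_value <- function(meth, unmeth, offset = 100) {
  meth / (meth + unmeth + offset)
}

meth   <- c(9000, 150, 4000)   # hypothetical methylated intensities
unmeth <- c(900, 8850, 3900)   # hypothetical unmethylated intensities
round(beta_value(meth, unmeth), 3)
```

The offset keeps the ratio defined, and close to zero, when both intensities are very low.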

Table 2.4 The number of samples for DNA methylation from each cell type post-QC.

Cell type       CD4+T-cells    CD8+T-cells    γδ T-cells    NK cells    CD14+monocytes
No. samples     100            97             38            88          99

PCA was implemented through the mixOmics package (version 6.8.0). Normalisation was carried out prior to PCA and surrogate variable analysis (SVA), which used the sva package (version 3.32.1) applied to each cell type dataset 225. Cell types were examined separately as the large variation between cell types may mask other sources of variation. SVA was used to determine the impact of technical variables related to sample processing, such as batch, on sample methylation levels. Sources of technical and biological variation were visualised using a heat scree plot (Appendix B). Slide and plate were identified as sources of technical variation and were adjusted for using the ComBat function within the sva package. Biological sources of variation (age, sex, and smoking status) were used as covariates in later analysis.

Differentially methylated position (DMP) analysis

DMP were identified using the lmFit function implemented by the limma package (version 3.40.2). This method fits a linear model for the specified variable of interest (disease status: AS diagnosed individual or healthy control), accounting for known confounders. An example of the code used on CD4 T-cell data for this function on the ComBat-adjusted beta values:

    # capital letters in object names were lost from the source and are inferred from context
    CD4_DMP <- dmpFinder(adjBval, pheno = Status, type = "categorical")

The eBayes function was implemented within the limma package to rank the DMP identified through the lmFit model.

    # capital letters in object names were lost from the source and are inferred from context
    design <- model.matrix(~ Status + Sex + Age + Smoking, data = pd)
    lm <- lmFit(Bval, design)
    fit.e <- eBayes(lm)
    results <- decideTests(fit.e)

Statistical significance was set at P<0.05 following adjustment for multiple testing using Holm correction 226. DMP values are shown as absolute difference, fold change, and S.D. Variation due to each variable was determined using linear mixed modelling and is shown as adjusted p-value ± S.E. DMPs were visualised using the plotCpG function from Minfi, ggplot2 (version 3.2.0) and RColorBrewer (version 1.1.2).
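Holm's step-down correction is available in base R through p.adjust. A minimal sketch with hypothetical p-values (not study results) shows how the adjusted values relate to the 0.05 threshold:

```r
p_raw <- c(0.01, 0.02, 0.03)                 # hypothetical unadjusted p-values
p_holm <- p.adjust(p_raw, method = "holm")   # Holm step-down adjustment
significant <- p_holm < 0.05
data.frame(p_raw, p_holm, significant)
```

Each sorted p-value is multiplied by the number of hypotheses still in play, and the adjusted values are then forced to be non-decreasing.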

Permutation analysis of FDR

Permutation testing was used to assess the sampling distribution by resampling the observed data with altered disease status (shuffling the recorded AS or healthy control status). If the null hypothesis is true, changing the disease status will not alter the outcomes (whether a value is significantly associated with the disease status). The number of permutations limits the precision of the result, such that at 1000 permutations the smallest possible p-value is 0.001. This p-value is determined by the number of permuted outcomes that exceed the true value calculated. A t-test is then used to calculate whether the null hypothesis can be rejected for the value. As the linear model tested takes only minutes, 1000 iterations were used (example code in Appendix C).
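The label-shuffling idea can be sketched in base R for a single CpG. This is not the code in Appendix C: the function name perm_p_value, the use of a t statistic as the test statistic, and the simulated beta values are all assumptions. The sketch also adds 1 to the numerator and denominator, a common convention that makes the smallest attainable value 1/1001 rather than exactly 0.001.

```r
# Permutation p-value for a difference between two groups (illustrative)
perm_p_value <- function(values, status, n_perm = 1000) {
  observed <- abs(t.test(values ~ status)$statistic)
  permuted <- replicate(n_perm, {
    shuffled <- sample(status)                 # shuffle disease labels
    abs(t.test(values ~ shuffled)$statistic)
  })
  # Proportion of permuted statistics at least as extreme as the observed one;
  # the +1 terms stop the estimate from reaching exactly zero
  (sum(permuted >= observed) + 1) / (n_perm + 1)
}

set.seed(123)
status <- factor(rep(c("AS", "HC"), each = 20))
values <- c(rnorm(20, mean = 0.60, sd = 0.05),   # hypothetical beta values
            rnorm(20, mean = 0.50, sd = 0.05))
p_perm <- perm_p_value(values, status)
p_perm
```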

Differentially methylated regions (DMR)

Two separate methods were utilised and compared for the regions they identified as differentially methylated: Bumphunter (implemented in the Minfi package) and DMRcate (version 1.20.0). Statistically significant DMP (p-value <0.05) from the previous section were used as input for these programs. As genomic coverage in targeted methods such as the Illumina Human MethylationEPIC array can be sparse, many DMR methods applied to array data implement kernel smoothing to allow a greater portion of the genome to be considered. This ‘smoothing’ relies on the assumption that closely located cytosines display correlated levels of methylation, an assumption that holds true for CpG methylation in mammals.

The major difference between these methods is the selection of region size. DMRcate selects regions by collapsing contiguous CpGs within a specified bandwidth of each other, and Bumphunter selects clusters of consecutive probes for which all the t-statistics exceed a user defined threshold. Notably, while Bumphunter is faster to implement, it is less precise than DMRcate 227. Results were visualised using Gviz plots with the genes in the region identified (based on the GRanges function and the Illumina manifest).
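The idea of collapsing nearby CpGs into candidate regions can be shown with a few lines of base R. This is a deliberate simplification, not the kernel-smoothing algorithm of DMRcate or the clustering in Bumphunter; the function name cluster_cpgs, the 1000 bp bandwidth and the positions are hypothetical.

```r
# Group sorted CpG positions into candidate regions: a new region starts
# whenever the gap to the previous CpG exceeds the bandwidth
cluster_cpgs <- function(positions, bandwidth = 1000) {
  positions <- sort(positions)
  region_id <- cumsum(c(TRUE, diff(positions) > bandwidth))
  split(positions, region_id)
}

cpg_pos <- c(100, 300, 5000, 5200, 5300)   # hypothetical positions on one chromosome
regions <- cluster_cpgs(cpg_pos)
lengths(regions)
```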

Pathway analysis

To investigate the functions of the network of genes identified as differentially methylated, pathway analysis was carried out using two databases: Gene Ontology (GO) terms and the Kyoto Encyclopedia of Genes and Genomes (KEGG). These databases vary in their exact content, and therefore both were tested and compared. As the CpG sites tested may be assigned to multiple genes, or multiple CpGs may be related to a single gene, an initial step in pathway analysis for methylation data is to consolidate the list of CpGs into a list of associated genes. GOmeth, implemented in the missMethyl package, accomplishes this by first running the getMappedEntrezIDs function, which obtains a list of mapped gene IDs, followed by the gometh function, which takes as input the significant probes and all probes tested on the array, with the prior.prob option accounting for the differing number of probes per gene on the array. The getMappedEntrezIDs function can be utilised separately, and the output of this function was used with clusterProfiler (version 3.12.0) to run over-representation analysis, gene set enrichment analysis and the accompanying visualisation.
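Over-representation analysis reduces to a hypergeometric tail probability. The base R sketch below uses hypothetical counts and does not include the adjustment for the number of probes per gene that gometh applies; it only shows the unadjusted test and its equivalence to a one-sided Fisher's exact test.

```r
# Over-representation of a gene set among significant genes (hypothetical counts)
n_universe    <- 20000  # genes tested
n_in_set      <- 200    # genes annotated to the pathway
n_significant <- 500    # significant genes
n_overlap     <- 15     # significant genes that are in the pathway

# P(X >= n_overlap) under the hypergeometric distribution
p_ora <- phyper(n_overlap - 1, n_in_set, n_universe - n_in_set,
                n_significant, lower.tail = FALSE)

# The same test expressed as a one-sided Fisher's exact test
tab <- matrix(c(n_overlap, n_in_set - n_overlap,
                n_significant - n_overlap,
                n_universe - n_in_set - n_significant + n_overlap), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
```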

AS-associated loci enrichment analysis

To test for enrichment of AS-associated DMPs in the regions surrounding AS-associated loci, windows ranging from 25kb to 500kb were used to count the significant DMP (p-value<0.01) around each locus. 1000 random bins with similar probe density were selected for comparison. A Wilcoxon rank-sum test was implemented to test whether the AS-associated regions were enriched more than random bins. Comparison of these results with other diseases, both AS-associated (IBD and psoriasis) and unrelated (systemic lupus erythematosus (SLE) and rheumatoid arthritis), was performed using the same method on disease-associated loci obtained from the publicly available GWAS catalogue (https://www.ebi.ac.uk/gwas/) for European ancestry (EA) loci 228. DMP coordinates and AS loci coordinates were from genome build 37. GWAS catalogue loci were from Genome Assembly GRCh38.p13.
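The window-based comparison can be sketched as a count of DMPs around each locus followed by a rank-sum test against random bins. The sketch below works on a single simulated chromosome, ignores the matching on probe density described above, and uses hypothetical coordinates and the hypothetical helper count_dmps.

```r
# Count significant DMPs within +/- half_width of each locus (single chromosome)
count_dmps <- function(locus_pos, dmp_pos, half_width) {
  vapply(locus_pos,
         function(p) sum(abs(dmp_pos - p) <= half_width),
         integer(1))
}

set.seed(99)
dmp_pos     <- sort(sample(1:5e6, 300))   # hypothetical DMP coordinates
as_loci     <- c(1.0e6, 2.5e6, 4.0e6)     # hypothetical AS-associated loci
random_bins <- sample(1:5e6, 1000)        # hypothetical random bin centres

as_counts     <- count_dmps(as_loci, dmp_pos, half_width = 250e3)
random_counts <- count_dmps(random_bins, dmp_pos, half_width = 250e3)

# One-sided rank-sum test: are counts near AS loci larger than in random bins?
enrich_test <- wilcox.test(as_counts, random_counts, alternative = "greater")
enrich_test$p.value
```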


RNAseq methods

RNAseq library preparation

As expected, different amounts of RNA were obtained per cell type, and the number of cell types for which samples were successfully purified varied between individuals. Therefore, a low input library preparation method was selected: the Clontech SMARTer Stranded Total RNA-Seq Kit - Pico Input Mammalian v2 (Takara Bio, USA), a mammalian, stranded, total RNA, ultra-low input library preparation method. The Clontech kit is compatible with Illumina machines and moderately correlated with the widely used Illumina TruSeq Stranded Total RNA kit (Spearman correlation 0.9 for forward strand and 0.79 for reverse strand) (full pilot study comparison in Appendix D).

The Clontech kit was used per protocol, with a variation in section A steps 1 and 3 for samples with lower inputs and γδ T-cells, where 1.2x master mix volume was used. This improved library output for samples that had low non-ribosomal RNA, such as γδ T-cells. Random priming enables the barcoding and amplification of all RNA transcripts, both coding and non-coding. Samples were enzymatically fragmented using the SMART Pico Oligos Mix v2 for 3 minutes (optimised for RNA quality). Illumina primer adapters corresponding to forward primers D501-D508 and reverse primers D701-D712 were used (sequences from the kit protocol in Table 2.5). Final RNA libraries were amplified using 14 cycles, reduced from the 15 cycles used for the initial library sets. Samples were purified using AMPure beads prior to final library size and concentration determination using the TapeStation D1000 kit (Agilent, USA) per manufacturer protocol. Libraries were repeated if they were below 1 ng/μL. Size clean-up with AMPure beads was used to remove adapter dimers and other reagents.


Table 2.5 Adapter sequences used for the total RNAseq libraries.

i5 index name    i5 Illumina index    i7 index name    i7 Illumina index
3'1              501                  5'1              701
3'2              502                  5'2              702
3'3              503                  5'3              703
3'4              504                  5'4              704
3'5              505                  5'5              705
3'6              506                  5'6              706
3'7              507                  5'7              707
3'8              508                  5'8              708
                                      5'9              709
                                      5'10             710
                                      5'11             711
                                      5'12             712

Remaining columns: i5 bases for sample sheet (HiSeq 2000/2500), i5 bases for sample sheet (NextSeq, HiSeq 3000/4000) and i7 bases for sample sheet [index base sequences not recoverable].

Sequencing

Samples were normalised and pooled for sequencing in ~96 sample pools for a final pooled library concentration of ~0.875nM. Reads were split across runs to allow for the potential to re-pool or adjust based on the initial sequencing. Several low read samples were sequenced a third time to obtain sufficient reads. Pools were balanced according to the number of reads required, based on the million reads per sample input (μL) achieved in the first two runs. To create the required complexity for sequencing, and as a sequencing control, each pool was spiked with 1% PhiX as recommended in the Clontech library manual. Pooled libraries were verified for amplification ability and quantification using KAPA qPCR (Kapa Biosystems, USA), and pools were quantified using both TapeStation D1000 (per manufacturer protocol) and Qubit dsDNA HS (per manufacturer protocol). Libraries were sequenced using NovaSeq 6000 S4 reagents with an XP workflow to enable different pools to be run on each lane. The XP kit allows for manual loading of pools into each lane, which expanded the use of the 96 primer pairs available and allowed the use of a higher throughput sequencing kit. In addition, the input required for each lane is lower when using XP loading. Libraries were sequenced with 2 x 150bp paired end reads to a minimum depth of 38 million reads (average depth of 49 million reads). Sequencing metrics assessed were %PF (percentage of reads and/or clusters passing filter), Q30 (percentage of bases with a quality score of at least 30, i.e. an error probability of no more than 1 in 1000 bases) and % aligned (percentage of sample aligned to the PhiX genome) (as per Illumina guidelines).
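The Q30 metric follows from the Phred scale, on which a quality score Q corresponds to an error probability of 10^(-Q/10). A short base R sketch, with hypothetical per-base quality scores and assumed function names:

```r
# Phred quality score <-> base-calling error probability
phred_to_error <- function(q) 10^(-q / 10)
error_to_phred <- function(p) -10 * log10(p)

phred_to_error(30)      # error probability at Q30
error_to_phred(0.001)   # quality score for 1 error in 1000 bases

# Q30 metric for a vector of per-base quality scores (hypothetical values)
base_quals <- c(36, 36, 32, 25, 36, 14, 36, 36, 30, 36)
pct_q30 <- 100 * mean(base_quals >= 30)
pct_q30
```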

RNAseq processing and quality control

Raw sequencing data was demultiplexed using the Illumina software bcl2fastq2 (version 2.20) according to the manufacturer recommendation. Reads were mapped to GRCh38 (hg38) (ENSEMBL) using STAR (version 2.7) 229.

##### Command Line:

STAR --runThreadN 12 --readFilesIn …735-…4-…704-…508.unmapped.bam \
  --readFilesType SAM PE --readFilesCommand "samtools view" \
  --genomeDir genome --outSAMtype BAM SortedByCoordinate \
  --outSAMunmapped Within --outSAMmapqUnique 70 --outBAMcompression -1 \
  --quantMode GeneCounts --outFileNamePrefix ./0388_8/ \
  --outFilterScoreMinOverLread 0.5 --outFilterMatchNminOverLread 0.5

Read counts per gene were obtained using the STAR option --quantMode GeneCounts in the above command. Genes were annotated with EntrezGene ID using the ENSEMBL database annotation hg38 (v86) 230. Sequencing metrics from STAR were examined using MultiQC (version 1.7) for million reads per sample, reads aligned, and reads aligned to genes. Four samples that had RNA input below the recommended amount failed to align more than 50% of the reads, with a single sample within this group only aligning ~3.8% of reads. These samples were excluded from subsequent analysis due to the low quality of the libraries. Genes with fewer than 10 reads across all samples were removed. Prefiltering reduces the computational burden of calculating changes in expression, with the premise that those removed are low expressed or not expressed in the samples examined.
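The prefiltering step is a row-sum threshold on the count matrix. A minimal base R sketch with a hypothetical three-gene matrix (object names are assumptions):

```r
# Hypothetical count matrix: rows are genes, columns are samples
counts <- matrix(c(0, 1, 2,        # gene_a: 3 reads in total
                   5, 3, 2,        # gene_b: 10 reads in total
                   120, 98, 143),  # gene_c: well expressed
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("gene_a", "gene_b", "gene_c"),
                                 c("s1", "s2", "s3")))

# Keep genes with at least 10 reads summed across all samples
keep <- rowSums(counts) >= 10
filtered_counts <- counts[keep, , drop = FALSE]
rownames(filtered_counts)
```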

The sva package was used to test for batch effects or other technical confounders. The function used here, svaseq(), differs from that used for methylation because count data requires different assumptions compared to the finite values within methylation data. Heat scree plots were used to visually represent the influence of technical and biological variables on samples. Library preparation group was found to be a significant technical variable and was included as a covariate in analysis for this reason. Due to the nature of count data, it is preferable to include technical variables as covariates rather than adjust for them. PCA plots and heatmaps implemented using the mixOmics package were used to visually examine samples for sources of variability outside of sample origin, e.g. contamination, or inter-batch variation 231. The final sample numbers post quality control are outlined in Table 2.6.

Table 2.6 The number of RNAseq samples post quality control.

Cell type    CD4+T-cells    CD8+T-cells    γδ T-cells    NK cells    CD14+monocytes
Samples      100            99             48            90          99

RNAseq Analysis

The analysis of AS-associated differential expression was performed in DESeq2 (version 1.24.0) using a multi-factorial approach with healthy controls as the reference level 232. For analysis a group covariate was defined that combined cell type and disease status (e.g. CD4.AS, CD4.HC). The design used was:

    # capital letters in object names were lost from the source and are inferred from context;
    # paste() builds combined labels such as CD4.AS
    pd$group <- factor(paste(pd$CellType, pd$Status.new, sep = "."))
    design <- ~ LibraryPrep + age + Sex + Smoking.Status + group

Counts were normalised for library size using the variance stabilizing transformation (VST) approach prior to visualisation. Most multidimensional exploratory analysis deals best with data that has the same range of variance at different ranges of the mean values, known as homoscedasticity. RNAseq counts are instead heteroscedastic, as there is greater dispersion for larger counts, which have the greatest absolute difference in mean values. Traditionally this was adjusted for using a log transformation approach, but this can be computationally heavy and time consuming. VST is on roughly the same scale as log2 counts but has an upwards shift for smaller values. These values were used for visualisation methods (such as PCA), but unnormalised counts were used for differential expression analysis.
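The dependence of variance on the mean in count data can be seen with simulated negative binomial counts. This base R sketch does not reproduce the DESeq2 VST; it only contrasts the raw variance of a low and a high expressed gene and shows a plain log2 transformation for comparison. The means and the size parameter are hypothetical.

```r
set.seed(2021)
# Simulated negative binomial counts for a low- and a high-expressed gene (hypothetical)
low  <- rnbinom(100, mu = 10,   size = 5)
high <- rnbinom(100, mu = 1000, size = 5)

# Raw counts: variance grows with the mean
var_raw <- c(low = var(low), high = var(high))

# After log2(x + 1) the two variances are on a comparable scale
var_log <- c(low = var(log2(low + 1)), high = var(log2(high + 1)))

rbind(var_raw, var_log)
```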

Differential expression was analysed using the DESeq() function, which fits a negative binomial generalised linear model to estimate the log fold change of each gene. This then outputs for each gene a log2 fold change, log fold change standard error, Wald test p-value and adjusted p-value (adjusted using Benjamini-Hochberg correction). Statistical significance was set to adjusted p-value<0.05. Adjusted p-values are set to NA where the gene counts are below the expected value based on the observed count distribution. Samples were examined for count outliers using Cook’s distances, a measure of the influence a single sample has on the fitted coefficients for the gene. A large Cook’s distance would indicate an outlier. No samples were identified as above the DESeq2 Cook’s distance cut-off of “the .99 quantile of the F(p, m-p) distribution, where p is the number of coefficients being fitted and m is the number of samples”. Significantly differentially expressed genes were visualised using ggplot2 and RColorBrewer.
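Both the Benjamini-Hochberg adjustment and the quoted Cook's distance cut-off can be computed in base R. The sketch below uses hypothetical p-values and hypothetical numbers of coefficients and samples; it is independent of DESeq2 itself.

```r
# Benjamini-Hochberg adjustment of hypothetical Wald test p-values
p_wald <- c(0.01, 0.02, 0.03, 0.04)
p_adj <- p.adjust(p_wald, method = "BH")

# Cook's distance cut-off: .99 quantile of the F(p, m - p) distribution
n_coef    <- 5     # p: coefficients fitted (hypothetical)
n_samples <- 100   # m: samples (hypothetical)
cooks_cutoff <- qf(0.99, df1 = n_coef, df2 = n_samples - n_coef)

p_adj
cooks_cutoff
```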


Results were annotated using AnnotationDbi (version 1.48.0) to implement annotation with org.Hs.eg.db (version 3.10) and EnsDb.Hsapiens.v86 (version 2.99.0), which are both annotation packages. org.Hs.eg.db is largely based on Entrez IDs and EnsDb.Hsapiens.v86 is based on the Ensembl database. Annotation code was:

    # the column and keytype values are not legible in the source and are left as "…"
    mapIds(EnsDb.Hsapiens.v86, keys = row.names(res), column = "…", keytype = "…", multiVals = "first")
    mapIds(org.Hs.eg.db, keys = row.names(res), column = "…", keytype = "…", multiVals = "first")

Permutation analysis of FDR

Permutation analysis was carried out as outlined above for DNA methylation analysis. As a single GLM requires ~40 minutes of run time, the number of iterations was chosen with time and resource constraints in mind and was set at 100.

Pathway analysis

Pathway analysis for DEG was carried out using the clusterProfiler package to examine over-representation and gene set enrichment within the GO and KEGG databases. Results were visualised using the same package.

Integrated Analysis

mQTL and eQTL analysis

Matrix eQTL (version 2.2) was used to run both the expression and methylation QTL analysis with the AS-associated loci identified previously (Appendix A). SNPs with poor imputation (R2<0.8) were excluded from analysis. A minor allele frequency (MAF) cut-off of 0.1 was used instead of the standard 0.05 cut-off due to the sample size being below 100 for most cell types. Further outlier filtering was carried out using quantile normalisation and Kruskal-Wallis testing. Population stratification was controlled for through the inclusion of principal components (PC) as covariates during analysis. Covariates of age, sex, smoking status, anti-TNFα and HLA-B*27 status were used. SNPs in LD with the lead AS-associated SNP were defined as those within a 1000kb window (±500kb) of the selected SNP. Associations were analysed separately for methylation and gene expression in cis (±500kb), and in trans (>500kb). Inflation was assessed using QQ plot of the observed vs expected p-values >0.01.
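A single QTL test under an additive model can be sketched with lm in base R. This stands in for, and does not reproduce, the Matrix eQTL implementation; the data are simulated, only one covariate is included for brevity, and the helper is_cis and all other names are assumptions.

```r
set.seed(11)
n <- 90
genotype <- rbinom(n, size = 2, prob = 0.3)   # allele count 0/1/2 (simulated)
age <- round(runif(n, 18, 79))                # hypothetical covariate
expression <- 5 + 0.8 * genotype + 0.01 * age + rnorm(n, sd = 0.5)

# Minor allele frequency and the 0.1 cut-off from the text
allele_freq <- mean(genotype) / 2
maf <- min(allele_freq, 1 - allele_freq)
passes_maf <- maf >= 0.1

# Additive linear model: expression ~ allele count + covariates
fit <- lm(expression ~ genotype + age)
qtl_beta <- coef(fit)[["genotype"]]
qtl_p <- summary(fit)$coefficients["genotype", "Pr(>|t|)"]

# cis/trans classification by SNP-to-feature distance (same chromosome)
is_cis <- function(snp_pos, feature_pos, window = 500e3) {
  abs(snp_pos - feature_pos) <= window
}
```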

mixOmics

To investigate molecular signatures across these different ‘omic techniques the mixOmics function DIABLO was used. DIABLO is intended for use where the same N individuals were sampled across multiple different ‘omic approaches. DIABLO aims to “maximize the common or correlated information between multiple datasets whilst identifying in an optimal manner the key ‘omics variables (mRNA, miRNA, CpGs, proteins, metabolites, etc.) that explain and reliably classify disease sub-groups or phenotypes of interest.” 213 The method builds on the projection to latent structure models (PLS) to model and maximize the correlation between pairs of pre-specified ‘omics datasets to unravel the relationships between those ‘omics data, such as between mRNA and methylation.

The input data used for DIABLO excluded any genes/probes or samples removed during quality control in individual analysis, but was not limited to regions identified in individual analysis as different between AS individuals and controls, as this can result in overfitting of the model. The 10,000 most variable probes and genes were included in DIABLO analysis. Each cell type was analysed separately. Genotype was included under an additive model, and was therefore measured as the number of risk alleles (0, 1 or 2). As with QTL analysis this excluded poorly imputed SNPs or those with no alternative allele in the cohorts. The exclusion of these SNPs should not affect the model, as their uniformity across both healthy and AS individuals makes it impossible to define their interaction with other measurements or effect on disease status.
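Selecting the most variable features and dropping uninformative SNPs are both simple variance filters. The following base R sketch uses small simulated matrices with samples in rows, as mixOmics expects; the function name top_variable and the data are hypothetical.

```r
# Select the n most variable features (columns) of a samples x features matrix
top_variable <- function(x, n_top) {
  feature_var <- apply(x, 2, var)
  top_idx <- order(feature_var, decreasing = TRUE)[seq_len(min(n_top, ncol(x)))]
  x[, top_idx, drop = FALSE]
}

set.seed(5)
# Hypothetical data: 30 samples, 4 features with increasing spread
x <- cbind(f1 = rnorm(30, sd = 0.1), f2 = rnorm(30, sd = 1),
           f3 = rnorm(30, sd = 5),   f4 = rnorm(30, sd = 20))
colnames(top_variable(x, 2))

# Drop SNPs with no variation in risk allele count across the cohort
geno <- cbind(snp1 = c(0, 1, 2, 1), snp2 = c(0, 0, 0, 0))
geno_informative <- geno[, apply(geno, 2, var) > 0, drop = FALSE]
colnames(geno_informative)
```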

Genotype, transcriptomics, and methylation data were input with disease status as the outcome being modelled. The design matrix, which determines which blocks of data should be connected to maximise correlation or covariance between components (values ranging from 0 to 1, for no correlation or correlation to maximise), was based on both prior knowledge of the relationship between these measures and a data-driven approach using block.pls.
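A design matrix of this kind is a square, symmetric block-by-block matrix with a zero diagonal. The base R sketch below only constructs such a matrix; the weights 0.1 and 0.8 are hypothetical and are not the values used in this analysis.

```r
blocks <- c("genotype", "expression", "methylation")

# Square block-by-block design matrix; off-diagonal weights lie between 0 and 1.
# The weight 0.1 and the stronger expression-methylation link are hypothetical.
design <- matrix(0.1, nrow = length(blocks), ncol = length(blocks),
                 dimnames = list(blocks, blocks))
design["expression", "methylation"] <- 0.8
design["methylation", "expression"] <- 0.8
diag(design) <- 0
design
```

A weight near 0 asks the model to prioritise discrimination of the outcome, whereas a weight near 1 prioritises correlation between the two connected blocks.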


Chapter 3: DNA methylation in AS

Introduction

DNA methylation is a relatively stable chemical marker that can act either in a suppressive or activating role within the context of gene expression and genetic control 233. In disease, DNA methylation has been investigated as a biomarker for disease and disease progression, and as an intermediary between genetic and environmental influences 188,234,235. Genetic sequence is the primary determinant of DNA methylation state, and disease associated loci can affect DNA methylation by altering this genetic sequence. Environmental factors can strongly influence DNA methylation, including age, sex, smoking and medications (outlined in 1). As outlined in Chapter 1, several of these environmental factors, particularly smoking, are associated with disease severity in AS. It is therefore plausible that AS-associated genetic loci affect phenotype through altered DNA methylation at locations of genetic control or at genes themselves.

The exact effect of methylation on transcription is a complex interaction between genomic location, the function of that genomic region (e.g. repressive transcription factor binding site), and whether the methylated CpG is within a CpG island 236.

Generally, DNA methylation within a promoter region is associated with repression of gene expression. Some types of cancers are characterised by global increases in methylation due to a loss of regulation. This is not the case for most complex diseases, where specific changes in DNA methylation occur in relation to associated genetic changes and environmental influences.


DNA methylation in AS

DNA methylation was first implicated in AS due to the genetic association with DNMT3A, DNMT3B and a likely association with DNMT3L. Alongside DNMT2, the proteins DNMT3A, DNMT3B and DNMT3L are responsible for de novo establishment of DNA methylation. DNMT3L is a catalytically inactive paralog of DNMT3A and DNMT3B 237. While DNMT3L is not catalytically active, it is responsible for maternal imprinting, and has been shown to facilitate interactions of DNMT3A and DNMT3B with transcription factors 238,239.

In the past two decades, almost 80 studies have examined DNA methylation in IBD, and over 60 have been carried out in psoriasis. In contrast, there have been relatively few studies in AS examining DNA methylation with fewer than 10 published in the same period (outlined in Chapter 1 and 1). The cost of DNA methylation assays has led to overuse of cheaper monogenic approaches, and limited cohorts where more expensive multigene approaches, such as Beadchip arrays, have been used.

Additionally, DNA methylation studies in AS have failed to account for biological variables known to alter DNA methylation including age, smoking status and medications. No study has performed a robust comparison of HLA-B*27 positive and negative individuals, nor do many studies even control for HLA-B*27 status in healthy individuals. Whether HLA-B*27 status affects the DNA methylation of other AS-associated loci or genes is unknown. Overall, there is a need for more robustly and broadly designed studies that account for known differences in AS and healthy individual demographics, like HLA-B*27 prevalence, to fully examine the role of DNA methylation in AS.

Insights from genetically related diseases

Diseases genetically related to AS, such as IBD and psoriasis, have shown the benefits of examining cell types individually, whether these are blood-derived subsets or separate epithelial layers. A study in IBD examined DNA methylation changes in isolated immune cell subsets from blood (CD4+T-cells, CD8+T-cells and CD14+monocytes) 188. When methylation and expression data from these cell types and whole PBMCs from the same individuals were examined using PCA, all three cell types clustered separately from each other and from the PBMC samples. This study indicated that differences in DNA methylation due to cell type can be larger than those due to disease. Further, many of the DMPs identified in isolated cell types were only detected in a single cell type and were undetectable or diluted within whole blood. For example, the most significant DMP in CD14+monocytes, a CpG associated with HDAC4, was unidentifiable in whole blood and was only associated with changes in gene expression in CD14+monocytes. The cell type-specific signals were observed in both the methylation and expression data, revealing that adjusting for cell proportions in whole blood does not allow for interpretation of the role of individual cell types in disease. Sample heterogeneity has been cited in several DNA methylation studies in both IBD and psoriasis as a potential confounder and source of inter-study variability 240,241.

Measuring DNA methylation

The current gold standard for characterising DNA methylation is bisulfite conversion. This reaction converts unmethylated, but not methylated, cytosine to uracil, enabling the identification of methylated regions through sequencing 242. Both monogenic (e.g. PCR-based) and genome-wide (e.g. microarray) approaches utilise bisulfite conversion to identify whether a CpG is methylated. It should be noted that most of the genome is incapable of being methylated, and where methylation occurs a large proportion of sites have unknown function. Beadchip arrays are commonly used for this reason, as they provide high-throughput capability, relative economy per sample per CpG, reproducibility, and the inclusion of regions of known function and disease significance 243. The current version of the Beadchip array, the Illumina Human MethylationEPIC, covers over 850,000 CpG sites and incorporates a larger number of CpGs in the open sea, cytosine nucleotide guanine (CNG) and enhancers identified in the FANTOM5 study compared to the previous Illumina HumanMethylation450 Beadchip 244. The coverage of the array is approximately 3% of CpG sites in the genome; however, these represent more than 70% of the Refseq-identified TSS, 5’UTR and 3’UTR CpG sites 245. The array also introduces coverage of distal elements; however, these are not robustly covered (only 7% of distal and 27% of proximal ENCODE regulatory elements are represented), as these regions show poor predictability based on the DNA methylation of a single CpG.
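The bisulfite conversion principle described above can be illustrated with a minimal in-silico sketch. It assumes a single DNA strand, complete conversion, and that converted uracil is read as thymine after amplification; the sequence, the methylated position and the function names are hypothetical and are not taken from this thesis.

```python
def bisulfite_convert(seq, methylated):
    """Simulate bisulfite conversion plus amplification on one strand.

    Unmethylated C is converted (C -> U, read as T); methylated C is protected.
    `methylated` is a set of 0-based positions of methylated cytosines.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_cpg_methylation(reference, converted):
    """For every CpG in the reference, report whether the C survived conversion."""
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            calls[i] = converted[i] == "C"
    return calls


# Hypothetical example: three CpGs (positions 1, 5, 9), only the first is methylated.
reference = "ACGTTCGACCGA"
converted = bisulfite_convert(reference, methylated={1})
calls = call_cpg_methylation(reference, converted)
print(converted)
print(calls)
```

Comparing the converted read against the reference is what allows a retained cytosine to be interpreted as methylated; array probes exploit the same sequence difference rather than sequencing the read directly.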

Data from these arrays can be analysed as either positions (DMP) or regions (DMR). DMP analysis examines each CpG individually to determine if the site has altered methylation. As the site is measured across all cells sampled, methylation at each CpG site is constrained between 0 and 1, representing no methylation in any cells and complete methylation within all cells sampled, respectively. A shift in this value for a CpG represents a difference in the proportion of cells in which the position is methylated (%).
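As an illustration of how such a per-site value arises, the sketch below computes a beta value from methylated and unmethylated probe intensities using the commonly used form beta = M / (M + U + offset), and then the difference in mean beta between two groups. The offset of 100, the intensities and the group values are assumptions for illustration only, not data from this thesis.

```python
def beta_value(meth, unmeth, offset=100.0):
    """Beta value: fraction of signal from the methylated probe (0 to 1)."""
    return meth / (meth + unmeth + offset)


def delta_beta(case_betas, control_betas):
    """Difference in mean beta between cases and controls for one CpG."""
    mean_case = sum(case_betas) / len(case_betas)
    mean_control = sum(control_betas) / len(control_betas)
    return mean_case - mean_control


# Hypothetical intensities for a single CpG in one sample.
example_beta = beta_value(meth=9000, unmeth=900)

# Hypothetical betas for one CpG in two small groups.
shift = delta_beta([0.60, 0.70], [0.50, 0.60])
print(example_beta, shift)
```

Under this reading, a shift of 0.1 corresponds to the position being methylated in roughly 10% more of the sampled cells.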

Due to the large number of probes contained within the arrays, large sample sizes are necessary to achieve statistical significance after correction for multiple testing, such as the Benjamini-Hochberg false discovery rate (FDR) 246-248. Obtaining sufficiently large cohorts is challenging due to the cost and input material required for Beadchip array methods. The issue of how to balance practical limitations of sample size with statistical significance is an ongoing debate. Currently, the emphasis within DNA methylation studies seeking to identify pathways relevant to disease is shifting from the selection of relevant sites using a p-value cut-off to interpreting these values through the lens of biological relevance informed by statistical tests. One approach is to use the relative methylation levels of colocalised CpGs, which are usually coordinated, to identify DMR. These provide greater insight into the function of methylation changes within a region, particularly if these changes are coherent. Further, examining co-regulated data, including transcription and chromatin accessibility, can provide biological context to DNA methylation.
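The idea of grouping colocalised, coherently changing CpGs into a region can be sketched as follows. This is a deliberately simple illustration and not the DMR method used in this thesis: the gap, minimum site count and minimum effect thresholds, and the example sites, are all hypothetical.

```python
def find_dmrs(sites, max_gap=500, min_sites=3, min_delta=0.02):
    """Group neighbouring CpGs that change in the same direction.

    sites: iterable of (chromosome, position, delta_beta).
    Returns a list of (chromosome, start, end, n_sites, mean_delta_beta).
    """
    regions = []
    current = []

    def close_region():
        # Keep the run only if it contains enough coherent CpGs.
        if len(current) >= min_sites:
            mean_delta = sum(d for _, _, d in current) / len(current)
            regions.append((current[0][0], current[0][1], current[-1][1],
                            len(current), mean_delta))
        current.clear()

    for chrom, pos, delta in sorted(sites):
        if abs(delta) < min_delta:
            close_region()  # a CpG with negligible change breaks the run
            continue
        if current:
            prev_chrom, prev_pos, prev_delta = current[-1]
            extends = (chrom == prev_chrom
                       and pos - prev_pos <= max_gap
                       and (delta > 0) == (prev_delta > 0))
            if not extends:
                close_region()
        current.append((chrom, pos, delta))
    close_region()
    return regions


# Hypothetical CpGs: one hypermethylated run on chr1, one hypomethylated run on chr2.
example_sites = [
    ("chr1", 100, 0.05), ("chr1", 250, 0.06), ("chr1", 400, 0.04),
    ("chr1", 5000, 0.05), ("chr1", 5100, -0.05),
    ("chr2", 100, -0.03), ("chr2", 200, -0.04), ("chr2", 350, -0.05),
]
dmrs = find_dmrs(example_sites)
for region in dmrs:
    print(region)
```

Requiring several adjacent sites to move in the same direction is what gives a region-level call more biological weight than any single small shift, which is the rationale outlined in the paragraph above.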

Chapter Outline

The third chapter of this thesis sought to address the gap in knowledge of the status of DNA methylation in AS by examining changes in DNA methylation in individual immune cell subsets (CD4+T-cells, CD8+T-cells, γδ T-cells, NK cells and CD14+monocytes) between AS and healthy individuals across more than 850,000 CpG sites using the Illumina Infinium MethylationEPIC Beadchip array.

Careful consideration was taken where practicable to control for factors known to affect DNA methylation, and well-established methodologies were implemented throughout both measurement and analysis of DNA methylation. DNA methylation data was initially used to assess DMP and DMR in each cell type between individuals with AS and healthy individuals. Identified DMP were used to interrogate the GO and KEGG pathways for the genes with which these CpGs are associated.

I hypothesised three outcomes:

1. Cell type specific changes in DNA methylation occur in AS.

2. These changes are coordinated as regions (DMR).

3. Changes in DNA methylation occur in specific biological pathways related to immune function.

Results

Quality assessment of DNA methylation data

Of the 500 samples extracted for use in DNA methylation studies from FACS isolated cell subsets, 420 samples had sufficient DNA for use with the Illumina Infinium MethylationEPIC Beadchip array. Samples with insufficient material were mostly γδ T-cells and NK cells, due to their low frequency in circulation. Low cell numbers were related to cell type rather than disease status (AS diagnosed individual or healthy control), as an equal number of AS and healthy individual samples were obtained (Table 3.1). Two samples were removed due to failure of bisulfite conversion and probe binding, respectively. The remaining samples were all used in the following statistical analysis.

Table 3.1 Sample numbers for each cell type for AS and healthy individuals (HC).

Group | CD4+ T-cells | CD8+ T-cells | γδ T-cells | NK cells | CD14+ Monocytes | All subsets
AS | 50 | 48 | 21 | 42 | 49 | 19
HC | 50 | 49 | 18 | 43 | 50 | 15
TOTAL | 100 | 97 | 39 | 85 | 99 |

Initial examination of sample clustering using PCA on normalised data showed that samples segregated by cell type across PC 1 and 2 (Figure 3.1). This clustering was greater than any differences based on disease or technical variables (data not shown). The three T-cell populations clustered closely together, and the CD14+monocytes segregated completely from the other cell types. The separation observed is concordant with the divergence in lineage between these cell types, as CD14+monocytes originate from a myeloid progenitor, and the T-cell and NK cell subsets originate from a lymphoid progenitor.

Figure 3.1. PCA plot of all samples labelled by cell type in PC1 and 2. PCA condenses large amounts of measurements into its primary components and through this can be used to identify which factors are affecting the overall methylation pattern. The samples clearly segregated along cell lineage, with CD14+monocytes from a myeloid progenitor and NK cells and T-cells from a lymphoid progenitor. Cell type was a much greater influence on methylation overall than disease.

Cell types were examined separately for technical and biological factors influencing sample variability to avoid conflating variation due to cell type with technical differences. Principal components were calculated for each cell type and used to test which factors drive variability within each cell subset. Each variable was tested in a linear model for how significantly that factor influences the variation captured within each PC (1 to 20). For ease of interpretation, these results can be visualised as a heat scree plot (the results for CD4+T-cells are shown in Figure 3.2). Squares are coloured by p-value, with darker squares indicating greater influence on sample variation within the PC. In the CD4+T-cell example, strong influences of age and smoking status can be observed within PC1, and PC2 is influenced by plate, slide, disease status, sex and age. This method can be used to determine which technical variables should be adjusted for, and which biological variables should be included as covariates in analysis.
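The table of values behind a heat scree plot can be sketched as one p-value per covariate per PC. The sketch below assumes the PC scores have already been computed, and it swaps the linear-model test described above for a permutation test on the squared correlation so that it needs only the standard library; the PC scores, covariates and function names are hypothetical.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between a covariate and PC scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def permutation_pvalue(covariate, pc_scores, n_perm=2000, seed=0):
    """P-value for the association, from shuffling PC scores across samples."""
    rng = random.Random(seed)
    observed = r_squared(covariate, pc_scores)
    shuffled = list(pc_scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if r_squared(covariate, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def heat_scree_table(pcs, covariates):
    """One row per covariate, one p-value per PC (the colours of the plot)."""
    return {name: [permutation_pvalue(values, pc) for pc in pcs]
            for name, values in covariates.items()}


# Hypothetical scores for 8 samples: PC1 tracks age, PC2 tracks smoking status.
pc1 = [-3.1, -2.2, -1.4, -0.3, 0.6, 1.5, 2.1, 2.8]
pc2 = [1.0, -1.2, 0.9, -0.8, 1.1, -1.0, 1.2, -0.9]
covariates = {
    "age": [25, 31, 38, 44, 52, 59, 63, 70],
    "smoker": [0, 1, 0, 1, 0, 1, 0, 1],
}
table = heat_scree_table([pc1, pc2], covariates)
for name, pvalues in table.items():
    print(name, pvalues)
```

Covariates with small p-values in the leading PCs are the candidates for adjustment (technical variables) or inclusion as covariates (biological variables).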

Figure 3.2 Heat scree plot visualising each factor's significance in explaining the variation within each PC. The values shown here are for CD4+T-cells before and after adjustment for slide using ComBat.

Both plate and slide were significant variables in the first 10 PCs across all cell types except γδ T-cells, where only plate was significant (p-value <0.05). Sample plates consist of 12 individual slides, and each slide is run with 8 samples. During visualisation, a single slide of the 54 slides processed was observed to cluster separately (not shown). The slide consisted of four CD8+T-cell samples, two CD4+T-cell samples and two CD14+monocyte samples. When tested, adjustment for slide alone using surrogate variable analysis (SVA) removed variation due to both slide and plate for all cell types except for γδ T-cells (data not shown). Adjustment for slide was carried out for CD4+T-cells, CD8+T-cells, NK cells and CD14+monocytes. NK cells and CD14+monocytes were also adjusted for row, and γδ T-cells were adjusted for plate.

Adjustments did not affect significance of disease status, nor did they introduce random sources of variability (tested using dummy variables). Biological variables that remained significant in the first 10 PCs post adjustment were included in further analysis as covariates (Figure 3.2). Age, sex and smoking status, which were mismatched between the AS and healthy individual cohorts, were selected as covariates (outlined in Chapter 2, Table 2.1).

After adjustment for technical variables, PCA plots for each cell type coloured by disease status (AS diagnosed individual or healthy control) show some spread between AS individuals and healthy individuals. The lack of separation of the samples within PC 1 and 2 likely reflects the large number of CpGs tested that do not differ in AS individuals (Figure 3.3). Due to the sheer number of methylation sites tested using the Illumina MethylationEPIC, many CpGs that are not associated with disease status will be captured. As PCA is a technique that uses all the CpG data available, it is unlikely to show distinct differences.

Figure 3.3 PCA plots of each cell type coloured by disease status (AS or healthy individual). A. CD4+T-cells B. CD8+T-cells C. CD14+monocytes D. γδ T-cells E. NK cells. PCA plots for each cell type with healthy individuals (grey) and AS individuals (coloured).

Differential methylation analysis

DMPs between AS individuals and healthy controls were identified using linear modelling with the covariates of age, sex and smoking status. After adjustment for multiple testing using the Benjamini-Hochberg method, none of the DMP remained significant (adjusted p-value <0.05) (Table 3.2). This is likely due to the large number of tests carried out on the Illumina MethylationEPIC data (>800,000) and the relatively small sample sizes of the cell types tested (<100 individuals). Overall, the CpGs identified as nominally differentially methylated between AS and healthy individuals had small changes (<20% change in beta values). Although there is some overlap in the DMPs identified in each cell type, different DMP were identified as the most significant in each cell type (full list in Appendix F(1)).
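The per-CpG model described above can be sketched as an ordinary least squares fit of methylation on disease status with age, sex and smoking status as covariates. This is plain OLS solved through the normal equations, not the software or moderated statistics used in this thesis, and the eight samples and effect sizes are invented so that the true disease effect (0.05) is known.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution


def fit_linear_model(design, response):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, response))
           for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical samples: (disease status, age, sex, smoker) for one CpG.
samples = [(1, 30, 0, 0), (1, 45, 1, 1), (1, 52, 0, 1), (1, 38, 1, 0),
           (0, 29, 1, 0), (0, 47, 0, 1), (0, 55, 1, 1), (0, 36, 0, 0)]
design = [[1.0, d, age, sex, smoker] for d, age, sex, smoker in samples]
# Simulated methylation values with a known disease effect of 0.05.
methylation = [0.40 + 0.05 * d + 0.002 * age - 0.01 * sex + 0.03 * smoker
               for d, age, sex, smoker in samples]

coefficients = fit_linear_model(design, methylation)
disease_effect = coefficients[1]
print(round(disease_effect, 6))
```

The coefficient on disease status is the covariate-adjusted difference between AS and healthy individuals; repeating the fit for every probe yields the list of p-values that is then corrected for multiple testing.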

Table 3.2 Number of probes within each cell type below various significance cut-offs.

Cell Type | DMP (P-value<0.05) | DMP (P-value<0.01) | DMP (adj.p-value<0.05) | Minimum adj.p-value
CD4+T-cells | 50441 | 12291 | 0 | 0.199
CD8+T-cells | 55353 | 12650 | 0 | 0.188
γδ T-cells | 26489 | 4457 | 0 | 0.999
NK cells | 38891 | 6713 | 0 | 0.376
CD14+monocytes | 29271 | 5198 | 0 | 0.988

Multiple testing adjustment using the conservative asymptotic FDR (Benjamini-Hochberg) was untenable due to the sheer number of tests needed to adjust for (>800,000). This approach uses an approximation to the true distribution. An alternative approach is permuting the distribution to determine FDR.
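For reference, the Benjamini-Hochberg adjustment itself is short enough to sketch. Each p-value is scaled by the number of tests divided by its rank, and the result is made monotone from the largest p-value downwards; the four p-values below are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical p-values from four tests.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print([round(p, 4) for p in adjusted])
```

Because every p-value is multiplied by the number of tests over its rank, the same raw p-value yields a much larger adjusted value when the number of tests is in the hundreds of thousands.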

Permuted FDR was determined by permutation of the linear modelling analysis, randomising disease status and calculating association statistics at several p-value cut-offs for 1000 iterations. The resulting FDR cut-offs used were based on balancing the potential number of false positives with the number of DMP available. The CD14+monocytes and γδ T-cells had high numbers of false associations. The permuted FDR calculated from the permutation analysis at a p-value cut-off of 0.01 (10^-2) was 0.21 for CD4+T-cells, 0.16 for CD8+T-cells and 0.48 for NK cells (Table 3.3). At this p-value cut-off the permuted FDRs were substantial for CD14+monocytes and γδ T-cells, at 1.05 and 1.15, respectively. At no p-value cut-off do the γδ T-cell DMPs have a greater number of true positives than false positives; therefore, at no point can the null hypothesis be rejected for this cell type. Optimal permuted FDRs were obtained at a p-value cut-off of 10^-3 for CD4+T-cells and CD8+T-cells, and 10^-4 for CD14+monocytes and NK cells.
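One common way to compute a permuted FDR is as the ratio of the average number of hits in label-shuffled data to the number of hits in the observed data, and the sketch below assumes that definition. For brevity it uses an absolute difference in group means with a fixed threshold in place of the linear-model p-value cut-off used in this thesis; the two CpGs, the labels and the threshold are hypothetical.

```python
import random


def mean_difference(values, labels):
    """Absolute difference in mean methylation between cases (1) and controls (0)."""
    cases = [v for v, label in zip(values, labels) if label == 1]
    controls = [v for v, label in zip(values, labels) if label == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def count_hits(betas, labels, threshold):
    """Number of CpGs whose group difference reaches the threshold."""
    return sum(1 for site in betas if mean_difference(site, labels) >= threshold)


def permuted_fdr(betas, labels, threshold, n_perm=1000, seed=0):
    """Mean number of hits under shuffled disease status / observed hits."""
    rng = random.Random(seed)
    observed = count_hits(betas, labels, threshold)
    if observed == 0:
        return float("nan")
    shuffled = list(labels)
    null_total = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomise disease status, keep group sizes
        null_total += count_hits(betas, shuffled, threshold)
    return (null_total / n_perm) / observed


# Hypothetical data: one CpG with a real group difference, one without.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
betas = [
    [0.80, 0.82, 0.79, 0.81, 0.20, 0.21, 0.19, 0.22],
    [0.50, 0.52, 0.49, 0.51, 0.50, 0.51, 0.49, 0.52],
]
fdr = permuted_fdr(betas, labels, threshold=0.5)
print(fdr)
```

If the values in Table 3.3 were computed as this kind of ratio (an assumption), the 0.21 reported for CD4+T-cells at 10^-2 would correspond to roughly 0.21 x 12291 ≈ 2581 DMP expected by chance, and values above 1 would mean that the shuffled data produced more hits on average than the observed data.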

Table 3.3. Number of DMP and permuted FDR for each cell type at various p-value cut-offs.

Cell Type | DMP (10^-2) | FDR (10^-2) | DMP (10^-3) | FDR (10^-3) | DMP (10^-4) | FDR (10^-4) | DMP (10^-5) | FDR (10^-5)
CD4+T-cells | 12291 | 0.21 | 1633 | 0.11 | 206 | 0.08 | 26 | 0.05
CD8+T-cells | 12650 | 0.16 | 1679 | 0.09 | 219 | 0.05 | 33 | 0.03
CD14+monocytes | 5198 | 1.05 | 471 | 0.82 | 43 | 0.64 | 3 | 0.81
γδ T-cells | 4457 | 1.15 | 321 | 1.11 | 27 | 0.98 | 3 | 0.79
NK cells | 6713 | 0.48 | 591 | 0.40 | 59 | 0.29 | 6 | 0.22

CD4+T-cells

The 20 DMPs with the lowest p-values identified in CD4+T-cells were associated with genes involved in immune signalling activation (TOLLIP, IL21R, NFATC2, RP5-1043L13.1) and regulation (SOCS3, SBNO2, NRG1, PATZ1, CCDC88C), cellular migration and adhesion (IFT140, CLEC3A, LIMS2), alongside cellular metabolism genes (PRKAG2, SUV420H1, ZZEF1, ZSWIM7) (Table 3.4).

Figure 3.4. Boxplot of the most significant DMP identified for CD4+T-cells (cg03740323). Significantly increased in CD4+T-cell results (left), but not in the other four cell types (right). AS individuals are indicated by the darker colours. * p-value < 0.05

The most significant DMP for CD4+T-cells was associated with PRKAG2, a subunit of AMPK, which regulates cellular metabolism through the regulation of glycogen storage 249. Methylation at this DMP was increased in AS individuals compared to healthy controls (Figure 3.4). Cg03740323 is associated with the body or 3’UTR of PRKAG2, depending on the annotation used; it is predicted that this change in methylation is discordant with any change in PRKAG2 mRNA expression. Overall, the absolute difference in average beta between AS and healthy individuals for each CpG was small (<10%) (Figure 3.5). Several DMP with unknown gene associations are in DNase hypersensitivity regions, or open chromatin, indicating transcriptionally active regions.

Figure 3.5 Volcano plot of CD4+T-cell DMPs. The 10 most significant DMPs associated with CD4+T-cells are labelled. The average Beta difference in the CpGs tested had a small range predominantly being less than 10% (0.1). The black section indicates the samples below a 0.05 p-value cut-off. ND: Not determined.

Table 3.4 CD4+T-cell 20 most significant differentially methylated positions. UCSC: RefGene. Prom. Assoc.: Promoter Associated. DHS: DNase hypersensitivity site. OCR: Open chromatin region. TFBS: transcription factor binding site. T: True

Name | Chr. | Pos. | Region | UCSC gene (group) | Gencode gene (group) | Prom. Assoc./DHS/OCR/TFBS | logFC | p-value | adj. p-value | Associated gene function
cg03740323 | 7 | 151548248 | Open sea | PRKAG2 (Body) | PRKAG2 (3'UTR) | T | 0.274 | 3.52x10^-7 | 0.199 | AMPK subunit, regulates cellular metabolism
cg17166288 | 11 | 67979344 | North Shore | SUV420H1 (5'UTR) | SUV420H1 (5'UTR) | T T | -0.306 | 4.96x10^-7 | 0.199 | Genomic instability.
cg14470121 | 16 | 1587684 | Island | IFT140 (Body); TMEM204 (Body) | IFT140 (3'UTR) | T | -0.320 | 8.93x10^-7 | 0.218 | IFT140: Cilia formation and osteogenesis. TMEM204: Cell adhesion and cellular permeability at adherens junctions
cg23998478 | 2 | 121269292 | Open sea | | | T | 0.218 | 1.09x10^-6 | 0.218 | Unknown
cg16646614 | 8 | 32489382 | Open sea | NRG1 (Body) | NRG1 (5'UTR) | T | -0.472 | 1.79x10^-6 | 0.265 | Stimulates regulatory phenotype in blood and spinal cord
cg11095027 | 11 | 1297066 | North Shore | TOLLIP (3'UTR) | TOLLIP (3'UTR) | T | 0.726 | 1.98x10^-6 | 0.265 | IL1 and TLR signalling pathway. Inhibits cell activation by microbial products.
cg04343361 | 20 | 58663316 | South Shore | | RP5-1043L13.1 (TSS1500; 5'UTR) | T | -0.216 | 3.46x10^-6 | 0.277 | Potentially involved in IFN-α production. 250
cg03901846 | 17 | 15884729 | Open sea | ZSWIM7 (Body) | ZSWIM7 (TSS1500; 3'UTR) | T | -0.270 | 3.72x10^-6 | 0.277 | Homologous recombination repair
cg20732539 | 16 | 27416077 | Open sea | IL21R (5'UTR; Body) | IL21R (5'UTR) | T T | -0.315 | 3.86x10^-6 | 0.277 | Antigen transport by DCs. Control of beacon function of CD4+T-cells. T-cell activation.
cg01491358 | 17 | 4034650 | Open sea | ZZEF1 (Body) | ZZEF1 (3'UTR) | T | -0.428 | 4.50x10^-6 | 0.277 | Calcium ion binding.
cg10804816 | 10 | 73625768 | Open sea | | | T | -0.242 | 5.23x10^-6 | 0.277 | Unknown
cg02135151 | 22 | 31739715 | North Shore | PATZ1 (Body) | PATZ1 (TSS1500; TSS200) | T T | -0.192 | 5.67x10^-6 | 0.277 | Cell survival/apoptosis via p53 signalling pathway
cg25334934 | 2 | 121269348 | Open sea | | | T | 0.346 | 5.69x10^-6 | 0.277 | Unknown
cg15577767 | 15 | 101212275 | Open sea | | | T | 0.280 | 6.30x10^-6 | 0.277 | Unknown
cg21048669 | 16 | 78056379 | Open sea | CLEC3A (TSS200) | CLEC3A (TSS200); RP11-281J9.2 (3'UTR) | T | 0.377 | 6.40x10^-6 | 0.277 | Extracellular matrix involved in osteoarthritis
cg18181703 | 17 | 76354621 | North Shore | SOCS3 (Body) | | T T T | -0.344 | 6.42x10^-6 | 0.277 | Cytokine suppression. Increased in AS.
cg12170787 | 19 | 1130965 | Open sea | SBNO2 (Body) | | T T | -0.501 | 6.44x10^-6 | 0.277 | IL-10 anti-inflammatory pathway via STAT3. Osteoclast fusion.
cg10661054 | 2 | 128422179 | Island | LIMS2 (Body; TSS200; 5'UTR) | LIMS2 (TSS200; 5'UTR) | T | 0.152 | 6.64x10^-6 | 0.277 | Cell migration
cg04307107 | 20 | 50180603 | South Shore | NFATC2 (TSS1500) | NFATC2 (TSS1500) | T | 0.326 | 7.24x10^-6 | 0.277 | Regulates CD4+T-cell and T helper cell differentiation.
cg10564112 | 14 | 91842149 | Open sea | CCDC88C (Body) | CCDC88C (3'UTR) | T | -0.258 | 7.90x10^-6 | 0.277 | Wnt signalling pathway

CD8+T-cells

CD8+T-cell DMP were associated with immune signalling (PDE4A, CLEC4D, SKI, TOX, TESPA1) (Table 3.5). The most significant DMP in CD8+T-cells, cg09997271, was associated with PDE4A, which regulates cyclic-AMP signalling and is associated with osteoclast function and inflammation 251,252. PDE4A is highly expressed in ulcerative colitis colon, and in a colitis mouse model reduction of PDE4A was associated with reduction in colitis 253. Cg09997271 was significant in both CD4+ and CD8+T-cells but not γδ T-cells, NK cells or CD14+monocytes (Figure 3.6). PDE4 gene family members, PDE4A to PDE4D, all lead to increased expression of inflammatory cytokines including TNF-α 254. However, apremilast, a PDE4 inhibitor used in plaque psoriasis, failed to achieve a higher ASAS20 response than placebo in AS (32.5% compared to 36.6% for placebo at week 16) 255.

Figure 3.6 CD8+T-cell DMP cg09997271 had the lowest p-value. Cg09997271, associated with PDE4A, had increased methylation in CD4+T-cells and CD8+T-cells of AS individuals. *p-value <0.05

T-cell selection and signalling genes identified as associated with DMPs included TESPA1 (thymic-specific TCR signalling regulator), TOX (regulator of CD8+T-cell exhaustion) and SKI (suppression of T-cell polarisation). The CpGs in the TOX body and 5’UTR had increased methylation levels in AS individuals, indicating potential increases in gene expression. The CpGs associated with the TESPA1 5’UTR (cg03485126) and SKI body (cg08884752) had decreased methylation levels in AS individuals, indicating potential reductions in gene expression (Figure 3.7).

Figure 3.7 Volcano plot of CD8+T-cell DMPs. The 10 most significant DMP are labelled. The average Beta difference in the CpGs tested had a small range predominantly being less than 10% (0.1). The black section indicates samples with p-value < 0.05.

Table 3.5 CD8+T-cell 20 most significant DMP. UCSC: RefGene. Prom. Assoc.: Promoter Associated. DHS: DNase hypersensitivity site. OCR: Open chromatin region. TFBS: transcription factor binding site. T: True

Name | Chr. | Pos. | Region | UCSC gene (group) | Gencode gene (group) | Prom. Assoc./DHS/OCR/TFBS | logFC | p-value | adj. p-value | Associated gene function
cg09997271 | 19 | 10542200 | North Shore | PDE4A (Body; TSS1500) | PDE4A (TSS1500; 5'UTR) | T | 0.477 | 3.40x10^-7 | 0.188 | Regulates cAMP pathways; osteoclast formation and inflammation
cg10642636 | 18 | 33275580 | Open sea | GALNT1 (Body) | | | -0.397 | 1.10x10^-6 | 0.188 | Extracellular matrix formation/remodelling, regulates cell proliferation
cg10133876 | 21 | 36041178 | North Shore | CLIC6 (TSS1500) | CLIC6 (TSS1500) | T | 0.342 | 1.15x10^-6 | 0.188 | Dopamine receptor signalling pathway
cg14108567 | 12 | 8717487 | Open sea | | RP11-561P12.5 (3'UTR) | T | -0.478 | 1.33x10^-6 | 0.188 | Differentially expressed in AS. Located near CLEC4D/CLEC4E 209
cg08884752 | 1 | 2162001 | South Shore | SKI (Body) | | T | -0.499 | 1.54x10^-6 | 0.188 | Negative regulators of TGF-β signalling, suppression of T-cell polarisation
cg19374961 | 8 | 60019164 | Open sea | TOX (Body) | TOX (5'UTR) | T | 0.536 | 1.56x10^-6 | 0.188 | Central regulator of CD8+T-cell exhaustion in mice
cg12159502 | 10 | 69643547 | North Shore | SIRT1 (TSS1500) | SIRT1 (TSS1500) | T | 0.422 | 1.87x10^-6 | 0.188 | Epigenetic silencing
cg04561349 | 3 | 23851358 | North Shore | UBE2E1 (TSS1500; Body) | UBE2E1 (TSS1500; 5'UTR) | T | -0.532 | 2.08x10^-6 | 0.188 | Ubiquitination
cg19499831 | 8 | 59967766 | Open sea | TOX (Body) | TOX (5'UTR) | T | 0.397 | 2.15x10^-6 | 0.188 | As above
cg11077179 | 7 | 7559712 | Open sea | COL28A1 (Body) | COL28A1 (TSS200) | | -0.543 | 2.68x10^-6 | 0.188 | Extracellular matrix formation
cg03485126 | 12 | 55373535 | Open sea | TESPA1 (5'UTR) | TESPA1 (5'UTR) | | -0.864 | 2.69x10^-6 | 0.188 | Regulates thymocyte development through regulating TCR-mediated signalling
cg18339422 | 16 | 20817344 | North Shore | ERI2 (5'UTR; 1stExon); REXO5 (TSS1500) | ERI2 (Body; 3'UTR); REXO5 (TSS1500) | T | 0.259 | 2.81x10^-6 | 0.188 | RNA exonuclease NEF-sp
cg09549406 | 8 | 10331768 | Open sea | LINCR-0001 (TSS1500) | RP11-981G7.2 (TSS1500) | T | 0.734 | 3.54x10^-6 | 0.213 | Unknown
cg03385918 | 21 | 44862425 | North Shore | | | T T | -0.242 | 3.74x10^-6 | 0.213 | Unknown
cg18914023 | 14 | 74868443 | Open sea | | | T | 0.275 | 3.99x10^-6 | 0.213 | Unknown
cg18833933 | 12 | 133185615 | North Shore | LRCOL1 (3'UTR) | | T | 0.236 | 4.64x10^-6 | 0.214 | Unknown
cg14197801 | 19 | 2256492 | South Shelf | JSRP1 (TSS200) | JSRP1 (TSS200) | T | 0.241 | 4.92x10^-6 | 0.214 | Skeletal muscle excitation/contraction
cg13378297 | 2 | 128398094 | Open sea | LIMS2 (Body) | LIMS2 (3'UTR; TSS200) | T T | -0.286 | 5.00x10^-6 | 0.214 | Cell migration
cg15153141 | 5 | 81470056 | Open sea | ATG10 (Body) | ATG10 (5'UTR) | T | 0.386 | 5.07x10^-6 | 0.214 | Autophagy
cg08203715 | 11 | 126226154 | South Shore | ST3GAL4 (1stExon; 5'UTR) | ST3GAL4 (5'UTR; 1stExon) | T T | 0.274 | 5.34x10^-6 | 0.214 | Glycosyltransferase

CD14+monocytes

Fewer DMPs were identified in CD14+monocytes than in CD4+T-cells and CD8+T-cells (Figure 3.8). The DMP identified were largely associated with translocation/vesicle transport (SYT2, ANK3, SLC6A3, IPO5, TMEM200C) and cellular migration (RAB5B, GMFG, ANKMY1) (Table 3.6). Cg01245135, the most significant annotated DMP in CD14+monocytes, is associated with TCF4, a transcription factor positively regulated by Wnt. Cg01245135 had decreased methylation in AS individuals, yet due to the broad annotation (5’UTR/TSS1500/Body) it is difficult to predict how this might affect TCF4 expression.

Several DMP were located within transcriptional start sites, indicating that DNA methylation at these sites may be correlated with gene expression; these DMP were associated with SYT2, ANK3, IPO5, RAB5B, GMFG and ANKMY1. The CpGs associated with ANK3, a membrane-cytoskeleton linker, and GMFG, a negative modulator of TLR4 signalling and monocyte migration, both have decreased methylation in AS individuals. The remaining CpGs in the TSS (SYT2, IPO5, RAB5B, ANKMY1) all have increased methylation. Potentially, these changes indicate increased expression of SYT2, IPO5, RAB5B, ANKMY1 and TCF4 alongside decreased expression of ANK3 and GMFG, a pattern similar to the differences in expression of these genes between classical and non-classical monocytes 256.

Figure 3.8 Volcano plot of CD14+monocyte DMP

The 10 DMP with the lowest p-value associated with CD14+monocytes are labelled. The black section indicates the samples below a 0.05 p-value cut-off. ND: Not Determined.

Table 3.6 CD14+monocyte 20 most significant DMP. UCSC: RefGene. P5E: Phantom5 Enhancer. Prom. Assoc.: Promoter Associated. DHS: DNase hypersensitivity site. OCR: Open chromatin region. TFBS: transcription factor binding site. T: True

Name | Chr. | Pos. | Region | UCSC gene (group) | Gencode gene (group) | Prom. Assoc./DHS/OCR/TFBS | logFC | p-value | adj. p-value | Associated gene function
cg07302051 | 16 | 51671357 | Open sea | | | T | 0.536 | 5.78x10^-6 | 0.988 | Unknown
cg01245135 | 18 | 53253483 | North Shelf | TCF4 (5'UTR; TSS200; Body) | TCF4 (5'UTR; TSS1500; TSS200) | T | -0.525 | 9.50x10^-6 | 0.988 | Positively regulated by WNT1
cg24506936 | 8 | 143535891 | Island | BAI1 (5'UTR) | | T | 0.210 | 9.89x10^-6 | 0.988 | Mediates microbicidal activity.
cg11785154 | 1 | 202679993 | South Shore | SYT2 (TSS1500) | SYT2 (TSS1500) | T | 0.217 | 1.61x10^-5 | 0.988 | DC formation, vesicle transport, morphogenesis
cg02915939 | 6 | 30711022 | Island | IER3 (3'UTR); FLOT1 (TSS1500) | FLOT1 (TSS1500); IER3 (3'UTR; 1stExon) | T T | -0.217 | 1.80x10^-5 | 0.988 | IER3: NF-kB antagonist. FLOT1: epithelial mesenchymal transition, via ERK/AKT signalling pathway
cg26055950 | 1 | 3600852 | Open sea | TP73 (Body) | | T | 0.207 | 1.89x10^-5 | 0.988 | Correlated with Wnt/β-Catenin pathway
cg07592361 | 2 | 241496830 | Island | ANKMY1 (5'UTR) | ANKMY1 (5'UTR; TSS200; 3'UTR) | T T | 0.364 | 2.19x10^-5 | 0.988 | Regulate cytoskeleton organization
cg22892437 | 10 | 100996747 | South Shelf | HPSE2 (TSS1500) | HPSE2 (TSS1500) | T | 0.387 | 2.25x10^-5 | 0.988 | Associated with WBC content in blood.
cg18193195 | 12 | 56384508 | Open sea | RAB5B (Body) | RAB5B (TSS1500; 5'UTR; 1stExon) | | 0.390 | 2.42x10^-5 | 0.988 | Bone marrow homing, IFN-γ induced.
cg01021234 | 10 | 62060351 | Open sea | ANK3 (Body) | ANK3 (TSS200; 5'UTR) | T | -0.521 | 2.53x10^-5 | 0.988 | Link membrane proteins to actin cytoskeleton. Golgi transport.
cg23329752 | 11 | 14542293 | South Shore | PSMA1 (TSS1500; Body) | PSMA1 (TSS1500; TSS200); RP11-140L24.4 (TSS1500) | T T | -0.427 | 2.71x10^-5 | 0.988 | NOTCH3 binds.
cg13557594 | 5 | 1395203 | Open sea | SLC6A3 (Body) | | T | 0.294 | 2.84x10^-5 | 0.988 | Dopamine active transporter.
cg14159342 | 1 | 3624292 | South Shore | TP73 (Body) | | T T | 0.350 | 2.85x10^-5 | 0.988 | Unknown
cg24652620 | 13 | 98628558 | Island | IPO5 (Body) | IPO5 (TSS1500; 5'UTR; TSS200) | T T | 0.337 | 3.21x10^-5 | 0.988 | Importin gene
cg04032871 | 19 | 46195965 | Island | SNRPD2 (TSS1500); QPCTL (1stExon) | SNRPD2 (TSS1500); QPCTL (1stExon) | T T T | -0.307 | 3.48x10^-5 | 0.988 | QPCTL: modifier of myeloid immune checkpoint. SNRPD2: RNA splicing and cell cycle.
cg04169263 | 16 | 1756061 | Open sea | MAPK8IP3 (TSS200) | MAPK8IP3 (TSS1500; TSS200) | T T | -0.299 | 3.65x10^-5 | 0.988 | MAPK and TLR pathways
cg18139195 | 18 | 5890496 | Island | TMEM200C (1stExon) | TMEM200C (1stExon) | T | 0.225 | 4.07x10^-5 | 0.988 | Transmembrane protein
cg26963093 | 2 | 88469730 | Island | THNSL2 (TSS200) | THNSL2 (TSS1500; TSS200) | T | 0.582 | 4.23x10^-5 | 0.988 | Bimodally expressed in human skeletal muscle.
cg03097541 | 16 | 3185250 | Island | ZNF213 (5'UTR; 1stExon) | ZNF213 (5'UTR; 1stExon); RP11-473M20.14 (TSS1500) | T T | -0.286 | 4.28x10^-5 | 0.988 | Potential transcription factor
cg27360210 | 19 | 39826923 | Open sea | GMFG (TSS200) | GMFG (TSS200) | T T T | -0.218 | 4.46x10^-5 | 0.988 | Pseudopodia component in T-cells. Necessary for monocyte migration.

γδ T-cells

DMP identified in γδ T-cell samples were related to genes involved in T-cell activation (FAM49B, MLST8, PREX1, ADAM12, ERI1, ACKR2, EPS15L1) and adhesion/cellular morphology (SDK1, CCDC69, CYTH1) (Figure 3.9). Cg16871855, the strongest disease-associated DMP in γδ T-cells, is associated with FAM49B, a key regulator of actin dynamics and T-cell activation through the repression of Rac activity. Cg16871855 lies in the FAM49B body/3’UTR region and has decreased methylation in AS individuals (~5% reduction in average Beta value). This may indicate an increase in FAM49B expression in AS individuals, assuming a negative correlation between DNA methylation and gene expression. The majority of the 10 most significant DMP show decreased methylation in AS individuals and lie within the body of their associated genes, indicating potential upregulation of gene expression (Table 3.7).

Two DMP, cg06301412 (AADACL2-AS1) and cg03781837 (ACKR2), are also associated with psoriasis. ACKR2 suppresses Th17 responses and IFN-γ producing γδ T-cells, and the increased methylation in the body of ACKR2 suggests potential decreased expression in AS individuals.

Figure 3.9 Volcano plot of γδ T-cell differentially methylated positions. The 10 most significant DMP associated with γδ T-cell are labelled and samples coloured in black have p-value < 0.05. A larger average beta difference was observed between the γδ T-cell samples, with several greater than 0.1 (10%). These were statistically not significant, which may be due to higher sample variability rather than a larger difference between AS and healthy individuals. ND: Not Determined.



Table 3.7. γδ T-cells 20 most significant differentially methylated positions. UCSC: RefGene. Prom. Assoc.: Promoter Associated. DHS: DNase hypersensitivity site. OCR: Open chromatin region. TFBS: transcription factor binding site. T: True

Name | Chr. | Pos. | Region | UCSC gene (group) | Gencode gene (group) | Prom. Assoc./DHS/OCR/TFBS | logFC | p-value | adj. p-value | Associated gene function
cg16871855 | 8 | 130970618 | Open sea | FAM49B (Body) | FAM49B (3'UTR; 5'UTR) | T | -1.433 | 4.87x10^-6 | 0.999 | Regulates mitochondrial function and T-cell activation
cg11395975 | 16 | 2255950 | South Shore | MLST8 (5'UTR) | MLST8 (5'UTR; TSS1500; 1stExon; TSS200); AC009065.3 (TSS1500) | T T | 0.719 | 6.00x10^-6 | 0.999 | Subunit of mTORC1, a serine/threonine kinase
cg08105834 | 7 | 4267372 | Open sea | SDK1 (Body) | SDK1 (5'UTR) | | 0.661 | 7.13x10^-6 | 0.999 | Member of Ig superfamily, binding site adhesion molecule
cg06301412 | 3 | 151618403 | Open sea | AADACL2-AS1 (Body) | RP11-454C18.2 (3'UTR) | T | -1.050 | 1.08x10^-5 | 0.999 | Associated with psoriasis
cg21539397 | 2 | 29224950 | Open sea | FAM179A (Body) | FAM179A (5'UTR) | T | -0.793 | 1.36x10^-5 | 0.999 | Unknown
cg13361086 | 13 | 113825610 | Open sea | PROZ (Body) | PROZ (5'UTR) | T | -0.948 | 1.45x10^-5 | 0.999 | Blood clot formation
cg06836641 | 3 | 50261422 | North Shelf | | | | 0.485 | 2.12x10^-5 | 0.999 | Unknown
cg21560830 | 22 | 49879782 | North Shelf | | | | 0.503 | 2.15x10^-5 | 0.999 | Unknown
cg09075968 | 13 | 113841799 | Open sea | PCID2 (Body) | PCID2 (1stExon; TSS200; 3'UTR) | T | -0.789 | 2.61x10^-5 | 0.999 | B-cell survival through regulation of MAD2 expression.
cg16312941 | 20 | 47405246 | Open sea | PREX1 (Body) | | T | -0.812 | 2.89x10^-5 | 0.999 | mRNA stability and secretion of IL-2, IL-4, IL-10
cg19857913 | 10 | 116452143 | Open sea | | | T | 0.623 | 3.72x10^-5 | 0.999 | Unknown
cg24896703 | 5 | 150575512 | Open sea | CCDC69 (Body) | CCDC69 (5'UTR; 3'UTR) | T | -1.050 | 3.77x10^-5 | 0.999 | Central spindle formation
cg24929693 | 10 | 127760436 | Open sea | ADAM12 (Body) | ADAM12 (TSS1500) | | 0.816 | 3.91x10^-5 | 0.999 | Regulated by NOTCH, regulates the effector function of Th17 cells
cg03785763 | 8 | 8868951 | Open sea | ERI1 (Body) | ERI1 (3'UTR; TSS1500) | T | -1.002 | 4.06x10^-5 | 0.999 | Ras inhibitor, regulates lymphocyte miRNA homeostasis
cg03781837 | 3 | 42861026 | Open sea | ACKR2 (5'UTR) | CCBP2 (TSS200; 5'UTR); RP11-70C1.3 (3'UTR); KRBOX1 (5'UTR) | T | 0.681 | 4.77x10^-5 | 0.999 | Associated with psoriasis, suppresses Th17 response and IFN-γ producing γδ T-cells
incg15134988 | 10 | 29049872 | Open sea | LINC01517 (Body) | U6 (TSS200) | T | 0.774 | 4.81x10^-5 | 0.999 | Splicing
cg11575026 | 2 | 204722567 | Open sea | | | T | -1.024 | 5.18x10^-5 | 0.999 | Unknown
cg12010144 | 17 | 76733624 | Open sea | CYTH1 (Body) | CYTH1 (5'UTR) | T | -0.681 | 6.13x10^-5 | 0.999 | Homing and engraftment of HPC
cg23304156 | 19 | 16466421 | Island | EPS15L1 (3'UTR) | | T | -0.604 | 6.24x10^-5 | 0.999 | Endocytosis of the transferrin receptor. Necessary for T-cell development
cg27110849 | 11 | 86519306 | Open sea | PRSS23 (Body) | PRSS23 (5'UTR) | T T | 0.485 | 6.32x10^-5 | 0.999 | Endothelial-to-mesenchymal transition


NK cells

NK cell DMP were broadly associated with immune cell function (C9orf91, HOXA2, SERPINB6, JUN, TRAK1) and muscle/tissue function (C8orf42, DTNB, DES, CC2D2A) (Figure 3.10). C9orf91 lies within a region associated at genome-wide significance with AS, and is within a region associated with TNFSF8 in SpA 89,257. The most significant CpG, cg25533247, is found in a transcriptional start site associated with AKAP8L, which may interact with histone methyltransferase H3K4 258 (Table 3.8).

Almost half of the 20 DMP with the lowest p-values in NK cells were associated with non-coding transcripts of unknown function, or with regions not associated with any gene or function to date. Two of these CpGs, cg17419888 and cg14377922, are associated with the same transcript, KLHL29 (kelch-like protein 29), which has not been functionally annotated in humans but is a paralog of KLHL24, a ubiquitin ligase substrate receptor 259,260. AS individuals had increased methylation in cg13490403, associated with the 3’UTR/body of LHX6 (transcriptional repressor of Wnt-β-catenin pathways), and cg01855070, associated with TSS1500 of TRAK1 (endosome to lysosome trafficking) 261,262. cg09414137, in the 3’UTR/1st exon of JUN, which is associated with TLR4 signalling and regulation, had decreased methylation in AS individuals. This suggests increased expression of TLR signalling and endosome to lysosome signalling, and a removal of suppression on Wnt-β-catenin pathways by LHX6.


Figure 3.10 Volcano plot of NK cell DMP. The 10 most significant DMP associated with NK cells are labelled. CpGs coloured in black are below a 0.05 p-value cut-off. The average beta difference in NK cells is less than that observed in other cell types. ND: Not Determined.


Table 3.8 NK cell 20 most significant DMP. UCSC: RefGene. Prom. Assoc.: Promoter Associated. DHS: DNase hypersensitivity site. OCR: Open chromatin region. TFBS: Transcription factor binding site. T: True.

Columns: Name; Chr; Pos.; Region; UCSC Name; UCSC Group; Gencode Name; Gencode Group; Prom. assoc.; DHS; OCR; TFBS; logFC; p-value; Adj. p-value; Associated gene/s function.

May interact with South cg25533247 19 15530630 AKAP8L TSS1500 AKAP8L TSS1500 T T -0.332 5.10x10-7 0.376 histone methyl Shore transferase TSS1500 Produces formate cg13604132 4 75023629 Island MTHFD2L TSS200 MTHFD2L TSS200 T -0.200 1.22x10-6 0.376 from one-carbon 5'UTR donors Transcriptional 5'UTR repressor. Inhibits cg13490403 9 124982413 Island LHX6 Body LHX6 T 0.616 1.41x10-6 0.376 3'UTR Wnt-β Catenin pathway Body plan North development and cg04737131 7 27142708 HOXA2 TSS1500 HOXA2 TSS1500 T 0.643 3.83x10-6 0.768 Shore cell fate determination C11orf88 TSS1500 TSS200 cg05216730 11 111385496 Island C11orf88 TSS200 T 0.386 5.98x10-6 0.931 Unknown RP11- 794P6.6 3'UTR Testis South cg04134279 8 496327 C8orf42 TSS1500 C8orf42 TSS1500 T 0.553 9.93x10-6 0.931 development, Shore altered by smoking Associated with cg20281264 2 25896729 Island DTNB TSS1500 DTNB TSS1500 T T -0.295 1.11x10-5 0.931 muscular dystrophy Region associated TSS1500 with AS cg17533850 9 117373523 Island C9orf91 TSS200 C9orf91 1stExon T T -0.510 1.16x10-5 0.931 susceptibility. 5'UTR Unknown function. North Innate immune cell cg01137047 6 2970877 SERPINB6 5'UTR SERPINB6 5'UTR T -0.455 1.22x10-5 0.931 Shore function 3'UTR 3'UTR Activated TLR4 cg09414137 1 59247592 Island JUN JUN T T -0.332 1.28x10-5 0.931 1stExon 1stExon signalling and


regulation of gene expression AC069368 cg00152322 15 65198005 Island 5'UTR T T -0.391 1.30x10-5 0.931 Unknown .3 South AC003986 cg14499053 7 19158954 TSS1500 T 0.379 1.70x10-5 0.931 Unknown Shore .7 Open cg17419888 2 23840555 KLHL29 Body KLHL29 5'UTR T 0.396 1.89x10-5 0.931 Unknown sea Open cg14377922 2 23840537 KLHL29 Body KLHL29 5'UTR T 0.373 2.15x10-5 0.931 Unknown sea Maintains sarcomere, Found cg24927800 2 220283214 Island DES 1stExon DES 1stExon T 0.611 2.22x10-5 0.931 in cardiac muscle and skeletal muscle North cg14940898 6 50678484 0.498 2.53x10-5 0.931 Unknown Shelf North cg22943986 7 27142700 HOXA2 TSS1500 HOXA2 TSS1500 T 0.563 2.54x10-5 0.931 Unknown Shore North cg14716990 10 129533731 T 0.421 2.72x10-5 0.931 Unknown Shore Open 5'UTR cg20816447 4 15480781 CC2D2A Body CC2D2A T -0.427 2.74x10-5 0.931 Ciliary transition sea TSS200 TSS1500 Endosome to Open TSS1500 cg01855070 3 42201314 TRAK1 TRAK1 5'UTR T 0.407 2.78x10-5 0.931 lysosome sea Body 3'UTR trafficking


Differentially methylated regions

DMR analysis was performed using the previously identified DMP, with a p-value cut-off of 0.01. This cut-off was selected using the permuted FDR, due to the lack of DMP after Benjamini-Hochberg adjustment for multiple testing. Two methods were used to identify DMR: bumphunter and DMRcate 263,264. Previous comparisons of these methods have found that DMRcate tends to overestimate the true region size, while bumphunter has low precision and power 227. Reflecting this, bumphunter DMR identified in this study were often smaller and less likely to encompass nearby DMP if there were gaps between DMP, whereas the DMRcate regions were larger, often extending to encompass probes that were not differentially methylated. DMRcate results were relied upon as they are more robust than those identified through bumphunter 227. The array design relies upon the correlation in methylation levels of CpGs located close to each other.
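The permuted FDR used above to choose the p-value cut-off can be illustrated with a minimal Python sketch. It assumes the permuted FDR is estimated as the average number of hits obtained after shuffling disease labels, divided by the number of hits obtained with the true labels; the exact estimator used in this study is not spelled out here, so this is an assumption. The names `permuted_fdr`, `count_hits` and `count_large_differences`, the threshold, and the toy beta values are all hypothetical and introduced only for illustration.

```python
import random


def permuted_fdr(count_hits, labels, n_perm=100, seed=0):
    """Estimate FDR as (mean hits under shuffled labels) / (hits under true labels).

    count_hits: callable taking a list of labels and returning the number of
                CpGs called significant at the chosen cut-off (hypothetical).
    labels:     true disease status per sample, e.g. 1 = AS, 0 = healthy.
    """
    observed = count_hits(labels)
    if observed == 0:
        raise ValueError("no hits with the true labels; FDR is undefined")
    rng = random.Random(seed)
    shuffled = list(labels)
    null_hits = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between sample and disease status
        null_hits.append(count_hits(shuffled))
    return (sum(null_hits) / n_perm) / observed


# Toy data (invented): 4 CpGs x 6 samples of beta values; the first 3 CpGs differ by group.
BETAS = [
    [0.9, 0.9, 0.9, 0.1, 0.1, 0.1],
    [0.8, 0.8, 0.8, 0.2, 0.2, 0.2],
    [0.1, 0.1, 0.1, 0.9, 0.9, 0.9],
    [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
]
TRUE_LABELS = [1, 1, 1, 0, 0, 0]


def count_large_differences(labels, threshold=0.5):
    """Stand-in for a per-CpG test: count CpGs whose group means differ by more than threshold."""
    hits = 0
    for row in BETAS:
        cases = [b for b, lab in zip(row, labels) if lab == 1]
        controls = [b for b, lab in zip(row, labels) if lab == 0]
        if abs(sum(cases) / len(cases) - sum(controls) / len(controls)) > threshold:
            hits += 1
    return hits


fdr = permuted_fdr(count_large_differences, TRUE_LABELS)
```

Under this estimator, a ratio near or above 1 means that shuffled labels yield about as many hits as the real labels, so most hits at that cut-off are likely to be false positives.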

DMRcate results are shown in order of Stouffer value by default, which is a method of meta-analysis using z-scores. A z-score measures a value's relationship to the group mean, expressed in standard deviations: a z-score of 0 means the value is equal to the group mean, and a z-score of 1 means the value is 1 standard deviation larger than the group mean. The Stouffer method relies on the assumption that these changes are in the same direction (coherent), such as both being positive, and of comparable significance in p-values. Stouffer values will be affected by the mixture of nominally significant (p-value <0.05) and non-significant (p-value >0.05) p-values within the DMR identified 265. Results are shown ordered by Stouffer value and minimum asymptotic FDR, to account for results where several DMR have the same Stouffer value.
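The z-score and Stouffer combination described above can be sketched in Python as follows. This is the textbook unweighted Stouffer method applied to one-sided p-values, shown only to illustrate why coherent and mixed signals combine differently; it is not a reproduction of DMRcate's own implementation, and the function names and example p-values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_score(value, group_values):
    """Distance of `value` from the group mean, in sample standard deviations."""
    mean = sum(group_values) / len(group_values)
    var = sum((v - mean) ** 2 for v in group_values) / (len(group_values) - 1)
    return (value - mean) / sqrt(var)


def stouffer(p_values):
    """Combine one-sided p-values (each strictly between 0 and 1) into one p-value."""
    norm = NormalDist()
    z = [norm.inv_cdf(1.0 - p) for p in p_values]  # convert each p-value to a z-score
    combined_z = sum(z) / sqrt(len(z))             # unweighted Stouffer statistic
    return 1.0 - norm.cdf(combined_z)


coherent = stouffer([0.01, 0.01])     # both CpGs support a change in the same direction
conflicting = stouffer([0.01, 0.99])  # the second CpG points the other way
```

In this sketch the coherent pair gives a combined p-value smaller than either input, whereas the conflicting pair cancels out to roughly 0.5, which mirrors the point that a region mixing significant and non-significant CpGs receives a weaker Stouffer value.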


CD4+T-cells

The most significant DMR identified within CD4+T-cells overlaps with the promoter region of AKTIP (AKT interacting protein), part of the FHF complex which may promote vesicle trafficking and/or fusion, and can increase the release of TNFSF6 via the AKT1/GSK3B/NFATC1 signalling cascade 266 (Table 3.9).

Table 3.9 The 10 most significant DMR associated with AS in CD4+T-cells. * within the same gene region of TMEM204.

| Chr | Start | Width | CpGs | Min FDR | Stouffer | Max beta FC | Mean beta FC | Overlapping promoters |
|---|---|---|---|---|---|---|---|---|
| 16 | 53535593 | 333 | 3 | 4.80x10-7 | 0.332 | -0.041 | -0.035 | AKTIP |
| 2 | 128398072 | 890 | 5 | 3.87x10-8 | 0.336 | -0.066 | -0.048 | LIMS |
| 10 | 99092337 | 372 | 2 | 1.1x10-4 | 0.350 | -0.023 | -0.021 | FRAT2, RP11-452K12.4 |
| 16 | 11763519 | 820 | 2 | 3.23x10-3 | 0.354 | -0.047 | -0.037 | SNN |
| 16 | 1587309 | 534 | 5 | 1.24x10-11 | 0.370 | -0.043 | -0.032 | NA* |
| 19 | 13205470 | 212 | 2 | 8.90x10-4 | 0.373 | 0.029 | 0.021 | NA* |
| 15 | 101741557 | 57 | 2 | 1.39x10-3 | 0.425 | 0.017 | 0.017 | NA |
| 16 | 1585644 | 400 | 3 | 1.15x10-5 | 0.434 | -0.061 | -0.038 | NA* |
| 2 | 61045195 | 34 | 2 | 3.45x10-3 | 0.444 | 0.010 | 0.009 | NA |
| 18 | 66389420 | 28 | 2 | 3.31x10-3 | 0.453 | -0.040 | -0.030 | NA |

Three of the DMR (indicated in Table 3.9) are within the same gene transcript region for TMEM204 (visualised in Figure 3.11). Two of the TMEM204 DMR were identified using bumphunter but were substantially smaller than those identified by DMRcate. TMEM204 is a transmembrane protein associated with cellular adhesion and cellular permeability at adherens junctions 267. TMEM204 has high levels of expression in naïve CD4+T-cells, and slightly higher expression in naïve regulatory T-cells and memory CD4+ T follicular helper (TFH) cells 256. Reduced methylation in all the DMR within the TMEM204 transcript region indicates coordinated increased expression of TMEM204 in AS individuals.


Figure 3.11 Plot of the significant CD4+T-cell DMR in the TMEM204 transcript

This plot shows four DMR identified within the TMEM204 transcript, including the fifth and eighth most significant DMR in CD4+T-cells. The relatively larger DMR within the TMEM204 promoter region was not as significant, potentially due to its larger size.


CD8+T-cells

SKI, a suppressor of TGF-β signalling, was associated with the CD8+T-cell DMR with the lowest Stouffer value (Figure 3.12). Multiple DMP were associated with SKI in CD8+T-cells. The DMP with the lowest p-value in CD8+T-cells was part of a DMR within the PDE4A promoter region (Figure 3.13). The mean beta change across these DMR is less than 5%, indicating that the DMRcate regions include CpGs that are not differentially methylated between healthy and AS individuals (Table 3.10). The region selected is potentially broader than the region of affected DMP, as can be observed in both Figures 3.12 and 3.13.

While the majority of the 10 most significant DMR identified have a negative average beta fold change, this is not reflective of the overall DMR identified, with 491 of the 698 DMR identified having positive mean beta fold changes (Supplementary Data). The positive DMR overlap with promoters including EOMES, IKZF3, CLEC3A, and TOX, indicating potential suppression of these genes. EOMES and IKZF3 both contain loci associated with AS at genome-wide significance.

Table 3.10 The 10 most significant DMR identified in CD8+T-cells.

| Chr. | Start | Width | CpGs | Min FDR | Stouffer | Max beta FC | Mean beta FC | Overlapping promoters |
|---|---|---|---|---|---|---|---|---|
| 1 | 2161049 | 4531 | 13 | 1.72x10-12 | 0.210 | -0.084 | -0.048 | SKI |
| 1 | 112016558 | 474 | 6 | 5.83x10-13 | 0.240 | -0.061 | -0.051 | C1orf162 |
| 18 | 66389420 | 28 | 2 | 3.84x10-5 | 0.256 | -0.052 | -0.050 | NA |
| 16 | 1585644 | 400 | 3 | 4.33x10-7 | 0.289 | -0.077 | -0.059 | NA* |
| 17 | 64300729 | 599 | 3 | 4.94x10-6 | 0.319 | -0.051 | -0.047 | PRKCA |
| 11 | 13894916 | 24 | 2 | 0.0003 | 0.337 | 0.049 | 0.049 | NA |
| 2 | 231094701 | 883 | 2 | 0.0006 | 0.363 | -0.020 | -0.014 | SP140 |
| 1 | 200122591 | 131 | 2 | 0.0004 | 0.367 | 0.027 | 0.022 | NA |
| 22 | 20905256 | 301 | 3 | 2.93x10-5 | 0.414 | -0.050 | -0.041 | MED15 |
| 8 | 144259286 | 1493 | 5 | 3.85x10-6 | 0.415 | -0.071 | -0.026 | NA |


Figure 3.12 The most significant DMR in CD8+T-cells. The region overlaps with the promoter region of SKI, which had previously been associated with several significant DMP within CD8+T-cells. The promoter region is within a CpG Island (Grey). Both of the DMR methods identified the same region (DMR and Minfi Bumps).


Figure 3.13 DMR containing the most significant DMP for CD8+T-cells (cg09997271) within the PDE4A promoter region.

The CpG is shown, alongside genomic information and the average methylation value for AS and healthy individuals for each probe within the region (bottom).


CD14+Monocytes

DMR analysis for CD14+monocytes was statistically underpowered in comparison to CD4+T-cells and CD8+T-cells despite similar cohort sizes, which reflected the DMP analysis. The smallest Stouffer value is the same for 29 DMR (0.999236), all of which consisted of 2 DMP (Table 3.11). Mean beta fold change showed increased methylation for more than 90% of the identified DMR; however, only 13 DMR had an absolute mean beta fold change of more than 4.5%, associated with the promoter regions of RGMA, OXR1, RP11-672A2.4-001, VTRNA, C22orf34, TMEM232, HOXA1, NPY2R, and RP11-2A4.3. This may indicate that the changes associated with CD14+monocytes are not mediated by DNA methylation.

Table 3.11 The 10 CD14+monocyte DMR with the lowest Stouffer value, ranked by minimum FDR.

| Chr. | Start | Width | CpGs | Min FDR | Stouffer | Max beta FC | Mean beta FC | Overlapping promoters |
|---|---|---|---|---|---|---|---|---|
| 8 | 107630133 | 127 | 2 | 0.0021 | 0.9992 | 0.061 | 0.056 | OXR1 |
| 19 | 19049950 | 502 | 2 | 0.0022 | 0.9992 | 0.011 | 0.009 | HOMER3, AC005932.1 |
| 13 | 24626904 | 664 | 2 | 0.0023 | 0.9992 | 0.056 | 0.032 | NA |
| 1 | 3569624 | 276 | 2 | 0.0027 | 0.9992 | 0.032 | 0.022 | TP73, WRAP73 |
| 22 | 49881684 | 94 | 2 | 0.0051 | 0.9992 | 0.090 | 0.073 | NA |
| 18 | 7038746 | 198 | 2 | 0.0071 | 0.9992 | 0.049 | 0.043 | LAMA1 |
| 11 | 132182675 | 132 | 2 | 0.0107 | 0.9992 | 0.033 | 0.020 | NA |
| 16 | 72942625 | 37 | 2 | 0.0128 | 0.9992 | -0.034 | -0.027 | NA |
| 15 | 31685635 | 189 | 2 | 0.0144 | 0.9992 | 0.030 | 0.026 | KLF13 |
| 3 | 137492780 | 150 | 2 | 0.0148 | 0.9992 | 0.058 | 0.050 | RP11-2A4.3 |


γδ T-cells

All the DMR with a Stouffer value less than 1 in the γδ T-cells had fewer than 3 CpGs within them (Table 3.12). DMR containing larger numbers of CpGs (>5 CpGs) overlapped with the promoter regions of several genes involved in immune response, including HLA-DPB1, TNFAIP8, HOXA2, and TMEM232. These DMR had mean beta changes of 0.05, 0.03, -0.05 and -0.08 respectively, and maximum asymptotic FDR <0.002, indicating that the Stouffer value reflects the limited power of these samples rather than unaltered DNA methylation in these regions. The HLA-DPB1 locus is associated with AS 88,128. HOXA2 is involved in the Wnt-β-catenin pathway and affects skeletal development 268,269. TNFAIP8 is thought to negatively regulate immunometabolism 270,271. The DMR overlapping the promoter regions of these genes suggest increased expression of HOXA2 and TMEM232, and decreased expression of HLA-DPB1 and TNFAIP8.

Table 3.12 The 10 most significant DMR associated with AS in γδ T-cells.

| Chr. | Start | Width | CpGs | Min FDR | Stouffer | Max beta FC | Mean beta FC | Overlapping promoters |
|---|---|---|---|---|---|---|---|---|
| 2 | 238877332 | 127 | 2 | 2.91x10-5 | 0.999 | -0.161 | -0.122 | UBE2F, UBE2F-SCLY |
| 8 | 8868738 | 214 | 2 | 0.0016 | 0.999 | -0.073 | -0.039 | ERI1 |
| 3 | 142316059 | 24 | 2 | 0.0032 | 0.999 | 0.082 | 0.054 | PLS1 |
| 12 | 120692107 | 603 | 2 | 0.0035 | 0.999 | -0.144 | -0.119 | NA |
| 4 | 154462667 | 93 | 2 | 0.0042 | 0.999 | -0.055 | -0.034 | NA |
| 6 | 15505949 | 137 | 2 | 0.0051 | 0.999 | -0.109 | -0.104 | NA |
| 3 | 184243411 | 308 | 2 | 0.0067 | 0.999 | 0.096 | 0.041 | NA |
| 3 | 42861026 | 57 | 2 | 0.0074 | 0.999 | 0.057 | 0.037 | ACKR2 |
| 3 | 4908643 | 540 | 2 | 0.0077 | 0.999 | -0.117 | -0.087 | NA |
| 17 | 739376 | 30 | 2 | 0.0080 | 0.999 | 0.075 | 0.061 | NA |


NK cells

NK cell DMR encompassed only small numbers of CpGs, as with previous cell types (Table 3.13). The DMR with the smallest Stouffer value overlapped with the promoter region of AKAP8L, a protein that is still undergoing characterisation but is potentially associated with nuclear envelope breakdown and chromatin condensation (Figure 3.14) 272,273. The mean beta fold change is -0.027, indicating an increase in the expression of AKAP8L in AS individuals. Genes associated with an absolute mean beta fold change (beta FC) >0.04 included HLA-C (2 DMR, beta FC -0.09 and -0.06), TMEM232 (-0.05 beta FC), RUNX1 (-0.04 beta FC), HOXA1 (0.06 beta FC), DUSP22 (0.07 beta FC), TP73 (0.04 beta FC) and LHX6 (0.04 beta FC). Only a single previous study has examined genome-wide methylation between AS individuals and healthy controls, and the reported results showed a change in beta values of -0.23 for HLA-C, 0.20 for RUNX1 and 0.31 for LHX6 176. Those results were obtained from PBMCs of Han Chinese individuals.

Table 3.13 The 10 most significant DMR associated with AS in NK cells.

| Chr. | Start | Width | CpGs | Min FDR | Stouffer | Max beta FC | Mean beta FC | Overlapping promoters |
|---|---|---|---|---|---|---|---|---|
| 19 | 15530606 | 265 | 4 | 6.91x10-6 | 0.979 | -0.036 | -0.027 | AKAP8L |
| 4 | 186007379 | 34 | 2 | 0.00138 | 0.982 | 0.025 | 0.022 | NA |
| 5 | 1155853 | 601 | 2 | 0.00190 | 0.982 | 0.005 | 0.005 | NA |
| 17 | 35302132 | 221 | 2 | 0.00199 | 0.982 | 0.052 | 0.036 | RP11-445F12.2 |
| 10 | 6095868 | 602 | 2 | 0.00251 | 0.982 | -0.060 | -0.051 | NA |
| 20 | 31408123 | 91 | 2 | 0.00271 | 0.982 | -0.003 | -0.002 | MAPRE1 |
| 1 | 43856186 | 288 | 2 | 0.00467 | 0.982 | -0.042 | -0.025 | MED8, SZT2 |
| 1 | 156718357 | 561 | 2 | 0.00482 | 0.982 | -0.017 | -0.016 | HDGF, PRCC |
| 2 | 235437185 | 146 | 2 | 0.00558 | 0.982 | 0.025 | 0.021 | NA |
| 16 | 743925 | 404 | 2 | 0.00567 | 0.982 | 0.032 | 0.031 | FBXL16 |


Figure 3.14 Gviz plot of the NK cell DMR overlapping the promoter region of AKAP8L (shown as reverse strand). The region is near to WIZ.


Pathway analysis

Pathway analysis provides an insight into coordinated changes occurring within the same biological pathway and indicates whether a biological process is being increased or decreased rather than individual genes alone. Pathway analysis was carried out using two biological databases, GO and KEGG. The analysis used nominally significant DMP (p-value <0.05) that were associated with a transcript (as these can be annotated to specific pathways). Note that several of the DMP identified previously were not annotated and these are currently unable to be considered within pathway analysis. Hypomethylated and hypermethylated DMP were analysed separately to provide insight into the increase or decrease in methylation of individual pathways.

Multiple CpGs can be associated with a single gene, and therefore genes were assigned a value to indicate if they were associated with only hypermethylated DMP (1), only hypomethylated DMP (-1), or a mixture of hyper- and hypomethylated CpGs (0).
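The gene-level labelling described above can be sketched in Python as follows. This is a minimal illustration under the assumption that each gene is represented by the logFC values of its nominally significant DMP; the function name `gene_direction` is hypothetical, the HOXA2 and JUN values are taken from the NK cell list in Table 3.8 purely as examples, and `GENE_X` is an invented mixed case.

```python
def gene_direction(log_fc_by_gene):
    """Label each gene 1 (only hypermethylated DMP), -1 (only hypomethylated DMP) or 0 (mixed)."""
    directions = {}
    for gene, log_fcs in log_fc_by_gene.items():
        has_hyper = any(fc > 0 for fc in log_fcs)
        has_hypo = any(fc < 0 for fc in log_fcs)
        if has_hyper and has_hypo:
            directions[gene] = 0
        elif has_hyper:
            directions[gene] = 1
        elif has_hypo:
            directions[gene] = -1
        # genes without any non-zero logFC are left unlabelled
    return directions


example = {
    "HOXA2": [0.643, 0.563],   # two DMP, both with increased methylation
    "JUN": [-0.332],           # one DMP with decreased methylation
    "GENE_X": [0.40, -0.25],   # hypothetical gene with DMP in both directions
}
labels = gene_direction(example)
```

Keeping the mixed category separate avoids forcing a single direction onto genes whose CpGs disagree, which matters when hypomethylated and hypermethylated sets are analysed separately.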

Where asymptotic FDR resulted in too few genes for pathway analysis (<10 genes) permuted FDR was used.

CD4+T-cells

CD4+T-cell DMP were enriched for GO terms of negative regulation of phosphorylation, cell-cell signalling by Wnt, regulation of GTPase activity and calcineurin-NFAT signalling cascade (data not shown). Hypomethylated DMP were associated with GO terms of histone modification (SKI, TET3, HDAC4, KDM5B, ASXL1, FLCN, SIN3A, CXXC1l, JARID2, KMT5B, MIER2, KANSL3, ASH1L, PRDM16, SREBF1, MAP3K7, UBE2E1, PAGR1, TBL1XR1, KAT6A, DPF1, TRRAP, ARID5B, SETD3, LDB1, TRIP12, TADA2B, CLOCK, PPM1F, CDK1, USP3), calcineurin-NFAT signalling cascade (CHERP, RCAN1, MTOR, NRG1, STIMATE, NFATC2, NFATC3, AKAP6), and IL-2 mediated signalling (JAK1, JAK3, STAT5A, PTK2B) (Figure 3.15). Hypermethylation was associated with GO terms of positive regulation of GTPase activity, and negative regulation of Wnt signalling pathway (CSNK1A1, DACT3, CTNND1, TSKU, IGFBP1, WWOX, SCYL2, BARX1, CTNNBIP1, NXN, STK4, TLE3, FUZ). The upregulation of the IL-2 and calcineurin-NFAT signalling cascades, and the downregulation of negative regulation of Wnt signalling, are all indicative of T-cell activation and survival.


Figure 3.15. GO terms identified using the hypermethylated DMP (left) and hypomethylated DMP (right) in CD4+T-cells. Increased methylation of Wnt alongside loss of methylation for NFAT-calcineurin signalling and IL-2 suggests activation of CD4+T-cells.


KEGG pathways associated with hypomethylated DMP in CD4+T-cells were involved in immune response to foreign bodies, such as viruses and bacteria, Wnt signalling, TH1 and TH2 cell differentiation (JAK3, JAK1, STAT5A, MAML2, NOTCH3, LAT, HLA-DRB5 and NFATC3), and TH17 cell differentiation (MTOR, JAK1, STAT5A, JAK3, LAT, HLA-DRB5, NFATC3, NFATC2, RXRB, IL21R, RORA, IL6R). The decreased methylation of these pathways potentially indicates upregulation of these functions.

Hypermethylation was associated with KEGG terms for MAPK signalling (TP53, MKN1, MAP3K6, MAP3K1, MAP3K20, TRAF6, STK4, IGF2, DUSP4, GNA12, FGF23, KRAS, EFNA2, EFNA5, MAP2K5, CACNA2D4), oxytocin signalling (CD38, NFAT2, PRKAG2, MYLK4, CAMK4) and Rap1 signalling (RAP1GAP, SIPA1L2, PARD3, APBB1IP, SIPA1L1, CTNND1, BCAR1). Both Rap1 and oxytocin signalling pathways are G protein regulators of MAPK signalling, and several genes overlap between MAPK signalling and the Rap1 and oxytocin signalling pathways (Figure 3.16) 274.


Figure 3.16 Cnet plots of CD4+T-cell associated KEGG terms for hypermethylated DMP (left) and hypomethylated DMP (right).


CD8+T-cells

Pathway analysis of CD8+T-cell DMP revealed enrichment of GO terms associated with positive regulation of immune responses and positive regulation of immune system processes. Hypomethylated DMP were associated with GO terms of regulation of cell-cell adhesion (MAPK14, CTLA4, CYLD, DTX1, ETS1, GATA3, IL4R, ITGA6, MIA3, MUC21, PRKCA, TWSG1, ZMIZ1, RASAL3, TFRC, ZBTB16, FERMT3, LOXL3, SKAP1, RUNX3, PDE5A, ADAM19, CD44, PIEZO1, TESPA1), regulation of T-cell differentiation (DROSHA, RUNX3, CTLA4, IL4R, GATA3, CYLD, DTX1, TESPA1, LOXL3, ZBTB16, ZMIZ1) and endomembrane system organisation (Figure 3.17). Hypermethylated CD8+T-cell DMP were associated with GO terms of cell junction assembly (LIMS2, CD9, SMAD3, ACTN3, BCAS3, WNT4, PRKCA, CTNNB1, CDH4, OCLN, TLN2, TBCD, ABL1, PKP1, AGT, PARD3) and regulation of plasma membrane bounded cell projection assembly. Pathways that are involved in T-cell differentiation are upregulated and those involved in cell junction assembly are downregulated.

Hypomethylated DMP were associated with KEGG terms of Ras signalling, Th1 and Th2 differentiation and changes in metabolism (fatty acid, etc.). Hypermethylated DMP were associated with KEGG pathways involved in tight junction and axon guidance, leukocyte migration and adhesion, and MAPK signalling. The GO and KEGG terms identified for hypomethylated and hypermethylated DMP corresponded with each other.


Figure 3.17 Cnetplots for the 5 most significant GO terms associated with CD8+T-cell hypermethylated (left) and hypomethylated (right) genes.

Genes are coloured on the basis of the overall fold change associated with all CpGs annotated to this gene.


CD14+Monocytes

The small number of nominally significant (p-value <0.05) CD14+monocyte DMP resulted in identified GO and KEGG terms encompassing a very small number of differentially methylated genes (<5 genes). This made pathway analysis with only hypomethylated or hypermethylated DMP untenable. GO terms associated with all CD14+monocyte DMP included regulation of protein localization to membrane, monoamine transport and cellular response to retinoic acid. Hypomethylated DMP were associated with GO terms of IFN-γ mediated signalling (HLA-C, SP100, TRIM26, NMI), regulation of IFN-β production (FLOT1, TBK1, NMI), and regulation of protein localization to membrane (GPC6, CLTC, ANK3, TP73, CRIPT).

Hypermethylated DMP were associated with GO terms of positive regulation of protein localization to cell surface (ERBB4, RANGRF, CD247) and regulation of developmental growth (OLFM1, SYT2, DPYSL2, ERBB4, FSTL4, LATS2, BARHL2, MIR200B, GPAM, SLC6A3, SLC6A4, BMPR1A, TP73, MKKS).

KEGG terms similarly were broad, including cell-substrate adhesion, cellular catabolic processes, IFN-γ mediated signalling, and regulation of IFN-β biosynthetic processes. Metabolism related terms (tyrosine metabolism, phenylalanine metabolism, glycerophospholipid metabolism, and mucin type O-glycan biosynthesis) were associated with increased methylation in CD14+monocytes (TH, IL4I1, ALDH1A3, PCYT2, ETNPPL, DGKG, GPAM, GALNT7, GALNT9 and GALNT18).

γδ T-cells

At a DMP cut-off of p-value <1x10-4, the only cut-off with FDR <1, GO and KEGG terms had fewer than 3 associated genes. Increased methylation was associated with broad GO terms of adult behaviour and regulation of anatomical structure. Decreased methylation was associated with GO terms of lymphocyte differentiation (TMEM131L, PCID2, PREX1), and regulation of lymphocyte activation (TMEM131L, PCID2, FAM49B).

KEGG terms associated with decreased methylation included Notch signalling (DLL1, MAML3, CTBP2), Toll-like receptor signalling (TOLLIP, TAB2, IRF3) and cardiac muscle contraction (MYL3, CACNB2, UQCRQ). Hypermethylation was associated with a cluster of genes relating to terms of leukocyte trans-endothelial migration, chemokine signalling pathway, focal adhesion, and bacterial invasion of epithelial cells (PREX1, SEPT9, PXN, RAP1A, ITGAM, ITGA1).

NK cells

GO terms for hypomethylated NK cell DMP were associated with the negative regulation of mTORC1 signalling (SESN3, C12orf56, SZT2), regulation of DNA replication (JUN, DNA2, BLM, WRNIP1, TTF1, CCDC88A) and positive regulation of T-cell proliferation (HLA-DMB, HLA-DPB1, IL2RA, CD46, BLM) (Figure 3.18).

Hypermethylated DMP in NK cells were associated with GO terms of regulation of osteoblast differentiation (HOXA2, TWIST1, SMOC1, ZHX3, MIR9-1, HDAC4, WNT7B), and cellular response to retinoic acid (WNT5A, WNT6, WNT7B, HOXA2, RARB, MICB, LTK).

Hypomethylated DMP were associated with a group of overlapping genes (HLA-C, HLA-DPB1, HLA-DMA, HLA-DMB, CCND1, IL2A, JUN, ETS2, RELA, ATF2) that encompassed KEGG terms of Th1 and Th2 cell differentiation, viral myocarditis, human T cell leukemia virus 1 infection and inflammatory bowel disease.

Hypermethylated DMP were involved in Hippo signalling/carcinoma associated pathways (basal cell and hepatocellular) (TP73, WWTR1, WNT7B, WNT5A, WNT6, GSTA3, ACTL6B). Wnt signalling pathways were upregulated in CD4+T-cells, γδ T-cells and NK cells.


Figure 3.18 Cnetplot for GO terms associated with NK cell hypermethylated (left) and hypomethylated (right) DMPs.


DMP in AS-associated loci regions

The findings in the previous part of this chapter have investigated the question of which DNA methylation changes are associated with AS in individual immune cell subsets. This section instead analyses whether there is enrichment of DMP around AS-associated loci. Enrichment analysis was performed using various window sizes around the AS-associated loci to select significant DMP (unadjusted p-value <0.01). These numbers were compared to 1000 random bins adjusted for probe density, to ensure that results were not affected by array design.
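The comparison against random bins described above can be sketched in Python as an empirical p-value. This is a simplified illustration on a single chromosome, not the procedure used in this study: it assumes the window size is the total width centred on each locus, and it centres the random bins on existing probe positions as a crude stand-in for probe-density adjustment, since the matching procedure is not described here. All function names and toy coordinates are hypothetical.

```python
import random


def count_near_loci(dmp_positions, loci, window):
    """Count DMP lying within +/- window/2 of any locus (single chromosome, for simplicity)."""
    half = window / 2
    return sum(1 for p in dmp_positions if any(abs(p - locus) <= half for locus in loci))


def empirical_p(observed, null_counts):
    """Share of random draws with at least as many DMP as observed (with +1 correction)."""
    at_least = sum(1 for c in null_counts if c >= observed)
    return (at_least + 1) / (len(null_counts) + 1)


def enrichment_p(dmp_positions, loci, probe_positions, window, n_random=1000, seed=0):
    """Compare DMP counts around the real loci with counts around randomly placed bins."""
    rng = random.Random(seed)
    observed = count_near_loci(dmp_positions, loci, window)
    null_counts = []
    for _ in range(n_random):
        # Centre random bins on existing probes so bins fall where the array has coverage.
        random_loci = rng.sample(probe_positions, len(loci))
        null_counts.append(count_near_loci(dmp_positions, random_loci, window))
    return empirical_p(observed, null_counts)


# Toy example (invented coordinates): 200 evenly spaced probes, 4 DMP, 1 locus.
probes = list(range(0, 100000, 500))
dmps = [1000, 1500, 2000, 50000]
loci = [1500]
p = enrichment_p(dmps, loci, probes, window=2000)
```

With the +1 correction the smallest attainable p-value for 1000 random bins is 1/1001, so very small p-values from such a comparison reflect an analytical approximation or a larger number of draws.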

Table 3.14 Enrichment of DMP in different window sizes around AS-associated loci.

Bold values are significant (p-value<0.05)

| Window size (kb) | CD4+ T-cell | CD8+ T-cell | CD14+ monocyte | γδ T-cell | NK cell |
|---|---|---|---|---|---|
| 25 | 0.39 | 0.62 | 0.09 | 0.88 | 0.09 |
| 50 | 0.90 | 0.99 | 0.74 | 0.68 | 0.02 |
| 100 | 0.41 | 0.77 | 0.99 | 0.89 | 0.25 |
| 250 | 0.55 | 0.09 | 0.10 | 0.09 | 0.43 |
| 500 | 0.04 | 0.01 | 0.05 | 0.06 | 0.10 |
| 1000 | 4.95x10-3 | 0.01 | 2.02x10-4 | 3.65x10-6 | 0.03 |

No significant enrichment (p-value <0.05) was observed in windows 250kb or smaller in any cell type (Table 3.14). When the total number of probes available in the AS-associated genetic regions on the MethylationEPIC array was examined, many of these smaller windows contained no or very few probes (<10 probes) (Table 3.15).

Table 3.15. The number of probes available at the AS-associated loci within the window sizes examined.

| Window size | 25 kb | 50 kb | 100 kb | 250 kb | 500 kb | 1000 kb |
|---|---|---|---|---|---|---|
| No. probes (range) | 0-54 | 0-116 | 2-150 | 8-342 | 16-610 | 40-1377 |
| Loci with 0 probes | 9 | 1 | 0 | 0 | 0 | 0 |


The lack of probes in these regions is further compounded by the relatively small numbers of DMP available for analysis. DMP for each cell type ranged from 12,291 in CD4+T-cells to 4,457 in γδ T-cells. From a pool of >850,000 profiled CpGs, which themselves only account for ~3% of the CpGs genome-wide, this is an extremely small proportion of CpGs that are significant DMP. This resulted in a large percentage of loci within each window with no DMP (Table 3.16).

Table 3.16 The percent of AS-associated loci within each window that had no DMP.

| Window size | CD4+T-cell | CD8+T-cell | CD14+ monocyte | γδ T-cell | NK cell |
|---|---|---|---|---|---|
| 25 kb | 81.7% | 86.1% | 91.3% | 91.3% | 97.4% |
| 50 kb | 74.8% | 73.9% | 87.8% | 84.3% | 95.7% |
| 100 kb | 62.6% | 56.5% | 80.9% | 71.3% | 83.5% |
| 250 kb | 28.7% | 22.6% | 55.7% | 47.8% | 53.9% |
| 500 kb | 14.8% | 8.7% | 34.8% | 30.4% | 29.6% |
| 1000 kb | 4.3% | 2.6% | 14.8% | 13.0% | 14.8% |

The same approach was applied to other complex diseases, IBD, psoriasis, RA and SLE, with their respective associated genetic loci. All diseases had fewer loci with no probes available and all except SLE had enrichment of DMP (p-value <0.05) within smaller windows than AS (from 100kb). These results indicate that coverage for AS-associated genetic loci is reduced in comparison to other common complex diseases.

The smaller number of DMP identified within each cell type, which may be influenced by study power, prevents conclusions on whether DNA methylation changes are enriched in the regions surrounding AS-associated genetic loci. The lack of enrichment and small number of DMP could be either an indication that there is no change in DNA methylation due to AS, or that this study is underpowered to identify those changes.


Discussion

Genetic association studies have previously identified AS-associated loci in DNMT3A, DNMT3B and DNMT3L, indicating the potential for DNA methylation changes to play a role in AS pathogenesis. This chapter outlines the first genome-wide study of changes in DNA methylation within isolated cell subsets of AS individuals compared to healthy individuals.

A total of 100 individuals were recruited for cell isolation. From these, 420 cell type samples had sufficient DNA for use in DNA methylation analysis. No difference in quality or quantity was observed between the different cell types, indicating that insufficient material was due to the low prevalence of NK cells and γδ T-cells in PBMCs, rather than any difference between AS and healthy individuals (demographics for samples with these cell types in Appendix E). In PCA, cell type was the major contributor to differences between samples in PC 1 and 2, more than disease status or any other variable. This supports the need to examine cell types in isolation to identify methylation changes due to disease.

One limitation of this study is the degree to which cell types were sorted. All five cell types examined are capable of further division based on function. For example, CD4+T-cells consist of several cell subsets with highly divergent purposes, including naïve CD4+T-cells, regulatory T-cells and effector T-cells. The categorisation used was selected firstly, as there is a limited understanding of which subsets are contributing to AS; secondly, as surveying higher-level categories provides an overview of shifts in overall functional capacity; and finally, for practical considerations of cost and sample availability. The input required for the MethylationEPIC is approximately 100,000 cells, and alternative single cell approaches are substantially more expensive. The ability to profile more biological replicates was favoured over isolating more specific cell types.

After samples were subset by cell type, DNA methylation results were examined for technical and biological variables which could influence analysis. Technical variables were plate and slide, variation in which can arise during manufacture. Biological variables affecting DNA methylation were those previously identified as unbalanced between cohorts, being sex, age and smoking status. These variables were included as covariates in DMP analysis. It is noteworthy that these variables are not adjusted for in numerous previous DNA methylation studies despite their known effect on DNA methylation (discussed in Chapter 1). Further, this study went to lengths to minimise the effect of processing on DNA methylation (among other measurements) by ensuring blood samples were frozen on the day of being taken and that DNA and RNA were extracted immediately after defrosting and FACS sorting.

Relatively small changes in the average DNA methylation of individual CpGs between AS and healthy individuals were observed in all cell types (<20%). Changes of this size would be missed in several previous studies due to the use of a cut-off for minimum change in beta value of 20%. The use of 20% is not based on a specific observation, nor does it appear to have been selected based on study power. The largest methylation study in IBD observed a s.d. between test groups of 0.02 188. The current study observed a range of effect sizes from 0.001 (CD8+T-cells) to 0.27 (γδ T-cells). The small effect size and variation in methylation may go some way to explaining why so few DMP have previously been identified in AS, or in other complex diseases. It also indicates that the effect of DNA methylation varies between cell types. The observed modest changes in DNA methylation may be an indication of low-level inflammatory signalling in vivo in the steady state, which may not be detectable ex vivo. Future work building on the results below may utilise ex vivo stimulation to uncover the effects of these modest DNA methylation changes at specific key loci.
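The consequence of a fixed minimum change in beta value can be illustrated with a minimal sketch. The beta values below are hypothetical and chosen only to show how a modest but consistent shift passes a small cut-off while being discarded by a 20% cut-off; the function names are illustrative and are not taken from the analysis pipeline used in this thesis.

```python
from statistics import mean

def delta_beta(case_betas, control_betas):
    """Difference in mean beta value (cases minus controls) for one CpG."""
    return mean(case_betas) - mean(control_betas)

def passes_cutoff(d_beta, min_abs_change):
    """True when the absolute change in mean beta meets the cut-off."""
    return abs(d_beta) >= min_abs_change

# Hypothetical beta values for a single CpG (fraction methylated, 0 to 1).
as_betas = [0.62, 0.60, 0.65, 0.61]
hc_betas = [0.58, 0.57, 0.60, 0.57]

d = delta_beta(as_betas, hc_betas)    # a shift of roughly 0.04
kept_at_20 = passes_cutoff(d, 0.20)   # the 20% cut-off used in earlier studies
kept_at_02 = passes_cutoff(d, 0.02)   # a smaller, illustrative cut-off
```

Under these assumed values the CpG is retained at the smaller cut-off and excluded at 20%, which is the situation described above for changes of the size observed in this study.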

Enrichment analysis for DMP near AS-associated loci indicated that a large proportion of AS genetic loci had no DMP within 100 kb, and that several did not even have probes within this window. This was not observed when enrichment was compared with that for other complex diseases (IBD, RA, psoriasis and SLE). This suggests that the MethylationEPIC array design may not provide sufficient coverage of regions associated with AS. Further, the small number of DMP may indicate either that there is no change in DNA methylation due to AS or that this study is underpowered to identify those changes. The current study is the largest study of CpG methylation changes in AS. Despite this, current methods for multiple testing adjustment require cohort sizes closer to those used in genetic association studies to obtain significance for the >800,000 CpGs tested using the MethylationEPIC (in some estimates ~1000 individuals) 275. The current study has shown that this can be challenging with regard both to recruitment itself and to the lower prevalence of specific cell types, such as γδ T-cells and NK cells.
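The distinction drawn above between loci with no DMP and loci with no probes at all can be sketched as a simple window check. This is a minimal illustration only: the coordinates, probe identifiers and function names are hypothetical, and only the 100 kb window is taken from the text.

```python
WINDOW_BP = 100_000  # window size taken from the text (100 kb)

def probes_near_locus(locus, probes, window_bp=WINDOW_BP):
    """Return IDs of probes on the locus chromosome within window_bp of it."""
    chrom, position = locus
    return [
        probe_id
        for probe_id, (probe_chrom, probe_pos) in probes.items()
        if probe_chrom == chrom and abs(probe_pos - position) <= window_bp
    ]

def summarise_locus(locus, probes, dmp_ids, window_bp=WINDOW_BP):
    """Count array probes and DMP that fall inside the window around a locus."""
    nearby = probes_near_locus(locus, probes, window_bp)
    return {
        "probes": len(nearby),
        "dmp": sum(1 for probe_id in nearby if probe_id in dmp_ids),
    }

# Hypothetical coordinates and probe IDs, for illustration only.
probes = {
    "cg_a": ("chr1", 1_050_000),
    "cg_b": ("chr1", 1_250_000),
    "cg_c": ("chr2", 1_000_000),
}
dmp_ids = {"cg_a"}

covered = summarise_locus(("chr1", 1_000_000), probes, dmp_ids)
uncovered = summarise_locus(("chr3", 5_000_000), probes, dmp_ids)
```

A locus that returns zero probes cannot yield a DMP regardless of the underlying biology, so such loci reflect array coverage rather than an absence of methylation change.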

In light of this limitation, permutation analysis was performed by shuffling the disease status of each sample. The permuted FDR results at p-value ≤ 0.01 were 0.21, 0.16, 1.15, 0.48, and 1.05 for CD4+T-cells, CD8+T-cells, γδ T-cells, NK cells and CD14+monocytes respectively. Therefore, results for γδ T-cells and CD14+monocytes should be interpreted with the knowledge that there are likely to be more false positives than true positives. Coordinated changes from DMR and pathway analysis will still provide information on potential pathways implicated in these cell types. This chapter seeks to highlight the potential avenues for future functional work and provide insight into the pathways implicated in AS in each of the cell types surveyed.
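One way a permuted FDR of this kind can be estimated is sketched below. It assumes the estimate is the mean number of hits obtained with shuffled disease labels divided by the number of hits obtained with the true labels, which is one common estimator and is consistent with values above 1; the exact estimator used in this chapter is not restated here. The hit-counting function is a toy stand-in for the DMP test, and all data are hypothetical.

```python
import random
from statistics import mean

def count_hits(matrix, labels, min_diff):
    """Toy stand-in for a DMP test: count CpGs whose AS and HC group means
    differ by at least min_diff."""
    hits = 0
    for row in matrix:
        cases = [v for v, lab in zip(row, labels) if lab == "AS"]
        controls = [v for v, lab in zip(row, labels) if lab == "HC"]
        if abs(mean(cases) - mean(controls)) >= min_diff:
            hits += 1
    return hits

def fdr_ratio(observed_hits, permuted_hits):
    """Mean number of hits under shuffled labels divided by observed hits."""
    if observed_hits == 0:
        return float("nan")
    return mean(permuted_hits) / observed_hits

def permuted_fdr(matrix, labels, min_diff, n_permutations=100, seed=0):
    """Shuffle disease status n_permutations times and return the ratio."""
    rng = random.Random(seed)
    observed = count_hits(matrix, labels, min_diff)
    permuted_counts = []
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # group sizes are preserved, only labels move
        permuted_counts.append(count_hits(matrix, shuffled, min_diff))
    return fdr_ratio(observed, permuted_counts)

# Hypothetical beta values: rows are CpGs, columns are samples.
labels = ["AS", "AS", "AS", "HC", "HC", "HC"]
matrix = [
    [0.9, 0.8, 0.9, 0.1, 0.2, 0.1],
    [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
    [0.4, 0.6, 0.5, 0.5, 0.4, 0.6],
]
estimate = permuted_fdr(matrix, labels, min_diff=0.5)
```

Under this estimator, a value above 1 means that shuffled labels produced, on average, more hits than the true labels, which is why the γδ T-cell and CD14+monocyte results warrant cautious interpretation.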

CD4+T-cells

CD4+T-cell DMP were associated with immune-related genes (TMEM204, SUV420H1, TOLLIP, IL21R, ZSWIM7, CCDC88C, NFATC2). All of these DMP were located within the gene body, except for the NFATC2 CpG, which was associated with TSS1500. Interpreting the functional effect of changes in DNA methylation is highly context dependent. It is typically assumed that increased DNA methylation at a promoter region or gene body produces an inverse change in gene expression; conversely, increased DNA methylation in effector regions can result in increased binding efficiency, and therefore increased gene expression. The results from this chapter will be interpreted based on these assumptions. For example, the CpG with the strongest disease association in CD4+T-cells was in the 3’UTR/Body of PRKAG2. The assumption therefore would be that the increased methylation in cg03740323 indicates decreased expression of PRKAG2.

Most of the 20 DMP with the lowest p-value in CD4+T-cells were located in the ‘open sea’ (≥ 2kb outside CpG Island regions). The current version of the Illumina methylation beadchip array incorporates a greater number of open sea regions compared to previous versions. However, annotation data for the open sea regions is still being developed, and there is a lack of gene annotation for several of the most significant DMP within the open sea.

TMEM204 contained several DMR spanning the length of the gene, all exhibiting decreased methylation in AS individuals compared to healthy individuals. TMEM204 is associated with stress response and lymph vessel development but is not fully characterised 267. The most significant DMR in CD4+T-cells (Stouffer = 0.33, Δβ = -0.04) was associated with the promoter region of AKTIP, which increases the release of TNFSF6 276,277. SOCS3, a regulator of T-cell differentiation, was associated with several DMP with decreased methylation. These changes suggest increased expression of TMEM204, AKTIP, and SOCS3 in AS individual CD4+T-cells. High levels of expression of TMEM204 and AKTIP occur in naïve T-cells 256. Increased SOCS3 expression is associated with increased Th2 differentiation, and suppression of Th1 and Th17 differentiation from naïve CD4+T-cells 278.
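The Stouffer values reported for DMR summarise evidence across the CpGs within a region. As a minimal sketch of the underlying idea, the unweighted Stouffer Z method for combining one-sided p-values is shown below. This is an assumption about the general technique rather than a reproduction of the DMR software used here, and some DMR tools apply the transformation to per-CpG FDR values rather than raw p-values.

```python
from math import sqrt
from statistics import NormalDist

def stouffer_combined_p(p_values):
    """Combine one-sided p-values (each strictly between 0 and 1) using
    Stouffer's unweighted Z method."""
    std_normal = NormalDist()
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in p_values]
    combined_z = sum(z_scores) / sqrt(len(z_scores))
    return 1.0 - std_normal.cdf(combined_z)

# Hypothetical regions with the same per-CpG evidence but different sizes.
two_cpg_region = stouffer_combined_p([0.2, 0.2])
ten_cpg_region = stouffer_combined_p([0.2] * 10)
```

With identical per-CpG evidence, the region containing more CpGs gives a smaller combined value, which illustrates why regions supported by only a few DMP tend to retain large Stouffer values.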

Pathway analysis indicated decreased methylation in Wnt signalling, and increased methylation with terms associated with negative regulation of Wnt signalling, suggesting increased Wnt signalling within AS individual CD4+T-cells. Wnt signalling is involved in T-cell differentiation. Decreased methylation was observed in genes associated with IL-2, Th1 and Th2 cell differentiation (JAK3, JAK1, STAT5A, MAML2, NOTCH3, LAT, HLA-DRB5, NFATC3), and Th17 cell differentiation (MTOR, JAK1, STAT5A, JAK3, LAT, HLA-DRB5, NFATC3, NFATC2, RXRB, IL21R, RORA, IL6R), indicating upregulation of these pathways associated with activated T-cells. Several of the genes involved in these T-cell activation pathways have previously been associated with AS through genetic association studies (including RORC, IL6R, IL1R1 and IL1R2). Together these changes in DNA methylation suggest that in AS cases there are changes in expression of genes associated with increased CD4+T-cell differentiation into either Th1 or Th17 cells. These cells are both inflammatory types of CD4+T-cell subsets.

CD8+T-cells

CD8+T-cells showed the most significant DNA methylation changes of the five cell types, with strong changes in RUNX3, EOMES, CLEC3A, CTLA4, ETS1 and SMAD3, all in keeping with an adaptive response of CD8+T-cells. This adaptive response is part of the MHC-I-opathy hypothesis for AS, which is based on the association of several diseases including AS, IBD and psoriasis with MHC alleles and ERAP1. The hypothesis is that barrier dysfunction in environmentally exposed organs (for example skin or gut) and aberrant immune reactions trigger secondary adaptive immune responses in CD8+T-cells to drive inflammation in these diseases. The theory proposes that specific MHC alleles predispose different organs to being affected, which is why these diseases with genetic overlap present so distinctly within the body. The observations of this thesis, that the changes in CD8+T-cells were more statistically significant than those in the other cell types examined and strongly indicated CD8+T-cell activation, are in keeping with this theory.

The DMR with increased methylation overlapped with promoters including EOMES, IKZF3, CLEC3A, and TOX, indicating potential suppression of these genes. EOMES and IKZF3 both contain loci associated with AS at genome-wide significance. EOMES is a transcription factor involved in CD8+T-cell differentiation into effector and memory phases in conjunction with T-bet, which is also associated with AS. EOMES is also involved in T-cell differentiation into Th1 cells.

The majority of DMP identified in CD8+T-cells were hypermethylated; however, very few DMP had large changes in DNA methylation (∆β >0.04). The most significant DMP in CD8+T-cells was associated with a DMR within the promoter region of PDE4A (Stouffer 0.91, mean Δβ = 0.02). PDE4A regulates cyclic-AMP (cAMP) signalling and is associated with osteoclast function and inflammation 251,252. All four of the PDE4 genes are increased in inflammatory cells and lead to increased expression of inflammatory cytokines including TNF-α 254. The changes observed in both CD4+T-cells and CD8+T-cells suggest that PDE4A may have increased expression. Unexpectedly, a PDE4 inhibitor (apremilast) performed worse than placebo in a trial in AS 255. This may be due to PDE4A not driving inflammation in AS, or to functional redundancy in AS inflammation.

SKI, a negative regulator of TGF-β signalling, was associated with several DMP and a DMR in CD8+T-cells with decreased methylation in AS individuals. SKI was associated with hypomethylated terms relating to morphogenesis (CREBBP, TBC1D32, GNA12, ROR2, CHST11, PTCH1, SKI, ZBTB16, IFT140). This may indicate upregulation of TGF-β signalling.

CD8+T-cells had reduced methylation in positive regulation of cell-cell adhesion (MAPK14, CTLA4, CYLD, DTX1, ETS1, GATA3, IL4R, ITGA6, MIA3, MUC21, PRKCA, TWSG1, ZMIZ1, RASAL3, TFRC, ZBTB16, FERMT3, LOXL3, SKAP1, RUNX3, PDE5A, ADAM19, CD44, PIEZO1, TESPA1), and regulation of T-cell differentiation (DROSHA, RUNX3, CTLA4, IL4R, GATA3, CYLD, DTX1, TESPA1, LOXL3, ZBTB16, ZMIZ1). Increased methylation in CD8+T-cells was associated with cell junction assembly (LIMS2, CD9, SMAD3, ACTN3, BCAS3, WNT4, PRKCA, CTNNB1, CDH4, OCLN, TLN2, TBCD, ABL1, PKP1, AGT, PARD3). Several of the genes identified as altered in CD8+T-cell pathway analysis are associated with AS genetic loci, including ETS1, RUNX3, and SMAD3. IL4R is a receptor for IL4 which can promote differentiation into Th2 cells 279,280. The indicated loss of methylation in RUNX3, CTLA4 and IL4R in CD8+T-cells is suggestive of CD8+T-cell activation.

CD14+monocytes

The majority (90%) of the DMP identified in CD14+monocytes had increased methylation, although fewer than 10 had a change greater than 5%. This may indicate that DNA methylation is not altered within CD14+monocytes to the extent of other cell types. CD14+monocyte DMP were mostly located in TSS with increased methylation, indicating potential increased expression of associated genes. The genes identified included TCF4, SYT2, TP73, ANKMY1, ANK3 and RAB5B.

DMR analysis was limited by the small number of DMP identified, with the lowest Stouffer value shared by 29 DMR with only 2 DMP associated. Only 13 DMR had an absolute mean beta fold change of more than 4.5%, and these were associated with the promoter regions of RGMA, OXR1, RP11-672A2.4-001, VTRNA, C22orf34, TMEM232, HOXA1, NPY2R, and RP11-2A4.3, several of which have unknown function.

Pathway analysis identified that hypermethylated DMP were associated with the GO term positive regulation of protein localization to cell surface, and KEGG terms associated with metabolism (TH, IL4I1, ALDH1A3, PCYT2, ETNPPL, DGKG, GPAM, GALNT7, GALNT9, GALNT18). Reduced methylation was associated with GO terms of IFNγ signalling (HLA-C, SP100, TRIM26, NMI), regulation of IFNβ production (FLOT1, TBK1, NMI), and regulation of protein localization to membrane (GPC6, CLTC, ANK3, TP73, CRIPT), with KEGG terms of cell substrate adhesion, IFNγ signalling and IFNβ. These give some insight into changes in CD14+monocyte function by indicating upregulation of IFNγ and IFNβ signalling. The upregulation of these interferon signalling pathways indicates a proinflammatory state in CD14+monocytes in AS individuals compared to healthy controls. These changes are consistent with the inflammatory state previously observed in PBMCs. A rare side effect of anti-TNF-α therapy is drug-induced systemic lupus erythematosus (SLE), which in one theory is hypothesised to be driven by the anti-TNF-α therapy causing a shift to Th2 cytokine production, which includes IFN-α and IL-10 281.

The small number of changes in methylation within CD14+monocytes may be due to the smaller influence of methylation on CD14+monocyte function. Alternatively, it may indicate that the changes that do occur in CD14+monocytes are smaller than in other cell types and that the current study is underpowered to identify those changes.

γδ T-cells

Several γδ T-cell DMP were associated with genes involved in T-cell activation (FAM49B, MLST8, PREX1, ADAM12, ERI1, ACKR2, EPS15L1). FAM49B is a key regulator of actin dynamics and T-cell activation through the repression of Rac activity, and had decreased methylation in AS individuals 282,283. PREX1 is involved in secretion of IL-2, IL-4, and IL-10, all of which are associated with T-cell activation.

Two of the DMP identified have previously been identified in psoriasis: cg06301412 (AADACL2-AS1) and cg03781837 (ACKR2) 284. ACKR2 suppresses Th17 responses and IFN-γ producing γδ T-cells 285,286. The increased methylation of CpGs within the body of ACKR2 suggests decreased expression of this gene in AS individuals and potentially increased Th17 responses.

DMR analysis was limited by the small number of DMP. DMR with larger numbers of DMP included HLA-DPB1 (mean beta FC 0.05), TNFAIP8 (mean beta FC 0.03), and HOXA2 (mean beta FC -0.05). This indicates that the large Stouffer values are likely due to lack of statistical power rather than a lack of biological change. HOXA2 is involved in the Wnt-β-catenin pathway and affects skeletal development 268,269, and TNFAIP8 is thought to potentially negatively regulate immunometabolism 270,271.

In pathway analysis this is reflected by hypermethylated DMP being associated with KEGG terms related to Notch signalling, and Toll-like receptor signalling (TOLLIP, TAB2, IRF3), and hypomethylated DMP being associated with GO terms of regulation of inflammatory response (ITGAM, GPX4, CNR2, A2M, METRNL, PSMA6, GPRC5B) and regulation of T-cell activation. KEGG terms for decreased methylation were related to a cluster of shared genes for terms leukocyte transendothelial migration, chemokine signalling pathway, focal adhesion and bacterial invasion of epithelial cells (PREX1, SEPT9, PXN, RAP1A, ITGAM, ITGA1). Cumulatively, these changes indicate upregulation of T-cell activation pathways including Wnt signalling and inflammatory responses. Due to the limited statistical power of γδ T-cells, it is difficult to comment on whether these methylation changes are driving a specific inflammatory response within γδ T-cells, such as Th17.

NK cells

NK cell DMP were in immune cell function-related genes (C9orf91, HOXA2, SERPINB6, JUN, TRAK1). C9orf91 contains a genome-wide significant locus associated with AS 89,257. The function of C9orf91 is unknown. The most significant DMP and DMR in NK cells were both associated with AKAP8L, which may interact with histone methyltransferase H3K4 258. AKAP8L is still undergoing characterisation but has been associated with constitutive transport element (CTE)-mediated gene expression and potentially with nuclear envelope breakdown and chromatin condensation 272,273. AS individuals also had increased methylation in CpGs associated with LHX6, a transcriptional repressor of Wnt-β-catenin pathways, and TRAK1, which is involved in endosome to lysosome trafficking 261,262. Decreased methylation of JUN, a gene involved in TLR4 signalling and regulation, was observed. These shifts are all indicative of activation of NK cells through the TLR pathway.

However, almost half of the top DMP identified in NK cells were associated with non-coding transcripts of unknown function, or regions not associated with any gene or function to date. This is a particular difficulty in the interpretation of DNA methylation data, which has less robust annotation compared to other measures such as gene expression. The sheer number of CpGs capable of being methylated has made developing a comprehensive annotation for CpGs a long-term project. Annotation is most comprehensive in regions of known function, or where CpGs lie within gene regions, but the function and gene associations of CpGs within the open sea and/or intergenic regions remain incompletely annotated. Updating this dataset in the future may reveal the role of these currently unannotated transcripts in the NK cells of AS individuals.

Pathways related to decreased methylation involved a group of interconnected genes (HLA-C, HLA-DPB1, HLA-DMA, HLA-DMB, CCND1, IL2A, JUN, ETS2, RELA, ATF2) that encompassed KEGG terms of Th1 and Th2 cell differentiation, viral myocarditis, human T cell leukemia virus 1 infection and inflammatory bowel disease. These terms all involve NK cell activation in response to HLA engagement, and the genes are reflective of this process. The unannotated CpGs and genes with unknown function limit interpretation of the functional impact of methylation change in NK cells. Overall, the changes in DNA methylation in NK cells that had known function indicate upregulation of Wnt signalling in conjunction with TLR signalling and chromatin remodelling.

Conclusion

Understanding the specific changes affecting individual cell types involved in AS requires a greater understanding of the pathways affected in individual cell types, and the role of AS-associated loci in those changes. This study provides a robust dataset to examine multiple questions regarding AS pathogenesis, and basic biological questions for cell types that do not currently have publicly available data (γδ T-cells). PCA emphasised the importance of examining cell types in isolation where possible, rather than deconvolution of mixed cell populations, as the cell type differences are much greater than disease-based variability. The literature supports the altered composition of PBMCs in AS individuals, and in inflammatory settings, and this was also observed in the current study's collection of PBMCs.

This chapter examined three hypotheses:

1. Cell type specific changes in DNA methylation occur in AS.

2. These changes are coordinated as regions (DMR).

3. Changes in DNA methylation occur in specific biological pathways related to immune function.

Cell type specific changes in DNA methylation were observed in AS individuals compared to healthy individuals, although none of these changes were statistically significant. The changes were coordinated in regions (DMR), as shown with SKI in CD8+T-cells, which was identified as a DMP and DMR. Further, the pathways these changes were observed in were related to immune function, such as the TLR pathway, Wnt-β-catenin pathway and TNF receptor signalling pathways, amongst others. This confirms that AS individuals have altered DNA methylation patterns and suggests these could be used in future analysis for methods of treatment or diagnosis.

Power remains an issue despite this study being the largest DNA methylation study in AS; however, the use of sample randomisation and careful cohort selection has enabled a clean dataset for analysis. The preference for including potential biological confounders as covariates may be a conservative approach but provides confidence in the changes observed being due to AS. Genome-wide changes were observed in the immune cells of AS individuals compared to healthy individuals; however, these changes are small (<5%), indicating altered methylation rather than a loss of DNA methylation regulation. Altered immune function is apparent in all 5 cell types profiled, and whether these changes result in altered gene expression will be investigated in Chapter 4 using the paired RNAseq data. Further studies are required to confirm the findings outlined in this chapter, and to investigate functionally annotating those DMP that currently have no known function.

Chapter 4: RNA expression in AS

Introduction

Transcription produces a broad range of RNA types, both coding and non-coding, each with its own role in cellular function. Improvements in sequencing technology to capture total RNA throughout the past decade have resulted in an increase in the number of studies examining non-coding RNA (both miRNA and lncRNA). Specific capture methods enable the selection of either mRNA (using poly-A tail methods), total RNA (>200nt), or miRNAs (<100nt) for sequencing. After RNA capture, RNA can be profiled through several different approaches, with the most common being microarray (curated sequence approach), RT-PCR (quantitative but not scalable), and RNAseq. RNAseq, a next-generation sequencing technique, allows direct sequencing of RNA, the absolute quantification of RNA transcripts, and simultaneous examination of multiple samples through the incorporation of unique barcode adapters 182. No method is capable of capturing all RNAs due to the broad range in RNA size, which may impact the efficiency of sequencing. Therefore, the methods used must be selected for the specific biological question being investigated.

The depth of sequencing is important for sufficient coverage of transcripts for downstream analysis, with targeted expression studies requiring less than 5 million reads per sample and new transcript assembly requiring over 100 million reads per sample 287. Typical experiments investigating broad gene expression use around 50 million reads per sample, with the ENCODE guidelines requiring at least 30 million aligned reads for each sample 288. Depth of sequencing is not a replacement for biological replicates, and increasing sequencing depth has diminishing returns on the power to detect differentially expressed genes compared to using additional biological replicates at any sequencing depth 289.

RNAseq does not directly quantify changes in gene expression, but rather provides a means of measuring relative changes in gene expression between experimental samples. These measurements are a profile of the sample population, meaning that RNAseq with PBMCs does not allow the cells causing shifts in expression to be identified. Isolating cell types prior to RNAseq transcriptional profiling enables a greater understanding of cell type specific variation and disease related changes. This is particularly relevant in PBMCs where individual cell type proportions can vary substantially even in healthy individuals (outlined in Table 1.1). RNAseq and microarray results are generally validated using qPCR methods, although only variation above two log2 fold change can be validated due to the sensitivity of qPCR.

Transcriptomics in AS

AS-associated inflammation involves changes in immune cell numbers and function in affected tissues such as the spine and sacroiliac joints. Due to the relative inaccessibility of these sites, previous AS studies have largely focused on investigating underlying changes in gene expression in blood or synovial fluid. As mentioned in Chapter 3, the sample types used are not reflective of a generally accepted initiation site of disease; however, immunopathogenesis can be readily detected in the periphery, providing an informative perspective on disease.

Previous studies of DEG in AS have identified signatures for inflammation and immune activation (outlined in Chapter 1). Inter-study replication of DEGs has been relatively limited, with a variety of immune pathways implicated. The DEG identified in these studies included genes implicated in AS through genetic association studies, such as RUNX3, IL6R, CXCR1, NFKB1A, PTGER4, TNFAIP3, TLR4, and NOTCH1.

Four published studies have used RNAseq to examine changes in AS transcripts, all involving relatively small cohorts investigating blood samples or blood-derived cells 204,206,208,209. The sole common pathway identified in these studies was haematopoietic cell lineage, but other pathways identified included T-cell receptor signalling, DNA repair and intestinal immune network for IgA.

These pathways are reflective of the genetic association with T-cells and gut microbiota implicated in previous microarray studies. Whilst there has been agreement in the pathways identified as differentially expressed in AS, the use of different sample types, different RNA profiling methods and relatively small sample sizes (most < 20 individuals total) has led to inconsistent identification of DEGs in AS.

Chapter outline

This chapter outlines the first study to compare the transcriptional profile of individual cell subsets from AS and healthy individuals. The examination of these cell types in isolation has enabled the identification of which cell subsets are driving transcriptional changes observed within PBMCs, which might be masked by variation inherent within the PBMC population.

The RNA used in this chapter was obtained from the same PBMC vials, at the same time, as the DNA used in the previous chapter, and as such the data can be interpreted in light of the results from Chapter 3. A low-input library preparation method was selected due to the low cell numbers and low amount of RNA per cell in many samples, particularly with regard to NK cells and γδ T-cells. Methodologies for gene expression profiling and analysis are well established in comparison to DNA methylation methods, and standards for the quality of library preparation, sequencing and data quality control were used to verify the quality of this dataset.

The first section of this chapter will discuss the sample and sequencing quality, which can influence the accuracy of read alignment and differential expression counts. Analysis of the differentially expressed transcripts is then outlined, including calculation of permuted FDR for each cell type, expression of AS-associated genes and the annotated genes for the most significant associations. Finally, pathway analysis of the GO and KEGG terms associated with DEGs will be outlined for each cell type.

I raise the following hypotheses:

1. Cell specific changes in transcription occur in AS

2. Changes in transcription will include genes located at or near AS-associated loci

3. Changes in gene expression associated with AS are co-ordinated within specific biological pathways.

Results

Sequencing metrics

The use of a low-input library preparation kit maximised the number of samples able to be prepared and sequenced, with 440 of the 500 samples extracted having sufficient material for sequencing (10 ng). All libraries had an average read length greater than 300 bp after library preparation (the sequencing kit used 150 cycles, resulting in ~150bp sequencing length). The pooling strategy for this project involved sequencing 386 samples across two runs, providing approximately half the reads necessary in each run to allow for adjustment in case issues related to pooling or sequencing arose. No adjustments were made between the first and second sequencing runs; however, samples that did not reach the required 40 million reads were incorporated into the third and final sequencing run. A total of 53 samples were sequenced three times, of which 17 sample libraries were prepared again with different adapters to enable pooling. No difference was observed in the ratio of reads nor the read quality for those libraries that were processed with new adapters.

The average number of raw reads per sample was 49 million, ranging from 38 million to 107 million paired-end reads. Four samples had less than 40% of reads aligned, and their total aligned read counts were below 20 million paired reads (Figure 4.1). These samples (three γδ T-cell samples and one CD8+T-cell sample) were removed from downstream analysis. After removal of the low read samples, the average number of aligned reads per sample was 39.2 million (minimum 23.3 million to maximum 87.4 million), with an average mapped length of 287 bp.

All subsequent analysis was carried out on the STAR aligned transcripts sorted into forward and reverse strand reads. Aligned library size was highly variable, as indicated above, therefore samples were adjusted for library size using VST. After adjustment for library size, PCA showed samples clustered closely by cell type within PC 1 and 2 (Figure 4.2). As observed with the DNA methylation data in Chapter 3, the cell type clustering followed cell lineage, with all three T-cell subsets clustered closely together and CD14+monocytes clustered the furthest.
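The need to adjust for library size before comparing samples can be illustrated with a minimal sketch of median-of-ratios size factors, a scaling commonly applied to count data ahead of variance-stabilising transformation. This is an assumed illustration of the library-size step only; the transformation itself is not reproduced, and the counts and function name are hypothetical.

```python
from math import exp, log
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of per-sample read counts.
    Returns one scaling factor per sample.
    """
    n_samples = len(counts[0])
    # Use only genes detected in every sample so that the log is defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log geometric mean of each usable gene across samples.
    log_geo_means = [sum(log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(exp(median(log_ratios)))
    return factors

# Hypothetical counts: the second sample was sequenced twice as deeply.
counts = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 3],  # excluded from the calculation because one count is zero
]
factors = size_factors(counts)
normalised = [[c / f for c, f in zip(row, factors)] for row in counts]
```

Dividing each sample by its factor places the two hypothetical libraries on a common scale, so that differences in depth are not mistaken for differences in expression.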

Figure 4.1 Alignment statistics from STAR for number and percentage of reads aligned for each sample.

Figure 4.2 PCA plot for RNAseq counts showing cell type within the first and second PCs. Labelled samples are outside their respective cohorts and exhibit different gene expression levels to their respective cell types.

Figure 4.3. Heat scree plot for the influence of technical and biological variables on variation within each PC.

After normalisation for library size, samples were assessed for an effect of technical and biological variables on expression profiles using PCA. This is illustrated by heat scree plots for the first 20 PC (Figure 4.3). The analysis software used, DESeq2, recommends testing samples as a complete cohort then performing individual comparisons between groups of interest. Cell type was the only significant variable in PC 1 and 2. Library preparation group and sequencing batch were both significant technical variables captured by PC 3 and 4. Sequencing batches 1 and 2 consisted of the same library preparation groups. As can be observed in Figure 4.3, the variation due to sequencing batch is less than that due to library preparation group. This suggests that library preparation group was driving batch variation. Library preparation was included as a covariate in the study design alongside biological variables of age, sex, and smoking status, which were also significant. Inclusion in the analysis design was selected over computational adjustment for library preparation effects, as there is a potential for the covariate effects to vary between genes and between cell types. To enable this, samples were labelled with a group variable, consisting of cell type and AS status (e.g. CD4.AS).
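The group variable described above can be sketched as follows. Only the label form "CD4.AS" is taken from the text; the remaining cell type codes, the status code for healthy controls and the model term names are hypothetical, and the sketch is not the design specification used in the analysis itself.

```python
from itertools import product

CELL_TYPES = ["CD4", "CD8", "GD", "NK", "CD14"]  # hypothetical short codes
STATUSES = ["AS", "HC"]                          # "HC" assumed for healthy controls

def group_label(cell_type, status):
    """Join cell type and disease status into one label, e.g. 'CD4.AS'."""
    return f"{cell_type}.{status}"

def within_cell_type_contrasts(cell_types):
    """One (case, control) pair per cell type: AS versus healthy control."""
    return [(group_label(c, "AS"), group_label(c, "HC")) for c in cell_types]

groups = [group_label(c, s) for c, s in product(CELL_TYPES, STATUSES)]
contrasts = within_cell_type_contrasts(CELL_TYPES)

# Hypothetical model terms mirroring the covariates named in the text.
design_terms = ["library_prep", "age", "sex", "smoking", "group"]
```

Fitting one model across the complete cohort and then comparing the AS and healthy control levels within each cell type keeps the covariate estimates shared while allowing each comparison of interest to be extracted separately.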

Using linear regression modelling without covariates, the contribution of AS status to variability in transcriptional profile was examined within each cell type separately to determine if it was a significant source of variation within the initial PCs. Visual interpretation of the variation displayed within a PCA plot is difficult; however, AS status was a significant source of transcriptional variation within the first 3 PC in CD4+T-cells, CD8+T-cells, NK cells and CD14+monocytes. After quality control, similar numbers of AS and healthy individual samples were available for analysis (Table 4.1).

Table 4.1 Sample numbers for AS individuals and healthy controls post QC.

Cell type   CD4+T-cells   CD8+T-cells   γδ T-cells   NK cells   CD14+monocytes   All cell types
AS          52            52            23           49         52               23
HC          48            47            25           41         47               22
Total       100           99            48           90         99               45

On examining the initial DEG analysis, CD14+monocytes had several significant DEG that were not associated with CD14+monocyte function (e.g. TRAC, TRBC). When expression levels were examined separately for each individual, four healthy controls were found to be driving the variation in these DEG, with significantly different expression profiles from the other healthy or AS individual CD14+monocytes. The four individuals were all female between the ages of 21 and 36, with no other common technical or biological variables. In PCA the same four CD14+monocyte samples grouped separately, as did CD4+T-cell and CD8+T-cell samples from two of the same individuals. No issues with regard to sample purity, RNA extraction, library preparation or pooling were noted. As the source of this variation could not be ascertained, these samples were removed prior to DEG analysis (those labelled within Figure 4.2).

Differential expression analysis

Differential expression analysis was carried out using a generalised linear model within the DESeq2 program. DESeq2 defaults to independent filtering using the mean of normalized counts across all samples to increase detection power with the same experiment-wide type I error. This is necessary as genes with low transcript counts are less likely to show significant differences due to their high count dispersion. Independent filtering was not used, as the low sample number and sample input resulted in inappropriately strict cut-offs for many cell types. For example, NK cell independent filtering excluded almost half the transcripts identified. Transcripts with mean counts of less than 2 were instead removed to exclude lowly expressed or spuriously expressed transcripts.
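The fixed mean-count filter described above can be sketched as follows. The threshold of 2 is taken from the text; the transcript names, counts and function name are hypothetical, and the sketch does not reproduce the independent filtering procedure it replaced.

```python
from statistics import mean

MIN_MEAN_COUNT = 2  # threshold taken from the text

def filter_low_counts(counts, min_mean=MIN_MEAN_COUNT):
    """Keep transcripts whose mean count across samples is at least min_mean."""
    return {
        transcript: values
        for transcript, values in counts.items()
        if mean(values) >= min_mean
    }

# Hypothetical counts per transcript across four samples.
counts = {
    "tx_high": [120, 95, 143, 110],
    "tx_low": [0, 1, 0, 2],        # lowly expressed, mean 0.75
    "tx_spurious": [0, 0, 7, 0],   # detected in a single sample, mean 1.75
}
kept = filter_low_counts(counts)
```

Unlike a data-driven cut-off, a fixed threshold of this kind does not change with cohort size, which avoids the strict cut-offs noted above for cell types with few samples.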

Whilst significant DEG (adjusted p-value <0.05) were identified in all cell types, the number of significant DEG after adjustment for multiple testing was drastically reduced in CD8+T-cells and γδ T-cells (Tables 4.2 and 4.3). A greater number of transcripts were significant in reverse strand reads, which correlates with a higher number of coding genes aligned from the reverse strand (full list of DEG in Appendix F(2)). This indicates that the reverse strand contains many sense strand derived transcripts. The designated orientations of forward and reverse are indicative of adapter orientation for reads within the transcript fragment. Therefore, the aligned forward and reverse reads cannot be combined directly, as this would conflate reads from different directions. In this case the forward reads are representative of the antisense strand, and the reverse reads of the sense strand.
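For reference, the Benjamini-Hochberg adjustment behind the adjusted p-values reported in this chapter can be sketched as below. This is a plain Python illustration of the standard procedure, not the DESeq2 implementation used in the analysis; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values for four transcripts.
example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Because each p-value is scaled by the number of tests divided by its rank, a dataset with many tested transcripts and few strong signals, as in the smaller cell-type groups, retains far fewer DEG after adjustment than before it.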


Table 4.2. DEG numbers at different levels of significance for each cell type in the forward strand dataset.

| Cell type | p-value <0.05 | Adj. p-value <0.05 |
|---|---|---|
| CD4+T-cells | 2326 | 219 |
| CD8+T-cells | 1961 | 23 |
| γδ T-cells | 2283 | 13 |
| NK cells | 2374 | 112 |
| CD14+monocyte | 3253 | 858 |

Table 4.3. DEG numbers at different levels of significance for each cell type in the reverse strand dataset.

| Cell type | p-value <0.05 | Adj. p-value <0.05 |
|---|---|---|
| CD4+T-cells | 3455 | 762 |
| CD8+T-cells | 2308 | 78 |
| γδ T-cells | 2839 | 80 |
| NK cells | 3233 | 3932 |
| CD14+monocyte | 4677 | 1800 |

Benjamini-Hochberg adjustment is a conservative method for multiple testing adjustment. For comparison, a permuted FDR was calculated using 100 permutations of sample disease status. The resulting permuted FDR is the proportion of permutations that resulted in a higher number of DEG than the true analysis. The permuted FDR for CD14+monocytes at all p-value cut-offs tested, and for CD4+T-cells at all p-value cut-offs except 10^-5, was 0 (Table 4.4).
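The permuted FDR described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code used in the analysis: the function names, the fixed seed and the example counts are hypothetical, and the number of DEG for each permutation is assumed to come from rerunning the differential expression analysis on the shuffled labels at a fixed p-value cut-off.

```python
import random


def permute_labels(labels, n_permutations=100, seed=0):
    """Yield shuffled copies of the disease-status labels (e.g. "AS" / "HC")."""
    rng = random.Random(seed)
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        yield shuffled


def permuted_fdr(observed_deg_count, permuted_deg_counts):
    """Proportion of permutations that gave more DEG than the true analysis."""
    counts = list(permuted_deg_counts)
    if not counts:
        raise ValueError("at least one permutation is required")
    n_exceeding = sum(1 for count in counts if count > observed_deg_count)
    return n_exceeding / len(counts)


# Made-up DEG counts from 100 permutations; two exceed the observed count of 64.
example_permuted_counts = [10] * 98 + [70, 80]
example_fdr = permuted_fdr(64, example_permuted_counts)  # 2 / 100
```

With 100 permutations the smallest non-zero value this ratio can take is 0.01, so the resolution of the estimate is set by the number of permutations.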

Table 4.4 Permuted FDR and number of DEGs at various p-value cut-offs for each cell type.

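| Cell type | 10^-2 DEG | 10^-2 FDR | 10^-3 DEG | 10^-3 FDR | 10^-4 DEG | 10^-4 FDR | 10^-5 DEG | 10^-5 FDR |
|---|---|---|---|---|---|---|---|---|
| CD4+T-cells | 923 | 0 | 322 | 0 | 128 | 0 | 64 | 0.02 |
| CD8+T-cells | 621 | 0.06 | 133 | 0.04 | 30 | 0.04 | 11 | 0 |
| CD14+monocytes | 1950 | 0 | 921 | 0 | 526 | 0 | 325 | 0 |
| γδ T-cells | 751 | 0.32 | 141 | 0.64 | 32 | 0.62 | 12 | 0.22 |
| NK cells | 869 | 0.08 | 225 | 0.18 | 71 | 0.18 | 21 | 0.30 |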


A permuted FDR of 0 indicates that none of the permutations resulted in a higher number of DEG than those identified in the true analysis. This is interpreted as FDR<0.01 due to the minimum sensitivity (1/n permutations). The increasing FDR at lower p-value cut-offs is likely due to the number of DEG identified in the true analysis decreasing at a greater rate than in the permutations. An optimal p-value cut-off is one that balances the permuted FDR against the number of DEG identified. For example, the γδ T-cell permuted FDR at a p-value cut-off of 10^-5 is 0.22 with 11 DEG (indicating ~3 potential false positives and 8 true positives), and at a p-value cut-off of 10^-2 the permuted FDR is 0.32 for 751 DEG (indicating ~240 false positives and 511 true positives). Pathway analysis in particular may benefit from additional DEG despite the increased FDR.

Where possible, the Benjamini-Hochberg adjusted p-value has been used in analysis; however, where an insufficient number of DEG impedes pathway analysis, a permuted FDR of 0.01 is a practical alternative.
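The trade-off between permuted FDR and number of DEG can be made concrete with a small calculation. The sketch below, in Python, multiplies the number of DEG by the permuted FDR to estimate how many are likely false positives; the function name is hypothetical and rounding to the nearest whole gene is an assumption made for illustration.

```python
def split_deg_by_fdr(n_deg, permuted_fdr):
    """Estimate (false positives, true positives) among n_deg DEG at a given FDR."""
    if not 0.0 <= permuted_fdr <= 1.0:
        raise ValueError("permuted_fdr must be between 0 and 1")
    false_positives = round(n_deg * permuted_fdr)
    return false_positives, n_deg - false_positives


# Gamma-delta T-cells at the 10^-2 cut-off (values taken from Table 4.4):
# 751 DEG at a permuted FDR of 0.32, i.e. 751 * 0.32 = 240.32.
false_pos, true_pos = split_deg_by_fdr(751, 0.32)
```

Under this estimate a looser cut-off adds many more expected true positives than a strict one, at the cost of a larger absolute number of false positives, which is the balance that matters for pathway analysis.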

AS-associated genes

As discussed in the literature review in Chapter 1, AS has a strong genetic basis and over 100 individual loci have been associated with AS to date. The genes associated with, or suggested to be associated with, these loci (outlined in Appendix A) were examined for changes in gene expression within each cell type. AS-associated genes which showed a significant difference in expression between AS and healthy individuals within any cell type, for either forward or reverse reads, are shown in Tables 4.5 and 4.6. Several AS-associated genes were not expressed or were expressed below the minimum required reads for quality control (CCL21, FAM205A, NKX2-3, PLA2G2E, SOX14). Results are highlighted where genes are approaching significance (adjusted p-value<0.2) as either increased (orange) or decreased (blue) expression. Comparing AS cases and healthy controls, both cross-cell type and cell type specific gene associations were observed.

FOS, part of the FOS/JUN transcription factor, had increased expression in CD4+T-cells, CD8+T-cells and NK cells of AS individuals (adjusted p-value<0.05). CD14+monocytes and γδ T-cells of AS individuals also had increased FOS expression, but this did not reach statistical significance. PTGER4 had statistically significant (p-value<0.05) increased expression in CD4+T-cells, CD8+T-cells, NK cells and CD14+monocytes of AS individuals. TLR4 was only differentially expressed in the CD4+T-cell reverse strand of AS individuals (adjusted p-value 3.07x10^-4). Similarly, several genes were only downregulated (EMSY, CXCR1, IKZF1), or only upregulated (BACH2, ERN1, IRF1, IL10, NFKB1, SMAD3, TAGAP), in CD14+monocytes of AS individuals. In γδ T-cells several T-cell related genes were close to significance (adjusted p-value <0.2) (AIRE, LTBR, IL27), which may reflect the small sample numbers.


Table 4.5 Forward strand read results for all cell types for genes implicated at loci previously associated with AS.

Shown as the log2 fold change (log2FC) and Benjamini-Hochberg adjusted p-value (padj) for each gene in each cell type. Genes with padj<0.2 are coloured for increased (orange) or decreased (blue) expression in AS individuals.

| Gene Name | Ensembl | CD4+T-cells log2FC | CD4+T-cells padj | CD8+T-cells log2FC | CD8+T-cells padj | γδ T-cells log2FC | γδ T-cells padj | NK cells log2FC | NK cells padj | CD14+monocytes log2FC | CD14+monocytes padj |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ADO | ENSG00000181915 | -0.08 | 0.96 | 0.37 | 0.69 | -0.04 | 0.99 | 0.30 | 0.71 | -0.09 | 0.93 |
| AHR | ENSG00000106546 | 0.12 | 0.87 | 0.09 | 0.92 | -0.31 | 0.64 | 0.37 | 0.34 | 0.28 | 0.32 |
| AIRE | ENSG00000160224 | -0.05 | 0.98 | 0.44 | 0.74 | -1.56 | 0.11 | -0.48 | 0.76 | -0.09 | 0.96 |
| ANKRD55 | ENSG00000164512 | -0.08 | 0.97 | 0.30 | 0.87 | 0.25 | 0.92 | 0.71 | 0.69 | 0.93 | 0.12 |
| BACH2 | ENSG00000112182 | -0.12 | 0.90 | -0.01 | 1.00 | -0.24 | 0.83 | 0.23 | 0.76 | 1.05 | 0.08 |
| EMSY | ENSG00000158636 | -0.09 | 0.78 | -0.01 | 0.99 | -0.15 | 0.72 | 0.05 | 0.92 | -0.19 | 0.25 |
| CARD9 | ENSG00000187796 | 0.26 | 0.69 | 0.00 | 1.00 | 0.32 | 0.74 | 0.25 | 0.73 | 0.12 | 0.86 |
| CMC1 | ENSG00000187118 | -0.13 | 0.51 | -0.03 | 0.96 | 0.02 | 0.98 | -0.08 | 0.77 | -0.06 | 0.81 |
| CTLA4 | ENSG00000163599 | 0.24 | 0.82 | 0.55 | 0.55 | 0.42 | 0.78 | 0.76 | 0.61 | 0.56 | 0.47 |
| CXCR1 | ENSG00000163464 | 0.60 | 0.78 | 0.74 | 0.62 | 0.33 | 0.89 | -0.06 | 0.98 | 0.42 | 0.78 |
| ERAP1 | ENSG00000164307 | -0.12 | 0.46 | -0.04 | 0.93 | -0.03 | 0.95 | -0.18 | 0.14 | -0.03 | 0.89 |
| ERN1 | ENSG00000178607 | -0.13 | 0.78 | -0.10 | 0.89 | 0.09 | 0.90 | -0.09 | 0.87 | 0.46 | 6.89x10^-3 |
| FCGR2A | ENSG00000143226 | -1.28 | 0.05 | -0.36 | 0.85 | 0.28 | 0.90 | -0.44 | 0.71 | 0.07 | 0.97 |
| FOS | ENSG00000170345 | 0.89 | 3.08x10^-3 | 0.86 | 0.03 | 0.51 | 0.55 | 0.80 | 0.03 | 0.28 | 0.60 |
| GPR25 | ENSG00000170128 | 0.09 | 0.98 | -0.51 | 0.81 | -1.26 | 0.45 | -0.87 | 0.42 | 0.11 | 0.97 |
| GPR65 | ENSG00000140030 | -0.41 | 0.77 | 0.10 | 0.97 | 0.14 | 0.95 | 0.12 | 0.94 | 0.19 | 0.88 |
| IKZF1 | ENSG00000185811 | 0.00 | 1.00 | -0.05 | 0.89 | 0.16 | 0.54 | -0.06 | 0.82 | -0.11 | 0.43 |
| IL10 | ENSG00000136634 | -0.29 | 0.92 | 0.15 | 0.96 | 0.20 | 0.95 | -0.08 | 0.98 | 0.76 | 0.40 |
| IL27 | ENSG00000197272 | -0.37 | 0.60 | 0.03 | 0.99 | -0.46 | 0.59 | -0.15 | 0.85 | -0.17 | 0.76 |
| IRF1 | ENSG00000125347 | 0.01 | 0.99 | -0.11 | 0.74 | -0.03 | 0.96 | -0.10 | 0.74 | 0.30 | 7.50x10^-3 |
| IRF5 | ENSG00000128604 | -0.30 | 0.21 | -0.27 | 0.44 | -0.29 | 0.56 | -0.25 | 0.40 | -0.15 | 0.61 |
| KIF1A | ENSG00000130294 | 0.04 | 0.99 | 0.46 | 0.80 | -0.24 | 0.94 | 0.75 | 0.71 | 1.95 | 0.01 |
| LTBR | ENSG00000111321 | -1.39 | 0.09 | 0.43 | 0.89 | 1.81 | 0.19 | 0.20 | 0.93 | 0.13 | 0.94 |
| MEFV | ENSG00000103313 | -0.76 | 0.60 | -0.13 | 0.98 | 0.91 | 0.67 | 0.97 | 0.39 | 0.13 | 0.95 |
| MUC1 | ENSG00000185499 | -0.13 | 0.95 | 0.35 | 0.85 | 0.14 | 0.96 | -0.47 | 0.77 | -0.40 | 0.79 |
| NFKB1 | ENSG00000109320 | -0.30 | 0.38 | -0.02 | 0.99 | 0.04 | 0.97 | -0.13 | 0.83 | 0.60 | 1.71x10^-3 |
| OTUD3 | ENSG00000169914 | -0.20 | 0.90 | -0.02 | 0.99 | -0.29 | 0.86 | 0.39 | 0.69 | 0.17 | 0.89 |
| PDGFB | ENSG00000100311 | -0.17 | 0.90 | -0.33 | 0.75 | -0.74 | 0.42 | 0.72 | 0.36 | -0.30 | 0.86 |
| PRKCQ | ENSG00000065675 | 0.12 | 0.82 | 0.11 | 0.87 | -0.15 | 0.83 | -0.12 | 0.82 | 1.24 | 1.15x10^-9 |
| PTGER4 | ENSG00000171522 | 0.29 | 0.27 | 0.10 | 0.89 | 0.02 | 0.98 | 0.06 | 0.93 | 0.59 | 8.41x10^-5 |
| PTPN2 | ENSG00000175354 | -0.20 | 0.69 | -0.09 | 0.92 | 0.08 | 0.94 | -0.02 | 0.98 | 0.10 | 0.82 |
| SMAD3 | ENSG00000166949 | -0.11 | 0.78 | -0.07 | 0.91 | -0.06 | 0.93 | -0.12 | 0.73 | 0.31 | 0.02 |
| SOCS1 | ENSG00000185338 | 0.42 | 0.56 | 0.12 | 0.95 | 0.01 | 1.00 | 0.15 | 0.91 | 0.72 | 0.29 |
| TAGAP | ENSG00000164691 | -0.02 | 0.99 | -0.12 | 0.80 | -0.05 | 0.95 | -0.19 | 0.53 | 0.37 | 0.01 |
| TLR4 | ENSG00000136869 | -0.44 | 0.85 | 0.58 | 0.80 | 1.11 | 0.59 | -0.03 | 0.99 | 0.12 | 0.91 |
| TNFAIP3 | ENSG00000118503 | 0.56 | 0.16 | 0.34 | 0.66 | -0.08 | 0.96 | 0.27 | 0.71 | 1.73 | 6.85x10^-15 |
| TNFRSF1A | ENSG00000067182 | -0.15 | 0.81 | 0.06 | 0.95 | 0.23 | 0.74 | -0.01 | 0.99 | -0.13 | 0.76 |
| TNFSF8 | ENSG00000106952 | 0.02 | 0.99 | 0.01 | 1.00 | -0.36 | 0.66 | 0.06 | 0.96 | 0.36 | 0.29 |
| UBE2L3 | ENSG00000185651 | -0.11 | 0.67 | -0.05 | 0.91 | -0.16 | 0.63 | -0.22 | 0.17 | -0.19 | 0.14 |
| ZNF365 | ENSG00000138311 | 0.52 | 0.85 | -0.06 | 0.99 | 0.96 | 0.75 | 0.38 | 0.89 | 0.72 | 0.70 |


Table 4.6 Reverse strand results for all cell types for genes implicated at loci previously associated with AS.

Shown as the log2 fold change (log2FC) and Benjamini-Hochberg adjusted p-value (padj) for each gene in each cell type. Genes with padj<0.2 are coloured for increased (orange) or decreased (blue) expression in AS individuals.

| Gene Name | Ensembl | CD4+T-cells log2FC | CD4+T-cells padj | CD8+T-cells log2FC | CD8+T-cells padj | γδ T-cells log2FC | γδ T-cells padj | NK cells log2FC | NK cells padj | CD14+monocytes log2FC | CD14+monocytes padj |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ADO | ENSG00000181915 | 0.16 | 0.20 | 0.06 | 0.88 | 0.08 | 0.83 | -0.09 | 0.66 | 0.12 | 0.36 |
| AHR | ENSG00000106546 | 0.01 | 0.99 | 0.16 | 0.78 | 0.11 | 0.88 | 0.07 | 0.88 | 0.40 | 0.02 |
| AIRE | ENSG00000160224 | 0.08 | 0.96 | 0.90 | 0.41 | -0.01 | 1.00 | -0.76 | 0.61 | 0.80 | 0.37 |
| ANKRD55 | ENSG00000164512 | 0.15 | 0.89 | 0.11 | 0.97 | 0.01 | 1.00 | -0.41 | 0.78 | -0.15 | 0.88 |
| BACH2 | ENSG00000112182 | 0.00 | 1.00 | 0.02 | 0.99 | -0.18 | 0.78 | 0.02 | 0.97 | 0.41 | 0.08 |
| EMSY | ENSG00000158636 | -0.09 | 0.24 | -0.07 | 0.64 | 0.00 | 0.99 | -0.07 | 0.51 | -0.12 | 0.05 |
| CARD9 | ENSG00000187796 | -1.31 | 7.79x10^-4 | -0.10 | 0.97 | 0.13 | 0.95 | 0.49 | 0.52 | 0.05 | 0.97 |
| CMC1 | ENSG00000187118 | -0.27 | 0.17 | -0.21 | 0.54 | -0.44 | 0.16 | -0.13 | 0.68 | 0.02 | 0.95 |
| CTLA4 | ENSG00000163599 | 0.44 | 0.21 | 0.41 | 0.46 | 0.51 | 0.45 | 0.47 | 0.34 | 0.47 | 0.29 |
| CXCR1 | ENSG00000163464 | 0.39 | 0.68 | 0.22 | 0.90 | 0.63 | 0.58 | -0.18 | 0.86 | -0.70 | 0.12 |
| ERAP1 | ENSG00000164307 | -0.06 | 0.69 | -0.02 | 0.96 | 0.05 | 0.87 | -0.02 | 0.92 | 0.04 | 0.83 |
| ERN1 | ENSG00000178607 | -0.10 | 0.75 | -0.03 | 0.97 | 0.10 | 0.85 | -0.08 | 0.82 | 0.22 | 0.17 |
| FCGR2A | ENSG00000143226 | -1.58 | 2.31x10^-6 | -0.13 | 0.95 | 0.44 | 0.73 | 0.15 | 0.90 | 0.15 | 0.88 |
| FOS | ENSG00000170345 | 0.85 | 3.07x10^-4 | 0.93 | 8.49x10^-4 | 0.29 | 0.74 | 0.74 | 0.01 | 0.34 | 0.31 |
| GPR25 | ENSG00000170128 | -0.08 | 0.93 | -0.46 | 0.55 | -0.13 | 0.92 | -0.60 | 0.19 | -0.16 | 0.93 |
| GPR65 | ENSG00000140030 | 0.12 | 0.56 | -0.05 | 0.92 | 0.04 | 0.92 | -0.21 | 0.16 | 0.00 | 0.99 |
| IKZF1 | ENSG00000185811 | 0.00 | 0.99 | 0.05 | 0.73 | 0.13 | 0.22 | -0.04 | 0.67 | -0.15 | 6.30x10^-4 |
| IL10 | ENSG00000136634 | 0.31 | 0.80 | 0.05 | 0.99 | 1.03 | 0.51 | -0.35 | 0.85 | 0.82 | 0.10 |
| IL27 | ENSG00000197272 | 0.18 | 0.94 | -1.30 | 0.38 | 2.21 | 0.15 | -1.08 | 0.44 | 0.40 | 0.70 |
| IRF1 | ENSG00000125347 | 0.06 | 0.83 | -0.02 | 0.97 | -0.07 | 0.88 | -0.02 | 0.95 | 0.30 | 3.59x10^-3 |
| IRF5 | ENSG00000128604 | -0.57 | 4.79x10^-3 | -0.04 | 0.97 | 0.13 | 0.88 | 0.16 | 0.72 | -0.03 | 0.95 |
| KIF1A | ENSG00000130294 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| KIF3B | ENSG00000101350 | 0.02 | 0.90 | 0.04 | 0.88 | 0.04 | 0.88 | 0.12 | 0.19 | -0.06 | 0.54 |
| LTBR | ENSG00000111321 | -1.63 | 5.88x10^-7 | -0.07 | 0.98 | 0.40 | 0.76 | 0.01 | 0.99 | 0.12 | 0.89 |
| MEFV | ENSG00000103313 | -1.53 | 6.97x10^-5 | -0.52 | 0.67 | 0.16 | 0.93 | 0.79 | 0.21 | 0.04 | 0.97 |
| MUC1 | ENSG00000185499 | 0.12 | 0.88 | 0.46 | 0.41 | 0.12 | 0.92 | 0.22 | 0.79 | 0.72 | 9.69x10^-3 |
| NFKB1 | ENSG00000109320 | 0.09 | 0.70 | 0.09 | 0.81 | 0.04 | 0.94 | 0.12 | 0.57 | 0.48 | 1.99x10^-7 |
| OTUD3 | ENSG00000169914 | 0.03 | 0.90 | 0.09 | 0.73 | 0.14 | 0.57 | 0.21 | 0.05 | -0.02 | 0.94 |
| PDGFB | ENSG00000100311 | -0.13 | 0.90 | -0.25 | 0.86 | -0.55 | 0.59 | 0.02 | 0.99 | 1.49 | 2.15x10^-5 |
| PRKCQ | ENSG00000065675 | 0.02 | 0.94 | 0.03 | 0.95 | 0.01 | 0.98 | -0.11 | 0.46 | -0.02 | 0.94 |
| PTGER4 | ENSG00000171522 | 0.51 | 4.36x10^-4 | 0.38 | 0.09 | 0.17 | 0.75 | 0.36 | 0.07 | 0.67 | 2.31x10^-7 |
| PTPN2 | ENSG00000175354 | -0.01 | 0.97 | 0.00 | 1.00 | -0.02 | 0.93 | 0.04 | 0.73 | 0.16 | 1.69x10^-4 |
| SMAD3 | ENSG00000166949 | -0.10 | 0.66 | -0.18 | 0.43 | -0.07 | 0.88 | -0.20 | 0.23 | 0.31 | 2.60x10^-3 |
| SOCS1 | ENSG00000185338 | 0.46 | 0.03 | 0.36 | 0.31 | 0.17 | 0.82 | 0.09 | 0.86 | 0.72 | 4.23x10^-5 |
| TAGAP | ENSG00000164691 | -0.03 | 0.92 | -0.08 | 0.86 | -0.11 | 0.80 | -0.09 | 0.75 | 0.41 | 4.78x10^-5 |
| TLR4 | ENSG00000136869 | -1.51 | 3.07x10^-4 | -0.39 | 0.82 | 0.59 | 0.69 | 0.79 | 0.26 | 0.13 | 0.91 |
| TNFAIP3 | ENSG00000118503 | 0.79 | 6.02x10^-3 | 0.44 | 0.51 | 0.18 | 0.88 | 0.37 | 0.49 | 1.79 | 3.96x10^-15 |
| TNFRSF1A | ENSG00000067182 | -0.21 | 0.11 | -0.14 | 0.63 | 0.03 | 0.96 | -0.21 | 0.18 | -0.09 | 0.64 |
| TNFSF8 | ENSG00000106952 | 0.07 | 0.88 | -0.07 | 0.94 | -0.28 | 0.57 | 0.02 | 0.97 | 0.32 | 0.09 |
| UBE2L3 | ENSG00000185651 | 0.00 | 1.00 | 0.05 | 0.82 | 0.08 | 0.72 | 0.03 | 0.88 | 0.04 | 0.79 |
| ZNF365 | ENSG00000138311 | -0.16 | 0.80 | 0.10 | 0.94 | -0.62 | 0.35 | 0.06 | 0.95 | 0.81 | 0.11 |


CD4+T-cells

Several AS-associated genes were significantly differentially expressed in CD4+T-cells, including increased expression of FOS, PTGER4, SOCS1 and TNFAIP3, and decreased expression of TLR4, FCGR2A and CARD9 (shown above). None of these were amongst the 20 most significant differentially expressed genes (Tables 4.7 and 4.8). Significant DEG identified include several genes within the T-cell receptor signalling pathway (NR4A2, JUN, JUNB, JUND, SYK, LAT2, SCIMP, TYROBP, LAT3, PLCG2), and changes in the TLR4 signalling pathway (decreased expression: TLR4, IRF8, IRF5; increased expression: STAT3, SOCS3, TRAF6) (Figure 4.4). HHEX, involved in T-cell development, has previously been shown to be downregulated in psoriasis, and is downregulated in AS CD4+T-cells (-1.64 log2FC) [290].

Figure 4.4 Volcano plot of the DEG associated with CD4+T-cells

DEG with an adjusted p-value <0.05 are coloured and the 10 DEG with the lowest p-value within each dataset are labelled with their gene name.


Table 4.7. Top 20 differentially expressed genes in CD4+T-cells forward strand.

| Gene ID | Symbol | Alias | MAP | log2FC | lfcSE | stat | pvalue | padj | Gene Function |
|---|---|---|---|---|---|---|---|---|---|
| ENSG00000269072 | CTD-3187F8.14 | NA | 19q13.41 | -2.13 | 0.32 | -6.75 | 1.52x10^-11 | 2.12x10^-7 | Unknown |
| ENSG00000261879 | RP11-333E1.1 | NA | 17p13.2 | -1.87 | 0.28 | -6.72 | 1.84x10^-11 | 2.12x10^-7 | Unknown |
| ENSG00000177606 | JUN | AP-1 | 1p32.1 | 1.05 | 0.16 | 6.55 | 5.59x10^-11 | 4.29x10^-7 | Activated TLR4 signalling and regulation of gene expression. Regulates IL-10 expression. |
| ENSG00000095066 | HOOK2 | HK2 | 19p13.13 | 0.89 | 0.15 | 6.05 | 1.49x10^-9 | 8.55x10^-6 | Part of the FTS/Hook/FHIP complex. Promotes vesicle trafficking and/or fusion. |
| ENSG00000119801 | YPEL5 | CGI-127 | 2p23.1 | 0.66 | 0.11 | 5.87 | 4.36x10^-9 | 2.00x10^-5 | Cell cycle progression, pro-apoptotic role. |
| ENSG00000152804 | HHEX | HEX | 10q23.33 | -1.87 | 0.32 | -5.83 | 5.45x10^-9 | 2.09x10^-5 | Decreased expression in Ps. Regulates early lymphoid development. |
| ENSG00000245067 | IGFBP7-AS1 | IGFBP7-AS1 | 4q12 | -2.19 | 0.38 | -5.76 | 8.32x10^-9 | 2.73x10^-5 | Higher expression enriched for epithelial-mesenchymal system and p53 pathway terms. |
| ENSG00000133872 | SARAF | FOAP-7 | 8p12 | 0.59 | 0.10 | 5.64 | 1.75x10^-8 | 5.02x10^-5 | ER and plasma membrane negative regulator of intracellular Ca2+ entry. |
| ENSG00000015532 | XYLT2 | PXYLT2 | 17q21.33 | 0.48 | 0.09 | 5.58 | 2.43x10^-8 | 5.71x10^-5 | Initiates proteoglycan assembly. |
| ENSG00000264270 | RP11-474I11.7 | NA | NA | 0.70 | 0.13 | 5.57 | 2.48x10^-8 | 5.71x10^-5 | Unknown |
| ENSG00000087074 | PPP1R15A | GADD34 | 19q13.33 | 0.75 | 0.14 | 5.46 | 4.87x10^-8 | 1.02x10^-4 | Downregulates TGF-β signalling pathway by promoting TGFB1 dephosphorylation. |
| ENSG00000106069 | CHN2 | ARHGAP3 | 7p14.3 | -1.48 | 0.27 | -5.44 | 5.47x10^-8 | 1.05x10^-4 | GTP-metabolizing protein. Insufficient expression increases Rac activity. |
| ENSG00000237232 | ZNF295-AS1 | C21orf121 | 21q22.3 | 0.61 | 0.11 | 5.37 | 7.72x10^-8 | 1.27x10^-4 | Unknown |
| ENSG00000125810 | CD93 | C1QR1 | 20p11.21 | -1.77 | 0.33 | -5.37 | 7.74x10^-8 | 1.27x10^-4 | Intercellular adhesion and phagocytosis. Treg and CD4 memory T-cells expressed. |
| ENSG00000265845 | RP11-20B24.5 | NA | 17q11.2 | 0.49 | 0.09 | 5.33 | 9.85x10^-8 | 1.51x10^-4 | Unknown |
| ENSG00000231006 | RPL7P32 | RPL7P | NA | 0.74 | 0.14 | 5.29 | 1.24x10^-7 | 1.78x10^-4 | Unknown. Interacts with VCAM1 (affinity capture). |
| ENSG00000267232 | CTB-31O20.9 | NA | NA | 0.50 | 0.10 | 5.21 | 1.87x10^-7 | 2.42x10^-4 | Unknown |
| ENSG00000223401 | RP11-211G3.2 | BCL6-AS1 | NA | 0.64 | 0.12 | 5.21 | 1.90x10^-7 | 2.42x10^-4 | Unknown |
| ENSG00000263826 | RP11-573D15.9 | NA | NA | 0.21 | 0.04 | 5.13 | 2.90x10^-7 | 3.51x10^-4 | Unknown |
| ENSG00000245864 | CTC-467M3.1 | MEF2C-AS1 | 5q14.3 | -1.58 | 0.31 | -5.11 | 3.16x10^-7 | 3.63x10^-4 | Unknown |


Table 4.8 Top 20 differentially expressed genes in CD4+T-cells reverse strand.

| Gene ID | Symbol | Alias | MAP | log2FC | lfcSE | stat | pvalue | padj | Gene Function |
|---|---|---|---|---|---|---|---|---|---|
| ENSG00000154479 | CCDC173 | C2orf77 | 2q31.1 | 1.82 | 0.22 | 8.31 | 9.73x10^-17 | 2.60x10^-12 | Unknown |
| ENSG00000153234 | NR4A2 | HZF-3 | 2q24.1 | 1.31 | 0.17 | 7.72 | 1.14x10^-14 | 6.10x10^-11 | Downregulated in AS PBMCs. Targets FOXP3 directly and regulates CD4+T-cell differentiation via IL-21 to Th1. Induced by NOTCH1. |
| ENSG00000172115 | CYCS | CYC | 7p15.3 | 0.39 | 0.05 | 7.79 | 6.77x10^-15 | 6.10x10^-11 | Potentially targeted by NF-kB. Apoptosis. |
| ENSG00000165025 | SYK | p72-Syk | 9q22.2 | -1.63 | 0.21 | -7.77 | 8.14x10^-15 | 6.10x10^-11 | Alternative TCR signalling pathway with FcRγ chain instead of CD3ζ. Replaces ZAP70 in signalling. |
| ENSG00000161929 | SCIMP | C17orf87 | 17p13.2 | -1.91 | 0.25 | -7.74 | 1.00x10^-14 | 6.10x10^-11 | Immune restricted transmembrane protein scaffolding for TLR4. |
| ENSG00000011600 | TYROBP | DAP12 | 19q13.12 | -1.66 | 0.22 | -7.61 | 2.84x10^-14 | 1.26x10^-10 | Inflammation signalling through coupling to myeloid receptors. Cytotoxicity. |
| ENSG00000177606 | JUN | AP-1 | 1p32.1 | 1.14 | 0.15 | 7.56 | 4.02x10^-14 | 1.53x10^-10 | Activated TLR4 signalling and regulation of gene expression |
| ENSG00000119801 | YPEL5 | CGI-127 | 2p23.1 | 0.57 | 0.08 | 7.40 | 1.37x10^-13 | 4.56x10^-10 | Cell cycle progression, pro-apoptotic |
| ENSG00000059804 | SLC2A3 | GLUT3 | 12p13.31 | 0.78 | 0.11 | 7.24 | 4.58x10^-13 | 1.31x10^-9 | Glucose transporter, required for effector T-cell expansion and ability to induce inflammatory disease in vitro. |
| ENSG00000142178 | SIK1 | MSK | 21q22.3 | 1.44 | 0.20 | 7.23 | 4.91x10^-13 | 1.31x10^-9 | MAP4 kinase. Phosphorylates HDAC4, HDAC5, PPME1, SREBF1, CRTC1/TORC1 |
| ENSG00000175197 | DDIT3 | C/EBPzeta | 12q13.3 | 0.82 | 0.12 | 7.09 | 1.35x10^-12 | 3.28x10^-9 | Regulates TNFRSF10A and TNFRSF10B expression in ER stress-mediated apoptosis. |
| ENSG00000135636 | DYSF | FER1L1 | 2p13.2 | -1.41 | 0.20 | -7.06 | 1.68x10^-12 | 3.44x10^-9 | Inflammation, muscle contraction and membrane regeneration and repairs. |
| ENSG00000168995 | SIGLEC7 | AIRM1 | 19q13.41 | -1.98 | 0.28 | -7.07 | 1.59x10^-12 | 3.44x10^-9 | Can inhibit tyrosine signalling by sequestering SHP1 and SHP2 |
| ENSG00000143507 | DUSP10 | MKP-5 | 1q41 | 0.71 | 0.10 | 7.03 | 2.11x10^-12 | 4.03x10^-9 | Repression causes p38 MAPK activation |
| ENSG00000173276 | ZBTB21 | ZNF295 | 21q22.3 | 0.61 | 0.09 | 6.92 | 4.62x10^-12 | 8.23x10^-9 | Acts as transcriptional repressor. |
| ENSG00000172322 | CLEC12A | CD371 | 12p13.31 | -2.03 | 0.29 | -6.88 | 5.95x10^-12 | 9.92x10^-9 | Monocyte/macrophage high amounts. |
| ENSG00000152804 | HHEX | HEX | 10q23.33 | -1.64 | 0.24 | -6.78 | 1.19x10^-11 | 1.76x10^-8 | Decreased expression in Ps. Regulates early lymphoid development. |
| ENSG00000162236 | STX5 | SED5 | 11q12.3 | 0.30 | 0.04 | 6.79 | 1.12x10^-11 | 1.76x10^-8 | Intracellular vesicle trafficking. Expressed in ER, ER to Golgi and cis-Golgi compartments. |
| ENSG00000160058 | BSDC1 | BSDC1 | 1p35.1 | 0.24 | 0.04 | 6.73 | 1.67x10^-11 | 2.35x10^-8 | Unknown |
| ENSG00000158869 | FCER1G | FCRG | 1q23.3 | -1.33 | 0.20 | -6.64 | 3.24x10^-11 | 4.32x10^-8 | Innate immunity. Altered expression in immune response. |


CD8+T-cells

The majority of DEG had increased expression in CD8+T-cells (Figure 4.5). Unlike CD4+T-cells, two AS-associated genes were within the 20 most significant DEG in CD8+T-cells forward and reverse strands (FOS, PTGER4) (Tables 4.9 and 4.10). Genes associated with T-cell receptor signalling were upregulated in CD8+T-cells of AS individuals (JUN, FOS, JUND, ATF3, NR4A2, CXCL4, LIF, CYCS, JUNB, CD69, SIK1, ING3, PTGER4, DUSP10). TNFRSF11A had increased expression in CD8+T-cells of AS individuals (forward strand: 2.18 log2FC, adj. p-value 0.06; reverse strand: 1.52 log2FC, adj. p-value 4.76x10^-4). TNFRSF11A, also known as RANK, was previously associated with radiographic change in AS [291]. Interestingly, EEF1DP3, upregulated in reverse strand CD8+T-cells of AS individuals, was associated with copy number variation in a Korean AS cohort [292]. When examined by heatmap, the 20 most significant DEG begin to differentiate AS individuals from healthy controls (Figure 4.6).

Figure 4.5 Volcano plot of CD8+T-cell DEG

DEG with an adjusted p-value <0.05 are coloured and the 10 DEG with the lowest p-value within each dataset are labelled with their gene name.


Figure 4.6 Heatmap of the 20 most significant DEG in CD8+T-cells forward strand (A) and reverse strand (B) coloured by fold change in expression levels.


Table 4.9 CD8+T-cell forward strand top 20 differentially expressed genes in AS individuals compared to healthy controls.

| Gene ID | Symbol | Alias | MAP | log2FC | lfcSE | stat | pvalue | padj | Gene Function |
|---|---|---|---|---|---|---|---|---|---|
| ENSG00000227705 | RP11-15M15.2 | NA | NA | 1.86 | 0.35 | 5.26 | 1.46x10^-7 | 0.003 | Unknown |
| ENSG00000255928 | RP11-456I15.2 | NA | NA | 0.79 | 0.16 | 5.06 | 4.20x10^-7 | 0.005 | Unknown |
| ENSG00000035681 | NSMAF | FAN | 8q12.1 | 0.62 | 0.13 | 4.78 | 1.72x10^-6 | 0.012 | Adaptor protein that constitutively binds to TNFR1, required for TNF-α induced IL-6 and CXCL-2 expression. |
| ENSG00000103769 | RAB11A | YL8 | 15q22.31 | 0.55 | 0.12 | 4.74 | 2.12x10^-6 | 0.012 | Autophagy. Regulates Wnt/β-catenin signalling. Correlated with TCF expression. |
| ENSG00000228413 | AC024937.2 | NA | NA | 1.50 | 0.32 | 4.66 | 3.10x10^-6 | 0.014 | Unknown |
| ENSG00000177606 | JUN | AP-1 | 1p32.1 | 0.75 | 0.16 | 4.64 | 3.52x10^-6 | 0.014 | Activated TLR4 signalling and regulation of gene expression |
| ENSG00000235790 | RP11-73M7.6 | NA | NA | 0.46 | 0.10 | 4.57 | 4.93x10^-6 | 0.016 | Unknown |
| ENSG00000201013 | Y_RNA | NA | NA | 0.72 | 0.16 | 4.48 | 7.58x10^-6 | 0.021 | Unknown |
| ENSG00000153487 | ING1 | p24ING1c | 13q34 | 0.51 | 0.11 | 4.45 | 8.41x10^-6 | 0.021 | Interacts with p53 tumour suppressor protein. Apoptosis, DNA demethylation. |
| ENSG00000263826 | RP11-573D15.9 | NA | NA | 0.18 | 0.04 | 4.39 | 1.14x10^-5 | 0.026 | Unknown |
| ENSG00000157600 | TMEM164 | bB360B22.3 | Xq23 | -0.68 | 0.16 | -4.35 | 1.35x10^-5 | 0.028 | Unknown |
| ENSG00000198874 | TYW1 | RSAFD1 | 7q11.21 | 0.36 | 0.08 | 4.34 | 1.45x10^-5 | 0.028 | Wybutosine synthesis. Stabilizes codon-anticodon interactions in the ribosome |
| ENSG00000170345 | FOS | AP-1 | 14q24.3 | 0.86 | 0.20 | 4.29 | 1.75x10^-5 | 0.031 | On TGF-B activation forms SMAD3/SMAD4/JUN/FOS complex to regulate signalling. |
| ENSG00000151327 | FAM177A1 | C14orf24 | 14q13.2 | 0.27 | 0.06 | 4.23 | 2.39x10^-5 | 0.033 | Associated with juvenile idiopathic arthritis. |
| ENSG00000259883 | EHD4-AS1 | EHD4-AS1 | 15q15.1 | 0.88 | 0.21 | 4.21 | 2.50x10^-5 | 0.033 | Unknown |
| ENSG00000121764 | HCRTR1 | OX1R | 1p35.2 | 0.32 | 0.08 | 4.21 | 2.58x10^-5 | 0.033 | Sleep/wake control. Increased in UC colon. |
| ENSG00000158402 | CDC25C | CDC25 | 5q31.2 | 0.44 | 0.11 | 4.21 | 2.58x10^-5 | 0.033 | Apoptosis. Acts at the G2/M cell cycle transition point |
| ENSG00000165030 | NFIL3 | E4BP4 | 9q22.31 | 1.12 | 0.27 | 4.21 | 2.61x10^-5 | 0.033 | Transcription factor induced by IL4 signalling. Shown in macrophages to regulate IL12p40 production. |
| ENSG00000207494 | Y_RNA | NA | NA | 1.25 | 0.30 | 4.18 | 2.88x10^-5 | 0.035 | Unknown |
| ENSG00000133872 | SARAF | FOAP-7 | 8p12 | 0.44 | 0.11 | 4.14 | 3.40x10^-5 | 0.039 | ER and plasma membrane expressed negative regulator of intracellular Ca2+ entry. |


Table 4.10 CD8+T-cell reverse strand top 20 differentially expressed genes in AS individuals compared to healthy controls

| Gene ID | Symbol | Alias | MAP | log2FC | lfcSE | stat | pvalue | padj | Gene Function |
|---|---|---|---|---|---|---|---|---|---|
| ENSG00000119801 | YPEL5 | CGI-127 | 2p23.1 | 0.45 | 0.08 | 5.84 | 5.35x10^-9 | 1.43x10^-4 | Cell cycle progression, pro-apoptotic role. |
| ENSG00000177606 | JUN | AP-1 | 1p32.1 | 0.85 | 0.15 | 5.55 | 2.84x10^-8 | 2.53x10^-4 | Activated TLR4 signalling and regulation of gene expression |
| ENSG00000057294 | PKP2 | ARVD9 | 12p11.21 | 1.57 | 0.28 | 5.56 | 2.70x10^-8 | 2.53x10^-4 | Induced by Wnt/β-catenin, may play a role in junctional plaques. |
| ENSG00000143507 | DUSP10 | MKP-5 | 1q41 | 0.54 | 0.10 | 5.31 | 1.07x10^-7 | 4.77x10^-4 | Repression causes p38 MAPK activation |
| ENSG00000104725 | NEFL | NULL | NA | 1.50 | 0.28 | 5.31 | 1.07x10^-7 | 4.77x10^-4 | Encodes neurofilament light chain. |
| ENSG00000141655 | TNFRSF11A | CD265 | 18q21.33 | 1.53 | 0.28 | 5.37 | 7.79x10^-8 | 4.77x10^-4 | High expression in memory T-cells. AS GWAS. NF-kB |
| ENSG00000138670 | RASGEF1B | GPIG4 | 4q21.21 | 1.00 | 0.19 | 5.26 | 1.46x10^-7 | 5.56x10^-4 | Positive regulator of ICAM1 in TLR4/LPS pathway |
| ENSG00000137947 | GTF2B | TF2B | 1p22.2 | 0.25 | 0.05 | 5.09 | 3.50x10^-7 | 8.49x10^-4 | Transcription initiation by RNA polymerase II in DAB complex |
| ENSG00000114948 | ADAM23 | MDC-3 | 2q33.3 | 1.60 | 0.31 | 5.12 | 3.01x10^-7 | 8.49x10^-4 | Non-catalytic metalloprotease-like protein. Unclear role in T-cells. |
| ENSG00000073910 | FRY | 13CDNA73 | 13q13.1 | 0.56 | 0.11 | 5.11 | 3.18x10^-7 | 8.49x10^-4 | Microtubule associated protein that is involved in chromosome alignment, and spindle organization. |
| ENSG00000170345 | FOS | AP-1 | 14q24.3 | 0.93 | 0.18 | 5.15 | 2.59x10^-7 | 8.49x10^-4 | Transcription factor induced by FOS regulates anergy |
| ENSG00000144655 | CSRNP1 | AXUD1 | 3p22.2 | 0.79 | 0.16 | 5.00 | 5.74x10^-7 | 0.001 | Induced in response to elevated axin, a negative regulator of Wnt signalling. |
| ENSG00000202297 | RNY3P12 | RNY3P12 | 10 | 1.52 | 0.30 | 4.99 | 5.99x10^-7 | 0.001 | Unknown |
| ENSG00000175197 | DDIT3 | C/EBPzeta | 12q13.3 | 0.57 | 0.12 | 4.90 | 9.43x10^-7 | 0.002 | Regulates TNFRSF10A and 10B in ER stress mediated apoptosis. Mediates IL-17A inhibition induced apoptosis. |
| ENSG00000162772 | ATF3 | ATF3 | 1q32.3 | 1.10 | 0.23 | 4.85 | 1.26x10^-6 | 0.002 | Transcription factor involved in T-cell activation via JUN. Induced by TLR signalling. |
| ENSG00000229715 | EEF1DP3 | EEF1DP3 | NA | 0.68 | 0.14 | 4.85 | 1.23x10^-6 | 0.002 | Korean AS cohort had CNV in EEF1DP3 region (Jung et al 2014). |
| ENSG00000059804 | SLC2A3 | GLUT3 | 12p13.31 | 0.52 | 0.11 | 4.80 | 1.61x10^-6 | 0.003 | Glucose transporter required for effector T-cell expansion and ability to induce inflammatory disease in vitro. |
| ENSG00000141682 | PMAIP1 | APR | 18q21.32 | 0.73 | 0.16 | 4.70 | 2.57x10^-6 | 0.004 | Encodes NOXA, selects antigen specific T-cells |
| ENSG00000130522 | JUND | AP-1 | 19p13.11 | 0.39 | 0.08 | 4.67 | 3.06x10^-6 | 0.004 | Represses cell proliferation. Regulates Th cytokine expression. |
| ENSG00000153234 | NR4A2 | HZF-3 | 2q24.1 | 0.79 | 0.17 | 4.60 | 4.21x10^-6 | 0.005 | Downregulated in AS PBMCs. Targets FOXP3 directly. Induced by NOTCH1. |


CD14+monocyte

AS-associated genes showing differential expression in AS cases compared to healthy control subjects in CD14+monocytes included TNFAIP3, NFKBIA, PTGER4, IRF1, SMAD3, ERN1 and SOCS1 (Tables 4.11 and 4.12). All exhibited increased expression in CD14+monocytes of AS individuals, which reflected the shift in expression of the majority of DEG (Figure 4.7). Two genes related to TNFAIP3 (TNF alpha-induced protein 3, also known as A20) were also upregulated: TNIP1, which binds to TNFAIP3 to induce NF-kB inhibition, and RP11-356I2.4, which is predicted to regulate TNFAIP3. Negative regulators of the TLR signalling pathway were upregulated in CD14+monocytes (TRAF1), as well as genes downstream of TLR activation (HS3ST3B1, IRAK2, STAT4). Increased expression of inflammatory cytokines associated with AS (TNFα, IL23, IL6 and IL1β) was observed in CD14+monocytes. CCL3 (MIP1α) and CCL4 (MIP1β) were both upregulated in CD14+monocytes of AS individuals. CCL3 and CCL4 are chemokines that are upregulated by inflammatory cytokines (TNFα, IL1β, IFNγ) and can signal through CCR5 or CCR1.

Figure 4.7 Volcano plot of the DEG associated with CD14+monocytes

DEG with an adjusted p-value <0.05 are coloured and the 10 DEG with the lowest p-value are labelled.


Table 4.11 CD14+monocytes forward strand top 20 differentially expressed genes in AS individuals compared to healthy controls.

| Gene ID | Symbol | Alias | MAP | log2FC | lfcSE | stat | pvalue | padj | Gene Function |
|---|---|---|---|---|---|---|---|---|---|
| ENSG00000129277 | CCL4 | NULL | NA | 4.28 | 0.38 | 11.35 | 7.07x10^-30 | 1.63x10^-25 | Induced by TNF-α and CD40/CD40L. Selective for CCR5. Attractant to monocytes. Induces monocyte calcium mobilization. |
| ENSG00000056558 | TRAF1 | EBI6 | 9q33.2 | 2.12 | 0.21 | 10.14 | 3.71x10^-24 | 4.27x10^-20 | Negatively regulates TLR signalling. RA associated loci. |
| ENSG00000103168 | TAF1C | MGC:39976 | 16q24.1 | 2.39 | 0.24 | 9.80 | 1.07x10^-22 | 8.23x10^-19 | CCR5 pathway in monocytes upstream of TNFR1, results in NFKB, JNK and MAPK activation. Previously identified as increased in AS |
| ENSG00000211445 | GPX3 | GPx-P | 5q33.1 | 1.54 | 0.16 | 9.46 | 2.99x10^-21 | 1.72x10^-17 | Protects cells against oxidative damage. Upregulated during CRP overexpression (in mice). |
| ENSG00000173578 | XCR1 | CCXCR1 | 3p21.31 | 3.22 | 0.36 | 9.06 | 1.36x10^-19 | 6.27x10^-16 | SpA HLA-B27 transgenic rat model shows dysregulation of XCR1+DCs, and IL-7Rα blockade showed reduced XCR1 expression in collagen induced mice. |
| ENSG00000118503 | TNFAIP3 | A20 | 6q23.3 | 1.73 | 0.20 | 8.77 | 1.79x10^-18 | 6.85x10^-15 | Induces autophagy. Deficiency facilitates inflammasome activation in monocytes. AS associated loci. |
| ENSG00000223380 | SEC22B | NULL | NA | 2.13 | 0.25 | 8.46 | 2.57x10^-17 | 8.44x10^-14 | Involved in antigen cross-presentation. |
| ENSG00000266978 | CTD-2369P2.5 | NA | NA | 0.92 | 0.11 | 8.36 | 6.05x10^-17 | 1.74x10^-13 | Unknown |
| ENSG00000042980 | ADAM28 | ADAM 28 | 8p21.2 | 3.20 | 0.38 | 8.34 | 7.37x10^-17 | 1.88x10^-13 | Metabolic processes and cell-cell interactions implicated. |
| ENSG00000230438 | SERPINB9P1 | SERPINB9P | 6p25.2 | 2.19 | 0.27 | 8.24 | 1.67x10^-16 | 3.84x10^-13 | Unknown |
| ENSG00000006075 | CCL3 | NULL | NA | 2.57 | 0.32 | 8.01 | 1.12x10^-15 | 2.35x10^-12 | Binds CCR1, CCR4 and CCR5. |
| ENSG00000224298 | AC069363.1 | NULL | NA | 2.49 | 0.32 | 7.91 | 2.55x10^-15 | 4.88x10^-12 | Unknown |
| ENSG00000013725 | CD6 | TP120 | 11q12.2 | 1.38 | 0.18 | 7.79 | 6.87x10^-15 | 1.21x10^-11 | Glycoprotein binding ligands CD166 and CD318. |
| ENSG00000128271 | ADORA2A | A2aR | 22q11.23 | 2.23 | 0.29 | 7.63 | 2.28x10^-14 | 3.75x10^-11 | AS monocyte-derived macrophages showed no correlation between TNF-α and ADORA2 expression, but inhibition of ADORA2A results in increased IL23A and reduced TNFα. |
| ENSG00000145901 | TNIP1 | ABIN-1 | 5q33.1 | 0.88 | 0.12 | 7.57 | 3.78x10^-14 | 5.80x10^-11 | TNFAIP3 interacting protein, enhances NF-κB inhibition. |
| ENSG00000237499 | RP11-356I2.4 | NA | 6q23.3 | 1.22 | 0.16 | 7.55 | 4.33x10^-14 | 6.23x10^-11 | Predicted to regulate TNFAIP3. |
| ENSG00000232133 | IMPDH1P10 | IMPDH1P10 | NA | 0.77 | 0.10 | 7.54 | 4.67x10^-14 | 6.32x10^-11 | Unknown |
| ENSG00000249786 | EAF1-AS1 | EAF1-AS1 | NA | 0.64 | 0.09 | 7.24 | 4.58x10^-13 | 5.85x10^-10 | Unknown |
| ENSG00000125430 | HS3ST3B1 | 3-OST-3B | 17p12 | 1.78 | 0.25 | 7.19 | 6.47x10^-13 | 7.83x10^-10 | Last step in heparan sulfate biosynthesis. Upregulated after TLR agonist exposure. |
| ENSG00000100906 | NFKBIA | IKBA | 14q13.2 | 1.17 | 0.16 | 7.14 | 9.08x10^-13 | 1.04x10^-9 | AS associated locus. |


Table 4.12 CD14+monocytes reverse strand top 20 differentially expressed genes in AS individuals compared to healthy controls.

| Gene ID | Symbol | Alias | MAP | log2FC | lfcSE | stat | pvalue | padj | Gene Function |
|---|---|---|---|---|---|---|---|---|---|
| ENSG00000056558 | TRAF1 | EBI6 | 9q33.2 | 1.89 | 0.17 | 11.12 | 1.05x10^-28 | 2.81x10^-24 | Negatively regulates TLR signalling. |
| ENSG00000100024 | UPB1 | BUP1 | 22q11.23 | 2.84 | 0.27 | 10.61 | 2.81x10^-26 | 3.74x10^-22 | Third enzyme in the pyrimidine degradation pathway. |
| ENSG00000134070 | IRAK2 | IRAK-2 | 3p25.3 | 1.94 | 0.18 | 10.52 | 7.28x10^-26 | 6.47x10^-22 | Activated downstream of IRAK4 by TLR signalling. Mediates nuclear import of inflammatory genes |
| ENSG00000184545 | DUSP8 | C11orf81 | 11p15.5 | 2.24 | 0.23 | 9.92 | 3.46x10^-23 | 2.31x10^-19 | Dephosphorylates ERK, JNK, p38 MAPK. |
| ENSG00000266709 | RP11-214O1.2 | NA | 17p12 | 1.65 | 0.17 | 9.70 | 3.05x10^-22 | 1.63x10^-18 | Unknown. Associated with negative regulation of cell-substrate adhesion |
| ENSG00000125430 | HS3ST3B1 | 3-OST-3B | 17p12 | 1.47 | 0.15 | 9.53 | 1.53x10^-21 | 6.79x10^-18 | Last step heparan sulfate biosynthesis. Upregulated by TLR agonist exposure. |
| ENSG00000253522 | MIR3142HG | MIR3142HG | 5q33.3 | 3.15 | 0.34 | 9.39 | 6.18x10^-21 | 2.36x10^-17 | Positive regulator of IL-8 and CCL2. Induced by IL-1β. |
| ENSG00000050730 | TNIP3 | ABIN-3 | 4q27 | 2.90 | 0.31 | 9.27 | 1.94x10^-20 | 6.17x10^-17 | Negatively regulates NFkB activation in response to TNF and NFkB |
| ENSG00000266378 | RP11-214O1.3 | NA | NA | 2.01 | 0.22 | 9.26 | 2.08x10^-20 | 6.17x10^-17 | Unknown |
| ENSG00000163874 | ZC3H12A | MCPIP | 1p34.3 | 1.37 | 0.15 | 9.16 | 5.42x10^-20 | 1.45x10^-16 | Regulates mRNA decay, rapidly induced by IL-17A. Induced by IL-1B. |
| ENSG00000110944 | IL23A | IL-23 | 12q13.3 | 2.30 | 0.25 | 9.13 | 7.11x10^-20 | 1.72x10^-16 | Induced by RUNX3. Overexpressed in AS gut. IL23R is AS-associated locus. |
| ENSG00000138378 | STAT4 | SLEB11 | 2q32.2-q32.3 | 1.33 | 0.15 | 8.93 | 4.28x10^-19 | 9.52x10^-16 | AS-associated locus. Increased expression in AS PBMCs. IL23 signalling pathway |
| ENSG00000171219 | CDC42BPG | DMPK2 | 11q13.1 | 1.34 | 0.15 | 8.90 | 5.52x10^-19 | 1.13x10^-15 | Encodes MRCKγ, which acts as a downstream effector of CDC42 in cytoskeletal reorganization. |
| ENSG00000138821 | SLC39A8 | BIGM103 | 4q24 | 1.42 | 0.16 | 8.84 | 9.87x10^-19 | 1.88x10^-15 | Encodes ZIP8, metal cation transporter. Alters monocyte adhesion and recruitment by increasing Zn2+ absorption |
| ENSG00000175970 | UNC119B | POC7B | 12q24.31 | -0.60 | 0.07 | -8.81 | 1.26x10^-18 | 2.25x10^-15 | Carrier proteins for myristoylated cargo. Can induce apoptosis |
| ENSG00000164236 | ANKRD33B | ANKRD33B | 5p15.2 | 2.16 | 0.25 | 8.73 | 2.52x10^-18 | 3.96x10^-15 | Unknown |
| ENSG00000118503 | TNFAIP3 | A20 | 6q23.3 | 1.79 | 0.21 | 8.74 | 2.42x10^-18 | 3.96x10^-15 | AS associated loci. Induces autophagy. Deficiency facilitates inflammasome activation in monocytes. |
| ENSG00000172653 | C17orf66 | HEATR9 | NA | 2.47 | 0.28 | 8.71 | 2.92x10^-18 | 4.33x10^-15 | Unknown |
| ENSG00000069956 | MAPK6 | ERK3 | 15q21.2 | 0.75 | 0.09 | 8.69 | 3.76x10^-18 | 5.29x10^-15 | Phosphorylates MAP2 and MAPKAPK5. Cell cycle entry. |
| ENSG00000232810 | TNF | DIF | 6p21.33 | 1.94 | 0.23 | 8.56 | 1.13x10^-17 | 1.43x10^-14 | Inflammatory cytokine, highly expressed in AS. Monocytes are the primary producer. |

γδ T-cells

TRADD (TNF receptor type 1 associated death domain), which has previously been demonstrated to have a suggestive association with AS, was downregulated in the γδ T-cells of AS individuals (Table 4.14) 293. Due to the small number of individuals, very few DEG were significantly associated after multiple testing adjustment (Benjamini-Hochberg) (Figure 4.8 and Table 4.14). DEG were involved in cell cycle (CENPCP1, MND1, SKA3), cytoskeleton remodelling (ANK1, INF2, PEAK1) and cell differentiation (MKPL2, FOXM1, DAB2). A large fold change was identified in DNMT3L (log2FC -2.13, adj. p-value 0.21), an AS-associated locus, although it did not reach statistical significance. Interestingly, TET2, one of three TET proteins responsible for removal of DNA methylation marks, had a small increase in expression in the γδ T-cells of AS individuals, though this was only statistically significant in reverse strand reads (0.25 log2FC, adjusted p-value 0.04).
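
The Benjamini-Hochberg adjustment referred to above can be illustrated with a short Python sketch. The p-values used here are hypothetical and are not taken from the thesis data; the function name `benjamini_hochberg` is likewise illustrative. The sketch shows why a small number of tests with modest p-values can fail to reach an adjusted p-value of 0.05 even when the raw p-values are below 0.05.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
toy_pvalues = [0.001, 0.008, 0.039, 0.041, 0.60]
toy_adjusted = benjamini_hochberg(toy_pvalues)
print([round(p, 5) for p in toy_adjusted])
```

In this toy example the third and fourth genes have raw p-values below 0.05 but adjusted p-values just above it.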

The 20 most significant DEG in γδ T-cells were sufficient to differentiate AS and healthy individuals (heatmap Figure 4.9).

Figure 4.8 Volcano plot of γδ T-cell associated DEG. DEG with an adjusted p-value <0.05 are coloured and the 10 DEG with the lowest p-value within each dataset are labelled with their gene name.

Figure 4.9 Heatmaps of the 20 most significant γδ T-cell DEG for forward (A) and reverse (B) strand reads.

Table 4.13. γδ T-cell forward strand top 20 differentially expressed genes.

Gene ID | Symbol | Alias | MAP | log2FC | lfcSE | stat | pvalue | padj | Gene Function
ENSG00000204860 | FAM201A | C9orf122 | 9p13.1 | -2.42 | 0.40 | -6.13 | 8.92x10^-10 | 2.05x10^-5 | FAM201A expression is correlated with EGFR and HIF-1α levels.
ENSG00000265778 | RP11-17M16.2 | NA | 18q23 | 2.41 | 0.44 | 5.52 | 3.30x10^-8 | 3.80x10^-4 | Unknown
ENSG00000230385 | AC012507.4 | NA | NA | -1.72 | 0.34 | -5.04 | 4.69x10^-7 | 0.004 | Unknown
ENSG00000241685 | ARPC1A | Arc40 | 7q22.1 | -1.19 | 0.24 | -4.90 | 9.38x10^-7 | 0.005 | Arp2/3 protein complex subunit. May be involved in control of actin polymerization. Interacts with ERBB2.
ENSG00000259188 | CTD-2647E9.3 | NA | NA | 1.41 | 0.30 | 4.79 | 1.68x10^-6 | 0.007 | Unknown
ENSG00000162910 | MRPL55 | AAVG5835 | 1q42.13 | 1.60 | 0.34 | 4.76 | 1.92x10^-6 | 0.007 | Involved in G2/M phase cell cycle progression.
ENSG00000203485 | INF2 | C14orf151 | 14q32.33 | -1.60 | 0.34 | -4.75 | 2.04x10^-6 | 0.007 | Necessary for Golgi architecture, podosome structure and centrosome repositioning in T-cells after TCR engagement.
ENSG00000162512 | SDC3 | SDCN | 1p35.2 | 2.89 | 0.63 | 4.62 | 3.85x10^-6 | 0.011 | Cell surface molecule expressed in synovia and leukocytes, may bind CXCL8. Potential anti-inflammatory effect by leukocyte rolling & adhesion.
ENSG00000226982 | CENPCP1 | CENPC | NA | 1.15 | 0.25 | 4.59 | 4.38x10^-6 | 0.011 | Maintains proper kinetochore size and timely transition to anaphase.
ENSG00000196372 | ASB13 | ASB13 | 10p15.1 | 2.91 | 0.64 | 4.57 | 4.90x10^-6 | 0.011 | Couples SOCS proteins with binding partners, possibly to tag for degradation.
ENSG00000051596 | THOC3 | THO3 | 5q35.2 | -1.78 | 0.40 | -4.42 | 9.75x10^-6 | 0.020 | Involved in transcriptional elongation, pre-miRNA processing and nuclear mRNA transport.
ENSG00000271868 | RP11-1293J14.1 | NA | NA | 1.92 | 0.44 | 4.37 | 1.24x10^-5 | 0.022 | Unknown
ENSG00000139531 | SUOX | SUOX | 12q13.2 | 2.26 | 0.52 | 4.37 | 1.24x10^-5 | 0.022 | Catalyzes oxidation of sulphite to sulfate. RA associated. Located in mitochondrial intermembrane space.
ENSG00000164967 | RPP25L | C9orf23 | 9p13.3 | -2.92 | 0.70 | -4.14 | 3.40x10^-5 | 0.054 | Potential component of ribonuclease P.
ENSG00000224831 | RP11-651P23.4 | NA | NA | 2.11 | 0.51 | 4.13 | 3.60x10^-5 | 0.054 | Unknown
ENSG00000258760 | CTD-2509G16.5 | NA | NA | -2.12 | 0.52 | -4.12 | 3.83x10^-5 | 0.054 | Unknown
ENSG00000153071 | DAB2 | DOC-2 | 5p13.1 | 1.67 | 0.41 | 4.11 | 4.01x10^-5 | 0.054 | Negative regulator of Wnt signalling pathway. Targeted by FOXP3 and necessary for Treg function in maintaining naïve T-cell homeostasis.
ENSG00000159348 | CYB5R1 | B5R.1 | 1q32.1 | 2.28 | 0.56 | 4.07 | 4.76x10^-5 | 0.056 | Enzyme involved in oxidative stress protection and drug metabolism.
ENSG00000131398 | KCNC3 | KSHIIID | 19q13.33 | -2.25 | 0.55 | -4.07 | 4.80x10^-5 | 0.056 | Voltage-gated potassium channel. Associated with ER stress.
ENSG00000075420 | FNDC3B | FAD104 | 3q26.31 | 0.49 | 0.12 | 4.06 | 4.85x10^-5 | 0.056 | Negatively regulates osteoclast differentiation and interacts with STAT3.

Table 4.14 γδ T-cell reverse strand top 20 differentially expressed genes.

Gene ID | Symbol | Alias | MAP | log2FC | lfcSE | stat | pvalue | padj | Gene Function
ENSG00000121211 | MND1 | GAJ | 4q31.3 | 1.98 | 0.37 | 5.33 | 9.95x10^-8 | 0.001 | Necessary for cell cycle. Forms complex with HOP2.
ENSG00000227507 | LTB | TNFC | 6p21.33 | -0.84 | 0.16 | -5.39 | 7.02x10^-8 | 0.001 | TNF/LTA/LTB locus. Type II membrane protein of the TNF family.
ENSG00000029534 | ANK1 | ANK | 8p11.21 | -1.65 | 0.32 | -5.21 | 1.85x10^-7 | 0.002 | Cell membrane protein. Interacts with cytoskeletal proteins (e.g. IkBζ, GP85).
ENSG00000173517 | PEAK1 | SGK269 | 15q24.3 | 0.42 | 0.08 | 5.17 | 2.28x10^-7 | 0.002 | Cytoskeleton associated modulating focal adhesion dynamics. Scaffold for EGFR.
ENSG00000252577 | SCARNA20 | ACA66 | 17q23.2 | -1.77 | 0.36 | -4.92 | 8.76x10^-7 | 0.005 | Downregulated by PRDM1 overexpression.
ENSG00000019186 | CYP24A1 | CP24 | 20q13.2 | -2.66 | 0.55 | -4.88 | 1.03x10^-6 | 0.005 | Controls active vitamin D availability and plays role in calcium homeostasis.
ENSG00000261396 | CTD-2012K14.2 | NA | NA | 1.17 | 0.25 | 4.73 | 2.27x10^-6 | 0.009 | Unknown
ENSG00000165480 | SKA3 | C13orf3 | 13q12.11 | 1.11 | 0.24 | 4.65 | 3.34x10^-6 | 0.011 | SKA1 complex component which is essential for proper chromosome segregation.
ENSG00000129255 | MPDU1 | CDGIF | 17p13.1 | 0.43 | 0.09 | 4.61 | 4.01x10^-6 | 0.012 | ER membrane protein required for use of mannose-dolichol phosphate in oligosaccharides and glycosylphosphatidylinositols.
ENSG00000225345 | SNX18P3 | SNX18P3 | NA | -1.45 | 0.32 | -4.56 | 5.23x10^-6 | 0.014 | Unknown
ENSG00000256663 | RP11-424C20.2 | NA | NA | 1.77 | 0.39 | 4.50 | 6.79x10^-6 | 0.015 | Pseudogene of UHRF1, may act as a competing endogenous RNA.
ENSG00000140836 | ZFHX3 | ATBF1 | 16q22.2-q22.3 | 0.99 | 0.22 | 4.51 | 6.42x10^-6 | 0.015 | Transcriptional regulator. Associated with RUNX3. Upregulates CDKN1A following TGFβ stimulation.
ENSG00000135898 | GPR55 | LPIR1 | 2q37.1 | -1.39 | 0.31 | -4.43 | 9.35x10^-6 | 0.015 | Leukocyte migration, signalling to NFAT, NFkB and CREB pathways.
ENSG00000250992 | RP11-826N14.1 | NA | NA | -2.83 | 0.64 | -4.42 | 1.00x10^-5 | 0.015 | Unknown
ENSG00000256316 | HIST1H3F | NULL | NA | 1.07 | 0.24 | 4.41 | 1.02x10^-5 | 0.015 | Histone.
ENSG00000160539 | PLPP7 | C9orf67 | 9q34.13 | 2.65 | 0.60 | 4.41 | 1.04x10^-5 | 0.015 | Negative regulator of myoblast differentiation via MTOR signalling.
ENSG00000111206 | FOXM1 | FKHL16 | 12p13.33 | 1.27 | 0.29 | 4.42 | 9.80x10^-6 | 0.015 | Transcription factor and master regulator of activated mature T-cells. High expression in T-regs.
ENSG00000077348 | EXOSC5 | RRP41B | 19q13.2 | -0.50 | 0.11 | -4.42 | 1.00x10^-5 | 0.015 | RNA degradation factor.
ENSG00000101493 | ZNF516 | HsT287 | 18q23 | 0.76 | 0.17 | 4.38 | 1.19x10^-5 | 0.017 | Transcriptional repressor of EGFR, associated with bone mineral density.
ENSG00000102871 | TRADD | Hs.89862 | 16q22.1 | -0.45 | 0.10 | -4.36 | 1.29x10^-5 | 0.017 | Suggestive AS-association. Adapter molecule for TNFRSF1A. Overexpression leads to apoptosis and NF-κB activation.

NK cells

Many of the genes identified as differentially expressed in NK cell forward strand reads are not annotated, far more than in other cell types (Table 4.15 and Table 4.16).

Similar to CD14+monocytes, the majority of DEG identified in NK cells had increased expression in AS individuals (Figure 4.10). Annotated genes were related to cytoskeleton remodelling (ARHGEF11, ABI2, TAGLN), cell adhesion (CDH23, ITGA1) and cell activation and migration (RAPH1, MGLL, CCR2). NFKBIZ, which had increased expression in the NK cells of AS individuals, is essential for NK cell IFN-γ production and cytotoxic activity. NFKBIZ is induced by IL-17A, IL-17B and IL-1β.

In the NK cells of AS individuals, both TNF and RELT, a TNF receptor, were upregulated (TNF: reverse strand 1.08 log2FC, adj. p-value 1.54x10^-3).
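
For readers less familiar with the log2 scale, the following minimal Python sketch converts a log2 fold change to a linear fold change. The two input values are the reverse strand NK cell estimates reported for TNF (1.08) and CCR2 (-1.41, Table 4.16); the function name `fold_change` is illustrative only.

```python
def fold_change(log2fc):
    """Convert a log2 fold change to a linear fold change (AS relative to healthy)."""
    return 2.0 ** log2fc


tnf_fc = fold_change(1.08)    # upregulated: ratio above 1
ccr2_fc = fold_change(-1.41)  # downregulated: ratio below 1

print(f"TNF: {tnf_fc:.2f}-fold higher in AS")
print(f"CCR2: {1 / ccr2_fc:.2f}-fold lower in AS")
```

A log2FC of 1.08 therefore corresponds to slightly more than a doubling of expression, and a log2FC of -1.41 to a reduction of between two- and three-fold.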

Figure 4.10 Volcano plot of NK cell associated DEG

DEG with an adjusted p-value <0.05 are coloured and the 10 DEG with the lowest p-value within each dataset are labelled with their gene name.

Table 4.15 NK cell forward strand top 20 differentially expressed genes.

Gene ID | Symbol | Alias | MAP | log2FC | lfcSE | stat | pvalue | padj | Gene Function
ENSG00000132003 | ZSWIM4 | ZSWIM4 | 19p13.13-p13.12 | 1.64 | 0.25 | 6.65 | 2.87x10^-11 | 6.61x10^-7 | Unknown
ENSG00000132694 | ARHGEF11 | GTRAP48 | 1q23.1 | 1.33 | 0.25 | 5.24 | 1.57x10^-7 | 0.002 | Guanine-nucleotide exchange factor for RhoA, regulator of actin cytoskeleton dynamics.
ENSG00000248898 | CTD-2288O8.1 | NA | NA | 2.30 | 0.44 | 5.20 | 2.01x10^-7 | 0.002 | Unknown
ENSG00000138443 | ABI2 | ABI-2 | 2q33.2 | 1.03 | 0.20 | 5.15 | 2.65x10^-7 | 0.002 | Regulator of actin cytoskeleton dynamics, and substrate for non-tyrosine kinase ABL1 and ABL2.
ENSG00000256928 | RP11-809N8.2 | NA | NA | 0.70 | 0.14 | 5.05 | 4.42x10^-7 | 0.002 | Unknown
ENSG00000259884 | RP11-1100L3.8 | NA | NA | 1.66 | 0.33 | 5.04 | 4.64x10^-7 | 0.002 | Targets UPF1 in PsA.
ENSG00000233547 | RP11-57H14.2 | NA | NA | 0.90 | 0.18 | 5.00 | 5.79x10^-7 | 0.002 | Unknown
ENSG00000157045 | NTAN1 | PNAA | 16p13.11 | -0.60 | 0.12 | -4.89 | 1.03x10^-6 | 0.003 | Stepwise process of protein degradation through the N-end rule pathway.
ENSG00000173451 | THAP2 | THAP2 | 12q21.1 | 0.30 | 0.06 | 4.87 | 1.12x10^-6 | 0.003 | Binds nucleic acid. Associated with imatinib resistance.
ENSG00000144802 | NFKBIZ | IKBZ | 3q12.3 | 0.92 | 0.19 | 4.86 | 1.19x10^-6 | 0.003 | Induced by IL-17A/IL-17F heterodimer. Inhibits NF-κB. Activates IL-6 expression. Essential for IL-12 and IL-18 activation of NK cells.
ENSG00000121316 | PLBD1 | PLBD1 | 12p13.1 | -0.68 | 0.14 | -4.86 | 1.20x10^-6 | 0.003 | Weak phosphatase activity
ENSG00000158816 | VWA5B1 | VWA5B1 | 1p36.12 | 2.24 | 0.46 | 4.84 | 1.32x10^-6 | 0.003 | Unknown
ENSG00000186047 | DLEU7 | DLEU7 | 13q14.3 | -1.30 | 0.27 | -4.80 | 1.62x10^-6 | 0.003 | Unknown
ENSG00000255441 | CTD-2616J11.2 | NA | 19q13.41 | 1.67 | 0.35 | 4.76 | 1.91x10^-6 | 0.003 | Unknown
ENSG00000259883 | EHD4-AS1 | EHD4-AS1 | 15q15.1 | 1.02 | 0.22 | 4.70 | 2.55x10^-6 | 0.004 | Unknown
ENSG00000184596 | AF207550.1 | NULL | NA | 0.52 | 0.11 | 4.69 | 2.73x10^-6 | 0.004 | Unknown
ENSG00000227946 | AC007383.3 | NA | NA | 0.33 | 0.07 | 4.65 | 3.40x10^-6 | 0.005 | Unknown
ENSG00000151327 | FAM177A1 | C14orf24 | 14q13.2 | 0.31 | 0.07 | 4.63 | 3.67x10^-6 | 0.005 | Associated with JIA
ENSG00000272072 | CTA-363E19.2 | NA | NA | 0.51 | 0.11 | 4.63 | 3.72x10^-6 | 0.005 | Unknown
ENSG00000256249 | RP11-324E6.6 | NA | NA | 2.02 | 0.44 | 4.61 | 4.12x10^-6 | 0.005 | Unknown

Table 4.16 NK cell reverse strand top 20 differentially expressed genes.

Gene ID | Symbol | Alias | MAP | log2FC | lfcSE | stat | pvalue | padj | Gene Function
ENSG00000138670 | RASGEF1B | GPIG4 | 4q21.21 | 1.35 | 0.20 | 6.86 | 6.67x10^-12 | 1.78x10^-7 | Guanine-nucleotide exchange factor stimulated by TLR3 and TLR4.
ENSG00000067082 | KLF6 | BCD1 | 10p15.2 | 0.67 | 0.10 | 6.59 | 4.34x10^-11 | 3.86x10^-7 | Targets E-cadherin, MMP14, p21. Upregulated by TGF-B.
ENSG00000267519 | MIR24-2 | MIRN24-2 | 19p13.12 | 1.36 | 0.21 | 6.63 | 3.31x10^-11 | 3.86x10^-7 | Precursor miRNA. Correlated expression with TNF, IL-6, IL1B, TLR4, CXCL8.
ENSG00000107736 | CDH23 | CDHR23 | 10q22.1 | 1.31 | 0.21 | 6.18 | 6.57x10^-10 | 4.38x10^-6 | Calcium dependent cell-cell adhesion glycoproteins.
ENSG00000213949 | ITGA1 | CD49a | 5q11.2 | 1.91 | 0.31 | 6.13 | 8.95x10^-10 | 4.78x10^-6 | Form a cell-surface receptor with ITGB1 for collagen and laminin. Cell adhesion link in inflammation.
ENSG00000075340 | ADD2 | ADDB | 2p13.3 | 1.77 | 0.29 | 6.07 | 1.26x10^-9 | 5.59x10^-6 | Expression is restricted to brain and hematopoietic tissues. Promotes the assembly of the spectrin-actin network.
ENSG00000175197 | DDIT3 | C/EBPzeta | 12q13.3 | 0.72 | 0.12 | 5.96 | 2.60x10^-9 | 9.90x10^-6 | ER stress response transcription factor. Positively regulates IL-6, IL-23, TNFRSF10B. Inhibits canonical Wnt-signalling by binding to TCF4.
ENSG00000261026 | CTD-3247F14.2 | NA | NA | 2.49 | 0.43 | 5.78 | 7.68x10^-9 | 2.56x10^-5 | Unknown
ENSG00000173166 | RAPH1 | ALS2CR18 | 2q33.2 | 1.46 | 0.26 | 5.61 | 1.98x10^-8 | 4.41x10^-5 | Cell migration
ENSG00000121807 | CCR2 | CC-CKR-2 | 3p21.31 | -1.41 | 0.25 | -5.64 | 1.65x10^-8 | 4.41x10^-5 | Monocyte chemoattractant, involved in RA
ENSG00000144802 | NFKBIZ | IKBZ | 3q12.3 | 0.93 | 0.16 | 5.64 | 1.73x10^-8 | 4.41x10^-5 | Induced by IL-17A/IL-17F. Inhibits NF-κB. Activates IL-6 expression. Essential for IL-12 and IL-18 activation of NK cells.
ENSG00000165548 | TMEM63C | C14orf171 | 14q24.3 | 1.51 | 0.27 | 5.63 | 1.84x10^-8 | 4.41x10^-5 | Calcium channel mediating the glomerular filtration barrier function.
ENSG00000074416 | MGLL | HU-K5 | 3q21.3 | 1.52 | 0.27 | 5.54 | 3.09x10^-8 | 5.50x10^-5 | Cell activation marker. Catalyzes the conversion of monoacylglycerides to free fatty acids and glycerol.
ENSG00000164142 | FAM160A1 | FAM160A1 | 4q31.3 | 1.86 | 0.33 | 5.56 | 2.73x10^-8 | 5.50x10^-5 | Affects NK/T-cell lymphoma.
ENSG00000149591 | TAGLN | SM22 | 11q23.3 | 0.99 | 0.18 | 5.55 | 2.94x10^-8 | 5.50x10^-5 | Shape change and transformation sensitive actin-binding protein.
ENSG00000231508 | RP11-452K12.6 | NA | NA | 1.29 | 0.23 | 5.52 | 3.44x10^-8 | 5.74x10^-5 | Unknown
ENSG00000054967 | RELT | TNFRSF19L | 11q13.4 | 0.71 | 0.13 | 5.49 | 4.12x10^-8 | 6.46x10^-5 | TNF receptor that induces apoptosis and mediates p38 and JNK activation.
ENSG00000168917 | SLC35G2 | TMEM22 | 3q22.3 | 2.30 | 0.43 | 5.37 | 7.92x10^-8 | 1.11x10^-4 | Unknown
ENSG00000197301 | RP11-366L20.2 | NA | 12q14.3 | 2.76 | 0.51 | 5.37 | 7.81x10^-8 | 1.11x10^-4 | Unknown
ENSG00000123358 | NR4A1 | GFRP1 | 12q13.13 | 1.40 | 0.26 | 5.34 | 9.26x10^-8 | 1.24x10^-4 | Stress induced transcription factor, SUMO-linked ubiquitination leads to macrophage death and cytokine signalling.

Pathway analysis

Pathway analysis identified GO and KEGG terms which were overrepresented within the previously identified significant DEGs (adjusted p-values <0.05). Greater numbers of DEGs were associated with reverse strand reads, although CD8+T-cells, γδ T-cells and NK cells had a small number of DEGs (<100) for both forward and reverse strand reads. Genes identified as associated with specific KEGG or GO terms are listed in order of significance within the DEG analysis (adjusted p-value).
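
Over-representation of a term is commonly assessed with a one-sided hypergeometric test, which asks how likely it is to see at least the observed number of term genes among the DEGs by chance. The following minimal Python sketch illustrates that calculation under the assumption that this standard test applies here; the thesis does not state the exact test used at this point. The function name `enrichment_p` and all counts in the example are hypothetical.

```python
from math import comb


def enrichment_p(n_universe, n_term, n_deg, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    n_universe: number of genes tested (background)
    n_term:     number of background genes annotated to the term
    n_deg:      number of significant DEGs
    n_overlap:  number of DEGs annotated to the term
    """
    total = comb(n_universe, n_deg)
    upper = min(n_term, n_deg)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_deg - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy example: 20 background genes, 5 in the term,
# 4 DEGs, of which 2 belong to the term.
toy_p = enrichment_p(n_universe=20, n_term=5, n_deg=4, n_overlap=2)
print(f"toy enrichment p-value = {toy_p:.4f}")
```

With very few DEGs, as for CD8+T-cells and γδ T-cells, the overlap for any single term is necessarily small, which limits how extreme this tail probability can be.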

CD4+T-cells

GO terms of leukocyte degranulation (including SYK, TYROBP, YPEL5, SLC2A3, CLEC12A, FCER1G, NR4A3, CD93, MNDA, ITGAX, FCGR2A, S100A9, SIGLEC14, SERPINA1, LILRB2, OSCAR, CD33, DOK3, MMP9, SLC11A1, LILRB3, CCR2) and leukocyte differentiation (SYK, TYROBP, JUN, DUSP10, HHEX, FCER1G, JUNB, LTBR, LYN, MEF2C, NFAM1, CSF1R, LILRB2, VNN1, OSCAR, IL18, CD1D, SPI1, TLR4, FOS, CBFA2T3, PTGER4, MMP9, DNAJB9, KLF6, CD86, PLCG2, LRRK1, CCR1, LILRB3, STAT3, BTK, IFNG, HLX, NFKBIZ, FAM20C, BCL3, LIG4, GPR18, JMJD6, RELB, SOCS1, CD8A, FASN, TMEM64, BATF2, LILRB1, IRF2BP2, SH3RF1, SBNO2, SEMA4A) were enriched within the group of DEG identified in CD4+T-cells (both forward and reverse reads). Multiple genes overlap between the GO terms of leukocyte degranulation and leukocyte differentiation (SYK, TYROBP, FCER1G, LYN, NFAM1, LILRB2, VNN1, OSCAR, MMP9, LILRB3, BTK) (Figure 4.11). In addition, CD4+T-cell forward strand reads were associated with vesicle organisation (HOOK2, SERPINA1, RAB11A, VTI1B, STX1A, STX3, DYSF, CHMP1B, RAB14, DNM3), and reverse strand reads were associated with GO terms of response to bacterium.
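
The overlap between two GO terms, which is what a cnetplot displays as genes connected to more than one term node, amounts to a set intersection. The following minimal Python sketch uses a small subset of the genes named above for each term; the subsets are chosen for illustration only and are not the full term memberships.

```python
# Illustrative subsets of the genes listed above for each GO term.
degranulation = {"SYK", "TYROBP", "FCER1G", "MMP9", "S100A9", "CCR2"}
differentiation = {"SYK", "TYROBP", "FCER1G", "MMP9", "JUN", "STAT3"}

# Genes annotated to both terms (drawn with two edges in a cnetplot).
shared = sorted(degranulation & differentiation)
# Genes specific to one term in this subset.
only_degranulation = sorted(degranulation - differentiation)
only_differentiation = sorted(differentiation - degranulation)

print("shared:", shared)
print("degranulation only:", only_degranulation)
print("differentiation only:", only_differentiation)
```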

Figure 4.11 Cnetplot of CD4+T-cell reverse strand-associated GO terms. Highly overlapping terms with leukocyte degranulation are shown as a dense web on the bottom right.

The KEGG terms identified as associated with CD4+T-cells in both forward and reverse strand reads included osteoclast differentiation (SYK, TYROBP, JUN, FCGR2C, JUNB, JUND, FCGR2A, NCF1, CSF1R, LILRB2, OSCAR, LILRA2, SPI1, FOS, LILRA1, SOCS3, LILRA5, PLCG2, LILRB3, BTK, IFNG, FOSB, LILRA6, RELB, FCGR1A, SOCS1, LILRB1, NCF2, MAPK1, LILRB4, RELA, TRAF6, BLNK, TNFRSF1A, FCGR3A, NFKB2, PIK3R1, PPP3CC) and phagosome (FCGR2C, FCGR2A, CYBB, NCF1, CLEC7A, TLR4, TUBA1A, HLA-DMB, HLA-DRB5, HLA-DMA, HLA-DRA, COLEC12, FCGR1A, MPO, CTSS, NCF2, TFRC, ITGA5, CD209, HLA-DQA1, ATP6V1C1, FCAR, M6PR, FCGR3A, ATP6V0B, ATP6V1A, ITGAV, HLA-DPA1, TUBB4B, RAB5A, STX7).

Several disease-related KEGG terms were associated with CD4+T-cell DEGs, including tuberculosis and Epstein-Barr virus infection in both forward and reverse strand reads, and influenza A, toxoplasmosis, leishmaniasis and SLE in reverse strand reads. These disease-related KEGG terms overlapped with DEGs involved in osteoclast differentiation, phagosome and, for reverse strand terms, antigen processing and presentation (HSPA6, HLA-DMB, HLA-DRB5, HLA-DMA, IFNG, HLA-DRA, HSPA1A, CIITA, CTSS, HSPA2, CD74, HLA-DQA1, HLA-DPA1).

CD8+T-cells

Forward strand CD8+T-cell reads did not have any GO or KEGG terms with more than two associated DEG. Numerous terms were associated with only JUN and FOS, making it difficult to determine which term is truly enriched within the samples. When a nominally significant p-value (p-value <0.01, permuted FDR = 0.06) was used instead of the adjusted p-value, forward strand DEGs were enriched for GO terms of CD8-positive, alpha-beta T-cell activation (EOMES, RUNX1, MAPK8IP1), regulation of inflammatory response (TNFRSF11A, FOXP3, DUSP10, MFHAS1, PTGS2, GGT1, OSM, NT5E, VTN, STAP1, PER1, NPPA, CFB), negative regulation of production of molecular mediator of immune response, and regulation of epithelial migration (JUN, DUSP10, SEMA5A, PPARG, RHOB, SPARC, ITGB3).

CD8+T-cell reverse strand reads had a greater number of DEG passing the adjusted p-value cut-off (78 genes), with GO terms of response to cAMP (JUN, FOS, JUND, DUSP1, SPARC, JUNB), myeloid leukocyte differentiation (JUN, TNFRSF11A, FOS, PPARG, PF4, LIF, JUNB), and regulation of DNA binding. Further specificity within these parent terms was obtained at nominal p-value, including positive regulation of leukocyte differentiation (JUN, DUSP10, FOS, PF4, LIF, ZBTB46, TRAF6, PRKCA, IFNG, NFKBIZ, GATA3, SOCS1, RAG1, IL2, IL2RA), and positive regulation of epithelial cell migration (JUN, DUSP10, SEMA5A, PPARG, RHOB, SPARC, ITGB3, PDCD6, PLK2, PRKCA, WNT7A, IFNG, NOS3, MIR23A, VEGFA, ADAM9, GATA3).

KEGG terms reflected the small number of DEG passing the adjusted p-value cut-off; therefore, results at the nominal p-value cut-off (p-value <0.01) are discussed. Similar to CD4+T-cells, osteoclast differentiation was an enriched term in both forward and reverse strand reads (JUN, TNFRSF11A, FOS, JUND, PPARG, ITGB3, JUNB, TRAF6, GAB2, FOSB, FOSL1, IFNG, SOCS3, PPP3CC, TRAF2, SOCS1), as was spliceosome (TRA2B, BCAS2, DHX8, SNW1, SRSF3, EIF4A3, PRPF18, PHF5A, SRSF10, SNRPB2, BUD31, HSPA2). Forward strand reads were associated with purine and pyrimidine metabolism. CD8+T-cell reverse strand reads were associated with MAPK signalling pathway (JUN, DUSP10, FOS, DDIT3, JUND, DUSP1, GADD45G, DUSP5, EGF, TRAF6, PRKCA, PPP3CC, CACNA1D, NR4A1, VEGFA, TRAF2, EREG, HSPA2), apoptosis (JUN, FOS, DDIT3, PMAIP1, TUBA1A, CYCS, GADD45G, DIABLO, DFFB, TRAF2), and IL17 signalling pathway (JUN, FOS, JUND, TRAF6, TRAF4, FOSB, FOSL1, IFNG, TRAF2).

CD14+monocytes

CD14+monocyte forward and reverse strand DEGs were enriched for GO terms of inflammatory response (XCR1, TNFAIP3, CD6, ADORA2A, TNIP1, NFKBIA, PRKCQ, TNF, IL6, PLGRKT, THBS1, TNIP3, IRAK2, ZC3H12A, NFKBIZ, PTGER4, TBK1, IL2RA, NT5E, ADORA2B, BIRC2, IL1B, HIF1A, ICAM1, NFKB1, KDM6B, CSF1, IL17RB, TNFRSF4, NFKBID, CXCL3, CLU, ECM1, REL, PXK, CXCL2, SMAD3, NAMPT, STK39, PIK3AP1, B4GALT1, ORM2, CXCL1, CD40LG, ABCF1), leukocyte cell-cell adhesion (CD6, ADORA2A, TNIP1, PRKCQ, TNF, IL6, LAX1, ZC3H12A, NFKBIZ, ITGB7, IL2RA, NT5E, BAD, IL1B, ICAM1, CD83, MIA3, ADTRP, MAP3K8, IRF1, NFKBID, NR4A3, ZAP70, PODXL2, DPP4, ITGB1, LRRC32, LGALS3, SHH, CD40LG), and negative regulation of intracellular signal transduction (including TNFAIP3, TNIP1, TNIP3, IL1B, DUSP2, ITGB1, CD22, CARD19, TNFAIP1, PDE3B) (Figure 4.12). As expected, the GO term of response to tumour necrosis factor (TNF) was enriched (TRAF1, TNFAIP3, NFKBIA, TNF, ZC3H12A, GBP2, ADAMTS13, GSDME, TNFRSF18, ZFP36, BIRC2, ICAM1, ZFP36L1, NFKB1, TNFRSF4, HSPA1A, CD40LG). CD14+monocyte reverse strand DEG were associated with GO terms of regulation of mRNA metabolic process, nucleus organization and positive regulation of cytokine production (IL23A, TNF, CD83, RIPK2, CD274, CTNNB1, PDE4B, DDIT3, NFKB1, PTGER4, RGCC, NFKB2, TICAM1, DDX21, DDX3X, HIF1A, CCR7, CD226, FLT4, NLRP3, CD80, EIF2AK3, THBS1, CREB1, TRIM32, BIRC3, TUSC2, RELA, ADAM17, GBP5, SMAD3, IRF1, PUM2, TNFRSF14, CRLF2, CLU, MAVS, NR4A3, MAPKAPK2, PELI1, BCL10, HHLA2, PUM1, TIRAP, ATF4, SELENOK, CARD8, HSPA1B, TBK1, EREG, MBP, DHX33, AKIRIN2, GPRC5B, PANX1, LY9, IL1RAP, TRAF6, HSPA1A).

Figure 4.12. Cnetplot of the 5 most statistically significant GO terms associated with CD14+monocyte DEG in forward (left) and reverse (right) strand reads.

KEGG terms associated with CD14+monocyte DEG included TNF signalling pathway (CXCL1, CXCL2, CXCL3, IRF1, CSF1, MAP3K8, NFKB1, ICAM1, CFLAR, IL1B, BIRC2, IL6, TNF, NFKBIA, TNFAIP3, TRAF1, MAP2K3, MAPK1, ATF4, SOCS3, RELA, BIRC3, CREB1, EDN1), NF-kappa B signalling pathway (CD40LG, CXCL1, BCL2A1, CXCL2, ZAP70, CXCL3, NFKB1, ICAM1, CFLAR, IL1B, BIRC2, GADD45B, TNF, PRKCQ, NFKBIA, TNFAIP3, TRAF1) and NOD-like receptor signalling pathway (TRAF6, GABARAPL1, GBP4, CARD16, PANX1, MAPK1, DHX33, TBK1, CARD8, CXCL3, MAVS, TANK, GBP5, RELA, GBP1, VDAC2, BIRC3, CARD17, NLRP3, NAMPT, TICAM1, NFKBIB, NFKB1, GBP2, NFKBIA, RIPK2, TNF, TNFAIP3) (Figure 4.13). Both apoptosis (BCL2A1, PMAIP1, ERN1, NFKB1, CFLAR, BAD, BIRC2, TUBA3E, GADD45B, TNF, NFKBIA, TRAF1) and ferroptosis were significantly enriched pathways in the CD14+monocytes of AS individuals.

Figure 4.13 Cnet plot of CD14+monocyte reverse strand read associated KEGG terms.

γδ T-cell

GO term analysis of γδ T-cells for forward strand reads was untenable as only a single term had more than one associated DEG (actin cytoskeleton organization with ARPC1A and INF2). Reverse strand reads had greater numbers of DEG, with the largest number of DEG associated with small GTPase mediated signal transduction (GPR55, FOXM1, RAB13, C15orf62, ARHGAP32, RHOBTB3). At nominal p-value cut-off (p-value <0.01, permuted FDR 0.32) the term with the lowest p-value in forward strand reads was vascular smooth muscle contraction (ACTA2, CD38, P2X1, DOCK4, DOCK5), and in reverse strand reads was DNA packaging (including multiple histone genes and NAA10, GPER1, CENPW, CDCA5, NUSAP1, NCAPG, CDK1, TOP2A, NCAPG2, SPTY2D1, OIP5, TPR, , CHAF1A).

KEGG terms associated at nominal p-value (p-value <0.01, permuted FDR 0.32) with γδ T-cell forward strand reads included Notch signalling (HDAC1, RBPJ, PSEN2, APH1B, LFNG), oxidative phosphorylation and several metabolism terms (glyoxylate and dicarboxylate metabolism, carbon metabolism, and glycine, serine and threonine metabolism). Reverse strand reads were associated with KEGG terms of cell cycle (ESPL1, CCNA2, CCNB2, CDK1, MCM4, PLK1, CDC27, CDC6, BUB1B, TFDP2), AMPK signalling pathway (CCNA2, PRKAA1, STK11, PPP2R3A, FBP1, IGF1R, PFKP, PPARG) and adherens junction (ERBB2, FER, NECTIN4, FARP2, SSX2IP, IGF1R), alongside several disease terms associated with histone markers.

NK cells

NK cell DEG at adjusted p-value <0.05 were associated with GO terms of positive regulation of cell-cell adhesion (NFKBIZ, AIF1, MYO10, TNFSF13B), inflammatory response (NFKBIZ, AIF1, P2RX1, FOS, ORM2) and cognition (NTAN1, RIN1, FOS, SGK1) in forward strand reads. In reverse strand reads, associated GO terms included regulation of smooth muscle cell proliferation (TCF7L2, TNF, TRIB1, ITGA2, IFNG, KLF4, JUN, AIF1, ANG, NAMPT, NR4A3, MYB, EDN1), muscle structure development (DDIT3, TAGLN, EGR3, TCF7L2, ATF3, NEURL1, MRAS, DDX5, TNF, EGR1, MIR23A, FZD1, CDON, ZBTB18, FOS, SGCA, LMNA, ASF1A, NR1D2, ZFPM1, HES1, SORT1, HEY1, SLC8A1, DMD, EDN1, YBX3), positive regulation of leukocyte cell-cell adhesion (CCR2, NFKBIZ, EGR3, DUSP10, TNF, CD83, CEACAM1, IFNG, , AIF1, TNFRSF13C, SFTPD, VCAM1, SIRPB1, NR4A3, HES1, MYB, LILRB2) and humoral immune response (CCR2, TNF, CD83, RGCC, IFNG, C1QA, SFTPD, FCER2, ITLN1, RNASE6, CFD, PPBP, SLC11A1) (Figure 4.14). Several genes are shared between these terms.

Figure 4.14 Cnet plot of NK cell reverse strand associated GO terms

Forward strand KEGG terms in NK cells were again restricted to nominal p-values, and included SNARE interaction in vesicular transport (STX3, VTI1A, STX1A, STX18, STX17), sphingolipid signalling pathway (NSMAF, DEGS2, S1PR1, RAC1, PPP2R2D, MAPK12, SPTLC2, FYN, PPP2R1B) and MAPK signalling pathway. NK cell reverse strand KEGG terms included a mix of terms relating to cardiomyopathy, the largest number of DEG being associated with the KEGG term arrhythmogenic right ventricular cardiomyopathy (ARVC) (ITGB3, DMD, SLC8A1, LMNA, SGCA, JUP, ITGA2, CACNA2D4, TCF7L2, ITGA1), along with the KEGG term transcriptional misregulation in cancer (MEN1, NR4A3, EWSR1, BCL2A1, SPI1, CSF1R, JUP, CDK14, DDX5, LYL1, NFKBIZ, DDIT3). KEGG terms identified regarding specific cell function included apoptosis (DIABLO, LMNA, BCL2A1, FOS, IL3RA, JUN, TNF, PTPN13, PMAIP1, DDIT3) and MAPK signalling pathway (JUND, FOS, CSF1R, JUN, DUSP1, TNF, MRAS, CACNA2D4, DUSP10, NR4A1, DDIT3).

Discussion

From 500 potential samples, derived from AS patients and healthy controls across five cell types, 420 (84%) had sufficient RNA to prepare RNAseq libraries.

Low input library preparation kits enabled more biological replicates with some reduction in coverage of low expression genes. It is assumed that low expression genes contribute less to disease variation, and low gene counts are generally removed during analysis.

The average number of raw reads for all samples was 49 million. Four samples had poor quality libraries (less than 40 million raw reads, and less than 50% of reads aligned), and were removed from further analysis. After removal the average number of aligned reads per sample was 39.2 million (range 23.3 million to 87.4 million), which was in line with ENCODE guidelines for RNAseq studies (>30 million aligned reads). The average length of the aligned reads was 287bp.
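
The library exclusion rule described above can be written as a simple filter. The following minimal Python sketch assumes, as the wording suggests, that a library was excluded only when both conditions held (fewer than 40 million raw reads and fewer than 50% of reads aligned). The sample names and read counts are hypothetical and the function name `is_poor_quality` is illustrative.

```python
MIN_RAW_READS = 40_000_000      # raw read threshold stated in the text
MIN_ALIGNED_FRACTION = 0.50     # alignment rate threshold stated in the text


def is_poor_quality(raw_reads, aligned_reads):
    """Flag a library that fails both the depth and the alignment criteria."""
    low_depth = raw_reads < MIN_RAW_READS
    low_alignment = aligned_reads / raw_reads < MIN_ALIGNED_FRACTION
    return low_depth and low_alignment


# Hypothetical samples: (raw reads, aligned reads).
samples = {
    "sample_A": (49_000_000, 39_200_000),  # passes both criteria
    "sample_B": (30_000_000, 12_000_000),  # low depth and 40% aligned
    "sample_C": (35_000_000, 28_000_000),  # low depth but 80% aligned
}

kept = [name for name, (raw, aligned) in samples.items()
        if not is_poor_quality(raw, aligned)]
print(kept)
```

Under this reading, a library with fewer than 40 million raw reads is retained if it aligns well, which is consistent with the reported lower bound of 23.3 million aligned reads among retained samples.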

Reads were examined for technical and biological variation separate from disease status. Library size (number of aligned reads) was a technical variable due to the broad range of read numbers. After adjustment for library size, cell type was the only significant variable in PC 1 and 2. As observed with DNA methylation measures, samples segregated by cell lineage to a greater extent than variation due to AS status.
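
One common way to adjust for library size is the median-of-ratios size factor used by DESeq2. The following minimal Python sketch illustrates that estimator; it is an assumption that this particular normalisation underlies the adjustment described above, and the count matrix and the function name `size_factors` are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of samples, each a list of gene counts in the same gene order.
    Genes with a zero count in any sample are skipped.
    """
    n_genes = len(counts[0])
    log_geo_means = []
    for g in range(n_genes):
        values = [sample[g] for sample in counts]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    factors = []
    for sample in counts:
        log_ratios = [
            math.log(sample[g]) - lgm
            for g, lgm in enumerate(log_geo_means)
            if lgm is not None
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical counts: the second library is sequenced twice as deeply.
toy_counts = [
    [10, 20, 30, 0],
    [20, 40, 60, 5],
]
toy_factors = size_factors(toy_counts)
print([round(f, 3) for f in toy_factors])
```

Dividing each sample's counts by its size factor places the two toy libraries on a common scale, so that remaining differences reflect expression rather than sequencing depth.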

As discussed in chapter 3, this observation is important due to the preference in most studies of using PBMCs or synovial fluid which contain multiple cell types in varying proportions. The use of these sample types can prevent identification of the cell types causing changes in expression and can conflate changes in cell type proportions with changes in function within a cell type.

Library preparation batch was a significant technical variable. Samples were randomized across library preparation plates based on disease status, anti-TNFα treatment status, and cell type, which suggests that the variation observed across library preparation batches was due to technical rather than biological factors. This can be due to small variations in handling, kit manufacture batches (1 kit per 2 library preparations), or storage time. Biological variables were those identified as disparate between cohorts in Chapter 2, and as variables in Chapter 3: sex, age, and smoking status. The library method for this study enabled stranded gene expression analysis; that is, transcripts could be identified as originating from the forward or reverse strand.

This is important as some reverse strand transcripts have a role in the regulation of their forward strand counterparts. As such, these transcripts were examined separately. The lack of known genes in reverse strand transcripts is due to the fact that the majority of genes are expressed in the forward strand orientation.

Both technical and biological variables were included in the experimental design rather than being statistically adjusted for separately, which can lead to overfitting of variation. DESeq2 performs better with more samples included in differential expression analysis. Therefore, samples were analysed together using a group term incorporating both AS status and cell type (e.g. AS.CD4), and specific contrasts within cell types were extracted afterwards.
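
The group term approach described above can be sketched as follows. This minimal Python example only builds the combined group labels and the list of within-cell-type contrasts; it does not run DESeq2. The label AS.CD4 comes from the text, whereas the healthy control prefix "HC" and the other cell type abbreviations are hypothetical naming choices.

```python
from itertools import product

STATUSES = ("AS", "HC")                        # "HC" is a hypothetical label
CELL_TYPES = ("CD4", "CD8", "CD14", "GD", "NK")  # abbreviations are hypothetical


def group_label(status, cell_type):
    """Combine disease status and cell type into one factor level."""
    return f"{status}.{cell_type}"


# All levels of the combined group factor used in a single model fit.
groups = [group_label(s, c) for s, c in product(STATUSES, CELL_TYPES)]

# One contrast per cell type: AS versus healthy within that cell type.
contrasts = [(group_label("AS", c), group_label("HC", c)) for c in CELL_TYPES]

for numerator, denominator in contrasts:
    print(f"{numerator} vs {denominator}")
```

Fitting one model across all ten groups lets dispersion estimates draw on every sample, while each contrast still compares AS and healthy samples within a single cell type.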

During initial DEG analysis, four healthy CD14+monocyte samples were identified with expression patterns that diverged from all other CD14+monocyte samples, from both AS and healthy individuals. No technical issues were noted in these samples, including sample purity, library preparation or sequencing pool. All four samples were from females aged between 20 and 35 years. The genes identified at higher expression in these samples compared to the other CD14+monocytes remained below the levels observed in other cell types (e.g. LCK: CD14+monocyte samples: ~1-100 reads, T-cells and NK cells: ~5,000-10,000 reads, outliers: ~200-1,000 reads). Apart from these four outliers, the expression levels of the cell types within this study broadly corresponded to those reported by the Human Protein Atlas.

The gating strategy used to isolate CD14+monocytes (CD3-/CD56-/CD14+) also encompasses DC and neutrophils. Neutrophils should be excluded by the PBMC isolation method used, and as none of these samples were processed on the same day or by the same person, it is unlikely that PBMC isolation is the cause. Several DC markers (CD11b, CD11c, CD1c, IL3RA, CLEC4C and CD33) were upregulated in these samples; however, the expression patterns for other genes, including LCK, did not match any cell type within the tissue database or within this study 294, perhaps suggesting that the observed difference is due to a mixture of cell types. The CD4+ and CD8+T-cells of two individuals were also distinctly different from the other healthy control samples. As the reason for the different expression profiles of these samples is unknown and could not be adjusted for, the samples were excluded from analysis.

All cell types had DEG that were significant after multiple testing adjustment (Benjamini-Hochberg); however, CD8+T-cell and γδ T-cell DEG were few. The number of potential permutations of disease status is very high (2^100 = 1.27x10^30), and therefore a practical number of permutations, 100, was selected. Where asymptotic FDR (Benjamini-Hochberg) failed to allow analysis, a cut-off of p-value <0.01, determined by permuted FDR, was used as a nominal p-value cut-off.
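
The scale of the permutation space, and one common way of estimating an FDR from a limited number of permutations, can be sketched in Python. The estimator below (mean number of null hits across permutations divided by the number of observed hits) is an assumption about how a permuted FDR may be computed; the thesis does not give the formula at this point. The function name `permuted_fdr` and all p-values are hypothetical.

```python
# Number of ways to assign one of two labels to each of 100 individuals.
n_label_assignments = 2 ** 100
print(f"2^100 = {n_label_assignments:.3e}")


def permuted_fdr(observed_pvalues, permuted_pvalue_sets, cutoff=0.01):
    """Estimate the FDR at a nominal p-value cut-off from label permutations."""
    observed_hits = sum(p < cutoff for p in observed_pvalues)
    if observed_hits == 0:
        return 0.0  # no discoveries, so no false discoveries
    null_hits = [sum(p < cutoff for p in pvals) for pvals in permuted_pvalue_sets]
    expected_false = sum(null_hits) / len(null_hits)
    return min(1.0, expected_false / observed_hits)


# Hypothetical p-values from the real labels and from two permutations.
observed = [0.001, 0.002, 0.005, 0.009, 0.2, 0.5]
permuted = [
    [0.3, 0.004, 0.6, 0.7, 0.8, 0.9],
    [0.5, 0.6, 0.7, 0.8, 0.9, 0.95],
]
toy_fdr = permuted_fdr(observed, permuted, cutoff=0.01)
print(f"permuted FDR at p < 0.01: {toy_fdr:.3f}")
```

In practice the permuted p-value sets would come from re-running the differential expression analysis with shuffled disease labels, here 100 times.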

AS-related genes

The genes located within or in close proximity to loci that have been identified as AS-associated at genome-wide significance were examined for patterns of cell type specific or global changes in expression. A mixture of expression patterns was observed, corresponding to the function of the gene in question. FOS and PTGER4 were differentially expressed in all cell types except γδ T-cells (which may relate to a lack of sufficient power, as the log fold change in γδ T-cells is higher). AIRES was most differently expressed in γδ T-cells, at log2 fold change -1.56, but did not reach statistical significance (adjusted p-value = 0.11). SOCS1 and TNFAIP3 were significantly increased in CD4+T-cells and CD14+monocytes, perhaps an indication of the greater statistical power in these cell types, as similar directions of change in expression were observed in the other three cell types. A large number of the AS-associated genes identified as differentially expressed were only differentially expressed in CD14+monocytes.

Interestingly, TBX21 and IL23R were not identified as significantly differently expressed in T-cells or any other cell type as has been previously reported. This may be due to the relatively low expression and protein levels of these genes. Due to the number of cell type specific associations identified it is appropriate to discuss these changes within the context of the results for each cell type individually.

CD4+T-cells

CD4+T-cell DEG analysis indicated altered T-cell receptor signalling pathway genes, including genes related to AS-associated loci (FOS, TLR4). The upregulation of JUN and FOS alongside reductions in expression of genes in the TCR signalling pathways (SYK, LAT2, SCIMP, TYROBP, LAT3, PLCG2) is a pattern of expression observed in the transition from naïve T-cell to memory T-cell 295,296. Toll-like receptor signalling genes also exhibited altered expression, with increases in STAT3, SOCS3, SOCS1 and TRAF6, and decreased expression of TLR4, IRF5 and IRF8. SOCS3 and SOCS1 are involved in the regulation of leukocyte differentiation, acting as both activating and suppressive signals for naïve CD4+T-cell differentiation. The upregulation of both SOCS1 and SOCS3 expression in CD4+T-cells of AS individuals indicates potential upregulation of the differentiation into several cell types, including Th17 cells, Th1 cells and Tregs. Genes involved in the additional signalling required for differentiation into these subsets were also upregulated. Th1 differentiation can be driven by IL6 signalling, and IL6 is upregulated in AS individual APCs (NK cells and CD14+monocytes) 297,298. SOCS3 suppresses differentiation by IL-12 into Th1 cells, which would suggest that if Th1 differentiation is occurring it is by IL-6 signalling. Tregs are identified by expression of FOXP3 and CTLA4. While CTLA4 expression was increased in AS individuals, it was not significant (0.43 log2FC, adj. p-value = 0.21), and there was no difference in the expression of FOXP3. Interestingly, HHEX was significantly downregulated in CD4+T-cells (reverse strand -1.64 log2FC, adj. p-value = 1.76x10^-8). HHEX is a TGF-β signalling pathway gene whose signalling is downregulated in regulatory T-cells and may act as a suppressor of FOXP3 signalling 299. IL23R has previously been identified as AS-associated in genetic association studies, but was not significantly different in the CD4+T-cells of AS individuals within this study (-0.35 log2FC, adjusted p-value 0.69) 300,301. IL-23 can drive differentiation of T-cells to Th17, which is regulated by SOCS1 and SOCS3 134. Differentiation of Th17 cells requires STAT3 and SMAD expression, both of which were upregulated in CD4+T-cells. While IFN-γ can suppress this differentiation, SOCS1 can counteract the suppressive influence of this cytokine.

Consistent with the observations from the DEGs identified, the enriched GO terms for CD4+T-cells DEG included leukocyte differentiation and leukocyte degranulation, with overlapping genes identified between the two GO terms, suggesting that this finding may be driven by leukocyte differentiation more so than degranulation. CD4+T-cell degranulation occurs on TCR engagement and T-cell activation. Degranulation is a process whereby cells release cytotoxic molecules. Additional GO terms of vesicle organisation, response to bacterium and immune response (both adaptive and innate) were identified, overlapping with the terms of leukocyte differentiation and leukocyte degranulation. The terms of adaptive immune response and innate immune response included AS-associated genes (TLR4, TNFAIP3, STAT3, SOCS1, PTGER4).

KEGG terms were dominated by the term for “osteoclast differentiation”, which encompassed many of the DEG identified within the GO terms of leukocyte differentiation (e.g. JUN, SOCS1, SOCS3), alongside the term for phagosome. KEGG terms phagosome and antigen processing and presentation included the upregulation of several HLA genes alongside IFNG. Broad immune related disease terms were also identified, encompassing viral, bacterial and autoimmune diseases, indicating that these terms were enriched based on the broad adaptive and innate immune response DEGs rather than an indication of specific disease relation to AS.

Cumulatively these changes indicate that there is greater CD4+T-cell differentiation to activated subtypes, such as Th17 cells, in AS individuals. This may be due to the increased inflammatory cytokines expressed within AS individuals, particularly IL-23 which can drive Th17 differentiation.

CD8+T-cells

Two genes which overlap with AS-associated loci were significantly differentially expressed in the CD8+T-cells of AS individuals (FOS and PTGER4). Both of these genes were upregulated in both CD8+T-cells and CD4+T-cells. Similar to CD4+T-cells, there was increased T-cell receptor signalling in CD8+T-cells (JUN, FOS, JUND, ATF3, NR4A2, CXCL4, LIF, CYCS, JUNB, CD69, SIK1, ING3, PTGER4, DUSP10), indicating CD8+T-cell activation. Interestingly, CD69 and NR4A2 were both downregulated in AS individuals in mixed cell type studies using array and qRT-PCR (fold change ~0.5 and ~0.25 respectively) 185.
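Because the earlier array and qRT-PCR results are quoted as fold changes while this chapter reports log2 fold changes, a short conversion helps when comparing the two. The sketch below assumes the earlier values are linear AS-to-control expression ratios; the function names are hypothetical.

```python
import math


def fold_change_to_log2(fold_change):
    """Convert a linear fold change (case / control) to log2 fold change."""
    return math.log2(fold_change)


def log2_to_fold_change(log2_fc):
    """Convert a log2 fold change back to a linear fold change."""
    return 2.0 ** log2_fc


# Fold changes of ~0.5 and ~0.25 correspond to log2FC of -1 and -2.
cd69_log2fc = fold_change_to_log2(0.5)    # -1.0
nr4a2_log2fc = fold_change_to_log2(0.25)  # -2.0
```

On this scale a positive log2FC is an increase in AS individuals and a negative value is a decrease, so the previously reported fold changes below 1 represent downregulation.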

The majority of DEG in CD8+T-cells had increased expression in AS individuals. TNFRSF11A, which has previously been correlated with radiographic severity in AS individuals, was increased in the CD8+T-cells of AS individuals in this study. TNFRSF11A is a receptor involved in the NF-κB signalling pathway. EEF1DP3, a likely pseudogene with no annotated function, was described as having a deletion-based CNV associated with increased risk of AS in a Korean cohort. In the current cohort, AS individuals had significantly increased expression of EEF1DP3 in CD8+T-cells.

When using only the DEG with an adjusted p-value <0.05, the most significantly enriched terms contained only two genes: FOS and JUN. This was likely due to the low number of DEG passing the adjusted p-value <0.05 cut-off. To further investigate potential pathways that were altered but did not reach statistical significance in CD8+T-cells, a nominal p-value cut-off determined by permuted FDR was used instead (p-value 0.01, permuted FDR = 0.06). At nominal p-value the enriched terms for forward strand reads included alpha-beta T-cell activation, regulation of inflammatory response (including FOXP3, TNFRSF11A) and epithelial migration. Reverse strand reads had a greater number of DEG passing adjusted p-value cut-off, with terms of cAMP and myeloid leukocyte differentiation (JUN, TNFRSF11A, FOS, PPARG, PF4, LIF, JUNB). At nominal p-value, this was further specified as positive regulation of leukocyte differentiation (JUN, DUSP10, FOS, PF4, LIF, ZBTB46, TRAF6, PRKCA, IFNG, NFKBIZ, GATA3, SOCS1, RAG1, IL2, IL2RA). The DEG within the term of positive regulation of leukocyte differentiation exhibited increased expression except for RAG1. RAG1 is part of the RAG complex that is responsible for V(D)J recombination in B and T-cells 302,303. During activation RAG1 is downregulated 256.

Similar to CD4+T-cells, KEGG terms of osteoclast differentiation, MAPK signalling and IL17 signalling pathway (JUN, FOS, JUND, TRAF6, TRAF4, FOSB, FOSL1, IFNG, TRAF2) were identified. Interestingly, there is no difference in expression of TNFA, IL17, granzyme B or FasL, all of which are indications of cytotoxicity in CD8+T-cells. Neither do these cells exhibit changes in Th2-like cytokines IL4, IL5, IL6 and IL10. There is some evidence for a subset of CD8+T-cells that exhibit a Th17 phenotype that does not produce granzyme B, and in vitro can exhibit loss of IL-17 expression but retains IFNγ expression 304. This would correlate with the observations within the CD8+T-cells of AS individuals.

Cumulatively, these results indicate that the TCR of CD8+T-cells is being engaged and driving activation through the TCR signalling pathway in AS individuals. These CD8+T-cells exhibit a Th17-like phenotype with expression of IFNγ but with loss of IL-17 expression. These cells exhibit lower lytic activity compared to Th17 cells or cytotoxic CD8+T-cells.

CD14+monocytes

The largest number of AS-associated loci genes exhibited changes in gene expression in CD14+monocytes, including NFKBIA, SOCS1, PTGER4. CD14+monocytes are typically the most populous cell type in PBMCs, which may be the reason so many of these changes identified in PBMC were also observed in CD14+monocytes. However, several AS-associated loci genes (NFKBIA, PTGER4, IRF1, SMAD3, SOCS2, ERN1) which had increased expression in AS individual CD14+monocytes from this study, had reduced or flat expression within other cell types examined and in previous studies of pooled samples 185. This indicates the cell type specificity of these samples, and how cell type specific changes can be obscured within pooled samples.


CD14+monocytes exhibited increased expression of inflammatory cytokines (TNFα, IL23, IL6, IL1β), which is consistent with previous observations in the whole blood and PBMCs of AS individuals. The “reverse IFNγ” signature observed previously in PBMC differentiated macrophages was not evident in the CD14+monocytes examined in this study, as there was no change in the expression of IFNG (adjusted p-value >0.8) 191. TNFAIP3, a TNF-α-induced protein regulated by TLR4 and NF-κB signalling, and TNIP1, which binds TNFAIP3, had increased expression in CD14+monocytes of AS individuals. This indicates increased TNF-α signalling within CD14+monocytes.

TLR4 is not differentially expressed in CD14+monocytes (adjusted p-value >0.9); however, genes upregulated by NF-κB were differentially expressed, including TNFα, CCL3, CCL4 and IL10. IRAK2 and STAT4, both of which act through the NF-κB signalling pathway, are upregulated in AS individuals.

ETS2 was only differentially expressed in CD14+monocytes, not any of the other cell types examined (reverse strand: 0.56 log2FC, adjusted p-value 5.49x10^-4; forward strand: 0.35 log2FC, adjusted p-value 0.45). ETS1 was not differentially expressed (adjusted p-value >0.8). ETS2 is a transcription factor that is induced during monocyte activation, and is a candidate gene for the 21q22 locus associated with AS 140,305.

Pathway analysis identified a large number of DEG associated with GO terms of inflammatory response, leukocyte cell-cell adhesion and negative regulation of intracellular signal transduction. In addition, there were expected terms of response to TNF (including increased expression of TNF, TRAF1, TNFAIP3, NFKB1, TNFRSF4) and positive regulation of cytokine production (TNF, IL23R, SMAD3). KEGG terms were also associated with TNF signalling, NF-kappa B signalling (CD40LG, CXCL1, BCL2A1, CXCL2, ZAP70, CXCL3, NFKB1, ICAM1, CFLAR, IL1B, BIRC2, GADD45B, TNF, PRKCQ, NFKBIA, TNFAIP3, TRAF1) and NOD-like receptor signalling (TRAF6, GABARAPL1, GBP4, CARD16, PANX1, MAPK1, DHX33, TBK1, CARD8, CXCL3, MAVS, TANK, GBP5, RELA, GBP1, VDAC2, BIRC3, CARD17, NLRP3, NAMPT, TICAM1, NFKBIB, NFKB1, GBP2, NFKBIA, RIPK2, TNF, TNFAIP3), with several genes overlapping between these terms.

Activation of CD14+monocytes is indicated by the upregulation of NF-κB signalling and expression of inflammatory cytokines associated with AS (TNFα, IL23, IL6, IL1β).

Increased cell-cell adhesion indicates a role for AS individual CD14+monocytes in activation of other immune cells, perhaps T-cells. Ferroptosis, a non-apoptotic form of cellular death, and apoptosis (BCL2A1, PMAIP1, ERN1, NFKB1, CFLAR, BAD, BIRC2, TUBA3E, GADD45B, TNF, NFKBIA, TRAF1) were both enriched terms for DEG within CD14+monocytes of AS individuals. Apoptosis can be triggered by toll-like receptor signalling or NOD-like receptor signalling, both of which are also implicated in AS individual CD14+monocytes. Sustained TNF signalling would appear to drive these pathways and functions; however, as the AS individuals examined had established disease, it is difficult to determine whether these changes are driven by the sustained expression of TNF-α or caused it.

γδ T-cells

The low number of genes that were significantly differentially expressed within AS individual γδ T-cells (after adjustment for multiple testing) reflected the small sample size available for this cell subset. Whilst no genes associated with AS-associated loci were significant, AIRE, which encodes a transcription factor essential to T-cell self-tolerance, had large changes that were approaching significance in γδ T-cells (-1.56 log2FC, adjusted p-value 0.11 in reverse strand reads). Similarly, TNFRSF11A, which was significantly differentially expressed in CD8+T-cells, had large but non-significant changes in expression in γδ T-cells (2.51 log2FC, adjusted p-value 0.18).

Of those DEG that were significant, the most interesting was TRADD, which was downregulated in AS individual γδ T-cells. A locus harbouring the gene TRADD has been reported to have a suggestive association with AS by previous genetic association studies 293. TRADD is an adapter molecule for TNFRSF1A that is required for NF-κB signalling and TNF-α-induced apoptosis. TRADD fulfils these functions as part of a complex with TRAF2 and RIP, both of which are also downregulated in γδ T-cells (log2FC <-0.2), however not significantly (adjusted p-value >0.2). Several of the significant DEG in γδ T-cells were associated with cytoskeleton remodelling and T-cell differentiation (FOXM1, MKPL2, DAB2). T-cell differentiation itself requires cytoskeleton remodelling. FOXM1 is a master regulator of mature activated cells, and early thymocyte proliferation. DAB2 is a transcriptional regulator that interacts with FOXP3 and is responsible for Treg maintenance of naïve T-cell homeostasis 306. The upregulation of FOXM1 and DAB2 indicates that differentiation of γδ T-cells into activated T-cells is occurring at an increased rate in AS cases.

Several genes with large changes in expression between AS individuals and healthy controls failed to reach statistical significance, although related genes with small fold changes did. For example, TET2, one of three TET genes responsible for the successive removal of DNA methylation marks, had a small yet significant increase in expression (0.25 log2FC, adjusted p-value 0.04). DNMT3L, a catalytically inactive DNA methyltransferase which has previously been associated with enhancing the activity of DNMT2 and DNMT1, had a large change in expression; however, this did not reach statistical significance (-2.13 log2FC, p-value 0.21). No change in expression was observed in any of the other DNMT family genes; however, the changes in TET2 and DNMT3L may indicate increased demethylation within γδ T-cells.

The 20 most significant DEG within γδ T-cells were sufficient to begin separating AS and healthy individuals within hierarchical clustering (Figure). This may be due to the large log fold changes exhibited within γδ T-cells, indicating that although only a small number of samples could be obtained for this cell type, they exhibit large changes in gene expression within AS individuals.

GO terms for γδ T-cells had limited DEG at adjusted p-value, with a single term in each strand containing more than one DEG, in forward strand reads actin skeleton organization (IFN2, ARPC1A) and in reverse strand reads small GTPase mediated signal transduction (GPR55, FOXM1, RAB13, C15orf62, ARHGAP32, RHOBTB3). At nominal p-value, the term with the lowest adjusted p-value was DNA packaging.

KEGG terms were slightly more informative at nominal p-values with forward strand reads including Notch signalling, oxidative phosphorylation and several metabolism terms, and reverse strands including KEGG terms of cell cycle, AMPK signalling pathway (CCNA2, PRKAA1, STK11, PPP2R3A, FBP1, IGF1R, PFKP, PPARG) and adherens junction (ERBB2, FER, NECTIN4, FARP2, SSX2IP, IGF1R). Metabolism changes alongside cell cycle changes are consistent with the DEG identified as involved in cytoskeletal remodelling and T-cell differentiation. As the genes associated with Notch and AMPK signalling were a mixture of upregulated and downregulated, it is difficult to identify how these pathways have been affected.

Altered expression of NF-κB and Notch signalling indicates potential T-cell differentiation; however, there is no significant increase in inflammatory cytokine expression. Despite previous studies indicating that γδ T-cells are potent IL-17A producers in AS, no significant increase in IL17, IL23R or IL1R1 expression was observed in the current study 9,307. IL23R expression was previously reported to be two-fold higher in AS individuals. It is unclear what is driving this discrepancy in observed expression, and whether this indicates γδ T-cell differentiation into non-inflammatory subsets.

NK cells

The DEG identified in NK cells contained multiple genes that have not yet been annotated, making it difficult to interpret the functional effect of these genes being differentially expressed in AS individuals. These DEG mostly show increased expression in AS individuals. An annotated gene that had significantly increased expression (adjusted p-value <0.05) in both forward and reverse strand reads was NFKBIZ, which is induced by IL-17A, TNF-α and IL-22 (synergistically) and is essential for IFNγ production and NK cell cytotoxicity upon activation 308. TNFα and RELT, a TNF receptor, were both upregulated in the NK cells of AS individuals. MIR24-2, a precursor RNA correlated with TNF, IL6, IL1β, TLR4 and CXCL8 expression, and DDIT3, an ER stress response transcription factor that positively correlates with IL6, IL23 and TNFRSF10B expression, were both upregulated in AS individuals. However, only TNFα expression was significantly upregulated. No other inflammatory cytokines were significantly differentially expressed. This may be an indication of the complexity of relationships between these inflammatory cytokines, and the precise circumstances in which they can be upregulated.

Pathway enrichment analysis for NK cells provided sufficient significant DEG (adjusted p-value <0.05) to identify pathways. NK cell GO terms in forward reads included cell-cell adhesion and inflammatory response (NFKBIZ, AIF1, P2RX1, FOS, ORM2). Reverse strand DEG contained child terms of the forward strand GO terms: positive regulation of leukocyte cell-cell adhesion (CCR2, NFKBIZ, EGR3, DUSP10, TNF, CD83, CEACAM1, IFNG, KLF4, AIF1, TNFRSF13C, SFTPD, VCAM1, SIRPB1, NR4A3, HES1, MYB, LILRB2) and humoral immune response (CCR2, TNF, CD83, RGCC, IFNG, C1QA, SFTPD, FCER2, ITLN1, RNASE6, CFD, PPBP, SLC11A1). The DEG in these terms indicate that these were upregulated pathways within the NK cells of AS individuals. KEGG terms were broader and included very few DEGs, with terms of apoptosis, MAPK signalling, and several cardiomyopathy KEGG terms identified.

These changes indicate activation of NK cells driven by cell-cell adhesion and engagement of the toll-like receptor pathway. This appears to result in the expression of TNFα and IFNG, but not IL6, IL17 or IL23 inflammatory cytokines.


Conclusions

This chapter has outlined the largest cell type specific RNAseq study undertaken on AS individuals. The five cell types of interest, CD4+T-cells, CD8+T-cells, γδ T-cells, CD14+monocytes and NK cells, were all examined individually for cell type specific changes in gene expression within AS individuals compared to healthy controls.

Peripheral blood mononuclear cells were examined both for practical reasons and because they represent the circulating immune profile of AS individuals. It is noted that this study will differ from those examining tissue resident immune cells. Isolation of individual cell types prior to RNAseq transcriptional profiling was essential due to the strong variation inherent to cell type alone. Technical and biological variability could be controlled for through careful study design and cohort selection. While there remains variation that was not controlled for during cohort selection, the comprehensive recording of factors which can influence gene expression enabled these variables to be controlled for in statistical analysis. The observation of remaining unknown variation within four healthy individuals exemplifies the importance of recording participant demographics, as well as the remaining unknown factors which can influence gene expression beyond disease.

This chapter examined three hypotheses:

1. Cell specific changes in transcription occur in AS

2. Changes in transcription will include genes located at or nearby to AS-associated loci

3. Changes in gene expression associated with AS are co-ordinated within specific biological pathways.


Cell specific changes in transcription were identified in all five cell types examined.

DEG analysis identified that several of the genes previously identified as associated, or potentially associated, with AS loci from genetic association studies were differentially expressed in specific cell types. Unexpectedly, several genes previously observed as differentially expressed in AS, such as IL-23R and RUNX3, were not identified in the current study. This may be due to the low expression levels of these genes, or may be an issue of expression being correlated with one of the analysis covariates of age, sex or treatment, meaning that it was controlled for during analysis. Of the AS-associated DEG identified, several were cell type or lineage specific, whereas others were differentially expressed in several different cell types.

CD4+T-cells had altered expression within pathways associated with T-cell differentiation to Th17 cells, including altered SOCS1 and SOCS3 expression alongside DEG within the TCR signalling pathway. However, only IFNG expression was upregulated in CD4+T-cells, with no change in expression of IL17A, IL2, IL10, TNFα or RORC. Similar expression patterns were observed with CD8+T-cells. However, only CD4+T-cells indicated upregulation of degranulation, indicating mature differentiated T-cells. CD8+T-cells displayed IFNG expression but no change in IL17 expression. This indicates that both CD4+T-cells and CD8+T-cells have increased differentiation into inflammatory subsets in AS individuals.

CD14+monocytes displayed a pattern of expression consistent with activation, including upregulation of inflammatory cytokines (TNFα, IL23, IL6, IL1β) alongside increases in TNFAIP3, a TNF-α-induced protein. Additional DEG indicate that this increase is due to upregulation of the NF-κB, TLR and NOD-like receptor signalling pathways, and apoptosis pathways. NK cells were more restricted in the inflammatory cytokines expressed, only upregulating IFNG and TNFα. NFKBIZ was one of the most significant DEG in NK cells, and is involved in expression of several cytokines including IL6. It is induced by IL-17A, TNF-α and IL-22.

γδ T-cells were restricted in their interpretation due to a lack of power; however, several of the genes that remained significant after adjustment for multiple testing were previously associated with AS. These included TRADD, previously identified as a suggestive association with AS in genetic association studies. Potential global methylation changes were also suggested by large decreases in the expression of DNMT3L and increased expression of TET2.

Overall, this chapter has shown that the role of individual cell types in AS is specific to the function of each cell type, and that changes identified within PBMCs are often driven by a single cell type. It is therefore essential to examine cell types separately to fully understand the individual changes in function that occur within AS. Additionally, genes that were identified or suggested from previous AS genetic association studies are often differentially expressed, but not all of these loci result in changes in expression. This suggests that several mechanisms for altering gene function are responsible for the association of these loci with AS.


Chapter 5: Integrated analysis

Introduction

Biological changes such as DNA methylation and transcription are co-regulated in a coordinated manner based on underlying genetics and environmental influences. Integrated analysis of multiple data types from the same individual can enable identification of a cohesive, multi-dimensional, biological signature that discriminates AS individuals from healthy individuals more robustly than feature variation captured within a single data type. This method can compensate for missing information in one dataset, reduce the number of false positives, and elucidate multi-level changes in biological systems 211. Different methods can be used to interrogate these changes, broadly defined as either a stepwise hypothesis of action (multi-staged analysis) or a hypothesis of combinational effect on phenotype (meta-dimensional analysis). These different approaches investigate whether the biological changes seen in AS individuals occur in a stepwise fashion or occur in parallel. Integrated analysis can be as direct as examining each biological measurement in relation to genetic variation (quantitative trait loci; QTL analysis), or as involved as combining several ‘omics measurements in a single analysis (termed multiomics).

QTL analysis is a well-established approach for interrogating links between genotype and a quantitative trait of interest, most commonly gene expression. Fewer studies have used this approach to interrogate the link between genotype (particularly disease-associated genetic variants) and DNA methylation. Two approaches can be used with QTL analysis: examining relationships between genetic variation and quantitative measures from proximal features (e.g. gene enhancer, CpG sites) within the same genomic region (in cis), or across the entire genome (in trans). The quality of the datasets to be integrated greatly affects the results of QTL analysis, and it is therefore important to understand how the limitations of the underlying measurement data will affect the statistical power of analysis and may affect the coverage of genomic regions to be investigated. QTL analysis only examines the relationship between genotype and the biological measurements affected by genotype, and does not allow for interaction models between QTL measurements.
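The cis and trans distinction can be made concrete with a small sketch. It uses the 1Mb window applied in the Results of this chapter; the function name and argument layout are hypothetical, and treating a distance of exactly 1Mb as trans is an assumption, since only <1Mb and >1Mb are stated.

```python
CIS_WINDOW_BP = 1_000_000  # 1Mb window, as used for cis-QTL in this chapter


def classify_qtl_pair(snp_chrom, snp_pos, feature_chrom, feature_pos,
                      window_bp=CIS_WINDOW_BP):
    """Label a SNP-feature pair (CpG site or gene) as 'cis' or 'trans'.

    A pair is cis when both lie on the same chromosome and are closer
    than the window; every other pair, including pairs on different
    chromosomes, is trans. A distance of exactly window_bp is treated
    as trans (assumption).
    """
    same_chromosome = snp_chrom == feature_chrom
    if same_chromosome and abs(snp_pos - feature_pos) < window_bp:
        return "cis"
    return "trans"


# Hypothetical coordinates, for illustration only.
nearby = classify_qtl_pair("6", 31_000_000, "6", 31_400_000)   # "cis"
distant = classify_qtl_pair("6", 31_000_000, "6", 35_000_000)  # "trans"
other = classify_qtl_pair("6", 31_000_000, "1", 31_000_100)    # "trans"
```

Restricting tests to cis pairs greatly reduces the number of comparisons, which is one reason cis analysis generally has more power than trans analysis at a given sample size.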

Approaches that integrate multiple datasets, such as genotypic, transcriptomic, and methylomic data, use either of two approaches: concatenation or integration. Concatenation forms models for disease based on each dataset individually, then combines these models into a single larger model to explain the phenotype of interest. Concatenation is reductionist and is unable to fully account for the relationships between datasets, such as the inherent association of gene expression with underlying methylation patterns. The reductionist approach misses weak-effect networks of changes that occur in a coordinated fashion but have only small effects within each dataset.

Alternatively, a meta-dimensional approach can be used for integration of datasets, such as DIABLO (data integration analysis for biomarker discovery using a latent component method for omics studies) 213. DIABLO attempts to maximise the common or correlated information between datasets whilst identifying the optimal ‘omics variables that explain and reasonably classify disease sub-groups. DIABLO is built upon the projection of latent structure (PLS) models which model and maximise the correlation between pairs of pre-classified ‘omics datasets. DIABLO can incorporate known relationships between datasets, for example the known effect of DNA methylation on gene expression, or mRNA expression on protein levels. This method can also be used to examine correlations between the data blocks using the initial modelling.

The DIABLO approach is an N-integration approach that relies on the datasets being measured in the same N individuals. Examining data from independent studies is possible if the measurements were on the same P predictors (e.g. genes); however, this is not relevant for the current study. Neither method is subjected to statistical significance analysis, as these methods are examining which variables best discriminate samples. The optimal model is selected using a balanced error rate measure for variable selection, and this is reported for the model and the separation of each sample subtype. Balanced error rate is the mean of the per-class misclassification rates (the rate at which the model wrongly classifies an individual’s disease status within each class). This accounts for models where there are differences in the error rate of different data types.
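The balanced error rate described above can be illustrated with a short Python sketch. This is not the implementation used by DIABLO; the function name, class labels and example predictions are hypothetical and chosen only to show why the measure differs from the overall error rate when classes are unequal in size.

```python
def balanced_error_rate(true_labels, predicted_labels):
    """Mean of the per-class misclassification rates."""
    classes = sorted(set(true_labels))
    per_class_rates = []
    for cls in classes:
        # Indices of the individuals whose true class is cls.
        members = [i for i, t in enumerate(true_labels) if t == cls]
        errors = sum(predicted_labels[i] != cls for i in members)
        per_class_rates.append(errors / len(members))
    return sum(per_class_rates) / len(per_class_rates)


# Hypothetical cohort: 8 AS individuals and 2 healthy controls (HC),
# with a model that predicts "AS" for everyone.
truth = ["AS"] * 8 + ["HC"] * 2
predicted = ["AS"] * 10

overall_error = sum(t != p for t, p in zip(truth, predicted)) / len(truth)  # 0.2
ber = balanced_error_rate(truth, predicted)                                 # 0.5
```

In this example the overall error rate looks acceptable because the larger class dominates, whereas the balanced error rate shows that the model performs no better than chance at separating the two groups.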

Whilst studies in AS have examined the association of individual ‘omics data with genotype, no study has examined the relationship between multiple ‘omics datasets in AS. In the related diseases of psoriasis and IBD, studies have used causal inference testing (CIT) to examine the causal relationship of individual datasets with each other 188,235. CIT examines whether the effect of one variable is independent of another, or if they cannot be separately associated, for example whether DNA methylation changes are associated with phenotype independent of genotype. This approach is limited to individual associations rather than signatures and is a hypothesis-based approach rather than a hypothesis free one. To enable a hypothesis-based approach to genetic associations with DNA methylation and gene expression in AS, an exploration of the relationship between these measurements and the genes implicated by them must be carried out.


Chapter Overview

Overall, this chapter aimed to determine if changes in DNA methylation and expression explain the mechanism through which AS-associated loci affect immune cell function. This was approached in two ways, firstly by identifying which CpGs and genes are affected by AS-associated loci, or loci in LD with the lead SNP, and secondly, whether a signature of AS can be identified from DNA methylation, gene expression and genotype data.

Quality controlled data obtained in the previous two results chapters was used for integrated analysis, alongside genotype information for each individual. Genotype data was limited to known AS-associated loci identified in genome-wide and cross-disease studies. This is both due to the specific aim of this chapter and to reduce the number of tests performed during analysis. Only those samples with all three datasets were examined, and the number of γδ T-cell samples with all three datasets was too low to examine using this method (<10 individuals). All four other cell types had sufficient numbers of total individuals to split into both a test and a validation cohort.

I hypothesise the following:

1. Methylation and expression quantitative trait loci are associated with AS loci

2. The individual datasets created by this study contain discriminatory or predictive values for AS

3. These datasets can be used to create a signature for AS that incorporates variables from all three biological measures and is able to discriminate AS and healthy individuals.


Results

The 100 individuals enrolled into this study were genotyped using the same methods employed for the most recent genetic association study in AS 89. Individuals were confirmed to be of European Caucasian descent using PCA (data not shown). The lead SNPs were those previously described as associated with AS that had a MAF<0.1, alongside the MHC region. SNPs in LD were defined as those within 1Mb of the lead SNP. The total number of variants available after imputation in the regions outlined above was 39,114,720. Variants were excluded from integrated analysis if they exhibited poor imputation (R2<0.8; 2,007,095 SNPs, ~5%) or if no individual within either cohort carried the alternative allele (28,936,348 SNPs, ~73%). Analysis was carried out using the remaining 10,178,372 loci. Methylation and expression QTL analysis was carried out in cis (locally, <1Mb) and in trans (>1Mb, including on other chromosomes) from the 115 AS-associated loci (a full list of all cis-QTL results is in Appendix F(3)).
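The cis/trans distinction described above can be sketched in a few lines. This is a minimal illustration only, assuming the 1Mb threshold stated in the text; the function names and the feature coordinates are hypothetical and are not taken from the analysis pipeline used in this thesis.

```python
CIS_WINDOW_BP = 1_000_000  # 1Mb cis threshold, as stated in the text


def parse_variant(variant_id):
    """Split an ID such as '5:96252589:T:C' into (chromosome, position)."""
    chrom, pos = variant_id.split(":")[:2]
    return chrom, int(pos)


def classify_qtl(snp_chrom, snp_pos, feature_chrom, feature_pos,
                 window=CIS_WINDOW_BP):
    """Label a SNP-feature pair as 'cis' or 'trans'.

    A pair is cis when both sit on the same chromosome and are less than
    `window` base pairs apart. Everything else, including pairs on
    different chromosomes, is treated as trans (assumption: a distance of
    exactly `window` is also called trans).
    """
    if snp_chrom != feature_chrom:
        return "trans"
    return "cis" if abs(snp_pos - feature_pos) < window else "trans"


# Hypothetical CpG or gene coordinates, for illustration only.
chrom, pos = parse_variant("5:96252589:T:C")
print(classify_qtl(chrom, pos, "5", 96_300_000))  # cis
print(classify_qtl(chrom, pos, "5", 98_000_000))  # trans
print(classify_qtl(chrom, pos, "X", 96_300_000))  # trans
```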


Methylation QTL

DNA methylation data used in the Chapter 3 analysis were all used as input for mQTL analysis. As mentioned in Chapter 3 in the enrichment analysis for DMPs around AS-associated loci, the MethylationEPIC array design has limited numbers of probes in cis of several AS-associated loci, compared to the loci of other diseases.

Inflation within the cis and trans mQTL was examined using QQ plot (Figure 5.1).

The mQTL results showed inflation of both cis and trans mQTL p-values (genomic inflation factor (GIF) = 0.35), but the data was normally distributed (median -log(p) = 0.22). The smooth progression of the slope indicates that there is not a single mQTL with strong association driving this inflation, but rather may suggest LD between many of the mQTL identified.
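For readers unfamiliar with the genomic inflation factor, the sketch below shows one conventional way of computing it: each p-value is converted to a 1-degree-of-freedom chi-square statistic, and the median of these is divided by the expected median under the null (approximately 0.455). This is an assumption about the definition, as the exact calculation used for the values reported here is not stated in this section; the p-values in the example are simulated, not study data.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (~0.4549).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation_factor(p_values):
    """Median observed chi-square(1) statistic over its null expectation.

    Each two-sided p-value (0 < p <= 1) is mapped back to a squared
    standard-normal quantile, which is a chi-square(1) statistic.
    """
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Simulated inputs: evenly spaced (null-like) p-values, and a shifted set.
uniform = [(i + 1) / 1002 for i in range(1001)]
inflated = [p ** 2 for p in uniform]  # systematically smaller p-values

print(round(genomic_inflation_factor(uniform), 3))  # 1.0
print(genomic_inflation_factor(inflated) > 1.0)     # True
```

A value close to 1 indicates p-values that follow the null distribution, values above 1 indicate inflation, and values below 1 indicate deflation.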

Figure 5.1 QQ plot of the CD4+T-cell cis (local) and trans (distal) mQTL results.


Cis-mQTL associations

A large number of cis-mQTL associations were observed above the cut-off (FDR<0.05) for each cell type (Table 5.1). Comparatively, the number of unique CpGs within these cis-mQTLs was small (<10,000 unique probes). This reflects the number of probes available locally to the AS-associated loci, with the majority of CpG probes located distally from AS loci (discussed in Chapter 3). Almost half of the cis-mQTLs identified were located within the HLA region on chromosome 6. Regions of strong LD exist across the HLA locus, which can result in numerous associations across the entire region being driven by linkage with a small number of causal SNPs. As each association is calculated separately, the inclusion of the HLA region would not affect the association analysis for other regions.
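The FDR cut-off applied to these associations can be illustrated with the Benjamini-Hochberg procedure. This is a minimal sketch under the assumption that a Benjamini-Hochberg style adjustment is an acceptable stand-in; the section does not state which FDR method produced the reported qvalues, and the p-values below are invented for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone: each is the minimum of p * n / rank over all larger ranks.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented p-values for four hypothetical SNP-CpG pairs.
p = [0.01, 0.04, 0.03, 0.005]
q = benjamini_hochberg(p)
significant = [i for i, value in enumerate(q) if value < 0.05]
print(significant)  # [0, 1, 2, 3]
```

Because each association is adjusted relative to the total number of tests, restricting genotype data to AS-associated regions (as done in this chapter) directly reduces the multiple-testing burden.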

Table 5.1 The number of cis-mQTL results by cell type with the most significant cis-mQTL shown.

                                 CD4+T-cells      CD8+T-cells      γδ T-cells        NK cells         CD14+monocytes
No. cis-mQTL (FDR<0.05)          807,279          524,133          66,900            287,784          624,480
Unique probes                    9,482            6,643            1,567             4,339            8,071
Unique probes (non-HLA region)   5,909            3,921            768               2,558            5,148
Most significant mQTL            6:32604372:G:A   6:29700840:C:T   18:12639143:C:T   6:32227488:A:G   2:9554255:G:A
                                 cg22933800       cg27230769       cg16864708        cg18698799       cg08439705
                                 HLA-DQA1         LOC285830        SPIRE1            C6orf10          ITGB1BP1
                                 1.24x10-27       4.69x10-26       3.79x10-8         3.30x10-19       4.76x10-26

The lead SNPs for each AS-associated locus were examined for an mQTL effect on CpG methylation within the genes previously reported or suggested to be associated with the lead SNP (Table 5.2). Where multiple genes have previously been reported as potentially related to the AS-associated variant, each gene was examined; an example is the disease-associated SNPs situated in close proximity to both the NPEPPS locus and TBX21, which encodes a transcription factor 309. Only annotated CpGs were used for this process, and there may be unannotated CpGs associated with these genes that are captured within the cis-mQTL analysis but are not outlined within Table 5.2. The most significant cis-mQTL associated with the lead SNP may lie outside the gene of interest.

CpGs associated with genes that have strong genetic associations with AS were identified across all cell types. The ERAP1/2 region contains three independent AS-associated loci. All three of these loci were identified as significant cis-mQTL across all cell types except γδ T-cells. The 5:96252589 SNP associated with ERAP2 had a stronger association with CpG methylation than the other two lead SNPs within the region.

Several of the cis-mQTLs identified with these lead SNPs are cell type specific associations. An example of this is the CpG sites annotated to ETS1, for which mQTL effects were only observed in CD14+monocytes (qvalue = 7.27x10-13) and, at far lower statistical significance, in NK cells (qvalue = 0.03). Two disease-associated loci are within close proximity to ETS1. Cis-mQTLs in LD with the lead SNP were identified in other cell types (Figure 5.2). Interestingly, a variant in LD with the lead AS-associated SNP within the 21q22 locus, encoding the ETS2 transcription factor, also exhibited a significant cis-mQTL effect for CpGs within ETS2 (qvalue = 2.12x10-10). The lead SNP for the intergenic 21q22 region (21:39091357) was not identified as a significant cis-mQTL.

Figure 5.2 cis-mQTLs associated with the ETS1 region. Overlaid for CD4+T-cells (blue), CD8+T-cells (orange) and CD14+monocytes (green).

Table 5.2 Qvalue for top cis-mQTL associated with the lead SNP for the AS-associated genes listed. If several CpGs were associated with a lead SNP only the most significant is shown.

Gene       SNP                  CpG          CD4          CD8          Mon          NKC          GDT
AIRE       21:45616324:G:A      cg19450046   ns           ns           ns           ns           0.01
ANKRD55    5:55444683:G:A       cg06750410   8.12x10-4    ns           ns           0.04         ns
ANTXR2     4:80887969:A:G       cg20156679   7.87x10-3    0.01         2.37x10-6    0.04         ns
ASAP2      2:9402988:A:G        cg06627617   9.53x10-11   3.56x10-8    1.83x10-9    3.36x10-4    0.01
BACH2      6:90976768:G:A       cg00786977   6.30x10-5    4.82x10-3    1.72x10-6    ns           ns
C1orf106   1:200878727:G:A      cg10092377   1.19x10-12   1.76x10-11   4.20x10-14   3.73x10-13   2.74x10-4
CARD9      9:139263891:C:G      cg25830586   ns           0.04         ns           ns           ns
CCL2       17:32570547:G:A      cg17864156   ns           2.64x10-3    8.38x10-3    ns           ns
CCR5       3:46428658:A:G       cg22984586   2.24x10-3    1.05x10-5    1.03x10-6    1.21x10-4    ns
CCR6       6:167437988:C:T      cg10482512   5.14x10-6    ns           1.03x10-7    2.16x10-5    ns
CD244      1:160846284:T:G      cg05149320   ns           ns           1.50x10-5    0.05         ns
CD28       2:204612058:A:G      cg24336674   ns           5.56x10-3    ns           ns           ns
DNMT3B     20:31348750:C:T      cg01844435   0.01         9.82x10-3    7.38x10-5    ns           ns
ERAP1      5:96121715:C:T       cg08986950   2.74x10-14   2.48x10-12   2.22x10-13   8.15x10-11   ns
ERAP2      5:96252589:T:C       cg01955050   1.00x10-8    1.90x10-4    4.39x10-8    2.79x10-5    1.33x10-4
ERN1       17:62147192:G:C      cg07684309   2.39x10-6    5.36x10-5    1.07x10-6    4.84x10-9    ns
ETS1       11:128346793:T:C     cg17179455   ns           ns           7.27x10-13   0.03         ns
FAM118A    22:45727565:T:G      cg22693272   1.60x10-9    1.65x10-8    3.17x10-8    8.18x10-8    0.02
FAS        10:90749963:A:G      cg19791409   2.81x10-3    0.01         6.67x10-5    ns           ns
FAS-AS1    10:90749963:A:G      cg13130092   9.73x10-5    ns           7.27x10-4    5.38x10-4    ns
FOS        14:75741751:C:T      cg12061886   0.05         ns           ns           ns           ns
FUT2       19:49206108:C:G      cg01656853   3.81x10-6    ns           0.04         ns           ns
GPR35      2:241569692:C:T      cg13696171   1.15x10-3    5.99x10-3    ns           ns           ns
IKZF1      7:50323174:T:C       cg16499656   ns           ns           0.05         ns           ns
IKZF3      17:38032680:T:C      cg19468946   3.30x10-5    3.46x10-5    4.70x10-5    0.01         ns
IL10       1:206943968:C:A      cg16247264   3.10x10-8    4.79x10-10   ns           ns           ns
IL1R1      2:102771282:T:A      cg10417124   ns           ns           4.03x10-5    ns           ns
IL1R2      2:102647300:G:A      cg01613557   ns           ns           4.68x10-9    ns           ns
IL23R      1:67681669:T:G       cg19889859   ns           0.03         0.01         ns           ns
IL27       16:28517709:T:C      cg13401734   ns           ns           8.72x10-5    ns           ns
IL6R       1:154426264:C:T      cg18748303   ns           ns           0.01         ns           ns
IRF5       7:128573967:G:A      cg14349538   8.46x10-4    ns           4.32x10-8    ns           ns
ITGAL      16:30485393:G:C      cg24033122   6.81x10-6    1.43x10-3    1.73x10-14   9.60x10-6    ns
JAK2       9:4981602:C:A        cg08584037   ns           ns           1.70x10-6    5.85x10-4    ns
LSM14A     19:34656406:C:T      cg21245903   8.10x10-13   4.23x10-9    1.14x10-12   3.76x10-8    ns
LTBR       12:6502742:G:A       cg25433705   ns           ns           4.02x10-4    ns           ns
NKX2-3     10:101278725:C:T     cg03711485   3.07x10-3    0.04         0.04         0.02         ns
NOS2       17:26097131:T:C      cg23346716   1.26x10-4    0.04         1.15x10-5    7.22x10-3    ns
OTUD3      1:20171860:G:A       cg16764945   ns           0.02         ns           ns           ns
PDGFB      22:39660829:T:C      cg13250388   9.90x10-3    ns           ns           ns           ns
RUNX3      1:25305114:T:C       cg19774846   ns           ns           0.02         ns           ns
SH2B3      12:111884608:T:C     cg10509046   1.34x10-3    0.03         ns           ns           ns
SOCS1      16:11365500:C:T      cg03014241   0.02         ns           ns           ns           ns
SP140      2:231148128:A:G      cg05259225   ns           4.05x10-3    ns           ns           ns
TBX21      17:45690351:A:T      cg05266212   6.53x10-11   ns           1.24x10-14   ns           ns
TNFAIP3    6:138197824:C:T      cg25981174   ns           ns           1.72x10-7    ns           ns
TNFRSF1A   12:6446777:G:A       cg06016751   3.16x10-5    6.77x10-3    4.66x10-6    0.03         ns
TNFSF8     9:117696336:C:T      cg14361804   1.62x10-21   2.52x10-24   5.75x10-25   2.83x10-14   9.47x10-7
UBE2L3     22:21928597:C:G      cg06850285   9.83x10-9    0.05         3.75x10-4    1.59x10-3    ns
ZMIZ1      10:81042475:G:A      cg13115165   1.15x10-7    1.72x10-3    4.66x10-9    3.03x10-3    ns
ZNF365     10:64354262:C:T      cg03885735   5.53x10-3    1.25x10-4    9.62x10-4    ns           ns


Trans-mQTL analysis

Trans-mQTL (mQTL effects on loci located >500kb from the AS-associated loci) were identified in smaller numbers than cis-mQTL (Table 5.3). Those trans-mQTL that were significant (qvalue<0.05) were generally less significant associations than cis-mQTL (as seen in the above mQTL QQ plot). No significant trans-mQTL (adjusted FDR<0.05) were identified in γδ T-cells, likely an indication of insufficient power. The same trans-mQTL was identified as the most statistically significant in CD8+T-cells, NK cells and CD14+monocytes (6:32559019:G:A:cg15580874 (DFNB59;PRKRA;MIR548N)). The majority of the trans-mQTL identified influenced methylation at CpGs that were unannotated, meaning that the functional result of these mQTL cannot be ascertained. The trans-mQTL influencing annotated CpGs included a broad range of genes, including DNMT3B (qvalue = 0.04). DNMT3B is involved in de novo establishment of DNA methylation and is potentially associated with the AS-variant closest to SP140.

Table 5.3 The number of trans-mQTL results for each cell subset with the most significant trans-mQTL shown.

                           CD4+T-cells            CD8+T-cells            γδ T-cells   NK cells               CD14+monocytes
trans-mQTL (FDR<0.05)      5892                   2171                   0            521                    4775
Unique probes              159                    65                     0            13                     110
Annotated unique probes    54                     35                     0            11                     69
Most significant mQTL      6:32559019:G:A         6:32559019:G:A         NA           6:32589326:C:A         14:35899194:C:T
                           cg15580874             cg15580874                          cg15580874             cg11480769
                           DFNB59;PRKRA;MIR548N   DFNB59;PRKRA;MIR548N                DFNB59;PRKRA;MIR548N   DNAJC8
                           3.43x10-15             2.03x10-11                          6.69x10-9              2.31x10-13


eQTL analysis

The same approach used to identify mQTL was used to identify eQTL. The eQTL results had greater inflation than that observed with the mQTL results (GIF = 10.33) (Figure 5.3). The median -log(p) was 0.47, indicating that the majority of eQTL associations identified were not significant. Instead, this large inflation appears to be driven by a few genes, as indicated by the sections of steep slope in the QQ plot. The major source of this inflation is ERAP1, the second largest contributor to genetic heritability of AS 140.

Figure 5.3 QQ plot of eQTL for CD4+T-cells indicating local (cis) and distal (trans) associations.


Cis-eQTL analysis

Cis-eQTL analysis identified a relatively small number of unique gene associations across all cell types (Table 5.4). Unlike with cis-mQTL analysis, only a small number of these were within the HLA region, although the same HLA region associated cis-eQTL was identified as the most significant cis-eQTL in CD4+T-cells, CD8+T-cells and NK cells. The adjusted p-values (qvalues) for cis-eQTLs were lower than those observed for mQTL, although substantial overlap between the regions of strong association with mQTL and eQTL was observed.

Table 5.4 cis-eQTL results by cell type indicating the number of unique genes and those associated within the HLA region.

                         CD4+T-cells      CD8+T-cells      γδ T-cells       NK cells         CD14+monocytes
No. eQTL (FDR<0.05)      16619            17002            1710             8307             15493
Unique genes             198              201              38               145              210
Unique non-HLA eQTL      145              141              22               97               149
Most significant eQTL    6:29629344:T:C   6:29647628:T:A   6:32559019:G:A   6:29629344:T:C   6:32592868:T:C
                         ZFP57            ZFP57            HLA-DRB6         ZFP57            HLA-DRB6
                         6.69x10-23       3.60x10-28       7.98x10-7        4.54x10-22       1.05x10-22


Fewer AS-associated SNPs were significant eQTLs than were mQTLs, and fewer still influenced expression of the genes reported to be associated with the AS-associated loci (Table 5.5). Strong cis-eQTL associations between ERAP1 and ERAP2 and their respective AS-associated SNPs were observed. The most statistically significant example of a locus that encompassed both an eQTL and an mQTL association was 5:96211741 (Figure 5.4). The eQTL influence on ERAP2 expression was inversely related to the mQTL for the same SNP. The ERAP1/2 region cis-eQTLs overlapped with the cis-mQTLs identified previously (Figure 5.5).
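The idea of an eQTL and an mQTL acting in opposite directions at the same SNP can be made concrete with a simple additive model, in which the trait is regressed on the number of alternative alleles carried (0, 1 or 2). The sketch below is illustrative only: the additive linear model is an assumption rather than a description of the software used here, and all values are invented, not study data.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Invented data for six hypothetical individuals.
dosage = [0, 0, 1, 1, 2, 2]                        # alternative allele count
expression = [5.1, 4.9, 6.0, 6.2, 7.1, 6.9]        # e.g. normalised expression
methylation = [0.80, 0.82, 0.60, 0.62, 0.41, 0.39]  # e.g. CpG beta values

eqtl_effect = ols_slope(dosage, expression)
mqtl_effect = ols_slope(dosage, methylation)

# Opposite signs indicate an inverse eQTL/mQTL relationship for this SNP.
print(eqtl_effect > 0, mqtl_effect < 0)  # True True
```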

Table 5.5 Qvalues for individual cell types cis-eQTL with the lead SNPs and the gene associated.

SNP                Gene Name   CD4         CD8          GDT         NKC         Mon
5:55440730:G:A     ANKRD55     4.33x10-3   ns           ns          ns          ns
5:96121715:C:T     ERAP1       2.26x10-3   6.44x10-5    ns          8.81x10-4   4.24x10-7
5:96121715:C:T     ERAP2       8.12x10-3   1.81x10-3    ns          ns          0.03
5:96252589:T:C     ERAP2       1.03x10-6   4.40x10-10   3.28x10-3   3.60x10-7   1.82x10-15
17:62147192:G:C    ERN1        0.01        2.87x10-4    ns          1.24x10-8   ns
10:90749963:A:G    FAS         0.02        ns           ns          4.75x10-7   4.20x10-12
4:103434253:T:C    NFKB1       4.22x10-3   ns           ns          ns          ns
9:117696336:C:T    TNFSF8      ns          ns           ns          0.03        ns

Figure 5.4 CD4 T-cell expression and methylation values for the ERAP2 SNP genotype (5:96211741:T:G). Homozygosity for 5:96211741 is significantly associated with ERAP2 expression and DNA methylation at cg10531724, with AS individuals trending towards lower expression overall.


Figure 5.5 Gviz plot of the cis-mQTL and eQTL across CD4+T-cells (blue), CD8+T-cells (orange) and CD14+monocytes (green) within the ERAP1/ERAP2 locus.


The cis-eQTL identified with the AS-associated loci often were not associated with changes in expression of the genes that have previously been reported as likely (Table 5.6). The multiple additional gene associations with the lead SNPs are shown with the gene reported to be associated with the SNP. For example, the AS-associated variant 6:167437988:C:T has previously been suggested to affect CCR6, which is closely located; however, the only significant cis-eQTL influenced RNASET2 expression. RNASET2 was also influenced by several cis-eQTL for SNPs in LD with 6:167437988 in both CD4+ and CD8+T-cells (Figure 5.6). RNASET2 is a T2 ribonuclease which has been suggested in other species to be involved in APC priming of CD4+T-cells; however, the role of RNASET2 in humans requires further study 310,311.

Table 5.6 cis-eQTL associations for lead SNPs with genes other than the gene previously reported for that SNP.

SNP                Reported Gene   Associated Gene      CD4         CD8          GDT         NKC          Mon
1:155130391:G:A    Unclear         GBAP1                9.11x10-5   ns           ns          ns           ns
10:90749963:A:G    FAS             ACTA2                7.18x10-8   9.65x10-4    ns          1.45x10-11   3.85x10-19
10:90749963:A:G    FAS             ACTA2-AS1            2.82x10-7   0.02         ns          8.50x10-9    9.62x10-16
14:35563211:T:C    NFKB1A          FAM177A1             ns          ns           ns          ns           2.26x10-7
14:35563211:T:C    NFKB1A          PPP2R3C              ns          ns           ns          ns           3.06x10-10
16:28517709:T:C    IL27            SULT1A2              ns          ns           ns          ns           2.23x10-6
16:30485393:G:C    Unclear         SMG1P5               0.02        ns           ns          ns           ns
17:26137540:C:T    NOS2            LYRM9                ns          ns           ns          ns           0.03
17:38032680:T:C    IKZF3           GSDMB                ns          2.98x10-3    ns          ns           ns
17:38032680:T:C    IKZF3           ORMDL3               4.18x10-8   3.12x10-14   9.13x10-5   3.20x10-7    ns
17:45690351:A:T    NPEPPS          TBKBP1               ns          ns           ns          ns           0.01
2:25097644:A:G     Unclear         ADCY3                0.02        ns           ns          ns           1.48x10-3
22:45727565:T:G    FAM118A         SMC1B                0.01        ns           ns          ns           ns
5:96121715:C:T     ERAP1           AC008906.1           ns          ns           ns          ns           8.61x10-3
5:96121715:C:T     ERAP1           CAST                 6.60x10-4   0.01         ns          ns           0.02
5:96174929:C:T     ERAP2           CAST                 1.10x10-3   1.82x10-5    ns          2.00x10-7    5.63x10-6
6:167437988:C:T    CCR6            RNASET2              5.33x10-8   2.48x10-8    ns          ns           ns
6:167437988:C:T    CCR6            RNASET2 pseudogene   2.60x10-3   9.20x10-6    ns          ns           ns


Figure 5.6 cis-mQTL and cis-eQTLs in CD4 and CD8 T-cells associated with RNASET2. The same SNP was associated with RNASET2 gene expression and with cg25258033 in both CD4+T-cells and CD8+T-cells, with the AA homozygous individuals having higher RNASET2 expression and lower DNA methylation of cg25258033, and GG homozygous individuals showing decreased RNASET2 expression.


The lead SNP in the 17q21 region, housing the AS-candidate gene NPEPPS, was also identified as a cis-eQTL for TBKBP1. The 17q21 region is gene rich and encompasses TBX21, NPEPPS and TBKBP1 amongst others (Figure 5.7). The entire region is similarly associated with cis-mQTLs and cis-eQTLs. No cis-eQTL associations for NPEPPS or TBX21 were identified.

Figure 5.7 Gviz plot for cis-mQTL and eQTL within the NPEPPS/TBX21 region. Shown are CD4+T-cells (blue), CD8+T-cells (orange), NK cells (red) and CD14+monocytes (green)




Trans-eQTL analysis

As with trans-mQTL, fewer trans-eQTL were identified than cis-eQTL (Table 5.7). The same trans-eQTL association was identified as the most statistically significant in all cell types except γδ T-cells, between 3:49473646:C:T and the gene GPX1P1. GPX1P1 is a glutathione peroxidase pseudogene with unknown function located on the X chromosome. This observation was supported by surrounding SNPs which were also trans-eQTLs affecting GPX1P1, all with the same direction of association (increased GPX1P1 expression with the minor allele). The trans-eQTL identified were all in LD with an AS-associated SNP in the 3p21 region (rs3197999), which has unknown function but a suggested association with MST1.

The majority of the genes identified in trans-eQTL analysis were non-coding RNAs with unknown function, which made it difficult to interpret the functional impact of most trans-eQTLs. A small number of HLA region associations were also identified in trans-eQTL.

Table 5.7 Trans-eQTL results by cell type indicating total eQTL and number of eQTL with unique genes.

                              CD4+T-cells      CD8+T-cells      γδ T-cells       NK cells         CD14+monocytes
No. trans-eQTL (FDR<0.05)     227              182              3                112              161
Unique genes                  12               9                3                9                12
HLA region genes              5                2                3                0                4
Most significant eQTL         3:49473646:C:T   3:49473646:C:T   6:31391401:T:C   3:49473646:C:T   3:49473646:C:T
                              GPX1P1           GPX1P1           AC002511.1       GPX1P1           GPX1P1
                              1.12x10-10       2.68x10-13       0.01             4.26x10-9        3.09x10-17


DIABLO signature for AS

DIABLO analysis was used to determine if an integrated signature for AS could be determined from the genotype, DNA methylation and gene expression data. The same quality-controlled data from the previous sections was used. Datasets were initially examined using sparse partial least squares discriminant analysis (sPLS-DA) to determine which variables within each dataset were most strongly driving the differences between AS and healthy individuals, and to investigate the weight of each feature in discriminating between AS individuals and healthy controls.

Due to the practical considerations of computational processing power and time, the methylation data was reduced from the initial >800,000 probes. Probes with a variability of <0.0001 were removed (~400,000 probes). This level of variability was selected as it reflects a population level difference of <0.01 (1% methylation). The entirety of the gene expression and genotype data was used. As the values were based on variability within each cell type the input variables may differ between cell types.
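The variability filter described above can be sketched as follows. This is a minimal illustration that assumes "variability" refers to the sample variance of methylation beta values across individuals (a variance of 0.0001 corresponds to a standard deviation of 0.01, i.e. 1% methylation); the probe names and values are hypothetical.

```python
from statistics import variance

MIN_VARIANCE = 1e-4  # threshold stated in the text


def filter_low_variance(beta_by_probe, min_variance=MIN_VARIANCE):
    """Keep only probes whose variance across samples reaches the threshold."""
    return {
        probe: betas
        for probe, betas in beta_by_probe.items()
        if variance(betas) >= min_variance
    }


# Hypothetical beta values for two probes across four individuals.
betas = {
    "cg_invariant": [0.50, 0.50, 0.51, 0.50],  # variance = 2.5e-5, removed
    "cg_variable": [0.20, 0.40, 0.60, 0.80],   # variance ~ 0.067, kept
}

kept = filter_low_variance(betas)
print(sorted(kept))  # ['cg_variable']
```

Because the filter is applied within each cell type, the set of retained probes can differ between cell types, as noted above.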

Only those individuals for whom all three data types (genotype, DNA methylation and RNAseq) were available within a cell type were used (Table 5.8). Approximately equal numbers of healthy and AS individuals were available for analysis within each cell type.

Table 5.8 Number of healthy and AS individuals with data available per cell type.

                       CD4+T-cells   CD8+T-cells   γδ T-cells   NK cells   CD14+monocytes
AS individuals         50            49            16           43         49
Healthy individuals    48            45            12           39         43


sPLS-DA models

PLS-DA and sPLS-DA were carried out to determine the ability of each dataset individually to discriminate healthy and AS individuals. Three measurements of classification error rate, the rate at which the model misattributes disease status, were used to evaluate the models formed using PLS-DA and DIABLO based on input biological variables: maximum distance, centroid distance and Mahalanobis distance. These are different methods for measuring the distance between the groups of interest within Euclidean space by mapping the values of each variable to a location within multidimensional space. Centroid distance is the distance between groups, averaging all sample locations within the cluster. Mahalanobis distance is a multivariate equivalent of a Euclidean distance, and provides a measure of the distance of a point relative to the distribution of all points 214. The optimal number of components for each model is based on the point at which the addition of another component ceases to improve the classification error rate of the model. PLS-DA classification error rates were highest within genotype and lowest for gene expression data (Table 5.9). The error rates for maximum distance and Mahalanobis distance were almost identical; however, centroid distance had a higher classification error rate than either of the other two measurements (Figure 5.8A). All PLS-DA models had overlap between AS and healthy cohorts (Figure 5.8B).
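The difference between a centroid (Euclidean) distance and a Mahalanobis distance can be shown with a small two-component example. This is a sketch of the distance definitions only, under the assumption of a two-dimensional component space; it does not reproduce how the modelling software assigns class membership, and the coordinates are invented.

```python
from math import sqrt
from statistics import mean


def centroid(points):
    """Mean location of a group of 2-D points."""
    return (mean(p[0] for p in points), mean(p[1] for p in points))


def euclidean(x, mu):
    """Straight-line distance from point x to location mu."""
    return sqrt((x[0] - mu[0]) ** 2 + (x[1] - mu[1]) ** 2)


def mahalanobis(x, points):
    """Distance from x to the group centroid, scaled by the group covariance."""
    mu = centroid(points)
    n = len(points)
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]] of the group.
    sxx = sum((p[0] - mu[0]) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mu[1]) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T S^-1 (x - mu), with the 2x2 inverse written out.
    return sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)


# Invented group that is spread widely on component 1, narrowly on component 2.
group = [(-2, 0), (2, 0), (0, -1), (0, 1)]

# Both test points are 2 units from the centroid in Euclidean terms...
print(euclidean((2, 0), centroid(group)), euclidean((0, 2), centroid(group)))
# ...but the point along the narrow axis is further in Mahalanobis terms.
print(mahalanobis((2, 0), group) < mahalanobis((0, 2), group))  # True
```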

Table 5.9 Classification error rates of the individual datasets for each cell type using PLS-DA (centroid distance).

                   CD4+T-cells   CD8+T-cells   γδ T-cells   NK cells   CD14+monocytes
Genotype           0.53          0.54          0.47         0.61       0.55
DNA methylation    0.37          0.41          0.44         0.40       0.40
Gene expression    0.26          0.32          0.43         0.34       0.29


Figure 5.8 PLS-DA results for CD4+T-cells, showing the classification error rate across components (A) and the PLS-DA plot for components 1 and 2 (B).


When further examined using an sPLS-DA model, which selects the subset of variables best able to discriminate the groups of interest from each other, only a small proportion of variables were selected (<200) (Figure 5.9A). Gene expression and DNA methylation models both indicated better discrimination using sPLS-DA models compared to PLS-DA models; however, genotype lost the ability to discriminate when modelled using sPLS-DA (Figure 5.9B). This was observed across all cell types, and in all cell types gene expression was a better discriminator than DNA methylation (Figure 5.10). This is despite sparse analysis having the potential to select different genotypes based on the cell type specific variables selected within the initial PLS-DA input based on variability across samples within a cell type, and also the variables that were selected for sparse analysis.

Only γδ T-cells had complete separation of groups in any dataset (Figure 5.10). Whilst this separation was also observed using PLS-DA analysis in γδ T-cells, within sPLS-DA the groups were more closely grouped, indicating that variables contributing to variation within the γδ T-cell groups were removed. Intragroup variation (within AS or healthy groups) in γδ T-cells (6%) was larger than in any other cell type, and larger than the between-group variation (5%). Thus, the separation of γδ T-cells between AS and healthy individuals should be interpreted as a representation of greater variation within γδ T-cell gene expression.


Figure 5.9 sPLS-DA output for CD4+T-cells indicating the balanced error rate over number of selected features (A) and the sPLS-DA plot for components 1 and 2 (B).


 Figure 5.10 sPLS-DA plot of DNA methylation (top) and gene expression (bottom) within each cell type.




DIABLO model

For DIABLO analysis, samples were split into a training cohort and a test cohort, using ~70% of samples for training and the remainder to test the accuracy of the resulting models. There were insufficient γδ T-cell samples to create a test and training cohort (28 individuals). Instead, the γδ T-cell samples were run as per the training set and used to explore variable selection and dataset correlation. As processing is identical to the training models, classification error rates for γδ T-cells are reported alongside the other cell types for those steps.
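A training/test split of this kind can be sketched as below. This is an illustration only: splitting within each disease group (stratification), the fixed random seed and the sample identifiers are all assumptions made for the example, and the resulting cohort sizes are not intended to reproduce those used in this chapter.

```python
import random


def stratified_split(sample_ids_by_group, train_fraction=0.7, seed=0):
    """Split samples into training and test sets within each group.

    Returns two lists of (sample_id, group) tuples. Splitting inside each
    group keeps the AS:healthy ratio similar in both cohorts.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for group, ids in sample_ids_by_group.items():
        shuffled = list(ids)
        rng.shuffle(shuffled)
        n_train = round(len(shuffled) * train_fraction)
        train += [(sample, group) for sample in shuffled[:n_train]]
        test += [(sample, group) for sample in shuffled[n_train:]]
    return train, test


# Hypothetical identifiers, using the CD4+T-cell group sizes from Table 5.8.
samples = {
    "AS": [f"AS_{i:02d}" for i in range(50)],
    "Healthy": [f"HC_{i:02d}" for i in range(48)],
}

train, test = stratified_split(samples)
print(len(train), len(test))  # 69 29
```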

The design for the DIABLO model is partially defined through the known correlation between gene expression and DNA methylation, and the influence of genotype on both (as shown in the above QTL analysis). As mentioned above, computational power influences the number of input variables for the tuning model. Per DIABLO recommendation, the initial model was tuned using the 10,000 most variable CpGs and genes (10,000 CpGs, 10,000 genes and 105 SNPs). This input performed better than a 5,000-variable input in the tuning model (based on initial classification error rate). As observed in sPLS-DA, the centroid distance classification error rate was higher than max distance and Mahalanobis distance (Figure 5.11A). High variability in classification error rate and broad confidence intervals were observed for NK cells and CD14+monocytes. The maximum distance measurement was used to select the number of components for the model. In all cell types except γδ T-cells the classification error rate was below 0.40.

After the model was ‘tuned’ and variables best able to discriminate AS individuals from healthy controls were isolated, the classification error rate reduced further. The number of variables selected for each dataset was low, encompassing fewer than 50 variables from each measurement. The correlation between the datasets varied between cell types, with some indicating strong associations (>0.60) and others moderate associations (0.40-0.60) (Figure 5.11B). As the correlation is limited to those variables input into the dataset, this may be an indication that selection of variables based on variance has affected correlation. In all cell types the correlation between gene expression and DNA methylation was greater than the correlation of either of these two measurements with genotype.


Figure 5.11 Classification error rate across 10 components for each cell type (A) and the correlation between the different data types (B).


Figure 5.12 Plot of the variables from each 'block' (data type) that contributed to component 2 in CD4+T-cells, and the circos plot showing the correlations between these variables. The circos plot indicates the correlation between the variables included in the model, and it can be observed that the correlation is greatest between gene expression (totalRNA) and DNA methylation, with very few correlations with genotype (Figure 5.12 bottom right). The specific features that contributed to each block are shown in bar charts (labelled) with blue indicating a feature that defines AS individuals and an orange indicating a feature that distinguishes healthy individuals.


The final model involved a much smaller group of variables, which contributed to each component to a varying extent (Figure 5.12). The circos plot indicates the correlation between the variables included in the model, and it can be observed that the correlation is greatest between gene expression (totalRNA) and DNA methylation, with very few correlations with genotype (Figure 5.12 bottom right). Other cell types indicated even fewer correlations between the datasets (e.g. CD8+T-cells, Figure 5.13). Moderate or weak correlations (<0.7) are not shown.


Figure 5.13 Plot of the variables from each 'block' (data type) that contributed to component 2 in CD8+T-cells, and the circos plot showing the correlations between these variables (bottom right).


Figure 5.14 Clustered image map of each cell type from the final DIABLO model. The disease groups are shown as rows and variables as columns, with the dataset of origin indicated. It can be observed that several blocks of individuals and measures have emerged in each cell type and these are readily apparent in CD8+T-cells with the columns of colour. All cell type models show some differentiation of AS individuals from healthy individuals, however γδ T-cells show a split into 4 groups across all three measures.

The ability of the final DIABLO signature to differentiate within the training samples is indicated within the clustered image map (Figure 5.14). Clustering is based on pairwise data correlation from DIABLO and uses complete-linkage clustering with Euclidean distance to form the clustering of samples and variables. The individuals used in the training cohorts appear to split into 4 overall groups based on DIABLO signature, with some groupings only encompassing healthy or AS individuals but the other two indicating a mix of individuals. This indicates that this signature is not capable of fully discriminating healthy individuals from AS individuals.

Testing the DIABLO signature

To test the accuracy of this model, the signature developed on the training cohorts was used to determine the disease status of individuals from the test cohort. The resulting number of true versus false identifications was used to calculate the balanced error rate (BER) of the model. The balanced error rate was similar for all cell types regardless of training and test size (Table 5.10). This indicates that the models developed for AS are not improved through the use of one cell type over another. The BER of 0.40 indicates a poor discriminatory ability for the model, as the model is not able to discriminate AS individuals from healthy controls significantly better than chance. This is a similar error rate to the individual dataset discrimination using sPLS-DA above. The lowest balanced error rate was observed in CD8+T-cells, whose signature incorporated RUNX3 expression, which is downregulated in AS 200. RUNX3 is associated with T-cell differentiation and induces IL-23A expression. It has previously been identified as differentially expressed in AS individuals. The CD8+T-cell signature is in keeping with the MHC-I-opathy hypothesis discussed above in Chapter 3.


Table 5.10 Balanced error rate (BER) and number of correct assignations for the test cohorts. The BER is a measure of the accuracy of the signature for discriminating AS individuals from healthy individuals. Shown is the number of individuals tested and the number of individuals correctly assigned as healthy or AS. The signatures appear to perform equally well for discriminating AS individuals as healthy individuals.

                                                       CD4+T-cells   CD8+T-cells   NK cells    CD14+monocytes
BER (centroid.dist)                                    0.40          0.30          0.32        0.31
BER (max.dist)                                         0.40          0.32          0.40        0.40
Test samples (AS:HC)                                   18AS:12HC     13AS:16HC     13AS:12HC   17AS:11HC
Correct assignation of affectation status (max.dist)   11AS:7HC      11AS:8HC      9AS:6HC     11AS:8HC
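The balanced error rate is the average of the per-class error rates, so that a class with fewer test samples carries the same weight as a larger class. The sketch below applies this definition to the CD4+T-cell counts reported in Table 5.10 (11 of 18 AS and 7 of 12 healthy individuals correctly assigned); the function name is hypothetical.

```python
def balanced_error_rate(correct_by_class, total_by_class):
    """Average the misclassification rate of each class."""
    error_rates = [
        1 - correct_by_class[label] / total_by_class[label]
        for label in total_by_class
    ]
    return sum(error_rates) / len(error_rates)


# CD4+T-cell test cohort counts from Table 5.10 (max.dist).
ber = balanced_error_rate(
    correct_by_class={"AS": 11, "HC": 7},
    total_by_class={"AS": 18, "HC": 12},
)
print(round(ber, 2))  # 0.4
```

For a two-class problem, a BER of 0.5 corresponds to assignment at chance.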


Discussion

This Chapter used two separate integrated analysis approaches to investigate firstly, the relationship between AS-associated genetic variants and gene expression or DNA methylation, and secondly whether these relationships can be used to develop a signature to discriminate AS individuals from healthy controls. The Chapter is intended to provide insight into the mechanism through which genetic variants affect function in AS, and whether these datasets can inform on the underlying biology that differentiates the immune systems of AS individuals from healthy controls.

QTL analysis

A key point in interpreting the results of the mQTL analysis is the design of the EPIC array, which does not enrich for several of the regions in which AS-associated loci reside (discussed in Chapter 3). This may result in mQTL effects not being identified with some loci regardless of whether an association exists. Multiple testing adjustment is also an issue with trans-QTL broadly, due to the sheer number of associations tested. QTL analysis with the current cohort should be interpreted in light of these limitations.

Inflation was observed in both mQTL and eQTL results, with higher levels of inflation in the eQTL results. Inflation within mQTL is likely due to the non-random distribution of the Illumina MethylationEPIC probes throughout the genome. Genomic inflation is observed in SNPs and methylation probes within gene regions as these regions typically impart greater phenotypic effect than intergenic regions 312. The design of the MethylationEPIC is enriched for gene regions. Additionally, the array relies upon the correlation in methylation levels between probes locally, which can also result in higher observed p-values. In eQTL analysis, the inflation observed in the QQ plot was largely driven by strong associations with ERAP2, the second largest contributor to AS heritability after HLA-B*27. As the median p-value is similar to that observed in mQTL, it is likely that the high GIF observed is driven solely by ERAP2 and a few other highly influential genes. The HLA region was a strong influence on both mQTL and eQTL associations; as regions of strong LD exist across the HLA locus, these effects could not be separated from each other. Therefore, HLA region QTLs will not be discussed.
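For readers unfamiliar with the genomic inflation factor (GIF), the following minimal sketch shows one common way of estimating it from a set of association p-values. It assumes the median-based definition (median observed 1-df chi-square statistic divided by its expected value under the null); how the GIF was computed in this study is not restated here, and the function name and the two p-value sets are purely illustrative.

```python
from statistics import NormalDist, median


def genomic_inflation_factor(p_values):
    """Median-based lambda: median observed 1-df chi-square / expected median."""
    norm = NormalDist()
    chi2 = []
    for p in p_values:
        if not 0 < p <= 1:
            raise ValueError(f"p-value out of range: {p!r}")
        # A two-sided p-value corresponds to z = Phi^-1(p / 2); z^2 is the
        # matching 1-df chi-square statistic.
        chi2.append(norm.inv_cdf(p / 2) ** 2)
    expected_median = norm.inv_cdf(0.75) ** 2  # about 0.455
    return median(chi2) / expected_median


# Illustrative inputs only: evenly spaced p-values mimic the null distribution,
# and squaring them shifts the whole distribution towards smaller p-values.
null = [(i + 0.5) / 1000 for i in range(1000)]
inflated = [p ** 2 for p in null]

print(round(genomic_inflation_factor(null), 2))  # 1.0
print(genomic_inflation_factor(inflated) > 1)    # True
```

A value close to 1 indicates that the bulk of the test statistics follow the null distribution, whereas values above 1 indicate a systematic shift towards smaller p-values.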

Cell type specific associations were observed within both cis-mQTL and cis-eQTLs, and there was substantial overlap between both. The cell type specificity appears to be linked to the underlying patterns of expression or methylation in these genes, with strong influences observed in cell types that express the gene in question. An example of this was the cis-mQTL influences on CpGs annotated to ETS1, which were only observed in CD14+monocytes and NK cells. ETS1 lies near three AS-associated loci. No significant eQTL was identified as influencing ETS1 in this region. This reflected the results in Chapter 4, which observed no difference in ETS1 expression between AS individuals and healthy controls. ETS2, a transcription factor that has previously been implicated in the 21q22 locus association with AS, was affected by a significant cis-mQTL effect from a SNP in LD with the lead AS-associated variant in the 21q22 locus. CpGs annotated to ETS2 were differentially methylated within the CD14+monocytes of AS individuals (see Chapter 3), and ETS2 was differentially expressed in AS individual CD14+monocytes (see Chapter 4). This suggests that the method by which AS-associated loci may affect gene function is not the same between all loci, nor within each cell type. Lead SNPs for which no eQTL or mQTL were identified may indicate weak associations, a lack of power within a specific cell type, or that the cell type most affected was not examined. This emphasises the importance of using isolated cell types for analysis of disease associations.

The three AS-associated loci within the ERAP1/2 locus all encompassed cis-eQTL and cis-mQTL. No significant difference in expression or methylation between AS individuals and healthy controls was observed within either ERAP1, ERAP2 or CAST within the previous two results chapters. Previous work in AS regarding the SNPs associated with these genes indicated that the polymorphisms associated with AS result in increased expression of ERAP1 and ERAP2 313. The observed increased expression was driven by allele-dependent altered expression of specific isoforms of ERAP1 and ERAP2. Whilst the current study cannot comment on isoform expression, the risk alleles within the polymorphisms associated with AS (5:96121715:C:T and 5:96252589:T:C) were both associated with increased expression levels in their associated genes. Interestingly, the AS risk allele in 5:96121715:C:T, which is associated with ERAP1, resulted in reduced expression of ERAP2 and CAST.

The eQTL influencing ERAP1 and ERAP2 expression showed different expression levels in AS individuals compared to healthy individuals within genotype. It should be noted that the eQTL associations for ERAP1 and ERAP2 are not controlled for the null mutation rs2248374, for which the homozygous (G/G) genotype results in nonsense-mediated decay of ERAP2 mRNA, and which is in strong LD with the AS-associated SNP rs2910686 (~0.83 in European Caucasians) 314,315. There were approximately equal numbers of healthy and AS individuals with the G/G genotype (15 AS individuals (29.4%) and 13 healthy individuals (26.5%)). Individuals homozygous for the null allele have no mRNA expression of ERAP2, which could drive the association of lower expression with the AS-associated risk allele in LD with the null allele. However, as there were approximately equal numbers of AS and healthy individuals homozygous for the null allele, this would not account for the differences in expression across all AS-associated loci genotypes. Further investigation of these associations would enable their relationship with ERAP1 and ERAP2 expression to be clarified.

Similarly, the 17q22 region, harbouring the AS candidate gene NPEPPS, contains many genes across an extensive area of LD. This has made identification of the key AS-associated gene(s) at this locus challenging. The locus has strong mQTL associations at several positions but no strong eQTL associations in the region. The eQTL association identified with the lead SNP for this locus (17:45690351:A:T) was only identified within CD14+monocytes, and indicated

Similarly, no changes in the expression of NPEPPS, TBKBP1 or EFCAB13, all encoded within this region, were identified in any cell type. The only gene in this region to show differential expression in AS individuals was TBX21, which had a small non-significant increase in AS individual CD4+T-cells (log2FC=0.23). Previous work identified an eQTL association between TBX21 expression and rs11657479 which was only apparent in AS individuals 309. As the current study examined eQTL using both healthy controls and AS individuals, an eQTL association driven by AS individuals was likely diluted and did not reach statistical significance (adj.p-value<0.05). A separate analysis of AS individuals only could be performed in future to determine if this association is also observed in the current cohort. The strong mQTL identified across the region do suggest that there are coordinated changes in the regulation of gene expression occurring based on genotype within this region.

Several of the strongest mQTL associations have been observed in other diseases, such as cg27230769-LOC285830, which was previously reported in RA and MS 316,317. The association with other diseases suggests that these associations may not be specifically AS driven or associated. For trans associations it may be more difficult to distinguish whether they are AS-associated or simply due to chromatin orientation, with those regions capable of interacting.

Finally, the association of the lead SNP in the locus with RNASET2 expression and methylation points to an interesting potential effect of the AS-associated SNP beyond effects on CCR6. It indicates the potential of these polymorphisms to affect multiple genes, and the observed changes go some way to explaining why the alteration of lead SNPs may not always result in changes in gene expression.

DIABLO analysis

DIABLO analysis is itself an interrogation of each individual variable's association with disease, but also of the correlation between the datasets, and data types, used. sPLS-DA analysis was used to examine the influence of each dataset individually on separating AS and healthy individuals. As expected, the separation using DNA methylation and RNAseq data improved when sparse analysis was used compared to PLS-DA analysis. PLS-DA uses all input variables which, despite the reduction of DNA methylation data to only those CpGs with >0.0001 variation in methylation, would include many CpGs unaffected by AS. The inclusion of these non-related measurements reduced the ability of the models to discriminate the groups.

Conversely, genotype performed worse when reduced using sPLS-DA. Genotype is generally assumed to be additive in nature, that is, the greater the number of genotypes included the more heritability of disease is encompassed, and alongside this the individual genotypes in AS each encompass only a very small proportion of heritability. Together this means that the loss of genotypes reduces the association with AS within the model. All datasets had high cumulative error rates of >25%. A high cumulative error rate may be caused by a model that is not a good discriminator of the groups of interest (poor predictive ability), insufficient sample sizes causing overfitting of the model, or the wrong mfold selection. Mfold selection should result in the number of individuals divided by the number of folds being greater than 5. Mfold selection for this analysis followed this recommendation, with the number of individuals divided by mfolds (10) being 6.8. The sample size is more than 50 for all cell types except γδ T-cells, but all are above the recommended 20 individuals. It is therefore unlikely that sample size is an issue, and more likely that the models used for each dataset only encompass a small portion of the variance that discriminates AS individuals from healthy controls.
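The fold-size rule of thumb described above can be expressed as a short check. This is only an illustrative sketch: the function names are hypothetical, and the figure of 68 individuals is inferred from the reported ratio of 6.8 with 10 folds rather than stated directly.

```python
def samples_per_fold(n_samples, n_folds):
    """Average number of individuals falling in each cross-validation fold."""
    if n_folds <= 0:
        raise ValueError("n_folds must be positive")
    return n_samples / n_folds


def fold_size_ok(n_samples, n_folds, min_per_fold=5):
    """Rule of thumb from the text: individuals / folds should exceed 5."""
    return samples_per_fold(n_samples, n_folds) > min_per_fold


# 68 individuals is inferred from 6.8 individuals per fold with 10 folds
print(samples_per_fold(68, 10))  # 6.8
print(fold_size_ok(68, 10))      # True
print(fold_size_ok(40, 10))      # False: only 4 individuals per fold
```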

Integration of these datasets within a single model using DIABLO can increase the likelihood of discrimination and reduce the error rate, as the contribution of each variable to the discriminatory capacity of the signature is assumed to be additive. The DIABLO models indicated that there is a moderate association between the gene expression and DNA methylation datasets and genotype, but a strong association between gene expression and DNA methylation measures. The reason for a lower than expected association with genotype could be the previously mentioned lack of enrichment for DNA methylation measures within AS-associated genetic regions, and the selection of DNA methylation and gene expression inputs based on variability, not necessarily on their difference between AS individuals and healthy controls. Variability was used to select inputs, rather than previously identified DMP and DEG, to avoid overfitting the model and biasing the results. DIABLO is intended to be an unsupervised approach to identifying signatures to discriminate groups. Future work could use a method of input selection that maximises the correlation between variables, or even use a much larger input, such as that used for sPLS-DA; however, this is a computationally intensive task that was beyond the scope of the current study. The reduced input used for the current model (10,000 CpGs, 10,000 genes and 105 genotypes) took a week to run on the HPC, and a full model would be substantially larger.

Despite the limitations on correlation between data types, the model indicated correlation between gene expression and DNA methylation, and selected several genes known to drive specific cell type functions implicated in AS. In CD8+T-cells the model included changes in gene expression of RUNX3, which is involved in CD8+T-cell differentiation to activated cell types. RUNX3 is inversely correlated with several of the changes in methylation within the model. When examined within a clustered image map, all five cell types appear to indicate samples segregating into 4 distinct subsets, with 2 subsets consisting of a mixture of AS and healthy individuals. This indicates that the variables used within the model are not exclusively altered within AS individuals, and that the variables selected only encompass part of the variation observed between AS and healthy individuals. This was expected due to the cumulative error rate (~0.40). When tested using the test sample groups, all cell type models had a BER between 0.30 and 0.40, reflecting the segregation observed within the clustered image map. Although most individuals were correctly identified, the error rate is not substantially improved upon chance (50% success). Opportunities to improve upon this model include converting the genotype counts to a score of all genotypes examined (score 0 for homozygous non-risk allele, 1 for heterozygous and 2 for homozygous risk allele). This would enable a score between 0 and 210, allowing for the inclusion of all genotypes but removing any ability to examine correlation between specific genotypes and other measurements. As mentioned above, the expansion of the variables input into the model would provide a better understanding of whether the limitations are due to unmeasured variability, an inherent lack of correlation between measurements, or non-biological variables. Unmeasured variability could be due to the curated measurement of DNA methylation, additional unidentified genotypes, or measurements other than those used in this study, such as microRNAs. Integrating additional measurements known to affect AS, such as CRP or ESR measures, would be a method to improve the discriminatory ability of a model; however, these measurements were available only for AS individuals and not for healthy individuals. A caution on this approach is that the selection of these additional measurements should not be imbalanced between cohorts in a way that would bias results. This would include measurements such as age, due to the delay in diagnosing AS in individuals and the younger demographics that volunteer as healthy controls.
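The genotype score proposed above can be sketched as a simple unweighted count of risk alleles. This is an illustration only, assuming equal weighting of all 105 genotypes; the function name and the example individual are hypothetical and do not represent any participant in this study.

```python
N_VARIANTS = 105  # number of genotypes used in the current model


def risk_allele_score(dosages):
    """Unweighted score: sum of risk-allele counts across all variants.

    Each entry is 0 (homozygous non-risk), 1 (heterozygous)
    or 2 (homozygous risk).
    """
    for d in dosages:
        if d not in (0, 1, 2):
            raise ValueError(f"invalid risk-allele count: {d!r}")
    return sum(dosages)


max_score = 2 * N_VARIANTS
print(max_score)  # 210

# Hypothetical individual carrying four risk alleles across three variants
example = [0, 1, 2, 1] + [0] * (N_VARIANTS - 4)
print(risk_allele_score(example))  # 4
```

Collapsing the genotypes into a single number in this way retains information from every variant, but, as noted above, the contribution of any individual genotype can no longer be examined.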

Overall, this chapter has shown the ability to create AS discriminatory signatures from cell subset measurements without the selection of variables known to differ between cohorts as input. The results suggest that even with the current data available this model could be improved upon with further adjustments, and potentially the use of the full datasets for consideration. Despite this there remains variation between AS and healthy individuals that is not explained by gene expression, DNA methylation and genotype alone. The complex nature of AS is emphasised by the small contribution of each variable to differentiate individuals, and the number of QTL associations within known AS associated gene regions.

Conclusion

This final results chapter examined the hypotheses:

1. Methylation and expression quantitative trait loci were associated with AS loci

2. The individual datasets created by this study contain discriminatory or predictive values for AS

3. These datasets can be used to create a signature for AS that incorporates variables from all three biological measures and is able to discriminate AS and healthy individuals.

The QTL analysis shows that the functional impact of AS-associated genetic loci is a mixture of eQTL and mQTL. The cell type specificity of several of the identified eQTLs indicates that the effect on gene expression may involve further environmental conditions. Cell type specific associations may only have been identifiable due to the use of isolated cell types. This provides insight into how these SNPs affect gene regulation and function, and provides evidence for the association of lead SNPs with specific genes.

The individual datasets each provide some discriminatory value for determining if an individual has AS, as shown by the sPLS-DA results. A large proportion of the variability between AS individuals and healthy controls is still unexplained, and this may be due to the specific approaches used, as the datasets themselves do not encompass all of the CpGs closely located to AS-associated SNPs. DIABLO analysis showed that, despite this, the associations between genotype, DNA methylation and gene expression can be used to discriminate AS and healthy individuals to some extent.

Further work in refining the AS signatures developed in this chapter may improve on this and provide a basis for better understanding the complex biology underlying AS.

Chapter 6: Final Discussion

This project had the overarching aim of clarifying the functional impact of AS-associated genetic loci. Specifically, it aimed to determine whether there are changes in DNA methylation and gene expression in AS individuals that are associated with AS genetic loci. Despite the strong genetic heritability of AS (>90%), the cumulative influence of all currently identified AS-associated loci is still less than 28% 89. The location of several of these SNPs in gene rich or intergenic regions has complicated the identification of novel genes involved in AS, as it has in all complex human genetic diseases, and limited underlying understanding of the mechanism through which these changes can cause AS. This study has investigated three specific aims:

1. To identify changes in DNA methylation in individual cell subsets associated with AS.

2. To identify changes in transcription, both coding and non-coding, in individual cell subsets that are associated with AS.

3. To investigate if the previously identified changes in DNA methylation and expression are associated with each other and with known AS-associated loci, and the network formed by these interactions.

A conservative approach to controlling for potential biological and technical confounders, alongside the complex immunological and genetic basis of AS has limited the scope of this study. Despite these limitations, the findings of this study expanded upon the current understanding of the underlying changes in AS by addressing the three aims outlined above.

Overview of project findings

Each of the three overarching aims of this thesis was addressed within a separate results chapter, using the same cohort. Initial design observations indicated the need for robust participant recruitment to ensure the AS and healthy individual cohorts were balanced for sex, age and HLA-B*27 status. Despite specific recruitment strategies, this study was unable to obtain age and sex-matched healthy individuals positive for HLA-B*27, and these variables were included as covariates in analysis. Each covariate that is included in analysis requires additional degrees of freedom and impacts the power of the study. This impact was observed in the results for DNA methylation and gene expression.

Chapter 3 examined the role of DNA methylation in AS within the individual cell types isolated. DNA methylation changes due to AS were observed in all cell types, but the changes in DNA methylation within each cell type due to AS were small (generally <15%). AS-related changes in DNA methylation were smaller than those due to cell lineage and function. This may explain why the changes observed within this study have not been previously observed in studies that used a defined cut-off for change in methylation of 20% or greater. The probe coverage within the Illumina MethylationEPIC array in the regions surrounding AS-associated loci was less than that observed for more broadly examined diseases such as rheumatoid arthritis. Despite this limitation, the array method provides the most cost-effective coverage for genome wide methylation profiling and is enriched for regions with known function. When analysed for changes in AS individuals compared to healthy controls, no DMP were significant after multiple testing adjustment using the conservative Benjamini-Hochberg method. Permutation analysis indicated that this approach may not be appropriate for DNA methylation measurements, due to the co-regulated status of closely located CpGs.
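For reference, the Benjamini-Hochberg adjustment referred to above can be sketched in a few lines. This is a generic illustration of the standard step-up procedure rather than the code used in this study; the function name and the four example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four tests
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print([round(a, 3) for a in adj])  # [0.02, 0.04, 0.04, 0.02]
```

The procedure controls the false discovery rate under independence or positive dependence between tests, so strong local correlation between neighbouring CpGs is one reason a permutation-based assessment may be informative alongside it.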

CD4+T-cells had decreased methylation in CpGs associated with genes involved in Th1 and Th2 differentiation, Th17 differentiation and Wnt signalling, indicating potential upregulation of these processes. This included changes in DNA methylation suggesting increased expression of genes associated with T-cell differentiation (e.g. SOCS3, IL21R, TMEM204, NFATC2) and in JAK/STAT signalling (JAK1, JAK3, STAT5A). These findings suggest increased differentiation of CD4+T-cells in AS individuals driven by JAK-STAT signalling. As the pathways identified indicated, this could potentially be to Th1 and Th2 cells or Th17 cells. SOCS3 signalling promotes T-cell differentiation into Th17, Th1 and Tregs depending on additional cytokine signalling such as SOCS1. AS individual CD4+T-cells have changes that indicate activation and differentiation but implicate multiple subsets (Th1 and Th17).

Interestingly, Gracey et al identified that the IL-17 axis was upregulated in male individuals with AS compared to their female counterparts. Our current analysis is adjusted for sex but the cohort is disproportionate between healthy and AS individuals which may have affected this result.

CD8+T-cells had similar changes in CpGs associated with genes involved in T-cell differentiation, such as RUNX3, CTLA4, CD44, IL4R and GATA3, and cell-cell adhesion. All of the CpGs annotated to these genes were hypomethylated, indicating upregulation in AS individuals. RUNX3 is responsible for maintaining activated CD8+T-cell expression profiles, and CD44 and IL4R are upregulated in activated CD8+T-cells upon exposure to IL-4. These indicate AS individuals have increased CD8+T-cell activation.

One of the most interesting methylation changes was a DMR in the promoter region of PDE4A, which has previously been targeted in a drug trial in AS (Apremilast). Apremilast, a PDE4 inhibitor, is approved as a treatment for psoriasis, but in AS failed to achieve greater efficacy than placebo at 16 weeks. PDE4A is part of the cAMP pathway, converting cAMP to AMP. This conversion results in increased production of pro-inflammatory cytokines, and suppression of Treg activity 252. PDE4B and PDE4D are the PDE4 family members predominantly expressed by human T-cells 318. Despite the significant reduction in PDE4A promoter methylation, PDE4A expression was non-significantly increased in AS individual CD8+T-cells. PDE4 family members were only significantly increased in AS individual CD14+monocytes. As the T-cells, NK cells and CD14+monocytes are all potentially producing inflammatory cytokines, this may suggest that Apremilast was not efficacious because PDE4 family members are not the driver of inflammatory cytokine production in AS individuals.

AS individual CD14+monocytes had decreased methylation in pathways of IFNγ and IFNβ signalling, suggesting increased expression of these pathways. This is in keeping with increased PDE4 family member expression. Only a small number of DMP were significantly different within CD14+monocytes, perhaps indicating that differences in function in AS individual CD14+monocytes are not driven by large changes in DNA methylation. CD14+monocytes were unique in their expression of ETS2, suggested to be associated with the 21q22 genetic locus previously identified as an AS genetic locus. ETS2 had significantly reduced methylation and increased expression in AS individual CD14+monocytes, but no change in any of the other cell types examined. This suggests that the association with 21q22 is specific to CD14+monocytes. Although these suggest inflammatory change in AS individual CD14+monocytes, the small number of significant DMP makes further specificity untenable.

NK cells were more difficult to parse for changes as many DMP were associated with genes of unknown function. NK cells did exhibit decreased methylation in TLR4 signalling and Wnt-β-Catenin, both involved in NK cell activation. The upregulation of Wnt signalling implicates response to bacteria or viruses, and consequent activation of NK cells 319. NK cells may be activating in response to bacteria, which is part of the “leaky gut” theory of AS pathogenesis.

High levels of variability were observed in γδ T-cells, with increased methylation in TLR and Notch signalling pathways, and reduced methylation in regulatory pathways for inflammatory processes and T-cell activation. This may suggest that γδ T-cells have increased activation in AS individuals; however, the small sample numbers limit further interpretation of the changes observed.

Overall, the DNA methylation findings highlighted the need for greater annotation of CpG association with genes and gene regions to enable comprehensive interpretation of the results from approaches such as the MethylationEPIC. A dearth of functional annotation for non-coding genes compounds this, particularly in NK cells. This study indicates the utility of cell specific information but also the remaining limitations of DNA methylation methods, such as cohort size requirements and annotation.

Chapter 4 discussed gene expression in AS individuals compared to healthy controls. Similar to Chapter 3, changes in gene expression due to AS occurred within all cell types and were highly specific to cell lineage and function. Cell type variation was again greater than the changes due to AS, emphasising the necessity of examining cell types in isolation to fully capture the breadth of changes occurring due to disease. This was a fundamental finding within both the gene expression and DNA methylation datasets, reflecting the role of both in designating cell lineage. The clustering observed within these studies is in keeping with observations that NK cells and T-cells have a common progenitor 320. Few gene expression or methylation studies have used isolated subsets, and this ultimately results in findings that fail to identify the majority of changes due to disease. Further, the continued use of historical cut-offs (based upon the sensitivity of PCR) needs to be reassessed in light of technological advancements, and the understanding that such cut-offs are an artificial division between what is considered different and what is actually altered within cells. Examining genes known to be associated with AS genetic loci identified that many of these genes were only differentially expressed in specific cell types or cell lineages, reflecting the function of the genes themselves. Many of the changes identified had not been previously identified in AS samples or had different expression patterns to those observed in individual cell types, emphasising that interpretation of changes in mixed cell populations should be made in light of these limitations.

CD4+T-cells had changes in genes associated with TCR engagement and T-cell differentiation (e.g. increased expression of CTLA4, SOCS1, SOCS3, TLR4, FOXP3). Th17 differentiation requires STAT3 and SMAD expression, which are increased in CD4+T-cells of AS individuals. IL6 is upregulated in AS individual CD4+T-cells and inhibits Th1 differentiation in AS to drive Th2 differentiation. However, Th2 requires GATA3 and STAT6 expression for differentiation, and both are decreased in AS individuals. Th17 differentiation is most likely occurring, although these cells do not exhibit increased expression of IL17 or IL22, but instead have increased expression of IFNG and small increases in IL10 and TNF. Th17 cell plasticity may explain how these cells, which indicate Th17 cell types, result in a phenotype between Th1 and Treg phenotypes.

CD8+T-cells also exhibited changes in TCR signalling and T-cell differentiation genes, such as RUNX3, GATA3, IL4R and CTLA4. As mentioned above, increased GATA3, CD44, IL4R and RUNX3 signalling indicates activated CD8+T-cells. These cells have increased IFNG expression, but do not exhibit increased cytotoxic signalling, such as perforin or granzyme B. AS individuals had small non-significant decreases in perforin, and small increases in granzyme B, which has previously been described in AS individuals 321. IFNγ is a marker of cytotoxic CD8 T lymphocytes (CTLs), and the changes in gene expression of AS individual CD8+T-cells indicate increased differentiation into CTLs with potentially altered lytic ability.

The functional shift in AS individual γδ T-cells was difficult to parse due to their low power and high variability. In contrast to previous studies, no change in expression of IL17 or IL23R was identified in AS individual γδ T-cells. TRADD, a suggestive association with AS, had decreased expression in AS individuals in γδ T-cells. As this change in expression is only in one low prevalence cell type, this may explain the low changes observed in previous PBMC studies.

CD14+monocytes had large increases in the expression of inflammatory cytokines (TNFA, IL23, IL1β) but not IFNG, which had significantly reduced expression. This corresponded to increased NF-κB signalling and regulation of cell adhesion, suggesting that AS individual CD14+monocytes are differentiating into inflammatory states. ETS2, a candidate gene within the 21q22 region, was upregulated, corresponding to the loss of methylation observed in Chapter 3. CD14+monocytes appear to be the drivers of a large proportion of inflammatory signalling in AS individuals.

NK cells conversely only had increased expression of TNFA and IFNG. NK cells had increased expression of genes involved in NK cell activation, however again a large number of the transcripts differentially expressed in NK cells had unknown function.

Chapter 5 integrated the data from Chapter 3 and 4 through two approaches to investigate correlations between genotype and the datasets developed in Chapters 3 and 4. Firstly, QTL analysis was performed to identify associations between DNA methylation and gene expression levels with AS-associated genetic loci. As expected, a high level of correlation was observed between mQTL and eQTL identified, although eQTLs were identified at lower significance than mQTLs, resulting in fewer associations. Associations between the lead SNPs for each AS-associated locus and the reported genes for that region were identified in most instances, such as ERAP1 and ERAP2 associations, however some lead SNPs also had associations with other genes in the same region such as RNASET2 and TBKBP1. ETS2 and ETS1 were both identified as mQTLs within LD of the lead SNPs for their respective regions. The changes in expression or methylation within genotype differed between AS and healthy individuals for some associations, however the number of individuals was not sufficient to determine if this was significant. These results suggest several genes that are worth further examination and for some genes such as ETS2, may explain the reason these associations are only observed in specific cell types.

The final analysis in this thesis was to determine if the datasets generated in Chapter 3 and 4 could be incorporated with genotype to create a signature for AS. When examined separately for contribution to the variation between AS and healthy individuals, gene expression explained a larger proportion of the difference than DNA methylation or genotype. Genotype, as expected, was an additive measurement, explaining the most variation when all AS-associated risk alleles were included. DNA methylation improved with sparse analysis, confirming that only a small proportion of CpGs are contributing to variation. The integrated signature indicated moderate correlation between measurements, with the greatest correlation between gene expression and DNA methylation. The final signature had a BER ranging from 0.30 in CD8+T-cells, where the signature included RUNX3 (previously identified in both Chapter 3 and 4), to its highest value in CD4+T-cells. CD4+T-cells had the lowest correlation between measurements. This indicates that using gene expression and DNA methylation can provide a signature for AS, and that the unsupervised DIABLO method is identifying genes that have been identified through supervised methods. Further, it indicates that improving the correlation between measurements through selection of input or method of measurement (genome-wide compared to curated approach) could improve further upon this model.

Overall, the results of this study have shown the importance of cell isolated methodology in identifying disease associated changes, and that the integration of multiple ‘omics datasets can provide an opportunity to further distinguish the functional implications of AS-associated genetic loci.

Major Findings Summary

• Isolating cell types is imperative to identifying disease-associated changes
• DNA methylation changes due to AS are generally <20%
• CD4+T-cells in AS individuals have increased Th17 or Th1 differentiation
• CD8+T-cells in AS individuals show increased CTL differentiation with IFNG expression but not granzyme B or perforin
• CD14+monocytes have increased inflammatory cytokine expression (TNFA, IL23, IL6, IL1β) and apoptosis, indicating activation
• γδ T-cells have no change in inflammatory cytokines observed
• NK cells in AS individuals show activation and cytotoxicity with increased IFNG and TNFA expression
• Cis-mQTL and cis-eQTLs were associated with AS-associated loci in a cell type specific manner
• Genes not previously associated with AS loci were implicated as mQTL and eQTL, e.g. ETS2 and RNASET2
• DIABLO identified signatures for disease in individual cell types that could attribute affectation with a classification error rate of 0.32-0.4
• Incorporating multiple omics datasets improved the classification error rate compared to one dataset alone

Project Significance

AS is a chronic disease affecting between 0.01 and 0.05% of the general population, depending on ethnicity. Whilst more than 100 genetic associations have been identified, the treatments for this disease are still largely broad immune suppressors that act by suppressing inflammatory signalling, such as TNF-α. These treatments are of obvious benefit in reducing inflammation and improving the quality of life of individuals with AS. Unfortunately, a lack of understanding of the underlying biology affected by AS-associated loci has hindered the development of AS-specific treatments. Improving this understanding will provide a basis for developing novel treatments and improving disease diagnosis.

This study is the first to examine cell type-specific influences on immune function in AS, providing a basis for further examination of fundamental immunology alongside novel genes for functional investigation. The data generated represent the largest cohort for investigating DNA methylation in AS to date, and the only isolated cell type data in AS for gene expression or DNA methylation. In addition, it is the only study to perform integrated analysis of multiple ‘omics types in AS individuals. The findings of this study highlight the importance of careful study design and serve as proof of concept for how to identify cell type-specific changes in both DNA methylation and gene expression, enabling future use of this data to be comprehensive and effective.

Project Limitations

This project had limitations due to practical capabilities and several restrictions that only became evident during the study itself. The major initial limitation of this study was the availability of AS individuals and healthy male participants. The majority of our AS cohort are male, which is inevitable when examining AS individuals; however, obtaining healthy males for comparison was extremely challenging. In addition to this challenge, the majority of AS individuals were undergoing treatment at the time of recruitment. This is in part due to the established nature of the clinic through which we were recruiting and the need to begin individuals on treatment as early as possible. The inclusion of a majority of AS individuals on therapy further impaired the ability of this study to examine the association of measurements such as CRP, ESR and BASDAI with changes in DNA methylation and gene expression. The changes observed are potentially confounded by the effects of anti-TNF-α therapy. Future studies should begin with several avenues of recruitment to maximise the numbers of these individuals in a shorter time period, or selectively use individuals at the extremes of BASDAI and CRP.

Tissue accessibility in AS is a well-established limitation on most studies as there are few situations in which these individuals are biopsied and fewer still in which healthy individuals have synovial fluid extracted. PBMCs are a useful alternative as they represent the changes that occur in the circulating immune system, and any signatures identified would be easily implementable as a method of diagnosis using blood.

The use of the EPIC array, while affording a well-curated and established method of examining DNA methylation, was limited by the low number of CpGs in close proximity to AS-associated loci (discussed in Chapter 3). Future studies should utilise a specialised platform to better examine these CpGs. Similarly,

due to financial and technological limitations, bulk cell RNAseq was utilised; however, sorting followed by single cell RNAseq would have been a powerful means of specifically identifying subsets of the cell types examined, such as Th17 and Th1 cells.

Future directions

Future directions for this study can be divided into two broad questions, firstly regarding the basic underlying biology of several of the cells examined within this study, and secondly, specific interrogation of AS biology.

The importance of understanding the basic biology underlying individual cell types was apparent within this study, both from the importance of isolating individual cell types and because unknown function can prevent interpretation of what changes in expression or DNA methylation may mean for cell function. This was observed with NK cells, where a large number of the differentially methylated and expressed transcripts were not annotated or had unknown function. DNA methylation in particular suffers from a lag between the number of studies examining changes in a disease context and the number of studies investigating the functional role of CpGs located in intergenic regions or within the ‘open sea’. Examining the variation of these transcripts and CpGs between different cell types, and across various biological variables, may provide some insight into the context in which they are functioning.

Importantly, this study emphasises that selecting a cut-off level for changes in DNA methylation (broadly used at 20%) based on the ability to confirm findings by PCR is an artificial approach to examining changes in methylation. The sensitivity of current arrays is much greater, and the majority of changes in individual cell types fall below this level. The data from this study suggest that re-examining previous studies that used such a cut-off could be beneficial to expanding knowledge of AS-associated changes in DNA methylation. It is also a reminder that the justification for long-standing approaches should be re-evaluated as improvements to methodology and biological knowledge occur.
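To illustrate the effect of a fixed cut-off, the sketch below applies a 20% threshold to the difference in mean methylation between groups (beta values on a 0 to 1 scale). The CpG names and values are hypothetical and chosen only to show that a small but consistent change is discarded by the threshold regardless of how reproducible it is.

```python
from statistics import mean

def delta_beta(case_betas, control_betas):
    """Difference in mean methylation (beta value) between cases and controls."""
    return mean(case_betas) - mean(control_betas)

def passes_cutoff(case_betas, control_betas, cutoff=0.20):
    """True when the absolute change in methylation reaches the cut-off."""
    return abs(delta_beta(case_betas, control_betas)) >= cutoff

# Hypothetical CpGs: (beta values in AS, beta values in healthy controls).
cpgs = {
    "cpg_large": ([0.80, 0.78, 0.82], [0.50, 0.52, 0.48]),  # ~30% change
    "cpg_small": ([0.45, 0.46, 0.44], [0.38, 0.37, 0.39]),  # ~7% change
}

retained = [name for name, (case, control) in cpgs.items()
            if passes_cutoff(case, control)]
discarded = [name for name in cpgs if name not in retained]
```

Here only the first CpG would be retained, although the second shows a change that is tightly consistent within each group. In practice the decision would rest on a statistical test rather than on the magnitude alone.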

Further to this, this study provided an opportunity to examine γδ T-cell expression and DNA methylation levels. γδ T-cells are not currently represented in publicly available datasets such as the ENCODE or BLUEPRINT projects. The relatively low prevalence of this cell type within PBMCs made it difficult to obtain a large number of individuals with sufficient sample for analysis. The position of γδ T-cells as spanning both the innate and adaptive immune systems makes this dataset an interesting basis to examine signatures for γδ T-cells in healthy individuals. The current study also observed two cell subsets within the γδ T-cell population, which have previously been reported as an activated subset and a homeostatic subset whose differences are driven by the expression of different δ chains 322,323. Future work within our group seeks to expand upon these observations and obtain expression data for each subset to investigate the functional differences between these subsets within a healthy and AS context. These can be compared with the data generated in this study to identify whether the expression of γδ T-cells as a single population is representative of these subsets, and whether one or both subsets are affected within AS individuals.

The cell types defined within this study are broad definitions, containing within them various subsets with distinct function. The use of these broad definitions is partially for practical reasons and partially to understand proportionally the shift in function of each cell type to a specific state, such as an activated inflammatory state. Any point of cell type definition is inherently arbitrary due to the spectrum of cellular function and state within a single lineage. Therefore, future work would not necessarily benefit from the use of single cell methods unless a specific change in underlying biology is in question, such as confirming the ETS2 QTL findings. This study is concerned with changes in the overarching function of cell types, which can be

profiled at the cell type level, whereas single cell approaches do not provide a population level profile of DNA methylation or gene expression changes. Practically, a greater number of individuals can be profiled using approaches such as bulk RNAseq rather than single cell methods, and this is preferable to increased resolution of individual cell types.

The remaining questions regarding AS biology that can be addressed using this dataset are broad, as a concerted effort was made to record as many variables as practicable during recruitment. The primary area that links to the aims of the current thesis is the influence of medication use on the changes identified in AS, and whether this could explain some of the error within the signature for disease established in Chapter 5. Biologic treatment is a substantial source of potential variation within AS individuals as it is a broad suppressor of immune function.

Another less explored influence on gene expression and DNA methylation is the use of NSAIDs, which are taken by both the healthy control and AS cohorts. NSAID use, including the frequency of use, was recorded for all participants. The current cohort can be used to examine the effect of NSAIDs in comparison to biologics on individual cell types, and whether AS individuals being treated with biologics have a signature of expression and methylation closer to the “healthy” individual signature than AS individuals not taking biologics. Whilst individuals taking glucocorticoids were excluded from recruitment, NSAID use is far more prevalent in the general population. It is therefore important to define the scope of influence that NSAIDs have on gene expression and DNA methylation, and whether this should be considered during study design and analysis.

In addition to the potential effect of medication on AS gene expression and DNA methylation, it is important to note a major limitation of this study which is inherent

to almost all AS studies. All of the individuals recruited for the AS cohort had established disease, except one. This individual was initially recruited as a healthy individual but during the study was diagnosed with nrAxSpA. This individual was HLA-B*27 positive and had a positive MRI, making it likely that they will go on to develop radiographic disease meeting the mNY criteria. Prior to analysis, this individual was examined for clustering within the healthy and AS cohorts, and grouped closely with the AS cohort. Therefore, this individual was included with the AS cohort for analysis. The shifting of diagnosis highlights the broad features currently relied upon for AS diagnosis, and the emphasis on radiographic progression for positive diagnosis. This study is limited to providing a signature of AS in established disease, but comparisons with early disease may provide a method to differentiate changes that are representative of long-term inflammatory conditions from those that are causative of disease. At least a proportion of the changes observed are likely to be caused by inflammation alone.

To build specifically upon the findings in the current thesis, the depth of sequencing obtained can be used to better examine novel transcript assembly, such as within the 21q22 intergenic region. A proportion of the data can be used to examine splicing variants, which can be specifically influenced by genetically associated changes (previously observed in AS with ERAP1) 313. This may further explain some of the variation observed during eQTL analysis if the variation increased or decreased a specific isoform. Establishing a causative link between the QTLs identified and AS genetic loci requires confirmation in a new cohort.

Chapter 5 has already discussed potential methods through which the current cell type specific signatures of AS could be improved upon, however an interesting question which was not within the scope of the current study is whether all cell type

specific measurements could be incorporated into a single model of AS. This would be a challenge both in deciding how to select input to avoid overfitting or bias and in computation, as such a model would be extremely large and intensive to run. The current study only had a small number of individuals for which all cell types were available (<20 individuals), or for which only four cell types were available (~80 individuals). Therefore, if all five cell types are intended to be used, only correlation between measures could be examined. Alternatively, the data can be summarised for integration by treating the cell type measurements as unrelated studies using P-integration approaches such as MINT (Multivariate INTegrative method), from the same package as DIABLO, which integrate based on measurements rather than individuals 214. This approach integrates based on predictors, e.g. genes, which would enable the selection of signatures when groups are defined as disease group and cell type (e.g. CD4.AS). This may provide for a model where the differences between AS and healthy controls in individual cell types do not overlap, but rather provide additive value to the model.
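The combined grouping described above (for example CD4.AS) can be sketched as follows. This Python fragment only illustrates how samples might be arranged for such an analysis, with each cell type treated as a separate study and the outcome defined as cell type plus disease group. It does not call the package that provides DIABLO and MINT, and the sample records and field names are hypothetical.

```python
from collections import Counter

# Hypothetical sample records; identifiers and fields are illustrative only.
samples = [
    {"id": "P01", "cell_type": "CD4", "status": "AS"},
    {"id": "P01", "cell_type": "CD8", "status": "AS"},
    {"id": "P02", "cell_type": "CD4", "status": "HC"},
    {"id": "P02", "cell_type": "CD8", "status": "HC"},
    {"id": "P03", "cell_type": "CD4", "status": "AS"},
]

def combined_label(sample):
    """Join cell type and disease group into one class label, e.g. 'CD4.AS'."""
    return f"{sample['cell_type']}.{sample['status']}"

def build_design(records):
    """Return the per-sample study factor (cell type) and outcome labels."""
    study = [r["cell_type"] for r in records]
    outcome = [combined_label(r) for r in records]
    return study, outcome

study, outcome = build_design(samples)
group_sizes = Counter(outcome)  # number of samples in each combined group
```

Counting the combined groups in this way would also make the sample size available to each cell type and disease group explicit before any model is fitted.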

Limitations due to sample size have been evident throughout this study both in the discussions on statistical significance and the specific measures taken to avoid additional variation. In hindsight, the current study could have been refined by f

Overall, the design of the data generated for the current study was purposefully curated to enable the greatest use of the data obtained. This ranges from analysis to examine basic cell type biology to forming a multi-cell signature of AS. It is hoped that the results from the current study establish the quality of these datasets for future use and highlight areas of interest in changes of gene expression and methylation in AS individuals that can drive further exploration and functional investigation.

Concluding remarks

In conclusion, this thesis outlines the broad cell type specific DNA methylation and gene expression changes that occur in AS individuals. It confirms previous studies and provides new avenues for investigation into specific pathways altered in AS, and the associations with AS-associated genetic loci. The expansion of the knowledge of the underlying biological changes driving AS has also highlighted the necessity of research into basic biology. Methodology improvements have expanded the extent to which DNA methylation and gene expression changes can be examined, increasing the coverage of non-coding genes and ‘open sea’ regions. However, these improvements are meaningless without proper expansion of annotation and function for these genetic mechanisms. This study is a platform from which that basic biological annotation and specific AS research can be performed. It is hoped that the data produced will inform future research into developing better diagnosis and treatment for AS individuals.

Bibliography

1. Whyte JM, Ellis JJ, Brown MA, Kenna TJ. Best practices in DNA methylation: lessons from inflammatory bowel disease, psoriasis and ankylosing spondylitis. Arthritis Res Ther 2019; 21(1): 133. 2. Ancuta P. A slan-based nomenclature for monocytes? Blood 2015; 126(24): 2536-8. 3. Abbas AK, Lichtman AH, Pillai S. Cellular and molecular immunology. Eighth edition. ed: Elsevier Saunders; 2015. 4. Zarin P, Chen EL, In TS, Anderson MK, Zuniga-Pflucker JC. Gamma delta T-cell differentiation and effector function programming, TCR signal strength, when and how much? Cell Immunol 2015; 296(1): 70-5. 5. Vantourout P, Hayday A. Six-of-the-best: unique contributions of gammadelta T cells to immunology. Nat Rev Immunol 2013; 13(2): 88-100. 6. Chien YH, Meyer C, Bonneville M. gammadelta T cells: first line of defense and beyond. Annu Rev Immunol 2014; 32(1): 121-55. 7. Hedges JF, Lubick KJ, Jutila MA. Gamma delta T cells respond directly to pathogen- associated molecular patterns. J Immunol 2005; 174(10): 6045-53. 8. Wesch D, Peters C, Oberg HH, Pietschmann K, Kabelitz D. Modulation of gammadelta T cell responses by TLR ligands. Cell Mol Life Sci 2011; 68(14): 2357-70. 9. Kenna TJ, Davidson SI, Duan R, et al. Enrichment of circulating interleukin-17-secreting interleukin-23 receptor-positive gamma/delta T cells in patients with active ankylosing spondylitis. Arthritis Rheum 2012; 64(5): 1420-9. 10. Park D, Kim HG, Kim M, et al. Differences in the molecular signatures of mucosal- associated invariant T cells and conventional T cells. Sci Rep 2019; 9(1): 7094. 11. Autissier P, Soulas C, Burdo TH, Williams KC. Evaluation of a 12-color flow cytometry panel to study lymphocyte, monocyte, and dendritic cell subsets in humans. Cytometry A 2010; 77(5): 410-9. 12. Murphy K, Travers P, Walport M, Janeway C. Janeway's immunobiology. 8th ed. New York: Garland Science; 2012. 13. Cimini E, Agrati C, D'Offizi G, et al. 
Primary and Chronic HIV Infection Differently Modulates Mucosal Vdelta1 and Vdelta2 T-Cells Differentiation Profile and Effector Functions. Plos One 2015; 10(6): e0129771. 14. Bennett CL, Christie J, Ramsdell F, et al. The immune dysregulation, polyendocrinopathy, enteropathy, X-linked syndrome (IPEX) is caused by mutations of FOXP3. Nat Genet 2001; 27(1): 20-1. 15. Appel H, Wu P, Scheer R, et al. Synovial and peripheral blood CD4+FoxP3+ T cells in spondyloarthritis. J Rheumatol 2011; 38(11): 2445-51. 16. Hori S, Nomura T, Sakaguchi S. Control of regulatory T cell development by the transcription factor Foxp3. Science 2003; 299(5609): 1057-61. 17. Embgenbroich M, Burgdorf S. Current Concepts of Antigen Cross-Presentation. Front Immunol 2018; 9(JUL): 1643. 18. Dasari V, Rehan S, Tey SK, Smyth MJ, Smith C, Khanna R. Autophagy and proteasome interconnect to coordinate cross-presentation through MHC class I pathway in B cells. Immunol Cell Biol 2016; 94(10): 964-74. 19. Glinos DA, Soskic B, Williams C, et al. Genomic profiling of T-cell activation suggests increased sensitivity of memory T cells to CD28 costimulation. Genes Immun 2020; 21(6-8): 390-408. 20. Braud VM, Allan DS, O'Callaghan CA, et al. HLA-E binds to natural killer cell receptors CD94/NKG2A, B and C. Nature 1998; 391(6669): 795-9. 21. Long EO, Kim HS, Liu D, Peterson ME, Rajagopalan S. Controlling natural killer cell responses: integration of signals for activation and inhibition. Annu Rev Immunol 2013; 31: 227-58. 22. Reveille JD, Hirsch R, Dillon CF, Carroll MD, Weisman MH. The prevalence of HLA-B27 in the US: data from the US National Health and Nutrition Examination Survey, 2009. Arthritis Rheum 2012; 64(5): 1407-11. 23. O’Neill J. Spine. In: O'Neill J, ed. Essential Imaging in Rheumatology. New York, NY: Springer New York; 2015: 351-70. 24. Braun J, Sieper J. 
Building consensus on nomenclature and disease classification for ankylosing spondylitis: results and discussion of a questionnaire prepared for the International Workshop on New Treatment Strategies in Ankylosing Spondylitis, Berlin, Germany, 18-19 January 2002. Ann Rheum Dis 2002; 61 Suppl 3: iii61-7.

25. Robinson PC, Leo PJ, Pointon JJ, et al. The genetic associations of acute anterior uveitis and their overlap with the genetics of ankylosing spondylitis. Genes Immun 2016; 17(1): 46-51. 26. van der Horst-Bruinsma IE, Nurmohamed MT, Landewe RB. Comorbidities in patients with spondyloarthritis. Rheum Dis Clin North Am 2012; 38(3): 523-38. 27. Haroon NN, Paterson JM, Li P, Inman RD, Haroon N. Patients With Ankylosing Spondylitis Have Increased Cardiovascular and Cerebrovascular Mortality: A Population-Based Study. Ann Intern Med 2015; 163(6): 409-16. 28. Thjodleifsson B, Geirsson AJ, Bjornsson S, Bjarnason I. A common genetic background for inflammatory bowel disease and ankylosing spondylitis: a genealogic study in Iceland. Arthritis Rheum 2007; 56(8): 2633-9. 29. Liew JW, Ramiro S, Gensler LS. Cardiovascular morbidity and mortality in ankylosing spondylitis and psoriatic arthritis. Best Pract Res Clin Rheumatol 2018; 32(3): 369-89. 30. Stolwijk C, van Onna M, Boonen A, van Tubergen A. Global Prevalence of Spondyloarthritis: A Systematic Review and Meta-Regression Analysis. Arthritis Care Res (Hoboken) 2016; 68(9): 1320-31. 31. Stolwijk C, Boonen A, van Tubergen A, Reveille JD. Epidemiology of spondyloarthritis. Rheum Dis Clin North Am 2012; 38(3): 441-76. 32. Akkoc N, Khan MA. Chapter 10 - Epidemiology of Ankylosing Spondylitis and Related Spondyloarthropathies A2 - Reveille, Michael H. WeismanDésirée van der HeijdeJohn D. Ankylosing Spondylitis and the Spondyloarthropathies. Philadelphia: Mosby; 2006: 117-31. 33. Webers C, Essers I, Ramiro S, et al. Gender-attributable differences in outcome of ankylosing spondylitis: long-term results from the Outcome in Ankylosing Spondylitis International Study. Rheumatology (Oxford) 2016; 55(3): 419-28. 34. Feldtkeller E, Khan MA, van der Heijde D, van der Linden S, Braun J. Age at disease onset and diagnosis delay in HLA-B27 negative vs. positive patients with ankylosing spondylitis. Rheumatol Int 2003; 23(2): 61-6. 35. 
Seo MR, Baek HL, Yoon HH, et al. Delayed diagnosis is linked to worse outcomes and unfavourable treatment responses in patients with axial spondyloarthritis. Clin Rheumatol 2015; 34(8): 1397-405. 36. Rosenbaum JT, Pisenti L, Park Y, Howard RA. Insight into the Quality of Life of Patients with Ankylosing Spondylitis: Real-World Data from a US-Based Life Impact Survey. Rheumatol Ther 2019; 6(3): 353-67. 37. Martindale J, Shukla R, Goodacre J. The impact of ankylosing spondylitis/axial spondyloarthritis on work productivity. Best Pract Res Clin Rheumatol 2015; 29(3): 512-23. 38. Kotsis K, Voulgari PV, Drosos AA, Carvalho AF, Hyphantis T. Health-related quality of life in patients with ankylosing spondylitis: a comprehensive review. Expert Rev Pharmacoecon Outcomes Res 2014; 14(6): 857-72. 39. Raybone K, Family H, Sengupta R, Jordan A. (Un)Spoken realities of living with axial spondyloarthritis: a qualitative study focused on couple experiences. BMJ Open 2019; 9(7): e025261. 40. van der Linden S, Valkenburg HA, Cats A. Evaluation of diagnostic criteria for ankylosing spondylitis. A proposal for modification of the New York criteria. Arthritis Rheum 1984; 27(4): 361- 8. 41. Braun J, Van Den Berg R, Baraliakos X, et al. 2010 update of the ASAS/EULAR recommendations for the management of ankylosing spondylitis. Ann Rheum Dis 2011; 70(6): 896- 904. 42. Spoorenberg A, van der Heijde D, de Klerk E, et al. Relative value of erythrocyte sedimentation rate and C-reactive protein in assessment of disease activity in ankylosing spondylitis. J Rheumatol 1999; 26(4): 980-4. 43. Budd R, O'Dell JR, Gabriel SE, McInnes IB, Firestein GS. Kelley's textbook of rheumatology: Elsevier; 2016. 44. Yarwood A, Han B, Raychaudhuri S, et al. A weighted genetic risk score using all known susceptibility variants to estimate rheumatoid arthritis risk. Ann Rheum Dis 2015; 74(1): 170-6. 45. Mahajan A, Taliun D, Thurner M, et al. 
Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. Nat Genet 2018; 50(11): 1505-13. 46. Marigorta UM, Denson LA, Hyams JS, et al. Transcriptional risk scores link GWAS to eQTLs and predict complications in Crohn's disease. Nat Genet 2017; 49(10): 1517-21. 47. Z L, ED G, J H, et al. Genetic Risk Score Prediction in Ankylosing Spondylitis [abstract]. Arthritis Rheumatol 2018; 70.

48. Sweeney S, Taylor G, Calin A. The effect of a home based exercise intervention package on outcome in ankylosing spondylitis: a randomized controlled trial. J Rheumatol 2002; 29(4): 763-6. 49. Fernandez-de-Las-Penas C, Alonso-Blanco C, Morales-Cabezas M, Miangolarra-Page JC. Two exercise interventions for the management of patients with ankylosing spondylitis: a randomized controlled trial. Am J Phys Med Rehabil 2005; 84(6): 407-19. 50. Gurler O, Bezdan O, Akar S, et al. Effects of patient education and regular follow up on exercise compliance in ankylosing spondylitis. Ann Rheum Dis 2005; 64: 558-. 51. O'Dwyer T, O'Shea F, Wilson F. Decreased health-related physical fitness in adults with ankylosing spondylitis: a cross-sectional controlled study. Physiotherapy 2016; 102(2): 202-9. 52. Braun J, Brandt J, Listing J, et al. Treatment of active ankylosing spondylitis with infliximab: a randomised controlled multicentre trial. Lancet 2002; 359(9313): 1187-93. 53. van der Heijde D, Kivitz A, Schiff MH, et al. Efficacy and safety of adalimumab in patients with ankylosing spondylitis: results of a multicenter, randomized, double-blind, placebo-controlled trial. Arthritis Rheum 2006; 54(7): 2136-46. 54. Landewe R, Braun J, Deodhar A, et al. Efficacy of certolizumab pegol on signs and symptoms of axial spondyloarthritis including ankylosing spondylitis: 24-week results of a double- blind randomised placebo-controlled Phase 3 study. Ann Rheum Dis 2014; 73(1): 39-47. 55. Reveille JD, Deodhar A, Caldron PH, et al. Safety and Efficacy of Intravenous Golimumab in Adults with Ankylosing Spondylitis: Results through 1 Year of the GO-ALIVE Study. J Rheumatol 2019; 46(10): 1277-83. 56. van der Heijde D, Landewe R, Einstein S, et al. Radiographic progression of ankylosing spondylitis after up to two years of treatment with etanercept. Arthritis Rheum 2008; 58(5): 1324-31. 57. Haroon N, Inman RD, Learch TJ, et al. 
The impact of tumor necrosis factor alpha inhibitors on radiographic progression in ankylosing spondylitis. Arthritis Rheum 2013; 65(10): 2645-54. 58. Maxwell LJ, Zochling J, Boonen A, et al. TNF-alpha inhibitors for ankylosing spondylitis. Cochrane Database Syst Rev 2015; 4(4): CD005468. 59. Sieper J, Rudwaleit M, Baraliakos X, et al. The Assessment of SpondyloArthritis international Society (ASAS) handbook: a guide to assess spondyloarthritis. Ann Rheum Dis 2009; 68 Suppl 2(Suppl 2): ii1-44. 60. Baeten D, Sieper J, Braun J, et al. Secukinumab, an Interleukin-17A Inhibitor, in Ankylosing Spondylitis. N Engl J Med 2015; 373(26): 2534-48. 61. Marzo-Ortega H, Sieper J, Kivitz A, et al. Secukinumab and Sustained Improvement in Signs and Symptoms of Patients With Active Ankylosing Spondylitis Through Two Years: Results From a Phase III Study. Arthritis Care Res (Hoboken) 2017; 69(7): 1020-9. 62. van der Heijde D, Cheng-Chung Wei J, Dougados M, et al. Ixekizumab, an interleukin-17A antagonist in the treatment of ankylosing spondylitis or radiographic axial spondyloarthritis in patients previously untreated with biological disease-modifying anti-rheumatic drugs (COAST-V): 16 week results of a phase 3 randomised, double-blind, active-controlled and placebo-controlled trial. Lancet 2018; 392(10163): 2441-51. 63. van der Heijde D, Gensler LS, Deodhar A, et al. LB0001 Dual neutralisation of il-17a and il- 17f with bimekizumab in patients with active ankylosing spondylitis (AS): 12-week results from a phase 2b, randomised, double-blind, placebo-controlled, dose-ranging study. Ann Rheum Dis 2018; 77(Suppl 2): 70.1. 64. Reis J, Vender R, Torres T. Bimekizumab: The First Dual Inhibitor of Interleukin (IL)-17A and IL-17F for the Treatment of Psoriatic Disease and Ankylosing Spondylitis. BioDrugs 2019; 33(4): 391-9. 65. Baeten D, Ostergaard M, Wei JC, et al. 
Risankizumab, an IL-23 inhibitor, for ankylosing spondylitis: results of a randomised, double-blind, placebo-controlled, proof-of-concept, dose-finding phase 2 study. Ann Rheum Dis 2018; 77(9): 1295-302. 66. Poddubnyy D, Hermann KG, Callhoff J, Listing J, Sieper J. Ustekinumab for the treatment of patients with active ankylosing spondylitis: results of a 28-week, prospective, open-label, proof-of- concept study (TOPAS). Ann Rheum Dis 2014; 73(5): 817-23. 67. Mease P. Ustekinumab Fails to Show Efficacy in a Phase III Axial Spondyloarthritis Program: The Importance of Negative Results. Arthritis Rheumatol 2019; 71(2): 179-81. 68. Gao W, McGarry T, Orr C, McCormick J, Veale DJ, Fearon U. Tofacitinib regulates synovial inflammation in psoriatic arthritis, inhibiting STAT activation and induction of negative feedback inhibitors. Ann Rheum Dis 2016; 75(1): 311-5. 69. Boyle DL, Soma K, Hodge J, et al. The JAK inhibitor tofacitinib suppresses synovial JAK1- STAT signalling in rheumatoid arthritis. Ann Rheum Dis 2015; 74(6): 1311-6.


296 Bibliography

287. Inc I. Considerations for RNA-Seq read length and coverage. 2019. https://sapac.support.illumina.com/bulletins/2017/04/considerations-for-rna-seq-read-length-and- coverage-.html?langsel=/au/ (accessed 26/02/2020 2020). 288. Davis CA, Hitz BC, Sloan CA, et al. The Encyclopedia of DNA elements (ENCODE): data portal update. Nucleic Acids Res 2018; 46(D1): D794-D801. 289. Liu Y, Zhou J, White KP. RNA-seq differential expression studies: more sequence or more replication? Bioinformatics 2014; 30(3): 301-4. 290. Chang WJ, Niu XP, Hou RX, et al. LITAF, HHEX, and DUSP1 expression in mesenchymal stem cells from patients with psoriasis. Genet Mol Res 2015; 14(4): 15793-801. 291. Cortes A, Maksymowych WP, Wordsworth BP, et al. Association study of genes related to bone formation and resorption and the extent of radiographic change in ankylosing spondylitis. Ann Rheum Dis 2015; 74(7): 1387-93. 292. Jung SH, Yim SH, Hu HJ, et al. Genome-wide copy number variation analysis identifies deletion variants associated with ankylosing spondylitis. Arthritis Rheumatol 2014; 66(8): 2103-12. 293. Pointon JJ, Harvey D, Karaderi T, et al. The chromosome 16q region associated with ankylosing spondylitis includes the candidate gene tumour necrosis factor receptor type 1-associated death domain (TRADD). Ann Rheum Dis 2010; 69(6): 1243-6. 294. Mele M, Ferreira PG, Reverter F, et al. Human genomics. The human transcriptome across tissues and individuals. Science 2015; 348(6235): 660-5. 295. Chu DH, van Oers NS, Malissen M, Harris J, Elder M, Weiss A. Pre-T cell receptor signals are responsible for the down-regulation of Syk protein tyrosine kinase expression. J Immunol 1999; 163(5): 2610-20. 296. Smith-Garvin JE, Koretzky GA, Jordan MS. T cell activation. Annu Rev Immunol 2009; 27: 591-619. 297. Diehl S, Chow CW, Weiss L, et al. Induction of NFATc2 expression by interleukin 6 promotes T helper type 2 differentiation. J Exp Med 2002; 196(1): 39-49. 298. 
Ridgley LA, Anderson AE, Maney NJ, et al. IL-6 Mediated Transcriptional Programming of Naïve CD4+ T Cells in Early Rheumatoid Arthritis Drives Dysregulated Effector Function. Frontiers in Immunology 2019; 10(1535). 299. Jang SW, Hwang SS, Kim HS, et al. protein Hhex negatively regulates Treg cells by inhibiting Foxp3 expression and function. Proc Natl Acad Sci U S A 2019; 116(51): 25790-9. 300. Roberts AR, Vecellio M, Chen L, et al. An ankylosing spondylitis-associated genetic variant in the IL23R-IL12RB2 intergenic region modulates enhancer activity and is associated with increased Th1-cell differentiation. Ann Rheum Dis 2016; 75(12): 2150-6. 301. Duan Z, Pan F, Zeng Z, et al. Interleukin-23 receptor genetic polymorphisms and ankylosing spondylitis susceptibility: a meta-analysis. Rheumatol Int 2012; 32(5): 1209-14. 302. Akamatsu Y, Oettinger MA. Distinct roles of RAG1 and RAG2 in binding the V(D)J recombination signal sequences. Mol Cell Biol 1998; 18(8): 4670-8. 303. Singh SK, Gellert M. Role of RAG1 autoubiquitination in V(D)J recombination. Proc Natl Acad Sci U S A 2015; 112(28): 8579-83. 304. Gonzalez SM, Taborda NA, Rugeles MT. Role of Different Subpopulations of CD8(+) T Cells during HIV Exposure and Infection. Front Immunol 2017; 8: 936. 305. Boulukos KE, Pognonec P, Sariban E, Bailly M, Lagrou C, Ghysdael J. Rapid and transient expression of Ets2 in mature macrophages following stimulation with cMGF, LPS, and PKC activators. Genes Dev 1990; 4(3): 401-9. 306. Jain N, Nguyen H, Friedline RH, et al. Cutting edge: Dab2 is a FOXP3 target gene required for regulatory T cell function. J Immunol 2009; 183(7): 4192-6. 307. Cuthbert RJ, Watad A, Fragkakis EM, et al. Evidence that tissue resident human enthesis gammadeltaT-cells can produce IL-17A independently of IL-23R transcript expression. Ann Rheum Dis 2019; 78(11): 1559-65. 308. Miyake T, Satoh T, Kato H, et al. IkappaBzeta is essential for natural killer cell activation in response to IL-12 and IL-18. 
Proc Natl Acad Sci U S A 2010; 107(41): 17680-5. 309. Lau MC, Keith P, Costello ME, et al. Genetic association of ankylosing spondylitis with TBX21 influences T-bet and pro-inflammatory cytokine expression in humans and SKG mice as a model of spondyloarthritis. Ann Rheum Dis 2017; 76(1): 261-9. 310. Acquati F, Mortara L, De Vito A, et al. Innate Immune Response Regulation by the Human RNASET2 Tumor Suppressor Gene. Frontiers in Immunology 2019; 10. 311. Ostendorf T, Zillinger T, Andryka K, et al. Immune Sensing of Synthetic, Bacterial, and Protozoan RNA by Toll-like Receptor 8 Requires Coordinated Processing by RNase T2 and RNase 2. Immunity 2020; 52(4): 591-605 e6.

Bibliography 297

312. Yang J, Manolio TA, Pasquale LR, et al. Genome partitioning of genetic variation for complex traits using common SNPs. Nat Genet 2011; 43(6): 519-25. 313. Hanson AL, Cuddihy T, Haynes K, et al. Genetic Variants in ERAP1 and ERAP2 Associated With Immune-Mediated Diseases Influence Protein Expression and the Isoform Profile. Arthritis Rheumatol 2018; 70(2): 255-65. 314. Paladini F, Fiorillo MT, Vitulano C, et al. An allelic variant in the intergenic region between ERAP1 and ERAP2 correlates with an inverse expression of the two genes. Sci Rep 2018; 8(1): 10398. 315. Yates AD, Achuthan P, Akanni W, et al. Ensembl 2020. Nucleic Acids Res 2020; 48(D1): D682-D8. 316. De Jager PL, Jia X, Wang J, et al. Meta-analysis of genome scans and replication identify CD6, IRF8 and TNFRSF1A as new multiple sclerosis susceptibility loci. Nat Genet 2009; 41(7): 776- 82. 317. Eleftherohorinou H, Hoggart CJ, Wright VJ, Levin M, Coin LJ. Pathway-driven gene stability selection of two rheumatoid arthritis GWAS identifies and validates new susceptibility genes in receptor mediated signalling pathways. Hum Mol Genet 2011; 20(17): 3494-506. 318. Peter D, Jin SL, Conti M, Hatzelmann A, Zitt C. Differential expression and function of phosphodiesterase 4 (PDE4) subtypes in human primary CD4+ T cells: predominant role of PDE4D. J Immunol 2007; 178(8): 4820-31. 319. Grzywacz B, Kataria N, Kataria N, Blazar BR, Miller JS, Verneris MR. Natural killer-cell differentiation by myeloid progenitors. Blood 2011; 117(13): 3548-58. 320. Cichocki F, Grzywacz B, Miller JS. Human NK Cell Development: One Road or Many? Frontiers in Immunology 2019; 10(2078). 321. Gracey E, Yao Y, Qaiyum Z, Lim M, Tang M, Inman RD. Altered Cytotoxicity Profile of CD8+ T Cells in Ankylosing Spondylitis. Arthritis Rheumatol 2020; 72(3): 428-34. 322. Paget C, Chow MT, Gherardin NA, et al. CD3bright signals on gammadelta T cells identify IL-17A-producing Vgamma6Vdelta1+ T cells. Immunol Cell Biol 2015; 93(2): 198-212. 323. 
Yokobori N, Schierloh P, Geffner L, et al. CD3 expression distinguishes two gammadeltaT cell receptor subsets with different phenotype and effector function in tuberculous pleurisy. Clin Exp Immunol 2009; 157(3): 385-94.

298 Bibliography

Appendices

Appendix A

AS-associated genetic loci and their associated genes (genome-wide association study (GWS) or cross-disease analysis (CDA))

| SNP ID | CHR | Position | Likely gene | Putative function | Significant analysis |
| --- | --- | --- | --- | --- | --- |
| rs6426833 | 1p36 | 19845367 | ?OTUD3/PLA2G2E | OTUD3: deubiquitinase; PLA2G2E: phospholipase A2 | CDA |
| rs6600247 | 1p36 | 24978623 | RUNX3 | Control of CD8 and Th1 differentiation | GWS |
| rs7517847 | 1p31 | 67215986 | IL23R | Activation/differentiation of IL-23R expressing cells | GWS |
| rs183686347 | 1p31 | 67237759 | IL23R | Activation/differentiation of IL-23R expressing cells | GWS |
| rs80174646 | 1p31 | 67242472 | IL23R | Activation/differentiation of IL-23R expressing cells | GWS |
| rs10889676 | 1p31 | 67256884 | IL23R | Activation/differentiation of IL-23R expressing cells | GWS |
| rs4845604 | 1q21 | 151829204 | RORC | Th17 differentiation | CDA |
| rs4129267 | 1q21 | 154453788 | IL6R | Th17 differentiation | GWS |
| rs4971079 | 1q22 | 155157915 | ?MUC1 | Unclear role | CDA |
| rs1333062 | 1q23 | 160876494 | CD244 | NK cell activation | CDA |
| rs10800314 | 1q32 | 161502999 | FCGR2A | Unknown | GWS |
| rs61802846 | 1q32 | 161504083 | FCGR2A | Unknown | GWS |
| rs12131796 | 1q32 | 200909599 | C1orf106/GPR25 | Unknown | GWS |
| rs3024493 | 1q32 | 206770623 | IL10/IL19 | Immunoregulatory functions including in gut | CDA |
| rs12075255 | 1q32 | 206788283 | IL10/IL19 | Immunoregulatory functions including in gut | CDA |
| rs2666218 | 2p25 | 9262859 | ASAP2 | Unclear role | GWS |
| rs13407913 | 2p23 | 24874775 | ?DNMT3A | De novo DNA methylation establishment | CDA |
| rs201014116 | 2p23 | 25278036 | DNMT3A | De novo DNA methylation establishment | CDA |
| rs4672505 | 2p15 | 62333197 | Intergenic | Unknown | GWS |
| rs4851529 | 2q11 | 102030838 | IL1R2 | IL-1 receptor | GWS |
| rs871656 | 2q11 | 102154822 | IL1R1 | IL-1 receptor | GWS |
| rs72871627 | 2q24 | 162280432 | IFIH1 | Response to viral infection | CDA |
| rs7426056 | 2q33 | 203747335 | CD28/CTLA4 | T-cell co-stimulation | CDA |
| rs11306716 | 2q33 | 204707771 | CTLA4 | T-cell co-stimulation | GWS |
| rs11676348 | 2q35 | 218145423 | ?CXCR1/CXCR2 | Unclear role | CDA |
| rs12694846 | 2q37 | 230283413 | SP140 | Unclear function | CDA |
| rs3749171 | 2q37 | 240630275 | GPR35 | Bacterial sensing | GWS |
| rs4676406 | 2q37 | 240639691 | ?KIF1A | Unclear role | CDA |
| rs10510607 | 3p24 | 28244770 | CMC1 | Unclear role | GWS |
| rs1001007 | 3p21 | 46387167 | ?CCR5 | Unclear role | CDA |
| rs3197999 | 3p21 | 49684099 | ?MST1 | Unclear role | CDA |
| rs6781808 | 3q22 | 137722530 | SOX14 | Cell survival and cell death, unclear role | GWS |
| rs11098964 | 4q21 | 79966815 | ANTXR2 | Unknown | GWS |
| rs3774937 | 4q24 | 102513096 | NFKB1A | Controls proinflammatory cytokine secretion (including TNF) | GWS |
| rs11750385 | 5p15 | 10521556 | ?ROPNIL | Unclear role | GWS |
| rs1992661 | 5p13 | 40414887 | PTGER4 | Induces IL23 expression, activation/differentiation of IL-23R expressing cells; bone anabolism | CDA |
| rs9687958 | 5p13 | 40496321 | PTGER4 | Induces IL23 expression, activation/differentiation of IL-23R expressing cells; bone anabolism | CDA |
| rs71624119 | 5q11.2 | 56144903 | ANKRD55 | Unclear role | CDA |
| rs469758 | 5q15 | 96786011 | ERAP1 | Peptide trimming prior to HLA class I presentation | GWS |
| rs2549803 | 5q15 | 96839226 | ERAP2 | Peptide trimming prior to HLA class I presentation | GWS |
| rs2910686 | 5q15 | 96916885 | ERAP2 | Peptide trimming prior to HLA class I presentation | GWS |
| rs17622517 | 5q31 | 132467845 | ?IRF1 | Unclear role | CDA |
| rs11749391 | 5q33 | 150849504 | IRGM | Clearance of bacterial infections, autophagy | CDA |
| rs6556411 | 5q33 | 159356215 | IL12B | Activation/differentiation of IL-23R expressing cells | CDA |
| rs56167332 | 5q33 | 159400761 | IL12B | Activation/differentiation of IL-23R expressing cells | GWS |
| rs1267499 | 6p23 | 14715651 | Intergenic | Unknown | CDA |
| rs2328530 | 6p22 | 20643496 | CDKAL1 | Methylthiotransferase family, unclear role | GWS |
| rs72928038 | 6q15 | 90267049 | BACH2 | B-cell differentiation | GWS |
| rs582757 | 6q21 | 137876687 | TNFAIP3 | Control of cytokine secretion incl. TNF | CDA |
| rs2451258 | 6q25 | 159085568 | TAGAP | RhoGTPase-activating protein involved in T-cell activation | CDA |
| rs2301436 | 6q27 | 167024500 | CCR6 | Th17 differentiation | CDA |
| rs67025039 | 6q27 | 167415405 | CCR6 | Th17 differentiation | GWS |
| rs1525735 | 7p21 | 17157947 | AHR | Microbial sensing, T-cell activation | GWS |
| rs12718244 | 7p12 | 50136058 | IKZF1 | B- and T-cell development | CDA |
| rs4917129 | 7p12 | 50283578 | IKZF1 | B- and T-cell development | CDA |
| rs4728142 | 7q32 | 128933913 | IRF5 | Interferon signalling | CDA |
| rs10758669 | 9p24 | 4981602 | JAK2 | Signalling kinase from IL-12, IL-23, IFN-γ | CDA |
| rs2812378 | 9p13 | 34710263 | CCL21/CCL19/FAM205A | Unclear role | CDA |
| rs726657 | 9q33 | 114934056 | TNFSF8 | TNF receptor | CDA |
| rs4986790 | 9q33 | 117713024 | TLR4 | Innate immune receptor for bacterial LPS | CDA |
| rs141992399 | 9q34 | 136365140 | CARD9 | Th17 activation after β-glucan exposure | GWS |
| rs10870077 | 9q34 | 136369439 | CARD9 | Th17 activation after β-glucan exposure | GWS |
| rs3124998 | 9q34 | 136494980 | CARD9/NOTCH1 | CARD9: IL-22 production; NOTCH1: T-cell development | GWS |
| rs2236379 | 10p15 | 6485181 | PRKCQ | T-cell activation | CDA |
| rs10761648 | 10q21 | 62594503 | ZNF365 | Unknown | CDA |
| rs7915475 | 10q21 | 62621908 | ?ADO | Unclear role | CDA |
| rs1250573 | 10q22 | 79282718 | ZMIZ1 | T-cell differentiation | GWS |
| rs1800682 | 10q23 | 88990206 | FAS | TNF receptor | GWS |
| rs11190133 | 10q24 | 99518968 | NKX2-3 | T-cell differentiation | CDA |
| rs10748781 | 10q24 | 99523573 | NKX2-3 | T-cell differentiation | GWS |
| rs10750899 | 11q12 | 58517478 | Unclear | Unclear role | CDA |
| rs11236797 | 11q13 | 76588605 | C11orf30 | Unclear role | CDA |
| rs7115956 | 11q22 | 110085620 | ZC3H12C | Unknown | GWS |
| rs7933433 | 11q24 | 128324555 | ETS1 | T-cell differentiation/activation | CDA |
| rs11221322 | 11q24 | 128476898 | ETS1 | T-cell differentiation/activation | CDA |
| rs11221332 | 11q24 | 128511079 | ETS1 | T-cell differentiation/activation | CDA |
| rs1860545 | 12p13 | 6337611 | TNFRSF1A | TNF receptor | GWS |
| rs11616188 | 12p13 | 6393576 | LTBR | TNF receptor | GWS |
| rs12369214 | 12q23 | 106804833 | Unclear | Unclear role | CDA |
| rs3184504 | 12q24 | 111446804 | SH2B3 | TCR signalling | GWS |
| rs8006884 | 4q24 | 35094005 | NFKB1A | Controls proinflammatory cytokine secretion (e.g. TNF) | CDA |
| rs2145623 | 4q24 | 35370030 | NFKB1A | Controls proinflammatory cytokine secretion (e.g. TNF) | CDA |
| rs1569328 | 14q24 | 75275048 | FOS | Interacts with SMAD3 to regulate TGFB function | CDA |
| rs11624293 | 14q31 | 88022477 | GPR65 | Bacterial sensing | GWS |
| rs148783236 | 15q21 | 50492819 | USP8 | Unclear role | GWS |
| rs35874463 | 15q22 | 67165360 | SMAD3 | Interacts with FOS to regulate TGFB function | CDA |
| rs61752717 | 16p13 | 3243407 | MEFV | Bacterial sensing, IL-1 secretion | GWS |
| rs367569 | 16p13 | 11271643 | SOCS1 or TNP2 | JAK2/STAT3 inhibitor | CDA |
| rs26528 | 16p11 | 28506388 | IL27 | Th17/Th1 differentiation balance | GWS |
| rs34670647 | 1p31 | 30171017 | IL23R | Activation/differentiation of IL-23R expressing cells | GWS |
| rs11574938 | 16p11 | 30474072 | ?ITGAL | Unclear role | CDA |
| rs2066845 | 16q12 | 50722629 | NOD2 | Bacterial sensing/NFKB activation | CDA |
| rs72796367 | 16q12 | 50728860 | NOD2 | Bacterial sensing/NFKB activation | CDA |
| rs5743293 | 16q12 | 50729870 | NOD2 | Bacterial sensing/NFKB activation | CDA |
| rs9797244 | 17q11 | 27770105 | NOS2 | Nitric oxide synthesis | GWS |
| rs2779255 | 17q11 | 27810514 | NOS2 | Nitric oxide synthesis | GWS |
| rs9889296 | 17q12 | 34243528 | CCL2/MCP-1 | Unclear role | CDA |
| rs35736272 | 17q21 | 39876427 | IKZF3 | B- and T-cell development | CDA |
| rs12943464 | 17q21 | 47612985 | NPEPPS | Peptide trimming prior to HLA class I presentation | GWS |
| rs1292035 | 17q23 | 59912196 | ?TUBD1 | Unclear role | CDA |
| rs196941 | 17q23 | 64069832 | ERN1/ICAM2 | ERN1: ER stress response; ICAM2: mediates adhesion interaction for antigen-specific immune response | GWS |
| rs12968719 | 18p11 | 12879467 | PTPN2 | TCR signalling control, JAK signalling negative regulator | CDA |
| rs74956615 | 19p13 | 10317045 | TYK2 | Signalling from cytokine receptors, e.g. IL-23R | GWS |
| rs35018800 | 19p13 | 10354167 | TYK2 | Signalling from cytokine receptors, e.g. IL-23R | GWS |
| rs12720356 | 19p13 | 10359299 | TYK2 | Signalling from cytokine receptors, e.g. IL-23R | GWS |
| rs587259 | 19q13 | 34165501 | LSM14A | Unclear role | CDA |
| rs679574 | 19q13 | 48702851 | FUT2 | Influences gut microbiome | CDA |
| rs4243971 | 20q11 | 32261714 | ?KIF3B | Unclear role | CDA |
| rs6058869 | 20q11 | 32760944 | DNMT3B | DNA methylation | CDA |
| rs2823288 | 21q21 | 15448569 | Intergenic | Unknown | CDA |
| rs9977672 | 21q22 | 39091357 | Intergenic | Unknown | GWS |
| rs4456788 | 21q22 | 44196441 | ICOSLG/DNMT3L/AIRE | ICOSLG: T-cell activation/differentiation; DNMT3L: DNA methylation; AIRE: T-cell development | GWS |
| rs2266961 | 22q11 | 21574308 | UBE2L3 | Ubiquitination, target unclear | CDA |
| rs2143178 | 22q13 | 39264824 | ?PDGFB | Unclear role | CDA |
| rs1569414 | 22q13 | 45331684 | FAM118A | Unknown | GWS |


Appendix B

Heat scree plot code

## Required packages: ggplot2 (plotting; also re-exports unit()) and gridExtra (grid.arrange).
## melt() is assumed to come from reshape2.
library(ggplot2)
library(reshape2)
library(gridExtra)

## The function relies on objects defined in the calling environment:
## PCs_to_view, pd_categorical, pd_continuous, ord (row order for the heat map)
## and Shade (a colour palette function).
heat_scree_plot <- function(Loadings, Importance){
  ## Remove PC1 and rescale the remaining variance
  adjust <- 1 - Importance[1]
  pca_adjusted <- Importance[2:length(Importance)] / adjust
  pca_df <- data.frame(adjusted_variance = pca_adjusted,
                       PC = seq_along(pca_adjusted))

  ## Plot adjusted variance for each PC
  scree <- ggplot(pca_df[which(pca_df$PC < (PCs_to_view + 1)), ],
                  aes(PC, adjusted_variance)) +
    geom_bar(stat = "identity", color = "black", fill = "grey") +
    theme_bw() +
    theme(axis.text = element_text(size = 12),
          axis.title = element_text(size = 15),
          plot.margin = unit(c(1.25, 1.6, 0.2, 3), "cm")) +
    ylab("Adjusted Variance") +
    scale_x_continuous(breaks = seq(1, PCs_to_view, 1))

  ## Correlate metadata with PCs: ANOVA for categorical variables,
  ## Spearman correlation for continuous variables
  aov_PC_meta <- lapply(1:ncol(pd_categorical), function(covar)
    sapply(1:ncol(Loadings), function(PC)
      summary(aov(Loadings[, PC] ~ pd_categorical[, covar]))[[1]]$"Pr(>F)"[1]))
  cor_PC_meta <- lapply(1:ncol(pd_continuous), function(covar)
    sapply(1:ncol(Loadings), function(PC)
      cor.test(Loadings[, PC], as.numeric(pd_continuous[, covar]),
               alternative = "two.sided", method = "spearman",
               na.action = na.omit)$p.value))
  names(aov_PC_meta) <- colnames(pd_categorical)
  names(cor_PC_meta) <- colnames(pd_continuous)
  aov_PC_meta <- do.call(rbind, aov_PC_meta)
  cor_PC_meta <- do.call(rbind, cor_PC_meta)
  aov_PC_meta <- rbind(aov_PC_meta, cor_PC_meta)
  aov_PC_meta <- as.data.frame(aov_PC_meta)
  ## Drop PC1 so the columns match the adjusted PCs
  aov_PC_meta_adjust <- aov_PC_meta[, 2:ncol(aov_PC_meta)]

  # Reshape data
  avo <- aov_PC_meta_adjust[, 1:PCs_to_view]
  avo_heat_num <- apply(avo, 2, as.numeric)
  avo_heat <- as.data.frame(avo_heat_num)
  avo_heat$meta <- rownames(avo)
  avo_heat_melt <- melt(avo_heat, id = c("meta"))

  # Cluster meta data
  meta_var_order <- unique(avo_heat_melt$meta)[rev(ord)]
  avo_heat_melt$meta <- factor(avo_heat_melt$meta, levels = meta_var_order)

  # Colour if significant
  avo_heat_melt$Pvalue <- sapply(1:nrow(avo_heat_melt), function(x) {
    if (avo_heat_melt$value[x] <= 0.001) {
      "<=0.001"
    } else {
      if (avo_heat_melt$value[x] <= 0.01) {
        "<=0.01"
      } else {
        if (avo_heat_melt$value[x] <= 0.05) {
          "<=0.05"
        } else {
          ">0.05"
        }
      }
    }
  })
  levels(avo_heat_melt$variable) <- sapply(1:PCs_to_view, function(x) paste("PC", x, sep = ""))

  # Plot significant meta data for each PC
  heat <- ggplot(avo_heat_melt, aes(variable, meta, fill = Pvalue)) +
    geom_tile(color = "black", size = 0.5) +
    theme_gray(8) +
    scale_fill_manual(values = Shade(10)[c(2, 4, 7, 10)]) +
    theme(axis.text = element_text(size = 10, color = "black"),
          axis.text.x = element_text(),
          axis.title = element_text(size = 15),
          legend.text = element_text(size = 14),
          legend.title = element_text(size = 12),
          legend.position = c(1, 0.4),
          legend.justification = c(1, 1),
          plot.margin = unit(c(0, 2.25, 1, 1), "cm")) +
    xlab("Adjusted Principal Component") +
    ylab(NULL)

  # Arrange the plots in the figure
  grid.arrange(scree, heat, ncol = 1)
}
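The function above expects a matrix of PC scores (Loadings) and a vector of per-PC variance proportions (Importance). The following is a minimal base-R sketch of how such inputs could be derived with prcomp, and of the PC1 adjustment and the per-PC ANOVA that the function performs. The simulated matrix, the `group` covariate and the helper `pc_pvalue` are illustrative assumptions, not thesis data or thesis code.

```r
set.seed(1)
# Simulated matrix: 200 probes (rows) x 12 samples (columns); illustrative only
mat <- matrix(rnorm(200 * 12), nrow = 200)
group <- factor(rep(c("HC", "AS"), each = 6))  # hypothetical categorical covariate

# PCA with samples as observations
pca <- prcomp(t(mat), center = TRUE, scale. = FALSE)
Loadings <- as.data.frame(pca$x)               # PC scores, one row per sample
Importance <- pca$sdev^2 / sum(pca$sdev^2)     # proportion of variance per PC

# Same adjustment as in heat_scree_plot: drop PC1 and rescale the remainder
adjusted_variance <- Importance[-1] / (1 - Importance[1])

# ANOVA p-value for the association between one PC and the covariate
pc_pvalue <- function(pc) {
  summary(aov(Loadings[, pc] ~ group))[[1]]$"Pr(>F)"[1]
}
pvals <- sapply(1:5, pc_pvalue)
```

Because PC1 is removed before rescaling, the adjusted variances sum to one across the remaining components.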


Appendix C

DMP Permutation code

## Requires limma (lmFit, eBayes, decideTests, topTable).
## adjBval, CD4_Mval and pd are defined earlier in the analysis.
library(limma)

set.seed(100)
b <- 1000 ## number of permutations

## P-value thresholds
p <- c("0.1", "0.05", "0.01", "0.001", "0.0001", "0.00001", "0.000001")

## Data frames to store results for each permutation
perm.status.CD4 <- data.frame(matrix(nrow = (b + 1), ncol = length(p) + 1))
colnames(perm.status.CD4) <- c("Iteration", p)
permuted.CD4.DMP.logFC <- data.frame(matrix(nrow = nrow(adjBval), ncol = b))
permuted.CD4.DMP.Pval <- data.frame(matrix(nrow = nrow(adjBval), ncol = b))
permuted.CD4.DMP.adjPval <- data.frame(matrix(nrow = nrow(adjBval), ncol = b))
Status.list <- data.frame(matrix(nrow = nrow(pd), ncol = b))

## Define meta data
Age <- as.numeric(pd$Age.at.collection)
Sex <- factor(pd$Sex)
Smoking <- factor(pd$Smoking.Status, levels = c("Never", "Previous"))
Status <- factor(pd$Status.New, levels = c("HC", "AS"))

## Define results from actual analysis ("actual results")
design <- model.matrix(~Status + Sex + Age + Smoking, data = pd)
fit <- lmFit(CD4_Mval, design)
fit.e <- eBayes(fit)
results <- decideTests(fit.e)
DMPlm <- topTable(fit.e, number = Inf, coef = 2, adjust.method = "BH") ## result output

## Record "actual data"
perm.status.CD4[1, "Iteration"] <- 0 ## records iteration number
for (s in 1:length(p)){
  sig.thresh <- as.numeric(p[s])
  perm.status.CD4[1, p[s]] <- length(which(DMPlm$P.Value <= sig.thresh))
}

## Loop for iterating DMP analysis
for (i in 1:b){
  #### Scramble disease status
  Status.scram <- sample(Status, length(Status), replace = FALSE) ### permuted (randomised) disease status
  Status.list[, i] <- Status.scram
  pd$Status.scram <- Status.scram

  ### STATISTICAL TEST ###
  design <- model.matrix(~Status.scram + Sex + Age + Smoking, data = pd)
  fit <- lmFit(CD4_Mval, design)
  fit.e <- eBayes(fit)
  results <- decideTests(fit.e)
  DMPlm <- topTable(fit.e, number = Inf, coef = 2, adjust.method = "BH")
  Statustest.CD4p <- subset(DMPlm, P.Value <= 0.1)

  ## Create separate output files for logFC, P.Value, adj.P.Val
  DMPlm <- DMPlm[rownames(adjBval), ] ## ensure all same order to output
  permuted.CD4.DMP.logFC[, i] <- DMPlm$logFC
  permuted.CD4.DMP.Pval[, i] <- DMPlm$P.Value
  permuted.CD4.DMP.adjPval[, i] <- DMPlm$adj.P.Val

  ### Record how many P-values are less than the chosen threshold
  perm.status.CD4[(i + 1), "Iteration"] <- i ## records iteration number
  for (s in 1:length(p)){
    sig.thresh <- as.numeric(p[s])
    perm.status.CD4[(i + 1), p[s]] <- length(which(DMPlm$P.Value <= sig.thresh))
  }
} # end loop

### Write out significant DMP for permuted FDR calculation
write.table(perm.status.CD4, file = "~/EPIC/PermutationTestTable_DMP_Status_CD4.txt")
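The table written out above holds, for each P-value threshold, the number of DMPs in the real analysis (iteration 0) and in each permutation. One common way to turn such a table into a permutation-based FDR estimate is to divide the mean number of permuted hits by the number of observed hits. The appendix does not state the formula that was used, so the sketch below is an assumption, and it runs on a small made-up table (`perm_table`) rather than on the real output.

```r
# Toy stand-in for perm.status.CD4: row 1 = actual analysis, rows 2-6 = permutations
perm_table <- data.frame(
  Iteration = 0:5,
  `0.05`    = c(500, 240, 260, 250, 255, 245),
  `0.001`   = c(40, 4, 6, 5, 3, 7),
  check.names = FALSE
)

thresholds <- setdiff(colnames(perm_table), "Iteration")
observed <- unlist(perm_table[1, thresholds])            # hits in the real data
permuted_mean <- colMeans(perm_table[-1, thresholds])    # mean hits under permutation

# Estimated FDR = expected false positives under the null / observed positives,
# capped at 1
empirical_fdr <- pmin(permuted_mean / observed, 1)
print(round(empirical_fdr, 3))
```

A threshold at which the permuted data yield nearly as many hits as the real data will have an estimated FDR close to 1, which is the signal that the threshold is too lenient.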


Appendix D

Pilot study comparing library preparation methods

Due to the low cell numbers obtained from some cell subsets, primarily γδ T-cells, the library preparation method previously used by our laboratory (the Illumina TruSeq library preparation kit) was untenable. The Illumina TruSeq kit requires a minimum input of 100 ng RNA, and many samples yielded less than 50 ng. Therefore, an alternative kit, the Clontech Low-input mammalian kit, was selected. This kit was chosen because: 1) it is recommended by Illumina, 2) it uses Illumina adapters, 3) its first step is cDNA synthesis, which provides library stability, and 4) it is highly cost effective. At the time of comparison, only a single published paper had used this kit.

A pilot study was performed to compare the Clontech and Illumina kits.

Figure 1. The electropherogram trace of libraries prepared using the Clontech kit run on the TapeStation 4200. Positive controls are shown in pink and grey.

Samples were obtained from a single healthy control bleed, and three cell types were FACS isolated. Two separate FACS and RNA isolations were carried out to simulate different processing days. Two input amounts were tested: 10 ng (the kit's upper input limit) and 5 ng (the middle of the kit's input range). The initial library fragment sizes were all within the expected range of 200-1000 bp (Figure 1). The peaks at 25 bp and 1500 bp are the lower and upper markers respectively; there are no ribosomal RNA peaks. The concentration of the final libraries varied between cell types, but the size matched the positive controls. The sequencing was performed on the NextSeq500, for which Illumina recommends specific quality cut-offs. The run clustering density of 180 K/mm² was within the recommended range of 170-220 K/mm².

The percentage of clusters passing filter (Clusters PF), described by Illumina as an “indication of signal purity from each cluster”, was 89.9%. The Q30 score, a measure of the calling accuracy of the sequencing, was 90.3%. This indicates that the sequencing for both kits was of a high quality.

Count data for kit comparison

The human genome used for this analysis has a total of 57,905 genes recorded. The number of genes with expressed transcripts was 13,059 for the TruSeq kit and 13,041 for the Clontech kit at 10 ng input (Table 1).

Table 1 The number of genes with transcript counts above 1, 10 and 1000 for each sample.

| Kit | Input | Genes with >1 count | Genes with >10 counts | Genes with >1000 counts |
| --- | --- | --- | --- | --- |
| TruSeq | | 13059 | 7106 | 369 |
| Clontech (same batch as TruSeq) | 10 ng | 13041 | 7070 | 343 |
| Clontech (different batch) | 10 ng | 12911 | 7061 | 360 |
| Clontech (same batch) | 5 ng | 13321 | 7302 | 408 |
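Table 1 tallies the genes whose read count exceeds each threshold. A minimal sketch of that tally is shown here, using simulated counts; the `counts` vector and its distribution are assumptions for illustration and are not thesis data.

```r
set.seed(42)
# Simulated per-gene read counts for one library (illustrative only)
counts <- rnbinom(20000, mu = 50, size = 0.1)

# Number of genes with counts strictly above each threshold
thresholds <- c(1, 10, 1000)
genes_above <- sapply(thresholds, function(t) sum(counts > t))
names(genes_above) <- paste0(">", thresholds)
genes_above
```

The tallies are nested by construction, so each higher threshold can only retain the same number of genes or fewer.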

Clontech samples had slightly lower numbers of uniquely mapped reads, likely due to the low input levels resulting in higher duplication (Table 2). Similarly, the higher multimapping in Clontech may be due to duplication. The percentage of reads mapped to each chromosome was not significantly different between the kits, except with regard to the mitochondrial percentage. The TruSeq kit depletes both mitochondrial and ribosomal RNA, whereas the Clontech kit only depletes ribosomal RNA. Comparison of the counts per gene using fragments per kilobase of gene per million reads (FPKM) indicated strong correlation between the TruSeq kit and the Clontech kit at 10 ng input in both CD4+ T-cells and CD14+ monocytes (Figure 2). Higher variability was observed at lower counts, which have a higher level of dispersion. The variability between kits and input amounts indicated that a single method should be used.
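FPKM scales each gene's fragment count by gene length (in kilobases) and by library size (in millions of mapped fragments), which is what makes libraries of different depth comparable. The following is a minimal sketch of the calculation and of a between-library correlation on the log scale. The simulated counts, gene lengths, the `fpkm` helper and the log2(FPKM + 1) transform are assumptions for illustration, not the thesis pipeline.

```r
set.seed(7)
n_genes <- 1000
gene_length <- sample(500:10000, n_genes, replace = TRUE)  # bp, simulated
expr <- rgamma(n_genes, shape = 0.5, scale = 200)          # shared underlying expression

# Two simulated libraries of different depth measuring the same sample
counts_a <- rpois(n_genes, expr * gene_length / 1000)
counts_b <- rpois(n_genes, 0.5 * expr * gene_length / 1000)

fpkm <- function(counts, lengths) {
  # fragments per kilobase of gene per million mapped fragments
  counts / (lengths / 1e3) / (sum(counts) / 1e6)
}

fpkm_a <- fpkm(counts_a, gene_length)
fpkm_b <- fpkm(counts_b, gene_length)

# Pearson correlation of log-transformed FPKM between the two libraries
r <- cor(log2(fpkm_a + 1), log2(fpkm_b + 1), method = "pearson")
```

In this simulation the second library has half the depth of the first, yet the FPKM values remain on the same scale. Agreement is weakest for genes with few counts, in line with the higher variability at low counts noted above.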

Overall, the Clontech kit was highly comparable with the TruSeq kit when using samples from the same batch at either input amount. An input amount of 10 ng was selected for the project.

Table 2 Mapping metrics for each of the cell types, input amounts and kits tested.

| Sample | Batch | Input (ng) | Uniquely mapped reads (%) | % reads too short | % reads mapped to multiple loci |
| --- | --- | --- | --- | --- | --- |
| TruSeq CD4 T-cells | 1 | 200 | 73.14% | 16.38% | 10.26% |
| TruSeq CD14 Monocytes | 1 | 200 | 74.17% | 16.90% | 8.71% |
| Clontech CD4 T-cells | 1 | 10 | 66.25% | 12.55% | 20.48% |
| Clontech CD14 Monocytes | 1 | 10 | 67.00% | 13.49% | 18.69% |
| Clontech CD4 T-cells | 2 | 10 | 59.36% | 13.41% | 25.93% |
| Clontech CD14 Monocytes | 2 | 10 | 61.84% | 12.64% | 23.22% |
| Clontech γδ T-cells | 2 | 10 | 72.46% | 13.65% | 13.11% |
| Clontech CD4 T-cells | 1 | 5 | 69.18% | 9.38% | 20.81% |
| Clontech CD14 Monocytes | 1 | 5 | 68.66% | 12.56% | 17.86% |
| Clontech γδ T-cells | 2 | 5 | 72.23% | 11.98% | 14.97% |


Figure 2. Comparison of counts per gene in Clontech and TruSeq kits for CD4+ T-cells and CD14+ monocytes.


Appendix E

Demographics for γδ T-cell and NK cell samples

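| | γδ T-cells, AS | γδ T-cells, HC | NK cells, AS | NK cells, HC |
| --- | --- | --- | --- | --- |
| Number of individuals | 20 | 18 | 45 | 43 |
| HLA-B*27+, n (%) | 14 (70%) | 13 (72.2%) | 33 (73.3%) | 30 (69.8%) |
| Sex (male) | 13 (65%) | 7 (38.9%) | 34 (75.6%) | 18 (41.9%) |
| Age (mean) | 39.6 | 35.5 | 45.2 | 40 |
| Anti-TNFα treated | 14 (70%) | NA | 29 (64.4%) | NA |
| Iritis | 7 | 0 | 18 | 0 |
| Psoriasis | 5 | 2 | 9 | 3 |
| IBD | 0 | 0 | 2 | 0 |
| NSAIDs | 11 (55%) | 6 (33.3%) | 25 (55.6%) | 14 (32.6%) |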


Appendix F

Online supplementary data

All online files are stored in a single location, in subfolders organised by the subheadings below. The main folder contains a README file that details the format and contents of each document. Shorthand is used to label most files with the cell type they are from: CD4 = CD4+ T-cells, CD8 = CD8+

Access documents using the link: https://drive.google.com/open?id=18MBSZVTMck8_bg7QJjg7sQoyhWDnGRg6

(1) DMP results for individual cell types

The DMP results are contained in individual Excel files labelled with an acronym for the cell type.

(2) DEG results for individual cell types

The DEG results are specified for each individual cell type for both forward and reverse strand reads. The read is specified by Fwd (forward) or Rev (reverse).

(3) Cis-QTL results for both methylation and gene expression

Only cis-QTL associations are reported due to the small number of trans-QTL identified. These results have been reduced to the most significant result for each CpG or gene, and results for the HLA region have been removed.
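The reduction described above (removing the HLA region, then keeping the most significant association per CpG or gene) can be sketched as follows. The column names, the toy results and the HLA window of chr6:25-34 Mb are all assumptions made for illustration; this appendix does not state the region boundaries or the file layout that were actually used.

```r
# Toy cis-QTL results; column names are hypothetical
qtl <- data.frame(
  feature = c("cg01", "cg01", "cg02", "cg03", "cg03"),
  snp     = c("rsA", "rsB", "rsC", "rsD", "rsE"),
  chr     = c("1", "1", "6", "2", "2"),
  pos     = c(1.0e6, 1.1e6, 3.1e7, 5.0e6, 5.2e6),
  pvalue  = c(1e-4, 1e-8, 1e-12, 0.01, 1e-3),
  stringsAsFactors = FALSE
)

# Assumed HLA window (chr6:25-34 Mb); not taken from the thesis
in_hla <- qtl$chr == "6" & qtl$pos >= 25e6 & qtl$pos <= 34e6
qtl_no_hla <- qtl[!in_hla, ]

# Keep the single most significant association per CpG or gene:
# sort by feature then p-value, and take the first row for each feature
qtl_sorted <- qtl_no_hla[order(qtl_no_hla$feature, qtl_no_hla$pvalue), ]
top_qtl <- qtl_sorted[!duplicated(qtl_sorted$feature), ]
```

Sorting before de-duplicating keeps the selection deterministic: for each feature, the retained row is the one with the smallest P-value.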
