UNIVERSITY OF MANCHESTER

Defining the key biological and genetic mechanisms involved in psoriasis

A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Biology, Medicine and Health

2017

Helen F. Ray-Jones

School of Biological Sciences

Division of Musculoskeletal and Dermatological Sciences

Table of Contents

0. Introduction ...... 25

0.1 Disease prevalence ...... 26

0.2 Phenotypes of psoriasis ...... 26

0.3 Cellular basis ...... 29

0.4 Age of onset ...... 33

0.5 Environmental risk factors ...... 35

0.6 Quality of life ...... 36

0.7 Therapeutic treatments ...... 37

0.7.1 Topical therapies ...... 37

0.7.2 UV therapy ...... 37

0.7.3 Systemic treatments ...... 38

0.7.4 Biologics ...... 38

0.7.4.1 Anti-TNFα therapies ...... 38

0.7.4.2 Anti-IL-12/IL-23 therapies ...... 39

0.7.4.3 Anti-IL-17 therapies ...... 39

0.7.4.4 Limitations of biologics ...... 40

0.8 Genetic risk factors ...... 40

0.8.1 Linkage studies ...... 41

0.8.2 Candidate gene studies ...... 41

0.8.3 Genome wide association studies ...... 42

0.8.3.1 GWAS and fine mapping in psoriasis ...... 44

0.8.3.2 Genetic overlap with psoriatic arthritis ...... 56

0.8.3.3 Genetics of late-onset psoriasis ...... 56

0.9 Moving beyond GWAS ...... 57

0.9.1 Interpreting GWAS data ...... 58

0.9.2 Functional annotation of GWAS loci ...... 59

0.9.2.1 Importance of cell type ...... 60

0.9.2.2 Accessible chromatin ...... 61

0.9.2.3 Protein interactions ...... 62

0.9.2.4 Chromatin interactions ...... 64

0.9.2.5 Selected psoriasis risk loci for functional follow-up ...... 69

0.10 Summary ...... 70

0.11 Overall aims and objectives ...... 71

0.12 Outline of thesis ...... 71

1. A genome-wide association study of late-onset psoriasis ...... 73

1.1 Introduction ...... 74

1.2 Aims and objectives of Section 1 ...... 74

1.3 Methods ...... 75

1.3.1 Samples ...... 75

1.3.1.1 Cases ...... 75

1.3.1.2 Controls ...... 76

1.3.2 Genotyping of the Manchester PsA cohort ...... 78

1.3.2.1 Illumina Infinium HTS Assay ...... 78

1.3.2.2 GenomeStudio ...... 80

1.3.2.3 Quality control of the Manchester PsA genotype data ...... 80

1.3.3 Merging case-control datasets ...... 82

1.3.4 Imputation ...... 82

1.3.5 Association analysis ...... 83

1.3.5.1 Frequentist test for association ...... 83

1.3.5.2 Correction for multiple testing ...... 84

1.3.5.3 Testing for independent signals in the MHC ...... 84

1.3.5.4 Annotation of results ...... 84

1.3.5.5 Post-analysis QC of novel signals ...... 85

1.4 Results ...... 86

1.4.1 Samples ...... 86

1.4.2 Genotyping of the Manchester PsA cohort ...... 86

1.4.3 Merging case-control datasets ...... 87

1.4.4 Imputation ...... 89

1.4.5 Association analysis ...... 89

1.4.5.1 Conditional analysis in the MHC ...... 94

1.4.5.2 Overlap with other traits in GWAS datasets ...... 94

1.4.5.3 Replication of LOP signals ...... 95

1.4.5.4 Putative novel LOP loci ...... 99

1.5 Discussion ...... 108

1.5.1 Validation of known psoriasis loci ...... 108

1.5.2 The 2q13 (IL1R1) locus ...... 109

1.5.3 Putative novel LOP loci ...... 110

1.5.4 Strengths and limitations ...... 113

1.5.5 Future work ...... 114

1.5.6 Conclusions ...... 115

2. Functional characterisation of psoriasis risk loci ...... 117

2.1 Introduction ...... 118

2.2 Aims and objectives of Section 2 ...... 118

2.3 Methods ...... 120

2.3.1 Methods for functional characterisation of individual risk loci ...... 120

2.3.1.1 ...... 120

2.3.1.2 Chromatin Immunoprecipitation ...... 126

2.3.1.3 Chromosome conformation capture ...... 139

2.3.1.4 Stimulation of HaCaT cells for ChIP and 3C in 9q31 ...... 160

2.3.2 Methods for functional characterisation of multiple risk loci ...... 165

2.3.2.1 HaCaT stimulation time-course and expression analysis ...... 165

2.3.2.2 Capture Hi-C study ...... 167

2.4 Results ...... 186

2.4.1 Results for functional characterisation of individual risk loci ...... 186

2.4.1.1 The 9q31 (KLF4) risk locus ...... 186

2.4.1.2 The 6q23 (TNFAIP3) risk locus ...... 203

2.4.2 Results for functional characterisation of multiple risk loci ...... 209

2.4.2.1 HaCaT stimulation time-course and expression analysis ...... 209

2.4.2.2 Capture Hi-C study ...... 213

2.4.3 Summary of functional work ...... 228

2.5 Discussion of functional work ...... 229

2.5.1 Discussion of functional characterisation of individual risk loci ...... 229

2.5.1.1 Bioinformatics, ChIP and 3C revealed that KLF4 is a likely target gene in the 9q31 risk locus ...... 230

2.5.1.2 Bioinformatics and 3C confirmed that a complex interaction landscape exists in the 6q23 (TNFAIP3) risk locus ...... 234

2.5.1.3 Strengths and limitations of methods used in the locus-specific study ...... 236

2.5.2 Discussion of functional characterisation of multiple risk loci ...... 239

2.5.2.1 Expression microarray ...... 240

2.5.2.2 CHi-C experiment ...... 240

2.5.2.3 Strengths and limitations of methods used in the analysis of multiple risk loci ...... 246

2.5.3 Future work ...... 248

2.5.3.1 The 9q31 and 6q23 loci ...... 248

2.5.3.2 The CHi-C study ...... 249

3. Discussion of Thesis ...... 253

3.1 The scope of genetic studies in psoriasis ...... 256

3.2 Conclusion ...... 258

4. References ...... 259

5. Appendix ...... 289

6. Publication arising from this thesis ...... 313

Word count: 63,563

List of Tables

Table 1: Non-MHC GWAS loci associated with psoriasis in cohorts of European and Chinese ancestry ...... 50

Table 2: Samples included in the LOP GWAS dataset ...... 75

Table 3: Overview of case and control cohorts ...... 88

Table 4: Results of the LOP association analysis at P < 1 × 10⁻⁵ ...... 92

Table 5: Overlap of lead variants with previously-reported GWAS traits ...... 95

Table 6: Comparison with LOP Immunochip findings ...... 96

Table 7: Novel LOP associations not prioritised for follow-up (info score < 0.7 or singletons) ...... 100

Table 8: Putative novel suggestive LOP signals (info > 0.7, non-singletons) ...... 103

Table 9: Datasets used in eQTL tools ...... 122

Table 10: Tools used in this project for bioinformatic characterisation of GWAS loci ...... 124

Table 11: Optimised Covaris sonication settings for individual cell types ...... 131

Table 12: Immunoprecipitation wash buffers ...... 133

Table 13: Fragments targeted in the first 9q31 (KLF4) 3C-qPCR assay ...... 155

Table 14: Fragments targeted in the second 9q31 (KLF4) 3C-qPCR assay ...... 156

Table 15: Fragments targeted in the third 9q31 (KLF4) 3C-qPCR assay ...... 157

Table 16: Fragments targeted in the first 6q23 3C-qPCR assay ...... 159

Table 17: Fragments targeted in the second 6q23 3C-qPCR assay ...... 160

Table 18: Summary of generated CHi-C libraries ...... 169

Table 19: Variants in LD with rs10979182 with functional scores of 4 or less in RegulomeDB ...... 189

Table 20: Variants in LD with rs582757 with functional annotation ...... 205

Table 21: Results of HiCUP processing for CHi-C libraries ...... 213

Table 22: Frequency of significant interactions occurring with psoriasis-associated bait fragments ...... 214

Table 23: Sources for standard protocols used for kits in this thesis ...... 290

Table 24: Quality control measures for datasets used in the LOP GWAS ...... 291

Table 25: Predictive scores from RegulomeDB, adapted from Boyle et al. (2012) ...... 292

Table 26: Primers used in ChIP experiments ...... 292

Table 27: Primers used to test 3C and Hi-C libraries for the presence of long-range and short-range interactions ...... 293

Table 28: BACs used for generating 3C control libraries ...... 293

Table 29: Primers used to test identity of BACs in the 9q31 (KLF4) locus ...... 294

Table 30: Primers and TaqMan probe used for 3C-qPCR assays in the 9q31 (KLF4) locus ...... 296

Table 31: Primers used for 3C-qPCR in the 6q23 (TNFAIP3) locus ...... 298

Table 32: Primers used to test gene expression in the stimulatory time-course ...... 299

Table 33: Adapters and PCR primers used for amplification of Hi-C libraries ...... 300

Table 34: SNPs included in the CHi-C study design ...... 301

Table 35: Adapters and PCR primers used for amplification of CHi-C libraries ...... 306

Table 36: List of 91 variants in tight LD with rs10979182 in the KLF4 locus, with associated functional scores ...... 307

List of Figures

Figure 1: Clinical manifestations of psoriasis, adapted from Boehncke and Schon (2015) ...... 27

Figure 2: Histology of psoriasis from Wagner et al. (2010) ...... 31

Figure 3: Schematic of psoriasis pathogenesis (Cai et al., 2012) ...... 32

Figure 4: Bimodal distribution of psoriasis age of onset from Henseler and Christophers (1985) ...... 35

Figure 5: Hypothetical model of psoriasis pathogenesis, adapted from Bergboer et al. (2012b) ...... 46

Figure 6: Workflow for the characterisation of GWAS SNPs, adapted from Ray-Jones et al. (2016) ...... 59

Figure 7: Overview of the chromatin immunoprecipitation (ChIP) technique ...... 63

Figure 8: Overview of the chromosome conformation capture (3C) technique ...... 65

Figure 9: Variations of the chromosome conformation capture technique ...... 66

Figure 10: Flow chart of the LOP GWAS process, indicating the contributing researchers at each stage ...... 77

Figure 11: Overview of the genotyping protocol (adapted from Illumina Infinium HTS Assay protocol guide) ...... 79

Figure 12: Overview of quality control of the Manchester PsA cohort using PLINK ...... 87

Figure 13: PCA of merged dataset ...... 88

Figure 14: Quality scores of variants ...... 89

Figure 15: q-q plot for all variants in the LOP association analysis ...... 90

Figure 16: Manhattan plot showing genome-wide results of the LOP association analysis ...... 91

Figure 17: Regional association plots (Locuszoom) for known psoriasis loci in the LOP GWAS ...... 98

Figure 18: Regional association plots (Locuszoom) for novel putative signals in the LOP GWAS that failed further QC checks (info < 0.7 or singletons) ...... 101

Figure 19: Regional association plots (Locuszoom) of putative novel suggestive LOP loci ...... 104

Figure 20: Representative qPCR plots ...... 134

Figure 21: Example efficiency and specificity measurements for optimal and non-optimal primer pairs ...... 136

Figure 22: Minimally-overlapping BACs in the 9q31 locus ...... 145

Figure 23: Primer design and potential ligation products in 3C, adapted from Naumova et al. (2012) ...... 148

Figure 24: The use of TaqMan® technology in 3C-qPCR, from Hagege et al. (2007) ...... 150

Figure 25: Example of a standard curve generated from dilutions of a BAC library ...... 152

Figure 26: Anchor and target fragments for 3C assays across the 9q31 (KLF4) locus ...... 154

Figure 27: Anchor and target fragments across the 6q23 (TNFAIP3) locus ...... 158

Figure 28: Processes underlying 3C, Hi-C and CHi-C library generation ...... 168

Figure 29: Restriction digest QC of Hi-C libraries ...... 171

Figure 30: Schematic of solution capture hybridisation using the SureSelect kit ...... 178

Figure 31: Illumina sequencing technology, adapted from Metzker (2010) ...... 182

Figure 32: Overview of psoriasis-associated SNPs in 9q31 (hg19) ...... 187

Figure 33: VEP predictions of variant consequences in the 9q31 variant set ...... 188

Figure 34: ChIP optimisation in HaCaT and My-La cells ...... 191

Figure 35: ChIP results in HaCaT, My-La and NHEK cells ...... 193

Figure 36: 3C-qPCR results in the 9q31 locus anchored at the HindIII fragment containing the second psoriasis-associated putative enhancer (rs6477612) ...... 195

Figure 37: 3C-qPCR results in the 9q31 locus from a HindIII fragment ~ 8.7 kb downstream of KLF4 (Centromeric 1) ...... 197

Figure 38: 3C-qPCR results in the 9q31 locus from the HindIII fragment containing the KLF4 gene and promoter ...... 199

Figure 39: Fold change in KLF4 expression upon stimulation of HaCaT cells with IFN-γ ...... 200

Figure 40: ChIP results for H3K4me1 and H3K27ac in stimulated HaCaT cells ...... 201

Figure 41: 3C-qPCR results in 9q31 in unstimulated and stimulated HaCaT cells ...... 202

Figure 42: Overview of psoriasis-associated SNPs in 6q23 (hg19) ...... 203

Figure 43: VEP predictions of variant consequences in the 6q23 variant set ...... 204

Figure 44: 3C-qPCR results in the 6q23 (TNFAIP3) locus between immune-related genes and the Ps/RA loci in HaCaT cells ...... 206

Figure 45: 3C-qPCR results in the 6q23 (TNFAIP3) locus from the psoriasis-associated fragment downstream of TNFAIP3 (Ps SNPs 2) ...... 207

Figure 46: Multidimensional scaling plot for all samples across the stimulatory time-course ...... 210

Figure 47: Top tables for differentially expressed genes in HaCaT cells stimulated with IL-17A ...... 211

Figure 48: Top tables for differentially expressed genes in HaCaT cells stimulated with IFN-γ ...... 212

Figure 49: Number of genes and non-coding RNAs implicated by promoter fragment interactions with psoriasis-associated bait fragments ...... 215

Figure 50: Frequency distributions of the number of interactions with promoter fragments per psoriasis-associated bait fragment ...... 216

Figure 51: CHi-C interactions from psoriasis-associated bait fragments in the 9q31 (KLF4) risk locus ...... 219

Figure 52: CHi-C interactions from all bait fragments in the 6q23 (TNFAIP3) risk locus ...... 221

Figure 53: CHi-C interactions from psoriasis-associated bait fragments in the 5p13.1 risk locus ...... 223

Figure 54: CHi-C interactions from psoriasis-associated bait fragments in the 6p22.3 risk locus ...... 225

Figure 55: CHi-C interactions from psoriasis-associated bait fragments in the 18q21 risk locus ...... 227

Figure 56: Histogram of ages of patients in the LOP cohort ...... 291

Figure 57: Representative gels of BAC clone identity confirmation ...... 295

Figure 58: Representative Bioanalyzer output for a high-quality total RNA sample (RIN = 10) ...... 299

Figure 59: Representative Bioanalyzer trace for a Hi-C library ...... 300

Figure 60: Optimisation of ChIP sonication for HaCaT, My-La and NHEK cells ...... 310

Figure 61: Representative QC gels for HaCaT and My-La 3C libraries ...... 311

Figure 62: Colour keys used for ChromHMM and Gencode V19 genes illustrated in CHi-C figures ...... 312

List of Key Abbreviations

1KG: 1000 Genomes Project, 43, 124, 125, 186, 203, 236

3C: Chromosome conformation capture, 59, 64, 65, 66, 67, 68, 71, 118, 119, 120, 139, 140, 142, 143, 144, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 163, 164, 167, 168, 169, 170, 171, 186, 194, 195, 196, 197, 198, 199, 200, 201, 202, 205, 206, 207, 218, 220, 229, 230, 232, 233, 234, 235, 236, 237, 238, 242, 243, 248, 293, 296, 298, 311

4C: Chromosome conformation capture-on-chip, 66

5C: Chromosome conformation capture carbon copy, 66

ACTL7A: Actin-like 7A, 186, 188

ACTL7B: Actin-like 7B, 145, 186, 187, 188, 218

ADAs: Antidrug antibodies, 40

AP-1: Activator protein 1, 62

BLAST: Basic local alignment search tool, 134, 162

bp: Base pair, 42, 67, 130, 131, 134, 147, 148, 156, 162, 173, 175, 177, 182, 183, 184, 187, 190, 192, 232, 236, 237, 243, 249, 293, 296, 298, 300, 310

BSTOP: Biomarkers of systemic treatment outcomes in psoriasis, 75, 82, 87, 291

CADD: Combined annotation dependent depletion, 123, 188, 189, 204, 307

CARD14: Caspase recruitment domain family, member 14, 44, 48

CARM1: Coactivator-associated arginine methyltransferase 1, 48

CCL20: Chemokine (C-C motif) ligand 20, 33

CD: Crohn's disease, 48, 50, 51, 52, 53, 54, 55, 68, 242, 244

CD4: Cluster of differentiation 4, 30, 31, 32, 38

CD8: Cluster of differentiation 8, 28, 30, 31, 128

CHi-C: Capture Hi-C, 67, 69, 158, 165, 167, 168, 169, 179, 180, 181, 183, 184, 213, 219, 221, 223, 225, 227, 232, 234, 290, 301, 306

CHiCAGO: Capture Hi-C analysis of genomic organisation, 183, 184, 213, 214, 243, 247

ChIP: Chromatin immunoprecipitation, 59, 62, 63, 65, 68, 71, 118, 119, 120, 123, 126, 127, 129, 130, 132, 133, 134, 135, 137, 138, 139, 148, 149, 160, 163, 186, 189, 190, 191, 192, 193, 200, 201, 229, 230, 231, 233, 236, 237, 238, 248, 292, 310

CPP: Chronic plaque psoriasis, 26, 27, 36

CRISPR: Clustered regularly interspaced short palindromic repeats, 59, 248, 249

Ct: Cycle threshold, 135, 137, 138, 162

CTNNAL1: Catenin alpha like 1, 145, 154, 155, 188, 194, 195, 232, 242, 297

dCas9: Dead Cas9, 248, 249

DCs: Dendritic cells, 29, 30, 33, 38

DGV: Downstream gene variant, 189, 205, 307, 308

DMEM: Dulbecco's modified Eagle medium, 126, 127, 129, 139, 160, 165

DNA: Deoxyribonucleic acid, 19, 42, 58, 59, 60, 61, 62, 63, 64, 65, 67, 74, 78, 118, 123, 126, 129, 131, 132, 133, 135, 139, 140, 141, 142, 143, 144, 145, 146, 147, 149, 161, 162, 163, 169, 170, 171, 172, 173, 174, 175, 176, 180, 181, 186, 203, 226, 230, 248, 290, 295, 300

DNAse 1: Deoxyribonuclease 1, 61

DZ: Dizygotic, 40, 41

EDTA: Ethylenediaminetetraacetic acid, 126, 127, 130, 133, 141, 171, 172, 173, 174

ENCODE: Encyclopedia of DNA elements, 62, 121, 124, 125, 186, 187, 203, 231, 250, 254

EOP: Early-onset psoriasis, 33, 34, 35, 56, 57, 74, 89, 90, 91, 96, 108, 109, 110, 113, 114, 115, 118, 255

eQTL: Expression quantitative trait locus, 59, 68, 111, 112, 121, 122, 124, 187, 203, 230, 248, 292

ERAP1: Endoplasmic reticulum aminopeptidase 1, 44, 46, 47

FAM206A: Family with sequence similarity 206 member A, 145, 154, 155, 188, 195, 297

FBS: Foetal bovine serum, 126, 160, 165

FDR: False discovery rate, 110, 214, 215

GPP: Generalised pustular psoriasis, 29

GTEx: Genotype-tissue expression project, 59, 112, 121, 122, 123, 124, 125, 153, 188, 204, 231, 234, 243, 245

GWAS: Genome wide association study, 19, 42, 43, 44, 45, 46, 47, 48, 50, 56, 57, 58, 59, 60, 61, 62, 63, 65, 68, 69, 70, 71, 74, 75, 77, 84, 94, 95, 96, 102, 108, 109, 110, 111, 112, 113, 118, 124, 125, 154, 155, 176, 211, 229, 230, 231, 236, 240, 241, 242, 243, 244, 254, 255, 256, 258, 291

HCE: HumanCoreExome, 78, 291

HLA: Human leukocyte antigen, 41, 45, 47, 56

HRQL: Health-related quality of life, 36

IBD: Identical by descent, 81, 82, 88; Inflammatory bowel disease, 50, 51, 52, 54, 55, 111, 244

IFIH1: Interferon induced with helicase C domain 1, 47, 51, 57, 58, 90, 92, 96, 108, 113, 115, 217, 240, 255, 302

IFNGR1: Interferon gamma receptor 1, 70, 158, 159, 204, 205, 206, 207, 220, 228, 234, 240, 242, 243, 298

IFN-γ: Interferon gamma, 29, 31, 33, 61, 160, 163, 165, 167, 169, 200, 201, 209, 211, 212, 233, 234, 240, 243

IKAP: IκB kinase complex-associated protein, 69

IL-10: Interleukin 10, 31

IL-12: Interleukin 12, 38, 39, 47, 256

IL12B: Interleukin 12B, 44, 47, 56, 57, 62, 240

IL-17: Interleukin 17, 32, 33, 38, 39, 40, 46, 231, 235, 240, 241, 243, 244, 256

IL1R1: Interleukin 1 receptor type 1, 19, 57, 74, 96, 109, 110, 115, 176, 255, 302

IL1α: Interleukin 1 alpha, 57

IL1β: Interleukin 1 beta, 34, 41, 57

IL-2: Interleukin 2, 31, 128, 214, 242

IL20RA: Interleukin 20 receptor subunit alpha, 68, 70, 158, 204, 220, 238, 239

IL-22: Interleukin 22, 33, 61, 234

IL22RA: Interleukin 22 receptor subunit alpha 2, 234

IL22RA2: Interleukin 22 receptor subunit alpha 2, 70, 158, 159, 204, 205, 206, 220, 234, 243, 298

IL-23: Interleukin 23, 32, 38, 39, 40, 44, 47, 57, 240, 241, 243, 251, 256

IL23A: Interleukin 23 subunit alpha, 44, 47, 56, 57, 96, 217, 240, 256

IL23R: Interleukin 23 receptor, 44, 47, 56, 57, 58, 92, 94, 95, 96, 108, 115, 231, 240, 255, 256, 301

IL28RA: Interleukin 28 receptor, alpha subunit, 44

IL-4: Interleukin 4, 31

IL-5: Interleukin 5, 31

IL-6: Interleukin 6, 33

IL-8: Interleukin 8, 33

IQR: Interquartile range, 215

IRF4: Interferon regulatory factor 4, 44, 48, 240

IV: Intergenic variant, 189, 205, 307, 308, 309

JIA: Juvenile idiopathic arthritis, 177, 227

KAT2B: Lysine acetyltransferase 2B, 92, 94, 95, 102, 103, 112

kb: Kilobase, 58, 65, 67, 68, 69, 92, 93, 94, 96, 109, 110, 111, 112, 113, 143, 146, 154, 158, 186, 194, 195, 196, 197, 199, 205, 206, 207, 218, 220, 222, 226, 232, 233, 235, 242, 244, 249, 311

KLF4: Krueppel-like factor 4, 19, 45, 46, 48, 49, 52, 62, 68, 69, 118, 119, 120, 145, 149, 153, 154, 155, 156, 157, 160, 162, 185, 186, 187, 188, 189, 194, 195, 196, 197, 198, 199, 200, 201, 217, 218, 219, 230, 231, 232, 233, 234, 236, 238, 240, 242, 243, 246, 248, 249, 255, 292, 293, 294, 296, 297, 299, 304, 307

LC: Langerhans cell, 28, 34, 57

LCE: Late cornified envelope, 33, 45, 46, 48, 241

LCL: Lymphoblastoid cell lines, 122

LD: Linkage disequilibrium, 42, 43, 58, 59, 84, 85, 96, 100, 103, 111, 120, 121, 124, 154, 157, 177, 186, 189, 198, 203, 204, 205, 219, 221, 223, 225, 227, 234, 235, 296, 297, 307

LOP: Late-onset psoriasis, 19, 33, 34, 35, 56, 57, 71, 74, 75, 80, 82, 86, 87, 88, 89, 90, 91, 92, 94, 95, 96, 97, 99, 100, 102, 103, 104, 108, 109, 110, 111, 112, 113, 114, 115, 118, 176, 254, 255, 258, 291

LRRC7: Leucine rich repeat containing 7, 68

MAF: Minor allele frequency, 42, 82, 83, 89, 92, 100, 103, 111, 114, 134, 148, 162, 291

Mb: Megabase, 69, 185, 186, 187

MHC: Major histocompatibility complex, 41, 45, 46, 50, 56, 57, 74, 84, 89, 94, 96, 108, 113, 176

MICA: MHC class 1 polypeptide-related sequence A, 45, 47

mRNA: Messenger RNA, 31, 32

MZ: Monozygotic, 40, 41

NAALADL2: N-acetylated alpha-linked acidic dipeptidase like 2, 92, 103, 112, 113

NCTEV: Non-coding transcript exon variant, 307, 308

NFKBIZ: Nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, zeta, 44, 240

NFκB: Nuclear factor kappa B, 44, 57, 69

NIH: National Institutes of Health, 121

NK: Natural killer, 29, 32, 38

NP: Neutrophil, 133, 140

OA: Osteoarthritis, 75, 76, 88, 109, 114

OR: Odds ratio, 50, 57, 69, 83, 92, 94, 95, 96, 99, 100, 103, 108, 109, 110, 111, 112, 113, 243, 244, 245, 301

PASI: Psoriasis area and severity index, 37, 38, 39, 40, 239

PBS: Phosphate buffered saline, 126, 128, 130, 131, 140

PCA: Principal component analysis, 81, 82, 83, 88

PCR: Polymerase chain reaction, 131, 133, 143

PG: Peptidoglycan, 36

PICS: Probabilistic identification of causal SNPs, 120, 124, 186, 189, 203, 205, 230, 236, 307

PML: Progressive multifocal leukoencephalopathy, 38

POLI: Polymerase (DNA directed) iota, 217, 226, 245

PP: Pustular psoriasis, 28, 29

PPP: Palmoplantar pustulosis, 29

PsA: Psoriatic arthritis, 28, 39, 56, 75, 78, 80, 86, 87, 94, 103, 114, 177, 219, 221, 223, 227, 239, 244, 250, 291

PSORS1: Psoriasis susceptibility 1, 41

PTGER4: Prostaglandin E receptor 4, 51, 222, 223, 243, 244, 247, 256, 302

PV: Psoriasis vulgaris, 26, 28, 29

qPCR: Quantitative PCR, 63, 65, 119, 126, 132, 133, 134, 135, 137, 139, 144, 147, 148, 149, 150, 152, 153, 155, 156, 157, 158, 159, 160, 161, 162, 163, 168, 180, 181, 192, 194, 195, 196, 197, 198, 199, 202, 205, 206, 207, 231, 232, 233, 234, 235, 236, 237, 238, 243, 248, 249, 290, 296, 298

RA: Rheumatoid arthritis, 69, 158, 159, 160, 177, 205, 206, 207, 221, 234, 235, 298

RAB27B: RAB27B, member RAS oncogene family, 217, 226, 227, 245

RNA: Ribonucleic acid, 60, 122

RNF114: Ring finger protein 114, 41, 55, 57, 96, 305

RPMI: Roswell Park Memorial Institute, 128

RRV: Regulatory region variant, 189, 205, 307, 308, 309

SDS: Sodium dodecyl sulphate, 130, 132, 133, 140

SLE: Systemic lupus erythematosus, 50, 52, 69, 70, 112, 235

SNP: Single nucleotide polymorphism, 43, 46, 47, 48, 50, 56, 57, 58, 59, 63, 69, 80, 81, 83, 84, 85, 91, 92, 94, 95, 96, 99, 100, 103, 108, 109, 111, 120, 121, 123, 124, 125, 154, 155, 158, 159, 176, 177, 186, 187, 188, 203, 204, 205, 206, 207, 218, 229, 230, 231, 232, 233, 234, 239, 241, 242, 243, 244, 245, 292, 301

SOX4: SRY-box 4, 217, 224, 225, 244, 245

SSC: Systemic sclerosis, 177

TAD: Topologically associated domain, 64, 232

TAGAP: T-cell activation RhoGTPase activating protein, 44

TE: Tris-EDTA, 130, 132, 133, 141, 142

TF: Transcription factor, 292

TFBSV: Transcription factor binding site variant, 189, 307, 308

Th1: T helper type 1 cell, 31, 32, 39

Th17: T helper type 17 cell, 32, 39, 44, 47, 49, 57, 240

Th2: T helper type 2 cell, 31

TLE: Tris low-EDTA, 172, 173, 176, 180

TNFAIP3: Tumour necrosis factor, alpha-induced protein 3, 44, 52, 68, 69, 70, 118, 119, 120, 158, 159, 160, 185, 186, 203, 204, 205, 206, 207, 220, 221, 234, 235, 242, 257, 293, 298, 303

TNFRSF9: Tumour necrosis factor receptor superfamily, member 9, 44, 48

TNFα: Tumour necrosis factor alpha, 31, 34, 36, 38

TRAF3IP2: TRAF3 interacting protein 2, 44, 47, 52, 56, 57, 58, 90, 92, 96, 108, 109, 113, 115, 240, 255, 303

UCSC: University of California, Santa Cruz, 121

UGV: Upstream gene variant, 189, 205, 307

UV: Ultraviolet, 37, 131, 142, 143, 146, 175

VEP: Variant effect predictor, 123, 187, 189, 203, 307

WGE: Whole genome expression, 122

WISP3: WNT inducible signaling pathway protein 3, 92, 109

WTCCC: Wellcome Trust Case Control Consortium, 43

Abstract

Background: Psoriasis is a common, complex autoimmune disorder of the skin. The genetic basis of psoriasis has been partially characterised by genome wide association studies (GWAS) that have identified a number of underlying genetic signals. GWAS have also begun to explain the genetics underlying the dichotomy in age-of-onset of psoriasis, where a distinct subset of patients develops psoriasis later in life (late-onset psoriasis; LOP). However, in most loci the disease causal variant or target gene remains unknown.

Aims: The broad aim of this study was to improve our understanding of the genetics of psoriasis. Firstly, the study aimed to define genetic variants underpinning LOP. Secondly, the study aimed to use functional genomic techniques to determine the mechanism by which known non-coding signals affect gene function in psoriasis-associated loci, focusing on confirmed psoriasis signals.

Methods: To further define the genetic signals underpinning LOP, a GWAS was carried out comparing a cohort of patients with LOP against psoriasis-free controls. Two approaches were then taken to determine the function of known risk variants for psoriasis. Firstly, DNA-DNA and DNA-protein interactions were examined in two loci of interest in a hypothesis-driven manner. Secondly, a more hypothesis-free approach used capture Hi-C (CHi-C) to determine likely gene targets across all known psoriasis risk loci.

Results: The GWAS for LOP confirmed association with known psoriasis signals, but did not validate a previously identified LOP signal at IL1R1. Twelve novel signals reached study-wide, but not genome-wide, significance. The locus-specific functional work provided evidence for gene targets in the two loci studied, notably KLF4 in the 9q31 locus. The CHi-C study identified interactions between psoriasis-associated loci and promoters of approximately 1000 genes and non-coding RNAs, providing compelling targets for further study.

Summary: This work has broadened our view of the genetic mechanisms underlying psoriasis. Whilst the LOP GWAS did not validate the IL1R1 locus, the novel suggestive signals prompt further analysis to confirm an association with age-of-onset. The functional work uncovered novel mechanisms in psoriasis risk loci; this evidence can be taken forward for further functional and clinical applications.

Declaration

I declare that no portion of the work referred to in the thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.

Copyright statement

I. The author of this thesis (including any appendices and/or schedules to this thesis) owns certain copyright or related rights in it (the “Copyright”) and she has given The University of Manchester certain rights to use such Copyright, including for administrative purposes.

II. Copies of this thesis, either in full or in extracts and whether in hard or electronic copy, may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations issued under it or, where appropriate, in accordance with licensing agreements which the University has from time to time. This page must form part of any such copies made.

III. The ownership of certain Copyright, patents, designs, trademarks and other intellectual property (the “Intellectual Property”) and any reproductions of copyright works in the thesis, for example graphs and tables (“Reproductions”), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property and/or Reproductions.

IV. Further information on the conditions under which disclosure, publication and commercialisation of this thesis, the Copyright and any Intellectual Property and/or Reproductions described in it may take place is available in the University IP Policy (see http://documents.manchester.ac.uk/DocuInfo.aspx?DocID=24420), in any relevant Thesis restriction declarations deposited in the University Library, The University Library’s regulations (see http://www.library.manchester.ac.uk/about/regulations/) and in The University’s policy on Presentation of Theses.

Dedication

This thesis is dedicated to my parents, Mike and Melissa Ray-Jones.

Acknowledgments

I would like to thank my three supervisors for their steadfast guidance throughout this PhD. Firstly, thanks go to Professor Stephen Eyre, whose spirit and enthusiasm, combined with his knowledge of genetics, helped drive the project forward. Professor Richard Warren provided expert knowledge from the front line of the clinic, and has been the perfect advocate for a PhD student. From the start of my time in Manchester, Dr Kate Duffus has been an indispensable help with every aspect of the project, and has been an excellent mentor and friend.

As a member of the Functional Genomics group within the Arthritis Research UK Centre for Genetics and Genomics, I am indebted to all the researchers within the department who have helped me. Thanks to Dr John Bowes and Dr Paul Martin for their generous help with various analyses, and to Professor Anne Barton for her support at the beginning of the PhD process. I would particularly like to thank Dr Amanda McGovern for passing on her considerable skills and experience in the lab, and for sharing the odd pint with me outside of the lab. I am honoured to have been part of the Funtastic Four.

I could not have completed this PhD without the support of my fellow students in 2.706, among whom I have made friends for life. Thank you to my friends Effie, Chris, James Ding, Katie, James Oliver and Jo. Thanks also to my friends outside of the University who have put up with me over the last three years, particularly Cait, Julia, Mike, and the Monday Club.

Finally, thanks go to my whole family – near and far – for their love and encouragement. I am so grateful for my Mum, Dad and my brother Simon, who have never stopped believing in me.

The author

My background is in biological sciences, having completed a 4-year BSc in biology at the University of Bath (2010-2014). During my third year of university I undertook a research placement at the University of Arizona in Phoenix, where I studied the effects of traumatic brain injury and developed a wish to work in medical research. During the final year of my BSc I became increasingly fascinated with human genetics and decided to pursue a PhD in this field.

I felt compelled to carry out this particular PhD in order to use cutting-edge molecular biology techniques to further our knowledge of the genetics of autoimmune disease. Over the course of the PhD I have been able to hugely develop my laboratory skills, alongside learning computational techniques for analysing large genetic datasets. Having seen the incredible advancements that have been made over the past three years in our understanding of genome regulation and function, I am hopeful that research in this area will improve the lives of people with psoriasis.

0. INTRODUCTION


Psoriasis is a common yet debilitating immune-mediated inflammatory condition. It usually appears in the form of reddened, raised lesions or “plaques” on the skin, which may be covered with silver or white-coloured scales. The name “psoriasis” refers to the common occurrence of pruritus associated with these plaques; it is derived from the Greek word psora, which means “itch”. Throughout history, psoriasis was believed to be a form of leprosy (Perera et al., 2012), causing much social stigma. There are very few accurate accounts of psoriasis pre-19th century (De Bersaques, 2012). In 1808 it was first described and illustrated by the English physician Robert Willan in his book On Cutaneous Diseases (Willan, 1808). The recognition of the disease as a separate condition from leprosy, however, is credited to the dermatologist Ferdinand von Hebra in 1841 (Schon and Boehncke, 2005).

0.1 Disease prevalence

Psoriasis affects around 1-3% of the European population, with prevalence varying depending on the geographical area or ethnic group studied (Parisi et al., 2013). Within the United Kingdom, the prevalence of psoriasis appears to be rising, from 2.3% in 1999 to 2.8% in 2013 (Springate et al., 2017). In contrast, psoriasis has a lower prevalence elsewhere; in Asia, for instance, it occurs in less than 0.5% of the population (Chang et al., 2009; Ding et al., 2012; Yip, 1984). Ethnicity can affect disease risk; this is demonstrated by studies that show differential prevalence of psoriasis within multicultural populations such as the United States (Gelfand et al., 2005; Rachakonda et al., 2014). In 2014, a study in the US showed that Caucasians were more likely to develop psoriasis than those of other racial groups, with an average prevalence rate of 3.6% compared with 1.9% in African Americans and 1.6% in Hispanics (Rachakonda et al., 2014). There does not appear to be a difference in psoriasis prevalence between males and females (Braathen et al., 1989; Ferrandiz et al., 2001; Nevitt and Hutchinson, 1996; Springate et al., 2017).

0.2 Phenotypes of psoriasis

Psoriasis occurs in patients in a number of distinct clinical subtypes that are collectively known as psoriasis vulgaris (PV). The first of these subtypes, accounting for 90% of all cases, is chronic plaque psoriasis (CPP) (Griffiths and Barker, 2007). Its clinical features include itchy or painful thick, red patches on the skin which are clearly defined from the non-involved skin surrounding them. These hyperproliferating patches may be covered by silver-white scales caused by a build-up of keratinocytes (Figure 1A). Psoriatic plaques undergo asymmetric enlargement by actively growing at one edge and remaining static at the other, with plaque growth preceded by increased blood flow through the skin due to vascular alterations (Goodfield et al., 1994; Hull et al., 1989). Plaques in CPP appear commonly on the outer surfaces of the knees and elbows, but often appear in other areas, especially the scalp (Figure 1D) and the lower back (Figure 1B). Variations of CPP that are relevant to a specific region of the body include inverse (intertriginous) psoriasis, which occurs in skin folds (Figure 1C), and sebopsoriasis, which occurs on areas such as the scalp, nasolabial folds and sternum (Griffiths and Barker, 2007). In rare cases the dangerous condition erythroderma can develop, in which 90% of the body becomes inflamed, leading to problems with thermoregulation (Mrowietz and van de Kerkhof, 2011).

Figure 1: Clinical manifestations of psoriasis, adapted from Boehncke and Schon (2015). Psoriasis occurs in a number of phenotypes. The most common form is plaque psoriasis (A), in which silvery plaques develop, sometimes over large areas of the body (B). Inverse psoriasis occurs in bodily folds such as the armpits (C). Scalp involvement is common (D), as are changes to the nails such as pitting and discolouration (E). Guttate psoriasis typically manifests as small “droplet” lesions (F). Generalized pustular psoriasis manifests in visible, sterile pustules (G); this is sometimes localised to the hands (palmoplantar pustulosis; H). Image used with permission.


Nail psoriasis is common, occurring in 50-56% of psoriasis cases (Baran, 2010). In rare cases, nail psoriasis can also occur independently of cutaneous disease. The most common type of nail involvement is pitting, which manifests as small circular depressions in the upper nail plate (Figure 1E) (Jiaravuthisan et al., 2007). Other types include the detachment of the nail from the nail bed (onycholysis), discolouration due to psoriatic lesions in the nail bed (oil drop lesions), raising of the nail bed (subungual hyperkeratosis), transverse ridges (Beau’s lines) and longitudinal ridges with splitting (onychorrhexis) (Jiaravuthisan et al., 2007).

Guttate psoriasis is an acute condition appearing in children and young adults (Figure 1F). The name derives from the Latin “gutta” meaning “droplet”, which refers to the small pink papules that often appear in abundance on the trunk (Griffiths and Barker, 2007; Stern, 1997). It is often thought to be triggered by a prior pharyngeal infection (Stern, 1997). Guttate psoriasis has three outcomes: a third of patients have a single acute episode, a third spontaneously recover after a few weeks and a third worsen into chronic plaque psoriasis (Martin et al., 1996). Dendritic cells in the skin, known as Langerhans cells (LCs), have moderately impaired mobility in guttate psoriasis (Eaton et al., 2014). Patients who have recovered from a guttate psoriasis flare have normal LC function, suggesting that guttate psoriasis might be an intermediate, temporary phenotype (Eaton et al., 2014).

Psoriatic arthritis (PsA) co-occurs in approximately 14% of psoriasis patients in the UK (Ibrahim et al., 2009). PsA is a heterogeneous seronegative arthritis that most often manifests in painful inflammation in the joints and tendons, appearing some years after the development of cutaneous psoriasis. On a cellular level, PsA is thought to be driven by memory-effector CD8+ T cells, which accumulate in the synovial fluid (FitzGerald et al., 2015). The overlapping pathology between psoriasis and PsA is not well understood, but their genetic profiles are gradually being defined by association studies (Bowes et al., 2015a; Ellinghaus et al., 2012b; Okada et al., 2014a).

The heterogeneous nature of psoriasis requires careful patient phenotyping in conjunction with investigating the genetic background of disease. It is becoming more apparent that subtypes of psoriasis may in fact represent related, but distinct, autoimmune conditions. For instance, pustular psoriasis (PP) represents a group of psoriasis phenotypes that differ from PV and are characterised by the presence of visible, sterile pustules on the skin (Navarini et al., 2016) (Figure 1G,H). Clinical phenotypes of PP include generalized pustular psoriasis (GPP) affecting non-peripheral body parts, acrodermatitis continua of Hallopeau (ACH) affecting the nails and palmoplantar pustulosis (PPP) affecting the palms and soles (Navarini et al., 2016). GPP can be associated with painful inflammation and redness, and is often accompanied by fever (Griffiths and Barker, 2007; Langley et al., 2005). Genetic studies of GPP have revealed that a small number of cases are associated with rare protein-coding mutations in IL36RN and CARD14 genes (Hayashi et al., 2014; Jordan et al., 2012b; Korber et al., 2013; Qin et al., 2014). Some of the IL36RN mutations are also associated with severity of disease and age of onset in GPP (Hussain et al., 2015). This distinct genetic background further distinguishes PP from PV and prompts further characterisation of the aetiology of PP phenotypes.

0.3 Cellular basis

Under the current consensus, psoriasis is understood to be a genetically predefined autoimmune disorder that may be triggered by environmental factors. The importance of the immune system was first highlighted by the success of therapies with immunosuppressive effects such as methotrexate and cyclosporin A (Baker and Fry, 1992). Cyclosporin A was shown to be particularly effective in improving or clearing psoriatic plaques (Ellis et al., 1986; Griffiths et al., 1986; Van Joost et al., 1986). The drug was found to reduce numbers of T cells in the epidermis and the dermis, implicating these cells as central players in the immune basis of psoriasis (Baker et al., 1987; Lebwohl, 2003). Early work also showed that immune cytokines were important in psoriasis; for example, Wrone-Smith and Nickoloff (1996) showed that non-involved skin from psoriasis patients could cause plaque formation in immunodeficient mice with the addition of cytokines such as interferon gamma (IFN-γ). Individuals at risk of developing psoriasis often have genetic polymorphisms affecting genes that are involved in the adaptive and innate immune systems and/or skin barrier regulation (Bergboer et al., 2012b). A combination of environmental and genetic factors leads to chronic inflammation and growth of psoriatic plaques from hyper-proliferating skin cells (Bergboer et al., 2012b; Perera et al., 2012).

Within psoriatic lesions, macrophages, neutrophils, natural killer (NK) cells, dendritic cells (DCs) and activated T cells are abundant (Bos et al., 2005). During lesion formation, keratinocytes, traditionally considered to be relatively inert cells, interact with and activate T cells and DCs (Lowes et al., 2007). Keratinocytes account for 95% of all cells in the epidermis (Barker et al., 1991). They are essential for maintenance of the skin barrier and are now recognised as being actively involved in responses of the epidermal immune system to antigens. Normally, keratinocytes undergo controlled mitosis in the basal layer and then migrate to the suprabasal layers. There they begin to differentiate, finally residing as anuclear corneocytes in the outermost layer, the stratum corneum. This process, known as terminal differentiation, usually takes 28-30 days (Schon and Boehncke, 2005). However, in psoriatic skin, the process takes only 3-5 days due to increased keratinocyte turnover in the basal layer. This, coupled with abnormal differentiation and migration, leads to the build-up of immature keratinocytes in the cornified layer (Schon and Boehncke, 2005). The causal mechanism underlying abnormal keratinocyte differentiation has yet to be defined, but probably involves complex interactions between the innate and the adaptive immune system.

This process of abnormal keratinocyte production and differentiation can be clearly seen in histopathological samples of involved psoriatic skin (Figure 2). Retention of cell nuclei (parakeratosis) in the stratum corneum, caused by premature keratinocyte maturation, is characteristic of psoriatic plaques. In the epidermis, the high turnover of keratinocytes and infiltration of CD8+ T cells causes thickening of the layer (acanthosis), whilst the granular layer is depleted (hypogranulosis). Additionally, rete ridges, which are extensions of the epidermis pointing downwards between dermal papillae, become elongated (papillomatosis). Within the dermal papillae themselves, CD4+ T helper cells and DCs congregate (Figure 2) (Wagner et al., 2010).


Figure 2: Histology of psoriasis from Wagner et al. (2010). Haematoxylin-stained nuclei (purple) can be seen in the context of eosin-stained cytoplasm (pink); cross sections of normal skin (A) versus psoriatic skin (B) are shown. The psoriatic skin displays characteristic parakeratosis in the stratum corneum; acanthosis in the epidermis; epidermal rete ridge growth and hyperkeratosis. The arrow points to infiltrating inflammatory cells in the dermis. Image used with permission.

Psoriasis is driven by a dysregulated immune response, within which T cells play a key role (illustrated in Figure 3). T cells, which can be helper (CD4+) or killer (CD8+), release cytokines that can drive keratinocyte proliferation (Schlaak et al., 1994). In the 1990s, it was discovered that CD4+ cells are more numerous than CD8+ cells in sets of cloned T cells taken from the epidermis of psoriatic patients (Schlaak et al., 1994). CD4+ T helper cells are differentiated into Th1 cells and Th2 cells, based on which cytokines they produce (Uyemura et al., 1993). These early studies found that psoriasis lesions contain messenger RNA (mRNA) for Th1-associated cytokines, in particular interleukin 2 (IL-2) and interferon gamma (IFN-γ), and relatively low levels of mRNA for Th2-associated cytokines (IL-4, IL-5 and IL-10) (Schlaak et al., 1994; Uyemura et al., 1993). This led to the conclusion that psoriasis is driven in part by Th1 cells. This hypothesis was extended to include a central role for tumour necrosis factor alpha (TNFα) in psoriasis pathogenesis, due to the success of anti-TNFα therapies such as adalimumab and etanercept (Tracey et al., 2008). This suggests that the innate immune system is activated in psoriasis.


Figure 3: Schematic of psoriasis pathogenesis (Cai et al., 2012). DC, dendritic cell; KC, keratinocyte; MØ, macrophage; NK, natural killer cell; NKT, natural killer T cell; Tc, cytotoxic T cell; Th, T helper cell; Treg, regulatory T cell. Figure used with permission.

Psoriasis is now known to be driven by a CD4+ Th17 subset as well as Th1 cells (Lowes et al., 2008). Th17 cells produce the cytokine interleukin 17 (IL-17) in response to the cytokine IL-23 (Aggarwal et al., 2003). The subunits of IL-23 (p40 and p19) are upregulated in psoriasis lesions (Lee et al., 2004). It is proposed that an immune-mediated disease cycle occurs with IL-17 at its core, which is supported by evidence from Teunissen et al. (1998) who showed that biopsies of psoriatic plaques had IL-17 mRNA whereas non-involved skin did not. CD8+ cytotoxic T cells, which are abundant within skin lesions, are also producers of IL-17. In psoriasis, CD8+ T cells are thought to be autoreactive, meaning that they might respond to self-antigens such as keratin (Deng et al., 2016). CD8+ T cells release several inflammatory cytokines in the skin including IL-17, IFN-γ, IL-22 and IL-13; of these, IFN-γ is thought to promote apoptosis of keratinocytes in the epidermis (Hijnen et al., 2013). Meanwhile, IL-17 causes abnormal differentiation of keratinocytes, and leads them to produce pro-inflammatory cytokines such as IL-6 and IL-8 (Teunissen et al., 1998). Subsequent cytokines secreted by keratinocytes such as chemokine (C-C motif) ligand 20 (CCL20) recruit further T cells and DCs, creating a positive feedback loop leading to the development of chronic psoriatic plaques (Martin et al., 2013).

Psoriasis is also likely to be driven by deficiencies in the skin barrier that affect infiltration of antigens. This might include structural or antimicrobial abnormalities that lead to a systemic immune response. For example, a higher genomic copy number of a β-defensin cluster at 8p23.1 was found to be associated with psoriasis in a Dutch cohort (Hollox et al., 2008). β-defensins are small peptides with antimicrobial activity that are predicted to form part of the chemical barrier of the skin and attract inflammatory mediators (Bergboer et al., 2012b). In addition, genetic studies have identified a disease-associated deletion of Late Cornified Envelope (LCE) genes that were initially thought to have roles in skin barrier repair, but have recently been shown to have antibacterial properties (discussed further in Section 0.8.3.1) (Liu et al., 2008; Niehues et al., 2017).

0.4 Age of onset

Psoriasis can occur at any stage of life. However, epidemiological evidence suggests that CPP onset occurs at two peak ages. In 1985 Henseler and Christophers analysed patient records at the University of Kiel to discover a bimodal frequency distribution of psoriasis age of onset, with peaks at 16-22 years of age and 57-60 years of age (Figure 4). In the UK, a recent review of public health records reported a distinct bimodal pattern of psoriasis incidence against age in both men and women, supporting the dichotomy of early-onset psoriasis (EOP) and late-onset psoriasis (LOP) (Springate et al., 2017). Under the current consensus, EOP develops in patients who are younger than 40, whereas LOP develops in patients who are older than 40. Approximately 75% of patients have EOP and the remaining 25% have LOP (Hebert et al., 2014b). As might be expected, the peaks of the age of onset occur in two normal distributions with an area of overlap between them (Henseler and Christophers, 1985). This is reflected in the genetics of the two types, in which there is also a large degree of overlap (Hebert et al., 2014a). Meanwhile the heritability of LOP is unknown but thought to be lower than that of EOP, for which patients are more likely to have a positive family history (Heredi et al., 2013).
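The idea of two overlapping normal distributions of age of onset can be illustrated with a simple two-component mixture. The sketch below is illustrative only: the component weights follow the approximate 75%/25% split between EOP and LOP given above, and the means are placed inside the reported peak ranges, but the exact means and both standard deviations are hypothetical values chosen for the example and are not the fitted parameters of Henseler and Christophers (1985).

```python
import math

# Hypothetical parameters for illustration only. Weights follow the
# approximate 75%/25% EOP/LOP split; the means sit inside the reported peak
# ranges (16-22 and 57-60 years); the standard deviations are assumptions.
EARLY = {"weight": 0.75, "mean": 19.0, "sd": 8.0}
LATE = {"weight": 0.25, "mean": 58.0, "sd": 10.0}


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_density(age):
    """Density of the two-component mixture at a given age of onset."""
    return sum(c["weight"] * normal_pdf(age, c["mean"], c["sd"])
               for c in (EARLY, LATE))


def prob_late_onset(age):
    """Probability that an onset at this age belongs to the late component."""
    late = LATE["weight"] * normal_pdf(age, LATE["mean"], LATE["sd"])
    return late / mixture_density(age)


if __name__ == "__main__":
    for age in (19, 40, 58):
        print(age, round(prob_late_onset(age), 3))
```

Under these assumed parameters, ages near either peak are assigned almost entirely to one component, whereas ages near the conventional cut-off of 40 fall in the region of overlap and are assigned with much less certainty.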

The cellular basis of LOP appears to differ from that of EOP in a way that involves interleukin 1 beta (IL1β). IL1β is a pro-inflammatory cytokine that, alongside TNFα, activates LCs. In healthy individuals, IL1β and TNFα cause LCs to migrate towards draining lymph nodes, where they interact with T cells leading to inflammation. In 2010, Shaw et al. injected IL1β and TNFα into the skin of small groups of patients with EOP and LOP in order to measure the differential movement of epidermal LCs. The authors noted an impaired migration of LCs in response to both TNFα and IL1β in patients with EOP. In patients with LOP, however, there was impaired migration in response to TNFα but not IL1β. Since both of these cytokines are required for LC signalling, the authors speculated that there might be a production deficit of functioning IL1β in LOP patients, which prevented the LCs from responding correctly to TNFα alone (Shaw et al., 2010). A recent immunocytochemical study showed that there is a difference in the ratio of CD4+ T cells to CD8+ T cells in EOP patients and LOP patients (Theodorakopoulou et al., 2016). In this study, biopsies of involved skin extracted from EOP and LOP patients revealed a CD4+:CD8+ ratio of 0.5 in EOP compared with a ratio of 1.3 in LOP, attributed to an influx of CD4+ cells in LOP.


Figure 4: Bimodal distribution of psoriasis age of onset from Henseler and Christophers (1985). The likelihood of developing psoriasis at each age is plotted for male patients (A) and female patients (B). Line 1a = cumulative incidence rates; line 1b = non-cumulative incidence rates; line 2a = early-onset group; line 3a = late-onset group; line 2b indicates normal distribution of 2a; line 3b indicates normal distribution of 3a; µ1 = mean age of early-onset patients, µ2 = mean age of late-onset patients. Used with permission.

In addition to these cellular changes, there are visible phenotypic differences between EOP and LOP patients. Clinical studies have suggested that, in general, LOP is milder and more stable than EOP. In 2009, Guinot et al. utilised a statistical clustering approach to delineate psoriasis subtypes based on clinical features including age of onset, plaque size and regions of skin affected. Those patients that fell into clusters associated with a later age of onset were found to have a mild form of disease with fewer plaques and reduced sensitivity to environmental factors in comparison with the overall sample (Guinot et al., 2009). However, there is evidence for LOP patients having higher levels of anxiety and associated co-morbidities such as type 2 diabetes (Theodorakopoulou et al., 2016) and obesity (Heredi et al., 2013). Together, these studies strengthen the view that LOP is a distinct subtype of psoriasis (Kirby, 2016). Currently, however, there is no difference in the treatment regimen offered to patients based on their age of onset.

0.5 Environmental risk factors

There are several environmental triggers implicated in psoriasis initiation and exacerbation. These include physical trauma, pharyngeal infection, medications, smoking and alcohol (Perera et al., 2012). CPP is often triggered by a stressful event to the individual; life crises were reported to have been precursors in 46% of CPP cases in the Swedish population (Mallbris et al., 2005). Additionally, trauma to the skin can result in localised lesions, known as the Koebner phenomenon (Raychaudhuri et al., 2008). Medications that can trigger psoriasis include lithium, beta-blockers and antimalarial treatments (Abel et al., 1986; Milavec-Puretic et al., 2011). Paradoxically, anti-TNFα therapies have initiated or exacerbated psoriasis in a small number of cases (Wendling et al., 2008). Both smoking and alcohol have a negative impact on psoriasis, with smoking linked to disease incidence and alcohol implicated in worsening of the disease (Higgins, 2000). Obesity is also recognised as a risk factor for psoriasis, but the basis of the relationship between weight and disease severity is currently unclear (Duarte et al., 2013).

Guttate psoriasis is often triggered by a Streptococcal throat infection; in a Swedish study of young patients with guttate psoriasis, Streptococcus was identified as an infectious agent in 63% of cases (Mallbris et al., 2005). Streptococcal superantigens are thought to cause activation and expansion of T cells in psoriatic skin (Leung et al., 1995). There is some evidence to suggest that the superantigen involved is M-protein; a virulence factor situated in the bacterial cell membrane (Gudjonsson et al., 2003). One theory is that streptococcal M-protein has a similar structure to human keratin, which leads to cross reactivity by activated T cells following an infection (Valdimarsson et al., 2009). Alternatively, streptococcal peptidoglycan (PG) could be activating or modifying the immune response (Valdimarsson et al., 2009).

0.6 Quality of life

In 2002, the direct cost of psoriasis in the U.S. was estimated at 649.6 million dollars (Javitz et al., 2002). The greatest cost, however, is to patients’ psychological wellbeing. Although rarely life threatening, psoriasis has a serious negative impact on patients’ Health-Related Quality of Life (HRQL), causing disability which is comparable to other major diseases such as cancer and diabetes (Rapp et al., 1999). It can lead to increased anxiety, depression and general social embarrassment as well as anger and loss of identity. As well as reducing the quality of life of those directly affected individuals, the disease has an indirect impact on those closest to them. Research has shown that those who care for or live with patients are generally more anxious and depressed, with 87.8% of cohabitants displaying a reduced quality of life score in comparison with controls (Martinez-Garcia et al., 2014).

0.7 Therapeutic treatments

There is currently no known cure for psoriasis but there are a number of therapies available, with treatment efficacy varying from patient to patient. These therapies fall into four broad categories: topical treatments, ultraviolet (UV) light therapy, traditional systemic drugs and biologics (Hebert et al., 2012). In clinical trials, the efficacy of therapies is commonly measured as the percentage of patients who achieved a given reduction in the Psoriasis Area and Severity Index (PASI) score, a tool to measure psoriasis severity. In past studies the primary end point was a 75% reduction in PASI score (PASI 75); more recently this has shifted to a target of 90% reduction (PASI 90).
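The PASI 75 and PASI 90 end points described above can be expressed as a short calculation. The following is a minimal sketch, assuming each patient has a baseline and a follow-up PASI score; the function names and the example scores are hypothetical and are not taken from any of the trials cited in this section.

```python
def pasi_percent_reduction(baseline, follow_up):
    """Percentage reduction in PASI score relative to baseline."""
    if baseline <= 0:
        raise ValueError("baseline PASI must be positive")
    return 100.0 * (baseline - follow_up) / baseline


def responder_rate(scores, threshold):
    """Percentage of patients whose PASI reduction meets the threshold.

    scores: list of (baseline, follow_up) pairs, one per patient.
    threshold: required percentage reduction, e.g. 75 for PASI 75.
    """
    responders = sum(
        1 for baseline, follow_up in scores
        if pasi_percent_reduction(baseline, follow_up) >= threshold
    )
    return 100.0 * responders / len(scores)


if __name__ == "__main__":
    # Hypothetical (baseline, follow-up) PASI scores for four patients.
    example = [(20.0, 4.0), (16.0, 1.0), (12.0, 9.0), (30.0, 3.0)]
    print("PASI 75 response rate:", responder_rate(example, 75))
    print("PASI 90 response rate:", responder_rate(example, 90))
```

In this invented example three of the four patients reach PASI 75 but only two reach PASI 90, which illustrates why response rates reported against the two end points are not directly comparable.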

0.7.1 Topical therapies

Topical therapies are widely used for patients with mild to moderate disease. These include corticosteroids, vitamin D analogues, keratolytics, coal tar, retinoids and dithranol. Corticosteroids are currently the most widely used topical therapy as they are generally effective and well tolerated. They work by reducing inflammation, dampening cell proliferation and inhibiting immune cell function. In a systematic review, the most potent forms of corticosteroids were found to be more efficacious than other forms of topical therapy including vitamin D analogues and coal tar (Mason et al., 2002). Today, combinations of steroids and vitamin D analogues are often used. However, steroids are not a long-term solution due to the potential of adverse side effects such as skin atrophy, and vitamin D analogues can cause skin irritation.

0.7.2 UV therapy

Typically, UV treatment is used in patients with moderate to severe disease, which may cover a large area of skin. Phototherapy is thought to work by acting on the immune system by multiple mechanisms such as altering cytokine profiles and causing apoptosis (Wong et al., 2013). In the past, UVA therapy combined with the systemic drug psoralen (PUVA) was used; however, over time this can lead to an increased risk of developing melanoma (Stern, 2001). Today narrowband UVB, which uses the most effective UVB wavelengths, is favoured (Greb et al., 2016).


0.7.3 Systemic treatments

The most common systemic treatments include methotrexate, cyclosporin and acitretin. Methotrexate is the most widely used systemic drug and is thought to have multiple actions, primarily targeting metabolism by blocking the pyrimidine and purine pathways (Hebert et al., 2012). Other systemic treatments include the immune modulators Apremilast and Fumaderm. Apremilast is thought to regulate the innate immune system by blocking phosphodiesterase type 4 (PDE4), whereas the exact mechanism of Fumaderm is as yet unclear.

Although systemic drugs have proven to be very effective in a number of cases, there are some concerns. Long-term suppression of the immune system increases the risk of infection. Additionally there can be further side effects; for example, cyclosporin can adversely affect kidney function over time (Markham et al., 2002). Some patients with severe disease do not respond well to systemic drugs; in these cases biologics are considered.

0.7.4 Biologics

Biologics are drugs that are produced by biological processes within organisms. Biologics used in the treatment of psoriasis target the immune system, either by inhibiting T cell function in general or by blocking the action of proinflammatory cytokines. In the past, biologics targeting T cell trafficking have included efalizumab (Raptiva) and alefacept (Amevive). However, prescription of these drugs was halted in 2009 and 2011, respectively. In the case of efalizumab, this was due to its usage being linked with several cases of progressive multifocal leukoencephalopathy (PML), an often fatal demyelinating disease (Carson et al., 2009). Other biologics have proved effective in targeting the cytokines at various stages in the psoriatic inflammatory cascade: TNFα, IL-12/IL-23 and IL-17.

0.7.4.1 Anti-TNFα therapies

TNFα is a proinflammatory cytokine released by several active cell types in psoriasis including macrophages, DCs, CD4+ T cells and NK cells. TNFα has a wide range of effects, including immune cell maturation and migration (Fantuzzi et al., 2008). Biologics targeting TNFα include etanercept, infliximab and adalimumab (Lowes et al., 2007). Etanercept is a soluble TNF receptor fusion protein. In a phase III clinical trial for etanercept, PASI 75 was reached by 47% of psoriasis patients, and improvements in fatigue and depression were associated with treatment (Tyring et al., 2006). Infliximab is a monoclonal antibody that has a good primary response rate: 80% of patients receiving infliximab in a phase III trial achieved PASI 75 after 10 weeks (Reich et al., 2005). However, infliximab does not produce a good secondary response; in the same study only 61% of patients achieved PASI 75 after 50 weeks (Reich et al., 2005). Adalimumab, another monoclonal antibody, is the current gold standard for anti-TNF therapy for psoriasis in the UK. In a phase III clinical trial, 71% of patients treated with adalimumab achieved PASI 75 after 16 weeks (Menter et al., 2008).

0.7.4.2 Anti-IL-12/IL-23 therapies

The cytokines IL-12 and IL-23 are thought to be central to psoriasis pathology; they are involved in activation of T cell subsets Th1 and Th17, respectively (Lima and Kimball, 2010). The biologic ustekinumab was designed to inhibit IL-12, but also targets IL-23 via a shared protein subunit, p40. In phase III trials, psoriasis patients treated with varying doses of ustekinumab achieved PASI 75 in 66.4-75.7% of cases (Leonardi et al., 2008; Papp et al., 2008). Most of the action of ustekinumab probably occurs via targeting IL-23, since this cytokine causes downstream IL-17 production (Quatresooz et al., 2012). Ustekinumab was found to have a higher efficacy than high doses of etanercept in a direct comparison study over a 12-week period (Griffiths et al., 2010).

Currently, biologics are being developed against the p19 subunit of IL-23 in order to target this cytokine more specifically. These include guselkumab, risankizumab, tildrakizumab and mirikizumab; of these, guselkumab is now licensed for therapeutic use. Guselkumab has shown very promising results in clinical trials: in a phase III trial, 73.3% of patients on guselkumab achieved PASI 90 at week 16, in comparison with 49.7% of patients on adalimumab (Nakamura et al., 2017).

0.7.4.3 Anti-IL-17 therapies

Biologics targeting IL-17A (secukinumab and ixekizumab) or the IL-17 receptor (brodalumab) are also effective in treating psoriasis and PsA (reviewed by Mease (2015)). Secukinumab is a human monoclonal antibody for IL-17A that was approved for use in the treatment of psoriasis in 2015. In a phase III trial, PASI 75 was achieved in up to 87.6% of patients treated with secukinumab (Paul et al., 2015). Data from the secukinumab vs ustekinumab phase III CLEAR trial showed that 79% of secukinumab patients achieved PASI 90 after 16 weeks, compared with 57.6% of ustekinumab patients (Thaci et al., 2015). Ixekizumab is another human monoclonal antibody targeting IL-17A that has also proven to be very effective in phase III trials, with PASI 75 achieved in up to 89.7% of patients, and approximately 40% of patients having clear skin by 12 weeks (Griffiths et al., 2015). Brodalumab, on the other hand, is an IL-17 receptor antagonist that was originally co-developed by Amgen and AstraZeneca. By blocking the IL-17 receptor, brodalumab effectively inhibits IL-17A, IL-17F and IL-17E. Patients treated with brodalumab in an open-label extension study had a promising response, with 85% of patients achieving PASI 90 at week 12 (Papp et al., 2014).

0.7.4.4 Limitations of biologics

Biologic therapies can gradually lose efficacy in patients over time; the reasons for this are currently unknown. This could in part be due to the production of neutralising antidrug antibodies (ADAs) (Hsu and Armstrong, 2013), which has been demonstrated in clinical studies. For instance, patients who developed ADAs in response to adalimumab or infliximab had a reduced clinical response in comparison with those without ADAs (Bito et al., 2014). However, ADAs that develop against the biologic etanercept seem to have no effect on clinical outcome (Hsu et al., 2014a). Therefore, ADAs do not provide a complete explanation for the gradual decrease in the efficacy of biologics.

Despite this, the overall success of biologics has given an insight into which molecular pathways are highly important in psoriasis. The evidence for the involvement of key cytokines has been reflected in genetic studies that have identified disease-associated variants near genes coding for proteins in inflammatory pathways, such as IL-23. This will be explored in the next section. It is important to learn more about the function of these genetic risk variants in disease, both for identifying novel therapeutic target genes and finding markers that predict how a patient will respond to particular therapies (Hebert et al., 2012).

0.8 Genetic risk factors

Genetic predisposition is the largest known risk factor for psoriasis. Many genes implicated in psoriasis are involved in the innate immune system, adaptive immune system or skin barrier (Perera et al., 2012). Twin studies have traditionally been used to study the heritability of traits. Twin studies utilise the fact that monozygotic (MZ) twins share their whole gene repertoire and dizygotic (DZ) twins share, on average, half of their genes, and that twins are likely to share environmental influences in early life. This allows for the separation of genetic and environmental factors. The largest twin study in psoriasis was conducted on 3246 MZ and 7479 DZ twin pairs in Denmark (Lonnberg et al., 2013). In this study, MZ twins whose co-twin had psoriasis had an 8-fold risk of developing the condition, while the risk was 4-fold in DZ twins. The authors reported a heritability of 60-75%, in agreement with previous findings (Grjibovski et al., 2007).

0.8.1 Linkage studies

Linkage studies attempt to determine the genetic markers for a disease by observing how these markers co-segregate in large family pedigrees affected by the disease. This works on the basis that genetic markers in close proximity to a disease causal variant are unlikely to be separated by recombination; therefore they are more likely to be inherited together in all affected family members. In psoriasis, many of these studies were carried out in the 1990s (Puig et al., 2014). The most prominent psoriasis susceptibility locus is PSORS1 at 6p21.3 within the Major Histocompatibility Complex (MHC) class I (Nair et al., 1997; Trembath et al., 1997). The candidate genes in this locus include HLA-C*06, which is now known to be an important risk factor in psoriasis (Chandran, 2013).

Linkage studies revealed up to 13 PSORS loci in total (Asumalahti et al., 2003; Capon et al., 1999; Friberg et al., 2006; Lee et al., 2000; Matthews et al., 1996; Nair et al., 1997; Puig et al., 2014; Samuelsson et al., 1999; Veal et al., 2001; Zhang et al., 2002). Some of these loci have been supported by more recent genetic studies; for example, PSORS12 has been refined to the variant rs495337 near the RNF114 gene (Capon et al., 2008). However, linkage studies are generally underpowered to detect all but the genetic loci with very large effects (Visscher et al., 2012).

0.8.2 Candidate gene studies

Candidate gene studies are performed on a particular gene of interest where there is a priori knowledge that the gene has an important function in disease. These studies can be performed by analysing a small number of variants within the gene and making a comparison of allele frequencies between a case group and a control group. An example in psoriasis is the candidate gene study for IL1B conducted in a late-onset psoriasis cohort by Hebert et al. (2014b). In this study IL1B was selected as a candidate gene due to the previously reported dysregulated response of Langerhans cells to inflammatory cytokines that implicated a role for IL1β in late-onset disease (Shaw et al., 2010). Hebert et al. identified a significant variant (rs16944) approximately 500 bp upstream of IL1B and a second significant variant (rs11687624) in an intergenic region between IL1A and IL1B, thereby providing further evidence for the importance of IL1B in late-onset psoriasis (Hebert et al., 2014b). Candidate gene studies are advantageous over linkage analysis because they have more power to identify disease-associated variants. However, the main disadvantage is that the study is limited to genes with known effects in disease, and therefore does not allow for discovery of novel gene targets.
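The comparison of allele frequencies between cases and controls can be sketched as a test on a 2x2 table of allele counts. The example below computes an allelic odds ratio and a Pearson chi-square P-value (one degree of freedom). The counts are hypothetical and the function name is illustrative; this is one common form of the test and not necessarily the analysis used in the studies cited above.

```python
import math


def allelic_association(case_minor, case_major, ctrl_minor, ctrl_major):
    """Return (odds_ratio, chi2, p_value) for a 2x2 table of allele counts."""
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square for a 2x2 table (1 degree of freedom, no continuity correction).
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value


# Hypothetical allele counts (two alleles per person); not data from any cited study.
odds_ratio, chi2, p = allelic_association(460, 540, 800, 1200)
print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, P = {p:.4f}")
```

An odds ratio above 1 indicates that the counted allele is more frequent in cases than in controls.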

0.8.3 Genome wide association studies

Genome-wide association studies (GWAS) are used to analyse single nucleotide polymorphisms (SNPs) across the entire genome, accounting for common genetic variation between individuals. The technique has been made possible in recent years through the development of sophisticated methods for rapid genotyping of DNA. Association studies attempt to identify SNPs that are markers for a particular disease by searching for alleles that occur more frequently in a disease cohort than in a group of matched, healthy controls. Practical advantages of the GWAS technique over conventional linkage studies are that subjects do not need to be related, and cohorts can be shared between epidemiological and genetic projects. A GWAS is most likely to be successful when the disease under investigation has a strong genetic background and the genotyped individuals have a well characterised clinical phenotype (Amos, 2007).

Linkage disequilibrium (LD) is the phenomenon whereby alleles are non-randomly associated with each other, and therefore occur together in the population more frequently than is expected by chance. In GWAS, LD between alleles can be used to identify the full extent of SNPs that are associated with disease, and therefore the set of SNPs that contains the potential causal variant(s). The associated variant in a GWAS implicates a range of SNPs that are all highly correlated (in high LD) and that may all have the same likelihood of being causal. LD between two variants is usually measured in one of two ways: r-squared (r2) or D prime (D’). The difference between these two measures is that r2 is sensitive to differences in allele frequency between the two variants, whereas D’ is scaled to the maximum value attainable given those frequencies. Therefore if two variants are in LD but one variant has a much higher minor allele frequency (MAF) than the other, the r2 value may be low but D’ will be high. In general only common SNPs, those with a MAF of greater than 1%, are analysed in GWAS, since rarer alleles would require very large sample sizes to have the power to confidently identify associated SNPs. To reduce the likelihood of false positive errors, the significance threshold for a marker is adjusted for multiple testing. The threshold for genome-wide significance is generally defined as P < 5 x 10^-8, which corresponds to a Bonferroni correction of a 5% significance level for approximately one million independent tests.
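The two LD measures and the Bonferroni threshold can be illustrated with a short calculation. The sketch below uses the standard definitions of D, D’ and r2 for two biallelic variants. The allele and haplotype frequencies are hypothetical, chosen so that a rare allele occurs only on haplotypes carrying a common allele, and the function assumes both variants are polymorphic.

```python
def ld_measures(p_a: float, p_b: float, p_ab: float):
    """Return (D, D_prime, r_squared) for two biallelic variants.

    p_a, p_b : frequencies of allele A at variant 1 and allele B at variant 2
    p_ab     : frequency of the haplotype carrying both A and B
    """
    d = p_ab - p_a * p_b
    # D' scales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Hypothetical frequencies: allele B (10%) is only found on haplotypes carrying allele A (50%).
d, d_prime, r2 = ld_measures(p_a=0.5, p_b=0.1, p_ab=0.1)
print(f"D' = {d_prime:.2f}, r2 = {r2:.2f}")

# Bonferroni threshold for roughly one million independent tests at a 5% significance level.
alpha, n_tests = 0.05, 1_000_000
threshold = alpha / n_tests
print(f"Genome-wide significance threshold: {threshold:.0e}")
```

With these inputs D’ is at its maximum while r2 stays low, which is the situation described above for two variants with very different allele frequencies.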

The International HapMap Project produced a catalogue of common genetic variants found in human populations (Altshuler et al., 2005). HapMap gives information on common haplotypes, each of which is a set of SNPs in LD that are likely to be inherited together. More recently, the 1000 Genomes (1KG) project sequenced more than 2,500 genomes to characterise more than 88 million variants in humans (Altshuler et al., 2012; Auton et al., 2015). The data from both HapMap and 1KG have been made publicly available, allowing researchers to impute unknown SNPs into association studies through their LD with a genotyped SNP. This means that a GWAS does not actually need to genotype every SNP; instead a microarray can be used which includes several hundred thousand tag SNPs to identify haplotypes, while further SNPs can be imputed.

The first major GWAS in complex diseases was carried out by the Wellcome Trust Case Control Consortium (WTCCC), which performed GWAS using seven disease cohorts of approximately 2000 cases with a shared control cohort of around 3000 individuals (The Wellcome Trust Case Control Consortium, 2007). The diseases included immune-mediated disorders such as Crohn’s disease, rheumatoid arthritis and type 1 diabetes. In 2008 the consortium expanded to form the Wellcome Trust Case Control Consortium 2 (WTCCC2), encompassing thirteen further diseases that included psoriasis. The psoriasis GWAS in WTCCC2 was the largest of its kind, comprising more than 2000 cases and 5000 controls (Strange et al., 2010). The study demonstrated that larger sample sizes lead to an increase in significant findings, since it replicated all previously known loci and discovered eight new loci.

The first SNP microarrays, such as the GeneChip Mapping 10K, only covered approximately 10,000 SNPs. In comparison more recent arrays, such as the Genome-Wide Human SNP Array 6.0, cover approximately one million SNPs and are designed to allow for genome-wide SNP imputation. The WTCCC study utilised an Affymetrix chip covering more than 500,000 SNPs that facilitated genome-wide imputation from HapMap. The most recent Illumina arrays designed for whole genome coverage, such as the HumanOmni2.5, can cover several million SNPs across the genome. Arrays can also be targeted towards specific populations, or towards specific regions in the genome. For example, the HumanCoreExome array incorporates markers for variants within exons allowing for gene-focused GWAS as well as genome-wide imputation. Exon arrays have been used in recent GWAS in Chinese populations to identify coding risk variants (Tang et al., 2014; Zuo et al., 2015). Importantly for immune-mediated conditions, the Illumina Immunochip microarray was designed to target immune-related loci across the genome (Cortes and Brown, 2011). The Immunochip consists of probes specific for 195,806 tag SNPs and 718 indels (insertion-deletion mutations), with the purpose of replicating meta-GWAS data and conducting fine mapping of GWAS susceptibility loci. It has been used in both early-onset (Tsoi et al., 2015; Tsoi et al., 2012) and late-onset (Hebert et al., 2014a) psoriasis, and has been able to identify loci shared between several autoimmune conditions.

0.8.3.1 GWAS and fine mapping in psoriasis

To date there have been more than fifteen GWAS in psoriasis in European and Chinese populations that have uncovered a wealth of information about the genetic background of disease. The first few GWAS identified important risk loci such as HLA-C, IL12B, IL23R, LCE, TNIP1 and TNFAIP3 (Cargill et al., 2007; Nair et al., 2009; Strange et al., 2010). Subsequently, large-scale GWAS and meta-analyses that combine multiple GWAS datasets have been very successful in replicating these findings and identifying further risk loci (Tsoi et al., 2015; Tsoi et al., 2012; Tsoi et al., 2017). Meta-analyses have also allowed for comparison between different conditions such as between psoriasis and Crohn’s disease (Ellinghaus et al., 2012a) and between cutaneous disease and psoriatic arthritis (Stuart et al., 2015). The largest meta-analysis to date, consisting of 11,988 cases and 275,334 controls, brought the number of genome-wide significant loci (P < 5 x 10^-8) to 63 in Europeans (Tsoi et al., 2017). In Han Chinese individuals, GWAS have indicated genetic overlap as well as unique associations (blue rows, Table 1).
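The gain in power from combining GWAS datasets can be illustrated with a fixed-effect, inverse-variance weighted meta-analysis of a single SNP. This is a minimal sketch of one widely used pooling approach. It is not claimed to be the method of the meta-analyses cited above, and the per-study odds ratios and standard errors are hypothetical.

```python
import math


def fixed_effect_meta(log_ors, std_errs):
    """Inverse-variance weighted pooled log odds ratio, its SE and two-sided P-value."""
    weights = [1.0 / se ** 2 for se in std_errs]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P-value
    return pooled, pooled_se, p_value


# Hypothetical per-study odds ratios and standard errors of log(OR) for one SNP.
studies = [(1.20, 0.06), (1.15, 0.05), (1.25, 0.08)]
log_ors = [math.log(or_) for or_, _ in studies]
ses = [se for _, se in studies]

pooled, pooled_se, p = fixed_effect_meta(log_ors, ses)
print(f"Pooled OR = {math.exp(pooled):.3f}, SE = {pooled_se:.3f}, P = {p:.2e}")
```

The pooled standard error is smaller than that of any single study, which is why a modest effect can reach genome-wide significance only after datasets are combined.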

Psoriasis susceptibility loci have highlighted dysfunction of innate and adaptive immune signalling in disease (Tsoi et al., 2012). Many of the gene candidates are involved in IFN or NFκB signalling (e.g. IL-28RA, Rel, TNIP1, TNFAIP3, NFKBIA, CARD14 and TYK2) and T cell regulation (e.g. TNFRSF9, RUNX3, IL13, IL4, TAGAP, ETS1 and MBD2). Several further gene candidates are specifically involved in IL-23/Th17 signalling, which is thought to be integral to psoriasis pathogenesis (e.g. IL-23R, IL12B, IRF4, TRAF3IP2, NFKBIZ, IL-23A, SOCS1 and STAT3). Gene candidates are also involved in antigen presentation (ERAP1 and HLA-C) and regulation of antigen presenting cells; for example, NOS2 is thought to be important in dendritic cell function and ZC3H12C is involved in activation of macrophages. Additionally skin barrier regulatory genes, namely the LCE gene cluster and KLF4, are potential candidates for susceptibility (Figure 5). The various roles of these gene candidates will be examined more closely.

Antigen presentation

GWAS have uniformly shown that presentation of antigens by MHC class I molecules to the adaptive immune system is a key factor in psoriasis pathology. Within the MHC class I region, an allele of the Human Leukocyte Antigen (HLA) gene HLA-C, HLA-C*06:02, has consistently shown highly significant association with psoriasis (Ellinghaus et al., 2010; Nair et al., 2009; Strange et al., 2010; Stuart et al., 2010; Tsoi et al., 2012).

The MHC locus is complex, as it contains many linked genes (HLA-A, HLA-B, class II HLAs and MICA). This necessitates detailed investigation of the region to identify risk variants, which was achieved recently through fine mapping and imputation of SNPs and amino acid polymorphisms into MICA and HLA genes (Okada et al., 2014a). This study confirmed the strongest association at HLA-C*06:02. In order to identify any further associations the authors used conditional analysis, a technique that identifies independent signals by conditioning on the signal from the original variant. Conditioning on HLA-C*06:02 revealed another significant risk allele: HLA-C*12:03. Conditioning on HLA-C itself revealed further associations at HLA-B, with subsequent rounds of conditioning revealing associations at HLA-A and HLA-DQA1, but not at MICA. Therefore, it is likely that a combination of variants in HLA genes drives the high association with psoriasis at the MHC locus.
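The idea behind conditioning can be illustrated with a simplified stand-in. Conditional analysis in GWAS is normally performed by including the lead variant as a covariate in a regression model; the sketch below instead uses a stratified (Mantel-Haenszel) odds ratio, which captures the same logic of asking whether a second variant is still associated once carriage of the first is held fixed. All counts are hypothetical and do not come from Okada et al. (2014a).

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for one 2x2 table: a, b = cases with/without the allele; c, d = controls."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across a list of (a, b, c, d) tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical counts for a secondary variant, split by carriage of the primary risk allele.
# Each tuple: (cases with, cases without, controls with, controls without) the secondary allele.
primary_carriers = (80, 20, 40, 10)
primary_non_carriers = (10, 90, 20, 180)

# Crude analysis ignores the primary allele by summing the two strata.
crude = odds_ratio(*(x + y for x, y in zip(primary_carriers, primary_non_carriers)))
# Stratified analysis holds primary-allele carriage fixed.
adjusted = mantel_haenszel_or([primary_carriers, primary_non_carriers])
print(f"Crude OR = {crude:.2f}, OR after stratifying on the primary allele = {adjusted:.2f}")
```

In this constructed example the secondary variant appears associated only because it is correlated with the primary allele, so its signal disappears after stratification. An independent signal, such as that reported for HLA-C*12:03, would instead remain.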


Figure 5: Hypothetical model of psoriasis pathogenesis, adapted from Bergboer et al. (2012b). This model emphasises the involvement of both dysregulated barrier function and altered immune signalling in disease pathogenesis. GWAS signals at LCE and KLF4 support the contribution of the skin barrier in susceptibility, whilst multiple GWAS signals implicate the innate and adaptive immune system. By this model, reduced barrier repair allows for infiltration of antigens such as microbial components causing activation of keratinocytes, which in turn signal to the adaptive immune system. T cells are involved in release of multiple cytokines, such as TNF-α and IL-17, which exacerbate the immune response and further activate keratinocytes. This leads to a cycle of inflammation in the skin. Proteins that are coded by genes that are directly implicated in GWAS findings are labelled in red.

In 2010, the WTCCC2 GWAS uncovered a susceptibility locus at endoplasmic reticulum aminopeptidase 1 (ERAP1) (Strange et al., 2010). ERAP1 encodes an enzyme that is responsible for N-terminal trimming of peptides to allow them to bind to the MHC-1 molecule in its final stages of production, and is implicated in other autoimmune conditions including ankylosing spondylitis (Alvarez-Navarro and de Castro, 2014). This risk locus therefore provides further evidence that antigen presentation is important in psoriasis. Additionally, the authors reported an epistatic interaction between MHC-1 and ERAP1, since those with the ERAP1 SNP were only at risk of psoriasis if they also carried the HLA-C risk allele (Strange et al., 2010). However, in a more recent genotyping study by Lysell et al. (2013) that stratified for age, association with ERAP1 was independent of HLA-C*06:02, which contradicts the epistasis reported by Strange et al. (2010). Moreover, in this study significant association of psoriasis with SNPs at ERAP1 was only seen in the group whose onset was between 10 and 20 years of age (Lysell et al., 2013). It is possible that this was due to the moderate sample size of the study (954 patients and 1748 healthy controls). However, until more work takes place on the relationship between age and this epistatic effect, the discrepancy in findings remains unresolved.

Immune signalling

Many GWAS hits have been found near genes involved in pro-inflammatory cytokine networks and T cell signalling. In 2007, one of the first GWAS identified risk loci for psoriasis with targets in the IL-23/Th17 axis (Cargill et al., 2007). By analysing pooled genotypic data from three cohorts of North American Caucasians of European ancestry, the authors located risk loci in the IL12B and IL-23R genes. The initial large-scale probing of 25,215 SNPs in the test cohort revealed the association with IL12B, which was followed up by re-sequencing of the IL12B gene in a small group of psoriasis patients to identify unknown putative causal SNPs. IL12B codes for the p40 subunit required for production of IL-12 and IL-23 proteins (Puig et al., 2014). Single marker analysis and haplotype generation revealed the further risk locus IL-23R (Cargill et al., 2007). The major (G) allele of rs11209026 near IL-23R has been shown to correlate with a more severe psoriasis phenotype, and the major (A) allele of rs2066808 near IL-23A is associated with co-existence of psoriatic arthritis (Eiris et al., 2014). The associations at IL12B and IL-23R were confirmed in case-control and family analyses (Nair et al., 2008) and several further GWAS both in European cohorts (Ellinghaus et al., 2010; Strange et al., 2010; Tsoi et al., 2012) and Chinese cohorts (Zhang et al., 2009). Further risk loci, IL-23A and TRAF3IP2, which were identified in GWAS in 2009 and 2010 respectively (Nair et al., 2009; Strange et al., 2010), are also involved in the IL-23/Th17 axis (Elder, 2013).

In 2012, a GWAS meta-analysis identified 15 new risk loci for psoriasis (see Table 1) and replicated 19 previously known loci (Tsoi et al., 2012). The novel loci found in this study highlighted the role of genetic variations in innate immunity genes in psoriasis. Conditional analysis revealed five independent secondary signals at loci near IFIH1, ERAP2, IL12B, MICA and TYK2. A large number of the susceptibility loci identified in the meta-analysis overlapped with other diseases, notably Crohn’s disease (CD) and coeliac disease, and often encompassed genes involved in T-cell function (e.g. TNFRSF9, RUNX3 and IRF4). Indeed, recent publications by Parkes et al. (2013) and Farh et al. (2014) show the extent of overlap with a range of diseases, indicating that psoriasis is most related to seronegative (i.e. lacking autoantibodies) autoimmune diseases including CD and ankylosing spondylitis. Of the newly identified loci in Tsoi et al. that were unique to psoriasis, five encompassed genes involved in innate immunity (KLF4, CARD14, DDX58, ZC3H12C and CARM1).

Epidermal barrier function

Although many GWAS hits implicate a dysfunctional immune system in psoriasis, some have hinted at a role for epidermal barrier abnormalities (Ye et al., 2014). It is hypothesised that an impaired skin barrier function, or improper response to antigen entry in the skin, could lead to infiltration of antigens that trigger an abnormally regulated immune response in susceptible individuals (Bergboer et al., 2012b). As mentioned above, one of the most prominent skin-related GWAS loci is found within the LCE gene cluster on chromosome 1. A risk SNP for psoriasis in the LCE locus was first identified in Europeans in 2008 (Liu et al., 2008), and subsequently in the Chinese population (Zhang et al., 2009). Following this, a copy number variant analysis found that a deletion of LCE3C and LCE3B (LCE3C_LCE3B-del) was significantly associated with psoriasis in Europeans (de Cid et al., 2009); this finding has since been replicated by other groups (Li et al., 2011; Tsoi et al., 2012).

LCE3 genes are up-regulated in psoriasis plaques and more highly expressed following skin injury, therefore they were initially thought to be involved in repairing the skin barrier (Bergboer et al., 2011). However, there was found to be no association between LCE3C_LCE3B-del and the presence of the Koebner phenomenon, suggesting that the loss of LCE3C and LCE3B does not increase the likelihood of developing psoriatic plaques following skin trauma (Bergboer et al., 2012a). More recent evidence has arisen to suggest that the LCE3 proteins in fact have antibacterial properties (Niehues et al., 2017). This would suggest that, rather than having a direct impact on skin barrier, the LCE genes are involved in epidermal host defence. Individuals with the LCE3C_LCE3B-del appear to have increased expression of another LCE gene, LCE3A, which has distinct antibacterial activity and could possibly lead to a dysregulated response to invading bacteria in psoriasis (Niehues et al., 2017).

Another GWAS locus implicating skin barrier function is at 9q31, near the candidate gene KLF4. Whilst KLF4 has potential roles in innate immunity, including activation of macrophages and upregulation of IL-17A expression during Th17 cell differentiation (An et al., 2011), it is also thought to be required for epithelial differentiation, since mice lacking the protein have leaky skin barriers (Segre et al., 1999). It is expressed by keratinocytes in the skin and KLF4 protein is more abundant in psoriatic lesions than in non-involved skin (Kim et al., 2014). Therefore, the association at KLF4 supports the theory that genetic abnormalities in skin barrier function contribute to psoriasis risk.


Table 1: Non-MHC GWAS loci associated with psoriasis in cohorts of European and Chinese ancestry. Adapted from a supplementary table in Ray-Jones et al. (2016).
a CHN, Chinese; EUR, European.
b Representative GWAS index SNP in each locus at genome-wide significance (P ≤ 5 x 10^-8), excluding any secondary signals in the locus.
* Significant associations with other traits were found by checking Immunobase and searching for traits in LD (r2 > 0.8) with the lead SNP in Phenoscanner.
Abbreviations: AS, ankylosing spondylitis; CD, Crohn’s disease; CeD, celiac disease; CKD, chronic kidney disease; GV, generalised vitiligo; HepB, hepatitis B; HD, Hodgkin’s disease; IBD, inflammatory bowel disease; IgA, immunoglobulin A; IgG, immunoglobulin G; JIA, juvenile idiopathic arthritis; MG, myasthenia gravis; MS, multiple sclerosis; PBC, primary biliary cirrhosis; RA, rheumatoid arthritis; SLE, systemic lupus erythematosus; T1D, type 1 diabetes; UC, ulcerative colitis.

| Locus | Notable gene(s) | Population^a | Index SNP^b | Annotation | P-value | Risk allele | OR | Samples (cases/controls) | Ref | Other associated traits* |
|---|---|---|---|---|---|---|---|---|---|---|
| 1p36.3 | MTHFR | CHN | rs2274976 | Missense: MTHFR | 2.33 x 10^-10 | G | 1.21 | 11,245/11,177 | Zuo et al. (2015) | Plasma homocysteine |
| 1p36.23 | SLC45A1, TNFRSF9 | EUR | rs11121129 | Intergenic | 1.7 x 10^-8 | A | 1.13 | 10,588/22,806 | Tsoi et al. (2012) | |
| 1p36 | IL-28RA | EUR | rs7552167 | 4.2kb 5' of IL-28RA | 8.5 x 10^-12 | G | 1.21 | 10,588/22,806 | Tsoi et al. (2012) | |
| 1p36 | IL-28RA | CHN | rs4649203 | 5.5kb 5' of IL-28RA | 9.74 x 10^-11 | A | 1.19 | 8,339/12,725 | Cheng et al. (2014) | |
| 1p36.11 | RUNX3 | EUR | rs7536201 | 1.5kb 5' of RUNX3 | 2.3 x 10^-12 | C | 1.13 | 10,588/22,806 | Tsoi et al. (2012) | AS |
| 1p36.11 | ZNF683 | CHN | rs10794532 | Missense: ZNF683 | 4.18 x 10^-8 | A | 1.11 | 11,245/11,177 | Zuo et al. (2015) | |
| 1p31.3 | IL-23R | EUR | rs9988642 | 441bp 3' of IL-23R | 1.1 x 10^-26 | T | 1.52 | 10,588/22,806 | Tsoi et al. (2012) | AS, CD, IBD, UC |
| 1p31.3 | IL-23R | CHN | chr1:67,421,184 (build hg18) | Nonsynonymous: IL-23R | 1.94 x 10^-11 | G | 1.28 | 10,727/10,582 | Tang et al. (2014) | |
| 1p31.3 | C1orf141 | CHN | rs72933970 | Missense: C1orf141 | 1.23 x 10^-8 | G | 1.16 | 11,245/11,177 | Zuo et al. (2015) | |
| 1p31.1 | FUBP1 | EUR | rs34517439 | Intronic: DNAJB4 | 4.43 x 10^-9 | A | 1.18 | 11,988/275,334 | Tsoi et al. (2017) | |
| 1q21.3 | LCE3B, LCE3D | EUR | rs6677595 | 3.6kb 3' of LCE3B | 2.1 x 10^-33 | T | 1.26 | 10,588/22,806 | Tsoi et al. (2012) | |
| 1q21.3 | LCE3B, LCE3D | CHN | rs10888501 | 175bp 3' of LCE3E | 6.48 x 10^-13 | A | 1.14 | 11,245/11,177 | Zuo et al. (2015) | |
| 1q22 | AIM2 | CHN | rs2276405 | Stop-gained: AIM2 | 3.22 x 10^-9 | G | 1.17 | 11,245/11,177 | Zuo et al. (2015) | |
| 1q24.3 | FASLG | EUR | rs12118303 | Intergenic | 3.02 x 10^-10 | C | 1.12 | 11,988/275,334 | Tsoi et al. (2017) | |
| 1q31.1 | LRRC7 | EUR | rs10789285 | Intergenic | 1.43 x 10^-8 | G | 1.12 | 15,295/27,578 | Tsoi et al. (2015) | |
| 1q31.3 | DENND1B | EUR | rs2477077 | Intronic: DENND1B | 3.05 x 10^-8 (meta) | T | N/A | 1,962/8,923 | Bowes et al. (2015a) | IBD |
| 1q32.1 | IKBKE | EUR | rs41298997 | Intronic: IKBKE | 2.37 x 10^-8 | T | 1.13 | 11,988/275,334 | Tsoi et al. (2017) | |
| 2p16.1 | FLJ16341, REL | EUR | rs62149416 | Intronic: FLJ16341 | 1.8 x 10^-17 | T | 1.17 | 10,588/22,806 | Tsoi et al. (2012) | RA |
| 2p15 | B3GNT2 | EUR | rs10865331 | Intergenic | 4.7 x 10^-10 | A | 1.12 | 10,588/22,806 | Tsoi et al. (2012) | AS, CD |
| 2q12.1 | IL1RL1 | CHN | rs1420101 | Intronic: IL1RL1 | 1.71 x 10^-10 | G | 1.12 | 11,245/11,177 | Zuo et al. (2015) | Eosinophils, Asthma |
| 2q24.2 | KCNH7, IFIH1 | EUR | rs17716942 | Intronic: KCNH7 | 3.3 x 10^-18 | T | 1.27 | 10,588/22,806 | Tsoi et al. (2012) | |
| 2q24.2 | KCNH7, IFIH1 | CHN | rs13431841 | Intronic: IFIH1 | 2.96 x 10^-9 | G | 1.17 | 15,207/17,103 | Sheng et al. (2014) | Cholesterol |
| 3p24.3 | PLCL2 | EUR | rs4685408 | Intronic: PLCL2 | 8.58 x 10^-9 | G | 1.12 | 15,295/27,578 | Tsoi et al. (2015) | RA |
| 3q11.2 | TP63 | EUR | rs28512356 | 400bp 3' of TP63 | 4.31 x 10^-8 | C | 1.17 | 3,496/5,186 | Yin et al. (2015) | |
| 3q12.3 | NFKBIZ | EUR | rs7637230 | Intronic: RP11-221J22.1 | 2.07 x 10^-9 | A | 1.14 | 15,295/27,578 | Tsoi et al. (2015) | |
| 3q13 | CASR | CHN | rs1042636 | Missense: CASR | 1.88 x 10^-10 | A | 1.09 | 11,245/11,177 | Zuo et al. (2015) | Serum calcium |
| 3q26.2-q27 | GPR160 | CHN | rs6444895 | Intronic: GPR160 | 1.44 x 10^-12 | G | 1.11 | 11,245/11,177 | Zuo et al. (2015) | |
| 4q24 | NFKB1 | CHN | rs1020760 | Intronic: NFKB1 | 2.19 x 10^-8 | G | 1.12 | 15,207/17,103 | Sheng et al. (2014) | |
| 5p13.1 | PTGER4, CARD6 | EUR | rs114934997 | Intergenic | 1.27 x 10^-8 | C | 1.17 | 15,295/27,578 | Tsoi et al. (2015) | |
| 5q14 | ZFYVE16 | CHN | rs249038 | Missense: ZFYVE16 | 2.14 x 10^-8 | G | 1.16 | 11,245/11,177 | Zuo et al. (2015) | |
| 5q15 | ERAP1, LNPEP | EUR | rs27432 | Intronic: ERAP1 | 1.9 x 10^-20 | A | 1.20 | 10,588/22,806 | Tsoi et al. (2012) | |
| 5q15 | ERAP1, LNPEP | CHN | rs27043 | Intronic: ERAP1 | 6.50 x 10^-12 | G | 1.13 | 15,207/17,103 | Sheng et al. (2014) | |
| 5q31 | IL13, IL4 | EUR | rs1295685 | 3'-UTR: IL13 | 3.4 x 10^-10 | G | 1.18 | 10,588/22,806 | Tsoi et al. (2012) | AD, Asthma, IgE, HD |
| 5q33.1 | TNIP1 | EUR | rs2233278 | 5'-UTR: TNIP1 | 2.2 x 10^-42 | C | 1.59 | 10,588/22,806 | Tsoi et al. (2012) | |
| 5q33.1 | TNIP1 | CHN | rs10036748 | Intronic: TNIP1 | 4.26 x 10^-9 | G | 1.10 | 11,245/11,177 | Zuo et al. (2015) | IgA, MG, SLE |
| 5q33.3 | IL12B | EUR | rs12188300 | Intergenic | 3.2 x 10^-53 | T | 1.58 | 10,588/22,806 | Tsoi et al. (2012) | CD |
| 5q33.3 | IL12B | CHN | rs10076782 | Intronic: RNF145 | 4.11 x 10^-11 | G | 1.12 | 11,245/11,177 | Zuo et al. (2015) | Platelet volume |
| 5q33.3 | PTTG1 | CHN | rs2431697 | Intergenic | 1.11 x 10^-8 | C | 1.20 | 8,312/12,919 | Sun et al. (2010) | SLE |
| 6p25.3 | EXOC2, IRF4 | EUR | rs9504361 | Intronic: EXOC2 | 2.1 x 10^-11 | A | 1.12 | 10,588/22,806 | Tsoi et al. (2012) | |
| 6p22.3 | CDKAL1 | EUR | rs4712528 | Intronic: CDKAL1 | 8.4 x 10^-11 | C | 1.16 | 9,293/13,670 | Stuart et al. (2015) | CD, IBD, IgA |
| 6q21 | TRAF3IP2 | EUR | rs33980500 | Missense: TRAF3IP2 | 4.2 x 10^-45 | T | 1.52 | 10,588/22,806 | Tsoi et al. (2012) | UC |
| 6q23.3 | TNFAIP3 | EUR | rs582757 | Intronic: TNFAIP3 | 2.2 x 10^-25 | C | 1.23 | 10,588/22,806 | Tsoi et al. (2012) | |
| 6q25.3 | TAGAP | EUR | rs2451258 | Intergenic | 3.4 x 10^-8 | C | 1.12 | 10,588/22,806 | Tsoi et al. (2012) | CeD, CD, MS, RA |
| 7p14.3 | CCDC129 | CHN | rs4141001 | Missense: CCDC129 | 1.84 x 10^-11 | A | 1.14 | 11,245/11,177 | Zuo et al. (2015) | |
| 7p14.1 | ELMO1 | EUR | rs2700987 | Intronic: ELMO1 | 4.3 x 10^-9 | A | 1.11 | 10,588/22,806 | Tsoi et al. (2012) | |
| 8p23.2 | CSMD1 | CHN | rs10088247 | Intronic: CSMD1 | 4.54 x 10^-9 | C | 1.17 | 8,312/12,919 | Sun et al. (2010) | |
| 9p21.1 | DDX58 | EUR | rs11795343 | Intronic: DDX58 | 8.4 x 10^-11 | T | 1.11 | 10,588/22,806 | Tsoi et al. (2012) | |
| 9q31.2 | KLF4 | EUR | rs10979182 | Intergenic | 2.3 x 10^-8 | A | 1.12 | 10,588/22,806 | Tsoi et al. (2012) | |
| 10q21.2 | ZNF365 | EUR | rs2944542 | Intronic: ZNF365 | 1.76 x 10^-8 | G | 1.08 | 11,988/275,334 | Tsoi et al. (2017) | |
| 10q22.2 | CAMK2G, FUT11 | EUR | rs2675662 | Intronic: CAMK2G | 7.35 x 10^-9 | A | 1.12 | 15,295/27,578 | Tsoi et al. (2015) | |
| 10q22.3 | ZMIZ1 | EUR | rs1250544 | Intronic: ZMIZ1 | 3.53 x 10^-8 | G | 1.16 | 8,644/15,055 | Ellinghaus et al. (2012a) | CeD, CD, IBD |
| 10q23.31 | PTEN, KLLN, SNORD74 | EUR | rs76959677 | Intergenic | 2.75 x 10^-8 | G | 1.28 | 11,988/275,334 | Tsoi et al. (2017) | |
| 10q24.31 | CHUK | EUR | rs61871342 | Intronic: BLOC1S2 | 1.56 x 10^-9 | G | 1.10 | 11,988/275,334 | Tsoi et al. (2017) | Fatty liver and alanine aminotransferase ALT levels |
| 11p15.4 | ZNF143 | CHN | rs10743108 | Missense: ZNF143 | 1.70 x 10^-8 | C | 1.14 | 11,245/11,177 | Zuo et al. (2015) | |
| 11q13 | RPS6KA4, PRDX5 | EUR | rs694739 | 256bp 5' of AP003774.1 | 3.71 x 10^-9 | A | 1.12 | 8,644/15,055 | Ellinghaus et al. (2012a) | CD |
| 11q13.1 | CFL1, FIBP, FOSL1 | EUR | rs118086960 | Intronic: CFL1 | 6.89 x 10^-9 | T | 1.12 | 11,988/275,334 | Tsoi et al. (2017) | |
| 11q13.1 | AP5B1 | CHN | rs610037 | Synonymous: AP5B1 | 4.29 x 10^-11 | C | 1.11 | 11,245/11,177 | Zuo et al. (2015) | |
| 11q22.3 | ZC3H12C | EUR | rs4561177 | 1.7kb 5' of ZC3H12C | 7.7 x 10^-13 | A | 1.14 | 10,588/22,806 | Tsoi et al. (2012) | |
| 11q24.3 | ETS1 | EUR | rs3802826 | Intronic: ETS1 | 9.5 x 10^-10 | A | 1.12 | 10,588/22,806 | Tsoi et al. (2012) | |
| 12p13.3 | CD27, LAG3 | CHN | rs758739 | Intronic: NCAPD2 | 4.08 x 10^-8 | C | 1.09 | 15,207/17,103 | Sheng et al. (2014) | |
| 12p13.2 | KLRK1, KLRC4 | EUR | rs11053802 | Intronic: KLRC1 | 4.17 x 10^-9 | T | 1.11 | 11,988/275,334 | Tsoi et al. (2017) | |
| 12q13.3 | IL-23A, STAT2 | EUR | rs2066819 | Intronic: STAT2 | 5.4 x 10^-17 | C | 1.39 | 10,588/22,806 | Tsoi et al. (2012) | Height |
| 12q24.12 | BRAP, MAPKAPK5 | EUR | rs11065979 | Intergenic | 1.67 x 10^-8 | T | 1.08 | 11,988/275,334 | Tsoi et al. (2017) | Blood-related traits, CeD, CKD, cholesterol, GV, JIA, PBC, T1D, hypothyroidism |
| 12q24.31 | IL31 | EUR | rs11059675 | Intronic: LRRC43 | 1.50 x 10^-8 | A | 1.10 | 11,988/275,334 | Tsoi et al. (2017) | |
| 13q12.11 | GJB2 | CHN | rs72474224 | Missense: GJB2 | 7.46 x 10^-11 | T | 1.34 | 10,727/10,582 | Tang et al. (2014) | |
| 13q14.11 | COG6 | EUR | rs34394770 | Intronic: COG6 | 2.65 x 10^-8 | T | 1.16 | 3,496/5,186 | Yin et al. (2015) | RA, age at menarche |
| 13q14.11 | LOC144817 | EUR | rs9533962 | Within LOC144817 | 1.93 x 10^-8 | C | 1.14 | 3,496/5,186 | Yin et al. (2015) | |
| 13q32.3 | UBAC2, RN7SKP9 | EUR | rs9513593 | Intronic: UBAC2 | 3.60 x 10^-8 | G | 1.12 | 11,988/275,334 | Tsoi et al. (2017) | |
| 14q13.2 | NFKBIA | EUR | rs8016947 | Intronic: RP11-56B11.3 | 2.5 x 10^-17 | G | 1.16 | 10,588/22,806 | Tsoi et al. (2012) | |
| 13q14.11 | LOC144817 | CHN | rs12884468 | Intergenic | 1.05 x 10^-8 | G | 1.12 | 11,245/11,177 | Zuo et al. (2015) | |
| 14q23.2 | SYNE2 | CHN | rs2781377 | Stop-gained: SYNE2 | 4.21 x 10^-11 | G | 1.15 | 11,245/11,177 | Zuo et al. (2015) | |
| 14q32.2 | RP11-61O1.1 | EUR | rs142903734 | Intronic: RP11-61O1.1 | 7.15 x 10^-9 | AAG | 1.12 | 11,988/275,334 | Tsoi et al. (2017) | |
| 15q13.3 | KLF13 | EUR | rs28624578 | Intronic: KLF13 | 9.22 x 10^-10 | T | 1.18 | 11,988/275,334 | Tsoi et al. (2017) | |
| 16p13.13 | PRM3, SOCS1 | EUR | rs367569 | 1.6kb 3' of PRM3 | 4.9 x 10^-8 | C | 1.13 | 10,588/22,806 | Tsoi et al. (2012) | PBC |
| 16p11.2 | FBXL19, PRSS53 | EUR | rs12445568 | Intronic: STX1B | 1.2 x 10^-16 | C | 1.16 | 10,588/22,806 | Tsoi et al. (2012) | |
| 17q11.2 | NOS2 | EUR | rs28998802 | Intronic: NOS2 | 3.3 x 10^-16 | A | 1.22 | 10,588/22,806 | Tsoi et al. (2012) | |
| 17q12 | IKZF3 | CHN | rs10852936 | Intronic: ZPBP2 | 1.96 x 10^-8 | T | 1.10 | 15,207/17,103 | Sheng et al. (2014) | Asthma, CD, IBD, PBC, RA, UC |
| 17q21.2 | PTRF, STAT3, STAT5A/B | EUR | rs963986 | Intronic: PTRF | 5.3 x 10^-9 | C | 1.15 | 10,588/22,806 | Tsoi et al. (2012) | |
| 17q25.1 | TRIM47, TRIM65 | EUR | rs55823223 | Intronic: TRIM65 | 1.06 x 10^-8 | A | 1.15 | 11,988/275,334 | Tsoi et al. (2017) | |
| 17q25.3 | CARD14 | EUR | rs11652075 | Missense: CARD14 | 3.4 x 10^-8 | C | 1.11 | 10,588/22,806 | Tsoi et al. (2012) | |
| 17q25.3 | CARD14 | CHN | rs11652075 | Missense: CARD14 | 3.46 x 10^-9 | C | 1.09 | 10,727/10,582 | Tang et al. (2014) | |
| 17q25.3 | TMC6 | CHN | rs12449858 | Missense: TMC6 | 2.28 x 10^-8 | A | 1.12 | 11,245/11,177 | Zuo et al. (2015) | |
| 18p11.21 | PTPN2 | EUR | rs559406 | Intronic: PTPN2 | 1.19 x 10^-10 | G | 1.10 | 11,988/275,334 | Tsoi et al. (2017) | RA |
| 18q21.2 | POL1, STARD6, MBD2 | EUR | rs545979 | Intronic: POL1 | 3.5 x 10^-10 | T | 1.12 | 10,588/22,806 | Tsoi et al. (2012) | |
| 18q22.1 | SERPINB8 | CHN | rs514315 | 3' of SERPINB8 | 5.92 x 10^-9 | T | 1.13 | 8,312/12,919 | Sun et al. (2010) | |
| 19p13.2 | TYK2 | EUR | rs34536443 | Missense: TYK2 | 9.1 x 10^-31 | G | 1.88 | 10,588/22,806 | Tsoi et al. (2012) | CD, IBD, JIA, MS, PBC, RA, T1D |
| 19p13.2 | ILF3, CARM1 | EUR | rs892085 | Intronic: QTRT1 | 3 x 10^-17 | A | 1.17 | 10,588/22,806 | Tsoi et al. (2012) | |
| 19q13.33 | FUT2 | EUR | rs492602 | Synonymous: FUT2 | 6.57 x 10^-13 | G | 1.11 | 11,988/275,334 | Tsoi et al. (2017) | Blood metabolite levels, cholesterol, CD |
| 19q13.41 | ZNF816A | CHN | rs12459008 | Missense: ZNF816 | 2.25 x 10^-9 | A | 1.12 | 10,727/10,582 | Tang et al. (2014) | |
| 20q13.13 | RNF114 | EUR | rs1056198 | Intronic: RNF114 | 1.5 x 10^-14 | C | 1.16 | 10,588/22,806 | Tsoi et al. (2012) | |
| 21q22 | RUNX1 | EUR | rs8128234 | Intronic: RUNX1 | 3.74 x 10^-8 | T | 1.17 | 3,496/5,186 | Yin et al. (2015) | |
| 21q22.11 | IFNGR2 | CHN | rs9808753 | Missense: IFNGR2 | 2.75 x 10^-8 | A | 1.08 | 11,245/11,177 | Zuo et al. (2015) | |
| 21q22.11 | SON | CHN | rs3174808 | Missense: SON | 1.15 x 10^-8 | G | 1.10 | 11,245/11,177 | Zuo et al. (2015) | |
| 22q11.21 | UBE2L3, YDJC | EUR | rs4821124 | 1kb 3' of UBE2L3 | 3.8 x 10^-8 | C | 1.13 | 10,588/22,806 | Tsoi et al. (2012) | CeD, CD, cholesterol, HepB, IBD |

0.8.3.2 Genetic overlap with psoriatic arthritis

PsA is thought to have a stronger genetic component than psoriasis. The strong heritability of PsA was demonstrated in a population-based study that found that Icelandic patients with PsA were more related to each other than to the general population (Karason et al., 2009). Genetic studies have shown that PsA and psoriasis share susceptibility at several loci, notably the MHC (FitzGerald and Winchester, 2009), TRAF3IP2, IL12B, TNIP1, IL-23A, TYK2 and IL-23R (Bowes et al., 2015a). Within the MHC, the strongest risk variant for both psoriasis and PsA is the classic four-digit allele HLA-C*06:02. However, polymorphisms at HLA-B are thought to differentiate genetic risk between psoriasis and PsA (Bowes et al., 2015a; Okada et al., 2014a). In addition, there are potential PsA-specific risk loci at 5q31 and at PTPN22 (Bowes et al., 2015a; Bowes et al., 2015b).

0.8.3.3 Genetics of late-onset psoriasis

So far, GWAS of psoriasis have mainly focused on loci associated with disease of early onset. Studies into LOP have suffered from reduced power, since the lower disease prevalence means that only relatively small cohorts are available. Nevertheless, some data is emerging on the genetics of LOP. In 1985, Henseler and Christophers first used HLA tissue typing to show that EOP was more highly associated with PSORS1, within the MHC-1, than LOP. Subsequent studies have confirmed that LOP is not strongly associated with the major psoriasis risk allele HLA-C*06:02 (Allen et al., 2005; Gudjonsson et al., 2006). Indeed, carriage of the HLA-C*06:02 allele has been shown to be associated with a younger age of onset (Bowes et al., 2017).

The largest genotyping study of LOP patients to date was carried out using the Immunochip array, with a cohort of 543 LOP patients and 4,373 controls (Hebert et al., 2014a). In this study, association with LOP at HLA-C reached genome-wide significance (P = 3.73 x 10^-10) whilst an independent SNP in HLA-A reached study-wide significance (P = 2.54 x 10^-6). Although the association with HLA-C in this study may seem contradictory to other studies showing that HLA-C is associated with an earlier age of onset, the authors note that the odds ratio for LOP (1.72) was much lower than that reported for EOP (4.32) by Tsoi et al. (2012).

Hebert et al. (2014a) also demonstrated that the genetics of LOP overlaps to some extent with that of EOP in non-MHC regions. An association with the known psoriasis risk locus IL12B reached genome-wide significance at P < 5 x 10^-8, and five known non-MHC risk loci reached study-wide significance at P < 2.3 x 10^-5: IL23R, IFIH1, TRAF3IP2, IL23A and RNF114 (Hebert et al., 2014a). As in EOP, these gene candidates have potential functions in IL-23/Th17 (IL23R, TRAF3IP2, IL23A) and IFN/NF-κB signalling (IFIH1, RNF114).

A growing body of genetic evidence supports the theory that IL-1β signalling may differentiate between EOP and LOP. Firstly, a genetic study investigating pro-inflammatory cytokines in psoriasis found that an IL1B allele (IL1B-511*1/1) was associated with late-onset but not early-onset disease (Reich et al., 2002). Experiments in vivo also suggested that LOP patients have aberrant function or production of IL-1β, which affects keratinocyte response (Shaw et al., 2010). As described above, a more recent candidate gene study found significant associations with LOP at rs16944 (P = 0.03, OR = 1.15) and rs11687624 (P = 1.53 x 10^-3, OR = 1.22) near IL1B (Hebert et al., 2014b).

The LOP Immunochip study identified a LOP-specific putative risk locus at the interleukin-1 receptor, type 1 gene (IL1R1) at 2q13 (rs887998; P = 8.81 x 10^-6, OR = 1.40) (Hebert et al., 2014a). The index SNP rs887998 is intronic to IL1R1, and was not associated with EOP when tested in the WTCCC2 GWAS dataset by Strange et al. (2010). IL1R1 encodes a receptor for IL-1α and IL-1β, which are both involved in the regulation of NF-κB and LC migration. However, this genetic association requires validation in an independent cohort of LOP patients.

0.9 Moving beyond GWAS

The GWAS era has provided a wealth of information about genetic loci associated with autoimmune disorders, and yet we are still far from understanding psoriasis aetiology. It should be noted that the risk loci discovered so far account for only a proportion of psoriasis heritability, reportedly 28% in Europeans (Tsoi et al., 2017) and 45.7% in Chinese populations (Jiang et al., 2015); the unexplained remainder is often termed the missing heritability. As cohort sizes increase with the use of meta-analyses, further risk variants with small effect sizes continue to be identified, due to the increased power of detection (Tsoi et al., 2015; Yin et al., 2015). However, there are arguments that the currently known loci actually explain much more of psoriasis heritability than they initially appear to, due to factors such as SNP interactions (epistasis) and gene-environment interactions, as well as the fact that methods used to calculate heritability may be inaccurate (Zaitlen and Kraft, 2012). In light of this, the field of genetics is moving from the task of identifying new signals towards interpreting the ones that are already known.

0.9.1 Interpreting GWAS data

Aside from the missing heritability problem, there are two major challenges in interpreting GWAS data. The first is that the index SNP in each locus is usually in tight linkage disequilibrium with several other proxy SNPs. Farh et al. (2014) developed an algorithm to identify probable causal GWAS SNPs in autoimmune disease and predicted that only 5% of index SNPs were likely to be causal. Indeed, they showed that the most likely causal SNPs, based on genetic and epigenetic data, tend to lie an average of 14 kb from the index SNP. The GWAS signal in each locus should therefore be considered as a set of SNPs, of which any one (or several) might be causal.
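
As a minimal illustration of how the set of SNPs making up a GWAS signal might be assembled, the sketch below computes the pairwise LD statistic r^2 from phased haplotypes and keeps every candidate SNP that exceeds a chosen threshold with the index SNP. The function names, the 0/1 allele coding and the r^2 threshold of 0.8 are illustrative assumptions and are not taken from Farh et al.; a real analysis would use reference panel genotypes and dedicated software.

```python
def ld_r_squared(alleles_a, alleles_b):
    """Return r^2 between two biallelic SNPs.

    Each argument is a list of 0/1 allele codes, one entry per phased
    chromosome, in the same chromosome order for both SNPs.
    """
    n = len(alleles_a)
    if n == 0 or n != len(alleles_b):
        raise ValueError("allele lists must be non-empty and of equal length")
    p_a = sum(alleles_a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(alleles_b) / n                      # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(alleles_a, alleles_b)
               if a == 1 and b == 1) / n          # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                          # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


def ld_set(index_alleles, candidates, threshold=0.8):
    """Names of candidate SNPs with r^2 >= threshold relative to the index SNP.

    `candidates` maps a SNP name to its list of 0/1 allele codes.
    The default threshold is an assumption made for this sketch.
    """
    return [name for name, alleles in candidates.items()
            if ld_r_squared(index_alleles, alleles) >= threshold]
```

With this approach, the index SNP and the returned proxies together would form the set of variants carried forward for functional annotation.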

The second challenge in interpreting GWAS data is determining the function of disease-associated variants. Whilst a small number of risk loci for psoriasis are tagged by coding variants that probably affect gene function, such as IL23R, TRAF3IP2, CARD14 and IFIH1 (Tang et al., 2014; Tsoi et al., 2012; Yin et al., 2015), the vast majority of lead variants are present in non-coding regions either within genes themselves (intronic) or outside of genes altogether (intergenic). Dissecting how these variants affect molecular function in relation to disease is more challenging, since non-coding mutations do not directly affect protein structure. However, GWAS variants have been shown to be enriched in enhancer elements (Farh et al., 2015), which are regions of regulatory DNA that influence gene expression.

In psoriasis, many lead regulatory GWAS variants are present within an intron or promoter of a likely gene candidate; this is the case at several well-established loci including IL23R, ERAP1 and IL12B (Stuart et al., 2015; Tsoi et al., 2012). However, the gene in which the risk variant resides is not necessarily the gene affected in disease, or the only one; for example, in the IL12B locus, lead variants have also been reported within introns of the nearby gene RNF145, although the target gene is most likely IL12B (Zuo et al., 2015). Variants are also frequently located outside of genes altogether (intergenic); in these cases the gene target is usually assigned on the basis of proximity or likely biological relevance, an assumption that may not always be correct. Here the challenge remains to identify the target genes, which may not be those in closest proximity to the enhancer.

0.9.2 Functional annotation of GWAS loci

In the post-GWAS era, the hundreds of identified risk loci require careful functional characterisation through a combination of bioinformatics and in vitro studies (Figure 6).

[Figure 6 image: workflow diagram with four stages. (1) Identify common disease risk loci: GWAS, fine mapping, Immunochip. (2) Predict causal SNPs and genes: bioinformatic approaches utilising data on eQTLs, chromatin features and important cell types. (3) Identify causal SNPs and genes in the lab: interrogate loci with ChIP, 3C, Hi-C, CRISPR and reporter assays. (4) Explore genes as targets for novel therapeutics: define the mechanism by which the causal gene is affected in disease, and the effect of gene-targeting on cellular phenotype.]

Figure 6: Workflow for the characterisation of GWAS SNPs, adapted from Ray-Jones et al. (2016). GWAS is used to identify common variants associated with a disease or trait. Significant risk variants can be verified by fine mapping studies or further genotyping using dense arrays such as the Immunochip. Bioinformatics is used to identify all SNPs associated with the lead SNP and predict which ones are likely to be functional in a relevant cell type, either through gene expression changes (eQTL) or overlap with regulatory chromatin features. Lab techniques can be used to deduce the mechanism by which SNPs affect gene function, perhaps through regulatory protein binding (ChIP) or chromatin looping (3C). Genome modification can be used to evaluate the effect of altering the DNA sequence surrounding risk SNPs (CRISPR) and reporter assays can be used to assess enhancer function. This information can be used as a basis for identifying the effect of gene expression changes on cellular phenotype, with the ultimate aim of targeting the gene with novel therapeutics or using it as a disease biomarker.

Before commencing functional analyses on a locus of interest, large-scale data sets can be interrogated in order to identify all SNPs in LD with the lead SNP in a particular locus and predict which SNPs are likely to be functional. For example, a SNP in the locus of interest may be correlated with a change in expression of a gene; this SNP is known as an expression quantitative trait locus (eQTL). In recent years there has been a huge increase in the number of available eQTL resources, which can be collated from individual studies in specific cell types, and from large-scale consortia such as the Genotype-Tissue Expression (GTEx) project, which currently holds eQTL data on 48 post-mortem tissues from 620 human donors (GTEx Consortium, 2017; Westra et al., 2013). Across the genome, eQTLs tend to be either highly tissue-specific or shared across a range of related tissues (GTEx Consortium, 2017). eQTLs can also be context dependent and only have an effect in certain environments; for example, stimulation with IFN-γ and LPS affected the detectability of 51.4% of eQTLs in monocytes (Fairfax et al., 2014).
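
To make the eQTL concept concrete, the sketch below fits an ordinary least-squares line of gene expression on genotype dosage (0, 1 or 2 copies of the alternative allele) and returns the per-allele effect. This is a simplified, hypothetical illustration: the function name and inputs are assumptions, and published eQTL analyses such as GTEx additionally normalise expression, adjust for covariates and correct for multiple testing.

```python
def eqtl_effect(dosages, expression):
    """Least-squares slope and intercept of expression on allele dosage.

    dosages    : list of 0/1/2 alternative-allele counts, one per donor
    expression : list of expression values for one gene, same donor order
    The slope estimates the change in expression per copy of the allele.
    """
    n = len(dosages)
    if n < 2 or n != len(expression):
        raise ValueError("need at least two donors and equal-length inputs")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("all donors share one genotype; no effect can be estimated")
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    return slope, intercept
```

A slope clearly different from zero in one cell type or stimulation condition, but not in another, would correspond to the tissue-specific and context-dependent eQTLs described above.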

In recent years, functional studies have been reported in other disease settings including systemic lupus erythematosus (Wang et al., 2013), cancer (Grampp et al., 2016; Hoskins et al., 2016) and ankylosing spondylitis (Roberts et al., 2016), but are lacking in psoriasis (Ray-Jones et al., 2016). Therefore, the remainder of this section will describe features of the non-coding genome that should be considered during functional annotation of GWAS loci, and the methods that can be employed to study them, in the context of psoriasis.

0.9.2.1 Importance of cell type

Levels of gene expression and features of the epigenome, such as enhancers, can be highly dependent on the cellular state (Heintzman et al., 2009). Therefore, it is important to consider which cell types are most relevant when investigating specific loci associated with psoriasis. This question can be explored in part by analysis of gene expression in vivo in psoriasis lesions and in vitro in relevant cell types. Transcriptomics gives an overview of all RNA transcripts produced in a genome, providing an insight into candidate gene expression in a particular tissue type. Transcriptome analysis is usually carried out via one of two techniques: DNA microarrays or RNA sequencing (RNA-seq). In a typical DNA microarray, complementary DNA (cDNA) or complementary RNA (cRNA) is generated from the sample and hybridized to target sequences on a solid surface. Target abundance is then detected through the use of fluorophores. RNA-seq, on the other hand, uses next-generation sequencing to determine cDNA sequence (Wang et al., 2009).

Recently, a study by Swindell et al. (2014) attempted to identify relevant cell types in psoriasis by evaluating transcriptome data from microarrays across multiple datasets. Firstly, they identified differentially expressed genes in lesional skin in psoriasis patients in comparison with non-lesional skin, and then explored how expression of those genes compared across ten different psoriasis-relevant cell types including T cells, keratinocytes, fibroblasts, neutrophils and macrophages. Of all the differentially expressed genes, half of those that were upregulated in psoriasis plaques were assigned to keratinocytes and fibroblasts and half were assigned to immune cells. They also collated a list of candidate genes implicated in psoriasis GWASs and found that their expression was most specific to neutrophils and macrophages. Therefore, it is likely that multiple cell types, including keratinocytes, fibroblasts and immune cells, are relevant for investigating the function of psoriasis risk variants (Swindell et al., 2014).

For the purposes of functional characterisation of genetic loci in the lab, it is often necessary to utilise a cell line that models the cells of interest. Cell lines are immortalised cells that can be proliferated indefinitely in vitro, allowing for the generation of millions of cells that are readily available for experimentation. In psoriasis, relevant models might include cell lines such as My-La (CD8+ T cells), Jurkat (CD4+ T cells) or HaCaT (keratinocytes). Once genetic mechanisms have been determined in cell lines, they can be validated in primary cells isolated from patients or healthy volunteers. Normal human epidermal keratinocytes (NHEK), for instance, can be isolated from skin and cultured for a few passages in vitro.

In order to examine the effects of an inflammatory environment on the function of disease-associated variants, cells can be stimulated with an appropriate cytokine. Keratinocytes, for instance, could be stimulated with cytokines related to key pathways in disease such as Th17 or Th1 processes. Seo et al. (2012) stimulated HaCaT cells and primary keratinocytes with a range of cytokines including IL-17A, IFN-γ, IL-4 and IL-22. Shi et al. (2011) stimulated keratinocytes with IL-17A to observe upregulation of keratin 17, and Cho et al. (2012) stimulated keratinocytes with IL-17A, IFN-γ or IL-22 to observe upregulation of IL-1β.

0.9.2.2 Accessible chromatin

Examination of chromatin features is important for assessing the function of non-coding variation. Chromatin is a complex structure consisting of DNA packaged by proteins that serve to regulate gene expression. Within chromatin, DNA is packaged into nucleosome complexes by histone proteins. The DNA in regions bound by nucleosomes is in a condensed state, and is therefore largely inaccessible to enzymes. Conversely, DNA in nucleosome-free regions is available for protein binding; hence it is more likely to be an active enhancer or a transcriptionally active region. These active regions were traditionally found by DNase-seq, which uses the enzyme DNase I to selectively digest DNA in accessible regions known as DNase I hypersensitive sites (Song and Crawford, 2010). Recently, further methods have been developed: FAIRE-seq, which makes use of formaldehyde fixation (Simon et al., 2012), and ATAC-seq, which utilises Tn5 transposase (Buenrostro et al., 2013). SNPs that overlap regions of accessible chromatin can be identified using DNase hypersensitivity data from consortia such as NIH Roadmap (Bernstein et al., 2010) and ENCODE (Dunham et al., 2012).
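
The overlap of SNPs with accessible chromatin amounts to an interval lookup, which can be sketched as follows. The example assumes a single chromosome, non-overlapping peaks given as half-open (start, end) coordinates, and hypothetical SNP names; it does not reproduce the file formats or tools distributed by Roadmap or ENCODE.

```python
from bisect import bisect_right


def snps_in_accessible_chromatin(snps, peaks):
    """Return names of SNPs whose position falls inside an accessible region.

    snps  : dict mapping SNP name -> base-pair position
    peaks : list of (start, end) tuples, half-open, assumed non-overlapping
            and all on the same chromosome as the SNPs
    """
    peaks = sorted(peaks)
    starts = [start for start, _ in peaks]
    hits = []
    for name, pos in snps.items():
        # Index of the last peak that starts at or before the SNP position.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and peaks[i][0] <= pos < peaks[i][1]:
            hits.append(name)
    return hits
```

In practice the same lookup would be repeated per chromosome and per cell type, since accessibility differs between the cell types discussed in the previous section.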

0.9.2.3 Protein interactions

When investigating susceptibility loci it is important to identify regulatory proteins bound to the genome; SNPs overlapping protein binding sites are likely to have a functional consequence. Some of the most important markers of regulatory DNA are histones that have been modified by acetylation or methylation at the amino terminus. Histone modifications known to mark different regulatory regions include H3K4me3, H3K4me1, H3K27me3 and H3K27ac. Active gene promoter regions are enriched for H3K27ac and H3K4me3 (Liang et al., 2004). Meanwhile, enhancer elements are marked by H3K4me1 and may be in an active or poised state. Active enhancers are enriched for H3K27ac whilst poised enhancers are enriched for H3K27me3 (Barski et al., 2007; Creyghton et al., 2010). SNPs themselves can affect histone modifications, as well as the position of nucleosomes and degree of DNase I hypersensitivity (McVicker et al., 2013).
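
The combinations of histone marks described above can be summarised as a simple lookup, sketched below. The function and its labels are illustrative assumptions: they encode only the rules stated in this section, and real chromatin state annotation relies on quantitative signal and statistical models instead of the presence or absence of marks.

```python
def classify_region(marks):
    """Assign a coarse regulatory label from the histone marks at a region.

    marks : iterable of mark names, e.g. ["H3K4me1", "H3K27ac"]
    Rules follow the text: promoters carry H3K4me3 with H3K27ac; enhancers
    carry H3K4me1 and are active with H3K27ac or poised with H3K27me3.
    """
    marks = set(marks)
    if {"H3K4me3", "H3K27ac"} <= marks:
        return "active promoter"
    if "H3K4me1" in marks:
        if "H3K27ac" in marks:
            return "active enhancer"
        if "H3K27me3" in marks:
            return "poised enhancer"
        return "enhancer (state unresolved)"
    return "unclassified"
```

A SNP falling in a region labelled as an active enhancer in a disease-relevant cell type would be a stronger candidate for follow-up than one in an unclassified region.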

Transcription factors are proteins that bind to specific sequences at enhancers and gene promoters to activate or repress transcription. The paper mentioned above, by Swindell et al. (2014), also attempted to define which transcription factors were most important in psoriasis risk loci. To do this they assembled a library of transcription factor binding motifs and determined which of these motifs contained GWAS risk SNPs adjacent to a candidate gene. From their transcriptome data, they could then see if these motifs were also enriched near genes co-expressed with the candidate gene in a given cell type. They hypothesised that intergenic SNPs could affect binding of transcription factors to these motifs. At psoriasis susceptibility loci near KLF4 and IL12B, this methodology highlighted the involvement of the activator protein 1 transcription factor (AP-1) group (Swindell et al., 2014). However, many other families of transcription factors are also likely to be involved in driving psoriasis pathogenesis, as explored in a later study by the same research group (Swindell et al., 2015).

In order to study interactions between DNA and proteins, such as transcription factors or modified histones, chromatin immunoprecipitation (ChIP) can be employed. ChIP is able to determine whether genomic loci are associated with specific protein complexes in vivo (Christova, 2013). The principles of ChIP involve formaldehyde cross-linking of DNA and bound proteins in a living cell, followed by shearing of chromatin into small fragments through sonication and immunoprecipitation with an antibody that binds the protein of interest (Figure 7). Finally, the cross-links are reversed and the purified DNA can be identified by quantitative real-time PCR (qPCR) analysis (ChIP-qPCR) or sequencing (ChIP-seq) (Christova, 2013). ChIP-qPCR has been widely used to investigate the overlap of GWAS SNPs with transcription factors and histone marks. For example, Grampp et al. (2016) used allele-specific ChIP-qPCR to show that the risk allele of a SNP associated with renal cancer at 8q24.21 had increased binding to hypoxia-inducible transcription factors (HIFs) in comparison with the protective allele. Alongside other epigenetic changes, this corresponded with increased expression of the oncogenes MYC and PVT1.

[Figure 7 image: schematic of the ChIP workflow. Panel labels: crosslink chromatin with formaldehyde and extract from cells of interest; fragment chromatin (200-1000 bp) and immunoprecipitate with an antibody for the protein of interest; reverse crosslinks, purify DNA and detect targets by qPCR or sequencing.]

Figure 7: Overview of the chromatin immunoprecipitation (ChIP) technique. In order to capture DNA/protein interactions, the chromatin is fixed in the cell using formaldehyde. The chromatin is then fragmented to 200-1000 bp lengths using a sonicator. An antibody targeting the protein of interest is used to pull out the regions binding to that protein (immunoprecipitation). The cross-links are reversed and the DNA is isolated and cleaned. The loci that were originally bound to the protein can then be identified by qPCR or sequencing.

0.9.2.4 Chromatin interactions

One of the mechanisms by which enhancers can influence gene expression is through long-range physical interaction with gene promoters. The genome is now known to form a complex 3D structure in which stretches of DNA are organised into structural compartments of approximately 1 Mb known as topologically associated domains (TADs) (Dixon et al., 2012; Nora et al., 2012). DNA elements within a TAD preferentially form contacts with each other rather than with elements within a different TAD, so that enhancers are likely to regulate expression of a gene within the same TAD. However, whilst TADs represent overall chromatin structure, it is important to look at finer-scale interactions within TADs to identify specific enhancer-promoter interactions. A suggested pipeline for identifying disease causal variants and gene targets might include a range of experimental techniques that allow the researcher to examine interactions involving the associated variant at differing resolutions in the genome (Krijger and de Laat, 2016).

In 2002, the development of the chromosome conformation capture (3C) technique allowed researchers to detect physical interactions between two different locations in the genome (Dekker et al., 2002). The main principles of 3C involve cross-linking of interacting regions in the nucleus using formaldehyde, followed by digestion with a restriction enzyme. The interacting fragments are then joined through intramolecular ligation. Following cross-link reversal, the product can be detected through the use of quantitative PCR (Figure 8).

[Figure 8 image: schematic of the 3C workflow. Panel labels: crosslink chromatin with formaldehyde and extract from cells of interest; fragment chromatin using a restriction enzyme (e.g. HindIII); join cut ends of interacting regions using T4 DNA ligase; reverse crosslinks, purify DNA and detect interactions by quantitative PCR (Primer 1, Primer 2).]

Figure 8: Overview of the chromosome conformation capture (3C) technique. As in ChIP, chromatin is first fixed within the cell using formaldehyde. The chromatin is then extracted and digested using a restriction enzyme such as HindIII. The cut ends are ligated together using T4 DNA ligase and the cross-links are reversed. After several rounds of purification, the interacting fragments of DNA can be detected by qPCR or sequencing.
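
The restriction digestion step of 3C determines which fragments can be tested for interaction, and it can be simulated in silico when designing primers. The following is a minimal sketch for a linear sequence, using the HindIII recognition site AAGCTT, which is cut after the first A. The function name and the toy sequence used to try it are illustrative assumptions.

```python
def digest(sequence, site="AAGCTT", cut_offset=1):
    """Cut a linear DNA sequence at every occurrence of a restriction site.

    site       : recognition sequence (default is the HindIII site)
    cut_offset : number of bases into the site at which the top strand is
                 cut (1 for HindIII, which cuts A^AGCTT)
    Returns the fragments in order; joined together they give the input.
    """
    sequence = sequence.upper()
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])   # remainder after the final cut
    return fragments
```

In a 3C experiment, primers would then be designed close to the ends of the two fragments whose interaction is being tested, so that a PCR product is only obtained if those ends have been ligated together.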

Recently, 3C has been used as a tool to detect likely causal variants and their target genes in a disease context. For example, Hoskins et al. (2016) used 3C to investigate an intergenic risk locus for pancreatic cancer at 13q22.1, which harbours several potential candidate genes. They detected a long-range (570 kb) interaction between a regulatory GWAS risk variant and a region 6 kb from the DIS3 gene, thereby prioritising DIS3 as a candidate gene for follow-up. In another example, 3C was used to identify gene targets in a pan-autoimmune locus at 16p13 (Davison et al., 2012). In this locus the autoimmune-related SNPs are present within intron 19 of CLEC16A, but the investigators detected interactions between the SNPs and the promoter of DEXI. They also showed that intron 19 of CLEC16A encompassed a regulatory region that controlled DEXI expression (Davison et al., 2012).

The experimental design of 3C allows the researcher to examine single chromatin interactions chosen a priori (the one-to-one approach) but is not optimal for the discovery of novel interactions. There are now numerous derivatives of 3C that address this issue (Figure 9). The first of these, 4C, used a secondary restriction digest and inverse PCR to identify all regions interacting with a specific fragment of interest (one-to-all) (Simonis et al., 2006). Next, the development of 5C allowed for identification of interactions between multiple anchor sites and multiple target sites (many-to-many), which was accomplished by capturing 3C libraries with oligonucleotides, amplifying with universal primers and detecting products through microarray analysis or high throughput sequencing (Dostie et al., 2006).

Figure 9: Variations of the chromosome conformation capture technique. Figure reproduced from Krijger and de Laat (2016), used with permission.

The first truly unbiased genome-wide chromosome conformation technology (all-to-all) was Hi-C, an adaptation of 3C that enriches for interacting fragments through the biotinylation of restriction fragment ends, which can then be pulled out with streptavidin-coated beads following ligation (Lieberman-Aiden et al., 2009). Hi-C libraries are analysed using high throughput sequencing, the depth of which determines the resolution of the detected interactions. Although technically the resolution of a Hi-C library is only restricted by the length of the restriction fragments (a few thousand bases for a 6-bp restriction enzyme such as HindIII), the extreme complexity of these libraries means that billions of reads may be required to achieve this. Hi-C libraries are therefore usually low resolution and are used for looking at overall chromatin architecture (e.g. TADs and sub-TADs) rather than specific interactions. In 2014, however, Rao et al. used a 4-bp restriction enzyme (DpnII) coupled with in situ ligation to generate Hi-C maps at up to 1 kb resolution in 9 human and mouse cell lines, which revealed principles of chromatin conformation in the genome. They identified thousands of chromatin looping domains that are smaller than TADs (average length of 185 kb) and are generally conserved across cell types. The loops were often found to link enhancers and promoters and were anchored at sites bound by the DNA binding protein CTCF (Rao et al., 2014).
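
The relationship between recognition site length, fragment size and library complexity can be illustrated with some simple arithmetic, sketched below. It assumes that the four bases are equally frequent and independent, which real genomes only approximate, and the genome size passed to the functions is supplied by the caller; the function names are hypothetical.

```python
def expected_fragment_length(site_length):
    """Expected spacing (bp) between restriction sites.

    Under the simplifying assumption of equal, independent base
    frequencies, a given site of n bases occurs once every 4**n bp.
    """
    return 4 ** site_length


def possible_fragment_pairs(genome_size, site_length):
    """Approximate number of distinct fragment pairs in a Hi-C library."""
    n_fragments = genome_size // expected_fragment_length(site_length)
    return n_fragments * (n_fragments - 1) // 2


# A 6-bp cutter is expected to cut every 4096 bp, a 4-bp cutter every 256 bp.
assert expected_fragment_length(6) == 4096
assert expected_fragment_length(4) == 256
```

Because the number of possible fragment pairs grows with the square of the number of fragments, moving from a 6-bp to a 4-bp cutter increases the potential resolution but also greatly increases the sequencing depth needed to sample each pair adequately.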

The most recent derivatives of 3C and Hi-C achieve high resolution chromatin folding maps at target sites in the genome by combining high throughput sequencing with sequence capture (Davies et al., 2017). Based on 3C, next-generation Capture-C uses custom RNA baits to pull out regions of interest from 3C libraries (Hughes et al., 2014). Similarly, capture Hi-C (CHi-C) uses RNA baits to capture regions of interest from Hi-C libraries enriched for ligation junctions (Dryden et al., 2014).

In recent years, CHi-C has been increasingly used to map interactions between enhancers and gene promoters. One approach is to capture all fragments overlapping gene promoters (Javierre et al., 2016; Mifsud et al., 2015; Schoenfelder et al., 2015). In 2016, Javierre et al. performed a promoter CHi-C experiment in 17 primary human blood cell types, providing a valuable resource for other researchers interested in genetic immune-related conditions. Another approach is to capture fragments overlapping disease-associated regions; this has been performed in diseases such as colorectal cancer and breast cancer (Dryden et al., 2014; Jager et al., 2015). Recently, a study used CHi-C to investigate chromatin interactions in related autoimmune diseases (rheumatoid arthritis, psoriatic arthritis, type 1 diabetes and juvenile idiopathic arthritis) in B and T cell lines, utilising a region capture approach and a complementary promoter capture approach (Martin et al., 2015). The findings from this study shifted our understanding of autoimmune genetics by re-assigning GWAS loci to novel gene candidates. For example, rheumatoid arthritis (RA)-associated SNPs in 3p24.1 near the EOMES gene were instead found to form a 640 kb interaction with the AZI2 gene, whose protein has roles in NF-κB activation (Martin et al., 2015).

CHi-C studies generate hypotheses about genetic loci that can be followed up with further experiments that build on the evidence for promising therapeutic targets. For instance, the findings by Martin et al. (2015) led to detailed work on the pan-autoimmune risk locus at 6q23 in RA and multiple sclerosis (MS) (Martin et al., 2016; McGovern et al., 2016). In RA, a combination of genotype-specific 3C, allele-specific ChIP and eQTL analysis showed that the GWAS SNPs at TNFAIP3 come into close proximity with, and regulate the expression of, IL20RA (McGovern et al., 2016). The 6q23 region also contains psoriasis-associated variants; however, to the author's knowledge, chromosome conformation has not yet been explored at psoriasis risk loci in relevant cell types. Several psoriasis loci overlap enhancers in intergenic regions, including 9q31.2 (KLF4) and 5p13.1 (CARD6, PTGER4) (Tsoi et al., 2015; Tsoi et al., 2012); in these loci chromatin conformation will be particularly informative for defining gene targets.

Methods also exist for the combined analysis of chromatin conformation and protein binding. The first of these is known as chromatin interaction analysis using paired-end tag sequencing (ChIA-PET) and incorporates ChIP with 3C and high throughput sequencing (Fullwood and Ruan, 2009). It was first used to map chromatin interactions mediated by oestrogen receptor alpha in oestrogen-treated human breast adenocarcinoma cells (Fullwood et al., 2009). Of relevance to autoimmune disease, in 2015 a study performed ChIA-PET for three regulatory histone marks (H3K4me3, H3K27ac and H3K4me1) in lymphoblastoid cell lines from 75 individuals (Grubert et al., 2015). This study showed that the ChIA-PET interacting regions were enriched for variants that were significantly associated (P < 1 x 10^-6) with autoimmune diseases (MS, RA, CD and ulcerative colitis). The authors were also able to identify SNPs that affected histone marks (hQTLs), and showed that these were enriched for GWAS SNPs (Grubert et al., 2015). More recently, a derivative of ChIA-PET known as HiChIP was developed; this method combines ChIP with Hi-C and Tn5 transposase library preparation, thereby allowing the identification of protein-mediated interactions within accessible chromatin (Mumbach et al., 2016). The advantage of HiChIP over ChIA-PET is that the number of starting cells can be reduced up to 100-fold. HiChIP was most recently used to explore enhancer-promoter interactions in primary T cells and identified H3K27ac-mediated interactions between GWAS SNPs and transcription start sites for several autoimmune diseases (Mumbach et al., 2017).

0.9.2.5 Selected psoriasis risk loci for functional follow-up

For certain traits, functional studies following up on GWAS findings have been successful in identifying important gene targets. For example, a widely cited study in 2014 followed up on variants within the FTO gene associated with obesity (Smemo et al., 2014). The authors of this study used a combination of in vitro and in vivo techniques to demonstrate that the obesity variants form a long-range regulatory interaction with IRX3, a gene that they showed affects body weight in mice. Similarly in psoriasis, GWAS findings present many opportunities for discovery of novel therapeutic targets. The 2012 psoriasis Immunochip study identified 15 novel loci near genes with functions in innate immunity; these loci represent strong candidates for follow-up (Tsoi et al., 2012). Two of these loci are described below.

9q31 (KLF4): a psoriasis-specific intergenic locus

One such locus is at 9q31, tagged by the index SNP rs10979182 (P = 2.3 x 10^-8, OR = 1.12), which is located within a large intergenic region devoid of protein coding genes over a distance of more than 1 Mb. The nearest gene candidate (~565 kb away) is Krüppel-like factor 4 (KLF4), which encodes a protein with compelling psoriasis-relevant roles in skin barrier formation and immune signalling (Feinberg et al., 2005; Segre et al., 1999). On the other side of the gene desert, a gene cluster includes another potential candidate gene, IKBKAP, which encodes the IκB kinase complex-associated protein (IKAP) that may be involved in NF-κB activation (Cohen et al., 1998). The GWAS association in 9q31 is specific to psoriasis and has not been reported for other related autoimmune conditions, although it is situated near a GWAS association for breast cancer (Fletcher et al., 2011; Michailidou et al., 2013). A recent CHi-C study identified interactions between an intergenic fragment near the breast cancer locus and KLF4 in breast cancer cells (Dryden et al., 2014). The 9q31 locus has not yet been functionally characterised in the context of psoriasis.

6q23 (TNFAIP3): a pan-autoimmune intronic locus

The 2012 Immunochip study by Tsoi et al. also validated an important psoriasis risk locus at 6q23, which harbours associations with several other autoimmune diseases, including RA, MS, systemic sclerosis, coeliac disease and systemic lupus erythematosus (SLE), that are thought to be independent of the psoriasis association (Nititham et al., 2015). The lead risk variant for psoriasis, rs582757, is located within an intron of TNFAIP3, which is a plausible gene target whose protein A20 has functions in NF-κB signalling. The SLE-associated variants downstream of TNFAIP3 have been shown to form a long-range interaction with the TNFAIP3 promoter and modulate its expression through recruitment of a transcription factor (Wang et al., 2013). In contrast, the variants upstream of TNFAIP3 associated with RA and MS have been shown to interact with long-distance immune-related genes including IL20RA, IL22RA2 and IFNGR1, as described above (Martin et al., 2016; McGovern et al., 2016). Investigation of the function of the psoriasis-associated variants in this region would therefore improve our understanding of the overlap of regulatory mechanisms between these related autoimmune diseases.

0.10 Summary

In summary, the GWAS era has revealed many insights into the genetics of psoriasis but has also highlighted the disease complexity. There is still a vast amount of work to be done in delineating the mechanisms behind the differing phenotypes such as early-onset and late-onset psoriasis, which are currently treated in the same manner in the clinic. In addition, GWAS findings are difficult to interpret in a useful way since lead variants are enriched in non-coding regions with arbitrarily assigned target genes. The discovery of the true target genes is an important step in drug discovery, and will lead to a better understanding of pathways that are important in disease.

Whilst biologics can be effective treatments for psoriasis, the results are not always sustained and certain patients may not respond in the same way to a particular therapy. The reasons for these issues are not clear. In the emerging era of precision medicine, it is feasible that patients could one day be more effectively treated based on their genotype. In terms of novel therapeutics, research has shown that drug targets with strong genetic data behind them are much more likely to be successful in clinical trials (Nelson et al., 2015). As scientific advances make it increasingly possible to target a wide range of genes through therapies that affect expression, as opposed to classical protein targeting with small molecule drugs, it is important to continue efforts to define and characterise genes involved in disease (Nature Editorial, 2017).


0.11 Overall aims and objectives This project aims to improve our understanding of the genetic and biological mechanisms underlying psoriasis by building on the wealth of knowledge gained from recent GWAS findings. This aim will be achieved by employing a number of different statistical and experimental approaches. The first section of the project will attempt to validate risk loci specific to LOP, since research into the genetics of this psoriasis phenotype is currently lacking. Following on from this, functional molecular techniques including ChIP, 3C and CHi-C will be used to further characterise the genetic mechanisms occurring in psoriasis risk loci. Overall, this approach will help pave the way towards discovering novel candidate genes and causal variants involved in psoriasis pathogenesis.

0.12 Outline of thesis This thesis is presented in two sections; the first section describing the late-onset GWAS and the second section describing the functional molecular work. Each section contains its own methods, results and discussion. A final discussion brings together the two sections, which are linked by an overarching motivation to identify and explore genetic loci associated with psoriasis. The aim and content of each section is described below.

1. A genome-wide association study of late-onset psoriasis The underlying mechanisms delineating EOP and LOP are not well understood. Therefore, this study aimed to investigate genetic susceptibility to LOP. This was conducted by compiling the largest genotyping cohort of patients with LOP. A GWAS was performed comparing the genotypes of LOP patients with a cohort of psoriasis-free controls.

The logical next step from performing a GWAS is to functionally annotate disease-associated risk loci. At this stage no LOP-specific loci had been validated, so in the meantime the project focused on annotating risk loci for psoriasis per se. These loci are considered robust, since the lead SNPs have reached genome-wide significance in a previous large-scale psoriasis meta-analysis. Initially, a hypothesis-based approach was used in each selected locus, employing various bioinformatics and experimental methodologies. This was followed by a disease-wide approach that used high-throughput sequencing to characterise regulatory interactions in all known psoriasis loci.


2. Functional characterisation of psoriasis risk loci

i. Characterisation of individual risk loci
Non-coding risk loci identified by GWAS can only be interpreted by detailed functional characterisation. Therefore, this study aimed to identify potential causal SNPs and target genes utilising a hypothesis-based approach. The selected regions included an intergenic psoriasis-specific locus at 9q31.2 near KLF4 and a pan-autoimmune risk locus at 6q23 near TNFAIP3 (Tsoi et al., 2012). Bioinformatics tools were used to form a hypothesis about the regulatory mechanisms occurring in each locus. The functional molecular techniques ChIP and 3C were then used to test these hypotheses.

ii. Characterisation of multiple psoriasis risk loci
To extend the functional analysis of psoriasis risk loci, the next body of work employed the capture Hi-C technique to map chromatin interactions in all psoriasis loci known at the time. The data from this experiment indicate gene targets of GWAS variation in psoriasis.


1. A GENOME-WIDE ASSOCIATION STUDY OF LATE-ONSET PSORIASIS


1.1 Introduction
Genome-wide association studies (GWAS) have uncovered a wealth of information about the genetic background of EOP. Genetic studies into the LOP phenotype have been historically lacking; the largest study to date utilised the Immunochip and consisted of 543 LOP patients and 4373 controls (Hebert et al., 2014a). This study uncovered a suggestive LOP-specific genetic risk factor at IL1R1, in line with previous evidence for genes involved in IL-1 signalling delineating LOP from EOP (Hebert et al., 2014b; Reich et al., 2002). However, further work is required to discover novel genome-wide genetic signals for LOP, as well as to validate the signal at IL1R1.

1.2 Aims and objectives of Section 1 The aim of this study was to perform the largest GWAS in LOP to date in order to identify potential LOP-specific signals and validate the putative LOP-specific IL1R1 signal in an independent case-control cohort.

The objectives of Section 1 were:

1. Perform genotyping on DNA from psoriasis patients
2. Merge the genotype data with previously collated LOP and control sample genotype datasets
3. Carry out genome-wide imputation of the genotype data
4. Perform a case-control association analysis for LOP against controls


1.3 Methods

1.3.1 Samples All samples were obtained from participants of European descent. Participants in this study provided written informed consent.

Table 2: Samples included in the LOP GWAS dataset

| Group | Dataset | Phenotype | Illumina Array | Samples in cohort | LOP cases (pre-QC) |
|---|---|---|---|---|---|
| Cases | Manchester PsA | PsA | HumanCoreExome | 2281 | 575 |
| Cases | BSTOP | Psoriasis | HumanOmniExpressExome | 1178 | 134 |
| Controls | arcOGEN | OA | Human 610-Quad | 3422 | N/A |

1.3.1.1 Cases

Manchester-based PsA cohort The majority of LOP cases were obtained from a Manchester-based cohort consisting of 2281 patients with PsA, of which 575 patients had LOP. These samples were genotyped on HumanCoreExome BeadChips by myself and other colleagues at the Arthritis Research UK (ARUK) Centre for Genetics and Genomics (see Figure 10).

Biomarkers of Systemic Treatment Outcomes in Psoriasis (BSTOP)
Genotype data for the remaining LOP cases were obtained from psoriasis patients recruited by the Biomarkers of Systemic Treatment Outcomes in Psoriasis (BSTOP) study, a part of the Psoriasis Stratification to Optimise Relevant Therapy (PSORT) consortium. These samples had been genotyped using Illumina HumanOmniExpressExome-8v1.2_A BeadChips. Quality control on the genotyped data had previously been carried out by Dr Nick Dand, Postdoctoral fellow at King’s College London (Appendix, Table 24). Unimputed genotype data for 134 patients with a recorded LOP phenotype were extracted for use in this analysis.


1.3.1.2 Controls

arcOGEN study
For controls, genotype data were obtained for a subset of 3422 patients with osteoarthritis (OA) from the arcOGEN study (Zeggini et al., 2012). OA is a non-inflammatory, non-autoimmune condition; therefore patients with OA form a valid control group for a psoriasis study. The arcOGEN samples had been genotyped using Illumina Human 610-Quad BeadChips. The unimputed genotype dataset had already undergone quality control measures, detailed in Zeggini et al. (2012) (Appendix, Table 24).


Sample collection/DNA extraction (ARUK technicians)
• DNA normalised to 50 ng/µL

Genotyping (Harry Hebert, Helen Ray-Jones, Ashley Budu-Aggrey, Jonathan Massey; conducted alone or in pairs; HRJ contributed in 10 runs out of 21)
• Illumina Infinium HTS assay
• Data collection on iScan

GenomeStudio analysis (Harry Hebert, Helen Ray-Jones; conducted together)
• Cluster to call genotypes
• Remove female Y SNPs from Y chromosome stats, removing samples with call rate < 90%
• Re-cluster

Sample and marker QC (PLINK) (John Bowes, Helen Ray-Jones, Eftychia Bellou; conducted in parallel)
• Samples removed based on: gender mismatch, genotyping efficiency < 98%, deviation from autosomal heterozygosity, duplicated or related samples (IBD), non-European ancestry (PCA)
• Markers removed based on: unplaced, mitochondrial or Y chromosome, missing rate > 2%, monomorphic, rare (MAF < 1%), departing from HWE (P < 1 x 10-3)

Merging case/control datasets (PLINK) (Helen Ray-Jones)
• Identify common SNPs between PsA, BSTOP and arcOGEN datasets; merge in PLINK
• Remove duplicated or related samples (IBD)
• PCA on merged dataset to include in final analysis

Imputation (Helen Ray-Jones)
• Impute using Michigan Imputation Server
• Markers removed based on: multi-allelic, MAF < 5%, Imputation R2 < 0.5

Association analysis (SNPTEST) (Helen Ray-Jones)
• Frequentist test for association
• Create plots: Manhattan, qq-plot, regional association plots (Locuszoom)
• Conditional analysis in MHC
• Prioritised novel signals based on regional association plots and SNPTEST info score

Figure 10: Flow chart of the LOP GWAS process, indicating the contributing researchers at each stage


1.3.2 Genotyping of the Manchester PsA cohort The Manchester PsA cohort was genotyped using the HumanCoreExome-24 BeadChip Array (Illumina). The HumanCoreExome (HCE) is a DNA microarray that covers 547,644 variants across the genome, 265,919 of which are located in exonic regions.

1.3.2.1 Illumina Infinium HTS Assay
DNA samples were prepared by the technical team at the ARUK Centre for Genetics and Genomics. The DNA was extracted from blood samples and quantified on a Nanodrop spectrophotometer (Thermo Fisher Scientific). All DNA samples were then normalised to 50 ng/µL. The Illumina Infinium HTS assay was used to perform genotyping according to the manufacturer’s instructions (Appendix, Table 23). The protocol, illustrated in Figure 11, takes approximately three days to complete.

Each DNA sample (200 ng) was denatured and neutralised, followed by overnight whole-genome amplification in an oven at 37˚C. The DNA was then enzymatically fragmented, precipitated using isopropanol and then pelleted by centrifugation. Following re-suspension, the DNA was hybridised to the BeadChip array overnight in an oven at 37˚C. Subsequently each BeadChip was washed using the supplied reagents in order to remove un-hybridised or non-specifically hybridised DNA. An extension reaction was carried out in which labelled nucleotides were added to the primers on the BeadChip bound to the template DNA. The labelled extended primers were then stained and dried. The BeadChips were imaged using an iScan System (Illumina).


1. DNA denaturation/neutralisation
2. Whole genome amplification
3. Enzymatic DNA fragmentation
4. Isopropanol precipitation and centrifugation
5. Resuspension of DNA
6. Hybridisation to BeadChip
7. Removal of non-hybridised DNA
8. DNA extension and staining
9. Imaging BeadChip on iScan

Figure 11: Overview of the genotyping protocol (adapted from Illumina Infinium HTS Assay protocol guide)


1.3.2.2 GenomeStudio
The fluorescence data from the iScan was loaded into the GenomeStudio software (Illumina). Genotype calls were performed by GenomeStudio using the manifest and cluster files supplied by Illumina (HumanCoreExome-24 v1.0). Each genotype call is assigned a GenCall score, which ranges from 0 to 1; the lower the value, the further the data point is situated from the centre of a cluster. Here, genotypes with a GenCall score of less than 0.15 were assigned a “no call”; this is the cutoff value recommended by Illumina. Preliminary quality control of the genotype data in GenomeStudio involved removing female Y-SNPs from Y chromosome SNP statistics and removing all samples with a call rate of less than 90%, i.e. samples that have more than 10% “no call” genotypes. The SNP genotypes were then re-clustered and a final report of the full dataset was generated for downstream analysis.
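The relationship between the GenCall cutoff and the sample call rate can be illustrated with a minimal sketch. This is not GenomeStudio code; the function names and the list-of-scores input are assumptions made purely for illustration, and only the two thresholds (0.15 and 90%) come from the text.

```python
GENCALL_CUTOFF = 0.15  # scores below this become a "no call" (value from the text)
MIN_CALL_RATE = 0.90   # samples below this call rate are removed (value from the text)


def call_rate(gencall_scores, cutoff=GENCALL_CUTOFF):
    """Fraction of a sample's genotypes whose GenCall score reaches the cutoff."""
    if not gencall_scores:
        raise ValueError("no genotypes supplied")
    called = sum(1 for score in gencall_scores if score >= cutoff)
    return called / len(gencall_scores)


def passes_call_rate(gencall_scores, minimum=MIN_CALL_RATE):
    """True if the sample retains at least `minimum` of its genotype calls."""
    return call_rate(gencall_scores) >= minimum
```

For example, a hypothetical sample with two low-scoring genotypes out of ten has a call rate of 0.8 and would be removed at the 90% threshold.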

1.3.2.3 Quality control of the Manchester PsA genotype data The Manchester PsA data underwent quality control (QC) as a whole dataset, after which the LOP samples were extracted for use in the association analysis. QC was conducted in parallel with ARUK colleagues Dr John Bowes and Eftychia Bellou. The majority of the QC pipeline utilises the whole genome data analysis toolset PLINK (Purcell et al. (2007); available at http://zzz.bwh.harvard.edu/plink/).

Sample quality control in PLINK

Gender mismatch, call rate and autosomal heterozygosity

Firstly, in order to identify potential sample mix-up, PLINK was used to estimate each sample’s gender through X chromosome heterozygosity. The recorded phenotypic gender for each patient was then compared against the estimated gender from their genotype and any disparate samples were removed from the dataset.

Next, samples with a call rate of < 98% were removed from the dataset. Additionally, samples were checked for autosomal heterozygosity. Sample contamination can lead to an excessive heterozygosity rate, whereas inbreeding or poor genotype calling can lead to a reduced heterozygosity rate (Anderson et al., 2010). Therefore, samples were removed if their heterozygosity rate was more than 3 standard deviations from the mean.
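The heterozygosity filter described above can be sketched as follows. This is an illustrative outline rather than the PLINK-based pipeline used in the study; the function name and the dictionary input (sample ID mapped to heterozygosity rate) are assumptions.

```python
from statistics import mean, stdev


def heterozygosity_outliers(het_rates, n_sd=3.0):
    """Return sorted IDs of samples whose heterozygosity rate lies more than
    n_sd sample standard deviations from the cohort mean."""
    values = list(het_rates.values())
    centre = mean(values)
    spread = stdev(values)
    return sorted(
        sample
        for sample, rate in het_rates.items()
        if abs(rate - centre) > n_sd * spread
    )
```

Because the mean and standard deviation are computed from the same cohort, a single extreme sample inflates the spread; the filter is therefore most informative in large cohorts.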


Alleles identical by descent (IBD) analysis

Duplicated or related samples were identified through alleles identical by descent (IBD) using KING version 1.9 software (Manichaikul et al., 2010). If two samples carry two sets of alleles IBD, they are either duplicates or monozygotic twins. If they share 50% of alleles IBD they are first-degree relatives (parent and child), whereas if they share 25% of alleles IBD they are second degree relatives (Turner et al., 2011). Here, duplicate samples and samples that were first or second degree relatives were identified; in each case the sample with the least amount of missing data was retained while the other was removed using PLINK.
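The rule for deciding which member of a related pair to discard can be sketched as below. The pair list would come from the KING output and the missingness values from PLINK; here both are represented by hypothetical in-memory structures, and the tie-breaking behaviour (dropping the second sample when missingness is equal) is an assumption.

```python
def samples_to_remove(related_pairs, missing_rate):
    """For each duplicate/related pair, drop the sample with more missing data.

    related_pairs: iterable of (sample_a, sample_b) tuples
    missing_rate:  dict mapping sample ID to its proportion of missing genotypes
    """
    removed = set()
    for sample_a, sample_b in related_pairs:
        if sample_a in removed or sample_b in removed:
            continue  # the pair is already broken by an earlier removal
        if missing_rate[sample_a] > missing_rate[sample_b]:
            removed.add(sample_a)
        else:
            removed.add(sample_b)
    return removed
```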

Principal component analysis (PCA)

Next, measures were taken to control for population stratification; a phenomenon by which the differing population origin of samples causes systematic genetic differences in the cohort. If not accounted for, population stratification might lead to spurious associations that reflect ancestry rather than disease risk. Principal Component Analysis (PCA) was used to control for population stratification. PCA is a method by which the variance in the dataset is explained by a set of uncorrelated linear combinations of the original variables, known as principal components. The number of principal components is dependent on the dimensionality of the data. The first principal component explains the greatest amount of variance; each subsequent principal component explains a diminishing amount of variance.

Here, the data was firstly converted to EIGENSTRAT format using the convertf program within the EIGENSOFT package (Patterson et al., 2006; Price et al., 2006). Samples with differential genetic ancestry were then identified using the SNPweights package (Chen et al., 2013). SNPweights uses pre-computed SNP weights from external reference panels, such as HapMap, that are applied to the first two principal components of the dataset. SNP weights for European, West African and East Asian populations were used to filter out samples of non-European descent.

Marker quality control in PLINK

Unplaced, mitochondrial, Y SNPs, missingness and minor allele frequency

PLINK was used to remove any unplaced, mitochondrial or Y-chromosome variants. Variants were removed if more than 2% of genotypes were missing across the cohort. Monomorphic variants and rare variants with a minor allele frequency (MAF) of < 1% were removed. In a small-scale study such as this, rare variants are likely to present as false positives and true rare associations may not be found due to a lack of power.
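The missingness and MAF thresholds can be expressed as a single per-marker check. The sketch below is illustrative only (the study applied these filters in PLINK); the genotype coding, with 0, 1 or 2 copies of one allele and None for a missing call, is an assumption.

```python
def marker_passes(genotypes, max_missing=0.02, min_maf=0.01):
    """Apply the missingness and MAF filters to one marker.

    genotypes: list of allele counts (0, 1 or 2) per sample, None if missing.
    A monomorphic marker has MAF 0 and therefore fails the MAF filter.
    """
    n_total = len(genotypes)
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    missing = (n_total - len(called)) / n_total
    allele_freq = sum(called) / (2 * len(called))
    maf = min(allele_freq, 1 - allele_freq)
    return missing <= max_missing and maf >= min_maf
```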

Hardy-Weinberg Equilibrium

Next, SNPs that departed from the Hardy-Weinberg Equilibrium (HWE) were removed. The HWE predicts the proportion of genotypes at a bi-allelic locus in a population, which should remain stable across generations. Variants significantly deviating from the HWE may result from genotyping errors, although some deviation is expected at true disease signals. Here, the --hwe function in PLINK was used to detect and remove all variants with a HWE p-value of < 1 x 10-3.
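To make the HWE check concrete, the sketch below computes a one-degree-of-freedom chi-square goodness-of-fit P-value from genotype counts. Note that this is a simpler stand-in chosen for illustration; PLINK's --hwe filter uses its own test and the two will not give identical P-values. The function name and inputs are assumptions.

```python
import math


def hwe_chi_square_p(n_aa, n_ab, n_bb):
    """Chi-square (1 d.f.) P-value for departure from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        return 1.0  # monomorphic marker: no departure can be measured
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2.0))
```

Under the threshold stated above, a marker would be removed when the returned value is below 1 x 10-3.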

1.3.3 Merging case-control datasets To create the final LOP dataset, the filtered Manchester dataset was merged with the genotyped data for the BSTOP LOP samples and the arcOGEN osteoarthritis samples using PLINK. Initially, each dataset was aligned to the Haplotype Reference Consortium (HRC) reference panel GRCh37 (McCarthy et al., 2016) using the checking tool HRC-1000G- check-bim.pl version 4.2 by William Rayner (available at http://www.well.ox.ac.uk/~wrayner/tools/#Checking). This perl script is designed to convert SNPs to the correct strand, ID name, position, alleles and reference or alternative assignment of alleles. PLINK was then used to merge the three datasets based on the common variants between them. To check for duplicated or related samples, IBD analysis was again conducted and any matches were removed.

PCA was conducted on the merged dataset in order to identify any confounding variables that should be included as covariates. A pruned set of independent variants (HapMap3) with a MAF > 5% was generated. The EIGENSOFT package was then used to perform PCA and a scree plot was generated to determine a suitable number of principal components to include in the final analysis.
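A scree plot displays the proportion of variance attributable to each principal component. A minimal sketch of that calculation is given below; the eigenvalues would come from the EIGENSOFT output, and the function name and input format are assumptions.

```python
def variance_explained(eigenvalues):
    """Proportion of total variance explained by each principal component."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    return [value / total for value in eigenvalues]
```

Plotting these proportions against component number gives the scree plot; the point at which the curve flattens is commonly taken as the number of components to retain.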

1.3.4 Imputation
The SNPs in the final merged dataset were used to impute further SNPs genome-wide. Imputation increases the power of the study and allows for fine-mapping of causal variants. This can be performed by the Michigan Imputation Server; a powerful, free resource that uses Minimac3 to impute genotyped datasets (Das et al. (2016); available at https://imputationserver.sph.umich.edu/index.html). Here the QC’d dataset was uploaded to the server and imputation was carried out based on haplotypes from the HRC reference panel (European population). The imputed, unphased dataset was downloaded and underwent post-imputation QC. This involved removing multi-allelic variants and variants with a MAF < 5%. Additionally, poorly-imputed variants were removed based on an imputation R2 < 0.5. The imputation R2 ranges from 0 to 1 and is an estimate of the squared correlation between the imputed genotypes and the true, unobserved genotypes.

1.3.5 Association analysis Case-control analyses compare allele frequencies between the case cohort and the control cohort to identify variants associated with the phenotype.

1.3.5.1 Frequentist test for association
The case-control analysis was performed using a test that incorporates genotype uncertainty, since SNPs are imputed with varying levels of confidence that should be accounted for in the analysis. Frequentist tests for association manage this by weighting each SNP by its imputation probability whilst observing its effect on disease risk. Here, a frequentist association method “Score” was conducted using SNPTEST version 2.5.2, available at https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html. The Score method performs a missing data likelihood score test utilising genotype probabilities (Marchini and Howie, 2010). In addition, the first two principal components from the PCA on the merged dataset (Section 1.3.3) were included as covariates. An additive model was assumed in that each copy of the risk allele adds a uniform amount of risk of disease, i.e. if the risk of genotype Aa is x, the risk of genotype AA is 2x (Bush and Moore, 2012).
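One simple way to see how genotype uncertainty enters an additive model is through the expected allele dosage, which weights the allele count of each genotype by its imputation probability. The sketch below illustrates this idea only; it is not the SNPTEST Score test, which works with the full genotype probabilities in a missing data likelihood. The function name and argument order are assumptions.

```python
def expected_dosage(p_hom_ref, p_het, p_hom_risk):
    """Expected number of risk alleles given the three genotype probabilities."""
    total = p_hom_ref + p_het + p_hom_risk
    if abs(total - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    # Additive coding: 0, 1 or 2 copies of the risk allele
    return p_het + 2.0 * p_hom_risk
```

A confidently imputed heterozygote has a dosage of 1, whereas an uncertain call such as probabilities of 0.1, 0.3 and 0.6 gives a dosage of 1.5.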

For each variant an allelic odds ratio (OR) is produced; this defines the odds that a person with allele A of a variant will display the phenotype compared with the odds of a person with allele a of the variant (Clarke et al., 2011). The directional effect of an allele is defined by the OR: an OR > 1 for an allele at higher frequency in the case group indicates that the allele is a risk allele for disease, whereas an OR < 1 indicates that the allele is protective for disease. 95% confidence intervals (CI) for the OR give an indication of the reliability of the OR; a large spread of values for the CI indicates that the OR may not be precise (Szumilas, 2010). In addition, the genetic association is generally not significant if the CI overlaps 1. Here, OR and 95% CI were reported for all associations.
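The allelic OR and its confidence interval can be illustrated from a 2 x 2 table of allele counts. The sketch below uses the standard log-OR (Woolf) approximation; it is an assumption-laden simplification, since the ORs reported in this study came from SNPTEST with covariates and genotype probabilities rather than from raw allele counts. All names are illustrative.

```python
import math


def allelic_odds_ratio(case_b, case_a, control_b, control_a, z=1.96):
    """Allelic OR for allele B with an approximate 95% confidence interval.

    Arguments are allele counts (not genotype counts) in cases and controls.
    """
    odds_ratio = (case_b * control_a) / (case_a * control_b)
    se_log_or = math.sqrt(
        1 / case_b + 1 / case_a + 1 / control_b + 1 / control_a
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

With hypothetical counts of 200 B and 800 A alleles in cases and 100 B and 900 A alleles in controls, the OR is 2.25 and the interval lies entirely above 1, consistent with a risk effect of allele B.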

1.3.5.2 Correction for multiple testing A P-value for the significance of the odds ratio is also generated by the association test. Here, the P-value threshold for significant associations was set at P = 5 x 10-8, based on a Bonferroni correction for multiple testing assuming 1,000,000 independent SNPs in the genome (calculated from 0.05 ÷ 1,000,000). The threshold for suggestive significance was set at P = 1 x 10-5, in line with other studies (Becker et al., 2016; Pezzolesi et al., 2009; Zhang et al., 2016).
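The two thresholds can be written down directly. The sketch below reproduces the Bonferroni arithmetic stated above and classifies a P-value against the genome-wide and suggestive thresholds; the function and constant names are illustrative assumptions.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


GENOME_WIDE = bonferroni_threshold(0.05, 1_000_000)  # approximately 5e-8
SUGGESTIVE = 1e-5                                    # threshold from the text


def classify(p_value):
    """Label a P-value as genome-wide, suggestive or not significant."""
    if p_value < GENOME_WIDE:
        return "genome-wide"
    if p_value < SUGGESTIVE:
        return "suggestive"
    return "not significant"
```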

1.3.5.3 Testing for independent signals in the MHC The MHC locus is a region of complex LD patterns. Multiple independent signals in the MHC have been shown to be associated with psoriasis (Knight et al., 2012; Okada et al., 2014a). In order to identify independent signals in the MHC locus, conditional analyses were carried out using SNPTEST. Firstly, a frequentist association test was carried out conditioning on the additive effect from the top SNP identified in the unconditional analysis. If a second significant signal was identified, the next test conditioned on the SNP from the unconditional analysis and the SNP from the conditional analysis, and so on until no further significant signals were identified.
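The stepwise procedure can be outlined as a loop. In this sketch, run_association is a hypothetical callable standing in for a SNPTEST run that conditions on the supplied SNPs and returns the top remaining SNP with its P-value; the genome-wide threshold as the stopping rule and the toy results are assumptions made for illustration, not study data.

```python
def stepwise_conditional(run_association, threshold=5e-8, max_rounds=20):
    """Repeatedly condition on the top SNP until no significant signal remains.

    run_association: callable taking a tuple of conditioning SNP IDs and
                     returning (top_snp, p_value) for the conditioned analysis.
    """
    conditioned = []
    for _ in range(max_rounds):
        top_snp, p_value = run_association(tuple(conditioned))
        if top_snp is None or p_value >= threshold:
            break
        conditioned.append(top_snp)
    return conditioned


# Toy illustration with made-up SNP names and P-values
TOY_RESULTS = {
    (): ("snp1", 1e-12),
    ("snp1",): ("snp2", 1e-11),
    ("snp1", "snp2"): ("snp3", 1e-4),
}
selected = stepwise_conditional(lambda cond: TOY_RESULTS[cond])
```

In the toy example two independent signals are retained and the loop stops at the third round, where the top P-value no longer passes the threshold.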

1.3.5.4 Annotation of results
A Manhattan plot and a q-q plot of the data were generated using the R package qqman, available at https://cran.r-project.org/web/packages/qqman/index.html. Regional association plots of P-value results and LD patterns were generated for lead SNPs above suggestive significance using Locuszoom (Pruim et al., 2010). To identify overlap with previously reported GWAS traits, all SNPs at suggestive significance were run through the PhenoScanner tool (Staley et al., 2016), available at http://www.phenoscanner.medschl.cam.ac.uk/phenoscanner. This tool utilises a curated database of GWAS summary statistics and published results allowing the user to search for traits associated with a variant. The tool incorporates LD information from 1000 Genomes to collect data on variants correlated with the input SNP. Here, the lead variants from non-MHC loci were run through PhenoScanner with a P-value cutoff of 1 x 10-5 and searching for proxy variants with r2 > 0.8 (1000 Genomes).


1.3.5.5 Post-analysis QC of novel signals Novel loci were further assessed for QC metrics in order to identify potential false positive associations. Each novel signal was firstly assessed by the SNPTEST info score, which gives an indication of genotype uncertainty at each SNP. This score is closely correlated with the imputation R2 score (Marchini and Howie, 2010). An info score of > 0.7 was set for novel locus prioritisation. Next, loci were assessed by the strength of signal in the regional association plot. Singleton SNPs that had no other SNPs in high LD (r2 > 0.6) in the analysis were not prioritised for follow-up.
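The two prioritisation criteria can be combined into a single check, sketched below. The function name and the representation of LD support as a list of r2 values between the lead SNP and other SNPs in the analysis are assumptions; the thresholds are those stated above.

```python
def prioritise(info_score, r2_with_other_snps, min_info=0.7, min_r2=0.6):
    """True if a novel lead SNP passes both post-analysis QC criteria.

    info_score:         SNPTEST info score of the lead SNP
    r2_with_other_snps: r2 values between the lead SNP and other analysed SNPs
    """
    has_ld_support = any(r2 > min_r2 for r2 in r2_with_other_snps)
    return info_score > min_info and has_ld_support
```

A lead SNP with a high info score but no supporting SNPs in LD is treated as a singleton and is not prioritised.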


1.4 Results

1.4.1 Samples Pre-QC, genotype data was collected on 709 LOP cases and 3,422 controls. Post-QC, 621 LOP cases and 3,422 controls remained. A detailed break-down of the QC of the Manchester PsA cohort is described below.

1.4.2 Genotyping of the Manchester PsA cohort In total, genotyping data was collected on 2281 PsA patients in the Manchester cohort, containing 575 LOP patients. Within GenomeStudio software, 57 samples were removed due to a call rate of less than 90%. The dataset exported from GenomeStudio included 2224 samples and 547,644 variants. After further quality control in PLINK, 144 samples and 274,837 variants were filtered out leaving 2080 samples and 272,807 variants remaining (Figure 12). Of these, 520 patients had LOP.


A

| QC measure (SNPs) | Failed SNPs |
|---|---|
| Unplaced, Y chromosome or mitochondrial SNPs | 3,080 |
| Call rate < 98% | 8,852 |
| MAF < 0.01 | 255,201 |
| Deviating from HWE | 4,873 |
| Duplicates | 5,208 |
| Total unique SNPs to remove | 274,837 |

| QC measure (samples) | Failed samples |
|---|---|
| Failing IBD analysis | 45 |
| Autosomal heterozygosity deviation or call rate < 98% | 64 |
| Sex check | 10 |
| Failing PCA | 25 |
| Total unique samples to remove | 144 |

B [plot of sample heterozygosity rate and sample call rate]

C 2,224 PsA samples (575 LOP), 547,644 SNPs → Dataset QC (274,837 SNPs and 144 samples removed) → 2,080 samples (520 LOP), 272,807 SNPs

Figure 12: Overview of quality control of the Manchester PsA cohort using PLINK. A total of 274,837 SNPs and 144 samples failed quality control measures (A). The majority of samples did not deviate from expected heterozygosity rate and had sample call rate > 0.98 (B). Overall, 2080 samples and 272807 SNPs remained in the analysis (C).

1.4.3 Merging case-control datasets
Combining the filtered genotype data on the LOP patients from the Manchester cohort (520) with those from the BSTOP dataset (134) gave a total of 654 LOP cases, which were merged with the 3422 OA controls from arcOGEN. IBD analysis on the merged dataset caused one further LOP sample to be dropped. Additionally, it was noted that 32 cases had disparate phenotypic information; these patients were coded as LOP but their recorded age of onset was less than 40. Following removal of these samples, there were 621 cases and 3422 controls in the analysis. In the final case cohort the median age of psoriasis onset was 50 (interquartile range 45-56; missing = 5.0%) (Appendix, Figure 56).

Table 3: Overview of case and control cohorts

| | Cases | Controls |
|---|---|---|
| Males | 341 (54.9%)* | 1,265 (37.0%) |
| Females | 280 (45.1%)* | 2,157 (63.0%) |

* two male and two female samples from the case cohort had missing gender; their gender was inferred from X chromosome heterozygosity in PLINK

There were 130,949 common SNPs across the three arrays. A PCA conducted on independent variants in the analysis identified some sample stratification (Figure 13A); a scree plot suggested that the first two principal components should be included as covariates in the final analysis (Figure 13B).


Figure 13: PCA of merged dataset A plot of PC1 against PC2 indicated stratification that was present in both cases and controls (A). A scree plot indicated that the first two principal components should be conditioned on in the final analysis (B).


1.4.4 Imputation Imputation of the merged dataset yielded 39,117,105 variants. An imputation R2 cut-off of 0.5 ensured that poorly imputed variants were removed (Figure 14A). After further filtering for MAF > 5%, there were 5,223,736 variants remaining in the analysis.


Figure 14: Quality scores of variants A frequency polygon of R2 scores across all imputed variants indicated a peak of poorly imputed variants (R2 < 0.1); a cut-off of R2 > 0.5 was introduced (A). A frequency polygon of SNPtest info scores indicated that the majority of variant calls in the final analysis were of high quality (info > 0.9); a cut-off of info > 0.7 was later used for novel locus prioritisation (B).

1.4.5 Association analysis
Following case-control analysis in SNPTEST, a qq-plot (Figure 15) and Manhattan plot (Figure 16) of the results were produced. Genome-wide significance was considered to be P < 5 x 10-8. At this threshold, two regions were associated with LOP: rs1655901 in the MHC near human leukocyte antigen A (HLA-A) and rs13435715 near Homo sapiens solute carrier family 4 member 4 (SLC4A4). The MHC is a known risk locus for both EOP and LOP (Hebert et al., 2014a; Tsoi et al., 2012) whereas the SLC4A4 locus has not yet been identified in psoriasis.


Suggestive significance was considered to be P < 1 x 10-5. At this threshold, 20 loci were associated with LOP. Of these, four loci had previously been identified in EOP and LOP: rs11209006 near interleukin 23 receptor (IL23R), rs2546890 near interleukin 12B (IL12B), rs34977319 at interferon induced with helicase C domain 1 (IFIH1) and rs2673305 near TRAF3 interacting protein 2 (TRAF3IP2). The remaining 16 loci have not yet been identified in psoriasis. A summary of all loci identified at P < 1 x 10-5 is presented in Table 4.

Figure 15: q-q plot for all variants in the LOP association analysis

The q-q plot was generated using the R package qqman. The plot shows the expected –log10 P-values (X-axis) against the observed –log10 P-values (Y-axis).


[Manhattan plot. Annotated loci: IL23R, IFIH1, KAT2B, NAALADL2, SLC4A4, FAT4, TRIO, SH3RF2, IL12B (LOC285626), HLA-A, TRAF3IP2 (WISP3), CHRNA6, TRPS1, SH3GL2, PCSK5, TBC1D4, DAOA, DHRS2, ELAC2, GGTLC1, NCAM2, DSCAM. Legend: known psoriasis associations; novel associations.]

Figure 16: Manhattan plot showing genome-wide results of the LOP association analysis

-log10 P-values are plotted against genomic location. The index SNP in each locus is annotated with the nearest or most notable gene. Known associations for EOP and LOP are written in blue text whilst novel psoriasis associations are written in black text. Genome-wide significance is set at P < 5 x 10-8 (red line); suggestive significance is set at P < 1 x 10-5 (blue line).


Table 4: Results of the LOP association analysis at P < 1 x 10-5
Chr, chromosome; CI, confidence interval; MAF, minor allele frequency; OR, odds ratio; *conditional on rs1655901 at HLA-A

| Locus | SNP | Chr | Position (hg19) | Allele A | Allele B | Info score | Total MAF | MAF (cases) | MAF (controls) | Allelic OR (B allele) | 95% CI | P-value | Annotation | Nearby genes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1p31.3 | rs11209006 | 1 | 67619259 | T | G | 0.99 | 0.35 | 0.41 | 0.34 | 1.39 | 1.23-1.57 | 1.70 x 10-7 | 13 kb 5' of IL23R | IL23R |
| 2q24.2 | rs34977319 | 2 | 163134428 | C | T | 0.61 | 0.09 | 0.11 | 0.08 | 1.42 | 1.17-1.73 | 4.42 x 10-6 | Intronic: IFIH1 | IFIH1 |
| 3p24.3 | rs2929398 | 3 | 20100054 | G | A | 0.88 | 0.17 | 0.21 | 0.16 | 0.71 | 0.61-0.83 | 1.98 x 10-6 | Intronic: KAT2B | KAT2B |
| 3q26.31 | rs55941837 | 3 | 174419314 | G | A | 0.91 | 0.41 | 0.47 | 0.40 | 1.35 | 1.20-1.53 | 2.64 x 10-7 | Intronic: NAALADL2 | NAALADL2 |
| 4q13.3 | rs13435715 | 4 | 72016010 | A | G | 0.64 | 0.18 | 0.23 | 0.17 | 1.41 | 1.22-1.63 | 9.20 x 10-9 | 37 kb 5' of SLC4A4 | SLC4A4 |
| 4q28.1 | rs79726809 | 4 | 125983451 | C | G | 0.92 | 0.06 | 0.09 | 0.06 | 1.62 | 1.30-2.02 | 3.23 x 10-6 | Intergenic | FAT4, ANKRD50 |
| 5p15.2 | rs9637830 | 5 | 14526295 | G | T | 0.52 | 0.06 | 0.08 | 0.06 | 1.46 | 1.16-1.84 | 3.88 x 10-6 | 15 kb 3' of TRIO | TRIO |
| 5q32 | rs10223150 | 5 | 145305073 | T | C | 0.54 | 0.25 | 0.21 | 0.26 | 0.78 | 0.67-0.90 | 6.83 x 10-6 | 11 kb 5' of SH3RF2 | SH3RF2 |
| 5q33.3 | rs2546890 | 5 | 158759900 | A | G | 1.00 | 0.47 | 0.41 | 0.48 | 0.74 | 0.65-0.83 | 8.69 x 10-7 | LOC285626 (IL12B) | IL12B |
| 6p22.1 | rs1655901 | 6 | 29916804 | T | C | 0.94 | 0.41 | 0.33 | 0.43 | 1.54 | 1.36-1.75 | 2.07 x 10-12 | 3 kb 3' of HLA-A | HLA-A |
| 6p22.1 | rs2040748 | 6 | 31243785 | G | T | 1.00 | 0.23 | 0.30 | 0.22 | 1.55* | 1.36-1.77 | 9.98 x 10-12* | 3.9 kb 5' of HLA-C | HLA-C |
| 6q21 | rs2673305 | 6 | 112345963 | A | C | 0.78 | 0.35 | 0.30 | 0.36 | 1.31 | 1.15-1.49 | 5.06 x 10-6 | 29 kb 5' of WISP3 | TRAF3IP2 |
| 8p11.21 | rs78116261 | 8 | 42663621 | G | A | 0.92 | 0.14 | 0.18 | 0.14 | 1.42 | 1.21-1.66 | 4.37 x 10-6 | Intergenic | CHRNA6, RNF170 |
| 8q23.3 | rs800582 | 8 | 116864531 | C | G | 0.75 | 0.16 | 0.20 | 0.16 | 0.73 | 0.63-0.85 | 2.94 x 10-6 | Intergenic | TRPS1 |
| 9p22.2 | rs12002358 | 9 | 18054030 | G | A | 0.93 | 0.10 | 0.07 | 0.11 | 0.58 | 0.46-0.73 | 1.64 x 10-6 | Intergenic | SH3GL2 |
| 9q21.13 | rs7866089 | 9 | 78291861 | G | T | 0.84 | 0.40 | 0.46 | 0.39 | 1.32 | 1.17-1.49 | 1.72 x 10-6 | Within snoU13 | PCSK5 |
| 13q22.2 | rs1929756 | 13 | 76804369 | G | T | 0.87 | 0.27 | 0.32 | 0.26 | 1.31 | 1.15-1.50 | 8.97 x 10-6 | Intergenic | TBC1D4, KLF12 |
| 13q33.2 | rs16964862 | 13 | 105165666 | T | G | 1.00 | 0.19 | 0.14 | 0.20 | 0.67 | 0.57-0.79 | 2.81 x 10-6 | Intergenic | DAOA, ERCC5 |
| 14q11.2 | rs2149311 | 14 | 24275047 | A | G | 0.91 | 0.07 | 0.09 | 0.06 | 1.58 | 1.27-1.96 | 7.37 x 10-6 | Intergenic | DHRS2, DHRS4 |
| 17p12 | rs28664578 | 17 | 13011553 | G | A | 0.77 | 0.05 | 0.08 | 0.05 | 1.65 | 1.31-2.08 | 3.45 x 10-7 | 90 kb 5' of ELAC2 | ELAC2 |
| 20p11.21 | rs2424608 | 20 | 23969285 | C | G | 0.80 | 0.08 | 0.11 | 0.07 | 0.65 | 0.53-0.79 | 1.83 x 10-6 | 5'-UTR of GGTLC1 | GGTLC1 |
| 21q21.1 | rs59332578 | 21 | 23527056 | A | G | 0.91 | 0.08 | 0.11 | 0.07 | 1.53 | 1.25-1.87 | 7.18 x 10-6 | Intergenic | NCAM2 |
| 21q22.2 | rs76977037 | 21 | 42087560 | T | C | 0.59 | 0.05 | 0.07 | 0.05 | 1.50 | 1.17-1.91 | 9.08 x 10-6 | Intronic: DSCAM | DSCAM |

93

93

1.4.5.1 Conditional analysis in the MHC

The strongest signal in this study was in the MHC locus at rs1655901 (P = 2.07 x 10-12, OR = 1.54), located approximately 3.3 kb 3’ of the HLA-A gene. Conditional analysis on rs1655901 revealed a second significant association at rs2040748 (P = 9.98 x 10-12, OR = 1.55) located approximately 3.9 kb 5’ of the HLA-C gene. Conditioning on the HLA-A signal and the HLA-C signal did not reveal any further associations in the MHC.

1.4.5.2 Overlap with other traits in GWAS datasets

To determine whether any of the loci had previously been reported in a GWAS, the lead variants from non-MHC loci were run through PhenoScanner with a P-value cutoff of 1 x 10-5, searching for proxy variants with r2 > 0.8. Four of the loci were in LD with SNPs with evidence of association with other traits: IL23R, KAT2B, IL12B and TRPS1 (Table 5). Of these, the lead SNPs at IL23R and IL12B were in tight LD with previously reported psoriasis associations, and the effect was in the same direction (Ellinghaus et al., 2012b; Yin et al., 2015). At IL23R there was also evidence of an association with Crohn’s disease, which had the opposite direction of effect to the LOP association. At IL12B, there was evidence of an association with multiple sclerosis and PsA; of these, the direction of effect was only available for multiple sclerosis, which was in the same direction as the LOP association (Table 5) (Huffmeier et al., 2010; Sawcer et al., 2011).

At 3p24.3 (KAT2B), the lead variant was in tight LD with SNPs at suggestive significance for HDL cholesterol change with statins and serum ratio of gamma glutamylthreonineserine (Barber et al., 2010; Suhre et al., 2011). At 8q23.3 (TRPS1), the lead SNP was a perfect proxy for a SNP at suggestive significance for rheumatoid arthritis (Gregersen et al., 2009). None of the other lead variants were in tight LD with SNPs associated with traits in the available GWAS datasets.


Table 5: Overlap of lead variants with previously-reported GWAS traits

In each locus the lead variant was run through PhenoScanner with a P-value cutoff of 1 x 10-5 and searching for proxy variants with r2 > 0.8. IBDGC, inflammatory bowel disease genetics consortium.

| Locus | Gene | LOP GWAS SNP | LOP GWAS P-value | Proxy SNP | r2 | Trait | Proxy P-value | Same direction of effect? | Source |
|---|---|---|---|---|---|---|---|---|---|
| 1p31.3 | IL23R | rs11209006 | 1.70 x 10-7 | rs4655683 | 0.99 | Crohn's disease | 3.30 x 10-21 | No | IBDGC |
| 1p31.3 | IL23R | rs11209006 | 1.70 x 10-7 | rs2295359 | 0.95 | Psoriasis | 8.00 x 10-8 | Yes | Yin et al. (2015) |
| 3p24.3 | KAT2B | rs2929398 | 1.98 x 10-6 | rs2929404 | 0.94 | HDL cholesterol change with statins | 8.90 x 10-7 | N/A | Barber et al. (2010) |
| 3p24.3 | KAT2B | rs2929398 | 1.98 x 10-6 | rs2929404 | 0.94 | Serum ratio of gamma glutamylthreonineserine | 1.70 x 10-6 | N/A | Suhre et al. (2011) |
| 5q33.3 | IL12B | rs2546890 | 8.69 x 10-7 | rs2546890 | 1 | Psoriasis | 1.00 x 10-20 | Yes | Ellinghaus et al. (2012a) |
| 5q33.3 | IL12B | rs2546890 | 8.69 x 10-7 | rs2546890 | 1 | Multiple sclerosis | 1.00 x 10-11 | Yes | Sawcer et al. (2011) |
| 5q33.3 | IL12B | rs2546890 | 8.69 x 10-7 | rs2546890 | 1 | Psoriatic arthritis | 3.00 x 10-9 | Unknown | Huffmeier et al. (2010) |
| 8q23.3 | TRPS1 | rs800582 | 2.94 x 10-6 | rs800583 | 1 | Rheumatoid arthritis | 7.05 x 10-7 | Unknown | Gregersen et al. (2009) |

1.4.5.3 Replication of LOP signals

In order to identify any validated LOP risk loci, the signals reported by the previous LOP Immunochip study (Hebert et al., 2014a) were checked for association and direction of effect in the present study (Table 6). Hebert et al. (2014a) reported nine loci associated with LOP; genotype data was available for the lead variant at seven of these loci in the current analysis (Table 6). Importantly, the direction of effect was found to be the same between the Immunochip study and the current analysis in each locus.

Three loci reached suggestive significance for LOP for the queried Immunochip SNP (IL12B, IL23R and HLA-A). In the case of IL12B, the lead SNP in the present analysis (rs2546890; P = 8.69 x 10-7, OR = 1.36) was the same lead SNP reported in the LOP Immunochip study. rs2546890 lies within the non-coding gene LOC285626 approximately 2.4 kb upstream of IL12B. At IL23R the lead SNP in the present analysis (rs11209006; P = 1.70 x 10-7, OR = 1.39) is in strong LD (r2 = 0.87) with the lead SNP at IL23R in the Immunochip study, rs72676067, which also reached suggestive significance in the present analysis (Table 6). rs11209006 lies 13 kb 5’ of the IL23R gene.

Within the MHC, the lead SNP at HLA-A in the present analysis (rs1655901, P = 2.07 x 10-12, OR = 1.54) is in marginal LD (r2 = 0.29) with the Immunochip SNP rs2256919, which also reached suggestive significance in the present analysis (Table 6). Similarly at the HLA-C locus, the lead SNP rs2040748, which was significant after conditioning on rs1655901 at HLA-A (P = 9.98 x 10-12, OR = 1.55), was in marginal LD (r2 = 0.29) with the Immunochip SNP rs2256919. However, genotype data on the lead Immunochip SNP rs13191099 was not available in the present analysis (Table 6).

Table 6: Comparison with LOP Immunochip findings

| Locus | Gene | iChip SNP | Risk/non risk | iChip P-value | iChip OR (95% CI) | GWAS P-value | GWAS OR (95% CI) | Same direction of effect? |
|---|---|---|---|---|---|---|---|---|
| 1p31.3 | IL23R | rs72676067 | A/G | 1.87 x 10-6 | 1.38 (1.21-1.58) | 1.91 x 10-6 | 1.35 (1.19-1.53) | Yes |
| 2q12.1 | IL1R1 | rs887998 | A/G | 8.81 x 10-6 | 1.40 (1.21-1.62) | 0.17 | 1.11 (0.96-1.28) | Yes |
| 2q24.2 | IFIH1 | rs1990760 | A/G | 7.97 x 10-6 | 1.35 (1.18-1.55) | 2.35 x 10-4 | 1.27 (1.11-1.44) | Yes |
| 5q33.3 | IL12B | rs2546890 | A/G | 7.16 x 10-12 | 1.57 (1.38-1.79) | 8.69 x 10-7 | 1.36 (1.20-1.54) | Yes |
| 6p22.1 | HLA-A | rs2256919 | A/C | 2.54 x 10-6 (cond. HLA-C) | 1.38 (1.21-1.58) | 8.31 x 10-12 | 1.58 (1.39-1.80) | Yes |
| 6p21.33 | HLA-C | rs13191099 | G/A | 3.73 x 10-10 | 1.72 (1.46-2.02) | N/A | N/A | N/A |
| 6q21 | TRAF3IP2 | rs71562288 | G/A | 1.64 x 10-6 | 1.61 (1.34-1.58) | 7.59 x 10-4 | 1.39 (1.15-1.67) | Yes |
| 12q13.3 | IL23A | rs10876881 | G/A | 8.11 x 10-6 | 1.99 (1.43-2.75) | N/A | N/A | N/A |
| 20q13.13 | RNF114 | rs60813083 | C/A | 5.53 x 10-6 | 1.63 (1.33-1.55) | 0.08 | 1.20 (0.97-1.47) | Yes |

For the four remaining loci where the lead Immunochip SNP could be queried in the current GWAS dataset, the SNP did not reach suggestive significance (IFIH1, TRAF3IP2, RNF114 and IL1R1). However, in two of these loci the SNP was associated at P < 1 x 10-3 (IFIH1 and TRAF3IP2) (Table 6). At IFIH1 the previously reported missense SNP rs1990760 was associated with LOP at P = 2.35 x 10-4 whilst at TRAF3IP2 the previously reported SNP rs71562288 was associated with LOP at P = 7.59 x 10-4 in the present study (Table 6). At the previously reported loci near RNF114, IL23A and IL1R1 there was no evidence for association with LOP in the present analysis. The 2q12.1 (IL1R1) locus is hypothesised to differentiate LOP from EOP; however it did not reach suggestive significance here. In the current analysis, rs887998 itself was not associated with disease (P = 0.17) but the direction of effect was the same as the LOP Immunochip study (Table 6).
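The tiering applied to the Immunochip loci above can be made explicit with a short sketch. The Python below is illustrative only and is not the analysis code used in this study: the P-values are copied from Table 6 and the two thresholds (1 x 10-5 and 1 x 10-3) are those stated in the text, whereas the function name and tier labels are assumptions introduced here.

```python
# Thresholds stated in the text; the constant names are illustrative.
SUGGESTIVE = 1e-5
NOMINAL = 1e-3

# GWAS P-values for the lead Immunochip SNPs, copied from Table 6.
# None marks loci where the SNP could not be queried in the GWAS dataset.
gwas_p = {
    "IL23R": 1.91e-6,
    "IL1R1": 0.17,
    "IFIH1": 2.35e-4,
    "IL12B": 8.69e-7,
    "HLA-A": 8.31e-12,
    "HLA-C": None,
    "TRAF3IP2": 7.59e-4,
    "IL23A": None,
    "RNF114": 0.08,
}


def replication_tier(p_value):
    """Assign a hypothetical replication label to one GWAS P-value."""
    if p_value is None:
        return "not available"
    if p_value < SUGGESTIVE:
        # Anything below 1e-5 is grouped here, including HLA-A.
        return "suggestive"
    if p_value < NOMINAL:
        return "P < 1e-3"
    return "no evidence"


tiers = {gene: replication_tier(p) for gene, p in gwas_p.items()}

for gene, tier in tiers.items():
    print(f"{gene}: {tier}")
```

Under these thresholds IL23R, IL12B and HLA-A fall in the first tier, IFIH1 and TRAF3IP2 in the second, and IL1R1 and RNF114 in the last, in line with the description above.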


Figure 17: Regional association plots (Locuszoom) for known psoriasis loci in the LOP GWAS


1.4.5.4 Putative novel LOP loci

Seventeen putative novel loci were associated with LOP in this study: 3p24.3, 3q26.31, 4q13.3, 4q28.1, 5p15.2, 5q32, 8p11.21, 8q23.3, 9p22.2, 9q21.13, 13q22.2, 13q33.2, 14q11.2, 17p12, 20p11.21, 21q21.1 and 21q22.2 (Table 4, page 92). None of these loci have previously been shown to be associated with LOP or psoriasis as a whole. One of the loci was associated with LOP at genome wide significance: 4q13.3 near Solute Carrier Family 4 Member 4 (SLC4A4) (rs13435715, P = 9.20 x 10-9, OR = 1.41). In order to determine if this was a true signal, the quality metrics underlying the lead SNP were assessed. The info score reported by SNPTEST for rs13435715 was fairly low (0.64), indicating poor imputation. Additionally, rs13435715 was a singleton with no other SNPs in tight LD (Figure 18). Therefore, this association should be regarded as a potential false positive.

The remaining putative novel loci were associated with LOP at suggestive significance (P < 1 x 10-5). To rule out further false positive signals, the loci were filtered for a SNPTEST info score > 0.7. Additionally, singleton associations were discarded. Three signals were found to have a low info score: rs9637830 at 5p15.2 (TRIO), rs10223150 at 5q32 (SH3RF2) and rs76977037 at 21q22.2 (DSCAM) (Table 7). One locus was found to harbour a singleton SNP with no SNPs in strong LD (r2 > 0.6): rs28664578 at 17p12 (ELAC2) (Figure 18). These associations are potential false positives.
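The two filters described above (info score and singleton status) can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the study. The info scores are copied from Table 4 and the singleton flags from Table 7; loci absent from Table 7 are assumed here to be non-singletons, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

INFO_THRESHOLD = 0.7  # SNPTEST info score cut-off stated in the text


@dataclass(frozen=True)
class Locus:
    locus: str
    gene: str
    info: float       # SNPTEST info score of the lead SNP (Table 4)
    singleton: bool   # Table 7 flag; assumed False if not listed there


NOVEL_LOCI = [
    Locus("3p24.3", "KAT2B", 0.88, False),
    Locus("3q26.31", "NAALADL2", 0.91, False),
    Locus("4q13.3", "SLC4A4", 0.64, True),
    Locus("4q28.1", "FAT4/ANKRD50", 0.92, False),
    Locus("5p15.2", "TRIO", 0.52, False),
    Locus("5q32", "SH3RF2", 0.54, False),
    Locus("8p11.21", "CHRNA6/RNF170", 0.92, False),
    Locus("8q23.3", "TRPS1", 0.75, False),
    Locus("9p22.2", "SH3GL2", 0.93, False),
    Locus("9q21.13", "PCSK5", 0.84, False),
    Locus("13q22.2", "TBC1D4/KLF12", 0.87, False),
    Locus("13q33.2", "DAOA/ERCC5", 1.00, False),
    Locus("14q11.2", "DHRS2/DHRS4", 0.91, False),
    Locus("17p12", "ELAC2", 0.77, True),
    Locus("20p11.21", "GGTLC1", 0.80, False),
    Locus("21q21.1", "NCAM2", 0.91, False),
    Locus("21q22.2", "DSCAM", 0.59, True),
]


def passes_qc(entry: Locus) -> bool:
    """Keep a locus only if it is well imputed and not a singleton."""
    return entry.info > INFO_THRESHOLD and not entry.singleton


retained = [entry for entry in NOVEL_LOCI if passes_qc(entry)]
flagged = [entry for entry in NOVEL_LOCI if not passes_qc(entry)]

print(f"{len(retained)} loci retained, {len(flagged)} flagged")
for entry in flagged:
    print(f"flagged: {entry.locus} ({entry.gene}), info = {entry.info}")
```

With these inputs, five of the seventeen loci are flagged (SLC4A4, TRIO, SH3RF2, ELAC2 and DSCAM) and twelve are retained.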

Table 7: Novel LOP associations not prioritised for follow-up (info score < 0.7 or singletons)

Chr, chromosome; CI, confidence interval; MAF, minor allele frequency; OR, odds ratio.

| Locus | SNP | Chr | Position (hg19) | A/B | Info | MAF | P-value | OR (95% CI) | Gene | Singleton? |
|---|---|---|---|---|---|---|---|---|---|---|
| 4q13.3 | rs13435715 | 4 | 72016010 | A/G | 0.64 | 0.18 | 9.20 x 10-9 | 1.41 (1.22-1.63) | SLC4A4 | Y |
| 5p15.2 | rs9637830 | 5 | 14526295 | G/T | 0.52 | 0.06 | 3.88 x 10-6 | 1.46 (1.16-1.84) | TRIO | N |
| 5q32 | rs10223150 | 5 | 145305073 | T/C | 0.54 | 0.25 | 6.83 x 10-6 | 0.78 (0.67-0.90) | SH3RF2 | N |
| 17p12 | rs28664578 | 17 | 13011553 | G/A | 0.77 | 0.05 | 3.45 x 10-7 | 1.65 (1.31-2.08) | ELAC2 | Y |
| 21q22.2 | rs76977037 | 21 | 42087560 | T/C | 0.59 | 0.05 | 9.08 x 10-6 | 1.50 (1.17-1.91) | DSCAM | Y |


Figure 18: Regional association plots (Locuszoom) for novel putative signals in the LOP GWAS that failed further QC checks (info < 0.7 or singletons)


After applying these further quality control measures, 12 novel loci remained associated with LOP at suggestive significance (Table 8; Figure 19). According to 1000 Genomes Phase 3 data, none of the lead SNPs in these loci are strongly correlated with variants in gene coding regions. Additionally, a search of the RegulomeDB database did not reveal any risk variants correlating with gene expression (eQTLs). As mentioned in 1.4.5.2, two of the loci, near KAT2B and TRPS1, overlapped with signals for other traits in previous GWAS datasets.


The strongest of the suggestive novel associations was at 3q26.31 (rs55941837, P = 2.64 x 10-7, OR = 1.35). The lead SNP rs55941837 is intronic to N-Acetylated Alpha-Linked Acidic Dipeptidase Like 2 (NAALADL2) partial 5’ UTR, variant 1. As the strongest-associated novel locus, the quality of the signal underlying the association at NAALADL2 was assessed in a number of ways. Firstly, the lead SNP rs55941837 had a high SNPTEST info score (0.91), indicating good imputation. Additionally, the regional association plot resembled the expected pattern of LD in the region (Figure 19). The SNP is common across the datasets, with an overall MAF of 0.41 reflecting the reported MAF in Europeans (0.40; 1000 Genomes Project Phase 3). To determine the quality of the genotyping at this locus, an association analysis was conducted on the merged genotype data using SNPTEST in the same manner as for the imputed dataset. The strongest genotyped signal in the NAALADL2 locus was at rs9846937 (P = 9.2 x 10-4, OR = 0.81). This SNP is in LD with the imputed SNP rs55941837 (r2 = 0.70). Within the PsA cohort, the cluster diagram for the genotyped SNP rs9846937 showed clear separation of genotypes. Additionally, there was no evidence of strand misalignment in any of the datasets during analysis. Therefore, there is no indication that the NAALADL2 signal is a false positive due to technical artefacts.

Table 8: Putative novel suggestive LOP signals (info > 0.7, non-singletons)

CI, confidence interval; MAF, minor allele frequency; OR, odds ratio.

| Locus | SNP | A/B | Info score | Total MAF | MAF (cases) | MAF (controls) | OR (95% CI) | P-value | Nearby genes |
|---|---|---|---|---|---|---|---|---|---|
| 3p24.3 | rs2929398 | G/A | 0.88 | 0.17 | 0.21 | 0.16 | 0.71 (0.61-0.83) | 1.98 x 10-6 | KAT2B |
| 3q26.31 | rs55941837 | G/A | 0.91 | 0.41 | 0.47 | 0.40 | 1.35 (1.20-1.53) | 2.64 x 10-7 | NAALADL2 |
| 4q28.1 | rs79726809 | C/G | 0.92 | 0.06 | 0.09 | 0.06 | 1.62 (1.30-2.02) | 3.23 x 10-6 | FAT4, ANKRD50 |
| 8p11.21 | rs78116261 | G/A | 0.92 | 0.14 | 0.18 | 0.14 | 1.42 (1.21-1.66) | 4.37 x 10-6 | CHRNA6, RNF170 |
| 8q23.3 | rs800582 | C/G | 0.75 | 0.16 | 0.20 | 0.16 | 0.73 (0.63-0.85) | 2.94 x 10-6 | TRPS1 |
| 9p22.2 | rs12002358 | G/A | 0.93 | 0.10 | 0.07 | 0.11 | 0.58 (0.46-0.73) | 1.64 x 10-6 | SH3GL2 |
| 9q21.13 | rs7866089 | G/T | 0.84 | 0.40 | 0.46 | 0.39 | 1.32 (1.17-1.49) | 1.72 x 10-6 | PCSK5 |
| 13q22.2 | rs1929756 | G/T | 0.87 | 0.27 | 0.32 | 0.26 | 1.31 (1.15-1.50) | 8.97 x 10-6 | TBC1D4, KLF12 |
| 13q33.2 | rs16964862 | T/G | 1.00 | 0.19 | 0.14 | 0.20 | 0.67 (0.57-0.79) | 2.81 x 10-6 | DAOA, ERCC5 |
| 14q11.2 | rs2149311 | A/G | 0.91 | 0.07 | 0.09 | 0.06 | 1.58 (1.27-1.96) | 7.37 x 10-6 | DHRS2, DHRS4 |
| 20p11.21 | rs2424608 | C/G | 0.80 | 0.08 | 0.11 | 0.07 | 0.65 (0.53-0.79) | 1.83 x 10-6 | GGTLC1 |
| 21q21.1 | rs59332578 | A/G | 0.91 | 0.08 | 0.11 | 0.07 | 1.53 (1.25-1.87) | 7.18 x 10-6 | NCAM2 |


Figure 19: Regional association plots (Locuszoom) of putative novel suggestive LOP loci


1.5 Discussion

This study performed the largest GWAS for LOP to date, utilising independent case and control cohorts. The aims of the study were to discover novel LOP signals and to validate the risk loci originally identified in the LOP Immunochip study (Hebert et al., 2014a). In total, the present study provided evidence for association with LOP at six previously reported EOP/LOP susceptibility loci (HLA-A, HLA-C, IL23R, IL12B, IFIH1 and TRAF3IP2) and, at suggestive significance, at twelve novel loci that have not previously been identified in psoriasis.

1.5.1 Validation of known psoriasis loci

In this study the strongest association was at HLA-A (rs1655901; P = 2.07 x 10-12, OR = 1.54) and conditioning on HLA-A revealed a second significant association at HLA-C (rs2040748; P = 9.98 x 10-12, OR = 1.55). Conversely, the 2014 Immunochip study reported the strongest LOP association in the MHC locus at HLA-C (rs13191099; P = 3.73 x 10-10) and conditioning on HLA-C revealed an independent association at HLA-A (rs2256919; P = 2.54 x 10-6) (Hebert et al., 2014a). Polymorphisms at both HLA-C and HLA-A have previously been shown to be associated with psoriasis (Okada et al., 2014a). However, previous studies have shown that LOP is not as strongly associated with the major psoriasis risk locus at HLA-C when compared with EOP (Allen et al., 2005; Gudjonsson et al., 2006). Indeed, Bowes et al. (2017) recently showed that carriage of HLA-C*06:02 is significantly associated with a younger age of psoriasis onset. Although both the present study and the 2014 Immunochip study did find an association between LOP and HLA-C, the reported odds ratios of 1.55 (GWAS) and 1.72 (Immunochip) are much lower than that reported in EOP cohorts (for example OR = 4.32; Tsoi et al. (2012)). Combined, these studies therefore add to the consensus that LOP is not as strongly driven by the classic psoriasis risk locus at HLA-C as EOP.

In agreement with the previous LOP Immunochip analysis, this study provides strong evidence for association with LOP at the known IL12B, IL23R, IFIH1 and TRAF3IP2 psoriasis risk loci (P < 1 x 10-5). At IL12B, the lead variant rs2546890 was the same as that reported for the LOP Immunochip study. At IL23R, the lead variant rs11209006 was in strong LD (r2 = 0.87) with the LOP Immunochip lead variant, rs72676067 (Hebert et al., 2014a). At IFIH1 the lead variant rs34977319 had a low r2 with the previously reported LOP SNP rs1990760 (r2 = 0.06) but a high D prime (D’ = 1.0), indicating that these variants were in linkage disequilibrium.
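A low r2 alongside D’ = 1 typically arises when the two variants have very different allele frequencies, so that one allele is only ever seen on haplotypes carrying the other. The sketch below computes both statistics from haplotype frequencies using the standard definitions. The input frequencies are hypothetical values chosen for illustration and are not the observed frequencies at IFIH1; the function name is likewise an assumption.

```python
from typing import Tuple


def ld_stats(p_a: float, p_b: float, p_ab: float) -> Tuple[float, float]:
    """Return (r_squared, d_prime) for two biallelic variants.

    p_a:  frequency of allele A at the first variant
    p_b:  frequency of allele B at the second variant
    p_ab: frequency of the haplotype carrying both A and B
    """
    # D is the deviation of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r2 is the squared correlation between the two alleles.
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return r_squared, d_prime


# Hypothetical example: a rarer allele A (9%) that always occurs on
# haplotypes carrying a common allele B (60%).
r_squared, d_prime = ld_stats(p_a=0.09, p_b=0.60, p_ab=0.09)
print(f"r2 = {r_squared:.3f}, D' = {d_prime:.2f}")
```

In this hypothetical case D’ is 1 while r2 is only about 0.07, because the rarer allele cannot predict the genotype at the common variant well even though no recombinant haplotype is observed.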


Within the TRAF3IP2 locus, a single SNP approximately 500 kb upstream of TRAF3IP2 was associated with LOP at suggestive significance (rs2673305; P = 5.06 x 10-6, OR = 1.31). This SNP was situated 29 kb 5’ of the WNT1 inducible signalling pathway protein 3 (WISP3) gene and 768 kb from the LOP SNP previously reported by Hebert et al. (rs71562288 at KIAA1919; approximately 300 kb downstream of TRAF3IP2). This previously reported LOP SNP, rs71562288, was associated at P = 7.59 x 10-4; OR = 1.39 in the present study (Table 6; page 96). The two SNPs rs2673305 and rs71562288 are not in strong LD (r2 = 0.0026; D’ = 0.18) and could therefore represent independent LOP signals on either side of TRAF3IP2.

In EOP, the main psoriasis-associated signal in the TRAF3IP2 locus is at rs33980500, a missense mutation within TRAF3IP2 (Ellinghaus et al., 2010; Huffmeier et al., 2010; Stuart et al., 2015). This SNP also had evidence of association with LOP in the present analysis (P = 8.2 x 10-4; OR = 1.41), but was not as strongly associated as rs2673305 at WISP3. Interestingly, interrogation of publicly available promoter capture Hi-C data revealed the presence of physical chromatin interactions between WISP3 and the TRAF3IP2 promoter, as well as between KIAA1919 and the TRAF3IP2 promoter, in CD8+ T cells (Javierre et al., 2016; Schofield et al., 2016). Therefore, these three psoriasis signals could all be influencing TRAF3IP2 function through different mechanisms and would be interesting to follow up with functional analyses. However, the lead SNP rs2673305 at WISP3 in this study was a singleton with no supporting SNPs in tight LD (see Figure 17); it may therefore require validation before it can be followed up.

1.5.2 The 2q13 (IL1R1) locus

In the 2014 LOP Immunochip, a signal was identified at 2q13 (IL1R1) at suggestive significance (rs887998, P = 8.81 x 10-6, OR = 1.40 (1.21 – 1.62)) (Hebert et al., 2014a). This association was not validated at genome-wide or suggestive significance in the present study (rs887998, P = 0.17, OR = 1.11), although the same direction of effect was observed. There are several reasons why a true genetic signal might not replicate in an independent GWAS including differences in sample size, phenotypes and genotyping errors (Pearson and Manolio, 2008). Here, the numbers of LOP cases were very similar between the Immunochip study and the present GWAS (543 versus 621). Whilst the control phenotype differed between the two studies (healthy controls in the Immunochip versus OA in the GWAS), the frequency of the risk allele for rs887998 within the controls was very similar (0.202 in Immunochip versus 0.208 in this study). Within the cases, the frequency of the risk allele for rs887998 was higher in the Immunochip (0.262) than in the present study (0.226). Despite this, the LOP phenotype between the two studies was comparable based on age of disease onset (mean age of 51.1 in the Immunochip versus 51.2 in this study).
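The allele frequencies quoted above can be related to the reported odds ratios with a simple calculation. The following sketch computes the allelic odds ratio as the ratio of the risk-allele odds in cases to that in controls. It uses the rounded frequencies given in the text, so it only approximately reproduces the published estimates; the function name is an assumption.

```python
def allelic_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Odds of carrying the risk allele in cases divided by the odds in controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


# Risk allele frequencies for rs887998 as quoted in the text.
immunochip_or = allelic_odds_ratio(freq_cases=0.262, freq_controls=0.202)
gwas_or = allelic_odds_ratio(freq_cases=0.226, freq_controls=0.208)

print(f"Immunochip OR = {immunochip_or:.2f}")  # 1.40
print(f"GWAS OR = {gwas_or:.2f}")              # 1.11
```

Both values agree, to two decimal places, with the odds ratios reported for rs887998 in the two studies, which illustrates that the weaker GWAS effect stems from the lower risk allele frequency in cases rather than from a difference in the controls.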

The use of stringent quality control measures reduces the likelihood of genotyping errors in both LOP studies: in the Immunochip study rs887998 was directly typed with a call rate greater than 98% and in the present study it was imputed with a SNPTEST info score of 0.9996. Therefore, the present study is comparable to the LOP Immunochip study for the IL1R1 locus. A power calculation conducted using the online CaTS tool revealed that this study had 70.3% power to detect the association with rs887998 at an FDR of 0.05, assuming a psoriasis prevalence of 2%, a disease allele frequency of 0.211 (1 KG EUR) and a genotype relative risk of 1.21 (Skol et al., 2006). Therefore, although the initial study and this validation study have similar sample sizes, the present study is underpowered, even at a very modest significance threshold of 0.05. Failure to replicate the IL1R1 locus may be due to “winner’s curse”, whereby the initial finding overestimates the genetic effect (Palmer and Pe’er, 2017). An allele frequency of ~0.21 in the controls and ~0.23 in the cases may therefore be more representative of the actual difference, and could still constitute a significant finding in a larger cohort (Okada et al., 2014b).
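For readers who wish to explore how power varies with sample size, the sketch below approximates the power of a two-sided allelic test under a multiplicative model with a normal approximation. It is not the CaTS implementation and is not expected to reproduce the 70.3% figure exactly. The allele frequency, relative risk and prevalence are those quoted above, but the number of controls in the example call is a placeholder that is not taken from this section, and the function name is an assumption.

```python
from statistics import NormalDist


def allelic_test_power(n_cases, n_controls, risk_allele_freq,
                       relative_risk, prevalence, alpha=0.05):
    """Approximate power of a two-sided allelic case-control test."""
    p = risk_allele_freq
    # Hardy-Weinberg genotype frequencies for 0, 1 and 2 risk alleles.
    geno = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    # Multiplicative model: penetrance scales by relative_risk per risk allele.
    scale = [relative_risk ** k for k in range(3)]
    baseline = prevalence / sum(f * s for f, s in zip(geno, scale))
    penetrance = [baseline * s for s in scale]

    # Genotype frequencies conditional on being a case or a control.
    cases = [f * k / prevalence for f, k in zip(geno, penetrance)]
    controls = [f * (1 - k) / (1 - prevalence)
                for f, k in zip(geno, penetrance)]

    # Expected risk allele frequency in each group.
    p_case = 0.5 * cases[1] + cases[2]
    p_ctrl = 0.5 * controls[1] + controls[2]

    # Each individual contributes two alleles to the allelic test.
    se = (p_case * (1 - p_case) / (2 * n_cases)
          + p_ctrl * (1 - p_ctrl) / (2 * n_controls)) ** 0.5
    ncp = abs(p_case - p_ctrl) / se

    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    return norm.cdf(ncp - z_crit) + norm.cdf(-ncp - z_crit)


example_power = allelic_test_power(
    n_cases=621,            # LOP cases in the present study
    n_controls=4000,        # placeholder only; not taken from this section
    risk_allele_freq=0.211,
    relative_risk=1.21,
    prevalence=0.02,
)
print(f"approximate power: {example_power:.2f}")
```

Re-running the function with a larger number of cases shows how quickly power improves, which is the practical argument for the larger cohorts proposed in this section.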

It is noteworthy that, following the LOP Immunochip study, a GWAS for psoriasis in the Han Chinese population identified a signal in 2q12.1 at IL1RL1 (rs1420101, P = 1.71 x 10-10, OR = 1.12), which is located approximately 225 kb from the LOP-specific signal at IL1R1 (Sheng et al., 2015). Although the two variants at IL1R1 and IL1RL1 are not in LD (r2 = 0.001, D’ = 0.0842), their proximity might suggest that the EOP and LOP signals are within the same psoriasis locus. Further work utilising larger cohorts is necessary to determine if the putative IL1R1 locus is independently associated with a late age of onset.

1.5.3 Putative novel LOP loci

This study initially identified a novel locus for LOP at genome-wide significance near SLC4A4 (rs13435715, P = 9.20 x 10-9), whilst a further 16 novel loci were associated at suggestive significance (P < 1 x 10-5). Since the SLC4A4 locus had not previously been found to be associated with any other autoimmune conditions it was suspected of being a false positive, which led to the application of further quality control measures for the putative novel loci. Despite the majority of signals having a high SNPTEST info score (> 0.9) (Figure 14), some of the novel suggestive associations had a low info score indicative of poor confidence in the genotype. For example, the lead SNP at SLC4A4 had an info score of 0.64. Therefore, a cut-off of 0.7 was set for the SNPTEST info score for each lead variant; this corresponds with a stringent imputation score (R2 > 0.7) that was utilised in recent meta-analyses (Tsoi et al., 2015; Tsoi et al., 2017). In addition, two of the novel suggestive loci (ELAC2 and DSCAM) harboured relatively rare singleton variants (MAF = 0.05) that were not supported by other variants in LD (r2 > 0.6) showing evidence of disease association. In total, five putative novel loci were identified as potential false positives (SLC4A4, TRIO, SH3RF2, ELAC2 and DSCAM). This analysis highlights the importance of assessing the quality metrics underlying novel loci in GWAS.

As with the known signals, the remaining twelve loci associated with LOP at suggestive significance were all present in non-coding regions, consistent with the majority of GWAS findings (Farh et al., 2015). Interestingly, the majority of the novel suggestive loci were not present in loci that have previously been found to be associated with autoimmune disorders. This is perhaps surprising since psoriasis has strong genetic overlap with other autoimmune disorders, in particular seronegative conditions (IBD, ankylosing spondylitis) (Farh et al., 2015). Previous overlaps between psoriasis and Crohn’s disease, for example, have included strong associations at 2p15 (B3GNT2), 6q25.3 (TAGAP) and 19p13.2 (TYK2) (Jostins et al., 2012; Liu et al., 2015; Tsoi et al., 2012). Here, one putative novel locus had evidence for association in another autoimmune disorder: at 8q23.3 (TRPS1) the lead SNP rs800582 (P = 2.94 x 10-6; OR = 0.73) was a perfect proxy for a SNP at suggestive significance for rheumatoid arthritis (rs800583, P = 7.05 x 10-7) (Gregersen et al., 2009). In this locus rs800582 is present in a 200 kb gene desert and has no evidence of eQTL function. The nearest gene TRPS1 encodes a transcription factor and mutations in the gene are associated with tricho-rhino-phalangeal syndrome; a rare condition that can manifest in arthritis-like symptoms (de Barros and Kakehasi, 2016). Whilst the association with RA at the TRPS1 locus is only suggestive, this evidence lends confidence to the LOP finding since other psoriasis loci are known to share association with RA such as 2p16.1 (REL), 6q25.3 (TAGAP) and 13q14.11 (COG6) (Okada et al., 2014b; Tsoi et al., 2012). It is important to identify loci that overlap between autoimmune diseases because it could lead to the identification of shared pathways and drug repositioning strategies, as has recently been suggested in Crohn’s disease (Li and Lu, 2013).


One suggestive novel locus at 3p24.3 (KAT2B) showed prior evidence for association with metabolic traits. Here the lead variant rs2929398 (P = 1.98 x 10-6, OR = 0.71) was in tight LD with SNPs at suggestive significance for HDL cholesterol change with statins (P = 8.90 x 10-7) and serum ratio of gamma glutamylthreonineserine (P = 1.70 x 10-6) (Barber et al., 2010; Suhre et al., 2011). A search of the GTEx database reveals that the lead variant, which is intronic to KAT2B (rs2929398), is an eQTL for KAT2B itself in thyroid and tibial nerve tissue (Keen and Moore, 2015). KAT2B encodes lysine acetyltransferase 2B, which has a role in transcriptional regulation by binding to CBP and P300 coactivators. The overlap between LOP and metabolic traits in this locus is interesting because metabolic syndrome is known to be a co-morbidity of psoriasis (Gelfand and Yeung, 2012).

The remainder of the novel suggestive signals were not found to be proxies for variants associated with any other GWAS traits. Of the novel suggestive signals, the strongest association with LOP was at 3q26.31 within NAALADL2 (rs55941837, P = 2.64 x 10-7, OR = 1.35). A careful analysis of the quality of the signal did not reveal any evidence to suggest that this was a false positive due to technical artefacts (Section 1.4.5.4). In this locus, the lead variant rs55941837 is situated in the third intron of NAALADL2 partial 5'UTR, variant 1. NAALADL2 is a large, alternatively spliced and poorly characterised gene that generates a protein of unclear function. However, variants within NAALADL2 have previously been shown to be associated with several traits including the autoimmune condition Kawasaki disease (P = 1 x 10-6) (Burgner et al., 2009) and autoantibody production in systemic lupus erythematosus (SLE) (P = 7.69 x 10-6) (Chung et al., 2011). Previous GWAS have revealed several susceptibility loci shared between psoriasis and SLE (Ramos et al., 2011), of which two shared loci (NFKBIA and IL28RA) were confirmed in a cross-phenotype analysis in Han Chinese (Li et al., 2013). Therefore the previous SLE association at this locus lends some confidence to the LOP finding. Whilst the LOP-associated variant is independent from the Kawasaki and SLE associations, the NAALADL2 locus would be interesting to follow up because it represents a case where variants independently associated with different autoimmune diseases might regulate the same target gene.

The putative novel loci at suggestive significance included some lead SNPs within the introns of genes (e.g. GGTLC1 and NAALADL2) and some SNPs within 50 kb of genes (e.g. CHRNA6 and PML). Several of the lead SNPs were intergenic and situated a long distance from the nearest gene target, for example 9p22.2 (rs12002358; approximately 255 kb downstream of SH3GL2) and 13q22.2 (rs1929756; approximately 370 kb downstream of LMO7). The presence of lead SNPs within intronic and intergenic locations is consistent with previous GWAS findings in psoriasis (Tsoi et al., 2017). Whilst genes can be assigned to these loci based on proximity and biological function, identification of target genes would require examination of the chromatin state and conformation in relevant cell types in each locus. Identification of target genes, particularly in loci that are not associated with any other autoimmune trait, might shed light on the biological processes that underlie LOP and distinguish it from EOP.

1.5.4 Strengths and limitations

The main strength of this analysis was the use of dense genotyping arrays that enabled genome-wide imputation of variants. Compared with the Immunochip, this study design represents a more hypothesis-free approach whereby the disease associations do not depend on prior evidence of the locus being involved in autoimmunity. This translated to the results, where many of the putative suggestive signals were not previously associated with an autoimmune condition. In addition, the known non-MHC psoriasis risk loci at immune-related genes were not genome wide significant, and indeed some were less significant than the putative novel locus at NAALADL2, which reached P = 2.64 x 10-7 (IFIH1, P = 4.42 x 10-6; IL12B, P = 8.69 x 10-7; and TRAF3IP2, P = 5.06 x 10-6). LOP is thought to be less heritable than EOP (Henseler and Christophers, 1985; Springate et al., 2017), and, although studies have implicated IL1B signalling in LOP, little is known about the biological pathways that distinguish it from EOP (Hebert et al., 2014b; Shaw et al., 2010). The results of this study might lead to speculation that LOP is less driven by genetic-mediated dysregulation of the immune system than EOP, particularly since the OR for HLA-C was lower than would be expected for EOP (1.55). However, it is important to note that the novel suggestive findings require validation and replication.

Another strength of this study was the use of stringent quality control. There is no universal cutoff for imputation R2, but studies have typically retained variants with R2 > 0.3 (Nair et al., 2009; Tsoi et al., 2012) whereas more recent meta-analyses use R2 > 0.7 (Tsoi et al., 2015; Tsoi et al., 2017). Here, a mid-range cutoff of R2 > 0.5 was used, followed by a method for case control analysis that accounted for uncertainty in the imputed genotypes. Further removal of significant novel variants (P < 1 x 10-5) with a SNPTEST info score of < 0.7 reduced the likelihood of identifying false positives.


The present study is subject to a range of limitations. The small sample size, although an improvement on the 2014 Immunochip study, limits the ability of the study to detect both common variants with small effect sizes and rare variants. This necessitated a MAF cut-off of 5% in the present study. Low-frequency variants might play a role in disease risk, and recent studies have attempted to identify them in psoriasis (Jordan et al., 2012a; Sheng et al., 2014; Tang et al., 2014). However, exon sequencing has not revealed many rare coding variants with large effects, despite the power to detect them, hence it appears that common variation plays a larger role in general (Tang et al., 2014).

Another limitation to this study is that most of the LOP cases had PsA, which may affect the genetic signature of disease, although PsA is not known to be associated with age of psoriasis onset (Theodorakopoulou et al., 2016). In addition, it was necessary to draw the independent controls from a cohort of OA patients (Zeggini et al., 2012). However, OA is a non-autoimmune condition and is not related to psoriasis. None of the loci in this study have previously been reported as OA risk loci to the author’s knowledge, therefore all signals should be related to LOP risk rather than OA risk. OA can be confused with PsA (Ibrahim et al., 2009); however this was unlikely to be the case here because the OA patients utilised in the dataset had detailed phenotypes of primary OA; approximately 80% were assessed for total joint replacement (Zeggini et al., 2012).

1.5.5 Future work

Since none of the putative novel LOP loci in this study reached an acceptable level of genome-wide significance, greater power may be required to confirm if these associations are real. To this end, genotype data on further LOP patients can be merged with the current dataset and the case-control analysis re-performed. In addition, a meta-analysis could be conducted with the Immunochip data (Hebert et al., 2014a). Any novel signals should be tested against EOP datasets to determine if they delineate LOP from EOP. Following this, the novel signals would require validation in a larger cohort of LOP patients to determine if they are true signals or false positives. Additionally, in order to determine if the signals are unique to LOP, ideally a case-control analysis would be carried out comparing LOP patients with EOP patients.

A different approach towards investigating the genetic background of age of psoriasis onset could involve utilising age as a continuous outcome in a psoriasis cohort, since dichotomising patients based on an age of onset at 40 is likely to be artificial. This line of thinking is supported by the fact that Tsoi et al. (2017) recently found a strong negative correlation between age-of-onset and genetic risk score of psoriasis. Alternatively, an association analysis could be conducted by stratifying psoriasis patients based on their carriage of the major psoriasis risk allele, HLA-C*06:02, which is known to be associated with an earlier age of onset (Bowes et al., 2017). The present study supplements previous findings that HLA-C may not be the strongest genetic risk factor for LOP (Allen et al., 2005; Gudjonsson et al., 2006; Hebert et al., 2014a). Stratifying on HLA-C*06:02 may reveal novel LOP signals that were previously masked by presence of EOP samples in the dataset. For example, in the current analysis there were 38 LOP samples with an age of onset of 40, which is the cut-off age for LOP. However it seems unlikely that all of these samples truly represent the LOP phenotype.

In summary, it will be important to expand on this work for the purpose of identifying signals that delineate LOP from EOP. In EOP, large scale meta-analyses continue to identify novel disease-associated variants with increased sample sizes (Tsoi et al., 2015; Tsoi et al., 2012; Tsoi et al., 2017). The LOP study presented here utilised only 621 cases, yet found novel suggestive signals that have so far not been identified at genome-wide significance in EOP (up to 11,988 cases). It is therefore promising that these loci may reach genome-wide significance with an increased sample size. The next step for both EOP and LOP loci would be to use bioinformatic and functional analyses to determine causal variants and target genes in the context of psoriasis.

1.5.6 Conclusions

In conclusion, this study performed a genome-wide association analysis of LOP patients against psoriasis-free controls. This study did not find an association between LOP and the previously reported locus at IL1R1. However, the results provide further evidence for a shared genetic background between EOP and LOP at HLA-A, HLA-C, IL23R, IFIH1, IL12B and TRAF3IP2. Further studies are required to confirm the association of novel suggestive signals with LOP.


2. FUNCTIONAL CHARACTERISATION OF PSORIASIS RISK LOCI


2.1 Introduction

The vast majority of lead variants associated with psoriasis are present in non-coding regulatory elements with an unknown target gene. In these cases, a combination of in silico and experimental techniques is required for determining the causal variant and target gene. Here, two loci associated with EOP were selected for further functional investigation. Since the work in the first section of this thesis did not confirm the presence of any LOP loci at genome-wide significance, functional follow-up was not conducted in the context of LOP.

The first-line approach to prioritising the disease-associated SNPs most likely to have a regulatory function is to mine publicly available online databases that contain a wealth of experimental wet-lab data. Once a locus has been judged likely to be functional, the next step is to utilise molecular biology techniques in order to further dissect the biological link between the SNPs and their function in relation to disease mechanism. Regulatory elements are now known to affect the expression of gene targets through DNA looping that brings the element into close contact with a gene promoter (Miele and Dekker, 2008; Schoenfelder et al., 2010a); putative targets of disease-associated loci can therefore be identified using chromosome conformation capture (3C)-based methods (Dekker et al., 2002).

2.2 Aims and objectives of Section 2

The first aim of the functional work was to perform a detailed characterisation of selected psoriasis-associated GWAS loci by identifying likely causal variants and gene targets through bioinformatics and hypothesis-driven experimental approaches (ChIP and 3C) in relevant cell lines. The primary locus of interest was at 9q31 (KLF4), a locus thought to be specific to psoriasis. The secondary locus of interest was at 6q23 (TNFAIP3), a pan-autoimmune locus that has recently been under investigation by researchers in the ARUK lab group.

The second aim of the functional work was to employ a high-throughput approach towards characterising chromatin interactions in all known psoriasis loci in relevant cell types. This led on from the locus-specific functional work, which highlighted the complexity of the studied loci and motivated use of the capture Hi-C (CHi-C) technique.


The objectives of Section 2 were as follows:

1. Perform detailed functional characterisation of individual psoriasis risk loci:
   I. The 9q31 (KLF4) risk locus
      - Perform bioinformatics to inform on assay design
      - Perform ChIP-qPCR to identify psoriasis-associated variants in regulatory regions
      - Perform 3C-qPCR to identify putative gene targets
   II. The 6q23 (TNFAIP3) pan-autoimmune risk locus
      - Use bioinformatics to inform on assay design
      - Perform 3C-qPCR to identify gene targets
2. Perform functional characterisation of all known psoriasis loci using CHi-C in different cellular and stimulatory conditions


2.3 Methods

2.3.1 Methods for functional characterisation of individual risk loci

The first section of the functional methods describes the experiments that were undertaken to characterise specific psoriasis-associated loci, primarily 9q31 (KLF4) and secondarily 6q23 (TNFAIP3). The methods include bioinformatics, ChIP and 3C.

2.3.1.1 Bioinformatics

Selected known psoriasis loci were investigated using publicly available data to inform on functional experiments. Here, a list of relevant tools and databases was collated in collaboration with Christopher Taylor (PhD student at ARUK). This manual pipeline was used to gain as much information as possible about each locus of interest. A hypothesis could then be formed based on which SNPs were most likely to be causal, and by which mechanism they might be regulating target genes. Appropriate functional experiments could then be designed and conducted.

The bioinformatic pipeline was used to investigate psoriasis-specific intergenic loci at 9q31.2 near KLF4 and 6q23 near TNFAIP3 (Tsoi et al., 2012). The process began by identifying all variants in linkage disequilibrium (LD) with the lead SNP in each locus and searching for any that have been shown to correlate with gene expression (eQTLs). Next, potentially causal SNPs were identified by their intersection with various regulatory features; scores for how deleterious or regulatory a SNP was were obtained using available bioinformatics tools. A description of each tool used can be found in Table 10; page 124.

SNPs in LD

In each locus, a set of SNPs was generated to represent the full psoriasis association. To accomplish this, the Phase 3 release of the 1000 Genomes European data was interrogated using PLINK. Within PLINK, the functions “--r2” and “--ld-snp” were used to find all common SNPs in tight LD (r² > 0.8) with the index SNP in each locus.
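
The LD query described above can be sketched as a small Python helper that assembles a PLINK 1.9 command line. Only “--r2” and “--ld-snp” are stated in the text; the file prefix, the index SNP identifier, the window size and the output prefix below are illustrative assumptions, and the command is built but not executed.

```python
def build_plink_ld_command(bfile, index_snp, r2_threshold=0.8,
                           window_kb=1000, out_prefix="ld_proxies"):
    """Return a PLINK 1.9 argument list for an LD proxy search.

    bfile, window_kb and out_prefix are hypothetical values chosen for
    illustration; they are not taken from the thesis.
    """
    return [
        "plink",
        "--bfile", bfile,                 # binary genotype fileset prefix (assumed)
        "--r2",                           # report pairwise r2 values
        "--ld-snp", index_snp,            # restrict pairs to those involving the index SNP
        "--ld-window-kb", str(window_kb), # maximum distance between SNP pairs (assumed)
        "--ld-window", "99999",           # lift the default limit on intervening SNPs
        "--ld-window-r2", str(r2_threshold),  # keep only pairs above the r2 cut-off
        "--out", out_prefix,
    ]


# Example with placeholder names; pass the list to subprocess.run to execute.
cmd = build_plink_ld_command("1kg_phase3_eur", "rs_index_snp")
print(" ".join(cmd))
```

Keeping the arguments as a list rather than a single string avoids shell quoting problems if the command is later run through `subprocess.run`.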

Once all variants in LD were identified, the Probabilistic Identification of Causal SNPs (PICS) tool was employed (available at http://www.broadinstitute.org/pubs/finemapping/?q=pics). This tool utilises 2011 data from 1000 Genomes in order to find proxy SNPs, but also uses a Bayesian approach to find the proportional likelihood that each SNP is the causal one (Farh et al., 2014).

Tools for locus visualisation

Two websites were used to visualise the SNP set on the GRCh37/hg19 (February 2009 release) human genome assembly: the UCSC Genome Browser (available at http://genome-euro.ucsc.edu/) and the WashU Epigenome Browser (Zhou et al., 2013; available at http://epigenomegateway.wustl.edu/browser/). In each case, the SNP set was formatted into a BED file and uploaded as a custom track. The SNPs were then visually assessed for their intersection with marks of regulatory regions from publicly available datasets. These included protein binding (histone marks or transcription factors), DNase hypersensitivity (ENCODE) and predicted chromatin states (ChromHMM; Epigenomics Roadmap).
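
As an illustration of the BED formatting step, the sketch below converts a list of SNPs into a four-column BED custom track. BED start coordinates are 0-based and end coordinates are exclusive, so a SNP at 1-based position p is written as p-1 to p. The track name and the SNP records are hypothetical placeholders, not data from this study.

```python
def snps_to_bed(snps, track_name="LD_SNPs"):
    """Format (chrom, 1-based position, rsid) tuples as a BED custom track.

    The track name and description are illustrative assumptions.
    """
    lines = [f'track name="{track_name}" description="SNPs in LD with index SNP"']
    for chrom, pos, rsid in sorted(snps, key=lambda s: (s[0], s[1])):
        # BED uses a 0-based start and an exclusive end coordinate
        lines.append(f"{chrom}\t{pos - 1}\t{pos}\t{rsid}")
    return "\n".join(lines) + "\n"


# Placeholder SNPs for demonstration only
example = [("chr9", 1000, "rs_example_b"), ("chr6", 500, "rs_example_a")]
print(snps_to_bed(example), end="")
```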

Locating eQTLs

Four tools available online were used to locate eQTLs in the SNP sets: RegulomeDB V1.1, Blood eQTL, GTEx V7 and Haploreg V4.1. These tools have some overlap in the datasets queried (Table 9). RegulomeDB is a multifunctional tool that identifies eQTLs, transcription factor binding and motifs, DNase peaks and DNase footprints in multiple cell types (available at http://regulomedb.org/). The second tool, Blood eQTL (available at http://genenetwork.nl/bloodeqtlbrowser/), uses data from peripheral whole blood (Westra et al., 2013). The third tool, GTEx (Keen and Moore, 2015; Lonsdale et al., 2013) (available at http://www.gtexportal.org/home/), is an ongoing project that curates expression data from post-mortem tissues, with the latest release (version 7, 2017) containing eQTLs for 48 tissue types. The final tool, Haploreg (available at http://archive.broadinstitute.org/mammals/haploreg/haploreg.php), searches various eQTL databases from individual studies, the NCBI eQTL browser and the GTEx version 6 (2015) release. Similarly to RegulomeDB, Haploreg provides information on other local features such as transcription factor binding, DNase hypersensitivity and nearby genes using NIH Roadmap and ENCODE data.


Table 9: Datasets used in eQTL tools
Abbreviations: EUR, European; LCL, lymphoblastoid cell lines; WGE, whole genome expression
* GTEx is an ongoing project; refer to the website for updated information

Reference | Cell type | Sample population | Method | Total samples
Westra et al. (2013) | Peripheral blood | EUR | WGE array | 5,311
Stranger et al. (2007) | LCL | Nigerian, EUR, Asian | WGE array | 210
Schadt et al. (2008) | Liver | American-EUR | WGE array | 427
Gibbs et al. (2010) | Brain | Caucasian | WGE array | 572
Montgomery et al. (2010) | LCL | EUR | RNA seq | 60
Dimas et al. (2009) | Primary fibroblasts and T cells, LCL | EUR | WGE array | 75
Myers et al. (2007) | Brain | EUR | WGE array | 279
Veyrieras et al. (2008) | LCL | Nigerian, EUR, Asian | WGE array | 210
Pickrell et al. (2010) | LCL | Nigerian | RNA seq | 69
Zeller et al. (2010) | Monocytes | EUR | WGE array | 1,490
Degner et al. (2012) | LCL | Nigerian | RNA seq | 70
GTEx V6 (2015)* | 44 post-mortem tissues | Multiple | WGE array, RNA seq | 449
GTEx V7 (2017)* | 48 post-mortem tissues | Multiple | WGE array, RNA seq | 620
Li et al. (2014b) | Various tumours | EUR | RNA seq | Up to 391
Hao et al. (2012) | Lung | EUR | WGE array | 1,111
Koopmann et al. (2014) | Heart | EUR | WGE array | 129
Fairfax et al. (2014) | Primary monocytes | EUR | WGE array | 432
(Grundberg et al.) | Primary bone cells | Swedish | WGE array | 113
Lappalainen et al. (2013) | LCL | EUR, Yoruba | RNA seq | Up to 462
Ramasamy et al. (2014) | Brain | EUR | WGE array | 134

[Four further columns (Regulome DB V1.1, Blood eQTL, GTEx V7, Haploreg V4.1) marked which tools query each dataset; the tick marks are not recoverable from the extracted text.]


Protein binding sites

The next step was to identify any proteins bound to the SNPs, such as histone marks for enhancers (H3K4me1 and H3K27ac) and transcription factors; this information can be obtained from ChIP-seq data in relevant cell types. Additionally, the DNA sequence at each SNP can be examined in order to identify transcription factor binding motifs. SNPs modifying these motifs are likely to affect protein binding. The tools used for this purpose were RegulomeDB and Haploreg, as described above.

Functional scores

Some online tools apply scores to SNPs based on information about their regulatory features (both experimentally shown and predicted). The first of these, RegulomeDB, assigns each SNP a score based on the prediction that a variant will affect protein binding and gene expression. The lower the score, the more likely that a variant will have a regulatory function (Appendix, Table 25). The second tool used was Combined Annotation Dependent Depletion (CADD) (available at http://cadd.gs.washington.edu/score), which uses more than 80 diverse annotations to assign SNPs a likelihood score that they will be deleterious (Kircher et al., 2014). The higher the score from CADD, the more likely that a variant will be deleterious. The final tool used to indicate SNP function was the Ensembl Variant Effect Predictor (VEP) (available at http://www.ensembl.org/Homo_sapiens/Tools/VEP). VEP gives an indication of the location of each variant and predicts whether or not it is present in a regulatory element.
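
Because the two scores run in opposite directions (a lower RegulomeDB score and a higher CADD score both point towards function), it can help to see them combined in one ordering. The sketch below is only one possible way to rank candidates and is not the procedure used in this thesis; the SNP identifiers and score values are invented for illustration, and RegulomeDB scores are assumed to be strings such as "1a", "2b" or "4" that sort correctly as text.

```python
def rank_candidates(snps):
    """Order SNPs by RegulomeDB score (ascending), then CADD score (descending).

    Each SNP is a dict with hypothetical keys 'rsid', 'regulomedb' and 'cadd'.
    """
    return sorted(snps, key=lambda s: (s["regulomedb"], -s["cadd"]))


# Invented example values, for demonstration only
candidates = [
    {"rsid": "rs_example_1", "regulomedb": "4", "cadd": 2.0},
    {"rsid": "rs_example_2", "regulomedb": "2b", "cadd": 5.0},
    {"rsid": "rs_example_3", "regulomedb": "2b", "cadd": 12.0},
]
for snp in rank_candidates(candidates):
    print(snp["rsid"], snp["regulomedb"], snp["cadd"])
```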

Gene expression

In each locus, GTEx was used to examine candidate gene expression across a range of cell types and tissues. The purpose of this was two-fold, allowing for 1) the prioritisation of genes that are expressed in disease-relevant tissues and 2) the selection of relevant cell types for functional experiments.


Table 10: Tools used in this project for bioinformatic characterisation of GWAS loci
The table describes the selection of tools used for bioinformatic analysis; these tools encompass the largest publicly available epigenetic datasets. Information was collated in collaboration with Chris Taylor.

Annotation | Tool | Authors | URL | Purpose | Input | Output | Data Used
SNPs in LD | PLINK | Purcell et al. (2007) | www.cog-genomics.org/plink2 | A multifunctional tool for examining linkage and performing GWAS | Index SNP | SNPs in LD with index SNP, using command --ld-snp | 1KG phase 3 (2014)
SNPs in LD | PICS | Broad Institute | www.broadinstitute.org/pubs/finemapping/?q=pics | A Bayesian algorithm designed by Farh et al. (2014) to predict causal SNPs based on permutations | Index SNP, P-value and ethnicity | Potential causal SNPs in tight LD with lead SNP | 1KG phase 1 (2012)
eQTLs | RegulomeDB | Boyle et al. (2012) | regulomedb.org | An eQTL database for multiple cell types | SNP ID or coords | eQTLs with associated functional scores | See Table 9
eQTLs | Blood eQTL browser | Westra et al. (2013) | genenetwork.nl/bloodeqtlbrowser | A database of eQTLs in blood | Gene or SNP | eQTLs | See Table 9
eQTLs | GTEx | Broad Institute | www.gtexportal.org/home | An eQTL database for multiple tissues | Gene or SNP | eQTLs with associated gene expression data | See Table 9
eQTLs | Haploreg | Broad Institute | www.broadinstitute.org/mammals/haploreg/haploreg. | An eQTL database for multiple cell types and tissues | SNP IDs or select a GWAS | eQTLs among SNPs within r2 > 0.8 with input SNP | See Table 9
Genome viewer | UCSC Genome Browser | Genome Bioinformatics Group at UC Santa Cruz Genomics Institute | genome-euro.ucsc.edu | Genome browser for viewing genomic features and epigenetic features e.g. dbSNP, GWAS Catalog | Public and custom epigenetic tracks | Browser view | Tracks from multiple chosen sources e.g. ENCODE data
Genome viewer | WashU Epigenome Browser | Epigenome Informatics group at Washington University | epigenomegateway.wustl.edu/browser | Genome browser for viewing genomic features and epigenetic features, particularly useful for visualisation of chromatin interaction data | Public and custom epigenetic tracks | Browser view | Tracks from multiple chosen sources e.g. Roadmap Epigenomics ChromHMM
SNP regulatory function | RegulomeDB | Boyle et al. (2012) | regulomedb.org | Known/predicted regulatory elements including DNase hypersensitivity and transcription factor (TF) binding | SNP ID or coords | Score for each SNP denoting likelihood of regulatory function | Multiple sources including ENCODE, NIH Roadmap and DNase footprinting studies (Boyle et al., 2011; Piper et al., 2013; Pique-Regi et al., 2011)
SNP regulatory function | VEP | Ensembl | www.ensembl.org/info/docs/tools/vep/index.html | Predicts effects of variants on genes and transcripts. Includes SIFT and PolyPhen scores for coding SNPs | SNP IDs | Table including predicted variant impact and biotype e.g. promoter, enhancer | The Ensembl Regulatory Build (for non-coding annotation) (Zerbino et al., 2015)
SNP regulatory function | Haploreg | Broad Institute | www.broadinstitute.org/mammals/haploreg/haploreg.php | Annotates variants by chromatin state, DNase hypersensitivity and TF binding sites | SNP IDs or select a GWAS | Table of regulatory info for SNPs with r2 > 0.8 | ENCODE, NIH Roadmap and others; see archive.broadinstitute.org/mammals/haploreg/documentation_v4.1.html
SNP regulatory function | CADD | Kircher et al. (2014) | cadd.gs.washington.edu/score | A predictor for deleterious SNPs; compares evolutionarily conserved variants with simulated mutations | SNP coords in VCF format | SNP scores | Human Reference Genome, 1KG, NHLBI Exome Sequencing Project (ESP)
Gene expression | GTEx | Broad Institute | www.gtexportal.org/home | Gene expression across multiple cell types and post-mortem tissues | Gene | Relative expression across sample types | 48 post-mortem tissues from 620 samples


2.3.1.2 Chromatin Immunoprecipitation

ChIP is a method used to determine whether genomic loci are associated with specific protein complexes, such as transcription factors or histone proteins. The principles of ChIP involve cross-linking of DNA and its bound proteins in a living cell, followed by chromatin shearing and immunoprecipitation with an antibody specific to the protein of interest. The cross-links are reversed and the purified DNA can be identified by qPCR analysis (Christova, 2013).

Here, a ChIP protocol described in McGovern et al. (2016) was used to investigate the binding of regulatory histone marks in the 9q31 locus in HaCaT and My-La cell lines. In addition, one ChIP experiment was conducted using normal human epidermal keratinocytes (NHEKs).

Cell work

HaCaT cell culture

HaCaT cells are human keratinocytes derived from normal adult skin that were spontaneously immortalised in vitro (Boukamp et al., 1988). HaCaT cells are adherent, typically growing as a monolayer on the surface. In order to split these cells, therefore, it is necessary to use trypsin to dissociate them from the flask.

Here, HaCaT cells (Cell Lines Service) were retrieved from cryopreserved stocks at -80˚C and rapidly warmed between gloved hands. The cell suspension was quickly pipetted into a 15 mL falcon tube containing Dulbecco’s Modified Eagle Medium (DMEM) (Sigma Aldrich) at 37˚C, to a total of 10 mL. The tube was then centrifuged at 200 x g for 5 minutes at room temperature. The supernatant was removed and the cells were re-suspended in 10 mL DMEM containing high glucose (4500 mg/L), 2 mM L-glutamine and sodium bicarbonate, supplemented with 1% penicillin-streptomycin and 10% foetal bovine serum (FBS) (ThermoFisher Scientific).

The cells were cultured in a T-75 flask (ThermoFisher Scientific) laid flat in an incubator at 37˚C, 5% CO2. Once cells formed a monolayer in a “cobblestone” formation, the medium was discarded and cells were washed with 5 mL pre-warmed PBS. The PBS was then discarded and 5 mL pre-warmed Trypsin EDTA (Sigma Aldrich) was pipetted on top of the cell layer. The cells were incubated for 5 minutes at 37˚C, 5% CO2, after which the flask was tapped sharply a few times to detach the cells. The flask was placed back in the incubator for another 5 minutes, after which it was tapped again. This incubating/tapping process was repeated until all of the cells were detached from the bottom of the flask (visualised under a microscope).

Once all cells were detached, 5 mL DMEM was added to the cell suspension, which was then transferred to a 15 mL falcon tube and centrifuged at 200 x g for 5 minutes at room temperature. The supernatant was discarded and the pellet was re-suspended in 10 mL warm, supplemented medium, which was then divided into flasks of fresh medium (total 10 mL per flask) at a ratio of 1:5 to 1:10 and returned to the incubator. The growing, detaching and splitting process was repeated until enough flasks of cells were obtained for subsequent experiments.

Any remaining flasks of cells were cryopreserved at a minimum density of 5 x 10⁶ cells per mL. To achieve this, the cells were first centrifuged at 200 x g for 5 minutes at room temperature and the supernatant was discarded. The cells were then re-suspended in Recovery™ Cell Culture Freezing Medium (Life Technologies) at 5 x 10⁶ cells per mL. The cell suspension was pipetted into cryogenic vials (Greiner Bio-One) at 1 mL per vial, which were then frozen in a Mr Frosty™ (ThermoFisher Scientific) overnight at -80˚C. This ensured that the cells froze at a rate of -1˚C per minute, allowing for optimal preservation. Cells were stored at -150˚C for long-term storage.

NHEK cell culture

Primary Normal Human Epidermal Keratinocytes (NHEKs) were cultured for a ChIP experiment in order to examine their comparability with the HaCaT keratinocyte cell line. NHEKs grow at a much lower density than HaCaT cells, and were therefore unsuitable for experiments requiring large numbers of input cells.

Here, a flask of proliferating NHEKs (PromoCell) from a single adult donor was gratefully received from Dr Catherine O’Neill (Division of Musculoskeletal and Dermatological Sciences, University of Manchester). The cells were cultured in PromoCell Growth Medium in T-75 flasks at 37˚C with 5% CO2. At approximately 70% confluence the cells were passaged using the PromoCell DetachKit. The medium was removed using an aspirator and 2 mL trypsin EDTA (PromoCell) was added. The cells were incubated for 5 minutes at 37˚C, 5% CO2, after which the flask was tapped sharply a few times to detach the cells. To neutralise the trypsin, 2 mL Trypsin Neutralization Solution (PromoCell) was pipetted over the cells. The cell suspension was pelleted for 3 minutes at 200 x g and the cells were suspended in medium. Cells were seeded into new flasks at a minimum density of 5 x 10⁴ cells per mL. The cells were cultured up to a maximum of 5 passages.

My-La cell culture

My-La CD8+ cells are human T lymphocytes that were originally derived from an 82-year-old male Caucasian with mycosis fungoides (Kaltoft et al., 1984). My-La cells grow as single cells or as small aggregates in a suspension culture.

Here, frozen My-La cells (Sigma Aldrich) were retrieved from previously cryopreserved stocks at -80˚C and rapidly warmed between gloved hands. The cell suspension was quickly pipetted into a 15 mL falcon tube containing Roswell Park Memorial Institute (RPMI) 1640 medium (Sigma Aldrich) at 37˚C, to a total of 10 mL. The tube was then centrifuged at 200 x g for 5 minutes at room temperature. The supernatant was removed and the cells were re-suspended in 10 mL RPMI-1640 containing 2 mM L-glutamine and sodium bicarbonate, supplemented with 1% penicillin-streptomycin (Sigma Aldrich), 10% AB human serum (Sigma Aldrich) and 100 U/mL recombinant human IL-2 (Sigma Aldrich).

The cells were cultured in a T-25 flask in an incubator at 37˚C with 5% CO2. Once cells reached a density of 8 x 10⁵ to 1 x 10⁶ per mL the cell suspension was transferred to a 15 mL falcon tube and centrifuged at 200 x g for 5 minutes. The supernatant was discarded and the pellet was re-suspended in 10 mL 1X Dulbecco’s Phosphate Buffered Saline (PBS; Sigma Aldrich) at 37˚C, followed by centrifugation at 200 x g for 5 minutes. This step was to remove debris originating from the AB human serum in the culture medium. The supernatant was again discarded and the pellet was re-suspended in 10 mL of supplemented medium at 37˚C, which was then divided into flasks of fresh medium at a ratio of 1:2 to 1:3 and returned to the incubator.

The growing and splitting process was repeated until enough flasks of cells were obtained for subsequent experiments. Any remaining cells were cryopreserved in the same manner as for HaCaT.


Cell counting

The suspension My-La cells were counted using a CASY Modell TT cell counter, both in order to maintain an optimal growing density and to obtain cell counts prior to fixation. The adherent HaCaT and NHEK cells, on the other hand, were counted prior to fixation using a haemocytometer under an optical microscope (Optika). This was because the cells tended to clump together even after trypsinisation, making automated counting inaccurate.

Formaldehyde cross-linking for ChIP

Formaldehyde acts on the cells to cross-link DNA and its bound proteins for subsequent chromatin-based experiments. Here, flasks of HaCaT cells, NHEKs or My-La cells were cross-linked as described below.

HaCaT and NHEK cells

HaCaT and NHEK cells were cross-linked whilst still adhered to the culture flask, following the method of Belton et al. (2012). A confluent flask of HaCaT cells yielded approximately 1 x 10⁷ cells, whereas a confluent flask of NHEKs yielded only up to 1 x 10⁶ cells per fixation.

The medium was removed and 22.5 mL DMEM (without serum) was layered over the cells. To cross-link, 625 µL of 37% formaldehyde was pipetted into the flask to a final concentration of 1% and the cells were incubated flat on the bench top for 10 minutes with gentle rocking every 2 minutes. After exactly 10 minutes, the reaction was quenched by the addition of 1.56 mL ice cold 2M glycine to a final concentration of 0.135M. The cells were again incubated flat on the bench top for 5 minutes, then transferred to ice and incubated for at least 15 minutes to make sure that the reaction was fully halted. After this, the fixed cells were scraped from the bottom of the flask using a 25 cm cell scraper (Greiner Bio-One) and the cell suspension was transferred to a 50 mL falcon tube. The cells were centrifuged at 200 x g for 10 minutes at 4˚C, the supernatant was removed and the cell pellet was re-suspended in 10 mL cold PBS. The cells were centrifuged at 200 x g for 10 minutes at 4˚C and the supernatant was completely removed. The cell pellet was snap frozen on dry ice for approximately 5 minutes before being transferred to a freezer at -80˚C.
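
The reagent volumes in this step follow from a standard dilution calculation. The sketch below is an illustrative helper, not part of the published protocol: it computes the final concentration after adding a stock to a starting volume, and the stock volume needed to reach a target concentration. With the values quoted above (625 µL of 37% formaldehyde into 22.5 mL) it returns 1%. For the glycine quench it gives roughly 0.126 M when the added glycine volume is counted in the total, and roughly 0.135 M when it is not.

```python
def final_concentration(stock_conc, stock_volume, starting_volume):
    """Concentration after adding stock_volume of stock to starting_volume.

    Volumes must share one unit; the result has the unit of stock_conc.
    """
    return stock_conc * stock_volume / (starting_volume + stock_volume)


def stock_volume_needed(stock_conc, target_conc, starting_volume):
    """Stock volume to add to starting_volume to reach target_conc."""
    return target_conc * starting_volume / (stock_conc - target_conc)


# Formaldehyde: 37% stock, 0.625 mL added to 22.5 mL of medium
formaldehyde_pct = final_concentration(37.0, 0.625, 22.5)

# Glycine: 2 M stock, 1.56 mL added to the 23.125 mL already in the flask
glycine_molar = final_concentration(2.0, 1.56, 22.5 + 0.625)

print(f"formaldehyde: {formaldehyde_pct:.2f}%  glycine: {glycine_molar:.3f} M")
```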


My-La cells

My-La cells were cross-linked in suspension. A confluent flask of HaCaT cells yielded approximately 1 x 10⁷ cells.

The My-La cell suspension was transferred to a 50 mL falcon tube and centrifuged at 200 x g for 5 minutes at room temperature. The supernatant was discarded and the cell pellet was re-suspended in 5 mL PBS at 37˚C. The tube was again centrifuged at 200 x g for 5 minutes at room temperature. The supernatant was then discarded and the cell pellet was re-suspended in 10 mL PBS, following which the suspension was topped up to 40 mL with PBS. In order to cross-link cells, 1.08 mL of 37% formaldehyde was added to a final concentration of 1% and the cells were incubated on a gently rocking platform at room temperature. After exactly 10 minutes, the reaction was quenched by the addition of 2.6 mL 2M ice-cold glycine. The cells were immediately transferred to ice and incubated, with gentle rocking, for 5 minutes in order to fully halt the reaction. The cells were centrifuged at 516 x g for 5 minutes at 4˚C, the supernatant was completely removed and the cell pellet was snap frozen on dry ice for approximately 5 minutes before being transferred to a freezer at -80˚C.

ChIP cell lysis and chromatin fragmentation

Firstly, the cross-linked cells were removed from the -80˚C freezer and defrosted on ice. The cells were suspended in complete lysis buffer (volume variable according to cell type) containing 50 mM Tris-HCl pH 8.1, 10 mM EDTA, 1% SDS (Life Technologies) and fresh 1X protease inhibitor (EDTA free) and incubated for 15-30 minutes on ice. The chromatin was then fragmented with a Covaris S220 sonicator using the settings described in Table 11. The sonicated sample was centrifuged at maximum speed for 15 minutes at 4˚C and the supernatant stored at -80˚C.

Optimisation of sonication for different cell types

The Covaris S220 instrument offers various settings for sonication procedures. A number of these settings were adjusted for each of the cell types used (Table 11). In addition, a sonication time-course was carried out for each cell type in order to identify the duration of sonication for obtaining appropriate length fragments (200-400 bp). At each time-point, 10 µL chromatin suspension was transferred to a fresh tube. To this suspension was added 90 µL cold TE buffer, containing 10 mM Tris-HCl pH 8.1 and 1 mM EDTA, and 2 µL of 10 mg/mL RNaseA. The samples were then incubated at 37˚C for 30 minutes with shaking at 400 rpm. Proteinase K (5 µL of 10 mg/mL) was added to each sample, followed by incubation at 65˚C for 2 hours with shaking at 400 rpm. The DNA was purified using a Purelink PCR purification kit (Life Technologies) according to the manufacturer’s instructions (Appendix, Table 23) and the samples were run on a 1.5% agarose gel, containing ethidium bromide, at 120V for 45 minutes. The gel was visualised on a UV transilluminator and the time-point generating DNA fragments between 200-400 bp was selected for use in the main experiment.

Table 11: Optimised Covaris sonication settings for individual cell types.

Condition | My-La | HaCaT | NHEK
Tube type | Tube AFA Fiber & Cap 12x12mm | Tube AFA Fiber & Cap 12x12mm | MicroTUBE AFA Fiber with Snap-Cap
Volume of input chromatin (µL) | 1000 | 1000 | 130
Duty cycle (%) | 5 | 10 | 5
Peak incident power (Watts) | 140 | 140 | 140
Cycles per burst | 200 | 200 | 200
Temperature (˚C) | 4 | 4 | 4
Time (min) | 20 | 20 | 8

Immunoprecipitation

Immunoprecipitation was carried out on the fragmented chromatin using antibodies targeting the modified histone marks H3K4me1 and H3K27ac, which mark active enhancer regions of DNA.

An excess of magnetic Dynabeads (Life Technologies) was prepared for each sample. Type A and type G beads were combined in a 1:1 ratio. The beads were washed a total of three times with 500 µL 50 mg/mL BSA in PBS. To achieve this, the beads were captured on a magnet, the solution was allowed to clear and the supernatant was discarded. The tube was then removed from the magnet and the beads re-suspended in wash buffer. During the third wash, the beads were incubated in the buffer for 30 minutes in order to reduce non-specific binding. The beads were captured on the magnet, the supernatant was removed and the beads were suspended in the starting volume of 50 mg/mL BSA in PBS.


For each immunoprecipitation, 100 µL of chromatin (corresponding to approximately 1 x 10⁶ cells for My-La and HaCaT and 4 x 10⁵ cells for NHEK; determined by optimisation) was thawed on ice and 400 µL dilution buffer (16.7 mM Tris-HCl pH 8.1, 1.1% Triton X-100 (Sigma Aldrich), 0.01% SDS, 167 mM NaCl and fresh protease inhibitors) was added to each aliquot, to make 500 µL total. In order to create a 1% input control sample, 5 µL of the diluted chromatin was removed and stored at 4˚C for later use. The 1% input sample controls for the amount of chromatin that is present in each biological sample. A rabbit polyclonal antibody for H3K4me1 (Abcam; ab8895) or H3K27ac (Abcam; ab4729) was added to each sample (2 µg of antibody for My-La and HaCaT samples and 1 µg of antibody for NHEK samples; determined by optimisation). Washed beads were added to each sample (20 µL), and the mixtures incubated overnight at 4˚C with rotation. At least one sample per experiment was incubated without antibody in order to control for background signal in the final analysis.

ChIP Elution and reverse cross-linking

At this stage of ChIP, the chromatin attached to the bead-bound antibody is washed in order to remove non-specific binding. The cross-linked chromatin is then eluted from the beads, which is achieved using an elution buffer containing SDS. Proteinase-K is used to reverse the cross-links and the DNA can be purified in preparation for analysis via qPCR.

Firstly, the beads were captured on the magnet and the supernatant discarded. The beads were washed in each of the buffers listed in Table 12. Each time, the beads were incubated in 0.5 mL cold buffer with rotation, followed by magnetic clearance and removal of the supernatant. The buffers were used in the following order: low salt wash buffer, high salt wash buffer, LiCl salt wash buffer and TE buffer.


Table 12: Immunoprecipitation wash buffers

Buffer | Recipe | Order
Low salt wash buffer | 0.1% SDS, 1.0% Triton X-100, 2 mM EDTA, 20 mM Tris-HCl (pH 8.1) and 150 mM NaCl | 1
High salt wash buffer | 0.1% SDS, 1.0% Triton X-100, 2 mM EDTA, 20 mM Tris-HCl (pH 8.1) and 500 mM NaCl | 2
LiCl wash buffer | 1.0% Igepal-CA630 (NP-40), 1.0% deoxycholate, 1 mM EDTA, 10 mM Tris-HCl (pH 8.1) and 250 mM LiCl | 3
TE buffer | 10 mM Tris-HCl (pH 8.1) and 1 mM EDTA | 4

After the last wash (TE buffer), the beads were captured on the magnet and the buffer was completely removed. The beads were re-suspended in 100 µL elution buffer (1% SDS and 0.1 M NaHCO3) and 1 µL of 10 mg/mL proteinase-K. The tubes were then incubated at 62˚C for 2 hours with shaking, followed by 95˚C for 10 minutes, after which they were cooled to room temperature. Finally, the beads were captured and the supernatant, containing the eluted samples, was transferred to a new tube. The DNA was purified using Purelink Quick PCR purification columns (Life Technologies). All samples were eluted in 50 µL elution buffer and stored at -20˚C.

qPCR for ChIP enrichment

ChIP enrichment was measured using quantitative real-time PCR (qPCR). This is a widely-used technique whereby the quantity of a DNA product is measured in real time during amplification, instead of at the end-point of PCR. Here, SYBR® green technology (ThermoFisher Scientific) was used to detect the products. SYBR® green is a dye that preferentially binds to double-stranded DNA (dsDNA) molecules and emits fluorescence, allowing the dsDNA to be detected and quantified in the qPCR reaction. The data generated by the qPCR reaction are given as a Ct (threshold cycle) value. This is the cycle number at which the amplification curve, generated by an exponential increase in copies of the product, intersects a certain threshold (Figure 20). Therefore, a larger quantity of product in the template corresponds with a lower Ct value.
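
To make the relationship between Ct and quantity concrete, the sketch below converts Ct values into enrichment relative to the 1% input sample using the widely used "percent input" convention. This passage does not state which calculation was used in this thesis, so the method, the function name and the assumption of perfect doubling per cycle (efficiency of 1.0) are all illustrative.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction=0.01, efficiency=1.0):
    """Express an immunoprecipitated sample as a percentage of total input.

    ct_input is the Ct of the diluted input sample (1% of the chromatin by
    default). efficiency=1.0 assumes the product doubles every cycle.
    """
    base = 1.0 + efficiency
    # Ct the input would have shown had 100% of the chromatin been assayed
    adjusted_input_ct = ct_input - math.log(1.0 / input_fraction, base)
    return 100.0 * base ** (adjusted_input_ct - ct_ip)


# An IP sample with the same Ct as the 1% input has recovered 1% of input;
# one cycle earlier corresponds to twice as much.
print(percent_input(ct_ip=25.0, ct_input=25.0))
print(percent_input(ct_ip=24.0, ct_input=25.0))
```

The same function can be applied to the no-antibody control so that background signal is expressed on the same scale.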


Figure 20: Representative qPCR plots. Amplification data is shown from a sample dilution series across multiple primer sets. The change in Rn (ΔRn; Y axis in both graphs) is calculated by the level of fluorescence of the test dye divided by the level from a passive reference dye, minus the baseline signal. (A) ΔRn is plotted against cycle; here the threshold (yellow line) intersects each amplification trace as it enters the exponential stage, corresponding with a cycle threshold (Ct) of approximately 22 for the first sample (arrow). (B) Log ΔRn is plotted against cycle; here the threshold (yellow line) intersects each amplification trace within the linear stage, again corresponding with a Ct of 22 for the first sample (arrow).

Primer design for ChIP

Primers were designed using Primer3 (http://primer3.ut.ee/) to target regions of 100-200 bp encompassing likely regulatory SNPs in 9q31 (Appendix, Table 26). Primer sequences were an optimum length of 20 nucleotides and avoided targets containing common variation (MAF > 1%). In addition, the primers had minimal evidence of self-complementarity; dsDNA from secondary products such as hairpins and primer dimers can cause false signal by binding SYBR® green dye. The primers were also run through Primer BLAST against the human reference genome (http://www.ncbi.nlm.nih.gov/tools/primer-blast/) to check their uniqueness. Primers were rejected if they could anneal to an off-target template with two or fewer mismatched base pairs in the nucleotide sequence.


Primer efficiency and specificity for ChIP

For qPCR using SYBR® green, primers must have sufficient specificity for the target region and sufficient reaction efficiency. Specificity matters because SYBR® green intercalates with all dsDNA, so any non-specific amplification in the reaction will produce false positive results. Primer specificity can be tested during qPCR by analysis of a melt curve profile. PCR efficiency can be tested by analysis of the reaction output over a dilution series.

Here, primer pair efficiency was tested against dilutions of a DNA template (Human Random Control DNA, Sigma) using SYBR® green as the reporter. In order to create a standard curve for each primer pair, 10-fold serial dilutions were made of the DNA (1, 1/10, 1/100, 1/1000 and 1/10,000). Each primer set was then tested in triplicate against each concentration of the template. For each primer set, the resulting Ct values were plotted against Log(dilution factor) in Excel and linear regression was used to generate a slope. The percentage primer efficiency was calculated as $(10^{-1/\text{slope}} - 1) \times 100$ (Figure 21). An acceptable efficiency range is 90 – 110%, where 100% efficiency indicates that there are 3.32 ($\log_2 10$) cycles difference in Ct values between each 10-fold dilution, i.e. there is a 2-fold increase in product per cycle.
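The efficiency calculation above can be sketched in a few lines of Python. This is an illustrative sketch only: the analysis itself was carried out in Excel, the function names are hypothetical, and the Ct values in the example series are invented for demonstration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y on x; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def primer_efficiency(slope):
    """Percentage efficiency: (10^(-1/slope) - 1) x 100."""
    return (10 ** (-1.0 / slope) - 1.0) * 100.0


def is_acceptable(efficiency, low=90.0, high=110.0):
    """Apply the 90 - 110% acceptance range described in the text."""
    return low <= efficiency <= high


# Hypothetical mean Ct values for a 10-fold dilution series (1 to 1/10,000).
log_dilution = [0, -1, -2, -3, -4]
mean_ct = [22.0, 25.3, 28.7, 32.0, 35.3]

slope, intercept = fit_line(log_dilution, mean_ct)
efficiency = primer_efficiency(slope)
print(f"slope = {slope:.3f}, efficiency = {efficiency:.1f}%, "
      f"acceptable = {is_acceptable(efficiency)}")
```

Applying `primer_efficiency` to the two slopes reported in Figure 21 (-3.2241 and -3.0442) gives efficiencies that agree with the figure to two decimal places, with the second falling outside the acceptable range.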

The melt curve produced by the qPCR reaction was analysed by eye for each primer set to determine target specificity. A single peak in the melt curve indicates high specificity for the target region, whereas multiple peaks, especially at lower temperatures, can indicate the occurrence of primer dimers or contamination (Figure 21).


[Figure 21 panels. A: Ct value plotted against Log(dilution factor) for (i) y = -3.2241x + 25.922, R² = 0.9953, efficiency = 104.25% and (ii) y = -3.0442x + 24.765, R² = 0.9977, efficiency = 113.06%. B: melt curves for (i) and (ii), showing derivative reporter (-Rn') against temperature (˚C).]

Figure 21: Example efficiency and specificity measurements for optimal and non-optimal primer pairs. The linear regression line for the first primer pair indicates an efficient PCR reaction (Ai.), whereas the second primer pair is over-efficient (Aii.). The corresponding melt curves demonstrate a single reaction product for the first primer pair (Bi.), whereas some off-target products are indicated with the second primer pair, particularly at low concentrations (1/10,000; purple line); this could be due to primer dimer formation (Bii.).


ChIP qPCR and analysis

For each target region, qPCR was carried out on three ChIP libraries in PCR triplicates in a MicroAmp® Optical 384-Well Reaction Plate (ThermoFisher Scientific). The reactions were set up as follows:

| Component | Volume |
| --- | --- |
| ChIP template | 1 µL |
| Power SYBR® Green (2X) | 5 µL |
| Forward primer (10 µM) | 0.5 µL |
| Reverse primer (10 µM) | 0.5 µL |
| Distilled water | 3 µL |
| TOTAL | 10 µL |

The plates were covered with MicroAmp® Optical Adhesive Film (ThermoFisher Scientific) and run on a QuantStudio 12K Flex Real-Time PCR System (ThermoFisher Scientific) using the following settings:

50 ˚C for 2 minutes (1 cycle)
95 ˚C for 10 minutes (1 cycle)
95 ˚C for 15 seconds, then 60 ˚C for 1 minute (40 cycles)
95 ˚C for 15 seconds, 60 ˚C for 15 seconds, then 95 ˚C for 15 seconds (melt curve; 1 cycle)

For each ChIP plate, the target regions included positive controls (regions where the H3K4me1 and H3K27ac histone marks have been experimentally shown to be present) and a negative control (a region where they are absent). For each primer pair, a no-antibody control and a no-template control were included. The 1% input samples were run alongside the samples in order to account for differing amounts of chromatin in each sample.

The raw Ct values from the qPCR were exported from the QuantStudio 12K Flex software into Microsoft Excel for analysis. Initially, the Ct values of the technical PCR triplicates were screened for a standard deviation (s.d.) < 0.5. For some reactions the s.d. of the Ct values of the PCR triplicates was higher; this was particularly the case for negative control regions and no-antibody control samples. In these cases, the two replicates closest to each other were retained whilst the outlier was excluded from the analysis.

The Ct values for the 1% input samples were adjusted to 100% (Ct – 6.644 cycles). This value of 6.644 is derived from log2 of 100, i.e. a dilution factor of 100. The adjusted input and sample Ct values were added into the final calculation to obtain the ChIP enrichment for each target:

$$\text{ChIP enrichment} = 2^{(\text{adjusted input Ct} - \text{sample Ct})} \times 100$$

Finally, in order to normalise for background at each target, the enrichment value for the no-antibody control was subtracted from the enrichment value for the immunoprecipitated sample. To detect if a target had significantly higher binding enrichment than the negative control region, multiple T tests were conducted using GraphPad Prism 7. The Holm-Šídák test was used to control for multiple comparisons, as recommended by Prism.
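As a worked illustration of the input adjustment, enrichment calculation and background subtraction described above, the following Python sketch may be helpful. It is not the spreadsheet used in the analysis; the function names and the Ct values are hypothetical.

```python
import math


def adjust_input_ct(input_ct, input_fraction=0.01):
    """Adjust the Ct of a diluted input to 100% input.

    For a 1% input this subtracts log2(100) = 6.644 cycles.
    """
    return input_ct - math.log2(1.0 / input_fraction)


def chip_enrichment(sample_ct, input_ct, input_fraction=0.01):
    """Enrichment as a percentage of input: 2^(adjusted input Ct - sample Ct) x 100."""
    adjusted = adjust_input_ct(input_ct, input_fraction)
    return 2.0 ** (adjusted - sample_ct) * 100.0


def background_corrected(ip_ct, no_antibody_ct, input_ct):
    """Subtract the no-antibody enrichment from the immunoprecipitated enrichment."""
    return chip_enrichment(ip_ct, input_ct) - chip_enrichment(no_antibody_ct, input_ct)


# Hypothetical mean Ct values for one target region.
input_ct = 25.0        # 1% input
ip_ct = 24.0           # immunoprecipitated sample
no_antibody_ct = 31.0  # no-antibody control

print(round(chip_enrichment(ip_ct, input_ct), 3))
print(round(background_corrected(ip_ct, no_antibody_ct, input_ct), 3))
```

A sample with the same Ct as the 1% input yields an enrichment of 1%, which is a convenient sanity check on the adjustment.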


2.3.1.3 Chromosome conformation capture

3C is a hypothesis-driven method for identifying individual DNA interactions in a specific genetic locus. 3C libraries for HaCaT and My-La cells were produced according to the protocol initially described by Lieberman-Aiden et al. (2009); established in the ARUK lab by Dr Amanda McGovern. The 3C process involves cross-linking of interacting regions followed by digestion with a restriction enzyme (HindIII), after which the sequences are joined through intramolecular ligation. Following cross-link reversal, the product can be detected via qPCR (Dekker et al., 2002).

Formaldehyde cross-linking for 3C

Formaldehyde cross-linking was performed in a similar manner to ChIP but using optimised conditions for 3C. For each 3C experiment, three vials of confluent cells (2-3 x 10^7 cells) were combined to create a library. HaCaT or My-La cells were cross-linked as described below.

HaCaT cells

The medium was removed and 22.5 mL DMEM (without serum) was layered over the cells. To cross-link, 625 µL of 37% formaldehyde was pipetted into the flask to a final concentration of 1% and the cells were incubated flat on the bench top for 10 minutes with gentle rocking every 2 minutes. After exactly 10 minutes, the reaction was quenched by the addition of 1.56 mL ice cold 2M glycine to a final concentration of 0.135M. The cells were again incubated at room temperature for 5 minutes, then transferred to ice and incubated for at least 15 minutes to make sure that the reaction was fully quenched. After this, the fixed cells were scraped from the bottom of the flask using a 25 cm cell scraper (Greiner Bio-One) and the cell suspension was transferred to a 50 mL falcon tube. The cells were centrifuged at 200 x g for 10 minutes at 4˚C, the supernatant was removed and the cell pellet was re-suspended in 10 mL cold PBS. The cells were centrifuged at 200 x g for 10 minutes at 4˚C and the supernatant was completely removed. The cell pellet was snap frozen on dry ice for approximately 5 minutes before being transferred to a freezer at -80˚C.

My-La cells

The cell suspension was transferred to a 50 mL falcon tube and centrifuged at 200 x g for 5 minutes at room temperature. The supernatant was discarded and the cell pellet was re-suspended in 5 mL PBS at 37˚C. The tube was again centrifuged at 200 x g for 5 minutes at room temperature. The supernatant was then discarded and the cell pellet was re-suspended in 10 mL room temperature DMEM, following which the suspension was topped up to 40 mL with DMEM. In order to cross-link the cells, 2.16 mL of 37% formaldehyde was added to a final concentration of 2% and the cells were incubated on a gently rocking platform at room temperature. After exactly 10 minutes, the reaction was quenched by the addition of 6 mL ice-cold 1 M glycine. The cells were incubated for 5 minutes at room temperature then transferred to ice and incubated, with gentle rocking, for 15 minutes in order to fully quench the reaction. The cells were centrifuged at 453 x g for 10 minutes at 4˚C, the supernatant was completely removed and the cell pellet was re-suspended in cold PBS to a final volume of 50 mL. The cells were centrifuged at 453 x g for 10 minutes at 4˚C, the supernatant was completely removed and the pellet was snap frozen on dry ice for approximately 5 minutes before being transferred to a freezer at -80˚C.

Cell lysis and restriction digestion

The next stage in the production of a 3C library is to lyse the cells and digest the DNA into roughly equal length fragments using a restriction enzyme; here HindIII.

The cross-linked cells were lysed in 50 mL ice cold lysis buffer (10 mM Tris-HCl, 0.2% NP-40/Igepal CA-420, 10 mM NaCl and protease inhibitors in H2O). Firstly, the cell pellets were re-suspended in 7 mL lysis buffer and kept on ice for 10 minutes. The cells were then mechanically lysed twice using dounce homogenisers (10 strokes each time, with a 5 minute gap between homogenisations to prevent the cells from overheating). The cellular suspension was combined with the remaining 43 mL of lysis buffer and kept on ice for the rest of the incubation time. The suspension was then centrifuged at 650 x g for 5 minutes at 4˚C. The supernatant was discarded and 800 µL 1.25 X NEBuffer 2 (NEB) was layered on top of the pellet. The supernatant was again discarded and the pellet was re-suspended in 250 µL of 1.25 X NEBuffer 2 per 5-6 x 10^6 cells, then split into corresponding 250 µL aliquots and 108 µL 1.25 X NEBuffer 2 added to each aliquot.

In order to remove proteins not directly cross-linked to the DNA, 11 µL 10% SDS was added to each aliquot, and carefully mixed. The aliquots were then incubated at 37°C for 60 minutes with rotating at 950 rpm on a Thermomixer. To quench the reaction, 75 µL 10% Triton X-100 was added to each aliquot followed by incubation at 37°C for 60 minutes with rotating at 950 rpm. After this, the chromatin was digested with 1500 units (15 µL) of HindIII (NEB R0104T) per tube followed by an overnight incubation at 37°C with rotating at 950 rpm.

Ligation and reverse cross-linking

Next, DNA ligase is used to join the interacting chromatin fragments. The ligation reaction is performed in a large reaction volume that favours intramolecular ligation over intermolecular ligation between non cross-linked fragments. Ligation is followed by incubation with proteinase-K, which reverses the cross-links.

Here a ligation reaction mixture was prepared in a 15 mL tube for each aliquot (6.71 mL distilled water, 82 µL 10 mg/mL BSA and 820 µL T4 DNA ligase reaction buffer (NEB; B0202S)). The chromatin was added to each tube, followed by the addition of 2 µL of T4 DNA ligase (NEB; M0202S). The tubes were incubated in a water bath at 16˚C for 4-6 hours. After this, cross-links were reversed using 60 µL (10 mg/mL) proteinase-K per tube and the samples were incubated in a water bath overnight at 65˚C.

DNA purification and quantification

Following cross-link reversal, the DNA is isolated using phenol-based organic solvents. These solvents work by denaturing proteins in the sample and causing them to remain in the organic phase, whilst the DNA resides in the aqueous phase. The aqueous layer containing the DNA can then be pipetted off and the organic phase is discarded.

To ensure complete reversal of cross-links, a further 60 µL 10 mg/mL proteinase-K was added to each sample, followed by incubation for 2 hours at 37˚C. RNaseA (12.5 µL of 10 mg/mL) was then added to each sample, followed by incubation for 1 hour at 37˚C. The samples were then transferred to 50 mL phase lock gel tubes (5 Prime) and the DNA was purified by phenol and phenol-chloroform extractions as follows.

Firstly, 8 mL phenol pH8.0 (Sigma P4557) was added to all tubes and mixed by inverting. The tubes were centrifuged at 1500 x g for 5 minutes at room temperature to allow the organic phase to move below the gel layer. Next, 2 mL TE buffer (10 mM Tris-HCl pH8.1, 1 mM EDTA) was added to all the tubes followed by 10 mL phenol:chloroform:isoamyl alcohol pH8.0 (P:C:I; Sigma P3803), then mixed by inverting. The tubes were centrifuged at 1500 x g for 5 minutes at room temperature and the resulting aqueous layer transferred to a fresh 50 mL tube. The DNA was precipitated overnight at -20˚C (1.0 mL NaOAc pH5.2 and 25 mL 100% EtOH per tube).

Following precipitation, the samples were mixed and centrifuged at 2500 x g for 30 minutes at 4˚C. The supernatant was completely removed and the pellets dried at 37˚C for 45 – 60 minutes. The pellets were then re-suspended in 400 µL fresh 1X TE buffer. The samples were transferred to 2 mL phase lock gel tubes (5 Prime) and two P:C:I extractions were performed.

Firstly, 400 µL P:C:I was added to all the tubes and mixed by inverting. The tubes were then centrifuged at 12,000 – 16,000 x g for 5 minutes at room temperature to allow the organic phase to move below the gel layer. Another 400 µL P:C:I was added to all the tubes and mixed by inverting, followed by centrifugation at 12,000 – 16,000 x g for 5 minutes at room temperature. The resulting aqueous layer was transferred to a fresh 1.5 mL tube and the DNA was precipitated in ethanol overnight at -20˚C (40 µL NaOAc pH5.2 and 1 mL 100% EtOH per tube).

Following precipitation, the samples were centrifuged at 14,500 x g for 30 minutes at 4˚C, and then washed three times with fresh, cold 70% ethanol. Each time, the pellet was re-suspended in the ethanol then re-centrifuged for 10 minutes at 4˚C. After the final wash, the ethanol was removed and the pellets were dried at room temperature for 2 – 5 minutes. The pellets were re-suspended in 25 µL fresh TE buffer and pooled together into a single tube.

The libraries were quantified using a Qubit® fluorometer (Invitrogen) with the Qubit® dsDNA Broad Range (BR) kit (Life Technologies) according to the manufacturer’s instructions (Appendix, Table 23). This kit uses a reagent that binds to DNA molecules and emits fluorescence in such a way that the amount of fluorescence is directly proportional to the amount of DNA in the sample.

Quality control of 3C libraries

In order to test the quality of the 3C libraries, gel electrophoresis was performed. A 1 µL aliquot was taken from each library and diluted 1:10 in water. From this solution, 2 µL and 6 µL aliquots were made up to 10 µL with water. These aliquots were then run on a 0.8% agarose gel containing ethidium bromide (100V for 75 minutes) and viewed under a UV transilluminator. Libraries passed this QC step if the DNA formed tight bands that were greater than 10 kb in size (Appendix, Figure 61).

The next quality control step involves testing the 3C libraries for the presence of known long-range and short-range interactions; this can be done by running a PCR reaction with primers known to amplify these DNA interactions. Primers targeting known interactions in the MYC and AHF loci were used (Belton et al., 2012), as well as a custom-designed primer pair targeting a short-range interaction in one of the regions of interest, 9q31 (Appendix, Table 27).

PCRs were set up using the following reagents:

| Component | Volume |
| --- | --- |
| 200 ng 3C library | 1 µL |
| 5X PCR mix (Qiagen) containing 200 µM of each dNTP | 5 µL |
| 10 µM forward primer | 1 µL |
| 10 µM reverse primer | 1 µL |
| Hotstar Taq polymerase (Qiagen) | 0.5 µL |
| Distilled water | 16.5 µL |
| TOTAL | 25 µL |

The PCR cycling conditions were as follows:

95 ˚C 15 minutes 60 ˚C 1 minute 72 ˚C 1 minute 36 cycles 94 ˚C 30 seconds 60 ˚C 2 minutes 72 ˚C 10 minutes 4 ˚C Hold

The PCR products were run on a 1.5% agarose gel containing ethidium bromide (120V, 75 minutes) and viewed under a UV transilluminator. Libraries passed this QC step if ligation products could be detected for 2 short-range interactions and 2 or 3 long-range interactions (described in the Appendix, Table 27; representative gels shown in the Appendix, Figure 61).


Generation of bacterial artificial chromosome (BAC) control libraries

For correct interpretation of 3C qPCR data, it is important to include a control library that contains all potential interactions between the restriction fragments in the locus of interest. This library serves the purpose of controlling for differing PCR efficiencies. Sequential dilutions of the control library are run alongside the test library on the qPCR plate and a standard curve is produced, against which the interaction values are normalised.

Bacterial Artificial Chromosomes (BACs) are vectors that can be loaded with fragments of human DNA. BACs can be carried by proliferating bacteria such as E. coli, which allows the DNA to be copied many times (BAC clones). Here, 3C control libraries were created from equimolar quantities of BACs. To design the libraries, the UCSC Genome Browser was used to select minimally overlapping BACs (NCBI database) that spanned each region of interest (Figure 22). For the 9q31 locus, 11 BACs were chosen, spanning the region chr9:110168556-111889073 (GRCh37). For the 6q23 region, ready-made BAC control libraries consisting of 9 minimally overlapping BACs spanning the region chr6:137286536-138591433 (GRCh37) were generously provided by Dr Amanda McGovern. A full list of BAC IDs can be found in the Appendix (Table 28). The method for making BAC libraries is based on the protocol described by Naumova et al. (2012).


Figure 22: Minimally-overlapping BACs in the 9q31 locus. Selected BACs (highlighted) spanned a region encompassing the most likely gene candidate KLF4, as well as the gene cluster on the other side of the gene desert containing ACTL7B, IKBKAP, FAM206A, CTNNAL1 and TMEM245. The psoriasis-associated region (Ps) is shown as a yellow rectangle.

Growing BACs and extracting DNA

BAC clones (Life Technologies) were streaked onto individual LB-agar plates containing 12.5 µg/mL chloramphenicol (Sigma) and grown overnight at 37˚C. Single colonies were selected and grown in 5 mL LB-broth starter cultures containing 12.5 µg/mL chloramphenicol for 6-8 hours with shaking at 37˚C. A large 100 mL LB-broth/chloramphenicol solution was inoculated with 1 mL of starter culture and incubated overnight with shaking at 37˚C. The cells were collected by centrifugation at 1500 x g for 15 minutes at 4˚C, the supernatant discarded, and the cell pellets stored at -80˚C prior to DNA extraction. BAC DNA was extracted using the Nucleobond BAC 100 kit (Macherey-Nagel) according to the manufacturer’s instructions. The resultant DNA was reconstituted in 200 µL deionised water on a shaking heat block at 20˚C for 10-60 minutes, then quantified on a Dropsense system and stored at -20˚C.

Confirmation of BAC clone identity

The identity of each BAC clone was confirmed using PCR, alongside 100 ng Human Random Control DNA as a positive control. Primers were designed to target two separate regions near the start and end of the DNA fragment of each BAC clone (see Appendix, Table 29). The PCR reaction mixtures contained:

| Component | Volume |
| --- | --- |
| Template (BAC DNA or human DNA) | 1 µL |
| MyTaqHS polymerase (Bioline) | 0.5 µL |
| PCR reaction buffer (5X) | 5 µL |
| Forward primer (10 µM) | 0.25 µL |
| Reverse primer (10 µM) | 0.25 µL |
| Distilled water | 17.5 µL |
| TOTAL | 25 µL |

PCR reactions were carried out according to the manufacturer’s optimised settings for the MyTaqHS polymerase (Bioline). The PCR products were run on a 1.5% agarose gel containing ethidium bromide (120V for 75 minutes) and the bands were visualised on a UV transilluminator (examples shown in the Appendix, Figure 57).

Digestion of BAC DNA

BAC DNA was combined in equimolar quantities using the following formula:

$$V_x = L_x \times \frac{M}{C_x}$$

where

V_x = Volume of BAC DNA (µL)

L_x = Length of BAC fragment (kb)

C_x = Concentration of BAC DNA (µg/µL)

M = Total amount of DNA required in the reaction (here 20 µg) divided by the total length of all BAC fragments.
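The pooling formula can be illustrated with a short Python sketch. The BAC names, lengths and concentrations below are invented for demonstration and do not correspond to the clones listed in the Appendix; the function name is likewise hypothetical.

```python
def equimolar_volumes(bacs, total_ug=20.0):
    """Volume (µL) of each BAC needed for an equimolar pool.

    bacs maps a BAC name to (insert length in kb, concentration in µg/µL).
    Implements V = L x (M / C), where M = total DNA / total length of all BACs.
    """
    total_length_kb = sum(length for length, _ in bacs.values())
    m = total_ug / total_length_kb  # µg of DNA required per kb of BAC
    return {name: length * (m / conc) for name, (length, conc) in bacs.items()}


# Hypothetical BACs: (length in kb, concentration in µg/µL).
example_bacs = {
    "BAC_A": (150.0, 0.50),
    "BAC_B": (200.0, 1.00),
    "BAC_C": (150.0, 0.25),
}

volumes = equimolar_volumes(example_bacs)
for name, volume in volumes.items():
    print(f"{name}: {volume:.1f} µL")

# The mass contributed by each BAC is proportional to its length,
# so the pooled masses sum to the requested total.
total_mass = sum(volumes[n] * example_bacs[n][1] for n in example_bacs)
print(f"total DNA: {total_mass:.1f} µg")
```

Because each BAC contributes mass in proportion to its length, every clone is present at the same number of molecules in the pooled 20 µg.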

The BAC DNA was digested overnight at 37˚C in a total reaction volume of 500 µL containing 20 µg BAC DNA, 50 µL HindIII (NEB), 50 µL 10X CutSmart buffer (NEB) and deionised water.


After digestion, a 1:1 phenol:chloroform extraction was carried out as described on page 141, followed by ethanol precipitation for several hours at -20˚C. The sample was then centrifuged at 14,500 x g for 20 minutes at 4˚C, and washed in 1 mL of fresh, cold 70% ethanol. After centrifugation, the pellet was briefly dried at room temperature then re-suspended in 44 µL deionised water and incubated at 37 ˚C for 15 minutes.

Ligation and purification of control libraries

The ligation reaction was performed overnight at 16 °C in a total volume of 60 µL containing 44 µL digested BAC DNA, 5 µL T4 DNA ligase, 6 µL 10X T4 ligase buffer and 5 µL water.

The next day, the sample was incubated at 65 °C for 15 minutes in order to inactivate the ligase. Next, 140 µL deionised water was added and two 1:1 P:C:I extractions were performed. Following this, a 1:1 chloroform extraction was performed in the same way using 200 µL of chloroform. The upper layer was transferred to a fresh 2 mL tube, then 20 µL of 3M NaOAc, pH 5.2 and 500 µL of ice cold 100% ethanol were added and mixed by gentle inversion, and the sample was incubated at -20˚C for several hours.

The sample was centrifuged at 14,500 x g for 20 minutes at 4˚C, and then washed twice in 1 mL of fresh, cold 70% ethanol. After centrifugation, the pellet was briefly dried at room temperature then re-suspended in 100 µL 1X TE, pH 8.0 and incubated at 37 ˚C for 15 minutes. The DNA was then quantified using a Qubit® fluorometer with the Qubit® dsDNA BR kit and stored at -20˚C.

3C qPCR and analysis

Finally, qPCR was carried out to test for the presence of interactions in the regions of interest.

Primer design for 3C

Primers were designed using Primer3 to complement regions at the ends of the HindIII fragments to be tested in each locus (see Appendix Table 30 and Table 31 for primer sequences). In order to detect the required ligation products, the primers were designed in a unidirectional manner, approximately 50-100 bp from the restriction cut site. The unidirectional design prevents amplification of unwanted products including self-ligated fragments (see Figure 23). Assuming that the two fragments were ligated during library preparation, this design generated amplicons of up to 200 bp. As for ChIP, primer sequences avoided targets containing common variation (MAF > 1%) and had minimal evidence of hairpins and primer dimers.

For each anchor fragment, a primer was designed to target a HindIII fragment located 2 – 3 fragments along from the anchor fragment. This served as a short-range positive control, since fragments in close proximity to each other should ligate together by chance. Depending upon the assay design, some experiments included a primer targeting a negative control region (NCR) in between the anchor and the target fragment(s). An interaction between the anchor fragment and the NCR was not expected to occur, based on prior knowledge of interactions in the locus. Finally, primers were designed to capture interactions with the chosen target fragments.

Figure 23: Primer design and potential ligation products in 3C, adapted from Naumova et al. (2012). In 3C, primers are designed to target the same strand at the ends of the two restriction fragments to be tested (1). There are several potential ligation products (2,3) but the only product that can be detected by qPCR, according to this design, is a product that is formed by ligation between the two fragments of interest (asterisk in 3.ii.). Used with permission.

SYBR® green and TaqMan® protocols

For each target, qPCR was carried out for three 3C libraries using PCR triplicates in a MicroAmp® Optical 384-Well Reaction Plate. For each tested interaction, 10-fold dilutions of the BAC template (50 – 0.005 ng) were included alongside the 3C library templates. The reporter used was either SYBR® green or TaqMan® (ThermoFisher Scientific), as described below.

SYBR® green assays

For the majority of 3C-qPCR assays, SYBR® green was used as a reporter. Reaction mixtures were set up as follows:

| Component | Volume |
| --- | --- |
| 3C template (50 ng/µL) or BAC template | 1 µL |
| Power SYBR® Green (2X) | 5 µL |
| Anchor primer (10 µM) | 0.5 µL |
| Target primer (10 µM) | 0.5 µL |
| Distilled water | 3 µL |
| TOTAL | 10 µL |

PCR cycling conditions for SYBR® green are described in the ChIP methods (page 137).

TaqMan® analyses

The use of TaqMan® technology is another approach towards detecting 3C products. TaqMan® probes are short nucleotide sequences that can be designed to complement a target region of interest between the forward and reverse primers. The probe is conjugated to a fluorophore at the 5’ end that is inhibited by a quencher at the 3’ end (denoted F and Q in Figure 24). As the target region is amplified, the polymerase cleaves the bound probe, separating the fluorophore from the quencher so that the fluorophore emits fluorescence, which is detected by the qPCR machine. Therefore, unlike SYBR® green, which detects all double-stranded DNA, TaqMan® is specific to the target locus and reduces the likelihood of non-specific signals. In 3C, the TaqMan® probe is designed to bind to a region between the anchor primer and the restriction cut site, as close as possible to the latter (Figure 24). Here, a probe was used to target the restriction fragment containing KLF4 in the third qPCR assay in 9q31 (see page 153).


Figure 24: The use of TaqMan® technology in 3C-qPCR, from Hagege et al. (2007). A TaqMan probe consists of an oligonucleotide with a fluorophore at the 5’ end (F) and a quencher at the 3’ end (Q). In 3C the TaqMan probe is designed to complement a region close to the restriction cut site in the constant fragment (also known as the anchor fragment). During 3C, a target fragment ligates to the anchor fragment creating a chimeric product. Unidirectional primers (C and T1) ensure that the ligation product can be detected. Upon qPCR elongation, the enclosed TaqMan probe releases fluorescence due to cleavage of the quencher, leading to a signal that is proportional to the amount of chimeric product. Figure used with permission.


Here, reactions were set up in the following quantities according to Hagege et al. (2007):

| Component | Volume |
| --- | --- |
| 3C template (100 ng/µL)/BAC template | 1 µL |
| TaqMan® gene expression mastermix | 5 µL |
| TaqMan probe (1.5 µM) | 1 µL |
| Anchor primer (10 µM) | 0.5 µL |
| Target primer (10 µM) | 0.5 µL |
| Distilled water | 2 µL |
| TOTAL | 10 µL |

The plates were run on a QuantStudio 12K Flex Real-Time PCR System using the following settings:

50 ˚C for 2 minutes (1 cycle)
95 ˚C for 10 minutes (1 cycle)
95 ˚C for 15 seconds, then 60 ˚C for 1 minute (40 cycles)

3C data analysis

For both SYBR® and TaqMan® assays, the data were exported from the QuantStudio 12K Flex software into Microsoft Excel for analysis. Initially, the Ct values of the technical PCR triplicates were screened for a standard deviation (s.d.) < 0.5. For some interactions, the s.d. of the Ct values of the PCR triplicates was > 0.5; this was particularly the case for negative control regions and low concentrations of BAC library (0.005 ng/µL). In these cases, the two replicates closest to each other were retained whilst the outlier was excluded from the analysis.
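The replicate screening rule described above can be expressed as a short Python sketch. This is illustrative only: the screening was performed in Excel, the function name is hypothetical, and the use of the sample standard deviation (as in Excel's STDEV) is an assumption, since the text does not state which estimator was used.

```python
import statistics
from itertools import combinations


def screen_replicates(ct_values, max_sd=0.5):
    """Return the Ct replicates retained for analysis.

    If the sample s.d. of the replicates is below max_sd, all are kept.
    Otherwise the two replicates closest to each other are retained and
    the remaining outlier is dropped.
    """
    if statistics.stdev(ct_values) < max_sd:
        return list(ct_values)
    closest_pair = min(combinations(ct_values, 2),
                       key=lambda pair: abs(pair[0] - pair[1]))
    return list(closest_pair)


# Hypothetical triplicates: one consistent set, one with an outlier.
print(screen_replicates([24.1, 24.2, 24.0]))
print(screen_replicates([30.1, 30.3, 32.5]))
```

A further check on the retained pair, as applied to the BAC dilutions, could be added by recomputing the s.d. on the returned values.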

For each primer pair, BAC curves were generated from Log10 of the library concentration against the average Ct value across PCR triplicates (Figure 25). If the PCR triplicate Ct values for a particular dilution of the BAC library had a s.d. > 0.5, even after excluding an outlying Ct value, that BAC dilution was not used in the curve. Each curve was generated from a minimum of three serial dilutions of the BAC library.


[Figure 25 plot: BAC standard curve of Ct value against Log10(concentration); y = -3.2929x + 28.924, R² = 0.9973.]

Figure 25: Example of a standard curve generated from dilutions of a BAC library. For this target, qPCR reactions were carried out in triplicate for five 10-fold dilutions of BAC library (50 – 0.005 ng/µL). The Ct value (mean of PCR triplicates) is plotted against Log(concentration in ng/µL). Here, the slope is -3.29 and the Y intercept is 28.9 (3 s.f.). The efficiency of this primer pair is 101.2%.

The interaction frequency was calculated for each tested interaction in the following manner:

$$\text{Interaction Frequency } (F) = 10^{(\mathrm{Ct} - i)/s}$$

where

Ct = Measured cycle threshold value of 3C library (mean of PCR triplicates)

i = Y intercept of BAC curve (see Figure 25)

s = Slope of BAC curve (see Figure 25)
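As an illustration of how a measured Ct is converted to an interaction frequency using the BAC standard curve, a minimal Python sketch is given below. The function name is hypothetical; the slope and intercept are those of the example curve in Figure 25, and the 3C library Ct value is invented for demonstration.

```python
def interaction_frequency(ct, intercept, slope):
    """Interaction frequency F = 10^((Ct - i) / s).

    Inverts the BAC standard curve Ct = s * log10(concentration) + i,
    so F is expressed in the concentration units of the curve.
    """
    return 10 ** ((ct - intercept) / slope)


# Slope and Y intercept of the example BAC curve in Figure 25.
SLOPE = -3.2929
INTERCEPT = 28.924

# Hypothetical mean Ct for a 3C library at this primer pair.
library_ct = 31.5
print(interaction_frequency(library_ct, INTERCEPT, SLOPE))
```

A Ct equal to the Y intercept corresponds to F = 1, and each decrease of one slope-width in Ct corresponds to a 10-fold increase in F.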

The interaction frequencies were then normalised to the short-range control in the following manner:

$$\text{Relative Interaction Frequency } (R) = \frac{F(\text{test interaction})}{F(\text{positive control})}$$

where

F = Interaction frequency

Interactions were considered to be positive if the relative interaction frequency was greater than that of a negative control in the locus. Statistical tests were performed using GraphPad Prism 7 (T test or ANOVA, as appropriate). Where a single target interaction was compared to a negative control interaction, an unpaired T test was used to challenge the null hypothesis that the mean value of the two different groups was equal. Where multiple interactions were compared across a locus, one-way ANOVA was used to compare the mean values of multiple groups. Here, tests for multiple comparisons were used in order to determine which interactions were significantly different from each other (Dunnett test for one negative control region; Tukey’s test for multiple regions as recommended by Prism).

Selection of fragments for 3C

Assay designs in the 9q31 and 6q23 loci were informed by results from bioinformatic analyses (See results Section 2.4.1). See Appendix Table 30 and Table 31 for primer sequences.

9q31 3C assays

In 9q31, 3C-qPCR assays were conducted to determine if the intergenic psoriasis association interacted with genes on either side of the gene desert. Although the assay was designed to be relatively hypothesis-free, it was necessary to prioritise gene targets due to the low-throughput nature of the protocol. Therefore, fragments were selected encompassing the promoters of candidate genes based on GTEx expression data and prior knowledge of chromatin architecture in the locus. This led to the targeting of several fragments surrounding KLF4, as a likely gene candidate. In total, three separate assay designs were drawn up involving the HindIII fragments shown in Figure 26; the assay designs are detailed in Table 13, Table 14 and Table 15.

Figure 26: Anchor and target fragments for 3C assays across the 9q31 (KLF4) locus. The gene desert in 9q31 represents a single large interacting domain: the heatmap shows Hi-C interaction data for NHEK at 100 kb resolution (Rao et al., 2014). HindIII fragments (green bars) were selected to map interactions between psoriasis-associated variants and surrounding gene regions: RP11-363D24.1 (1); KLF4 centromeric 1 (2); KLF4 centromeric 2 (3); KLF4 gene & promoter (4); KLF4 short range (5); Intergenic A (6); Psoriasis-associated LD block (r² > 0.8 with rs10979182) regions 1-9 (7-15); Enhancer 2 short range (16); Psoriasis-associated LD block regions 10-11 (17-18); Intergenic B (19); Positive 1 (20); Positive 2 (21); Intergenic C (22); IKBKAP & FAM206A promoter (23); CTNNAL1 promoter (24).

The HindIII fragments targeted in the first assay are described in Table 13. The anchor fragment was located at the second putative psoriasis-associated enhancer. This fragment was selected because it contained rs6477612, which was identified as the most likely causal SNP according to RegulomeDB. In addition, this fragment contained five other variants in tight LD with the lead GWAS SNP: rs113137157, rs55975335, rs6477613, rs1318148 and rs892687. Other-end primers were designed to target fragments near KLF4 that have previously been shown to interact with fragments in the gene desert, including a long non-coding RNA (RP11-363D24.1) and intergenic fragments downstream of KLF4 (KLF4 Centromeric 1 and 2) (Dryden et al., 2014; Martin et al., 2015). The first gene target was the KLF4 gene, which lies completely within one HindIII fragment. On the right-hand side the gene targets included IKBKAP and FAM206A, which share a promoter in the same fragment, and CTNNAL1. In addition, fragments were selected in between the psoriasis association and the gene targets (Intergenic A and C) (Table 13).

Table 13: Fragments targeted in the first 9q31 (KLF4) 3C-qPCR assay

| Fragment name | Fragment designation | HindIII fragment location | HindIII fragment number |
| --- | --- | --- | --- |
| Enhancer 2 rs6477612 (rs10979182 LD region 8) | Anchor | chr9:110810596-110816598 | 14 |
| Enhancer 2 SR | Short-range control | chr9:110824104-110824379 | 16 |
| RP11-363D24.1 | Target | chr9:110195017-110201199 | 1 |
| KLF4 Centromeric 1 | Target | chr9:110237781-110238441 | 2 |
| KLF4 Centromeric 2 | Target | chr9:110241992-110244654 | 3 |
| KLF4 gene and promoter | Target | chr9:110244654-110255868 | 4 |
| Intergenic A | Target | chr9:110452475-110459966 | 6 |
| Intergenic C | Target | chr9:111343493-111346822 | 22 |
| IKBKAP & FAM206A promoter | Target | chr9:111692307-111703976 | 23 |
| CTNNAL1 promoter | Target | chr9:111770041-111776135 | 24 |

The HindIII fragments targeted in the second assay are described in Table 14. Other-end primers were designed to target the three psoriasis-associated putative enhancers containing potential causal SNPs rs10217259, rs6477612 and rs4978343, respectively. The fragment containing the third putative enhancer also contains the GWAS index SNP, rs10979182. Additional targets included intergenic fragments (Intergenic B and C) and positive control fragments that are expected to interact with the anchor, as previously shown in other cell types (Positive 1 and 2) (Dryden et al., 2014; Martin et al., 2015).


Table 14: Fragments targeted in the second 9q31 (KLF4) 3C-qPCR assay

| Fragment name | Fragment designation | HindIII fragment location | HindIII fragment number |
| --- | --- | --- | --- |
| KLF4 Centromeric 1 | Anchor | chr9:110237781-110238441 | 2 |
| SR KLF4 Centromeric 1 (KLF4 Centromeric 2 fragment) | Short-range control | chr9:110241992-110244649 | 3 |
| Enhancer 1 rs10217259 (rs10979182 LD region 7) | Target | chr9:110801022-110808470 | 13 |
| Enhancer 2 rs6477612 (rs10979182 LD region 8) | Target | chr9:110810596-110816598 | 14 |
| Enhancer 3 rs4978343 (rs10979182 LD region 9) | Target | chr9:110816603-110821889 | 15 |
| Intergenic B | Target | chr9:110930759-110940017 | 19 |
| Positive 1 | Target | chr9:111035783-111036598 | 20 |
| Positive 2 | Target | chr9:111038412-111038716 | 21 |
| Intergenic C | Target | chr9:111343493-111346822 | 22 |

The HindIII fragments targeted in the third 3C-qPCR assay are detailed in Table 15. The assay was anchored at the fragment containing the KLF4 gene, where a TaqMan probe was designed to complement a sequence between the primer and the HindIII cut site. The probe was situated 26 bp from the HindIII cut site, inclusive of the 20 bp probe sequence (sequence shown in Appendix, Table 30). Primers were then designed to target eleven fragments evenly spaced across the psoriasis-associated variant set (mean distance between HindIII cut sites 6822 bp). In addition, intergenic fragments were targeted on each side of the psoriasis association (Intergenic A and B) (Table 15).


Table 15: Fragments targeted in the third 9q31 (KLF4) 3C-qPCR assay

| Fragment name | Fragment designation | HindIII fragment location | HindIII fragment number |
| --- | --- | --- | --- |
| KLF4 gene and promoter | Anchor | chr9:110244654-110255868 | 4 |
| KLF4 gene body SR | Short-range control | chr9:110257072-110259991 | 5 |
| Intergenic A | Target | chr9:110452475-110459966 | 6 |
| rs10979182 LD region 1 | Target | chr9:110765872-110770045 | 7 |
| rs10979182 LD region 2 | Target | chr9:110772357-110775650 | 8 |
| rs10979182 LD region 3 | Target | chr9:110780164-110782236 | 9 |
| rs10979182 LD region 4 | Target | chr9:110783715-110788661 | 10 |
| rs10979182 LD region 5 | Target | chr9:110793130-110796043 | 11 |
| rs10979182 LD region 6 | Target | chr9:110798743-110801016 | 12 |
| rs10979182 LD region 7 | Target | chr9:110801022-110808470 | 13 |
| rs10979182 LD region 8 | Target | chr9:110810596-110816597 | 14 |
| rs10979182 LD region 9 | Target | chr9:110816603-110821888 | 15 |
| rs10979182 LD region 10 | Target | chr9:110824384-110829060 | 17 |
| rs10979182 LD region 11 | Target | chr9:110833319-110838267 | 18 |
| Intergenic B | Target | chr9:110930759-110940017 | 19 |
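The mean spacing of 6822 bp quoted for the eleven LD-region fragments is consistent with the coordinates in Table 15, if the spacing is taken as the mean gap between the end coordinates of consecutive fragments. That interpretation is an assumption, and the helper name `parse_region` is illustrative only. A short sketch of the arithmetic:

```python
# End-to-end spacing of the eleven rs10979182 LD-region fragments (Table 15).
LD_REGION_FRAGMENTS = [
    "chr9:110765872-110770045",
    "chr9:110772357-110775650",
    "chr9:110780164-110782236",
    "chr9:110783715-110788661",
    "chr9:110793130-110796043",
    "chr9:110798743-110801016",
    "chr9:110801022-110808470",
    "chr9:110810596-110816597",
    "chr9:110816603-110821888",
    "chr9:110824384-110829060",
    "chr9:110833319-110838267",
]


def parse_region(region):
    """Split 'chr:start-end' into (chromosome, start, end)."""
    chrom, span = region.split(":")
    start, end = (int(value) for value in span.split("-"))
    return chrom, start, end


ends = [parse_region(region)[2] for region in LD_REGION_FRAGMENTS]
gaps = [later - earlier for earlier, later in zip(ends, ends[1:])]
mean_gap = sum(gaps) / len(gaps)
print(round(mean_gap))  # 6822
```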


6q23 3C assays

In 6q23, recent CHi-C data from other researchers within the ARUK lab group have demonstrated the presence of robust interactions between intergenic autoimmune disease-associated SNPs and regions near immune-related genes including IL20RA, IL22RA2, IFNGR1 and TNFAIP3 in B and T cell lines (Martin et al., 2016; Martin et al., 2015; McGovern et al., 2016). Therefore, the assay design for 6q23 was more hypothesis-driven than the assay design for 9q31. The purpose of the qPCR assays in 6q23 was to test for interactions between the psoriasis association and local immune-related genes. In addition, interactions with the RA locus were added as a positive control. All targeted fragments are shown in Figure 27.

Figure 27: Anchor and target fragments across the 6q23 (TNFAIP3) locus. The 1 Mb 6q23 locus contains complex chromatin folding: the heatmap shows Hi-C interaction data for NHEK at 50 kb resolution (Rao et al., 2014). HindIII fragments (green bars) were selected to map interactions between psoriasis-associated variants, RA-associated variants and surrounding gene regions: 1) short-range for IL22RA2; 2) IL22RA2; 3) IFNGR1 region; 4-5) NCR1 and NCR2 at OLIG3; 6) RA (index SNP); 7) RA SNPs (LD block); 8) TNFAIP3 promoter; 9) Ps SNPs 1 (index SNP rs582757 within TNFAIP3); 10) short-range for Ps SNPs 2; and 11) Ps SNPs 2 (downstream of TNFAIP3).


For the first 6q23 locus assay, the primers and BAC control libraries were generously provided by Dr Amanda McGovern. In the first assay the anchor fragments were located at IL22RA2 or IFNGR1, whilst the positive control target fragments were located at the RA susceptibility locus (intergenic) or the psoriasis susceptibility locus (near TNFAIP3). Within the psoriasis susceptibility locus, two fragments were selected as targets: “Ps SNPs 1”, encompassing the lead SNP rs582757 within TNFAIP3, and “Ps SNPs 2”, encompassing two further correlated SNPs (rs4895498 and rs6933987) downstream of TNFAIP3. A negative control region (NCR) was selected at a fragment near OLIG3, which lies between the anchor fragment and the target fragment(s) (Figure 27). The targeted fragments are described in Table 16.

Table 16: Fragments targeted in the first 6q23 3C-qPCR assay. Abbreviations: NCR, negative control region; Ps, psoriasis; RA, rheumatoid arthritis; SR, short-range

| Fragment name | Fragment designation | HindIII fragment location | HindIII fragment number |
| --- | --- | --- | --- |
| SR IL22RA2 | Short-range control 1 | chr6:137413796-137414948 | 1 |
| IL22RA2 promoter | Anchor 1 | chr6:137492387-137506744 | 2 |
| SR IFNGR1 (IL22RA2 promoter) | Short-range control 2 | chr6:137492387-137506744 | 2 |
| IFNGR1 region | Anchor 2 | chr6:137570293-137583223 | 3 |
| NCR1 (OLIG3) | Target | chr6:137823604-137834160 | 4 |
| RA SNPs | Target | chr6:138007206-138017056 | 7 |
| Ps SNPs 1 (TNFAIP3) | Target | chr6:138196801-138202660 | 9 |
| Ps SNPs 2 (TNFAIP3) | Target | chr6:138233166-138241189 | 11 |

The second assay in 6q23 aimed to determine potential interactions between psoriasis-associated variants and the promoter of TNFAIP3, as well as between psoriasis-associated variants and the RA-associated locus. For this assay, a fragment containing psoriasis-associated SNPs downstream of TNFAIP3 (Ps SNPs 2) was used as the anchor. Other-end primers were designed to target the TNFAIP3 promoter and a fragment containing the RA index SNP rs6920220. In addition, an intervening fragment at TNFAIP3 was included (Ps SNPs 1). HindIII fragments targeted in the second assay are described in Table 17.


Table 17: Fragments targeted in the second 6q23 3C-qPCR assay. Abbreviations: NCR, negative control region; Ps, psoriasis; RA, rheumatoid arthritis; SR, short-range

| Fragment name | Fragment designation | HindIII fragment location | HindIII fragment number |
| --- | --- | --- | --- |
| Ps SNPs 2 (TNFAIP3) | Anchor | chr6:138233166-138241189 | 11 |
| Ps short-range | Short-range control | chr6:138222582-138223161 | 10 |
| Ps SNPs 1 (TNFAIP3) | Target | chr6:138196801-138202660 | 9 |
| TNFAIP3 promoter | Target | chr6:138184712-138186854 | 8 |
| RA index | Target | chr6:138003257-138007201 | 6 |
| NCR2 (OLIG3) | Target | chr6:137837368-137841021 | 5 |

2.3.1.4 Stimulation of HaCaT cells for ChIP and 3C in 9q31

Interferon gamma (IFN-γ) is a known inducer of KLF4 expression (Feinberg et al., 2005; Madonna et al., 2010). In addition, it is a pro-inflammatory Th1 cytokine that is upregulated in psoriatic plaques. Here, HaCaT cells were stimulated with IFN-γ over a time-course and qPCR was used to determine differential expression of KLF4. Subsequent ChIP and 3C libraries were generated in order to determine if stimulation had an effect on regulatory marks or chromatin interactions between the psoriasis association and KLF4 in 9q31.

Stimulation of HaCaT cells

HaCaT cells were seeded into 6-well plates at 500,000 cells per well in 1 mL medium (DMEM supplemented with 1% penicillin-streptomycin and 10% FBS). The cells were then incubated at 37˚C with 5% CO2 overnight to allow the cells to attach to the plate and commence proliferation. The next day, the cells were stimulated with medium containing 100 ng/mL of recombinant human IFN-γ (285-IF-100; R&D Systems). As a control, cells were left unstimulated. At 8 hours, 24 hours and 48 hours post-stimulation the medium was aspirated and RNA was extracted from the cells.

RNA extraction

RNA was extracted using the RNeasy mini kit (Qiagen). Here, the manufacturer’s instructions for “purification of total RNA from animal cells using spin technology” were followed (Appendix, Table 23). This involved directly lysing the cells in the monolayer with buffer RLT and homogenising the lysate through QIAshredder spin columns (Qiagen). The RNA was then bound to the RNeasy spin columns and washed according to the protocol. The optional DNase digestion step using an RNase-free DNase kit (Qiagen) was carried out in order to remove any DNA contamination. The RNA was eluted in 30 µL RNase-free water and stored at -80˚C.

RNA quality and quantity

The quality and quantity of the RNA was assessed initially using a NanoDrop spectrophotometer, which measures the sample absorbance at specific wavelengths of light, according to the sample type (e.g. DNA or RNA). RNA was considered to be pure if the ratio of absorbance at 260 nm and 280 nm was approximately 2.0. However, results from the NanoDrop can be affected by contamination with other kinds of nucleic acid that absorb these wavelengths, such as DNA. To obtain an accurate quantification, therefore, each RNA sample was then measured by a Qubit® fluorometer (Invitrogen) using the Qubit® RNA HS Assay Kit (Q32852) according to the manufacturer’s instructions (Appendix, Table 23). This kit uses a reagent that binds to RNA molecules and emits fluorescence in such a way that the amount of fluorescence is directly proportional to the amount of RNA in the sample. The advantage over a spectrophotometric technique, such as the NanoDrop, is that potential contaminants (e.g. DNA) do not contribute to the quantification.

In order to detect any RNA degradation, the samples were also analysed on a Bioanalyzer (Agilent Technologies) using the Agilent RNA 6000 Nano Kit (5067-1511) according to the manufacturer’s instructions (Appendix, Table 23). The Bioanalyzer uses microfluidics technology to perform capillary gel electrophoresis. The 2100 Expert Software (Agilent Technologies) was used to check RNA quality through the RNA Integrity Number (RIN), a value ranging from 1 to 10 that is automatically assigned to total RNA samples by the software. The RIN is derived from the electropherogram trace, including the ratio of the 28S to 18S ribosomal RNA (rRNA). A RIN of 10 indicates high integrity RNA, whereas a RIN of 1 indicates major degradation. Here, all RNA samples had an acceptable RIN > 8. An example of a typical Bioanalyzer output from a high-quality RNA sample is shown in the Appendix (Figure 58).

qPCR for differential gene expression

Differential expression for certain genes of interest was detected by qPCR.

Primer design for expression qPCR

The cDNA sequence for KLF4 was downloaded from Ensembl (http://www.ensembl.org/). Primers were designed using Primer3 to target a region of 100-200 bp encompassing an exon-exon boundary. Primer sequences avoided targets containing common variation (MAF > 1%) and had minimal evidence of hairpins and primer dimers. Additionally, the primers were run through Primer BLAST against RefSeq mRNA to confirm their uniqueness. Routinely used primers for housekeeping genes GAPDH and HPRT1 were obtained from colleagues within the ARUK research group. Primer sequences are shown in the Appendix (Table 32).

Primer efficiency and specificity for expression qPCR

To measure efficiency the primers were used to amplify products from extracted RNA samples in a qPCR reaction using the Power SYBR® Green RNA-to-Ct™ 1-Step Kit (ThermoFisher Scientific), as described in the next section. In order to create a standard curve for each primer, 10-fold serial dilutions were made of the RNA template (1, 1/10, 1/100, 1/1000 and 1/10,000). Each primer set was then tested in triplicate against each concentration of template. Primer efficiency was calculated as before (page 135) and the melt curve was analysed by eye to determine target specificity.
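As an illustration of the standard-curve calculation, the sketch below fits a straight line to mean Ct against log10 of the template dilution and converts the slope to an amplification efficiency using the widely used relation E = 10^(-1/slope) - 1. The efficiency formula on page 135 is not reproduced in this section, so this relation is an assumption about the method, and the Ct values are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def primer_efficiency(dilutions, mean_cts):
    """Efficiency from the slope of Ct versus log10(dilution); 1.0 means 100%."""
    log_dilutions = [math.log10(d) for d in dilutions]
    slope, _ = fit_line(log_dilutions, mean_cts)
    return 10 ** (-1 / slope) - 1


# 10-fold dilution series as in the text; Ct values are hypothetical triplicate means.
dilutions = [1, 1 / 10, 1 / 100, 1 / 1000, 1 / 10000]
mean_cts = [18.0, 21.3, 24.6, 27.9, 31.2]
efficiency = primer_efficiency(dilutions, mean_cts)
print(f"{efficiency:.1%}")
```

A slope of about -3.32 cycles per 10-fold dilution corresponds to perfect doubling in each cycle (100% efficiency).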

Expression qPCR and differential gene expression analysis

qPCR for expression of genes of interest was performed using the Power SYBR® Green RNA-to-Ct™ 1-Step Kit. In short, this kit provides a single reaction whereby the input RNA is reverse transcribed to cDNA, followed immediately by amplification and quantification in real time. Reactions were set up in triplicate in a MicroAmp® Optical 384-Well Reaction Plate in the following reaction quantities:

| Reagent | Volume |
| --- | --- |
| RNA template (5 ng/µL) | 1 µL |
| Power SYBR® Green RT-PCR Mix (2X) | 5 µL |
| RT Enzyme Mix | 0.08 µL |
| Forward primer (10 µM) | 0.5 µL |
| Reverse primer (10 µM) | 0.5 µL |
| Distilled water | 2.92 µL |
| TOTAL | 10 µL |
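For plate set-up, the per-reaction volumes above can be scaled to a master mix. The sketch below is illustrative only: leaving the RNA template out of the mix and adding a 10% pipetting overage are assumptions, not details taken from the protocol.

```python
# Per-reaction volumes in µL, as listed in the reaction table.
PER_REACTION_UL = {
    "RNA template (5 ng/µL)": 1.0,
    "Power SYBR Green RT-PCR Mix (2X)": 5.0,
    "RT Enzyme Mix": 0.08,
    "Forward primer (10 µM)": 0.5,
    "Reverse primer (10 µM)": 0.5,
    "Distilled water": 2.92,
}


def master_mix(n_reactions, overage=0.10, exclude=("RNA template (5 ng/µL)",)):
    """Scale per-reaction volumes for n reactions, leaving out per-sample reagents.

    The overage fraction and the excluded reagents are assumptions for illustration.
    """
    scale = n_reactions * (1 + overage)
    return {
        name: round(volume * scale, 2)
        for name, volume in PER_REACTION_UL.items()
        if name not in exclude
    }


# Triplicate reactions for one primer pair, no overage.
mix = master_mix(3, overage=0.0)
for name, volume in mix.items():
    print(f"{name}: {volume} µL")
```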


The plates were covered with MicroAmp® Optical Adhesive Film and run on a QuantStudio 12K Flex Real-Time PCR System using the following settings:

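| Step | Temperature | Time | Cycles |
| --- | --- | --- | --- |
| Reverse transcription | 48 ˚C | 30 minutes | 1 cycle |
| Activation of AmpliTaq Gold® DNA Polymerase, UP (Ultra Pure) | 95 ˚C | 10 minutes | 1 cycle |
| Denaturing | 95 ˚C | 15 seconds | 40 cycles |
| Annealing/extension | 60 ˚C | 1 minute | 40 cycles |
| Denaturing (melt curve) | 95 ˚C | 15 seconds | 1 cycle |
| Annealing (melt curve) | 60 ˚C | 15 seconds | 1 cycle |
| Denaturing (melt curve) | 95 ˚C | 15 seconds | 1 cycle |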

The data were exported from the QuantStudio 12K Flex software into Microsoft Excel for analysis. Differential expression was determined using the comparative Ct method (Schmittgen and Livak, 2008). This is a simple analysis that allows the data to be presented as a fold change in expression between, for example, a treated and an untreated sample. For each sample, the Ct values for the test genes are normalised to internal control genes for which expression does not change upon stimulation. Here the housekeeping genes GAPDH and HPRT1 were used, since their expression was found to remain stable across all conditions. The following formulas were used to calculate the mean fold change in gene expression for each condition (samples in duplicate):

Fold change = $2^{-\Delta\Delta C_T}$

Where $\Delta\Delta C_T = \Delta C_T(\text{treated}) - \Delta C_T(\text{untreated})$

And $\Delta C_T = C_T(\text{test gene}) - \text{mean } C_T(\text{housekeeping genes})$
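The comparative Ct calculation can be written out as a short worked example. The function names and all Ct values below are hypothetical and serve only to show the arithmetic of the three formulas.

```python
from statistics import mean


def delta_ct(ct_test_gene, ct_housekeeping):
    """ΔCt: test gene Ct minus the mean Ct of the housekeeping genes."""
    return ct_test_gene - mean(ct_housekeeping)


def fold_change(dct_treated, dct_untreated):
    """Fold change = 2 ** -(ΔCt treated - ΔCt untreated)."""
    return 2 ** -(dct_treated - dct_untreated)


# Hypothetical Ct values: KLF4 with GAPDH and HPRT1 as housekeeping genes.
dct_stimulated = delta_ct(24.0, [18.0, 24.0])    # 24.0 - 21.0 = 3.0
dct_unstimulated = delta_ct(26.0, [18.0, 24.0])  # 26.0 - 21.0 = 5.0

# ΔΔCt = 3.0 - 5.0 = -2.0, so expression is 2 ** 2 = 4-fold higher after stimulation.
print(fold_change(dct_stimulated, dct_unstimulated))  # 4.0
```

A negative ΔΔCt therefore corresponds to upregulation in the treated sample, and a ΔΔCt of zero to a fold change of 1.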

Generation of stimulated ChIP and 3C libraries

HaCaT cells were grown as described above (page 126). Once cells reached approximately 80% confluence, they were stimulated for 8 hours with 5 mL media supplemented with 100 ng/mL recombinant human IFN-γ. Control cells received 5 mL media that did not contain IFN-γ. Unstimulated and stimulated ChIP libraries were generated in triplicate and ChIP enrichment at target loci was determined by qPCR using the percentage input method as described above. Unstimulated and stimulated 3C libraries were generated in duplicate as described above. A 3C assay was carried out using SYBR green, anchored at a psoriasis-associated enhancer and targeting interactions across the 9q31 locus.


2.3.2 Methods for functional characterisation of multiple risk loci

Following on from the locus-specific functional work, this section of the functional methods describes the experiments that were undertaken to characterise chromatin folding in all known psoriasis loci by CHi-C. One aspect of the CHi-C experiment was to determine how the chromatin architecture responds to an inflammatory environment, using stimulated HaCaT cells. This required a pilot study to examine the effect of cellular stimulation on gene expression over a time-course. For this, HaCaT cells were stimulated with either IL-17A or IFN-γ, both of which are key pro-inflammatory mediators in psoriasis.

2.3.2.1 HaCaT stimulation time-course and expression analysis

A stimulatory time-course was conducted over 48 hours. At each time point, RNA was extracted and analysed using an expression microarray.

Stimulation of HaCaT cells

HaCaT cells were seeded into 6-well plates at 500,000 cells per well in 1 mL medium (DMEM supplemented with 1% penicillin-streptomycin and 10% FBS). The cells were then incubated at 37˚C with 5% CO2 overnight to allow the cells to attach to the plate and commence proliferation. The next day, the cells were stimulated with medium containing 100 ng/mL of recombinant human IL-17A (317-ILB-050; R&D Systems) or IFN-γ (285-IF-100; R&D Systems). As a control, some cells were left unstimulated. At 2 hours, 8 hours, 24 hours and 48 hours post-stimulation the medium was aspirated and RNA was extracted from the cells.

RNA extraction and quality control

RNA was extracted using the RNeasy mini kit and QIAshredder spin columns. This was followed by quantification and quality control using the NanoDrop, Qubit and Bioanalyzer as described above.

Expression microarray

To determine the differential expression profiles between stimulatory conditions, RNA from cells stimulated with 100 ng/mL IL-17A or IFN-γ across the time-course was analysed on HumanHT-12 v4 Expression BeadChips (Illumina). The HumanHT-12 microarray contains > 47,000 probes for genes, gene candidates and splice variants across the genome.


RNA amplification and biotinylation

For each RNA sample, a starting amount of 500 ng in 11 µL water was amplified and biotinylated using the TotalPrep® RNA Amplification kit (Life Technologies) according to the manufacturer’s instructions (Appendix, Table 23). RNA samples were reverse transcribed to produce single stranded cDNA, which was then converted into double stranded cDNA. The cDNA was purified and, in an overnight reaction, transcribed into many copies of biotinylated cRNA. Following a final purification, the cRNA was quantified using the Qubit® RNA HS Assay Kit.

Hybridisation to the HumanHT-12 v4 Expression BeadChip

The Whole-Genome Gene Expression Direct Hybridization Assay (Appendix, Table 23) was used to hybridise the cRNA to the microarrays. For each sample, the prepared biotinylated cRNA was concentrated using an Eppendorf Speedvac concentrator at room temperature for an appropriate length of time (ranging from 8-45 minutes). For each sample, 750 ng cRNA in 5 µL was then loaded onto the BeadChips and allowed to hybridise overnight. On the second day the BeadChips were washed using the supplied reagents followed by imaging and signal detection using the iScan System.

Differential gene expression analysis

The BeadChip data were exported from Illumina’s GenomeStudio software and an initial quality assessment for each sample was run in R using the iScan metrics data. The expression data were then analysed in R using the Bioconductor package limma (Ritchie et al., 2015), a package that has been used extensively for differential expression analysis (Phipson et al., 2016). Firstly, the raw data were normalised using the neqc function, which performs background correction using negative control probes, quantile normalisation and log2 transformation. Individual probes were then filtered from the analysis if they were not expressed in any array. The function plotMDS was used to create a multidimensional scaling plot displaying typical log2 fold changes between all of the samples, based on the top 500 genes. Further filtering was conducted to remove probes that were uninformative according to the Illumina Human V4 database of probe sequences. The function lmFit was then used to fit a linear model to the data. The function arrayWeights was used here to weight the arrays based on their quality (Ritchie et al., 2006). Differential expression analysis was carried out using the function eBayes, which applies empirical Bayes moderation to the fitted model in order to identify significant differential expression. In this case, comparisons were made between duplicate samples of stimulated cells and unstimulated cells at each time-point (2, 8, 24 and 48 hours). Tables of the 10 most differentially expressed genes were produced using the command topTable.
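To make the normalisation step more concrete, the sketch below illustrates quantile normalisation followed by log2 transformation on a toy intensity matrix. It is a simplified stand-in written in Python, not the limma implementation: it omits the background correction against negative control probes performed by neqc, it breaks ties arbitrarily rather than averaging them, and the intensities are hypothetical.

```python
import math


def quantile_normalise(matrix):
    """Force every array (column) to share the same intensity distribution.

    matrix is a list of probe rows, each holding one intensity per array.
    Ties are broken by row order, which is a simplification.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    sorted_columns = [sorted(column) for column in columns]
    # Reference distribution: mean intensity at each rank across arrays.
    rank_means = [
        sum(column[rank] for column in sorted_columns) / n_cols
        for rank in range(n_rows)
    ]
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j, column in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: column[i])
        for rank, i in enumerate(order):
            result[i][j] = rank_means[rank]
    return result


def log2_transform(matrix):
    """Log2-transform every intensity (all values must be positive)."""
    return [[math.log2(value) for value in row] for row in matrix]


# Three probes measured on two arrays (hypothetical intensities).
raw = [[5.0, 4.0], [2.0, 1.0], [3.0, 6.0]]
normalised = quantile_normalise(raw)
print(normalised)  # [[5.5, 3.5], [1.5, 1.5], [3.5, 5.5]]
print(log2_transform(normalised))
```

After normalisation each array contains the same set of values (1.5, 3.5 and 5.5 here), so differences between arrays reflect only the ranking of probes within each array.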

2.3.2.2 Capture Hi-C study

Capture Hi-C (CHi-C) is a cutting-edge method for mapping long-range chromatin interactions in targeted regions across the genome through the use of custom-designed RNA baits. Generation of a CHi-C library takes approximately three weeks and shares common processes with 3C and Hi-C protocols (Figure 28). Here, CHi-C libraries were generated in duplicate in the following cell lines: My-La (no stimulation), HaCaT (no stimulation) and HaCaT (stimulated with 100 ng/mL IFN-γ for 8 hours, based on the stimulation pilot study), summarised in Table 18.

Generation of Hi-C libraries

Hi-C libraries were generated according to a protocol based on the original method by Lieberman-Aiden et al. (2009) with modifications by Dr Stefan Schoenfelder (Babraham Institute, Cambridge, UK). This protocol was established in the ARUK lab by Dr Amanda McGovern.

Hi-C cell culture and formaldehyde cross-linking

HaCaT cells were grown as described above (page 126). Once cells reached approximately 80% confluence, they were stimulated for 8 hours with 5 mL media supplemented with 100 ng/mL recombinant human IFN-γ. Control cells received 5 mL media that did not contain IFN-γ. After 8 hours, the media was aspirated and the cells were fixed with formaldehyde according to the 3C protocol (page 139). Here, 5 flasks were combined at the scraping stage to make a total of 5 × 10⁷ cells per Hi-C library.

My-La cells were grown as described above (page 128). Once cells reached confluence, they were pooled into two tubes, each containing 2.5 × 10⁷ cells. The cells in each of these tubes were then fixed with formaldehyde according to the 3C protocol (page 139). At the lysis stage these two tubes were combined, making a total of 5 × 10⁷ cells per Hi-C library.

Figure 28: Processes underlying 3C, Hi-C and CHi-C library generation. All 3C-based techniques share initial core processes, beginning with formaldehyde fixation of cells (20-30 million cells for 3C; 50 million cells for Hi-C or CHi-C). The first stage (blue) generates complete 3C libraries, which can then be analysed by qPCR methods. Dependent on quality control (QC) checks, biotinylated Hi-C or CHi-C libraries are processed by continuing through the second stage (green), at the end of which complete Hi-C libraries are generated. CHi-C libraries are prepared in the third stage (red), during which the loci of interest are enriched for using custom RNA baits. Hi-C and CHi-C libraries can be analysed using high throughput sequencing.

Table 18: Summary of generated CHi-C libraries

| Cell line | Stimulation | Number of libraries |
| --- | --- | --- |
| My-La | None | 2 |
| HaCaT | None | 2 |
| HaCaT | IFN-γ | 2 |

Cell lysis and restriction digestion

The cross-linked cells were lysed and digested in the same manner as for 3C (Page 140).

Biotinylation, ligation and reversal of cross-links

To create Hi-C libraries, the DNA ends created from HindIII digestion are filled in and marked with biotin. This allows for enrichment of the ligation junctions at a later stage. Non-biotinylated 3C libraries were also created as a control.

For each sample, most of the aliquots were processed for Hi-C whilst some were processed as 3C controls. Whilst biotinylation was carried out on the Hi-C libraries, the 3C control libraries were kept on a thermomixer at 37˚C. Hi-C libraries were placed on ice and each aliquot received the following: 10.5 µL biotinylation master mix (6 µL 10X NEBuffer2, 1.5 µL 10 mM dCTP, 1.5 µL 10 mM dGTP, 1.5 µL 10 mM dTTP), 37.5 µL biotin-14-dATP (Life Technologies; 19524-016) and 10 µL DNA polymerase I, large (Klenow) fragment (NEB; M0210L). The samples were incubated at 37˚C for 60-90 minutes, whilst clumps were re-suspended every 10 minutes using a pipette.

Next, ligation of the 3C and Hi-C libraries was carried out. A ligation reaction mixture was prepared in a 15 mL tube for each aliquot (6.71 mL distilled water, 82 µL 10 mg/mL BSA and 820 µL T4 DNA ligase reaction buffer). The chromatin was added to each tube, followed by the addition of the appropriate ligase enzyme: Hi-C libraries received 50 µL T4 DNA ligase (Life Technologies; 155224025) whereas 3C control libraries received 2 µL of T4 DNA ligase (NEB; M0202S). The tubes were incubated in a water bath at 16˚C for 4-6 hours. After this, cross-links were reversed by adding 60 µL of 10 mg/mL proteinase K per tube and incubating the samples in a water bath overnight at 65˚C.


Hi-C DNA purification and quantification

Following cross-link reversal, the DNA was purified in the same manner as described in the 3C protocol (Page 141). As for 3C, the libraries were quantified using a Qubit® fluorometer with the Qubit® dsDNA Broad Range (BR) kit according to the manufacturer’s instructions (Appendix, Table 23). The libraries were stored at -20˚C.

Quality control of Hi-C libraries

In order to test the quality of the Hi-C and 3C libraries, gel electrophoresis was performed as described in the 3C protocol (page 142). This involved running 1/10 dilutions of each library on a gel and performing PCR reactions for known long-range and short-range interactions, followed by detection of products on a gel.

In addition, a restriction digest was carried out in order to determine whether the biotinylation step was carried out sufficiently. In a Hi-C library, the process of filling in, biotinylating and ligating the DNA ends at each HindIII site (AAGCTT) creates a site recognised by another restriction enzyme, NheI (GCTAGC). This means that ligation products from Hi-C libraries should digest with NheI but not HindIII, whereas those from 3C libraries should digest with HindIII but not NheI (Figure 29).
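The sequence logic behind this check can be illustrated with a small sketch. HindIII cuts A^AGCTT to leave a 5' AGCT overhang; filling in the overhang and blunt-end ligating two such ends gives AAGCTAGCTT, which contains the NheI site GCTAGC but no longer contains AAGCTT. Only the top strand is modelled, and the fragment sequences and function names are hypothetical.

```python
HINDIII_SITE = "AAGCTT"
NHEI_SITE = "GCTAGC"
OVERHANG = "AGCT"  # 5' overhang left by HindIII (A^AGCTT)


def hindiii_cut(top_strand):
    """Cut the top strand at the first HindIII site; return (left, right) pieces."""
    position = top_strand.index(HINDIII_SITE)
    return top_strand[: position + 1], top_strand[position + 1 :]


def ligate_3c(left, right):
    """Sticky-end ligation (3C): the overhangs anneal and the HindIII site is restored."""
    return left + right


def ligate_hic(left, right):
    """Fill-in then blunt-end ligation (Hi-C): the left end gains a copy of the overhang."""
    return left + OVERHANG + right


# Two hypothetical restriction fragments, each carrying one HindIII site.
left_end, _ = hindiii_cut("GGGAAGCTTCCC")
_, right_end = hindiii_cut("TTTAAGCTTGGG")

junction_3c = ligate_3c(left_end, right_end)    # GGGAAGCTTGGG
junction_hic = ligate_hic(left_end, right_end)  # GGGAAGCTAGCTTGGG

print(HINDIII_SITE in junction_3c, NHEI_SITE in junction_3c)    # True False
print(HINDIII_SITE in junction_hic, NHEI_SITE in junction_hic)  # False True
```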

To perform the restriction digest, five identical PCR reactions were set up for each Hi-C and 3C control library using primers targeting one of the short-range interactions at AHF. Following PCR, the resultant products were pooled for each library and purified using the Qiagen MinElute PCR Purification Kit according to the manufacturer’s instructions (Appendix, Table 23). The concentration of each product was determined using a NanoDrop (ThermoFisher Scientific). For each library, the product was processed under four different conditions: no restriction enzyme (undigested), digested with HindIII, digested with NheI, or digested with both HindIII and NheI. For each condition, 500 ng of product was combined with 2 µL of 10 mg/mL BSA and 1 µL of each enzyme (where required), and made up to 20 µL total with water. The reactions were incubated at 37˚C for 2 hours, then analysed by gel electrophoresis (2% agarose gel; 100 V for 75 minutes). Samples passed this quality control step if the 3C control library digested only with HindIII and the Hi-C library digested only with NheI, as shown in Figure 29.

Figure 29: Restriction digest QC of Hi-C libraries. A ligation product from the 3C library contains a HindIII cut site but no NheI site and therefore digests with HindIII but not NheI. A ligation product from a biotinylated Hi-C library contains an NheI cut site but no HindIII site and therefore digests with NheI but not HindIII. For each library, the four conditions shown are: no enzyme, HindIII only, NheI only, and both enzymes (“+” indicates a 2 hour incubation with the enzyme; “-” indicates no enzyme).

Biotin removal from non-ligated ends

After passing quality control, 40 µg of each Hi-C library was carried forward. At this stage, biotin-14-dATP was removed from non-ligated ends using the exonuclease activity of T4 DNA polymerase (NEB; M0203L). The library was split into 8 aliquots corresponding to 5 µg of DNA each. To each aliquot, the following reagents were added: 0.5 µL 10 mg/mL BSA, 5 µL 10X NEBuffer 2, 2 µL 2.5 mM dATP and 5 µL T4 DNA polymerase, made up to a total volume of 50 µL with water then incubated at 20˚C for 4 hours.

The reactions were quenched by adding 2 µL 0.5 M EDTA (pH 8.0). Two reactions were then pooled, making a total of 4 tubes per library, and 100 µL water was added. The DNA was purified by performing a phenol:chloroform extraction: the samples were transferred to 2 mL PLG light tubes and an equal volume of phenol:chloroform:IAA was added (approx. 200 µL). The samples were briefly mixed before being centrifuged for 10 minutes at 12,000-16,000 x g (room temperature). The aqueous phase was transferred to a fresh 1.5 mL tube, and 1/10 volume of 3 M sodium acetate (pH 5.2) was added (approx. 20 µL). The samples were precipitated overnight at -20˚C by adding 2.5 volumes of ice-cold 100% ethanol (approx. 500 µL).


Following precipitation, the DNA was pelleted by centrifugation for 30 minutes at 14,500 x g (4˚C). The supernatant was removed with a pipette and the pellet washed twice with 500 µL 70% ethanol; after each wash the pellet was re-centrifuged for 10 minutes (4 ˚C). Following the second wash, all of the supernatant was removed and the pellet dried at room temperature for 5 minutes. Each pellet was then re-suspended in 130 µL water.

DNA shearing and end-repair

In order to prepare the Hi-C libraries for high throughput sequencing, each sample was sheared using the Covaris S220 sonicator. The 130 μL sample was transferred to a MicroTUBE AFA Fiber with Snap-Cap (Covaris) and sonicated for 55 seconds using the following conditions: duty factor of 10%, peak incident power of 140 Watts, 200 cycles per burst. The sample was transferred to a fresh 1.5 mL tube and end-repair was performed by the addition of 18 μL 10X ligation buffer (NEB; B0202S), 18 μL 2.5 mM dNTP mix, 6.5 μL T4 DNA polymerase (NEB; M0203L), 6.5 μL T4 DNA polynucleotide kinase (NEB; M0201L) and 1.3 μL DNA polymerase I, large (Klenow) fragment (NEB M0210L). The reaction was incubated at room temperature for 30 minutes.

Each sample was split into two aliquots, each now containing approximately 5 μg of DNA. The libraries were purified using the Qiagen MinElute Kit, following a protocol modified by Belton et al. (2012) to maximise extraction efficiency. Briefly, each sample was combined with 5 volumes of the supplied PB buffer and loaded into a MinElute column. The columns were centrifuged at 3000 x g for 1 minute, then at 18,000 x g for another minute at room temperature. The flow-through was discarded and 750 μL of the supplied PE wash buffer was added. The columns were centrifuged at 18,000 x g for one minute at room temperature. The flow-through was discarded and the columns were centrifuged again at 18,000 x g for one minute at room temperature. The columns were placed into fresh 1.5 mL tubes and the DNA was eluted with 20 μL TLE buffer (10mM Tris-HCl, 0.1mM EDTA) pre-warmed to 65˚C. After incubating for 2 minutes at room temperature, the columns were centrifuged at 6000 x g for 1 minute, then at 18,000 x g for another minute at room temperature. The elution was repeated using another 15 μL TLE buffer pre-warmed to 65˚C and the samples were transferred to fresh 1.5 mL tubes.


A-tailing and SPRI bead size-selection

To allow for downstream ligation of Illumina sequencing adapters, the ends of the DNA fragments needed to be adenylated via the action of Klenow Fragment (3’→ 5’ exo-). To each of the 30 µL samples the following reagents were added: 5 µL 10X NEBuffer 2, 11.5 µL 1mM dATP and 3.5 µL Klenow (exo-) (NEB; M0212L). The samples were incubated at 37˚C for 30 minutes, followed by 65˚C for 20 minutes in order to inactivate the enzyme. Two aliquots were combined to form 100 µL samples, which were placed on ice before size-selection.

Solid Phase Reversible Immobilisation (SPRI) beads were used to select DNA fragments of 200-600 bp, an optimal size for sequencing. SPRI beads bind DNA fragments of particular lengths depending on the ratio of bead suspension, in polyethylene glycol (PEG) buffer, to DNA solution. In order to obtain the required fragment lengths, a double-sided selection was performed (0.6X followed by 0.9X bead-to-DNA ratio). For the initial 0.6X selection, 60 µL Agencourt AMPure XP beads (Beckman Coulter) at room temperature were added to each 100 µL sample and allowed to bind for 10-20 minutes. The beads were captured on a magnet and the supernatant recovered. For the 0.9X selection, an initial volume of 120 µL AMPure beads was concentrated to 30 µL by removing 90 µL of the supernatant whilst on the magnet. The concentrated beads were then added to each sample and allowed to bind for 10-20 minutes, after which the beads were recovered on the magnet. The beads were washed twice on the magnet with 70% ethanol, followed by elution of the bound size-selected DNA in 50 µL TLE (Tris low-EDTA; 10 mM Tris-HCl, 0.1 mM EDTA) buffer. At this stage, the aliquots for each Hi-C library were pooled into one and the concentration determined by Qubit® fluorometer using the Quant-iT™ dsDNA Broad Range kit.
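The bead volumes used in the double-sided selection can be checked with a short calculation. The sketch below is illustrative only: it assumes that the effective ratio is set by the cumulative volume of bead suspension (PEG buffer) added relative to the original 100 µL sample, which is why the second addition is concentrated to 30 µL. The function name is hypothetical and not part of any kit or protocol.

```python
def cumulative_spri_ratio(sample_ul, bead_additions_ul):
    """Return the cumulative bead-suspension:sample ratio after each addition.

    Assumption: the ratio is calculated against the original sample volume.
    """
    ratios = []
    total_added = 0.0
    for added in bead_additions_ul:
        total_added += added
        ratios.append(round(total_added / sample_ul, 3))
    return ratios


# 100 µL sample: 60 µL of beads first, then 30 µL of concentrated beads
ratios = cumulative_spri_ratio(100, [60, 30])
print(ratios)
```

Under this assumption the two additions correspond to the 0.6X and 0.9X ratios stated in the text.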

Biotin-streptavidin pull-down and adapter ligation

At this stage of the protocol, streptavidin-coated beads were used to pull out the ligation junctions bound by biotin. Each pull-down was performed on 2-2.5 µg of DNA in 300 µL TLE buffer. Following pull-down, the resultant fragments were ligated to sequencing adapters.

For each pull-down, a 150 µL aliquot of Dynabeads MyOne Streptavidin C1 beads (Life Technologies) was prepared by washing the beads twice with 400 µL Tween Buffer TB (5mM Tris-HCl, 0.5mM EDTA, 1M NaCl, 0.05% Tween) using the magnet. After the second wash the beads were re-suspended in 2X No Tween Buffer NTB (5mM Tris-HCl, 0.5mM EDTA, 1M NaCl) and combined with the Hi-C samples to produce a total volume of 600 µL. The samples were left to bind to the beads on a rotator for 30 minutes at room temperature, after which the beads were captured on the magnet and the supernatant discarded. The Hi-C-bound beads were washed with 400 μL 1X No Tween Buffer NTB (10mM Tris-HCl, 1mM EDTA, 2M NaCl), followed by a wash with 200 µL 1X T4 DNA ligase reaction buffer (NEB; B0202S).

Illumina Paired End (PE) Adapters were prepared by annealing PE Adapter 1 with PE Adapter 2 (Appendix, Table 33). These adapters are modified to perform optimally: adapter 1 is phosphorylated at the 5’ end in order to improve ligation efficiency whilst adapter 2 has a phosphorothioate bond between the C and T at the 3’ end in order to prevent removal of the final T by exonucleases. The adapters were annealed in the following manner: equal volumes of 15 µM Adapter 1 and Adapter 2 were combined and incubated at 95˚C for 15 minutes, then 70˚C for 15 minutes, then allowed to cool down to room temperature.

The Hi-C-bound beads were re-suspended in 50 µL 1X T4 DNA ligase reaction buffer and 4 µL of 15 µM annealed adapters was added to each tube, along with 4 µL T4 DNA ligase (NEB; M0202S). The ligation was incubated on a rotator at room temperature for 2 hours. Afterwards, the Hi-C-bound beads were collected on the magnet and washed twice with 400 µL TB buffer, once with 200 µL 1X NTB buffer, once with 200 μL 1X NEBuffer 2 (NEB), then once with 60 μL 1X NEBuffer 2. The beads were re-suspended in 40 µL 1X NEBuffer 2, pooled and stored at 4˚C.

Test PCRs for pre-capture Hi-C libraries

At this stage the Hi-C libraries are amplified before binding to the RNA baits. To determine the optimal number of cycles for final amplification, test amplifications were carried out by performing 6, 7, 9 or 12 cycles of PCR using Illumina paired-end short PCR primers (TruPE PCR1 and TruPE PCR2; Appendix, Table 33).

For each library, the reactions were set up using TruPE primers and Phusion reagents (NEB) in the following manner:


Hi-C library bound to streptavidin beads    2.5 µL
5X Phusion HF buffer (NEB)                  5 µL
10 mM dNTPs (A, C, G and T)                 0.7 µL (each)
100 µM Chic_TruPE_PCR_1                     0.075 µL
100 µM Chic_TruPE_PCR_2                     0.075 µL
High Fidelity Phusion polymerase (NEB)      0.3 µL
Distilled water                             14.25 µL
TOTAL                                       25 µL
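As a consistency check on the reaction set-up above, the component volumes can be summed and scaled to a master mix. This is a minimal sketch rather than part of the protocol: the dictionary keys and the `scale_mix` helper are hypothetical, and the dNTP entry assumes that 0.7 µL of each of the four dNTPs is added.

```python
# Per-reaction volumes (µL) taken from the reaction set-up above.
PER_REACTION_UL = {
    "Hi-C library on beads": 2.5,
    "5X Phusion HF buffer": 5.0,
    "dNTPs (4 x 0.7 µL)": 4 * 0.7,
    "Chic_TruPE_PCR_1": 0.075,
    "Chic_TruPE_PCR_2": 0.075,
    "Phusion polymerase": 0.3,
    "Distilled water": 14.25,
}


def scale_mix(per_reaction, n_reactions, excess=0.0):
    """Scale each component to n reactions, with an optional fractional excess."""
    factor = n_reactions * (1 + excess)
    return {name: round(vol * factor, 3) for name, vol in per_reaction.items()}


total_ul = round(sum(PER_REACTION_UL.values()), 3)
print("Total per reaction (µL):", total_ul)

# Example: the four test amplifications (6, 7, 9 and 12 cycles)
for name, vol in scale_mix(PER_REACTION_UL, 4).items():
    print(f"{name}: {vol} µL")
```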

The test PCRs were run under the following conditions, where n indicates the number of PCR cycles being tested:

98 ˚C   30 seconds
65 ˚C   30 seconds    1 cycle
72 ˚C   30 seconds

98 ˚C   10 seconds
65 ˚C   30 seconds    n – 2 cycles
72 ˚C   30 seconds

98 ˚C   10 seconds
65 ˚C   30 seconds    1 cycle
72 ˚C   7 minutes
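The cycling conditions amount to n cycles in total: one initial cycle, n – 2 intermediate cycles and one final cycle with an extended 72 ˚C step. The sketch below expands the conditions for a given n. It is illustrative only: the function names are hypothetical and ramp times between steps are ignored.

```python
def build_pcr_program(n_cycles):
    """Expand the test PCR conditions into a list of cycles.

    Each cycle is a list of (temperature in degrees C, hold time in seconds).
    """
    if n_cycles < 2:
        raise ValueError("the programme needs at least 2 cycles")
    first = [(98, 30), (65, 30), (72, 30)]
    middle = [(98, 10), (65, 30), (72, 30)]
    last = [(98, 10), (65, 30), (72, 7 * 60)]
    return [first] + [middle] * (n_cycles - 2) + [last]


def total_hold_seconds(program):
    """Sum the hold times of every step (ramping time is not included)."""
    return sum(seconds for cycle in program for _, seconds in cycle)


for n in (6, 7, 9, 12):
    program = build_pcr_program(n)
    print(n, "cycles:", total_hold_seconds(program), "seconds of hold time")
```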

Following PCR, the products were analysed by gel electrophoresis using a 1.5% agarose gel (100 volts; 75 minutes). Successful libraries showed a smear between 300 and 600 bp which increased in intensity with increasing PCR cycles, and was visible on the UV transilluminator by at least 9 cycles of PCR.

Final PCR for pre-capture Hi-C libraries

The remainder of each Hi-C library, still bound to the streptavidin beads, was amplified using the reagents and conditions described above for 7-8 cycles of PCR in order to gain enough material for capture and sequencing. The number of PCR reactions carried out was determined by the number of pull-downs conducted.

Following PCR, the amplified products and beads were pooled. The beads were captured on the magnet, and the volume of the supernatant containing the amplified DNA was measured. In order to remove contaminants from PCR, the supernatant was transferred into a fresh tube and two clean-ups using SPRI beads were performed. Firstly, a 1.8X volume of SPRI beads (AmpliClean, Nimagen) was added to the sample and left to bind for 10-20 minutes. After this, the beads were captured on the magnet and the supernatant was discarded. The beads were washed twice on the magnet with 70% ethanol then dried for 3-5 minutes at 37˚C. The DNA was eluted in 100 µL TLE buffer and transferred to a new tube. A second SPRI bead clean-up and washing steps were performed in the same manner, using 180 µL beads. This time, the DNA was eluted in 20-25 µL TLE buffer.

The quantity and quality of each library was determined on the Bioanalyzer using the High Sensitivity DNA kit (Agilent Technologies) according to the manufacturer’s instructions (Appendix, Table 23). Here a good quality library should display a smooth normal distribution of the DNA fragment lengths. A representative library trace is shown in the Appendix (Figure 59).

Generation of capture Hi-C libraries

The next stage of the protocol utilises a custom capture library (Agilent), consisting of RNA baits, that is hybridised to the Hi-C libraries in order to enrich for the regions of interest.

Design of capture baits

To design the capture bait library, a list of known non-MHC psoriasis risk loci was collated from all GWAS and fine mapping studies available at the time. Each locus was defined by one or more independent SNPs associated with psoriasis at genome-wide significance (Appendix, Table 34 page 301). When large scale studies reported replicating a locus from a previous, smaller scale study, the SNP reported in the most recent or larger-scale study was used. SNPs that were significant after conditioning on the lead signal were also included; this meant that some loci had multiple SNPs, for example the IL12B locus. In addition, the LOP-specific locus at IL1R1 was included, even though it did not reach genome-wide significance (Hebert et al., 2014a). The total number of SNPs included was 107 (59 associated with Europeans, 42 with Chinese, and 6 associated with both European and Chinese cohorts).

The psoriasis region capture was performed as part of a larger study aiming to observe interactions across multiple autoimmune conditions and traits in several cell types. For these other traits, a list of associated SNPs was collated by various members of the research group. The traits represented in the region capture design were: juvenile idiopathic arthritis (JIA), asthma, psoriatic arthritis (PsA), rheumatoid arthritis (RA) and systemic sclerosis (SSC). For the purpose of this thesis, only interactions occurring in the psoriasis-associated loci will be explored.

As part of the wider autoimmune study, the RNA capture baits were designed in-house by Dr Paul Martin, a bioinformatician within the functional genomics research group. The baits were selected to target HindIII fragments that overlapped with the LD block in each locus, defined by SNPs in r² > 0.8 with the lead SNP according to the 1000 Genomes Phase 3 release. Each 120 bp bait was targeted to within 400 bp of a HindIII fragment end, to ensure that the target region was retained following library sonication. For optimal hybridisation, the baits comprised 25-65% GC content and contained fewer than three unknown bases. Following design, the baits were synthesised by Agilent Technologies.
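The bait criteria described above can be expressed as a simple filter. The following is a minimal sketch under stated assumptions and is not the in-house design script: the function names and parameters are hypothetical, GC content is calculated over the full bait length, and the distance argument is assumed to be measured from the bait to the nearest HindIII fragment end.

```python
def gc_fraction(seq):
    """Fraction of G and C bases over the full sequence length."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def bait_passes(seq, dist_to_fragment_end, bait_len=120, max_dist=400,
                gc_min=0.25, gc_max=0.65, max_unknown=2):
    """Apply the criteria in the text to one candidate bait.

    "Fewer than three unknown bases" is encoded as at most 2 N characters.
    """
    seq = seq.upper()
    return (
        len(seq) == bait_len
        and dist_to_fragment_end <= max_dist
        and gc_min <= gc_fraction(seq) <= gc_max
        and seq.count("N") <= max_unknown
    )


# Invented sequences, for illustration only
balanced = "ACGT" * 30              # 120 bp, 50% GC
at_rich = "A" * 120                 # 0% GC
n_heavy = "ACGT" * 29 + "NNNA"      # three unknown bases

print(bait_passes(balanced, 100))
print(bait_passes(at_rich, 100))
print(bait_passes(n_heavy, 100))
print(bait_passes(balanced, 500))
```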

Solution capture hybridisation

Solution capture hybridisation to biotinylated RNA baits, followed by a biotin-streptavidin pulldown, was used to enrich for target regions of interest in the Hi-C libraries (illustrated in Figure 30). Firstly, each Hi-C library was bound to the capture baits in a single hybridisation reaction using the SureSelectXT reagents and protocol by Agilent Technologies (Version B5, June 2016; Appendix, Table 23), using the steps outlined below. For each reaction, 750 ng of Hi-C library was concentrated to a total volume of 3.4 µL using an Eppendorf Speedvac concentrator at 30˚C for an appropriate length of time. The 3.4 µL library was then transferred to a well of an 8-well PCR strip.


Figure 30: Schematic of solution capture hybridisation using the SureSelect kit. The prepared genomic sample is hybridised to the biotinylated RNA baits. Streptavidin coated onto magnetic beads binds the biotin and the target regions can be pulled out using a magnet. The target regions are retained for sequencing whilst the remainder is discarded.

For each capture, a total of 5.6 µL SureSelect Block mix was prepared, consisting of: 2.5 µL SureSelect Indexing block 1, 2.5 µL SureSelect Indexing block 2 and 0.6 µL SureSelect Indexing block 3. The 5.6 µL of Block mix was added to each sample and mixed, before the strip was sealed and transferred to a thermal cycler. The thermal cycler was then set to run at 95˚C for 5 minutes, followed by a 65˚C indefinite hold, with the lid heated to 105˚C.

The SureSelect hybridisation buffer was prepared in a total volume of 13 µL per reaction using the following components: 6.63 µL SureSelect Hyb 1, 0.27 µL SureSelect Hyb 2, 2.65 µL SureSelect Hyb 3 and 3.45 µL SureSelect Hyb 4. This buffer was heated at 65˚C for 5 minutes before use in order to dissolve any precipitate. In addition, a 1:9 RNase block dilution was prepared by adding 2 µL RNase block to 18 µL water. Finally, the SureSelect capture library mix was prepared in an 8-well strip on ice in the following manner:

SureSelect hybridisation buffer    13 µL
1:9 RNase block dilution           5 µL
SureSelect capture library         2 µL
TOTAL                              20 µL

Once the Hi-C library and Block mixture had been incubated at 95˚C for 5 minutes then 65˚C for at least 5 minutes, 20 µL of the SureSelect capture library mix (detailed above) was quickly added to each reaction whilst the samples were kept on the thermal cycler at 65˚C. The 8-well strip was sealed and the reactions were incubated at 65˚C for 16-24 hours.

Biotin-streptavidin pull-down of captured regions

The biotinylated baits bound to each targeted Hi-C library fragment were captured using streptavidin-coated beads. For each library, 50 µL Dynabeads MyOne Streptavidin T1 beads (Life Technologies) were combined with 200 µL SureSelect Binding buffer and transferred to a fresh low-bind tube. The mixture was vortexed for 5 seconds, before the beads were reclaimed using a magnetic separator and the supernatant discarded. A further two washes in binding buffer were performed, before the beads were re-suspended in 200 µL binding buffer. The thermal cycler was opened and the entire CHi-C reaction mixture was transferred to the bead mixture and mixed well. The samples were then incubated on a rotator for 30 minutes at room temperature to allow for the biotin-streptavidin interaction to occur.

The beads were captured on the magnet and the supernatant discarded. The beads were re-suspended in 500 µL SureSelect Wash buffer 1 and transferred to a fresh tube. The sample was incubated for 15 minutes at room temperature, during which it was mixed by vortexing for 5 seconds every 2-3 minutes. The beads were captured on the magnet and the supernatant discarded. The beads were re-suspended in 500 µL SureSelect Wash buffer 2, which had been pre-warmed to 65˚C, and transferred to a fresh tube. The sample was incubated at 65˚C for 15 minutes, during which it was mixed by vortexing for 5 seconds every 2-3 minutes. The beads were captured on the magnet and the supernatant discarded. Two more washes in SureSelect Wash buffer 2 were carried out in the same manner, making a total of three washes. Once the supernatant was discarded after the final wash, the beads were re-suspended in 200 µL 1 x NEBuffer 2 and transferred to a fresh tube. The beads were immediately captured on the magnet and the supernatant discarded. Finally, the beads were re-suspended in 30 µL water and transferred to a fresh tube. The bead-bound libraries were kept at 4˚C prior to PCR amplification.

Test PCRs for post-capture Hi-C libraries

In the final preparation before sequencing, the libraries are amplified and each library given a specific barcode. The barcodes consist of 6 unique bases that allow them to be identified during multiplex sequencing. To test that the CHi-C libraries were correctly prepared, test PCRs (9 and 15 cycles) were carried out using the Illumina universal primer and a second, barcoded primer (Appendix Table 35) according to the protocol as described above (Page 174).
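The role of the 6-base barcodes in multiplex sequencing can be illustrated with a small sketch. It is purely illustrative: the barcode sequences and library names below are placeholders rather than the primers listed in the Appendix, and the helper functions are hypothetical. A read is assigned to a library only if its index sequence matches exactly one barcode within the allowed number of mismatches.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_library(index_read, barcodes, max_mismatch=0):
    """Return the single matching library name, or None if absent or ambiguous."""
    matches = [name for name, bc in barcodes.items()
               if hamming(index_read, bc) <= max_mismatch]
    return matches[0] if len(matches) == 1 else None


# Placeholder 6-base barcodes (not the thesis primers)
BARCODES = {"library_1": "ATCACG", "library_2": "CGATGT", "library_3": "TTAGGC"}

# Barcodes that differ at several positions tolerate a sequencing error
min_distance = min(hamming(a, b) for a, b in combinations(BARCODES.values(), 2))
print("Minimum pairwise distance:", min_distance)
print(assign_library("CGATGT", BARCODES))
print(assign_library("ATCACC", BARCODES, max_mismatch=1))
```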

Final PCR for post-capture Hi-C libraries

Final PCRs were carried out using the universal primer and the barcoded primer with a minimal number of cycles ranging from 6-8. For each reaction, 2.5 µL of bead-bound library was used, allowing for 10-12 reactions to be performed per library. Following PCR, the reactions were pooled together and the beads were captured on a magnet. The supernatant was transferred to a fresh tube and the volume was measured, while the beads were re-suspended in 30 µL 1 x NEBuffer 2 and stored at -20˚C. The samples were purified using SPRI beads as described above (Page 175). This time, the DNA was eluted in 20-25 µL TLE buffer. The quantity and quality of each library was then determined on the Bioanalyzer using the High Sensitivity DNA kit as previously described (Page 175).

The amplified CHi-C libraries were also quantified using a KAPA qPCR Library Quantification Kit (Kapa Biosystems) according to the manufacturer’s instructions (Appendix, Table 23). The KAPA quantification kit utilises SYBR® green technology and Illumina universal primers to amplify a dilution series of the CHi-C library alongside 6 DNA standards (10-fold dilutions). Here, CHi-C libraries were initially diluted 1:1000 in dilution buffer (10 mM Tris-HCl, pH 8.0, 0.05% Tween 20) and then diluted 2-fold in a serial manner to create a dilution series from 1:2000 to 1:64,000. The qPCR plate was prepared and processed according to the manufacturer’s instructions. Each library dilution was run in technical triplicate and each standard was run in duplicate. The data was exported to Excel for analysis and the reaction efficiency for both standards and library dilutions was confirmed to be between 90 and 110%, as described above (Page 135). Linear regression was used to generate a standard curve for the DNA standards. The concentration of each CHi-C library was then determined by fitting the CHi-C amplification data to the standard curve.
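The standard-curve calculation can be sketched as follows. This is a generic illustration, not the Excel analysis used here: the standard concentrations and Cq values are invented, the function names are hypothetical, and any fragment-length adjustment required by the kit instructions is omitted. The sketch assumes the usual qPCR relationships, in which Cq is regressed on log10 concentration and efficiency is calculated as 10^(-1/slope) - 1.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def efficiency_percent(slope):
    """Amplification efficiency implied by the standard-curve slope."""
    return (10 ** (-1 / slope) - 1) * 100


def concentration_from_cq(cq, slope, intercept):
    """Read a concentration off the standard curve."""
    return 10 ** ((cq - intercept) / slope)


# Invented data: six 10-fold standards (pM) and their mean Cq values
std_conc_pm = [20, 2, 0.2, 0.02, 0.002, 0.0002]
std_cq = [7.0, 10.4, 13.8, 17.2, 20.6, 24.0]

slope, intercept = fit_line([math.log10(c) for c in std_conc_pm], std_cq)
eff = efficiency_percent(slope)
print(f"slope={slope:.2f}, efficiency={eff:.1f}%")

# Invented library measurement at the 1:2000 dilution
diluted_pm = concentration_from_cq(12.1, slope, intercept)
print(f"undiluted library: {diluted_pm * 2000:.0f} pM")
```

In this layout, an efficiency outside the 90-110% window would indicate that the standard curve should not be used for quantification.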

Consistency of the measured library concentration between the Bioanalyzer results and the KAPA qPCR results was checked. The final concentration of each library was determined as the mean of these two values. The libraries were then prepared in pools and sent for paired-end next-generation sequencing.

Paired-end next-generation sequencing of CHi-C libraries

Next-generation sequencing (NGS) is a process by which DNA sequences are determined in a high throughput, massively parallel manner. Here, Illumina sequencing by synthesis (SBS) technology was utilised, as illustrated in Figure 31 (Metzker, 2010).



Figure 31: Illumina sequencing technology, adapted from Metzker (2010). The sample is loaded onto a flow cell and undergoes bridge amplification (A). Sequencing-by-synthesis detects the addition of each nucleotide on the growing strand (B). Used with permission.

The process begins with loading the prepared library onto a flow cell that is coated with a lawn of oligos complementary to the adapters bound to the end of each fragment in the library. The oligos capture the library fragments through complementary base pairing. The fragments are amplified into clonal clusters by a process known as bridge amplification. The fragments within the clusters are then used as templates for SBS, in which fluorescently-labelled nucleotides are detected as they are added one-by-one to the growing strands. This is made possible by a reversible terminator on each nucleotide that prevents continuous polymerisation, allowing for the detection of each nucleotide one at a time. For paired-end sequencing, this process is repeated from the other end of each fragment.

Here, the My-La libraries were pooled and sequenced on an Illumina NextSeq500 machine (Genomic Technologies Core Facility, Faculty of Life Sciences at the University of Manchester) producing 75 bp paired-end reads. For the HaCaT libraries, two pools were created, each containing one unstimulated library and one stimulated library (four libraries in total). Each pool was sequenced on a single lane of an Illumina HiSeq 4000 (Edinburgh Genomics at The University of Edinburgh) producing 75 bp paired-end reads.

Analysis of CHi-C libraries

The quality of the raw sequence data was first checked using FastQC (Andrews, 2010). This program gives an indication of any problems during sequencing, such as reduction in sequence quality across the read. Following this, the data was processed through the Hi-C User Pipeline (HiCUP) (Wingett et al., 2015), followed by calling of significant interactions using Capture Hi-C Analysis of Genomic Organisation (CHiCAGO) (Cairns et al., 2016). Finally, an overarching analysis of the CHi-C data was conducted using command line based tools, followed by visual analysis of regions of interest using the WashU Epigenome Browser.

HiCUP pipeline

HiCUP is an established bioinformatics pipeline for assessing the quality of Hi-C data and filtering Hi-C fragment pairs (known as di-tags) for further analysis (Wingett et al., 2015). HiCUP consists of a series of Perl scripts that truncate the reads at the ligation junction, map the fragments to the reference genome using Bowtie2 (Langmead and Salzberg, 2012) and filter out common artefacts generated during the Hi-C protocol. The filtering process is carried out by mapping the di-tags to an in silico digest of the reference genome. Common artefacts that HiCUP removes from the dataset include the following (described in detail by Wingett et al., 2015):

• Contiguous sequences. These can be caused by re-ligation of adjacent restriction fragments during library generation.
• Read pairs that map to the same fragment. These can be caused by fragment self-ligation through circularisation. Alternatively, un-ligated fragments can become inserted between sequencing adapters, forming “dangling ends” or “internal fragments”.
• Inserts of the wrong size. These can be caused by inaccurate mapping of the di-tag.
• Duplicate di-tags. These are usually caused by duplication of the reads during PCR amplification.
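The idea of mapping di-tags to an in silico digest can be sketched on a toy sequence. The code below is a simplified illustration and is not HiCUP's implementation: the function names and category labels are hypothetical, positions are 0-based, and only the fragment assignment of each read end is considered (insert size and duplication are ignored). HindIII recognises AAGCTT and cuts after the first A.

```python
import bisect

HINDIII_SITE = "AAGCTT"  # HindIII cuts A^AGCTT


def cut_positions(seq, site=HINDIII_SITE, offset=1):
    """Return 0-based cut coordinates for every occurrence of the site."""
    seq = seq.upper()
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + offset)
        i = seq.find(site, i + 1)
    return cuts


def fragment_index(pos, cuts):
    """Index of the restriction fragment containing a 0-based position."""
    return bisect.bisect_right(cuts, pos)


def classify_ditag(pos1, pos2, cuts):
    """Crude classification of a read pair by the fragments it maps to."""
    f1, f2 = fragment_index(pos1, cuts), fragment_index(pos2, cuts)
    if f1 == f2:
        return "same fragment"
    if abs(f1 - f2) == 1:
        return "adjacent fragments"
    return "candidate valid pair"


# Toy sequence with two HindIII sites, giving three fragments
toy = "G" * 10 + "AAGCTT" + "C" * 10 + "AAGCTT" + "T" * 10
cuts = cut_positions(toy)
print(cuts)
print(classify_ditag(2, 5, cuts))
print(classify_ditag(5, 15, cuts))
print(classify_ditag(5, 30, cuts))
```

Pairs falling in the same fragment correspond to the self-ligation and dangling-end artefacts listed above, whereas pairs on adjacent fragments are candidates for re-ligation.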


Here, the raw sequence data was processed through the HiCUP pipeline and each library was assessed for quality using the generated report. The output file containing the aligned data was then taken forward for CHi-C analysis by retaining di-tags that mapped back to the capture bait library.

CHiCAGO pipeline

CHiCAGO is a pipeline designed to identify significant interactions in CHi-C datasets (Cairns et al., 2016). CHiCAGO employs a background correction model that takes into account two factors: (1) the frequency of interactions occurring at random (Brownian collisions), which decreases with increasing distance from the anchor fragment, and (2) technical noise caused by assay artefacts. Di-tag read counts that are notably higher than expected under the background (null) model are likely to represent true interactions. P-values are adjusted for multiple testing by a weighted model that takes into account how likely it is that the fragments interact, based on distance. By default, CHiCAGO scores greater than 5 are considered to be significant.

Here, each set of duplicate libraries was processed through the CHiCAGO pipeline. The output was a text file of all significant interactions for each condition.
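Selecting significant interactions from a table of scored interactions reduces to a simple threshold. The sketch below is illustrative only and does not reproduce CHiCAGO's own output format: the record fields, fragment names and scores are invented, and the cutoff follows the threshold of 5 described above.

```python
def significant_interactions(records, cutoff=5.0):
    """Keep interactions whose score exceeds the cutoff, highest score first."""
    kept = [rec for rec in records if rec["score"] > cutoff]
    return sorted(kept, key=lambda rec: rec["score"], reverse=True)


# Invented records for illustration
records = [
    {"bait": "bait_1", "other_end": "frag_A", "score": 7.2},
    {"bait": "bait_1", "other_end": "frag_B", "score": 4.9},
    {"bait": "bait_2", "other_end": "frag_C", "score": 12.5},
]

for rec in significant_interactions(records):
    print(rec["bait"], rec["other_end"], rec["score"])
```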

Overarching interpretation of CHi-C data

The output from CHiCAGO contained information on interactions occurring in all the autoimmune-associated loci included in the capture design. Interactions in psoriasis-associated loci were discovered by comparing the psoriasis bait fragment coordinates against the interaction data using BEDTools (Quinlan, 2014), a program used for a variety of genomic analyses. Next, BEDTools was used to detect interactions between psoriasis-associated fragments and gene promoters. Gene promoters were defined by fragments covering regions within 500 bp of transcription start sites (Ensembl release 75; GRCh37). In order to identify an overall enrichment of biological processes, the genes implicated by the interactions with promoter fragments were curated into lists and protein networks were identified using STRING version 10.5 (available at https://string-db.org/) (Szklarczyk et al., 2015). The output included a report of any enrichment of distinct biological processes; these are described by Gene Ontology (GO) terms (Ashburner et al., 2000).
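The promoter overlap step can be illustrated with a minimal interval comparison. This sketch stands in for the BEDTools commands, which are not reproduced here: the gene names and coordinates are invented, the function names are hypothetical, and intervals are treated as half-open (start inclusive, end exclusive), in the style of BED files.

```python
def promoter_window(tss, flank=500):
    """Half-open window covering positions within `flank` bp of the TSS."""
    return max(0, tss - flank), tss + flank


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def promoter_genes(fragment, tss_table, flank=500):
    """List genes whose promoter window overlaps a restriction fragment."""
    chrom, start, end = fragment
    hits = []
    for gene, (gene_chrom, tss) in tss_table.items():
        win_start, win_end = promoter_window(tss, flank)
        if gene_chrom == chrom and overlaps(start, end, win_start, win_end):
            hits.append(gene)
    return sorted(hits)


# Invented coordinates for illustration
tss_table = {
    "GENE_A": ("chr9", 10_400),
    "GENE_B": ("chr9", 25_000),
    "GENE_C": ("chr6", 10_400),
}
other_end_fragment = ("chr9", 8_000, 10_000)

print(promoter_genes(other_end_fragment, tss_table))
```

Here GENE_A is reported because its promoter window (9,900 to 10,900) overlaps the fragment, whereas GENE_C is excluded because it lies on a different chromosome.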


Interactions within selected loci

The interaction data was uploaded to the WashU Epigenome Browser for visualisation. Each locus was examined within a 1 Mb – 2 Mb window surrounding the psoriasis association. The loci presented in this thesis include the candidate regions 9q31 (KLF4) and 6q23 (TNFAIP3), as well as three further novel loci selected for potential follow-up: 5p13.1, 6p22.3 and 18q21.2. The three novel loci were selected based on interactions between psoriasis bait fragments and compelling novel target genes.


2.4 Results

2.4.1 Results for functional characterisation of individual risk loci

Locus-specific functional characterisation was carried out on two psoriasis-associated loci. For the primary locus at 9q31 (KLF4), the results include bioinformatic, ChIP and 3C analyses. For the secondary locus at 6q23 (TNFAIP3), the results include bioinformatic and 3C analyses.

2.4.1.1 The 9q31 (KLF4) risk locus

Bioinformatics

Interrogation of the 1KG Phase 3 dataset revealed 90 variants in tight LD (r² > 0.8) with the index SNP rs10979182 (Appendix, Table 36). Of these, 83 were in r² > 0.9 with rs10979182. Thirteen of the variants were insertion/deletion mutations (indels) while the remainder were SNPs. The combined PICS score for the resulting set of 91 variants (the 90 variants in LD plus the index SNP) suggested that the set was 90.17% likely to cover the causal variant, with the index SNP rs10979182 having the highest PICS score of 5.3%.
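For reference, the r² measure of LD between two biallelic variants can be calculated from phased haplotypes as r² = D² / (pA(1 - pA)pB(1 - pB)), where D = pAB - pA·pB. The sketch below applies this standard definition to invented haplotypes; it is not the tool used to query the 1KG data, and the function name is hypothetical.

```python
def ld_r2(hap_a, hap_b):
    """r-squared between two variants from phased 0/1 haplotype vectors."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotype vectors must be the same length")
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r-squared is undefined for a monomorphic variant")
    return d * d / denom


# Invented haplotypes (1 = alternative allele) across eight chromosomes
index_snp = [1, 1, 0, 0, 1, 0, 1, 0]
proxy_tight = [1, 1, 0, 0, 1, 0, 1, 0]   # identical pattern
proxy_loose = [1, 1, 0, 0, 1, 0, 0, 0]   # one discordant haplotype

for name, hap in (("tight", proxy_tight), ("loose", proxy_loose)):
    r2 = ld_r2(index_snp, hap)
    print(name, round(r2, 3), "retained" if r2 > 0.8 else "excluded")
```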

The variant set was found to span approximately 70 kb within a large, intergenic region encompassing more than 1 Mb (UCSC Genome Browser hg19) (Figure 32). The gene desert is flanked by clusters of candidate genes including KLF4, ACTL7A, ACTL7B and IKBKAP (Figure 32A). The variant set intersects with regulatory regions displaying modified histone marks (H3K4me1 and H3K27ac) in several cell types from ENCODE (Figure 32B). Of the cell types, enrichment of these histone marks was most prominent in NHEK (Figure 32C).


[Figure 32, panels A-C. Tracks in panel B: rs10979182 LD, H3K4me1, H3K27ac. Tracks in panel C: rs10217259, rs6477612, rs4978343, rs10979182 LD, H3K4me1 (NHEK), H3K27ac (NHEK), DNase clusters, Txn factor ChIP.]

Figure 32: Overview of psoriasis-associated SNPs in 9q31 (hg19). The SNP set in r² > 0.8 with the index SNP (rs10979182), denoted by a purple bar, is located in a ~1 Mb gene desert between KLF4 and ACTL7B (A). The SNP set contains 91 SNPs (purple lines) that intersect roughly three putative enhancer regions defined by elevated levels of H3K4me1 and H3K27ac in ENCODE cell types (B). Three SNPs were later prioritised due to their intersection with modified histone marks at each of the three putative enhancers in NHEK (purple peaks), as well as DNase clusters and areas of transcription factor binding (C).

The online tools Blood eQTL, GTEX and RegulomeDB were used to search for eQTLs, but no eQTLs were identified in the variant set. The VEP tool predicted that 73% of the variants were intergenic whilst 13% were present in a regulatory region (Figure 33). One SNP, rs6477612, was located in a transcription factor-binding site while another SNP, rs10118193, was present in the exon of a non-coding pseudogene (RP11240E2.2). A further 2% of SNPs were a short distance upstream (886-1779 bp) of this non-coding gene, while 8% were located downstream (451-4756 bp). None of the SNPs in the KLF4 region were present in a coding gene.


VEP predictions: 9q31 (KLF4)

Figure 33: VEP predictions of variant consequences in the 9q31 variant set

RegulomeDB predicted that the variants rs6477612 and rs55975335 were the most deleterious of the variant set, with scores of 2a and 2b, respectively (see Appendix, Table 25). The second of these, rs55975335, is an indel of 30 nucleotides. Eleven variants scored 4, suggesting evidence of transcription factor binding and a DNase peak, whilst most variants scored 5 or 6 suggesting very little functionality. All SNPs scoring 4 or less can be viewed in Table 19. CADD, on the other hand, predicted that the most deleterious SNP was rs10816609, with a score of 19.43, while the second was rs10125120 (17.73) and the third was rs6477612 (17.4). A full list of functional scores can be seen in the Appendix (Table 36).

GTEx was used to examine candidate gene expression across the locus. The nearest gene, KLF4, was found to be expressed most highly in skin (non-sun exposed) compared with other tissues in GTEx data. In comparison, IKBKAP, CTNNAL1 and TMEM245 were most highly expressed in the adrenal gland. ACTL7A and ACTL7B, which are not well characterised, were most highly expressed in the testis, whilst FAM206A was most highly expressed in the thyroid.


Table 19: Variants in LD with rs10979182 with functional scores of 4 or less in RegulomeDB. RegulomeDB scores decrease with increasing regulatory potential, whereas CADD scores increase with deleteriousness. Abbreviations: DGV, downstream gene variant; IV, intergenic variant; RRV, regulatory region variant; TFBSV, transcription factor binding site variant; UGV, upstream gene variant. *selected for ChIP analysis

Variant       Position (hg19)  Ref allele                      Alt allele     r²    PICS   VEP        RegulomeDB score  CADD score
rs6477612*    110811552        C                               T              0.93  0.016  TFBSV; IV  2a                17.4
rs55975335    110811312        AAAACAAAGTATTAATGAATAATAATATCT  -              0.93  N/A    IV         2b                15.81
rs4978343*    110820132        G                               T              0.97  0.028  IV         4                 10.31
rs10217259*   110803975        T                               C              0.95  0.013  DGV; RRV   4                 6.826
rs10979183    110821050        C                               T              0.97  0.026  IV         4                 5.548
rs10816618    110801608        C                               A              0.95  0.014  UGV; RRV   4                 5.343
rs1434836     110822658        A                               G              0.97  0.032  IV; RRV    4                 2.287 36
rs2417842     110818823        G                               T              0.97  0.024  IV         4                 1.791
rs112957589   110834146        -                               AGAAAC,GAAACA  0.94  0.011  IV         4                 1.085
rs1339756     110776765        T                               C              0.93  0.009  IV         4                 0.915
rs1318148     110814693        C                               G              0.97  0.032  IV         4                 0.591
rs1914513     110803324        G                               A              0.94  0.010  DGV; RRV   4                 0.513
rs6477613     110811614        C                               T              0.92  0.015  IV         4                 0.142

Chromatin immunoprecipitation

In the 9q31 (KLF4) locus, three SNPs were selected for ChIP as they intersected putative enhancer regions, had RegulomeDB scores suggesting regulatory potential and had correspondingly high CADD scores: rs10217259, rs6477612 and rs4978343 (Figure 32). According to HaploReg v4.1, these SNPs are predicted to be in strong enhancers in skin and are likely to alter regulatory motifs. The SNPs rs10217259 and rs6477612 intersect with DNase hypersensitive regions in many cell types, and have several bound proteins. Additionally, rs6477612 is evolutionarily conserved (HaploReg). ChIP was conducted in order to confirm whether these SNPs coincide with H3K4me1 or H3K27ac marks in HaCaT, My-La or NHEK cells.


Optimisation of ChIP protocol

Initially, ChIP protocols were optimised for each cell type and antibody. Sonication time-courses determined that 20 minutes of sonication was sufficient to produce optimal fragment lengths of chromatin (200-400 bp) from HaCaT and My-La cell lines, whereas 8 minutes was sufficient for chromatin from NHEK in a smaller volume (Appendix, Figure 60). The amount of antibody for H3K4me1 or H3K27ac per immunoprecipitation (IP) was optimised to 2 µg in both HaCaT and My-La cell lines, since this amount of antibody was found to increase enrichment at positive control regions in comparison with 1 µg per IP (Figure 34).

A titration experiment was carried out in HaCaT cells to determine the least amount of chromatin that could be used per IP, which might allow for an experiment using NHEK cells which are more difficult to grow in large quantities. For H3K4me1, a minimum amount of chromatin corresponding to 250,000 cells per IP was sufficient to distinguish clearly between positive and negative control regions (Figure 34). For H3K27ac, however, unexpected enrichment was observed in the negative control regions using 250,000 or 500,000 cells that only abated when using 750,000 or 1,000,000 cells per IP (Figure 34). Therefore, ChIP for H3K4me1 was carried out in NHEK using 250,000 cells per IP, whereas ChIP for H3K27ac in NHEK was not attempted.


[Figure 34 image omitted. Bar charts in two columns (H3K4me1, H3K27ac) and three rows (HaCaT, My-La, HaCaT titration). Y-axis: percentage ChIP enrichment. X-axis: target (positive or negative control region). Legends: 1 µg or 2 µg antibody per IP (HaCaT and My-La rows); 250, 500, 750 or 1,000 thousand cells per IP (titration row).]

Figure 34: ChIP optimisation in HaCaT and My-La cells. ChIP was optimised for antibodies against the histone marks H3K4me1 and H3K27ac. Various conditions were tested using primers targeting positive control regions at GAPDH (denoted “Positive” or “+”) and negative control regions at GAPDH or MYOD (denoted “Negative” or “-”). Bars show mean ± SEM of triplicate IPs. Abbreviations: IP, immunoprecipitation; me1, H3K4me1; 27ac, H3K27ac.


ChIP at prioritised SNPs in 9q31

ChIP-qPCR was carried out targeting 150-200 bp regions encompassing rs10217259, rs6477612 and rs4978343. ChIP revealed enrichment of both H3K4me1 and H3K27ac histone marks at the prioritised SNPs in HaCaT cells (Figure 35). The greatest enrichment for both H3K4me1 and H3K27ac was observed at rs4978343, at the third putative enhancer. For H3K4me1 in HaCaT cells, the enrichment at rs10217259 and rs4978343 was significantly greater than the negative control region at GAPDH (one-way ANOVA; adjusted P-value = 0.0340 and 0.0001, respectively). For H3K27ac in HaCaT cells, the enrichment at rs4978343 alone was significantly greater than the negative control region at MYOD (P = 0.0001). The pattern of H3K4me1 enrichment in NHEK cells mirrored that seen in HaCaT cells, although only rs4978343 had a significantly greater enrichment than the negative control (P = 0.0001) (Figure 35). In comparison, none of the prioritised SNPs were enriched for H3K4me1 or H3K27ac in My-La cells relative to the negative control (Figure 35).


[Figure 35 image omitted. Bar charts in two columns (H3K4me1, H3K27ac) and three rows (HaCaT, My-La, NHEK; NHEK for H3K4me1 only). Y-axis: percentage ChIP enrichment. X-axis targets: rs10217259, rs6477612, rs4978343, positive control and negative control. Asterisks mark significant targets.]

Figure 35: ChIP results in HaCaT, My-La and NHEK cells. ChIP was carried out for H3K4me1 (HaCaT, My-La and NHEK) and H3K27ac (HaCaT and My-La). Asterisks denote enrichment values at targets that are significantly higher than the negative control region (one-way ANOVA, adjusted P-value < 0.05). Bars show mean ± SEM of triplicate IPs.


3C-qPCR in the 9q31 (KLF4) locus

In the 9q31 locus, 3C-qPCR assays utilising SYBR® green or TaqMan were carried out in HaCaT and My-La cells. 3C assays were designed to capture potential interactions between local genes and the psoriasis-associated putative enhancers containing rs10217259, rs6477612 and rs4978343.

Interaction profile between a psoriasis-associated putative enhancer (rs6477612) and surrounding genes in 9q31

The first hypothesis was that the intergenic psoriasis association would interact with a gene at the edge of the gene desert; most likely KLF4 based on prior Hi-C data (Rao et al., 2014) (Figure 26; page 154). Therefore, the first 3C-qPCR assay was anchored at a fragment within the psoriasis association and interactions were tested in both directions, using SYBR® green as a reporter.

Across the assay in both HaCaT and My-La cells an interaction peak occurred between the psoriasis-associated enhancer 2 fragment and the KLF4 Centromeric 1 fragment, which is situated approximately 8.7 kb downstream of KLF4. This was an interaction of over 500 kb from the anchor fragment. In HaCaTs the relative interaction frequency for this interaction was found to be significantly higher than for two other interactions: KLF4 and CTNNAL1 (one-way ANOVA; adjusted P-value = 0.0491 and 0.0149 respectively). In My-La cells the relative interaction frequency with KLF4 Centromeric 1 was significantly stronger than with CTNNAL1 (one-way ANOVA; adjusted P-value = 0.0230) (Figure 36). In My-La cells, there was another significant interaction between the psoriasis-associated enhancer 2 fragment and the KLF4 Centromeric 2 fragment, which is situated approximately 2.5 kb downstream of KLF4; this was significantly stronger than with CTNNAL1 (one-way ANOVA; adjusted P-value = 0.0474) (Figure 36).


[Figure 36 image omitted. Anchor = Psoriasis putative enhancer 2 (rs6477612). Two interaction profiles (HaCaT, My-La). Y-axis: relative interaction frequency. X-axis: distance from anchor (kb), from -750 to 1000. Labelled fragments: IKBKAP, CTNNAL1, Int A, Int C, RP11-363D24.1, KLF4, Cent 1 and Cent 2. Asterisks mark significant interactions.]

Figure 36: 3C-qPCR results in the 9q31 locus anchored at the HindIII fragment containing the second psoriasis-associated putative enhancer (rs6477612). qPCR was carried out on HaCaT and My-La 3C libraries using SYBR® Green as the reporter. The anchor fragment at the second psoriasis-associated enhancer is at distance 0 kb. Test fragments were selected in and around KLF4, two points in the gene desert and at fragments containing gene promoters for IKBKAP, FAM206A and CTNNAL1. Interactions were normalised to a short range control. Asterisks denote fragments that had a significantly higher relative interaction frequency than one or more of the other tested fragments, after multiple testing (one-way ANOVA, adjusted P-value < 0.05). Bars show mean + SEM of triplicate 3C libraries. Abbreviations: Cent, centromeric; Int, intergenic.

Interaction profile between KLF4 Centromeric 1 fragment and regions within the gene desert

Since the first 3C-qPCR assay confirmed an interaction between the second psoriasis-associated enhancer (rs6477612) and a fragment 8.7 kb downstream of KLF4 (Centromeric 1) in both HaCaT and My-La cells, it was next hypothesised that the Centromeric 1 fragment interacts more strongly with the psoriasis association than with fragments elsewhere in the gene desert. Therefore, the second 3C-qPCR assay was anchored at KLF4 Centromeric 1 and target fragments were tested across the gene desert, using SYBR® green as a reporter. The target fragments included two fragments near the breast cancer association called Positive 1 and Positive 2, of which Positive 1 has previously been shown to interact with KLF4 in breast cancer cells (Dryden et al., 2014).

Across the assay in both HaCaT and My-La cells an interaction peak occurred between the KLF4 Centromeric 1 fragment and the Positive 2 fragment; an interaction of approximately 800 kb from the anchor fragment (Figure 37). In HaCaT cells the relative interaction frequency for this interaction was found to be significantly stronger than for four other interactions: the three psoriasis-associated enhancers and Intergenic 4 (one-way ANOVA; adjusted P-value = 0.0148, 0.0434, 0.0149 and 0.0248 respectively). In My-La cells the relative interaction frequency with the Positive 2 fragment was significantly stronger than for five other interactions: the three psoriasis-associated enhancers, Positive 1 and Intergenic C (one-way ANOVA; adjusted P-value = 0.0080, 0.0244, 0.0094, 0.0189 and 0.0120, respectively). In both cell types, the interaction peak within the three psoriasis-associated putative enhancers occurred at enhancer 2 (Figure 37). However, this interaction was not significant when put into context of the whole assay.


[Figure 37 image omitted. Anchor = KLF4 centromeric fragment 1. Two interaction profiles (HaCaT, My-La). Y-axis: relative interaction frequency. X-axis: distance from anchor (kb), from 0 to 1200. Labelled fragments: KLF4, Ps 1, Ps 2, Ps 3, Int B, Int C, Pos 1 and Pos 2. Asterisks mark significant interactions.]

Figure 37: 3C-qPCR results in the 9q31 locus from a HindIII fragment ~ 8.7 kb downstream of KLF4 (Centromeric 1). qPCR was carried out on HaCaT and My-La 3C libraries using SYBR® Green as the reporter. The anchor fragment Centromeric 1 is at distance 0 kb. Test fragments were selected at the three putative psoriasis-associated enhancers, two intergenic loci (Int B and C) and positive control fragments (Pos 1 and Pos 2; where Pos 1 has previously been shown to interact with Centromeric 1 and Pos 2 is a fragment 1.8 kb from Pos 1). Asterisks denote fragments that had a significantly higher relative interaction frequency than one or more of the other tested fragments (one-way ANOVA, adjusted P-value < 0.05). Bars show mean + SEM of triplicate 3C libraries. Abbreviations: Int, intergenic; Pos, positive; Ps, psoriasis.

Interaction profile between KLF4 and the psoriasis associated variant set

The first 3C-qPCR assay identified an interaction between a fragment near KLF4 and the second prioritised enhancer within the psoriasis association. However, the second assay found that this interaction was not as strong as that with a fragment further into the gene desert. The third assay therefore took a more hypothesis-free approach in order to determine if the fragment containing KLF4 itself interacted strongly with any of the psoriasis-associated fragments, not limited to the prioritised putative enhancer regions. In order to introduce further specificity into the assay, TaqMan® was used as the reporter, and a probe was designed to target the anchor fragment containing KLF4.

In HaCaT cells, positive interactions were seen between KLF4 and the psoriasis-associated fragments 1, 3, 5, 6, 8 and 9, in comparison with the negative control region (NCR) at Intergenic A (one-way ANOVA, adjusted P-value = 0.0271, 0.0001, 0.0001, 0.0001, 0.0001 and 0.0048, respectively). Of these, the LD regions 8 and 9 contain prioritised SNPs in the second and third putative enhancers (rs6477612 and rs4978343, respectively). In addition, a positive interaction was seen between KLF4 and the non-disease-associated fragment Intergenic B (P = 0.0087) (Figure 38).

In My-La cells, positive interactions were seen between KLF4 and the psoriasis-associated fragments 2, 3 and 9, in comparison with the NCR (one-way ANOVA, adjusted P-value = 0.0060, 0.0085 and 0.0404, respectively). Again, a positive interaction was also seen between KLF4 and the Intergenic B fragment (P = 0.0013). The relative interaction frequency for this interaction appeared to be higher than for interactions occurring with the psoriasis association (Figure 38).


[Figure 38 image omitted. Anchor = KLF4 gene and promoter. Two interaction profiles (HaCaT, My-La). Y-axis: relative interaction frequency. X-axis: distance from anchor (kb), from 0 to 800. Labelled fragments: KLF4, Int A (NCR) and Int B. Asterisks mark significant interactions.]

Figure 38: 3C-qPCR results in the 9q31 locus from the HindIII fragment containing the KLF4 gene and promoter. qPCR was carried out on HaCaT and My-La 3C libraries using TaqMan® as the reporter. The anchor fragment (distance 0) contained the entire KLF4 gene and promoter. An intergenic fragment located approximately 200 kb from the anchor fragment was utilised as a negative control region. Eleven test fragments were selected at regular intervals across the psoriasis association. A second intergenic fragment was tested on the other side of the psoriasis association, approximately 700 kb from the anchor fragment. Asterisks denote fragments that had a significantly higher relative interaction frequency than the negative control region (one-way ANOVA, adjusted P-value < 0.05). Bars show mean + SEM of triplicate 3C libraries.

Stimulation of HaCaT cells for ChIP and 3C in 9q31

Stimulation of HaCaT cells was conducted in order to determine if differential expression of KLF4 might correlate with shifting protein binding or chromatin interactions with the psoriasis-associated variants.

Analysis of gene expression

Stimulation of HaCaTs with IFN-γ was found to upregulate KLF4 expression 8-fold from 8 hours post-stimulation (Figure 39). The 8-hour stimulation time-point was therefore selected for producing ChIP and 3C libraries.

[Figure 39 image omitted. Y-axis: fold change relative to untreated. X-axis: time (hours) at 0, 8, 24 and 48.]

Figure 39: Fold change in KLF4 expression upon stimulation of HaCaT cells with IFN-γ. Stimulation was performed on HaCaT cells using 100 ng/mL IFN-γ over a 48 hour time-course. Fold change was determined using the ΔΔCt method against untreated cells using two housekeeping genes (GAPDH and HPRT1).
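The ΔΔCt calculation named in the Figure 39 legend can be illustrated with a minimal sketch. The Ct values below are hypothetical, the function name is illustrative, and two simplifications are assumed that the text does not confirm: the two housekeeping genes are combined by averaging their Ct values, and amplification efficiency is taken as 100% (a doubling per cycle).

```python
from statistics import mean


def fold_change_ddct(target_ct_treated, ref_cts_treated,
                     target_ct_control, ref_cts_control):
    """Return the 2^-ddCt fold change of a target gene (treated vs control).

    Reference (housekeeping) Ct values are averaged per condition; this
    averaging step is an assumption made for illustration.
    """
    d_ct_treated = target_ct_treated - mean(ref_cts_treated)
    d_ct_control = target_ct_control - mean(ref_cts_control)
    dd_ct = d_ct_treated - d_ct_control
    return 2 ** (-dd_ct)


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier
# after stimulation while the housekeeping genes are unchanged.
fold = fold_change_ddct(
    target_ct_treated=24.0, ref_cts_treated=[18.0, 22.0],
    target_ct_control=27.0, ref_cts_control=[18.0, 22.0],
)
print(fold)  # 8.0
```

Under these assumptions, a drop of three cycles in the normalised Ct corresponds to an 8-fold increase in expression.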

ChIP in stimulated HaCaT cells

ChIP was carried out on stimulated HaCaTs to determine if there was a change in H3K4me1 or H3K27ac binding at prioritised SNPs. No significant difference in ChIP enrichment was seen between unstimulated cells and stimulated cells at any of the prioritised SNPs. Additionally, no significant change was seen in binding at the KLF4 promoter after multiple testing (Figure 40).


[Figure 40 image omitted. Two bar charts (H3K4me1, H3K27ac) comparing unstimulated and stimulated cells. Y-axis: percentage ChIP enrichment. X-axis targets: rs10217259, rs6477612, rs4978343 and the KLF4 promoter.]

Figure 40: ChIP results for H3K4me1 and H3K27ac in stimulated HaCaT cells. HaCaTs were stimulated with IFN-γ for 8 hours and ChIP was carried out, with unstimulated cells as controls. No significant changes were found at any of the targets (multiple T-tests with correction for multiple comparisons using the Holm-Sidak method). Bars show mean ± SEM of triplicate ChIP libraries.
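As an illustration of the Holm-Sidak correction named in the Figure 40 legend, the sketch below computes step-down adjusted P-values in plain Python. The input P-values are hypothetical and the function name is illustrative; the statistics software actually used for the analysis is not stated here, so this shows the general technique only.

```python
def holm_sidak_adjust(p_values):
    """Return Holm-Sidak step-down adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Sidak correction over the (m - rank) hypotheses not yet stepped past.
        adj = 1.0 - (1.0 - p_values[idx]) ** (m - rank)
        # Enforce monotonicity: an adjusted value never falls below an earlier one.
        running_max = max(running_max, adj)
        adjusted[idx] = min(running_max, 1.0)
    return adjusted


# Hypothetical unadjusted P-values for four targets.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = holm_sidak_adjust(raw)
for p, q in zip(raw, adjusted):
    print(f"raw P = {p:.2f}, adjusted P = {q:.4f}")
```

The smallest P-value is corrected for all four comparisons, the next for three, and so on, which makes the procedure less conservative than applying a single-step correction to every test.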

3C in stimulated HaCaT cells

3C was carried out on stimulated HaCaTs in duplicate to determine if there was a change in interaction frequency between a psoriasis-associated putative enhancer and KLF4.

In both unstimulated and stimulated HaCaTs, an interaction peak was observed between the third putative enhancer (rs4978343) and fragments near KLF4, although none of the interactions across the locus were significantly stronger than any others (Figure 41A). There were no significant differences between unstimulated and stimulated cells at any of the interactions (multiple T tests, adjusted P-value > 0.99 for each) (Figure 41B).


[Figure 41 image omitted. (A) Anchor = Psoriasis putative enhancer 3 (rs4978343): interaction profiles for unstimulated and stimulated HaCaT cells. Y-axis: relative interaction frequency. X-axis: distance from anchor (kb), from -750 to 1000. Labelled fragments: IKBKAP, Int A, Int C, RP11-363D24.1, KLF4, KLF4 SR and Cent 1. (B) Bar chart of relative interaction frequency per fragment, comparing unstimulated and stimulated cells.]

Figure 41: 3C-qPCR results in 9q31 in unstimulated and stimulated HaCaT cells. Relative interaction frequencies are displayed across the locus (A) and between unstimulated and stimulated cells (B). Abbreviations: Cent, centromeric; Int, intergenic; SR, short-range.


2.4.1.2 The 6q23 (TNFAIP3) risk locus

Bioinformatics in the 6q23 (TNFAIP3) risk locus

Interrogation of the 1KG Phase 3 dataset revealed 9 variants in tight LD (r² > 0.8) with the lead SNP rs582757. Of these, three were perfect proxies for rs582757: rs583522, rs598493 and rs643177; these four SNPs are all intronic to TNFAIP3. The combined PICS score for the 10 variants suggested that the set was 98.9% likely to cover the causal variant.

The variant set was located at TNFAIP3 (promoter and intronic regions) and an intergenic region downstream of TNFAIP3 (UCSC Genome Browser hg19) (Figure 42). Across the locus there was evidence of regulatory regions with elevated levels of histone mark binding, DNase clusters and transcription factor binding (Figure 42).

[Figure 42 image omitted. Genome browser view with SNP labels: rs643177, rs6933987, rs598493, rs654912, rs583522, rs582757, rs642627, rs612217, rs4895498 and rs6909442.]

Figure 42: Overview of psoriasis-associated SNPs in 6q23 (hg19). The SNP set in r² > 0.8 with the index SNP (rs582757) contains 10 SNPs that intersect TNFAIP3 and intergenic regions displaying elevated levels of histone modifications. The coloured peaks represent H3K4me1 and H3K27ac binding in various ENCODE cell types: GM12878 (red), H1-hESC (yellow), HSMM (green), HUVEC (light blue), K562 (dark blue), NHEK (purple) and NHLF (pink).

The online tools Blood eQTL, GTEx and RegulomeDB were used to search for eQTLs, but no eQTLs were identified in the variant set. VEP predictions for the variants included intronic variants, upstream or downstream gene variants, intergenic variants, non-coding transcript variants and regulatory region variants (Figure 43). None of the SNPs were within an exon of a coding gene.


[Figure 43 image omitted. Chart title: VEP predictions: 6q23 (TNFAIP3).]

Figure 43: VEP predictions of variant consequences in the 6q23 variant set

RegulomeDB predicted that rs598493 was the most deleterious of the variant set, with a score of 3A suggesting transcription factor binding, a transcription factor motif and a DNase peak. CADD, on the other hand, predicted that the most deleterious SNP was rs583522, with a score of 7.761. In all, the four intronic variants in perfect LD were deemed most likely to be regulatory. A full list of functional scores can be seen in Table 20.

GTEx was used to examine candidate gene expression across the locus. The nearest gene TNFAIP3 was found to be most highly expressed in Epstein-Barr virus (EBV)-transformed lymphocytes, according to GTEx data. IL20RA and IL22RA2 were most highly expressed in the skin (non-sun exposed), whilst IFNGR1 was most highly expressed in whole blood. In comparison, OLIG3 was barely expressed in any tissue in GTEx.


Table 20: Variants in LD with rs582757 with functional annotation. RegulomeDB scores decrease with increasing regulatory potential, whereas CADD scores increase with deleteriousness. Abbreviations: DGV, downstream gene variant; INTROV, intronic variant; IV, intergenic variant; NCTV, non-coding transcript variant; RRV, regulatory region variant; UGV, upstream gene variant.

| SNP | Position (hg19) | Annotation | Ref allele | Alt allele | r² | PICS | VEP | RegulomeDB score | CADD score |
|---|---|---|---|---|---|---|---|---|---|
| rs598493 | 138195402 | Intronic (TNFAIP3) | T | C | 1.00 | 0.218 | INTROV, DGV, UGV | 3a | 2.605 |
| rs643177 | 138195693 | Intronic (TNFAIP3) | T | C | 1.00 | 0.218 | INTROV, DGV, UGV | 4 | 2.796 |
| rs583522 | 138189884 | Intronic (TNFAIP3) | C | T | 1.00 | 0.218 | UGV, INTROV, RRV | 4 | 7.761 |
| rs582757 | 138197824 | Intronic (TNFAIP3) | C | T | 1.00 | 0.218 | INTROV, DGV, NCTV | 4 | 7.483 |
| rs4895498 | 138236977 | Intergenic | T | C | 0.96 | 0.006 | IV | 4 | 2.229 |
| rs6909442 | 138242487 | Intergenic | C | G | 0.92 | 0.005 | IV | 5 | 0.481 |
| rs654912 | 138185422 | RP11-35612.4 | C | T | 0.87 | N/A | UGV, INTROV, NCTV | 5 | 0.274 |
| rs642627 | 138206783 | 2.3kb 3' of TNFAIP3 | A | G | 0.98 | 0.039 | DGV | 6 | 2.416 |
| rs6933987 | 138239434 | Intergenic | C | T | 0.96 | 0.010 | IV | No data | 2.489 |
| rs612217 | 138212961 | 8.5kb 3' of TNFAIP3 | A | G | 0.98 | 0.059 | IV | No data | 1.942 |
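The combined PICS score quoted for this variant set can be approximated by summing the per-SNP values in Table 20. The sketch below does this, assuming (as an illustration only) that the combined score is a simple sum of per-SNP probabilities and that the variant with no PICS value (N/A) contributes nothing. The rounded table values sum to about 0.991, close to the 98.9% reported in the text; the small difference may reflect rounding of the tabulated scores.

```python
# Per-SNP PICS probabilities copied from Table 20; None marks "N/A".
pics_scores = {
    "rs598493": 0.218,
    "rs643177": 0.218,
    "rs583522": 0.218,
    "rs582757": 0.218,
    "rs4895498": 0.006,
    "rs6909442": 0.005,
    "rs654912": None,
    "rs642627": 0.039,
    "rs6933987": 0.010,
    "rs612217": 0.059,
}


def combined_pics(scores):
    """Sum the available PICS probabilities, skipping missing values."""
    return sum(score for score in scores.values() if score is not None)


total = combined_pics(pics_scores)
print(f"Combined PICS probability: {total:.3f}")
```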

3C-qPCR in the 6q23 (TNFAIP3) locus

In 6q23, chromatin interactions were measured between disease-associated regions and immune-related genes based on prior knowledge of the complex interactions in this locus (McGovern et al., 2016; Rao et al., 2014) (Figure 27, page 158). The targeted fragments within the first assay included the RA locus and two fragments within the psoriasis locus: the first containing the index SNP intronic to TNFAIP3 (Ps SNPs 1) and the second containing two SNPs downstream of TNFAIP3 (Ps SNPs 2). The anchor fragments included gene targets IL22RA2 and IFNGR1.

3C-qPCR revealed positive chromatin interactions in HaCaT cells between the IL22RA2 promoter and a fragment within the RA association, in comparison with the NCR at OLIG3 (unpaired T test, P = 0.0015). An interaction was also seen between the IL22RA2 promoter and a fragment within the psoriasis association downstream of TNFAIP3 (Psoriasis SNPs 2); an interaction over a distance of more than 720 kb (one-way ANOVA, adjusted P = 0.0001). An interaction was not seen between the IL22RA2 promoter and the fragment containing the psoriasis index SNP (Psoriasis SNPs 1), in comparison with the NCR (Figure 44A).

[Figure 44 image omitted. (A) Anchor = IL22RA2 (two plots). (B) Anchor = IFNGR1 (two plots). Y-axis: relative interaction frequency. X-axis: distance from anchor (kb), from -200 to 800, with gene positions marked for IL22RA2, IFNGR1, OLIG3 and TNFAIP3. Labelled fragments: RA, Ps SNPs 1, Ps SNPs 2 and NCR. Asterisks mark significant interactions.]

Figure 44: 3C-qPCR results in the 6q23 (TNFAIP3) locus between immune-related genes and the Ps/RA loci in HaCaT cells. qPCR was carried out on HaCaT 3C libraries using SYBR® green as the reporter. Interaction frequencies are relative to a short-range control interaction with each anchor fragment. Asterisks denote interactions that are significantly greater than that at the intervening negative control region (NCR) near OLIG3 (one-way ANOVA or unpaired T-test, adjusted P-value < 0.05). Bars show mean ± SEM of triplicate 3C libraries. Abbreviations: NCR, negative control region; Ps, psoriasis; RA, rheumatoid arthritis.

Similarly, a positive chromatin interaction was detected between a fragment ~30 kb upstream of IFNGR1 and a fragment within the RA association, in comparison with the NCR at OLIG3 (unpaired T test, P < 0.0001). An interaction was also seen between the IFNGR1 locus and the fragment within the psoriasis association downstream of TNFAIP3 (Psoriasis SNPs 2); an interaction over a distance of approximately 650 kb (one-way ANOVA, adjusted P = 0.0003). An interaction was not seen between the IFNGR1 region and the fragment containing the psoriasis index SNP (Psoriasis SNPs 1), in comparison with the NCR (Figure 44B).

The second 3C-qPCR assay in 6q23 tested interactions between Ps SNPs 2 and regions across TNFAIP3, as well as with a fragment containing the RA index SNP. A positive interaction was detected between the Ps SNPs 2 fragment and the fragment containing the promoter of TNFAIP3, in comparison with the intervening fragment within TNFAIP3 itself (Ps SNPs 1) (one-way ANOVA, adjusted P = 0.0001). This occurred over a distance of approximately 54 kb (Figure 45). In comparison, interactions were not seen with the RA index SNP or the NCR at OLIG3.

[Figure 45 image omitted. Anchor = Ps SNPs 2 (3' TNFAIP3). Y-axis: relative interaction frequency. X-axis: distance from anchor (kb), from -500 to 0, with the position of TNFAIP3 marked. Labelled fragments: NCR, RA index, Ps SNPs 1 and TNFAIP3 promoter. An asterisk marks the significant interaction.]

Figure 45: 3C-qPCR results in the 6q23 (TNFAIP3) locus from the psoriasis-associated fragment downstream of TNFAIP3 (Ps SNPs 2). qPCR was carried out on HaCaT 3C libraries using SYBR® green as the reporter. Interaction frequencies are relative to a short-range control interaction with each anchor fragment. Asterisks denote interactions that are significantly greater than that at the intervening region within TNFAIP3 (Ps SNPs 1) (one-way ANOVA, adjusted P-value < 0.05). Bars show mean ± SEM of triplicate 3C libraries. Abbreviations: NCR, negative control region; Ps, psoriasis; RA, rheumatoid arthritis.


2.4.2 Results for functional characterisation of multiple risk loci

The unbiased analysis of all known psoriasis risk loci involved capture Hi-C data from three conditions, comprising two cell types and one stimulatory experiment. The stimulatory experiment was accompanied by gene expression data.

2.4.2.1 HaCaT stimulation time-course and expression analysis

HaCaT cells received stimulation with either IL-17A or IFN-γ across a 48-hour time-course. RNA was collected at 2, 8, 24 and 48 hours post-stimulation and differential gene expression was determined using an Illumina expression microarray.

For an overall view of sample clustering across the time-course, a multidimensional scaling plot was produced using Limma in R that showed separation of control samples and stimulated samples (Figure 46). For each condition, the duplicate samples clustered together well on the plot. For the IL-17A stimulated samples, little separation was seen from the control samples in terms of gene expression until 48 hours post-stimulation. For the IFN-γ stimulated samples, however, clear separation from the control samples was seen from 2 hours post-stimulation indicating a greater change in gene expression for the top 500 probes, across the time-course.


[Figure 46 image omitted. Legend: treatment (Control, IL-17A, IFN-γ) and time-point (2 hours, 8 hours, 24 hours, 48 hours).]

Figure 46: Multidimensional scaling plot for all samples across the stimulatory time-course. HaCaT cells were stimulated with 100 ng/mL cytokine and the extracted RNA was run on the Illumina HT-12 v4 expression array. The plot was generated using the command plotMDS within Limma, and shows the typical difference in log2 fold change between all samples, based on the top 500 differential probes.
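The "typical difference in log2 fold change" underlying the plot can be illustrated with a short sketch of the pairwise distance. It assumes limma's documented behaviour for plotMDS with pairwise gene selection: for each pair of samples, the distance is the root-mean-square of the largest log2 differences (here the top 500 probes). The expression values and the function name are hypothetical, and the sketch covers only the distance, not the subsequent scaling to two dimensions.

```python
import math


def leading_logfc_distance(sample_a, sample_b, top=500):
    """Root-mean-square of the `top` largest log2 differences between two samples."""
    squared = sorted(((a - b) ** 2 for a, b in zip(sample_a, sample_b)),
                     reverse=True)
    leading = squared[:top]
    return math.sqrt(sum(leading) / len(leading))


# Hypothetical log2 expression values for four probes in two samples.
control = [1.0, 2.0, 3.0, 4.0]
stimulated = [1.0, 2.5, 7.0, 1.0]

# With top=2 only the two most different probes (differences of 4 and 3) count.
distance = leading_logfc_distance(control, stimulated, top=2)
print(f"{distance:.3f}")
```

Because only the most variable probes contribute, samples with a strong response in a limited set of genes separate clearly even when most probes are unchanged.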


| Gene (2 hours IL-17A) | Log2 fold change | Adj. P-value | Gene (8 hours IL-17A) | Log2 fold change | Adj. P-value |
|---|---|---|---|---|---|
| NFKBIZ | 3.26 | 4.31E-11 | ZC3H12A | 3.04 | 1.07E-07 |
| ZC3H12A | 2.53 | 8.74E-07 | NFKBIZ | 2.89 | 6.62E-10 |
| TCERG1 | -1.93 | 0.00067 | SPRR2A | 2.25 | 3.37E-07 |
| CDC2L2 | -1.88 | 0.00039 | SPRR2F | 2.12 | 7.88E-07 |
| LOC100133692 | -1.87 | 0.001604 | SPRR2D | 2.08 | 4.08E-06 |
| YWHAE | 1.86 | 0.001604 | CXCL1 | 1.80 | 0.000354 |
| YTHDC1 | -1.80 | 0.000137 | SPRR2E | 1.66 | 1.11E-05 |
| NFAT5 | -1.78 | 0.00039 | MAP3K8 | 1.65 | 1.11E-05 |
| CXCL1 | 1.78 | 0.00039 | CXCL6 | 1.57 | 2.60E-05 |
| LOC162073 | -1.75 | 0.001697 | NACC1 | 1.50 | 0.009251 |

| Gene (24 hours IL-17A) | Log2 fold change | Adj. P-value | Gene (48 hours IL-17A) | Log2 fold change | Adj. P-value |
|---|---|---|---|---|---|
| PI3 | 3.03 | 5.62E-05 | SAA2 | 4.90 | 6.11E-07 |
| S100A8 | 3.01 | 0.00167 | PDZK1IP1 | 4.39 | 4.53E-09 |
| ZC3H12A | 2.91 | 1.29E-06 | S100A7 | 4.15 | 2.99E-08 |
| SAA2 | 2.90 | 3.06E-06 | S100A8 | 4.14 | 0.001661 |
| S100A9 | 2.61 | 1.29E-06 | PI3 | 4.02 | 0.000107 |
| SAA1 | 2.55 | 5.31E-07 | SAA1 | 3.47 | 0.000103 |
| LOC653061 | 2.55 | 8.30E-06 | SAA1 | 3.43 | 1.05E-06 |
| NFKBIZ | 2.53 | 8.72E-08 | SERPINB4 | 3.36 | 2.04E-05 |
| SPRR2F | 2.21 | 3.33E-06 | ZC3H12A | 3.27 | 2.43E-05 |
| SPRR2A | 2.09 | 5.54E-06 | LOC653061 | 3.15 | 5.06E-05 |

Figure 47: Top tables for differentially expressed genes in HaCaT cells stimulated with IL-17A. HaCaT cells were stimulated with 100 ng/mL cytokine and the extracted RNA was run on the Illumina HT-12 v4 expression array. Differential expression analysis was conducted using eBayes in Limma. The top 10 most differentially expressed genes based on fold change are presented for each time-point.
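Ranking genes "based on fold change", as in the top tables, can be sketched as sorting by the absolute log2 fold change so that strongly down-regulated genes rank alongside up-regulated ones. The values below are a subset copied from the 2-hour IL-17A table; the significance cut-off `alpha` and the function name are hypothetical additions for illustration and are not stated in the text.

```python
# (gene, log2 fold change, adjusted P-value) from the 2-hour IL-17A top table,
# deliberately listed out of order.
results = [
    ("CXCL1", 1.78, 0.00039),
    ("TCERG1", -1.93, 0.00067),
    ("NFKBIZ", 3.26, 4.31e-11),
    ("YWHAE", 1.86, 0.001604),
    ("ZC3H12A", 2.53, 8.74e-07),
]


def top_table(rows, n=10, alpha=0.05):
    """Keep rows with adjusted P < alpha and rank by absolute log2 fold change."""
    significant = [row for row in rows if row[2] < alpha]
    return sorted(significant, key=lambda row: abs(row[1]), reverse=True)[:n]


ranked = top_table(results)
for gene, log2_fc, adj_p in ranked:
    direction = "up" if log2_fc > 0 else "down"
    linear = 2 ** abs(log2_fc)  # convert log2 fold change to a linear fold change
    print(f"{gene}: log2FC = {log2_fc:+.2f} ({linear:.1f}-fold {direction}), "
          f"adj. P = {adj_p:.2e}")
```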

The top ten most differentially expressed genes at each time-point are shown for IL-17A stimulated cells (Figure 47) and IFN-γ stimulated cells (Figure 48). The top differentially expressed genes for the IL-17A stimulated samples at 2 and 8 hours included NFKB inhibitor zeta (NFKBIZ), a psoriasis candidate GWAS gene involved in NF-κB signalling, and zinc finger CCCH-type containing 12A (ZC3H12A), whose protein has various functions including cellular inflammatory response. At 24 and 48 hours, the top differentially expressed genes included peptidase inhibitor 3 (PI3), S100 calcium binding protein A8 (S100A8), serum amyloid A2 (SAA2) and PDZK1 interacting protein 1 (PDZK1IP1). Of these, S100A8 has clear roles in regulating inflammation and immune response.


| Gene (2 hours IFN-γ) | Log2 fold change | Adj. P-value | Gene (8 hours IFN-γ) | Log2 fold change | Adj. P-value |
|---|---|---|---|---|---|
| CXCL9 | 4.97 | 5.21E-14 | CXCL9 | 8.74 | 2.81E-17 |
| CXCL10 | 4.66 | 1.22E-11 | CXCL10 | 8.70 | 2.61E-15 |
| C10ORF10 | 3.51 | 3.53E-11 | IDO1 | 7.12 | 1.08E-13 |
| IRF1 | 3.05 | 9.94E-07 | INDO | 6.64 | 3.37E-13 |
| UBD | 2.99 | 1.27E-10 | GBP4 | 5.88 | 7.25E-13 |
| HAPLN3 | 2.90 | 3.54E-08 | IL18BP | 5.59 | 3.92E-12 |
| GBP1 | 2.60 | 2.72E-09 | UBD | 5.57 | 4.16E-14 |
| CCL2 | 2.59 | 5.12E-09 | HAPLN3 | 5.42 | 5.14E-12 |
| ETV7 | 2.57 | 1.02E-09 | CD74 | 5.31 | 3.92E-12 |
| BCL6 | 2.56 | 1.27E-10 | GBP1 | 5.30 | 1.74E-12 |

| Gene (24 hours IFN-γ) | Log2 fold change | Adj. P-value | Gene (48 hours IFN-γ) | Log2 fold change | Adj. P-value |
|---|---|---|---|---|---|
| CXCL10 | 9.26 | 2.38E-14 | CXCL10 | 8.76 | 2.52E-12 |
| CXCL9 | 8.49 | 1.18E-15 | CXCL9 | 8.06 | 1.81E-13 |
| IDO1 | 8.11 | 2.64E-13 | GBP5 | 7.91 | 2.06E-12 |
| INDO | 8.10 | 3.51E-13 | HLA-DPA1 | 6.97 | 1.03E-13 |
| HLA-DRA | 7.64 | 4.69E-12 | INDO | 6.52 | 4.12E-10 |
| GBP5 | 7.41 | 5.28E-14 | ECM2 | 6.45 | 2.56E-09 |
| CD74 | 7.28 | 5.28E-14 | LOC100133678 | 6.40 | 9.74E-11 |
| CD74 | 7.13 | 1.15E-12 | IL18BP | 6.25 | 9.13E-10 |
| IL18BP | 6.96 | 3.75E-12 | IDO1 | 6.21 | 5.41E-10 |
| HLA-DMB | 6.81 | 6.51E-13 | CLDN8 | -6.18 | 4.93E-09 |

Figure 48: Top tables for differentially expressed genes in HaCaT cells stimulated with IFN-γ. HaCaT cells were stimulated with 100 ng/mL cytokine and the extracted RNA was run on the Illumina HT-12 v4 expression array. Differential expression analysis was conducted using eBayes in Limma. The top 10 most differentially expressed genes based on fold change are presented for each time-point.

The top differentially expressed genes for IFN-γ stimulated cells included C-X-C motif chemokine ligand 9 (CXCL9) and C-X-C motif chemokine ligand 10 (CXCL10) at each time-point. These genes are both involved in immune and inflammatory responses, in particular T-cell trafficking, acting as T-cell attractants. Other differentially expressed genes included indoleamine 2,3-dioxygenase 1 (IDO1), which is involved in controlling the immune response to prevent the development of autoimmunity, and guanylate binding protein 5 (GBP5), which has a role in the innate immune system by activating assembly of the NLRP3 inflammasome, a protein complex that activates inflammation.

Based on the gene expression data, an 8-hour incubation with IFN-γ was selected for the stimulated HaCaT libraries. This was in order to ensure a robust stimulation that affected the expression of multiple genes with clear roles in immune response.

2.4.2.2 Capture Hi-C study

For each of the three conditions, CHi-C libraries were generated in duplicate, resulting in six libraries in total. Each library was processed through the HiCUP pipeline to filter out unpaired reads and Hi-C artefacts. Following this, each library had > 41 million unique di-tags (Table 21). Mapped di-tags were generated by aligning the unique di-tags to the capture bait library; following this, each library had > 25 million mapped di-tags to be taken forward for CHiCAGO analysis (Table 21). The mean capture efficiency, defined as the number of mapped di-tags divided by the total number of unique di-tags, was 69.6%.

Table 21: Results of HiCUP processing for CHi-C libraries
HiCUP was used to filter sequencing reads from CHi-C libraries. Total reads refers to the total number of reads generated by the sequencer. Paired reads refers to the number of reads that could be aligned to the reference genome and paired. Valid pairs remain after filtering out common Hi-C artefacts such as self-ligated fragments. Following de-duplication, the remaining unique di-tags are the valid Hi-C ligated fragments. The mapped reads are those that can be aligned back to the capture bait library; these reads are taken forward for analysis in CHiCAGO.

CHi-C library            Total reads   Paired reads   Valid pairs   Unique di-tags   Mapped di-tags   Capture efficiency (%)
HaCaT 1 (unstimulated)   162,825,364   128,227,957    95,390,501    73,744,765       48,725,010       66.1
HaCaT 2 (unstimulated)   197,126,430   96,231,209     73,826,839    43,701,724       25,017,027       57.2
HaCaT 1 (stimulated)     165,232,624   108,448,092    102,159,919   77,110,266       51,670,420       67.0
HaCaT 2 (stimulated)     177,752,928   94,495,785     72,326,697    41,144,728       28,563,324       69.4
My-La 1                  199,293,315   133,627,155    119,431,279   78,883,683       64,315,454       81.5
My-La 2                  353,395,050   236,277,027    209,425,704   120,874,978      92,394,623       76.4
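The capture efficiency definition given above (mapped di-tags divided by unique di-tags) can be checked against Table 21 with a short Python sketch. The di-tag counts are copied from the table; the function and variable names are illustrative only and are not part of HiCUP or CHiCAGO.

```python
# (unique di-tags, mapped di-tags) per library, copied from Table 21.
LIBRARIES = {
    "HaCaT 1 (unstimulated)": (73_744_765, 48_725_010),
    "HaCaT 2 (unstimulated)": (43_701_724, 25_017_027),
    "HaCaT 1 (stimulated)": (77_110_266, 51_670_420),
    "HaCaT 2 (stimulated)": (41_144_728, 28_563_324),
    "My-La 1": (78_883_683, 64_315_454),
    "My-La 2": (120_874_978, 92_394_623),
}


def capture_efficiency(unique_ditags, mapped_ditags):
    """Capture efficiency (%) = mapped di-tags / unique di-tags * 100."""
    return 100.0 * mapped_ditags / unique_ditags


efficiencies = {
    name: capture_efficiency(unique, mapped)
    for name, (unique, mapped) in LIBRARIES.items()
}
mean_efficiency = sum(efficiencies.values()) / len(efficiencies)

if __name__ == "__main__":
    for name, value in efficiencies.items():
        print(f"{name}: {value:.1f}%")
    print(f"Mean capture efficiency: {mean_efficiency:.1f}%")
```

Rounded to one decimal place, the per-library values and the mean agree with the figures reported in Table 21 and in the text.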

Overarching interpretation of CHi-C data

CHiCAGO was used to identify significant interactions for each condition within the unique di-tags. Once restricted to interactions originating from psoriasis-associated bait fragments, the number of interactions per condition ranged from 5,335 in unstimulated HaCaT cells to 11,651 in My-La cells (Table 22); this variation is likely due to the difference in the depth of sequencing. Of these interactions, 28.5–30.8% occurred with promoter fragments. The promoter fragment interactions corresponded with several hundred genes and non-coding RNAs: more than 400 in HaCaT cells and more than 800 in My-La cells (Table 22).

Table 22: Frequency of significant interactions occurring with psoriasis-associated bait fragments
All included interactions had a significant CHiCAGO score (>5). Interactions were restricted to those originating from psoriasis-associated bait fragments.

Condition            No. of interactions   No. of interactions with promoter fragments   No. of corresponding genes
HaCaT unstimulated   5335                  1542 (28.9%)                                  445
HaCaT stimulated     5512                  1699 (30.8%)                                  463
My-La                11651                 3326 (28.5%)                                  831
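The filtering described for Table 22 (CHiCAGO score above 5, restricted to psoriasis-associated baits) and the promoter percentages can be sketched in Python as follows. The `Interaction` record and its field names are hypothetical and do not reflect the CHiCAGO output format; only the counts in `TABLE_22` are taken from the table.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    """Hypothetical record for one bait to other-end interaction."""
    bait: str
    other_end: str
    score: float
    other_end_is_promoter: bool


def significant_psoriasis_interactions(interactions, psoriasis_baits, threshold=5.0):
    """Keep interactions scoring above the threshold that originate
    from a psoriasis-associated bait fragment."""
    return [
        i for i in interactions
        if i.score > threshold and i.bait in psoriasis_baits
    ]


def promoter_percentage(total, with_promoter):
    """Percentage of interactions whose other end is a promoter fragment."""
    return 100.0 * with_promoter / total


# (all interactions, interactions with promoter fragments) from Table 22.
TABLE_22 = {
    "HaCaT unstimulated": (5335, 1542),
    "HaCaT stimulated": (5512, 1699),
    "My-La": (11651, 3326),
}

if __name__ == "__main__":
    for condition, (total, with_promoter) in TABLE_22.items():
        print(f"{condition}: {promoter_percentage(total, with_promoter):.1f}%")
```

Applied to the counts in Table 22, the percentage calculation gives the reported range of 28.5% to 30.8%.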

Across all three conditions, there were 1009 genes and non-coding RNAs implicated by interactions between psoriasis-associated fragments and gene promoter fragments (Figure 49). Of these genes, 617 were recognised by the STRING protein network tool, corresponding with an enrichment of 159 biological processes. The three most significantly associated processes were “keratinization” (15 genes; FDR = 1.11 x 10^-7), “regulation of immune response” (54 genes; FDR = 1.79 x 10^-5) and “regulation of immune system process” (74 genes; FDR = 6.35 x 10^-5). The gene set unique to My-La cells was enriched for 9 biological processes, of which several involved IL-2 signalling. The most significant was “regulation of interleukin-2 biosynthetic process” (7 genes; FDR = 4.23 x 10^-5). In contrast to My-La cells, the gene set unique to HaCaT cells (both unstimulated and stimulated) was not significantly enriched for any biological processes. Similarly, there were no significant biological processes for the gene sets unique to unstimulated or stimulated HaCaT cells. However, the complete set of genes implicated in unstimulated HaCaT cells was enriched for skin-related and immune-related biological processes, with the most significant being “keratinization” (14 genes; FDR = 2.91 x 10^-11); the same was true for stimulated HaCaT cells (13 genes; FDR = 1.71 x 10^-9). Notably, the complete set of genes implicated in stimulated HaCaT cells was also enriched for various response processes such as “response to organic substance” (58 genes; FDR = 0.00679) that were not found in the unstimulated HaCaT gene set.
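The FDR values above are quoted as reported by STRING, and the text does not describe how the tool derives them. As a generic illustration of how a false discovery rate adjustment works, the following Python sketch implements the Benjamini-Hochberg procedure; this is an assumption chosen for illustration, not a description of the STRING implementation, and the input P-values are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order.

    Each P-value is scaled by m / rank (m = number of tests, rank = position
    in ascending order), then a running minimum taken from the largest rank
    downwards keeps the adjusted values monotonic.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Invented P-values for four hypothetical biological processes.
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"P = {p:.3f}  adjusted = {q:.3f}")
```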

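[Figure 49: Venn diagram of genes and non-coding RNAs per condition. HaCaT unstimulated only, 53; HaCaT stimulated only, 65; My-La only, 456; both HaCaT conditions only, 60; HaCaT unstimulated and My-La only, 37; HaCaT stimulated and My-La only, 43; all three conditions, 295.]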

Figure 49: Number of genes and non-coding RNAs implicated by promoter fragment interactions with psoriasis-associated bait fragments
There were 1009 genes or non-coding RNAs in total that corresponded to promoter fragments interacting with psoriasis-associated bait fragments. The Venn diagram indicates the number of genes that were shared or unique to each condition.
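The counts in Figure 49 can be cross-checked against the per-condition gene totals in Table 22 with a short Python sketch. The seven counts are taken from the figure, but their assignment to Venn regions is an assumption here: it was chosen because it reproduces the totals of 445, 463 and 831 genes reported in Table 22.

```python
UNSTIM = "HaCaT unstimulated"
STIM = "HaCaT stimulated"
MYLA = "My-La"

# Gene counts per Venn region. The assignment of counts to regions is an
# assumption inferred from the per-condition totals in Table 22.
VENN_REGIONS = {
    frozenset({UNSTIM}): 53,
    frozenset({STIM}): 65,
    frozenset({MYLA}): 456,
    frozenset({UNSTIM, STIM}): 60,
    frozenset({UNSTIM, MYLA}): 37,
    frozenset({STIM, MYLA}): 43,
    frozenset({UNSTIM, STIM, MYLA}): 295,
}


def condition_total(condition):
    """Sum every region of the Venn diagram that includes the condition."""
    return sum(n for members, n in VENN_REGIONS.items() if condition in members)


def grand_total():
    """Total number of distinct genes across all regions."""
    return sum(VENN_REGIONS.values())


if __name__ == "__main__":
    for condition in (UNSTIM, STIM, MYLA):
        print(f"{condition}: {condition_total(condition)} genes")
    print(f"All conditions: {grand_total()} genes")
```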

Within the bait-promoter interactions, an analysis was conducted to determine how many promoter fragments on average a psoriasis bait fragment interacted with. This showed that each psoriasis-associated fragment interacted with more than one promoter fragment on average (Figure 50). In HaCaT cells, the median number of interactions with promoter fragments was 2 (IQR = 2.5 and 3 for unstimulated and stimulated cells, respectively). In My-La cells, the median number of interactions with promoter fragments was 3 (IQR = 5).
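The counting step behind these summary statistics can be sketched in Python as follows. The fragment identifiers and counts are invented, and the quartile convention (`method="inclusive"`) is an assumption, since the text does not state how the IQR was calculated; different conventions can give slightly different values.

```python
import statistics
from collections import defaultdict


def promoters_per_bait(pairs):
    """Count distinct promoter fragments contacted by each bait fragment.

    `pairs` is an iterable of (bait_id, promoter_fragment_id) tuples;
    duplicate pairs are counted once.
    """
    seen = defaultdict(set)
    for bait, promoter in pairs:
        seen[bait].add(promoter)
    return {bait: len(promoters) for bait, promoters in seen.items()}


def median_and_iqr(counts):
    """Return (median, interquartile range) for a list of counts."""
    q1, _, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    return statistics.median(counts), q3 - q1


if __name__ == "__main__":
    # Invented example: two baits contacting three promoter fragments.
    toy_pairs = [("bait_1", "prom_A"), ("bait_1", "prom_B"), ("bait_2", "prom_C")]
    print(promoters_per_bait(toy_pairs))
    print(median_and_iqr([1, 1, 2, 2, 2, 3, 4, 5, 7]))
```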

[Figure 50: three frequency histograms (HaCaT unstimulated, HaCaT stimulated and My-La); x-axis, promoter fragment interactions per bait fragment; y-axis, frequency.]

Figure 50: Frequency distributions of the number of interactions with promoter fragments per psoriasis-associated bait fragment
To determine the frequency distribution of psoriasis bait-promoter interactions, the data was firstly restricted to interactions between psoriasis-associated bait fragments and promoter fragments. Next, the number of promoter fragments per bait fragment was counted. For HaCaT cells, the frequency distribution is displayed in bins of 1. For My-La cells, the frequency distribution is displayed in bins of 5.

A direct comparison was carried out between the interactions occurring in unstimulated and stimulated HaCaT cells. Of all the interactions in each condition, 37.7% were found to be unique to unstimulated HaCaT cells, whilst 39.7% were unique to stimulated HaCaT cells. Among the promoter fragment interactions, 32.2% were unique to unstimulated HaCaT cells, involving 289 genes/non-coding RNAs, whilst 38.4% were unique to stimulated HaCaT cells, involving 320 genes/non-coding RNAs.
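A comparison of this kind can be expressed as a set difference, as in the Python sketch below. It assumes that an interaction is identified by its (bait fragment, other-end fragment) pair, which the text does not specify, and the example pairs are invented. The second function uses the reported totals from Table 22 and the percentages above to show that both conditions imply the same number of shared interactions.

```python
def unique_percentage(own, other):
    """Percentage of interactions in `own` that are absent from `other`.

    Each interaction is assumed to be a hashable (bait, other_end) pair.
    """
    own, other = set(own), set(other)
    return 100.0 * len(own - other) / len(own)


def implied_shared_count(total, unique_percent):
    """Number of shared interactions implied by a total and a unique percentage."""
    return round(total * (1.0 - unique_percent / 100.0))


if __name__ == "__main__":
    # Invented example pairs.
    unstimulated = {("b1", "f1"), ("b1", "f2"), ("b2", "f3"), ("b3", "f4")}
    stimulated = {("b1", "f1"), ("b2", "f3"), ("b2", "f9")}
    print(unique_percentage(unstimulated, stimulated))

    # Reported values: 5335 interactions (37.7% unique) in unstimulated cells
    # and 5512 interactions (39.7% unique) in stimulated cells.
    print(implied_shared_count(5335, 37.7), implied_shared_count(5512, 39.7))
```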

Among the bait-promoter interactions, there were a total of 12 genes interacting with psoriasis baits that were differentially expressed according to the expression array data with a fold change of less than 0.5 or greater than 2: ADRA1B, CAST, ERAP2, GSDMB, ICAM1, IFIH1, IL23A, KLF4, RAB27B, SOX4, STAT2 and STAT3. Of these genes, two were implicated in only unstimulated or stimulated cells: CAST (ERAP1 locus) and RAB27B (POLI locus). In unstimulated cells there were three interactions between psoriasis bait fragments and CAST promoter fragments and no interactions in stimulated cells, coinciding with a reduction in expression upon stimulation (fold change 0.457). In stimulated cells there was a single interaction between a psoriasis bait fragment and the RAB27B promoter fragment and no interactions in unstimulated cells, coinciding with an increase in expression upon stimulation (fold change 2.32).
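The fold-change criterion used here (less than 0.5 or greater than 2) can be written as a simple filter, sketched below in Python. The fold changes are the values quoted in this section for CAST, RAB27B, PTGER4 and SOX4; the function names are illustrative only. The criterion is equivalent to an absolute log2 fold change greater than 1.

```python
import math

# Fold changes upon IFN-γ stimulation as quoted in this section.
REPORTED_FOLD_CHANGES = {
    "CAST": 0.457,
    "RAB27B": 2.32,
    "PTGER4": 1.077,
    "SOX4": 0.475,
}


def passes_threshold(fold_change, lower=0.5, upper=2.0):
    """True if the fold change is below `lower` or above `upper`."""
    return fold_change < lower or fold_change > upper


def differentially_expressed(fold_changes):
    """Return the genes whose fold change passes the threshold."""
    return [gene for gene, fc in fold_changes.items() if passes_threshold(fc)]


if __name__ == "__main__":
    for gene, fc in REPORTED_FOLD_CHANGES.items():
        print(f"{gene}: fold change {fc}, log2 {math.log2(fc):+.2f}, "
              f"passes = {passes_threshold(fc)}")
```

With these values, CAST, RAB27B and SOX4 pass the filter, whereas PTGER4 (fold change 1.077) does not.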


Interactions within selected loci

CHi-C interactions in each locus were examined on the WashU Epigenome Browser within a 1-2 Mb window.

CHi-C in the 9q31 (KLF4) risk locus

The first candidate locus to be examined was at 9q31, which had first been investigated using 3C. CHi-C revealed that the intergenic psoriasis-associated bait fragments, tagged by rs10979182, interacted with fragments on either side in the gene desert. To the telomeric side, none of the interactions intersected promoter fragments: the furthest stretching interaction to the telomeric side fell short of the nearest gene, ACTL7B, by approximately 33.4 kb in HaCaT cells (Figure 51).

To the centromeric side, interactions occurred between the psoriasis association and the promoter of KLF4 in both unstimulated and stimulated HaCaT cells, over a distance of approximately 560 kb. In unstimulated HaCaT cells, one bait fragment (chr9:110810592-110816598) interacted with KLF4; this fragment encompasses the middle putative enhancer that contains a likely causal SNP, rs6477612. This fragment also interacted with KLF4 in stimulated HaCaT cells. In addition, another bait fragment (chr9:110798319-110798738) interacted with KLF4 in stimulated HaCaT cells; this fragment does not contain any of the SNPs prioritised in the locus-specific work. This additional interaction in stimulated cells coincided with a four-fold increase in KLF4 expression (Illumina expression array). In contrast, interactions were not seen between the bait fragments and the KLF4 promoter in My-La cells (Figure 51).

Several of the interactions in 9q31 implicated non-coding genes or pseudogenes. Within approximately 30 kb of the psoriasis association, interactions occurred with fragments containing the promoters of a non-coding RNA (RP11-240E2.2) and a pseudogene (CHCHD4P). To the centromeric side of the gene desert, interactions occurred with the non-coding genes RP11-417L14.1 (unstimulated HaCaTs), AL359552 (stimulated HaCaTs) and RNU6-492P (unstimulated HaCaTs and My-La cells) (Figure 51).

Figure 51: CHi-C interactions from psoriasis-associated bait fragments in the 9q31 (KLF4) risk locus
Significant interactions are displayed as arcs, while promoter interactions are displayed as columns where the height of the column is proportional to the number of interacting fragments. A, position on chromosome 9; B, GENCODE V19 genes representing coding (blue), non-coding (green), pseudogene (pink) and problem (red); C, bait fragments for i. PsA and ii. psoriasis/PsA; D, psoriasis-associated LD block; E, ChromHMM for NHEK cells (Roadmap); F, promoter interactions in unstimulated HaCaTs for the following: i. RP11-417L14.1, ii. RNU6-492P, iii. KLF4, iv. RP11-240E2.2 and v. CHCHD4P; G, all interactions in unstimulated HaCaTs; H, promoter interactions in stimulated HaCaTs for the following: i. AL359552, ii. KLF4, iii. RP11-240E2.2 and iv. CHCHD4P; I, all interactions in stimulated HaCaTs; J, ChromHMM for CD8+ T cells (Roadmap); K, promoter interactions in My-La cells for the following: i. RNU6-492P, ii. RP11-240E2.2 and iii. CHCHD4P; L, all interactions in My-La cells.

CHi-C in the 6q23 (TNFAIP3) risk locus

The second candidate locus examined was 6q23, which was initially investigated using 3C. Here, the baited fragments included the RA locus upstream of TNFAIP3 and the psoriasis-associated locus at TNFAIP3 tagged by rs582757 (Figure 52). In HaCaT cells, the psoriasis-associated bait fragments did not directly interact with the immune-related genes on the left hand side (IL20RA, IL22RA2 and IFNGR1). Rather, the psoriasis bait fragments interacted with the RA-associated locus upstream of TNFAIP3, which in turn interacted with the immune gene cluster. However, none of the interactions with the RA-associated bait fragments corresponded with the immune-related gene promoters in HaCaT cells. In My-La cells, interactions occurred between the psoriasis-associated fragments and the immune-related gene cluster, although none of the interactions intersected fragments containing promoters for those genes. As in HaCaT cells, the RA locus also interacted with the immune gene cluster, but in My-La cells there were direct interactions between RA-associated bait fragments and gene promoter fragments for IL20RA and IFNGR1.

In all three conditions, a psoriasis-associated bait fragment downstream of TNFAIP3 (chr6:138223162-138233161) interacted with the promoter of TNFAIP3 over a distance of 30 kb. This bait fragment is adjacent (centromeric) to the one previously targeted in 3C: “Ps SNPs 2”. In My-La cells, an additional interaction with the TNFAIP3 promoter occurred with the bait fragment adjacent to “Ps SNPs 2” on the telomeric side (chr6:138241190-138244748). A number of non-coding RNAs were also implicated by bait-promoter fragment interactions in 6q23: RP11-95M15.2, AL357060.1, RP11-10J5.1, RP11-240M16.1 (HaCaT unstimulated/stimulated and My-La), RP11-55K22.2, RP11-356I2.1 and Y_RNA.571 (My-La) (Figure 52).

Figure 52: CHi-C interactions from all bait fragments in the 6q23 (TNFAIP3) risk locus
Significant interactions (from all autoimmune baits) are displayed as arcs, while promoter interactions (only from psoriasis-associated baits) are displayed as columns where the height of the column is proportional to the number of interacting fragments. A, position on chromosome 6; B, GENCODE V19 genes representing coding (blue), non-coding (green) and pseudogene (pink); C, bait fragments for i. RA and ii. psoriasis/PsA/RA; D, psoriasis-associated LD block; E, ChromHMM for NHEK cells (Roadmap); F, promoter interactions in unstimulated HaCaTs for the following: i. RP11-95M15.2, ii. AL357060.1, iii. TNFAIP3 and iv. RP11-10J5.1 and RP11-240M16.1; G, all interactions in unstimulated HaCaTs; H, promoter interactions in stimulated HaCaTs for the following: i. RP11-95M15.2, ii. AL357060.1, iii. TNFAIP3 and iv. RP11-10J5.1 and RP11-240M16.1; I, all interactions in stimulated HaCaTs; J, ChromHMM for CD8+ T cells (Roadmap); K, promoter interactions in My-La cells for the following: i. RP11-55K22.2, ii. RP11-95M15.2, iii. AL357060.1, iv. RP11-356I2.1, v. Y_RNA.571, vi. TNFAIP3 and vii. RP11-10J5.1 and RP11-240M16.1; L, all interactions in My-La cells.

CHi-C in the 5p13.1 (PTGER4) risk locus

The 5p13.1 risk locus represents a case where an intergenic psoriasis association interacts with a novel candidate gene. In this locus the psoriasis association, tagged by rs114934997, is present in a 1.25 Mb gene desert with the nearest gene candidates including prostaglandin E receptor 4 (PTGER4) and caspase recruitment domain family member 6 (CARD6). CHi-C revealed that several fragments within the psoriasis association interacted with the promoter of PTGER4 over a distance of approximately 300 kb in all conditions (Figure 53). In both unstimulated and stimulated HaCaTs, seven bait fragments interacted with the PTGER4 promoter and there was no change in gene expression according to the Illumina expression array (fold change 1.077). In My-La cells, eight bait fragments interacted with the PTGER4 promoter. Additional bait-promoter interactions in My-La cells included a non-coding RNA SNORA63.3 as well as coding genes TTC33 and RPL37 (Figure 53).

Figure 53: CHi-C interactions from psoriasis-associated bait fragments in the 5p13.1 risk locus
Significant interactions are displayed as arcs, while promoter interactions are displayed as columns where the height of the column is proportional to the number of interacting fragments. A, position on chromosome 5; B, GENCODE V19 genes representing coding (blue), non-coding (green), pseudogene (pink) and problem (red); C, bait fragments for i. psoriasis and ii. PsA; D, psoriasis-associated LD block; E, ChromHMM for NHEK cells (Roadmap); F, promoter interactions in unstimulated HaCaTs for the following: i. PTGER4; G, all interactions in unstimulated HaCaTs; H, promoter interactions in stimulated HaCaTs for the following: i. PTGER4; I, all interactions in stimulated HaCaTs; J, ChromHMM for CD8+ T cells (Roadmap); K, promoter interactions in My-La cells for the following: i. SNORA63.3, ii. PTGER4, iii. TTC33 and iv. RPL37; L, all interactions in My-La cells.

CHi-C in the 6p22.3 (CDKAL1) locus

The 6p22.3 risk locus represents a case where an intronic association interacts with another candidate gene. In this locus the psoriasis association, tagged by rs4712528, is located within introns 4 and 5 of the cyclin dependent kinase 5 regulatory subunit associated protein 1 like 1 (CDKAL1) gene. Here, CHi-C revealed interactions between psoriasis-associated bait fragments and the promoter of SRY-box 4 (SOX4), in all conditions (Figure 54). The promoter of SOX4 was defined by two fragments (chr6:21571280-21593555 and chr6:21593556-21594408). In unstimulated HaCaT cells, there were a total of 3 interactions between psoriasis bait fragments and the SOX4 promoter fragments. In stimulated HaCaT cells, there were a total of 7 interactions between psoriasis bait fragments and the SOX4 promoter fragments. This coincided with a decrease in SOX4 expression according to the Illumina expression array (fold change 0.475). In My-La cells, there were a total of 21 interactions between psoriasis bait fragments and the SOX4 promoter fragments.

Further bait-promoter interactions in 6p22.3 implicated the CDKAL1 gene itself (My-La), as well as non-coding genes RP3-348I23.3 (My-La) and RP11-204E9.1 (stimulated HaCaT cells).

Figure 54: CHi-C interactions from psoriasis-associated bait fragments in the 6p22.3 risk locus
Significant interactions are displayed as arcs, while promoter interactions are displayed as columns where the height of the column is proportional to the number of interacting fragments. A, position on chromosome 6; B, GENCODE V19 genes representing coding (blue), non-coding (green) and pseudogene (pink); C, bait fragments for psoriasis; D, psoriasis-associated LD block; E, ChromHMM for NHEK cells (Roadmap); F, promoter interactions in unstimulated HaCaTs for the following: i. SOX4; G, all interactions in unstimulated HaCaTs; H, promoter interactions in stimulated HaCaTs for the following: i. RP11-204E9.1 and ii. SOX4; I, all interactions in stimulated HaCaTs; J, ChromHMM for CD8+ T cells (Roadmap); K, promoter interactions in My-La cells for the following: i. CDKAL1, ii. RP3-348I23.3, iii. CDKAL1 (non-coding) and iv. SOX4; L, all interactions in My-La cells.

CHi-C in the 18q21 (POLI) locus

The 18q21 locus represents a case where an association within a gene cluster interacts with a candidate gene outside of the cluster in a stimulation-specific manner. In this locus the psoriasis association, tagged by rs545979, is situated in non-coding regions in and around the polymerase (DNA directed) iota (POLI) gene. Here, CHi-C revealed interactions between psoriasis-associated bait fragments and POLI promoter fragments in all conditions (Figure 55). In addition, there were interactions between the psoriasis-associated bait fragments and the transcription factor 4 (TCF4) gene over a distance of approximately 1.25 Mb. In My-La cells, this corresponded with various promoter fragments of TCF4.

In stimulated HaCaT cells there was an interaction between a psoriasis bait (chr18:51779298-51784661) and a fragment containing the promoter of the RAB27B, member RAS oncogene family (RAB27B) gene, over a distance of approximately 716 kb. This interaction was not seen in unstimulated HaCaT cells and coincided with a 2.3-fold increase in RAB27B expression upon stimulation (Illumina array).

In My-La cells, interactions occurred between the psoriasis baits and fragments containing the promoters of the nearby coding genes MBD2, C18orf54 and STARD6, as well as the non-coding gene SNORA37 (Figure 55).

Figure 55: CHi-C interactions from psoriasis-associated bait fragments in the 18q21 risk locus
Significant interactions are displayed as arcs, while promoter interactions are displayed as columns where the height of the column is proportional to the number of interacting fragments. A, position on chromosome 18; B, RefSeq genes representing coding (blue); C, bait fragments for i. psoriasis/PsA and ii. suggestive association for systemic JIA (unpublished data); D, psoriasis-associated LD block; E, ChromHMM for NHEK cells (Roadmap); F, promoter interactions in unstimulated HaCaTs for the following: i. POLI; G, all interactions in unstimulated HaCaTs; H, promoter interactions in stimulated HaCaTs for the following: i. POLI and ii. RAB27B; I, all interactions in stimulated HaCaTs; J, ChromHMM for CD8+ T cells (Roadmap); K, promoter interactions in My-La cells for the following: i. MBD2 and SNORA37, ii. POLI, iii. C18orf54 and STARD6, and iv. TCF4 (multiple transcripts); L, all interactions in My-La cells.

2.4.3 Summary of functional work

In summary, the functional work has revealed a number of clues towards the mechanisms underlying psoriasis genetics, in a keratinocyte cell line (HaCaT) and a CD8+ T cell line (My-La). Some key findings in this section were:

• In 9q31, psoriasis-associated variants overlapped marks of active enhancers in HaCaT cells, but not in My-La cells. Fragments containing psoriasis-associated variants formed long-range interactions (~500 kb) with regions surrounding KLF4 in both HaCaT and My-La cells. These interactions were detected using both 3C and CHi-C.
• In 6q23, there were complex long-range interactions that involved the psoriasis-associated variants at TNFAIP3 and RA-associated variants upstream of TNFAIP3. In accordance with previous findings of others, 3C and CHi-C detected interactions between disease-associated variants and regions near immune-related genes IL20RA, IL22RA2, IFNGR1 and TNFAIP3. CHi-C also found interactions between the psoriasis variants and the RA variants in both cell types, suggesting a shared disease mechanism.
• In total, CHi-C revealed interactions between the psoriasis-associated fragments and the promoters of more than 600 genes.
• Genes interacting with psoriasis-associated fragments in CHi-C were enriched for several processes including keratinization, immune signalling (HaCaT and My-La) and IL-2 signalling (My-La).
• CHi-C data implicated several compelling gene targets mediated by interactions between psoriasis-associated fragments and promoter fragments in further susceptibility loci, such as:
  o PTGER4 (My-La cells and HaCaT cells) in 5p13.1
  o CDKAL1 (My-La cells) and SOX4 (My-La cells and HaCaT cells) in 6p22.3
  o POLI (My-La cells and HaCaT cells), TCF4, MBD2, C18orf54, STARD6 (My-La cells) and RAB27B (IFN-stimulated HaCaT cells) in 18q21

2.5 Discussion of functional work

This section of the thesis launched an investigation into the function of established early-onset psoriasis risk loci. Traditional genetic studies can only go so far towards identifying the likely causal variants and genes that play a role in psoriasis susceptibility. In recent years, the swiftly developing field of functional genomics has enabled huge insights into the mechanisms underlying non-coding GWAS risk loci for common autoimmune disorders. For instance, it has been found that non-coding variants can act to influence gene regulation through physical looping with target genes and through alteration of the binding of regulatory elements. However, this sort of work has been lacking in psoriasis and this chapter of the thesis is a crucial step towards understanding more about the molecular function of psoriasis risk loci. To the author’s knowledge, this thesis presents the first use of chromatin conformation techniques across psoriasis risk loci. The overall findings therefore provide an initial indication of novel gene targets for psoriasis.

2.5.1 Discussion of functional characterisation of individual risk loci

Recent large-scale studies have utilised chromatin interaction techniques to facilitate the linking of non-coding disease-associated variants with their target genes in relevant cell types (Dryden et al., 2014; Martin et al., 2015; Mumbach et al., 2017). These studies have shown that non-coding variants may not interact with the nearest gene; for instance, Mumbach et al. (2017) found that only 14% of gene targets for intergenic disease SNPs were the closest gene to the SNP. In addition, interactions can be highly cell-type specific and warrant further, detailed efforts to characterise disease mechanisms (Javierre et al., 2016; Mumbach et al., 2017). In this section of the thesis, the approach towards functional characterisation of individual psoriasis risk loci involved a combination of in silico and in vitro techniques, in line with other hypothesis-driven studies in the literature that use similar approaches (Dunning et al., 2016; Hoskins et al., 2016; McGovern et al., 2016; Visser et al., 2015; Wang et al., 2013). For example, Dunning et al. (2016) looked at a single breast cancer risk locus at 6q25 (ESR1). Among other techniques, they used ChIP to show that risk variants bound the transcription factors GATA3 and CTCF and used 3C to show that a risk allele increased the strength of interactions between a disease-associated enhancer and the promoters of ESR1 and RMND1. Overall, their study showed that various independent variants in the locus affected ESR1 expression (Dunning et al., 2016).


An important pre-requisite to functional studies is that the loci in question represent true disease associations. Therefore, the 9q31 and 6q23 loci are ideal because they have both been reported with high confidence in a large-scale Immunochip meta-analysis (Tsoi et al., 2012). A limitation of functional studies is that they rely on the reported lead SNPs representing the peak of disease association. During the final stages of the functional work described in this thesis, the 9q31 and 6q23 loci were replicated in the largest psoriasis meta-analysis to date at genome-wide significance (P = 9.15 x 10^-10 and P = 1.04 x 10^-25, respectively) (Tsoi et al., 2017). Importantly, the variants had the same direction of effect in each of the datasets incorporated in the analysis. In the 6q23 locus the lead SNP remained the same as in the 2012 analysis (rs582757), whilst the lead SNP in the 9q31 locus changed to rs11531804, which is in tight LD (r^2 = 0.93) with the lead SNP reported in 2012 (rs10979182) (Tsoi et al., 2017). Therefore, despite the shift in the lead SNP in 9q31, the variant set and the encompassed regulatory features described in the present analysis remain the same.

2.5.1.1 Bioinformatics, ChIP and 3C revealed that KLF4 is a likely target gene in the 9q31 risk locus

In 9q31, bioinformatic analysis revealed a large number of variants within the gene desert that correlated with the lead variant (rs10979182). Many of these variants had evidence of regulatory function according to the available epigenetic datasets, which made prioritising likely causal variants challenging. Whilst rs10979182 had the highest PICS score, it was only predicted to have a 5.3% chance of being the causal SNP (Farh et al., 2015). The next step towards linking the associated SNPs with a gene was to search for eQTLs among the variants; however, none were found in the datasets queried. Over recent years, an explosion in eQTL resources has shown clearly that eQTLs are cell-type, state, tissue and stimulus-specific (Fairfax et al., 2014; GTEx Consortium, 2017; Lappalainen et al., 2013; Westra et al., 2013); therefore, it could be that eQTLs exist among the variant set in 9q31 in as-yet untested conditions. In order to inform the genetic association with disease, lead GWAS variants should co-localise with lead eQTLs; this has been recently tested using Bayesian co-localisation approaches in relevant cell types (Giambartolomei et al., 2014; Guo et al., 2015). Co-localisation has the capability to fine-map likely causal SNPs, where the eQTL and GWAS data identify the same subset of SNPs. Variants can also have allele-specific effects on the epigenome, such as DNA accessibility and histone modification (McVicker et al., 2013); in the future these types of QTL will be incorporated in a new enhanced version of the GTEx Project (eGTEx), which should prove an invaluable resource for the annotation of GWAS variants (eGTEx Project, 2017). Again, this type of QTL co-localisation will have the capability of better informing a likely subset of causative SNPs and their mechanism of effect.

To design functional experiments, it was decided to focus on psoriasis-associated SNPs overlapping enhancer elements. GWAS variants are enriched in enhancer elements (Kundaje et al., 2015), with approximately 60% of predicted causal SNPs lying within immune enhancers in cell types relevant to the disease of interest, and approximately 8% in promoters (Ernst et al., 2011; Farh et al., 2015). Enhancers can be identified by the presence of regulatory histone marks (H3K4me1 and H3K27ac), DNase hypersensitivity (DHS) and transcription factor binding sites (Calo and Wysocka, 2013; Dunham et al., 2012). Recent hypothesis-driven experiments in disease loci have been successful in identifying variants affecting enhancer function; for example, Roberts et al. (2016) showed that a SNP associated with ankylosing spondylitis in a putative enhancer near IL23R affected H3K4me1 methylation in CD4+ T cells and changed the proportion of Th1 CD4+ T cells, providing a potential disease mechanism.

In 9q31, the psoriasis-associated LD block overlapped roughly three active enhancer elements, according to bioinformatic evidence. Within each of these regions, a SNP was selected that scored highly in RegulomeDB and CADD analyses. ChIP-qPCR results showed that the regions containing the three SNPs were enriched for H3K4me1 and H3K27ac histone marks in HaCaT keratinocytes but not My-La CD8+ T cells. These findings are supported by cell line ChIP-seq data (ENCODE) and primary cell data from the Epigenomics Roadmap consortium (ChromHMM), which predict the presence of active enhancers in these regions in NHEK but not in primary CD8+ T cells (memory or naïve) (Ernst and Kellis, 2012; Ernst et al., 2011).

Since the regulatory marks in 9q31 are enriched in keratinocyte cells, it is likely that the target gene candidate is expressed in the skin. Of all the genes tested, KLF4 had tissue specificity for skin according to GTEx. This data is supported by an immunohistochemical study showing that KLF4 is expressed in the basal and suprabasal layers of the skin (Kim et al., 2014). This study also showed that KLF4 is upregulated in psoriatic plaques in comparison with non-involved skin or control skin. KLF4 is an important transcription factor that has been shown to upregulate IL-17 production by binding to the promoter of the IL17A gene in vitro (An et al., 2011). KLF4 also has roles in macrophage and keratinocyte differentiation and skin barrier formation in mice (Segre et al., 1999).

Here, the 3C-qPCR assays in 9q31 demonstrated that a number of the HindIII fragments containing psoriasis-associated variants interact with fragments surrounding KLF4 in both HaCaT and My-La cells. The peak of the interaction in the first 3C-qPCR assay was between the fragment containing the second putative enhancer, which encompassed rs6477612 as well as a high-scoring 30 bp indel rs55975335, and a fragment 8.7 kb downstream of KLF4 (Centromeric 1). This interaction stretched over 500 kb and was found to be significantly stronger than with the fragment containing KLF4 itself in HaCaT cells and was significantly stronger than with the fragment containing the promoter of CTNNAL1 on the other side of the gene desert in HaCaT cells and My-La cells. The second 3C-qPCR assay aimed to test this interaction by using the Centromeric 1 fragment as the anchor, and testing interactions across the gene desert. This assay revealed that the Centromeric 1 fragment has stronger interactions with fragments further into the gene desert than with the psoriasis-associated putative enhancers. This finding is supported by Hi-C data in the literature that shows that the gene desert is in a single topologically associating domain (TAD) in NHEK and regions near KLF4 interact strongly with intergenic fragments across the locus (Rao et al., 2014). In addition, a previous CHi-C study looking at breast cancer regions showed specific interactions between KLF4 and an intergenic fragment near a breast cancer association, further into the gene desert, in breast cancer cell lines (Dryden et al., 2014). In the present study, the peak of interaction was seen between KLF4 and a fragment ~2 kb from the observed interaction in Dryden et al. (the “positive 2” fragment). The KLF4 gene is also a good candidate gene for breast cancer because it is expressed in breast cancer cell lines (Dryden et al., 2014) and can act as an oncogene by causing a stem cell-like state (Yu et al., 2011).

The third 3C-qPCR assay utilised a TaqMan probe at the KLF4 promoter to identify if a peak of interaction could be observed across the psoriasis locus. The data from this assay suggested that KLF4 interacts with the whole of the psoriasis LD block and not specifically with any one fragment. Several of the interactions were significant in comparison with an intervening intergenic interaction, notably fragments 3 and 9 with KLF4 in both HaCaT and My-La cells, where fragment 9 contained the third putative enhancer (rs4978343). This fragment also contained the index SNP, rs10979182. Again, in this assay KLF4 strongly interacted with a non-disease-associated fragment further into the gene desert, suggesting a general overall structure whereby KLF4 may be regulated by multiple intergenic enhancers. Indeed, there is evidence to suggest that some genes are regulated by distributed enhancers or so-called “shadow enhancers” (Hong et al., 2008), which allows for some redundancy in the enhancer set (Barolo, 2012). Research has shown that a combination of genomic variants, as opposed to a single variant, likely affects target gene expression from within multiple enhancers (Corradin et al., 2014), and multiple enhancers can be ranked based on their ability to regulate gene function (Fulco et al., 2016).

An interaction between the psoriasis-associated fragments and KLF4 is not seen in promoter capture Hi-C data in the literature. However, interactions are seen between the KLF4 promoter and a fragment approximately 15 kb from the most centromeric SNP (rs562409617) in macrophages and monocytes (Javierre et al., 2016). In addition, the stronger interactions with fragments further into the gene desert, identified in this thesis, are also observed in several cell types including macrophages and B cells (Javierre et al., 2016). These interaction patterns were also observed in data from our group using a promoter capture Hi-C approach in GM12878 (lymphoblastoid cells) and Jurkat (CD4+ T cells) cell lines (Martin et al., 2015). Despite the lack of direct interactions observed between the psoriasis variants and KLF4 in the literature, it should be noted that CHi-C data is not yet available in keratinocyte cell types.

To further investigate potential disease mechanisms in 9q31, HaCaT cells were stimulated with IFN-γ, a known inducer of KLF4 expression (Feinberg et al., 2005; Madonna et al., 2010). Stimulation of HaCaT cells with IFN-γ showed an 8-fold increase in KLF4 expression according to qPCR data, whilst IL-17A stimulation had no effect (data not shown). However, IFN-γ stimulation did not correlate with changes in histone mark binding or chromatin interactions involving the psoriasis association in HaCaT cells as assayed by ChIP and 3C. Therefore, there is no evidence for an IFN-γ-mediated regulation of KLF4 by the psoriasis-associated putative enhancers despite the effect of IFN-γ on KLF4 expression.
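As an illustration of how a fold change of this kind can be derived from qPCR data, the following is a minimal sketch of the widely used 2^-ΔΔCt calculation. It is an assumption that this calculation applies here, since this section does not state how the fold change was computed; the Ct values and the reference gene are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCt method (assumed, not taken from the thesis).

    Each target-gene Ct is first normalised to a reference gene (dCt),
    then the stimulated sample is compared with the unstimulated one (ddCt).
    """
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier after
# stimulation while the reference gene is unchanged, i.e. a 2^3 = 8-fold increase.
example_fold_change = fold_change_ddct(
    ct_target_treated=22.0, ct_ref_treated=18.0,
    ct_target_control=25.0, ct_ref_control=18.0,
)
print(example_fold_change)  # 8.0
```

The calculation assumes that both primer pairs amplify with approximately 100% efficiency; where this does not hold, an efficiency-corrected model would be more appropriate.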

Taken together, the results for 9q31 suggest a system whereby psoriasis-associated regulatory variants within the gene desert physically interact with KLF4; an interaction that is robust across at least two cell types. However, bioinformatics and ChIP data show that the psoriasis-associated variants intersect active enhancer marks in keratinocytes but not CD8+ T cells, suggesting a potential skin-specific model for the 9q31 locus. This supplements previous findings by Swindell et al. (2014), who showed that KLF4 expression is most detectable in keratinocytes compared with nine other psoriasis-relevant cell types, and prompts further functional analysis in keratinocytes to determine if the psoriasis-associated variants affect KLF4 expression. Encouraging data in the literature shows how work in the relevant cell type can reveal functional SNP-gene relationships; for example, Musunuru et al. (2010) showed that a non-coding SNP associated with LDL cholesterol and myocardial infarction affected transcription factor binding and expression of SORT1 in hepatocytes, which led to alterations in lipoprotein levels in mouse liver. Hence in this way a non-coding SNP can be taken through to disease phenotype.

2.5.1.2 Bioinformatics and 3C confirmed that a complex interaction landscape exists in the 6q23 (TNFAIP3) risk locus

In 6q23, bioinformatic analysis revealed only a small number of variants in tight LD with rs582757; these were distributed in and around TNFAIP3. As in 9q31, none of the variants were eQTLs in any of the queried datasets. Previous CHi-C data has implicated multiple immune-related gene targets of the RA association in 6q23 (Martin et al., 2016; Martin et al., 2015; McGovern et al., 2016). Therefore, 3C assays were carried out in HaCaT cells to identify interactions between immune-related genes and the psoriasis association, which have so far not been investigated in a skin cell type. Firstly, 3C-qPCR assays confirmed that the RA association interacts with fragments near IL22RA2 and IFNGR1 in HaCaT cells, as expected. Next, assays showed that a psoriasis-associated fragment downstream of TNFAIP3 (Ps SNPs 2) interacts with both gene regions as well, whereas the intronic association (Ps SNPs 1) does not (Figure 44). IL22RA2 was found to be expressed in the skin (GTEx) and it encodes a protein that binds to IL-22 and prevents it from binding to the IL-22 receptor. IFNGR1 is expressed in whole blood (GTEx) and encodes a protein that forms part of the receptor for IFN-γ. Therefore, both of these genes have a clear function in innate immunity. The downstream fragment “Ps SNPs 2” contains two SNPs correlating with rs582757: rs4895498 and rs6933987. These SNPs overlap marks of enhancers and DNase hypersensitive regions in blood, and alter protein-binding motifs (Haploreg). Therefore, it is possible that these variants lie within an enhancer that regulates IL22RA2 and IFNGR1 function in 6q23.

Despite the evidence for long-range interactions in 6q23, TNFAIP3 itself cannot be ruled out as a candidate gene. The protein encoded by TNFAIP3, A20, is known to be involved in IL-17 signalling, making it a strong candidate for psoriasis (Garg and Gaffen, 2013). Here, two of the psoriasis-associated SNPs were found to be close to the TNFAIP3 promoter. In addition, the downstream fragment containing potentially regulatory SNPs (Ps SNPs 2) was found to interact with the TNFAIP3 promoter through 3C-qPCR. Similarly in the literature, a study found that a fragment downstream of TNFAIP3 containing a pair of variants associated with SLE (denoted TT>A) interacted with the promoter of TNFAIP3 in various cell types (Wang et al., 2013). The reported fragment in Wang et al. is adjacent to the Ps SNPs 2 fragment, although the psoriasis-associated variants are not in strong LD with the SLE-associated variants. In Wang et al., TT>A was shown to be in a functional enhancer binding NF-κB and SATB1 that regulated TNFAIP3 expression in lymphoblastoid cell lines (Wang et al., 2013). Recently, the same group used genome editing to show that disrupting the TT>A variants reduced their interaction with the TNFAIP3 promoter and decreased gene expression (Wang et al., 2016). In the light of the genetic overlap between psoriasis and SLE (Ramos et al., 2011), it would be particularly interesting to see how the genetic mechanisms compare between the two diseases in 6q23.

Overall, the 3C-qPCR assays in 6q23 confirmed the presence of strong interactions between regions near immune-related genes and both the RA association and the psoriasis association. Studies of the 3D genome have indicated several loci where genes and regulatory elements come together in a structure known as a transcription factory (Schoenfelder et al., 2010a). For example, active globin genes in the mouse genome preferentially form transcription factories with other actively expressed genes (Schoenfelder et al., 2010b). The genes in these transcription factories are thought to make shared use of regulatory factors such as transcription factors within the cluster. In 6q23, the 3C findings described here support the hypothesis that the autoimmune-related enhancer elements of different diseases interact with each other and regulate several different genes in a transcription factory, which was first proposed for 6q23 by Martin et al. (2016) and McGovern et al. (2016). Determining the regulatory factors involved in this structure, as well as the cell type specificity, might elucidate how the genes are differentially regulated in the related autoimmune conditions.


2.5.1.3 Strengths and limitations of methods used in the locus-specific study

The use of bioinformatics in the initial stages of the locus-specific study allowed for the development of detailed hypotheses about the causal variants and target genes in the 9q31 and 6q23 risk loci. A strength of the bioinformatic pipeline is that it makes use of data from large international consortia. A drawback is that it is dependent upon the resources being constantly updated alongside the swiftly moving field of genetics. For example, not all of the SNPs could be awarded a PICS score since the tool utilises data from 1KG Phase 1 that contains fewer variants than the more recent Phase 3 (Farh et al., 2015). Therefore the user needs to be aware of advancements made in the literature.

Another drawback of the bioinformatic pipeline is that the information was collated manually, and the selection of likely causal variants was fairly subjective due to the absence of any weighting technique. A limitation of this approach is that causal variants could be missed. For the 9q31 ChIP experiment, high-scoring SNPs in RegulomeDB were selected for follow-up; however, one of the likely functional variants in 9q31 was a 30-bp deletion, rs55975335, situated within the second putative enhancer region. Whilst this variant was not examined in the ChIP experiment, it falls within the restriction fragment containing the second putative enhancer region that was shown to interact with fragments nearby KLF4 in the 3C experiments. It is therefore possible that a functional relationship exists between this indel and KLF4; future work could explore this hypothesis (see Future Work, Section 2.5.3.1). In the future, causal variants could be better predicted using Bayesian approaches that generate credible sets of SNPs by integrating functional annotation with GWAS summary data. Currently available tools include PAINTOR (Kichaev et al., 2014), CAVIARBF (Chen et al., 2015) and FINEMAP (Benner et al., 2016). The credible SNP set could then be taken forward for experimental approaches.

This study made use of two different cell lines for the experimental approaches towards functional characterisation; this allowed for identification of cell-specific regulatory mechanisms. However, the use of cell lines can be considered a major limitation of the findings since a cell line may not represent a true model of a human cell (Seo et al., 2012). The decision to use cell lines was made because they can be grown quickly and inexpensively to large numbers with relative ease. This was necessary for methods such as 3C-qPCR, which requires 25-30 million cells per library in order to capture library complexity, and to retain enough material for analysis at the end of the process. Where possible, results in the HaCaT and My-La cell lines have been compared with data from primary keratinocyte and CD8+ T cells found in the literature. In addition, for ChIP the H3K4me1 assay showed that NHEK data mirrored that of the HaCaT enrichment at the regions tested, increasing confidence in the validity of the HaCaT model. However, the results should still be treated with caution. With no limiting financial factor the experiments described here could potentially be repeated in NHEK cells pooled from multiple donors.

In the locus-specific study, ChIP-qPCR proved a swift and inexpensive way to quantify histone mark binding (H3K4me1 and H3K27ac) at prioritised SNPs of interest. The nature of the method allows for simple comparison between different genomic locations and between different cell types. However, a limitation of ChIP is that optimisation is required for each cell type, antibody and primer pair. This was not particularly problematic for the H3K4me1 and H3K27ac antibodies, which have been used extensively in our lab. The titration experiment for input cell numbers also demonstrated that fewer cells could be used without much influence on the percentage ChIP enrichment, particularly for the H3K4me1 antibody. This might allow for using rarer cell populations in future experiments. Alternatively, newer ChIP-seq methods that do not use the formaldehyde crosslinking step (native ChIP-seq) can allow for smaller input from 10³ to 10⁶ cells per immunoprecipitation for histone modifications (Brind'Amour et al., 2015; Gilfillan et al., 2012). However, it will be more challenging to perform ChIP for antibodies targeting less abundant proteins such as transcription factors. So far, initial attempts have been made to examine binding of RNA polymerase II at the prioritised variants in 9q31, which would indicate their influence on transcription. However, the data is not described in this thesis because more optimisation is required to obtain reliable results.
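For readers unfamiliar with how a percentage ChIP enrichment is obtained from qPCR cycle thresholds, a minimal sketch of the commonly used percent-input calculation is given below. It is an assumption that this calculation matches the one used in this study; the Ct values and the 1% input fraction are hypothetical.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction):
    """Percent-input enrichment for ChIP-qPCR (assumed calculation).

    ct_ip          -- Ct of the immunoprecipitated sample
    ct_input       -- Ct of the input sample as measured
    input_fraction -- fraction of chromatin kept as input (e.g. 0.01 for 1%)
    """
    # Adjust the input Ct to what it would be if 100% of the chromatin
    # had been measured: a 1% input is log2(100) cycles "late".
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


# Hypothetical example: the IP and a 1% input reach threshold at the same
# cycle, so the IP recovered about 1% of the starting chromatin.
example_enrichment = percent_input(ct_ip=25.0, ct_input=25.0, input_fraction=0.01)
```

Because the result is expressed relative to the input, the same function can be applied across cell numbers, antibodies and primer pairs, which is what makes comparisons such as the titration experiment straightforward.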

A strength of 3C-qPCR is that it provides a high-resolution view of chromatin interactions that is only limited by the length of the restriction fragment. HindIII cuts approximately every 4,000 bp. However, a limitation is that 3C can only be used to examine interactions one-by-one, which is time-consuming and potentially biased towards the specific fragments targeted; many targeted fragments are required in order to identify the peak of an interaction (Hagege et al., 2007). This is why 3C is now often performed as a validation of more hypothesis-free chromatin conformation techniques like CHi-C (Dryden et al., 2014; McGovern et al., 2016). For the 6q23 region, previous CHi-C data covering the psoriasis association was available in different cell types; the dataset is described in Martin et al. (2015). The 3C findings in this study reflected those previous findings, indicating that different cell types share long-range interactions. For the 9q31 locus, however, no previous study had targeted the psoriasis association in any cell type. The requirement for a more hypothesis-free view of chromatin interactions in psoriasis loci led to the CHi-C study described in this thesis (discussed in Section 2.5.2).
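The approximate HindIII fragment length quoted above follows from the 6 bp recognition site (AAGCTT): in a sequence of uniformly random bases, a given 6-mer is expected once every 4^6 = 4,096 bp. The sketch below illustrates this arithmetic together with a simple in silico digest; the uniform base composition is a simplifying assumption, real genomes deviate from it, and the example sequence is hypothetical.

```python
HINDIII_SITE = "AAGCTT"  # HindIII cuts between the two leading adenines (A^AGCTT)


def expected_fragment_length(site_length, alphabet_size=4):
    """Mean spacing of a recognition site, assuming uniformly random bases."""
    return alphabet_size ** site_length


def hindiii_fragments(sequence):
    """Split a DNA sequence at every HindIII site (top strand only)."""
    seq = sequence.upper()
    cut_positions = []
    site_start = seq.find(HINDIII_SITE)
    while site_start != -1:
        cut_positions.append(site_start + 1)  # cut falls after the first A
        site_start = seq.find(HINDIII_SITE, site_start + 1)
    bounds = [0] + cut_positions + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


print(expected_fragment_length(len(HINDIII_SITE)))  # 4096
example_fragments = hindiii_fragments("GGAAGCTTCC")  # hypothetical sequence
```

A digest of this kind over a reference genome gives the fragment map against which 3C primers and CHi-C baits are designed, and makes clear why the resolution of both methods is bounded by local restriction site density.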

The majority of the qPCR assays for ChIP and 3C were conducted using SYBR© Green as the reporter dye. The positive aspect of using SYBR© Green in comparison with TaqMan© is that it does not require a fluorescent probe for each target region. This was especially useful for 3C because it meant that there was greater flexibility in the selection of an anchor fragment, allowing for the inexpensive testing of multiple hypotheses. However, a downside is that SYBR© binds to all dsDNA, increasing the risk of signal from off-target regions that have been amplified in the PCR reaction. For this reason it was particularly important to design the primers so that they should only bind to one sequence in the genome. For each qPCR assay, the primer pairs were each tested against the BAC control library and the melt curves were checked to confirm a single amplified product. However, for some primer pairs there was a double peak in the melt curve indicating primer-dimer or amplification of multiple products; in these cases the qPCR data was not included in the analysis. This had particular significance in the 6q23 region, in which previous CHi-C and 3C data has implicated strong interactions between the RA association and the IL20RA gene (McGovern et al., 2016). Initial attempts to perform 3C-qPCR on this interaction, however, have not been included in this thesis because all primer pairs designed to capture this interaction produced double melt curves indicative of non-specificity.

In the 9q31 locus, the final 3C-qPCR assay utilised TaqMan© technology to determine if any of the psoriasis-associated fragments interacted with the KLF4 promoter. The addition of a TaqMan probe in the anchor fragment reduces the likelihood of signal from non-specific amplification products and increases confidence in the results. TaqMan could not be utilised for every 3C-qPCR assay because it would be prohibitively expensive to design a probe to capture every interaction using a different anchor fragment. However, TaqMan could be used in the 6q23 locus to determine if an interaction exists between IL20RA and the disease-associated loci in the cell lines used here. This is especially important since IL20RA encodes a subunit for the receptor for IL-20, which has been shown to have a role in psoriasis. A study showed that overexpression of the IL20 gene in mice causes a psoriasis-like condition, whilst upregulation of the genes encoding IL-20 receptor subunits occurs in human psoriatic skin compared with uninvolved skin (Blumberg et al., 2001). Recently, a clinical trial was carried out to examine the safety and preliminary efficacy of an anti-IL-20 antibody in psoriasis, although it was ended prematurely due to an absence of improvement in the PASI score (Gottlieb et al., 2015).

A limitation of the locus-specific functional work was that none of the assays in 9q31 or 6q23 were carried out in an allele-specific manner. To determine if a disease-associated SNP regulates gene function, there should be a difference in gene expression or chromatin features mediated by the risk allele or the non-risk allele of the causal variant. This could be investigated by using lymphoblastoid cell lines, for example, that are homozygous risk, homozygous non-risk and heterozygous for the SNP in question. This was not attempted in the primary locus of interest (9q31) because bioinformatics suggested that the regions overlapping the susceptibility SNPs were not enriched for histone marks (H3K4me1 and H3K27ac) in lymphoblastoid cell lines in comparison with keratinocytes. However, once functional methods are optimised for primary cells, experiments could potentially be carried out in NHEK of differing genotypes to determine differential allelic effect on chromatin features and gene expression. For example, one study used three different pancreatic cell lines that were homozygous for the risk allele, heterozygous or homozygous for the protective allele to investigate chromatin interactions in a pancreatic cancer locus (Hoskins et al., 2016). Alternatively, direct perturbation of the risk variants could be conducted using genome modification; discussed in Future work (Section 2.5.3.1).

2.5.2 Discussion of functional characterisation of multiple risk loci

The approach towards functional characterisation of multiple psoriasis risk loci involved a CHi-C experiment in two cell types, including a stimulatory experiment with corresponding gene expression data. CHi-C can be used to map interactions from gene promoters (Javierre et al., 2016; Martin et al., 2015) and has also been very successful in identifying gene targets in several diseases including RA, JIA, PsA, type 1 diabetes, breast cancer and colorectal cancer (Dryden et al., 2014; Jager et al., 2015; Martin et al., 2015). The lack of chromatin conformation data in psoriasis-associated loci, or indeed in keratinocytes at all, made this an exciting technique to apply to psoriasis for the first time.

2.5.2.1 Expression microarray

The pro-inflammatory cytokines selected for stimulating HaCaT cells were IL-17A and IFN-γ, since both of these molecules are important in psoriasis pathology; additionally, protocols for keratinocyte stimulation were available in the literature (Cho et al., 2012; Shi et al., 2011). In psoriasis, the importance of IL-17 signalling has been highlighted by recent effective IL-17-targeting therapeutics, as described in the introduction (Section 0.7.4.3). Meanwhile, IFN-γ represents the classical Th1 response where IFN-γ is released in abundance by activated T cells in psoriatic skin (Lowes et al., 2008). Research has shown that both IL-17 and IFN-γ-related pathways in particular are suppressed upon effective treatment of psoriasis with narrow-band ultraviolet therapy (Rácz et al., 2011). GWAS targets of IL-23/Th17 signalling are thought to include IL23R, IL12B, IRF4, TRAF3IP2, NFKBIZ, IL23A, SOCS1 and STAT3, whereas interferon signalling is related to gene targets such as DDX58, SOCS1, IFNGR1, IFIH1 and IL28RA.

Due to the cost and time limitation of the CHi-C experiment, a single stimulatory time- point needed to be selected that upregulated multiple genes with roles in inflammation, as well as GWAS targets. Here, stimulation of HaCaT cells with IL-17A and IFN-γ upregulated various pro-inflammatory cytokines as expected. For example, upregulated genes induced by IL-17A stimulation included S100A8 (17.8-fold at 48 hours) and CXCL1 (3.5-fold at 48 hours), as previously shown (Cho et al., 2012). Upregulated genes induced by IFN-γ included expected cytokines such as CXCL9 (427.6-fold at 8 hours) and CXCL10 (9.26-fold at 24 hours) (Kanda et al., 2007). In general, stimulating with IFN-γ had a much greater effect on gene expression than IL-17A (Figure 46, page 210), with several known GWAS target genes upregulated after 8 hours stimulation with IFN-γ including IFIH1, IL23A and KLF4, the latter of which was implicated in the locus-specific study. Research has shown that a combination of CD8+ T cells and IFN-γ can induce a psoriasis-like response in mice (Di Meglio and Duarte, 2013), linking to the use of My-La CD8+ T cells in this study.

2.5.2.2 CHi-C experiment

The CHi-C experiment provides an opportunity to study gene regulation in psoriasis susceptibility loci at a genome-wide level. The overarching analysis of the CHi-C data revealed that approximately 28-30% of interactions from psoriasis baits intersected fragments containing promoters for genes or non-coding RNAs at the target end, implicating 1009 gene/non-coding RNA targets. The CHi-C data also indicated that, among the bait-promoter interactions, the psoriasis-associated fragments interacted with more than one promoter fragment on average. This is supported by data from a recent HiChIP study, which reported that each autoimmune-associated SNP in their analysis had an average of 1.75 gene targets implicated by chromatin interactions (Mumbach et al., 2017).

The CHi-C data indicated cell type-specific interactions and gene targets, in line with Martin et al. (2015), who reported that only 20% of CHi-C interactions were shared between a B and T cell line. Similarly, Dryden et al. (2014) reported that the majority of their detected interactions were cell-type specific. Here, the HaCaT and My-La libraries were not directly compared, since the My-La libraries had greater sequencing depth and had correspondingly more significant interactions than HaCaT. Stimulation of the HaCaT cells revealed a handful of genes with altered expression that correlated with altered chromatin interaction data. Whilst the majority of these involved the re-arrangement of interactions between promoter fragments and psoriasis fragments, in two cases the interaction was only present in either unstimulated or stimulated cells (CAST and RAB27B). A limitation of the stimulation is that the Th1 pathway is now thought to be less integral to psoriasis than the Th17 pathway (Diani et al., 2016); it may be the case that a more biologically relevant stimulation such as IL-17 or IL-23 might have a greater impact on the psoriasis interactome. Despite this, approximately 38% of the interactions in psoriasis loci in HaCaT cells were unique to unstimulated or stimulated cells, indicating an important effect of stimulation on the chromatin interaction landscape.

When run through the STRING protein network analysis, the implicated gene products across the three conditions were enriched for biological pathways relevant to psoriasis. The most significant biological function was keratinization, which was driven by multiple interactions between psoriasis-associated fragments and genes in the epidermal differentiation complex (EDC), involving several of the late cornified envelope (LCE) genes, small proline rich protein (SPRR) genes, loricrin (LOR) and involucrin (IVL). These interactions occurred in both HaCaT and My-La cells. This is interesting because network analysis of GWAS results in the 63 confirmed European loci in the recent psoriasis GWAS meta-analysis did not flag keratinization as an enriched function (Tsoi et al., 2017). This may be because Tsoi et al. included genes within 100 kb of the lead variant in each locus; however, the CHi-C data revealed interactions between psoriasis-associated fragments and the SPRR, LOR and IVL genes spanning over 500 kb, so these genes would not have been included in the previous enrichment analysis. This therefore shows how CHi-C data could be useful for interpreting GWAS data as a whole. Also of note, the set of genes specifically implicated in My-La interactions was enriched for processes in IL-2 signalling, which is interesting because the My-La cells are grown in an IL-2-containing medium, and activated CD8+ T cells are known to produce large amounts of IL-2 (Boyman and Sprent, 2012). Whilst IL-2 is not considered to be integral to psoriasis pathology, it is a cytokine released by Th1 cells and plays a role in inflammation.

CHi-C in 9q31 and 6q23 reflected the findings in locus-specific analyses

The use of CHi-C provided an opportunity to validate the 3C findings in the 9q31 and 6q23 risk loci. In general, the CHi-C findings backed up the conclusions that had previously been reached in the locus-specific experiments. In 9q31, although two previous CHi-C experiments had provided a partial view of the interaction landscape (Dryden et al., 2014; Martin et al., 2015), the present study was the first to target the psoriasis-associated HindIII fragments. Here, CHi-C in HaCaT and My-La cells provided further evidence of chromatin interactions between the psoriasis association and the KLF4 gene. In this data, direct interactions were seen between the second psoriasis-associated enhancer (containing the likely causal SNP rs6477612) and the fragment containing the KLF4 promoter in HaCaT cells. In My-La cells, a direct interaction was not observed with the KLF4 promoter, rather with fragments on either side of the gene including one ~6 kb downstream of KLF4, which had also been observed in the 3C data (the Centromeric 1 fragment). CHi-C indicated that the psoriasis association did form interactions with fragments to the telomeric side of the gene desert in all conditions; however, these interactions fell short of the gene cluster. As in the 3C assays, CHi-C data did not show interactions with fragments containing gene promoters such as IKBKAP and CTNNAL1.

In 6q23, the CHi-C results partially validated the 3C findings. In all cell types, short-range interactions were seen between TNFAIP3 and fragments downstream of TNFAIP3, as in the 3C findings. The interaction between the IFNGR1 region (chr6:137570293-137583223) and the Ps SNPs 2 fragment was validated in My-La cells, but not in HaCaT cells. Interactions between the IFNGR1 region and the RA locus were detected in all cell types. However, interactions with the IL22RA2 promoter were not detected for either the psoriasis association or the RA association in any cell type. This reinforces the need for follow-up validation of these interactions.

Stimulating HaCaT cells with IFN-γ did not drastically change the CHi-C interaction landscapes in 9q31 and 6q23. In the 9q31 locus, 3C-qPCR did not reveal differences in the strength of interactions with KLF4 between stimulated and unstimulated cells, as previously discussed. This was similarly reflected in the CHi-C data, where the CHiCAGO scores for the interaction between the middle psoriasis-associated putative enhancer (rs6477612) and the KLF4 promoter were 6.75 and 5.29 for unstimulated and stimulated cells, respectively. However, the CHi-C data did reveal that there was an extra interaction between the psoriasis-associated region and the KLF4 promoter in stimulated cells. The additional interacting fragment (chr9:110798323-110798738, hg19) was just 415 bp long and did not contain any disease-associated variants, but was situated only 879 bp from the psoriasis-associated SNP rs1361371, which binds enhancer marks in 10 tissues and is marked by DNase hypersensitivity in skin and lung tissue according to Haploreg v4.1.
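The figures quoted in the paragraph above can be checked with a few lines of code. The sketch below parses the reported fragment coordinates to recover its length and compares the two reported CHiCAGO scores with a cut-off of 5. The cut-off is an assumption, being the value conventionally recommended for CHiCAGO and not one stated in this section; the function and variable names are illustrative only.

```python
def parse_region(region):
    """Parse a 'chr:start-end' string into (chromosome, start, end)."""
    chrom, span = region.split(":")
    start, end = (int(value) for value in span.split("-"))
    return chrom, start, end


def region_length(region):
    """Length as end minus start, the convention that reproduces the 415 bp above."""
    _, start, end = parse_region(region)
    return end - start


extra_fragment = "chr9:110798323-110798738"  # hg19 coordinates quoted in the text
fragment_length = region_length(extra_fragment)

# CHiCAGO scores reported for the rs6477612 enhancer / KLF4 promoter interaction.
reported_scores = {"unstimulated": 6.75, "stimulated": 5.29}
ASSUMED_SCORE_THRESHOLD = 5.0  # assumption: conventional CHiCAGO cut-off
passes_threshold = {
    condition: score >= ASSUMED_SCORE_THRESHOLD
    for condition, score in reported_scores.items()
}
```

Under this assumed cut-off both conditions would be called as interacting, which is consistent with the conclusion that stimulation did not abolish the interaction even though the score was lower in stimulated cells.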

CHi-C implicated PTGER4 as a gene target in the 5p13.3 risk locus

The CHi-C data allowed for exploration of bait-promoter interactions in further psoriasis loci, such as 5p13.3. The lead SNP in the 5p13.3 (PTGER4) locus, rs114934997, was first found to be associated with psoriasis in the expanded meta-analysis in 2015 (OR = 1.17, P = 1.27 × 10⁻⁸) (Tsoi et al., 2015). In the original GWAS paper it was described as an intergenic locus, and no genes were assigned to it, although CARD6 was flagged as a nearby candidate. The CHi-C data in this locus identified robust interactions between the psoriasis baits and the PTGER4 promoter in all cell types. Subsequently, a search of the GTEx database revealed that the majority of the SNPs in LD with rs114934997 (r² > 0.8) are eQTLs for PTGER4 in tibial nerve tissue, linking a physical chromatin interaction with regulation of gene expression. PTGER4 encodes a receptor (EP4) for prostaglandin E2 (PGE2). In mice, the action of PGE2 on the EP4 receptor has been shown to facilitate Th1 cell differentiation and production of IL-23 by dendritic cells (Yao et al., 2009). PGE2 and EP4 have also been shown to enhance expansion of IL-17-producing Th17 cells in both human and mouse cells in vitro (Boniface et al., 2009; Yao et al., 2009). Knockout mice for PTGER4 have reduced Langerhans cell migration and a dampened skin immune response (Kabashima et al., 2003).
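The LD threshold of r² > 0.8 used above to define proxy SNPs for the lead variant can be made concrete with the standard definition of r² from haplotype frequencies. The sketch below is illustrative only: the frequencies are hypothetical, and in practice r² would be obtained from a reference panel via a tool such as Haploreg and not computed by hand.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Squared correlation between alleles at two biallelic loci.

    p_ab -- frequency of the haplotype carrying allele A and allele B
    p_a  -- frequency of allele A at the first locus
    p_b  -- frequency of allele B at the second locus
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return d * d / denominator


# Hypothetical frequencies: the two alleles always travel together (r^2 = 1),
# or are inherited independently (r^2 = 0).
perfect_ld = ld_r_squared(p_ab=0.3, p_a=0.3, p_b=0.3)
no_ld = ld_r_squared(p_ab=0.09, p_a=0.3, p_b=0.3)
```

A SNP pair with r² above 0.8 is statistically hard to separate by association alone, which is why the eQTL and chromatin interaction evidence is considered across the whole set of correlated variants and not only the lead SNP.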

Variants upstream of PTGER4 are also associated with several other autoimmune conditions including IBD (Jostins et al., 2012), multiple sclerosis (MS) (Sawcer et al., 2011), CD (Barrett et al., 2008) and allergy (Hinds et al., 2013). Recently, the novel HiChIP method was also used to examine the 5p13.3 locus (Mumbach et al., 2017). HiChIP targeting the enhancer mark H3K27ac was used to narrow down potentially causal SNPs in the ulcerative colitis, MS and CD loci, which were found to interact with the PTGER4 promoter (Mumbach et al., 2017). Tsoi et al. demonstrated that the psoriasis-associated variants in 5p13.3 are not in strong LD with any of the other autoimmune-related variants (Tsoi et al., 2015), so it would be interesting to determine the regulatory mechanisms that occur in the case of psoriasis. Promoter CHi-C data in the literature supports the observation of interactions between the promoter of PTGER4 and fragments overlapping the psoriasis-associated variants in several other primary cell types, including psoriasis-relevant cells such as macrophages, monocytes, CD4+ T cells, CD8+ T cells and neutrophils (Javierre et al., 2016). Therefore, this interaction appears to be robust across cell types.

CHi-C implicated CDKAL1 and SOX4 as gene targets in the 6p22.3 (CDKAL1) locus

The lead SNP in the 6p22.3 (CDKAL1) locus, rs4712528, reached genome-wide significance for psoriasis vulgaris in a cross-phenotype GWAS between cutaneous disease and PsA (OR = 1.16, P = 8.4 × 10⁻¹¹) (Stuart et al., 2015), whilst earlier genetic evidence also lends support to the psoriasis association (Quaranta et al., 2009). SNPs within 6p22.3 are also associated with CD (Liu 2015), IBD (Jostins et al., 2012) and type II diabetes (Imamura et al., 2016). Of these, the psoriasis association is in LD with variants for CD and type II diabetes (Stuart et al., 2015). CDKAL1 itself is a potential psoriasis gene candidate; in 2009 a study showed that CDKAL1 is not expressed in keratinocytes but in immune cells including CD4+ and CD8+ T cells and B cells, and is downregulated in these cell types upon activation (Quaranta et al., 2009). Indeed, here there were interactions between intronic CDKAL1 variants and the promoter of CDKAL1 over 200 kb in My-La CD8+ T cells. However, the CHi-C data here also shows interactions with the promoter of SOX4 in all conditions. In mice, SOX4 has been shown to be essential for IL-17 production from prototypic innate-like γδ T cells (Tγδ17), and knockout mice for SOX4 do not initiate a skin inflammation response in an imiquimod model of psoriasis (Malhotra et al., 2013). Interestingly, psoriasis bait fragments in stimulated HaCaT cells had four more contacts with the SOX4 promoter fragments than unstimulated cells; this coincided with a decrease in SOX4 expression (fold change 0.475). A search of promoter CHi-C data in the literature reveals the presence of chromatin interactions between SOX4 and a fragment in CDKAL1 approximately 17 kb from the first intronic psoriasis-associated variant (rs2328520) in CD8+ T cells, which increases confidence in SOX4 as a target gene as seen in the present study (Javierre et al., 2016).

RAB27B was revealed as a potential stimulation-dependent target in the 18q21 (POLI) locus

The lead SNP in this locus, rs545979, was found to be associated with psoriasis in the 2012 Immunochip study (OR = 1.12, P = 3.5 × 10⁻¹⁰) and nearby genes were flagged as MBD2, POLI and STARD6 (Tsoi et al., 2012). The CHi-C data presented here showed interactions between psoriasis bait fragments and the promoter fragment for POLI in all conditions. A search within HaploReg revealed that many of the SNPs in strong LD with rs545979 (r² > 0.8) were eQTLs for POLI in several cell types including transformed fibroblasts, skeletal muscle and sun-exposed skin (GTEx). In My-La cells, there were also interactions between psoriasis baits and promoter fragments for MBD2, SNORA37, C18orf54, STARD6 and TCF4. Some of the SNPs in the set were also found to be eQTLs for STARD6 in tibial nerve (GTEx), and for C18orf54 and MBD2 in lymphoblastoid cells (Lappalainen et al., 2013). Therefore, it is possible that the SNPs in this locus could be regulating several gene targets in the context of disease.

In stimulated HaCaT cells there was an interaction with RAB27B over approximately 716 kb that was not present in unstimulated HaCaT cells or My-La cells; this coincided with a 2.3-fold increase in RAB27B expression. A search of the literature reveals that RAB27B is a gene of unclear function, but that it shares sequence homology with RAB27A, a gene associated with Griscelli syndrome, which is characterized in part by hypopigmentation of the skin (Menasche et al., 2000). Although no SNPs in LD with rs545979 were found to be eQTLs for RAB27B, such an effect might only be detectable under certain stimulatory conditions.

In publicly available promoter CHi-C data in 18q21, interactions are also seen between fragments containing psoriasis variants and the promoter of STARD6 in several cell types including CD8+ cells, CD4+ cells and neutrophils (Javierre et al., 2016). An interaction is also seen between the RAB27B promoter and a fragment approximately 44 kb from the psoriasis-associated SNP furthest downstream of POLI (rs664075) in erythroblasts, megakaryocytes and CD4+ cells, although the interaction does not directly overlap any psoriasis-associated SNPs (Javierre et al., 2016).

2.5.2.3 Strengths and limitations of methods used in the analysis of multiple risk loci

A strength of the gene expression array was that it provided data on all well-characterised genes, allowing for a large-scale comparison between the CHi-C interactions and gene expression. However, a limitation was that the RNA used to generate the gene expression data was not obtained from the same cells as those used for the HaCaT CHi-C libraries. This was due to the experimental design; the time-course was carried out primarily in order to determine an optimal stimulation time point, after which the CHi-C libraries were generated. Financial limitations meant that further expression microarrays were not used to analyse RNA extracted from the same cells stimulated for CHi-C. However, bioinformatics analyses routinely utilise information from across different datasets; for example, Mumbach et al. (2017) use RNA-seq data from a publicly available dataset in the same cell types to compare with their HiChIP data.

Another limitation of the expression microarray is that it did not provide information on the expression of less well-characterised genes or non-coding RNAs. The CHi-C data implicated many interactions with the promoters of non-coding RNAs, some of which were stimulation-specific. In the 9q31 locus, for example, an interaction between the psoriasis association and the non-coding gene AL359552 downstream of KLF4 was detected in stimulated HaCaT cells but not in unstimulated HaCaT cells or My-La cells. Recently, RNA-seq has been used to characterise the differential expression profile between psoriatic skin and uninvolved skin; RNA-seq is advantageous over an expression array because it allows for detection of lowly expressed genes and non-coding RNAs (Gupta et al., 2016; Li et al., 2014a). RNA-seq datasets could be integrated with the CHi-C data in order to further prioritise both coding and non-coding gene targets of psoriasis-associated regulatory variants.

A limitation of the CHi-C study was that, as in the locus-specific functional work, only cell lines were utilised. To overcome this, comparisons could be made between the My-La cell data and the CD8+ T cell data that were published in a recent promoter CHi-C paper (Javierre et al., 2016). For the PTGER4 locus, for example, the same enhancer-promoter interaction between the psoriasis association and the PTGER4 promoter can be observed in CD8+ T cells as in My-La cells, which increases confidence that PTGER4 is a target gene (Javierre et al., 2016).

A strength of the CHi-C experiment is that the technical data underlying the libraries is comparable to other CHi-C studies in the literature. The capture efficiency of the libraries is calculated by dividing the number of mapped reads by the number of unique pairs, and indicates the extent to which target enrichment was successful. Here, the mean capture efficiency across the libraries was 69.6%, comparable to Martin et al. (2015) who reported efficiencies of 62% for their region capture and 70% for their promoter capture, and Schoenfelder et al. (2015) who reported 65.6% - 71.1% for their promoter capture.

A limitation in the CHi-C study was the variation in the depth of sequencing between the libraries; the My-La libraries sequenced on the NextSeq had a greater sequencing depth than the HaCaT libraries sequenced on the HiSeq. This probably contributed to the greater number of interactions observed in the My-La libraries than in the HaCaT libraries; Mifsud et al. (2015) reported in their promoter capture experiment that the final number of unique read pairs directly corresponded with the number of significant interactions. This means that some form of normalisation would need to be conducted before direct comparisons could be made between My-La and HaCaT. A potential way to do this might be to use downsampling to reduce the number of My-La reads prior to interaction calling in CHiCAGO.
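The downsampling idea mentioned above can be illustrated with a minimal Python sketch. It is not the pipeline used in this thesis: the function name `downsample_pairs`, the toy read-pair identifiers and the choice to match the smaller library exactly are all assumptions made for illustration. In practice the subsampling would be applied to the aligned, de-duplicated read pairs before interaction calling in CHiCAGO.

```python
import random


def downsample_pairs(read_pairs, target_n, seed=0):
    """Randomly keep target_n read pairs, sampling without replacement.

    If the library already has target_n pairs or fewer, it is returned unchanged.
    """
    pairs = list(read_pairs)
    if target_n >= len(pairs):
        return pairs
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return rng.sample(pairs, target_n)


# Hypothetical toy libraries: identifiers stand in for unique read pairs.
myla_pairs = [f"myla_pair_{i}" for i in range(1000)]
hacat_pairs = [f"hacat_pair_{i}" for i in range(600)]

# Reduce the deeper (My-La) library to the depth of the shallower (HaCaT) one.
myla_downsampled = downsample_pairs(myla_pairs, len(hacat_pairs))
print(len(myla_downsampled), len(hacat_pairs))
```

Sampling without replacement keeps every retained pair unique, so the downsampled library can be compared with the shallower one at equal depth.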

Whilst there was also variation in the number of unique pairs between the HaCaT libraries (approximately 41 – 77 million), across replicates the mean number of unique pairs was comparable between unstimulated and stimulated cells: approximately 58.7 million for unstimulated and 59.1 million for stimulated. Correspondingly, the number of significant interactions was similar: 5,335 in unstimulated versus 5,512 in stimulated. This allowed for direct comparisons to be made between the unstimulated and stimulated HaCaT cells, allowing for correlation with the expression data.

2.5.3 Future work

This work has opened up various avenues of research. It is anticipated that the work outlined in this thesis will be prepared for publication, following some expansion of the findings as detailed below.

2.5.3.1 The 9q31 and 6q23 loci

In 9q31 and 6q23, initial functional analyses have provided clues towards the genetic mechanisms leading to regulation of gene function. However, this work has not yet revealed causal relationships between the psoriasis-associated variants and the genes that they interact with. Therefore, this will be a priority of further work carried out in these loci. Determining the effect of risk/non-risk SNPs in non-coding loci can involve eQTL analyses in cells from patients, and generating 3C and ChIP libraries in cell lines of differing genotype (McGovern et al., 2016). However, with the dawn of genome editing techniques, it would be preferable to perturb the psoriasis-associated variants directly and determine the mechanistic effect. In recent years the CRISPR-Cas9 system has been used extensively for this purpose; this bacterial-derived system can be guided to a specific location in the genome where it causes DNA cleavage (Cong et al., 2013). This allows for multiple types of targeted experiments, including cutting out loci or risk variants, introducing new mutations, or blocking or activating gene promoters or enhancers by using a catalytically inactive Cas9 enzyme that does not cleave the DNA, known as dead Cas9 (dCas9) (Hsu et al., 2014b; Lopes et al., 2016). The advantage of using genome modification over cell lines of different genotype is that the effect of each individual variant can be assessed, rather than that of the haplotype.

A hypothesis-based approach in the 9q31 locus could involve using dCas9 bound to a repressor element (e.g. KRAB) or an activator element (e.g. p300) to block or enhance each of the three psoriasis-associated putative enhancers in the gene desert. The effect on regulation can then be determined by measuring the expression of surrounding genes, including KLF4, using qPCR. An example of a similar approach was recently described in the type 2 diabetes locus at GCKR; here the disease-associated intronic locus was found to harbour an enhancer that increased GCKR expression in a liver cell line when activated with dCas9 bound to transcriptional activators (Rodriguez et al., 2017).

Alternatively, a more hypothesis-free approach might be to use a CRISPR screen that has the potential to detect novel enhancers. For instance, a recent study developed a CRISPR interference method that systematically detected enhancers in well-characterised cancer susceptibility loci at MYC and GATA1 (Fulco et al., 2016). Here, the CRISPR screen could be directed against the whole psoriasis-associated region in 9q31 or 6q23. This experiment would have the capacity to prioritise causal variants based on function without a priori knowledge of overlap with predicted enhancer regions; however, it is more technically challenging than the hypothesis-based experiment.

As well as using dCas9 to identify enhancers overlapping the psoriasis variants, causal variants could be determined through direct perturbation using CRISPR. In 9q31, for instance, CRISPR could be used to introduce the 30 bp deletion (rs55975335) in the second psoriasis-associated enhancer region, which was found to interact with fragments near KLF4. Additionally, SNPs can be changed between risk and non-risk alleles by introducing a template for homology-directed repair. The effect of these changes on target gene expression can be determined using qPCR.

2.5.3.2 The CHi-C study

In this thesis, an initial analysis of the CHi-C data has been conducted with a focus on several candidate loci. However, since the study design was disease-wide, there is a wealth of data that needs to be analysed further. Initially, the promoter interactions identified in this study could be further validated. The CHi-C technique is robust with concordance between replicates routinely high: approximately 95% of interactions are validated between replicates (Javierre et al., 2016). However, a biological replicate in the form of a promoter capture design, as conducted by Martin et al. (2015), would be a cost-effective way of removing the ~5% of interactions that are potential false positives, which may confound downstream functional prioritisation. This experiment would involve designing RNA baits to target all genes and non-coding RNAs that were found to interact with the psoriasis regions in the region capture experiment. The RNA baits could be hybridised to the same Hi-C libraries as those used for the region capture experiment. An advantage of performing the region capture experiment first is that a priori knowledge allows for subsequent capture of promoters from distant genes, as opposed to only those within 500 kb of disease associations as in Martin et al. (2015), which only allowed for validation of 4.3% of all interactions.

There are a number of opportunities to exploit the CHi-C data and learn more about the regulation of genes by psoriasis variants. For example, a first examination of the data might involve exploring the genomic distance between interacting fragments, and identifying what proportion of gene targets are distant. The relationship between enhancers and promoters can be further explored by assessing how many enhancers interact with a single gene promoter. Additional questions might then address whether stimulation affects the strength of enhancer-promoter interactions, and how the interactions compare to other CHi-C experiments in the literature. The CHi-C data in this thesis can be used to fine-map psoriasis risk variants in regions of extended LD, through fragment-level promoter interactions. In addition, the pre-capture Hi-C data generated in this study can be used to call TADs in these cell types for the first time. Although the vast majority of TADs are thought to be invariant between cell types (Schmitt et al., 2016), it will be of great interest to determine which TADs are unique to the cell types used here, and whether they are enriched for psoriasis-associated variants.
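As a rough illustration of the first two analyses proposed above, the following Python sketch summarises interaction distances and counts how many distinct bait fragments contact each gene promoter. The record layout, the toy coordinates, the placeholder gene names and the 500 kb cut-off for calling a target "distant" are all assumptions made for the example, not values taken from the CHi-C data in this thesis.

```python
from collections import defaultdict
from statistics import median

# Hypothetical toy records:
# (bait_chrom, bait_start, bait_end, other_chrom, other_start, other_end, gene)
TOY_INTERACTIONS = [
    ("chr1", 1_000_000, 1_005_000, "chr1", 1_200_000, 1_204_000, "GENE_A"),
    ("chr1", 1_010_000, 1_014_000, "chr1", 1_200_000, 1_204_000, "GENE_A"),
    ("chr1", 1_000_000, 1_005_000, "chr1", 1_800_000, 1_806_000, "GENE_B"),
    ("chr1", 1_000_000, 1_005_000, "chr2", 500_000, 504_000, "GENE_C"),
]


def midpoint(start, end):
    """Integer midpoint of a restriction fragment."""
    return (start + end) // 2


def cis_distance(record):
    """Midpoint-to-midpoint distance, or None for inter-chromosomal contacts."""
    b_chrom, b_start, b_end, o_chrom, o_start, o_end, _gene = record
    if b_chrom != o_chrom:
        return None
    return abs(midpoint(b_start, b_end) - midpoint(o_start, o_end))


def summarise(records, distant_threshold=500_000):
    """Summarise cis distances and the number of bait fragments per gene.

    Assumes at least one cis interaction is present.
    """
    distances = [d for d in map(cis_distance, records) if d is not None]
    baits_per_gene = defaultdict(set)
    for rec in records:
        baits_per_gene[rec[6]].add(rec[:3])  # bait fragment = first three fields
    return {
        "median_cis_distance": median(distances),
        "fraction_distant": sum(d > distant_threshold for d in distances) / len(distances),
        "n_trans": len(records) - len(distances),
        "baits_per_gene": {gene: len(baits) for gene, baits in baits_per_gene.items()},
    }


summary = summarise(TOY_INTERACTIONS)
print(summary)
```

Counting bait fragments per promoter is only a proxy for the number of enhancers per gene; a real analysis would first restrict the bait fragments to those overlapping enhancer marks.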

One important aspect of interpreting the CHi-C data will be to examine interactions involving capture baits associated with other autoimmune diseases, since this thesis only focuses on the psoriasis baits. These data can be used to explore whether disease-associated fragments within the same locus tend to interact with the same gene, or whether different genes are implicated in different diseases. Additionally, they can be used to see whether psoriasis-associated fragments tend to interact with other autoimmune-related fragments. This will be particularly interesting for diseases with a lot of genetic overlap, such as psoriasis and PsA.
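The cross-disease comparison described above amounts to a set comparison of gene targets within each locus. A minimal Python sketch is given below; the locus label, disease labels and gene names are placeholders, and the function names are hypothetical rather than part of any existing analysis pipeline.

```python
from collections import defaultdict


def targets_by_disease(records):
    """Group interacting genes by locus and by the disease of the bait fragment.

    records: iterable of (locus, disease, gene) tuples.
    """
    grouped = defaultdict(lambda: defaultdict(set))
    for locus, disease, gene in records:
        grouped[locus][disease].add(gene)
    return grouped


def shared_and_specific(locus_targets, disease_a, disease_b):
    """Split the gene targets in one locus into shared and disease-specific sets."""
    a = locus_targets.get(disease_a, set())
    b = locus_targets.get(disease_b, set())
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Hypothetical toy input: placeholder locus and gene names.
toy_records = [
    ("locus_1", "psoriasis", "GENE_X"),
    ("locus_1", "RA", "GENE_X"),
    ("locus_1", "RA", "GENE_Y"),
]

grouped = targets_by_disease(toy_records)
result = shared_and_specific(grouped["locus_1"], "psoriasis", "RA")
print(result)
```

A gene in the shared set would correspond to the situation in which independent associations for two diseases contact the same promoter, whereas the disease-specific sets flag candidate targets implicated in only one disease.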

A wealth of available epigenetic data can also be incorporated into this analysis. For example, epigenomic features across multiple cell types (ENCODE) can be examined to determine whether any of the observed interacting fragments are enriched for marks of enhancers. Correlating epigenetic marks between enhancers and promoters has the ability to link these two features. In addition, it may be possible to rank the psoriasis-associated enhancer elements, as it was recently demonstrated that a combination of H3K27ac, DNase hypersensitivity and Hi-C contacts can accurately predict the location of enhancers within a locus and rank them by their influence on gene function (Fulco et al., 2016). In this way gene targets can be confidently assigned to psoriasis-associated enhancer elements, generating hypotheses that can be tested with further functional experiments involving genome modification approaches.

In the future it would also be interesting to perform further stimulatory experiments to determine the effect of inflammation on the chromatin landscape. For instance, in this study, IL-17A stimulation upregulated NFKBIZ expression in HaCaT cells. In the 2015 Immunochip meta-analysis, a variant near NFKBIZ was found to be associated with psoriasis at genome-wide significance (Tsoi et al., 2015). In the present study, no interactions were observed with the promoter of NFKBIZ in unstimulated or IFN-γ-stimulated HaCaT cells, but interactions were observed in My-La cells (data not shown). Therefore, it would be interesting to determine if IL-17A stimulation mediates an interaction between the psoriasis variants and the promoter of NFKBIZ in HaCaT cells. In addition, time and financial limitations meant that a stimulation experiment was not conducted for the My-La CD8+ T cell line; in future this could be conducted using a suitable cytokine such as IL-23, which is known to be an important upstream signal in psoriasis.

Very recently, the novel HiChIP method demonstrated how functional interactions can be detected between enhancers and gene promoters (Mumbach et al., 2016). One of the main advantages of HiChIP is the reduced number of required cells (less than 1 million cells per experiment) which would allow for a transition into primary cell work. It might be feasible to perform HiChIP on keratinocytes derived from psoriatic plaques or patient T cells; this method was recently applied to primary T cells and smooth muscle cells targeting regulatory regions marked by H3K27ac (Mumbach et al., 2017). This data could also be linked with genotype to gain a better understanding of how genotype drives enhancer-promoter interactions in the psoriatic environment.

3. DISCUSSION OF THESIS

Over the course of a decade, GWAS have dramatically increased our knowledge of the genetics of common heritable traits. In psoriasis, as in other conditions, GWAS have indicated the vast complexity of the disease genetic signature. Although more than 80 psoriasis risk loci have already been discovered, it seems unlikely that GWAS will ever provide a full picture of the disease heritability, since each subsequent meta-analysis reveals more and more risk loci with diminishing effect sizes. For psoriasis as a whole we are therefore in a post-GWAS era where discovering novel disease associations would require ever-increasing cohort sizes, fine mapping and improved statistical analyses. Several challenges remain in the effort to characterise the biological mechanisms underlying psoriasis in the post-GWAS era. One challenge is to determine the genetic signature underlying different disease phenotypes, such as LOP. Discovering the genetics behind LOP may help us to understand the biological pathways that cause people to develop psoriasis later in life, and could lead to the advancement of therapies that more effectively target the disease subtype.

Whilst most GWAS variants are located outside of gene coding regions, they are enriched in regulatory elements, suggesting that they play a role in gene function in disease. Causal variants can therefore be thought of as regulators of genes that are important in disease; these genes are indicators of biological pathways and provide promising targets for the development of novel therapeutics. Another major challenge therefore is to link GWAS variants with their target genes through careful functional characterisation in the context of disease. This second challenge requires an understanding of gene regulation in relation to properties of the epigenome, the sheer complexity of which is only just becoming clear. Recent breakthroughs in high-throughput molecular techniques have allowed for genome-wide mapping of epigenetic features in multiple tissue types (e.g. ENCODE, Roadmap). These data can be used in a multitude of ways to identify putative causal variants overlapping regulatory elements within a disease locus, and to hypothesize which genes are affected. Linking back to the first challenge, this approach will also be necessary to characterise risk variants for disease subtypes such as LOP.

To address the first challenge, the first section of this thesis aimed to identify variants associated with LOP. Genotyping was carried out on psoriasis patient samples and data for patients with LOP was extracted and merged with further LOP datasets. A GWAS was then performed for patients with LOP against psoriasis-free controls. The GWAS confirmed the genome-wide significant signal at HLA and revealed a number of signals at suggestive significance that had previously been found to be associated with both EOP and LOP (IL23R, IFIH1, IL12B and TRAF3IP2). The GWAS also revealed twelve novel loci at suggestive significance, ten of which were not in LD with any other trait from previously reported GWAS. These novel suggestive loci may not represent true associations with LOP, since they did not reach genome-wide significance and require replication in an independent cohort. The previously reported LOP locus at IL1R1 was not significantly associated with disease here, although the same direction of effect was observed as in the Immunochip study. Therefore, in summary the GWAS provided clues as to the genetic signature of LOP, but did not reveal any robust associations that differentiate it from EOP.

Initially, the intention was to follow up on LOP risk loci with functional characterisation using bioinformatics and hypothesis-driven molecular techniques. However, since no signals at genome-wide significance were identified, this project instead focused on EOP risk loci with strong evidence for disease association (P < 5 × 10⁻⁸). The selected loci represented an intergenic locus (9q31) and an intronic locus (6q23), both of which were previously identified in an Immunochip meta-analysis (Tsoi et al., 2012); of these, 9q31 was the primary locus of interest. Combined, the functional work in 9q31 implicated KLF4 as a potential target gene in cell lines. The 6q23 locus, on the other hand, represents a pan-autoimmune region that requires collaboration across different groups to delineate the disease-specific mechanisms. The results in 6q23 correlate with previous findings and, rather than identifying a single gene target, implicate several in psoriasis as in other autoimmune diseases. Due to time limitations, the 6q23 locus could not be characterised as fully as the 9q31 locus and so will warrant further work.

The locus-specific work identified a need for a more hypothesis-free analysis of chromatin interactions; this was particularly pertinent to the 9q31 locus where previous fine-scale chromatin conformation data was not available. The CHi-C experiment has generated a wealth of information that is yet to be fully exploited. The initial analysis has identified genes and non-coding RNA targets for psoriasis-associated variants in two relevant cell lines; these genes have various relevant functions including keratinization and innate immune response. Importantly, the tendency of psoriasis fragments to interact with more than one gene promoter fragment suggests that regulatory variants may affect multiple genes and reinforces the concept of complex genetic mechanisms in disease. As well as providing an overarching disease view that potentially shifts our understanding of psoriasis genetics, the CHi-C data can be used to scrutinize individual loci and search for gene targets. Although not providing concrete evidence of a functional relationship, the data narrows down the number of genes in a region that are potentially involved based on enhancer-promoter interactions. This was demonstrated in the 5p13.1 locus, where robust interactions were seen between the psoriasis association and PTGER4, but not with another potential gene candidate CARD6. This information is important for the prioritisation of therapeutic targets.

3.1 The scope of genetic studies in psoriasis

In psoriasis, as in the vast majority of complex diseases, the largest proportion of risk is found in the genetic background. Therefore, the aim of this project was to improve our understanding of the biological and genetic mechanisms occurring in psoriasis. Knowledge of these mechanisms has potential downstream implications for the clinic in several ways, including precision medicine, development of novel therapies and drug repositioning. Whilst there are not yet any examples where a psoriasis GWAS variant has been characterised and its target gene taken through to therapeutic intervention, with the dawn of biologics it has become clear that genetic risk variants are biomarkers for key pathways in disease that can be accurately targeted. In psoriasis for example, the successful biologic ustekinumab was originally intended to target IL-12 but happened to also target IL-23, which shares the same p40 subunit. Subsequent GWAS revealed genetic risk signals near the IL23A and IL23R genes in addition to IL12B (Cargill et al., 2007; Nair et al., 2009). It is now generally acknowledged that ustekinumab probably exerts most of its effect through IL-23 rather than through IL-12 in psoriasis, since IL-23 is an important upstream regulator of Th17-mediated IL-17 signalling. Newer biologics such as guselkumab are targeted specifically against IL-23 (p19 subunit) and have recently been found to be efficacious in phase III trials (Nakamura et al., 2017). Therefore, genetic findings underpin this successful drug target in psoriasis.

The GWAS era has led to pharmaceutical companies incorporating genetic risk factors during their prioritisation of novel therapeutic targets. A review of AstraZeneca’s drug development pipeline between 2005 and 2010 found that targets with supporting genetic background were much more likely to be efficacious than those without (Cook et al., 2014). In this study, 73% of targets with genetic support were successful in phase II, compared to only 43% without genetic support. Furthermore, an analysis conducted on approved drug indications found that an underlying genetic association could double the chances that a drug is successful through phase I to approval (Nelson et al., 2015). It seems likely that the identification of target genes through functional studies will improve the success rate of novel therapies. Therefore, the functional work described in this thesis represents a strong starting point on which to build further information and deduce the important mechanisms occurring in disease. As discussed above, integration of the CHi-C data with epigenomic features such as modified histone marks and chromatin accessibility will aid in the identification of causal variants that overlap with regulatory elements. Ultimately, a list of narrowed-down likely causal variants and their target genes can be generated.

Also of relevance to the clinic, the CHi-C study described in this thesis provides a compelling opportunity to compare shared risk loci between related autoimmune conditions. In a previous CHi-C study, Martin et al. (2015) found that lead SNPs independently associated with different autoimmune diseases within the same locus could interact with a single gene promoter. For example, they showed that RA-associated SNPs within an intron of RAD51B interacted with the promoter of ZFP36L1, which also contained SNPs independently associated with JIA (Martin et al., 2015). In the current study, interactions between independent disease-associated SNPs for psoriasis and RA occurred with the promoter of TNFAIP3 in 6q23, indicating a shared disease mechanism. Shared gene targets such as this could allow for drug repositioning strategies whereby therapeutics utilised in one disease might prove effective in another (Sanseau et al., 2012).

Perhaps the biggest transformation in post-GWAS functional annotation is the arrival of precise genomic engineering in the form of CRISPR. CRISPR allows, for the first time, the empirical examination of disease-associated variants to discover the genes, pathways, cell types and time points in which they have their genetic effect. Ground-breaking studies have already indicated how the full CRISPR toolkit could be employed to advance complex genetic discoveries: for example, Simeonov et al. (2017) showed how changing a single autoimmune risk variant can have a measurable effect on downstream gene expression and cell fate, in a time-dependent manner, to drive cells towards a disease risk phenotype.

High-throughput techniques making use of next generation sequencing have revealed important features of the regulatory non-coding genome. Newer techniques such as HiChIP allow for the use of fewer input cells, making it possible to compare primary cell data between individuals (Mumbach et al., 2016). The future will also see an increase in the use of single-cell technology, which has already enabled the development of a single-cell Hi-C method (Ramani et al., 2017). Combining GWAS data with ever-growing epigenomic and bioinformatic datasets will lead to the characterisation of functional relationships between risk variants and genes. In this way, functional genomics has the potential to make huge breakthroughs in our understanding of autoimmune genetics.

3.2 Conclusion

In conclusion, this thesis has made some headway into characterising variants and genes associated with psoriasis pathogenesis. The thesis includes the largest GWAS of LOP to date, and demonstrates the first instance of a high-throughput analysis of chromatin interactions in psoriasis risk loci. Whilst not providing immediate clinical relevance, this data provides a basis on which future studies can build evidence for prioritisation of novel therapeutic targets in psoriasis.

4. REFERENCES

Abel, E.A., Dicicco, L.M., Orenberg, E.K., Fraki, J.E., and Farber, E.M. (1986). Drugs in exacerbation of psoriasis. J Am Acad Dermatol 15, 1007-1022.

Aggarwal, S., Ghilardi, N., Xie, M.H., de Sauvage, F.J., and Gurney, A.L. (2003). Interleukin-23 promotes a distinct CD4 T cell activation state characterized by the production of interleukin-17. J Biol Chem 278, 1910-1914.

Allen, M.H., Ameen, H., Veal, C., Evans, J., Ramrakha-Jones, V.S., Marsland, A.M., Burden, A.D., Griffiths, C.E., Trembath, R.C., and Barker, J.N. (2005). The major psoriasis susceptibility locus PSORS1 is not a risk factor for late-onset psoriasis. The Journal of investigative dermatology 124, 103-106.

Altshuler, D., Brooks, L.D., Chakravarti, A., Collins, F.S., Daly, M.J., Donnelly, P., Gibbs, R.A., Belmont, J.W., Boudreau, A., Leal, S.M., et al. (2005). A haplotype map of the human genome. Nature 437, 1299-1320.

Altshuler, D.M., Durbin, R.M., Abecasis, G.R., Bentley, D.R., Chakravarti, A., Clark, A.G., Donnelly, P., Eichler, E.E., Flicek, P., Gabriel, S.B., et al. (2012). An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56-65.

Alvarez-Navarro, C., and de Castro, J.A.L. (2014). ERAP1 structure, function and pathogenetic role in ankylosing spondylitis and other MHC-associated diseases. Mol Immunol 57, 12-21.

Amos, C.I. (2007). Successful design and conduct of genome-wide association studies. Hum Mol Genet 16, R220-R225.

An, J., Golech, S., Klaewsongkram, J., Zhang, Y.Q., Subedi, K., Huston, G.E., Wood, W.H., Wersto, R.P., Becker, K.G., Swain, S.L., et al. (2011). Kruppel-like factor 4 (KLF4) directly regulates proliferation in thymocyte development and IL-17 expression during Th17 differentiation. Faseb J 25, 3634-3645.

Anderson, C.A., Pettersson, F.H., Clarke, G.M., Cardon, L.R., Morris, A.P., and Zondervan, K.T. (2010). Data quality control in genetic case-control association studies. Nature Protocols 5, 1564-1573.

Andrews, S. (2010). FastQC: a quality control tool for high throughput sequence data.

Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25, 25-29.

Asumalahti, K., Laitinen, T., Lahermo, P., Suomela, S., Itkonen-Vatjus, R., Jansen, C., Karvonen, J., Karvonen, S.L., Reunala, T., Snellman, E., et al. (2003). Psoriasis susceptibility locus on 18p revealed by genome scan in Finnish families not associated with PSORS1. Journal of Investigative Dermatology 121, 735-740.

Auton, A., Brooks, L.D., Durbin, R.M., Garrison, E.P., Kang, H.M., Korbel, J.O., Marchini, J.L., McCarthy, S., McVean, G.A., and Abecasis, G.R. (2015). A global reference for human genetic variation. Nature 526, 68-74.

Baker, B.S., and Fry, L. (1992). The immunology of psoriasis. Br J Dermatol 126, 1-9.

Baker, B.S., Griffiths, C.E., Lambert, S., Powles, A.V., Leonard, J.N., Valdimarsson, H., and Fry, L. (1987). The effects of cyclosporin A on T lymphocyte and dendritic cell sub-populations in psoriasis. The British journal of dermatology 116, 503-510.

Baran, R. (2010). The Burden of Nail Psoriasis: An Introduction. Dermatology 221, 1-5.

Barber, M.J., Mangravite, L.M., Hyde, C.L., Chasman, D.I., Smith, J.D., McCarty, C.A., Li, X., Wilke, R.A., Rieder, M.J., Williams, P.T., et al. (2010). Genome-Wide Association of Lipid-Lowering Response to Statins in Combined Study Populations. PLoS One 5, e9763.

Barker, J., Mitra, R.S., Griffiths, C.E.M., Dixit, V.M., and Nickoloff, B.J. (1991). Keratinocytes as initiators of inflammation. Lancet 337, 211-214.

Barolo, S. (2012). Shadow enhancers: Frequently asked questions about distributed cis-regulatory information and enhancer redundancy. Bioessays 34, 135-141.

Barrett, J.C., Hansoul, S., Nicolae, D.L., Cho, J.H., Duerr, R.H., Rioux, J.D., Brant, S.R., Silverberg, M.S., Taylor, K.D., Barmada, M.M., et al. (2008). Genome-wide association defines more than 30 distinct susceptibility loci for Crohn's disease. Nat Genet 40, 955-962.

Barski, A., Cuddapah, S., Cui, K.R., Roh, T.Y., Schones, D.E., Wang, Z.B., Wei, G., Chepelev, I., and Zhao, K.J. (2007). High-resolution profiling of histone methylations in the human genome. Cell 129, 823-837.

Becker, K., Siegert, S., Toliat, M.R., Du, J., Casper, R., Dolmans, G.H., Werker, P.M., Tinschert, S., Franke, A., Gieger, C., et al. (2016). Meta-Analysis of Genome-Wide Association Studies and Network Analysis-Based Integration with Gene Expression Data Identify New Suggestive Loci and Unravel a Wnt-Centric Network Associated with Dupuytren's Disease. PLoS One 11, e0158101.

Belton, J.M., McCord, R.P., Gibcus, J.H., Naumova, N., Zhan, Y., and Dekker, J. (2012). Hi-C: A comprehensive technique to capture the conformation of genomes. Methods 58, 268-276.

Benner, C., Spencer, C.C.A., Havulinna, A.S., Salomaa, V., Ripatti, S., and Pirinen, M. (2016). FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493-1501.

Bergboer, J.G., Oostveen, A.M., de Jager, M.E., Zeeuwen, P.L., Joosten, I., Seyger, M.M., and Schalkwijk, J. (2012a). Koebner phenomenon in psoriasis is not associated with deletion of late cornified envelope genes LCE3B and LCE3C. The Journal of investigative dermatology 132, 475-476.

Bergboer, J.G.M., Tjabringa, G.S., Kamsteeg, M., van Vlijmen-Willems, I., Rodijk-Olthuis, D., Jansen, P.A.M., Thuret, J.Y., Narita, M., Ishida-Yamamoto, A., Zeeuwen, P., et al. (2011). Psoriasis risk genes of the Late Cornified Envelope-3 group are distinctly expressed compared with genes of other LCE groups. Am J Pathol 178, 1470-1477.

Bergboer, J.G.M., Zeeuwen, P., and Schalkwijk, J. (2012b). Genetics of Psoriasis: Evidence for Epistatic Interaction between Skin Barrier Abnormalities and Immune Deviation. Journal of Investigative Dermatology 132, 2320-2331.

Bernstein, B.E., Stamatoyannopoulos, J.A., Costello, J.F., Ren, B., Milosavljevic, A., Meissner, A., Kellis, M., Marra, M.A., Beaudet, A.L., Ecker, J.R., et al. (2010). The NIH Roadmap Epigenomics Mapping Consortium. Nature Biotechnology 28, 1045-1048.

Bito, T., Nishikawa, R., Hatakeyama, M., Kikusawa, A., Kanki, H., Nagai, H., Sarayama, Y., Ikeda, T., Yoshizaki, H., Seto, H., et al. (2014). Influence of neutralizing antibodies to adalimumab and infliximab on the treatment of psoriasis. Br J Dermatol 170, 922-929.

Blumberg, H., Conklin, D., Xu, W., Grossmann, A., Brender, T., Carollo, S., Eagan, M., Foster, D., Haldeman, B.A., Hammond, A., et al. (2001). Interleukin 20: Discovery, Receptor Identification, and Role in Epidermal Function. Cell 104, 9-19.

Boehncke, W.H., and Schon, M.P. (2015). Psoriasis. Lancet 386, 983-994.

Boniface, K., Bak-Jensen, K.S., Li, Y., Blumenschein, W.M., McGeachy, M.J., McClanahan, T.K., McKenzie, B.S., Kastelein, R.A., Cua, D.J., and Malefyt, R.D. (2009). Prostaglandin E2 regulates Th17 cell differentiation and function through cyclic AMP and EP2/EP4 receptor signaling. J Exp Med 206, 535-548.

Bos, J.D., de Rie, M.A., Teunissen, M.B.M., and Piskin, G. (2005). Psoriasis: dysregulation of innate immunity. Br J Dermatol 152, 1098-1107.

Boukamp, P., Petrussevska, R.T., Breitkreutz, D., Hornung, J., Markham, A., and Fusenig, N.E. (1988). Normal keratinization in a spontaneously immortalized aneuploid human keratinocyte cell-line. J Cell Biol 106, 761-771.

Bowes, J., Ashcroft, J., Dand, N., Jalali-Najafabadi, F., Bellou, E., Ho, P., Marzo-Ortega, H., Helliwell, P.S., Feletar, M., Ryan, A.W., et al. (2017). Cross-phenotype association mapping of the MHC identifies genetic variants that differentiate psoriatic arthritis from psoriasis. Ann Rheum Dis 76, 1774-1779.

Bowes, J., Budu-Aggrey, A., Huffmeier, U., Uebe, S., Steel, K., Hebert, H.L., Wallace, C., Massey, J., Bruce, I.N., Bluett, J., et al. (2015a). Dense genotyping of immune-related susceptibility loci reveals new insights into the genetics of psoriatic arthritis. Nat Commun 6, 6046.

Bowes, J., Loehr, S., Budu-Aggrey, A., Uebe, S., Bruce, I.N., Feletar, M., Marzo-Ortega, H., Helliwell, P., Ryan, A.W., Kane, D., et al. (2015b). PTPN22 is associated with susceptibility to psoriatic arthritis but not psoriasis: evidence for a further PsA-specific risk locus. Ann Rheum Dis 74, 1882-1885.

Boyle, A.P., Hong, E.L., Hariharan, M., Cheng, Y., Schaub, M.A., Kasowski, M., Karczewski, K.J., Park, J., Hitz, B.C., Weng, S., et al. (2012). Annotation of functional variation in personal genomes using RegulomeDB. Genome research 22, 1790-1797.

Boyle, A.P., Song, L., Lee, B.K., London, D., Keefe, D., Birney, E., Iyer, V.R., Crawford, G.E., and Furey, T.S. (2011). High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells. Genome Res 21, 456-464.

Boyman, O., and Sprent, J. (2012). The role of interleukin-2 during homeostasis and activation of the immune system. Nat Rev Immunol 12, 180-190.

Braathen, L.R., Botten, G., and Bjerkedal, T. (1989). Prevalence of psoriasis in Norway. Acta dermato-venereologica Supplementum 142, 5-8.

Brind'Amour, J., Liu, S., Hudson, M., Chen, C., Karimi, M.M., and Lorincz, M.C. (2015). An ultra-low-input native ChIP-seq protocol for genome-wide profiling of rare cell populations. Nat Commun 6, 6033.

Buenrostro, J.D., Giresi, P.G., Zaba, L.C., Chang, H.Y., and Greenleaf, W.J. (2013). Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature Methods 10, 1213-1218.

Burgner, D., Davila, S., Breunis, W.B., Ng, S.B., Li, Y., Bonnard, C., Ling, L., Wright, V.J., Thalamuthu, A., Odam, M., et al. (2009). A Genome-Wide Association Study Identifies Novel and Functionally Related Susceptibility Loci for Kawasaki Disease. PLoS Genet 5, 15.

Bush, W.S., and Moore, J.H. (2012). Chapter 11: Genome-Wide Association Studies. PLOS Computational Biology 8, e1002822.

Cai, Y.H., Fleming, C., and Yan, J. (2012). New insights of T cells in the pathogenesis of psoriasis. Cell Mol Immunol 9, 302-309.

Cairns, J., Freire-Pritchett, P., Wingett, S.W., Varnai, C., Dimond, A., Plagnol, V., Zerbino, D., Schoenfelder, S., Javierre, B.M., Osborne, C., et al. (2016). CHiCAGO: robust detection of DNA looping interactions in Capture Hi-C data. Genome Biol 17, 127.

Calo, E., and Wysocka, J. (2013). Modification of enhancer chromatin: what, how and why? Mol Cell 49.

Capon, F., Bijlmakers, M.J., Wolf, N., Quaranta, M., Huffmeier, U., Allen, M., Timms, K., Abkevich, V., Gutin, A., Smith, R., et al. (2008). Identification of ZNF313/RNF114 as a novel psoriasis susceptibility gene. Hum Mol Genet 17, 1938-1945.

Capon, F., Novelli, G., Semprini, S., Clementi, M., Nudo, M., Vultaggio, P., Mazzanti, C., Gobello, T., Botta, A., Fabrizi, G., et al. (1999). Searching for psoriasis susceptibility genes in Italy: Genome scan and evidence for a new locus on chromosome 1. Journal of Investigative Dermatology 112, 32-35.

Cargill, M., Schrodi, S.J., Chang, M., Garcia, V.E., Brandon, R., Callis, K.P., Matsunami, N., Ardlie, K.G., Civello, D., Catanese, J.J., et al. (2007). A large-scale genetic association study confirms IL12B and leads to the identification of IL23R as psoriasis-risk genes. American journal of human genetics 80, 273-290.

Carson, K.R., Focosi, D., Major, E.O., Petrini, M., Richey, E.A., West, D.P., and Bennett, C.L. (2009). Monoclonal antibody-associated progressive multifocal leucoencephalopathy in patients treated with rituximab, natalizumab, and efalizumab: a Review from the Research on Adverse Drug Events and Reports (RADAR) Project. Lancet Oncol 10, 816-824.

Chandran, V. (2013). The Genetics of Psoriasis and Psoriatic Arthritis. Clin Rev Allergy Immunol 44, 149-156.

Chang, Y.T., Chen, T.J., Liu, P.C., Chen, Y.C., Chen, Y.J., Huang, Y.L., Jih, J.S., Chen, C.C., Lee, D.D., Wang, W.J., et al. (2009). Epidemiological Study of Psoriasis in the National Health Insurance Database in Taiwan. Acta Derm-Venereol 89, 262-266.

Chen, C.Y., Pollack, S., Hunter, D.J., Hirschhorn, J.N., Kraft, P., and Price, A.L. (2013). Improved ancestry inference using weights from external reference panels. Bioinformatics 29, 1399-1406.

Chen, W., Larrabee, B.R., Ovsyannikova, I.G., Kennedy, R.B., Haralambieva, I.H., Poland, G.A., and Schaid, D.J. (2015). Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics. Genetics 200, 719-736.

Cheng, H., Li, Y., Zuo, X.B., Tang, H.Y., Tang, X.F., Gao, J.P., Sheng, Y.J., Yin, X.Y., Zhou, F.S., Zhang, C., et al. (2014). Identification of a Missense Variant in LNPEP that Confers Psoriasis Risk. Journal of Investigative Dermatology 134, 359-365.

Cho, K.A., Suh, J.W., Lee, K.H., Kang, J.L., and Woo, S.Y. (2012). IL-17 and IL-22 enhance skin inflammation by stimulating the secretion of IL-1beta by keratinocytes via the ROS-NLRP3-caspase-1 pathway. International immunology 24, 147-158.

Christova, R. (2013). Detecting DNA-Protein Interactions in Living Cells-ChIP Approach. In Protein-Nucleic Acids Interactions, R. Donev, ed. (San Diego: Elsevier Academic Press Inc), pp. 101-133.

Chung, S.A., Taylor, K.E., Graham, R.R., Nititham, J., Lee, A.T., Ortmann, W.A., Jacob, C.O., Alarcon-Riquelme, M.E., Tsao, B.P., Harley, J.B., et al. (2011). Differential genetic associations for systemic lupus erythematosus based on anti-dsDNA autoantibody production. PLoS Genet 7, e1001323.

Clarke, G.M., Anderson, C.A., Pettersson, F.H., Cardon, L.R., Morris, A.P., and Zondervan, K.T. (2011). Basic statistical analysis in genetic case-control studies. Nature Protocols 6, 121-133.

Cohen, L., Henzel, W.J., and Baeuerle, P.A. (1998). IKAP is a scaffold protein of the I kappa B kinase complex. Nature 395, 292-296.

Cong, L., Ran, F.A., Cox, D., Lin, S.L., Barretto, R., Habib, N., Hsu, P.D., Wu, X.B., Jiang, W.Y., Marraffini, L.A., et al. (2013). Multiplex Genome Engineering Using CRISPR/Cas Systems. Science 339, 819-823.

Cook, D., Brown, D., Alexander, R., March, R., Morgan, P., Satterthwaite, G., and Pangalos, M.N. (2014). Lessons learned from the fate of AstraZeneca's drug pipeline: a five-dimensional framework. Nat Rev Drug Discov 13, 419-431.

Corradin, O., Saiakhova, A., Akhtar-Zaidi, B., Myeroff, L., Willis, J., Iari, R.C.S., Lupien, M., Markowitz, S., and Scacheri, P.C. (2014). Combinatorial effects of multiple enhancer variants in linkage disequilibrium dictate levels of gene expression to confer susceptibility to common traits. Genome research 24, 1-13.

Cortes, A., and Brown, M.A. (2011). Promise and pitfalls of the Immunochip. Arthritis Res Ther 13, 3.

Creyghton, M.P., Cheng, A.W., Welstead, G.G., Kooistra, T., Carey, B.W., Steine, E.J., Hanna, J., Lodato, M.A., Frampton, G.M., Sharp, P.A., et al. (2010). Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc Natl Acad Sci U S A 107, 21931-21936.

Das, S., Forer, L., Schonherr, S., Sidore, C., Locke, A.E., Kwong, A., Vrieze, S.I., Chew, E.Y., Levy, S., McGue, M., et al. (2016). Next-generation genotype imputation service and methods. Nature Genetics 48, 1284-1287.

Das, S., Stuart, P.E., Ding, J., Tejasvi, T., Li, Y., Tsoi, L.C., Chandran, V., Fischer, J., Helms, C., Duffin, K.C., et al. (2014). Fine mapping of eight psoriasis susceptibility loci. European journal of human genetics : EJHG.

Davies, J.O., Oudelaar, A.M., Higgs, D.R., and Hughes, J.R. (2017). How best to identify chromosomal interactions: a comparison of approaches. Nat Methods 14, 125-134.

Davison, L.J., Wallace, C., Cooper, J.D., Cope, N.F., Wilson, N.K., Smyth, D.J., Howson, J.M.M., Saleh, N., Al-Jeffery, A., Angus, K.L., et al. (2012). Long-range DNA looping and gene expression analyses identify DEXI as an autoimmune disease candidate gene. Hum Mol Genet 21, 322-333.

de Barros, G.M., and Kakehasi, A.M. (2016). Skeletal abnormalities of tricho-rhino-phalangeal syndrome type I. Revista Brasileira de Reumatologia (English Edition) 56, 86-89.

De Bersaques, J. (2012). Two pre-Willan descriptions of psoriasis. Clinics in dermatology 30, 544-547.

de Cid, R., Riveira-Munoz, E., Zeeuwen, P., Robarge, J., Liao, W., Dannhauser, E.N., Giardina, E., Stuart, P.E., Nair, R., Helms, C., et al. (2009). Deletion of the late cornified envelope LCE3B and LCE3C genes as a susceptibility factor for psoriasis. Nature Genetics 41, 211-215.

Degner, J.F., Pai, A.A., Pique-Regi, R., Veyrieras, J.B., Gaffney, D.J., Pickrell, J.K., De Leon, S., Michelini, K., Lewellen, N., Crawford, G.E., et al. (2012). DNase I sensitivity QTLs are a major determinant of human expression variation. Nature 482, 390-394.

Dekker, J., Rippe, K., Dekker, M., and Kleckner, N. (2002). Capturing chromosome conformation. Science 295, 1306-1311.

Deng, Y., Chang, C., and Lu, Q. (2016). The Inflammatory Response in Psoriasis: a Comprehensive Review. Clin Rev Allergy Immunol 50, 377-389.

Di Meglio, P., and Duarte, J.H. (2013). CD8 T Cells and IFN-gamma Emerge as Critical Players for Psoriasis in a Novel Model of Mouse Psoriasiform Skin Inflammation. Journal of Investigative Dermatology 133, 871-874.

Diani, M., Altomare, G., and Reali, E. (2016). T Helper Cell Subsets in Clinical Manifestations of Psoriasis. Journal of immunology research 2016, 7692024.

Dimas, A.S., Deutsch, S., Stranger, B.E., Montgomery, S.B., Borel, C., Attar-Cohen, H., Ingle, C., Beazley, C., Arcelus, M.G., Sekowska, M., et al. (2009). Common Regulatory Variation Impacts Gene Expression in a Cell Type-Dependent Manner. Science 325, 1246-1250.

Ding, X.L., Wang, T.L., Shen, Y.W., Wang, X.Y., Zhou, C., Tian, S., Liu, Y., Peng, G.H., Zhou, J.N., Xue, S.Y., et al. (2012). Prevalence of psoriasis in China: A population-based study in six cities. European Journal of Dermatology 22, 663-667.

Dixon, J.R., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J.S., and Ren, B. (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376-380.

Dostie, J., Richmond, T.A., Arnaout, R.A., Selzer, R.R., Lee, W.L., Honan, T.A., Rubio, E.D., Krumm, A., Lamb, J., Nusbaum, C., et al. (2006). Chromosome Conformation Capture Carbon Copy (5C): A massively parallel solution for mapping interactions between genomic elements. Genome Research 16, 1299-1309.

Dryden, N.H., Broome, L.R., Dudbridge, F., Johnson, N., Orr, N., Schoenfelder, S., Nagano, T., Andrews, S., Wingett, S., Kozarewa, I., et al. (2014). Unbiased analysis of potential targets of breast cancer susceptibility loci by Capture Hi-C. Genome research 24, 1854-1868.

Duarte, G.V., Oliveira, M., Cardoso, T.M., Follador, I., Silva, T.S., Cavalheiro, C.M.A., Nonato, W., and Carvalho, E.M. (2013). Association between obesity measured by different parameters and severity of psoriasis. Int J Dermatol 52, 177-181.

Dunham, I., Kundaje, A., Aldred, S.F., Collins, P.J., Davis, C., Doyle, F., Epstein, C.B., Frietze, S., Harrow, J., Kaul, R., et al. (2012). An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74.

Dunning, A.M., Michailidou, K., Kuchenbaecker, K.B., Thompson, D., French, J.D., Beesley, J., Healey, C.S., Kar, S., Pooley, K.A., Lopez-Knowles, E., et al. (2016). Breast cancer risk variants at 6q25 display different phenotype associations and regulate ESR1, RMND1 and CCDC170. Nat Genet 48, 374-386.

Eaton, L.H., Chularojanamontri, L., Ali, F.R., Theodorakopoulou, E., Dearman, R.J., Kimber, I., and Griffiths, C.E.M. (2014). Guttate psoriasis is associated with an intermediate phenotype of impaired Langerhans cell migration. The British journal of dermatology 171, 409-411.

eGTEx Project (2017). Enhancing GTEx by bridging the gaps between genotype, gene expression, and disease. Nat Genet 49, 1664-1670.

Eiris, N., Gonzalez-Lara, L., Santos-Juanes, J., Queiro, R., Coto, E., and Coto-Segura, P. (2014). Genetic variation at IL12B, IL23R and IL23A is associated with psoriasis severity, psoriatic arthritis and type 2 diabetes mellitus. Journal of dermatological science 75, 167-172.

Elder, J.T. (2013). What can the genetics of psoriasis teach us about alopecia areata? The journal of investigative dermatology Symposium proceedings / the Society for Investigative Dermatology, Inc [and] European Society for Dermatological Research 16, S34-36.

Ellinghaus, D., Ellinghaus, E., Nair, R.P., Stuart, P.E., Esko, T., Metspalu, A., Debrus, S., Raelson, J.V., Tejasvi, T., Belouchi, M., et al. (2012a). Combined Analysis of Genome-wide Association Studies for Crohn Disease and Psoriasis Identifies Seven Shared Susceptibility Loci. Am J Hum Genet 90, 636-647.

Ellinghaus, E., Ellinghaus, D., Stuart, P.E., Nair, R.P., Debrus, S., Raelson, J.V., Belouchi, M., Fournier, H., Reinhard, C., Ding, J., et al. (2010). Genome-wide association study identifies a psoriasis susceptibility locus at TRAF3IP2. Nature Genetics 42, 991-995.

Ellinghaus, E., Stuart, P.E., Ellinghaus, D., Nair, R.P., Debrus, S., Raelson, J.V., Belouchi, M., Tejasvi, T., Li, Y., Tsoi, L.C., et al. (2012b). Genome-Wide Meta-Analysis of Psoriatic Arthritis Identifies Susceptibility Locus at REL. Journal of Investigative Dermatology 132, 1133-1140.

Ellis, C.N., Gorsulowsky, D.C., Hamilton, T.A., Billings, J.K., Brown, M.D., Headington, J.T., Cooper, K.D., Baadsgaard, O., Duell, E.A., Annesley, T.M., et al. (1986). Cyclosporine improves psoriasis in a double blind study. JAMA-J Am Med Assoc 256, 3110-3116.

Ernst, J., and Kellis, M. (2012). ChromHMM: automating chromatin-state discovery and characterization. Nature Methods 9, 215-216.

Ernst, J., Kheradpour, P., Mikkelsen, T.S., Shoresh, N., Ward, L.D., Epstein, C.B., Zhang, X.L., Wang, L., Issner, R., Coyne, M., et al. (2011). Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43-49.

Fairfax, B.P., Humburg, P., Makino, S., Naranbhai, V., Wong, D., Lau, E., Jostins, L., Plant, K., Andrews, R., McGee, C., et al. (2014). Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression. Science 343, 1246949.

Fantuzzi, F., Del Giglio, M., Gisondi, P., and Girolomoni, G. (2008). Targeting tumor necrosis factor alpha in psoriasis and psoriatic arthritis. Expert Opin Ther Targets 12, 1085-1096.

Farh, K.K.-H., Marson, A., Zhu, J., Kleinewietfeld, M., Housley, W.J., Beik, S., Shoresh, N., Whitton, H., Ryan, R.J.H., Shishkin, A.A., et al. (2014). Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature advance online publication.

Farh, K.K., Marson, A., Zhu, J., Kleinewietfeld, M., Housley, W.J., Beik, S., Shoresh, N., Whitton, H., Ryan, R.J., Shishkin, A.A., et al. (2015). Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337-343.

Feinberg, M.W., Cao, Z., Wara, A.K., Lebedeva, M.A., Senbanerjee, S., and Jain, M.K. (2005). Kruppel-like factor 4 is a mediator of proinflammatory signaling in macrophages. The Journal of biological chemistry 280, 38247-38258.

Ferrandiz, C., Bordas, X., Garcia-Patos, V., Puig, S., Pujol, R., and Smandia, A. (2001). Prevalence of psoriasis in Spain (Epiderma Project: phase I). J Eur Acad Dermatol Venereol 15, 20-23.

FitzGerald, O., Haroon, M., Giles, J.T., and Winchester, R. (2015). Concepts of pathogenesis in psoriatic arthritis: genotype determines clinical phenotype. Arthritis Res Ther 17.

FitzGerald, O., and Winchester, R. (2009). Psoriatic arthritis: from pathogenesis to therapy. Arthritis Res Ther 11.

Fletcher, O., Johnson, N., Orr, N., Hosking, F.J., Gibson, L.J., Walker, K., Zelenika, D., Gut, I., Heath, S., Palles, C., et al. (2011). Novel Breast Cancer Susceptibility Locus at 9q31.2: Results of a Genome-Wide Association Study. Journal of the National Cancer Institute 103, 425-435.

Friberg, C., Bjorck, K., Nilsson, S., Inerot, A., Wahlstrom, J., and Samuelsson, L. (2006). Analysis of chromosome 5q31-32 and psoriasis: Confirmation of a susceptibility locus but no association with SNPs within SLC22A4 and SLC22A5. Journal of Investigative Dermatology 126, 998-1002.

Fulco, C.P., Munschauer, M., Anyoha, R., Munson, G., Grossman, S.R., Perez, E.M., Kane, M., Cleary, B., Lander, E.S., and Engreitz, J.M. (2016). Systematic mapping of functional enhancer-promoter connections with CRISPR interference. Science 354, 769-773.

Fullwood, M.J., Liu, M.H., Pan, Y.F., Liu, J., Xu, H., Bin Mohamed, Y., Orlov, Y.L., Velkov, S., Ho, A., Mei, P.H., et al. (2009). An oestrogen-receptor-alpha-bound human chromatin interactome. Nature 462, 58-64.

Fullwood, M.J., and Ruan, Y. (2009). ChIP-based methods for the identification of long-range chromatin interactions. Journal of cellular biochemistry 107, 30-39.

Garg, A.V., and Gaffen, S.L. (2013). IL-17 signaling and A20: A balancing act. Cell Cycle 12, 3459-3460.

Gelfand, J.M., Stern, R.S., Nijsten, T., Feldman, S.R., Thomas, J., Kist, J., Rolstad, T., and Margolis, D.J. (2005). The prevalence of psoriasis in African Americans: Results from a population-based study. J Am Acad Dermatol 52, 23-26.

Gelfand, J.M., and Yeung, H. (2012). Metabolic Syndrome in Patients with Psoriatic Disease. The Journal of rheumatology Supplement 89, 24-28.

Giambartolomei, C., Vukcevic, D., Schadt, E.E., Franke, L., Hingorani, A.D., Wallace, C., and Plagnol, V. (2014). Bayesian Test for Colocalisation between Pairs of Genetic Association Studies Using Summary Statistics. PLoS Genet 10, e1004383.

Gibbs, J.R., van der Brug, M.P., Hernandez, D.G., Traynor, B.J., Nalls, M.A., Lai, S.L., Arepalli, S., Dillman, A., Rafferty, I.P., Troncoso, J., et al. (2010). Abundant Quantitative Trait Loci Exist for DNA Methylation and Gene Expression in Human Brain. PLoS Genet 6.

Gilfillan, G.D., Hughes, T., Sheng, Y., Hjorthaug, H.S., Straub, T., Gervin, K., Harris, J.R., Undlien, D.E., and Lyle, R. (2012). Limitations and possibilities of low cell number ChIP-seq. BMC Genomics 13, 645-645.

Goodfield, M., Hull, S.M., Holland, D., Roberts, G., Wood, E., Reid, S., and Cunliffe, W. (1994). Investigations of the active edge of plaque psoriasis - vascular proliferation precedes changes in epidermal keratin. Br J Dermatol 131, 808-813.

Gottlieb, A.B., Krueger, J.G., Sandberg Lundblad, M., Göthberg, M., and Skolnick, B.E. (2015). First-In-Human, Phase 1, Randomized, Dose-Escalation Trial with Recombinant Anti–IL-20 Monoclonal Antibody in Patients with Psoriasis. PLoS One 10, e0134703.

Grampp, S., Platt, J.L., Lauer, V., Salama, R., Kranz, F., Neumann, V.K., Wach, S., Stohr, C., Hartmann, A., Eckardt, K.U., et al. (2016). Genetic variation at the 8q24.21 renal cancer susceptibility locus affects HIF binding to a MYC enhancer. Nat Commun 7, 13183.

Greb, J.E., Goldminz, A.M., Elder, J.T., Lebwohl, M.G., Gladman, D.D., Wu, J.J., Mehta, N.N., Finlay, A.Y., and Gottlieb, A.B. (2016). Psoriasis. Nature reviews Disease primers 2, 16082.

Gregersen, P.K., Amos, C.I., Lee, A.T., Lu, Y., Remmers, E.F., Kastner, D.L., Seldin, M.F., Criswell, L.A., Plenge, R.M., Holers, V.M., et al. (2009). REL, encoding a member of the NF-kappaB family of transcription factors, is a newly defined risk locus for rheumatoid arthritis. Nat Genet 41, 820-823.

Griffiths, C.E., Powles, A.V., Leonard, J.N., Fry, L., Baker, B.S., and Valdimarsson, H. (1986). Clearance of psoriasis with low dose cyclosporin. British medical journal (Clinical research ed) 293, 731-732.

Griffiths, C.E., Reich, K., Lebwohl, M., van de Kerkhof, P., Paul, C., Menter, A., Cameron, G.S., Erickson, J., Zhang, L., Secrest, R.J., et al. (2015). Comparison of ixekizumab with etanercept or placebo in moderate-to-severe psoriasis (UNCOVER-2 and UNCOVER-3): results from two phase 3 randomised trials. Lancet 386, 541-551.

Griffiths, C.E.M., and Barker, J. (2007). Psoriasis 1 - Pathogenesis and clinical features of psoriasis. Lancet 370, 263-271.

Griffiths, C.E.M., Strober, B.E., van de Kerkhof, P., Ho, V., Fidelus-Gort, R., Yeilding, N., Guzzo, C., Xia, Y.C., Zhou, B., Li, S., et al. (2010). Comparison of Ustekinumab and Etanercept for Moderate-to-Severe Psoriasis. N Engl J Med 362, 118-128.

Grjibovski, A.M., Olsen, A.O., Magnus, P., and Harris, J.R. (2007). Psoriasis in Norwegian twins: contribution of genetic and environmental effects. J Eur Acad Dermatol Venereol 21, 1337-1343.

Grubert, F., Zaugg, J.B., Kasowski, M., Ursu, O., Spacek, D.V., Martin, A.R., Greenside, P., Srivas, R., Phanstiel, D.H., Pekowska, A., et al. (2015). Genetic Control of Chromatin States in Humans Involves Local and Distal Chromosomal Interactions. Cell 162, 1051-1065.

Grundberg, E., Adoue, V., Kwan, T., Ge, B., Duan, Q.L., Lam, K.C.L., Koka, V., Kindmark, A., Weiss, S.T., Tantisira, K., et al. (2011). Global Analysis of the Impact of Environmental Perturbation on cis-Regulation of Gene Expression. PLoS Genet 7.

GTEx Consortium (2017). Genetic effects on gene expression across human tissues. Nature 550, 204-213.

Gudjonsson, J.E., Karason, A., Runarsdottir, E.H., Antonsdottir, A.A., Hauksson, V.B., Jonsson, H.H., Gulcher, J., Stefansson, K., and Valdimarsson, H. (2006). Distinct clinical differences between HLA-Cw*0602 positive and negative psoriasis patients - An analysis of 1019 HLA-C- and HLA-B-typed patients. Journal of Investigative Dermatology 126, 740-745.

Gudjonsson, J.E., Thorarinsson, A.M., Sigurgeirsson, B., Kristinsson, K.G., and Valdimarsson, H. (2003). Streptococcal throat infections and exacerbation of chronic plaque psoriasis: a prospective study. Br J Dermatol 149, 530-534.

Guinot, C., Latreille, J., Perrussel, M., Doss, N., Dubertret, L., and French Psoriasis Group (2009). Psoriasis: characterization of six different clinical phenotypes. Exp Dermatol 18, 712-719.

Guo, H., Fortune, M.D., Burren, O.S., Schofield, E., Todd, J.A., and Wallace, C. (2015). Integration of disease association and eQTL data using a Bayesian colocalisation approach highlights six candidate causal genes in immune-mediated diseases. Hum Mol Genet 24, 3305-3313.

Gupta, R., Ahn, R., Lai, K., Mullins, E., Debbaneh, M., Dimon, M., Arron, S., and Liao, W. (2016). Landscape of long non-coding RNAs in psoriatic and healthy skin. The Journal of investigative dermatology 136, 603-609.

Hagege, H., Klous, P., Braem, C., Splinter, E., Dekker, J., Cathala, G., de Laat, W., and Forne, T. (2007). Quantitative analysis of chromosome conformation capture assays (3C-qPCR). Nature Protocols 2, 1722-1733.

Hao, K., Bosse, Y., Nickle, D.C., Pare, P.D., Postma, D.S., Laviolette, M., Sandford, A., Hackett, T.L., Daley, D., Hogg, J.C., et al. (2012). Lung eQTLs to help reveal the molecular underpinnings of asthma. PLoS Genet 8, e1003029.

Hayashi, M., Nakayama, T., Hirota, T., Saeki, H., Nobeyama, Y., Ito, T., Umezawa, Y., Fukuchi, O., Yanaba, K., Kikuchi, S., et al. (2014). Novel IL36RN gene mutation revealed by analysis of 8 Japanese patients with generalized pustular psoriasis. Journal of Dermatological Science 76, 267-269.

Hebert, H.L., Ali, F.R., Bowes, J., Griffiths, C.E.M., Barton, A., and Warren, R.B. (2012). Genetic susceptibility to psoriasis and psoriatic arthritis: implications for therapy. Br J Dermatol 166, 474-482.

Hebert, H.L., Bowes, J., Smith, R.L., Flynn, E., Parslew, R., Alsharqi, A., McHugh, N.J., Barker, J.N., Griffiths, C.E., Barton, A., et al. (2014a). Identification of Loci Associated with Late-Onset Psoriasis Using Dense Genotyping of Immune-Related Regions. The British journal of dermatology.

Hebert, H.L., Bowes, J., Smith, R.L., McHugh, N.J., Barker, J., Griffiths, C.E.M., Barton, A., and Warren, R.B. (2014b). Polymorphisms in IL-1B Distinguish between Psoriasis of Early and Late Onset. Journal of Investigative Dermatology 134, 1459-1462.

Heintzman, N.D., Hon, G.C., Hawkins, R.D., Kheradpour, P., Stark, A., Harp, L.F., Ye, Z., Lee, L.K., Stuart, R.K., Ching, C.W., et al. (2009). Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature 459, 108-112.

Henseler, T., and Christophers, E. (1985). Psoriasis of early and late onset - characterization of 2 types of psoriasis-vulgaris. J Am Acad Dermatol 13, 450-456.

Heredi, E., Csordas, A., Clemens, M., Adam, B., Gaspar, K., Torocsik, D., Nagy, G., Adany, R., Gaal, J., Remenyik, E., et al. (2013). The prevalence of obesity is increased in patients with late compared with early onset psoriasis. Ann Epidemiol 23, 688-692.

Higgins, E. (2000). Alcohol, smoking and psoriasis. Clin Exp Dermatol 25, 107-110.

Hijnen, D., Knol, E.F., Gent, Y.Y., Giovannone, B., Beijn, S.J., Kupper, T.S., Bruijnzeel-Koomen, C.A., and Clark, R.A. (2013). CD8(+) T cells in the lesional skin of atopic dermatitis and psoriasis patients are an important source of IFN-gamma, IL-13, IL-17, and IL-22. The Journal of investigative dermatology 133, 973-979.

Hinds, D.A., McMahon, G., Kiefer, A.K., Do, C.B., Eriksson, N., Evans, D.M., St Pourcain, B., Ring, S.M., Mountain, J.L., Francke, U., et al. (2013). A genome-wide association meta-analysis of self-reported allergy identifies shared and allergy-specific susceptibility loci. Nat Genet 45, 907-911.

Hollox, E.J., Huffmeier, U., Zeeuwen, P., Palla, R., Lascorz, J., Rodijk-Olthuis, D., van de Kerkhof, P.C.M., Traupe, H., de Jongh, G., den Heijer, M., et al. (2008). Psoriasis is associated with increased beta-defensin genomic copy number. Nature Genetics 40, 23-25.

Hong, J.W., Hendrix, D.A., and Levine, M.S. (2008). Shadow enhancers as a source of evolutionary novelty. Science 321, 1314-1314.

Hoskins, J.W., Ibrahim, A., Emmanuel, M.A., Manmiller, S.M., Wu, Y., O'Neill, M., Jia, J., Collins, I., Zhang, M., Thomas, J.V., et al. (2016). Functional characterization of a chr13q22.1 pancreatic cancer risk locus reveals long-range interaction and allele-specific effects on DIS3 expression. Hum Mol Genet 25, 4726-4738.

Hsu, L., and Armstrong, A.W. (2013). Anti-drug antibodies in psoriasis: a critical evaluation of clinical significance and impact on treatment response. Expert Rev Clin Immunol 9, 949-958.

Hsu, L., Snodgrass, B.T., and Armstrong, A.W. (2014a). Antidrug antibodies in psoriasis: a systematic review. Br J Dermatol 170, 261-273.

Hsu, P.D., Lander, E.S., and Zhang, F. (2014b). Development and applications of CRISPR-Cas9 for genome engineering. Cell 157, 1262-1278.

Huffmeier, U., Uebe, S., Ekici, A.B., Bowes, J., Giardina, E., Korendowych, E., Juneblad, K., Apel, M., McManus, R., Ho, P., et al. (2010). Common variants at TRAF3IP2 are associated with susceptibility to psoriatic arthritis and psoriasis. Nat Genet 42, 996-999.

Hughes, J.R., Roberts, N., McGowan, S., Hay, D., Giannoulatou, E., Lynch, M., De Gobbi, M., Taylor, S., Gibbons, R., and Higgs, D.R. (2014). Analysis of hundreds of cis-regulatory landscapes at high resolution in a single, high-throughput experiment. Nat Genet 46, 205-212.

Hull, S.M., Goodfield, M., Wood, E.J., and Cunliffe, W.J. (1989). Active and inactive edges of psoriatic plaques - identification by tracing and investigation by laser-Doppler flowmetry and immunocytochemical techniques. Journal of Investigative Dermatology 92, 782-785.

Hussain, S., Berki, D.M., Choon, S.E., Burden, A.D., Allen, M.H., Arostegui, J.I., Chaves, A., Duckworth, M., Irvine, A.D., Mockenhaupt, M., et al. (2015). IL36RN mutations define a severe autoinflammatory phenotype of generalized pustular psoriasis. J Allergy Clin Immunol 135, 1067-1070.

Ibrahim, G., Waxman, R., and Helliwell, P.S. (2009). The Prevalence of Psoriatic Arthritis in People With Psoriasis. Arthritis Care Res 61, 1373-1378.

Imamura, M., Takahashi, A., Yamauchi, T., Hara, K., Yasuda, K., Grarup, N., Zhao, W., Wang, X., Huerta-Chagoya, A., Hu, C., et al. (2016). Genome-wide association studies in the Japanese population identify seven novel loci for type 2 diabetes. Nat Commun 7, 10531.

Jager, R., Migliorini, G., Henrion, M., Kandaswamy, R., Speedy, H.E., Heindl, A., Whiffin, N., Carnicer, M.J., Broome, L., Dryden, N., et al. (2015). Capture Hi-C identifies the chromatin interactome of colorectal cancer risk loci. Nat Commun 6, 6178.

Javierre, B.M., Burren, O.S., Wilder, S.P., Kreuzhuber, R., Hill, S.M., Sewitz, S., Cairns, J., Wingett, S.W., Varnai, C., Thiecke, M.J., et al. (2016). Lineage-Specific Genome Architecture Links Enhancers and Non-coding Disease Variants to Target Gene Promoters. Cell 167, 1369-1384.e19.

Javitz, H.S., Ward, M.M., Farber, E., Nail, L., and Vallow, S.G. (2002). The direct cost of care for psoriasis and psoriatic arthritis in the United States. J Am Acad Dermatol 46, 850-860.

Jiang, L., Liu, L., Cheng, Y., Lin, Y., Shen, C., Zhu, C., Yang, S., Yin, X., and Zhang, X. (2015). More heritability probably captured by psoriasis genome-wide association study in Han Chinese. Gene 573, 46-49.

Jiaravuthisan, M.M., Sasseville, D., Vender, R.B., Murphy, F., and Muhn, C.Y. (2007). Psoriasis of the nail: Anatomy, pathology, clinical presentation, and a review of the literature on therapy. J Am Acad Dermatol 57, 1-27.

Jordan, C.T., Cao, L., Roberson, E.D.O., Duan, S.H., Helms, C.A., Nair, R.P., Duffin, K.C., Stuart, P.E., Goldgar, D., Hayashi, G., et al. (2012a). Rare and Common Variants in CARD14, Encoding an Epidermal Regulator of NF-kappaB, in Psoriasis. Am J Hum Genet 90, 796-808.

Jordan, C.T., Cao, L., Roberson, E.D.O., Pierson, K.C., Yang, C.F., Joyce, C.E., Ryan, C., Duan, S.H., Helms, C.A., Liu, Y., et al. (2012b). PSORS2 Is Due to Mutations in CARD14. American Journal of Human Genetics 90, 784-795.

Jostins, L., Ripke, S., Weersma, R.K., Duerr, R.H., McGovern, D.P., Hui, K.Y., Lee, J.C., Schumm, L.P., Sharma, Y., Anderson, C.A., et al. (2012). Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature 491, 119-124.

Kabashima, K., Sakata, D., Nagamachi, M., Miyachi, Y., Inaba, K., and Narumiya, S. (2003). Prostaglandin E-2-EP4 signaling initiates skin immune responses by promoting migration and maturation of Langerhans cells. Nature medicine 9, 744-749.

Kaltoft, K., Thestrup-Pedersen, K., Jensen, J.R., Bisballe, S., and Zachariae, H. (1984). Establishment of T-cell and B-cell lines from patients with mycosis-fungoides. Br J Dermatol 111, 303-308.

Kanda, N., Shimizu, T., Tada, Y., and Watanabe, S. (2007). IL-18 enhances IFN-gamma-induced production of CXCL9, CXCL10, and CXCL11 in human keratinocytes. Eur J Immunol 37, 338-350.

Karason, A., Love, T.J., and Gudbjornsson, B. (2009). A strong heritability of psoriatic arthritis over four generations-the Reykjavik Psoriatic Arthritis Study. Rheumatology 48, 1424-1428.

Keen, J.C., and Moore, H.M. (2015). The Genotype-Tissue Expression (GTEx) Project: Linking Clinical Data with Molecular Analysis to Advance Personalized Medicine. Journal of personalized medicine 5, 22-29.

Kichaev, G., Yang, W.Y., Lindstrom, S., Hormozdiari, F., Eskin, E., Price, A.L., Kraft, P., and Pasaniuc, B. (2014). Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet 10, e1004722.

Kim, K.J., Park, S., Park, Y.H., Ku, S.H., Cho, B.B., Park, B.J., and Kim, K.H. (2014). The Expression and Role of Kruppel-Like Factor 4 in Psoriasis. Ann Dermatol 26, 675-680.

Kirby, B. (2016). Is late-onset psoriasis a distinct subtype of chronic plaque psoriasis? Br J Dermatol 175, 869-870.

Kircher, M., Witten, D.M., Jain, P., O'Roak, B.J., Cooper, G.M., and Shendure, J. (2014). A general framework for estimating the relative pathogenicity of human genetic variants. Nature Genetics 46, 310-315.

Knight, J., Spain, S.L., Capon, F., Hayday, A., Nestle, F.O., Clop, A., Barker, J.N., Weale, M.E., and Trembath, R.C. (2012). Conditional analysis identifies three novel major histocompatibility complex loci associated with psoriasis. Hum Mol Genet 21, 5185-5192.

Koopmann, T.T., Adriaens, M.E., Moerland, P.D., Marsman, R.F., Westerveld, M.L., Lal, S., Zhang, T., Simmons, C.Q., Baczko, I., dos Remedios, C., et al. (2014). Genome-wide identification of expression quantitative trait loci (eQTLs) in human heart. PLoS One 9, e97380.

Korber, A., Mossner, R., Renner, R., Sticht, H., Wilsmann-Theis, D., Schulz, P., Sticherling, M., Traupe, H., and Huffmeier, U. (2013). Mutations in IL36RN in patients with generalized pustular psoriasis. Journal of Investigative Dermatology 133, 2634-2637.

Krijger, P.H., and de Laat, W. (2016). Regulation of disease-associated gene expression in the 3D genome. Nature reviews Molecular cell biology 17, 771-782.

Kundaje, A., Meuleman, W., Ernst, J., Bilenky, M., Yen, A., Heravi-Moussavi, A., Kheradpour, P., Zhang, Z., Wang, J., Ziller, M.J., et al. (2015). Integrative analysis of 111 reference human epigenomes. Nature 518, 317.

Langley, R.G.B., Krueger, G.G., and Griffiths, C.E.M. (2005). Psoriasis: epidemiology, clinical features, and quality of life. Ann Rheum Dis 64, 18-23.

Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357-359.

Lappalainen, T., Sammeth, M., Friedlander, M.R., 't Hoen, P.A., Monlong, J., Rivas, M.A., Gonzalez-Porta, M., Kurbatova, N., Griebel, T., Ferreira, P.G., et al. (2013). Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506-511.

Lebwohl, M. (2003). Psoriasis. Lancet 361, 1197-1204.

Lee, E., Trepicchio, W.L., Oestreicher, J.L., Pittman, D., Wang, F., Chamian, F., Dhodapkar, M., and Krueger, J.G. (2004). Increased expression of interleukin 23 p19 and p40 in lesional skin of patients with psoriasis vulgaris. J Exp Med 199, 125-130.

Lee, Y.A., Ruschendorf, F., Windemuth, C., Schmitt-Egenolf, M., Stadelmann, A., Nurnberg, G., Stander, M., Wienker, T.F., Reis, A., and Traupe, H. (2000). Genomewide scan in German families reveals evidence for a novel psoriasis-susceptibility locus on chromosome 19p13. Am J Hum Genet 67, 1020-1024.

Leonardi, C.L., Kimball, A.B., Papp, K.A., Yeilding, N., Guzzo, C., Wang, Y.H., Li, S., Dooley, L.T., Gordon, K.B., and PHOENIX 1 study investigators (2008). Efficacy and safety of ustekinumab, a human interleukin-12/23 monoclonal antibody, in patients with psoriasis: 76-week results from a randomised, double-blind, placebo-controlled trial (PHOENIX 1). Lancet 371, 1665-1674.

Leung, D.Y.M., Travers, J.B., Giorno, R., Norris, D.A., Skinner, R., Aelion, J., Kazemi, L.V., Kim, M.H., Trumble, A.E., Koth, M., et al. (1995). Evidence for a streptococcal superantigen-driven process in acute guttate psoriasis. J Clin Invest 96, 2106-2112.

Li, B.S., Tsoi, L.C., Swindell, W.R., Gudjonsson, J.E., Tejasvi, T., Johnston, A., Ding, J., Stuart, P.E., Xing, X.Y., Kochkodan, J.J., et al. (2014a). Transcriptome Analysis of Psoriasis in a Large Case-Control Sample: RNA-Seq Provides Insights into Disease Mechanisms. Journal of Investigative Dermatology 134, 1828-1838.

Li, J., and Lu, Z. (2013). Pathway-based drug repositioning using causal inference. BMC bioinformatics 14, S3.

Li, M., Wu, Y., Chen, G., Yang, Y., Zhou, D., Zhang, Z., Zhang, D., Chen, Y., Lu, Z., He, L., et al. (2011). Deletion of the late cornified envelope genes LCE3C and LCE3B is associated with psoriasis in a Chinese population. The Journal of investigative dermatology 131, 1639-1643.

Li, Q., Stram, A., Chen, C., Kar, S., Gayther, S., Pharoah, P., Haiman, C., Stranger, B., Kraft, P., and Freedman, M.L. (2014b). Expression QTL-based analyses reveal candidate causal genes and loci across five tumor types. Hum Mol Genet 23, 5294-5302.

Li, Y., Cheng, H., Zuo, X.B., Sheng, Y.J., Zhou, F.S., Tang, X.F., Tang, H.Y., Gao, J.P., Zhang, Z., He, S.M., et al. (2013). Association analyses identifying two common susceptibility loci shared by psoriasis and systemic lupus erythematosus in the Chinese Han population. J Med Genet 50, 812-818.

Liang, G., Lin, J.C.Y., Wei, V., Yoo, C., Cheng, J.C., Nguyen, C.T., Weisenberger, D.J., Egger, G., Takai, D., Gonzales, F.A., et al. (2004). Distinct localization of histone H3 acetylation and H3-K4 methylation to the transcription start sites in the human genome. Proc Natl Acad Sci U S A 101, 7357-7362.

Lieberman-Aiden, E., van Berkum, N.L., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B.R., Sabo, P.J., Dorschner, M.O., et al. (2009). Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Science 326, 289-293.

Lima, H.C., and Kimball, A.B. (2010). Targeting IL-23: insights into the pathogenesis and the treatment of psoriasis. Indian journal of dermatology 55, 171-175.

Liu, J.Z., van Sommeren, S., Huang, H., Ng, S.C., Alberts, R., Takahashi, A., Ripke, S., Lee, J.C., Jostins, L., Shah, T., et al. (2015). Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat Genet 47, 979-986.

Liu, Y., Helms, C., Liao, W., Zaba, L.C., Duan, S., Gardner, J., Wise, C., Miner, A., Malloy, M.J., Pullinger, C.R., et al. (2008). A genome-wide association study of psoriasis and psoriatic arthritis identifies new disease loci. PLoS Genet 4.

Lonnberg, A.S., Skov, L., Skytthe, A., Kyvik, K.O., Pedersen, O.B., and Thomsen, S.F. (2013). Heritability of psoriasis in a large twin sample. Br J Dermatol 169, 412-416.

Lonsdale, J., Thomas, J., Salvatore, M., Phillips, R., Lo, E., Shad, S., Hasz, R., Walters, G., Garcia, F., Young, N., et al. (2013). The Genotype-Tissue Expression (GTEx) project. Nature Genetics 45, 580-585.

Lopes, R., Korkmaz, G., and Agami, R. (2016). Applying CRISPR-Cas9 tools to identify and characterize transcriptional enhancers. Nat Rev Mol Cell Biol 17, 597-604.

Lowes, M.A., Bowcock, A.M., and Krueger, J.G. (2007). Pathogenesis and therapy of psoriasis. Nature 445, 866-873.

Lowes, M.A., Kikuchi, T., Fuentes-Duculan, J., Cardinale, I., Zaba, L.C., Haider, A.S., Bowman, E.P., and Krueger, J.G. (2008). Psoriasis vulgaris lesions contain discrete populations of Th1 and Th17 T cells. Journal of Investigative Dermatology 128, 1207-1211.

Lysell, J., Padyukov, L., Kockum, I., Nikamo, P., and Stahle, M. (2013). Genetic Association with ERAP1 in Psoriasis Is Confined to Disease Onset after Puberty and Not Dependent on HLA-C*06. J Invest Dermatol 133, 411-417.

Madonna, S., Scarponi, C., Sestito, R., Pallotta, S., Cavani, A., and Albanesi, C. (2010). The IFN-gamma-Dependent Suppressor of Cytokine Signaling 1 Promoter Activity Is Positively Regulated by IFN Regulatory Factor-1 and Sp1 but Repressed by Growth Factor Independence-1b and Kruppel-Like Factor-4, and It Is Dysregulated in Psoriatic Keratinocytes. J Immunol 185, 2467-2481.

Malhotra, N., Narayan, K., Cho, O.H., Sylvia, K.E., Yin, C., Melichar, H., Rashighi, M., Lefebvre, V., Harris, J.E., Berg, L.J., et al. (2013). A network of high-mobility group box transcription factors programs innate interleukin-17 production. Immunity 38, 681-693.

Mallbris, L., Larsson, P., Bergqvist, S., Vingard, E., Granath, F., and Stahle, M. (2005). Psoriasis phenotype at disease onset: Clinical characterization of 400 adult cases. Journal of Investigative Dermatology 124, 499-504.

Manichaikul, A., Mychaleckyj, J.C., Rich, S.S., Daly, K., Sale, M., and Chen, W.M. (2010). Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867-2873.

Marchini, J., and Howie, B. (2010). Genotype imputation for genome-wide association studies. Nat Rev Genet 11, 499-511.

Markham, T., Watson, A., and Rogers, S. (2002). Adverse effects with long-term cyclosporin for severe psoriasis. Clin Exp Dermatol 27, 111-114.

Martin, B.A., Chalmers, R.J.G., and Telfer, N.R. (1996). How great is the risk of further psoriasis following a single episode of acute guttate psoriasis? Arch Dermatol 132, 717-718.

Martin, D.A., Towne, J.E., Kricorian, G., Klekotka, P., Gudjonsson, J.E., Krueger, J.G., and Russell, C.B. (2013). The Emerging Role of IL-17 in the Pathogenesis of Psoriasis: Preclinical and Clinical Findings. Journal of Investigative Dermatology 133, 17-26.

Martin, P., McGovern, A., Massey, J., Schoenfelder, S., Duffus, K., Yarwood, A., Barton, A., Worthington, J., Fraser, P., Eyre, S., et al. (2016). Identifying Causal Genes at the Multiple Sclerosis Associated Region 6q23 Using Capture Hi-C. PLoS One 11.

Martin, P., McGovern, A., Orozco, G., Duffus, K., Yarwood, A., Schoenfelder, S., Cooper, N.J., Barton, A., Wallace, C., Fraser, P., et al. (2015). Capture Hi-C reveals novel candidate genes and complex long-range interactions with related autoimmune risk loci. Nat Commun 6, 10069.

Martinez-Garcia, E., Arias-Santiago, S., Valenzuela-Salas, I., Garrido-Colmenero, C., Garcia-Mellado, V., and Buendia-Eisman, A. (2014). Quality of life in persons living with psoriasis patients. J Am Acad Dermatol 71, 302-307.

Mason, J., Mason, A.R., and Cork, M.J. (2002). Topical preparations for the treatment of psoriasis: a systematic review. The British journal of dermatology 146, 351-364.

Matthews, D., Fry, L., Powles, A., Weber, J., McCarthy, M., Fisher, E., Davies, K., and Williamson, R. (1996). Evidence that a locus for familial psoriasis maps to chromosome 4q. Nature Genetics 14, 231-233.

McCarthy, S., Das, S., Kretzschmar, W., Delaneau, O., Wood, A.R., Teumer, A., Kang, H.M., Fuchsberger, C., Danecek, P., Sharp, K., et al. (2016). A reference panel of 64,976 haplotypes for genotype imputation. Nature Genetics 48, 1279-1283.

McGovern, A., Schoenfelder, S., Martin, P., Massey, J., Duffus, K., Plant, D., Yarwood, A., Pratt, A.G., Anderson, A.E., Isaacs, J.D., et al. (2016). Capture Hi-C identifies a novel causal gene, IL20RA, in the pan-autoimmune genetic susceptibility region 6q23. Genome Biol 17.

McVicker, G., van de Geijn, B., Degner, J.F., Cain, C.E., Banovich, N.E., Raj, A., Lewellen, N., Myrthil, M., Gilad, Y., and Pritchard, J.K. (2013). Identification of Genetic Variants That Affect Histone Modifications in Human Cells. Science 342, 747-749.

Mease, P.J. (2015). Inhibition of interleukin-17, interleukin-23 and the TH17 cell pathway in the treatment of psoriatic arthritis and psoriasis. Curr Opin Rheumatol 27, 127-133.

Menasche, G., Pastural, E., Feldmann, J., Certain, S., Ersoy, F., Dupuis, S., Wulffraat, N., Bianchi, D., Fischer, A., Le Deist, F., et al. (2000). Mutations in RAB27A cause Griscelli syndrome associated with haemophagocytic syndrome. Nat Genet 25, 173-176.

Menter, A., Tyring, S.K., Gordon, K., Kimball, A.B., Leonardi, C.L., Langley, R.G., Strober, B.E., Kaul, M., Gu, Y., Okun, M., et al. (2008). Adalimumab therapy for moderate to severe psoriasis: A randomized, controlled phase III trial. J Am Acad Dermatol 58, 106-115.

Metzker, M.L. (2010). Sequencing technologies - the next generation. Nat Rev Genet 11, 31-46.

Michailidou, K., Hall, P., Gonzalez-Neira, A., Ghoussaini, M., Dennis, J., Milne, R.L., Schmidt, M.K., Chang-Claude, J., Bojesen, S.E., Bolla, M.K., et al. (2013). Large-scale genotyping identifies 41 new loci associated with breast cancer risk. Nat Genet 45, 353-361, 361e1-2.

Miele, A., and Dekker, J. (2008). Long-range chromosomal interactions and gene regulation. Molecular bioSystems 4, 1046-1057.

Mifsud, B., Tavares-Cadete, F., Young, A.N., Sugar, R., Schoenfelder, S., Ferreira, L., Wingett, S.W., Andrews, S., Grey, W., Ewels, P.A., et al. (2015). Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nature Genetics 47, 598-606.

Milavec-Puretic, V., Mance, M., Ceovic, R., and Lipozencic, J. (2011). Drug Induced Psoriasis. Acta Dermatovenerol Croat 19, 39-42.

Montgomery, S.B., Sammeth, M., Gutierrez-Arcelus, M., Lach, R.P., Ingle, C., Nisbett, J., Guigo, R., and Dermitzakis, E.T. (2010). Transcriptome genetics using second generation sequencing in a Caucasian population. Nature 464, 773-777.

Mrowietz, U., and van de Kerkhof, P.C.M. (2011). Management of palmoplantar pustulosis: do we need to change? Br J Dermatol 164, 942-946.

Mumbach, M.R., Rubin, A.J., Flynn, R.A., Dai, C., Khavari, P.A., Greenleaf, W.J., and Chang, H.Y. (2016). HiChIP: Efficient and sensitive analysis of protein-directed genome architecture. Nat Methods 13, 919-922.

Mumbach, M.R., Satpathy, A.T., Boyle, E.A., Dai, C., Gowen, B.G., Cho, S.W., Nguyen, M.L., Rubin, A.J., Granja, J.M., Kazane, K.R., et al. (2017). Enhancer connectome in primary human cells identifies target genes of disease-associated DNA elements. Nat Genet 49, 1602-1612.

Musunuru, K., Strong, A., Frank-Kamenetsky, M., Lee, N.E., Ahfeldt, T., Sachs, K.V., Li, X., Li, H., Kuperwasser, N., Ruda, V.M., et al. (2010). From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature 466, 714-719.

Myers, A.J., Gibbs, J.R., Webster, J.A., Rohrer, K., Zhao, A., Marlowe, L., Kaleem, M., Leung, D., Bryden, L., Nath, P., et al. (2007). A survey of genetic human cortical gene expression. Nature Genetics 39, 1494-1499.

Nair, R.P., Duffin, K.C., Helms, C., Ding, J., Stuart, P.E., Goldgar, D., Gudjonsson, J.E., Li, Y., Tejasvi, T., Feng, B.J., et al. (2009). Genome-wide scan reveals association of psoriasis with IL-23 and NF-kappaB pathways. Nature Genetics 41, 199-204.

Nair, R.P., Henseler, T., Jenisch, S., Stuart, P., Bichakjian, C.K., Lenk, W., Westphal, E., Guo, S.W., Christophers, E., Voorhees, J.J., et al. (1997). Evidence for two psoriasis susceptibility loci (HLA and 17q) and two novel candidate regions (16q and 20p) by genome-wide scan. Hum Mol Genet 6, 1349-1356.

Nair, R.P., Ruether, A., Stuart, P.E., Jenisch, S., Tejasvi, T., Hiremagalore, R., Schreiber, S., Kabelitz, D., Lim, H.W., Voorhees, J.J., et al. (2008). Polymorphisms of the IL12B and IL23R genes are associated with psoriasis. Journal of Investigative Dermatology 128, 1653-1661.

Nakamura, M., Lee, K., Jeon, C., Sekhon, S., Afifi, L., Yan, D., Lee, K., and Bhutani, T. (2017). Guselkumab for the Treatment of Psoriasis: A Review of Phase III Trials. Dermatology and Therapy 7, 281-292.

Nature Editorial (2017). It's all druggable. Nat Genet 49, 169.

Naumova, N., Smith, E.M., Zhan, Y., and Dekker, J. (2012). Analysis of long-range chromatin interactions using Chromosome Conformation Capture. Methods 58, 192-203.

Navarini, A.A., Burden, D., Capon, F., Mrowietz, U., Puig, L., Smith, C.H., and Barker, J. (2016). European consensus statement on phenotypes of pustular psoriasis. Journal of Investigative Dermatology 136, S236-S236.

Nelson, M.R., Tipney, H., Painter, J.L., Shen, J., Nicoletti, P., Shen, Y., Floratos, A., Sham, P.C., Li, M.J., Wang, J., et al. (2015). The support of human genetic evidence for approved drug indications. Nat Genet 47, 856-860.

Nevitt, G.J., and Hutchinson, P.E. (1996). Psoriasis in the community: Prevalence, severity and patients' beliefs and attitudes towards the disease. Br J Dermatol 135, 533-537.

Niehues, H., Tsoi, L.C., van der Krieken, D.A., Jansen, P.A.M., Oortveld, M.A.W., Rodijk-Olthuis, D., van Vlijmen, I., Hendriks, W., Helder, R.W., Bouwstra, J.A., et al. (2017). Psoriasis-Associated Late Cornified Envelope (LCE) Proteins Have Antibacterial Activity. The Journal of investigative dermatology 137, 2380-2388.

Nititham, J., Taylor, K.E., Gupta, R., Chen, H., Ahn, R., Liu, J., Seielstad, M., Ma, A., Bowcock, A.M., Criswell, L.A., et al. (2015). Meta-analysis of the TNFAIP3 region in psoriasis reveals a risk haplotype that is distinct from other autoimmune diseases. Genes and immunity 16, 120-126.

Nora, E.P., Lajoie, B.R., Schulz, E.G., Giorgetti, L., Okamoto, I., Servant, N., Piolot, T., van Berkum, N.L., Meisig, J., Sedat, J., et al. (2012). Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485, 381-385.

Okada, Y., Han, B., Tsoi, L.C., Stuart, P.E., Ellinghaus, E., Tejasvi, T., Chandran, V., Pellett, F., Pollock, R., Bowcock, A.M., et al. (2014a). Fine Mapping Major Histocompatibility Complex Associations in Psoriasis and Its Clinical Subtypes. Am J Hum Genet 95, 162-172.

Okada, Y., Wu, D., Trynka, G., Raj, T., Terao, C., Ikari, K., Kochi, Y., Ohmura, K., Suzuki, A., Yoshida, S., et al. (2014b). Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376-381.

Palmer, C., and Pe’er, I. (2017). Statistical correction of the Winner’s Curse explains replication variability in quantitative trait genome-wide association studies. PLoS Genet 13, e1006916.

Papp, K., Leonardi, C., Menter, A., Thompson, E.H., Milmont, C.E., Kricorian, G., Nirula, A., and Klekotka, P. (2014). Safety and efficacy of brodalumab for psoriasis after 120 weeks of treatment. J Am Acad Dermatol 71, 1183-1190 e1183.

Papp, K.A., Langley, R.G., Lebwohl, M., Krueger, G.G., Szapary, P., Yeilding, N., Guzzo, C., Hsu, M.C., Wang, Y.H., Li, S., et al. (2008). Efficacy and safety of ustekinumab, a human interleukin-12/23 monoclonal antibody, in patients with psoriasis: 52-week results from a randomised, double-blind, placebo-controlled trial (PHOENIX 2). Lancet 371, 1675-1684.

Parisi, R., Symmons, D.P.M., Griffiths, C.E.M., Ashcroft, D.M., and the Identification and Management of Psoriasis and Associated ComorbidiTy (IMPACT) project team (2013). Global Epidemiology of Psoriasis: A Systematic Review of Incidence and Prevalence. J Invest Dermatol 133, 377-385.

Patterson, N., Price, A.L., and Reich, D. (2006). Population structure and eigenanalysis. PLoS Genet 2, 2074-2093.

Paul, C., Lacour, J.P., Tedremets, L., Kreutzer, K., Jazayeri, S., Adams, S., Guindon, C., You, R., Papavassilis, C., and JUNCTURE study group (2015). Efficacy, safety and usability of secukinumab administration by autoinjector/pen in psoriasis: a randomized, controlled trial (JUNCTURE). J Eur Acad Dermatol Venereol 29, 1082-1090.

Pearson, T.A., and Manolio, T.A. (2008). How to interpret a genome-wide association study. Jama 299, 1335-1344.

Perera, G.K., Di Meglio, P., and Nestle, F.O. (2012). Psoriasis. Annu Rev Pathol-Mech Dis 7, 385-422.

Pezzolesi, M.G., Poznik, G.D., Mychaleckyj, J.C., Paterson, A.D., Barati, M.T., Klein, J.B., Ng, D.P., Placha, G., Canani, L.H., Bochenski, J., et al. (2009). Genome-wide association scan for diabetic nephropathy susceptibility genes in type 1 diabetes. Diabetes 58, 1403-1410.

Phipson, B., Lee, S., Majewski, I.J., Alexander, W.S., and Smyth, G.K. (2016). Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. The annals of applied statistics 10, 946-963.

Pickrell, J.K., Marioni, J.C., Pai, A.A., Degner, J.F., Engelhardt, B.E., Nkadori, E., Veyrieras, J.B., Stephens, M., Gilad, Y., and Pritchard, J.K. (2010). Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 464, 768-772.

Piper, J., Elze, M.C., Cauchy, P., Cockerill, P.N., Bonifer, C., and Ott, S. (2013). Wellington: a novel method for the accurate identification of digital genomic footprints from DNase-seq data. Nucleic Acids Res 41, e201.

Pique-Regi, R., Degner, J.F., Pai, A.A., Gaffney, D.J., Gilad, Y., and Pritchard, J.K. (2011). Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res 21, 447-455.

Price, A.L., Patterson, N.J., Plenge, R.M., Weinblatt, M.E., Shadick, N.A., and Reich, D. (2006). Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics 38, 904-909.

Pruim, R.J., Welch, R.P., Sanna, S., Teslovich, T.M., Chines, P.S., Gliedt, T.P., Boehnke, M., Abecasis, G.R., and Willer, C.J. (2010). LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26, 2336-2337.

Puig, L., Julia, A., and Marsal, S. (2014). The pathogenesis and genetics of psoriasis. Actas dermo-sifiliograficas 105, 535-545.

Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M.A., Bender, D., Maller, J., Sklar, P., de Bakker, P.I., Daly, M.J., et al. (2007). PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81, 559-575.

Qin, P.P., Zhang, Q.L., Chen, M.F., Fu, X., Wang, C., Wang, Z.Z., Yu, G.Q., Yu, Y.X., Li, X.Y., Sun, Y.H., et al. (2014). Variant analysis of CARD14 in a Chinese Han population with psoriasis vulgaris and generalized pustular psoriasis. Journal of Investigative Dermatology 134, 2994-2996.

Quaranta, M., Burden, A.D., Griffiths, C.E., Worthington, J., Barker, J.N., Trembath, R.C., and Capon, F. (2009). Differential contribution of CDKAL1 variants to psoriasis, Crohn's disease and type II diabetes. Genes and immunity 10, 654-658.

Quatresooz, P., Hermanns-Le, T., Pierard, G.E., Humbert, P., Delvenne, P., and Pierard-Franchimont, C. (2012). Ustekinumab in Psoriasis Immunopathology with Emphasis on the Th17-IL23 Axis: A Primer. J Biomed Biotechnol.

Quinlan, A.R. (2014). BEDTools: the Swiss-army tool for genome feature analysis. Current Protocols in Bioinformatics 47, 11.12.1-11.12.34.

Rachakonda, T.D., Schupp, C.W., and Armstrong, A.W. (2014). Psoriasis prevalence among adults in the United States. J Am Acad Dermatol 70, 512-516.

Rácz, E., Prens, E.P., Kurek, D., Kant, M., de Ridder, D., Mourits, S., Baerveldt, E.M., Ozgur, Z., van Ijcken, W.F.J., Laman, J.D., et al. (2011). Effective Treatment of Psoriasis with Narrow-Band UVB Phototherapy Is Linked to Suppression of the IFN and Th17 Pathways. Journal of Investigative Dermatology 131, 1547-1558.

Ramani, V., Deng, X.X., Qiu, R.L., Gunderson, K.L., Steemers, F.J., Disteche, C.M., Noble, W.S., Duan, Z.J., and Shendure, J. (2017). Massively multiplex single-cell Hi-C. Nature Methods 14, 263-266.

Ramasamy, A., Trabzuni, D., Guelfi, S., Varghese, V., Smith, C., Walker, R., De, T., Coin, L., de Silva, R., Cookson, M.R., et al. (2014). Genetic variability in the regulation of gene expression in ten regions of the human brain. Nature neuroscience 17, 1418-1428.

Ramos, P.S., Criswell, L.A., Moser, K.L., Comeau, M.E., Williams, A.H., Pajewski, N.M., Chung, S.A., Graham, R.R., Zidovetzki, R., Kelly, J.A., et al. (2011). A Comprehensive Analysis of Shared Loci between Systemic Lupus Erythematosus (SLE) and Sixteen Autoimmune Diseases Reveals Limited Genetic Overlap. PLoS Genet 7.

Rao, S.S., Huntley, M.H., Durand, N.C., Stamenova, E.K., Bochkov, I.D., Robinson, J.T., Sanborn, A.L., Machol, I., Omer, A.D., Lander, E.S., et al. (2014). A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665-1680.

Rapp, S.R., Feldman, S.R., Exum, M.L., Fleischer, A.B., and Reboussin, D.M. (1999). Psoriasis causes as much disability as other major medical diseases. J Am Acad Dermatol 41, 401-407.

Ray-Jones, H., Eyre, S., Barton, A., and Warren, R.B. (2016). One SNP at a Time: Moving beyond GWAS in Psoriasis. The Journal of investigative dermatology 136, 567-573.

Raychaudhuri, S.P., Jiang, W.Y., and Raychaudhuri, S.K. (2008). Revisiting the Koebner phenomenon - Role of NGF and its receptor system in the pathogenesis of psoriasis. American Journal of Pathology 172, 961-971.

Reich, K., Mossner, R., Konig, I.R., Westphal, G., Ziegler, A., and Neumann, C. (2002). Promoter polymorphisms of the genes encoding tumor necrosis factor-alpha and interleukin-1 beta are associated with different subtypes of psoriasis characterized by early and late disease onset. Journal of Investigative Dermatology 118, 155-163.

Reich, K., Nestle, F.O., Papp, K., Ortonne, J.P., Evans, R., Guzzo, C., Li, S., Dooley, L.T., Griffiths, C.E.M., and EXPRESS study investigators (2005). Infliximab induction and maintenance therapy for moderate-to-severe psoriasis: a phase III, multicentre, double-blind trial. Lancet 366, 1367-1374.

Ritchie, M.E., Diyagama, D., Neilson, J., van Laar, R., Dobrovic, A., Holloway, A., and Smyth, G.K. (2006). Empirical array quality weights in the analysis of microarray data. BMC bioinformatics 7, 261.

Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 43, e47.

Roberts, A.R., Vecellio, M., Chen, L., Ridley, A., Cortes, A., Knight, J.C., Bowness, P., Cohen, C.J., and Wordsworth, B.P. (2016). An ankylosing spondylitis-associated genetic variant in the IL23R-IL12RB2 intergenic region modulates enhancer activity and is associated with increased Th1-cell differentiation. Ann Rheum Dis 75, 2150-2156.

Rodriguez, M.L., Kaminska, D., Lappalainen, K., Pihlajamaki, J., Kaikkonen, M.U., and Laakso, M. (2017). Identification and characterization of a FOXA2-regulated transcriptional enhancer at a type 2 diabetes intronic locus that controls GCKR expression in liver cells. Genome Med 9, 14.

Samuelsson, L., Enlund, F., Torinsson, A., Yhr, M., Inerot, A., Enerback, C., Wahlstrom, J., Swanbeck, G., and Martinsson, T. (1999). A genome-wide search for genes predisposing to familial psoriasis by using a stratification approach. Hum Genet 105, 523-529.

Sanseau, P., Agarwal, P., Barnes, M.R., Pastinen, T., Richards, J.B., Cardon, L.R., and Mooser, V. (2012). Use of genome-wide association studies for drug repositioning. Nature Biotechnology 30, 317-320.

Sawcer, S., Hellenthal, G., Pirinen, M., Spencer, C.C.A., Patsopoulos, N.A., Moutsianas, L., Dilthey, A., Su, Z., Freeman, C., Hunt, S.E., et al. (2011). Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature 476, 214-219.

Schadt, E.E., Molony, C., Chudin, E., Hao, K., Yang, X., Lum, P.Y., Kasarskis, A., Zhang, B., Wang, S., Suver, C., et al. (2008). Mapping the genetic architecture of gene expression in human liver. PLoS Biol 6, 1020-1032.

Schlaak, J.F., Buslau, M., Jochum, W., Hermann, E., Girndt, M., Gallati, H., Zumbuschenfelde, K.H.M., and Fleischer, B. (1994). T-cells involved in psoriasis-vulgaris belong to the Th1 subset. Journal of Investigative Dermatology 102, 145-149.

Schmitt, A.D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C.L., Li, Y., Lin, S., Lin, Y., Barr, C.L., et al. (2016). A Compendium of Chromatin Contact Maps Reveals Spatially Active Regions in the Human Genome. Cell Rep 17, 2042-2059.

Schmittgen, T.D., and Livak, K.J. (2008). Analyzing real-time PCR data by the comparative C-T method. Nature Protocols 3, 1101-1108.

Schoenfelder, S., Clay, I., and Fraser, P. (2010a). The transcriptional interactome: gene expression in 3D. Current opinion in genetics & development 20, 127-133.

Schoenfelder, S., Furlan-Magaril, M., Mifsud, B., Tavares-Cadete, F., Sugar, R., Javierre, B.M., Nagano, T., Katsman, Y., Sakthidevi, M., Wingett, S.W., et al. (2015). The pluripotent regulatory circuitry connecting promoters to their long-range interacting elements. Genome Res 25, 582-597.

Schoenfelder, S., Sexton, T., Chakalova, L., Cope, N.F., Horton, A., Andrews, S., Kurukuti, S., Mitchell, J.A., Umlauf, D., Dimitrova, D.S., et al. (2010b). Preferential associations between co-regulated genes reveal a transcriptional interactome in erythroid cells. Nature genetics 42, 53-61.

Schofield, E.C., Carver, T., Achuthan, P., Freire-Pritchett, P., Spivakov, M., Todd, J.A., and Burren, O.S. (2016). CHiCP: a web-based tool for the integrative and interactive visualization of promoter capture Hi-C datasets. Bioinformatics 32, 2511-2513.

Schon, M.P., and Boehncke, W.H. (2005). Medical progress - Psoriasis. N Engl J Med 352, 1899-1912.

Segre, J.A., Bauer, C., and Fuchs, E. (1999). Klf4 is a transcription factor required for establishing the barrier function of the skin. Nature Genetics 22, 356-360.

Seo, M.D., Kang, T.J., Lee, C.H., Lee, A.Y., and Noh, M. (2012). HaCaT Keratinocytes and Primary Epidermal Keratinocytes Have Different Transcriptional Profiles of Cornified Envelope-Associated Genes to T Helper Cell Cytokines. Biomol Ther 20, 171-176.

Shaw, F.L., Cumberbatch, M., Kleyn, C.E., Begum, R., Dearman, R.J., Kimber, I., and Griffiths, C.E.M. (2010). Langerhans Cell Mobilization Distinguishes between Early-Onset and Late-Onset Psoriasis. Journal of Investigative Dermatology 130, 1940-1942.

Sheng, Y., Jin, X., Xu, J.H., Gao, J.P., Du, X.Q., Duan, D.W., Li, B., Zhao, J.H., Zhan, W.Y., Tang, H.Y., et al. (2014). Sequencing-based approach identified three new susceptibility loci for psoriasis. Nat Commun 5.


Shi, X., Jin, L., Dang, E., Chang, T., Feng, Z., Liu, Y., and Wang, G. (2011). IL-17A upregulates keratin 17 expression in keratinocytes through STAT1- and STAT3-dependent mechanisms. The Journal of investigative dermatology 131, 2401-2408.

Simeonov, D.R., Gowen, B.G., Boontanrart, M., Roth, T.L., Gagnon, J.D., Mumbach, M.R., Satpathy, A.T., Lee, Y., Bray, N.L., Chan, A.Y., et al. (2017). Discovery of stimulation-responsive immune enhancers with CRISPR activation. Nature 549, 111-115.

Simon, J.M., Giresi, P.G., Davis, I.J., and Lieb, J.D. (2012). Using formaldehyde-assisted isolation of regulatory elements (FAIRE) to isolate active regulatory DNA. Nat Protoc 7, 256-267.

Simonis, M., Klous, P., Splinter, E., Moshkin, Y., Willemsen, R., de Wit, E., van Steensel, B., and de Laat, W. (2006). Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C). Nature Genetics 38, 1348-1354.

Skol, A.D., Scott, L.J., Abecasis, G.R., and Boehnke, M. (2006). Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nat Genet 38, 209-213.

Smemo, S., Tena, J.J., Kim, K.H., Gamazon, E.R., Sakabe, N.J., Gómez-Marín, C., Aneas, I., Credidio, F.L., Sobreira, D.R., Wasserman, N.F., et al. (2014). Obesity-associated variants within FTO form long-range functional connections with IRX3. Nature 507, 371-375.

Song, L., and Crawford, G.E. (2010). DNase-seq: a high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells. Cold Spring Harbor protocols 2010, pdb prot5384.

Springate, D.A., Parisi, R., Kontopantelis, E., Reeves, D., Griffiths, C.E.M., and Ashcroft, D.M. (2017). Incidence, prevalence and mortality of patients with psoriasis: a UK population-based cohort study. Br J Dermatol 176, 650-658.

Staley, J.R., Blackshaw, J., Kamat, M.A., Ellis, S., Surendran, P., Sun, B.B., Paul, D.S., Freitag, D., Burgess, S., Danesh, J., et al. (2016). PhenoScanner: a database of human genotype-phenotype associations. Bioinformatics 32, 3207-3209.

Stern, R.S. (1997). Psoriasis. Lancet 350, 349-353.

Stern, R.S. (2001). Puva Follow Up Study: The risk of melanoma in association with long-term exposure to PUVA. J Am Acad Dermatol 44, 755-761.

Strange, A., Capon, F., Spencer, C.C.A., Knight, J., Weale, M.E., Allen, M.H., Barton, A., Band, G., Bellenguez, C., Bergboer, J.G.M., et al. (2010). A genome-wide association study identifies new psoriasis susceptibility loci and an interaction between HLA-C and ERAP1. Nature Genetics 42, 985-U106.

Stranger, B.E., Nica, A.C., Forrest, M.S., Dimas, A., Bird, C.P., Beazley, C., Ingle, C.E., Dunning, M., Flicek, P., Koller, D., et al. (2007). Population genomics of human gene expression. Nature Genetics 39, 1217-1224.


Stuart, P.E., Nair, R.P., Ellinghaus, E., Ding, J., Tejasvi, T., Gudjonsson, J.E., Li, Y., Weidinger, S., Eberlein, B., Gieger, C., et al. (2010). Genome-wide association analysis identifies three psoriasis susceptibility loci. Nature Genet 42, 1000-U1125.

Stuart, P.E., Nair, R.P., Tsoi, L.C., Tejasvi, T., Das, S., Kang, H.M., Ellinghaus, E., Chandran, V., Callis-Duffin, K., Ike, R., et al. (2015). Genome-wide Association Analysis of Psoriatic Arthritis and Cutaneous Psoriasis Reveals Differences in Their Genetic Architecture. Am J Hum Genet 97, 816-836.

Suhre, K., Wallaschofski, H., Raffler, J., Friedrich, N., Haring, R., Michael, K., Wasner, C., Krebs, A., Kronenberg, F., Chang, D., et al. (2011). A genome-wide association study of metabolic traits in human urine. Nat Genet 43, 565-569.

Sun, L.D., Cheng, H., Wang, Z.X., Zhang, A.P., Wang, P.G., Xu, J.H., Zhu, Q.X., Zhou, H.S., Ellinghaus, E., Zhang, F.R., et al. (2010). Association analyses identify six new psoriasis susceptibility loci in the Chinese population. Nature Genetics 42, 1005-U1132.

Swindell, W.R., Sarkar, M.K., Stuart, P.E., Voorhees, J.J., Elder, J.T., Johnston, A., and Gudjonsson, J.E. (2015). Psoriasis drug development and GWAS interpretation through in silico analysis of transcription factor binding sites. Clinical and translational medicine 4, 13.

Swindell, W.R., Stuart, P.E., Sarkar, M.K., Voorhees, J.J., Elder, J.T., Johnston, A., and Gudjonsson, J.E. (2014). Cellular dissection of psoriasis for transcriptome analyses and the post-GWAS era. BMC Med Genomics 7.

Szklarczyk, D., Franceschini, A., Wyder, S., Forslund, K., Heller, D., Huerta-Cepas, J., Simonovic, M., Roth, A., Santos, A., Tsafou, K.P., et al. (2015). STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43, D447-452.

Szumilas, M. (2010). Explaining odds ratios. Journal of the Canadian Academy of Child and Adolescent Psychiatry = Journal de l'Academie canadienne de psychiatrie de l'enfant et de l'adolescent 19, 227-229.

Tang, H., Jin, X., Li, Y., Jiang, H., Tang, X., Yang, X., Cheng, H., Qiu, Y., Chen, G., Mei, J., et al. (2014). A large-scale screen for coding variants predisposing to psoriasis. Nat Genet 46, 45-50.

Teunissen, M.B.M., Koomen, C.W., Malefyt, R.D., Wierenga, E.A., and Bos, J.D. (1998). Interleukin-17 and interferon-gamma synergize in the enhancement of proinflammatory cytokine production by human keratinocytes. Journal of Investigative Dermatology 111, 645-649.

Thaci, D., Blauvelt, A., Reich, K., Tsai, T.F., Vanaclocha, F., Kingo, K., Ziv, M., Pinter, A., Hugot, S., You, R., et al. (2015). Secukinumab is superior to ustekinumab in clearing skin of subjects with moderate to severe plaque psoriasis: CLEAR, a randomized controlled trial. J Am Acad Dermatol.

The Wellcome Trust Case Control Consortium (2007). Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661.

Theodorakopoulou, E., Yiu, Z.Z., Bundy, C., Chularojanamontri, L., Gittins, M., Jamieson, L.A., Motta, L., Warren, R.B., and Griffiths, C.E. (2016). Early- and late-onset psoriasis: a cross-sectional clinical and immunocytochemical investigation. The British journal of dermatology 175, 1038-1044.

Tracey, D., Klareskog, L., Sasso, E.H., Salfeld, J.G., and Tak, P.P. (2008). Tumor necrosis factor antagonist mechanisms of action: A comprehensive review. Pharmacol Ther 117, 244-279.

Trembath, R.C., Clough, R.L., Rosbotham, J.L., Jones, A.B., Camp, R.D.R., Frodsham, A., Browne, J., Barber, R., Terwilliger, J., Lathrop, G.M., et al. (1997). Identification of a major susceptibility locus on chromosome 6p and evidence for further disease loci revealed by a two stage genome-wide search in psoriasis. Hum Mol Genet 6, 813-820.

Tsoi, L.C., Spain, S.L., Ellinghaus, E., Stuart, P.E., Capon, F., Knight, J., Tejasvi, T., Kang, H.M., Allen, M.H., Lambert, S., et al. (2015). Enhanced meta-analysis and replication studies identify five new psoriasis susceptibility loci. Nat Commun 6, 7001.

Tsoi, L.C., Spain, S.L., Knight, J., Ellinghaus, E., Stuart, P.E., Capon, F., Ding, J., Li, Y.M., Tejasvi, T., Gudjonsson, J.E., et al. (2012). Identification of 15 new psoriasis susceptibility loci highlights the role of innate immunity. Nature Genetics 44, 1341-1348.

Tsoi, L.C., Stuart, P.E., Tian, C., Gudjonsson, J.E., Das, S., Zawistowski, M., Ellinghaus, E., Barker, J.N., Chandran, V., Dand, N., et al. (2017). Large scale meta-analysis characterizes genetic architecture for common psoriasis associated variants. Nat Commun 8, 8.

Turner, S., Armstrong, L.L., Bradford, Y., Carlson, C.S., Crawford, D.C., Crenshaw, A.T., de Andrade, M., Doheny, K.F., Haines, J.L., Hayes, G., et al. (2011). Quality control procedures for genome-wide association studies. Current protocols in human genetics / editorial board, Jonathan L Haines [et al] Chapter 1, Unit1.19.

Tyring, S., Gottlieb, A., Papp, K., Gordon, K., Leonardi, C., Wang, A., Lalla, D., Woolley, M., Jahreis, A., Zitnik, R., et al. (2006). Etanercept and clinical outcomes, fatigue, and depression in psoriasis: double-blind placebo-controlled randomised phase III trial. Lancet 367, 29-35.

Uyemura, K., Yamamura, M., Fivenson, D.F., Modlin, R.L., and Nickoloff, B.J. (1993). The cytokine network in lesional and lesion-free psoriatic skin is characterized by a T-helper type-1 cell-mediated response. Journal of Investigative Dermatology 101, 701-705.

Valdimarsson, H., Thorleifsdottir, R.H., Sigurdardottir, S.L., Gudjonsson, J.E., and Johnston, A. (2009). Psoriasis - as an autoimmune disease caused by molecular mimicry. Trends Immunol 30, 494-501.

Van Joost, T., Heule, F., Stolz, E., and Beukers, R. (1986). Short-term use of cyclosporin A in severe psoriasis. The British journal of dermatology 114, 615-620.

Veal, C.D., Clough, R.L., Barber, R.C., Mason, S., Tillman, D., Ferry, B., Jones, A.B., Ameen, M., Balendran, N., Powis, S.H., et al. (2001). Identification of a novel psoriasis susceptibility locus at 1p and evidence of epistasis between PSORS1 and candidate loci. J Med Genet 38, 7-13.

Veyrieras, J.B., Kudaravalli, S., Kim, S.Y., Dermitzakis, E.T., Gilad, Y., Stephens, M., and Pritchard, J.K. (2008). High-Resolution Mapping of Expression-QTLs Yields Insight into Human Gene Regulation. PLoS Genet 4.

Visscher, P.M., Brown, M.A., McCarthy, M.I., and Yang, J. (2012). Five Years of GWAS Discovery. Am J Hum Genet 90, 7-24.

Visser, M., Palstra, R.J., and Kayser, M. (2015). Allele-specific transcriptional regulation of IRF4 in melanocytes is mediated by chromatin looping of the intronic rs12203592 enhancer to the IRF4 promoter. Hum Mol Genet 24, 2649-2661.

Wagner, E.F., Schonthaler, H.B., Guinea-Viniegra, J., and Tschachler, E. (2010). Psoriasis: what we have learned from mouse models. Nat Rev Rheumatol 6, 704-714.

Wang, S., Wen, F., Tessneer, K.L., and Gaffney, P.M. (2016). TALEN-mediated enhancer knockout influences TNFAIP3 gene expression and mimics a molecular phenotype associated with systemic lupus erythematosus. Genes and immunity 17, 165-170.

Wang, S.F., Wen, F., Wiley, G.B., Kinter, M.T., and Gaffney, P.M. (2013). An Enhancer Element Harboring Variants Associated with Systemic Lupus Erythematosus Engages the TNFAIP3 Promoter to Influence A20 Expression. PLoS Genet 9, 10.

Wang, Z., Gerstein, M., and Snyder, M. (2009). RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10, 57-63.

Wendling, D., Balblanc, J.C., Briançon, D., Brousse, A., Lohse, A., Deprez, P., Humbert, P., and Aubin, F. (2008). Onset or exacerbation of cutaneous psoriasis during TNF alpha antagonist therapy. Joint Bone Spine 75, 315-318.

Westra, H.-J., Peters, M.J., Esko, T., Yaghootkar, H., Schurmann, C., Kettunen, J., Christiansen, M.W., Fairfax, B.P., Schramm, K., Powell, J.E., et al. (2013). Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat Genet 45, 1238-1243.

Willan, R. (1808). On cutaneous diseases. Vol. 1 (London: J. Johnson).

Wingett, S., Ewels, P., Furlan-Magaril, M., Nagano, T., Schoenfelder, S., Fraser, P., and Andrews, S. (2015). HiCUP: pipeline for mapping and processing Hi-C data. F1000Research 4, 1310.

Wong, T., Hsu, L., and Liao, W. (2013). Phototherapy in psoriasis: a review of mechanisms of action. Journal of cutaneous medicine and surgery 17, 6-12.

Wrone-Smith, T., and Nickoloff, B.J. (1996). Dermal injection of immunocytes induces psoriasis. J Clin Invest 98, 1878-1887.

Yao, C., Sakata, D., Esaki, Y., Li, Y., Matsuoka, T., Kuroiwa, K., Sugimoto, Y., and Narumiya, S. (2009). Prostaglandin E2-EP4 signaling promotes immune inflammation through Th1 cell differentiation and Th17 cell expansion. Nature medicine 15, 633-640.

Ye, L., Lv, C.Z., Man, G., Song, S.P., Elias, P.M., and Man, M.Q. (2014). Abnormal Epidermal Barrier Recovery in Uninvolved Skin Supports the Notion of an Epidermal Pathogenesis of Psoriasis. Journal of Investigative Dermatology 134, 2843-2846.


Yin, X., Low, H.Q., Wang, L., Li, Y., Ellinghaus, E., Han, J., Estivill, X., Sun, L., Zuo, X., Shen, C., et al. (2015). Genome-wide meta-analysis identifies multiple novel associations and ethnic heterogeneity of psoriasis susceptibility. Nat Commun 6, 6916.

Yip, S.Y. (1984). The prevalence of psoriasis in the Mongoloid race. J Am Acad Dermatol 10, 965-968.

Yu, F., Li, J., Chen, H., Fu, J., Ray, S., Huang, S., Zheng, H., and Ai, W. (2011). Kruppel-like factor 4 (KLF4) is required for maintenance of breast cancer stem cells and for cell migration and invasion. Oncogene 30, 2161-2172.

Zaitlen, N., and Kraft, P. (2012). Heritability in the genome-wide association era. Hum Genet 131, 1655-1664.

Zeggini, E., Panoutsopoulou, K., Southam, L., Rayner, N.W., Day-Williams, A.G., Lopes, M.C., Boraska, V., Esko, T., Evangelou, E., Hofman, A., et al. (2012). Identification of new susceptibility loci for osteoarthritis (arcOGEN): a genome-wide association study. Lancet 380, 815-823.

Zeller, T., Wild, P., Szymczak, S., Rotival, M., Schillert, A., Castagne, R., Maouche, S., Germain, M., Lackner, K., Rossmann, H., et al. (2010). Genetics and Beyond - The Transcriptome of Human Monocytes and Disease Susceptibility. PLoS One 5.

Zerbino, D.R., Wilder, S.P., Johnson, N., Juettemann, T., and Flicek, P.R. (2015). The Ensembl Regulatory Build. Genome Biol 16, 8.

Zhang, X.J., He, P.P., Wang, Z.X., Zhang, J., Li, Y.B., Wang, H.Y., Wei, S.C., Chen, S.Y., Xu, S.J., Jin, L., et al. (2002). Evidence for a major psoriasis susceptibility locus at 6p21(PSORS1) and a novel candidate region at 4q31 by genome-wide scan in Chinese Hans. Journal of Investigative Dermatology 119, 1361-1366.

Zhang, X.J., Huang, W., Yang, S., Sun, L.D., Zhang, F.Y., Zhu, Q.X., Zhang, F.R., Zhang, C., Du, W.H., Pu, X.M., et al. (2009). Psoriasis genome-wide association study identifies susceptibility variants within LCE gene cluster at 1q21. Nature Genetics 41, 205-210.

Zhang, Y.B., Hu, J., Zhang, J., Zhou, X., Li, X., Gu, C., Liu, T., Xie, Y., Liu, J., Gu, M., et al. (2016). Genome-wide association study identifies multiple susceptibility loci for craniofacial microsomia. Nat Commun 7, 10605.

Zhou, X., Lowdon, R.F., Li, D., Lawson, H.A., Madden, P.A.F., Costello, J.F., and Wang, T. (2013). Exploring long-range genome interaction data using the WashU Epigenome Browser. Nat Methods 10.

Zuo, X., Sun, L., Yin, X., Gao, J., Sheng, Y., Xu, J., Zhang, J., He, C., Qiu, Y., Wen, G., et al. (2015). Whole-exome SNP array identifies 15 new susceptibility loci for psoriasis. Nat Commun 6, 6793.


5. APPENDIX


Table 23: Sources for standard protocols used for kits in this thesis

Supplier | Name of kit | Purpose | Protocol URL | Date accessed
Agilent Technologies | High Sensitivity DNA Kit | Assessing quantity and size distribution of DNA libraries | http://www.genomics.agilent.com/literature.jsp?crumbAction=push&tabId=AG-PR-1040&contentType=User+Manual | 10/05/17
Agilent Technologies | RNA 6000 Nano Kit | Assessing quantity and integrity of RNA samples | http://www.genomics.agilent.com/literature.jsp?crumbAction=push&tabId=AG-PR-1172&contentType=User+Manual | 10/05/17
Agilent Technologies | SureSelectXT Target Enrichment System for Illumina Paired-End Multiplexed Sequencing Library | Hybridising capture baits to Hi-C libraries | http://www.agilent.com/cs/library/usermanuals/Public/G7530-90000.pdf | 10/05/17
Illumina | Infinium® HTS Assay Protocol | Genotyping DNA samples | http://support.illumina.com/downloads/infinium_hts_assay_protocol_user_guide_15045738_a.html | 10/05/17
Illumina | Whole-Genome Gene Expression Direct Hybridization Assay | Measuring whole-genome expression | https://support.illumina.com/downloads/wggex_direct_hybridization_assay_guide_(11322355_a).html | 10/05/17
Kapa Biosystems | KAPA SYBR® FAST Universal qPCR Kit | Accurate quantification of CHi-C libraries | https://www.kapabiosystems.com/product-applications/products/next-generation-sequencing-2/library-quantification/ | 10/05/17
Life Technologies | Purelink® PCR Purification Kit | Purifying DNA | https://www.thermofisher.com/order/catalog/product/K310001 | 23/05/17
Life Technologies | TotalPrep® RNA Amplification Kit | Amplifying and biotinylating RNA | https://www.thermofisher.com/order/catalog/product/AMIL1791 | 10/05/17
Qiagen | MinElute PCR Purification Kit | Purifying DNA and eluting in a small volume | https://www.qiagen.com/gb/resources/resourcedetail?id=fa2ed17d-a5e8-4843-80c1-3d0ea6c2287d&lang=en | 23/05/17
Qiagen | RNeasy Mini Kit | Extracting RNA | https://www.qiagen.com/gb/resources/resourcedetail?id=14e7cf6e-521a-4cf7-8cbc-bf9f6fa33e24&lang=en | 10/05/17
Qubit | dsDNA BR Assay Kit | Quantifying DNA samples | https://www.thermofisher.com/order/catalog/product/Q32850 | 10/05/17
Qubit | RNA HS Assay Kit | Quantifying RNA samples | https://www.thermofisher.com/order/catalog/product/Q32852 | 10/05/17


Table 24: Quality control measures for datasets used in the LOP GWAS

Criteria for excluding samples or variants (datasets: PsA, HCE, BSTOP, arcOGEN)
QC measure | Exclusion criteria
Sample call rate | Call rate < 98%; Call rate < 99%; Call rate < 97%
Sample heterozygosity | Heterozygosity rate ± 3 SDs from mean; Heterozygosity rate ± 4 SDs from mean; Heterozygosity rate < 30.5% or > 34%
Variant MAF | MAF < 1%; MAF < 1%
Variant call rate | Call rate < 98%; Call rate < 99%; Either: MAF ≥ 5% and call rate < 95%, or MAF < 5% and call rate < 99%
HWE p-value | P < 1 x 10^-3; P < 7.5 x 10^-8; P < 1 x 10^-4
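The MAF-dependent variant call rate criterion listed in Table 24 can be read as a two-branch rule. The following is a minimal sketch of that rule only, not the pipeline used in the thesis; the function name `fails_variant_call_rate` and its parameter names are hypothetical, and the default thresholds are taken directly from the table entry.

```python
def fails_variant_call_rate(
    maf: float,
    call_rate: float,
    common_maf: float = 0.05,            # MAF boundary from Table 24 (5%)
    common_min_call_rate: float = 0.95,  # threshold for MAF >= 5%
    rare_min_call_rate: float = 0.99,    # threshold for MAF < 5%
) -> bool:
    """Return True if a variant would be excluded under the MAF-dependent
    call rate rule: (MAF >= 5% and call rate < 95%) or
    (MAF < 5% and call rate < 99%)."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    if not 0.0 <= call_rate <= 1.0:
        raise ValueError("call rate must lie in [0, 1]")
    if maf >= common_maf:
        return call_rate < common_min_call_rate
    return call_rate < rare_min_call_rate


if __name__ == "__main__":
    # A common variant tolerates a lower call rate than a rarer one.
    print(fails_variant_call_rate(maf=0.10, call_rate=0.96))
    print(fails_variant_call_rate(maf=0.02, call_rate=0.96))
```

The stricter threshold for lower-frequency variants reflects the table entry as written; the separate MAF < 1% exclusion in Table 24 is not applied by this sketch.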

[Histogram: x-axis, Age (40 to 100+ in 5-year bins); y-axis, Number of individuals (0 to 200)]

Figure 56: Histogram of ages of patients in the LOP cohort


Table 25: Predictive scores from RegulomeDB, adapted from Boyle et al. (2012).

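Score | Supporting data | Functional prediction
1a | eQTL, TF binding, matched TF motif, matched DNase footprint and DNase peak | Affects protein binding and targets gene expression
1b | eQTL, TF binding, any motif, DNase footprint and DNase peak | Affects protein binding and targets gene expression
1c | eQTL, TF binding, matched TF motif and DNase peak | Affects protein binding and targets gene expression
1d | eQTL, TF binding, any motif and DNase peak | Affects protein binding and targets gene expression
1e | eQTL, TF binding and matched TF motif | Affects protein binding and targets gene expression
1f | eQTL and TF binding/DNase peak | Affects protein binding and targets gene expression
2a | TF binding, matched TF motif, matched DNase footprint and DNase peak | Affects protein binding
2b | TF binding, any motif, DNase footprint and DNase peak | Affects protein binding
2c | TF binding, matched TF motif and DNase peak | Affects protein binding
3a | TF binding, any motif and DNase peak | Possibly affects protein binding
3b | TF binding and matched TF motif | Possibly affects protein binding
4 | TF binding and DNase peak | Does not affect protein binding
5 | TF binding or DNase peak | Does not affect protein binding
6 | Motif hit | Does not affect protein binding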
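The score-to-prediction grouping in Table 25 can be expressed as a simple lookup. The sketch below is illustrative only and is not RegulomeDB's own software; the names `VALID_SCORES`, `CATEGORY_BY_TIER` and `functional_prediction` are hypothetical, and it assumes that scores 4 to 6 all share the final category in the table.

```python
# Scores listed in Table 25 (adapted from Boyle et al., 2012).
VALID_SCORES = {
    "1a", "1b", "1c", "1d", "1e", "1f",
    "2a", "2b", "2c",
    "3a", "3b",
    "4", "5", "6",
}

# Functional prediction keyed by the leading digit of the score.
# Assumption: scores 4, 5 and 6 share the last category.
CATEGORY_BY_TIER = {
    "1": "Affects protein binding and targets gene expression",
    "2": "Affects protein binding",
    "3": "Possibly affects protein binding",
    "4": "Does not affect protein binding",
    "5": "Does not affect protein binding",
    "6": "Does not affect protein binding",
}


def functional_prediction(score: str) -> str:
    """Map a RegulomeDB-style score (e.g. '2b') to its Table 25 category."""
    normalised = score.strip().lower()
    if normalised not in VALID_SCORES:
        raise ValueError(f"unrecognised score: {score!r}")
    return CATEGORY_BY_TIER[normalised[0]]


if __name__ == "__main__":
    for s in ("1f", "2b", "3a", "6"):
        print(s, "->", functional_prediction(s))
```

Keying on the leading digit keeps the lookup short, while the explicit set of valid scores rejects values such as "3c" that do not appear in the table.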

Table 26: Primers used in ChIP experiments

Target | Primer name | Sequence
rs10217259 (Enhancer 1) | rs10217259_F1 | 5’ – TGTTCATGGGGTAACTGAGGA – 3’
rs10217259 (Enhancer 1) | rs10217259_R1 | 5’ – CAAGGAGGATGAGTCAAGGC – 3’
rs6477612/rs6477613 (Enhancer 2) | rs6477612_F1 | 5’ – GATTACATCAGCAGCCAGGC – 3’
rs6477612/rs6477613 (Enhancer 2) | rs6477612_R1 | 5’ – TGCTCCAAGTACACCAAGAAAC – 3’
rs4978343 (Enhancer 3) | rs4978343_F2 | 5’ – TCCAGCACTTATGTAAAACTGCT – 3’
rs4978343 (Enhancer 3) | rs4978343_R2 | 5’ – TCGGTGTGGTGGAAAGACAT – 3’
KLF4 promoter | CHIP_KLF4_PROM_F1 | 5’ – CCTGAACCCCAAAGTCAACG – 3’
KLF4 promoter | CHIP_KLF4_PROM_R1 | 5’ – CGGACCTACTTACTCGCCTT – 3’
GAPDH gene | H3K4me1_POS_F | 5’ – GGCTCCCACCTTTCTCATCC – 3’
GAPDH gene | H3K4me1_POS_R | 5’ – GGCCATCCACAGTCTTCTGG – 3’
GAPDH promoter | H3K4me1_NEG_F | 5’ – TACTAGCGGTTTTACGGGCG – 3’
GAPDH promoter | H3K4me1_NEG_R | 5’ – TCGAACAGGAGGAGCAGAGAGCGA – 3’
GAPDH promoter | H3K27ac_POS_F | 5’ – TCGACAGTCAGCCGCATCTTC – 3’
GAPDH promoter | H3K27ac_POS_R | 5’ – CTAGCCTCCCGGGTTTCTCT – 3’
MYOD gene | H3K27ac_NEG_F | 5’ – CCGCCTGAGCAAAGTAAATGAG – 3’
MYOD gene | H3K27ac_NEG_R | 5’ – TGGGCAACCGCTGGTTTGGAT – 3’


Table 27: Primers used to test 3C and Hi-C libraries for the presence of long-range and short-range interactions

Interaction | Locus | Primer name | Primer sequence | Interaction length (bp) | Product length (bp)
Long-range 1 | MYC | Human_Myc_G2 | 5’ - GGAGAACCGGTAATGGCAAA - 3’ | 538,098 | 187
Long-range 1 | MYC | Human_Roger_1R | 5’ - TGCCTGATGGATAGTGCTTTC - 3’ | 538,098 | 187
Long-range 2 | MYC | Human_Myc_G2 | 5’ - GGAGAACCGGTAATGGCAAA - 3’ | 1,820,734 | 167
Long-range 2 | MYC | Human_Myc_O3 | 5’ - AAAATGCCCATTTCCTTCTCC - 3’ | 1,820,734 | 167
Long-range 3 | MYC | Human_Myc_G2 | 5’ - GGAGAACCGGTAATGGCAAA - 3’ | 513,108 | 190
Long-range 3 | MYC | Human_Myc_540 | 5’ - GCATTCTGAAACCTGAATGCTC - 3’ | 513,108 | 190
Short-range 1 | AHF | AHF64_Dekker | 5’ - GCATGCATTAGCCTCTGCTGTTCTCTGAAATC - 3’ | 6,288 | 240
Short-range 1 | AHF | AHF66_Dekker | 5’ - CTGTCCAAGTACATTCCTGTTCACAAACCC - 3’ | 6,288 | 240
Short-range 2 | 9q31 (KLF4) | Enhancer2_rs6477612_F1 | 5’ - TCAGGGTGTCTAGGAGGTCT - 3’ | 5,614 | 283
Short-range 2 | 9q31 (KLF4) | SR_rs2417842_R1 | 5’ - CCAGTACGAGGTCCATCTCC - 3’ | 5,614 | 283

Table 28: BACs used for generating 3C control libraries

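Locus | BAC clone ID | Chromosomal location
9q31 (KLF4) | RP11-795I4 | chr9: 110168556-110338106
9q31 (KLF4) | CTD-2258L2 | chr9: 110337626-110513771
9q31 (KLF4) | RP11-762G1 | chr9: 10431157-110648879
9q31 (KLF4) | RP11-358A7 | chr9: 110626528-110814450
9q31 (KLF4) | RP11-779J13 | chr9: 110805350-110975922
9q31 (KLF4) | CTD-2517A7 | chr9: 110957827-111155999
9q31 (KLF4) | RP11-454G15 | chr9: 111116889-111315982
9q31 (KLF4) | RP11-585H18 | chr9: 111308894-111469280
9q31 (KLF4) | CTD-2333H8 | chr9: 111458514-111576442
9q31 (KLF4) | CTD-2649N21 | chr9: 111543044-111776165
9q31 (KLF4) | RP11-316H9 | chr9: 111714046-111889073
6q23 (TNFAIP3) – generated by Dr Amanda McGovern | RP11-162O9 | chr6: 137286536-137450559
6q23 (TNFAIP3) – generated by Dr Amanda McGovern | RP11-1058B18 | chr6: 137425741-137630056
6q23 (TNFAIP3) – generated by Dr Amanda McGovern | CTD-2071P12 | chr6: 137625360-137804064
6q23 (TNFAIP3) – generated by Dr Amanda McGovern | CTD-3175H21 | chr6: 137799448-137879516
6q23 (TNFAIP3) – generated by Dr Amanda McGovern | CTD-2374P13 | chr6: 137867732-137992452
6q23 (TNFAIP3) – generated by Dr Amanda McGovern | CTD-2511N24 | chr6: 137916872-138137422
6q23 (TNFAIP3) – generated by Dr Amanda McGovern | CTD-2244P2 | chr6: 138128388-138273180
6q23 (TNFAIP3) – generated by Dr Amanda McGovern | CTD-2106E2 | chr6: 138268647-138400874
6q23 (TNFAIP3) – generated by Dr Amanda McGovern | RP11-1023E5 | chr6: 138393699-138591433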


Table 29: Primers used to test identity of BACs in the 9q31 (KLF4) locus
Two primer sets were used to test each BAC identity: pair 1 (F1/R1) targeted a region at the 5’ end of the BAC sequence whilst pair 2 (F2/R2) targeted a region at the 3’ end.

Target | Primer name | Sequence
RP11-795I4 | RP11-795I4_F1 | AAGCTACTCCAGACACAGGG
RP11-795I4 | RP11-795I4_R1 | GATGTGCATTACGGTGGTCC
RP11-795I4 | RP11-795I4_F2 | CATGCACCGTCTAATGGCAA
RP11-795I4 | RP11-795I4_R2 | CTTGAATGTGGGAGGCGAAG
CTD-2258L2 | CTD-2258L2_F1 | CCATGAGACCAGAGCAAGGA
CTD-2258L2 | CTD-2258L2_R1 | GCTGCCCTCAAATATTCCGG
CTD-2258L2 | CTD-2258L2_F2 | TTCCACTCTCATTGACCGCT
CTD-2258L2 | CTD-2258L2_R2 | GAAGTGGGGCTCTGGGTATT
RP11-762G1 | RP11-762G1_F1 | TCAGCGAGAGACAGACCTTC
RP11-762G1 | RP11-762G1_R1 | TGAAAGGCTTAGGGGTCCAG
RP11-762G1 | RP11-762G1_F2 | GACTCAACTGGAAGCATGGC
RP11-762G1 | RP11-762G1_R2 | GGTATGGGGTGAAGTGTGGT
RP11-358A7 | RP11-358A7_F1 | CTTAGAGCAGGCCCAGACTT
RP11-358A7 | RP11-358A7_R1 | AAGCGTCTGTGAAGGCTACT
RP11-358A7 | RP11-358A7_F2 | TCTGAAGGCAAGAGACGAGG
RP11-358A7 | RP11-358A7_R2 | TCCTGTTTCCTCTGACACCC
RP11-779J13 | RP11-779J13_F1 | AGTGGAGAAGGAGGCAGAAC
RP11-779J13 | RP11-779J13_R1 | CATCAAAGACCCTGGCACTC
RP11-779J13 | RP11-779J13_F2 | AGGCAAGTGAAGCTCTCTGT
RP11-779J13 | RP11-779J13_R2 | AAGGCAGTCACTTCAGCTCA
CTD-2517A7 | CTD-2517A7_F1 | TTCGTCACCACTCCCGTTTA
CTD-2517A7 | CTD-2517A7_R1 | TTGCCTCCCACTCATCACAT
CTD-2517A7 | CTD-2517A7_F2 | CCAAGTGTGTGGCTCAGAAC
CTD-2517A7 | CTD-2517A7_R2 | GAAACCCTCTGAGCCTCTGT
RP11-454G15 | RP11-454G15_F1 | CACTACAAAGGCCCCTCAGT
RP11-454G15 | RP11-454G15_R1 | AAGGCAGTGAGGATGGTTCA
RP11-454G15 | RP11-454G15_F2 | GGACACAAAGCAGCAAGGAA
RP11-454G15 | RP11-454G15_R2 | TGCTGGGTCCTGAAACTGAT
RP11-585H18 | RP11-585H18_F1 | TCCCTCCTGCATTCTGAGTG
RP11-585H18 | RP11-585H18_R1 | TGGATCTGTGGTCCCTAGGA
RP11-585H18 | RP11-585H18_F2 | AGGGTAGTCAGGGTCGTACT
RP11-585H18 | RP11-585H18_R2 | ACAGACCCCAGAACTCAGTG
CTD-2333H8 | CTD-2333H8_F1 | TGACAGTTCCCAGAAGCAGT
CTD-2333H8 | CTD-2333H8_R1 | TGCTGGTTGTCTTTGCCATC
CTD-2333H8 | CTD-2333H8_F2 | TGACCCCATCTCTCTCTCCA
CTD-2333H8 | CTD-2333H8_R2 | AGTGTGACCAGTGCTCATGA
CTD-2649N21 | CTD-2649N21_F1 | TGTACAAAGGGGTCACAGCT
CTD-2649N21 | CTD-2649N21_R1 | TTCCTGCCTAATTGAGCCCA
CTD-2649N21 | CTD-2649N21_F2 | TGAACCAGAGAAGAGGGCAG
CTD-2649N21 | CTD-2649N21_R2 | ACCAGGCAGAATGATGTCCA
RP11-316H9 | RP11-316H9_F1 | TGGAGCCAGAGAAAGCAAGT
RP11-316H9 | RP11-316H9_R1 | TGAACAGAGAAGCCAGGGAG
RP11-316H9 | RP11-316H9_F2 | ATCAAGCTGACCCTGGAGAC
RP11-316H9 | RP11-316H9_R2 | ACCCCAAGAGTCACTGTCAG

[Gel images: primer pair 1 and primer pair 2, each with BAC, +ve and -ve lanes, for RP11-795I4 and CTD-2258L2; 200 bp marker indicated]

Figure 57: Representative gels of BAC clone identity confirmation
BAC clone identities were confirmed by PCR amplification of regions near the 5’ end (primer pair 1) and 3’ end (primer pair 2) of the BAC sequence. Gel image shows PCR products using the BAC template, human random control DNA (+ve) or distilled water (-ve) for two BAC clones in the 9q31 locus.


Table 30: Primers and TaqMan probe used for 3C-qPCR assays in the 9q31 (KLF4) locus
The probe for the TaqMan assay is highlighted in green. Abbreviations: SR, short-range; LD, linkage disequilibrium

Target | Primer/probe name | SYBR or TaqMan assay | Sequence | HindIII location (hg19) | Distance from HindIII site (bp)
RP11-363D24.1 | RP11-363D24.1_F2 | SYBR | AGGAGCTGCATGATTCACCA | chr9:110201199-110201204 | 51
KLF4 centromeric 1 | KLF4_cent1_F1 | SYBR | GCAGGGCAATGGTCAATTCC | chr9:110238441-110238446 | 80
KLF4 centromeric 2 | KLF4_cent2_F1 | SYBR | CACACCTAGGAGCCCACAG | chr9:110244649-110244654 | 107
SR KLF4 centromeric 1 | KLF4_cent2_F2 | SYBR | CCACAGCTGATAGGTCCCAG | chr9:110244649-110244654 | 110
KLF4 gene and promoter | KLF4_gene_F1 | SYBR/TaqMan | TGCTTTGAAATGAAATCCCTGC | chr9:110255868-110255873 | 45
KLF4 gene and promoter | KLF4 TaqMan probe | TaqMan | GCTTGCAGCTTTCACAAGGT | chr9:110255868-110255873 | 26
KLF4 gene body SR | KLF4_gene_SR1_F2 | TaqMan | GCAAACTCCTCTTATATCCAGGG | chr9:110259991-110259996 | 34
Intergenic 1 | Int1_KLF4_F1 | SYBR/TaqMan | GGCCAGCTAAGATTCACTGC | chr9:110459966-110459971 | 37
rs10979182 LD region 1 | LD_reg_1_KLF4_F1 | TaqMan | TCGAAGACAGGTTGTTGGGA | chr9:110770045-110770051 | 35
rs10979182 LD region 2 | LD_reg_2_KLF4_F1 | TaqMan | CTAAGGCCTGCAATGAAGACA | chr9:110775650-110775656 | 51
rs10979182 LD region 3 | LD_reg_3_KLF4_F1 | TaqMan | GAAGAAGCATCCCATGGCTG | chr9:110782236-110782242 | 56
rs10979182 LD region 4 | LD_reg_4_KLF4_F1 | TaqMan | TAAACCCAAGACAGTGCTGC | chr9:110788661-110788667 | 38
rs10979182 LD region 5 | LD_reg_5_KLF4_F1 | TaqMan | GGCACCTGAGGCATACAATG | chr9:110796043-110796049 | 66
rs10979182 LD region 6 | LD_reg_6_KLF4_F1 | TaqMan | AAATCATGTCCTTGGTGCCC | chr9:110801016-110801022 | 38
rs10979182 LD region 7 | Enhancer1_rs10217259_F1 | TaqMan | TGGCTACATCCAGAGTTGCT | chr9:110808470-110808476 | 113
Enhancer 2 rs6477612 | Enhancer2_rs6477612_F2 | SYBR | CAGGGTGTCTAGGAGGTCTTC | chr9:110816598-110816603 | 139
rs10979182 LD region 8 | Enhancer2_rs6477612_F3 | TaqMan | TCTTCGCTTCCTGTGGGC | chr9:110816598-110816603 | 70
Enhancer 3 rs4978343 | Enhancer3_rs4978343_F2 | SYBR | GGGAGAGAAAAGGAAGAAGAGT | chr9:110821889-110821894 | 118
rs10979182 LD region 9 | Enhancer3_rs4978343_F3 | TaqMan | TCAGCTCTTCAAGTTCTCATTCT | chr9:110821888-110821894 | 76
Enhancer 2 SR | SR_enhancer2_KLF4_F2 | SYBR | ACCACATTCCTCTTTTAGCCC | chr9:110824379-110824384 | 44
rs10979182 LD region 10 | LD_reg_10_KLF4_F2 | TaqMan | AACCAAGTCACAGAGAAGGCT | chr9:110829060-110829066 | 84
rs10979182 LD region 11 | LD_reg_11_KLF4_F1 | TaqMan | CTAATTGCTGCAGGACCCAC | chr9:110838267-110838273 | 76
Intergenic 3 | Int3_KLF4_F1 | TaqMan | CTTGGAGTAGCTTGCTGAGG | chr9:110940017-110940022 | 3
Positive 1 (Dryden/Martin) | DrydenBC_F2 | SYBR | CTCCTCCAATCCCAAGCTGA | chr9:111036598-111036603 | 69
Positive 2 (SR Dryden/Martin) | BC_SR1_F1 | SYBR | CAGACAAAGACTGGACCCCA | chr9:111038716-111038721 | 117
Intergenic 4 | Int4_KLF4_F2 | SYBR | TGAGTACACGCATCTTTTCCT | chr9:111346822-111346827 | 98
IKBKAP & FAM206A promoter | IKBKAP_F1 | SYBR | AGCTGTGGTCAATTGGCATT | chr9:111703976-111703981 | 68
CTNNAL1 promoter | CTNNAL1_F2 | SYBR | GGTGGCGAGGAGAAACAAAA | chr9:111776135-111776140 | 59

Table 31: Primers used for 3C-qPCR in the 6q23 (TNFAIP3) locus
Primers without “HRJ” in the name were designed by Dr Amanda McGovern (ARUK). Abbreviations: SR, short-range; NCR, negative control region; RA, rheumatoid arthritis; Ps, psoriasis

Target | Primer name | Sequence | HindIII location (hg19) | Distance from HindIII site (bp)
SR IL22RA2 | P3 | ATGGTTCTGCAAGGCTGTG | chr6:137414948-137414953 | 61
IL22RA2 promoter | IL22RA2 | TGTTGTTGTCTGCCTCTGGA | chr6:137506744-137506749 | 61
SR IFNGR1 | P4 | GTTGTCTGCCTCTGGATCCC | chr6:137506744-137506749 | 57
IFNGR1 | region 8 | CAAGGCAAGGTGGTGGTGGTTTT | chr6:137583223-137583228 | 57
NCR1 (OLIG3) | 36 | ACAGGCAGTGGTATGTTGGA | chr6:137834160-137834165 | 163
NCR2 (OLIG3) | 38 | GGGGCTCAGTGTTCTCAGAT | chr6:137840905-137840924 | 124
RA index | TNF_HRJ_7 | CCAAAGTGCATTACAGAAGCA | chr6:138007133-138007154 | 71
RA SNPs | RA 15 | AGAACACTACAGCAGACCCC | chr6:138017056-138017061 | 178
TNFAIP3 promoter | TNF_HRJ_11 | GGCTTTGGAGTAACACAGGC | chr6:138186779-138186798 | 78
Ps SNPs 1 (TNFAIP3) | Ps 13 | TGGGCTTAAGGGTGTCTGAG | chr6:138202660-138202665 | 108
SR Ps enhancer | TNF_HRJ_14 | AGCTAAGAATGAAGGTGGCG | chr6:138223133-138223152 | 31
Ps SNPs 2 (TNFAIP3) | Ps 15 | AGTCTAGCTGGTTTGGGAGG | chr6:138241189-138241194 | 57

Table 32: Primers used to test gene expression in the stimulatory time-course

Gene target | Primer name | Primer sequence
KLF4 | KLF4_F2 | 5’ – CCCAAGCCAAAGAGGGGAAG – 3’
KLF4 | KLF4_R2 | 5’ – TCATCTGAGCGGGCGAATTT – 3’
GAPDH | GAPDH_F | 5’ – CACAGTCCATGCCATCACTG – 3’
GAPDH | GAPDH_R | 5’ – CCATGCCAGTGAGCTTCCC – 3’
HPRT1 | HPRT1_F | 5’ – ATGGACAGGACTGAACGTCTTG – 3’
HPRT1 | HPRT1_R | 5’ – GGCTACAATGTGATGGCCTC – 3’

[Bioanalyzer output: panels A, B and C; peaks/bands labelled 28S, 18S and Marker; size labels 6000, 4000, 2000, 1000, 500, 200 and 25 (marker)]

Figure 58: Representative Bioanalyzer output for a high-quality total RNA sample (RIN = 10)
A successful ladder run for the RNA 6000 Nano Kit produces seven clear peaks on the electropherogram, where the first peak corresponds to the 25 nt marker (A). Total RNA is mostly made up of two rRNA subunits, 18S and 28S; high-quality rRNA has clear peaks for the two subunits, where the 28S peak is higher than the 18S peak (B). The corresponding gel image for the sample produced by the Bioanalyzer is shown on the right (C).


Table 33: Adapters and PCR primers used for amplification of Hi-C libraries

Oligo application | Oligo name | Sequence
Adapters | Chic_TruPE_adapter_1 | 5’- P-GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -3’
Adapters | Chic_TruPE_adapter_2 | 5’- ACACTCTTTCCCTACACGACGCTCTTCCGATC*T -3’
PCR primers | Chic_TruPE_PCR_1 | 5’- ACACTCTTTCCCTACACGACGCTCTTCCGATCT -3’
PCR primers | Chic_TruPE_PCR_2 | 5’- GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC -3’

[Bioanalyzer trace: library peak labelled 517 bp, between Marker 1 and Marker 2]

Figure 59: Representative Bioanalyzer trace for a Hi-C library
A good quality Hi-C library forms a smooth normal distribution between the two DNA markers (35 bp and 10380 bp). The mean size of this particular library was 517 bp.


Table 34: SNPs included in the CHi-C study design

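Locus | Gene(s) | Popn | Index SNP | Annotation | P-value | Risk allele | OR | Reference
1p31.3 | C1orf141 | CHN | rs72933970 | Missense: C1orf141 | 1.23 x 10^-8 | G | 1.16 | Zuo et al. (2015)
1p31.3 | IL-23R | CHN | chr1: 67,421,184 (hg 18) | Nonsynonymous: IL-23R | 1.94 x 10^-11 | G | 1.38 | Tang et al. (2014)
1p31.3 | IL-23R | EUR | rs9988642 | 441bp 3' of IL-23R | 1.1 x 10^-26 | T | 1.52 | Tsoi et al. (2012)
1p31.3 | IL-23R | EUR | rs113935720 | Intronic: IL23R | 7 x 10^-14 | T | 1.41 | Stuart et al. (2015)
1p31.3 | IL-23R | EUR | rs4655683 | Intronic: C1orf141 | 9 x 10^-13 (conditional) | A | 1.18 | Stuart et al. (2015)
1p31.3 | IL-23R | EUR | rs112626518 | Intronic: IL23R | 1 x 10^-8 (conditional) | T | 1.14 | Stuart et al. (2015)
1p36 | IL-28RA | CHN | rs4649203 | 5.5kb 5' of IL-28RA | 9.74 x 10^-11 | A | 1.19 | Cheng et al. (2014)
1p36 | IL-28RA | EUR | rs7552167 | 4.2kb 5' of IL-28RA | 8.5 x 10^-12 | G | 1.21 | Tsoi et al. (2012)
1p36.11 | RUNX3 | EUR | rs7536201 | 1.5kb 5' of RUNX3 | 2.3 x 10^-12 | C | 1.13 | Tsoi et al. (2012)
1p36.11 | ZNF683 | CHN | rs10794532 | Missense: ZNF683 | 4.18 x 10^-8 | A | 1.11 | Zuo et al. (2015)
1p36.22 | NPPA | CHN | rs5063 | Missense: NPPA | 3.51 x 10^-9 | G | 1.18 | Zuo et al. (2015)
1p36.23 | SLC45A1, TNFRSF9 | EUR | rs11121129 | Intergenic | 1.7 x 10^-8 | A | 1.13 | Tsoi et al. (2012)
1p36.3 | MTHFR | CHN | rs2274976 | Missense: MTHFR | 2.33 x 10^-10 | G | 1.27 | Zuo et al. (2015)
1q21.3 | LCE3B, LCE3D | CHN | rs10888501 | 175bp 3' of LCE3E | 6.48 x 10^-13 | A | 1.16 | Zuo et al. (2015)
1q21.3 | LCE3B, LCE3D | CHN | rs41268474 | Missense: C1orf68 | 5.99 x 10^-11 | A | 1.17 | Zuo et al. (2015)
1q21.3 | LCE3B, LCE3D | CHN | rs76337351 | Missense: KPRP | 1.71 x 10^-8 | C | 1.20 | Zuo et al. (2015)
1q21.3 | LCE3B, LCE3D | EUR | rs6677595 | 3.6kb 3' of LCE3B | 2.1 x 10^-33 | T | 1.26 | Tsoi et al. (2012)
1q21.3 | LCE3D | CHN | rs512208 | Nonsynonymous: LCE3D | 2.92 x 10^-23 | A | 1.25 | Tang et al. (2014)
1q22 | AIM2 | CHN | rs2276405 | Stop-gained: AIM2 | 3.22 x 10^-9 | G | 1.20 | Zuo et al. (2015)
1q31.1 | LRRC7 | EUR | rs10789285 | Intergenic | 1.43 x 10^-8 | G | 1.12 | Tsoi et al. (2015)
2p15 | B3GNT2 | EUR | rs10865331 | Intergenic | 4.7 x 10^-10 | A | 1.12 | Tsoi et al. (2012)
2p16.1 | FLJ16341, REL | EUR | rs62149416 | Intronic: FLJ16341 | 1.8 x 10^-17 | T | 1.17 | Tsoi et al. (2012)
2q12.1 | IL1RL1 | CHN | rs1420101 | Intronic: IL1RL1 | 1.71 x 10^-10 | G | 1.14 | Zuo et al. (2015)
2q13 | IL1R1 | EUR | rs887998 | Intronic: IL1R1 | 8.81 x 10^-6 | A | 1.40 | Hebert et al. (2014a)
2q24.2 | IFIH1 | EUR/CHN | rs3747517 | Missense: IFIH1 | 1.52 x 10^-18 (conditional) | C | 1.30 | Yin et al. (2015)
2q24.2 | IFIH1 | EUR/CHN | rs1990760 | Missense: IFIH1 | 1.92 x 10^-9 | T | 1.12 | Yin et al. (2015)
2q24.2 | KCNH7, IFIH1 | CHN | rs13431841 | Intronic: IFIH1 | 2.96 x 10^-9 | G | 1.20 | Sheng et al. (2014)
3p24.3 | PLCL2 | EUR | rs4685408 | Intronic: PLCL2 | 8.58 x 10^-9 | G | 1.12 | Tsoi et al. (2015)
3q28 | TP63 | EUR | rs28512356 | 400bp 3' of TP63 | 4.31 x 10^-8 | C | 1.17 | Yin et al. (2015)
3q12.3 | NFKBIZ | EUR | rs7637230 | Intronic: RP11-221J22.1 | 2.07 x 10^-9 | A | 1.14 | Tsoi et al. (2015)
3q13 | CASR | CHN | rs1042636 | Missense: CASR | 1.88 x 10^-10 | A | 1.10 | Zuo et al. (2015)
3q26.2-q27 | GPR160 | CHN | rs6444895 | Intronic: GPR160 | 1.44 x 10^-12 | G | 1.11 | Zuo et al. (2015)
4q24 | NFKB1 | CHN | rs1020760 | Intronic: NFKB1 | 2.19 x 10^-8 | G | 1.12 | Sheng et al. (2014)
5p13.1 | PTGER4, CARD6 | EUR | rs114934997 | Intergenic | 1.27 x 10^-8 | C | 1.17 | Tsoi et al. (2015)
5q14 | ZFYVE16 | CHN | rs249038 | Missense: ZFYVE16 | 2.14 x 10^-8 | G | 1.19 | Zuo et al. (2015)
5q15 | ERAP1, LNPEP | CHN | rs27043 | Intronic: ERAP1 | 6.50 x 10^-12 | G | 1.15 | Sheng et al. (2014)
5q15 | ERAP1, LNPEP | EUR | rs27432 | Intronic: ERAP1 | 1.9 x 10^-20 | A | 1.20 | Tsoi et al. (2012)
5q15 | ERAP1 | CHN | rs26653 | Missense: ERAP1 | 5.27 x 10^-12 | C | 1.15 | Tang et al. (2014)
5q15 | ERAP1, LNPEP | EUR | rs30376 | Intronic: ERAP1 | 7.23 x 10^-13 | C | 1.14 | Yin et al. (2015)
5q15 | ERAP1, LNPEP | EUR | rs2910686 | Intronic: ERAP2 | 2.0 x 10^-8 (conditional) | C | 1.12 | Tsoi et al. (2012)
5q15 | LNPEP | CHN | rs2303138 | Missense: LNPEP | 1.83 x 10^-13 | A | 1.16 | Cheng et al. (2014)
5q31 | IL13, IL4 | EUR | rs1295685 | 3'-UTR: IL13 | 3.4 x 10^-10 | G | 1.18 | Tsoi et al. (2012), Das et al. (2015)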

-6

2q13 IL1R1 EUR rs887998 Intronic: IL1R1 8.81 x 10 A 1.40 Hebert et al. (2014a) EUR/ Missense: 1.52 x 10-18 2q24.2 IFIH1 rs3747517 C 1.30 Yin et al. (2015) CHN IFIH1 (conditional) EUR/ Missense: 2q24.2 IFIH1 rs1990760 1.92 x 10-9 T 1.12 Yin et al. (2015) CHN IFIH1

KCNH7, -9

2q24.2 CHN rs13431841 Intronic: IFIH1 2.96 x 10 G 1.20 Sheng et al. (2014) IFIH1 3p24.3 PLCL2 EUR rs4685408 Intronic: PLCL2 8.58 x 10-9 G 1.12 Tsoi et al. (2015) 400bp 3' of 3q28 TP63 EUR rs28512356 4.31 x 10-8 C 1.17 Yin et al. (2015) TP63 Intronic: RP11- 3q12.3 NFKBIZ EUR rs7637230 2.07 x 10-9 A 1.14 Tsoi et al. (2015) 221J22.1 Missense: 3q13 CASR CHN rs1042636 1.88 x 10-10 A 1.10 Zuo et al. (2015) CASR 3q26.2- Intronic: GPR160 CHN rs6444895 1.44 x 10-12 G 1.11 Zuo et al. (2015) q27 GPR160 Intronic: 4q24 NFKB1 CHN rs1020760 2.19 x 10-8 G 1.12 Sheng et al. (2014) NFKB1 PTGER4, 5p13.1 EUR rs114934997 Intergenic 1.27 x 10-8 C 1.17 Tsoi et al. (2015) CARD6 Missense: 5q14 ZFYVE16 CHN rs249038 2.14 x 10-8 G 1.19 Zuo et al. (2015) ZFYVE16 ERAP1, Intronic: 5q15 CHN rs27043 6.50 x 10-12 G 1.15 Sheng et al. (2014) LNPEP ERAP1 ERAP1, Intronic: 5q15 EUR rs27432 1.9 x 10-20 A 1.20 Tsoi et al. (2012) LNPEP ERAP1 Missense: 5q15 ERAP1 CHN rs26653 5.27 x 10-12 C 1.15 Tang et al. (2014) ERAP1 ERAP1, Intronic: 5q15 EUR rs30376 7.23 x 10-13 C 1.14 Yin et al. (2015) LNPEP ERAP1 ERAP1, Intronic: 2.0 x 10-8 5q15 EUR rs2910686 C 1.12 Tsoi et al. (2012) LNPEP ERAP2 (conditional) Missense: 5q15 LNPEP CHN rs2303138 1.83 x 10-13 A 1.16 Cheng et al. (2014) LNPEP Tsoi et al. (2012), Das 5q31 IL13, IL4 EUR rs1295685 3'-UTR: IL13 3.4 x 10-10 G 1.18 et al. (2015) 302

Risk Locus Gene(s) Popn Index SNP Annotation P-value OR Reference allele 5q33.1 TNIP1 CHN rs10036748 Intronic: TNIP1 4.26 x 10-9 G 1.10 Zuo et al. (2015) 5q33.1 TNIP1 EUR rs2233278 5'-UTR: TNIP1 2.2 x 10-42 C 1.59 Tsoi et al. (2012) 5q33.1 TNIP1 EUR rs17728338 Intergenic 4.15 x 10-13 A 1.81 Das et al. (2014) Intronic: 5q33.3 IL12B CHN rs10076782 4.11 x 10-11 G 1.14 Zuo et al. (2015) RNF145 Intronic: 5q33.3 IL12B CHN rs1473247 5.63 x 10-11 A 1.14 Zuo et al. (2015) RNF145 5q33.3 IL12B CHN rs2288831 Intronic: IL12B 2.30 x 10-20 T 1.20 Sheng et al. (2014) CHN/ 5q33.3 IL12B rs4921493 Intergenic 2.22 x 10-16 T 1.27 Yin et al. (2015) EUR CHN/ 5q33.3 IL12B rs2853694 Intronic: IL12B 8.61 x 10-30 G 1.36 Yin et al. (2015) EUR CHN/ Intronic: 5q33.3 IL12B rs7709212 1.20 x 10-30 T 1.38 Yin et al. (2015) EUR LOC285626 5q33.3 IL12B EUR rs12188300 Intergenic 3.2 x 10-53 T 1.58 Tsoi et al. (2012) 9.0 x 10-40 5q33.3 IL12B EUR rs4379175 Intergenic G 1.31 Tsoi et al. (2012) (conditional) 5q33.3 IL12B EUR rs918518 Intergenic 3.22 x 10-11 T 1.44 Das et al. (2015) 5q33.3 IL12B EUR rs918520 Intergenic 1.0 x 10-47 G 1.45 Stuart et al., 2015 Intronic: 6 × 10-21 5q33.3 IL12B EUR rs62377586 G 1.26 Stuart et al. (2015) LOC285626 (conditional) Intronic: 4 x 10-11 5q33.3 IL12B EUR rs953861 G 1.25 Stuart et al. (2015) LOC285626 (conditional) 1 × 10-7 5q33.3 IL12B EUR rs6870256 Intergenic C 1.13 Stuart et al. (2015) (conditional)

-8

5q33.3 PTTG1 CHN rs2431697 Intergenic 1.11 x 10 C 1.20 Sun et al. (2010) EXOC2, Intronic: 6p25.3 EUR rs9504361 2.1 x 10-11 A 1.12 Tsoi et al. (2012) IRF4 EXOC2 Intronic: 6p22.3 CDKAL1 EUR rs4712528 8.4 x 10-11 C 1.16 Stuart et al. (2015) CDKAL1 Missense: 6q21 TRAF3IP2 EUR rs33980500 4.2 x 10-45 T 1.52 Tsoi et al. (2012) TRAF3IP2 Intronic: 6q23.3 TNFAIP3 EUR rs582757 2.2 x 10-25 C 1.23 Tsoi et al. (2012) TNFAIP3 6q25.3 TAGAP EUR rs2451258 Intergenic 3.4 x 10-8 C 1.12 Tsoi et al. (2012) Intronic: 7p14.1 ELMO1 EUR rs2700987 4.3 x 10-9 A 1.11 Tsoi et al. (2012) ELMO1 Missense: 7p14.3 CCDC129 CHN rs4141001 1.84 x 10-11 A 1.16 Zuo et al. (2015) CCDC129

303

Risk Locus Gene(s) Popn Index SNP Annotation P-value OR Reference allele Intronic: 8p23.2 CSMD1 CHN rs10088247 4.54 x 10-9 C 1.17 Sun et al. (2010) CSMD1 Intronic: 8p23.2 CSMD1 CHN rs7007032 3.78 x 10-8 C 1.16 Sun et al. (2010) CSMD1 Intronic: 9p21.1 DDX58 EUR rs11795343 8.4 x 10-11 T 1.11 Tsoi et al. (2012) DDX58 9q31.2 KLF4 EUR rs10979182 Intergenic 2.3 x 10-8 A 1.12 Tsoi et al. (2012) CAMK2G, Intronic: 10q22.2 EUR rs2675662 7.35 x 10-9 A 1.12 Tsoi et al. (2015) FUT11 CAMK2G Intronic: Ellinghaus et al. 10q22.3 ZMIZ1 EUR rs1250544 3.53 x 10-8 G 1.16

ZMIZ1 (2012a) Missense: 11p15.4 ZNF143 CHN rs10743108 1.70 x 10-8 C 1.14 Zuo et al. (2015) ZNF143 RPS6KA4, 256bp 5' of Ellinghaus et al. 11q13.1 EUR rs694739 3.71 x 10-9 A 1.14 PRDX5 AP003774.1 (2012) Synonymous: 11q13.1 AP5B1 CHN rs610037 4.29 x 10-11 C 1.11 Zuo et al. (2015) AP5B1 1.7kb 5' of 11q22.3 ZC3H12C EUR rs4561177 7.7 x 10-13 A 1.14 Tsoi et al. (2012) ZC3H12C 11q24.3 ETS1 EUR rs3802826 Intronic: ETS1 9.5 x 10-10 A 1.12 Tsoi et al. (2012) Intronic: 12p13.31 CD27, LAG3 CHN rs758739 4.08 x 10-8 C 1.10 Sheng et al. (2014) NCAPD2 Intronic: 12p13.31 CD27, LAG3 CHN rs2243750 4.38 x 10-8 T 1.10 Sheng et al. (2014) TAPBPL IL-23A, Intronic: 12q13.3 EUR rs2066819 5.4 x 10-17 C 1.39 Tsoi et al. (2012) STAT2 STAT2 IL-23A, 12q13.3 EUR rs61937678 Intergenic 1.82 x 10-7 C 1.58 Das et al. (2015) STAT2 Missense: 13q12.11 GJB2 CHN rs72474224 7.46 x 10-11 T 1.34 Tang et al. (2014) GJB2 13q12.11 GJB2 CHN rs3751385 3’-UTR: GJB2 8.57 x 10-8 T 1.15 Sun et al. (2010) 13q14.11 COG6 EUR rs34394770 Intronic: COG6 2.65 x 10-8 T 1.16 Yin et al. (2015) Within 13q14.11 LOC144817 EUR rs9533962 1.93 x 10-8 C 1.14 Yin et al. (2015) LOC144817 14q13.2 NFKBIA CHN rs12884468 Intergenic 1.05 x 10-8 G 1.14 Zuo et al. (2015) Intronic: RP11- 14q13.2 NFKBIA EUR rs8016947 2.5 x 10-17 G 1.16 Tsoi et al. (2012) 56B11.3 Stop-gained: 14q23.2 SYNE2 CHN rs2781377 4.21 x 10-11 G 1.18 Zuo et al. (2015) SYNE2 FBXL19, Intronic: 16p11.2 EUR rs12445568 1.2 x 10-16 C 1.16 Tsoi et al. (2012) PRSS53 STX1B 304

Risk Locus Gene(s) Popn Index SNP Annotation P-value OR Reference allele PRM3, 1.6kb 3' of 16p13.13 EUR rs367569 4.9 x 10-8 C 1.13 Tsoi et al. (2012) SOCS1 PRM3 17q11.2 NOS2 EUR rs28998802 Intronic: NOS2 3.3 x 10-16 A 1.22 Tsoi et al. (2012) Intronic: RP1- 6 x 10-9 17q11.2 NOS2 EUR rs2301369 C 1.15 Stuart et al. (2015) 66C13.4 (conditional) Intronic: 17q12 IKZF3 CHN rs10852936 1.96 x 10-8 T 1.10 Sheng et al. (2014) ZPBP2 PTRF, 17q21.2 STAT3, EUR rs963986 Intronic: PTRF 5.3 x 10-9 C 1.15 Tsoi et al. (2012) STAT5A/B CHN/ Missense: 3.46 x 10-9 / 1.10/ Tang et al. (2014)/Tsoi 17q25.3 CARD14 rs11652075 C EUR CARD14 3.4 x 10-8 1.11 et al. (2012) Missense: 17q25.3 TMC6 CHN rs12449858 2.28 x 10-8 A 1.12 Zuo et al. (2015) TMC6 POL1, 18q21.2 STARD6, EUR rs545979 Intronic: POL1 3.5 x 10-10 T 1.12 Tsoi et al. (2012) MBD2 18q22.1 SERPINB8 CHN rs514315 3’ of SERPINB8 5.92 x 10-9 T 1.15 Sun et al. (2010) ILF3, Intronic: 19p13.2 EUR rs892085 3 x 10-17 A 1.17 Tsoi et al. (2012) CARM1 QTRT1 Missense: 19p13.2 TYK2 EUR rs34536443 9.1 x 10-31 G 1.88 Tsoi et al. (2012) TYK2 Missense: 19p13.2 TYK2 EUR rs12720356 3.2 x 10-10 A 1.25 Tsoi et al. (2012) TYK2 Missense: 19q13.41 ZNF816A CHN rs12459008 2.25 x 10-9 A 1.14 Tang et al. (2014) ZNF816 Intronic: 20q13.13 RNF114 EUR rs1056198 1.5 x 10-14 C 1.16 Tsoi et al. (2012) RNF114 Intronic: 21q22 RUNX1 EUR rs8128234 3.74 x 10-8 T 1.17 Yin et al. (2015) RUNX1 Missense: 21q22.11 IFNGR2 CHN rs9808753 2.75 x 10-8 A 1.09 Zuo et al. (2015) IFNGR2 21q22.11 SON CHN rs3174808 Missense: SON 1.15 x 10-8 G 1.10 Zuo et al. (2015) UBE2L3, 1kb 3' of 3.8 x 10-8 22q11.21 EUR rs4821124 C 1.13 Tsoi et al. (2012) YDJC UBE2L3 (conditional)
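
As an illustration only, the P-values in Table 34 can be compared against the conventional genome-wide significance threshold (P < 5 × 10^-8) with a short Python sketch. The helper name `parse_p` and the handful of rows copied from the table are choices made for this example; they are not part of the study design pipeline.

```python
import re

GENOME_WIDE = 5e-8  # conventional genome-wide significance threshold


def parse_p(text):
    """Parse a P-value written as 'a x 10^b' or 'a × 10-b'.

    Trailing notes such as '(conditional)' are ignored.
    """
    match = re.match(r"\s*([0-9.]+)\s*[x×]\s*10\^?(-?\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised P-value: {text!r}")
    return float(match.group(1)) * 10 ** int(match.group(2))


# A few (index SNP, P-value) pairs copied from Table 34.
rows = [
    ("rs9988642", "1.1 × 10^-26"),
    ("rs887998", "8.81 × 10^-6"),
    ("rs61937678", "1.82 × 10^-7"),
    ("rs6870256", "1 × 10^-7 (conditional)"),
    ("rs12188300", "3.2 × 10^-53"),
]

# Split the SNPs by whether they pass the threshold.
below = [snp for snp, p in rows if parse_p(p) < GENOME_WIDE]
above = [snp for snp, p in rows if parse_p(p) >= GENOME_WIDE]

print("genome-wide significant:", below)
print("below genome-wide significance:", above)
```

In this small subset, three of the listed SNPs fall short of the threshold, which is consistent with the table including some loci on the basis of suggestive or conditional evidence.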

Table 35: Adapters and PCR primers used for amplification of CHi-C libraries

Primer application | Primer name | Sequence
Universal primer | TruSeq Universal primer | 5’- AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT -3’
Barcoded primer | TruSeq Index Primer 006 | 5’- CAAGCAGAAGACGGCATACGAGATATTGGCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC -3’
Barcoded primer | TruSeq Index Primer 012 | 5’- CAAGCAGAAGACGGCATACGAGATTACAAGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC -3’

Table 36: List of 91 variants in tight LD with rs10979182 in the KLF4 locus, with associated functional scores. Abbreviations for VEP consequences: IV, intergenic variant; RRV, regulatory region variant; DGV, downstream gene variant; UGV, upstream gene variant; NCTEV, non-coding transcript exon variant; TFBSV, transcription factor binding site variant.

Variant | Position (hg19) | Ref allele | Alt allele | R2 | PICS score | VEP | RegulomeDB score | CADD score
rs10979182 | 110817020 | A | G | 1.000 | 0.053 | IV | 6 | 1.542
rs892687 | 110814936 | A | C | 0.989 | 0.037 | IV | 5 | 4.274
rs11355519 | 110817393 | A | - | 0.985 | N/A | IV | 5 | 0.861
rs1369190 | 110816795 | T | C | 0.985 | 0.032 | IV | 5 | 9.028
rs4978668 | 110817549 | G | T | 0.978 | 0.037 | IV | 5 | 2.573
rs2417842 | 110818823 | G | T | 0.974 | 0.024 | IV | 4 | 1.791
rs10979183 | 110821050 | C | T | 0.974 | 0.026 | IV | 4 | 5.548
rs6477614 | 110821622 | A | G | 0.974 | N/A | IV | 5 | 0.539
rs1318148 | 110814693 | C | G | 0.974 | 0.032 | IV | 4 | 0.591
rs1434836 | 110822658 | A | G | 0.974 | 0.032 | IV; RRV | 4 | 2.287
rs113137157 | 110810757 | - | AACTT | 0.970 | 0.007 | IV | 5 | 0.575
rs4978343 | 110820132 | G | T | 0.970 | 0.028 | IV | 4 | 10.31
rs35719858 | 110826505 | T | - | 0.962 | N/A | IV | 6 | 0.769
rs1369189 | 110822918 | C | T | 0.959 | 0.022 | IV | No Data | 7.856
rs4978670 | 110826546 | A | G | 0.959 | 0.024 | IV | 6 | 0.198
rs10217259 | 110803975 | T | C | 0.952 | 0.013 | DGV; RRV | 4 | 6.826
rs4979625 | 110786533 | A | T | 0.949 | 0.013 | IV | 5 | 2.833
rs200032386 | 110791067 | C | T | 0.949 | N/A | IV | No Data | 2.735
rs143632367 | 110791074 | A | G | 0.949 | N/A | IV | No Data | 5.538
rs114882864 | 110791479 | C | T | 0.949 | N/A | IV | No Data | 1.678
rs139152228 | 110791703 | T | C | 0.949 | N/A | IV | No Data | 1.833
rs6477610 | 110793123 | T | C | 0.949 | 0.004 | IV | No Data | 5.134
rs10816616 | 110793539 | C | A | 0.949 | N/A | IV | 6 | 3.218
rs138954071 | 110793853 | - | TCC | 0.949 | 0.001 | IV | 6 | 0.869
rs4979626 | 110793907 | T | C | 0.949 | 0.006 | IV | No Data | 0.068
rs7865570 | 110794594 | T | C | 0.949 | 0.010 | IV; RRV | 5 | 1.985
rs7850481 | 110794744 | G | A | 0.949 | 0.008 | IV | 6 | 8.676
rs7850593 | 110794809 | G | A | 0.949 | 0.011 | IV | 6 | 1.584
rs1995761 | 110795237 | A | G | 0.949 | 0.013 | IV | 6 | 2.235
rs10816618 | 110801608 | C | A | 0.949 | 0.014 | UGV; RRV | 4 | 5.343
rs10816617 | 110800715 | C | T | 0.945 | 0.012 | UGV | No Data | 5.429
rs35078320 | 110781209 | T | - | 0.945 | N/A | IV | 6 | 0.343
rs10816624 | 110833259 | A | G | 0.944 | 0.012 | IV | 5 | 2.787
rs10979185 | 110833332 | T | C | 0.944 | 0.011 | IV | 5 | 0.135
rs7868720 | 110787096 | C | T | 0.942 | 0.012 | IV | No Data | 1.433
rs10546118 | 110787413 | TC | - | 0.942 | 0.009 | IV | No Data | 1.999
rs7855502 | 110787468 | T | A | 0.942 | 0.012 | IV | No Data | 3.998
rs7466417 | 110795042 | C | T | 0.942 | 0.011 | IV | 5 | 3.173
rs10759272 | 110795806 | C | G | 0.942 | 0.012 | IV | No Data | 0.567
rs10118193 | 110802747 | A | G | 0.941 | 0.011 | NCTEV; RRV | No Data | 4.843
rs7468416 | 110831367 | T | C | 0.940 | 0.012 | IV | No Data | 2.113
rs13285347 | 110831501 | C | G | 0.940 | 0.006 | IV | 6 | 1.366
rs13286627 | 110831560 | T | C | 0.940 | 0.002 | IV | 6 | 0.131
rs112957589 | 110834146 | - | GAAACA | 0.940 | 0.011 | IV | 4 | 1.085
rs1914513 | 110803324 | G | A | 0.938 | 0.010 | DGV; RRV | 4 | 0.513
rs10708219 | 110807112 | A | - | 0.938 | N/A | DGV | 6 | 0.515
rs10816609 | 110779419 | C | A | 0.938 | 0.010 | RRV; IV | No Data | 19.43
rs141688385 | 110792282 | T | C | 0.937 | 0.005 | IV | 6 | 0.9
rs7869206 | 110787286 | G | A | 0.935 | 0.010 | IV | No Data | 2.095
rs1361371 | 110797442 | C | T | 0.935 | 0.010 | RRV; IV | No Data | 6.08
rs151283439 | 110807514 | AA | - | 0.934 | N/A | DGV | 6 | 0.105
rs7024627 | 110807629 | A | T | 0.934 | 0.010 | DGV | 6 | 0.481
rs10979180 | 110810345 | C | T | 0.934 | 0.014 | IV | 6 | 6.205
rs10125120 | 110779166 | G | C | 0.934 | 0.010 | RRV; IV | 5 | 17.73
rs4461962 | 110792438 | T | C | 0.933 | 0.001 | IV | No Data | 3.291
rs7470161 | 110830273 | A | G | 0.933 | 0.009 | IV | No Data | 0.056
rs1356435 | 110806645 | C | T | 0.931 | 0.009 | DGV | No Data | 9.231
rs7020797 | 110806917 | C | T | 0.931 | 0.011 | DGV | 6 | 2.538
rs6477611 | 110806981 | G | C | 0.931 | 0.011 | DGV | No Data | 1.798
rs6477612 | 110811552 | C | T | 0.931 | 0.016 | TFBSV; IV | 2a | 17.4
rs7019552 | 110810024 | T | C | 0.930 | 0.008 | RRV; IV | No Data | 2.474
rs10816610 | 110781922 | A | C | 0.929 | 0.010 | IV | No Data | 0.392
rs4979624 | 110782512 | C | G | 0.929 | 0.010 | RRV; IV | 5 | 3.559
rs10512368 | 110783036 | G | A | 0.929 | 0.010 | IV | 6 | 1.269
rs35955641 | 110788873 | A | T | 0.929 | 0.009 | IV | 6 | 0.59
rs55975335 | 110811312 | AAAACAAAGTATTAATGAATAATAATATCT | - | 0.927 | N/A | IV | 2b | 15.81
rs5899779 | 110805395 | T | - | 0.926 | N/A | DGV | 6 | 0.029
rs1339756 | 110776765 | T | C | 0.925 | 0.009 | IV | 4 | 0.915
rs6477613 | 110811614 | C | T | 0.924 | 0.015 | IV | 4 | 0.142
rs10435852 | 110775536 | C | G | 0.922 | 0.008 | IV | 6 | 10.48
rs7029094 | 110782565 | T | C | 0.922 | 0.009 | RRV; IV | 5 | 3.552
rs10816611 | 110784233 | G | A | 0.922 | 0.008 | IV | No Data | 5.356
rs12551076 | 110787730 | A | G | 0.922 | 0.009 | IV | No Data | 1.378
rs7860780 | 110772665 | T | C | 0.922 | 0.008 | IV | No Data | 1.836
rs9697003 | 110773245 | G | A | 0.918 | 0.008 | IV | No Data | 2.921
rs10759271 | 110776303 | T | C | 0.918 | 0.008 | IV | No Data | 5.973
rs10816612 | 110786222 | G | A | 0.918 | 0.008 | IV | No Data | 0.166
rs7388803 | 110789372 | G | T | 0.915 | N/A | IV | No Data | 0.552
rs10816613 | 110789837 | A | G | 0.915 | 0.009 | IV | No Data | 0.354
rs7852687 | 110790553 | G | A | 0.915 | N/A | IV | No Data | 8.658
rs10816607 | 110777707 | T | C | 0.914 | 0.007 | IV | 6 | 2.177
rs4979623 | 110769283 | G | A | 0.907 | 0.007 | IV | No Data | 0.395
rs10979167 | 110790082 | A | G | 0.904 | 0.001 | IV | No Data | 4.983
rs9695201 | 110769468 | A | G | 0.904 | 0.007 | IV | 6 | 2.041
rs7467711 | 110830496 | T | C | 0.893 | N/A | IV | 6 | 4.515
rs9299144 | 110767478 | A | G | 0.893 | 0.006 | IV | No Data | 7.632
rs562409617 | 110767057 | - | AAAGCAGT | 0.890 | N/A | IV | N/A | 0.141
rs10481656 | 110836284 | C | T | 0.858 | 0.007 | RRV; IV | 5 | 2.725
rs10481658 | 110836416 | G | A | 0.856 | 0.006 | RRV; IV | 5 | 0.384
rs35320206 | 110776613 | - | CT | 0.840 | N/A | IV | 6 | 0.41
rs10816608 | 110778738 | G | T | 0.836 | 0.003 | IV | 6 | 0.021
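
One way the scores in Table 36 could be combined to shortlist variants for functional follow-up is sketched below in Python. This is a minimal illustration, not the procedure used in this thesis: the `Variant` class, the `shortlist` function and the cut-offs (R2 of at least 0.9, RegulomeDB category 1 or 2, or CADD of at least 15) are assumptions made for the example, and only six rows of the table are included.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    rsid: str
    r2: float                 # LD with the index SNP rs10979182
    pics: Optional[float]     # None where the table gives N/A
    regulome: Optional[str]   # None where the table gives "No Data"
    cadd: float


# Six rows copied from Table 36.
variants = [
    Variant("rs10979182", 1.000, 0.053, "6", 1.542),
    Variant("rs892687", 0.989, 0.037, "5", 4.274),
    Variant("rs10816609", 0.938, 0.010, None, 19.43),
    Variant("rs10125120", 0.934, 0.010, "5", 17.73),
    Variant("rs6477612", 0.931, 0.016, "2a", 17.4),
    Variant("rs55975335", 0.927, None, "2b", 15.81),
]


def regulome_rank(score: Optional[str]) -> int:
    """Numeric RegulomeDB category (lower means stronger evidence); missing data sorts last."""
    return int(score[0]) if score else 99


def shortlist(vs: List[Variant], min_r2: float = 0.9,
              max_regulome: int = 2, min_cadd: float = 15.0) -> List[Variant]:
    """Keep variants in tight LD that have either regulatory or deleteriousness evidence."""
    hits = [
        v for v in vs
        if v.r2 >= min_r2
        and (regulome_rank(v.regulome) <= max_regulome or v.cadd >= min_cadd)
    ]
    # Strongest RegulomeDB category first, then highest CADD score.
    return sorted(hits, key=lambda v: (regulome_rank(v.regulome), -v.cadd))


top = [v.rsid for v in shortlist(variants)]
print(top)
```

Under these illustrative cut-offs the index SNP itself is not retained, which reflects the general point that the lead GWAS SNP need not be the variant with the strongest functional annotation.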

[Gel images: chromatin sonication time courses. HaCaT and My-La: 0, 5, 10, 15, 20* and 25 min; NHEK: 0, 2, 4, 6, 8* and 10 min. Size markers indicated at 200, 400 and 600 bp.]

Figure 60: Optimisation of ChIP sonication for HaCaT, My-La and NHEK cells. * Time-point selected to obtain optimal chromatin fragment lengths of 150 – 400 bp.

[Gel images for HaCaT and My-La. Dilution QC lanes: 2 µL and 6 µL. QC PCR lanes: LR1, LR2, LR3, SR (AHF) and SR (KLF4). Size marker indicated at 200 bp.]

Figure 61: Representative QC gels for HaCaT and My-La 3C libraries. 1 in 10 dilutions of each library were made, and 2 or 6 µL loaded into a 0.8% agarose gel. Each library ran as a tight band larger than 10 kb. PCRs were conducted with primers for known long-range (LR) or short-range (SR) interactions and the products run on a 1.5% agarose gel.

Figure 62: Colour keys used for ChromHMM and Gencode V19 genes illustrated in CHi-C figures

6. PUBLICATION ARISING FROM THIS THESIS

REVIEW

One SNP at a Time: Moving beyond GWAS in Psoriasis

Helen Ray-Jones1,2, Stephen Eyre1, Anne Barton1,3 and Richard B. Warren2

Although genome-wide association studies have revealed important insights into the global genetic basis of psoriasis, the findings require further investigation. At present, the known genetic risk loci are largely uncharacterized in terms of the variant or gene responsible for the association, the biological pathway involved, and the main cell type driving the pathology. This review primarily focuses on current approaches toward gaining a complete understanding of how these known genetic loci contribute to an increased disease risk in psoriasis.

Journal of Investigative Dermatology (2016) 136, 567–573; doi:10.1016/j.jid.2015.11.025

1Arthritis Research UK Centre for Genetics and Genomics, Centre for Musculoskeletal Research, Institute of Inflammation and Repair, Manchester Academic Health Science Centre, The University of Manchester, Manchester, United Kingdom; 2The Dermatology Centre, Salford Royal NHS Foundation Trust, University of Manchester, Manchester Academic Health Science Centre, Manchester, United Kingdom; and 3NIHR Manchester Musculoskeletal Biomedical Research Unit, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, United Kingdom

Correspondence: Helen Ray-Jones, Arthritis Research UK Centre for Epidemiology, Centre for Musculoskeletal Research, Institute for Inflammation and Repair, Manchester Academic Health Science Centre, The University of Manchester, Manchester, United Kingdom. E-mail: helen.[email protected]

Abbreviations: ChIP, chromatin immunoprecipitation; eQTL, expression quantitative trait locus; GPP, generalized pustular psoriasis; GWAS, genome-wide association study; LCE, late cornified envelope; LD, linkage disequilibrium; MHC, major histocompatibility complex; SNP, single nucleotide polymorphism

Received 12 May 2015; revised 27 November 2015; accepted 30 November 2015; corrected proof published online 22 January 2016

© 2015 The Authors. Published by Elsevier, Inc. on behalf of the Society for Investigative Dermatology.

INTRODUCTION

Psoriasis is thought to be dependent on a complex interplay between many genetic loci and environmental factors. The development of sophisticated methods for rapid genotyping of DNA has led to the era of high-powered genome-wide association studies (GWASs), which have revolutionized our understanding of complex trait genetics (Stranger et al., 2011). GWAS and more targeted candidate gene approaches [Immunochip; Tsoi et al. (2012)] have identified more than 40 single nucleotide polymorphisms (SNPs) associated with psoriasis at a genome-wide significance level (P < 5 × 10^-8), many of which are situated near genes involved in adaptive and innate immunity pathways, which are summarized in Supplementary Table S1 (online) and reviewed by Mahil et al. (2015). In the “post-GWAS” era, many challenges remain before the full genetic component of disease association can be understood. One of these challenges is to better understand how the known genetic loci confer risk to disease, which is the primary focus of this review.

GENETICS OF PSORIASIS: A BRIEF OVERVIEW

The genetic locus conferring the greatest risk for psoriasis susceptibility in both European and Chinese populations is the major histocompatibility complex (MHC) class I, implicating the involvement of the adaptive immune system in psoriasis pathology (Ellinghaus et al., 2010; Liu et al., 2008; Nair et al., 2009; Strange et al., 2010; Stuart et al., 2010; Tsoi et al., 2012; Zhang et al., 2009). Within the MHC, an allele at the HLA gene, HLA-C*06:02, shows the strongest association with psoriasis. However, it is evident that independent risk associations exist across the MHC (Feng et al., 2009), including ethnicity-specific signals (Yin et al., 2015). A recent fine mapping study confirmed the presence of independent signals at HLA-C*12:03, HLA-B, HLA-A, and HLA-DQA1 through conditional analysis (Okada et al., 2014). Outside of the MHC, a second well-established risk locus resides at the gene for endoplasmic reticulum aminopeptidase 1 (Strange et al., 2010). The endoplasmic reticulum aminopeptidase 1 protein is thought to be responsible for N-terminal trimming of peptides allowing binding to the MHC class I molecule (Alvarez-Navarro and de Castro, 2014; Saric et al., 2002); therefore, this signal further implicates the involvement of the adaptive immune system in psoriasis.

Many SNPs have also implicated gene candidates from innate immunity pathways in European cohorts (Capon et al., 2008; Cargill et al., 2007; Ellinghaus et al., 2012; Nair et al., 2009; Strange et al., 2010; Stuart et al., 2010; Tsoi et al., 2012, 2015b). These include NF-κB signaling (e.g., REL, TNIP1, NFKBIA, and CARD14), IFN signaling (e.g., IL28RA and TYK2), T-cell regulation (e.g., RUNX3, IL13, TAGAP, ETS1, and MBD2), and antiviral signaling (e.g., IFIH1, DDX58, and RNF114). Multiple loci containing genes involved in the IL-23 pathway specifically implicate a role for Th17 cells (e.g., TNFAIP3, IL23R, IL12B, TRAF3IP2, IL23A, and STAT3).

Aside from the immune system, skin barrier regulatory genes of the late cornified envelope (LCE) within the epidermal differentiation complex are associated with psoriasis in both European and Chinese populations (de Cid et al., 2009; Strange et al., 2010; Tsoi et al., 2012; Zhang et al., 2009). The variants in this region likely tag a 30-kb deletion including the genes LCE3C and LCE3B. The loss of these genes is thought to impair reparation of the skin barrier after injury (Bergboer et al., 2011, 2012). Alternatively, the loss of an epidermal-specific enhancer element within the

deleted region could be causing aberrant global transcription of epidermal differentiation complex genes (de Guzman Strong et al., 2010).

Several recent studies have demonstrated that Chinese populations display a number of unique genetic associations with psoriasis, such as NFKB1, PTTG1, MTHFR, and CCDC129 (Sun et al., 2010; Zuo et al., 2015), and share some loci with European populations (Cheng et al., 2014; Y Li et al., 2013; Sheng et al., 2014; Tang et al., 2014; Zhang et al., 2009). Recently a transethnic psoriasis GWAS, including both Chinese and European cohorts, identified four novel loci in European patients (LOC144817, COG6, RUNX1, and TP63) and population-specific effects at several loci (Yin et al., 2015). Further transethnic GWAS studies that compare allele frequencies, odds ratios, and disease pathways between different populations will advance the current understanding of global psoriasis pathogenesis.

The genetics of late-onset psoriasis, in which disease occurs after 40 years of age, substantially overlaps with that of early-onset psoriasis. In a GWAS, the known type I psoriasis risk loci IL12B and HLA-C reached genome-wide significance, and six more known loci reached study-wide significance (IL23R, TRAF3IP2, IL23A, IFIH1, RNF114, and HLA-A) (Hebert et al., 2015). However, late-onset psoriasis may also have unique risk loci at IL1B and IL1R1 (Hebert et al., 2014, 2015; Reich et al., 2002). Subsequent well-powered studies will be required to determine how an increasing age of onset affects the strength of these genetic associations.

The genetic architecture of psoriasis subtypes is gradually being defined; for example, generalized pustular psoriasis (GPP) is associated with protein-coding mutations in CARD14 (Jordan et al., 2012b; Qin et al., 2014; Sugiura et al., 2014) and IL36RN (Hayashi et al., 2014; Korber et al., 2013; M Li et al., 2013; Sugiura et al., 2013). In CARD14, the de novo mutation p.Glu138Ala was found in a child with GPP (Jordan et al., 2012b), and the rare variant p.Asp176His was shown to predispose to GPP with plaque psoriasis in Japanese patients (Sugiura et al., 2014). In IL36RN, protein modeling and biochemical analyses showed that the GPP-associated mutation p.L27P reduced the stability of IL36RN protein and decreased its expression and potency, leading to increased proinflammatory signaling (Marrakchi et al., 2011). Rare IL36RN mutations are thought to be uniquely associated with GPP (Capon, 2013; Sugiura et al., 2013) and are linked with a more severe disease phenotype and earlier age of onset (Hussain et al., 2015).

WHY HASN’T GWAS PROVIDED ALL THE ANSWERS?

Missing heritability

To date GWAS has only revealed a small proportion of the genetic component of psoriasis. In Europeans, the proportion of psoriasis heritability explained by GWAS variants was most recently estimated at 22% (Tsoi et al., 2012), whereas in Chinese it is reportedly 45.7% (Jiang et al., 2015). Several reasons have been proposed for the apparent missing heritability in complex disease, including gene-gene and gene-environment interactions and the existence of highly deleterious rare variants, although the latter may not greatly impact on psoriasis heritability (Hunt et al., 2013; Tang et al., 2014). Ultimately, it is likely that more genetic signals will be discovered along with the increased use of next-generation sequencing technology that encompasses whole genomes or exomes. Additionally, increased study power through large sample sets and refined statistical methods (fine mapping, genotype calling, and imputation) can identify common novel loci, strengthen known signals, and find independent effects at known loci (Tsoi et al., 2015b; Yin et al., 2015). To detect rare variants, however, novel statistical analysis techniques such as burden testing may be required.

Interpreting association signals

GWAS-associated variants usually require further extensive interrogation for a full interpretation of the data. In part, this is because of the number of highly correlated genetic variants in linkage disequilibrium (LD) that may be causal. Research has shown that only 5% of lead GWAS SNPs are likely to be causal and tend to lie an average distance of 14 kb from the probable causal SNP (Farh et al., 2015). Thus, the first task after GWAS is to perform dense genotyping, resequencing or imputation, to test the association of all variants in LD with the lead variant and gain a detailed picture of potentially causal variants. The remainder of this review addresses the question of how to best utilize the current gains made by GWAS by identifying the function of putative causal variants, particularly in noncoding regions (Figure 1).

CONSIDERATIONS FOR FUNCTIONAL ANNOTATION OF GWAS VARIANTS

In a minority of psoriasis susceptibility loci, the GWAS signal intersects with coding regions of genes (e.g., IL23R and CARD14). In these cases, the function of the variants can be readily assessed (di Meglio et al., 2013; Jordan et al., 2012b; Sarin et al., 2011). The genetic association of psoriasis with CARD14 was initially discovered through linkage mapping (Tomfohrde et al., 1994), followed by positional cloning using next-generation sequencing that identified an excess of rare missense variants in families affected by the disease and in psoriasis cohorts (Jordan et al., 2012a, 2012b). After these studies, a psoriasis-associated common missense variant discovered by Jordan et al. (2012a) achieved genome-wide significance in cohorts of European and Chinese ancestry (Tang et al., 2014; Tsoi et al., 2012). CARD14 is a scaffolding protein that has a role in NF-κB activation. In functional experiments, some of the associated rare variants were found to affect CARD14 splicing, leading to increased downstream NF-κB expression in keratinocytes (Jordan et al., 2012a, 2012b).

The majority of psoriasis-associated GWAS loci are located outside of traditional gene coding regions, often in regulatory enhancer regions characterized by open areas of accessible chromatin that are sensitive to DNase I and contain modified histone marks (Ernst et al., 2011). Here, identification of the causal SNP becomes more challenging; bioinformatic evidence is first used to form a hypothesis about which SNPs are likely to be causal. The hypothesis should then be tested directly with functional experiments to show the mechanism by which the putative causal SNP affects gene expression or function.

Bioinformatic and experimental approaches must take into account relevant cell types and stimulatory factors that may

affect regulatory mechanisms. Transcriptome studies in psoriasis have demonstrated that gene expression is often tissue or cell type specific (Filkor et al., 2013; Jabbari et al., 2012; Li et al., 2014; Suarez-Farinas et al., 2012; Tian et al., 2012). As well as protein coding genes, long noncoding RNAs have recently been shown to have substantially different expression between skin and other tissues (Tsoi et al., 2015a). With respect to the selection of relevant cell types in psoriasis, dysregulated gene transcripts in psoriatic skin are often derived from keratinocytes, fibroblasts, and immune cells, whereas GWAS candidate gene expression often derives from multiple immune cell types, particularly neutrophils (Swindell et al., 2014). Additionally, interaction analysis of psoriasis GWAS hits with cell-specific epigenetic marks of gene activity revealed T helper cells (Th1, Th2, and Th17) to be likely key cells in driving susceptibility to psoriasis (Farh et al., 2015). Research has also shown that endothelial cells, which highly express the candidate gene CARD14, are likely to be important (Harden et al., 2014). Stimulation is also likely to be an important factor, especially because psoriatic lesions are thought to be subjected to a range of cytokines, such as tumor necrosis factor-α, OSM, IL-22, IL-17A, and IL-1α (Bernard et al., 2012; Guilloteau et al., 2010; Rabeony et al., 2014). An inflammatory milieu may be required for pathogenic mechanisms to occur.

Figure 1. Workflow for identification of putative causal variants and the genes they affect. Genome-wide association study (GWAS) is a hypothesis-free method for identifying single nucleotide polymorphisms (SNPs) correlating with disease risk. Dense, targeted genotyping arrays such as Immunochip can be used for both replication of GWAS loci and genetic fine-mapping of all variants in disease-associated loci, further narrowing down the association signal. Bioinformatics can then be used to both locate and functionally annotate SNPs in linkage disequilibrium (LD) with the lead SNPs. In noncoding regions, SNPs coinciding with regulatory features such as histone modifications or transcription factor binding sites are most likely to have a functional effect. Appropriate functional experimental techniques can then be used to investigate genotype-specific protein interactions (ChIP), DNA conformation (3C), and gene expression (eQTL and reporter gene assays) in disease-relevant cell types. SNPs associated with disease that coincide with coding regions of genes require bespoke experimental confirmation dependent on both position within gene (e.g., binding domain) and function of protein (e.g., enzymatic activity). Once a causal variant affecting gene function has been identified, its effect on relevant biochemical pathways and resultant disease phenotype can be investigated.

BIOINFORMATIC APPROACHES TOWARD FUNCTIONAL ANNOTATION OF GWAS VARIANTS

Before expensive, hypothesis-driven laboratory experiments are undertaken, bioinformatics may be used to annotate disease-associated SNPs. Publicly available data can be interrogated in order to (i) define the set of associated variants that may be causal, (ii) determine which of these variants is correlated with the expression of genes, and (iii) annotate the associated variants with epigenetic features that indicate which variants are present in potential gene regulatory regions.

Freely available databases such as 1000 Genomes (Altshuler et al., 2012) can be interrogated to identify SNPs in LD with the index GWAS SNP. Statistical packages may then be used to prioritize potential causative SNPs. For example, the Probabilistic Identification of Causal SNPs (PICS) algorithm combines the underlying haplotype structure and the strength of the genetic evidence in a bayesian analysis to assign probability scores for the likelihood of each SNP in LD being causal (Farh et al., 2015). The Probabilistic Annotation INTegratOR (PAINTOR) combines the genetic association data with functional annotation data to score SNPs (Kichaev et al., 2014), whereas the Combined Annotation Dependent Depletion (CADD) algorithm gathers evidence from multiple resources to assign a score as to the likelihood of any variant being deleterious (Kircher et al., 2014).

If the associated genetic variants are involved in differential regulation of gene expression, there should be a correlation between genotype and gene expression (Nicolae et al., 2010). Expression quantitative trait loci (eQTLs) can be identified in databases such as GenVAR (Yang et al., 2010), GTex (Lonsdale et al., 2013), and RegulomeDB (Boyle et al., 2012). Importantly, the lead SNP associated with disease risk must be the lead SNP correlating with expression (lead eQTL), and not merely in strong LD, for evidence of altered gene expression to be fully informative. Ideally, the colocalization of both signals from the same SNP needs to be statistically proven (Guo et al., 2015). To date it has been unusual for GWAS association signals to coincide with eQTL signals; this may be due in part to eQTLs acting in a cell- and stimulation-specific manner. For example, a recent analysis identified cell-specific cis-eQTL effects in monocytes and CD4+ T cells for several traits, including psoriasis (Raj et al., 2014).
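
To make the notion of SNPs "in LD with the index GWAS SNP" concrete, the pairwise r² statistic can be computed from haplotype counts for two biallelic SNPs. The Python sketch below is illustrative only: the function name `ld_r2` and the haplotype counts are hypothetical, and in practice r² would be obtained from reference panels such as 1000 Genomes rather than from hand-entered counts.

```python
def ld_r2(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> float:
    """Return r^2 between two biallelic SNPs from phased haplotype counts.

    n_AB is the number of haplotypes carrying allele A at the first SNP
    and allele B at the second, and so on for the other three counts.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n            # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / n    # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / n    # frequency of allele B at SNP 2
    d = p_AB - p_A * p_B       # LD coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Hypothetical counts for 100 haplotypes.
print(ld_r2(50, 0, 0, 50))    # alleles always co-occur: complete LD
print(ld_r2(25, 25, 25, 25))  # alleles independent: no LD
print(ld_r2(45, 5, 5, 45))    # strong but incomplete LD
```

A variant with r² close to 1 relative to the index SNP carries almost the same association signal, which is why statistical association alone cannot separate such variants and functional annotation is needed.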

www.jidonline.org 569 H Ray-Jones et al. Moving beyond GWAS in Psoriasis

A wealth of bioinformatic data has been generated by international efforts such as ENCODE (Dunham et al., 2012) and NIH Roadmap Epigenomics (Bernstein et al., 2010), which annotate different cell types with epigenetic markers of genome activity. Online tools use these data to apply functional scores to SNPs based on information about their regulatory features. For example, the Ensembl Variant Effect Predictor (VEP) indicates noncoding SNP consequences and assigns scores to exonic SNPs through predictors of protein function: PolyPhen (Ramensky et al., 2002) and SIFT (Kumar et al., 2009). Another useful tool, PrediXcan, combines information from large-scale transcriptome datasets with genotype data to enable the identification of disease-associated genes in a GWAS locus (Gamazon et al., 2015).

EXPERIMENTAL APPROACHES TOWARD FUNCTIONAL ANNOTATION OF GWAS VARIANTS
Experimental approaches can be used to characterize the effect of putative causal variants on gene expression, deduce the mechanism by which this occurs, and link this to the disease phenotype. As an example, a noncoding SNP associated with myocardial infarction was recently shown to alter cell-specific (liver) expression of SORT1 via creation of a transcription factor binding site, ultimately leading to altered levels of low-density lipoprotein cholesterol (Musunuru et al., 2010). Causal genetic mechanisms such as this can be deduced using targeted experimental techniques, several of which are described below.

DNA interactions
Noncoding regulatory elements have been shown to interact with distant genes through DNA looping in a cell-type-specific manner (Dryden et al., 2014; Mifsud et al., 2015; Tolhuis et al., 2002). To test if a specific DNA interaction exists, chromosome conformation capture (3C) can be utilized (Dekker et al., 2002). 3C is a powerful hypothesis-driven method that works best over relatively small regions of DNA (10 kb to 1 Mb) (Naumova et al., 2012). In order to capture the interactions, the DNA is first cross-linked within the cell environment, followed by digestion with a restriction enzyme creating small fragments. These fragments undergo intramolecular ligation, followed by reversal of the original cross-links. The product containing the interacting DNA is detected using quantitative PCR. The method has recently been developed into hypothesis-free Hi-C, which utilizes ligation of labeled nucleotides coupled with high-throughput sequencing to identify all genomic interactions at relatively low resolution (Belton et al., 2012; Lieberman-Aiden et al., 2009). A further derivative of Hi-C, so-called capture Hi-C, gains resolution by enriching target loci with RNA baits (Dryden et al., 2014; Jager et al., 2015; Mifsud et al., 2015) and is an ideal technique for interrogating target genes at psoriasis-associated loci.

Protein interactions
Variants in regulatory regions such as enhancers and gene promoters are likely to interfere with transcription factor or histone binding (McVicker et al., 2013), with a subsequent effect on gene expression. Therefore, a complementary approach to studying DNA-DNA interactions is to study DNA-protein interactions at GWAS risk loci, using an in vivo technique known as chromatin immunoprecipitation (ChIP) (Christova, 2013). ChIP involves formaldehyde cross-linking of DNA and its bound proteins in a living cell, followed by chromatin fragmentation, immunoprecipitation with an antibody specific for the protein of interest, reversal of cross-links, and identification of the DNA by quantitative PCR (ChIP-qPCR) or sequencing (ChIP-Seq). When the method is used in cells of different genetic backgrounds, experimental evidence can be gained as to whether a putative causal risk allele at a particular SNP affects the level of protein binding to DNA and is, therefore, functional.

Gene expression
Within appropriate cell types, the effect of regulatory regions on subsequent gene expression can be examined using reporter gene assays. In this technique, the regulatory region containing the disease-associated variant is cloned into a vector containing a reporter gene such as luciferase. In a relevant cell type, expression of the luciferase gene can be inferred from the amount of luciferase enzyme activity. In the near future, it is likely that such techniques will be combined with targeted genome editing in order to identify how individual SNP alleles affect gene expression. DNA can currently be altered using novel CRISPR/Cas9 genome-editing systems (Cong et al., 2013), as was recently demonstrated in human primary T cells (Schumann et al., 2015). CRISPR/Cas9 systems are likely to become standard tools for evaluating the effect of altering single SNPs on gene expression, as they can better reflect the in vivo changes that confer disease risk. From experiments such as this, novel disease pathways can be predicted by referral to RNA-seq databases, thereby identifying genes and noncoding RNAs that are coexpressed with the gene in question.

An example of an autoimmune disease locus where the causal mechanism has been successfully identified is at TNFAIP3 in systemic lupus erythematosus (Wang et al., 2013). Independent genetic variants in and around TNFAIP3 are associated with multiple traits, including psoriasis (Nair et al., 2009), rheumatoid arthritis (Thomson et al., 2007) and systemic lupus erythematosus (Han et al., 2009). In systemic lupus erythematosus, the causal variant was localized to a pair of tandem polymorphic dinucleotides (TT>A) in an enhancer region 42 kb downstream of the TNFAIP3 promoter (Adrianto et al., 2011). A functional study using several techniques including luciferase reporter assays, 3C, and ChIP showed that TT>A interacts with TNFAIP3 through DNA looping, hence bringing the transcription factor NF-κB into close proximity with the TNFAIP3 promoter (Wang et al., 2013). The systemic lupus erythematosus risk variant of TT>A was found to have reduced ability to bind NF-κB, which led to aberrant expression of TNFAIP3. A similar process could be used to elucidate the causal variant in psoriasis.

CONCLUDING REMARKS
To fully exploit the robust GWAS data already generated and to better understand the genetic susceptibility to psoriasis, one of the post-GWAS challenges is the identification and functional annotation of causal variants in known risk loci and the genes they regulate. Incorporating

570 Journal of Investigative Dermatology (2016), Volume 136 H Ray-Jones et al. Moving beyond GWAS in Psoriasis

GWAS data and functional experiments can describe biological pathways that lead to disease, providing targets for novel therapy development, diagnostic or prognostic biomarkers, or biomarkers to target the right treatments to the right patients.

ORCIDs
Helen Ray-Jones: http://orcid.org/0000-0002-8884-6865
Stephen Eyre: http://orcid.org/0000-0002-1251-6974
Anne Barton: http://orcid.org/0000-0003-3316-2527
Richard B. Warren: http://orcid.org/0000-0002-2918-6481

CONFLICT OF INTEREST
The authors state no conflict of interest.

ACKNOWLEDGMENTS
HRJ is supported by The Sir Jules Thorn Charitable Trust PhD Scholarship. This work was carried out in Manchester, United Kingdom.

SUPPLEMENTARY MATERIAL
Supplementary material is linked to the online version of the paper at www.jidonline.org, and at http://dx.doi.org/10.1016/j.jid.2015.11.025.

REFERENCES
Adrianto I, Wen F, Templeton A, Wiley G, King JB, Lessard CJ, et al. Association of a functional variant downstream of TNFAIP3 with systemic lupus erythematosus. Nat Genet 2011;43:253-8.
Altshuler DM, Durbin RM, Abecasis GR, Bentley DR, Chakravarti A, Clark AG, et al. An integrated map of genetic variation from 1,092 human genomes. Nature 2012;491:56-65.
Alvarez-Navarro C, de Castro JAL. ERAP1 structure, function and pathogenetic role in ankylosing spondylitis and other MHC-associated diseases. Mol Immunol 2014;57:12-21.
Belton JM, McCord RP, Gibcus JH, Naumova N, Zhan Y, Dekker J. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods 2012;58:268-76.
Bergboer JGM, Tjabringa GS, Kamsteeg M, van Vlijmen-Willems I, Rodijk-Olthuis D, Jansen PAM, et al. Psoriasis risk genes of the Late Cornified Envelope-3 group are distinctly expressed compared with genes of other LCE groups. Am J Pathol 2011;178:1470-7.
Bergboer JGM, Zeeuwen P, Schalkwijk J. Genetics of psoriasis: evidence for epistatic interaction between skin barrier abnormalities and immune deviation. J Invest Dermatol 2012;132:2320-31.
Bernard FX, Morel F, Camus M, Pedretti N, Barrault C, Garnier J, et al. Keratinocytes under fire of proinflammatory cytokines: bona fide innate immune cells involved in the physiopathology of chronic atopic dermatitis and psoriasis. J Allergy (Cairo) 2012;2012:718725.
Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A, Meissner A, et al. The NIH Roadmap Epigenomics Mapping Consortium. Nat Biotechnol 2010;28:1045-8.
Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res 2012;22:1790-7.
Capon F. IL36RN mutations in Generalized Pustular Psoriasis: just the tip of the iceberg? J Invest Dermatol 2013;133:2503-4.
Capon F, Bijlmakers MJ, Wolf N, Quaranta M, Huffmeier U, Allen M, et al. Identification of ZNF313/RNF114 as a novel psoriasis susceptibility gene. Hum Mol Genet 2008;17:1938-45.
Cargill M, Schrodi SJ, Chang M, Garcia VE, Brandon R, Callis KP, et al. A large-scale genetic association study confirms IL12B and leads to the identification of IL23R as psoriasis-risk genes. Am J Hum Genet 2007;80:273-90.
Cheng H, Li Y, Zuo XB, Tang HY, Tang XF, Gao JP, et al. Identification of a missense variant in LNPEP that confers psoriasis risk. J Invest Dermatol 2014;134:359-65.
Christova R. Detecting DNA-protein interactions in living cells-ChIP approach. In: Donev R, editor. Protein-nucleic acids interactions, vol. 91. San Diego, CA: Elsevier Academic Press; 2013. p. 101-33.
Cong L, Ran FA, Cox D, Lin SL, Barretto R, Habib N, et al. Multiplex genome engineering using CRISPR/Cas systems. Science 2013;339:819-23.
de Cid R, Riveira-Munoz E, Zeeuwen P, Robarge J, Liao W, Dannhauser EN, et al. Deletion of the late cornified envelope LCE3B and LCE3C genes as a susceptibility factor for psoriasis. Nat Genet 2009;41:211-5.
de Guzman Strong C, Conlan S, Deming CB, Cheng J, Sears KE, Segre JA. A milieu of regulatory elements in the epidermal differentiation complex syntenic block: implications for atopic dermatitis and psoriasis. Hum Mol Genet 2010;19:1453-60.
Dekker J, Rippe K, Dekker M, Kleckner N. Capturing chromosome conformation. Science 2002;295:1306-11.
di Meglio P, Villanova F, Napolitano L, Tosi I, Barberio MT, Mak RK, et al. The IL23R A/Gln381 allele promotes IL-23 unresponsiveness in human memory T-helper 17 cells and impairs Th17 responses in psoriasis patients. J Invest Dermatol 2013;133:2381-9.
Dryden NH, Broome LR, Dudbridge F, Johnson N, Orr N, Schoenfelder S, et al. Unbiased analysis of potential targets of breast cancer susceptibility loci by Capture Hi-C. Genome Res 2014;24:1854-68.
Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis C, Doyle F, et al. An integrated encyclopedia of DNA elements in the human genome. Nature 2012;489:57-74.
Ellinghaus D, Ellinghaus E, Nair RP, Stuart PE, Esko T, Metspalu A, et al. Combined analysis of genome-wide association studies for Crohn disease and psoriasis identifies seven shared susceptibility loci. Am J Hum Genet 2012;90:636-47.
Ellinghaus E, Ellinghaus D, Stuart PE, Nair RP, Debrus S, Raelson JV, et al. Genome-wide association study identifies a psoriasis susceptibility locus at TRAF3IP2. Nat Genet 2010;42:991-5.
Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 2011;473:43-9.
Farh KK, Marson A, Zhu J, Kleinewietfeld M, Housley WJ, Beik S, et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 2015;518:337-43.
Feng BJ, Sun LD, Soltani-Arabshahi R, Bowcock AM, Nair RP, Stuart P, et al. Multiple loci within the major histocompatibility complex confer risk of psoriasis. PLoS Genet 2009;5:e1000606.
Filkor K, Hegedus Z, Szasz A, Tubak V, Kemeny L, Kondorosi E, et al. Genome wide transcriptome analysis of dendritic cells identifies genes with altered expression in psoriasis. PLoS One 2013;8:e73435.
Gamazon ER, Wheeler HE, Shah KP, Mozaffari SV, Aquino-Michaels K, Carroll RJ, et al. A gene-based association method for mapping traits using reference transcriptome data. Nat Genet 2015;47:1091-8.
Guilloteau K, Paris I, Pedretti N, Boniface K, Juchaux F, Huguier V, et al. Skin inflammation induced by the synergistic action of IL-17A, IL-22, oncostatin M, IL-1 alpha, and TNF-alpha recapitulates some features of psoriasis. J Immunol 2010;184:5263-70.
Guo H, Fortune MD, Burren OS, Schofield E, Todd JA, Wallace C. Integration of disease association and eQTL data using a Bayesian colocalisation approach highlights six candidate causal genes in immune-mediated diseases. Hum Mol Genet 2015;24:3305-13.
Han JW, Zheng HF, Cui Y, Sun LD, Ye DQ, Hu Z, et al. Genome-wide association study in a Chinese Han population identifies nine new susceptibility loci for systemic lupus erythematosus. Nat Genet 2009;41:1234-7.
Harden JL, Lewis SM, Pierson KC, Suarez-Farinas M, Lentini T, Ortenzio FS, et al. CARD14 expression in dermal endothelial cells in psoriasis. PLoS One 2014;9:e111255.
Hayashi M, Nakayama T, Hirota T, Saeki H, Nobeyama Y, Ito T, et al. Novel IL36RN gene mutation revealed by analysis of 8 Japanese patients with generalized pustular psoriasis. J Dermatol Sci 2014;76:267-9.
Hebert HL, Bowes J, Smith RL, Flynn E, Parslew R, Alsharqi A, et al. Identification of loci associated with late-onset psoriasis using dense genotyping of immune-related regions. Br J Dermatol 2015;172:933-9.
Hebert HL, Bowes J, Smith RL, McHugh NJ, Barker J, Griffiths CEM, et al. Polymorphisms in IL-1B distinguish between psoriasis of early and late onset. J Invest Dermatol 2014;134:1459-62.
Hunt KA, Mistry V, Bockett NA, Ahmad T, Ban M, Barker JN, et al. Negligible impact of rare autoimmune-locus coding-region variants on missing heritability. Nature 2013;498:232-5.


Hussain S, Berki DM, Choon SE, Burden AD, Allen MH, Arostegui JI, et al. IL36RN mutations define a severe autoinflammatory phenotype of generalized pustular psoriasis. J Allergy Clin Immunol 2015;135:1067-70.
Jabbari A, Suarez-Farinas M, Dewell S, Krueger JG. Transcriptional profiling of psoriasis using RNA-seq reveals previously unidentified differentially expressed genes. J Invest Dermatol 2012;132:246-9.
Jager R, Migliorini G, Henrion M, Kandaswamy R, Speedy HE, Heindl A, et al. Capture Hi-C identifies the chromatin interactome of colorectal cancer risk loci. Nat Commun 2015;6:6178.
Jiang L, Liu L, Cheng Y, Lin Y, Shen C, Zhu C, et al. More heritability probably captured by psoriasis genome-wide association study in Han Chinese. Gene 2015;573:46-9.
Jordan CT, Cao L, Roberson EDO, Duan SH, Helms CA, Nair RP, et al. Rare and common variants in CARD14, encoding an epidermal regulator of NF-kappaB, in psoriasis. Am J Hum Genet 2012a;90:796-808.
Jordan CT, Cao L, Roberson EDO, Pierson KC, Yang CF, Joyce CE, et al. PSORS2 is due to mutations in CARD14. Am J Hum Genet 2012b;90:784-95.
Kichaev G, Yang WY, Lindstrom S, Hormozdiari F, Eskin E, Price AL, et al. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet 2014;10:e1004722.
Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 2014;46:310-5.
Korber A, Mossner R, Renner R, Sticht H, Wilsmann-Theis D, Schulz P, et al. Mutations in IL36RN in patients with generalized pustular psoriasis. J Invest Dermatol 2013;133:2634-7.
Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 2009;4:1073-82.
Li BS, Tsoi LC, Swindell WR, Gudjonsson JE, Tejasvi T, Johnston A, et al. Transcriptome analysis of psoriasis in a large case-control sample: RNA-Seq provides insights into disease mechanisms. J Invest Dermatol 2014;134:1828-38.
Li M, Han JW, Lu ZY, Li HG, Zhu KJ, Cheng RH, et al. Prevalent and rare mutations in IL-36RN gene in Chinese patients with generalized pustular psoriasis and psoriasis vulgaris. J Invest Dermatol 2013;133:2637-9.
Li Y, Cheng H, Zuo XB, Sheng YJ, Zhou FS, Tang XF, et al. Association analyses identifying two common susceptibility loci shared by psoriasis and systemic lupus erythematosus in the Chinese Han population. J Med Genet 2013;50:812-8.
Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 2009;326:289-93.
Liu Y, Helms C, Liao W, Zaba LC, Duan S, Gardner J, et al. A genome-wide association study of psoriasis and psoriatic arthritis identifies new disease loci. PLoS Genet 2008;4:e1000041.
Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, et al. The Genotype-Tissue Expression (GTEx) project. Nat Genet 2013;45:580-5.
Mahil SK, Capon F, Barker JN. Genetics of psoriasis. Dermatol Clin 2015;33:1-11.
Marrakchi S, Guigue P, Renshaw BR, Puel A, Pei XY, Fraitag S, et al. Interleukin-36-receptor antagonist deficiency and generalized pustular psoriasis. N Engl J Med 2011;365:620-8.
McVicker G, van de Geijn B, Degner JF, Cain CE, Banovich NE, Raj A, et al. Identification of genetic variants that affect histone modifications in human cells. Science 2013;342:747-9.
Mifsud B, Tavares-Cadete F, Young AN, Sugar R, Schoenfelder S, Ferreira L, et al. Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nat Genet 2015;47:598-606.
Musunuru K, Strong A, Frank-Kamenetsky M, Lee NE, Ahfeldt T, Sachs KV, et al. From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature 2010;466:714-9.
Nair RP, Duffin KC, Helms C, Ding J, Stuart PE, Goldgar D, et al. Genome-wide scan reveals association of psoriasis with IL-23 and NF-kappa B pathways. Nat Genet 2009;41:199-204.
Naumova N, Smith EM, Zhan Y, Dekker J. Analysis of long-range chromatin interactions using chromosome conformation capture. Methods 2012;58:192-203.
Nicolae DL, Gamazon E, Zhang W, Duan SW, Dolan ME, Cox NJ. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet 2010;6:e1000888.
Okada Y, Han B, Tsoi LC, Stuart PE, Ellinghaus E, Tejasvi T, et al. Fine mapping major histocompatibility complex associations in psoriasis and its clinical subtypes. Am J Hum Genet 2014;95:162-72.
Qin PP, Zhang QL, Chen MF, Fu X, Wang C, Wang ZZ, et al. Variant analysis of CARD14 in a Chinese Han population with psoriasis vulgaris and generalized pustular psoriasis. J Invest Dermatol 2014;134:2994-6.
Rabeony H, Petit-Paris I, Garnier J, Barrault C, Pedretti N, Guilloteau K, et al. Inhibition of keratinocyte differentiation by the synergistic effect of IL-17A, IL-22, IL-1 alpha, TNF alpha and oncostatin M. PLoS One 2014;9:e101937.
Raj T, Rothamel K, Mostafavi S, Ye C, Lee MN, Replogle JM, et al. Polarization of the effects of autoimmune and neurodegenerative risk alleles in leukocytes. Science 2014;344:519-23.
Ramensky V, Bork P, Sunyaev S. Human non-synonymous SNPs: server and survey. Nucleic Acids Res 2002;30:3894-900.
Reich K, Mossner R, Konig IR, Westphal G, Ziegler A, Neumann C. Promoter polymorphisms of the genes encoding tumor necrosis factor-alpha and interleukin-1 beta are associated with different subtypes of psoriasis characterized by early and late disease onset. J Invest Dermatol 2002;118:155-63.
Saric T, Chang SC, Hattori A, York IA, Markant S, Rock KL, et al. An IFN-gamma-induced aminopeptidase in the ER, ERAP1, trims precursors to MHC class I-presented peptides. Nat Immunol 2002;3:1169-76.
Sarin R, Wu XX, Abraham C. Inflammatory disease protective R381Q IL23 receptor polymorphism results in decreased primary CD4+ and CD8+ human T-cell functional responses. Proc Natl Acad Sci U S A 2011;108:9560-5.
Schumann K, Lin S, Boyer E, Simeonov DR, Subramaniam M, Gate RE, et al. Generation of knock-in primary human T cells using Cas9 ribonucleoproteins. Proc Natl Acad Sci U S A 2015;112:10437-42.
Sheng Y, Jin X, Xu J, Gao J, Du X, Duan D, et al. Sequencing-based approach identified three new susceptibility loci for psoriasis. Nat Commun 2014;5:4331.
Strange A, Capon F, Spencer CCA, Knight J, Weale ME, Allen MH, et al. A genome-wide association study identifies new psoriasis susceptibility loci and an interaction between HLA-C and ERAP1. Nat Genet 2010;42:985-90.
Stranger BE, Stahl EA, Raj T. Progress and promise of genome-wide association studies for human complex trait genetics. Genetics 2011;187:367-83.
Stuart PE, Nair RP, Ellinghaus E, Ding J, Tejasvi T, Gudjonsson JE, et al. Genome-wide association analysis identifies three psoriasis susceptibility loci. Nat Genet 2010;42:1000-4.
Suarez-Farinas M, Li K, Fuentes-Duculan J, Hayden K, Brodmerkel C, Krueger JG. Expanding the psoriasis disease profile: interrogation of the skin and serum of patients with moderate-to-severe psoriasis. J Invest Dermatol 2012;132:2552-64.
Sugiura K, Muto M, Akiyama M. CARD14 c.526G>C (p.Asp176His) is a significant risk factor for generalized pustular psoriasis with psoriasis vulgaris in the Japanese cohort. J Invest Dermatol 2014;134:1755-7.
Sugiura K, Takemoto A, Yamaguchi M, Takahashi H, Shoda Y, Mitsuma T, et al. The majority of generalized pustular psoriasis without psoriasis vulgaris is caused by deficiency of interleukin-36 receptor antagonist. J Invest Dermatol 2013;133:2514-21.
Sun LD, Cheng H, Wang ZX, Zhang AP, Wang PG, Xu JH, et al. Association analyses identify six new psoriasis susceptibility loci in the Chinese population. Nat Genet 2010;42:1005-9.
Swindell WR, Stuart PE, Sarkar MK, Voorhees JJ, Elder JT, Johnston A, et al. Cellular dissection of psoriasis for transcriptome analyses and the post-GWAS era. BMC Med Genomics 2014;7.
Tang H, Jin X, Li Y, Jiang H, Tang X, Yang X, et al. A large-scale screen for coding variants predisposing to psoriasis. Nat Genet 2014;46:45-50.
Thomson W, Barton A, Ke X, Eyre S, Hinks A, Bowes J, et al. Rheumatoid arthritis association at 6q23. Nat Genet 2007;39:1431-3.


Tian SY, Krueger JG, Li K, Jabbari A, Brodmerkel C, Lowes MA, et al. Meta-Analysis Derived (MAD) transcriptome of psoriasis defines the “core” pathogenesis of disease. PLoS One 2012;7:e44274.
Tolhuis B, Palstra RJ, Splinter E, Grosveld F, de Laat W. Looping and interaction between hypersensitive sites in the active beta-globin locus. Mol Cell 2002;10:1453-65.
Tomfohrde J, Silverman A, Barnes R, Fernandezvina MA, Young M, Lory D, et al. Gene for familial psoriasis susceptibility mapped to the distal end of human-chromosome 17q. Science 1994;264:1141-5.
Tsoi LC, Iyer MK, Stuart PE, Swindell WR, Gudjonsson JE, Tejasvi T, et al. Analysis of long non-coding RNAs highlights tissue-specific expression patterns and epigenetic profiles in normal and psoriatic skin. Genome Biol 2015a;16:24.
Tsoi LC, Spain SL, Ellinghaus E, Stuart PE, Capon F, Knight J, et al. Enhanced meta-analysis and replication studies identify five new psoriasis susceptibility loci. Nat Commun 2015b;6:7001.
Tsoi LC, Spain SL, Knight J, Ellinghaus E, Stuart PE, Capon F, et al. Identification of 15 new psoriasis susceptibility loci highlights the role of innate immunity. Nat Genet 2012;44:1341-8.
Wang SF, Wen F, Wiley GB, Kinter MT, Gaffney PM. An Enhancer element harboring variants associated with systemic lupus erythematosus engages the TNFAIP3 promoter to influence A20 expression. PLoS Genet 2013;9:10.
Yang TP, Beazley C, Montgomery SB, Dimas AS, Gutierrez-Arcelus M, Stranger BE, et al. Genevar: a database and Java application for the analysis and visualization of SNP-gene associations in eQTL studies. Bioinformatics 2010;26:2474-6.
Yin X, Low HQ, Wang L, Li Y, Ellinghaus E, Han J, et al. Genome-wide meta-analysis identifies multiple novel associations and ethnic heterogeneity of psoriasis susceptibility. Nat Commun 2015;6:6916.
Zhang XJ, Huang W, Yang S, Sun LD, Zhang FY, Zhu QX, et al. Psoriasis genome-wide association study identifies susceptibility variants within late cornified envelope (LCE) gene cluster at 1q21. Nat Genet 2009;41:205-10.
Zuo X, Sun L, Yin X, Gao J, Sheng Y, Xu J, et al. Whole-exome SNP array identifies 15 new susceptibility loci for psoriasis. Nat Commun 2015;6:6793.
