YJOCA4821_proof ■ 24 March 2021 ■ 1/15

Osteoarthritis and Cartilage xxx (xxxx) xxx

Review

Genetics of osteoarthritis

G. Aubourg a, S.J. Rice a, P. Bruce-Wootton, J. Loughlin *

Biosciences Institute, Newcastle University, Newcastle Upon Tyne, UK

* Address correspondence and reprint requests to: J. Loughlin, Newcastle University, Biosciences Institute, International Centre for Life, Newcastle upon Tyne, NE1 3BZ, United Kingdom. Tel.: 44 (0)191-241-8988; Fax: 44 (0)191-241-8666. E-mail address: [email protected] (J. Loughlin).
a These authors contributed equally.

Article history: Received 11 December 2020; Accepted 6 March 2021

Keywords: Genetics; Epigenetics; SNPs; GWAS; DNA methylation; Functional analysis

Summary

Osteoarthritis genetics has been transformed in the past decade through the application of large-scale genome-wide association scans. So far, over 100 polymorphic DNA variants have been associated with this common and complex disease. These genetic risk variants account for over 20% of osteoarthritis heritability and the vast majority map to non-protein coding regions of the genome where they are presumed to act by regulating the expression of target genes. Statistical fine mapping, in silico analyses of genomics data, and laboratory-based functional studies have enabled the identification of some of these targets, which encode proteins with diverse roles, including extracellular signaling molecules, intracellular enzymes, transcription factors, and cytoskeletal proteins. A large number of the risk variants correlate with epigenetic factors, in particular cartilage DNA methylation changes in cis, implying that epigenetics may be a conduit through which genetic effects on gene expression are mediated. Some of the variants also appear to have been selected as humans adapted to bipedalism, suggesting that a proportion of osteoarthritis genetic susceptibility results from antagonistic pleiotropy, with risk variants having a positive role in joint formation but a negative role in the long-term health of the joint. Although data from an osteoarthritis genetic study has not yet directly led to a novel treatment, some of the osteoarthritis associated genes code for proteins that have available therapeutics. Genetic investigations are therefore revealing fascinating fundamental insights into osteoarthritis and can expose options for translational intervention.

https://doi.org/10.1016/j.joca.2021.03.002
1063-4584/© 2021 The Author(s). Published by Elsevier Ltd on behalf of Osteoarthritis Research Society International. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Introduction

Comprehensive molecular genetic investigations into multifactorial human diseases have been a scientific highlight of the past decade. Such analyses have garnered insights into fundamental aspects of disease pathophysiology, identifying gene targets and biological pathways for therapeutic intervention. These genetic studies involve the genome-wide association scan (GWAS) of DNA variants, principally mapping alleles at single nucleotide polymorphisms (SNPs), in cases and controls. The use of large cohorts, often involving hundreds of thousands of individuals, and high-density SNP genotyping arrays supplemented by statistical imputation has enabled tens of thousands of DNA variants to be associated with a broad range of diseases [1].

Osteoarthritis (OA) has been subjected to such detailed genetic investigation and each year new OA susceptibility risk loci are reported, along with subsequent mechanistic investigations that help to clarify how genetic risk impacts the cells and tissues of the articulating joint. In this review we summarize the current status of OA genetics, from discovery of risk loci and their functional investigation, through to epigenetics and the clinical utility of the new knowledge.

OA is a polygenic disease

Investigators conducting OA GWAS have focused on cases diagnosed with the typical age-related form of the disease, in which onset occurs from the fifth decade of life. Furthermore, individuals with post-traumatic OA, or another obviously non-genetic cause of disease, are often excluded from analyses. The logic here is that the case group will then consist of individuals who are more likely to have a genetic component underlying their disease. This is the most common form of OA, which consequently confers the highest burden upon society. Selection of OA cases has involved a range of

Please cite this article as: Aubourg G et al., Genetics of osteoarthritis, Osteoarthritis and Cartilage, https://doi.org/10.1016/j.joca.2021.03.002

2 G. Aubourg et al. / Osteoarthritis and Cartilage xxx (xxxx) xxx

Table I

Chromosome | Association SNP | Nearest protein coding gene | EA/NEA | OR | Variant (chromatin state) | Position (hg19) | P-value | Causal gene | Reference
1 | rs3753841 | COL11A1 | A/G | 1.08 | Missense (Transcribed) | 103379918 | 5.20 × 10^-10 | | 22
1 | rs2126643 | COL11A1 | C/T | 1.10 | Intronic (Transcribed) | 103404384 | 2.10 × 10^-14 | | 22
1 | rs4338381 | COL11A1 | A/G | 1.10 | Intronic (Promoter) | 103572927 | 4.37 × 10^-15 | | 24
1 | chr1:150214028 | ANP32E | C/CT | 1.03 | Intergenic | 150214028 | 2.54 × 10^-8 | | 24
1 | rs550034492 | RABGAP1L | TA(17)/T | 1.03 | Intronic (Transcribed) | 174192403 | 1.05 × 10^-8 | | 24
1 | rs11583641 | COLGALT2 | C/T | 1.08 | 3'UTR (Transcribed) | 183906245 | 5.58 × 10^-10 | COLGALT2 | 24
1 | rs2820436 | ZC3H11B | A/C | 1.07 | Intergenic (Enhancer) | 219640680 | 2.01 × 10^-9 | | 21
1 | rs2785988 | SLC30A10 | A/C | 1.08 | Intergenic (Repressed) | 219744138 | 3.90 × 10^-10 | | 22
1 | rs2820443 | SLC30A10 | C/T | 1.06 | Intergenic (Enhancer) | 219753509 | 2.01 × 10^-9 | | 21
  |  |  |  | 1.06 |  |  | 6.01 × 10^-11 | | 24
1 | rs10916199 | ZNF678 | A/G | ns | Intronic (Heterochromatin) | 227902472 | 2.4 × 10^-13 | WNT9A | 25
1 | rs10218792 | KIF26B | G/T | 1.04 | Intronic (Transcribed) | 245750932 | 2.03 × 10^-8 | | 24
2 | rs2061027 | LTBP1 | A/G | 1.04 | Intronic (Transcribed) | 3343336 | 3.16 × 10^-13 | | 24
2 | rs2061026 | LTBP1 | A/G | 1.06 | Intronic (Enhancer) | 3343549 | 1.40 × 10^-11 | | 22
2 | rs2862851 | TGFA | C/T | ns | Intronic (Repressed) | 70712802 | 5.20 × 10^-11 | | 14
2 | rs3771501 | TGFA | A/G | 1.06 | Intronic (Repressed) | 70717653 | 1.66 × 10^-8 | | 21
  |  |  |  | 1.05 |  |  | 4.24 × 10^-16 | | 24
2 | rs12470967 | SDPR | A/G | 1.06 | Intergenic | 192671981 | 1.50 × 10^-8 | | 24
2 | rs62182810 | RAPH1 | A/G | 1.03 | Intronic (Transcribed) | 204387482 | 1.65 × 10^-9 | | 24
3 | rs7639618 | COL6A4P1 | G/A | 1.43 | Missense (Repressed) | 15216429 | 7.3 × 10^-11 | | 3
3 | rs62262139 | RBM6 | A/G | 1.04 | Intronic (Transcribed) | 50022049 | 9.09 × 10^-11 | | 24
3 | rs11177 | GNL3 | A/G | 1.12 | Missense (Transcribed) | 52721305 | 1.25 × 10^-10 | | 8
3 | rs6976 | GNL3 | T/C | 1.12 | 3'UTR (Transcribed) | 52728804 | 7.24 × 10^-11 | | 8
3 | rs3774355 | ITIH1 | A/G | 1.09 | Intronic (Repressed) | 52817778 | 8.20 × 10^-14 | | 24
3 | rs678 | ITIH1 | T/A | 1.08 | Missense (Repressed) | 52820981 | 1.60 × 10^-9 | | 22
3 | rs12107036 | TP63 | G/A | 1.21 | Intronic | 189600160 | 2.15 × 10^-3 | | 8
4 | rs11732213 | SLBP | T/C | 1.06 | Intronic (Transcribed) | 1704244 | 8.81 × 10^-10 | | 24
4 | rs1913707 | RAB28 | A/G | 1.08 | Intergenic | 13039440 | 2.96 × 10^-11 | | 24
4 | rs34811474 | ANAPC4 | G/A | 1.04 | Intronic (Transcribed) | 25408838 | 2.17 × 10^-9 | | 24
4 | rs11335718 | ANXA3 | -/C | 1.11 | Intronic (Transcribed) | 79528543 | 4.26 × 10^-8 | | 21
4 | rs13107325 | SLC39A8 | T/C | 1.10 | Missense (Transcribed) | 103188709 | 8.29 × 10^-19 | | 24
5 | rs10471753 | PIK3R1 | G/C | ns | Intergenic (Enhancer) | 67818952 | 3.80 × 10^-9 | | 14
5 | rs35611929 | AP3B1 | A/G | 1.06 | Intronic (Transcribed) | 77467824 | 8.29 × 10^-19 | | 24
5 | rs3884606 | FGF18 | G/A | 1.04 | Intronic (Enhancer) | 170871074 | 8.25 × 10^-9 | | 24
6 | rs1800562 | HFE | G/A | 1.95 | Missense (Transcribed) | 26093141 | 5.0 × 10^-14 | | 22
6 | rs115740542 | HIST1H2BC | C/T | 1.06 | Intergenic (Promoter) | 26123502 | 8.59 × 10^-9 | | 24
6 | rs10947262 | BTNL2 | C/T | 1.31 | Intronic (Repressed) | 32373312 | 5.0 × 10^-9 | | 4
6 | rs7775228 | HLA-DQB1 | T/C | 1.34 | Intergenic (Repressed) | 32658079 | 2.43 × 10^-8 | | 4
6 | rs9277552 | HLA-DPB1 | C/T | 1.06 | 3'UTR (Transcribed) | 33055501 | 2.37 × 10^-10 | | 24
6 | rs12154055 | CDC5L | G/A | 1.03 | Intergenic | 44449697 | 2.71 × 10^-8 | | 24
6 | rs10948155 | SUPT3H/RUNX2 | C/T | ns | Intergenic | 44687987 | 5.20 × 10^-11 | RUNX2 | 14
6 | rs10948172 | SUPT3H/RUNX2 | G/A | 1.14 | Intronic (Enhancer) | 44777961 | 7.92 × 10^-8 | RUNX2 | 8
  |  |  |  |  |  |  | 9.00 × 10^-11 | | 22
6 | rs2396502 | SUPT3H/RUNX2 | C/A | 1.09 | Intronic (Transcribed) | 45357699 | 2.12 × 10^-12 | | 24
6 | rs1997995 | SUPT3H/RUNX2 | G/A | 1.09 | Intronic (Transcribed) | 45374183 | 1.1 × 10^-11 | | 22
6 | rs12206662 | SUPT3H/RUNX2 | G/A | ns | Intronic (Transcribed) | 45376221 | 1.3 × 10^-9 | | 14
6 | rs80287694 | BMP5 | G/A | 1.12 | Intronic | 55636940 | 2.66 × 10^-9 | | 24
6 | rs12209223 | FILIP1 | A/C | 1.16 | Intronic | 76164589 | 2.9 × 10^-15 | | 22
  |  |  |  | 1.17 |  |  | 3.88 × 10^-16 | | 24
6 | rs9350591 | FILIP1 | T/C | 1.18 | Intergenic | 76241527 | 2.42 × 10^-9 | | 8
7 | rs143083812 | SMO | T/C | 2.84 | Missense (Transcribed) | 12884410 | 7.90 × 10^-12 | | 22
7 | rs11764536 | HDAC9 | C/A | 1.26 | Intronic | 18409993 | 1.60 × 10^-9 | | 22
7 | rs788748 | IGFBP3 | A/G | 0.70 | Intergenic | 46026181 | 2.0 × 10^-8 | | 13
7 | rs11409738 | DYNC1L1 | TA/T | 1.04 | Intronic (Transcribed) | 95719834 | 2.13 × 10^-10 | | 24
7 | rs3815148 | COG5 | C/A | 1.14 | Intronic (Transcribed) | 106938420 | 4.11 × 10^-9 | | 5
7 | rs4730250 | DUS4L | G/A | 1.17 | Intronic (Transcribed) | 107207695 | 9.20 × 10^-9 | | 11
7 | rs7792864 | RNF32 | C/G | 2.35 | Intergenic | 156346087 | 4.00 × 10^-9 | | 17
8 | rs330050 | PPP1R3B | G/C | 1.04 | Intergenic (Repressed) | 9087679 | 1.93 × 10^-11 | | 24
8 | rs4733724 | GSDMC | A/G | 1.11 | Intergenic (Enhancer) | 130123728 | 7.20 × 10^-12 | | 22
8 | rs60890741 | GSDMC | C/CA | 1.11 | Intronic | 130768503 | 4.50 × 10^-9 | | 24
8 | rs11780978 | PLEC | A/G | 1.13 | Intronic (Transcribed) | 145034852 | 1.98 × 10^-9 | PLEC | 21
9 | rs10116772 | GLIS3 | C/A | 1.03 | Intronic (Transcribed) | 4290541 | 3.71 × 10^-8 | | 23
9 | rs10974438 | GLIS3 | A/C | 1.03 | Intronic (Transcribed) | 4291928 | 1.34 × 10^-8 | | 24
9 | rs116882138 | MOB3B | A/G | 1.25 | Intergenic | 27313557 | 5.09 × 10^-8 | | 21
9 | rs1078301 | COL27A1 | T/A | 1.07 | Intergenic (Repressed) | 116909146 | 1.4 × 10^-10 | | 22
9 | rs919642 | COL27A1 | T/A | 1.05 | Intergenic (Repressed) | 116911147 | 8.55 × 10^-15 | | 24
9 | rs1330349 | TNC | C/G | 1.08 | Intronic (Transcribed) | 117840742 | 4.10 × 10^-11 | | 24
9 | rs2480930 | TNC | A/G | 1.09 | Intronic (Transcribed) | 117842307 | 6.60 × 10^-12 | | 22
9 | rs4836732 | ASTN2 | C/T | 1.20 | Intronic (Transcribed) | 119266695 | 6.11 × 10^-10 | | 8
9 | rs13283416 | ASTN2 | G/T | 1.10 | Intronic | 119301607 | 5.3 × 10^-14 | | 22
9 | rs34687269 | ASTN2 | A/T | 1.09 | Intronic | 119484132 | 1.67 × 10^-12 | | 24
9 | rs10760442 | LMX1B | G/A | 1.09 | Intronic (Repressed) | 129383900 | 7.60 × 10^-12 | | 22
9 | rs62578127 | LMX1B | C/T | 1.09 | Intronic (Repressed) | 129386860 | 2.77 × 10^-12 | | 24


Table I (continued)

Chromosome | Association SNP | Nearest protein coding gene | EA/NEA | OR | Variant (chromatin state) | Position (hg19) | P-value | Causal gene | Reference
11 | rs17659798 | METTL15 | A/C | 1.06 | Intergenic | 28874997 | 2.06 × 10^-10 | | 24
11 | rs11031191 | DCDC5 | T/G | 1.03 | Intergenic | 30774280 | 1.42 × 10^-8 | | 24
11 | rs2070852 | F2 | C/G | ns | Intronic (Transcribed) | 46744925 | 4.7 × 10^-8 | | 25
11 | rs10896015 | LTBP3 | G/A | 1.09 | Intronic (Enhancer) | 65323725 | 7.70 × 10^-10 | | 22
  |  |  |  | 1.08 |  |  | 2.74 × 10^-9 | | 24
11 | rs34419890 | SPTBN2 | T/C | 1.13 | Intergenic (Repressed) | 66501624 | 1.99 × 10^-8 | | 24
11 | rs1149620 | TSKU | T/A | 1.04 | Intronic (Transcribed) | 76506572 | 6.93 × 10^-10 | | 24
12 | rs4764133 | ERP27 | T/C | ns | Intergenic | 15064363 | 1.80 × 10^-15 | MGP | 18
  |  |  |  | ns |  |  | 4.8 × 10^-14 | | 25
12 | rs10492367 | PTHLH/KLHL42 | T/G | 1.14 | Intergenic | 28014970 | 1.25 × 10^-24 | | 24
12 | rs10843013 | PTHLH/KLHL42 | C/A | 1.14 | Intergenic | 28025196 | 5.4 × 10^-19 | | 22
12 | rs12049916 | CCDC91 | G/A | ns | Intronic | 28359985 | 2.00 × 10^-8 | | 25
12 | rs79056043 | LRIG3 | G/A | 1.18 | Intronic (Transcribed) | 58289598 | 1.33 × 10^-9 | | 24
12 | rs317630 | CPSF6 | T/C | 1.04 | Intronic (Transcribed) | 69637847 | 1.97 × 10^-8 | | 24
12 | rs11105466 | ATP2B1 | A/G | 1.04 | Intergenic (Enhancer) | 90326919 | 2.16 × 10^-8 | | 24
12 | rs2171126 | CRADD | T/C | 1.03 | Intronic (Transcribed) | 94167220 | 9.07 × 10^-10 | | 24
12 |  | CHST11 | G/A | 1.13 | Intronic | 105060767 | 1.64 × 10^-8 | | 8
12 | rs11059094 | MLXIP | T/C | 1.08 | Intronic (Transcribed) | 122606837 | 7.38 × 10^-11 | | 24
12 | rs1060105 | SBNO1 | C/T | 1.07 | Missense (Transcribed) | 123806219 | 1.9 × 10^-8 | | 22
12 | rs56116847 | SBNO1 | A/G | 1.06 | Intronic (Transcribed) | 123835233 | 3.19 × 10^-10 | | 24
12 | rs4765540 | FAM101A | C/T | 1.08 | 3'UTR (Transcribed) | 124799642 | 3.4 × 10^-9 | | 22
13 | rs11842874 | MCF2L | A/G | 1.17 | Intergenic (Repressed) | 113694509 | 2.04 × 10^-8 | | 7
15 | rs35912128 | USP8 | T/- | 1.08 | Intronic (Transcribed) | 50731132 | 2.18 × 10^-8 | | 24
15 | rs4775006 | ALDH1A2 | A/C | 1.06 | Intergenic (Enhancer) | 58215727 | 8.40 × 10^-10 | | 24
15 | rs3204689 | ALDH1A2 | C/G | 1.46 | 3'UTR (Transcribed) | 58246802 | 3.99 × 10^-10 | ALDH1A2 | 12
15 | rs12901372 | SMAD3 | C/G | 1.08 | Intronic (Enhancer) | 67078168 | 5.60 × 10^-11 | | 22
15 | rs12901071 | SMAD3 | A/G | 1.08 | Intronic (Enhancer) | 67370389 | 3.12 × 10^-10 | | 20
15 | rs35206230 | CSK | T/C | 1.04 | Intergenic (Transcribed) | 75097780 | 1.48 × 10^-12 | | 24
16 | rs9930333 | FTO | G/T | 1.05 | Intronic (Transcribed) | 53799977 | 1.52 × 10^-9 | | 24
16 | rs8044769 | FTO | C/T | 1.11 | Intronic (Transcribed) | 53839135 | 6.85 × 10^-8 | | 8
16 | rs6499244 | NFAT5/WWP2 | A/T | 1.06 | 3'UTR (Transcribed) | 69735271 | 3.88 × 10^-11 | | 24
16 | rs34195470 | WWP2 | G/A | 1.07 | Intronic (Enhancer) | 69955690 | 2.7 × 10^-11 | | 22
16 | rs864839 | JPH3 | T/G | 1.08 | Intronic (Repressed) | 87671419 | 2.01 × 10^-8 | | 21
16 | rs1126464 | DPEP1 | G/C | 1.04 | Missense (Transcribed) | 89704365 | 1.56 × 10^-10 | | 24
17 | rs35087650 | SMG6 | TT/- | 1.07 | Intronic (Transcribed) | 2058350 | 1.18 × 10^-9 | | 24
17 | rs8067763 | SOX9 | G/A | 1.06 | Intergenic (Repressed) | 10012939 | 2.39 × 10^-9 | | 24
17 | rs2953013 | NF1 | C/A | 1.05 | Intronic (Transcribed) | 29496343 | 3.07 × 10^-10 | | 24
17 | rs62063281 | MAPT | G/A | 1.10 | Intronic (Repressed) | 44038785 | 5.30 × 10^-12 | | 24
17 | rs547116051 | MAPT | C/- | 1.83 | Intronic | 44057887 | 1.50 × 10^-8 | | 24
17 | rs7222178 | NACA2 | A/T | 1.09 | Intergenic | 59652282 | 1.70 × 10^-9 | | 22
  |  |  | A/T | 1.10 |  |  | 3.78 × 10^-11 | | 24
17 | rs2521349 | MAP2K6 | A/G | 1.13 | Intronic (Transcribed) | 67503501 | 9.95 × 10^-10 | | 21
18 | rs10502437 | TMEM241 | G/A | 1.03 | Intronic (Transcribed) | 20970706 | 2.50 × 10^-8 | | 24
19 | rs11880992 | DOT1L | A/G | ns | Intronic (Transcribed) | 2176403 | 3.20 × 10^-16 | | 14
19 | rs12982744 | DOT1L | C/G | 1.17 | Intronic (Transcribed) | 2177193 | 7.8 × 10^-9 | | 10
19 | rs1560707 | SLC44A2 | T/G | 1.04 | Intronic (Transcribed) | 10750738 | 1.35 × 10^-13 | | 24
19 | chr19:18,898,330 | COMP | G/C | 16.70 | Missense | 18898330 | 4.00 × 10^-12 | | 15
19 | rs375575359 | ZNF345 | del/T | 1.21 | Intronic (Heterochromatin) | 37353347 | 7.54 × 10^-9 | | 21
19 | rs75621460 | TGFB1 | A/G | 1.16 | Intergenic (Enhancer) | 41833784 | 1.62 × 10^-15 | TGFB1 | 24
19 | rs4252548 | IL11 | T/C | 1.30 | Missense (Transcribed) | 55879672 | 2.1 × 10^-11 | | 22
  |  |  |  | 1.32 |  |  | 1.96 × 10^-12 | | 24
20 | rs143384 | GDF5 | A/G | 1.10 | 5'UTR (Promoter) | 34025756 | 1.40 × 10^-19 | | 22
  |  |  |  | 1.10 |  |  | 4.77 × 10^-23 | | 24
20 | rs143383 | GDF5 | T/C | 1.79 | 5'UTR (Promoter) | 34025983 | 1.8 × 10^-13 | GDF5 | 2
  |  |  |  | 1.17 |  |  | 6.2 × 10^-11 | | 6
20 | rs6094710 | NCOA3 | A/G | 1.28 | Intergenic (Enhancer) | 46095649 | 2.0 × 10^-10 | NCOA3 | 11
21 | rs6516886 | RWDD2B | T/A | 1.10 | Intergenic (Transcribed) | 30393664 | 5.84 × 10^-8 | RWDD2B | 21
21 | rs2836618 | ERG | A/G | 1.09 | Intergenic | 40048295 | 3.20 × 10^-11 | | 24
22 | rs117018441 | EP300 | T/G | 5.89 | Intergenic (Transcribed) | 41553917 | 1.8 × 10^-25 | | 22
22 | rs532464664 | CHADL | ins8/- | 7.70 | Missense (Promoter) | 41634088 | 4.50 × 10^-18 | | 15
22 | rs528981060 | SCUBE1 | A/G | 1.68 | Intronic (Repressed) | 43662241 | 2.0 × 10^-8 | | 24

Table I. The 124 SNPs significantly associated with osteoarthritis risk. The nearest protein coding gene to the SNP is named. EA, effect allele; NEA, non-effect allele; OR, odds ratio; ns, not stated. The physical location of the SNP (hg19 genome assembly) is reported along with the chromatin-state description of the region in the Roadmap ChIP-seq dataset in mesenchymal stem cells (MSCs, E006), cultured chondrocytes (E046), adipose-derived MSCs (E025), cultured adipocytes (E023), and osteoblasts (E129).
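To make the small per-allele effect sizes in Table I concrete, the following Python sketch copies four rows from the table and combines their odds ratios. It is a minimal illustration, not a method taken from the cited studies: the function name `combined_odds_ratio`, the choice of rows, and the assumption that effects are independent and multiply on the odds scale are all introduced here for illustration only. The sketch also checks each copied P-value against the genome-wide significance threshold of P < 5 × 10^-8.

```python
import math

# Four rows copied from Table I: (SNP, nearest gene, per-allele OR, P-value).
TABLE_I_SUBSET = [
    ("rs3753841", "COL11A1", 1.08, 5.20e-10),
    ("rs13107325", "SLC39A8", 1.10, 8.29e-19),
    ("rs143384", "GDF5", 1.10, 1.40e-19),
    ("rs75621460", "TGFB1", 1.16, 1.62e-15),
]

GENOME_WIDE_P = 5e-8  # genome-wide significance threshold


def combined_odds_ratio(odds_ratios, effect_allele_counts):
    """Combine per-allele ORs by summing log-odds.

    Assumption (illustrative only): effects are independent and
    multiplicative on the odds scale.
    """
    log_odds = sum(
        count * math.log(odds_ratio)
        for odds_ratio, count in zip(odds_ratios, effect_allele_counts)
    )
    return math.exp(log_odds)


odds_ratios = [row[2] for row in TABLE_I_SUBSET]
all_significant = all(row[3] < GENOME_WIDE_P for row in TABLE_I_SUBSET)

# Hypothetical individual carrying one copy of each effect allele.
one_copy_each = combined_odds_ratio(odds_ratios, [1, 1, 1, 1])
# Hypothetical individual homozygous for each effect allele.
two_copies_each = combined_odds_ratio(odds_ratios, [2, 2, 2, 2])

print(f"all genome-wide significant: {all_significant}")
print(f"combined OR, one copy each: {one_copy_each:.2f}")
print(f"combined OR, two copies each: {two_copies_each:.2f}")
```

Under these assumptions, one copy of each of the four effect alleles gives a combined OR of roughly 1.5, which illustrates how individually modest alleles can add up to a more substantial shift in risk.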


diagnostic tools including X-ray evidence of joint space narrowing, joint pain, or the need to replace the diseased joint due to severe end-stage OA. Cases have been sub-categorized based on disease at particular skeletal sites (principally knees, hips, and hands) and GWAS have been performed both by joint site stratification, and by combining all cases. Studies have encompassed European cohorts, individuals of European descent, Asian cohorts, and African Americans. To increase confidence in the results, studies are collaborative and usually involve a discovery component followed by replication and meta-analysis. Such rigor has enabled geneticists to identify 124 SNPs associated with OA to date (Table I) [2-25]. These SNPs encompass 95 independent loci spread across the genome, with some loci (for example, at the type XI collagen gene COL11A1) having several SNPs marking separate associations at the locus. OA is therefore a highly polygenic disease.

Effect sizes of OA risk-conferring alleles are small, with the vast majority of odds ratios (ORs) being <1.5 and with no common large-effect loci of the magnitude seen, for example, at the human leucocyte antigen (HLA) in autoimmune arthritic diseases such as rheumatoid arthritis [26]. The largest OR so far reported for an OA locus is 16.70, for a variant within the cartilage oligomeric matrix protein gene COMP, but this variant is restricted to an extensive Icelandic pedigree and is absent from other European populations (Table I) [15].

Overall, OA is an archetypal example of a common, polygenic disease in which disease occurs due to the inheritance of multiple risk alleles of modest individual impact. It also fits with the "liability threshold" model of polygenic diseases [27]. This model considers that many distinct genetic variants can increase susceptibility to a discontinuous disease or trait, such as whether an individual has OA or does not. If an individual reaches the threshold number of those variants and their concomitant effects are exerted, they will consequently develop the disease.

OA risk alleles primarily reside in non-protein coding regions of the genome

There are two main mechanisms by which genetic variation can act on a phenotype. The first is through a direct change to a protein. This could, for example, be the result of a genetic variant changing the DNA code in the coding sequence of a gene, which introduces an amino acid substitution that alters protein function. This is the mechanism by which DNA variants act in many Mendelian diseases [28] but is rare for common diseases. The second mechanism is through altering the regulation of gene expression, leading to an increase or decrease of the level of a gene's mRNA, and subsequently the levels of the encoded protein. This is the mechanism by which the vast majority of variants act in common diseases [29]. Genetic loci at which SNP genotypes are associated with such changes in gene expression levels are termed expression quantitative trait loci (eQTLs) [30,31] and most of the genetic variants that have so far been associated with common diseases are located in non-coding regions of the genome [32]. In OA, less than 10% of the 124 SNPs so far associated with the disease are located in a protein coding sequence (Fig. 1).

Fig. 1. The genomic locations of the SNPs reported to be associated with OA. UTR, untranslated region. [Pie chart; categories: 3' UTR, 5' UTR, Intergenic, Intronic, Missense; segment labels: 1.7%, 4.9%, 9.8%, 29.5%, 54.1%.]

Functional follow-up of OA genetic signals - statistical fine mapping and in silico analysis

Across the field of common disease research, it has proven challenging to interpret the results of GWAS biologically and, consequently, the translation of genetic discoveries into effective therapies remains elusive [33]. Significant GWAS hits are reported in the form of lead, or "sentinel", SNPs which mark a genomic locus associated with disease [34]. These loci can be large and complex regions of high linkage disequilibrium (LD), and this information alone provides little insight into causal variants, target genes, and the molecular mechanisms underpinning disease pathology.

Direct functional follow-up of the loci in relevant tissues and disease models is therefore essential, yet this process is laborious, expensive, and can require large numbers of patient samples. Furthermore, causal variants can exert their pathological effects in a spatiotemporal manner, and it is vital that appropriate investigative tools and models are applied. As a result, statistical methods of refining GWAS signals, along with the integration of genome and epigenome-wide public datasets, are increasingly being applied post-GWAS to prioritise variants and genes for subsequent functional analyses.

Due to the small effect sizes conferred by SNP genotype in most polygenic diseases, simply selecting the SNP at each locus with the lowest P-value is unlikely to provide insight into the most probable causal variant [35]. Bayesian fine-mapping considers all SNPs at a single locus that reach genome-wide significance (P < 5 × 10^-8) along with accompanying variants in LD [36]. Posterior probabilities (PP) of causality are applied to identify a credible set of variants, which can contain either single or multiple causal SNPs [36,37]. To date, fine-mapping approaches have been applied to several GWAS of OA [21,23,24]. This includes the largest OA GWAS to date, which mined the full UK Biobank dataset to investigate over 77,000 individuals with the disease [24]. At 6 of the 52 novel OA risk loci reported, the investigators were able to identify a single causal variant with >95% PP: SNPs rs34811474, rs13107325, rs547116051, rs75621460, rs4252548 and rs528981060 in Table I.

Post-GWAS, the resolution of fine-mapping can be further enhanced by the integration of genomic and epigenomic functional data [36]. Again, this approach is being increasingly utilised in the OA field, aided in recent years by the generation and subsequent public availability of large datasets generated in human articular cartilage, chondrocytes, and mesenchymal stem cells (Fig. 2, sections A to H). Such datasets have proved invaluable to the OA and musculoskeletal research community and include gene and transcript expression data [38,39], foetal and aged chromatin accessibility data (ATAC-seq) [40,41], chromatin state data (ChIP-seq) [42], long range chromatin interactions (Capture HiC) [43], and DNA methylation (DNAm)


data [44-48]. The inclusion of such data into post-GWAS analysis helps to assign a biological function to OA-associated variants. A prime example of the successful mining of such datasets post-GWAS was reported in a recent study of hand OA [25]. The integration of relevant datasets prioritised the Wnt ligand gene WNT9A, its regulatory elements, and causal variant rs1158850 at the locus [25]. Other public resources have been applied to the OA field to further support in silico analyses. These include TRANSFAC [49], to identify SNP-mediated changes to transcription factor binding motifs, and GTEx [30], to colocalise eQTLs with associated SNPs [36]. However, as mentioned above, it is vital that such data integrations are correctly interpreted, as many public databases do not include data generated in articular joint tissues and, as such, the continuation of data generation in human musculoskeletal tissues is vital to advance the field.

Fig. 2. Example of how an OA GWAS risk SNP can lead to mechanistic insights and an associated OA risk gene. (A) Manhattan plot of an OA GWAS. Each dot represents a SNP. The -log10 P-value represents the significance of each SNP being preferentially carried by OA patients vs healthy controls. The dotted line represents the genome wide significance threshold. SNPs rs1, rs2 and rs3 are all OA significant risk SNPs. (B) Cartilage mQTL analysis for rs1. Each circle represents a CpG site available on the methylation array. The -log10 P-value represents the significance that methylation at a CpG is associated with rs1 genotype. The dotted line represents the statistical significance threshold after multiple testing correction. The red circle represents a CpG site (cg1) significantly associated with rs1 genotype (this is an mQTL). This mQTL analysis should be done for each SNP that is found to be associated with OA (rs2 and rs3). This analysis is usually limited to the CpGs within the physical proximity of each association SNP (e.g., 1 megabase (Mb) region). (C) Arrows representing the location and transcriptional direction of each gene within the region. rs1 is located within GENE3. (D) Linkage disequilibrium (LD) in CEU population from the HapMap project. Image correlates with the topologically associated domains (TADs), shown by each pyramid, within the region. Red represents regions in LD, with each pyramid encompassing a region that is inherited as a block. In this example both rs1 and cg1 are within the same TAD as are GENE2 and GENE3. These genes would be prioritized as candidate OA risk genes for further analysis. (E) LDlink r² values between rs1 and each SNP within the region. Each bar represents a SNP and the height corresponds to the r² value between 0 and 1. All SNPs in high LD with rs1 are just as likely to be the functional SNP. (F) UCSC (http://genome.ucsc.edu/) track from GTEx showing all eQTLs operating within the region. Each line represents an eQTL in a tissue type for one of the four genes listed. The position of the bar on the x-axis represents which SNP this operates at. All SNPs with an eQTL are prioritized as functional. (G) UCSC track of ChIP-seq data in chondrocytes and osteoblasts. Green represents transcription sites, yellow enhancer sites and red transcription start sites. Enhancers and transcription start sites are prioritized as functional. (H) Genome wide ATAC-seq data for knee articular cartilage [40] available on UCSC browser. (I) Association between genotype at rs1 and GENE2 expression levels using qPCR. Each dot represents cartilage tissue from a different patient. C allele is associated with increased expression. (J) Violin plot showing association between genotype at rs1 and methylation levels at cg1. The width of the blue violin represents the proportion of patients within that range, the bar the mean and the dotted line the quartile range. C allele at rs1 is associated with decreased methylation at cg1. (K) Allelic expression imbalance (AEI) of GENE2. rs1 C allele is preferentially transcribed compared to the T allele. (L) Association between GENE2 expression levels and cg1 methylation levels, highlighting a methylation-expression QTL (meQTL). Each dot represents cartilage tissue from a different patient. Increased GENE2 expression is associated with decreased cg1 methylation levels. (M) Association between GENE2 AEI and cg1 methylation, again highlighting an meQTL. An increased ratio of rs1 C/T allele transcripts is associated with decreased methylation at cg1. (N) In vitro Lucia reporter analysis with DNA insert containing rs1 and cg1. rs1 C and T alleles are compared to each other with cg1 methylated or unmethylated. C allele and decreased methylation act synergistically to increase enhancer activity. (O) In vitro deletion of cg1 region using CRISPR-Cas9 and subsequent gene expression analysis of GENE2 and GENE3. Each dot represents a biological in vitro replicate. GENE3 expression levels are unchanged whilst GENE2 expression levels are reduced. (P) In vitro methylation and demethylation of cg1 using CRISPR-dCas9 DNMT3a/TET1. On the x-axis, controls are listed by the abbreviation C and cases by the lightning bolt. Methylation changes do not influence GENE3 expression levels. Increased methylation of cg1 decreases GENE2 expression and conversely, decreased methylation increases expression levels. This indicates that methylation at cg1 drives expression of GENE2 only. GENE2 is deemed the target of the OA risk marked by SNP rs1.

Functional follow-up of OA genetic signals - laboratory studies

Following statistical fine mapping and in silico analysis, focussed in vitro and in vivo studies of individual loci are essential to establish causal effects of allelic variation and to validate prioritised gene targets. In the OA field, a range of molecular genetic techniques have been applied for this purpose using patient joint tissues and relevant primary and immortalised cells. Arthroplasty samples of osteoarthritic joint tissues are invaluable to study the effect of genotype upon differential expression of genes (eQTLs) and to identify correlations between genotype and DNA methylation (mQTLs; see below). Due to interindividual variability in gene expression, it is often difficult to detect eQTLs through direct stratification of gene expression levels by SNP genotype. This observation is supported by the large numbers of patient samples required to identify such tissue-specific effects in the GTEx database [30]. A complementary


approach, which is considerably more sensitive, is allelic expression imbalance (AEI) analysis, in which the relative ratio of mRNA transcripts produced from each allele of a SNP is quantified [12,18,50–57]. This has proven useful in identifying or narrowing down the effector gene, or genes, at a gene-rich OA locus, such as at chr3p21.1, where significant AEI was identified in patient cartilage for five of the seven investigated genes, including the nuclear protein gene GNL3 and the signal peptidase gene SPCS1 [51]. A separate investigation of OA risk conferred by the T allele of rs4764133 identified a reduction in cartilage expression of the matrix Gla protein gene MGP, relative to the non-risk allele C, confirming this gene as the mediator of OA risk at this locus [18]. Of note, this effect was also observed in other joint tissues, whilst the opposite effect was observed in whole blood samples [55], highlighting the tissue-specificity of the molecular mechanisms underlying disease risk and the biological pleiotropy exerted by risk SNPs.

Gene reporter assays have been widely applied to validate regions for regulatory activity in vitro, and can be adapted to investigate the impact of both SNP genotype and DNAm upon regulatory function [52,53]. Furthermore, such assays can be used to validate putative microRNA (miRNA) binding sites and miRNA gene targets. In 2019, a massively-parallel reporter assay (MPRA) was applied to investigate 35 known OA risk loci [58]. Klein and colleagues investigated all SNPs in LD at each locus, a total of 1,605, and observed significant differential allelic regulation at six variants [58].

The advent of CRISPR-Cas9 and subsequent development of the Cas9 toolbox has revolutionised targeted editing of the genome and epigenome [59]. For functional analyses of OA risk loci, CRISPR has been used to delete both putative regulatory elements and functional SNPs in Tc28a2 chondrocytes, confirming the RUNX family transcription factor gene RUNX2, the transforming growth factor gene TGFB1, and the collagen galactosyltransferase gene COLGALT2 as targets of OA genetic risk [52,60,61]. The development of a catalytically dead Cas9 (dCas9) fused to enzymes which either methylate (DNMT3a) or demethylate (TET1) CpGs has allowed precision editing of DNAm at targeted CpGs for the first time. This is increasingly being applied to cell models of OA, and has demonstrated causal relationships between CpG methylation and gene expression at loci where correlative relationships have previously been identified in patient samples [57]. A recent study focussing on the OA risk residing at chr21q21 used a dCas9-TET1 construct for targeted demethylation of a hypermethylated methylation quantitative trait locus (mQTL) region within the promoter of the RWD domain-containing protein gene RWDD2B. A mean reduction in methylation of 21.5% within the cell population resulted in a 3.8-fold increase in gene expression, specific to RWDD2B, confirming the gene as the target of OA-associated genetic and epigenetic effects at the locus [57].

Functional follow-up studies of GWAS results therefore enable the identification of target genes of OA associated SNPs. Table II lists some of the functional tools and approaches used in the analysis of risk loci whilst Fig. 2 provides a schematic of how one can progress from an association SNP (sections A to H) towards a functional variant (sections I to M) and target gene (sections N to P). The OA targets so far discovered via such approaches include COLGALT2, RUNX2, PLEC (encoding plectin), MGP, ALDH1A2 (encoding aldehyde dehydrogenase), TGFB1, GDF5 (encoding growth differentiation factor 5), and RWDD2B (Table I) [2,12,18,52–57,60,61]. These genes encode proteins with diverse roles, including extracellular signaling molecule (GDF5, TGFB1), extracellular calcium regulator (MGP), intracellular enzyme (ALDH1A2), transcription factor (RUNX2) and cytoskeletal protein (PLEC), indicating that OA genetic risk is operating on a broad range of biological functions.

Table II. A list of functional tools and how they are used in OA genetic studies

| Tool | Definition | Utility and interpretation of data |
|---|---|---|
| Linkage disequilibrium (LD) | The non-random association of two alleles within a population. LD ranges from 0 to 1. At perfect LD (r² = 1) alleles are inherited together as a haplotype | SNPs in LD with the association SNP, and the genomic regions in which they reside, are prioritized for further investigation |
| ATAC-sequencing | Assesses chromatin accessibility at a genome-wide level in cells of interest | Open regions signify potential regulatory elements and are prioritized as functionally important |
| ChIP-sequencing | Categorizes DNA-protein binding regions into transcription start sites, enhancers or other active regulatory sequences | Regions labelled as active are prioritized as functionally important |
| Expression quantitative trait locus (eQTL) | SNP genotype correlates with expression of a gene. Can be measured by allelic expression imbalance (AEI) analysis or by stratifying expression of a gene by GWAS SNP genotype | If expression correlates with genotype, the gene may be the target of the functional effect mediated by the association signal |
| Methylation quantitative trait locus (mQTL) | SNP genotype correlates with methylation levels at a CpG | If methylation correlates with genotype, that CpG may be an intermediate of the functional effect mediated by the association signal |
| Methylation and expression quantitative trait locus (meQTL) | CpG methylation levels correlate with the expression of a gene | The GWAS SNP may act by differentially methylating DNA to regulate gene expression |
| Reporter analysis | In vitro experiment assessing reporter protein expression driven by a DNA insert cloned at a promoter or enhancer site | Can be used to confirm that a sequence is a regulator of gene expression, that the two alleles of a SNP have differential regulatory activity, and that these effects can be modulated by DNA methylation |
| CRISPR-Cas9 | Sequence-specific guide RNAs are used to delete or alter the sequence of a DNA region of interest in a cell line or primary cell to assess effect on gene expression | Can reveal a particular gene from amongst many as being the target of the association signal |
| In vitro CRISPR dCas9-DNMT3a/TET1 | Sequence-specific guide RNAs are used to hyper- or hypo-methylate a DNA sequence to assess effect on gene expression | Can confirm the role of DNA methylation as an intermediate between an association signal and the expression of its target gene |
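The LD definition in Table II can be made concrete with a short sketch of the r² calculation for two biallelic SNPs. It assumes phased haplotypes coded as 0/1 pairs; the function name `ld_r2` and the toy haplotypes are hypothetical and serve only as illustration.

```python
def ld_r2(haplotypes):
    """r-squared between two biallelic SNPs from phased haplotypes.

    Each haplotype is a pair (a, b), where a and b are 0 or 1 and indicate
    whether the haplotype carries the coded allele at SNP A and at SNP B.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # coded-allele frequency at SNP A
    p_b = sum(b for _, b in haplotypes) / n   # coded-allele frequency at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Toy data: alleles always travel together (perfect LD) versus independently.
perfect = ld_r2([(0, 0), (1, 1), (0, 0), (1, 1)])
independent = ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)])
```

With the toy data, the first call corresponds to the perfect-LD case described in the table and the second to two SNPs with no association.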


Of the above eight genes, six of the respective protein products fall within a network of protein interactions (Fig. 3). Interestingly, two additional proteins in this network, CDC5L (cell division cycle 5 like) and SMAD3 (SMAD family member 3), are also encoded by genes that reside at OA risk loci (Table I) [20,22,24]. This protein–protein interaction (PPI) network indicates that multiple variants on distinct chromosomes can lead to the dysregulation of genes that play vital roles in shared chondrocyte pathways. Furthermore, the network of OA risk gene products (PPI enrichment P = 1.0 × 10^-16) is enriched for the Gene Ontology (GO; http://geneontology.org/) processes extracellular matrix organization, tissue development, and cellular response to growth factor stimulus (Fig. 3). Such PPI networks offer mechanistic insight into the liability threshold model of polygenic diseases mentioned earlier, by highlighting that when a pathway is impacted by multiple risk alleles, it is likely to be perturbed.

Interaction between OA genetics and epigenetics

In recent years, the increase in the numbers of reported OA genetic risk variants, along with technological advances allowing in-depth epigenetic studies in cartilage, has uncovered an interplay between genetics and epigenetics in OA. This was recently the subject of an in-depth review [62] and will only be discussed briefly here.

There are three main mechanisms of epigenetic regulation of gene expression: post-translational modification of histones, non-coding RNAs (ncRNAs, such as miRNAs), and DNAm, of which the last is the most well-studied. Studies of the cartilage DNA methylome have led to the discovery of OA-associated methylation quantitative trait loci (mQTLs), at which there is a correlation between genotype at an OA risk SNP and cis DNAm [25,54,63,64]. DNAm is thought to act as a conduit through which the functional effect of

Fig. 3. A STRING protein–protein interaction network of OA risk gene products. Yellow nodes indicate proteins encoded by OA risk genes COLGALT2, RUNX2, PLEC, MGP, TGFB1 and GDF5. Blue nodes are proteins encoded by unconfirmed causal genes SMAD3 and CDC5L at OA risk loci. Grey nodes are protein interactors. Nodes were functionally annotated with the top three Gene Ontology (GO) processes: extracellular matrix (ECM) organization, green (P = 5.23 × 10^-14); tissue development, orange (P = 5.06 × 10^-13); and cellular response to growth factor stimulus, purple (P = 1.63 × 10^-13). Edge thickness (lines between proteins) indicates the confidence of interactions, based upon experimental evidence, co-expression, databases, and text-mining. STRING website; https://string-db.org/.

[Fig. 3 network nodes: GDF5, BMPR1B, RUNX2, SMAD3, SMAD2, TGFBR2, MGP, TGFB1, TGFBR1, CTGF, VEGFA, TIMP1, ENG, FN1, SMAD7, SERPINE1, THBS1, ITGAV, ITGA6, ITGB6, PLEC, ITGB4, COL4A5, CDC5L, COL17A1, COLGALT2. Legend, protein status: encoded by known OA risk gene; encoded by gene at OA risk locus. Legend, Gene Ontology process: extracellular matrix organisation; tissue development; cellular response to growth factor stimulus.]


Table III. Cartilage methylation quantitative trait loci (mQTLs) at OA association SNPs. The nearest protein coding gene to the mQTL is named. The physical location of the CpG (hg19 genome assembly) is reported. r²*, linkage disequilibrium (LD) value between association SNP and the proxy SNP used for mQTL analysis; an r² of 1.0 is perfect LD. r²**, linkage disequilibrium value between the SNPs marking two mQTLs at a locus. rs6976/rs11177*, SNPs are in perfect LD. FDR, P-values were adjusted to account for the tests performed using a false discovery rate (FDR) estimation based on Benjamini-Hochberg correction.

| Locus | GWAS SNP | Proxy SNP | r²* | SNP chromosome | SNP position (hg19) | CpG ID | CpG position (hg19) | FDR | Slope | Gene | Reference | r²** |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | rs11583641 | rs10911472 | 1.00 | 1 | 183,906,245 | cg18131582 | 183,912,305 | 2.50E-03 | 0.52 | COLGALT2 | 64 | |
| 2 | rs10916199 | rs10799428 | 0.97 | 1 | 227,902,472 | cg09796739 | 227,924,055 | 6.89E-07 | 0.38 | ZNF678 | 26 | |
| | | | | | | cg11520395 | 227,924,115 | 6.83E-03 | 0.21 | | | |
| 3 | rs62182810 | rs2305417 | 0.94 | 2 | 204,387,482 | cg10114877 | 204,427,199 | 4.66E-08 | 0.98 | RAPH1 | 64 | |
| 4 | rs6976/rs11177* | – | | 3 | 52,728,804 | cg18099408 | 52,552,593 | 3.85E-07 | 0.79 | GNL3 | 65 | 0.89 |
| | | | | | | cg13060642 | 52,556,643 | 2.86E-02 | 0.26 | | | |
| | | | | | | cg27294008 | 52,748,359 | 2.23E-02 | 0.23 | | | |
| | | | | | | cg18404041 | 52,824,283 | 2.51E-04 | 0.21 | | | |
| | rs678 | – | | 3 | 52,820,981 | cg18099408 | 52,552,593 | 4.43E-06 | 0.76 | | Novel to this analysis | |
| | | | | | | cg13060642 | 52,556,643 | 1.36E-02 | 0.28 | | | |
| | | | | | | cg27294008 | 52,748,359 | 2.95E-02 | 0.23 | | | |
| | | | | | | cg18404041 | 52,824,283 | 2.75E-06 | 0.24 | | | |
| 5 | rs11732213 | rs798756 | 1.00 | 4 | 1,704,244 | cg20987369 | 1,579,572 | 5.00E-03 | 0.39 | SLBP | 64 | |
| | | | | | | cg25007799 | 1,579,657 | 5.35E-03 | 0.73 | | | |
| 6 | rs9277552 | rs9277464 | 0.77 | 6 | 33,055,501 | cg02197634 | 33,048,875 | 1.43E-02 | 0.96 | HLA-DPB1 | 64 | |
| | | | | | | cg25491704 | 33,048,879 | 4.38E-03 | 0.86 | | | |
| | | | | | | cg13921245 | 33,053,791 | 6.21E-03 | 0.40 | | | |
| | | | | | | cg02375585 | 33,091,111 | 3.81E-02 | 0.64 | | | |
| 7 | rs10948155 | rs529125 | 0.82 | 6 | 44,687,957 | cg13979708 | 44,695,318 | 2.05E-04 | 0.36 | SUPT3H/RUNX2 | Novel to this analysis | 0.60 |
| | | | | | | cg20913747 | 44,695,427 | 3.56E-07 | 0.66 | | | |
| | rs10948172 | – | | 6 | 44,777,691 | cg13979708 | 44,695,318 | 2.05E-04 | 0.36 | | 65 | |
| | | | | | | cg20913747 | 44,695,427 | 3.56E-07 | 0.66 | | | |
| 8 | rs1997995 | rs9296459 | 0.94 | 6 | 45,374,183 | cg25494480 | 45,387,917 | 4.34E-02 | 0.32 | SUPT3H/RUNX2 | Novel to this analysis | |
| 9 | rs60890741 | rs12542856 | 0.96 | 8 | 130,768,504 | cg18170545 | 131,080,137 | 1.59E-02 | 0.35 | ASAP1 | 64 | |
| 10 | rs11780978 | – | | 8 | 145,034,852 | cg19405177 | 145,001,428 | 2.48E-17 | 0.77 | PLEC | 55 | |
| | | | | | | cg20784950 | 145,002,522 | 1.31E-05 | 0.47 | | | |
| | | | | | | cg01870834 | 145,002,835 | 1.09E-05 | 0.38 | | | |
| | | | | | | cg24891660† | 145,003,653 | 3.81E-02 | 0.42 | | | |
| | | | | | | cg07427475 | 145,008,110 | 6.18E-18 | 1.24 | | | |
| | | | | | | cg02331830 | 145,008,288 | 4.48E-08 | 0.99 | | | |
| | | | | | | cg04255391 | 145,008,397 | 8.46E-16 | 0.84 | | | |
| | | | | | | cg14598846 | 145,008,909 | 7.25E-22 | 1.24 | | | |
| | | | | | | cg23299254 | 145,008,957 | 2.81E-18 | 1.24 | | | |
| | | | | | | cg10299941 | 145,009,137 | 3.44E-02 | 0.33 | | | |
| | | | | | | cg21511203† | 145,015,037 | 2.93E-02 | 0.25 | | | |
| | | | | | | cg07212837† | 145,068,773 | 4.34E-02 | 0.11 | | | |
| 11 | rs2070852 | – | | 11 | 46,744,925 | cg03339077 | 47,165,057 | 7.53E-03 | 0.38 | F2 | 26 | |
| 12 | rs10896015 | rs12270054 | 1.00 | 11 | 65,323,725 | cg21890820 | 65,308,645 | 4.45E-02 | 0.38 | LTBP3 | Novel to this analysis | |
| 13 | rs4764133 | rs11614333 | 0.98 | 12 | 15,064,363 | cg20917083 | 15,114,233 | 1.76E-02 | 0.27 | MGP | 64 | |
| 14 | rs317630 | rs490872 | 1.00 | 12 | 69,637,847 | cg22375663 | 69,725,435 | 1.31E-11 | 1.06 | CPSF6 | 64 | |
| 15 | rs1060105 | – | | 12 | 123,806,219 | cg21745287 | 123,464,511 | 2.23E-02 | 0.39 | SBNO1 | Novel to this analysis | 0.12 |
| | | | | | | cg10169515 | 123,707,536 | 6.89E-07 | 0.77 | | | |
| | rs56116847 | rs28466887 | 0.96 | 12 | 123,835,233 | cg10169515 | 123,707,536 | 4.36E-02 | 0.46 | | Novel to this analysis | |
| 16 | rs3204689 | rs3204689 | | 15 | 58,246,802 | cg12031962 | 58,353,849 | 1.12E-07 | 0.62 | ALDH1A2 | 65 | |
| 17 | rs35206230 | rs1378940 | 1.00 | 15 | 75,097,780 | cg20040747 | 74,715,105 | 1.97E-03 | 0.29 | CSK | 64 | |
| | | | | | | cg10253484 | 75,165,896 | 1.10E-02 | 0.36 | | | |
| 18 | rs6499244 | rs1364063 | 0.90 | 16 | 69,735,271 | cg26736200 | 69,951,706 | 1.06E-02 | 0.48 | WWP2 | 64 | 0.26 |
| | | | | | | cg26661922 | 69,951,820 | 6.16E-03 | 0.45 | | | |
| | rs34195470 | rs904809 | 0.79 | 16 | 69,955,690 | cg26736200 | 69,951,706 | 1.20E-26 | 1.02 | | Novel to this analysis | |
| | | | | | | cg26661922 | 69,951,820 | 4.85E-22 | 0.90 | | | |
| 19 | rs2953013 | – | | 17 | 29,496,343 | cg13263104 | 29,671,293 | 4.86E-02 | 0.33 | NF1 | 64 | |
| 20 | rs62063281 | rs16940665 | 1.00 | 17 | 44,038,785 | cg17117718 | 43,663,208 | 1.20E-26 | 1.65 | MAPT | 64 | |
| | | | | | | cg10826688 | 43,714,992 | 1.07E-03 | 0.35 | | | |
| | | | | | | cg15295732 | 43,942,128 | 3.89E-04 | 0.39 | | | |
| | | | | | | cg11117266 | 43,971,461 | 5.30E-03 | 0.46 | | | |
| | | | | | | cg16520312 | 43,971,471 | 2.26E-03 | 0.31 | | | |
| | | | | | | cg18878992 | 43,974,344 | 1.95E-14 | 1.40 | | | |
| | | | | | | cg18228076 | 43,983,362 | 7.91E-11 | 1.10 | | | |
| | | | | | | cg07368127 | 44,230,994 | 4.34E-02 | 0.44 | | | |
| | | | | | | cg15633388 | 44,266,530 | 1.16E-18 | 1.52 | | | |
| | | | | | | cg23616531 | 44,269,258 | 5.20E-05 | 0.41 | | | |
| 21 | rs10502437 | – | | 18 | 20,970,706 | cg15535896 | 21,079,381 | 1.68E-02 | 1.20 | TMEM241 | Novel to this analysis | |
| 22 | rs143383 | rs6087704 | 0.91 | 20 | 34,025,756 | cg14752227 | 34,000,481 | 8.54E-03 | 0.78 | GDF5 | | |
| 23 | rs6516886 | rs2832155 | 1.00 | 21 | 30,393,664 | cg18001427 | 30,391,784 | 3.81E-02 | 0.53 | RWDD2B | 55 | |
| | | | | | | cg20220242 | 30,392,188 | 1.05E-05 | 0.68 | | | |
| | | | | | | cg24751378 | 30,396,349 | 3.09E-04 | 0.23 | | | |
| | | | | | | cg16140273 | 30,455,616 | 1.86E-02 | 0.34 | | | |
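The Slope and FDR columns of Table III can be illustrated with a minimal sketch, assuming that an mQTL is tested by regressing CpG methylation on genotype dosage (0, 1 or 2 copies of an allele) and that the resulting P-values are adjusted with the Benjamini-Hochberg procedure named in the table legend. The function names, the toy data and the use of a simple covariate-free regression are assumptions for illustration; the published analysis may use a different model and methylation scale.

```python
def ols_slope(dosage, methylation):
    """Least-squares slope of methylation on genotype dosage (0, 1 or 2)."""
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(methylation) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("all samples share one genotype; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, methylation))
    return sxy / sxx


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical data: six donors, one SNP-CpG pair, plus four raw P-values.
slope = ols_slope([0, 0, 1, 1, 2, 2], [0.20, 0.22, 0.40, 0.42, 0.60, 0.62])
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

In a scan of this kind, each SNP would be paired with every CpG inside the chosen window around the association signal, and the adjustment would be applied across all of those tests together.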


SNP genotype is exerted upon expression of the target gene (Fig. 2). These analyses of the cartilage epigenome have allowed the epigenetic prioritisation of regulatory regions (harbouring the mQTLs), causal SNPs (located within the regions) and effector genes (regulated by the regions) [54]. We have recently repeated the mQTL analysis in our dataset of 87 hip and knee arthroplasty cartilage samples to include all OA GWAS SNPs reported to date, including both novel variants and those previously analysed (Table III). Of the 124 SNPs across the 95 independent loci reported in the literature (Table I), we were able to investigate 114, which were either directly genotyped on our array, or for which a suitable proxy was available. Genotype at all SNPs was investigated for correlations with CpG methylation in a 1 megabase (Mb) region flanking the association signal as previously described [54]. Twenty-eight SNPs were shown to mediate mQTLs across 23 different loci (some SNPs had shared effects upon DNAm at CpGs, Table III). This is consistent with our previous observations that 25% of OA risk loci are also mQTLs [54,63,64].

Several OA risk loci have been identified which colocalise with genes encoding histone-modifying proteins including the transcriptional regulator SUPT3H, the nuclear receptor coactivator NCOA3 and the histone methyltransferase DOT1L [8–11,14]. DOT1L expression is essential for cartilage homeostasis, and the identification of this region as a risk locus strongly supports the requirement for tight regulation of chromatin state to maintain cartilage integrity [62]. Similarly, OA risk SNPs have been identified in the region of cartilage-specific ncRNAs that are known to be vital for homeostasis of the articular joint surface and are dysregulated in OA, including the miRNAs miR-140 and miR-455 [23,25].

Our knowledge of the interplay between genetics and epigenetics is rapidly developing as novel technologies emerge that allow interrogation of the epigenome. Future analyses require in-depth studies utilising large sample sizes to further enhance our understanding of the role of epigenetics as a contributor to OA development.

Evolutionary origins and developmental effects of OA genetic risk

One of the most fascinating investigations of OA genetics in recent years has been into the role of natural selection as humans adapted to bipedalism and its relationship to OA risk. As noted above, the vast majority of OA SNPs reside in non-coding regions of the genome and are presumed to regulate gene expression. In a series of publications, Capellini and colleagues have shown that a large proportion of these SNPs are enriched in chondrocyte regulatory elements that are active during joint formation in the embryo, that these elements have been subjected to selection to facilitate bipedalism, and that their DNA sequences have then been constrained against further changes to preserve the derived joint shape [41,65–67]. It would appear, therefore, that a proportion of OA risk alleles are subtly changing these regulatory elements (for example, by quantitatively altering the binding of gene regulatory proteins), that this is tolerated due to only modest effects on joint morphology but that these changes become pathological and detrimental as we age. Since these negative effects only impact the elderly, there is no selective pressure acting against the alleles. It is proposed that these OA risk alleles have achieved appreciable frequency in the population by drift or by antagonistic pleiotropy [41]. The latter is when an allele has a positive phenotypic effect in the young but a negative phenotypic effect in the elderly and is a process thought to contribute to the occurrence of many common age-associated diseases [68]. These deleterious effects of allelic drift and of pleiotropic alleles will become increasingly important to public health as populations age and the prevalence of OA continues to increase.

The concept that OA has aetiological roots in joint development is not a new one [69–72] and many studies have considered the role of OA SNPs as modulators of joint shape [73–77]. However, the data from the Capellini team provide evidence that a proportion of OA genetic risk is instigated at the start of life. This has profound implications in that joint development may be as important as, if not more important than, joint maintenance when considering fundamental causes of the disease. There are also consequences for the functional and genomic studies of OA GWAS signals. At present, when primary tissues are investigated these tend to be donated by OA patients who have undergone arthroplasty. In future, developmental tissues should also be analyzed. This will enable further understanding of the interaction between our DNA and its own self-regulatory mechanisms, how this relationship adapts throughout the life-course, and when and how functional effects are exerted, leading to disease. As an example, one could examine foetal tissue to assess if OA associated eQTLs identified in OA patients are operating during development. If so, this would imply that the gene regulatory elements that are the targets of OA genetic risk are also functionally active during development and that they then go on to predispose to OA as we age.

Clinical utility of OA genetics

Although the biological complexity of OA is daunting, the progress being made in our molecular genetic understanding of the disease has the potential to inform and accelerate development of prognostic markers and tailored OA therapeutics [78]. Indeed, several disease modifying osteoarthritis drugs (DMOADs) that are currently in clinical trials, including intra-articular TGF-β and FGF-18 growth factor therapies and Wnt inhibitors [79], are targeting proteins whose genes have been highlighted by GWAS [21,24].

The link between OA genetics and epigenetics opens up the latter as a therapeutic option. As noted earlier, several OA risk loci colocalise with genes encoding histone-modifying proteins [8,9,10,11,14] and histone deacetylase (HDAC) inhibitors do show efficacy as inhibitors of the expression of catabolic molecules, such as matrix metalloproteinases (MMPs) and IL-1, in OA chondrocyte and mouse OA models [80]. The ability to modulate DNAm using CRISPR-Cas9 tools is also promising. In the functional study cited above, a dCas9-TET1 construct was used for demethylation of a hypermethylated mQTL within the promoter of RWDD2B [57]. The resulting increase in expression of RWDD2B reversed the impact of the OA genetic risk at this locus. Although undertaken in an immortalized cell line, this study highlights the possibility of using epigenome editing to counter the gene expression effects of a risk locus.

The use of DMOADs and epigenetic interventions will need to consider how best to deliver a therapeutic into the joint tissue and ensure its permanency, and which patients to target for which therapies [79]. This latter issue may be aided by the application of OA polygenic risk scores (PRS). As ever more GWAS loci are reported for a given disease, there is an enhanced ability to stratify an individual's probability of developing disease based upon frequency and effect size of their inherited variants, generating a PRS [81]. The development of PRS has the potential to accurately screen for disease risk among populations and can inform interventions [82,83]. There are therefore reasonable grounds to be optimistic that the data emerging from OA genetic studies, and from the genomic analyses that complement these, will be applied to patient


treatment [84]. Based on the likely developmental origin of some OA risk, translation of genetic discoveries will need to consider the time in an individual's life at which it is optimal to start a treatment.

Concluding remarks

Many OA genetics reviews have been written over the years and each new one provides further insight into this common, complex disease. This reflects the speed at which the field is advancing, with new discoveries being reported on a regular basis. The bedrock of this is the highly successful application of GWAS to large cohorts, which have so far yielded over 100 OA GWAS SNPs (Table I). From these studies, it is apparent that OA genetic risk is operating on a range of biological mechanisms encompassing articular joint formation, homeostasis and maintenance (Fig. 3), with impacts on the expression of synovial joint tissue genes being common.

Although impressive, the current proportion of the heritability accounted for by known OA risk loci is just over 20% [24], meaning that there are still a large number of loci to be discovered. Efforts are underway to fill the gap and at an impressive scale, with large cohorts from across the globe being investigated (https://www.genetics-osteoarthritis.com/home/index.html). What is particularly exciting about these studies is that ethnic groups who have been underrepresented in OA GWAS are now starting to be included, which should enable a clearer picture to be generated of the genetic architecture of OA at a more global population level. Ongoing GWAS should continue to be complemented by follow-up genomic data analyses. This should not be restricted to cartilage nor to a particular age group: as touched on above, the molecular genetics of OA should be considered as a risk running throughout the life-course and not restricted to older individuals.

Although for many loci it is possible to highlight a causal SNP and then prioritize a target gene by statistical fine mapping and in silico analysis, this is not always the case. The application of laboratory-based functional tools is essential to validate a target and to elucidate one when fine-mapping draws a blank. Going forward, the use of cells with enhanced chondrogenic potential [85] combined with three dimensional (3D) models of cartilage, and of cartilage with bone [86], will provide more robust and realistic cell and organ models for the functional analyses of OA SNPs and their target genes. These models will also enable the inclusion of mechanical load as an experimental parameter, further aligning them with the in vivo reality of an articulating joint [87]. A complementary approach to such in vitro functional studies is the investigation of risk alleles in mice. This has so far proven particularly insightful for the OA risk that maps to the GDF5 locus [41] and similar reports are appearing in the literature [88]. A recent database of genes associated with OA in animal models can complement these investigations [89]. The degree to which the functional characterization of OA GWAS signals will benefit from animal models is open to debate, but clearly there are grounds for optimism [88,90].

Finally, although no OA genetic study has yet led to a diagnostic tool or treatment, some of the OA-associated genes encode proteins that are active in pathways which have therapeutics available (Fig. 3), several of which are in clinical trials [21,24]. Genetic discoveries do therefore have translatable potential and as more GWAS loci are reported, the current gap between discovery and utility will narrow.

Author contributions

All authors were involved in drafting the article and revising it critically for intellectual content, and all authors approved the final version to be submitted.

Conflict of interest

The authors declare that they have no conflicts of interest related to the content of this manuscript.

Acknowledgements

Our research is supported by Versus Arthritis (grant 20771), by the Medical Research Council and Versus Arthritis Centre for Integrated Research into Musculoskeletal Ageing (CIMA, grant references JXR 10641, MR/P020941/1 and MR/R502182/1), by The John George William Patterson (JGWP) Foundation, and by the Ruth & Lionel Jacobson Charitable Trust. These funding bodies had no editorial influence on the content of this manuscript.

References

1. Buniello A, MacArthur JAL, Cerezo M, Harris LW, Hayhurst J, Malangone C, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res 2019;47:D1005–12.
2. Miyamoto Y, Mabuchi A, Shi D, Kubo T, Takatori Y, Saito S, et al. A functional polymorphism in the 5' UTR of GDF5 is associated with susceptibility to osteoarthritis. Nat Genet 2007;39:529–33.
3. Miyamoto Y, Shi D, Nakajima M, Ozaki K, Sudo A, Kotani A, et al. Common variants in DVWA on chromosome 3p24.3 are associated with susceptibility to knee osteoarthritis. Nat Genet 2008;40:994–8.
4. Nakajima M, Takahashi A, Kou I, Rodriguez-Fontenla C, Gomez-Reino JJ, Furuichi T, et al. New sequence variants in HLA class II/III region associated with susceptibility to knee osteoarthritis identified by genome-wide association study. PLoS One 2010;5:e9723.
5. Kerkhof HJ, Lories RJ, Meulenbelt I, Jonsdottir I, Valdes AM, Arp P, et al. A genome-wide association study identifies an osteoarthritis susceptibility locus on chromosome 7q22. Arthritis Rheum 2010;62:499–510.
6. Valdes AM, Evangelou E, Kerkhof HJ, Tamm A, Doherty SA, Kisand K, et al. The GDF5 rs143383 polymorphism is associated with osteoarthritis of the knee with genome-wide statistical significance. Ann Rheum Dis 2011;70:873–5.
7. Day-Williams AG, Southam L, Panoutsopoulou K, Rayner NW, Esko T, Estrada K, et al. A variant in MCF2L is associated with osteoarthritis. Am J Hum Genet 2011;89:446–50.
8. arcOGEN Consortium, arcOGEN Collaborators, Zeggini E, Panoutsopoulou K, Southam L, Rayner NW, et al. Identification of new susceptibility loci for osteoarthritis (arcOGEN): a genome-wide association study. Lancet 2012;380:815–23.
9. Castaño-Betancourt MC, Cailotto F, Kerkhof HJ, Cornelis FM, Doherty SA, Hart DJ, et al. Genome-wide association and functional studies identify the DOT1L gene to be involved in cartilage thickness and hip osteoarthritis. Proc Natl Acad Sci USA 2012;109:8218–23.
10. Evangelou E, Valdes AM, Castaño-Betancourt MC, Doherty M, Doherty S, Esko T, et al. The DOT1L rs12982744 polymorphism is associated with osteoarthritis of the hip with genome-wide statistical significance in males. Ann Rheum Dis 2013;72:1264–5.
11. Evangelou E, Kerkhof HJ, Styrkarsdottir U, Evangelia NE, Bos SD, Esko T, et al. A meta-analysis of genome-wide association studies identifies novel variants associated with osteoarthritis of the hip. Ann Rheum Dis 2014;73:2130–6.
12. Styrkarsdottir U, Thorleifsson G, Helgadottir HT, Bomer N, Metrustry S, Bierma-Zeinstra S, et al. Severe osteoarthritis of the hand associates with common variants within the


ALDH1A2 gene and with variants at 1p31. Nat Genet 2014;46:498–502.
13. Evans DS, Cailotto F, Parimi N, Valdes AM, Castaño-Betancourt MC, Liu Y, et al. Genome-wide association and functional studies identify a role for IGFBP3 in hip osteoarthritis. Ann Rheum Dis 2015;74:1861–7.
14. Castaño-Betancourt MC, Evans DS, Ramos YF, Boer CG, Metrustry S, Liu Y, et al. Novel genetic variants for cartilage thickness and hip osteoarthritis. PLoS Genet 2016;12:e1006260.
15. Styrkarsdottir U, Helgason H, Sigurdsson A, Norddahl GL, Agustsdottir AB, Reynard LN, et al. Whole-genome sequencing identifies rare genotypes in COMP and CHADL associated with high risk of hip osteoarthritis. Nat Genet 2017;49:801–5.
16. Yau MS, Yerges-Armstrong LM, Liu Y, Lewis CE, Duggan DJ, Renner JB, et al. Genome-wide association study of radiographic knee osteoarthritis in North American Caucasians. Arthritis Rheum 2017;69:343–51.
17. Liu Y, Yau MS, Yerges-Armstrong LM, Duggan DJ, Renner JB, Hochberg MC, et al. Genetic determinants of radiographic knee osteoarthritis in African Americans. J Rheumatol 2017;44:1652–8.
18. den Hollander W, Boer CG, Hart DJ, Yau MS, Ramos YFM, Metrustry S, et al. Genome-wide association and functional studies identify a role for matrix Gla protein in osteoarthritis of the hand. Ann Rheum Dis 2017;76:2046–53.
19. Panoutsopoulou K, Thiagarajah S, Zengini E, Day-Williams AG, Ramos YF, Meessen JM, et al. Radiographic endophenotyping in hip osteoarthritis improves the precision of genetic association studies. Ann Rheum Dis 2017;76:1199–206.
20. Hackinger S, Trajanoska K, Styrkarsdottir U, Zengini E, Steinberg J, Ritchie GR, et al. Evaluation of shared genetic aetiology between osteoarthritis and bone mineral density identifies SMAD3 as a novel osteoarthritis risk locus. Hum Mol Genet 2017;26:3850–8.
21. Zengini E, Hatzikotoulas K, Tachmazidou I, Steinberg J, Hartwig FP, Southam L, et al. Genome-wide analyses using UK Biobank data provide insights into the genetic architecture of osteoarthritis. Nat Genet 2018;50:549–58.
22. Styrkarsdottir U, Lund SH, Thorleifsson G, Zink F, Stefansson OA, Sigurdsson JK, et al. Meta-analysis of Icelandic and UK data sets identifies missense variants in SMO, IL11, COL11A1 and 13 more new loci associated with osteoarthritis. Nat Genet 2018;50:1681–7.
23. Casalone E, Tachmazidou I, Zengini E, Hatzikotoulas K, Hackinger S, Suveges D, et al. A novel variant in GLIS3 is associated with osteoarthritis. Ann Rheum Dis 2018;77:620–3.
24. Tachmazidou I, Hatzikotoulas K, Southam L, Esparza-Gordillo J, Haberland V, Zheng J, et al. Identification of new therapeutic targets for osteoarthritis through genome-wide analyses of UK Biobank data. Nat Genet 2019;51:230–6.
25. Boer CG, Yau MS, Rice SJ, Coutinho de Almeida R, Cheung K, Styrkarsdottir U, et al. Genome-wide association of phenotypes based on clustering patterns of hand osteoarthritis identify WNT9A as novel osteoarthritis gene. Ann Rheum Dis 2021, https://doi.org/10.1136/annrheumdis-2020-217834 (in press).
26. Okada Y, Eyre S, Suzuki A, Kochi Y, Yamamoto K. Genetics of rheumatoid arthritis: 2018 status. Ann Rheum Dis 2019;78:446–53.
27. Dempster ER, Lerner IM. Heritability of threshold characters. Genetics 1950;35:212–36.
28. Amberger JS, Bocchini CA, Scott AF, Hamosh A. OMIM.org: leveraging knowledge across phenotype-gene relationships. Nucleic Acids Res 2019;47:D1038–43.
29. Lappalainen T, Sammeth M, Friedlander MR, 't Hoen PA, Monlong J, Rivas MA, et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 2013;501:506–11.
30. GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 2017;550:204–13.
31. Nicolae DL, Gamazon E, Zhang W, Duan S, Dolan ME, Cox NJ. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet 2010;6:e1000888.
32. Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 2012;337:1190–5.
33. Gallagher MD, Chen-Plotkin AS. The post-GWAS era: from association to function. Am J Hum Genet 2018;102:717–30.
34. Pers TH, Karjalainen JM, Chan Y, Westra HJ, Wood AR, Yang J, et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat Commun 2015;6:1–9.
35. Zaykin DV, Zhivotovsky LA. Ranks of genuine associations in whole-genome scans. Genetics 2005;171:813–23.
36. Schaid DJ, Chen W, Larson NB. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat Rev Genet 2018;19:491–504.
37. Hutchinson A, Watson H, Wallace C. Improving the coverage of credible sets in Bayesian genetic fine-mapping. PLoS Comput Biol 2020;16:e1007829.
38. den Hollander W, Pulyakhina I, Boer C, Bomer N, van der Breggen R, Arindrarto W, et al. Annotating transcriptional effects of genetic variants in disease-relevant tissue: transcriptome-wide allelic imbalance in osteoarthritic cartilage. Arthritis Rheum 2019;71:561–70.
39. Steinberg J, Ritchie GR, Roumeliotis TI, Jayasuriya RL, Clark MJ, Brooks RA, et al. Integrative epigenomics, transcriptomics and proteomics of patient chondrocytes reveal genes and pathways involved in osteoarthritis. Sci Rep 2017;7:8935.
40. Liu Y, Chang J-C, Hon C-C, Fukui N, Tanaka N, Zhang Z, et al. Chromatin accessibility landscape of articular knee cartilage reveals aberrant enhancer regulation in osteoarthritis. Sci Rep 2018;8:15499.
41. Richard D, Liu Z, Cao J, Kiapour AM, Willen J, Yarlagadda S, et al. Evolutionary selection and constraint on human knee chondrocyte regulation impacts osteoarthritis risk. Cell 2020;181:362–81.
42. Roadmap Epigenomics Consortium, Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, et al. Integrative analysis of 111 reference human epigenomes. Nature 2015;518:317–29.
43. Wang Y, Song F, Zhang B, Zhang L, Xu J, Kuang D, et al. The 3D Genome Browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions. Genome Biol 2018;19:151.
44. Rushton MD, Reynard LN, Barter MJ, Refaie R, Rankin KS, Young DA, et al. Characterization of the cartilage DNA methylome in knee and hip osteoarthritis. Arthritis Rheum 2014;66:2450–60.
45. Fernandez-Tajes J, Soto-Hermida A, Vazquez-Mosquera ME, Cortes-Pereira E, Mosquera A, Fernandez-Moreno M, et al. Genome-wide DNA methylation analysis of articular chondrocytes reveals a cluster of osteoarthritic patients. Ann Rheum Dis 2014;73:668–77.
46. Jeffries MA, Donica M, Baker LW, Stevenson ME, Annan AC, Humphrey MB, et al. Genome-wide DNA methylation study identifies significant epigenomic changes in osteoarthritic cartilage. Arthritis Rheum 2014;66:2804–15.

47. Moazedi-Fuerst FC, Hofner M, Gruber G, Weinhaeusel A, Stradner MH, Angerer H, et al. Epigenetic differences in human cartilage between mild and severe OA. J Orthop Res 2014;32:1636–45.
48. den Hollander W, Ramos YF, Bos SD, Bomer N, van der Breggen R, Lakenberg N, et al. Knee and hip articular cartilage have distinct epigenomic landscapes: implications for future cartilage regeneration approaches. Ann Rheum Dis 2014;73:2208–12.
49. Wingender E, Dietze P, Karas H, Knüppel R. TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Res 1996;24:238–41.
50. Gee F, Clubbs CF, Raine EV, Reynard LN, Loughlin J. Allelic expression analysis of the osteoarthritis susceptibility locus that maps to chromosome 3p21 reveals cis-acting eQTLs at GNL3 and SPCS1. BMC Med Genet 2014;15:53.
51. Gee F, Rushton MD, Loughlin J, Reynard LN. Correlation of the osteoarthritis susceptibility variants that map to chromosome 20q13 with an expression quantitative trait locus operating on NCOA3 and with functional variation at the polymorphism rs116855380. Arthritis Rheum 2015;67:2923–32.
52. Rice SJ, Aubourg G, Sorial AK, Almarza D, Tselepi M, Deehan DJ, et al. Identification of a novel, methylation-dependent, RUNX2 regulatory region associated with osteoarthritis risk. Hum Mol Genet 2018;27:3464–74.
53. Shepherd C, Zhu D, Skelton AJ, Combe J, Threadgold H, Zhu L, et al. Functional characterization of the osteoarthritis genetic risk residing at ALDH1A2 identifies rs12915901 as a key target variant. Arthritis Rheum 2018;70:1577–87.
54. Rice SJ, Tselepi M, Sorial AK, Aubourg G, Shepherd C, Almarza D, et al. Prioritization of PLEC and GRINA as osteoarthritis risk genes through the identification and characterization of novel methylation quantitative trait loci. Arthritis Rheum 2019;71:1285–96.
55. Shepherd C, Reese AE, Reynard LN, Loughlin J. Expression analysis of the osteoarthritis genetic susceptibility mapping to the matrix Gla protein gene MGP. Arthritis Res Ther 2019;21:149.
56. Sorial AK, Hofer IM, Tselepi M, Cheung K, Parker E, Deehan DJ, et al. Multi-tissue epigenetic analysis of the osteoarthritis susceptibility locus mapping to the plectin gene PLEC. Osteoarthritis Cartilage 2020;28:1448–58.
57. Parker E, Hofer IM, Rice SJ, Earl L, Anjum SA, Deehan DJ, et al. Multi-tissue epigenetic and gene expression analysis combined with epigenome modulation identifies RWDD2B as a target of osteoarthritis susceptibility. Arthritis Rheum 2021;73:100–9.
58. Klein JC, Keith A, Rice SJ, Shepherd C, Agarwal V, Loughlin J, et al. Functional testing of thousands of osteoarthritis-associated variants for regulatory activity. Nat Commun 2019;10:2434.
59. Adli M. The CRISPR tool kit for genome editing and beyond. Nat Commun 2018;9:1911.
60. Kehayova Y, Wilkinson JM, Loughlin J, Rice SJ. The molecular genetics and epigenetics of COLGALT2, a risk locus for osteoarthritis. Osteoarthritis Cartilage 2020;28(Suppl 1):S54–5.
61. Rice SJ, Tselepi M, Roberts J, Loughlin J. Molecular genetic and epigenetic analysis of the osteoarthritis risk residing downstream of the gene TGFB1. Osteoarthritis Cartilage 2020;28(Suppl 1):S339.
62. Rice SJ, Beier F, Young DA, Loughlin J. Interplay between genetics and epigenetics in osteoarthritis. Nat Rev Rheumatol 2020;16:268–81.
63. Rice SJ, Cheung K, Reynard LN, Loughlin J. Discovery and analysis of methylation quantitative trait loci (mQTLs) mapping to novel osteoarthritis genetic risk signals. Osteoarthritis Cartilage 2019;27:1545–56.
64. Rushton MD, Reynard LN, Young DA, Shepherd C, Aubourg G, Gee F, et al. Methylation quantitative trait locus (meQTL) analysis of osteoarthritis links epigenetics with genetic risk. Hum Mol Genet 2015;24:7432–44.
65. Capellini TD, Chen H, Cao J, Doxey AC, Kiapour AM, Schoor M, et al. Ancient selection for derived alleles at a GDF5 enhancer influencing human growth and osteoarthritis risk. Nat Genet 2017;49:1202–10.
66. Kiapour AM, Cao J, Young M, Capellini TD. The role of Gdf5 regulatory regions in development of hip morphology. PLoS One 2018;13:e0202785.
67. Pregizer SK, Kiapour AM, Young M, Chen H, Schoor M, Liu Z, et al. Impact of broad regulatory regions on Gdf5 expression and function in knee development and susceptibility to osteoarthritis. Ann Rheum Dis 2018;77:450.
68. Byars SG, Voskarides K. Antagonistic pleiotropy in human disease. J Mol Evol 2020;88:12–25.
69. Bos SD, Slagboom PE, Meulenbelt I. New insights into osteoarthritis: early developmental features of an ageing-related disease. Curr Opin Rheumatol 2008;20:553–9.
70. Slagboom E, Meulenbelt I. Genetics of osteoarthritis: early developmental clues to an old disease. Nat Clin Pract Rheumatol 2008;4:563.
71. Pitsillides AA, Beier F. Cartilage biology in osteoarthritis - lessons from developmental biology. Nat Rev Rheumatol 2011;7:654–63.
72. Sandell LJ. Etiology of osteoarthritis: genetics and synovial joint development. Nat Rev Rheumatol 2012;8:77–89.
73. Baker-Lepain JC, Lynch JA, Parimi N, McCulloch CE, Nevitt MC, Corr M, et al. Variant alleles of the Wnt antagonist FRZB are determinants of hip shape and modify the relationship between hip shape and osteoarthritis. Arthritis Rheum 2012;64:1457–65.
74. Lindner C, Thiagarajah S, Wilkinson JM, Panoutsopoulou K, Day-Williams AG, arcOGEN Consortium, et al. Investigation of association between hip osteoarthritis susceptibility loci and radiographic proximal femur shape. Arthritis Rheum 2015;67:2076–84.
75. Baird DA, Paternoster L, Gregory JS, Faber BG, Saunders FR, Giuraniuc CV, et al. Investigation of the relationship between susceptibility loci for hip osteoarthritis and dual x-ray absorptiometry-derived hip shape in a population-based cohort of perimenopausal women. Arthritis Rheum 2018;70:1984–93.
76. Hatzikotoulas K, Roposch A, DDH Case Control Consortium, Shah KM, Clark MJ, Bratherton S, et al. Genome-wide association study of developmental dysplasia of the hip identifies an association with GDF5. Commun Biol 2018;1:56.
77. Frysz M, Baird D, Gregory JS, Aspden RM, Lane NE, Ohlsson C, et al. The influence of adult hip shape genetic variants on adolescent hip shape: findings from a population-based DXA study. Bone 2021;143:115792.
78. Grandi FC, Bhutani N. Epigenetic therapies for osteoarthritis. Trends Pharmacol Sci 2020;41:557–69.
79. Oo WM, Hunter DJ. Disease modification in osteoarthritis: are we there yet? Clin Exp Rheumatol 2019;37:135–40.
80. Khan NM, Haqqi TM. Epigenetics in osteoarthritis: potential of HDAC inhibitors as therapeutics. Pharmacol Res 2018;128:73–9.
81. Torkamani A, Wineinger NE, Topol EJ. The personal and clinical utility of polygenic risk scores. Nat Rev Genet 2018;19:581–90.
82. Khera AV, Chaffin M, Aragam KG, Haas ME, Roselli C, Choi SH, et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat Genet 2018;50:1219–24.

83. Gibson G. On the utilization of polygenic risk scores for therapeutic targeting. PLoS Genet 2019;15:e1008060.
84. Zeggini E, Gloyn AL, Barton AC, Wain LV. Translational genomics and precision medicine: moving from the lab to the clinic. Science 2019;365:1409–13.
85. Katz DB, Huynh NPT, Savadipour A, Palte I, Guilak F. An immortalized human adipose-derived stem cell line with highly enhanced chondrogenic properties. Biochem Biophys Res Commun 2020;530:252–8.
86. Li Z, Xiang S, Li EN, Fritch MR, Alexander PG, Lin H, et al. Tissue engineering for musculoskeletal regeneration and disease modeling. Handb Exp Pharmacol 2021, https://doi.org/10.1007/164_2020_377 (in press).
87. Vincent TL, Wann AKT. Mechanoadaption: articular cartilage through thick and thin. J Physiol 2019;597:1271–81.
88. Butterfield NC, Curry KF, Steinberg J, Dewhurst H, Komla-Ebri D, Mannan NS, et al. Accelerating functional gene discovery in osteoarthritis. Nat Commun 2021;12:467.
89. Soul J, Barter MJ, Little CB, Young DA. OATargets: a knowledge base of genes associated with osteoarthritis joint damage in animals. Ann Rheum Dis 2021, https://doi.org/10.1136/annrheumdis-2020-218344 (in press).
90. Vincent TL. Of mice and men: converging on a common molecular understanding of osteoarthritis. Lancet Rheumatol 2020;2:e633–45.

Please cite this article as: Aubourg G et al., Genetics of osteoarthritis, Osteoarthritis and Cartilage, https://doi.org/10.1016/j.joca.2021.03.002