bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 Title:

2 Implicating candidate at GWAS signals by leveraging topologically associating

3 domains

4 Running Title:

5 TADs aid candidate discovery from GWAS

6 Authors:

7 Gregory P. Way1,2, Daniel W. Youngstrom3, Kurt D. Hankenson3, Casey S. Greene2*,

8 and Struan F.A. Grant4,5*

9 * These authors directed this work jointly

10 Affiliations:

11 1Genomics and Computational Biology Graduate Program, University of Pennsylvania,

12 Philadelphia, PA 19104, USA

13 2Department of Systems Pharmacology and Translational Therapeutics, University of

14 Pennsylvania, Philadelphia, PA 19104, USA

15 3Department of Orthopaedic Surgery, School of Medicine, University of Michigan, Ann

16 Arbor, MI, USA

17 4Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania,

18 Philadelphia, PA 19104, USA

19 5Division of Human Genetics, Division of Endocrinology and Diabetes, The Children's

20 Hospital of Philadelphia, Philadelphia, PA 19104, USA

21

22

23

1 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

24 Co-Corresponding Authors:*

Struan F.A. Grant Casey S. Greene

Rm 1102D, 3615 Civic Center Blvd 10-131 SCTR 34th and Civic Center Blvd, Philadelphia, PA 19104 Philadelphia, PA 19104 Office: 267-426-2795 Office: 215-573-2991 Fax: 215-590-1258 Fax: 215-573-9135

25 Conflict of Interest:

26 The authors have nothing to disclose.

27 Sources of Support:

28 This work was supported by the Genomics and Computational Biology Graduate

29 program at The University of Pennsylvania (to G.P.W.); the Gordon and Betty Moore

30 Foundation’s Data Driven Discovery Initiative (grant number GBMF 4552 to C.S.G);

31 the National Institute of Dental & Craniofacial Research (grant number NIH

32 F32DE026346 to D.W.Y.); S.F.A.G is supported by the Daniel B. Burke Endowed

33 Chair for Diabetes Research. D.W.Y. is supported by NIH-NRSA F32 DE

34

35

36

37

38

39

40

41

2 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

42 Abstract:

43 Genome wide association studies (GWAS) have contributed significantly to the

44 understanding of complex disease genetics. However, GWAS only report associated signals

45 and do not necessarily identify culprit genes. As most signals occur in non-coding regions

46 of the genome, it is often challenging to assign genomic variants to the underlying causal

47 mechanism(s). Topologically associating domains (TADs) are primarily cell-type

48 independent genomic regions that define interactome boundaries and can aid in the

49 designation of limits within which an association most likely impacts gene function. We

50 describe and validate a computational method that uses the genic content of TADs to

51 discover candidate genes. Our method, called “TAD_Pathways”, performs a Gene

52 Ontology (GO) analysis over genes that reside within TAD boundaries corresponding to

53 GWAS signals for a given trait or disease. We applied our pipeline to the GWAS catalog

54 entries associated with bone mineral density (BMD), identifying ‘Skeletal System

55 Development’ (Benjamini-Hochberg adjusted p=1.02x10-5) as the top ranked pathway. In

56 many cases, our method implicated a gene other than the nearest gene. Our molecular

57 experiments describe a novel example: ACP2, implicated at the canonical ‘ARHGAP1’

58 locus. We found ACP2 to be an important regulator of osteoblast metabolism, whereas

59 ARHGAP1 was not supported. Our results via the example of BMD demonstrate how basic

60 principles of three-dimensional genome organization can define biologically informed

61 association windows.

62

63

64

3 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

65 Keywords:

66 Candidate gene method, genome wide association study, topologically associating

67 domains, pathway analysis, bone mineral density

68

69 Introduction:

70 GWAS have been applied to over 300 different traits, leading to the discovery of

71 important disease associations1. However, assigning signals to causal genes has proven

72 difficult because these signals fall principally within noncoding regions and do not

73 necessarily implicate the nearest gene2,3. For example, a signal found in an FTO intron has

74 been shown to physically interact with and lead to the differential expression of other

75 genes, and not FTO itself 4. Moreover, there is evidence suggesting a type 2 diabetes

76 GWAS association impacting TCF7L2 also influences ACSL5 5. It remains unclear how

77 pervasive these kinds of associations are, but similar strategies are necessary in order for

78 GWAS to better guide research and precision medicine6.

79 Chromatin interaction studies have discovered key genome organization principles

80 including TADs7. TADs are genomic regions defined by increased contact frequency,

81 consistency across cell types, and their enrichment of insulator element flanks8. Therefore,

82 TADs can be used as boundaries of where non-coding causal variants will most likely

83 impact tissue independent function.

84 We developed a computational approach, called “TAD_Pathways”, which uses TADs

85 to determine candidate genes. We validated our method using BMD GWAS9. Using our

86 method, we identified ACP2 as a novel regulator of osteoblast metabolism.

87

4 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

88 Methods:

89 Computational procedures to identify candidate genes

90 TAD_Pathways is a computational method that uses TADs to discover candidate genes

91 (Figure 1A). Alternative approaches either assign genes based on nearest gene or by an

92 arbitrary or a linkage disequilibrium-based window of several kilobases10 (Figure 1B).

93 Here, we use human embryonic stem cell TAD boundaries as reported by Dixon et al. and

94 converted to hg19 by Ho et al. to build TAD based genesets that consists of all Gencode

95 genes that fall inside TADs implicated with BMD associations8,11. We perform a pathway

96 overrepresentation test12 for the input TAD genes against (GO) terms13.

97 This determines if the geneset is associated with any term at a higher probability than by

98 chance. We included both experimentally confirmed and computationally inferred genes,

99 which permit the inclusion of putative genes which do not necessarily have literature

100 support, but are predicted computationally. For validation, we consider only the most

101 significantly enriched term, but a user can also select multiple. Our method also supports

102 custom input SNP lists. TAD_Pathways software is available at

103 https://github.com/greenelab/tad_pathways.

104

105

5 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

106

107 Figure 1. Concepts motivating our approach. Topologically associating domains (TADS) 108 are shown as orange triangles, genes are shown as black lines, and a genome wide 109 significant GWAS signal is shown as a dotted red line. (A) The TAD_Pathways method. 110 An example using Bone Mineral Density GWAS signals is shown. (B) Three hypothetical 111 examples illustrated by a cartoon. The ground truth causal gene is shaded in red. The 112 method-specific selected genes are shaded in blue. The top panel describes a nearest gene 113 approach. The nearest gene in this scenario is not the gene actually impacted by the GWAS 114 SNP. The middle panel describes a window approach. Based either on linkage 115 disequilibrium or an arbitrarily sized window, the scenario does not capture the true gene. 116 The bottom panel describes the TAD_Pathways approach. In this scenario, the causal gene 117 is selected for downstream assessment. 118

119 Experimental knockdown of candidate genes

120 We investigated two candidate genes predicted by TAD_Pathways; ACP2 and DEAF1.

121 The corresponding BMD GWAS loci rs7932354 (cytoband: 11p11.2) and rs11602954

122 (cytoband: 11p15.5) are assigned to ARHGAP1 and BET1L, respectively. The candidate

123 genes were selected because no experimental support exists for their impact in bone and the

124 nearest genes to the GWAS signals do not have immediate bone related functional

125 implications. All experiments were conducted in three temporally separated independent

126 technical replicates from cryopreserved P2 aliquots of a human fetal osteoblast cell line

127 (ATCC hFOB 1.19 CRL-11372). Full experimental design can be found in the

128 accompanying Supplementary Information.

6 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

129 siRNA transfections were conducted using a commercial reagent system with cells

130 assigned to one of 8 experimental groups: untreated control, siRNA-negative transfection

131 reagent control, scrambled control siRNA, TNAP siRNA, ARHGAP1 siRNA, ACP2 siRNA,

132 BET1L siRNA or Suppressin (DEAF1) siRNA.

133 For quantitative gene expression analysis, whole RNA was harvested at Day 4, isolated

134 and reverse-transcribed. Duplicate 20ng templates for each gene of interest were assayed

135 via real-time PCR using SYBR reagents. Results were analyzed using the 2-ddCt method

136 using GAPDH as a housekeeping gene and reported as (mean±standard deviation) fold-

137 change versus untreated controls. Cellular metabolism/proliferation was assessed at 1 and 4

138 days post-transfection using an MTT cell growth determination kit. Results are reported as

139 the difference in mean absorbance at 570nm minus 690nm from Day 1 to Day 4 and error

140 bars represent root-mean-square standard deviation from the measurements at both days.

141 Early osteoblast differentiation was assessed at 4 days post-transfection using an alkaline

142 phosphatase (ALP) kit. Dried plates were scanned and ALP+ intensities/areas were

143 quantified in ImageJ. Values are reported as mean ± standard deviation.

144

145 Results:

146 TAD_Pathways reveals candidate genes within phenotype-associated TADs

147 We applied TAD_Pathways to BMD GWAS results derived from replication-requiring

148 journals9,14–16. Our method implicated ‘Skeletal System Development’ as the top ranked

149 pathway (Benjamini-Hochberg adjusted p=1.02x10-5). For full BMD TAD_Pathways refer

150 to Supplementary Table S1. Many candidates were not the nearest gene to the GWAS

151 signal and several had independent eQTL support (Supplementary Table S2).

7 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

152 siRNA Knockdown of TAD_Pathways Gene Predictions in Osteoblast Cells

153 We targeted the expression of four of these genes in vitro using siRNA and assessed

154 transcriptional knockdown efficiency. Knockdown efficiencies were: TNAP 48.7±9.9%

155 (p=0.141), ARHGAP1 68.7±14.3% (p=0.015), ACP2 48.9±6.4% (p=0.035), BET1L

156 56.4±1.0% (p>0.05) and DEAF1 52.7±9.2% (p=0.021) (Figure 2).

157 We noted variation across the three controls, with the scrambled siRNA control altering

158 expression of OCN (osteocalcin), IBSP (bone sialoprotein), TNAP and BET1L (p<0.05).

159 Relative to the scrambled siRNA control, OCN was downregulated in all siRNA groups

160 (p<0.05) except for BET1L siRNA (p=0.122). OSX, IBSP and TNAP were not significantly

161 altered by any siRNA treatment (Figure 2).

162

163 Figure 2: Real-time PCR of osteoblast differentiation genes and GWAS/TAD hits in hFOB 164 cells. siRNA was used to knock down expression of TNAP (positive control), ARHGAP1, 165 ACP2, BET1L and DEAF1. Relative expression of the osteoblast marker genes OSX, OCN 166 and IBSP suggest that GWAS/TAD hits are not major regulators of bone differentiation in 167 this model. Red bars highlight specificity of each siRNA knockdown. Values represent 168 mean ± standard deviation. Statistical significance relative to the scrambled siRNA control 169 is annotated as: *p ≤ 0.05 and #p ≤ 0.10 using a two-tailed Student’s t-test.

8 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

170 Metabolic Activity of TAD_Pathways Gene Predictions

171 Treatment with ACP2 siRNA led to a 66.0% reduction in MTT metabolic activity

172 versus the scrambled siRNA control (p=0.012). ARHGAP1 siRNA caused a 38.8%

173 reduction, falling short of statistical significance (p=0.088). siRNA targeted against TNAP,

174 BET1L or DEAF1 did not alter MTT metabolic activity (Figure 3A).

175

176 Alkaline Phosphatase Activity of TAD_Pathways Gene Predictions

177 ALP is highly expressed in osteoblasts; disruption of proliferation or osteoblast

178 differentiation would result in downregulation of ALP. TNAP siRNA significantly reduced

179 ALP intensity by 5.98±1.77 units versus the scrambled siRNA control (p=0.006). ACP2

180 siRNA also significantly reduced ALP intensity by 8.74±2.11 (p=0.003). The control

181 stained less intensely than untreated or transfection reagent controls, but this did not reach

182 statistical significance (0.05

183

184 Figure 3. Validating two ‘TAD Pathway’ predictions for Bone Mineral Density GWAS hits 185 on hFOB cells. siRNA was used to knock down expression of TNAP, ARHGAP1, ACP2, 186 BET1L and DEAF1. (A) Knockdown of ACP2 decreases cellular metabolic activity, 187 demonstrated using an MTT assay. (B) ALP staining and quantitation indicates that 188 knockdown of TNAP or ACP2 inhibits performance in an osteoblast differentiation assay. 189 Values represent mean ± standard deviation. Statistical significance relative to the 190 scrambled siRNA control is annotated as: *p ≤ 0.05 and #p ≤ 0.10 using a two-tailed 191 Student’s t-test.

9 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

192 Discussion:

193 We showed that TAD_Pathways can reveal functional gene to intermediate phenotype

194 relationships using BMD. Several of the TAD_Pathways genes, such as LRP5, are bona

195 fide BMD genes already identified by nearest gene GWAS, eQTL analyses, and clinical

196 syndromes17, thus providing positive controls. However, several BMD GWAS signals do

197 not have obvious nearest-gene associations with bone. Our results suggest that a nearby

198 gene ACP2, and not the nearest gene ARHGAP1, regulates osteoblast proliferation/viability.

199 There are several limitations to the approach. Publication biases from pathway curation

200 present challenges18. To lessen this bias, we include curated and computationally predicted

201 GO annotations. Additionally, we used TAD boundaries defined by Dixon et al., whereas

202 increased Hi-C resolution reduced estimated TAD sizes19. Despite our method using larger

203 TADs, thus including more presumably false positive genes, we still identify relevant

204 pathways. Furthermore, the method will likely fail in a disease instigated by aberrant

205 looping. We were also concerned that TAD_Pathways would work only in BMD. Indeed,

206 when we applied TAD_Pathways to Type 2 Diabetes, we also identify several candidate

207 genes that are not the nearest gene (see Supplementary Table S3). Moreover, the

208 experimental validation was performed in a simplified in vitro cell culture system lacking

209 organismal complexity, and the cell line selected is tetraploid, which may partially

210 compensate for gene knockdown. While TAD_Pathways identified several candidate genes,

211 we only examined two, and our validation approach does not directly interrogate each SNP.

212 Lastly, one of the investigated GWAS SNPs, rs7932354, located in the ARHGAP1

213 promoter, is an ARHGAP1 eQTL in several tissues as determined by GTEx20. However,

10 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

214 none of these tissues are bone related and our screen did not implicate ARHGAP1 in

215 osteoblast processes.

216 In conclusion, TAD_Pathways can be used as a candidate gene discovery tool that uses

217 chromatin looping to map associations to candidate genes. We also validated our method by

218 implicating ACP2 as a gene involved in BMD determination, which warrants future

219 investigation. We believe TAD_Pathways and algorithms that leverage 3D genomic

220 structure will be poised to discover novel disease features.

221

222 Acknowledgements

223 Hannah E. Sexton and Troy L. Mitchell assisted in optimizing siRNA transfection

224 conditions. Daniel Himmelstein and Amy Campbell performed analytical code review.

225

226 Conflict of Interest

227 The authors have nothing to disclose.

228

229 Availability of data and material

230 All data used to construct the TAD_Pathways approach are publically available

231 datasets. We make all software used to develop this approach publically available in a

232 GitHub repository (http://github.com/greenelab/tad_pathways). We also provide a

233 docker image (https://hub.docker.com/r/gregway/tad_pathways/) and archive the

234 GitHub software on Zenodo (https://zenodo.org/record/163950).

235

236

11 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

237 References:

238 1 Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H et al. The NHGRI 239 GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res 2014; 240 42: D1001–D1006.

241 2 Brodie A, Azaria JR, Ofran Y. How far from the SNP may the causative genes be? 242 Nucleic Acids Res 2016; : gkw500.

243 3 Ward LD, Kellis M. Interpreting noncoding genetic variation in complex traits and 244 human disease. Nat Biotechnol 2012; 30: 1095–1106.

245 4 Claussnitzer M, Dankel SN, Kim K-H, Quon G, Meuleman W, Haugen C et al. FTO 246 Obesity Variant Circuitry and Adipocyte Browning in Humans. N Engl J Med 2015; 247 373: 895–907.

248 5 Xia Q, Chesi A, Manduchi E, Johnston BT, Lu S, Leonard ME et al. The type 2 249 diabetes presumed causal variant within TCF7L2 resides in an element that controls the 250 expression of ACSL5. Diabetologia 2016; 59: 2360–2368.

251 6 Herman MA, Rosen ED. Making Biological Sense of GWAS Data: Lessons from the 252 FTO Locus. Cell Metab 2015; 22: 538–539.

253 7 Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A 254 et al. Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles 255 of the . Science 2009; 326: 289–293.

256 8 Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y et al. Topological domains in 257 mammalian genomes identified by analysis of chromatin interactions. Nature 2012; 258 485: 376–380.

259 9 Estrada K, Styrkarsdottir U, Evangelou E, Hsu Y-H, Duncan EL, Ntzani EE et al. 260 Genome-wide meta-analysis identifies 56 bone mineral density loci and reveals 14 loci 261 associated with risk of fracture. Nat Genet 2012; 44: 491–501.

262 10 Hao K, Di X, Cawley S. LdCompare: rapid computation of single- and multiple-marker 263 r2 and genetic coverage. Bioinformatics 2007; 23: 252–254.

264 11 Ho JWK, Jung YL, Liu T, Alver BH, Lee S, Ikegami K et al. Comparative analysis of 265 metazoan chromatin organization. Nature 2014; 512: 449–452.

266 12 Wang J, Duncan D, Shi Z, Zhang B. WEB-based GEne SeT AnaLysis Toolkit 267 (WebGestalt): update 2013. Nucleic Acids Res 2013; 41: W77–W83.

268 13 Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM et al. Gene 269 Ontology: tool for the unification of biology. Nat Genet 2000; 25: 25–29.

12 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

270 14 Richards JB, Rivadeneira F, Inouye M, Pastinen TM, Soranzo N, Wilson SG et al. 271 Bone mineral density, osteoporosis, and osteoporotic fractures: a genome-wide 272 association study. Lancet Lond Engl 2008; 371: 1505–1512.

273 15 Rivadeneira F, Styrkársdottir U, Estrada K, Halldórsson BV, Hsu Y-H, Richards JB et 274 al. Twenty bone-mineral-density loci identified by large-scale meta-analysis of 275 genome-wide association studies. Nat Genet 2009; 41: 1199–1206.

276 16 Styrkarsdottir U, Thorleifsson G, Sulem P, Gudbjartsson DF, Sigurdsson A, Jonasdottir 277 A et al. Nonsense in the LGR4 gene is associated with several human diseases 278 and other traits. Nature 2013; 497: 517–520.

279 17 Baron R, Kneissel M. WNT signaling in bone homeostasis and disease: from human 280 to treatments. Nat Med 2013; 19: 179–192.

281 18 Greene CS, Troyanskaya OG. Accurate evaluation and analysis of functional genomics 282 data and methods: Accurate evaluation and analysis of functional genomics data and 283 methods. Ann N Y Acad Sci 2012; 1260: 95–100.

284 19 Rao SSP, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT et al. A 285 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of 286 Chromatin Looping. Cell 2014; 159: 1665–1680.

287 20 Aguet F, Brown AA, Castel S, Davis JR, Mohammadi P, Segre AV et al. Local genetic 288 effects on gene expression across 44 human tissues. 289 2016http://biorxiv.org/lookup/doi/10.1101/074450 (accessed 3 Jan2017).

290

13