bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
1 Title:
2 Implicating candidate genes at GWAS signals by leveraging topologically associating
3 domains
4 Running Title:
5 TADs aid candidate gene discovery from GWAS
6 Authors:
7 Gregory P. Way1,2, Daniel W. Youngstrom3, Kurt D. Hankenson3, Casey S. Greene2*,
8 and Struan F.A. Grant4,5*
9 * These authors directed this work jointly
10 Affiliations:
11 1Genomics and Computational Biology Graduate Program, University of Pennsylvania,
12 Philadelphia, PA 19104, USA
13 2Department of Systems Pharmacology and Translational Therapeutics, University of
14 Pennsylvania, Philadelphia, PA 19104, USA
15 3Department of Orthopaedic Surgery, School of Medicine, University of Michigan, Ann
16 Arbor, MI, USA
17 4Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania,
18 Philadelphia, PA 19104, USA
19 5Division of Human Genetics, Division of Endocrinology and Diabetes, The Children's
20 Hospital of Philadelphia, Philadelphia, PA 19104, USA
21
22
23
1 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
24 Co-Corresponding Authors:*
Struan F.A. Grant Casey S. Greene
Rm 1102D, 3615 Civic Center Blvd 10-131 SCTR 34th and Civic Center Blvd, Philadelphia, PA 19104 Philadelphia, PA 19104 Office: 267-426-2795 Office: 215-573-2991 Fax: 215-590-1258 Fax: 215-573-9135
25 Conflict of Interest:
26 The authors have nothing to disclose.
27 Sources of Support:
28 This work was supported by the Genomics and Computational Biology Graduate
29 program at The University of Pennsylvania (to G.P.W.); the Gordon and Betty Moore
30 Foundation’s Data Driven Discovery Initiative (grant number GBMF 4552 to C.S.G);
31 the National Institute of Dental & Craniofacial Research (grant number NIH
32 F32DE026346 to D.W.Y.); S.F.A.G is supported by the Daniel B. Burke Endowed
33 Chair for Diabetes Research. D.W.Y. is supported by NIH-NRSA F32 DE
34
35
36
37
38
39
40
41
2 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
42 Abstract:
43 Genome wide association studies (GWAS) have contributed significantly to the
44 understanding of complex disease genetics. However, GWAS only report associated signals
45 and do not necessarily identify culprit genes. As most signals occur in non-coding regions
46 of the genome, it is often challenging to assign genomic variants to the underlying causal
47 mechanism(s). Topologically associating domains (TADs) are primarily cell-type
48 independent genomic regions that define interactome boundaries and can aid in the
49 designation of limits within which an association most likely impacts gene function. We
50 describe and validate a computational method that uses the genic content of TADs to
51 discover candidate genes. Our method, called “TAD_Pathways”, performs a Gene
52 Ontology (GO) analysis over genes that reside within TAD boundaries corresponding to
53 GWAS signals for a given trait or disease. We applied our pipeline to the GWAS catalog
54 entries associated with bone mineral density (BMD), identifying ‘Skeletal System
55 Development’ (Benjamini-Hochberg adjusted p=1.02x10-5) as the top ranked pathway. In
56 many cases, our method implicated a gene other than the nearest gene. Our molecular
57 experiments describe a novel example: ACP2, implicated at the canonical ‘ARHGAP1’
58 locus. We found ACP2 to be an important regulator of osteoblast metabolism, whereas
59 ARHGAP1 was not supported. Our results via the example of BMD demonstrate how basic
60 principles of three-dimensional genome organization can define biologically informed
61 association windows.
62
63
64
3 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
65 Keywords:
66 Candidate gene method, genome wide association study, topologically associating
67 domains, pathway analysis, bone mineral density
68
69 Introduction:
70 GWAS have been applied to over 300 different traits, leading to the discovery of
71 important disease associations1. However, assigning signals to causal genes has proven
72 difficult because these signals fall principally within noncoding regions and do not
73 necessarily implicate the nearest gene2,3. For example, a signal found in an FTO intron has
74 been shown to physically interact with and lead to the differential expression of other
75 genes, and not FTO itself 4. Moreover, there is evidence suggesting a type 2 diabetes
76 GWAS association impacting TCF7L2 also influences ACSL5 5. It remains unclear how
77 pervasive these kinds of associations are, but similar strategies are necessary in order for
78 GWAS to better guide research and precision medicine6.
79 Chromatin interaction studies have discovered key genome organization principles
80 including TADs7. TADs are genomic regions defined by increased contact frequency,
81 consistency across cell types, and their enrichment of insulator element flanks8. Therefore,
82 TADs can be used as boundaries of where non-coding causal variants will most likely
83 impact tissue independent function.
84 We developed a computational approach, called “TAD_Pathways”, which uses TADs
85 to determine candidate genes. We validated our method using BMD GWAS9. Using our
86 method, we identified ACP2 as a novel regulator of osteoblast metabolism.
87
4 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
88 Methods:
89 Computational procedures to identify candidate genes
90 TAD_Pathways is a computational method that uses TADs to discover candidate genes
91 (Figure 1A). Alternative approaches either assign genes based on nearest gene or by an
92 arbitrary or a linkage disequilibrium-based window of several kilobases10 (Figure 1B).
93 Here, we use human embryonic stem cell TAD boundaries as reported by Dixon et al. and
94 converted to hg19 by Ho et al. to build TAD based genesets that consists of all Gencode
95 genes that fall inside TADs implicated with BMD associations8,11. We perform a pathway
96 overrepresentation test12 for the input TAD genes against Gene Ontology (GO) terms13.
97 This determines if the geneset is associated with any term at a higher probability than by
98 chance. We included both experimentally confirmed and computationally inferred genes,
99 which permit the inclusion of putative genes which do not necessarily have literature
100 support, but are predicted computationally. For validation, we consider only the most
101 significantly enriched term, but a user can also select multiple. Our method also supports
102 custom input SNP lists. TAD_Pathways software is available at
103 https://github.com/greenelab/tad_pathways.
104
105
5 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
106
107 Figure 1. Concepts motivating our approach. Topologically associating domains (TADS) 108 are shown as orange triangles, genes are shown as black lines, and a genome wide 109 significant GWAS signal is shown as a dotted red line. (A) The TAD_Pathways method. 110 An example using Bone Mineral Density GWAS signals is shown. (B) Three hypothetical 111 examples illustrated by a cartoon. The ground truth causal gene is shaded in red. The 112 method-specific selected genes are shaded in blue. The top panel describes a nearest gene 113 approach. The nearest gene in this scenario is not the gene actually impacted by the GWAS 114 SNP. The middle panel describes a window approach. Based either on linkage 115 disequilibrium or an arbitrarily sized window, the scenario does not capture the true gene. 116 The bottom panel describes the TAD_Pathways approach. In this scenario, the causal gene 117 is selected for downstream assessment. 118
119 Experimental knockdown of candidate genes
120 We investigated two candidate genes predicted by TAD_Pathways; ACP2 and DEAF1.
121 The corresponding BMD GWAS loci rs7932354 (cytoband: 11p11.2) and rs11602954
122 (cytoband: 11p15.5) are assigned to ARHGAP1 and BET1L, respectively. The candidate
123 genes were selected because no experimental support exists for their impact in bone and the
124 nearest genes to the GWAS signals do not have immediate bone related functional
125 implications. All experiments were conducted in three temporally separated independent
126 technical replicates from cryopreserved P2 aliquots of a human fetal osteoblast cell line
127 (ATCC hFOB 1.19 CRL-11372). Full experimental design can be found in the
128 accompanying Supplementary Information.
6 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
129 siRNA transfections were conducted using a commercial reagent system with cells
130 assigned to one of 8 experimental groups: untreated control, siRNA-negative transfection
131 reagent control, scrambled control siRNA, TNAP siRNA, ARHGAP1 siRNA, ACP2 siRNA,
132 BET1L siRNA or Suppressin (DEAF1) siRNA.
133 For quantitative gene expression analysis, whole RNA was harvested at Day 4, isolated
134 and reverse-transcribed. Duplicate 20ng templates for each gene of interest were assayed
135 via real-time PCR using SYBR reagents. Results were analyzed using the 2-ddCt method
136 using GAPDH as a housekeeping gene and reported as (mean±standard deviation) fold-
137 change versus untreated controls. Cellular metabolism/proliferation was assessed at 1 and 4
138 days post-transfection using an MTT cell growth determination kit. Results are reported as
139 the difference in mean absorbance at 570nm minus 690nm from Day 1 to Day 4 and error
140 bars represent root-mean-square standard deviation from the measurements at both days.
141 Early osteoblast differentiation was assessed at 4 days post-transfection using an alkaline
142 phosphatase (ALP) kit. Dried plates were scanned and ALP+ intensities/areas were
143 quantified in ImageJ. Values are reported as mean ± standard deviation.
144
145 Results:
146 TAD_Pathways reveals candidate genes within phenotype-associated TADs
147 We applied TAD_Pathways to BMD GWAS results derived from replication-requiring
148 journals9,14–16. Our method implicated ‘Skeletal System Development’ as the top ranked
149 pathway (Benjamini-Hochberg adjusted p=1.02x10-5). For full BMD TAD_Pathways refer
150 to Supplementary Table S1. Many candidates were not the nearest gene to the GWAS
151 signal and several had independent eQTL support (Supplementary Table S2).
7 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
152 siRNA Knockdown of TAD_Pathways Gene Predictions in Osteoblast Cells
153 We targeted the expression of four of these genes in vitro using siRNA and assessed
154 transcriptional knockdown efficiency. Knockdown efficiencies were: TNAP 48.7±9.9%
155 (p=0.141), ARHGAP1 68.7±14.3% (p=0.015), ACP2 48.9±6.4% (p=0.035), BET1L
156 56.4±1.0% (p>0.05) and DEAF1 52.7±9.2% (p=0.021) (Figure 2).
157 We noted variation across the three controls, with the scrambled siRNA control altering
158 expression of OCN (osteocalcin), IBSP (bone sialoprotein), TNAP and BET1L (p<0.05).
159 Relative to the scrambled siRNA control, OCN was downregulated in all siRNA groups
160 (p<0.05) except for BET1L siRNA (p=0.122). OSX, IBSP and TNAP were not significantly
161 altered by any siRNA treatment (Figure 2).
162
163 Figure 2: Real-time PCR of osteoblast differentiation genes and GWAS/TAD hits in hFOB 164 cells. siRNA was used to knock down expression of TNAP (positive control), ARHGAP1, 165 ACP2, BET1L and DEAF1. Relative expression of the osteoblast marker genes OSX, OCN 166 and IBSP suggest that GWAS/TAD hits are not major regulators of bone differentiation in 167 this model. Red bars highlight specificity of each siRNA knockdown. Values represent 168 mean ± standard deviation. Statistical significance relative to the scrambled siRNA control 169 is annotated as: *p ≤ 0.05 and #p ≤ 0.10 using a two-tailed Student’s t-test.
8 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
170 Metabolic Activity of TAD_Pathways Gene Predictions
171 Treatment with ACP2 siRNA led to a 66.0% reduction in MTT metabolic activity
172 versus the scrambled siRNA control (p=0.012). ARHGAP1 siRNA caused a 38.8%
173 reduction, falling short of statistical significance (p=0.088). siRNA targeted against TNAP,
174 BET1L or DEAF1 did not alter MTT metabolic activity (Figure 3A).
175
176 Alkaline Phosphatase Activity of TAD_Pathways Gene Predictions
177 ALP is highly expressed in osteoblasts; disruption of proliferation or osteoblast
178 differentiation would result in downregulation of ALP. TNAP siRNA significantly reduced
179 ALP intensity by 5.98±1.77 units versus the scrambled siRNA control (p=0.006). ACP2
180 siRNA also significantly reduced ALP intensity by 8.74±2.11 (p=0.003). The control
181 stained less intensely than untreated or transfection reagent controls, but this did not reach
182 statistical significance (0.05
183
184 Figure 3. Validating two ‘TAD Pathway’ predictions for Bone Mineral Density GWAS hits 185 on hFOB cells. siRNA was used to knock down expression of TNAP, ARHGAP1, ACP2, 186 BET1L and DEAF1. (A) Knockdown of ACP2 decreases cellular metabolic activity, 187 demonstrated using an MTT assay. (B) ALP staining and quantitation indicates that 188 knockdown of TNAP or ACP2 inhibits performance in an osteoblast differentiation assay. 189 Values represent mean ± standard deviation. Statistical significance relative to the 190 scrambled siRNA control is annotated as: *p ≤ 0.05 and #p ≤ 0.10 using a two-tailed 191 Student’s t-test.
9 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
192 Discussion:
193 We showed that TAD_Pathways can reveal functional gene to intermediate phenotype
194 relationships using BMD. Several of the TAD_Pathways genes, such as LRP5, are bona
195 fide BMD genes already identified by nearest gene GWAS, eQTL analyses, and clinical
196 syndromes17, thus providing positive controls. However, several BMD GWAS signals do
197 not have obvious nearest-gene associations with bone. Our results suggest that a nearby
198 gene ACP2, and not the nearest gene ARHGAP1, regulates osteoblast proliferation/viability.
199 There are several limitations to the approach. Publication biases from pathway curation
200 present challenges18. To lessen this bias, we include curated and computationally predicted
201 GO annotations. Additionally, we used TAD boundaries defined by Dixon et al., whereas
202 increased Hi-C resolution reduced estimated TAD sizes19. Despite our method using larger
203 TADs, thus including more presumably false positive genes, we still identify relevant
204 pathways. Furthermore, the method will likely fail in a disease instigated by aberrant
205 looping. We were also concerned that TAD_Pathways would work only in BMD. Indeed,
206 when we applied TAD_Pathways to Type 2 Diabetes, we also identify several candidate
207 genes that are not the nearest gene (see Supplementary Table S3). Moreover, the
208 experimental validation was performed in a simplified in vitro cell culture system lacking
209 organismal complexity, and the cell line selected is tetraploid, which may partially
210 compensate for gene knockdown. While TAD_Pathways identified several candidate genes,
211 we only examined two, and our validation approach does not directly interrogate each SNP.
212 Lastly, one of the investigated GWAS SNPs, rs7932354, located in the ARHGAP1
213 promoter, is an ARHGAP1 eQTL in several tissues as determined by GTEx20. However,
10 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
214 none of these tissues are bone related and our screen did not implicate ARHGAP1 in
215 osteoblast processes.
216 In conclusion, TAD_Pathways can be used as a candidate gene discovery tool that uses
217 chromatin looping to map associations to candidate genes. We also validated our method by
218 implicating ACP2 as a gene involved in BMD determination, which warrants future
219 investigation. We believe TAD_Pathways and algorithms that leverage 3D genomic
220 structure will be poised to discover novel disease features.
221
222 Acknowledgements
223 Hannah E. Sexton and Troy L. Mitchell assisted in optimizing siRNA transfection
224 conditions. Daniel Himmelstein and Amy Campbell performed analytical code review.
225
226 Conflict of Interest
227 The authors have nothing to disclose.
228
229 Availability of data and material
230 All data used to construct the TAD_Pathways approach are publically available
231 datasets. We make all software used to develop this approach publically available in a
232 GitHub repository (http://github.com/greenelab/tad_pathways). We also provide a
233 docker image (https://hub.docker.com/r/gregway/tad_pathways/) and archive the
234 GitHub software on Zenodo (https://zenodo.org/record/163950).
235
236
11 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
237 References:
238 1 Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H et al. The NHGRI 239 GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res 2014; 240 42: D1001–D1006.
241 2 Brodie A, Azaria JR, Ofran Y. How far from the SNP may the causative genes be? 242 Nucleic Acids Res 2016; : gkw500.
243 3 Ward LD, Kellis M. Interpreting noncoding genetic variation in complex traits and 244 human disease. Nat Biotechnol 2012; 30: 1095–1106.
245 4 Claussnitzer M, Dankel SN, Kim K-H, Quon G, Meuleman W, Haugen C et al. FTO 246 Obesity Variant Circuitry and Adipocyte Browning in Humans. N Engl J Med 2015; 247 373: 895–907.
248 5 Xia Q, Chesi A, Manduchi E, Johnston BT, Lu S, Leonard ME et al. The type 2 249 diabetes presumed causal variant within TCF7L2 resides in an element that controls the 250 expression of ACSL5. Diabetologia 2016; 59: 2360–2368.
251 6 Herman MA, Rosen ED. Making Biological Sense of GWAS Data: Lessons from the 252 FTO Locus. Cell Metab 2015; 22: 538–539.
253 7 Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A 254 et al. Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles 255 of the Human Genome. Science 2009; 326: 289–293.
256 8 Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y et al. Topological domains in 257 mammalian genomes identified by analysis of chromatin interactions. Nature 2012; 258 485: 376–380.
259 9 Estrada K, Styrkarsdottir U, Evangelou E, Hsu Y-H, Duncan EL, Ntzani EE et al. 260 Genome-wide meta-analysis identifies 56 bone mineral density loci and reveals 14 loci 261 associated with risk of fracture. Nat Genet 2012; 44: 491–501.
262 10 Hao K, Di X, Cawley S. LdCompare: rapid computation of single- and multiple-marker 263 r2 and genetic coverage. Bioinformatics 2007; 23: 252–254.
264 11 Ho JWK, Jung YL, Liu T, Alver BH, Lee S, Ikegami K et al. Comparative analysis of 265 metazoan chromatin organization. Nature 2014; 512: 449–452.
266 12 Wang J, Duncan D, Shi Z, Zhang B. WEB-based GEne SeT AnaLysis Toolkit 267 (WebGestalt): update 2013. Nucleic Acids Res 2013; 41: W77–W83.
268 13 Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM et al. Gene 269 Ontology: tool for the unification of biology. Nat Genet 2000; 25: 25–29.
12 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.
270 14 Richards JB, Rivadeneira F, Inouye M, Pastinen TM, Soranzo N, Wilson SG et al. 271 Bone mineral density, osteoporosis, and osteoporotic fractures: a genome-wide 272 association study. Lancet Lond Engl 2008; 371: 1505–1512.
273 15 Rivadeneira F, Styrkársdottir U, Estrada K, Halldórsson BV, Hsu Y-H, Richards JB et 274 al. Twenty bone-mineral-density loci identified by large-scale meta-analysis of 275 genome-wide association studies. Nat Genet 2009; 41: 1199–1206.
276 16 Styrkarsdottir U, Thorleifsson G, Sulem P, Gudbjartsson DF, Sigurdsson A, Jonasdottir 277 A et al. Nonsense mutation in the LGR4 gene is associated with several human diseases 278 and other traits. Nature 2013; 497: 517–520.
279 17 Baron R, Kneissel M. WNT signaling in bone homeostasis and disease: from human 280 mutations to treatments. Nat Med 2013; 19: 179–192.
281 18 Greene CS, Troyanskaya OG. Accurate evaluation and analysis of functional genomics 282 data and methods: Accurate evaluation and analysis of functional genomics data and 283 methods. Ann N Y Acad Sci 2012; 1260: 95–100.
284 19 Rao SSP, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT et al. A 285 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of 286 Chromatin Looping. Cell 2014; 159: 1665–1680.
287 20 Aguet F, Brown AA, Castel S, Davis JR, Mohammadi P, Segre AV et al. Local genetic 288 effects on gene expression across 44 human tissues. 289 2016http://biorxiv.org/lookup/doi/10.1101/074450 (accessed 3 Jan2017).
290
13