Implicating Candidate Genes at GWAS Signals by Leveraging Topologically Associating
Total Page:16
File Type:pdf, Size:1020Kb
bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 Title: 2 Implicating candidate genes at GWAS signals by leveraging topologically associating 3 domains 4 Running Title: 5 TADs aid candidate gene discovery from GWAS 6 Authors: 7 Gregory P. Way1,2, Daniel W. Youngstrom3, Kurt D. Hankenson3, Casey S. Greene2*, 8 and Struan F.A. Grant4,5* 9 * These authors directed this work jointly 10 Affiliations: 11 1Genomics and Computational Biology Graduate Program, University of Pennsylvania, 12 Philadelphia, PA 19104, USA 13 2Department of Systems Pharmacology and Translational Therapeutics, University of 14 Pennsylvania, Philadelphia, PA 19104, USA 15 3Department of Orthopaedic Surgery, School of Medicine, University of Michigan, Ann 16 Arbor, MI, USA 17 4Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, 18 Philadelphia, PA 19104, USA 19 5Division of Human Genetics, Division of Endocrinology and Diabetes, The Children's 20 Hospital of Philadelphia, Philadelphia, PA 19104, USA 21 22 23 1 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 24 Co-Corresponding Authors:* Struan F.A. Grant Casey S. Greene Rm 1102D, 3615 Civic Center Blvd 10-131 SCTR 34th and Civic Center Blvd, Philadelphia, PA 19104 Philadelphia, PA 19104 Office: 267-426-2795 Office: 215-573-2991 Fax: 215-590-1258 Fax: 215-573-9135 25 Conflict of Interest: 26 The authors have nothing to disclose. 27 Sources of Support: 28 This work was supported by the Genomics and Computational Biology Graduate 29 program at The University of Pennsylvania (to G.P.W.); the Gordon and Betty Moore 30 Foundation’s Data Driven Discovery Initiative (grant number GBMF 4552 to C.S.G); 31 the National Institute of Dental & Craniofacial Research (grant number NIH 32 F32DE026346 to D.W.Y.); S.F.A.G is supported by the Daniel B. Burke Endowed 33 Chair for Diabetes Research. D.W.Y. is supported by NIH-NRSA F32 DE 34 35 36 37 38 39 40 41 2 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 42 Abstract: 43 Genome wide association studies (GWAS) have contributed significantly to the 44 understanding of complex disease genetics. However, GWAS only report associated signals 45 and do not necessarily identify culprit genes. As most signals occur in non-coding regions 46 of the genome, it is often challenging to assign genomic variants to the underlying causal 47 mechanism(s). Topologically associating domains (TADs) are primarily cell-type 48 independent genomic regions that define interactome boundaries and can aid in the 49 designation of limits within which an association most likely impacts gene function. We 50 describe and validate a computational method that uses the genic content of TADs to 51 discover candidate genes. Our method, called “TAD_Pathways”, performs a Gene 52 Ontology (GO) analysis over genes that reside within TAD boundaries corresponding to 53 GWAS signals for a given trait or disease. We applied our pipeline to the GWAS catalog 54 entries associated with bone mineral density (BMD), identifying ‘Skeletal System 55 Development’ (Benjamini-Hochberg adjusted p=1.02x10-5) as the top ranked pathway. In 56 many cases, our method implicated a gene other than the nearest gene. Our molecular 57 experiments describe a novel example: ACP2, implicated at the canonical ‘ARHGAP1’ 58 locus. We found ACP2 to be an important regulator of osteoblast metabolism, whereas 59 ARHGAP1 was not supported. Our results via the example of BMD demonstrate how basic 60 principles of three-dimensional genome organization can define biologically informed 61 association windows. 62 63 64 3 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 65 Keywords: 66 Candidate gene method, genome wide association study, topologically associating 67 domains, pathway analysis, bone mineral density 68 69 Introduction: 70 GWAS have been applied to over 300 different traits, leading to the discovery of 71 important disease associations1. However, assigning signals to causal genes has proven 72 difficult because these signals fall principally within noncoding regions and do not 73 necessarily implicate the nearest gene2,3. For example, a signal found in an FTO intron has 74 been shown to physically interact with and lead to the differential expression of other 75 genes, and not FTO itself 4. Moreover, there is evidence suggesting a type 2 diabetes 76 GWAS association impacting TCF7L2 also influences ACSL5 5. It remains unclear how 77 pervasive these kinds of associations are, but similar strategies are necessary in order for 78 GWAS to better guide research and precision medicine6. 79 Chromatin interaction studies have discovered key genome organization principles 80 including TADs7. TADs are genomic regions defined by increased contact frequency, 81 consistency across cell types, and their enrichment of insulator element flanks8. Therefore, 82 TADs can be used as boundaries of where non-coding causal variants will most likely 83 impact tissue independent function. 84 We developed a computational approach, called “TAD_Pathways”, which uses TADs 85 to determine candidate genes. We validated our method using BMD GWAS9. Using our 86 method, we identified ACP2 as a novel regulator of osteoblast metabolism. 87 4 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 88 Methods: 89 Computational procedures to identify candidate genes 90 TAD_Pathways is a computational method that uses TADs to discover candidate genes 91 (Figure 1A). Alternative approaches either assign genes based on nearest gene or by an 92 arbitrary or a linkage disequilibrium-based window of several kilobases10 (Figure 1B). 93 Here, we use human embryonic stem cell TAD boundaries as reported by Dixon et al. and 94 converted to hg19 by Ho et al. to build TAD based genesets that consists of all Gencode 95 genes that fall inside TADs implicated with BMD associations8,11. We perform a pathway 96 overrepresentation test12 for the input TAD genes against Gene Ontology (GO) terms13. 97 This determines if the geneset is associated with any term at a higher probability than by 98 chance. We included both experimentally confirmed and computationally inferred genes, 99 which permit the inclusion of putative genes which do not necessarily have literature 100 support, but are predicted computationally. For validation, we consider only the most 101 significantly enriched term, but a user can also select multiple. Our method also supports 102 custom input SNP lists. TAD_Pathways software is available at 103 https://github.com/greenelab/tad_pathways. 104 105 5 bioRxiv preprint doi: https://doi.org/10.1101/087718; this version posted January 25, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 106 107 Figure 1. Concepts motivating our approach. Topologically associating domains (TADS) 108 are shown as orange triangles, genes are shown as black lines, and a genome wide 109 significant GWAS signal is shown as a dotted red line. (A) The TAD_Pathways method. 110 An example using Bone Mineral Density GWAS signals is shown. (B) Three hypothetical 111 examples illustrated by a cartoon. The ground truth causal gene is shaded in red. The 112 method-specific selected genes are shaded in blue. The top panel describes a nearest gene 113 approach. The nearest gene in this scenario is not the gene actually impacted by the GWAS 114 SNP. The middle panel describes a window approach. Based either on linkage 115 disequilibrium or an arbitrarily sized window, the scenario does not capture the true gene. 116 The bottom panel describes the TAD_Pathways approach. In this scenario, the causal gene 117 is selected for downstream assessment. 118 119 Experimental knockdown of candidate genes 120 We investigated two candidate genes predicted by TAD_Pathways; ACP2 and DEAF1. 121 The corresponding BMD GWAS loci rs7932354 (cytoband: 11p11.2) and rs11602954 122 (cytoband: 11p15.5) are assigned to ARHGAP1 and BET1L, respectively. The candidate 123 genes were selected because no experimental support exists for their impact in bone and the 124 nearest genes to the GWAS signals do not have immediate bone related functional 125 implications. All experiments were conducted in three temporally separated independent 126 technical replicates from cryopreserved P2 aliquots of a human fetal osteoblast cell line 127 (ATCC hFOB 1.19 CRL-11372). Full experimental design can be found in the 128 accompanying Supplementary Information.