Distant Regulatory Effects of Genetic Variation in Multiple Human Tissues
bioRxiv preprint doi: https://doi.org/10.1101/074419; this version posted September 9, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Brian Jo1&, Yuan He2&, Benjamin J. Strober2&, Princy Parsana3&, François Aguet4, Andrew A. Brown5,6,7, Stephane E. Castel8,9, Eric R. Gamazon10,11, Ariel Gewirtz1, Genna Gliner12, Buhm Han13, Amy Z. He3, Eun Yong Kang14, Ian C. McDowell15, Xiao Li4, Pejman Mohammadi8,9, Christine B. Peterson16, Gerald Quon4,17, Ashis Saha3, Ayellet V. Segrè4, Jae Hoon Sul18, Timothy J. Sullivan4, Kristin G. Ardlie4, Christopher D. Brown19, Donald F. Conrad20, Nancy J. Cox10, Emmanouil T. Dermitzakis5,6,7, Eleazar Eskin14,21, Manolis Kellis4,17, Tuuli Lappalainen8,9, Chiara Sabatti22, GTEx Consortium, Barbara E. Engelhardt23*, Alexis Battle3*
1 Lewis Sigler Institute, Princeton University, Princeton, NJ 08540
2 Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218
3 Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218
4 The Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, Massachusetts 02142
5 Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland
6 Institute for Genetics and Genomics in Geneva (iG3), University of Geneva, 1211 Geneva, Switzerland
7 Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland
8 New York Genome Center, 101 Avenue of the Americas, New York, NY 10013
9 Department of Systems Biology, Columbia University Medical Center, New York, NY 10032
10 Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232
11 Department of Clinical Epidemiology, Biostatistics and Bioinformatics and Department of Psychiatry, Academic Medical Center, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands
12 Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08540
13 Department of Convergence Medicine, University of Ulsan College of Medicine, Asan Medical Center, Korea
14 Department of Computer Science, University of California, Los Angeles, CA 90095
15 Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, NC 27708
16 Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 1400 Pressler Street, Houston, TX 77030
17 Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139
18 Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, CA 90095
19 University of Pennsylvania, Perelman School of Medicine, Department of Genetics, Philadelphia, PA 19104
20 Department of Genetics, Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, Missouri
21 Department of Human Genetics, University of California, Los Angeles, CA 90095
22 Departments of Biomedical Data Science and Statistics, HRP Redwood Building, Stanford, CA 94305-5404
23 Princeton University, Department of Computer Science, Center for Statistics and Machine Learning, 35 Olden Street, Princeton, NJ 08540
& Equal contribution
* Corresponding authors
Abstract
Understanding the genetics of gene regulation provides information on the cellular mechanisms through which genetic variation influences complex traits. Expression quantitative trait loci, or eQTLs, are enriched for polymorphisms that have been found to be associated with disease risk. While most analyses of human data have focused on regulation of expression by nearby variants (cis-eQTLs), distal or trans-eQTLs may have broader effects on the transcriptome and important phenotypic consequences, necessitating a comprehensive study of the effects of genetic variants on distal gene transcription levels. In this work, we identify trans-eQTLs in the Genotype-Tissue Expression (GTEx) project data [1], consisting of 449 individuals with RNA-sequencing data across 44 tissue types. We find 81 genes with a trans-eQTL in at least one tissue, and we demonstrate that trans-eQTLs are more likely than cis-eQTLs to have effects specific to a single tissue.
We evaluate the genomic and functional properties of trans-eQTL variants, identifying strong enrichment in enhancer elements and Piwi-interacting RNA clusters. Finally, we describe three tissue-specific regulatory loci underlying relevant disease associations: 9q22 in thyroid, which has a role in thyroid cancer; 5q31 in skeletal muscle; and a previously reported master regulator near KLF14 in adipose tissue. These analyses provide a comprehensive characterization of trans-eQTLs across human tissues, contributing to an improved understanding of the tissue-specific cellular mechanisms of regulatory genetic variation.
Introduction
Variation in the human genome influences complex disease risk through changes at the cellular level. Many disease-associated variants are also associated with the expression levels of genes through which they mediate disease risk. The majority of expression quantitative trait locus (eQTL) studies [1–6] thus far have focused on local, or cis-, eQTLs because cis association mapping in humans is relatively simple, for both statistical and biological reasons [7,8]. Trans-eQTLs, genetic variants that affect the expression levels of distant target genes, have received much less attention than cis-eQTLs, in part because of the considerable multiple-hypothesis testing burden [9]. In human data, far fewer replicable, strong-effect trans-eQTLs have been discovered than cis-eQTLs, unlike in model organisms such as Saccharomyces cerevisiae and Arabidopsis thaliana [7,10,11]. However, a handful of replicable trans-eQTLs have now been identified in human tissues [3,12,13]. Additionally, recent work has suggested that trans-eQTLs contribute substantially to the genetic regulation of complex diseases [12], motivating a careful examination of the role of trans-eQTLs across human tissues in disease etiology.
Here, we identify trans-eQTLs in the Genotype-Tissue Expression (GTEx) v6 data, which include 449 individuals with imputed genotypes and RNA-seq data across 44 tissues, for a total of 7,051 samples. We evaluate the tissue specificity of trans-eQTLs, and we demonstrate replication of trans-eQTLs in a large independent RNA-seq study [14]. We show that trans-eQTLs are enriched among tests restricted to variants associated with the expression of nearby genes and to trait-associated variants. We then explore properties of genetic variants significantly associated with distal gene expression levels, including functional enrichment in cis-regulatory elements and Piwi-interacting RNA clusters. We discuss three examples of trans-eQTLs in the GTEx data: the broad regulatory role of the 9q22 locus near the thyroid-specific transcription factor FOXE1; a trait-associated regulatory locus in skeletal muscle acting through the interferon regulatory factor IRF-1; and replication of a previously identified master regulator near KLF14 in adipose tissue, with broad but differential effects in subcutaneous and visceral adipose.
Detection of trans-eQTLs across 44 tissues
We performed trans-eQTL association mapping in each of the 44 GTEx tissues independently. For each tissue, we applied a linear model controlling for ancestry, sex, genotyping platform, and unobserved factors in the expression data that may reflect batch effects or other technical confounders [15,16] (see Online Methods).
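The per-tissue linear model described above can be illustrated with a minimal sketch for a single variant-gene pair. It assumes genotypes coded as alternate-allele dosages (0, 1, 2), one gene's normalized expression vector, and a covariate matrix standing in for ancestry, sex, genotyping platform, and inferred hidden factors. The function names are hypothetical, and this is not the GTEx pipeline, whose details are given in Online Methods. The sketch uses the Frisch-Waugh-Lovell theorem: regressing covariate-residualized expression on covariate-residualized dosage gives the same genotype coefficient as the full model.

```python
import numpy as np


def residualize(values, covariates):
    """Remove the least-squares fit of `values` on an intercept plus covariates."""
    n = covariates.shape[0]
    design = np.column_stack([np.ones(n), covariates])
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coef


def trans_eqtl_stats(dosage, expression, covariates):
    """Genotype effect size and t-statistic for one variant-gene pair.

    Hypothetical helper. Fits expression ~ intercept + covariates + dosage
    via the Frisch-Waugh-Lovell theorem: both vectors are residualized on the
    covariates, then residual expression is regressed on residual dosage.
    """
    n, k = covariates.shape
    g = residualize(np.asarray(dosage, dtype=float), covariates)
    y = residualize(np.asarray(expression, dtype=float), covariates)
    gg = g @ g
    slope = (g @ y) / gg
    resid = y - slope * g  # identical to the full-model residuals
    dof = n - k - 2  # parameters: intercept, k covariates, genotype
    sigma2 = (resid @ resid) / dof
    se = np.sqrt(sigma2 / gg)
    return slope, slope / se


# Usage on simulated data (illustrative only, not GTEx data).
rng = np.random.default_rng(0)
n_samples, n_covariates = 500, 3
covariates = rng.normal(size=(n_samples, n_covariates))
dosage = rng.binomial(2, 0.3, size=n_samples).astype(float)
expression = (0.5 * dosage
              + covariates @ np.array([1.0, -0.5, 0.2])
              + rng.normal(size=n_samples))
slope, t_stat = trans_eqtl_stats(dosage, expression, covariates)
print(f"slope={slope:.3f}, t={t_stat:.2f}")
```

In a genome-wide trans scan, the residualization could be computed once per gene and once per variant and reused across pairs, since the slope then reduces to dot products of residual vectors. A two-sided p-value would follow from Student's t distribution with n - k - 2 degrees of freedom (for example, `2 * scipy.stats.t.sf(abs(t_stat), dof)`), with multiple-testing control applied across all tested pairs.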
We tested for association between every protein-coding gene or long non-coding RNA and all autosomal variants with minor allele frequency (MAF) > 0.05, restricting tests to gene-variant pairs located on different chromosomes. We also developed a standardized pipeline for removing detectable false positives from trans-eQTL calls in RNA-seq data. One major source of artifacts is read mismapping: because of sequence similarity between distant regions of the genome, a true cis-eQTL variant can appear to regulate a distal gene [3]. To correct for this, we excluded genes with poor mappability, variants in repetitive elements, and trans-eQTL associations between pairs of genomic loci showing evidence of cross-mapping (see Online Methods).
Applying this approach, we found a total of 590 trans-eQTLs (false discovery rate, FDR ≤ 0.1,