Downloaded by guest on September 25, 2021 www.pnas.org/cgi/doi/10.1073/pnas.1707375114 Lopes-Ramos M. networks Camila eQTL Fagny with Maud tissues in regulation Exploring GTEx and associated. potential disease regulatory being in of differing likelihood the hubs with community informative, and highly global also chro- is active centrality tissue-specific Node regions. in matin variants around coalesce pro- tissue-specific that with cesses associated communities as well pro- across as processes biological tissues, shared coherent representing communities in find We involved cesses. these often commu- that modular grouping tissues, highly dense, nities 13 into organized in are show, networks structure eQTL We network function. the that regulatory found informs and levels expression variants genetic and between associations which significant in represent networks edges bipartite trait constructed quantitative we analyses, expression a (eQTL) Using developing map. in vari- challenge phenotype major genetic to a 2017) genotype of is 3, May phenotypes impact review complex for on regulatory (received ants 2017 collective 4, August the approved and Characterizing CA, Berkeley, California, of University Rine, Jasper by Edited and 02115; 02115; MA MA Boston, Health, Public of School a aoyrlso omnvrat n hi olcieipc on impact collective their regu- and net- tissue-specific variants these and that common shared find of we the roles edge, into latory an insight as cast provide is works tissue a within ciation Expression networks Genotype–Tissue from eQTL tissue-level the constructing by By consortium. collected (GTEx) (MAF) tissues frequency 13 allele in ants [minor common of effects tissues different in functions cellular incomplete. and is and humans, expression multiple in gene feasible how ence become of recently understanding only our has (14) How- tissues con- (13). is of organisms phenotype detection model large-scale the in ever, on observations impact similar an with phenotypes have sistent might of (12), variety humans a in in variants these That (9–11). of importance the for gene, the near those of (TSS) site Fur- start gene, the transcriptional a (8). the of tissues from proportion across far variants greater identified thermore, diabetes a those 2 than explain type heritability to for disease shown tissues been adipose iden- have and on eQTL (T2D) muscle depends example, skeletal For also in phenotype. likely the tified variants to these relevant Identify- tissues of the influence 7). role (6, regulatory they regulation the that gene ing suggesting in changes (5), quantitative through analysis phenotypes expression (eQTL) by locus measured to likely trait as variants expression, the for gene enriched of are affect (SNPs) regions traits polymorphisms complex noncoding single-nucleotide with in associated Those lie 4). (∼93%) (3, which genome overwhelm- of the 2), majority (1, size ing by effect small influenced have relatively are of traits, variants phenotypes many phenotypic complex that association and shown for variants resoundingly look genetic which common Genome- (GWASs), between limited. studies remains association traits wide complex and variation genetic M GWAS eateto isaitc n opttoa ilg,Dn-abrCne nttt,Bso,M 02115; MA Boston, Institute, Cancer Dana-Farber Biology, Computational and Biostatistics of Department epromdassesgntc nlsso h regulatory the of analysis genetics systems a performed We | and cis- xrsinqatttv ri locus trait quantitative expression r hnadcd fe h eunigo h human the between of relationship sequencing the the of after understanding our decade genome, a than ore eT,epanmr ftehrtblt fTDthan T2D of heritability the of more explain trans-eQTL, QL hc a nunehnrd fgenes of hundreds influence can which trans-eQTL, eT,weeec infiatSPgn asso- SNP–gene significant each where trans-eQTL, d a,b eateto acrBooy aaFre acrIsiue otn A02115 MA Boston, Institute, Cancer Dana-Farber Biology, Cancer of Department oehN Paulson N. Joseph , QL n hr smutn evidence mounting is there and cis-eQTL, a,b ibryGlass Kimberly , QLars ouain and populations across trans-eQTL | eQTL a,b c and cis- hnigDvso fNtokMdcn,BihmadWmnsHsia n avr eia colBoston, School Medical Harvard and Hospital Women’s and Brigham Medicine, Network of Division Channing aik .Kuijjer L. Marieke , | iatt networks bipartite c eT influ- trans-eQTL onQuackenbush John , % vari- >5%] | a,b bietR Sonawane R. Abhijeet , 1073/pnas.1707375114/-/DCSupplemental at online information supporting contains article This option. access open 1 PNAS the through online available Freely Submission. Direct PNAS a is article This interest. of paper. conflict the no wrote J.P. declare analyzed and authors J.P. J.Q., The and K.G., C.M.L.-R., J.Q., C.-Y.C., J.N.P., A.R.S., M.F., M.L.K., J.N.P., research; M.F., and performed data; J.P. and M.F. research; designed N-e n eoyigdt rpoesn n tissue-specific and preprocessing data genotyping and S1 (see RNA-Seq preprocessing Fig. and Appendix, control quality SI (approved After (dbGaP) GTEx 9112). Phenotypes no. the and protocol from from Genotypes release) lines of 2015-10-05 database cell the (phs000424.v6.p1, two dataset and 6.0 types version tissue 50 samples from postmortem from coming obtained genotypes Modular. imputed and Highly data Are Networks eQTL Discussion and Results avail- are tissues, with 13 all along at across networks, able properties tissue- network These gene and variants. and shared genetic SNP the of both effects understanding nat- specific a for provides that framework and reflects ural tissues that of associations across of architecture analysis web of polygenic from complex the devoid a emerges is and networks that eQTL enhancers picture the enriched nongenic The as network—are association. such the GWAS elements throughout distal genes site for many con- start are to hubs—which transcriptional global nected (iii) the and association; to GWAS for close and for regions enriched community—are chromatin their active in genes SNPs to are connected SNPs)—which regula- highly (core hubs open for community and (ii) regions); enriched transcribed tory (actively edges—are tissue-specific regions in chromatin within-group SNPs active and of genes, related density functionally genes pathways, and high SNPs a of composed with the are biol- of Communities—which regulatory aspects (i) tissue-level ogy: three inform that find topology we network eQTL particular, In pathways. biological uhrcnrbtos ...... YC,CML-. .. .. n J.P. and J.Q., K.G., C.M.L.-R., C.-Y.C., A.R.S., M.L.K., J.N.P., M.F., contributions: Author [email protected]. owo orsodnemyb drse.Eal [email protected] or [email protected] Email: addressed. be may correspondence whom To a,b,d,1 nqeisgtit h eoyepeoyerelationship. genotype–phenotype the into to provide insight findings variants unique linked Our clusters genetic potential. by regulatory eQTL tissue-specific tissue-specific driven with of and functions, tissue-specific clusters tissues structure to linked the across found explored functions We and shared networks. network, a those tissues in of 13 edges asso- of significant as represented each ciations analysis, In eQTL insight an phenotype. provide performed can of we genome gene, regulation the each genetic across of into levels locations expression of the millions with associates at which variants analysis, (eQTL) genetic locus in trait Expres- expressed tissue. quantitative be the sion pheno- can on influences genome depending ways, same genotype different the substantially that individual, is an genetics In type. in tenet core A Significance n onPlatig John and , networkmedicine.org:3838/eqtl/. PNAS c h-iChen Cho-Yi , | and b eateto isaitc,HradT .Chan H. T. Harvard Biostatistics, of Department ulse nieAgs 9 2017 29, August online Published a,b,1 aeil n Methods and Materials . a,b edwlae RNA-Seq downloaded We www.pnas.org/lookup/suppl/doi:10. , o eal of details for | E7841–E7850

SYSTEMS BIOLOGY PNAS PLUS gene filtering), we retained 29,242 genes (15, 16) and 5,096,867 regions into 15 chromatin states based on epigenetic marks mea- SNPs across all tissues. For statistical power purposes, we consid- sured in a specific tissue or cell line; these classifications were ered only tissues for which we had both RNA-Seq and imputed available for eight tissues (adipose subcutaneous, artery aorta, genotyping data for at least 200 individuals (SI Appendix, Fig. fibroblast cell line, esophagus mucosa, heart left ventricle, lung, S2). Twelve tissues and one cell line met all criteria (SI Appendix, skeletal muscle, and whole blood) (20). Correcting for local Fig. S1 and Table S1). linkage disequilibrium, we found that trans-eQTL are signifi- For each of the 13 tissues, we tested for association between cantly more likely to be located in regulatory and actively tran- SNP genotypes and levels both in cis and in trans, scribed regions (TSSs and their flanking regions, enhancers) in correcting for reported sex, age, ethnic background, and the top at least seven of the eight tissues and significantly less likely three principal components obtained using genotyping data (SI to fall into quiescent regions and constitutive heterochromatin Appendix, Figs. S1 and S3 and Materials and Methods). Including (Dataset S1). This is confirmed by P values combined across SNPs within 1 Mb of each gene, we found between 285,283 and tissues obtained using conditional logistic regression (stratified 691,333 significant cis-eQTL (5,301–11,035 genes) and between by tissue; Dataset S1). In addition, we found that a major- 7,151 and 15,183 significant trans-eQTL (326–955 genes) at a ity (50–66%) of trans-eQTL were also associated with genes false discovery rate (FDR) of 5% in each tissue. Despite dif- in cis. ferences in normalization, number and type of covariates, and For each of the 13 tissues, we represented the significant eQTL eQTL P-value calculation, on average 73% of our cis-eQTL were as a bipartite network, with nodes representing either SNPs or also detected by the GTEx project in each tissue (minimum 70% genes and edges representing significant SNP–gene associations in artery aorta and maximum 76% in thyroid; SI Appendix, Table (for example, heart left ventricle tissue; Fig. 1A). Each network S2). Consistent with previous reports (17–19), we find most cis- was composed of a “giant connected component” (GCC) plus eQTL are located around TSSs, with 50% of SNPs located within additional small connected components. To increase the size of ∼16,000 bp of the nearest TSS (14,767 bp for whole blood and the GCC and because network centrality measures are more 17,109 bp for thyroid). We also observed that cis-eQTL were sensitive to false negative than to false positive edges (21, 22), highly replicable across tissues (70–88% were replicated in at we relaxed the FDR cutoff and included all eQTL with FDR least one other tissue), while trans-eQTL replicability across tis- q values under 0.2. At this threshold, about 11% of SNPs are sues was more varied (37–88% were replicated in at least one associated with at least one gene in cis (333,225–751,418 SNPs) other tissue). and 0.3% in trans (10,814–22,570 SNPs; Dataset S2). For all If our trans-eQTL reflect actual associations, we would subsequent analyses, we focused on the networks defined by expect that they should be preferentially located in regulatory these GCCs. genomic regions. To test for this, we used the Roadmap Epige- We used the R condor package (23) to identify communities nomics Project core 15-states model, which classifies genomic in each of the 13 eQTL networks (SI Appendix, Fig. S1). We

A

B Com C SNPs WBL 22 ATA 56 HRV 123 LNG 62 ATT 85 SMU 103 ADS 104 EMC 131 EMS 122

FIB 131 Genes SKN 169 TNV 147 THY 162

0.0 0.2 0.4 0.6 0.8 1.0 Modularity

Fig. 1. Structure of eQTL networks. (A) The eQTL network from heart left ventricle. Each circle represents a community. The nodes, both SNPs and genes, are located around each circle. Gray lines represent network edges (significant cis- and trans-eQTL associations). (B) Modularity of the eQTL network from each of the 13 tissues. Modularity assesses the strength of division of the network in communities and corresponds to the fraction of edges observed within each community minus the expected fraction if edges were randomly distributed. B, Right shows the number of communities (Com) in each network. (C) Structure of communities within the eQTL heart left ventricle network. The heart left ventricle network is represented as a matrix with SNPs in columns and genes in rows. Each network edge is represented by a point. Intracommunity edges are plotted in blue and intercommunity edges in black. Community structure for the other 12 networks is presented in SI Appendix, Fig. S4.

E7842 | www.pnas.org/cgi/doi/10.1073/pnas.1707375114 Fagny et al. Downloaded by guest on September 25, 2021 Downloaded by guest on September 25, 2021 an tal. et Fagny akydarmlnigcutr rmtehampt the to heatmap the of from number clusters the linking to proportional diagram is Sankey thickness (B) ribbon’s included. A functions. are clustered network. tissues TS the each 12 in in involved least terms genes GO at of for in cluster enriched each significant community for one were enriched least communities that at terms contain GO that in tissues Only found tissues. be can all network each from and community each in processes 2. Fig. in (example density communities the different 1 than nodes from Fig. higher nodes linking much linking edges edges being of of communities community density same the the the network, with from each clustered, Fig. within (thyroid, tightly that 0.97 were to suggests eQTL blood) the This (whole communities of 1B). 0.74 (skin) modularities from The 169 ranged network. networks and tissue-specific blood) each (whole in 22 between found ae xlsvl ngnmccreain uha ikg dise- linkage correlation. as expression not such gene is or correlations quilibrium communities genomic network on results eQTL exclusively or our of based Overall, two structure the S6). from that Fig. SNPs suggest Appendix, subcuta- and (SI (adipose genes 94% included more and communities line) tissues, of 13 cell neous) all Across (fibroblast distribu- community. 71% each chromosomal in between the SNPs and examined genes we of cluster tion this, to test region chromosomal To same the together. clustering from a genes and to SNPs lead of might disequilibrium, linkage local including using found be to likely not methods. frac- thus network tiny and coexpression a genes, for few except contain correlation, that tion expression gene 29, by of driven ratio not odds S5B, Fig. cor- , Appendix median P (SI with 0.4 those than higher among than relation overrepresented (fewer were communities genes) small-size correlation 10 Moreover, median a 0.4. presenting communities, than communities of higher median of 86% The in 2% 0.2 about S5A). than with Fig. lower Appendix, was (SI correlation from absolute communities ranged the community of and most tissue coef- by pairwise of ficients distributions The network. each and community correla- Pearson (Pearson the coefficient computed tion we structure, community and mined 3 between and genes 1,220 SNPs. and 26,056 2 between broadly, ranged A ial,w eonz htgntcrcmiaineffects, recombination genetic that recognize we Finally, deter- expression gene correlated much how determine To au of value C oentokcmuiisaeerce o ilgclfntossae costsus opeels ftesgicnl vrersne biological overrepresented significantly the of list complete A tissues. across shared functions biological for enriched are communities network Some and 2.3 .Tesz ftecommunities the of size The S4). Fig. Appendix, SI × 2 5 10 −20 Tissues -Communities .Ti hw htorcmuiisare communities our that shows This ). r o ahpi fgnswti each within genes of pair each for ) 3 1 4 . o02for 0.2 to −0.2 ( S3. Dataset −log Biological Processes 10 eta lseigtesmlrt fG ilgclpoessi communities in processes biological GO of similarity the clustering Heatmap A) 8 0 4 ( p ) ee qiaeti iet h omnt nteoiia test original the in community by the of tissue to set each selected size ( randomly in in a term equivalent using genes GO test enrichment shared tests the a of rerunning 1,000 number for of the distribution enriched to shared null due community a in signal computed enrichment spurious we a this run, not that was ensure terms sig- tissue- further GO have To 13 terms S3). GO the Dataset shared of these were that combined 12 nificant showed that probability least we combined terms Fisher at (FCPT), the test GO from Using networks. as community eQTL specific defined a in we enriched which 2 functions, (Figs. logical tissues all to relevant 3). functions and included biological others with while for functions genes enriched tissue-specific were in communities involved some genes that found We and S3). communities tissues (Dataset across 5% and processes of biological FDR observed for a the enriched at compared processes process communities biological 208 biological one found least we at (GO) tissues, all Ontology over- Across for (24). Gene tissue each of in communities representation all tested we Bio- communities, Tissue-Specific and Functions. Shared logical Reflects Structure Community eQTL endfiegop fcmuiis(i.2and 2 (Fig. communities of that groups immunity and five metabolism regu- to RNA defined related and terms transcription GO of of groups lation five identified We 2A). (Fig. enrichment enrich- terms. random GO than enrichment common of in rather observed 4% ment processes the bottom biological that the shared indicating in reflects distributions, was null enrichment observed the the communities, the and with tissues ciated all across of terms “regulation GO in GO:0010468 shown term is expression” GO gene shared the in enrichment .A xml fanl itiuinof distribution null a of example An S3). Dataset efis dnie omnte nihdfruiutu bio- ubiquitous for enriched communities identified first We oaayeteersls ecutrdcmuiisuigtheir using communities clustered we results, these analyze To B P ausi hs bqiosbooia ucin only functions biological ubiquitous these in values oivsiaetebooia eeac fthese of relevance biological the investigate To P ausars l ise (P tissues all across values PNAS | o l shared all For S8. Fig. Appendix, SI ulse nieAgs 9 2017 29, August online Published P vle o each for -values aussonin shown values P aae S3 Dataset P au asso- value ausfor values | E7843 ).

SYSTEMS BIOLOGY PNAS PLUS 22 21 rs4660306 A 20 1 B generation of precursor 19 rs1053941 metabolites and energy 18 energy derivation by oxidation 17 2 of organic compounds rs2779116 oxidative phosphorylation 16 ATP synthesis coupled 15 3 electron transport mitochondrial ATP synthesis coupled electron transport 14 mitochondrial electron transport, NADH to ubiquinone 4 rs7727544 rs419291 13 proton transport rs11950562 rs273914 hydrogen transport Odds Ratio 2 12 5 rs273913 hydrogen ion rs272889 transmembrane transport 10 rs272869 11 mitochondrion organization 20 6 rs274567 rs1016988 10 0 5 10 15 20 25 7 rs2106854

rs2896526 9 8 rs2522056 −log10(FDR qvalue)

Fig. 3. Close-up of a community enriched for TS GO biological processes: the heart ventricle community 86. (A) Circos diagram for heart left ventricle eQTL in community 86. From the outside to the inside: chromosomes (in black), genes from community 86 (in blue, dark blue bars represent genes associated with cellular metabolism), SNPs (in green, with dark green bars indicating an association with metabolic traits), and SNP–gene assocations (in gray: all associations, in red: eQTL linked to genes involved in cellular respiration or SNPs involved in metabolism). Details about GWAS annotation of the SNPs and genes are given in Dataset S4.(B) Heart left ventricle community 86 is enriched for genes involved in cellular respiration.

For example, we found that genes involved in functions linked of these 52 genes to be associated with cardiovascular-related to pathogen recognition, innate and adaptive immune response traits and metabolism such as conotruncal heart defects, obesity, triggering, T-cell activation, and inflammation (groups 2, 3, and and blood metabolite levels, a significant enrichment [resam- 5) are overrepresented in at least one community in all tissue- pling P = 1.6 × 10−3, Fig. 3A and Dataset S4 (26)]. This com- specific eQTL networks except lung. These communities include munity also contains SNPs from three genomic regions that have many genes from the major histocompatibility complex (MHC) been linked to metabolic traits and blood metabolite levels; these class II and may reflect the presence of infiltrated macrophages include association with the metabolite carnitine, a molecule in all tissues. Similarly, 12 of the 13 tissue-specific networks involved in the transport of fatty acid from cytoplasm to the mito- present communities strongly enriched for genes related to the chondrial matrix where those fatty acids are metabolized (Fig. 3A control of transcription and RNA metabolism (groups 1 and 4). and Dataset S4). Communities that are enriched for the same GO biological Other examples of tissue-specific overrepresentation of bio- process tend to share many of the same SNPs, genes, and edges, logical functions include transmission of nerve impulse and with the average pairwise Jaccard index between tissues for each myelination in community 4 from tibial nerve and muscle devel- GO biological process ranging from 0.16 to 0.69 for SNPs, 0.17 opment and contraction in muscular tissues. Community 30 from to 0.47 for genes, and 0.10 to 0.46 for edges. The content of these the heart left ventricle network is enriched for genes related communities with shared biological functions can vary slightly to ventricular cardiac muscle tissue morphogenesis and contrac- from tissue to tissue (1–5%, 1–10%, and 9–32% for SNPs, genes, tion, community 75 from skeletal muscle for striated muscle con- and edges, respectively). However, these differences are small traction and cell differentiation, and community 83 from esopha- and likely due to slight variations in sample size between tissues, gus muscularis for smooth muscle contraction (SI Appendix, Fig. affecting the significance of the eQTL associations, network clus- S7 and Dataset S3). tering, and the small differences in the populations analyzed for This suggests that eQTL network community structure reflects each tissue (average proportion of samples shared with other tis- many SNPs working together to influence groups of functionally sues varies from 64% to 95%). related genes and that some of the communities capture unique We also used the GO enrichment analysis to identify tissue- features of tissues and phenotypic states. Our analysis of these specific functions, which we defined as a community-enriched eQTL communities paints an empirically robust picture of com- GO term appearing in no more than two tissues. An interesting munities that involve functions shared across tissues as well as example is community 86 from the heart left ventricle network those that are highly specialized to individual tissues. (Fig. 3). This community is enriched for genes involved in cellu- lar respiration and the mitochondrial respiratory chain (Fig. 3B Biological Characteristics of Tissue-Specific Communities. Given and Dataset S3). These functions, despite being ubiquitous, are that genetic variants are present in all tissue types, it may be particularly important in heart due to its high metabolic require- that tissue-specific epigenetic activation influences their regula- ments, and studies have shown that dysfunction of the respiratory tory effect on gene expression. Consequently, we searched for chain is involved in many heart diseases (25). This heart commu- chromatin state changes associated with the tissue-specific com- nity contains 13,143 SNPs (1,834 linkage disequilibrium blocks) munities we observed. We defined tissue-specific and shared linked to 1,182 genes (gray links, Fig. 3A). Of these, there is a communities, using the enrichment in GO biological processes. subset involved in cellular respiration (red links) that includes Tissue-specific (TS) communities were defined as those that 547 SNPs [100 linkage disequilibrium (LD) blocks] linked to showed a higher than expected proportion of GO biological 52 genes, located on 20 autosomes. GWASs have reported 5 processes that were present in no more than two tissues, while

E7844 | www.pnas.org/cgi/doi/10.1073/pnas.1707375114 Fagny et al. Downloaded by guest on September 25, 2021 Downloaded by guest on September 25, 2021 i hteegsfo u tutrlaayi feT ewrsis all networks eQTL not of analysis structural but our hypothe- from 27), natural emerges that One 14, sis roles. regulatory (3, obvious have regions SNPs eQTL regulatory in Functional overrepresented SNP Different Reflect Cores Roles. Community and Hubs Global and 1.21 were test Haenszel and ratio odds combined the (odds communities and shared TS in ratios show SNPs to TS likely than and more activation TS significantly chromatin tissues, TS are eight communities the comparing TS of By in five SNPs interest in tissues. that of found other tissue we communities, seven the shared in simi- the state repressed, with chromatin a compared identical presented exactly that or were that regions lar, was those chromatin and that interest in region cat- of chromatin located tissue following a the two in in the activated located of specifically net- were one each to that SNPs those For TS egories: data. avail- the Project SNPs assigned were Epigenomics we TS maps work, Roadmap all chromatin-state the for which in state for able chromatin tissues eight the test the To extracted regions. in first genomic we of this, activation TS for from result may nities an tal. et Fagny FCPT across-tissue and 4A, (P 2.6 Fig. 100% communities in to the presented up of are represent SNPs can content which TS the communities, of of TS proportion in higher edges significantly and a observed also We etil;LG ug S osgicn;SN kn M,seea uce N,tba ev;TY hri;WL hl lo WL;*P (WBL); blood whole WBL, thyroid; THY, nerve; tibial TNV, muscle; skeletal SMU, skin; SKN, nonsignificant; ***P TS NS, in lung; elements LNG, unique communities. of ventricle; shared proportion with mean compared of communities (A TS ratio LD. among the regions represent chromatin Barplots activated pathways. TS shared in involved communities. shared genes vs. with communities with compared processes 4. Fig. a with 4A), (Fig. tissues 13 the This of tissue). 9 the in combined on significant depending is communities, 2% difference from shared and in communities TS 23% pro- in to 37% (average to communities 5% from shared range than higher portions genes a have TS to of tend we communities proportion TS expected, tissues, As than all communities. in more shared that no observed and in tis- TS present 13 are between that the tissues) edges edges 2 of and and genes, genes, 12 SNPs, SNPs, least is, TS of (that at proportion in the compared present We proportion sues. processes expected biological than GO higher of a showed communities shared hs SSP n de eT soitos nT commu- TS in associations) (eQTL edges and SNPs TS These × < and 10 0.001. ubro tde aesonta QLSP are SNPs eQTL that shown have studies of number A niheti Sgns Ns n de mn omnte ihgnsivle nT Obiological GO TS in involved genes with communities among edges and SNPs, genes, TS in Enrichment (A) communities. TS of Characteristics −12 D,aioesbuaeu;AA ot;AT reytba;EC spau uoa M,eohgsmsuai;FB bols;HV er left heart HRV, fibroblast; FIB, muscularis; esophagus EMS, mucosa; esophagus EMC, tibial; artery ATT, aorta; ATA, subcutaneous; adipose ADS, B) P P and au costsusuigteFP of FCPT the using tissues across value auscretdfrL r rsne nFg 4B; Fig. in presented are LD for corrected values 7.9 × A, 10 Right WBL THY SKN TNV SMU LNG HRV EMS EMC FIB ATT ATA ADS AB −10 (tissues-specific /sharedcommunities) 12345 shows respectively). , 1.4 P Ratio oftissue-specificelements Genes au sn h Cochran–Mantel– the using value × P 10 ausotie sn h Mann–Whitney the using obtained values −15 auscretdfrLD for corrected values respectively). , Edges P 9.8 ausare values × SNPs 10 −10 *** *** *** *** NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS NS ** ** ** ** ** ** ** . * * * * * * * * * * * * 2 n 31;Fg 5B 7, Fig. (states 13–14; regions and repressed 12, Polycomb and enhancers nongenic capacities (29). invasion cells adenocarcinomatous and esophageal migration, of proliferation, asso- the and mucous with esophagus, forming ciated the in including involved surfaces family epithelial mucin on the the barriers of from a model for ing 15-states a core is It the GBAP1, 5C). (Fig. to (20) Project region according Epigenomics TSS Roadmap active tissue the an in this in 31 located community in of and SNP network core a mucosa is esophagus (28), GWAS gas- a and in esophageal cancer with tric associated in been located has SNP that 1 a rs4072037, example, Fig. For 10, S5. and Dataset 1 (states promoters for 5A enriched are SNPs core obtain that to us and back- allows ratio odds which as combined regression, Methods). GCC logistic and the conditional (Materials a density in Using gene SNPs for all correcting using and ground available, chromatin- which were Epige- for maps tissues SNPs Roadmap eight state central the the in of Methods) enrichment using and calculated (Materials manner, We 15 TS data. across Project a nomics SNPs) in high-degree states or chromatin high–core-score (either SNPs global for edges. account within-community to more have normalized may also which the is hubs, com- score while its within core network, SNP The each entire of munity. centrality the local the within reflects score centrality core is SNP “core SNP global or each SNP the each which by to (Eq. contributed score” genes modularity of centrality: the number of linked—and measures total two degree—the using the role, regulatory and structure the or at globally SNPs than either network. roles the connected, regulatory of highly periphery different have more would expect are might locally, that one Specifically, SNPs SNPs. that the of with roles associated are regulatory that various properties network be may there that WBL SMU LNG HRV EMC FIB ATA ADS lblhb r eeal infiatymr ieyt i within lie to likely more significantly generally are hubs Global central of enrichment the studied we network, TS each For network eQTL between relationship the investigated We U and SSP r oelkl ob oae in located be to likely more are SNPs TS (B) LD. for correcting and test . . 2.5 2 1.5 1 0.5 .Tsu-pcfi dsrto a efudin found be can ratios odds Tissue-specific S5). Dataset DCST2, P 2 auswr banduigteFse etadcretn for correcting and test Fisher the using obtained were values in .ASPsdge reflects degree SNP’s A Methods). and Materials GBA, Odds Ratio PNAS and RP11-263K19.4, P and au cosalegttsus efound we tissues, eight all across value | ulse nieAgs 9 2017 29, August online Published .Ago xml is example good A S5). Dataset NS NS NS *** *** *** *** * eecod- gene a MUC1, < 5 **P 0.05, eT of cis-eQTL MUC1 | < E7845 0.01, on

SYSTEMS BIOLOGY PNAS PLUS ABHigh core-score High degree 1_TssA 1_TssA 2_TssAFlnk 2_TssAFlnk 3_TxFlnk 3_TxFlnk 4_Tx 4_Tx 5_TxWk 5_TxWk 6_EnhG 6_EnhG 7_Enh 7_Enh 8_ZNF/Rpts 8_ZNF/Rpts 9_Het 9_Het 10_TssBiv 10_TssBiv 11_BivFlnk 11_BivFlnk 12_EnhBiv 12_EnhBiv Chromatin State 13_ReprPC 13_ReprPC 14_ReprPCWk 14_ReprPCWk 15_Quies 15_Quies

0.0 0.5 1.0 1.5 2.0 2.5 3.0 0.0 0.5 1.0 1.5 2.0 2.5 3.0 Odds Ratio Odds Ratio CD 155.00 155.05 155.10 155.15 155.20 172.65 172.75 172.85 172.95 173.05 173.15

chr1 (Mb) chr5 (Mb)

RP11−263K19.4 GBAP1 NKX2−5 rs2546765 DCST2 GBA MUC1 HAX1 (chr1) ALCAM (chr3) rs4072037 SEC24D (chr4) SNX9 & FAM229B (chr6) 1_TssA 6_EnhG 11_BivFlnk HGF & AC018464.3 (chr7) 2_TssAFlnk 7_Enh 12_EnhBiv MMAB & RP11-620J15.3 (chr12) 3_TxFlnk 8_ZNF/Rpts 13_ReprPC SIPA1L1 (chr14) 4_Tx 9_Het 14_ReprPCWk MED9 (chr17) RAB31 (chr18) 5_TxWk 10_TssBiv 15_Quies

Fig. 5. Network hub SNPs are more likely to lie in active chromatin regions than nonhub SNPs. (A and B) Symbols shows the odds ratios across all eight tissues with matching chromatin states from the Roadmap Epigenomics Project. Odds ratios are combined across tissues, using conditional logistic regression. Bars represent the 95% confidence interval. Each odds ratio measures the enrichment of central SNPs in a particular functional category, corrected for number of genes within 1 Mb of the SNP. (A) Enrichment in each chromatin state for SNPs with a core score in the top 25%. (B) Enrichment in each chromatin state for SNPs with network degree greater than 10. Odds ratios and P values for each TS network are listed in Dataset S5. Enrichment in each chromatin state for all trans-eQTL are presented in Dataset S1.(C) Example of a core SNP located in an active TSS region: rs4072037, associated with esophageal and gastric cancer. (D) Example of a high-degree SNP located in an active enhancer: rs2546765. The chromatin-state tracks are based on results from the Roadmap Epigenomics Project core 15-states model for esophagus mucosa (E079) for C and heart left ventricle (E095) for D.

rs2546765, which is an intergenic SNP on chromosome 5, a high- where the eQTL network structure provides insight into biologi- degree SNP in community 86 in the heart left ventricle network, cal processes, an insight that is reproducible across tissues. connected to 13 genes across 10 chromosomes and 3 communi- ties and located in an enhancer region in this tissue (Fig. 5D). SNPs in Community Cores Are Associated with Trait and Disease Phe- It is a cis-eQTL of NKX2-5, a gene involved in heart develop- notypes. We tested whether the SNPs with high centrality were ment, and a trans-eQTL of among others ALCAM, involved in more likely to be associated with complex traits in the GWAS by heart development in vertebrates; HAX-1, a regulator of con- mapping trait-associated SNPs in the National tractility and calcium cycling in the heart; SIPA1L1, associated in Research Institute-European Bioinformatics Institute (NHGRI- the GWAS with QRS intervals; and SEC24D, HGF, and MMAB, EBI) GWAS catalog (26) to each TS eQTL network, considering associated in the GWAS with cardiovascular diseases and/or dys- only SNPs with GWAS P values less than 10−8 (GWAS SNPs). lipidemia. As expected, both global and core SNPs are depleted In each tissue, we compared the distribution of degrees (number in constitutive heterochromatin (state 9). of edges per node) for all SNPs and GWAS SNPs. We found that Overall, we find that global hubs and community cores each SNPs of low degree (1–2) were significantly depleted in GWAS have different associations with active regulatory regions: Global SNPs while SNPs of intermediate degree (5–10) showed consis- hubs are preferentially located in distal regulatory regions tent significant enrichment for GWAS SNPs across all 13 tissues (enhancers) while core SNPs are significantly overrepresented in and SNPs of degree greater than 15 were devoid of GWAS asso- proximal regulatory elements (promoters). This is another case ciations (SI Appendix, Fig. S9).

E7846 | www.pnas.org/cgi/doi/10.1073/pnas.1707375114 Fagny et al. Downloaded by guest on September 25, 2021 Downloaded by guest on September 25, 2021 D a o ewl otoldfrtelrenme fGO of number al. et large Fagny the for controlled well the While be processes. not biological may functions coherent FDR similar with share associated that are or chromosomes, multiple genes, for genes on enrichment organized the an located was find examine we tissues community, we each 13 in When the represented communities. found of modular we each highly in genes, into network and eQTL for SNPs the search connected that to densely (30) algorithm of tis- detection communities within community and a across Using variants genetic sues. the of that roles regulatory found insight the functional have into provide We networks these project. of properties GTEx structural the from available data con- networks eQTL from of analysis structed systematic a out carried have We Discussion community—and functional its function. of perturb to center likely the more at therefore likelihood is the variant to the related a and is that tissue, associated each rela- disease in a being of processes to likelihood relevant map of GWAS number the small through sizes tively identified effect be with to SNPs enough associ- large Disease-linked genotype are disease. from findings complex expect These in might network. one ations what the with within agreement hubs in global, local, the the not in but overrepresented cluster significantly SNPs those are within non- GWAS and they communities that of communities, of find number those small we connectivity relatively from a tissue, of within widely each pattern In differs unique SNPs. that GWAS a networks have eQTL SNPs within GWAS that idea itdwt uomuedsae sonfrwoebodi Fig. in asso- blood SNPs whole for only 6B, (shown considering diseases when autoimmune with present ciated also was scores core Fig. Appendix, (SI tissues core 13 signifi- all median in S10 had higher the SNPs SNPs with GWAS non-GWAS for distributions, scores and core-score for GWAS different controlling that non- and cantly GWAS found for test for we those likelihood-ratio scores to LD, a catalog core Using communi- GWAS SNPs. the eQTL NHGRI-EBI GWAS comparing the the central by within from highly this instead SNPs be but tested to scale, We tend global ties. SNPs a GWAS on led that not SNPs hypothesize (Fig. intermediate-degree to degree of us intermediate overrepresentation associated for network, enriched This SNPs eQTL were 6A). they whole-blood mapped that the we found to we tissue when diseases the example, autoimmune to with For related interest. phenotypes with of associated SNPs GWAS in are GWAS. diseases by and traits diseases GWAS in autoimmune are all to network including gray) TS networks each (in TS in not all diseases for or and blue) traits GWAS (in all associated degree. including SNPs network Ratios SNP. their each on around depending density GWASs gene by diseases autoimmune with associated 6. Fig. h ossec fteerslsars ise uprsthe supports tissues across results these of consistency The hs eut r vnmr tiigwe ecnie only consider we when striking more even are results These FCPT , P 1 = ewr rpriso WSSP soitdwt uomuedsae nwoebod ( blood. whole in diseases autoimmune with associated SNPs GWAS of properties Network .3 × P 10 and cis- value −13 ). < 2.22 eT n1 ifrn ise,using tissues, different 13 in trans-eQTL A

× GWAS−SNPs (Obs/Exp) 10

−16 024681012 110 .Ti nihetfrhigh for enrichment This ). Degree i.S10. Fig. Appendix, SI P auswr banduigaLTadpuigfrSP nL.Distributions LD. in SNPs for pruning and LRT a using obtained were values betfo lblhb.Tesgicn vrersnainof overrepresentation significant and The degree, hubs. low global for from depleted also absent degree, is enriched network are GWAS SNPs intermediate GWAS the for tissues: dis- 13 degree from the across The variants consistent highly not. trait-associated are for hubs nongenic tribution global as while GWAS- such variants, for elements enriched while associated are distal enhancers, hubs community for not Moreover, enriched but enhancers. are site, start hubs transcriptional global the regions chromatin to tis- active These for across close enriched properties network. are biological hubs the different Community have sues: throughout hubs which genes of hubs, types many two global to and community, connected their are highly in SNPs genes are which to “cores,” connected or hubs community hubs: of types regulatory into expression gene TS communities. coalesce single help influence in only also emerge not but and eQTL genes, changes TS epigenetic eQTL. shared, TS of be with specificity to concert tissue appear rele- the eQTL of is most discussion Although This ongoing genes. the and to SNPs, enriched vant (eQTL), also edges are tissue-specific they function; enriched for gene only tissue-relevant not specific are for tissue-level communities respective these their addition, related in In functionally roles processes. of important expression with the genes in on genes, act effects sur- SNPs genetic regions these exert chromatin that to and of SNPs activation specific epigenetic communities rounding the these by of communities driven organization that TS the is to these that unique suggests are in that This SNPs regions tissue. chromatin eQTL active for TS enriched that are Epigenome find Roadmap the we from commu- tissues Project these eight of in some data of Using specificity nities: mechanis- tissue plausible the for a explanation is con- tic There as muscle muscularis. smooth such esophagus or processes in ventricle traction left functional heart TS in in communities in respiration SNPs TS cellular involved find the genes also of contain however, that role do, We pleiotropic enrichment communities. the functional these reinforcing of tissues, patterns across common with communities communities. these of structure and organization the drives multiple that of influence genetic the and is it few are that very suggesting communities with genes), communities these of ruled (excepting expect, coexpression possibility completely by might driven the one be not what However, cannot to tests. Contrary explanations be out. of plausibility to number hoc unlikely large post is the enrichment to observed the due across that shared suggests terms GO tissues for analysis resampling our tested, terms Obs/Exp efidtee1 QLntok oss w informative two possess networks eQTL 13 these find We many find we tissues, between communities comparing When eT Nso ucinlyrltdgop fgenes of groups related functionally on SNPs trans-eQTL P auswr banduig100rsmlns aigit account into taking resamplings, 1,000 using obtained were values P−values B itiuino oesoe for scores core of Distribution (B) S9 . Fig. Appendix, SI ai fosre s xetdnme fSNPs of number expected vs. observed of Ratio A) -6

PNAS Core scores (x10 ) o−WSGWAS non−GWAS 0 5 10 15 20 25 | p =1.26x10 ulse nieAgs 9 2017 29, August online Published -13 cis and | trans E7847 cis-

SYSTEMS BIOLOGY PNAS PLUS GWAS SNPs among the community cores provides another late eQTL with an additive linear function that models gene expression lev- important insight. Across tissues, disease SNPs are those most els (Exp) in function of genotypes (Gen) and included age, sex, and ethnic likely to perturb groups of genes and, in doing so, may disturb background (Eth), as well as the first three genotype principal components important biological processes. (PCg), as covariates:

While the observed relationships between eQTL network Exp ∼ Gen + Age + Sex + Eth + PC1g + PC2g + PC3g + . properties and SNP/gene function are consistent across tissue We tested for association between gene expression levels and SNPs both in types, we cannot rule out the possibility that the large number cis and in trans, where we defined cis-SNPs as those within 1 Mb of the TSS of statistical tests performed in the cis- and trans-eQTL analy- of the gene based on mapping using the Bioconductor R biomaRt package sis could lead to identification of some individual eQTL asso- (34). P values were adjusted for multiple testing using Benjamini–Hochberg ciations as significant when they are not. Although we cannot correction for cis- and trans-eQTL separately and only those with adjusted P conclude, based on our analysis, that any individual SNP–gene values less than 0.2 were used in subsequent analyses. association is correct, the consistency of our findings regarding To compare our results with those reported by the GTEx consor- the structure of the networks across multiple tissues and the con- tium, we downloaded the single-tissue cis-eQTL from the GTEx portal sistent functional enrichment we observe for global and local (www.gtexportal.org/). hubs indicate that the higher-order structural organization of Community Identification. For each tissue, we represented the significant the networks likely provides a robust model of SNP regulatory eQTL as edges of a bipartite network linking SNPs and gene nodes. To iden- effects. While one could imagine that the observed network pat- tify highly connected communities of SNPs and genes in the eQTL networks, terns might be driven by unwanted systematic variations in the we used the R condor package (23), which maximizes the bipartite modular- genotype and gene expression data, our identification of similar ity (30). As recursive cluster identification and optimization can be compu- structural properties in an eQTL network derived using an inde- tationally slow, we calculated an initial community structure assignment on pendent chronic obstructive pulmonary disease (COPD) dataset the weighted, gene-space projection, using the multilevel.community func- tion in the R igraph package (35). This function is an implementation of (23) further supports the network structural associations we have a fast unipartite modularity maximization algorithm (36). Using this initial described. Nevertheless, this possibility, along with more detailed assignment, we then iteratively converged on a community structure corre- analysis of specific network-prioritized SNPs, should be further sponding to a maximum bipartite modularity. investigated as additional TS gene expression and genotype data The bipartite modularity is defined in Eq. 1, where m is the number become available through the next release of GTEx and other of links in the network, Aeij is the upper right block of the network adja- large-cohort studies. cency matrix (a binary matrix where a 1 represents a connection between Our analysis of bipartite networks built from both cis- and a SNP and a gene and 0 otherwise), ki is the degree of SNP i, dj is the trans-eQTL in 13 tissues provides important evidence about the degree of gene j, and Ci, Cj are the community indexes of SNP i and gene j, collective role of eQTL in TS function and disease. The net- respectively:   work communities reveal biological processes under the shared 1 X kidj Q = Aeij − δ(Ci, Cj). [1] m m genetic influence of many variants, including both processes i, j shared across tissues and those that are TS. The TS genetic reg- ulation we observe is driven in part by SNPs that lie in TS active SNP Core-Score Calculation. We defined a SNP’s core score as the SNP’s con- chromatin regions. This suggests that epigenetic profile analysis, tribution to the modularity of its community. Specifically, For SNP i in com- munity h, its core score, Qih, is defined by Eq. 2. To normalize SNPs across applied to both genic and nongenic elements, will be essential communities, we accounted for community membership in our downstream for understanding the processes responsible for TS function. The testing (Eqs. 3 and 4), which better accounts for community variation com- eQTL networks also group together functionally related sets of pared with the normalization method used in ref. 23. Indeed, Qih is depen- variants, including GWAS SNPs, and the structure of the net- dent on community size (SI Appendix, Fig. S11): work provides a model of how multiple cis- and trans-acting vari-   1 X kidj Q = Aeij − δ(Ci, h)δ(Cj, h). [2] ants can work together to influence function and phenotype. ih m m While the network models we describe do not fully resolve the j question of how weak-effect variants determine complex traits Functional Category Enrichment. We extracted the list of and disease, this network approach provides a framework with genes within each community in each TS network and then used the R distinct explanatory power that can serve as a basis for further GOstat package (37) to perform a tissue-by-tissue analysis of the overrep- exploration of the link between genotype and phenotype. resentation of Gene Ontology biological processes within each community. Our reference set consisted of all of the genes present in the corresponding Materials and Methods tissue-specific network. Communities were considered significantly enriched in a given category if the FDR-adjusted value was <0.05. GTEx Data Preprocessing, Filtering, and Merging. We downloaded NHGRI P GTEx version 6 imputed genotyping data and RNA-Seq data from the dbGaP TS SNP Enrichment. We defined TS communities and shared communities database. The RNA-Seq data were preprocessed using the Bioconductor R based on their enrichment in GO biological process (BP) terms. We first cal- YARN package (15, 16) and normalized using the Bioconductor R qsmooth culated for each community the proportion of TS BP terms (defined as the package (31). We excluded five sex-specific tissues (prostate, testis, uterus, BP terms that were significant in no more than 2 of the 13 networks) and vagina, and ovary) and merged the skin samples from the suprapubic and the proportion of shared BP terms (defined as the BP terms that were signif- lower leg sites. We limited our eQTL analysis to the 13 tissues for which icant in at least 12 of the 13 networks). In each network, we then extracted there were available data for at least 200 individuals. The RNA-Seq and communities that had a higher than average proportion of either TS BP genotyping data were mapped by the GTEx Consortium to GENCODE ver- terms (TS communities) or shared BP terms (shared communities). In each sion 19, which was based on human genome build GRCh37.p13 (Dec 2013). of these communities, we computed the proportion of TS elements (SNPs, We accounted for RNA extraction kit effects, using the removeBatchEffect genes, or edges), defined as the proportion of elements present in no more function in the R limma package (32). than 2 of the 13 networks. To control for LD in the case of SNPs and edges, we generated lists of SNPs falling into the same LD block using the plink2 eQTL Mapping and Bipartite Network Construction. For eQTL analysis, we –blocks option, a 5-Mb maximum block size, and an r2 of 0.8. In each com- excluded SNPs from all analyses if they had a call rate under 0.9 or an allele munity, we collapsed SNPs by LD blocks and number of network in which frequency lower than 5% in any tissue. A gene was considered expressed in they were present and, in the case of edges, by genes to which they were a sample if its read count was greater than or equal to 6. Genes that were associated. We then compared the distribution of proportion of unique ele- expressed in fewer than 10 of the samples in a tissue were removed for ments between TS and shared communities, using a Mann–Whitney U test. the eQTL analysis in that tissue. To correct for varying degrees of admixture in the African-American subjects, we used the first three principal compo- Enrichment in TS Activated Regions Among Unique SNPs. We first determined nents of the genotyping data provided by the GTEx consortium and included the chromatin state at each TS SNP (as defined above) in eight tissues, using these in our eQTL model. We used the R MatrixEQTL package (33) to calcu- the core 15-states model of the Roadmap Epigenomics Project (below). We

E7848 | www.pnas.org/cgi/doi/10.1073/pnas.1707375114 Fagny et al. Downloaded by guest on September 25, 2021 Downloaded by guest on September 25, 2021 3 rmR,YetG lno ,Kula 20)Gntcdseto ftranscriptional of dissection Genetic (2002) L Kruglyak and R, transcripts Clinton of G, hundreds Yvert RB, affect Brem trans-eqtls Large-scale 13. (2017) al. et B, Brynedal 12. identi- transcriptome human the of genetics the Dissecting (2015) al. et H, Kirsten 10. 1 Franz 11. an tal. et Fagny Epigenomics Roadmap the from data chromatin-state model 15-state core Definition. Category Chromatin-State regressions. linear the in block input maximum as values 5-Mb these a of option, median –blocks the the into plink2 falling an the SNPs and of using size, lists block, generated we LD SNPs, same between LD for control To community to disease belongs or SNP and trait the otherwise, a if 0 with 1 to to associated equal equal is and SNP GWAS the the if in 1 to equal function indicator Eqs. covari- In a community regression. as linear identity the community in added ate we communities, across GWAS uniform includes not (Eq. that variable model this linear include a In (LRT). whether test assesses (Eq. likelihood-ratio status LRT a GWAS- using the with network, between setting, associated TS not our scores each those for core and diseases catalog SNP or NHGRI-EBI traits of the from distribution SNPs the associated compared (https://www.ebi.ac. then website EBI We the from v1.0) uk/gwas version 2015, 8, December Analysis. GWAS Scmuiis oprdwt SSP nsae omnte (definition communities shared in SNPs in TS SNPs above). categories TS with among other compared category communities, three activated TS each the in for enrichment in comparisons calculated seven and calculated we all then activated network, over We Summing the class). comparison. silent in pairwise active a each SNPs an for (from to TS activated bivalent of and a number class), from the silent or a class to other the bivalent any of to a state from simi- or chromatin same), class 4 another other the example to exactly for one is class, from same state change (chromatin (chromatin-state same lar categories: four in SNPs 14 (1 chromatin split (10 active We present. classes: were three 3 they into the which in randomly in states state we network chromatin chromatin LD, which the and for in above) to control (definition tissue corresponding To blocks the tissue others. LD seven in by the SNPs SNP of TS TS selected each each to at present was state it chromatin the compared then .Wsr J ta.(03 ytmtcietfiaino rn QL spttv rvr of drivers putative as eQTLs trans of identification Systematic (2013) al. et HJ, Westra heritabil- the Partitioning 9. eqtls: tissue-specific and Cross-tissue (2014) human al. nine et in JM, Torres dynamics state chromatin 8. of analysis and Mapping (2011) al. et J, Ernst vari- disease-associated 7. common of localization Systematic (2012) al. et Annotation MT, eQTLs: Maurano be to likely 6. more are SNPs Trait-associated (2010) al. et DL, Nicolae genome and 5. epigenomics Using GWAS: of sense Making (2015) traits PJ complex Farnham in YG, Tak variation genetic 4. noncoding Interpreting (2012) M Kellis LD, Ward 3. .Srne E th A a 21)Pors n rms fgnm-ieassociation genome-wide of promise and discovery. Progress (2011) GWAS T of Raj years EA, Stahl Five BE, (2012) Stranger J Yang 2. MI, McCarthy MA, Brown PM, Visscher 1. xlk 4 TxFlnk, euaini udn yeast. budding in regulation co-regulation. transcriptional of patterns mediate diseases. and tissues across regulation gene non- of relevance regulatory loci. the coding corroborates protein and trans-eqtls trait-related novel fies associations. disease known trait. complex a of ity types. cell DNA. regulatory in ation gwas. from discovery enhance to of regions non-coding in genome. SNPs human of the relevance functional the understand to engineering disease. human and tde o ua ope ri genetics. trait complex human for studies Genet Hum J Am erCk 15 ReprPCWk, sBv 11 TssBiv, nO ta.(06 adoeaoi iklc hr ontemcs n trans- and cis- downstream share loci risk Cardiometabolic (2016) al. et O, en ´ P .W lee u soitoswith associations out filtered We ). auswr acltduigaFse test. Fisher a using calculated were values t h bevddt etrta iermdlta osnot does that model linear a than better data observed the fits 4) Nature h, x 5 Tx, r 2 iFn,12 BivFlnk, n Q f08 nec omnt,frec Dbok eextracted we block, LD each for community, each In 0.8. of edwlae h HR-B WSctlg(accessed catalog GWAS NHGRI-EBI the downloaded We ih h ubro omnte ntetissue, the in communities of number the Q 90:7–24. 473:43–49. xk 6 TxWk, ue) o ahpiws oprsn ecasfidTS classified we comparison, pairwise each For Quies). o WSSP n o-WSSP eaaeyadused and separately SNPs non-GWAS and SNPs GWAS for ih u o Genet Mol Hum a Biotechnol Nat pgntc Chromatin Epigenetics ∼ mJHmGenet Hum J Am I Science (GWAS nBv,adsln hoai (9 chromatin silent and EnhBiv), .A h itiuino N oesoe (Q scores core SNP of distribution the As 3). Q Tx a Genet Nat nG 7 EnhG, ih Science → ∼ 337:1190–1195. LSGenet PLoS X n−1 = k 5 =1 30:1095–1106. 24:4746–4763. 296:752–755. xk,rpesd(rma ciet any to active an (from repressed TxWk), 1) 3 I 45:1238–1243. n,8 Enh, 95:521–534. (C and + edwlae h genome-wide the downloaded We Genetics k X n−1 k Science 8:1–18. = =1 4, 6:e1000888. 1) I k Q N/ps,bvln chromatin bivalent ZNF/Rpts), (C I (C + ih n qa o0otherwise: 0 to equal and mJHmGenet Hum J Am P 187:367–383. k 353:827–830. k stecr cr fSNP of score core the is  = ausgetrta 10 than greater values = 1) )a niao function indicator an 1) + . sA 2 TssA, e,13 Het, I (GWAS 100:581–591. TssAFlnk, ReprPC, = )an 1) ih −8 i is ) [3] [4] in . 4 h TxCnotu 21)TeGntp-iseEpeso GE)pltanalysis: pilot (GTEx) Expression Genotype-Tissue The (2015) Consortium GTEx The 14. 7 ia S ta.(09 omnrgltr aito mat eeepeso nacell a in expression gene impacts variation regulatory Common (2009) al. et AS, Dimas het- 17. for normalization and processing RNA-Seq Tissue-aware (2016) al. et J, Paulson 16. (2016) al. et JN, Paulson 15. 6 etrD ta.(04 h HR WSctlg uae eoreo SNP-trait of resource curated a catalog, GWAS NHGRI The (2014) al. dys- et chain D, respiratory heart—Mitochondrial Welter breathing The 26. The biology. (2014) of al. unification et the K, for Tool Schwarz ontology: 25. Gene (2000) al. structure et community Bipartite M, (2016) Ashburner J 24. Quackenbush D, DeMeo PJ, Castaldi J, Platig 23. errors. link to measures network of Robustness (2013) M Girvan E, Ott J, reference Platig 111 of 21. analysis Integrative (2015) al. et insight Consortium, Epigenomics yields Roadmap expression-QTLs of 20. mapping High-resolution (2008) al. expression et gene JB, Veyrieras with results 19. GWAS Coordinating (2012) PL Jager De BE, Stranger 18. 2 agD,SiX caln A ekvcJ(02 esrmn ro nntokdata: network in error Measurement (2012) J Leskovec DA, McFarland X, Shi DJ, Wang 22. rjc est (www.roadmapepigenomics.org/ website Project a rvddtruhagatfo h VDAfudto.Ti okwas work (5P50CA127003, This 9112. no. foundation. protocol NVIDIA approved the Institute funding dbGaP from under Additional grant conducted (5R01AI099204). a Cancer Disease through provided Infectious was and National Allergy Insti- of National the tute Heart, and the 5P30CA006516), National and 5P01HL114501, 1U01CA190234, the 1R35CA197449, K25HL133599), 5R01HL111759, from (5P01HL105339, grants Institute and Blood including and Health, Lung, of Institutes National than greater MAF ACKNOWLEDGMENTS. a with SNPs background. all as used 5% We to otherwise. equal or 0 to equal and eQTL where tt among state number SNP. the a of impact Mb can 1 which within SNP, genes a in around enrichment density To of gene computing SNPs. the high–core-score when for among covariate control categories a Project Epigenomics as Roadmap community of all indicators used We background. tissue. as by network analysis TS the each stratify in to SNPs us allows strata(Tissue) where model the using tissues, across ratio of odds quartile combined the top calculate the (in central by is estimated SNP and the if otherwise, or 1 scores 0 core to to equal equal function indicator and an category functional the to where covariates, for allows which (38) model regression local logistic or global a either using among hubs, category functional each in enrichment culated Analyses. Enrichment Chromatin-State (Quies). quiescent Polycomb and (ReprPCWk), repressed constitu- Polycomb repressed (EnhBiv), bivalent weak (ZNF/Rpts), enhancers (ReprPC), flanking regions bivalent (TssBiv), (EnhG), (BivFlnk), TSS repeated enhancers TSS/enhancers bivalent/poised and (Het), genic genes heterochromatin (TxWk), tive ZNF flanking transcription (Enh), (TssA), weak enhancers TSS active 5 (Tx), gene’s are a transcription states at transcribed chromatin and (TssAFlnk), muscle, 15 TSS fibroblast skeletal active The lung, aorta, (20). ventricle, blood artery left heart whole subcutaneous, mucosa, adipose esophagus line, available: cell are data which sn h aemto,w tde h niheti ahchromatin each in enrichment the studied we method, same the Using included we SNPs, GWAS in enrichment of calculation the to Similar uttsu eerglto nhumans. in regulation gene Multitissue yedpnetmanner. type-dependent October Accessed bioRxiv:dx.doi.org/10.1101/081802. 2016. data. 20, sparse and erogeneous Normalization. nassesimnlgcprdg nautoimmunity. in paradigm immunologic systems a in associations. disease. cardiac in function consortium. ontology gene eQTLs. of epigenomes. human regulation. gene human into 551. re-classification. A Phys Matter Soft Nonlin Stat E Rev ascain,w de oait orsodn otenme of number the to corresponding covariate a added we cis-associations, Clogit I I (Category (Trans [I LSCmu Biol Comput PLoS (Category Logit eT o ahtissue, each for trans-eQTL exp(β 0frdge)adeult tews.Teod aisare ratios odds The otherwise. 0 to equal and degree) for >10 uli cd Res Acids Nucleic = Logit akg eso 1.0.1. version package R )i nidctrfnto qa o1i h N sa is SNP the if 1 to equal function indicator an is 1) [I = (Category oilNetworks Social 1 )i nidctrfnto qa o1i h N belongs SNP the if 1 to equal function indicator an is 1) .Smlry eue odtoa oitcrgeso to regression logistic conditional a used we Similarly, ). Nature [ I = (Category hswr a upre ygat rmteUS the from grants by supported was work This 1)] AN outMliCniinRASqPercsigand Preprocessing RNA-Seq Multi-Condition Robust YARN: Science PNAS n Cardiol J Int a Genet Nat 12:e1005033. 518:317–330. ∼ LSGenet PLoS 42:D1001–D1006. = β 325:1246–1250. 1)] 1 | = 88:062812. I 34:396–409. (Central ulse nieAgs 9 2017 29, August online Published 1)] ∼ 25:25–29. Science 171:134–143. β o hoai-tt nlss ecal- we analyses, chromatin-state For 4:e1000214. ∼ 1 I (Central β = 1 348:648–660. I 1) (Trans 0 + n 3 and ... o h ih ise for tissues eight the for ) = = urOi Immunol Opin Curr + 1) 0 1) strata(Tissue) + ns(xlk,strong (TxFlnk), ends + ... , + I (Central , + | 24:544– = trans- , E7849 )is 1) Phys

SYSTEMS BIOLOGY PNAS PLUS 27. Battle A, et al. (2015) Genomic variation. Impact of regulatory variation from RNA to 33. Shabalin AA (2012) Matrix eQTL: Ultra fast eQTL analysis via large matrix operations. protein. Science 347:664–667. Bioinformatics 28:1353–1358. 28. Abnet CC, et al. (2010) A shared susceptibility locus in PLCE1 at 10q23 for gas- 34. Durinck S, Spellman PT, Birney E, Huber W (2009) Mapping identifiers for the inte- tric adenocarcinoma and esophageal squamous cell carcinoma. Nat Genet 42:764– gration of genomic datasets with the R/Bioconductor package biomaRt. Nat Protoc 767. 4:1184–1191. 29. Gronnier C, et al. (2014) The MUC1 mucin regulates the tumorigenic properties 35. Csardi G, Nepusz T (2006) The igraph software package for complex network research. of human esophageal adenocarcinomatous cells. Biochim Biophys Acta 1843:2432– InterJournal Complex Syst 1695. 2437. 36. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of commu- 30. Barber MJ (2007) Modularity and community detection in bipartite networks. Phys nities in large networks. J Stat Mech Theor Exp 2008:P10008. Rev E Stat Nonlin Soft Matter Phys 76:066102. 37. Falcon S, Gentleman R (2007) Using GOstats to test gene lists for GO term association. 31. Hicks SC, et al. (2016) Smooth quantile normalization. bioRxiv:dx.doi.org/10. Bioinformatics 23:257–258. 1101/085175. 38. Kudaravalli S, Veyrieras JB, Stranger BE, Dermitzakis ET, Pritchard JK (2009) Gene 32. Ritchie ME, et al. (2015) Limma powers differential expression analyses for RNA- expression levels are a target of recent natural selection in the human genome. Mol sequencing and microarray studies. Nucleic Acids Res 43:e47. Biol Evol 26:649–658.

E7850 | www.pnas.org/cgi/doi/10.1073/pnas.1707375114 Fagny et al. Downloaded by guest on September 25, 2021