bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

LASTEDITEDON 2020-03-12 BY X.Z.

1 MODELING REGULATORY NETWORK TOPOLOGY IMPROVES

2 GENOME-WIDE ANALYSES OF COMPLEX HUMAN TRAITS

3 BY XIANG ZHUAND ZHANA DURENAND WING HUNG WONG

4 Stanford University

5 Genome-wide association studies (GWAS) have cataloged many sig- 6 nificant associations between genetic variants and complex traits. How- 7 ever, most of these findings have unclear biological significance, because 8 they often have small effects and occur in non-coding regions. Integra- 9 tion of GWAS with regulatory networks addresses both issues by 10 aggregating weak genetic signals within regulatory programs. Here we 11 develop a Bayesian framework that integrates GWAS summary statis- 12 tics with regulatory networks to infer enrichments and associations si- 13 multaneously. Our method improves upon existing approaches by ex- 14 plicitly modeling network topology to assess enrichments, and by au- 15 tomatically leveraging enrichments to identify associations. Applying 16 this method to 18 human traits and 38 regulatory networks shows that 17 genetic signals of complex traits are often enriched in networks spe- 18 cific to trait-relevant tissue or cell types. Prioritizing variants within 19 enriched networks identifies known and new trait-associated re- 20 vealing novel biological and therapeutic insights.

21 INTRODUCTION

22 Genome-wide association studies (GWAS) have catalogued many signifi-

23 cant associations between common genetic variants, notably single-nucleotide 1,2 24 polymorphisms (SNPs), and a full spectrum of human complex traits . How-

25 ever, it remains challenging to translate most of these findings into biological

26 mechanisms and clinical applications. In particular, most variants have small 3 4 27 effects and are often mapped to non-coding regions .

28 One possible interpretation is that non-coding variants cumulatively affect

29 complex traits through gene regulation. To test this hypothesis, large-scale 5,6 7,8 30 epigenomic and transcriptomic data have been made available span-

31 ning diverse human cell and tissue types. Exploiting these regulatory ge-

32 nomic data, many studies have shown enrichments of trait-associated SNPs 6,9,10 11–13 33 in chromatin regions and genes that are active in trait-relevant tis-

34 sue or cell types. These studies often incorporate regulatory information into

35 effects of SNPs in a linear manner, and ignore potential functional interac-

36 tions among loci within regulatory programs. 14–17 37 Gene regulatory networks have proven useful in mining functional

38 interactions of genes from genomic data. Further, context-specific regulatory

*Correspondence should be addressed to1 X.Z. ([email protected]) and W.H.W ([email protected]). bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

2 / 29

39 networks are potentially informative to dissect the genetics of complex traits,

40 since, through cellular interactions, trait-associated variants are likely to be 18 41 topologically related . Though promising, the full potential of regulatory

42 networks is yet to be unleashed in GWAS. For example, recent connectiv- 15,19 43 ity analyses identify enrichments of genetic signals across many traits

44 and networks, but do not leverage observed enrichments to further enhance 14 45 trait-associated gene discovery .

46 To further exploit regulatory networks in GWAS, we develop a Bayesian

47 framework for simultaneous genome-wide enrichment and prioritization anal-

48 ysis. Through extensive simulations on the new method, we show its flexibil-

49 ity to various genetic architectures, its robustness to a wide range of model

50 mis-specification, and its improved performance over existing methods. Ap-

51 plying the method to 18 human traits and 38 regulatory networks, we iden-

52 tify strong enrichments of genetic associations in networks that are specific

53 to trait-relevant tissue or cell types. By prioritizing variants within the en-

54 riched networks we identify trait-associated genes that were not implicated

55 by the same GWAS. Many of these putatively novel genes have strong sup-

56 port from multiple lines of external biological and clinical evidence; some are

57 further validated by follow-up GWAS of the same traits with increased sam-

58 ple sizes. Together, these results demonstrate the potential for our method to

59 yield novel biological and therapeutic insights from existing GWAS.

60 RESULTS

61 Method overview. Figure1 provides a schematic method overview. In

62 brief, we develop a new prior distribution that dissects the total effect of a sin-

63 gle variant on a trait into effects of multiple (nearby or distal) genes through

64 a regulatory network, and then we combine this network-induced prior with

65 a multiple-SNP regression likelihood based on single-SNP association sum- 20 66 mary statistics to analyze regulatory networks and complex traits jointly.

67 We refer to this integrative framework as RSS-NET (Methods). 68 RSS-NET specifies the likelihood for SNP-level effects (β) based on single- 69 SNP effect estimates and their standard errors from GWAS (Fig. 1a), and

70 linkage disequilibrium (LD) estimates from an external reference panel with

71 ancestry matching the GWAS (Fig. 1b). For a given network (Fig. 1c), RSS-

72 NET uses its topology (nodes and edges) to specify a prior that decomposes 73 the total effect of each SNP (β) into effects of multiple interconnected genes. 2 74 This prior contains two independent enrichment parameters: θ and σ , which 75 measures the extent to which, SNPs near network nodes have increased like-

76 lihood to be associated with the trait, and, SNPs near network edges have

77 larger effect sizes, respectively. See Methods for mathematical definitions. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

3 / 29

1. Public Input Data a. GWAS summary statistics c. Gene regulatory network annotations

● ●

●● ● ● ● ● ● ●● ●● ●● ● ●● ● cjg ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ●●● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● ● ● ●●● ● ●●● ●● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ●● ● ●●● ● ● ●●● ● ● ● ● ●●● ● ● ● ●● ● ●●● ● ● ● ● ● ●● ● ● ●● ● ● ●● ●● ● ● ●● ● ● ● ● ● ●●● ●● ●●● ● ●● ● ●● ● ●●●●● ●●● ●● ●●● ● ● ●● ● ● ● ●●●● ● ● ●●● ●● ● ● ●● ● ●● ● ● ● ●●● ● ●●●●●●● ● ●● ●● ● ●● ●● ● ● ●● ● ● ●● ● ● ●● ●●●● ●● ● ●● ● ●●● ● ●●● ●●● ● ● ●●● ●●● ●●● ● ● ● ● ●●● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ●●●●●● ● ● ●●● ● ●● ●●● ● ● ●● ●● ●●● ● ●● ●● ● ●●●●●●●● ●● ● ● ●●●●● ●● ● ● ●● ● ●●●● ●●● ● ● ●●●● ●● ● ● ●● ● ●●● ●●● ●●●● ● ● ●●●● ● ● ●● ●● ● ●● ● ●● ●●●● ●● ●●●●● ● ● ●● ●●● ●● ●●●●● ● ● ●●●● ●● ●●● ● ● ● ● ● ●●●● ●●● ●●●●● ● ●●●●●●● ●●●● ●● ●●●●● ●●●● ● ● ●● ●●●● ● ● ●●●●● ●●● ● ● ● ●●● ●●●● ●● ●●●●● ●●●●● ●● ● ●●● ●●● ● ● ●●●●●● ●● ●●●●●●● ●● ● ●●●●●●●●●●●●●●●●●● ●● ●● ● ●●●●●●●●●● ●●●●●●●●● ●●● ●●● ● ●●●●● ●●●● ●● ●● ●●●●●● ●●● ●● ●●● ●●●●●●●●●● ●●●●●●●● ●●●●●●●●●●●●● ●●●●●●●●●●● ● ●●●● ●●●● ● ●●●●●● ● ●●●●●●● ●● ● ●●●●● ●● ●● ●●●●●●●●●●●● ●●●●●●●●●●●●●●● ● ●● ●●● ●●●●●●●●●● ● ● ●●● ●●● ● ●●●●●● ●● ●●●●●● ●●●●●●●●●●●●●●● ● ●●●●●● ●●●●●●●●●●●●●● ●● ●●●●●●● ●●●●● ●● ●●● ● ●●●● ● ●● ●●●●●●●●●●●● ●●●●●●●●●●●● ●●●●● ●●●●●●●● ●●●●● ●●●●●●●●● ● ●●● ●●● ●●●●●●● ●●●●●●●● ●●●●●●●● ● ● ●● ●●●●●● ●●●●●●●●●●●●● ● ● ●●●●●●● ●●● ● ●●●●●● ●● ●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●● ●●●●●●●●●●● ●●●● ●●●●●●●●●●●●●● ●●●●●● ●●● ●●●● ●●● ●● ●●●●●●● ●●●●●●●●●● ●●●●●●●●●● ●●●●●●●●●●●●●● ● ●●●●●●●● ●●● ●●● ●●●●● ●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●● ●● ●●●●●●●●●●●● ● ●●●●●●●●●●●●●●●●● ●●●●●●●●●● ●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●●●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●● ●●●●●●●● ●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ˆ βj := marginal effect estimate of SNP j gt TG t TF g gu TG u OG k TF q qn gn TG n ˆ sˆj := standard error of βj SNP j, aj = 1 ˆ ˆ ˆ ˆ β := (β1,..., βp)0, S := diag{(s1,..., sp)0} vgt Gj = {TF g, TG u, OG k}

b b gn b. External LD estimates gt TG t TG n Gene (TF/TG/OG) RE for TF&TG qn gu TF g Trans effect Tg = {TG t, TG u, TG n} of TF on TG TG u TF q Cis effect Tq = {TG n} of SNP on gene

SNP j cjk OG k + cju TG u + cjg (TF g + vgn TG n + vgu TG u + vgt TG t) ∼ · · · · · · a := 1{SNP j “near” the network}, G := {genes “near” SNP j} −1 0 1 j j Tg := {TGs directly regulated by TF g}, vgt := trans impact of TF g on TG t

R := p p matrix of LD between SNPs γjg := effect of SNP j due to gene g, cjg := cis impact of SNP j on gene g × Ò 2. Bayesian Model 3. Posterior Inference

d. Parameter of interest f. Prior distribution g. Network enrichment h. Gene prioritization β := true effect of SNP j β π (μ , σ2) + (1 π ) δ j j j N j 0 j 0 TF TG ∼ · − · ●●●●● Pr(Data M ) 1 ● 1 ● ● (θ0+aj θ) 1 1 ●● ● ●●●●●● β := (β , . . . , β ) ● ● n ●● 1 p ● 0 πj = (1 + 10 ) ● ●● BF := ● ● ●●● − · − | ● t ● ● ● ●●● g ● ● ● Pr(Data M ) ● ●● 0 ● ●●● ● u ●●● ●● ●●● μ = [c (γ + v γ )] ● ● ●● j jg jg gt jt q ●●● | ● ● ● ● ●● ●● ● · t · ● ●●● g Gj Tg ●● ● e. Likelihood function ●●● ●● ● Enrichment model ● ● X X ●●● ∈ ∈ ● ● ●●● ●● 2 ● Enrichment P1 Enrichment ● P1 Enrichment cis trans ● ●●●● M : θ > 0 or σ > 0 ● 1 1 0 0 β SRS− β, SRS 2 0 1 0 1 N γjg (0, σ ) Baseline model ∼ |N {z } | {z } 2 Baseline P1 Baseline P1   ∼ M0 : θ = 0 and σ = 0 b bÒb bÒb

Fig 1: Schematic overview of RSS-NET. RSS-NET requires three types of input data: GWAS summary statistics (a), external LD estimates (b) and gene regulatory network annotations (c). Here a regulatory network is a bipartite graph that has two types of nodes, transcription factor (TF) and target gene (TG), and has directed edges from TFs to TGs through regulatory elements (REs). RSS-NET uses a regulatory network to decompose the total effect of each SNP into effects of multiple genes. For example, the expected total effect of SNP j shown in Panel c can be represented as a sum of cis effects of three nearby genes, outside-network gene (OG) k, TG u and TF g, and trans effects of three TGs (n, u, t) that are directly regulated by TF g. RSS-NET performs Bayesian hierarchical modeling (d-f) under two models about two enrichment parameters (θ for nodes; σ2 for edges): the “baseline model” 2 (M0 : θ 0 and σ 0) that each SNP has equal chance of being associated with the trait (θ 0) and each trait-associated SNP = = 2 2 = has the same effect size distribution (σ 0), and, the “enrichment model” (M1 : θ 0 or σ 0) that SNPs near network nodes = > > are more often associated with the trait (θ 0) or SNPs near network edges have larger effect sizes (σ2 0). To assess network > > enrichment, RSS-NET computes a Bayes factor (BF) comparing M0 and M1 (g). RSS-NET also automatically prioritizes loci within an enriched network by comparing the posterior distributions of genetic effects (β) under M0 and M1 (h). For each locus, RSS-NET summarizes the posterior of β as P1, the posterior probability that at least one SNP in this locus is associated with the trait (β j 0). 6= Differences between P1 under M0 and M1 reflect the influence of a regulatory network on genetic associations, which can highlight new trait-associated genes (h). bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

4 / 29

78 RSS-NET provides a unified framework (Fig. 1d-f) for two tasks: (1) test-

79 ing whether a network is enriched for genetic associations; (2) identifying

80 which genes within this network drive the enrichment. To assess network

81 enrichment (Fig. 1g), RSS-NET computes a Bayes factor (BF) comparing the 2 82 “enrichment model” (M1 : θ 0 or σ 0) against the “baseline model” (M0 : 2 > > 83 θ 0 and σ 0). To prioritize genes within enriched networks (Fig. 1h) RSS- = = 84 NET contrasts posterior distributions of genetic effects (β) under M0 and M1. 85 RSS-NET outputs results for these two tasks simultaneously. 13 86 RSS-NET improves upon its predecessor RSS-E . Indeed RSS-NET in- 2 87 cludes RSS-E as a special case where edge-enrichment σ 0 and only node- = 88 enrichment θ is learned from data. By estimating the additional parame- 2 89 ter σ , RSS-NET is more flexible than RSS-E, and thus, RSS-NET consis- 90 tently outperforms RSS-E in various simulation scenarios, and often yields 2 91 better fit on real data. Despite different treatments of σ , RSS-NET and 92 RSS-E share computation schemes (Supplementary Notes), which allows us

93 to build RSS-NET on the efficient algorithm of RSS-E. Software is available 94 at https://github.com/suwonglab/rss-net.

95 Method comparison based on simulations. The novelty of RSS-NET

96 lies in its use of regulatory network topology to infer enrichments from whole-

97 genome association statistics, and more importantly, its automatic priori-

98 tization of loci in light of inferred enrichments. We are not aware of any

99 published method with the same features. However, one could ignore topol-

100 ogy and simply create SNP-level annotations based on proximity to network

101 nodes (Supplementary Notes). On the node-based annotations, there are meth-

102 ods to test global enrichments or local associations using GWAS summary 21 10,22 13 103 data. Here we use Pascal , LDSC and RSS-E to benchmark RSS-NET 23 104 through genome-wide simulations on real genotypes (Methods).

105 We started with simulations where RSS-NET modeling assumptions were

106 satisfied. Here we considered two genetic architectures: a sparse scenario 107 with most SNPs being null (β 0), and, a polygenic scenario with most SNPs = 108 being trait-associated (β 0); see Supplementary Figure 1 for details. For 6= 109 each architecture, we simulated baseline datasets from M0 and enrichment 2 2 110 datasets from three patterns of M1 (only θ 0; only σ 0; both θ 0 and σ > > > > 111 0), and used RSS-NET and existing methods to detect M1 from all datasets. 112 Figure 2 and Supplementary Figure 1 show the trade-off between false and

113 true enrichment discoveries for each method. Existing methods tend to per-

114 form well in select settings. For example, Pascal and LDSC perform poorly

115 when genetic signals are very sparse (Fig.2b) or weak (Supplementary Fig.

116 1); RSS-E performs poorly when enrichment patterns are inconsistent with bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint

(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.0. It is made

available under aCC-BY-NC-ND 4.0 International license. and > 2 0 and 22 σ = θ 0, : 0 > θ M approximation c 2 χ 0; 1.00 1.00

5 / 29 > 2 0.90 0.90 σ 0, , LDSC-baselineLD = 0.75 0.75 RSS−NET (1.000) RSS−E (0.542) LDSC−baseline (0.916) LDSC−baselineLD (0.914) (0.857) Pascal−max−chi (0.926) Pascal−max−emp (0.754) Pascal−sum−chi (0.849) Pascal−sum−emp RSS−NET (1.000) RSS−E (0.600) LDSC−baseline (0.961) LDSC−baselineLD (0.957) (0.722) Pascal−max−chi (0.776) Pascal−max−emp (0.770) Pascal−sum−chi (0.839) Pascal−sum−emp 10 θ Method (AUROC) Random (0.500) Method (AUROC) Random (0.500) b of 348,965 genome-wide SNPs to 0; = 23 0.50 0.50 2 σ 0, Sparse scenario False positive fraction positive False fraction positive False Polygenic scenario Polygenic > , LDSC-baseline θ 13 0.25 0.25 a 0.10 0.10 0.00 0.00 ( -sum ), and two pathway scoring options:

We used real genotypes

1.00 0.90 0.75 0.50 0.25 0.10 0.00 1.00 0.90 0.75 0.50 0.25 0.10 0.00 True positive fraction positive True True positive fraction positive True 2 c 1.00 1.00 0.90 0.90 0.75 0.75 RSS−NET (0.996) RSS−E (0.938) LDSC−baseline (0.557) LDSC−baselineLD (0.576) (0.454) Pascal−max−chi (0.562) Pascal−max−emp (0.506) Pascal−sum−chi (0.559) Pascal−sum−emp RSS−NET (1.000) RSS−E (1.000) LDSC−baseline (0.779) LDSC−baselineLD (0.777) (0.790) Pascal−max−chi (0.890) Pascal−max−emp (0.649) Pascal−sum−chi (0.734) Pascal−sum−emp Method (AUROC) Random (0.500) Method (AUROC) Random (0.500) ( -max ) and sum-of- χ -values, whereas RSS-E and RSS-NET produced BFs; these statistics were used to 2 p 0.50 0.50 Sparse scenario False positive fraction positive False fraction positive False Polygenic scenario Polygenic 0) for a target network under three scenarios: 0.25 0.25 > 2 σ 0.10 0.10 0 or 0.00 0.00 > θ

:

1.00 0.90 0.75 0.50 0.25 0.10 0.00 1.00 0.90 0.75 0.50 0.25 0.10 0.00 True positive fraction positive True True positive fraction positive True 1 b M 1.00 1.00 0.90 0.90 0.75 0.75 RSS−NET (0.977) RSS−E (0.979) LDSC−baseline (0.962) LDSC−baselineLD (0.956) (0.809) Pascal−max−chi (0.815) Pascal−max−emp (0.821) Pascal−sum−chi (0.820) Pascal−sum−emp RSS−NET (0.971) RSS−E (0.971) LDSC−baseline (0.977) LDSC−baselineLD (0.967) (0.691) Pascal−max−chi (0.694) Pascal−max−emp (0.801) Pascal−sum−chi (0.802) Pascal−sum−emp Method (AUROC) Random (0.500) Method (AUROC) Random (0.500) 0.50 0.50 Sparse scenario False positive fraction positive False fraction positive False Polygenic scenario Polygenic 0.25 0.25 0.10 0.10 0.00 0.00

1.00 0.90 0.75 0.50 0.25 0.10 0.00 1.00 0.90 0.75 0.50 0.25 0.10 0.00 True positive fraction positive True fraction positive True a using their default setups. Pascal includes two gene scoring options: maximum-of- χ Flexibility of RSS-NET to identify network-based enrichments from GWAS summary statistics. 21 ) and empirical sampling ( -emp ). For each dataset, Pascal and LDSC methods produced 0). The enrichment data followed the enrichment model ( = ig 2: 2 Pascal ( -chi rank the significance of enrichments.Each A panel false displays and true the positiveenrichment trade-off occurs datasets between if of false a a and method givenThe identifies true simulation enrichment dashed enrichment scenario, of discoveries, and diagonal the i.e. also target lineSupplementary receiver reports network Figure denotes the operating in 1 . corresponding characteristics random a areas baseline (ROC) ROC under and curves, curve, ROC enrichment for dataset curve with all respectively. (AUROCs), methods AUROC where in a being 200 higher 0.5. value baseline Simulation indicates and better details 200 performance. and additional results (with weaker enrichment signal) are provided in simulate baseline and enrichment individual-levelσ data under two genetic architectures (“sparse” and “polygenic”). The baseline data followed the baseline model ( F We computed the corresponding single-SNP summary statistics, and, on the summary data, we compared RSS-NET with RSS-E bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

6 / 29

117 its modeling assumptions (Fig.2c). In contrast, RSS-NET performs consis-

118 tently well in all scenarios. This is expected because RSS-NET models are

119 sufficiently flexible to capture various genetic architectures and enrichment

120 patterns. In practice, one rarely knows in advance the correct genetic or en-

121 richment architecture. This makes the flexibility of RSS-NET appealing.

122 Genetic associations of complex traits are often enriched in regulatory re- 5,6,10–13,22 123 gions . Since a regulatory network is a set of genes linked by regula-

124 tory elements, it is important to confirm that network enrichments identified

125 by RSS-NET are not driven by general regulatory enrichments. Hence, we

126 performed simulations where baseline datasets had enriched associations in

127 random near-gene (Fig.3a; Supplementary Fig. 2) or regulatory SNPs (Fig.

128 3b; Supplementary Fig. 3). The results show that RSS-NET is unlikely to

129 yield false discoveries due to arbitrary regulatory enrichments, and is yet

130 more powerful than other methods.

131 Regulatory network edges play important roles in driving context speci- 16 24 132 ficity and propagating disease risk , but existing methods largely focus

133 on network nodes (genes). In contrast, RSS-NET leverages information from

134 both edges and nodes. This topology-aware feature increases the potential of

135 RSS-NET to identify the most relevant network for a trait among candidates

136 that share many nodes but differ in edges. To illustrate this, we designed a

137 scenario where a real target network and random candidates had the same

138 nodes and edge counts, but different edges. We simulated enrichment and

139 baseline datasets where genetic associations were enriched in the target net-

140 work and random candidates respectively, and then tested enrichment of the

141 target network on all datasets. As expected, only RSS-NET can reliably dis-

142 tinguish true enrichments of the target network from enrichments of its edge-

143 altered counterparts (Fig.3c; Supplementary Fig. 4).

144 To benchmark its prioritization component, we compared RSS-NET with 13 21 145 gene-based association methods in RSS-E and Pascal . Figure 4 and Sup-

146 plementary Figure 5 show the power of each method to identify genome-wide 13 147 gene-level associations. Consistent with previous work , RSS-based meth-

148 ods substantially outperform Pascal methods even without network enrich-

149 ment (Fig. 4a). This is because RSS-NET and RSS-E exploit a multiple regres- 20 150 sion framework to learn the genetic architecture from data of all genes and

151 assesses their effects jointly, whereas Pascal only uses data of a single gene to

152 estimate its effect. Similar to enrichment simulations (Fig. 2), RSS-NET out-

153 performs RSS-E methods in prioritizing genes across different enrichment

154 patterns (Fig. 4b-d). This again highlights the flexibility of RSS-NET.

155 Finally, since RSS-NET uses a regulatory network as is, and, most net-

156 works to date are algorithmically inferred, we performed simulations to as- bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 2 c. 0 and > θ with 1 1.00 1.00 7 / 29 M 0.90 0.90 0.75 0.75 RSS−NET (0.975) LDSC−baseline (0.360) LDSC−baselineLD (0.356) (0.418) Pascal−max−chi (0.482) Pascal−max−emp (0.425) Pascal−sum−chi (0.533) Pascal−sum−emp RSS−NET (1.000) LDSC−baseline (0.374) LDSC−baselineLD (0.373) (0.560) Pascal−max−chi (0.626) Pascal−max−emp (0.494) Pascal−sum−chi (0.585) Pascal−sum−emp Supplementary Figures 2-4 . Method (AUROC) Random (0.500) Method (AUROC) Random (0.500) a random set of near-RE SNPs; b 0.50 0.50 Sparse scenario False positive fraction positive False fraction positive False Polygenic scenario Polygenic 0.25 0.25 0.10 0.10 0.00 0.00

1.00 0.90 0.75 0.50 0.25 0.10 0.00 1.00 0.90 0.75 0.50 0.25 0.10 0.00 True positive fraction positive True fraction positive True c a random set of near-gene SNPs; 1.00 1.00 a 0.90 0.90 0.75 0.75 RSS−NET (1.000) LDSC−baseline (0.818) LDSC−baselineLD (0.820) (0.785) Pascal−max−chi (0.876) Pascal−max−emp (0.682) Pascal−sum−chi (0.786) Pascal−sum−emp RSS−NET (1.000) LDSC−baseline (0.839) LDSC−baselineLD (0.849) (0.633) Pascal−max−chi (0.694) Pascal−max−emp (0.669) Pascal−sum−chi (0.755) Pascal−sum−emp Method (AUROC) Random (0.500) Method (AUROC) Random (0.500) Here the enrichment data were generated from a special case of 0.50 0.50 Sparse scenario False positive fraction positive False fraction positive False Polygenic scenario Polygenic 0.25 0.25 0.10 0.10 0.00 0.00

1.00 0.90 0.75 0.50 0.25 0.10 0.00 1.00 0.90 0.75 0.50 0.25 0.10 0.00 True positive fraction positive True fraction positive True b 1.00 1.00 0.90 0.90 2 . Simulation details and additional results (with weaker enrichment signal) are provided in was mis-specified in all three scenarios. Similar to enrichment datasets, enrichments in all baseline datasets manifested in 0 M 0.75 0.75 RSS−NET (1.000) LDSC−baseline (0.903) LDSC−baselineLD (0.905) (0.834) Pascal−max−chi (0.911) Pascal−max−emp (0.744) Pascal−sum−chi (0.844) Pascal−sum−emp RSS−NET (1.000) LDSC−baseline (0.929) LDSC−baselineLD (0.935) (0.728) Pascal−max−chi (0.781) Pascal−max−emp (0.746) Pascal−sum−chi (0.819) Pascal−sum−emp Method (AUROC) Random (0.500) Method (AUROC) Random (0.500) 0.50 0.50 Sparse scenario False positive fraction positive False fraction positive False Polygenic scenario Polygenic 0.25 0.25 0.10 0.10 0.00 0.00

1.00 0.90 0.75 0.50 0.25 0.10 0.00 1.00 0.90 0.75 0.50 0.25 0.10 0.00 True positive fraction positive True fraction positive True a Robustness of RSS-NET to model mis-specification in enrichment analysis. 0. The baseline data were simulated from three scenarios where genetic associations were enriched in: > a random edge-altered network. By this design, ig 3: 2 F both association proportion (more frequent) and magnitude (larger effect). RSS-E was removed from this set of simulations because of its poor performance shown in Figure σ c The rest of this simulation study is the same as Figure niiullvlGA aaudrfu scenarios: four under data GWAS individual-level F uvs(URs o l ehd,wt ahdhrzna ie niaigrno efrac.FrARCadAPC ihrvleidctsbte efrac.Simulation performance. better PRC indicates under value areas higher and AUPRC, transcribed curves and the (PRC) AUROC precision-recall methods of For in displays Pascal performance. kb panel provided whereas random bioRxiv 100 each are indicating trait-associated, of within results lines was half additional horizontal SNP gene lower and dashed the one The details with that least (AUROC=0.5). methods, probabilities performance at all posterior random for if produced indicating (AUPRCs) “trait-associated” methods lines curves RSS as diagonal dataset, dashed gene preprint with each a methods, in defined all gene we each For dataset, effect. each association nonzero For produced had genes. gene -coding this autosomal of doi: assuming 16,954 region RSS-E with of each RSS-E in case datasets, available special 200 methods https://doi.org/10.1101/2020.03.13.990010 association a gene-based is with RSS-E-baseline RSS-NET compared and we data, summary the on and, ; this version posted March 14, 2020. The copyright holder for this preprint

(which was not certified by peer review) 4: ig is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. a oe fRSNTt dniygn-ae soitosfo WSsmaystatistics. summary GWAS from associations gene-based identify to RSS-NET of Power True positive fraction

Precision 0.00 0.10 0.25 0.50 0.75 0.90 1.00 0.00 0.10 0.25 0.50 0.75 0.90 1.00 0.00 0.00 0.10 0.10

8 / 29 0.25 0.25 p False positive fraction vle;teesaitc eeue orn h infiac fgn-ee soitos h pe afo ahpnldsly O uvsadARC for AUROCs and curves ROC displays panel each of half upper The associations. gene-level of significance the rank to used were statistics these -values; Recall 0.50 0.50

Random (0.233) Method (AUPRC) Random (0.500) Method (AUROC) Pascal−sum (0.250) Pascal−max (0.247) RSS−E−baseline (0.438) RSS−E (0.436) RSS−NET (0.436) Pascal−sum (0.522) Pascal−max (0.519) RSS−E−baseline (0.707) RSS−E (0.706) RSS−NET (0.706) 0.75 0.75 0.90 0.90 1.00 1.00 upeetr iue5 . Figure Supplementary

True positive fraction b

Precision 0.00 0.10 0.25 0.50 0.75 0.90 1.00 0.00 0.10 0.25 0.50 0.75 0.90 1.00 a θ 0.00 0.00 = 0, 0.10 0.10 θ σ = 2 0.25 0.25 = .Psa nldstogn crn pin:maximum-of- χ options: scoring gene two includes Pascal 0. False positive fraction 0; b θ Recall 0.50 0.50 =

0, Random (0.233) Method (AUPRC) Random (0.500) Method (AUROC) σ Pascal−sum (0.262) Pascal−max (0.262) RSS−E−baseline (0.437) RSS−E (0.386) RSS−NET (0.447) Pascal−sum (0.524) Pascal−max (0.522) RSS−E−baseline (0.706) RSS−E (0.656) RSS−NET (0.712) 2 0.75 0.75 > 0; 0.90 0.90 c θ 1.00 1.00 > 0, c σ True positive fraction 2

Precision 0.00 0.10 0.25 0.50 0.75 0.90 1.00 0.00 0.10 0.25 0.50 0.75 0.90 1.00 = 0; 0.00 0.00 d 0.10 0.10 θ > 0, 0.25 0.25 False positive fraction σ 2 eue elgntpso 4,6 eoewd SNPs genome-wide 348,965 of genotypes real used We 13 > Recall .W optdtecrepnigsnl-N umr statistics, summary single-SNP corresponding the computed We 0. n Pascal and 0.50 0.50

Random (0.883) Method (AUPRC) Random (0.500) Method (AUROC) Pascal−sum (0.888) Pascal−max (0.887) RSS−E−baseline (0.865) RSS−E (0.997) RSS−NET (0.997) Pascal−sum (0.513) Pascal−max (0.511) RSS−E−baseline (0.445) RSS−E (0.979) RSS−NET (0.979) 0.75 0.75 21 2 0.90 0.90 S- saseilcs fRSNTassuming RSS-NET of case special a is RSS-E . ( -max 1.00 1.00 n sum-of- χ and ) True positive fraction d

Precision 0.00 0.10 0.25 0.50 0.75 0.90 1.00 0.00 0.10 0.25 0.50 0.75 0.90 1.00 0.00 0.00 0.10 0.10 0.25 0.25 2 False positive fraction .Ec cnrocontains scenario Each ( -sum ). Recall 0.50 0.50

Random (0.883) Method (AUPRC) Random (0.500) Method (AUROC) Pascal−sum (0.890) Pascal−max (0.889) RSS−E−baseline (0.865) RSS−E (0.997) RSS−NET (0.997) Pascal−sum (0.515) Pascal−max (0.513) RSS−E−baseline (0.445) RSS−E (0.978) RSS−NET (0.979) 0.75 0.75 23 0.90 0.90 osimulate to 1.00 1.00 σ 2 = 0, bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

9 / 29

157 sess the robustness of RSS-NET under noisy networks. Specifically we sim-

158 ulated datasets from a real target network, created noisy networks by ran-

159 domly removing edges from this real target, and then used the noisy net-

160 works, rather than the real one, in RSS-NET analyses. By exploiting retained

161 true nodes and edges, RSS-NET produces reliable results in identifying both

162 network enrichments and genetic associations, and unsurprisingly, its perfor-

163 mance drops as the noise level increases (Supplementary Fig. 6).

164 In conclusion, RSS-NET is flexible to perform well in various genetic ar-

165 chitectures and enrichment patterns, is robust to a wide range of model mis-

166 specification, and outperforms existing related methods. To further investi-

167 gate its real-world utility, we applied RSS-NET to analyze 18 complex traits

168 and 38 regulatory networks.

169 Enrichment analyses of 38 networks across 18 traits. We first in- 17 170 ferred whole-genome regulatory networks for 38 tissue or cell types, using

171 public paired data of gene expression and chromatin accessibility (Methods;

172 Supplementary Table 1). Clustering analysis showed that networks recapit-

173 ulated context similarity, with immune cells and brain regions grouping to-

174 gether as two single units (Fig. 5a; Supplementary Fig. 7).

175 On these 38 networks, we then applied RSS-NET to analyze 1.1 million 25 176 common SNPs for 18 traits, using GWAS summary statistics from 20,883

177 to 253,288 European ancestry individuals (Supplementary Table 2). For each

178 trait-network pair we computed a BF assessing network enrichment. Full

179 results of 684 trait-network pairs are available online (Methods).

180 To check whether observed enrichments could be driven by general regu-

181 latory enrichments, we created a “near-gene” control network with 18,334

182 protein-coding autosomal genes as nodes and no edge, and then analyzed

183 this control with RSS-NET on the same GWAS data. For most traits, the

184 near-gene control has substantially weaker enrichment than the actual net-

185 works. In particular, 512 out of 684 trait-network pairs (one-sided Binomial 40 186 p 2.2 10− ) showed stronger enrichments than their near-gene counter- = × 15 187 parts (average log10 BF increase: 13.94, one-sided p 5.1 10− ), and, 16 = × 188 out of 18 traits had multiple networks more enriched than the near-gene con- 4 189 trol (minimum 5; one-sided Wilcoxon p 1.2 10− ; Supplementary Table 3). = × 190 Consistent with simulations (Fig. 3a-b), these results indicate that network

191 enrichments identified by RSS-NET are unlikely due to generic regulatory

192 enrichments harbored in the vicinity of genes.

193 Among 512 trait-network pairs passing the near-gene enrichment control,

194 we further examined whether the observed enrichments could be confounded

195 by network properties or genomic annotations. We associated the BFs with bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

10 / 29

196 three network features and did not observe any correlation (proportion of 2 2 197 SNPs in a network: r 3.0 10− , p 0.49; node counts: r 5.4 10− , = − × 3 = = − × 198 p 0.23; edge counts: r 9.2 10− , p 0.84). To check confounding ef- = = − × = 199 fects of genomic annotations (e.g., promoter), we computed the correlation

200 between BFs and proportions of SNPs falling into a network and each of 10,22 201 73 functional categories , and we did not find any significant correla-

202 tion ( 0.13 r 0.01, p 0.05/73). Similar patterns hold for all 684 trait- − < < − > 203 network pairs (Supplementary Tables 4-5). Together, these results suggest

204 that observed network enrichments are unlikely to be driven by known fea-

205 tures and the resulting model mis-specification.

206 For each trait-network pair, we also computed BFs comparing the base- 207 line (M0) against three disjoint models where enrichments were contributed 2 2 208 by (1) only nodes (M11 : θ 0,σ 0); (2) only edges (M12 : θ 0,σ 0); > = 2 = > 209 (3) both nodes and edges (M13 : θ 0,σ 0). We found that M13 was the > > 210 most supported model by data (with the largest BF) for 411 out of 512 trait- 45 211 network pairs (one-sided Binomial p 1.2 10− ), corroborating the “omni- 24,26 = × 212 genic” hypothesis that genetic signals of complex traits tend to be spread

213 across the genome via regulatory interconnections. When stratifying results

214 by traits, however, we observed that enrichment patterns could vary consid-

215 erably (Fig.5b; Supplementary Table 6). For type 2 diabetes (T2D), two of

216 five networks passing the near-gene enrichment control showed the strongest 217 support for node-only enrichment (M11). Many networks showed the strongest 218 support for edge-only enrichment (M12) in breast cancer (10), body mass in- 219 dex (BMI, 14), waist-hip ratio (37) and schizophrenia (38). Since one rarely 220 knows the true enrichment patterns a priori, and M1 includes the restricted 221 models (M11, M12, M13) as special cases, we used BFs based on M1 in this 222 study. Collectively, these results highlight the heterogeneity of network en-

223 richments across complex traits, which can be potentially learned from data

224 by flexible approaches like RSS-NET.

225 Top-ranked enrichments recapitulated many trait-context links reported

226 in previous GWAS. Genetic associations with BMI were enriched in the net- 13 12 227 works of pancreas (BF 2.07 10 ), bowel (BF 8.02 10 ) and adipose 12 = × = × 228 (BF 4.73 10 ), consistent with the roles of obesity-related genes in insulin = × 229 biology and energy metabolism. Networks of immune cells showed enrich- 60 230 ments for rheumatoid arthritis (RA, BF 2.95 10 ), inflammatory bowel 35 = × 26 231 disease (IBD, BF 5.07 10 ) and Alzheimer’s disease (BF 8.31 10 ). = × = × 232 Networks of cardiac and other muscle tissues showed enrichments for coro- 28 233 nary artery disease (CAD, BF 9.78 10 ), atrial fibrillation (AF, BF 14 = × 7 = 234 8.55 10 ), and heart rate (BF 2.43 10 ). Other examples include brain × = × 19 235 network with neuroticism (BF 2.12 10 ), and, liver network with high- = × bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

11 / 29

21 236 density lipoprotein (HDL, BF 2.81 10 ) and low-density lipoprotein (LDL, 27 = × 237 BF 7.66 10 ). = × 238 Some top-ranked enrichments were not identified in the original GWAS,

239 but they are biologically relevant. For example, natural killer (NK) cell net-

240 work showed the strongest enrichment among 38 networks for BMI (BF 13 30 77 = 241 3.95 10 ), LDL (BF 5.18 10 ) and T2D (BF 1.49 10 ). This result × = × 27 = × 242 supports a recent mouse study revealing the role of NK cell in obesity-

243 induced inflammation and insulin resistance, and adds to the considerable 28 244 evidence unifying metabolism and immunity in many pathological states . 29 29 245 Other examples include adipose network with CAD (BF 1.67 10 ), liver 13,30 20 = × 246 network with Alzheimer’s disease (BF 1.09 10 ) and monocyte net- 31,32 12 = × 247 work with AF (BF 4.84 10 ). = × 248 Some networks show enrichments in multiple traits. To assess network

249 co-enrichments among traits, we tested correlations for all trait pairs us-

250 ing their BFs of 38 networks (Fig.5c; Supplementary Table 7). Among 153

251 trait pairs, 29 of them were significantly correlated (p 0.05/153). Reas- < 252 suringly, subtypes of the same disease showed strongly correlated enrich- 20 253 ments, as in IBD subtypes (r 0.96, p 1.3 10− ) and CAD subtypes 14 = = × 254 (r 0.90, p 3.3 10− ). The results also recapitulated known genetic cor- = = × 33 9 255 relations including RA with IBD (r 0.79, p 5.3 10− ), and, neuroti- 34 = =7 × 256 cism with schizophrenia (r 0.73, p 1.6 10− ). Network enrichments of = = × 257 CAD were correlated with enrichments of its established risk factors such as 35 8 36 7 258 heart rate (r 0.75, p 5.1 10− ) and BMI (r 0.71, p 5.1 10− ), = = × 37 = 6 = × 38 259 and its associated traits such as AF (r 0.65, p 9.2 10− ) and height 5 = = × 260 (r 0.64, p 1.6 10− ). Network enrichments of Alzheimer’s disease were = = × 14 261 strongly correlated with enrichments of LDL (r 0.90, p 2.6 10− ) and 9 = = × 262 IBD (r 0.78, p 8.3 10− ), consistent with recent data linking Alzheimer’s = = × 39 40 263 disease to lipid metabolism and immune activation . The results show

264 the potential of GWAS to highlight trait similarities via regulatory networks, 41 42 13 265 complementing previous work via SNPs , heritabilities and pathways .

266 Enrichment-informed prioritization of network genes. A key fea- 13 267 ture of RSS-NET, inherited from RSS-E , is that inferred network enrich-

268 ments automatically contribute to prioritize genetic associations of network base near net 269 genes. Specifically, for each locus RSS-NET produces P1 , P1 and P1 , 270 the posterior probability that at least one SNP in the locus is associated with 271 the trait, assuming M0, M1 for the near-gene control network, and M1 for a 272 given network, respectively (Method). When multiple networks are enriched, bma net 273 RSS-NET produces P1 by averaging P1 over all networks passing the 274 near-gene control, weighted by their BFs (Method). This allows us to assess risaedfie in defined are traits ecnae fte“et nihetmdl(hwn h ags F stend-ny( node-only the as BF) largest the (showing model enrichment “best” the of percentages lto Fcreain mn 8tat cos3 ewrs hr riswt togrcreain r oae lsradcnetdb akregs See edges. darker by connected and closer located are correlations stronger with traits where networks, 38 across traits 18 7 among correlations BF of plot See networks. across model F 3; ( e : (FDR trait trait each GWAS for a shown for are prioritized genes categories for top-ranked enriched only significantly visualization, is See simplify genes control. To Mendelian near-gene ratios. or the odds than mouse enriched knockout more of networks ( category multiple a genes had prioritized that RSS-NET traits bioRxiv of 16 among Overlap evaluated are Medians cutoff. given preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint t dsrbtdsohsi egbu medn.Aayi eal n diinlrslsaepoie in provided are results additional and details Analysis embedding. neighbour stochastic -distributed

(which was not certified by 5: ig peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made o orlto estimates. correlation for available under aCC-BY-NC-ND 4.0 International license. S-E nlsso WSsmaysaitc o 8cmlxtat cos3 euaoyntok.a networks. regulatory 38 across traits complex 18 for statistics summary GWAS of analyses RSS-NET Network Putamen Caudate a ● d Median % of genes with Enrichment P1 > Reference P1 a a a a 75% 80% 85% 90% 95% Other tissue Omnibus Immune cell Brain region

12 / 29 Hippocampus Cortex upeetr al 2 . Table Supplementary Reference P1 ● ● 0 Near−gene Baseline Cerebellum 0.1 ● Prostate upeetr al 6 Table Supplementary Ovary d 0.2 ● Transverse colon einpooto fgnswith genes of proportion Median CD4 cell Uterus NK cell Stomach 0.3 ● B cell Reference P1 cutoff Gastroesophageal junction ● ● ● ● ● CD8 cell Monocyte 0.4 ● P Esophagus muscularis 1 bma 0.5 Subcutaneous adipose ● Omnibus Esophagus mucosa Testis Thyroid Skeletal muscle Spleen ≥ 0.6 Pancreas ● )wt ee mlctdi ncotmuephenotypes mouse knockout in implicated genes with 0 . 9) Aorta Visceral omentum Vagina Terminal ileum Lung Liver 0.7 ● Sun−exposed skin o ueia aus hw r 6tat hthdmlil ewrsmr nihdta h ergn control. near-gene the than enriched more networks multiple had that traits 16 are Shown values. numerical for Atrial appendage Sigmoid colon Adrenal gland 0.8 Left ventricle ● Tibial nerve Breast 0.9 ● e b P

1 bma 100 ihrta eeec siae ( estimates reference than higher

% of best model as 80

20 60 θ > 0 & σ 2

% of best model as =0 40 Type 2 diabetes 20 40 Schizophrenia Body mass index M ●

● Waist−hip ratio Waist−hip 100 11 θ ● > 60 Breast cancer 0 : 80 & θ σ 2 >

>0

60 % of best model as as model best of % ) See 2). f : 0 ,

80

● 11 traits 11 40 σ

● 2

θ

=

= 0

20 &

σ

>0 ) deol ( edge-only 0), 100 2 ● upeetr iue7 . Figure Supplementary P 43 upeetr als12-13 Tables Supplementary 1 base n ua edla diseases Mendelian human and ( e ) or f c P 1 CAD near ● M ≤ ,aoggnswt eeec siae ihrta a than higher estimates reference with genes among ), 12 ) hce de orsodt togrenrichment stronger to correspond edges Thicker 0 . 1). HEIGHT : θ HR ● = Pearson correlation ● lseigo 8rgltr ewrsbsdon based networks regulatory 38 of Clustering upeetr al 8 Table Supplementary 0 , ● MI BMI WAIST σ ● AF ● 2 ● b ● T2D > SCZ ● enr iga hwn,frec trait, each for showing, diagram Ternary )adnd-n-de( node-and-edge and 0) NEU ● o ulrsls brvain fGWAS of Abbreviations results. full for −1.0 HDL ● −0.5 0.0 ● BC 44 , 45 0.5 RA ● 1.0 LDL .A deidctsthat indicates edge An ( f ). o ueia values. numerical for ● IBD ● UC upeetr Table Supplementary LOAD ● ● ● CD M 13 : θ > c 0 , Network σ 2 > e - f 0) bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

13 / 29

275 genetic associations in light of enrichment without having to select a sin- net 276 gle enriched network. Differences in estimates based on enrichment (P1 or bma base near 277 P1 ) and reference (P1 or P1 ) reflect the enrichment impact on a locus. 278 RSS-NET enhances genetic association detection by leveraging inferred

279 enrichments. To quantify this improvement, for each trait we calculated the bma base near 280 proportion of genes with higher P1 than reference estimates (P1 or P1 ), 281 among genes with reference P1 passing a given cutoff (Fig. 5d). When using base bma base 282 P1 as reference, we observed high proportions of genes with P1 P1 base > 283 (median: 82 98%) across a wide range of P -cutoffs (0 0.9), and as ex- − 1 − 284 pected, the improvement decreased as the reference cutoff increased. When near 285 using P1 as reference, we observed less genes with improved P1 than using base 4 286 P (one-sided Wilcoxon p 9.8 10− ), suggesting the observed improve- 1 = × 287 ment might be partially due to general near-gene enrichments, but propor- bma near 288 tions of genes with P1 P1 remained high (median: 74 94%) nonethe- > − net 289 less. Similar patterns occurred when we repeated the analysis with P1 290 across 512 trait-network pairs (Supplementary Table 8). Together the results

291 demonstrate the strong influence of network enrichments on nominating ad-

292 ditional trait-associated genes.

293 RSS-NET tends to promote more genes in networks with stronger enrich- net near 294 ments. For each trait the proportion of genes with P P in a network 1 > 1 295 is often positively correlated with its enrichment BF (r : 0.28 0.91; Supple- − net 296 mentary Table 9). When a gene belongs to multiple networks, its highest P1 297 often occurs in the top-enriched networks. We illustrate this coherent pattern 7 298 with MT1G, a liver-active gene that was prioritized for HDL by RSS-NET

299 and also implicated in a recent multi-ancestry genome-wide sleep-SNP inter- 46 300 action analysis of HDL . Although MT1G belongs to regulatory networks 21 301 of 18 contexts, only the top enrichment in liver (BF 2.81 10 ) informs a net = × 302 strong association between MT1G and HDL (P1 0.98), and remaining net- = base 303 works with weaker enrichments yield minimal improvement (P1 0.10, net = 304 P : 0.14 0.19). Additional examples are shown in Figure 6. 1 − 305 RSS-NET recapitulates many genes previously implicated in the same

306 GWAS. For each analyzed dataset we downloaded the corresponding genes 47 307 from the GWAS Catalog and computed the proportion of these genes that bma 308 had high P1 . With a stringent cutoff 0.9, we observed a significant overlap 26 309 (median across traits: 69%; median Fisher exact p 1.24 10− ; Supplemen- = × 310 tary Table 10). Reassuringly, many recapitulated genes are well-established

311 for the traits (Supplementary Table 11), such as CACNA1C for schizophrenia,

312 TCF7L2 for T2D, APOB for lipids and STAT4 for autoimmune diseases.

313 RSS-NET also uncovers putative associations that were not reported in

314 the same GWAS. To demonstrate that many of these new associations are bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

14 / 29

a Body mass index & Pancreas network PAX2 RPL27A 0.95 1.00 Pancreas● Pancreas PAX2 0.90 ●

RPL27A Cerebellum● ● 0.90 ● Ovary Terminal ileum NK cell 0.75 Prostate● 0.85 ●

● 0.85 ● Uterus 0.50 ● ● Testis

Network P1 Network P1 Network P1 Omnibus 0.80 0.80 Putamen 0.25 Caudate ● ● ● TF ●● ● ● ● ● TG 0.75 ●● ●●● ●●●● ●●●●● Testis 0.00 ●●● ●Omnibus 0.75 0.00 0.25 0.50 0.75 1.00 8 10 12 8 10 12 Baseline P1 Log10 enrichment BF Log10 enrichment BF b Rheumatoid arthritis & B cell network TAL1 SLC15A2

●●● CD4 cell B cell 1.00 ● ●● ● SLC15A2 ● 0.94 ● ● ● ● Monocyte ● ● ● ● Monocyte TAL1 ●● ● ● ●● ●B cell ● ● ● ● ● NK cell ● ● ●● ● ● 0.90 ● ● ● Putamen 0.75 ●● ● 0.92

● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ●● ● ● 0.85 ● 0.50 ● ● ● ● ● ●● 0.90 ● ● Uterus ● ● ● ●● Network P1 ● ●● Network P1 Network P1 Lung ● ● ● ● ●●● ● ● ● ● ● ●● ● ●●●● ●● ●● ●●●● ● ● 0.25 ● ●● ●● ●●●●●●●● 0.80 ●●●●●●●● ● 0.88 ●●●● ● ●●●● ●● ●Omnibus ●●● ● ● ● ●●●●●●●● ●●●●●●●● Hippocampus ●●●●●●● ● ● TF ●●●●● ● ● ●●●●●● ●●●●●● ●●●●●●●● ●●●●● ● TG Prostate ●●●●●●●● ●●●●●●●● ●●●●●● ●●●●● 0.00 ● ● Omnibus 0.75 Tibial nerve 0.86 0.00 0.25 0.50 0.75 1.00 30 40 50 60 30 40 50 Baseline P1 Log10 enrichment BF Log10 enrichment BF c High−density lipoprotein & Liver network ESR1 MT1G 0.95 1.00 ● Liver● 1.00 Liver ESR1 MT1G ●●

● 0.94

0.75 ● 0.75 0.93 ● ● ● 0.50 ● 0.92 0.50 Network P1 Network P1 Network P1

●● ● ● 0.91 0.25 ● ● ● Visceral omentum Sigmoid colon ● ● ● ●● ● ● ● ● ● Omnibus ● ● ●● ● TF ● ● ●● ●● ● ● 0.25 ● ●● ● ● ● ●●●● ● ●●●●● ● ●●●●●● TG ● ●●●●●●●● 0.90 ●●●●●● ●●●●● Breast Omnibus ●●●● ● ●●● ●Thyroid ● 0.00 Vagina Pancreas 0.00 0.25 0.50 0.75 1.00 9 12 15 18 21 9 12 15 18 21 Baseline P1 Log10 enrichment BF Log10 enrichment BF d Neuroticism & Putamen network PAX6 SNCA Putamen 1.00 ● Putamen CD8 cell ● ● ● Hippocampus ● ● CD4 cell ● 0.6 0.78 SNCA Cortex● ● ● ● ● 0.75 Stomach Caudate PAX6 0.77 ● Cerebellum● 0.5 0.50 0.76 Cerebellum ● Network P1 Network P1 Pancreas Network P1

Hippocampus ● 0.75 0.25 ● 0.4 ●● ●● ● ●● ● TF ●● ● ● ● ●● ●● TG ●● 0.74 ●● ●●● ●● 0.00 ●●● ●Omnibus Cortex 0.00 0.25 0.50 0.75 1.00 12.5 15.0 17.5 20.0 18.0 18.5 19.0 Baseline P1 Log10 enrichment BF Log10 enrichment BF

Fig 6: RSS-NET gene prioritization results of select trait-network pairs. In the left column, each dot represents a member gene of a given network. In the center and right columns, each dot represents a network to which a select gene belongs. Numerical values are available online (Methods). bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

15 / 29

base near bma net Trait Gene (Role) P1 P1 P1 P1 (Network, BF) Mouse trait Therapeutic/clinical evidence BMI PAX2 (TF) 0.78 0.80 0.94 0.94 (Pancreas, 2.07 1013) Eye, Renal FSGS7, PAPRS × FLT3 (TG) 0.61 0.70 0.85 0.85 (Cerebellum, 8.70 1011) Growth, Immune Acute myeloid leukemia × WAIST LAMB1 (TG) 0.97 0.97 0.98 0.98 (Esophagus, 6.78 10239) Neuron, NS Lissencephaly 5 × BC KCTD1 (TG) 0.89 0.93 0.98 0.98 (Heart, 8.08 107) CS Scalp-ear-nipple syndrome × CASP8 (TG) 0.71 0.72 0.94 0.94 (Aorta, 8.27 108) Growth, Immune HCC, Glionitrin A × RA AIRE (TF) 0.54 0.61 0.84 0.84 (B cell, 3.31 1057) Immune APS1 × IBD LPP (TG) 0.98 0.94 0.99 0.99 (Monocyte, 6.28 1031) Cellular Acute myeloid leukemia × FOXP1 (TF) 0.84 0.78 0.95 0.95 (NK cell, 5.07 1035) Immune, Neuron Language impairment × CCND3 (TG) 0.81 0.89 0.95 0.95 (NK cell, 5.07 1035) Immune × HDL ALOX5 (TG) 0.97 0.97 0.99 0.99 (Monocyte, 4.75 1015) Immune, Metab. Atherosclerosis × GPAM (TG) 0.92 0.95 0.98 0.98 (Liver, 2.81 1021) Liver, Metab. × NR0B2 (TG) 0.84 0.93 0.98 0.98 (Liver, 2.81 1021) Growth, Metab. Early-onset obesity × LDL CERS2 (TG) 0.99 0.99 1.00 1.00 (NK cell, 5.18 1030) Liver, Metab. × ABCA1 (TG) 0.98 0.98 0.99 0.99 (Liver, 7.66 1027) Liver, Metab. Tangier disease, Probucol × ABCB11 (TG) 0.68 0.72 0.88 0.88 (Liver, 7.66 1027) Liver, Metab. Cholestasis BRI2, PFI2 × DLG4 (TG) 0.69 0.59 0.85 0.85 (NK cell, 5.18 1030) Metab., NS Tat-NR2B9c × SOX17 (TF) 0.52 0.65 0.82 0.84 (CD8, 5.86 1028) Liver, Metab. Vesicoureteral reflux 3 × CAD TGFB1 (TG) 0.92 0.99 0.99 0.99 (Adipose, 1.67 1029)CS, Growth Camurati-Engelmann disease × FN1 (TG) 0.58 0.79 0.91 0.92 (GEJ, 9.78 1028) CS, Metab. GFND2, SMDCF × CDH13 (TG) 0.31 0.55 0.77 0.82 (Heart, 1.93 1028) CS, Metab. × EDNRA (TG) 0.57 0.79 0.80 0.82 (Aorta, 1.09 1027) CS, Muscle Ambrisentan, Macitentan × AF SCN5A (TG) 0.87 0.92 1.00 1.00 (Heart, 6.89 1012)CS, Muscle Brugada syndrome 1, FAF 10 × ENPEP (TG) 0.50 0.76 0.92 0.94 (Uterus, 2.71 1011) QGC-001 × ATXN1 (TG) 0.45 0.62 0.90 0.90 (Colon, 7.54 1014) Muscle, NS Spinocerebellar ataxia 1 × MYOT (TG) 0.55 0.66 0.86 0.87 (Muscle, 8.55 1014) Spheroid body myopathy, MFM3 × SCZ FOXP1 (TF) 1.00 1.00 1.00 1.00 (Colon, 1.20 10144) Growth, Neuron Mental retardation × BCL11A (TG) 1.00 1.00 1.00 1.00 (Spleen, 1.44 10141) Immune, NS Dias-Logan syndrome × SLC25A12 (TG) 0.79 0.81 0.88 0.88 (Muscle, 4.99 10127) Neuron, NS EIEE39 × NEU TCF4 (TF) 0.72 0.88 0.95 0.95 (CD8, 3.66 1020) Immune, NS Pitt-Hopkins syndrome × RAPSN (TG) 0.77 0.88 0.93 0.93 (Muscle, 8.20 1017) Muscle, NS CMS11 × MEF2C (TF) 0.15 0.40 0.83 0.83 (Ileum, 8.56 1022) Growth, Neuron Mental retardation 20 × SNCA (TG) 0.15 0.32 0.78 0.79 (Putamen, 2.12 1019) Neuron, NS DLB, Parkinson 1, 4, BIIB054 × PAX6 (TF) 0.10 0.22 0.62 0.64 (Putamen, 2.12 1019)NS, Vision Optic nerve hypoplasia × PCLO (TG) 0.06 0.17 0.63 0.63 (Ileum, 8.56 1022) Growth, NS Pontocerebellar hypoplasia 3 × TABLE 1 Examples of RSS-NET highlighted genes that were not reported in GWAS of the same data (p 5 10 8) but were implicated in later GWAS with increased sample sizes (p 5 10 8). ≥ × − < × − The “mouse trait” column is based on the Mouse Genome Informatics 43. The “therapeutic/clinical evidence” column is based on the Online Mendelian Inheritance in Man 44 and Therapeutic Target Database 48. Click blue links to view details online. Drugs are highlighted in yellow . Abbreviations of GWAS traits are defined in Supplementary Table 2. GEJ: gastroesophageal junction; CS: cardiovascular system; DS: digestive/alimentary system; Metab.: metabolism; NS: nervous system. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

16 / 29

315 potentially real we exploited 15 analyzed traits that each had an updated

316 GWAS with larger sample size. In each case we obtained newly mapped 47 317 genes from the GWAS Catalog and computed the proportion of these genes bma 318 that were identified by RSS-NET (P1 0.9). The overlap proportions re- ≥ 5 319 mained significant (median: 12%; median Fisher exact p 1.93 10− ; Sup- = × 320 plementary Table 10), showing the potential of RSS-NET to identify trait-

321 associated genes that can be validated by later GWAS with additional sam-

322 ples. Among these validated genes, many are strongly supported by multiple 7 323 lines of external evidence. A particular example is NR0B2, a liver-active 21 base net 324 gene prioritized for HDL (BF 2.81 10 , P1 0.84, P1 0.98), which = ×49 = = 325 was not identified by standard GWAS of the same data (minimum single- 7 326 SNP p 1.4 10− within 100kb, n 99,900). NR0B2 is associated with var- = × 50–52 = 53 327 ious mouse lipid traits and human obesity , and was later identified 54 16 328 in a GWAS of HDL with larger sample size (p 9.7 10− , n 187,056). = × = 329 Table1 lists additional examples.

330 Biological and clinical relevance of prioritized genes. Despite sig-

331 nificant overlaps with GWAS-implicated genes, a large fraction of RSS-NET bma 332 prioritized genes (P 0.9) were not identified by GWAS (median: 70%; 1 ≥ 333 Supplementary Table 10). To systematically assess their relevance, we cross-

334 referenced these genes with multiple orthogonal databases.

335 Mouse phenomics provides important resources to study genetics of human 55 336 traits . Here we evaluated overlap between RSS-NET prioritized genes and 43 337 genes implicated in 27 categories of knockout mouse phenotypes . Network- bma 338 informed genes (P 0.9) were significantly enriched in 128 mouse-human 1 ≥ 339 trait pairs (FDR 0.1; Supplementary Table 12). Fewer significant pairs were ≤ near base 340 identified without network information (119 for P 0.9; 80 for P 0.9). 1 ≥ 1 ≥ 341 For many human traits, top enrichments of network-prioritized genes oc-

342 curred in closely related mouse phenotypes (Fig. 5e). Schizophrenia-associated

343 genes were strongly enriched in nervous, neurological and growth phenotypes

344 (OR: 1.77 2.04). Genes prioritized for autoimmune diseases were strongly − 345 enriched in immune and hematopoietic phenotypes (OR: 2.05 2.35). The − 346 cardiovascular system showed strong enrichments of genes associated with

347 heart conditions (OR: 2.45 2.92). The biliary system showed strong enrich- − 348 ments of genes associated with lipids, BMI, CAD and T2D (OR: 2.16 10.78). − 349 The cross-species phenotypically matched enrichments strengthen the bio-

350 logical relevance of RSS-NET results.

351 Mendelian disease-causing genes have been recognized as an vital con- 56,57 352 tributor to complex traits . Here we quantified overlap between RSS- 45 353 NET prioritized genes and genes causing 19 categories of Mendelian disor- bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

17 / 29

base near bma net Trait Gene (Role) P1 P1 P1 P1 (Network, BF) Mouse trait Therapeutic/clinical evidence BMI NEXN (TG) 0.71 0.79 0.89 0.90 (Muscle, 9.31 1012) CS, Muscle Cardiomyopathy D1CC, H20 × CDX2 (TF) 0.61 0.70 0.83 0.86 (NK cell, 3.95 1013) DS, Growth × WAIST BSCL2 (TG) 0.80 0.68 0.87 0.87 (Esophagus, 6.78 10239) Adipose, Growth Lipodystrophy CG2 × FOXP2 (TF) 0.56 0.59 0.73 0.73 (Esophagus, 6.78 10239) Growth, NS Speech-language disorder 1 × BC ADSL (TG) 0.76 0.80 0.91 0.92 (Aorta, 8.27 108) CS, Eye Adenylosuccinase deficiency × SYNE1 (TG) 0.57 0.63 0.89 0.90 (Esophagus, 6.30 107) Growth, Muscle AMCM, EDMD4, SCAR8 × RA TAL1 (TF) 0.71 0.79 0.91 0.93 (CD4, 3.02 1052) Immune, Tumor Acute lymphocytic leukemia × FHIT (TG) 0.30 0.60 0.90 0.91 (CD4, 3.02 1052) Immune, Tumor × FLT3 (TG) 0.33 0.57 0.73 0.73 (B cell, 3.31 1057) Immune, Tumor Acute myeloid leukemia × IBD FHIT (TG) 0.63 0.87 0.95 0.95 (CD4, 5.32 1033) Immune, Tumor × GATA3 (TF) 0.85 0.83 0.94 0.94 (NK cell, 5.07 1035) Immune, Renal Barakat syndrome × RORA (TF) 0.66 0.78 0.87 0.90 (B cell, 1.49 1032) Immune, NS IDDECA × NFKB2 (TF) 0.74 0.85 0.84 0.88 (B cell, 1.49 1032) Immune CVID10, DIMS-0150 × LRBA (TG) 0.42 0.58 0.72 0.72 (NK cell, 5.07 1035) Immune Immunodeficiency CV8 × DOCK2 (TG) 0.38 0.53 0.71 0.71 (NK cell, 5.07 1035) Immune Immunodeficiency 40 × HDL MT1G (TG) 0.10 0.09 0.98 0.98 (Liver, 2.81 1021) CS, Metab. × RETSAT (TG) 0.79 0.80 0.95 0.95 (Liver, 2.81 1021) Adipose, Metab. × ESR1 (TF) 0.77 0.82 0.95 0.95 (Liver, 2.81 1021) CS, Metab. Myocardial infarction × HCAR3 (TG) 0.85 0.85 0.92 0.92 (Monocyte, 4.75 1015) Metab. ARI-3037MO × TNNC1 (TG) 0.48 0.45 0.78 0.78 (Liver, 2.81 1021) CS, Muscle CMD1Z, CMH13, Levosimendan × LDL RAF1 (TG) 0.79 0.83 0.90 0.90 (Aorta, 3.71 1027)CS, Immune CMD1NN, Semapimod × APOA1 (TG) 0.70 0.76 0.90 0.90 (Liver, 7.66 1027) CS, Metab. Amyloidosis, HDL deficiency × ACADVL (TG) 0.69 0.59 0.85 0.85 (NK cell, 5.18 1030) Liver, Metab. VLCAD deficiency × T2D ITGB6 (TG) 0.75 0.99 0.99 0.99 (Ileum, 4.52 1062) Immune, Metab. AI1H × HR TKT (TG) 0.65 0.67 0.92 0.93 (Aorta, 2.43 107) CS, Growth SDDHD × CAD OSM (TG) 0.56 0.78 0.86 0.86 (Aorta, 1.09 1027) Immune, Metab. GSK2330811 × TRIB1 (TG) 0.43 0.68 0.85 0.85 (Adipose, 1.67 1029) Adipose, Metab. × TAB2 (TG) 0.19 0.43 0.61 0.61 (CD8, 1.13 1025) CS Congenital heart defects × AF TPMT (TG) 0.88 0.93 0.99 0.99 (Ileum, 4.43 1013) Metab. THPM1 × RUNX1 (TF) 0.44 0.60 0.88 0.89 (Heart, 2.15 1014) CS, Immune Acute myeloid leukemia, FPDMM × CSF3 (TG) 0.56 0.72 0.88 0.88 (Muscle, 8.55 1014) Blood, Immune Interleukin-3 × LOAD CASP2 (TG) 0.99 1.00 1.00 1.00 (CD8, 8.31 1026) Cellular, NS Caspase-2 × TTR (TG) 0.64 0.92 0.94 0.94 (Pancreas, 3.53 1020) Metab. FAP, Inotersen, Patisiran × SCZ RORA (TF) 1.00 1.00 1.00 1.00 (Cortex, 5.39 10128) Neuron, NS IDDECA × ERBB4 (TG) 1.00 1.00 1.00 1.00 (Putamen, 7.22 10116) Neuron, NS ALS19 × NFIB (TF) 0.97 0.97 0.98 0.98 (Cortex, 5.39 10128) NS MACID × GRIK2 (TG) 0.90 0.94 0.97 0.97 (Cerebellum, 3.15 10129) Neuron, NS Mental retardation 6 × SYT1 (TG) 0.84 0.89 0.93 0.93 (Cerebellum, 3.15 10129) Neuron, NS Baker-Gordon syndrome × ESR1 (TF) 0.80 0.84 0.93 0.93 (Colon, 1.07 10141) Neuron, NS Migraine × NTRK2 (TG) 0.78 0.84 0.91 0.91 (Cerebellum, 3.15 10129) Neuron, NS EIEE58 × LRRK2 (TG) 0.73 0.78 0.86 0.86 (Monocyte, 5.85 10131) Neuron, NS Parkinson 8, DNL151, DNL201 × C9orf72 (TG) 0.74 0.78 0.83 0.83 (Spleen, 1.44 10141) Neuron, NS FTDALS1 × SNCA (TG) 0.60 0.66 0.74 0.74 (Cerebellum, 3.15 10129) Neuron, NS DLB, Parkinson 1, 4 × NEU LMBRD1 (TG) 0.42 0.66 0.94 0.94 (Ileum, 8.56 1022) Metab. MAHCF × PRKCQ (TG) 0.36 0.56 0.90 0.91 (Spleen, 2.13 1019) Immune, NS × ATP1A2 (TG) 0.33 0.39 0.76 0.78 (Putamen, 2.12 1019) Neuron, NS AHC1, FHM2 × TABLE 2 Examples of RSS-NET highlighted genes that have not reached genome-wide significance in the GWAS Catalog 47 (p 5 10 8) at the time of analysis. The rest is the same as Table 1. ≥ × − bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

18 / 29

44 bma 354 ders . Leveraging regulatory networks (P1 0.9), we observed 47 signifi- ≥ near 355 cantly enriched Mendelian-complex trait pairs (FDR 0.1; 44 for P1 0.9; base ≤ ≥ 356 31 for P 0.9; Supplementary Table 13), among which the top-ranked 1 ≥ 357 ones were often phenotypically matched (Fig. 5f). Schizophrenia-associated

358 genes were strongly enriched in Mendelian development and psychiatric dis-

359 orders (OR: 2.22 2.23). Genes prioritized for atrial fibrillation and heart − 360 rate were strongly enriched in arrhythmia (OR: 7.16 8.28). Genes priori- − 361 tized for autoimmune diseases were strongly enriched in monogenic immune

362 dysregulation (OR: 3.11 4.32). Monogenic cardiovascular diseases showed − 363 strong enrichments of genes associated with lipids and heart conditions (OR:

364 2.69 3.70). We also identified pairs where Mendelian and complex traits − 365 seemed unrelated but were indeed linked. Examples included Alzheimer’s 40 366 disease with immune dysregulation (OR 7.32) and breast cancer with in- 58 = 367 sulin disorders (OR 9.71). The results corroborate that Mendelian and = 368 complex traits are not dichotomous, but rather exist on a continuum.

369 Human genetics has proven valuable in therapeutic development for vali- 59 60 370 dating molecular targets and improving success rates . To evaluate their

371 potential in drug discovery, we examined whether RSS-NET prioritized genes 48 372 are pharmacologically active targets with known clinical indications . We

373 identified genes with perfectly matched drug indications and GWAS traits.

374 The most illustrative identical match is EDNRA, a gene that is prioritized base net 375 for CAD (P 0.57, P 0.82 for aorta network), and is also a successful 1 = 1 = 376 target of approved drugs for cardiovascular diseases (Table 1). We identified

377 genes with closely related drug indications and GWAS traits. For example, base bma 378 gene TTR is prioritized for Alzheimer (P 0.64, P 0.94), and is also 1 = 1 = 379 a successful target of approved drugs for amyloidosis (Table 2). For early-

380 stage development, overlaps between drug indications and GWAS traits may

381 provide additional genetic confidence. For example, gene HCAR3 is priori- base bma 382 tized for HDL (P 0.85, P 0.92), and is also a clinical trial target 1 = 1 = 383 for lipid metabolism disorders (Table 2). Other examples include CASP8 with

384 cancer, NFKB2 with IBD, and DLG4 with stroke (Tables 1-2). We also found

385 mismatches between drug indications and GWAS traits, which could suggest 61 386 drug repurposing opportunities . For example, gene CSF3 is prioritized for base bma 387 AF (P 0.56, P 0.88), and is also a successful target of an approved 1 = 1 = 388 drug for aplastic anemia (AA). Since CSF3 is associated with various blood 62 63 389 cell traits in mouse and human , and inflammation plays a role in both 32,64 390 AA and AF etiology , it is tempting to assess effects of the approved AA

391 drug on AF. Overall, further evaluations are required to mechanistically un-

392 derstand the prioritized therapeutic genes, but the findings could be a useful

393 basis for future studies. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

19 / 29

394 DISCUSSION

395 We have presented RSS-NET, a new topology-aware method for integrative

396 analysis of regulatory networks and GWAS summary data. We have demon-

397 strated the improvement of RSS-NET over existing methods through a wide

398 variety of simulations, and illustrated its potential to yield novel insights via

399 extensive analyses of 38 networks and 18 traits. With multi-omics integra-

400 tion becoming a routine in modern GWAS, we expect that researchers will

401 find RSS-NET and its open-sourced software useful.

402 Compared with existing integrative approaches, RSS-NET has several key

403 strengths. First, unlike many methods that require loci passing a significance 9,14,65 404 threshold , RSS-NET uses data from genome-wide common variants.

405 This potentially allows RSS-NET to identify subtle enrichments even in stud-

406 ies with few significant hits. Second, RSS-NET models enrichments directly 2 407 as increased rates (θ) and sizes (σ ) of SNP-level associations, and thus by- 408 passes the issue of converting SNP-level GWAS summary data to gene-level 14,15,21 13 409 data . Third, RSS-NET inherits from RSS-E an important feature

410 that inferred enrichments automatically highlight which network genes are

411 most likely to be trait-associated. This prioritization component, though use- 10,12,15,19,22 412 ful, is missing in current polygenic analyses . Fourth, compared 13 413 with RSS-E , RSS-NET makes more flexible modeling assumptions, and

414 thus is more adaptive to unknown genetic and enrichment architectures.

415 RSS-NET provides a new view of complex trait genetics through the lens of 14–16,19,24 416 regulatory topology. Complementing previous connectivity analyses ,

417 RSS-NET highlights a consistent pattern where genetic signals of complex

418 traits often distribute across genome via the regulatory topology. RSS-NET

419 further leverages topology enrichments to enhance trait-associated gene dis-

420 covery. The topology awareness of RSS-NET relies on a novel model that de-

421 composes effect size of a single SNP into effects of multiple (cis or trans)

422 genes through a regulatory network. Other than similar perspective in a re- 26 423 cent theory paper , we are not aware of any published work implementing

424 and evaluating the topology-aware model in practice.

425 RSS-NET depends critically on the quality of input regulatory networks.

426 The more accurate networks are, the better performance RSS-NET achieves.

427 Currently our understanding of regulatory networks remains incomplete, 14–17 428 and most of available networks are algorithmically constructed . Artifact

429 nodes and edges of inferred networks can bias RSS-NET results; however our

430 simulations confirm the robustness of RSS-NET when input networks are

431 not severely deviated from ground truth. As more accurate networks become

432 available, the performance of RSS-NET will be markedly enhanced. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

20 / 29

433 Like any method, RSS-NET has several limitations in its current form.

434 First, despite its prioritization feature, RSS-NET does not attempt to pin-

435 point associations to single causal variants within prioritized loci. For this 66 436 task we recommend using off-the-shelf fine-mapping methods . Second, RSS-

437 NET analyzes a single network at a time. Since a complex disease typically

438 manifests in various sites, multiple cellular networks are likely to mediate

439 disease risk jointly. To extend RSS-NET to incorporate multiple networks,

440 an intuitive idea would be representing the total effect of a SNP as an av-

441 erage of its effect size in each network, weighted by network relevance for a

442 disease. Third, RSS-NET does not leverage known genomic annotations, ei- 10,19,22 11–13 443 ther at the level of SNPs or genes . Although our mis-specification

444 simulations and near-gene control analyses have confirmed that RSS-NET is

445 robust to generic enrichments of known features, accounting for known anno- 19 446 tations can help interpret observed network enrichments . Our preliminary

447 experiments, however, showed that incorporating additional networks or an-

448 notations in RSS-NET increased computation costs. Hence, we view the de-

449 velopment of more efficient multi-network, multi-annotation methods as an

450 important direction for future work.

451 METHODS

452 Gene and SNP information. This study used genes and SNPs from the

453 assembly GRCh37. This study used 18,334 protein-coding au- 454 tosomal genes (http://ftp.ensembl.org/pub/grch37/release-94/gtf/homo_ 455 sapiens, accessed January 3, 2019). Simulations used 348,965 genome-wide 23 456 SNPs (https://www.wtccc.org.uk), and data analyses used 1,289,786 genome- 25 457 wide HapMap3 SNPs (https://data.broadinstitute.org/alkesgroup/ 458 LDSCORE/w_hm3.snplist.bz2, accessed November 27, 2018). This study also 459 excluded SNPs on sex , SNPs with minor allele frequency less

460 than 1%, and SNPs in the human leukocyte antigen region.

461 GWAS summary statistics and LD estimates. The GWAS summary 49,67–79 80 462 statistics (Supplementary Table 2) and LD estimates used in the 13 463 present study were processed in the same way as those in our previous work .

464 Data download links are provided in Supplementary Notes.

465 Gene regulatory networks. We inferred 38 regulatory networks from

466 paired high-throughput sequencing data of gene expression (e.g., RNA-seq)

467 and chromatin accessibility (e.g., DNAse-seq or ATAC-seq), using a regression- 17 468 based method . We first constructed an “omnibus” network from paired data bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

21 / 29

469 of all available biological samples, and then reorganized this “omnibus” net- 81 470 work in light of regulatory elements (REs) identified in each context to

471 generate context-specific networks for 5 immune cell types, 5 brain regions

472 and 27 non-brain tissues. The network-construction software is available at 473 https://github.com/suwonglab/PECA. The 38 networks are available at https: 474 //github.com/suwonglab/rss-net. 475 For simplicity we formulate a regulatory network as a bipartite graph 476 {VTF,VTG,ETF TG}, where VTF denotes the node set of transcription factors → 477 (TFs), VTG denotes the node set of target genes (TGs), and ETF TG denotes → 478 the set of directed TF-to-TG edges, summarizing how TFs regulate TGs through

479 REs (but not vice versa). Each edge has a weight between 0 and 1, measuring

480 the relative regulation strength of a TF on a TG. Each network file contains a

481 list of REs, TFs, TGs, TF-to-TG edges and weights. On average each network

482 has 431 TFs, 3,298 TGs and 93,764 TF-to-TG edges. Additional information

483 of networks is provided in Supplementary Notes and Tables 14-16.

484 The sequencing data used for network construction were obtained from 5,82 485 ENCODE data portal (https://www.encodeproject.org, accessed De- 7 486 cember 14, 2018) and GTEx data portal (https://gtexportal.org, accessed 487 July 13, 2019). Details are provided in Supplementary Table 1.

488 External databases for cross-reference. To validate and interpret

489 RSS-NET gene-based results, we used the following external databases (ac- 47 490 cessed November 28, 2019): GWAS Catalog (https://www.ebi.ac.uk/gwas/), 43 491 Mouse Genome Informatics (http://www.informatics.jax.org/), phenotype- 45 492 specific Mendelian gene sets (https://github.com/bogdanlab/gene_sets/), 44 493 Online Mendelian Inheritance in Man (https://www.omim.org/), Thera- 48 494 peutic Target Database (http://db.idrblab.net/ttd/).

495 Network-induced effect size distribution. We model the total effect 496 of SNP j on a given trait, β j, as 2 (1) β j π j N (µ j, σ ) (1 π j) δ0, ∼ · 0 + − · 497 where π j denotes the probability that SNP j is associated with the trait (β j 2 6= 498 0), {µ j,σ0} characterize the center and variability of a trait-associated SNP 499 j’s effect size, and δ0 indicates point mass at zero (β j 0). = 500 We model the trait-association probability π j as

µ π j ¶ (2) log10 θ0 a j θ, 1 π j = + · − 501 where θ0 0 captures the genome-wide background proportion of trait-associated < 502 SNPs, θ 0 reflects the increase in probability, on the log10-odds scale, that > bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

22 / 29

503 a SNP inside a network is trait-associated, and a j indicates whether SNP j 12,13,19 504 is inside a network. Following previous analyses , we let a j 1 if SNP = 505 j is within 100 kb of any element (TG, RE, or TF) in a given network. The

506 idea of (2) is that if a tissue or cell type plays an important role in a trait

507 then genetic associations may tend to occur more often in SNPs involved in

508 the network of this context than expected by chance. 509 We model the mean effect size µ j as X (3) µ j w jg γ jg = · g O j ∈ 510 where O j is the set of all nearby or distal genes contributing to the total 511 effect of SNP j, w jg measures the relevance between SNP j and gene g, and 512 γ jg denotes the effect of SNP j on a trait due to gene g. In this study we use 513 a pre-defined regulatory network to specify {O j,w jg}: X X (4) µ j [c jg (γ jg vgt γ jt)], = · + · g G j t Tg ∈ ∈ | {z } | {z } cis trans

514 where G j is the set of all genes within 1 Mb cis window of SNP j, c jg mea- 515 sures the relative impact of a cis SNP j on gene g, Tg is the set of all genes 516 that are directly regulated by TF g in trans in a given network (Tg if gene = ; 517 g is not a TF), and vgt measures the relative impact of a TF g on its target 17 518 gene t. We use pre-defined network edges and weights to specify the trans 519 interconnection Tg and impact vgt respectively. We use context-matching cis- 7,8,83 520 eQTL data to specify the cis impact c jg; see Supplementary Notes and 521 Tables 17-18 for details. The idea of (3)-(4) is that the true effect of a SNP

522 may fan out through some regulatory network of multiple (nearby or distal) 24,26 523 genes to affect the trait . 524 We model the random effect of SNP j due to gene g, γ jg, as

i.i.d. 2 (5) γ jg N (0,σ ), ∼ 525 where the SNP-level subscript j in γ jg ensures the exchangeability of β j in 2 526 (1); see Supplementary Notes. The constant variance σ in (5) is chosen for 2 527 computational convenience. (One could potentially improve (5) by letting σ 10,22 528 depend on functional annotations of SNP j and/or context-specific ex- 11–13 529 pression of gene g, though possibly at higher computational cost.)

530 Bayesian hierarchical modeling. Consider a GWAS with n unrelated

531 individuals measured on p SNPs. In practice we do not know the true SNP- 532 level effects β : (β1,...,βp)0 in (1), but we can infer them from GWAS sum- = 533 mary statistics and LD estimates. Specifically, we perform Bayesian inference bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

23 / 29

534 for β by combining the network-based effect size prior (1)–(5) with the “Re- 20 535 gression with Summary Statistics” (RSS) likelihood :

1 (6) βb N (SbRb Sb− β, SbRb Sb), ∼ ˆ ˆ 536 where βb : (β1,...,βp)0 is a p 1 vector, Sb : diag(bs) is a p p diagonal matrix = × = ˆ × 537 with diagonal elements being s : (sˆ1,..., sˆp)0, β j and sˆ j are estimated single- b = 538 SNP effect size of each SNP j and its standard error from the GWAS, and 539 Rb is the p p LD matrix estimated from an external reference panel with × 540 ancestry matching the GWAS.

541 RSS-NET, defined by the hierarchical model (1)–(6), consists of four un- 2 2 542 known hyper-parameters: {θ0,θ,σ0,σ }. To specify hyper-priors, we first in- 2 2 543 troduce two free parameters {η,ρ} [0,1] to re-parameterize {σ ,σ }: ∈ 0 Ã ! 1 Ã P 2 ! 1 p − p π j w − 2 X π j 2 X g O j jg (7) σ η (1 ρ) , σ η ρ · ∈ , 0 = · − · 2 = · · 2 j 1 nsˆ j j 1 nsˆ j = =

544 where, roughly, η represents the proportion of the total phenotypic variation 545 explained by p SNPs, and ρ represents the proportion of total genetic varia- 2 546 tion explained by network annotations {O j,w jg}. Because nsˆ j is roughly the 547 ratio of phenotype variance to genotype variance, (7) ensures that genetic ef- 548 fect sizes (β) do not rely on sample size n, and have the same measurement 549 unit as the trait. See Supplementary Notes for the derivation of (7). We then 550 place independent uniform grid priors on {θ0,θ,η,ρ} (Supplementary Table 551 19). We verify that RSS-NET results are robust to grid choice (Supplementary 552 Fig. 8). (If one had specific information about {θ0,θ,η,ρ} in a given setting 553 then this could be incorporated here.)

554 Network enrichment. To assess whether a regulatory network is en-

555 riched for genetic associations with a trait, we evaluate a Bayes factor (BF):

p(βb Sb,Rb ,a,O,W, M1) (8) BF | , = p(βb Sb,Rb ,a,O,W, M0) | 556 where p( ) denotes probability densities, a is defined in (2), {O,W} are defined · 2 557 in (3), M1 denotes the enrichment model where θ 0 or σ 0, and M0 de- 2 > > 558 notes the baseline model where θ 0 and σ 0. The observed data are BF = = 559 times more likely under M1 than under M0, and so the larger the BF, the 560 stronger evidence for network enrichment. See Supplementary Notes for de- 561 tails of computing BF. To compute BFs used in Figure 5b, we replace M1 in (8) 562 with three restricted enrichment models (M11, M12, M13). Unless otherwise 563 specified, all BFs reported in this work are based on M1. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

24 / 29

564 Locus association. To identify association between a locus and a trait, 565 we compute P1, the posterior probability that at least one SNP in the locus is 566 associated with the trait:

(9) P1 1 Pr(β j 0, j locus D,model), = − = ∀ ∈ |

567 where D is a shorthand for the input data of RSS-NET including GWAS sum- 568 mary statistics {βb,Sb}, LD estimates Rb and network annotations {a,O,W}. See base near 569 Supplementary Notes for details of computing P1. For a locus, P1 , P1 net 570 and P1 correspond to P1 evaluated under the baseline model M0, the en- 571 richment model M1 for the near-gene control network with all genes as nodes 572 and no edges, and M1 for a given network. In this study a locus is defined as 573 the transcribed region of a gene plus 100 kb upstream and downstream.

574 For K networks with enrichments stronger than the near-gene control, we bma 575 use Bayesian model averaging (BMA) to compute P1 for each locus:

PK Pnet(k) BF(k) bma k 1 1 (10) P = · , 1 = PK k 1 BF(k) = net 576 where P (k) and BF(k) are enrichment P1 and BF for network k 1,...,K. 1 = 577 The ability to average across models in (10) is an advantage the Bayesian

578 approach, because it allows us to assess associations in light of the network

579 enrichment without having to select a single enrichment model.

580 Computation time. The total computational time of RSS-NET to ana-

581 lyze a pair of trait and network is determined by the number of SNPs ana-

582 lyzed, the size of hyper-parameter grid, and the number of variational itera-

583 tions till convergence, all of which can vary considerably among studies. It is

584 thus hard to make general statements about computational time. However,

585 to give a specific example, we finished the analysis of 1.1 million HapMap3

586 SNPs and liver network for HDL within 12 hours in a standard computer

587 cluster (60 nodes, 8 CPUs and 32 Gb memory per node).

588 Simulation overview. Using genotypes of 348,965 genome-wide autoso- 23 589 mal SNPs from 1,458 individuals , we simulated enrichment datasets under 5,17,82 590 M1 for the B cell regulatory network (Fig.s2-4; Supplementary Fig.s 1- 591 6), and simulated baseline datasets in the following scenarios: (1) M0 (Figs.2, 592 4; Supplementary Fig.s 1, 5, 6); (2) random near-gene SNPs were enriched for

593 associations (Fig3a; Supplementary Fig. 2); (3) random near-RE SNPs were

594 enriched for associations (Fig 3b; Supplementary Fig. 3); (4) edge-altered

595 B cell networks were enriched for associations (Fig 3c; Supplementary Fig. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

25 / 29

596 4). We matched enrichment and baseline datasets by the number of trait-

597 associated SNPs and the proportion of phenotypic variation explained by all

598 SNPs. On the simulated datasets we assessed enrichments of the B cell net-

599 work (Fig.s 2-3; Supplementary Fig.s 1-4) and tested gene-based associations

600 (Fig. 4; Supplementary Fig. 5). The only exception is the noisy network sim-

601 ulations (Supplementary Fig. 6) where we analyzed random subsets of the B

602 cell network. Simulation details are provided in Supplementary Figures 1-6.

603 This study used the following software packages in simulations: RSS-E 604 (https://github.com/stephenslab/rss, accessed October 19, 2018), Pas- 605 cal (https://www2.unil.ch/cbg/index.php?title=Pascal, accessed Octo- 606 ber 5, 2017) and LDSC (version 1.0.0, https://github.com/bulik/ldsc, ac- 607 cessed November 27, 2018). See Supplementary Notes for details.

608 Code availability. The RSS-NET software is available at https://github. 609 com/suwonglab/rss-net. Tutorials of installing and using RSS-NET are pro- 610 vided in https://suwonglab.github.io/rss-net. Results of this study were 611 generated from MATLAB version 9.3.0.713579 (R2017b), on a Linux system

612 with Intel E5-2650V2 2.6 GHz and E5-2640V4 2.4 GHz processors. All other

613 codes are specified in Methods and Supplementary Notes.

614 Data availability. Network files used in this study are available at https: 615 //github.com/suwonglab/rss-net. Analysis results of this study are avail- 616 able at https://xiangzhu.github.io/rss-peca. All other data are speci- 617 fied in Methods and Supplementary Notes.

618 Author contributions. X.Z. and W.H.W. conceived the study. X.Z. devel-

619 oped the methods and implemented the software. X.Z. conducted the simula-

620 tion experiments. Z.D. provided the 38 regulatory networks. X.Z. performed

621 the data analyses. X.Z. prepared the supplementary materials and online re-

622 sources. X.Z. wrote the manuscript. X.Z. and W.H.W. revised the manuscript.

623 Acknowledgments. This study is supported by Stein Fellowship to X.Z.

624 and NIH grants P50HG007735 and R01HG010359 to W.H.W. This study uses

625 data generated by the Wellcome Trust Case Control Consortium, 1000 Genomes

626 Project, ENCODE Consortium, GTEx Project, DICE Project, eQTLGen Con-

627 sortium, and multiple GWAS consortia. We thank them for making their data

628 publicly available. Detailed acknowledgments are in Supplementary Notes.

629 References.

630 [1] Visscher, P. M. et al. 10 years of GWAS discovery: biology, function, and translation. The 631 American Journal of Human Genetics 101, 5–22 (2017). bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

26 / 29

632 [2] Tam, V. et al. Benefits and limitations of genome-wide association studies. Nature Re- 633 views Genetics 20, 467–484 (2019). 634 [3] Manolio, T. et al. Finding the missing heritability of complex diseases. Nature 461, 635 747–753 (2009). 636 [4] Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide as- 637 sociation loci for human diseases and traits. Proceedings of the National Academy of 638 Sciences 106, 9362–9367 (2009). 639 [5] ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the hu- 640 man genome. Nature 489, 57–74 (2012). 641 [6] Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 642 317–330 (2015). 643 [7] GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 644 550, 204–213 (2017). 645 [8] Schmiedel, B. J. et al. Impact of genetic polymorphisms on human immune cell gene 646 expression. Cell 175, 1701–1715 (2018). 647 [9] Trynka, G. et al. Chromatin marks identify critical cell types for fine mapping complex 648 trait variants. Nature Genetics 45, 124–130 (2013). 649 [10] Finucane, H. K. et al. Partitioning heritability by functional annotation using genome- 650 wide association summary statistics. Nature Genetics 47, 1228–1235 (2015). 651 [11] Calderon, D. et al. Inferring relevant cell types for complex traits by using single-cell 652 gene expression. American Journal of Human Genetics 101, 686–699 (2017). 653 [12] Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies 654 disease-relevant tissues and cell types. Nature Genetics 50, 621–629 (2018). 655 [13] Zhu, X. & Stephens, M. Large-scale genome-wide enrichment analyses identify new 656 trait-associated genes and pathways across 31 human phenotypes. Nature Communica- 657 tions 9, 4361 (2018). 658 [14] Greene, C. S. et al. Understanding multicellular function and disease with human tissue- 659 specific networks. Nature Genetics 47, 569–576 (2015). 660 [15] Marbach, D. et al. Tissue-specific regulatory circuits reveal variable modular perturba- 661 tions across complex diseases. Nature Methods 13, 366–370 (2016). 662 [16] Sonawane, A. R. et al. Understanding tissue-specific gene regulation. Cell Reports 21, 663 1077–1088 (2017). 664 [17] Duren, Z., Chen, X., Jiang, R., Wang, Y. & Wong, W. H. Modeling gene regulation from 665 paired expression and chromatin accessibility data. Proceedings of the National Academy 666 of Sciences 114, E4914–E4923 (2017). 667 [18] Califano, A., Butte, A. J., Friend, S., Ideker, T. & Schadt, E. Leveraging models of cell 668 regulation and GWAS data in integrative network-based association studies. Nature 669 Genetics 44, 841–847 (2012). 670 [19] Kim, S. S. et al. Genes with high network connectivity are enriched for disease heritabil- 671 ity. American Journal of Human Genetics 104, 896–913 (2019). 672 [20] Zhu, X. & Stephens, M. Bayesian large-scale multiple regression with summary statis- 673 tics from genome-wide association studies. Annals of Applied Statistics 11, 1561–1592 674 (2017). 675 [21] Lamparter, D., Marbach, D., Rueedi, R., Kutalik, Z. & Bergmann, S. Fast and rigorous 676 computation of gene and pathway scores from SNP-based summary statistics. PLoS 677 Computational Biology 12, e1004714 (2016). 678 [22] Gazal, S. et al. Linkage disequilibrium–dependent architecture of human complex traits 679 shows action of negative selection. Nature Genetics 49, 1421–1427 (2017). 680 [23] Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 681 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678 (2007). bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

27 / 29

682 [24] Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from poly- 683 genic to omnigenic. Cell 169, 1177–1186 (2017). 684 [25] International HapMap 3 Consortium. Integrating common and rare genetic variation in 685 diverse human populations. Nature 467, 52–58 (2010). 686 [26] Liu, X., Li, Y. I. & Pritchard, J. K. Trans effects on gene expression can drive omnigenic 687 inheritance. Cell 177, 1022–1034 (2019). 688 [27] Lee, B.-C. et al. Adipose natural killer cells regulate adipose tissue macrophages to 689 promote insulin resistance in obesity. Cell Metabolism 23, 685–698 (2016). 690 [28] Hotamisligil, G. S. Inflammation, metaflammation and immunometabolic disorders. Na- 691 ture 542, 177–185 (2017). 692 [29] Oikonomou, E. K. & Antoniades, C. The role of adipose tissue in cardiovascular health 693 and disease. Nature Reviews Cardiology 16, 83–99 (2019). 694 [30] Kang, J. & Rivest, S. Lipid metabolism and neuroinflammation in Alzheimer’s disease: 695 a role for liver X receptors. Endocrine Reviews 33, 715–746 (2012). 696 [31] Shahid, F., Lip, G. Y. & Shantsila, E. Role of monocytes in heart failure and atrial fibril- 697 lation. Journal of the American Heart Association 7, e007849 (2018). 698 [32] Aviles, R. J. et al. Inflammation as a risk factor for atrial fibrillation. Circulation 108, 699 3006–3010 (2003). 700 [33] Cotsapas, C. et al. Pervasive sharing of genetic effects in autoimmune disease. PLoS 701 Genetics 7, e1002254 (2011). 702 [34] Lo, M. et al. Genome-wide analyses for personality traits identify six genomic loci and 703 show correlations with psychiatric disorders. Nature Genetics 49, 152–156 (2017). 704 [35] Ferrari, R. & Fox, K. Heart rate reduction in coronary artery disease and heart failure. 705 Nature Reviews Cardiology 13, 493–501 (2016). 706 [36] Dale, C. E. et al. Causal associations of adiposity and body fat distribution with coronary 707 heart disease, stroke subtypes, and type 2 diabetes mellitus: a Mendelian randomization 708 analysis. Circulation 135, 2373–2388 (2017). 709 [37] Odutayo, A. et al. Atrial fibrillation and risks of cardiovascular disease, renal disease, 710 and death: systematic review and meta-analysis. BMJ 354, i4482 (2016). 711 [38] Nelson, C. P. et al. Genetically determined height and coronary artery disease. New 712 England Journal of Medicine 372, 1608–1618 (2015). 713 [39] Di, G. P. & Kim, T. Linking lipids to Alzheimer’s disease: cholesterol and beyond. Nature 714 Reviews Neuroscience 12, 284–296 (2011). 715 [40] Heppner, F., Ransohoff, R. & Becher, B. Immune attack: the role of inflammation in 716 Alzheimer disease. Nature Reviews Neuroscience 16, 358–372 (2015). 717 [41] Pickrell, J. et al. Detection and interpretation of shared genetic influences on 42 human 718 traits. Nature Genetics 48, 709–717 (2016). 719 [42] Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and 720 traits. Nature Genetics 47, 1236–1241 (2015). 721 [43] Bult, C. et al. Mouse Genome Database (MGD) 2019. Nucleic Acids Research 47, D801– 722 D806 (2019). 723 [44] Amberger, J., Bocchini, C., Scott, A. & Hamosh, A. OMIM.org: leveraging knowledge 724 across phenotype-gene relationships. Nucleic Acids Research 47, D1038–D1043 (2019). 725 [45] Freund, M. K. et al. Phenotype-specific enrichment of Mendelian disorder genes near 726 GWAS regions across 62 complex traits. The American Journal of Human Genetics 103, 727 535–552 (2018). 728 [46] Noordam, R. et al. Multi-ancestry sleep-by-SNP interaction analysis in 126,926 indi- 729 viduals reveals lipid loci stratified by sleep duration. Nature Communications 10, 1–13 730 (2019). 731 [47] Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide associa- bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

28 / 29

732 tion studies, targeted arrays and summary statistics 2019. Nucleic Acids Research 47, 733 D1005–D1012 (2019). 734 [48] Wang, Y. et al. Therapeutic target database 2020: enriched resource for facilitating 735 research and early development of targeted therapeutics. Nucleic acids research 48, 736 D1031–D1041 (2020). 737 [49] Teslovich, T. M. et al. Biological, clinical and population relevance of 95 loci for blood 738 lipids. Nature 466, 707–713 (2010). 739 [50] Kerr, T. A. et al. Loss of nuclear receptor SHP impairs but does not eliminate negative 740 feedback regulation of bile acid synthesis. Developmental Cell 2, 713–720 (2002). 741 [51] Wang, L. et al. Redundant pathways for negative feedback regulation of bile acid pro- 742 duction. Developmental Cell 2, 721–731 (2002). 743 [52] Hartman, H. B., Lai, K. & Evans, M. J. Loss of small heterodimer partner expression in 744 the liver protects against dyslipidemia. Journal of Lipid Research 50, 193–203 (2009). 745 [53] Nishigori, H. et al. Mutations in the small heterodimer partner gene are associated with 746 mild obesity in Japanese subjects. Proceedings of the National Academy of Sciences 98, 747 575–580 (2001). 748 [54] Willer, C. et al. Discovery and refinement of loci associated with lipid levels. Nature 749 Genetics 45, 1274–1283 (2013). 750 [55] Brown, S. et al. High-throughput mouse phenomics for characterizing mammalian gene 751 function. Nature Reviews Genetics 19, 357–370 (2018). 752 [56] Lupski, J., Belmont, J., Boerwinkle, E. & Gibbs, R. Clan genomics and the complex 753 architecture of human disease. Cell 147, 32–43 (2011). 754 [57] Blair, D. R. et al. A nondegenerate code of deleterious variants in mendelian loci con- 755 tributes to complex disease risk. Cell 155, 70–80 (2013). 756 [58] Bruning, P. F. et al. Insulin resistance and breast-cancer risk. International Journal of 757 Cancer 52, 511–516 (1992). 758 [59] Plenge, R., Scolnick, E. & Altshuler, D. Validating therapeutic targets through human 759 genetics. Nature Reviews Drug Discovery 12, 581–594 (2013). 760 [60] Nelson, M. et al. The support of human genetic evidence for approved drug indications. 761 Nature Genetics 47, 856–860 (2015). 762 [61] Sanseau, P. et al. Use of genome-wide association studies for drug repositioning. Nature 763 Biotechnology 30, 317–320 (2012). 764 [62] Lieschke, G. et al. Mice lacking granulocyte colony-stimulating factor have chronic 765 neutropenia, granulocyte and macrophage progenitor cell deficiency, and impaired neu- 766 trophil mobilization. Blood 84, 1737–1746 (1994). 767 [63] Astle, W. J. et al. The allelic landscape of human blood cell trait variation and links to 768 common complex disease. Cell 167, 1415–1429 (2016). 769 [64] Barrett, A. & Sloand, E. Autoimmune mechanisms in the pathophysiology of myelodys- 770 plastic syndromes and their clinical relevance. Haematologica 94, 449–451 (2009). 771 [65] Wang, Q. et al. A Bayesian framework that integrates multi-omics data and gene net- 772 works predicts risk genes from schizophrenia GWAS data. Nature Neuroscience 22, 691– 773 699 (2019). 774 [66] Schaid, D., Chen, W. & Larson, N. From genome-wide associations to candidate causal 775 variants by statistical fine-mapping. Nature Reviews Genetics 19, 491–504 (2018). 776 [67] Christophersen, I. E. et al. Large-scale analyses of common and rare variants identify 777 12 new loci associated with atrial fibrillation. Nature Genetics 49, 946–952 (2017). 778 [68] Michailidou, K. et al. Large-scale genotyping identifies 41 new loci associated with breast 779 cancer risk. Nature Genetics 45, 353–61 (2013). 780 [69] Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity 781 biology. Nature 515, 197–206 (2015). bioRxiv preprint doi: https://doi.org/10.1101/2020.03.13.990010; this version posted March 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

29 / 29

782 [70] Nikpay, M. et al. A comprehensive 1000 genomes–based genome-wide association meta- 783 analysis of coronary artery disease. Nature Genetics 47, 1121–1130 (2015). 784 [71] Wood, A. R. et al. Defining the role of common variation in the genomic and biological 785 architecture of adult human height. Nature Genetics 46, 1173–1186 (2014). 786 [72] Den Hoed, M. et al. Identification of heart rate-associated loci and their effects on cardiac 787 conduction and rhythm disorders. Nature Genetics 45, 621–631 (2013). 788 [73] Liu, J. et al. Association analyses identify 38 susceptibility loci for inflammatory bowel 789 disease and highlight shared genetic risk across populations. Nature Genetics 47, 979– 790 986 (2015). 791 [74] Lambert, J.-C. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility 792 loci for Alzheimer’s disease. Nature Genetics 45, 1452–1458 (2013). 793 [75] Okbay, A. et al. Genetic variants associated with subjective well-being, depressive symp- 794 toms, and neuroticism identified through genome-wide analyses. Nature Genetics 48, 795 624–633 (2016). 796 [76] Okada, Y. et al. Genetics of rheumatoid arthritis contributes to biology and drug discov- 797 ery. Nature 506, 376–381 (2014). 798 [77] Ripke, S. et al. Biological insights from 108 schizophrenia-associated genetic loci. Nature 799 511, 421–427 (2014). 800 [78] Morris, A. P. et al. Large-scale association analysis provides insights into the genetic 801 architecture and pathophysiology of type 2 diabetes. Nature Genetics 44, 981–990 (2012). 802 [79] Shungin, D. et al. New genetic loci link adipose and insulin biology to body fat distribu- 803 tion. Nature 518, 187–196 (2015). 804 [80] 1000 Genomes Project Consortium. A global reference for human genetic variation. Na- 805 ture 526, 68–74 (2015). 806 [81] Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biology 9, R137 807 (2008). 808 [82] Luo, Y. et al. New developments on the Encyclopedia of DNA Elements (ENCODE) data 809 portal. Nucleic Acids Research 48, D882–D889 (2020). 810 [83] Võsa, U. et al. Unraveling the polygenic architecture of complex traits using blood eQTL 811 meta-analysis. bioRxiv 447367 (2018).

812 XIANG ZHU ZHANA DURENAND WING HUNG WONG DEPARTMENTOF STATISTICS DEPARTMENT OF STATISTICS STANFORD UNIVERSITY STANFORD UNIVERSITY 390 JANE STANFORD WAY 390 JANE STANFORD WAY STANFORD, CA 94305 STANFORD, CA 94305 USA USA AND DEPARTMENT OF BIOMEDICAL DATA SCIENCE STANFORD UNIVERSITY 1265 WELCH ROAD STANFORD, CA 94305 USA