bioRxiv preprint doi: https://doi.org/10.1101/597484; this version posted May 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 1 1 Estimation of Non-null SNP Effect Size Distributions Enables 2 the Detection of Enriched Genes Underlying Complex Traits 3 1,2 1,2 2-4 4 Wei Cheng , Sohini Ramachandran y, and Lorin Crawford y 5 1 Department of Ecology and Evolutionary Biology, Brown University, Providence, RI, 6 USA 7 2 Center for Computational Molecular Biology, Brown University, Providence, RI, USA 8 3 Department of Biostatistics, Brown University, Providence, RI, USA 9 4 Center for Statistical Sciences, Brown University, Providence, RI, USA 10 Corresponding E-mail:
[email protected]; lorin
[email protected] y 11 Abstract 12 Traditional univariate genome-wide association studies generate false positives and negatives due to 13 difficulties distinguishing associated variants from variants with spurious nonzero effects that do not 14 directly influence the trait. Recent efforts have been directed at identifying genes or signaling pathways 15 enriched for mutations in quantitative traits or case-control studies, but these can be computationally 16 costly and hampered by strict model assumptions. Here, we present gene-", a new approach for identifying 17 statistical associations between sets of variants and quantitative traits. Our key insight is that enrichment 18 studies on the gene-level are improved when we reformulate the genome-wide SNP-level null hypothesis 19 to identify spurious small-to-intermediate SNP effects and classify them as non-causal.