Using Genetic Risk Score Approaches to Infer Whether an Environmental

1 Using genetic risk score approaches to infer whether an

2 environmental factor attenuates or exacerbates the

3 adverse influence of a candidate gene

4 5 Wan-Yu Lin 1,2*, Yu-Shun Lin 1, Chang-Chuan Chan 2,3, Yu-Li Liu 4, Shih-Jen Tsai 5,6,7, 6 Po-Hsiu Kuo 1,2 7 8 1 Institute of Epidemiology and Preventive Medicine, College of Public Health, National 9 Taiwan University, Taipei, Taiwan 10 2 Department of Public Health, College of Public Health, National Taiwan University, 11 Taipei, Taiwan 12 3 Institute of Environmental and Occupational Health Sciences, College of Public Health, 13 National Taiwan University, Taipei, Taiwan 14 4 Center for Neuropsychiatric Research, National Health Research Institutes, Zhunan, 15 Taiwan 16 5 Department of Psychiatry, Taipei Veterans General Hospital, Taipei, Taiwan 17 6 Division of Psychiatry, National Yang-Ming University, Taipei, Taiwan 18 7 Institute of Brain Science, National Yang-Ming University, Taipei, Taiwan 19 20 21 Running title: GRS approaches for GxE 22 23 24 * Corresponding author:

25 Wan-Yu Lin, Ph.D. 26 Room 501, No. 17, Xu-Zhou Road, Taipei 100, Taiwan 27 Phone/Fax: +886-2-33668106; E-mail: [email protected] 28 29 30 31 32

33 Abstract 34 Some candidate genes have been robustly reported to be associated with complex traits, such

35 as the fat mass and obesity-associated (FTO) gene on body mass index, and the fibroblast growth

36 factor 5 (FGF5) gene on blood pressure levels. It is of interest to know whether an environmental

37 factor (E) can attenuate or exacerbate the adverse influence of a candidate gene. To this end, we

38 here evaluate the performance of “genetic risk score” (GRS) approaches to detect

39 “gene-environment interactions” (GxE). In the first stage, a GRS is calculated according to the

40 genotypes of variants in a candidate gene. In the second stage, we test whether E can significantly

41 modify this GRS effect. This two-stage procedure can not only provide a p-value for a GxE test

42 but also guide inferences on how E modifies the adverse effect of a gene. With systematic

43 simulations, we compared several ways to construct a GRS. If E exacerbates the adverse

44 influence of a gene, GRS formed by the elastic net (ENET) or the least absolute shrinkage and

45 selection operator (LASSO) is recommended. However, the performance of ENET or LASSO

46 will be compromised if E attenuates the adverse influence of a gene, and using the ridge

47 regression (RIDGE) can be more powerful in this situation. Applying RIDGE to 18,424 subjects

48 in the Taiwan Biobank, we showed that performing regular exercise can attenuate the adverse

49 influence of the FTO gene on 4 obesity measures: body mass index (p = 0.0009), body fat

50 percentage (p = 0.0031), waist circumference (p = 0.0052), and hip circumference (p = 0.0001).

51 As another example, we used RIDGE and found the FGF5 gene has a stronger effect on blood

52 pressure in Han Chinese with a higher waist-to-hip ratio (p = 0.0013 for diastolic blood pressure

53 and p = 0.0027 for systolic blood pressure). This study provides an evaluation on the GRS

54 approaches, which is important to infer whether E attenuates or exacerbates the adverse influence

55 of a candidate gene.

56 Keywords: body mass index; elastic net regression; gene-environment interaction; lasso; ridge

2 57 regression.

58 Introduction 59 The detection of “gene-environment interactions” (GxE) is important and is even more

60 challenging than the detection of main effects of genes (Aschard 2016; Greenland 1983; Hunter

61 2005). Although some gene-based GxE methods have been developed (Chen et al., 2014; Jiao et

62 al., 2013; Lin, Huang, et al., 2019; Lin et al., 2013; Lin et al., 2016), very few GxE findings have

0.05 63 reached the genome-wide significance level (i.e., p < = 2.5 × 10−6. Because there are 20000

64 ~20,000 genes across the genome, 2.5 × 10−6 is the commonly used significance level in

65 genome-wide gene-based analyses) (Chen et al., 2014; Lin, Huang, et al., 2019). Most evidence

66 of GxE was discovered by candidate gene analyses in which genome-wide association study

67 (GWAS) hits were targeted. For example, physical activity has been found to attenuate the

68 influence of the fat mass and obesity-associated (FTO) gene on obesity risk (Kilpelainen et al.,

69 2011; Vimaleswaran et al., 2009). This means that the association of the FTO risk alleles with

70 obesity measures is weaker in physically active subjects than in physically inactive subjects.

71 Another example is related to blood pressure levels. The fibroblast growth factor 5 (FGF5)

72 gene is associated with blood pressure levels in Han Chinese (Lin, Huang, et al., 2019; Liu et al.,

73 2011). Obesity has been reported to exacerbate the adverse influence of FGF5 on blood pressure.

74 That is, the association of the FGF5 risk alleles with blood pressure is stronger in obese subjects

75 than in lean subjects (Li et al., 2015).

76 A candidate gene usually harbors multiple risk variants. It is of interest to know whether an

77 environmental factor (E) attenuates or exacerbates the effects of these risk variants. However,

78 most GxE methods provide only p-values without any inference for the direction of interaction

79 (Chen et al., 2014; Lin, Huang, et al., 2019; Lin et al., 2013; Lin et al., 2016). In contrast, a

3 80 genetic risk score (GRS) approach aggregates the effects among an ensemble of single-nucleotide

81 polymorphisms (SNPs) and can indicate whether the GRS interacts synergistically or

82 antagonistically with E. A GRS is a linear combination of risk allele counts, where risk alleles

83 and weights are usually retrieved from large published GWASs or meta-analyses. However, the

84 vast majority of genetic studies have been performed on subjects of European ancestry (Sirugo et

85 al., 2019). Appropriate external GWAS results may not be available for GxE studies in other

86 ethnic populations.

87 To address this issue, GRS approaches using internal weights have been proposed for

88 pathway-based GxE studies (Huls, Ickstadt, et al., 2017; Huls, Kramer, et al., 2017) and GWASs

89 (Lin, Chan, et al., 2019; Lin et al., 2020; Lin et al., 2018). The internal weights come from

90 marginal effects of SNPs that can be estimated by a multivariate elastic net (ENET) regression

91 (Huls, Ickstadt, et al., 2017; Huls, Kramer, et al., 2017).

92 In this study, we described how to use GRS approaches to infer whether E attenuates or

93 exacerbates the adverse influence of a gene (therefore, GRS here aggregates the effects among an

94 ensemble of SNPs in a gene). Moreover, we compared GRS approaches with the “Set-Based

95 gene-EnviRonment InterAction test” (SBERIA) (Jiao et al., 2013), the “interaction Sequence

96 Kernel Association Test” (iSKAT) (Lin et al., 2016), and our previously developed “ADAptive

97 combination of Bayes Factors method” (ADABF) (Lin, Huang, et al., 2019). These methods were

98 then applied to the Taiwan Biobank (TWB) data.

99 Materials and Methods

100 Genetic risk score approaches

101 Usually, a GRS combines information of multiple nearly independent SNPs across the

4 102 genome (Ahmad et al., 2013; Lin, Chan, et al., 2019; Lin et al., 2020; Rask-Andersen et al.,

103 2017). However, there have been few applications of using GRS in gene-based GxE tests. To

104 have a clear description, we first explain covariate adjustment in the GRS gene-based GxE

105 approach. Suppose there are L SNPs in a gene or a pathway. Let 푔[∙] be the link function, 푌푖 be

106 the phenotype that can be either continuous or binary, 퐺푖푗 be the number of minor alleles (0, 1,

th 107 or 2, by additive genetic model) at the j SNP (j = 1, …, L), 퐸푖 be the environmental factor, 푿풊

108 be the vector of non-genetic covariates such as the age and sex, and the subscript “i” represents

109 data for the ith subject (푖 = 1, ⋯ , 푛). In addition to the additive model (by counting the number of

110 minor alleles), 퐺푖푗 can also be coded according to the dominant or recessive genetic model.

111 Some GRS approaches involve SNP selection and then aggregate the information of selected

112 SNPs. Therefore, we leave 퐺푖푗 (푗 = 1, … , 퐿) later and first regress 푌푖 on 푿풊 by a linear model

113 or a logistic regression model, as follows:

114 푔[퐸(푌푖)] = 훼0 + 휶′푿풊 , 푖 = 1, ⋯ , 푛 . (1)

̂ 푒푥푝(훼̂푘+휶̂′푿풊) 115 Let 휇̂0푖 = 훼̂0 + 휶′푿풊 (for continuous 푌푖) or 휇̂0푖 = (for binary 푌푖) be the 1+푒푥푝(훼̂푘+휶̂′푿풊)

th 116 predicted mean of 푌푖 under model (1). Therefore, the covariate-adjusted phenotype for the i

117 subject is 휀̂푖 = 푌푖 − 휇̂0푖.

118 Subsequently, we regress 휀̂푖 on 퐺푖1, ⋯ , 퐺푖퐿, as follows:

퐿 119 푔[퐸(휀̂푖)] = 훽0 + ∑푗=1 훽푗퐺푖푗 , 푖 = 1, ⋯ , 푛. (2)

120 Let 휷′ = [훽0 ⋯ 훽퐿] be the vector of regression coefficients in model (2). The ordinary least

121 squares (OLS) estimate of 휷 is as follows:

2 ̂ 푎푟푔푚푖푛 ∑푛 ∑퐿 ( )−1( ) 122 휷 = 휷 [ 푖=1(휀̂푖 − 훽0 − 푗=1 훽푗퐺푖푗) ] = 푮′푮 푮′휺̂ , (3)

th 123 where n is the sample size, 푮 is the n × (퐿 + 1) matrix with the i row of 1GGi1 ... iL  ,

5 124 and 휺̂ is the n-length vector of covariate-adjusted phenotypes. Because of linkage disequilibrium

125 (LD), the SNPs in a gene are usually highly correlated with each other. In this situation, 푮′푮

126 may be singular and not invertible.

127 Ridge regression (RIDGE) (Hoerl and Kennard 1970) is used to address the collinearity

128 problem in model (2), where the regression coefficients are estimated by minimizing the residual

129 sum of squares and an l2-norm penalty, as follows:

2 ̂ 푎푟푔푚푖푛 ∑푛 ∑퐿 ∑퐿 2 130 휷 = 휷 [ 푖=1(휀̂푖 − 훽0 − 푗=1 훽푗퐺푖푗) + 휆 푗=1 훽푗 ]. (4)

131 The regularization parameter 휆 controls the amount of shrinkage. When 휆 is close to 0, 휷̂

̂ ̂ ̂ 132 from RIDGE will approximate the 휷 from OLS. When 휆 is large, 훽1, ⋯ , 훽퐿 will approach 0,

133 and model (2) will be reduced to an intercept-only model.

134 LASSO was later proposed to estimate regression coefficients by minimizing the residual

135 sum of squares and an l1-norm penalty (Tibshirani 1996) as follows:

2 ̂ 푎푟푔푚푖푛 ∑푛 ∑퐿 ∑퐿 136 휷 = 휷 [ 푖=1(휀̂푖 − 훽0 − 푗=1 훽푗퐺푖푗) + 휆 푗=1|훽푗|]. (5)

137 The regularization parameter 휆 controls the amount of shrinkage. Moreover, variable selection

138 can be performed by shrinking some 훽푗푠 to 0 (푗 = 1, … , 퐿). Because equation (5) selects SNPs

139 that are marginally associated with the covariate-adjusted phenotype (휀̂푖, 푖 = 1, ⋯ , 푛), this is

140 called the marginal-association filtering by LASSO. The term “marginal” is used here because

141 the 퐸푖 and 퐺푖푗 × 퐸푖 interaction term have not been included in equation (5). If 훽푗 is shrunk to

142 0, the jth SNP is regarded as unassociated with the covariate-adjusted phenotype and will not be

143 used for the construction of a GRS.

144 ENET (Zou and Hastie 2005) strikes a balance between RIDGE and LASSO by estimating

145 regression coefficients while minimizing the residual sum of squares and a mixture of an l1-norm

146 and an l2-norm, as follows:

6 2 1 147 휷̂ = 푎푟푔푚푖푛 {∑푛 (휀̂ − 훽 − ∑퐿 훽 퐺 ) + 휆 ∑퐿 [ (1 − 훼)훽2 + 훼|훽 |]}. (6) 휷 푖=1 푖 0 푗=1 푗 푖푗 푗=1 2 푗 푗

148 The regularization parameter 휆 controls the amount of shrinkage, whereas 훼 is a penalty

149 weight ranging from 0 (RIDGE) to 1 (LASSO). As suggested by Hüls et al. (Huls, Kramer, et al.,

150 2017), α = 0.5 is used for ENET throughout this work to achieve an optimal balance between

151 RIDGE and LASSO. Equation (6) is the marginal-association filtering by ENET.

152 Similar to genetic studies using penalized regression approaches (Huls, Ickstadt, et al., 2017;

153 Huls, Kramer, et al., 2017; Waldmann et al., 2013), we used the R package “glmnet” (Friedman et

154 al., 2010) to obtain 휷̂. The optimal values of the regularization parameter 휆 in equations (4)-(6)

155 were determined by 10-fold cross-validation. GWASs (Waldmann et al., 2013) and

156 pathway-based GxE studies (Huls, Ickstadt, et al., 2017; Huls, Kramer, et al., 2017) have

157 recommended choosing the largest 휆 such that the mean squared error (MSE) is within 1

158 standard error of the minimum MSE, to avoid selecting too many SNPs in ENET or LASSO.

159 However, most complex traits are polygenic, and a single gene usually explains little phenotypic

160 variation. In our simulations and real data analyses, this criterion usually selected 0 SNPs for

161 ENET or LASSO. Therefore, in gene-based GxE studies, we recommend using the 휆 that leads

162 to the minimum MSE.

163 RIDGE, LASSO, and ENET are all techniques for regression models that suffer from

164 multicollinearity. Therefore, the pruning stage to remove SNPs with LD is not required here.

165 After obtaining 휷̂ from RIDGE, LASSO, or ENET, the GRS of the ith subject is constructed by

퐿 ̂ th th 166 퐺푅푆푖′ = ∑푗=1 훽푗퐺푖푗, where 퐺푖푗 is the number of minor alleles (0, 1, or 2) at the j SNP of the i

̂ 167 subject. A positive 훽푗 indicates that the minor allele is associated with an increase in phenotype.

168 Therefore, a subject with more copies of the minor allele (more phenotype-increasing alleles) will

̂ 169 obtain an increase in his/her 퐺푅푆′. In contrast, a negative 훽푗 indicates that the minor allele is

7 170 associated with a decrease in phenotype, and a subject with more copies of the minor allele (more

171 phenotype-decreasing alleles) will obtain a decrease in his/her 퐺푅푆′. The 퐺푅푆′ is then

172 transformed into a Z-score that represents how many standard deviations the 퐺푅푆′ is from the

173 mean. The standardized GRS Z-score is denoted as 퐺푅푆푖. A larger GRS is always associated with

174 an increased phenotype.

175 Afterwards, we fit the following generalized linear model (GLM):

176 푔[퐸(푌푖)] = 훾0 + 훾퐺퐺푅푆푖 + 훾퐸퐸푖 + 훾퐼푛푡퐺푅푆푖 × 퐸푖 + 휸푪′푿풊 , 푖 = 1, ⋯ , 푛, (7)

177 where 훾퐺 > 0 because a larger GRS is always associated with an increased phenotype. Adding a

178 constraint (훾퐺 > 0) is expected to improve power, although for simplicity we here perform a

179 GLM without this constraint. By testing 퐻0: 훾퐼푛푡 = 0 푣푠. 퐻1: 훾퐼푛푡 ≠ 0, we evaluate whether GxE

180 exists. For continuous traits, each 1 standard deviation (sd) increase in GRS is associated with a

181 훾̂퐼푛푡 change in phenotype for subjects exposed to E = 1 than for subjects exposed to E = 0. For

182 binary traits, each 1 sd increase in GRS is associated with an odds ratio of 푒푥푝(훾̂퐼푛푡) for

183 subjects exposed to E = 1 compared to subjects exposed to E = 0. A positive (negative) 훾̂퐼푛푡

184 indicates that E = 1, or a larger E, exacerbates (attenuates) the adverse influence of a candidate

185 gene.

186 The whole data can be used to estimate 훽1, ⋯ , 훽퐿 (equations 4-6) and then to test

̂ 187 퐻0: 훾퐼푛푡 = 0 푣푠. 퐻1: 훾퐼푛푡 ≠ 0 (model 7) without data splitting. Theoretically, 훽푗 (푗 = 1, ⋯ , 퐿)

188 and 훾̂퐼푛푡 are asymptotically independent under the null hypothesis of no SNP-by-environment

189 interaction, proved by corollary 1 of (Dai et al., 2012). A two-stage approach that first filters

190 SNPs by a criterion independent of the test statistic (훾̂퐼푛푡 estimated from model 7) under the null

191 hypothesis, and then only uses SNPs that pass the filter, can maintain type I error rates and boost

192 power (Bourgon et al., 2010; Frost et al., 2016). Empirically, our following simulation studies

8 193 confirmed that GRS using internal weights is a valid approach in the sense that type I error rates

194 match the nominal significance level. Moreover, studies on genes (Frost et al., 2016; Jiao et al.,

195 2013), pathways (Huls, Ickstadt, et al., 2017; Huls, Kramer, et al., 2017), and GWASs (Lin et al.,

196 2018) have presented the validity of using internally weighted GRSs to test for G×E.

197 Competing methods

198 The abovementioned GRS approaches with some form of penalized regression (RIDGE,

199 LASSO, or ENET) were compared with the SBERIA (Jiao et al., 2013), iSKAT (Lin et al., 2016),

200 and ADABF (Lin, Huang, et al., 2019) methods.

201 SBERIA is the first gene-based G×E test based on the GRS concept. The phenotype is first

202 regressed on each SNP separately, while adjusting for covariates such as sex, age, and ancestry

203 principal components (PCs), as follows:

′ 204 푔[퐸(푌푖)] = 훽0 + 훽푗퐺푖푗 + 휷푪푿풊 , 푖 = 1, ⋯ , 푛; 푗 = 1, ⋯ , 퐿. (8)

̂ th 205 Let 훽푗 be the estimated regression coefficient of the j SNP and 푝푗 be the p-value of

th 206 testing 퐻0: 훽푗 = 0 푣푠. 퐻1: 훽푗 ≠ 0. The GRS of the i subject is constructed by 퐺푅푆푖′ =

퐿 ̂ 207 ∑푗=1[퐼(푝푗 < 0.1)푠푖푔푛(훽푗) + 휈]퐺푖푗, where 퐼(푝푗 < 0.1) is an indicator variable with a value of 1

̂ ̂ 208 if 푝푗 < 0.1 and 0 otherwise, 푠푖푔푛(훽푗) is either 1 or −1 depending on the sign of 훽푗, and 휈

209 is a very small value (e.g., 0.0001). Therefore, SBERIA GRS includes both SNP selection (only

210 SNPs marginally associated with the phenotype are used to construct the GRS) and SNP

211 weighting (only the directions of how SNPs influence the phenotype are used in the GRS). The

212 standardized GRS Z-score is denoted as 퐺푅푆푖. Then, the following GLM is fitted:

[ ( )] ∑퐿 213 푔 퐸 푌푖 = 훾0 + 푗=1 훾퐺푗퐺푖푗 + 훾퐸퐸푖 + 훾퐼푛푡퐺푅푆푖 × 퐸푖 + 휸푪′푿풊 , 푖 = 1, ⋯ , 푛; 푗 = 1, ⋯ , 퐿. (9)

214 By testing 퐻0: 훾퐼푛푡 = 0 푣푠. 퐻1: 훾퐼푛푡 ≠ 0, we can evaluate whether GxE exists (Jiao et al., 2013).

215 There are two fundamental differences between SBERIA and ENET (or LASSO). First, in

9 216 the marginal-association filtering stage, SBERIA fits L regression models respectively (equation

217 8), whereas ENET and LASSO fit a multivariate model incorporating L SNPs simultaneously

218 (equations 5-6). Second, in the model for testing GRS×E, SBERIA includes the main effects of

∑퐿 219 all the L SNPs ( 푗=1 훾퐺푗퐺푖푗 in equation 9), whereas ENET and LASSO incorporate only a GRS

220 term as the aggregated genetic main effects (훾퐺퐺푅푆푖 in equation 7).

221 We compared the abovementioned GRS tests with iSKAT (Lin et al., 2016) and ADABF

222 (Lin, Huang, et al., 2019). In iSKAT, the following model is considered:

[ ( )] ∑퐿 ∑퐿 223 푔 퐸 푌푖 = 훿0 + 푗=1 훿퐺푗퐺푖푗 + 훿퐸퐸푖 + 푗=1 훿퐼푛푡푗퐺푖푗퐸푖 + 휹푪′푿풊 , 푖 = 1, ⋯ , 푛, (10)

th ( ) 224 where 훿퐼푛푡푗 is the interaction effect between the j SNP and E. Assuming 훿퐼푛푡푗푠 푗 = 1, ⋯ , 퐿

225 follow a distribution with a mean of 0 and a variance of 휏, the null hypothesis of all 훿퐼푛푡푗푠 =

226 0 (푗 = 1, ⋯ , 퐿) can be reduced to 휏 = 0. The iSKAT is a score statistic for testing the variance

227 component, i.e., 퐻0: 휏 = 0 vs. 퐻1: 휏 > 0. The statistic can be referred to as Equation (6) in (Lin

228 et al., 2016). The iSKAT method is regarded as optimal in the class of variance component tests

229 (Chen et al., 2014; Lin et al., 2013). Therefore, we chose iSKAT as the representative of variance

230 component tests.

231 In ADABF, we consider the following model for the jth SNP (푗 = 1, ⋯ , 퐿):

[ ( )] 232 푔 퐸 푌푖 = 훿0 + 훿퐺푗 퐺푖푗 + 훿퐸퐸푖 + 훿퐼푛푡푗 퐺푖푗퐸푖 + 휹푪′푿풊 , 푖 = 1, ⋯ , 푛 . (11)

th 233 The SNP×E interaction is of interest, and therefore, 퐻0: 훿퐼푛푡푗 = 0 vs. 퐻1: 훿퐼푛푡푗 ≠ 0 for the j

234 SNP (푗 = 1, ⋯ , 퐿). A Bayes factor (BF) was calculated for each SNP×E, where 퐵퐹 =

235 푃푟(퐷푎푡푎|퐻1)⁄푃푟(퐷푎푡푎|퐻0). A larger BF indicates that the relative evidence in favor of 퐻1

236 (SNP×E interaction exists) is stronger. Because the number of SNPs exhibiting interactions with

237 E varies gene by gene, ADABF exhaustively searches for the evidence of G×E interaction by

238 considering the largest BF, combining the largest 2 BFs, combining the largest 3 BFs, …, to

10 239 aggregating all the L BFs in the gene. The significance of G×E interaction is then determined by

240 the efficient sequential resampling procedure (Liu et al., 2016). ADABF was selected for

241 comparison because it was recommended as a powerful and robust gene-based GxE test in a

242 recent study (Lin, Huang, et al., 2019). The iSKAT and ADABF methods do not test for GxE

243 through a GRS term. Therefore, these two methods do not make inference for the direction of

244 G×E.

245 Simulation study

246 To reflect the real LD structures of the human genome, we used the genotypes of 18,424

247 TWB subjects as our simulation material. Three genes (i.e., TNNT3, RFX3, and FTO) were drawn

248 for simulations, including 48, 95, and 242 SNPs, respectively. It is common to see multiple

249 trait-associated SNPs to be included in the same gene (Fawcett and Barroso 2010; Willer et al.,

250 2007). Therefore, we randomly specified 4 trait-associated SNPs in each simulation replicate. We

251 considered binary and continuous exposures, respectively. For binary exposures, E was randomly

252 sampled from 1 (exposed) or 0 (non-exposed), with 푃(퐸 = 1) = 0.2 and 푃(퐸 = 1) = 0.5,

253 respectively. For continuous exposures, E was randomly selected from the normal distribution

254 with a mean of 0 and a standard deviation of 0.5.

255 The continuous trait of the ith subject was simulated as follows:

∑4 ∑퐷 256 푌푖 = 푑=1 훽퐺푑퐺푖푑 + 훽퐸퐸푖 + 푑=1 훽퐼푛푡푑퐺푖푑 퐸푖 + 휀푖, (12)

th 257 where 훽퐺푑 is the SNP main effect of the d trait-associated SNP (푑 = 1, ⋯ ,4), 훽퐼푛푡푑 is the

258 effect size of interaction between the dth trait-associated SNP and E (푑 = 1, ⋯ , 퐷), D is the

259 number of trait-associated SNPs that also exhibit interactions with E (D = 4 or 2 in our

260 simulation), and 휀푖 is the random error term following the standard normal distribution.

261 The binary trait of the ith subject was simulated as follows:

11 ( ) 푃 푌푖=1 ∑4 ∑퐷 262 푙표푔 = 훽0 + 푑=1 훽퐺푑퐺푖푑 + 훽퐸퐸푖 + 푑=1 훽퐼푛푡푑퐺푖푑 퐸푖, (13) 1−푃(푌푖=1)

0.1 0.4 263 where the intercept 훽0 was 푙표푔( ⁄0.9) = −2.2 or 푙표푔( ⁄0.6) = −0.4. 푌푖 = 1 represents

th 264 that the i subject was diseased whereas 푌푖 = 0 indicates that he/she was non-diseased. This

265 setting corresponds to a disease prevalence of 10% or 40%. A disease prevalence of 40% is also a

266 reasonable setting for some complex diseases. For example, the worldwide prevalence of

267 hypertension among adults aged  25 years was ~40% (Abebe et al., 2015).

268 The magnitudes of SNP main effects, |훽퐺푑|s (푑 = 1, ⋯ ,4), were uniformly sampled from

269 0.04 to 0.08 for continuous traits or uniformly sampled from 푙표푔(1.05) to 푙표푔(1.15) for

270 binary traits. This simulation setting for binary trait (odds ratios uniformly sampled from 1.05 to

271 1.15) concurs with broad GWAS findings, where most odds ratios < 1.5 and many odds ratios <

272 1.2 (Baxter and Jordan 2012; Hodge and Greenberg 2016).

273 Unlike small effect sizes for SNPs, the effect size of E was assumed to be larger, because

274 some E is critical to traits (e.g., regular exercise and dietary habits are important to obesity

275 measures). The magnitude of E, |훽퐸|, was assigned to be 0.3 for continuous traits and

276 푙표푔(1.3) = 0.2624 for binary traits, respectively. When evaluating type I error rates, data have

277 to be generated from the null hypothesis of no 푆푁푃 × 퐸 interactions, and therefore we specified

278 all 훽퐼푛푡푑s = 0 (푑 = 1, ⋯ , 퐷). When assessing power, the magnitudes of 푆푁푃 × 퐸 interaction

279 effects, |훽퐼푛푡푑|s (푑 = 1, ⋯ , 퐷), were uniformly sampled from 0.04 to 0.08 for continuous traits

280 or uniformly sampled from 푙표푔(1.05) to 푙표푔(1.15) for binary traits. To not favor GRS-based

281 approaches that construct GRSs with SNPs marginal effects, the sampling of |훽퐼푛푡푑 |푠 (푑 =

282 1, ⋯ , 퐷) was independent of the sampling of |훽퐺푑|푠 (푑 = 1, ⋯ ,4). Power was evaluated under

283 14 simulation scenarios listed in Table 1, in which “+” indicates a positive effect and “−” means

12 284 a negative effect. As summarized by Table 1, we always assumed 4 trait-associated SNPs, but the

285 number of trait-associated SNPs that also exhibited interactions with E could be 2 or 4. This

286 setting mimics the common situation that some trait-associated SNPs may also present

287 interactions with E (Chen et al., 2014; Jonsson et al., 2009; Li et al., 2015).

288 Scenarios 1-7 describe that a larger E is associated with an increase in trait, whereas

289 scenarios 8-14 represent that a larger E is associated with a decrease in trait. Suppose that a

290 higher trait is linked to a less healthy situation, e.g., 푌푖 = 1 in binary traits. Scenarios 1, 3, 6, 8,

291 10, and 13 indicate that a larger E exacerbates the adverse effect of a candidate gene, whereas

292 scenarios 2, 4, 7, 9, 11, and 14 present that a larger E attenuates the adverse effect of a candidate

293 gene. Scenarios 5 and 12 are cross-over situations, representing that a larger E exacerbates the

294 adverse effect of 50% of trait-associated SNPs while attenuating the adverse effect of the

295 remaining 50% of trait-associated SNPs.

296 [Table 1 is approximately here]

297 Application to the Taiwan Biobank Data

298 The TWB aims to build a research database that integrates the genomic profiles and lifestyle

299 patterns of residents aged 30 to 70 years in Taiwan (Chen et al., 2016). Community-based

300 volunteers provided blood samples and a range of information via a face-to-face interview and

301 physical examination. This study included 20,287 TWB individuals and was approved by the

302 Research Ethics Committee of the National Taiwan University Hospital (NTUH-REC No.

303 201805050RINB). To remove cryptic relatedness, we used PLINK 1.9 (Purcell et al., 2007) to

304 calculate the genome-wide identity by descent (IBD) sharing coefficients between any two

305 subjects. Similar to many genetic studies (Lowe et al., 2009; Mok et al., 2014; Ombrello et al.,

306 2014), we excluded one subject from each pair with PI-HAT  0.125, where PI-HAT =

13 307 Probability(IBD = 2) + 0.5  Probability(IBD = 1). Through this step, relatives within the

308 third-degree of consanguinity were removed. Finally, 18,424 unrelated subjects (9,093 males and

309 9,331 females) were remained in the following analysis. Table 2 shows basic characteristics of

310 TWB participants stratified by sex.

311 [Table 2 is approximately here]

312 The majority of TWB subjects were of Han Chinese ancestry (Chen et al., 2016). The

313 Axiom Genome-Wide TWB genotyping array was designed for Han Chinese in Taiwan, which

314 was run on the Axiom Genome-Wide Array Plate System (Affymetrix, Santa Clara, CA, USA).

315 A total of 646,783 autosomal SNPs were genotyped in this TWB array. After removing 51,293

316 SNPs with genotyping rates < 95%, 6,095 SNPs with Hardy-Weinberg test P-values < 5.7 107

317 (WTCCC 2007), and 1,869 variants with minor allele frequencies (MAFs) < 1%, 587,526 SNPs

318 were used to construct ancestry PCs. Because variants with MAF < 1% have been excluded, no

319 rare variants were included in our following analyses. Removing variants with MAFs < 1% is a

320 commonly-used quality control step in genetic association studies, because the chances of errors

321 in genotype calling increase with decreasing MAFs (Coleman et al., 2016; Goldstein et al., 2012).

322 Results 323 Type I error rates

324 To evaluate type I error rates, continuous traits and binary traits were simulated according to

325 models (12) and (13), respectively. Scenarios 1 and 8 in Table 1 were simulated, but all 훽퐼푛푡푑s

326 (푑 = 1, ⋯ ,4) have to be set at 0 in order to evaluate type I error rates. The main effects of SNPs,

| | 327 |훽퐺푑|s (푑 = 1, ⋯ ,4), and the environmental factor, 훽퐸 , have been described in section

328 “Simulation study”. Based on 10,000 replications for each scenario, Figures 1 and S1 show that

329 all methods were valid in the sense that their type I error rates matched the nominal significance

14 330 level.

331 [Figure 1 is approximately here]

332

333 Power of 6 GxE methods

334 We compared the power of the 4 GRS-based tests and 2 GxE methods: iSKAT (Lin et al.,

335 2016) and ADABF (Lin, Huang, et al., 2019). The power of the 4 GRS-based tests is defined as

336 the probability of rejecting 퐻0: 훾퐼푛푡 = 0 (p-value < 0.05) and correctly specifying the sign of

337 훾퐼푛푡 (훾퐼푛푡 can be found from models 7 and 9). The iSKAT (Lin et al., 2016) and ADABF (Lin,

338 Huang, et al., 2019) methods can only provide a p-value for testing GxE, without an inference of

339 how E modifies the genetic effects. Therefore, their power is defined as the probability of

340 rejecting 퐻0: 훾퐼푛푡 = 0 (p-value < 0.05). Figure 2 presents the power based on 1,000 simulation

341 replications under each scenario, for continuous traits and 푃(퐸 = 1) = 0.2. If E exacerbates the

342 adverse effect of a gene, ENET and LASSO outperform the other methods (scenarios 1, 3, 6, 8,

343 10, and 13). If E attenuates the adverse effect of a gene, RIDGE is more powerful (scenarios 2, 4,

344 7, 9, 11, and 14). Moreover, ADABF was the optimal test under the cross-over scenario

345 (scenarios 5 and 12).

346 [Figure 2 is approximately here]

347 Among the 4 GRS-based tests, ENET, LASSO, and SBERIA first select SNPs marginally

348 associated with the phenotype. ENET and LASSO select SNPs according to the multivariate

349 ENET regression (equation 6) and multivariate LASSO regression (equation 5), respectively.

350 SBERIA selects SNPs based on regressions considering one SNP at a time (equation 8). The

351 performance of these three methods to detect G×E depends on their ability to find the true

352 trait-associated SNPs. Figures S2 and S3 summarize the sensitivity (SEN) and positive predictive

15 353 value (PPV) of the marginal-association filtering in ENET, LASSO, and SBERIA. As described

354 in equation (12) and Table 1, 4 trait-associated SNPs were randomly assigned in each simulation

355 replication. SEN is defined as the percentage of true positives among the 4 trait-associated SNPs,

356 whereas PPV is defined as the percentage of true positives among the total findings.

357 Because SBERIA takes a liberal p-value threshold of 0.1, in the filtering stage it usually

358 selects more SNPs than ENET and LASSO do. Therefore, as shown in Figures S2 and S3,

359 SBERIA generally has a higher SEN but a lower PPV, compared with ENET and LASSO. For

360 each method, SEN was much lower in attenuation scenarios than in exacerbation scenarios. This

361 means that pinpointing true trait-associated SNPs is more difficult in attenuation scenarios. In the

362 filtering stage (equations 5, 6, and 8), 퐺푖푑 × 퐸푖 cannot be included; therefore, a 훽퐼푛푡푑 in the

th 363 opposite direction to 훽퐺푑 will weaken the magnitude of the marginal effect of the d

364 trait-associated SNP. In contrast, a 훽퐼푛푡푑 in the same direction to 훽퐺푑 will strengthen the

365 magnitude of the marginal effect of the dth trait-associated SNP. Therefore, for each method, the

366 ability to pinpoint true trait-associated SNPs is inferior in attenuation scenarios than in

367 exacerbation scenarios (as shown in Figures S2 and S16 for continuous traits and binary traits,

368 respectively).

369 Because there is no SNP selection in RIDGE, true trait-associated SNPs are all reserved for

370 constructing GRS. Therefore, RIDGE is the optimal GRS-based test in attenuation scenarios

371 (Figure 2). In exacerbation scenarios, ENET and LASSO are the best 2 methods because of their

372 superior ability in finding trait-associated SNPs (Figures S2 and S16).

373 We then investigated the percentage of sign-misspecifications for each GRS-based method.

374 In the presence of G×E, we calculated the percentage of wrongly specifying the sign of 훾퐼푛푡 (in

375 models 7 or 9) among all rejections of 퐻0: 훾퐼푛푡 = 0 (p-value < 0.05). With 1,000 simulation

16 376 replications under each scenario, Figure 3 presents the percentages of sign-misspecifications, for

377 continuous traits and 푃(퐸 = 1) = 0.2. The true 훾퐼푛푡 is positive under scenarios 1, 3, 6, 8, 10,

378 and 13; negative under scenarios 2, 4, 7, 9, 11, and 14. The cross-over scenarios (5 and 12) were

379 not considered here, because the true sign of 훾퐼푛푡 was unclear when E = 1 (or a larger

380 continuous E) exacerbates the adverse effect of 50% of trait-associated SNPs but attenuates the

381 adverse effect of the remaining 50% of trait-associated SNPs.

382 [Figure 3 is approximately here]

383 Figure 3 shows that the signs of 훾퐼푛푡 were usually correctly specified except scenarios 4

384 and 11. SBERIA generally makes more mistakes in indicating the true direction for 훾퐼푛푡, because

385 it filters SNPs by considering only one SNP at a time (equation 8). Figures S4-S7 present the

386 power and percentages of sign-misspecifications in continuous-trait simulations, given

387 푃(퐸 = 1) = 0.5 and a continuous E, respectively. Each method under 푃(퐸 = 1) = 0.5 was

388 slightly more powerful than that under 푃(퐸 = 1) = 0.2, but the comparisons across methods

389 were similar to those shown in Figures 2-3.

390 The performance of each method for binary-trait simulations can be found in Figures

391 S8-S21. Regarding different prevalence values, each method was more powerful under

392 푃(푌 = 1) = 0.4 than under 푃(푌 = 1) = 0.1. Concerning different probabilities of being

393 exposed, each method was more powerful under 푃(퐸 = 1) = 0.5 than under 푃(퐸 = 1) = 0.2.

394 Nonetheless, the comparisons across methods were similar to those described for continuous

395 traits.

396 Regarding the 14 simulation scenarios listed in Table 1, only 2 SNP×E interactions (rather

397 than 4) were simulated in scenarios 4 and 11, and 훽퐼푛푡s were in the opposite direction to 훽퐺푠.

398 Therefore, detecting G×E and correctly specifying the sign for 훾퐼푛푡 were more challenging

17 399 under scenarios 4 and 11.

400 Regarding the computation time (Figures S22-S29), SBERIA is the fastest method, followed

401 by the 3 penalized regression methods: RIDGE, ENET, and LASSO. ADABF uses the sequential

402 resampling procedure (Liu et al., 2016), and therefore, it takes a longer time in the presence of

403 G×E. A recent study concluded that ADABF is a computationally feasible method in a GWAS

404 context (Lin, Huang, et al., 2019). This is because in a real GWAS usually very few genes can be

405 detected to interact with E. Therefore, a genome-wide analysis using ADABF takes a much

406 shorter time than the power evaluation performed here (power is always evaluated in the presence

407 of G×E). On average, the iSKAT method requires the longest time in most situations (Figures

408 S22-S29).

409 FTO x exercise interaction on 5 obesity measures

410 The FTO gene is obesity-associated and has been replicated by many studies (Fawcett and

411 Barroso 2010; Frayling et al., 2007; Scuteri et al., 2007). It spans from 53,737,875 to 54,148,379

412 base pairs on chromosome 16, according to the human genome GRCh37/hg19 assembly. Similar

413 to many gene-based tests (Alonso-Gonzalez et al., 2019; Lin, Huang, et al., 2019; Liu et al., 2010;

414 Wray et al., 2012), 50 kb in the 3’ and 5’ regions that might regulate a gene were also

415 incorporated in our analysis. Specifying a large boundary would be difficult to pinpoint the exact

416 gene interacting with E, whereas a small boundary may not fully capture the regulatory regions

417 (Liu et al., 2010). A total of 242 SNPs (all with MAF > 1%) were included in this chromosomal

418 region. Here, we used the 6 G×E tests to investigate whether the influence of FTO can be

419 modified by performing regular physical exercise. A total of 5 obesity measures were analyzed:

420 body mass index (BMI), body fat percentage (BFP), waist circumference (WC), hip

421 circumference (HC), and the waist-to-hip ratio (WHR).

18 422 Regular physical exercise was defined as engaging in 30 minutes of “exercise” 3 times a

423 week. “Exercise” indicated leisure-time activities such as jogging, yoga, mountain climbing, or

424 playing basketball. According to the TWB questionnaire, activities during work were not counted

425 in “exercise”. Among the 18,424 subjects, 7,652 (41.5%) reported performing regular exercise,

426 while 10,764 reported no regular exercise. A total of 8 subjects did not respond to this question.

427 Covariates adjusted in all models included sex, age (in years), educational attainment, drinking

428 status, smoking status, and the first 10 ancestry PCs.

429 [Table 3 is approximately here]

430 Table 3 presents the results of 6 methods. We may miss possibly important findings due to a

431 harsh penalty of multiple testing. Because this study focuses on G×E detection for candidate

432 genes, we set the significance level at 0.05. The results of RIDGE show that regular physical

433 exercise attenuates the adverse effect of the FTO gene on BMI, BFP, WC, and HC. The other

434 three GRS-based tests (ENET, LASSO, and SBERIA) provided significant results for three

435 obesity measures, respectively. All 훾̂퐼푛푡s were negative, indicating that regularly performing

436 exercise attenuates the adverse influence of FTO on obesity measures.

437 GRS-based tests are more powerful than iSKAT and ADABF because performing regular

438 exercise generally blunted the effects of the trait-increasing alleles in FTO. Take BMI-associated

439 SNPs as examples. A total of 12 out of the 242 SNPs reached the genome-wide significance

−8 440 (p-value < 5 × 10 ) when testing 퐻0: 훽푗 = 0 푣푠. 퐻1: 훽푗 ≠ 0 in the following model:

′ 441 퐸(푌푖) = 훽0 + 훽푗퐺푖푗 + 휷푪푿풊 , (14)

th 442 where 푌푖 is BMI, 퐺푖푗 is the number of minor alleles (0, 1, or 2) at the j SNP (j = 1, …, 242),

th ̂ 443 and 푿풊 is the vector of covariates of the i subject (푖 = 1, ⋯ , 18424). All 훽푗s of the 12

444 genome-wide significant SNPs were positive, representing that their minor alleles were

19 445 associated with increased BMIs.

446 When exercise (퐸푖 = 1 if yes; 퐸푖 = 0 if no) and the SNP×exercise interaction were

447 included in the 12 models, as follows:

′ 448 퐸(푌푖) = 훽0 + 훽푗퐺푖푗 + 훽퐸퐸푖 + 훽퐼푛푡푗퐺푖푗 × 퐸푖 + 휷푪푿풊 , 푗 = 1, ⋯ ,12, (15)

̂ ̂ 449 all 훽푗s were positive, and all 훽퐼푛푡푗s were negative (푗 = 1, ⋯ ,12). This indicated that the minor

450 alleles of the 12 SNPs were associated with increased BMIs, but their effects were blunted by

451 performing regular exercise. As mentioned above, a 훽퐼푛푡푗 in the opposite direction to 훽푗

̂ ̂ 452 weakens the magnitude of marginal effect of this SNP, and therefore 훽푗 from model (15) > 훽푗

453 from model (14). This is a situation similar to our simulation scenario 9 in Table 1, because

454 exercise is associated with a decrease in obesity measures. According to Figure 2 (scenario 9, 242

455 SNPs), RIDGE is the most powerful test. In real data, the FTO gene harbors more

456 BMI-associated SNPs than our simulation scenario 9, and these SNPs interact with E in the same

457 direction. Therefore, GRS-based tests are more powerful than iSKAT and ADABF.

458 According to RIDGE, each 1 sd increase in BMI-GRS was associated with a 0.1743 kg/m2

459 lower BMI in exercisers than in non-exercisers (p = 0.0009, Table 3). Each 1 sd increase in

460 BFP-GRS was associated with a 0.2661% lower BFP in exercisers than in non-exercisers (p =

461 0.0031). Each 1 sd increase in WC-GRS was associated with a 0.3854 cm lower WC in exercisers

462 than in non-exercisers (p = 0.0052). Each 1 sd increase in HC-GRS was associated with a 0.3868

463 cm lower HC in exercisers than in non-exercisers (p = 0.0001). As shown by Figure 4, except

464 WHR, GRS effect on each obesity measure was smaller in exercisers than in non-exercisers. For

465 each obesity measure, 훾̂퐼푛푡 in Table 3 is approximately equal to the length of blue bar – the

466 length of red bar in Figure 4. In the whole study population, each 1 sd increase in BMI-GRS was

467 associated with a 0.2420 kg/m2 (the length of orange bar in Fig. 3A) higher BMI (P = 1.1 ×

20 −20 468 10 ). This association was stronger in non-exercisers than in exercisers (PInt = 0.0009, Table 3).

469 In non-exercisers, each 1 sd increase in BMI-GRS was associated with a 0.3136 kg/m2 (the

470 length of red bar in Fig. 3A) higher BMI (P = 3.4 × 10−18); in exercisers, each 1 sd increase in

471 BMI-GRS was associated with a 0.1382 kg/m2 (the length of blue bar in Fig. 3A) higher BMI (P

472 = 1.4 × 10−4).

473 [Figure 4 is approximately here]

474 Performing regular exercises is associated with an attenuation of the adverse influence of the

475 FTO gene on both WC and HC, but not on WHR because it is a ratio of WC to HC. This negative

476 result for WHR is in line with the result from polygenic analyses (i.e., aggregating information of

477 multiple nearly independent SNPs across the genome) (Lin, Chan, et al., 2019).

478 FGF5 x WHR interaction on blood pressure levels

479 In addition to binary exposures, we also provide an example for continuous exposures. The

480 FGF5 gene is associated with blood pressure levels in Han Chinese (Lin, Huang, et al., 2019; Liu

481 et al., 2011). FGF5 spans from 81,187,742 to 81,212,171 base pairs on chromosome 4, according

482 to the human genome GRCh37/hg19 assembly. Similar to many gene-based tests (Liu et al.,

483 2010), our analysis region also included 50 kb in the 3’ and 5’ regions that might regulate the

484 gene. A total of 38 SNPs (all with MAF > 1%) were included in this chromosomal region. It has

485 been known that central obesity is a risk factor for hypertension (Parrinello et al., 1996).

486 Therefore, we used the 6 G×E tests to investigate whether the influence of FGF5 can be

487 modified by an indicator of central obesity, WHR. Covariates adjusted in all models included sex,

488 age (in years), drinking status, smoking status, and the first 10 ancestry PCs. As described above,

489 a total of 18,424 unrelated TWB subjects were included in our analysis, where 9,093 were males

490 and 9,331 were females.

21 491 [Table 4 is approximately here]

492 Table 4 presents the results of 6 methods. All 6 tests suggested the significance of

493 FGF5×WHR interaction on diastolic blood pressure (DBP) and systolic blood pressure (SBP)

494 (푃퐼푛푡 < 0.05). Positive 훾̂퐼푛푡s from GRS-based tests represent that a higher WHR acerbates the

495 adverse influence of FGF5.

496 The above two examples demonstrate using GRS-based methods to infer whether E

497 attenuates or exacerbates the adverse influence of a candidate gene (FTO and FGF5,

498 respectively).

499 Discussion 500 Detecting GxE has been an important but challenging issue. Although set-based GxE tests

501 have been evaluated in a genome-wide context, very few genes have been found to interact with

0.05 502 some exposures at the genome-wide significance level (i.e., p < = 2.5 × 10−6. Because 20000

503 there are ~20,000 genes across the genome, 2.5 × 10−6 is the commonly used significance level

504 in genome-wide gene-based analyses) (Chen et al., 2014; Lin, Huang, et al., 2019). Searching for

505 evidence of GxE for candidate genes may still be a more feasible strategy.

506 It is common to see a GRS that aggregates information of multiple nearly independent SNPs

507 across the genome (Ahmad et al., 2013; Lin, Chan, et al., 2019; Lin et al., 2020; Rask-Andersen

508 et al., 2017). However, there have been few applications of using GRS in gene-based GxE tests.

509 In this work, we evaluate the performance of GRS gene-based GxE approach, with simulations

510 and real applications. GRS-based tests can outperform other methods if interactions tend to go in

511 the same direction (Aschard 2016). The filtering stage of SBERIA overlooks LD among SNPs

512 because it considers L SNPs in L respective regressions (equation 8). RIDGE has the advantage

513 of accommodating LD among SNPs and providing regression coefficients that lead to minimum

22 514 MSE. Although RIDGE has been used to address LD in genetic association analysis (Malo et al.,

515 2008), our study is the first attempt to use the ridge regression to construct a GRS for GxE

516 analyses. ENET and LASSO can not only accommodate LD among SNPs, but they also select

517 SNPs by shrinking some regression coefficients to 0.

518 Although ENET has been suggested as a powerful GxE test in pathway analyses (Huls,

519 Ickstadt, et al., 2017; Huls, Kramer, et al., 2017), in this study, we showed that its performance to

520 find trait-associated SNPs can be compromised if E attenuates the adverse effect of a gene

521 (simulation scenarios 2, 4, 7, 9, 11, and 14). As a result, RIDGE is recommended if E attenuates

522 the adverse effects of most SNPs in a gene, as shown in the application of FTO×exercise

523 interaction on obesity measures (Table 3). In contrast, if E exacerbates the adverse effects of most

524 SNPs in a gene, ENET and LASSO will be more powerful, as shown in the application of

525 FGF5×WHR interaction on SBP (Table 4).

526 In this work, we described how to use GRS approaches to infer whether E attenuates or

527 exacerbates the adverse effect of a gene. GRS approaches can not only provide a p-value for GxE

528 but also infer how E interacts with the gene. The signs of 훾퐼푛푡 were usually correctly specified

529 except scenarios 4 and 11, where only 2 SNP×E interactions were specified and their interaction

530 effects were in the opposite direction to the SNP main effects. SBERIA usually makes more

531 mistakes because its filtering stage overlooks LD among SNPs by considering L SNPs in L

532 respective regressions (equation 8).

533 In the cross-over situations (simulation scenarios 5 and 12), where E exacerbates the adverse

534 effects of some SNPs but attenuates the adverse effects of others, GRS approaches were

535 underpowered and ADABF became the most useful. None of the 6 methods could outperform the

536 others across all situations. In this work, we performed simulations to explore the relative

23 537 performances of these 6 tests in detecting GxE and correctly indicating the interaction direction.

538 To show the utility of GRS-based approaches, we provided an example for binary exposure

539 and an example for continuous exposure: (1) FTO x exercise interaction on 5 obesity measures

540 (Table 3); (2) FGF5 x WHR interaction on blood pressure levels (Table 4). The 훾퐼푛푡s in Table 3

541 were consistently negative, representing that performing regular exercise attenuates the adverse

542 influence of the FTO gene on 4 obesity measures. Moreover, 훾퐼푛푡s in Table 4 were consistently

543 positive, indicating that a higher WHR exacerbates the adverse effect of the FGF5 gene on blood

544 pressure levels.

545 This work provides contributions to an important issue: identify whether E attenuates or

546 exacerbates the adverse influence of a candidate gene. Genes exhibiting interactions with E are

547 usually difficult to detect and replicate (McAllister et al., 2017), especially at the genome-wide

548 significance level (2.5 × 10−6). When testing for many genes simultaneously, even true positive

549 GxEs would have difficulty in standing out among all the noise (Lin and Lee 2010). Therefore,

550 our discussions here are restricted to candidate genes. As TWB keeps recruiting more subjects, in

551 the future we look forward to performing genome-wide GxE analyses on a larger TWB cohort.

552 Author contributions 553 W.-Y.L. developed the methods and the analysis tools, designed and performed the simulation

554 studies, analyzed the TWB data, and wrote the manuscript. Y.-S.L. contributed to the

555 programming for the simulations and data analyses. C.-C.C., Y.-L.L., S.-J.T., and P.-H.K.

556 contributed to the writing of the manuscript. P.-H.K., Y.-L.L., and S.-J.T. provided the TWB data.

557 All authors reviewed the manuscript.

558 Acknowledgments 559 We would like to thank the reviewers for their insightful and constructive comments, and Mr.

24 560 Ya-Chin Lee for assisting with the acquisition of the TWB data. This study was supported by the

561 Ministry of Science and Technology of Taiwan (MOST, grant number MOST

562 107-2314-B-002-195-MY3 to Wan-Yu Lin). The acquisition of TWB data was supported by a

563 MOST grant (grant number MOST 102-2314-B-002-117-MY3 to Po-Hsiu Kuo) and a

564 collaboration grant (National Taiwan University Hospital: grant number UN106-050 to

565 Shyr-Chyr Chen and Po-Hsiu Kuo).

566 References 567 568 Abebe SM, Berhane Y, Worku A, Getachew A. 2015. Prevalence and associated factors of 569 hypertension: a crossectional community based study in northwest ethiopia. PLoS 570 One.10:e0125210. 571 Ahmad S, Rukh G, Varga TV, Ali A, Kurbasic A, Shungin D, et al. 2013. Gene x physical activity 572 interactions in obesity: combined analysis of 111,421 individuals of European ancestry. PLoS 573 genetics.9:e1003607. 574 Alonso-Gonzalez A, Calaza M, Rodriguez-Fontenla C, Carracedo A. 2019. Novel Gene-Based 575 Analysis of ASD GWAS: Insight Into the Biological Role of Associated Genes. Front Genet.10:733. 576 Aschard H. 2016. A perspective on interaction effects in genetic association studies. Genetic 577 epidemiology. Dec;40:678-688. 578 Baxter AG, Jordan MA. 2012. From markers to molecular mechanisms: type 1 diabetes in the 579 post-GWAS era. Rev Diabet Stud. Winter;9:201-223. 580 Bourgon R, Gentleman R, Huber W. 2010. Independent filtering increases detection power for 581 high-throughput experiments. Proc Natl Acad Sci U S A. May 25;107:9546-9551. 582 Chen CH, Yang JH, Chiang CWK, Hsiung CN, Wu PE, Chang LC, et al. 2016. Population structure of 583 Han Chinese in the modern Taiwanese population based on 10,000 participants in the Taiwan 584 Biobank project. Human Molecular Genetics. Dec 15;25:5321-5331. 585 Chen H, Meigs JB, Dupuis J. 2014. Incorporating gene-environment interaction in testing for 586 association with rare genetic variants. Hum Hered.78:81-90. 587 Coleman JR, Euesden J, Patel H, Folarin AA, Newhouse S, Breen G. 2016. Quality control, 588 imputation and analysis of genome-wide genotyping data from the Illumina HumanCoreExome 589 microarray. Brief Funct Genomics. Jul;15:298-304. 590 Dai JY, Kooperberg C, Leblanc M, Prentice RL. 2012. Two-stage testing procedures with 591 independent filtering for genome-wide gene-environment interaction. Biometrika. 592 Dec;99:929-944. 593 Fawcett KA, Barroso I. 2010. The genetics of obesity: FTO leads the way. Trends in genetics : TIG. 594 Jun;26:266-274. 595 Frayling TM, Timpson NJ, Weedon MN, Zeggini E, Freathy RM, Lindgren CM, et al. 2007. A 596 common variant in the FTO gene is associated with body mass index and predisposes to

25 597 childhood and adult obesity. Science. May 11;316:889-894. 598 Friedman J, Hastie T, Tibshirani R. 2010. Regularization Paths for Generalized Linear Models via 599 Coordinate Descent. J Stat Softw. Feb;33:1-22. 600 Frost HR, Shen L, Saykin AJ, Williams SM, Moore JH, Alzheimer's Disease Neuroimaging I. 2016. 601 Identifying significant gene-environment interactions using a combination of screening testing 602 and hierarchical false discovery rate control. Genetic epidemiology. Nov;40:544-557. 603 Goldstein JI, Crenshaw A, Carey J, Grant GB, Maguire J, Fromer M, et al. 2012. zCall: a rare 604 variant caller for array-based genotyping: genetics and population analysis. Bioinformatics. Oct 605 1;28:2543-2545. 606 Greenland S. 1983. Tests for interaction in epidemiologic studies: a review and a study of power. 607 Stat Med. Apr-Jun;2:243-251. 608 Hodge SE, Greenberg DA. 2016. How Can We Explain Very Low Odds Ratios in GWAS? I. 609 Polygenic Models. Hum Hered.81:173-180. 610 Hoerl AE, Kennard RW. 1970. Ridge Regression - Biased Estimation for Nonorthogonal Problems. 611 Technometrics.12:55-67. 612 Huls A, Ickstadt K, Schikowski T, Kramer U. 2017. Detection of gene-environment interactions in 613 the presence of linkage disequilibrium and noise by using genetic risk scores with internal 614 weights from elastic net regression. BMC genetics. Jun 12;18:55. 615 Huls A, Kramer U, Carlsten C, Schikowski T, Ickstadt K, Schwender H. 2017. Comparison of 616 weighting approaches for genetic risk scores in gene-environment interaction studies. BMC 617 genetics. Dec 16;18:115. 618 Hunter DJ. 2005. Gene-environment interactions in human diseases. Nat Rev Genet. 619 Apr;6:287-298. 620 Jiao S, Hsu L, Bezieau S, Brenner H, Chan AT, Chang-Claude J, et al. 2013. SBERIA: set-based 621 gene-environment interaction test for rare and common variants in complex diseases. Genetic 622 epidemiology. Jul;37:452-464. 623 Jonsson A, Renstrom F, Lyssenko V, Brito EC, Isomaa B, Berglund G, et al. 2009. Assessing the 624 effect of interaction between an FTO variant (rs9939609) and physical activity on obesity in 625 15,925 Swedish and 2,511 Finnish adults. Diabetologia. Jul;52:1334-1338. 626 Kilpelainen TO, Qi L, Brage S, Sharp SJ, Sonestedt E, Demerath E, et al. 2011. Physical activity 627 attenuates the influence of FTO variants on obesity risk: a meta-analysis of 218,166 adults and 628 19,268 children. Plos Med. Nov;8:e1001116. 629 Li J, Shi JX, Huang W, Sun JL, Wu Y, Duan Q, et al. 2015. Variant Near FGF5 Has Stronger Effects 630 on Blood Pressure in Chinese With a Higher Body Mass Index. Am J Hypertens. 631 Aug;28:1031-1037. 632 Lin WY, Chan CC, Liu YL, Yang AC, Tsai SJ, Kuo PH. 2019. Performing different kinds of physical 633 exercise differentially attenuates the genetic effects on obesity measures: Evidence from 18,424 634 Taiwan Biobank participants. PLoS genetics. Aug;15:e1008277. 635 Lin WY, Chan CC, Liu YL, Yang AC, Tsai SJ, Kuo PH. 2020. Sex-specific autosomal genetic effects 636 across 26 human complex traits. Human Molecular Genetics.in press. 637 Lin WY, Huang CC, Liu YL, Tsai SJ, Kuo PH. 2018. Polygenic approaches to detect 638 gene-environment interactions when external information is unavailable. Brief Bioinform. Sep 639 13;, doi: 10.1093/bib/bby086. [Epub ahead of print]. 640 Lin WY, Huang CC, Liu YL, Tsai SJ, Kuo PH. 2019. Genome-wide gene-environment interaction

26 641 analysis using set-based association tests. Frontiers in Genetics.9:715. 642 Lin WY, Lee WC. 2010. Incorporating prior knowledge to facilitate discoveries in a genome-wide 643 association study on age-related macular degeneration. BMC Res Notes.3:26. 644 Lin XY, Lee S, Christiani DC, Lin XH. 2013. Test for interactions between a genetic marker set and 645 environment in generalized linear models. Biostatistics. Sep;14:667-681. 646 Lin XY, Lee S, Wu MC, Wang CL, Chen H, Li ZL, et al. 2016. Test for Rare Variants by Environment 647 Interactions in Sequencing Association Studies. Biometrics. Mar;72:156-164. 648 Liu C, Li H, Qi Q, Lu L, Gan W, Loos RJ, et al. 2011. Common variants in or near FGF5, CYP17A1 649 and MTHFR genes are associated with blood pressure and hypertension in Chinese Hans. J 650 Hypertens. Jan;29:70-75. 651 Liu JZ, Mcrae AF, Nyholt DR, Medland SE, Wray NR, Brown KM, et al. 2010. A Versatile 652 Gene-Based Test for Genome-wide Association Studies. American journal of human genetics. Jul 653 9;87:139-145. 654 Liu Q, Chen LS, Nicolae DL, Pierce BL. 2016. A unified set-based test with adaptive filtering for 655 gene-environment interaction analyses. Biometrics. Jun;72:629-638. 656 Lowe JK, Maller JB, Pe'er I, Neale BM, Salit J, Kenny EE, et al. 2009. Genome-wide association 657 studies in an isolated founder population from the Pacific Island of Kosrae. PLoS genetics. 658 Feb;5:e1000365. 659 Malo N, Libiger O, Schork NJ. 2008. Accommodating linkage disequilibrium in genetic-association 660 analyses via ridge regression. American journal of human genetics. Feb;82:375-385. 661 McAllister K, Mechanic LE, Amos C, Aschard H, Blair IA, Chatterjee N, et al. 2017. Current 662 Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex 663 Diseases. Am J Epidemiol. Oct 1;186:753-761. 664 Mok KY, Schneider SA, Trabzuni D, Stamelou M, Edwards M, Kasperaviciute D, et al. 2014. 665 Genomewide association study in cervical dystonia demonstrates possible association with 666 sodium leak channel. Mov Disord. Feb;29:245-251. 667 Ombrello MJ, Kirino Y, de Bakker PI, Gul A, Kastner DL, Remmers EF. 2014. Behcet 668 disease-associated MHC class I residues implicate antigen binding and regulation of 669 cell-mediated cytotoxicity. Proc Natl Acad Sci U S A. Jun 17;111:8867-8872. 670 Parrinello G, Scaglione R, Pinto A, Corrao S, Cecala M, DiSilvestre G, et al. 1996. Central obesity 671 and hypertension - The role of plasma endothelin. Am J Hypertens. Dec;9:1186-1191. 672 Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. 2007. PLINK: a tool set 673 for whole-genome association and population-based linkage analyses. American journal of 674 human genetics. Sep;81:559-575. 675 Rask-Andersen M, Karlsson T, Ek WE, Johansson A. 2017. Gene-environment interaction study 676 for BMI reveals interactions between genetic factors and physical activity, alcohol consumption 677 and socioeconomic status. PLoS genetics. Sep;13:e1006977. 678 Scuteri A, Sanna S, Chen WM, Uda M, Albai G, Strait J, et al. 2007. Genome-wide association 679 scan shows genetic variants in the FTO gene are associated with obesity-related traits. PLoS 680 genetics. Jul;3:1200-1210. 681 Sirugo G, Williams SM, Tishkoff SA. 2019. The Missing Diversity in Human Genetic Studies. Cell. 682 Mar 21;177:26-31. 683 Tibshirani R. 1996. Regression shrinkage and selection via the Lasso. J Roy Stat Soc B 684 Met.58:267-288.

27 685 Vimaleswaran KS, Li S, Zhao JH, Luan J, Bingham SA, Khaw KT, et al. 2009. Physical activity 686 attenuates the body mass index-increasing influence of genetic variation in the FTO gene. Am J 687 Clin Nutr. Aug;90:425-428. 688 Waldmann P, Meszaros G, Gredler B, Fuerst C, Solkner J. 2013. Evaluation of the lasso and the 689 elastic net in genome-wide association studies. Front Genet.4:270. 690 Willer CJ, Bonnycastle LL, Conneely KN, Duren WL, Jackson AU, Scott LJ, et al. 2007. Screening of 691 134 single nucleotide polymorphisms (SNPs) previously associated with type 2 diabetes 692 replicates association with 12 SNPs in nine genes. Diabetes. Jan;56:256-264. 693 Wray NR, Pergadia ML, Blackwood DH, Penninx BW, Gordon SD, Nyholt DR, et al. 2012. 694 Genome-wide association study of major depressive disorder: new results, meta-analysis, and 695 lessons learned. Mol Psychiatry. Jan;17:36-48. 696 WTCCC. 2007. Genome-wide association study of 14,000 cases of seven common diseases and 697 3,000 shared controls. Nature. Jun 7;447:661-678. Epub 2007/06/08. 698 Zou H, Hastie T. 2005. Regularization and variable selection via the elastic net. J Roy Stat Soc 699 B.67:301-320. 700

28 701 Figure Legends 702

703 Figure 1 Empirical type I error rates under the nominal significance level of 0.05 (continuous

704 trait)

705

706 Figure 2 Power given a significance level of 0.05, for continuous traits and 푃(퐸 = 1) = 0.2

707

708 Figure 3 Percentages of sign-misspecifications for 훾퐼푛푡, under continuous traits and

709 푃(퐸 = 1) = 0.2

710

711 Figure 4 The effect of GRS on the 5 obesity measures

712 The regression model was built as 푂푏푒푠푖푡푦 푚푒푎푠푢푟푒 = 훽0 + 훽퐺푅푆퐺푅푆 + 휷푪푪풐풗풂풓풊풂풕풆풔 + 휀,

713 where GRS was obtained by RIDGE. Three regression models were built for each obesity

714 measure, one for all 18,424 subjects, one for 7,652 exercisers, and one for 10,764 non-exercisers.

̂ 715 The bars represent 훽퐺푅푆 on an obesity measure, and the black segments mark the 95%

716 confidence intervals, i.e.,

̂ ̂ ̂ ̂ 717 [훽퐺푅푆 − 1.96 × standard error of 훽퐺푅푆, 훽퐺푅푆 + 1.96 × standard error of 훽퐺푅푆]. Covariates

718 adjusted in all models included sex, age (in years), educational attainment, drinking status,

719 smoking status, and the first 10 ancestry PCs.

720

721

722

723

29 724 Tables 725

Scenario E 휷푮ퟏ 휷푮ퟐ 휷푮ퟑ 휷푮ퟒ 휷푰풏풕ퟏ 휷푰풏풕ퟐ 휷푰풏풕ퟑ 휷푰풏풕ퟒ 1 + + + + + + + + + Exacerbation 2 + + + + + − − − − Attenuation 3 + + + + + + + 0 0 Exacerbation 4 + + + + + − − 0 0 Attenuation 5 + + + + + + + − − cross-over 6 + + + − − + + − − Exacerbation 7 + + + − − − − + + Attenuation 8 − + + + + + + + + Exacerbation 9 − + + + + − − − − Attenuation 10 − + + + + + + 0 0 Exacerbation 11 − + + + + − − 0 0 Attenuation 12 − + + + + + + − − cross-over 13 − + + − − + + − − Exacerbation 14 − + + − − − − + + Attenuation 726

727 Table 1 The 14 simulation scenarios for power comparison, where “exacerbation” and

728 “attenuation” mean that E = 1 (or a larger continuous E) exacerbates or attenuates the adverse

729 effect of a candidate gene, respectively.

30 Overall Males Females

Total, n (%) 18,424 9,093 9,331

Age (years), mean (s.d.) 48.9 (11.0) 49.0 (11.0) 48.9 (10.9)

Smoking, n (%) 2,134 (11.6) 1,882 (20.7) 252 (2.7)

Drinking, n (%) 1,345 (7.3) 1,178 (13.0) 167 (1.8)

Regular exercise, n (%) 7,652 (41.5) 3,896 (42.8) 3,756 (40.3)

Educational attainment, 5.46 (0.99) 5.62 (0.92) 5.29 (1.02) mean (s.d.) BMI (kg/m2), mean (s.d.) 24.31 (3.66) 25.2 (3.4) 23.5 (3.7)

Body fat %, mean (s.d.) 27.29 (7.38) 22.9 (5.5) 31.5 (6.5)

Waist circumference (cm), 83.93 (10.03) 87.4 (9.1) 80.5 (9.7) mean (s.d.) Hip circumference (cm), 96.34 (6.90) 97.6 (6.5) 95.2 (7.0) mean (s.d.) Waist-hip ratio, mean (s.d.) 0.87 (0.068) 0.90 (0.06) 0.85 (0.07)

Diastolic blood pressure 73.11 (11.10) 76.9 (10.6) 69.4 (10.3) (mmHg), mean (s.d.) Systolic blood pressure 117.62 (17.37) 121.9 (16.1) 113.5 (17.6) (mmHg), mean (s.d.) 730

731 Table 2. Basic characteristics of TWB participants stratified by sex

732

733

31 Trait RIDGE ENET LASSO SBERIA iSKAT ADABF

훾̂퐼푛푡 -0.1743 -0.0821 -0.0964 -0.1482 BMI (kg/m2) PInt 0.0009 0.1192 0.0671 0.0067 0.2043 0.1700

훾̂퐼푛푡 -0.2661 -0.2069 -0.2081 -0.2259 Body fat % PInt 0.0031 0.0212 0.0205 0.0160 0.2430 0.2200

Waist circumference 훾̂퐼푛푡 -0.3854 -0.3719 -0.3760 -0.2786

(cm) PInt 0.0052 0.0069 0.0063 0.0512 0.5369 0.3700

Hip circumference 훾̂퐼푛푡 -0.3868 -0.3286 -0.3291 -0.2902

(cm) PInt 0.0001 0.0011 0.0011 0.0055 0.5061 0.3300

훾̂퐼푛푡 -0.000116 -0.000775 -0.000374 -0.000314 Waist-to-hip ratio PInt 0.8951 0.3773 0.6699 0.7308 0.7994 0.3100

734

735 Table 3 FTO x exercise interaction on 5 obesity measures. Significant results with PInt < 0.05

736 are highlighted. Negative 훾̂퐼푛푡s indicate that regular physical exercise attenuates the adverse

737 influence of FTO on obesity measures.

32 Trait RIDGE ENET LASSO SBERIA iSKAT ADABF

훾̂퐼푛푡 0.2419 0.1980 0.2141 0.2378 DBP (mmHg) PInt 0.0013 0.0082 0.0042 0.0014 0.0154 0.0096

훾̂퐼푛푡 0.3396 0.3548 0.3551 0.3261 SBP (mmHg) PInt 0.0027 0.0017 0.0017 0.0039 0.0482 0.0480

738

739 Table 4 FGF5 x WHR interaction on blood pressure levels. Significant results with PInt < 0.05

740 are highlighted. Positive 훾̂퐼푛푡s indicate that a higher WHR exacerbates the adverse influence of

741 FGF5 on blood pressure levels.

742

33 743 Supplementary Material 744 Figure S1 Empirical type I error rates under the nominal significance level of 0.05 (binary trait)

745 Figure S2 The sensitivity of the marginal-association filtering in ENET, LASSO, and SBERIA,

746 for continuous traits and 푃(퐸 = 1) = 0.2

747 Figure S3 The positive predictive value of the marginal-association filtering in ENET, LASSO,

748 and SBERIA, for continuous traits and 푃(퐸 = 1) = 0.2

749 Figure S4 Power given a significance level of 0.05, for continuous traits and 푃(퐸 = 1) = 0.5

750 Figure S5 Percentages of sign-misspecifications for 훾퐼푛푡, under continuous traits and

751 푃(퐸 = 1) = 0.5

752 Figure S6 Power given a significance level of 0.05, for continuous traits and a continuous E

753 Figure S7 Percentages of sign-misspecifications for 훾퐼푛푡, under continuous traits and a

754 continuous E

755 Figure S8 Power given a significance level of 0.05, for binary traits, 푃(푌 = 1) = 0.1, and

756 푃(퐸 = 1) = 0.2

757 Figure S9 Percentages of sign-misspecifications for 훾퐼푛푡, under binary traits, 푃(푌 = 1) = 0.1,

758 and 푃(퐸 = 1) = 0.2

759 Figure S10 Power given a significance level of 0.05, for binary traits, 푃(푌 = 1) = 0.4, and

760 푃(퐸 = 1) = 0.2

761 Figure S11 Percentages of sign-misspecifications for 훾퐼푛푡, under binary traits, 푃(푌 = 1) =

762 0.4, and 푃(퐸 = 1) = 0.2

763 Figure S12 Power given a significance level of 0.05, for binary traits, 푃(푌 = 1) = 0.1, and

764 푃(퐸 = 1) = 0.5

765 Figure S13 Percentages of sign-misspecifications for 훾퐼푛푡, under binary traits, 푃(푌 = 1) =

34 766 0.1, and 푃(퐸 = 1) = 0.5

767 Figure S14 Power given a significance level of 0.05, for binary traits, 푃(푌 = 1) = 0.4, and

768 푃(퐸 = 1) = 0.5

769 Figure S15 Percentages of sign-misspecifications for 훾퐼푛푡, under binary traits, 푃(푌 = 1) =

770 0.4, and 푃(퐸 = 1) = 0.5

771 Figure S16 The sensitivity of the marginal-association filtering in ENET, LASSO, and

772 SBERIA, for binary traits, 푃(푌 = 1) = 0.4, and 푃(퐸 = 1) = 0.5

773 Figure S17 The positive predictive value of the marginal-association filtering in ENET,

774 LASSO, and SBERIA, for binary traits, 푃(푌 = 1) = 0.4, and 푃(퐸 = 1) = 0.5

775 Figure S18 Power given a significance level of 0.05, for binary traits, 푃(푌 = 1) = 0.1, and a

776 continuous E

777 Figure S19 Percentages of sign-misspecifications for 훾퐼푛푡, under binary traits, 푃(푌 = 1) =

778 0.1, and a continuous E

779 Figure S20 Power given a significance level of 0.05, for binary traits, 푃(푌 = 1) = 0.4, and a

780 continuous E

781 Figure S21 Percentages of sign-misspecifications for 훾퐼푛푡, under binary traits, 푃(푌 = 1) =

782 0.4, and a continuous E

783 Figure S22 Average time spent (in seconds) for each simulation replication, under 퐻0, for

784 continuous traits.

785 Figure S23 Average time spent (in seconds) for each simulation replication, under 퐻0, for

786 binary traits.

787 Figure S24 Average time spent (in seconds) for each simulation replication, under 퐻1, for

788 continuous traits and 푃(퐸 = 1) = 0.2

35 789 Figure S25 Average time spent (in seconds) for each simulation replication, under 퐻1, for

790 continuous traits and 푃(퐸 = 1) = 0.5

791 Figure S26 Average time spent (in seconds) for each simulation replication, under 퐻1, for

792 binary traits, 푃(푌 = 1) = 0.1, and 푃(퐸 = 1) = 0.2

793 Figure S27 Average time spent (in seconds) for each simulation replication, under 퐻1, for

794 binary traits, 푃(푌 = 1) = 0.4, and 푃(퐸 = 1) = 0.2

795 Figure S28 Average time spent (in seconds) for each simulation replication, under 퐻1, for

796 binary traits, 푃(푌 = 1) = 0.1, and 푃(퐸 = 1) = 0.5

797 Figure S29 Average time spent (in seconds) for each simulation replication, under 퐻1, for

798 binary traits, 푃(푌 = 1) = 0.4, and 푃(퐸 = 1) = 0.5

36 799 800 801 Figure 1

37 802 803 804 Figure 2

38 805 806 807 Figure 3

39 808 809 Figure 4