
Adjusting for Publication Bias in JASP — Selection Models and Robust Bayesian Meta-Analysis

František Bartoš¹,²∗, Maximilian Maier¹∗, & Eric-Jan Wagenmakers¹

¹ University of Amsterdam
² Charles University, Prague

∗Both authors contributed equally.

Correspondence concerning this article should be addressed to: František Bartoš, University of Amsterdam, Department of Psychological Methods, Nieuwe Achtergracht 129-B, 1018 VZ Amsterdam, The Netherlands, [email protected]

Author Note

This project was supported in part by a Vici grant (#016.Vici.170.083) to EJW.

Abstract

Meta-analysis is essential for cumulative science, but its validity is compromised by publication bias. In order to mitigate the impact of publication bias, one may apply selection models, which estimate the degree to which non-significant studies are suppressed. Implemented in JASP, these methods allow researchers without programming experience to conduct state-of-the-art publication bias adjusted meta-analysis. In this tutorial, we demonstrate how to conduct a publication bias adjusted meta-analysis in JASP and interpret the results. First, we explain how frequentist selection models correct for publication bias. Second, we introduce Robust Bayesian Meta-Analysis (RoBMA), a Bayesian extension of the frequentist selection models. We illustrate the methodology with two data sets and discuss the interpretation of the results. In addition, we include example text to provide concrete guidance on reporting the meta-analytic results in an academic article. Finally, three tutorial videos are available at https://tinyurl.com/y4g2yodc.

Keywords: Selection Models, Robust Bayesian Meta-Analysis, Model-Averaging, Publication Bias

Adjusting for Publication Bias in JASP — Selection Models and Robust Bayesian Meta-Analysis

Meta-analyses are a powerful tool for evidence synthesis. However, publication bias, the preferential publishing of significant studies, leads to an overestimation of effect sizes when accumulating evidence across a set of primary studies. Some researchers claim that most research findings might never be published but remain in researchers’ file drawers (e.g., Ioannidis, 2005; Rosenthal, 1979). Even if the true extent of publication bias were less severe than these researchers have suggested, it would remain a formidable threat to the validity of meta-analyses (Borenstein et al., 2009).

To alleviate this problem and explicitly account for publication bias, a variety of statistical methods have been proposed (e.g., Carter & McCullough, 2014; Duval & Tweedie, 2000; Egger et al., 1997; Iyengar & Greenhouse, 1988; Simonsohn et al., 2014; Stanley & Doucouliagos, 2017). However, simulations have shown that most methods perform poorly under heterogeneity (Carter et al., 2019; Maier et al., 2020; McShane et al., 2016; Renkewitz & Keiner, 2019) or do not provide meta-analytic estimates (Bartoš & Schimmack, 2020; Brunner & Schimmack, 2020). Heterogeneity occurs whenever the individual studies are not exact replications of each other and the true effect size varies across primary studies, which is usually the case in psychology (e.g., McShane et al., 2016). Under high heterogeneity, most tests for publication bias have either high false-positive rates or low power; in addition, the associated meta-analytic effect size estimates are biased or highly variable. An exception to this rule is the class of selection models: these models provide an explicit account of a p-value based publication bias process and adjust the estimate of effect size accordingly. Selection models perform well even under high heterogeneity (Carter et al., 2019; Maier et al., 2020; McShane et al., 2016). However, despite their strong performance in simulations, selection models are rarely used in practice. Their relative obscurity arguably stems from the perception that selection models are overly complicated¹ and from the fact that they have, to the best of our knowledge, not yet been implemented in statistical software packages with a graphical user interface (GUI), limiting their accessibility for applied researchers.

¹ For instance, Rothstein et al. (2005, p. 172) remark that “Weight function models are complex and involve a substantial amount of computation. Thus they are unlikely to be used routinely in meta-analysis”.

To make selection models more readily available to applied researchers, we implemented these models in the open-source statistical program JASP (JASP Team, 2020), as part of the Meta-Analysis module. The implementation provides an intuitive graphical interface for the packages weightr (Coburn et al., 2019) and RoBMA (Bartoš & Maier, 2020), allowing users to fit either frequentist or Bayesian selection models. Below we first provide a conceptual introduction to frequentist selection models and show how to fit these models using JASP. Second, we introduce a Bayesian selection method, Robust Bayesian Meta-Analysis (RoBMA; Maier et al., 2020), and show how it can overcome several limitations that are inherent to frequentist selection models. We explain how to interpret the results using two examples: a meta-analysis on the influence of “interracial dyads” on performance and observed behavior (Toosi et al., 2012) and a meta-analysis on acculturation mismatch (Lui, 2015). We also provide an example report of a results section that describes the application of both frequentist and Bayesian selection models to the meta-analysis on the influence of “interracial dyads”. Finally, we recorded tutorial videos to further facilitate the application of the implemented methods. The videos are available at https://tinyurl.com/y4g2yodc.

Frequentist Selection Models

Selection models use weighted likelihood to account for studies that are missing due to publication bias. Selection models are well-established amongst statisticians (e.g., Iyengar & Greenhouse, 1988; Larose & Dey, 1998; Vevea & Hedges, 1995) and can accommodate realistic assumptions regarding publication bias and heterogeneity. In selection models, analysts specify p-value intervals with different assumed publication probabilities, for example, “statistically significant” p-values (p < .05) versus non-significant p-values (p > .05). The models typically use maximum likelihood to obtain a bias adjusted pooled point estimate by accounting for the relative publication probabilities in each interval (called weights) and using the weighted likelihood function. Selection models can accommodate effect size heterogeneity by extending random effects models (McShane et al., 2016; Rothstein et al., 2005; Vevea & Hedges, 1995, pp. 145-174).
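To make the weighted likelihood idea concrete, here is a minimal sketch of the selection model density (our notation, following the general construction in Iyengar & Greenhouse, 1988, and Vevea & Hedges, 1995):

\[
f_{w}(y_i \mid \theta, \tau, \omega) \;=\; \frac{\omega\{p(y_i)\}\, f(y_i \mid \theta, \tau)}{\int \omega\{p(y)\}\, f(y \mid \theta, \tau)\, dy},
\]

where f is the (random effects) density of the observed effect size y_i, p(y) is the p-value corresponding to y, and ω assigns each p-value interval its relative publication probability, with one reference interval fixed at 1. Suppressing non-significant studies down-weights the intervals in which they would have fallen, and maximizing the weighted likelihood recovers a bias adjusted estimate of θ.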

Selection models can be specified flexibly in several ways. First, researchers can decide between one-sided and two-sided selection. One-sided selection means that only significant effects in the expected direction are more likely to be published. Commonly, significant positive effects are more likely to be published, although in some cases significant negative effect sizes might be more likely to be published as well. Researchers can specify the direction of selection flexibly. Two-sided selection means that the probability of publication does not depend on the direction of the effect; in other words, positive and negative effects have the same probability of being published, given that they fall in the same p-value interval.

Second, researchers may also specify different intervals for different publication probabilities. For example, to account for the fact that marginally significant results (.05 < p < .10) are potentially more likely to be published than non-significant results, researchers could specify this as a third interval. Note that, when the observed effect is in the predicted direction, a marginally significant result using a two-sided test is significant using a one-sided test. Therefore, a two-sided selection process with different publication probabilities for significant versus “marginally significant” findings accommodates a one-sided selection process with publication probabilities depending on whether or not the p-value is statistically significant.
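To make the notion of a weight function concrete, consider a hypothetical one-sided three-interval specification in R (the weights below are illustrative choices, not estimates from any data set):

```r
# Hypothetical step weight function over one-sided p-values: each interval
# receives a relative publication probability; the significant interval is
# the reference and is fixed at 1.
omega <- function(p) {
  ifelse(p < .025, 1.00,   # significant in the expected direction
  ifelse(p < .050, 0.60,   # "marginally significant"
                   0.30))  # non-significant
}
omega(c(.01, .03, .20))  # returns 1.00 0.60 0.30
```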

Example 1: Dyadic Interracial Interactions and Performance

Toosi et al. (2012) conducted a meta-analysis on the effect of “interracial” interactions on positive attitudes, negative affect, nonverbal behavior, and “objective measures of performance” (Toosi et al., 2012, p. 1). The meta-analysis compared dyadic same-race versus “interracial” interactions. A standard reanalysis confirms that “performance” was slightly better in same-race dyads than in different-race dyads, r = 0.070, 95% CI [0.023, 0.117], p = .004, τ (on Cohen’s d scale) = 0.289, 95% CI [0.173, 0.370].² Toosi et al. (2012) applied Egger’s regression (Egger et al., 1997) and reported a lack of funnel plot asymmetry, suggesting that the data set is not contaminated by publication bias. However, funnel plot based methods to assess publication bias have repeatedly been criticized for having low power and generating a high proportion of false-positives, especially under heterogeneity (e.g., Lau et al., 2006; Maier et al., 2020). We therefore revisit the question of publication bias by reanalyzing this study using selection models in JASP.

² Our result is similar to that reported by Toosi et al. (2012), namely r = 0.070, 95% CI [0.03, 0.11], I² = 67.29%. For the re-analysis, we used the data set as recoded by Stanley et al. (2018), accessible at https://osf.io/2vfyj/files/.

Figure 1 Results from Toosi et al. (2012) Using the Default Settings of JASP Selection Models

Note. Screenshot from the JASP graphical user interface when analyzing the data of Toosi et al. (2012). The analysis settings are specified in the left panel and the associated output is shown in the right panel. The default output concerns (1) a test of heterogeneity; (2) a test of publication bias; (3) adjusted and unadjusted effect size estimates for fixed effects and random effects models. A video tutorial for this analysis is available at https://www.youtube.com/watch?v=mswvm5Ne0eg&t=2s.

We start by loading the data set (available at https://osf.io/6hf7r/, including the annotated .jasp file) into JASP. We then open the module menu (by navigating to the “+” sign in the top right corner of the main ribbon) and activate the Meta-Analysis module. In this module, we choose the “Selection Models” option. The left panel of Figure 1 provides an overview of the resulting GUI for the analysis input options. We set the radio button that determines the input to “Correlation & N”, allowing us to supply effect sizes measured as correlations. The analysis internally transforms the correlations to Cohen’s d effect sizes, estimates the selection models, and transforms the results back into correlations (apart from the heterogeneity estimate τ). We place the variables containing the effect sizes (ES) and sample sizes (N) into the appropriate variable boxes.³ By default, the analysis assumes a two-sided selection process with p-value cutoffs .05 and .10.⁴ If there are too few studies in one of the specified p-value intervals, the selection model fails to estimate the publication probability. Using the default settings, the JASP implementation tries to circumvent this problem by automatically joining p-value intervals that contain fewer than four p-values (all of these options can be changed in the “Model” tab, see the left panel in Figure 1).

³ The “p-value (one-sided)” input is optional; if not provided, the analysis automatically computes the p-values using a z approximation in the case of “Effect sizes & SE” input, or from t statistics and degrees of freedom computed from the specified correlations and sample sizes when “Correlations & N” is used.

⁴ The weightr package does not natively support two-sided selection with two-sided p-values; therefore, the specified two-sided p-value cutoffs are internally transformed into one-sided p-value cutoffs. Consequently, the resulting p-value intervals are reported on the one-sided p-value scale.
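For readers who prefer to script the analysis, below is a minimal sketch of a comparable model fit with the weightr package (hypothetical data vectors; the correlation-to-Cohen’s-d transformation assumes equal group sizes and mirrors the transformation JASP applies internally):

```r
library(weightr)

r_es <- c(0.12, -0.05, 0.20, 0.08, 0.15, 0.02)  # hypothetical correlations
n    <- c(40, 55, 32, 80, 60, 45)               # hypothetical sample sizes

# Transform correlations to Cohen's d and approximate sampling variances
# (assuming two equal-sized groups per study).
d <- 2 * r_es / sqrt(1 - r_es^2)
v <- 4 / n + d^2 / (2 * n)

# Random effects selection model; the one-sided cutoffs (.025, .05, .95, 1)
# correspond to the collapsed two-sided cutoffs reported by JASP (footnote 4).
fit <- weightfunct(effect = d, v = v, steps = c(0.025, 0.05, 0.95, 1))
fit  # prints the unadjusted and the publication bias adjusted estimates
```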

The right panel of Figure 1 shows the default output provided by JASP. Under the output tables, the note “Only the following one-sided p-value cutoffs were used: 0.025, 0.05, 0.95.” informs us that some of the specified p-value intervals did not contain enough p-values for estimation and were therefore collapsed. The upper table in the right panel shows that there is statistically significant heterogeneity between studies, Q(54) = 170.77, p < .001. The table underneath shows the results for two tests of publication bias, one under homogeneity and one under heterogeneity. The test that assumes heterogeneity indicates the presence of statistically significant publication bias, χ²(3) = 13.20, p = .005. The two lower output tables show the results for fixed effects and random effects estimates, with and without correction for publication bias. The tests shown in the upper two tables suggest that the best estimate of effect size is obtained from the random effects table (because of the presence of heterogeneity), and, within that table, from the row “Adjusted” (because of the presence of publication bias). The adjusted mean estimate is no longer statistically significant, r = −0.011, 95% CI [−0.075, 0.053], p = .734. A visual comparison of the mean estimates from the different models can be obtained using the “Mean Model Estimates” option in the “Plots” section (i.e., Figure 2). The adjusted heterogeneity estimate is τ (on Cohen’s d scale) = 0.152, 95% CI [0.000, 0.227] (not shown here; these results can be obtained by checking “Estimated Heterogeneity” under the “Random Effects” column in the “Model” section).

Figure 2 Comparison of Effect Size Estimates of the Adjusted/Unadjusted and Fixed/Random Effects Models from Toosi et al. (2012). Figure from JASP

[Figure: four mean estimates with interval bars, labeled Fixed effects, Fixed effects (adjusted), Random effects, and Random effects (adjusted); x-axis “Mean Estimates (ρ)”.]

Note. After adjusting for publication bias (second and fourth estimate from the top), the estimated effect sizes are no longer statistically significant.

Limitations of Frequentist Selection Models

Frequentist selection models have several shortcomings. Before we discuss these, it is important to reiterate that selection models are the only frequentist meta-analytic method that performs well under both publication bias and heterogeneity (Carter et al., 2019; Maier et al., 2020; McShane et al., 2016). Therefore, opting for a different frequentist method with less visible limitations would usually result in worse inferences.

The first limitation is that frequentist hypothesis tests are based on binary accept/reject decisions. When the number of primary studies is small, selection models might have insufficient power, compromising the reliability of the accept/reject decisions. This is particularly worrisome since the differences between the publication bias adjusted and unadjusted estimates can be considerable. To illustrate this, we turn to another example. Lui (2015) studied how acculturation mismatch (AM), the result of the contrast between the collectivist cultures of Asian and Latin immigrant groups, correlates with intergenerational cultural conflict (ICC). Lui (2015) meta-analyzed 18 independent studies correlating AM with ICC. A standard reanalysis indicates a significant effect of AM on increased ICC, r = 0.250, p < .001.⁵ We reanalyze the data with selection models, using the same default settings as before. The test for heterogeneity is significant, Q(17) = 75.50, p < .001. The test for publication bias assuming heterogeneity is significant only when using α = 0.10, as advocated by Renkewitz and Keiner (2019), χ²(1) = 3.11, p = .078. However, proponents of the effect may advocate the more stringent criterion of α = 0.05. This stricter significance level would lead researchers to conclude that they do not need to adjust the estimate for publication bias. This decision has considerable impact on the results: the adjusted effect size estimate is non-significant at the .05 level (i.e., r = 0.159, p = .055), whereas the unadjusted estimate is significant (i.e., r = 0.250, p < .001).

⁵ This result is close to that reported by Lui (2015), namely r = 0.23. For the re-analysis, we used the data set as recoded by Stanley et al. (2018), accessible at https://osf.io/2vfyj/files/.

The second limitation also relates to small sample sizes. If we consider the example outlined in the previous paragraph in greater detail, we discover that the automatic p-value interval collapsing leaves us with only one p-value cutoff, .025 (corresponding to one-sided p-values). This collapsing prevents possible estimation issues that arise when only a few primary studies fall in some of the p-value intervals. There is no clear solution to this problem from the frequentist standpoint. A possible solution is to try again with different p-value intervals. However, this makes the analysis data-dependent and might not always be possible (e.g., when there are almost exclusively significant studies). Furthermore, it can also lead to different results. If we automatically collapse the p-value intervals, as in the previous example, the estimated effect size from the publication bias adjusted random effects model is significant at the α = .10 level, r = 0.159, p = .055. However, not collapsing the p-value intervals would have resulted in a non-significant test for publication bias, χ²(4) = 5.03, p = .284, and the subsequent selection of the original, significant effect size estimate from the non-adjusted random effects model.

A third limitation is that a non-significant test for publication bias is often not very informative. From a frequentist point of view, the act of not rejecting the null hypothesis does not imply that there is evidence in its favor. In other words, frequentist methods cannot distinguish between absence of evidence (i.e., the data are uninformative) and evidence of absence (i.e., the data support the null hypothesis; Keysers et al., 2020; Wagenmakers et al., 2016). This problem is related to the earlier example, in which it was unclear whether non-significance at the .05 level indicates evidence of absence or absence of evidence regarding publication bias.

A fourth limitation is accumulation bias (ter Schure & Grünwald, 2019). Consider meta-analyzing k primary studies with a frequentist selection model. At a later point in time, an additional study k + 1 becomes available, and researchers want to add this study to the set and update the analysis. For frequentist methods, this introduces the problem of multiple testing. To avoid accumulation bias, the sampling plan would need to be known in advance. However, since researchers usually conduct meta-analyses on available data collected by others, accumulation bias is all but inevitable.

To overcome these limitations, we developed Robust Bayesian Meta-Analysis (RoBMA; Maier et al., 2020). In the next paragraphs, we explain RoBMA conceptually and show how it alleviates the shortcomings of frequentist selection models. In addition, we illustrate the workings of the JASP implementation in practice.

Robust Bayesian Meta-Analysis

RoBMA is a Bayesian extension of selection models. It allows researchers to simultaneously estimate different models and base the results on a weighted combination of their estimates. The models can be generally divided into three qualitatively different pairs:

1. Models assuming the null hypothesis to be true versus models assuming the alternative hypothesis to be true (i.e., H0 vs. H1).

2. Models assuming fixed effects versus models assuming random effects (i.e., H^f vs. H^r).

3. Models assuming publication bias (i.e., selection models) versus models assuming no publication bias (i.e., H^ω vs. H^ω̄).

Combining these models results in 2 × 2 × 2 = 8 different model types. The default RoBMA implementation in JASP provides inference based on all eight of these model types simultaneously. In the default setting, as used in Maier et al. (2020), all model types are deemed equally likely a priori. In addition, each model type that assumes publication bias contains two models: one assuming a two-sided selection process with two steps and one assuming a two-sided selection process with three steps. Table 1 displays an overview of the individual models, including all prior distributions and prior model probabilities. This table is automatically generated in JASP before the start of the fitting process.
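The 2 × 2 × 2 structure can be enumerated explicitly in a few lines of R (illustrative bookkeeping only; this is not part of the RoBMA interface):

```r
# Enumerate the eight model types that make up the default ensemble.
model_types <- expand.grid(
  effect        = c("Spike(0)", "Normal(0, 1)"),
  heterogeneity = c("Spike(0)", "InvGamma(1, 0.15)"),
  bias          = c("none", "selection")
)
nrow(model_types)  # 8
```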

Table 1
Models Overview for the Default Settings in the JASP Implementation of RoBMA. Table from JASP

      Prior distribution
      Effect Size              Heterogeneity              Publication Bias                    P(M)
 1    Spike(0)                 Spike(0)                   Spike(1)                            0.125
 2    Spike(0)                 Spike(0)                   Two-sided((0.05), (1, 1))           0.063
 3    Spike(0)                 Spike(0)                   Two-sided((0.1, 0.05), (1, 1, 1))   0.063
 4    Spike(0)                 InvGamma(1, 0.15)[0, Inf]  Spike(1)                            0.125
 5    Spike(0)                 InvGamma(1, 0.15)[0, Inf]  Two-sided((0.05), (1, 1))           0.063
 6    Spike(0)                 InvGamma(1, 0.15)[0, Inf]  Two-sided((0.1, 0.05), (1, 1, 1))   0.063
 7    Normal(0, 1)[-Inf, Inf]  Spike(0)                   Spike(1)                            0.125
 8    Normal(0, 1)[-Inf, Inf]  Spike(0)                   Two-sided((0.05), (1, 1))           0.063
 9    Normal(0, 1)[-Inf, Inf]  Spike(0)                   Two-sided((0.1, 0.05), (1, 1, 1))   0.063
10    Normal(0, 1)[-Inf, Inf]  InvGamma(1, 0.15)[0, Inf]  Spike(1)                            0.125
11    Normal(0, 1)[-Inf, Inf]  InvGamma(1, 0.15)[0, Inf]  Two-sided((0.05), (1, 1))           0.063
12    Normal(0, 1)[-Inf, Inf]  InvGamma(1, 0.15)[0, Inf]  Two-sided((0.1, 0.05), (1, 1, 1))   0.063

Consider the models in Table 1 in more detail. The first six models, with effect size prior “Spike(0)”, are the null models that assume no effect. Models seven to twelve assume the presence of an effect with the (non-truncated) standard normal prior distribution (“Normal(0, 1)”). Regarding heterogeneity, we again use a spike at zero for the null models and the inverse gamma distribution for the models assuming heterogeneity to be present (van Erp et al., 2017). For publication bias, we distinguish between three types of models. First, we include models that assume no publication bias (the publication probability is equal to 1, i.e., “Spike(1)”). Second, we include publication bias models that assume two-sided selection on two-sided p-values based on significance. Here, the cumulative sum of the Dirichlet distribution with parameters (1, 1) is used as a prior on the publication bias weights (i.e., “Two-sided((0.05), (1, 1))”). We use the cumulative sum to induce ordinality (the “more significant”, the more likely to be published) and to assume that significant studies are always published. Third, we include models that also assume two-sided p-values and two-sided selection but distinguish between “marginally significant” and non-significant studies (“Two-sided((0.1, 0.05), (1, 1, 1))”). The prior probability is split equally across the different model pairs (i.e., effect size, heterogeneity, publication bias), resulting in a probability of 0.125 for each model. Since there are two models assuming publication bias, we split the prior probability again for those two models, resulting in 0.0625 (rounded to 0.063 in the JASP output) for each of them. However, we want to point out that these are only the default priors, and researchers can specify different priors if they like. Those who are not interested in testing a null hypothesis that assumes the effect to be zero (e.g., McElreath, 2020) can also specify different models as the null hypothesis, or not specify a null hypothesis at all (see Example 3).
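To illustrate the cumulative-sum construction mentioned above, here is a sketch of the idea (not the package internals), using the two-interval weight function as an example:

```r
# Sketch of the cumulative-sum prior on publication weights: a draw from a
# Dirichlet(1, 1) distribution, simulated via normalized gamma variates, is
# turned into ordered weights whose largest element equals 1.
set.seed(1)
eta   <- rgamma(2, shape = 1)
eta   <- eta / sum(eta)   # a draw from Dirichlet(1, 1)
omega <- cumsum(eta)      # increasing weights; omega[2] is exactly 1
omega                     # omega[1]: non-significant, omega[2]: significant
```

The cumulative sum guarantees that the weights increase with significance and that the significant interval always receives weight 1, matching the two assumptions stated in the text.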

The models are then updated according to Bayes’ rule. In other words, models that predict the data well receive a boost in posterior probability, while models that predict the data poorly suffer a decline. Comparing only two models, we can describe their relative predictive performance using Bayes factors (Etz & Wagenmakers, 2017; Jeffreys, 1961; Kass & Raftery, 1995; Rouder & Morey, 2019; Wrinch & Jeffreys, 1921). The Bayes factor equals the change from prior to posterior odds. Assuming that H1^rω and H1^rω̄ are equally likely a priori, the posterior odds equal the Bayes factor. Equation 1 shows the Bayes factor for the example of comparing the publication bias model to the no publication bias model, with both models assuming the presence of an effect and the presence of heterogeneity.

\[
\underbrace{\frac{p(\text{data} \mid H_1^{r\omega})}{p(\text{data} \mid H_1^{r\bar{\omega}})}}_{\text{Bayes factor}}
\;=\;
\underbrace{\frac{p(H_1^{r\omega} \mid \text{data})}{p(H_1^{r\bar{\omega}} \mid \text{data})}}_{\text{Posterior odds}}
\bigg/
\underbrace{\frac{p(H_1^{r\omega})}{p(H_1^{r\bar{\omega}})}}_{\text{Prior odds}}
\tag{1}
\]

More than two models can be compared using the “inclusion Bayes factor”. This Bayes factor allows researchers to quantify the evidence for a meta-analytical effect, the evidence for heterogeneity, and the evidence for publication bias. When comparing models assuming publication bias to models assuming no publication bias, the inclusion Bayes factor can be calculated as in Equation 2.

\[
\underbrace{\text{BF}_{\omega\bar{\omega}}}_{\substack{\text{Inclusion Bayes factor}\\\text{for publication bias}}}
\;=\;
\underbrace{\frac{p(H_1^{f\omega} \mid \text{data}) + p(H_1^{r\omega} \mid \text{data}) + p(H_0^{f\omega} \mid \text{data}) + p(H_0^{r\omega} \mid \text{data})}
{p(H_1^{f\bar{\omega}} \mid \text{data}) + p(H_1^{r\bar{\omega}} \mid \text{data}) + p(H_0^{f\bar{\omega}} \mid \text{data}) + p(H_0^{r\bar{\omega}} \mid \text{data})}}_{\text{Posterior inclusion odds}}
\bigg/
\underbrace{\frac{p(H_1^{f\omega}) + p(H_1^{r\omega}) + p(H_0^{f\omega}) + p(H_0^{r\omega})}
{p(H_1^{f\bar{\omega}}) + p(H_1^{r\bar{\omega}}) + p(H_0^{f\bar{\omega}}) + p(H_0^{r\bar{\omega}})}}_{\text{Prior inclusion odds}}
\tag{2}
\]

In other words, the inclusion Bayes factor for publication bias is obtained by contrasting the predictive performance of all models that assume publication bias with that of all models that assume no publication bias. The inclusion Bayes factors for effect size and heterogeneity can be calculated analogously. One advantage of Bayes factors is that they can distinguish between absence of evidence and evidence of absence. In addition, they can quantify evidence on a continuous scale and can be updated sequentially as studies accumulate.
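Because Equation 2 involves nothing more than sums of model probabilities, it can be spelled out in a few lines of R. The posterior probabilities below are hypothetical; the prior probabilities are those of the twelve models in Table 1, where the bias-assuming models are numbers 2, 3, 5, 6, 8, 9, 11, and 12:

```r
# Prior model probabilities from Table 1 and hypothetical posterior ones.
prior_prob <- c(0.125, 0.0625, 0.0625, 0.125, 0.0625, 0.0625,
                0.125, 0.0625, 0.0625, 0.125, 0.0625, 0.0625)
post_prob  <- c(0.01, 0.02, 0.02, 0.05, 0.03, 0.02,
                0.05, 0.10, 0.10, 0.20, 0.20, 0.20)   # hypothetical
assumes_bias <- c(FALSE, TRUE, TRUE, FALSE, TRUE, TRUE,
                  FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)

posterior_odds <- sum(post_prob[assumes_bias])  / sum(post_prob[!assumes_bias])
prior_odds     <- sum(prior_prob[assumes_bias]) / sum(prior_prob[!assumes_bias])
posterior_odds / prior_odds  # inclusion Bayes factor for publication bias
```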

After the models have been updated, the final effect size estimate is obtained by Bayesian model averaging (e.g., Hinne et al., 2020; Hoeting et al., 1999). In Bayesian model averaging, the effect size estimate from each individual model is weighted by that model’s posterior probability. Since the models that predicted the data best have the highest posterior probability, the final estimate is based most strongly on the best-performing models.
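As a toy numerical illustration of this weighting, patterned on the conditional model estimates shown later in Figure 6:

```r
# Hypothetical per-model effect size estimates and posterior probabilities.
estimates <- c(0.25, 0.24, 0.23)
post_prob <- c(0.43, 0.21, 0.36)

sum(post_prob * estimates)  # model-averaged point estimate, about 0.24
```

In the actual analysis, the averaging is performed over entire posterior distributions rather than point estimates, but the weighting logic is the same.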

RoBMA overcomes the limitations of frequentist selection models in several ways. First, the Bayes factor allows researchers to quantify evidence for the null hypothesis and thus distinguish between absence of evidence and evidence of absence.

Second, model averaging obviates the need to select a single model. Therefore, if there is uncertainty regarding the presence of publication bias, RoBMA can base the inference on both the “normal” models and the publication bias adjusted models instead of needing to commit fully to a single model.

Third, the prior distributions allow the selection models to be estimated even in cases with few p-values in some of the p-value intervals; the method will not fail to converge under these conditions. However, especially in this context, it is important to specify the prior distributions carefully and to check the robustness of the results to different specifications of the prior distributions.

Fourth, Bayes factors allow for sequential updating (Rouder, 2014; Rouder & Morey, 2011; Wagenmakers et al., 2016), meaning that new studies can be added to the set and the analysis updated without having to worry about accumulation bias. At every point in time, RoBMA quantifies evidence based on the relative predictive performance of the rival models for the observed data.

Example 2: Acculturation Mismatch (AM) and Intergenerational Cultural Conflict (ICC)

We illustrate RoBMA using the meta-analysis on the relationship between AM and ICC (Lui, 2015) mentioned in the section “Limitations of Frequentist Selection Models”. To analyze the data, we first specify effect sizes as correlations, using the “Correlations & N” input option (cf. the earlier example on dyadic “interracial” interactions). The RoBMA package internally transforms the correlations into Cohen’s d effect sizes, which are then used to estimate the models. Second, we specify prior distributions for the effect size, heterogeneity, and publication bias (the prior distribution specification is discussed in greater detail in Example 3). Because we are using correlations as input, the prior distributions for the effect size and heterogeneity parameters are specified on the Cohen’s d scale. The prior distributions can be visualized in JASP (Figure 3 shows the prior distribution for the effect size under the alternative hypothesis) by selecting the “Plot priors” checkbox in the “Models” panel. The prior distribution for the effect size is automatically transformed from Cohen’s d to the input scale selected earlier; in this case, it is transformed to the correlation scale.⁶

⁶ The heterogeneity parameter τ is interpreted on the Cohen’s d scale because its transformation to the correlation scale depends on the location of the effect size estimate.

Figure 3 The Default Standard-Normal Prior Distribution on Cohen’s d Induces a Bell-Shaped Prior Distribution on the Correlation Scale. Figure from JASP

[Figure: bell-shaped prior density; y-axis “Density”, x-axis “ρ (Cohen’s d ~ Normal(0, 1) [−∞, ∞])”.]

After specifying the prior distributions, the models can be estimated. To do so, we place the effect sizes and sample sizes into the corresponding input panels (“Effect Size”, “N”). JASP then automatically starts estimating the models, which is signaled by the appearance of a progress bar at the top of the right panel. After a few minutes, JASP produces the default output depicted in Figure 4. Note, however, that with larger numbers of primary studies the estimation takes considerably longer. The current example features 18 studies and took 15 minutes to estimate, whereas the first example featured 55 studies and took around 55 minutes to estimate.⁷ To ameliorate the problems associated with lengthy model fitting times, the “Save the fitted model” option in the “Advanced” section allows users to save an already estimated model. The saved model can then be loaded into JASP (or R) using the “Fitted model” data input option. This allows users to fit the model only once, share the estimated model with colleagues, and circumvent the issue of lengthy refitting times.

⁷ As timed on a notebook with a rather old 4720HQ Intel processor.
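The same workflow can be scripted in R. The following is a minimal sketch assuming the interface of the RoBMA package version cited above (Bartoš & Maier, 2020); argument names may differ in later releases, and the data vectors are hypothetical stand-ins:

```r
library(RoBMA)

r_es <- c(0.35, 0.18, 0.29, 0.22)  # hypothetical correlations
n    <- c(120, 85, 200, 150)       # hypothetical sample sizes

# Fit the default ensemble; setting a seed gives exact repeatability.
fit <- RoBMA(r = r_es, n = n, seed = 1)
summary(fit)  # tests and model-averaged estimates

# Save the fitted ensemble once and reload it later to avoid refitting.
saveRDS(fit, file = "robma_fit.RDS")
fit <- readRDS("robma_fit.RDS")
```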

The “Model Summary” table presents the results of the RoBMA hypothesis tests.⁸ The results indicate that there is strong evidence for the presence of the effect, BF10 = 2777.84, and for heterogeneity, BF^rf = 3.74 × 10^6, but virtually no evidence for either the absence or the presence of publication bias, BF^ωω̄ = 1.35.

⁸ The warnings below the table inform us that the starting values for the MCMC algorithm needed to be set to the mean of the observed values due to a problem with the likelihood calculation for the randomly chosen starting values. This message is harmless and only indicates that the starting values could not be chosen randomly. A more detailed discussion of the potential error and warning messages is available at https://fbartos.github.io/RoBMA/articles/WarningsAndErrors.html.

Figure 4 Results from Lui (2015) Using the Default Settings of JASP Robust Bayesian Meta-Analysis

Note. The analysis settings are specified in the left panel and the associated output is shown in the right panel. The default output concerns (1) tests for effect, heterogeneity, and publication bias and (2) model-averaged parameter estimates for the effect size, heterogeneity, and publication bias weights. A tutorial video showing this analysis is available at https://www.youtube.com/watch?v=5Ff9jsb1_TM.

The following table, “Model Averaged Estimates”, reports the estimates for effect size and heterogeneity obtained by model-averaging across all models. The models’ posterior probabilities function as the weights for the model-averaging. The mean estimate of effect size is close to the original estimate, r = 0.240, 95% CI [0.157, 0.321], with the heterogeneity estimate τ (on Cohen’s d scale) = 0.317, 95% CI [0.194, 0.493].

We can visualize the model-averaged prior and posterior distribution for the effect size by selecting the “Effect” option under the “Pooled Estimates” heading in the “Plots” section. Figure 5 shows the change from the prior distribution (dashed grey line and grey arrow) to the posterior distribution (solid black line and black arrow). The posterior probability of the spike is much lower than its prior probability (i.e., the grey arrow goes up to .50 on the secondary y-axis, whereas the height of the black arrow only slightly exceeds 0), showing that the data undercut the hypothesis that the effect is absent. The posterior distribution under the alternative hypothesis is relatively peaked and concentrated on medium-sized effects. We can also explore the effect size estimates from the individual models conditional on the presence of the effect (showing, and model-averaging across, only the models assuming the presence of the effect); these plots can be produced in the same section under the “Individual Models” heading. Figure 6 provides an example. JASP provides similar figures for the heterogeneity and for the weights (i.e., the relative publication probabilities).

Figure 5 The Model-Averaged Prior and Posterior Distribution for the Effect Size Estimate from Lui (2015) Using the Default Settings. Figure from JASP

[Figure: model-averaged prior (dashed) and posterior (solid) densities over “ρ (averaged)”; left y-axis “Density”, right y-axis “Probability” with arrows at the point null.]

Note. The model-averaged prior distribution for effect size is displayed using the grey arrow and the grey dashed line; the posterior distribution is displayed using the black arrow (which is close to zero) and the solid black line. The arrows visualize the probability (secondary y-axis) of the point-null prior and posterior distribution.

Figure 6 Effect Size Estimates under the Models Assuming ρ ≠ 0 from Lui (2015) Using the Default Settings. Figure from JASP

[Figure: forest-style plot of the conditional models; the displayed values are:]

  Model (prior distributions)                                                     Mean [95% CI]      Post. prob. (prior prob.)
  μ ~ Normal(0, 1)[−∞, ∞], τ ~ Spike(0), ω ~ Spike(1)                             0.21 [0.17, 0.24]  0.00 (0.25)
  μ ~ Normal(0, 1)[−∞, ∞], τ ~ Spike(0), ω ~ Two-sided((0.05), (1, 1))            0.20 [0.15, 0.23]  0.00 (0.12)
  μ ~ Normal(0, 1)[−∞, ∞], τ ~ Spike(0), ω ~ Two-sided((0.1, 0.05), (1, 1, 1))    0.19 [0.15, 0.23]  0.00 (0.12)
  μ ~ Normal(0, 1)[−∞, ∞], τ ~ InvGamma(1, 0.15)[0, ∞], ω ~ Spike(1)              0.25 [0.17, 0.33]  0.43 (0.25)
  μ ~ Normal(0, 1)[−∞, ∞], τ ~ InvGamma(1, 0.15)[0, ∞], ω ~ Two-sided((0.05), (1, 1))          0.24 [0.16, 0.32]  0.21 (0.12)
  μ ~ Normal(0, 1)[−∞, ∞], τ ~ InvGamma(1, 0.15)[0, ∞], ω ~ Two-sided((0.1, 0.05), (1, 1, 1))  0.23 [0.15, 0.32]  0.36 (0.12)
  Overall (Conditional)                                                           0.24 [0.16, 0.32]

Note. The left side describes the prior distribution configuration for each of the model parameters; the right side provides numerical summaries of the mean, 95% CI, posterior model probability, and prior model probability. The bottom row shows the effect size estimate model-averaged across the displayed models.

The models are estimated using Markov chain Monte Carlo (MCMC) techniques in JAGS (Plummer, 2003). The default settings include 1,000 adaptation, 5,000 burnin, and 10,000 sampling iterations, with 3 chains and thinning set to 1. These settings can be adjusted under the “Advanced” section. The section also contains a “Set seed” option for exact repeatability of the results. MCMC convergence can be checked using the “Overview” option under “MCMC Diagnostics”. The “Model Diagnostics Overview” table shows diagnostic summaries for the individual models. We see that all models have an excellent Rhat convergence diagnostic (the recommended maximum value is 1.05; Gelman, Rubin, et al., 1992; but see Vehtari et al., 2019, for a more stringent standard) and acceptable effective sample sizes (ESS). Parameter- and model-specific diagnostics can be obtained from the “Plot” section, which allows the visualization of trace plots, autocorrelation histograms, and posterior sample density plots for any of the estimated parameters. See McElreath (2020, p. 285) or Plummer et al. (2019) for a more detailed explanation of these plotting options. When one or more models seem unfit for drawing inferences (i.e., large Rhat and/or low ESS), these models can be excluded using the “Exclude models” option under the “Advanced” tab. If desired, these models can be re-estimated with an increased number of burnin and sampling iterations.
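Comparable convergence checks can be run in R with the coda package (Plummer et al., 2019). The sketch below operates on simulated chains that stand in for posterior draws of an effect size parameter:

```r
library(coda)

set.seed(1)
# Three hypothetical chains of 1,000 draws each, standing in for MCMC output.
chains <- do.call(mcmc.list,
                  lapply(1:3, function(i) mcmc(cbind(mu = rnorm(1000)))))

gelman.diag(chains)    # potential scale reduction factor (Rhat; aim < 1.05)
effectiveSize(chains)  # effective sample size (ESS)
traceplot(chains)      # visual check of chain mixing
```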

This example highlights the Bayesian benefit of taking all uncertainty into account. In the frequentist framework, it was unclear whether or not to adjust for publication bias. In contrast, RoBMA does not require an all-or-none decision about the presence of publication bias. Instead, all models are taken into account simultaneously, and the effect size estimate is based on a weighted average across the various models, with the averaging weights determined by the support that each model receives from the data. This difference is not only philosophical but has important practical implications: RoBMA finds clear evidence for the presence of an effect, whereas frequentist selection models can indicate either that the effect is present or that it is absent, depending on the analyst’s choices regarding publication bias adjustment.

Example 3: Specifying Different Priors

In the previous example, RoBMA revealed compelling evidence against the point null hypothesis. However, it has been repeatedly argued that point null hypotheses are not realistic, and therefore not meaningful to test (e.g., Gelman & Carlin, 2014; Good, 1967; Meehl, 1978; Orben & Lakens, 2020). RoBMA overcomes this objection by allowing the specification of ‘perinull’ hypotheses, that is, hypotheses with prior distributions tightly centered around an effect size of zero (e.g., Berger & Delampady, 1987; Cornfield, 1966; George & McCulloch, 1993). Adjustments may also be desired for the prior distribution on effect size that is postulated under the alternative hypothesis. By default, the most plausible value under this prior distribution is zero (i.e., under H1, the prior distribution is centered on zero), and this may not reflect the information at hand (e.g., Gronau et al., 2020). A more diagnostic test requires an ‘informed prior’, one that is centered around a non-zero value of effect size.

Here we continue the example of acculturation mismatch and intergenerational cultural conflict (Lui, 2015) and demonstrate how RoBMA allows researchers to specify both a perinull hypothesis and an informed hypothesis, and to compare their predictive performance for the observed data. First, we specify a perinull hypothesis by assigning the effect size a zero-centered normal distribution with a standard deviation of .10 on the Cohen’s d scale; propagated to the correlation scale, this yields a prior distribution with approximately 95% probability mass in the interval r ∈ [−0.10, 0.10]. For a perinull hypothesis, this range may be considered relatively wide. If the goal of the perinull distribution is primarily to counter the objection that “the null hypothesis is never true exactly”, a much narrower 95% interval could be specified, such as one ranging from −.01 to .01.

Second, we specify the informed alternative hypothesis by assigning the effect size a normal distribution centered at .60 with standard deviation .20 on the Cohen’s d scale. Translated to the correlation scale, this results in a prior distribution with most probability mass on correlations higher than .10, with the prior median at a correlation slightly lower than .30.
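For completeness, here is a sketch of the same comparison in R, assuming the prior() constructor and the priors_mu/priors_mu_null argument names of the RoBMA version cited above; these names are version-specific and may differ in later releases. The closing lines verify the stated prior mass using the equal-group-size conversion r = d/√(d² + 4):

```r
library(RoBMA)

r_es <- c(0.35, 0.18, 0.29, 0.22)  # hypothetical correlations, as before
n    <- c(120, 85, 200, 150)       # hypothetical sample sizes

# Informed alternative: Normal(0.60, 0.20) on Cohen's d;
# perinull "null" hypothesis: Normal(0, 0.10) on Cohen's d.
fit <- RoBMA(r = r_es, n = n, seed = 1,
             priors_mu      = prior("normal", parameters = list(mean = 0.60, sd = 0.20)),
             priors_mu_null = prior("normal", parameters = list(mean = 0.00, sd = 0.10)))
summary(fit)

# 95% of Normal(0, 0.10) lies within d = +/- 1.96 * 0.10; converted to the
# correlation scale this is roughly +/- 0.098, matching the text above.
d_bounds <- c(-1.96, 1.96) * 0.10
d_bounds / sqrt(d_bounds^2 + 4)
```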

To specify these hypotheses in JASP, we open the “Models” section and adjust the parameter values under the “Effect” heading (for the alternative hypothesis) and under “Effect (Null)” (for the null hypothesis), which can be accessed by checking the “Set null priors” checkbox. The model settings are depicted in Figure 7. The model specification implemented in JASP allows researchers to specify any desired combination of hypotheses using different prior distributions (see the JASP help file, accessible under the (i) icon).⁹ The specified prior distributions under each hypothesis are then used to generate a combination of all possible models. These models are automatically used to draw inferences via the inclusion Bayes factor and are model-averaged to obtain parameter estimates.

⁹ For specifying more complex models, see the RoBMA R package manual and the “Fitting custom meta-analytic ensembles” vignette (https://fbartos.github.io/RoBMA/articles/CustomEnsembles.html).

Figure 7 Comparison Between a Perinull Hypothesis and an Informed Alternative Hypothesis as Applied to the Data from Lui (2015). Screenshot from the JASP RoBMA Module

Note. The left panel shows the prior settings specifying the informed alternative hypothesis (“Normal” distribution under “Effect”) and the perinull hypothesis (“Normal” distribution under “Effect (Null)”). The right panel shows the default output: the tests for effect, heterogeneity, and publication bias and model-averaged parameter estimates for the effect size, heterogeneity, and publication bias weights. A video showing this analysis is available at https://www.youtube.com/watch?v=BEMijDxQD2k.

The results indicate strong evidence against the perinull hypothesis, BF10 = 407.48, overwhelming evidence for heterogeneity, BF^rf = 5.08 × 10^6, and a lack of evidence for either the absence or the presence of publication bias, BF^ωω̄ = 1.25 (Figure 7). The model-averaged prior and posterior plot for the mean parameter, shown in Figure 8, indicates that most of the posterior probability mass is concentrated in the area predicted by the informed alternative hypothesis models.

Figure 8 The Model-Averaged Prior and Posterior Distribution for the Effect Size Parameter Estimate from Lui (2015) Using Customized Priors. Figure from JASP

[Figure: model-averaged prior (dashed grey) and posterior (solid black) densities over “ρ (averaged)”; y-axis “Density”.]

Note. The model-averaged prior distribution on effect size is depicted as the dashed grey line; the model-averaged posterior distribution is depicted as the solid black line.

Example Report

In the previous sections, we illustrated the various functionalities of the JASP RoBMA module for conducting publication bias adjusted meta-analyses. Here, we briefly demonstrate how to report the results of both selection models and Robust Bayesian Meta-Analysis, using our first example concerning the effect of “interracial” vs. same-race dyads on performance (Toosi et al., 2012). For more general reporting guidelines, see van Doorn et al. (in press).

First, we start with the selection models. Prior to data analysis, we decided to use significance level α = 0.10 for publication bias and α = 0.05 for heterogeneity and effect sizes. We estimated the two-sided selection models with p-value cut-offs set to (0.05, 0.10) and automatically joined p-value intervals. The models were estimated using correlations and sample sizes with the Cohen’s d effect size transformation. The p-value intervals were automatically reduced to the (0.025, 0.05, 0.95) cut-offs corresponding to one-sided p-values. The test for heterogeneity was significant, Q(54) = 170.77, p < .001. Therefore, we applied the test for publication bias assuming heterogeneity, which was significant as well, χ²(3) = 13.20, p = .005. Consequently, we interpreted the bias adjusted effect size estimate from the random effects model. The effect size was not statistically significant, r = −0.011, 95% CI [−0.075, 0.053], p = .734, with the heterogeneity estimate τ (on Cohen’s d scale) = 0.152, 95% CI [0.000, 0.227]. The resulting JASP file can be found at https://osf.io/6hf7r/.

Second, we re-analyzed the same data set using Robust Bayesian Meta-Analysis. Before the analysis, we decided to use the default prior settings (i.e., a standard normal distribution on effect sizes; an inverse gamma distribution with α = 1 and β = 0.15 on heterogeneity; two two-sided weight functions, with cut-points at (0.05) and (0.05, 0.10) and parameters α = (1, 1) and (1, 1, 1); and the default point priors on the null hypotheses). We set the prior hypothesis probability to 0.50 for the effect size, heterogeneity, and publication bias. The models were estimated using correlations and sample sizes with the Cohen’s d effect size transformation. The results did not indicate evidence for either the presence or absence of the effect, BF10 = 2.01; they indicated strong evidence for heterogeneity, BF^rf = 108.52; and they indicated strong evidence for publication bias, BF^ωω̄ = 280.74. The resulting model-averaged effect size estimate was r = 0.032, 95% CI [0.000, 0.083], with the heterogeneity estimate τ (on Cohen’s d scale) = 0.175, 95% CI [0.078, 0.291]. The MCMC diagnostics were good, with all Rhat values below 1.01 and all ESS above 800.

Concluding Comments

In this paper, we introduced two approaches to adjust for publication bias in meta-analysis, both implemented in JASP. First, we discussed frequentist selection models, which have been demonstrated to work well even under high heterogeneity. In selection models, we first test for heterogeneity and publication bias and then select the appropriate model. Second, we discussed RoBMA, Robust Bayesian Meta-Analysis. In contrast to the frequentist selection models, RoBMA does not select a single model. Instead, it keeps all models in play and weights their impact on effect size estimation according to their posterior probability.

The publication bias adjusted meta-analysis module in JASP does not incorporate popular methods such as Egger’s regression (Egger et al., 1997) or p-curve (Simonsohn et al., 2014). The reason for this choice is that simulations, as well as analyses of existing data, suggest that publication bias adjustment based on selection models is the most promising approach (Carter et al., 2019; Maier et al., 2020; McShane et al., 2016; Renkewitz & Keiner, 2019). However, in future research it could be interesting to add alternative models of publication bias to RoBMA (e.g., a correlation between effect size and standard error, as in Egger’s regression; Egger et al., 1997). This would allow researchers to compare different models of publication bias directly on empirical data.

To conclude, the publication bias adjusted meta-analysis module in JASP allows researchers without programming experience to conduct state-of-the-art publication bias corrected meta-analyses in an intuitive and user-friendly way. We hope that this methodology will improve the inferences researchers make when conducting meta-analyses.

References

Bartoš, F., & Maier, M. (2020). RoBMA: An R package for robust Bayesian meta-analyses [R package version 1.0.5]. https://CRAN.R-project.org/package=RoBMA

Bartoš, F., & Schimmack, U. (2020). Z-curve 2.0: Estimating replication rates and discovery rates. https://doi.org/10.31234/osf.io/urgtn

Berger, J. O., & Delampady, M. (1987). Testing precise hypotheses. Statistical Science, 317–335.

Borenstein, M., Hedges, L., Higgins, J., & Rothstein, H. (2009). Publication bias. In M. Borenstein (Ed.), Introduction to meta-analysis (pp. 277–292). Wiley.

Brunner, J., & Schimmack, U. (2020). Estimating population mean power under conditions of heterogeneity and selection for significance. Meta-Psychology, 4. https://doi.org/10.15626/MP.2018.874

Carter, E. C., & McCullough, M. E. (2014). Publication bias and the limited strength model of self-control: Has the evidence for ego depletion been overestimated? Frontiers in Psychology, 5, Article 823. https://doi.org/10.3389/fpsyg.2014.00823

Carter, E. C., Schönbrodt, F. D., Gervais, W. M., & Hilgard, J. (2019). Correcting for bias in psychology: A comparison of meta-analytic methods. Advances in Methods and Practices in Psychological Science, 2(2), 115–144. https://doi.org/10.1177/2515245919847196

Coburn, K. M., Vevea, J. L., & Coburn, M. K. M. (2019). Package ‘weightr’ [R package version 2.0.2]. https://cran.r-project.org/web/packages/weightr/weightr.pdf

Cornfield, J. (1966). A Bayesian test of some classical hypotheses—with applications to sequential clinical trials. Journal of the American Statistical Association, 61(315), 577–594. https://doi.org/10.1080/01621459.1966.10480890

Duval, S., & Tweedie, R. (2000). Trim and fill: A simple funnel-plot–based method of testing and adjusting for publication bias in meta-analysis. Biometrics, 56(2), 455–463. https://doi.org/10.1111/j.0006-341X.2000.00455.x

Egger, M., Smith, G. D., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. BMJ, 315(7109), 629–634. https://doi.org/10.1136/bmj.315.7109.629

Etz, A., & Wagenmakers, E.-J. (2017). J. B. S. Haldane’s contribution to the Bayes factor hypothesis test. Statistical Science, 32, 313–329. https://doi.org/10.1214/16-STS599

Gelman, A., & Carlin, J. (2014). Beyond power calculations: Assessing Type S (sign) and Type M (magnitude) errors. Perspectives on Psychological Science, 9(6), 641–651. https://doi.org/10.1177/1745691614551642

Gelman, A., Rubin, D. B., et al. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–472.

George, E. I., & McCulloch, R. E. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88(423), 881–889. https://doi.org/10.1080/01621459.1993.10476353

Good, I. J. (1967). A Bayesian significance test for multinomial distributions. Journal of the Royal Statistical Society: Series B (Methodological), 29(3), 399–418. https://doi.org/10.1111/j.2517-6161.1967.tb00705.x

Gronau, Q. F., Ly, A., & Wagenmakers, E.-J. (2020). Informed Bayesian t-tests. The American Statistician, 74, 137–143. https://doi.org/10.1080/00031305.2018.1562983

Hinne, M., Gronau, Q. F., van den Bergh, D., & Wagenmakers, E.-J. (2020). A conceptual introduction to Bayesian model averaging. Advances in Methods and Practices in Psychological Science, 3(2), 200–215. https://doi.org/10.1177/2515245919898657

Hoeting, J. A., Madigan, D., Raftery, A. E., & Volinsky, C. T. (1999). Bayesian model averaging: A tutorial. Statistical Science, 14(4), 382–401.

Ioannidis, J. P. (2005). Why most published research findings are false. PLOS Medicine, 2(8), e124. https://doi.org/10.1371/journal.pmed.0020124

Iyengar, S., & Greenhouse, J. B. (1988). Selection models and the file drawer problem. Statistical Science, 3(1), 109–117.

JASP Team. (2020). JASP (Version 0.14) [Computer software]. https://jasp-stats.org/

Jeffreys, H. (1961). Theory of probability (3rd ed.). Oxford University Press.

Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773–795. https://doi.org/10.1080/01621459.1995.10476572

Keysers, C., Gazzola, V., & Wagenmakers, E.-J. (2020). Using Bayes factor hypothesis testing in neuroscience to establish evidence of absence. Nature Neuroscience, 23, 788–799. https://doi.org/10.1038/s41593-020-0660-4

Larose, D. T., & Dey, D. K. (1998). Modeling publication bias using weighted distributions in a Bayesian framework. Computational Statistics & Data Analysis, 26(3), 279–302. https://doi.org/10.1016/S0167-9473(97)00039-X

Lau, J., Ioannidis, J. P., Terrin, N., Schmid, C. H., & Olkin, I. (2006). The case of the misleading funnel plot. BMJ, 333(7568), 597–600. https://doi.org/10.1136/bmj.333.7568.597

Lui, P. P. (2015). Intergenerational cultural conflict, mental health, and educational outcomes among Asian and Latino/a Americans: Qualitative and meta-analytic review. Psychological Bulletin, 141(2), 404–446.

Maier, M., Bartoš, F., & Wagenmakers, E.-J. (2020). Robust Bayesian meta-analysis: Addressing publication bias with model-averaging. https://doi.org/10.31234/osf.io/u4cns

McElreath, R. (2020). Statistical rethinking: A Bayesian course with examples in R and Stan (2nd ed.). CRC Press.

McShane, B. B., Böckenholt, U., & Hansen, K. T. (2016). Adjusting for publication bias in meta-analysis: An evaluation of selection methods and some cautionary notes. Perspectives on Psychological Science, 11(5), 730–749. https://doi.org/10.1177/1745691616662243

Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46(4), 806–834. https://doi.org/10.1037/0022-006X.46.4.806

Orben, A., & Lakens, D. (2020). Crud (re)defined. Advances in Methods and Practices in Psychological Science, 3(2), 238–247. https://doi.org/10.1177/2515245920917961

Plummer, M. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In Proceedings of the 3rd International Workshop on Distributed Statistical Computing. Vienna, Austria.

Plummer, M., Best, N., Cowles, K., & Vines, K. (2019). Package ‘coda’.

Renkewitz, F., & Keiner, M. (2019). How to detect publication bias in psychological research. Zeitschrift für Psychologie, 227(4), 261–279. https://doi.org/10.1027/2151-2604/a000386

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638. https://doi.org/10.1037/0033-2909.86.3.638

Rothstein, H. R., Sutton, A. J., & Borenstein, M. (2005). Publication bias in meta-analysis. John Wiley & Sons.

Rouder, J. N. (2014). Optional stopping: No problem for Bayesians. Psychonomic Bulletin & Review, 21(2), 301–308. https://doi.org/10.3758/s13423-014-0595-4

Rouder, J. N., & Morey, R. D. (2011). A Bayes factor meta-analysis of Bem’s ESP claim. Psychonomic Bulletin & Review, 18(4), 682–689. https://doi.org/10.3758/s13423-011-0088-7

Rouder, J. N., & Morey, R. D. (2019). Teaching Bayes’ theorem: Strength of evidence as predictive accuracy. The American Statistician, 73(2), 186–190. https://doi.org/10.1080/00031305.2017.1341334

Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve: A key to the file-drawer. Journal of Experimental Psychology: General, 143(2). https://doi.org/10.1037/a0033242

Stanley, T. D., Carter, E. C., & Doucouliagos, H. (2018). What meta-analyses reveal about the replicability of psychological research. Psychological Bulletin, 144(12), 1325–1346. https://doi.org/10.1037/bul0000169

Stanley, T. D., & Doucouliagos, H. (2017). Neither fixed nor random: Weighted least squares meta-regression. Research Synthesis Methods, 8(1), 19–42. https://doi.org/10.1002/jrsm.1211

ter Schure, J., & Grünwald, P. (2019). Accumulation bias in meta-analysis: The need to consider time in error control. F1000Research, 8, Article 962. https://doi.org/10.12688/f1000research.19375.1

Toosi, N. R., Babbitt, L. G., Ambady, N., & Sommers, S. R. (2012). Dyadic interracial interactions: A meta-analysis. Psychological Bulletin, 138(1), 1–27. https://doi.org/10.1037/a0025767

van Doorn, J., van den Bergh, D., Böhm, U., Dablander, F., Derks, K., Draws, T., Etz, A., Evans, N. J., Gronau, Q. F., Hinne, M., Kucharský, Š., Ly, A., Marsman, M., Matzke, D., Komarlu Narendra Gupta, A. R., Sarafoglou, A., Stefan, A., Voelkel, J. G., & Wagenmakers, E.-J. (in press). The JASP guidelines for conducting and reporting a Bayesian analysis. Psychonomic Bulletin & Review. https://psyarxiv.com/yqxfr

van Erp, S., Verhagen, J., Grasman, R. P., & Wagenmakers, E.-J. (2017). Estimates of between-study heterogeneity for 705 meta-analyses reported in Psychological Bulletin from 1990–2013. Journal of Open Psychology Data, 5(1), Article 4. https://doi.org/10.5334/jopd.33

Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P.-C. (2019). Rank-normalization, folding, and localization: An improved R̂ for assessing convergence of MCMC.

Vevea, J. L., & Hedges, L. V. (1995). A general linear model for estimating effect size in the presence of publication bias. Psychometrika, 60(3), 419–435. https://doi.org/10.1007/BF02294384

Wagenmakers, E.-J., Morey, R. D., & Lee, M. D. (2016). Bayesian benefits for the pragmatic researcher. Current Directions in Psychological Science, 25(3), 169–176. https://doi.org/10.1177/0963721416643289

Wrinch, D., & Jeffreys, H. (1921). On certain fundamental principles of scientific inquiry. Philosophical Magazine, 42, 369–390. https://doi.org/10.1080/14786442108633773