Adjusting for Publication Bias in JASP — Selection Models and Robust Bayesian Meta-Analysis
František Bartoš¹,²∗, Maximilian Maier¹∗, & Eric-Jan Wagenmakers¹
¹ University of Amsterdam
² Charles University, Prague
∗Both authors contributed equally.
Correspondence concerning this article should be addressed to: František Bartoš, University of Amsterdam, Department of Psychological Methods, Nieuwe Achtergracht 129-B, 1018 VZ Amsterdam, The Netherlands, [email protected]
Author Note
This project was supported in part by a Vici grant (#016.Vici.170.083) to EJW.
Abstract
Meta-analysis is essential for cumulative science, but its validity is compromised by publication bias. To mitigate the impact of publication bias, one may apply selection models, which estimate the degree to which non-significant studies are suppressed. Implemented in JASP, these methods allow researchers without programming experience to conduct state-of-the-art publication bias adjusted meta-analyses. In this tutorial, we demonstrate how to conduct a publication bias adjusted meta-analysis in JASP and how to interpret the results. First, we explain how frequentist selection models correct for publication bias. Second, we introduce Robust Bayesian Meta-Analysis (RoBMA), a Bayesian extension of the frequentist selection models. We illustrate the methodology with two data sets and discuss the interpretation of the results. In addition, we include example text to provide concrete guidance on reporting the meta-analytic results in an academic article. Finally, three tutorial videos are available at https://tinyurl.com/y4g2yodc.
Keywords: Selection Models, Robust Bayesian Meta-Analysis, Model-Averaging, Publication Bias
Adjusting for Publication Bias in JASP — Selection Models and Robust Bayesian Meta-Analysis
Meta-analyses are a powerful tool for evidence synthesis. However, publication bias, the preferential publishing of significant studies, leads to an overestimation of effect sizes when accumulating evidence across a set of primary studies. Some researchers claim that most research findings might never be published but remain in researchers' file drawers (e.g., Ioannidis, 2005; Rosenthal, 1979). Even if the true extent of publication bias were less severe than these researchers have suggested, it would remain a formidable threat to the validity of meta-analyses (Borenstein et al., 2009).
To alleviate this problem and explicitly account for publication bias, a variety of statistical methods have been proposed (e.g., Carter & McCullough, 2014; Duval & Tweedie, 2000; Egger et al., 1997; Iyengar & Greenhouse, 1988; Simonsohn et al., 2014; Stanley & Doucouliagos, 2017). However, simulations have shown that most methods perform poorly under heterogeneity (Carter et al., 2019; Maier et al., 2020; McShane et al., 2016; Renkewitz & Keiner, 2019) or do not provide meta-analytic estimates (Bartoš & Schimmack, 2020; Brunner & Schimmack, 2020). Heterogeneity occurs whenever the individual studies are not exact replications of each other and the true effect size varies across primary studies, which is usually the case in psychology (e.g., McShane et al., 2016). Under high heterogeneity, most tests for publication bias have either high false-positive rates or low power; in addition, the associated meta-analytic effect size estimates are biased or highly variable. An exception to this rule is selection models: these models provide an explicit account of a p-value based publication bias process and adjust the estimate of effect size accordingly. Selection models perform well even under high heterogeneity (Carter et al., 2019; Maier et al., 2020; McShane et al., 2016). However, despite their strong performance in simulations, selection models are rarely used in practice. Their relative obscurity is arguably due to the perception that selection models are overly complicated¹ and to the fact that they have, to the best of our knowledge, not yet been implemented in statistical software packages with a graphical user interface (GUI), limiting their accessibility for applied researchers.

¹ For instance, Rothstein et al. (2005, p. 172) remark that "Weight function models are complex and involve a substantial amount of computation. Thus they are unlikely to be used routinely in meta-analysis".
To make selection models more readily available to applied researchers, we implemented these models in the open-source statistical program JASP (JASP Team, 2020), as part of the Meta-Analysis module. The implementation provides an intuitive graphical interface for the R packages weightr (Coburn et al., 2019) and RoBMA (Bartoš & Maier, 2020), which allow users to fit either frequentist or Bayesian selection models. Below, we first provide a conceptual introduction to frequentist selection models and show how to fit these models in JASP. Second, we introduce a Bayesian selection method, Robust Bayesian Meta-Analysis (RoBMA; Maier et al., 2020), and show how it can overcome several limitations that are inherent to frequentist selection models. We explain how to interpret the results using two examples: a meta-analysis on the influence of "interracial dyads" on performance and observed behavior (Toosi et al., 2012) and a meta-analysis on acculturation mismatch (Lui, 2015). We also provide an example results section that describes the application of both frequentist and Bayesian selection models to the meta-analysis on the influence of "interracial dyads". Finally, we recorded tutorial videos to further facilitate the application of the implemented methods. The videos are available at https://tinyurl.com/y4g2yodc.
Frequentist Selection Models
Selection models use weighted likelihood to account for studies that are missing due to publication bias. Selection models are well established amongst statisticians (e.g., Iyengar & Greenhouse, 1988; Larose & Dey, 1998; Vevea & Hedges, 1995) and can accommodate realistic assumptions regarding publication bias and heterogeneity. In selection models, analysts specify p-value intervals with different assumed publication probabilities, for example, "statistically significant" p-values (p < .05) versus non-significant p-values (p > .05). The models typically use maximum likelihood to obtain a bias-adjusted pooled point estimate, accounting for the relative publication probabilities in each interval (called weights) through the weighted likelihood function. Selection models can accommodate effect size heterogeneity by extending random effects models (McShane et al., 2016; Rothstein et al., 2005, pp. 145–174; Vevea & Hedges, 1995).
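To make the weighted likelihood concrete, the following R sketch implements the core of a fixed-effect selection model in the spirit of Vevea and Hedges (1995): each study's normal density is multiplied by the weight of its p-value interval and renormalized. This is a minimal illustration under stated assumptions (one-sided p-values, our own function and variable names, illustrative cutoffs), not the weightr implementation itself.

```r
# Minimal sketch of the weighted likelihood behind selection models
# (in the spirit of Vevea & Hedges, 1995). Assumes a fixed-effect model,
# one-sided p-values p = 1 - pnorm(y / s), and ascending p-value cutoffs.
neg_loglik <- function(par, y, s, cutoffs) {
  mu    <- par[1]
  omega <- c(1, exp(par[-1]))  # weights; first interval fixed to 1 for identification
  crit  <- qnorm(1 - cutoffs)  # p-value cutoffs as z-score thresholds on y / s
  ll <- 0
  for (i in seq_along(y)) {
    p_i <- 1 - pnorm(y[i] / s[i])
    j   <- findInterval(p_i, cutoffs) + 1  # which p-value interval study i fell in
    # probability of each interval under N(mu, s_i^2), needed to renormalize
    bounds <- c(Inf, crit * s[i], -Inf)    # interval bounds on the observed-effect scale
    probs  <- pnorm(head(bounds, -1), mu, s[i]) - pnorm(bounds[-1], mu, s[i])
    ll <- ll + log(omega[j]) + dnorm(y[i], mu, s[i], log = TRUE) - log(sum(omega * probs))
  }
  -ll
}
# e.g., optim(c(0, 0), neg_loglik, y = y, s = s, cutoffs = .025) for two intervals
```

Maximizing this likelihood jointly estimates the pooled effect and the relative publication probabilities; the random effects extension adds a heterogeneity variance to each study's sampling variance.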
Selection models can be specified flexibly in several ways. First, researchers can choose between one-sided and two-sided selection. One-sided selection means that only significant effects in the expected direction are more likely to be published. Commonly, significant positive effects are more likely to be published, although in some cases significant negative effect sizes might also be more likely to be published; researchers can specify the direction of selection flexibly. Two-sided selection means that the probability of publication does not depend on the direction of the effect; in other words, positive and negative effects have the same probability of being published, given that they fall in the same p-value interval.
Second, researchers may also specify different intervals with different publication probabilities. For example, to account for the possibility that marginally significant results (.05 < p < .10) are more likely to be published than non-significant results, researchers could specify a third interval. Note that, when the observed effect is in the predicted direction, a marginally significant result under a two-sided test is significant under a one-sided test. Therefore, a two-sided selection process with different publication probabilities for significant versus "marginally significant" findings accommodates a one-sided selection process in which the publication probability depends on whether or not the p-value is statistically significant.
Example 1: Dyadic Interracial Interactions and Performance
Toosi et al. (2012) conducted a meta-analysis on the effect of "interracial" interactions on positive attitudes, negative affect, nonverbal behavior, and "objective measures of performance" (Toosi et al., 2012, p. 1). The meta-analysis compared dyadic same-race versus "interracial" interactions. A standard reanalysis confirms that "performance" was slightly better in same-race dyads than in different-race dyads, r = 0.070, 95% CI [0.023, 0.117], p = .004, τ (on Cohen's d scale) = 0.289, 95% CI [0.173, 0.370].² Toosi et al. (2012) applied Egger's regression (Egger et al., 1997) and reported a lack of funnel plot asymmetry, suggesting that the data set is not contaminated by publication bias. However, funnel plot based methods to assess publication bias have repeatedly been criticized for having low power and generating a high proportion of false positives, especially under heterogeneity (e.g., Lau et al., 2006; Maier et al., 2020). We therefore revisit the question of publication bias by reanalyzing this study using selection models in JASP.

² Our result is similar to that reported by Toosi et al. (2012), namely r = 0.070, 95% CI [0.03, 0.11], I² = 67.29%. For the re-analysis, we used the data set as recoded by Stanley et al. (2018), accessible at https://osf.io/2vfyj/files/.
Figure 1
Results from Toosi et al. (2012) Using the Default Settings of JASP Selection Models
Note. Screenshot from the JASP graphical user interface when analyzing the data of Toosi et al. (2012). The analysis settings are specified in the left panel and the associated output is shown in the right panel. The default output concerns (1) a test of heterogeneity; (2) a test of publication bias; (3) adjusted and unadjusted effect size estimates for fixed effects and random effects models. A video tutorial for this analysis is available at https://www.youtube.com/watch?v=mswvm5Ne0eg&t=2s.
We start by loading the data set (available at https://osf.io/6hf7r/, including the annotated .jasp file) into JASP. We then open the module menu (by navigating to the "+" sign in the top right corner of the main ribbon) and activate the Meta-Analysis module. In this module, we choose the "Selection Models" option. The left panel of Figure 1 provides an overview of the resulting GUI for the analysis input options. We set the radio button that determines the input to "Correlation & N", allowing us to supply effect sizes measured as correlations. The analysis internally transforms the correlations to Cohen's d effect sizes, estimates the selection models, and transforms the results back into correlations (apart from the heterogeneity estimate τ). We place the variables containing the effect sizes (ES) and sample sizes (N) into the appropriate variable boxes.³ By default, the analysis assumes a two-sided selection process with p-value cutoffs .05 and .10.⁴ If there are too few studies in one of the specified p-value intervals, the selection model fails to estimate the publication probability. Using the default settings, the JASP implementation tries to circumvent this problem by automatically joining p-value intervals that contain fewer than four p-values (all of these options can be changed in the "Model" tab; see the left panel of Figure 1).
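Readers who prefer R can approximate the same default analysis with the weightr package that powers this part of JASP. A sketch under stated assumptions: the data columns are hypothetical, the correlation-to-Cohen's d conversion uses one common delta-method approximation (JASP's internal transformation may differ in detail), and the one-sided cutoffs follow footnote 4.

```r
# Approximate replication of the default JASP selection model in R,
# using the weightr package (Coburn et al., 2019).
library(weightr)

r <- dat$ES                     # hypothetical column of correlations
n <- dat$N                      # hypothetical column of sample sizes
d <- 2 * r / sqrt(1 - r^2)      # correlations -> Cohen's d
v <- 4 / ((n - 1) * (1 - r^2))  # one common approximation to the variance of d

# Two-sided cutoffs (.05, .10) expressed as one-sided cutoffs (see footnote 4);
# weightfunct() reports both the unadjusted and the adjusted model.
fit <- weightfunct(effect = d, v = v, steps = c(0.025, 0.05, 0.95, 1))
fit
```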
The right panel of Figure 1 shows the default output provided by JASP. Under the output tables, the note "Only the following one-sided p-value cutoffs were used: 0.025, 0.05, 0.95." informs us that some of the specified p-value intervals did not contain enough p-values for estimation and were therefore collapsed. The upper table in the right panel shows that there is statistically significant heterogeneity between studies, Q(54) = 170.77, p < .001. The table underneath shows the results for two tests of publication bias, one under homogeneity and one under heterogeneity. The test that assumes heterogeneity indicates the presence of statistically significant publication bias, χ²(3) = 13.20, p = .005. The two lower output tables show the results for fixed effects and random effects estimates, with and without correction for publication bias. The tests shown in the upper two tables suggest that the best estimate of effect size is obtained from the random effects table (because of the presence of heterogeneity) and, within that table, from the row "Adjusted" (because of the presence of publication bias). The adjusted mean estimate is no longer statistically significant, r = −0.011, 95% CI [−0.075, 0.053], p = .734. A visual comparison of the mean estimates from the different models can be obtained using the "Mean Model Estimates" option in the "Plots" section (i.e., Figure 2). The adjusted heterogeneity estimate is τ (on Cohen's d scale) = 0.152, 95% CI [0.000, 0.227] (not shown here; these results can be obtained by checking "Estimated Heterogeneity" under the "Random Effects" column in the "Model" section).

³ The "p-value (one-sided)" input is optional; if not provided, the analysis automatically computes the p-values using a z approximation in the case of "Effect sizes & SE" input, or from t statistics and degrees of freedom computed from the specified correlations and sample sizes when "Correlations & N" is used.
⁴ The weightr package does not natively support two-sided selection with two-sided p-values; therefore, the specified two-sided p-value cutoffs are internally transformed into one-sided p-value cutoffs. Consequently, the resulting p-value intervals are reported on the one-sided p-value scale.
Figure 2
Comparison of Effect Size Estimates of the Adjusted/Unadjusted and Fixed/Random Effects Models from Toosi et al. (2012). Figure from JASP
[Forest-style plot of the mean estimates (ρ) for the fixed effects, fixed effects (adjusted), random effects, and random effects (adjusted) models.]
Note. After adjusting for publication bias (second and fourth estimate from the top), the estimated effect sizes are no longer statistically significant.
Limitations of Frequentist Selection Models
Frequentist selection models have several shortcomings. Before we discuss these, it is important to reiterate that selection models are the only frequentist meta-analytic method that performs well under both publication bias and heterogeneity (Carter et al., 2019; Maier et al., 2020; McShane et al., 2016). Therefore, opting for a different frequentist method with less visible limitations would usually result in worse inferences.
The first limitation is that frequentist hypothesis tests are based on binary accept/reject decisions. When the number of primary studies is small, selection models might have insufficient power, compromising the reliability of the accept/reject decisions. This is particularly worrisome since the differences between the publication bias adjusted and unadjusted estimates can be considerable. To illustrate this, we turn to another example. Lui (2015) studied how acculturation mismatch (AM), the result of the contrast between the collectivist cultures of Asian and Latin immigrant groups, correlates with intergenerational cultural conflict (ICC). Lui (2015) meta-analyzed 18 independent studies correlating AM with ICC. A standard reanalysis indicates a significant effect of AM on increased ICC, r = 0.250, p < .001.⁵ We reanalyze the data with selection models, using the same default settings as before. The test for heterogeneity is significant, Q(17) = 75.50, p < .001. The test for publication bias assuming heterogeneity is significant only when using α = 0.10, as advocated by Renkewitz and Keiner (2019), χ²(1) = 3.11, p = .078. However, proponents of the effect may advocate the more stringent criterion of α = 0.05. This stricter significance level would lead researchers to conclude that they do not need to adjust the estimate for publication bias. This decision has considerable impact on the results: the adjusted effect size estimate is non-significant at the .05 level (i.e., r = 0.159, p = .055), whereas the unadjusted estimate is significant (i.e., r = 0.250, p < .001).

⁵ This result is close to that reported by Lui (2015), namely r = 0.23. For the re-analysis, we used the data set as recoded by Stanley et al. (2018), accessible at https://osf.io/2vfyj/files/.
The second limitation also relates to small sample sizes. If we consider the example outlined in the previous paragraph in greater detail, we discover that the automatic p-value interval collapsing leaves us with only one p-value cutoff, .025 (corresponding to one-sided p-values). This collapsing prevents possible estimation issues that arise when only a few primary studies fall in some of the p-value intervals. There is no clear solution to this problem from the frequentist standpoint. A possible solution is to try again with different p-value intervals. However, this makes the analysis data-dependent and might not always be possible (e.g., when there are almost exclusively significant studies). Furthermore, it can also lead to different results. If we automatically collapse the p-value intervals, as in the previous example, the estimated effect size from the publication bias adjusted random effects model is significant at the α = .10 level, r = 0.159, p = .055. However, not collapsing the p-value intervals would have resulted in a non-significant test for publication bias, χ²(4) = 5.03, p = .284, and the subsequent selection of the original, significant effect size estimate from the non-adjusted random effects model.
A third limitation is that a non-significant test for publication bias is often not very informative. From a frequentist point of view, failing to reject the null hypothesis does not imply that there is evidence in its favor. In other words, frequentist methods cannot distinguish between absence of evidence (i.e., the data are uninformative) and evidence of absence (i.e., the data support the null hypothesis; Keysers et al., 2020; Wagenmakers et al., 2016). This problem is related to the earlier example, in which it was unclear whether non-significance at the .05 level indicated evidence of absence or absence of evidence regarding publication bias.
A fourth limitation is accumulation bias (ter Schure & Grünwald, 2019). Consider meta-analyzing k primary studies with a frequentist selection model. At a later point in time, an additional study k + 1 becomes available, and researchers want to add this study to the set and update the analysis. For frequentist methods, this introduces the problem of multiple testing. To avoid accumulation bias, the sampling plan would need to be known in advance. However, since researchers usually conduct meta-analyses on available data collected by others, accumulation bias is all but inevitable.
To overcome these limitations, we developed Robust Bayesian Meta-Analysis (RoBMA; Maier et al., 2020). In the next paragraphs, we explain RoBMA conceptually and show how it alleviates the shortcomings of frequentist selection models. In addition, we illustrate the workings of the JASP implementation in practice.
Robust Bayesian Meta-Analysis
RoBMA is a Bayesian extension of selection models. It allows researchers to estimate different models simultaneously and to base the results on a weighted combination of their estimates. The models can generally be divided into three qualitatively different pairs:

1. Models assuming the null hypothesis to be true versus models assuming the alternative hypothesis to be true (i.e., H_0 vs. H_1).

2. Models assuming fixed effects versus models assuming random effects (i.e., H^f vs. H^r).

3. Models assuming publication bias (i.e., selection models) versus models assuming no publication bias (i.e., H^ω vs. H^ω̄).
Combining these assumptions results in 2 × 2 × 2 = 8 different model types. The default RoBMA implementation in JASP provides inference based on all eight of these model types simultaneously. In the default setting, as used in Maier et al. (2020), all model types are deemed equally likely a priori. In addition, each model type assuming publication bias contains two models: one assuming a two-sided selection process with two steps and one assuming a two-sided selection process with three steps. Table 1 displays an overview of the individual models, including all prior distributions and prior model probabilities. This table is automatically generated in JASP before the start of the fitting process.
Table 1
Models Overview for the Default Settings in the JASP Implementation of RoBMA. Table from JASP

 #   Effect Size               Heterogeneity               Publication Bias                     P(M)
 1   Spike(0)                  Spike(0)                    Spike(1)                             0.125
 2   Spike(0)                  Spike(0)                    Two-sided((0.05), (1, 1))            0.063
 3   Spike(0)                  Spike(0)                    Two-sided((0.1, 0.05), (1, 1, 1))    0.063
 4   Spike(0)                  InvGamma(1, 0.15)[0, Inf]   Spike(1)                             0.125
 5   Spike(0)                  InvGamma(1, 0.15)[0, Inf]   Two-sided((0.05), (1, 1))            0.063
 6   Spike(0)                  InvGamma(1, 0.15)[0, Inf]   Two-sided((0.1, 0.05), (1, 1, 1))    0.063
 7   Normal(0, 1)[-Inf, Inf]   Spike(0)                    Spike(1)                             0.125
 8   Normal(0, 1)[-Inf, Inf]   Spike(0)                    Two-sided((0.05), (1, 1))            0.063
 9   Normal(0, 1)[-Inf, Inf]   Spike(0)                    Two-sided((0.1, 0.05), (1, 1, 1))    0.063
10   Normal(0, 1)[-Inf, Inf]   InvGamma(1, 0.15)[0, Inf]   Spike(1)                             0.125
11   Normal(0, 1)[-Inf, Inf]   InvGamma(1, 0.15)[0, Inf]   Two-sided((0.05), (1, 1))            0.063
12   Normal(0, 1)[-Inf, Inf]   InvGamma(1, 0.15)[0, Inf]   Two-sided((0.1, 0.05), (1, 1, 1))    0.063

Consider the models in Table 1 in more detail. The first six models, with effect size prior "Spike(0)", are the null models that assume no effect. Models seven to twelve assume the presence of an effect, with a (non-truncated) standard normal prior distribution ("Normal(0, 1)"). Regarding heterogeneity, we again use a spike at zero for the null models and an inverse gamma distribution for the models assuming heterogeneity to be present (van Erp et al., 2017). For publication bias, we distinguish between three types of models. First, we include models that assume no publication bias (the publication probability is equal to 1, i.e., "Spike(1)"). Second, we include publication bias models that assume two-sided selection on two-sided p-values based on significance. Here the cumulative sum of the Dirichlet distribution with parameters (1, 1) is used as a prior on the publication bias weights (i.e., "Two-sided((0.05), (1, 1))"). We use the cumulative sum to induce ordinality (the "more significant" the study, the more likely it is to be published) and to assume that significant studies are always published. Third, we include models that also assume two-sided p-values and two-sided selection but distinguish between "marginally significant" and non-significant studies ("Two-sided((0.1, 0.05), (1, 1, 1))"). The prior probability is split equally across the different model pairs (i.e., effect size, heterogeneity, publication bias), resulting in a probability of 0.125 for each model. Since there are two models assuming publication bias, we split the prior probability again for those two models, resulting in 0.0625 (rounded in the JASP output) for each of them. However, we want to point out that these are only the default priors, and researchers can specify different priors if they like. Those who are not interested in testing a null hypothesis that assumes the effect to be zero (e.g., McElreath, 2020) can also specify different models as the null hypothesis, or not specify a null hypothesis at all (see Example 3).
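The same ensemble can be fit outside the GUI with the RoBMA package. The sketch below assumes the interface of RoBMA version 1.0.5 as cited in this paper (the r and n arguments and the default priors of Table 1); the data columns are hypothetical, and the argument names should be checked against the package manual.

```r
# Fitting the default twelve-model ensemble of Table 1 with the RoBMA
# package (assuming the version 1.0.5 interface cited in this paper).
library(RoBMA)

fit <- RoBMA(r = dat$ES,  # hypothetical column of correlations
             n = dat$N,   # hypothetical column of sample sizes
             seed = 1)    # fixed seed for exact reproducibility
summary(fit)              # tests for effect, heterogeneity, and publication bias,
                          # plus model-averaged estimates
```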
The models are then updated according to Bayes' rule. In other words, models that predict the data well receive a boost in posterior probability, while models that predict the data poorly suffer a decline. When comparing only two models, we can describe their relative predictive performance using the Bayes factor (Etz & Wagenmakers, 2017; Jeffreys, 1961; Kass & Raftery, 1995; Rouder & Morey, 2019; Wrinch & Jeffreys, 1921). The Bayes factor equals the change from prior odds to posterior odds; assuming that H_1^{rω} and H_1^{rω̄} are equally likely a priori, the posterior odds equal the Bayes factor. Equation 1 shows the Bayes factor for the example of comparing the model assuming publication bias to the model assuming no publication bias, where both models assume the presence of an effect and the presence of heterogeneity.
$$
\underbrace{\frac{p(\mathrm{data} \mid H_1^{r\omega})}{p(\mathrm{data} \mid H_1^{r\bar{\omega}})}}_{\text{Bayes factor}}
=
\underbrace{\frac{p(H_1^{r\omega} \mid \mathrm{data})}{p(H_1^{r\bar{\omega}} \mid \mathrm{data})}}_{\text{Posterior odds}}
\Bigg/
\underbrace{\frac{p(H_1^{r\omega})}{p(H_1^{r\bar{\omega}})}}_{\text{Prior odds}}
\tag{1}
$$
More than two models can be compared using the "inclusion Bayes factor". This Bayes factor allows researchers to quantify the evidence for a meta-analytic effect, the evidence for heterogeneity, and the evidence for publication bias. When comparing the models assuming publication bias to the models assuming no publication bias, the inclusion Bayes factor can be calculated as in Equation 2.
$$
\underbrace{\mathrm{BF}^{\omega\bar{\omega}}}_{\substack{\text{Inclusion Bayes factor}\\ \text{for publication bias}}}
=
\underbrace{\frac{p(H_1^{f\omega} \mid \mathrm{data}) + p(H_1^{r\omega} \mid \mathrm{data}) + p(H_0^{f\omega} \mid \mathrm{data}) + p(H_0^{r\omega} \mid \mathrm{data})}
{p(H_1^{f\bar{\omega}} \mid \mathrm{data}) + p(H_1^{r\bar{\omega}} \mid \mathrm{data}) + p(H_0^{f\bar{\omega}} \mid \mathrm{data}) + p(H_0^{r\bar{\omega}} \mid \mathrm{data})}}_{\text{Posterior inclusion odds}}
\Bigg/
\underbrace{\frac{p(H_1^{f\omega}) + p(H_1^{r\omega}) + p(H_0^{f\omega}) + p(H_0^{r\omega})}
{p(H_1^{f\bar{\omega}}) + p(H_1^{r\bar{\omega}}) + p(H_0^{f\bar{\omega}}) + p(H_0^{r\bar{\omega}})}}_{\text{Prior inclusion odds}}
\tag{2}
$$
In other words, the inclusion Bayes factor for publication bias is obtained by contrasting the predictive accuracy of all models that assume publication bias with that of all models that assume no publication bias. The inclusion Bayes factors for effect size and heterogeneity can be calculated analogously. One advantage of Bayes factors is that they can distinguish between absence of evidence and evidence of absence. In addition, they quantify evidence on a continuous scale and can be updated sequentially as studies accumulate.
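Because the inclusion Bayes factor is nothing more than a ratio of summed model probabilities, it is easy to compute by hand. A small sketch with made-up posterior probabilities for the eight model types (ordered arbitrarily; the even indices play the role of the models assuming publication bias):

```r
# Inclusion Bayes factor as in Equation 2: posterior inclusion odds divided
# by prior inclusion odds, summed over the relevant models. All numbers
# are purely illustrative.
inclusion_bf <- function(prior, posterior, idx) {
  post_odds  <- sum(posterior[idx]) / sum(posterior[-idx])
  prior_odds <- sum(prior[idx]) / sum(prior[-idx])
  post_odds / prior_odds
}

prior       <- rep(1 / 8, 8)                              # all model types equally likely
posterior   <- c(.02, .06, .03, .14, .05, .25, .10, .35)  # made-up posterior probabilities
bias_models <- c(2, 4, 6, 8)                              # indices of models assuming bias
inclusion_bf(prior, posterior, bias_models)               # BF for publication bias: 4
```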
After the models have been updated, the final effect size estimate is obtained by Bayesian model averaging (e.g., Hinne et al., 2020; Hoeting et al., 1999). In Bayesian model averaging, the effect size estimate from each individual model is weighted by that model's posterior probability. Since the models that predicted the data best have the highest posterior probability, the final estimate is based most strongly on the best-performing models.
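The model-averaged point estimate is likewise a posterior-probability-weighted mean of the per-model estimates, as the following toy sketch illustrates (four models and invented numbers):

```r
# Bayesian model averaging in miniature: per-model effect size estimates
# weighted by posterior model probabilities (all numbers invented).
estimates <- c(0.00, 0.00, 0.21, 0.25)  # e.g., two null models, two alternative models
post_prob <- c(0.05, 0.10, 0.35, 0.50)  # posterior model probabilities (sum to 1)
sum(estimates * post_prob)              # model-averaged estimate: 0.1985
```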
RoBMA overcomes the limitations of frequentist selection models in several ways. First, the Bayes factor allows researchers to quantify evidence for the null hypothesis and thus to distinguish between absence of evidence and evidence of absence.

Second, model averaging obviates the need to select a single model. Therefore, if there is uncertainty regarding the presence of publication bias, RoBMA can base the inference on both the "normal" models and the publication bias adjusted models instead of committing fully to a single model.
Third, the prior distributions allow the selection models to be estimated even in cases with few p-values in some of the p-value intervals; the method will not fail to converge under these conditions. However, especially in this context, it is important to specify the prior distributions carefully and to check the robustness of the results to different specifications of the prior distributions.
Fourth, Bayes factors allow for sequential updating (Rouder, 2014; Rouder & Morey, 2011; Wagenmakers et al., 2016), meaning that new studies can be added to the set and the analysis updated without having to worry about accumulation bias. At every point in time, RoBMA quantifies evidence based on the relative predictive performance of the rival models for the observed data.
Example 2: Acculturation Mismatch (AM) and Intergenerational Cultural Conflict (ICC)
We illustrate RoBMA using the meta-analysis on the relationship between AM and ICC (Lui, 2015) mentioned in the section "Limitations of Frequentist Selection Models". To analyze the data, we first specify effect sizes as correlations, using the "Correlations & N" input option (cf. the earlier example on dyadic "interracial" interactions). The RoBMA package internally transforms the correlations into Cohen's d effect sizes, which are then used to estimate the models. Second, we specify prior distributions for the effect size, heterogeneity, and publication bias (the prior distribution specification is discussed in greater detail in Example 3). Because we are using correlations as input, the prior distributions for the effect size and heterogeneity parameters are specified on the Cohen's d scale. The prior distributions can be visualized in JASP (Figure 3 shows the prior distribution for the effect size under the alternative hypothesis) by selecting the "Plot priors" checkbox in the "Models" panel. The prior distribution for the effect size is automatically transformed from Cohen's d to the input scale selected earlier; in this case, it is transformed to the correlation scale.⁶

⁶ The heterogeneity parameter τ is interpreted on the Cohen's d scale because its transformation to the correlation scale depends on the location of the effect size estimate.

Figure 3
The Default Standard-Normal Prior Distribution on Cohen's d Induces a Bell-Shaped Prior Distribution on the Correlation Scale. Figure from JASP
[Density plot of the induced prior; x-axis: ρ (Cohen's d ~ Normal(0, 1)[−∞, ∞]).]
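The bell shape in Figure 3 can be reproduced by simulation: draw from the Cohen's d prior and push the draws through the standard d-to-r transformation r = d / sqrt(d² + 4) (assuming this is the conversion applied; the exact transformation used internally may differ in detail).

```r
# Simulating the prior on the correlation scale induced by the default
# standard-normal prior on Cohen's d, via r = d / sqrt(d^2 + 4).
set.seed(1)
d_prior <- rnorm(1e5, mean = 0, sd = 1)
r_prior <- d_prior / sqrt(d_prior^2 + 4)
plot(density(r_prior), xlab = expression(rho),
     main = "Induced prior on the correlation scale")
```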
After specifying the prior distributions, the models can be estimated. To do so, we place the effect sizes and sample sizes into the corresponding input panels ("Effect Size", "N"). JASP then automatically starts estimating the models, which is signaled by the appearance of a progress bar at the top of the right panel. After a few minutes, JASP produces the default output depicted in Figure 4. Note, however, that with larger numbers of primary studies the estimation takes considerably longer. The current example features 18 studies and took 15 minutes to estimate, whereas the first example featured 55 studies and took around 55 minutes to estimate.⁷ To ameliorate the problems associated with lengthy model fitting times, the "Save the fitted model" option in the "Advanced" section allows users to save an already estimated model. The saved model can then be loaded into JASP (or R) using the "Fitted model" data input option. This allows users to fit the model only once, share the estimated model with colleagues, and circumvent the issue of lengthy refitting times.

⁷ As timed on a notebook with a rather old Intel 4720HQ processor.
The "Model Summary" table presents the results of the RoBMA hypothesis tests.⁸ The results indicate strong evidence for the presence of an effect, BF_10 = 2777.84, and of heterogeneity, BF^rf = 3.74 × 10^6, but virtually no evidence for either the absence or the presence of publication bias, BF^ωω̄ = 1.35.

⁸ The warnings below the table inform us that the starting values for the MCMC algorithm needed to be set to the mean of the observed values due to a problem with the likelihood calculation for the randomly chosen starting values. This message is harmless and only indicates that the starting values could not be chosen randomly. A more detailed discussion of the potential error and warning messages is available at https://fbartos.github.io/RoBMA/articles/WarningsAndErrors.html.

Figure 4
Results from Lui (2015) Using the Default Settings of JASP Robust Bayesian Meta-Analysis
Note. The analysis settings are specified in the left panel and the associated output is shown in the right panel. The default output concerns (1) tests for effect, heterogeneity, and publication bias and (2) model-averaged parameter estimates for the effect size, heterogeneity, and publication bias weights. A tutorial video showing this analysis is available at https://www.youtube.com/watch?v=5Ff9jsb1_TM.
The following table, "Model Averaged Estimates", reports the estimates for effect size and heterogeneity obtained by model-averaging across all models; the models' posterior probabilities function as the averaging weights. The mean effect size estimate is close to the original estimate, r = 0.240, 95% CI [0.157, 0.321], with a heterogeneity estimate of τ (on Cohen's d scale) = 0.317, 95% CI [0.194, 0.493].
We can visualize the model-averaged prior and posterior distribution for the effect size by selecting the "Effect" option under the "Pooled Estimates" heading in the "Plots" section. Figure 5 shows the change from the prior distribution (dashed grey line and grey arrow) to the posterior distribution (solid black line and black arrow). The posterior probability of the spike is much lower than its prior probability (i.e., the grey arrow reaches .50 on the secondary y-axis, whereas the height of the black arrow only slightly exceeds 0), showing that the data undercut the hypothesis that the effect is absent. The posterior distribution under the alternative hypothesis is relatively peaked on medium-sized effects. We can also explore the effect size estimates from the conditional individual models (i.e., showing, and model-averaging across, only the models that assume the presence of the effect), which can be produced in the same section under the "Individual Models" heading; Figure 6 provides an example. JASP provides similar figures for the heterogeneity and the weights (i.e., the relative publication probabilities).
Figure 5
The Model-Averaged Prior and Posterior Distribution for the Effect Size Estimate from Lui (2015) Using the Default Settings. Figure from JASP
[Density plot of ρ (averaged), with density on the left y-axis and probability on the secondary y-axis.]
Note. The model-averaged prior distribution for effect size is displayed using the grey arrow and the grey dashed line; the posterior distribution is displayed using the black arrow (which is close to zero) and the solid black line. The arrows visualize the probability (secondary y-axis) of the point-null prior and posterior distribution.
Figure 6
Effect Size Estimates under the Models Assuming ρ ≠ 0 from Lui (2015) Using the Default Settings. Figure from JASP
[Forest-style plot of the six models assuming the presence of the effect, with per-model posterior means ranging from 0.19 to 0.25 and the conditional model-averaged estimate of 0.24, 95% CI [0.16, 0.32], in the bottom row.]
Note. The left side describes the prior distribution configuration for each of the model parameters, and the right side provides numerical summaries for the mean, 95% CI, posterior model probability, and prior model probability. The bottom row visualizes the effect size estimate model-averaged across the visualized models.
The models are estimated using Markov chain Monte Carlo (MCMC) sampling in JAGS (Plummer, 2003). The default settings specify 1,000 adaptation, 5,000 burn-in, and 10,000 sampling iterations with 3 chains and thinning set to 1. These settings can be adjusted under the "Advanced" section, which also contains a "Set seed" option for exact repeatability of the results. MCMC convergence can be checked using the "Overview" option under "MCMC Diagnostics". The "Model Diagnostics Overview" table shows diagnostic summaries for the individual models. We see that all models had an excellent Rhat convergence diagnostic (the recommended maximum value is 1.05; Gelman & Rubin, 1992; but see Vehtari et al., 2019, for a more stringent standard) and an acceptable effective sample size (ESS). Parameter- and model-specific diagnostics can be obtained from the "Plots" section, which allows the visualization of trace plots, autocorrelation histograms, and posterior sample density plots for any of the estimated parameters; see McElreath (2020, p. 285) or Plummer et al. (2019) for a more detailed explanation of these plotting options. When one or more models seem unfit for drawing inferences (i.e., large Rhat and/or low ESS), these models can be excluded using the "Exclude models" option under the "Advanced" tab. If desired, these models can be re-estimated with an increased number of burn-in and sampling iterations.
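For readers working in R, the same convergence checks can be run with the coda package (Plummer et al., 2019) on the posterior samples; here `chains` stands in for a hypothetical mcmc.list extracted from a fitted model.

```r
# Generic MCMC convergence checks with the coda package; 'chains' is a
# hypothetical mcmc.list of posterior samples from a fitted model.
library(coda)
gelman.diag(chains)    # Rhat (potential scale reduction); <= 1.05 is conventional
effectiveSize(chains)  # effective sample size per parameter
traceplot(chains)      # visual check of mixing across chains
```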
This example highlights the Bayesian benefit of taking all uncertainty into account. In the frequentist framework, it was unclear whether or not to adjust for publication bias. In contrast, RoBMA does not require an all-or-none decision about the presence of publication bias. Instead, all models are taken into account simultaneously, and the effect size estimate is based on a weighted average across the various models, with the averaging weights determined by the support that each model receives from the data. This difference is not merely philosophical but has important practical implications: RoBMA finds clear evidence for the presence of an effect, whereas frequentist selection models can indicate either that the effect is present or that it is absent, depending on the analyst's choices regarding publication bias adjustment.
Example 3: Specifying Different Priors
In the previous example, RoBMA revealed compelling evidence against the point null hypothesis. However, it has been repeatedly argued that point null hypotheses are not realistic, and therefore not meaningful to test (e.g., Gelman & Carlin, 2014; Good, 1967; Meehl, 1978; Orben & Lakens, 2020). RoBMA overcomes this objection by allowing the specification of 'perinull' hypotheses, that is, hypotheses with prior distributions tightly centered around an effect size of zero (e.g., Berger & Delampady, 1987; Cornfield, 1966; George & McCulloch, 1993). Adjustments may also be desired for the prior distribution on effect size that is postulated under the alternative hypothesis. By default, the most plausible value under this prior distribution is zero (i.e., under H_1, the prior distribution is centered on zero), and this may not reflect the information at hand (e.g., Gronau et al., 2020). A more diagnostic test requires an 'informed prior', one that is centered around a non-zero effect size.
Here we continue the example of acculturation mismatch and intergenerational cultural conflict (Lui, 2015) and demonstrate how RoBMA allows researchers to specify both a perinull hypothesis and an informed alternative hypothesis, and to compare their predictive performance for the observed data. First, we specify the perinull hypothesis by assigning the effect size a zero-centered normal distribution with a standard deviation of .10 on the Cohen's d scale; propagated to the correlation scale, this yields a prior distribution with approximately 95% probability mass in the interval r ∈ [−0.10, 0.10]. For a perinull hypothesis, this range may be considered relatively wide. If the goal of the perinull distribution is primarily to counter the objection that "the null hypothesis is never true exactly", a much narrower 95% interval could be specified, such as one ranging from −.01 to .01.
Second, we specify the informed alternative hypothesis by assigning the effect size a normal distribution centered at .60 with a standard deviation of .20 on the Cohen's d scale. Translated to the correlation scale, this results in a prior distribution that places most of its mass on correlations higher than .10, with the prior median at a correlation slightly below .30.
To specify these hypotheses in JASP, we open the "Models" section and adjust the parameter values under the "Effect" heading (for the alternative hypothesis) and under the "Effect (Null)" heading (for the null hypothesis), which can be accessed by checking the "Set null priors" checkbox. The model settings are depicted in Figure 7. The model specification implemented in JASP allows researchers to specify any desired combination of hypotheses using different prior distributions (see the JASP help file accessible under the (i) icon).⁹ The prior distributions specified under each hypothesis are then used to generate the combination of all possible models. These models are automatically used to draw inference via the inclusion Bayes factor and are model-averaged to obtain parameter estimates.

⁹ For specifying more complex models, see the RoBMA R package manual and the "Fitting custom meta-analytic ensembles" vignette (https://fbartos.github.io/RoBMA/articles/CustomEnsembles.html).
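In R, the analogous specification might look as follows, assuming the prior() constructor and the priors_mu / priors_mu_null arguments of the RoBMA 1.x interface; the exact argument names should be verified against the package manual and the vignette cited in footnote 9.

```r
# A sketch of the perinull versus informed-alternative comparison in R,
# assuming the RoBMA 1.x prior interface (argument names unverified).
library(RoBMA)

fit <- RoBMA(
  r = dat$ES, n = dat$N, seed = 1,  # hypothetical data columns
  # informed alternative: Normal(0.60, 0.20) on the Cohen's d scale
  priors_mu      = prior("normal", parameters = list(mean = 0.60, sd = 0.20)),
  # perinull: Normal(0.00, 0.10) on Cohen's d instead of a point null
  priors_mu_null = prior("normal", parameters = list(mean = 0.00, sd = 0.10))
)
summary(fit)  # BF_10 now compares the informed alternative to the perinull
```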
Figure 7
Comparison Between a Perinull Hypothesis and an Informed Alternative Hypothesis as Applied to the Data from Lui (2015). Screenshot from the JASP RoBMA Module
Note. The left panel shows the prior settings specifying the informed alternative hypothesis ("Normal" distribution under "Effect") and the perinull hypothesis ("Normal" distribution under "Effect (Null)"). The right panel shows the default output: the tests for effect, heterogeneity, and publication bias, and the model-averaged parameter estimates for the effect size, heterogeneity, and publication bias weights. A video showing this analysis is available at https://www.youtube.com/watch?v=BEMijDxQD2k.
The results indicate strong evidence against the perinull hypothesis and in favor of the informed alternative, BF_10 = 407.48, overwhelming evidence for heterogeneity, BF^rf = 5.08 × 10^6, and a lack of evidence for either the absence or the presence of publication bias, BF^ωω̄ = 1.25 (Figure 7). The model-averaged prior and posterior plot for the mean parameter, shown in Figure 8, indicates that most of the posterior probability mass is concentrated in the area predicted by the models with the informed alternative hypothesis.
Figure 8
The Model-Averaged Prior and Posterior Distribution for the Effect Size Parameter Estimate from Lui (2015) Using Customized Priors. Figure from JASP
[Density plot of ρ (averaged).]
Note. The model-averaged prior distribution on effect size is depicted as the dashed grey line; the model-averaged posterior distribution is depicted as the solid black line.
Example Report
In the previous sections, we illustrated the various functionalities of the JASP RoBMA module for conducting publication bias adjusted meta-analyses. Here, we briefly demonstrate how to report the results of both selection models and Robust Bayesian Meta-Analysis, using our first example concerning the effect of "interracial" versus same-race dyads on performance (Toosi et al., 2012). For more general reporting guidelines, see van Doorn et al. (in press).
First, we start with the selection models. Prior to data analysis, we decided to use a significance level of α = 0.10 for publication bias and α = 0.05 for heterogeneity and effect sizes. We estimated the two-sided selection models with p-value cut-offs set to (0.05, 0.10) and automatically joined p-value intervals. The models were estimated using correlations and sample sizes with the Cohen's d effect size transformation. The p-value intervals were automatically reduced to the (0.025, 0.05, 0.95) cut-offs corresponding to one-sided p-values. The test for heterogeneity was significant, Q(54) = 170.77, p < .001. Therefore, we applied the test for publication bias assuming heterogeneity, which was significant as well, χ²(3) = 13.20, p = .005. Consequently, we interpreted the bias adjusted effect size estimate from the random effects model. The effect size was not statistically significant, r = −0.011, 95% CI [−0.075, 0.053], p = .734, with a heterogeneity estimate of τ (on Cohen's d scale) = 0.152, 95% CI [0.000, 0.227]. The resulting JASP file can be found at https://osf.io/6hf7r/.
Second, we re-analyzed the same data set using Robust Bayesian Meta-Analysis. Before the analysis, we decided to use the default prior settings (i.e., a standard normal distribution on effect sizes; an inverse gamma distribution with α = 1 and β = 0.15 on heterogeneity; two two-sided weight functions, with cut-points at (0.05) and (0.05, 0.10) and parameters α = (1, 1) and (1, 1, 1); and the default point priors on the null hypotheses). We set the prior hypothesis probability to 0.50 for the effect size, heterogeneity, and publication bias. The models were estimated using correlations and sample sizes with the Cohen's d effect size transformation. The results did not indicate evidence for either the presence or the absence of the effect, BF_10 = 2.01; they indicated strong evidence for heterogeneity, BF^rf = 108.52; and they indicated strong evidence for publication bias, BF^ωω̄ = 280.74. The resulting model-averaged effect size estimate was r = 0.032, 95% CI [0.000, 0.083], with a heterogeneity estimate of τ (on Cohen's d scale) = 0.175, 95% CI [0.078, 0.291]. The MCMC diagnostics were good, with all Rhat values below 1.01 and all ESS values above 800.
Concluding Comments
In this paper, we introduced two approaches to adjusting for publication bias in meta-analysis, both implemented in JASP. First, we discussed frequentist selection models, which have been demonstrated to work well even under high heterogeneity. With selection models, one first tests for heterogeneity and publication bias and then selects the appropriate model. Second, we discussed Robust Bayesian Meta-Analysis (RoBMA). In contrast to the frequentist selection models, RoBMA does not select a single model. Instead, it keeps all models in play and weights their impact on the effect size estimate according to their posterior probability.
The publication bias adjusted meta-analysis module in JASP does not incorporate popular methods such as Egger's regression (Egger et al., 1997) or p-curve (Simonsohn et al., 2014). The reason for this choice is that simulations, as well as analyses of existing data, suggest that publication bias adjustment based on selection models is the most promising approach (Carter et al., 2019; Maier et al., 2020; McShane et al., 2016; Renkewitz & Keiner, 2019). However, in future research it could be interesting to add alternative models of publication bias to RoBMA (e.g., a correlation between effect size and standard error, as in Egger's regression; Egger et al., 1997). This would allow researchers to compare different models of publication bias directly on empirical data.
To conclude, the publication bias adjusted meta-analysis module in JASP allows researchers without programming experience to conduct state-of-the-art publication bias corrected meta-analyses in an intuitive and user-friendly way. We hope that this methodology will improve the inferences researchers draw when conducting meta-analyses.
References
Bartoš, F., & Maier, M. (2020). RoBMA: An R package for robust Bayesian meta-analyses [R package version 1.0.5]. https://CRAN.R-project.org/package=RoBMA

Bartoš, F., & Schimmack, U. (2020). Z-curve 2.0: Estimating replication rates and discovery rates. https://doi.org/10.31234/osf.io/urgtn

Berger, J. O., & Delampady, M. (1987). Testing precise hypotheses. Statistical Science, 2(3), 317–335.

Borenstein, M., Hedges, L., Higgins, J., & Rothstein, H. (2009). Publication bias. In M. Borenstein (Ed.), Introduction to meta-analysis (pp. 277–292). Wiley.

Brunner, J., & Schimmack, U. (2020). Estimating population mean power under conditions of heterogeneity and selection for significance. Meta-Psychology, 4. https://doi.org/10.15626/MP.2018.874

Carter, E. C., & McCullough, M. E. (2014). Publication bias and the limited strength model of self-control: Has the evidence for ego depletion been overestimated? Frontiers in Psychology, 5, Article 823. https://doi.org/10.3389/fpsyg.2014.00823

Carter, E. C., Schönbrodt, F. D., Gervais, W. M., & Hilgard, J. (2019). Correcting for bias in psychology: A comparison of meta-analytic methods. Advances in Methods and Practices in Psychological Science, 2(2), 115–144. https://doi.org/10.1177/2515245919847196

Coburn, K. M., Vevea, J. L., & Coburn, M. K. M. (2019). Package 'weightr' [R package version 2.0.2]. https://cran.r-project.org/web/packages/weightr/weightr.pdf

Cornfield, J. (1966). A Bayesian test of some classical hypotheses—with applications to sequential clinical trials. Journal of the American Statistical Association, 61(315), 577–594. https://doi.org/10.1080/01621459.1966.10480890

Duval, S., & Tweedie, R. (2000). Trim and fill: A simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics, 56(2), 455–463. https://doi.org/10.1111/j.0006-341X.2000.00455.x

Egger, M., Smith, G. D., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. BMJ, 315(7109), 629–634. https://doi.org/10.1136/bmj.315.7109.629

Etz, A., & Wagenmakers, E.-J. (2017). J. B. S. Haldane's contribution to the Bayes factor hypothesis test. Statistical Science, 32, 313–329. https://doi.org/10.1214/16-STS599

Gelman, A., & Carlin, J. (2014). Beyond power calculations: Assessing Type S (sign) and Type M (magnitude) errors. Perspectives on Psychological Science, 9(6), 641–651. https://doi.org/10.1177/1745691614551642

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–472.

George, E. I., & McCulloch, R. E. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88(423), 881–889. https://doi.org/10.1080/01621459.1993.10476353

Good, I. J. (1967). A Bayesian significance test for multinomial distributions. Journal of the Royal Statistical Society: Series B (Methodological), 29(3), 399–418. https://doi.org/10.1111/j.2517-6161.1967.tb00705.x

Gronau, Q. F., Ly, A., & Wagenmakers, E.-J. (2020). Informed Bayesian t-tests. The American Statistician, 74, 137–143. https://doi.org/10.1080/00031305.2018.1562983

Hinne, M., Gronau, Q. F., van den Bergh, D., & Wagenmakers, E.-J. (2020). A conceptual introduction to Bayesian model averaging. Advances in Methods and Practices in Psychological Science, 3(2), 200–215. https://doi.org/10.1177/2515245919898657

Hoeting, J. A., Madigan, D., Raftery, A. E., & Volinsky, C. T. (1999). Bayesian model averaging: A tutorial. Statistical Science, 14(4), 382–401.

Ioannidis, J. P. (2005). Why most published research findings are false. PLOS Medicine, 2(8), e124. https://doi.org/10.1371/journal.pmed.0020124

Iyengar, S., & Greenhouse, J. B. (1988). Selection models and the file drawer problem. Statistical Science, 3(1), 109–117.

JASP Team. (2020). JASP (Version 0.14) [Computer software]. https://jasp-stats.org/

Jeffreys, H. (1961). Theory of probability (3rd ed.). Oxford University Press.

Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773–795. https://doi.org/10.1080/01621459.1995.10476572

Keysers, C., Gazzola, V., & Wagenmakers, E.-J. (2020). Using Bayes factor hypothesis testing in neuroscience to establish evidence of absence. Nature Neuroscience, 23, 788–799. https://doi.org/10.1038/s41593-020-0660-4

Larose, D. T., & Dey, D. K. (1998). Modeling publication bias using weighted distributions in a Bayesian framework. Computational Statistics & Data Analysis, 26(3), 279–302. https://doi.org/10.1016/S0167-9473(97)00039-X

Lau, J., Ioannidis, J. P., Terrin, N., Schmid, C. H., & Olkin, I. (2006). The case of the misleading funnel plot. BMJ, 333(7568), 597–600. https://doi.org/10.1136/bmj.333.7568.597

Lui, P. P. (2015). Intergenerational cultural conflict, mental health, and educational outcomes among Asian and Latino/a Americans: Qualitative and meta-analytic review. Psychological Bulletin, 141(2), 404–446.

Maier, M., Bartoš, F., & Wagenmakers, E.-J. (2020). Robust Bayesian meta-analysis: Addressing publication bias with model-averaging. https://doi.org/10.31234/osf.io/u4cns

McElreath, R. (2020). Statistical rethinking: A Bayesian course with examples in R and Stan (2nd ed.). CRC Press.

McShane, B. B., Böckenholt, U., & Hansen, K. T. (2016). Adjusting for publication bias in meta-analysis: An evaluation of selection methods and some cautionary notes. Perspectives on Psychological Science, 11(5), 730–749. https://doi.org/10.1177/1745691616662243

Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46(4), 806–834. https://doi.org/10.1037/0022-006X.46.4.806

Orben, A., & Lakens, D. (2020). Crud (re)defined. Advances in Methods and Practices in Psychological Science, 3(2), 238–247. https://doi.org/10.1177/2515245920917961

Plummer, M. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In Proceedings of the 3rd International Workshop on Distributed Statistical Computing. Vienna, Austria.

Plummer, M., Best, N., Cowles, K., & Vines, K. (2019). Package 'coda'.

Renkewitz, F., & Keiner, M. (2019). How to detect publication bias in psychological research. Zeitschrift für Psychologie, 227(4), 261–279. https://doi.org/10.1027/2151-2604/a000386

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638. https://doi.org/10.1037/0033-2909.86.3.638

Rothstein, H. R., Sutton, A. J., & Borenstein, M. (2005). Publication bias in meta-analysis. John Wiley & Sons.

Rouder, J. N. (2014). Optional stopping: No problem for Bayesians. Psychonomic Bulletin & Review, 21(2), 301–308. https://doi.org/10.3758/s13423-014-0595-4

Rouder, J. N., & Morey, R. D. (2011). A Bayes factor meta-analysis of Bem's ESP claim. Psychonomic Bulletin & Review, 18(4), 682–689. https://doi.org/10.3758/s13423-011-0088-7

Rouder, J. N., & Morey, R. D. (2019). Teaching Bayes' theorem: Strength of evidence as predictive accuracy. The American Statistician, 73(2), 186–190. https://doi.org/10.1080/00031305.2017.1341334

Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve: A key to the file-drawer. Journal of Experimental Psychology: General, 143(2). https://doi.org/10.1037/a0033242

Stanley, T. D., Carter, E. C., & Doucouliagos, H. (2018). What meta-analyses reveal about the replicability of psychological research. Psychological Bulletin, 144(12), 1325–1346. https://doi.org/10.1037/bul0000169

Stanley, T. D., & Doucouliagos, H. (2017). Neither fixed nor random: Weighted least squares meta-regression. Research Synthesis Methods, 8(1), 19–42. https://doi.org/10.1002/jrsm.1211

ter Schure, J., & Grünwald, P. (2019). Accumulation bias in meta-analysis: The need to consider time in error control. F1000Research, 8, Article 962. https://doi.org/10.12688/f1000research.19375.1

Toosi, N. R., Babbitt, L. G., Ambady, N., & Sommers, S. R. (2012). Dyadic interracial interactions: A meta-analysis. Psychological Bulletin, 138(1), 1–27. https://doi.org/10.1037/a0025767

van Doorn, J., van den Bergh, D., Böhm, U., Dablander, F., Derks, K., Draws, T., Etz, A., Evans, N. J., Gronau, Q. F., Hinne, M., Kucharský, Š., Ly, A., Marsman, M., Matzke, D., Komarlu Narendra Gupta, A. R., Sarafoglou, A., Stefan, A., Voelkel, J. G., & Wagenmakers, E.-J. (in press). The JASP guidelines for conducting and reporting a Bayesian analysis. Psychonomic Bulletin & Review. https://psyarxiv.com/yqxfr

van Erp, S., Verhagen, J., Grasman, R. P., & Wagenmakers, E.-J. (2017). Estimates of between-study heterogeneity for 705 meta-analyses reported in Psychological Bulletin from 1990–2013. Journal of Open Psychology Data, 5(1), Article 4. https://doi.org/10.5334/jopd.33

Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P.-C. (2019). Rank-normalization, folding, and localization: An improved R̂ for assessing convergence of MCMC. arXiv preprint arXiv:1903.08008.

Vevea, J. L., & Hedges, L. V. (1995). A general linear model for estimating effect size in the presence of publication bias. Psychometrika, 60(3), 419–435. https://doi.org/10.1007/BF02294384

Wagenmakers, E.-J., Morey, R. D., & Lee, M. D. (2016). Bayesian benefits for the pragmatic researcher. Current Directions in Psychological Science, 25(3), 169–176. https://doi.org/10.1177/0963721416643289

Wrinch, D., & Jeffreys, H. (1921). On certain fundamental principles of scientific inquiry. Philosophical Magazine, 42, 369–390. https://doi.org/10.1080/14786442108633773