METHODS FOR ADJUSTING FOR PUBLICATION BIAS
AND OTHER SMALL‐STUDY EFFECTS
IN EVIDENCE SYNTHESIS
Thesis submitted for the degree of
Doctor of Philosophy
at the University of Leicester
by
Santiago Gutierrez Moreno BSc
Department of Health Sciences
University of Leicester
September 2009
Abstract
Meta-analyses usually combine published studies, omitting those that have not been published for whatever reason. When the reasons for non-publication are related to the findings rather than to chance, the problem of publication bias arises. Research into publication bias suggests that it is the ‘interest level’, or statistical significance, of the findings, not study rigour or quality, that determines which research gets published and thus becomes publicly available.
When the results of the scientific literature as a whole are considered, such publication practices distort the true picture, which may exaggerate clinical effects resulting in potentially erroneous clinical decision-making. Therefore, meta-analyses (as well as other more complex evidence synthesis models) based on the published literature should be seen as ‘at risk’ of publication bias, which has the potential to bias conclusions and thus adversely affect decision-making. Many methods exist for detecting publication bias, but this alone is not sufficient if results from meta-analyses are going to be used within a decision-making framework. What is required in the view of this thesis is a reliable way to adjust pooled estimates for publication bias.
This thesis explores novel and existing approaches to publication bias adjustment, including frequentist and Bayesian approaches, with the aim of identifying those with the most desirable statistical properties. Special attention is given to regression-based methods commonly used to test for the presence of publication bias (and other ‘small-study effects’). The regression-based approach produces very encouraging results in a case study for which gold standard data exist. The incorporation of external information about the direction and strength of the bias is also explored in the hope of improving the methods’ performance. Ultimately, routine estimation of the bias-adjusted effect is recommended, as it improves the overall results compared to standard meta-analysis.
Acknowledgements
I would like to express my gratitude to Prof. Tony Ades for sharing his novel
Bayesian semi-parametric model considered in chapter 10 in addition to providing crucial input and inspiring thinking throughout the thesis. I thank Prof. Keith Abrams for providing constructive criticism in the methodological issues and devoting time to me despite his frantic agenda. I am also incredibly grateful to Dr. Nicola Cooper for being so supportive and encouraging while helping to improve the overall quality of this work.
I would like to thank Prof. John Thompson, Dr. Tom Palmer, Dr. Nicola Novielli and Dr.
Jaime Peters for constructive discussions regarding the work contained in this thesis.
Dr. Jaime Peters also assisted me with the English grammar for which I am grateful.
Thanks also to all my other colleagues and friends at the department for having made my stay here so wonderful.
Needless to say, this thesis would never have been possible without the guidance of my outstanding supervisor Prof. Alex Sutton. What is more, I am greatly beholden to him for putting so much faith in me from the very first day. I would like to acknowledge the Medical Research Council (HSRC) for sponsoring me during this four-year doctoral programme; in particular, Prof. Paul Dieppe and his team for continuously injecting much-needed enthusiasm into the PhD students. I am deeply indebted to my wife, mother, sisters and friends for their unconditional support and infinite faith in me. Ultimately, this thesis is dedicated to the memory of my father, who sadly did not live to celebrate its completion.
TABLE OF CONTENTS
Abstract
Chapter 1 – Introduction
1.1. Background ……………………………………………………………… 1
1.2. Aims of the thesis ……………………………………………………… 7
1.3. Thesis outline …………………………………………………………… 9
Chapter 2 – Literature review on meta-analysis
2.1. Introduction to meta-analysis ………………………………………… 11
2.2. Fixed-effect meta-analysis model …………………………………… 12
2.3. Between-study variability (heterogeneity) …………………………… 16
2.4. Random-effects meta-analysis model ………………………………… 21
2.5. Introduction to meta-regression ……………………………………… 26
2.6. Aggregation bias ………………………………………………………… 29
2.7. Multiple meta-regression ……………………………………………… 30
2.8. The regression model …………………………………………………… 31
2.9. Summary ………………………………………………………………… 33
Chapter 3 – Literature review on publication bias and small-study effects
3.1. Biases in meta-analytical data………………………………………..…… 34
3.2. Reporting biases………………………………………………………….… 35
3.3. Introduction to small-study effects………………………………………… 37
3.4. Sources of small-study effects………………………………..…………… 38
• 3.4.1. Reporting biases…………………………………………….…… 40
• 3.4.2. True/genuine heterogeneity……………………..……………… 42
• 3.4.3. Data irregularities………………………………………………… 44
• 3.4.4. Artifactual heterogeneity ………………………………………… 48
3.5. Summary ………………………………………………………………… 52
Chapter 4 – Evidence for publication bias (and other small-study effects) and how to address it in meta-analysis
4.1. Evidence for publication bias (and other small-study effects) ……… 53
• 4.1.1. Meta-epidemiological studies…………….………………..….… 53
• 4.1.2. Funnel plot …………………………………………………..…… 54
• 4.1.3. Contour-enhanced funnel plot……………………………..…….58
• 4.1.4. Tests for publication bias/small-study effects………….……… 61
4.2. Methods for addressing publication bias…………………………….…… 66
• 4.2.1. Prevention …………………………………………………...…… 66
• 4.2.2. Best evidence synthesis approach…………………...…...…… 68
• 4.2.3. Grey literature…………………………………………..………… 69
• 4.2.4. File-drawer number……………………………………..………...70
• 4.2.5. Trim & Fill adjustment method…………………………..……… 71
• 4.2.6. Selection models………………………………………………… 74
• 4.2.7. Multiple imputation ………………………………………………… 77
4.3. Summary ………………………………………………………………… 78
Chapter 5 – Underlying theory for publication bias adjustment through regression
5.1. Proposed method for adjusting for publication bias (and other small-study effects) ………………………………………………………… 79
5.2. The original Egger model ……………………………………………… 84
5.3. Biases affecting the Egger model …………………………………… 86
• 5.3.1. Structural correlation……………………………………..……… 86
• 5.3.2. Regression assumptions………………………………..….…… 90
• 5.3.3. Measurement error and attenuation bias……….…………...… 91
• 5.3.4. Heteroscedasticity ………………………………………………… 99
5.4. Weighted regression and variants of the Egger model …………… 100
5.5. Discussion ……………………………………………………………… 104
5.6. Summary ………………………………………………………………… 108
Chapter 6 – Assessment of existing & novel methods for adjusting for publication bias (and other small-study effects) through simulation
6.1. Introduction ……………………………………………………………… 109
6.2. Published simulation studies evaluating methods for publication bias … 111
6.3. Statistical methods to be evaluated ………………………………… 114
• 6.3.1. Non-parametric adjustment Trim and Fill method………...… 118
• 6.3.2. Parametric adjustment methods (regression-based) ….…… 118
• 6.3.3. Conditional methods ……………………………………………… 122
6.4. Simulation procedures ………………………………………………… 122
• 6.4.1. Level of dependence between simulated datasets……….… 123
• 6.4.2. Software to perform simulations………………………….…… 123
• 6.4.3. Number of simulations to be performed ………………………… 123
6.5. Methods for generating the datasets ………………………………… 124
• 6.5.1. Underlying effect size………………………………………...… 125
• 6.5.2. Number of primary studies in a meta-analysis……….……… 126
• 6.5.3. Event rate……………………………………………………..… 126
• 6.5.4. Number of events………………………………………...…..… 128
• 6.5.5. Number of subjects…………………………………………..… 128
• 6.5.6. Ratio of subjects………………………………………….…..… 130
• 6.5.7. Inducing heterogeneity…………………………..……….….… 130
• 6.5.8. Inducing publication bias……………………………...…..…… 134
o 6.5.8.1. Inducing publication bias by p-value ………………………… 137
o 6.5.8.2. Inducing publication bias by effect size ……………………… 142
• 6.5.9. Impact of publication bias on between-study variance ………… 145
6.6. Simulation scenarios to be investigated ……………………………… 146
6.7. Criteria to assess the methods’ performance ……………………… 148
• 6.7.1. Assessment of bias (model accuracy) ……………………..…149
• 6.7.2. Combined assessment of bias (model accuracy) and variability (model precision) ………………………………………………………..149
• 6.7.3. Assessment of coverage……………………………………...…152
• 6.7.4. Assessment of variability……………………………………..….153
6.8. Results of the simulation study ……………………………………… 155
6.9. Discussion ……………………………………………………………… 168
6.10. Summary ……………………………………………………………… 178
Chapter 7 – Adjustment method implemented on a case study where a gold standard exists
7.1. Antidepressants case study …………………………………………… 179
7.2. Data collection ………………………………………………………… 180
7.3. Analysis ………………………………………………………………… 181
7.4. Results …………………………………………………………………… 182
7.5. Discussion ……………………………………………………………… 190
7.6. Summary ………………………………………………………………… 198
Chapter 8 – The case for the routine implementation of the adjustment method
8.1. Introduction ……………………………………………………………… 199
8.2. Weighting properties of the regression approach …………………… 199
8.3. Pre-eclampsia case study ……………………………………………… 201
8.4. ‘Set shifting’ ability case study ………………………………………… 206
8.5. Discussion ……………………………………………………………… 210
8.6. Summary ………………………………………………………………… 213
Chapter 9 – Simplified Rubin’s surface estimation
9.1. Introduction……………………………………………………….….………214
9.2. The ‘effect size surface’ approach…………………………….…..………215
9.3. Rubin’s surface function in terms of small-study effects…….….………217
9.4. Discussion……………………………………………………….…..………223
9.5. Summary………………………………………………………….…………224
Chapter 10 – Novel Bayesian semi-parametric regression model
10.1. Introduction to Bayesian statistics in meta-analysis ……………… 225
10.2. Parametric versus non-parametric modelling ……………………… 233
10.3. WinBUGS – Bayesian model fitting software ……………………… 234
10.4. Description of the semi-parametric regression model …………… 235
10.5. Implementation upon the magnesium case study ………………… 239
10.6. Additive regression and sensitivity analysis of model parameters … 240
10.7. Simulation study & customized models to be evaluated ………… 247
10.8. Results of the simulation study ……………………………………… 248
10.9. Simulation procedures & model checking ………………………… 249
10.10. Discussion & summary ……………………………………………… 250
Chapter 11 – A fully Bayesian approach to regression adjustment by using prior information
11.1. Use of external data to inform the small-study effects trend ……… 251
11.2. Application to the antidepressants case study ……………………… 252
11.3. Deriving the prior distribution ………………………………………… 256
11.4. Discussion on the use of prior information ………………………… 264
11.5. Introduction to network meta-analysis ……………………………… 268
11.6. Adapting the adjustment method to network meta-analysis ……… 271
11.7. Summary ………………………………………………………………… 272
Chapter 12 – Discussion & conclusions
12.1. Thesis summary………………………………………………….……..…273
12.2. Discussion……………………………………………………………….…276
12.3. Further work…………………………………………………………..……283
12.4. Conclusions……………………………………………………………...…290
APPENDIXES
1. Logistic regression formulation …………………………………………… 292
2. Relationship between p-value and effect size …………………………… 293
3. Publication bias intensity levels established by Duval and Tweedie …… 295
4. Additional plots summarising the results from the remaining scenarios … 295
5. Derivation of equations presented in chapter 8 ………………………… 315
BIBLIOGRAPHY…………………………………………………………………….……320
ABBREVIATIONS
CI      Confidence Interval
DIC     Deviance Information Criterion
FDA     Food and Drug Administration (USA)
FE      Fixed-Effect
IPD     Individual Patient Data
lnOR    Natural logarithm of the Odds Ratio
MA      Meta-Analysis
MCMC    Markov Chain Monte Carlo
MR      Meta-Regression
MSE     Mean sum of Squared Errors
OLS     Ordinary Least Squares
OR      Odds Ratio
PB      Publication Bias
RCT     Randomised Controlled Trial
RE      Random-Effects
SMD     Standardised Mean Difference
TF      Trim & Fill
VWLS    Variance-Weighted Least Squares
WOLS    Weighted Ordinary Least Squares
Publications, poster and oral presentations associated with this thesis
1. Article published in the BMJ relating to chapter 7 (Moreno et al 2009b)
2. Letter published in the Lancet about the impact of publication bias on network meta-analysis (Turner et al 2009a)
3. Article published in BMC Medical Research Methodology regarding the simulation study described in chapter 6 (Moreno et al 2009a)
4. Article accepted for publication in PLoS Medicine evaluating the ‘deceptive’ efficacy of C reactive protein as a prognostic marker among patients with stable coronary artery disease (Hemingway et al 2009)
5. Article published in the Stata Journal about the use of WinBUGS from within Stata (Thompson et al 2006)
6. Article accepted for publication in the Stata Journal about further Stata commands for Bayesian analysis (Thompson et al 2009)
7. Article published in the Stata Journal about the Stata command for contour-enhanced funnel plots (Palmer et al 2008), later compiled in a book (Sterne et al 2009a)
8. Article in press at the Journal of the Royal Statistical Society (Series A) assessing publication bias in meta-analysis in the presence of between-study heterogeneity (Peters et al 2009)
9. Oral presentation at the 16th German Cochrane Colloquium summarising chapters 5–7 (Moreno et al 2008)
10. Oral presentation at the MiM (Methods in Meta-analysis) meeting (June 2007) describing the simulation study design and the interpretation of preliminary results
11. Poster presentation at the Royal Statistical Society 2007 conference in York, which received the best poster award; the poster described the design of the simulation study and the interpretation of its preliminary results
12. Oral presentation at the conference of the Society for Medical Decision Making in Birmingham (June 2006), where preliminary results from the Bayesian semi-parametric model described in chapter 10 were discussed
13. Other publications by the author of the thesis that are not related to this research project (Mar et al 2006, Comas et al 2008, Roman et al 2008, Oliva et al 2009, Parra-Blanco et al 2010)
14. A press release about some of the results of the thesis has been posted on AlphaGalileo, the online news centre for European research, for distribution to journalists all over the world (www.alphagalileo.org/ViewItem.aspx?ItemId=60234&CultureCode=en)
Introduction
1.1. Background
A collection of unbiased data is vital in order to make reliable inferences (Copas &
Li 1997). It is well known that selecting evidence favouring one particular hypothesis leads to biased inferences (Melander et al 2003). As an illustrative example, news coverage generally focuses on what editors and journalists perceive to be of ‘interest’ to the public (Goldacre 2008). Therefore, if the general public considers media coverage the only valid source of information, it is not surprising that general opinion is systematically biased. For instance, railway safety is widely perceived to have worsened since privatisation; undoubtedly, the extensive and continuous media coverage of tragic rail accidents has fostered this belief.
What this thesis proposes is that instead of taking for granted that any compilation of evidence is fairly representative of the state of affairs, evidence should be first scrutinised to examine whether it is indeed the case. In truth, there is not enough robust evidence from the accident data 1967-2005 to conclude that privatisation compromised safety (Evans 2007). Similar to news coverage, scientific literature can also provide a distorted picture of the truth with consequences that cannot be ignored
(Dickersin 2008, Landefeld & Steinman 2009).
In health-related research, systematic reviews are used to compile, assess and summarise the scientific evidence around a particular research question. The overall aim of these reviews is to evaluate comprehensively the available evidence keeping potential biases to a minimum. This is achieved by following a systematic protocol so that the review is undertaken in a structured and explicit way (see the Cochrane guideline on literature reviews as the most prominent one (Jørgensen et al 2006,
Higgins & Green 2008, Jørgensen et al 2008, Anne et al 2009)). This systematic
approach to evidence synthesis also allows for reproducibility and for ease in updating the review (Sutton et al 2000a, Egger et al 2001, Sutton et al 2009b).
Meta-analysis (MA) is considered to be the quantitative feature of the evidence synthesis process. MAs following systematic reviews of randomised controlled trials
(RCTs) are regarded as the highest level of evidence in medicine for evaluating interventions (Harbour & Miller 2001) and are the main source of knowledge for physicians and other health professionals (Stinson & Mueller 1980). In essence, the estimate from a MA can be thought of as a weighted average of the results of the primary studies, where the weighting depends on some measure of study precision (so that smaller studies are given less weight). However, MA results are only as valid as the available evidence and so depend on the appropriateness of the compiled studies (Stangl & Berry 2000, Melander et al 2003).
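The weighted average just described can be written compactly. As a sketch in the standard inverse-variance notation (chapter 2 develops the models formally), with $y_i$ the effect estimate of study $i$ and $\hat{\sigma}_i^2$ its estimated variance, one common choice of weight gives

```latex
\hat{\theta} \;=\; \frac{\sum_{i=1}^{k} w_i \, y_i}{\sum_{i=1}^{k} w_i},
\qquad
w_i \;=\; \frac{1}{\hat{\sigma}_i^{2}},
\qquad
\widehat{\mathrm{Var}}\bigl(\hat{\theta}\bigr) \;=\; \frac{1}{\sum_{i=1}^{k} w_i},
```

so a small study (large $\hat{\sigma}_i^2$) receives a small weight, exactly as described above.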
This thesis focuses on the most common meta-analytical setting: summary efficacy data from RCTs. Indeed, RCTs are the most popular study design for investigating the efficacy of interventions, particularly health technologies in medical science. Their key advantage is that the random allocation of interventions (treatment or control) to patients prevents confounding bias (Kunz et al 2008). Interestingly, a MA of RCTs should itself be considered an observational study (Egger et al 1997b, Good & Hardin 2003) because all the available studies are combined without any randomisation between them. Yet inferences from MAs are usually made under the assumption of randomisation. Thus, MAs are susceptible to unforeseen selection mechanisms that could induce sampling bias, with the end result of “evidence b(i)ased medicine” (Melander et al 2003).
Predominantly, MA is used to estimate the pooled treatment effect based on all the evidence collected. Unfortunately, MAs may provide a biased estimate of efficacy due to publication bias (PB), which may threaten the validity of findings and consequently mislead decision makers into incorrect funding and service provision (Rising et al 2008,
Turner et al 2008a). Conventionally, PB is defined as the tendency to publish a study based on its results, rather than based on its theoretical or methodological quality
(Berlin et al 1989). This implies that it is the ‘interest level’, or statistical significance of findings (Sterling 1959, Nieminen et al 2007), not study rigour or quality, that determines which research is published and subsequently publicly available. Since PB threatens the internal validity of meta-analytic results, PB is one of the most widely researched topics in MA (Stangl & Berry 2000). Indeed, PB is considered the most important bias within the selection biases category, which refers to the lack of accuracy of the sampling frame (Delgado-Rodriguez & Llorca 2004). That is, the selection process generates a sample that is not representative of the population of existing studies because the missing studies are not missing at random. Therefore, the inferences made from this biased sample may be erroneous.
In other words, PB occurs when the published studies in a MA are not representative of the totality of existing research (Preston et al 2004), the absent studies being missing in a way that depends on the perceived value of their findings. To sum up, research that achieves ‘interesting’ or encouraging results is more likely to be published, potentially biasing the literature towards ‘interesting’ conclusions that may not reflect the underlying truth (Rothstein et al 2005). Hence, whenever decisions must be made based on the available evidence, caution is needed in interpreting the findings because they may be subject to this phenomenon, publication bias (PB), as it has come to be known (Egger & Smith
1995, Sterne et al 2001b).
The problem of PB was initially raised as early as 1605, with a clear reference to medical science in 1909 (Dickersin et al 1993, Petticrew 1998). Evidence about PB was first compiled by Sterling in 1959 (Sterling 1959, Sterling et al 1995) when he realised that the vast majority of published papers reported significant results, which entailed that studies with non-significant findings were somehow more likely to be missing/unpublished. Since then, there has been much scientific discussion of the topic
(Chalmers et al 1990, Dickersin 1990, Song et al 2000, Liesegang et al 2008) revealing some shocking episodes (Egger & Smith 1995). Statisticians have also developed methods to either detect or adjust for PB although their application is not routine in the literature (Thornton & Lee 2000, Pharm et al 2001, Rothstein et al 2005).
The reason PB is such a crucial topic in the health sciences is that it applies to the reporting of study findings on health technologies. The consequences of a biased medical literature range from the waste of public resources to harm to patients.
For these reasons, PB can be considered scientific misconduct (Dickersin 2008,
Liesegang et al 2008). All areas of empirical research are susceptible to PB (Stanley
2005), but the consequences of PB are arguably more severe in health care, even when a pooled estimate is not intended, such as in qualitative research (Petticrew et al 2008). Altogether, trials of new drug therapies that achieve ‘interesting’ or encouraging results are more likely to be published than those with ‘uninteresting’ results (Liesegang et al 2008). Consequently, synthesis of the biased evidence will tend to exaggerate the benefits of the novel therapy even when that therapy is no more effective than the comparator, ineffective, or even harmful (Smith 1980). Regardless of the motives for its existence, whether unintentional or deliberate (Rennie 1999, Calnan et al 2006, Young et al 2008), PB has severe implications. The results of the evidence synthesis exercise are compromised and the integrity of medical research is questioned (Rennie 1999,
Jørgensen et al 2006, Jørgensen et al 2008); but more importantly, exaggerated clinical effects lead those caring for patients to make potentially inappropriate treatment decisions (Rothstein et al 2005). The unpredictable consequences of this affect
patients, and hence all of us at some point during our lives. What is more, ethical implications may derive from breaking the agreement between investigators and trial participants by not publishing all the results of human studies (Chalmers 1990,
Krzyzanowska et al 2003, Curt & Chabner 2008, Doroshow 2008, Dubben 2009). Thus,
PB is of such concern in the field of medicine that ignoring it is definitely not an option
(Baker & Jackson 2006).
A simple and effective measure to attenuate substantially the global problem of PB has been proposed (Horton & Smith 1999, Abbasi & Godlee 2005, Liesegang et al
2008). Prior registration of drug trials must become a prerequisite if their results are to be used, ensuring that trials with ‘uninteresting’ or discouraging results cannot simply vanish. To this end, the World Health Organization is making a significant contribution to the prevention of PB by proposing world standards for the prospective registration of all human medical research (WHO 2009b).
Although the best solution to PB is to prevent it (Rothstein et al 2005), the truth is that the underlying selection mechanism that leads to PB has continued unchanged over a period of at least thirty years (Dickersin et al 1993, Sterling et al 1995). The use of gold standard data sources, such as the US Food and Drug Administration (FDA) trial registry database, is one way of achieving a less biased data collection (Lee et al
2008, Rising et al 2008, Turner et al 2008a). Nevertheless, this is a lengthy and not always feasible remedy, and so there is a need to rely on analytic methods to deal with the problem.
The typical procedure followed when PB is suspected in a MA is simply to test for its presence. However, the question remains as to what to do when the test result is positive. Should all the trials in the MA be disregarded because there is evidence of significant PB in the data (Vandenbroucke 1998, Sterne et al 2001b)? Or, alternatively, should one merely act with caution when interpreting the results? Neither approach is sufficient if the MA result is intended to inform policymaking. Equally, it is often inappropriate to assume the absence of PB when a test fails to detect it, since the probability of a type II error is large in such tests, particularly in heterogeneous MAs with few studies (Peters et al 2005, Ioannidis 2008b, Peters et al 2009). What is required, then, is a reliable way to adjust pooled estimates for PB so as to allow more reliable decision-making.
Altogether, this thesis highlights the serious consequences for science in general and clinical decisions in particular when based on selective and thus biased information; specifically, the problem of PB. The view of this thesis is that conclusions from the MA cannot be just a warning message about the potential danger of apparent
PB, but a corrected effect estimate that can be assumed unbiased, so that it can be incorporated into decision-making to allow more consistent judgements. It is crucial, however, to be realistic about the potential achievements of this research project in developing statistical methods to adjust for PB. The main limitation is that the underlying criteria for publication are unknown and may differ between the stakeholders involved in the publication process. Indeed, the publication selection mechanism “may depend on many factors for which the available data can only act as a proxy” (Copas & Malley 2008). Until such time as the uncertainty surrounding the process of PB is understood (if ever), no statistical approach will entirely correct for PB, but ignoring it is an unwise option (Copas 2005, Baker & Jackson 2006). Hence, the ultimate objective of this thesis is to develop a statistical method that estimates a PB-adjusted pooled effect as accurately as is feasible for the purpose of decision-making in health policy. The next section sets out the aims of the thesis more explicitly.
1.2. Aims of the thesis
The issues that justify the need for this project have been covered above. With this in mind, the overall aim of this work is to contribute to the development of valid approaches to addressing the problem of PB in evidence synthesis, with special attention to MA. The project consists of the following elements to facilitate the accomplishment of this core aim:
1. Investigation of the problem of PB (and other biases) affecting meta-analytic data,
together with a review and critical appraisal of the approaches proposed so far to
tackle PB; a discussion of the advantages and disadvantages of each method
accompanies its presentation.
2. Development of alternative techniques aimed at overcoming some of the limitations
of currently applied methods, either by extending and adapting existing methods or
by developing new ones. To facilitate the evaluation and comparison between
competing adjustment methods, a simulation study is undertaken. Moreover,
several case studies are used to demonstrate the implementation of the adjustment
methods proposed, to compare them with currently used methods, and to illustrate
the potential impact of inappropriate analyses on the estimation of the bias-
adjusted pooled effect.
3. Provision of recommendations advising meta-analysts on how best to address PB
(and other small-study effects) in MA and other more complex evidence synthesis
models. This implies proposing a preferred adjustment method, justified by its
strengths and limitations in comparison to the alternatives. Besides
proposing an adjustment method, exploration of the potential benefits and
methodological challenges in incorporating external information (within a Bayesian
framework) is carried out with the intention of improving the accuracy of the bias-
corrected pooled effect.
4. In addition to presenting the results, the simulation study designed here is
proposed as a consensus simulation framework in which future testing and
adjustment methods can be evaluated. This should alleviate the previous problems
of the methods being evaluated under different (and arguably favourable)
simulation conditions.
1.3. Thesis outline
Subsequent to the introductory chapter, chapter 2 comprises a literature review of essential issues concerning MA. It covers fixed and random-effects approaches to MA.
Methodological approaches to fundamental aspects of meta-analytic data, such as heterogeneity, are also dealt with in chapter 2, with special focus on meta-regression.
Chapter 3 defines the different types of biases affecting meta-analytic data, highlighting how PB, as well as other biases, induces a traceable trend known as small-study effects that is often used to detect the presence of PB in MAs. Chapter 4 reviews and critically appraises the approaches that have so far been proposed to address PB.
Chapter 5 develops an alternative approach to PB adjustment by proposing a regression-based method as the most coherent way of tackling PB (and other small-study effects).
Chapter 6 presents a simulation study designed to compare novel and existing methods to adjust for PB (and other small-study effects) from a frequentist perspective.
In chapter 7, the preferred adjustment method is illustrated in a case study as a first step to check the external validity of the method’s results. In order to facilitate a better understanding of the properties of the proposed method relative to the standard MA, chapter 8 derives algebraically the weighting scheme of the adjustment method of choice.
Chapter 9 investigates the links between Rubin’s ‘effect size surface estimation’ approach and the adjustment method proposed here. The following chapter 10 addresses some shortcomings in the frequentist-based adjustment methods by embarking on a Bayesian approach to PB adjustment. Chapter 11 goes a step further by investigating the way in which external information can further assist in the goal of adjusting for PB more accurately. This chapter also examines the benefits and methodological challenges in adapting the adjustment method of choice to the more complex evidence synthesis framework of network MA (also known as mixed treatment comparison models (Sutton et al 2008)).
Chapter 12 concludes by identifying important issues for routine practice. For that, chapter 12 summarizes the comparative benefits and limitations of the proposed adjustment (discussed across the thesis) used to justify replacing the present naïve MA approach with the routine adjustment of small-study effects. Future research to address unanswered questions is also suggested. Some additional material such as published research articles and supplementary plots can be found in the appendixes.
Literature review on meta-analysis
2.1. Introduction to meta-analysis
MA allows the quantitative estimation of the mean effect across several primary studies. Its major benefit is that, thanks to the statistical power gained, it can provide a more precise answer to the research question of interest than any single trial. There are numerous meta-analytic techniques, which have been extensively discussed in the literature (Stangl & Berry 2000, Sutton et al 2000a, Egger et al 2001).
Techniques relevant to the aims of this thesis are reviewed in this and subsequent chapters. As noted earlier, the most commonly reported pooled estimate is calculated as a weighted average of the results of the primary studies, where the weight corresponds to the precision of the effect estimate, so that studies with greater precision are given more weight in the MA. The simplest approach is the fixed-effect (FE) MA, which defines study precision as the inverse of the study variance and is therefore known as the inverse-variance weighted MA model (section 2.2). Whenever heterogeneous effects need combining, a random-effects (RE) MA model is used (section 2.4). This is also based on the concept of inverse-variance weighting, although a between-study variance parameter is included in the weighting to allow for heterogeneity (section 2.3). Both FE and RE MA models are used throughout this thesis by default.
The inverse-variance approach can be applied in many MA scenarios to any outcome measure with an associated standard error. Other methods for estimating the standard FE pooled effect include the Mantel-Haenszel (Mantel & Haenszel 1959), Peto (Peto et al 1985) and maximum likelihood based methods (Emerson 1994). Although these are also available, they are likely to give very similar results to the inverse-variance approach in the ordinary RCT setting. The Bayesian approach to MA (Spiegelhalter et al 2003) is also applied in chapter 10 and, provided vague prior information is used, tends to provide similar results to the frequentist inverse-variance approach (Good & Hardin 2003).
This thesis focuses on the most common meta-analytical setting (Morton et al 2004) for RCTs, with special attention to dichotomous outcomes (binary data). Only the log odds ratio, obtained via the logistic link function, is considered as the summary statistic for binary data. Note that any log transformation refers exclusively to the natural logarithm (base e). Nevertheless, some case studies also consider continuous data in the form of standardised mean differences, which are also common in the context of MA.
Fixed and random-effects meta-analytic models are the two main models used to combine results from individual studies (Song et al 2001); and the choice of one over the other is usually made on the basis of the variability between the study effect estimates (Viechtbauer 2007). Between-study variability (section 2.3), also known as heterogeneity (Sutton & Higgins 2008), is an important feature of any MA that must be considered and explored (Thompson 1994). Moreover, it is extensively advocated that the estimation of heterogeneity and exploration of its sources is as important as the estimation of the pooled effect itself (Stangl & Berry 2000, Morton et al 2004) (section 2.5).
2.2. Fixed-effect meta-analysis model
The fixed-effect (FE) MA model assumes that the summary estimates from the individual studies all estimate the same underlying effect; i.e. the observed study effect sizes are all sampled from a common underlying distribution and are therefore homogeneous. Any differences between study effect estimates are assumed to be due to sampling error only (i.e. within-study variance), so that patients are assumed comparable between studies in relation to treatment efficacy. That is, patients’ and study characteristics (such as the way the therapies are applied) are assumed not to interact with the effect size. If some of these characteristics did interact, then all studies and patients should share them (in equal measure); otherwise heterogeneous effects could arise. Figure 2.1 exhibits the effect size distribution of four hypothetical homogeneous studies that share a common underlying effect size θ. However, the meta-analyst only observes the reported effect sizes on the right-hand side (◊), which differ only through sampling error (i.e. within-study variation).
Figure 2.1 Illustration of a meta-analysis under the fixed-effect model assumptions
(with permission from Wolfgang Viechtbauer; presentation at Reading University 2008)
The parameterisation below of a FE MA model suitably accounts for continuous outcome data (Sutton et al 2000a):

y_i ~ N(θ, σ_i²)    [Equation 2.1]

where y_i is the effect size estimate in the i-th study, θ is the true common effect size, and σ_i² is the within-study variance of the i-th study. Hence, the pooled estimate of the effect θ is given by:
θ̂ = Σ w_i y_i / Σ w_i

The weights that minimise the variance of θ̂ are the inverses of the estimated study variances, w_i = 1/v_i. The variance of θ̂ is then estimated by the reciprocal of the sum of the weights:

var(θ̂) = 1 / Σ w_i
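As a concrete sketch, the inverse-variance weighted FE pooled estimate and its variance defined above can be computed as follows. The study effect sizes and variances used here are illustrative numbers only, not data from this thesis:

```python
import math

def fixed_effect_ma(effects, variances):
    """Inverse-variance weighted fixed-effect meta-analysis.

    effects   : study effect sizes y_i (e.g. log odds ratios)
    variances : within-study variances v_i
    Returns (pooled estimate, variance of the pooled estimate).
    """
    weights = [1.0 / v for v in variances]           # w_i = 1/v_i
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    pooled_var = 1.0 / sum(weights)                  # var = 1 / sum(w_i)
    return pooled, pooled_var

# Hypothetical example: three studies reporting log odds ratios
y = [-0.4, -0.2, -0.3]
v = [0.04, 0.09, 0.01]
theta, var_theta = fixed_effect_ma(y, v)
se = math.sqrt(var_theta)
ci = (theta - 1.96 * se, theta + 1.96 * se)          # 95% confidence interval
```

Note how the third (most precise) study dominates the pooled result, illustrating why the weighting scheme matters when small, imprecise studies are the biased ones.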
Although many RCTs collect binary data, often summarised and reported as odds ratios (OR), such data can be suitably combined with the above model. Note that the OR is defined as a measure of association between exposure to a risk factor or intervention and the clinical event. More specifically, the OR is the ratio of two odds, one for the active treatment group and the other for the placebo or no-treatment group. Although the OR is conceptually challenging to interpret, it can be interpreted as a risk ratio (a ratio of two probabilities) when the event probability p_i is small, which is easier to understand conceptually. However, this equivalence in interpretation is erroneous as soon as p_i is no longer small (Norton et al 2004).

The OR of the i-th study can be calculated by

OR_i = (a_i d_i) / (b_i c_i)

where, for each study i, a_i and b_i represent the observed numbers who experience the outcome of interest in the treated and control groups, respectively, and c_i and d_i are the corresponding numbers not developing the outcome in the treated and control groups. Thus, the sample size of the i-th study corresponds to the sum of a_i, b_i, c_i and d_i.
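The calculation of the log OR from a 2×2 table, together with its standard error, can be sketched as below. The standard error here uses the usual large-sample (Woolf) formula, sqrt(1/a + 1/b + 1/c + 1/d), which is standard practice but is not spelt out in the passage above; the cell counts are invented for illustration:

```python
import math

def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its standard error from a 2x2 table.

    a, b : events in the treated and control groups
    c, d : non-events in the treated and control groups
    OR_i = (a*d) / (b*c); SE via Woolf's large-sample formula.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

# Hypothetical study: 10/100 events on treatment, 15/100 on control
lor, se = log_odds_ratio(10, 15, 90, 85)
```

Each study's (log OR, SE²) pair obtained this way can then be fed directly into the inverse-variance FE model, since the log OR is approximately normally distributed in large samples.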